On scheduling, timeliness and security in large scale distributed computing
(USC Thesis Other)
On Scheduling, Timeliness and Security in Large Scale Distributed Computing

by Chien-Sheng Yang

A Dissertation Presented to the FACULTY OF THE GRADUATE SCHOOL, UNIVERSITY OF SOUTHERN CALIFORNIA, In Partial Fulfillment of the Requirements for the Degree DOCTOR OF PHILOSOPHY (Electrical Engineering)

December 2021

Copyright 2021 Chien-Sheng Yang

To My Lovely Family.

Acknowledgements

First, I would like to thank my advisor Prof. Salman Avestimehr for his guidance and feedback during my PhD study. It has been an absolute honor to work with him. His vision, knowledge, and patience have paved the way toward the accomplishments that I have achieved in the past few years. He always provides explicit and detailed suggestions on how to identify research directions, formulate problems, and present research results. He always tells me that it is important to make good presentations so that the audience can also get excited about the results. I am grateful for the constant support that he has provided throughout my time at USC.

I would next like to thank Prof. Ramtin Pedarsani (from University of California, Santa Barbara), who has been an amazing collaborator over the past 4 years. We have collaborated on several papers on task scheduling and coded computing. I always appreciate his insightful ideas for solving technical difficulties.

I would also like to thank my thesis committee members Prof. Bhaskar Krishnamachari and Prof. Adel Javanmard, and my qualifying exam committee members Prof. Urbashi Mitra and Prof. Murali Annavaram, whose insightful feedback has helped to significantly improve the quality of this dissertation.

I would like to thank my friends and colleagues Songze Li, Qian Yu, Saurav Prakash, Jinhyun So, Mohammadreza Mousavi Kalan, Chaoyang He, Ahmed R. Elkordy, Yue Niu, Ramy Ali, Po-Han Huang and Ming-Chun Lee for their advice and support.

Last but not least, I would like to express my deepest gratitude to my lovely family for their relentless love and support.
This dissertation is dedicated to them.

Table of Contents

Dedication
Acknowledgements
List of Tables
List of Figures
Abstract
Chapter 1: Introduction
Chapter 2: Communication-Aware Scheduling of Serial Tasks for Dispersed Computing
  2.1 Introduction
    2.1.1 Related Prior Work
  2.2 System Model
    2.2.1 Computation Model
    2.2.2 Network Model
    2.2.3 Problem Formulation
  2.3 Capacity Region Characterization
  2.4 Queueing Network Model
    2.4.1 Queueing Network
    2.4.2 Queueing Network Planning Problem
  2.5 Throughput-Optimal Policy
  2.6 Complexity of throughput-optimal policy
  2.7 Towards more general Computing Model
    2.7.1 DAG Computing Model
    2.7.2 Queueing Network Model for DAG Scheduling
    2.7.3 Capacity Region
    2.7.4 Throughput-Optimal Policy
  2.8 Conclusion
Chapter 3: Timely-Throughput Optimal Coded Computing over Cloud Networks
  3.1 Introduction
    3.1.1 Related Prior Work
  3.2 System Model
    3.2.1 Computation Model
    3.2.2 Network Model
    3.2.3 Problem Formulation
  3.3 Lagrange Estimate-and-Allocate (LEA) Strategy
    3.3.1 Data Encoding in LEA
    3.3.2 Load Allocation in LEA
  3.4 Upper bound on the timely computation throughput
    3.4.1 Optimal Success Probability of One Round Computation
    3.4.2 Load Allocation Problem
  3.5 Optimality of LEA
  3.6 Experiments
    3.6.1 Numerical Analysis
    3.6.2 Experiments using Amazon EC2 machines
  3.7 Conclusion
Chapter 4: Edge Computing in the Dark: Leveraging Contextual-Combinatorial Bandit and Coded Computing
  4.1 Introduction
    4.1.1 Related Prior Work
  4.2 System Model
    4.2.1 Computation Model
    4.2.2 Network Model
    4.2.3 Problem Statement
  4.3 Online Coded Edge Computing
    4.3.1 Lagrange Coded Computing
    4.3.2 CC-MAB for Coded Edge Computing
    4.3.3 Optimal Offline Policy
    4.3.4 Online Coded Edge Computing Policy
  4.4 Asymptotic Optimality of Online Coded Edge Computing Policy
  4.5 Experiments
  4.6 Concluding Remarks and Future Directions
Chapter 5: Coded Computing for Secure Boolean Computations
  5.1 Introduction
    5.1.1 Main Contributions
    5.1.2 Related Prior Work
  5.2 System Model
  5.3 Overview of Lagrange Coded Computing
  5.4 Scheme 1: Coded Algebraic Normal Form
    5.4.1 Formal Description of Coded ANF
    5.4.2 Security Threshold of Coded ANF
  5.5 Scheme 2: Coded Disjunctive Normal Form
    5.5.1 Formal Description of Coded DNF
    5.5.2 Security Threshold of Coded DNF
  5.6 Scheme 3: Coded Polynomial Threshold Function
    5.6.1 Formal Description of Coded PTF
    5.6.2 Security Threshold of Coded PTF
    5.6.3 Coded D-partitioned PTF
  5.7 Matching Outer Bound for coded ANF and coded DNF
  5.8 Concluding Remarks and Future Directions
Bibliography
Appendices
  I Proof of Lemma 1
  J Proof of Lemma 2
  K Proof of Lemma 3
  L Proof of Lemma 5
  M Proof of Lemma 7
  N Proof of Lemma 8
  O Proof of Lemma 9
  P Proof of Lemma 12
  Q Proof of Lemma 6

List of Tables

5.1 Performance comparison of LCC and the proposed three schemes for the Boolean function $f : \{0,1\}^m \to \{0,1\}$ which has sparsity $r(f)$ and weight $w(f)$.

List of Figures

2.1 Illustration of dispersed computing.
2.2 A simple example of task scheduling for dispersed computing.
2.3 The comparison of capacity regions between the previous model without communication constraints [80, 79] and our proposed model with communication constraints.
2.4 $k$ is a root of one chain ($k \in \mathcal{C}$).
2.5 $k$ is not a root of one chain ($k \notin \mathcal{C}$).
2.6 $k \in \mathcal{H}$.
2.7 Overview of DAG scheduling for dispersed computing.
2.8 Queueing Network for the simple DAG in Fig. 2.7.
2.9 Queueing Network with Additional Precedence Constraints for DAG in Fig. 2.7.
3.1 Empirical measurement of speed variation of a credit-based t2.micro instance in Amazon EC2 in which we keep assigning computation (e.g., a matrix multiplication) to the instance and measure the finish times: A two-state Markov model.
3.2 Overview of dynamic load allocation over a coded computing framework with timely computation requests. In each round $m$, the goal is to compute the evaluations $f(X_1), \ldots, f(X_k)$ by the deadline $d$ using $n$ workers.
3.3 Numerical Evaluations. Compared with the static load allocation strategy, LEA improves the timely computation throughput by 1.38× ∼ 17.5×.
3.4 Experimental evaluations over 15 t2.micro instances in Amazon EC2. Compared with the static load allocation strategy, LEA improves the timely computation throughput by 1.27× ∼ 6.5×.
4.1 Overview of online computation offloading over an edge network with timely computation requests. In round $t$, the goal of the user is to compute the Map functions $f_t(X_1), \ldots, f_t(X_k)$ by the deadline $d_t$ using the edge devices.
4.2 Numerical evaluations for cumulative reward for Scenario 1.
4.3 Numerical evaluations for cumulative reward for Scenario 2.
4.4 Numerical evaluations for cumulative reward for Scenario 3.
4.5 Numerical evaluations for cumulative reward for Scenario 4.
4.6 Expected regret of the online coded edge computing policy for Scenario 1.
5.1 Modeling the Boolean function as a general polynomial can result in the high-degree difficulty, which makes the security threshold low when LCC encoding is used. The main idea of our proposed approach is to model it as the concatenation of some low-degree polynomials and the threshold functions.

Abstract

In this dissertation, we focus on scheduling, timeliness and security in modern large-scale distributed computing networks, where a massive computation is partitioned into smaller computations and performed in a distributed manner to improve overall performance.

For the first part of this thesis, we consider the problem of task scheduling in dispersed computing networks that leverage the computing capabilities of heterogeneous resources across the network. In dispersed computing networks, communication bandwidth between different nodes can also be very limited and heterogeneous. Moreover, the arriving computation jobs are modeled as chains, with nodes representing tasks, and edges representing precedence constraints among tasks. We aim at designing efficient algorithms which carefully account for computation and communication heterogeneity. We propose a Max-Weight type scheduling policy and show that it is throughput-optimal.

Then, we consider the problem of computation offloading over cloud networks with a sequence of timely computation jobs, which have to be accomplished before their deadlines. Motivated by measurements over Amazon EC2 clusters, we consider a two-state Markov model for variability of computing speed. In particular, we focus on a Coded Computing framework, in which the data is possibly encoded and stored at the nodes in order to provide robustness against slow nodes. We propose a dynamic computation strategy called Lagrange Estimate-and-Allocate strategy, which achieves the optimal timely computation throughput.
Beyond the two-state Markov model, we consider a more general framework that does not make any strong assumption (e.g., Markov model) on the underlying model for the speed of devices. Under this framework, we formulate the problem of computation offloading using contextual-combinatorial multi-armed bandits (CC-MAB), and aim to maximize the cumulative expected reward. We propose an online learning policy, which provably achieves asymptotically-optimal performance.

Finally, we consider the problem of distributed computation with particular focus on security against Byzantine workers. In a distributed setting, we focus on computing Boolean functions, which are the key components of many applications, e.g., verification functions in blockchain systems and design of cryptographic algorithms. The maximum number of adversarial workers tolerated by the state-of-the-art scheme Lagrange Coded Computing can be extremely low if the degree of the polynomial is high. To overcome this degree problem, we propose three different schemes called coded Algebraic normal form (ANF), coded Disjunctive normal form (DNF) and coded polynomial threshold function (PTF), whose key idea is to model a Boolean function as the concatenation of some low-degree polynomials and threshold functions.

Chapter 1

Introduction

In recent years, the rapid growth of large-scale data analytics has expedited the development of data-intensive applications, e.g., image recognition, autonomous navigation and augmented/virtual reality. To handle the growing size of modern datasets, a large-scale computation is partitioned into smaller computations and performed in a distributed manner. In order to improve the overall performance, two modern distributed computing frameworks, dispersed computing and coded distributed computing, have been proposed. Dispersed computing migrates cloud services closer to end users by leveraging the computation and storage capacity of edge devices (e.g., mobile user equipment, routers and IoT devices [18, 24, 41]). Coded distributed computing utilizes coding to inject computation redundancy in order to alleviate the various issues that arise in large-scale distributed computing (e.g., straggler mitigation [54] and bandwidth reduction [62]). In this dissertation, we consider these two paradigms with focus on task scheduling, timely computation and security.

First, we consider the problem of task scheduling for dispersed computing networks that leverage the computing capabilities of heterogeneous resources dispersed across the network for processing the massive amount of data collected at the edge of the network. In a dynamic setting, arriving computation jobs are modeled as chains, with nodes representing tasks, and edges representing precedence constraints among tasks. In our proposed model, motivated by significant communication costs in dispersed computing environments, the communication times are taken into account. More specifically, we consider a network where servers can serve all task types, and sending the outputs of processed tasks from one server to another server results in some communication delay. We first characterize the capacity region of the network, then propose a novel virtual queueing network encoding the state of the network. Finally, we propose a Max-Weight type scheduling policy, and considering the stochastic network in the fluid limit, we use a Lyapunov argument to show that the policy is throughput-optimal. Beyond the model of chains, we extend the scheduling problem to the model of directed acyclic graphs (DAG), which imposes a new challenge, namely the logic dependency difficulty, requiring the data of processed parent tasks to be sent to the same server for processing the child task.
We propose a virtual queueing network for DAG scheduling over broadcast networks, where servers always broadcast the data of processed tasks to other servers, and prove that the Max-Weight policy is throughput-optimal.

Motivated by the fact that unpredictable and unreliable infrastructures result in high variability of computing resources [113, 5], we then consider a coded computing framework, in which the data is possibly encoded and stored at the worker nodes in order to provide robustness against slow nodes. At the same time, there is a significant increase in utilizing the cloud for event-driven and time-sensitive computations (e.g., IoT applications and cognitive services). We consider the problem of computation offloading over a coded distributed computing network with a sequence of timely computation jobs. In particular, we consider a two-state Markov model for variability of computing speed in cloud networks. In this model, each node can be either in a good state or a bad state in terms of the computation speed, and the transition between these states is modeled as a Markov chain which is unknown to the scheduler. With timely computation requests submitted to the system with computation deadlines, our goal is to design the optimal computation-load allocation scheme and the optimal data encoding scheme that maximize the timely computation throughput (i.e., the average number of computation tasks that are accomplished before their deadline). Our main result is the development of a dynamic computation strategy called Lagrange Estimate-and-Allocate (LEA) strategy, which achieves the optimal timely computation throughput. It is shown that, compared to the static allocation strategy, LEA improves the timely computation throughput by 1.4× ∼ 17.5× in various scenarios via simulations and by 1.27× ∼ 6.5× in experiments over Amazon EC2 clusters.
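The two-state Markov speed model described above can be illustrated with a short simulation. The following is only a minimal sketch under stated assumptions: the function names and the transition probabilities `p_gb` (good to bad) and `p_bg` (bad to good) are hypothetical and are not taken from the dissertation, which treats the chain as unknown to the scheduler.

```python
import random


def simulate_speed_states(p_gb, p_bg, rounds, seed=0):
    """Simulate a worker whose speed state is 'good' or 'bad' in each round.

    p_gb: assumed probability of moving from good to bad between rounds.
    p_bg: assumed probability of moving from bad to good between rounds.
    """
    rng = random.Random(seed)
    state = "good"
    states = []
    for _ in range(rounds):
        states.append(state)
        u = rng.random()
        if state == "good":
            state = "bad" if u < p_gb else "good"
        else:
            state = "good" if u < p_bg else "bad"
    return states


def stationary_good_fraction(p_gb, p_bg):
    """Long-run fraction of rounds spent in the good state."""
    return p_bg / (p_gb + p_bg)


if __name__ == "__main__":
    trace = simulate_speed_states(p_gb=0.1, p_bg=0.3, rounds=200_000)
    empirical = trace.count("good") / len(trace)
    print(empirical, stationary_good_fraction(0.1, 0.3))
```

Over a long horizon the empirical fraction of good rounds should approach the stationary value, which is the kind of quantity a scheduler would have to estimate online when the chain is not known in advance.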
Beyond the two-state Markov model, we further consider a more general model that does not make any strong assumption on the speed of devices. Under this framework, we model the service quality of each device as a function of context (a collection of factors that affect devices). Motivated by the MapReduce computation paradigm, we assume that each computation job can be partitioned into smaller Map functions which are processed at the edge, and the Reduce function is computed at the user after the Map results are collected from the edge nodes. We formulate this problem under a Coded Computing framework using contextual-combinatorial multi-armed bandits (CC-MAB), and aim to maximize the cumulative expected reward. We propose an online learning policy called online coded edge computing policy, which provably achieves asymptotically-optimal performance in terms of regret loss compared with the optimal offline policy for the proposed CC-MAB problem. In terms of the cumulative reward, numerical studies show that the online coded edge computing policy significantly outperforms other benchmarks.

Another key challenge in distributed computing networks is to provide security against adversarial workers that deliberately send erroneous data in order to affect the computation for their benefit [13, 25, 17]. We consider such a distributed computing problem with particular focus on security against Byzantine workers. In a distributed setting, we focus on computing Boolean functions, which are the key components of many applications, e.g., verification functions in blockchain systems [58] and design of cryptographic algorithms [26]. Any Boolean function can be modeled as a multivariate polynomial with high degree in general.
However, the security threshold (i.e., the maximum number of adversarial workers that can be tolerated such that the correct results can be obtained) provided by the recently proposed Lagrange Coded Computing (LCC) [112] can be extremely low if the degree of the polynomial is high. To resolve this degree problem, we propose three different schemes called coded Algebraic normal form (ANF), coded Disjunctive normal form (DNF) and coded polynomial threshold function (PTF). The key idea of the proposed schemes is to model a Boolean function as the concatenation of some low-degree polynomials and threshold functions. In terms of the security threshold, we show that the proposed coded ANF and coded DNF are optimal by providing a matching outer bound. For Boolean functions with polynomial sparsity and weight, it is demonstrated that the proposed coded PTF outperforms LCC in terms of the security threshold and the decoding complexity. Beyond Boolean computations, we extend the problem to the model of general multivariate polynomials, and propose coded data logarithm and coded data augmentation to resolve the high-degree difficulty.

Chapter 2

Communication-Aware Scheduling of Serial Tasks for Dispersed Computing

2.1 Introduction

In many large-scale data analysis application domains, such as surveillance, autonomous navigation, and cyber-security, much of the needed data is collected at the edge of the network via a collection of sensors, mobile platforms, and users’ devices. In these scenarios, continuous transfer of the massive amount of collected data from the edge of the network to back-end servers (e.g., cloud) for processing incurs significant communication and latency costs. As a result, there is a growing interest in the development of in-network dispersed computing paradigms that leverage the computing capabilities of heterogeneous resources dispersed across the network (e.g., edge computing, fog computing [18, 24, 41]).

At a high level, a dispersed computing scenario (see Fig.
2.1) consists of a group of networked computation nodes, such as wireless edge access points, network routers, and users’ computers that can be utilized for offloading the computations. There is, however, a broad range of computing capabilities that may be supported by different computation nodes. Some may perform certain kinds of operations at extremely high rate, such as high throughput matrix multiplication on GPUs, while the same node may perform worse on single threaded performance. Communication bandwidth between different nodes in dispersed computing scenarios can also be very limited and heterogeneous. Thus, for scheduling of computation tasks in such networks, it is critical to design efficient algorithms which carefully account for computation and communication heterogeneity.

Figure 2.1: Illustration of dispersed computing.

In this chapter, we consider the task scheduling problem in a dispersed computing network in which arriving jobs are modeled as chains, with nodes representing tasks, and edges representing precedence constraints among tasks. Each server is capable of serving all the task types and the service rate of a server depends on which task type it is serving.¹ More specifically, after one task is processed by a server, the server can either process the child task locally or send the result to another server in the network to continue with processing of the child task. However, each server has a bandwidth constraint that determines the delay for sending the results. A significant challenge in this communication-aware scheduling is that, unlike in traditional queueing networks, processed tasks are not sent from one queue to another queue probabilistically. Indeed, the scheduling decisions also determine the routing of tasks in the network. Therefore, it is not clear what maximum throughput (or, equivalently, capacity region) one can achieve in such networks, and what scheduling policy is throughput-optimal. This raises the following questions.
• What is the capacity region of the network?
• What is a throughput-optimal scheduling policy for the network?

¹ The exponential distribution of servers’ processing times is commonly observed in many computing scenarios (see e.g., [54, 86]), and the geometric distribution considered in this chapter is the equivalent of the exponential distribution for discrete-time systems.

Our computation and network models are related to [80, 79]. However, the model that we consider in this chapter is more general, as the communication times between servers are taken into account. In our network model, sending the outputs of processed tasks from one server to another server results in some communication constraints that make the design of an efficient scheduling policy even more challenging.

As main contributions of this chapter, we first characterize the capacity region of this problem (i.e., the set of all arrival rate vectors of computations for which there exists a scheduling policy that makes the network rate stable). To capture the complicated computing and communication procedures in the network, we propose a novel virtual queueing network encoding the state of the network. Then, we propose a Max-Weight type scheduling policy for the virtual queueing network, and show that it is throughput-optimal.

Since the proposed virtual queueing network is quite different from traditional queueing networks, it is not clear that the capacity region of the proposed virtual queueing network is equivalent to the capacity region of the original scheduling problem. Thus, to prove the throughput-optimality of the Max-Weight policy, we first show the equivalence of two capacity regions: one for the dispersed computing problem that is characterized by a linear program (LP), and one for the virtual queueing network characterized by a mathematical optimization problem that is not an LP.
Then, under the Max-Weight policy, we consider the stochastic network in the fluid limit, and using a Lyapunov argument, we show that the fluid model of the virtual queueing network is weakly stable [27] for all arrival vectors in the capacity region, and stable for all arrival vectors in the interior of the capacity region. This implies that the Max-Weight policy is throughput-optimal for the virtual queueing network as well as for the original scheduling problem.

Finally, we extend the scheduling problem for dispersed computing to a more general computing model, where jobs are modeled as directed acyclic graphs (DAG). Modeling a job as a DAG incurs more complex logic dependencies among the smaller tasks of the job compared to chains. More precisely, the logic dependency difficulty arises due to the requirement that the data of processed parent tasks have to be sent to the same server for processing child tasks. To resolve this logic dependency difficulty, we consider a specific class of networks, named broadcast networks, where servers in the network always broadcast the data of processed tasks to other servers, and propose a virtual queueing network for the DAG scheduling problem. We further demonstrate that the Max-Weight policy is throughput-optimal for the proposed virtual queueing network.

In the following, we summarize the key contributions in this chapter:

• We characterize the capacity region for the new network model.
• To capture the heterogeneity of computation and communication in the network, we propose a novel virtual queueing network.
• We propose a Max-Weight type scheduling policy, which is throughput-optimal for the proposed virtual queueing network.
• For the communication-aware DAG scheduling problem for dispersed computing, we demonstrate that the Max-Weight policy is throughput-optimal for broadcast networks.
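To give some intuition for what a Max-Weight type rule looks like, the sketch below implements the textbook Max-Weight decision for a simple system of parallel queues and flexible servers: each server picks the task type that maximizes queue length times service rate. This is an illustrative assumption, not the dissertation's policy, which operates on the virtual queueing network and also accounts for communication. The function name and the example numbers are hypothetical.

```python
def max_weight_assignment(queue_lengths, service_rates):
    """Textbook Max-Weight rule for parallel queues and flexible servers.

    queue_lengths[k]    : current backlog of task type k.
    service_rates[k][j] : service rate of server j on task type k.
    Returns a list whose j-th entry is the task type chosen by server j,
    or None if every weight is zero (nothing useful to serve).
    Simplification: several servers may pick the same queue.
    """
    num_servers = len(service_rates[0])
    assignment = []
    for j in range(num_servers):
        best_type, best_weight = None, 0.0
        for k, backlog in enumerate(queue_lengths):
            weight = backlog * service_rates[k][j]
            if weight > best_weight:
                best_type, best_weight = k, weight
        assignment.append(best_type)
    return assignment


if __name__ == "__main__":
    queues = [3, 5, 0]
    rates = [[0.9, 0.2], [0.3, 0.4], [0.5, 0.5]]
    print(max_weight_assignment(queues, rates))  # [0, 1]
```

In the example, server 0 prefers type 0 (weight 2.7 against 1.5), while server 1 prefers type 1 (weight 2.0 against 0.6), so a long queue is not served by a server that is slow on it when a better match exists.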
2.1.1 Related Prior Work

The task scheduling problem has been widely studied in the literature, and prior work can be divided into two main categories: static scheduling and dynamic scheduling. In the static or offline scheduling problem, jobs are present at the beginning, and the goal is to allocate tasks to servers such that a performance metric such as average computation delay is minimized. In most cases, the static scheduling problem is computationally hard, and various heuristic, approximation and stochastic approaches have been proposed (see e.g. [50, 116, 97, 56, 23, 16, 107, 99]). Given a task graph over heterogeneous processors, [99] proposes the Heterogeneous Earliest Finish Time (HEFT) algorithm, which first prioritizes tasks based on the dependencies in the graph, and then assigns tasks to processors starting with the highest priority. In edge computing scenarios, the static scheduling problem has been widely investigated in recent years [44, 42, 66, 115]. To minimize computation latency while meeting prescribed constraints, [44] proposes a polynomial time approximation scheme algorithm with theoretical performance guarantees. [42] proposes a heuristic online task offloading algorithm which maximizes the parallelism between the mobile and the cloud by using a load-balancing approach. [66] proposes an optimal wireless-aware joint scheduling and computation offloading scheme for multicomponent applications. In [115], under a stochastic wireless channel, collaborative task execution between a mobile device and a cloud clone for mobile applications has been investigated.

In the online scheduling problem, jobs arrive to the network according to a stochastic process, and get scheduled dynamically over time. In many works in the literature, the tasks have dedicated servers for processing, and the goal is to establish stability conditions for the network [10, 9].
Given the stability results, the next natural goal is to compute the expected completion times of jobs or delay distributions. However, few analytical results are available for characterizing the delay performance, except for the simplest models. One approach to understanding the delay performance of stochastic networks is to analyze the network in the “heavy-traffic” regime; see for example [100, 72, 95]. When the tasks do not have dedicated servers, one aims to find a throughput-optimal scheduling policy (see e.g., [32]), i.e., a policy that stabilizes the network whenever it can be stabilized. Max-Weight scheduling, proposed in [98, 28], is known to be throughput-optimal for wireless networks, flexible queueing networks [71, 31, 101] and data center networks [65]. In [33, 114], the chain-type computation model is also considered for distributed computing networks. However, our network model is more general as it captures the computation heterogeneity in dispersed computing networks, e.g., the service rate of a server in our network model depends on which task type it serves.

Notation. We denote by $[N]$ the set $\{1, \ldots, N\}$ for any positive integer $N$. For any two vectors $\vec{x}$ and $\vec{y}$, the notation $\vec{x} \le \vec{y}$ means that $x_i \le y_i$ for all $i$.

Figure 2.2: A simple example of task scheduling for dispersed computing.

2.2 System Model

2.2.1 Computation Model

As shown in Fig. 2.2, each job is modeled as a chain which includes serial tasks. Each node of the chain represents one task type, and each (directed) edge of the chain represents a precedence constraint. Moreover, we consider $M$ types of jobs, where each type is specified by one chain structure.

For this problem, we define the following terms. Let $(\mathcal{I}_m, \{c_k\}_{k \in \mathcal{I}_m})$ be the chain corresponding to the job of type $m$, $m \in [M]$, where $\mathcal{I}_m$ denotes the set of nodes of type-$m$ jobs, and $c_k$ denotes the data size (bits) of the output of a type-$k$ task.
Let the number of tasks of a type-$m$ job be $K_m$, i.e. $|\mathcal{I}_m| = K_m$, and let the total number of task types in the network be $K$, so that $\sum_{m=1}^{M} K_m = K$. We assume that jobs are independent of each other, which implies that $\mathcal{I}_1, \mathcal{I}_2, \ldots, \mathcal{I}_M$ are disjoint. Thus, we can index the task types in the network by $k$, $k \in [K]$, starting from job type 1 to $M$. Therefore, task type $k$ belongs to job type $m(k)$ if
\[ \sum_{m'=1}^{m(k)-1} K_{m'} < k \le \sum_{m'=1}^{m(k)} K_{m'}. \]
We call task $k'$ a parent of task $k$ if they belong to the same chain and there is a directed edge from $k'$ to $k$. Without loss of generality, we let task $k$ be the parent of task $k+1$ if task $k$ and task $k+1$ belong to the same chain, i.e. $m(k) = m(k+1)$. In order to process task $k+1$, the processing of task $k$ must be completed. Node $k$ is said to be the root of chain type $m(k)$ if $k = 1 + \sum_{m'=1}^{m(k)-1} K_{m'}$. We denote by $\mathcal{C}$ the set of the root nodes of the chains, i.e. $\mathcal{C} = \{k : k = 1 + \sum_{m=1}^{i-1} K_m,\ i \in [M]\}$. Also, node $k$ is said to be the last node of chain type $m(k)$ if $k = \sum_{m'=1}^{m(k)} K_{m'}$. We denote by $\mathcal{H}$ the set of the last nodes of the chains, i.e. $\mathcal{H} = \{k : k = \sum_{m=1}^{i} K_m,\ i \in [M]\}$.

2.2.2 Network Model

In the dispersed computing network, as shown in Fig. 2.2, there are $J$ servers which are connected with each other. Each server can serve all types of tasks. We consider the network in discrete time. We assume that the arrival process of jobs of type $m$ is a Bernoulli process with rate $\lambda_m$, $0 < \lambda_m < 1$; that is, in each time slot a job of type $m$ arrives to the network with probability $\lambda_m$, independently over time. We assume that the service times for the nodes are geometrically distributed, independent across time slots and across different nodes, and also independent of the arrival processes. When server $j$ processes type-$k$ tasks, the service completion time has mean $\mu_{(k,j)}^{-1}$. Thus, $\mu_{(k,j)}$ can be interpreted as the service rate of a type-$k$ task when processed by server $j$.
Similarly, we model the communication times between two servers as geometrically distributed, independent across time slots and across different nodes, and also independent of the arrival processes. When server $j$ communicates data of size 1 bit to another server, the communication time has mean $b_j^{-1}$. Therefore, $b_j$ can be interpreted as the average bandwidth (bits/time slot) of server $j$ for communicating data of processed tasks. Without loss of generality, the system parameters can always be rescaled so that $b_j / c_k < 1$ for all $k$ and $j$, by speeding up the clock of the system. We assume that each server is able to communicate data and process tasks in the same time slot.

In the task scheduling problem of dispersed computing, tasks are scheduled on servers based on a scheduling policy. After a task is served by a server, the scheduling policy determines to which server the data of the processed task is sent for processing the child task.

2.2.3 Problem Formulation

Given the above computation model and network model, we formulate the task scheduling problem of dispersed computing based on the following terms.

Definition 1. Let $Q^n$ be a stochastic process describing the number of jobs in the network over time $n \ge 0$. A network is rate stable if
\[ \lim_{n \to \infty} \frac{Q^n}{n} = 0 \quad \text{almost surely.} \tag{2.1} \]

Definition 2 (Capacity Region). We define the capacity region of the network to be the set of all arrival rate vectors for which there exists a scheduling policy that makes the network rate stable.

Definition 3. The fluid level of a stochastic process $Q^n$, denoted by $X(t)$, is defined as
\[ X(t) = \lim_{r \to \infty} \frac{Q^{\lfloor rt \rfloor}}{r}. \tag{2.2} \]

Definition 4. Let $X(t)$ be the fluid level of a stochastic process. The fluid model of the process is weakly stable if $X(0) = 0$ implies $X(t) = 0$ for all $t \ge 0$ [27].

Note that we later model the network as a network of virtual queues.
Since the arrival and service processes are memoryless, given a scheduling policy, the queue-length vector in this virtual queueing network is a Markov process.

Definition 5. A network is strongly stable if its underlying Markov process is positive recurrent for all the arrival rate vectors in the interior of the capacity region.

Definition 6. A scheduling policy is throughput-optimal if, under this policy, the network is rate stable for all arrival rate vectors in the capacity region, and strongly stable for all arrival rate vectors in the interior of the capacity region.

Based on the above definitions, our problem is formulated as follows.

Problem. Consider a dispersed computing network with the computation and network models defined in Sections 2.2.1 and 2.2.2. We pose the following two questions:
• What is the capacity region of the network as defined in Definition 2?
• What is a throughput-optimal scheduling policy for the network as defined in Definition 6?

2.3 Capacity Region Characterization

As mentioned previously, our goal is to find a throughput-optimal scheduling policy. Before that, we characterize the capacity region of the network.

We consider an arbitrary scheduling policy and define two allocation vectors to characterize it. Let $p_{(k,j)}$ be the long-run fraction of capacity that server $j$ allocates for processing available type-$k$ tasks. We define $\vec{p}$ to be the capacity allocation vector. An allocation vector $\vec{p}$ is feasible if
\[ \sum_{k=1}^{K} p_{(k,j)} \le 1, \quad \forall j \in [J]. \tag{2.3} \]
Let $q_{(k,j)}$ be the long-run fraction of the bandwidth that server $j$ allocates for communicating data of processed type-$k$ tasks. We define $\vec{q}$ to be the bandwidth allocation vector. An allocation vector $\vec{q}$ is feasible if
\[ \sum_{k \in [K] \setminus \mathcal{H}} q_{(k,j)} \le 1, \quad \forall j \in [J]. \tag{2.4} \]
Given a capacity allocation vector $\vec{p}$, consider task $k$ and task $k+1$ which are in the same chain on server $j$.
For large $t$, the number of type-$k$ tasks processed by server $j$ up to time $t$ is $\mu_{(k,j)} p_{(k,j)} t$, and the number of type-$(k+1)$ tasks processed by server $j$ is $\mu_{(k+1,j)} p_{(k+1,j)} t$. Therefore, for large $t$, the number of type-$(k+1)$ tasks that server $j$ is not able to serve up to time $t$ is
\[ \mu_{(k,j)} p_{(k,j)} t - \mu_{(k+1,j)} p_{(k+1,j)} t. \tag{2.5} \]
Clearly, the type-$(k+1)$ tasks which cannot be served by server $j$ have to be processed by other servers. Hence, up to a large time $t$, server $j$ has to communicate the data of at least $\mu_{(k,j)} p_{(k,j)} t - \mu_{(k+1,j)} p_{(k+1,j)} t$ processed type-$k$ tasks to other servers. On the other hand, given a bandwidth allocation vector $\vec{q}$, the number of type-$k$ tasks communicated by server $j$ up to a large time $t$ is $b_j q_{(k,j)} t / c_k$. Therefore, to make the network stable, we obtain the following constraints:
\[ \frac{b_j q_{(k,j)}}{c_k} \ge \mu_{(k,j)} p_{(k,j)} - \mu_{(k+1,j)} p_{(k+1,j)}, \quad \forall j \in [J],\ \forall k \in [K] \setminus \mathcal{H}. \tag{2.6} \]

For this scheduling problem, we can define a linear program (LP) that characterizes the capacity region of the network, defined to be the set of rate vectors $\vec{\lambda}$ for which there is a scheduling policy with corresponding allocation vectors $\vec{p}$ and $\vec{q}$ such that the network is rate stable. The nominal traffic rate to all nodes of job type $m$ in the network is $\lambda_m$. Let $\nu_k(\vec{\lambda})$ be the nominal traffic rate to the node of task $k$ in the network. Then, $\nu_k(\vec{\lambda}) = \lambda_m$ if $m(k) = m$. The LP that characterizes the capacity region of the network makes sure that the total service capacity allocated to each node in the network is at least as large as the nominal traffic rate to that node, and that the communication rate of each server is at least as large as the rate of tasks that the server is not capable of serving.
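As an illustration of the two requirements just stated, the following minimal sketch checks whether a given pair of allocation vectors supports a rate vector, i.e. whether (2.3), (2.4), (2.6) and "service capacity at least the nominal rate" all hold. It only verifies a candidate allocation and does not solve the LP. The function names, the 0-based indexing and the small two-server instance in the usage note are assumptions made for illustration and are not taken from the thesis.

```python
from typing import List, Sequence, Set


def job_of_task(K_per_job: Sequence[int]) -> List[int]:
    """m(k) for every task k (0-based), with tasks indexed job by job."""
    return [m for m, K_m in enumerate(K_per_job) for _ in range(K_m)]


def last_nodes(K_per_job: Sequence[int]) -> Set[int]:
    """The set H of last nodes of the chains (0-based task indices)."""
    ends, total = set(), 0
    for K_m in K_per_job:
        total += K_m
        ends.add(total - 1)
    return ends


def supports(lam, K_per_job, mu, b, c, p, q, tol: float = 1e-9) -> bool:
    """Check whether allocations p[k][j], q[k][j] support arrival rates lam[m].

    mu[k][j] is the service rate of task k on server j, b[j] the bandwidth
    of server j, and c[k] the output size of task k.
    """
    m_of, H = job_of_task(K_per_job), last_nodes(K_per_job)
    K, J = len(m_of), len(b)
    if any(p[k][j] < -tol or q[k][j] < -tol for k in range(K) for j in range(J)):
        return False
    for j in range(J):
        # (2.3): capacity fractions of each server sum to at most 1.
        if sum(p[k][j] for k in range(K)) > 1 + tol:
            return False
        # (2.4): bandwidth fractions of each server sum to at most 1.
        if sum(q[k][j] for k in range(K) if k not in H) > 1 + tol:
            return False
    for k in range(K):
        # Total service allocated to task k must cover its nominal rate.
        if sum(mu[k][j] * p[k][j] for j in range(J)) < lam[m_of[k]] - tol:
            return False
    for k in range(K):
        if k in H:
            continue
        for j in range(J):
            # (2.6): server j must ship out what it cannot serve itself.
            surplus = mu[k][j] * p[k][j] - mu[k + 1][j] * p[k + 1][j]
            if b[j] * q[k][j] / c[k] < surplus - tol:
                return False
    return True
```

For example, with one job type of two tasks, task 0 served only by server 0 and task 1 only by server 1 (`mu = [[2.0, 0.0], [0.0, 2.0]]`), the allocation `p = [[0.5, 0.0], [0.0, 0.5]]`, `q = [[1.0, 0.0], [0.0, 0.0]]` supports `lam = [1.0]` when `b = [1.0, 1.0]`, but not when the bandwidth of server 0 is halved, since (2.6) is then violated.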
The LP, known as the static planning problem (SPP) [37], is defined as follows:

Static Planning Problem (SPP):
\[ \text{Maximize } \delta \tag{2.7} \]
subject to
\[ \nu_k(\vec{\lambda}) \le \sum_{j=1}^{J} \mu_{(k,j)} p_{(k,j)} - \delta, \quad \forall k \tag{2.8} \]
\[ \frac{b_j q_{(k,j)}}{c_k} - \delta \ge \mu_{(k,j)} p_{(k,j)} - \mu_{(k+1,j)} p_{(k+1,j)}, \quad \forall j,\ \forall k \in [K] \setminus \mathcal{H} \tag{2.9} \]
\[ 1 \ge \sum_{k=1}^{K} p_{(k,j)}, \quad \forall j \tag{2.10} \]
\[ 1 \ge \sum_{k \in [K] \setminus \mathcal{H}} q_{(k,j)}, \quad \forall j \tag{2.11} \]
\[ \vec{p} \ge \vec{0}, \quad \vec{q} \ge \vec{0}. \tag{2.12} \]

Based on the SPP above, the capacity region of the network can be characterized by the following proposition.

Proposition 1. The capacity region $\Lambda$ of the network is the set of all rate vectors $\vec{\lambda} \in \mathbb{R}_+^M$ for which the optimal value $\delta^*$ of the static planning problem (SPP) satisfies $\delta^* \ge 0$. In other words, the capacity region $\Lambda$ of the network is characterized as follows:
\[ \Lambda \triangleq \Big\{ \vec{\lambda} \in \mathbb{R}_+^M : \exists\, \vec{p} \ge \vec{0},\ \vec{q} \ge \vec{0} \text{ s.t. } \sum_{k=1}^{K} p_{(k,j)} \le 1\ \forall j,\ \sum_{k \in [K] \setminus \mathcal{H}} q_{(k,j)} \le 1\ \forall j,\ \nu_k(\vec{\lambda}) \le \sum_{j=1}^{J} \mu_{(k,j)} p_{(k,j)}\ \forall k, \text{ and } \frac{b_j q_{(k,j)}}{c_k} \ge \mu_{(k,j)} p_{(k,j)} - \mu_{(k+1,j)} p_{(k+1,j)}\ \forall j,\ \forall k \in [K] \setminus \mathcal{H} \Big\}. \]

Proof. We show that $\delta^* \ge 0$ is a necessary and sufficient condition for the rate stability of the network. Consider the network in the fluid limit (see [27] for more details on the stability of fluid models). At time $t$, we denote the fluid level of type-$k$ tasks in the network by $X_k(t)$, the fluid level of type-$k$ tasks served by server $j$ by $X_{(k,j)}(t)$, and the fluid level of type-$k$ tasks sent by server $j$ by $X_{(k,j),c}(t)$. The dynamics of the fluid are as follows:
\[ X_k(t) = X_k(0) + \lambda_{m(k)} t - D_k(t), \tag{2.13} \]
where $\lambda_{m(k)} t$ is the total number of jobs of type $m(k)$ that have arrived to the network until time $t$, and $D_k(t)$ is the total number of type-$k$ tasks that have been processed up to time $t$ in the fluid limit. For all $k \in [K] \setminus \mathcal{H}$, because of flow conservation, we have
\[ X_{(k,j)}(t) - X_{(k,j),c}(t) \le X_{(k+1,j)}(t). \tag{2.14} \]
Suppose $\delta^* < 0$. We show that the network is weakly unstable, i.e., if $X_k(0) = 0$ for all $k$, there exist $t_0$ and $k$ such that $X_k(t_0) > 0$. For the sake of contradiction, suppose that there exists a scheduling policy under which $X_k(t) = 0$ for all $t \ge 0$ and all $k$.
Now, we pick a regular point $t_1$, which means that $X_k(t)$ is differentiable at $t_1$ for all $k$. Then, for all $k$, $\dot{X}_k(t_1) = 0$, which implies that $\dot{D}_k(t_1) = \lambda_{m(k)} = \nu_k(\vec{\lambda})$. At a regular point $t_1$, $\dot{D}_k(t_1)$ is exactly the total service capacity allocated to type-$k$ tasks at time $t_1$. This implies that there exists $p_{(k,j)}$ at time $t_1$ such that $\nu_k(\vec{\lambda}) = \sum_{j=1}^{J} \mu_{(k,j)} p_{(k,j)}$ for all $k$. Furthermore, from (2.14), we have
\[ \mu_{(k,j)} p_{(k,j)} t_1 - \frac{b_j q_{(k,j)}}{c_k} t_1 \le \mu_{(k+1,j)} p_{(k+1,j)} t_1, \tag{2.15} \]
which implies that there exists $q_{(k,j)}$ at time $t_1$ such that
\[ \frac{b_j q_{(k,j)}}{c_k} \ge \mu_{(k,j)} p_{(k,j)} - \mu_{(k+1,j)} p_{(k+1,j)}, \quad \forall k \in [K] \setminus \mathcal{H}. \tag{2.16} \]
However, this contradicts $\delta^* < 0$.

Now suppose that $\delta^* \ge 0$, and that $\vec{p}^*$ and $\vec{q}^*$ are the capacity allocation vector and the bandwidth allocation vector, respectively, that solve SPP. Consider a generalized head-of-the-line processor sharing policy in which server $j$ works on type-$k$ tasks with capacity $p^*_{(k,j)}$ and communicates the processed data of type-$k$ tasks with bandwidth $b_j q^*_{(k,j)}$. Then the cumulative service allocated to type-$k$ tasks up to time $t$ is $\sum_{j=1}^{J} \mu_{(k,j)} p^*_{(k,j)} t \ge (\nu_k(\vec{\lambda}) + \delta^*) t$. Thus, we have $\dot{X}_k(t) = \nu_k(\vec{\lambda}) - \sum_{j=1}^{J} \mu_{(k,j)} p^*_{(k,j)} \le 0$ for all $t > 0$ and all $k$. If $X_k(0) = 0$ for all $k$, then $X_k(t) = 0$ for all $t \ge 0$ and all $k$, which implies that the network is weakly stable, i.e., the network is rate stable [27].

Figure 2.3: The comparison of capacity regions between the previous model without communication constraints [80, 79] and our proposed model with communication constraints. (Axes: $\lambda_1$ and $\lambda_2$; curves: "Without Communication Constraints" and "With Communication Constraints".)

Example. We consider two types of jobs arriving to a network, with $K_1 = 2$, $K_2 = 3$ and $c_k = 1$ for all $k$. There are 2 servers in the network, where the service rates are $\mu_{(1,1)} = 4$, $\mu_{(1,2)} = 3$, $\mu_{(2,1)} = 2$, $\mu_{(2,2)} = 4$, $\mu_{(3,1)} = 2.5$, $\mu_{(3,2)} = 3.5$, $\mu_{(4,1)} = 0.5$, $\mu_{(4,2)} = 4.5$, $\mu_{(5,1)} = 3.5$, $\mu_{(5,2)} = 1$, and the average bandwidths are $b_1 = 1.5$, $b_2 = 1$. As shown in Fig.
2.3, the capacity region of the previous model without communication constraints [80, 79] is larger than the capacity region of our proposed model with communication constraints.

2.4 Queueing Network Model

In this section, in order to find a throughput-optimal scheduling policy, we first design a virtual queueing network that encodes the state of the network. Then, we introduce an optimization problem, called the queueing network planning problem, that characterizes the capacity region of this virtual queueing network.

Figure 2.4: $k$ is a root of one chain ($k \in \mathcal{C}$).
Figure 2.5: $k$ is not a root of one chain ($k \notin \mathcal{C}$).
Figure 2.6: $k \in \mathcal{H}$.

2.4.1 Queueing Network

Based on the computation model and network model described in Section 2.2, we now illustrate how we model a queueing network. The queueing network consists of two kinds of queues, processing queues and communication queues, which are modeled in the following manner:

1. Processing Queue: We maintain one virtual queue called $(k,j)$ for type-$k$ tasks which are processed at server $j$.
2. Communication Queue: For $k \notin \mathcal{H}$, we maintain one virtual queue called $(k,j),c$ for processed type-$k$ tasks to be sent to other servers by server $j$.

Therefore, there are $(2K - M)J$ virtual queues in the queueing network: $KJ$ processing queues and $(K - M)J$ communication queues. The queueing network is illustrated in Fig. 2.4, Fig. 2.5 and Fig. 2.6.

Now, we describe the dynamics of the virtual queues in the network. Consider one type of job, which consists of serial tasks. As shown in Fig. 2.4, a root task $k$ of the job is sent to processing queue $(k,j)$ if task $k$ is scheduled on server $j$ when a new job arrives to the network. For any node $k$ in this chain, the result in processing queue $(k,j)$ is sent to processing queue $(k+1,j)$ if task $k+1$ is scheduled on server $j$. Otherwise, the result is sent to communication queue $(k,j),c$. If task $k+1$ in queue $(k,j),c$ is scheduled on server $l$, it is sent to queue $(k+1,l)$, where $l \in [J] \setminus \{j\}$. As shown in Fig.
2.4, if $k$ is a root of one chain, the traffic to processing queue $(k,j)$ is only the traffic of type-$m(k)$ jobs arriving to the network. Otherwise, as shown in Fig. 2.5, the traffic to processing queue $(k,j)$ comes from processing queue $(k-1,j)$ and communication queues $(k-1,l),c$, $\forall l \in [J] \setminus \{j\}$. Furthermore, the traffic to communication queue $(k,j),c$ comes only from processing queue $(k,j)$, where $k \in [K] \setminus \mathcal{H}$.

Let $Q_{(k,j)}$ denote the length of processing queue $(k,j)$ and $Q_{(k,j),c}$ denote the length of communication queue $(k,j),c$. A task of type $k$ can be processed by server $j$ if and only if $Q_{(k,j)} > 0$, and a processed task of type $k$ can be sent by server $j$ to other servers if and only if $Q_{(k,j),c} > 0$. Let $d^n_{(k,j)} \in \{0,1\}$ be the number of tasks of type $k$ processed by server $j$ at time $n$, $a^n_{m(k)} \in \{0,1\}$ be the number of jobs of type $m(k)$ that arrive to the network at time $n$, and $d^n_{(k,j),c} \in \{0,1\}$ be the number of processed type-$k$ tasks sent to other servers by server $j$ at time $n$. We denote by $u^n_{m \to j} \in \{0,1\}$ the decision variable indicating that the root task of the type-$m$ job is scheduled on server $j$ at time $n$, i.e.,
\[ u^n_{m \to j} = \begin{cases} 1 & \text{root task of job } m \text{ is scheduled on server } j \text{ at time } n \\ 0 & \text{otherwise.} \end{cases} \]
We denote by $w^n_{k,j \to l} \in \{0,1\}$ the decision variable indicating that a processed type-$k$ task in $(k,j),c$ is sent to $(k+1,l)$ at time $n$, i.e.,
\[ w^n_{k,j \to l} = \begin{cases} 1 & \text{processed type-}k \text{ task in } (k,j),c \text{ is sent to } (k+1,l) \text{ at time } n \\ 0 & \text{otherwise.} \end{cases} \]
Moreover, we denote by $s^n_{k,j \to j} \in \{0,1\}$ the decision variable indicating that a processed type-$k$ task in queue $(k,j)$ is sent to queue $(k+1,j)$ at time $n$, i.e.,
\[ s^n_{k,j \to j} = \begin{cases} 1 & \text{processed type-}k \text{ task in } (k,j) \text{ is sent to } (k+1,j) \text{ at time } n \\ 0 & \text{otherwise.} \end{cases} \]
We define $\vec{u}^n$, $\vec{w}^n$ and $\vec{s}^n$ to be the decision vectors for $u^n_{m \to j}$, $w^n_{k,j \to l}$ and $s^n_{k,j \to j}$, respectively, at time $n$.

Now, we state the dynamics of the queueing network.
If $k \in \mathcal{C}$, then
\[ Q^{n+1}_{(k,j)} = Q^n_{(k,j)} + a^n_{m(k)} u^n_{m(k) \to j} - d^n_{(k,j)}; \tag{2.17} \]
else,
\[ Q^{n+1}_{(k,j)} = Q^n_{(k,j)} + \sum_{l \in [J] \setminus \{j\}} d^n_{(k-1,l),c}\, w^n_{k-1,l \to j} + d^n_{(k-1,j)}\, s^n_{k-1,j \to j} - d^n_{(k,j)}. \tag{2.18} \]
For all $k \in [K] \setminus \mathcal{H}$,
\[ Q^{n+1}_{(k,j),c} = Q^n_{(k,j),c} + d^n_{(k,j)} (1 - s^n_{k,j \to j}) - d^n_{(k,j),c}. \tag{2.19} \]

2.4.2 Queueing Network Planning Problem

Before we introduce an optimization problem that characterizes the capacity region of the described queueing network, we first define the following terms.

Consider an arbitrary scheduling policy. Let $u_{m \to j}$ be the long-run fraction of root tasks of the type-$m$ job that are scheduled on server $j$. We define $\vec{u}$ to be the root-node allocation vector. A root-node allocation vector $\vec{u}$ is feasible if
\[ \sum_{j=1}^{J} u_{m \to j} = 1, \quad \forall m \in [M]. \tag{2.20} \]
For the type-$k$ tasks served by server $j$, we denote by $s_{k,j \to j}$ the long-run fraction of their child tasks (type-$(k+1)$) that are scheduled on server $j$. We denote by $\vec{s}$ the corresponding allocation vector.

For the outputs of processed type-$k$ tasks in virtual queue $(k,j),c$, we denote by $w_{k,j \to l}$ the long-run fraction that are sent to virtual queue $(k+1,l)$. An allocation vector $\vec{w}$ is feasible if
\[ \sum_{l \in [J] \setminus \{j\}} w_{k,j \to l} = 1, \quad \forall j \in [J],\ \forall k \in [K] \setminus \mathcal{H}. \tag{2.21} \]
For the type-$k$ tasks which are processed by server $j$, we define $f_{k,j \to l}$ as the long-run fraction of their child tasks that are scheduled on server $l$. Given allocation vectors $\vec{s}$ and $\vec{w}$, we can write $f_{k,j \to l}$ as follows:
\[ f_{k,j \to l} = \begin{cases} s_{k,j \to j}, & \text{if } l = j \\ (1 - s_{k,j \to j})\, w_{k,j \to l}, & \text{otherwise.} \end{cases} \tag{2.22} \]
Clearly, we have
\[ \sum_{l=1}^{J} f_{k,j \to l} = 1, \quad \forall j \in [J],\ \forall k \in [K] \setminus \mathcal{H}. \tag{2.23} \]
Let $r_{(k,j)}$ denote the nominal rate to the virtual queue $(k,j)$ and $r_{(k,j),c}$ denote the nominal rate to the virtual queue $(k,j),c$. If $k \in \mathcal{C}$, the nominal rate $r_{(k,j)}$ can be written as
\[ r_{(k,j)} = \lambda_{m(k)} u_{m(k) \to j}. \tag{2.24} \]
If $k \notin \mathcal{C}$, the rate $r_{(k,j)}$ can be obtained by summing $r_{(k-1,l)} f_{k-1,l \to j}$ over all servers, i.e.,
\[ r_{(k,j)} = \sum_{l=1}^{J} r_{(k-1,l)} f_{k-1,l \to j}, \tag{2.25} \]
because of flow conservation.
Moreover, the rate $r_{(k,j),c}$ can be written as
\[ r_{(k,j),c} = r_{(k,j)} (1 - s_{k,j \to j}). \tag{2.26} \]

Now, we introduce an optimization problem, called the queueing network planning problem (QNPP), that characterizes the capacity region of the virtual queueing network. Given the arrival rate vector $\vec{\lambda}$, the queueing network planning problem ensures that the service rate allocated to each queue in the queueing network is at least as large as the nominal traffic rate to that queue. The problem is defined as follows:

Queueing Network Planning Problem (QNPP):
\[ \text{Maximize } \delta \tag{2.27} \]
subject to
\[ r_{(k,j)} \le \mu_{(k,j)} p_{(k,j)} - \delta, \quad \forall j,\ \forall k, \tag{2.28} \]
\[ r_{(k,j),c} \le \frac{b_j q_{(k,j)}}{c_k} - \delta, \quad \forall j,\ \forall k \in [K] \setminus \mathcal{H}, \tag{2.29} \]
and subject to the allocation vectors being feasible, where $r_{(k,j)}$ and $r_{(k,j),c}$ are the nominal rates defined in (2.24)-(2.26). Note that the allocation vectors $\vec{p}$, $\vec{q}$, $\vec{u}$, $\vec{w}$ are feasible if (2.3), (2.4), (2.20) and (2.21) are satisfied. Based on the QNPP above, the capacity region of the virtual queueing network can be characterized by the following proposition.

Proposition 2. The capacity region $\Lambda'$ of the virtual queueing network is the set of all rate vectors $\vec{\lambda} \in \mathbb{R}_+^M$ for which the optimal value $\delta^*$ of the queueing network planning problem (QNPP) satisfies $\delta^* \ge 0$. In other words, the capacity region $\Lambda'$ is characterized as follows:
\[ \Lambda' \triangleq \Big\{ \vec{\lambda} \in \mathbb{R}_+^M : \exists\, \vec{p} \ge \vec{0},\ \vec{q} \ge \vec{0},\ \vec{u} \ge \vec{0},\ \vec{s} \ge \vec{0},\ \vec{w} \ge \vec{0} \text{ s.t. } \sum_{k=1}^{K} p_{(k,j)} \le 1\ \forall j,\ r_{(k,j),c} \le \frac{b_j q_{(k,j)}}{c_k}\ \forall j,\ \forall k \in [K] \setminus \mathcal{H},\ \sum_{k \in [K] \setminus \mathcal{H}} q_{(k,j)} \le 1\ \forall j,\ r_{(k,j)} \le \mu_{(k,j)} p_{(k,j)}\ \forall j,\ \forall k,\ \sum_{j=1}^{J} u_{m \to j} = 1\ \forall m,\ \sum_{l \in [J] \setminus \{j\}} w_{k,j \to l} = 1\ \forall j,\ \forall k \in [K] \setminus \mathcal{H} \Big\}. \]

Proof. We consider the virtual queueing network in the fluid limit. Define the amount of fluid in the virtual queue corresponding to type-$k$ tasks processed by server $j$ as $X_{(k,j)}(t)$. Similarly, define the amount of fluid in the virtual queue corresponding to processed type-$k$ tasks sent by server $j$ as $X_{(k,j),c}(t)$.
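If $k \in \mathcal{C}$, the dynamics of the fluid are as follows:
\[ X_{(k,j)}(t) = X_{(k,j)}(0) + \lambda_{m(k)} u_{m(k) \to j}\, t - \mu_{(k,j)} p_{(k,j)}\, t; \tag{2.30} \]
else,
\[ X_{(k,j)}(t) = X_{(k,j)}(0) + \Big( \sum_{l \in [J] \setminus \{j\}} \frac{b_l q_{(k-1,l)}}{c_{k-1}} w_{k-1,l \to j} + \mu_{(k-1,j)} p_{(k-1,j)} s_{k-1,j \to j} - \mu_{(k,j)} p_{(k,j)} \Big) t. \tag{2.31} \]
For $k \in [K] \setminus \mathcal{H}$, we have
\[ X_{(k,j),c}(t) = X_{(k,j),c}(0) + (1 - s_{k,j \to j}) \mu_{(k,j)} p_{(k,j)}\, t - \frac{b_j q_{(k,j)}}{c_k} t. \tag{2.32} \]
Suppose that $\delta^* < 0$. We show that the virtual queueing network is weakly unstable. For the sake of contradiction, suppose that there exists a scheduling policy under which, for all $t \ge 0$, we have
\[ X_{(k,j)}(t) = 0, \quad \forall j,\ \forall k, \tag{2.33} \]
\[ X_{(k,j),c}(t) = 0, \quad \forall j,\ \forall k \in [K] \setminus \mathcal{H}. \tag{2.34} \]
Now, we pick a regular point $t_1$. Then, for all $k$ and all $j$, $\dot{X}_{(k,j)}(t_1) = 0$, which implies that there exist allocation vectors $\vec{p}$, $\vec{q}$, $\vec{u}$, $\vec{s}$, $\vec{w}$ such that
\[ \lambda_{m(k)} u_{m(k) \to j} - \mu_{(k,j)} p_{(k,j)} = 0, \quad \forall j,\ \forall k \in \mathcal{C}, \tag{2.35} \]
\[ \sum_{l \in [J] \setminus \{j\}} \frac{b_l q_{(k-1,l)}}{c_{k-1}} w_{k-1,l \to j} + \mu_{(k-1,j)} p_{(k-1,j)} s_{k-1,j \to j} - \mu_{(k,j)} p_{(k,j)} = 0, \quad \forall j,\ \forall k \in [K] \setminus \mathcal{C}. \tag{2.36} \]
Similarly, we have
\[ (1 - s_{k,j \to j}) \mu_{(k,j)} p_{(k,j)} - \frac{b_j q_{(k,j)}}{c_k} = 0, \quad \forall j,\ \forall k \in [K] \setminus \mathcal{H}. \]
Now, we show by induction that $r_{(k,j)} = \mu_{(k,j)} p_{(k,j)}$ for all $j$ and all $k$. First, consider an arbitrary $k \in \mathcal{C}$, which is the root node of job $m(k)$. Then
\[ r_{(k,j)} = \lambda_{m(k)} u_{m(k) \to j} = \mu_{(k,j)} p_{(k,j)}, \quad \forall j. \tag{2.37} \]
Then, we have
\[ r_{(k+1,j)} = \sum_{l=1}^{J} r_{(k,l)} f_{k,l \to j} = \sum_{l=1}^{J} \mu_{(k,l)} p_{(k,l)} f_{k,l \to j} \tag{2.38} \]
\[ = \sum_{l \in [J] \setminus \{j\}} \mu_{(k,l)} p_{(k,l)} (1 - s_{k,l \to l}) w_{k,l \to j} + \mu_{(k,j)} p_{(k,j)} s_{k,j \to j} \tag{2.39} \]
\[ = \sum_{l \in [J] \setminus \{j\}} \frac{b_l q_{(k,l)}}{c_k} w_{k,l \to j} + \mu_{(k,j)} p_{(k,j)} s_{k,j \to j} \tag{2.40} \]
\[ = \mu_{(k+1,j)} p_{(k+1,j)}. \tag{2.41} \]
By induction, we have $r_{(k,j)} = \mu_{(k,j)} p_{(k,j)}$ for all $j$ and all $k \in \mathcal{I}_m$. Thus, we also have
\[ r_{(k,j),c} = r_{(k,j)} (1 - s_{k,j \to j}) \tag{2.42} \]
\[ = \mu_{(k,j)} p_{(k,j)} (1 - s_{k,j \to j}) \tag{2.43} \]
\[ = \frac{b_j q_{(k,j)}}{c_k}, \quad \forall j,\ \forall k \in [K] \setminus \mathcal{H}, \tag{2.44} \]
which contradicts $\delta^* < 0$.

Now, we suppose that $\delta^* \ge 0$.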
It follows that there exist vectors ~ p ,~ q ,~ u ,~ s , ~ w , (k;j) and (k;j);c such that (k;j) p (k;j) =r (k;j) + (k;j) ;8 j;8 k; (2.45) b j q (k;j) c k =r (k;j);c + (k;j);c ;8 j;8 k2 [K]nH; (2.46) (k;j) = y[k P m(k)1 m 0 =1 K m 0 1] y[K m(k) ] ;8 j;8 k; (2.47) (k;j);c = (k;j) + 1 y[K m(k) ] ;8 j;8 k2 [K]nH: (2.48) where the sequence y[n] satisfies the recurrence relationship: y[1] = 1; y[n] =J(y[n 1] + 1);8 n> 1: (2.49) Therefore, if k2C, then _ X (k;j) (t) = m(k) u m(k)!j (k;j) p (k;j) (2.50) =r (k;j) (k;j) p (k;j) (2.51) = (k;j) 0;8 j;8 t> 0; (2.52) else, _ X (k;j) (t) = X l2[J]nfjg b l q (k1;l) c k1 w k1;l!j + (k1;j) p (k1;j) s k1;j!j (k;j) p (k;j) (2.53) = X l2[J]nfjg r (k1;l);c w k1;l!j +r (k1;j) s k1;j!j r (k;j) 25 + X l2[J]nfjg (k1;l);c w k1;l!j + (k1;j) s k1;j!j (k;j) : (2.54) Note that (2.53) to (2.54) follows from (2.45) and (2.46). In (2.54), we have X l2[J]nfjg r (k1;l) (1s k1;l!l )w k1;l!j +r (k1;j) f k1;j!j r (k;j) (2.55) = X l2[J]nfjg r (k1;l) f k1;l!j +r (k1;j) f k1;j!j r (k;j) (2.56) = X 1lJ r (k1;l) f k1;l!j r (k;j) = 0; (2.57) and X l2[J]nfjg (k1;l);c w k1;l!j + (k1;j) s k1;j!j (k;j) (2.58) X l2[J]nfjg (k1;l);c + (k1;j) (k;j) (2.59) = y[K m(k) ] : (2.60) Note that (2.55) to (2.56) follows from (2.22); and (2.56) to (2.57) follows from (2.25); (2.58) to (2.59) is because w k1;l!j 1 and s k1;j!j 1. Thus, (2.53) can be written as _ X (k;j) (t) y[K m(k) ] 0;8 j;8 t> 0: (2.61) For k2 [K]nH, we have _ X (k;j);c (t) = (1s k;j!j ) kj p (k;j) b j q (k;j) c k (2.62) = (1s k;j!j )(r (k;j) + (k;j) )r (k;j);c (k;j);c (2.63) = (1s k;j!j ) (k;j) (k;j);c (2.64) (k;j) (k;j);c (2.65) 26 = y[K m(k) ] 0: (2.66) If X (k;j) (0) = 0 for all k and all j, then X (k;j) (t) = 0 for all t 0, all j and all k. Also, if X (k;j);c (0) = 0 for all k2 [K]nH and all j, then X (k;j);c (t) = 0 for all t 0, all j and all k. Thus, the virtual queueing network is weakly stable, i.e., the queueing network process is rate stable. 
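The nominal rates used throughout this section can be computed mechanically from the routing fractions. The following minimal sketch, under assumed 0-based indexing and an assumed nested-list layout for $\vec{u}$, $\vec{s}$ and $\vec{w}$ (neither is taken from the thesis), evaluates (2.22) and (2.24)-(2.26) chain by chain.

```python
from typing import List, Optional, Sequence, Tuple


def nominal_rates(
    lam: Sequence[float],
    K_per_job: Sequence[int],
    u: Sequence[Sequence[float]],
    s: Sequence[Sequence[float]],
    w: Sequence[Sequence[Sequence[float]]],
) -> Tuple[List[List[float]], List[Optional[List[float]]]]:
    """Return (r, r_c) with r[k][j] and r_c[k][j] as in (2.24)-(2.26).

    lam[m]     : arrival rate of job type m
    u[m][j]    : fraction of type-m root tasks routed to server j
    s[k][j]    : fraction of type-k outputs at server j kept on server j
    w[k][j][l] : fraction of outputs in queue (k,j),c sent to server l != j
    r_c[k] is None for last nodes, which have no communication queue.
    """
    J = len(u[0])
    r: List[List[float]] = []
    r_c: List[Optional[List[float]]] = []
    k = 0
    for m, K_m in enumerate(K_per_job):
        for pos in range(K_m):
            if pos == 0:
                # (2.24): root queues are fed by external arrivals only.
                row = [lam[m] * u[m][j] for j in range(J)]
            else:
                prev = r[k - 1]
                row = []
                for j in range(J):
                    total = 0.0
                    for l in range(J):
                        # (2.22): fraction of task k-1 at server l whose child runs on j.
                        if l == j:
                            f = s[k - 1][l]
                        else:
                            f = (1 - s[k - 1][l]) * w[k - 1][l][j]
                        total += prev[l] * f
                    row.append(total)  # (2.25): flow conservation
            r.append(row)
            if pos == K_m - 1:
                r_c.append(None)
            else:
                # (2.26): the part of the output that leaves server j.
                r_c.append([row[j] * (1 - s[k][j]) for j in range(J)])
            k += 1
    return r, r_c
```

As a sanity check, whenever $\vec{u}$ and $\vec{w}$ are feasible, the rates `r[k]` of every task sum over servers to the arrival rate of its job type, which is the flow conservation property behind (2.23) and (2.25).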
2.5 Throughput-Optimal Policy

In this section, we propose a Max-Weight scheduling policy for the network of virtual queues in Section 2.4 and show that it is throughput-optimal for the network of the original scheduling problem.

The proposed virtual queueing network is quite different from traditional queueing networks, since it captures the communication procedures (routing of tasks determined by scheduling policies) in the network. Therefore, it is not clear that the capacity region characterized by QNPP is equivalent to the capacity region characterized by SPP. To prove the throughput-optimality of the Max-Weight policy for the original scheduling problem, we first need to show the equivalence of the capacity regions characterized by SPP and QNPP. Then, under the Max-Weight policy, we consider the queueing network in the fluid limit and, using a Lyapunov argument, show that the fluid model of the virtual queueing network is weakly stable for all arrival vectors in the capacity region, and stable for all arrival vectors in the interior of the capacity region.

Now, we give a description of the Max-Weight policy for the proposed virtual queueing network. Given the virtual queue-lengths $Q^n_{(k,j)}$, $Q^n_{(k,j),c}$ and the history $\mathcal{F}^n$ at time $n$, the Max-Weight policy allocates the vectors $\vec{p}$, $\vec{q}$, $\vec{u}$, $\vec{s}$ and $\vec{w}$ given by
\[ \arg\max_{\vec{p},\vec{q},\vec{u},\vec{s},\vec{w}} \; -(\vec{Q}^n)^T \mathbb{E}[\Delta \vec{Q}^n \mid \mathcal{F}^n] - (\vec{Q}^n_c)^T \mathbb{E}[\Delta \vec{Q}^n_c \mid \mathcal{F}^n] = \arg\min_{\vec{p},\vec{q},\vec{u},\vec{s},\vec{w}} \; (\vec{Q}^n)^T \mathbb{E}[\Delta \vec{Q}^n \mid \mathcal{F}^n] + (\vec{Q}^n_c)^T \mathbb{E}[\Delta \vec{Q}^n_c \mid \mathcal{F}^n], \]
where $\vec{Q}^n$ and $\vec{Q}^n_c$ are the vectors of queue-lengths $Q^n_{(k,j)}$ and $Q^n_{(k,j),c}$ at time $n$, and $\Delta \vec{Q}^n$, $\Delta \vec{Q}^n_c$ denote their one-step changes. The Max-Weight policy is the choice of $\vec{p}$, $\vec{q}$, $\vec{u}$, $\vec{s}$ and $\vec{w}$ that minimizes the drift of the Lyapunov function $V^n = \sum_{k,j} (Q^n_{(k,j)})^2 + \sum_{k,j} (Q^n_{(k,j),c})^2$.

The following theorem shows the throughput-optimality of the Max-Weight policy.

Theorem 1. Max-Weight policy is throughput-optimal for the network, i.e.
Max-Weight policy is rate stable for all the arrival vectors in the capacity region $\Lambda$ defined in Proposition 1, and it makes the underlying Markov process positive recurrent for all the arrival rate vectors in the interior of $\Lambda$.

Proof. In order to prove Theorem 1, we first state Lemma 1, which is proved in Appendix A.

Lemma 1. The capacity region $\Lambda$ characterized by the static planning problem (SPP) is equivalent to the capacity region $\Lambda'$ characterized by the queueing network planning problem (QNPP), i.e. $\Lambda = \Lambda'$.

Having Lemma 1, we now show that, under the Max-Weight policy, the queueing network is rate stable for all arrival vectors in $\Lambda'$ and strongly stable for all arrival vectors in the interior of $\Lambda'$.

We consider the problem in the fluid limit. Define the amount of fluid in the virtual queue corresponding to type-$k$ tasks processed by server $j$ as $X_{(k,j)}(t)$. Similarly, define the amount of fluid in the virtual queue corresponding to processed type-$k$ tasks sent by server $j$ as $X_{(k,j),c}(t)$.

Footnote 2: We define $\mathcal{F}^n$ to be the $\sigma$-algebra generated by all the random variables in the system up to time $n$.

Now, we define $\delta^*$ as the optimal value of QNPP. If we consider a rate vector $\vec{\lambda}$ in the interior of $\Lambda'$, then $\delta^* > 0$. Directly from (2.52), (2.61) and (2.66), for $t > 0$, we have
\[ \dot{X}^*_{(k,j)}(t) < 0, \quad \forall j,\ \forall k, \tag{2.67} \]
\[ \dot{X}^*_{(k,j),c}(t) < 0, \quad \forall j,\ \forall k \in [K] \setminus \mathcal{H}, \tag{2.68} \]
where the star indicates the fluid drift under the allocation vectors constructed in the proof of Proposition 2. Now, we take $V(t) = \frac{1}{2} \vec{X}^T(t) \vec{X}(t) + \frac{1}{2} \vec{X}_c^T(t) \vec{X}_c(t)$ as the Lyapunov function, where $\vec{X}(t)$ and $\vec{X}_c(t)$ are the vectors of $X_{(k,j)}(t)$ and $X_{(k,j),c}(t)$, respectively. The drift of $V$ under the Max-Weight policy is
\[ \dot{V}_{\text{Max-Weight}}(t) = \min_{\vec{p},\vec{q},\vec{u},\vec{s},\vec{w}} \; \vec{X}^T(t) \dot{\vec{X}}(t) + \vec{X}_c^T(t) \dot{\vec{X}}_c(t) \tag{2.69} \]
\[ \le \vec{X}^T(t) \dot{\vec{X}}^*(t) + \vec{X}_c^T(t) \dot{\vec{X}}^*_c(t) \tag{2.70} \]
\[ < 0, \quad \forall t > 0, \tag{2.71} \]
using (2.67) and (2.68). Thus, for $t > 0$, we have shown that $\dot{V}_{\text{Max-Weight}}(t) < 0$ if $\vec{\lambda}$ is in the interior of $\Lambda'$. This proves that the fluid model is stable under the Max-Weight policy, which implies the positive recurrence of the underlying Markov chain [27].
Consider a vector $\vec{\lambda} \in \Lambda'$. There exist allocation vectors $\vec{p}^*$, $\vec{q}^*$, $\vec{u}^*$, $\vec{s}^*$ and $\vec{w}^*$ such that
\[ \mu_{(k,j)} p^*_{(k,j)} = r_{(k,j)}, \quad \forall j,\ \forall k, \tag{2.72} \]
\[ \frac{b_j q^*_{(k,j)}}{c_k} = r_{(k,j),c}, \quad \forall j,\ \forall k \in [K] \setminus \mathcal{H}. \tag{2.73} \]
From (2.50) and (2.53), we have
\[ \dot{X}_{(k,j)}(t) = 0, \quad \forall j,\ \forall k,\ \forall t > 0. \tag{2.74} \]
From (2.62), we have
\[ \dot{X}_{(k,j),c}(t) = 0, \quad \forall j,\ \forall k \in [K] \setminus \mathcal{H},\ \forall t > 0. \tag{2.75} \]
From (2.69), for $t > 0$, we have $\dot{V}_{\text{Max-Weight}}(t) \le 0$ if $\vec{\lambda} \in \Lambda'$. If $\vec{X}(0) = \vec{0}$ and $\vec{X}_c(0) = \vec{0}$, we have $\vec{X}(t) = \vec{0}$ and $\vec{X}_c(t) = \vec{0}$ for all $t \ge 0$, which shows that the fluid model is weakly stable under the Max-Weight policy, i.e., the queueing network process is rate stable. Therefore, the Max-Weight policy is throughput-optimal for the queueing network, which completes the proof.

Remark 1. In the proof of Theorem 1, the most significant part is to prove Lemma 1, which shows that the capacity region of the original scheduling problem (characterized by an LP) is equivalent to the capacity region of the proposed virtual queueing network (characterized by a more complicated optimization problem). In [80, 79], without communication constraints, the capacity regions of the original problem (denoted by $\tilde{\Lambda}$) and of the virtual queueing network (denoted by $\tilde{\Lambda}'$) are both characterized by LPs. Given a $\vec{\lambda} \in \tilde{\Lambda}$ with a corresponding allocation vector, one can construct a feasible allocation vector for the virtual queueing network supporting $\vec{\lambda}$ by splitting the traffic equally from a queue into the following branching queues. In Lemma 1, to prove $\Lambda \subseteq \Lambda'$, we construct feasible vectors for the virtual queueing network supporting $\vec{\lambda}$ by splitting the traffic from a queue unequally among the following branching queues through a careful design, which is fundamentally different from [80, 79].

2.6 Complexity of throughput-optimal policy

In the previous section, we showed the throughput-optimality of the scheduling policy, but it is not clear how complex it is to implement the policy.
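In this section, we describe how the Max-Weight policy is implemented and show that the Max-Weight policy has complexity which is almost linear in the number of virtual queues.

First, we denote by $p^n_{(k,j)} \in \{0,1\}$ the decision variable indicating that server $j$ processes task $k$ in $(k,j)$ at time $n$, i.e.,
\[ p^n_{(k,j)} = \begin{cases} 1 & \text{server } j \text{ processes task } k \text{ in } (k,j) \text{ at time } n \\ 0 & \text{otherwise.} \end{cases} \]
We denote by $q^n_{(k,j)} \in \{0,1\}$ the decision variable indicating that server $j$ sends the data of processed task $k$ in $(k,j),c$ at time $n$, i.e.,
\[ q^n_{(k,j)} = \begin{cases} 1 & \text{server } j \text{ sends the data of processed task } k \text{ in } (k,j),c \text{ at time } n \\ 0 & \text{otherwise.} \end{cases} \]
Then, we define $\vec{p}^n$ and $\vec{q}^n$ to be the decision vectors for $p^n_{(k,j)}$ and $q^n_{(k,j)}$, respectively, at time $n$.

Given the virtual queue-lengths $\vec{Q}^n$, $\vec{Q}^n_c$ and the history $\mathcal{F}^n$ at time $n$, the Max-Weight policy minimizes
\[ (\vec{Q}^n)^T \mathbb{E}[\Delta \vec{Q}^n \mid \mathcal{F}^n] + (\vec{Q}^n_c)^T \mathbb{E}[\Delta \vec{Q}^n_c \mid \mathcal{F}^n] \tag{2.76} \]
over the vectors $\vec{p}^n$, $\vec{q}^n$, $\vec{u}^n$, $\vec{s}^n$ and $\vec{w}^n$. That is, the Max-Weight policy minimizes
\[ \sum_{j=1}^{J} \sum_{k \in \mathcal{C}} Q^n_{(k,j)} \big( \lambda_{m(k)} u^n_{m(k) \to j} - \mu_{(k,j)} p^n_{(k,j)} \big) \]
\[ + \sum_{j=1}^{J} \sum_{k \notin \mathcal{C}} Q^n_{(k,j)} \Big( \sum_{l \in [J] \setminus \{j\}} \frac{b_l q^n_{(k-1,l)}}{c_{k-1}} w^n_{k-1,l \to j} + \mu_{(k-1,j)} p^n_{(k-1,j)} s^n_{k-1,j \to j} - \mu_{(k,j)} p^n_{(k,j)} \Big) \]
\[ + \sum_{j=1}^{J} \sum_{k \in [K] \setminus \mathcal{H}} Q^n_{(k,j),c} \Big\{ (1 - s^n_{k,j \to j}) \mu_{(k,j)} p^n_{(k,j)} - \frac{b_j q^n_{(k,j)}}{c_k} \Big\}, \]
which can be rearranged to
\[ \sum_{j=1}^{J} \sum_{k \in [K] \setminus \mathcal{H}} q^n_{(k,j)} \frac{b_j}{c_k} \Big( \sum_{l \in [J] \setminus \{j\}} w^n_{k,j \to l} Q^n_{(k+1,l)} - Q^n_{(k,j),c} \Big) \]
\[ + \sum_{j=1}^{J} \sum_{k \in [K] \setminus \mathcal{H}} p^n_{(k,j)} \mu_{(k,j)} \big\{ s^n_{k,j \to j} (Q^n_{(k+1,j)} - Q^n_{(k,j),c}) + Q^n_{(k,j),c} - Q^n_{(k,j)} \big\} \]
\[ - \sum_{j=1}^{J} \sum_{k \in \mathcal{H}} p^n_{(k,j)} \mu_{(k,j)} Q^n_{(k,j)} + \sum_{k \in \mathcal{C}} \lambda_{m(k)} \sum_{j=1}^{J} Q^n_{(k,j)} u^n_{m(k) \to j} \]
\[ = \sum_{j=1}^{J} \sum_{k=1}^{K} p^n_{(k,j)} F^n_{(k,j)} + \sum_{j=1}^{J} \sum_{k \in [K] \setminus \mathcal{H}} q^n_{(k,j)} G^n_{(k,j)} + \sum_{k \in \mathcal{C}} \lambda_{m(k)} \sum_{j=1}^{J} Q^n_{(k,j)} u^n_{m(k) \to j}. \tag{2.77} \]
Note that the function $F^n_{(k,j)}$ is defined as follows:
\[ F^n_{(k,j)} = \mu_{(k,j)} \big\{ s^n_{k,j \to j} (Q^n_{(k+1,j)} - Q^n_{(k,j),c}) + Q^n_{(k,j),c} - Q^n_{(k,j)} \big\}, \quad \forall j,\ \forall k \in [K] \setminus \mathcal{H}; \tag{2.78} \]
else
\[ F^n_{(k,j)} = -\mu_{(k,j)} Q^n_{(k,j)}, \quad \forall j,\ \forall k \in \mathcal{H}. \tag{2.79} \]
The function $G^n_{(k,j)}$ is defined as follows:
\[ G^n_{(k,j)} = \frac{b_j}{c_k} \Big( \sum_{l \in [J] \setminus \{j\}} w^n_{k,j \to l} Q^n_{(k+1,l)} - Q^n_{(k,j),c} \Big), \quad \forall k \in [K] \setminus \mathcal{H}. \tag{2.80} \]

For (2.77), we first minimize
\[ \sum_{k \in \mathcal{C}} \lambda_{m(k)} \sum_{j=1}^{J} Q^n_{(k,j)} u^n_{m(k) \to j} \tag{2.81} \]
over $\vec{u}^n$. We denote $j^n_u(k) = \arg\min_{j \in [J]} Q^n_{(k,j)}$, $\forall k \in \mathcal{C}$. It is clear that the minimizer $\vec{u}^n$ is
\[ u^n_{m(k) \to j} = \begin{cases} 1 & \text{if } j = j^n_u(k) \\ 0 & \text{otherwise.} \end{cases} \tag{2.82} \]
When more than one queue-length attains the minimum, we use a random tie-breaking rule.

Secondly, we minimize
\[ \sum_{j=1}^{J} \sum_{k=1}^{K} p^n_{(k,j)} F^n_{(k,j)} \tag{2.83} \]
over $\vec{p}^n$ and $\vec{s}^n$. For each $j \in [J]$ and $k \in [K] \setminus \mathcal{H}$, the minimizer $s^n_{k,j \to j}$ of the function $F^n_{(k,j)}$ is
\[ s^n_{k,j \to j} = \begin{cases} 1 & \text{if } Q^n_{(k+1,j)} - Q^n_{(k,j),c} \le 0 \\ 0 & \text{if } Q^n_{(k+1,j)} - Q^n_{(k,j),c} > 0. \end{cases} \tag{2.84} \]
We denote by $F^{n*}_{(k,j)}$ the minimum of $F^n_{(k,j)}$ for each $j \in [J]$ and $k \in [K]$. Then, we define $k^n_F(j)$ as follows:
\[ k^n_F(j) = \arg\min_{k \in [K]} F^{n*}_{(k,j)}, \quad \forall j \in [J]. \tag{2.85} \]
To minimize (2.83), we have
\[ p^n_{(k,j)} = \begin{cases} 1 & \text{if } k = k^n_F(j) \\ 0 & \text{otherwise} \end{cases} \tag{2.86} \]
for each $j \in [J]$.

Lastly, we minimize
\[ \sum_{j=1}^{J} \sum_{k \in [K] \setminus \mathcal{H}} q^n_{(k,j)} G^n_{(k,j)} \tag{2.87} \]
over $\vec{q}^n$ and $\vec{w}^n$. We denote $j^n_w(k) = \arg\min_{l \in [J] \setminus \{j\}} Q^n_{(k+1,l)}$, $\forall k \in [K] \setminus \mathcal{H}$. For each $j \in [J]$ and $k \in [K] \setminus \mathcal{H}$, the minimizer of $G^n_{(k,j)}$ is
\[ w^n_{k,j \to l} = \begin{cases} 1 & \text{if } l = j^n_w(k) \\ 0 & \text{otherwise.} \end{cases} \tag{2.88} \]
We denote by $G^{n*}_{(k,j)}$ the minimum of $G^n_{(k,j)}$ for each $j \in [J]$ and $k \in [K] \setminus \mathcal{H}$. Then, we define $k^n_G(j)$ as follows:
\[ k^n_G(j) = \arg\min_{k \in [K] \setminus \mathcal{H}} G^{n*}_{(k,j)}, \quad \forall j \in [J]. \tag{2.89} \]
To minimize (2.87), we have
\[ q^n_{(k,j)} = \begin{cases} 1 & \text{if } k = k^n_G(j) \\ 0 & \text{otherwise} \end{cases} \tag{2.90} \]
for each $j \in [J]$.

Based on the optimization above, we describe how the Max-Weight policy is implemented. We consider the virtual queueing network at each time $n$. The procedure of minimizing (2.81) shows that when a new job arrives to the network, the root task $k$ is sent to virtual queue $(k,j)$ if the queue length of $(k,j)$ is the shortest.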
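The three minimizations above translate almost directly into code. The following is a minimal sketch of one decision step under assumed conventions: 0-based task and server indices, nested lists `Q[k][j]` and `Qc[k][j]` for the queue lengths, at least two servers, and ties broken by lowest index instead of the random tie-breaking rule used in the text (chosen here only to keep the example deterministic). The function and key names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def max_weight_step(Q, Qc, mu, b, c, K_per_job: Sequence[int]) -> Dict[str, object]:
    """One Max-Weight decision following (2.82), (2.84)-(2.86), (2.88)-(2.90)."""
    K, J = len(Q), len(b)
    roots, lasts, start = set(), set(), 0
    for K_m in K_per_job:
        roots.add(start)              # set C of root nodes
        lasts.add(start + K_m - 1)    # set H of last nodes
        start += K_m
    inner = [k for k in range(K) if k not in lasts]

    # (2.82): a new root task joins the shortest processing queue.
    root_server = {k: min(range(J), key=lambda j: Q[k][j]) for k in sorted(roots)}

    # (2.84): keep the output local iff Q_(k+1,j) - Q_(k,j),c <= 0.
    keep_local: Dict[Tuple[int, int], bool] = {
        (k, j): Q[k + 1][j] - Qc[k][j] <= 0 for k in inner for j in range(J)
    }

    def F_star(k: int, j: int) -> float:
        """Minimum over s of F in (2.78), or (2.79) for last nodes."""
        if k in lasts:
            return -mu[k][j] * Q[k][j]
        s = 1 if keep_local[(k, j)] else 0
        return mu[k][j] * (s * (Q[k + 1][j] - Qc[k][j]) + Qc[k][j] - Q[k][j])

    # (2.85)-(2.86): each server processes the task type with the smallest F*.
    process: List[int] = [min(range(K), key=lambda k: F_star(k, j)) for j in range(J)]

    # (2.88): outputs in (k,j),c go to the other server with the shortest queue.
    dest: Dict[Tuple[int, int], int] = {
        (k, j): min((l for l in range(J) if l != j), key=lambda l: Q[k + 1][l])
        for k in inner for j in range(J)
    }

    def G_star(k: int, j: int) -> float:
        """Minimum over w of G in (2.80)."""
        return b[j] / c[k] * (Q[k + 1][dest[(k, j)]] - Qc[k][j])

    # (2.89)-(2.90): each server sends from the queue with the smallest G*.
    send: List[int] = [min(inner, key=lambda k: G_star(k, j)) for j in range(J)]

    return {"root_server": root_server, "keep_local": keep_local,
            "process": process, "dest": dest, "send": send}
```

For a single two-task chain on two servers with `Q = [[3, 1], [0, 5]]` and `Qc = [[2, 0], [0, 0]]`, unit rates and unit bandwidths, the sketch routes a new root task to server 1, lets server 0 process task 0 and keep its output locally, and lets server 1 process task 1.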
The procedures of minimizing (2.83) show that server $j$ processes task $k$ in virtual queue $(k,j)$ if $p^{n*}_{(k,j)} = 1$; the output of processed task $k$ is then sent to virtual queue $(k+1,j)$ if $s^{n*}_{k,j \to j} = 1$ or to virtual queue $(k,j),c$ if $s^{n*}_{k,j \to j} = 0$. The procedures of minimizing (2.87) show that server $j$ sends the output of processed task $k$ in virtual queue $(k,j),c$ to virtual queue $(k+1, j^{n*}_w(k))$ if $q^{n*}_{(k,j)} = 1$ and $w^{n*}_{k,j \to j^{n*}_w(k)} = 1$.

By the analysis above, we know that the complexity of the Max-Weight policy is dominated by the procedure of sorting some linear combinations of queue lengths. The complexity of sorting $N$ values is $N \log N$. To minimize (2.77), there are $M$ procedures of sorting $J$ values for $Q^n_{(k,j)}$ with $k \in \mathcal{C}$, $K - M$ procedures of sorting $J - 1$ values for $Q^n_{(k+1,l)}$, $J$ procedures of sorting $K - M$ values for $G^{n*}_{(k,j)}$, and $J$ procedures of sorting $K$ values for $F^{n*}_{(k,j)}$. Thus, the complexity of the Max-Weight policy is bounded above by $2KJ \log K + KJ \log J$, which is almost linear in the number of virtual queues. Note that the number of virtual queues in the network proposed in Section 2.4 is $(2K - M)J$.

Figure 2.7: Overview of DAG scheduling for dispersed computing.

2.7 Towards more general Computing Model

In this section, we extend our framework to a more general computing model, where jobs are modeled as directed acyclic graphs (DAGs), which capture more complicated dependencies among tasks. As shown in Fig. 2.7, nodes of the DAG represent tasks of the job and edges represent the logic dependencies between tasks. Different from the model of chains, tasks in the model of DAGs might have more than one parent node, e.g., task 4 in Fig. 2.7 has two parent nodes, task 2 and task 3.

One major challenge in communication-aware DAG scheduling is that the data of the processed parent tasks have to be sent to the same server for processing the child task.
This logic dependency difficulty for communications does not appear in the model of chains because tasks of a chain can be processed one by one without the need for merging the processed tasks. Due to the logic dependency difficulty incurred in the model of DAGs, designing a virtual queueing network which encodes the state of the network is more difficult.

Motivated by the broadcast channel model in many areas (e.g., wireless networks, the message passing interface (MPI) and shared buses in computer networks), we simplify the network to a broadcast network (footnote 3) in which the servers always broadcast the output of processed tasks to other servers. Inspired by [80], we propose a novel virtual queueing network to solve the logic dependency difficulty for communications, i.e., to guarantee that the results of preceding tasks will be sent to the same server. Lastly, we propose the Max-Weight policy and show that it is throughput-optimal for the network.

2.7.1 DAG Computing Model

For the DAG scheduling problem, we define the following terms. As shown in Fig. 2.7, each job is modeled as a DAG. Let $(\mathcal{V}_m, \mathcal{E}_m, \{c_k\}_{k \in \mathcal{V}_m})$ be the DAG corresponding to the job of type $m$, $m \in [M]$, where $\mathcal{V}_m$ denotes the set of nodes of type-$m$ jobs, $\mathcal{E}_m$ represents the set of edges of the DAG, and $c_k$ denotes the data size (bits) of the output of a type-$k$ task. Similar to the case of jobs modeled as chains, let the number of tasks of a type-$m$ job be $K_m$, i.e., $|\mathcal{V}_m| = K_m$, and the total number of task types in the network be $K$. We index the task types in the network by $k$, $k \in [K]$, starting from job type 1 to $M$.

We call task $k'$ a parent of task $k$ if they belong to the same DAG and $(k', k) \in \mathcal{E}_m$. Let $\mathcal{P}_k$ denote the set of parents of $k$. In order to process $k$, the processing of all the parents of $k$, $k' \in \mathcal{P}_k$, should be completed, and the results of the processed tasks should all be available in one server.
We call task $k'$ a descendant of task $k$ if they belong to the same DAG, and there is a directed path from $k$ to $k'$ in that DAG. In the rest of this section, we consider the network of the dispersed computing platform to be the broadcast network where each server in the network always broadcasts the result of a processed task to other servers after that task is processed.

Footnote 3: Note that communication-aware DAG scheduling in a general network without broadcast constraints would become intractable since all the tasks have to be tracked in the network for the purpose of merging, which can highly increase the complexity of scheduling policies.

Figure 2.8: Queueing Network for the simple DAG in Fig. 2.7.

Figure 2.9: Queueing Network with Additional Precedence Constraints for DAG in Fig. 2.7.

2.7.2 Queueing Network Model for DAG Scheduling

In this subsection, we propose a virtual queueing network that guarantees the output of processed tasks to be sent to the same server. Consider $M$ DAGs, $(\mathcal{V}_m, \mathcal{E}_m, \{c_k\}_{k \in \mathcal{V}_m})$, $m \in [M]$. We construct $M$ parallel networks of virtual queues by forming two kinds of virtual queues as follows:

1. Processing Queue: For each non-empty subset $\mathcal{S}_m$ of $\mathcal{V}_m$, $\mathcal{S}_m$ is a stage of job $m$ if and only if for all $k \in \mathcal{S}_m$, all the descendants of $k$ are also in $\mathcal{S}_m$. For each stage of job $m$, we maintain one virtual queue. Also, a task $k$ in a stage is processable if there are no parents of task $k$ in that stage.

2. Communication Queue: For each server $j$ and each processable task $k$ in stage $\mathcal{S}_m$, we maintain one virtual queue which is indexed by $\mathcal{S}_m$ and $(k,j)$.

Example. Consider a job specified by the DAG shown in Fig. 2.7 and a network consisting of $J = 2$ servers. We maintain the processing queues for each of the 5 possible stages of the job, which are $\{1,2,3,4\}$, $\{2,3,4\}$, $\{2,4\}$, $\{3,4\}$ and $\{4\}$. Since task 1 in stage $\{1,2,3,4\}$ is processable, we maintain communication queues $\{1,2,3,4\}_{(1,1)}$ and $\{1,2,3,4\}_{(1,2)}$ for server 1 and server 2 respectively.
Similarly, we maintain communication queues $\{2,3,4\}_{(2,1)}$, $\{2,3,4\}_{(2,2)}$, $\{2,3,4\}_{(3,1)}$ and $\{2,3,4\}_{(3,2)}$ for stage $\{2,3,4\}$; communication queues $\{3,4\}_{(3,1)}$, $\{3,4\}_{(3,2)}$ for stage $\{3,4\}$; and communication queues $\{2,4\}_{(2,1)}$, $\{2,4\}_{(2,2)}$ for stage $\{2,4\}$. The corresponding network of virtual queues is shown in Fig. 2.8.

Now we describe the dynamics of the virtual queues in the network. When a new job $m$ comes to the network, the job is sent to the processing queue corresponding to stage $\mathcal{S}_m = \mathcal{V}_m$ of job $m$. When server $j$ works on task $k$ in the processing queue corresponding to subset $\mathcal{S}_m$, the result of the process is sent to the communication queue indexed by $\mathcal{S}_m$ and $(k,j)$ with rate $\mu_{(k,j)}$. When server $j$ broadcasts the result of processed task $k$ in the communication queue indexed by $\mathcal{S}_m$ and $(k,j)$, the result of the process is sent to the processing queue corresponding to subset $\mathcal{S}_m \setminus \{k\}$ with rate $b_j / c_k$.

We call the action of processing task $k$ in the processing queue corresponding to $\mathcal{S}_m$ a processing activity. Also, we call the action of broadcasting the output of processed task $k$ in a communication queue a communication activity. We denote the collections of different processing activities and different communication activities in the network as $\mathcal{A}$ and $\mathcal{A}_c$ respectively. Let $A = |\mathcal{A}|$ and $A_c = |\mathcal{A}_c|$. Define the collection of processing activities that server $j$ can perform as $\mathcal{A}_j$, and the collection of communication activities that server $j$ can perform as $\mathcal{A}_{c,j}$.

Remark 2. In general, the number of virtual queues corresponding to different stages of a job can grow exponentially with $K$ since each stage denotes a feasible subset of tasks. It can result in the increase of complexity of scheduling policies that try to maximize the throughput of the network.
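The stage definition of Subsection 2.7.2 and the growth noted in Remark 2 can be checked with a short enumeration. The Python sketch below is a brute-force illustration, not a method from this chapter: the function names `descendants`, `stages` and `processable` are hypothetical, and the edge list used in the usage note (1 to 2, 1 to 3, 2 to 4, 3 to 4) is an assumption about Fig. 2.7 inferred from the stages and processable tasks listed in the example.

```python
from itertools import combinations


def descendants(nodes, edges):
    """Map each task to the set of tasks reachable from it in the DAG."""
    children = {v: set() for v in nodes}
    for parent, child in edges:
        children[parent].add(child)
    desc = {}

    def visit(v):
        if v not in desc:
            desc[v] = set()
            for child in children[v]:
                desc[v] |= {child} | visit(child)
        return desc[v]

    for v in nodes:
        visit(v)
    return desc


def stages(nodes, edges):
    """All non-empty subsets that contain every descendant of each member."""
    desc = descendants(nodes, edges)
    found = []
    for size in range(1, len(nodes) + 1):
        for subset in combinations(sorted(nodes), size):
            stage = frozenset(subset)
            if all(desc[k] <= stage for k in stage):
                found.append(stage)
    return found


def processable(stage, edges):
    """Tasks in the stage that have no parent inside the stage."""
    return {k for k in stage
            if all(not (child == k and parent in stage)
                   for parent, child in edges)}
```

With the assumed edges `[(1, 2), (1, 3), (2, 4), (3, 4)]`, `stages` returns the five stages of the example, and `processable` applied to {2, 3, 4} returns tasks 2 and 3. A job of four tasks with no edges has 15 stages, while a chain of four tasks has only 4, which illustrates how strongly the stage count depends on the DAG structure.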
In terms of the number of virtual queues, it is important to find a queueing network with low complexity while resolving the problem of synchronization (see [80] for more details) and guaranteeing that the outputs of processed tasks are sent to the same server.

Remark 3. To decrease the complexity of the queueing network, a queueing network with lower complexity can be formed by enforcing some additional constraints such that the DAG representing the job becomes a chain. As an example, if we add the constraint that task 3 should precede task 2 in Fig. 2.7, then the job becomes a chain of 4 nodes with the queueing network represented in Fig. 2.9. The resulting network of virtual queues for stages of the jobs has $K$ queues, which largely decreases the complexity of scheduling policies for the DAG scheduling problem.

2.7.3 Capacity Region

Let $K'$ be the number of virtual queues for the network. For simplicity, we index the virtual queues by $k'$, $k' \in [K']$. We define a drift matrix $D \in \mathbb{R}^{K' \times (A + A_c)}$ with entries $d_{(k',a)}$, where $d_{(k',a)}$ is the rate at which virtual queue $k'$ changes if activity $a$ is performed. Define a length-$K'$ arrival vector $\vec{e}(\vec{\lambda})$ such that $e_{k'}(\vec{\lambda}) = \lambda_m$ if virtual queue $k'$ corresponds to the first stage of jobs (of type $m$) in which no tasks are yet processed, and $e_{k'}(\vec{\lambda}) = 0$ otherwise. Let $\vec{z} \in \mathbb{R}^{(A + A_c)}$ be the allocation vector of the activities $a \in \mathcal{A} \cup \mathcal{A}_c$. Similar to the capacity region introduced in the previous sections, we can introduce an optimization problem for the network that characterizes the capacity region. The optimization problem, called the broadcast planning problem (BPP), is defined as follows:

Broadcast Planning Problem (BPP):
\begin{align}
\text{Minimize} \quad & \rho \tag{2.91} \\
\text{subject to} \quad & \vec{e}(\vec{\lambda}) + D\vec{z} \le \vec{0} \tag{2.92} \\
& \rho \ge \sum_{a \in \mathcal{A}_j} z_a, \quad \forall j \in [J], \tag{2.93} \\
& \rho \ge \sum_{a \in \mathcal{A}_{c,j}} z_a, \quad \forall j \in [J], \tag{2.94} \\
& \vec{z} \ge \vec{0}. \tag{2.95}
\end{align}
Based on the BPP above, the capacity region of the network can be characterized by the following proposition.

Proposition 3.
The capacity region $\Lambda''$ of the virtual queueing network characterizes the set of all rate vectors $\vec{\lambda} \in \mathbb{R}^M_+$ for which the corresponding optimal solution $\rho^*$ to the broadcast planning problem (BPP) satisfies $\rho^* \le 1$. In other words, the capacity region $\Lambda''$ is characterized as follows:
\[
\Lambda'' \triangleq \Big\{ \vec{\lambda} \in \mathbb{R}^M_+ : \exists\, \vec{z} \ge \vec{0} \text{ such that } 1 \ge \sum_{a \in \mathcal{A}_j} z_a, \ \forall j; \ 1 \ge \sum_{a \in \mathcal{A}_{c,j}} z_a, \ \forall j; \text{ and } \vec{e}(\vec{\lambda}) + D\vec{z} \le \vec{0} \Big\}.
\]
The proof of Proposition 3 is similar to that of Proposition 2. Due to the space limit, we only provide a proof sketch. Consider the virtual queueing network in the fluid limit. Suppose $\rho^* > 1$. We assume that there exists a scheduling policy under which the virtual queueing network is weakly stable. Then, one can obtain a solution such that $\rho \le 1$, which contradicts $\rho^* > 1$. On the other hand, suppose $\rho^* \le 1$. Similar to (2.45) and (2.46), we can find a feasible allocation vector $\vec{z}^*$ such that the derivative of each queue's fluid level is not greater than 0, which implies the network is weakly stable.

Remark 4. Additional precedence constraints do not result in loss of throughput. Given a $\vec{\lambda} \in \Lambda''$ with corresponding allocation vector $\vec{z}$, one can construct a feasible allocation vector $\vec{z}'$ for the virtual queueing network based on additional precedence constraints: for each task $k$ and each server $j$, we choose the allocation of the processing activity in which task $k$ is processed by server $j$ to be the sum of all allocations in $\vec{z}$ which correspond to task $k$ and server $j$. For communication activities, a similar argument applies. However, this serialization technique could increase the job latency (delay) as it de-parallelizes computation tasks. Also, the gap could be huge if the original DAG of the job has a large number of tasks stemming from a common parent node.

2.7.4 Throughput-Optimal Policy

Now, we propose the Max-Weight policy for the queueing network and show that it is throughput-optimal.
Given virtual queue-lengths $Q^n_{k'}$ at time $n$, the Max-Weight policy allocates a vector $\vec{z}$ that is
\begin{align}
& \arg\max_{\vec{z} \text{ is feasible}} \ -(\vec{Q}^n)^T E[\Delta \vec{Q}^n \mid \mathcal{F}^n] \tag{2.96} \\
={} & \arg\max_{\vec{z} \text{ is feasible}} \ -(\vec{Q}^n)^T D\vec{z} \tag{2.97}
\end{align}
where $\vec{Q}^n$ is the vector of queue-lengths $Q^n_{k'}$ at time $n$. Next, we state the following theorem for the throughput-optimality of the Max-Weight policy.

Theorem 2. The Max-Weight policy is throughput-optimal for the queueing network proposed in Subsection 2.7.2.

Proof. We consider the problem in the fluid limit. Define the amount of fluid in virtual queue $k'$ as $X_{k'}(t)$. The dynamics of the fluid are as follows:
\[
\vec{X}(t) = \vec{X}(0) + \vec{e}(\vec{\lambda})\, t + D\vec{T}(t), \tag{2.98}
\]
where $\vec{X}(t)$ is the vector of queue-lengths $X_{k'}(t)$, $T_a(t)$ is the total time up to $t$ that activity $a$ is performed, and $\vec{T}(t)$ is the vector of total service times of the different activities $T_a(t)$. By the Max-Weight policy, we have
\[
\dot{\vec{T}}_{\text{Max-Weight}}(t) = \arg\min_{\vec{z} \text{ is feasible}} \vec{X}^T(t) D\vec{z}. \tag{2.99}
\]
Now, we take $V(t) = \frac{1}{2} \vec{X}^T \vec{X}$ as the Lyapunov function. The drift of $V(t)$ under the Max-Weight policy is
\begin{align*}
\dot{V}_{\text{Max-Weight}}(t) &= \vec{X}^T(t) \big( \vec{e}(\vec{\lambda}) + D \dot{\vec{T}}_{\text{Max-Weight}}(t) \big) \\
&= \vec{X}^T(t)\, \vec{e}(\vec{\lambda}) + \min_{\vec{z} \text{ is feasible}} \vec{X}^T(t) D\vec{z} \\
&\le \vec{X}^T(t) \big( \vec{e}(\vec{\lambda}) + D\vec{z}^* \big)
\end{align*}
where $\vec{z}^*$ is a feasible allocation vector. If $\vec{\lambda} \in \Lambda''$, then $\dot{V}(t) \le 0$, which follows directly from (2.92). That is, if $\vec{X}(0) = \vec{0}$, then $\vec{X}(t) = \vec{0}$ for all $t \ge 0$, which implies that the fluid model is weakly stable, i.e., the queueing network is rate stable [27]. If $\vec{\lambda}$ is in the interior of the capacity region $\Lambda''$, we have
\begin{align}
\dot{V}_{\text{Max-Weight}}(t) & \tag{2.100} \\
\le \vec{X}^T(t) \big( \vec{e}(\vec{\lambda}) + D\vec{z}^* \big) &< 0, \tag{2.101}
\end{align}
which proves that the fluid model is stable, which in turn implies the positive recurrence of the underlying Markov chain [27].

2.8 Conclusion

In this chapter, we consider the problem of communication-aware dynamic scheduling of serial tasks for dispersed computing, motivated by significant communication costs in dispersed computing networks.
We characterize the capacity region of the network and propose a novel network of virtual queues encoding the state of the network. Then, we propose a Max-Weight type scheduling policy, and show that the policy is throughput-optimal through a Lyapunov argument by considering the virtual queueing network in the fluid limit. Lastly, we extend our work to the communication-aware DAG scheduling problem under a broadcast network where servers always broadcast the output of processed tasks to other servers. We propose a virtual queueing network encoding the state of the network which guarantees that the results of processed parent tasks are sent to the same server for processing the child task, and show that the Max-Weight policy is throughput-optimal for the broadcast network. Some future directions are to characterize the delay properties of the proposed policy, develop robust scheduling policies that are oblivious to detailed system parameters such as service rates, and develop low-complexity and throughput-optimal policies for DAG scheduling.

Beyond these directions, another future research direction is to consider communication-aware task scheduling when coded computing is also allowed. Coded computing is a recent technique that enables optimal tradeoffs between computation load, communication load, and computation latency due to stragglers in distributed computing (see, e.g., [62, 61, 54, 30, 86]). Therefore, designing joint task scheduling and coded computing in order to leverage tradeoffs between computation, communication, and latency could be an interesting problem (e.g., [106]).

Chapter 3
Timely-Throughput Optimal Coded Computing over Cloud Networks

3.1 Introduction

Large-scale distributed computing systems can suffer substantially from unpredictable and unreliable computing infrastructure, which can result in high variability of computing resources, i.e., the speed of the computing resources varies over time.
The speed variation has several causes including hardware failure, co-location of computation tasks, communication bottlenecks, etc. [113, 5]. This variability is further amplified in computing clusters, such as Amazon EC2, due to the utilization of a credit-based computing policy, in which the most commonly used T2 and T3 instances can operate significantly above a baseline level of CPU performance (approximately 10 times faster as shown in Fig. 3.1) by consuming CPU credits that are allocated periodically to the nodes. At the same time, there is a significant increase in utilizing the cloud for event-driven and time-sensitive computations (e.g., IoT applications and cognitive services), in which the users increasingly demand timely services with deadline constraints, i.e., computations of requests have to be finished within specified deadlines.

Our goal in this chapter is to study the problem of computation allocation over cloud networks with particular focus on variability of computing resources and timely computation tasks. From the measurements of nodes' computation speeds over Amazon EC2 clusters, shown in Fig. 3.1, we observe that when a node is slow (fast), it is more likely that it continues to be slow (fast) in the following rounds of computation, which implies temporal correlation of computation speeds. Thus, to capture this phenomenon, we consider a two-state Markov model for variability of computing speed in cloud networks. In this model, each worker can be either in a good state or a bad state in terms of the computation speed, and the transition between these states is modeled as a Markov chain which is unknown to the scheduler. Furthermore, we consider a Coded Computing framework, in which the data is possibly encoded and stored at the worker nodes in order to provide robustness against nodes that may be in a bad state.
The key idea of coded computing is to encode the data and design each worker's computation task such that the fastest responses of any $k$ workers out of a total of $n$ workers suffice to complete the distributed computation, similar to classical coding theory where receiving any $k$ symbols out of $n$ transmitted symbols enables the receiver to decode the sent message.

We consider a dynamic computation model, where a sequence of functions needs to be computed over the (encoded) data that is distributedly stored at the nodes. More precisely, in an online manner, timely computation requests with given deadlines are submitted to the system, i.e., each computation has to be finished within the given deadline. Our goal is then to design the optimal computation-load allocation strategy and the optimal data encoding scheme that maximize the timely computation throughput (i.e., the average number of computation tasks that are accomplished before their deadline) (footnote 1).

Footnote 1: Our metric of timely computation throughput is motivated by the timely throughput metric, introduced in [40], which measures the average number of packets that are delivered by their deadline in a communication network.

Figure 3.1: Empirical measurement of speed variation of a credit-based t2.micro instance in Amazon EC2 in which we keep assigning computation (e.g., a matrix multiplication) to the instance and measure the finish times: A two-state Markov model. (Horizontal axis: Time; vertical axis: Relative Speed.)

One significant challenge in this problem is the joint design of (1) a data encoding scheme to provide robustness against straggling workers; and (2) an adaptive computation load allocation strategy for the workers based on the history of previous computation times. In particular, the state of the computing nodes and the transition probabilities of the Markov model are unknown to the scheduler. We note that to find the optimal computation strategy, one has to solve a complex optimization which in general requires searching over all possible load allocations, even if the transition probabilities of the Markov model are known to the master. Thus, it is not clear how one allocates the computation loads efficiently and what computation strategy is optimal, especially for the network with an unknown Markov model.

As the main contributions of the chapter, we propose a dynamic computation strategy called the Lagrange Estimate-and-Allocate (LEA) strategy, and show that it achieves the optimal timely computation throughput. Utilizing the Lagrange coding scheme for data encoding [112], the LEA strategy estimates the transition probabilities by observing the past events at each time step, and then assigns computation loads based on the estimated probabilities. Moreover, we also show that finding the optimal load assignment using LEA can be done efficiently instead of searching over all possible load allocations, which is computationally infeasible to implement.

To prove the optimality of the LEA strategy, we first provide an upper bound for the timely computation throughput by maximizing the success probability of each round when the transition probabilities are known to the master. For any fixed load assignment, we show that using the Lagrange coding scheme proposed in [112] has the highest success probability of each round. Then, we show that the success probability using LEA converges to the optimal success probability. By the Strong Law of Large Numbers (SLLN), the Ergodic theorem and a coupling argument, we finally prove that the timely computation throughput achieved by the LEA strategy is equal to the optimal timely computation throughput, i.e., LEA is optimal.

In addition to proving the optimality of LEA, we carry out numerical studies and experiments over Amazon EC2 clusters. We compare the proposed LEA strategy with a static load allocation strategy as the benchmark.
In our numerical analysis, compared to the static computation strategy, the LEA strategy improves the timely computation throughput by 1.38× to 17.5×. In experiments over Amazon EC2 clusters, the LEA strategy increases the timely computation throughput by 1.27× to 6.5×.

3.1.1 Related Prior Work

We divide the literature review into two main lines of work: task scheduling over cloud networks, and coded computing in distributed systems.

In the dynamic task scheduling problem, jobs arrive to the network according to a stochastic process, and get scheduled dynamically over time. In many works in the literature, the tasks have dedicated servers for processing, and the goal is to establish stability conditions for the network [10]. Given the stability results, the next natural goal is to compute the expected completion times of jobs or delay distributions. However, few analytical results are available for characterizing the delay performance, except for the simplest models. When the tasks do not have dedicated servers, one aims to find a throughput-optimal scheduling policy (see e.g. [32]), i.e., a policy that stabilizes the network whenever it can be stabilized. For example, Max-Weight scheduling, first proposed in [98, 28], is known to be throughput-optimal for wireless networks, flexible queueing networks [71, 31, 79], data center networks [65] and dispersed computing networks [103]. Moreover, there have been many works which focus on the task scheduling problem with deadline constraints over cloud networks (see e.g. [6, 39]).

Coded computing broadly refers to a family of techniques that utilize coding to inject computation redundancy in order to alleviate the various issues that arise in large-scale distributed computing. In the past few years, coded computing has had tremendous success in various problems, such as straggler mitigation and bandwidth reduction (e.g., [54, 62, 30, 53, 111, 96, 61, 81]).
Coded computing has also been expanded in various directions, such as heterogeneous networks (e.g., [86]), partial stragglers (e.g., [34]), secure and private computing (e.g., [21, 12, 112]) and distributed optimization (e.g., [45]). So far, research in coded computing has focused on developing frameworks for one round of computation instead of considering network dynamics for analyzing long-run performance of distributed computing systems. In this chapter, considering the dynamics of the network, we make substantial progress by combining the ideas of coded computing with dynamic computation load allocation over cloud networks, and developing the Lagrange Estimate-and-Allocate strategy that can adaptively assign computation loads to workers and essentially learn the unknown network dynamics. Furthermore, we consider the metric "timely computation throughput" which denotes the average number of successful completions instead of the metric "timely throughput" which usually denotes the average number of packets delivered successfully in network scenarios (see e.g., [52]).

3.2 System Model

3.2.1 Computation Model

We consider a distributed computing problem, in which computation requests are submitted to a distributed computing system in an online manner, and the computation is carried out in the system. In particular, there is a fixed deadline for each computation round, i.e., each computation has to be finished within the given deadline.

As shown in Fig. 3.2, the considered system is composed of a master node and $n$ worker nodes. There is also a dataset $X$ which is divided into $X_1, X_2, \ldots, X_k$. Specifically, each $X_j$ is an element in a vector space $\mathbb{V}$ over a field $\mathbb{F}$.

Figure 3.2: Overview of dynamic load allocation over a coded computing framework with timely computation requests. In each round $m$, the goal is to compute the evaluations $f(X_1), \ldots, f(X_k)$ by the deadline $d$ using $n$ workers.
In each round $m$ (or time slot in a discrete-time system), a computation request with a function $f_m$ is submitted to the system, where the function $f_m$ is an arbitrary multivariate polynomial with vector coefficients having degree $\deg(f)$. We denote by $d$ the deadline of each computation request, which is smaller than or equal to the duration of each round. In such a distributed computing system, we are interested in computing the evaluations $f_m(X_1), f_m(X_2), \ldots, f_m(X_k)$ in each round $m$ by the deadline $d$.

Prior to the computation, the master first encodes the dataset $X_1, X_2, \ldots, X_k$ to $\tilde{X}_1, \tilde{X}_2, \ldots, \tilde{X}_{nr}$ via a set of $nr$ encoding functions $\vec{g} = (g_1, g_2, \ldots, g_{nr})$, where the encoded data $\tilde{X}_v \triangleq g_v(X_1, \ldots, X_k)$ is determined by the encoding function $g_v : \mathbb{V}^k \to \mathbb{U}$. Each worker $i$ stores $r$ encoded data chunks $\tilde{X}_{(i-1)r+1}, \tilde{X}_{(i-1)r+2}, \ldots, \tilde{X}_{ir}$ locally. In each round $m$, each worker evaluates a certain subset of $f_m(\tilde{X}_{(i-1)r+1}), f_m(\tilde{X}_{(i-1)r+2}), \ldots, f_m(\tilde{X}_{ir})$ which is determined by the master.

Given a function $f_m$ in round $m$, the master assigns the computations to each worker. More specifically, we define $\vec{\ell}_m = (\ell_{m,1}, \ell_{m,2}, \ldots, \ell_{m,n})$ to be the load allocation vector, in which $\ell_{m,i}$ denotes the number of polynomial or function evaluations computed by worker $i$ in round $m$. Each worker $i$ computes $\ell_{m,i}$ evaluations of function $f_m$ over the stored data without specified order, and returns all the results back to the master upon the completion of all assigned computations.

The master node aggregates the results from the worker nodes until it receives a decodable set of computations and recovers $f_m(X_1), f_m(X_2), \ldots, f_m(X_k)$. We say a set of computations is decodable if the evaluations $f_m(X_1), f_m(X_2), \ldots, f_m(X_k)$ can be obtained by computing decoding functions over the received results. In each round, the goal of the master is to receive a decodable set of computations within the given deadline $d$.
Let us illustrate the model through a simple example.

Example. In each round $m$, we consider a problem of evaluating a linear function $f_m(X_j) = X_j \vec{w}_m$ over $n = 3$ workers, where the input dataset $X$ is divided into $X_1$, $X_2$ and $\vec{w}_m$ is the input vector. One possible coding scheme is to encode $X_1$ and $X_2$ to $\tilde{X}_1 = X_1$, $\tilde{X}_2 = X_2$ and $\tilde{X}_3 = X_1 + X_2$. Each worker $i$ stores $r = 1$ encoded data chunk $\tilde{X}_i$. If the load allocation vector $\vec{\ell}_m = (1, 1, 1)$ is used by the master, then each worker $i$ computes $\tilde{X}_i \vec{w}_m$ and sends the result back to the master upon its completion. The set $\{\tilde{X}_1 \vec{w}_m, \tilde{X}_3 \vec{w}_m\}$ is one of the decodable sets since the master can obtain $X_1 \vec{w}_m$ and $X_2 \vec{w}_m$ by computing $X_1 \vec{w}_m = \tilde{X}_1 \vec{w}_m$ and $X_2 \vec{w}_m = \tilde{X}_3 \vec{w}_m - \tilde{X}_1 \vec{w}_m$.

We note that the considered computation model naturally appears in many gradient computing problems. For example, in linear regression problems, we want to compute $f_m(X_j) = X_j^\top (X_j \vec{w}_m - \vec{y})$, which is the gradient of the quadratic loss function $\frac{1}{2} \| X_j \vec{w}_m - \vec{y} \|^2$ with respect to the weight vector $\vec{w}_m$ in round $m$.

3.2.2 Network Model

Motivated by the measurements over Amazon EC2 clusters, shown in Fig. 3.1, we assume that each worker has two different states for computing, a good state and a bad state. We denote $\mu_g$ as the computing speed (evaluations per second) in the good state, and denote $\mu_b$ as the computing speed in the bad state. We assume that the computing speeds $\mu_g$ and $\mu_b$ are known to the master. Note that given a worker's state, its computation time (per evaluation) is deterministic. We denote $\mu_{m,i}$ as the computing speed of worker $i$ in round $m$, and we denote $\vec{\mu}_m = (\mu_{m,1}, \mu_{m,2}, \ldots, \mu_{m,n})$ as the computing speed vector in round $m$.
For each worker $i$, we model the state transitions as a stationary Markov process $S_i[1], S_i[2], \ldots$, with the transition matrix defined as follows:
\[
P_i = \begin{bmatrix} p_{g \to g,i} & 1 - p_{g \to g,i} \\ 1 - p_{b \to b,i} & p_{b \to b,i} \end{bmatrix} \tag{3.1}
\]
where $p_{g \to g,i}$ is the transition probability of worker $i$ going to the good state from the good state, and $p_{b \to b,i}$ is the transition probability of worker $i$ going to the bad state from the bad state. We assume that the Markov processes of different workers are mutually independent. Prior to the computation, we assume the initial state of worker $i$ is given by the stationary distribution of the Markov chain $(S_i[1], S_i[2], \ldots)$. We assume that the transition probabilities and the current state of each worker are unknown to the master before the master assigns the computations to each worker.

3.2.3 Problem Formulation

Given the computation deadline $d$, we denote $N_m(d)$ as an indicator representing whether the computation is finished by deadline $d$, i.e., $N_m(d) = 1$ if the computation is finished by time $d$ in round $m$, and $N_m(d) = 0$ otherwise. We denote $\eta = (\vec{g}, \{\vec{\ell}_m\}_{m=1}^{\infty})$ as the computation strategy. Also, we denote the set of all computation strategies as $\Gamma$.

Definition 7 (Timely Computation Throughput). Given the computation deadline $d$, using computation strategy $\eta$, the timely computation throughput, denoted by $R(d, \eta)$, is defined as follows:
\[
R(d, \eta) = \lim_{M \to \infty} \frac{\sum_{m=1}^{M} N_m(d)}{M}. \tag{3.2}
\]
Based on the above definitions, our problem is now formulated as the following.

Problem. Consider a distributed computing system consisting of computation and network models as defined in Subsections 3.2.1 and 3.2.2.
Our goal is to find an optimal computation strategy achieving the optimal timely computation throughput, denoted by $R^*(d)$, which is defined as follows:
\[
R^*(d) = \sup_{\eta \in \Gamma} R(d, \eta). \tag{3.3}
\]

3.3 Lagrange Estimate-and-Allocate (LEA) Strategy

In this section, we propose a dynamic computation strategy called the Lagrange Estimate-and-Allocate (LEA) strategy, which is composed of the Lagrange coding scheme for data encoding and the Estimate-and-Allocate (EA) algorithm for allocating loads to the workers adaptively by observing the history of computation times. In each round, the EA algorithm first assigns computation loads by maximizing the estimated success probability based on the estimated transition probabilities of the underlying Markov chain (and based on the previous state of the workers). After receiving the results, the EA algorithm updates the estimated transition probabilities by observing the computation times in the past events.

3.3.1 Data Encoding in LEA

For data encoding, we leverage a linear coding scheme called the Lagrange coding scheme [112], which is demonstrated to simultaneously provide resiliency, security, and privacy in distributed computing. We start with an example to illustrate how the Lagrange coding scheme is applied to our problem.

We first consider the scenario where $nr \ge k \deg(f) - 1$. In each round $m$, we consider a problem of evaluating a quadratic function $f_m(X_j) = X_j^\top X_j \vec{w}_m$ ($\deg(f) = 2$) over $n = 3$ workers, where the input dataset $X$ is divided into $X_1, X_2$. Each worker stores $r = 2$ encoded data chunks ($nr = 6 > k \deg(f) - 1 = 3$). We define $u$ as follows:
\[
u(z) \triangleq X_1 \frac{z - 1}{0 - 1} + X_2 \frac{z - 0}{1 - 0} = z (X_2 - X_1) + X_1, \tag{3.4}
\]
in which $u(0) = X_1$ and $u(1) = X_2$. Then, we encode $X_1$ and $X_2$ to $\tilde{X}_i = u(i - 1)$, i.e., $\tilde{X}_1 = X_1$, $\tilde{X}_2 = X_2$, $\tilde{X}_3 = -X_1 + 2X_2$, $\tilde{X}_4 = -2X_1 + 3X_2$, $\tilde{X}_5 = -3X_1 + 4X_2$ and $\tilde{X}_6 = -4X_1 + 5X_2$. Each worker $i$ stores $\tilde{X}_{2i-1}$ and $\tilde{X}_{2i}$ locally.

We now consider the scenario where $nr < k \deg(f) - 1$.
We consider the same problem as in the previous scenario, but there is a larger input dataset $X$ which is divided into $X_1, X_2, X_3$ and $X_4$ ($nr = 6 < k \deg(f) - 1 = 7$). We encode $X_1, X_2, X_3$ and $X_4$ using a repetition coding design such that $\tilde{X}_1 = X_1$, $\tilde{X}_2 = X_2$, $\tilde{X}_3 = X_3$, $\tilde{X}_4 = X_4$, $\tilde{X}_5 = X_1$ and $\tilde{X}_6 = X_2$. Each worker $i$ stores $\tilde{X}_{2i-1}$ and $\tilde{X}_{2i}$ locally.

Formally, we describe the Lagrange coding scheme in the following.

(1) $nr \ge k \deg(f) - 1$: We first select $k$ distinct elements $\beta_1, \beta_2, \ldots, \beta_k$ from $\mathbb{F}$, and let $u$ be the respective Lagrange interpolation polynomial
\[
u(z) \triangleq \sum_{j=1}^{k} X_j \prod_{l \in [k] \setminus \{j\}} \frac{z - \beta_l}{\beta_j - \beta_l}, \tag{3.5}
\]
where $u : \mathbb{F} \to \mathbb{V}$ is a polynomial of degree $k - 1$ such that $u(\beta_j) = X_j$. To encode the input $X_1, X_2, \ldots, X_k$, we select $nr$ distinct elements $\alpha_1, \alpha_2, \ldots, \alpha_{nr}$ from $\mathbb{F}$, and encode $X_1, X_2, \ldots, X_k$ to $\tilde{X}_v = u(\alpha_v)$ for all $v \in [nr]$, i.e.,
\[
\tilde{X}_v = g_v(X) = u(\alpha_v) \triangleq \sum_{j=1}^{k} X_j \prod_{l \in [k] \setminus \{j\}} \frac{\alpha_v - \beta_l}{\beta_j - \beta_l}. \tag{3.6}
\]
Each worker $i$ stores $\tilde{X}_{(i-1)r+1}, \tilde{X}_{(i-1)r+2}, \ldots, \tilde{X}_{ir}$ locally.

(2) $nr < k \deg(f) - 1$: We use a repetition coding design to encode the input $X_1, X_2, \ldots, X_k$. We replicate every $X_i$ either $\lfloor \frac{nr}{k} \rfloor$ or $\lceil \frac{nr}{k} \rceil$ times such that the number of total encoded data chunks is $nr$. Then, we obtain the encoded data $\tilde{X}_1, \tilde{X}_2, \ldots, \tilde{X}_{nr}$. Each worker picks $r$ of the encoded data $\tilde{X}_1, \tilde{X}_2, \ldots, \tilde{X}_{nr}$ to be stored locally.

We note that decoding and encoding in the Lagrange coding scheme rely on polynomial interpolation and evaluation, which can be done efficiently.

3.3.2 Load Allocation in LEA

Before introducing the EA algorithm, we first define the following terms. For each worker $i$, we denote $C_{g \to g,i}(m)$ as the number of times that the event "good state to good state" happened up to round $m$. Similarly, $C_{g \to b,i}(m)$, $C_{b \to g,i}(m)$ and $C_{b \to b,i}(m)$ are for the events "good state to bad state", "bad state to good state" and "bad state to bad state" respectively. For worker $i$, we denote $\hat{p}_{g \to g,i}(m)$ and $\hat{p}_{b \to b,i}(m)$ as the estimated transition probabilities after the first $m - 1$ rounds of computations.
For worker $i$, we denote $\hat{p}_{g,i}(m)$ and $\hat{p}_{b,i}(m)$ as the estimated probabilities of being in the good state and the bad state in round $m$, respectively. Without loss of generality, we assume that $\hat{p}_{g,1}(m) \ge \hat{p}_{g,2}(m) \ge \cdots \ge \hat{p}_{g,n}(m)$. We also define $\ell_b \triangleq \mu_b d$ and $\ell_g \triangleq \min(\mu_g d, r)$.

Now, we formally describe the EA algorithm. In each round $m$, the EA algorithm has the following 4 phases:

(1) Load Assignment Phase: The master maximizes the estimated success probability in round $m$ based on the estimated probabilities $\hat{p}_{g,i}(m)$ and $\hat{p}_{b,i}(m)$. To do so, the master finds $i^*_m$ ($1 \le i^*_m \le n$) maximizing the estimated success probability function defined as follows (see footnote 2):

$$\hat{P}_m(\tilde{i}) = 0 \quad \text{if } K^* > \tilde{i}\,\ell_g + (n - \tilde{i})\,\ell_b; \tag{3.7}$$

otherwise

$$\hat{P}_m(\tilde{i}) = \sum_{l = w(\tilde{i})}^{\tilde{i}} \; \sum_{G : G \subseteq [\tilde{i}],\, |G| = l} \; \prod_{i \in G} \hat{p}_{g,i}(m) \prod_{i \in [\tilde{i}] \setminus G} \hat{p}_{b,i}(m), \tag{3.8}$$

where $w(\tilde{i}) \triangleq \left\lceil \frac{K^* - (n - \tilde{i})\ell_b}{\ell_g} \right\rceil$ and $K^*$ is defined as follows:

$$K^* = \begin{cases} (k - 1)\deg(f) + 1 & \text{if } nr \ge k\deg(f) - 1, \\ nr - \lfloor nr/k \rfloor + 1 & \text{otherwise.} \end{cases} \tag{3.9}$$

Footnote 2: Note that we only consider the case $K^* \ge n\mu_b d = n\ell_b$; otherwise the computation can always be finished in time $d$, which is trivial.

Note that equations (3.7) and (3.8) define the estimated success probability as a function of $\tilde{i}$ (the number of workers assigned to compute $\ell_g$ evaluations). The intuition behind equation (3.7) is that if the total load assigned to all the workers is smaller than the optimal recovery threshold, the probability of success is zero. Based on the estimated probabilities $\hat{p}_g$ and $\hat{p}_b$, equation (3.8) gives the estimated success probability by summing the probabilities of the events in which enough workers are in the good state for the computation to complete before the deadline. Also, $K^*$ defined in (3.9) is the optimal recovery threshold of the Lagrange coding scheme [112], which guarantees that the evaluations can be recovered when the master receives any $K^*$ results from the workers. Thus, $i^*_m = \arg\max_{\tilde{i}} \hat{P}_m(\tilde{i})$.
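The load assignment phase can be sketched as follows. This is an illustrative sketch under stated assumptions, not the thesis code: the function names are hypothetical, workers are assumed to be pre-sorted by estimated good-state probability, and the subset sum in (3.8) is evaluated with a standard dynamic program over "number of good workers so far" (equivalent to enumerating subsets when the bad-state probability is one minus the good-state probability).

```python
import math


def recovery_threshold(k, deg_f, n, r):
    """Optimal recovery threshold K* as given in (3.9)."""
    if n * r >= k * deg_f - 1:
        return (k - 1) * deg_f + 1
    return n * r - (n * r) // k + 1


def prob_at_least(p_good, w):
    """P(at least w workers are good) for independent workers."""
    dist = [1.0]  # dist[c] = P(exactly c good workers among those seen)
    for p in p_good:
        new = [0.0] * (len(dist) + 1)
        for c, q in enumerate(dist):
            new[c] += q * (1.0 - p)   # this worker is in the bad state
            new[c + 1] += q * p       # this worker is in the good state
        dist = new
    return sum(dist[max(w, 0):])


def ea_load_assignment(p_good_sorted, K, l_g, l_b):
    """Linear search over the number of workers that receive load l_g.

    p_good_sorted must be in non-increasing order. Returns the best
    count, its estimated success probability, and the load vector.
    """
    n = len(p_good_sorted)
    best_i, best_p = 1, -1.0
    for i in range(1, n + 1):
        if K > i * l_g + (n - i) * l_b:
            p = 0.0  # total assigned load below K*, as in (3.7)
        else:
            w = math.ceil((K - (n - i) * l_b) / l_g)
            p = prob_at_least(p_good_sorted[:i], w)  # as in (3.8)
        if p > best_p:
            best_i, best_p = i, p
    loads = [l_g] * best_i + [l_b] * (n - best_i)
    return best_i, best_p, loads


# Hypothetical inputs: 15 workers, each estimated good with probability 0.8.
K = recovery_threshold(k=50, deg_f=2, n=15, r=10)
best_i, best_p, loads = ea_load_assignment([0.8] * 15, K, l_g=10, l_b=3)
```

The search evaluates n candidates, and each candidate costs at most a quadratic number of operations in this sketch, so the whole phase stays polynomial in the number of workers.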
Then, the master assigns loads using the load allocation vector $\vec{\ell}_m$ such that

$$\ell_{m,i} = \begin{cases} \ell_g & \text{if } 1 \le i \le i^*_m, \\ \ell_b & \text{otherwise.} \end{cases} \tag{3.10}$$

In the load assignment phase, the main idea is to select workers in decreasing order of the estimated probability of being in the good state, and to assign larger loads accordingly. Note that the load assignment phase is just a linear search, which is computationally efficient.

(2) Local Computation Phase: Within each round $m$ of computation, each worker $i$ receives the function $f_m$ and the load assignment $\ell_{m,i}$ from the master. Then, each worker $i$ computes evaluations of the function $f_m$ over the encoded data $\tilde{X}_{(i-1)r+1}, \tilde{X}_{(i-1)r+2}, \ldots, \tilde{X}_{(i-1)r+\ell_{m,i}}$, i.e., $f_m(\tilde{X}_{(i-1)r+1}), f_m(\tilde{X}_{(i-1)r+2}), \ldots, f_m(\tilde{X}_{(i-1)r+\ell_{m,i}})$. Upon completing the computation, each worker sends all of its results back to the master.

(3) Aggregation and Observation Phase: Having received the fastest $K^*$ computation results from the workers, the master recovers the evaluations $f_m(X_1), f_m(X_2), \ldots, f_m(X_k)$ for the requested function $f_m$. By observing whether the results are sent back or not, the master checks which of the events "good state to good state", "good state to bad state", "bad state to good state" and "bad state to bad state" has happened in round $m$ for each worker $i$. Then, the master obtains $C_{g \to g,i}(m)$, $C_{g \to b,i}(m)$, $C_{b \to g,i}(m)$ and $C_{b \to b,i}(m)$. Note that the time it takes for one worker's result to be completed and sent back to the master actually indicates the (previous) state of that worker, since the speeds are deterministic and the computation time in a good state is less than the computation time in a bad state.

(4) Update Phase: After the aggregation and observation phase, the master updates the estimated transition probabilities $\hat{p}_{g \to g,i}(m+1)$ and $\hat{p}_{b \to b,i}(m+1)$ for round $m + 1$:

$$\hat{p}_{g \to g,i}(m+1) = \frac{C_{g \to g,i}(m)}{C_{g \to g,i}(m) + C_{g \to b,i}(m)} \quad \text{and} \quad \hat{p}_{b \to b,i}(m+1) = \frac{C_{b \to b,i}(m)}{C_{b \to g,i}(m) + C_{b \to b,i}(m)}.$$
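The counters and the update rule of the update phase can be sketched per worker as below. This is a hedged illustration: the class name `WorkerEstimator`, the state labels "g" and "b", and the fallback value used before any transition out of a state has been observed are all assumptions (the text does not specify an initial estimate).

```python
class WorkerEstimator:
    """Transition counters and probability estimates for one worker."""

    def __init__(self, default=0.5):
        # default is a hypothetical fallback used while a denominator is zero
        self.counts = {("g", "g"): 0, ("g", "b"): 0,
                       ("b", "g"): 0, ("b", "b"): 0}
        self.default = default

    def observe(self, prev_state, new_state):
        """Record one observed transition between consecutive rounds."""
        self.counts[(prev_state, new_state)] += 1

    def p_stay(self, state):
        """Estimated probability of staying in the given state."""
        other = "b" if state == "g" else "g"
        stay = self.counts[(state, state)]
        total = stay + self.counts[(state, other)]
        return stay / total if total else self.default

    def p_good_next(self, current_state):
        """Estimated probability of being good in the next round."""
        if current_state == "g":
            return self.p_stay("g")
        return 1.0 - self.p_stay("b")


# Hypothetical observed state sequence for one worker over seven rounds.
est = WorkerEstimator()
states = "ggggbbg"
for prev, new in zip(states, states[1:]):
    est.observe(prev, new)
```

For this sequence the counters are 3 (good to good), 1 (good to bad), 1 (bad to bad) and 1 (bad to good), so the estimated stay probabilities are 0.75 and 0.5.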
The master then updates the estimated probabilities $\hat{p}_{g,i}(m+1)$ and $\hat{p}_{b,i}(m+1)$. If worker $i$ was in the good state in round $m$, then $\hat{p}_{g,i}(m+1) = \hat{p}_{g \to g,i}(m+1)$; otherwise $\hat{p}_{g,i}(m+1) = 1 - \hat{p}_{b \to b,i}(m+1)$. Then, the computation proceeds to round $m + 1$.

3.4 Upper bound on the timely computation throughput

In this section, we give an upper bound on the timely computation throughput. The idea is to consider the case in which the Markov model of the network is known to the master and to characterize the optimal timely computation throughput for this case.

3.4.1 Optimal Success Probability of One Round Computation

First, we consider one round of computation using a load allocation vector $\vec{\ell}$ with a linear coding scheme $\vec{g}$. Without knowing the computing speed vector $\vec{\mu}$, we denote $T^{(\vec{\ell}, \vec{g})}(\vec{\mu})$ as the random variable representing the finish time using $\vec{\ell}$ and $\vec{g}$. We define the success probability as the probability that the computation is finished in time $d$, i.e., $P(T^{(\vec{\ell}, \vec{g})} \le d)$, according to the distribution of $\vec{\mu}$. For a coding scheme, we define the recovery threshold, which is formally stated as follows:

Definition 8 (Recovery Threshold). For an integer $k$, a coding scheme $\vec{g}$ is $k$-recoverable if the master can recover the required function evaluations from any $k$ of the $nr$ local computation results. We define the recovery threshold of a coding scheme $\vec{g}$, denoted by $K(\vec{g})$, as the minimum $k$ such that the coding scheme $\vec{g}$ is $k$-recoverable.

Given a coding scheme $\vec{g}$, the recovery threshold $K(\vec{g})$ is the minimum number of evaluations that must be received in total from the workers. Thus, we aim at finding a coding scheme and a load allocation vector that maximize the success probability by solving the following optimization problem:

$$\text{maximize} \quad P(T^{(\vec{\ell}, \vec{g})} \le d) \tag{3.11}$$
$$\text{subject to} \quad \sum_{i=1}^{n} \ell_i \ge K(\vec{g}), \tag{3.12}$$
$$0 \le \ell_i \le r, \quad \ell_i \in \mathbb{Z}, \quad \forall\, 1 \le i \le n. \tag{3.13}$$

In the following, we show that the Lagrange coding scheme achieves the highest success probability for any fixed load allocation vector.
Before proving the optimality of the Lagrange coding scheme in terms of success probability, we first define the optimal recovery threshold as follows:

Definition 9. We define the optimal recovery threshold, denoted by $K^*$, as the minimum achievable recovery threshold. Specifically,

$$K^* \triangleq \min_{\vec{g}} K(\vec{g}). \tag{3.14}$$

By [112], the Lagrange coding scheme achieves the optimal recovery threshold for evaluating a multivariate polynomial function $f$ (of total degree $\deg(f)$) on a dataset of $k$ inputs, which is given by

$$K^* = (k - 1)\deg(f) + 1 \tag{3.15}$$

when $nr \ge k\deg(f) - 1$, and

$$K^* = nr - \lfloor nr/k \rfloor + 1 \tag{3.16}$$

otherwise. The following lemma shows that the Lagrange coding scheme achieves the highest success probability for any fixed load allocation vector. It is intuitive that a coding scheme achieving a smaller recovery threshold should have a higher success probability.

Lemma 2. (Monotonicity) Consider an arbitrary load allocation vector $\vec{\ell}$. For any coding schemes $\vec{g}_1$ and $\vec{g}_2$ such that $K(\vec{g}_1) \le K(\vec{g}_2)$, we have

$$P(T^{(\vec{\ell}, \vec{g}_1)} \le d) \ge P(T^{(\vec{\ell}, \vec{g}_2)} \le d). \tag{3.17}$$

The proof of Lemma 2 is provided in Appendix J.

3.4.2 Load Allocation Problem

From Lemma 2, by fixing the Lagrange coding scheme, denoted by $\vec{g}^*$, the optimization problem proposed in Subsection 3.4.1 can be simplified to an optimization problem that has only the load allocation vector as its variable. We now introduce an optimization problem called the Load Allocation Problem, which is defined as follows:

Load Allocation Problem:

$$\text{maximize} \quad P(T^{(\vec{\ell}, \vec{g}^*)} \le d) \tag{3.18}$$
$$\text{subject to} \quad \sum_{i=1}^{n} \ell_i \ge K^*, \tag{3.19}$$
$$0 \le \ell_i \le r, \quad \ell_i \in \mathbb{Z}, \quad \forall\, 1 \le i \le n, \tag{3.20}$$

where $K^*$ is the optimal recovery threshold defined in (3.15) and (3.16). Note that the proposed load allocation problem is a combinatorial optimization problem that in general requires a combinatorial search over all possible allocations to maximize the success probability.

To show that the load allocation problem can be solved efficiently, we first present the following lemma, whose proof is provided in Appendix K.

Lemma 3.
Given a deadline $d$, if a load allocation vector $\vec{\ell}$ has the success probability $P(T^{(\vec{\ell}, \vec{g}^*)}(\vec{\mu}) \le d)$, then there exists a load allocation vector $\vec{\ell}'$ with success probability $P(T^{(\vec{\ell}', \vec{g}^*)}(\vec{\mu}) \le d)$ such that $P(T^{(\vec{\ell}', \vec{g}^*)}(\vec{\mu}) \le d) \ge P(T^{(\vec{\ell}, \vec{g}^*)}(\vec{\mu}) \le d)$ and $\ell'_i \in \{\ell_g, \ell_b\}$, where $\ell_g = \min(\mu_g d, r)$ and $\ell_b = \mu_b d$.

By Lemma 3, we can restrict the search for the optimal load allocation vector to all $\vec{\ell}$ satisfying $\ell_i \in \{\ell_g, \ell_b\}$ for all $i$. To find the optimal load allocation vector, we now consider the load allocation vector characterized by the set $G_g = \{i : \ell_i = \ell_g,\ 1 \le i \le n\}$, which represents the set of workers that compute $\ell_g$ evaluations locally. Once the set $G_g$ has been determined, the set $G_b$ of workers that compute $\ell_b$ evaluations can be defined as $\{i : i \in [n] \setminus G_g\}$.

Since $\ell_b / \mu_i$ is always at most $d$, the workers in $G_b$ will always send their results back to the master in time $d$. Since the optimal recovery threshold is $K^*$ using the Lagrange coding scheme, the master has to receive at least $K^* - |G_b|\ell_b$ results from the workers in $G_g$ to recover the computation in time $d$. That is, there must be at least $\left\lceil \frac{K^* - |G_b|\ell_b}{\ell_g} \right\rceil$ workers in the good state in the set $G_g$. We define $a(G_g) \triangleq \left\lceil \frac{K^* - (n - |G_g|)\ell_b}{\ell_g} \right\rceil$, which denotes the minimum number of workers in the good state in $G_g$ needed to guarantee that the master can recover the computation in time $d$.

Before writing the success probability as a function of $G_g$, we first define the following terms. We define $T^{(G_g)}(\vec{\mu})$ as the random variable denoting the finish time using the allocation vector characterized by $G_g$. We denote $p_{g,i}$ as the probability that worker $i$ is in the good state and $p_{b,i}$ as the probability that worker $i$ is in the bad state. Also, we denote by $Q(G)$ the random variable that represents the number of workers in the good state in a set $G$.
Using the load allocation vector characterized by $G_g$, we can write the success probability as a function of $G_g$ as follows:

(1) $a(G_g) > |G_g|$: In this case, the master needs at least $a(G_g)$ workers in the good state, which is greater than $|G_g|$. It implies that $P(T^{(G_g)}(\vec{\mu}) \le d) = 0$.

(2) $0 \le a(G_g) \le |G_g|$: In this case, we have

$$P(T^{(G_g)}(\vec{\mu}) \le d) = P(Q(G_g) \ge a(G_g)) \tag{3.21}$$
$$= \sum_{l = a(G_g)}^{|G_g|} P(Q(G_g) = l) \tag{3.22}$$
$$= \sum_{l = a(G_g)}^{|G_g|} \; \sum_{G : G \subseteq G_g,\, |G| = l} \; \prod_{i \in G} p_{g,i} \prod_{i \in G_g \setminus G} p_{b,i}. \tag{3.23}$$

Therefore, our goal is to find the optimal set $G^*_g$ characterizing the optimal load allocation vector, which maximizes the success probability over all possible sets $G_g \subseteq [n]$. The complexity of searching over all possible sets $G_g \subseteq [n]$ grows exponentially with $n$, since there are $2^n$ choices for $G_g$ overall.

The following lemma shows that the optimal $G^*_g$ contains the workers having the largest $p_{g,i}$ among all the workers, which greatly reduces the time complexity of finding the optimal $G^*_g$.

Lemma 4. Without loss of generality, we assume $p_{g,1} \ge p_{g,2} \ge \cdots \ge p_{g,n}$. Considering all possible sets $G_g$ with fixed cardinality $n_g$, the optimal $G^*_g$ with cardinality $n_g$ that maximizes the success probability is

$$G^*_g = \{1, 2, \ldots, n_g\}, \tag{3.24}$$

which represents the set of $n_g$ workers having the largest $p_{g,i}$ among all the workers.

Proof. For a fixed integer $n_g$, suppose that $G^*_1$ is the optimal set with cardinality $n_g$ and that $i \notin G^*_1$ for some $1 \le i \le n_g$. Then there exists a $j \in G^*_1$ such that $j > n_g$. We construct a set $G^*_2 = (G^*_1 \setminus \{j\}) \cup \{i\}$. The success probability of using the load allocation vector characterized by $G^*_1$ can be written as

$$P(T^{(G^*_1)}(\vec{\mu}) \le d) \tag{3.25}$$
$$= P(Q(G^*_1) \ge a(G^*_1)) \tag{3.26}$$
$$= p_{g,j}\, P(Q(G^*_1 \setminus \{j\}) \ge a(G^*_1) - 1) + (1 - p_{g,j})\, P(Q(G^*_1 \setminus \{j\}) \ge a(G^*_1)), \tag{3.27}$$

where the first term is the success probability when worker $j$ is in the good state, and the second term is the success probability when worker $j$ is in the bad state.
Similarly, the success probability of using the load allocation vector characterized by $G^*_2$ can be written as

$$P(T^{(G^*_2)}(\vec{\mu}) \le d) \tag{3.28}$$
$$= P(Q(G^*_2) \ge a(G^*_2)) \tag{3.29}$$
$$= p_{g,i}\, P(Q(G^*_2 \setminus \{i\}) \ge a(G^*_2) - 1) + (1 - p_{g,i})\, P(Q(G^*_2 \setminus \{i\}) \ge a(G^*_2)), \tag{3.30}$$

which can be further written as $p_{g,i}\, P(Q(G^*_1 \setminus \{j\}) \ge a(G^*_1) - 1) + (1 - p_{g,i})\, P(Q(G^*_1 \setminus \{j\}) \ge a(G^*_1))$ since $G^*_2 = (G^*_1 \setminus \{j\}) \cup \{i\}$ and $a(G^*_1) = a(G^*_2)$. Because $p_{g,i} \ge p_{g,j}$ and $P(Q(G^*_1 \setminus \{j\}) \ge a(G^*_1) - 1) \ge P(Q(G^*_1 \setminus \{j\}) \ge a(G^*_1))$, we have

$$P(T^{(G^*_2)}(\vec{\mu}) \le d) - P(T^{(G^*_1)}(\vec{\mu}) \le d) \tag{3.31}$$
$$= (p_{g,i} - p_{g,j}) \left\{ P(Q(G^*_1 \setminus \{j\}) \ge a(G^*_1) - 1) - P(Q(G^*_1 \setminus \{j\}) \ge a(G^*_1)) \right\} \ge 0, \tag{3.32}$$

which is a contradiction. Thus, the optimal set $G^*_g$ with fixed cardinality $n_g$ must include $i$ for all $1 \le i \le n_g$.

By Lemma 4, for a fixed cardinality $n_g$, the optimal $G^*_g$ is the collection of the $n_g$ workers having the largest $p_{g,i}$ among all the workers. Therefore, to find the optimal load allocation vector, we only need to find the optimal $n_g$. Since there are only $n$ choices for $n_g$ (i.e., $1, 2, \ldots, n$), the complexity of searching for the optimal $n_g$ is linear in the number of workers $n$, which is much smaller than $2^n$.

The following theorem shows that the computation strategy composed of the Lagrange coding scheme and the load allocation vector that solves the load allocation problem achieves the optimal timely computation throughput when the Markov model is known to the master.

Theorem 3. Assume the Markov model of the network is known to the master. Let $\eta^* = (\vec{g}^*, \{\vec{\ell}^*_m\}_{m=1}^{\infty})$ be the computation strategy in which $\vec{g}^*$ is the Lagrange coding scheme and $\{\vec{\ell}^*_m\}_{m=1}^{\infty}$ is given by solving the load allocation problem. Then, $\eta^*$ achieves the optimal timely computation throughput.

Proof. We consider the computation of round $m$ and denote $N_m(d)$ as the indicator representing whether the computation is finished in time $d$ in round $m$ using an arbitrary computation strategy.
Clearly, $N_m(d)$ is a Bernoulli random variable with parameter $P(m)$, which denotes the success probability using this computation strategy in round $m$. Thus, $N_m(d)$ contributes to the throughput with probability $P(m)$. Since $\eta^*$ maximizes $P(m)$ for all $m$, this strategy is optimal.

Since the Markov model is unknown to the master in the original problem, the timely computation throughput achieved by $\eta^*$ gives us an upper bound. In the next section, we will show that this upper bound can be matched by using LEA.

3.5 Optimality of LEA

First, we show that the success probability using LEA converges to the optimal success probability derived in Section 3.4. Then, using the strong law of large numbers (SLLN) and the Ergodic theorem, we show through a coupling argument that the timely computation throughput of LEA is equal to the optimal timely computation throughput (the upper bound). The following theorem shows the optimality of the LEA strategy.

Theorem 4. The proposed Lagrange Estimate-and-Allocate (LEA) strategy is optimal, i.e.,

$$R_{LEA}(d) = R^*(d) \quad \text{almost surely}, \tag{3.33}$$

where $R_{LEA}(d)$ denotes the timely computation throughput using the LEA strategy.

Proof. In order to prove Theorem 4, we first state Lemma 5, whose proof is moved to Appendix L for readability.

Lemma 5. $P_{LEA}(m)$ converges to $P^*(m)$ as $m$ goes to infinity, where $P^*(m)$ denotes the optimal success probability in round $m$ and $P_{LEA}(m)$ denotes the success probability in round $m$ using the LEA strategy.

Before proving the optimality of LEA, we first define the following terms. We denote $N^*_m(d)$ as the indicator representing whether the computation is finished by time $d$ in round $m$ using the optimal computation strategy, which maximizes the success probability in round $m$. Clearly, $N^*_m(d)$ is a Bernoulli random variable with parameter $P^*(m)$. Also, we denote $N_{LEA,m}(d)$ as the indicator representing whether the computation is finished in time $d$ in round $m$ using LEA. Then, $N_{LEA,m}(d)$ is a Bernoulli random variable with parameter $P_{LEA}(m)$.
Now, we model the state of the whole system, which includes all $n$ workers, as a Markov chain. Since each worker has 2 states (good or bad), there are a total of $2^n$ different states of the system. Without loss of generality, we index the states of the system as $\{1, 2, \ldots, 2^n\}$. Since all entries of the transition matrix of this Markov chain are larger than 0, this Markov chain is irreducible. We denote $s(m)$ as the state of the system in round $m$. Also, $p^*_s$ denotes the success probability in state $s$ using the optimal computation strategy, i.e., $P^*(m) = p^*_s$ if $s(m) = s$. By the SLLN and the Ergodic theorem, the optimal timely computation throughput $R^*(d)$ can be written as

$$R^*(d) = \lim_{M \to \infty} \frac{\sum_{m=1}^{M} N^*_m(d)}{M} \tag{3.34}$$
$$= \lim_{M \to \infty} \sum_{s=1}^{2^n} \frac{\sum_{1 \le m \le M:\, s(m) = s} N^*_m(d)}{V_s(M)} \cdot \frac{V_s(M)}{M} \tag{3.35}$$
$$= \sum_{s=1}^{2^n} p^*_s \frac{1}{E_s[T_s]} \quad \text{a.s.}, \tag{3.36}$$

where the Ergodic theorem is formally stated as follows:

Theorem (Ergodic Theorem). If the transition matrix $P$ of a Markov chain $(X_m)_{m \ge 0}$ is irreducible, then we have

$$\lim_{m \to \infty} \frac{V_s(m)}{m} = \frac{1}{E_s[T_s]} \quad \text{a.s.}, \tag{3.37}$$

where $V_s(m)$ is the number of visits to state $s$ up to round $m$ and $E_s[T_s]$ is the expected return time to state $s$.

By Lemma 5, for all $\epsilon > 0$, there exists $m(\epsilon)$ such that $P_{LEA}(m) > P^*(m) - \epsilon$ for all $m > m(\epsilon)$. Let $\tilde{N}_m(d)$ be an independent Bernoulli process with parameter $P^*(m) - \epsilon$. We couple $N_{LEA,m}(d)$ and $\tilde{N}_m(d)$ as follows. If $N_{LEA,m}(d) = 0$, then $\tilde{N}_m(d) = 0$. If $N_{LEA,m}(d) = 1$, then $\tilde{N}_m(d) = 1$ with probability $\frac{P^*(m) - \epsilon}{P_{LEA}(m)}$, and $\tilde{N}_m(d) = 0$ with probability $1 - \frac{P^*(m) - \epsilon}{P_{LEA}(m)}$. Note that $\tilde{N}_m(d)$ is still marginally an independent Bernoulli process with parameter $P^*(m) - \epsilon$.
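The marginal claim for the coupled process can be checked in one line. The following worked equation is a sketch that writes $\epsilon$ for the tolerance from Lemma 5 and assumes $m > m(\epsilon)$, so that the ratio below is a valid probability:

```latex
P\big(\tilde{N}_m(d) = 1\big)
  = P\big(N_{LEA,m}(d) = 1\big) \cdot \frac{P^*(m) - \epsilon}{P_{LEA}(m)}
  = P_{LEA}(m) \cdot \frac{P^*(m) - \epsilon}{P_{LEA}(m)}
  = P^*(m) - \epsilon,
\qquad
\tilde{N}_m(d) \le N_{LEA,m}(d).
```

The inequality on the right holds by construction, since the coupled indicator can equal 1 only when the LEA indicator equals 1.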
Then, we have

$$R_{LEA}(d) = \lim_{M \to \infty} \frac{\sum_{m=1}^{M} N_{LEA,m}(d)}{M} \tag{3.38}$$
$$\ge \lim_{M \to \infty} \frac{\sum_{m=m(\epsilon)+1}^{M} N_{LEA,m}(d)}{M} \tag{3.39}$$
$$\ge \lim_{M \to \infty} \frac{\sum_{m=m(\epsilon)+1}^{M} \tilde{N}_m(d)}{M} \tag{3.40}$$
$$= \lim_{M \to \infty} \frac{1}{M} \sum_{s=1}^{2^n} \; \sum_{m(\epsilon)+1 \le m \le M:\, s(m) = s} \tilde{N}_m(d) \tag{3.41}$$
$$= \lim_{M \to \infty} \frac{M - m(\epsilon)}{M} \sum_{s=1}^{2^n} \frac{\sum_{m(\epsilon)+1 \le m \le M:\, s(m) = s} \tilde{N}_m(d)}{V_s(M) - V_s(m(\epsilon))} \cdot \frac{V_s(M) - V_s(m(\epsilon))}{M - m(\epsilon)} \tag{3.42}$$
$$= \sum_{s=1}^{2^n} (p^*_s - \epsilon) \frac{1}{E_s[T_s]} \tag{3.43}$$
$$= \sum_{s=1}^{2^n} p^*_s \frac{1}{E_s[T_s]} - \epsilon \sum_{s=1}^{2^n} \frac{1}{E_s[T_s]} \tag{3.44}$$
$$= R^*(d) - \epsilon \sum_{s=1}^{2^n} \frac{1}{E_s[T_s]} \quad \text{a.s.}, \tag{3.45}$$

using the SLLN and the Ergodic theorem. By the fact that $R_{LEA}(d) \le R^*(d)$ and letting $\epsilon \to 0$, we have $R_{LEA}(d) = R^*(d)$, which completes the proof.

3.6 Experiments

In this section, we demonstrate the impact of LEA through simulation studies as well as experiments over an Amazon EC2 cluster.

3.6.1 Numerical Analysis

We now present numerical results evaluating the performance of the LEA strategy. First, we call a computation strategy static if it assigns the loads to workers without considering their states in previous rounds. For comparison with LEA, we consider the following static computation strategy:

Static Computation Strategy: Prior to computation, the Lagrange coding scheme is used for data encoding. In each round $m$, each worker $i$ is assigned $\ell_g$ or $\ell_b$ evaluations based on the stationary distribution of the underlying Markov model, where we denote $(\pi_{g,i}, \pi_{b,i})$ as the stationary distribution of worker $i$. More specifically, for each worker $i$ in each round $m$, this strategy assigns loads as follows:

$$\ell_{m,i} = \begin{cases} \ell_g & \text{with probability } \pi_{g,i}, \\ \ell_b & \text{with probability } \pi_{b,i}. \end{cases} \tag{3.46}$$

Note that whenever the total load of the generated $\vec{\ell}_m$ is smaller than the optimal recovery threshold, the strategy repeats the assignment until the total load of the generated $\vec{\ell}_m$ is greater than the optimal recovery threshold.

Since static computation strategies do not learn the dynamics of the network, they can only assign loads deterministically, or randomly without using any history.
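The static baseline described above can be sketched as a rejection-sampling loop. This is a hedged illustration rather than the code used in the experiments: the function names are hypothetical, the stationary probability uses the standard closed form for a two-state Markov chain, and the loop accepts an allocation once its total load reaches the recovery threshold.

```python
import random


def stationary_good(p_gg, p_bb):
    """Stationary probability of the good state of a two-state chain.

    Assumes the two stay probabilities are not both equal to 1.
    """
    return (1.0 - p_bb) / (2.0 - p_gg - p_bb)


def static_allocation(pi_good, l_g, l_b, K, rng=random):
    """Draw loads independently per worker; redraw until the total reaches K."""
    if len(pi_good) * max(l_g, l_b) < K:
        raise ValueError("recovery threshold cannot be met by any allocation")
    # Note: this loop may run for a long time if reaching K is very unlikely.
    while True:
        loads = [l_g if rng.random() < p else l_b for p in pi_good]
        if sum(loads) >= K:
            return loads


# Hypothetical setup: 15 identical workers with stay probabilities 0.8 and 0.7.
pi_g = stationary_good(0.8, 0.7)
sample = static_allocation([pi_g] * 15, l_g=10, l_b=3, K=99,
                           rng=random.Random(0))
```

Because each round is drawn independently of the past, the sketch makes the limitation visible: nothing in `static_allocation` depends on which workers returned results in earlier rounds.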
Thus, the chosen static computation strategy, which utilizes the stationary distributions of the underlying Markov model, is in general better than other static computation strategies.

Given a deadline of $d = 1$ second in each round $m$, we consider the problem of evaluating a quadratic function $f_m(X_j) = X_j^\top (X_j \vec{w}_m - \vec{y})$ over $n = 15$ workers, where the dataset $X_1, X_2, \ldots, X_{50} \in \mathbb{R}^{1000 \times 1000}$, $\vec{y} \in \mathbb{R}^{1000 \times 1}$, and $\vec{w}_m \in \mathbb{R}^{1000 \times 1}$ is the input vector in round $m$. Each worker stores $r = 10$ encoded data chunks using the Lagrange coding scheme. In this setting, the optimal recovery threshold is $K^* = 99$ for both LEA and the static computation strategy. For simulations, we let $p_{g \to g,i} = p_{g \to g}$ and $p_{b \to b,i} = p_{b \to b}$ for all $i$, and consider the following four scenarios:

• Scenario 1: $(\mu_g, \mu_b) = (10, 3)$, $(p_{g \to g}, p_{b \to b}) = (0.8, 0.8)$ and the corresponding stationary probabilities $(p_g, p_b) = (0.5, 0.5)$.

• Scenario 2: $(\mu_g, \mu_b) = (10, 3)$, $(p_{g \to g}, p_{b \to b}) = (0.8, 0.7)$ and the corresponding stationary probabilities $(p_g, p_b) = (0.6, 0.4)$.

• Scenario 3: $(\mu_g, \mu_b) = (10, 3)$, $(p_{g \to g}, p_{b \to b}) = (0.8, 0.533)$ and the corresponding stationary probabilities $(p_g, p_b) = (0.7, 0.3)$.

• Scenario 4: $(\mu_g, \mu_b) = (10, 3)$, $(p_{g \to g}, p_{b \to b}) = (0.9, 0.6)$ and the corresponding stationary probabilities $(p_g, p_b) = (0.8, 0.2)$.

Fig. 3.3 illustrates the performance comparison between LEA and the static computation strategy. We draw the following conclusions from the results:

• LEA provides a substantial improvement in terms of the timely computation throughput. Over the four scenarios, LEA improves on the static computation strategy by a factor of 1.38 to 17.5.

• The timely computation throughput improvements over the static computation strategy become more significant as the stationary probability $p_g$ decreases. When $p_g$ is small, the workers are more likely to be in the bad state in the long run. In this sense, the static computation strategy assigns loads to the workers in a more pessimistic way.
However, there is temporal correlation of computation speeds, which the static computation strategy does not take into account. Thus, although $p_g$ is small, LEA can achieve much higher timely computation throughput, which demonstrates that LEA can adapt well to the dynamics of the network.

Figure 3.3: Numerical evaluations (timely computation throughput of LEA and the static load allocation strategy for Scenarios 1 to 4). Compared with the static load allocation strategy, LEA improves the timely computation throughput by a factor of 1.38 to 17.5.

3.6.2 Experiments using Amazon EC2 machines

Before showing the experimental results, we first introduce CPU credits [1], which can boost T2 and T3 instances above baseline performance. For a t2.micro instance, as shown in Fig. 3.1, there is a tenfold difference between baseline performance and burstable performance, i.e., a t2.micro instance in burst mode computes 10 times faster. The baseline performance and ability to burst are governed by CPU credits [1].

We ran the master node on an m4.xlarge instance and all workers on t2.micro instances. We implemented the two computation strategies in Python, and used MPI4py [29] for message passing between instances. Before starting computations, each worker stores encoded data in its local memory. In each round, having received a function $f_m$ from the master, each worker computes the assigned computation using the stored data, and sends the result back to the master asynchronously using Isend(). As soon as the master gathers enough results from the workers, it recovers the evaluations for the requested function.

Given a deadline of $d$ seconds in each round $m$, we consider the problem of evaluating a linear function $f_m(X_j) = X_j^\top B_m$ over $n = 15$ workers, where the data chunks $\{X_j\}_{j=1}^{k}$ are real matrices with certain dimensions, and $B_m \in \mathbb{R}^{3000 \times 3000}$ is the input matrix. Each worker stores $r = 10$ encoded data chunks using the Lagrange coding scheme.
In particular, in each round, the computation request's arrival time is a shift-exponential random variable, which is the sum of a constant $T_c = 30$ and an exponential random variable with mean $\lambda$. In this setting, we have the optimal recovery threshold $K^* = 50$ for both LEA and the static computation strategy. Since the Markov model is unknown (and indeed even the type of the underlying stochastic process determining the states of the workers in the cloud is not known), to compare with the LEA strategy, we consider a static computation strategy in which each worker is assigned $\ell_g$ or $\ell_b$ evaluations with equal probability in each round. For experiments, we consider the following six scenarios:

• Scenario 1: Size of $X_j$ = 25 × 3000, $k = 120$, $\lambda = 10$ and $d = 2.5$.

• Scenario 2: Size of $X_j$ = 25 × 3000, $k = 120$, $\lambda = 30$ and $d = 2.5$.

• Scenario 3: Size of $X_j$ = 30 × 3000, $k = 100$, $\lambda = 10$ and $d = 3$.

• Scenario 4: Size of $X_j$ = 30 × 3000, $k = 100$, $\lambda = 30$ and $d = 3$.

• Scenario 5: Size of $X_j$ = 60 × 3000, $k = 50$, $\lambda = 10$ and $d = 6$.

• Scenario 6: Size of $X_j$ = 60 × 3000, $k = 50$, $\lambda = 30$ and $d = 6$.

Fig. 3.4 provides a performance comparison of LEA with the static load allocation strategy for the six scenarios. From the results, we found that LEA provides a substantial improvement in terms of the timely computation throughput. Over the six scenarios, LEA improves on the static computation strategy by a factor of 1.27 to 6.5.

Figure 3.4: Experimental evaluations over 15 t2.micro instances in Amazon EC2 (timely computation throughput of LEA and the static load allocation strategy for Scenarios 1 to 6). Compared with the static load allocation strategy, LEA improves the timely computation throughput by a factor of 1.27 to 6.5.

3.7 Conclusion

Motivated by the high variability of computing resources in modern distributed computing systems and the increasing demand for timely event-driven services with deadline constraints, we consider the problem of dynamic computation load allocation over a coded computing framework.
We propose an optimal dynamic computation strategy, Lagrange Estimate-and-Allocate (LEA), which uses the Lagrange coding scheme for data encoding and assigns computation loads based on the estimated state of the network. The state is estimated by learning the transition probabilities of an underlying Markov model for the system's state from the events observed at each time step. Finally, we show that, compared to the static computation strategy, LEA increases the timely computation throughput by a factor of 1.38 to 17.5 in simulations and by a factor of 1.27 to 6.5 on Amazon EC2 clusters.

At a conceptual level, this chapter has some interesting comparisons and connections with [48]. For wireless networks, [48] investigates how to turn base stations on or off in order to adapt to unknown load arrival and channel statistics. For cloud computing networks, we focus on how to assign the computation loads in order to adapt to unknown computing networks. So, at a high level, the corresponding scheduling problems can be seen as duals of each other: [48] assigns base stations to good (on) or bad (off) states in order to meet the demands, while our goal is to assign the computation loads in order to optimally exploit the (unknown) state of the workers. However, we also point out that the settings and objectives of the two works are quite different. We consider cloud computing platforms and focus on the timely computation throughput, which is very different from [48]. Another difference is in the proof techniques used to show the optimality of the proposed algorithms. The Lyapunov arguments for the adaptive scheme used in [48] are quite different from our approach.

Chapter 4

Edge Computing in the Dark: Leveraging Contextual-Combinatorial Bandit and Coded Computing

4.1 Introduction

Recent advancements in edge cloud have enabled users to offload their computations of interest to the edge for processing.
Specifically, there has been a significant increase in utilizing the edge cloud for event-driven and time-sensitive computations (e.g., IoT applications and cognitive services), in which the users increasingly demand timely services with deadline constraints, i.e., computations of requests have to be finished within specified deadlines. However, large-scale distributed computing networks can suffer substantially from unpredictable and unreliable computing infrastructure, which can result in high variability of computing resources, i.e., the service quality of the computing resources may vary over time. The speed variation has several causes, including hardware failure, co-location of computation tasks, communication bottlenecks, etc. [113, 5]. While edge computing has offered a novel framework for computing service provisioning, a careful design of the task scheduling policy is still needed to guarantee the timeliness of task processing, due to the increasing demand for real-time response in various applications and the unknown environment of the network.

To take advantage of the parallel computing resources for reducing the total latency, the applications are often modeled with a MapReduce computation model, i.e., the computation job can be partitioned into smaller Map functions which can be processed in a distributed manner by the edge devices. Since the data transmissions between the edge devices can result in large latency, it is often the case that the user computes the Reduce function on the results of the Map functions upon receiving the computation results of the edge devices, in order to complete the computation job.

In this chapter, we study the problem of computation offloading over edge cloud networks with a particular focus on the unknown environment of computing resources and on timely computation jobs. We consider a dynamic computation model, where a sequence of computation jobs needs to be computed over the (encoded) data that is stored in a distributed manner at the edge nodes.
More precisely, computation jobs with given deadlines are submitted to the edge network in an online manner, i.e., each computation has to be finished within the given deadline. We assume the service quality (the success probability of returning results to the user by the deadline) of each edge device is parameterized by a context (a collection of factors that affect each edge device). The user aims at selecting edge devices from the available edge devices such that the user can receive a recoverable set of computation results by the given deadline. Our goal is then to design an efficient edge computing policy that maximizes the cumulative expected reward, where the expected reward collected at each round is a linear combination of the success probability of the computation and the amount of computational resources used (with negative sign).

One significant challenge in this problem is the joint design of (1) a data storage scheme that provides robustness against unknown behaviors of edge devices; (2) computation offloading to edge devices; and (3) an online learning policy for making the offloading decisions based on the past observed events. In our model, the computation capacities of the devices (e.g., how likely the computation can be returned to the user within the deadline) are unknown to the user.

As the main contributions of the chapter, we introduce a coded computing framework in which the data is encoded and stored at the edge devices in order to provide robustness against the unknown computation capabilities of the devices. The key idea of coded computing is to encode the data and design each worker's computation task such that the fastest responses of any $k$ workers out of a total of $n$ workers suffice to complete the distributed computation, similar to classical coding theory, where receiving any $k$ symbols out of $n$ transmitted symbols enables the receiver to decode the sent message.
Under the coded computing framework, we formulate a contextual-combinatorial multi-armed bandit (CC-MAB) problem for the edge computing problem, in which the Lagrange coding scheme is utilized for data encoding [112]. Then, we propose a policy called the online coded edge computing policy, and show that, through a careful design of the policy parameters, it achieves asymptotically optimal performance in terms of regret compared with the optimal offline policy for the proposed CC-MAB problem. To prove the asymptotic optimality of the online coded edge computing policy, we divide the expected regret into three regret terms due to (1) exploration phases, (2) bad selections of edge devices in exploitation phases, and (3) good selections of edge devices in exploitation phases; then we bound these three regrets separately. In addition to proving the asymptotic optimality of the online coded edge computing policy, we carry out numerical studies using real-world scenarios of Amazon EC2 clusters. In terms of the cumulative reward, the results show that the online coded edge computing policy significantly outperforms other benchmarks.

In the following, we summarize the key contributions of this chapter:

• We formulate the problem of coded edge computing using the CC-MAB framework.

• We propose the online coded edge computing policy, which is provably asymptotically optimal.

• We show that the online coded edge computing policy outperforms other benchmarks via numerical studies.

4.1.1 Related Prior Work

Next, we provide a brief literature review that covers three main lines of work: task scheduling over cloud networks, coded computing, and the multi-armed bandit problem.

In the dynamic task scheduling problem, jobs arrive at the network according to a stochastic process, and get scheduled dynamically over time. The first goal in task scheduling is to find a throughput-optimal scheduling policy (see e.g., [32]), i.e., a policy that stabilizes the network whenever it can be stabilized.
For example, Max-Weight scheduling, first proposed in [98, 28], is known to be throughput-optimal for wireless networks, flexible queueing networks [71], data center networks [65] and dispersed computing networks [103]. Moreover, many works focus on the task scheduling problem with deadline constraints over cloud networks (see, e.g., [39]).

Coded computing broadly refers to a family of techniques that utilize coding to inject computation redundancy in order to alleviate the various issues that arise in large-scale distributed computing. In the past few years, coded computing has had tremendous success in various problems, such as straggler mitigation and bandwidth reduction (e.g., [54, 62, 30, 53, 111, 96, 61, 60, 82, 109]). Coded computing has also been expanded in various directions, such as heterogeneous networks (e.g., [86]), partial stragglers (e.g., [34]), secure and private computing (e.g., [21, 112, 102, 93, 91]), distributed optimization (e.g., [45]), federated learning (e.g., [92, 83, 84]), and blockchains (e.g., [108, 63]). In a dynamic setting, [106, 105] consider the coded computing framework with deadline constraints and develop a learning strategy that can adaptively assign computation loads to cloud devices. In this chapter, we go beyond the two-state Markov model considered in Chapter 3 and make substantial progress by combining the ideas of coded computing with the contextual-combinatorial MAB, a more general framework that does not make any strong assumption (e.g., a Markov model) on the underlying model for the speed of edge devices.

The multi-armed bandit (MAB) problem has been widely studied to address the critical tradeoff between exploration and exploitation in sequential decision making under uncertainty of the environment [51].
The goal of MAB is to learn the single optimal arm among a set of candidate arms with a priori unknown rewards by sequentially selecting one arm each time and observing its realized reward [8]. The contextual bandit problem extends the basic MAB by considering context-dependent reward functions [57, 88, 90]. The combinatorial bandit problem is another extension of the MAB that allows multiple plays (selecting a set of arms) each time [35, 55]. The contextual-combinatorial MAB problem considered in this chapter has also received much attention recently [22, 59, 69, 85]. However, [59, 85] assume that the reward of an action is a linear function of the contexts, which differs from the reward function considered in this chapter. [69] assumes that the arm set is fixed over time, whereas in edge networks the arms (edge devices) may appear and disappear over time. [22] considers a CC-MAB problem for vehicle cloud computing in which the tasks are deadline-constrained. However, the task replication technique used in [22] replicates the "whole job" to multiple edge devices without taking advantage of the parallelism of computational resources. Coded computing is a more general technique that allows the dataset to be first partitioned into smaller datasets and then encoded, such that each device has a smaller computation than in [22]. Moreover, the success probability term of the reward function considered in this chapter (for receiving any $k$ results out of $n$ results) is more general than that of the reward function considered in [22] (for receiving any 1 result out of $n$ results).

4.2 System Model

4.2.1 Computation Model

We consider an edge computing problem in which a user offloads its computation to an edge network in an online manner, and the computation is executed by the edge devices. In particular, there is a given deadline for each round of computation, i.e., the computation has to be finished within the given deadline. As shown in Fig.
4.1, the considered edge network is composed of a user node and a set of edge devices. There is a dataset $X_1, X_2, \ldots, X_k$, where each $X_j$ is an element of a vector space $\mathbb{V}$ over a sufficiently large finite field $\mathbb{F}$. Each edge device prestores data that can possibly be a function of $X_1, X_2, \ldots, X_k$. Let $\{1, 2, \ldots, T\}$ be the indices of the user's computation jobs received by the edge network over $T$ time slots. In each round $t$ (or time slot in a discrete-time system), the user has a computation job denoted by function $g_t$. In particular, we assume that $g_t$ can be computed by

$$g_t(X_1, X_2, \ldots, X_k) = h_t\big(f_t(X_1), f_t(X_2), \ldots, f_t(X_k)\big),$$

where the functions $g_t$ and $f_t$ (with degree $\deg(f_t)$) are multivariate polynomial functions with vector coefficients. In such an edge network, and motivated by a MapReduce setting, the user is interested in computing the Map functions $f_t(X_1), f_t(X_2), \ldots, f_t(X_k)$ in each round $t$, and the user computes the Reduce function $h_t$ on the results of the Map functions to obtain $g_t(X_1, X_2, \ldots, X_k)$.

Remark 5. We note that the considered computation model naturally appears in many machine learning applications that use gradient-type algorithms. For example, in linear regression problems, given $\vec{y}_j$, the vector of observed labels for data $X_j$, each worker $j$ computes $f_t(X_j) = X_j^\top (X_j \vec{w}_t - \vec{y}_j)$, which is the gradient of the quadratic loss function $\frac{1}{2}\|X_j \vec{w}_t - \vec{y}_j\|^2$ with respect to the weight vector $\vec{w}_t$ in round $t$. To complete the update $\vec{w}_{t+1} = g_t(X_1, \ldots, X_k) = \vec{w}_t - \eta_t \sum_{j=1}^{k} f_t(X_j)$ with step size $\eta_t$, the user has to collect the computation results $f_t(X_1), f_t(X_2), \ldots, f_t(X_k)$.

Moreover, the considered computation model also holds for various edge computing applications. For example, in a mobile navigation application, the goal of the user is to compute the fastest route to its destination.
Given a dataset containing the map information and the traffic conditions over a period of time, edge devices compute Map functions that output all possible routes between the two end locations. After collecting the intermediate results from the edge devices, the user computes the best route.

Figure 4.1: Overview of online computation offloading over an edge network with timely computation requests. In round $t$, the goal of the user is to compute the Map functions $f_t(X_1), \ldots, f_t(X_k)$ by the deadline $d^t$ using the edge devices.

4.2.2 Network Model

In an edge computing network, whether a computation result can be returned to the user depends on many factors. For example, the computation load of an edge device influences its runtime, and the output size of the computation task affects the transmission delay. Such factors are referred to as context throughout the chapter. The impact of each context on the edge devices is unknown to the user. More specifically, the computation service of each edge device is modeled as follows.

Let $\Phi_T$ be the context space of dimension $D_T$ for computation tasks, which includes $D_T$ different pieces of information about the computation task, e.g., size of input/output, size of computation, and deadline. Let $\Phi_S$ be the context space of dimension $D_S$ for edge devices, which includes information related to the edge devices such as computation speed and bandwidth. Let $\Phi = \Phi_T \times \Phi_S$ be the joint context space, which is assumed to be bounded and thus, without loss of generality, can be defined as $\Phi = [0, 1]^D$, where $D = D_T + D_S$ is the dimension of the context space.

In each round $t$, let $\mathcal{V}^t$ denote the set of edge devices available to the user for computation, i.e., the available set of devices might change over time. Moreover, we denote by $b^t$ the budget (maximum number of devices to be used) in round $t$. The service delay (computation time plus transmission time) of each edge device $v$ is parameterized by a given context $\phi_v^t \in \Phi$. We denote by $c_v^t$ the service delay of edge device $v$, and by $d^t$ the computation deadline in round $t$.
Let $q_v^t = \mathbb{1}\{c_v^t \le d^t\}$ be the indicator that the service delay of edge device $v$ is smaller than or equal to the given deadline $d^t$ in round $t$. Also, let $\mu(\phi_v^t) = \mathbb{E}[q_v^t] = \mathbb{P}(c_v^t \le d^t)$ be the success probability that edge device $v$ returns the computation result to the user within deadline $d^t$, and let $\mu^t = \{\mu(\phi_v^t)\}_{v \in \mathcal{V}^t}$ be the collection of success probabilities of the edge devices in round $t$. Let us illustrate the model through a simple example.

Example. In [86], shifted exponential distributions have been demonstrated to be a good fit for modeling the execution time of a node in cloud networks. Thus, we can model the success probability of an edge device as follows:

$$\mu(\phi_v^t) = \mathbb{P}(c_v^t \le d^t) = \begin{cases} 1 - e^{-\lambda_v^t (d^t - a_v^t)}, & d^t \ge a_v^t, \\ 0, & a_v^t > d^t \ge 0, \end{cases}$$

where the context space consists of the deadline $d^t$, the shift parameter $a_v^t > 0$, and the straggling parameter $\lambda_v^t > 0$ associated with edge device $v$.

4.2.3 Problem Statement

Let $\mathcal{V} = \{1, 2, \ldots, |\mathcal{V}|\}$ be the set of all edge devices in the network. Given the contexts $\phi^t = \{\phi_v^t\}_{v \in \mathcal{V}^t}$ of the edge devices available to the user in round $t$, the goal of the user is to select a subset of edge devices from the available set $\mathcal{V}^t \subseteq \mathcal{V}$, and to decide what is to be computed by each selected edge device, such that a recoverable (or decodable, as will be clarified later) set of computation results $f_t(X_1), \ldots, f_t(X_k)$ can be returned to the user within deadline $d^t$.

4.3 Online Coded Edge Computing

In this section, we introduce a coded computing framework for the edge computing problem, and formulate the problem as a contextual-combinatorial multi-armed bandit (CC-MAB) problem. Then, we propose a policy called the online coded edge computing policy, which is a context-aware learning algorithm.

4.3.1 Lagrange Coded Computing

Similar to Section 3.3, we leverage the Lagrange coding scheme [112] for the data storage of edge devices. In the following, we provide an example to show how the Lagrange coding scheme is utilized in this framework.
In each round $t$, we consider a computation job that consists of computing the quadratic functions $f_t(X_j) = X_j^\top (X_j \vec{w}_t - \vec{y}_j)$ over the available edge devices $\mathcal{V}^t = \{1, 2, \ldots, 6\}$, where the input dataset $X$ is partitioned into $X_1, X_2$. Then, we define the function $m$ as follows:

$$m(z) \triangleq X_1 \frac{z - 1}{0 - 1} + X_2 \frac{z - 0}{1 - 0} = z(X_2 - X_1) + X_1, \qquad (4.1)$$

in which $m(0) = X_1$ and $m(1) = X_2$. Then, we encode $X_1$ and $X_2$ to $\tilde{X}_v = m(v - 1)$, i.e., $\tilde{X}_1 = X_1$, $\tilde{X}_2 = X_2$, $\tilde{X}_3 = -X_1 + 2X_2$, $\tilde{X}_4 = -2X_1 + 3X_2$, $\tilde{X}_5 = -3X_1 + 4X_2$ and $\tilde{X}_6 = -4X_1 + 5X_2$. Each edge device $v \in \{1, 2, \ldots, 6\}$ prestores an encoded data chunk $\tilde{X}_v$ locally. If edge device $v$ is selected in round $t$, it computes $f_t(\tilde{X}_v) = \tilde{X}_v^\top (\tilde{X}_v \vec{w}_t - \tilde{\vec{y}}_v)$ and returns the result to the user upon its completion. We note that $f_t(\tilde{X}_v) = f_t(m(v - 1))$ is an evaluation of the composition polynomial $f_t(m(z))$, whose degree is at most 2, which implies that $f_t(m(z))$ can be recovered from any 3 results via polynomial interpolation. Then we have $f_t(X_1) = f_t(m(0))$ and $f_t(X_2) = f_t(m(1))$.

In the following, we formally describe how the Lagrange coding scheme works in our proposed framework. We first select $k$ distinct elements $\beta_1, \beta_2, \ldots, \beta_k$ from $\mathbb{F}$, and let $m$ be the respective Lagrange interpolation polynomial

$$m(z) \triangleq \sum_{j=1}^{k} X_j \prod_{l \in [k] \setminus \{j\}} \frac{z - \beta_l}{\beta_j - \beta_l}, \qquad (4.2)$$

where $m : \mathbb{F} \to \mathbb{V}$ is a polynomial of degree $k - 1$ such that $m(\beta_j) = X_j$. Recall that $\mathcal{V} = \cup_{t=1}^{T} \mathcal{V}^t$ is the set of all edge devices. To encode the input $X_1, X_2, \ldots, X_k$, we select $|\mathcal{V}|$ distinct elements $\alpha_1, \alpha_2, \ldots, \alpha_{|\mathcal{V}|}$ from $\mathbb{F}$, and encode $X_1, X_2, \ldots, X_k$ to $\tilde{X}_v = m(\alpha_v)$ for all $v \in [|\mathcal{V}|]$, i.e.,

$$\tilde{X}_v = m(\alpha_v) \triangleq \sum_{j=1}^{k} X_j \prod_{l \in [k] \setminus \{j\}} \frac{\alpha_v - \beta_l}{\beta_j - \beta_l}. \qquad (4.3)$$

Each edge device $v \in \mathcal{V}$ stores $\tilde{X}_v$ locally. If edge device $v$ is selected in round $t$, it computes $f_t(\tilde{X}_v)$ and returns the result to the user upon its completion.
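The six-device example above can be checked numerically. The following is a minimal sketch, not the implementation used in this chapter: it works over the rationals (Python's `Fraction`) rather than a finite field, uses scalar data in place of matrices, and assumes that the labels are encoded with the same polynomial as the data, as the notation for the encoded labels suggests. The helper names `m`, `f` and `lagrange_eval` are introduced here for illustration only.

```python
from fractions import Fraction
from itertools import combinations


def m(z, x1, x2):
    """Encoding polynomial of (4.1): m(0) = x1 and m(1) = x2."""
    return z * (x2 - x1) + x1


def f(x, y, w):
    """Scalar version of the quadratic Map function x * (x * w - y)."""
    return x * (x * w - y)


def lagrange_eval(points, z):
    """Evaluate at z the unique polynomial through the (z_i, value_i) pairs."""
    total = Fraction(0)
    for i, (zi, vi) in enumerate(points):
        term = Fraction(vi)
        for j, (zj, _) in enumerate(points):
            if j != i:
                term *= Fraction(z - zj, zi - zj)
        total += term
    return total


# Toy data (assumed values): two data partitions, their labels, and a weight.
x1, x2 = Fraction(3), Fraction(-2)
y1, y2 = Fraction(1), Fraction(4)
w = Fraction(5, 2)

# Device v stores m(v - 1) for data and labels, and computes f on its chunk.
results = {}
for v in range(1, 7):
    results[v] = f(m(v - 1, x1, x2), m(v - 1, y1, y2), w)

# f(m(z)) has degree 2, so any 3 of the 6 results determine it.
recovered = set()
for subset in combinations(range(1, 7), 3):
    points = [(v - 1, results[v]) for v in subset]
    recovered.add((lagrange_eval(points, 0), lagrange_eval(points, 1)))

# Every 3-subset yields the same pair (f(x1), f(x2)).
print(recovered == {(f(x1, y1, w), f(x2, y2, w))})
```

All 20 choices of three devices decode to the same pair, which is the "any 3 results" property stated in the example.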
Then, the optimal recovery threshold $Y^t$ using the Lagrange coding scheme is

$$Y^t = (k - 1)\deg(f_t) + 1, \qquad (4.4)$$

which guarantees that the computation tasks $f_t(X_1), \ldots, f_t(X_k)$ can be recovered when the user receives any $Y^t$ results from the edge devices.$^1$ The encoding of the Lagrange coding scheme is oblivious to the computation task $f_t$. Also, the decoding and encoding processes of the Lagrange coding scheme rely on polynomial interpolation and evaluation, which can be done efficiently.

$^1$ In this chapter, we assume that the number of edge devices is large enough.

Remark 6. We note that data newly generated at an edge device can be encoded and distributed to other devices at off-peak times. In particular, a key property of LCC is that the encoding process can be done incrementally, i.e., when new datasets are added, the encoded data can be updated by encoding only the new data instead of redoing the encoding on all the datasets. For example, consider the case in which each data $X_j$ is represented by a vector. When a newly generated data element $x_j$ is added to each data $X_j$, we just encode the new data elements $x_1, x_2, \ldots, x_k$ to $\tilde{x}_1, \tilde{x}_2, \ldots$, and the new encoded data is obtained by appending these to the old encoded data vectors $\tilde{X}_1, \tilde{X}_2, \ldots$.

4.3.2 CC-MAB for Coded Edge Computing

Now we consider a coded computing framework in which the Lagrange coding scheme is used for data encoding, i.e., each edge device $v$ prestores the encoded data $\tilde{X}_v$. The encoding process is performed only once for the dataset $X_1, \ldots, X_k$. After Lagrange data encoding, the size of the input data and the computation of each user do not change, i.e., the context $\phi_v^t$ of each edge device remains the same.

More specifically, we denote by $\mathcal{A}^t$ the set of devices that are selected in round $t$ for computation. In each round $t$, the user picks a subset of devices $\mathcal{A}^t$ from all available devices $\mathcal{V}^t$, and we call $\mathcal{A}^t \subseteq \mathcal{V}^t$ the "offloading decision".
The reward function $r(\mathcal{A}^t)$ achieved by offloading decision $\mathcal{A}^t$ is composed of a reward term and a cost term, and is defined as follows:

$$r(\mathcal{A}^t) = \begin{cases} 1 - \eta|\mathcal{A}^t|, & \text{if } \sum_{v \in \mathcal{A}^t} q_v^t \ge Y^t, \\ -\eta|\mathcal{A}^t|, & \text{if } \sum_{v \in \mathcal{A}^t} q_v^t < Y^t, \end{cases} \qquad (4.5)$$

where the term $\eta|\mathcal{A}^t|$ captures the cost of using offloading decision $\mathcal{A}^t$, with $\eta$ the unit cost for using one edge device, and $Y^t$ is the optimal recovery threshold defined in (4.4). More precisely, the reward term is equal to 1 if the total number of received results is greater than or equal to the optimal recovery threshold, i.e., $\sum_{v \in \mathcal{A}^t} q_v^t \ge Y^t$; otherwise the reward term is equal to 0. On the other hand, the cost term is defined as $\eta|\mathcal{A}^t|$, which is the cost of using $\mathcal{A}^t$.

Then, the expected reward in round $t$, denoted by $u(\mu^t, \mathcal{A}^t)$, can be written as follows:

$$u(\mu^t, \mathcal{A}^t) = \sum_{s = Y^t}^{|\mathcal{A}^t|} \; \sum_{\mathcal{A} \subseteq \mathcal{A}^t, |\mathcal{A}| = s} \; \prod_{v \in \mathcal{A}} \mu(\phi_v^t) \prod_{v \in \mathcal{A}^t \setminus \mathcal{A}} \big(1 - \mu(\phi_v^t)\big) - \eta|\mathcal{A}^t|, \qquad (4.6)$$

where the first term of the expected reward of an offloading decision is the probability that at least $Y^t$ computation results are received by the user for LCC decoding.

Consider an arbitrary sequence of computation jobs indexed by $\{1, 2, \ldots, T\}$ for which the user makes offloading decisions $\{\mathcal{A}^t\}_{t=1}^{T}$. To maximize the expected cumulative reward, we introduce a contextual-combinatorial multi-armed bandit (CC-MAB) problem for coded edge computing defined as follows:

CC-MAB for Coded Edge Computing:

$$\max_{\{\mathcal{A}^t\}_{t=1}^{T}} \; \sum_{t=1}^{T} u(\mu^t, \mathcal{A}^t) \qquad (4.7)$$

$$\text{s.t. } \mathcal{A}^t \subseteq \mathcal{V}^t, \; |\mathcal{A}^t| \le b^t, \; \forall t \in [T], \qquad (4.8)$$

where constraint (4.8) indicates that the number of edge devices in $\mathcal{A}^t$ cannot exceed the budget $b^t$ in round $t$. The proposed CC-MAB problem is equivalent to solving an independent subproblem in each round $t$ as follows:

$$\max_{\mathcal{A}^t} \; \sum_{s = Y^t}^{|\mathcal{A}^t|} \; \sum_{\mathcal{A} \subseteq \mathcal{A}^t, |\mathcal{A}| = s} \; \prod_{v \in \mathcal{A}} \mu(\phi_v^t) \prod_{v \in \mathcal{A}^t \setminus \mathcal{A}} \big(1 - \mu(\phi_v^t)\big) - \eta|\mathcal{A}^t| \qquad (4.9)$$

$$\text{s.t. } \mathcal{A}^t \subseteq \mathcal{V}^t, \; |\mathcal{A}^t| \le b^t. \qquad (4.10)$$

Remark 7. We note that the proposed CC-MAB not only works for LCC but also for any other coding schemes.
In this chapter, we focus on LCC since LCC is a universal and optimal encoding technique for arbitrary multivariate polynomial computations.

4.3.3 Optimal Offline Policy

We now assume that the success probability of each edge device $v \in \mathcal{V}^t$ is known to the user. In round $t$, to find the optimal $\mathcal{A}^{t*}$, we present the following intuitive lemma, proved in Appendix Q.

Lemma 6. Without loss of generality, we assume $\mu(\phi_1^t) \ge \mu(\phi_2^t) \ge \cdots \ge \mu(\phi_{|\mathcal{V}^t|}^t)$ in round $t$. Considering all possible sets $\mathcal{A}_g^t \subseteq \mathcal{V}^t$ with fixed cardinality $n_g$, the optimal $\mathcal{A}_g^{t*}$ with cardinality $n_g$ that achieves the largest expected reward $u(\mu^t, \mathcal{A}_g^t)$ is

$$\mathcal{A}_g^{t*} = \{1, 2, \ldots, n_g\}, \qquad (4.11)$$

which represents the set of $n_g$ edge devices having the largest success probabilities $\mu(\phi_v^t)$ among all the edge devices.

By Lemma 6, to find the optimal set $\mathcal{A}^{t*}$, we only need to find its optimal size. Since there are only $b^t$ choices for the size $|\mathcal{A}^{t*}|$ (i.e., $1, 2, \ldots, b^t$), this can be done by a linear search with complexity linear in the number of edge devices $|\mathcal{V}^t|$. We present the optimal offline policy in Algorithm 1.

Remark 8. We note that the expected reward function considered in [22] is a submodular function, which can be maximized by a greedy algorithm. However, the expected reward function defined in (4.6) is more general and cannot be maximized by the greedy algorithm. More specifically, one can show that the expected reward defined in (4.6) is not submodular by checking the defining property of submodular functions: the inequality $u(\mu, \{v\} \cup \mathcal{A}) - u(\mu, \mathcal{A}) \ge u(\mu, \{v\} \cup \mathcal{B}) - u(\mu, \mathcal{B})$ does not hold for all possible subsets $\mathcal{A} \subseteq \mathcal{B} \subseteq \mathcal{V}$. Even without submodularity, Lemma 6 enables us to maximize (4.6) by a linear search.
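The linear search justified by Lemma 6 can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used in this chapter: the per-device penalty is passed as `eta`, devices succeed independently, and the function names `success_tail`, `expected_reward` and `optimal_offline` are hypothetical. The success term of (4.6) is computed with a small dynamic program over the number of successes instead of enumerating subsets.

```python
def success_tail(probs, threshold):
    """P(at least `threshold` of the independent devices succeed)."""
    dist = [1.0]  # dist[s] = P(exactly s successes among devices seen so far)
    for p in probs:
        nxt = [0.0] * (len(dist) + 1)
        for s, mass in enumerate(dist):
            nxt[s] += mass * (1.0 - p)      # this device misses the deadline
            nxt[s + 1] += mass * p          # this device returns in time
        dist = nxt
    return sum(dist[threshold:])


def expected_reward(probs, threshold, eta):
    """Expected reward of (4.6): success probability minus eta per device."""
    return success_tail(probs, threshold) - eta * len(probs)


def optimal_offline(mu, budget, threshold, eta):
    """Linear search over set sizes, mirroring Algorithm 1.

    mu maps each available device to its (known) success probability.
    Returns the selected devices and their expected reward.
    """
    ranked = sorted(mu, key=mu.get, reverse=True)
    best_set = ranked[:threshold]
    best_u = expected_reward([mu[v] for v in best_set], threshold, eta)
    for n in range(threshold + 1, min(budget, len(ranked)) + 1):
        candidate = ranked[:n]              # top-n devices, optimal by Lemma 6
        u = expected_reward([mu[v] for v in candidate], threshold, eta)
        if u > best_u:
            best_set, best_u = candidate, u
    return best_set, best_u


# Toy instance (assumed numbers): recovery threshold 2, budget 3.
mu = {"a": 0.9, "b": 0.8, "c": 0.6}
print(optimal_offline(mu, budget=3, threshold=2, eta=0.1))
print(optimal_offline(mu, budget=3, threshold=2, eta=0.2))
```

With the smaller penalty the third device is worth adding (the success probability rises from 0.72 to 0.876 at a cost of 0.1), while with the larger penalty the search stops at the two most reliable devices.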
Algorithm 1: Optimal Offline Policy
  Input: $\mathcal{V}^t$, $b^t$, $Y^t$, $\mu(\phi_v^t)$, $v \in \mathcal{V}^t$;
  Initialization: $\mathcal{A} = \emptyset$, $\mathcal{A}_{opt} = \emptyset$, $u_{opt} = 0$;
  Sort $\mu^t$: $\mu(\phi_1^t) \ge \mu(\phi_2^t) \ge \cdots \ge \mu(\phi_{|\mathcal{V}^t|}^t)$;
  $\mathcal{A} \leftarrow \{1, 2, \ldots, Y^t\}$;
  $\mathcal{A}_{opt} \leftarrow \{1, 2, \ldots, Y^t\}$;
  $u_{opt} \leftarrow u(\mu^t, \mathcal{A})$;
  for $z \leftarrow Y^t + 1$ to $b^t$ do
    $\mathcal{A} \leftarrow \mathcal{A} \cup \{z\}$;
    if $u(\mu^t, \mathcal{A}) > u_{opt}$ then
      $\mathcal{A}_{opt} \leftarrow \mathcal{A}$;
      $u_{opt} \leftarrow u(\mu^t, \mathcal{A})$
    end
  end
  return $\mathcal{A}_{opt}$

Let $\{\mathcal{A}^t\}_{t=1}^{T}$ be the offloading decisions derived by a certain policy. The performance of this policy is evaluated by comparing its loss with respect to the optimal offline policy. This loss is called the regret of the policy, which is formally defined as follows:

$$R(T) = \mathbb{E}\Big[\sum_{t=1}^{T} \big(r(\mathcal{A}^{t*}) - r(\mathcal{A}^t)\big)\Big] \qquad (4.12)$$

$$= \sum_{t=1}^{T} \big(u(\mu^t, \mathcal{A}^{t*}) - u(\mu^t, \mathcal{A}^t)\big). \qquad (4.13)$$

In general, the user does not know in advance the success probabilities of the edge devices due to the uncertainty of the environment of the edge network. In the following subsection, we propose an online learning policy for the proposed CC-MAB problem, which enables the user to learn the success probabilities of the edge devices over time by observing the service quality of each selected edge device, and then to make offloading decisions adaptively.

4.3.4 Online Coded Edge Computing Policy

Now, we describe the proposed online coded edge computing policy. The proposed policy has two parameters $h_T$ and $K(t)$ to be designed, where $h_T$ decides how we partition the context space, and $K(t)$ is a deterministic and monotonically increasing function used to identify the under-explored contexts. The proposed online coded edge computing policy (see Algorithm 2) is performed as follows:

Initialization Phase: Given parameter $h_T$, the proposed policy first creates a partition, denoted by $\mathcal{P}_T$, of the context space $\Phi$, which splits $\Phi$ into $(h_T)^D$ sets. Each set is a $D$-dimensional hypercube of size $\frac{1}{h_T} \times \cdots \times \frac{1}{h_T}$. For each hypercube $p \in \mathcal{P}_T$, the user keeps a counter $C^t(p)$, which is the number of selected edge devices that have context $\phi_v^t$ in hypercube $p$ before round $t$.
Moreover, the policy also keeps an estimated success probability, denoted by $\hat{\mu}^t(p)$, for each hypercube $p$. Let $\mathcal{Q}^t(p) = \{q_v^\tau : \phi_v^\tau \in p, \; v \in \mathcal{A}^\tau, \; \tau = 1, \ldots, t - 1\}$ be the set of observed indicators (successful or not) of edge devices with context in $p$ before round $t$. Then, the estimated success probability for edge devices with context $\phi_v^t \in p$ is computed by $\hat{\mu}^t(p) = \frac{1}{C^t(p)} \sum_{q \in \mathcal{Q}^t(p)} q$.

In each round $t$, the proposed policy has the following phases:

Hypercube Identification Phase: Given the contexts of all available edge devices $\phi^t = \{\phi_v^t\}_{v \in \mathcal{V}^t}$, the policy determines the hypercube $p_v^t \in \mathcal{P}_T$ for each context $\phi_v^t$ such that $\phi_v^t$ is in $p_v^t$. We denote by $p^t = \{p_v^t\}_{v \in \mathcal{V}^t}$ the collection of these identified hypercubes in round $t$. To check whether there exist hypercubes $p \in p^t$ that have not been explored sufficiently, we define the under-explored hypercubes in round $t$ as follows:

$$\mathcal{P}_T^{ue,t} = \{p \in \mathcal{P}_T : \exists v \in \mathcal{V}^t, \; \phi_v^t \in p, \; C^t(p) \le K(t)\}. \qquad (4.14)$$

Also, we denote by $\mathcal{V}^{ue,t}$ the set of edge devices that fall in the under-explored hypercubes, i.e., $\mathcal{V}^{ue,t} = \{v \in \mathcal{V}^t : p_v^t \in \mathcal{P}_T^{ue,t}\}$. Depending on $\mathcal{V}^{ue,t}$ in round $t$, the proposed policy then enters either an exploration phase or an exploitation phase.

Exploration Phase: If $\mathcal{V}^{ue,t}$ is non-empty, the policy enters an exploration phase. If $\mathcal{V}^{ue,t}$ contains at least $b^t$ edge devices (i.e., $|\mathcal{V}^{ue,t}| \ge b^t$), then the policy randomly selects $b^t$ edge devices from $\mathcal{V}^{ue,t}$. If $\mathcal{V}^{ue,t}$ contains fewer than $b^t$ edge devices ($|\mathcal{V}^{ue,t}| < b^t$), then the policy selects all edge devices from $\mathcal{V}^{ue,t}$. To fully utilize the budget $b^t$, the remaining $(b^t - |\mathcal{V}^{ue,t}|)$ ones are picked from the edge devices with the highest estimated success probability among the remaining edge devices in $\mathcal{V}^t \setminus \mathcal{V}^{ue,t}$.

Exploitation Phase: If $\mathcal{V}^{ue,t}$ is empty, the policy enters an exploitation phase and selects $\mathcal{A}^t$ using the optimal offline policy based on the estimated success probabilities $\hat{\mu}^t = \{\hat{\mu}^t(p_v^t)\}_{v \in \mathcal{V}^t}$.
Update Phase: After selecting the edge devices, the proposed policy observes whether each selected edge device returns the result within the deadline; then, it updates $\hat{\mu}^t(p_v^t)$ and $C^t(p_v^t)$ of each hypercube $p_v^t \in \mathcal{P}_T$.

The following example illustrates how the policy works given parameters $h_T$ and $K(t)$.

Example. Consider the edge computing network in which the success probability of an edge device is given by the shifted exponential distribution defined in the example of Section 4.2.2. It can be shown that the Hölder condition with $\alpha = 1$ holds. Then, we have parameters $h_T = \lceil T^{1/6} \rceil$ and $K(t) = t^{1/3} \log(t)$. We assume that the online coded edge computing policy is run over time horizon $T = 1000$. Then, we have $h_T = 4$. Before running the policy, we create $\mathcal{P}_T$ by partitioning the domain of each context dimension (i.e., deadline, shift parameter and straggling parameter) into $h_T = 4$ intervals, which generates a total of 64 sets. We keep a counter $C^t(p)$ for each generated hypercube $p \in \mathcal{P}_T$. In the hypercube identification phase, if there exists an edge device $v$ whose context $\phi_v^t$ is located in hypercube $p$ and the counter $C^t(p)$ is no larger than $K(t)$, then $p$ is an under-explored hypercube. The policy then proceeds to either the exploration phase or the exploitation phase, depending on whether an under-explored hypercube exists.

4.4 Asymptotic Optimality of Online Coded Edge Computing Policy

In this section, by providing the design of policy parameters $h_T$ and $K(t)$, we show that the online coded edge computing policy achieves a regret that is sublinear in the time horizon $T$, which guarantees asymptotically optimal performance, i.e., $\lim_{T \to \infty} \frac{R(T)}{T} = 0$.
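The phases of the policy described in Section 4.3.4 (hypercube identification, exploration, exploitation and update) can be sketched in Python as follows. This is a minimal sketch under stated assumptions rather than the implementation evaluated in this chapter: contexts are assumed to be already normalized to $[0, 1]^D$, the Hölder exponent is passed as `alpha`, and the names `policy_parameters`, `OnlineCodedPolicy` and the `offline_solver` callback (standing in for the optimal offline policy of Algorithm 1) are hypothetical.

```python
import math
import random


def policy_parameters(T, D, alpha=1.0):
    """Return h_T and K(t) for horizon T, context dimension D, exponent alpha."""
    h_T = math.ceil(T ** (1.0 / (3 * alpha + D)))

    def K(t):
        return t ** (2 * alpha / (3 * alpha + D)) * math.log(t)

    return h_T, K


class OnlineCodedPolicy:
    def __init__(self, T, D, alpha=1.0):
        self.h_T, self.K = policy_parameters(T, D, alpha)
        self.count = {}    # counter C(p) per visited hypercube
        self.mu_hat = {}   # estimated success probability per hypercube

    def cube(self, context):
        """Index of the hypercube that contains a context in [0, 1]^D."""
        return tuple(min(int(x * self.h_T), self.h_T - 1) for x in context)

    def select(self, t, contexts, budget, offline_solver):
        """contexts maps each available device to its context tuple."""
        cubes = {v: self.cube(c) for v, c in contexts.items()}
        under = [v for v, p in cubes.items()
                 if self.count.get(p, 0) <= self.K(t)]
        if under:  # exploration phase
            if len(under) >= budget:
                return random.sample(under, budget)
            rest = sorted((v for v in cubes if v not in under),
                          key=lambda v: self.mu_hat.get(cubes[v], 0.0),
                          reverse=True)
            return under + rest[:budget - len(under)]
        # exploitation phase: offline policy on the estimated probabilities
        estimates = {v: self.mu_hat.get(cubes[v], 0.0) for v in cubes}
        return offline_solver(estimates, budget)

    def update(self, contexts, outcomes):
        """outcomes maps each selected device to 1 (in time) or 0 (late)."""
        for v, q in outcomes.items():
            p = self.cube(contexts[v])
            c = self.count.get(p, 0)
            self.mu_hat[p] = (self.mu_hat.get(p, 0.0) * c + q) / (c + 1)
            self.count[p] = c + 1


h_T, K = policy_parameters(T=1000, D=3)
print(h_T, h_T ** 3)
```

For $T = 1000$, $D = 3$ and exponent 1 this reproduces the values of the example in Section 4.3.4: $h_T = 4$ and $4^3 = 64$ hypercubes. Storing counters only for visited hypercubes in a dictionary avoids allocating all $(h_T)^D$ cells up front, which is a design choice of this sketch.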
Algorithm 2: Online Coded Edge Computing Policy
  Input: $T$, $h_T$, $K(t)$;
  Initialization: $\mathcal{P}_T$; $C(p) = 0$, $\hat{\mu}(p) = 0$, $\forall p \in \mathcal{P}_T$;
  for $t \leftarrow 1$ to $T$ do
    Observe edge devices $\mathcal{V}^t$ and contexts $\phi^t$;
    Find $p^t = \{p_v^t\}_{v \in \mathcal{V}^t}$, $p_v^t \in \mathcal{P}_T$ such that $\phi_v^t \in p_v^t$;
    Identify $\mathcal{P}_T^{ue,t}$ and $\mathcal{V}^{ue,t}$;
    if $\mathcal{P}_T^{ue,t} \ne \emptyset$ then
      if $|\mathcal{V}^{ue,t}| \ge b^t$ then
        $\mathcal{A}^t \leftarrow$ randomly pick $b^t$ edge devices in $\mathcal{V}^{ue,t}$;
      else
        $\mathcal{A}^t \leftarrow$ pick all edge devices in $\mathcal{V}^{ue,t}$ and the other $(b^t - |\mathcal{V}^{ue,t}|)$ ones with the largest $\hat{\mu}(p_v^t)$ in $\mathcal{V}^t \setminus \mathcal{V}^{ue,t}$
      end
    else
      $\mathcal{A}^t \leftarrow$ obtained by Algorithm 1 based on $\hat{\mu}^t$ and $b^t$
    end
    for each edge device $v \in \mathcal{A}^t$ do
      Observe $q_v^t$ of edge device $v$;
      Update $\hat{\mu}(p_v^t) = \frac{\hat{\mu}(p_v^t) C(p_v^t) + q_v^t}{C(p_v^t) + 1}$;
      Update $C(p_v^t) = C(p_v^t) + 1$;
    end
  end

To conduct the regret analysis for the proposed CC-MAB problem, we make the following assumption on the success probabilities of edge devices: devices with similar contexts have similar success probabilities. This natural property is formalized by the Hölder condition, defined as follows:

Assumption 1 (Hölder Condition). A real function $f$ on $D$-dimensional Euclidean space satisfies a Hölder condition when there exist $L > 0$ and $\alpha > 0$ such that, for any two contexts $\phi, \phi' \in \Phi$, $|f(\phi) - f(\phi')| \le L \|\phi - \phi'\|^{\alpha}$, where $\|\cdot\|$ is the Euclidean norm.

Under Assumption 1, we choose parameters $h_T = \lceil T^{\frac{1}{3\alpha + D}} \rceil$ for the partition of the context space and $K(t) = t^{\frac{2\alpha}{3\alpha + D}} \log(t)$ in round $t$ for identifying the under-explored hypercubes of the context. We present the following theorem, which shows that the proposed online coded edge computing policy has a sublinear regret upper bound.

Theorem 5 (Regret Upper Bound). Let $K(t) = t^{\frac{2\alpha}{3\alpha + D}} \log(t)$ and $h_T = \lceil T^{\frac{1}{3\alpha + D}} \rceil$. If the Hölder condition holds, the regret $R(T)$ is upper-bounded as follows:

$$R(T) \le (1 + \eta B) 2^D \Big(T^{\frac{2\alpha + D}{3\alpha + D}} \log(T) + T^{\frac{D}{3\alpha + D}}\Big) + (1 + \eta B) B \frac{\pi^2}{3} \sum_{k=1}^{B} \binom{|\mathcal{V}|}{k} + \Big(3 L D^{\frac{\alpha}{2}} + \frac{6\alpha + 2D}{2\alpha + D}\Big) B M T^{\frac{2\alpha + D}{3\alpha + D}},$$

where $B = \max_{1 \le t \le T} b^t$ and $M = \max_{1 \le t \le T} \binom{B - 1}{Y^t - 1}$. The dominant order of the regret $R(T)$ is $O\big(T^{\frac{2\alpha + D}{3\alpha + D}} \log(T)\big)$, which is sublinear in $T$.

Proof. We first define the following terms.
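For each hypercube $p \in \mathcal{P}_T$, we define $\bar{\mu}(p) = \sup_{\phi \in p} \mu(\phi)$ and $\underline{\mu}(p) = \inf_{\phi \in p} \mu(\phi)$ as the best and worst success probabilities over all contexts $\phi \in p$. Also, we define the context at the center of a hypercube $p$ as $\tilde{\phi}_p$ and its success probability as $\tilde{\mu}(p) = \mu(\tilde{\phi}_p)$. Given a set of available edge devices $\mathcal{V}^t$, the corresponding context set $\phi^t = \{\phi_v^t\}_{v \in \mathcal{V}^t}$ and the corresponding hypercube set $\mathcal{P}^t = \{p_v^t\}_{v \in \mathcal{V}^t}$ for each round $t$, we also define $\bar{\mu}^t = \{\bar{\mu}(p_v^t)\}_{v \in \mathcal{V}^t}$, $\underline{\mu}^t = \{\underline{\mu}(p_v^t)\}_{v \in \mathcal{V}^t}$ and $\tilde{\mu}^t = \{\tilde{\mu}(p_v^t)\}_{v \in \mathcal{V}^t}$. For each round $t$, we define the set $\tilde{\mathcal{A}}^t$, which satisfies

$$\tilde{\mathcal{A}}^t = \arg\max_{\mathcal{A} \subseteq \mathcal{V}^t, \; |\mathcal{A}| \le b^t} u(\tilde{\mu}^t, \mathcal{A}). \qquad (4.15)$$

We then use the set $\tilde{\mathcal{A}}^t$ to identify the sets of edge devices that are bad to select. We define

$$\mathcal{L}^t = \big\{G : G \subseteq \mathcal{V}^t, \; |G| \le b^t, \; u(\underline{\mu}^t, \tilde{\mathcal{A}}^t) - u(\bar{\mu}^t, G) \ge A t^{\theta}\big\}$$

to be the set of suboptimal subsets of arms for hypercube set $\mathcal{P}^t$, where $A > 0$ and $\theta < 0$ are parameters that will be used later in the regret analysis. We call a subset $G \in \mathcal{L}^t$ suboptimal and a subset in $\mathcal{A}_b^t \setminus \mathcal{L}^t$ near-optimal for $\mathcal{P}^t$, where $\mathcal{A}_b^t$ denotes the set of all subsets of $\mathcal{V}^t$ with size at most $b^t$. Then the expected regret $R(T)$ can be divided into three summands:

$$R(T) = \mathbb{E}[R_e(T)] + \mathbb{E}[R_s(T)] + \mathbb{E}[R_n(T)], \qquad (4.16)$$

where $\mathbb{E}[R_e(T)]$ is the regret due to exploration phases, and $\mathbb{E}[R_s(T)]$ and $\mathbb{E}[R_n(T)]$ both correspond to regret in exploitation phases: $\mathbb{E}[R_s(T)]$ is the regret due to suboptimal choices, i.e., when subsets of edge devices from $\mathcal{L}^t$ are selected; $\mathbb{E}[R_n(T)]$ is the regret due to near-optimal choices, i.e., when subsets of edge devices from $\mathcal{A}_b^t \setminus \mathcal{L}^t$ are selected. In the following, we prove that each of the three summands is bounded.

First, the following lemma (see the proof in Appendix M) gives a bound for $\mathbb{E}[R_e(T)]$, which depends on the choice of the two parameters $z$ and $\gamma$.

Lemma 7 (Bound for $\mathbb{E}[R_e(T)]$). Let $K(t) = t^z \log(t)$ and $h_T = \lceil T^{\gamma} \rceil$, where $0 < z < 1$ and $0 < \gamma < \frac{1}{D}$. If the algorithm is run with these parameters, the regret $\mathbb{E}[R_e(T)]$ is bounded by

$$\mathbb{E}[R_e(T)] \le (1 + \eta B) 2^D \big(T^{z + \gamma D} \log(T) + T^{\gamma D}\big), \qquad (4.17)$$

where $B = \max_{1 \le t \le T} b^t$.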
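As a worked instance of the exponents in (4.17), the following short derivation substitutes the parameter values that Theorem 5 adopts. The notation is an assumption of this sketch: $\gamma$ denotes the exponent in $h_T = \lceil T^{\gamma} \rceil$, $z$ the exponent in $K(t) = t^z \log(t)$, and $\alpha$ the Hölder exponent, with $z = \frac{2\alpha}{3\alpha + D}$ and $\gamma = \frac{1}{3\alpha + D}$.

```latex
\begin{align*}
z + \gamma D &= \frac{2\alpha}{3\alpha + D} + \frac{D}{3\alpha + D}
              = \frac{2\alpha + D}{3\alpha + D} < 1, \\
\gamma D     &= \frac{D}{3\alpha + D} < 1, \\
\alpha = 1,\ D = 3 &: \quad z + \gamma D = \tfrac{5}{6}, \qquad \gamma D = \tfrac{1}{2}.
\end{align*}
```

Both powers of $T$ are strictly smaller than 1, so the exploration regret bound is sublinear; in the case $\alpha = 1$, $D = 3$ used in this chapter's examples it grows as $O(T^{5/6} \log(T))$.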
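Next, the following lemma (see the proof in Appendix N) gives a bound for $\mathbb{E}[R_s(T)]$, which depends on the choice of $z$ and $\gamma$ with an additional condition on these parameters that has to be satisfied.

Lemma 8 (Bound for $\mathbb{E}[R_s(T)]$). Let $K(t) = t^z \log(t)$ and $h_T = \lceil T^{\gamma} \rceil$, where $0 < z < 1$ and $0 < \gamma < \frac{1}{D}$. If the algorithm is run with these parameters, Assumption 1 holds, and the additional condition $2 B M t^{-\frac{z}{2}} \le A t^{\theta}$ is satisfied for all $1 \le t \le T$, the regret $\mathbb{E}[R_s(T)]$ is bounded by

$$\mathbb{E}[R_s(T)] \le (1 + \eta B) B \frac{\pi^2}{3} \sum_{k=1}^{B} \binom{|\mathcal{V}|}{k}, \qquad (4.18)$$

where $B = \max_{1 \le t \le T} b^t$ and $M = \max_{1 \le t \le T} \binom{B - 1}{Y^t - 1}$.

Lastly, the following lemma (see the proof in Appendix O) gives a bound for $\mathbb{E}[R_n(T)]$, which depends on the choice of $z$ and $\gamma$.

Lemma 9 (Bound for $\mathbb{E}[R_n(T)]$). Let $K(t) = t^z \log(t)$ and $h_T = \lceil T^{\gamma} \rceil$, where $0 < z < 1$ and $0 < \gamma < \frac{1}{D}$. If the algorithm is run with these parameters and Assumption 1 holds, the regret $\mathbb{E}[R_n(T)]$ is bounded by

$$\mathbb{E}[R_n(T)] \le 3 B M L D^{\frac{\alpha}{2}} T^{1 - \gamma\alpha} + \frac{A}{1 + \theta} T^{1 + \theta}, \qquad (4.19)$$

where $B = \max_{1 \le t \le T} b^t$ and $M = \max_{1 \le t \le T} \binom{B - 1}{Y^t - 1}$.

Now, let $K(t) = t^z \log(t)$ and $h_T = \lceil T^{\gamma} \rceil$, where $0 < z < 1$ and $0 < \gamma < \frac{1}{D}$; let $H(t) = B M t^{-\frac{z}{2}}$. Also, we assume that Assumption 1 holds and that the additional condition $2 B M t^{-\frac{z}{2}} \le A t^{\theta}$ is satisfied for all $1 \le t \le T$. By Lemmas 7, 8, and 9, the regret $R(T)$ is bounded as follows:

$$R(T) \le (1 + \eta B) 2^D \big(T^{z + \gamma D} \log(T) + T^{\gamma D}\big) + (1 + \eta B) B \frac{\pi^2}{3} \sum_{k=1}^{B} \binom{|\mathcal{V}|}{k} + 3 B M L D^{\frac{\alpha}{2}} T^{1 - \gamma\alpha} + \frac{A}{1 + \theta} T^{1 + \theta}. \qquad (4.20)$$

Now, we select the parameters $z, \gamma, A, \theta$ according to the following values: $z = \frac{2\alpha}{3\alpha + D} \in (0, 1)$, $\gamma = \frac{1}{3\alpha + D} \in (0, \frac{1}{D})$, $\theta = -\frac{\alpha}{3\alpha + D}$ and $A = 2 B M$. It is clear that the condition $2 B M t^{-\frac{z}{2}} \le A t^{\theta}$ is satisfied. Then, the regret $R(T)$ can be bounded as follows:

$$R(T) \le (1 + \eta B) 2^D \Big(T^{\frac{2\alpha + D}{3\alpha + D}} \log(T) + T^{\frac{D}{3\alpha + D}}\Big) + (1 + \eta B) B \frac{\pi^2}{3} \sum_{k=1}^{B} \binom{|\mathcal{V}|}{k} + \Big(3 L D^{\frac{\alpha}{2}} + \frac{6\alpha + 2D}{2\alpha + D}\Big) B M T^{\frac{2\alpha + D}{3\alpha + D}}, \qquad (4.21)$$

which has the dominant order $O\big(T^{\frac{2\alpha + D}{3\alpha + D}} \log(T)\big)$.

Remark 9. Based on Assumption 1, the parameters $h_T$ and $K(t)$ are designed such that the regret achieved by the policy is sublinear, as stated in Theorem 5. In the following, we provide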
Inthefollowing, weprovide some intuitions behind the choices of h T and K(T ). We first assume that the parameters are chosen as h T =dT e and K(t) = t z log (t), in which and z are designed later. In the proof of Lemma 7, to boundE[R e (T )], our main task is designingT z+ D log (T ) +T D to be a sublinear term. In the proof of Lemma 8, one of key steps is to bound Pr(V t G ;W t ) by the term oft 2 (see equation (42)) such thatE[R s (T )] is bounded by the term of P 1 t=1 t 2 which converges to a constant (see equation (46)). In particular, we first bound Pr(E 1 ) and Pr(E 2 ) by the terms of exp 2t z log (t)H(t) 2 B 2 M 2 (see equation (37)), and choose H(t) to be BMt z 2 . By Lemma 9, we boundE[R n ] by 3BMLD 2 T 1 + A 1+ T 1+ which can be sublinear by selecting the appropriate andz. By carefully selecting parametersh T =dT 1 3+D e andK(t) =t 2 3+D , the regret upper bound is shown to be subliear in Theorem 5. 92 4.5 Experiments In this section, we demonstrate the impact of the online coded edge computing policy by simulation studies. In particular, we carry out extensive simulations using the shifted ex- ponential models which have been demonstrated to be a good model for Amazon EC2 clus- ters [86]. Given a dataset partitioned to X 1 ;X 2 ;:::;X 5 , we consider the linear regression problem using the gradient algorithm. It computes the gradient of quadratic loss function 1 2 kX j ~ w t ~ y j k 2 with respect to the weight vector ~ w t in round t, i.e., f t (X j ) = X > j (X j ~ w t ~ y j ) for all 1j 5. The computation is executed over a set of edge devicesV, where each edge device 2V stores an encoded data chunk ~ X using Lagrange coding scheme. In such setting, we have the optimal recovery threshold Y t = 9. The penalty parameter is 0:01. 
Motivated by the distribution model proposed in [86] for total execution time in cloud networks, we model the success probability of each edge device $v \in \mathcal{V}$ as a shifted exponential function defined as follows:

$$\mu(\phi_v^t) = \mathbb{P}(c_v^t \le d^t) = \begin{cases} 1 - e^{-\lambda_v^t (d^t - a_v^t)}, & d^t \ge a_v^t, \\ 0, & a_v^t > d^t \ge 0, \end{cases} \qquad (4.22)$$

where the context of each edge device consists of the deadline $d^t$, the shift parameter $a_v^t > 0$, and the straggling parameter $\lambda_v^t > 0$ associated with edge device $v$. Under this model, the dimension of the context space $D$ is 3. Moreover, for the function $\mu$ defined in (4.22), it can be shown that the Hölder condition with $\alpha = 1$ holds. Thus, we run the online coded edge computing policy with parameters $h_T = \lceil T^{1/6} \rceil$ and $K(t) = t^{1/3} \log(t)$. By the empirical analysis in [86], an instance of type r4.2xlarge is shown to have shift parameter $a = 1.37$ and straggling parameter $\lambda = 120$, and an instance of type r4.xlarge has shift parameter $a = 2$ and straggling parameter $\lambda = 115$. Based on these real-world parameters for Amazon EC2 clusters, the deadline $d^t \in [d_{\min}, d_{\max}]$ (sec), the shift parameter $a_v^t \in [1.37, 2]$ (sec), and the straggling parameter $\lambda_v^t \in [115, 120]$ (1/sec) are chosen uniformly at random in each round $t$.

Figure 4.2: Numerical evaluations for cumulative reward for Scenario 1. (Axes: round $t$ versus cumulative reward. Curves: online coded computing policy, optimal offline policy, LinUCB, UCB, Random.)

Figure 4.3: Numerical evaluations for cumulative reward for Scenario 2. (Axes: round $t$ versus cumulative reward. Curves: online coded computing policy, optimal offline policy, LinUCB, UCB, Random.)

We consider the following four scenarios for the simulations:

• Scenario 1: $|\mathcal{V}| = 20$, $(d_{\min}, d_{\max}) = (1, 2)$, and $b^t = 12$.
• Scenario 2: $|\mathcal{V}| = 15$, $(d_{\min}, d_{\max}) = (1, 2)$, and $b^t = 12$.
• Scenario 3: $|\mathcal{V}| = 20$, $(d_{\min}, d_{\max}) = (1, 2)$, and $b^t = 15$.
• Scenario 4: $|\mathcal{V}| = 20$, $(d_{\min}, d_{\max}) = (0.5, 3)$, and $b^t = 12$.

For each scenario, the following benchmarks are compared with the online coded edge computing policy:

1. Optimal offline policy: Assuming knowledge of the success probability of each edge device in each round, the optimal set of edge devices is selected via Algorithm 1.
2. LinUCB [57]: LinUCB is a context-aware bandit algorithm that picks one arm in each round. We obtain a set of edge devices by running LinUCB $b^t$ times. By sequentially removing selected edge devices, we ensure that the $b^t$ chosen edge devices are distinct.
3. UCB [8]: The UCB algorithm is a non-contextual and non-combinatorial algorithm. Similar to LinUCB, we repeat UCB $b^t$ times to select edge devices.
4. Random: A set of edge devices of size $b^t$ is selected randomly from the available edge devices in each round $t$.

Fig. 4.2 to Fig. 4.5 compare the cumulative rewards of the online coded edge computing policy with those of the 4 benchmarks. We draw the following conclusions from Fig. 4.2 to Fig. 4.5:

• The optimal offline policy achieves the highest reward, which gives an upper bound for the other policies. After a period of exploration, the proposed online policy is able to exploit the learned knowledge, and the cumulative reward approaches the upper bound.
• The proposed online coded edge computing policy significantly outperforms the other benchmarks by taking into account the context of the edge computing network.
• The Random and UCB algorithms are not effective since they do not take the context into account in their decisions. Although LinUCB is a context-aware algorithm, it achieves a cumulative regret similar to that of the Random and UCB algorithms. That is because the
That is because the success probability model is more general here than the linear functions that LinUCB is tailored for.

[Figure 4.4: Numerical evaluations for cumulative reward for Scenario 3. Axes: round $t$ versus cumulative reward. Curves: online coded computing policy, optimal offline policy, LinUCB, UCB, Random.]

[Figure 4.5: Numerical evaluations for cumulative reward for Scenario 4. Axes: round $t$ versus cumulative reward. Curves: online coded computing policy, optimal offline policy, LinUCB, UCB, Random.]

Fig. 4.6 presents the expected regret of the proposed policy for Scenario 1. We can conclude that the proposed policy achieves a regret that is sublinear in the time horizon $T$, which demonstrates its asymptotic optimality, i.e., $\lim_{T \to \infty} R(T)/T = 0$.

[Figure 4.6: Expected regret of the online coded edge computing policy for Scenario 1. Axes: round $t$ versus expected regret.]

4.6 Concluding Remarks and Future Directions

Motivated by the volatility of edge devices' computing capabilities and quality of service, and by the increasing demand for timely event-driven computations, we consider the problem of online computation offloading over unknown edge cloud networks without knowledge of the edge devices' capabilities. Under the coded computing framework, we formulate a combinatorial-contextual multi-armed bandit (CC-MAB) problem, which aims to maximize the cumulative expected reward. We propose the online coded edge computing policy, which provably achieves asymptotically optimal performance in terms of timely throughput, since the regret loss for the proposed CC-MAB problem compared with the optimal offline policy is sublinear.
Finally, we show via numerical studies that the proposed online coded edge computing policy significantly improves the cumulative reward compared to the other benchmarks.

Chapter 5
Coded Computing for Secure Boolean Computations

5.1 Introduction

With the growing size of modern datasets for applications such as machine learning and data science, it is necessary to partition a massive computation into smaller computations and perform these smaller computations in a distributed manner to improve overall performance [2]. However, distributing the computations to external entities that are not necessarily trusted, i.e., adversarial servers, makes security a major concern [13, 25, 17]. Thus, it is important to provide security against adversarial workers that deliberately send erroneous data in order to affect the computation for their benefit.

Boolean functions are primarily used in the design of cryptographic algorithms [26]. In particular, computing Boolean functions is one of the key components of blockchains. In blockchain systems, Boolean functions can be used to represent the verification functions which validate the transactions in newly proposed blocks [58]. Specifically, each node computes a function is_valid_txn $\in$ {True, False} to determine whether a transaction is valid or not [20]. Due to the heavy computation cost incurred by validating all the blocks, nodes with limited resources cannot verify all the blocks independently. To improve the efficiency (e.g., the number of transactions verified by the system), the leading solution is sharding [64], whose idea is to partition the blockchain into sub-chains so that the block validations are executed distributively across the nodes.

In this chapter, we consider the problem of computing a Boolean function (e.g., block validation) in which the computation is carried out distributively across several workers, with particular focus on security against Byzantine workers.
Specifically, using a master-worker distributed computing system with $N$ workers, the goal is to compute the Boolean function $f : \{0,1\}^m \to \{0,1\}$ over a dataset of $K$ samples $X_1, \ldots, X_K$, i.e., $f(X_1), \ldots, f(X_K)$, in which the (encoded) datasets are prestored in the workers such that the computations can be secure against adversarial workers in the system. In particular, we consider the adversarial model in which the malicious workers do not have any computational restriction and are capable of sending erroneous data. To measure the robustness against adversaries of a given scheme $S$, we use the metric security threshold $\beta_S$, which is defined as the maximum number of adversarial workers that can be tolerated by the master, i.e., the correct results can be recovered even if there are up to $\beta_S$ adversarial workers.

Any Boolean function can be modeled as an Algebraic normal form (i.e., a multivariate polynomial) [26]. Thus, the recently proposed Lagrange Coded Computing (LCC) [112], a universal encoding technique for arbitrary multivariate polynomial computations, can be used to simultaneously alleviate the issues of resiliency, security, and privacy. In overview, for the problem of computing an arbitrary multivariate polynomial $f : \mathbb{V} \to \mathbb{U}$ over a field $\mathbb{F}$, LCC encodes $X_1, \ldots, X_K \in \mathbb{V}$ by evaluating the well-known Lagrange polynomial, and each piece of encoded data is stored in a different worker. The workers then apply the multivariate polynomial of interest $f$ (e.g., a Boolean function) to their encoded data and return the computation results to the master. Since the computation executed in each worker can be viewed as a composition of a multivariate polynomial and a univariate polynomial, the problem becomes a polynomial interpolation with errors and erasures. The master recovers the computation by evaluating the interpolated polynomial at the appropriately chosen points.
The security threshold provided by LCC is $\frac{N - (K-1)\deg f - 1}{2}$ (given $N$ and $K$), which can be extremely low if the degree $\deg f$ of the corresponding multivariate polynomial is high (see more details in Section 5.3). This degree problem can be further amplified in complex Boolean functions, whose degree can be high in general. Thus, our main problem is as follows:

What is the maximum possible security threshold and the corresponding scheme, given $f$, $N$ and $K$?

[Figure 5.1: Modeling the Boolean function as a general polynomial can result in the high-degree difficulty which makes the security threshold low by using LCC encoding. The main idea of our proposed approach is to model it as the concatenation of some low-degree polynomials and the threshold functions. Axes: $x$ and $y$, with levels $1$ and $-1$ marked.]

5.1.1 Main Contributions

As the main contributions of this chapter, instead of modeling the Boolean function as a general polynomial, we propose three schemes that model it as the concatenation of some low-degree polynomials and threshold functions (see Figure 5.1). To illustrate the main idea of the proposed schemes, consider an AND function of three input bits $X[1], X[2], X[3]$, which is formally defined by $f(X) = X[1] \wedge X[2] \wedge X[3]$. The function $f$ can be modeled as a polynomial function (Algebraic normal form) $X[1]X[2]X[3]$, which has degree 3. For this polynomial, LCC achieves the security threshold $\frac{N - 3(K-1) - 1}{2}$. Instead of directly computing the degree-3 polynomial, our proposed approach is to model it as a linear threshold function $\operatorname{sgn}(X[1] + X[2] + X[3] - \frac{5}{2})$, in which $f(X) = 1$ if and only if $\operatorname{sgn}(X[1] + X[2] + X[3] - \frac{5}{2}) > 0$. Then, a simple linear code (e.g., an $(N, K)$ MDS code) can be used for computing the linear function $X[1] + X[2] + X[3]$, which provides the optimal security threshold $\frac{N-K}{2}$.

We propose three different schemes called coded Algebraic normal form (ANF), coded Disjunctive normal form (DNF) and coded polynomial threshold function (PTF).
The idea behind coded ANF (DNF) is to first decompose the Boolean function into monomials (clauses) and then construct a linear threshold function for each monomial (clause). For both coded ANF and coded DNF, an $(N, K)$ MDS code is used to encode the datasets. On the other hand, the proposed coded PTF models the Boolean function as a low-degree polynomial threshold function, and LCC is used for the data encoding.

For any general Boolean function $f$, the proposed coded ANF and coded DNF achieve the security threshold $\frac{N-K}{2}$, which is independent of $\deg f$. In terms of security threshold, we prove that coded ANF and coded DNF are optimal by deriving a matching theoretical outer bound. To demonstrate the impact of coded ANF and coded DNF, we consider the problem of computing an 8-bit S-box, as used in block ciphers, on a distributed computing system with 100 workers in Appendix A. We show that coded ANF and coded DNF can significantly improve the security threshold by 150% as compared to LCC.

In Table 5.1, we summarize the performance comparison of LCC and the proposed three schemes in terms of the security threshold and the decoding complexity. As compared to LCC, coded ANF and coded DNF provide a substantial improvement in the security threshold. In particular, coded ANF has the decoding complexity $O(r(f) N \log^2 N \log\log N)$, which works well for Boolean functions with low sparsity $r(f)$; coded DNF has the decoding complexity $O(w(f) N \log^2 N \log\log N)$, which works well for Boolean functions with small weight $w(f)$ (see the definitions of $r(f)$ and $w(f)$ in Section 5.2).
For Boolean functions with polynomial-size $r(f)$ and $w(f)$, coded PTF outperforms LCC by achieving a better security threshold and an almost linear decoding complexity which is independent of $m$ (see more details in Section 5.6).

Scheme | Security Threshold | Decoding Complexity
LCC | $\frac{N - (K-1)\deg f - 1}{2}$ | $O(mN \log^3 N \log\log N)$
Coded ANF | $\frac{N-K}{2}$ | $O(r(f) N \log^2 N \log\log N)$
Coded DNF | $\frac{N-K}{2}$ | $O(w(f) N \log^2 N \log\log N)$
Coded PTF | $\frac{N - (K-1)(\lfloor \log_2 w(f) \rfloor + 1) - 1}{2}$ | $O(N \log^2 N \log\log N)$
Outer Bound | $\frac{N-K}{2}$ | -

Table 5.1: Performance comparison of LCC and the proposed three schemes for the Boolean function $f : \{0,1\}^m \to \{0,1\}$ which has the sparsity $r(f)$ and weight $w(f)$.

We also extend the problem to a more general computation model in Appendix B, i.e., $f$ is a multivariate polynomial function. To resolve the high-degree difficulty arising in computing general polynomials, we propose two schemes: coded data logarithm and coded data augmentation. By taking the logarithm of the original data, the proposed coded data logarithm scheme reduces the degree of polynomial computations, and improves the security threshold as compared to LCC. On the other hand, the proposed coded data augmentation scheme pre-stores some low-degree monomials in advance to reduce the degree of the polynomial computation.

5.1.2 Related Prior Work

Next, we provide a brief literature review that covers two main lines of work: polynomial threshold functions representing Boolean functions, and coded computing.

The expressive power of real polynomial threshold functions for representing Boolean functions has been extensively studied over the decades. The study of representing Boolean functions by polynomial threshold functions was initiated in [87, 14, 68]. Subsequent works focused largely on the degree of PTF needed to represent a Boolean function (e.g., [73, 7, 77, 76, 19]), and the density of PTF needed to represent a Boolean function (e.g., [78, 76, 4, 89]).
Polynomial threshold functions also play a vital role in complexity theory and learning theory (e.g., [46, 3]).

Coded computing broadly refers to a family of techniques that utilize coding to inject computation redundancy in order to alleviate the various issues that arise in large-scale distributed computing. In the past few years, coded computing has had tremendous success in various problems, such as straggler mitigation and bandwidth reduction (e.g., [54, 62, 30, 53, 111, 96, 61, 70, 60, 82, 36]). Coded computing has also been expanded in various directions, such as heterogeneous networks (e.g., [86]), partial stragglers (e.g., [34]), secure and private computing (e.g., [21, 112, 43, 74, 93, 91, 110, 94]), distributed optimization (e.g., [45]), federated learning (e.g., [92, 84, 83]), blockchains (e.g., [108, 58]) and dynamic networks (e.g., [105, 106, 104]). So far, research in coded computing has focused on developing frameworks for some linear functions (e.g., matrix multiplications). However, there has been no work prior to ours that considers coded computing for Boolean functions. In this chapter, we make substantial progress in improving the security threshold by proposing coded ANF, coded DNF and coded PTF, which leverage the idea of the threshold function representation.

Notation. For the Boolean logical operations, we denote the logical operators AND, OR, XOR and NOT by $\wedge$, $\vee$, $\oplus$ and $\neg$, respectively.

5.2 System Model

We consider the problem of evaluating a Boolean function $f : \{0,1\}^m \to \{0,1\}$ over a dataset $\tilde{X} = (X_1, \ldots, X_K)$, where $X_1, \ldots, X_K$ are $m$-dimensional vectors over the field $\{0,1\}$. Given a distributed computing environment with a master and $N$ workers, our goal is to compute $f(X_1), \ldots, f(X_K)$.
Each Boolean function $f : \{0,1\}^m \to \{0,1\}$ can be represented by an Algebraic normal form (ANF) [26, 75] as follows:

\[
f(X) = \bigoplus_{S \subseteq [m]} \mu_f(S) \prod_{j \in S} X[j],
\tag{5.1}
\]

where $X[j]$ is the $j$-th bit of data $X$ and $\mu_f(S) \in \{0,1\}$ is the ANF coefficient of the corresponding monomial $\prod_{j \in S} X[j]$. The total degree (footnote 1) of the ANF representation of the Boolean function $f$ is denoted by $\deg f$. We denote the sparsity (number of monomials) of $f$ by $r(f)$, i.e., $r(f) = \sum_{S \subseteq [m]} \mu_f(S)$. Since each monomial in the ANF has degree up to $\deg f$, the total complexity of computing $f(X_1), \ldots, f(X_K)$ via the ANF of $f$ is $O(K r(f) \deg f)$.

Furthermore, we denote the support of $f$ by $\mathrm{Supp}(f)$, which is the set of vectors in $\{0,1\}^m$ such that $f = 1$, i.e., $\mathrm{Supp}(f) = \{X \in \{0,1\}^m : f(X) = 1\}$. Let $w(f)$ be the weight of the Boolean function $f$, defined by $w(f) = |\mathrm{Supp}(f)|$.

Alternatively, each Boolean function $f$ can be represented by a Disjunctive normal form (DNF) as follows:

\[
f = T_1 \vee T_2 \vee \cdots \vee T_{w(f)},
\tag{5.2}
\]

where each clause $T_i$ has $m$ literals (footnote 2) and corresponds to an input $Y_i$ such that $f(Y_i) = 1$. For example, if $Y_i = 001$, then the corresponding clause is $\neg Y_i[1] \wedge \neg Y_i[2] \wedge Y_i[3]$. Since each clause of the DNF has $m$ literals, the total complexity of computing $f(X_1), \ldots, f(X_K)$ via the DNF of $f$ is $O(K m w(f))$.

Footnote 1: The total degree of a multivariate polynomial is the maximum among all the total degrees of its monomials.
Footnote 2: A literal is a Boolean variable or the complement of a Boolean variable.

Prior to computation, each worker has already stored a fraction of the dataset in a possibly coded manner. Specifically, each worker $n$ stores $\tilde{X}_n = g_n(X_1, \ldots, X_K)$, where $g_n : \underbrace{\{0,1\}^m \times \cdots \times \{0,1\}^m}_{K} \to \mathbb{U}$ is the encoding function of worker $n$ and $\mathbb{U}$ is an arbitrary vector space. We restrict our attention to linear encoding schemes, which guarantee low encoding complexity. Each worker $n$ computes $h(\tilde{X}_n)$ and returns the result to the master, in which $h$ is the multivariate polynomial function decided by the master and $f(X)$ is a function of $h(X)$.
Then, the master aggregates the results from the workers until it receives a decodable set of local computations. We say a set of computations is decodable if $h(X_1), \ldots, h(X_K)$ can be obtained by computing decoding functions over the received results. More concretely, given any subset of workers that return the computing results (denoted by $\mathcal{K}$), the master computes $v_{\mathcal{K}}(\{h(\tilde{X}_n)\}_{n \in \mathcal{K}})$, where each $v_{\mathcal{K}}$ is a deterministic function. We refer to the $v_{\mathcal{K}}$'s as decoding functions. Finally, the master computes $f(X_1), \ldots, f(X_K)$ based on $h(X_1), \ldots, h(X_K)$.

In particular, we focus on finding the scheme $(\tilde{g}, h)$ that is robust to as many adversarial workers as possible in the system, where $\tilde{g} = (g_1, \ldots, g_N)$ is the collection of encoding functions. To measure the robustness against adversaries of a given scheme, we use the metric security threshold defined as follows:

Definition 10 (Security Threshold). For an integer $b$, we say a scheme $S$ is $b$-secure if the master can be robust against $b$ adversaries, i.e., the master can recover all the correct results even if up to $b$ workers return arbitrarily erroneous results. The security threshold, denoted by $\beta_S$, is the maximum value of $b$ such that the scheme $S$ is $b$-secure, i.e.,

\[
\beta_S \triangleq \sup\{b : S \text{ is } b\text{-secure}\}.
\tag{5.3}
\]

Based on the above system model, the problem is now formulated as: What is the scheme which achieves the optimal security threshold with low decoding complexity?

Remark 10. To see how much computation cost the master can save using a given scheme, it is important to compare the total complexity of computing the $K$ evaluations $f(X_1), \ldots, f(X_K)$ (by the master itself) with the complexity incurred by the scheme. Since the encoding process of a scheme is only executed once before starting any computations, throughout this chapter we focus on the decoding complexity, which is the main cost incurred by a scheme.

Remark 11.
To see how the distributed Boolean computation is applicable to a sharded blockchain system, we can consider the blockchain system PolyShard [7], which is implemented in a distributed manner over some untrusted nodes. At each time epoch, each node stores a coded version of a sub-chain and computes a validation function directly on the coded sub-chain and a coded block (generated by computing an encoding function on the incoming blocks). After the computations, each node broadcasts the computed result to all other nodes. Then, each node computes the decoding function on the received computation results to recover the desired validation result and determines the validity of the block. That is, each node plays the role of a master node after the broadcasting procedure. When a new participant joins the network, a new coded sub-chain can be generated and stored in this new node. When a participant leaves the network, the blockchain with the remaining nodes can still work, since each node stores a coded sub-chain and the system can follow the same procedure for the block validations.

5.3 Overview of Lagrange Coded Computing

In this section, we consider Lagrange Coded Computing (LCC) [112], which is a universal encoding technique for the class of multivariate polynomial functions. Then, we show how it works for our problem.

Since Lagrange coded computing requires the underlying field size to be at least the number of workers $N$, we first extend the field $\{0,1\}$ such that the size of the extension field is at least $N$. More specifically, we embed each bit $X_k[j] \in \{0,1\}$ of data $X_k$ into a binary extension field $\mathbb{F}_{2^t}$ such that $2^t \ge N$. The embedding $\bar{X}_k[j] \in \mathbb{F}_{2^t}$ of the bit $X_k[j]$ is generated such that

\[
\bar{X}_k[j] =
\begin{cases}
\underbrace{00 \cdots 0}_{t}, & X_k[j] = 0, \\
\underbrace{00 \cdots 0}_{t-1} 1, & X_k[j] = 1.
\end{cases}
\tag{5.4}
\]

Note that over the extension field, the output of the Boolean function $f$ is $\underbrace{00 \cdots 0}_{t}$ if the original result is 0, and $\underbrace{00 \cdots 0}_{t-1} 1$ if the original result is 1.
For the data encoding by using LCC, we first select $K$ distinct elements $\omega_1, \omega_2, \ldots, \omega_K$ from the binary extension field $\mathbb{F}_{2^t}$, and let $u$ be the respective Lagrange interpolation polynomial:

\[
u(z) \triangleq \sum_{k=1}^{K} \bar{X}_k \prod_{l \in [K] \setminus \{k\}} \frac{z - \omega_l}{\omega_k - \omega_l},
\tag{5.5}
\]

where $u : \mathbb{F}_{2^t} \to \mathbb{F}_{2^t}^m$ is a polynomial of degree $K - 1$ such that $u(\omega_k) = \bar{X}_k$. Then we select distinct elements $\alpha_1, \alpha_2, \ldots, \alpha_N \in \mathbb{F}_{2^t}$, and encode $\bar{X}_1, \ldots, \bar{X}_K$ to $\tilde{X}_n = u(\alpha_n)$ for all $n \in [N]$, i.e.,

\[
\tilde{X}_n = u(\alpha_n) \triangleq \sum_{k=1}^{K} \bar{X}_k \prod_{l \in [K] \setminus \{k\}} \frac{\alpha_n - \omega_l}{\omega_k - \omega_l}.
\tag{5.6}
\]

Each worker $n \in [N]$ stores $\tilde{X}_n$ locally. Following the above data encoding, each worker $n$ computes the function $f$ on $\tilde{X}_n$ and sends the result back to the master upon its completion. Since the computation is over the extension field, the complexity at each worker is $O(t\, r(f) \deg f)$.

After receiving results from all the workers, the master can obtain all coefficients of $f(u(z))$ by applying Reed-Solomon decoding [11, 67]. Having this polynomial, the master evaluates it at $\omega_k$ for every $k \in [K]$ to obtain $f(u(\omega_k)) = f(\bar{X}_k)$. The complexity of decoding a length-$N$ Reed-Solomon code with dimension $(K-1)\deg f + 1$ for one symbol over the extension field is $O(tN \log^2 N \log\log N)$. To have a sufficiently large field for LCC, we pick $t = \lceil \log N \rceil$. Since there are $m$ symbols in each $\tilde{X}_n$, the decoding process by the master requires complexity $O(mN \log^3 N \log\log N)$.

In the following, we present the security threshold provided by LCC. By [112], to be robust to $b$ adversarial workers (given $N$ and $K$), LCC requires $N \ge (K-1)\deg f + 2b + 1$; i.e., LCC achieves the security threshold

\[
\beta_{\mathrm{LCC}} = \frac{N - (K-1)\deg f - 1}{2}.
\tag{5.7}
\]

The security threshold achieved by LCC depends on the degree of the function $f$, i.e., the security guarantee is highly degraded if $f$ has high degree. To mitigate this degree effect, we model the Boolean function as the concatenation of some low-degree polynomials and threshold functions by proposing three schemes in the following sections.
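As a rough illustration of the encode, compute and interpolate pipeline described in this section, the following Python sketch runs LCC on the three-bit AND function. It rests on assumptions that are not part of the scheme above: the prime field GF(257) stands in for the binary extension field, no worker is adversarial (so plain Lagrange interpolation replaces Reed-Solomon decoding), and the names lagrange_eval, lcc_security_threshold and the chosen evaluation points are hypothetical.

```python
# Sketch of Lagrange Coded Computing over a prime field GF(P).
# Assumptions: GF(257) replaces the binary extension field of the text, and
# no worker is adversarial, so plain interpolation stands in for
# Reed-Solomon decoding.
P = 257


def lagrange_eval(xs, ys, z):
    """Evaluate at z the unique polynomial through (xs[i], ys[i]) over GF(P)."""
    total = 0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        num, den = 1, 1
        for j, xj in enumerate(xs):
            if j != i:
                num = num * (z - xj) % P
                den = den * (xi - xj) % P
        # pow(den, -1, P) is the modular inverse (Python 3.8+).
        total = (total + yi * num * pow(den, -1, P)) % P
    return total


def f(x):
    """AND of three bits in algebraic normal form: a degree-3 polynomial."""
    return x[0] * x[1] * x[2] % P


def lcc_security_threshold(N, K, deg_f):
    """Largest integer b with N >= (K - 1) * deg_f + 2 * b + 1."""
    return (N - (K - 1) * deg_f - 1) // 2


data = [(1, 1, 1), (1, 0, 1)]        # K = 2 inputs, m = 3 bits each
K, m, deg_f = len(data), 3, 3
points = [1, 2]                      # u(points[k]) equals data[k]
N = (K - 1) * deg_f + 1              # 4 workers suffice without adversaries
worker_points = [10 + n for n in range(N)]

# Encoding: worker n stores u(worker_points[n]), coordinate by coordinate.
encoded = [
    tuple(lagrange_eval(points, [x[j] for x in data], a) for j in range(m))
    for a in worker_points
]
# Each worker applies f to its coded input.
results = [f(x) for x in encoded]
# f(u(z)) has degree (K - 1) * deg_f = 3, so N = 4 values determine it.
decoded = [lagrange_eval(worker_points, results, w) for w in points]
print(decoded, lcc_security_threshold(100, K, deg_f))
```

With more workers than the interpolation needs, the surplus evaluations are what Reed-Solomon decoding would use to correct erroneous results; the helper lcc_security_threshold only evaluates the counting bound from the text.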
5.4 Scheme 1: Coded Algebraic Normal Form

In this section, we propose a coding scheme called coded Algebraic normal form (ANF), which computes the ANF representation of a Boolean function by linear threshold functions (LTFs), with a simple linear code used for the data encoding. We start with an example to illustrate the idea of coded ANF.

Example. We consider a function which has an ANF representation defined as follows:

\[
f(X) = X[1]\, X[2] \cdots X[\tfrac{m}{2}].
\tag{5.8}
\]

Then, we define a linear function over the real field as follows:

\[
L(X) = \sum_{j=1}^{m/2} X[j]
\tag{5.9}
\]

with a bias term $B = -\frac{m}{2} + \frac{1}{2}$, where $L(X) + B = \frac{1}{2}$ if and only if $f(X) = 1$. Otherwise, $L(X) + B \le -\frac{1}{2}$. Thus, we can compute $f(X)$ by computing its corresponding linear threshold function $\operatorname{sgn}(L(X) + B)$, i.e., $f(X) = 1$ if $\operatorname{sgn}(L(X) + B) = 1$; otherwise, $f(X) = 0$ if $\operatorname{sgn}(L(X) + B) = -1$. Unlike computing the function $f(X)$ of degree $\frac{m}{2}$, which results in a low security threshold, computing the linear function $L(X)$ allows us to apply a linear code to the computations, which can lead to a much higher security threshold.

5.4.1 Formal Description of Coded ANF

Given the ANF representation defined in (5.1), we now present the proposed coded ANF scheme. For each monomial $\prod_{j \in S} X[j]$ such that $\mu_f(S) = 1$, we define a linear function $L_S : \mathbb{R}^m \to \mathbb{R}$ and a bias term $B_S \in \mathbb{R}$ as follows (footnote 3):

\[
L_S(X) = \sum_{j \in S} X[j], \qquad B_S = -|S| + \frac{1}{2}.
\tag{5.10}
\]

It is clear that $L_S(X) + B_S = \frac{1}{2}$ if and only if $\prod_{j \in S} X[j] = 1$. Otherwise, $L_S(X) + B_S \le -\frac{1}{2}$. Thus, there are $r(f)$ constructed linear threshold functions, and each monomial $\prod_{j \in S} X[j]$ can be computed by its corresponding linear threshold function $\operatorname{sgn}(L_S(X) + B_S)$.

By considering each bit over the real field, the master encodes $X_1, X_2, \ldots, X_K$ to $\tilde{X}_1, \tilde{X}_2, \ldots, \tilde{X}_N$ using an $(N, K)$ MDS code. Each worker $n \in [N]$ stores $\tilde{X}_n$ locally. Each worker $n \in [N]$ computes the functions $\{L_S(\tilde{X}_n)\}_{\{S \subseteq [m],\ \mu_f(S) = 1\}}$ and then sends the results back to the master.
After receiving the results from the workers, the master first recovers $L_S(X_k)$ for each $k \in [K]$ and each $S \in \{G : G \subseteq [m],\ \mu_f(G) = 1\}$. Then, the master has $\prod_{j \in S} X_k[j] = 1$ if $\operatorname{sgn}(L_S(X_k) + B_S) = 1$, and $\prod_{j \in S} X_k[j] = 0$ if $\operatorname{sgn}(L_S(X_k) + B_S) = -1$. Lastly, the master recovers $f(X_1), \ldots, f(X_K)$ by summing the monomials. Since each of the $r(f)$ linear functions has up to $m$ variables, the complexity at each worker is $O(m\, r(f))$.

Footnote 3: The linear threshold function defined in (5.10) is adapted from the degree-1 polynomial threshold function $p(X) = \sum_{j=1}^{m} Z[j] X[j] - m + \frac{1}{2}$ considered in [17], where $X \in \{-1, 1\}^m$ and $p(X) > 0$ iff $X = Z$. Since the Boolean domain considered in [76] is $\{-1, 1\}$ instead of $\{0, 1\}$ and all the bits are taken into account in $p(X)$, we define (5.10) by letting $Z[j] = 0$ for all $j \notin S$ and the bias term be $-|S| + \frac{1}{2}$, such that only the bits $X[j]$, $j \in S$, in the domain $\{0, 1\}$ are taken into account in (5.10).

Remark 12. We can demonstrate the decodability of the $\{L_S(\tilde{X}_n)\}_{n \in [N]}$'s by converting our problem to distributed matrix-matrix multiplications as follows. Computing $\{L_S(X_k)\}_{k \in [K]}$ for each $S$ is equivalent to computing $K$ matrix-matrix multiplications $X_1 A, X_2 A, \ldots, X_K A$ ($X_1, \ldots, X_K$ are considered as row vectors), where $A$ is an $m$ by $|S|$ matrix and each column of $A$ consists of the coefficients of the $X[j]$'s in the corresponding $L_S(X)$. Similarly, computing $\{L_S(\tilde{X}_n)\}_{n \in [N]}$ for the corresponding $S$ is equivalent to computing $N$ matrix-matrix multiplications $\tilde{X}_1 A, \tilde{X}_2 A, \ldots, \tilde{X}_N A$. Therefore, our problem can be converted to coded distributed matrix-matrix multiplication, in which an $(N, K)$ MDS code is applied to each element of the matrices $X_1, \ldots, X_K$ and the encoded matrices $\tilde{X}_1, \ldots, \tilde{X}_N$ are obtained. In [54], it is shown that the matrix multiplications $X_1 A, X_2 A, \ldots, X_K A$ can be recovered from any $K$ out of the $N$ coded results $\tilde{X}_1 A, \ldots, \tilde{X}_N A$ by the MDS property and the linearity of matrix-matrix multiplications.
In our problem, we deal with adversarial workers, which are treated as errors. Since the system can be robust to $N - K$ erasures, one can show that the system can be robust to $\lfloor \frac{N-K}{2} \rfloor$ errors (adversaries) by Lemma 3 proved in [109].

5.4.2 Security Threshold of Coded ANF

To decode the $(N, K)$ MDS code, coded ANF applies Reed-Solomon decoding. Successful decoding requires that the number $b$ of erroneous computation results satisfies $N \ge K + 2b$. The following theorem shows that the security threshold provided by coded ANF is $\frac{N-K}{2}$, which is independent of $\deg f$.

Theorem 6. Given a number of workers $N$ and a dataset $X = (X_1, \ldots, X_K)$, the proposed coded ANF can be robust to $b$ adversaries for computing $\{f(X_k)\}_{k=1}^{K}$ for any Boolean function $f$, as long as

\[
N \ge K + 2b;
\tag{5.11}
\]

i.e., coded ANF achieves the security threshold

\[
\beta_{\mathrm{ANF}} = \frac{N-K}{2}.
\tag{5.12}
\]

Whenever the master receives $N$ results from the workers, the master decodes the computation results using a length-$N$ Reed-Solomon code for each of the $r(f)$ linear functions, which incurs the total complexity $O(r(f) N \log^2 N \log\log N)$. Computing all the monomials via the signs of the corresponding linear threshold functions incurs the complexity $O(N r(f))$. Lastly, computing $f(X_1), \ldots, f(X_K)$ by summing the monomials incurs the complexity $O(N r(f))$, since there are $r(f) - 1$ additions in the function $f$. Thus, the total complexity of the decoding step is $O(r(f) N \log^2 N \log\log N)$, which works well for small $r(f)$. Note that this scheme operates over the real field, whose size does not scale with $m$.

5.5 Scheme 2: Coded Disjunctive Normal Form

In this section, we propose a coding scheme called coded Disjunctive normal form (DNF), which computes the DNF representation of a Boolean function by LTFs, with a simple linear code used for the data encoding. We start with an example to illustrate the idea behind coded DNF.

Example.
Consider a function which has an ANF representation defined as follows:

\[
f(X) = (X[1] \cdots X[m]) \oplus (X[1] \oplus 1) \cdots (X[m] \oplus 1),
\]

which has the degree $\deg f = m - 1$ and the number of monomials $r(f) = 2^m - 1$. Alternatively, this function has a DNF representation as follows:

\[
f(X) = (X[1] \wedge \cdots \wedge X[m]) \vee (\neg X[1] \wedge \cdots \wedge \neg X[m]),
\]

which has the weight $w(f) = 2$. For the clause $X[1] \wedge \cdots \wedge X[m]$, we define a linear function over the real field as follows:

\[
L_1(X) = X[1] + \cdots + X[m]
\tag{5.13}
\]

with a bias term $B_1 = -m + \frac{1}{2}$, where $X[1] \wedge \cdots \wedge X[m] = 1$ if and only if $L_1(X) + B_1 = \frac{1}{2}$. Otherwise, $L_1(X) + B_1 \le -\frac{1}{2}$. Similarly, for the clause $\neg X[1] \wedge \cdots \wedge \neg X[m]$, we define a linear function over the real field as follows:

\[
L_2(X) = -X[1] - \cdots - X[m]
\tag{5.14}
\]

with a bias $B_2 = \frac{1}{2}$, where $\neg X[1] \wedge \cdots \wedge \neg X[m] = 1$ if and only if $L_2(X) + B_2 = \frac{1}{2}$. Otherwise, $L_2(X) + B_2 \le -\frac{1}{2}$. Therefore, we can compute $f(X)$ by computing $\operatorname{sgn}(L_1(X) + B_1)$ and $\operatorname{sgn}(L_2(X) + B_2)$, i.e., $f(X) = 1$ if at least one of $\operatorname{sgn}(L_1(X) + B_1)$ and $\operatorname{sgn}(L_2(X) + B_2)$ is equal to 1. Otherwise, $f(X) = 0$. Unlike directly computing the function $f(X)$ of degree $m - 1$, computing the linear functions $L_1(X)$ and $L_2(X)$ allows us to apply a linear code to the computations.

5.5.1 Formal Description of Coded DNF

Given the DNF representation defined in (5.2), we now present the proposed coded DNF scheme. For each clause $T_i$ with the corresponding input $Y_i \in \mathrm{Supp}(f)$ such that $f(Y_i) = 1$, we define a linear function $L_i : \mathbb{R}^m \to \mathbb{R}$ and a bias term $B_i \in \mathbb{R}$ as follows (footnote 4):

\[
L_i(X) = \sum_{j=1}^{m} Z_i[j]\, X[j], \qquad B_i = -\sum_{j=1}^{m} Y_i[j] + \frac{1}{2},
\tag{5.15}
\]

where

\[
Z_i[j] =
\begin{cases}
1, & \text{if } Y_i[j] = 1, \\
-1, & \text{if } Y_i[j] = 0.
\end{cases}
\tag{5.16}
\]

Footnote 4: Similar to the linear threshold function defined in (5.10), we define (5.15) by adjusting the bias term such that the threshold function can work in the domain $\{0, 1\}$.

It is clear that $L_i(Y_i) + B_i = \frac{1}{2}$ and $L_i(X) + B_i \le -\frac{1}{2}$ for all other inputs $X \ne Y_i$.
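The margin claims for the monomial construction (5.10) and the clause construction (5.15)-(5.16) can be checked exhaustively for small $m$. The sketch below does so in Python under stated assumptions: bits are indexed from 0 rather than 1, $m = 4$ is an arbitrary small choice, and the helper names monomial_ltf, clause_ltf and margin are hypothetical.

```python
from itertools import product


def monomial_ltf(S, m):
    """Weights and bias of (5.10) for the monomial prod_{j in S} X[j]."""
    weights = [1 if j in S else 0 for j in range(m)]
    return weights, -len(S) + 0.5


def clause_ltf(Y):
    """Weights Z_i and bias B_i of (5.15)-(5.16) for the clause of input Y."""
    weights = [1 if y == 1 else -1 for y in Y]
    return weights, -sum(Y) + 0.5


def margin(weights, bias, X):
    """Value of L(X) + B for the given weights and bias."""
    return sum(w * x for w, x in zip(weights, X)) + bias


m = 4
inputs = list(product((0, 1), repeat=m))

# (5.10): the margin is 1/2 exactly when the monomial equals 1, else <= -1/2.
S = {0, 2}
w, b = monomial_ltf(S, m)
anf_ok = all(
    (margin(w, b, X) == 0.5) if all(X[j] for j in S) else (margin(w, b, X) <= -0.5)
    for X in inputs
)

# (5.15)-(5.16): the margin is 1/2 at X = Y and <= -1/2 at every other input.
dnf_ok = all(
    (margin(*clause_ltf(Y), X) == 0.5) if X == Y else (margin(*clause_ltf(Y), X) <= -0.5)
    for Y in inputs
    for X in inputs
)
print(anf_ok, dnf_ok)
```

Because every margin is an integer plus one half, the sign is unaffected by small numerical perturbations, which is what lets the master read the Boolean value off the decoded real-valued result.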
Thus, there are $w(f)$ constructed linear threshold functions, and each clause $T_i$ can be computed by its corresponding linear threshold function $\operatorname{sgn}(L_i(X) + B_i)$.

By considering each bit over the real field, the master encodes $X_1, X_2, \ldots, X_K$ to $\tilde{X}_1, \tilde{X}_2, \ldots, \tilde{X}_N$ using an $(N, K)$ MDS code. Each worker $n \in [N]$ stores $\tilde{X}_n$ locally. Each worker $n$ computes the functions $L_1(\tilde{X}_n), \ldots, L_{w(f)}(\tilde{X}_n)$ and then sends the results back to the master. After receiving the results from the workers, the master first recovers $L_i(X_k)$ for each $i \in [w(f)]$ and each $k \in [K]$ via MDS decoding. Then, the master has $T_i(X_k) = 1$ if $\operatorname{sgn}(L_i(X_k) + B_i) = 1$; otherwise $T_i(X_k) = 0$. Lastly, the master has $f(X_k) = 1$ if at least one of $T_1(X_k), \ldots, T_{w(f)}(X_k)$ is equal to 1. Otherwise, $f(X_k) = 0$. Since each of the $w(f)$ linear functions has $m$ variables, the complexity at each worker is $O(m\, w(f))$.

5.5.2 Security Threshold of Coded DNF

Similar to coded ANF, coded DNF deploys a Reed-Solomon code for the decoding process. The following theorem shows that the security threshold provided by coded DNF is $\frac{N-K}{2}$, which is independent of $\deg f$.

Theorem 7. Given a number of workers $N$ and a dataset $X = (X_1, \ldots, X_K)$, the proposed coded DNF can be robust to $b$ adversaries for computing $\{f(X_k)\}_{k=1}^{K}$ for any Boolean function $f$, as long as

\[
N \ge K + 2b;
\tag{5.17}
\]

i.e., coded DNF achieves the security threshold

\[
\beta_{\mathrm{DNF}} = \frac{N-K}{2}.
\tag{5.18}
\]

Upon receiving $N$ results from the workers, the master decodes the computation results using a length-$N$ Reed-Solomon code for each of the $w(f)$ linear functions, which incurs the total complexity $O(w(f) N \log^2 N \log\log N)$. Computing all the clauses via the signs of the corresponding linear threshold functions incurs the complexity $O(N w(f))$. Lastly, computing $f(X_1), \ldots, f(X_K)$ by checking all the clauses requires the complexity $O(N w(f))$. Thus, the total complexity of the decoding step is $O(w(f) N \log^2 N \log\log N)$, which works well for small $w(f)$.

Remark 13.
Learning the DNF representation of a Boolean function is an intensively studied problem in computational learning theory and is hard in general [47]. Thus, prior work focuses on some more tractable classes of functions, e.g., $O(\log n)$-term DNF is considered in the PAC learning literature [49], which motivates our proposed coded DNF.

Remark 14. Although both coded ANF and coded DNF achieve the security threshold $\frac{N-K}{2}$, coded ANF has the decoding complexity $O(r(f) N \log^2 N \log\log N)$ and coded DNF has the decoding complexity $O(w(f) N \log^2 N \log\log N)$. Based on the sparsity $r(f)$ and the weight $w(f)$, one can choose whichever of the two schemes has the smaller decoding complexity. When $r(f)$ is smaller than $w(f)$, coded ANF should be chosen. Otherwise, we can choose coded DNF.

5.6 Scheme 3: Coded Polynomial Threshold Function

In this section, we propose a coding scheme called coded polynomial threshold function (PTF), which computes the DNF representation of a Boolean function by PTFs, with LCC used for the data encoding.

5.6.1 Formal Description of Coded PTF

Given the DNF representation defined in (5.2), we now present coded PTF. Following the construction proposed in [76, 47], we construct a polynomial threshold function $\operatorname{sgn}(P(X))$ for computing $f(X)$, where $P : \mathbb{R}^m \to \mathbb{R}$ is a polynomial function of degree at most $\lfloor \log_2 w(f) \rfloor + 1$. The construction of such a PTF has the following steps.

1. Decision Tree Construction: We construct a $w(f)$-leaf decision tree over the variables $X[1], \ldots, X[m]$ such that each input in $\mathrm{Supp}(f)$ arrives at a different leaf. Such a tree can always be constructed by a greedy algorithm. Let $\ell_i$ be the leaf of this tree reached by $Y_i$. We label $\ell_i$ with the linear threshold function $\operatorname{sgn}(L_i(X) + B_i)$, where $L_i(X)$ and $B_i$ are defined in (5.15). The constructed decision tree, in which internal nodes are labeled with variables and leaves are labeled with linear threshold functions, computes exactly $f$.

2.
Decision List Construction: For this $w(f)$-leaf decision tree, we construct an equivalent $\lfloor \log_2 w(f) \rfloor$-decision list. By the definition of rank, the rank of a $w(f)$-leaf tree is at most $\lfloor \log_2 w(f) \rfloor$. We therefore find a leaf in the decision tree at distance at most $\lfloor \log_2 w(f) \rfloor$ from the root, and place the literals along the path to the leaf as a monomial at the top of a new decision list. We then remove the leaf from the tree, creating a new decision tree with one fewer leaf, and repeat this process [15]. Without loss of generality, we let $\ell_i$ be the $i$-th removed leaf in the process of list construction, with the corresponding monomial $C_i$ of at most $\lfloor \log_2 w(f) \rfloor$ variables. The constructed list is defined as "if $C_1(X) = 1$ then output $\frac{1 + \mathrm{sgn}(L_1(X) + B_1)}{2}$; else if $C_2(X) = 1$ then output $\frac{1 + \mathrm{sgn}(L_2(X) + B_2)}{2}$; ... else if $C_{w(f)}(X) = 1$ then output $\frac{1 + \mathrm{sgn}(L_{w(f)}(X) + B_{w(f)})}{2}$".

3. Polynomial Threshold Function Construction: Having the constructed decision list, we now construct the polynomial function $P(X)$ of degree at most $\lfloor \log_2 w(f) \rfloor + 1$ as follows:

$$P(X) = A_1 C_1(X)(L_1(X) + B_1) + \cdots + A_{w(f)} C_{w(f)}(X)(L_{w(f)}(X) + B_{w(f)}), \tag{5.19}$$

where $A_1 \gg A_2 \gg \cdots \gg A_{w(f)} > 0$ are appropriately chosen positive values.

After constructing the corresponding PTF $\mathrm{sgn}(P(X))$ for the Boolean function $f(X)$, the procedure of computations is as follows. By considering each bit over the real field, the master encodes $X_1, X_2, \ldots, X_K$ to $\tilde{X}_1, \tilde{X}_2, \ldots, \tilde{X}_N$ using LCC. Each worker $n \in [N]$ stores $\tilde{X}_n$ locally. Each worker $n$ computes the function $P(\tilde{X}_n)$ and then sends the result back to the master. After receiving the results from the workers, the master first recovers $P(X_1), \ldots, P(X_K)$ via LCC decoding. Then, the master has $f(X_k) = 1$ if $\mathrm{sgn}(P(X_k)) = 1$; otherwise $f(X_k) = 0$.

Since the $C_i(X)$'s are monomials of degree at most $\lfloor \log_2 w(f) \rfloor$, computing $A_i C_i(X)$ incurs the complexity $O(\lfloor \log_2 w(f) \rfloor)$. Also, computing $L_i(X) + B_i$ incurs the complexity $O(m)$.
Thus, computing the function $P(X)$ at each worker incurs the total complexity $O(w(f)(\lfloor \log_2 w(f) \rfloor + m))$.

5.6.2 Security Threshold of Coded PTF

Since $P(X)$ has degree at most $\lfloor \log_2 w(f) \rfloor + 1$, to be robust to $b$ adversaries, LCC requires the number of workers $N$ to satisfy $N \ge (K-1)(\lfloor \log_2 w(f) \rfloor + 1) + 2b + 1$. We present the security threshold provided by coded PTF in the following theorem.

Theorem 8. Given a number of workers $N$ and a dataset $X = (X_1, \ldots, X_K)$, the proposed coded polynomial threshold function can be robust to $b$ adversaries for computing $\{f(X_k)\}_{k=1}^{K}$ for any Boolean function $f$, as long as

$$N \ge (K-1)(\lfloor \log_2 w(f) \rfloor + 1) + 2b + 1, \tag{5.20}$$

i.e., coded PTF achieves the security threshold

$$\beta_{\mathrm{PTF}} = \frac{N - (K-1)(\lfloor \log_2 w(f) \rfloor + 1) - 1}{2}. \tag{5.21}$$

Whenever the master receives $N$ results from the workers, the master decodes the computation results using a length-$N$ Reed-Solomon code for the polynomial function, which incurs the total complexity $O(N \log^2 N \log\log N)$. Lastly, computing $f(X_1), f(X_2), \ldots, f(X_K)$ by checking the signs requires the complexity $O(N)$. Thus, the total complexity of the decoding step is $O(N \log^2 N \log\log N)$.

In the following example, we show that coded PTF outperforms LCC for Boolean functions with polynomial-size $r(f)$ and $w(f)$.

Example. Consider a function which has an ANF representation defined as follows:

$$f(X) = (X[1] \oplus X[2]) \cdots (X[2m'-1] \oplus X[2m'])\, X[2m'+1] \cdots X[m], \tag{5.22}$$

where $m' = \lfloor \log_2 m^2 \rfloor$. Note that here we focus on the case that $m$ is large enough such that $m > m' = \lfloor \log_2 m^2 \rfloor$. The function $f$ has degree $m - \lfloor \log_2 m^2 \rfloor$, sparsity $m^2$ and weight $m^2$. For the Boolean function considered in this example, coded PTF achieves the security threshold $\frac{N - (K-1)(\lfloor \log_2 m^2 \rfloor + 1) - 1}{2}$, which is greater than the security threshold $\frac{N - (K-1)(m - \lfloor \log_2 m^2 \rfloor) - 1}{2}$ provided by LCC.
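To make the comparison in the example concrete, the following minimal sketch evaluates the three threshold expressions stated in this chapter: $\frac{N-(K-1)\deg f-1}{2}$ for LCC, $\frac{N-(K-1)(\lfloor \log_2 w(f) \rfloor+1)-1}{2}$ for coded PTF, and $\frac{N-K}{2}$ for coded ANF and coded DNF. The function names and the sample values $N = 100$, $K = 4$, $m = 32$ are illustrative assumptions, not taken from the dissertation.

```python
def floor_log2(x: int) -> int:
    """Return floor(log2(x)) for a positive integer x."""
    return x.bit_length() - 1


def threshold_lcc(n_workers: int, k: int, degree: int) -> float:
    """Security threshold of LCC for a polynomial of the given degree."""
    return (n_workers - (k - 1) * degree - 1) / 2


def threshold_coded_ptf(n_workers: int, k: int, weight: int) -> float:
    """Security threshold of coded PTF, as in (5.21)."""
    return (n_workers - (k - 1) * (floor_log2(weight) + 1) - 1) / 2


def threshold_coded_anf_dnf(n_workers: int, k: int) -> float:
    """Security threshold shared by coded ANF and coded DNF."""
    return (n_workers - k) / 2


def example_parameters(m: int) -> tuple[int, int]:
    """Degree and weight of the example function, as stated in the text."""
    degree = m - floor_log2(m * m)
    weight = m * m
    return degree, weight


# Hypothetical system size chosen only for illustration.
N, K, M = 100, 4, 32
deg_f, w_f = example_parameters(M)
print("LCC:          ", threshold_lcc(N, K, deg_f))
print("coded PTF:    ", threshold_coded_ptf(N, K, w_f))
print("coded ANF/DNF:", threshold_coded_anf_dnf(N, K))
```

With these assumed values the degree is 22 and the weight is 1024, so the ordering LCC < coded PTF < coded ANF/DNF is visible directly; the number of adversaries actually tolerated is the largest integer $b$ not exceeding the printed value.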
Although coded ANF and coded DNF achieve the security threshold $\frac{N-K}{2}$, they require the decoding complexity $O(m^2 N \log^2 N \log\log N)$, which is of order $m^2$, i.e., they only work for small $m$. With a security threshold slightly worse than that of coded ANF and coded DNF, coded PTF achieves a better decoding complexity that is independent of $m$, i.e., coded PTF can work for large $m$.

5.6.3 Coded D-partitioned PTF

In this subsection, we extend coded PTF by proposing coded $D$-partitioned polynomial threshold function, whose idea is to partition the Boolean function into several DNFs and construct their corresponding low-degree PTFs. This allows us to apply LCC to the corresponding low-degree PTFs to improve the security threshold.

Given the DNF representation defined in (5.2) of a Boolean function $f$ and an integer $D$ ($1 \le D \le w(f)$), we partition the DNF representation of $f$ into $D$ different DNF representations as follows:

$$f = G_1 \vee G_2 \vee \cdots \vee G_D, \tag{5.23}$$

where each $G_d$ includes $\frac{w(f)}{D}$ clauses of $m$ literals, e.g.,

$$G_1 = T_1 \vee \cdots \vee T_{\frac{w(f)}{D}}. \tag{5.24}$$

Thus, each $G_d$ is a Boolean function with weight $\frac{w(f)}{D}$. By the PTF construction described in Subsection 5.6.1, each Boolean function $G_d$ can be computed by a PTF $\mathrm{sgn}(P_d(X))$, where $P_d(X)$ has degree at most $\lfloor \log_2 \frac{w(f)}{D} \rfloor + 1$.

Similar to coded PTF, which uses LCC for data encoding, each worker $n \in [N]$ stores $\tilde{X}_n$ locally. Each worker $n$ computes the functions $P_1(\tilde{X}_n), \ldots, P_D(\tilde{X}_n)$ and then sends the results back to the master. Upon receiving the results from the workers, the master first recovers $P_d(X_1), \ldots, P_d(X_K)$ for each $d$ via LCC decoding. Then, the master has $f(X_k) = 1$ if at least one of $\mathrm{sgn}(P_1(X_k)), \ldots, \mathrm{sgn}(P_D(X_k))$ is equal to 1; otherwise, $f(X_k) = 0$. Similar to coded PTF, computing the $D$ polynomial functions of degree up to $\lfloor \log_2 \frac{w(f)}{D} \rfloor + 1$ at each worker incurs the complexity $O(w(f)(\lfloor \log_2 \frac{w(f)}{D} \rfloor + m))$.
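The partition in (5.23) and (5.24) can be illustrated with a small sketch. It splits the support of a toy Boolean function into $D$ equal groups, checks that OR-ing the group outputs reproduces $f$, and reports the degree bound $\lfloor \log_2 \frac{w(f)}{D} \rfloor + 1$ for each choice of $D$. This is only an illustration under stated assumptions: clauses are evaluated by direct comparison with a support vector rather than through the PTF construction or LCC, $D$ is assumed to divide $w(f)$, and all names and the toy function are hypothetical.

```python
from itertools import product


def floor_log2(x: int) -> int:
    """Return floor(log2(x)) for a positive integer x."""
    return x.bit_length() - 1


def partition_support(support, num_groups):
    """Split the support vectors into num_groups contiguous groups of equal size."""
    if len(support) % num_groups != 0:
        raise ValueError("this sketch assumes D divides w(f)")
    size = len(support) // num_groups
    return [support[d * size:(d + 1) * size] for d in range(num_groups)]


def eval_group(group, x):
    """Evaluate G_d(x): 1 if x equals one of the group's support vectors."""
    return int(any(tuple(x) == tuple(y) for y in group))


def eval_partitioned(groups, x):
    """Evaluate f(x) = G_1(x) OR ... OR G_D(x)."""
    return int(any(eval_group(group, x) for group in groups))


def ptf_degree_bound(weight, num_groups):
    """Degree bound floor(log2(w(f)/D)) + 1 for each P_d."""
    return floor_log2(weight // num_groups) + 1


# Toy function on m = 3 bits: f(x) = 1 iff x has odd parity (weight w(f) = 4).
support = [(0, 0, 1), (0, 1, 0), (1, 0, 0), (1, 1, 1)]
for D in (1, 2, 4):
    groups = partition_support(support, D)
    ok = all(
        eval_partitioned(groups, x) == int(x in support)
        for x in product((0, 1), repeat=3)
    )
    print(D, ptf_degree_bound(len(support), D), ok)
```

In this toy case a larger $D$ lowers the degree bound of each $P_d$, at the cost of having $D$ polynomials to compute and decode.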
Since each $P_d(X)$ has degree at most $\lfloor \log_2 \frac{w(f)}{D} \rfloor + 1$, to be robust to $b$ adversaries, LCC requires the number of workers $N$ to satisfy $N \ge (K-1)(\lfloor \log_2 \frac{w(f)}{D} \rfloor + 1) + 2b + 1$. Formally, we have the following theorem.

Theorem 9. Given a number of workers $N$ and a dataset $X = (X_1, \ldots, X_K)$, the proposed coded $D$-partitioned polynomial threshold function can be robust to $b$ adversaries for computing $\{f(X_k)\}_{k=1}^{K}$ for any Boolean function $f$, as long as

$$N \ge (K-1)\left(\left\lfloor \log_2 \frac{w(f)}{D} \right\rfloor + 1\right) + 2b + 1, \tag{5.25}$$

i.e., coded $D$-partitioned PTF achieves the security threshold

$$\beta_{\mathrm{PTF}}(D) = \frac{N - (K-1)\left(\left\lfloor \log_2 \frac{w(f)}{D} \right\rfloor + 1\right) - 1}{2}. \tag{5.26}$$

Whenever the master receives $N$ results from the workers, the master decodes the computation results using a length-$N$ Reed-Solomon code for the $D$ constructed polynomial functions, which incurs the total complexity $O(DN \log^2 N \log\log N)$. Then, computing $f(X_1), f(X_2), \ldots, f(X_K)$ by checking the signs and performing OR operations requires the complexity $O(DN)$. Thus, the total complexity of the decoding step is $O(DN \log^2 N \log\log N)$.

Remark 15. The proposed coded $D$-partitioned PTF characterizes a tradeoff between the security threshold and the decoding complexity. For each chosen $D$ ($1 \le D \le w(f)$), the pair of security threshold and decoding complexity

$$\left( \frac{N - (K-1)\left(\left\lfloor \log_2 \frac{w(f)}{D} \right\rfloor + 1\right) - 1}{2},\; DN \log^2 N \log\log N \right)$$

can be achieved by the proposed coded $D$-partitioned PTF. In particular, the proposed coded PTF and coded DNF schemes correspond to the two extreme points of this tradeoff, which minimize the decoding complexity and maximize the security threshold respectively. Coded PTF corresponds to the point $D = 1$, i.e., no partition is performed. On the other hand, coded DNF corresponds to the point $D = w(f)$, i.e., each DNF after the partition process contains only one vector in $\{0, 1\}^m$, so each $P_d$ has degree 1 and the security threshold in (5.26) becomes $\frac{N-K}{2}$. Thus, coded $D$-partitioned PTF generalizes our previously proposed coded DNF and coded PTF, and allows us to systematically operate at any point on this tradeoff.

Remark 16.
The total complexity of computing the $K$ evaluations $f(X_1), \ldots, f(X_K)$ via ANF is $O(K\,r(f) \deg f)$. Thus, it is more efficient to use coded ANF than to compute all the evaluations at the master when $\deg f > \frac{N}{K} \log^2 N \log\log N$. On the other hand, since computing $f(X_1), \ldots, f(X_K)$ via DNF incurs the total complexity $O(K m\,w(f))$, we can conclude that it is more efficient to use coded DNF when $m > \frac{N}{K} \log^2 N \log\log N$. When $m\,w(f) > \frac{DN}{K} \log^2 N \log\log N$, coded $D$-partitioned PTF is more efficient than computing all the evaluations at the master.

5.7 Matching Outer Bound for coded ANF and coded DNF

In this section, we show that coded ANF and coded DNF are optimal in terms of the security threshold. We start by defining the recovery threshold and the Hamming distance of a scheme as follows:

Definition 11. For any integer $k$, we say a scheme is $k$-recoverable if the master can recover $h(X_1), \ldots, h(X_K)$ given the computing results from any $k$ workers. We define the recovery threshold of a scheme $(\tilde{g}, h)$, denoted by $K(\tilde{g}, h)$, as the minimum integer $k$ such that scheme $(\tilde{g}, h)$ is $k$-recoverable.

Definition 12. We define the Hamming distance of any scheme $(\tilde{g}, h)$, denoted by $d(\tilde{g}, h)$, as the maximum integer $d$ such that for any pair of input datasets whose computation results $h(X_1), \ldots, h(X_K)$ are different, at least $d$ workers compute different values of $h(\tilde{X}_n)$.

We prove the matching outer bound for coded ANF and coded DNF by the following theorem.

Theorem 10. For a distributed computing problem of computing a Boolean function $f$ using $N$ workers over a dataset $X = (X_1, \ldots, X_K)$, any scheme $(\tilde{g}, h)$ can achieve the security threshold up to

$$\beta = \frac{N-K}{2}. \tag{5.27}$$

Proof. The following lemma proved in [109] bridges coding theory and distributed computing via the recovery threshold and the Hamming distance of a scheme.

Lemma 10 (Lemma 3 in [109]).
For any scheme $(\tilde{g}, h)$, we have

$$K(\tilde{g}, h) = N - d(\tilde{g}, h) + 1, \tag{5.28}$$

$$E_{\mathrm{detect}}(\tilde{g}, h) = d(\tilde{g}, h) - 1, \tag{5.29}$$

$$E_{\mathrm{correct}}(\tilde{g}, h) = \left\lfloor \frac{d(\tilde{g}, h) - 1}{2} \right\rfloor, \tag{5.30}$$

where $K(\tilde{g}, h)$ is the recovery threshold provided by scheme $(\tilde{g}, h)$, $E_{\mathrm{detect}}(\tilde{g}, h)$ denotes the maximum number of errors that can be detected by the scheme, and $E_{\mathrm{correct}}(\tilde{g}, h)$ denotes the maximum number of errors that can be corrected by the scheme.

Lemma 10 indicates that any scheme that achieves a certain recovery threshold, denoted by $K(\tilde{g}, h)$, can correct up to $\left\lfloor \frac{N - K(\tilde{g}, h)}{2} \right\rfloor$ errors. With Lemma 10, proving Theorem 10 is equivalent to proving that the minimum recovery threshold of any scheme is $K$. Suppose that a scheme $(\tilde{g}, h)$ is used for the computations. We now present the following lemma proved in [112], which provides the converse bound on the recovery threshold of computing any multilinear function $h$.

Lemma 11 (Lemma 1 in [112]). Given any multilinear function $h$, the recovery threshold $K(\tilde{g}, h)$ of any scheme $(\tilde{g}, h)$ satisfies

$$K(\tilde{g}, h) \ge \min\{(K-1)\deg h + 1,\; N - \lfloor N/K \rfloor + 1\}. \tag{5.31}$$

It is clear that the degree of the function $h$ is at least 1, since constant functions do not work in our problem. Moreover, the recovery threshold is a non-decreasing function of the degree of $h$. By Lemma 11, the recovery threshold $K(\tilde{g}, h)$ is lower bounded by $K$, which concludes the proof.

By Theorem 10, we have shown that the proposed coded ANF and coded DNF schemes are optimal in terms of the security threshold.

5.8 Concluding Remarks and Future Directions

In this chapter, we focus on computing a Boolean function in a distributed manner against adversarial servers. To resolve the degree problem of using LCC (i.e., the security threshold provided by LCC can be low if the polynomial's degree is high), the proposed schemes, called coded ANF, coded DNF and coded PTF, largely improve the security threshold by modeling the polynomial as the concatenation of some low-degree polynomial functions and threshold functions.
It is shown that coded ANF and coded DNF are optimal, in that they match the derived theoretical outer bound, and that they increase the security threshold by 150% for computing the 8-bit S-box used in block ciphers on a distributed computing system with 100 workers.

There are many interesting directions that can be pursued on the problem of coded Boolean computations. For example, the proposed coded ANF and coded DNF require embedding bits into the reals, which might lead to floating-point errors during the decoding process. Thus, one direction is to implement the two schemes in an actual computing system and measure the effect of the field transformation.

Bibliography

[1] [Online]. Amazon EC2.
[2] Martín Abadi et al. “Tensorflow: A system for large-scale machine learning”. In: 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16). 2016, pp. 265–283.
[3] Josh Alman, Timothy M Chan, and Ryan Williams. “Polynomial representations of threshold functions and algorithmic applications”. In: 2016 IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS). IEEE. 2016, pp. 467–476.
[4] Kazuyuki Amano. “New upper bounds on the average PTF density of boolean functions”. In: International Symposium on Algorithms and Computation. Springer. 2010, pp. 304–315.
[5] Ganesh Ananthanarayanan et al. “Effective Straggler Mitigation: Attack of the Clones.” In: NSDI. Vol. 13. 2013, pp. 185–198.
[6] Vahid Arabnejad, Kris Bubendorfer, and Bryan Ng. “Scheduling deadline constrained scientific workflows on dynamically provisioned cloud resources”. In: Future Generation Computer Systems 75 (2017), pp. 348–364.
[7] James Aspnes et al. “The expressive power of voting polynomials”. In: Combinatorica 14.2 (1994), pp. 135–148.
[8] Peter Auer, Nicolo Cesa-Bianchi, and Paul Fischer. “Finite-time analysis of the multiarmed bandit problem”. In: Machine learning 47.2-3 (2002), pp. 235–256.
[9] François Baccelli, Armand M Makowski, and Adam Shwartz.
“The fork-join queue and related systems with synchronization constraints: Stochastic ordering and computable bounds”. In: Advances in Applied Probability 21.3 (1989), pp. 629–660.
[10] François Baccelli, William A Massey, and Don Towsley. “Acyclic fork-join queuing networks”. In: Journal of the ACM (JACM) 36.3 (1989).
[11] E Berlekamp. “Nonbinary BCH decoding (Abstr.)” In: IEEE Transactions on Information Theory 14.2 (1968), pp. 242–242.
[12] Rawad Bitar, Parimal Parag, and Salim El Rouayheb. “Minimizing latency for secure distributed computing”. In: Information Theory (ISIT), 2017 IEEE International Symposium on. IEEE. 2017, pp. 2900–2904.
[13] Peva Blanchard, Rachid Guerraoui, Julien Stainer, et al. “Machine learning with adversaries: Byzantine tolerant gradient descent”. In: Advances in Neural Information Processing Systems. 2017, pp. 119–129.
[14] Hans-Dieter Block. “The perceptron: A model for brain functioning. i”. In: Reviews of Modern Physics 34.1 (1962), p. 123.
[15] Avrim Blum. “Rank-r decision trees are a subclass of r-decision lists”. In: Information Processing Letters 42.4 (1992), pp. 183–185.
[16] James Blythe et al. “Task scheduling strategies for workflow-based applications in grids”. In: Cluster Computing and the Grid, 2005. CCGrid 2005. IEEE International Symposium on. Vol. 2. IEEE. 2005, pp. 759–767.
[17] Dan Bogdanov, Sven Laur, and Jan Willemson. “Sharemind: A framework for fast privacy-preserving computations”. In: European Symposium on Research in Computer Security. Springer. 2008, pp. 192–206.
[18] Flavio Bonomi et al. “Fog computing and its role in the internet of things”. In: Proceedings of the first edition of the MCC workshop on Mobile cloud computing. ACM. 2012, pp. 13–16.
[19] Mark Bun and Justin Thaler. “A Nearly Optimal Lower Bound on the Approximate Degree of AC^0”. In: SIAM Journal on Computing 0 (2019), FOCS17–59.
[20] Steven Cao, Swanand Kadhe, and Kannan Ramchandran.
“CoVer: Collaborative Light-Node-Only Verification and Data Availability for Blockchains”. In: arXiv preprint arXiv:2010.00217 (2020).
[21] Lingjiao Chen et al. “DRACO: Byzantine-resilient Distributed Training via Redundant Gradients”. In: International Conference on Machine Learning. 2018, pp. 903–912.
[22] Lixing Chen and Jie Xu. “Task Replication for Vehicular Cloud: Contextual Combinatorial Bandit with Delayed Feedback”. In: IEEE INFOCOM 2019-IEEE Conference on Computer Communications. IEEE. 2019, pp. 748–756.
[23] Wei-Neng Chen and Jun Zhang. “An ant colony optimization approach to a grid workflow scheduling problem with various QoS requirements”. In: IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews) 39.1 (2009), pp. 29–43.
[24] Mung Chiang and Tao Zhang. “Fog and IoT: An overview of research opportunities”. In: IEEE Internet of Things Journal 3.6 (2016).
[25] Ronald Cramer, Ivan Bjerre Damgård, and Jesper Buus Nielsen. Secure multiparty computation. Cambridge University Press, 2015.
[26] Thomas W Cusick and Pantelimon Stanica. Cryptographic Boolean functions and applications. Academic Press, 2017.
[27] Jim G Dai. “On positive Harris recurrence of multiclass queueing networks: a unified approach via fluid limit models”. In: The Annals of Applied Probability (1995).
[28] Jim G Dai and Wuqin Lin. “Maximum pressure policies in stochastic processing networks”. In: Operations Research 53.2 (2005).
[29] Lisandro D Dalcin et al. “Parallel distributed computing using Python”. In: Advances in Water Resources 34.9 (2011), pp. 1124–1139.
[30] Sanghamitra Dutta, Viveck Cadambe, and Pulkit Grover. “Short-dot: Computing large linear transforms distributedly using coded short dot products”. In: Advances In Neural Information Processing Systems. 2016, pp. 2100–2108.
[31] Atilla Eryilmaz and R Srikant. “Fair resource allocation in wireless networks using queue-length-based scheduling and congestion control”.
In: IEEE/ACM Transactions on Networking (TON) 15.6 (2007), pp. 1333–1344.
[32] Atilla Eryilmaz, Rayadurgam Srikant, and James R Perkins. “Stable scheduling policies for fading wireless channels”. In: IEEE/ACM Transactions on Networking 13.2 (2005), pp. 411–424.
[33] Hao Feng et al. “Approximation algorithms for the NFV service distribution problem”. In: IEEE INFOCOM 2017-IEEE Conference on Computer Communications. IEEE. 2017, pp. 1–9.
[34] Nuwan Ferdinand and Stark C Draper. “Hierarchical coded computation”. In: 2018 IEEE International Symposium on Information Theory (ISIT). IEEE. 2018, pp. 1620–1624.
[35] Yi Gai, Bhaskar Krishnamachari, and Rahul Jain. “Combinatorial network optimization with unknown variables: Multi-armed bandits with linear rewards and individual observations”. In: IEEE/ACM Transactions on Networking (TON) 20.5 (2012), pp. 1466–1478.
[36] Başak Güler, A Salman Avestimehr, and Antonio Ortega. “TACC: Topology-Aware Coded Computing for Distributed Graph Processing”. In: IEEE Transactions on Signal and Information Processing over Networks 6 (2020), pp. 508–525.
[37] J Michael Harrison. “Brownian models of open processing networks: Canonical representation of workload”. In: Annals of Applied Probability (2000).
[38] Wassily Hoeffding. “Probability inequalities for sums of bounded random variables”. In: The Collected Works of Wassily Hoeffding. Springer, 1994, pp. 409–426.
[39] Mina Hoseinnejhad and Nima Jafari Navimipour. “Deadline constrained task scheduling in the cloud computing using a discrete firefly algorithm”. In: International Journal of Next-Generation Computing 8.3 (2017).
[40] I.-H. Hou, V. Borkar, and P. R. Kumar. “A Theory of QoS for Wireless”. In: IEEE INFOCOM 2009. Apr. 2009, pp. 486–494. doi: 10.1109/INFCOM.2009.5061954.
[41] Yun Chao Hu et al. “Mobile edge computing—A key technology towards 5G”. In: ETSI White Paper 11.11 (2015), pp. 1–16.
[42] Mike Jia, Jiannong Cao, and Lei Yang.
“Heuristic offloading of concurrent tasks for computation-intensive applications in mobile cloud computing”. In: Computer Communications Workshops (INFOCOM WKSHPS), 2014 IEEE Conference on. IEEE. 2014, pp. 352–357.
[43] Swanand Kadhe, O Ozan Koyluoglu, and Kannan Ramchandran. “Gradient Coding Based on Block Designs for Mitigating Adversarial Stragglers”. In: arXiv preprint arXiv:1904.13373 (2019).
[44] Yi-Hsuan Kao et al. “Hermes: Latency optimal task assignment for resource-constrained mobile computing”. In: IEEE Transactions on Mobile Computing (2017).
[45] Can Karakus et al. “Straggler mitigation in distributed optimization through data encoding”. In: Advances in Neural Information Processing Systems. 2017, pp. 5434–5442.
[46] Adam R Klivans, Ryan O’Donnell, and Rocco A Servedio. “Learning intersections and thresholds of halfspaces”. In: Journal of Computer and System Sciences 68.4 (2004), pp. 808–840.
[47] Adam R Klivans and Rocco A Servedio. “Learning DNF in time $2^{\tilde{O}(n^{1/3})}$”. In: Journal of Computer and System Sciences 68.2 (2004), pp. 303–318.
[48] Subhashini Krishnasamy et al. “Augmenting max-weight with explicit learning for wireless scheduling with switching costs”. In: IEEE/ACM Transactions on Networking 26.6 (2018), pp. 2501–2514.
[49] Eyal Kushilevitz. “A simple algorithm for learning O(log n)-term DNF”. In: Information Processing Letters 61.6 (1997), pp. 289–292.
[50] Yu-Kwong Kwok and Ishfaq Ahmad. “Static scheduling algorithms for allocating directed task graphs to multiprocessors”. In: ACM Computing Surveys (CSUR) 31.4 (1999), pp. 406–471.
[51] Tze Leung Lai and Herbert Robbins. “Asymptotically efficient adaptive allocation rules”. In: Advances in applied mathematics 6.1 (1985), pp. 4–22.
[52] Sina Lashgari and A Salman Avestimehr. “Timely throughput of heterogeneous wireless networks: Fundamental limits and algorithms”. In: IEEE Transactions on Information Theory 59.12 (2013), pp. 8414–8433.
[53] Kangwook Lee, Changho Suh, and Kannan Ramchandran. “High-dimensional coded matrix multiplication”. In: Information Theory (ISIT), 2017 IEEE International Symposium on. IEEE. 2017, pp. 2418–2422.
[54] Kangwook Lee et al. “Speeding up distributed machine learning using codes”. In: IEEE Transactions on Information Theory 64.3 (2018), pp. 1514–1529.
[55] Fengjiao Li, Jia Liu, and Bo Ji. “Combinatorial sleeping bandits with fairness constraints”. In: IEEE INFOCOM 2019-IEEE Conference on Computer Communications. IEEE. 2019, pp. 1702–1710.
[56] Kenli Li et al. “Scheduling precedence constrained stochastic tasks on heterogeneous cluster systems”. In: IEEE Transactions on computers 64.1 (2015), pp. 191–204.
[57] Lihong Li et al. “A contextual-bandit approach to personalized news article recommendation”. In: Proceedings of the 19th international conference on World wide web. ACM. 2010, pp. 661–670.
[58] S. Li et al. “PolyShard: Coded Sharding Achieves Linearly Scaling Efficiency and Security Simultaneously”. In: IEEE Transactions on Information Forensics and Security (2020), pp. 1–1.
[59] Shuai Li et al. “Contextual combinatorial cascading bandits”. In: Proceedings of the 33rd International Conference on International Conference on Machine Learning-Volume 48. 2016, pp. 1245–1253.
[60] Songze Li and Salman Avestimehr. “Coded computing”. In: Foundations and Trends® in Communications and Information Theory 17.1 (2020).
[61] Songze Li, Mohammad Ali Maddah-Ali, and A Salman Avestimehr. “Coding for distributed fog computing”. In: IEEE Communications Magazine 55.4 (2017), pp. 34–40.
[62] Songze Li et al. “A fundamental tradeoff between computation and communication in distributed computing”. In: IEEE Transactions on Information Theory 64.1 (2018), pp. 109–128.
[63] Songze Li et al. “Polyshard: Coded sharding achieves linearly scaling efficiency and security simultaneously”. In: IEEE Transactions on Information Forensics and Security 16 (2020), pp. 249–261.
[64] Loi Luu et al. “A secure sharding protocol for open blockchains”. In: Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security. 2016, pp. 17–30.
[65] Siva Theja Maguluri, R Srikant, and Lei Ying. “Stochastic models of load balancing and scheduling in cloud computing clusters”. In: INFOCOM, 2012 Proceedings IEEE. IEEE. 2012, pp. 702–710.
[66] S Eman Mahmoodi, RN Uma, and KP Subbalakshmi. “Optimal joint scheduling and cloud offloading for mobile applications”. In: IEEE Transactions on Cloud Computing (2016).
[67] James Massey. “Shift-register synthesis and BCH decoding”. In: IEEE transactions on Information Theory 15.1 (1969), pp. 122–127.
[68] ML Minsky. “Perceptrons”. In: MIT Press (1969).
[69] Sabrina Müller et al. “Context-aware proactive content caching with service differentiation in wireless networks”. In: IEEE Transactions on Wireless Communications 16.2 (2016), pp. 1024–1036.
[70] Krishna Giri Narra et al. “Slack squeeze coded computing for adaptive straggler mitigation”. In: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis. ACM. 2019, p. 14.
[71] Michael J Neely, Eytan Modiano, and Charles E Rohrs. “Dynamic power allocation and routing for time-varying wireless networks”. In: IEEE Journal on Selected Areas in Communications 23.1 (2005), pp. 89–103.
[72] Viên Nguyen. “Processing networks with parallel and sequential tasks: Heavy traffic analysis and Brownian limits”. In: The Annals of Applied Probability (1993), pp. 28–55.
[73] Noam Nisan and Mario Szegedy. “On the degree of Boolean functions as real polynomials”. In: Computational complexity 4.4 (1994), pp. 301–313.
[74] Hanzaleh Akbari Nodehi and Mohammad Ali Maddah-Ali. “Secure coded multi-party computation for massive matrix operations”. In: arXiv preprint arXiv:1908.04255 (2019).
[75] Ryan O’Donnell. Analysis of boolean functions. Cambridge University Press, 2014.
[76] Ryan O’Donnell and Rocco A Servedio.
“Extremal properties of polynomial threshold functions”. In: Journal of Computer and System Sciences 74.3 (2008), pp. 298–312.
[77] Ryan O’Donnell and Rocco A Servedio. “New degree bounds for polynomial threshold functions”. In: Proceedings of the thirty-fifth annual ACM symposium on Theory of computing. 2003, pp. 325–334.
[78] Erhan Oztop. “An upper bound on the minimum number of monomials required to separate dichotomies of $\{-1, 1\}^n$”. In: Neural computation 18.12 (2006), pp. 3119–3138.
[79] Ramtin Pedarsani, Jean Walrand, and Yuan Zhong. “Robust scheduling for flexible processing networks”. In: Advances in Applied Probability 49 (2017).
[80] Ramtin Pedarsani, Jean Walrand, and Yuan Zhong. “Scheduling tasks with precedence constraints on multiple servers”. In: Communication, Control, and Computing (Allerton), 2014 52nd Annual Allerton Conference on. IEEE. 2014.
[81] Saurav Prakash et al. “Coded computing for distributed graph analytics”. In: 2018 IEEE International Symposium on Information Theory (ISIT). IEEE. 2018, pp. 1221–1225.
[82] Saurav Prakash et al. “Coded computing for distributed graph analytics”. In: IEEE Transactions on Information Theory 66.10 (2020), pp. 6534–6554.
[83] Saurav Prakash et al. “Coded Computing for Low-Latency Federated Learning Over Wireless Edge Networks”. In: IEEE Journal on Selected Areas in Communications 39.1 (2020), pp. 233–250.
[84] Saurav Prakash et al. “Hierarchical Coded Gradient Aggregation for Learning at the Edge”. In: 2020 IEEE International Symposium on Information Theory (ISIT). IEEE. 2020, pp. 2616–2621.
[85] Lijing Qin, Shouyuan Chen, and Xiaoyan Zhu. “Contextual combinatorial bandit and its application on diversified online recommendation”. In: Proceedings of the 2014 SIAM International Conference on Data Mining. SIAM. 2014, pp. 461–469.
[86] Amirhossein Reisizadeh et al. “Coded computation over heterogeneous clusters”. In: IEEE Transactions on Information Theory (2019).
[87] Frank Rosenblatt.
“The perceptron: a probabilistic model for information storage and organization in the brain.” In: Psychological review 65.6 (1958), p. 386.
[88] Rajat Sen et al. “Contextual Bandits with Latent Confounders: An NMF Approach”. In: Artificial Intelligence and Statistics. 2017, pp. 518–527.
[89] Can Eren Sezener and Erhan Oztop. “Heuristic algorithms for obtaining Polynomial Threshold Functions with low densities”. In: arXiv preprint arXiv:1504.01167 (2015).
[90] Roshan Shariff and Or Sheffet. “Differentially private contextual linear bandits”. In: Advances in Neural Information Processing Systems. 2018, pp. 4296–4306.
[91] Jinhyun So, Basak Guler, and A Salman Avestimehr. “A Scalable Approach for Privacy-Preserving Collaborative Machine Learning”. In: arXiv preprint arXiv:2011.01963 (2020).
[92] Jinhyun So, Basak Guler, and A Salman Avestimehr. “Turbo-aggregate: Breaking the quadratic aggregation barrier in secure federated learning”. In: IEEE Journal on Selected Areas in Information Theory (2021).
[93] Jinhyun So et al. “Codedprivateml: A fast and privacy-preserving framework for distributed machine learning”. In: arXiv preprint arXiv:1902.00641 (2019).
[94] Mahdi Soleymani, Hessam Mahdavifar, and A Salman Avestimehr. “Analog lagrange coded computing”. In: IEEE Journal on Selected Areas in Information Theory (2021).
[95] Alexander L Stolyar et al. “Maxweight scheduling in a generalized switch: State space collapse and workload minimization in heavy traffic”. In: The Annals of Applied Probability 14.1 (2004), pp. 1–53.
[96] Rashish Tandon et al. “Gradient coding: Avoiding stragglers in distributed learning”. In: International Conference on Machine Learning. 2017, pp. 3368–3376.
[97] Xiaoyong Tang et al. “A stochastic scheduling algorithm for precedence constrained tasks on grid”. In: Future Generation Computer Systems 27.8 (2011), pp. 1083–1091.
[98] Leandros Tassiulas and Anthony Ephremides.
“Stability properties of constrained queueing systems and scheduling policies for maximum throughput in multihop radio networks”. In: IEEE transactions on automatic control 37.12 (1992), pp. 1936–1948.
[99] Haluk Topcuoglu, Salim Hariri, and Min-you Wu. “Performance-effective and low-complexity task scheduling for heterogeneous computing”. In: IEEE transactions on parallel and distributed systems 13.3 (2002), pp. 260–274.
[100] Subir Varma. “Heavy and light traffic approximations for queues with synchronization constraints”. PhD thesis. 1990.
[101] Neil Stuart Walton. “Concave switching in single and multihop networks”. In: ACM SIGMETRICS Performance Evaluation Review. Vol. 42. 1. ACM. 2014, pp. 139–151.
[102] Chien-Sheng Yang and A Salman Avestimehr. “Coded computing for secure Boolean computations”. In: IEEE Journal on Selected Areas in Information Theory 2.1 (2021), pp. 326–337.
[103] Chien-Sheng Yang, Ramtin Pedarsani, and A Salman Avestimehr. “Communication-Aware Scheduling of Serial Tasks for Dispersed Computing”. In: IEEE/ACM Transactions on Networking (TON) 27.4 (2019), pp. 1330–1343.
[104] Chien-Sheng Yang, Ramtin Pedarsani, and A Salman Avestimehr. “Edge computing in the dark: Leveraging contextual-combinatorial bandit and coded computing”. In: IEEE/ACM Transactions on Networking (2021).
[105] Chien-Sheng Yang, Ramtin Pedarsani, and A Salman Avestimehr. “Timely coded computing”. In: 2019 IEEE International Symposium on Information Theory (ISIT). IEEE. 2019, pp. 2798–2802.
[106] Chien-Sheng Yang, Ramtin Pedarsani, and A Salman Avestimehr. “Timely-Throughput Optimal Coded Computing over Cloud Networks”. In: Proceedings of the Twentieth ACM International Symposium on Mobile Ad Hoc Networking and Computing. ACM. 2019, pp. 301–310.
[107] Jia Yu and Rajkumar Buyya. “Scheduling scientific workflow applications with deadline and budget constraints using genetic algorithms”. In: Scientific Programming 14.3-4 (2006), pp. 217–230.
[108] Mingchao Yu et al.
“Coded merkle tree: Solving data availability attacks in blockchains”. In: International Conference on Financial Cryptography and Data Security. Springer. 2020, pp. 114–134.
[109] Qian Yu, Mohammad Ali Maddah-Ali, and A Salman Avestimehr. “Straggler mitigation in distributed matrix multiplication: Fundamental limits and optimal coding”. In: IEEE Transactions on Information Theory (2020).
[110] Qian Yu and A Salman Avestimehr. “Coded Computing for Resilient, Secure, and Privacy-Preserving Distributed Matrix Multiplication”. In: IEEE Transactions on Communications (2020).
[111] Qian Yu, Mohammad Maddah-Ali, and Salman Avestimehr. “Polynomial codes: an optimal design for high-dimensional coded matrix multiplication”. In: Advances in Neural Information Processing Systems. 2017, pp. 4403–4413.
[112] Qian Yu et al. “Lagrange Coded Computing: Optimal Design for Resiliency, Security, and Privacy”. In: The 22nd International Conference on Artificial Intelligence and Statistics. 2019, pp. 1215–1225.
[113] Matei Zaharia et al. “Improving MapReduce performance in heterogeneous environments.” In: OSDI. Vol. 8. 4. 2008, p. 7.
[114] Jianan Zhang et al. “Optimal control of distributed computing networks with mixed-cast traffic flows”. In: IEEE INFOCOM 2018-IEEE Conference on Computer Communications. IEEE. 2018, pp. 1880–1888.
[115] Weiwen Zhang, Yonggang Wen, and Dapeng Oliver Wu. “Collaborative task execution in mobile cloud computing under a stochastic wireless channel”. In: IEEE Transactions on Wireless Communications 14.1 (2015), pp. 81–93.
[116] Wei Zheng and Rizos Sakellariou. “Stochastic DAG scheduling using a Monte Carlo approach”. In: Journal of Parallel and Distributed Computing 73.12 (2013), pp. 1673–1689.

Appendices

I Proof of Lemma 1

First, consider a vector ~ 2 0 .
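There exist feasible allocation vectors $\vec{p}$, $\vec{q}$, $\vec{u}$, $\vec{s}$ and $\vec{w}$ such that

$$\mu_{(k,j)} p_{(k,j)} = r_{(k,j)}, \quad \forall j, \forall k. \tag{I.1}$$

$$\frac{b_j q_{(k,j)}}{c_k} = r_{(k,j),c} = r_{(k,j)} (1 - s_{k,j \to j}), \quad \forall j, \forall k \in [K] \setminus \mathcal{H}. \tag{I.2}$$

Now, we focus on job $m$ specified by a chain and compute $\sum_{j=1}^{J} r_{(k,j)}$. If $k \in \mathcal{C}$, we have

$$\sum_{j=1}^{J} r_{(k,j)} = \sum_{j=1}^{J} \lambda_{m(k)} u_{m \to j} \tag{I.3}$$
$$= \lambda_{m(k)} \sum_{j=1}^{J} u_{m \to j} \tag{I.4}$$
$$= \lambda_{m(k)} \tag{I.5}$$

since $\sum_{j=1}^{J} u_{m \to j} = 1$. Then, we can compute $\sum_{j=1}^{J} r_{(k+1,j)}$ as follows:

$$\sum_{j=1}^{J} r_{(k+1,j)} = \sum_{j=1}^{J} \sum_{l=1}^{J} r_{(k,l)} f_{k,l \to j} \tag{I.6}$$
$$= \sum_{l=1}^{J} r_{(k,l)} \sum_{j=1}^{J} f_{k,l \to j} \tag{I.7}$$
$$= \sum_{l=1}^{J} r_{(k,l)} \tag{I.8}$$
$$= \lambda_{m(k)} \tag{I.9}$$

since $\sum_{j=1}^{J} f_{k,l \to j} = 1$. By induction, we have $\sum_{j=1}^{J} r_{(k,j)} = \lambda_{m(k)}$, $\forall k \in \mathcal{I}_m$. Then, we have

$$\sum_{j=1}^{J} r_{(k,j)} = \sum_{j=1}^{J} \mu_{(k,j)} p_{(k,j)}, \tag{I.10}$$

which concludes $\nu_k(\vec{\lambda}) = \lambda_{m(k)} = \sum_{j=1}^{J} \mu_{(k,j)} p_{(k,j)}$ for all $k$. For all $k \in [K] \setminus \mathcal{H}$ and all $j$, we can write

$$\frac{b_j q_{(k,j)}}{c_k} = r_{(k,j),c} \tag{I.11}$$
$$= r_{(k,j)} (1 - s_{k,j \to j}) \tag{I.12}$$
$$= r_{(k,j)} - r_{(k,j)} f_{k,j \to j} \tag{I.13}$$
$$\geq r_{(k,j)} - \sum_{l=1}^{J} r_{(k,l)} f_{k,l \to j} \tag{I.14}$$
$$= r_{(k,j)} - r_{(k+1,j)} \tag{I.15}$$
$$= \mu_{(k,j)} p_{(k,j)} - \mu_{(k+1,j)} p_{(k+1,j)}. \tag{I.16}$$

Thus, $\Lambda' \subseteq \Lambda$.

Now, we consider a rate vector $\vec{\lambda} \in \Lambda$. There exist allocation vectors $\vec{p}$ and $\vec{q}$ such that $\nu_k(\vec{\lambda}) = \sum_{j=1}^{J} \mu_{(k,j)} p_{(k,j)}$, $\forall k$; and $\frac{b_j q_{(k,j)}}{c_k} \geq \mu_{(k,j)} p_{(k,j)} - \mu_{(k+1,j)} p_{(k+1,j)}$, $\forall j$, $\forall k \in [K] \setminus \mathcal{H}$. For QNPP, for all $m \in [M]$, one can simply choose $u_{m \to j}$ as follows:

$$u_{m \to j} = \frac{\mu_{(k,j)} p_{(k,j)}}{\nu_k(\vec{\lambda})}, \quad \forall j, \tag{I.17}$$

where $k$ is the root node of job $m$. For $k \in [K] \setminus \mathcal{H}$, we denote

$$\mathcal{D}_k = \{ j : \mu_{(k,j)} p_{(k,j)} - \mu_{(k+1,j)} p_{(k+1,j)} < 0, \ \forall j \}. \tag{I.18}$$

Then, for $k \in [K] \setminus \mathcal{H}$, we choose $s_{k,j \to j}$ as follows:

$$s_{k,j \to j} = \begin{cases} 1 & \text{if } j \in \mathcal{D}_k \\ \dfrac{\mu_{(k+1,j)} p_{(k+1,j)}}{\mu_{(k,j)} p_{(k,j)}} & \text{if } j \notin \mathcal{D}_k \end{cases} \tag{I.19}$$

For $j \in \mathcal{D}_k$, we choose $w_{k,j \to l}$ to be any feasible value such that

$$\sum_{l \in [J] \setminus \{j\}} w_{k,j \to l} = 1. \tag{I.20}$$

For $j \notin \mathcal{D}_k$, we choose $w_{k,j \to l}$ as follows:

$$w_{k,j \to l} = \begin{cases} \dfrac{\mu_{(k+1,l)} p_{(k+1,l)} - \mu_{(k,l)} p_{(k,l)}}{\sum_{l' \in \mathcal{D}_k} \left( \mu_{(k+1,l')} p_{(k+1,l')} - \mu_{(k,l')} p_{(k,l')} \right)} & \text{if } l \in \mathcal{D}_k \\ 0 & \text{if } l \notin \mathcal{D}_k. \end{cases} \tag{I.21}$$

One can easily check that $\vec{s}$ and $\vec{w}$ are feasible.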
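The construction in (I.17) to (I.21) can be checked numerically for a single task $k$. The Python sketch below is only an illustration under stated assumptions: the function names `routing_from_rates` and `next_rates` and the shorthand `x[j]` for $\mu_{(k,j)} p_{(k,j)}$ and `y[j]` for $\mu_{(k+1,j)} p_{(k+1,j)}$ are hypothetical and do not appear in the proof. It assumes every `x[j]` is positive, that `sum(x) == sum(y)`, and that inputs are integers or `Fraction` values so the arithmetic is exact.

```python
from fractions import Fraction


def routing_from_rates(x, y):
    """Build the routing matrix f[l][j] = f_{k,l->j} from the rule in (I.19) and (I.21).

    x[j] stands for mu_(k,j) p_(k,j) and y[j] for mu_(k+1,j) p_(k+1,j)
    (hypothetical shorthand). Assumes x[j] > 0 and sum(x) == sum(y).
    """
    J = len(x)
    # D_k: servers that must receive extra type-(k+1) traffic, as in (I.18).
    D = [j for j in range(J) if x[j] - y[j] < 0]
    deficit = sum(y[l] - x[l] for l in D)
    f = [[Fraction(0)] * J for _ in range(J)]
    for j in range(J):
        # Self-routing probability s_{k,j->j}, as in (I.19).
        s = Fraction(1) if j in D else Fraction(y[j]) / x[j]
        f[j][j] = s
        if j in D:
            continue  # 1 - s = 0, so the choice of w in (I.20) does not matter
        for l in D:
            # Forwarding probability w_{k,j->l}, as in (I.21).
            w = Fraction(y[l] - x[l]) / deficit
            f[j][l] = (1 - s) * w
    return f


def next_rates(x, f):
    """Compute r_(k+1,j) = sum_l r_(k,l) f_{k,l->j} with r_(k,l) = x[l]."""
    J = len(x)
    return [sum(x[l] * f[l][j] for l in range(J)) for j in range(J)]


# Small example: server 0 has surplus rate, server 1 has a deficit.
x = [Fraction(3), Fraction(1), Fraction(2)]
y = [Fraction(1), Fraction(3), Fraction(2)]
f = routing_from_rates(x, y)
assert all(sum(row) == 1 for row in f)  # each row is a probability vector
assert next_rates(x, f) == y            # routed rates match the target rates
```

In this example only server 0 forwards traffic, and all of it goes to server 1, which is the single element of $\mathcal{D}_k$.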
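Based on the feasible vectors $\vec{u}$, $\vec{s}$ and $\vec{w}$ stated above, we can compute $r_{(k,j)}$. Let us focus on job $m$ and compute the nominal rate $r_{(k,j)}$. If $k \in \mathcal{C}$, we have

$$r_{(k,j)} = \nu_k(\vec{\lambda}) u_{m \to j} = \mu_{(k,j)} p_{(k,j)}, \quad \forall j. \tag{I.22}$$

Then, we can compute $r_{(k+1,j)}$ in the following two cases:

Case 1: $j \in \mathcal{D}_k$: We compute $r_{(k+1,j)}$ as

$$r_{(k+1,j)} = \sum_{l=1}^{J} r_{(k,l)} f_{k,l \to j} \tag{I.23}$$
$$= \mu_{(k,j)} p_{(k,j)} f_{k,j \to j} + \sum_{l \in [J] \setminus \{j\}} \mu_{(k,l)} p_{(k,l)} f_{k,l \to j} \tag{I.24}$$
$$= \mu_{(k,j)} p_{(k,j)} s_{k,j \to j} + \sum_{l \in [J] \setminus \{j\}} \mu_{(k,l)} p_{(k,l)} (1 - s_{k,l \to l}) w_{k,l \to j} \tag{I.25}$$
$$= \mu_{(k,j)} p_{(k,j)} + \sum_{l \notin \mathcal{D}_k} \mu_{(k,l)} p_{(k,l)} \left( 1 - \frac{\mu_{(k+1,l)} p_{(k+1,l)}}{\mu_{(k,l)} p_{(k,l)}} \right) \frac{\mu_{(k+1,j)} p_{(k+1,j)} - \mu_{(k,j)} p_{(k,j)}}{\sum_{l' \in \mathcal{D}_k} \left( \mu_{(k+1,l')} p_{(k+1,l')} - \mu_{(k,l')} p_{(k,l')} \right)} \tag{I.26}$$
$$= \mu_{(k,j)} p_{(k,j)} + \left( \mu_{(k+1,j)} p_{(k+1,j)} - \mu_{(k,j)} p_{(k,j)} \right) \frac{\sum_{l \notin \mathcal{D}_k} \left( \mu_{(k,l)} p_{(k,l)} - \mu_{(k+1,l)} p_{(k+1,l)} \right)}{\sum_{l' \in \mathcal{D}_k} \left( \mu_{(k+1,l')} p_{(k+1,l')} - \mu_{(k,l')} p_{(k,l')} \right)} \tag{I.27}$$
$$= \mu_{(k,j)} p_{(k,j)} + \left( \mu_{(k+1,j)} p_{(k+1,j)} - \mu_{(k,j)} p_{(k,j)} \right) \tag{I.28}$$
$$= \mu_{(k+1,j)} p_{(k+1,j)} \tag{I.29}$$

using the fact $\nu_k(\vec{\lambda}) = \nu_{k+1}(\vec{\lambda})$, i.e., $\sum_{j=1}^{J} \mu_{(k,j)} p_{(k,j)} = \sum_{j=1}^{J} \mu_{(k+1,j)} p_{(k+1,j)}$.

Case 2: $j \notin \mathcal{D}_k$: We compute $r_{(k+1,j)}$ as

$$r_{(k+1,j)} = \sum_{l=1}^{J} r_{(k,l)} f_{k,l \to j} \tag{I.30}$$
$$= \mu_{(k,j)} p_{(k,j)} f_{k,j \to j} + \sum_{l \in [J] \setminus \{j\}} \mu_{(k,l)} p_{(k,l)} f_{k,l \to j} \tag{I.31}$$
$$= \mu_{(k,j)} p_{(k,j)} s_{k,j \to j} + \sum_{l \in [J] \setminus \{j\}} \mu_{(k,l)} p_{(k,l)} (1 - s_{k,l \to l}) w_{k,l \to j} \tag{I.32}$$
$$= \mu_{(k+1,j)} p_{(k+1,j)} \tag{I.33}$$

since $s_{k,l \to l} = 1$ for $l \in \mathcal{D}_k$ and $w_{k,l \to j} = 0$ for $l, j \notin \mathcal{D}_k$ and $l \neq j$. Similarly, we can obtain $r_{(k,j)} = \mu_{(k,j)} p_{(k,j)}$ for all $k \in \mathcal{I}_m$.

Now, for $k \in [K] \setminus \mathcal{H}$, we can compute $r_{(k,j),c}$. There are two cases as follows:

Case 1: $j \in \mathcal{D}_k$: We compute $r_{(k,j),c}$ as

$$r_{(k,j),c} = r_{(k,j)} (1 - s_{k,j \to j}) \tag{I.34}$$
$$= \mu_{(k,j)} p_{(k,j)} (1 - s_{k,j \to j}) \tag{I.35}$$
$$= 0 \tag{I.36}$$

since $s_{k,j \to j} = 1$ for $j \in \mathcal{D}_k$. Therefore, $\frac{b_j q_{(k,j)}}{c_k} \geq 0 = r_{(k,j),c}$.

Case 2: $j \notin \mathcal{D}_k$: We compute $r_{(k,j),c}$ as

$$r_{(k,j),c} = r_{(k,j)} (1 - s_{k,j \to j}) \tag{I.37}$$
$$= \mu_{(k,j)} p_{(k,j)} \left( 1 - \frac{\mu_{(k+1,j)} p_{(k+1,j)}}{\mu_{(k,j)} p_{(k,j)}} \right) \tag{I.38}$$
$$= \mu_{(k,j)} p_{(k,j)} - \mu_{(k+1,j)} p_{(k+1,j)} \tag{I.39}$$

since $s_{k,j \to j} = \frac{\mu_{(k+1,j)} p_{(k+1,j)}}{\mu_{(k,j)} p_{(k,j)}}$ for $j \notin \mathcal{D}_k$. Then, we have $\frac{b_j q_{(k,j)}}{c_k} \geq \mu_{(k,j)} p_{(k,j)} - \mu_{(k+1,j)} p_{(k+1,j)} = r_{(k,j),c}$.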
Thus, 0 which completes the proof. J Proof of Lemma 2 Given an outcome of ~ , we denote Y (d;~ ; ~ `) as the total number of evaluations sent back to the master in time d using the load allocation vector ~ `. We define two events A , f~ : Y (d;~ ; ~ `) K(~ g 1 )g and B , f~ : Y (d;~ ; ~ `) K(~ g 2 )g. It is clear that we have P(T ( ~ `;~ g 1 ) d) = P(A) and P(T ( ~ `;~ g 2 ) d) = P(B). Considering an arbitrary outcome of ~ with the fact K(~ g 1 )K(~ g 2 ), we have that if Y (d;~ ; ~ `)K(~ g 2 ) then Y (d;~ ; ~ `)K(~ g 1 ). It implies BA which concludesP(A)P(B), i.e.,P(T ( ~ `;~ g 1 ) (~ )d)P(T ( ~ `;~ g 2 ) (~ )d). K Proof of Lemma 3 Given a load allocation vector ~ `, we can construct ~ ` 0 by assigning ` 0 i =` b if 0` i ` b , and ` 0 i =` g otherwise. Given an outcome of ~ , we denote Y (d;~ ; ~ `) as total number of results sent back to the master in time d using the load allocation vector ~ `. We define two events A ,f~ : Y (d;~ ; ~ `) K g and B ,f~ : Y (d;~ ; ~ ` 0 ) K g. It is clear that we have P(T ( ~ `; ~ g ) (~ ) d) =P(A), andP(T ( ~ ` 0 ; ~ g ) (~ )d) =P(B). Considering an arbitrary outcome of ~ , we have 137 the following facts: (1) If 0` i ` b , then we have ` 0 i i d. (2) If` b <` i ` g , we have either ` i g ; ` 0 i g d or ` i b ; ` 0 i b > d. (3) ` 0 i ` i for all i. By the facts above, if Y (d;~ ; ~ `) K , then Y (d;~ ; ~ ` 0 )K which implies AB. Thus, we haveP(T ( ~ ` 0 ; ~ g ) (~ )d)P(T ( ~ `; ~ g ) (~ )d) which completes the proof. L Proof of Lemma 5 In round m, we have the optimal success probability: P (m) = jG g (m)j X l=a(G g (m)) X G:GG g (m);jGj=l Y i2G p g;i (m) Y i2G g (m)nG p b;i (m) whereG g (m) characterizes the optimal load allocationvector in roundm. Let’s recall that we havei m todetermineloadallocationvectorinroundmusingLEA,i.e.,` m;i =` g if 1ii m , ` m;i =` b otherwise. Itisclearthatthisallocationvectorischaracterizedbyaset ^ G(m) = [i m ]. Also, we have w(i m ) = a( ^ G(m)) where w( ~ i),d K (n ~ i)` b `g e. 
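The success probability in round $m$ is written above as a sum over all subsets $G$ of at least $a(\cdot)$ elements, each weighted by a product of $p_{g,i}(m)$ over $G$ and $p_{b,i}(m)$ over the complement. The following Python sketch, offered only as an illustration, evaluates a sum of this shape in two ways: by direct enumeration of subsets and by a simple dynamic program over the number of selected indices. The function names and the example weights are hypothetical; the lists `p_good` and `p_bad` stand in for $p_{g,i}(m)$ and $p_{b,i}(m)$ restricted to the chosen set, and `a` stands in for the lower summation limit.

```python
from itertools import combinations
from math import prod


def success_prob_bruteforce(p_good, p_bad, a):
    """Sum over all index subsets G with |G| >= a of
    prod_{i in G} p_good[i] * prod_{i not in G} p_bad[i]."""
    n = len(p_good)
    total = 0.0
    for size in range(a, n + 1):
        for subset in combinations(range(n), size):
            chosen = set(subset)
            total += prod(
                p_good[i] if i in chosen else p_bad[i] for i in range(n)
            )
    return total


def success_prob_dp(p_good, p_bad, a):
    """Same quantity, computed in O(n^2) time.

    dist[c] accumulates the total weight of all assignments in which
    exactly c of the indices processed so far contribute p_good.
    """
    dist = [1.0]
    for pg, pb in zip(p_good, p_bad):
        new = [0.0] * (len(dist) + 1)
        for count, mass in enumerate(dist):
            new[count] += mass * pb      # index contributes its p_bad weight
            new[count + 1] += mass * pg  # index contributes its p_good weight
        dist = new
    return sum(dist[a:])


# Hypothetical example with three indices and threshold a = 2.
p_good = [0.9, 0.5, 0.2]
p_bad = [1 - p for p in p_good]
assert abs(success_prob_bruteforce(p_good, p_bad, 2)
           - success_prob_dp(p_good, p_bad, 2)) < 1e-12
```

The enumeration mirrors the formula term by term, while the dynamic program avoids the exponential number of subsets; both are sketches and not part of the LEA policy itself.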
Thus, P LEA (m) can be written as follows: P LEA (m) = i m X l=w(i m ) X G:G[i m ];jGj=l Y i2G p g;i (m) Y i2[ ~ i]nG p b;i (m) = j ^ Gg (m)j X l=a( ^ Gg (m)) X G:G ^ Gg (m);jGj=l Y i2G p g;i (m) Y i2 ^ Gg (m)nG p b;i (m) Note that the allocation vector characterized by ^ G g (m) maximizes the estimated success probability defined in (3.7) and (3.8) which is the estimated success probability based on ^ p g;i (m) and ^ p b;i (m). By SLLN, we have that ^ p g;i (m) converges to p g;i (m) and ^ p b;i (m) converges to p b;i (m) almost surely, as m goes to infinity. For all > 0, there exists m() such thatjp g;i (m) 138 ^ p g;i (m)j < andjp b;i (m) ^ p b;i (m)j < for all m > m(). Since ^ G g (m) maximizes the estimated success probability based on ^ p g;i (m) and ^ p b;i (m), for all m>m(), we have P (m) jG g (m)j X l=a(jG g (m)j) X G:GG g (m);jGj=l Y i2G (^ p g;i (m) +) Y i2G g (m)nG (^ p b;i (m) +) = jG g (m)j X l=a(jG g (m)j) X G:GG g (m);jGj=l Y i2G ^ p g;i (m) Y i2G g (m)nG ^ p b;i (m) +f() j ^ Gg (m)j X l=a(j ^ Gg (m)j) X G:G ^ Gg (m);jGj=l Y i2G ^ p g;i (m) Y i2 ^ Gg (m)nG ^ p b;i (m) +f() j ^ Gg (m)j X l=a(j ^ Gg (m)j) X G:G ^ Gg (m);jGj=l Y i2G (p g;i (m) +) Y i2 ^ Gg (m)nG (p b;i (m) +) +f() =P LEA (m) +g() +f(): Note thath(),g()+f() is a polynomial function of andh(0) = 0, which impliesh()! 0 as ! 0. Moreover, it is clear thatP LEA (m)P (m) sinceP (m) is optimal. Therefore, we can conclude that for all 1 > 0, there existsm( 1 ) such thatjP LEA (m)P (m)j< 1 for all m>m( 1 ) which completes the proof. M Proof of Lemma 7 Suppose the policy enters the exploration phase in roundt and letP t =fp t g 2V t be the cor- responding hypercubes of edge devices. Then, based on the design of the proposed policy, the set of under-explored hypercubesP ue;t T is non-empty, i.e., there exists at least one edge device with context t such that a hypercube p satisfying t 2 p has C t (p) K(t) = t z log (t). 
Clearly, there can be at mostdT z log (T )e exploration phases in which edge devices with contexts in p are selected due to under-exploration of p. Since there are (h T ) D hypercubes in the partition, there can be at most (h T ) D T z log (T ) exploration phases. Also, the max- imum achievable reward of an offloading decision is bounded by 1 and the minimum 139 achievable reward isB. The maximum regret in one exploration phase is bounded by 1 +(B 1)< 1 +B. Therefore, we have E[R e (T )] (1 +B)(h T ) D dT z log (T )e (M.1) = (1 +B)dT e D dT z log (T )e (M.2) (1 +B)2 D T D (T z log (T ) + 1) (M.3) = (1 +B)2 D (T z+ D log (T ) +T D ) (M.4) using the fact thatdT e D (2T ) D = 2 D T D . N Proof of Lemma 8 For each t2 [T ], we define W t =fP ue;t =;g as the event that the algorithm enters the exploitation phase. By the definition ofP ue;t , we have that C t (p)>K(t) =t z log (t) for all p2P t . Let V t G be the event that subset G2L t is selected in round t. Then, we have R s (T ) = T X t=1 X G2L t 1 fV t G ;W t g (r(A t )r(G)): (N.1) Since the maximum regret is bounded by 1 +B, we have R s (T ) (1 +B) T X t=1 X G2L t 1 fV t G ;W t g : (N.2) By taking the expectation, the regret can be bounded as follows E[R s (T )] (1 +B) T X t=1 X G2L t Pr(V t G ;W t ): (N.3) 140 Now, we explain how to bound Pr(V t G ;W t ). Because of the design of policy, the choice of G is optimal based on the estimated ^ t . Thus, we have u(^ t ;G)u(^ t ; ~ A t ) which implies Pr(V t G ;W t ) Pr(u(^ t ;G)u(^ t ; ~ A t );W t ): (N.4) The eventfu(^ t ;G) u(^ t ; ~ A t );W t g actually implies that at least one of the following events holds for any H(t)> 0: E 1 = u(^ t ;G)u( t ;G) +H(t);W t (N.5) E 2 = u(^ t ; ~ A t )u( t ; ~ A t )H(t);W t (N.6) E 3 = u(^ t ;G)u(^ t ; ~ A t );u(^ t ;G)u( t ; ~ A t )H(t);W t : (N.7) Therefore, we have u(^ t ;G)u(^ t ; ~ A t );W t E 1 [E 2 [E 3 . Then, we proceed to bound the probabilities of events E 1 , E 2 and E 3 separately. 
Before bounding Pr(E 1 ), we first present the following lemma which is proved in Appendix P. Lemma 12. Given a positive number H(t), 1 , 2 and G, if u( 1 ;G)u( 2 ;G) +H(t), then there exits 2G such that 1 (p t ) 2 (p t ) + H(t) BM ; (N.8) where B = max 1tT b t and M = max 1tT B1 Y t 1 . Thus, by Lemma 12, we have E 1 = u(^ t ;G) u( t ;G) +H(t);W t ^ t (p t ) (p t ) + H(t) BM ;92G;W t . By the definition of (p), the expectation of estimated success probability for the edge device 2V t can be bounded byE[^ t (p t )](p t ). Then, we bound Pr(E 1 ) as follows Pr(E 1 ) = Pr(u(^ t ;G)u( t ;G) +H(t);W t ) (N.9) 141 Pr(^ (p t )(p t ) + H(t) BM ;92G;W t ) (N.10) Pr(^ t (p t )E[^ t (p t )] + H(t) BM ;92G;W t ) (N.11) X 2G Pr(^ t (p t )E[^ t (p t )] + H(t) BM ;W t ): (N.12) By applying Chernoff-Hoeffding inequality [38] and the fact that there are at least K(t) = t z log (t) samples drawn, we have Pr(E 1 ) X 2G Pr(^ t (p t )E[^ t (p t )] + H(t) BM ;W t ) (N.13) X 2G exp ( 2C t (p t )H(t) 2 B 2 M 2 ) (N.14) X 2G exp ( 2t z log (t)H(t) 2 B 2 M 2 ): (N.15) If we choose H(t) =BMt z=2 > 0, we have Pr(E 1 )B exp ( 2t z log (t)H(t) 2 B 2 M 2 ) (N.16) =B exp (2 log (t)) (N.17) =Bt 2 : (N.18) Similarly, we have a bound for Pr(E 2 ): Pr(E 2 )Bt 2 : (N.19) Lastly, we bound Pr(E 3 ). Now we suppose that the following condition is satisfied: 2H(t)At : (N.20) 142 SinceG2L t , we haveu( t ; ~ A t )u( t ;G)At . With (N.20), we haveu( t ; ~ A t )H(t) u( t ;G) + H(t) which contradicts event E 3 . That is, under condition (N.20), we have Pr(E 3 ) = 0. 
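The simplification from (N.16) to (N.18) relies on the choice $H(t) = BM t^{-z/2}$, which makes the exponent $2 K(t) H(t)^2 / (B^2 M^2)$ equal to $2 \log(t)$ when $K(t) = t^z \log(t)$. The short Python sketch below checks this numerically; the function name `hoeffding_tail` and the sample values of $t$, $z$, $B$ and $M$ are assumptions chosen only for illustration.

```python
import math


def hoeffding_tail(t, z, B, M):
    """Evaluate exp(-2 K(t) H(t)^2 / (B^2 M^2)) with
    K(t) = t^z log(t) and H(t) = B M t^(-z/2)."""
    K = t ** z * math.log(t)
    H = B * M * t ** (-z / 2)
    return math.exp(-2 * K * H ** 2 / (B ** 2 * M ** 2))


# For every sample point the tail term collapses to t^(-2),
# independently of z, B and M.
for t in (2, 10, 100):
    for z in (0.3, 0.7):
        assert math.isclose(hoeffding_tail(t, z, B=3, M=5), t ** -2, rel_tol=1e-9)
```

The factors $B$, $M$ and $t^{z}$ cancel inside the exponent, which is why each per-device term is bounded by $t^{-2}$ regardless of these parameters.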
Under condition (N.20), using (N.18) and (N.19), we have Pr(V t G ;W t ) (N.21) Pr(E 1 [E 2 [E 3 ) (N.22) Pr(E 1 ) +Pr(E 2 ) +Pr(E 3 ) (N.23) 2Bt 2 : (N.24) Finally, we complete the regret bound forE[R s (T )] as follows: E[R s (T )](1 +B) T X t=1 X G2L t Pr(V t G ;W t ) (N.25) (1 +B)jL t j T X t=1 2Bt 2 (N.26) (1 +B)jL t j(2B) 1 X t=1 t 2 (N.27) =(1 +B)jL t j(2B) 2 6 (N.28) (1 +B)B 2 3 B X k=1 jVj k : (N.29) O Proof of Lemma 9 For each t 2 [T ], we define W t = fP ue;t = ;g as the event that the policy enters the exploitation phase. Then, the regret due to near-optimal subsets can be written as R n (T ) = T X t=1 1 fW t ;G t 2A t b nL t g (r(A t )r(G t )): (O.1) 143 Let Q t =fW t ;G t 2A t b nL t g be the event that a near-optimal subset is selected in round t. Then, we have E[R n (T )] = T X t=1 Pr(Q t )E[r(A t )r(G t )jQ t ] (O.2) T X t=1 (u( t A t )u( t ;G t )): (O.3) where G t is near-optimal in each round t. By the definition ofL t , we then have u( t ; ~ A t )u( t ;G t )<At : (O.4) By the function c defined in Appendix P and Assumption 1, we have u( t ;A t )u(~ t ;A t ) (O.5) =c( t ; ~ t ;A t ;Y t ) (O.6) X (G 1 ;G 2 ;)2S(A t ;Y t ) j( t )( ~ p t )j (O.7) X (G 1 ;G 2 ;)2S(A t ;Y t ) Lk t ~ p t k (O.8) X (G 1 ;G 2 ;)2S(A t ;Y t ) LD 2 h T (O.9) = jA t j Y t Y t LD 2 h T = jA t j 1 Y t 1 jA t jLD 2 h T (O.10) BMLD 2 h T : (O.11) Similarly, we have the following inequalities: u(~ t ; ~ A t )u( t ; ~ A t )BMLD 2 h T (O.12) u( t ;G t )u( t ;G t )BMLD 2 h T (O.13) 144 Now, we bound u( t A t )u( t ;G t ) as follows: u( t ;A t )u( t ;G t ) (O.14) u(~ t ;A t ) +BMLD 2 h T u( t ;G t ) (O.15) u(~ t ; ~ A t ) +BMLD 2 h T u( t ;G t ) (O.16) u( t ; ~ A t ) + 2BMLD 2 h T u( t ;G t ) (O.17) u( t ; ~ A t ) + 3BMLD 2 h T u( t ;G t ) (O.18) 3BMLD 2 h T +At (O.19) by the definition of ~ A t and (O.4). 
With h T =dT e, we have u( t A t )u( t ;G t ) 3BMLD 2 dT e +At (O.20) 3BMLD 2 T +At : (O.21) Thus, we complete the regret bound forE[R n ] as follows: E[R n ] T X t=1 (3BMLD 2 T +At ) (O.22) 3BMLD 2 T 1 + A 1 + T 1+ : (O.23) P Proof of Lemma 12 First, we suppose that 1 (p t ) 2 (p t )< H(t) BM ;82G: (P.1) 145 We note that the following equation holds and will be used for analysis later. N Y i=1 a i N Y i=1 b i = N X i=1 a 1 :::a i1 (a i b i )b i+1 :::b N : (P.2) Without loss of generality, we can index the elements in G by G =f1; 2; 3:::;jGjg. Then we define a function c( 1 ; 2 ;G;Y ),u( 1 ;G)u( 2 ;G), i.e., c( 1 ; 2 ;G;Y t ) = jGj X s=Y t X G 0 G;jG 0 j=s Y 2G 0 1 (p t ) Y 2GnG 0 (1 1 (p t )) Y 2G 0 2 (p t ) Y 2GnG 0 (1 2 (p t ) : (P.3) We first define a function f(G 1 ;G 2 ;) as follows f(G 1 ;G 2 ;), Y 1 2G 1 ; 1 < (1 1 (p t 1 )) Y 1 2G 2 ; 1 < 1 (p t 1 ) Y 2 2G 1 ; 2 > (1 2 (p t 2 )) Y 2 2G 2 ; 2 > 1 (p t 2 )f 1 (p t ) 2 (p t )g; and a setS(G;Y t ),f(G 1 ;G 2 ;) :jG 1 j =jGjY t ;jG 2 j =Y t ;G 1 [G 2 =G;2G 2 g. We now show that c( 1 ; 2 ;G;Y ) can be rewritten as c( 1 ; 2 ;G;Y t ) = X (G 1 ;G 2 ;)2S(G;Y t ) f(G 1 ;G 2 nfg;): (P.4) If Y t =jGj, by equation (P.2), we have c( 1 ; 2 ;G;jGj) = Y 2G 1 (p t ) Y 2G 2 (p t ) (P.5) = X 2G Y 1 2G; 1 < 1 (p t 1 ) Y 2 2G; 2 > 2 (p t 2 )f 1 (p t ) 2 (p t )g (P.6) = X (G 1 ;G 2 ;)2S(G;jGj) f(G 1 ;G 2 nfg;); (P.7) 146 which implies that (P.4) holds for Y t =jGj. Now we suppose that Equation (P.4) holds for Y t , then we consider the case of Y t 1. By the definition of function c, we have c( 1 ; 2 ;G;Y t 1) (P.8) =c( 1 ; 2 ;G;Y t ) + X G 0 G;jG 0 j=Y t 1 Y 2G 0 1 (p t ) Y 2GnG 0 (1 1 (p t )) Y 2G 0 2 (p t ) Y 2GnG 0 (1 2 (p t ) : (P.9) Then, by using (P.2) and the definition of functionf and setS, we can write the second term of (P.9) as P (G 1 ;G 2 ;)2S(G;Y t 1) f(G 1 ;G 2 nfg;) P (G 1 ;G 2 ;)2S(G;jGjY t +1) f(G 2 nfg;G 1 ;). For each (G 1 ;G 2 ;)2S(G;jGjY t + 1), the corresponding (G 2 nfg;G 1 [fg;) is also inS(G;Y t ). 
Thus, we have X (G 1 ;G 2 ;)2S(G;jGjY t +1) f(G 2 nfg;G 1 ;) (P.10) = X (G 1 ;G 2 ;)2S(G;Y t ) f(G 1 ;G 2 nfg;) (P.11) =c( 1 ; 2 ;G;Y t ): (P.12) It follows that c( 1 ; 2 ;G;Y t 1) = X (G 1 ;G 2 ;)2S(G;Y t 1) f(G 1 ;G 2 nfg;) (P.13) which implies that (P.4) holds for all 1Y t jGj. With (P.1) and the definition of function f, we have f(G 1 ;G 2 ;)< H(t) BM for all . Then we further have u( 1 ;G)u( 2 ;G) (P.14) 147 X (G 1 ;G 2 ;)2S(G;Y t ) H(t) BM (P.15) = jGj Y t Y t H(t) BM (P.16) = jGj 1 Y t 1 jGj H(t) BM (P.17) H(t); (P.18) which contradicts u( 1 ;G) u( 2 ;G) +H(t), i.e., there exits 2 G such that 1 (p t ) 2 (p t ) + H(t) BM . Q Proof of Lemma 6 For a fixed integer n g , we supposeA 1 is the optimal set with cardinality n g where i = 2A 1 and 1 i n g . Thus, there exists a j 2 G 1 such that j > n g . We construct a set A 2 = (A 1 nfjg)[fig, whereA 1 nfjg =A 2 nfig. Then, we have u( t ;A 2 ) = Pr( P 2A 2 q t Y t )n g = (p t i )Pr( P 2A 2 nfig q t Y t 1) + (1(p t i ))Pr( P 2A 2 nfig q t Y t )n g and u( t ;A 1 ) =(p t j )Pr( P 2A 1 nfjg q t Y t 1)+(1(p t j ))Pr( P 2A 1 nfjg q t Y t )n g . Then, wehaveu( t ;A 2 )u( t ;A 1 ) = ((p t i )(p t j ))(Pr( P 2A 2 nfig q t Y t 1)Prf P 2A 2 nfig q t Y t )g 0 which is a contradiction. 148
Asset Metadata
Creator: Yang, Chien-Sheng (author)
Core Title: On scheduling, timeliness and security in large scale distributed computing
School: Viterbi School of Engineering
Degree: Doctor of Philosophy
Degree Program: Electrical Engineering
Degree Conferral Date: 2021-12
Publication Date: 09/14/2021
Defense Date: 08/17/2021
Publisher: University of Southern California (original), University of Southern California. Libraries (digital)
Tag: coded computing, dispersed computing, distributed computing, edge computing, OAI-PMH Harvest, Security, task scheduling, timely computation
Format: application/pdf (imt)
Language: English
Contributor: Electronically uploaded by the author (provenance)
Advisor: Avestimehr, Salman (committee chair), Javanmard, Adel (committee member), Krishnamachari, Bhaskar (committee member)
Creator Email: chienshy@usc.edu, paul555111@hotmail.com.tw
Permanent Link (DOI): https://doi.org/10.25549/usctheses-oUC15916711
Unique identifier: UC15916711
Legacy Identifier: etd-YangChienS-10061
Document Type: Dissertation
Rights: Yang, Chien-Sheng
Type: texts
Source: University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Access Conditions: The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the author, as the original true and official version of the work, but does not grant the reader permission to use the work if the desired use is covered by copyright. It is the author, as rights holder, who must provide use permission if such use is covered by copyright. The original signature page accompanying the original submission of the work to the USC Libraries is retained by the USC Libraries and a copy of it may be obtained by authorized requesters contacting the repository e-mail address given.
Repository Name: University of Southern California Digital Library
Repository Location: USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA
Repository Email: cisadmin@lib.usc.edu