DISPERSED COMPUTING IN DYNAMIC ENVIRONMENTS
by
Jared Ray Coleman
A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(COMPUTER SCIENCE)
May 2024
Dedication
To mom, for seeing potential in me when I couldn’t and felt like no-one else did.
Acknowledgements
First and foremost I would like to thank my wife, Tainã Coleman, who has unwaveringly supported me
throughout my academic career as a professional colleague, a fellow student, a teacher, and most
importantly, as a compassionate best friend. I owe any and all success to the support of my family — my
mother Jennifer Coleman, father David Coleman, sister Emily Coleman, in-laws Fernando, Valeria,
Matheus, and Rafaela Gariglio (Dias), grandparents, aunts, uncles, and friends. I want to extend a special
thank-you to Glenn Nelson, whose dedication continuously inspires me, and the rest of my teachers and
fellow students at the Nüümü Yadoha language program for helping me stay connected to my tribe in
Payahuunadü even while working and studying in Los Angeles.
I am grateful to my advisor, Dr. Bhaskar Krishnamachari, for showing me how a world-class academic
maintains high standards for quality work, encourages curiosity, and consistently demonstrates kindness
and understanding towards others. I will carry the lessons I learned from him, both personal and
professional, for the rest of my life. I also want to thank my defense committee members Dr. Konstantinos
Psounis and Dr. Jyo Deshmukh and thesis proposal committee members Dr. Rafael Ferreira da Silva and Dr.
Murali Annavaram for their helpful feedback and support. Finally, I would like to thank Dr. Oscar
Morales-Ponce, Dr. Evangelos Kranakis, and Dr. Danny Krizanc for their invaluable mentorship.
Collaborating with them over the past five years has been one of the highlights of my academic career so
far.
This work was supported in part by Army Research Laboratory under Cooperative Agreement
W911NF-17-2-0196. The authors acknowledge the Center for Advanced Research Computing (CARC) at
the University of Southern California for providing computing resources that have contributed to the
research results reported within this publication. URL: https://carc.usc.edu.
Jared Coleman
University of Southern California
May 2024
Table of Contents
Dedication
Acknowledgements
List of Tables
List of Figures
Abstract
Chapter 1: Introduction
1.1 Reproducibility
Chapter 2: Background
2.1 Problem Definition
2.2 A Brief History
2.3 The HEFT Scheduling Algorithm
2.4 Graph Neural Networks
2.5 Simulated Annealing
Chapter 3: Related Work
3.1 Scheduling in Dynamic Environments
3.2 Network Synthesis
3.3 Comparing Task Scheduling Algorithms
Chapter 4: Task Scheduling for Dynamic Networks
4.1 GCNScheduler
4.1.1 The Input Graph
4.1.2 Imitation Learning
4.2 Implementation Details
4.2.1 The Task graph
4.2.2 Simulated IoT Network
4.2.3 Training
4.3 Simulation Results
Chapter 5: Task Scheduling and Network Synthesis
5.1 A Motivating Example
5.2 Problem Definition
5.3 The NSDC Framework
5.4 Implementation & Results
Chapter 6: Comparing Task Scheduling Algorithms
6.1 The SAGA Framework
6.2 Benchmarking Results
6.3 Adversarial Analysis
6.3.1 Results
6.3.2 Case Study: HEFT vs. CPoP
6.4 Application-Specific PISA
6.4.1 Experimental Setup
6.4.2 Results
Chapter 7: Comparing Task Scheduling Algorithm Components
7.1 A Generalized List-Scheduling Algorithm
7.2 Results
7.2.1 Effects of Algorithmic Components
Chapter 8: Conclusion and Future Directions
Bibliography
List of Publications
List of Tables
5.1 Risk, deploy-cost, and speeds for the different deployable network elements.
6.1 Schedulers implemented in SAGA
6.2 Datasets available in SAGA
7.1 Parametric Task Graph Scheduling: Pareto-optimal schedulers
List of Figures
1.1 Example problem instance and schedule.
2.1 Example problem instance and schedule.
2.2 Example problem instance and schedule.
4.1 GCNScheduler for Dynamic Networks: Approach
4.2 Task Graph Structure for GCNScheduler for Dynamic Networks Experiments
4.3 Example Problem Instance for GCNScheduler for Dynamic Networks Experiments
4.4 GCNScheduler for Dynamic Networks: Average communication strength between nodes
4.5 GCNScheduler for Dynamic Networks: Average communication strength between for one node
4.6 GCNScheduler Performance Results: Number of Average Makespan
4.7 GCNScheduler Performance Results: Number of Executions
5.1 Network Synthesis Motivating Example
5.2 The NSDC Framework
5.3 Network Synthesis: Mother Network
5.4 Network Synthesis: Best Network
5.5 Network Synthesis: Results
6.1 Adversarial Algorithm Comparison: Benchmarking Results
6.2 Adversarial Algorithm Comparison: Comparison of Scheduling Algorithms on Slightly Modified Networks
6.3 Adversarial Algorithm Comparison: Makespan Ratios of PISA-Identified Problem Instances
6.4 Problem Instance where HEFT performs ≈ 1.55 times worse than CPoP.
6.5 Problem Instance where CPoP performs ≈ 1.88 times worse than HEFT.
6.6 Example workflow structures for blast and srasearch scientific workflows.
6.7 srasearch benchmarking results
6.8 blast benchmarking and adversarial analysis results
6.9 bwa benchmarking and adversarial analysis results
6.10 epigenomics benchmarking and adversarial analysis results
6.11 1000genome benchmarking and adversarial analysis results
6.12 montage benchmarking and adversarial analysis results
6.13 seismology benchmarking and adversarial analysis results
6.14 soykb benchmarking and adversarial analysis results
7.1 Parametric Task Graph Scheduling: example task graphs
7.2 Parametric Task Graph Scheduling: Pareto-optimal schedulers
7.3 Parametric Task Graph Scheduling: individual effect of priority function
7.4 Parametric Task Graph Scheduling: individual effect of comparison function
7.5 Parametric Task Graph Scheduling: individual effect of insertion scheme
7.6 Parametric Task Graph Scheduling: individual effect of critical path reservation
7.7 Parametric Task Graph Scheduling: individual effect of the sufferage selection scheme
7.8 Parametric Task Graph Scheduling: individual effect of comparison function on the cycles dataset
7.9 Parametric Task Graph Scheduling: interactions between algorithmic components
Abstract
Scheduling a distributed application modeled as a directed acyclic task graph (where nodes represent
computation tasks and edges represent precedence constraints/data flow between the tasks) over a set of
networked compute nodes is a fundamental problem in distributed computing and thus has received
substantial scholarly attention. Most existing solutions, however, fall short of accommodating the dynamic
nature of modern dispersed computing systems (e.g., IoT, edge, and robotic systems) where applications
and compute networks are not known ahead of time, change constantly, and are heavily
resource-constrained. We propose solutions that address this gap and show that novel methods for
understanding the limitations of existing task scheduling algorithms can inform the design of new
algorithms better suited for dynamic, unpredictable, and resource-constrained modern dispersed
computing systems.
The goal in task scheduling is to assign computational tasks from the given task graph to different
compute nodes in such a way that minimizes or maximizes some performance metric (e.g., total execution
time, energy consumption, throughput, etc.). We will focus on the task scheduling problem concerning
heterogeneous task graphs and compute networks with the objective of minimizing makespan (total
execution time). In particular, we explore the potential and limitations of both new and existing task
scheduling algorithms for constantly changing, unpredictable, and resource-constrained environments.
First, we show that many of the most widely used task scheduling algorithms are not well suited for
scenarios with highly dynamic network conditions and propose a novel approach to scheduling in such
environments. Leveraging recent advances in machine learning, we show that graph neural networks
(GNNs) can be used to quickly produce high-quality schedules in highly dynamic environments.
Then, we study the problem from a slightly different perspective. In real-world scenarios, it is often the
case that the compute network is not given, but rather is something that can be designed ahead of time. For
example, IoT system designers can decide where to place sensors and compute nodes in order to best
support their application. Because task scheduling is NP-hard [1], however, we know (unless P = NP) that there must exist
problem instances on which even the (empirically) best algorithms perform poorly. In our work on
network synthesis, we show that, for this reason, optimizing networks with respect to traditional metrics
like communication strength can be detrimental to performance. We propose a framework for studying
network synthesis for task scheduling applications and demonstrate its utility with an interesting use case
inspired by tactical IoT systems.
We also show that the traditional benchmarking approach to comparing existing task scheduling
algorithms is insufficient because problem instances on which algorithms perform poorly are often
remarkably similar to problem instances on which they perform well. We propose a new approach to
comparing task scheduling algorithms that can help us better understand the conditions under which an
algorithm performs well and poorly. Using our approach, we are able to find problem instances for which
every algorithm we consider performs at least twice as bad (produces a schedule with at least twice the makespan) as
another algorithm. In fact, for most algorithms, we are able to find conditions under which they perform at least five
times as bad as another algorithm. These are the first results of their kind in the literature and represent an
important step toward understanding the performance boundaries between algorithms.
Finally, we extend our efforts in comparing task scheduling algorithms to the individual algorithmic
components that make up the algorithms. We propose a generalized parametric list scheduling approach
for studying the individual and combined effects of such algorithmic components. We evaluate 72
algorithms produced by combining five different types of algorithmic components on 20 datasets and
present results on their individual and combined effects on average performance and runtime. We also
discuss how these results differ for individual datasets, suggesting that the way algorithmic components
interact with each other is problem-dependent.
Chapter 1
Introduction
The five-word title of this dissertation, “Dispersed Computing in Dynamic Environments,” carries a lot of
meaning about the goal and scope of this work. Let’s start with the first term: dispersed computing, or
rather, the more well-known term distributed computing. Very generally, a distributed computation is one
that uses more than one computer (i.e., a distributed system) to solve a computational problem. There are
many potential advantages to distributing computations over multiple computers, but perhaps the most
obvious (and the most applicable to the work here) is speedup. By dividing up a computation and running
its individual pieces in parallel on different computers, we can potentially save a ton of time! In practice
though, there are many challenges to distributed computing, one of which is balancing the benefit of
parallelizing computation and the detriment of delays caused by sending data back and forth between
compute nodes (communication delays). We study a very well-known problem that involves exactly this
kind of balancing game: Task Scheduling.
Very generally, the goal in task scheduling is to assign computational tasks from the given task graph
to different compute nodes in such a way that minimizes or maximizes some performance metric (e.g., total
execution time, energy consumption, throughput, etc.). We will focus on the task scheduling problem
concerning heterogeneous task graphs and compute networks with the objective of minimizing makespan
(total execution time) under the related machines model∗. Figure 1.1 depicts an example task graph,
compute network, and valid schedule. The task graph in the example has a typical fork-join type structure
that is quite common in distributed computing.
[Figure 1.1: Example problem instance and schedule. (a) Task Graph: compute costs c(t1) = 1.7, c(t2) = 1.2, c(t3) = 2.2, c(t4) = 0.8; data sizes c(t1, t2) = 0.6, c(t1, t3) = 0.5, c(t2, t4) = 1.3, c(t3, t4) = 1.6. (b) Network: node speeds s(v1) = 1.0, s(v2) = 1.2, s(v3) = 1.5; communication strengths s(v1, v2) = 0.5, s(v1, v3) = 1.0, s(v2, v3) = 1.2. (c) Schedule: tasks on nodes v1 to v3 over time 0 to 5, with t1, t3, and t4 on v1 and t2 on v3.]
Observe that tasks t2 and t3 both depend on output from
task t1 to run. This precedence constraint is the reason task t2 and task t3 are not scheduled to start
running until task t1 finishes. Also observe the gap in the schedule between the time that task t1 finishes
executing and task t2 starts executing. This gap in the schedule is caused by the time it takes for node v1
(the node task t1 is scheduled to run on) to send t1’s output data to node v3 (the node task t2 is scheduled
to run on). As this example shows, distributing computation across nodes does not always shorten the
makespan of the schedule. In fact, for this example, scheduling all tasks to run on node v3
would result in a schedule with a much lower makespan!

∗ In the related machines model, if the same task executes faster on some compute node n1 than on node n2, then n1 must
execute all tasks faster than n2 (n1 is strictly faster than n2). Observe that this model cannot describe, for example, multi-modal
systems, where certain classes of tasks (e.g., GPU-heavy tasks) might run better or worse on different types of machines (e.g., those
with or without GPUs). The related machines model as it pertains to the task scheduling problem we study in this thesis is
described further in Section 2.1.
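A quick calculation with the values from Figure 1.1 makes both observations concrete. It borrows the execution-time formula $c(t)/s(v)$ and communication-time formula $c(t, t')/s(v, v')$ formalized in Section 2.1, and it assumes, as is common but not stated explicitly here, that tasks placed on the same node exchange data instantly:

$$
\begin{aligned}
\text{transfer of } t_1\text{'s output from } v_1 \text{ to } v_3 &= \frac{c(t_1, t_2)}{s(v_1, v_3)} = \frac{0.6}{1.0} = 0.6 \\
\text{makespan with all tasks on } v_3 &= \frac{c(t_1) + c(t_2) + c(t_3) + c(t_4)}{s(v_3)} = \frac{1.7 + 1.2 + 2.2 + 0.8}{1.5} = \frac{5.9}{1.5} \approx 3.93
\end{aligned}
$$

Task t1 finishes on v1 at time 1.7/1.0 = 1.7, so t2 cannot start on v3 before time 2.3. When all tasks share v3, no transfers are needed and the four tasks simply run back to back, which finishes sooner than the schedule in Figure 1.1(c).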
Task Scheduling is an NP-Hard problem [1] and, unless P = NP, is not polynomial-time approximable within a
constant factor [2]. As a result, many heuristic algorithms have been proposed over the past decades. These
algorithms aren’t guaranteed to be optimal (most have no formal guarantees whatsoever) but rather have
been shown empirically to work reasonably well for real-world applications.
Before we get too deep into task scheduling, though, let us return to our term of interest, “dispersed
computing”. We use the term dispersed rather than distributed to narrow the scope of this work to the
kinds of applications that motivated it in the first place: distributed systems with physically dispersed
compute nodes. These kinds of systems are everywhere: from internet of things (IoT) systems to robotic
networks to edge computing systems. That the nodes are physically dispersed has some important
implications for the task scheduling problem. First, dispersed computing systems are typically very
resource-constrained. IoT nodes such as Raspberry Pi devices do not have anywhere near the
compute power of a typical PC or cloud server node. Also, the communication strength between nodes is
typically weaker than in traditional distributed systems (cloud computing systems, supercomputing
systems, etc.) and may be related to the distance between nodes.
Dispersed systems are often different from traditional distributed systems in another key way that is
captured by the second term in this dissertation’s title: “dynamic environments”. Many of the computing
environments we’re interested in (IoT, edge, etc.) are inherently unpredictable or constantly changing. For
example, the communication strength between two IoT nodes may be dependent on their distance to each
other, weather, barriers, and other factors. This is especially true when the compute nodes are mobile (e.g.,
when they are mounted to a mobile robot).
Due to the proliferation of IoT devices and the increasing popularity of edge computing, dispersed
computing systems have received increasing attention from researchers in recent years [3]. Of
particular interest are dispersed computing systems for tactical environments (e.g., military applications)
where high-stakes applications are run on resource-constrained, unpredictable, and potentially unsafe
networks [4]–[8]. We will show that traditional task scheduling algorithms fall short of accommodating
these kinds of systems. We propose solutions that address this gap and show that novel methods for
understanding the limitations of existing task scheduling algorithms can inform the design of new
algorithms better suited for dynamic, unpredictable, and resource-constrained modern dispersed
computing systems. In particular, we explore three interesting research topics:
1. Scheduling in Dynamic Environments: How does dynamicity (e.g., constantly changing network
conditions) affect the performance of heuristic scheduling algorithms and how can they be adapted
to better serve applications running in these kinds of environments?
2. Network Synthesis: Because the Task Scheduling problem is NP-Hard and not approximable
within a constant factor (unless P = NP), there must exist scenarios (problem instances) for which heuristic
algorithms perform arbitrarily poorly. Can we design compute networks with the task scheduling
problem in mind to avoid these scenarios in practice?
3. Comparing Task Scheduling Algorithms: Under what conditions do heuristic scheduling
algorithms perform well and how do individual algorithmic components affect algorithm
performance?
While the motivation for this work comes from dispersed, dynamic distributed computing systems, the
methods and results we will present are general and can be applied to a wide range of task scheduling
problems. We believe the insights gained from this work will be useful for researchers and practitioners
working on task scheduling problems in a variety of domains.
In Chapter 2 we formally define the task scheduling variant that is the subject of this dissertation,
present a brief history of the problem, and provide background on some of the technical tools to be used in
subsequent chapters. We then survey related work in Chapter 3.
In Chapter 4, we present our work in using GCNScheduler [9], a graph convolutional network based
scheduler trained to imitate a traditional “teacher” scheduling algorithm, for scheduling in highly dynamic
environments that traditional scheduling algorithms are not well suited for. We provide evidence to
support our hypothesis that machine learning based schedulers are useful for scenarios where traditional
scheduling algorithms are prohibitively slow to produce schedules (scenarios with constantly changing
network conditions).
In Chapter 5, we look at the task scheduling problem from a slightly different perspective. In real-world
scenarios, it is often the case that the compute network is not given, but rather, is something that can be
designed specifically to support one or more applications (e.g., IoT sensor/compute node allocation,
cloud/super-computer design, etc.). We show that optimizing networks with respect to traditional metrics
like communication strength can be detrimental to performance. Specifically, we present an example where
strictly improving the communication strength between two nodes in the network causes HEFT [10], one
of the most popular task scheduling algorithms, to produce a schedule with twice the makespan of the
schedule it produces for the initial (worse) network. We propose a framework for studying task scheduling
network synthesis problems and demonstrate its utility with an interesting use-case.
Building upon our observation from Chapter 5 that task scheduling algorithms can perform poorly on
problem instances that are remarkably similar to problem instances on which they perform well, we
propose in Chapter 6 a novel method for comparing existing task scheduling algorithms to better
understand the conditions under which their performances degrade. Despite the overwhelming number of
algorithms now available in the literature, open-source implementations of them are scarce. Even worse,
the datasets on which they were evaluated are often unavailable, and the implementations that do exist
are not compatible with each other (different programming languages, frameworks, problem instance
formats, etc.). In order to address this technological shortcoming and enable this work, we built SAGA, a
Python framework for running, evaluating, and comparing task scheduling algorithms [11].
SAGA’s modularity and extensibility make it easy to benchmark algorithms on various datasets (SAGA
currently includes datasets of randomly generated problem instances and datasets based on real-world
scientific workflows and IoT/edge computing applications). The underlying theme throughout Chapter 6,
however, is that the traditional benchmarking approach to comparing task scheduling algorithms is
insufficient. Benchmarking is only as useful as the underlying datasets are representative and, in practice,
peculiarities of heuristic algorithms for task scheduling make it difficult to tell just what broader family of
problem instances a dataset is really representative of. For this reason, benchmarking results for task
scheduling algorithms can be misleading. We show examples and provide methods for the automatic
discovery of in-family problem instances — that is, ones that are similar to other problem instances in the
dataset — on which algorithms that appear to perform well on the original dataset perform very poorly. In
fact, for every scheduling algorithm we consider (15 total), our proposed simulated-annealing based
adversarial instance finder (PISA) finds problem instances on which it performs at least twice as bad as
another of the algorithms. For most of the algorithms (10 of 15), it finds a problem instance on which the
algorithm performs at least five times worse than another algorithm!
In Chapter 7, we turn our attention to the individual algorithmic components of well-known heuristic
scheduling algorithms. We propose a general parametric scheduler (extending SAGA) that allows us to
mix-and-match different algorithmic components and evaluate how they individually contribute to a
list-scheduling algorithm’s performance and runtime. Interestingly, we find that many new algorithms
(composed of previously unstudied combinations of such components) are Pareto-optimal with respect to
performance and runtime for the datasets we evaluate on. We also report the average effects that both
individual components and combinations of different components have on performance and runtime,
providing evidence that the way algorithmic components interact with each other is problem-dependent
(i.e., depends on the task graph structure, whether or not the application is communication or computation
heavy, etc.).
Finally, in Chapter 8, we provide a summary of the contributions presented in this dissertation and the
avenues for future research they open up.
1.1 Reproducibility
In order to support and encourage reproducible science, links to all code, datasets, and other publicly
available resources used for the work presented in this dissertation have been aggregated into a single
GitHub repository at https://github.com/jaredraycoleman/phd_thesis_code.
Chapter 2
Background
We will now formally define the task scheduling problem that is the subject of this dissertation, present a
brief history of task scheduling problems, and introduce some of the tools to be used in subsequent
chapters.
2.1 Problem Definition
Let us denote the task graph as $G = (T, D)$, where $T$ is the set of tasks and $D$ contains the directed edges
or dependencies between these tasks. An edge $(t, t') \in D$ implies that the output from task $t$ is required
input for task $t'$. Thus, task $t'$ cannot start executing until it has received the output of task $t$. This is often
referred to as a precedence constraint. For a given task $t \in T$, its compute cost is represented by $c(t) \in \mathbb{R}^+$
and the size of the data exchanged between two dependent tasks, $(t, t') \in D$, is $c(t, t') \in \mathbb{R}^+$. Let
$N = (V, E)$ denote the compute node network, where $N$ is a complete undirected graph. $V$ is the set of
nodes and $E$ is the set of edges. The compute speed of a node $v \in V$ is $s(v) \in \mathbb{R}^+$ and the communication
strength between nodes $(v, v') \in E$ is $s(v, v') \in \mathbb{R}^+$. Under the related machines model [12], the execution
time of a task $t \in T$ on a node $v \in V$ is $\frac{c(t)}{s(v)}$, and the data communication time between tasks $(t, t') \in D$
from node $v$ to node $v'$ (i.e., $t$ executes on $v$ and $t'$ executes on $v'$) is $\frac{c(t, t')}{s(v, v')}$.
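These definitions map naturally onto plain data structures. The Python sketch below (Python because SAGA, introduced in Chapter 1, is a Python framework; the class and function names here are hypothetical illustrations, not SAGA's API) stores $c(t)$, $c(t, t')$, $s(v)$, and $s(v, v')$ for the instance of Figure 1.1 and computes the execution and communication times defined above. The definition does not say what $s(v, v)$ is; treating communication between tasks on the same node as instantaneous is our assumption.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Tuple


@dataclass
class TaskGraph:
    cost: Dict[str, float]              # c(t): compute cost of each task
    data: Dict[Tuple[str, str], float]  # c(t, t'): data size on each dependency (t, t')


@dataclass
class Network:
    speed: Dict[str, float]                # s(v): compute speed of each node
    strength: Dict[FrozenSet[str], float]  # s(v, v'): strength of each undirected link

    def link(self, v: str, w: str) -> float:
        # Assumption: a node "talks" to itself infinitely fast, so co-located tasks pay no transfer cost.
        if v == w:
            return float("inf")
        return self.strength[frozenset((v, w))]


def exec_time(g: TaskGraph, n: Network, t: str, v: str) -> float:
    """Execution time c(t) / s(v) of task t on node v (related machines model)."""
    return g.cost[t] / n.speed[v]


def comm_time(g: TaskGraph, n: Network, t: str, t2: str, v: str, w: str) -> float:
    """Time c(t, t2) / s(v, w) to send t's output from node v to node w, where t2 runs."""
    return g.data[(t, t2)] / n.link(v, w)


# The problem instance of Figure 1.1.
g = TaskGraph(
    cost={"t1": 1.7, "t2": 1.2, "t3": 2.2, "t4": 0.8},
    data={("t1", "t2"): 0.6, ("t1", "t3"): 0.5, ("t2", "t4"): 1.3, ("t3", "t4"): 1.6},
)
n = Network(
    speed={"v1": 1.0, "v2": 1.2, "v3": 1.5},
    strength={frozenset(("v1", "v2")): 0.5, frozenset(("v1", "v3")): 1.0,
              frozenset(("v2", "v3")): 1.2},
)

print(exec_time(g, n, "t1", "v1"))              # 1.7
print(comm_time(g, n, "t1", "t2", "v1", "v3"))  # 0.6
```

Keying each undirected link by a `frozenset` means $s(v, v')$ and $s(v', v)$ resolve to the same entry, matching the undirected network $N$.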
The goal is to schedule the tasks on different compute nodes in such a way that minimizes the
makespan (total execution time) of the task graph. Let $A$ denote a task scheduling algorithm. For a given
problem instance $(N, G)$, which represents a network/task graph pair, a schedule is a set of tuples of the
form $(t, v, r)$ such that $t \in T$, $v \in V$ is the node that task $t$ is scheduled to run on, and $r \in \mathbb{R}^+$ is the start
time of the task. A valid schedule $S$ must satisfy the following properties:
• All tasks must be scheduled exactly once:
$$\forall t \in T,\ \exists (t, v, r) \in S$$
$$\forall (t, v, r), (t', v', r') \in S,\quad t = t' \Rightarrow (v, r) = (v', r')$$
• Only one task can be scheduled on a node at a time (i.e., their start/end times cannot overlap):
$$\forall (t, v, r), (t', v, r') \in S,\quad t \neq t' \Rightarrow \left( r + \frac{c(t)}{s(v)} \le r' \;\lor\; r' + \frac{c(t')}{s(v)} \le r \right)$$
• A task cannot start executing until all of its dependencies have finished executing and their outputs
have been received at the node on which the task is scheduled:
$$\forall (t, v, r), (t', v', r') \in S,\quad (t, t') \in D \implies r + \frac{c(t)}{s(v)} + \frac{c(t, t')}{s(v, v')} \le r'$$
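The three properties translate directly into a checker. The Python sketch below is a hypothetical illustration (not SAGA's API) that hard-codes the instance of Figure 1.1 and tests a schedule, given as a set of $(t, v, r)$ tuples, against each property in turn. Two choices are ours rather than the text's: a small tolerance `EPS` absorbs floating-point round-off, and communication between tasks on the same node is taken to be instantaneous, since $s(v, v)$ is not defined above. The example schedule is our reading of Figure 1.1(c), with start times we computed as the earliest feasible ones.

```python
from itertools import combinations

EPS = 1e-9  # tolerance for floating-point round-off

# Problem instance of Figure 1.1: c(t), c(t, t'), s(v), and s(v, v').
COST = {"t1": 1.7, "t2": 1.2, "t3": 2.2, "t4": 0.8}
DATA = {("t1", "t2"): 0.6, ("t1", "t3"): 0.5, ("t2", "t4"): 1.3, ("t3", "t4"): 1.6}
SPEED = {"v1": 1.0, "v2": 1.2, "v3": 1.5}
STRENGTH = {frozenset(("v1", "v2")): 0.5, frozenset(("v1", "v3")): 1.0,
            frozenset(("v2", "v3")): 1.2}


def comm(size, v, w):
    """Transfer time between nodes v and w; assumed to be zero when v == w."""
    return 0.0 if v == w else size / STRENGTH[frozenset((v, w))]


def is_valid(schedule):
    """Check the three validity properties for a collection of (task, node, start) tuples."""
    entries = set(schedule)  # a schedule is a set, so exact duplicates collapse
    placed = {}
    for t, v, r in entries:
        if t in placed:  # the same task appears with two different (node, start) pairs
            return False
        placed[t] = (v, r)
    if set(placed) != set(COST):  # every task must be scheduled
        return False
    for (t, v, r), (t2, v2, r2) in combinations(entries, 2):
        finish, finish2 = r + COST[t] / SPEED[v], r2 + COST[t2] / SPEED[v2]
        if v == v2 and not (finish <= r2 + EPS or finish2 <= r + EPS):
            return False  # two tasks overlap on the same node
    for (t, t2), size in DATA.items():
        (v, r), (v2, r2) = placed[t], placed[t2]
        if r + COST[t] / SPEED[v] + comm(size, v, v2) > r2 + EPS:
            return False  # t2 starts before t's output reaches its node
    return True


# Our reading of Figure 1.1(c): t1, t3, t4 on v1 and t2 on v3, at earliest feasible starts.
fig_schedule = {("t1", "v1", 0.0), ("t3", "v1", 1.7), ("t2", "v3", 2.3), ("t4", "v1", 4.4)}
too_early = (fig_schedule - {("t2", "v3", 2.3)}) | {("t2", "v3", 2.0)}
print(is_valid(fig_schedule))  # True
print(is_valid(too_early))     # False: t1's output only reaches v3 at time 2.3
```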
Figure 2.1 depicts an example problem instance and solution (the same example as shown in the
Introduction). We define the makespan of a schedule $S$ as the time at which the last task finishes executing:
$$m(S) = \max_{(t, v, r) \in S} \left( r + \frac{c(t)}{s(v)} \right)$$
[Figure 2.1: Example problem instance and schedule. (a) Task graph: task costs c(t1) = 1.7, c(t2) = 1.2, c(t3) = 2.2, c(t4) = 0.8; dependency data sizes c(t1, t2) = 0.6, c(t1, t3) = 0.5, c(t2, t4) = 1.3, c(t3, t4) = 1.6. (b) Network: node speeds s(v1) = 1.0, s(v2) = 1.2, s(v3) = 1.5; link strengths s(v1, v2) = 0.5, s(v1, v3) = 1.0, s(v2, v3) = 1.2. (c) Schedule: Gantt chart of tasks on nodes v1, v2, v3 over time.]
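The three validity properties and the makespan definition can be checked mechanically. The following is a minimal sketch, not the dissertation's implementation. It encodes the Figure 2.1 instance and assumes that data exchanged between two tasks on the same node costs nothing, because the model above leaves $s(v, v)$ unspecified. The helper names (`exec_time`, `comm_time`, `is_valid`, `makespan`) and the sample schedule, which runs every task on v3, are hypothetical.

```python
from itertools import combinations

# Figure 2.1 instance: c(t), c(t, t'), s(v), and undirected link strengths s(v, v').
task_cost = {"t1": 1.7, "t2": 1.2, "t3": 2.2, "t4": 0.8}
data_size = {("t1", "t2"): 0.6, ("t1", "t3"): 0.5, ("t2", "t4"): 1.3, ("t3", "t4"): 1.6}
speed = {"v1": 1.0, "v2": 1.2, "v3": 1.5}
link = {("v1", "v2"): 0.5, ("v1", "v3"): 1.0, ("v2", "v3"): 1.2}

def exec_time(t, v):
    return task_cost[t] / speed[v]

def comm_time(t, tp, v, vp):
    # Assumption: transfers between tasks placed on the same node are free.
    if v == vp:
        return 0.0
    return data_size[(t, tp)] / link[tuple(sorted((v, vp)))]

def is_valid(schedule, eps=1e-9):
    """schedule: list of (task, node, start_time) tuples."""
    # 1. Every task is scheduled exactly once.
    if sorted(t for t, _, _ in schedule) != sorted(task_cost):
        return False
    placed = {t: (v, r) for t, v, r in schedule}
    # 2. Two tasks on the same node must not overlap in time.
    for (t, v, r), (tp, vp, rp) in combinations(schedule, 2):
        if v == vp and not (r + exec_time(t, v) <= rp + eps
                            or rp + exec_time(tp, v) <= r + eps):
            return False
    # 3. A task starts only after each predecessor finishes and its data arrives.
    for (t, tp) in data_size:
        (v, r), (vp, rp) = placed[t], placed[tp]
        if r + exec_time(t, v) + comm_time(t, tp, v, vp) > rp + eps:
            return False
    return True

def makespan(schedule):
    return max(r + exec_time(t, v) for t, v, r in schedule)

# Hypothetical schedule: run every task back to back on v3 in topological order.
on_v3, start = [], 0.0
for t in ["t1", "t2", "t3", "t4"]:
    on_v3.append((t, "v3", start))
    start += exec_time(t, "v3")
print(is_valid(on_v3), round(makespan(on_v3), 3))  # True 3.933
```

Starting every task at time 0 on the same node would violate the second property, so `is_valid` would reject that schedule.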
Because the problem of minimizing makespan is NP-hard for this model [1], many heuristic algorithms
have been proposed. Traditionally, these heuristic algorithms are evaluated on a set of problem instances
and compared against other algorithms based on their makespan ratio, which for a given problem instance
(N, G) is the makespan of the schedule produced by the algorithm divided by the minimum makespan of
the schedules produced by the baseline algorithms. Let $S_{A,N,G}$ denote the schedule produced by algorithm $A$ on problem instance $(N, G)$. Then the makespan ratio of an algorithm $A$ against a set of baseline algorithms $A_1, A_2, \ldots$ for a problem instance $(N, G)$ can be written
$$\frac{m(S_{A,N,G})}{\min\{m(S_{A_1,N,G}), m(S_{A_2,N,G}), m(S_{A_3,N,G}), \ldots\}}.$$
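For a single instance, the makespan ratio is a division by the best baseline makespan. The sketch below takes its first numbers from the HEFT example in Section 2.3: HEFT produces makespan 6, and placing every task on v2 produces 3.5. The baseline name "SingleNode" and the second dataset entry are hypothetical.

```python
def makespan_ratio(makespans, algorithm, baselines):
    """m(S_A) divided by the smallest baseline makespan on one problem instance.

    makespans: dict mapping algorithm name -> makespan on that instance.
    """
    return makespans[algorithm] / min(makespans[b] for b in baselines)

baselines = ["HEFT", "SingleNode"]            # "SingleNode" is a hypothetical name
dataset = [
    {"HEFT": 6.0, "SingleNode": 3.5},         # numbers from the Figure 2.2 walkthrough
    {"HEFT": 4.0, "SingleNode": 5.0},         # hypothetical second instance
]
ratios = [makespan_ratio(m, "HEFT", baselines) for m in dataset]
print([round(x, 3) for x in ratios])  # [1.714, 1.0]
```

Benchmarking reports the distribution of such per-instance ratios over a dataset. A ratio of 1 means the algorithm matched the best baseline on that instance.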
Makespan ratio is a commonly used concept in the task scheduling literature [10], [13]. It is common
to measure the makespan ratios of an algorithm (against a set of baseline algorithms) for a dataset of
problem instances. System designers use this technique, called benchmarking, to decide which algorithm(s)
best support(s) their application.
We can define the runtime ratio of an algorithm in similar terms. Let $r_A(N, G)$ denote the amount of time it takes to produce a schedule (on a particular system) using algorithm $A$. Then, the runtime ratio of an algorithm $A$ against a set of baseline algorithms $A_1, A_2, \ldots$ for a problem instance $(N, G)$ can be written
$$\frac{r_A(N, G)}{\min\{r_{A_1}(N, G), r_{A_2}(N, G), r_{A_3}(N, G), \ldots\}}.$$
In reality, the actual runtime of an algorithm even on a particular instance is not deterministic (due, for
example, to background processes running on the system). Thus, the runtime ratios reported in this
manuscript should be interpreted as estimated runtime ratios, rather than absolute deterministic values
(unlike the makespan ratios).
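Runtime ratios have to be estimated from measurements. One common way to reduce noise from background processes is to time each scheduler several times and keep the minimum. This sketch makes that assumption and uses `time.perf_counter`. The two toy "schedulers" and the list standing in for an instance $(N, G)$ are hypothetical placeholders.

```python
import time

def measure_runtime(scheduler, instance, repeats=5):
    """Estimated time for `scheduler` to process `instance` (min over repeats).

    The result is still an estimate: it varies from run to run and machine to machine.
    """
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        scheduler(instance)
        best = min(best, time.perf_counter() - start)
    return best

def runtime_ratio(runtimes, algorithm, baselines):
    return runtimes[algorithm] / min(runtimes[b] for b in baselines)

instance = list(range(10_000))                  # stand-in for a problem instance (N, G)
schedulers = {"fast": lambda x: max(x),         # hypothetical toy "schedulers"
              "slow": lambda x: sorted(x, reverse=True)}
runtimes = {name: measure_runtime(f, instance) for name, f in schedulers.items()}
print(runtime_ratio(runtimes, "slow", list(schedulers)))  # >= 1; exact value varies
```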
2.2 A Brief History
Task scheduling is a fundamental problem in distributed computing and has been studied for decades. A
comprehensive survey by Ronald Graham in 1979 classified multiple variants of the task scheduling
problem and proposed a structured approach to analyze their complexity classes [14]. Though the variant
of task scheduling discussed in this dissertation wasn’t directly addressed in their study, it did address
many facets of task graph scheduling, such as node heterogeneity, precedence constraints, and makespan
minimization. It did not, however, consider inter-node communication. This gap was addressed a decade
later in 1989, by Hwang et al., who introduced the concept of task graph scheduling with inter-node
communication (albeit only considering homogeneous compute speeds). They proposed the ETF (Earliest Task First) algorithm and proved that it produces schedules with makespan at most $(2 - 1/n)\,\omega^{(i)}_{\mathrm{opt}} + C$, where $n$ is the number of processors, $\omega^{(i)}_{\mathrm{opt}}$ is the optimal makespan without considering inter-node communication delays, and $C$ is the worst-case communication requirement over a terminal chain of tasks (one which determines the makespan). The task scheduling problem we consider is now known to be NP-hard [1] and not polynomial-time approximable within a constant factor [2].
Over time, as distributed computing came into the mainstream, many heuristic scheduling algorithms
emerged [7], [10], [15]–[25]. These algorithms typically have no formal guarantees but have been shown
empirically (through benchmarking) to produce low-makespan schedules for many real-world, practical
applications. In this dissertation, our goal is to explore the potential and limitations of such algorithms for
constantly changing, unpredictable, and resource-constrained dynamic environments.
2.3 The HEFT Scheduling Algorithm
HEFT [10] is one of the most popular task scheduling heuristic algorithms and is referenced frequently
throughout this dissertation. The goal of this section is to familiarize the reader with heuristic task
scheduling algorithms (and task scheduling in general) through a detailed description of the HEFT
algorithm.
In short, HEFT is a greedy algorithm that assigns tasks to the node which, given all previously
assigned tasks, would result in the earliest finish time of the task [10]. The first step of the algorithm is to
compute task ranks, which are used to decide the order in which tasks are scheduled. For a given task $t \in T$, let $\mathrm{succ}(t)$ denote the set of tasks that are dependent on $t$, $\{t' \in T : (t, t') \in D\}$ (i.e., task $t$'s successors). The rank of a task is its average execution time (across all nodes) plus the maximum, over its successors $t'$, of the rank of $t'$ plus the average time (across all links) needed to send the output of $t$ to $t'$:
$$\mathrm{rank}(t) = \overline{c}(t) + \max_{t' \in \mathrm{succ}(t)} \left( \mathrm{rank}(t') + \overline{c}(t, t') \right)$$
where $\overline{c}(t) = \frac{1}{|V|} \sum_{v \in V} \frac{c(t)}{s(v)}$ is the average execution time of task $t$ across all nodes and $\overline{c}(t, t') = \frac{1}{|E|} \sum_{(v, v') \in E} \frac{c(t, t')}{s(v, v')}$ is the average communication time between tasks $t$ and $t'$ across all communication links. For a task with no successors, the maximum is taken to be 0. Observe that the rank is recursively defined and that the rank of a task is always greater than the rank of its successors. HEFT schedules tasks in decreasing order of rank.
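The recursive rank definition maps directly onto a memoized function. This is a sketch, not HEFT's reference implementation. It uses the Figure 2.2 instance with exact `Fraction` arithmetic and averages communication over the three distinct links, as in the definition above. The helper names are hypothetical.

```python
from fractions import Fraction
from functools import cache

# Figure 2.2 instance.
task_cost = {"t1": 1, "t2": 3, "t3": 2, "t4": 1}
data_size = {("t1", "t2"): 1, ("t1", "t3"): 1, ("t2", "t4"): 5, ("t3", "t4"): 5}
speed = {"v1": 1, "v2": 2, "v3": 2}
link = {("v1", "v2"): 2, ("v1", "v3"): 2, ("v2", "v3"): 1}

def succ(t):
    return [tp for (u, tp) in data_size if u == t]

def avg_exec(t):
    # Average of c(t) / s(v) over all nodes v.
    return sum(Fraction(task_cost[t], s) for s in speed.values()) / len(speed)

def avg_comm(t, tp):
    # Average of c(t, t') / s(v, v') over all links (v, v').
    return sum(Fraction(data_size[(t, tp)], s) for s in link.values()) / len(link)

@cache
def rank(t):
    # Exit tasks (no successors) contribute only their average execution time.
    return avg_exec(t) + max((rank(tp) + avg_comm(t, tp) for tp in succ(t)), default=0)

order = sorted(task_cost, key=rank, reverse=True)
print(order)                               # ['t1', 't2', 't3', 't4']
print({t: str(rank(t)) for t in order})    # {'t1': '22/3', 't2': '6', 't3': '16/3', 't4': '2/3'}
```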
In scheduling each task $t$, HEFT computes the earliest start time (EST) and earliest finish time (EFT) of the task on each node $v$. Let $\mathrm{pred}(t)$ denote the set of tasks that task $t$ is dependent on, $\{t' \in T : (t', t) \in D\}$ (i.e., task $t$'s predecessors). Let $\mathrm{AFT}(t')$ be the time at which a previously scheduled task $t'$ finishes executing and $\mathrm{NODE}(t')$ be the node on which task $t'$ is scheduled. Then, we can define DAT (data arrival time) to be the time at which the output data of all predecessors of task $t$ is available at node $v$:
$$\mathrm{DAT}(t, v) = \max_{t' \in \mathrm{pred}(t)} \left( \mathrm{AFT}(t') + \frac{c(t', t)}{s(\mathrm{NODE}(t'), v)} \right)$$
where the communication term is zero when $\mathrm{NODE}(t') = v$, and $\mathrm{DAT}(t, v) = 0$ for a task with no predecessors. Then $\mathrm{EST}(t, v)$ is the earliest time such that all data from the task's predecessors is available at the node (i.e., no earlier than $\mathrm{DAT}(t, v)$) and the node is available to execute the task (i.e., no other task is scheduled to run on the node between time $\mathrm{EST}(t, v)$ and $\mathrm{EST}(t, v) + \frac{c(t)}{s(v)}$). This could be as late as the time at which the node finishes executing the last scheduled task. The EFT, then, is simply the EST plus the time it takes to execute the task on the node:
$$\mathrm{EFT}(t, v) = \mathrm{EST}(t, v) + \frac{c(t)}{s(v)}$$
The algorithm schedules the task on the node with the smallest EFT:
$$v^* = \arg\min_{v \in V} \mathrm{EFT}(t, v)$$
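The DAT, EST, and EFT definitions fit in one function. The sketch below uses an insertion policy: a task may fill an idle gap between already-scheduled tasks. It also assumes that same-node transfers are free. It is illustrated on the Figure 2.2 network at the point where $t_1$ has run on $v_2$ during $[0, \frac{1}{2}]$ and $t_2$ is being placed. The function and argument names are hypothetical.

```python
from fractions import Fraction

speed = {"v1": 1, "v2": 2, "v3": 2}
link = {frozenset(("v1", "v2")): 2, frozenset(("v1", "v3")): 2, frozenset(("v2", "v3")): 1}

def comm_time(size, src, dst):
    # Same-node transfers are free; otherwise c(t', t) / s(NODE(t'), v).
    return Fraction(0) if src == dst else Fraction(size, link[frozenset((src, dst))])

def eft(cost, preds, v, busy):
    """Earliest finish time on node v of a task with compute cost `cost`.

    preds: list of (AFT, NODE, data size) for each predecessor.
    busy:  dict node -> list of (start, end) intervals already occupied.
    """
    dat = max((aft + comm_time(size, node, v) for aft, node, size in preds),
              default=Fraction(0))
    duration = Fraction(cost, speed[v])
    est = dat
    for start, end in sorted(busy.get(v, [])):
        if est + duration <= start:
            break                  # the task fits in the idle gap before this interval
        est = max(est, end)        # otherwise, try again after this interval ends
    return est + duration

busy = {"v2": [(Fraction(0), Fraction(1, 2))]}     # t1 occupies v2 during [0, 1/2]
preds_t2 = [(Fraction(1, 2), "v2", 1)]              # (AFT(t1), NODE(t1), c(t1, t2))
efts = {v: eft(3, preds_t2, v, busy) for v in speed}
print({v: str(x) for v, x in efts.items()})         # {'v1': '4', 'v2': '2', 'v3': '3'}
print(min(efts, key=efts.get))                      # v2
```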
Consider the example in Figure 2.2. The ranks of the tasks can then be computed as follows:
[Figure 2.2: Example problem instance and schedule. (a) Task graph: task costs c(t1) = 1, c(t2) = 3, c(t3) = 2, c(t4) = 1; dependency data sizes c(t1, t2) = 1, c(t1, t3) = 1, c(t2, t4) = 5, c(t3, t4) = 5. (b) Network: node speeds s(v1) = 1, s(v2) = 2, s(v3) = 2; link strengths s(v1, v2) = 2, s(v1, v3) = 2, s(v2, v3) = 1. (c) Schedule produced by HEFT: t1 and t2 on v2, t3 on v3, and t4 on v1, over times 0 to 6.]
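Here each average is taken over the three nodes and the three links of the network:
$$\mathrm{rank}(t_4) = \overline{c}(t_4) = \frac{1}{3}\left(\frac{1}{1} + \frac{1}{2} + \frac{1}{2}\right) = \frac{2}{3}$$
$$\mathrm{rank}(t_2) = \overline{c}(t_2) + \mathrm{rank}(t_4) + \overline{c}(t_2, t_4) = \frac{1}{3}\left(\frac{3}{1} + \frac{3}{2} + \frac{3}{2}\right) + \frac{2}{3} + \frac{1}{3}\left(\frac{5}{2} + \frac{5}{2} + \frac{5}{1}\right) = 2 + \frac{2}{3} + \frac{10}{3} = 6$$
$$\mathrm{rank}(t_3) = \overline{c}(t_3) + \mathrm{rank}(t_4) + \overline{c}(t_3, t_4) = \frac{1}{3}\left(\frac{2}{1} + \frac{2}{2} + \frac{2}{2}\right) + \frac{2}{3} + \frac{10}{3} = \frac{16}{3}$$
$$\mathrm{rank}(t_1) = \overline{c}(t_1) + \max\left\{\mathrm{rank}(t_2) + \overline{c}(t_1, t_2),\ \mathrm{rank}(t_3) + \overline{c}(t_1, t_3)\right\} = \frac{2}{3} + \max\left\{6 + \frac{2}{3},\ \frac{16}{3} + \frac{2}{3}\right\} = \frac{22}{3}$$
where $\overline{c}(t_1, t_2) = \overline{c}(t_1, t_3) = \frac{1}{3}\left(\frac{1}{2} + \frac{1}{2} + \frac{1}{1}\right) = \frac{2}{3}$.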
The algorithm then schedules the tasks in decreasing order of rank: $t_1, t_2, t_3, t_4$. The first task is the easiest since it has no dependencies and $\mathrm{EST}(t_1, v) = 0$ for all $v$. Thus, task $t_1$ is scheduled on one of the fastest nodes, $v_2$ (which ties with $v_3$), and finishes at time $\frac{1}{2}$. The next task is $t_2$. Since $t_2$ depends on $t_1$, the earliest time that $t_2$ can start is when $t_1$ finishes and the output of $t_1$ arrives at the node on which $t_2$ is scheduled. We can compute the EST for $t_2$ on each node:
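$$\mathrm{EST}(t_2, v_1) = \mathrm{AFT}(t_1) + \frac{c(t_1, t_2)}{s(v_2, v_1)} = \frac{1}{2} + \frac{1}{2} = 1$$
$$\mathrm{EST}(t_2, v_2) = \mathrm{AFT}(t_1) + 0 = \frac{1}{2}$$
$$\mathrm{EST}(t_2, v_3) = \mathrm{AFT}(t_1) + \frac{c(t_1, t_2)}{s(v_2, v_3)} = \frac{1}{2} + 1 = \frac{3}{2}$$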
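Then the EFT for $t_2$ on each node is:
$$\mathrm{EFT}(t_2, v_1) = \mathrm{EST}(t_2, v_1) + \frac{c(t_2)}{s(v_1)} = 1 + 3 = 4$$
$$\mathrm{EFT}(t_2, v_2) = \mathrm{EST}(t_2, v_2) + \frac{c(t_2)}{s(v_2)} = \frac{1}{2} + \frac{3}{2} = 2$$
$$\mathrm{EFT}(t_2, v_3) = \mathrm{EST}(t_2, v_3) + \frac{c(t_2)}{s(v_3)} = \frac{3}{2} + \frac{3}{2} = 3$$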
Thus, t2 is scheduled on v2 and finishes at time 2. The process continues for t3 (scheduled on node v3,
finishing at time 2.5) and t4 (scheduled on node v1, finishing at time 6). The makespan of the schedule,
then, is 6. Observe that the schedule produced by HEFT for this problem instance is not optimal. A much
better solution would be to schedule all tasks on node v2 (the resulting makespan would be 3.5). Recall that
the task scheduling problem is NP-hard and that, unless P = NP, no polynomial-time algorithm can guarantee an optimal
schedule. Heuristic algorithms like HEFT have been empirically shown to produce low-makespan
schedules for many real-world applications, but they are not guaranteed to produce optimal (or even
approximately optimal) schedules. We explore this further in Chapters 6 and 7.
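The complete walkthrough can be reproduced with a compact, insertion-based HEFT. This is a sketch under the same assumptions as the walkthrough, not a reference implementation. Ranks average over the three links, same-node transfers are free, and EFT ties go to the node listed first. It should arrive at the schedule described above (makespan 6) and compare it with running every task on $v_2$ (makespan 3.5).

```python
from fractions import Fraction
from functools import cache

# Figure 2.2 instance.
task_cost = {"t1": 1, "t2": 3, "t3": 2, "t4": 1}
data_size = {("t1", "t2"): 1, ("t1", "t3"): 1, ("t2", "t4"): 5, ("t3", "t4"): 5}
speed = {"v1": 1, "v2": 2, "v3": 2}
link = {frozenset(("v1", "v2")): 2, frozenset(("v1", "v3")): 2, frozenset(("v2", "v3")): 1}

def exec_time(t, v):
    return Fraction(task_cost[t], speed[v])

def comm_time(t, tp, v, vp):
    return Fraction(0) if v == vp else Fraction(data_size[(t, tp)], link[frozenset((v, vp))])

@cache
def rank(t):
    avg_exec = sum(exec_time(t, v) for v in speed) / len(speed)
    def avg_comm(tp):
        return sum(Fraction(data_size[(t, tp)], s) for s in link.values()) / len(link)
    succs = [tp for (u, tp) in data_size if u == t]
    return avg_exec + max((rank(tp) + avg_comm(tp) for tp in succs), default=0)

def heft():
    placed = {}                          # task -> (node, start, finish)
    busy = {v: [] for v in speed}        # node -> occupied (start, finish) intervals
    for t in sorted(task_cost, key=rank, reverse=True):
        preds = [u for (u, tp) in data_size if tp == t]
        best = None
        for v in speed:                  # strict "<" below keeps the first node on ties
            dat = max((placed[u][2] + comm_time(u, t, placed[u][0], v) for u in preds),
                      default=Fraction(0))
            est, dur = dat, exec_time(t, v)
            for start, finish in sorted(busy[v]):   # insertion-based slot search
                if est + dur <= start:
                    break
                est = max(est, finish)
            if best is None or est + dur < best[2]:
                best = (v, est, est + dur)
        placed[t] = best
        busy[best[0]].append(best[1:])
    return placed

schedule = heft()
print({t: (v, str(f)) for t, (v, s, f) in schedule.items()})
print("HEFT makespan:", max(f for _, _, f in schedule.values()))   # 6
print("all on v2:", sum(exec_time(t, "v2") for t in task_cost))    # 7/2
```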
2.4 Graph Neural Networks
In this section, we present the background on graph neural networks (GNNs) necessary to understand the
solution and results presented in Chapter 4. Graph neural networks (GNNs) are a class of neural networks
that operate on graph-structured data (as opposed, for example, to grid-structured data like images). To
understand how GNNs work, it can be useful to interpret graph edges as message passing channels [26].
Each node is initialized with an embedding (its feature vector) and, in each layer, the embedding of a node
is updated by aggregating the embeddings of its neighbors. EDGNN [27] was proposed to capture the
nonreciprocal relationship between nodes in directed graphs by treating incoming and outgoing edges
differently. The embedding for a node u in the input graph is computed as follows:
$$h^{(t)}_{n,u} = \sigma\left( W^{(t)}_1 h^{(t-1)}_{n,u} + W^{(t)}_2 \sum_{v \in \mathcal{N}(u)} h^{(t-1)}_{n,v} + W^{(t)}_3 \sum_{v : (v,u) \in E} h^{(t-1)}_{e,(v,u)} + W^{(t)}_4 \sum_{v : (u,v) \in E} h^{(t-1)}_{e,(u,v)} \right) \tag{2.1}$$
where $\mathcal{N}(u)$ is the set of neighbors of node $u$, and $h^{(t)}_{n,v}$ and $h^{(t)}_{e,(u,v)}$ denote the layer-$t$ embeddings of node $v$ and of the edge from node $u$ to node $v$, respectively. $W^{(t)}_1$, $W^{(t)}_2$, $W^{(t)}_3$, and $W^{(t)}_4$ represent the weight matrices of layer $t$ for the embedding of the node itself, neighboring nodes, incoming edges, and outgoing edges, respectively. Finally, $\sigma$ is a non-linear activation function (e.g., ReLU).
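A single layer of Eq. (2.1) can be written with NumPy as a sketch. Weight matrices have shape (output dimension, input dimension) and are applied to row vectors via `h @ W.T`. Two choices here are assumptions, not EDGNN's published code. First, "neighbors" means nodes adjacent in either direction. Second, σ is ReLU. The function name and the toy three-node chain are hypothetical.

```python
import numpy as np

def edgnn_layer(h_node, h_edge, edges, W1, W2, W3, W4):
    """One EDGNN-style layer following Eq. (2.1).

    h_node: (num_nodes, d_n) layer t-1 node embeddings
    h_edge: (num_edges, d_e) layer t-1 edge embeddings; row k belongs to edges[k]
    edges:  list of directed (u, v) pairs
    W1, W2: (d_out, d_n) weights; W3, W4: (d_out, d_e) weights
    """
    n, d_e = h_node.shape[0], h_edge.shape[1]
    neigh = np.zeros((n, h_node.shape[1]))
    incoming = np.zeros((n, d_e))
    outgoing = np.zeros((n, d_e))
    for k, (u, v) in enumerate(edges):
        neigh[v] += h_node[u]          # assumption: neighbors in either direction
        neigh[u] += h_node[v]
        incoming[v] += h_edge[k]       # edge (u, v) is incoming to v ...
        outgoing[u] += h_edge[k]       # ... and outgoing from u
    pre = h_node @ W1.T + neigh @ W2.T + incoming @ W3.T + outgoing @ W4.T
    return np.maximum(pre, 0.0)        # sigma = ReLU

rng = np.random.default_rng(0)
edges = [(0, 1), (1, 2)]                               # toy chain 0 -> 1 -> 2
h_n, h_e = rng.normal(size=(3, 2)), rng.normal(size=(2, 1))
W1, W2 = rng.normal(size=(4, 2)), rng.normal(size=(4, 2))
W3, W4 = rng.normal(size=(4, 1)), rng.normal(size=(4, 1))
out = edgnn_layer(h_n, h_e, edges, W1, W2, W3, W4)
print(out.shape)  # (3, 4)
```

Stacking several such layers, each with its own weights, gives every task-graph node a receptive field that spans several hops along dependencies.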
GNNs have been used in a variety of applications, including social network analysis, recommendation
systems, and bio-informatics [28]. In Chapter 4, we propose using GCNScheduler, a recently proposed
GNN-based scheduling algorithm that uses GNNs to imitate a “teacher” scheduling algorithm (e.g., HEFT),
for scheduling in highly dynamic environments.
2.5 Simulated Annealing
In this section, we present an overview of simulated annealing that will be useful for understanding the
algorithm and results presented in Chapter 6. Simulated annealing is a probabilistic optimization algorithm
that is inspired by the annealing process in metallurgy [29], where a heated metal is carefully cooled to
reach a desirable, low-energy state. In the context of optimization, we start with an initial solution
(analogous to the heated metal) and gradually improve it by making small random changes to the solution.
As the algorithm progresses, we settle on a (hopefully) low-cost solution by gradually making it less
susceptible to changes (analogous to cooling the metal). The algorithm works as follows. First, we start
with an arbitrary solution. In each iteration, the algorithm generates a new solution by making a small
change to the current solution. The algorithm then evaluates the change in “energy” (i.e., cost) between the
new solution and the current solution and decides whether or not to accept the new solution. If the new
solution is better than the current solution, the algorithm always accepts it. If the new solution is worse
than the current solution, however, the algorithm accepts it with a probability that depends on the change
in energy and a temperature parameter. The temperature gradually decreases over time, and, as a result,
the probability of accepting worse solutions decreases. This allows the algorithm to escape local optima in
the beginning, when the temperature is high, and still settle on a good solution by the end of the
optimization, when the temperature is low.
Let $E(S)$ denote the energy (cost) of a solution $S$. The algorithm works as follows:
1. Initialize the temperature $T$ and the initial solution $S$.
2. Repeat until the stopping criterion is met (a maximum number of iterations is reached or the temperature falls below a certain threshold):
(a) Generate a new solution $S'$ by making a small change to the current solution $S$.
(b) Calculate the change in energy $\Delta E = E(S') - E(S)$.
(c) If $\Delta E < 0$, accept the new solution $S'$.
(d) If $\Delta E \ge 0$, accept the new solution $S'$ with probability $e^{-\Delta E / T}$.
(e) Lower the temperature $T$.
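The steps above translate into a short, problem-agnostic loop. In this sketch the caller supplies the energy function and the move ("small change") function. Several choices are assumptions, since the steps leave them open. Cooling is geometric (T is multiplied by a constant factor each iteration). The parameter names and defaults are illustrative. The best solution seen so far is tracked alongside the current one. The toy problem, minimizing $(x - 3)^2$ over the integers, is hypothetical.

```python
import math
import random

def simulated_annealing(initial, energy, neighbor, t0=1.0, cooling=0.95,
                        t_min=1e-3, max_iters=10_000, seed=None):
    rng = random.Random(seed)
    current, e_current = initial, energy(initial)
    best, e_best = current, e_current
    temp = t0
    for _ in range(max_iters):                  # stop after max_iters ...
        if temp < t_min:                        # ... or once the temperature is low
            break
        candidate = neighbor(current, rng)      # (a) small random change
        e_candidate = energy(candidate)
        delta = e_candidate - e_current         # (b) change in energy
        # (c) always accept improvements; (d) accept others with prob. e^(-dE/T)
        if delta < 0 or rng.random() < math.exp(-delta / temp):
            current, e_current = candidate, e_candidate
            if e_current < e_best:
                best, e_best = current, e_current
        temp *= cooling                         # (e) geometric cooling (an assumption)
    return best, e_best

best, e_best = simulated_annealing(
    initial=20,
    energy=lambda x: (x - 3) ** 2,
    neighbor=lambda x, rng: x + rng.choice((-1, 1)),
    seed=0,
)
print(best, e_best)
```

In Chapter 6, a "solution" is a problem instance and the energy reflects how poorly one algorithm performs against a baseline. Only `energy` and `neighbor` would change.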
Simulated annealing is a very general-purpose optimization algorithm that has been used to solve
innumerable problems in image processing, network design, combinatorics (like the famous knapsack
problem and the traveling salesman problem), aircraft trajectory planning, and many other areas [30], [31].
In Chapter 6, we use simulated annealing to find problem instances (solutions) for which an algorithm
performs maximally poorly compared to a given baseline algorithm.
Chapter 3
Related Work
In this chapter, we survey related work for each of the research topics introduced in Chapter 2.
3.1 Scheduling in Dynamic Environments
For dynamic environments like IoT systems, task scheduling comes with unique challenges: nodes tend to
be more severely compute constrained, given their size, and more severely communication constrained,
given their wireless nature. In Chapter 4, we discuss task scheduling for tactical IoT settings, where
highly-volatile network dynamics (i.e., constantly changing communication strengths between nodes)
impose the need for quick schedule computation. Recent work in learning-based algorithms shows promise
in reducing computation times to address this challenge [32], [33]. This is an emerging research topic,
however, and the literature is still developing. Most existing work, for example, considers only static
network conditions [34]. To effectively exploit the relational information of graph-structured data, graph
neural networks (GNNs) have recently become a popular method for approaching optimization problems
in wireless networks [32]. GCNScheduler, a new approach to task scheduling that works by training a
GNN to imitate a traditional scheduling algorithm, shows promise for dynamic environments due to its
very quick runtime [9]. In Chapter 4, we propose using GCNScheduler for scheduling in dynamic
environments, where traditional scheduling algorithms like HEFT [10] are too slow for networks with
constantly changing communication strengths.
3.2 Network Synthesis
Network synthesis addresses the important problem of deploying scarce resources (communication nodes,
sensors, compute devices, etc.) to best support a given task. Prior work in network synthesis has largely
focused either purely on communication (k-connectivity [35], [36], fault-tolerance [37]) or sensor
coverage [38] (and sometimes hybrids of the two [39]). As the Internet of Things comes into the battlefield
domain, though, the nature of tactical networks is shifting and operations like search and rescue or
surveillance will rely on in-network distributed edge computing of sensor data.
In Chapter 5, we propose a novel network synthesis problem inspired by next-generation tactical
network capabilities, prior work in communication-oriented network synthesis, and task scheduling [40]
(e.g., HEFT [10]). In particular, we study the network synthesis problem of selecting one among a set of
deployable networks which optimally supports a given tactical mission modeled as the distributed and
coordinated execution of complex tasks/applications. To the best of our knowledge, our work is the first to
show that optimizing for traditional metrics like communication strength is insufficient for synthesizing
networks to support applications scheduled by heuristic task scheduling algorithms. The framework we
propose for studying this kind of network synthesis problem is also novel.
3.3 Comparing Task Scheduling Algorithms
Existing approaches to comparing task scheduling algorithms mostly involve benchmarking, whereby a set
of algorithms is evaluated on one or more datasets and various performance metrics are reported for
comparison (see benchmarking/comparison/survey papers [13], [17], [41]–[43]). One of the most common
metrics (and the one we use in Chapters 6 and 7) is the makespan ratio (also called schedule length ratio)
which is the makespan of the schedule produced by an algorithm for a given problem instance normalized
by the minimum makespan produced by any of the algorithms being evaluated. Other metrics include
speedup (how many times larger the makespan would be if the algorithm scheduled all tasks to run on a
single node), efficiency (average speedup per node), frequency that the algorithm is the best among those
being evaluated, and slack (a measurement of schedule robustness) [44].
We study many different heuristic scheduling algorithms and benchmark their performance on
different datasets (we also propose an adversarial approach as an alternative to benchmarking in Chapter 6).
We will also explore how individual components of algorithms (e.g., how they prioritize tasks, how they
choose which node to assign a task to, etc.) contribute to the performance and runtime of different
scheduling algorithms. When studying the effects of individual algorithmic components, we restrict our
attention to one of the most popular task scheduling paradigms: list scheduling. In general, list scheduling
algorithms involve two steps:
1. Compute a priority p(t) for each task t.
2. Greedily schedule tasks in order of their computed priority to run on the node that minimizes or
maximizes some predefined cost function.
In other words, list scheduling algorithms first decide on a valid topological sort of the task graph, and then
use it to schedule tasks greedily according to some objective function. ETF (mentioned above) is a list
scheduling algorithm. Two of the most well-known heuristic algorithms for task graph scheduling, HEFT
(Heterogeneous Earliest Finish Time) and CPoP (Critical Path on Processor) [10], are also list scheduling
algorithms. Other paradigms studied in the literature include cluster-scheduling [13] and
meta-heuristic [18] algorithms. Cluster scheduling involves dividing the task graph into groups of tasks to
execute on a single node. Meta-heuristic approaches (e.g., simulated annealing or genetic algorithms) have
been shown to work well in some situations, but generally take longer to run and can be difficult to
tune [41]. Online and distributed scheduling algorithms have also been proposed [45]–[47].
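The two-step list scheduling template can be sketched generically. Step 1 turns the priorities into a valid topological order: among tasks whose predecessors are already ordered, it repeatedly picks the one with the highest priority. Step 2 greedily places each task on the node that minimizes a caller-supplied cost. This is an illustrative skeleton under those assumptions, not any specific published algorithm. The function names, the toy graph, the priorities, and the load-balancing cost are all hypothetical. It relies on `graphlib.TopologicalSorter` from the Python standard library (3.9+).

```python
from graphlib import TopologicalSorter

def priority_order(preds, priority):
    """Step 1: a topological order that prefers higher-priority ready tasks.

    preds: dict task -> set of predecessor tasks.
    """
    sorter = TopologicalSorter(preds)
    sorter.prepare()
    order, ready = [], list(sorter.get_ready())
    while ready:
        ready.sort(key=priority, reverse=True)
        t = ready.pop(0)
        order.append(t)
        sorter.done(t)
        ready.extend(sorter.get_ready())       # tasks whose predecessors are all ordered
    return order

def list_schedule(preds, priority, nodes, cost):
    """Step 2: greedily assign each task to the node minimizing cost(t, v, schedule)."""
    schedule = {}
    for t in priority_order(preds, priority):
        schedule[t] = min(nodes, key=lambda v: cost(t, v, schedule))
    return schedule

preds = {"b": {"a"}, "c": {"a"}, "d": {"b", "c"}}           # toy diamond-shaped graph
prio = {"a": 4, "b": 2, "c": 3, "d": 1}.get                  # hypothetical priorities
load = lambda t, v, sched: sum(1 for n in sched.values() if n == v)  # toy cost
print(priority_order(preds, prio))                           # ['a', 'c', 'b', 'd']
print(list_schedule(preds, prio, ["n1", "n2"], load))
```

HEFT fits this template with upward rank as the priority and EFT as the cost. Swapping in other priority or cost functions yields other list schedulers.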
Among the many notable benchmarking efforts in the literature, one study [43] benchmarks 15
scheduling algorithms and even discusses many of the algorithmic components that we consider in Chapter 7,
including Assigning Priorities to Nodes (tasks), Insertion vs. Non-Insertion, and Critical-Path-Based vs.
non Critical-Path-Based schedulers. They do not, however, study all combinations of such components and
also do not report results on their individual and combined effects on performance or runtime. Another
study [17] evaluates eleven algorithms for independent non-communicating tasks with uniformly
random costs and heterogeneous compute nodes with uniformly random speeds and yet another [44]
evaluates six algorithms on randomly generated application graphs and real-world application graphs (FFT,
Gaussian Elimination, Montage, and Epigenomics Scientific Workflow). Most similar to the work we
present in Chapter 7, a comprehensive evaluation of 31 list scheduling algorithms with different
combinations of algorithmic components [13] (the same paper also presents results for cluster scheduling
algorithms), reports useful benchmarking results but does not consider the individual effects of such
components.
To the best of our knowledge, the work presented in Chapter 6 is the first to propose an alternative that
addresses some of the gaps of the typical benchmarking approach to comparing task scheduling algorithms.
The work presented in Chapter 7 is the first, to the best of our knowledge, to study all combinations of a set
of algorithmic components, benchmark each of the 72 algorithms built from such combinations on diverse
datasets, and report results on the individual and combined effects of these algorithmic components
on performance (makespan of the schedules produced) and runtime (time to produce schedules).
Chapter 4
Task Scheduling for Dynamic Networks∗
Existing solutions for scheduling arbitrarily complex distributed applications on networks of
computational nodes are insufficient for scenarios where the network topology is changing rapidly. New
Internet of Things (IoT) domains like the Internet of Robotic Things (IoRT) and the Internet of Battlefield
Things (IoBT) demand solutions that are robust and efficient in environments that experience constant or
rapid change. We demonstrate how recent advancements in machine learning (in particular, in graph
convolutional neural networks) can be leveraged to solve the task scheduling problem with decent
performance and in much less time than traditional algorithms.
The Internet of Battlefield Things (IoBT) is a subdomain of IoT which considers connected devices
(sensors, actuators, compute-nodes, etc.) that support critical military operations [49]. Common
assumptions about network connectivity, node reliability, and environmental conditions that are acceptable
for civilian IoT applications are unacceptable in high-stakes battlefield environments. Motivated by the
IoBT domain, we study the problem of scheduling arbitrarily complex distributed applications over
resource-constrained mobile patrol robots. The mobility of the patrolling network nodes results in dynamic
communication conditions and demands careful attention to the compute resources available. We consider
distributed applications which are modeled as directed acyclic task graphs. Each node in a task graph
represents a compute task and edges between tasks represent a dependency relationship (i.e., one task
cannot start until its dependencies have terminated).
∗The work presented in this chapter has been previously published and appears in [48].
Figure 4.1: An overview of our approach: Task graphs are augmented with node and edge feature vectors
that encode data from the network profile. GCNScheduler learns to imitate a teacher algorithm and assign
tasks to network nodes.
The purpose of a scheduler is to determine where and when tasks should execute in order to minimize
some metric like makespan (total execution time). Efficient schedulers must balance the cost and benefit of
sending tasks to execute in parallel on different compute nodes. On one hand, sending a task and its input
data to execute on another node takes time while on the other, executing two tasks in parallel saves time.
For this study, we constructed a task graph that resembles many real-world applications and is a good
demonstration of this trade-off (Figure 4.2).
Deploying complex distributed applications over a network of mobile compute nodes is challenging
when the network experiences constant or rapid change. A schedule that minimizes makespan for the
network at a particular time may not be optimal (or anywhere near optimal) for the network at a future
time. Existing scheduling algorithms that leverage the entire network state become computationally
expensive as the number of tasks or network nodes increases. In order to adapt to the rapid dynamics of
the IoBT domain, a scheduling algorithm needs to be capable of computing and re-computing complex
schedules quickly. More efficient online algorithms that only consider partial network information (i.e.,
neighborhood information) have been proposed [45]–[47]. These algorithms produce schedules quickly
and dynamically but often result in larger makespans since they do not leverage all the information
available about a network.
We demonstrate how recent advancements in artificial intelligence can be applied to solve practical
problems in the Internet of Robotic Things domain. We demonstrate through simulation that Graph
Convolutional Networks can learn to produce schedules with low makespans quickly, making them
suitable for IoBT applications.
Contributions and highlights of this work include:
• The design of application-specific training scenarios.
• The evaluation of GCNScheduler for highly dynamic networks.
• An open source simulation environment to test conventional and learning-based schedulers for IoBT.†
4.1 GCNScheduler
GCNScheduler applies recent advancements in machine learning for graphs to the task scheduling problem
[9]. Task scheduling presents a unique challenge for graph-based learning because it involves two different
graphs: the task graph (with task and dependency costs) and the network (with node compute capabilities
and communication strengths). The primary innovation of GCNScheduler is the combination of the two
graphs into one graph with the same structure as the task graph, but with node and edge features that
capture the network configuration. Our previous work shows that GCNScheduler can produce
schedules in a fraction of the time of HEFT [9]. Here we summarize it briefly.
†See the repository with pointers to all of the publicly available resources used in this dissertation at https://github.com/jaredraycoleman/phd_thesis_code
4.1.1 The Input Graph
For a single network $N = (V, E)$ and task graph $G = (T, D)$, GCNScheduler uses the task graph nodes and edges to construct the input graph, i.e., $G_{\text{input}} := (T', D')$. Since the makespan (along with other objective metrics) is a function of the required computational time of tasks across machines and the required communication times to transfer task outputs to successor tasks across pairs of machines, GCNScheduler incorporates this information into node and edge features. The node feature vector for a task $t \in T$ is defined as:
$$x_{n,t} := \left[\frac{c(t)}{s(1)}, \frac{c(t)}{s(2)}, \cdots, \frac{c(t)}{s(|V|)}\right]^\top$$
and the edge feature vector for a dependency $(t, t') \in D$ is defined as:
$$x_{e,(t,t')} := \left[\frac{c(t, t')}{s(1, 1)}, \frac{c(t, t')}{s(1, 2)}, \cdots, \frac{c(t, t')}{s(|V|, |V|)}\right]^\top$$
The intuition behind $x_{n,t}$ is that these features represent the required computational time of task $t$ on each compute node. Similarly, the intuition behind $x_{e,(t,t')}$ is that these features represent the required time for transferring the result of executing task $t$ to the dependent task $t'$ between each pair of compute nodes.
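The two feature vectors can be computed directly from the network parameters. In this sketch, nodes are indexed $1, \ldots, |V|$ as list positions, and the edge vector enumerates ordered node pairs in row-major order. It also assumes $s(i, i) = \infty$ on the diagonal, so a same-node "transfer" costs 0. Both choices are assumptions that GCNScheduler's code may make differently. The numbers come from the Figure 2.2 network. The function names are hypothetical.

```python
import math

def node_features(cost, speeds):
    """x_{n,t}: execution time c(t) / s(i) of the task on each of the |V| nodes."""
    return [cost / s for s in speeds]

def edge_features(size, strengths):
    """x_{e,(t,t')}: transfer time c(t,t') / s(i, j) for every ordered pair (i, j).

    strengths is a |V| x |V| matrix with strengths[i][j] = s(i, j).
    """
    return [size / s_ij for row in strengths for s_ij in row]

speeds = [1, 2, 2]                               # s(v1), s(v2), s(v3) from Figure 2.2
strengths = [[math.inf, 2, 2],                   # assumption: s(i, i) = inf
             [2, math.inf, 1],
             [2, 1, math.inf]]
print(node_features(3, speeds))                  # task t2: [3.0, 1.5, 1.5]
print(edge_features(5, strengths))               # dependency (t2, t4), length |V|^2 = 9
```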
Recall that our goal is to train GCNScheduler so that it is generally effective for a given class of
networks or task graphs. Thus, our training data (and therefore input graph) must include labeled data for
many task graph/network pairs. Let $G_{\text{input}}(N, G)$ represent the input graph for a single network $N$ and task graph $G$ as described above. Then for a given set of networks $\mathcal{N}$ and set of task graphs $\mathcal{G}$, the input graph for GCNScheduler during training is $G_{\text{input}} := \bigcup_{(N, G) \in \mathcal{N} \times \mathcal{G}} G_{\text{input}}(N, G)$. Essentially, the input graph is a disjoint union of task graphs with network configurations encoded into node and edge feature vectors.
4.1.2 Imitation Learning
For the training data, GCNScheduler requires that each node in $G_{\text{input}}$ be labeled using some
predetermined “teacher” algorithm. We use HEFT as the teacher algorithm. So, for every task graph
G = (T, D) and network N = (V, E) pair in the training data, we label each node t ∈ T with the node
v ∈ V that HEFT assigns the task to execute on.
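Building the training input amounts to concatenating one labeled copy of the task graph per task graph/network pair. Each copy's task indices are shifted so that copies never share nodes. The sketch below shows that bookkeeping, assuming the caller provides the featurization (the feature vectors above) and the teacher, which would be HEFT. `build_forest`, the stub featurizer, the representation of a network as a plain list of node speeds, and the "everything on node 0" teacher are all hypothetical stand-ins.

```python
def build_forest(pairs, featurize, teacher):
    """Disjoint union of per-pair input graphs with teacher labels.

    pairs:     iterable of (network, (tasks, deps)) task graph/network pairs
    featurize: (network, graph) -> (dict task -> node vector, dict dep -> edge vector)
    teacher:   (network, graph) -> dict task -> assigned node index (e.g., HEFT's choice)
    """
    X, edge_index, edge_attr, y = [], [], [], []
    for network, (tasks, deps) in pairs:
        offset = len(X)                                   # shift indices per copy
        local = {t: offset + i for i, t in enumerate(tasks)}
        node_feats, edge_feats = featurize(network, (tasks, deps))
        labels = teacher(network, (tasks, deps))
        for t in tasks:
            X.append(node_feats[t])
            y.append(labels[t])
        for dep in deps:
            edge_index.append((local[dep[0]], local[dep[1]]))
            edge_attr.append(edge_feats[dep])
    return X, edge_index, edge_attr, y

tg = (["a", "b"], [("a", "b")])                           # toy two-task graph
featurize = lambda net, g: ({t: [1.0 / s for s in net] for t in g[0]},
                            {d: [0.0] for d in g[1]})
teacher = lambda net, g: {t: 0 for t in g[0]}             # stand-in for HEFT labels
X, ei, ea, y = build_forest([([1, 2], tg), ([2, 4], tg)], featurize, teacher)
print(ei)  # [(0, 1), (2, 3)]
```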
4.2 Implementation Details
4.2.1 The Task Graph
The task graph we use features a common pattern for distributed applications modeled as task graphs:
four chains of tasks that can be executed in parallel (Fig. 4.2). We assume the data cost of each edge
is 1 for all dependencies. In order to demonstrate that GCNScheduler generalizes to a class of task graphs,
we assume task costs are drawn from a truncated normal distribution with mean 1, standard deviation
1/2, and lower/upper bounds of 0 and 2, respectively. For our dataset, we generated 50 task graphs (30 for
training, 10 for validation, and 10 for testing).
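One simple way to draw task costs from the truncated normal distribution described above is rejection sampling: draw from the untruncated normal and redraw until the value falls in [0, 2]. Because the bounds lie two standard deviations from the mean, about 95% of draws are accepted. The text does not state which sampler was actually used, so treat this as an assumption. The function names and seed are hypothetical. The default of 19 tasks follows the per-graph count given in Section 4.2.3.

```python
import random

def truncated_normal(rng, mean=1.0, sd=0.5, low=0.0, high=2.0):
    """Rejection sampling: redraw until the sample lands inside [low, high]."""
    while True:
        x = rng.gauss(mean, sd)
        if low <= x <= high:
            return x

def sample_task_costs(num_tasks=19, seed=None):
    rng = random.Random(seed)
    return [truncated_normal(rng) for _ in range(num_tasks)]

costs = sample_task_costs(seed=0)
data_costs = 1.0                  # every dependency carries data cost 1
print(len(costs), min(costs) >= 0.0, max(costs) <= 2.0)  # 19 True True
```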
4.2.2 Simulated IoT Network
For the purpose of comparison between GCNScheduler and HEFT makespans over time, we consider a
relatively small network of 10 robots equally spaced along the perimeter of a randomly generated polygon.
We generate polygons by first drawing the number of vertices k from {3, 4, . . . , 10} uniformly at random
and then using the well-known 2-opt algorithm [51] to generate a k-vertex polygon in $O(k^3)$ time. Each
robot patrols counter-clockwise along the perimeter of the polygon at the same speed. For consistency
Figure 4.2: Task graph used for our experiments with a single source node and four parallel four-node
chains of tasks. Dependency data sizes are assumed to be 1 while task cost is drawn from a truncated normal
distribution with mean 1 and endpoints 0 and 2. Its structure is inspired by the epigenomics scientific
workflow from [50]; it captures the essence of many broader applications where the processing involves
multiple parallel chains of tasks.
across experiments, we scale polygons so that it takes exactly 500 units of time for a robot to make one trip
around the polygon. Each robot is assumed to have equal compute speed of 1. Two robots can
communicate directly whenever they are within s := 50 units of distance from each other (this number
was chosen so that adjacent robots are always connected at the maximum distance). Two robots that are
closer than this distance (due to the shape of the polygon) have a communication strength of
$$\frac{1 - b}{1 + e^{c(2d/s - 1)}} + b$$
where d is the distance between the robots, b := 0.2 (so that the communication strength between two
directly connected nodes is between 0.2 and 1), and c := 5 (a constant that affects the shape of the inverse
sigmoid). Observe that, by construction, the network is always connected but not necessarily fully
connected. Since HEFT requires all pairwise communication strengths, we augment the networks to
consider multi-hop communication strength between nodes that are further than distance s apart. Let
Figure 4.3: Example problem instance where one region of the polygon is highly connected due to its shape
and another region is not.
$r_1, r_2 \in R$ (where $R$ is the set of robots) be two robots further than $s$ away from each other. Then their communication strength is given by
$$\frac{1}{\sum_{i=1}^{l} 1 / \mathrm{comm}(p_{i-1}, p_i)}$$
where $\mathrm{comm}(\cdot, \cdot)$ is the direct communication strength defined above and $p = (p_0, p_1, \ldots, p_l)$ is a shortest path (with respect to inverse communication strength) between $r_1$ and $r_2$ such that $p_0 = r_1$ and $p_l = r_2$. This results in highly dynamic communication strengths, as shown in Fig. 4.4. Figure 4.3 shows an example of a polygon and the resulting network.
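Both strength rules can be computed together. Robots within distance $s$ get the reversed-sigmoid strength. For all other pairs, Dijkstra's algorithm with edge weights $1/\mathrm{comm}$ finds the path minimizing $\sum 1/\mathrm{comm}$, and the reciprocal of that sum is the multi-hop strength. This is a sketch with two assumptions of ours: a robot has infinite strength to itself, and unreachable pairs get 0 (by construction, that case should not arise). The function names and the three collinear robot positions are hypothetical.

```python
import heapq
import math

def direct_strength(d, s=50.0, b=0.2, c=5.0):
    """Strength between robots at distance d <= s; lies between b and 1."""
    return (1 - b) / (1 + math.exp(c * (2 * d / s - 1))) + b

def pairwise_strengths(positions, s=50.0):
    n = len(positions)
    direct = {}                                   # (i, j) -> strength for robots within s
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(positions[i], positions[j])
            if d <= s:
                direct[(i, j)] = direct[(j, i)] = direct_strength(d, s)
    adj = {i: [] for i in range(n)}
    for (i, j), st in direct.items():
        adj[i].append((j, 1.0 / st))              # edge weight = inverse strength
    strength = [[0.0] * n for _ in range(n)]      # 0.0 = unreachable (assumption)
    for src in range(n):
        strength[src][src] = math.inf             # assumption: free within a robot
        dist, heap = {src: 0.0}, [(0.0, src)]
        while heap:                               # Dijkstra from src
            du, u = heapq.heappop(heap)
            if du > dist[u]:
                continue                          # stale heap entry
            for v, w in adj[u]:
                if du + w < dist.get(v, math.inf):
                    dist[v] = du + w
                    heapq.heappush(heap, (dist[v], v))
        for dst, total in dist.items():
            if dst != src:
                # Direct neighbors keep the direct strength; others get 1 / sum(1/comm).
                strength[src][dst] = direct.get((src, dst), 1.0 / total)
    return strength

S = pairwise_strengths([(0, 0), (40, 0), (80, 0)])
print(S[0][1], S[0][2])   # direct strength at d = 40, then half of it via two hops
```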
4.2.3 Training
For our dataset, we generated 10 random polygons and, for each of these, constructed the 10 corresponding
networks after robots patrol for 0, 50, 100, . . . , 450 units of time. This results in 100 networks in our
dataset (60 for training, 20 for validation, and 20 for testing) and a total of 2200 task graph/network pairs
(1800 for training, 200 for validation, and 200 for testing). Since each task graph has 19 tasks, the resulting
input graph has 41800 nodes (34200 for training, 3800 for validation, and 3800 for testing). We trained
GCNScheduler with two 64-unit hidden layers, no dropout, and the ReLU activation function, using a
learning rate of 0.001 and a weight decay of 0.005, in batches of 128 nodes for 200 epochs, and achieved a
testing accuracy of 56%.
4.3 Simulation Results
After training GCNScheduler, we generated new polygons and task graphs (drawn from the same
distributions as described above) and simulated the robots patrolling for 1500 units of time so that each
robot traverses the entire perimeter of the random polygon three times. For each task graph/network pair,
we run four simulations in parallel to compare the following scheduling algorithms:
1. HEFT: The standard HEFT algorithm
2. GCNScheduler: The trained network used in prediction mode
3. Random: Each task is assigned to execute on a random node
4. Static: HEFT runs once on the initial network and the schedule is not updated
The random and static schedulers are of interest because as the task graph or network size is increased,
recomputing a schedule becomes intractable. Each algorithm recomputes a schedule when the last
execution has completed, as described in Algorithm 1.
We ran 50 simulations and collected information on robot communication strengths over time, the
number of full application executions in each simulation, and the makespans of each of these executions.
While the average communication strength over all robots is relatively stable over time (Fig. 4.4), the
individual communication strength for a single robot is highly volatile over the course of its patrol around
the polygon (Fig. 4.5). Therefore, we expect that a scheduler which can adapt to changing network conditions (HEFT and GCNScheduler) will be relatively stable, while a scheduler that does not (the Static and Random schedulers) will experience large spikes in makespan as nodes' communication strengths degrade.
Algorithm 1 Simulation with stop time stop and time step dt
1: t ← 0
2: sched ← ∅
3: polygon ← random polygon
4: G ← random task graph
5: while t < stop do
6:     N ← network induced by robot positions at time t
7:     if sched = ∅ or the previous execution of G has finished then
8:         sched ← SCHEDULER(G, N)
9:     end if
10:    execute G according to sched on N for dt time
11:    t ← t + dt
12: end while
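Algorithm 1 can be sketched in Python as a loop that rebuilds the network at every time step and asks the scheduler for a new schedule only when the previous execution has finished. Everything problem-specific (the network model, the scheduler, and how execution progresses) is passed in as a callable; these hook names and the toy usage are hypothetical placeholders, not the dissertation's simulator.

```python
from typing import Any, Callable, List, Optional

def simulate(
    stop: float,
    dt: float,
    task_graph: Any,
    network_at: Callable[[float], Any],               # N: network induced at time t
    scheduler: Callable[[Any, Any], Any],             # SCHEDULER(G, N)
    execute: Callable[[Any, Any, Any, float], bool],  # run for dt; True once G finishes
) -> List[float]:
    """Run Algorithm 1 and return the times at which a new schedule was computed."""
    t = 0.0
    sched: Optional[Any] = None
    finished = False
    schedule_times: List[float] = []
    while t < stop:
        network = network_at(t)
        if sched is None or finished:
            sched = scheduler(task_graph, network)
            schedule_times.append(t)
        finished = execute(task_graph, sched, network, dt)
        t += dt
    return schedule_times

# Toy usage (hypothetical): pretend every execution of G takes exactly three steps.
steps = {"done": 0}

def toy_execute(graph: Any, sched: Any, network: Any, dt: float) -> bool:
    steps["done"] += 1
    if steps["done"] == 3:
        steps["done"] = 0
        return True
    return False

times = simulate(stop=10, dt=1, task_graph="G", network_at=lambda t: t,
                 scheduler=lambda g, n: (g, n), execute=toy_execute)
print(times)  # [0.0, 3.0, 6.0, 9.0]
```

A faster scheduler matters here because the schedule computed at time t is then used, unchanged, on the networks of every later step until the execution finishes.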
Figure 4.4: Average communication strength (with one-standard-deviation error band) of all robots over
the time it takes to make one trip around the perimeter of the polygon.
Figure 4.5: Average communication strength (with one-standard-deviation error band) of one robot as it
makes one trip around the perimeter of the polygon.
Fig. 4.6 shows the average makespan of each scheduler divided by the average makespan of the HEFT scheduler, or makespan ratio. This metric indicates how much better or worse than HEFT a scheduler is at minimizing makespan (< 1 indicates better than HEFT; > 1 indicates worse). Over all 50 simulations, GCNScheduler has the lowest average makespan ratio and the smallest variance of the three alternatives to HEFT. Since schedules are produced back-to-back, the makespan has a direct effect on the number of executions (or the number of schedules produced) for each scheduler. Fig. 4.7 shows that while HEFT is able to complete the most executions in one trip around the polygon, GCNScheduler outperforms the static and random schedulers and achieves the smallest variance. The key takeaway is that, because of its fast computation times, GCNScheduler is able to adapt to the high volatility of the network conditions.
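As a sketch of how per-simulation makespan ratios like those in Fig. 4.6 can be computed: divide a scheduler's average makespan by HEFT's average makespan within each simulation, then summarize across simulations. The makespan values below are made-up placeholders, not data from these experiments, and the use of the sample standard deviation is our assumption.

```python
from statistics import mean, stdev
from typing import Dict, List

def makespan_ratios(runs: List[Dict[str, List[float]]], scheduler: str) -> List[float]:
    """Per-simulation ratio: mean makespan of `scheduler` / mean makespan of HEFT."""
    return [mean(run[scheduler]) / mean(run["HEFT"]) for run in runs]

# Two hypothetical simulations, each listing the makespan of every execution.
runs = [
    {"HEFT": [10.0, 12.0], "GCNScheduler": [16.0, 17.0]},
    {"HEFT": [8.0, 8.0], "GCNScheduler": [12.0, 14.0]},
]
ratios = makespan_ratios(runs, "GCNScheduler")
print(ratios)  # [1.5, 1.625]
print(mean(ratios), stdev(ratios))  # summary statistics across simulations
```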
Figure 4.6: Violin plot for 50 simulations of makespan ratios for each scheduler. For each violin, the dashed line marks the mean, the solid line marks the median, and the bottom and top of the boxes mark the first and third quartiles, respectively. Among the schedulers other than HEFT (whose ratio is 1 by definition), GCNScheduler has the lowest average makespan ratio of 1.58 with a standard deviation of 0.154, the static scheduler has an average makespan ratio of 2.246 with a standard deviation of 0.831, and the random scheduler has an average makespan ratio of 2.442 with a standard deviation of 0.396.
Figure 4.7: Violin plot for 50 simulations of the number of task graph executions for each scheduler. HEFT
supports 15 executions on average with a standard deviation of 2.087, GCNScheduler supports 10.48
executions on average with a standard deviation of 1.163, the static scheduler supports 7.82 executions
on average with a standard deviation of 2.247, and the random scheduler supports 1.78 executions on
average with a standard deviation of 1.379.
Chapter 5
Task Scheduling and Network Synthesis∗
We will now look at the task scheduling problem from a slightly different perspective. In real-world
scenarios, it is often the case that the compute network is not given, but rather, is something that can be
designed specifically to support one or more applications (e.g., IoT sensor/compute node allocation,
cloud/supercomputer design). We show that using traditional metrics like connectivity to design
compute networks that run distributed applications modeled as task graphs can be detrimental to
performance. Specifically, we present an example where strictly improving the communication strength
between two nodes in the network causes HEFT, one of the most popular task scheduling algorithms, to
produce a schedule with twice the makespan. We formalize this network synthesis problem as selecting
one among a set of potentially deployable networks that optimally supports the distributed execution of
complex applications. We present the NSDC (Network Synthesis for Dispersed Computing) Framework, a
general framework for studying this type of problem and use it to provide a solution for one
well-motivated variant. We discuss how the framework can be extended to support other objectives,
parameters, and constraints as well as more scalable solution approaches.
∗The work presented in this chapter has been previously published and appears in [52].
5.1 A Motivating Example
In this section, we present a motivating example for the network synthesis problem. Consider the task graph in Figure 5.1a and the network in Figure 5.1b, which has just two homogeneous nodes connected by a very weak communication link. For this problem instance, the HEFT scheduling algorithm produces a schedule with a makespan of 4 time units. Then, consider the network in Figure 5.1c, which is the same as the previous network but with a communication link that is 20 times stronger. For this problem instance, the HEFT scheduling algorithm produces a schedule with a makespan of 8 time units (Figure 5.1e), twice that of the previous network! This example motivates the need for considering task scheduling directly in the network synthesis problem: optimizing with respect to traditional network metrics like connectivity can be detrimental to performance.

(Figure 5.1 panels: (a) Task Graph with tasks A, B, C, and D, each of cost 1, and dependencies c(A, B) = 1, c(A, C) = 10, c(B, D) = 10, and c(C, D) = 10. (b) Worse Network: nodes 1 and 2, with s(1) = s(2) = 1, are connected by a very weak communication link, s(1, 2) = 0.1. (c) Better Network: nodes have 20 times the communication strength as before, s(1, 2) = 2. (d) HEFT Schedule on Worse Network. (e) HEFT Schedule on Better Network.)
Figure 5.1: A motivating example for the network synthesis problem where HEFT produces a schedule with twice the makespan when the communication strength between two nodes is improved.
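The arithmetic behind this example can be checked with a small evaluator for a fixed schedule. The sketch assumes, following the model of Section 2.1, that a task of cost c takes c/s(v) time on node v and that sending data of size c over a link of strength s takes c/s time (zero on the same node); it evaluates schedules given as an explicit task order and task-to-node assignment rather than running HEFT itself. Placing every task on node 1 gives a makespan of 4 on either network. On the stronger network, if C is placed before B in the priority list (their upward ranks tie), B finishes earlier on node 2 (at 2.5 rather than 3), so a greedy earliest-finish-time choice moves it there; D must then wait for 10 units of data from C to cross a link of strength 2, and the makespan grows to 8.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str]

def makespan(order: List[str], assign: Dict[str, int], cost: Dict[str, float],
             data: Dict[Edge, float], speed: Dict[int, float], strength: float) -> float:
    """Makespan of a fixed schedule on a network whose links all have `strength`.

    Tasks run in `order` (a topological order); each node runs one task at a time.
    """
    finish: Dict[str, float] = {}
    node_free = {v: 0.0 for v in speed}
    for task in order:
        v = assign[task]
        ready = node_free[v]
        for (src, dst), size in data.items():
            if dst == task:  # wait for each predecessor's output to arrive
                comm = 0.0 if assign[src] == v else size / strength
                ready = max(ready, finish[src] + comm)
        finish[task] = ready + cost[task] / speed[v]
        node_free[v] = finish[task]
    return max(finish.values())

cost = {"A": 1.0, "B": 1.0, "C": 1.0, "D": 1.0}
data = {("A", "B"): 1.0, ("A", "C"): 10.0, ("B", "D"): 10.0, ("C", "D"): 10.0}
speed = {1: 1.0, 2: 1.0}

serial = makespan(["A", "B", "C", "D"], {t: 1 for t in cost}, cost, data, speed, 0.1)
split = makespan(["A", "C", "B", "D"], {"A": 1, "C": 1, "B": 2, "D": 2},
                 cost, data, speed, 2.0)
print(serial, split)  # 4.0 8.0
```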
5.2 Problem Definition
Nodes and edges may have other attributes besides cost, data size, speed, and communication strength (as defined in Section 2.1) that are not directly related to scheduling, such as risk or deployment cost, but that may still affect network synthesis. In the example problem studied in this chapter, we assume each node and edge has both a risk (denoted risk(v) and risk(v1, v2)) and a deployment cost (denoted deploy-cost(v) and deploy-cost(v1, v2)). We then frame the network synthesis problem as finding the subset of nodes $V' \subseteq V$ and edges $E' \subseteq E$ that minimizes a linear combination of risk, deployment cost, and average makespan across a given set of task graphs $\mathcal{G}$:

$$
\begin{aligned}
\mathrm{cost}(V', E', \mathcal{G}) ={}& R\left(\sum_{v \in V'} \mathrm{risk}(v) + \sum_{v_1, v_2 \in V'} \mathrm{risk}(v_1, v_2)\right) \\
&+ D\left(\sum_{v \in V'} \text{deploy-cost}(v) + \sum_{v_1, v_2 \in V'} \text{deploy-cost}(v_1, v_2)\right) \\
&+ M \cdot \frac{1}{|\mathcal{G}|} \sum_{G \in \mathcal{G}} \mathrm{MAKESPAN}(V', E', G)
\end{aligned}
\tag{5.1}
$$

We assume that $\mathrm{MAKESPAN}(V', E', G)$, for a subset of the network $N' = (V', E')$ and a task graph $G$, is computed using a scheduler like the Heterogeneous Earliest Finish Time (HEFT) scheduler, which uses a heuristic algorithm to produce a task-to-node allocation schedule that aims to minimize the makespan (total execution time) of the DAG over the network (with no formal guarantees of optimality). We call this problem, which minimizes a linear combination of risk, deployment cost, and makespan (RDM), the RDM Network-Synthesis problem:

Problem 1 (RDM Network-Synthesis). Given a network $N = (V, E)$ and a set of task graphs $\mathcal{G}$, find $V' \subseteq V$ and $E' \subseteq E$ that minimize Equation (5.1).
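Equation (5.1) translates directly into code once a makespan oracle is available. In this sketch the makespan function is passed in as a parameter (in this chapter it would come from a scheduler such as HEFT), the edge terms are summed over the selected links E′ (each joining two nodes of V′, our reading of the edge sums), and all names and toy values are hypothetical.

```python
from typing import Callable, Dict, FrozenSet, Hashable, Sequence, Tuple

Node = Hashable
Edge = Tuple[Node, Node]

def rdm_cost(
    nodes: FrozenSet[Node],
    edges: FrozenSet[Edge],
    task_graphs: Sequence[object],
    risk: Dict[object, float],         # keyed by node or by edge
    deploy_cost: Dict[object, float],  # keyed by node or by edge
    makespan: Callable[[FrozenSet[Node], FrozenSet[Edge], object], float],
    R: float, D: float, M: float,
) -> float:
    """Linear combination of risk, deployment cost, and average makespan (Eq. 5.1)."""
    elements = list(nodes) + list(edges)
    total_risk = sum(risk[x] for x in elements)
    total_deploy = sum(deploy_cost[x] for x in elements)
    avg_makespan = sum(makespan(nodes, edges, g) for g in task_graphs) / len(task_graphs)
    return R * total_risk + D * total_deploy + M * avg_makespan

# Toy usage: a stand-in "makespan" that is just the task graph's size.
c = rdm_cost(
    nodes=frozenset({1, 2}), edges=frozenset({(1, 2)}),
    task_graphs=["ab", "abcd"],
    risk={1: 0.0, 2: 0.0, (1, 2): 1.0},
    deploy_cost={1: 1.0, 2: 2.0, (1, 2): 1.0},
    makespan=lambda V, E, g: float(len(g)),
    R=1.0, D=0.1, M=0.1,
)
print(round(c, 6))  # 1.7
```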
5.3 The NSDC Framework
In this section we propose the NSDC Framework, a general framework for solving network synthesis problems like the RDM Network-Synthesis problem presented in the previous section. RDM Network-Synthesis belongs to a family of network synthesis problems that involve a set or distribution of candidate networks, a set or distribution of distributed applications to support, a scheduler for evaluating how well a network supports those applications, and an optimizer that iteratively constructs or chooses candidate networks to evaluate, eventually arriving at or close to an optimum with respect to network attributes (e.g., risk, deployment cost, and makespan).
The NSDC Framework provides common interfaces for optimizers, schedulers, networks, distributed applications, and objective functions. By creating different implementations that adhere to these common interfaces, we can mix and match (see Figure 5.2) different optimizers, schedulers, etc. to solve all kinds of network synthesis problems oriented around distributed computation, like RDM Network-Synthesis. We have made an open-source Python implementation of these common interfaces available on GitHub.†

† See the repository with pointers to all of the publicly available resources used in this dissertation at https://github.com/jaredraycoleman/phd_thesis_code
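To illustrate the mix-and-match idea, here is a hypothetical sketch of such common interfaces using Python's `typing.Protocol`. The class names, method names, and the toy network/application encodings are our own assumptions for illustration and are not the API of the repository linked above.

```python
from typing import Any, Callable, Iterable, Protocol, Sequence

class Scheduler(Protocol):
    """Anything that can estimate the makespan of an application on a network."""
    def makespan(self, network: Any, app: Any) -> float: ...

class Optimizer(Protocol):
    """Anything that picks, from candidates, a network minimizing a cost."""
    def optimize(self, candidates: Iterable[Any], cost: Callable[[Any], float]) -> Any: ...

def synthesize(candidates: Iterable[Any], apps: Sequence[Any], scheduler: Scheduler,
               optimizer: Optimizer, objective: Callable[[Any, float], float]) -> Any:
    """Glue code: the objective sees a network and its average makespan over apps."""
    def cost(network: Any) -> float:
        avg = sum(scheduler.makespan(network, a) for a in apps) / len(apps)
        return objective(network, avg)
    return optimizer.optimize(candidates, cost)

# Two interchangeable toy implementations (networks are lists of node speeds,
# applications are lists of task costs).
class SerialScheduler:
    def makespan(self, network, app):
        return sum(app) / max(network)  # run everything on the fastest node

class ExhaustiveOptimizer:
    def optimize(self, candidates, cost):
        return min(candidates, key=cost)

best = synthesize(
    candidates=[[1.0], [2.0], [1.0, 4.0]],
    apps=[[2.0, 2.0]],
    scheduler=SerialScheduler(),
    optimizer=ExhaustiveOptimizer(),
    objective=lambda net, avg: avg + 0.5 * sum(net),  # makespan + a deployment proxy
)
print(best)  # [2.0]
```

Swapping in a different scheduler (for example, a HEFT implementation) or a different optimizer requires no change to `synthesize`, which is the point of programming against shared interfaces.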
Figure 5.2: The NSDC Framework allows us to combine different optimizers, schedulers, networks,
distributed applications, and cost functions to solve problems like RDM Network-Synthesis and others in
the larger family of Network Synthesis problems it belongs to.
5.4 Implementation & Results
To demonstrate the utility of the NSDC Framework, we consider an instance of RDM Network-Synthesis
with the “mother network” (the set of all deployable nodes and communication links) depicted in Figure 5.3.
We consider high/low-speed compute nodes, high/low-speed connections between them, and a set of five task graphs with high/low-cost tasks and high/low data dependencies (Figure 5.3). We implemented a HEFT scheduler and an exhaustive search optimizer compatible with the NSDC Framework and ran them with the risk, deployment cost, and speeds for the different link/node types in the “mother network” specified in Table 5.1. We used the parameters R ← 1, D ← 0.1, M ← 0.1 for the overall cost metric to be minimized.
The network which optimizes the specified cost function involves three of five possible nodes and four
of nine possible communication links (Figure 5.4). Not surprisingly, the optimal network is not the fastest,
the cheapest, nor the least risky (Figure 5.5). Rather, it represents the best balance of these three metrics
subject to the provided cost function.
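An exhaustive search optimizer of this kind can be sketched as enumerating every subset of links of the “mother network”, deploying exactly the nodes those links touch, and keeping the subset with the lowest cost. In this hypothetical sketch, requiring the deployed network to be connected and the toy cost function are our assumptions; with nine links there are only 2^9 = 512 subsets, which is why exhaustive search is feasible for an instance of this size.

```python
from itertools import combinations
from typing import Callable, FrozenSet, List, Optional, Set, Tuple

Link = Tuple[int, int]

def is_connected(nodes: Set[int], links: FrozenSet[Link]) -> bool:
    """Check that the deployed links connect all deployed nodes."""
    if not nodes:
        return False
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for a, b in links:
            for x, y in ((a, b), (b, a)):
                if x == u and y not in seen:
                    seen.add(y)
                    stack.append(y)
    return seen == nodes

def exhaustive_search(
    links: List[Link], cost: Callable[[Set[int], FrozenSet[Link]], float]
) -> Optional[Tuple[float, Set[int], FrozenSet[Link]]]:
    """Try every non-empty subset of links; return (cost, nodes, links) of the best."""
    best: Optional[Tuple[float, Set[int], FrozenSet[Link]]] = None
    for k in range(1, len(links) + 1):
        for subset in combinations(links, k):
            chosen = frozenset(subset)
            nodes = {v for link in chosen for v in link}
            if not is_connected(nodes, chosen):
                continue
            c = cost(nodes, chosen)
            if best is None or c < best[0]:
                best = (c, nodes, chosen)
    return best

# Toy cost: one unit per link, plus a penalty of 10 for each required node (1, 4) left out.
result = exhaustive_search([(1, 2), (2, 3), (1, 3), (3, 4)],
                           lambda nodes, links: len(links) + 10 * len({1, 4} - nodes))
print(result)
```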
Figure 5.3: One of the five task graphs to be supported by the network (left) and the “mother network”
which represents all deployable nodes and links (right). We consider five task graphs which have the same
structure as Task Graph 1 depicted here but different cost/data requirements.
Element              Variant   Risk   Deploy Cost   Speed
Satellite Link       Low       0      5             0.2
Satellite Link       High      0      10            0.4
Radio Link           Low       0      5             0.2
Radio Link           High      0      10            0.4
Gray Cellular Link   Low       1      1             0.8
Gray Cellular Link   High      1      2             1.6
Compute Node         Low       0      1             0.1
Compute Node         High      0      2             0.2
Table 5.1: Risk, deploy-cost, and speeds for the different deployable network elements.
Figure 5.4: The network which minimized the specified linear combination of risk, deployment cost, and
makespan was that which included the low-speed radio link between 2 and 3 and the risky gray cellular
links between 2 and 4 (low-speed) and 3 and 4 (high-speed). Thus, nodes 2, 3, and 4 are deployed as
compute nodes.
Figure 5.5: The results from exhaustive search over the example described demonstrate how the optimal network may not simultaneously minimize risk, deployment cost, and makespan. Rather, the optimal network is that for which these attributes are most balanced (as a linear combination according to given parameters). In the figure, each marker represents a valid network and larger markers indicate a higher deployment cost for the network.
Chapter 6
Comparing Task Scheduling Algorithms∗
Scheduling a task graph representing an application over a heterogeneous network of computers is a
fundamental problem in distributed computing. It is known to be not only NP-hard but also not
polynomial-time approximable within a constant factor. As a result, many heuristic algorithms have been
proposed over the past few decades. Yet it remains largely unclear how these algorithms compare to each
other in terms of the quality of schedules they produce. We identify gaps in the traditional benchmarking
approach to comparing task scheduling algorithms and propose a simulated annealing-based adversarial
analysis approach called PISA to help address them. We also introduce SAGA, a new open-source library for
comparing task scheduling algorithms. We use SAGA to benchmark 15 algorithms on 16 datasets and PISA
to compare the algorithms in a pairwise manner. Algorithms that appear to perform similarly on
benchmarking datasets are shown to perform very differently on adversarially chosen problem instances.
Interestingly, the results indicate that this is true even when the adversarial search is constrained to
selecting among well-structured, application-specific problem instances. This work represents an
important step towards a more general understanding of the performance boundaries between task
scheduling algorithms on different families of problem instances.
The main contributions presented in this chapter are:
∗The work presented in this chapter has been previously published and appears in [53].
1. Introduces SAGA [54], an open-source, modular, and extensible Python package with
implementations of many well-known task scheduling algorithms and tools for benchmarking and
adversarial analysis.
2. Reports benchmarking results of 15 well-known scheduling algorithms on 16 datasets.
3. Proposes PISA, a novel simulated annealing-based adversarial analysis method for comparing
algorithms that identifies problem instances for which a given algorithm maximally under-performs
another.
4. Identifies gaps in the traditional benchmarking approach: PISA finds problem instances where
algorithms that appear to do well on benchmarking datasets perform poorly on adversarially chosen
datasets. We also show that this is true even when PISA is restricted to searching over
well-structured, application-specific problem instances.
6.1 The SAGA Framework
We built SAGA to overcome the notable lack of publicly available datasets and task scheduling algorithm
implementations. SAGA is a Python library for running, evaluating, and comparing task scheduling
algorithms. It currently contains implementations of 17 algorithms using a common interface. It also
includes interfaces for generating, saving, and loading datasets for benchmarking. Finally, it includes an
implementation of the main contribution presented in this chapter: PISA, a simulated annealing-based
adversarial analysis method for comparing algorithms. For more information on SAGA, we refer the reader
to the open-source repository [11].
Table 6.1 lists the 17 algorithms currently implemented in SAGA. To orient the reader, we provide a brief description of each of these algorithms along with their scheduling complexity, the model they were designed for, performance guarantees (if any), datasets they have been evaluated on, and other algorithms they have been evaluated against.

Table 6.1: Schedulers implemented in SAGA
Abbreviation   Algorithm                           Reference
BIL            Best Imaginary Level                [22]
BruteForce     Brute Force                         -
CPoP           Critical Path on Processor          [10]
Duplex         Duplex                              [17]
ETF            Earliest Task First                 [55]
FastestNode    Fastest Node                        -
FCP            Fast Critical Path                  [23]
FLB            Fast Load Balancing                 [23]
GDL            Generalized Dynamic Level           [25]
HEFT           Heterogeneous Earliest Finish Time  [10]
MaxMin         MaxMin                              [17]
MCT            Minimum Completion Time             [15]
MET            Minimum Execution Time              [15]
MinMin         MinMin                              [17]
OLB            Opportunistic Load Balancing        [15]
SMT            SMT-driven Binary Search            -
WBA            Workflow-Based Application          [16]
BIL (Best Imaginary Level) is a list scheduling algorithm designed for the unrelated machines model. In this model, the runtime of a task on a node need not be a function of task cost and node speed; rather, it can be arbitrary. This model is even more general than the related machines model we study. BIL's scheduling complexity is $O(|T|^2 \cdot |V| \log |V|)$, and BIL was proven to be optimal for linear graphs. The authors report a makespan speedup over the GDL (Generalized Dynamic Level) scheduler on randomly generated problem instances. The exact process for generating random problem instances is not described, except to say that CCRs (communication-to-computation ratios: average communication time over average execution time) of 1/2, 1, and 2 were used.
CPoP (Critical Path on Processor) was proposed in the same paper as HEFT (Heterogeneous Earliest Finish Time). Both are list scheduling algorithms with scheduling complexity $O(|T|^2 |V|)$. No formal bounds are known for HEFT or CPoP. HEFT works by first prioritizing tasks based on their upward rank, which is essentially the length of the longest chain of tasks from the task to an exit task (considering average execution and communication times). Then, it greedily schedules tasks in this order, assigning each task to the node that minimizes its completion time given previously scheduled tasks. CPoP is similar but uses a slightly different priority metric. The biggest difference between the two is that CPoP always schedules critical path tasks (those on the longest chain in the task graph) to execute on the fastest node. Both algorithms were evaluated on random graphs (the process for generating graphs is described in the paper) and on real task graphs for Gaussian Elimination and FFT applications. They were compared against several other scheduling algorithms: Mapping Heuristic (similar to HEFT without insertion) [24], Dynamic-Level Scheduling [25], and Levelized Min Time.† Schedule Length Ratios (makespan scaled by the sum of the minimum computation costs of the tasks on the critical path), speedup, and schedule generation times are reported.
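The upward rank that drives HEFT's priority list can be computed recursively: rank_u(t) is the task's average execution time plus the largest, over its successors s, of the average communication time to s plus rank_u(s). The sketch below assumes the related machines model (execution time cost/speed, communication time data/strength) and averages communication over distinct node pairs; it is illustrative rather than SAGA's implementation. As input it uses the task graph and original network of Figure 6.2.

```python
from functools import lru_cache
from itertools import combinations
from statistics import mean
from typing import Dict, List, Tuple

def upward_ranks(cost: Dict[int, float],
                 data: Dict[Tuple[int, int], float],
                 speed: Dict[int, float],
                 strength: Dict[Tuple[int, int], float]) -> Dict[int, float]:
    """rank_u(t) = avg exec time of t + max over successors s of (avg comm + rank_u(s))."""
    succ: Dict[int, List[int]] = {t: [] for t in cost}
    for u, v in data:
        succ[u].append(v)
    links = list(combinations(sorted(speed), 2))  # distinct node pairs

    def avg_exec(t: int) -> float:
        return mean(cost[t] / speed[v] for v in speed)

    def avg_comm(u: int, v: int) -> float:
        return mean(data[(u, v)] / strength[link] for link in links)

    @lru_cache(maxsize=None)
    def rank(t: int) -> float:
        tail = max((avg_comm(t, s) + rank(s) for s in succ[t]), default=0.0)
        return avg_exec(t) + tail  # exit tasks have no successors, so tail = 0

    return {t: rank(t) for t in cost}

# Figure 6.2a task graph on the Figure 6.2b network.
cost = {t: 3.0 for t in range(1, 6)}
data = {(1, 2): 2.0, (1, 3): 2.0, (1, 4): 2.0, (2, 5): 3.0, (3, 5): 3.0, (4, 5): 3.0}
speed = {1: 1.0, 2: 1.0, 3: 1.0}
strength = {(1, 2): 1.0, (1, 3): 1.0, (2, 3): 1.0}

ranks = upward_ranks(cost, data, speed, strength)
print(ranks)  # {1: 14.0, 2: 9.0, 3: 9.0, 4: 9.0, 5: 3.0}
```

HEFT then considers tasks in decreasing order of these ranks; ties, like the one among tasks 2, 3, and 4 here, are broken by whatever rule the implementation uses.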
The MinMin, MaxMin, and Duplex list scheduling algorithms may have been proposed many times independently, but exceptionally clear definitions can be found in a paper that compares them to OLB, MET, MCT, and a few meta-heuristic algorithms (e.g., genetic algorithms and simulated annealing) [17]. MinMin schedules tasks by iteratively selecting the task with the smallest minimum completion time (given previously scheduled tasks) and assigning it to the corresponding node. MaxMin, on the other hand, schedules tasks by iteratively selecting the task with the largest minimum completion time and assigning it to the corresponding node. Duplex simply runs both MinMin and MaxMin and returns the schedule with the smaller makespan. In the paper, the authors evaluate these algorithms on independent, non-communicating tasks with uniformly random costs and heterogeneous compute nodes with uniformly random speeds. MinMin (and therefore also Duplex) is shown to generate schedules with low makespan compared to the other algorithms, while relatively high makespans are reported for MaxMin.

†We could not find the original paper that proposes this algorithm.
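For independent tasks like those in that evaluation, the three heuristics fit in a few lines. This sketch assumes execution time cost/speed and no communication, breaks ties by task and node order, and uses a toy instance of our own. On it MinMin yields a makespan of 7 while MaxMin yields 5, so Duplex returns the MaxMin schedule, a reminder that neither rule dominates the other on every instance.

```python
from typing import Dict, Tuple

Schedule = Dict[str, str]  # task -> node

def min_max_schedule(costs: Dict[str, float], speeds: Dict[str, float],
                     pick_max: bool) -> Tuple[Schedule, float]:
    """MinMin (pick_max=False) or MaxMin (pick_max=True) for independent tasks."""
    ready = {v: 0.0 for v in speeds}
    assignment: Schedule = {}
    remaining = set(costs)
    while remaining:
        # For each unscheduled task: its minimum completion time and the node achieving it.
        best = {}
        for t in remaining:
            v = min(speeds, key=lambda n: ready[n] + costs[t] / speeds[n])
            best[t] = (ready[v] + costs[t] / speeds[v], v)
        choose = max if pick_max else min
        task = choose(sorted(remaining), key=lambda t: best[t][0])
        finish, node = best[task]
        assignment[task] = node
        ready[node] = finish
        remaining.remove(task)
    return assignment, max(ready.values())

def duplex(costs: Dict[str, float], speeds: Dict[str, float]) -> Tuple[Schedule, float]:
    """Run MinMin and MaxMin and keep the schedule with the smaller makespan."""
    return min(min_max_schedule(costs, speeds, False),
               min_max_schedule(costs, speeds, True),
               key=lambda result: result[1])

costs = {"a": 2.0, "b": 4.0, "c": 8.0}
speeds = {"x": 1.0, "y": 2.0}
print(min_max_schedule(costs, speeds, False)[1],
      min_max_schedule(costs, speeds, True)[1])  # 7.0 5.0
```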
ETF (Earliest Task First) is one of the few algorithms we discuss in this chapter that has formal bounds. It is also a list scheduling algorithm, with runtime $O(|T||V|^2)$, and was designed for heterogeneous task graphs but homogeneous compute networks. It works by iteratively scheduling the task with the earliest possible start time given previously scheduled tasks (usually; details omitted here for the sake of simplicity can be found in the original paper). Note how this differs from HEFT and CPoP, which schedule according to the earliest possible completion time of the task. This important difference is what allows the authors to prove a formal bound of

$$\omega_{\mathrm{ETF}} \le \left(2 - \frac{1}{n}\right)\omega_{\mathrm{opt}}^{(i)} + C,$$

where $n$ is the number of compute nodes, $\omega_{\mathrm{opt}}^{(i)}$ is the optimal schedule makespan without communication, and $C$ is the total communication requirement over some terminal chain of tasks.
FCP (Fast Critical Path) and FLB (Fast Load Balancing) both have a runtime of O(|T| log (|V |) + |D|)
and were designed for heterogeneous task graphs and heterogeneous node speeds but homogeneous
communication strengths. The algorithms were evaluated on three types of task graphs with different
structures based on real applications (LU decomposition, Laplace equation solver, and a stencil algorithm)
with varied CCR and uniformly random task costs. Both algorithms were shown to perform well compared
to HEFT and ERT [20] despite their lower schedule generation times.
GDL (Generalized Dynamic Level), also called DLS (Dynamic Level Scheduling), is a variation on list
scheduling where task priorities are updated each time a task is scheduled. Due to this added computation
in each iteration, the complexity of DLS is $O(|V|^3 |T|)$ (a factor $|V|$ greater than HEFT and CPoP). GDL
was originally designed for the very general unrelated machines model and was shown to outperform
HDLFET [25] on randomly generated problem instances (though the method used for generating random
task graphs is not well-described) and on four real digital signal processing applications (two sound
synthesis algorithms, a telephone channel simulator, and a quadrature amplitude modulation transmitter).
MCT (Minimum Completion Time) and MET (Minimum Execution Time) are very simple algorithms
originally designed for the unrelated machines model. MET simply schedules tasks to the machine with the
smallest execution time (regardless of task start/end time). MCT assigns tasks in arbitrary order to the
node with the smallest completion time given previously scheduled tasks (basically HEFT without
insertion or its priority function). MET and MCT have scheduling complexities of $O(|T||V|)$ and $O(|T|^2 |V|)$, respectively. The algorithms were evaluated on task graphs with 125 or 500 tasks, each task
having one of five possible execution times. They were shown to outperform a very naive baseline
algorithm which does not use expected execution times for scheduling.
OLB (Opportunistic Load Balancing) has a runtime of just O(|T|) and was designed for independent
tasks under the unrelated machines model. Probably useful only as a baseline for understanding the
performance of other algorithms, OLB schedules tasks in arbitrary order on the earliest available compute
node. Its performance has been shown to be significantly worse than MET, MCT, and LBA [15].
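MET, MCT, and OLB differ only in the score used to pick a node for the next task, as this sketch for independent tasks shows. It assumes execution time cost/speed under the related machines model, breaks ties by node order, and uses a toy instance of our own. Note that under the related machines model MET sends every task to the fastest node, so it coincides with the FastestNode baseline described below; the two differ only under the unrelated machines model MET was designed for.

```python
from typing import Callable, Dict, List

Score = Callable[[float, float], float]  # (node ready time, task execution time) -> score

def list_schedule(order: List[str], costs: Dict[str, float],
                  speeds: Dict[str, float], score: Score) -> float:
    """Assign independent tasks, in order, to the node with the lowest score."""
    ready = {v: 0.0 for v in speeds}
    for t in order:
        v = min(speeds, key=lambda n: score(ready[n], costs[t] / speeds[n]))
        ready[v] += costs[t] / speeds[v]
    return max(ready.values())

def met(ready: float, exec_time: float) -> float:
    return exec_time          # fastest execution, ignoring node load

def mct(ready: float, exec_time: float) -> float:
    return ready + exec_time  # earliest completion

def olb(ready: float, exec_time: float) -> float:
    return ready              # earliest available node, ignoring speed

order = ["a", "b", "c"]
costs = {"a": 4.0, "b": 4.0, "c": 4.0}
speeds = {"x": 1.0, "y": 4.0}
print([list_schedule(order, costs, speeds, f) for f in (met, mct, olb)])  # [3.0, 3.0, 4.0]
```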
WBA (Workflow Based Application) is a scheduling algorithm developed for managing scientific
workflows in cloud environments and was designed for the fully heterogeneous model. We observe that its
scheduling complexity is at most O(|T||D||V |) (the authors do not report this, however, and a more
efficient implementation might be possible). WBA operates by randomly assigning tasks to nodes, guided
by a distribution that favors choices that least increase the schedule makespan in each iteration.
FastestNode is a simple baseline algorithm that schedules all tasks to execute in serial on the fastest
compute node. BruteForce is a naive algorithm that tries every possible schedule and returns that with
the smallest makespan. SMT uses an SMT (satisfiability modulo theory) solver and binary search to find a
(1 + ϵ)-OPT schedule. Both BruteForce and SMT take much longer to run (exponential time) than the other
algorithms and are thus not included in the benchmarking or adversarial analysis results reported in this
chapter.
SAGA also includes a set of tools for generating, saving, and loading datasets. Table 6.2 lists the 16 dataset generators currently included in SAGA. The in_trees, out_trees, and chains datasets each consist of 1000 randomly generated network/task graph pairs following a common methodology used in the literature [66]. In-trees and out-trees are generated with between 2 and 4 levels (chosen uniformly at random), a branching factor of either 2 or 3 (chosen uniformly at random), and node/edge weights drawn from a clipped Gaussian distribution (mean: 1, standard deviation: 1/3, min: 0, max: 2). Parallel chains task graphs are generated with between 2 and 5 parallel chains (chosen uniformly at random), each of length between 2 and 5 (chosen uniformly at random), with node/edge weights drawn from the same clipped Gaussian distribution. Randomly weighted networks are complete graphs with between 3 and 5 nodes (chosen uniformly at random) and node/edge weights drawn from the same clipped Gaussian distribution.

Table 6.2: Datasets available in SAGA
Name          Task Graph                              Network
in_trees      randomly weighted in-trees              randomly weighted
out_trees     randomly weighted out-trees             randomly weighted
chains        randomly weighted parallel chains       randomly weighted
blast         synthetic Blast workflows [56]          Chameleon cloud inspired [57]
bwa           synthetic BWA workflows [56]            Chameleon cloud inspired [57]
cycles        synthetic Cycles workflows [58]         Chameleon cloud inspired [57]
epigenomics   synthetic Epigenomics workflows [50]    Chameleon cloud inspired [57]
genome        synthetic 1000genome workflows [59]     Chameleon cloud inspired [57]
montage       synthetic Montage workflows [60]        Chameleon cloud inspired [57]
seismology    synthetic Seismology workflows [61]     Chameleon cloud inspired [57]
soykb         synthetic SoyKB workflows [62]          Chameleon cloud inspired [57]
srasearch     synthetic SRASearch workflows [63]      Chameleon cloud inspired [57]
etl           RIoTBench ETL application [64]          Edge/Fog/Cloud Networks [65]
predict       RIoTBench PREDICT application [64]      Edge/Fog/Cloud Networks [65]
stats         RIoTBench STATS application [64]        Edge/Fog/Cloud Networks [65]
train         RIoTBench TRAIN application [64]        Edge/Fog/Cloud Networks [65]
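The random parallel-chains task graphs and randomly weighted networks described above can be sketched with Python's standard `random` module alone. Here "clipped" is implemented by clamping each sample to [min, max] (resampling out-of-range values would be an equally plausible reading), the chains are generated without any extra source or sink task, and graphs are plain dictionaries; none of these choices is taken from SAGA's code.

```python
import random
from typing import Dict, Tuple

def clipped_gaussian(rng: random.Random, mu: float = 1.0, sigma: float = 1 / 3,
                     lo: float = 0.0, hi: float = 2.0) -> float:
    """Draw from N(mu, sigma) and clamp the result into [lo, hi]."""
    return min(hi, max(lo, rng.gauss(mu, sigma)))

def parallel_chains(rng: random.Random) -> Tuple[Dict[str, float], Dict[Tuple[str, str], float]]:
    """2-5 parallel chains of 2-5 tasks each; returns (task weights, dependency weights)."""
    tasks: Dict[str, float] = {}
    edges: Dict[Tuple[str, str], float] = {}
    for c in range(rng.randint(2, 5)):
        for i in range(rng.randint(2, 5)):
            tasks[f"c{c}_t{i}"] = clipped_gaussian(rng)
            if i > 0:
                edges[(f"c{c}_t{i - 1}", f"c{c}_t{i}")] = clipped_gaussian(rng)
    return tasks, edges

def complete_network(rng: random.Random) -> Tuple[Dict[int, float], Dict[Tuple[int, int], float]]:
    """Complete graph on 3-5 nodes with clipped-Gaussian node speeds and link strengths."""
    n = rng.randint(3, 5)
    speeds = {v: clipped_gaussian(rng) for v in range(n)}
    strengths = {(u, v): clipped_gaussian(rng) for u in range(n) for v in range(u + 1, n)}
    return speeds, strengths

rng = random.Random(0)
tasks, edges = parallel_chains(rng)
speeds, strengths = complete_network(rng)
print(len(tasks), len(edges), len(speeds), len(strengths))
```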
The scientific workflow datasets blast, bwa, cycles, epigenomics, genome, montage, seismology, soykb,
and srasearch each contain 100 problem instances. The task graphs are synthetically generated using the
WfCommons Synthetic Workflow Generator [67] and are based on real-world scientific workflows. The
Chameleon cloud inspired networks are generated by fitting a distribution to the machine speed data from
the execution traces (detailed information from a real execution of the application including task start/end
times, cpu usages/requirements, data I/O sizes, etc.) of real workflows on Chameleon that are available in
WfCommons and then sampling from that distribution to generate random networks. Because Chameleon
uses a shared filesystem for data transfer, the communication cost can be absorbed into the computation
cost and thus the communication strength between nodes is considered to be infinite.
The IoT/edge-inspired etl, predict, stats, and train datasets each contain 1000 problem instances. The
task graphs and networks are generated using the approach described in [65]. The task graph structure is
based on real-world IoT data streaming applications, and the node weights are generated using a clipped Gaussian distribution (mean: 35, standard deviation: 25/3, min: 10, max: 60). The input size of the application is generated using a clipped Gaussian distribution (mean: 1000, standard deviation: 500/3, min: 500, max: 1500), and the edge weights are determined by the known input/output ratios of the tasks. The Edge/Fog/Cloud Networks are generated by constructing a complete graph with three types of nodes: edge nodes with CPU speed 1, fog nodes with CPU speed 6, and cloud nodes with CPU speed 50. The communication strength between edge and fog nodes is 60, and between fog and cloud/fog nodes it is 100. The number of edge, fog, and cloud nodes is between 75 and 125, 3 and 7, and 1 and 10, respectively (chosen uniformly at random). In order to generate a complete graph (a requirement for many scheduling algorithms), the communication strength between edge and cloud nodes is set to 60, and the communication strength between cloud nodes is set to infinity (i.e., no communication delay).
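A sketch of the Edge/Fog/Cloud network construction described above follows. The node speeds and the strengths stated in the text are used as given. The text does not state the strength between two edge nodes, so the value used for that pair (60) is purely our placeholder assumption, as are the node names.

```python
import math
import random
from itertools import combinations
from typing import Dict, Tuple

SPEED = {"edge": 1.0, "fog": 6.0, "cloud": 50.0}
STRENGTH = {
    frozenset({"edge", "fog"}): 60.0,
    frozenset({"edge", "cloud"}): 60.0,
    frozenset({"fog"}): 100.0,            # fog-fog
    frozenset({"fog", "cloud"}): 100.0,
    frozenset({"cloud"}): math.inf,       # cloud-cloud: no communication delay
    frozenset({"edge"}): 60.0,            # edge-edge: NOT given in the text; placeholder
}

def edge_fog_cloud_network(
    rng: random.Random,
) -> Tuple[Dict[str, float], Dict[Tuple[str, str], float]]:
    """Complete graph of edge, fog, and cloud nodes with type-dependent weights."""
    counts = {"edge": rng.randint(75, 125), "fog": rng.randint(3, 7),
              "cloud": rng.randint(1, 10)}
    kinds = {f"{k}{i}": k for k, n in counts.items() for i in range(n)}  # name -> type
    speeds = {v: SPEED[k] for v, k in kinds.items()}
    strengths = {(u, v): STRENGTH[frozenset({kinds[u], kinds[v]})]
                 for u, v in combinations(kinds, 2)}
    return speeds, strengths

speeds, strengths = edge_fog_cloud_network(random.Random(1))
print(len(speeds), len(strengths))
```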
Many other schedulers and datasets have been proposed and used in the literature and can be easily
integrated into SAGA in the future. SAGA is open-source [11] and designed to be modular and extensible.
We encourage the community to contribute new algorithms, datasets, and experimentation tools.
6.2 Benchmarking Results
Figure 6.1 shows the results of benchmarking 15 algorithms on 16 datasets. Different scheduling algorithms perform better or worse depending on the dataset, and algorithms that weren't designed for fully heterogeneous task graphs and networks (e.g., ETF, FastestNode) tend to perform poorly. Many of the algorithms, though, perform similarly across the datasets. While these experiments provide valuable information about the performance of each algorithm on each dataset, they provide much less information about the algorithms themselves.

(Figure 6.1 plot: the 15 schedulers on the horizontal axis, the 16 datasets on the vertical axis, and a color scale for the maximum makespan ratio ranging from 1.00 to > 5.0.)
Figure 6.1: Makespan Ratios of 15 algorithms evaluated on 16 datasets. Gradients depict performance on different problem instances in each dataset.
Consider the illustrative scenario in Figure 6.2. A simple task graph, as depicted in Figure 6.2a, coupled with a minor alteration to the initial network (Figures 6.2b and 6.2c), namely a reduction in the strength of node 3's communication links, causes HEFT to perform worse than CPoP (Figures 6.2d, 6.2e, 6.2f, and 6.2g). This example underscores the shortcoming of traditional benchmarking: it provides little insight into the conditions under which an algorithm performs well or poorly. Observe that a structurally equivalent instance of this problem with all node/edge weights scaled so they are between 0 and 1 could have been generated by the Parallel Chains dataset generator in SAGA. So while the benchmarking results in Figure 6.1 indicate that HEFT performs slightly better than CPoP on this dataset, there are, in fact, Parallel Chains instances where CPoP performs significantly better than HEFT.

(Figure 6.2 panels: (a) Task Graph: task 1 feeds tasks 2, 3, and 4 with c(1, 2) = c(1, 3) = c(1, 4) = 2, and these all feed task 5 with c(2, 5) = c(3, 5) = c(4, 5) = 3; every task has cost 3. (b) Original Network: three nodes with s(1) = s(2) = s(3) = 1 and s(1, 2) = s(1, 3) = s(2, 3) = 1. (c) Modified Network: as in (b) but with s(1, 3) = s(2, 3) = 0.5. (d) HEFT Schedule on Original Network. (e) CPoP Schedule on Original Network. (f) HEFT Schedule on Modified Network. (g) CPoP Schedule on Modified Network.)
Figure 6.2: Comparison of Scheduling Algorithms on Slightly Modified Networks
6.3 Adversarial Analysis
We propose a novel adversarial analysis method for comparing task scheduling algorithms called PISA
(Problem-instance Identification using Simulated Annealing). The goal of this method is to identify
problem instances on which one algorithm outperforms another. More formally, the goal is to find a
problem instance (N, G) that maximizes the makespan ratio of algorithm A against algorithm B:
$$\max_{N, G} \frac{m(S_{A,N,G})}{m(S_{B,N,G})}$$
In doing so, we hope to fill in some of the gaps we know exist in the benchmarking results. We propose a
simulated annealing-based approach for finding such problem instances. Simulated annealing is a
meta-heuristic that is often used to search for the global optimum of a function that has many local optima.
In the context of our problem, the optimizer starts with an initial problem instance (N, G) and then randomly
perturbs the problem instance by changing the network or task graph in some way. If the perturbed
problem instance has a higher makespan ratio than the current problem instance, the optimizer accepts the
perturbation and continues the search from the new problem instance. If the perturbed problem instance
has a lower makespan ratio than the current problem instance, the optimizer accepts the perturbation only with
some probability. This allows the optimizer to escape local optima and (potentially) find the global
optimum. Over time, the probability of accepting perturbations that decrease the makespan ratio decreases,
allowing the optimizer to settle into a high-makespan-ratio state. The pseudo-code for our proposed
simulated annealing process is shown in Algorithm 2.
Algorithm 2 PISA (Problem-instance Identification using Simulated Annealing)
procedure PISA(N, G, Tmax, Tmin, Imax, α, A, B)
    ▷ (N, G) is the initial problem instance
    ▷ Tmax is the initial (maximum) temperature
    ▷ Tmin is the minimum temperature
    ▷ Imax is the maximum number of iterations
    ▷ α is the cooling rate
    ▷ A is the scheduler
    ▷ B is the baseline scheduler
    Nbest ← N
    Gbest ← G
    Mbest ← m(S_{A,N,G}) / m(S_{B,N,G})
    T ← Tmax
    i ← 0
    while T > Tmin and i < Imax do
        N′, G′ ← Perturb(N, G)
        M′ ← m(S_{A,N′,G′}) / m(S_{B,N′,G′})
        if M′ > Mbest then
            N ← N′; G ← G′                    ▷ Continue the search from the new instance
            Nbest ← N′; Gbest ← G′; Mbest ← M′
        else
            p ← exp(−(M′/Mbest)/T)
            r ← Random(0, 1)                  ▷ Uniform random number between 0 and 1
            if r < p then
                N ← N′; G ← G′                ▷ Accept the worse instance without changing the best
            end if
        end if
        T ← T · α
        i ← i + 1
    end while
    return Nbest, Gbest, Mbest
end procedure
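For readers who prefer executable code, the annealing loop can be sketched in Python as follows. This is a minimal sketch, not the SAGA implementation: it assumes, following the prose description above, that the search keeps a current instance (from which it continues) separately from the best instance found so far, and it uses the acceptance probability exp(−(M′/Mbest)/T) from Algorithm 2. The callables `ratio` (standing in for running schedulers A and B and computing m(S_A)/m(S_B)) and `perturb` (standing in for Perturb) are hypothetical.

```python
import math
import random
from typing import Any, Callable, Optional, Tuple


def pisa(
    initial: Any,
    ratio: Callable[[Any], float],
    perturb: Callable[[Any, random.Random], Any],
    t_max: float = 10.0,
    t_min: float = 0.1,
    i_max: int = 1000,
    alpha: float = 0.99,
    rng: Optional[random.Random] = None,
) -> Tuple[Any, float]:
    """Simulated-annealing search for an instance that maximizes ratio(instance)."""
    rng = rng if rng is not None else random.Random()
    current = initial                       # instance the search continues from
    best, best_ratio = initial, ratio(initial)
    t, i = t_max, 0
    while t > t_min and i < i_max:
        candidate = perturb(current, rng)
        cand_ratio = ratio(candidate)
        if cand_ratio > best_ratio:
            # New best: move there and remember it.
            current, best, best_ratio = candidate, candidate, cand_ratio
        else:
            # Accept a non-improving candidate with probability exp(-(M'/M_best)/T).
            p = math.exp(-(cand_ratio / best_ratio) / t)
            if rng.random() < p:
                current = candidate         # keep exploring; the best stays unchanged
        t *= alpha                          # geometric cooling
        i += 1
    return best, best_ratio


# Toy usage with a hypothetical one-dimensional "instance" x in [0, 1]
# whose ratio 1 + x is largest at x = 1.
def toy_perturb(x: float, rng: random.Random) -> float:
    return min(1.0, max(0.0, x + rng.uniform(-0.1, 0.1)))


best_x, best_m = pisa(0.5, lambda x: 1.0 + x, toy_perturb, rng=random.Random(0))
print(f"best x = {best_x:.3f}, ratio = {best_m:.3f}")
```

Because the best instance is only replaced on strict improvement, the returned ratio can never fall below the ratio of the initial instance.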
The Perturb function is responsible for perturbing the problem instance by changing the network or
task graph in some way. In our implementation, the Perturb function randomly selects (with equal
probability) one of the following perturbations:
1. Change Network Node Weight: Select a node v ∈ V uniformly at random and change its weight
by a uniformly random amount between −1/10 and 1/10 with a minimum weight of 0 and a
maximum weight of 1.
2. Change Network Edge Weight: Same as Change Network Node Weight, but for edges (not
including self-edges).
3. Change Task Weight: Same as Change Network Node Weight, but for tasks.
4. Change Dependency Weight: Same as Change Network Edge Weight, but for dependencies.
5. Add Dependency: Select a task t ∈ T uniformly at random and add a dependency from t to a
uniformly random task t′ ∈ T such that (t, t′) ∉ D and doing so does not create a cycle.
6. Remove Dependency: Select a dependency (t, t′) ∈ D uniformly at random and remove it.
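The six perturbations listed above can be sketched as a single Python function. This is an illustrative sketch under several assumptions that are not from the original: a problem instance is a hypothetical dictionary of weight tables (`node_w`, `edge_w` keyed by unordered node pairs with self-edges omitted, `task_w`, `dep_w` keyed by dependency), a newly added dependency receives a uniform random weight, and perturbations disabled for homogeneous-only schedulers (the `frozen` parameter, covered in the next paragraph of the text) are simply dropped from the equally likely choices.

```python
import copy
import random

OPS = ("node", "edge", "task", "dep", "add_dep", "remove_dep")
WEIGHT_KEY = {"node": "node_w", "edge": "edge_w", "task": "task_w", "dep": "dep_w"}


def creates_cycle(deps, src, dst):
    """True if dst already reaches src, so adding src -> dst would close a cycle."""
    stack, seen = [dst], set()
    while stack:
        u = stack.pop()
        if u == src:
            return True
        if u not in seen:
            seen.add(u)
            stack.extend(b for (a, b) in deps if a == u)
    return False


def perturb(inst, rng, frozen=frozenset()):
    """Return a perturbed deep copy of inst; ops named in frozen are never chosen."""
    new = copy.deepcopy(inst)
    op = rng.choice([o for o in OPS if o not in frozen])
    if op in WEIGHT_KEY:
        weights = new[WEIGHT_KEY[op]]
        if weights:
            key = rng.choice(sorted(weights))
            # Shift by U(-1/10, 1/10) and clamp to [0, 1].
            weights[key] = min(1.0, max(0.0, weights[key] + rng.uniform(-0.1, 0.1)))
    elif op == "add_dep":
        tasks = sorted(new["task_w"])
        t = rng.choice(tasks)
        targets = [u for u in tasks
                   if u != t and (t, u) not in new["dep_w"]
                   and not creates_cycle(new["dep_w"], t, u)]
        if targets:
            # Assumption: a new dependency gets a uniform random weight in [0, 1).
            new["dep_w"][(t, rng.choice(targets))] = rng.random()
    elif op == "remove_dep" and new["dep_w"]:
        del new["dep_w"][rng.choice(sorted(new["dep_w"]))]
    return new


inst = {
    "node_w": {1: 0.5, 2: 0.9, 3: 0.2},
    "edge_w": {(1, 2): 0.4, (1, 3): 0.7, (2, 3): 0.1},  # self-edges omitted (weight ∞)
    "task_w": {"A": 0.3, "B": 0.6, "C": 0.8},
    "dep_w": {("A", "B"): 0.5, ("B", "C"): 0.2},
}
rng = random.Random(1)
x = inst
for _ in range(200):
    # Node weights frozen, as for schedulers that assume homogeneous compute nodes.
    x = perturb(x, rng, frozen={"node"})
```

Working on a deep copy keeps the current instance intact, so a rejected candidate can simply be discarded.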
Some of the algorithms we evaluate were designed only for homogeneous compute nodes or homogeneous
communication links. In these cases, we restrict the perturbations to change only the aspects of the
network that are relevant to the algorithm. For ETF, FCP, and FLB, we set all node weights to 1 initially
and do not allow them to be changed. For BIL, GDL, FCP, and FLB, we set all communication link weights
to 1 initially and do not allow them to be changed.
For every pair of schedulers, we run the simulated annealing algorithm 5 times with different
randomly generated initial problem instances. In each initial problem instance (N, G), N is a
complete graph with between 3 and 5 nodes (chosen uniformly at random) and node and edge weights
between 0 and 1 (generated uniformly at random; self-edges have weight ∞), and G is a simple chain task
graph with between 3 and 5 tasks (chosen uniformly at random) and task and dependency weights between 0
and 1 (generated uniformly at random). In our implementation, we set Tmax = 10, Tmin = 0.1,
Imax = 1000, and α = 0.99.
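The initial-instance distribution and the annealing parameters can be checked with a short Python sketch. The dictionary layout and function names are hypothetical. The sketch also shows a consequence of these parameter values: if the loop condition is exactly T > Tmin and i < Imax, as in Algorithm 2, the temperature 10 · 0.99^k first drops to 0.1 or below after ⌈ln(0.01)/ln(0.99)⌉ = 459 iterations, so the Imax = 1000 limit is never reached with these settings.

```python
import math
import random


def initial_instance(rng):
    """Random initial instance: complete network with 3-5 nodes, chain task graph with 3-5 tasks."""
    nodes = list(range(1, rng.randint(3, 5) + 1))
    tasks = [f"t{k}" for k in range(rng.randint(3, 5))]
    return {
        "node_w": {v: rng.random() for v in nodes},
        # One weight per unordered node pair; self-edges are left out (implicitly weight ∞).
        "edge_w": {(u, v): rng.random() for u in nodes for v in nodes if u < v},
        "task_w": {t: rng.random() for t in tasks},
        # Simple chain t0 -> t1 -> ... -> t_{n-1}.
        "dep_w": {(a, b): rng.random() for a, b in zip(tasks, tasks[1:])},
    }


def iterations_run(t_max=10.0, t_min=0.1, i_max=1000, alpha=0.99):
    """Count the iterations of a loop guarded by `T > Tmin and i < Imax`."""
    t, i = t_max, 0
    while t > t_min and i < i_max:
        t *= alpha
        i += 1
    return i


inst = initial_instance(random.Random(42))
print(len(inst["node_w"]), len(inst["task_w"]))
print(iterations_run())                                   # 459
print(math.ceil(math.log(0.1 / 10.0) / math.log(0.99)))   # 459 (closed form)
```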
6.3.1 Results
Figure 6.3 shows the PISA results for each pair of schedulers. The benefit of our approach is clear from just
a quick glance at Figure 6.3 (there's a lot more red!). A closer look reveals even more interesting results.
[Figure 6.3: Heatmap of Makespan Ratios for all 15 Algorithms compared to each other (x-axis: Scheduler; y-axis: Base Scheduler; schedulers: BIL, CPoP, Duplex, ETF, FCP, FLB, FastestNode, GDL, HEFT, MCT, MET, MaxMin, MinMin, OLB, WBA, with an additional "Worst" row; color scale: Makespan Ratio from 1.00 to > 5.0). The cell value (and color) for row i and column j indicates the Makespan Ratio for the worst-case instance found by PISA for scheduler j against baseline scheduler i.]
First, PISA finds, for every scheduling algorithm, problem instances on which it performs at least twice as
badly as another of the algorithms. In fact, for most of the algorithms (10 of 15), it finds a problem instance
on which the algorithm performs at least five times worse than another algorithm! Furthermore, we note
that for nearly every pair of algorithms A, B, PISA identifies both a problem instance where A outperforms B
and one where B outperforms A. Finally, some of the cells in Figure 6.3 have values of > 1000. In these cases,
PISA identified a problem instance where one algorithm drastically outperforms the other, and it is likely
that there exist problem instances where the scheduler performs arbitrarily poorly compared to the
baseline scheduler.
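The observations above can be stated precisely in terms of the pairwise matrix of worst-case ratios. In the minimal sketch below, `ratios[base][sched]` holds the highest makespan ratio of `sched` against `base` found by PISA (one cell of a heatmap like Figure 6.3). A scheduler's overall worst case is the maximum over all baselines, and a pair is "two-sided" when each scheduler was found to beat the other on some instance. The three-scheduler matrix is hypothetical and is not taken from Figure 6.3.

```python
from itertools import combinations


def worst_case(ratios):
    """ratios[base][sched]: highest makespan ratio of sched against base found by PISA."""
    worst = {}
    for row in ratios.values():
        for sched, r in row.items():
            worst[sched] = max(worst.get(sched, r), r)
    return worst


def two_sided_pairs(ratios):
    """Pairs (A, B) where A beat B on some instance and B beat A on another."""
    return [
        (a, b)
        for a, b in combinations(sorted(ratios), 2)
        if ratios[a].get(b, 1.0) > 1.0 and ratios[b].get(a, 1.0) > 1.0
    ]


# Hypothetical 3-scheduler matrix (not values from Figure 6.3).
ratios = {
    "X": {"Y": 2.5, "Z": 1.2},
    "Y": {"X": 6.0, "Z": 1.1},
    "Z": {"X": 1.3, "Y": 1.0},
}
worst = worst_case(ratios)
print(worst)
print(sum(r >= 5.0 for r in worst.values()), two_sided_pairs(ratios))
```

Here X has a worst case of 6.0, so it would count among the schedulers that perform "at least five times worse" than some other algorithm, while the pair (Y, Z) is one-sided because Y was never found to lose to Z.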
6.3.2 Case Study: HEFT vs. CPoP
The HEFT and CPoP algorithms were proposed in the same paper by Topcuoglu, Hariri, and Wu in 1999
and have remained two of the most popular task scheduling algorithms in the decades since. Both are
list-scheduling algorithms that use a priority list to determine the order in which tasks are greedily
scheduled. In HEFT, each task is scheduled on the node that minimizes its finish time, given previous
scheduling decisions. CPoP is similar, but commits to scheduling all critical-path tasks (those on the longest
path in the task graph) on the fastest node‡. The priority functions of the two algorithms differ slightly,
but both are based on a task's distance (in terms of average compute and communication time) from the
start or end of the task graph. In HEFT, the priority of a sink task (one with no successors) is its average
execution time over all nodes in the network. The priority of a non-sink task is its average execution time
over all nodes in the network plus the maximum, over its successors, of the average communication time to
that successor plus the successor's own priority. In other words, it is the average amount of time it takes to
execute the task on any node in the network plus the maximum amount of time it takes (in an ideal scenario)
to execute the rest of the task graph. In CPoP, a task's priority is computed from both its distance to the end
and its distance from the start of the task graph. This difference is what causes the two algorithms to
perform differently on certain problem instances.
[Figure 6.4: Problem Instance where HEFT performs ≈ 1.55 times worse than CPoP. (a) Task Graph: tasks A, B, C with costs c(A) = 0.8, c(B) = 0.0, c(C) = 0.8 and one dependency B → C with c(B, C) = 0.8. (b) Network: nodes 1, 2, 3 with speeds s(1) = 0.4, s(2) = 0.7, s(3) = 0.5 and link strengths s(1, 2) = 0.6, s(1, 3) = 0.3, s(2, 3) = 0.1. (c) HEFT Schedule. (d) CPoP Schedule.]
Observe in Figure 6.4 a problem instance identified by PISA where HEFT performs approximately 1.55 times
worse than CPoP. In this instance, the algorithms differ primarily in whether task A or task C has higher
priority. In both algorithms, task B has the highest priority and is thus scheduled first on node 2 (the fastest
node). For CPoP, task C must have higher priority than A because it is on the critical path B → C. For HEFT,
though, task A has higher priority than C because it is further from the end of the task graph. As a result,
CPoP schedules C on node 2, which allows task A to be scheduled and executed in parallel on node 3 (the
second-fastest node). HEFT, on the other hand, schedules all tasks on node 2 and thus does not benefit from
parallel execution. CPoP succeeds in this instance because it prioritizes tasks that are on the critical path,
keeping high-cost tasks and communications on the same node and allowing low-cost tasks and
communications to be executed in parallel on other nodes. HEFT greedily schedules high-cost tasks on fast
nodes without considering as carefully how doing so might affect the rest of the task graph (especially the
tasks on the critical path).
‡This statement is true for the related machines model. In general, CPoP schedules critical-path tasks to the node that
minimizes the sum of execution times of critical-path tasks on that node.
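The two priority functions described above can be sketched in Python on a small instance. The instance here is hypothetical (two related machines, three tasks) and is not the one in Figure 6.4. Two further assumptions of the sketch: average communication time is taken over inter-node links only, and CPoP's critical-path tasks are identified as those whose combined priority equals the maximum. HEFT's priority is the upward rank (distance to the end); CPoP's is the upward rank plus the downward rank (distance from the start).

```python
from functools import lru_cache
from statistics import mean

# Hypothetical instance (not Figure 6.4): two related machines and three tasks.
speed = {1: 1.0, 2: 2.0}                    # s(v): node speed
link = {(1, 2): 1.0}                        # s(u, v): link strength; self-links omitted
cost = {"A": 2.0, "B": 4.0, "C": 2.0}       # c(t): task cost
data = {("A", "B"): 1.0, ("A", "C"): 3.0}   # c(t, t'): dependency data size


def avg_exec(t):
    """Average execution time of t over all nodes."""
    return mean(cost[t] / s for s in speed.values())


def avg_comm(edge):
    """Average communication time of a dependency over all inter-node links."""
    return mean(data[edge] / s for s in link.values())


@lru_cache(maxsize=None)
def rank_up(t):
    """HEFT priority: distance from t (inclusive) to the end of the task graph."""
    succs = [b for (a, b) in data if a == t]
    return avg_exec(t) + max((avg_comm((t, b)) + rank_up(b) for b in succs), default=0.0)


@lru_cache(maxsize=None)
def rank_down(t):
    """Distance from the start of the task graph to t (exclusive)."""
    preds = [a for (a, b) in data if b == t]
    return max((rank_down(a) + avg_exec(a) + avg_comm((a, t)) for a in preds), default=0.0)


heft_order = sorted(cost, key=rank_up, reverse=True)
cpop_priority = {t: rank_up(t) + rank_down(t) for t in cost}
cp_length = max(cpop_priority.values())
critical = [t for t in cost if abs(cpop_priority[t] - cp_length) < 1e-9]
print(heft_order)      # ['A', 'B', 'C']
print(cpop_priority)   # {'A': 6.0, 'B': 5.5, 'C': 6.0}
print(critical)        # ['A', 'C']
```

Even on this tiny graph the two functions disagree: HEFT ranks B (upward rank 3.0) ahead of C (1.5), while CPoP gives C, which lies on the critical path A → C, priority 6.0 over B's 5.5.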
This does not mean that CPoP is better than HEFT, though! Observe in Figure 6.5 a problem instance
identified by PISA where CPoP performs approximately 1.88 times worse than HEFT. In this instance, the
algorithms' priorities are actually the same for all tasks. In fact, they schedule both A and B identically.
The problem that CPoP faces on this instance is exactly what allows it to succeed on the previous one:
committing to scheduling all critical-path tasks on the fastest node. In this instance, the critical path is
A → C. Thus, CPoP schedules task C on node 2 (the fastest node) even though it would have finished
much sooner on node 1 (due to the communication cost incurred by its dependency on task A). HEFT does
not have this problem because it does not commit to scheduling all critical-path tasks on the fastest node.
[Figure 6.5: Problem Instance where CPoP performs ≈ 1.88 times worse than HEFT. (a) Task Graph: tasks A, B, C with costs c(A) = 0.8, c(B) = 0.9, c(C) = 0.6 and dependencies A → C with c(A, C) = 0.7 and B → C with c(B, C) = 0.2. (b) Network: nodes 1, 2, 3 with speeds s(1) = 0.4, s(2) = 0.8, s(3) = 0.0 and link strengths s(1, 2) = 0.1, s(1, 3) = 0.4, s(2, 3) = 0.3. (c) HEFT Schedule. (d) CPoP Schedule.]
In conclusion, CPoP succeeds when the task graph has a critical path with low-cost tasks that "mess up"
HEFT's priority function and fails when scheduling a task on the fastest node causes a large communication
penalty. It is easy to see that these bad-case scenarios can be modified to make HEFT and CPoP perform
arbitrarily worse than each other (e.g., by increasing the number of parallel source tasks in the first
example and by changing the network weight between nodes 2 and 3 to 0 in the second example). This
example highlights one of the benefits of our proposed approach to comparing task scheduling algorithms:
it allows us to identify patterns where the algorithms under- or over-perform and then extrapolate those
patterns to other problem instances.
6.4 Application-Specific PISA
In Section 6.3, we introduced PISA as an effective method for finding problem instances where an
algorithm performs much worse than benchmarking results suggest. The results, however, depend greatly
on the initial problem instance and the implementation of the Perturb function. These two things define
the space of problem instances the algorithm searches over and also affect which problem instances are
more or less likely to be explored. We chose an implementation in Section 6.3 that kept problem instances
relatively small (between three and five tasks and compute nodes) and allowed arbitrary task graph
structure and CCRs (communication-to-computation ratios). By keeping the problem instance size small
but allowing for almost arbitrary task graph structure and CCR, this implementation allowed us to explore
how the structure of problem instances affects schedules in order to find patterns where certain algorithms
out-perform others. In many more realistic scenarios, though, application developers have a better idea of
what their task graph or compute network will look like. PISA can be easily restricted to searching over a
space of realistic problem instances by adjusting the Perturb implementation and initial problem instance.
In this section, we report results on experiments with application-specific Perturb implementations.
Again, these results support the main hypothesis of this chapter that PISA reveals performance boundaries
that a traditional benchmarking approach does not.
6.4.1 Experimental Setup
One of the largest communities of researchers that depend on efficient task scheduling algorithms is that of
scientists that use scientific workflows for simulation, experimentation, and more. Scientific workflows are
typically big-data scale task graphs that are scheduled to run on cloud or supercomputing platforms.
These researchers typically have little to no control over how their workflows are scheduled, instead
relying on Workflow Management Systems (WFMS) like Pegasus [68], Makeflow [69], and Nextflow [70]
(to name just a few examples) to handle the technical execution details. For this reason, it is especially
important for WFMS developers/maintainers (who choose which scheduling algorithms their system uses)
to understand the performance boundaries between different algorithms for the different types of scientific
workflows and computing systems their clients use.
SAGA currently has datasets based on nine real-world scientific workflows (blast, bwa, cycles,
epigenomics, 1000genome, montage, seismology, soykb, and srasearch). These applications come from a
very wide range of scientific domains — from astronomy (montage builds mosaics of astronomical imagery)
to biology (1000genome performs human genome reconstruction) to agriculture (cycles is a multi-crop,
multi-year agroecosystem model). For each workflow, the runtime of each task, the input/output sizes in bytes,
and the speedup factor (compute speed) of each machine are available from public execution trace
information§. The inter-node communication strength, however, is not available. We set communication
strengths to be homogeneous so that the average CCR, or $\frac{\text{average data size}}{\text{communication strength}}$,
is $\frac{1}{5}$, $\frac{1}{2}$, 1, 2, or 5 (resulting in five different experiments for each workflow).
Because these scientific workflows are much larger than the workflows used in the experiments from
Section 6.3, we evaluate a smaller subset of the schedulers available in SAGA: FastestNode, HEFT, CPoP,
MaxMin, MinMin, and WBA. Performing these experiments for the rest of the schedulers remains a task
for future work. For generating a benchmarking dataset, we use the WfCommons Synthetic Workflow
Generator [67] to generate random in-family task graphs and create random networks using a best-fit
distribution computed from the real execution trace data. We also use this method to generate initial
problem instances for PISA. Then, we adapt PISA’s Perturb implementation as follows:
§ https://github.com/wfcommons/pegasus-instances, https://github.com/wfcommons/makeflow-instances
• Change Network Node Weight: Same as described in Section 6.3, except the weight is scaled
between the minimum and maximum node speeds observed in the real execution trace data.
• Change Network Edge Weight: This option is removed since network edge weights are
homogeneous and fixed to ensure the average CCR for the given application is a specific value.
• Change Task Weight: Same as described in Section 6.3, except the weight is scaled between the
minimum and maximum task runtime observed in the real execution trace data.
• Change Dependency Weight: Same as described in Section 6.3, except the weight is scaled
between the minimum and maximum task I/O size observed in the real execution trace data.
• Add/Remove Dependency: This option is removed so that the task graph structure remains
representative of the real application.
These adjustments to the Perturb implementation allow us to explore application-specific problem
instances more realistically.
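The application-specific Perturb can be sketched as a restricted version of the earlier perturbation function. This sketch makes two assumptions that the text does not spell out: the instance is a hypothetical dictionary of weight tables, and "scaled between the minimum and maximum" is read as perturbing the weight by U(−1/10, 1/10) times the observed range and clamping it to that range. That reading is equivalent to perturbing a normalized weight in [0, 1] as in Section 6.3 and then mapping it linearly onto [min, max]. Link weights and the dependency structure are never touched.

```python
import random

# Only node, task, and dependency weights change; link weights (homogeneous, fixed by the
# target CCR) and the task graph structure stay fixed.
APP_OPS = {"node": "node_w", "task": "task_w", "dep": "dep_w"}


def nudge_scaled(value, lo, hi, rng, step=0.1):
    """Shift by U(-step, step) * (hi - lo), then clamp to the observed range [lo, hi]."""
    return min(hi, max(lo, value + rng.uniform(-step, step) * (hi - lo)))


def perturb_app(inst, bounds, rng):
    """bounds maps 'node', 'task', 'dep' to (min, max) values observed in execution traces."""
    new = {k: dict(v) for k, v in inst.items()}  # copy each weight table
    kind = rng.choice(sorted(APP_OPS))
    table = new[APP_OPS[kind]]
    key = rng.choice(sorted(table))
    lo, hi = bounds[kind]
    table[key] = nudge_scaled(table[key], lo, hi, rng)
    return new


# Hypothetical bounds and a tiny fork-join instance, for illustration only.
bounds = {"node": (1.0, 4.0), "task": (5.0, 120.0), "dep": (1e3, 1e6)}
inst = {
    "node_w": {1: 1.0, 2: 2.5},
    "edge_w": {(1, 2): 1e5},
    "task_w": {"t0": 10.0, "t1": 60.0, "t2": 30.0},
    "dep_w": {("t0", "t1"): 2e4, ("t0", "t2"): 5e5},
}
rng = random.Random(7)
x = inst
for _ in range(500):
    x = perturb_app(x, bounds, rng)
```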
6.4.2 Results
In this section we present and discuss results on two of the workflows analyzed: srasearch and blast. Due
to space constraints, the results for the rest of the workflows can be found in the Appendix. Srasearch
(workflow structure depicted in Figure 6.6a) is a toolkit for interacting with data in the INSDC Sequence
Read Archives and blast (workflow structure depicted in Figure 6.6b) is a toolkit for finding regions of
similarity between biological sequences.
Observe in Figure 6.6 that while the number of tasks may vary, the structure of both workflows is very
rigid. Our new Perturb implementation guarantees that the search space contains only task
graphs with this structure. Figures 6.7 and 6.8 show the benchmarking and PISA results for
srasearch and blast, respectively. First, observe that the benchmarking approach suggests all algorithms
(except FastestNode) perform very well on the srasearch applications with a CCR of 1/5. Using PISA,
however, we are able to identify problem instances where WBA performs thousands of times worse than
FastestNode! We are also able to find instances where MinMin performs almost twice as badly as CPoP. Even
among the "good" algorithms, though, we see interesting behavior. Observe the results of HEFT and
MaxMin. We are able to find both a problem instance where HEFT performs approximately 20% worse
than MaxMin and an instance where MaxMin performs approximately 11% worse than HEFT.
[Figure 6.6: Example workflow structures for blast and srasearch scientific workflows. (a) Srasearch workflow structure (tasks t0, t1, ..., t4n+3). (b) Blast workflow structure (tasks t0, t1, ..., tn, tn+1, tn+2).]
The results for srasearch seem to imply that CPoP, which appears to perform consistently well for all
CCRs tested (the worst case found has a makespan ratio of 1.15), would be the scheduling algorithm of
choice for a WFMS designer. Workflow Management Systems do not support just one type of scientific
workflow, though, and CPoP's effectiveness for srasearch workflows may not extend to others. This is the case
for blast (see results in Figure 6.8), where CPoP performs generally poorly for all CCRs tested and in one
case yields a schedule with more than 1000 times the makespan of the schedule WBA produces!
[Figure 6.7: Benchmarking and PISA results for srasearch workflows with different average CCRs (panels: CCR = 0.2, 0.5, 1.0, 2.0, and 5.0; x-axis: Scheduler; y-axis: Base Scheduler, plus a Benchmarking row; schedulers: CPoP, FastestNode, HEFT, MaxMin, MinMin, WBA; color scale: Makespan Ratio). For each CCR, the top row shows benchmarking results, where gradients indicate makespan ratios on different problem instances in the dataset. All other cells indicate the makespan ratio of the highest-ratio problem instance discovered by PISA.]
[Figure 6.8: Benchmarking and PISA results for blast workflows with different average CCRs (panels: CCR = 0.2, 0.5, 1.0, 2.0, and 5.0; x-axis: Scheduler; y-axis: Base Scheduler, plus a Benchmarking row; schedulers: CPoP, FastestNode, HEFT, MaxMin, MinMin, WBA; color scale: Makespan Ratio). For each CCR, the top row shows benchmarking results, where gradients indicate makespan ratios on different problem instances in the dataset. All other cells indicate the makespan ratio of the highest-ratio problem instance discovered by PISA.]
These results are evidence that the traditional benchmarking approach to comparing algorithms is
insufficient even for highly regular, application-specific problem instances. They also have implications for
the design of Workflow Management Systems (and other task scheduling systems in IoT, edge computing
environments, etc.). It may be reasonable for a WFMS to run a set of scheduling algorithms that best covers
the different types of client scientific workflows and computing systems. For example, a WFMS designer
might run PISA and choose the three algorithms whose combination yields the minimum maximum makespan ratio.
Different methodologies for constructing and comparing such hybrid algorithms are an interesting topic for
future work.
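One way to make "the combination with the minimum maximum makespan ratio" concrete is sketched below; it is an illustration under stated assumptions, not a method from this chapter. Suppose a WFMS runs every scheduler in a portfolio and keeps the best schedule. On any instance, the best member is at least as good as each individual member, so against a given baseline the portfolio's ratio is at most the smallest member worst case (and at most 1 if the baseline itself is in the portfolio). This is a true bound only if the PISA values were exact worst cases. Taking the maximum over baselines and over several workflow/CCR matrices gives a score to minimize. The matrices and scheduler names are hypothetical.

```python
from itertools import combinations


def portfolio_score(matrices, portfolio):
    """Estimated worst-case ratio when all schedulers in portfolio run and the best schedule is kept.

    Each matrix maps base -> {sched: worst ratio of sched vs. base found by PISA},
    for example one matrix per workflow/CCR setting.
    """
    return max(
        min(1.0 if s == base else m[base][s] for s in portfolio)
        for m in matrices
        for base in m
    )


def best_portfolio(matrices, k):
    """Exhaustively pick the k schedulers with the smallest portfolio_score."""
    schedulers = sorted(matrices[0])
    return min(combinations(schedulers, k), key=lambda p: portfolio_score(matrices, p))


# Hypothetical matrices for two workflow/CCR settings (not values from Figures 6.7-6.13).
matrices = [
    {"X": {"Y": 3.0, "Z": 1.1}, "Y": {"X": 1.2, "Z": 1.5}, "Z": {"X": 1.4, "Y": 2.0}},
    {"X": {"Y": 1.1, "Z": 4.0}, "Y": {"X": 2.5, "Z": 1.3}, "Z": {"X": 1.2, "Y": 1.1}},
]
print(best_portfolio(matrices, 1), portfolio_score(matrices, ("X",)))      # ('X',) 2.5
print(best_portfolio(matrices, 2), portfolio_score(matrices, ("Y", "Z")))  # ('Y', 'Z') 1.1
```

With the six schedulers in these experiments, an exhaustive search over all C(6, 3) = 20 three-scheduler portfolios is trivially cheap.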
63
Cpop
FastestNode
Heft
MaxMin
MinMin
WBA
Scheduler
WBA
MinMin
MaxMin
Heft
FastestNode
Cpop
Benchmarking
Base Scheduler
1.00 1.91 1.00 1.00 1.01
1.00 2.01 1.00 1.00 1.71
1.00 2.25 1.01 1.02 1.67
1.00 3.58 1.00 1.07 1.72
1.00 1.00 1.00 1.00 1.00
1.67 1.01 1.00 1.03 1.56
1.00 2.03 1.00 1.00 1.01 1.02
bwa (CCR = 0.2)
1.00
1.50
2.00
2.50
3.00
3.50
4.00
4.50
> 5.0
Makespan Ratio
Cpop
FastestNode
Heft
MaxMin
MinMin
WBA
Scheduler
WBA
MinMin
MaxMin
Heft
FastestNode
Cpop
Benchmarking
Base Scheduler
1.00 1.55 1.00 1.00 1.01
> 5.0 2.33 1.00 1.00 1.54
1.00 1.98 1.02 1.03 2.21
1.00 1.74 > 5.0 1.02 1.55
1.00 1.00 1.00 1.00 1.03
4.09 1.02 1.00 1.12 1.76
1.00 2.10 1.00 1.00 1.01 1.02
bwa (CCR = 0.5)
1.00
1.50
2.00
2.50
3.00
3.50
4.00
4.50
> 5.0
Makespan Ratio
Cpop
FastestNode
Heft
MaxMin
MinMin
WBA
Scheduler
WBA
MinMin
MaxMin
Heft
FastestNode
Cpop
Benchmarking
Base Scheduler
1.00 2.12 1.00 1.00 1.01
1.00 3.52 1.00 1.00 1.65
1.00 > 5.0 1.01 1.03 1.69
1.00 1.60 1.16 1.02 1.60
1.00 1.00 1.00 1.00 1.00
1.57 1.04 1.00 1.02 1.60
1.00 2.16 1.00 1.00 1.01 1.02
bwa (CCR = 1.0)
1.00
1.50
2.00
2.50
3.00
3.50
4.00
4.50
> 5.0
Makespan Ratio
Cpop
FastestNode
Heft
MaxMin
MinMin
WBA
Scheduler
WBA
MinMin
MaxMin
Heft
FastestNode
Cpop
Benchmarking
Base Scheduler
1.00 3.50 1.00 1.00 1.19
1.00 4.31 1.00 1.00 1.57
1.00 1.90 1.01 1.09 1.81
1.00 1.81 1.00 1.05 1.83
1.00 1.00 1.00 1.00 1.00
4.47 1.01 1.00 1.02 1.61
1.00 2.11 1.00 1.00 1.01 1.02
bwa (CCR = 2.0)
1.00
1.50
2.00
2.50
3.00
3.50
4.00
4.50
Makespan Ratio
Cpop
FastestNode
Heft
MaxMin
MinMin
WBA
Scheduler
WBA
MinMin
MaxMin
Heft
FastestNode
Cpop
Benchmarking
Base Scheduler
1.00 2.24 1.00 1.00 1.01
1.00 2.86 1.00 1.00 1.40
1.00 2.28 1.02 1.04 1.56
1.00 2.65 1.00 1.03 4.92
1.00 1.00 1.00 1.00 1.13
1.74 1.02 1.00 1.05 1.78
1.00 2.23 1.00 1.00 1.01 1.02
bwa (CCR = 5.0)
1.00
1.50
2.00
2.50
3.00
3.50
4.00
4.50
> 5.0
Makespan Ratio
Figure 6.9: Results for the bwa scientific workflow.
64
Cpop
FastestNode
Heft
MaxMin
MinMin
WBA
Scheduler
WBA
MinMin
MaxMin
Heft
FastestNode
Cpop
Benchmarking
Base Scheduler
1.02 1.46 1.01 1.01 1.03
1.01 1.51 1.01 1.01 1.71
1.02 1.55 1.01 1.05 1.70
1.02 1.47 1.02 1.06 > 5.0
1.00 1.00 1.00 1.00 1.25
1.47 1.01 1.01 1.07 1.79
1.02 2.37 1.00 1.04 1.00 1.04
epigenomics (CCR = 0.2)
1.00
1.50
2.00
2.50
3.00
3.50
4.00
4.50
> 5.0
Makespan Ratio
Cpop
FastestNode
Heft
MaxMin
MinMin
WBA
Scheduler
WBA
MinMin
MaxMin
Heft
FastestNode
Cpop
Benchmarking
Base Scheduler
> 1000 1.50 1.01 1.01 1.03
1.00 1.32 1.01 1.00 1.74
1.01 1.54 1.01 1.08 1.87
1.02 1.59 1.01 1.06 1.79
1.00 1.00 1.00 1.00 1.15
1.52 1.01 1.01 1.06 > 5.0
1.02 2.29 1.00 1.04 1.01 1.05
epigenomics (CCR = 0.5)
1.00
1.50
2.00
2.50
3.00
3.50
4.00
4.50
> 5.0
Makespan Ratio
Cpop
FastestNode
Heft
MaxMin
MinMin
WBA
Scheduler
WBA
MinMin
MaxMin
Heft
FastestNode
Cpop
Benchmarking
Base Scheduler
1.01 1.45 1.01 1.01 1.03
> 5.0 1.44 1.01 1.00 1.66
1.02 1.48 1.01 1.05 2.03
1.02 1.59 1.07 1.04 1.80
1.00 1.00 1.00 1.00 1.26
1.46 1.01 1.02 1.05 > 5.0
1.02 2.01 1.01 1.06 1.01 1.06
epigenomics (CCR = 1.0)
1.00
1.50
2.00
2.50
3.00
3.50
4.00
4.50
> 5.0
Makespan Ratio
Cpop
FastestNode
Heft
MaxMin
MinMin
WBA
Scheduler
WBA
MinMin
MaxMin
Heft
FastestNode
Cpop
Benchmarking
Base Scheduler
1.02 1.45 1.01 1.02 1.03
1.02 1.41 1.00 1.04 1.75
1.02 1.53 1.01 1.07 1.96
1.03 1.48 1.02 1.07 1.75
1.00 1.00 1.00 1.00 1.00
1.41 1.01 1.06 1.07 1.86
1.01 1.60 1.02 1.06 1.02 1.06
epigenomics (CCR = 2.0)
1.00
1.25
1.50
1.75
2.00
2.25
2.50
2.75
3.00
Makespan Ratio
Cpop
FastestNode
Heft
MaxMin
MinMin
WBA
Scheduler
WBA
MinMin
MaxMin
Heft
FastestNode
Cpop
Benchmarking
Base Scheduler
1.01 1.43 1.01 1.02 1.04
> 5.0 1.42 1.01 1.00 1.74
1.03 1.35 1.01 1.06 1.96
1.10 1.53 1.02 1.10 1.78
1.00 1.00 1.05 > 1000 1.20
1.48 1.01 1.04 > 1000 1.66
1.02 2.45 1.00 1.04 1.01 1.06
epigenomics (CCR = 5.0)
1.00
1.50
2.00
2.50
3.00
3.50
4.00
4.50
> 5.0
Makespan Ratio
Figure 6.10: Results for the epigenomics scientific workflow.
65
Cpop
FastestNode
Heft
MaxMin
MinMin
WBA
Scheduler
WBA
MinMin
MaxMin
Heft
FastestNode
Cpop
Benchmarking
Base Scheduler
1.01 4.43 1.02 1.04 1.06
1.01 4.30 1.00 1.01 1.73
1.11 4.98 1.02 1.20 2.37
1.06 > 5.0 1.05 1.35 2.18
1.00 1.00 1.00 1.00 1.00
> 5.0 1.04 1.09 1.15 2.21
1.00 2.50 1.00 1.00 1.01 1.02
genome (CCR = 0.2)
1.00
1.50
2.00
2.50
3.00
3.50
4.00
4.50
> 5.0
Makespan Ratio
Cpop
FastestNode
Heft
MaxMin
MinMin
WBA
Scheduler
WBA
MinMin
MaxMin
Heft
FastestNode
Cpop
Benchmarking
Base Scheduler
1.00 4.14 1.01 1.01 1.04
1.00 4.36 1.00 1.01 > 1000
1.02 > 5.0 1.02 1.21 1.87
> 1000 > 5.0 1.06 1.25 > 1000
1.00 1.00 1.00 1.00 1.00
> 5.0 1.03 1.19 1.27 2.02
1.00 2.53 1.00 1.01 1.01 1.03
genome (CCR = 0.5)
1.00
1.50
2.00
2.50
3.00
3.50
4.00
4.50
> 5.0
Makespan Ratio
Cpop
FastestNode
Heft
MaxMin
MinMin
WBA
Scheduler
WBA
MinMin
MaxMin
Heft
FastestNode
Cpop
Benchmarking
Base Scheduler
1.00 4.14 1.00 1.04 1.06
1.04 4.77 1.00 1.01 1.69
1.06 > 5.0 1.03 1.18 2.07
1.03 > 5.0 1.08 1.28 2.23
1.00 1.00 1.00 1.00 1.00
> 5.0 1.04 1.17 1.21 2.11
1.01 2.48 1.00 1.02 1.01 1.02
genome (CCR = 1.0)
1.00
1.50
2.00
2.50
3.00
3.50
4.00
4.50
> 5.0
Makespan Ratio
Cpop
FastestNode
Heft
MaxMin
MinMin
WBA
Scheduler
WBA
MinMin
MaxMin
Heft
FastestNode
Cpop
Benchmarking
Base Scheduler
1.01 4.28 1.01 1.18 1.09
1.03 4.67 1.02 1.01 1.79
1.01 > 5.0 1.04 1.21 1.96
1.05 > 5.0 1.06 1.30 1.74
1.00 1.00 1.00 1.00 1.00
> 5.0 1.10 1.15 1.33 2.18
1.01 2.69 1.00 1.08 1.01 1.02
genome (CCR = 2.0)
1.00
1.50
2.00
2.50
3.00
3.50
4.00
4.50
> 5.0
Makespan Ratio
Cpop
FastestNode
Heft
MaxMin
MinMin
WBA
Scheduler
WBA
MinMin
MaxMin
Heft
FastestNode
Cpop
Benchmarking
Base Scheduler
1.01 > 5.0 1.01 1.06 1.06
1.01 4.85 1.02 1.41 1.70
1.01 > 5.0 1.02 1.24 2.06
1.48 > 5.0 1.38 1.19 > 1000
1.00 1.00 1.00 1.00 > 1000
> 5.0 1.10 1.26 1.21 > 1000
1.00 2.40 1.00 1.00 1.01 1.02
genome (CCR = 5.0)
1.00
1.50
2.00
2.50
3.00
3.50
4.00
4.50
> 5.0
Makespan Ratio
Figure 6.11: Results for the 1000genome scientific workflow.
66
Cpop
FastestNode
Heft
MaxMin
MinMin
WBA
Scheduler
WBA
MinMin
MaxMin
Heft
FastestNode
Cpop
Benchmarking
Base Scheduler
1.63 > 5.0 1.01 1.06 1.01
1.60 > 5.0 1.03 1.08 1.96
1.52 > 5.0 1.00 1.00 1.54
1.90 4.88 1.11 1.06 1.72
1.00 1.00 1.00 1.00 1.00
4.51 1.00 1.06 1.00 1.32
1.59 2.57 1.00 1.01 1.00 1.00
montage (CCR = 0.2)
1.00
1.50
2.00
2.50
3.00
3.50
4.00
4.50
> 5.0
Makespan Ratio
Cpop
FastestNode
Heft
MaxMin
MinMin
WBA
Scheduler
WBA
MinMin
MaxMin
Heft
FastestNode
Cpop
Benchmarking
Base Scheduler
1.56 > 5.0 1.00 1.06 1.01
1.75 > 5.0 1.00 1.08 2.32
1.55 > 5.0 1.00 1.00 1.69
1.68 > 5.0 1.10 1.06 1.74
1.00 1.00 1.00 1.00 1.00
4.14 1.00 1.00 1.00 1.45
1.61 2.50 1.00 1.01 1.00 1.00
montage (CCR = 0.5)
1.00
1.50
2.00
2.50
3.00
3.50
4.00
4.50
> 5.0
Makespan Ratio
Cpop
FastestNode
Heft
MaxMin
MinMin
WBA
Scheduler
WBA
MinMin
MaxMin
Heft
FastestNode
Cpop
Benchmarking
Base Scheduler
1.70 > 5.0 1.08 1.06 1.00
1.55 4.79 1.02 1.08 1.89
1.58 > 5.0 1.02 1.00 1.81
1.79 > 5.0 1.11 1.06 1.87
1.00 1.00 1.00 1.00 1.00
4.17 1.00 1.00 1.02 1.47
1.49 2.51 1.00 1.01 1.00 1.01
montage (CCR = 1.0)
1.00
1.50
2.00
2.50
3.00
3.50
4.00
4.50
> 5.0
Makespan Ratio
Cpop
FastestNode
Heft
MaxMin
MinMin
WBA
Scheduler
WBA
MinMin
MaxMin
Heft
FastestNode
Cpop
Benchmarking
Base Scheduler
1.56 4.62 1.00 1.06 1.00
1.63 > 5.0 1.15 1.09 1.66
1.59 > 5.0 1.12 1.00 1.55
1.60 > 5.0 1.10 1.07 1.80
1.00 1.00 1.00 1.00 1.00
4.03 1.00 1.03 1.00 1.29
1.38 2.54 1.00 1.01 1.00 1.01
montage (CCR = 2.0)
1.00
1.50
2.00
2.50
3.00
3.50
4.00
4.50
> 5.0
Makespan Ratio
Cpop
FastestNode
Heft
MaxMin
MinMin
WBA
Scheduler
WBA
MinMin
MaxMin
Heft
FastestNode
Cpop
Benchmarking
Base Scheduler
1.68 4.42 1.00 1.05 1.01
1.64 > 5.0 1.01 1.08 2.13
1.60 > 5.0 1.10 1.00 1.66
1.73 > 5.0 1.12 1.07 1.68
1.00 1.00 1.00 1.00 1.00
3.83 1.00 1.05 1.00 1.23
1.63 2.59 1.00 1.01 1.00 1.00
montage (CCR = 5.0)
1.00
1.50
2.00
2.50
3.00
3.50
4.00
4.50
> 5.0
Makespan Ratio
Figure 6.12: Results for the montage scientific workflow.
[Heatmap panels: seismology (CCR = 0.2), (CCR = 0.5), (CCR = 1.0), (CCR = 2.0), and (CCR = 5.0); x-axis: Scheduler (Cpop, FastestNode, Heft, MaxMin, MinMin, WBA); y-axis: Base Scheduler (Benchmarking, Cpop, FastestNode, Heft, MaxMin, MinMin, WBA); color scale: Makespan Ratio, 1.00 to > 5.0]
Figure 6.13: Results for the seismology scientific workflow.
[Heatmap panels: soykb (CCR = 0.2), (CCR = 0.5), (CCR = 1.0), (CCR = 2.0), and (CCR = 5.0); x-axis: Scheduler (Cpop, FastestNode, Heft, MaxMin, MinMin, WBA); y-axis: Base Scheduler (Benchmarking, Cpop, FastestNode, Heft, MaxMin, MinMin, WBA); color scale: Makespan Ratio, 1.00 to > 5.0]
Figure 6.14: Results for the soykb scientific workflow.
Chapter 7
Comparing Task Scheduling Algorithm Components∗
Scheduling distributed applications modeled as directed, acyclic task graphs to run on heterogeneous
compute networks is a fundamental (NP-Hard) problem in distributed computing for which many heuristic
algorithms have been proposed over the past decades. Many of these algorithms fall under the
list-scheduling paradigm, whereby the algorithm first computes priorities for the tasks and then schedules
them greedily to the compute node that minimizes some cost function. Thus, many algorithms differ from
each other only in a few key components (e.g., the way they prioritize tasks, their cost functions, where the
algorithms consider inserting tasks into a partially complete schedule, etc.). We propose a generalized
list-scheduling algorithm that allows mixing and matching different task prioritization and greedy node
selection schemes to produce 72 unique algorithms. We benchmark these algorithms on 20 datasets (four task graph types, each at five communication-to-computation ratios) to
study the individual effect of different algorithmic components on performance and runtime.
One of the most popular paradigms for heuristic scheduling algorithms is the list-scheduling paradigm.
Essentially, list-scheduling algorithms involve the following steps:
1. Compute a priority for each task such that every task has a higher priority than its dependents.
2. Greedily schedule tasks in order of their computed priorities (from highest to lowest) to run on the
node that minimizes or maximizes some predefined cost function.
∗The work presented in this chapter has been previously published and appears in [71].
Thus, many list scheduling algorithms differ only in a few algorithmic components (e.g., their prioritization
functions, their cost functions, where the algorithms consider inserting tasks into an existing schedule,
etc.). We propose a general parametric scheduler (extending SAGA [11], an open-source Python library for
comparing task scheduling algorithms) that allows us to mix and match different algorithmic components
and evaluate how they individually contribute to a list-scheduling algorithm’s performance and runtime.
Interestingly, we find that many new algorithms (composed of previously unstudied combinations of
algorithmic components) are Pareto-optimal with respect to performance and runtime. We also report the
average effects that both individual components and combinations of different components have on
performance and runtime, presenting evidence that the way algorithmic components interact with each
other is problem-dependent (i.e., depends on the task graph structure, whether or not the application is
communication or computation heavy, etc.).
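The two-step list-scheduling paradigm described above can be illustrated with a minimal Python sketch. It assumes the priorities are already computed (and already rank every task above its dependents) and that the caller supplies a cost function; `list_schedule`, `priority`, and `cost` are hypothetical names for illustration, not SAGA's API, and the sketch tracks only node assignments, not start/end times or data transfers.

```python
from typing import Callable, Dict, Hashable, List

Task = Hashable
Node = Hashable


def list_schedule(
    tasks: List[Task],
    nodes: List[Node],
    priority: Dict[Task, float],
    cost: Callable[[Task, Node, Dict[Task, Node]], float],
) -> Dict[Task, Node]:
    """Two-step list scheduling (hypothetical helper, not SAGA's API).

    Step 1: visit tasks from highest to lowest priority; the priorities are
    assumed to already rank every task above its dependents.
    Step 2: greedily assign each task to the node that minimizes `cost`,
    given the assignments made so far.
    """
    assignment: Dict[Task, Node] = {}
    for task in sorted(tasks, key=lambda t: priority[t], reverse=True):
        assignment[task] = min(nodes, key=lambda v: cost(task, v, assignment))
    return assignment


if __name__ == "__main__":
    # Toy cost: work already assigned to the node plus the task's own size.
    sizes = {"a": 3.0, "b": 1.0, "c": 2.0}
    prio = {"a": 3, "b": 2, "c": 1}

    def load_cost(task, node, assignment):
        load = sum(sizes[t] for t, v in assignment.items() if v == node)
        return load + sizes[task]

    print(list_schedule(list(sizes), ["n1", "n2"], prio, load_cost))
    # {'a': 'n1', 'b': 'n2', 'c': 'n2'}
```

Swapping the priority dictionary or the cost function changes the resulting algorithm without touching the loop itself, which is the observation that motivates mixing and matching components.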
The main contributions presented in this chapter are:
1. An open-source generalized parametric list scheduling algorithm that allows mixing and matching of
different algorithmic components.
2. Benchmarking results for 72 algorithms produced by combining five different algorithmic
components on 20 publicly available datasets.
3. Results on both individual and combined effects of different algorithmic components on average
performance and runtime across all datasets.
4. Results on both individual and combined effects of different algorithmic components on performance
and runtime for each individual dataset.
The rest of this chapter is organized as follows. In Section 7.1 we present our methodology for comparing
different algorithmic components and the generalized parametric list scheduling algorithm for doing so. In
Section 7.2 we report benchmarking results for 72 algorithms created by combining five different types of
algorithmic components and study the effects that individual components have on performance and
runtime.
7.1 A Generalized List-Scheduling Algorithm
To study the effects of individual algorithmic components in list-scheduling algorithms, we extended
SAGA [11] (a Python library for comparing task graph scheduling algorithms) with a generalized
parametric scheduling algorithm that allows users to specify the following algorithmic components:
• Priority Function: Function used to determine the order in which to schedule tasks.
• Comparison Function: Function used to determine which node to schedule a task to.
• Insertion-based vs. append-only scheduling: Insertion-based algorithms insert tasks into the earliest
sufficiently large window (on a particular node) in an existing schedule. Append-only algorithms
only consider scheduling tasks after the last currently scheduled task finishes on a particular node.
• Critical path reservation vs. no reservation: Whether or not the algorithm commits to scheduling all
tasks on the critical path (the longest chain of tasks in the task graph, with respect to node/edge weights)
on the fastest compute node.†
• Sufferage vs. no-sufferage consideration: Sufferage schedulers [21] consider the two
highest-priority tasks in each iteration, choosing to schedule the one for which scheduling on its
second-best node would cause the greatest detriment (with respect to the comparison function).
†This is consistent with the original definition [10] of critical path and critical path reservation for the task scheduling problem
we study in this thesis (fully heterogeneous, related machines model) but may not be for other task scheduling variants (e.g., for
the unrelated machines model).
The algorithm first uses the configured priority function to determine the order in which to schedule
tasks. The priority function takes the problem instance (network/task graph pair) as input and returns a
sequence of tasks. In each iteration, the algorithm uses the comparison function, together with the choice
of insertion-based or append-only scheduling, to decide which node to schedule a task on. In
insertion-based scheduling, the scheduler finds, for each node, the earliest window of time during which
the node is idle and which is large enough for the task to execute (taking into account communication
delays for dependency data transfer). In the append-only scheme, the scheduling algorithm only considers
scheduling tasks after the last currently scheduled task on a node finishes executing. Once the algorithm
computes the candidate start/end times for the task on each candidate node, it uses the comparison
function to choose which node to select. The comparison function may consider the candidate task’s start
time, end time, or other information to make this decision. If the algorithm is a sufferage scheduler, then
the comparison function is used to identify the best and second-best nodes for each of the two
highest-priority tasks, and then used again to quantify each task’s sufferage: the difference in quality
between scheduling it on its second-best and its best node. The algorithm then schedules the task with
the highest sufferage value (the idea being that not scheduling this task on its preferred node would be
more detrimental) and returns the other task to the queue.
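A minimal sketch of the sufferage selection step follows, assuming each of the two highest-priority tasks already has a score per node (lower is better, as with finish times under EFT). The names `best_two` and `sufferage_pick` are hypothetical and are not taken from SAGA.

```python
from typing import Dict, Hashable, List, Tuple

Task = Hashable
Node = Hashable


def best_two(scores: Dict[Node, float]) -> Tuple[Node, Node]:
    """Best and second-best nodes for one task (lower score is better)."""
    ranked = sorted(scores, key=scores.get)
    return ranked[0], (ranked[1] if len(ranked) > 1 else ranked[0])


def sufferage_pick(
    candidates: List[Task], scores: Dict[Task, Dict[Node, float]]
) -> Tuple[Task, Node]:
    """Pick which of the two highest-priority tasks to schedule now.

    A task's sufferage is its score on its second-best node minus its score
    on its best node. The task with the larger sufferage is scheduled on its
    best node; ties go to the higher-priority (earlier) task, and the other
    task stays in the queue.
    """
    chosen, chosen_node, top = None, None, float("-inf")
    for task in candidates[:2]:
        best, second = best_two(scores[task])
        suff = scores[task][second] - scores[task][best]
        if suff > top:
            chosen, chosen_node, top = task, best, suff
    return chosen, chosen_node


if __name__ == "__main__":
    scores = {"t1": {"n1": 4.0, "n2": 5.0}, "t2": {"n1": 3.0, "n2": 9.0}}
    # t1 loses 1.0 if bumped from n1, t2 loses 6.0, so t2 goes first.
    print(sufferage_pick(["t1", "t2"], scores))  # ('t2', 'n1')
```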
The priority and comparison functions are very general algorithmic components for which there are
many possible implementations. We consider the following implementations of each algorithmic
component:
• Priority Function (initial_priority)
– UpwardRanking: the priority function used in the HEFT scheduling algorithm [10]
– CPoPRanking: the priority function used in the CPoP scheduling algorithm [10]
– ArbitraryTopological: an arbitrary topological sort of the task graph
• Comparison Function (compare)
– EFT (Earliest Finish Time): schedules tasks to the node on which the task can finish the soonest
– EST (Earliest Start Time): schedules tasks to the node on which the task can start the soonest
– Quickest: schedules tasks to the node on which the task executes in the least amount of time
• Insertion-based vs. append-only scheduling (append_only = True or False)
• Critical path reservation vs. no reservation (critical_path = True or False)
• Sufferage vs. no-sufferage consideration (sufferage = True or False)
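Because the five components are independent choices, the design space has 3 × 3 × 2 × 2 × 2 = 72 configurations. The Python sketch below enumerates them; `SchedulerConfig`, `label`, and `all_configs` are hypothetical names, not SAGA's classes, and the label format simply mirrors the naming pattern used in Table 7.1 (e.g., EFT_Ins_UR_Suf), with Quickest spelled out in full as an assumption.

```python
from dataclasses import dataclass
from itertools import product
from typing import List

# Priority-function names and the abbreviations used in Table 7.1's labels.
PRIORITIES = {"UpwardRanking": "UR", "CPoPRanking": "CR", "ArbitraryTopological": "AT"}
COMPARES = ["EFT", "EST", "Quickest"]


@dataclass(frozen=True)
class SchedulerConfig:
    """One point in the design space (hypothetical container, not SAGA's class)."""

    initial_priority: str
    compare: str
    append_only: bool
    critical_path: bool
    sufferage: bool

    def label(self) -> str:
        """Label in the style of Table 7.1, e.g. EFT_Ins_UR_Suf."""
        parts = [self.compare, "App" if self.append_only else "Ins"]
        if self.critical_path:
            parts.append("CP")
        parts.append(PRIORITIES[self.initial_priority])
        if self.sufferage:
            parts.append("Suf")
        return "_".join(parts)


def all_configs() -> List[SchedulerConfig]:
    """Every combination of the five components: 3 * 3 * 2 * 2 * 2 = 72."""
    flags = [False, True]
    return [
        SchedulerConfig(prio, comp, app, cp, suf)
        for prio, comp, app, cp, suf in product(PRIORITIES, COMPARES, flags, flags, flags)
    ]


if __name__ == "__main__":
    configs = all_configs()
    print(len(configs))  # 72
    print(SchedulerConfig("UpwardRanking", "EFT", False, False, True).label())  # EFT_Ins_UR_Suf
```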
Algorithms 3, 4, and 5 implement the EFT, EST, and Quickest comparison functions, respectively. Algorithms 6
and 7 implement the window-finding logic for append-only and insertion-based scheduling, respectively.
Implementations of the priority functions are not included; details for the UpwardRanking and CPoPRanking
priority functions can be found in the paper that originally proposed them [10]. Finally, Algorithm 8 is the
full parametric scheduling algorithm and uses the functions defined in Algorithms 3-7.
Algorithm 3 Earliest Finish Time Compare Algorithm
1: function EFT((start, end), (start′, end′))
   inputs: (start, end) ∈ ℝ² ▷ start/end time of first candidate task
           (start′, end′) ∈ ℝ² ▷ start/end time of second candidate task
2:   return end − end′
3: end function
Algorithm 4 Earliest Start Time Compare Algorithm
1: function EST((start, end), (start′, end′))
   inputs: (start, end) ∈ ℝ² ▷ start/end time of first candidate task
           (start′, end′) ∈ ℝ² ▷ start/end time of second candidate task
2:   return start − start′
3: end function
Algorithm 5 Quickest Execution Time Compare Algorithm
1: function Quickest((start, end), (start′, end′))
   inputs: (start, end) ∈ ℝ² ▷ start/end time of first candidate task
           (start′, end′) ∈ ℝ² ▷ start/end time of second candidate task
2:   return (end − start) − (end′ − start′)
3: end function
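Algorithms 3-5 translate almost directly into Python. In this sketch a candidate is a `(start, end)` tuple, and a negative return value means the first candidate is preferred, matching the `Compare(...) < 0` tests in Algorithm 8. The helper `pick_best` is a hypothetical addition, not part of SAGA, that uses the standard-library `functools.cmp_to_key` to turn a comparison function into a node choice.

```python
from functools import cmp_to_key
from typing import Callable, Dict, Hashable, Tuple

Window = Tuple[float, float]  # (start, end) of a candidate placement
Compare = Callable[[Window, Window], float]


def eft(a: Window, b: Window) -> float:
    """Algorithm 3: negative if `a` finishes earlier than `b`."""
    return a[1] - b[1]


def est(a: Window, b: Window) -> float:
    """Algorithm 4: negative if `a` starts earlier than `b`."""
    return a[0] - b[0]


def quickest(a: Window, b: Window) -> float:
    """Algorithm 5: negative if `a` runs for less time (end - start) than `b`."""
    return (a[1] - a[0]) - (b[1] - b[0])


def pick_best(windows: Dict[Hashable, Window], compare: Compare) -> Hashable:
    """Node whose candidate window compares lowest (hypothetical helper)."""
    return min(windows, key=cmp_to_key(lambda u, v: compare(windows[u], windows[v])))


if __name__ == "__main__":
    windows = {"n1": (0.0, 5.0), "n2": (1.0, 3.0), "n3": (2.0, 3.5)}
    print(pick_best(windows, eft), pick_best(windows, est), pick_best(windows, quickest))
    # n2 n1 n3
```

The same three candidate windows yield three different choices, which is why the comparison function alone can change a scheduler's behavior noticeably.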
The combinations of these algorithmic components yield 72 unique scheduling algorithms (3 priority
functions × 3 comparison functions × 2 × 2 × 2 binary options). We evaluated each of these on 20 datasets
built from four task graph structures (chains, in_trees, out_trees, and cycles), each at five
communication-to-computation ratios, or CCRs (1/5, 1/2, 1, 2, and 5). CCR is commonly used to
characterize problem instances in the literature [13], albeit with varying definitions. Intuitively, the CCR
captures how communication- or computation-heavy a task graph is. We define the CCR for a particular
problem instance (N, G) (where N = (V, E) and G = (T, D)) as the average communication time over the
average computation time:

\[
\mathrm{CCR}(N, G) = \frac{\left(\frac{1}{|D|}\sum_{(t,t') \in D} c(t,t')\right) \Big/ \left(\frac{1}{|E|}\sum_{(v,v') \in E} s(v,v')\right)}{\left(\frac{1}{|T|}\sum_{t \in T} c(t)\right) \Big/ \left(\frac{1}{|V|}\sum_{v \in V} s(v)\right)}
\]

Algorithm 6 Append-only available window finding algorithm
1: function GetOpenWindowAppendOnly(u, t)
2:   Let est be the finish time of the last task scheduled on node u (0 if no tasks are scheduled on u)
3:   Let dat be the minimum time at which all data from the dependencies of task t can be available on node u
4:   start ← max{est, dat}
5:   end ← start + c(t)/s(u)
6:   return (start, end)
7: end function

Algorithm 7 Insertion-based available window finding algorithm
1: function GetOpenWindowInsertionBased(u, t)
2:   T ← list of tasks scheduled on node u in order of start time
3:   if T is empty then return GetOpenWindowAppendOnly(u, t)
4:   end if
5:   Let dat be the minimum time at which all data from the dependencies of task t can be available on node u
6:   for each task t′ in T do
7:     Let est be the time that task t′ finishes
8:     start ← max{est, dat}
9:     end ← start + c(t)/s(u)
10:    if end ≤ start time of the next task in T or t′ is the last task in T then return (start, end)
11:  end for
12: end function
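Algorithms 6 and 7 can be sketched in Python under a hypothetical data model (not SAGA's): each node's schedule is a list of `(task, start, end)` entries sorted by start time, the caller passes the execution time c(t)/s(u), and a dependency's data arrives c(t′, t)/s(v, u) time units after its producer finishes when producer and consumer run on different nodes (instantly otherwise). The helper `data_available_time` is an assumed way of computing the `dat` value both algorithms rely on.

```python
from typing import Hashable, List, Tuple

Entry = Tuple[Hashable, float, float]  # (task, start, end) on one node
Window = Tuple[float, float]


def data_available_time(task, node, preds, placed, data_size, link_speed) -> float:
    """Earliest time all of `task`'s input data can be on `node` (the `dat` value).

    preds[task]: predecessor tasks; placed[p]: (node, end_time) of a scheduled
    predecessor; data_size[(p, task)]: c(p, task); link_speed[(u, v)]: s(u, v).
    """
    dat = 0.0
    for p in preds.get(task, []):
        p_node, p_end = placed[p]
        # Data produced on the same node is assumed to be available immediately.
        delay = 0.0 if p_node == node else data_size[(p, task)] / link_speed[(p_node, node)]
        dat = max(dat, p_end + delay)
    return dat


def window_append_only(entries: List[Entry], exec_time: float, dat: float) -> Window:
    """Algorithm 6: start after the node's last task and after data arrives."""
    est = entries[-1][2] if entries else 0.0
    start = max(est, dat)
    return start, start + exec_time


def window_insertion(entries: List[Entry], exec_time: float, dat: float) -> Window:
    """Algorithm 7: first gap after a scheduled task that fits the task."""
    if not entries:
        return window_append_only(entries, exec_time, dat)
    for i, (_, _, finish) in enumerate(entries):
        start = max(finish, dat)
        end = start + exec_time
        is_last = i == len(entries) - 1
        if is_last or end <= entries[i + 1][1]:
            return start, end
    raise AssertionError("unreachable: the last entry always returns")


if __name__ == "__main__":
    entries = [("a", 0.0, 2.0), ("b", 5.0, 7.0)]  # node is idle during [2, 5)
    print(window_append_only(entries, 2.0, 1.0))  # (7.0, 9.0)
    print(window_insertion(entries, 2.0, 1.0))    # (2.0, 4.0): fits in the gap
    print(window_insertion(entries, 4.0, 1.0))    # (7.0, 11.0): gap too small
```

Like Algorithm 7, `window_insertion` only checks gaps that begin after an already-scheduled task, so idle time before a node's first task is not considered.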
Each of the in_trees, out_trees, and chains datasets consists of 100 randomly generated network/task graph
pairs following a common methodology used in the literature [66]. In-trees and out-trees are generated
with between 2 and 4 levels (chosen uniformly at random), a branching factor of either 2 or 3 (chosen
uniformly at random), and node/edge weights drawn from a clipped Gaussian distribution (mean: 1,
standard deviation: 1/3, min: 0, max: 2). Parallel chains task graphs are generated with between 2 and 5
parallel chains (chosen uniformly at random) of length between 2 and 5 (chosen uniformly at random) and
node/edge weights drawn from the same clipped Gaussian distribution. Randomly weighted networks are
complete graphs with between 3 and 5 nodes (chosen uniformly at random) and node/edge weights drawn
from the same clipped Gaussian distribution. Finally, the network communication strengths are scaled to
achieve the desired CCR of either 1/5, 1/2, 1, 2, or 5. The cycles dataset is based on the cycles scientific
workflow [58] and represents a multi-crop, multi-year agro-ecosystem model for simulating crop
production. For each workflow, the runtime of each task, input/output sizes in bytes, and speedup factor
(compute speed) for each machine are available from public execution trace information‡. We set
communication strengths to be homogeneous so that the average CCR is 1/5, 1/2, 1, 2, or 5 (resulting in
five different datasets). Figure 7.1 shows an example task graph from each of the in_trees, out_trees,
chains, and cycles datasets.

Algorithm 8 Generalized Parametric Scheduling Algorithm
parameters: GetPriority ∈ {UpwardRanking, CPoPRanking, ArbitraryTopological}
            Compare ∈ {EFT, EST, Quickest}
            append_only ∈ {0, 1}
            sufferage ∈ {0, 1}
            critical_path ∈ {0, 1}
1: Initialize empty schedule S
2: Compute priority for each task in G using GetPriority
3: Sort tasks in descending order of priority
4: if append_only then
5:   Let GetWindow ← GetOpenWindowAppendOnly
6: else
7:   Let GetWindow ← GetOpenWindowInsertionBased
8: end if
9: while unscheduled tasks exist do
10:   initialize best and second_best to arbitrary nodes
11:   Let t be the unscheduled task with highest priority
12:   for u ∈ V do
13:     if Compare(GetWindow(u, t), GetWindow(best, t)) < 0 then
14:       second_best ← best
15:       best ← u
16:     else if Compare(GetWindow(u, t), GetWindow(second_best, t)) < 0 then
17:       second_best ← u
18:     end if
19:   end for
20:   if sufferage and there are at least two unscheduled tasks then
21:     initialize best′ and second_best′ to arbitrary nodes
22:     Let t′ be the unscheduled task with second-highest priority
23:     for u ∈ V do
24:       if Compare(GetWindow(u, t′), GetWindow(best′, t′)) < 0 then
25:         second_best′ ← best′
26:         best′ ← u
27:       else if Compare(GetWindow(u, t′), GetWindow(second_best′, t′)) < 0 then
28:         second_best′ ← u
29:       end if
30:     end for
31:     sufferage_t ← Compare(GetWindow(second_best, t), GetWindow(best, t))
32:     sufferage_t′ ← Compare(GetWindow(second_best′, t′), GetWindow(best′, t′))
33:     if sufferage_t′ > sufferage_t then
34:       t, best ← t′, best′
35:     end if
36:   end if
37:   (start, end) ← GetWindow(best, t); add (t, start, end, best) to S
38: end while
39: return S
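Putting the pieces together, the sketch below is a compact, self-contained Python reading of Algorithm 8. It assumes the ArbitraryTopological priority (via the standard-library `graphlib.TopologicalSorter`, Python 3.9+), omits critical path reservation (which Algorithm 8 does not spell out either), and, as an added assumption, only lets the sufferage step consider a second task whose predecessors are already scheduled so that its data-arrival time is defined. `parametric_schedule` and its parameters are hypothetical names, not SAGA's API.

```python
from functools import cmp_to_key
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Tuple

Window = Tuple[float, float]
Placement = Tuple[str, float, float]  # (node, start, end)


def parametric_schedule(
    task_cost: Dict[str, float],               # c(t)
    deps: Dict[str, List[str]],                # task -> its predecessors
    data_size: Dict[Tuple[str, str], float],   # c(t', t) for each dependency
    node_speed: Dict[str, float],              # s(v)
    link_speed: Dict[Tuple[str, str], float],  # s(u, v), both directions
    compare: Callable[[Window, Window], float],
    append_only: bool = False,
    sufferage: bool = False,
) -> Dict[str, Placement]:
    preds = {t: list(deps.get(t, [])) for t in task_cost}
    # ArbitraryTopological priority: earlier in the order = higher priority.
    queue = list(TopologicalSorter(preds).static_order())
    on_node: Dict[str, List[Tuple[str, float, float]]] = {v: [] for v in node_speed}
    placed: Dict[str, Placement] = {}

    def window(node: str, task: str) -> Window:
        dat = 0.0  # time at which all input data can be on `node`
        for p in preds[task]:
            p_node, _, p_end = placed[p]
            if p_node != node:
                p_end += data_size[(p, task)] / link_speed[(p_node, node)]
            dat = max(dat, p_end)
        duration = task_cost[task] / node_speed[node]
        entries = on_node[node]  # (task, start, end), sorted by start
        if append_only or not entries:  # Algorithm 6
            start = max(entries[-1][2] if entries else 0.0, dat)
            return start, start + duration
        for i, (_, _, finish) in enumerate(entries):  # Algorithm 7
            start = max(finish, dat)
            if i == len(entries) - 1 or start + duration <= entries[i + 1][1]:
                return start, start + duration
        raise AssertionError("unreachable")

    def best_two(task: str) -> Tuple[str, str]:
        ranked = sorted(node_speed, key=cmp_to_key(
            lambda u, v: compare(window(u, task), window(v, task))))
        return ranked[0], (ranked[1] if len(ranked) > 1 else ranked[0])

    while queue:
        # The highest-priority unscheduled task is always ready; with sufferage,
        # the next *ready* task is the second candidate (assumption, see above).
        ready = [t for t in queue if all(p in placed for p in preds[t])]
        candidates = ready[:2] if sufferage else ready[:1]
        chosen, chosen_node, top_suff = None, None, float("-inf")
        for task in candidates:
            best, second = best_two(task)
            suff = compare(window(second, task), window(best, task))
            if suff > top_suff:  # ties keep the higher-priority task
                chosen, chosen_node, top_suff = task, best, suff
        start, end = window(chosen_node, chosen)
        on_node[chosen_node].append((chosen, start, end))
        on_node[chosen_node].sort(key=lambda e: e[1])
        placed[chosen] = (chosen_node, start, end)
        queue.remove(chosen)
    return placed


if __name__ == "__main__":
    def eft(a: Window, b: Window) -> float:  # Algorithm 3
        return a[1] - b[1]

    print(parametric_schedule(
        task_cost={"a": 2.0, "b": 4.0, "c": 2.0},
        deps={"b": ["a"], "c": ["a"]},
        data_size={("a", "b"): 1.0, ("a", "c"): 0.5},
        node_speed={"n1": 1.0, "n2": 2.0},
        link_speed={("n1", "n2"): 1.0, ("n2", "n1"): 1.0},
        compare=eft,
    ))
    # {'a': ('n2', 0.0, 1.0), 'b': ('n2', 1.0, 3.0), 'c': ('n1', 1.5, 3.5)}
```

Passing `est` or `quickest` from the earlier sketch as `compare`, or toggling `append_only` and `sufferage`, selects different points in the 72-algorithm design space without changing the loop.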
‡ https://github.com/wfcommons/pegasus-instances, https://github.com/wfcommons/makeflow-instances
[Four example task graphs: (a) Example task graph structure for the in-trees datasets. (b) Example task graph structure for the out-trees datasets. (c) Example task graph structure for the chains datasets. (d) Example task graph structure for the cycles datasets.]
Figure 7.1: Example task graphs for each of the datasets evaluated.
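The random in_trees/out_trees generation and CCR scaling described in Section 7.1 can be sketched as follows. Everything here is an assumption for illustration: the function names, the string task IDs, storing one strength per unordered node pair, and drawing the level count and branching factor once per tree; the chains and cycles datasets are omitted. Because every communication strength is multiplied by the same factor, the CCR defined in Section 7.1 scales inversely with that factor, so a single rescaling hits the target exactly (up to floating-point rounding).

```python
import random
from statistics import mean


def clipped_gauss(rng: random.Random) -> float:
    """Weight from a Gaussian (mean 1, std 1/3) clipped to [0, 2]."""
    return min(2.0, max(0.0, rng.gauss(1.0, 1 / 3)))


def random_tree(rng: random.Random, out_tree: bool):
    """Random in-tree or out-tree: 2-4 levels, branching factor 2 or 3."""
    levels, branching = rng.randint(2, 4), rng.choice([2, 3])
    tasks, edges, frontier = ["t0"], [], ["t0"]
    for _ in range(levels - 1):
        next_frontier = []
        for parent in frontier:
            for _ in range(branching):
                child = f"t{len(tasks)}"
                tasks.append(child)
                next_frontier.append(child)
                # Out-trees send data from parent to child; in-trees the reverse.
                edges.append((parent, child) if out_tree else (child, parent))
        frontier = next_frontier
    task_cost = {t: clipped_gauss(rng) for t in tasks}
    data_size = {e: clipped_gauss(rng) for e in edges}
    return task_cost, data_size


def random_network(rng: random.Random):
    """Complete network with 3-5 nodes; one strength per unordered pair."""
    nodes = [f"v{i}" for i in range(rng.randint(3, 5))]
    node_speed = {v: clipped_gauss(rng) for v in nodes}
    link_speed = {(a, b): clipped_gauss(rng) for i, a in enumerate(nodes) for b in nodes[i + 1:]}
    return node_speed, link_speed


def ccr(task_cost, data_size, node_speed, link_speed) -> float:
    """Average communication time over average computation time."""
    comm = mean(data_size.values()) / mean(link_speed.values())
    comp = mean(task_cost.values()) / mean(node_speed.values())
    return comm / comp


def scale_to_ccr(task_cost, data_size, node_speed, link_speed, target):
    """Multiply every link strength by one factor so that the CCR equals `target`."""
    factor = ccr(task_cost, data_size, node_speed, link_speed) / target
    return {e: s * factor for e, s in link_speed.items()}


if __name__ == "__main__":
    rng = random.Random(0)
    costs, sizes = random_tree(rng, out_tree=False)
    speeds, links = random_network(rng)
    links = scale_to_ccr(costs, sizes, speeds, links, target=5.0)
    print(len(costs), len(speeds), ccr(costs, sizes, speeds, links))
```

A weight clipped to exactly 0 would need special handling before scheduling (it would make c(t)/s(v) undefined for a zero-speed node); the sketch only uses the weights in averages.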
7.2 Results
In this section we discuss the results of running each of the 72 schedulers on 20 datasets (4 dataset types
and 5 CCRs). Each dataset consists of 100 problem instances. All schedulers that are Pareto-optimal for at
least one of the evaluated datasets are listed in Table 7.1. There are 24 such schedulers (the other 48 of the
72 schedulers evaluated were strictly dominated by at least one other scheduler on every dataset). A
Pareto-optimal scheduler is one for which no other scheduler has both a lower average makespan ratio and
a lower average runtime ratio on a given dataset.

scheduler        initial_priority      append_only  compare   critical_path  sufferage
EFT_Ins_AT       ArbitraryTopological  False        EFT       False          False
EFT_Ins_AT_Suf   ArbitraryTopological  False        EFT       False          True
EST_Ins_AT       ArbitraryTopological  False        EST       False          False
EST_Ins_AT_Suf   ArbitraryTopological  False        EST       False          True
EST_Ins_CP_AT    ArbitraryTopological  False        EST       True           False
MCT [17]         ArbitraryTopological  True         EFT       False          False
Sufferage [21]   ArbitraryTopological  True         EFT       False          True
EFT_App_CP_AT    ArbitraryTopological  True         EFT       True           False
EST_App_AT       ArbitraryTopological  True         EST       False          False
EST_App_AT_Suf   ArbitraryTopological  True         EST       False          True
MET [17]         ArbitraryTopological  True         Quickest  False          False
CPoP [10]        CPoPRanking           False        EFT       False          False
EFT_Ins_CR_Suf   CPoPRanking           False        EFT       False          True
EST_Ins_CR       CPoPRanking           False        EST       False          False
EST_Ins_CR_Suf   CPoPRanking           False        EST       False          True
EFT_App_CR_Suf   CPoPRanking           True         EFT       False          True
HEFT [10]        UpwardRanking         False        EFT       False          False
EFT_Ins_UR_Suf   UpwardRanking         False        EFT       False          True
EST_Ins_UR       UpwardRanking         False        EST       False          False
EST_Ins_UR_Suf   UpwardRanking         False        EST       False          True
EFT_App_UR       UpwardRanking         True         EFT       False          False
EFT_App_UR_Suf   UpwardRanking         True         EFT       False          True
EST_App_UR       UpwardRanking         True         EST       False          False
EST_App_UR_Suf   UpwardRanking         True         EST       False          True
Table 7.1: All scheduling algorithms that were Pareto-optimal for at least one evaluated dataset (24 of 72
schedulers). All other schedulers (48 of 72) were strictly dominated by another algorithm in every dataset.

Figure 7.2a depicts the Pareto-optimal schedulers for each dataset. Each subplot has 24 markers (one for
each scheduler in Table 7.1), but only the schedulers that are Pareto-optimal for that particular dataset are
colored blue. For example, for the dataset in_trees_ccr_0.2 (in_trees with CCR = 1/5), two schedulers that
are Pareto-optimal for another dataset are strictly dominated (by large margins) by another scheduler on
this dataset. Figure 7.2b serves as a kind of legend for the scatter plot.
Each cell represents the Pareto-optimal scheduler’s rank compared to the other Pareto-optimal schedulers
for the same dataset with respect to its runtime ratio. For each dataset, the scheduler labeled 1 is the one
with the lowest runtime ratio (the leftmost blue marker in the corresponding scatter plot in Figure 7.2a).
Because it is Pareto-optimal, this also means it has the highest makespan ratio of the Pareto-optimal
schedulers for the dataset (and so is also the highest blue marker in the corresponding scatter plot in
Figure 7.2a). Blank cells indicate that the scheduling algorithm is not Pareto-optimal for the dataset.
Thus, schedulers that have consistently low order numbers (like Sufferage) are Pareto-optimal mostly (or
entirely) due to their low runtimes rather than their low makespan ratios. Results indicate that these
scheduling algorithms are fast but not very performant. On the other hand, scheduling algorithms that
have consistently high order numbers (like HEFT) are Pareto-optimal mostly (or entirely) due to their low
makespan ratios, despite their higher runtimes. Results indicate that these scheduling algorithms are
performant but slower at generating schedules.
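The Pareto-optimality criterion used above can be expressed in a few lines of Python. The sketch follows the text's definition (a scheduler is excluded only if another scheduler is strictly lower on both average makespan ratio and average runtime ratio) and orders the surviving schedulers by runtime ratio, as in the numbering of Figure 7.2b. The function name and the example numbers are hypothetical; they are not results from this chapter.

```python
from typing import Dict, List, Tuple


def pareto_optimal(results: Dict[str, Tuple[float, float]]) -> List[str]:
    """Names of Pareto-optimal schedulers, ordered from fastest to slowest.

    results maps scheduler name -> (average makespan ratio, average runtime
    ratio) on one dataset. A scheduler is excluded only if some other
    scheduler has BOTH a lower makespan ratio and a lower runtime ratio.
    """
    front = [
        name
        for name, (mk, rt) in results.items()
        if not any(
            mk2 < mk and rt2 < rt
            for other, (mk2, rt2) in results.items()
            if other != name
        )
    ]
    # Rank 1 = lowest runtime ratio (leftmost blue marker in Figure 7.2a).
    return sorted(front, key=lambda name: results[name][1])


if __name__ == "__main__":
    # Illustrative numbers only, not measurements from this chapter.
    demo = {"A": (1.5, 1.0), "B": (1.1, 3.0), "C": (1.6, 2.0), "D": (1.0, 5.0)}
    print(pareto_optimal(demo))  # ['A', 'B', 'D']
```

In the example, C is dropped because A is both faster and produces shorter schedules, while the fastest survivor (A) has the highest makespan ratio on the front, mirroring the reading of Figure 7.2b given above.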
(a) Markers represent scheduling algorithms that are pareto-optimal for at least one of the evaluated datasets.
(b) Each number gives the Pareto-optimal scheduler’s position (from left to right) in the scatter plot for each dataset.
Figure 7.2: Pareto-optimal scheduling algorithms for each dataset.
7.2.1 Effects of Algorithmic Components
In this section, we study the effect that different algorithmic components have on performance and
runtime. Figure 7.3 suggests that, across all datasets, the priority function has a small effect makespan ratio
with UpwardRanking just slightly out-performing ArbitraryTopological and CPoPRanking. Similar results
for the append-only and sufferage components are shown in Figures 7.7 and 7.5. Figure 7.4 indicates that
while the Quickest comparison function is clearly the least performant, EFT and EST have roughly similar
performance across all datasets. Figure 7.6 suggests that critical path reservation tends to increase
makespan ratios and, more dramatically, increase runtime ratios, suggesting that critical path reservation,
in general, is a poor strategy. Table 7.1 from the previous section though, there are some datasets for which
critical path reserving schedulers are pareto-optimal. This suggests that the effects of these components
might be dataset-specific.
Figure 7.3: Effect of the initial priority function on the makespan and runtime ratios over all datasets.
Figure 7.4: Effect of the comparison function on the makespan and runtime ratios over all datasets.
Figure 7.5: Effect of the insertion vs. append-only scheme on the makespan and runtime ratios over all
datasets.
Figure 7.6: Effect of the critical path reservation on the makespan and runtime ratios over all datasets.
Figure 7.7: Effect of the sufferage selection scheme on the makespan and runtime ratios over all datasets.
We do, in fact, observe quite different behavior for some individual datasets. For example,
Figure 7.8 depicts the effect of the comparison function on the makespan and runtime ratios for the
cycles_ccr_5 dataset (cycles task graphs with CCR = 5). The Quickest comparison function, which generally
performs far worse than EFT and EST on the other datasets, outperforms both by a large margin here. The
dataset is not the only factor that interacts with the effects of algorithmic components, though.
Figure 7.9 depicts a few of the more interesting interactions between algorithmic components (averaged
over all datasets). Figure 7.9a depicts an interaction between the append_only and initial_priority
parameters, suggesting that the combination of append_only=True and initial_priority=CPoPRanking has
a more detrimental effect on the makespan ratio than either of the parameters does by itself. In other
words, the append-only strategy is particularly bad when using the CPoPRanking priority function.
Figure 7.9b shows that the Quickest comparison function is generally bad but less so on
communication-heavy applications (those with high CCR). Figure 7.9c shows that it is also a particularly
bad comparison function for the out_trees datasets. Figure 7.9d suggests that the small difference in
makespan ratio for schedulers that do critical path reservation is due almost entirely to how critical path
reservation increases makespan ratios on the in_trees datasets.
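The text does not spell out how the averaged effects in Figures 7.3 through 7.9 are computed; one plausible reading is a simple group-by average over per-scheduler results, sketched below. The function names, the row format (one dict per scheduler/dataset pair), and the example values are hypothetical and are not data from these experiments.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Hashable, List, Tuple


def marginal_means(rows: List[dict], factor: str, metric: str = "makespan_ratio") -> Dict[Hashable, float]:
    """Mean of `metric` over all rows that share each value of one component."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[factor]].append(row[metric])
    return {level: mean(values) for level, values in groups.items()}


def interaction_means(
    rows: List[dict], factor_a: str, factor_b: str, metric: str = "makespan_ratio"
) -> Dict[Tuple[Hashable, Hashable], float]:
    """Mean of `metric` for every combination of two components' values."""
    groups = defaultdict(list)
    for row in rows:
        groups[(row[factor_a], row[factor_b])].append(row[metric])
    return {levels: mean(values) for levels, values in groups.items()}


if __name__ == "__main__":
    # Made-up rows: one per (scheduler, dataset) with its average makespan ratio.
    rows = [
        {"append_only": False, "initial_priority": "CPoPRanking", "makespan_ratio": 1.2},
        {"append_only": True, "initial_priority": "CPoPRanking", "makespan_ratio": 2.0},
        {"append_only": False, "initial_priority": "UpwardRanking", "makespan_ratio": 1.1},
        {"append_only": True, "initial_priority": "UpwardRanking", "makespan_ratio": 1.3},
    ]
    print(marginal_means(rows, "append_only"))
    print(interaction_means(rows, "append_only", "initial_priority"))
```

An interaction of the kind shown in Figure 7.9a would appear as one cell of `interaction_means` (here the made-up `(True, "CPoPRanking")` cell) being noticeably worse than the two corresponding marginal means alone would suggest.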
Figure 7.8: Effect of the comparison function on the makespan and runtime ratios for the cycles_ccr_5
dataset (cycles task graphs with CCR=5).
(a) Interaction between append_only and priority parameters.
(b) Interaction between the comparison function and
CCR
(c) Interaction between the comparison function and
dataset.
(d) Interaction between the critical path reservation and
dataset.
Figure 7.9: Interactions between algorithmic components.
Chapter 8
Conclusion and Future Directions
For decades, task scheduling has been a subject of consistent research, but dynamic environments
introduce new challenges. We addressed important research questions born out of these challenges:
1. Scheduling in Dynamic Environments: How does dynamicity (e.g., constantly changing network
conditions) affect the performance of heuristic scheduling algorithms and how can they be adapted
to better serve applications running in these kinds of environments?
2. Network Synthesis: Due to the Task Scheduling problem being NP-Hard and not approximable
within a constant factor, there must exist scenarios (problem instances) for which heuristic
algorithms perform arbitrarily poorly. Can we design compute networks with the task scheduling
problem in mind to avoid these scenarios in practice?
3. Comparing Task Scheduling Algorithms: Under what conditions do heuristic scheduling
algorithms perform well and how do individual algorithmic components affect performance?
In Chapter 4, we highlighted the limitations that traditional scheduling algorithms face under dynamic
network conditions and proposed leveraging recent advances in machine learning (namely, GCNScheduler)
to address these limitations. One issue with GCNScheduler is that it cannot learn the ordering that HEFT
prescribes when scheduling since it only labels tasks with the compute nodes they should execute on. This
may be part of the reason why the accuracy of the trained model is relatively low. Addressing this issue is
an important direction for future work. It may also be interesting to explore whether different loss
functions (we use cross-entropy loss) or neighborhood embedding aggregation functions (we use
averaging) can improve accuracy.
In order to make the problem interesting, we chose a task graph for which HEFT is able to produce
“interesting” schedules (i.e., schedules with tasks assigned to different nodes). In order to create better
models, though, we need to better understand the performance of HEFT (or any other scheduler) on
different classes of task graphs and networks. It seems reasonable, for example, that one scheduler works
very well on one class of networks while another scheduler works well on a different class. The motivating
example presented in Chapter 5 and the results presented in Chapter 7 show that this is indeed the case.
We provided examples of (and methods for automatically discovering) problem instances where scheduling
algorithms that might be expected to perform well (due to their performance on similar problem instances)
actually perform poorly.
We have experimented with GCNScheduler on a network of mobile robots (Figure 4.1) and preliminary
findings indicate that it adapts when a single robot becomes disconnected from the network [72]. These
findings were limited to four network nodes, while the simulation results presented here indicate that
GCNScheduler shows more impressive performance on larger problem instances. Experimenting with
larger, real-world robotic networks is another direction for future work. We have demonstrated that a
Graph Convolutional Network-based Scheduler can learn to produce schedules with decent makespan and
can compute schedules fast enough to adapt to dynamic networks, making it applicable to the Internet
of Robotic Things. We are also interested in exploring how machine learning can help with other
aspects of task scheduling problems. For example, it may be interesting to explore whether an ML model can
be trained to replace the prioritization phase of list-scheduling algorithms, since this phase alone has a
huge impact on schedule makespan. Also, recent advances in the transformer architecture [73] suggest that
an autoregressive model that considers the task graph, network, and previously scheduled tasks to
decide where to schedule the next task might perform well.
The motivating example presented in Chapter 5 also showed that designing networks to support task
scheduling applications using traditional metrics like connection strength can be detrimental to
performance. To address this, we formalized a novel network synthesis problem that takes the task
scheduling problem into account. We presented the NSDC Framework, a general framework for solving
these kinds of problems and demonstrated its utility on a well-motivated example.
This work opens many avenues for future research. First, exhaustive search is inherently not scalable
and an exploration into other optimizers (e.g., simulated annealing) for the outer loop would be valuable.
Second, the HEFT scheduling algorithm used in the use-case is also not scalable. Other scheduling
algorithms must be implemented. Finally, more complex variations of the proposed network synthesis
problem (e.g., networks with more complex connectivity, larger and more realistic distributed applications,
mobile compute nodes, etc.) can now be studied more easily leveraging the NSDC Framework.
Motivated by the example presented in Chapter 5, we proposed a new approach to comparing task
scheduling algorithms that can help better understand the conditions under which an algorithm performs
well and poorly. In Chapter 6, we presented SAGA, a Python framework for running, evaluating, and
comparing task scheduling algorithms. We evaluated 15 of the scheduling algorithms implemented in SAGA
on 16 datasets and demonstrated how our proposed adversarial analysis method, PISA, provides useful
information that traditional benchmarking does not. We used the results of the adversarial analysis to
explore HEFT and CPoP in more detail and identified network and task graph conditions that cause one
algorithm to perform better or worse than the other. We also explored how PISA can be used for
application-specific scenarios where system designers have some idea of what the target task graphs and
compute networks look like ahead of time. We showed that, even for this restricted case, PISA is successful
in identifying performance boundaries between task scheduling algorithms where the traditional
benchmarking approach is not. We hope that SAGA will be a useful tool for the community and that PISA
will advance the state of the art in task scheduling algorithm design and evaluation.
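The central idea of the adversarial analysis is to search over problem instances for ones on which one scheduler does much worse than another. That idea can be sketched in a few lines. The sketch below is a simplified, hypothetical stand-in and not the SAGA implementation of PISA:
- it uses simulated annealing as one possible search strategy;
- it searches over instances of independent tasks on heterogeneous nodes with no communication costs;
- it compares two toy earliest-finish-time heuristics that differ only in the order in which tasks are considered.

The perturbation rule, the [0.1, 10] bounds, and the cooling parameters are assumptions chosen for brevity.

```python
import math
import random


def eft_schedule(costs, speeds):
    """Place independent tasks, in the given order, on the node that finishes them earliest."""
    ready = [0.0] * len(speeds)
    for c in costs:
        n = min(range(len(speeds)), key=lambda i: ready[i] + c / speeds[i])
        ready[n] += c / speeds[n]
    return max(ready)


def given_order(costs, speeds):
    return eft_schedule(costs, speeds)


def largest_first(costs, speeds):
    return eft_schedule(sorted(costs, reverse=True), speeds)


def perturb(instance, rng):
    """Rescale one random task cost or node speed, clamped to [0.1, 10]."""
    costs, speeds = list(instance[0]), list(instance[1])
    target = costs if rng.random() < 0.5 else speeds
    i = rng.randrange(len(target))
    target[i] = min(10.0, max(0.1, target[i] * rng.uniform(0.5, 2.0)))
    return tuple(costs), tuple(speeds)


def adversarial_search(sched_a, sched_b, instance, steps=3000, t0=1.0, alpha=0.998, seed=0):
    """Simulated annealing over problem instances that maximizes makespan(A) / makespan(B)."""
    rng = random.Random(seed)

    def ratio(inst):
        return sched_a(*inst) / sched_b(*inst)

    cur, cur_r = instance, ratio(instance)
    best, best_r = cur, cur_r
    temp = t0
    for _ in range(steps):
        cand = perturb(cur, rng)
        cand_r = ratio(cand)
        # Accept better instances always, worse ones with a temperature-dependent probability.
        if cand_r >= cur_r or rng.random() < math.exp((cand_r - cur_r) / temp):
            cur, cur_r = cand, cand_r
            if cur_r > best_r:
                best, best_r = cur, cur_r
        temp *= alpha
    return best, best_r


start = ((1.0, 1.0, 1.0, 1.0, 1.0), (1.0, 1.0))
inst, r = adversarial_search(given_order, largest_first, start)
print(f"makespan ratio (given-order / largest-first): {r:.2f} on {inst}")
```

Swapping the two scheduler arguments searches for instances on which largest-first ordering is the worse choice. Running the search in both directions shows whether either heuristic dominates the other on the instances explored.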
There are many directions for future work in this area as well. First, we plan to extend SAGA to include
more algorithms and datasets. Another logical next step is to extend SAGA and the adversarial analysis
method to support other problem variants. Due to our particular interest in task scheduling for dynamic
environments, we plan to add support for stochastic scheduling (stochastic task costs, data sizes,
computation speeds, and communication costs). It would also be interesting to explore other
meta-heuristics for adversarial analysis (e.g., genetic algorithms) and other performance metrics (e.g.,
throughput, energy consumption, and cost). Finally, the application-specific scenario we explored in
Section 6.4 suggests that an exploration into different methodologies for constructing and comparing
hybrid scheduling algorithms using PISA might be fruitful.
In Chapter 7, we turned our attention to the individual algorithmic components that make up task
scheduling algorithms. We proposed a generalized parametric list scheduling approach for studying the
individual and combined effects of such algorithmic components. We evaluated 72 algorithms produced by
combining five different types of algorithmic components on 20 datasets and presented results on their
individual and combined effects on average performance and runtime. We also discussed how these results
differ for individual datasets, suggesting that the way algorithmic components interact with each other is
problem-dependent (e.g., it depends on the task graph structure and on whether the application is
communication- or computation-heavy). This work, too, leaves many directions open. First,
this work can be extended by considering new algorithmic components (e.g., k-depth lookahead), new
implementations for the five current algorithmic components, and other datasets. In particular, it would be
interesting to see more results for application-specific datasets.
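A short sketch can illustrate the combinatorial construction behind a parametric list scheduler. It enumerates every combination of component options. It then averages a score over all configurations that share one option, which gives a marginal (individual) effect.

The component names and option sets below are hypothetical placeholders. They were chosen so that five components with 3, 3, 2, 2, and 2 options yield 72 configurations, matching the count reported above. The dissertation's actual components and their implementations may differ.

```python
from dataclasses import dataclass
from itertools import product
from statistics import mean

# Hypothetical component option sets; names and counts are illustrative assumptions.
COMPONENTS = {
    "priority": ("upward_rank", "cpop_rank", "arbitrary_topological"),
    "compare": ("EFT", "EST", "quickest"),
    "append_only": (True, False),
    "critical_path": (True, False),
    "sufferage": (True, False),
}


@dataclass(frozen=True)
class SchedulerConfig:
    priority: str
    compare: str
    append_only: bool
    critical_path: bool
    sufferage: bool


def all_configs():
    """One scheduler configuration per combination of options (full factorial)."""
    return [SchedulerConfig(*combo) for combo in product(*COMPONENTS.values())]


def marginal_means(scores, component):
    """Average score over all configurations that use each option of `component`.

    `scores` maps SchedulerConfig -> a performance number (e.g., a makespan ratio).
    """
    groups = {}
    for cfg, s in scores.items():
        groups.setdefault(getattr(cfg, component), []).append(s)
    return {opt: mean(vals) for opt, vals in groups.items()}


configs = all_configs()
print(len(configs))  # 3 * 3 * 2 * 2 * 2 combinations
```

Combined effects can be examined the same way, by grouping on pairs of fields instead of a single one.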
For this work, we compared algorithms using the traditional benchmarking approach, whereby we ran
each scheduler on different datasets. In Chapter 6, we showed that this approach, while certainly useful,
has gaps and can be misleading in some instances. It may be interesting to evaluate the scheduling
algorithms and algorithmic components using the adversarial approach presented in Chapter 6.
The work presented in this dissertation supports our conclusion that novel methods for understanding
the limitations of existing task scheduling algorithms can inform the design of new algorithms better
suited for dynamic, unpredictable, and resource-constrained environments. We believe this work
represents an important step toward a better understanding of task scheduling in modern dispersed
computing systems and hope that it will inspire future work in this area.
Bibliography
[1] O. Sinnen, Task Scheduling for Parallel Systems (Wiley series on parallel and distributed
computing). Wiley, 2007. [Online]. Available:
http://eu.wiley.com/WileyCDA/WileyTitle/productCd-0471735760.html.
[2] A. Bazzi and A. Norouzi-Fard, “Towards tight lower bounds for scheduling problems,” in
Algorithms - ESA 2015 - 23rd Annual European Symposium, Patras, Greece, September 14-16, 2015,
Proceedings, N. Bansal and I. Finocchi, Eds., ser. Lecture Notes in Computer Science, vol. 9294,
Springer, 2015, pp. 118–129. doi: 10.1007/978-3-662-48350-3_11.
[3] H. El-Sayed, S. Sankar, M. Prasad, D. Puthal, A. Gupta, M. Mohanty, and C. Lin, “Edge of things:
The big picture on the integration of edge, IoT and the cloud in a distributed computing
environment,” IEEE Access, vol. 6, pp. 1706–1717, 2018. doi: 10.1109/ACCESS.2017.2780087.
[4] T. Anevlavis, J. Bunton, J. Coleman, M. G. Dogan, E. Grippo, A. Souza, C. Fragouli,
B. Krishnamachari, M. Maness, K. Olson, et al., “Network synthesis for tactical environments:
Scenario, challenges, and opportunities,” Artificial Intelligence and Machine Learning for
Multi-Domain Operations Applications IV, vol. 12113, pp. 199–206, 2022, Publisher: SPIE.
doi: 10.1117/12.2619048.
[5] P. Fraga-Lamas and T. M. Fernández-Caramés, “Tactical edge IoT in defense and national security,”
in IoT for Defense and National Security. John Wiley & Sons, Ltd, 2022, ch. 20, pp. 377–396.
doi: 10.1002/9781119892199.ch20.
[6] B. M. Marlin, N. Suri, S. Fang, M. B. Srivastava, C. Samplawski, Z. Wang, and M. B. Wigness,
“IoBT-MAX: A multimodal analytics experimentation testbed for IoBT research,” in IEEE Military
Communications Conference, MILCOM 2023, Boston, MA, USA, October 30 - Nov. 3, 2023, IEEE, 2023,
pp. 127–132. doi: 10.1109/MILCOM58377.2023.10356347.
[7] A. Poylisher, A. Cichocki, K. Guo, J. Hunziker, L. Kant, B. Krishnamachari, S. Avestimehr, and
M. Annavaram, “Tactical Jupiter: Dynamic scheduling of dispersed computations in tactical
MANETs,” in MILCOM 2021 - 2021 IEEE Military Communications Conference (MILCOM), 2021,
pp. 102–107. doi: 10.1109/MILCOM52596.2021.9652937.
[8] M. Wigness, T. Abdelzaher, S. Russell, and A. Swami, “Internet of battlefield things: Challenges,
opportunities, and emerging directions,” in IoT for Defense and National Security. 2023, pp. 5–22.
doi: 10.1002/9781119892199.ch1.
[9] M. Kiamari and B. Krishnamachari, “GCNScheduler: Scheduling distributed computing applications
using graph convolutional networks,” in Proceedings of the 1st International Workshop on Graph
Neural Networking, GNNet 2022, Rome, Italy, 9 December 2022, P. Barlet-Ros, P. Casas, F. Scarselli,
X. Cheng, and A. Cabellos, Eds., ACM, 2022, pp. 13–17. doi: 10.1145/3565473.3569185.
[10] H. Topcuoglu, S. Hariri, and M. Wu, “Task scheduling algorithms for heterogeneous processors,”
in 8th Heterogeneous Computing Workshop, HCW 1999, San Juan, Puerto Rico, April 12, 1999, IEEE
Computer Society, 1999, pp. 3–14. doi: 10.1109/HCW.1999.765092.
[11] J. Coleman, SAGA: Scheduling algorithms gathered, GitHub, https://github.com/ANRGUSC/saga,
2023. [Online]. Available: https://github.com/ANRGUSC/saga.
[12] R. L. Graham, “Bounds on multiprocessing timing anomalies,” SIAM Journal on Applied
Mathematics, vol. 17, no. 2, pp. 416–429, 1969. doi: 10.1137/0117039.
[13] H. Wang and O. Sinnen, “List-scheduling versus cluster-scheduling,” IEEE Trans. Parallel
Distributed Syst., vol. 29, no. 8, pp. 1736–1749, 2018. doi: 10.1109/TPDS.2018.2808959.
[14] R. Graham, E. Lawler, J. Lenstra, and A. Kan, “Optimization and approximation in deterministic
sequencing and scheduling: A survey,” in Discrete Optimization II, ser. Annals of Discrete
Mathematics, P. Hammer, E. Johnson, and B. Korte, Eds., vol. 5, Elsevier, 1979, pp. 287–326.
doi: 10.1016/S0167-5060(08)70356-X.
[15] R. Armstrong, D. Hensgen, and T. Kidd, “The relative performance of various mapping algorithms
is independent of sizable variances in run-time predictions,” in Proceedings Seventh Heterogeneous
Computing Workshop (HCW’98), 1998, pp. 79–87. doi: 10.1109/HCW.1998.666547.
[16] J. Blythe, S. Jain, E. Deelman, Y. Gil, K. Vahi, A. Mandal, and K. Kennedy, “Task scheduling
strategies for workflow-based applications in grids,” in 5th International Symposium on Cluster
Computing and the Grid (CCGrid 2005), 9-12 May, 2005, Cardiff, UK, IEEE Computer Society, 2005,
pp. 759–767. doi: 10.1109/CCGRID.2005.1558639.
[17] T. D. Braun, H. J. Siegel, N. Beck, L. Bölöni, M. Maheswaran, A. I. Reuther, J. P. Robertson,
M. D. Theys, B. Yao, D. A. Hensgen, and R. F. Freund, “A comparison of eleven static heuristics for
mapping a class of independent tasks onto heterogeneous distributed computing systems,” J.
Parallel Distributed Comput., vol. 61, no. 6, pp. 810–837, 2001. doi: 10.1006/jpdc.2000.1714.
[18] E. H. Houssein, A. G. Gad, Y. M. Wazery, and P. N. Suganthan, “Task scheduling in cloud
computing based on meta-heuristics: Review, taxonomy, open challenges, and future trends,”
Swarm Evol. Comput., vol. 62, p. 100841, 2021. doi: 10.1016/j.swevo.2021.100841.
[19] D. Hu and B. Krishnamachari, “Throughput optimized scheduler for dispersed computing
systems,” in 7th IEEE International Conference on Mobile Cloud Computing, Services, and
Engineering, MobileCloud 2019, Newark, CA, USA, April 4-9, 2019, IEEE, 2019, pp. 76–84. doi:
10.1109/MOBILECLOUD.2019.00018.
[20] C.-Y. Lee, J.-J. Hwang, Y.-C. Chow, and F. D. Anger, “Multiprocessor scheduling with
interprocessor communication delays,” Operations Research Letters, vol. 7, no. 3, pp. 141–147, 1988.
doi: 10.1016/0167-6377(88)90080-6.
[21] T. N’Takpé and F. Suter, “Critical path and area based scheduling of parallel task graphs on
heterogeneous platforms,” in 12th International Conference on Parallel and Distributed Systems,
ICPADS 2006, Minneapolis, Minnesota, USA, July 12-15, 2006, IEEE Computer Society, 2006,
pp. 3–10. doi: 10.1109/ICPADS.2006.32.
[22] H. Oh and S. Ha, “A static scheduling heuristic for heterogeneous processors,” in Euro-Par ’96
Parallel Processing, Second International Euro-Par Conference, Lyon, France, August 26-29, 1996,
Proceedings, Volume II, L. Bougé, P. Fraigniaud, A. Mignotte, and Y. Robert, Eds., ser. Lecture Notes
in Computer Science, vol. 1124, Springer, 1996, pp. 573–577. doi: 10.1007/BFb0024750.
[23] A. Radulescu and A. J. C. van Gemund, “Fast and effective task scheduling in heterogeneous
systems,” in 9th Heterogeneous Computing Workshop, HCW 2000, Cancun, Mexico, May 1, 2000, IEEE
Computer Society, 2000, pp. 229–238. doi: 10.1109/HCW.2000.843747.
[24] H. El-Rewini and T. G. Lewis, “Scheduling parallel program tasks onto arbitrary target machines,”
Journal of Parallel and Distributed Computing, vol. 9, no. 2, pp. 138–153, 1990.
doi: 10.1016/0743-7315(90)90042-N.
[25] G. C. Sih and E. A. Lee, “A compile-time scheduling heuristic for interconnection-constrained
heterogeneous processor architectures,” IEEE Trans. Parallel Distributed Syst., vol. 4, no. 2,
pp. 175–187, 1993. doi: 10.1109/71.207593.
[26] F. Scarselli, M. Gori, A. C. Tsoi, M. Hagenbuchner, and G. Monfardini, “The graph neural network
model,” IEEE Transactions on Neural Networks, vol. 20, no. 1, pp. 61–80, 2009. doi:
10.1109/TNN.2008.2005605.
[27] G. Jaume, A. Nguyen, M. R. Martínez, J. Thiran, and M. Gabrani, “EdGNN: A simple and powerful
GNN for directed labeled graphs,” CoRR, vol. abs/1904.08745, 2019. [Online]. Available:
http://arxiv.org/abs/1904.08745.
[28] Z. Wu, S. Pan, F. Chen, G. Long, C. Zhang, and P. S. Yu, “A comprehensive survey on graph neural
networks,” IEEE Trans. Neural Networks Learn. Syst., vol. 32, no. 1, pp. 4–24, 2021. doi:
10.1109/TNNLS.2020.2978386.
[29] S. Kirkpatrick, C. D. Gelatt Jr, and M. P. Vecchi, “Optimization by simulated annealing,” Science,
vol. 220, no. 4598, pp. 671–680, 1983. doi: 10.1126/science.220.4598.671.
[30] D. Delahaye, S. Chaimatanan, and M. Mongeau, “Simulated annealing: From basics to
applications,” in Handbook of Metaheuristics, M. Gendreau and J.-Y. Potvin, Eds. Cham: Springer
International Publishing, 2019, pp. 1–35. doi: 10.1007/978-3-319-91086-4_1.
[31] B. Hajek, “A tutorial survey of theory and applications of simulated annealing,” in 1985 24th IEEE
Conference on Decision and Control, 1985, pp. 755–760. doi: 10.1109/CDC.1985.268599.
[32] H. Mao, M. Schwarzkopf, S. B. Venkatakrishnan, Z. Meng, and M. Alizadeh, “Learning scheduling
algorithms for data processing clusters,” in Proceedings of the ACM special interest group on data
communication, 2019, pp. 270–288.
[33] P. Sun, Z. Guo, J. Wang, J. Li, J. Lan, and Y. Hu, “DeepWeave: Accelerating job completion time
with deep reinforcement learning-based coflow scheduling,” in Proceedings of the Twenty-Ninth
International Conference on International Joint Conferences on Artificial Intelligence, 2021,
pp. 3314–3320.
[34] M. R. Alizadeh, V. Khajehvand, A. M. Rahmani, and E. Akbari, “Task scheduling approaches in fog
computing: A systematic review,” International Journal of Communication Systems, vol. 33, no. 16,
e4583, 2020.
[35] V. Kamalesh and S. Srivatsa, “On the design of minimum cost survivable network topologies,” in
Nat. Conf. on communication at IIT, Guwahati, India, Citeseer, 2009, pp. 394–397.
[36] K. Steiglitz, P. Weiner, and D. Kleitman, “The design of minimum-cost survivable networks,” IEEE
Transactions on Circuit Theory, vol. 16, no. 4, pp. 455–460, 1969. doi: 10.1109/TCT.1969.1083004.
[37] R. Beckett, R. Mahajan, T. D. Millstein, J. Padhye, and D. Walker, “Network configuration
synthesis with abstract topologies,” in Proceedings of the 38th ACM SIGPLAN Conference on
Programming Language Design and Implementation, PLDI 2017, Barcelona, Spain, June 18-23, 2017,
A. Cohen and M. T. Vechev, Eds., ACM, 2017, pp. 437–451. doi: 10.1145/3062341.3062367.
[38] M. F. Younis and K. Akkaya, “Strategies and techniques for node placement in wireless sensor
networks: A survey,” Ad Hoc Networks, vol. 6, no. 4, pp. 621–655, 2008. doi:
10.1016/J.ADHOC.2007.05.003.
[39] P. Ghosh, J. Bunton, D. Pylorof, M. A. M. Vieira, K. Chan, R. Govindan, G. S. Sukhatme, P. Tabuada,
and G. Verma, “Rapid top-down synthesis of large-scale iot networks,” in 29th International
Conference on Computer Communications and Networks, ICCCN 2020, Honolulu, HI, USA, August
3-6, 2020, IEEE, 2020, pp. 1–9. doi: 10.1109/ICCCN49398.2020.9209680.
[40] K. Singh, M. Alam, and S. K. Sharma, “A survey of static scheduling algorithm for distributed
computing system,” International Journal of Computer Applications, vol. 129, no. 2, pp. 25–30, 2015.
doi: 10.5120/ijca2015906828.
[41] T. Braun, H. Siegel, N. Beck, L. Bölöni, M. Maheswaran, A. Reuther, J. Robertson, M. Theys, B. Yao,
D. Hensgen, and R. Freund, “A comparison study of static mapping heuristics for a class of
meta-tasks on heterogeneous computing systems,” in Proceedings. Eighth Heterogeneous
Computing Workshop (HCW’99), 1999, pp. 15–29. doi: 10.1109/HCW.1999.765093.
[42] L. Canon, E. Jeannot, R. Sakellariou, and W. Zheng, “Comparative evaluation of the robustness of
DAG scheduling heuristics,” in Grid Computing - Achievements and Prospects: CoreGRID Integration
Workshop 2008, Hersonissos, Crete, Greece, April 2-4, 2008, S. Gorlatch, P. Fragopoulou, and T. Priol,
Eds., Springer, 2008, pp. 73–84. doi: 10.1007/978-0-387-09457-1\_7.
[43] Y.-K. Kwok and I. Ahmad, “Benchmarking the task graph scheduling algorithms,” in Proceedings of
the First Merged International Parallel Processing Symposium and Symposium on Parallel and
Distributed Processing, 1998, pp. 531–537. doi: 10.1109/IPPS.1998.669967.
[44] A. K. Maurya and A. K. Tripathi, “On benchmarking task scheduling algorithms for heterogeneous
computing systems,” J. Supercomput., vol. 74, no. 7, pp. 3039–3070, 2018. doi:
10.1007/S11227-018-2355-0.
[45] M. A. Iverson and F. Özgüner, “Hierarchical, competitive scheduling of multiple DAGs in a dynamic
heterogeneous environment,” Distributed Syst. Eng., vol. 6, pp. 112–, 1999.
[46] R. F. de Mello, J. A. A. Filho, L. J. Senger, and L. T. Yang, “Grid job scheduling using route with
genetic algorithm support,” Telecommunication Systems, vol. 38, pp. 147–160, 2008.
[47] O. Votava, P. Macejko, and J. Janecek, “Dynamic local scheduling of multiple DAGs in distributed
heterogeneous systems,” in DATESO, 2015.
[48] J. Coleman, M. Kiamari, L. Clark, D. D’Souza, and B. Krishnamachari, “Graph Convolutional
Network-based Scheduler for Distributing Computation in the Internet of Robotic Things,” in
MILCOM 2022-2022 IEEE Military Communications Conference (MILCOM), IEEE, 2022,
pp. 1070–1075. doi: 10.1109/MILCOM55135.2022.10017673.
[49] S. Russell and T. Abdelzaher, “The internet of battlefield things: The next generation of command,
control, communications and intelligence (C3I) decision-making,” in MILCOM 2018-2018 IEEE
Military Communications Conference (MILCOM), IEEE, 2018, pp. 737–742.
[50] G. Juve, A. L. Chervenak, E. Deelman, S. Bharathi, G. Mehta, and K. Vahi, “Characterizing and
profiling scientific workflows,” Future Gener. Comput. Syst., vol. 29, no. 3, pp. 682–692, 2013. doi:
10.1016/j.future.2012.08.015.
[51] T. Auer and M. Held, “Heuristics for the generation of random polygons,” in Proceedings of the 8th
Canadian Conference on Computational Geometry, Carleton University, Ottawa, Canada, August
12-15, 1996, F. Fiala, E. Kranakis, and J. Sack, Eds., Carleton University Press, 1996, pp. 38–43.
[Online]. Available: http://www.cccg.ca/proceedings/1996/cccg1996%5C_0007.pdf.
[52] J. Coleman, E. Grippo, B. Krishnamachari, and G. Verma, “Multi-objective network synthesis for
dispersed computing in tactical environments,” in Signal Processing, Sensor/Information Fusion, and
Target Recognition XXXI, vol. 12122, SPIE, 2022, pp. 132–137. doi: 10.1117/12.2616187.
[53] J. R. Coleman and B. Krishnamachari, “Comparing task graph scheduling algorithms: An
adversarial approach,” CoRR, vol. abs/2403.07120, 2024. doi: 10.48550/ARXIV.2403.07120.
[54] J. Coleman and B. Krishnamachari, “Scheduling algorithms gathered: A framework for
implementing, evaluating, and comparing task graph scheduling algorithms,” University of
Southern California, Tech. Rep., 2023.
[55] J. Hwang, Y. Chow, F. D. Anger, and C. Lee, “Scheduling precedence graphs in systems with
interprocessor communication times,” SIAM J. Comput., vol. 18, no. 2, pp. 244–257, 1989. doi:
10.1137/0218016.
[56] N. Hazekamp and D. Thain, Makeflow examples repository, GitHub,
http://github.com/cooperative-computing-lab/makeflow-examples, 2017. [Online]. Available:
http://github.com/cooperative-computing-lab/makeflow-examples.
[57] K. Keahey, J. Anderson, Z. Zhen, P. Riteau, P. Ruth, D. Stanzione, M. Cevik, J. Colleran,
H. S. Gunawi, C. Hammock, J. Mambretti, A. Barnes, F. Halbach, A. Rocha, and J. Stubbs, “Lessons
learned from the chameleon testbed,” in 2020 USENIX Annual Technical Conference, USENIX ATC
2020, July 15-17, 2020, A. Gavrilovska and E. Zadok, Eds., USENIX Association, 2020, pp. 219–233.
[Online]. Available: https://www.usenix.org/conference/atc20/presentation/keahey.
[58] R. F. da Silva, R. Mayani, Y. Shi, A. R. Kemanian, M. Rynge, and E. Deelman, “Empowering
agroecosystem modeling with HTC scientific workflows: The cycles model use case,” in 2019 IEEE
International Conference on Big Data (IEEE BigData), Los Angeles, CA, USA, December 9-12, 2019,
C. K. Baru, J. Huan, L. Khan, X. Hu, R. Ak, Y. Tian, R. S. Barga, C. Zaniolo, K. Lee, and Y. Ye, Eds.,
IEEE, 2019, pp. 4545–4552. doi: 10.1109/BigData47090.2019.9006107.
[59] R. F. da Silva, R. Filgueira, E. Deelman, E. Pairo-Castineira, I. M. Overton, and M. P. Atkinson,
“Using simple PID-inspired controllers for online resilient resource management of distributed
scientific workflows,” Future Gener. Comput. Syst., vol. 95, pp. 615–628, 2019. doi:
10.1016/j.future.2019.01.015.
[60] M. Rynge, G. Juve, J. Kinney, J. Good, B. Berriman, A. Merrihew, and E. Deelman, “Producing an
Infrared Multiwavelength Galactic Plane Atlas Using Montage, Pegasus, and Amazon Web
Services,” in Astronomical Data Analysis Software and Systems XXIII, N. Manset and P. Forshay,
Eds., ser. Astronomical Society of the Pacific Conference Series, vol. 485, May 2014, p. 211.
[61] R. Filgueira, R. F. da Silva, A. Krause, E. Deelman, and M. P. Atkinson, “Asterism: Pegasus and
dispel4py hybrid workflows for data-intensive science,” in Seventh International Workshop on
Data-Intensive Computing in the Clouds, DataCloud@SC 2016, Salt Lake, UT, USA, November 14,
2016, IEEE Computer Society, 2016, pp. 1–8. doi: 10.1109/DataCloud.2016.004.
[62] Y. Liu, S. M. Khan, J. Wang, M. Rynge, Y. Zhang, S. Zeng, S. Chen, J. V. M. dos Santos,
B. Valliyodan, P. Calyam, N. C. Merchant, H. T. Nguyen, D. Xu, and T. Joshi, “PGen: Large-scale
genomic variations analysis workflow and browser in SoyKB,” BMC Bioinform., vol. 17, no. S-13,
p. 337, 2016. doi: 10.1186/s12859-016-1227-y.
[63] M. Rynge, SRA search Pegasus workflow, GitHub,
https://github.com/pegasus-isi/sra-search-pegasus-workflow, 2017. [Online]. Available:
https://github.com/pegasus-isi/sra-search-pegasus-workflow.
[64] A. Shukla, S. Chaturvedi, and Y. Simmhan, “RIoTBench: A real-time IoT benchmark for distributed
stream processing platforms,” CoRR, vol. abs/1701.08530, 2017. [Online]. Available:
http://arxiv.org/abs/1701.08530.
[65] P. Varshney, S. Ramesh, S. Chhabra, A. Khochare, and Y. Simmhan, “Resilient execution of
data-triggered applications on edge, fog and cloud resources,” in 2022 22nd IEEE International
Symposium on Cluster, Cloud and Internet Computing (CCGrid), 2022, pp. 473–483. doi:
10.1109/CCGrid54584.2022.00057.
[66] D. Cordeiro, G. Mounié, S. Perarnau, D. Trystram, J.-M. Vincent, and F. Wagner, “Random graph
generation for scheduling simulations,” ICST, May 2010. doi: 10.4108/ICST.SIMUTOOLS2010.8667.
[67] T. Coleman, H. Casanova, and R. F. da Silva, “Automated generation of scientific workflow
generators with WfChef,” Future Gener. Comput. Syst., vol. 147, pp. 16–29, 2023. doi:
10.1016/j.future.2023.04.031.
[68] E. Deelman, K. Vahi, G. Juve, M. Rynge, S. Callaghan, P. J. Maechling, R. Mayani, W. Chen,
R. Ferreira da Silva, M. Livny, and K. Wenger, “Pegasus, a workflow management system for
science automation,” Future Generation Computer Systems, vol. 46, pp. 17–35, 2015.
doi: 10.1016/j.future.2014.10.008.
[69] M. Albrecht, P. Donnelly, P. Bui, and D. Thain, “Makeflow: A portable abstraction for data
intensive computing on clusters, clouds, and grids,” in Proceedings of the 1st ACM SIGMOD
Workshop on Scalable Workflow Execution Engines and Technologies, ser. SWEET ’12, Scottsdale,
Arizona, USA: Association for Computing Machinery, 2012. doi: 10.1145/2443416.2443417.
[70] P. Di Tommaso, M. Chatzou, E. W. Floden, P. P. Barja, E. Palumbo, and C. Notredame, “Nextflow
enables reproducible computational workflows,” Nature Biotechnology, vol. 35, no. 4, pp. 316–319,
Apr. 2017. doi: 10.1038/nbt.3820.
[71] J. R. Coleman, R. V. Agrawal, E. Hirani, and B. Krishnamachari, “Parameterized task graph
scheduling algorithm for comparing algorithmic components,” CoRR, vol. abs/2403.07112, 2024.
doi: 10.48550/ARXIV.2403.07112.
[72] D. D’Souza, M. Kiamari, L. Clark, J. Coleman, and B. Krishnamachari, “Graph convolutional
network-based scheduler for distributing computation in the internet of robotic things,” The 2nd
Student Design Competition on Networked Computing on the Edge, 2022. [Online]. Available:
https://github.com/ANRGUSC/gcnschedule-turtlenet.
[73] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin,
“Attention is all you need,” in Advances in Neural Information Processing Systems 30: Annual
Conference on Neural Information Processing Systems 2017, December 4-9, 2017, Long Beach, CA,
USA, I. Guyon, U. von Luxburg, S. Bengio, H. M. Wallach, R. Fergus, S. V. N. Vishwanathan, and
R. Garnett, Eds., 2017, pp. 5998–6008. [Online]. Available:
https://proceedings.neurips.cc/paper/2017/hash/3f5ee243547dee91fbd053c1c4a845aa-Abstract.html.
List of Publications
Articles in International Refereed Conferences
[C1] J. Coleman, L. Cheng, and B. Krishnamachari, “Search and Rescue on the Line,” in Structural
Information and Communication Complexity, S. Rajsbaum, A. Balliu, J. J. Daymude, and D. Olivetti,
Eds., Cham: Springer Nature Switzerland, 2023, pp. 297–316.
[C2] J. Coleman, E. Kranakis, D. Krizanc, and O. Morales-Ponce, “Delivery to Safety with Two
Cooperating Robots,” in SOFSEM 2023: Theory and Practice of Computer Science - 48th International
Conference on Current Trends in Theory and Practice of Computer Science, SOFSEM 2023, Nový
Smokovec, Slovakia, January 15-18, 2023, Proceedings, L. Gasieniec, Ed., ser. Lecture Notes in
Computer Science, vol. 13878, Springer, 2023, pp. 359–371. doi: 10.1007/978-3-031-23101-8_24.
[C3] J. Coleman, E. Kranakis, O. Morales-Ponce, J. Opatrny, J. Urrutia, and B. Vogtenhuber, “Minimizing
The Maximum Distance Traveled To Form Patterns With Systems of Mobile Robots,” in Proceedings
of the 32nd Canadian Conference on Computational Geometry, CCCG 2020, August 5-7, 2020, University
of Saskatchewan, Saskatoon, Saskatchewan, Canada, J. M. Keil and D. Mondal, Eds., 2020, pp. 73–79.
[C4] J. Coleman and O. Morales-Ponce, “Robotic Sorting on the Grid,” in ICDCN ’22: 23rd International
Conference on Distributed Computing and Networking, Delhi, AA, India, January 4 - 7, 2022, ACM,
2022, pp. 26–30. doi: 10.1145/3491003.3491016.
[C5] J. Coleman and O. Morales-Ponce, “The Snow Plow Problem: Perpetual Maintenance by Mobile
Agents on the Line,” in 24th International Conference on Distributed Computing and Networking,
ICDCN 2023, Kharagpur, India, January 4-7, 2023, ACM, 2023, pp. 110–114. doi:
10.1145/3571306.3571396.
[C6] J. Coleman, E. Kranakis, D. Krizanc, and O. Morales-Ponce, “The Pony Express Communication
Problem,” in Combinatorial Algorithms - 32nd International Workshop, IWOCA 2021, Ottawa, ON,
Canada, July 5-7, 2021, Proceedings, P. Flocchini and L. Moura, Eds., ser. Lecture Notes in Computer
Science, vol. 12757, Springer, 2021, pp. 208–222. doi: 10.1007/978-3-030-79987-8_15.
[C7] J. Coleman, E. Kranakis, D. Krizanc, and O. Morales-Ponce, “Message Delivery in the Plane by Robots
with Different Speeds,” in Stabilization, Safety, and Security of Distributed Systems - 23rd International
Symposium, SSS 2021, Virtual Event, November 17-20, 2021, Proceedings, C. Johnen, E. M. Schiller, and
S. Schmid, Eds., ser. Lecture Notes in Computer Science, vol. 13046, Springer, 2021, pp. 305–319. doi:
10.1007/978-3-030-91081-5_20.
[C8] J. R. Coleman, E. Kranakis, D. Krizanc, and O. Morales-Ponce, “Line Search for an Oblivious Moving
Target,” in 26th International Conference on Principles of Distributed Systems, OPODIS 2022, December
13-15, 2022, Brussels, Belgium, E. Hillel, R. Palmieri, and E. Rivière, Eds., ser. LIPIcs, vol. 253, Schloss
Dagstuhl - Leibniz-Zentrum für Informatik, 2022, 12:1–12:19. doi: 10.4230/LIPICS.OPODIS.2022.12.
[C9] S. T. Valapu, T. Sarkar, J. R. Coleman, A. Avyukt, H. Embrechts, D. Torfs, M. Minelli, and
B. Krishnamachari, “DARSAN: A decentralized review system suitable for NFT marketplaces,” CoRR,
vol. abs/2307.15768, 2023. doi: 10.48550/arXiv.2307.15768.
Articles in International Refereed Workshops
[W1] T. Anevlavis, J. Bunton, J. Coleman, M. G. Dogan, E. Grippo, A. Souza, C. Fragouli, B. Krishnamachari,
M. Maness, K. Olson, et al., “Network synthesis for tactical environments: Scenario, challenges, and
opportunities,” Artificial Intelligence and Machine Learning for Multi-Domain Operations Applications
IV, vol. 12113, pp. 199–206, 2022, Publisher: SPIE. doi: 10.1117/12.2619048.
[W2] J. Coleman, M. Kiamari, L. Clark, D. D’Souza, and B. Krishnamachari, “Graph Convolutional
Network-based Scheduler for Distributing Computation in the Internet of Robotic Things,” in
MILCOM 2022-2022 IEEE Military Communications Conference (MILCOM), IEEE, 2022, pp. 1070–1075.
doi: 10.1109/MILCOM55135.2022.10017673.
[W3] J. Coleman, E. Grippo, B. Krishnamachari, and G. Verma, “Multi-objective network synthesis for
dispersed computing in tactical environments,” in Signal Processing, Sensor/Information Fusion, and
Target Recognition XXXI, vol. 12122, SPIE, 2022, pp. 132–137. doi: 10.1117/12.2616187.
Technical Reports
[R1] S. Yu, J. Coleman, and B. Krishnamachari, “Chatlang: A two-window approach to chatbots for
language learning,” Autonomous Networks Research Group, University of Southern California, Tech.
Rep., 2023. [Online]. Available: https://anrg.usc.edu/www/papers/chatlang.pdf.
Preprints
[P1] J. R. Coleman and B. Krishnamachari, “Comparing task graph scheduling algorithms: An adversarial
approach,” CoRR, vol. abs/2403.07120, 2024. doi: 10.48550/ARXIV.2403.07120.
[P2] J. R. Coleman, R. V. Agrawal, E. Hirani, and B. Krishnamachari, “Parameterized task graph
scheduling algorithm for comparing algorithmic components,” CoRR, vol. abs/2403.07112, 2024. doi:
10.48550/ARXIV.2403.07112.
Asset Metadata
Creator: Coleman, Jared Ray (author)
Core Title: Dispersed computing in dynamic environments
School: Viterbi School of Engineering
Degree: Doctor of Philosophy
Degree Program: Computer Science
Degree Conferral Date: 2024-05
Publication Date: 04/09/2024
Defense Date: 04/04/2024
Publisher: Los Angeles, California (original); University of Southern California (original); University of Southern California. Libraries (digital)
Tag: dispersed computing, distributed computing, IoT, OAI-PMH Harvest, scheduling, scientific workflows, task graphs, task scheduling
Format: theses (aat)
Language: English
Contributor: Electronically uploaded by the author (provenance)
Advisor: Krishnamachari, Bhaskar (committee chair); Deshmukh, Jyotirmoy (committee member); Psounis, Konstantinos (committee member)
Creator Email: jaredcol@usc.edu, jaredraycoleman@gmail.com
Permanent Link (DOI): https://doi.org/10.25549/usctheses-oUC113871305
Unique identifier: UC113871305
Identifier: etd-ColemanJar-12783.pdf (filename)
Legacy Identifier: etd-ColemanJar-12783
Document Type: Dissertation
Rights: Coleman, Jared Ray
Internet Media Type: application/pdf
Type: texts
Source: 20240409-usctheses-batch-1138 (batch); University of Southern California (contributing entity); University of Southern California Dissertations and Theses (collection)
Access Conditions: The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the author, as the original true and official version of the work, but does not grant the reader permission to use the work if the desired use is covered by copyright. It is the author, as rights holder, who must provide use permission if such use is covered by copyright.
Repository Name: University of Southern California Digital Library
Repository Location: USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA
Repository Email: cisadmin@lib.usc.edu