Advancing Distributed Computing and Graph Representation Learning with AI-Enabled Schemes
by
Mehrdad Kiamari
A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(Electrical Engineering)
August 2024
Dedication
To my beloved family, for their constant support.
Acknowledgements
I extend my heartfelt gratitude to my advisor, Professor Bhaskar Krishnamachari, whose transformative
guidance and profound insights have played a pivotal role in my growth as a researcher. His guidance style,
impressive patience, and life philosophy have been exemplary and deeply inspiring. The nurturing
intellectual environment he fostered has been a crucial source of motivation and aspiration for me. This
dissertation embodies the collective wisdom, encouragement, and support I have received during my
academic journey. I consider myself truly fortunate to have had such an outstanding advisor in my life.
I am deeply grateful to Professors Meisam Razaviyayn, Yue Zhao, and Cauligi Raghavendra for their
guidance, feedback, and collaboration, as well as their roles in my defense committee. My appreciation
extends to my dissertation committee members, whose expertise and insightful critiques have significantly
deepened and broadened my research. For my Qualification Exams, I owe much to Professors Meisam
Razaviyayn, Konstantinos Psounis, Leana Golubchik, and Ashutosh Nayyar, whose rigorous evaluations
and thoughtful insights have immensely refined my work. Special thanks to Dr. Mohammad Reza Rajati for
his continual support and guidance. Also, I am thankful to Professors Murali Annavaram and Salman
Avestimehr for their contributions.
I am deeply appreciative of my fellow researchers and colleagues at the Autonomous Networks
Research Group (ANRG), whose invaluable feedback, engaging discussions, and camaraderie have played a
pivotal role in shaping the concepts presented in this dissertation. Their diverse viewpoints and collective
insight have contributed meaningfully to this research endeavor.
The resources and support provided by the University of Southern California have been fundamental
to my research endeavors. I am especially thankful for the USC Center for Advanced Research Computing
(CARC), which has been vital for advancing my work.
My family’s endless love, encouragement, and belief in my capabilities have been the cornerstone of
my perseverance and success. Their unwavering support has consistently been a source of strength and
motivation.
I extend my sincere appreciation to all participants and contributors for dedicating their time and
insights to this research. Their engagement has been fundamental to the success of this work.
To conclude, I am profoundly grateful to everyone involved in this academic endeavor. Their
contributions have been vital to completing this dissertation and have greatly enriched my development as
a scholar.
Mehrdad Kiamari
University of Southern California
August 2024
Table of Contents
Dedication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ii
Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iii
List of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viii
List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ix
Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xii
Chapter 1: Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
Chapter 2: Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.1 Distributed Ledger Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.2 Distributed Iterative Processes on Networked Systems . . . . . . . . . . . . . . . . . . . . . 16
2.3 Scheduling Problems for Distributed Computing Systems . . . . . . . . . . . . . . . . . . . 18
2.4 Machine Learning Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
2.4.1 Multi-Layer Perceptrons . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
2.4.2 Convolutional Neural Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
2.4.3 Federated Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
2.4.4 Graph Convolutional Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.4.5 Kolmogorov-Arnold Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
Chapter 3: Related Works . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
3.1 Conventional Distributed Ledger and Consensus . . . . . . . . . . . . . . . . . . . . . . . . 26
3.2 Traditional Methods for Scheduling for Distributed Iterative Processes . . . . . . . . . . . 29
3.3 Machine Learning based Scheduling Schemes . . . . . . . . . . . . . . . . . . . . . . . . . 31
3.4 Graph Representation Learning Schemes . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
Chapter 4: Distributed Ledger and Consensus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
4.1 The System Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
4.1.1 Proposed Blizzard Scheme . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
4.1.2 Proposed GCN-Consensus Scheme . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
4.2 Safety and Liveness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
4.2.1 Safety Analysis of Blizzard . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
4.2.2 Safety Evaluation of GCN-Consensus . . . . . . . . . . . . . . . . . . . . . . . . . . 53
4.2.3 Liveness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
4.3 Performance Analysis of Blizzard . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
4.3.1 Throughput per Shard . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
4.3.2 Latency . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
4.3.3 Average Message Complexity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
4.4 Implementation and Experimental Measurements . . . . . . . . . . . . . . . . . . . . . . . 61
4.4.1 Throughput Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
4.4.2 Latency Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
4.4.3 Battery Energy Consumption . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
4.5 Discussions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
4.5.1 Mobile-Device Oriented Sybil Control . . . . . . . . . . . . . . . . . . . . . . . . . 64
4.5.2 Improving Scalability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
4.5.3 Safety under a Partially Synchronous Model . . . . . . . . . . . . . . . . . . . . . . 65
4.5.4 Connectivity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
4.5.5 Challenges in Real-World Implementation . . . . . . . . . . . . . . . . . . . . . . . 66
4.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
Chapter 5: Distributed Iterative Processes over Networked Systems . . . . . . . . . . . . . . . . . . 68
5.1 Problem Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
5.1.1 Gossip-based Federated Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
5.1.2 Problem Formulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
5.2 Proposed Semi-Definite Programming (SDP) Relaxation . . . . . . . . . . . . . . . . . . . . 77
5.2.1 Expected Value Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
5.2.2 Upper Bound on the Optimal Solution . . . . . . . . . . . . . . . . . . . . . . . . . 81
5.3 Numerical Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
5.3.1 Distributed Iterative Process with Pre-defined Settings . . . . . . . . . . . . . . . . 82
5.3.2 Gossip-based Federated Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
5.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
Chapter 6: Graph Neural Network based Scheduling Scheme . . . . . . . . . . . . . . . . . . . . . . 89
6.1 Problem Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
6.1.1 Objective 1: Makespan . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
6.1.2 Objective 2: Throughput . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
6.2 Proposed GCNScheduler Mechanism . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
6.2.1 Overview of GCNs for Directed Graphs . . . . . . . . . . . . . . . . . . . . . . . . 94
6.2.2 Proposed Input Graph for GCN Models . . . . . . . . . . . . . . . . . . . . . . . . . 95
6.2.3 Implementation and Training . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
6.3 Simulation Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
6.3.1 Makespan Minimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
6.3.2 Throughput Maximization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
6.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
Chapter 7: Graph Kolmogorov Arnold Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
7.1 Overview on Graph Convolutional Networks . . . . . . . . . . . . . . . . . . . . . . . . . . 107
7.2 Proposed GKAN Architectures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
7.2.1 GKAN Architecture 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
7.2.2 GKAN Architecture 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
7.3 Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
7.3.1 Comparison with GCN . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
7.3.2 Evaluating the Influence of Parameters on GKAN . . . . . . . . . . . . . . . . . . . 113
7.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
Chapter 8: Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
8.1 Future Directions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126
List of Tables
3.1 Comparison of Different Protocols. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
4.1 Notations Description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
4.2 Features for Consensus Mechanism . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
4.3 Comparison of Different Protocols. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
6.1 Makespan comparison of GCNScheduler, HEFT [13], READYS [128], and the random
scheduler for different numbers of tasks with a task graph width of 10. . . . . . . . . . . . 100
6.2 Time taken (in seconds) by GCNScheduler, HEFT [13], and READYS [128] to perform
scheduling for medium-scale synthetic task graphs with different numbers of tasks. . . . . 101
6.3 Makespan and time to find schedule (TTFS) (in seconds) for HEFT [13], READYS [128], and
GCNScheduler for real-world applications from [152, 150, 149, 148, 151]. . . . . . . . . . . 101
6.4 Throughput (TP) and time to find schedule (TTFS) in seconds for GCNScheduler (optimized
for throughput), Throughput (TP)-HEFT algorithm [19], and READYS [128] for medium-size
synthetic task graphs with varying numbers of tasks. . . . . . . . . . . . . . . . . . . . . . 103
6.5 Throughput and time to find schedule (TTFS) in milliseconds for TP-HEFT [19],
READYS [128], and GCNScheduler for real-world applications. . . . . . . . . . . . . . . . . 104
6.6 Throughput of GCNScheduler (with the throughput-maximization objective) and the
random task scheduler for large-scale task graphs. . . . . . . . . . . . . . . . . . . . . . . . 104
6.7 Time taken (in milliseconds) by GCNScheduler (with throughput objective) to schedule
large-scale task graphs. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
7.1 Architectures and their performance on the first 100 features of the Cora Dataset. . . . . . 112
7.2 Architectures and their performance on the first 200 features of the Cora Dataset. . . . . . 112
7.3 Range of values for the spline polynomial degree k, the spline grid size g, and the
number of hidden nodes h, with default values in bold . . . . . . . . . . . . . 114
List of Figures
2.1 Illustration of the convolution process in a 2-layer GCN. Each node updates its embedding
by aggregating information from its neighbors across multiple layers. . . . . . . . . . . . . 22
4.1 An illustration on how Blizzard works for k = 2. Mobile node 1 queries brokers B2 and B3
on a new transaction these brokers have not queried yet (shown with blue arrows). Then,
these brokers query all their connected mobile nodes about the transaction (depicted with
orange arrows). Afterwards, all connected mobile nodes respond back to queries (shown
with green arrows) and brokers reflect the majority vote to all of the connected nodes
(presented by dashed-black arrows). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
4.2 Proposed distributed random matching scheme for connecting each mobile node to k brokers. 41
4.3 An illustration of how node embeddings are formed in a two-layer GCN-Consensus. . . . . 44
4.4 An illustration of safety-guaranteed region (indicated with yellow color) of Blizzard
protocol through simulation for different number of connections of each mobile node,
i.e. k, and different number of brokers, i.e. m while fixing total number of mobile nodes
n = 2000 and 1000 iterations. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
4.5 Percentage of reaching consensus by GCN-Consensus scheme versus Byzantine ratio of
nodes for test graphs with 100 nodes. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
4.6 Required number of rounds to reach consensus by GCN-Consensus scheme versus
Byzantine ratio of nodes for test graphs with 100 nodes and setting conviction threshold to
15. Note that 0 means that it does not reach consensus. . . . . . . . . . . . . . . . . . . . . 57
4.7 The pipeline modeling of different components of Blizzard to acquire throughput. We use
orange for computing components and green for communication components in our box
diagrams . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
4.8 The total number of messages versus the lowest number of communication rounds (LNCR)
in Blizzard compared with Avalanche for the following setting: n = 500, k = 3, and m is
varied from 4 to 19. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
4.9 The left and right plots respectively represent the histogram of transaction latency of our
proposed scheme (Blizzard) compared with Avalanche [7] for the case of 100 transactions,
security parameters β1 = 11 and β2 = 150. The left histogram corresponds to the settings
of 100 nodes, 8 brokers, and 3 connections per node; while the right figure corresponds to
the case of 200 nodes, 11 brokers, and 6 connections per node. . . . . . . . . . . . . . . . . . 62
5.1 An illustration of a task graph consisting of five tasks. . . . . . . . . . . . . . . . . . . . . . 71
5.2 An illustration of a compute graph consisting of three machines with corresponding
bandwidths shown on edges. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
5.3 The corresponding DAG of the task graph GT ask depicted in Fig. 5.1 to be used with
HEFT-based schemes. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
5.4 Bottleneck time of different schemes: HEFT [13], Throughput HEFT [19], SDP method with
naive rounding, our proposed scheme, and the upper bound of our proposed approach, for
various numbers of tasks. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
5.5 Comparison of bottleneck time across different schemes: HEFT [13], Throughput HEFT
[19], SDP method with naive rounding, our proposed scheme, and the upper bound of our
proposed approach, for varying degrees of the task graphs. . . . . . . . . . . . . . . . . . . 85
5.6 Bottleneck time for executing gossip-based federated learning of T = 10 tasks on K = 4
distributed machines using four different schemes: HEFT [13], Throughput HEFT (TP) [19],
the SDP method with naive rounding, and our proposed scheme (SDP with randomized
rounding). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
6.1 Illustrations of a task graph and distributed computing systems. . . . . . . . . . . . . . . . 91
6.2 An illustration of the designed input graph fed into the EDGNN model for the task graph
shown in Fig. 6.1a. Node and edge features are represented with brown and green colors,
respectively. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
6.3 Task graphs of the three perception applications considered in [152]. . . . . . . . . . . . 99
6.4 Makespan of GCNScheduler and the random scheduler for large-scale task graphs with
different numbers of tasks and different EP. . . . . . . . . . . . . . . . . . . . . . . . . . . 102
6.5 Inference time of our GCNScheduler for large-scale task graphs with varying numbers of
tasks and different edge probabilities (EP). . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
7.1 Overview of a two-layer GCN [49] architecture. . . . . . . . . . . . . . . . . . . . . . . . . 108
7.2 Overview of a two-layer GKAN Architecture 1. . . . . . . . . . . . . . . . . . . . . . . . . 109
7.3 Overview of a two-layer GKAN Architecture 2. . . . . . . . . . . . . . . . . . . . . . . . . 110
7.4 Accuracy comparison of GCN and GKAN architectures for k = 1 and g = 3. . . . . . . . . 113
7.5 Loss value comparison of GCN and GKAN architectures for k = 1 and g = 3. . . . . . . . 114
7.6 Accuracy comparison of GKAN Architecture 2 for g ∈ {3, 7, 11}, k = 1, and h = 16. . . . 115
7.7 Loss values of GKAN Architecture 2 for g ∈ {3, 7, 11}, k = 1, and h = 16. . . . . . . . . . 115
7.8 Accuracy of GKAN Architecture 2 for k ∈ {1, 2, 3}, g = 3, and h = 16. . . . . . . . . . . . 116
7.9 Loss values of GKAN Architecture 2 for k ∈ {1, 2, 3}, g = 3, and h = 16. . . . . . . . . . . 116
7.10 Accuracy comparison of GKAN Architecture 2 for h ∈ {8, 12, 16}, g = 3, and k = 1. . . . 117
7.11 Accuracy comparison of GKAN Architecture 2 for h ∈ {8, 12, 16}, g = 3, and k = 1 over
600 epochs. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
7.12 Loss values of GKAN Architecture 2 for h ∈ {8, 12, 16}, g = 3, and k = 1. . . . . . . . . . 118
Abstract
This thesis investigates the evolving challenges and opportunities within distributed computing and
communications, emphasizing the optimization of performance, security, and efficiency. It is structured
into interconnected chapters, each addressing key aspects of distributed systems research. To begin with, it
focuses on robust consensus mechanisms for mobile distributed systems, crucial for maintaining the
integrity and reliability of decentralized networks. This includes the introduction of Blizzard, the first
mobile-based consensus protocol for distributed ledgers, as well as the novel application of graph
convolutional networks (GCNs) in managing consensus. In the next chapter, the focus shifts to minimizing
bottleneck time for distributed iterative processes. A new task scheduling scheme for such processes on
networked distributed systems is introduced, initially modeled as a Binary Quadratic Program and
subsequently refined through a Semi-Definite Program with randomized rounding. This optimized method,
when applied to gossip-based federated learning, significantly outperforms traditional scheduling methods
by dramatically reducing bottleneck times. Lastly, it presents groundbreaking scheduling schemes for
distributed resources, focusing on the "GCNScheduler," which is the first GCN designed to optimize task
scheduling. The GCNScheduler reduces scheduling times by several orders of magnitude while
facilitating efficient task execution across a range of applications. In addition to the GCNScheduler,
we introduce Graph Kolmogorov-Arnold Networks (GKAN) as a general-purpose graph learning
architecture. Overall, this thesis
advances distributed computing and graph neural networks by introducing new methodologies that
enhance communication efficiency and computational performance, supporting the next generation of
computing infrastructure to meet growing data and computational demands.
Chapter 1
Introduction
The growing complexity and interconnectivity of digital systems have ushered in unprecedented
challenges and opportunities in the field of distributed computing and communications. This thesis
addresses several critical facets of this dynamic landscape, structured into three cohesive chapters, each
tackling distinct but interrelated topics at the forefront of current research in distributed systems. The
unifying thread across these chapters is the pursuit of optimizing performance, security, and efficiency in
consensus mechanisms for distributed systems, SDP-based scheduling scheme for distributed computing
systems, and finally proposing graph neural network based task scheduling scheme as well as introducing
new graph neural networks.
In Chapter 4, we focus on the safety and efficiency of distributed systems, in particular the
development of robust consensus mechanisms for distributed mobile nodes, an area critical for maintaining
the integrity and reliability of decentralized networks. This chapter provides a detailed presentation of two
key studies that have substantially advanced consensus for mobile systems.
It is estimated that the global number of mobile device users exceeds 4 billion [1], with the mobile
ecosystem continuing to expand [2]. This significant user base motivates our exploration into a
mobile-first consensus protocol for distributed ledger management. Contrary to prior studies that
concentrated on the security and privacy issues of mobile devices merely functioning as transaction
transmitters and receivers [3, 4], our research investigates the feasibility of utilizing these devices more
actively in the consensus process, specifically in transaction validation. Although earlier research primarily
utilized mobile devices as clients due to their limited resources compared to larger computing servers, it is
worth noting that modern mobile devices now boast substantial computational, communication, and
storage capabilities. Additionally, given the ongoing advancements in mobile technology as predicted by
Moore’s law [5], these capabilities are expected to further improve.
We introduce a mobile-based distributed ledger maintenance consensus protocol named Blizzard.
Described briefly in [6], Blizzard is a leaderless consensus protocol where mobile nodes interact with
online servers, known as brokers. Instead of maintaining connections with the entire network, mobile
nodes manage connections with a scalable number of broker-associated end-addresses. While non-mobile
devices can also serve as validators, Blizzard is distinctive in explicitly supporting mobile device-based
validators that can function intermittently. Each mobile node is linked to a subset of servers for specified
periods, facilitating communication within each broker’s mobile node group, thus forming an effective
broadcast network. Blizzard is designed to perform efficiently across a range of mobile devices,
accommodating various operating systems, hardware capabilities, and communication technologies. We
provide a mathematical proof of Blizzard’s safety, specifying the ratios of Byzantine nodes and brokers that
ensure the protocol’s reliability. The conditions for maintaining liveness are also discussed. The
throughput of Blizzard is analytically determined by modeling the transaction processing sequence at each
node and identifying empirical bottlenecks. We further analyze the protocol’s confirmation latency and
message complexity. Using brokers, Blizzard achieves an efficient consensus communication structure,
allowing transactions to be propagated within just four communication rounds under optimal settings.
Experimental results from Blizzard’s software implementation demonstrate a capacity for handling 10,000
transactions per second per shard, with confirmation times under one second.
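To convey the flavor of this broker-mediated query flow, the following is a deliberately simplified, illustrative round of majority voting through brokers. The node counts, the single-round structure, and the naive majority rule used here are hypothetical simplifications; the actual protocol, its security parameters, and its repeated-query dynamics are specified in Chapter 4.

import random
random.seed(1)

num_nodes, num_brokers, k = 30, 5, 2      # each mobile node connects to k brokers
node_to_brokers = {n: random.sample(range(num_brokers), k) for n in range(num_nodes)}
broker_to_nodes = {b: [n for n, bs in node_to_brokers.items() if b in bs]
                   for b in range(num_brokers)}

# every node holds a preference between two conflicting transactions
prefs = {n: random.choice(["tx_a", "tx_b"]) for n in range(num_nodes)}

def query_round(origin):
    """One simplified round: the origin node queries its brokers; each broker polls
    its connected mobile nodes and reports the majority preference it observed."""
    broker_votes = []
    for b in node_to_brokers[origin]:
        sample = [prefs[n] for n in broker_to_nodes[b]]
        broker_votes.append(max(set(sample), key=sample.count))   # broker-level majority
    return max(set(broker_votes), key=broker_votes.count)         # overall majority

prefs[0] = query_round(0)    # the querying node adopts the majority outcome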
In the second work, we introduce a groundbreaking approach by using graph neural networks (GNNs)
to manage consensus in gossip-based protocols, exploiting their ability to encapsulate complex functions
crucial for network-based querying processes. Our method’s architecture utilizes the natural proficiency of
GNNs to handle and interpret relational data among nodes, which consistently exchange and update
information. Within this framework, each node is characterized by a set of features that depict its current
state and dynamically affect and shape the outcomes of its interactions with adjacent nodes. This study
suggests that the GNN’s capacity to assimilate and learn from the complex dynamics of these interactions
makes it perfectly suited for navigating the complexities inherent in gossip-based consensus mechanisms.
To demonstrate this approach, we implement our GNN-based consensus model in the Avalanche protocol
environment [7] to show its capability in learning and adapting to the complex requirements of consensus
processes. Our results reveal that GNNs are exceptionally proficient at mastering these complex relational
functions and also enhance the protocol’s decision-making by incorporating additional data such as the
risk associated with each entity. This leads to a more informed network analysis, improving the protocol’s
consensus-reaching capabilities.
In Chapter 5, we shift our focus to a semi-definite programming based scheduling scheme for
distributed computing systems. For the emerging wave of applications such as the Internet-of-Things (IoT) and
mobile data, training Machine Learning (ML) models may need to be performed in a distributed fashion for
reasons such as data privacy, i.e., not allowing one centralized entity to have access to data from many
sources. This has given rise to Federated Learning (FL) frameworks, which aim at preserving data privacy.
Another reason for training ML models in a distributed manner is the massive computation demanded by
their growing scale [8, 9]; for instance, the computation required for deep neural networks increases
remarkably as the number of layers and hidden nodes grows. Careful allocation of ML model processing
across distributed and networked computers therefore plays a crucial role in significantly reducing the
execution time. (High-performance ML training with privacy guarantees can also be achieved on trusted
distributed computing platforms by utilizing recently developed techniques [10].)
Distributed ML applications such as Federated Learning (FL), where model parameters are exchanged
after a certain number of iterations, fall under the umbrella of distributed iterative processes [11].
Distributed iterative processes consist of multiple tasks with a given inter-task data dependency structure,
i.e. each task generates inputs for certain other tasks. Such a distributed iterative process can be described
by a directed graph, known as task graph, where vertices represent tasks of the process and edges indicate
the inter-task data dependencies.
In each iteration of a distributed iterative process, every task must be executed and its processed data
must be transferred to the computing resources on which its successor tasks are located. The time taken by
the task with the dominant combined computational time (to execute the task) and communication time (to
transfer its processed data), referred to as the bottleneck time, determines the duration of an iteration. Since
the total time required to execute an iterative process for a given number of iterations equals the sum of the
times needed to complete each iteration, minimizing the bottleneck time directly reduces the completion
time of the entire process.
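To make this notion concrete, the following is a minimal formalization under simplifying assumptions; the symbols are illustrative and not necessarily the notation used later in Chapter 5. Suppose each task t of the task graph is assigned to a machine m(t), requires computation time p_{t,m(t)} there, and needs time q_{t,m(t)} to transfer its processed data to the machines hosting its successor tasks. Then

\[
B \;=\; \max_{t}\,\bigl(p_{t,m(t)} + q_{t,m(t)}\bigr),
\qquad
T_{\text{total}} \;=\; \sum_{i=1}^{I} B \;=\; I \cdot B,
\]

where B is the bottleneck time of a single iteration and T_total is the completion time of I iterations under a fixed assignment; minimizing B therefore minimizes the completion time of the whole process.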
Bottleneck time minimization can be achieved through efficient task scheduling where tasks of an
iterative process are assigned to appropriate distributed computing resources to be executed. Most prior
task scheduling schemes are tailored to a particular class of task graphs called Directed Acyclic Graph
(DAG) [12, 13, 14, 15], while the task graphs of distributed iterative processes that we consider belong to a
broader class of directed graphs (with or without cycles). Furthermore, a significant number of existing task
scheduling schemes (e.g., [13, 16, 17, 18]) have focused on minimizing makespan, i.e., the time it takes to
finish the execution for one set of inputs, which is meaningful only for a DAG-based task graph. Only a few
works have investigated minimizing bottleneck time (or, equivalently, maximizing throughput), such as [19].
The underlying methodology for task scheduling can be categorized into heuristic-based algorithms
(e.g., [20, 21]), meta-heuristic ones (e.g., [22, 23, 24, 25, 26, 27]), and optimization-based schemes (e.g., [28,
29]). One of the most well-known heuristic task scheduling schemes is HEFT [13], which we consider as
one of our benchmarks. Although heuristic schemes tend to be fast, they often get stuck in a local
optimum; meta-heuristic and optimization-based schemes have therefore gained significant attention, as
they are practically able to obtain a solution near the optimum [30, 31, 32]. Since task scheduling is
inherently NP-hard [30] (owing to the size of the solution space and the time required to obtain an optimal
solution), an appropriate relaxation of the optimization problem, such as a Convex-based one, can
potentially lead to efficacious performance [33]. Nevertheless, most
Convex-based task scheduling schemes (e.g. [28, 29]) have focused on the case where there is no inter-task
data dependency and the communication delay across distributed resources is negligible compared to
computational time. However, there are reasons to explore other parts of the design space: processors
today are remarkably faster, while distributed processing may operate across a wireless network with low
bandwidth or a wide-area network with high delays.
In Chapters 6 and 7, we present a new graph neural network based scheduling scheme for distributed
computing and introduce new graph neural networks for general purposes, respectively. In the first of these studies,
we explore groundbreaking scheduling schemes essential for the efficient allocation and execution of
computational tasks across distributed resources. Central to this discussion is the "GCNScheduler," a
game-changing approach, detailed in Chapter 6, which leverages graph convolutional
networks to optimize task distribution in scenarios ranging from the Internet of Robotic Things to sensor
networks. Follow-up studies and extensions, such as those discussed in "Graph Convolutional
Network-based Scheduler for Distributing Computation in the Internet of Robotic Things" and "Sensor
Selection and Task Scheduling," further refine and adapt this innovative scheduling framework to a variety
of complex, real-world environments.
Successfully running complex graph-based applications, from edge-cloud processing in IoT systems
[34, 35] to astronomical data analysis [36], hinges on the effective execution of all application components
via proficient task scheduling. Efficient task scheduling is critical not only for optimizing the use of
computing resources and minimizing task execution time but also for potentially generating significant
profits for service providers [37]. Within this structure, an application consists of multiple tasks, each
linked by a data dependency structure represented as a directed acyclic graph (DAG), or task graph, where
nodes and edges denote tasks and data dependencies respectively. A job is completed once all tasks are
executed by compute machines in accordance with these dependencies.
Task schedulers typically aim to optimize either of the following two key metrics: makespan or
throughput. Makespan refers to the time required to complete all tasks for a single input, while throughput
is the maximum rate at which inputs can be processed continuously. Both makespan reduction and
throughput enhancement are achievable through effective task-scheduling algorithms that allocate tasks to
distributed computing resources for execution.
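As a rough formalization (with illustrative notation, not necessarily that used in Chapter 6): if F_t denotes the finish time of task t under a given schedule for one input, and L_k denotes the per-input workload (computation plus outgoing communication) placed on machine k, then

\[
\text{makespan} \;=\; \max_{t}\, F_t ,
\qquad
\text{throughput} \;\approx\; \Bigl(\max_{k}\, L_k\Bigr)^{-1},
\]

i.e., makespan measures how long a single input takes end to end, whereas throughput is limited by the most heavily loaded (bottleneck) machine when inputs are processed in a pipelined, steady-state fashion.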
Task scheduling strategies can be divided into heuristic-based (e.g., [20, 21]), meta-heuristic (e.g., [22,
23, 24, 25, 26, 27]), and optimization-based methods (e.g., [28, 29]). Among these, the heterogeneous
earliest-finish time (HEFT) algorithm [13] is a well-known heuristic method for minimizing makespan, and
for throughput maximization, the TP-HEFT algorithm [19] serves as a benchmark.
A major limitation of these traditional scheduling methods is their reduced effectiveness in larger
settings; they become computationally demanding as task graphs grow in size. This is particularly relevant
in the context of large language models, where the training and inference tasks are inherently complex and
distributed across multiple systems. As these models or their associated datasets span vast distributed
networks, efficient task scheduling becomes crucial. GCNScheduler can play a critical role in optimizing
these processes by reducing the time required to generate responses to prompts, perform initial model
training, or execute fine-tuning operations. This optimization is essential for maintaining prompt response
times and effective resource utilization, particularly as the scale and complexity of language models
continue to grow. This is also relevant as applications in domains such as IoT for smart cities involve
increasingly complex interdependencies among numerous tasks, with scheduling often required to adapt
frequently due to network or resource changes [38, 39]. Therefore, there is a critical need for a quicker task
scheduling method for large-scale task graphs.
An innovative alternative involves applying machine learning for function approximation in task
scheduling. Given the graphical nature of application structures, we propose the use of a graph
convolutional network (GCN) [40] designed to learn from the task graph’s inter-task dependencies and
network configurations, such as execution speeds and communication bandwidths between compute
machines. GCNs have gained significant attention for their efficacy in various graph-based applications
including semi-supervised link prediction [41] and node classification [40]. GCNs create node embeddings
through a layer-by-layer process, where each node’s embedding is updated by aggregating the embeddings
of its neighbors, followed by a neural transformation and non-linear activation. For node classification
tasks, the final layer’s embeddings are processed through a softmax operator to predict node labels,
allowing for end-to-end learning of GCN parameters. GCNs can be categorized into spectral-based methods,
which require matrix decompositions and face scalability issues [42], and spatial-based methods, which
avoid such complexities through message passing [43].
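For reference, a widely used layer-wise propagation rule for such GCNs can be written as

\[
H^{(l+1)} \;=\; \sigma\!\bigl(\tilde{D}^{-1/2}\,\tilde{A}\,\tilde{D}^{-1/2}\, H^{(l)}\, W^{(l)}\bigr),
\qquad
\tilde{A} = A + I,\;\; \tilde{D}_{ii} = \textstyle\sum_j \tilde{A}_{ij},
\]

where A is the adjacency matrix, H^{(l)} stacks the node embeddings at layer l (with H^{(0)} being the input features), W^{(l)} is a learnable weight matrix, and σ is a non-linear activation; for node classification, the final layer's output is passed through a row-wise softmax.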
Finally, in Chapter 7, we aim at advancing graph neural networks for general
purposes. The landscape of deep learning has witnessed transformative advancements in recent years,
particularly in the development of methodologies that effectively handle graph-structured data—a crucial
element in applications like recommendation systems that utilize intricate user-to-item interaction and
social graphs [44, 45, 40, 46, 47, 48]. Among the notable innovations, Graph Convolutional Networks
(GCNs) have emerged as a paradigm-shifting architecture [45, 40, 46, 47, 49]. GCNs harness the power of
neural networks to iteratively aggregate and transform feature information from local graph
neighborhoods [49]. This method enables a robust integration of both content and structural data from
graphs, setting new benchmarks across various recommender system applications [45]. Further
improvements in accuracy are essential across many domains, including large-scale networks, to address
the limitations of GCNs.
At their core, GCNs are based on Multi-layer Perceptrons (MLPs), which are foundational to modern
deep learning frameworks and essential for their robust capability to approximate nonlinear functions—a
trait anchored in the universal approximation theorem [50]. Despite their widespread use and critical role
in contemporary models, MLPs encounter notable limitations, such as significant consumption of
non-embedding parameters in transformers [51] and limited interpretability unless additional post-analysis
tools are employed [52]. In response, [53] have recently introduced an innovative alternative, the
Kolmogorov-Arnold Networks (KAN), inspired not by the universal approximation theorem like MLPs but
by the Kolmogorov-Arnold representation theorem [54, 55, 56]. Unlike MLPs, which utilize learnable
weights on the edges and fixed activation functions on nodes, KANs deploy learnable univariate functions
on the edges and simple summations on the nodes. Each weight in a KAN is thus a learnable
one-dimensional function, shaped as a spline, allowing for significantly smaller computation graphs
compared to those required by MLPs. Historically, neural network designs based on the
Kolmogorov-Arnold theorem have primarily adhered to a depth-2, width-(2n + 1) model [57], lacking in
modern training enhancements such as backpropagation. KANs, however, allow the stacking of
KAN-layers to create deep learning networks that can be trained using backpropagation.
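For context, the Kolmogorov-Arnold representation theorem underlying KANs states that any continuous function of n variables on a bounded domain can be written as

\[
f(x_1,\dots,x_n) \;=\; \sum_{q=1}^{2n+1} \Phi_q\!\Bigl(\sum_{p=1}^{n} \phi_{q,p}(x_p)\Bigr),
\]

where the Φ_q and φ_{q,p} are continuous univariate functions; this is precisely the depth-2, width-(2n + 1) form mentioned above. KANs replace these fixed univariate functions with learnable splines placed on network edges and stack multiple such layers, so the resulting composition can be trained end to end with backpropagation.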
1.1 Contributions
In this dissertation, we introduce a series of innovative contributions to the advancement of distributed
computing and graph neural networks. Our research is founded on extensive analysis and the
development of novel methods designed to enhance the performance of distributed computing and graph
representation learning. The key contributions of our research are outlined as follows:
• Distributed Consensus Protocol for Mobile Devices: We introduce Blizzard, a Byzantine Fault
Tolerant (BFT) distributed ledger protocol designed to integrate mobile devices as primary
participants in the consensus process. Blizzard features an innovative two-tier architecture where
mobile nodes engage through online brokers, coupled with a decentralized matching mechanism to
ensure connections between nodes and a set number of random brokers. We provide a mathematical
analysis that establishes a guaranteed safety region, defining the proportions of malicious nodes and
brokers under which the protocol remains secure. Additionally, Blizzard’s performance is evaluated
in terms of throughput, latency, and message complexity. Our experimental results from a software
implementation demonstrate that Blizzard achieves throughput of several thousand transactions per
second per shard, with sub-second confirmation times.
• GCN-based Consensus Scheme: We present an innovative consensus protocol built on Graph
Convolutional Networks (GCN), specifically tailored to boost the efficiency and security of
blockchain technologies. Utilizing the structural data processing power of GCNs, our model
addresses the scalability and performance bottlenecks commonly found in blockchain networks. We
concentrate on validating the compatibility of our GCN-based protocol with the Avalanche protocol,
which is well-known for its efficient handling of conflicting transactions. Through comprehensive
simulations, we demonstrate that our GCN-based consensus protocol effectively emulates the
Avalanche approach, thereby maintaining strong safety and reliability in the resolution of
transaction conflicts.
• Bottleneck Time Minimization for Distributed Iterative Processes on Networked
Computers: We introduce a new task scheduling framework designed to speed up computational
applications that rely on distributed iterative processes executed across networked computing
resources. Each application involves multiple tasks that generate data with each iteration, which
must be processed by adjacent tasks, forming a structure that can be represented as a directed graph.
Initially, we model the problem as a Binary Quadratic Program (BQP), considering both
computational and communication costs, and establish that the problem is NP-hard. Subsequently,
we relax the formulation to a Semi-Definite Program (SDP) and apply a randomized rounding
method using a multi-variate Gaussian distribution. We also calculate the expected bottleneck time.
This scheduling scheme is then implemented in gossip-based federated learning to demonstrate its
effectiveness in real-world iterative processes. Our numerical tests on the MNIST and CIFAR-10
datasets reveal that our method surpasses established distributed computing scheduling techniques.
• Scheduling Distributed Computing Applications using Graph Convolutional Networks: We
present a highly efficient solution to the traditional challenge of scheduling task graphs for complex
applications in distributed computing systems. Previously, various heuristics were developed to
optimize task scheduling based on different metrics such as makespan and throughput, but they
often proved too slow for large-scale and dynamic systems. To address these limitations, we
introduce a novel approach: the Graph Convolutional Network-based Scheduler (GCNScheduler).
This method integrates the inter-task data dependencies and the computational network into a
unified input graph, allowing for rapid and efficient task scheduling for different objectives. Our
simulations show that the GCNScheduler not only learns from existing scheduling methods quickly
but also scales effectively to large systems where traditional methods fail to do so. We also validate
the ability of the GCNScheduler to generalize to unseen real-world applications, demonstrating that
it matches the makespan and throughput of benchmark methods while taking several orders of
magnitude less time in finding the schedule.
• Graph Kolmogorov-Arnold Networks: We introduce Graph Kolmogorov-Arnold Networks
(GKAN), a novel neural network architecture that adapts the principles of Kolmogorov-Arnold
Networks (KAN) to graph-structured data. By leveraging the unique properties of KANs, specifically
the use of learnable univariate functions in place of fixed linear weights, we create a powerful model
tailored for graph-based learning tasks. Unlike traditional Graph Convolutional Networks (GCNs),
which rely on a fixed convolutional framework, GKANs employ learnable spline-based functions
between layers, revolutionizing information processing within graph structures.
We propose two methods for integrating KAN layers into GKAN: in the first architecture, learnable
functions are applied to input features after aggregation, and in the second, they are applied before
aggregation. Our empirical evaluation on the Cora dataset, using a semi-supervised graph learning
task, demonstrates that GKANs generally outperform traditional GCNs. For instance, with 100
features, a GCN achieves an accuracy of 53.5%, while a GKAN with a comparable number of
parameters achieves 61.76%; with 200 features, a GCN reaches 61.24%, whereas a GKAN achieves
67.66%. We also explore the effects of various parameters, such as the number of hidden nodes, grid
size, and the polynomial degree of the spline, on the performance of GKAN.
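The following is a minimal, illustrative PyTorch sketch of the two layer orderings just described. SimpleKANLayer is a toy polynomial stand-in for a learnable univariate (spline-based) KAN layer, and the snippet is a schematic of the idea rather than the exact architectures evaluated in Chapter 7.

import torch
import torch.nn as nn

class SimpleKANLayer(nn.Module):
    """Toy stand-in for a KAN layer: each (input i, output j) pair gets its own
    learnable univariate function, modeled here as a low-degree polynomial
    instead of the B-splines used in actual KANs."""
    def __init__(self, in_dim, out_dim, degree=3):
        super().__init__()
        # coeffs[j, i, d] multiplies x_i ** d in the function feeding output j
        self.coeffs = nn.Parameter(0.1 * torch.randn(out_dim, in_dim, degree + 1))
        self.degree = degree

    def forward(self, x):                                  # x: (num_nodes, in_dim)
        powers = torch.stack([x ** d for d in range(self.degree + 1)], dim=-1)
        # evaluate every univariate function and sum contributions over the inputs
        return torch.einsum('nid,oid->no', powers, self.coeffs)

def gkan_layer_arch1(x, adj_norm, kan):
    """Architecture 1 (illustrative): aggregate neighbor features first,
    then apply the learnable univariate functions."""
    return kan(adj_norm @ x)

def gkan_layer_arch2(x, adj_norm, kan):
    """Architecture 2 (illustrative): apply the learnable univariate functions
    first, then aggregate the transformed features over neighbors."""
    return adj_norm @ kan(x)

# toy usage: 5 nodes, 4 input features, 2 output channels
x = torch.randn(5, 4)
adj_norm = torch.eye(5)            # stand-in for a normalized adjacency matrix
kan = SimpleKANLayer(4, 2)
h1, h2 = gkan_layer_arch1(x, adj_norm, kan), gkan_layer_arch2(x, adj_norm, kan)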
To summarize, these contributions mark significant advancements in distributed computing and graph
neural networks. Through thorough analysis and the development of innovative methodologies, our
research enhances the performance and scalability of both distributed computing systems and graph
representation learning.
Chapter 2
Background
The background chapter of this dissertation lays the groundwork for understanding the core concepts and
technologies central to our study, particularly focusing on distributed computing systems. It emphasizes
the optimization of performance, security, and efficiency, and explores the application of machine learning
to enhance these systems. Additionally, we provide the foundational background for Graph
Kolmogorov-Arnold Networks (GKAN) for general purposes. Initially, we delve into the distributed ledger
for mobile devices, discussing its significance and architecture. We then examine the nature of distributed
iterative processes and scheduling problems, highlighting their impact on distributed systems and the
challenges they present in maintaining efficient and robust operations. The core of this chapter is dedicated
to an in-depth examination of various machine learning and deep learning models, including Multi-Layer
Perceptrons (MLP), Convolutional Neural Networks (CNN), Federated Learning (FL), Graph Convolutional
Networks (GCN), and Kolmogorov-Arnold Networks (KAN). For each model, we discuss its foundational
principles and relevance to distributed computing and graph representation learning. This discussion aims
to provide the reader with a thorough understanding of the current state of machine learning and its
applications to distributed computing problems, setting the stage for our investigation into innovative
strategies to advance distributed computing and graph representation learning.
2.1 Distributed Ledger Systems
Distributed ledger systems (DLS) function as decentralized databases, maintained by multiple participants
spread across various locations. Unlike traditional centralized databases, distributed ledgers enable data to
be recorded, shared, and synchronized across numerous nodes without requiring a central authority. This
technology forms the backbone of cryptocurrencies like Bitcoin and Ethereum, but its utility extends far
beyond digital currencies, influencing sectors such as finance, supply chain management, healthcare, and
more. The key concepts and technologies of distributed ledger systems (DLS) include the following:
• Decentralization and Consensus Mechanisms: Distributed ledgers function on a peer-to-peer
network where every node retains a copy of the ledger. Consensus mechanisms are essential to
ensure all ledger copies are synchronized and unanimously agreed upon by participants. Prominent
consensus criteria include Proof of Work (PoW), employed by Bitcoin [58], and Proof of Stake (PoS),
used by Ethereum 2.0 [59]. These mechanisms prevent double-spending and uphold the ledger’s
integrity.
• Blockchain Structure: A blockchain is a specific type of distributed ledger where transactions are
organized into blocks, which are linked sequentially. Each block contains a cryptographic hash of
the previous block, a timestamp, and transaction data, creating an immutable chain of records. This
structure enhances transparency and security [60]; a simple illustrative sketch of this block linking is given after this list.
• Smart Contracts: Smart contracts are self-executing contracts with terms directly embedded into
code. These contracts automatically enforce and execute terms when predefined conditions are met,
minimizing the need for intermediaries and boosting efficiency. Ethereum is a notable platform that
supports smart contracts [61].
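As a simple, generic illustration of the block linking described under "Blockchain Structure" above (a toy sketch, not tied to any particular ledger; function and field names are illustrative), each block stores the hash of its predecessor, so tampering with any historical block invalidates every subsequent link:

import hashlib, json, time

def block_hash(block):
    # hash the block's contents deterministically
    return hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()

def make_block(prev_hash, transactions):
    return {"prev_hash": prev_hash, "timestamp": time.time(), "transactions": transactions}

# a tiny chain: genesis block followed by two blocks of transactions
chain = [make_block("0" * 64, ["genesis"])]
for txs in (["alice->bob:5"], ["bob->carol:2"]):
    chain.append(make_block(block_hash(chain[-1]), txs))

# verification: every block must reference the hash of the block before it
assert all(chain[i]["prev_hash"] == block_hash(chain[i - 1]) for i in range(1, len(chain)))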
Applications of Distributed Ledger Systems (DLS) include:
• Finance: Distributed ledger systems are transforming the financial sector by improving the
efficiency, security, and transparency of transactions. DLS streamline international transactions by
eliminating intermediaries, reducing the associated time and costs. This is especially beneficial for
global trade settlements, where traditional processes can be slow and expensive. DLS also provide a
transparent and immutable record of asset ownership and transfers, crucial for managing assets like
stocks, bonds, real estate, and digital assets. They simplify tracking and transferring assets, reducing
fraud, and enhancing trust. Platforms such as Ethereum enable the creation and management of
tokenized assets. By implementing DLS, trading platforms can ensure the integrity and transparency
of trade processes, automate compliance, reduce fraud risks, and enable real-time trade settlement,
enhancing market efficiency.
• Supply Chain Management: Distributed ledger systems significantly enhance supply chain
operations by ensuring transparency, authenticity, and traceability. They enable detailed tracking of
goods from production to end consumers, crucial for industries like luxury goods, pharmaceuticals,
and food products, where authenticity is key. Blockchain ensures transparent recording of all
transactions, reducing counterfeiting risks. By providing a single, immutable source of truth for all
supply chain transactions, DLS improve transparency. Real-time information on the status and
location of goods is accessible to stakeholders, leading to better decision-making and increased trust.
Regarding operational optimization, companies like IBM and Walmart have adopted blockchain
solutions for supply chain optimization. For instance, IBM’s Food Trust blockchain provides
end-to-end visibility in the food supply chain, enhancing food safety and reducing waste. Walmart
uses blockchain to track produce freshness, ensuring higher quality for consumers.
• Healthcare: In healthcare, distributed ledger systems enhance the security, interoperability, and
accuracy of patient data management. DLS provide a secure framework for storing and sharing
patient records, ensuring data encryption and access only for authorized parties. This enhances
patient privacy and data security, reducing risks of data breaches and unauthorized access.
Blockchain facilitates healthcare data interoperability across different systems and providers. Using a
standardized, immutable ledger, patient information can be seamlessly shared among hospitals,
clinics, and other healthcare entities, leading to better-coordinated care and improved patient
outcomes. By providing an immutable record of patient histories, DLS improve the accuracy and
reliability of medical records, ensuring healthcare providers access accurate patient information for
effective diagnosis and treatment.
The challenges of distributed ledger systems include:
• Scalability: A major challenge for distributed ledger systems is scalability. As transaction volumes
increase, the computational resources and time required for processing also escalate. To mitigate
these issues, techniques such as sharding and off-chain transactions are being investigated. Sharding
involves splitting the ledger into smaller, manageable pieces, while off-chain transactions handle
some processes outside the main blockchain to reduce load [62].
• Security: Despite the decentralized and secure nature of distributed ledgers, they are not
invulnerable to attacks. Issues such as vulnerabilities in smart contracts, the risk of 51% attacks
where a single entity controls the majority of the network’s mining power, and potential threats
from quantum computing pose significant security concerns. Continuous research and development
are essential to devise strategies to counter these threats [63].
Distributed ledger systems represent a transformative technology with the potential to revolutionize
various industries by enhancing transparency, security, and efficiency. Ongoing research and development
are crucial to overcoming current challenges and fully realizing the potential of this technology.
2.2 Distributed Iterative Processes on Networked Systems
Distributed iterative processes are a core methodology in computational systems that require repetitive
computation across multiple nodes or resources. These processes play a pivotal role in addressing complex
problems across diverse fields such as scientific computing, machine learning, and big data analytics. By
distributing the workload across several computational resources, these systems significantly enhance
performance, scalability, and fault tolerance. The key concepts of distributed iterative processes on
networked systems include the following:
• Parallelism and Concurrency: Distributed iterative processes harness parallelism by dividing
tasks into smaller sub-tasks that can be processed simultaneously across different nodes. Effective
concurrency management is essential to ensure these tasks are executed in an orderly fashion,
avoiding conflicts and maintaining data consistency. This method enhances the overall speed and
efficiency of the computation [64].
• Synchronization and Coordination: Synchronization mechanisms are vital in distributed iterative
processes to ensure effective coordination among nodes. This involves managing task dependencies,
coordinating the start and end of iterative cycles, and maintaining consistency across all nodes.
Techniques such as distributed locks are commonly employed to achieve these objectives, ensuring
that all nodes progress together harmoniously [65].
• Data Partitioning and Distribution: Efficient data partitioning and distribution are critical for
optimizing the performance of distributed iterative processes. Data is segmented into chunks and
allocated across nodes to balance the computational load and minimize communication overhead.
Methods such as hashing, range partitioning, and graph partitioning are utilized to distribute data
effectively, ensuring that each node processes an appropriate amount of data [66].
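As a small illustration of the hash-partitioning idea mentioned in the last item (a generic sketch; production systems use more elaborate partitioners):

import hashlib

def hash_partition(key, num_nodes):
    # map a record key to one of num_nodes partitions via a stable hash
    digest = hashlib.md5(str(key).encode()).hexdigest()
    return int(digest, 16) % num_nodes

records = [("user_%d" % i, i * i) for i in range(10)]
partitions = {p: [] for p in range(3)}
for key, value in records:
    partitions[hash_partition(key, 3)].append((key, value))   # deterministic placement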
Applications of distributed iterative processes on networked systems include:
• Scientific Computing: In scientific computing, distributed iterative processes are employed for
simulations, numerical methods, and large-scale computations. They enable the handling of complex
calculations that would be impractical on a single machine, such as climate modeling, molecular
dynamics, and astrophysical simulations. These processes allow scientists to perform detailed and
large-scale analyses efficiently [67].
• Machine Learning: Distributed iterative processes are integral to training large-scale machine
learning models. Techniques such as distributed stochastic gradient descent (SGD) facilitate the
parallel training of models across multiple nodes, significantly speeding up the training process and
enabling the handling of massive datasets. This approach is crucial for developing sophisticated
machine learning applications [68]; a schematic sketch of this pattern is given after this list.
• Big Data Analytics: In big data analytics, distributed iterative processes are used to analyze large
datasets across distributed storage systems. Frameworks like Apache Spark leverage these processes
to perform iterative operations such as data transformation, aggregation, and machine learning on
vast amounts of data. This capability is essential for extracting meaningful insights from big data
[69].
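As a schematic example of the distributed SGD pattern mentioned in the Machine Learning item above, the following toy, single-process simulation performs synchronous gradient averaging for a least-squares problem; it is only meant to convey the pattern, not to represent any particular framework:

import numpy as np

rng = np.random.default_rng(0)

# synthetic least-squares problem whose rows are split across "worker" nodes
X, y = rng.normal(size=(400, 10)), rng.normal(size=400)
shards = np.array_split(np.arange(400), 4)          # data partitioned over 4 workers

w, lr = np.zeros(10), 0.05
for step in range(200):
    # each worker computes a gradient on its local shard of the data
    local_grads = [2 * X[idx].T @ (X[idx] @ w - y[idx]) / len(idx) for idx in shards]
    # synchronous update: average the local gradients, then update the shared model
    w -= lr * np.mean(local_grads, axis=0)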
Some of the associated challenges of distributed iterative processes on networked systems include:
• Scalability: Scalability remains a major challenge for distributed iterative processes. As data sizes
and task complexities grow, it becomes increasingly difficult to maintain efficient performance.
Research into scalable algorithms and architectures is ongoing to address this challenge and ensure
that distributed systems can handle ever-increasing workloads.
• Communication Overhead: Minimizing communication overhead between nodes is essential to
maximize efficiency. Techniques such as data locality optimization and the development of efficient
communication protocols are being explored to reduce the time and resources spent on node-to-node
communication [70].
2.3 Scheduling Problems for Distributed Computing Systems
Scheduling tasks efficiently is essential in distributed computing, where computational work is distributed
across multiple resources. Effective scheduling maximizes resource utilization, minimizes execution times,
and ensures balanced loads, which are critical for optimizing performance. The associated challenges of
scheduling problems are prevalent in various domains, including cloud computing, high-performance
computing, and distributed databases.
Types of scheduling problems are as follows:
• Task Scheduling: This involves assigning tasks to different nodes in the distributed system. The
main goal is to minimize the overall execution time (makespan), balance the load across nodes, and
ensure efficient resource utilization [71].
• Resource Scheduling: Focuses on allocating resources such as CPU, memory, and storage to
various tasks and jobs. This ensures that resources are not over-allocated or underutilized, which can
lead to performance degradation or resource wastage [72].
Some of the key challenges of scheduling problems are:
• Heterogeneity of Resources: Distributed systems often consist of nodes with varying
computational power, memory capacity, and other resources. Scheduling algorithms must account
for these differences to optimize task allocation [73].
• Dynamic Environment: The state of the system can change dynamically due to node failures,
network issues, or varying workloads. Effective scheduling algorithms must be adaptive and resilient
to such changes [74].
• Scalability: As the size of the distributed system grows, the scheduling algorithm must efficiently
handle the increased number of tasks and resources without significant performance degradation
[75].
• Communication Overhead: Tasks often require data from other tasks or nodes, leading to
communication overhead. Scheduling algorithms need to minimize this overhead to improve overall
system performance [76].
• Load Balancing: Ensuring that all nodes have a balanced workload is crucial to prevent some nodes
from being overloaded while others remain underutilized. This helps in achieving better system
performance and resource utilization [77].
Scheduling algorithms can be categorized into the following:
• Static Scheduling: Tasks are assigned to nodes before execution begins. This approach works well
when the workload is predictable and the system’s state remains relatively stable. Examples include
Round Robin, First-Come-First-Served (FCFS), and Static Load Balancing [78].
• Dynamic Scheduling: Tasks are assigned to nodes during execution based on the current state of
the system. This approach is more flexible and can adapt to changes in the system. Examples include
Dynamic Load Balancing, Genetic Algorithms, and Ant Colony Optimization [79].
• Heuristic-Based Scheduling: These algorithms use heuristics or rules of thumb to make
scheduling decisions. They are especially useful when finding an optimal solution is computationally
infeasible [80].
• Metaheuristic-Based Scheduling: These are higher-level procedures designed to find good
solutions to optimization problems by combining different heuristic methods. Examples include
Simulated Annealing, Particle Swarm Optimization, and Tabu Search [81].
As far as the metrics for evaluating schedulers are concerned, some of the most common are the following:
• Makespan: The total time required to complete all tasks.
• Throughput: The number of jobs completed in a given time period.
• Resource Utilization: The extent to which computational resources are used effectively.
• Load Balance: The extent to which the distribution of workload across all nodes is even.
• Scalability: The ability of the algorithm to handle an increasing number of tasks and resources
efficiently.
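To make these metrics concrete, the following minimal sketch computes the makespan, throughput, and a simple load-balance ratio from a toy schedule; the per-node task times are assumed values used only for illustration.

```python
# Illustrative computation of scheduler metrics from a toy schedule (assumed numbers).
schedule = {            # node -> list of task execution times (seconds)
    "node0": [3.0, 2.5, 1.0],
    "node1": [4.0, 1.5],
    "node2": [2.0, 2.0, 2.0],
}

node_loads = {node: sum(times) for node, times in schedule.items()}
makespan = max(node_loads.values())                  # total time to complete all tasks
num_tasks = sum(len(times) for times in schedule.values())
throughput = num_tasks / makespan                    # tasks completed per unit time
load_balance = min(node_loads.values()) / makespan   # 1.0 means perfectly even loads

print(f"makespan={makespan:.1f}s, throughput={throughput:.2f} tasks/s, balance={load_balance:.2f}")
```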
2.4 Machine Learning Models
In the realm of optimizing distributed computing systems, machine learning models present a promising
approach to enhancing performance. This section explores key machine learning models utilized in this
dissertation, examining their underlying principles, benefits, and relevance to distributed computing
environments.
2.4.1 Multi-Layer Perceptrons
Multi-Layer Perceptrons (MLPs) are a type of feedforward artificial neural network that include at least
two layers, with one or more being hidden layers. Each layer is made up of neurons that apply nonlinear
activation functions. The backbone of MLPs is based on the universal approximation theorem, which states
that a feedforward network with a single hidden layer containing a finite number of neurons can
approximate any continuous function under certain conditions. MLPs can model complex data
relationships by adjusting the weights between neurons during training, typically using backpropagation
techniques [82]. MLPs are integral to many machine learning frameworks, including convolutional neural
networks, federated learning, and graph convolutional networks. Their power comes from their capacity to
learn nonlinear models, enabling them to effectively recognize subtle patterns in data.
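As a minimal illustration of the ideas above, the sketch below performs the forward pass of a one-hidden-layer MLP in NumPy; the layer sizes and random weights are illustrative assumptions, and training via backpropagation is omitted.

```python
# Forward pass of a tiny MLP with one hidden layer (illustrative sketch, no training).
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(8, 4)), np.zeros(8)   # input dim 4 -> hidden dim 8
W2, b2 = rng.normal(size=(1, 8)), np.zeros(1)   # hidden dim 8 -> scalar output

def relu(x):
    return np.maximum(x, 0.0)

def mlp_forward(x):
    """Hidden layer with a ReLU nonlinearity, followed by a linear output layer."""
    h = relu(W1 @ x + b1)
    return W2 @ h + b2

print(mlp_forward(np.array([0.5, -1.0, 2.0, 0.1])))
```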
2.4.2 Convolutional Neural Networks
Convolutional Neural Networks (CNNs) are a specialized class of artificial neural networks designed for
processing structured grid data, such as images [83]. CNNs consist of convolutional layers that apply filters
to scan across the input data, capturing local patterns such as edges, textures, and shapes. This architecture
enables CNNs to detect spatial hierarchies in data, making them particularly effective for image and video
recognition tasks. Additionally, CNNs incorporate pooling layers that downsample the data, reducing
dimensionality and computational complexity, and fully connected layers that consolidate the extracted
features for final predictions. The ability of CNNs to automatically learn and generalize features from raw
input data has positioned them as the foundation of many state-of-the-art systems in computer vision and
related fields [84, 85].
2.4.3 Federated Learning
Federated learning is a decentralized approach to machine learning where multiple clients, such as mobile
devices or edge servers, collaboratively train a shared model while keeping their local data private.
Introduced by McMahan et al. [86], this method enables the aggregation of insights from diverse data
sources without the need to centralize data, thus enhancing privacy and security. As an example of a
distributed iterative process, federated learning involves each client training the model on their local data
and then sharing only the model updates (gradients) with a central server. The server aggregates these
updates to improve the global model, which is then sent back to the clients for the next round of training.
There are several types of federated learning: horizontal federated learning, where clients have similar
features but different samples; vertical federated learning, where clients have different features but the
same sample ID space; and federated transfer learning, which handles cases where both features and
samples are different among clients. This iterative process continues until the model converges. Federated
learning is particularly useful in scenarios where data privacy is paramount, such as healthcare, finance,
and IoT applications.
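The iterative client/server structure described above can be sketched in a few lines. The following is a minimal simulation of federated averaging on synthetic linear-regression data; the client data, local learning rate, and number of rounds are illustrative assumptions, and the sketch is a simplification of the procedure in [86], not its reference implementation.

```python
# Minimal federated-averaging sketch on synthetic data (simplified; single weight vector).
import numpy as np

rng = np.random.default_rng(1)
true_w = np.array([2.0, -1.0])
# Each client holds its own local data, which never leaves the client.
clients = []
for _ in range(5):
    X = rng.normal(size=(50, 2))
    y = X @ true_w + 0.1 * rng.normal(size=50)
    clients.append((X, y))

def local_update(w, X, y, lr=0.05, epochs=5):
    """Client-side gradient steps on local data; only the updated weights are shared."""
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w = w - lr * grad
    return w

w_global = np.zeros(2)
for round_ in range(20):                      # server rounds
    local_ws = [local_update(w_global, X, y) for X, y in clients]
    w_global = np.mean(local_ws, axis=0)      # server aggregates the client updates

print("global model after training:", w_global)   # approaches true_w
```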
2.4.4 Graph Convolutional Networks
Graph Neural Networks (GNNs) are a class of neural networks designed to perform inference on data
structured as graphs. They excel at capturing the dependencies and relationships between nodes in a graph
through iterative message passing and aggregation processes. GNNs come in various types, including
Graph Convolutional Networks (GCNs) [87], which generalize convolution operations to graph structures;
Graph Attention Networks (GATs), which leverage attention mechanisms to weigh the importance of
neighboring nodes; and Graph Recurrent Networks (GRNs), which use recurrent neural networks to handle
dynamic graphs. GNNs are particularly useful for distributed computing systems because distributed
computing problems inherently form graph structures. By applying GNNs, we can effectively model and
extract the underlying graph-based relationships among distributed resources, as well as among the tasks that need to be executed on machines, leading to optimized task scheduling, resource allocation, and load balancing in distributed environments.
Figure 2.1: Illustration of the convolution process in a 2-layer GCN. Each node updates its embedding by
aggregating information from its neighbors across multiple layers.
The goal of a GCN is to generate meaningful embeddings (feature representations) for each node in the
graph. Here, we explain how this process works, particularly focusing on the methodology introduced by
Kipf et al. [87]. The GCN starts with an input graph where each node has an initial feature vector. These features could represent various attributes of the nodes. For example, in a social network, the features might represent user profile information. These initial feature vectors are denoted as $h_i^{(0)}$ for node $i$. GCNs apply convolution operations on the graph to update the node embeddings. This is done through layers of message passing, where each node aggregates information from its neighbors. In the first layer, each node updates its embedding by aggregating features from its immediate neighbors. The updated embedding for node $i$ in the first layer, $h_i^{(1)}$, is computed as follows:

$$h_i^{(1)} = \sigma\left( \sum_{j \in \mathcal{N}(i)} \frac{1}{c_{ij}} \, h_j^{(0)} W^{(0)} \right). \qquad (2.1)$$

Here:
• $\mathcal{N}(i)$ denotes the neighbors of node $i$.
• $c_{ij}$ is a normalization constant (e.g., the degree of node $i$).
• $h_j^{(0)}$ represents the initial features of the neighboring node $j$.
• $W^{(0)}$ is a weight matrix.
• $\sigma$ is an activation function (e.g., ReLU).
The embeddings from the first layer are further refined in the second layer by considering the neighbors' neighbors. The updated embedding for node $i$ in the second layer, $h_i^{(2)}$, is computed similarly:

$$h_i^{(2)} = \sigma\left( \sum_{j \in \mathcal{N}(i)} \frac{1}{c_{ij}} \, h_j^{(1)} W^{(1)} \right). \qquad (2.2)$$

After several layers of convolution, each node $i$ ends up with a final embedding $h_i^{(L)}$ that captures both the local and global structure of the graph. These embeddings can then be used for various tasks like node classification, link prediction, or clustering.
Figure 2.1 is a conceptual figure illustrating the convolution process on a graph for a 2-layer GCN.
Graph Convolutional Networks leverage the graph structure to iteratively refine node embeddings by
aggregating information from neighboring nodes. This process enables GCNs to learn meaningful
representations that are useful for a variety of graph-based tasks, such as distributed computing problems.
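To connect (2.1) and (2.2) to code, the following is a minimal NumPy sketch of a two-layer GCN forward pass, taking the normalization constant $c_{ij}$ to be the degree of node $i$; the toy graph, feature dimensions, and random weights are illustrative assumptions.

```python
# Two-layer GCN forward pass following Eqs. (2.1)-(2.2), with c_ij = deg(i) (illustrative sketch).
import numpy as np

A = np.array([[0, 1, 1, 0],      # adjacency matrix of a toy 4-node graph
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
H0 = np.eye(4)                    # initial node features h_i^(0) (one-hot here)

rng = np.random.default_rng(0)
W0 = rng.normal(size=(4, 8))      # weight matrix W^(0)
W1 = rng.normal(size=(8, 2))      # weight matrix W^(1)

def gcn_layer(H, W, A):
    """h_i <- ReLU( sum_{j in N(i)} (1/deg(i)) * h_j * W ), computed for all nodes at once."""
    deg = A.sum(axis=1, keepdims=True)
    aggregated = (A @ H) / deg    # neighbor average for every node
    return np.maximum(aggregated @ W, 0.0)

H1 = gcn_layer(H0, W0, A)         # embeddings after the first layer, Eq. (2.1)
H2 = gcn_layer(H1, W1, A)         # embeddings after the second layer, Eq. (2.2)
print(H2.shape)                   # (4, 2): one 2-dimensional embedding per node
```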
2.4.5 Kolmogorov-Arnold Networks
Kolmogorov-Arnold networks (KANs) [53] are based on the Kolmogorov-Arnold representation theorem,
which asserts that any continuous multivariate function can be represented as a superposition of
continuous univariate functions. Specifically, Kolmogorov’s theorem states that for any continuous
function f of n variables, there exist continuous univariate functions $\phi_i$ and $\psi_{ij}$ such that

$$f(x_1, x_2, \ldots, x_n) = \sum_{i=1}^{2n+1} \phi_i\left( \sum_{j=1}^{n} \psi_{ij}(x_j) \right). \qquad (2.3)$$
This contrasts with Multi-Layer Perceptrons (MLPs), which approximate functions by learning weights
for multiple layers of neurons connected in a feedforward manner, relying on nonlinear activation
functions to capture complex patterns. While MLPs use layers of interconnected neurons to learn
representations directly, KANs decompose the problem into simpler, univariate functions that are
combined to reconstruct the original multivariate function. This approach can potentially offer more
interpretable models for certain types of problems, as it breaks down the approximation process into more
manageable components.
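As a small concrete instance of this representation: for positive inputs, the product $f(x_1, x_2) = x_1 x_2$ can be written with a single outer term as $\exp(\log x_1 + \log x_2)$, i.e., $\phi(u) = e^u$ and $\psi_1(x) = \psi_2(x) = \log x$. The sketch below checks this numerically; it illustrates the decomposition idea only and is not the spline-based KAN architecture of [53].

```python
# Numeric check that x1*x2 = exp(log x1 + log x2): one term of the form phi(sum_j psi_j(x_j)).
import math

def f_direct(x1, x2):
    return x1 * x2

def f_superposition(x1, x2):
    psi = math.log          # inner univariate functions psi_1 = psi_2 = log
    phi = math.exp          # outer univariate function phi = exp
    return phi(psi(x1) + psi(x2))

for x1, x2 in [(1.5, 2.0), (0.3, 7.0), (4.0, 0.25)]:
    assert abs(f_direct(x1, x2) - f_superposition(x1, x2)) < 1e-12
print("product represented exactly as a superposition of univariate functions")
```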
Each of these machine learning models provides distinct benefits for addressing challenges in
distributed computing systems. The selection of an appropriate model is influenced by the specific
attributes of the computational tasks and the nature of the distributed computing problem. By utilizing
these models, researchers in distributed computing can create more advanced and efficient solutions,
thereby improving the performance and robustness of distributed computing systems.
Chapter 3
Related Works
In this chapter, we present an extensive analysis of the dynamic field of distributed computing and graph
representation learning techniques, ranging from traditional to state-of-the-art machine learning (ML)
methods. We begin by discussing conventional strategies that have formed the foundation of distributed
ledger and consensus mechanisms. Moving forward, we examine the shift towards utilizing diverse
scheduling methods and advanced learning-based algorithms in distributed computing. This includes a
detailed explanation of various scheduling schemes and their operations. Subsequently, we delve into the
established principles of graph representation learning. ∗
3.1 Conventional Distributed Ledger and Consensus
The original Bitcoin paper by Nakamoto [58] provided a comprehensive solution to multiple challenges,
including consensus (via longest-chain adoption), ledger representation (through a hashed chain of blocks),
Sybil control (using Proof of Work), and incentives (through mining and transaction rewards). However,
we contend that it is both feasible and beneficial to examine these components individually. To
contextualize our work, we briefly review the literature across these four dimensions.
∗This chapter is adapted from [88],[89],[90],[91],[92].
Table 3.1: Comparison of Different Protocols.
Protocol | Sybil Control Method | Leaderless | Ledger Structure | BFT | Trans. per Second per Shard | Confirm. Latency | Numb. of Validators | Mobile-based
Bitcoin [58] | PoW | No | Chain | No | 7 | ∼40 mins | 100k+ | No
Ethereum [59] | PoW/PoS | No | Chain | No | 20 | ∼60 s | 100k+ | No
Tendermint [93] | Agnostic | No | Chain | Yes | ∼10K | ∼2-15 s | ∼100-1k | No
Avalanche [7] | Agnostic | Yes | DAG | Yes | ∼3.4K | ∼1.35 s | 100k+ | No
Blizzard | Agnostic | Yes | DAG | Yes | ∼10K | ∼0.65 s | 100M+ | Yes
• Consensus: This work focuses exclusively on the problem of distributed consensus, particularly
Byzantine Fault Tolerance (BFT). The formalization of the Bitcoin protocol in [94] demonstrates its
extension to provide Byzantine agreement. There exists earlier literature on BFT consensus for
distributed systems, mainly concentrating on leader-based protocols like PBFT [95] and
BFT-Smart [96], among others. Recently, in the context of blockchain, several new leader-based
consensus solutions have emerged to improve the speed of BFT consensus under partial synchrony
assumptions, including Tendermint [93], Hotstuff [97], and Casper CBC [98]. Additionally, there has
been recent work on leaderless protocols, such as Hashgraph [99], Avalanche [7], DBFT [100], and
Aleph [101], which do not require a single proposer or leader for each round of consensus. The
Blizzard protocol, described in this paper, is a leaderless BFT protocol designed to enable mobile
devices to participate in consensus by leveraging online brokers. Blizzard builds on the gossip-based
consensus concept initially presented in Avalanche but differs structurally due to the introduction of
aggregating brokers, necessitating distinct safety, throughput, and latency analyses, which are all
presented in this work.
• Ledger Representation: There are broadly two approaches to ledger representation in distributed
ledger systems: a linear blockchain, as seen in Bitcoin, Ethereum, and many other protocols, or a
directed acyclic graph (DAG) data structure, where transactions or blocks point to other previous
transactions or blocks. Examples of DAG-based protocols include IOTA [102], Hashgraph [99],
Avalanche [7], and Helix [103].
• Sybil Control: Examining various Sybil control mechanisms, we find that Ethereum also utilizes
Proof of Work, while newer projects and systems like Cosmos [104], Algorand [105],
Ouroboros [106], Dfinity [107], and Ethereum 2.0 [59] explore energy-efficient alternatives such as
Proof of Stake and delegated Proof of Stake. In permissioned blockchains like Hyperledger
Fabric [108] and Hyperledger Sawtooth [109], Sybil control is managed explicitly by allowing only a
predefined set of vetted validators. This work does not address Sybil control, similar to other
research focused on distributed consensus.
• Incentives: Like most prior work on BFT consensus protocols and permissioned blockchains (unlike
many cryptocurrency projects such as Bitcoin and Ethereum), Blizzard is agnostic to how incentives
are provided to validating nodes. This flexibility allows Blizzard to be used in a wider range of
applications beyond cryptocurrency, with the potential for incentive mechanisms to be employed as
a separate, modular layer if necessary.
• Mobility: Several key studies have laid the groundwork for mobile-based consensus protocols,
though they primarily focus on nodes with mobility rather than utilizing smartphones as consensus
nodes. Xiao-li et al. [110] introduce a clustering-based consensus protocol for mobile ad hoc
networks to reduce message cost. Wu et al. [111] present an efficient consensus protocol for
MANETs using a hierarchical approach for message efficiency. Badache et al. [112] address the
consensus problem in mobile environments with disconnections. A modular approach for designing
hierarchical consensus protocols in MANETs is proposed in [113]. The design of a message-efficient
consensus protocol for MANETs is presented in [114]. Our proposed Blizzard protocol, as the first to
specifically use smartphones for consensus, represents a significant departure from these earlier
works.
A crucial aspect of widely adopted protocols such as Bitcoin, Ethereum, and Avalanche is their open,
permissionless nature, allowing any device to join or leave the network at any time. Consequently, they do
not utilize any information about the total number of validator nodes when voting for blocks. This feature,
essential for mobile networks, incurs the cost that such validator number-agnostic protocols can only be
proven safe under a synchronous model [115]. The safety of permissionless consensus in partially
synchronous or asynchronous models is not guaranteed [115]. Similarly, the Blizzard protocol does not
assume knowledge of the total number of nodes involved, limiting its safety assurances to a synchronous
model, akin to Bitcoin, Ethereum, and Avalanche.
In Table 3.1, we summarize some of the main blockchain protocols and their key properties and
attributes, along with Blizzard, to contextualize our contribution. As we will demonstrate, Blizzard offers
both high throughput and low latency comparable to state-of-the-art protocols while enabling significantly
greater scalability by allowing mobile devices to serve as transaction validators.
3.2 Traditional Methods for Scheduling for Distributed Iterative
Processes
Efficient task scheduling is essential for optimizing the utilization of computing resources and minimizing
the time required to execute tasks. Task scheduling can be categorized into various groups based on
different perspectives. For example, from the perspective of the type of tasks to be processed, task
scheduling is traditionally divided into two categories: static and dynamic scheduling. Static scheduling is
applicable when information about tasks (such as required computational resources or deadlines) and
computing resources (such as processing power or communication delays) is available in advance. In
contrast, dynamic scheduling deals with the arrival of new tasks during the execution of ongoing tasks.
Another classification of task scheduling schemes is based on the relationship between tasks: independent
tasks and dependent tasks (as seen in realistic applications). Dependent tasks can be represented through a
directed graph where tasks and inter-task data dependencies are represented as nodes and edges,
respectively. The directed graph enforces inter-task dependencies by ensuring a task is executed only after
all its predecessors are completed.
Task scheduling schemes can also be categorized based on the type of algorithms used to assign tasks
to computing resources. Heuristic, meta-heuristic, and optimization-based are three main categories of task
scheduling schemes. Heuristic task scheduling schemes can be further divided into subcategories based on
their objectives, such as load balancing [116, 117, 118], priority-based scheduling [119, 120, 121], task
duplication [122], and clustering [123].
Since heuristic algorithms may significantly deviate from the optimal task scheduling, meta-heuristic
and optimization-based schemes have garnered considerable attention for approximating the NP-hard
optimization of task scheduling. These schemes are suitable for solving large-scale problems and are
practically efficient in yielding near-optimal solutions. Some examples of meta-heuristic schemes include
Particle Swarm Optimization [22], Simulated Annealing [23, 24], and Genetic-based approaches [25, 26, 27].
While there has been limited work focused on providing convex-relaxation-based solutions for the
scheduling problem, these approaches are not applicable to any distributed iterative process as they do not
consider a general directed task graph (which may contain cycles) [28, 29]. Furthermore, these schemes
often focus on different objective functions with varying constraints (such as allowing task splitting across
distributed computing machines) rather than minimizing bottleneck time. Additionally, they do not
account for the fact that different links/paths between distributed computing machines may have varying
communication bandwidths. To our knowledge, this is the first work that a) considers general directed
graphs, b) minimizes bottleneck time by taking into account both compute costs and heterogeneous
network communication costs, and c) provides a convex-relaxation-based solution.
3.3 Machine Learning based Scheduling Schemes
All the aforementioned heuristic, meta-heuristic, and optimization-based schemes tend to run very slowly
as the number of tasks increases due to their iterative nature, which demands excessive computations. This
limitation makes these schemes unsuitable for handling large-scale task graphs.
As obtaining an optimal scheduler is essentially equivalent to finding an appropriate mapper function
that assigns tasks to compute machines, machine-learning-based scheduling has emerged as a promising
alternative, leveraging advances in fundamental learning methods such as deep learning [124] and
reinforcement learning (RL) [125]. Sun et al. proposed DeepWave [126], a scheduler that reduces job
completion time utilizing RL, using a priority list† as the action and the completion time of a job DAG as the reward. Additionally, Decima [127] schedules tasks over a Spark cluster by training
a neural network with RL, using the scheduling of the next task for execution as the action and minimizing
the makespan as the reward. However, these RL-based schemes suffer from having a large action space, i.e.,
the space of scheduling decisions.
While Decima [127] operates only in homogeneous environments, Grinsztajn et al. proposed
READYS [128] to function in heterogeneous environments. READYS combines two components: a Graph
Convolutional Network and an actor-critic algorithm. There are three main differences from our work
regarding their use of GCN: first, they use the GCN to embed task nodes only, without considering
network settings as we do; second, they use a regular GCN that does not explicitly account for directed
nodes, whereas we use an Edge-Directed Graph Neural Network (EDGNN) [129] which does; and finally, in
†Which indicates the scheduling priority of edges in a job DAG.
READYS, the GCN is not used for scheduling (it only embeds task nodes, while scheduling is done via the
actor-critic algorithm), whereas we are the first to propose using a GCN directly for task scheduling.
3.4 Graph Representation Learning Schemes
Graph representation learning has become a critical area of research in machine learning, driven by the
need to process graph-structured data effectively. The foundation for modern graph representation
learning was laid by the introduction of Graph Convolutional Networks (GCNs) by Kipf and Welling in
[87]. GCNs operate on graph structures by applying convolutional operations, which aggregate
information from a node’s neighbors to learn node representations. This approach can be broadly divided
into two domains: spectral and spatial.
Spectral domain GCNs are based on the graph Fourier transform, where convolutions are defined in
the frequency domain [130]. The key idea is to project the graph signals onto the eigenbasis of the graph
Laplacian, apply the convolution operation, and then transform back to the spatial domain. This method
leverages the eigenvalues and eigenvectors of the graph Laplacian to perform convolution, but it suffers
from high computational complexity and difficulty in generalization to different graph structures.
Spatial domain GCNs, on the other hand, define convolutions directly on the graph in the spatial
domain [87]. These methods aggregate features from a node’s local neighborhood, making them more
intuitive and scalable compared to spectral methods. Kipf and Welling’s GCN is a prime example of a
spatial GCN, where the convolution operation is simplified using a localized first-order approximation of
spectral graph convolutions.
Building on the foundation of GCNs, several advanced graph neural network architectures have been
developed to address various challenges and improve performance. Graph Attention Networks
(GATs) [131] introduce attention mechanisms to GCNs, allowing the model to assign different weights to
different nodes in the neighborhood during aggregation. This attention mechanism helps in capturing the
importance of different neighbors, leading to better representation learning.
Graph Recurrent Neural Networks (GRNNs) extend GCNs to dynamic graphs, where the graph
structure and node features can change over time. GRNNs leverage recurrent neural network architectures,
such as LSTMs or GRUs, to model temporal dependencies in the graph [132]. This approach enables the
learning of node representations that evolve over time, making it suitable for applications like traffic
prediction and social network analysis.
Graph Transformers represent another advancement in graph representation learning, inspired by the
success of transformer models in natural language processing [133]. Graph Transformers adapt the
self-attention mechanism to graphs, enabling the capture of long-range dependencies and complex
interactions within the graph. This architecture has shown promise in tasks requiring high expressiveness
and flexibility.
In addition to these, other notable advancements include GraphSAGE [134], which introduces an
inductive framework for learning node embeddings, and Graph Autoencoders [135], which apply
autoencoder architectures to graphs for tasks like link prediction and node clustering.
The field of graph representation learning continues to evolve, with ongoing research focused on
improving scalability, expressiveness, and applicability to various real-world problems. The development
of new architectures and algorithms promises to further enhance our ability to process and understand
complex graph-structured data.
Chapter 4
Distributed Ledger and Consensus
In this chapter, we discuss our proposed distributed mobile-based consensus protocol called Blizzard, as
well as the GCN-based consensus mechanism ∗
.
Blizzard is a leaderless consensus protocol where mobile nodes connect and interact with online
servers known as brokers. Instead of storing and communicating with addresses for all mobile devices,
these mobile nodes only need to handle a number of end addresses that scales with the number of brokers
in the system†
. Each mobile node connects to a random subset of these servers for a specified period,
forming a broadcast group with all other mobile nodes in each broker’s group for querying and responding.
Importantly, Blizzard is designed to operate efficiently across environments with varied mobile devices,
adapting seamlessly to different operating systems, hardware capabilities, and communication technologies
among these devices.
The proposed GCN-based consensus mechanism is a novel approach that employs graph neural networks (GNNs) to orchestrate consensus within gossip-based protocols, capitalizing on their aptitude for encapsulating the complex functions essential for network-based querying processes. The
architecture of our method leverages the inherent capability of GNNs to process and analyze the relational
data among nodes, which continuously exchange and update information. In this context, each node in the
∗This chapter is adapted from [88] and [89].
†Non-mobile devices such as online servers can also serve as validators. Blizzard is the first protocol to explicitly allow mobile
device-based validators, which can be turned off or connect intermittently.
network is characterized by a set of features that not only represent its state but also dynamically influence
and determine the outcomes of its interactions with neighboring nodes. This work demonstrates that the GNN's ability to integrate and learn from the multifaceted nature of these interactions makes it an ideal candidate for managing the underlying complexities of gossip-based consensus mechanisms.
We briefly enumerate our contributions in this work as follows:
1. Blizzard is the first mobile-based leaderless Byzantine Fault Tolerant (BFT) distributed consensus
protocol. Not only can mobile devices issue transactions, but they can also participate in the core
transaction verification and consensus process. This could increase the number of nodes capable of
participating in validation by 2 to 3 orders of magnitude, enhancing both adoption and security.
2. We propose a novel two-tier protocol, where consensus between mobile nodes is enabled by the use
of online brokers, and a decentralized matching scheme that ensures each mobile node connects to
exactly k random brokers.
3. We provide a provable safety guarantee by mathematically deriving the set of ratios of Byzantine
nodes and brokers for which Blizzard’s safety is ensured. We also discuss why liveness holds in
Blizzard.
4. We analytically characterize the throughput of Blizzard by modeling the sequential pipeline involved
in processing transactions at each node and identifying the throughput bottleneck via empirical
profiling.
5. Similarly, we analytically characterize the confirmation latency and message complexity of Blizzard.
We show that the use of brokers creates a communication topology allowing efficient consensus;
under reasonable parameter settings, transactions are propagated in Blizzard within just four
communication rounds with high probability.
6. Through experiments based on a software implementation of Blizzard, we demonstrate that it is
capable of 10,000 transactions per second per shard, with sub-second confirmations.
7. We show that our GCN-based consensus model could mimic the Avalanche protocol to illustrate its
effectiveness in learning and adapting to the intricate functions required for consensus operations.
8. Our findings indicate that GNNs not only excel in learning these complex relational functions but
also facilitate the incorporation of additional inputs such as the risk associated with each entity. This
enhances the protocol’s ability to reach a consensus by utilizing a more informed and comprehensive
network analysis.
4.1 The System Model
We first provide details on the system model and the proposed Blizzard scheme in Section 4.1.1, where we
discuss the network composition, consensus mechanism, and the integration of mobile devices. Following
this, in Section 4.1.2, we elaborate upon the system model of the GCN-Consensus scheme, outlining the use
of Graph Neural Networks to enhance the consensus process and manage the complexities of gossip-based
protocols.
4.1.1 Proposed Blizzard Scheme
We consider a broker-assisted mobile network where disjoint sets NC and NM respectively represent sets
of correct and malicious mobile nodes (the set of all nodes is denoted by N := NC ∪ NM). Furthermore, set B indicates the set of all brokers, which consists of sets BC and BM representing the subsets of correct and
malicious brokers, respectively. Other notations are provided in Table 4.1.
Figure 4.1: An illustration on how Blizzard works for k = 2. Mobile node 1 queries brokers B2 and B3
on a new transaction these brokers have not queried yet (shown with blue arrows). Then, these brokers
query all their connected mobile nodes about the transaction (depicted with orange arrows). Afterwards,
all connected mobile nodes respond back to queries (shown with green arrows) and brokers reflect the
majority vote to all of the connected nodes (presented by dashed-black arrows).
Mobile nodes issue cryptographically signed transactions. We assume all validating nodes have access
to a common function that can determine if any two transactions are conflicting or not‡
. Correct nodes
never issue conflicting transactions, while Byzantine nodes may issue conflicting transactions.
Regarding misbehavior, we assume the existence of both Byzantine mobile nodes and Byzantine brokers. In Blizzard, malicious brokers are effectively limited to suppressing messages, as they do not sign or initiate any messages themselves and are assumed to be unable to forge messages from mobile nodes. As far as the Byzantine behavior of malicious mobile nodes is concerned, malicious nodes are computationally limited (unable to forge signatures) while they can choose any execution strategy they desire. Moreover, the consensus remains unaffected by mobile nodes turning off as long as the fraction of correct connected mobile nodes is sufficiently high [7].
Since we aim to have a protocol such that any vote on a new transaction would be a vote on some
previous transactions, we incorporate a DAG structure into our protocol. To do so, some parent
transactions would be assigned for each new transaction. Therefore, any vote on a specific transaction is a
‡This is general enough, for example, to cover the detection of conflicting transactions under either a UTXO or account-based
model.
Table 4.1: Notations Description
n : Number of all mobile nodes (where n = |N|).
c : Number of correct mobile nodes.
b : Number of Byzantine mobile nodes.
m : Number of all brokers.
m_c : Number of correct brokers.
m_b : Number of Byzantine brokers.
N(b) : Set of mobile nodes connected to broker b.
B(u) : Set of brokers connected to mobile node u.
k : Number of brokers sampled by each mobile node.
α : Majority threshold of mobile nodes for considering a "yes" vote.
η : Majority threshold of brokers for considering a "yes" vote.
β1 : Security threshold used for the consecutive counter.
β2 : Security threshold used for the confidence counter.
vote on all of its ancestor transactions§
as well. The overview of the DAG structure is presented in
Appendix B.
$$\text{QueryBroker}(b, T) :=
\begin{cases}
1 & \text{if } \sum_{u' \in \mathcal{N}(b)} \text{QueryNode}(u', T) \geq \eta\,|\mathcal{N}(b)| \\
0 & \text{otherwise}
\end{cases} \qquad (4.1)$$

where QueryNode(u', T) is 1 if transaction T and all its ancestor transactions are the preferred transactions among their corresponding conflict sets (see Appendix B for more details on conflict sets and the preferred transaction in a conflict set); otherwise it is 0.
We next present how our proposed Blizzard scheme works in detail.
§All transactions accessible through the parent of a transaction are referred to as ancestor transactions.
Algorithm 1 Blizzard Algorithm
Input: Set of mobile nodes N, brokers B, and transactions coming over time
Output: Set of non-conflicting transactions accepted by every correct mobile node u ∈ N
Initialization:
- Each node u ∈ N randomly connects to k brokers, represented by B(u)
- Set Tu = Qu = ∅ for all nodes which do not issue transactions, where Tu and Qu represent the known and queried transaction sets of node u, respectively
1: while there is a transaction T at any node u such that T ∈ Tu and T ∉ Qu do
2:   Rbrokers := Σ_{b ∈ B(u)} QueryBroker(b, T)
3:   if Rbrokers ≥ αk then
4:     vu,T = 1   ▷ T receives a voucher and is appended to the DAG of node u
5:     Update the DAG and the conflicting sets of node u after appending T
6:   else
7:     Qu = Qu ∪ {T}   ▷ mark T as a queried transaction
8:   end if
9: end while
4.1.1.1 Proposed Blizzard Scheme
Our proposed scheme works as follows: each node u ∈ N connects to k brokers (represented by B(u)) uniformly at random (the mechanism of k random connections will be discussed in the following subsection) and queries them on a new transaction T.
Upon receiving a query, each broker b ∈ B(u) computes the η-majority vote on T by querying all of its connected nodes, denoted by N(b). Then, each node v ∈ N(b), for all brokers b ∈ B(u), affirmatively responds to the query if all of the ancestor transactions of T are currently the preferred choice of transaction in their corresponding conflict sets in the stored DAG of node v. Afterwards, broker b ∈ B(u) aggregates the count of all positive responses collected from nodes N(b) and sends an affirmative response to all its connected nodes, using a suitable key aggregation scheme, if at least η|N(b)| (where 1/2 < η < 1) of the collected responses from nodes N(b) are positive. In the case of having at least αk (where 1/2 < α < 1) positive responses collected from brokers B(u) for T, then T will be appended to the stored DAG of node u and node u never queries T again. The protocol would continue by following the same process for all nodes having T in their known transaction sets. An example of our proposed scheme for k = 2, where one node queries on a new transaction, is depicted in Fig. 4.1. The details of the Blizzard protocol are elaborated in Algorithm 1 and the side function QueryBroker(·, ·) in (4.1).
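To make the voting logic concrete, the following minimal Python sketch implements the broker-side η-majority rule of (4.1) and the node-side αk acceptance check from Algorithm 1 on a toy network; the Node class, the preference check (a stand-in for the full conflict-set test on ancestor transactions), and the parameter values are illustrative assumptions, not the Blizzard reference implementation.

```python
# Illustrative sketch of Blizzard's voting rules: eta-majority at brokers (Eq. 4.1) and the
# alpha*k acceptance check of Algorithm 1. Data structures and parameters are assumptions.
from dataclasses import dataclass, field

@dataclass
class Node:
    preferred: set = field(default_factory=set)   # transactions this node currently prefers

    def query(self, tx: str) -> int:
        """1 if tx (and, in the full protocol, all its ancestors) is preferred; else 0."""
        return int(tx in self.preferred)

def query_broker(connected_nodes, tx, eta=0.7) -> int:
    """Eq. (4.1): the broker answers 1 iff at least eta * |N(b)| of its nodes answer 1."""
    votes = sum(node.query(tx) for node in connected_nodes)
    return int(votes >= eta * len(connected_nodes))

def node_accepts(brokers, tx, alpha=0.7) -> bool:
    """Algorithm 1, lines 2-4: accept tx iff at least alpha*k of the k queried brokers say yes."""
    k = len(brokers)
    positive = sum(query_broker(nodes, tx) for nodes in brokers)
    return positive >= alpha * k

# Toy network: 3 brokers, each connected to a handful of nodes; most nodes prefer "T1".
nodes = [Node({"T1"}) for _ in range(9)] + [Node(set())]
brokers = [nodes[0:4], nodes[3:7], nodes[6:10]]
print(node_accepts(brokers, "T1"))   # True: enough brokers reach an eta-majority for T1
```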
We next present a distributed mechanism to enforce k random connections for mobile nodes.
4.1.1.2 Distributed Random Matching
For Blizzard to work, we need to ensure that each mobile node is connected to k random brokers. Since
random connection plays a key role in preventing collusion between Byzantine entities, we propose a
mechanism which requires all mobile nodes, even Byzantine ones, to provide a proof of their connections
being random.
Our proposed distributed random matching scheme works as follows:
1. Each mobile device applies a Hash function on the combination of the random number coming from
a distributed random beacon¶
[136] with the ID of the mobile device. Regarding the Hash function, it
outputs B bits where 0 and 1 are equally likely. Then the indices of the first k ones represent the
brokers each mobile device has to connect with. The mobile device afterwards sends the output of its
Hash function as well as its ID to the brokers it is supposed to connect with.
2. Brokers validate the ID of mobile devices, verify that hash values generated by mobile devices are
correctly produced, taking into account the distributed random beacon, and thus verify that mobile
devices are authorized to connect.
An illustration of the proposed distributed random matching scheme is depicted in Fig. 4.2. One crucial
factor ensuring the effectiveness of our proposed scheme is that the hash function outputs a sufficiently
long sequence (or equivalently, a large value of B), ensuring that there are at least k ones in the hashed
value with high probability. Theorem 1 determines parameter B for our proposed distributed random
matching scheme to satisfy this condition.
¶Note that a distributed random beacon is now live online at https://drand.love/ .
Figure 4.2: Proposed distributed random matching scheme for connecting each mobile node to k brokers.
Theorem 1. In order to ensure that the probability of having at least k ones in a sequence of length B is 1 − δ (for small δ), parameter B should satisfy $\frac{1}{2}\log\frac{1}{\delta} = \left(\frac{1}{2} - \frac{k-1}{B}\right)^{2} B$.

Proof.

$$\mathbb{P}(H(B) \leq k-1) \leq \delta = \exp\left(-2\epsilon^{2} B\right), \qquad (4.2)$$

where (4.2) follows from Hoeffding's inequality, H(B) indicates the number of ones in a sequence of length B, and $\epsilon := \frac{1}{2} - \frac{k-1}{B}$.
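As a quick numeric illustration with assumed values of k and δ, the sketch below scans for the smallest integer B satisfying the condition of Theorem 1.

```python
# Find the smallest hash-output length B meeting Theorem 1's condition (illustrative values).
import math

def min_hash_length(k: int, delta: float) -> int:
    target = 0.5 * math.log(1.0 / delta)
    B = 2 * (k - 1) + 1                          # need (1/2 - (k-1)/B) > 0 to start
    while (0.5 - (k - 1) / B) ** 2 * B < target:
        B += 1
    return B

print(min_hash_length(k=10, delta=1e-9))         # required B for the assumed k and delta
```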
4.1.2 Proposed GCN-Consensus Scheme
In terms of the network model, it consists of nodes divided into two distinct sets: NC and NM,
representing the correct and malicious nodes, respectively. The total set of nodes is denoted by N , where
N = NC ∪ NM. The total number of nodes in the network is n, with nc representing the number of
correct nodes and nm representing the number of malicious nodes, such that n = nc + nm. Additionally,
N (i)
indicates the subset of nodes connected to a specific node i.
Nodes in the network issue cryptographically signed transactions. It is assumed that all validating
nodes have access to a function to determine whether any two transactions conflict, which is general
enough to handle conflicts in both UTXO and account-based models. Correct nodes do not issue conflicting
transactions, whereas Byzantine nodes may do so.
Regarding misbehavior, we consider the presence of Byzantine nodes. These malicious nodes, while
computationally limited (incapable of forging signatures), can adopt any operational strategy. The
consensus process remains stable despite nodes going offline, as long as a sufficient number of mobile
nodes remain correctly connected, as detailed in [7].
Following the approaches in [7] and [6], our protocol incorporates a Directed Acyclic Graph (DAG)
structure, ensuring that voting on any new transaction also constitutes a vote on several preceding
transactions. Each new transaction is assigned parent transactions, making a vote on any given transaction
a vote on all of its ancestor transactions. The DAG structure is the same as DAG structure of [6] and [7].
Conceptually, GCNs are closely related to neighborhood-aggregation encoder algorithms [137].
However, instead of aggregating information from neighbors, GCNs view graphs from the perspective of a
message passing algorithm between nodes [138]. In the GCN framework, each node is initialized with an
embedding, which is the same as its feature vector. At each layer of the GCN algorithm, nodes take the
average of neighbor messages and apply a neural network as follows:
$$h_v^{(t)} = \sigma\left( W_1^{(t)} h_v^{(t-1)} + W_2^{(t)} \sum_{u \in \mathcal{N}(v)} h_u^{(t-1)} \right), \qquad (4.3)$$

where $h_v^{(t)}$, $W_1^{(t)}$, $W_2^{(t)}$, and $\sigma$ represent the hidden vector of node v at layer t, the weight matrix at layer t for the self node, the weight matrix at layer t for neighboring nodes, and a non-linear function (e.g., ReLU), respectively. Furthermore, $\mathcal{N}(v)$ indicates the neighbors of node v. After K layers of neighborhood aggregation, we obtain the output embedding for each node. These embeddings, along with any loss function, can be used to train GCN model parameters using stochastic gradient descent.
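The update in (4.3), with separate weight matrices for the self node and its neighbors, can be sketched in a few lines of NumPy; the toy adjacency matrix, feature dimensions, and random weights below are illustrative assumptions.

```python
# One message-passing layer following Eq. (4.3): separate weights for self and neighbor terms.
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[0, 1, 1],          # toy 3-node adjacency matrix
              [1, 0, 0],
              [1, 0, 0]], dtype=float)
H = rng.normal(size=(3, 5))       # node feature vectors h_v^(t-1) (5 features per node)
W_self = rng.normal(size=(5, 4))  # W_1^(t): transforms the node's own embedding
W_neigh = rng.normal(size=(5, 4)) # W_2^(t): transforms the summed neighbor embeddings

def message_passing_layer(H, A, W_self, W_neigh):
    """h_v^(t) = ReLU( W_1 h_v^(t-1) + W_2 * sum_{u in N(v)} h_u^(t-1) )."""
    neighbor_sum = A @ H                       # row v holds the sum of v's neighbors' embeddings
    return np.maximum(H @ W_self + neighbor_sum @ W_neigh, 0.0)

H_next = message_passing_layer(H, A, W_self, W_neigh)
print(H_next.shape)               # (3, 4): updated embedding for each of the 3 nodes
```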
We next present the features that could be considered for each node to utilize all available information
comprehensively:
Table 4.2: Features for Consensus Mechanism
Feature | Description
Coordinate of Nodes | Geographical or logical coordinates of a node within the network topology.
Risk Associated with a Node | Probability or level of risk of failure or malicious behavior based on historical data and behavior analysis.
Time it Issues a Transaction | Timestamp when a node issues a transaction, crucial for ordering and validating transactions.
Time to Verify the Transaction | Time taken by nodes to verify a transaction, affecting consensus speed and efficiency.
Number of Connected Nodes | Count of direct connections a node has, influencing its robustness and trust level in the network.
Node Reputation | Assessment of a node's reliability and past performance in the network.
Transaction Size | Size of the transactions issued by the node, which may affect processing time and priority.
Node Capacity | Resources available to a node, such as computational power and storage, affecting its performance.
Network Latency | Delay in communication between nodes, impacting the speed of consensus and transaction verification.
Consensus Protocol Compliance | Degree to which a node adheres to the rules and mechanisms of the consensus protocol.
Security Level | Measures the security protocols in place for a node, including encryption and authentication mechanisms.
Energy Consumption | The amount of energy a node consumes during its operation, relevant in eco-friendly blockchain designs.
These features can be extended further, as our proposed GCN-Consensus is not tailored to any specific
set of features.
4.1.2.1 Proposed GCN-Consensus Method
Our proposed scheme operates as follows: each node u ∈ N connects to k nodes, represented by N(u), uniformly at random and queries them on a new transaction T.
Upon receiving a query, each node v ∈ N(u) responds affirmatively if all ancestor transactions of T are the preferred choice of transaction in their corresponding conflict sets in the stored DAG of node v.
Then, node u uses the GCN to aggregate the embeddings of the neighboring nodes v for all v ∈ N(u). Note that node v's response to a query is one of node v's features, and the input features essentially form the node embeddings at layer zero. In case the output of the GCN for node u belongs to class 1, then T will be appended to the stored DAG of node u and node u will no longer query T; otherwise it belongs to class 0 and T will not receive a voucher to be appended to the stored DAG. This process continues for all nodes with T in their known transaction sets. An example of the node embedding representation of a two-layer GCN-Consensus scheme is depicted in Figure 4.3. The details of the GCN-Consensus protocol are presented in Algorithm 2, and QueryNode(v, T) is 1 if transaction T and all its ancestor transactions are the preferred transactions among their corresponding conflict sets; otherwise, it is 0.
Figure 4.3: An illustration of how node embeddings are formed in a two-layer GCN-Consensus.
4.2 Safety and Liveness
As mentioned earlier, it has been shown in [115] that permissionless protocols that do not make any
assumption about the total number of nodes involved in the consensus protocol can only be proved to
operate correctly under a synchronous model. This applies to previously proposed protocols such as
Algorithm 2 GCN-Consensus Algorithm
Input: Features of nodes N are initialized and transactions coming over time
Output: Non-conflicting transactions accepted by every correct node u ∈ N
Initialization:
- Each node u ∈ N randomly connects to k nodes, represented by N(u)
- Set Tu = Qu = ∅ for all nodes which do not issue transactions, where Tu and Qu represent the known and queried transaction sets of node u, respectively
1: while there is a transaction T at any node u such that T ∈ Tu and T ∉ Qu do
2:   if GCN(X)u belongs to class 1 then
3:     vu,T = 1   ▷ T receives a voucher and is appended to the DAG of node u
4:     Update the DAG and the conflicting sets of node u after appending T
5:   else
6:     Qu = Qu ∪ {T}   ▷ mark T as a queried transaction
7:   end if
8: end while
Bitcoin, Ethereum and Avalanche, and also to Blizzard. Thus, in the following discussion, we assume a
synchronous network where the maximum latency for any message is bounded by a known constant.
4.2.1 Safety Analysis of Blizzard
To establish the robustness of the Blizzard scheme in terms of safety, it suffices to demonstrate that every
correct node uniformly agrees on the same transaction from a set of conflicting ones within a finite time
frame almost surely. For instance, consider a scenario where transaction T1 is already present and
transaction T2 (which conflicts with T1) is newly introduced to the network. For simplicity, and to illustrate
the nodes’ preferences between two conflicting transactions, we assign two distinct colors to the nodes: red
and blue.
Nodes favoring transactions T1 and T2 are depicted as red and blue colored nodes, respectively. For
simplicity and without any loss of generality, our analysis concentrates on scenarios where all correct
nodes unanimously choose the red color as their consensus.
The operational guidelines for mobile nodes and brokers in the network are outlined as follows:
1. Every correct mobile node always responds with honesty upon receiving any query.
2. A queried Byzantine node may respond with any color or even refuse to respond.
3. Every correct broker computes η-majority of the votes collected from all nodes connected to it and
broadcasts the majority vote to all of them.
4. Byzantine brokers cannot forge information since they cannot cryptographically sign transactions
from nodes. However, a Byzantine broker may compute η-majority of the votes collected from an
arbitrary subset of nodes connected to it and send the computed vote to any selected nodes which
are connected to this broker. They could also choose not to communicate.
The primary objective of adversaries with respect to safety is to prevent correct nodes from reaching
consensus on the same transaction among all conflicting transactions within a finite time. To achieve this,
malicious nodes do their best to hinder the process of all correct nodes reaching a unanimous decision on a
single color.
In order to prevent adversarial attacks, similar to Avalanche, we incorporate the following two counters
for each mobile node:
1) Conviction in Current Color (C3) counter to store how many consecutive computed majority votes
have resulted in the same color. Once a node flips its color, this counter resets to zero. Furthermore, a node
locks into the current color when this counter exceeds some security parameter β1.
2) Confidence counter to take into account the number of queries which have yielded a majority vote
for their corresponding colors. A node flips its color only if the confidence value of its computed vote is
larger than the confidence value of the current color. Moreover, a node locks into a color once the
confidence value of this color exceeds some security parameter β2.
Color-based Blizzard: Considering the aforementioned assumptions, the color-based Blizzard scheme
works as follows: each mobile node connects to k brokers uniformly at random and queries them. Please
note that randomly connecting each mobile node to k brokers is guaranteed by the distributed random matching scheme described in the previous section. Then, each of these brokers queries all its connected nodes regarding their colors. Subsequently, each of the connected nodes, upon being queried, sends its color to the brokers to which it is connected. Once the colors of all nodes connected to broker i are received, the broker computes whether at least η|N(i)| (where N(i) represents the nodes connected to broker i and η ∈ (1/2, 1]) of the collected responses have the same color or not. In the case of having at least η|N(i)| responses with the same color, the broker broadcasts the computed majority vote to all nodes connected to it.
Once the α-majority of votes collected from the brokers connected to a node yields a color, i.e., the node receives at least αk positive responses, the Confidence counter for that color increases by one. If this color (the one computed from the majority votes coming from the k brokers) is the same as the node's current color, the C3 counter increases by one; otherwise the node resets the C3 counter to zero. A node flips its color to a new color if the Confidence counter of the new color is larger than the Confidence counter of the current color.
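The per-node counter logic just described can be sketched as follows; the thresholds β1 and β2, the color encoding, and the round-by-round driver are illustrative assumptions.

```python
# Sketch of a node's color-update logic in color-based Blizzard (illustrative assumptions).
class ColorState:
    def __init__(self, color, beta1=15, beta2=150):
        self.color = color                     # current preferred color, "R" or "B"
        self.conf = {"R": 0, "B": 0}           # Confidence counters per color
        self.c3 = 0                            # Conviction-in-Current-Color counter
        self.locked = False
        self.beta1, self.beta2 = beta1, beta2  # security thresholds

    def on_round(self, majority_color):
        """Apply one round's alpha-majority outcome (majority_color, or None if no majority)."""
        if self.locked or majority_color is None:
            return
        self.conf[majority_color] += 1
        self.c3 = self.c3 + 1 if majority_color == self.color else 0
        # Flip only if the other color's confidence overtakes the current one.
        other = "B" if self.color == "R" else "R"
        if self.conf[other] > self.conf[self.color]:
            self.color, self.c3 = other, 0
        # Lock into the current color once either security threshold is exceeded.
        if self.c3 > self.beta1 or self.conf[self.color] > self.beta2:
            self.locked = True

node = ColorState("B")
for _ in range(20):
    node.on_round("R")                         # repeated red majorities from the k brokers
print(node.color, node.locked)                 # the node has flipped to "R" and locked
```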
Safety Analysis of Color-based Blizzard: Without loss of generality we assume that the initial node
colors are randomly assigned such that c/2 + 1 nodes have the red color while the remaining nodes are blue∥.
Let us represent correct mobile nodes which prefer red and blue colors with u and v, respectively. To
show consensus is achieved, we can show that nodes preferring the blue color gradually gain confidence in the
red color over time and they eventually turn into red-preferred nodes with high probability. By defining
z.Conf[R] as the confidence of node z in color R, i.e., the number of queries yielding a majority vote for
color R for node z (and similar definition for z.Conf[B]), the rule for color change is straightforward: node
z will change its color to red if z.Conf[R] > z.Conf[B] and flips to blue otherwise.
Our proof for reaching consensus will be shown as the following steps:
Step 1: After some finite time, the system reaches the point where there are c/2 + ∆ red nodes while the remaining nodes are blue.
∥This is the worst-case scenario for reaching consensus due to the balance in the number of nodes with different colors.
Step 2: At this point, v nodes have negative average growth in confidence value for blue color at any time
∗∗, with high probability. After a short period of time, we have v.Conf[B] − v.Conf[R] = −1, with high probability, and as a result, v nodes flip their color to red. Note that u nodes just gain more confidence in
red as time elapses in this step.
4.2.1.1 Step 1
We can model our scheme as a discrete-time Markov chain with states $s_i$, ∀i ∈ {0, . . . , c}, where $s_i$ represents the state with i red and c − i blue nodes, with transition probability matrix M.
Since each of these transition probabilities is a function of the adversary's behavior, let us now elaborate upon it. The most malicious scenario conducted by the adversary aims to achieve the following goal: keep the confidence values of the blue and red colors nearly the same for u nodes, while letting v nodes increase their confidence value of the blue color as much as they can. More formally, the scenario is to have u.Conf[R] = u.Conf[B] + 1 while maximizing κ, where κ ≜ v.Conf[B] − v.Conf[R]. Therefore,
• When a u node queries: All Byzantine mobile nodes acquire red colors. Regarding the malicious
brokers, all of them act with honesty without suppressing any color.
• When a v node queries: All Byzantine mobile nodes pick blue colors and all Byzantine brokers with
red-color majority switch to a blue-majority broker by not reflecting their red-color mobile nodes. With
high probability, as shown in Appendix A, a node is connected to at most $f \triangleq \frac{m_b r}{m}$ Byzantine brokers in a
population of r red brokers and m − r blue brokers.
Since none of the transition probabilities is zero, the system can reach the state $s_{c/2+\Delta}$ in finite time. We will discuss how ∆ can be determined in the next step. It is important to emphasize that we set the security parameters k, α, η, β1, and β2 such that no node finalizes its color during this step.
∗∗The average growth in the confidence value of node z for color x is $\mathbb{E}[z.\mathrm{Conf}^{(t)}[x]] - \mathbb{E}[z.\mathrm{Conf}^{(t-1)}[x]]$.
4.2.1.2 Step 2
In this step, we show how v nodes flip their color to red and, as a result, all correct nodes reach
consensus with high probability. To do so, we write the expected confidence value for mobile nodes at time
t as follows:
$$\begin{aligned}
\mathbb{E}[u.\mathrm{Conf}^{(t)}[R]] &= \mathbb{E}[u.\mathrm{Conf}^{(t-1)}[R]] + \mathbb{P}(C_u^{(t)} \text{ is red}) \\
&= \mathbb{E}[u.\mathrm{Conf}^{(t-1)}[R]] + \sum_{r=\alpha k}^{m} \mathbb{P}(C_u^{(t)} = \text{red} \mid A_r^i)\,\mathbb{P}(A_r^i) \\
&= \mathbb{E}[u.\mathrm{Conf}^{(t-1)}[R]] + \sum_{r=\alpha k}^{m} \left( \sum_{j'=\alpha k}^{k} \frac{\binom{r}{j'}\binom{m-r}{k-j'}}{\binom{m}{k}} \right) \binom{m}{r} p_{i+b}^{r} (1-p_{i+b})^{m-r},
\end{aligned} \qquad (4.4)$$

where $C_u^{(t)}$ represents the color of the computed vote of node u at time t. Moreover, $A_r^i$ denotes the event of having r red-majority brokers and m − r blue-majority brokers when there are i red mobile nodes in a population of n nodes. Therefore, $\mathbb{P}(A_r^i) = \binom{m}{r} p_i^{r} (1-p_i)^{m-r}$, where $p_i$, the probability of a broker being red given a total of i red nodes in a population of n nodes, is

$$p_i \triangleq \sum_{\ell=1}^{n} \underbrace{\frac{\sum_{j \geq \eta\ell} \binom{i}{j}\binom{n-i}{\ell-j}}{\binom{n}{\ell}}}_{\text{probability of the broker being red given } \ell \text{ connections}} \times \underbrace{\binom{n}{\ell}\left(\tfrac{1}{m}\right)^{\ell}\left(1-\tfrac{1}{m}\right)^{n-\ell}}_{\text{probability of the broker having } \ell \text{ connections}}. \qquad (4.5)$$
Similarly, we would have
E[u.Conf^{(t)}[B]] = E[u.Conf^{(t-1)}[B]] + \sum_{r=\alpha k}^{m} \sum_{j'=\alpha k}^{k} \frac{\binom{r}{j'} \binom{m-r}{k-j'}}{\binom{m}{k}} \binom{m}{r} (1 - p_i)^{r} p_i^{m-r}   (4.6)
E[v.Conf^{(t)}[R]] = E[v.Conf^{(t-1)}[R]] + \sum_{r=\alpha k}^{m} \sum_{j'=\alpha k}^{k} \frac{\binom{r-f}{j'} \binom{m-r+f}{k-j'}}{\binom{m}{k}} \binom{m}{r} p_i^{r} (1 - p_i)^{m-r}   (4.7)
E[v.Conf^{(t)}[B]] = E[v.Conf^{(t-1)}[B]] + \sum_{r=\alpha k}^{m} \sum_{j'=\alpha k}^{k} \frac{\binom{r+f}{j'} \binom{m-r-f}{k-j'}}{\binom{m}{k}} \binom{m}{r} (1 - p_i)^{r} p_i^{m-r}.   (4.8)
By defining D ≜ \big(E[v.Conf^{(t)}[B]] − E[v.Conf^{(t-1)}[B]]\big) − \big(E[v.Conf^{(t)}[R]] − E[v.Conf^{(t-1)}[R]]\big), we have
D = \sum_{r=\alpha k}^{m} \sum_{j'=\alpha k}^{k} \frac{\binom{r+f}{j'} \binom{m-r-f}{k-j'}}{\binom{m}{k}} \binom{m}{r} (1 - p_i)^{r} p_i^{m-r} − \sum_{r=\alpha k}^{m} \sum_{j'=\alpha k}^{k} \frac{\binom{r-f}{j'} \binom{m-r+f}{k-j'}}{\binom{m}{k}} \binom{m}{r} p_i^{r} (1 - p_i)^{m-r}.   (4.9)
We now aim to show that D takes negative values and that, once v.Conf[B] − v.Conf[R] reaches the value −1, all correct nodes acquire the red color. Let us introduce the random variables X_t ≜ v.Conf^{(t)}[B] − v.Conf^{(t)}[R] and X_{1:t} ≜ \sum_{i=1}^{t} X_i.
Since X_{1:t} satisfies the conditions of Hoeffding's inequality, because a) the X_t are i.i.d. and b) the X_t are sub-Gaussian since their values are bounded, we would have
P(X_{1:t} − E[X_{1:t}] ≥ q) ≤ \exp(−2tq^2).
Therefore, in order to show that X_{1:t} is negative with high probability, it suffices to show that E[X_{1:t}] is negative. To do so, based on the recursive formulas (4.7) and (4.8), we need to show that D takes negative values under certain conditions. Let us first elaborate upon how (4.9) can be approximated in the following theorem.
Theorem 2. D can be approximated as follows:
D ≈ G(k, m, 1 + ρ_b, α, 1 − p_i) − G(k, m, 1 − ρ_b, α, p_i),   (4.10)
where
G(k, m, λ, α, p_i) ≜ \sum_{r} \frac{1}{1 + e^{-1.702 \frac{\frac{k}{m}\lambda r - \alpha k}{\sqrt{\frac{k}{m}\lambda r (1 - \frac{\lambda r}{m})}}}} \cdot \frac{e^{-\frac{(r - m p_i)^2}{2\sigma_i^2}}}{\sqrt{2\pi \sigma_i^2}}, \qquad \sigma_i^2 ≜ m p_i (1 - p_i).   (4.11)
Proof. We note that the Probability Mass Function (PMF) of the hyper-geometric distribution with parameters (r, m, k) is p_{hg}(x; r, m, k) ≜ \frac{\binom{r}{x} \binom{m-r}{k-x}}{\binom{m}{k}}. By approximating
1. the Binomial distribution Binomial(n, p) (corresponding to the terms \binom{m}{r} p_i^{r} (1 − p_i)^{m−r} or \binom{m}{r} (1 − p_i)^{r} p_i^{m−r}) with the Normal distribution N(np, np(1 − p)),
2. the Hyper-geometric distribution HyperGeometric(r, m, k) with the Normal distribution N\big(k\frac{r}{m}, k\frac{r}{m}(1 − \frac{r}{m})\big),
D can be approximated as
D ≈ \sum_{r} \Big[1 - \Phi\Big(\frac{\alpha k - k\frac{r+f}{m}}{\sqrt{k\frac{r+f}{m}\big(1 - \frac{r+f}{m}\big)}}\Big)\Big] \frac{e^{-\frac{(r - m(1-p_i))^2}{2\sigma_i^2}}}{\sqrt{2\pi \sigma_i^2}} − \sum_{r} \Big[1 - \Phi\Big(\frac{\alpha k - k\frac{r-f}{m}}{\sqrt{k\frac{r-f}{m}\big(1 - \frac{r-f}{m}\big)}}\Big)\Big] \frac{e^{-\frac{(r - m p_i)^2}{2\sigma_i^2}}}{\sqrt{2\pi \sigma_i^2}},   (4.12)
where Φ(x) denotes the Cumulative Distribution Function (CDF) of the standard Normal distribution N(0, 1). According to [139], Φ(x) ≈ \frac{1}{1 + e^{-1.702x}}. Therefore, by substituting f = ρ_b r and using the aforementioned approximation of Φ(x), we can approximate D as (4.10).
Remark: One can easily see that D can take negative values when p_i > 1/2. This is because the logistic coefficient in G(·) considerably scales down the Normal distribution N(m(1 − p_i), mp_i(1 − p_i)) (appearing in G(k, m, 1 + ρ_b, α, 1 − p_i)) compared to the Normal distribution N(mp_i, mp_i(1 − p_i)) (appearing in G(k, m, 1 − ρ_b, α, p_i)).
Remark: As k increases, the logistic term in G(·) forms a sharper transition around its central point††. Therefore, for sufficiently large k, D is negative if \frac{m\alpha}{1+\rho_b} > m(1 − p_i)‡‡ and \frac{m\alpha}{1-\rho_b} < mp_i§§. These conditions are equivalent to p_i > \max\big(\frac{\alpha}{1-\rho_b}, 1 − \frac{\alpha}{1+\rho_b}\big).
Determining ∆: The proper choice of i can be obtained by finding the least integer i for which D is negative. Denoting this i by i^∗, ∆ (introduced in Step 1) can be found by solving c/2 + ∆ = i^∗.
††The value at which the logistic term equals 1/2.
‡‡Mean of N(m(1−p_i), mp_i(1−p_i)) < central point of the logistic coefficient of N(m(1−p_i), mp_i(1−p_i)).
§§Mean of N(mp_i, mp_i(1−p_i)) > central point of the logistic coefficient of N(mp_i, mp_i(1−p_i)).
All tuples (ρn, ρb) for which the safety is guaranteed can be obtained by checking if there exists an i
such that D < 0 for those values of ρn and ρb.
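As a rough illustration of this check, the following Python sketch evaluates the approximate drift D of (4.9)-(4.11) for a candidate pair (ρ_n, ρ_b) and reports the pair as safe if D is negative for some i. The parameter values (n, m, k, α, η) and the scanned range of i are illustrative assumptions, not the settings used in the simulations reported below.

import numpy as np
from scipy.stats import binom, hypergeom

def p_red_broker(i, n, m, eta):
    # p_i of Eq. (4.5): probability that a broker sees an eta-majority of red nodes,
    # marginalizing over its number of connections, which is Binomial(n, 1/m).
    p = 0.0
    for ell in range(1, n + 1):
        p_ell = binom.pmf(ell, n, 1.0 / m)
        if p_ell < 1e-12:
            continue
        j_min = int(np.ceil(eta * ell))
        p += hypergeom.sf(j_min - 1, n, i, ell) * p_ell
    return p

def drift_D(i, n, m, k, alpha, rho_b, eta):
    # Approximate drift D of Eqs. (4.10)-(4.11) when there are i red mobile nodes.
    p_i = float(np.clip(p_red_broker(i, n, m, eta), 1e-6, 1 - 1e-6))
    r = np.arange(int(np.ceil(alpha * k)), m + 1, dtype=float)

    def G(lam, p):
        mean_votes = k * lam * r / m
        var_votes = np.maximum(k * (lam * r / m) * (1 - lam * r / m), 1e-9)
        arg = np.clip(-1.702 * (mean_votes - alpha * k) / np.sqrt(var_votes), -50, 50)
        logistic = 1.0 / (1.0 + np.exp(arg))
        sigma2 = m * p * (1 - p)
        gauss = np.exp(-(r - m * p) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)
        return float(np.sum(logistic * gauss))

    return G(1 + rho_b, 1 - p_i) - G(1 - rho_b, p_i)

def is_safe(rho_n, rho_b, n=200, m=20, k=5, alpha=0.8, eta=0.5):
    c = int(n * (1 - rho_n))                     # number of correct mobile nodes
    return any(drift_D(i, n, m, k, alpha, rho_b, eta) < 0
               for i in range(c // 2 + 1, c + 1))

print(is_safe(rho_n=0.2, rho_b=0.1))

Sweeping is_safe over a grid of (ρ_n, ρ_b) values produces a region of the kind plotted in Fig. 4.4.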
We further perform simulations to obtain all tuples (ρn, ρb) such that the safety is assured for the case
of having 2,000 mobile devices. Fig. 4.4 illustrates this guaranteed safety region, shown with yellow color,
for different numbers of brokers and different numbers of connections. We consider 1,000 iterations with a 0.05 resolution for the Byzantine ratios of nodes and brokers. One interesting observation is that the Byzantine ratio of mobile nodes can reach 50% (when ρ_b is small) while the Byzantine ratio of brokers can approach 60% (when ρ_n is small), with a tradeoff between these two ratios. Notably, if the ratio of Byzantine nodes increases, the maximum tolerable ratio of Byzantine brokers for ensuring safety decreases, and vice versa.
This dynamic is particularly relevant when considering the specific behaviors of Byzantine entities in
Blizzard. Malicious brokers are mostly limited to suppressing messages, as they cannot initiate or forge
messages, while Byzantine mobile nodes, although computationally constrained and unable to forge
signatures, can adopt any execution strategy. It is crucial to recognize that mobile devices are inherently
more susceptible to failures, such as battery drain or network disconnections, which, within the context of
the Blizzard protocol, are treated akin to malicious behaviors. However, the protocol is designed to operate
effectively as long as the combined proportion of Byzantine mobile nodes and brokers falls within the
safety guaranteed region. Moreover, brokers, due to their typically more stable infrastructure, are less
prone to the types of failures common to mobile devices, allowing for a higher tolerable ratio of Byzantine
brokers compared to mobile nodes. This distinction underscores the strategic advantage of investing in the
integrity and reliability of brokers. By ensuring a higher proportion of correct brokers, Blizzard can
maintain operational safety even with a greater ratio of mobile nodes exhibiting malicious or
failure-induced behaviors. This insight highlights the protocol’s capacity to adapt to and mitigate the risks
associated with the inherent vulnerabilities of mobile nodes, further enhancing Blizzard’s resilience and
operational efficacy in environments with diverse Byzantine threats. One notable observation is that the
safety region expands as k increases. This phenomenon can be attributed to the intuitive understanding
that a higher value of k indicates a greater involvement of nodes in the transaction query.
The safety region plot effectively showcases the resilience of the Blizzard protocol against adversarial
behaviors, particularly focusing on the range of actions that Byzantine nodes and brokers might undertake.
It is crucial to note that while correct mobile nodes consistently respond truthfully, Byzantine nodes may
offer any response. Additionally, while correct brokers broadcast the η-majority vote, Byzantine brokers,
despite their inability to forge information, can selectively compute and communicate the η-majority from
a subset of nodes. This nuanced behavior underlines the importance of the safety region in ensuring that
all correct nodes decide on the same transaction among conflicting ones, a vital aspect for maintaining
integrity in real-world scenarios where diverse adversarial tactics may be employed. For example, if there
are 170 brokers in the network and around 20% of them are malicious, then according to the safety
guaranteed region in Fig. 4.4, the Blizzard protocol guarantees reaching consensus as long as the portion
of Byzantine mobile nodes does not exceed 40% of all mobile nodes. This demonstrates Blizzard’s capacity
to uphold consensus and safety even in significantly adverse conditions, emphasizing its practical utility in
distributed systems where the presence of malicious actors is a real concern.
4.2.2 Safety Evaluation of GCN-Consensus
Similar to Blizzard [88] and Avalanche [7], to demonstrate the robustness of the GCN-Consensus scheme in
terms of safety, we illustrate how our GCN-Consensus can learn from Avalanche [7] to ensure that every
correct node uniformly agrees on the same transaction from a set of conflicting ones within a finite time
frame. For example, consider a scenario where transaction T1 already exists and a conflicting transaction
T2 is newly introduced to the network. To simplify and illustrate the nodes’ preferences between two
conflicting transactions, we assign two distinct colors to the nodes: red and blue.
[Figure 4.4: four panels plotting node Byzantine ratio against broker Byzantine ratio for the different values of k and m considered.]
Figure 4.4: An illustration of the safety-guaranteed region (indicated in yellow) of the Blizzard protocol obtained through simulation for different numbers of connections per mobile node, i.e., k, and different numbers of brokers, i.e., m, while fixing the total number of mobile nodes at n = 2000 and using 1000 iterations.
Nodes favoring transactions T1 and T2 are depicted as red and blue nodes, respectively. The
operational guidelines for nodes in the network, similar to those in Avalanche, are outlined as follows:
1. Every correct node always responds honestly upon receiving any query.
2. A queried Byzantine node may respond with any color or even refuse to respond.
The primary objective of adversaries with respect to safety is to prevent correct nodes from reaching
consensus on the same transaction among all conflicting transactions within a finite time. To achieve this,
malicious nodes attempt to obstruct the process of all correct nodes reaching a unanimous decision on a
single color.
To counter adversarial attacks, similar to Blizzard and Avalanche, we incorporate the Conviction in
Current Color (C3) counter as well as the Confidence counter (as mentioned in the previous section) for each
node.
Color-based GCN-Consensus: For simplicity, we assume that the initial node colors are randomly
assigned such that c/2 + 1 nodes are colored red, while the rest are colored blue. This represents the most challenging scenario for reaching consensus due to the even distribution of nodes with different colors.
Regarding the input features of nodes, we consider seven features: the current color (where 1
represents red and 0 indicates blue), the preference counter in red, the preference counter in blue, the
conviction in the current color, the confidence threshold (set to the same value as in Avalanche), the conviction threshold (set to the same value as in Avalanche), and the finalization status. The output of the GCN is a
binary value, where 1 indicates red and 0 represents blue.
Under these assumptions, the color-based GCN-Consensus scheme operates as follows: at each round,
every node connects to k other nodes uniformly at random. Then, by running GCN-Consensus on the
input features, the model’s output determines the current color of the correct nodes. Note that malicious
nodes can take any color. The input features are updated in each subsequent round.
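For concreteness, the following is a minimal, untrained sketch of such a color-based GCN-Consensus model in PyTorch: two GCN layers with 50 hidden units, seven input features per node, and a single output per node (1 for red, 0 for blue), following the description above and the model sizes reported below. The symmetric normalization, activation choices, and the random test graph are illustrative assumptions; the training loop is omitted.

import torch
import torch.nn as nn

class TwoLayerGCN(nn.Module):
    # Two graph-convolution layers: each multiplies node features by a normalized
    # adjacency matrix and then applies a learned linear map.
    def __init__(self, in_feats=7, hidden=50):
        super().__init__()
        self.w1 = nn.Linear(in_feats, hidden)
        self.w2 = nn.Linear(hidden, 1)

    def forward(self, x, adj):
        deg = adj.sum(dim=1)
        d_inv_sqrt = torch.pow(deg.clamp(min=1.0), -0.5)
        a_hat = d_inv_sqrt.unsqueeze(1) * adj * d_inv_sqrt.unsqueeze(0)  # D^{-1/2} A D^{-1/2}
        h = torch.relu(self.w1(a_hat @ x))
        return torch.sigmoid(self.w2(a_hat @ h)).squeeze(-1)            # per-node color score

# One illustrative round: 100 nodes sampled with edge probability 0.05, as in the test graphs.
n = 100
adj = (torch.rand(n, n) < 0.05).float()
adj = ((adj + adj.t()) > 0).float()
adj.fill_diagonal_(1.0)                            # add self-loops
x = torch.rand(n, 7)                               # color, counters, thresholds, finalization status
colors = (TwoLayerGCN()(x, adj) > 0.5).int()       # predicted color of each correct node this round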
Using this scheme, we trained a two-layer GCN-Consensus model with 50 hidden nodes on a 400-node
graph with an edge probability of 0.05. Figure 4.5 illustrates the safe region (the ratio of Byzantine nodes
that can be tolerated without compromising the safety of the proposed scheme). As expected and similar to
Blizzard, the GCN-Consensus protocol guarantees safe operation as long as the ratio of Byzantine nodes
does not exceed 50%. Figure 4.6 shows the required number of rounds to reach consensus. Note that 0
values correspond to cases where consensus is not guaranteed and should not be considered. As expected,
the average number of rounds to reach consensus increases as the Byzantine ratio increases. This is due to
the fact that having more malicious nodes strengthens the adversary’s ability to delay consensus by
diverting correct nodes from reaching a unanimous decision.
Figure 4.5: Percentage of reaching consensus by GCN-Consensus scheme versus Byzantine ratio of nodes
for test graphs with 100 nodes.
4.2.3 Liveness
Similar to other DAG-based protocols such as IOTA [102] and Avalanche [7], liveness failure in Blizzard
occurs when either a transaction has an invalid transaction as its parent or a transaction does not gain
enough confidence value. The former scenario can be resolved by re-issuing the transaction with new valid
parents, while the latter could be resolved by having a node send additional valid transactions as
successors to increase the confidence value.
4.3 Performance Analysis of Blizzard
We first present our analysis on throughput per shard, then we focus on analyzing latency.
4.3.1 Throughput per Shard
We elaborate upon obtaining the throughput of Blizzard from a novel perspective, i.e. modeling it as a
pipeline, as illustrated in Fig. 4.7. By considering ti as the required time for performing the task of
Figure 4.6: Required number of rounds to reach consensus by GCN-Consensus scheme versus Byzantine
ratio of nodes for test graphs with 100 nodes and setting conviction threshold to 15. Note that 0 means
that it does not reach consensus.
component i, and considering that the component with the smallest rate would dominate the result, the
throughput would be equal to \min_{1 \le i \le 8} 1/t_i.
Considering the network bandwidth as BW bps and the transaction size as 300 Bytes, the rate of the
communication components (represented by green colored boxes, specifically components 2-3, 5, and 7)
would be approximately BW/2400 transactions per second (tps). Among the computing components (orange colored boxes), i.e., components 1, 4, 6, and 8, component 4 (checking if a transaction is strongly preferred)
dominates the computing time due to the fact that it is more time-consuming compared to the other
computing components. Using this analysis, we will quantify the throughput using experimental
measurements in section 4.4.
[Figure 4.7 pipeline components (each with required time t_1, ..., t_8), triggered by a new transaction: node preprocessing; querying connected brokers; querying connected nodes; nodes computing IsStronglyPref; brokers computing the majority of nodes' responses; brokers responding to nodes; nodes responding to brokers; nodes computing the majority of brokers' responses.]
Figure 4.7: The pipeline modeling of different components of Blizzard to acquire throughput. We use orange
for computing components and green for communication components in our box diagrams
4.3.2 Latency
Total latency refers to the time interval from when a transaction is issued until it is finalized. It is
upper-bounded by the sum of three terms:
Latency ≤ tpropagation + tvalidation + tconfidence (4.13)
These terms are, respectively, 1) the propagation time tpropagation - the time taken for a transaction to be
disseminated throughout the entire network, 2) the transaction validation time tvalidation, and 3) the
confidence-gathering time tconfidence, which is the time required for a transaction to achieve a threshold level
of confidence through successive transactions voting for it. We analyze each of the aforementioned terms
as follows.
4.3.2.1 Propagation Time
Since the propagation time is linearly proportional to the number of communication rounds ¶¶, we aim to
obtain the Least Number of Communication Rounds (LNCR) required for a transaction to propagate in the
entire network, starting from the node that issued this transaction and ending with the last node that
discovers this transaction. We are able to prove a strong result about LNCR in Blizzard.
¶¶A communication round is a communication transmission from a mobile node to a broker or vice versa.
Theorem 3. Let us assume that \frac{(m-k)n}{(m-1)m} > 1; then the LNCR of Blizzard equals 4 with high probability.
Proof. We first demonstrate that, with high probability, the distance between any two vertices representing
brokers in the corresponding bipartite graph is 2. Then, the distance between any two vertices indicating
mobile nodes would be at most 2 more than that, i.e. 4 with high probability. The probability that vertices
v1 and v2, ∀v1 ̸= v2 and v1, v2 ∈ V , have distance 2 can be obtained as follows:
P(dist(v_1, v_2) = 2) = P(existence of at least one node u ∈ U which is connected to both v_1 and v_2)
\stackrel{(a)}{=} 1 - \Big(1 - \frac{\binom{m-2}{k-1}}{\binom{m-1}{k-1}}\Big)^{n/m} = 1 - \Big(1 - \frac{m-k}{m-1}\Big)^{n/m} \stackrel{(b)}{\approx} 1 - e^{-r},
where (a) follows from considering an average of n/m mobile nodes per broker and (b) follows from \lim_{n\to\infty}(1 - \frac{r}{n})^n = e^{-r} with r := \frac{(m-k)n}{(m-1)m}. It is easy to see that e^{-r} is small enough if r > 1 (or, equivalently, if the condition of the Theorem holds). This condition is realistic due to the fact that we expect n ≫ m.
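A quick numerical illustration of this condition, using assumed example values of n, m, and k, is:

import math

n, m, k = 2000, 100, 5          # assumed example values: mobile nodes, brokers, connections per node
r = (m - k) * n / ((m - 1) * m)
print(f"r = {r:.2f}, condition r > 1: {r > 1}")
print(f"P(distance between two brokers = 2) ~ 1 - e^-r = {1 - math.exp(-r):.6f}")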
4.3.2.2 Transaction Validation Time
As mentioned earlier, the transaction validation time refers to the time required for a node to check the
validity of transactions. Therefore, we can use the pipeline scheme, explained in the previous section on
throughput and illustrated in Fig. 4.7, to model this time. Mathematically, the transaction validation time
can be expressed as \sum_{i=1}^{8} t_i.
4.3.2.3 Confidence-gathering Time
Let us assume L represents the average number of transactions that arrive after a transaction is appended to the DAG until it gets finalized∗∗∗. By defining ζ as the arrival rate of transactions, the average confidence-gathering time would be L/ζ.
Using the above analysis, we will quantify the latency using experimental measurements in section 4.4.
4.3.3 Average Message Complexity
Average message complexity refers to the average number of messages required for a transaction to be queried
by all nodes. The average message complexity of Blizzard can be obtained by noting that each broker is
queried by one of its connected nodes and then collects and sends back the majority vote to all its
connected nodes. This implies that Blizzard needs m + 2kn messages for querying.
While we argue that Avalanche cannot be implemented on mobile device networks in a scalable
manner, as it requires direct peer-to-peer communication between any two random devices, we can still
compare Blizzard and Avalanche in terms of their trade-off between message complexity and LNCR
assuming Blizzard were to be implemented on the same network as Avalanche, as shown in Fig. 4.8. For
having a fair comparison (i.e., an equal number of nodes being queried on each transaction), q (the number of sampled nodes in Avalanche) should be nk/m. As q increases in the Avalanche protocol, the LNCR decreases while the total number of required messages significantly increases. However, the Blizzard protocol obtains the best of both worlds, i.e., a lower LNCR and a lower total number of required messages†††.
∗∗∗The value of L would depend on the security parameters of the protocol as well as the DAG-attachment policy adopted by
nodes; in the special case when all transactions are attached sequentially in a chain, it can be shown that L would be equal to
min (β1, β2).
†††Note that the number of required messages in Blizzard could be reduced further by decreasing m, but this would result in
lower security.
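The following back-of-the-envelope sketch contrasts the two message counts under the setting of Fig. 4.8 (n = 500, k = 3, m varied). The Blizzard count m + 2kn and the fair-comparison sample size q = nk/m come from the text; the Avalanche count of roughly 2qn messages per transaction (one query and one response per sampled node) is our own simplifying assumption for illustration.

n, k = 500, 3
for m in (4, 9, 14, 19):
    q = n * k / m                      # Avalanche sample size for an equal number of queried nodes
    blizzard_msgs = m + 2 * k * n      # count stated in the text
    avalanche_msgs = 2 * q * n         # assumed: one query plus one response per sampled node
    print(f"m = {m:2d}: Blizzard ~ {blizzard_msgs:,} messages, "
          f"Avalanche (q = {q:.0f}) ~ {avalanche_msgs:,.0f} messages")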
Figure 4.8: The total number of messages versus the least number of communication rounds (LNCR) in Blizzard compared with Avalanche for the following setting: n = 500, k = 3, and m varied from 4 to 19.
4.4 Implementation and Experimental Measurements
We implemented Blizzard in C++ and also a version of Avalanche for comparison purposes. We have made
our source code for the implementation of Blizzard as well as its comparison with Avalanche publicly
available at https://github.com/ANRGUSC/Blizzard.
Using our implementation, we ran the computations of Blizzard on a compute machine with a 2.3 GHz Intel Core i5 (whose frequency is lower than that of state-of-the-art mobile CPUs [140]), so that all computations are emulated in real time. We simulated the communication between nodes and brokers with a one-way network latency drawn from a uniform distribution with a 100 ms mean and a 25 ms standard deviation. We next present the experimental measurements of throughput and latency,
as well as an estimation of battery energy consumption.
4.4.1 Throughput Evaluation
As we explained in 4.3.1, since component 4 (checking if a transaction is strongly preferred) in Fig. 4.7
dominates the computational time, we implemented component 4 in C++ and observed empirically that
t4 ≈ 100µs for the setting where there are 400 transactions known by mobile nodes. Since the computing
power of top 10 iOS mobile devices exceeds the computing power of the device used in the implementation
we performed‡‡‡, the throughput of Blizzard is as presented in the following table:
Network Bandwidth     100 Mbps      10 Mbps      1 Mbps
Throughput on PCs     10,000 TPS    4,166 TPS    416 TPS
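The table can be reproduced from the pipeline model of Section 4.3.1: the communication components run at roughly BW/2400 tps for 300-byte transactions, while the dominant computing component takes t4 ≈ 100 µs, so the throughput is min(BW/2400, 1/t4). A short illustrative computation:

t4 = 100e-6                                   # seconds, measured for component 4
for bw in (100e6, 10e6, 1e6):                 # network bandwidth in bits per second
    throughput = min(bw / 2400, 1 / t4)       # pipeline is limited by its slowest stage
    print(f"{bw/1e6:>5.0f} Mbps -> ~{throughput:,.0f} TPS")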
4.4.2 Latency Evaluation
Assuming security parameters β1 = 11, β2 = 150, a chain topology for the DAG and a transaction arrival
rate higher than 100 tps, the propagation and validation times are going to be dominant compared to the
confidence time, and in turn they will each be dominated by four communication steps; this implies a total
latency on the order of < 1s (∼ 0.65s).
Fig. 4.9 shows the histograms of transaction latency for our proposed scheme and Avalanche [7] in two different settings. As can be observed, Blizzard reduces latency significantly, by ∼50%. Further, one can see that Avalanche exhibits a wide spread of transaction latencies while Blizzard's latencies are tightly concentrated.
Figure 4.9: The left and right plots respectively represent the histogram of transactions latency of our
proposed scheme (Blizzard) compared with Avalanche [7] for the case of 100 transactions, security parameters β1 = 11 and β2 = 150. The left histogram corresponds to the settings of 100 nodes, 8 brokers,
and 3 connections per node, while the right figure corresponds to the case of 200 nodes, 11 brokers, and 6
connections per node.
‡‡‡Based on https://www.geekbench.com/
Table 4.3: Comparison of Different Protocols.
Protocol        Transactions per Second per Shard   Confirmation Latency   Number of Validators   Mobile-based
Avalanche [7]   ∼3,400                               ∼1.35 s                100k+                  No
Blizzard        ∼10,000                              ∼0.65 s                100M+                  Yes
4.4.3 Battery Energy Consumption
Here, we provide a rough estimate of the energy consumed per transaction validated by a mobile device.
Given that the amount of data exchanged between the mobile nodes and online brokers for each
transaction is relatively small, and considering that no computationally intensive Sybil control mechanism,
such as Proof of Work, is required, the primary source of energy consumption in our protocol is
computation. As discussed above, the dominant source of computation when validating a transaction (see
Figure 4.7) is component 4 (Computing IsStronglyPref.). Based on our experiments, we estimate this takes
about 100 µs. Assuming a mobile CPU power consumption of 1.5 Watt [140] and conservatively assuming
full CPU utilization, this would translate to about 1.5 × 10^{-4} Joules per transaction. While we are not
aware of benchmark numbers for other protocols we could compare this with since other protocols are
typically not built with energy efficiency of validators in mind, we believe this imposes a relatively low,
manageable load on a mobile device, particularly as the device owner is free to determine what number of
transactions it participates in over a given period of time.
To succinctly highlight the differences between Blizzard and Avalanche, we present the comparative
Table 4.3.
4.5 Discussions
Here, we briefly discuss topics that require further exploration. While these are beyond the current work’s
scope, they represent active areas of our ongoing research.
4.5.1 Mobile-Device Oriented Sybil Control
This work presents a consensus mechanism that assumes the existence of a Sybil-control mechanism, such
as Proof of Work or Proof of Stake. Implementing Blizzard in a network with millions of mobile devices
necessitates a Sybil control mechanism that ensures only users with legitimate mobile devices can
participate, deterring the creation or operation of multiple fake identities. One possible design could utilize
globally unique mobile IDs, like IMEI numbers, while maintaining an open and decentralized architecture.
Alternatively, decentralized IDs with a permissioned setup or using location information and wireless
signal strength could serve as a Sybil control mechanism [141, 142]. Another approach could involve Proof
of Social Contacts [143], leveraging encounter information between mobile nodes to detect and blacklist
Sybil nodes.
4.5.2 Improving Scalability
A significant bottleneck in a DAG-based protocol is that all nodes must store the entire DAG and verify
transactions to be strongly preferred by traversing the DAG. This issue is compounded when
resource-limited mobile nodes are involved in the consensus mechanism, as proposed in Blizzard. To
address computation and storage challenges, we explore three approaches: sharding [144], pruning DAG,
and off-loading verification.
• Sharding: This technique creates multiple pools of mobile nodes, with each pool focusing on storing
and verifying transactions for a specific subset of accounts. This method trades off security for
scalability, per the Blockchain Trilemma, while maintaining decentralization. Sharding alleviates
computational and storage constraints on mobile devices, enabling effective participation in the
consensus process without overwhelming the devices.
• Pruning DAG: By defining a checkpoint transaction as finalized, all ancestor transactions of a
checkpoint transaction are also finalized. In the account model (used in Ethereum), once a
checkpoint transaction is reached on the DAG, storing all ancestor transactions becomes
unnecessary. Thus, each mobile node can save significant memory by storing only the pruned DAG.
Including checkpoint concepts allows for lightweight nodes. However, practical considerations, such
as retaining historical information on some full nodes for security, must be addressed [145]. In our
architecture, online brokers could serve as full nodes storing the entire history, while mobile nodes
store information past the last checkpoint, easing memory constraints.
• Off-loading verification: Inspired by [146], this approach allows verification computation to be
offloaded to powerful servers that provide zero-knowledge proofs, which mobile devices can verify
more efficiently. This method addresses the computational power constraints of mobile devices,
enabling them to verify transactions without significant resource expenditure.
4.5.3 Safety under a Partially Synchronous Model
As shown in [115], proving safety under a partially synchronous model requires explicitly considering the
total number of participating nodes and their votes when finalizing transactions, as seen in protocols like
Tendermint [93] and Hotstuff [97]. Alternatively, exploring a decentralized BFT approach to empirically
quantify an upper bound on network latency at all times could be valuable. With such a mechanism,
protocol parameters could be adapted to ensure correct and efficient operation despite time-varying
network latency in a partially synchronous network.
4.5.4 Connectivity
Blizzard is designed to function in the unpredictable and unreliable network environments characteristic of
mobile devices. If a mobile node disconnects, the system’s integrity is maintained as long as the proportion
of malicious entities is within the specified safe range. The only requirement for the disconnected node
upon reconnection is to update its DAG ledger through communication with connected brokers.
Safety in our protocol is ensured through random connections to brokers, which protect against
coordinated adversarial attacks. To further adapt our protocol to the dynamic nature of user mobility, we
propose leveraging advancements in telecommunications like 5G and forthcoming 6G networks. These
technologies will enable a more densely connected network of brokers, enhancing resilience to adversarial
threats and complexities introduced by mobile user behavior. Combining random broker connections with
increased broker density is key to evolving our protocol for more realistic network scenarios.
4.5.5 Challenges in Real-World Implementation
Implementing consensus protocols on mobile devices presents several challenges. One major hurdle is the
resource constraints of mobile devices, which have limited battery life compared to traditional computing
nodes. Optimizing consensus protocols for low resource consumption is critical to prevent excessive
battery drain and performance degradation.
From an implementation standpoint, it is essential that every mobile node and broker have access to
the distributed random beacon. This access is vital for facilitating random connections between mobile
devices and brokers, raising additional considerations regarding network architecture, security, and
dependability. Providing stable and secure beacon access in a broad and fluctuating network of mobile
devices is a unique challenge that must be addressed to preserve the consensus protocol’s integrity and effectiveness.
Another crucial aspect is the impact of these protocols on user experience. It is imperative that the
running of consensus protocols in the background does not intrusively affect the device’s primary
functionalities, ensuring seamless operation without significantly impacting the user’s experience.
4.6 Summary
In this work, we introduced Blizzard, the first Byzantine Fault Tolerant (BFT) consensus-based distributed ledger protocol designed specifically for mobile devices, and we also proposed the GCN-Consensus scheme. Blizzard
features a novel two-tier architecture utilizing brokers and a decentralized random matching mechanism.
We have provided a mathematical analysis of Blizzard’s safety guarantees, which are represented as a
two-dimensional region. Our numerical computations for sufficiently large networks indicate that Blizzard
can support less than 50% Byzantine nodes (when the number of Byzantine brokers is small) and less than
60% Byzantine brokers (when the number of Byzantine nodes is small), demonstrating a trade-off between
these ratios. Additionally, we analyzed and evaluated Blizzard’s performance in terms of throughput,
latency, and message complexity. Our results show that Blizzard exhibits superior performance with
significantly lower message complexity and shorter propagation latency compared to its benchmarks. This
is particularly important in the mobile-first scenario considered here, where directly allowing mobile
devices to randomly query other mobile nodes in a large network without online servers is challenging.
The proposed GCN-based consensus mechanism represents a novel approach that utilizes graph neural
networks (GNNs) to manage consensus in gossip-based protocols. This method leverages the inherent
ability of GNNs to process and analyze relational data among nodes, which continuously exchange and
update information. Each node is characterized by features that reflect its state and dynamically influence
interactions with neighboring nodes. This work demonstrates that GNNs are well-suited for handling the
complexities of gossip-based consensus mechanisms. We show that our GCN-based consensus model can
effectively mimic the Avalanche protocol, showcasing its ability to learn and adapt to the intricate
functions required for consensus operations.
Chapter 5
Distributed Iterative Processes over Networked Systems
Heuristic algorithms can significantly deviate from optimal task scheduling, which has led to the growing
interest in meta-heuristic and optimization-based approaches that approximate the NP-hard optimization
of task scheduling. These schemes are not only adept at addressing large-scale problems, but they are also
practically efficient in yielding near-optimal solutions. Notable examples of meta-heuristic schemes include
Particle Swarm Optimization [22], Simulated Annealing [23]-[24], and Genetic Algorithms [25],[26],[27].
While few works have explored convex-relaxation based solutions for scheduling problems, these
approaches are not suitable for distributed iterative processes as they do not account for general directed
task graphs (which may include cycles) [28]-[29]. Additionally, these schemes have different objectives and
constraints, such as allowing task splitting across distributed computing machines, rather than focusing on
minimizing bottleneck time. Moreover, they often overlook the variability in communication bandwidth
between different links/paths among distributed computing machines. To our knowledge, this work is the
first to: a) consider general directed graphs, b) minimize bottleneck time while accounting for both
computation costs and heterogeneous network communication costs, and c) provide a convex-relaxation
based solution.
To the best of our knowledge, no previous work has proposed a Semi-Definite Programming (SDP)
relaxation for distributed iterative processes ∗
that simultaneously considers both computation and
∗This chapter is adapted from [90].
communication aspects. The primary contributions of this work include the formulation of task scheduling
for distributed iterative processes, requiring only that the task graph is directed, on distributed computing
machines as an optimization problem. We introduce a specific SDP approximation for this optimization
problem and employ a randomized rounding technique to obtain a feasible solution. Furthermore, we
analyze the expected value of the bottleneck time for our proposed scheme and provide a mathematical
upper bound on the optimal solution. We also evaluate the performance of our scheme using real data and
Gossip-based federated learning, demonstrating that it surpasses the HEFT algorithm [13] and another
approach [19] aimed at maximizing throughput.
5.1 Problem Definition
In this chapter, we address the minimization of bottleneck time in distributed iterative processes with
inter-task dependencies. We begin by briefly discussing gossip-based federated learning (FL) as an example
of such processes, followed by a detailed problem formulation.
5.1.1 Gossip-based Federated Learning
To ensure data privacy in distributed machine learning (ML) scenarios like FL, secure communication links
can be established using trusted distributed computing resources†
. A gossip-based FL scheme can be
modeled as a network topology with a set of users denoted by U, where each user i ∈ U gossips its local
model parameters to a predefined set of other users, denoted by U_i. This network topology can be represented by a directed graph G_{FL} := (V_{FL}, E_{FL}), where V_{FL} := {i | i ∈ U} and E_{FL} := {e^{(FL)}_{i,j} | i ∈ U, j ∈ U_i} represent the set of vertices and edges, respectively. Each user aggregates
the model parameters received from other users and updates its local model parameters. Training
convergence is achieved by iterating this procedure [147].
†Amazon Web Services (AWS) is an example of trusted distributed computing resources.
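As an illustration only, the following sketch runs a few rounds of such a gossip scheme with plain model averaging on a small, assumed neighbor structure; the graph, model dimension, and aggregation rule are placeholders rather than the setup evaluated later in this chapter.

import numpy as np

rng = np.random.default_rng(0)
num_users, dim = 6, 10
neighbors = {i: [(i + 1) % num_users, (i + 2) % num_users] for i in range(num_users)}  # U_i
models = {i: rng.normal(size=dim) for i in range(num_users)}                            # local parameters

def gossip_round(models, neighbors):
    inbox = {i: [] for i in models}
    for i, U_i in neighbors.items():              # each user gossips its model to its neighbor set
        for j in U_i:
            inbox[j].append(models[i])
    # each user aggregates the received models with its own and updates its local model
    return {i: np.mean([models[i]] + received, axis=0) for i, received in inbox.items()}

for _ in range(20):                               # iterating the procedure drives the models together
    models = gossip_round(models, neighbors)
print(np.std([models[i] for i in models], axis=0).max())   # spread across users shrinks over rounds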
5.1.2 Problem Formulation
In this section, we formally define the optimization problem for minimizing bottleneck time in distributed
iterative processes. Such processes, including gossip-based FL, operate on a distributed computing platform
with interconnected resources via communication links. This execution can be represented by two distinct
directed graphs: the task graph and the compute graph, which correspond to the task structure of the
distributed iterative process and the distributed computing platform, respectively. We describe these
graphs in detail below:
Task Graph: The dependencies among various tasks, where one task generates inputs for others, can be
modeled using a directed graph as shown in Fig. 5.1. Unlike most previous work that considers Directed
Acyclic Graphs (DAGs), we assume a general directed graph for our task graph. Define the task graph as
G_{Task} = (V_{Task}, E_{Task}), where V_{Task} := {T_i}_{i=1}^{T} represents the set of vertices (tasks {T_i}_{i=1}^{T}), and E_{Task} := {e_{i,i'}}_{(i,i') ∈ Ω} represents the set of edges (task dependencies), with Ω := {(i, i') | task i generates inputs for task i'}. Let p := [p_1, . . . , p_T]^T denote the required computational effort for each task.
Figure 5.1: An illustration of a task graph consisting of five tasks.
Compute Graph: Each task must be executed on a compute node (machine), which is connected to other
compute nodes via communication links. In this context, the terms compute node and machine are used
interchangeably. The execution speed of these compute nodes is represented by the vector
e := [e_1, . . . , e_K]^T. The communication delay between any two compute nodes is determined by the
bandwidth of the link connecting them. If two machines are not directly connected, we assume the
corresponding bandwidth is zero, implying an infinite communication delay.
The communication network of distributed computing nodes can be modeled as a complete‡ graph
G_{Compute} = (N_{Compute}, L_{Compute}). Here, N_{Compute} = {n_j}_{j=1}^{K} represents the set of compute nodes, and L_{Compute} = {1/b_{j,j'}}_{∀ j ≠ j'} represents the links connecting these nodes, where b_{j,j'} is the bandwidth of the link between machines j and j'. An illustration of a compute graph is shown in Fig. 5.2. This model accounts for the case where two machines are not connected by assigning an infinite communication delay.
Since the result of executing a task is a model with the same number of parameters, we can represent the communication delays between machines using the matrix C ∈ R^{K×K}.
‡A complete graph is a type of graph where each vertex is connected to every other vertex.
Figure 5.2: An illustration of a compute graph consisting of three machines with corresponding bandwidths
shown on edges.
We now detail the formulation for minimizing the bottleneck time in distributed iterative processes.
Specifically, we introduce the objective function and constraints inherent to the problem.
Objective Function: The goal is to determine the optimal task mapper function, denoted as
m(.) : VT ask → NCompute, which assigns task i to machine m(i) for all i such that the bottleneck time of
throughput is minimized. The bottleneck time of throughput is defined as the maximum
compute-communicate time across all tasks for a given task mapper matrix M ∈ {0, 1}^{T×K}, which is equivalent to the mapper function m(.). In this matrix, [M]_{i,j} = 1 if task i is assigned to compute machine j; otherwise, [M]_{i,j} = 0. By defining S := (G_{Task}, G_{Node}, m(V_{Task})), the compute-communicate time for task i can be expressed as follows:
t^{(i)}_{comp-comm}(S) := t^{(i)}_{comp}(S) + t^{(i)}_{comm}(S) \quad \forall i,   (5.1)
where t^{(i)}_{comp}(S) is the time needed for task i to be executed on machine m(i) (compute time) and t^{(i)}_{comm}(S) is the time taken to transmit the results to the machines specified to run the immediate successive tasks of task i (communicate time). Therefore, the bottleneck time of throughput is
t_{Bottleneck}(S) := \max_{i} t^{(i)}_{comp-comm}(S).   (5.2)
The objective function can be formally written as:
t^{*}_{Bottleneck} := \min_{M \in \{0,1\}^{T \times K}} \max_{i} t^{(i)}_{comp-comm}(M).   (5.3)
Constraints: Since each task needs to be executed on exactly one machine, with its results then transmitted to the machines executing the successive tasks, we can write these constraints as follows:
\sum_{j} [M]_{i,j} = 1 \quad \forall i,   (5.4)
or, equivalently,
M 1_{K \times 1} = 1_{T \times 1}.   (5.5)
Optimization Problem: By considering the aforementioned objective function and constraints, the optimization problem becomes
\min_{M \in \{0,1\}^{T \times K}} \max_{i} t^{(i)}_{comp-comm}(S) \quad \text{s.t.} \quad M 1_{K \times 1} = 1_{T \times 1}.   (5.6)
We can rewrite the objective function in terms of required task processing vector p, machine
execution speeds vector e, communication delay matrix C, and task mapper function in closed form. Each
term of (5.1) can be further derived as
• t^{(i)}_{comp}(S): Since each machine executes all tasks assigned to it in parallel§, the required time to run task i is
t^{(i)}_{comp}(S) = \frac{\sum_{r: m(r)=m(i)} p_r}{e_{m(i)}} \stackrel{(a)}{=} \sum_{\ell=1}^{K} \frac{p^T M_\ell}{M_i e} [M]_{i,\ell} \quad \forall i,   (5.7)
§The CPU allocation is proportional to the size of required computations for tasks.
where M_i represents the ith row of matrix M, and step (a) follows from the constraint that each row of matrix M contains exactly one 1. Simplifying further, (5.7) can be rewritten as:
t^{(i)}_{comp}(S) = M_i D M^T p = I_i M D M^T p \quad \forall i,   (5.8)
where I_i := [0, . . . , 0, 1, 0, . . . , 0] (with the single 1 in the ith component), D := diag(1 ⊘ e), and ⊘ denotes element-wise division. By defining m := vec(M) and using the identity trace{AXBX^T} = vec(X)^T (B^T ⊗ A) vec(X) (where ⊗ indicates the Kronecker product), we can rewrite (5.8) as:
t^{(i)}_{comp}(S) = trace{I_i M D M^T p} = trace{M D M^T p I_i} = m^T (D^T ⊗ p I_i) m \quad \forall i,   (5.9)
• t^{(i)}_{comm}(S): The results of computed task i need to be transmitted to all machines assigned to execute the successive tasks i' where e_{i,i'} ∈ E_{Task}. Given the constraints in (5.4), the communication delay for sending the result of task i from machine m(i) to machine m(i') is [C]_{m(i),m(i')}. Since the result must be sent to all machines running the successive tasks, we have:
t^{(i)}_{comm}(S) = \max_{i': e_{i,i'} \in E_{Task}} [C]_{m(i),m(i')} \quad \forall i.   (5.10)
Further simplification yields:
t^{(i)}_{comm}(S) = \max_{i': e_{i,i'} \in E_{Task}} M_i C M_{i'}^T = \max_{i': e_{i,i'} \in E_{Task}} I_i M C M^T I_{i'}^T = \max_{i': e_{i,i'} \in E_{Task}} m^T (C^T ⊗ I_i^T I_{i'}) m \quad \forall i,   (5.11)
By combining (5.9) and (5.11), we can formulate the objective function as follows:
\min_{m \in \{0,1\}^{TK \times 1}} \; \max_{i,i': e_{i,i'} \in E_{Task}} \{ m^T Q_{i,i'} m \} \quad \text{s.t.} \quad H m = 1_{T \times 1},   (5.12)
where
Q_{i,i'} := D^T ⊗ p I_i + C^T ⊗ I_i^T I_{i'}, \qquad H := 1_{1 \times K} ⊗ I_{T \times T},   (5.13)
and I_{T \times T} is the identity matrix of size T by T.
The optimization problem (5.12) can be rewritten as:
\min_{m \in \{0,1\}^{TK \times 1},\, t} \; t \quad \text{s.t.} \quad m^T Q_{i,i'} m \le t \;\; \forall i, i': e_{i,i'} \in E_{Task}, \qquad H m = 1_{T \times 1},   (5.14)
Since the elements of vector m in (5.14) can only be 0 or 1, the problem (5.14) is not convex, making it
challenging to find the optimal solution for this binary quadratic programming (BQP) problem.
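Before turning to the relaxation, note that the objective of (5.14) can be evaluated directly for any candidate assignment matrix M using (5.7) and (5.10). The following sketch does exactly that on a small, assumed instance; the task graph, speeds, and delays are illustrative placeholders.

import numpy as np

def bottleneck_time(M, p, e, C, task_edges):
    # Evaluates Eq. (5.2) for a given assignment: proportional CPU sharing per machine (5.7)
    # plus the worst-case communication delay to the machines of successor tasks (5.10).
    T, K = M.shape
    machine_of = M.argmax(axis=1)                     # m(i): machine running task i
    load = M.T @ p                                    # total computation placed on each machine
    t_comp = load[machine_of] / e[machine_of]         # Eq. (5.7)
    t_comm = np.zeros(T)
    for i, i_succ in task_edges:                      # Eq. (5.10)
        t_comm[i] = max(t_comm[i], C[machine_of[i], machine_of[i_succ]])
    return np.max(t_comp + t_comm)                    # Eq. (5.2)

p = np.array([2.0, 1.0, 3.0, 1.5, 2.5])               # required computations per task (assumed)
e = np.array([4.0, 2.0, 3.0])                          # machine execution speeds (assumed)
C = np.array([[0.0, 0.4, 0.9],
              [0.4, 0.0, 0.2],
              [0.9, 0.2, 0.0]])                        # pairwise communication delays (assumed)
task_edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (4, 1)]   # a general (cyclic) task graph
M = np.eye(3)[[0, 1, 2, 0, 1]]                         # example assignment of 5 tasks to 3 machines
print(bottleneck_time(M, p, e, C, task_edges))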
Remark 1: Even after symmetrizing the matrices Q_{i,i'} by replacing Q_{i,i'} with \frac{Q_{i,i'} + Q_{i,i'}^T}{2}, these matrices are still not necessarily positive (or negative) semi-definite, so (5.14) remains non-convex.
Before we proceed with the relaxation of (5.14), let’s consider two special cases of this problem in the
following theorem and proposition:
Theorem 1: If the communication delay is negligible compared to computational time (i.e., C = 0K×K),
there is no inter-task data dependency, and at most one task is allowed to be executed on each machine, then
the optimal task mapper function m(.) can be obtained by assigning the task with the highest required
computation to the machine with the fastest execution speed, after sorting tasks and machines based on their
computational requirements and execution speeds, respectively.
Proof: Sort the machines by execution speed (fastest first) and the tasks by computational requirement (largest first); the claim is that assigning the task at index ℓ in the sorted task list to the machine at index ℓ in the sorted machine list is optimal. Consider any assignment in which tasks i and k with p_i ≥ p_k (i.e., task i has more computation than task k) are executed on machines j and j', respectively, with e_j ≤ e_{j'} (the larger task sits on the slower machine). The bottleneck time is t_1 = \max\{t_{others}, \frac{p_i}{e_j}, \frac{p_k}{e_{j'}}\}, where t_{others} represents the time for completing the other tasks. By swapping the task assignments of i and k (i.e., i is executed on j' and k on j), the required time becomes t_2 = \max\{t_{others}, \frac{p_k}{e_j}, \frac{p_i}{e_{j'}}\} ≤ t_1, since \frac{p_k}{e_j} ≤ \frac{p_i}{e_j} and \frac{p_i}{e_{j'}} ≤ \frac{p_i}{e_j}. Repeating such swaps turns any assignment into the sorted one without increasing the bottleneck time.
Proposition 1: If the communication delay is negligible compared to computational time (i.e.,
C = 0K×K), there are no inter-task data dependencies, and all compute nodes have identical execution
speeds, then the optimization problem (5.14) simplifies to:
\min_{m(\cdot)} \max_{i} \sum_{\ell: m(\ell)=m(i)} p_\ell.   (5.15)
This problem is analogous to load balancing, which aims to minimize the maximum load (defined as
the sum of computational processing amounts of tasks assigned to each machine) across K machines.
Essentially, the goal of (5.15) is to distribute T tasks among K machines such that the total computation
time is evenly balanced among all machines. This problem is known to be NP-complete for the following
reasons:
• A non-deterministic polynomial-time algorithm can solve (5.15) by guessing an assignment of tasks
to K machines, and then verifying in polynomial time if all machines have the same computational
load.
• The problem can be reduced from the well-known NP-complete problem of Set Partitioning in
polynomial time. Specifically, given T, K, {p_i}_{i=1}^{T}, and a target value θ = \frac{\sum_i p_i}{K}, if there is a solver for our problem (i.e., one that verifies in polynomial time whether there exists an assignment of tasks with a workload of at most θ for all machines), then this solver can determine whether there is a solution to the Set Partition problem with {p_i}_{i=1}^{T} as the instance inputs.
5.2 Proposed Semi-Definite Programming (SDP) Relaxation
In this section, we address the challenge of solving (5.14) by relaxing the problem into a Semi-Definite Programming (SDP) problem, which is easier to solve and can still yield a desirable solution.
Since it is more convenient to apply approximations to homogenized quadratic programming problems when m ∈ {−1, +1}^{TK×1} rather than m ∈ {0, 1}^{TK×1}, we rewrite (5.14) as:
\min_{x \in \{-1,1\}^{TK \times 1},\, t} \; t \quad \text{s.t.} \quad x^T Q_{i,i'} x + 2 \cdot 1_{TK \times 1}^T Q_{i,i'} x + 1_{TK \times 1}^T Q_{i,i'} 1_{TK \times 1} \le 4t \;\; \forall i, i': e_{i,i'} \in E_{Task}, \qquad H x = (2 - K) 1_{T \times 1}.   (5.16)
This is done by replacing m with \frac{x+1}{2}. We can reformulate (5.16) as the following optimization problem:
\min_{x \in R^{TK \times 1},\, X \in R^{TK \times TK},\, t} \; t \quad \text{s.t.} \quad \langle Q_{i,i'}, X \rangle + 2 \cdot 1_{TK \times 1}^T Q_{i,i'} x + 1_{TK \times 1}^T Q_{i,i'} 1_{TK \times 1} \le 4t \;\; \forall i, i': e_{i,i'} \in E_{Task}, \qquad H x = (2 - K) 1_{T \times 1}, \qquad X = x x^T, \qquad diag(X) = 1,   (5.17)
where \langle Q_{i,i'}, X \rangle = trace{Q_{i,i'} X}. The optimization problem (5.17) is not convex due to the constraint X = x x^T. A well-known SDP technique is to replace the constraint X = x x^T with X \succeq x x^T, where \succeq denotes positive semi-definiteness for matrices. Therefore, the relaxed version of (5.17) is:
\min_{x \in R^{TK \times 1},\, X \in R^{TK \times TK},\, t} \; t \quad \text{s.t.} \quad \langle Q_{i,i'}, X \rangle + 2 \cdot 1_{TK \times 1}^T Q_{i,i'} x + 1_{TK \times 1}^T Q_{i,i'} 1_{TK \times 1} \le 4t \;\; \forall i, i': e_{i,i'} \in E_{Task}, \qquad H x = (2 - K) 1_{T \times 1}, \qquad \begin{bmatrix} X & x \\ x^T & 1 \end{bmatrix} \succeq 0, \qquad diag(X) = 1.   (5.18)
Due to the non-homogeneous structure of (5.18), which includes both X and x in the linear and
quadratic constraints, it is challenging to round the final solution to a feasible point. Therefore, we aim to
re-formulate (5.18) into a new homogenized optimization problem.
\min_{x \in R^{TK \times 1},\, X \in R^{TK \times TK},\, u \in R,\, t} \; t \quad \text{s.t.} \quad \langle Q_{i,i'}, X \rangle + 2u \cdot 1_{TK \times 1}^T Q_{i,i'} x + u^2 \cdot 1_{TK \times 1}^T Q_{i,i'} 1_{TK \times 1} \le 4t \;\; \forall i, i': e_{i,i'} \in E_{Task}, \qquad u H x = (2 - K) 1_{T \times 1}, \qquad \begin{bmatrix} X & x \\ x^T & 1 \end{bmatrix} \succeq 0, \qquad diag(X) = 1, \qquad u^2 = 1.   (5.19)
By defining x' := [x^T, u]^T, we can rewrite (5.19) as follows:
\min_{x' \in R^{(TK+1) \times 1},\, X' \in R^{(TK+1) \times (TK+1)},\, t} \; t \quad \text{s.t.} \quad \langle \tilde{Q}_{i,i'}, X' \rangle \le 4t \;\; \forall i, i': e_{i,i'} \in E_{Task}, \qquad \langle A_i, X' \rangle = 0 \;\; \forall i \in \{1, \ldots, T\}, \qquad \begin{bmatrix} X' & x' \\ x'^T & 1 \end{bmatrix} \succeq 0, \qquad diag(X') = 1,   (5.20)
where
\tilde{Q}_{i,i'} := \begin{bmatrix} Q_{i,i'} & \frac{Q_{i,i'} 1_{TK}}{2} \\ \frac{1_{TK}^T Q_{i,i'}}{2} & 1_{TK}^T Q_{i,i'} 1_{TK} \end{bmatrix} \;\; \forall i, i': e_{i,i'} \in E_{Task}, \qquad A_i := \begin{bmatrix} 0_{TK \times TK} & \frac{H_i^T}{2} \\ \frac{H_i}{2} & K - 2 \end{bmatrix} \;\; \forall i \in \{1, \ldots, T\},   (5.21)
and H_i denotes the ith row of H.
To round the obtained result of (5.20), denoted as X′∗
and x
′∗
, we employ the randomized technique
developed for SDP problems. This method involves sampling z ∼ N (0, X′∗
) and taking the sign of z to
get binary values of −1 or +1. We then retain the samples that meet the constraints. To avoid failing to
find points that satisfy all constraints, we can modify the constraint Hx = (2 − K)1T ×1 to
Hx ≥ (2 − K)1T ×1 without changing the optimal value. This adjustment allows tasks to be executed
more than once, ensuring the optimal solution remains the same. Finally, we select the sample point with
the lowest objective value. In the next section, we elaborate on an upper bound of the bottleneck time for
our randomized scheme. Since our proposed algorithm picks the sample point with the minimum
bottleneck time, the average bottleneck time serves as an upper bound for our randomized method.
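A hedged sketch of this procedure using the cvxpy modeling language is given below. It assumes the symmetrized matrices Q̃_{i,i'} of (5.21) (one per task-graph edge) and the A_i matrices have already been constructed from p, e, C, and the task graph, and that vec(M) follows the column-stacking convention; it illustrates the relaxation-plus-rounding idea rather than reproducing the exact implementation used in our experiments.

import numpy as np
import cvxpy as cp

def solve_sdp(Q_tilde, A, dim):
    # dim = T*K + 1 is the size of X'. The PSD block [[X', x'], [x'^T, 1]] of (5.20) is
    # encoded by declaring one lifted PSD variable Y and pinning its corner to 1.
    Y = cp.Variable((dim + 1, dim + 1), PSD=True)
    Xp = Y[:dim, :dim]                                  # plays the role of X'
    t = cp.Variable()
    constraints = [Y[dim, dim] == 1, cp.diag(Xp) == 1]
    constraints += [cp.trace(Q @ Xp) <= 4 * t for Q in Q_tilde]
    constraints += [cp.trace(Ai @ Xp) == 0 for Ai in A]
    cp.Problem(cp.Minimize(t), constraints).solve()
    return Xp.value

def randomized_rounding(X_star, Q_tilde, T, K, num_samples=500, seed=0):
    rng = np.random.default_rng(seed)
    best_val, best_M = np.inf, None
    for _ in range(num_samples):
        z = np.sign(rng.multivariate_normal(np.zeros(X_star.shape[0]), X_star))
        z[z == 0] = 1.0
        z *= z[-1]                         # normalize the homogenization variable u to +1
        m_vec = (z[:-1] + 1) / 2           # map {-1, +1} back to {0, 1}
        M = m_vec.reshape((K, T)).T        # undo the column-stacking vec(M) convention
        if np.any(M.sum(axis=1) < 1):      # relaxed feasibility: every task runs at least once
            continue
        val = max(float(z @ Q @ z) / 4 for Q in Q_tilde)
        if val < best_val:
            best_val, best_M = val, M
    return best_M, best_val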
5.2.1 Expected Value Analysis
In this section, we provide the average bottleneck time for our proposed technique given a task graph G_{Task} and compute graph G_{Compute}. Letting ẑ := sign(z) denote the component-wise sign of a sample z ∼ N(0, X'^{*}), the expected value of the bottleneck time is:
\max_{i,i': e_{i,i'} \in E_{Task}} \frac{1}{4} E_z[\hat{z}^T \tilde{Q}_{i,i'} \hat{z}].   (5.22)
To obtain this expected value, we need to find E[\hat{z}^T Q \hat{z}], where Q = \tilde{Q}_{i,i'} for given i and i', as follows:
E_z[\hat{z}^T Q \hat{z}] \stackrel{(a)}{=} \frac{2}{\pi} \sum_{w,v} [Q]_{w,v} \arcsin([X'^{*}]_{w,v}),   (5.23)
where (a) follows from Lemma 4 presented below.
Lemma 4. E_z[\hat{z}^T Q \hat{z}] = \frac{2}{\pi} \sum_{w,v} [Q]_{w,v} \arcsin([\Sigma]_{w,v}) for z ∼ N(0, Σ) and ẑ = sign(z).
Proof.
E_z[\hat{z}^T Q \hat{z}] = E[trace{Q \hat{z} \hat{z}^T}] = E\Big[\sum_{w,v} [Q]_{w,v} \hat{z}_w \hat{z}_v\Big] = \sum_{w,v} [Q]_{w,v} E[\hat{z}_w \hat{z}_v] = \sum_{w,v} [Q]_{w,v} E[sign(z_w)\, sign(z_v)]
= \sum_{w,v} [Q]_{w,v} \Big( Pr[z_w \ge 0, z_v \ge 0] + Pr[z_w \le 0, z_v \le 0] − Pr[z_w \le 0, z_v \ge 0] − Pr[z_w \ge 0, z_v \le 0] \Big),   (5.24)
where Pr[A] denotes the probability of event A. By defining the random variable z := \frac{z_v - \rho z_w}{\sqrt{1 - \rho^2}}, where ρ := cov(z_w, z_v), one can easily verify that z ⊥ z_w, with z and z_w both having zero-mean, unit-variance normal distributions. Considering this, we have
Pr[z_w \ge 0, z_v \ge 0] = Pr\Big[z_w \ge 0,\; z \ge \frac{-\rho}{\sqrt{1-\rho^2}} z_w\Big] = \int_{z_w=0}^{\infty} \int_{z=a z_w}^{\infty} \frac{1}{\sqrt{2\pi}} e^{-\frac{z_w^2}{2}} \cdot \frac{1}{\sqrt{2\pi}} e^{-\frac{z^2}{2}} \, dz \, dz_w = \frac{1}{2\pi}\Big(\frac{\pi}{2} - \arctan(a)\Big) = \frac{1}{2\pi}\Big(\frac{\pi}{2} + \arcsin(\rho)\Big),   (5.25)
where a = \frac{-\rho}{\sqrt{1-\rho^2}}. By following a similar approach for Pr[z_w \le 0, z_v \le 0], Pr[z_w \le 0, z_v \ge 0], and Pr[z_w \ge 0, z_v \le 0], we can simplify (5.24) as
E_z[\hat{z}^T Q \hat{z}] = \frac{2}{\pi} \sum_{w,v} [Q]_{w,v} \arcsin([\Sigma]_{w,v}).   (5.26)
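A quick Monte Carlo sanity check of the key identity E[sign(z_w) sign(z_v)] = (2/π) arcsin(ρ), with an arbitrary illustrative correlation value, is:

import numpy as np

rng = np.random.default_rng(0)
rho = 0.6
cov = np.array([[1.0, rho], [rho, 1.0]])
z = rng.multivariate_normal(np.zeros(2), cov, size=1_000_000)
empirical = np.mean(np.sign(z[:, 0]) * np.sign(z[:, 1]))
print(empirical, 2 / np.pi * np.arcsin(rho))   # the two values should nearly coincide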
5.2.2 Upper Bound on the Optimal Solution
Since (5.20) is the relaxed version of (5.14), it is evident that the optimal solution to (5.14), denoted as OPT, is greater than or equal to the solution of the SDP problem (5.20), i.e.,
\max_{i,i': e_{i,i'} \in E_{Task}} \frac{1}{4} \sum_{w,v} [Q_{i,i'}]_{w,v} [X'^{*}]_{w,v} \le OPT.   (5.27)
Conversely, the optimal solution to (5.14) is less than or equal to any other feasible solution, including
the minimization of the expected value of the bottleneck time expressed as follows:
\min_{t} \; t \quad \text{s.t.} \quad E_z[\hat{z}^T \tilde{Q}_{i,i'} \hat{z}] \le 4t \quad \forall i, i': e_{i,i'} \in E_{Task}.   (5.28)
Since (5.23) can be upper-bounded as follows:
\frac{2}{\pi} \sum_{w,v} [Q]_{w,v} \arcsin([X'^{*}]_{w,v}) \stackrel{(b)}{\le} \frac{2}{\pi} \sum_{w,v} [Q]_{w,v} \big(0.112 + 0.878\,[X'^{*}]_{w,v}\big),   (5.29)
where (b) follows from the fact that arcsin(x) ≤ 0.112 + 0.878x for all |x| ≤ 1, we can easily see:
OPT \le \max_{i,i': e_{i,i'} \in E_{Task}} \frac{1}{2\pi} \sum_{w,v} [Q_{i,i'}]_{w,v} \big(0.112 + 0.878\,[X'^{*}]_{w,v}\big).   (5.30)
We are guaranteed to achieve this expected value after sampling a sufficient number of feasible points
ˆz, thereby ensuring that the optimal value lies between the solution of the SDP and the solution to (5.28).
5.3 Numerical Results
In this section, we present numerical results by comparing our proposed scheme with well-known
techniques for distributed computing, such as HEFT [13] and throughput HEFT [19]. The simulations are
conducted for two different scenarios: (1) an arbitrary distributed iterative process with predefined
settings¶
, and (2) gossip-based federated learning for the classification of MNIST and CIFAR-10 datasets.
5.3.1 Distributed Iterative Process with Pre-defined Settings
In this subsection, we provide numerical results for an arbitrary task computation vector P and arbitrary
execution speed vector E of distributed machines. Since HEFT-based schemes require the task graph to be
¶The required computational effort for tasks and the execution speed of computing machines are known in advance. This
scenario is used to evaluate the performance of different schedulers under arbitrary settings.
a Directed Acyclic Graph (DAG), we detail how to construct a new DAG from a given task graph to input
into HEFT-based algorithms.
[Figure 5.3 vertices: source S; task vertices T1-T5; intermediate vertices T1,2, T1,3, T1,4, T2,3, T3,4, T3,5, T4,2, T5,2, T5,3; destination D.]
Figure 5.3: The corresponding DAG of the task graph GT ask depicted in Fig. 5.1 to be used with HEFT-based
schemes.
5.3.1.1 Creating a new DAG from GT ask for HEFT-based Schemes
Let us define GDAG = (VDAG, EDAG) as the corresponding DAG of the task graph GT ask. We first
determine the set of vertices VDAG, followed by the set of edges EDAG. The set VDAG consists of all
vertices of the task graph GT ask, as well as the following vertices:
• Source vertex S.
• Intermediate vertices vi,j for all i and j such that eij ∈ ET ask (i.e., task Ti
is the parent of task Tj ).
• Destination vertex D.
The set EDAG includes the edges of the task graph GT ask and the following edges:
• Outgoing edges from vertex S: a set of edges {eS,vi
} for all vi ∈ VT ask, connecting S to each vertex
vi
in VT ask.
• Incoming edges to intermediate vertices Ti,j : a set of edges {evi,vi,j }, connecting vertex vi to vertex
vi,j for all i and j where Ti
is the parent of Tj (eij ∈ ET ask).
• Incoming edges to vertex D: a set of edges {evi,j ,D}, connecting vertex vi,j to vertex D for all i and
j.
Thus, the corresponding DAG is formally represented as G_{DAG} = (V_{DAG}, E_{DAG}), where V_{DAG} := V_{Task} ∪ {S} ∪ {v_{i,j}}_{i,j: e_{i,j} ∈ E_{Task}} ∪ {D} and E_{DAG} := {e_{S,v_i}}_{i: v_i ∈ V_{Task}} ∪ {e_{v_i, v_{i,j}}}_{i,j: e_{i,j} ∈ E_{Task}} ∪ {e_{v_{i,j}, D}}_{i,j: e_{i,j} ∈ E_{Task}}.
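A minimal sketch of this construction using networkx is shown below, following the formal definition above; the edge list of the example task graph is inferred from the intermediate vertices of Fig. 5.3 and is included only for illustration.

import networkx as nx

def task_graph_to_dag(task_edges):
    dag = nx.DiGraph()
    tasks = {v for e in task_edges for v in e}
    dag.add_nodes_from(["S", "D"] + sorted(tasks))
    for v in tasks:
        dag.add_edge("S", v)                       # outgoing edges from the source vertex S
    for (i, j) in task_edges:
        vij = f"{i},{j}"                           # intermediate vertex v_{i,j}
        dag.add_edge(i, vij)                       # incoming edge to the intermediate vertex
        dag.add_edge(vij, "D")                     # incoming edge to the destination vertex D
    return dag

task_edges = [("T1", "T2"), ("T1", "T3"), ("T1", "T4"), ("T2", "T3"),
              ("T3", "T4"), ("T3", "T5"), ("T4", "T2"), ("T5", "T2"), ("T5", "T3")]
dag = task_graph_to_dag(task_edges)
print(nx.is_directed_acyclic_graph(dag), dag.number_of_nodes(), dag.number_of_edges())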
Figure 5.4 shows the bottleneck time of our proposed scheme and the SDP method with naive rounding, compared against HEFT [13] and Throughput HEFT (TP) [19] under the following settings: K = 4 (four compute nodes), components of the communication matrix C are i.i.d. drawn from N(0, √10), execution speeds of compute nodes are i.i.d. drawn from N(0, √15), and required computations for tasks are i.i.d. drawn from N(0, 1). As illustrated, our proposed scheme outperforms HEFT [13], as HEFT schedules tasks
based on the average communication delay of links, while our scheme schedules based on actual
communication delays by solving an optimization problem. Specifically, our proposed scheme achieves a
63%-91% reduction in bottleneck time compared to HEFT [13] and a 41%-84% reduction compared to TP
HEFT [19]. Additionally, Figure 5.4 shows the upper bound for our proposed scheme, demonstrating that
even the upper bound is significantly lower than the HEFT results in most cases.
Figure 5.5 compares our proposed scheme and the SDP approach with naive rounding against
well-known methods like HEFT [13] and TP HEFT [19] in terms of bottleneck time for varying degrees of
task graph vertices. In Fig. 5.5, dL and dH denote the minimum and maximum degree of the task graph
vertices, respectively. The other simulation settings remain the same as previously described, with T = 21.
One key observation from Fig. 5.5 is that the upper bound of our proposed scheme is significantly lower
than that of the HEFT scheme [13] (approximately 29%-39% lower). Another important finding is that our
proposed scheme outperforms HEFT [13] and TP HEFT [19] significantly (achieving a 59%-90% reduction
Bottleneck time values (in seconds) shown in Figure 5.4:
Scheme                           T=16   T=21   T=26   T=31   T=36
HEFT [13]                        8.97   9.13   9.67   10.04  9.79
TP HEFT [19]                     4.49   4.82   4.90   6.15   6.21
SDP w. Naive Rounding            2.26   2.99   3.79   5.58   6.42
Proposed Scheme                  0.71   0.96   1.48   2.51   3.66
Upper Bound of Proposed Scheme   5.50   5.77   5.84   12.18  6.48
Figure 5.4: Bottleneck time of different schemes: HEFT [13], Throughput HEFT [19], SDP method with
naive rounding, our proposed scheme, and the upper bound of our proposed approach, for various numbers
of tasks.
Bottleneck time values (in seconds) shown in Figure 5.5:
Scheme                           dL=2,dH=3   dL=4,dH=5   dL=6,dH=7   dL=8,dH=9
HEFT [13]                        9.83        9.65        9.13        9.66
TP HEFT [19]                     5.28        6.04        4.82        5.55
SDP w. Naive Rounding            3.59        2.41        2.99        5.04
Proposed Scheme                  3.99        1.76        0.96        1.01
Upper Bound of Proposed Scheme   6.97        6.43        5.77        5.86
Figure 5.5: Comparison of bottleneck time across different schemes: HEFT [13], Throughput HEFT [19],
SDP method with naive rounding, our proposed scheme, and the upper bound of our proposed approach,
for varying degrees of the task graphs.
in bottleneck time compared to HEFT [13] and a 25%-82% reduction compared to TP HEFT [19]) as the task
graph becomes denser (with higher vertex degrees). This improvement is because denser task graphs have
more successors (child tasks), increasing the likelihood of scheduling a task on a machine with poor
communication in HEFT-based schemes. Our scheme optimizes for all communication links, whereas
HEFT-based schemes only consider average communication delays during task mapping. Consequently,
HEFT-based schemes can make inefficient assignments by scheduling a task on a poorly communicating
machine, leading to higher bottleneck times.
Figure 5.5: Comparison of bottleneck time across different schemes: HEFT [13], Throughput HEFT [19],
SDP method with naive rounding, our proposed scheme, and the upper bound of our proposed approach,
for varying degrees of the task graphs.
Figure 5.6: Bottleneck time for executing gossip-based federated learning of T = 10 tasks on K = 4
distributed machines using four different schemes: HEFT [13], Throughput HEFT (TP) [19], the SDP method
with naive rounding, and our proposed scheme (SDP with randomized rounding).
5.3.2 Gossip-based Federated Learning
In this section, we evaluate the bottleneck time for an application of our optimization problem (5.6),
specifically gossip-based federated learning. In this context, each task is associated with a part of the whole
dataset (each task represents a user in a real-world gossip-based federated learning problem). To simulate
gossip-based federated learning, we consider 10 tasks forming a random task graph Gtask with vertex
degrees randomly drawn from a Unif(6, 7) distribution.
In terms of gossiping model parameters, all users first partition their associated data into w chunks and
train a neural network model with a chunk of data, then send their models to a predefined set of tasks (or
users)∥. Upon receiving model parameters, each user aggregates∗∗ the models, updates its local model, and
continues training with the next data chunk. This process is repeated for multiple epochs until the model
converges.
We use the classification of MNIST and CIFAR-10 datasets through a Convolutional Neural Network
(CNN) as two examples of gossip-based federated learning on distributed computing machines. The CNN
model used in our simulation has two convolutional layers and three fully connected layers. Considering
network delay and the size of model parameters to be gossiped, we assume each component of the
communication matrix C (i.e., Ci,j for i ̸= j) is randomly drawn from a Unif(0, 1) distribution.
∥Task dependencies are enforced by the task graph.
∗∗A simple aggregation method is the weighted average of the model parameters.
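As an illustration of the aggregation step above, the following minimal sketch computes a weighted average of the model parameters received from neighboring users (the simple aggregation mentioned in the footnote); the array-based model representation and the weights are illustrative, not the exact code used in our simulations.

import numpy as np

def weighted_average(param_sets, weights):
    """Aggregate models received from neighbors by a weighted average of their
    parameter arrays (the simple aggregation mentioned above)."""
    total = sum(weights)
    return [
        sum(w * params[k] for w, params in zip(weights, param_sets)) / total
        for k in range(len(param_sets[0]))
    ]

# Toy example: three users, each model reduced to two parameter arrays.
user_models = [
    [np.array([1.0, 2.0]), np.array([0.5])],
    [np.array([3.0, 0.0]), np.array([1.5])],
    [np.array([2.0, 1.0]), np.array([1.0])],
]
agg = weighted_average(user_models, weights=[1.0, 1.0, 2.0])
print(agg)  # element-wise weighted average of the three models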
Figure 5.6 presents the bottleneck time for running gossip-based federated learning on distributed
computing systems with four different schedulers: HEFT [13], TP-HEFT [19], our SDP approach with naive
rounding, and our proposed SDP scheme with randomized rounding. Since all scheduler schemes assign
tasks to compute machines based on the required processing vector P, we implement a pilot phase (conducted before running gossip-based federated learning) to
estimate the tasks’ required computational efforts. For simplicity, we assume all compute machines are
homogeneous, with identical execution speeds. During the pilot phase, each task uses a small portion of its
data to estimate the required computations by measuring the time to train the model with the pilot data
and multiplying it by the execution speed of the compute machine. Our results show that the two proposed
SDP-based approaches outperform HEFT [13] and TP-HEFT [19] in terms of bottleneck time for
gossip-based federated learning.
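A minimal sketch of such a pilot estimate is given below; the function and variable names are hypothetical, and the scaling from the pilot slice to the full data size is one simple choice rather than the exact procedure used in our evaluation.

import time

def estimate_task_computation(train_step, pilot_data, exec_speed, full_size):
    """Estimate a task's required computation: time the model on a small pilot
    slice, convert to an amount of work via the machine's execution speed, and
    scale up to the full data size."""
    start = time.perf_counter()
    train_step(pilot_data)                       # one short training pass on the pilot slice
    elapsed = time.perf_counter() - start
    pilot_work = elapsed * exec_speed            # time x speed = amount of computation
    return pilot_work * (full_size / len(pilot_data))

# Toy example with a dummy training step.
dummy_step = lambda data: sum(x * x for x in data)
p_hat = estimate_task_computation(dummy_step, pilot_data=list(range(1000)),
                                  exec_speed=2.0e9, full_size=50_000)
print(f"estimated required computation: {p_hat:.3e}")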
5.4 Summary
In this chapter, we introduced a novel task scheduling scheme designed to expedite iterative processes
executed on distributed computing resources. We began by mathematically formulating the task
scheduling problem as a Binary Quadratic Programming (BQP) problem. To tackle the complexity of this
formulation, we proposed an approximation based on Semi-Definite Programming (SDP) combined with a
randomized rounding technique.
Our analysis included an evaluation of the expected value of bottleneck time, demonstrating that our
scheme maintains a constant gap from the optimal BQP solution. This rigorous mathematical foundation
ensures that our proposed scheduling scheme is both efficient and effective.
To validate our approach, we applied it to a concrete example: gossip-based federated learning, which
is a quintessential distributed iterative process. The results of our simulations revealed that our proposed
scheduling scheme significantly outperforms established techniques such as HEFT [13] and Throughput
HEFT (TP) [19]. These findings underscore the potential of our approach to enhance the performance of
distributed iterative processes, making it a valuable contribution to the field of distributed computing.
Chapter 6
Graph Neural Network based Scheduling Scheme
A promising alternative to conventional scheduling methods is to apply machine learning techniques for
function approximation to the task scheduling problem, leveraging the fact that scheduling essentially
involves finding a function mapping tasks to compute machines. Given the graph structure of applications,
we propose using an appropriate graph convolutional network (GCN) [40] to schedule tasks by learning
the inter-task dependencies of the task graph and network settings, such as the execution speed of
compute machines and communication bandwidth across machines. This approach aims to extract the
relationships between different entities.
To the best of our knowledge, no prior work has proposed a purely GCN-based scheme that incorporates
carefully designed features of both nodes and edges of task graphs to perform scheduling over distributed
computing systems. We propose GCNScheduler∗†, a novel approach that quickly schedules tasks by
integrating a task graph with network settings into a single input graph and feeding it into a GCN model
suitable for directed graphs. We demonstrate that any existing scheduling algorithm can be used as a
teacher to train GCNScheduler for any metric. For example, we train GCNScheduler using HEFT [13] for
makespan minimization and TP-HEFT [19] for throughput maximization. Our results show that
GCNScheduler achieves comparable scheduling performance to the teacher algorithms, matching HEFT in
terms of makespan and TP-HEFT in terms of throughput. Moreover, we show that GCNScheduler can be
trained in a very short period of time, taking around a minute to train a model using a graph with 5,000
nodes. GCNScheduler significantly outperforms previous algorithms in scheduling speed for a given task
graph, completing the scheduling for 100-node task graphs in approximately 3.8 milliseconds for makespan
minimization, compared to around 288 milliseconds for HEFT [13] and 25 seconds for READYS [128], and
in about 3.3 milliseconds for throughput maximization, compared to about 6.9 seconds for TP-HEFT [19].
Finally, we show that GCNScheduler can efficiently perform scheduling for task graphs of any size,
particularly demonstrating its ability to handle large-scale task graphs where existing schemes require
excessive computational resources.
∗This chapter is adapted from [91].
†The dataset and source code used in this chapter are available in an open-source repository online at
https://github.com/GCNScheduler/GCNScheduler.git.
6.1 Problem Definition
In this chapter, we elaborate on formally representing the minimization of makespan and the maximization
of throughput as optimization problems. To complete a job, all its tasks need to be executed on at least one
compute machine. Before defining makespan and throughput, we will explain task dependencies, known as
the task graph, and network settings.
Task Graph: Each application or job consists of inter-task data dependencies. Since tasks depend on each
other, meaning a task generates inputs for certain other tasks, we model this dependency using a DAG
(Directed Acyclic Graph) as shown in Fig. 6.1a. Suppose we have N_T tasks {T_i}_{i=1}^{N_T} with a given task graph
G_Task := (V_Task, E_Task), where V_Task := {T_i}_{i=1}^{N_T} and E_Task := {e_{i,i'}}_{(i,i') ∈ Ω} represent the set of
vertices and edges (task dependencies), respectively, with Ω := {(i, i') | task T_i generates inputs for task T_{i'}}.
Let p := [p_1, . . . , p_{N_T}]^T denote the amount of computations required by the tasks. For every pair of
tasks T_i and T_j, ∀i, j where e_{i,j} ∈ E_Task, task T_i produces d_{i,j} amount of data for task T_j after being
executed by a machine.
(a) An example of a task graph, which is a DAG, with eight tasks. For instance, task T7 requires tasks T2, T3,
and T5 to be executed and generate their outputs before executing task T7. (b) An example of heterogeneous
distributed computing systems with four machines and their pair-wise communication bandwidths.
Figure 6.1: Illustrations of a task graph and distributed computing systems.
Network Settings: Each task must be executed on a compute node (machine), which is connected to other
compute nodes through communication links (compute node and machine are used interchangeably in this
paper). Suppose there are N_C compute nodes {C_j}_{j=1}^{N_C}. The execution speed of compute nodes is
represented by the vector e := [e_1, . . . , e_{N_C}]^T. The communication link delay between any two compute
nodes can be characterized by bandwidth, denoted B_{i,j} for the link from compute node C_i to compute
node C_j. If two machines are not connected, the corresponding bandwidth is zero (indicating infinite
communication delay).
A task-scheduling scheme maps tasks to compute nodes based on a specific objective. Formally, a task
scheduler can be represented as a function m(·) : V_Task → {C_k}_{k=1}^{N_C}, where task T_i, ∀i, is assigned to
machine m(T_i).
We next focus on two well-known objectives: makespan minimization and throughput maximization.
6.1.1 Objective 1: Makespan
The first objective function for task assignment we consider is minimizing the makespan. To do so, we
need to define several terms: Earliest Start Time (EST), Earliest Finish Time (EFT), Actual Start Time (AST),
and Actual Finish Time (AFT).
Definition 1: EST(T_i, C_j) denotes the earliest execution start time for task T_i on compute node C_j. Note
that EST(T_0, C_j) = 0, ∀j.
Definition 2: EFT(T_i, C_j) represents the earliest execution finish time for task T_i on compute node C_j.
Definition 3: AST(T_i) and AFT(T_i) denote the actual start time and actual finish time of task T_i.
These times can be computed recursively starting from task T_1 using the following formulas [13]:
EST(T_i, C_j) = max{ avail[j], max_{T_k : e_{k,i} ∈ E_Task} ( AFT(T_k) + comm_{k,i} ) },
EFT(T_i, C_j) = EST(T_i, C_j) + p_i / e_j,        (6.1)
where comm_{i,j} := data_{i,j} / B_{m(T_i), m(T_j)} and avail[j] is the earliest time at which compute node C_j is
available to execute a task.
Definition 4 (Makespan): The makespan, or the total time required to complete all tasks, is the actual finish
time of the last task. Thus, it can be represented as:
makespan = max{ AFT(T_{N_T}) }.        (6.2)
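For concreteness, the following sketch evaluates Eqs. (6.1)-(6.2) for a fixed task-to-machine mapping, assuming the tasks are supplied in topological order, that each machine executes one task at a time, and that same-machine transfers are free. It illustrates the definitions above; it is not the HEFT scheduler itself, and all names are hypothetical.

from collections import defaultdict

def makespan(tasks, edges, data, p, e, B, mapping):
    """Makespan of a fixed mapping per Eqs. (6.1)-(6.2): EST is the later of the
    machine's availability and predecessors' AFT plus communication delay, and
    EFT = EST + p_i / e_j."""
    preds = defaultdict(list)
    for (u, v) in edges:
        preds[v].append(u)
    avail = defaultdict(float)     # earliest time each machine is free
    aft = {}                       # actual finish times
    for t in tasks:                # 'tasks' must be in topological order
        m = mapping[t]
        est = avail[m]
        for u in preds[t]:
            # zero delay if predecessor ran on the same machine (a sketch assumption)
            comm = 0.0 if mapping[u] == m else data[(u, t)] / B[(mapping[u], m)]
            est = max(est, aft[u] + comm)
        aft[t] = est + p[t] / e[m]
        avail[m] = aft[t]
    return max(aft.values())

# Toy example: T1 -> T2 scheduled on two machines.
ms = makespan(tasks=["T1", "T2"], edges=[("T1", "T2")],
              data={("T1", "T2"): 4.0}, p={"T1": 2.0, "T2": 3.0},
              e={"C1": 1.0, "C2": 2.0}, B={("C1", "C2"): 2.0, ("C2", "C1"): 2.0},
              mapping={"T1": "C1", "T2": "C2"})
print(ms)  # 2.0 + 4/2 + 3/2 = 5.5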
6.1.2 Objective 2: Throughput
The second objective function for task scheduling is maximizing throughput. Unlike makespan, which
measures the total execution time for a given set of tasks, throughput measures the average number of
tasks completed per unit of time in steady state.
Assuming an infinite number of tasks, let N(t) denote the number of tasks completed by time t. The
throughput is then lim_{t→∞} N(t)/t. According to [19], the throughput of a scheduler can be characterized by
the following definition.
Definition 5 (Throughput [19]): For a given task assignment, the throughput of a scheduler is 1/τ, where τ
is the maximum time taken by any resource to complete its tasks. It can be expressed as:
τ := max{ max_{C_q} { t_q^comp, t_q^out, t_q^in }, max_{C_q → C_r} t_{q,r} },        (6.3)
with
• t_q^comp: computation time of compute machine C_q for a single task (i.e., t_q^comp = Σ_{i: m(T_i)=C_q} p_i / e_q),
• t_q^out: time taken by compute machine C_q for outgoing data transfer (i.e., t_q^out = Σ_r d_{q,r}^(node) / B_q^out, where
d_{q,r}^(node) is the data transferred from C_q to C_r, and B_q^out is the maximum outgoing bandwidth of C_q),
• t_q^in: time taken by compute machine C_q for incoming data transfer (i.e., t_q^in = Σ_r d_{r,q}^(node) / B_q^in, where B_q^in
is the maximum incoming bandwidth of C_q),
• t_{q,r}: communication time to transfer data from compute machine C_q to C_r (i.e., t_{q,r} = d_{q,r}^(node) / B_{q,r}).
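Similarly, the sketch below evaluates τ in Eq. (6.3) for a fixed mapping, assuming the per-machine maximum outgoing/incoming bandwidths and the per-link bandwidths are given; the names are illustrative rather than those used in our implementation.

from collections import defaultdict

def bottleneck_tau(edges, data, p, e, B, B_out, B_in, mapping):
    """Evaluate tau from Eq. (6.3) for a fixed mapping: the slowest of the
    per-machine compute, outgoing-transfer, and incoming-transfer times and the
    per-link transfer times; the throughput is then 1 / tau."""
    comp = defaultdict(float)        # per-machine computation time
    d_node = defaultdict(float)      # data moved between machine pairs
    for t, m in mapping.items():
        comp[m] += p[t] / e[m]
    for (u, v) in edges:
        mu, mv = mapping[u], mapping[v]
        if mu != mv:
            d_node[(mu, mv)] += data[(u, v)]
    t_out, t_in, t_link = defaultdict(float), defaultdict(float), {}
    for (q, r), d in d_node.items():
        t_out[q] += d / B_out[q]
        t_in[r] += d / B_in[r]
        t_link[(q, r)] = d / B[(q, r)]
    return max(list(comp.values()) + list(t_out.values())
               + list(t_in.values()) + list(t_link.values()))

# Toy example: two tasks pipelined over two machines.
tau = bottleneck_tau(edges=[("T1", "T2")], data={("T1", "T2"): 4.0},
                     p={"T1": 2.0, "T2": 3.0}, e={"C1": 1.0, "C2": 2.0},
                     B={("C1", "C2"): 2.0}, B_out={"C1": 2.0, "C2": 2.0},
                     B_in={"C1": 2.0, "C2": 2.0},
                     mapping={"T1": "C1", "T2": "C2"})
print(tau, 1.0 / tau)  # tau = 2.0, so the throughput is 0.5 tasks per unit time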
The primary goal is to find a scheduler that assigns tasks to compute machines to optimize a given
objective (e.g., makespan minimization). Our proposed scheme aims to achieve this by using a suitable
GCN, integrated with carefully designed features, to classify tasks into machines‡.
‡In this context, each machine represents a class.
6.2 Proposed GCNScheduler Mechanism
In this chapter, we propose a novel machine-learning-based task scheduler that can be trained according to
the objectives mentioned earlier. Since task scheduling inherently involves graphs (i.e., task graphs), it is
crucial to employ a machine-learning approach designed to capture the graph-based relationships
underlying the task-scheduling problem. For this purpose, we utilize a suitable GCN [129] to create a model
capable of automatically assigning tasks to compute machines. A unique challenge in this problem is the
need to consider two different graphs: a task graph and a compute network graph. We address this by
merging the two graphs into one, where each node represents a task, and the node and edge features
combine the characteristics of both the task graph and compute network graph. This unified graph serves
as the foundation for GCN training and inference. To our knowledge, no previous GCN-based scheduling
scheme has implemented such unification. This innovative approach offers significant advantages over
conventional scheduling schemes, including a substantial reduction in computational complexity and the
ability to handle large-scale task graphs, which traditional schemes struggle with due to scalability issues.
6.2.1 Overview of GCNs for Directed Graphs
In applications where relationships between nodes are not reciprocal, such as social media, standard
GCN-based schemes may not be effective. An alternative is to use schemes designed for directed graphs,
like EDGNN [129], which treat incoming and outgoing edges differently to capture non-reciprocal
relationships between nodes. EDGNN assigns different weights to outgoing and incoming edges, in
addition to weights for neighboring nodes. Specifically, the embedding of node v is computed as follows:
h^{(t)}_{n,v} = σ( W^{(t)}_1 h^{(t-1)}_{n,v} + W^{(t)}_2 Σ_{u ∈ N(v)} h^{(t-1)}_{n,u}
        + W^{(t)}_3 Σ_{u: e_{u,v} ∈ E_Task} h^{(t-1)}_{e,(u,v)} + W^{(t)}_4 Σ_{u: e_{v,u} ∈ E_Task} h^{(t-1)}_{e,(v,u)} ),        (6.4)
where W^{(t)}_1, W^{(t)}_2, W^{(t)}_3, and W^{(t)}_4 are the weight matrices at layer t for the node itself, neighboring
nodes, incoming edges, and outgoing edges, respectively. Moreover, h^{(t)}_{n,v} and h^{(t)}_{e,(u,v)} denote the
embedding of node v and the embedding of the edge from node u to node v at layer t.
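The following is a simplified PyTorch sketch of one such layer, with separate weight matrices for the node itself, its in-neighbors, incoming-edge features, and outgoing-edge features, as in Eq. (6.4). It is a minimal stand-in for the EDGNN implementation of [129], not a reproduction of it; the class and variable names are hypothetical.

import torch
import torch.nn as nn
import torch.nn.functional as F

class EDGNNLayer(nn.Module):
    """One EDGNN-style layer following Eq. (6.4): linear maps for the node itself
    (W1), neighboring nodes (W2), incoming edges (W3), and outgoing edges (W4)."""
    def __init__(self, node_dim, edge_dim, out_dim):
        super().__init__()
        self.w1 = nn.Linear(node_dim, out_dim, bias=False)
        self.w2 = nn.Linear(node_dim, out_dim, bias=False)
        self.w3 = nn.Linear(edge_dim, out_dim, bias=False)
        self.w4 = nn.Linear(edge_dim, out_dim, bias=False)

    def forward(self, h_node, h_edge, edge_index):
        # edge_index: (2, E) tensor of directed edges (src, dst)
        src, dst = edge_index
        n = h_node.size(0)
        # sum of in-neighbor node embeddings, incoming-edge and outgoing-edge embeddings
        agg_nbr = torch.zeros(n, h_node.size(1)).index_add_(0, dst, h_node[src])
        agg_in = torch.zeros(n, h_edge.size(1)).index_add_(0, dst, h_edge)
        agg_out = torch.zeros(n, h_edge.size(1)).index_add_(0, src, h_edge)
        # sigma(.) here is LeakyReLU, matching the activation used in our implementation
        return F.leaky_relu(self.w1(h_node) + self.w2(agg_nbr)
                            + self.w3(agg_in) + self.w4(agg_out))

# Toy usage: 3 tasks, 2 directed edges (0->2 and 1->2), 4-dim node and 2-dim edge features.
layer = EDGNNLayer(node_dim=4, edge_dim=2, out_dim=8)
out = layer(torch.randn(3, 4), torch.randn(2, 2), torch.tensor([[0, 1], [2, 2]]))
print(out.shape)  # torch.Size([3, 8])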
6.2.2 Proposed Input Graph for GCN Models
To train an EDGNN-based model, we need to carefully design the input graph components, namely the
adjacency matrix, nodes’ features, edges’ features, and labels. Our scheme is not tailored to a specific
criterion, as it can learn from different scheduling schemes with various objectives. The designed input
graph can be fed into the EDGNN, and the model will be trained according to labels generated from a given
scheduling scheme. Here’s how we design the input graph:
Designed Input Graph: We start from the original task graph and maintain the same set of nodes and
edges for our input graph. Representing the input graph as G_input := (V_input, E_input), we have
V_input := V_Task and E_input := E_Task. The key to an effective GCN-based scheduler lies in carefully
designing the features of nodes and edges, as well as the labels. According to the definitions of makespan
in 6.1.1 and throughput in 6.1.2, these objectives depend on the computational time required for all tasks
across all machines (i.e., {p_i / e_j}_{∀i,j}) and the communication delay for transferring the results of executing
tasks to their successor tasks across all machine pairs. Therefore, we incorporate the following features:
• The feature vector of node T_i, ∀T_i ∈ V_input, denoted by x_{n,i}, includes:
x_{n,i} := [ p_i/e_1, p_i/e_2, · · · , p_i/e_{N_C} ]^T ∈ R^{N_C}.        (6.5)
The rationale behind x_{n,i} is that these features represent the computational time required for task T_i
across all compute machines.
• The feature vector of edge e_{u,v}, ∀e_{u,v} ∈ E_input, denoted by x_{e,(u,v)}, includes:
x_{e,(u,v)} := [ d_{u,v}/B_{1,1}, d_{u,v}/B_{1,2}, · · · , d_{u,v}/B_{N_C,N_C} ]^T ∈ R^{N_C^2}.        (6.6)
The rationale behind x_{e,(u,v)} is that these features represent the time required to transfer the results of
executing task T_u to the following task T_v across all possible machine pairs. An illustration of our designed
input graph for the task graph in Fig. 6.1 is shown in Fig. 6.2.
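A minimal sketch of assembling these feature matrices from the computation vector p, the speed vector e, the data sizes d, and the bandwidth matrix B is shown below; the names are illustrative.

import numpy as np

def build_features(p, e, d, B, edges):
    """Node features per Eq. (6.5): computation time of each task on every machine.
    Edge features per Eq. (6.6): transfer time of each task edge over every machine pair."""
    p = np.asarray(p, dtype=float)          # (N_T,) required computations
    e = np.asarray(e, dtype=float)          # (N_C,) execution speeds
    B = np.asarray(B, dtype=float)          # (N_C, N_C) bandwidths
    x_node = p[:, None] / e[None, :]        # shape (N_T, N_C): p_i / e_j
    x_edge = np.stack([d[(u, v)] / B.ravel() for (u, v) in edges])  # (|E|, N_C^2)
    return x_node, x_edge

# Toy example: 3 tasks, 2 machines, edges T0->T1 and T1->T2.
x_n, x_e = build_features(p=[2.0, 3.0, 1.0], e=[1.0, 2.0],
                          d={(0, 1): 4.0, (1, 2): 2.0},
                          B=[[1.0, 2.0], [2.0, 1.0]],
                          edges=[(0, 1), (1, 2)])
print(x_n.shape, x_e.shape)  # (3, 2) (2, 4)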
Objective-Dependent Labeling: Depending on the task scheduler from which our method should
learn (referred to as the "teacher" scheduler, namely HEFT for makespan minimization and TP-HEFT for
throughput maximization), we label all nodes and edges. Let L_{n,v} and L_{e,(u,v)} denote the labels for node v
and edge e_{u,v}, respectively. For node labeling, the label of node T_i is considered as the index of the
compute node that the teacher algorithm assigns task T_i to run on.
Thus, for makespan minimization, we have L_{n,v} = m_HEFT(v) ∀v ∈ V_input, where m_HEFT(·) is the
mapper function of the HEFT algorithm. For throughput maximization, we have L_{n,v} = m_TP-HEFT(v)
∀v ∈ V_input, where m_TP-HEFT(·) is the mapper function of the TP-HEFT algorithm.
Finally, each edge is labeled according to the label of its originating vertex. In other words,
Le,(u,v) = Ln,u ∀u, v such that eu,v ∈ Einput. This edge labeling is crucial in ensuring the model learns to
label outgoing edges of a node with the same label as the node itself. Empirically, we find that labeling
incoming edges with a node’s label results in performance degradation.
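The labeling step itself is straightforward; the following sketch labels nodes with a (hypothetical) teacher assignment and labels each edge with the label of its originating node, as described above.

def make_labels(nodes, edges, teacher_mapping):
    """Label nodes with the teacher scheduler's machine assignment and label each
    directed edge with the label of its source node."""
    node_labels = {v: teacher_mapping[v] for v in nodes}
    edge_labels = {(u, v): node_labels[u] for (u, v) in edges}
    return node_labels, edge_labels

# Toy example with a hypothetical teacher assignment.
nl, el = make_labels(nodes=["T1", "T2", "T3"], edges=[("T1", "T2"), ("T1", "T3")],
                     teacher_mapping={"T1": 0, "T2": 2, "T3": 1})
print(nl, el)  # edges (T1,T2) and (T1,T3) both inherit T1's label 0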
6.2.3 Implementation and Training
The implementation of our scheme is available at: https://github.com/GCNScheduler/GCNScheduler.
We employ a 3-layer EDGNN with 32 nodes per layer and a LeakyReLU activation function. Both
nodes and edges are embedded since they possess features.
GCNScheduler is trained for each objective—makespan minimization and throughput
maximization—only once. Although the total number of machines is assumed to be known, this only
affects the input size of the GCN model. After training, GCNScheduler avoids assigning tasks to machines
that are either disconnected or unavailable.
To create a diverse dataset, we generate a dataset {(G_input^(i), L^(i))}_{i=1}^{N_g} for a sufficiently large N_g (e.g.,
N_g = 100), where L^(i) represents the labels from the teacher scheduler for the nodes and edges of the i-th
input graph G_input^(i). Each input graph G_input^(i) is constructed from a task graph G_Task^(i), task computation
amounts p^(i), machine execution speeds e^(i), and communication bandwidth matrix B^(i). The components of
different input graphs are set independently.
We generate random DAGs for task graphs G_Task^(i) in two ways: 1) establishing an edge between any
two tasks with a given probability (edge probability, EP) and then pruning to form a DAG, and 2)
specifying the width and depth of the graph and randomly selecting successor tasks for each task. To
resemble real application task graphs, we set EP ≤ 0.25 for the former and a width and depth of 5 and 10,
respectively, for the latter.
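As one concrete instance of the first generation method, the sketch below samples candidate edges with probability EP and keeps only edges from lower-indexed to higher-indexed tasks, which is one simple way of pruning to a DAG; the actual generator may differ in details.

import random

def random_task_dag(num_tasks, edge_prob, seed=0):
    """Generate a random DAG: propose each ordered task pair with probability EP and
    keep only forward edges (i < j), which guarantees acyclicity."""
    rng = random.Random(seed)
    return [(i, j) for i in range(num_tasks) for j in range(i + 1, num_tasks)
            if rng.random() < edge_prob]

print(random_task_dag(num_tasks=8, edge_prob=0.25))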
Given that HEFT and TP-HEFT are extremely slow for large-scale task graphs, obtaining labels for a
single large graph is cumbersome. Therefore, we use medium-sized task graphs (with ≤ 50 tasks each) that
HEFT and TP-HEFT can handle.
For network settings of input graphs, we let the network operate in both normal heterogeneous mode
(bandwidths and execution speeds randomly drawn from a uniform distribution for 80% of input graphs)
and straggler mode (for 20% of input graphs) where one or more machines have poor execution speed or
communication bandwidth.
We use a batch size of 16, a learning rate of 10^{-3}, and the Adam optimizer. The model is trained for a
maximum of 50 epochs with early stopping based on the validation set cross-entropy loss. All experiments
GCNScheduler achieves around 91% accuracy in labeling nodes according to the schedules produced
by the teacher scheduler. However, the primary focus is on the makespan or throughput rather than
individual task labeling accuracy. Training the model for a given objective takes less than a minute on our
local cluster (with 16 CPUs, each having 8 cores and 2 threads per core, Intel(R) Xeon(R) E5-2620 v4 @
2.10GHz). This training efficiency is a significant advantage over RL-based schedulers like READYS [128],
which take around 20 minutes to train their models.
Figure 6.2: An illustration of the designed input graph fed into the EDGNN model for the task graph shown
in Fig. 6.1a. Node and edge features are represented with brown and green colors, respectively.
6.3 Simulation Results
In this section, we evaluate the performance of our trained model using two metrics: makespan and
throughput, for various scenarios. For each metric, we measure the performance of GCNScheduler and the
time taken to find the schedule (TTFS). We compare GCNScheduler’s performance against benchmarks,
including HEFT [13] and READYS [128] for makespan minimization, and TP-HEFT and a random task
scheduler for throughput maximization. All evaluations are conducted on our local cluster equipped with
16 CPUs (8 cores and 2 threads per core) of Intel(R) Xeon(R) E5-2620 v4 @ 2.10GHz.
To demonstrate GCNScheduler’s generalization capability, we test our trained model on both synthetic
(medium-scale and large-scale) task graphs and real-world application task graphs. The real-world
applications include Cycles [148] (simulating crop production and water, carbon, and nitrogen cycles),
Epigenomics [149] (genome sequencing operations), Montage [150] (astronomical image mosaicking),
Seismology [151], and three perception applications: Face recognition, Object-and-pose recognition, and
Gesture recognition, from [152], depicted in Fig. 6.3. For simplicity, we assume each task produces the same
amount of data after execution.
Figure 6.3: Task graphs of the three perception applications considered in [152]: a) Face Recognition, b)
Object and Pose Recognition, and c) Gesture Recognition.
For the network settings, we consider six machines with execution speeds and communication
bandwidths drawn randomly from uniform distributions. The reported makespan and throughput are
averages over 10 iterations.
6.3.1 Makespan Minimization
The trained model (optimized for makespan minimization) labels tasks and assigns them to specific
machines. We evaluate GCNScheduler’s performance on medium-scale and large-scale task graphs and the
task graphs of real-world applications provided in [152, 150, 148, 149, 151].
Table 6.1 shows the average makespan of GCNScheduler (with the makespan-minimization objective)
compared to HEFT [13], READYS [128], and the random task scheduler for medium-size task graphs with
varying numbers of tasks. The time taken to find the schedule for GCNScheduler, HEFT [13], and
READYS [128] is presented in Table 6.2. GCNScheduler achieves a similar makespan to HEFT [13] and
READYS [128], while significantly outperforming them in schedule finding time by 1-2 orders of
magnitude.
Fig. 6.4 illustrates the average makespan for our proposed GCNScheduler (top plot) and the random
task scheduler (bottom plot) in large-scale settings where the number of tasks ranges from 3,500 to 5,000
and the edge probability (EP) is set to 0.005, 0.01, and 0.02. It is evident that our GCNScheduler
significantly reduces the makespan, achieving up to an 8-fold reduction for higher EP values. The
significant improvement for larger EP (indicating higher node degrees) is due to the fact that some tasks
may depend on a larger number of predecessor tasks being executed beforehand. Random task assignment
might place one of these predecessor tasks on a machine with suboptimal computing power or
communication bandwidth, leading to an increased average makespan. In contrast, GCNScheduler
leverages inter-task dependencies and network settings (such as machine execution speeds and
communication bandwidths) through carefully designed node and edge features, resulting in a significantly
lower makespan.
Figure 6.4: Makespan of GCNScheduler and the random scheduler for large-scale task graphs with different
numbers of tasks and different EP.
Fig. 6.5 depicts the inference time (the time taken to assign tasks to compute machines) of our
GCNScheduler for different numbers of tasks and varying EP. Our GCNScheduler requires less than 80
milliseconds to assign labels for each of these large-scale task graphs. This highlights the significant
advantage of our approach, making it an ideal alternative to state-of-the-art scheduling schemes that
struggle to efficiently manage complex jobs with thousands of tasks and intricate inter-task dependencies.
Figure 6.5: Inference time of our GCNScheduler for large-scale task graphs with varying numbers of tasks
and different edge probabilities (EP).
Table 6.1: Makespan comparison of GCNScheduler, HEFT [13], READYS [128], and the random scheduler
for different numbers of tasks with a task graph width of 10.
Scheduler       NT=40   NT=50   NT=60   NT=70   NT=80   NT=90   NT=100
GCNSch.         3.998   4.671   7.413   9.368   11.228  13.417  14.996
HEFT [13]       3.905   4.629   7.516   9.549   11.357  13.571  15.076
READYS [128]    3.883   4.539   7.124   9.469   11.218  13.172  14.757
Random          6.946   9.522   14.165  19.044  22.663  29.034  28.544
Table 6.3 presents the makespan and TTFS for GCNScheduler (with the makespan-minimization
objective) compared to HEFT [13] and READYS [128] for seven real-world applications. While
GCNScheduler achieves similar makespan performance compared to benchmarks, it significantly reduces
the time taken to find the schedule, demonstrating its efficiency and potential for various applications,
including business and security.
Table 6.2: Time taken (in seconds) by GCNScheduler, HEFT [13], and READYS [128] to perform scheduling
for medium-scale synthetic task graphs with different numbers of tasks.
Scheduler       NT=40    NT=50    NT=60    NT=70    NT=80    NT=90     NT=100
GCNSch.         0.00345  0.00369  0.00352  0.00355  0.00371  0.00376   0.00385
READYS [128]    0.08532  0.11251  0.14263  0.17541  0.21113  0.24763   0.28794
HEFT [13]       0.3051   0.3632   0.9907   2.7705   6.4545   13.8825   25.0067
Table 6.3: Makespan and time to find schedule (TTFS) (in seconds) for HEFT [13], READYS [128], and
GCNScheduler for real-world applications from [152, 150, 149, 148, 151].
Application             HEFT [13]            READYS [128]         GCNScheduler
                        Makespan  TTFS(s)    Makespan  TTFS(s)    Makespan  TTFS(s)
Face Recog. [152]       0.248     0.3543     0.244     0.0437     0.248     0.0043
Pose Recog. [152]       0.222     0.6583     0.219     0.0493     0.222     0.0044
Gesture Recog. [152]    0.744     0.7015     0.731     0.0562     0.256     0.0050
Montage [150]           633.401   14.6947    630.523   1.4902     631.740   0.0061
Epigenomics [149]       111.521   8.1977     110.926   0.9645     112.167   0.0058
Cycles [148]            237.566   2.3316     235.997   0.2725     237.752   0.0057
Seismology [151]        3.549     0.4607     3.112     0.0536     3.149     0.0042
6.3.2 Throughput Maximization
We also evaluate the performance of GCNScheduler in terms of throughput on both medium-scale and
large-scale task graphs, as well as real-world application task graphs, comparing it against benchmark
algorithms.
Table 6.4 presents the throughput and TTFS for GCNScheduler (optimized for throughput) and
TP-HEFT [19] on medium-size task graphs with varying numbers of tasks. GCNScheduler achieves slightly
higher throughput compared to TP-HEFT [19] and performs significantly better in terms of TTFS.
Table 6.4: Throughput (TP) and time to find schedule (TTFS) in seconds for GCNScheduler (optimized for
throughput), the Throughput (TP)-HEFT algorithm [19], and READYS [128] for medium-size synthetic task
graphs with varying numbers of tasks.
Scheduler    NT=100             NT=200             NT=300             NT=400
             TP      TTFS(s)    TP      TTFS(s)    TP      TTFS(s)    TP      TTFS(s)
GCNSch.      3.1254  0.0033     3.1251  0.0049     3.1261  0.0071     3.1185  0.0078
TP-HEFT      2.1731  6.9235     2.1690  27.229     2.8034  79.940     2.9046  115.221
READYS       1.0952  0.2881     1.0778  0.5901     1.0119  0.8992     1.0324  1.1972
Since TP-HEFT [19] and other existing scheduling algorithms are extremely slow for very large task
graphs (e.g., graphs with several thousand tasks), we only compare the throughput of GCNScheduler
against the random task scheduler. The results are shown in Table 6.6. Additionally, Table 6.7 displays the
time taken to assign tasks to compute nodes for these large-scale task graphs. It is evident that
GCNScheduler (optimized for throughput maximization) is remarkably fast and efficient in handling
large-scale task graphs.
Table 6.6: Throughput of GCNScheduler (with the throughput-maximization objective) and the random
task scheduler for large-scale task graphs.
Scheduler    NT=3,500   NT=4,000   NT=4,500   NT=5,000
GCNSch.      2.9737     2.9731     2.9733     2.9734
Random       0.6344     0.6336     0.6332     0.6331
Table 6.7: Time taken (in milliseconds) by GCNScheduler (with the throughput objective) to schedule
large-scale task graphs.
Scheduler    NT=3,500   NT=4,000   NT=4,500   NT=5,000
GCNSch.      66.050     74.978     83.817     87.388
Table 6.5 displays the throughput and TTFS for GCNScheduler (optimized for throughput) compared
to TP-HEFT [19] for real-world applications. While GCNScheduler achieves slightly better throughput
performance, it significantly reduces the TTFS by 2-3 orders of magnitude, as demonstrated in Table 6.5.
Table 6.5: Throughput and time to find schedule (TTFS) in milliseconds for TP-HEFT [19], READYS [128],
and GCNScheduler for real-world applications.
Application             TP-HEFT [19]           GCNScheduler
                        TP      TTFS(ms)       TP      TTFS(ms)
Face Recog. [152]       1.198   87.34          1.592   0.488
Pose Recog. [152]       1.245   257.8          1.578   0.511
Gesture Recog. [152]    1.461   290.1          1.462   0.560
Montage [150]           1.474   5,832.337      1.497   5.596
Epigenomics [149]       2.581   26,444.576     2.542   5.799
Cycles [148]            8.541   4,267.352      8.372   5.289
Seismology [151]        3.853   9,971.013      3.874   4.803
6.4 Summary
In this chapter, we introduced GCNScheduler, a graph convolutional network (GCN)-based model designed
for efficient task scheduling in distributed computing environments. Our research demonstrated the
versatility of GCNs in capturing and leveraging the complex interdependencies between tasks and
computing resources, conceptualized as nodes and edges within a graph. GCNScheduler’s innovative
approach integrates task and compute graphs into a unified framework, allowing for the effective
scheduling of tasks to optimize specific objectives such as minimizing makespan or maximizing
throughput.
We detailed the architecture and training process of GCNScheduler, highlighting its ability to learn
from existing scheduling algorithms, referred to as teacher schedulers, including HEFT for makespan
minimization and TP-HEFT for throughput maximization. Through extensive simulations, we evaluated
GCNScheduler’s performance against these benchmarks and demonstrated its capability to handle
large-scale task graphs, significantly outperforming traditional scheduling schemes in terms of speed and
scalability.
Our experimental setup included both synthetic and real-world task graphs to validate
GCNScheduler’s effectiveness. We tested various configurations and network settings to ensure robustness
and generalizability. The results showed that GCNScheduler achieves comparable performance to HEFT
and TP-HEFT in terms of makespan and throughput while drastically reducing the time required to
generate schedules, often by several orders of magnitude.
In summary, GCNScheduler emerges as a powerful tool for task scheduling in distributed systems,
offering significant advantages in computational efficiency and scalability. Its adaptability to various
scheduling objectives and ability to learn from different teacher schedulers underscore its potential as a
cornerstone for future task scheduling solutions. As a future direction, we plan to explore the performance
of GCNScheduler with additional scheduling objectives and teacher algorithms, and extend our supervised
learning approach to an unsupervised one, further enhancing its applicability and effectiveness in diverse
computing environments.
Chapter 7
Graph Kolmogorov Arnold Networks
In this chapter, we extend the foundational principles of Kolmogorov-Arnold Networks (KANs) into the
domain of graph-based data by introducing Graph Kolmogorov-Arnold Networks (GKANs)∗†. These
networks are specifically engineered to address the scalability and flexibility challenges inherent in
traditional Graph Convolutional Networks (GCNs). GKANs, akin to KANs, incorporate learnable functions
directly within the graph edges, facilitating more dynamic feature propagation across extensive and
evolving graph structures. This novel approach aims to surpass the limitations of existing graph learning
methodologies.
Our software experiments, grounded in real-world datasets, demonstrate that GKANs significantly
exceed the performance of state-of-the-art GCNs in both classification accuracy and computational
efficiency. Our key contribution is the integration of the data’s graph structure with the innovative
architecture of KANs, setting a new benchmark for graph-based deep learning models in managing
large-scale, dynamic datasets effectively. Although this work focuses on GCNs, we believe that GKANs
represent a pivotal advancement in graph representation learning. They have the potential to form the
foundation for various approaches that previously relied on Multi-Layer Perceptrons (MLPs) at their core,
including GCNs [45, 40, 46, 47, 49], Graph Attention Networks (GAT) [153, 154], Graph Autoencoders [155,
[156], Graph Transformers [157, 158, 159, 160], and numerous other graph deep learning frameworks.
∗This chapter is adapted from [92].
†We plan to make our GKAN software implementation publicly available on GitHub soon at the following
URL: https://github.com/ANRGUSC/GKAN.
7.1 Overview on Graph Convolutional Networks
Traditional GCNs postulate that node labels y are a function of both the graph structure (i.e., adjacency
matrix A) and node features X, formally y = f(X, A). A multi-layer Graph Convolutional Network
(GCN) follows the layer-wise propagation rule:
H^{(l+1)} = σ( D̃^{-1/2} Ã D̃^{-1/2} H^{(l)} W^{(l)} ),        (7.1)
where Ã = A + I_N is the adjacency matrix with added self-connections, I_N is the identity matrix, and
D̃_{ii} = Σ_j Ã_{ij}. Here, W^{(l)} denotes the trainable weight matrix at layer l, and σ(·) is an activation function,
such as ReLU. The matrix H^{(l)} ∈ R^{N×D} represents the activations in the l-th layer, with H^{(0)} = X.
In [40], it was shown that this propagation rule is motivated by a first-order approximation of localized
spectral filters on graphs, a concept initially developed by [161] and further explored by [162]. This rule
facilitates iterative transformations across the graph, incorporating node features and graph topology to
learn meaningful hidden layer representations in a scalable manner.
For illustrative purposes, consider a conventional two-layer GCN for graph node classification with a
symmetric adjacency matrix, whether binary or weighted, as depicted in Figure 7.1. A preprocessing step
involves computing Â = D̃^{-1/2} Ã D̃^{-1/2}. The forward model is then expressed as:
Z = f(X, A) = softmax( Â ReLU( Â X W^{(0)} ) W^{(1)} ),        (7.2)
where W^{(0)} ∈ R^{C×H} is the weight matrix mapping inputs to a hidden layer with H feature maps, and
W^{(1)} ∈ R^{H×F} maps hidden layer features to the output layer. The softmax activation function, applied to
each row, is defined as softmax(x_i) = exp(x_i) / Σ_i exp(x_i). For semi-supervised multi-class classification
tasks, the cross-entropy error across all labeled examples is:
L = − Σ_{l ∈ Y_L} Σ_{f=1}^{F} Y_{lf} ln Z_{lf},        (7.3)
where Y_L denotes the set of labeled node indices. The neural network weights W^{(0)} and W^{(1)} are
optimized using gradient descent. The spatial-based graph representation of the embeddings described is
presented in Fig. 7.1, as introduced by [49].
Figure 7.1: Overview of a two-layer GCN [49] architecture.
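For reference, the following is a minimal PyTorch sketch of the preprocessing and two-layer forward pass in Eq. (7.2), together with the labeled-node loss of Eq. (7.3); the dimensions and the toy graph are illustrative only.

import torch
import torch.nn.functional as F

def normalize_adjacency(A):
    """Compute A_hat = D^{-1/2} (A + I) D^{-1/2}."""
    A_tilde = A + torch.eye(A.size(0))
    d_inv_sqrt = A_tilde.sum(dim=1).pow(-0.5)
    return d_inv_sqrt.unsqueeze(1) * A_tilde * d_inv_sqrt.unsqueeze(0)

def gcn_forward(X, A_hat, W0, W1):
    """Two-layer GCN forward pass of Eq. (7.2)."""
    H = torch.relu(A_hat @ X @ W0)
    return F.log_softmax(A_hat @ H @ W1, dim=1)   # log-softmax pairs with the NLL loss below

# Toy example: 4 nodes, 3 input features, hidden size 8, 2 classes.
A = torch.tensor([[0., 1., 0., 0.], [1., 0., 1., 0.], [0., 1., 0., 1.], [0., 0., 1., 0.]])
X = torch.randn(4, 3)
W0 = torch.randn(3, 8, requires_grad=True)
W1 = torch.randn(8, 2, requires_grad=True)
Z = gcn_forward(X, normalize_adjacency(A), W0, W1)
labeled = torch.tensor([0, 2])                        # indices of labeled nodes (Y_L)
loss = F.nll_loss(Z[labeled], torch.tensor([0, 1]))   # equivalent to the cross-entropy of Eq. (7.3)
loss.backward()
print(Z.shape, loss.item())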
7.2 Proposed GKAN Architectures
In this model, the process of node classification utilizes both the graph structure and node features through
successive layers of transformation and non-linearity, providing an effective mechanism for learning from
both labeled and unlabeled data in a semi-supervised setting. The simplicity and efficiency of this approach
allow the model to capture complex patterns of node connectivity and feature distribution.
We propose two straightforward architectures of GKAN, named Architecture 1 and Architecture 2,
which we detail in the following subsections.
7.2.1 GKAN Architecture 1
In this architecture, the embeddings of nodes at layer ℓ+1 are generated by passing the aggregated (e.g.,
summed) node embeddings at layer ℓ through KANLayer^{(ℓ)}. Mathematically, this is expressed as:
H^{(ℓ+1)}_{Archit. 1} = KANLayer( Â H^{(ℓ)}_{Archit. 1} ),        (7.4)
with H^{(0)}_{Archit. 1} = X. The implementation of KANLayer is presented in the Appendix. Considering L layers
for this architecture, the forward model is presented as Z = softmax( H^{(L)}_{Archit. 1} ). This represents a
spectral-based method. The construction of node embeddings in different layers for the spatial-based
representation of GKAN Architecture 1 is shown in Fig. 7.2.
Figure 7.2: Overview of a two-layer GKAN Architecture 1.
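The following is a minimal PyTorch sketch of one Architecture 1 layer. The KAN layer here is only a stand-in that parameterizes each learnable univariate function with a small polynomial basis rather than the B-splines used in our actual KANLayer (see the Appendix); it is meant to illustrate the aggregate-then-transform order of Eq. (7.4), and all class names are hypothetical.

import torch
import torch.nn as nn

class SimpleKANLayer(nn.Module):
    """A simplified KAN-style layer: each output is a sum of learnable univariate
    functions of each input, parameterized here by a polynomial basis (a stand-in
    for the B-spline parameterization of the actual KANLayer)."""
    def __init__(self, in_dim, out_dim, degree=3):
        super().__init__()
        self.degree = degree
        # one coefficient per (input feature, output feature, basis term)
        self.coeffs = nn.Parameter(0.01 * torch.randn(in_dim, out_dim, degree + 1))

    def forward(self, x):                              # x: (N, in_dim)
        powers = torch.stack([x.pow(k) for k in range(self.degree + 1)], dim=-1)  # (N, in, deg+1)
        return torch.einsum("nik,iok->no", powers, self.coeffs)                   # sum over inputs and basis

class GKANArch1Layer(nn.Module):
    """GKAN Architecture 1, Eq. (7.4): aggregate with A_hat first, then apply the KAN layer."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.kan = SimpleKANLayer(in_dim, out_dim)

    def forward(self, X, A_hat):
        return self.kan(A_hat @ X)

# Toy usage on a 4-node graph with 3 input features and 2 output classes.
A_hat = torch.eye(4) * 0.5 + 0.125                  # placeholder normalized adjacency
layer = GKANArch1Layer(3, 2)
print(layer(torch.randn(4, 3), A_hat).shape)        # torch.Size([4, 2])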
7.2.2 GKAN Architecture 2
In contrast to the first architecture, here we first pass the embeddings through the KANLayer and then
aggregate the result according to the normalized adjacency matrix. Formally, the embeddings of nodes at
layer ℓ+1 are given by:
H^{(ℓ+1)}_{Archit. 2} = Â KANLayer( H^{(ℓ)}_{Archit. 2} ),        (7.5)
with H^{(0)}_{Archit. 2} = X. Similar to Architecture 1, the forward model is Z = softmax( H^{(L)}_{Archit. 2} ). This is also a
spectral-based representation. The construction of node embeddings in different layers for the
spatial-based representation of GKAN Architecture 2 is depicted in Fig. 7.3.
Figure 7.3: Overview of a two-layer GKAN Architecture 2.
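The corresponding sketch for Architecture 2 only swaps the order of the two operations, as in Eq. (7.5); it assumes the SimpleKANLayer class from the previous sketch is in scope and, as before, uses hypothetical names.

import torch
import torch.nn as nn

class GKANArch2Layer(nn.Module):
    """GKAN Architecture 2, Eq. (7.5): apply the KAN layer first, then aggregate
    with A_hat (the reverse order of Architecture 1)."""
    def __init__(self, kan_layer):
        super().__init__()
        self.kan = kan_layer          # any KANLayer implementation can be plugged in

    def forward(self, X, A_hat):
        return A_hat @ self.kan(X)

# Toy usage, plugging in the SimpleKANLayer sketched above for Architecture 1.
layer = GKANArch2Layer(SimpleKANLayer(3, 2))
A_hat = torch.eye(4) * 0.5 + 0.125
print(layer(torch.randn(4, 3), A_hat).shape)   # torch.Size([4, 2])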
7.3 Experiments
Data: We utilize the Cora dataset, a citation network described in [163], which comprises 2,708 nodes and
5,429 edges where nodes represent documents and edges represent citation links. The dataset includes 7
classes and features 1,433 attributes per document. The dataset is split into a training set with 1,000
samples, a validation set with 200 samples, and a test set with 300 samples. Cora uses sparse bag-of-words
vectors for each document, and the citation links among these documents are considered as undirected
edges to form a symmetric adjacency matrix. Each document is labeled with a class, and although only 20
labels per class are actively used for training, all feature vectors are employed. These characteristics
highlight the structured and informative nature of the Cora dataset, which is essential for network training
and evaluation.
Overview of Experiments: Our experiments are divided into two parts. First, we compare the
accuracy of the two GKAN architectures against the conventional GCN on both training and test data. To
ensure a fair comparison between GCN and GKAN, we equalize the parameter sizes. Second, we analyze
the impact of various parameters for GKANs, such as the number of hidden nodes and the order and grid
parameters for the B-splines used in the learnable univariate functions.
7.3.1 Comparison with GCN
To fairly evaluate the performance of GKAN compared to GCN, we ensure that both networks have the
same number of layers, with dimensions specified as:
GCN:                 [d_input : h_GCN : C]
GKAN Architecture 1: [d_input : h : C]        (7.6)
GKAN Architecture 2: [d_input : h : C]        (7.7)
where d_input, h_GCN, h, and C respectively represent the input feature dimension, the hidden layer
dimension of GCN, the hidden layer dimension of the GKAN architectures, and the number of classes. To
ensure a fair comparison, we set h_GCN higher than h so as to approximately match the total number of
parameters in both schemes. The total number of trainable parameters for GCN and GKAN Architectures 1
and 2 are measured using the parameter counting module in PyTorch.
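Counting trainable parameters in PyTorch can be done as in the following sketch; the model shown is only a placeholder, not one of the architectures above.

import torch.nn as nn

def count_trainable_parameters(model: nn.Module) -> int:
    """Total number of trainable parameters of a PyTorch model."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

# Toy example: a 2-layer MLP with 100 -> 16 -> 7 units.
mlp = nn.Sequential(nn.Linear(100, 16), nn.ReLU(), nn.Linear(16, 7))
print(count_trainable_parameters(mlp))  # (100*16 + 16) + (16*7 + 7) = 1735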
The accuracy of our proposed GKAN for different settings of k (the degree of the polynomial in KAN
settings) and g (the number of intervals in KAN settings) against GCN [40] on the Cora dataset using the
first 100 features is shown in Table 7.1. Additionally, we present the accuracy of GKAN architectures
compared to GCN [40] on the Cora dataset using the first 200 features in Table 7.2. The results indicate that
all the GKAN variants perform better in terms of accuracy compared to a GCN with a comparable number
of parameters. Furthermore, Architecture 2 generally performs better. For the case of 100 features, the best
GKAN model shows more than an 8% higher accuracy than the conventional GCN model; for 200 features,
the best GKAN model shows a more than 6% higher accuracy than the conventional GCN model.
Table 7.1: Architectures and their performance on the first 100 features of the Cora Dataset.
Architecture                         #Parameters   Test Accuracy (%)
GCN (h_GCN=205)                      22,147        53.50
GKAN Archit. 1 (k=1, g=10, h=16)     22,279        59.32
GKAN Archit. 2 (k=1, g=10, h=16)     22,279        61.48
GKAN Archit. 1 (k=2, g=9, h=16)      22,279        56.76
GKAN Archit. 2 (k=2, g=9, h=16)      22,279        61.76
Table 7.2: Architectures and their performance on the first 200 features of the Cora Dataset.
Architecture                         #Parameters   Test Accuracy (%)
GCN (h_GCN=104)                      21,639        61.24
GKAN Archit. 1 (k=2, g=2, h=17)      21,138        63.58
GKAN Archit. 2 (k=2, g=2, h=17)      21,138        64.10
GKAN Archit. 1 (k=1, g=2, h=20)      20,727        67.44
GKAN Archit. 2 (k=1, g=2, h=20)      20,727        67.66
In the following sections, we evaluate the performance of GCN and the proposed GKAN Architectures
1 and 2 on the Cora dataset considering only the first 100 features.
7.3.1.1 Accuracy and Loss Values vs. Epochs
To ensure a fair comparison between GKAN (k=1, g=3, h=16) and GCN (h_GCN = 100), we set h = 16 and
h_GCN = 100, maintaining an almost identical number of trainable parameters. For these settings, the total
number of trainable parameters for GCN, GKAN Architecture 1, and GKAN Architecture 2 are 10,807,
10,295, and 10,295, respectively.
Figures 7.4a and 7.4b display the training and test accuracy for GCN and our proposed GKAN
architectures with k = 1 and g = 3 on the Cora dataset using the first 100 features. As observed, GKAN
architectures significantly enhance the test accuracy of GCN by a margin of 6%. Figures 7.5a and 7.5b
present the training and test loss for GCN and our proposed GKAN architectures under the same settings
of k = 1 and g = 3. It is evident that GKAN architectures lead to a rapid reduction in loss values,
indicating fewer epochs are needed for training, consistent with previous findings [53] about KAN
compared to Multi-Layer Perceptron (MLP).
(a) Training accuracy of different schemes. (b) Test accuracy of different schemes.
Figure 7.4: Accuracy comparison of GCN and GKAN architectures for k = 1 and g = 3.
(a) Training loss of different schemes. (b) Test loss of different schemes.
Figure 7.5: Loss value comparison of GCN and GKAN architectures for k = 1 and g = 3.
7.3.2 Evaluating the Influence of Parameters on GKAN
In this section, we investigate the impact of the parameters g (spline grid size), k (spline polynomial
degree), and h (number of hidden nodes) on the performance of GKAN. To isolate the effect of each
parameter, we vary one while keeping the others at their default values. The default settings for g, k, and h
are 3, 1, and 16, respectively. Table 7.3 outlines the range of values tested for each parameter, providing a
structured framework for our investigation. This approach allows us to systematically determine how each
parameter affects the overall effectiveness of our model. Given our earlier observations, we focus on GKAN
architecture 2 in this section.
Table 7.3: Range of Values for Parameters for the degree of the spline polynomial k, grid size for spline g,
and number of hidden nodes h, with default values in bold
Parameter Values
k {1, 2, 3}
g {3, 7, 11}
h {8, 12, 16}
7.3.2.1 Effect of Varying Grid Size g
Figures 7.6a and 7.6b show the training and test accuracy of GKAN Architecture 2 for different values of
grid g (i.e., g = 3, g = 7, and g = 11) with a fixed k = 1. Based on the validation performance, the optimal
grid size for this problem is an intermediate value of g = 7. While increasing the grid size from g = 3 to
g = 7 provides some benefits, a higher grid size of g = 11 results in decreased performance, similar to that
of g = 3.
(a) Training accuracy of GKAN Architecture 2 for
different grid sizes g with k = 1.
(b) Test accuracy of GKAN Architecture 2 for different
grid sizes g with k = 1.
Figure 7.6: Accuracy comparison of GKAN Architecture 2 for g ∈ {3, 7, 11}, k = 1, and h = 16.
The training and test loss values for GKAN Architecture 2 with varying grid sizes g and a fixed degree
k = 1 are shown in Figures 7.7a and 7.7b, respectively.
(a) Training loss values of GKAN Architecture 2 for
different grid sizes g with k = 1.
(b) Test loss values of GKAN Architecture 2 for different grid sizes g with k = 1.
Figure 7.7: Loss values of GKAN Architecture 2 for g ∈ {3, 7, 11}, k = 1, and h = 16.
7.3.2.2 Effect of Varying the Degree of Polynomials k
We examine the accuracy of GKAN Architecture 2 for different polynomial degrees k, ranging from k = 1
to k = 3, while keeping the grid size fixed at g = 3, as shown in Figures 7.8a and 7.8b. The results indicate
that a degree of one provides the best performance among these values, suggesting the underlying
ground-truth function may be piece-wise linear. Figures 7.9a and 7.9b present the training and test loss
values, respectively.
(a) Training accuracy of GKAN Architecture 2 for
k ∈ {1, 2, 3} with g = 3.
(b) Test accuracy of GKAN Architecture 2 for k ∈
{1, 2, 3} with g = 3.
Figure 7.8: Accuracy of GKAN Architecture 2 for k ∈ {1, 2, 3}, g = 3, and h = 16.
(a) Training loss values of GKAN Architecture 2 for
k ∈ {1, 2, 3} with g = 3.
(b) Test loss values of GKAN Architecture 2 for k ∈
{1, 2, 3} with g = 3.
Figure 7.9: Loss values of GKAN Architecture 2 for k ∈ {1, 2, 3}, g = 3, and h = 16.
7.3.2.3 Effect of Varying the Size of Hidden Layer
Figures 7.10a and 7.10b depict the training and test accuracy of GKAN Architecture 2 for hidden layer sizes
h ∈ {8, 12, 16}, with k = 1 and g = 3 held constant. Additionally, Figures 7.11a and 7.11b show the
accuracy over 600 epochs for the same parameter settings. These results indicate that a hidden layer size of
h = 12 is particularly effective during initial training phases and achieves nearly the same test
performance as h = 16.
(a) Training accuracy of GKAN Architecture 2 for
h ∈ {8, 12, 16} with g = 3 and k = 1.
(b) Test accuracy of GKAN Architecture 2 for h ∈
{8, 12, 16} with g = 3 and k = 1.
Figure 7.10: Accuracy comparison of GKAN Architecture 2 for h ∈ {8, 12, 16}, g = 3, and k = 1.
(a) Training accuracy of GKAN Architecture 2 for
h ∈ {8, 12, 16} over 600 epochs with g = 3 and
k = 1.
(b) Test accuracy of GKAN Architecture 2 for h ∈
{8, 12, 16} over 600 epochs with g = 3 and k = 1.
Figure 7.11: Accuracy comparison of GKAN Architecture 2 for h ∈ {8, 12, 16}, g = 3, and k = 1 over 600
epochs.
We also present the training and test loss values for GKAN Architecture 2 with hidden layer sizes
h ∈ {8, 12, 16}, fixing the polynomial degree at k = 1 and grid size g = 3, as shown in Figures 7.12a and
7.12b, respectively.
(a) Training loss values of GKAN Architecture 2 for
h ∈ {8, 12, 16} with g = 3 and k = 1.
(b) Test loss values of GKAN Architecture 2 for h ∈
{8, 12, 16} with g = 3 and k = 1.
Figure 7.12: Loss values of GKAN Architecture 2 for h ∈ {8, 12, 16}, g = 3, and k = 1.
7.4 Summary
In this chapter, we considered how to apply the idea of learnable functions from the recently-proposed
Kolmogorov-Arnold Neural Networks (KANs) to graph-structured data. We introduced, for the first time,
two distinct architectures for Graph Kolmogorov-Arnold Networks (GKANs). Empirical evaluations on the
Cora dataset demonstrated that GKANs achieve significantly better parameter-efficiency than conventional
GCNs, yielding higher accuracy for comparable parameter sizes. We also investigated the impact of various
parameters such as the number of hidden nodes, grid size, and the spline order parameter on performance.
Our findings suggest that GKANs open a new avenue in graph representation learning and could serve
as the foundation for various approaches that previously relied on MLPs at their core, such as GCNs, GAT,
Graph Autoencoders, Graph Transformers, and other graph deep learning schemes. Promising directions
for future research include exploring and evaluating extensions of these approaches using KAN on more
comprehensive datasets. GKANs currently inherit the property of KANs in that the training process is
rather slow, and [53] leave to future work the task of optimizing training time. Advances in alternative
learning approaches and architectures for KAN could also be applied to GKANs in the future.
Chapter 8
Conclusion
In this dissertation, we have introduced a series of innovative contributions to the advancements of
distributed computing as well as graph neural networks. Our research was founded on extensive analysis
and the development of novel methods designed to enhance the performance of distributed computing and
graph representation learning.
Firstly, we introduced Blizzard, a Byzantine Fault Tolerant (BFT) distributed ledger protocol designed
to integrate mobile devices as primary participants in the consensus process. Blizzard features an
innovative two-tier architecture where mobile nodes engage through online brokers, coupled with a
decentralized matching mechanism to ensure connections between nodes and a set number of random
brokers. We provided a mathematical analysis that establishes a guaranteed safety region, defining the
proportions of malicious nodes and brokers under which the protocol remains secure. Additionally,
Blizzard’s performance was evaluated in terms of throughput, latency, and message complexity. Our
experimental results from a software implementation demonstrated that Blizzard achieves throughput of
several thousand transactions per second per shard, with sub-second confirmation times. We also
presented an innovative protocol built on Graph Convolutional Networks (GCN) to perform consensus for
blockchain technologies. We concentrated on validating the compatibility of our GCN-based protocol with
the Avalanche protocol, which is well-known for its efficient handling of conflicting transactions. Through
comprehensive simulations, we demonstrated that our GCN-based consensus protocol effectively emulates
the Avalanche approach, thereby maintaining strong safety and reliability in the resolution of transaction
conflicts.
Furthermore, we introduced a new task scheduling framework designed to speed up computational
applications that rely on distributed iterative processes executed across networked computing resources.
Each application involves multiple tasks that generate data with each iteration, which must be processed
by adjacent tasks, forming a structure that can be represented as a directed graph. Initially, we modeled the
problem as a Binary Quadratic Program (BQP), considering both computational and communication costs,
and established that the problem is NP-hard. Subsequently, we relaxed the formulation to a Semi-Definite
Program (SDP) and applied a randomized rounding method using a multi-variate Gaussian distribution. We
also calculated the expected bottleneck time. This scheduling scheme was then implemented in
gossip-based federated learning to demonstrate its effectiveness in real-world iterative processes. Our
numerical tests on the MNIST and CIFAR-10 datasets revealed that our method surpasses established
distributed computing scheduling techniques.
We also presented a highly efficient solution to the traditional challenge of scheduling tasks of complex
applications in distributed computing systems. Previously, various heuristics were developed to optimize
task scheduling based on different metrics such as makespan and throughput, but they often proved too
slow for large-scale and dynamic systems. To address these limitations, we introduced a novel approach:
the Graph Convolutional Network-based Scheduler (GCNScheduler). This method integrates the inter-task
data dependencies and the computational network into a unified input graph, allowing for rapid and
efficient task scheduling for different objectives. Our simulations showed that the GCNScheduler not only
learns from existing scheduling methods quickly but also scales effectively to large systems where
traditional methods fail to do so. We also validated the ability of the GCNScheduler to generalize to unseen
real-world applications, demonstrating that it matches the makespan and throughput of benchmark
methods while taking several orders of magnitude less time in finding the schedule.
Lastly, we introduced Graph Kolmogorov-Arnold Networks (GKAN), a novel neural network
architecture that adapts the principles of Kolmogorov-Arnold Networks (KAN) to graph-structured data.
By leveraging the unique properties of KANs, specifically the use of learnable univariate functions in place
of fixed linear weights, we created a powerful model for graph-based learning tasks. Unlike traditional
Graph Convolutional Networks (GCNs), which rely on a fixed convolutional framework, GKANs employ
learnable spline-based functions between layers, revolutionizing information processing within graph
structures. We proposed two methods for integrating KAN layers into GKAN: in the first architecture,
learnable functions are applied to input features after aggregation, and in the second, they are applied
before aggregation. Our empirical evaluation on the Cora dataset, using a semi-supervised graph learning
task, demonstrated that GKANs generally outperform traditional GCNs. For instance, with 100 features, a
GCN achieves an accuracy of 53.5%, while a GKAN with a comparable number of parameters achieves
61.76%; with 200 features, a GCN reaches 61.24%, whereas a GKAN achieves 67.66%. We also explored the
effects of various parameters, such as the number of hidden nodes, grid size, and the polynomial degree of
the spline, on the performance of GKAN.
To summarize, these contributions mark significant advancements in distributed computing and graph
neural networks. Through thorough analysis and the development of innovative methodologies, our
research enhances the performance and scalability of both distributed computing systems and graph
representation learning.
8.1 Future Directions
Based on the contributions presented in this dissertation, we believe that our work significantly advances
distributed computing and graph representation learning methodologies. Promising avenues for future
research include:
• Incorporating Other Components into Blizzard: Future studies could focus on developing
mobile-device-oriented Sybil control to ensure that only users with legitimate mobile devices can
participate, deterring the creation or operation of multiple fake identities. To improve scalability, it is
worth exploring sharding, pruning the DAG, and offloading verification. Safety under a partially
synchronous model and connectivity challenges of mobile devices are other interesting real-world
research directions.
• Extending GCN-Consensus Mechanism with Other Properties: The current GCN-Consensus
mechanism is trained to mimic the Avalanche consensus. However, it is worth investigating other
gossip-based protocols where node relations could be represented as a graph. Additionally, it would
be interesting to consider other features, such as the associated risk of each user, to demonstrate the
effectiveness of GCN in capturing complex functions. Furthermore, incorporating a detection
mechanism based on GCN to penalize malicious nodes would be valuable. Lastly, it would be worth
exploring whether a GCN could improve the latency of Avalanche by optimizing parent selection, since
the ledger essentially forms a DAG, a structure to which GCNs naturally apply.
• Improving the Upper Bound to the Optimal Solution of Scheduling Distributed Iterative
Processes: One of the interesting future directions is to derive a tighter upper bound to the optimal
solution of scheduling tasks in distributed iterative processes.
• Evaluating GCNScheduler with Respect to Other Objectives: The current GCNScheduler was
evaluated with respect to makespan minimization and throughput maximization. It would be
interesting to assess its performance with respect to other objectives.
• Designing an Unsupervised Version of GCNScheduler: The proposed GCNScheduler is a
supervised framework that requires a teacher algorithm to generate the labels. Extending our
supervised learning approach to an unsupervised one could further enhance its applicability and
effectiveness in diverse computing environments, as well as improve makespan and throughput.
• Extending GCNScheduler to New GNN Frameworks: The core component of GCNScheduler is
based on Graph Convolutional Networks. It would be interesting to build GCNScheduler on other
graph neural networks, such as GAT or Graph Transformer models, and to compare them with
respect to the time required to find a schedule. Additionally, investigating graph neural networks
that capture information about the entire graph for node classification is worthwhile, given that
computer-network problems such as scheduling require such global information.
• Utilizing GKAN to Build New Graph Neural Networks: Based on the evidence presented in this
dissertation, we believe that GKANs open a new avenue in graph representation learning and could
serve as the foundation for various approaches that previously utilized MLPs at their core, such as
GCNs, GAT, Graph Autoencoders, Graph Transformers, and many other graph deep learning
schemes. Promising avenues for future work include exploring and evaluating extensions based on
all these approaches using KAN over more comprehensive datasets. GKAN currently inherits the
property of present-generation KAN in that the training process is rather slow, and [53] leaves to
future work the task of optimizing training time. Advances in alternative learning approaches and
architectures for KAN could also be applied to GKAN in the future.
As distributed computing and graph representation learning continue to grow in complexity and
scale, further investigation and refinement of learning-based models such as those presented here will be
crucial for advancing the performance of distributed computing systems and graph neural networks.
Bibliography
[1] Pirkko Walden, Tomi Dahlberg, and Esko Penttinen. “Introduction to the Minitrack on Digital
Mobile Services for Everyday Life”. In: Proceedings of the 52nd Hawaii International Conference on
System Sciences. 2019.
[2] Ben Greenstein. “Delivering the Mobile Web to the Next Billion Users”. In: Proceedings of the 19th
International Workshop on Mobile Computing Systems & Applications. ACM. 2018, pp. 99–99.
[3] Alex Biryukov and Sergei Tikhomirov. “Security and privacy of mobile wallet users in Bitcoin,
Dash, Monero, and Zcash”. In: Pervasive and Mobile Computing (2019), p. 101030.
[4] Ashish Rajendra Sai, Jim Buckley, and Andrew Le Gear. “Privacy and Security Analysis of
Cryptocurrency Mobile Applications”. In: 2019 Fifth Conference on Mobile and Secure Services
(MobiSecServ). IEEE. 2019, pp. 1–6.
[5] Kam Yu and Juhong Feng. “Moore’s Law and Price Trends of Digital Products: The Case of
Smartphones”. In: Economics of Innovation and New Technology (2019). DOI:
10.1080/10438599.2019.1628509.
[6] Mehrdad Kiamari, Bhaskar Krishnamachari, Muhammad Naveed, and Seokgu Yun. “Distributed
Consensus for Mobile Devices using Online Brokers”. In: 2020 IEEE International Conference on
Blockchain and Cryptocurrency (ICBC). 2020, pp. 1–3.
[7] Team Rocket, Maofan Yin, Kevin Sekniqi, Robbert van Renesse, and Emin Gün Sirer. “Scalable and
Probabilistic Leaderless BFT Consensus through Metastability”. In: CoRR abs/1906.08936 (2019).
arXiv: 1906.08936.
[8] Neil C. Thompson, Kristjan Greenewald, Keeheon Lee, and Gabriel F. Manso. The Computational
Limits of Deep Learning. 2020. arXiv: 2007.05558 [cs.LG].
[9] Maryam Najafabadi, Flavio Villanustre, Taghi Khoshgoftaar, Naeem Seliya, Randall Wald, and
Edin Muharemagic. “Deep learning applications and challenges in big data analytics”. In: Journal of
Big Data 2 (2015).
[10] Ryan Karl, Jonathan Takeshita, and Taeho Jung. “Using Intel SGX to Improve Private Neural
Network Training and Inference”. In: Proceedings of the 7th Symposium on Hot Topics in the Science
of Security. HotSoS ’20. Lawrence, Kansas: Association for Computing Machinery, 2020.
[11] Dimitri P. Bertsekas and John N. Tsitsiklis. “Some aspects of parallel and distributed iterative
algorithms—A survey”. In: Automatica 27.1 (1991), pp. 3–21.
[12] H. Topcuoglu, S. Hariri, and Min-You Wu. “Task scheduling algorithms for heterogeneous
processors”. In: Proceedings. Eighth Heterogeneous Computing Workshop (HCW’99). 1999, pp. 3–14.
[13] H. Topcuoglu, S. Hariri, and Min-You Wu. “Performance-effective and low-complexity task
scheduling for heterogeneous computing”. In: IEEE Transactions on Parallel and Distributed Systems
13.3 (2002), pp. 260–274.
[14] Fatma A. Omara and Mona M. Arafa. “Genetic Algorithms for Task Scheduling Problem”. In:
Foundations of Computational Intelligence Volume 3: Global Optimization. Ed. by Ajith Abraham,
Aboul-Ella Hassanien, Patrick Siarry, and Andries Engelbrecht. Berlin, Heidelberg: Springer Berlin
Heidelberg, 2009, pp. 479–507.
[15] O. Sinnen and L. A. Sousa. “Communication contention in task scheduling”. In: IEEE Transactions
on Parallel and Distributed Systems 16.6 (2005), pp. 503–515.
[16] Lai Tsung-Chyan, Y.N. Sotskov, N.Yu. Sotskova, and F. Werner. “Optimal makespan scheduling with
given bounds of processing times”. In: Mathematical and Computer Modelling 26.3 (1997), pp. 67–86.
[17] Berit Johannes. “Scheduling Parallel Jobs to Minimize the Makespan”. In: Journal of Scheduling 9.5 (2006).
[18] Liu Min and Wu Cheng. “A genetic algorithm for minimizing the makespan in the case of
scheduling identical parallel machines”. In: Artificial Intelligence in Engineering 13.4 (1999),
pp. 399–403.
[19] M. Gallet, L. Marchal, and F. Vivien. “Efficient scheduling of task graph collections on
heterogeneous resources”. In: 2009 IEEE International Symposium on Parallel Distributed Processing.
2009, pp. 1–11.
[20] Leila Eskandari, Jason Mair, Zhiyi Huang, and David Eyers. “Iterative Scheduling for Distributed
Stream Processing Systems”. In: Proceedings of the 12th ACM International Conference on Distributed
and Event-Based Systems. DEBS ’18. Hamilton, New Zealand: Association for Computing
Machinery, 2018, pp. 234–237.
[21] Xuan-Qui Pham, Nguyen Doan Man, Nguyen Dao Tan Tri, Ngo Quang Thai, and Eui-Nam Huh. “A
cost- and performance-effective approach for task scheduling based on collaboration between cloud
and fog computing”. In: International Journal of Distributed Sensor Networks 13.11 (2017),
p. 1550147717742073. eprint: https://doi.org/10.1177/1550147717742073.
[22] J. Kennedy and R. C. Eberhart. “A discrete binary version of the particle swarm algorithm”. In: 1997
IEEE International Conference on Systems, Man, and Cybernetics. Computational Cybernetics and
Simulation. Vol. 5. 1997, 4104–4108 vol.5.
[23] Sourav Kanti Addya, Ashok Kumar Turuk, Bibhudatta Sahoo, Mahasweta Sarkar, and
Sanjay Kumar Biswash. “Simulated annealing based VM placement strategy to maximize the profit
for Cloud Service Providers”. In: Engineering Science and Technology, an International Journal 20.4
(2017), pp. 1249–1259.
[24] Z. Fan, H. Shen, Y. Wu, and Y. Li. “Simulated-Annealing Load Balancing for Resource Allocation in
Cloud Environments”. In: 2013 International Conference on Parallel and Distributed Computing,
Applications and Technologies. 2013, pp. 1–6.
[25] Henrique Yoshikazu Shishido, Júlio Cezar Estrella, Claudio Fabiano Motta Toledo, and
Marcio Silva Arantes. “Genetic-based algorithms applied to a workflow scheduling algorithm with
security and deadline constraints in clouds”. In: Computers and Electrical Engineering 69 (2018),
pp. 378–394.
[26] Kousik Dasgupta, Brototi Mandal, Paramartha Dutta, Jyotsna Kumar Mandal, and Santanu Dam. “A
Genetic Algorithm (GA) based Load Balancing Strategy for Cloud Computing”. In: Procedia
Technology 10 (2013). First International Conference on Computational Intelligence: Modeling
Techniques and Applications (CIMTA) 2013, pp. 340–347.
[27] Habib Izadkhah and Yangming Li. “Learning Based Genetic Algorithm for Task Graph Scheduling”.
In: Appl. Comp. Intell. Soft Comput. 2019 (Jan. 2019).
[28] Yossi Azar and Amir Epstein. “Convex Programming for Scheduling Unrelated Parallel Machines”.
In: Proceedings of the Thirty-Seventh Annual ACM Symposium on Theory of Computing. STOC ’05.
Baltimore, MD, USA: Association for Computing Machinery, 2005, pp. 331–337.
[29] Martin Skutella. “Convex Quadratic and Semidefinite Programming Relaxations in Scheduling”. In:
J. ACM 48.2 (Mar. 2001), pp. 206–242.
[30] Poonam Singh, Maitreyee Dutta, and Naveen Aggarwal. “A review of task scheduling based on
meta-heuristics approach in cloud computing”. In: Knowledge and Information Systems 52 (July
2017).
[31] C. Tsai and J. J. P. C. Rodrigues. “Metaheuristic Scheduling for Cloud: A Survey”. In: IEEE Systems
Journal 8.1 (2014), pp. 279–291.
[32] Mala Kalra and Sarbjeet Singh. “A review of metaheuristic scheduling techniques in cloud
computing”. In: Egyptian Informatics Journal 16.3 (2015), pp. 275–295.
[33] Z. Luo, W. Ma, A. M. So, Y. Ye, and S. Zhang. “Semidefinite Relaxation of Quadratic Optimization
Problems”. In: IEEE Signal Processing Magazine 27.3 (2010), pp. 20–34.
[34] Nan Cheng, Feng Lyu, Wei Quan, Conghao Zhou, Hongli He, Weisen Shi, and Xuemin Shen.
“Space/Aerial-Assisted Computing Offloading for IoT Applications: A Learning-Based Approach”.
In: IEEE Journal on Selected Areas in Communications 37.5 (2019), pp. 1117–1129.
[35] Kuljeet Kaur, Sahil Garg, Gagangeet Singh Aujla, Neeraj Kumar, Joel J. P. C. Rodrigues, and
Mohsen Guizani. “Edge Computing in the Industrial Internet of Things Environment:
Software-Defined-Networks-Based Edge-Cloud Interplay”. In: IEEE Communications Magazine 56.2
(2018), pp. 44–51.
[36] Garcia-Piquer, A., Morales, J. C., Ribas, I., Colomé, J., Guàrdia, J., Perger, M., Caballero, J. A.,
Cortés-Contreras, M., Jeffers, S. V., Reiners, A., Amado, P. J., Quirrenbach, A., and Seifert, W.
“Efficient scheduling of astronomical observations - Application to the CARMENES radial-velocity
survey”. In: A&A 604 (2017), A87.
[37] Luiz André Barroso, Jimmy Clidaras, and Urs Hölzle. The Datacenter as a Computer: An Introduction
to the Design of Warehouse-Scale Machines, Second Edition. 2013.
[38] Schahram Dustdar, Stefan Nastić, and Ognjen Šćekić. “Smart Cities”. In: The Internet of Things,
People and Systems. Springer, 2017.
[39] Abbas Shah Syed, Daniel Sierra-Sosa, Anup Kumar, and Adel Elmaghraby. “IoT in Smart Cities: A
Survey of Technologies, Practices and Challenges”. In: Smart Cities 4.2 (2021), pp. 429–475.
[40] Thomas N. Kipf and Max Welling. “Semi-Supervised Classification with Graph Convolutional
Networks”. In: International Conference on Learning Representations. 2017.
[41] Muhan Zhang and Yixin Chen. Link Prediction Based on Graph Neural Networks. 2018. arXiv:
1802.09691 [cs.LG].
[42] Michaël Defferrard, Xavier Bresson, and Pierre Vandergheynst. “Convolutional Neural Networks on
Graphs with Fast Localized Spectral Filtering”. In: Proceedings of the 30th International Conference
on Neural Information Processing Systems. NIPS’16. Barcelona, Spain: Curran Associates Inc., 2016,
pp. 3844–3852.
[43] William L. Hamilton, Rex Ying, and Jure Leskovec. “Inductive Representation Learning on Large
Graphs”. In: Proceedings of the 31st International Conference on Neural Information Processing
Systems. NIPS’17. Long Beach, California, USA: Curran Associates Inc., 2017, pp. 1025–1035.
[44] Michael M. Bronstein, Joan Bruna, Yann LeCun, Arthur Szlam, and Pierre Vandergheynst.
“Geometric Deep Learning: Going beyond Euclidean data”. In: IEEE Signal Processing Magazine 34.4
(2017), pp. 18–42.
[45] William L. Hamilton, Rex Ying, and Jure Leskovec. Representation Learning on Graphs: Methods and
Applications. 2018. arXiv: 1709.05584 [cs.SI].
[46] Federico Monti, Michael M. Bronstein, and Xavier Bresson. “Geometric matrix completion with
recurrent multi-graph neural networks”. In: Proceedings of the 31st International Conference on
Neural Information Processing Systems. NIPS’17. Long Beach, California, USA: Curran Associates
Inc., 2017, pp. 3700–3710.
[47] Rianne van den Berg, Thomas N. Kipf, and Max Welling. Graph Convolutional Matrix Completion.
2017. arXiv: 1706.02263 [stat.ML].
[48] Jiaxuan You, Rex Ying, Xiang Ren, William L. Hamilton, and Jure Leskovec. “GraphRNN:
Generating Realistic Graphs with Deep Auto-regressive Models”. In: International Conference on
Machine Learning. 2018.
[49] Rex Ying, Ruining He, Kaifeng Chen, Pong Eksombatchai, William L. Hamilton, and Jure Leskovec.
“Graph Convolutional Neural Networks for Web-Scale Recommender Systems”. In: Proceedings of
the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (2018).
[50] Kurt Hornik, Maxwell B. Stinchcombe, and Halbert L. White. “Multilayer feedforward networks are
universal approximators”. In: Neural Networks 2 (1989), pp. 359–366.
[51] Ashish Vaswani, Noam M. Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez,
Lukasz Kaiser, and Illia Polosukhin. “Attention is All you Need”. In: Neural Information Processing
Systems. 2017.
[52] Robert Huben, Hoagy Cunningham, Logan Riggs Smith, Aidan Ewart, and Lee Sharkey. “Sparse
Autoencoders Find Highly Interpretable Features in Language Models”. In: The Twelfth
International Conference on Learning Representations. 2024.
[53] Ziming Liu, Yixuan Wang, Sachin Vaidya, Fabian Ruehle, James Halverson, Marin Soljačić,
Thomas Y. Hou, and Max Tegmark. KAN: Kolmogorov-Arnold Networks. 2024. arXiv: 2404.19756
[cs.LG].
[54] A.N. Kolmogorov. “On the representation of continuous functions of several variables as
superpositions of continuous functions of a smaller number of variables”. In: Dokl. Akad. Nauk.
1956.
[55] A.N. Kolmogorov. “On the representation of continuous functions of many variables by
superposition of continuous functions of one variable and addition”. In: Dokl. Akad. Nauk. Vol. 114.
1957, pp. 953–956.
[56] Jürgen Braun and Michael Griebel. “On a Constructive Proof of Kolmogorov’s Superposition
Theorem”. In: Constructive Approximation 30 (2009), pp. 653–675.
[57] Ming-Jun Lai and Zhaiming Shen. The Kolmogorov Superposition Theorem can Break the Curse of
Dimensionality When Approximating High Dimensional Functions. 2023. arXiv: 2112.09963
[math.NA].
[58] S Nakamoto. Bitcoin: A Peer-to-Peer Electronic Cash System. 2008.
[59] Peter Fairley. “Ethereum will cut back its absurd energy use”. In: IEEE Spectrum 56.1 (2018),
pp. 29–32.
[60] Arvind Narayanan, Joseph Bonneau, Edward Felten, Andrew Miller, and Steven Goldfeder. Bitcoin
and Cryptocurrency Technologies: A Comprehensive Introduction. USA: Princeton University Press,
2016.
[61] Daniel Davis Wood. “Ethereum: A Secure Decentralised Generalised Transaction Ledger”. In: 2014.
[62] Vitalik Buterin. Ethereum Sharding FAQ. https://github.com/ethereum/wiki/wiki/Sharding-FAQ.
Accessed: 2024-06-28. 2018.
[63] Mauro Conti, E. Sandeep Kumar, Chhagan Lal, and Sushmita Ruj. “A Survey on Security and
Privacy Issues of Bitcoin”. In: IEEE Communications Surveys and Tutorials 20.4 (2018), pp. 3416–3452.
[64] Jeffrey Dean and Sanjay Ghemawat. “MapReduce: simplified data processing on large clusters”. In:
Commun. ACM 51.1 (Jan. 2008), pp. 107–113.
[65] K. M. Chandy and J. Misra. “The drinking philosophers problem”. In: ACM Trans. Program. Lang.
Syst. 6.4 (Oct. 1984), pp. 632–646.
[66] George Karypis and Vipin Kumar. “Parallel Multilevel series k-Way Partitioning Scheme for
Irregular Graphs”. In: SIAM Review 41.2 (1999), pp. 278–300. eprint:
https://doi.org/10.1137/S0036144598334138.
[67] William Gropp, Ewing Lusk, and Anthony Skjellum. Using MPI (2nd ed.): portable parallel
programming with the message-passing interface. Cambridge, MA, USA: MIT Press, 1999.
[68] Jeffrey Dean, Greg Corrado, Rajat Monga, Kai Chen, Matthieu Devin, Mark Mao,
Marc’aurelio Ranzato, Andrew Senior, Paul Tucker, Ke Yang, Quoc Le, and Andrew Ng. “Large
Scale Distributed Deep Networks”. In: Advances in Neural Information Processing Systems. Ed. by
F. Pereira, C.J. Burges, L. Bottou, and K.Q. Weinberger. Vol. 25. Curran Associates, Inc., 2012.
[69] Matei Zaharia, Mosharaf Chowdhury, Tathagata Das, Ankur Dave, Justin Ma, Murphy McCauley,
Michael J. Franklin, Scott Shenker, and Ion Stoica. “Resilient distributed datasets: a fault-tolerant
abstraction for in-memory cluster computing”. In: Proceedings of the 9th USENIX Conference on
Networked Systems Design and Implementation. NSDI’12. San Jose, CA: USENIX Association, 2012,
p. 2.
[70] Paul Beame, Paraschos Koutris, and Dan Suciu. “Communication Steps for Parallel Query
Processing”. In: Journal of the ACM (JACM) 64 (2017), pp. 1–58.
[71] K. Li. “Energy efficient scheduling of parallel tasks on multiprocessor computers”. In: J
Supercomput 60 (2012), pp. 223–247.
[72] Fatos Xhafa and Ajith Abraham. “Computational models and heuristic methods for Grid scheduling
problems”. In: Future Generation Computer Systems 26.4 (2010), pp. 608–621.
[73] Shahid H. Bokhari. Assignment problems in parallel and distributed computing. USA: Kluwer
Academic Publishers, 1987.
[74] Tharam Dillon, Chen Wu, and Elizabeth Chang. “Cloud Computing: Issues and Challenges”. In:
2010 24th IEEE International Conference on Advanced Information Networking and Applications. 2010,
pp. 27–33.
[75] Rajkumar Buyya, Chee Shin Yeo, Srikumar Venugopal, James Broberg, and Ivona Brandic. “Cloud
computing and emerging IT platforms: Vision, hype, and reality for delivering computing as the
5th utility”. In: Future Generation Computer Systems 25.6 (2009), pp. 599–616.
[76] KwangSik Shin, MyongJin Cha, MunSuck Jang, JinHa Jung, WanOh Yoon, and SangBang Choi.
“Task scheduling algorithm using minimized duplications in homogeneous systems”. In: Journal of
Parallel and Distributed Computing 68.8 (2008), pp. 1146–1156.
[77] Chenzhong Xu and Francis CM Lau. Load balancing in parallel computers: theory and practice.
Vol. 381. Springer, 2007.
[78] Jack Verhoosel, E. Luit, Dieter K. Hammer, and E. Jansen. “A static scheduling algorithm for
distributed hard real-time systems”. In: Real-Time Systems 3 (Jan. 1991), pp. 227–246.
[79] Fatma Omara and Doaa Abdelkader. “Dynamic Task Scheduling Algorithm with Load Balancing for
Heterogeneous Computing System”. In: The Egyptian Informatics Journal 13 (July 2012),
pp. 135–145.
[80] Amit Chhabra. “Heuristics Based Genetic Algorithm for Scheduling Static Tasks in Homogeneous
Parallel System”. In: International Journal of Computer Science and Security 4 (June 2010).
[81] Mala Kalra and Sarbjeet Singh. “A review of metaheuristic scheduling techniques in cloud
computing”. In: Egyptian Informatics Journal 16.3 (2015), pp. 275–295.
[82] A.R. Barron. “Universal approximation bounds for superpositions of a sigmoidal function”. In: IEEE
Transactions on Information Theory 39.3 (1993), pp. 930–945.
[83] Y. Lecun, L. Bottou, Y. Bengio, and P. Haffner. “Gradient-based learning applied to document
recognition”. In: Proceedings of the IEEE 86.11 (1998), pp. 2278–2324.
[84] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. “ImageNet Classification with Deep
Convolutional Neural Networks”. In: Advances in Neural Information Processing Systems. Ed. by
F. Pereira, C.J. Burges, L. Bottou, and K.Q. Weinberger. Vol. 25. Curran Associates, Inc., 2012.
[85] Karen Simonyan and Andrew Zisserman. Very Deep Convolutional Networks for Large-Scale Image
Recognition. 2015. arXiv: 1409.1556 [cs.CV].
[86] Brendan McMahan, Eider Moore, Daniel Ramage, Seth Hampson, and Blaise Aguera y Arcas.
“Communication-Efficient Learning of Deep Networks from Decentralized Data”. In: Proceedings of
the 20th International Conference on Artificial Intelligence and Statistics. Ed. by Aarti Singh and
Jerry Zhu. Vol. 54. Proceedings of Machine Learning Research. PMLR, 2017, pp. 1273–1282.
[87] Thomas N Kipf and Max Welling. “Semi-supervised classification with graph convolutional
networks”. In: arXiv preprint arXiv:1609.02907 (2016).
[88] Mehrdad Kiamari, Bhaskar Krishnamachari, and Seokgu Yun. “Blizzard: A Distributed Consensus
Protocol for Mobile Devices”. In: Mathematics 12.5 (2024).
[89] Mehrdad Kiamari, Bhaskar Krishnamachari, Muhammad Naveed, and Seokgu Yun. “Distributed
Consensus for Mobile Devices using Online Brokers”. In: 2020 IEEE International Conference on
Blockchain and Cryptocurrency (ICBC). IEEE. 2020, pp. 1–3.
[90] Mehrdad Kiamari and Bhaskar Krishnamachari. “Bottleneck Time Minimization for Distributed
Iterative Processes: Speeding Up Gossip-Based Federated Learning on Networked Computers”. In:
arXiv preprint arXiv:2106.15048 (2021).
[91] Mehrdad Kiamari and Bhaskar Krishnamachari. “Gcnscheduler: Scheduling distributed computing
applications using graph convolutional networks”. In: Proceedings of the 1st International Workshop
on Graph Neural Networking. 2022, pp. 13–17.
[92] Mehrdad Kiamari, Mohammad Kiamari, and Bhaskar Krishnamachari. “GKAN: Graph
Kolmogorov-Arnold Networks”. In: arXiv preprint arXiv:2406.06470 (2024).
[93] Jae Kwon. Tendermint: Consensus without Mining. [Online]. Available:
http://tendermint.com/docs/tendermint.pdf. 2014.
[94] Juan Garay, Aggelos Kiayias, and Nikos Leonardos. “The bitcoin backbone protocol: Analysis and
applications”. In: Annual International Conference on the Theory and Applications of Cryptographic
Techniques. Springer. 2015.
[95] Miguel Castro and Barbara Liskov. “Practical Byzantine fault tolerance and proactive recovery”. In:
ACM Transactions on Computer Systems (TOCS) 20.4 (2002).
[96] Alysson Bessani, João Sousa, and Eduardo EP Alchieri. “State machine replication for the masses
with BFT-SMaRt”. In: 2014 44th Annual IEEE/IFIP International Conference on Dependable Systems
and Networks. IEEE. 2014, pp. 355–362.
[97] Ittai Abraham, Guy Gueta, and Dahlia Malkhi. “Hot-Stuff the Linear, Optimal-Resilience,
One-Message BFT Devil”. In: CoRR abs/1803.05069 (2018).
[98] Vlad Zamfir, Nate Rush, Aditya Asgaonkar, and Georgios Piliouras. Introducing the “Minimal CBC
Casper” Family of Consensus Protocols. 2018.
[99] Leemon Baird. “The swirlds hashgraph consensus algorithm: Fair, fast, byzantine fault tolerance”.
In: Swirlds Tech Reports SWIRLDS-TR-2016-01, Tech. Rep. (2016).
[100] Tyler Crain, Vincent Gramoli, Mikel Larrea, and Michel Raynal. “DBFT: Efficient Leaderless
Byzantine Consensus and its Application to Blockchains”. In: 2018 IEEE 17th International
Symposium on Network Computing and Applications (NCA). IEEE. 2018, pp. 1–8.
[101] Adam Gagol and Michal Swietek. “Aleph: A Leaderless, Asynchronous, Byzantine Fault Tolerant
Consensus Protocol”. In: arXiv preprint arXiv:1810.05256 (2018).
[102] Serguei Popov. The Tangle. https://www.iota.org/research/academic-papers. 2016.
[103] Dmitrii Zhelezov and Oliver Fohrmann. “HelixMesh: a Consensus Protocol for IoT”. In: Proceedings
of the 2019 International Electronics Communication Conference. ACM. 2019, pp. 44–51.
[104] Cosmos. https://cosmos.network/resources/whitepaper. Accessed: 2019-12-18.
[105] Silvio Micali. “ALGORAND: The Efficient and Democratic Ledger”. In: CoRR abs/1607.01341 (2016).
arXiv: 1607.01341.
[106] Aggelos Kiayias, Alexander Russell, Bernardo David, and Roman Oliynykov. Ouroboros: A Provably
Secure Proof-of-Stake Blockchain Protocol. Cryptology ePrint Archive, Report 2016/889.
https://eprint.iacr.org/2016/889. 2016.
[107] Timo Hanke, Mahnush Movahedi, and Dominic Williams. “DFINITY Technology Overview Series,
Consensus System”. In: CoRR abs/1805.04548 (2018). arXiv: 1805.04548.
[108] Elli Androulaki et al. “Hyperledger fabric: a distributed operating system for permissioned
blockchains”. In: Proceedings of the Thirteenth EuroSys Conference. 2018.
[109] Kelly Olson, Mic Bowman, James Mitchell, Shawn Amundson, Dan Middleton, and
Cian Montgomery. “Sawtooth: An Introduction”. In: The Linux Foundation, Jan (2018).
[110] Yuan Xiao-li. “Research on performance of clustering-based consensus protocol in mobile Ad Hoc
networks”. In: Information Technology (2012).
[111] Weigang Wu, Jiannong Cao, Jin Yang, and M. Raynal. “A hierarchical consensus protocol for mobile
ad hoc networks”. In: 14th Euromicro International Conference on Parallel, Distributed, and
Network-Based Processing (PDP’06). 2006, pp. 1–9.
[112] N. Badache, M. Hurfin, and R. Macedo. “Solving the consensus problem in a mobile environment”.
In: 1999 IEEE International Performance, Computing and Communications Conference (Cat.
No.99CH36305). 1999, pp. 29–35.
[113] Weigang Wu, Jiannong Cao, and Michel Raynal. “Eventual Clusterer: A Modular Approach to
Designing Hierarchical Consensus Protocols in MANETs”. In: IEEE Transactions on Parallel and
Distributed Systems 20.6 (2009), pp. 753–765.
[114] Weigang Wu, Jiannong Cao, Jin Yang, and Michel Raynal. “Design and Performance Evaluation of
Efficient Consensus Protocols for Mobile Ad Hoc Networks”. In: IEEE Transactions on Computers
56.8 (2007), pp. 1055–1070.
[115] Rafael Pass and Elaine Shi. “Hybrid Consensus: Efficient Consensus in the Permissionless Model”.
In: 31st International Symposium on Distributed Computing (DISC 2017). 2017, 39:1–39:16.
[116] H. Ren, Y. Lan, and C. Yin. “The load balancing algorithm in cloud computing environment”. In:
Proceedings of 2012 2nd International Conference on Computer Science and Network Technology. 2012,
pp. 925–928.
[117] J. Bhatia, T. Patel, H. Trivedi, and V. Majmudar. “HTV Dynamic Load Balancing Algorithm for
Virtual Machine Instances in Cloud”. In: 2012 International Symposium on Cloud and Services
Computing. 2012, pp. 15–20.
[118] Mohit Kumar and S.C. Sharma. “Dynamic load balancing algorithm for balancing the workload
among virtual machine in cloud computing”. In: Procedia Computer Science 115 (2017). 7th
International Conference on Advances in Computing and Communications, ICACC-2017, 22-24
August 2017, Cochin, India, pp. 322–329.
[119] Rajesh Sudarsan and Calvin J. Ribbens. “Combining performance and priority for scheduling
resizable parallel applications”. In: Journal of Parallel and Distributed Computing 87 (2016),
pp. 55–66.
[120] K. Dubey, M. Kumar, and M. A. Chandra. “A priority based job scheduling algorithm using IBA and
EASY algorithm for cloud metaschedular”. In: 2015 International Conference on Advances in
Computer Engineering and Applications. 2015, pp. 66–70.
[121] P. Chrétienne, E.G. Coffman, J.K. Lenstra, and Zhen Liu, eds. Scheduling theory and its applications.
English. United States: Wiley, 1995.
[122] Ishfaq Ahmad and Yu-Kwong Kwok. “On exploiting task duplication in parallel program
scheduling”. In: IEEE Transactions on Parallel and Distributed Systems 9.9 (1998), pp. 872–892.
[123] Michael A. Palis, Jing-Chiou Liou, and David S. L. Wei. “Task Clustering and Scheduling for
Distributed Memory Parallel Architectures”. In: IEEE Trans. Parallel Distributed Syst. 7.1 (1996),
pp. 46–55.
[124] Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep Learning.
http://www.deeplearningbook.org. MIT Press, 2016.
[125] Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. Cambridge, MA,
USA: A Bradford Book, 2018.
[126] Penghao Sun, Zehua Guo, Junchao Wang, Junfei Li, Julong Lan, and Yuxiang Hu. “DeepWeave:
Accelerating Job Completion Time with Deep Reinforcement Learning-based Coflow Scheduling”.
In: IJCAI. 2020.
[127] Hongzi Mao, Malte Schwarzkopf, Shaileshh Bojja Venkatakrishnan, Zili Meng, and
Mohammad Alizadeh. “Learning Scheduling Algorithms for Data Processing Clusters”. In:
Proceedings of the ACM Special Interest Group on Data Communication. SIGCOMM ’19. Beijing,
China: Association for Computing Machinery, 2019, pp. 270–288.
[128] Nathan Grinsztajn, Olivier Beaumont, Emmanuel Jeannot, and Philippe Preux. “READYS: A
Reinforcement Learning Based Strategy for Heterogeneous Dynamic Scheduling”. In: 2021 IEEE
International Conference on Cluster Computing (CLUSTER). 2021, pp. 70–81.
[129] Guillaume Jaume, An-phi Nguyen, María Rodríguez Martínez, Jean-Philippe Thiran, and
Maria Gabrani. edGNN: a Simple and Powerful GNN for Directed Labeled Graphs. 2019. arXiv:
1904.08745 [cs.LG].
[130] Joan Bruna, Wojciech Zaremba, Arthur Szlam, and Yann LeCun. “Spectral Networks and Locally
Connected Networks on Graphs”. In: CoRR abs/1312.6203 (2013).
[131] Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Liò, and
Yoshua Bengio. “Graph Attention Networks”. In: International Conference on Learning
Representations. 2018.
[132] Youngjoo Seo, Michaël Defferrard, Pierre Vandergheynst, and Xavier Bresson. Structured Sequence
Modeling with Graph Convolutional Recurrent Networks. 2017.
[133] Vijay Prakash Dwivedi, Chaitanya K. Joshi, Anh Tuan Luu, Thomas Laurent, Yoshua Bengio, and
Xavier Bresson. “Benchmarking Graph Neural Networks”. In: Journal of Machine Learning Research
24.43 (2023), pp. 1–48.
[134] William L. Hamilton, Rex Ying, and Jure Leskovec. “Inductive representation learning on large
graphs”. In: Proceedings of the 31st International Conference on Neural Information Processing
Systems. NIPS’17. Long Beach, California, USA: Curran Associates Inc., 2017, pp. 1025–1035.
[135] Thomas Kipf and Max Welling. “Variational Graph Auto-Encoders”. In: ArXiv abs/1611.07308 (2016).
[136] E. Syta, P. Jovanovic, E. K. Kogias, N. Gailly, L. Gasser, I. Khoffi, M. J. Fischer, and B. Ford. “Scalable
Bias-Resistant Distributed Randomness”. In: 2017 IEEE Symposium on Security and Privacy (SP).
2017, pp. 444–460.
[137] William L. Hamilton, Rex Ying, and Jure Leskovec. Inductive Representation Learning on Large
Graphs. 2018. arXiv: 1706.02216 [cs.SI].
[138] F. Scarselli, M. Gori, A. C. Tsoi, M. Hagenbuchner, and G. Monfardini. “The Graph Neural Network
Model”. In: IEEE Transactions on Neural Networks 20.1 (2009), pp. 61–80.
[139] Shannon Bowling, Mohammad Khasawneh, Sittichai Kaewkuekool, and Byung Cho. “A logistic
approximation to the cumulative normal distribution”. In: Journal of Industrial Engineering and
Management 2.1 (2009), pp. 114–127.
[140] Matthew Halpern, Yuhao Zhu, and Vijay Janapa Reddi. “Mobile CPU’s rise to power: Quantifying
the impact of generational mobile CPU design trends on performance, energy, and user
satisfaction”. In: 2016 IEEE International Symposium on High Performance Computer Architecture
(HPCA). 2016, pp. 64–76.
[141] Zhiyuan Jiang, Bhaskar Krishnamachari, Sheng Zhou, and Zhisheng Niu. “SENATE: A
Permissionless Byzantine Consensus Protocol in Wireless Networks”. In: arXiv preprint:1803.08694
(2018).
[142] Ryan John King. “Introduction to Proof of Location: The Case for Alternative Location Systems”. In:
FOAM (16 October 2018). https://blog.foam.space/introduction-to-proof-of-location-6b4c77928022.
[143] Martin Martinez, Arvin Hekmati, Bhaskar Krishnamachari, and Seokgu Yun. “Mobile
Encounter-based Social Sybil Control”. In: 2nd International Workshop on Blockchain Applications
and Theory (BAT 2020) (2020).
[144] On Sharding Blockchains, Ethereum Wiki. github.com/ethereum/wiki/wiki/Sharding-FAQ.
[145] On Pruning in Ethereum. http://tiny.cc/ethpruning.
[146] Mustafa Al-Bassam, Alberto Sonnino, and Vitalik Buterin. “Fraud Proofs: Maximising Light Client
Security and Scaling Blockchains with Dishonest Majorities”. In: CoRR abs/1809.09044 (2018).
[147] Chenghao Hu, Jingyan Jiang, and Zhi Wang. Decentralized Federated Learning: A Segmented Gossip
Approach. 2019. arXiv: 1908.07782 [cs.LG].
[148] Rafael Ferreira da Silva, Rajiv Mayani, Yuning Shi, Armen R. Kemanian, Mats Rynge, and
Ewa Deelman. “Empowering Agroecosystem Modeling with HTC Scientific Workflows: The Cycles
Model Use Case”. In: 2019 IEEE International Conference on Big Data (Big Data). 2019, pp. 4545–4552.
[149] Gideon Juve, Ann Chervenak, Ewa Deelman, Shishir Bharathi, Gaurang Mehta, and Karan Vahi.
“Characterizing and profiling scientific workflows”. In: Future Generation Computer Systems 29.3
(2013). Special Section: Recent Developments in High Performance Computing and Security,
pp. 682–692.
[150] Mats Rynge, Gideon Juve, Jamie Kinney, John Good, G. Bruce Berriman, Ann Merrihew, and
Ewa Deelman. “Producing an Infrared Multiwavelength Galactic Plane Atlas using Montage,
Pegasus and Amazon Web Services”. In: 23rd Annual Astronomical Data Analysis Software and
Systems (ADASS) Conference. Funding Acknowledgments: OCI SI2-SSI program grant #1148515.
2013.
[151] Tainã Coleman, Henri Casanova, Loïc Pottier, Manav Kaushik, Ewa Deelman, and
Rafael Ferreira da Silva. “WfCommons: A Framework for Enabling Scientific Workflow Research
and Development”. In: Future Generation Computer Systems 128 (2022), pp. 16–27.
[152] Moo-Ryong Ra, Anmol Sheth, Lily Mummert, Padmanabhan Pillai, David Wetherall, and
Ramesh Govindan. “Odessa: Enabling Interactive Perception Applications on Mobile Devices”. In:
Proceedings of the 9th International Conference on Mobile Systems, Applications, and Services. New
York, NY, USA: Association for Computing Machinery, 2011, pp. 43–56.
[153] Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Liò, and
Yoshua Bengio. “Graph Attention Networks”. In: International Conference on Learning
Representations. 2018.
[154] Shaked Brody, Uri Alon, and Eran Yahav. “How Attentive are Graph Attention Networks?” In:
ArXiv abs/2105.14491 (2021).
[155] Thomas Kipf and Max Welling. “Variational Graph Auto-Encoders”. In: ArXiv abs/1611.07308 (2016).
[156] Guillaume Salha-Galvan, Romain Hennequin, and Michalis Vazirgiannis. “Simple and Effective
Graph Autoencoders with One-Hop Linear Models”. In: ECML/PKDD. 2020.
[157] Yuan Li, Xiaodan Liang, Zhiting Hu, Yinbo Chen, and Eric P. Xing. Graph Transformer. 2019.
[158] Deng Cai and Wai Lam. “Graph Transformer for Graph-to-Sequence Learning”. In: ArXiv
abs/1911.07470 (2019).
[159] Grégoire Mialon, Dexiong Chen, Margot Selosse, and Julien Mairal. “GraphiT: Encoding Graph
Structure in Transformers”. In: ArXiv abs/2106.05667 (2021).
[160] Dexiong Chen, Leslie O’Bray, and Karsten M. Borgwardt. “Structure-Aware Transformer for Graph
Representation Learning”. In: International Conference on Machine Learning. 2022.
[161] David K. Hammond, Pierre Vandergheynst, and Rémi Gribonval. “Wavelets on graphs via spectral
graph theory”. In: Applied and Computational Harmonic Analysis 30.2 (2011), pp. 129–150.
[162] Michaël Defferrard, Xavier Bresson, and Pierre Vandergheynst. “Convolutional neural networks on
graphs with fast localized spectral filtering”. In: Proceedings of the 30th International Conference on
Neural Information Processing Systems. NIPS’16. Barcelona, Spain: Curran Associates Inc., 2016,
pp. 3844–3852.
[163] Prithviraj Sen, Galileo Namata, Mustafa Bilgic, Lise Getoor, Brian Galligher, and Tina Eliassi-Rad.
“Collective Classification in Network Data”. In: AI Magazine 29.3 (Sept. 2008), p. 93.