Cyberinfrastructure Management For Dynamic Data Driven Applications
by
Georgios Papadimitriou
A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(COMPUTER SCIENCE)
December 2023
Copyright 2023 Georgios Papadimitriou
Dedication
To my parents Panagiotis and Adamantia, and my sister Konstantina.
Acknowledgements
This dissertation marks the end of my doctoral studies. The PhD journey was a long but rewarding experience and would not have been successful without the guidance, help, and support of many special people in my
life.
First, I would like to thank my advisor, Professor Ewa Deelman, for believing in me and giving me
the opportunity to be part of the SciTech group at the Information Sciences Institute. I would also like to
express my gratitude to Professor Aiichiro Nakano, Professor Viktor Prasanna, Professor John Heidemann,
and Professor Ramesh Govindan, who were part of my qualification examination committee and part of
my thesis committee, and who provided insightful and sincere feedback on my work.
During my PhD journey, I had the opportunity to work with many talented people outside the University of Southern California. I would like to thank our collaborators at Lawrence Berkeley National
Laboratory, including Mariam Kiran and Imtiaz Mahmud; our collaborators at Argonne National Laboratory: Prasanna Balaprakash, Krishnan Raghavan, and Hongwei Jin; as well as our collaborators at Oak
Ridge National Laboratory: Jeff Vetter, Vicky Lynch, Jason Kincl, Valentine Anantharaj, and Jack Wells. I
also want to extend my appreciation to our collaborators at the Renaissance Computing Institute: Anirban
Mandal, Cong Wang, Komal Thareja, and Paul Ruth; our collaborators at the University of Massachusetts
at Amherst: Michael Zink, Eric Lyons, and Andrew Grote; and our collaborators at the University of Missouri: Prasad Calyam, Alicia Esquivel Morel, and Chengyi Qu.
Additionally, I would like to thank all the staff, my fellow PhD students, and the visiting scholars at
the SciTech group for the stimulating conversations, the fun social events, and their support throughout
my doctoral studies: Rafael Ferreira da Silva, Karan Vahi, Mats Rynge, Loïc Pottier, Rajiv Mayani, Ryan
Tanaka, Tu Mai Anh Do, Patrycja Krawczuk, Hamza Safri, Rosa Filgueira, Tainã Coleman, Wendy Witcup,
Nicole Virdone, Ciji Davis, and Zaiyan Alam.
Finally, I want to thank my family. I will always be grateful to my parents and my grandparents for
everything they have offered me, for creating an environment with many opportunities, and for always
supporting every decision I made while chasing my dreams.
Table of Contents
Dedication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ii
Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iii
List of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viii
List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ix
Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xii
Chapter 1: Context and Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Application Domain Challenges . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2.1 Collaborative Adaptive Sensing of the Atmosphere (CASA) . . . . . . . . . . . . . 3
1.2.1.1 Contouring . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.2.1.2 Nowcast . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.2.1.3 Wind Speed . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.2.1.4 Hail . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.2.2 Ocean Observatories Initiative (OOI) . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.2.2.1 Orcasound . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.2.3 Science Machine Learning Applications . . . . . . . . . . . . . . . . . . . . . . . . 9
1.2.3.1 Galaxy Classification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.2.3.2 Lung Segmentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.2.3.3 Crisis Computing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
1.3 Contributions Across This Thesis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
1.3.1 Mechanisms to Control Compute and Network Resources of the Cyberinfrastructure 15
1.3.2 Monitoring Dynamic Data Driven Workflow Executions on Distributed Cyberinfrastructure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
1.3.3 Evaluation of Cyberinfrastructure Configuration’s Impact on Data Intensive
Workflow Performance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
1.3.4 Reduction of Network Contention Using Workflow Management System
Mechanisms During the Workflow Planning Phase . . . . . . . . . . . . . . . . . . 17
1.3.5 Reduction of Network Contention Using Application-Aware Software Defined
Flows and Workflow Management System Mechanisms During the Workflow
Running Phase . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
Chapter 2: A Framework to Control the Cyberinfrastructure . . . . . . . . . . . . . . . . . . . . . . 19
2.1 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
2.2 DyNamo System Description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.2.1 Integrated, Multi-Cloud Resource Provisioning . . . . . . . . . . . . . . . . . . . . 23
2.2.2 Dynamic Data Movement Across Infrastructures and External Repositories . . . . 24
2.2.3 Mobius: DyNamo’s Network-centric Provisioning Platform . . . . . . . . . . . . . 26
2.2.4 Workflow Automation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
2.2.4.1 CASA Pegasus Nowcast Workflow . . . . . . . . . . . . . . . . . . . . . . 30
2.2.4.2 CASA Pegasus Wind Speed Workflow . . . . . . . . . . . . . . . . . . . . 31
2.2.4.3 CASA Pegasus Hail Workflow . . . . . . . . . . . . . . . . . . . . . . . . 33
2.3 Evaluation of DyNamo System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
2.3.1 CASA Testcases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
2.3.2 Experimental Infrastructure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
2.3.3 Experimental Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
2.3.3.1 Effect of cluster parallelism in the nowcast workflow . . . . . . . . . . . 38
2.3.3.2 Performance study of Nowcast workflows . . . . . . . . . . . . . . . . . 39
2.3.3.3 Improving data movement performance . . . . . . . . . . . . . . . . . . 40
2.3.3.4 Compute slot utilization . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
2.4 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
Chapter 3: A Framework to Capture End-to-End Workflow Behavior . . . . . . . . . . . . . . . . . 44
3.1 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
3.2 Existing Monitoring Systems and Tools . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
3.2.1 System Frameworks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
3.2.2 Monitoring Tools . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
3.3 Online Monitoring System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
3.3.1 Pegasus-Kickstart Online Monitoring . . . . . . . . . . . . . . . . . . . . . . . . . . 51
3.3.2 Data Collection - Extending Workflow Monitoring . . . . . . . . . . . . . . . . . . 51
3.3.3 A Versatile Approach for Integrating New Tools . . . . . . . . . . . . . . . . . . . . 55
3.3.4 Data Capture Flow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
3.3.5 Data Discovery and Visualization . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
3.4 Deploying the Online Monitoring Architecture . . . . . . . . . . . . . . . . . . . . . . . . . 59
3.5 Case Studies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
3.5.1 Scientific Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
3.5.2 Experimental Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
3.5.3 Network Throughput . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
3.5.4 I/O Throughput . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
3.5.5 CPU Contention . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
3.6 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
Chapter 4: Evaluation of Cyberinfrastructure Configuration’s Impact on Data Intensive Workflow
Performance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
4.1 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
4.2 Workflows . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
4.2.1 Galaxy Classification Workflow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
4.2.2 Lung Segmentation Workflow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
4.2.3 Crisis Computing Workflow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
4.3 Experimental Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
4.3.1 Chameleon Cloud Platform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
4.3.2 Execution Scenarios . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
4.4 General Characterization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
4.4.1 Job-level Characterization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
4.4.2 Workflow-level Characterization . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
4.5 Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
4.6 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
Chapter 5: Reduction of Network Contention Using Workflow Management System Mechanisms
During the Workflow Planning Phase . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
5.1 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
5.2 Task Clustering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
5.2.1 Effect of Task Clustering in Network Requirements . . . . . . . . . . . . . . . . . . 94
5.3 Reducing Network Requirements of DDDAS Workflows Using the Edge To Cloud Paradigm 97
5.3.1 Orchestrating DDDAS Workflows From Edge To Cloud . . . . . . . . . . . . . . . . 97
5.3.2 Experimental Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
5.3.2.1 Workflows . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
5.3.2.2 Execution Environment Setup . . . . . . . . . . . . . . . . . . . . . . . . 100
5.3.2.3 Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
5.3.2.4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
5.4 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
Chapter 6: Reduction of Network Contention Using Application-Aware Software Defined Flows
and Workflow Management System Mechanisms During the Workflow Running Phase 108
6.1 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
6.2 Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
6.2.1 vSDX module . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
6.2.2 Pegasus Ensemble Manager . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
6.3 Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
6.3.1 CASA Pegasus Workflows . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
6.3.2 Experimental Infrastructure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
6.3.3 Workflow Ensembles - Network Requirements . . . . . . . . . . . . . . . . . . . . 117
6.3.4 Experimental Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
6.3.4.1 Dedicated link performance . . . . . . . . . . . . . . . . . . . . . . . . . 120
6.3.4.2 Uncontrolled network sharing . . . . . . . . . . . . . . . . . . . . . . . . 120
6.3.4.3 Applying SDX QoS policies . . . . . . . . . . . . . . . . . . . . . . . . . . 121
6.3.4.4 Applying QoS Policies using Pegasus-EM . . . . . . . . . . . . . . . . . . 122
6.4 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
Chapter 7: Summary and Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
List of Publications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145
List of Tables
3.1 Summary of online workflow performance metrics and characteristics provided by
pegasus-kickstart. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
3.2 Summary of Darshan metrics captured during workflow execution. Since Darshan logs
are only produced at job completion, near real-time monitoring is attained at per job
completion granularity. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
4.1 Machine Learning Hyperparameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
4.2 Machine learning executable workflow scenarios and transfers settings (Baseline,
Container Installed, NFS) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
4.3 Job-level characterization when running with non shared filesystem (Baseline scenario) . . 83
4.4 Workflow execution profiles. (Baseline scenario) . . . . . . . . . . . . . . . . . . . . . . . . 84
5.1 Machine learning executable workflow scenarios and transfers settings (Baseline,
Clustering) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
List of Figures
1.1 Multi-workflow display of hail (orange) and wind (red) contours, with GIS boundaries,
infrastructure, and rainfall imagery (green) overlaid during a severe weather event. . . . . 5
1.2 Nowcast Output. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.3 The Orcasound hydrophone network continuously collects and streams in real-time audio
data from sensors located in the North Pacific region. The audio files are converted into
spectrograms for further analysis and identification of Orca sounds. . . . . . . . . . . . . . 9
1.4 Galaxy Classification Preprocessing (image sizes in pixels). . . . . . . . . . . . . . . . . . . 11
1.5 Lung segmentation using convolutional neural networks on X-ray images. The yellow
highlighted regions have been added by the model to highlight the human lungs. . . . . . 12
1.6 Crisis Computing constantly consumes and analyses social media posts using machine
learning, to identify disaster events in real-time and quickly direct first responders to save
lives. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.1 Data movement and resource provisioning across CI federation. . . . . . . . . . . . . . . . 25
2.2 Mobius network-centric platform. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
2.3 CASA Pegasus Workflow - Nowcast. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
2.4 CASA Pegasus Workflow - Wind Speed. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
2.5 CASA Pegasus Workflow - Hail. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
2.6 CASA workflow deployment. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
2.7 Nowcast: Exploring Cluster Parallelism (Using Stitchport 1Gbps). . . . . . . . . . . . . . . 38
2.8 Nowcast: Using Stitchport 1Gbps. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
2.9 Wind: Workflow Ensemble Runs - No Clustering. . . . . . . . . . . . . . . . . . . . . . . . 40
2.10 Nowcast: Workflow Ensemble Runs - Chameleon Compute Slot Utilization. . . . . . . . . . 41
3.1 Architecture overview of the end-to-end online performance data capture and analysis
process. How we are using the Pegasus workflow management system to collect and
publish workflow logs, execution traces and transfer logs. . . . . . . . . . . . . . . . . . . . 49
3.2 Screenshots of the Kibana plugin for near real-time monitoring of workflow performance
metrics. Top: workflow progression and detailed job characteristics. Bottom: time series
data of job performance. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
3.3 A diagram of a branch of the SNS workflow. . . . . . . . . . . . . . . . . . . . . . . . . . . 63
3.4 Overview of the 1000Genome sequencing analysis workflow. . . . . . . . . . . . . . . . . . 64
3.5 Experimental setup on the ExoGENI testbed. . . . . . . . . . . . . . . . . . . . . . . . . . . 65
3.7 Cumulative I/O over time. Top: 1000Genome workflow without interference. Bottom: I/O
stressing of the workers. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
3.8 Average STDIO and POSIX performance for NAMD and Sassena Jobs obtained from
Darshan’s logs. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
3.9 CPU utilization per rank for the NAMD MPI job. Without interference the CPU utilization
is steady, close to 100%. However with interference CPU utilization fluctuates between
70% and 90%. To introduce interference, one stress process per worker consumed approx.
25% of the node’s CPU time. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
4.1 Galaxy Classification Workflow [93]. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
4.2 Lung Segmentation Workflow [80]. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
4.3 Crisis Computing Workflow [92]. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
4.4 Non-shared filesystem deployment on Chameleon. . . . . . . . . . . . . . . . . . . . . . . . 80
4.5 Shared filesystem deployment on Chameleon. . . . . . . . . . . . . . . . . . . . . . . . . . 81
4.6 Cumulative compute time for each workflow when running on Chameleon Cloud. . . . . . 86
4.7 Workflows end-to-end execution time for each scenario when running on Chameleon Cloud. 86
4.8 Cumulative stage-in and stage-out time for each workflow when running on Chameleon
Cloud. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
5.1 Horizontal clustering example. Horizontal clusters don’t present intra-cluster dependencies, any execution order maintains DAG semantics. . . . . . . . . . . . . . . . . . . . . . . 92
5.2 Label clustering example. Subscripts designate the label assigned to each task. Within
clusters execution happens based on topological sorting. . . . . . . . . . . . . . . . . . . . 93
5.3 Cumulative stage-in and stage-out time for each workflow. . . . . . . . . . . . . . . . . . . 95
5.4 Cumulative compute time and end-to-end workflow execution time. . . . . . . . . . . . . . 96
5.5 Synthetic Workflow. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
5.6 Orcasound Workflow. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
5.7 Experimental Setup on Chameleon. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
5.8 Workflow makespans. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
5.9 Cumulative job walltime. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
5.10 Cumulative job walltime as observed from the submit node. . . . . . . . . . . . . . . . . . 103
5.11 Cumulative time spent on transferring data over WAN. . . . . . . . . . . . . . . . . . . . . 104
5.12 Total data transferred over WAN (Yaxis in logscale(2)). . . . . . . . . . . . . . . . . . . . . 104
6.1 Virtual Software Defined Exchange (SDX) Network Architecture . . . . . . . . . . . . . . . 112
6.2 CASA vSDX workflow deployment. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
6.3 Network Utilization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
6.4 Workflow Makespans. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
6.5 Data Transfer Durations. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
6.6 Heatmap of Average Workflow Makespans . . . . . . . . . . . . . . . . . . . . . . . . . . . 122
6.7 Heatmap of Workflow Ensemble Makespans . . . . . . . . . . . . . . . . . . . . . . . . . . 123
Abstract
Computational science today depends on complex, data-intensive applications operating on datasets from
a variety of scientific instruments. These datasets may be huge in volume, may have high velocity or both,
raising major challenges of how scientists can analyze these datasets. For example, the Legacy Survey of
Space and Time (LSST) telescope collects over 20TB of data per day, with a goal of 500 PB towards the end
of the survey (large data volume). Other applications, such as the network of weather radars located in the
Dallas Fort-Worth (DFW) area can produce a steady data flow of over 400Mbits/second (high velocity). On
the other hand, workflows processing these datasets might need to respond to changes in the processing
load (e.g, increases in data flow), in order to maintain a steady and predictable turnaround time.
With the deployment of these data-intensive applications, scientists have to answer questions related
to the cyberinfrastructure (CI) they rely on for the processing. Such questions are: 1) What are the most appropriate resources needed for the application execution? 2) How can the computations scale on demand?
3) How can the datasets be distributed and accessed efficiently while satisfying quality of service (QoS) requirements?
Having to worry about the CI can be a distraction or even a roadblock, consuming time or preventing
scientists from achieving their goals. Workflow management systems provide tools that can address these
issues and enable the execution of complex applications in heterogeneous and distributed CI, but can be
too generic and might be missing functionality essential to the applications.
In this thesis we present our efforts to improve the performance of these data-intensive application
systems, while they are executing on modern CI. We construct and evaluate novel approaches and methodologies that aim to improve performance without adding more complexity to the CI, and we develop new
tools that extend the functionality offered by the CI. We provide a methodology that makes it easier for
scientists to interact with the cyberinfrastructure available to them, and we develop a new framework to
capture end-to-end performance statistics of the data-intensive workflows executed on the heterogeneous
and highly distributed CI, which could not be done at this scale before. Additionally, since modern CI is
very malleable and provides many configuration opportunities, we evaluate how the choices during the
acquisition and configuration of resources affect the performance of the data-intensive workflows.
Finally, network performance is a very important factor that dictates the performance of the data-intensive workflows. This thesis answers the fundamental question of how scientists can manage the CI
and apply policies that can help their data-intensive applications meet their constraints (e.g., turnaround
time) by avoiding network degradation. We develop methodologies that rely on workflow restructuring
and optimizations that take place during the planning phase of the workflows, and can reduce their peak
network requirements. We also develop active approaches that can be applied to reduce the per workflow
network requirements during their execution. These approaches use a workflow ensemble manager and
application aware software defined flows.
Chapter 1
Context and Contributions
1.1 Introduction
Dynamic Data Driven Application Systems (DDDAS) [22, 38] are a paradigm where data is dynamically
integrated into an executing application and vice versa: the application dynamically steers the measurement process in a dynamic feedback control loop. Many scientific domains rely on DDDAS where their
application models dynamically integrate, at model-execution time, real-time and archival data. The results of the application may control the data acquisition system and further guide data selection. Examples
of these applications can be found in weather modeling, ocean sciences, seismology, astronomy and many
other domains [119, 208, 179, 71].
A major challenge for dynamic data driven application systems is the integration of data into the applications. These applications are often modelled as workflows that are composed of a set of dependent
tasks, each of which has different compute, network, storage, and input requirements. The requirements
determine where and how each task can be executed. To address the needs of dynamic data driven workflows, it is critical to integrate existing infrastructures, e.g., instruments, distributed computing resources,
and data repositories, using high-performance networks and data management software, in order to make
the execution of these workflows feasible, increase their efficiency and meet their real time needs [194].
Currently, such integration is either not available, or is purpose-built manually for a specific scientific application or community. However, recent advances in dynamic networked cloud infrastructures, such as
ExoGENI [20] and FABRIC [21], provide the technical building blocks to construct and manage such an integrated, reconfigurable, end-to-end infrastructure, with performance guarantees to satisfy the computing
and data movement requirements of the DDDAS workflows.
Despite the rich set of capabilities offered by dynamic, networked infrastructures, it is the case that
data-driven applications and workflows have not adequately taken advantage of current cloud architectures. Applications are not designed to utilize adaptive features offered by state-of-the-art,
networked cloud infrastructures, especially with respect to delivering end-to-end, high performance data flows. This is mainly due to the extra complexity involved in setting up, maintaining and
incorporating these new capabilities into the applications. As a result, the scientific community has been slow in
adopting these new solutions. However, scientists in weather modeling, ocean sciences, seismology, and
other big data-driven scientific disciplines, need these new capabilities so that they can analyze data in
real-time and react to the observed phenomenon and/or missing longitudinal patterns.
Additionally, resource requirements for data driven workflows are increasing in scale. Advances in
electronics, semiconductors and hardware architecture [86], [82], [199] have led to the construction and
deployment of instruments and edge devices that can produce data at higher resolutions and at faster
rates than ever before [119], [183]. Traditional approaches of statically provisioned, dedicated, preconfigured computing and network infrastructure are expensive, hard to adapt, and difficult to
manage. The bursty computational and network demands for these workflows warrant flexible processing solutions on diverse infrastructures that integrate computing, and malleable, high-performance data
movement for expeditious result delivery. While dynamic provisioning mechanisms exist, these are not
offered to the application scientists at the right level of abstraction, making them difficult to use, with no
guarantee of a proper resource match.
1.2 Application Domain Challenges
An important characteristic for many Dynamic Data Driven Applications is the ability to ingest real-time
data and model-augmented data into the processing pipelines while creating time-critical responses based
on the analysis of the ingested data.
1.2.1 Collaborative Adaptive Sensing of the Atmosphere (CASA)
An example of such an application is the weather warning system that has been designed, built, and is
currently operated by the NSF Engineering Research Center for Collaborative Adaptive Sensing of the
Atmosphere (CASA) [119]. CASA's goal is to improve the ability to observe, understand, predict, and respond to hazardous weather events. In 2012, CASA deployed an operational network of seven
dual polarized, X-band Doppler radars in the Dallas-Fort Worth (DFW) metroplex [119], and has since
augmented it with other types of sensors such as rain gauges, disdrometers (instruments that can distinguish between rain, graupel, and hail), wind profilers, Global Positioning System Meteorology (GPS-Met)
stations, and barometric pressure sensors. While the generation of atmospheric observation data by the
radars and other sensors is an important component of the CASA system, it also depends on scalable
processing of volumetric radar scans and the fast and efficient distribution of actionable information to a
diverse end-user base for decision making. Actionable information in the case of severe weather warnings is most useful to the end-users in the form of products derived from raw sensor information. Such
derivations include combinations of multiple sensors and sensor types, extractions over specific geographic
areas, format conversions into existing toolsets, and tailored imagery. These products have to be provided
to the end-user in a timely fashion to generate warnings with sufficient lead time in the case of quickly
developing weather events.
Because of the potentially high computing load associated with product generation, particularly during
widespread severe weather events, product workflows must be carefully and efficiently managed. Certain
base products may need to be generated on an ongoing basis, but others can be run on demand, based on
the characteristics of the ongoing weather regime. For example, there is no need to run a hail detection
suite until there is the possibility for strong, convective weather.
Setting up a CASA workflow can be a complex process due to asynchronously generated sensor input
data, strict sequential processing steps, and fluctuating runtimes. In recent years, dynamic provisioning solutions have reduced some unnecessary processing [110]; however, static architectures must still be chosen
in advance based on estimated requirements, and effort is needed to distribute workflow processes evenly
across an arbitrary collection of Virtual Machines (VMs). The preferred solution is to be able to quickly acquire appropriate resources in an automated fashion based on specific, detectable meteorological triggers
and measured data loads.
1.2.1.1 Contouring
Contouring is an important processing step in CASA’s operations. Similar to contours on a topographical
map that enclose areas above a certain elevation, in CASA, contours indicate areas where values within
the grid exceed a certain threshold [112]. Contouring produces 2D closed GIS style polygon ‘objects’ in
GeoJSON [178] format that describe geographical areas of observed or forecast weather risk. The contouring algorithm is used to describe areas with high radar reflectivity as a proxy for storm location in the
nowcasting workflow, areas of severe winds that are likely to be causing damage in the wind workflow, or
areas where hail has been detected in the hail workflow. While the contouring process is a highly useful
mechanism for communicating risk locations to users, generating valid, well ordered polygons out of large
grids of imperfect data is a CPU-intensive process. The CASA contouring procedure uses the Marching
Squares algorithm [116] to generate a series of isolines, followed by a custom approach to connect them
into ordered concave polygons in a GeoJSON format. Figure 1.1 depicts contoured geofences of wind and
hail regions of interest.
Figure 1.1: Multi-workflow display of hail (orange) and wind (red) contours, with GIS boundaries, infrastructure, and rainfall imagery (green) overlaid during a severe weather event.
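To make the contouring step concrete, the following minimal Python sketch (illustrative only, not CASA's production code) extracts isolines from a gridded field with a Marching Squares implementation and joins them into GeoJSON polygons; the grid-to-coordinate mapping, threshold, and file names are hypothetical.

```python
# Minimal sketch of threshold contouring to GeoJSON; not CASA's actual implementation.
import json
import numpy as np
from skimage import measure  # provides a Marching Squares implementation

def contour_grid_to_geojson(grid, threshold, lat0, lon0, dlat, dlon):
    """Return a GeoJSON FeatureCollection of closed polygons where grid exceeds threshold."""
    features = []
    for contour in measure.find_contours(grid, threshold):  # Marching Squares isolines
        # Map (row, col) grid indices to (lon, lat) coordinates with a linear transform.
        ring = [[float(lon0 + c * dlon), float(lat0 + r * dlat)] for r, c in contour]
        if ring[0] != ring[-1]:
            ring.append(ring[0])  # GeoJSON polygon rings must be closed
        features.append({
            "type": "Feature",
            "properties": {"threshold": threshold},
            "geometry": {"type": "Polygon", "coordinates": [ring]},
        })
    return {"type": "FeatureCollection", "features": features}

if __name__ == "__main__":
    reflectivity = np.random.uniform(0, 60, size=(100, 100))  # synthetic dBZ grid
    geojson = contour_grid_to_geojson(reflectivity, 40.0, 32.0, -97.5, 0.01, 0.01)
    print(json.dumps(geojson)[:200])  # preview of the contoured risk areas
```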
1.2.1.2 Nowcast
Nowcasts are short-term advection forecasts that are computed by mosaicing asynchronous individual
radar reflectivity∗ data, accumulating composite grids over a short duration, and projecting into the future
by estimating the derivatives of motion and intensity with respect to time [99, 153]. Every minute, the
CASA nowcasting system generates 31 grids of predicted reflectivity, one for each minute into the future
from minutes 0-30. Nowcasts are valuable for short term planning and estimating the arrival or departure
of precipitation by providing an estimate of the timing and trajectory of weather features. Nowcasts can
assist in such tasks as route planning, deployment of spotters, and keeping emergency responders out of
harm’s way.
1.2.1.3 Wind Speed
A Doppler radar is able to estimate the velocity of moving objects based on a phase shift that occurs if the
objects are moving toward or away from the radar beam. Components of velocity perpendicular to the
beam are not sensed. For a given Doppler radar, this means that there will be substantial underestimations
of true wind speed over portions of the sensing domain where certain directional components of the winds
are not able to be sampled. However, with an overlapping network of radars (as in the case of the CASA
network in DFW), areas not adequately sampled by one radar are often better sampled by other radars with
different relative angles. Therefore, CASA’s maximum observed velocity workflow ingests the single radar
base data from all of the radars in the network and creates a gridded product representing the maximum
observed wind speeds. As part of the CASA workflow, areas of severe winds are identified and checked
against the location of known infrastructure (in this case study, hospitals), with email alerts sent out should
they be impacted. The CASA wind speed workflow blends together voluminous radar-based data as an
initial processing step and requires significantly more networking resources than workflows operating on
derived products (e.g., the one described in Section 1.2.1.2, above). Input data rates can approach 10 Mbps
per radar, and CASA currently operates seven radars in its DFW network.
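A small illustration of the two steps just described, using synthetic grids and hypothetical hospital locations: composite the maximum wind speed across co-registered single-radar grids, then flag infrastructure cells that exceed a severe-wind threshold.

```python
# Illustrative only: max-wind compositing and a simple infrastructure impact check.
import numpy as np

def max_wind_composite(radar_grids):
    """Element-wise maximum over co-registered single-radar wind speed grids (m/s)."""
    return np.maximum.reduce(radar_grids)

def impacted_sites(composite, sites, threshold=25.0):
    """Return the names of sites whose grid cell exceeds the wind speed threshold."""
    return [name for name, (row, col) in sites.items() if composite[row, col] >= threshold]

grids = [np.random.uniform(0, 35, (50, 50)) for _ in range(7)]  # seven radars (synthetic)
hospitals = {"hospital_a": (10, 12), "hospital_b": (40, 5)}      # hypothetical grid cells
print(impacted_sites(max_wind_composite(grids), hospitals))      # sites to alert by email
```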
1.2.1.4 Hail
Dual-polarized [47] radars are able to discern the shape of hydrometeors by providing information on
their horizontal and vertical cross-sections. Such information can be used to characterize the shape of the
falling precipitation and in combination with temperature soundings derived from weather balloons, can
accurately differentiate between areas of hail, sleet, and liquid precipitation. Hail is highly destructive in
the densely populated DFW metroplex, with four hail events since 2016 totaling over $1 billion in damages [128]. The hail workflow ingests high bandwidth single radar moment data from each of the CASA
radars and the NOAA Next Generation Weather Radar (NEXRAD) radars, as well as the latest weather
balloon sounding. Regions of the radar scan are classified as rain, drizzle, hail, ground clutter, graupel,
clear air, etc. Images are generated of all the single radar hail data for single radar web display. Then, at a
specified interval, a compositing algorithm collects all the single radar based hail classifications that have
been performed, and merges them together into a network-level precipitation classification product that
encompasses the region. Once produced, a network-level image is created, and the data is contoured to
extract discrete areas of hail from the grid. The contours are then compared against known Geographic
Information System (GIS) based infrastructure, generating alerts and visual cues to indicate that infrastructure has been, or may be, impacted by hail.
1.2.2 Ocean Observatories Initiative (OOI)
The Ocean Observatories Initiative (OOI) [170] is a science-driven ocean observing network that delivers
real-time data from more than 900 instruments. These data can be used in applications that address critical
science questions regarding the world's ocean. OOI supports critical research in climate change, ocean acidification, carbon cycling, and marine life through a variety of
datasets and data infrastructure services, capable of direct streaming of real-time data and bulk downloads
of historical samples.
1.2.2.1 Orcasound
An example of a Dynamic Data Driven Application System (DDDAS) using OOI data is the Orcasound
application [179], which is part of a community driven initiative dedicated to studying Orca whales in the
Pacific Northwest region. Orcasound relies on a network of hydrophone sensors strategically deployed
in three locations within the state of Washington: San Juan Island, Bush Point, and Port Townsend (Figure 1.3). These sensors continuously capture underwater sounds, including the distinct vocalizations of
Orca whales. This vast and continuous stream of data (~50GB/day) serves as the foundation for real-time
analysis and long-term research. As an integral part of the Orcasound project, a DDDAS system has been
developed to process and analyze the hydrophone data. This system leverages cutting-edge techniques
in data analysis and machine learning to automatically identify and classify Orca whale sounds. Specifically, machine learning models have been trained to detect and flag the characteristic whistles produced
by Orcas. One key aspect of the Orcasound DDDAS system is its ability to provide real-time insights. The
live notification module is designed to swiftly process incoming hydrophone data streams and identify
Orca sounds in real-time. This functionality is crucial for alerting local observation stations and nearby
vessels to the presence of Orcas. It enables researchers and enthusiasts to react promptly to Orca activity,
facilitating better understanding and conservation efforts.
In addition to real-time processing, Orcasound also incorporates a batch processing module. This
component is essential for conducting in-depth analysis over extended periods of time and refining the
inference models using new annotated data. The batch processing module is responsible for analyzing
historical data, which may span several months or even years. It also integrates new recordings into the
dataset and continuously works to improve the accuracy and effectiveness of the artificial intelligence
models used for sound identification.
These two modules, the live notification and batch processing, present distinct computational challenges and resource requirements. The live notification module must operate with low latency, ensuring
swift identification of Orca sounds to trigger timely alerts. On the other hand, the batch processing module
has to efficiently manage and analyze large volumes of historical data, which demands significant computational resources. Additionally, both modules need to adapt to changing environmental conditions and
Orca behaviors, further highlighting the dynamic nature of DDDAS.
Figure 1.3: The Orcasound hydrophone network continuously collects and streams in real-time audio data
from sensors located in the North Pacific region. The audio files are converted into spectrograms for
further analysis and identification of Orca sounds.
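As a simplified sketch of the first stages of such a pipeline (the actual Orcasound detector uses trained machine learning models rather than a fixed threshold), the code below converts a chunk of hydrophone audio into a spectrogram and flags time windows with unusually high energy in a band where orca calls typically occur. The sample rate, band limits, and threshold are illustrative assumptions.

```python
# Simplified candidate-call detector; a real system feeds spectrogram patches to an ML model.
import numpy as np
from scipy import signal

def detect_candidate_calls(audio, fs=48_000, band=(1_000, 10_000), z_thresh=3.0):
    """Return spectrogram time bins whose in-band energy is anomalously high."""
    freqs, times, sxx = signal.spectrogram(audio, fs=fs, nperseg=2048)
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    band_energy = sxx[in_band].sum(axis=0)
    z_scores = (band_energy - band_energy.mean()) / (band_energy.std() + 1e-12)
    return times[z_scores > z_thresh]

if __name__ == "__main__":
    fs = 48_000
    t = np.arange(0, 10, 1 / fs)
    audio = np.random.normal(0, 0.1, t.size)                      # synthetic background noise
    audio[fs * 4:fs * 5] += 0.5 * np.sin(2 * np.pi * 5_000 * t[fs * 4:fs * 5])  # synthetic "call"
    print(detect_candidate_calls(audio, fs))  # times (s) of candidate detections
```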
1.2.3 Science Machine Learning Applications
Machine Learning is becoming a common component of science applications, and DDDAS systems have
been incorporating methodologies to retrain and refine machine learning models as new data arrive, to provide fast inference modules, and to build notification systems for critical applications, such as notifying first
responders during disaster conditions. In this section, we introduce three science machine learning applications, originating from three distinct scientific domains (astronomy, medicine, and crisis computing)
that use different supervised learning approaches (image classification, image segmentation, and natural
language processing), and need DDDAS to operate efficiently.
1.2.3.1 Galaxy Classification
Automated galaxy morphology classification is a critical step in understanding the formation and evolution
of galaxies. It allows astronomers to systematically categorize and analyze galaxies on an unprecedented
scale, enabling the identification of trends and patterns in the galaxy population. The Sloan Digital Sky
Survey (SDSS) [88] has gathered over 600 terabytes of image and spectral data over its mission lifetime [3]
and it has become an important resource in studying the sky. This staggering volume of data highlights
the need for automated classification techniques like deep learning to efficiently process and categorize
galaxies.
Dynamic data driven systems using deep learning have emerged as powerful tools in tackling this challenge. An example deep learning application is presented by Zhu et al. in their publication “Galaxy morphology classification with deep convolutional neural networks” [208], where they use deep learning
techniques to automate the classification of galaxies based on their morphological features, using feedback loops to refine their models, and fast inference modules as new data arrive from the SDSS archive.
Figure 1.4 shows the preprocessing that is done before the galaxy images can be classified. In their work
Zhu et al. classify the galaxies into five distinct categories:
• Completely Round-Smooth: This category includes galaxies that exhibit a symmetrical, smooth appearance, devoid of prominent features or irregularities.
• In-Between Smooth: Galaxies falling into this category display some degree of smoothness but may
also exhibit minor irregularities or subtle features.
• Cigar-Shaped Smooth: These galaxies are elongated and possess a cigar-like morphology, often indicating a specific orientation.
• Edge-On: Edge-on galaxies are characterized by their presentation as thin disks when viewed from
our vantage point. This orientation can reveal valuable insights into the galaxy’s structure.
• Spiral: Spiral galaxies are one of the most iconic galaxy types. Understanding their morphology is
crucial for unraveling the dynamics of spiral galaxies and their evolution.
Deep convolutional neural networks (CNNs) are at the forefront of automating the classification of
galaxies into these categories [208],[27],[66]. These neural networks are designed to mimic the human
visual recognition process, capable of extracting intricate features from galaxy images. They learn to
differentiate between the subtle details in the images, which might be hard for human observers. Related
publications showcase the ability of CNNs to process enormous datasets quickly and accurately, which
opens up the possibility of analyzing even larger datasets that were previously too daunting to handle. For
the efficient analysis of such massive datasets, the analysis needs to be driven by the data. As the SDSS
periodically adds new data, expanding its catalog, these systems require new processing and refinement
of their CNN models. Additionally, specific areas of the sky are of higher interest than others, highlighting
the need for the galaxy morphology classification pipelines to focus on certain regions of the sky, using
higher resolution images to provide more accurate classifications and to track galaxy morphology evolution.
Figure 1.4: Galaxy Classification Preprocessing (image sizes in pixels).
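The following minimal PyTorch sketch (not the architecture from [208]; layer sizes and input dimensions are illustrative) shows the general shape of a CNN that maps galaxy cutouts to the five morphology classes listed above.

```python
# Illustrative CNN classifier for galaxy morphology; not the published model.
import torch
import torch.nn as nn

class GalaxyCNN(nn.Module):
    def __init__(self, num_classes=5):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, kernel_size=3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(128, num_classes)

    def forward(self, x):                      # x: (batch, 3, H, W) galaxy cutouts
        feats = self.features(x).flatten(1)    # pooled feature vector per image
        return self.classifier(feats)          # raw scores for the five categories

model = GalaxyCNN()
logits = model(torch.randn(4, 3, 64, 64))      # four synthetic 64x64 RGB cutouts
print(logits.shape)                            # torch.Size([4, 5])
```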
1.2.3.2 Lung Segmentation
Lung segmentation is a crucial component of the broader field of medical image analysis, playing a pivotal role in the diagnosis and treatment of pulmonary diseases. Accurate and precise delineation of lung
structures from radiological images, such as X-rays, CT scans, or MRI scans, is paramount for a multitude
of clinical applications. In recent years, the integration of deep learning (DL) techniques into the realm of medical image
analysis has revolutionized the way we approach lung segmentation tasks.
DL methods, particularly convolutional neural networks (CNNs), have become an established technique in automating the segmentation of lung regions [105],[151],[67]. These DL-based algorithms have
been able to accurately detect the boundaries of the lungs and to identify various anomalies, including
tumors. Figure 1.5 shows lung segmentation in X-Ray images using DL. One of the primary advantages
of using these techniques for lung segmentation is their ability to learn complex, hierarchical features directly from the pixel-level data in medical images. Unlike traditional image processing techniques, which
often rely on handcrafted features and heuristics, DL models can autonomously extract relevant patterns
and representations, leading to improved accuracy and adaptability across diverse datasets. As new data
become available, the need for building dynamic data driven systems around the deep learning models for
lung segmentation also increases. Researchers and medical practitioners can leverage these systems to retrain and refine their models to better generalize across different imaging modalities, patient populations,
and imaging conditions.
Figure 1.5: Lung segmentation using convolutional neural networks on X-ray images. The yellow highlighted regions have been added by the model to highlight the human lungs.
Furthermore, these medical purpose dynamic data driven applications with deep learning approaches
on their back-end have not only accelerated the process of lung segmentation but have also facilitated the
integration of real-time or near-real-time automated systems into clinical workflows. This has the potential to streamline diagnosis, reduce human error, and ultimately improve patient outcomes by providing
physicians with more accurate and timely information. Despite the remarkable progress in DL-based lung
segmentation, several challenges still persist in their use. These include addressing class imbalance in
medical image datasets, ensuring model interpretability and transparency, and dealing with issues related
to data privacy and security [40].
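To make the segmentation setting concrete, here is a compact PyTorch sketch (illustrative only, far smaller than the models cited above) of an encoder-decoder network that maps a chest X-ray to a per-pixel lung mask.

```python
# Tiny encoder-decoder for illustration; real lung segmenters (e.g., U-Net variants) are larger.
import torch
import torch.nn as nn

class TinyLungSegmenter(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(32, 16, 2, stride=2), nn.ReLU(),
            nn.ConvTranspose2d(16, 1, 2, stride=2),
        )

    def forward(self, x):                                  # x: (batch, 1, H, W) grayscale X-ray
        return torch.sigmoid(self.decoder(self.encoder(x)))  # per-pixel lung probability

model = TinyLungSegmenter()
mask = model(torch.randn(2, 1, 128, 128))
print(mask.shape)  # torch.Size([2, 1, 128, 128])
```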
1.2.3.3 Crisis Computing
In recent years, the role of social media platforms, such as Twitter and Instagram, has evolved significantly in the context of crisis computing. These platforms have emerged as invaluable sources of critical
information during disaster events, offering a multi-modal stream of content that can provide timely and
actionable insights to both the public and local officials. These platforms enable the rapid dissemination
of real-time updates, photos, videos, and firsthand accounts, which collectively contribute to building a
comprehensive understanding of an emergency situation (Figure 1.6).
However, there are numerous challenges that must be addressed to harness the full potential of social
media data for crisis response. One of the most prominent hurdles is the issue of information overload.
At any given moment, millions of social media posts flood the digital sphere, making it imperative to
develop robust systems capable of processing vast amounts of data in near real-time. This necessitates the
creation of sophisticated algorithms and dynamic data driven platforms that can effectively sift through
thousands of short, informal messages and vast quantities of pictures to extract trustworthy and relevant
information. This curated information serves as the foundation for constructing situational awareness
during a catastrophic event, allowing responders to make informed decisions and to efficiently allocate
resources.
Figure 1.6: Crisis Computing constantly consumes and analyses social media posts using machine learning,
to identify disaster events in real-time and quickly direct first responders to save lives.
The growing field of crisis informatics represents a crucial endeavor aimed at addressing these challenges. It focuses on the development of dynamic data driven systems using advanced deep learning
models to automate the extraction of valuable posts and information from the deluge of social media
threads [133], [9]. By leveraging machine learning techniques, natural language processing, and computer vision, researchers in this field work toward creating intelligent systems that can filter, categorize,
and prioritize social media content during crises. These systems aim to identify critical information such
as requests for help, reports of damage, emerging trends, and potential hazards, all of which contribute to
a more accurate and up-to-date understanding of the crisis situation. Furthermore, crisis informatics not
only seeks to enhance information extraction but also emphasizes the importance of ensuring the credibility and reliability of the data obtained from social media. In the chaotic environment of a disaster, misinformation and rumors can spread rapidly, potentially leading to confusion, misallocation of resources, and
harm. Consequently, researchers are developing methods to verify the authenticity of information sources
and assess the trustworthiness of posts, thereby reducing the risk of relying on inaccurate or misleading
data.
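As a toy sketch of the filtering step described above (synthetic posts and labels; production crisis-informatics systems rely on much larger models and datasets), a classical text classifier can separate actionable disaster-related posts from irrelevant chatter:

```python
# Toy post filter; real systems use deep NLP and multimodal models trained on large corpora.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

posts = [
    "Bridge on 5th street flooded, people trapped on the roof",
    "Power lines down near the elementary school, need help",
    "Enjoying my coffee this morning",
    "Can't wait for the game tonight",
]
labels = [1, 1, 0, 0]  # 1 = actionable crisis information, 0 = irrelevant

vectorizer = TfidfVectorizer()
clf = LogisticRegression().fit(vectorizer.fit_transform(posts), labels)

new_posts = ["Roof collapsed downtown, several people injured"]
print(clf.predict(vectorizer.transform(new_posts)))  # predicted label for the new post
```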
1.3 Contributions Across This Thesis
In this thesis we identify challenges and provide algorithms and solutions for managing the cyberinfrastructure in support of Dynamic Data Driven Application Systems. The aim is to enable and enhance the
performance of DDDAS applications. We focus our efforts on making data integration into DDDAS
more efficient by building new automation mechanisms driven by the science application and by managing
the network infrastructure to reduce performance bottlenecks and increase the performance of data transfers.
In the following sections we briefly introduce the contributions of this thesis.
1.3.1 Mechanisms to Control Compute and Network Resources of the
Cyberinfrastructure
In Chapter 2 we present our approaches to infrastructure federation and the design and evaluation of
mechanisms that enable malleable, high-performance data flows between diverse, distributed, national-scale cloud platforms (ExoGENI and Chameleon) and the CASA data repository. We conceptualized and
developed new features in a network-centric platform called Mobius, which bridges the abstraction gap by
presenting an appropriate, high-level interface to scientists for dynamic, end-to-end resource provisioning. We show how CASA workflows can be parallelized as ensembles and automatically executed on the
Mobius-provisioned infrastructure (ExoGENI and Chameleon) by a modern workflow management system.
Our approach improved CASA’s operations (1) by making the network performance more predictable,
which reduced the standard deviation in end-to-end workflow execution time by more than 25%, and (2)
by reducing by over 65% the required resources needed to obtain the same results while still meeting the
roundtrip time guarantees required by the application.
1.3.2 Monitoring Dynamic Data Driven Workflow Executions on
Distributed Cyberinfrastructure
The highly distributed and reconfigurable character of today’s Cyberinfrastructure makes it difficult to
methodically track the execution and performance of every aspect of a Dynamic Data Driven Application System (data transfers, execution provenance data, resource utilization, etc.). In Chapter 3 we
present an end-to-end framework for online performance data capture, preprocessing, and storing of scientific workflow performance data, which includes fine-grained and coarse-grained application-level and
resource-level data within or across distributed cyberinfrastructure. Through our evaluation, we show that
our framework can capture various aspects of application performance, including
changes in workflow performance originating from CPU, network, or filesystem slowdowns, across
single-threaded, multi-threaded, and MPI applications.
1.3.3 Evaluation of Cyberinfrastructure Configuration’s Impact on Data Intensive
Workflow Performance
Cyberinfrastructure offers high levels of configurability: a variety of environment configurations can be
created, leading to the applications having different performance. In Chapter 4 we present an experimental
study demonstrating how data management configurations affect data-intensive applications when executed in cloud environments. To showcase these behaviors we use a set of science machine learning
workflows. Our experimental results show that even though shared filesystem cyberinfrastructure configurations can improve data stage-in and stage-out times by over 60%, in some cases they can also negatively
impact the end-to-end execution time of the data-intensive applications (e.g., machine learning) by up to
13% compared to a non-shared filesystem baseline. This performance degradation is due to network and
filesystem overheads introduced during execution. We also include a discussion of the challenges workflow
management systems (WMS) will be facing when supporting ML workflows and how they could tackle
them.
1.3.4 Reduction of Network Contention Using Workflow Management System
Mechanisms During the Workflow Planning Phase
Network contention can have a significant impact on the performance of data-intensive workflows, since
computations block while waiting for data. Additionally, different cyberinfrastructure configurations can
negatively affect the overall performance of these workflows, even though they are improving some of
their subsystem operations, like data transfers (as we demonstrated in Chapter 4). In Chapter 5 we explore mechanisms during the planning phase of the workflows, where the workflow management system
makes resource selection and task scheduling decisions. The workflow management systems can perform
dynamic workflow reorganization and can place computing tasks at the edge, closer to the data. This may
help optimize the application’s network needs by reducing data movement and increasing data reuse. Using task clustering we were able to reduce the data transfer needs of three machine learning workflows
by up to 85% and improve their end-to-end execution time by up to 17%. By placing tasks closer to the
edge, applications that produce less data than they consume can benefit greatly and dramatically reduce
the data they have to transfer over the WAN. In the case of CASA's wind workflow, this reduces
the time spent in transfers by more than a factor of 3 in a hybrid edge-cloud setting. These mechanisms, however, do
not allow for any adjustments during execution.
1.3.5 Reduction of Network Contention Using Application-Aware Software Defined
Flows and Workflow Management System Mechanisms During the Workflow
Running Phase
The problem of network contention is augmented when application ensembles have different network
needs and compete for the same network resources. This results in performance degradation of the entire
ensemble. As discussed in our previous contribution, at plan-time, workflow management mechanisms
can help reduce the network contention by optimizing data transfers and data placement, but they do not
allow fine grained control over individual ensemble members and QoS policies cannot be changed while
the application is running. In Chapter 6 we conceptualized and implemented two novel methodologies to
alleviate network contention using application-aware network control mechanisms. These identify and
throttle application flows that saturate the network. At runtime, workflow management system policies,
integrated into the ensemble manager, pace the dispatch of workflows. Our experiments show that having
two competing workflow ensembles being executed at the same time in a shared resource setting where no
policies are applied, can make their end-to-end execution time up to 300% slower because of the network
contention during data transfers. By applying pacing and throttling policies over the shared resources,
the slowdown to the workflow ensembles’ execution time is minimized to up to 60%, providing an overall
more fair utilization of the available infrastructure.
Chapter 2
A Framework to Control the Cyberinfrastructure
A major challenge for dynamic data driven application systems is the integration of data into the application workflows. These workflows are composed of a set of dependent tasks, each of which has different compute, network, storage, and input requirements that determine where and how each task can be executed. To address the needs of dynamic data driven workflows, it is critical to be able to easily control the cyberinfrastructure and to integrate instruments, distributed computing resources, and data repositories, using high-performance networks and data management software, in order to increase the efficiency of these workflows and to meet their real-time needs [194]. In this chapter we present the DyNamo system that addresses the above challenges. DyNamo enables high-performance, adaptive, performance-isolated data flows across a federation of distributed cloud computing resources and community data repositories for data driven scientific workflows. Additionally, DyNamo is capable of provisioning appropriate computing, networking, and storage resources for observational science workflows from diverse, national-scale cyberinfrastructures (CI). Driven by the specific needs of science applications, these capabilities are presented to applications and users at the right level of abstraction through an easily understandable, high-level interface. By integrating the Pegasus Workflow Management System [42], DyNamo is also able to automate the orchestration of data-driven science workflows on the provisioned infrastructure and resources across multiple administrative domains. The rest of this chapter is organized as follows: In Section 2.1 we review related work; in Section 2.2 we describe the DyNamo system and present the CASA [119] workflows that have been adapted to use DyNamo's workflow automation; in Section 2.3 we present an evaluation of the capabilities of the DyNamo system using the CASA dynamic data driven workflows; and finally, Section 2.4 summarizes and concludes this chapter.
2.1 Related Work
The related work can be roughly classified into three categories: research cloud platforms, resource provisioning for workflows, and workflow management for science applications.
Cloud platforms. A number of public cloud providers, such as Amazon EC2 [14], Microsoft Azure [121], Google Cloud [68], and Rackspace [148], offer IaaS abstractions and some ability to orchestrate their resources together with networks through mechanisms like CloudFormation [18] and Heat [137]. However, these platforms have to maintain strict security standards in order to protect their systems and keep their user data safe from outside attacks [16, 70]. For this reason, they do not expose much information about their systems' internals, and they do not allow direct connections to external data repositories, making it difficult to move data in and out of their infrastructure. This limits the use cases these public cloud platforms can cover, especially for science dynamic data driven applications that need to be deployed in a multi-cloud environment [19, 187]. The GlobusOnline [59] project permits users to efficiently move data from one computing resource to another; however, it does not provide any additional mechanisms to support the end-to-end execution of science workloads (e.g., resource provisioning, workflow automation). In this thesis, the focus is on the integration of scalable, reconfigurable distributed testbeds, including ExoGENI [20] and Chameleon [87], with emphasis on adaptive data movements, including those that are DDDAS-based.
Resource provisioning for science workflows. Several existing works aim to improve the effectiveness of resource provisioning for science workflows. Recent survey papers [103, 61, 37] discuss the effectiveness of IaaS cloud resource provisioning for executing scientific workflows. Wang et al. [188] propose an approach to build and run scientific workflows on a federation of multiple clouds using Kepler [109] and CometCloud [187]. Moreover, there are strategies for workflow systems that focus on the deployment of virtual machines in the cloud with limited support for on-demand provisioning and elasticity, and with little or no support for infrastructure optimization. In particular, data placement/movement and network configuration/provisioning decisions are crucial to achieving high performance for big data applications [168]. Ostermann et al. [139] discuss a set of VM provisioning policies to acquire and release cloud resources for overflow grid jobs from workflows, and the impact of those policies on execution time and overall cost. A previous work by the authors [115] presented dynamic provisioning techniques that spawn resources based on compute elasticity using Mobius [123]. The work in this thesis differs from the above by providing easy-to-use, on-demand resource provisioning mechanisms for malleable data movement and compute provisioning for inter-cloud workflows.
Science workflow management systems and execution environments. Several workflow management systems focus on the execution and management of science applications on cloud platforms. Malawski et al. [113] focus on cost optimization modeling for scheduling workflows on public clouds to minimize the cost of workflow execution under deadline constraints. Abrishami et al. [4] present workflow scheduling algorithms based on partial critical paths, which also optimize the cost of workflow execution while meeting deadlines. With the rise of (federated) multi-clouds, many workflow management systems have focused on this type of platform. Matthew et al. [45] discuss workflow brokering among multi-cloud domains with heterogeneous security postures. Senturk et al. [166] deal with bioinformatics applications on multi-clouds with a focus on resource provisioning. Additionally, new systems have been designed to support custom workflow execution environments on cloud platforms, with extensive support for Linux containers [63, 64, 130, 44], while well-established workflow systems have evolved to provide seamless support for running containerized applications [206, 205]. Skyport [63, 64] utilizes Docker containers to solve software deployment issues via software isolation. Likewise, a DDDAS approach [194] using model-based and data-driven components demonstrated real-time performance with Docker containers as an elastic cloud to support multi-domain and multi-modal data fusion.
In this chapter, a set of new approaches is discussed and evaluated, such as on-demand high-speed network provisioning closely coupled with the automation offered by workflow management systems, aiming to satisfy the complex requirements of dynamic data driven science applications.
2.2 DyNamo System Description
The challenges described in Section 1.2, presented using the CASA example, are common to many other dynamic data-driven scientific workflows. To address these challenges, we have designed the DyNamo system, which leverages well-established tools and low-level cloud APIs to provide a platform that increases reliability and accelerates dynamic data driven applications. The DyNamo system offers many capabilities that simplify the process of provisioning resources and executing science workflows on the execution platforms. These capabilities include: (a) programmable resource provisioning for federated multi-cloud infrastructures, which enables on-demand access to multiple national-scale computational cloud resources for science workflows, (b) end-to-end network management to enable data movement across infrastructures and external science data repositories, (c) a network-centric platform, named Mobius, which hides the complexity of resource and network provisioning from scientists and transforms high-level application-aware, infrastructure-agnostic resource requests into low-level infrastructure provisioning actions, and (d) support for workflow automation for science applications on the provisioned infrastructure using the Pegasus workflow management system [42].
2.2.1 Integrated, Multi-Cloud Resource Provisioning
Dynamic data driven applications, like those supported in CASA, need to quickly acquire appropriate resources in an automated fashion in order to satisfy their bursty computational and network needs. These resources need to quickly adapt to the ever-changing weather conditions, the severity of the expected weather events, the number and status of available sensors, and end-user-defined triggers. DyNamo is designed to enable domain scientists to acquire cloud resources from multiple cloud providers based on high-level descriptions of resources. Using the programmable provisioning capabilities of multiple cloud platforms, we designed DyNamo's application programming interfaces (APIs) to shield scientists from the complexity of directly interacting with each cloud provider's software, which makes it easier for them to run their applications on multi-provider, national-scale cyberinfrastructures (CIs).
In this section we describe the major design elements behind the DyNamo system: (a) how we have designed the support for multi-cloud integration, demonstrated using two research cloud providers, ExoGENI and Chameleon cloud [87], and advanced dynamic network provisioning capabilities for malleable data flows, (b) what the architectural elements of the DyNamo network platform are, and (c) how CASA scientists can leverage the high-level resource provisioning capabilities of DyNamo for representative data-driven CASA workflows.
ExoGENI [20] is a networked Infrastructure-as-a-Service (IaaS) testbed that connects 20 cloud sites on campuses across the US through regional and national transit networks (such as Internet2 [174], ESnet [39], etc.). ExoGENI allows users to dynamically provision mutually isolated “slices” of interconnected infrastructure from multiple independent providers (compute, network, and storage) by taking advantage of existing virtualization mechanisms and integrating various resources together, such as the layer-2∗ global dynamic-circuit networks like Internet2 and ESnet, and private clouds operated with OpenStack [136]. ExoGENI provides value added over dynamic-circuit providers by permitting users to instantiate virtual, distributed topologies, and by provisioning the appropriate network resources corresponding to the topologies, thereby creating end-to-end layer-2 paths.
The NSF Chameleon Cloud [87] is a large, deeply programmable testbed designed for systems and networking experiments. Similar to ExoGENI, it leverages OpenStack to deploy isolated slices of cloud resources for user experiments. However, where ExoGENI scales in geographic distribution, Chameleon scales by providing large amounts of compute, storage, and networking resources spread across only two sites. In total, Chameleon provides over 15,000 cores and 5 PB of storage across the University of Chicago and the Texas Advanced Computing Center. Users can provision bare metal compute nodes with custom system configurations connected to user-controlled OpenFlow† [118] switches operating at up to 100 Gbps. In addition, Chameleon networks can be stitched to external partners, including ExoGENI slices.
2.2.2 Dynamic Data Movement Across Infrastructures and External Repositories
Integrating data movements with compute operations is essential, but often overlooked, for DDDAS applications and their workflows. While using resources from multiple clouds can provide scientists with significant flexibility in compute resources, it is challenging to support high-performance data movements into, across, and out of the distributed compute infrastructure. To address these data movement challenges, DyNamo is designed to enable flexible, programmable provisioning of high-bandwidth network links across and within different infrastructures and science data repositories.
∗Also known as the Data Link Layer, which is the second level in the seven-layer Open Systems Interconnection (OSI)
reference model for network protocol design.
†OpenFlow is a communications protocol that gives access to the forwarding plane of a network switch or router over the
network.
Figure 2.1: Data movement and resource provisioning across CI federation.
Figure 2.1 shows the deployment scenario for supporting dynamic data movements. DyNamo leverages the ExoGENI network overlay to connect data repositories, like the CASA radar data repository, with
national scale CI resources. DyNamo currently supports resource provisioning on Chameleon and ExoGENI clouds, with plans for connecting to XSEDE JetStream [172], Open Science Grid (OSG) [138] and
Amazon AWS [184]. The network links between ExoGENI, the external resources and data repositories
use ExoGENI stitchports [19], a set of dedicated, reconfigurable network links, as discussed next.
Stitchport. A stitchport (used in ExoGENI) is an intersection of two independent network domains
that are stitched together to establish a layer-2 network connection. A general stitchport abstraction is
a fundamental building block for DyNamo enabled DDDAS applications. What lies beyond a stitchport
is assumed to be an IP-based subnet including infrastructure like data transfer nodes for large compute
clusters, data repository endpoints, nodes connected to storage arrays for instrument data, sensors, or any
other sources or sinks of scientific data sets. Stitchports enable high-performance networking to external
infrastructure outside of ExoGENI over layer-2 network connectivity (data layer), which improves privacy
and security of the transfers, and the network’s performance.
University of North Texas (UNT) stitchport. To accommodate CASA’s needs, DyNamo deployed
a stitchport at the location of the CASA data repository at the University of North Texas in Denton, TX.
The server behind this stitchport hosts the weather data from the 7 CASA radars. The UNT server is
connected to a static pool of Virtual Local Area Networks (VLANs) set up and connected to the ExoGENI
network fabric by transiting through the Texas Lonestar Education and Research Network (LEARN) [176]
regional network. This connectivity makes it possible to programmatically connect the data repository
to any of the 20 ExoGENI racks or other external infrastructures that ExoGENI can connect to, with a
dynamic bandwidth specification.
Chameleon Stitchport. Chameleon supports the creation of isolated OpenFlow networks controlled
by users. In addition to connecting Chameleon compute resources, these networks can be stitched to
external partners, including ExoGENI, using dedicated layer-2 circuits [28]. Stitching uses a set of ExoGENI
stitchports, which can be allocated to Chameleon networks. A Chameleon user can create an isolated
network that includes one of the ExoGENI stitchports. Then ExoGENI users can create slices that include
the allocated stitchport. After the stitchport has been allocated on both infrastructures, layer-2 traffic will
pass between the networks.
DyNamo, through stitchports, connects the CASA data repository, via dynamically provisioned layer-2
overlay networks, to ExoGENI resources, which are then stitched to Chameleon. DyNamo creates layer-2,
end-to-end, virtualized, performance isolated data movement paths for data-intensive CASA workflows.
2.2.3 Mobius: DyNamo’s Network-centric Provisioning Platform
During the design of the DyNamo system, new capabilities were added to a network-centric platform
named Mobius [115, 123] to support the needs of DyNamo. Mobius was extended to support multi-cloud
resource provisioning and high-performance isolated data flows across diverse federated cyberinfrastructures. Additionally, new modules were added to allow higher-level applications and workflow management
systems to interface with Mobius and translate high-level resource requests to low-level cyberinfrastructure actions, thereby bridging the abstraction gap between dynamic data-driven science applications and
resource provisioning systems.
Figure 2.2: Mobius network-centric platform.
The Mobius platform has been implemented as a Representational State Transfer (REST) service and
exposes REST APIs [123] for programmable provisioning of network and compute resources. It takes
high-level application-aware resource descriptions from the application scientists or workflow systems,
and automatically provisions resources using the native APIs of different providers. Essentially, the applications can express their resource requirements on Mobius over time in the form of a Gantt chart. Scientists can easily set up application-specific environments by invoking the Mobius REST API. As depicted in
Figure 2.2, Mobius contains 7 major components:
Multi-cloud and Network Resource Manager. At the network management layer, Mobius translates application requests into native, cloud-specific requests. Leveraging the ExoGENI network overlay, application-level data movement requests are translated into ExoGENI network provisioning requests to set up data movement paths. The Multi-cloud and Network Resource Manager consists of two native, cloud-specific adapters that provision resources on the targeted infrastructures.
Ahab adapter. Ahab [114] is a collection of graph-based Java libraries designed to allow applications to control, modify, and manage the state of ExoGENI slices. Ahab includes libndl, which provides
a graph-based abstraction for interacting with ExoGENI slices. Ahab primarily handles the conversion
of an abstract topology graph consisting of ComputeNodes, networks, stitchports, storage, etc. into native
ExoGENI resource requests, expressed in the Network Description Language-Ontology Web Language (NDL-OWL [74]), which are then sent to ExoGENI using another library called libtransport. The Mobius Ahab
adapter leverages the Ahab library functionalities to instantiate compute resources on ExoGENI racks and
to create network paths between stitchports, ExoGENI racks and other cloud providers like Chameleon.
Jclouds adapter. Apache jclouds [15] is an open source multi-cloud toolkit that allows the creation of
portable applications across different cloud providers while maintaining full control to use cloud-specific
features. Part of this work included implementing a Mobius jclouds adapter for OpenStack to provision resources on Chameleon and XSEDE JetStream. It is also planned to implement a Mobius jclouds adapter
for Amazon Elastic Compute Cloud (EC2) to provision resources on Amazon Web Services (AWS), which
can then be used in conjunction with the stitchport for AWS Direct Connect to move data in and out of
the EC2 provisioned resources.
Workflow Database. The information about all the resources provisioned for a workflow or an application on different clouds and the corresponding application request parameters are maintained in the
Workflow Database.
Periodic Processor. High-level application requests can be represented as a Gantt chart of required
resources for a particular application or workflow. The periodic processor triggers the provisioning of the
resources at the scheduled time. It also monitors the provisioning status of all the resources instantiated for
various application workflows and triggers notifications to the applications or workflow management systems.
Monitoring and Control. The Monitoring and Control module is designed to transparently maintain
the quality of service of the provisioned end-to-end infrastructure through continuous monitoring and
control (e.g. growing and shrinking compute or storage resource pools and changing network properties
of links).
Mobius Controller. The Mobius controller orchestrates all the above components and processes the
incoming REST requests to trigger appropriate Mobius components.
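To make this interaction concrete, the sketch below shows how a resource request could be submitted to Mobius over its REST API using Python. The endpoint path, field names, and values are illustrative assumptions rather than the exact Mobius schema; they mirror the Gantt-chart style request described above (which resources, on which site, for which time window).

import requests

# Illustrative only: endpoint path and request fields are assumptions and do not
# reflect the exact Mobius schema.
MOBIUS_API = "http://mobius.example.org:8080/mobius"
WORKFLOW_ID = "casa-nowcast-demo"

compute_request = {
    "site": "Chameleon:CHI@UC",           # target cloud and site
    "cpus": 24,
    "ramPerCpus": 8192,                    # MB
    "diskPerCpus": 100,                    # GB
    "startTime": "2019-06-01T12:00:00Z",   # Gantt-chart slot for this resource
    "endTime": "2019-06-01T18:00:00Z",
}

response = requests.post(
    f"{MOBIUS_API}/workflow/{WORKFLOW_ID}/compute",
    json=compute_request,
    timeout=30,
)
response.raise_for_status()
print(response.json())   # Mobius would return the provisioning status

A data movement request between two stitchports could be expressed in the same style, with Mobius translating it into the corresponding ExoGENI network provisioning actions.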
2.2.4 Workflow Automation
One of the requirements of the DyNamo system was to provide execution automation, failure resilience, and recovery to the dynamic data driven applications. To satisfy this requirement, the DyNamo system design leverages the Pegasus Workflow Management System (WMS).
Pegasus WMS. Pegasus [41] is a popular workflow management system that enables users to design workflows at a high level of abstraction, independent of the resources available to execute them and of the location of data and executables. Workflows are described as directed acyclic graphs (DAGs), where nodes represent individual compute tasks and the edges represent data and control dependencies between tasks. Pegasus transforms these abstract workflows into executable workflows that can be deployed onto distributed and high-performance computing resources such as DOE Leadership Computing Facilities (e.g., NERSC [127] and OLCF [132]), shared computing resources (e.g., XSEDE [180] and OSG [145]), local clusters, and clouds. During the compilation process, Pegasus performs data discovery, locating input data files and executables. Data transfer tasks are automatically added to the executable workflow and perform two key functions: (1) stage in input files to staging areas associated with the computing resources, and (2) transfer the generated outputs back to a user-specified location. Additionally, data cleanup tasks (which remove data that is no longer required) and data registration tasks (which catalog the output files) are also automatically added to the workflow. To manage users' data, Pegasus interfaces with a wide variety of backend storage systems that use different data access and transfer protocols.
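As an illustration of this abstraction, the following sketch defines a minimal two-task pipeline with the Pegasus 5.x Python API (the deployment evaluated in this chapter used Pegasus 4.9, whose DAX3 API differs in syntax but not in concept); the transformation and file names are hypothetical.

from Pegasus.api import Workflow, Job, File

# Hypothetical two-task pipeline: unzip a radar file, then render it to PNG.
radar_gz = File("radar_1.netcdf.gz")
radar_nc = File("radar_1.netcdf")
radar_png = File("radar_1.png")

wf = Workflow("casa-mini-example")

unzip = (
    Job("unzip")                     # transformation registered separately
    .add_args(radar_gz)
    .add_inputs(radar_gz)
    .add_outputs(radar_nc)
)

to_png = (
    Job("netcdf2png")
    .add_args(radar_nc, "-o", radar_png)
    .add_inputs(radar_nc)
    .add_outputs(radar_png)
)

# The dependency between the two jobs is inferred from the shared intermediate file.
wf.add_jobs(unzip, to_png)
wf.write("workflow.yml")             # abstract workflow handed to the Pegasus planner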
Within the scope of this thesis we have implemented three CASA workflows in Pegasus WMS, which we present in the following subsections: one that computes nowcast products, one that computes maximum wind velocity products, and one that computes hail products, as described in Section 1.2.1. The DAGs of these workflows range from small and simple to large and complex, but all of them share some similarities. All the workflows support execution using container technologies, and there are images available for both Docker [23] and Singularity [96] virtualization services to encapsulate the workflow applications. Additionally, all workflows have to fit into CASA's operational requirements, and thus they are designed to scale over the resources and allow compile-time task clustering optimizations with Pegasus.
2.2.4.1 CASA Pegasus Nowcast Workflow
The Pegasus Nowcast workflow [163] computes short-term advection forecasts, as described in Section 1.2.1.2, by splitting gridded reflectivity data into 31 grids and computing reflectivity predictions over the next 30 minutes. An abstract version of the workflow's DAG is presented in Figure 2.3, which reveals that the size of the workflow does not depend on the size of the input, and the number of compute tasks is fixed. The nowcast workflow contains 63 compute tasks: 1 task for splitting the input data into individual grids and 2 tasks for each of the 31 grids. Additionally, because the nowcast workflow does not have many levels but has many tasks per level (fan-out), Pegasus can optimize task placement during compile-time using horizontal task clustering with a variable clustering size.
Figure 2.3: CASA Pegasus Workflow - Nowcast.
2.2.4.2 CASA Pegasus Wind Speed Workflow
The Pegasus Wind Speed workflow [164] computes the maximum wind velocity by combining multiple single-radar outputs to cover blind spots that individual radars may present (Section 1.2.1.3). An abstract version of this workflow's DAG is depicted in Figure 2.4. To construct the input for the wind speed pipeline, single-radar data files are accumulated over a variable time window (minimum 1 minute), which regulates how often CASA produces maximum wind velocity contours, but also affects the size of the input of a single workflow run. From Figure 2.4, it can be seen that the first level of tasks (unzipping any zipped files) in the wind speed workflow depends on the number of input files, and thus this workflow has a variable number of tasks. However, the number of tasks at the subsequent levels of the workflow is fixed, and the total size can be derived as K + 4, where K is the number of input files. For example, using a 1-minute accumulation period, the Radar Network in the Dallas Fort Worth area produces 20 input files (on average) and the workflow has 24 compute tasks, while using a 5-minute period the same radars produce 85 input files (on average) and the workflow grows to 89 compute tasks. To optimize the workflow for a specific resource set, one can employ horizontal task clustering for the first level of the workflow and vertical task clustering to group and label the rest of the compute tasks together.
Figure 2.4: CASA Pegasus Workflow - Wind Speed.
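The task counts discussed above can be summarized with a small helper. The functions below simply encode the formulas from Sections 2.2.4.1 and 2.2.4.2, and the clustered-job estimate is a simplification that ignores how Pegasus groups tasks per level and per transformation.

import math

def nowcast_tasks(grids: int = 31) -> int:
    # 1 task splits the gridded reflectivity, plus 2 tasks per grid (Section 2.2.4.1)
    return 1 + 2 * grids                 # 63 for the 31-grid configuration

def wind_tasks(input_files: int) -> int:
    # one unzip task per input file plus 4 fixed downstream tasks, i.e. K + 4
    return input_files + 4               # 20 files -> 24 tasks, 85 files -> 89 tasks

def clustered_jobs(tasks: int, cluster_size: int) -> int:
    # rough number of clustered jobs when grouping cluster_size tasks per job
    return math.ceil(tasks / cluster_size)

print(nowcast_tasks(), wind_tasks(20), wind_tasks(85))   # 63 24 89
print(clustered_jobs(nowcast_tasks(), 16))               # roughly 4 clustered jobs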
2.2.4.3 CASA Pegasus Hail Workflow
The Pegasus Hail workflow [162] is the most complex among the three Pegasus workflows currently available to CASA. As described in Section 1.2.1.4, the hail workflow classifies the type of expected precipitation among hail, sleet, and liquid. The hail workflow is particularly interesting because of the timing of the availability of its input files. The input data consist of radar moment data from CASA's radars and NOAA's NEXRAD radar. The NEXRAD radar data are not produced as frequently as the CASA radar data, and when they are, they become available approximately 5 minutes after the CASA radar data. As a result, the multiple time scales lead to a more complex workflow design.
An abstract version of the workflow's DAG is presented in Figure 2.5. The DAG contains a main workflow, also referred to as the "root" workflow, that processes the CASA radar data and a sub-workflow, triggered by the main workflow, which processes the NEXRAD data if and when they become available to compute the final composite output. The size of the main workflow depends on the number of input files that have arrived from CASA's network, and for each file, 5 compute tasks are added to the DAG. The last task of the main workflow prepares the DAG of the sub-workflow on the fly during execution and waits until the NEXRAD data become available or until a set time threshold is exceeded. If the NEXRAD data are not received, the sub-workflow DAG is not executed and the main workflow finishes. The size of the sub-workflow's DAG, as seen in Figure 2.5, is fixed and contains 6 compute tasks. For both DAGs, Pegasus task clustering can be applied to group tasks together and optimize for data movement and task placement. Horizontal task clustering can be used to group tasks together in the main workflow, since there are many tasks on the same DAG level, while vertical (label) task clustering is better suited for the sub-workflow, since it has a more sequential profile.
Figure 2.5: CASA Pegasus Workflow - Hail.
2.3 Evaluation of DyNamo System
To evaluate the capabilities of DyNamo, we selected CASA workflows that produce nowcasts and wind
speed estimates as described in Section 1.2.1. Figure 2.6 shows the deployment scenario for the CASA
workflows used for the evaluation. The workflow processes include input data collection and product
generation, visualization, contouring into polygon objects, spatial comparisons of identified weather features with infrastructure, and dissemination of notifications.
2.3.1 CASA Testcases
The generated Pegasus workflow for nowcast [163] consists of 63 compute tasks, as presented in Section 2.2.4.1. All tasks run within a Docker container that is managed by Pegasus and has a size of 476MB. For the evaluation of the nowcast workflow, 30 minutes of pre-captured data (individual file size 9.6MB, total size 287MB) are used, and they are replayed using an accumulation interval of 1 minute.
On the other hand, the wind workflow [164], presented in Section 2.2.4.2, has a variable size. The preprocessing phase is responsible for unzipping any zipped input files, and the number of tasks depends on the decompression time. The compute tasks run within a Docker container, 523MB in size, and for the evaluation of the wind workflow, 40 minutes of pre-captured data (individual file size ∼12MB, total size ∼6GB) are used, replayed using an accumulation interval of 5 minutes.
Based on the number of jobs and the amount of data for each testcase, nowcast is classified as a compute-intensive workflow and wind speed as a data-intensive workflow.
2.3.2 Experimental Infrastructure
A realistic scenario that is similar to the real CASA operational radar data processing setup is deployed as
a case study on ExoGENI and Chameleon clouds using the DyNamo system. It is worth mentioning that
Figure 2.6: CASA workflow deployment.
rather than a direct performance comparison between the ExoGENI and Chameleon testbeds, the major goal here is to demonstrate the effectiveness of DyNamo workflow execution over heterogeneous compute and networking infrastructures based on their resource availability. In the setup considered here, the ExoGENI compute cluster is located in Jacksonville, FL, and contains 13 VM nodes: 1 master node, 11 worker nodes (also referred to here as “workers”), and 1 Network File System (NFS) storage node. Each node has 4 virtual cores, 12 GB RAM, and is connected to a 1 Gbps network. The Chameleon compute cluster contains 5 nodes and is located in Chicago, IL. It comprises 4 worker nodes and 1 NFS storage node. Each node has 24 cores, 192 GB RAM, a 250GB Solid State Drive (SSD), and is connected to a shared 10 Gbps network. The two configurations provide 44 and 96 compute slots, respectively.
Because the master node does not require much processing power for a small-sized cluster, for this study it was deemed suitable to keep it within ExoGENI and connect all the workers to the same HTCondor pool. In order to specify which workers to dispatch jobs to, HTCondor's job requirements filter is used, and the workers on either ExoGENI or Chameleon are requested based on their hostnames.
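One way to express such a placement constraint is through an HTCondor requirements profile attached to a job. The sketch below uses the Pegasus 5.x Python API (the deployment here used Pegasus 4.9, where the same profile can also be set in the site or transformation catalogs), and the job name and hostname pattern are hypothetical examples rather than the exact expressions used in this study.

from Pegasus.api import Job, Namespace

# Hypothetical job pinned to the Chameleon workers by matching the hostnames
# that the HTCondor workers advertise in their Machine attribute.
job = Job("nowcast_mrtV2")
job.add_profiles(
    Namespace.CONDOR,
    key="requirements",
    value='regexp("chameleon", TARGET.Machine)',
)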
Finally, for this test study, the Pegasus submit node was deployed at the University of North Texas (UNT) in
Denton, TX, which receives data from CASA’s data sources via the TX LEARN research network and
triggers new workflows dynamically, as new data arrive. Raw data from the CASA radars and the DFW
NEXRAD radar are directly transmitted to the submit node. Nowcasts are first computed on the DFW
Radar Operations Center (DROC) and then transmitted to the UNT submit node. The UNT submit node is
connected to the ExoGENI and Chameleon processing clusters via 1 Gbps stitchable layer-2 VLANs, which
are dynamically provisioned on ExoGENI by Mobius.
Software. HTCondor version 8.6 has been installed on all of the nodes in the infrastructure above, and
its configuration was customized to match the role of each node. In this setup, the workers are configured
with partitionable slots, which allows one to request multiple cores per job request. Additionally, Pegasus
version 4.9.1 and Docker CE version 18.09.5 are installed on the submit node and all of the workers. Mobius was used to provision compute resources on ExoGENI and Chameleon, and the network connections
between ExoGENI, Chameleon and the CASA repository.
2.3.3 Experimental Results
During the preliminary tests with the nowcast workflow, it was observed that transferring and loading the Docker container for every task had a severe impact on performance. To alleviate this impact, the nowcast workflows were configured for two environments: one that does not have access to shared storage and another that shares Network File System (NFS) folders across the worker nodes. In the first environment, Pegasus uses HTCondor's file transfer feature to move data in and out of the worker nodes, and back to the submit node. For the NFS environment, Pegasus uses its own data transfer recipes to stage in data to a shared scratch location on the NFS shared folder and orchestrates the jobs on the workers to create symbolic links (symlinks). When the final output products are generated, Pegasus pulls the data back to the submit node via scp. This approach removes frequent transfers between the submit node and the workers, and enables placing large common files (e.g., the container images) closer to the workers. For both cases, varying sizes of task clusters were tested, grouping together tasks of the same type, and evaluating the effect of each configuration.
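These two setups roughly correspond to two Pegasus data configurations. The sketch below writes a pegasus.properties file for either case; the property names are standard Pegasus properties, but the exact values used in this study should be treated as illustrative assumptions.

# Sketch: generate pegasus.properties for the two storage environments.

CONDORIO = """\
# Environment 1: no shared storage; HTCondor file transfers move data
pegasus.data.configuration = condorio
"""

NFS_SHARED = """\
# Environment 2: NFS scratch shared by the workers; inputs are symlinked
pegasus.data.configuration = nonsharedfs
pegasus.transfer.links = true
"""

def write_properties(use_nfs: bool, path: str = "pegasus.properties") -> None:
    with open(path, "w") as f:
        f.write(NFS_SHARED if use_nfs else CONDORIO)

write_properties(use_nfs=True)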
2.3.3.1 Effect of cluster parallelism in the nowcast workflow
Figure 2.7: Nowcast: Exploring Cluster Parallelism (Using Stitchport 1Gbps).
As described in Section 1.2.1.2, nowcast is a time-critical and compute-intensive data-driven workflow. It is important to explore whether high parallelism in task clustering yields faster completion times. Figure 2.7 presents the average workflow makespan, defined as the average runtime of the workflow, for executions of the nowcast workflow (with 63 tasks) on Chameleon nodes with NFS storage enabled. Pegasus allows the selection of a task clustering size (X-axis), which is the number of workflow tasks grouped in a job. It also allows the selection of a desired job parallelism, which is the number of cores allocated to the job for running the tasks in the job. For each task clustering size, the runtimes are plotted for two values of job parallelism: a fixed value of 4 and a value equal to the task clustering size. Figure 2.7 shows that there are marginal gains in workflow runtimes, ranging between 3 and 28 seconds for all task clustering sizes, but these benefits are quickly overshadowed by the increased need for more compute slots, which blocks other jobs from being executed concurrently. Thus, for the rest of the nowcast experiments, when Pegasus clusters tasks together, a fixed parallelism of 4 is used, since being able to fit more jobs while marginally increasing the job runtime (Figure 2.7) resulted in better resource utilization and reduced the time that jobs had to wait in the queue.
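In practice, the clustering size is attached to the task type as a Pegasus profile and clustering is turned on at planning time. The sketch below uses the Pegasus 5.x API, and the request_cpus value only reserves the cores for the clustered job (how the parallelism inside the clustered job is realized depends on the clustering executable), so treat the exact knobs and values as assumptions rather than this study's configuration.

from Pegasus.api import Job, Namespace

# Hypothetical nowcast task: group 16 tasks of this type into one clustered job
# and ask HTCondor for 4 cores per clustered job (the fixed parallelism above).
task = Job("NowcastToWDSS2")
task.add_profiles(Namespace.PEGASUS, key="clusters.size", value=16)
task.add_profiles(Namespace.CONDOR, key="request_cpus", value=4)

# Horizontal clustering is then enabled when planning the workflow, e.g.:
#   pegasus-plan --cluster horizontal ...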
2.3.3.2 Performance study of Nowcast workflows
Figure 2.8: Nowcast: Using Stitchport 1Gbps. (a) Single Workflow Runs; (b) Workflow Ensemble Runs.
Figures 2.8a and 2.8b present the average nowcast workflow makespan when targeting different execution environments on ExoGENI and Chameleon, with varying task clustering sizes. Figure 2.8a shows results from running a single nowcast workflow, without any other workflows competing for resources, for different cluster sizes. Due to the size of the nowcast workflow (63 tasks), the non-clustered workflow fits at once in Chameleon (96 slots available), but there were insufficient resources on ExoGENI (44 slots available). When submitting the workflow without clustering (cluster size 1), there is a noticeable impact on performance in the cases where NFS is not used, as observed on both sites. The delay is due to the increased network traffic created by transferring the application container. Additionally, in all cases, Chameleon executions tend to be a bit slower than the ExoGENI ones, due to the loading of Docker images onto the workers for every task. Having 11 nodes on ExoGENI helps to spread out the load imposed by Docker. However, as the number of clustered tasks increases, the performance advantage becomes almost negligible, since for cluster sizes greater than 12 the runtimes of the configurations are 30-45 seconds apart. Figure 2.8b presents results from replaying the test case data, which results in executing a workflow ensemble, defined as a set of workflows running the same computations on different input data, possibly arriving at different times. Similar to the single workflow runs, non-clustered configurations perform very poorly, as they saturate the submit node's link for prolonged periods, while overlapping workflows in the ensemble that compete for the same resources make the situation even worse. It is observed that, by increasing the task clustering size to 12 or more, the average runtime improves by more than 4500 seconds and reaches a stable point on both ExoGENI and Chameleon.
2.3.3.3 Improving data movement performance
Figure 2.9: Wind: Workflow Ensemble Runs - No Clustering. (a) ExoGENI - HTCondor Transfers; (b) ExoGENI - NFS.
Figure 2.9 depicts the distribution of the wind workflow makespans (completion times) while using either the TX LEARN public layer-3 network or the stitchport with variable bandwidths. Figure 2.9a displays makespan statistics without using NFS, while the executions in Figure 2.9b use NFS to optimize data placement. For both cases, it is observed that the performance of the 1Gbps stitchport is more consistent than that of the 1Gbps LEARN link, which is not performance isolated. As the stitchport bandwidth is decreased to 500Mbps and 100Mbps, the makespan increases and its distribution spreads, since the wind workflow, being data-intensive, is sensitive to network bandwidth. In addition, it is observed that the median makespans for the 1Gbps LEARN and the 1Gbps or 500Mbps stitchport configurations are similar, showing that 500Mbps of bandwidth is sufficient to allow the processing infrastructure (in this case study) to keep up with the incoming data. These results show the benefits and value of dynamically provisioned, performance-isolated networks for data-intensive workflows.
2.3.3.4 Compute slot utilization
Figure 2.10: Nowcast: Workflow Ensemble Runs - Chameleon Compute Slot Utilization. (Panels: Chameleon - HTCondor Transfers; Chameleon - NFS.)
Another aspect of great operational interest is the amount of resources that have to be allocated for a compute-intensive workflow like nowcast. Figure 2.10 presents the number of active compute slots while replaying the nowcast testcase, using task clustering sizes of 4, 16, and 32, on Chameleon nodes, with and without an NFS server. In both cases, increasing the task cluster size decreases the number of active compute slots, and thus decreases the number of nodes that need to be provisioned. It is observed that clustering 4 tasks creates high demand, with spikes close to 80 slots (NFS case). Increasing the task clustering to 16 or 32 tasks decreases the peak compute slot demand to just 20 slots. The runtime is the highest with task cluster size 4 (HTCondor case) because that case has a total of 16 jobs moving containers, versus 4 and 2 jobs with task clustering sizes of 16 and 32.
2.4 Conclusion
This chapter presented the DyNamo system, which addresses the computing and networking challenges of CASA, a distributed, dynamic data-driven observational science application. DyNamo enables high-performance, adaptive data flows and coordinated access to distributed cloud resources and data repositories to improve both the efficiency and the ease of deployment of CASA workflows. The chapter described how the DyNamo system encapsulates various cyberinfrastructure capabilities through the Mobius platform, making it easier for scientists to dynamically provision end-to-end resources across multiple national-scale cyberinfrastructures. We also presented how the DyNamo system leverages the Pegasus Workflow Management System for the orchestration of complex workflows on the provisioned infrastructure. Through a performance evaluation of the CASA workflows, we demonstrated that (1) DyNamo results in timely processing of CASA's Nowcast workflows under different infrastructure configurations and network conditions, and (2) workflow task clustering is effective in improving the throughput of an ensemble of Nowcast workflows. The methods discussed in this chapter show that by using advanced layer-2 networking capabilities, DyNamo enables fast, reliable data transfers, easy resource provisioning, and programmable configuration for CASA's production workflows. In the future, the DyNamo system can be extended in five areas: (1) integration with other resource providers, such as public clouds and other research cloud infrastructures, (2) development of additional support for streaming workflows, (3) support of additional CASA workflows that include model and streaming data, (4) development of policies and mechanisms for monitoring and control to transparently maintain the Quality of Service (QoS) of the provisioned infrastructure, and (5) incorporation of virtual software defined exchanges [114] as a basis for network link adaptation, flow prioritization, and traffic control between endpoints. These enhancements will increase the diversity of the resources made available by DyNamo to scientists and will offer finer-grained control over acquired resources and Dynamic Data Driven Applications (DDDAS). Finer-grained control will be instrumental for the next steps in CASA's operations and other dynamic data driven operations, since it will enable a fairer use of compute, network, and storage resources while meeting the expected Quality of Service (QoS).
Chapter 3
A Framework to Capture End-to-End Workflow Behavior
In spite of significant efforts to solve critical engineering challenges in workflow management [41] and the wide use of workflow systems executing many scientific applications from diverse domains in production [173], the current state-of-the-art approaches lack a deep understanding and support of the requirements, characteristics, and relationships of current and emerging dynamic data driven applications and systems. As described in Section 1.2, DDDAS can span federated domains, can ingest data from a variety of geo-distributed data repositories, and can scale their system resources across facilities, on demand, to respond to their processing load (Section 1.2.1). However, state-of-the-art monitoring tools focus on specific aspects of the execution, such as only data transfer performance or only compute resource utilization within a single domain, and they fail to correlate the statistics into a holistic view across federated domains for the entire dynamic data driven application system. To address these limitations, we have designed a framework to capture the end-to-end DDDAS behavior, and in this chapter we present how this framework integrates a collection of well-established tools to produce a comprehensive knowledge base of application and system characteristics and requirements. Due to the complexity of dynamic data driven scientific applications and the nature of distributed computing, building such knowledge not only requires information about the workflow application execution and its execution host, but also requires a broader understanding of the system as a whole. This includes capturing data at the application level, both characteristics (e.g., workflow structure, input data, etc.) and performance (e.g., CPU and memory usage, I/O operations, etc.), and also at the system level within and across computing sites (e.g., bulk data transfers, networking including routing, IPv6, flow dynamics, etc.). Gathering, storing, and distributing such information is challenging not only because of the large volume of data produced by each system at different levels of the software stack, but also because of the need to properly identify common, overlapping, or complementary information generated by different tools. The framework we have designed not only automates data gathering, preparation, and storage, but also enables data querying, retrieval, and analytics. The rest of this chapter is organized as follows: In Section 3.1 we review related work; in Section 3.2 we cover existing monitoring systems and tools; in Section 3.3 we describe in depth our framework to capture end-to-end workflow behavior; in Section 3.5 we present case studies that demonstrate the capabilities of our solution; and finally, Section 3.6 summarizes and concludes this chapter.
3.1 Related Work
The gathering and characterization of accurate resource usage is crucial for the development of solutions that enhance scientific productivity (such as tuned infrastructure configurations and sophisticated resource allocation and task scheduling algorithms). As scientific applications and systems become more complex, understanding a system's behavior and its environment is key to efficient resource and application management. Although workload gathering and characterization is not novel, the fast pace at which such systems evolve requires new tools and mechanisms to efficiently process the large, heterogeneous volumes of data that they generate [25].
In the past decade, workload archives have become very popular [51, 79, 91] and have enabled several
advances in distributed computing, including the design, development, and evaluation of a number of algorithms [51]. Nevertheless, most of these algorithms are impractical due to unrealistic assumptions, a lack of meaningful comparison, or bindings to a specific platform. Additionally, such archives are not suitable for the investigation of workflow-based applications, as dependent workflow tasks affect each other in terms of resource needs, performance, failures, and so on.
Some efforts have been made to collect, profile, and publish traces and performance statistics for real scientific workflows [94, 124, 54, 62, 83, 56, 150, 140, 186]. These traces provide fine-grained information about workflow and job characteristics and performance metrics, including CPU and memory usage, I/O operations, and job dependencies, among others. Although detailed workflow information could be extracted from these traces, there is limited information about the infrastructure as well as limited end-to-end performance data (e.g., TCP flows for data transfers, I/O profiling, etc.). In [94], we profiled time-series data for the SNS workflow, in which anomalous behaviors could be identified. However, that work was limited to information gathered by the pegasus-kickstart toolkit. A common element absent from all past works is a system that automates the gathering of monitoring information and enables near real-time monitoring and investigation. The work described in this chapter proposes a new framework that leverages existing and newly built state-of-the-art tools and systems to provide a single entry point for data analysis of end-to-end, comprehensive, distributed workflow executions.
3.2 Existing Monitoring Systems and Tools
Building a system for the end-to-end performance data capture of scientific workflow executions requires
combining information from a collection of complex and heterogeneous tools. In this section, we briefly
introduce the existing (and proven) software tools we have leveraged for each component of our proposed
system architecture (presented in Section 3.3).
3.2.1 System Frameworks
Pegasus WMS. As briefly introduced in Section 2.2.4, Pegasus [41] can increase science productivity by automating the execution of workflows. Additionally, when using the Pegasus WMS, provenance information from the workflow and the job logs is automatically parsed during workflow execution and stored in a relational datastore by a monitoring daemon called pegasus-monitord [73]. Among the statistics captured are:
• Workflow walltime – the wall time from the start until the end of the workflow execution, which is
reported by HTCondor DAGMan (the workflow executor used in Pegasus);
• Workflow cumulative job wall time – aggregated run times for all the individual jobs in the workflow
(aids resource requirements estimation);
• Breakdown of jobs by count and runtime – for each job type, the total number of jobs (succeeded and
failed), as well as the total, minimum, maximum, and average runtimes; and
• Breakdown of tasks and jobs over time on hosts – the number of jobs and total runtime of jobs running
over different hosts.
The monitoring daemon can also be configured to send normalized events to an Advanced Message
Queuing Protocol (AMQP) endpoint [5]. This is particularly useful when conducting analysis across workflows and correlating monitoring information from various monitoring sources.
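To give a sense of what consuming these events looks like, the sketch below subscribes to the AMQP endpoint with the pika client and prints selected fields. The broker URL, exchange name, and event field names are assumptions for illustration, not the exact configuration used by pegasus-monitord in our deployment.

import json
import pika  # AMQP 0-9-1 client for RabbitMQ

# Hypothetical broker and exchange; monitord would publish its normalized JSON
# events to this endpoint when configured to do so.
params = pika.URLParameters("amqp://guest:guest@localhost:5672/%2F")
connection = pika.BlockingConnection(params)
channel = connection.channel()

channel.exchange_declare(exchange="monitoring", exchange_type="topic", durable=True)
queue = channel.queue_declare(queue="", exclusive=True).method.queue
channel.queue_bind(exchange="monitoring", queue=queue, routing_key="#")

def on_event(ch, method, properties, body):
    event = json.loads(body)
    # Field names assumed for illustration (event type and timestamp).
    print(event.get("event"), event.get("ts"))

channel.basic_consume(queue=queue, on_message_callback=on_event, auto_ack=True)
channel.start_consuming()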
Globus. The Globus transfer service [195] is a cloud-hosted software-as-a-service implementation that
orchestrates file transfers between pairs of storage systems [11, 108]. A transfer request specifies, among
other things, a source and destination; the file(s) and/or directory(ies) to be transferred; and (optionally)
whether to perform integrity checking (enabled by default) and/or encrypt the data (disabled by default).
Globus provides automatic fault recovery and automatic tuning of optimization parameters to achieve high
performance. It can also transfer data using either the GridFTP or HTTP protocols. During the transfer, Globus provides performance monitoring metrics, such as the average throughput at 60-second intervals.
Upon completion, a detailed transfer log is made available for the users, which includes:
• The request and completion time of the transfer;
• The source and destination endpoint information, such as the number of physical Data Transfer Nodes (DTNs) and the type of transfer software stack (Globus Connect Personal or Globus Connect Server [106]);
• Transfer performance data such as the average transfer throughput, the total number of faults and
checksum failures;
• Transfer parameters such as concurrency, parallelism, pipeline depth [89, 106], data integrity checking and encryption setting; and
• Transferred dataset information like total number of files, directories and bytes.
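As a minimal illustration of driving such a transfer programmatically, the sketch below uses the globus-sdk Python package; the endpoint UUIDs, paths, and access token are placeholders, and the task-document field names are assumptions based on the metrics listed above.

import globus_sdk

# Placeholders: real endpoint UUIDs and a transfer-scoped access token are needed.
SRC, DST, TOKEN = "SRC-ENDPOINT-UUID", "DST-ENDPOINT-UUID", "TRANSFER-ACCESS-TOKEN"

tc = globus_sdk.TransferClient(authorizer=globus_sdk.AccessTokenAuthorizer(TOKEN))

tdata = globus_sdk.TransferData(
    tc, SRC, DST,
    label="casa-stage-in",
    verify_checksum=True,   # integrity checking (enabled by default in the service)
    encrypt_data=False,     # encryption (disabled by default)
)
tdata.add_item("/data/radar_1.netcdf.gz", "/scratch/radar_1.netcdf.gz")

task_id = tc.submit_transfer(tdata)["task_id"]

# The task document carries the per-transfer log fields described above
# (request/completion times, faults, throughput, files and bytes transferred).
doc = tc.get_task(task_id)
print(doc["status"], doc.get("effective_bytes_per_second"))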
3.2.2 Monitoring Tools
Darshan. I/O performance behavior is gathered with Darshan [171], a lightweight, application-level I/O profiling tool for HPC that captures statistics about the behavior of HPC I/O operations. Darshan captures
data for each file opened by the application, including I/O operation counts, common I/O access sizes,
cumulative timers, and so on. I/O behavior is captured for POSIX IO, MPI-IO, HDF5, and Parallel netCDF
data interface layers [146, 75, 81]. Darshan also captures a set of job-level characteristics such as the
number of application processes, the job’s start and end times, and the job unique identification provided
by the scheduler. Lastly, Darshan can instrument I/O functions in both statically and dynamically linked
executables.
Tstat. The TCP STatistic and Analysis Tool (Tstat) [181] is an open source, passive trace collection tool
used to capture a packet-level log of a comprehensive set of parameters such as TCP’s RTT, congestion
window size, ACK/SYN/FIN counts, and retransmitted, reordered or lost packets. It also covers a wide
spectrum of network activities such as TCP, UDP, and RTP/RTCP traffic. Tstat distinguishes between
“complete” and “not complete” flows and also between clients (hosts that actively open a connection) and
servers (hosts that passively listen for connection requests).
pegasus-kickstart. Compute jobs in Pegasus are wrapped using a lightweight executable written in C
called pegasus-kickstart [84] that captures runtime job provenance data. The toolkit provides useful information about the execution of the wrapped task such as (1) information about the execution node
(architecture, OS, number of cores, available memory); (2) the environment setup of the machine while the
task was running; (3) the task’s characteristics and performance data (arguments, start time, duration, exit
code, etc.); and (4) the task’s output logs (stdout and stderr).
3.3 Online Monitoring System
Figure 3.1: Architecture overview of the end-to-end online performance data capture and analysis process, showing how the Pegasus workflow management system is used to collect and publish workflow logs, execution traces, and transfer logs.
Previous works [60, 58, 57] have demonstrated the potential of workflow performance data capture and analysis to estimate workflow resource requirements and to detect anomalies present in a single
workflow execution trace generated by the workflow management system. In this thesis, we aim to collect and correlate information from heterogeneous monitoring components that are resident in various
execution sites. These monitoring sources include underlying data transfer infrastructure for workflows
(i.e., Globus Online and Tstat) and I/O performance monitoring tools (i.e., Darshan). Each of these sources
has its own proprietary format that introduces challenges related to how information can be integrated.
By approaching the problem from a system architecture perspective (Figure 3.1), we use the RabbitMQ
message broker [144] as the central endpoint where monitoring information is gathered from the various
sources. Underneath the message broker, we have deployed a standard ELK stack [48]. The ELK stack consists of the Elasticsearch, Logstash, and Kibana open-source tools. Elasticsearch provides a RESTful search
and analytics endpoint. Logstash is a data processing pipeline that ingests data from multiple sources simultaneously, transforms it, and then sends it to a “stash” like Elasticsearch. Kibana lets users query and
visualize data with charts and graphs in Elasticsearch. Combined, these tools provide a complete platform
for data storage, retrieval, sorting, and analysis. We use Logstash’s RabbitMQ input plugin to fetch data
from the AMQP endpoint and push it into an Elasticsearch instance. This plugin monitors the RabbitMQ
queues for new messages with monitoring information, and reprocesses them into a common format that
can be ingested into Elasticsearch, using custom preprocessing functions we have introduced. Additionally,
we have built a custom workflow dashboard as a Kibana plugin [165]. This dashboard allows users to select
a particular workflow and visualize the related performance data in near real-time: both aggregated at a
workflow level and a per job level.
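For illustration, the short Python sketch below mimics what the Logstash pipeline does in our deployment: it consumes monitoring messages from a RabbitMQ queue, applies a minimal normalization step, and indexes the result into Elasticsearch. The queue name, index name, and connection URLs are placeholders; the actual deployment relies on Logstash rather than on this script.

# Illustrative stand-in for the Logstash pipeline: consume monitoring events
# from RabbitMQ, normalize them, and index them into Elasticsearch.
# Queue/index names and URLs are placeholders.
import json

import pika
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

def handle_message(channel, method, properties, body):
    event = json.loads(body)
    # Minimal normalization: ensure every document carries an event name
    # and a timestamp under the same keys before indexing.
    doc = {
        "event": event.get("event", "unknown"),
        "ts": event.get("ts"),
        "payload": event,
    }
    es.index(index="panorama-workflow-events", document=doc)

connection = pika.BlockingConnection(
    pika.URLParameters("amqp://username:password@hostname:5672/%2F")
)
channel = connection.channel()
channel.basic_consume(queue="monitoring", on_message_callback=handle_message,
                      auto_ack=True)
channel.start_consuming()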
In this section, we present an overview of the data collection framework. We describe the online monitoring capabilities newly added to pegasus-kickstart [84], which are fundamental for workflow performance data gathering. We then describe the data collection challenges encountered and the extensions
we have introduced to pegasus-transfer and pegasus-monitord to tackle them. Additionally, we develop a new tool, pegasus-darshan, that enables the extraction of Darshan logs in real time. This is followed by a
description of the data capture flow during the execution of a workflow job. Finally, we present the Kibana
plugin for data discovery and visualization.
3.3.1 Pegasus-Kickstart Online Monitoring
Near real-time monitoring enables the rapid detection of poor performance issues and workflow or infrastructure anomalies. By harvesting such information in a timely fashion, one could identify, mitigate, or
prevent undesired behaviors at runtime. To this end, we have extended pegasus-kickstart to include fine-grained monitoring capabilities that can pull resource usage statistics of running workflow tasks at a predefined time interval. The statistics include resource usage from accelerators (e.g., Nvidia GPUs) on top
of the statistics offered by Linux OS counters. The maximum polling frequency is limited to one second
to prevent system flooding. This information is then published to an AMQP endpoint in JavaScript Object
Notation (JSON) format so it can be ingested to a permanent storage (e.g., InfluxDB) or to an analysis
framework (e.g., Elasticsearch). Table 3.1 summarizes performance metrics and workflow characteristics
provided by pegasus-kickstart∗.
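To make the shape of these events concrete, the following sketch builds one kickstart.inv.online record (field names follow Table 3.1; the values, host, credentials, and exchange name are made up) and pushes it to RabbitMQ's HTTP publish endpoint, similar in spirit to how pegasus-kickstart publishes to the KICKSTART_MON_URL endpoint described in Section 3.4. It is a sketch, not the actual pegasus-kickstart code path.

# Illustrative kickstart.inv.online event pushed to RabbitMQ's HTTP API.
# Field names follow Table 3.1; values, host, credentials, and the
# "monitoring" exchange are placeholders.
import json
import time

import requests

event = {
    "event": "kickstart.inv.online",
    "ts": int(time.time()),
    "hostname": "worker-1.example.org",
    "site": "condorpool",
    "wf_uuid": "00000000-0000-0000-0000-000000000000",
    "dag_job_id": "namd_ID0000001",
    "pid": 12345,
    "rank": 0,
    "utime": 12.3, "stime": 0.8, "iowait": 0.1,
    "vm": 2.1e9, "rss": 1.4e9,
    "procs": 1, "threads": 8,
    "bread": 5.2e8, "bwrite": 1.1e8,
}

resp = requests.post(
    "https://hostname:15672/api/exchanges/%2F/monitoring/publish",
    auth=("username", "password"),
    json={
        "properties": {},
        "routing_key": "kickstart.inv.online",
        "payload": json.dumps(event),
        "payload_encoding": "string",
    },
)
resp.raise_for_status()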
3.3.2 Data Collection - Extending Workflow Monitoring
To support the online nature of our data capture system, we have designed and extended Pegasus to include
online publishing capabilities. Specifically, we have modified the monitoring daemon (pegasus-monitord)
and the transfer tool (pegasus-transfer). In addition to Pegasus-specific tools, we have also developed
mechanisms to gather, preprocess, and publish job-level and infrastructure-level performance data from system profiling tools, such as Darshan, and from transfer services, such as Globus Online. Below is a summary of the challenges intrinsic to each tool and a description of our approach to overcoming them.
∗Currently, this implementation is available online in a separate GitHub branch: https://github.com/pegasus-isi/pegasus/tree/panorama
Field – Description
event – type of event (kickstart.inv.online)
ts – timestamp of the measurement
hostname – the hostname of the compute node
site – the execution site
wf_uuid – the workflow UUID
wf_label – the workflow label
dag_job_id – the job ID referring to Pegasus's dag
xformation – the job label from Pegasus's dag
task_id – the task ID
pid – the process ID
exe – the invoked executable
rank – the process rank
utime – time spent on executing user code
stime – time spent on executing system code
iowait – time spent on waiting for IO
vm – virtual memory used by the process
rss – resident-set size memory used by the process
procs – number of processes
threads – number of threads
bread – number of bytes read
bwrite – number of bytes written
rchar – number of chars read
wchar – number of chars written
syscr – number of read system calls
syscw – number of write system calls
bsend – number of bytes sent
brecv – number of bytes received
Table 3.1: Summary of online workflow performance metrics and characteristics provided by pegasus-kickstart.
pegasus-monitord. The Pegasus monitoring daemon follows a workflow execution and records provenance information from the workflow and its completed jobs. Typically, coarse-grained runtime provenance information is populated to a relational datastore (the Stampede Workflow
Database [73]) upon job completion. To enable near real-time, fine-grained monitoring while still gathering and storing provenance data in a traditional relational database, we have developed a multiplexing
capability that allows pegasus-monitord to publish events to an AMQP endpoint in addition to the relational database. We have also extended pegasus-monitord to parse additional monitoring events from job
stdout records to facilitate population of events from Darshan.
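The sketch below illustrates the kind of stdout scanning this extension performs: it extracts JSON monitoring payloads wrapped between the start and end keyword tags shown later in Listing 1 (Section 3.3.3) and turns each one into an event dictionary. The parsing logic is illustrative and does not reproduce the actual pegasus-monitord implementation.

# Minimal sketch of scanning a job's stdout for tagged JSON monitoring
# payloads (marker strings follow Listing 1) and yielding one event dict
# per payload. Illustrative only.
import json
import re

START = "@@@PEGASUS_MONITORING_PAYLOAD - START @@@"
END = "@@@PEGASUS_MONITORING_PAYLOAD - END @@@"
PAYLOAD_RE = re.compile(re.escape(START) + r"(.*?)" + re.escape(END), re.DOTALL)

def extract_monitoring_events(job_stdout: str):
    """Yield one event dict per tagged JSON segment found in a job's stdout."""
    for match in PAYLOAD_RE.finditer(job_stdout):
        record = json.loads(match.group(1))
        yield {
            "event": "stampede.task.monitoring",
            "monitoring_event": record.get("monitoring_event"),
            "ts": record.get("ts"),
            "payload": record.get("payload", []),
        }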
pegasus-transfer. The Pegasus transfer tool performs data movement operations by invoking the appropriate underlying data transfer tool based on the protocol scheme specified for the source and destination
URLs. It supports a variety of protocols such as SCP, GridFTP, HTTP, S3, stashcp, file copy, etc. [10, 13,
191]. Upon completion, transfer logs are summarized into file records (job’s standard output files) which
include the number of transfer operations performed, the amount of data transferred, and the transfer
rate. Due to the pressing need for better and more rapid detection of data transfer issues, we have extended pegasus-transfer to perform transfer operations via the Globus Online transfer service. Not only
does it provide fast, secure, and reliable data movement operations for workflow data transfers, but Globus
Online also provides a querying mechanism to retrieve the transfer status near real-time. After a transfer
finishes, either with a successful or a failed status, pegasus-transfer queries the Globus transfer service
and publishes detailed information about the transfer to the AMQP endpoint. Fine-grained transfer logs
include:
• Transfer request, start, and completion times;
• Transfer steps (for long running transfers);
• Accurate transfer throughput;
• Level of concurrency and parallelism used in transfer;
• Number of subtasks failed and retried; and
• Human-readable error messages describing the failure.
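The following sketch captures the polling pattern described above: it waits for a Globus transfer task to reach a terminal state and then publishes a summary event to the AMQP endpoint. The globus_sdk and pika calls, as well as the transfer.globus.log event name, are illustrative placeholders for what pegasus-transfer does internally.

# Sketch: poll a Globus transfer until it finishes, then publish a summary
# event to AMQP. Event name, exchange, and URLs are placeholders.
import json

import globus_sdk
import pika

def wait_and_publish(tc: globus_sdk.TransferClient, task_id: str, amqp_url: str):
    # task_wait returns True once the task is no longer ACTIVE; poll every 30 s.
    while not tc.task_wait(task_id, timeout=30, polling_interval=30):
        pass
    task = tc.get_task(task_id)
    event = {
        "event": "transfer.globus.log",   # placeholder event name
        "task_id": task_id,
        "status": task["status"],
        "bytes_transferred": task["bytes_transferred"],
        "faults": task["faults"],
    }
    connection = pika.BlockingConnection(pika.URLParameters(amqp_url))
    channel = connection.channel()
    channel.basic_publish(exchange="monitoring",
                          routing_key="transfer.globus.log",
                          body=json.dumps(event))
    connection.close()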
Field – Description
event – event type: stampede.task.monitoring
monitoring_event – monitoring event type: darshan.perf
darshan_log_version – log file version number
exe – name of the executable
uid – user id that job ran as
jobid – job id from the scheduler
start_time – start time of the job
end_time – end time of the job
nprocs – number of MPI processes
run_time – run time of the job in seconds
STDIO.* – STDIO module data
POSIX.* – POSIX module data
Table 3.2: Summary of Darshan metrics captured during workflow execution. Since Darshan logs are only produced at job completion, near real-time monitoring is attained at per job completion granularity.
Darshan. We use the Darshan profiling tool to gain insights into the I/O performance of Message Passing
Interface (MPI) applications. Darshan is usually enabled by default on large HPC systems and its logs
are available in a proprietary binary format at a standard location for each job when it completes. We
have developed a new tool called pegasus-darshan that: (1) determines the corresponding log file for a
particular job based on the local resource manager’s job ID; and (2) parses the binary file and generates a
JSON record containing the relevant Darshan statistics. This tool is invoked at the end of each MPI remote
job execution, and the extracted Darshan information gets encoded in the job’s stdout. The JSON record
containing information from Darshan (Listing 1) is parsed by pegasus-monitord and is published to the
AMQP endpoint. Currently, we only extract a subset of the available Darshan statistics (summarized in
Table 3.2).
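As a rough illustration of what a pegasus-darshan-style parser has to do, the sketch below shells out to the darshan-parser utility and collects its header lines into a JSON-serializable record. The "# key: value" header format is an assumption about darshan-parser's text output, and the sketch does not reproduce the actual pegasus-darshan implementation, which reads the binary log and emits the fields of Table 3.2.

# Rough sketch: extract "# key: value" header lines from darshan-parser
# output into a dict. The header-line format is an assumption.
import json
import subprocess

def parse_darshan_header(logfile: str) -> dict:
    text = subprocess.run(["darshan-parser", logfile],
                          capture_output=True, text=True, check=True).stdout
    record = {}
    for line in text.splitlines():
        # Header lines look like "# nprocs: 8"; stop at the first data row.
        if line.startswith("# ") and ": " in line:
            key, _, value = line[2:].partition(": ")
            record[key.strip()] = value.strip()
        elif line and not line.startswith("#"):
            break
    return record

if __name__ == "__main__":
    import sys
    print(json.dumps(parse_darshan_header(sys.argv[1]), indent=2))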
Tstat. Execution sites often do not provide direct access to Tstat logs. Instead, execution sites make a
subset of preprocessed records available to their users. For instance, at the National Energy Research
Scientific Computing Center (NERSC), Tstat processed logs are published to an ELK cluster, which users
can access via a Kibana dashboard. One thing to note is that not all Tstat records are made available in
the ELK cluster (NERSC filters records where transfer throughput is below 100 MB/s, i.e. transfers from/to
slow speed networks are omitted). Due to the delay (up to several hours) in preprocessing the large amount
of transfer logs, online monitoring becomes challenging. Finally, Tstat captures very low-level network
statistics that do not contain metadata associating TCP flows to a particular data transfer. Such disjointed
information from TCP flow fine-grained statistics and data transfers hinders the ability to identify issues
concerning a specific transfer operation. The problem becomes more complex in large-scale production
environments, where hundreds of transfers may occur simultaneously.
Data collection with Tstat is currently performed at the execution site using a best effort approach.
Globus uses GridFTP for file transfers and spawns concurrent GridFTP processes to transfer multiple files
simultaneously. Furthermore, each GridFTP process uses multiple TCP connections [10, 107], and connections are reused for all files within a process. In order to correlate individual file transfers with TCP flows, we filter the Tstat data using the transfer time windows and other Globus features, including source and
destination hosts.
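A simplified version of this correlation step is sketched below: only the Tstat flow records whose endpoints match the transfer and whose lifetime overlaps the transfer window are kept. The flow-record field names (src, dst, start, end) stand in for whatever normalized schema the preprocessed Tstat logs expose and are assumptions for illustration.

# Sketch: keep the TCP flow records that plausibly belong to one Globus
# transfer, matching endpoints and overlapping time windows.
from typing import Iterable

def flows_for_transfer(flows: Iterable[dict], transfer: dict) -> list:
    matched = []
    for flow in flows:
        same_hosts = (flow["src"] == transfer["source_host"]
                      and flow["dst"] == transfer["destination_host"])
        # Keep flows whose [start, end] window overlaps the transfer window.
        overlaps = (flow["start"] <= transfer["completion_time"]
                    and flow["end"] >= transfer["request_time"])
        if same_hosts and overlaps:
            matched.append(flow)
    return matched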
3.3.3 A Versatile Approach for Integrating New Tools
As monitoring tools are constantly evolving, we argue it is crucial to provide a way to seamlessly integrate
new tools into the overall monitoring and analysis system. To this end, we have extended pegasus-monitord
with the ability to parse additional monitoring events from job stdout records. This feature allows us to
create job wrapper scripts that can invoke arbitrary monitoring tools and append statistics produced by
these tools to the stdout upon task completion. These statistics have to be in JSON format and must be
wrapped within an output segment specified by keyword tags that indicate the start and end of a record.
While pegasus-monitord is processing stdout records, it is able to identify this special segment and trigger
a monitoring event after parsing the JSON document. By following this approach, the monitoring data
will be added to Elasticsearch under the index hosting the workflow events by default. However, by using
Logstash, these events can be filtered, preprocessed, and finally ingested into a new custom index. In our
architecture, we used this approach to add the data produced by Darshan on the remote execution site to Elasticsearch, as described in Section 3.3.2. A generic monitoring payload consists of a monitoring event, a payload, and a timestamp. An example of a generic monitoring payload can be found in Listing 1.
@@@PEGASUS_MONITORING_PAYLOAD - START @@@
{
"monitoring_event": "darshan.perf",
"payload": [
{
"POSIX_module_data": {...},
"STDIO_module_data": {...},
"compression_method": "ZLIB",
"darshan_log_version": "3.10",
"end_time": 1531941742,
"end_time_asci": "Wed Jul 18 19:22:22 2018",
"exe": "namd2 equilibrate.conf",
"jobid": "1547",
"metadata": {
"h": "romio_no_indep_rw=true;cb_nodes=4",
"lib_ver": "3.1.6"
},
"nprocs": 8,
"run_time": 65.0,
"start_time": 1531941678,
"start_time_asci": "Wed Jul 18 19:21:18 2018",
"uid": "1003"
}
],
"ts": 1531941740
}
@@@PEGASUS_MONITORING_PAYLOAD - END @@@
Listing 1: Generic monitoring payload example.
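Following this approach, a job wrapper only needs to run the desired monitoring tool and append a tagged payload to its stdout. The minimal sketch below does exactly that for a hypothetical my-io-profiler tool that emits JSON; both the tool name and the event name are placeholders.

# Minimal wrapper sketch: run an arbitrary monitoring tool that prints JSON
# and append its output to the job's stdout wrapped in the keyword tags of
# Listing 1 so that pegasus-monitord can pick it up.
import json
import subprocess
import time

START = "@@@PEGASUS_MONITORING_PAYLOAD - START @@@"
END = "@@@PEGASUS_MONITORING_PAYLOAD - END @@@"

def emit_monitoring_payload(monitoring_event: str, tool_cmd: list):
    stats = json.loads(subprocess.run(tool_cmd, capture_output=True,
                                      text=True, check=True).stdout)
    payload = {
        "monitoring_event": monitoring_event,
        "payload": [stats],
        "ts": int(time.time()),
    }
    print(START)
    print(json.dumps(payload, indent=2))
    print(END)

if __name__ == "__main__":
    # e.g., wrap a hypothetical "my-io-profiler --json" invocation
    emit_monitoring_payload("custom.io.profile", ["my-io-profiler", "--json"])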
3.3.4 Data Capture Flow
To convey application and infrastructure performance data from various heterogeneous data sources to
the workflow’s end user in a comprehensive and coherent fashion, we have designed and implemented
the flow illustrated in Figure 3.1. The information flow entry point is a Pegasus workflow, and performance information collected during the workflow execution is made accessible through a custom Kibana
dashboard or the Pegasus dashboard REST API.
On a submit host (the machine from which a user submits a workflow; it can be a headless login node, a laptop, or another terminal endpoint), Pegasus takes in a high-level description of the user
workflow and generates an executable workflow that is managed by HTCondor DAGMan. DAGMan releases jobs to the local HTCondor scheduler when they are ready for execution, and the scheduler in turn submits the
jobs to remote resources for execution. On a remote resource, the jobs are launched by pegasus-kickstart,
which monitors job execution and sends online monitoring events to the RabbitMQ message broker. When
a compute job is an MPI job, the job wrapper automatically invokes pegasus-darshan to parse I/O characteristics from Darshan logs upon completion of the job and encodes them as part of the job’s standard
output (stdout). The job stdout is then included in the pegasus-kickstart output, which is transferred
back automatically to the workflow submit host. On the submit host, the job information and the DAGMan logs are parsed by pegasus-monitord, which publishes this information to the RabbitMQ message
broker and populates the Pegasus Stampede workflow database. Workflow data transfer jobs are executed
using pegasus-transfer. While managing the transfers, pegasus-transfer initiates transfer requests to the
Globus transfer service and frequently (at a predefined time interval) performs pull requests inquiring
about the status of the transfers. Upon completion, Globus transfer logs are retrieved and published by
pegasus-transfer to the RabbitMQ message broker. All the online monitoring data published to RabbitMQ
are automatically fed into Elasticsearch using the Logstash connector, and afterwards are accessible in
the custom Kibana dashboard plugin. Additionally, workflow and job level data recorded in the relational
workflow database (Pegasus Stampede [73]) is made available via the Pegasus dashboard REST API.
3.3.5 Data Discovery and Visualization
The goal of this thesis is to provide a resource for the collection, discovery, analysis, and sharing of end-to-end performance data of scientific workflow executions. In our proposed system, data archiving and
browsing are managed as follows. Structured data is stored in a traditional relational database (Pegasus
Stampede) that permanently records workflow events and statistics. This data can then be accessed via
a REST API to query the workflow status and the job-specific or workflow-specific performance metrics
Figure 3.2: Screenshots of the Kibana plugin for near real-time monitoring of workflow performance metrics. Top: workflow progression and detailed job characteristics. Bottom: time series data of job performance.
during execution. Online performance data, as well as workflow events, are also stored in an unstructured
format in Elasticsearch. By generating inverted indexes for every field in the data (inverted indexes can
be used simultaneously in queries), data discovery is empowered by full text search, which allows for the
unveiling of hidden knowledge (e.g., differentiating identical error codes by their ASCII error messages).
For our experiments, the Elasticsearch deployment is centralized in a single server. As data ingestion and
querying traffic become larger, we plan to enable the distributed deployment (shards) of Elasticsearch.
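As an illustration of the kind of discovery this enables, the query sketch below asks Elasticsearch for the most recent events of one workflow whose error output mentions a particular phrase. The index and field names mirror the events described in this chapter but are assumptions rather than the exact schema.

# Illustrative full-text query over the workflow-event index: the latest
# events of one workflow whose stderr mentions "permission denied".
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

resp = es.search(
    index="panorama-workflow-events",
    query={
        "bool": {
            "must": [
                {"match": {"wf_uuid": "00000000-0000-0000-0000-000000000000"}},
                {"match_phrase": {"stderr": "permission denied"}},
            ]
        }
    },
    sort=[{"ts": {"order": "desc"}}],
    size=20,
)
for hit in resp["hits"]["hits"]:
    print(hit["_source"]["ts"], hit["_source"].get("event"))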
To enable near real-time monitoring and easy, interactive visualization of workflow characteristics and
performance metrics, we have developed an open-source Kibana plugin [165] (Figure 3.2). This dashboard
combines and summarizes data gathered from the various sources described in this chapter and displays
concise information (in the form of tables and plots) about the workflow execution at runtime. The goal of
this dashboard is to allow users to effortlessly pinpoint performance issues without needing to review the
hundreds of MBs of execution logs generated during execution. As events occur and their data are pushed
into Elasticsearch, the user can follow the progression of the workflow in near real-time. Statistical or
Machine Learning (ML) analyses on the data can be performed by simply running one or a few of the
multitude of Kibana plugins freely available online, all within the same platform. Our plugin is a live
product and is constantly evolving with new capabilities. The current collection of available plots includes
time-series analysis of workflow-level and job-level statistics such as CPU utilization, I/O read and write
operations, I/O wait, I/O throughput and runtime, among others (Figure 3.2).
3.4 Deploying the Online Monitoring Architecture
In this section, we describe how the proposed architecture can be deployed. Our architecture allows users
to partially or fully enable monitoring features by simply providing specific Pegasus profiles or properties
as described below. The source code is publicly available on GitHub [161].
pegasus.monitord.encoding = json
pegasus.catalog.workflow.amqp.url = \
amqp://[username:password]@hostname[:port]/exchange_name
pegasus.catalog.workflow.amqp.events = stampede.*
Listing 2: Enabling Pegasus Stampede events via the Pegasus properties file.
...
<profile namespace="env" key="PEGASUS_TRANSFER_PUBLISH">1</profile>
<profile namespace="env" key="PEGASUS_AMQP_URL">amqp://[username:password]@hostname[:port]/exchange_name</profile>
...
Listing 3: Enabling Pegasus Transfer events via the Pegasus sites catalog.
Enabling Stampede Events. In order to direct pegasus-monitord to publish all of its events to the AMQP
endpoint in JSON format, three properties must be specified in the workflow’s properties file (e.g., pegasus.properties): (1) pegasus.monitord.encoding enables the JSON output format;
(2) pegasus.catalog.workflow.amqp.url contains the connection information to the AMQP
endpoint; and (3) pegasus.catalog.workflow.amqp.events filters which events should be published. Listing 2 shows an example of these properties in practice.
Enabling Transfer Events. To enable the mechanisms to publish transfer statistics from the Globus
Transfer Service to an AMQP endpoint in JSON format, two Pegasus profiles must be specified in the workflow’s sites catalog (e.g., sites.xml), under the site where pegasus-transfer will be invoked (e.g., local),
as shown in Listing 3. The environment variable PEGASUS_TRANSFER_PUBLISH operates as an on/off switch
to the transfer monitoring, and the PEGASUS_AMQP_URL variable provides the AMQP endpoint definition to
pegasus-transfer.
Enabling Kickstart Online Traces. To publish traces of resource usage statistics, two Pegasus profiles
must be specified in the workflow’s sites catalog (e.g., sites.xml) under the compute site (Listing 4):
...
<profile namespace="pegasus" key="gridstart.arguments">-m interval_seconds</profile>
<profile namespace="env" key="KICKSTART_MON_URL">rabbitmq://[USERNAME:PASSWORD]@hostname[:port]/...</profile>
...
Listing 4: Enabling Kickstart online traces via the Pegasus sites catalog.
(1) pegasus.gridstart.arguments instructs pegasus-kickstart to collect resource usage statistics every N
seconds; while (2) KICKSTART_MON_URL points to the AMQP broker's REST API for publishing data, which looks similar
to “api/exchanges/exchange_name/publish”. This way, pegasus-kickstart will push such information to
an AMQP endpoint in JSON format.
Enabling Darshan Statistics. As mentioned in Section 3.3, we use a wrapper script to retrieve Darshan
logs. This script identifies the location of the generated Darshan logs and invokes pegasus-darshan after the
completion of the MPI job. The “pegasus-darshan” tool parses the Darshan logs and outputs a monitoring
payload (Listing 1). The propagation of these events depends on whether the Stampede events [73] have
been enabled. An example of the wrapper script, which was used to retrieve Darshan logs from Cori, an
HPC system at NERSC, is shown in Listing 5.
Setting Up The Monitoring Backend. Configuring the monitoring backend (RabbitMQ, Elasticsearch,
Logstash, and Kibana) can turn out to be a cumbersome and challenging process, especially if one is not
familiar with these tools and only wants to collect and analyze workflow execution statistics. By leveraging
container technologies (Docker [23] and Docker Compose [46]), we have developed a container orchestration mechanism that spins up all the required services preconfigured to capture all events produced by the
workflow execution. Additionally, this automation spins up a version of Kibana with our Kibana plugin
installed (Figure 3.2). All container services use persistent volumes, which consequently maintain the state
#!/bin/bash -l
srun $PEGASUS_HOME/bin/pegasus-monitor namd2 "$@"
# After the job completes, locate and parse the Darshan log(s) for this Slurm job
DAY=$(date '+%d')
DAY=${DAY##0}
MONTH=$(date '+%m')
MONTH=${MONTH##0}
YEAR=$(date '+%Y')
darshan_base=${DARSHAN_LOGDIR}/${YEAR}/${MONTH}/${DAY}
darshan_file=\
${darshan_base}/${SLURM_JOB_USER}_*${SLURM_JOB_ID}_*.darshan
for f in $darshan_file; do
$PEGASUS_HOME/bin/pegasus-darshan -f "$f"
done
Listing 5: Example of a wrapper script for gathering Darshan statistics from Cori at NERSC.
of the daemons and the collected data between restarts. Finally, all services can be triggered using a single
command (docker-compose up -d), which is reliable and easy to use [160].
3.5 Case Studies
In this section, we present case studies with two real-world scientific workflow applications: a CPU-intensive material science application and a data-intensive genomics application. The goal of these case
studies is to demonstrate the ability of our system to accurately capture performance metrics that are critical for improving the efficiency and resilience of current and upcoming systems and scientific workflow
applications. We experiment within a controlled environment, where anomalies are injected at runtime,
so that we can measure whether our system adequately captures such behaviors.
3.5.1 Scientific Applications
Spallation Neutron Source. We use a material science-related workflow developed at the Spallation
Neutron Source (SNS) [117], a DOE research facility at Oak Ridge National Laboratory. The SNS workflow
Figure 3.3: A diagram of a branch of the SNS workflow.
executes an ensemble of molecular dynamics (MD) and neutron scattering intensity calculations to optimize a model parameter value, for example, to investigate temperature and hydrogen charge parameters
for models of water molecules. The workflow takes as input a set of temperature values and four additional parameters: (1) type of material, (2) the number of required CPU cores, (3) the number of timesteps
in the simulation, and (4) the frequency at which the output data is written. Figure 3.3 shows a branch
of the workflow that analyzes a single temperature value. First, each set of parameters is fed into a series
of parallel molecular dynamics simulations using NAMD [143]. The first simulation computes an equilibrium, which is used by the second simulation to compute the production dynamics. The output from
the MD simulations has the global translation and rotation removed using AMBER’s [158] cpptraj utility,
which is passed into Sassena [100] to compute coherent and incoherent neutron scattering intensities from
the trajectories. The final outputs of the workflow are transferred to the user’s desktop and loaded into
Mantid [17] for analysis and visualization. In our experiments, we configured the SNS workflow to spawn
8 MPI processes for the Equilibrate stage, 16 MPI processes for the Production stage, and 16 MPI processes
for both Coherent and Incoherent neutron scattering intensities calculations. Unpacking the Sassena DB
and executing cpptraj were done using a single core only.
1000Genome. The 1000 genomes project provides a reference for human variation, having reconstructed
the genomes of 2,504 individuals across 26 different populations [2]. The test case used in this work identifies mutational overlaps using data from the 1000 genomes project in order to provide a null distribution
for rigorous statistical evaluation of potential disease-related mutations [55]. This test case (Figure 3.4)
is composed of five different tasks: (1) individuals – fetches and parses the Phase 3 data from the 1000
Figure 3.4: Overview of the 1000Genome sequencing analysis workflow.
genomes project per chromosome; (2) populations – fetches and parses five super populations (African,
Mixed American, East Asian, European, and South Asian) and a set of all individuals; (3) sifting – computes the SIFT scores of all of the SNPs (single nucleotide polymorphisms) variants, as computed by the
Variant Effect Predictor; (4) pair overlap mutations – measures the overlap in mutations (SNPs) among
pairs of individuals; and (5) frequency overlap mutations – calculates the frequency of overlapping mutations across subsamples of certain individuals. In order to fit an instance of the workflow execution into
our testbed (see description below), we are processing 2 chromosomes for which we have pruned the original datasets to about 10% (about 1GB each) of the original data (about 11GB per individual dataset). For
this experiment, each workflow is composed of 22 individuals jobs, 2 sifting jobs, 14 frequency overlap
mutations jobs, and 14 pair overlap mutations jobs.
3.5.2 Experimental Setup
Figure 3.5 presents the experimental system on the ExoGENI cloud testbed [20]. It is orchestrated over
a federation of independent cloud sites located across the US and connected via national research circuit
providers such as Internet2 [77] and ESNet [49], through their programmable exchange points. ExoGENI
Figure 3.5: Experimental setup on the ExoGENI testbed.
provides users with isolated virtual compute, storage, and network resources named “slices” of infrastructure. ExoGENI uses its native ORCA (Open Resource Control Architecture) [30] control framework
software to offer a unified hosting platform for deeply programmable, multi-domain cloud applications.
By using a controlled execution environment such as ExoGENI, we ensure that system interference is minimized, so that the synthetically generated interference affects only the specific performance metrics targeted for evaluation.
Our setup consisted of one data node, one master node, and four compute (worker) nodes. Each node
had 4 vCPUs clocked at 2.2 GHz, 10 GB of RAM, and 75 GB of storage. Both the compute and the
master nodes were collocated on the same rack, while the data node was spawned on a rack in another
region. The master and the compute nodes communicated via the rack switch, while the data node could
be reached via ESnet’s network. To facilitate the execution of the workflows, we configured our slice with
HTCondor and Slurm, and we configured Pegasus on the master/submit node (where HTCondor managers
and schedulers, as well as Slurm master, reside). Globus Endpoints were created on both the master and
the data nodes. Additionally, the master node and the compute nodes had access to a shared file system
(NFS), which was physically located on master’s hard disk.
3.5.3 Network Throughput
In this set of experiments, we aim to demonstrate the ability of our framework to capture network anomalies due to low performance in a network. More specifically, we arbitrarily inject synthetic packet loss and
packet reordering anomalies during the execution of workflow transfer jobs so that we can assess whether
the information captured by our framework at runtime is sufficient to show discrepancies due to network
anomalies.
For this experiment, we performed runs of the 1000Genome workflow where the input data is staged in
from the data node to the master node (the execution site) with Globus transfer. Job scheduling, execution,
and data transport between the master and compute nodes are performed through HTCondor. Upon job
completion, output data is staged out to the data node using Globus transfer service. For each scenario
described below, we performed six runs of the 1000Genome workflow to ensure statistical significance
(error below 5%).
We used the Linux native Traffic Control (TC) [76] toolset to configure the Linux kernel packet scheduler so that we could introduce synthetic network and I/O anomalies, including delay, packet loss, and
jitter, among others. Figure 3.6a shows the effect of packet loss when using TC to introduce a randomized
percentage of packet losses at rates of 1%, 3%, and 5%. This interference is generated during the execution
of a Pegasus stage-in transfer job (stage_in_0_1) for the 1000Genome workflow, which transfers 790 MB
of input data via ESNet (from the data to the master node). When there is no interference in the network
connection, the mean transfer throughput is about 170 Mbps. In the event of randomized packet loss disturbances, network throughput significantly degrades by up to ∼6x – measured throughput is about 32,
18, and 9 Mbps for 1%, 3%, and 5% packet loss, respectively.
Although TCP attempts to mitigate out-of-order delivery of data packets, this is still a common issue in
today’s computational systems [203]. Not only does this impose significant computational overheads on
hosts, but it also impacts the throughput of TCP. Therefore, having the ability to accurately identify and
Figure 3.6: Transfer throughput of the 1000Genome workflow with (a) packet loss and (b) packet reordering.
assess such impact on workflow job executions is crucial to support the decision process of performing
actions to prevent or mitigate such effects. To this end, we inject synthetic packet reordering by arbitrarily
delaying some of the packets for 10 ms before dispatching them. For example, 30% reordering means 30% of
packets are delayed for 10 ms, while the remaining packets are dispatched immediately with a correlation
of 50%.
Figure 3.6b depicts distribution measurements of the network throughput for 10%, 30%, 50%, 70%, 90%,
and 100% packet reordering. Transfer throughput slows down noticeably as the packet reordering rate increases from 0% to 30% (∼0.2x degradation). Intriguingly, the network throughput rises again for
reordering at a 50% rate – as more packets are delayed, the more ordered they become. Similar behavior
can also be observed in the throughput for reordering rates of 70% and 90%, which are symmetrical to the
throughput of 30% and 10%, respectively.
Both of the network anomalies above have a significant impact on the transfer throughput. However,
it is difficult to identify the issue that degrades network throughput from the Globus transfer logs alone.
Therefore, Tstat becomes a centerpiece of our data capture architecture as it provides fine-grained low-level
network statistics of TCP flows. By combining both Globus transfer and Tstat logs into one monitoring
framework, we can derive a more complete picture of the data transfers, which is fundamental for end-to-end monitoring of workflow executions.
3.5.4 I/O Throughput
A typical bottleneck when running large-scale, data-intensive workflows is the heavy use of disk I/O. As
parallel file system performance is not keeping up with compute and memory performance, it is imperative
to properly identify situations where low I/O performance may severely impact the workflow makespan.
To attenuate this issue, both in situ and in transit solutions have been used to accelerate the workflow
performance [52, 53].
Figure 3.7: Cumulative I/O over time. Top: 1000Genome workflow without interference. Bottom: I/O
stressing of the workers.
To evaluate the ability of our framework to capture fine-grained I/O information, we used Stress [189],
a simple workload generator that can impose a configurable amount of CPU, memory, I/O, and disk stress
on the system. Figure 3.7 shows the cumulative reads and writes for the 1000Genome workflow over time,
with and without disk stress (top: regular workflow execution with no interference; bottom: workflow
execution with disk stress). To introduce the interference, we spawned one stress process on each worker
(compute) node that performed about 50 MB of I/O writing to the node’s disk continuously. Since workflow
tasks have dependencies, i.e., one child job does not start its execution until all its parents have completed,
the workflow is slowed down by a factor of ∼1.5 – i.e., disk stress degrades the node’s disk throughput,
thus jobs require more time to complete I/O operations.
The above performance metric is obtained with pegasus-kickstart, which provides time series data of I/O
read and write operations at runtime. Although this metric aids in pinpointing bottlenecks in the workflow,
it lacks fine-grained information regarding jobs’ I/O profiles. As previously mentioned, Darshan provides
I/O characterizations for HPC applications including properties such as patterns of access within files.
By combining Darshan and Pegasus logs, one can accurately identify the bottlenecks or low performance
issues due to I/O operations at different levels (e.g., single or parallel operations).
For this experiment, we used the SNS workflow, which has four parallel (MPI) jobs in its pipeline
and is instrumented with Darshan†. Similarly to the 1000Genome workflow execution, we stage in the
workflow input data from the data node to the master node via Globus transfer, which is then stored in
a parallel file system (NFS) accessible to the worker nodes. Computing jobs (NAMD and Sassena) are
submitted to HTCondor (used as a broker), which dispatches them to Slurm, so that MPI jobs can benefit
from all available resources. Note that in the parallel execution, multiple processes may write to a single
file simultaneously. In the evaluated scenario, all processes write to a single file in the parallel file system,
thus multiple I/O write/read requests may happen at the same time. Therefore, on the master node we
spawned two stress processes that write ∼50 MB each, and there was no disk stress on the worker nodes.
Figure 3.8 shows the parallel file system performance, captured by Darshan, during the execution of the
SNS’ NAMD and Sassena jobs. The performance of STDIO operations is not affected by the interference.
On the other hand, POSIX performance is severely impacted – I/O throughput is degraded by up to a factor of 3 when compared to a regular execution with no interference.
†For additional information, refer to Darshan's documentation: https://www.mcs.anl.gov/research/projects/darshan/documentation/
Figure 3.8: Average STDIO and POSIX performance for NAMD and Sassena Jobs obtained from Darshan’s
logs.
3.5.5 CPU Contention
External load is a common factor that often negatively impacts the workflow makespan. In shared environments such as grids, external processes (including processes spawned by different users sharing the
same resource) may substantially impact CPU performance. We have performed CPU stress tests on one
CPU per worker node during the execution of the SNS workflow. Figures 3.9a and 3.9b show the CPU utilization per rank of the NAMD MPI job without and with interference, respectively. Under no interference,
CPU utilization for each MPI rank is nearly 100%, while in the scenario with interference CPU utilization
fluctuates between 70% and 90%. This is because the stress processes were not pinned to a specific
core of each node with affinity, while each MPI process had affinity set during submission. Notice that due
to CPU utilization degradation, the NAMD job resulted in a slowdown by a factor of 5 – most probably
because NAMD largely depends on the performance of work synchronization between tasks of the MPI
jobs.
(a) Without interference. (b) With interference.
Figure 3.9: CPU utilization per rank for the NAMD MPI job. Without interference the CPU utilization
is steady, close to 100%. However with interference CPU utilization fluctuates between 70% and 90%. To
introduce interference, one stress process per worker consumed approx. 25% of the node’s CPU time.
3.6 Conclusion
In this chapter, we have presented a framework to orchestrate a number of well-established and newly
developed state-of-the-art tools in order to capture and correlate fine-grained and coarse-grained information about scientific workflow executions from various, heterogeneous sources, in an online manner. Such
sources include network (Globus, Tstat), filesystem (Darshan, kickstart), and compute resources (kickstart).
To this end, we have extended several components (pegasus-monitord, pegasus-transfer) of Pegasus WMS
to enable online performance monitoring. Moreover, we developed new tools (pegasus-darshan) to facilitate performance data collection on remote nodes. Additionally, we presented a custom Kibana plugin,
tailored to the needs of tracking a workflow execution and identifying potential issues at runtime. We
evaluated our approach by executing normal and anomalous runs of two different classes of workflows
in a controlled environment. Our experiments demonstrate that this novel monitoring system is able to
accurately collect relevant performance metrics that can then be used to identify and analyze performance
issues.
Chapter 4
Evaluation of Cyberinfrastructure Configuration’s Impact on Data
Intensive Workflow Performance
In recent years, machine learning has seen a lot of innovations. The emergence of new techniques (e.g.,
deep learning) that are adaptable and effective in many application domains, and the development of sophisticated, yet easy-to-use, frameworks (e.g., PyTorch, TensorFlow), have made the application of
machine learning a common approach in solving a variety of problems. DDDAS applications, such as the
orcasound (Section 1.2.2.1), have been incorporating machine learning modules in order to automate their
operations and remove the human in the loop. These ML components are data-intensive and their access
patterns to the data can vary significantly (e.g., training vs. inference).
Modern cyberinfrastructure provides significant flexibility and can be configured in many different
ways, but some configurations may be more suitable than others for DDDAS applications that include ML
components. In certain scenarios, a shared file system configuration may be more favorable over a non
shared filesystem configuration. However, in some cases data access patterns of the application may saturate a shared file system and lead to degraded performance. In this Chapter, we aim to provide insights into
how different cyberinfrastructure configurations impact the performance of data-intensive workflows. We
have developed three machine learning workflows from the domains of astronomy, medicine, and crisis
computing, and we leverage the framework we developed and presented in Chapter 3 to characterize their
behavior across the different phases of their pipelines (preprocessing, hyperparameter optimization, training, inference and evaluation). We then evaluate their performance on two distinct cyberinfrastructure
deployments, one using a non-shared filesystem and one with a shared filesystem.
4.1 Related Work
Scientific workflows and workflow management systems have been two extremely fertile research domains
in the last decades and have resulted in many contributions (see [102] for a relevant survey). However, recently, several
workflow management systems have emerged to accommodate the life cycle of ML pipelines executing
on high-performance computing environments and commercial clouds. Wozniak et al. [193] designed a
workflow framework for cancer research optimized for HPC resources. Many cloud-native WMS have also
been introduced. MLflow [200] and Pachyderm [141] are a few examples of cloud-native, Kubernetes [95]-based solutions tailored for ML workflows. In addition, major companies are rolling out their own cloud-native WMS in support of ML, with some examples being Uber (Michelangelo) [120] and Facebook
(FBLearner) [78]. These solutions target pure ML pipelines, where all the pre-processing is done on-the-fly, traditionally in Python, rather than scientific ML workflows, which may contain additional processing
steps relying on specialized software.
Apart from WMS, general purpose distributed computing frameworks such as Spark [201] and
Hadoop [192] have also been thoroughly investigated for ML workloads [98]. More recently, the research
focus has been shifted towards specialized ML frameworks built on top of serverless computing [26, 97].
Carreira et al. [26] have proposed Cirrus, a framework that facilitates the execution of ML workflows on
the cloud using a serverless approach.
Monitoring and profiling of ML pipelines have also been topics addressed in recent literature. Zhou
et al. [207] proposed an extension of HPCToolkit to enable fine-grained performance analysis on GPUs by collecting program counter samples on both the CPU and the GPU. Profiling and visualization tools have
been introduced by hardware vendors (e.g., NVIDIA with Nsight [131], its in-house profiling tool for GPU applications) and by ML frameworks such as PyTorch [147], which, when coupled with TensorBoard [175], can
help scientists analyze and understand ML algorithm behaviors and performance. However, these tools
focus on in-depth performance analysis and debugging of ML pipelines. In this chapter, we used a
more practical, coarse-grained, resource monitoring approach to capture the ML workflow executions and
did not delve into fine-grained analysis of function calls.
A vast majority of existing characterization and performance studies focus exclusively on the model
training or inference tasks, and use reference implementations of popular deep learning models that fail
to mimic the complexity of workloads used in scientific experiments [72, 35, 36]. In this thesis chapter,
we aim to develop a better understanding of the behavior of the machine learning pipelines in each part
of their life cycle (preprocessing, hyperparameter optimization, training, inference and evaluation), and
also evaluate how cyberinfrastructure configuration decisions can impact their performance. This work
can impact how resource management and job management in ML workflows are conducted to achieve
scalability and performance.
4.2 Workflows
In this section, we introduce each of the workflows we developed and discuss their general characteristics. The workflows feature different scientific domains (astronomy, medicine, and crisis computing) and
different supervised learning approaches (image classification, image segmentation, and natural language
processing). We decouple data pre-processing and training tasks, as data transformations for scientific experiments often require specialized software. The pre-processing tasks in the workflows exhibit data parallelism and can be executed simultaneously across available compute resources. The hyperparameter optimization (HPO) trials (where different values of hyper-parameters are evaluated) could also be executed in parallel; however, within the scope of this thesis, we use Bayesian optimization and execute the
trials sequentially. The inference is treated as a separate task so that the job can be easily deployed when
a trained model is used for predictions. In the next sections we describe the three different workflows we
use in our study.
4.2.1 Galaxy Classification Workflow
As described in Chapter 1, the Galaxy workflow classifies the morphology of galaxies into five distinct categories (completely round-smooth, in-between smooth, cigar-shaped smooth, edge-on, and spiral), using
deep learning techniques and images captured by the Sloan Digital Sky Survey (SDSS) [88].
Figure 4.1: Galaxy Classification Workflow [93].
Workflow Overview. The Galaxy Workflow (Figure 4.1) utilizes the Galaxy Zoo 2 dataset∗
that consists
of 61,578 RGB images, each of size 424x424x3 pixels (1.9 GB of compressed data). The first stage of the
workflow (Dataset Generation and Split) filters out galaxies based on their feature scores. This reduced
dataset of 28,790 images is split into training, validation, and test sets. These datasets are passed to Preprocess Images jobs where several data transformations (e.g., crop, downscale, whitening) are applied. To
address the problem of class imbalance in the dataset, Augment Images jobs generate additional instances of underrepresented galaxy types. Next, the VGG16 HPO job utilizes Optuna [7], an HPO framework,
to find a good set of hyperparameters (e.g., learning rate, numbers of transferred layers). The chosen
hyperparameters and all the data are sent to the Train VGG16 job where the model is trained with the
chosen hyper-parameters. The weights of the trained model are saved to a checkpoint file. Finally, the
Inference and Evaluation job runs predictions on the test set, generates statistics and plots that provide
insights into the quality of the trained model. The implementation of this workflow is based on a recent
publication [208] and is publicly available [93].
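To give a flavor of how the VGG16 HPO job drives the search, the condensed sketch below uses Optuna's default TPE (Bayesian optimization) sampler over two hyperparameters. The train_and_validate() helper and the search ranges are assumptions for illustration only, not the exact implementation of the workflow job.

# Condensed, hypothetical sketch of an Optuna-driven HPO step for VGG16.
import optuna

def train_and_validate(learning_rate: float, frozen_layers: int) -> float:
    # Placeholder: train VGG16 with the given hyperparameters on the
    # preprocessed and augmented images, then return validation accuracy.
    # A dummy score is returned here so the sketch runs end to end.
    return 0.0

def objective(trial: optuna.Trial) -> float:
    learning_rate = trial.suggest_float("learning_rate", 1e-5, 1e-2, log=True)
    frozen_layers = trial.suggest_int("frozen_layers", 0, 13)
    return train_and_validate(learning_rate, frozen_layers)

study = optuna.create_study(direction="maximize")  # TPE sampler by default
study.optimize(objective, n_trials=2)              # 2 trials, as in Table 4.1
print(study.best_params)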
4.2.2 Lung Segmentation Workflow
Precise detection of the borders of organs and lesions in medical images such as X-rays, CT, or MRI scans
is an essential step towards correct diagnosis and treatment planning (Chapter 1). Here, we implement a
workflow that employs supervised learning techniques to locate lungs on X-ray images.
Workflow Overview. The Lung Segmentation Workflow (Figure 4.2) uses the Chest X-ray Masks and Labels
dataset (800 high-resolution X-ray images and masks, 5.4 GB) available on Kaggle. The dataset is split into
training, validation, and test sets before the workflow starts. Each set consists of original lung images
(3000x2933 pixels each, 6.3 MB in size) and their associated masks (same resolution, 30 KB in size). The
Pre-process and Augment Images job resizes images (lungs and masks) to 256x256 pixels and normalizes
∗https://www.kaggle.com/c/galaxy-zoo-the-galaxy-challenge
Figure 4.2: Lung Segmentation Workflow [80].
lung X-rays. Additionally, for each pair of lung image and mask in the train dataset, two new pairs are
generated through image augmentation (e.g., rotations, flips). Next, the train and validation data are passed
to the UNet HPO job, where Optuna [7] explores different learning rates. The Train UNet job fine-tunes the
UNet model with the recommended learning rate on the concatenated train and validation set, and saves
the weights into a file. The Inference on Unet job uses the trained model to generate masks for the test
X-ray images. The final step of the workflow, the Evaluation job, generates a PDF file with the scores for
relevant performance metrics and prints examples of lung segmentation images produced by the model.
As the inference and evaluation steps are implemented as separated jobs, the Inference on Unet job can be
deployed independently for real-world predictions. The implementation of this workflow is publicly available [80].
4.2.3 Crisis Computing Workflow
Figure 4.3: Crisis Computing Workflow [92].
As mentioned in Chapter 1, in recent years, social media platforms like Twitter and Instagram have
proven to be valuable sources of critical information during disaster events. In this section, we implement
a crisis informatics application, which uses deep learning methods to automate extraction of valuable
information from social media threads.
Workflow Overview. The workflow consists of two pipelines that ingest, respectively, the pictorial and textual parts of social media (SM) posts. We use the CrisisMMD v2.0 [8] datasets (18,082 images and 16,058 texts, about
2 GB of data) and its accompanying data-split files. The textual part of the tweets is passed to the Preprocess Tweet Text job, where stop words, special symbols, and links are removed. Then, the train set of
clean tweets is embedded using GloVe [142] 200-d Twitter pre-trained vectors (758 MB) and then used
to find the best hyper-parameters to train the BiLSTM architecture (BiLSTM HPO job). Next, the model
is re-trained on the concatenated train and validation datasets, and its weights are saved and passed to
the Inference on BiLSTM job. Here, the model predicts labels for the text tweets. A similar procedure is
used for the image pipeline where, instead of BiLSTM, a ResNet50 model is used. Then, results from both
inference jobs are used in the Late Fusion Classification and Evaluation job to make final predictions where
information is classified as either informative or non-informative. The implementation of this workflow is
inspired by [133] and is available publicly [92].
Table 4.1 presents an overview of models and selected ML hyperparameters used for each of the workflows.
Table 4.1: Machine Learning Hyperparameters
Model Size #Params #Trials #Epochs Batch Size
Galaxy Classification Workflow
VGG-16 528 MB 138M 2 5 32
Lung Segmentation Workflow
UNet 81.6 MB 24.4M 10 25 32
Crisis Computing Workflow
ResNet50 98 MB 25M 2 5 8
BiLSTM 9 MB 1M 2 10 128
4.3 Experimental Setup
In this section, we present our experimental setup and describe the different data management scenarios
we used to evaluate the performance of the machine learning workflows on a distributed cyberinfrastructure, and discover potential bottlenecks that arise when running these complex applications. To facilitate
the execution of the ML workflows of Section 4.2, we developed them using the Pegasus Workflow Management System [42]. Additionally, to provide an execution platform, we used the Chameleon testbed [87]
and we created two deployments with different filesystem configurations, non-shared FS and shared FS, as described below.
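For reference, the condensed fragment below shows how one stage of such a workflow can be expressed with the Pegasus 5.x Python API; the executable path, file names, and arguments are placeholders rather than the exact workflow definitions used in our experiments.

# Condensed, hypothetical fragment of a Pegasus 5.x workflow definition.
from Pegasus.api import (File, Job, Transformation, TransformationCatalog,
                         Workflow)

wf = Workflow("galaxy-classification")

preprocess = Transformation("preprocess", site="local",
                            pfn="/opt/workflow/bin/preprocess_images.py",
                            is_stageable=True)
wf.add_transformation_catalog(
    TransformationCatalog().add_transformations(preprocess))

raw_images = File("train_images.tar.gz")
processed = File("train_images_processed.npz")

job = (Job(preprocess)
       .add_args("--split", "train")
       .add_inputs(raw_images)
       .add_outputs(processed))
wf.add_jobs(job)

wf.write("workflow.yml")  # then plan and submit with pegasus-plan as usual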
Figure 4.4: Non-shared filesystem deployment on Chameleon.
4.3.1 Chameleon Cloud Platform
We provision and configure selected Chameleon resources to have two environments: (i) an environment
without a shared filesystem and (ii) an environment with a shared filesystem. The non-shared filesystem setup (Figure 4.4) consists of one node located at the Texas Advanced Computing Center (TACC) acting
as a workflow submit node and HTTP server, and three worker nodes located in University of Chicago
(UChicago). The submit node hosts the WMS, which coordinates and launches workflow jobs on the
worker nodes. All of the nodes are bare metal nodes with 24 physical cores (hyperthreading disabled),
192 GB of RAM, and a 10 Gbps network connection; two of the worker nodes are each equipped with one NVIDIA RTX6000 GPU (24 GB of memory).
The shared filesystem setup (Figure 4.5) is deployed entirely within Chameleon at UChicago and uses
the same amount of compute resources as the non-shared filesystem setup. This time, however, the submit
node is configured with a Network Filesystem (NFS) to enable data access from the workers and also serve
as the temporary compute scratch location. Finally, both deployments use the same operating system
(Ubuntu 18.04) and software stack (HTCondor v8.8.9 and Pegasus Panorama-branch v5.1.0 [161]).
Figure 4.5: Shared filesystem deployment on Chameleon.
All experiments run the workflows using Pegasus [182] and Singularity containers [96]. We collect
statistics using the online monitoring framework we developed and presented in Chapter 3, execute each
workflow 10 times per configuration, and present average and standard deviation results over these 10
runs where applicable. During all the workflow runs there were always enough resources to handle all the
jobs added to the queue.
4.3.2 Execution Scenarios
We design three experiments that exhibit different data placement and file access strategies.
Baseline. This first scenario, which acts as the baseline, involves executing the workflows without
any data placement optimization. Input and intermediate data are staged via HTTP from
the submit host, while output files are sent back to the submit host's staging area via SCP (Figure 4.4). To
conduct the transfers we use a maximum of 8 threads (Table 4.2), and we configure the
preprocessing tasks to be split into up to 6 jobs.
Container Installed. This second scenario is a variation of the Baseline scenario optimized for container
image placement. Here, the Singularity container images are pre-loaded on the worker nodes and the jobs
are able to pick them up from local disk whereas in the Baseline scenario container images are sent over
the network before each job.
Network Filesystem (NFS). Finally, the third scenario uses a shared filesystem among all the nodes, hosted
on the submit node (Figure 4.5). Here, we use a shared file system to host all the input, intermediate and
output data, but we also use it as the scratch location during execution. No clustering is employed and the
input files are symlinked to the scratch location of the jobs. With this scenario, we attempt to highlight
the penalties ML pipelines face when using a shared file system.
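To make the Baseline staging configuration concrete, the sketch below shows how a staging site with an HTTP stage-in server and an SCP stage-out server could be declared with the Pegasus 5.x Python API. The hostname and paths are illustrative placeholders, not the values of our actual deployment.

from Pegasus.api import *

# Sketch of the Baseline staging setup: stage-in over HTTP from the submit
# host, stage-out back to its staging area via SCP. Hostname and paths are
# illustrative placeholders, not the actual deployment values.
staging = (
    Site("staging")
    .add_directories(
        Directory(Directory.SHARED_SCRATCH, "/srv/staging")
        .add_file_servers(
            FileServer("http://submit.example.edu/staging", Operation.GET),
            FileServer("scp://submit.example.edu/srv/staging", Operation.PUT),
        )
    )
)

sc = SiteCatalog().add_sites(staging)
sc.write()  # produces sites.yml for pegasus-plan to consume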
Table 4.2: Machine learning executable workflow scenarios and transfers settings (Baseline, Container
Installed, NFS)
Workflow               Scenario          Jobs   Aux. Jobs   Transfer Threads   Files Staged In
Galaxy Classification  Baseline          11     12          8                  111950
Galaxy Classification  Container Inst.   11     12          8                  111950
Galaxy Classification  NFS               11     12          8                  111938
Lung Segmentation      Baseline          7      12          8                  9446
Lung Segmentation      Container Inst.   7      12          8                  9446
Lung Segmentation      NFS               7      12          8                  9438
Crisis Computing       Baseline          16     12          8                  51006
Crisis Computing       Container Inst.   16     12          8                  51006
Crisis Computing       NFS               16     12          8                  50987
4.4 General Characterization
Table 4.3: Job-level characterization when running with non shared filesystem (Baseline scenario)
Job | I/O Read (MB) | I/O Write (MB) | Avg. CPU (%) | Peak Memory (GB) | Avg. GPU (%) | Peak GPU Memory (GB) | Avg. Exec. Time (Sec)
Galaxy Classification Workflow
Preprocessing 435.47 260.71 99.85 0.08 - - 135.21
HPO 2900.37 3404.28 1426.22 41.19 53.22 4.13 3421.48
Training 1754.6 2091.05 789.51 18.8 68.92 4.13 1453.53
Inference & Evaluation 1440.85 527.82 404.01 3.78 26.76 4.42 51.84
Lung Segmentation Workflow
Preprocessing 8107.44 143.99 120 0.37 - - 337.85
HPO 3816.46 84.8 100.75 5.96 68.19 22.99 4947.24
Training 540.71 8010.01 104.83 5.12 62.48 22.99 557.4
Inference 396.57 0.86 106.89 3.1 10.39 22.99 22.51
Evaluation 0.39 0.37 129.44 0.43 - - 6.13
Crisis Computing Workflow
Preprocessing (image) 1185.06 1457.79 660.74 0.07 - - 212.48
Preprocessing (text) 11.84 1.12 121.99 0.133 - - 8.8
HPO (ResNet50) 28321.72 786.75 186.22 6.44 68.73 1.67 1424.91
HPO (BiLSTM) 4464.78 0.95 269.51 4.13 20.92 22.64 1031.6
Training (ResNet50) 17702.71 728.70 179.75 5.94 63.73 1.76 871.99
Training (BiLSTM) 2032.84 5.84 272.56 4.03 20.9 22.64 625.09
Inference (ResNet50) 2734.55 231.07 2386.63 16.64 - - 3003.4
Inference (BiLSTM) 400.48 0.58 111.57 4.13 42.41 22.64 65.95
Evaluation 384.47 58.59 161.01 11.18 3.72 0.83 211.76
We first aim to characterize each workflow at a global level. To that end, each workflow is executed
only with the Baseline settings. The characterization data is collected at the job and workflow levels. We use the
framework presented in Chapter 3 to capture each workflow's I/O behavior, CPU utilization, and GPU utilization
traces. The values presented in Table 4.3 and Table 4.4 are averaged over the 10 runs of collected data.
4.4.1 Job-level Characterization
First, we examine the Pre-processing job. The data transformations performed during pre-processing are
commonly executed on the CPU. This step is easily parallelizable, and data (stored in files) are read into
memory in batches, resulting in low values of peak memory. The Lung Segmentation Workflow has the
highest peak memory and average execution time, as medical image segmentation requires high-resolution
images. The pre-processing job in the Crisis Computing Workflow has the highest average CPU utilization
due to the use of a computationally expensive interpolation algorithm.
The peak GPU memory remains constant for each workflow across the HPO, Training and Inference
jobs. That value is bound by the number of parameters in the model, the batch size (Table 4.1), and the size
of a data instance. It differs for the Inference job on ResNet50, as predictions were calculated on the CPU.
The HPO and Training jobs are the most computationally expensive. These tasks require a forward pass of the
data and weight updates through backward propagation (both consist of many tensor operations). The
Inference jobs are characterized by lower average GPU utilization as only the forward pass of the data is
needed to make predictions. The HPO and Training jobs for the BiLSTM model expect a large file with
the pre-trained sentence-level embeddings as input (a common practice for NLP tasks). Once the model is
trained, the required fine-tuned embeddings are stored within the model’s weights resulting in a low I/O
read value for the BiLSTM Inference task.
Table 4.4: Workflow execution profiles. (Baseline scenario)
Workflow | Jobs | Aux. Jobs | Input Files | Input Size (GB) | Container Size (GB) | I/O Read (GB) | I/O Write (GB) | Peak Memory (GB) | Peak GPU Memory (GB) | CPU Hours | GPU Hours
Galaxy Classification 11 12 28793 0.374 2.4 6.29 6.14 41.19 4.42 33.23 1.93
Lung Segmentation 7 12 1408 3.6 4.1 12.56 8.05 5.96 22.99 37.63 1.64
Crisis Computing 16 12 12747 3.2 4.6 55.89 3.2 16.74 22.64 48.76 2.45
The Evaluation jobs generate performance metrics and plots that help assess the quality and robustness
of the trained models. Inference and evaluation steps are sometimes performed in the same script during
model development. However, these tasks are often separated when a model is deployed in production. The
Evaluation jobs are not computationally expensive but with the growing stress on trust and explainability
in AI, we expect the complexity of the evaluation to increase in the future and as a result, the cost will
grow as well.
4.4.2 Workflow-level Characterization
The Crisis Computing Workflow is characterized by the largest amount of both CPU and GPU hours.
This can be attributed to the structure of the workflow, which consists of two training pipelines. Moreover, in
the image pipeline we employ "on the fly" image augmentation (i.e., new versions of images are generated
on the CPU during training and are transferred to the GPU), which further increases the number of CPU hours.
The images used in the Galaxy Classification workflow have a low resolution, resulting in the small
size of the workflow's input. The galaxies are classified based on their shape as presented in the image,
and this task is not as intricate as deciding whether an image is informative. It also does not require high-quality
images for complex feature extraction.
As seen in Table 4.4, the workflows use many input files, ranging from a couple of thousand (Lung
Segmentation) to over 28,000 (Galaxy Classification), that are sometimes small in size. Thus, different
data management strategies within a workflow, as data moves between tasks, might affect the overall
workflow execution. In the next section, we explore the three different scenarios that are supported by Pegasus
WMS and optimize for data access and data placement.
4.5 Experiments
Now that we have defined and characterized our different workflows using a basic configuration, we study
the impact of data management on end-to-end workflow performance using different scenarios defined in
Section 4.3.2.
Results. For each workflow and each execution scenario, we analyze the total time the workflows took to
complete, the cumulative compute time spent on the jobs and the cumulative time spent in staging in/out
data for the computations.
In Figure 4.6, the cumulative compute time remains fairly similar across all workflows for all scenarios
except NFS. In the NFS configuration, apart from input data being picked up from the shared location, the
NFS was also used as the execution's scratch location and all outputs were written directly to the shared
filesystem. This slowed down the compute time, since some of the jobs (e.g., HPO, Train) were I/O heavy
(Table 4.3).
Figure 4.6: Cumulative compute time for each workflow when running on Chameleon Cloud.
Figure 4.7: Workflows end-to-end execution time for each scenario when running on Chameleon Cloud.
Figure 4.8: Cumulative stage-in and stage-out time for each workflow when running on Chameleon Cloud: (a) Crisis Computing, (b) Lung Segmentation, (c) Galaxy Classification.
The results in Figure 4.8 were anticipated, since each scenario was designed to further optimize time
spent on staging in and out data. For all the workflows a reduction in staging time is observed moving
from the Baseline towards the NFS scenario, with these two having the slowest and the fastest times
respectively. Even though all of the workflows show an overall improvement across the scenarios, it is not
in the same proportion. For the Crisis Computing workflow the Baseline was improved by 700 seconds
when we moved to the Container Installed scenario and a further 800 seconds when we moved to the NFS
scenario. On the other hand, the Galaxy Classification workflow was improved by only a few hundred
seconds. This result can be attributed to the difference in container size used for each of the workflows
(4.6 GB vs 2.4 GB), as well as the difference in number of input files (12,747 vs 28,793) and total input
size (3.2 GB vs 0.374 GB); the Galaxy Classification workflow has many small files. Finally, as shown in Figure 4.7,
the data placement and stage in/out optimizations affect all the workflows, and a reduction in end-to-end
execution time is observed between the Baseline and the Container Installed scenario. Although the NFS
case had faster stage in/out times than the Baseline and Container Installed scenarios, the overhead in
compute time overshadows the staging gains and results in the slowest end-to-end execution time.
4.6 Conclusion
Dynamic Data Driven Application Systems (DDDAS) rely more and more on machine learning
components to remove the human from the loop and generate accurate and fast predictions. In this chapter, we
presented three scientific machine learning workflows that we developed, and we evaluated their performance on modern cyberinfrastructure. We first conducted a characterization study of their performance
during each phase of their life cycle (preprocessing, hyperparameter optimization, training, inference and
evaluation). As expected, scientific ML workflows can be quite data-intensive. The Galaxy workflow, for
example, reads and writes more than 13 times its input size. Additionally, even though they achieve
a great speed up from GPUs, they do not utilize them to their maximum capacity, and GPU utilization
greatly varies from one ML workflow to another (from 21% to 68%). The second part of our evaluation
focused on how different cyberinfrastructure configurations affect these data-intensive workflows, especially when it comes to data location and filesystem configurations. We have demonstrated that
the NFS data configuration can improve the time spent during the staging in and staging out phases of the
workflow (by over 60% in the case of the Crisis Computing workflow), but it fails to improve the overall
end-to-end execution time of the workflow due to a significant performance penalty of sharing network
and filesystem I/O during the execution of the jobs. The penalty is so severe that in the case of the Galaxy
workflow the NFS setting results in up to 13% slower end-to-end execution times compared to the baseline.
In the next chapter we address these shortcomings and improve
the execution performance by reducing the network requirements of the workflows, through workflow
management system optimizations that can be performed during their planning phase.
Chapter 5
Reduction of Network Contention Using Workflow Management
System Mechanisms During the Workflow Planning Phase
Dynamic Data Driven Application Systems (DDDAS) need a reliable data infrastructure to satisfy their
QoS constraints, since they constantly need to consume data directly from instruments and other data
repositories (Section 1.2). As we saw in the previous chapter of this thesis, the choices DDDAS developers
make as they deploy their systems on modern cyberinfrastructure can impede those systems' performance. This
occurs particularly when they are configuring how their system will be accessing data. Additionally, the peak
network requirements of DDDAS workflows can contribute to capping their performance due to limited available network resources, such as bandwidth. Thus, we need to provide mechanisms that can aid
DDDAS developers in better managing their workflows and optimizing their network performance. We can
approach this in two different ways: (1) with techniques that we employ before a workflow gets submitted
to the system, and (2) with techniques applied after workflow submission, while the workflow is running (Chapter 6). In this chapter we develop techniques that can be used before workflows are submitted. These
techniques are integrated into workflow management systems, so that they can reduce the peak network
requirements of the workflows using optimizations at planning time, with the goal of maximizing data reuse
and data locality. We then evaluate an optimization technique that uses workflow analysis during the planning phase of the workflow and clusters similar tasks together to improve data reuse. Additionally, we
propose and evaluate an approach that selectively places jobs at the edge, where the data reside. The goal
of this solution is to reduce wide area network transfers and improve data locality. Both approaches aim
to reduce the peak network needs of the workflows and improve their overall performance.
5.1 Related Work
Prior works focused on developing novel workflow optimization algorithms that reorganize workflow
directed acyclic graphs (DAGs) without introducing additional development time for the users. Initially, the
motivation behind these works was to improve the ratio of the time a job spent waiting in the scheduler
queue vs time spent during execution (especially in the cases where the task execution time is short). Rynge
et al. [154] investigated a clustering approach, implemented using the MPI master-worker paradigm, to
enable a more efficient high-throughput workflow execution on large HPC facilities. High-throughput
workflows tend to incorporate thousands of tasks when they run at scale, and as
a result, when they are submitted to HPC schedulers as individual jobs, they may face strict scheduler limits
(e.g., maximum submissions in the queue, maximum active running jobs) that will slow down their overall
execution. The MPI-based clustering approach proposed by Rynge et al. avoids these delays by sending all
the workflows' jobs as a single MPI job. Singh et al. [169] took a different approach to reducing queuing
delays, by clustering compatible tasks together using horizontal or vertical clustering algorithms. Others
looked into further optimizing the fault tolerance and the performance of clustered jobs using dynamic
reclustering techniques and imbalance metrics [31],[32].
With the increased use of cloud computing, researchers have started looking into clustering techniques for
science workflows, in order to optimize the resource utilization of the cloud compute infrastructure. Sahni
et al. [157] suggested a clustering heuristic that takes into account both the workflow structure and the
available cloud resource set in order to achieve maximum parallelism among tasks while minimizing
system overheads and resource wastage. However, even though these works focus on optimizing compute
resource utilization, increasing fault tolerance and reducing queuing delays, they do not look into the
effect that clustering has on the data requirements of a workflow and its associated network requirements.
In this chapter of this thesis, we present how clustering aids in reducing peak network requirements of
data-intensive applications, by increasing data reuse and reducing the total amount of transfers that need
to take place during the execution of the workflow.
More recently, with the advent of edge computing and the concept of the edge-to-cloud continuum [85],
works have focused on the creation of frameworks and tools that facilitate execution both at the edge (closer
to the data) and on the cloud, in order to lower the operational costs of network bandwidth usage, satisfy data
privacy requirements, and optimize the time spent transferring data versus performing computations.
Various software systems and frameworks have been developed to facilitate application deployment and
job execution specifically in edge-cloud environments. KubeEdge [196] is one such framework. Built as
an add-on to Kubernetes, KubeEdge extends native containerized application orchestration and device
management to hosts at the edge. AWS IoT Greengrass[1] is an edge runtime and cloud service for building,
deploying, and managing IoT devices. The edge runtime allows AWS Lambda functions to be run on target
edge devices in addition to the cloud. Steel [129] is a system that automates deployment across edge-cloud
environments and provides monitoring capabilities. RACE [29] is an edge-cloud framework that allows
cloud-edge applications to continuously correlate or join data from multiple edge devices by means of a
novel cost-based optimizer developed to minimize communication time.
Due to the influx in data being generated by sensors, cameras and other devices, services that facilitate
data collection specifically have emerged. Google Cloud IoT[69], which targets data collection applications
via the edge, is a cloud based service for securely connecting and managing IoT devices. Services provided
include device registration, authentication, and an MQTT/HTTP [122] based communication channel between edge and cloud. Firework [204] focuses on distributed data sharing and processing for big data
applications while keeping computations within stakeholders' data facilities to adhere to data privacy constraints.
Figure 5.1: Horizontal clustering example. Horizontal clusters have no intra-cluster dependencies; any execution order maintains DAG semantics.
In this chapter, we propose a new approach that reduces the network requirements of DDDAS that
span from the edge (e.g., instruments and data repositories) to the cloud, and that can be applied during the planning
phase of workflow management. First, we explore how clustering affects the network needs of data-intensive workflows, such as the machine learning pipelines we presented and characterized in Chapter 4,
and then we develop and evaluate a methodology to execute tasks closer to the edge, where the data reside.
This increases data locality and reduces the need to transfer excessive amounts of data over wide area
networks, avoiding potential network congestion slowdowns.
5.2 Task Clustering
In the past, clustering tasks together into bigger jobs has been used as an approach to reduce scheduler
queue overheads and increase the work done by the submitted job [154],[169]. Pegasus WMS [42] supports
a variety of clustering algorithms. The most commonly used ones are horizontal clustering and label
clustering. In horizontal clustering (Figure 5.1), Pegasus groups tasks at the same level of the workflow
that have no dependencies between them to create larger jobs. Hints, such as the maximum number of
tasks per cluster, can be used to control the size of these clusters. In label clustering (Figure 5.2), labels
indicate which tasks will be clustered together. In this mode the user has more fine-grained control over the
created clusters, and because tasks at different levels can be clustered together, topological sorting is used
to guarantee the expected execution sequence of the tasks and their inter-dependencies. The clustering
algorithms are applied by the Pegasus planner during the conversion of the abstract workflows into
executable ones.
Figure 5.2: Label clustering example. Subscripts designate the label assigned to each task. Within clusters
execution happens based on topological sorting.
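To illustrate how these clustering hints can be attached to tasks, the sketch below uses the Pegasus 5.x Python API with hypothetical transformation and label names; clusters.size and label are standard Pegasus profile keys, but the exact calls should be checked against the Pegasus documentation for the version in use.

from Pegasus.api import *

wf = Workflow("clustering-example")

# Horizontal clustering hint: group up to eight same-level "preprocess" tasks
# (hypothetical transformation name) into a single clustered job.
for i in range(32):
    wf.add_jobs(
        Job("preprocess")
        .add_args("--part", str(i))
        .add_profiles(Namespace.PEGASUS, key="clusters.size", value=8)
    )

# Label clustering hint: tasks sharing a label are fused into one job and
# executed in topological order on the same worker node.
train = Job("train").add_profiles(Namespace.PEGASUS, key="label", value="text_pipeline")
infer = Job("inference").add_profiles(Namespace.PEGASUS, key="label", value="text_pipeline")
wf.add_jobs(train, infer)
wf.add_dependency(train, children=[infer])

# The clustering styles are then enabled at planning time, for example by
# passing --cluster horizontal,label to pegasus-plan.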
Apart from reducing queuing delays, clustering tasks together has some additional benefits that have
not been explored and quantified before in prior works. Task clustering can increase data reuse at the
workers, which in turn reduces the network requirements of the workflow, since fewer files need to be
transferred to facilitate the execution of the tasks. For example, in horizontal clustering, if the clustered
tasks share the same input data or configuration files, these need to be transferred only once per cluster
job. In label clustering, however, we do not only get the benefit of reusing the input files of the job, but
there is also the chance of reusing intermediate products. Tasks that depend on the output products of
other tasks in the workflow, they can reuse the data produced directly on the worker node, eliminating
the need to fetch their input from a shared location. In both cases, network requirements are reduced by
reducing the need for file transfers to facilitate the workflow execution.
5.2.1 Effect of Task Clustering in Network Requirements
To evaluate the effect that task clustering has on network requirements of DDDAS workflows, we are
going to use the machine learning workflows presented in Section 4.2. As in Chapter 4, the workflows
are deployed in the non-shared filesystem setup on Chameleon, as seen in Figure 4.4 and described in
Section 4.3.1.
To showcase how clustering affects the data-intensive machine learning workflows, we use the same
baseline setup described in Section 4.3.2 and a newly introduced task clustering scenario. In the task
clustering scenario, we aim to minimize transfers to and from the staging area and optimize data reuse
among tasks, while input data are still staged in using HTTP and output files are sent back over SCP (Figure 4.4), since many data repositories are configured to be accessed using these protocols. More specifically,
for the Lung Segmentation and Galaxy Classification workflows we cluster all tasks in a single job, but, for
the Crisis Computing workflow we create two clustered jobs, one for the image pipeline and one for the
text pipeline, and we leave the late fusion as a single job. The two different clustering approaches were
designed in order to maximize data reuse and execution parallelism, after taking into consideration the
workflow DAG structures. In the Clustering scenario we assign an entire compute node to each cluster of
tasks and use a maximum of 24 threads to conduct the transfers. We collect statistics using the online
monitoring framework we developed and presented in Chapter 3. We execute each workflow 10 times per
configuration, and present the average and standard deviation results over these 10 runs where applicable.
The first major difference that can be noticed across the two scenarios is highlighted in Table 5.1. We
analyzed the final executable machine learning workflows, and for all three of them we observe a significant reduction in the number of files that are staged in during execution. In the case of the Galaxy Classification
workflow and the Crisis Computing workflow, clustering reduced the number of staged-in files by ~75%. In the Lung
Segmentation case, the reduction of staged-in files reached ~85%.
Table 5.1: Machine learning executable workflow scenarios and transfers settings (Baseline, Clustering)
Workflow               Scenario     Jobs   Aux. Jobs   Transfer Threads   Files Staged In
Galaxy Classification  Baseline     11     12          8                  111950
Galaxy Classification  Clustering   1      4           24                 28803
Lung Segmentation      Baseline     7      12          8                  9446
Lung Segmentation      Clustering   1      4           24                 1417
Crisis Computing       Baseline     16     12          8                  51006
Crisis Computing       Clustering   3      6           24                 12758
As a result, the reduction in the number of files that are required to be staged in directly translates into a
reduction of the time spent transferring data for all three machine learning workflows. Figure 5.3 presents the
cumulative stage in and stage out time spent during the execution of the workflows for both the baseline
and the clustering scenario. In the case of the Crisis Computing and Galaxy Classification workflows
(Figure 5.3a and Figure 5.3c), the workflows save over 2000 seconds in data transfer operations while using
the clustering scenario. On the other hand, the Lung Segmentation workflow, improves the data transfer
times by ~530 seconds (Figure 5.3b). In all cases the reduction of time spent in data transfer operations is
over 73%, compared to the baseline.
Figure 5.3: Cumulative stage-in and stage-out time for each workflow under the Baseline and Clustering scenarios: (a) Crisis Computing, (b) Lung Segmentation, (c) Galaxy Classification.
Clustering tasks together and increasing the size of the jobs also affects the time workflows spent in
computations. In Figure 5.4a we present the total cumulative compute time for each workflow across the two
configurations. Clustering does not introduce any overhead over the baseline in computational time. In
fact, we observe a slight improvement, since the clustered tasks can reuse the same container deployment,
without having to start multiple container instances. Finally, Figure 5.4b depicts that all workflows saw
an improvement in their end-to-end execution time. The Crisis Computing workflow makespan improved
by ~1200 seconds, the Lung Segmentation workflow makespan improved by ~500 seconds, and the Galaxy
Classification workflow makespan improved by ~900 seconds. It is important to note that the improvement
in makespan does not match the improvement in time spent transferring data, since data transfers are
parallelized and we presented the total cumulative time in Figure 5.3. Additionally, in contrast to the
shared filesystem data configuration case we presented in Chapter 4, clustering reduces the time spent in
data transfers and it also reduces the end-to-end workflow execution time.
Figure 5.4: Cumulative compute time (a) and end-to-end workflow execution time (b) for each workflow under the Baseline and Clustering scenarios.
In the next section we present how DDDAS workflows can be orchestrated from the edge to the cloud,
and how placing tasks at the edge, where the data reside, can reduce the workflows’ peak network requirements.
5.3 Reducing Network Requirements of DDDAS Workflows Using the
Edge To Cloud Paradigm
Edge-cloud computing environments make it possible for DDDAS to capitalize on the advantages offered
by both computing paradigms: faster response times, data locality, and cost savings at the edge; and scalability, high availability, and reliability provided by the cloud.
5.3.1 Orchestrating DDDAS Workflows From Edge To Cloud
In order to orchestrate workflows that span edge and cloud resources, we leverage resource selection hints
offered by the Pegasus WMS [42], and HTCondor’s [177] matchmaking capabilities to match jobs to the
appropriate resources (edge or cloud). To match jobs specifically with edge or cloud resources, we added an
additional attribute, MACHINE_RESOURCE_TYPE = {edge,cloud}, to the machine classad of each HTCondor
worker, which indicated whether or not that resource was an edge or cloud resource. When creating the
Pegasus workflows for each of the execution environments, we indicated which type of resource the job
should be matched to using the Python API for describing Pegasus workflows. Internally, HTCondor takes
into account this requirement in addition to other job requirements such as required number of CPUs,
RAM, disk space, etc.
In our design, data movement operations for each workflow use HTTP, SCP, and local file system
operations. These are managed by the pegasus-transfer utility. Pegasus-transfer is invoked for each job to
handle staging in necessary input data and staging out generated data products. For jobs that are scheduled
on locations where input data already resides, symlinks are used by pegasus-transfer to avoid unnecessary
data movements and reduce overall disk usage.
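A minimal sketch of this matchmaking setup is given below, assuming the Pegasus 5.x Python API on the submit side. Only the MACHINE_RESOURCE_TYPE attribute itself comes from our design; the worker-side configuration lines and the file names are illustrative assumptions.

from Pegasus.api import *

# Each worker advertises the custom attribute from its HTCondor startd
# configuration, e.g. (worker-side snippet shown as a comment):
#   MACHINE_RESOURCE_TYPE = "edge"        # or "cloud"
#   STARTD_ATTRS = $(STARTD_ATTRS) MACHINE_RESOURCE_TYPE
#
# On the submit side a job is steered to a resource type through an HTCondor
# requirements expression attached as a Condor profile.
convert = (
    Job("convert2wav")  # Orcasound transformation; file names are illustrative
    .add_inputs(File("sensor_1_batch_1.ts"))
    .add_outputs(File("sensor_1_batch_1.wav"))
    .add_profiles(Namespace.CONDOR, key="requirements",
                  value='MACHINE_RESOURCE_TYPE == "edge"')
)

infer = (
    Job("inference")
    .add_inputs(File("sensor_1_batch_1.wav"), File("orca_model"))
    .add_outputs(File("predictions_sensor_1_batch_1.json"))
    .add_profiles(Namespace.CONDOR, key="requirements",
                  value='MACHINE_RESOURCE_TYPE == "cloud"')
)

wf = Workflow("orcasound-edge-cloud")
wf.add_jobs(convert, infer)  # the edge -> cloud dependency is inferred from the shared WAV file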
5.3.2 Experimental Setup
In this section we present the experimental setup we used to evaluate the edge-cloud solution for DDDAS.
We developed three workflow cases, one synthetic and two based on production DDDAS workflows, and
we evaluated how accommodating a split edge-cloud execution can help reduce network requirements and
improve the end-to-end performance of the workflows.
5.3.2.1 Workflows
Figure 5.5: Synthetic Workflow.
Synthetic. The synthetic workflow (Figure 5.5) was developed to represent data aggregation and analytics
applications, which run in edge-cloud environments. For such applications, initial input data is derived at
the edge from multiple instruments such as cameras and sensors. Each input goes through preprocessing
steps before being aggregated by a single job that outputs the final result. This workflow is modeled after
a video analytics application which can search for and trace the route of a missing object or person using
surveillance footage aggregated from security cameras and other IoT devices geographically scattered
across an urban area [190, 159].
Each of the 32 initial input files in this workflow is 1024 MB. There are 3 levels of synthetic jobs. Jobs
at levels 1, 2 and 3 are labeled keg_1, keg_2, and keg_merge, respectively. Output files from jobs at levels
1, 2, and 3 are 500 MB, 250 MB, and 250 MB (the final output file), respectively. Workflow jobs are implemented
using the pegasus-keg executable, a stand-in for a typical executable which can be configured to simulate
computation and I/O. In total, there are 86 jobs in this workflow. Job runtimes vary based on their level in
the workflow. Jobs at level 2 run 50% faster than jobs at level 1. The final job at level 3, keg_merge, runs
50% slower than jobs at level 1.
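A rough sketch of how this three-level structure could be generated with the Pegasus Python API follows; the keg arguments and catalogs are omitted and the file names mirror the pattern in Figure 5.5, so it should be read as an illustration of the DAG shape rather than the exact generator used in the experiments.

from Pegasus.api import *

wf = Workflow("synthetic")
merge_inputs = []

for i in range(1, 33):
    in_f = File(f"input_{i}.txt")   # 1024 MB initial input residing at the edge
    mid_f = File(f"1_{i}.txt")      # 500 MB keg_1 output
    out_f = File(f"2_{i}.txt")      # 250 MB keg_2 output

    wf.add_jobs(
        Job("keg_1").add_inputs(in_f).add_outputs(mid_f),
        Job("keg_2").add_inputs(mid_f).add_outputs(out_f),
    )
    merge_inputs.append(out_f)

# Single fan-in job that produces the 250 MB final output.
wf.add_jobs(Job("keg_merge").add_inputs(*merge_inputs).add_outputs(File("merge.txt")))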
CASA-Wind. The second workflow we used is a production workflow of the CASA system [119]. The
CASA Wind workflow was described in Chapter 2; a detailed description of the workflow along
with the workflow DAG can be found in Section 2.2.4.2 and Figure 2.4, respectively.
Orcasound Workflow. The Orcasound application was presented in Section 1.2.2.1. Figure 5.6 depicts
the implementation of the Orcasound Pegasus workflow, which processes the hydrophone data of one or
more sensors in batches for each timestamp, and converts them to a WAV format. Using the WAV output it
creates spectrogram images that are stored in the final output location. Furthermore, using the pretrained
Orcasound model, the workflow scans the WAV files to identify potential sounds produced by the orcas.
These predictions are merged into a JSON file for each sensor, and if data from more than one sensor are
being processed, the workflow will create a final merged JSON output for all. In our experiments, we used
data from a single hydrophone sensor over the span of a day. The workflow consumed 8641 recordings
with a total size of 1.5 GB and a median size of 181 KB.
Figure 5.6: Orcasound Workflow.
5.3.2.2 Execution Environment Setup
We used the Chameleon Cloud testbed [87] for our experiments. We emulated an edge-to-cloud scenario,
and provisioned nodes from both Chameleon sites as shown in Figure 5.7.
Figure 5.7: Experimental Setup on Chameleon.
In the TACC Chameleon testbed site, we deployed our cloud site, where we assumed we could get
unlimited resources. There we created our workflow submit node and two worker nodes (48 cores and
192GB of RAM each), which were connected using a 10Gbps network. At UChicago, we deployed our edge host
(24 cores and 192GB of RAM), which was hosting all the data and was offering compute capability via
compute slots that were dynamically provisioned using Docker containers. The data were attached to the
compute slots using volumes, and the produced outputs were sent back to the submit node using SCP. If
input data was needed for computations on the cloud worker nodes, it was served directly via
HTTP. The two sites were connected using a 1Gbps network. To execute our workflow scenarios we used
Pegasus v5.0 and we created an HTCondor pool to manage the resources. Additionally, because we wanted
to emulate a less powerful machine at the edge, we used Docker to throttle the CPU usage of the compute
slots to 67%. Finally, to maintain a consistent environment across all nodes, we used the more lightweight
Singularity containers to create the environment for the workflow jobs.
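The throttling and data attachment can be expressed with standard Docker controls; the sketch below uses the Docker SDK for Python, with an illustrative image name and host path (67% of the 24 physical cores corresponds to roughly 16 cores).

import docker  # Docker SDK for Python; the same limits can be applied with the docker CLI

client = docker.from_env()

# Hypothetical compute-slot container on the edge host: CPU usage throttled to
# about 67% of the 24 physical cores (~16 cores) and the local data directory
# attached as a read-only volume. Image name and host paths are illustrative.
slot = client.containers.run(
    "htcondor-worker:latest",
    detach=True,
    nano_cpus=16 * 10**9,  # nano_cpus = CPUs * 1e9, equivalent to `--cpus 16`
    volumes={"/srv/casa-data": {"bind": "/data", "mode": "ro"}},
)
print(slot.short_id)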
5.3.2.3 Evaluation
Figure 5.8: Workflow makespans relative to the edge-only baseline (synthetic, casa-wind, orcasound): Edge 100%, 100%, 100%; Edge-Cloud 117%, 88%, 101%; Cloud 138%, 69%, 57%.
Workflow Performance.
The synthetic, CASA-Wind, and Orcasound workflows were each run 10 times for each of the three execution scenarios: edge only, edge-cloud, and cloud only. The following metrics were averaged over 10 runs
for each workflow: makespan (Figure 5.8), cumulative job walltime (Figure 5.9), cumulative job walltime
including queuing delays (Figure 5.10), cumulative time spent transferring data between edge and cloud
(Figure 5.11), and total amount of data transferred between edge and cloud (Figure 5.12).
Makespan is the wall clock time of the entire workflow run. Cumulative job walltime is the sum of
all compute times for each job not including queuing delays and time to stage input and output files. Figure 5.10 includes queuing delays and data staging times. Cumulative time spent transferring data between
edge and cloud is the sum of all time spent transferring data between the edge host (at the UChicago
Chameleon testbed site) and the cloud (at the TACC Chameleon testbed site). Note that for the edge only
case, final output files generated from the workflow are transferred back to the cloud. For the cloud only
case, all initial input files to the jobs are transferred from the edge to the cloud.
Each metric uses the edge only scenario as a baseline. In our experiments, we analyze each metric on
a per workflow basis with respect to the execution scenario, as each workflow has varying characteristics:
number of jobs, job runtimes, dependency structure, and amount of data moved.
Figure 5.9: Cumulative job walltime relative to the edge-only baseline (synthetic, casa-wind, orcasound): Edge 100%, 100%, 100%; Edge-Cloud 92%, 56%, 82%; Cloud 77%, 52%, 51%.
Figure 5.10: Cumulative job walltime as observed from the submit node, relative to the edge-only baseline (synthetic, casa-wind, orcasound): Edge 100%, 100%, 100%; Edge-Cloud 184%, 66%, 108%; Cloud 357%, 90%, 89%.
Synthetic. The synthetic workflow has the shortest makespan when run in an edge only environment,
despite having the longest cumulative job walltime of the three execution environments. In the edge
only scenario, only 250 MB needs to be transferred between edge and cloud, as opposed to the 16 GB and
32 GB when run in edge-cloud and cloud, respectively. When run using edge-cloud, the makespan is 17%
slower, stemming from the almost 6300% increase in the amount of data which needs to be transferred
between the edge and the cloud. The workflow was configured for this environment such that all initial
workflow jobs, denoted as keg_1 in Figure 5.5, were scheduled on the edge host where the initial input
files are already stored. All subsequent jobs (keg_2 and keg_merge) were scheduled to run in the cloud
requiring all keg_1 outputs to be sent over the WAN.
CASA-Wind. The CASA-Wind workflow exhibited best performance in terms of makespan when run only
in the cloud, yielding a 31% improvement over running at the edge. Both edge-cloud and cloud only see a
decrease in cumulative job walltime by about 46%, due to some or all jobs running in the faster cloud. When
factoring in queuing delays and time to stage data (Figure 5.10), we see that edge-cloud performs the best
with a 34% improvement over edge-only and 24% improvement over cloud only due to the amount of data
that needs to be transferred over the slower WAN. 5914.80 MB must be transferred between the edge and
the cloud when only cloud compute resources are used in contrast to 13.60 MB when both edge and cloud
are utilized.
Figure 5.11: Cumulative time spent transferring data over the WAN, relative to the edge-only baseline (synthetic, casa-wind, orcasound): Edge 100%, 100%, 100%; Edge-Cloud 1080%, 143%, 186%; Cloud 3010%, 371%, 193%.
Figure 5.12: Total data transferred over the WAN, relative to the edge-only baseline (Y-axis in log scale).
Orcasound. When run in a cloud only environment, the Orcasound workflow runs about 43% faster
compared to running in edge only and edge-cloud environments. Compute jobs performing inference,
which make up a significant amount of computation required by the Orcasound workflow, take an average
of 36.29 seconds on the cloud compared to 82.26 seconds when run on the edge. Additionally, running in
an edge-cloud environment requires roughly 10 times more data to be moved between the edge and the
cloud. In terms of cumulative time spent performing data movement over the WAN, there was only a 7%
difference between the edge-cloud and cloud only environment runs relative to the edge only environment,
indicating that the 43% improvement in makespan was likely due to the 49% decrease in cumulative job
walltime when running in the cloud.
5.3.2.4 Discussion
The CASA-Wind and Orcasound workflows both perform the best in regards to makespan when run in a
cloud only environment while the synthetic workflow runs the fastest at the edge. When data movement
and queuing delays are factored into cumulative job walltimes (Figure 5.10) it is evident that these factors
can negatively impact the performance gained by running on faster hardware (illustrated in Figure 5.9).
Furthermore, the amount of data moved between the edge and the cloud varies vastly depending on the
execution environment and job placement. For the Synthetic and CASA-Wind workflows, Figure 5.11 and
Figure 5.12 demonstrate that placing some of their computations at the edge, when possible, can reduce
the time spent transferring data and also reduce the total amount of data that need to be sent over the WAN.
The Orcasound workflow, on the other hand, because it creates more data than it consumes, increases its network requirements in the edge-cloud setting. As a result, careful consideration by DDDAS application
developers is required when deciding how to split their workflows during the planning phase, since not
all workflows benefit from a split edge-cloud execution.
5.4 Conclusion
Dynamic data driven applications can experience degraded performance due to their high network requirements, which increase network congestion. A solution involves workflow optimizations that reduce the
peak network requirements of the DDDAS workflows. In this chapter we developed and evaluated mechanisms that take place at the workflow planning phase and can effectively reduce the workflow network needs
by increasing data reuse and data locality. The first mechanism we evaluated is workflow DAG restructuring through clustering tasks together, which increases data reuse at the worker nodes. To quantify the effect
of clustering on the network requirements of DDDAS workflows, we used the three machine learning workflows that we developed and presented in Chapter 4. Our experiments show that clustering can decrease
the data transfer needs of data-intensive workflows by a significant amount. More specifically, our results show that for our target workflows, clustering reduced the data transfer needs by up to 85%, and the
time spent on data transfers improved by over 73% compared to the baseline scenario. The three machine
learning workflows also achieved a speedup in end-to-end execution time of up to 17%. Furthermore,
with edge computing being increasingly used for data-intensive applications, we developed an approach
to automate the execution of DDDAS workflows in hybrid edge-to-cloud execution environments by leveraging the matchmaking feature of HTCondor, and we evaluated the approach using a synthetic workflow and
two production DDDAS workflows from CASA [119] and Orcasound [30]. An edge-to-cloud hybrid environment can improve the end-to-end execution time of DDDAS workflows, and it can also reduce
their network requirements by increasing data locality. For example, in the case of the CASA-Wind workflow, data requirements are decreased by over 100 times in the hybrid scenario versus the cloud-only scenario.
However, this hybrid edge-to-cloud execution approach does not benefit all workflows. Workflows
that produce more data than they consume do not see any benefits; in fact, they may experience an
increase in their network requirements, as seen in the case of the Orcasound workflow.
Even though both of these approaches can reduce network needs by increasing data reuse and data locality, they take place during the planning phase of the workflow and do not allow for any changes
during execution. In the next chapter of this thesis, we present techniques and methodologies that can be
applied at runtime. These can help regulate the network resources allocated to the DDDAS workflows, reduce network contention, and promote a fairer utilization of the available network resources.
Chapter 6
Reduction of Network Contention Using Application-Aware Software
Defined Flows and Workflow Management System Mechanisms During
the Workflow Running Phase
As we have discussed in previous chapters, network performance and data transfer times play a significant
role in the overall workflow performance of data-intensive applications. In production, Dynamic Data
Driven Application Systems (e.g., CASA [119]) use multiple workflows that work together (workflow ensembles) in order to produce the information needed to respond to an event. These workflows
access the same data repositories concurrently, increasing network congestion, which can have a negative impact on the overall performance of the workflow ensembles. Modern cyberinfrastructure supports
dynamic provisioning of multiple high bandwidth links (Chapter 2). However, since these capabilities
are offered upon availability of physical resources (e.g., network switch ports, bandwidth capacity, etc.)
DDDAS applications can become limited by the number of network links they can provision at any given
point in time, as they scale their resources. As we have seen in Chapter 5, workflow management systems
can help alleviate network pressure by applying optimizations at planning time and before submitting
workflow jobs to the scheduler queues, which can reduce the network requirements of the workflows
(clustering and co-locating jobs with data when possible). However, even though these approaches can be
effective in reducing network requirements, they do not allow users and administrators to apply on-the-fly
policies that can further help resolve the network pressure while the workflow is executing. In this chapter, we present and evaluate two novel methods that enable operators to apply policies at runtime. These
solutions are dynamic, and policies can be altered as the execution of the workflow ensembles progresses. The
first approach introduces a virtual Software Defined Exchange (vSDX) architecture, through which we
can apply application-aware policies to the cyberinfrastructure and allocate network capacity to specific
workflows. The second approach leverages a workflow ensemble manager that enables advanced management of workflow ensembles, allowing us to set concurrency policies and automate workflow triggering
based on events.
6.1 Related Work
Cloud services allow users to easily spawn and dismiss resources around the globe based on their real-time
needs. With its great flexibility, cloud computing has rapidly emerged as one of the most popular approaches for compute-intensive and data-intensive applications. There has been extensive prior work on
the topic of cloud support for various types of science applications.
Resource management and provisioning for distributed applications has been the subject of many
research efforts. There have been extensive survey papers [103, 61, 37] regarding the provisioning of IaaS
cloud resources for scientific workflows, where authors explore cloud elasticity for science workloads and
challenges for data-intensive workflows on multi site clouds. Wang et al. [188] propose an approach that
leverages Kepler [109] and CometCloud [187] to dynamically provision cloud resources, build scientific
workflows and execute them on a federation of clouds. Moreover, there have been strategies for workflow
systems to deploy virtual machines in the cloud with limited support for on-demand provisioning and
elasticity, while none or minimal support to infrastructure optimization is enabled. Ostermann et al. [139]
discussed a set of VM provisioning policies to acquire and release cloud resources for overflow grid jobs
from workflows, and characterized the impact of those policies on execution time and overall cost.
Some work focuses on the network aspects of cloud computing. Macker et al. [111] describe a workflow
paradigm that uses decentralized decision making to address network edge workflow scenarios, such as
one-to-many communication. Based on experiences with virtualized reservations for batch queuing systems, as well as coordinated usage of TeraGrid, Amazon EC2 and Eucalyptus (cloud) resources with fault
tolerance through automated task replication, Ramakrishnan et al. [149] built VGrADS and were able
to develop a new workflow planning method to balance performance, reliability, and cost considerations.
Liu et al. [104] developed the Virtual Science Network Environment (VSNE) that emulates the multi-site
host and network infrastructure, wherein software can be tested based on mininet with SDN capabilities.
Many prior works have focused on achieving a satisfactory Quality of Service (QoS) for applications
running on cloud resources [198, 65, 185, 125]. Zhen et al. [198] propose a novel genetic-algorithm-based
approach to compose services in cloud computing using QoS estimates of the application services, achieving SLAs while maintaining the same volume of acquired resources. Varshney et al. [185] proposed a QoS-based workload scheduling mechanism by considering energy consumption, execution cost and execution
time as QoS parameters. The Department of Energy’s ESNet has proposed an On-Demand Secure Circuits
and Advance Reservation System [134], which provides a software system for booking time and resources
on high-speed science networks used by large teams of researchers to share vast amounts of data.
The work in this thesis presents two novel methods that enable cyberinfrastructure operators to apply QoS policies at runtime, aiming to alleviate network bottlenecks for workflow ensembles sharing
network resources across multiple cloud sites. The first approach introduces a virtual Software Defined
Exchange (vSDX) architecture, through which we can apply application-aware policies to the cyberinfrastructure and allocate network capacity to specific workflows. The second approach leverages
a workflow ensemble manager and enables advanced management and pacing of workflow ensembles,
allowing operators to set concurrency policies and automate workflow triggering based on events. We
evaluate both approaches with real-world applications.
6.2 Approach
In order to accommodate different application QoS policies and make a more efficient and fair use of the
infrastructure among the workflow ensembles, we extend the DyNamo system (Figure 2.2), which we
presented in Chapter 2 of this thesis, with a more sophisticated network configuration component and
active application management techniques.
6.2.1 vSDX module
A Virtual Software Defined Exchange (vSDX) is defined as a virtual interconnect point between multiple
adjacent domains, e.g, instruments, compute resources, or data/storage systems. Like a static SDX, a vSDX
uses Software Defined Networking (SDN) within the exchange to enforce different network policies.
In this thesis, the vSDX support is provided by the ExoPlex [197] network architecture depicted in
Figure 6.1. ExoPlex uses an elastic slice controller to coordinate dynamic circuits and the Zeek (formerly
Bro) [202] security monitors via Ahab [6]. The controller runs outside of the vSDX slice and exposes
a REST API for clients to request network stitching and connectivity and to express QoS parameters.
Clients invoke this API to bind named subnets under its control to the vSDX via L2 stitching and request
bandwidth provisioned connectivity with other subnets. The vSDX slice is comprised of virtual compute
nodes running OpenVSwitch [101], OpenFlow controllers [135], and Zeek traffic monitors. Traffic flow and
routing within the vSDX slice are governed by a variant of the Ryu [156] rest router [155] SDN controller.
The vSDX slice controller computes routes internally for traffic transiting through the vSDX network,
and invokes the SDN controller API to install them. The SDN controller runs another Ryu module to
block traffic from offending senders. If a Zeek node detects that traffic violates a Zeek policy, it blocks the
sender’s traffic by invoking a REST API call via the Zeek NetControl plugin.
Figure 6.1: Virtual Software Defined Exchange (SDX) Network Architecture.
As client requests for bandwidth provisioned connectivity arrive at the vSDX, the slice controller instantiates slice resources as needed to carry the expected traffic. These resources include peering stitchport
interfaces at each point of presence (PoP), the OVS nodes that host these vSDX edge interfaces, Zeek (Bro)
nodes to monitor the traffic, and backplane links to carry the traffic among the PoPs. The controller reuses
existing resources in the slice if they have sufficient idle capacity to carry the newly provisioned traffic,
and instantiates new resources as needed. In particular, it adapts the vSDX backplane topology by allocating and releasing dynamic network circuits as needed to meet its bandwidth assurances to its customers.
The flows are inspected by out-of-band Zeek network security monitor appliances to detect intrusions. As a
simple form of intrusion prevention, it uses Zeek’s NetControl framework to interrupt all traffic from the
source of a suspect flow. The vSDX controller deploys Zeek instances elastically to scale capacity.
In this thesis we have deployed the ExoPlex slice controller [50] as a Docker container. Mobius [115],
[123] has been enhanced to communicate with the ExoPlex Slice controller via its REST API to establish
network connectivity between ExoGENI and Chameleon via layer2 networks and to allocate bandwidth to
individual workflows. Once connectivity is established, Mobius triggers REST API calls to publish network
prefixes, sets up routes between network prefixes and dynamically applies different bandwidths as needed.
#!/bin/bash
pegasus-em create wind-ensemble
pegasus-em web-file-pattern-trigger \
    --ensemble wind-ensemble \
    --trigger wind_1min \
    --interval 60s \
    --script run_script.sh \
    --web_location https://data.casa.umass.edu \
    --file_patterns .*netcdf.tar.gz \
    --timeout 60m \
    --args -n 10
Listing 6: Pegasus-EM Web File Trigger Example
Additionally, we have implemented a Python based interface that can be used to provision the required
resources. This interface enables programmatic resource provisioning and is capable of spinning up resources, establishing connectivity, and implementing network QoS policies on a per-workflow application
level.
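As a purely illustrative example of what a per-workflow QoS request through such a programmatic interface might look like, the sketch below issues a REST call with the requests library; the endpoint path and payload fields are hypothetical placeholders and are not the actual Mobius or ExoPlex APIs.

import requests

# Purely illustrative per-workflow QoS request against a provisioning
# controller's REST API. The endpoint path and payload fields are hypothetical
# placeholders and do not correspond to the actual Mobius/ExoPlex interfaces.
CONTROLLER = "http://localhost:8080"  # assumed controller address

def request_bandwidth(workflow: str, source_prefix: str,
                      dest_prefix: str, mbps: int) -> None:
    payload = {
        "workflow": workflow,
        "source": source_prefix,
        "destination": dest_prefix,
        "bandwidth_mbps": mbps,
    }
    resp = requests.post(f"{CONTROLLER}/qos", json=payload, timeout=30)
    resp.raise_for_status()

# Example: allocate 500 Mbps between two subnets for the wind ensemble.
request_bandwidth("wind-ensemble", "10.1.0.0/24", "10.2.0.0/24", 500)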
6.2.2 Pegasus Ensemble Manager
In this chapter we wanted to explore how the network behavior affects not only a single workflow but a set
of workflows competing for network resources. To this end we leverage the Pegasus Ensemble Manager
(Pegasus-EM) and integrate it into DyNamo. Through the Pegasus-EM service, the Pegasus WMS can
manage collections of related workflows, commonly referred to as ensembles. Pegasus-EM supports ensemble creation, workflow prioritization, workflow submission, throttling of concurrent executions, and
ensemble level monitoring capabilities.
As we have discussed in Section 1.2, Dynamic Data Driven Application Systems (DDDAS) need to
respond to changes and process new data as they arrive in their data repositories (Section 1.2.1, Section 1.2.2.1). As a result, DDDAS need to continuously monitor these repositories and trigger and manage
new processing workflows, dynamically, based on the flow of data obtained from various sources. To
satisfy this requirement, we have extended Pegasus-EM with a workflow triggering capability that
supports three triggering modes: (a) cron, (b) monitoring for local files, and (c) monitoring for web files.
• cron: This mode is similar to a cron job. On a predefined interval specified during the trigger’s
creation, Pegasus-EM executes a user-defined script that generates a new Pegasus workflow, which
is in turn added to the targeted ensemble.
• monitoring local files: In this mode Pegasus-EM monitors a local directory for new files. Based on
an interval specified during its creation, it checks for new files that match a file pattern and passes
them to a user-defined workflow generation script that dynamically creates and plans a Pegasus
workflow based on the incoming data. Pegasus-EM executes the workflow generation script and
queues up the generated workflow for execution.
• monitoring web files: This triggering mode is similar to the local file mode. In this case, however,
Pegasus-EM will monitor a remote web location (HTTP) for new files that match the provided file patterns.
An example of a Pegasus-EM trigger monitoring for web files is presented in Listing 6, and a sketch of the polling logic behind this mode is shown after the parameter list below. In the definition of the trigger, the following parameters need to be specified:
• ensemble: The targeted ensemble to which Pegasus-EM will queue up the new workflow
• trigger: A unique name for the trigger
• interval: The polling period at which Pegasus-EM will check for changes
• script: User-defined script that handles workflow generation
• web_location: Web url of the remote repository
• file_patterns: A list of regex patterns that will be checked against the file names
• timeout: If the optional timeout period elapses without any new files appearing, the trigger will be deleted
• args: An optional parameter for any extra arguments that need to be passed to the user-defined
script
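The sketch below illustrates the kind of polling loop that the web file trigger performs with these parameters. It is not the actual Pegasus-EM implementation; in particular, the way the remote listing is parsed and the way the workflow generation script is invoked are simplifying assumptions.

# Simplified sketch of the logic behind the web file trigger; this is not the
# actual Pegasus-EM implementation. Parsing the remote listing via href
# scraping and passing new files as script arguments are simplifying assumptions.
import re
import subprocess
import time
import requests

def web_file_trigger(web_location, file_patterns, script, interval_s, timeout_s, extra_args):
    patterns = [re.compile(p) for p in file_patterns]
    seen = set()
    last_new_file = time.time()
    while time.time() - last_new_file < timeout_s:
        listing = requests.get(web_location, timeout=30).text
        candidates = set(re.findall(r'href="([^"]+)"', listing))
        new_files = sorted(f for f in candidates
                           if f not in seen and any(p.match(f) for p in patterns))
        if new_files:
            last_new_file = time.time()
            seen.update(new_files)
            # Hand the new files to the user-defined workflow generation script,
            # which plans a new workflow and queues it to the ensemble.
            subprocess.run([script, *extra_args, *new_files], check=True)
        time.sleep(interval_s)
    # The timeout elapsed without new files: the trigger is deleted (loop exits).

# Values mirroring Listing 6:
# web_file_trigger("https://data.casa.umass.edu", [r".*netcdf.tar.gz"],
#                  "run_script.sh", 60, 3600, ["-n", "10"])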
In the following section we demonstrate this functionality and evaluate the effectiveness of the Pegasus Ensemble Manager in triggering new workflows and applying QoS policies to two production use cases from CASA [119].
6.3 Evaluation
6.3.1 CASA Pegasus Workflows
For the evaluation of the QoS impact we have selected two CASA workflows that produce nowcasts (Nowcast workflow) and wind speed estimates (Wind workflow), as described in Section 1.2.1. The implementation of the workflows in Pegasus was presented in Section 2.2.4; the abstract DAG of the Nowcast workflow can be found in Figure 2.3, and the abstract DAG of the Wind workflow can be found in Figure 2.4.
The workflow tasks include input data collection and product generation, visualization, contouring into
polygon objects, spatial comparisons of identified weather features with infrastructure, and dissemination
of notifications.
Figure 6.2: CASA vSDX workflow deployment.

Workflow Testcases. To conduct our evaluation, both workflows process 30 minutes of pre-captured real weather data, which we replay as if they were arriving in real time to simulate a production scenario from CASA’s operations. The individual files consumed by the nowcast workflow are 9.6MB in size and the total size is 287MB. On the other hand, the dataset for the wind workflows is composed of files with an individual size of ~12MB, and the total dataset size is ~6GB. For the two workflows we replay the data using an accumulation interval of 1 minute, and we use Pegasus-EM to identify the newly added files and queue nowcast or wind workflows to their respective ensembles.
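A minimal sketch of how such a replay can be driven is shown below; the directory names and file layout are illustrative assumptions, and the actual replay tooling we used may differ.

# Minimal replay sketch: copy pre-captured radar files into the monitored
# repository one batch per accumulation interval, so that they appear to
# arrive in real time. Directory names and file layout are assumptions.
import shutil
import time
from pathlib import Path

ARCHIVE_DIR = Path("/data/precaptured")    # assumed location of the 30-minute capture
REPO_DIR = Path("/data/repository")        # assumed directory monitored by Pegasus-EM
ACCUMULATION_INTERVAL_S = 60               # 1-minute accumulation interval

# Replay archives in name (timestamp) order, one per interval.
for path in sorted(ARCHIVE_DIR.glob("*netcdf.tar.gz")):
    shutil.copy2(path, REPO_DIR / path.name)
    time.sleep(ACCUMULATION_INTERVAL_S)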
6.3.2 Experimental Infrastructure
For evaluation, we used the DyNamo system (Chapter 2) to deploy a production scenario that is similar to CASA’s day-to-day operational radar data processing setup and spans both the ExoGENI and Chameleon testbeds (Figure 6.2). In our setup, Mobius [123] and the vSDX controller are running within
Docker containers at our USC Information Sciences Institute (ISI) Docker cluster.
Additionally, we are using one of CASA’s operational nodes at the University of North Texas (UNT) in
Denton, TX, to host the data and submit the Pegasus workflows. The vSDX nodes and the workflow master
node are located on ExoGENI at the University of Massachusetts Amherst (UMass) rack, on separate slices,
while the compute nodes are located on Chameleon at TACC. To establish the layer2 connectivity between
the sites, Mobius “stitched” the UNT server to the workflow master node and instructed the vSDX controller to stitch the same node to the Chameleon nodes via the vSDX slice. The Chameleon compute cluster
contains 5 nodes: 4 compute nodes and 1 storage node. Three of the compute nodes reside in the 192.168.40.0/24 subnet, while the other compute node and the storage node reside in the 192.168.30.0/24 subnet. Each node
has 24 physical cores with hyperthreading (48 threads), 192GB RAM, 250GB SSD and is connected to a
shared 10Gbps network. During the experiments we did not use the storage node for computation, in order to optimize network traffic; however, it served as a next hop for routing traffic from the subnet (192.168.40.0/24) that did not match the Chameleon stitchport’s subnet.
As we have shown in our initial evaluation of the DyNamo system (Chapter 2), 144 and 48 HTCondor compute slots are enough to execute the nowcast and the wind speed workflow ensembles, respectively, without any compute-imposed delays. Using HTCondor tags, the 3 compute nodes residing on
the subnet 192.168.40.0/24 have been assigned to nowcast workflow tasks, while the node on the subnet
192.168.30.0/24 has been assigned to the wind speed workflow. Finally, all the stitchable networks were
created with a network bandwidth of 1Gbps.
Software. On the submit node (where parts of the DyNamo system reside), the master node, and the
worker nodes we have installed HTCondor v8.8.9, and we have customized its configuration to match the
role of each node. In this setup, the workers are configured with partitionable slots and they advertise a
workflow tag so they can be matched to the correct workflow. Additionally, on the submit node we have
installed the nightly build of Pegasus v5.0.0 and the Apache HTTP server, to allow the workers to retrieve
input files, configuration files and the application containers over HTTP. All of the workers use Singularity
v3.6.1, and Mobius was used to provision compute resources on ExoGENI and Chameleon, and establish
the network connections between ExoGENI, Chameleon and the CASA repository.
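As an illustration of how this tag-based matchmaking can be expressed with the HTCondor Python bindings, the snippet below builds a submit description whose requirements expression targets only workers that advertise a matching machine attribute. The attribute name (WorkflowTag) and its values are assumptions; the exact attribute and configuration used in our deployment may differ.

# Illustration of tag-based matchmaking with the HTCondor Python bindings.
# The machine attribute name "WorkflowTag" and its values are assumptions;
# workers would advertise such an attribute through their local configuration
# (e.g., via STARTD_ATTRS), and the exact setup in our deployment may differ.
import htcondor

nowcast_job = htcondor.Submit({
    "executable": "/bin/sleep",                          # placeholder task
    "arguments": "60",
    "request_cpus": "1",
    "requirements": 'TARGET.WorkflowTag == "nowcast"',   # match only tagged workers
})

print(nowcast_job)   # inspect the generated submit description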
6.3.3 Workflow Ensembles - Network Requirements
We constructed two workflow ensembles that present different network requirements due to the number of tasks and container transfers they instantiate. To better understand the network needs of these two
workflow ensembles, we profile the network utilization on CASA’s data repository at UNT (production
site), as these workflow ensembles are being executed, using a dedicated 1Gbps layer2 connection and the
testcase datasets described in Section 6.3.1.
Figure 6.3a shows that the wind workflow ensemble is executed for ~2100 seconds, has an average
bandwidth usage of ~200Mbps with a peak close to 240Mbps, while the total amount of data transferred is
~44GBs.
Figure 6.3: Network Utilization. (a) Wind Ensemble; (b) Nowcast Ensemble. Each panel plots egress and ingress bandwidth (Mbps) and cumulative data transferred (GBs) against runtime (seconds).
Figure 6.3b depicts the network utilization imposed by the nowcast workflow ensemble. The nowcast calculations occupy resources for ~3200 seconds and drive the network into congestion for prolonged periods of time. The average network utilization is close to 900Mbps with spikes reaching 960Mbps,
and the total amount of data transferred is ~280GBs.
From Figures 6.3a and 6.3b it is clear that the two workflow ensembles cannot fairly share the same network resources without one of them impacting the other’s QoS constraints, since the nowcast workflow ensemble will lead to prolonged network congestion.
6.3.4 Experimental Results
To conduct our vSDX study, we developed 3 scenarios without throttling the ensembles via Pegasus-EM.
• 1Gbps Dedicated: Each workflow ensemble had a dedicated 1Gbps layer 2 link provisioned with
Mobius, connecting the data repository to the compute resources.
• 1Gbps vSDX-Shared: Both workflow ensembles shared the same 1Gbps layer 2 link provisioned with Mobius, connecting the data and compute. No QoS policies were applied.
• 1Gbps vSDX-Shared-QoS: Both workflow ensembles shared the same 1Gbps layer 2 link provisioned with Mobius, connecting the data and compute. Individual QoS policies were applied to each workflow ensemble: 300Mbps for Wind and 700Mbps for Nowcast (a sketch of how such rate limits can be enforced is shown after this list).
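For reference, the sketch below shows one way such per-ensemble rate limits can be enforced on an Open vSwitch port using ingress policing. The port names are hypothetical, and the vSDX controller applies its policies through its own mechanisms; this only illustrates the kind of limits used in the vSDX-Shared-QoS scenario.

# Sketch of enforcing per-ensemble rate limits with Open vSwitch ingress
# policing. The port names are hypothetical; the vSDX controller applies its
# QoS policies through its own mechanisms, and this only illustrates the kind
# of limits used in the vSDX-Shared-QoS scenario.
import subprocess

def limit_port(port: str, rate_mbps: int) -> None:
    """Cap the ingress rate of an OVS port (OVS expects kbps values)."""
    rate_kbps = rate_mbps * 1000
    burst_kbps = rate_kbps // 10   # common rule of thumb: burst ~10% of the rate
    subprocess.run([
        "ovs-vsctl", "set", "interface", port,
        f"ingress_policing_rate={rate_kbps}",
        f"ingress_policing_burst={burst_kbps}",
    ], check=True)

limit_port("wind-vif", 300)       # 300Mbps for the Wind ensemble
limit_port("nowcast-vif", 700)    # 700Mbps for the Nowcast ensemble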
For each of the 3 scenarios, we repeated the workflow ensemble executions 5 times, leading to 900
workflow submissions and over 240,000 file transfers generating over 4TBs of network traffic. Figures 6.4a
and 6.4b present makespan statistics of the individual workflows of the ensembles, while Figures 6.5a and
6.5b present statistics of the individual data transfers of the workflow ensembles.
Figure 6.4: Workflow Makespans. (a) Wind Ensemble; (b) Nowcast Ensemble. Box plots of individual workflow runtimes (seconds) for each network configuration (1Gbps Dedicated, 1Gbps SDX-Shared, and 300Mbps/700Mbps SDX-Shared-QoS).
To explore the space of different ensemble throttling configurations, we executed the two workflow ensembles under 42 unique configurations, varying the maximum number of concurrent wind and nowcast workflows. In every configuration we make sure that the wind ensemble concurrency is greater than or equal to the nowcast ensemble concurrency. We justify this decision to prune some of the possible configurations by the fact that nowcast congests the network while wind does not, so exploring the pruned part of the space would not yield useful insight. The concurrency levels were varied as follows (the resulting configuration grid is sketched after this list):
• Wind ensemble: Concurrency was altered from 1 to 16 with a step of 2.
• Nowcast ensemble: Concurrency was altered from 1 to 12 with a step of 2.
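The snippet below enumerates the resulting configuration grid, assuming the concurrency levels match the heatmap axes in Figures 6.6 and 6.7 (1, then 2 through 16 or 12 in steps of 2); together with the constraint above this yields exactly the 42 configurations.

# Enumerate the throttling configurations evaluated in this section, assuming
# the concurrency levels match the heatmap axes in Figures 6.6 and 6.7.
wind_levels = [1] + list(range(2, 17, 2))      # 1, 2, 4, ..., 16
nowcast_levels = [1] + list(range(2, 13, 2))   # 1, 2, 4, ..., 12

configs = [(wind, nowcast)
           for nowcast in nowcast_levels
           for wind in wind_levels
           if wind >= nowcast]
assert len(configs) == 42   # the 42 unique configurations reported above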
This resulted in over 2000 additional workflow executions and over 13TBs of data transfers. Figures 6.6a
and 6.6b present heatmaps of the average individual workflow makespans of the two ensembles. Figures 6.7a and 6.7b present heatmaps of the makespans of the two workflow ensembles. These results are
discussed in Section 6.3.4.4.
6.3.4.1 Dedicated link performance
To conduct our vSDX analysis, we first executed the nowcast and wind workflow ensembles under the
best conditions possible, using a 1Gbps dedicated layer2 connection and no Pegasus-EM throttling. For the
wind ensemble Figures 6.4a and 6.5a show a very consistent workflow duration (~300 seconds) and file
transfer duration with very little deviation. On the other hand, since the nowcast workflow was creating
network congestion we observe a noticeable deviation in both the workflow and file transfer durations
(Figures 6.4b and 6.5b). More than half of the workflows in the nowcast ensemble are completed within
less than 1500 seconds. However, there are workflow executions that take from 500 seconds all the way to
~2,400 seconds.
6.3.4.2 Uncontrolled network sharing
When we allow the two workflow ensembles to share the same network resources without any QoS policy, we observe a very noticeable increase in the workflow makespans (Figures 6.4a, 6.4b middle). The most impacted are the workflows of the wind ensemble, where the average workflow duration increases from 300 seconds to over 1000 seconds, with some workflows completing execution close to 1800 seconds. This is an increase of over 500%. The impact of the additional network overhead is also visible in the nowcast workflows, although more subtle. The median nowcast workflow duration increased by about 200 seconds, while there were more workflows at the far ends of the spectrum.

Figure 6.5: Data Transfer Durations. (a) Wind Ensemble; (b) Nowcast Ensemble. Box plots of individual data transfer durations (seconds) for each network configuration (1Gbps Dedicated, 1Gbps SDX-Shared, and 300Mbps/700Mbps SDX-Shared-QoS).
6.3.4.3 Applying SDX QoS policies
Finally, based on the network profiles presented in Figure 6.3a and Figure 6.3b we allocated 300Mbps of
the available network bandwidth to the wind workflow ensemble and 700Mbps to the nowcast workflow
ensemble, in an attempt to accommodate any network spikes of the wind ensemble. Both Figure 6.4a and
Figure 6.5a show an improvement of the wind workflow median makespan and data transfer durations.
The wind ensemble’s statistics have returned to a more consistent and predictable state with small deviation, similar to the execution conditions when a dedicated network link was used. Meanwhile, as expected, the median runtime of the nowcast workflows has increased since there is less available
bandwidth (700Mbps) than what the workflow would optimally require (~900Mbps). However, the relative
increase in comparison to the dedicated link runtimes is less than 60%. Something we did not expect to
see was that even though the median duration of the file transfers in the nowcast ensemble increased by
a few seconds, the transfers became more consistent, reducing the duration of the slowest transfers.
Figure 6.6: Heatmap of Average Workflow Makespans. (a) Wind Workflow; (b) Nowcast Workflow. Each heatmap reports the average workflow makespan in seconds as a function of the wind ensemble execution parallelism (1, 2, 4, ..., 16) and the nowcast ensemble execution parallelism (1, 2, 4, ..., 12).
6.3.4.4 Applying QoS Policies using Pegasus-EM
With the extensions we introduced in Section 6.2.2, the Pegasus Ensemble Manager now supports dynamic data driven applications, and its throttling capabilities offer another opportunity to apply QoS policies to workflow ensembles that share network resources. We executed the wind workflow ensemble with
workflow execution parallelism ranging from 1 to 16, and the nowcast ensemble with workflow execution
parallelism ranging from 1 to 12. Figures 6.6a and 6.6b present the average workflow makespans of the two
ensembles and we can distinguish a pattern for both cases. As we increase the concurrency of the nowcast
ensemble (moving to the right) the average workflow execution time increases in both cases, and affects
the turnaround times. For the wind workflows there is a 350% worst-case increase, and for the nowcast workflows there is a 320% worst-case increase. On the other hand, increasing the concurrency of the wind ensemble (moving to the top) does not affect the execution times, which was expected.
Figures 6.7a and 6.7b show the makespans of the whole ensembles. In Figure 6.7a, as we increase the wind ensemble parallelism (moving to the top) while maintaining the nowcast ensemble parallelism equal to 1, the makespan of the wind ensemble decreases, but only until the maximum concurrency equals 6. After that there is no improvement. However, as we increase the nowcast ensemble parallelism (moving to the right), we need to increase the wind ensemble parallelism again to improve the makespans. In Figure 6.7b the makespan
of the nowcast ensembles is governed only by the execution parallelism set for them. As we increase the nowcast ensemble execution parallelism (moving to the right) the makespan of the ensembles is reduced, until a maximum concurrency of 8 is reached. After this point we do not observe any significant improvements; the network gets significantly saturated and the workflow turnaround time is tripled (Figures 6.6a, 6.6b) compared to the non-congested state.

Figure 6.7: Heatmap of Workflow Ensemble Makespans. (a) Wind Ensemble; (b) Nowcast Ensemble. Each heatmap reports the ensemble makespan in seconds as a function of the wind ensemble execution parallelism (1, 2, 4, ..., 16) and the nowcast ensemble execution parallelism (1, 2, 4, ..., 12).
One thing that is notable with the Pegasus-EM throttling is that we were able to get better workflow turnaround times for the nowcast ensemble under a shared resource scenario than in the 1Gbps dedicated link scenario (Figure 6.4b left boxplot). By setting the nowcast ensemble’s workflow execution parallelism to 8, the average workflow turnaround time is under 1000 seconds (Figure 6.6b). This can be explained by the fact that the nowcast ensemble overly saturates the network (Figure 6.3b), and by pacing the rate of
the dispatched nowcast workflows we can achieve better average workflow execution times.
6.4 Conclusion
In this chapter we demonstrated how a DDDAS system can experience degraded performance and fail to
satisfy QoS requirements, due to unfair network contention caused by workflow ensembles that compete
for the provisioned network resources. We proposed two novel approaches to apply QoS policies to the
workflow ensembles, in order to promote a fairer sharing of the provisioned cyberinfrastructure. The Virtual Software Defined Exchange (vSDX) approach allows fine-grained control over the dynamically
established networks of the cyberinfrastructure, via link adaptation, flow prioritization and traffic control
between endpoints. These policies can be an effective way to avoid unfair use of network resources. Even if a single workflow ensemble is capable of flooding and congesting the network, other ensembles can maintain their own QoS requirements. To evaluate the QoS policies we deployed two of CASA’s workflow
ensembles (Wind Speed and Nowcast) and we showed that even though the Nowcast ensemble is capable
of interfering with the Wind Ensemble, by applying the QoS policies the interference is removed and
the Wind ensemble’s performance returns to levels close to the ones observed using a dedicated network
link. The second approach proposed in this chapter is based on the Pegasus Ensemble Manager (Pegasus-EM). Pegasus-EM was extended within the scope of this thesis to support file- and time-based workflow
triggering logic that allows DDDAS applications, like CASA, to automatically execute their workflows as
new data arrive, while managing the number of concurrent workflows being executed. In our evaluation
we showed that Pegasus-EM provides an alternative way of applying QoS policies and can promote a
fairer sharing of both network and compute resources. This is essential for infrastructures that do not offer Software Defined Network (SDN) support but where QoS policies still need to be applied. Both methods were incorporated into the DyNamo system that we developed and described in Chapter 2.
Chapter 7
Summary and Future Work
Computational science today depends on complex, data-intensive applications operating on datasets from a
variety of scientific instruments. Dynamic Data Driven Application Systems (DDDAS) are being employed
by many scientific domains in order to dynamically integrate real-time and archival data from multiple
sources, with their modeling and simulation algorithms. DDDAS applications heavily rely on the available
cyberinfrastructure (CI) for their operations, but often they do not leverage all the available CI capabilities
or capacities. For example, they are not designed to utilize adaptive features offered by state-of-the-art,
networked cloud infrastructures, especially with respect to delivering end-to-end, high-performance data
flows. Traditional approaches of statically provisioned, dedicated, preconfigured computing and network
infrastructure are expensive, hard to adapt, and difficult to manage.
In this thesis we identified challenges and provided algorithms and solutions for managing the CI in
support of Dynamic Data Driven Application Systems. Our efforts focused on making data integration
into DDDAS applications more efficient by building new automation mechanisms driven by the science
application and by managing the network infrastructure to reduce bottlenecks and increase performance
of data transfers. In Chapter 2 we presented our approach to infrastructure federation, along with the
design and evaluation of mechanisms that enable malleable, high-performance data flows between diverse,
distributed, national-scale cloud platforms. In Chapter 3 we introduced an end-to-end framework for online capture, preprocessing, and storage of scientific workflow performance data. The
framework includes fine-grained and coarse-grained application-level and resource-level data within or
across distributed CI. Additionally, in Chapter 4 we presented an experimental study demonstrating how
data management configurations affect data-intensive applications, while they are executing in cloud environments. In Chapter 5 we introduced mechanisms that can be applied during the planning phase of
the workflows, when the workflow management system makes resource selection and task scheduling
decisions. We explored how dynamic workflow reorganization and placement of computing tasks closer
to the edge can help optimize the application’s network needs by reducing data movement and increasing
data reuse. Finally, in Chapter 6 we conceptualized and implemented two novel methodologies to alleviate network contention using application-aware network control mechanisms that identify and throttle
application flows that saturate the network at runtime.
The constant evolution of the cyberinfrastructure, whose new capabilities include new software and new types of hardware, provides many opportunities for future research directions. P4 (Programming Protocol-independent Packet Processors) [24] capable switches are becoming more broadly available and offer promising avenues for future work on in-network management and data transfer optimization. P4 switches make the concept of in-network processing more approachable through programmable interfaces and APIs,
allowing applications to perform computations as the data moves through the network. Among others,
with the P4 language DDDAS could introduce virtual network functions into their workflows, increase
their security during data transfers, and develop new network congestion avoidance strategies via real
time network traffic analysis [90],[34],[126].
Additionally, in Chapter 5 we touched upon the edge-to-cloud continuum as a way of decreasing network requirements of the DDDAS workflows. However, the integration of edge computing and IoT devices
into the data transfer and DDDAS ecosystem needs to be further explored. Current data transfer mechanisms are designed for powerful endpoints that have constant connectivity. Thus, there is a need to rethink how data become available to DDDAS applications in these cases, and how DDDAS applications can utilize these low-powered computational resources at the edge more efficiently [85], [152].
Finally, new classes of extreme low latency DDDAS applications are becoming commonplace. For
example, the deployment and control of Unpiloted Aerial Vehicles (UAV) applications have become either
the subject of research themselves or are used by researchers to provide scientific observations and input to DDDAS applications, ranging from environmental monitoring and disaster response, to wildfire
monitoring [43, 167, 33, 12]. UAV-based applications are often mission-critical and are also affected by
intermittent network connectivity and highly fluctuating throughput. As a result, extreme low latency
DDDAS cyberinfrastructure research for controlling UAVs can be crucial for the successful completion of
their missions.
The field of CI management for dynamic data-driven application systems is highly interdisciplinary.
Collaborating with experts from various domains and staying up-to-date with the latest research and technological advancements will always be crucial for impactful future work in this area.
Bibliography
[1] Amazon Web Services. AWS IoT Greengrass. url: https://aws.amazon.com/greengrass/.
[2] 1000 Genomes Project Consortium. “A global reference for human genetic variation”. In: Nature
526.7571 (2015), pp. 68–74.
[3] Abdurro’uf et al. “The Seventeenth Data Release of the Sloan Digital Sky Surveys: Complete
Release of MaNGA, MaStar, and APOGEE-2 Data”. In: apjs 259.2, 35 (Apr. 2022), p. 35. doi:
10.3847/1538-4365/ac4414. arXiv: 2112.02026 [astro-ph.GA].
[4] Saeid Abrishami, Mahmoud Naghibzadeh, and Dick H.J. Epema. “Deadline-constrained
workflow scheduling algorithms for Infrastructure as a Service Clouds”. In: Future Generation
Computer Systems 29.1 (2013), pp. 158–169.
[5] Advanced Message Queuing Protocol (AMQP). https://www.amqp.org/.
[6] Ahab Github Repository. https://github.com/RENCI-NRIG/ahab.
[7] Takuya Akiba et al. “Optuna: A next-generation hyperparameter optimization framework”. In:
Proceedings of the 25th ACM SIGKDD international conference on knowledge discovery & data
mining. 2019.
[8] Firoj Alam, Ferda Ofli, and Muhammad Imran. “CrisisMMD: Multimodal Twitter Datasets from
Natural Disasters”. In: Proceedings of the 12th International AAAI Conference on Web and Social
Media (ICWSM). USA, June 2018.
[9] Nilani Algiriyage, Raj Prasanna, Kristin Stock, Emma E. H. Doyle, and David Johnston.
“Multi-source Multimodal Data and Deep Learning for Disaster Response: A Systematic Review”.
In: SN Computer Science 3.1 (Nov. 2021), p. 92. issn: 2661-8907. doi: 10.1007/s42979-021-00971-4.
[10] William Allcock, John Bresnahan, Rajkumar Kettimuthu, Michael Link, Catalin Dumitrescu,
Ioan Raicu, and Ian Foster. “The Globus Striped GridFTP Framework and Server”. In: ACM/IEEE
Conference on Supercomputing. SC ’05. Washington, DC, USA: IEEE Computer Society, 2005,
pp. 54–. isbn: 1-59593-061-2. doi: 10.1109/SC.2005.72.
[11] Bryce Allen, John Bresnahan, Lisa Childers, Ian Foster, Gopi Kandaswamy, Raj Kettimuthu,
Jack Kordas, Mike Link, Stuart Martin, Karl Pickett, and Steven Tuecke. “Software as a Service
for Data Scientists”. In: Communications of the ACM 55.2 (2012), pp. 81–88.
[12] Robert S. Allison, Joshua M. Johnston, Gregory Craig, and Sion Jennings. “Airborne Optical and
Thermal Remote Sensing for Wildfire Detection and Monitoring”. In: Sensors 16.8 (2016). issn:
1424-8220. doi: 10.3390/s16081310.
[13] Amazon. Amazon S3.
[14] Amazon.com, Inc. Elastic Compute Cloud (EC2). http://aws.amazon.com/ec2. url:
http://aws.amazon.com/ec2.
[15] Apache jclouds. https://jclouds.apache.org/.
[16] Daniel G. Arce. “Cybersecurity and platform competition in the cloud”. In: Computers & Security
93 (2020), p. 101774. issn: 0167-4048. doi: https://doi.org/10.1016/j.cose.2020.101774.
[17] Owen Arnold, Jean-Christophe Bilheux, JM Borreguero, Alex Buts, Stuart I Campbell, L Chapon,
M Doucet, N Draper, R Ferraz Leal, MA Gigg, et al. “Mantid—data analysis and visualization
package for neutron scattering and µ SR experiments”. In: Nuclear Instruments and Methods in
Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment 764
(2014), pp. 156–166.
[18] AWS CloudFormation. http://aws.amazon.com/cloudformation.
[19] I. Baldin, P. Ruth, C. Wang, and J. S. Chase. “The Future of Multi-Clouds: A Survey of Essential
Architectural Elements”. In: 2018 International Scientific and Technical Conference Modern
Computer Network Technologies (MoNeTeC). Oct. 2018, pp. 1–13.
[20] Ilya Baldin, Jeff Chase, Yufeng Xin, Anirban Mandal, Paul Ruth, Claris Castillo,
Victor Orlikowski, Chris Heermann, and Jonathan Mills. “ExoGENI: A Multi-Domain
Infrastructure-as-a-Service Testbed”. In: The GENI Book. Ed. by Rick McGeer, Mark Berman,
Chip Elliott, and Robert Ricci. Springer International Publishing, 2016, pp. 279–315.
[21] Ilya Baldin, Anita Nikolich, James Griffioen, Indermohan Inder S. Monga, Kuang-Ching Wang,
Tom Lehman, and Paul Ruth. “FABRIC: A National-Scale Programmable Experimental Network
Infrastructure”. In: IEEE Internet Computing 23.6 (2019), pp. 38–47. doi:
10.1109/MIC.2019.2958545.
[22] Erik P. Blasch, Frederica Darema, Sai Ravela, and Alex J. Aved, eds. Handbook of Dynamic Data
Driven Applications Systems Vol. 1. Springer International Publishing, 2022. doi:
10.1007/978-3-030-74568-4.
[23] Carl Boettiger. “An Introduction to Docker for Reproducible Research”. In: SIGOPS Oper. Syst.
Rev. 49.1 (Jan. 2015), pp. 71–79.
[24] Pat Bosshart, Dan Daly, Glen Gibb, Martin Izzard, Nick McKeown, Jennifer Rexford,
Cole Schlesinger, Dan Talayco, Amin Vahdat, George Varghese, and David Walker. “P4:
Programming Protocol-Independent Packet Processors”. In: SIGCOMM Comput. Commun. Rev.
44.3 (July 2014), pp. 87–95. issn: 0146-4833. doi: 10.1145/2656877.2656890.
[25] Maria Carla Calzarossa, Luisa Massari, and Daniele Tessera. “Workload characterization: A
survey revisited”. In: ACM Computing Surveys (CSUR) 48.3 (2016), p. 48.
[26] Joao Carreira et al. “Cirrus: A serverless framework for end-to-end ML workflows”. In:
Proceedings of the ACM Symposium on Cloud Computing. 2019, pp. 13–24.
[27] Mitchell K Cavanagh, Kenji Bekki, and Brent A Groves. “Morphological classification of galaxies
with deep learning: comparing 3-way and 4-way CNNs”. In: Monthly Notices of the Royal
Astronomical Society 506.1 (June 2021), pp. 659–676. issn: 0035-8711. doi:
10.1093/mnras/stab1552. eprint:
https://academic.oup.com/mnras/article-pdf/506/1/659/38878609/stab1552.pdf.
[28] Mert Cevik, Paul Ruth, Kate Keahey, and Pierre Riteau. “Wide-area Software Defined
Networking Experiments using Chameleon”. In: IEEE INFOCOM 2019 - IEEE Conference on
Computer Communications Workshops (INFOCOM WKSHPS). Apr. 2019.
[29] Badrish Chandramouli, Joris Claessens, Suman Nath, Ivo Santos, and Wenchao Zhou. “RACE:
real-time applications over cloud-edge”. en. In: Proceedings of the 2012 international conference on
Management of Data - SIGMOD ’12. Scottsdale, Arizona, USA: ACM Press, 2012, p. 625. isbn:
9781450312479. doi: 10.1145/2213836.2213916. (Visited on 11/30/2021).
[30] Jeff Chase, Laura Grit, David Irwin, Varun Marupadi, Piyush Shivam, and Aydan Yumerefendi.
“Beyond virtual data centers: Toward an open resource control architecture”. In: International
Conference on the Virtual Computing Initiative. 2007.
[31] Weiwei Chen, Rafael Ferreira da Silva, Ewa Deelman, and Thomas Fahringer. “Dynamic and
Fault-Tolerant Clustering for Scientific Workflows”. In: IEEE Transactions on Cloud Computing
4.1 (2016), pp. 49–62. doi: 10.1109/TCC.2015.2427200.
[32] Weiwei Chen, Rafael Ferreira da Silva, Ewa Deelman, and Rizos Sakellariou. “Using Imbalance
Metrics to Optimize Task Clustering in Scientific Workflow Executions”. In: Future Generation
Computer Systems 46 (2015), pp. 69–84. doi: 10.1016/j.future.2014.09.014.
[33] Xiwen Chen, Bryce Hopkins, Hao Wang, Leo O’Neill, Fatemeh Afghah, Abolfazl Razi, Peter Fulé,
Janice Coen, Eric Rowell, and Adam Watts. “Wildland Fire Detection and Monitoring using a
Drone-collected RGB/IR Image Dataset”. In: IEEE Access (2022), pp. 1–1. doi:
10.1109/ACCESS.2022.3222805.
[34] Yingwen Chen, Hong Va Leong, Ming Xu, Jiannong Cao, K.C.C Chan, and A.T.S Chan.
“In-Network Data Processing for Wireless Sensor Networks”. In: 7th International Conference on
Mobile Data Management (MDM’06). 2006, pp. 26–26. doi: 10.1109/MDM.2006.96.
[35] Zeshan Chishti et al. “Memory System Characterization of Deep Learning Workloads”. In:
Proceedings of the International Symposium on Memory Systems. New York, NY, USA: Association
for Computing Machinery, 2019. isbn: 9781450372060. doi: 10.1145/3357526.3357569.
[36] Fahim Chowdhury et al. “I/O Characterization and Performance Evaluation of BeeGFS for Deep
Learning”. In: Proceedings of the 48th International Conference on Parallel Processing (2019).
[37] Emanuel Ferreira Coutinho, Flávio Rubens de Carvalho Sousa, Paulo Antonio Leal Rego,
Danielo Gonçalves Gomes, and José Neuman de Souza. “Elasticity in cloud computing: a survey”.
In: annals of telecommunications - annales des télécommunications 70.7 (Aug. 2015), pp. 289–309.
[38] Frederica Darema, Erik P. Blasch, Sai Ravela, and Alex J. Aved, eds. Handbook of Dynamic Data
Driven Applications Systems Vol. 2. Springer International Publishing, 2023. doi:
10.1007/978-3-031-27986-7.
[39] Eli D Dart, Katie A Antypas, Gregory R Bell, Edward W Bethel, Richard Carlson, Vince Dattoria,
Kaushik De, Ian T Foster, Barbara Helland, Mary C Hester, et al. “Advanced Scientific
Computing Research Network Requirements Review: Final Report 2015”. In: (2016).
[40] Erfan Darzidehkalani, Mohammad Ghasemi-rad, and P.M.A. van Ooijen. “Federated Learning in
Medical Imaging: Part I: Toward Multicentral Health Care Ecosystems”. In: Journal of the
American College of Radiology 19.8 (2022), pp. 969–974. issn: 1546-1440. doi:
https://doi.org/10.1016/j.jacr.2022.03.015.
[41] Ewa Deelman, Karan Vahi, Gideon Juve, Mats Rynge, Scott Callaghan, Philip J Maechling,
Rajiv Mayani, Weiwei Chen, Rafael Ferreira da Silva, Miron Livny, and Kent Wenger. “Pegasus: a
Workflow Management System for Science Automation”. In: Future Generation Computer
Systems 46 (2015), pp. 17–35. doi: 10.1016/j.future.2014.10.008.
[42] Ewa Deelman, Karan Vahi, Mats Rynge, Rajiv Mayani, Rafael Ferreira da Silva,
George Papadimitriou, and Miron Livny. “The Evolution of the Pegasus Workflow Management
Software”. In: Computing in Science Engineering 21.4 (2019), pp. 22–36. doi:
10.1109/MCSE.2019.2919690.
[43] Ritu Dewan and Khandakar Faridar Rahman. “A Survey on Applications of Unmanned Aerial
Vehicles (UAVs)”. In: Recent Innovations in Computing. Ed. by Pradeep Kumar Singh,
Yashwant Singh, Jitender Kumar Chhabra, Zoltán Illés, and Chaman Verma. Singapore: Springer
Singapore, 2022, pp. 95–110. isbn: 978-981-16-8892-8.
[44] Paolo Di Tommaso, Maria Chatzou, Evan W Floden, Pablo Prieto Barja, Emilio Palumbo, and
Cedric Notredame. “Nextflow enables reproducible computational workflows”. In: Nature
biotechnology 35.4 (2017), p. 316.
[45] M. Dickinson, S. Debroy, P. Calyam, S. Valluripally, Y. Zhang, R. Bazan Antequera, T. Joshi,
T. White, and D. Xu. “Multi-cloud Performance and Security Driven Federated Workflow
Management”. In: IEEE Transactions on Cloud Computing (2018), pp. 1–1. issn: 2168-7161.
[46] Docker Inc. Docker Compose.
[47] “Doppler Radar and Weather Observations (Second Edition)”. In: Doppler Radar and Weather
Observations (Second Edition). Ed. by Richard J. Doviak and Dušan S. Zrnic. Second Edition. San
Diego: Academic Press, 1993, p. iv. isbn: 978-0-12-221422-6. doi:
https://doi.org/10.1016/B978-0-12-221422-6.50002-4.
[48] ELK Stack. https://www.elastic.co/elk-stack. 2018.
[49] Energy Sciences Network (ESnet). https://www.es.net/.
[50] Exoplex Github Repository. https://github.com/RENCI-NRIG/CICI-SAFE.
[51] Dror G Feitelson, Dan Tsafrir, and David Krakov. “Experience with using the parallel workloads
archive”. In: Journal of Parallel and Distributed Computing 74.10 (2014), pp. 2967–2982.
[52] Rafael Ferreira da Silva, Scott Callaghan, and Ewa Deelman. “On the Use of Burst Buffers for
Accelerating Data-Intensive Scientific Workflows”. In: 12th Workshop on Workflows in Support of
Large-Scale Science (WORKS’17). 2017. doi: 10.1145/3150994.3151000.
[53] Rafael Ferreira da Silva, Scott Callaghan, Tu Mai Anh Do, George Papadimitriou, and
Ewa Deelman. “Measuring the Impact of Burst Buffers on Data-Intensive Scientific Workflows”.
In: Future Generation Computer Systems 101 (2019), pp. 208–220. doi:
10.1016/j.future.2019.06.016.
[54] Rafael Ferreira da Silva, Weiwei Chen, Gideon Juve, Karan Vahi, and Ewa Deelman.
“Community Resources for Enabling and Evaluating Research on Scientific Workflows”. In: 10th
IEEE International Conference on e-Science. eScience’14. 2014, pp. 177–184. doi:
10.1109/eScience.2014.44.
[55] Rafael Ferreira da Silva, Rosa Filgueira, Ewa Deelman, Erola Pairo-Castineira,
Ian Michael Overton, and Malcolm Atkinson. “Using Simple PID-inspired Controllers for Online
Resilient Resource Management of Distributed Scientific Workflows”. In: Future Generation
Computer Systems 95 (2019), pp. 615–628. doi: 10.1016/j.future.2019.01.015.
[56] Rafael Ferreira da Silva and Tristan Glatard. “A Science-Gateway Workload Archive to Study
Pilot Jobs, User Activity, Bag of Tasks, Task Sub-steps, and Workflow Executions”. In: Euro-Par
2012: Parallel Processing Workshops. Ed. by Ioannis Caragiannis et al. Vol. 7640. Lecture Notes in
Computer Science. 2013, pp. 79–88. doi: 10.1007/978-3-642-36949-0_10.
[57] Rafael Ferreira da Silva, Gideon Juve, Mats Rynge, Ewa Deelman, and Miron Livny. “Online Task
Resource Consumption Prediction for Scientific Workflows”. In: Parallel Processing Letters 25.3
(2015). doi: 10.1142/S0129626415410030.
[58] Rafael Ferreira da Silva, Mats Rynge, Gideon Juve, Igor Sfiligoi, Ewa Deelman, James Letts,
Frank Wurthwein, and Miron Livny. “Characterizing a High Throughput Computing Workload:
The Compact Muon Solenoid (CMS) Experiment at LHC”. In: Procedia Computer Science,
International Conference On Computational Science, ICCS 2015 Computational Science at the Gates
of Nature 51 (2015), pp. 39–48. doi: 10.1016/j.procs.2015.05.190.
[59] I. Foster. “Globus Online: Accelerating and Democratizing Science through Cloud-Based
Services”. In: IEEE Internet Computing 15.3 (May 2011), pp. 70–73.
[60] Prathamesh Gaikwad, Anirban Mandal, Paul Ruth, Gideon Juve, Dariusz Król, and
Ewa Deelman. “Anomaly detection for scientific workflow applications on networked clouds”.
In: High Performance Computing & Simulation (HPCS), 2016 International Conference on. IEEE.
2016, pp. 645–652.
[61] Guilherme Galante, Luis Carlos Erpen De Bona, Antonio Roberto Mury, Bruno Schulze, and
Rodrigo Rosa Righi. “An Analysis of Public Clouds Elasticity in the Execution of Scientific
Applications: A Survey”. In: Journal of Grid Computing 14.2 (June 2016), pp. 193–216.
[62] Daniel Garijo, Pinar Alper, Khalid Belhajjame, Oscar Corcho, Yolanda Gil, and Carole Goble.
“Common motifs in scientific workflows: An empirical analysis”. In: Future Generation Computer
Systems 36 (2014), pp. 338–351.
[63] Wolfgang Gerlach, Wei Tang, Kevin Keegan, Travis Harrison, Andreas Wilke, Jared Bischof,
Mark D’Souza, Scott Devoid, Daniel Murphy-Olson, Narayan Desai, et al. “Skyport:
container-based execution environment management for multi-cloud scientific workflows”. In:
Proceedings of the 5th International Workshop on Data-Intensive Computing in the Clouds. IEEE
Press. 2014, pp. 25–32.
[64] Wolfgang Gerlach, Wei Tang, Andreas Wilke, Dan Olson, and Folker Meyer. “Container
orchestration for scientific workflows”. In: 2015 IEEE International conference on cloud
engineering. IEEE. 2015, pp. 377–378.
[65] M. H. Ghahramani, M. Zhou, and C. T. Hon. “Toward cloud computing QoS architecture:
analysis of cloud systems and cloud services”. In: IEEE/CAA Journal of Automatica Sinica 4.1
(2017), pp. 6–18.
[66] Aritra Ghosh, C. Megan Urry, Zhengdong Wang, Kevin Schawinski, Dennis Turp, and
Meredith C. Powell. “Galaxy Morphology Network: A Convolutional Neural Network Used to
Study Morphology and Quenching in ~100,000 SDSS and ~20,000 CANDELS Galaxies”. In: The
Astrophysical Journal 895.2 (June 2020), p. 112. doi: 10.3847/1538-4357/ab8a47.
[67] Shilpa Gite, Abhinav Mishra, and Ketan Kotecha. “Enhanced lung image segmentation using
deep learning”. en. In: Neural Comput Appl (Jan. 2022), pp. 1–15.
[68] Google Cloud. https://cloud.google.com/.
[69] Google cloud IOT - fully managed IOT services. url: https://cloud.google.com/solutions/iot.
[70] Timothy Grance and Wayne Jansen. Guidelines on Security and Privacy in Public Cloud
Computing. Nov. 2011. doi: https://doi.org/10.6028/NIST.SP.800-144.
[71] Robert Graves, Thomas H. Jordan, Scott Callaghan, Ewa Deelman, Edward Field, Gideon Juve,
Carl Kesselman, Philip Maechling, Gaurang Mehta, Kevin Milner, David Okaya, Patrick Small,
and Karan Vahi. “CyberShake: A Physics-Based Seismic Hazard Model for Southern California”.
In: Pure and Applied Geophysics 168.3-4 (May 2010), pp. 367–381. doi: 10.1007/s00024-010-0161-6.
[72] Mauricio Guignard et al. “Performance Characterization of State-Of-The-Art Deep Learning
Workloads on an IBM "Minsky" Platform”. In: Proceedings of the 51st Hawaii International
Conference on System Sciences. 2018. doi: 10.24251/HICSS.2018.702.
[73] Dan Gunter, Ewa Deelman, Taghrid Samak, Christopher Brooks, Monte Goode, Gideon Juve,
Gaurang Mehta, Priscilla Moraes, Fabio Silva, Martin Swany, and Karan Vahi. “Online Workflow
Management and Performance Analysis with Stampede”. In: 7th International Conference on
Network and Service Management (CNSM-2011). 2011.
[74] Jeroen van der Ham, Freek Dijkstra, Paola Grosso, Ronald van der Pol, Andree Toonk, and
Cees de Laat. “A distributed topology information system for optical networks based on the
semantic web”. In: Optical Switching and Networking 5.2 (2008), pp. 85–93. issn: 1573-4277. doi:
https://doi.org/10.1016/j.osn.2008.01.006.
[75] HDF5. https://portal.hdfgroup.org/display/HDF5/HDF5.
[76] Bert Hubert, Thomas Graf, Greg Maxwell, Remco van Mook, Martijn van Oosterhout,
P Schroeder, Jasper Spaans, and Pedro Larroy. “Linux advanced routing & traffic control”. In:
Ottawa Linux Symposium. Vol. 213. 2002.
[77] Internet2. https://www.internet2.edu/.
[78] Introducing FBLearner Flow: Facebook’s AI backbone. Facebook Engineering. 2016. url:
https://engineering.fb.com/2016/05/09/core-data/introducing-fblearner-flow-facebook-s-ai-backbone/.
[79] Alexandru Iosup, Hui Li, Mathieu Jan, Shanny Anoep, Catalin Dumitrescu, Lex Wolters, and
Dick HJ Epema. “The grid workloads archive”. In: Future Generation Computer Systems 24.7
(2008), pp. 672–686.
[80] Aditi Jain et al. Lung Instance Segmentation Workflow Implementation for the Pegasus Workflow
Management System. Aug. 2021. doi: 10.5281/zenodo.5297479.
[81] Jianwei Li, Wei-keng Liao, A. Choudhary, R. Ross, R. Thakur, W. Gropp, R. Latham, A. Siegel,
B. Gallagher, and M. Zingale. “Parallel netCDF: A High-Performance Scientific I/O Interface”. In:
SC ’03: Proceedings of the 2003 ACM/IEEE Conference on Supercomputing. Nov. 2003, pp. 39–39.
doi: 10.1109/SC.2003.10053.
[82] Jongmin Jo, Sucheol Jeong, and Pilsung Kang. “Benchmarking GPU-Accelerated Edge Devices”.
In: 2020 IEEE International Conference on Big Data and Smart Computing (BigComp). 2020,
pp. 117–120. doi: 10.1109/BigComp48618.2020.00-89.
[83] Gideon Juve, Ann Chervenak, Ewa Deelman, Shishir Bharathi, Gaurang Mehta, and Karan Vahi.
“Characterizing and Profiling Scientific Workflows”. In: Future Generation Computer Systems
29.3 (Mar. 2013), pp. 682–692. url:
http://pegasus.isi.edu/publications/2013/JuveG-Characterizing.pdf.
[84] Gideon Juve, Benjamin Tovar, Rafael Ferreira da Silva, Dariusz Krol, Douglas Thain,
Ewa Deelman, William Allcock, and Miron Livny. “Practical Resource Monitoring for Robust
High Throughput Computing”. In: Workshop on Monitoring and Analysis for High Performance
Computing Systems Plus Applications. 2015. doi: 10.1109/CLUSTER.2015.115.
[85] Yogeswaranathan Kalyani and Rem Collier. “A Systematic Survey on the Role of Cloud, Fog, and
Edge Computing Combination in Smart Agriculture”. In: Sensors 21.17 (June 2021), p. 5922. doi:
10.3390/s21175922.
[86] Pilsung Kang and Sungmin Lim. “A Taste of Scientific Computing on the GPU-Accelerated Edge
Device”. In: IEEE Access 8 (2020), pp. 208337–208347. doi: 10.1109/ACCESS.2020.3038714.
[87] Kate Keahey, Jason Anderson, Zhuo Zhen, Pierre Riteau, Paul Ruth, Dan Stanzione, Mert Cevik,
Jacob Colleran, Haryadi S. Gunawi, Cody Hammock, Joe Mambretti, Alexander Barnes,
François Halbach, Alex Rocha, and Joe Stubbs. “Lessons Learned from the Chameleon Testbed”.
In: Proceedings of the 2020 USENIX Annual Technical Conference (USENIX ATC ’20). USENIX
Association, July 2020.
[88] Stephen M Kent. “Sloan digital sky survey”. In: Astrophysics and Space Science 217 (1994),
pp. 27–30.
[89] Rajkumar Kettimuthu, Zhengchun Liu, David Wheeler, Ian Foster, Katrin Heitmann, and
Franck Cappello. “Transferring a petabyte in a day”. In: Future Generation Computer Systems 88
(2018), pp. 191–198. issn: 0167-739X. doi: https://doi.org/10.1016/j.future.2018.05.051.
[90] Youngho Kim, Dongeun Suh, and Sangheon Pack. “Selective In-band Network Telemetry for
Overhead Reduction”. In: 2018 IEEE 7th International Conference on Cloud Networking (CloudNet).
2018, pp. 1–3. doi: 10.1109/CloudNet.2018.8549351.
[91] Derrick Kondo, Bahman Javadi, Alexandru Iosup, and Dick Epema. “The failure trace archive:
Enabling comparative analysis of failures in diverse distributed systems”. In: Proceedings of the
2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing. IEEE
Computer Society. 2010, pp. 398–407.
[92] Patryjca Krawczuk et al. Crisis Computing Workflow Implementation for the Pegasus Workflow
Management System. Aug. 2021. doi: 10.5281/zenodo.5298196.
[93] Patryjca Krawczuk, Srujana Subramanya, George Papadimitriou, Ryan Tanaka, and
Ewa Deelman. Galaxy Classification Workflow Implementation for the Pegasus Workflow
Management System. Aug. 2021. doi: 10.5281/zenodo.5297662.
[94] Dariusz Krol, Rafael Ferreira da Silva, Ewa Deelman, and Vickie E. Lynch. “Workflow
Performance Profiles: Development and Analysis”. In: Euro-Par 2016: Parallel Processing
Workshops. 2016, pp. 108–120. doi: 10.1007/978-3-319-58943-5_9.
[95] Kubernetes. https://kubernetes.io. 2021.
[96] Gregory M. Kurtzer, Vanessa Sochat, and Michael W. Bauer. “Singularity: Scientific containers
for mobility of compute”. In: PLOS ONE 12 (May 2017), pp. 1–20.
[97] Malte S Kurz. “Distributed double machine learning with a serverless architecture”. In:
Companion of the ACM/SPEC International Conference on Performance Engineering. 2021,
pp. 27–33.
[98] Sara Landset et al. “A survey of open source tools for machine learning with big data in the
Hadoop ecosystem”. In: Journal of Big Data (2015).
[99] L. Li, W. Schmid, and J. Joss. “Nowcasting of Motion and Growth of Precipitation with Radar
over a Complex Orography”. In: Journal of Applied Meteorology 34.6 (1995), pp. 1286–1300. doi:
10.1175/1520-0450(1995)034<1286:NOMAGO>2.0.CO;2. eprint:
https://doi.org/10.1175/1520-0450(1995)034<1286:NOMAGO>2.0.CO;2.
[100] Benjamin Lindner and Jeremy C Smith. “Sassena—X-ray and neutron scattering calculated from
molecular dynamics trajectories using massively parallel computers”. In: Computer Physics
Communications 183.7 (2012), pp. 1491–1501.
[101] Linux Foundation Collaborative Projects. https://www.openvswitch.org/.
[102] Ji Liu et al. “A survey of data-intensive scientific workflow management”. In: Journal of Grid
Computing 13.4 (2015).
[103] Ji Liu, Esther Pacitti, Patrick Valduriez, and Marta Mattoso. “A Survey of Data-Intensive
Scientific Workflow Management”. In: Journal of Grid Computing 13.4 (Dec. 2015), pp. 457–493.
issn: 1570-7873.
[104] Qiang Liu, Nageswara S. V. Rao, Satyabrata Sen, Bradley W. Settlemyer, Hsing-Bung Chen,
Joshua M. Boley, Rajkumar Kettimuthu, and Dimitrios Katramatos. “Virtual Environment for
Testing Software-Defined Networking Solutions for Scientific Workflows”. In: Proceedings of the
1st International Workshop on Autonomous Infrastructure for Science. AI-Science’18. Tempe, AZ,
USA: Association for Computing Machinery, 2018. isbn: 9781450358620. doi:
10.1145/3217197.3217202.
[105] Wufeng Liu, Jiaxin Luo, Yan Yang, Wenlian Wang, Junkui Deng, and Liang Yu. “Automatic lung
segmentation in chest X-ray images using improved U-Net”. In: Scientific Reports 12.1 (May
2022), p. 8649. issn: 2045-2322. doi: 10.1038/s41598-022-12743-y.
[106] Zhengchun Liu, Prasanna Balaprakash, Rajkumar Kettimuthu, and Ian Foster. “Explaining Wide
Area Data Transfer Performance”. In: 26th International Symposium on High-Performance
Parallel and Distributed Computing. HPDC ’17. Washington, DC, USA: ACM, 2017, pp. 167–178.
isbn: 978-1-4503-4699-3. doi: 10.1145/3078597.3078605.
[107] Zhengchun Liu, Rajkumar Kettimuthu, Ian Foster, and Peter H. Beckman. “Towards a Smart
Data Transfer Node”. In: Future Generation Computer Systems (June 2018), p. 10.
[108] Zhengchun Liu, Rajkumar Kettimuthu, Ian Foster, and Nageswara S.V. Rao. “Cross-geography
Scientific Data Transfer Trends and User Behavior Patterns”. In: 27th ACM Symposium on
High-Performance Parallel and Distributed Computing. HPDC ’18. Tempe, Arizona, USA: ACM,
2018. isbn: 978-1-4503-4699-3. doi: 10.1145/3208040.3208053.
[109] Bertram Ludäscher, Ilkay Altintas, Chad Berkley, Dan Higgins, Efrat Jaeger, Matthew Jones,
Edward A. Lee, Jing Tao, and Yang Zhao. “Scientific Workflow Management and the Kepler
System: Research Articles”. In: Concurr. Comput. : Pract. Exper. 18.10 (Aug. 2006), pp. 1039–1065.
issn: 1532-0626. doi: 10.1002/cpe.v18:10.
[110] E. J. Lyons, M. Zink, and B. Philips. “Efficient data processing with exogeni for the casa dfw
urban testbed”. In: 2017 IEEE International Geoscience and Remote Sensing Symposium (IGARSS).
July 2017, pp. 5977–5980. doi: 10.1109/IGARSS.2017.8128371.
[111] Joseph P. Macker and Ian Taylor. “Orchestration and analysis of decentralized workflows within
heterogeneous networking infrastructures”. In: Future Generation Computer Systems 75 (2017),
pp. 388–401.
[112] P. R. Mahapatra and V. V. Makkapati. “Studies on a High-Compression Technique for Weather
Radar Reflectivity Data”. In: 2005 5th International Conference on Information Communications
Signal Processing. Dec. 2005, pp. 895–899. doi: 10.1109/ICICS.2005.1689178.
[113] Maciej Malawski, Kamil Figiela, Marian Bubak, Ewa Deelman, and Jarek Nabrzyski. “Scheduling
Multilevel Deadline-constrained Scientific Workflows on Clouds Based on Cost Optimization”.
In: Scientific Programming 29 (Jan. 2015), pp. 158–169.
[114] A. Mandal, P. Ruth, I. Baldin, R. F. Da Silva, and E. Deelman. “Toward Prioritization of Data
Flows for Scientific Workflows Using Virtual Software Defined Exchanges”. In: 2017 IEEE 13th
International Conference on e-Science (e-Science). Oct. 2017, pp. 566–575.
[115] A. Mandal, P. Ruth, I. Baldin, Y. Xin, C. Castillo, G. Juve, M. Rynge, E. Deelman, and J. Chase.
“Adapting Scientific Workflows on Networked Clouds Using Proactive Introspection”. In: 2015
IEEE/ACM 8th International Conference on Utility and Cloud Computing (UCC). Dec. 2015,
pp. 162–173.
[116] C. Maple. “Geometric design and space planning using the marching squares and marching cube
algorithms”. In: Proc. 2003 Intl. Conf. Geometric Modeling and Graphics. 2003, pp. 90–95. isbn:
978-0-7695-1985-2. doi: 10.1109/GMAG.2003.1219671.
[117] TE Mason, D Abernathy, I Anderson, J Ankner, T Egami, G Ehlers, A Ekkebus, G Granroth,
M Hagen, K Herwig, et al. “The Spallation Neutron Source in Oak Ridge: A powerful tool for
materials research”. In: Physica B: Condensed Matter 385 (2006), pp. 955–960.
[118] Nick McKeown, Tom Anderson, Hari Balakrishnan, Guru Parulkar, Larry Peterson,
Jennifer Rexford, Scott Shenker, and Jonathan Turner. “OpenFlow: Enabling Innovation in
Campus Networks”. In: SIGCOMM Comput. Commun. Rev. 38.2 (Mar. 2008), pp. 69–74.
[119] David McLaughlin, David Pepyne, V. Chandrasekar, Brenda Philips, James Kurose, Michael Zink,
Kelvin Droegemeier, Sandra Cruz-Pol, Francesc Junyent, Jerald Brotzge, David Westbrook,
Nitin Bharadwaj, Yanting Wang, Eric Lyons, Kurt Hondl, Yuxiang Liu, Eric Knapp, Ming Xue,
Anthony Hopf, Kevin Kloesel, Alfred DeFonzo, Pavlos Kollias, Keith Brewster, Robert Contreras,
Brenda Dolan, Theodore Djaferis, Edin Insanic, Stephen Frasier, and Frederick Carr.
“Short-Wavelength Technology and the Potential For Distributed Networks of Small Radar
Systems”. In: Bulletin of the American Meteorological Society 90.12 (2009), pp. 1797–1818. doi:
10.1175/2009BAMS2507.1. eprint: https://doi.org/10.1175/2009BAMS2507.1.
[120] Meet Michelangelo: Uber’s Machine Learning Platform. Uber Engineering. 2017. url:
https://eng.uber.com/michelangelo-machine-learning-platform/.
[121] Microsoft Azure Cloud. https://azure.microsoft.com/en-us/.
[122] Biswajeeban Mishra and Attila Kertesz. “The Use of MQTT in M2M and IoT Systems: A Survey”.
In: IEEE Access 8 (2020), pp. 201071–201086. doi: 10.1109/ACCESS.2020.3035849.
[123] Mobius Github Repository. https://github.com/RENCI-NRIG/Mobius.
[124] Maria Luiza Mondelli, Matheus Tonelli de Souza, Kary Ocaña, ATR de Vasconcelos, and
Luiz MR Gadelha Jr. “HPSW-Prof: a provenance-based framework for profiling high
performance scientific workflows”. In: Proceedings of Satellite Events of the 31st Brazilian
Symposium on Databases (SBBD 2016), SBC. 2016, pp. 117–122.
[125] Saad Mubeen, Sara Abbaspour Asadollah, Alessandro Vittorio Papadopoulos,
Mohammad Ashjaei, Hongyu Pei-Breivold, and Moris Behnam. “Management of Service Level
Agreements for Cloud Services in IoT: A Systematic Mapping Study”. In: IEEE Access 6 (2018),
pp. 30184–30207. doi: 10.1109/ACCESS.2017.2744677.
[126] Craig Mustard, Fabian Ruffy, Anny Gakhokidze, Ivan Beschastnikh, and Alexandra Fedorova.
“Jumpgate: In-Network Processing as a Service for Data Analytics”. In: 11th USENIX Workshop
on Hot Topics in Cloud Computing (HotCloud 19). Renton, WA: USENIX Association, July 2019.
url: https://www.usenix.org/conference/hotcloud19/presentation/mustard.
[127] National Energy Research Scientific Computing Center (NERSC). https://www.nersc.gov.
[128] NOAA/NCDC. U.S. Billion-Dollar Weather & Climate Disasters 1980-2019. Press Release.
[129] Shadi A. Noghabi, John Kolb, Peter Bodik, and Eduardo Cuervo. “Steel: Simplified Development
and Deployment of Edge-Cloud Applications”. In: 10th USENIX Workshop on Hot Topics in Cloud
Computing (HotCloud 18). Boston, MA: USENIX Association, July 2018. url:
https://www.usenix.org/conference/hotcloud18/presentation/noghabi.
[130] Jon Ander Novella, Payam Emami Khoonsari, Stephanie Herman, Daniel Whitenack,
Marco Capuccini, Joachim Burman, Kim Kultima, and Ola Spjuth. “Container-based
bioinformatics with Pachyderm”. In: Bioinformatics 35.5 (2018), pp. 839–846.
[131] Nsight. https://developer.nvidia.com/nsight-graphics. 2021.
[132] Oak Ridge Leadership Computing Facility (OLCF). https://www.olcf.ornl.gov.
[133] Ferda Ofli, Firoj Alam, and Muhammad Imran. Analysis of Social Media Data using Multimodal
Deep Learning for Disaster Response. 2020. arXiv: 2004.11838 [cs.CV].
[134] On-Demand Secure Circuits and Advance Reservation System.
https://www.es.net/engineering-services/oscars/.
[135] Open flow SDN Controllers. https://en.wikipedia.org/wiki/List_of_SDN_controller_software/.
[136] OpenStack Cloud Software. http://openstack.org.
[137] OpenStack Heat Project. https://wiki.openstack.org/wiki/Heat.
[138] Open Science Grid. https://www.opensciencegrid.org.
[139] S. Ostermann, R. Prodan, and T. Fahringer. “Dynamic Cloud provisioning for scientific Grid
workflows”. In: 2010 11th IEEE/ACM International Conference on Grid Computing. Oct. 2010,
pp. 97–104.
[140] Simon Ostermann, Radu Prodan, Thomas Fahringer, Alexandru Iosup, and Dick Epema. “A
trace-based investigation of the characteristics of grid workflows”. In: From Grids to Service and
Pervasive Computing. Springer, 2008, pp. 191–203.
[141] Pachyderm. https://www.pachyderm.com. 2021.
[142] Jeffrey Pennington, Richard Socher, and Christopher D. Manning. “Glove: Global vectors for
word representation”. In: In EMNLP. 2014.
[143] James C Phillips, Rosemary Braun, Wei Wang, James Gumbart, Emad Tajkhorshid,
Elizabeth Villa, Christophe Chipot, Robert D Skeel, Laxmikant Kale, and Klaus Schulten.
“Scalable molecular dynamics with NAMD on the IBM Blue Gene/L system”. In: IBM Journal of
Research and Development 26.1.2 (2008), pp. 1781–1802. doi: 10.1147/rd.521.0177.
[144] Pivotal. RabbitMQ. https://www.rabbitmq.com/.
[145] Ruth Pordes, Don Petravick, Bill Kramer, Doug Olson, Miron Livny, Alain Roy, Paul Avery,
Kent Blackburn, Torre Wenaus, Frank Würthwein, Ian Foster, Rob Gardner, Mike Wilde,
Alan Blatecky, John McGee, and Rob Quick. “The open science grid”. In: Journal of Physics:
Conference Series 78 (July 2007), p. 012057. doi: 10.1088/1742-6596/78/1/012057.
[146] POSIX Standard. https://pubs.opengroup.org/onlinepubs/9699919799.
[147] PyTorch. https://pytorch.org. 2021.
[148] Rackspace Cloud. https://www.rackspace.com/.
[149] L. Ramakrishnan, C. Koelbel, Y. Kee, R. Wolski, D. Nurmi, D. Gannon, G. Obertelli, A. YarKhan,
A. Mandal, T. M. Huang, K. Thyagaraja, and D. Zagorodnov. “VGrADS: enabling e-Science
workflows on grids and clouds with fault tolerance”. In: Proceedings of the Conference on High
Performance Computing Networking, Storage and Analysis. 2009, pp. 1–12.
[150] Lavanya Ramakrishnan and Dennis Gannon. “A survey of distributed workflow characteristics
and resource requirements”. In: Indiana University (2008), pp. 1–23.
[151] Narathip Reamaroon, Michael W. Sjoding, Harm Derksen, Elyas Sabeti, Jonathan Gryak,
Ryan P. Barbaro, Brian D. Athey, and Kayvan Najarian. “Robust segmentation of lung in chest
x-ray: applications in analysis of acute respiratory distress syndrome”. In: BMC Medical Imaging
20.1 (Oct. 2020), p. 116. issn: 1471-2342. doi: 10.1186/s12880-020-00514-y.
[152] Daniel Rosendo, Alexandru Costan, Patrick Valduriez, and Gabriel Antoniu. “Distributed
intelligence on the Edge-to-Cloud Continuum: A systematic literature review”. In: Journal of
Parallel and Distributed Computing 166 (2022), pp. 71–94. issn: 0743-7315. doi:
https://doi.org/10.1016/j.jpdc.2022.04.004.
[153] E. Ruzanski and V. Chandrasekar. “Weather Radar Data Interpolation Using a Kernel-Based
Lagrangian Nowcasting Technique”. In: IEEE Transactions on Geoscience and Remote Sensing 53.6
(June 2015), pp. 3073–3083. issn: 0196-2892. doi: 10.1109/TGRS.2014.2368076.
[154] Mats Rynge, Gideon Juve, Karan Vahi, Scott Callaghan, Gaurang Mehta, Philip J. Maechling, and
Ewa Deelman. “Enabling Large-scale Scientific Workflows on Petascale Resources Using MPI
Master/Worker”. In: XSEDE12. 2012. url:
http://pegasus.isi.edu/publications/2012/XSEDE12-Rynge-pegasus-mpi-cluster.pdf.
[155] Ryu Rest Router. https://github.com/faucetsdn/ryu/blob/master/ryu/app/rest_router.py.
[156] Ryu SDN Controller. https://ryu-sdn.org/.
[157] Jyoti Sahni and Deo Prakash Vidyarthi. “Workflow-and-Platform Aware task clustering for
scientific workflow execution in Cloud environment”. In: Future Generation Computer Systems 64
(2016), pp. 61–74. issn: 0167-739X. doi: https://doi.org/10.1016/j.future.2016.05.008.
[158] Romelia Salomon-Ferrer, David A Case, and Ross C Walker. “An overview of the Amber
biomolecular simulation package”. In: Wiley Interdisciplinary Reviews: Computational Molecular
Science 3.2 (2013), pp. 198–210.
[159] Mahadev Satyanarayanan. “The Emergence of Edge Computing”. In: Computer 50.1 (2017),
pp. 30–39. doi: 10.1109/MC.2017.9.
[160] SciTech. Panorama Architecture Backend. https://github.com/Panorama360/data-collection-arch.
2019.
[161] SciTech. Pegasus Panorama. https://github.com/pegasus-isi/pegasus/tree/panorama.
[162] Scitech. CASA Hail Pegasus Workflow. https://github.com/pegasus-isi/casa-hail-workflow.
[163] Scitech. CASA Nowcast Pegasus Workflow.
https://github.com/pegasus-isi/casa-nowcast-workflow.
[164] Scitech. CASA Wind Pegasus Workflow. https://github.com/pegasus-isi/casa-wind-workflow.
[165] Scitech. Panorama Kibana Plugin. https://github.com/Panorama360/panorama-kibana-plugin. 2018.
[166] Izzet F. Senturk, P. Balakrishnan, Anas Abu-Doleh, Kamer Kaya, Qutaibah Malluhi, and
Ümit V. Çatalyürek. “A resource provisioning framework for bioinformatics applications in
multi-cloud environments”. In: Future Generation Computer Systems 78 (2018), pp. 379–391.
[167] Alireza Shamsoshoara, Fatemeh Afghah, Abolfazl Razi, Liming Zheng, Peter J. Fulé, and
Erik Blasch. “Aerial imagery pile burn detection using deep learning: The FLAME dataset”. In:
Computer Networks 193 (2021), p. 108001. issn: 1389-1286. doi:
https://doi.org/10.1016/j.comnet.2021.108001.
[168] Rafael Ferreira da Silva, Rosa Filgueira, Ilia Pietri, Ming Jiang, Rizos Sakellariou, and
Ewa Deelman. “A characterization of workflow management systems for extreme-scale
applications”. In: Future Generation Computer Systems 75 (2017), pp. 228–238.
[169] Gurmeet Singh, Mei-Hui Su, Karan Vahi, Ewa Deelman, Bruce Berriman, John Good,
Daniel S. Katz, and Gaurang Mehta. “Workflow Task Clustering for Best Effort Systems with
Pegasus”. In: Mardi Gras Conference. 2008. url:
http://pegasus.isi.edu/publications/2008/MardiGras_v3.pdf.
[170] L. M. Smith et al. “The Ocean Observatories Initiative”. In: Oceanography (2018).
[171] Shane Snyder, Philip Carns, Kevin Harms, Robert Ross, Glenn K Lockwood, and
Nicholas J Wright. “Modular HPC I/O characterization with Darshan”. In: Extreme-Scale
Programming Tools (ESPT), Workshop on. IEEE. 2016, pp. 9–17.
[172] Craig A Stewart, David Y Hancock, Matthew Vaughn, Jeremy Fischer, Tim Cockerill,
Lee Liming, Nirav Merchant, Therese Miller, John Michael Lowe, Daniel C Stanzione, et al.
“Jetstream: performance, early experiences, and early results”. In: Proceedings of the XSEDE16
Conference on Diversity, Big Data, and Science at Scale. 2016, pp. 1–8.
[173] Ian J. Taylor, Ewa Deelman, Dennis B. Gannon, and Matthew Shields. Workflows for e-Science:
Scientific Workflows for Grids. Springer Publishing Company, Incorporated, 2014. isbn:
978-1-84628-757-2.
[174] Benjamin Teitelbaum, Susan Hares, Larry Dunn, Robert Neilson, Vishy Narayan, and
Francis Reichmeyer. “Internet2 QBone: building a testbed for differentiated services”. In: IEEE
network 13.5 (1999), pp. 8–16.
[175] TensorBoard. https://www.tensorflow.org/tensorboard. 2021.
[176] Texas Lonestar Education and Research Network (LEARN). http://www.tx-learn.org/.
[177] Douglas Thain, Todd Tannenbaum, and Miron Livny. “Distributed computing in practice: the
Condor experience”. In: Concurrency and computation: practice and experience 17.2-4 (2005),
pp. 323–356.
[178] The GeoJSON Specification (RFC 7946). https://tools.ietf.org/html/rfc7946.
[179] The Orcasound project. url: https://www.orcasound.net/.
[180] J. Towns, T. Cockerill, M. Dahan, I. Foster, K. Gaither, A. Grimshaw, V. Hazlewood, S. Lathrop,
D. Lifka, G. D. Peterson, R. Roskies, J. Scott, and N. Wilkins-Diehr. “XSEDE: Accelerating
Scientific Discovery”. In: Computing in Science & Engineering 16.05 (Sept. 2014), pp. 62–74. issn:
1521-9615. doi: 10.1109/MCSE.2014.80.
[181] Tstat. TCP STatistic and Analysis Tool.
[182] Karan Vahi et al. “Rethinking Data Management for Big Data Scientific Workflows”. In:
Workshop on Big Data and Science: Infrastructure and Services. 2013.
[183] Blesson Varghese, Nan Wang, Sakil Barbhuiya, Peter Kilpatrick, and Dimitrios S. Nikolopoulos.
“Challenges and Opportunities in Edge Computing”. In: 2016 IEEE International Conference on
Smart Cloud (SmartCloud). 2016, pp. 20–26. doi: 10.1109/SmartCloud.2016.18.
[184] Jinesh Varia. “Best practices in architecting cloud applications in the AWS cloud”. In: Cloud
Computing: Principles and Paradigms. Vol. 18. Wiley Online Library, 2011, pp. 459–490.
[185] Shefali Varshney, Rajinder Sandhu, and P. K. Gupta. “QoS Based Resource Provisioning in Cloud
Computing Environment: A Technical Survey”. In: Advances in Computing and Data Sciences.
Ed. by Mayank Singh, P.K. Gupta, Vipin Tyagi, Jan Flusser, Tuncer Ören, and Rekha Kashyap.
2019, pp. 711–723.
[186] Laurens Versluis, Roland Mathá, Sacheendra Talluri, Tim Hegeman, Radu Prodan, Ewa Deelman,
and Alexandru Iosup. “The Workflow Trace Archive: Open-Access Data from Public and Private
Computing Infrastructures”. In: IEEE Transactions on Parallel and Distributed Systems (2020),
pp. 1–1. doi: 10.1109/TPDS.2020.2984821.
[187] C. Wang, K. Thareja, M. Stealey, P. Ruth, and I. Baldin. “COMET: A Distributed Metadata Service
for Federated Cloud Infrastructures”. In: 2019 IEEE High Performance Extreme Computing
Conference (HPEC). Sept. 2019, pp. 1–7.
[188] Jianwu Wang, Moustafa AbdelBaky, Javier Diaz-Montes, Shweta Purawat, Manish Parashar, and
Ilkay Altintas. “Kepler + CometCloud: Dynamic Scientific Workflow Execution on Federated
Cloud Resources”. In: Procedia Computer Science 80 (2016), pp. 700–711. issn: 1877-0509.
[189] Amos Waterland. stress, POSIX workload generator. 2013.
[190] Weisong Shi et al. “Edge Computing: Vision and Challenges”. In: IEEE Internet of Things Journal
3.5 (2016), pp. 637–646. doi: 10.1109/JIOT.2016.2579198.
[191] Derek Weitzel, Marian Zvada, Ilija Vukotic, Rob Gardner, Brian Bockelman, Mats Rynge,
Edgar Fajardo Hernandez, Brian Lin, and Matyas Selmeci. “StashCache: A Distributed Caching
Federation for the Open Science Grid”. In: Proceedings of the Practice and Experience in Advanced
Research Computing (PEARC ’19). 2019. doi: 10.1145/3332186.3332212. eprint: arXiv:1905.06911.
[192] Tom White. Hadoop: The Definitive Guide. O’Reilly Media, Inc., 2012.
[193] Justin M Wozniak et al. “CANDLE/Supervisor: A workflow framework for machine learning
applied to cancer research”. In: BMC bioinformatics 19.18 (2018).
[194] Ryan Wu, Bingwei Liu, Yu Chen, Erik Blasch, Haibin Ling, and Genshe Chen. “A
Container-Based Elastic Cloud Architecture for Pseudo Real-Time Exploitation of Wide Area
Motion Imagery (WAMI) Stream”. In: J. Signal Process. Syst. 88.2 (Aug. 2017), pp. 219–231.
[195] Globus. https://www.globus.org. 2023.
[196] Ying Xiong, Yulin Sun, Li Xing, and Ying Huang. “Extend Cloud to Edge with KubeEdge”. In:
2018 IEEE/ACM Symposium on Edge Computing (SEC). 2018, pp. 373–377. doi:
10.1109/SEC.2018.00048.
[197] Yuanjun Yao, Qiang Cao, Rubens Farias, Jeff Chase, Victor Orlikowski, Paul Ruth, Mert Cevik,
Cong Wang, and Nick Buraglio. “Toward live inter-domain network services on the ExoGENI
testbed”. In: IEEE INFOCOM 2018 - IEEE Conference on Computer Communications Workshops
(INFOCOM WKSHPS). 2018, pp. 772–777.
[198] Zhen Ye, Xiaofang Zhou, and Athman Bouguettaya. “Genetic Algorithm Based QoS-Aware
Service Compositions in Cloud Computing”. In: Database Systems for Advanced Applications.
Ed. by Jeffrey Xu Yu, Myoung Ho Kim, and Rainer Unland. Berlin, Heidelberg: Springer Berlin
Heidelberg, 2011, pp. 321–334. isbn: 978-3-642-20152-3.
[199] Geoffrey Yeap, S. S. Lin, Y. M. Chen, H. L. Shang, P. W. Wang, H. C. Lin, Y. C. Peng, J. Y. Sheu,
M. Wang, X. Chen, B. R. Yang, C. P. Lin, F. C. Yang, Y. K. Leung, D. W. Lin, C. P. Chen, K. F. Yu,
D. H. Chen, C. Y. Chang, H. K. Chen, P. Hung, C. S. Hou, Y. K. Cheng, J. Chang, L. Yuan,
C. K. Lin, C. C. Chen, Y. C. Yeo, M. H. Tsai, H. T. Lin, C. O. Chui, K. B. Huang, W. Chang,
H. J. Lin, K. W. Chen, R. Chen, S. H. Sun, Q. Fu, H. T. Yang, H. T. Chiang, C. C. Yeh, T. L. Lee,
C. H. Wang, S. L. Shue, C. W. Wu, R. Lu, W. R. Lin, J. Wu, F. Lai, Y. H. Wu, B. Z. Tien,
Y. C. Huang, L. C. Lu, Jun He, Y. Ku, J. Lin, M. Cao, T. S. Chang, and S. M. Jang. “5nm CMOS
Production Technology Platform featuring full-fledged EUV, and High Mobility Channel
FinFETs with densest 0.021 µm² SRAM cells for Mobile SoC and High Performance Computing
Applications”. In: 2019 IEEE International Electron Devices Meeting (IEDM). 2019,
pp. 36.7.1–36.7.4. doi: 10.1109/IEDM19573.2019.8993577.
[200] Matei Zaharia et al. “Accelerating the machine learning lifecycle with MLflow.” In: IEEE Data
Eng. Bull. 41.4 (2018), pp. 39–45.
[201] Matei Zaharia et al. “Spark: Cluster computing with working sets.” In: HotCloud 10.10-10 (2010),
p. 95.
[202] Zeek Github Repository. https://github.com/zeek/zeek.
[203] Jiao Zhang, F Richard Yu, Shuo Wang, Tao Huang, Zengyi Liu, and Yunjie Liu. “Load Balancing
in Data Center Networks: A Survey”. In: IEEE Communications Surveys & Tutorials (2018).
[204] Quan Zhang, Xiaohong Zhang, Qingyang Zhang, Weisong Shi, and Hong Zhong. “Firework: Big
Data Sharing and Processing in Collaborative Edge Environment”. In: 2016 Fourth IEEE
Workshop on Hot Topics in Web Systems and Technologies (HotWeb). 2016, pp. 20–25. doi:
10.1109/HotWeb.2016.12.
[205] Chao Zheng, Ben Tovar, and Douglas Thain. “Deploying high throughput scientific workflows
on container schedulers with Makeflow and Mesos”. In: 2017 17th IEEE/ACM International
Symposium on Cluster, Cloud and Grid Computing (CCGRID). IEEE. 2017, pp. 130–139.
[206] Charles Zheng and Douglas Thain. “Integrating containers into workflows: A case study using
Makeflow, Work Queue, and Docker”. In: Proceedings of the 8th International Workshop on
Virtualization Technologies in Distributed Computing. ACM. 2015, pp. 31–38.
[207] Keren Zhou et al. “A Tool for Top-down Performance Analysis of GPU-Accelerated
Applications”. In: Proceedings of the 25th ACM SIGPLAN Symposium on Principles and Practice of
Parallel Programming. PPoPP ’20. San Diego, California: Association for Computing Machinery,
2020, pp. 415–416. isbn: 9781450368186. doi: 10.1145/3332466.3374534.
[208] Xiao-Pan Zhu et al. “Galaxy morphology classification with deep convolutional neural
networks”. In: Astrophysics and Space Science (2019).
List of Publications
Chapters in Books
[B1] George Papadimitriou, Cong Wang, Eric Lyons, Komal Thareja, Paul Ruth, J. J. Villalobos,
Ivan Rodero, Ewa Deelman, Michael Zink, and Anirban Mandal. “Dynamic Network-Centric
Multi-cloud Platform for Real-Time and Data-Intensive Science Workflows”. In: Handbook of
Dynamic Data Driven Applications Systems: Volume 2. Ed. by Frederica Darema, Erik P. Blasch,
Sai Ravela, and Alex J. Aved. Cham: Springer International Publishing, 2023, pp. 835–868. isbn:
978-3-031-27986-7. doi: 10.1007/978-3-031-27986-7_32.
Articles in International Refereed Journals
[J2] Hongwei Jin, Krishnan Raghavan, George Papadimitriou, Cong Wang, Anirban Mandal,
Mariam Kiran, Ewa Deelman, and Prasanna Balaprakash. “Graph neural networks for detecting
anomalies in scientific workflows”. In: The International Journal of High Performance Computing
Applications (2023). doi: 10.1177/10943420231172140. eprint:
https://doi.org/10.1177/10943420231172140.
[J3] Alicia Esquivel Morel, Chengyi Qu, Prasad Calyam, Cong Wang, Komal Thareja,
Anirban Mandal, Eric Lyons, Michael Zink, George Papadimitriou, and Ewa Deelman.
“FlyNet: Drones on the Horizon”. In: IEEE Internet Computing 27.3 (2023), pp. 35–43. doi:
10.1109/MIC.2023.3260440.
[J4] George Papadimitriou, Eric Lyons, Cong Wang, Komal Thareja, Ryan Tanaka, Paul Ruth,
Ivan Rodero, Ewa Deelman, Michael Zink, and Anirban Mandal. “Fair sharing of network
resources among workflow ensembles”. In: Cluster Computing (2021). issn: 1573-7543. doi:
10.1007/s10586-021-03457-3.
[J5] George Papadimitriou, Cong Wang, Karan Vahi, Rafael Ferreira da Silva, Anirban Mandal,
Zhengchun Liu, Rajiv Mayani, Mats Rynge, Mariam Kiran, Vickie E. Lynch,
Rajkumar Kettimuthu, Ewa Deelman, Jeffrey S. Vetter, and Ian Foster. “End-to-End Online
Performance Data Capture and Analysis for Scientific Workflows”. In: Future Generation
Computer Systems 117 (2021), pp. 387–400. issn: 0167-739X. doi:
https://doi.org/10.1016/j.future.2020.11.024.
[J6] Mariam Kiran, Cong Wang, George Papadimitriou, Anirban Mandal, and Ewa Deelman.
“Detecting Anomalous Packets in Network Transfers: Investigations using PCA, Autoencoder
and Isolation Forest in TCP”. In: Machine Learning (2020). issn: 1573-0565. doi:
10.1007/s10994-020-05870-y.
[J7] Ewa Deelman, Karan Vahi, Mats Rynge, Rajiv Mayani, Rafael Ferreira da Silva,
George Papadimitriou, and Miron Livny. “The Evolution of the Pegasus Workflow
Management Software”. In: Computing in Science Engineering 21.4 (2019), pp. 22–36. doi:
10.1109/MCSE.2019.2919690.
[J8] Rafael Ferreira da Silva, Scott Callaghan, Tu Mai Anh Do, George Papadimitriou, and
Ewa Deelman. “Measuring the Impact of Burst Buffers on Data-Intensive Scientific Workflows”.
In: Future Generation Computer Systems 101 (2019), pp. 208–220. doi:
10.1016/j.future.2019.06.016.
Articles in International Refereed Conferences
[C9] Andrew Grote, Eric Lyons, Komal Thareja, George Papadimitriou, Ewa Deelman,
Anirban Mandal, Prasad Calyam, and Michael Zink. “FlyPaw: Optimized Route Planning for
Scientific UAV Missions”. In: 2023 IEEE 19th International Conference on e-Science (e-Science). 2023,
pp. 1–10. doi: 10.1109/e-Science58273.2023.10254831.
[C10] Alicia Esquivel Morel, Prasad Calyam, Chengyi Qu, Durbek Gafurov, Cong Wang,
Komal Thareja, Anirban Mandal, Eric Lyons, Michael Zink, George Papadimitriou, and
Ewa Deelman. “Network Services Management using Programmable Data Planes for Visual
Cloud Computing”. In: 2023 International Conference on Computing, Networking and
Communications (ICNC). 2023, pp. 130–136. doi: 10.1109/ICNC57223.2023.10074183.
[C11] Patrycja Krawczuk, George Papadimitriou, Shubham Nagarkar, Mariam Kiran,
Anirban Mandal, and Ewa Deelman. “Anomaly Detection in Scientific Workflows using
End-to-End Execution Gantt Charts and Convolutional Neural Networks”. In: Proceedings of the
Practice and Experience in Advanced Research Computing. PEARC ’21. Boston, MA, USA:
Association for Computing Machinery, 2021. isbn: 978-1-4503-8292-2. doi:
10.1145/3437359.3465597.
[C12] Eric Lyons, Hakan Saplakoglu, Michael Zink, Komal Thareja, Anirban Mandal, Chengyi Qu,
Songjie Wang, Prasad Calyam, George Papadimitriou, Ryan Tanaka, and Ewa Deelman.
“FlyNet: A Platform to Support Scientific Workflows from the Edge to the Core for UAV
Applications”. In: Proceedings of the 14th IEEE/ACM International Conference on Utility and Cloud
Computing. UCC ’21. Leicester, United Kingdom: Association for Computing Machinery, 2021.
isbn: 9781450385640. doi: 10.1145/3468737.3494098.
[C13] Eric Lyons, Dong-Jun Seo, Sunghee Kim, Hamideh Habibi, George Papadimitriou,
Ryan Tanaka, Ewa Deelman, Michael Zink, and Anirban Mandal. “Predicting Flash Floods in the
Dallas-Fort Worth Metroplex Using Workflows and Cloud Computing”. In: 2021 IEEE 17th
International Conference on eScience (eScience). 2021, pp. 259–261. doi:
10.1109/eScience51609.2021.00050.
[C14] Huy Tu, George Papadimitriou, Mariam Kiran, Cong Wang, Anirban Mandal, Ewa Deelman,
and Tim Menzies. “Mining Workflows for Anomalous Data Transfers”. In: 2021 IEEE/ACM
18th International Conference on Mining Software Repositories (MSR). Los Alamitos, CA,
USA: IEEE Computer Society, May 2021, pp. 1–12. doi: 10.1109/MSR52588.2021.00013.
[C15] Eric Lyons, David Westbrook, Andrew Grote, George Papadimitriou, Komal Thareja,
Cong Wang, Michael Zink, Ewa Deelman, Anirban Mandal, and Paul Ruth. “An On-Demand
Weather Avoidance System for Small Aircraft Flight Path Routing”. In: Dynamic Data Driven
Application Systems. Ed. by Frederica Darema, Erik Blasch, Sai Ravela, and Alex Aved. Cham:
Springer International Publishing, 2020, pp. 311–319. isbn: 978-3-030-61725-7. doi:
10.1007/978-3-030-61725-7_36.
[C16] Eric Lyons, Michael Zink, Anirban Mandal, Cong Wang, Paul Ruth,
Chandrasekar Radhakrishnan, George Papadimitriou, Ewa Deelman, Komal Thareja, and
Ivan Rodero. “DyNamo: Scalable Weather Workflow Processing in the Academic MultiCloud”.
In: 100th American Meteorological Society Annual Meeting (2020).
[C17] George Papadimitriou, Karan Vahi, Jason Kincl, Valentine Anantharaj, Ewa Deelman, and
Jack Wells. “Workflow Submit Nodes as a Service on Leadership Class Systems”. In: Proceedings
of the Practice and Experience in Advanced Research Computing. PEARC ’20. Portland, OR, USA:
Association for Computing Machinery, 2020. isbn: 978-1-4503-6689-2. doi:
10.1145/3311790.3396671.
[C18] K. Vahi, D. Goldstein, G. Papadimitriou, P. Nugent, and E. Deelman. “Gearing the DECam
Analysis Pipeline for Multi-Messenger Astronomy Using Pegasus Workflows”. In: Astronomical
Data Analysis Software and Systems XXIX. Ed. by R. Pizzo, E. R. Deul, J. D. Mol, J. de Plaa, and
H. Verkouter. Vol. 527. Astronomical Society of the Pacific Conference Series. Jan. 2020, p. 631.
[C19] Cong Wang, George Papadimitriou, Mariam Kiran, Anirban Mandal, and Ewa Deelman.
“Identifying Execution Anomalies for Data Intensive Workflows Using Lightweight ML
Techniques”. In: 2020 IEEE High Performance extreme Computing Conference (HPEC). 2020,
pp. 1–7. doi: 10.1109/HPEC43674.2020.9286139.
[C20] Eric Lyons, George Papadimitriou, Cong Wang, Komal Thareja, Paul Ruth, J.J. Villalobos,
Ivan Rodero, Ewa Deelman, Michael Zink, and Anirban Mandal. “Toward a Dynamic
Network-centric Distributed Cloud Platform for Scientific Workflows: A Case Study for
Adaptive Weather Sensing”. In: 15th International Conference on eScience (eScience). San Diego,
CA, USA, 2019, pp. 67–76. doi: 10.1109/eScience.2019.00015.
[C21] Ivan Rodero, Yubo Qin, Jesus Valls, Anthony Simonet, J.J. Villalobos, Manish Parashar,
Chooban Youn, Cong Wang, Komal Thareja, Paul Ruth, George Papadimitriou, Eric Lyons,
and Michael Zink. “Enabling Data Streaming-based Science Gateway through Federated
Cyberinfrastructure”. In: Gateways 2019. San Diego, CA, USA, 2019.
[C22] Karan Vahi, Mats Rynge, George Papadimitriou, Duncan Brown, Rajiv Mayani,
Rafael Ferreira da Silva, Ewa Deelman, Anirban Mandal, Eric Lyons, and Michael Zink. “Custom
Execution Environments with Containers in Pegasus-enabled Scientific Workflows”. In: 15th
International Conference on eScience (eScience). San Diego, CA, USA, 2019, pp. 281–290. doi:
10.1109/eScience.2019.00039.
Articles in International Refereed Workshops
[W23] Imtiaz Mahmud, George Papadimitriou, Cong Wang, Mariam Kiran, Anirban Mandal, and
Ewa Deelman. “Elephants Sharing the Highway: Studying TCP Fairness in Large Transfers over
High Throughput Links”. In: Proceedings of the SC ’23 Workshops of The International Conference
on High Performance Computing, Network, Storage, and Analysis. SC-W ’23. Denver, CO, USA:
Association for Computing Machinery, 2023, pp. 806–818. isbn: 9798400707858. doi:
10.1145/3624062.3624594.
[W24] Alicia Esquivel Morel, Durbek Gafurov, Prasad Calyam, Cong Wang, Komal Thareja,
Anirban Mandal, Eric Lyons, Michael Zink, George Papadimitriou, and Ewa Deelman.
“Experiments on Network Services for Video Transmission using FABRIC Instrument
Resources”. In: IEEE INFOCOM 2023 - IEEE Conference on Computer Communications Workshops
(INFOCOM WKSHPS). 2023, pp. 1–6. doi: 10.1109/INFOCOMWKSHPS57453.2023.10225817.
[W25] Hongwei Jin, Krishnan Raghavan, George Papadimitriou, Cong Wang, Anirban Mandal,
Patrycja Krawczuk, Loïc Pottier, Mariam Kiran, Ewa Deelman, and Prasanna Balaprakash.
“Workflow Anomaly Detection with Graph Neural Networks”. In: 2022 IEEE/ACM Workshop on
Workflows in Support of Large-Scale Science (WORKS). 2022, pp. 35–42. doi:
10.1109/WORKS56498.2022.00010.
[W26] Ryan Tanaka, George Papadimitriou, Sai Charan Viswanath, Cong Wang, Eric Lyons,
Komal Thareja, Chengyi Qu, Alicia Esquivel, Ewa Deelman, Anirban Mandal, Prasad Calyam,
and Michael Zink. “Automating Edge-to-cloud Workflows for Science: Traversing the
Edge-to-cloud Continuum with Pegasus”. In: 2022 22nd IEEE International Symposium on Cluster,
Cloud and Internet Computing (CCGrid). 2022, pp. 826–833. doi: 10.1109/CCGrid54584.2022.00098.
[W27] Henri Casanova, Ewa Deelman, Sandra Gesing, Michael Hildreth, Stephen Hudson,
William Koch, Jeffrey Larson, Mary Ann McDowell, Natalie Meyers, John-Luke Navarro,
George Papadimitriou, Ryan Tanaka, Ian Taylor, Douglas Thain, Stefan M. Wild,
Rosa Filgueira, and Rafael Ferreira da Silva. “Emerging Frameworks for Advancing Scientific
Workflows Research, Development, and Education”. In: 2021 IEEE Workshop on Workflows in
Support of Large-Scale Science (WORKS). 2021, pp. 74–80. doi: 10.1109/WORKS54523.2021.00015.
[W28] Patrycja Krawczuk, George Papadimitriou, Ryan Tanaka, Tu Mai Anh Do, Srujana Subramany,
Shubham Nagarkar, Aditi Jain, Kelsie Lam, Anirban Mandal, Loïc Pottier, and Ewa Deelman. “A
Performance Characterization of Scientific Machine Learning Workflows”. In: 2021 IEEE/ACM
Workflows in Support of Large-Scale Science (WORKS). 2021. doi: 10.1109/WORKS54523.2021.00013.
[W29] George Papadimitriou and Ewa Deelman. “A Lightweight GPU Monitoring Extension for
Pegasus Kickstart”. In: 2021 IEEE/ACM Workflows in Support of Large-Scale Science (WORKS).
2021.
[W30] George Papadimitriou, Eric Lyons, Cong Wang, Komal Thareja, Ryan Tanaka, Paul Ruth,
J.J. Villalobos, Ivan Rodero, Ewa Deelman, Michael Zink, and Anirban Mandal. “Application
Aware Software Defined Flows of Workflow Ensembles”. In: 2020 IEEE/ACM Innovating the
Network for Data-Intensive Science (INDIS). 2020, pp. 10–21. doi: 10.1109/INDIS51933.2020.00007.
[W31] George Papadimitriou, Mariam Kiran, Cong Wang, Anirban Mandal, and Ewa Deelman.
“Training Classifiers to Identify TCP Signatures in Scientific Workflows”. In: 2019 IEEE/ACM
Innovating the Network for Data-Intensive Science (INDIS). Denver, CO, USA, 2019, pp. 61–68. doi:
10.1109/INDIS49552.2019.00012.
Preprint Articles Available Online
[O32] George Papadimitriou, Hongwei Jin, Cong Wang, Krishnan Raghavan, Anirban Mandal,
Prasanna Balaprakash, and Ewa Deelman. Flow-Bench: A Dataset for Computational Workflow
Anomaly Detection. 2023. doi: 10.48550/ARXIV.2306.09930.
Abstract
Computational science today depends on complex, data-intensive applications that operate on datasets produced by a variety of scientific instruments. These datasets may be huge in volume, may arrive at high velocity, or both, raising major challenges for how scientists can analyze them. For example, the Legacy Survey of Space and Time (LSST) telescope collects over 20 TB of data per day, with a goal of 500 PB by the end of the survey (large data volume). Other applications, such as the network of weather radars located in the Dallas-Fort Worth (DFW) area, can produce a steady data flow of over 400 Mbits/second (high velocity). In addition, workflows processing these datasets might need to respond to changes in the processing load (e.g., increases in data flow) in order to maintain a steady and predictable turnaround time.
With the deployment of these data-intensive applications, scientists have to answer questions about the cyberinfrastructure (CI) they rely on for processing, such as: 1) What resources are most appropriate for executing the application? 2) How can the computations scale on demand? 3) How can the datasets be distributed and accessed efficiently while satisfying quality of service (QoS) requirements? Having to worry about the CI can be a distraction or even a roadblock, consuming time or preventing scientists from achieving their goals. Workflow management systems provide tools that address these issues and enable the execution of complex applications on heterogeneous and distributed CI, but they can be too generic and may lack functionality essential to the applications.
In this thesis, we present our efforts to improve the performance of these data-intensive applications while they execute on modern CI. We construct and evaluate novel approaches and methodologies that aim to improve performance without adding complexity to the CI, and we develop new tools that extend the functionality the CI offers. We provide a methodology that makes it easier for scientists to interact with the cyberinfrastructure available to them, and we develop a new framework that captures end-to-end performance statistics of data-intensive workflows executing on heterogeneous and highly distributed CI, which could not previously be done at this scale. Additionally, since modern CI is very malleable and offers many configuration options, we evaluate how choices made during the acquisition and configuration of resources affect the performance of data-intensive workflows.
Finally, network performance is a critical factor that dictates the performance of data-intensive workflows. This thesis answers the fundamental question of how scientists can manage the CI and apply policies that help their data-intensive applications meet their constraints (e.g., turnaround time) by avoiding network degradation. We develop methodologies that rely on workflow restructuring and optimizations applied during the planning phase of the workflows, which can reduce their peak network requirements. We also develop active approaches that reduce per-workflow network requirements during execution. These approaches use a workflow ensemble manager and application-aware software-defined flows.