A UNIFIED MAPPING FRAMEWORK FOR HETEROGENEOUS
COMPUTING SYSTEMS AND COMPUTATIONAL GRIDS
by
Ammar Hasan Alhusaini
A Dissertation Presented to the
FACULTY OF THE GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(COMPUTER ENGINEERING)
December 2001
Copyright 2001 Ammar Hasan Alhusaini
UNIVERSITY OF SOUTHERN CALIFORNIA
The Graduate School
University Park
LOS ANGELES, CALIFORNIA 90089-1695
This dissertation, written by
AMMAR HASAN ALHUSAINI
under the direction of his Dissertation Committee, and approved by all its members, has been presented to and accepted by The Graduate School, in partial fulfillment of requirements for the degree of
DOCTOR OF PHILOSOPHY
Date: December 17, 2001
DISSERTATION COMMITTEE
Chairperson
Dedication
This dissertation is dedicated to my father, my mother, my wife, and my children.
Acknowledgments
I would like to take this opportunity to express my deepest appreciation to the
people who have made this dissertation possible. First, I would like to thank
Dr. Viktor Prasanna, my advisor at USC, for his guidance, encouragement,
and support throughout my Ph.D. program. He provided me excellent direction
and moral support in the long road of my dissertation. I have been extremely
fortunate to have him as my advisor and to know him as a person.
I also thank the members of my qualifying examination and defense committees: Dr. Jean-Luc Gaudiot, Dr. C. S. Raghavendra, Dr. Cyrus Shahabi, and Dr. Monte Ung.
I thank my friends at Pgroup, our research group, who have made my Ph.D.
study an enriching experience. Finally, I would like to thank my parents and my
wife for being extremely encouraging and patient. Thank you all.
Contents

Dedication
Acknowledgments
List of Figures
List of Tables
Abstract

1 Introduction
  1.1 Motivations
  1.2 Thesis Contributions
  1.3 Chapter Summaries

2 Background
  2.1 Definitions
  2.2 A Motivational Example
  2.3 Taxonomies of HC Systems
  2.4 An Overview of Some Research Projects
      2.4.1 MSHN
      2.4.2 Globus
      2.4.3 Legion
      2.4.4 AppLeS
      2.4.5 VDCE

3 Mapping in HC Systems and Computational Grids
  3.1 The Mapping Problem
  3.2 NP-Completeness of the Mapping Problem
  3.3 Mapping Taxonomies
  3.4 An Overview of Mapping Algorithms

4 The Proposed Mapping Framework
  4.1 System Model
  4.2 Application Model
  4.3 Mapping Problem
  4.4 Mapping Algorithms

5 Mapping with Multiple Resource Requirements and Data Replication
  5.1 Introduction
  5.2 Problem Definition
      5.2.1 System Model
      5.2.2 Application Model
      5.2.3 Problem Statement
  5.3 Our Mapping Approach
      5.3.1 Level-By-Level Approach
      5.3.2 Greedy Approach
      5.3.3 Main Characteristics of Our Algorithms
            5.3.3.1 Simultaneous Selection
            5.3.3.2 Unified Mapping
  5.4 Performance Evaluation
      5.4.1 Simulation Procedure
      5.4.2 Experimental Results
  5.5 Related Work
  5.6 Summary

6 Mapping with Resource Co-Allocation Requirements
  6.1 Introduction
  6.2 Problem Definition
      6.2.1 System Model
      6.2.2 Application Model
      6.2.3 Objective Function
  6.3 Mapping Algorithms
      6.3.1 Independent-Set Approach
            6.3.1.1 General Mapping Steps
            6.3.1.2 Resource-Sharing Graph
            6.3.1.3 The Independent-Set Mapping Algorithm
            6.3.1.4 Maximal Independent Sets Selection
            6.3.1.5 Allocation Heuristics
      6.3.2 Critical-Resource Approach
            6.3.2.1 Critical Resource
            6.3.2.2 Dynamic-Critical-Resource Algorithm
  6.4 Performance Evaluation
      6.4.1 A Lower Bound
      6.4.2 Baseline Algorithm
      6.4.3 Implementation Issues
      6.4.4 Simulation Procedure
      6.4.5 Experimental Results
  6.5 Summary

7 Conclusions and Future Directions
  7.1 Conclusions
  7.2 Future Directions
      7.2.1 Mapping with Run-Time Adaptation
      7.2.2 Mapping with QoS Requirements
      7.2.3 Mapping in Time-Sharing Environments
List of Figures

1.1 An example of a HC system.
2.1 A hypothetical example of the advantage of using HC systems.
2.2 MSHN architecture.
2.3 The Globus resource management architecture.
2.4 The Legion resource management model.
2.5 AppLeS architecture.
4.1 Main components of our mapping framework.
4.2 Example of TIG and DAG representations.
5.1 Example of two application DAGs.
5.2 The combined DAG for the applications in Figure 5.1.
5.3 Pseudo code of the level-by-level approach.
5.4 Level partitioning for the combined DAG in Figure 5.2.
5.5 A motivation example for the greedy approach.
5.6 Application DAG for the example in Section 5.3.3.2.
5.7 Separated mapping (machines first).
5.8 Separated mapping (data repositories first).
5.9 Unified mapping.
5.10 Performance of the level-by-level algorithms with varying number of tasks.
5.11 Performance of the greedy approach algorithms with varying number of tasks.
5.12 Performance of the level-by-level algorithms with different CCR.
5.13 Performance of the greedy approach algorithms with different CCR.
5.14 Comparison of our algorithms with varying number of tasks.
5.15 Comparison of the two approaches based on the average schedule length.
6.1 General mapping steps of the independent-set approach.
6.2 The resource-sharing graph for the tasks shown in Table 6.1.
6.3 Pseudo code of our mapping algorithm based on the independent-set approach.
6.4 Example schedules for the example in Section 6.3.1.4.
6.5 Pseudo code of the dynamic-critical-resource algorithm.
6.6 Comparison with the lower bound.
6.7 Comparison of schedule lengths with different number of tasks.
6.8 Comparison of running times with different number of tasks.
6.9 Comparison of schedule lengths with different application structures.
6.10 Comparison of running times with different application structures.
7.1 Our initial approach for mapping with QoS requirements.
List of Tables

3.1 Mapping algorithms from HC and grid literature.
3.2 DAG mapping algorithms.
3.3 TIG mapping algorithms.
3.4 Meta-task mapping algorithms.
5.1 Estimated computation times for the tasks in Figure 5.6.
5.2 Communication costs (time units/data unit) for the example in Section 5.3.3.2.
5.3 Input requirements for the tasks in Figure 5.6.
6.1 An example showing six tasks and their resource requirements.
6.2 Execution times for the tasks in Figure 6.2.
6.3 Minimum, maximum, and average percentage difference in schedule length between independent-set approach algorithms with different number of tasks.
Abstract
In Heterogeneous Computing (HC) systems and computational grids, a diverse set
of geographically distributed resources are used to solve challenging problems. A
major challenge in using these systems is to effectively use available resources.
System resources are shared among applications. Applications are submitted
from various user sites with specific quality of service requirements. One way
to take advantage of HC systems is to decompose an application into several
tasks based on the computational requirements. Different tasks may be best
suited for different machines. Once the application is decomposed into tasks,
each task needs to be assigned to a suitable machine (matching problem) and
task executions need to be ordered in time (scheduling problem) to optimize a
given objective function.
The focus of this dissertation is the matching and scheduling (defined as
mapping) of application tasks onto HC systems and computational grids. We
introduce a unified framework that can be used for mapping applications onto
system resources. Our framework consists of four key components: system model,
application model, mapping problem, and mapping algorithms. The framework
incorporates the concept of advance reservation where system resources can be
reserved in advance for specific time intervals. Our mapping algorithms are developed in such a way that all resource requirements are considered at the same time in a unified manner to achieve better mapping decisions.
Based on this framework, we develop efficient mapping algorithms for two novel problems. The first problem is mapping applications with multiple resource requirements and data replication. Our algorithms for this problem are of two types: level-by-level algorithms and greedy algorithms. The second problem is mapping a set of applications with resource co-allocation requirements. Application tasks have two types of constraints to be satisfied: precedence constraints and resource sharing constraints. Two different approaches are used to
develop the heuristic algorithms: independent-set approach and critical-resource
approach. For this mapping problem, we also develop a lower bound on the
optimal schedule length. Performance evaluation shows the effectiveness of our
mapping algorithms for both problems.
Chapter 1
Introduction
1.1 Motivations
The last decade has seen enormous improvements in commodity computer and communication capability among local as well as geographically distributed systems. This has motivated a number of research groups to investigate the possibility of using a diverse set of geographically distributed resources as one "virtual" system to solve challenging problems. This new approach is known by several names such as metacomputing [73], heterogeneous computing (HC) [32, 54, 69], and more recently grid computing [37]. In [54], HC is defined as the well-orchestrated and coordinated use of a diverse set of resources to provide efficient processing for computationally demanding applications with diverse computing needs.
A wide variety of resources, including compute resources such as supercomputers, storage systems, and special devices, can be coupled and used as a single
unified resource to form a computational grid [7]. A computational grid is defined
as a hardware and software infrastructure that provides dependable, consistent,
pervasive, and inexpensive access to high-end computational capabilities [36].
In general, computational grids are wide-area networking infrastructures connecting heterogeneous and high-performance computers (as well as other resources) at geographically distributed sites. The term "the Grid" is used to denote a proposed distributed computing infrastructure that will connect multiple regional and national computational grids to create a universal source of computing power [37, 39]. The word "grid" is chosen in analogy to the electric power grid that delivers electrical energy from generator sites to the consumers.
In this dissertation, we consider HC systems and computational grids (or simply grids) to be the systems that make use of several compute resources with different capabilities, I/O devices, data repositories, and other resources, all interconnected by heterogeneous local and wide area networks to optimize the performance of the system. Such systems provide universal access to resources, even at sites that are remotely located from the physical resources. In general, HC systems and computational grids exploit the heterogeneity of applications
and system resources to enable the construction of high-performance systems.
Figure 1.1 shows an example of a HC system.
One of the motivations behind HC systems and computational grids is the
need to access resources not located within a single system. Frequently, the
Figure 1.1: An example of a HC system. [Figure: multiple sites with local-area networks, interconnected by ATM WAN links, including compute machines and a data repository.]
driving force is economic: scarce resources such as supercomputers and specialized
I/O devices are too expensive to be replicated. Alternatively, an application may
require concurrent access to resources that would not normally be co-located at
the same place. Finally, certain unique resources, such as people and certain
specialized databases, cannot be replicated. In each case, the ability to construct
networked virtual supercomputers can provide qualitatively new capabilities that
enable new approaches to problem solving [13].
Several application scenarios would benefit from HC environments and computational grids. Five major application classes have been identified in [37]:
• Distributed Supercomputing
Distributed supercomputing applications are distinguished by their requirements for large amounts of computational resources. These applications couple multiple computational resources to tackle problems that are too large for a single system, or that can benefit from executing different problem components on different computer architectures. Examples of such applications include distributed interactive simulation (DIS) and global climate modeling.
• High-Throughput Computing
In high-throughput computing, applications use idle cycles from computational resources to schedule large numbers of independent tasks. As an example, the Condor system from the University of Wisconsin is used to manage pools of hundreds of workstations at universities and laboratories around the world [59]. When they are idle, workstations are used to execute applications. In general, the goal of the Condor project is to develop, implement, deploy, and evaluate mechanisms and policies that support high-throughput computing on large collections of distributively owned computing resources.
• On-Demand Computing
On-demand applications use remote resources that cannot be located locally to meet short-term requirements. These applications are usually driven by cost-performance concerns rather than absolute performance. For example,
network-enhanced numerical solver systems, such as NetSolve [18], allow users to couple remote software and resources into desktop applications. Calculations that are computationally demanding or that require specialized software are done at remote resources.
• Data-Intensive Computing
The focus of data-intensive applications is to extract useful information from huge amounts of data that are maintained in geographically distributed repositories. Data assimilation is one example of data-intensive applications.
• Collaborative Computing
Collaborative applications are concerned with facilitating and enabling the
shared use of resources by a group of people from different locations.
HC systems and computational grids have several unique characteristics. Some
of these characteristics are [13]:
1. Heterogeneity at multiple levels: Resources used to construct HC and grid
systems are often highly heterogeneous. Heterogeneity can exist at different
levels, ranging from low-level devices through system software, to scheduling
and usage policies.
2. Unpredictable structure: HC and grid systems are often constructed at run time, based on the availability of the resources. Applications are therefore
required to adapt themselves to a wide range of environments.
3. Dynamic behavior: Resources in HC and grid systems are likely to be
shared. Consequently, performance can vary over time. For example, network characteristics such as latency, bandwidth, and jitter may vary depending on network load.
4. Multiple administrative domains: Resources in HC and grid systems span
multiple organizations and are not owned or administered by a single organization or entity. Therefore, security issues are an important consideration. Further, different entities may use different authentication mechanisms, authorization schemes, and access policies.
There are many challenges and research issues that must be addressed before
HC systems and computational grids can be constructed and used effectively on a
large scale. The primary issues arise due to the heterogeneous and shared nature
of system resources. This sharing introduces challenging resource management
problems as well as security problems. Furthermore, programming models, performance analysis tools, and network protocols and infrastructure are needed. In
the following, we briefly highlight some of these challenges.
• Programming Models and Tools
Grid programming is faced with several challenges. One challenge is to
provide tools that allow programmers to understand and explain program
behavior and performance. New techniques are required for translating
user performance requirements into system resource requirements and for
adapting to changes in system structure.
In [53], Kennedy explored three possible paradigms for grid programming environments. The paradigms are task composition, grid shared memory, and new language environments supported by global grid compilation techniques. He also examined how the programming models, languages, compilers, and libraries that are used for parallel computers can be enhanced for grid environments. Foster and Kesselman [37] identified three possible approaches for grid programming. The first approach is to adapt programming models that have proved successful in sequential or parallel environments, such as distributed shared-memory and message-passing models. The second approach is to build on technologies that have proven effective in distributed computing, such as Remote Procedure Call (RPC) and object-oriented technologies. The last approach is to develop new programming models and services. Examples of emerging grid programming models are high-throughput computing, group-based communication systems, and agent-based models.
• Resource Management
In HC systems and computational grids, users share available resources
such as networks, computers, and other resources. This sharing introduces
challenging resource management problems. Many of the grid applications
need to meet strict end-to-end performance requirements across multiple
compute resources connected by heterogeneous, shared networks. In order
to meet these requirements, sophisticated methods and tools are needed for
specifying application requirements, for translating these requirements from
application-level into resource-level QoS parameters, and for arbitrating
between conflicting demands [37]. Another challenging problem is the co-allocation problem, which arises because grid applications often require
concurrent access to multiple resources of different types at the same time.
Some of the resource management problems in HC and grid systems are due to the fact that system resources span different administrative domains. Complexities associated with the management and usage of resources across multiple administrative domains need to be hidden. In a grid environment, different resources, controlled by diverse organizations with diverse policies in widely-distributed locations, need to be used together [16]. There
are possible differences in scheduling policies, security mechanisms, etc.
Furthermore, different sites may use different local resource management
systems. Even when the same system is used at two sites, different configurations often lead to differences in functionality [24]. In general, resource management services for HC systems and grid environments will include tools and services for resource discovery, task partitioning, resource allocation and co-allocation, resource reservation, Quality of Service (QoS) guarantees, dynamic monitoring, and scheduling.
• Security
Security is critical for the widespread deployment of HC systems and computational grids. However, these systems introduce challenging security problems. Resources in such systems are managed by many organizations, often with different security requirements and policies. Therefore, managing security for such systems is difficult. Further, HC and grid systems introduce requirements for protecting applications and user data from the systems on which parts of a computation will execute. Running code in grid systems may originate from many points, requiring effective methods to verify the origin and authenticity of the code and means to confine its execution.
Security requirements in HC and grid environments encompass authentication, authorization, assurance, integrity, and confidentiality. While such requirements exist for traditional systems also, the implementation mechanisms for HC and grid systems are more complex. For example, authentication in
traditional systems is focused on the user, but in grid systems user as well
as server authentications are important. In general, "grid systems should
provide a modular framework within which different security mechanisms
can be integrated so that knowledge of the needs of a particular application, user, or computing environment can be used to dynamically select
from among the implemented choices and so that the quality of protection
can be negotiated as part of the process of resource allocation" [67].
• Network Protocols and Infrastructure
While early developments in networking have focused on best-effort service for low-bandwidth flows, many grid applications require both high bandwidth and strict performance assurance. Meeting these requirements requires major advances in networking technologies. HC and grid applications are expected to have significant implications for future network protocols and technologies. Network protocols need to be designed specifically to support these applications, and application requirements will determine the communication services that are provided. Each application may impose different requirements on network protocols. For example [63], data-intensive applications will require: (1) data transport protocols for the rapid and reliable transport of huge amounts of data, (2) protocols for the integration of complex, independently developed modules, and (3) protocols
for the encapsulation and integration of existing modules and movements
of program codes to remote sites.
Several research projects are investigating the challenges of HC systems and
computational grids and developing tools and services for such systems. Primary
among these are: Globus [45], Legion [58], VDCE [76], MSHN [46], AppLeS [8], and ReMos [26]. Some of these projects will be discussed in Chapter 2.
As stated earlier, a major challenge in using HC systems and computational
grids is to effectively use available resources. System resources such as compute
resources, network bandwidth, and data repositories are shared among applications. In such systems, applications are submitted from various user sites with specific QoS requirements. One way to take advantage of a HC system or a computational grid is to decompose an application into several tasks based on the computational requirements. Different tasks may be best suited for different machines. Once the application is decomposed into tasks, each task needs
to be assigned to a suitable machine (matching problem) and task executions
need to be ordered in time (scheduling problem) to optimize a given objective
function. Efficient matching and scheduling algorithms are necessary to achieve
high performance for submitted applications. The focus of this dissertation is
the matching and scheduling (defined as mapping) of application tasks onto HC
systems and computational grids.
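To make the matching and scheduling terminology concrete, the following minimal Python sketch maps a three-task DAG onto two machines: each ready task is matched to the machine that gives it the earliest finish time, and start times respect precedence constraints and machine availability. All task names, machines, and times are hypothetical; this illustrates the problem being defined, not any of the algorithms developed later in this dissertation.

# Hypothetical estimated execution times: exec_time[task][machine]
exec_time = {
    "t0": {"m0": 4, "m1": 9},
    "t1": {"m0": 7, "m1": 3},
    "t2": {"m0": 5, "m1": 5},
}
preds = {"t0": set(), "t1": {"t0"}, "t2": {"t0"}}   # precedence constraints

machine_free = {"m0": 0, "m1": 0}       # when each machine becomes free
finish = {}                             # task -> finish time
schedule = {}                           # task -> (machine, start, finish)

unscheduled = set(exec_time)
while unscheduled:
    # matching + scheduling: for each ready task, pick the machine and
    # start time that minimize its completion time
    ready = [t for t in unscheduled if preds[t].issubset(finish)]
    for t in sorted(ready):
        best = None
        for m, cost in exec_time[t].items():
            start = max([machine_free[m]] + [finish[p] for p in preds[t]])
            if best is None or start + cost < best[2]:
                best = (m, start, start + cost)
        m, start, end = best
        machine_free[m], finish[t], schedule[t] = end, end, best
        unscheduled.remove(t)

print(schedule)   # e.g. {'t0': ('m0', 0, 4), 't1': ('m1', 4, 7), ...}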
1.2 Thesis Contributions
This dissertation focuses on the mapping problem in HC systems and computational grids. We introduce a unified mapping framework for such systems. Using
this framework, we develop efficient algorithms for two novel mapping problems.
Our techniques can be incorporated into resource management systems (RMSs)
for HC and grid environments. This dissertation can be considered as one of the
early efforts that address the mapping problem in HC systems and computational
grids in a general way. Our contributions are summarized as follows.
(a) A Unified Mapping Framework:
We introduce a general framework that can be used to map applications onto
resources of HC systems and computational grids in a unified way. The key
components of our framework are the following (an illustrative sketch of these components is given after the list):
1. A system model that captures the main characteristics of the target hardware platform, such as the numbers and types of available resources and the interconnection networks. The system model also represents the communication times between system resources, and whether the system supports concurrent send and receive, overlapped computation and communication, and advance resource reservations.
2. An application model that captures the main characteristics of the applications to be mapped. These characteristics include the size and type of applications, resource requirements, QoS requirements, and communication patterns between application tasks.
3. A mapping problem which defines an objective function. Mapping algorithms are developed to optimize the given objective function based on
specific system and application models.
4. Mapping algorithms that optimize the objective function of the mapping
problem. Mapping algorithms are usually developed for specific system and
application models with specific assumptions. The output of a mapping
algorithm is a schedule that specifies resource assignments for each task,
and the start time of that task on each of its assigned resources.
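The four components just listed can be made concrete with a minimal, illustrative data-model sketch in Python. The class and field names below are assumptions chosen for the illustration and are not the framework's own notation.

from dataclasses import dataclass, field

@dataclass
class SystemModel:
    machines: dict                 # machine name -> capability description
    repositories: dict             # repository name -> hosted data sets
    comm_time: dict                # (src, dst) -> time per data unit
    advance_reservation: bool      # does the system support reservations?

@dataclass
class Task:
    name: str
    exec_time: dict                # machine -> estimated execution time
    inputs: list                   # required input data sets
    predecessors: list             # precedence constraints (DAG edges)

@dataclass
class ApplicationModel:
    tasks: dict                                # task name -> Task
    qos: dict = field(default_factory=dict)    # e.g. per-application deadline

@dataclass
class Assignment:                  # one entry of the output schedule
    task: str
    resources: list                # machine plus any co-allocated resources
    start_time: float
    finish_time: float

def mapping_algorithm(system: SystemModel, app: ApplicationModel) -> list:
    """A mapping algorithm optimizes the mapping problem's objective and
    returns a schedule, i.e. a list of Assignment records."""
    raise NotImplementedError      # concrete heuristics follow in Chapters 5 and 6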
(b) Algorithms for Mapping with Multiple Resource Requirements
and Data Replication:
In HC systems and grid environments, applications usually access multiple resources during their execution. We use our unified mapping framework to develop efficient mapping algorithms for such applications. We assume that application tasks need to access multiple resources (compute resources and data repositories) during their execution. Furthermore, input data sets are replicated and
may be retrieved from one or more data repositories. Our algorithms for this mapping problem are of two types: level-by-level algorithms and greedy algorithms. The algorithms are based on the list scheduling technique. Our algorithms achieve a significant improvement in the overall schedule length over a baseline algorithm that does not consider all resource requirements at the same time. As shown by our simulation results, it is advantageous to consider all resource requirements simultaneously when making mapping decisions rather than mapping based on each type of resource separately.
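As a small illustration of the unified idea, the Python sketch below chooses, for each task in a level, the machine and the data replica jointly, so that transfer time from a repository and execution time on a machine enter the same decision. The numbers, names, and the one-input-per-task simplification are hypothetical; the actual level-by-level and greedy algorithms are presented in Chapter 5.

# Hypothetical execution times, replica locations, and transfer costs
exec_time = {"t0": {"m0": 6, "m1": 10}, "t1": {"m0": 8, "m1": 4}}
task_input = {"t0": "d0", "t1": "d1"}
replicas = {"d0": ["r0", "r1"], "d1": ["r1"]}       # data set -> repositories
transfer = {("r0", "m0"): 1, ("r0", "m1"): 5,       # repository -> machine
            ("r1", "m0"): 4, ("r1", "m1"): 2}

machine_free = {"m0": 0, "m1": 0}
for task in ["t0", "t1"]:                           # tasks of one DAG level
    data = task_input[task]
    # unified choice over (machine, replica) pairs rather than fixing the
    # machine first and the repository afterwards
    machine, repo, finish = min(
        ((m, r, machine_free[m] + transfer[(r, m)] + exec_time[task][m])
         for m in exec_time[task] for r in replicas[data]),
        key=lambda choice: choice[2])
    machine_free[machine] = finish
    print(task, "->", machine, "reading", data, "from", repo, "finish", finish)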
(c) Algorithms for Mapping with Co-Allocation Requirements:
It is often the case in HC systems and computational grids that an application
requires concurrent access to multiple resources of different types at the same
time. In general, this problem is the resource co-allocation problem [4]. We
use our mapping framework to develop efficient algorithms for mapping a set of
applications with resource co-allocation requirements onto HC and grid systems.
In this mapping problem, application tasks have two types of constraints to be
satisfied: precedence constraints and resource sharing constraints.
Two approaches are used to develop the heuristic algorithms: independent-set
approach and critical-resource approach. In the first approach, Directed Acyclic
Graph (DAG) and Resource-Sharing Graph representations are used to find sets
of independent tasks that can be executed concurrently. The critical-resource
approach is based on identifying resources in high demand. High priorities for
mapping are given to tasks that require these resources.
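To illustrate the resource-sharing idea behind the independent-set approach, the short Python sketch below marks two tasks as conflicting when their co-allocation requirements overlap and then greedily extracts one independent set of tasks that could run concurrently. The task and resource names are hypothetical, and the greedy extraction is only a conceptual illustration, not the maximal-independent-set selection developed in Chapter 6.

# Hypothetical co-allocation requirements: task -> set of required resources
requires = {
    "t0": {"m0", "link0"},
    "t1": {"m1", "link0"},     # shares link0 with t0 -> cannot overlap
    "t2": {"m2", "db0"},
    "t3": {"m1", "db0"},       # shares m1 with t1 and db0 with t2
}

# build the resource-sharing graph as an adjacency set
conflicts = {t: set() for t in requires}
tasks = list(requires)
for i, a in enumerate(tasks):
    for b in tasks[i + 1:]:
        if requires[a] & requires[b]:
            conflicts[a].add(b)
            conflicts[b].add(a)

# greedily pick an independent set (tasks with fewest conflicts first)
independent, blocked = [], set()
for t in sorted(tasks, key=lambda t: len(conflicts[t])):
    if t not in blocked:
        independent.append(t)
        blocked |= conflicts[t]

print(independent)   # e.g. ['t0', 't2'] can be co-allocated concurrently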
We also develop a lower bound on the optimal schedule length of this mapping
problem. The lower bound is developed by considering precedence and resource
sharing constraints at the same time. The Dependency Graph is used to capture
both constraints simultaneously. We found that the weight of the maximum
weighted clique of the dependency graph with some relaxed constraints yields a
very good lower bound. Simulation results show that our algorithms are very
close to the lower bound. The results also show that our algorithms have an
excellent performance improvement over a baseline algorithm of list scheduling
which does not consider the co-allocation requirements.
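The intuition behind the bound can be illustrated with a toy computation: tasks that are pairwise dependent, whether through precedence or through a shared resource, can never overlap in time, so the largest total weight of such a clique, with each task weighted by its smallest possible execution time, cannot be beaten by any schedule. The brute-force search below is only an illustration on a hypothetical instance; the dissertation's bound additionally uses relaxed constraints and is developed in Chapter 6.

from itertools import combinations

min_exec = {"t0": 3, "t1": 5, "t2": 4, "t3": 2}          # per-task weights
dependent = {("t0", "t1"), ("t1", "t2"), ("t0", "t2"),   # pairwise dependencies
             ("t2", "t3")}

def is_clique(group):
    # every pair in the group must be dependent (in either order)
    return all((a, b) in dependent or (b, a) in dependent
               for a, b in combinations(group, 2))

lower_bound = max(
    sum(min_exec[t] for t in group)
    for k in range(1, len(min_exec) + 1)
    for group in combinations(min_exec, k)
    if is_clique(group))

print(lower_bound)   # 12 here: the clique {t0, t1, t2} with weights 3 + 5 + 4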
1.3 Chapter Summaries
Chapter 2 provides background information on HC and computational grids. In
this chapter, we define HC and give a motivational example for HC and grid
systems. Several taxonomies of HC systems from the literature are presented.
We also overview some HC and grid research projects. The projects are the MSHN, Globus, Legion, AppLeS, and VDCE projects.
In Chapter 3, we give background information on the mapping problem in HC systems and computational grids. First, we define the general mapping problem
in such systems. Then, we discuss some mapping taxonomies. Finally, we give a
short survey of several mapping algorithms from HC and grid literature.
Chapter 4 introduces our proposed mapping framework. This chapter describes the key components of the framework: system model, application model, mapping problem, and mapping algorithms. The framework will be used in Chapters 5 and 6 to develop heuristic algorithms for two novel mapping problems in
HC systems and computational grids.
In Chapter 5, we develop heuristic algorithms for mapping a set of application DAGs with multiple resource requirements and data replication. First, we give the motivations for this mapping problem. Mapping algorithms are then developed using two approaches: the level-by-level approach and the greedy approach. The
performance of our algorithms is compared with that of a baseline algorithm,
which does not consider all resource requirements at the same time.
In Chapter 6, the mapping framework of Chapter 4 is used to develop heuristic algorithms for mapping a set of application DAGs with resource co-allocation requirements. Application tasks have two types of constraints to be satisfied: precedence constraints and resource sharing constraints. First, we give the motivations behind this problem and define the problem. Then, we use two approaches to develop our algorithms: the independent-set approach and the critical-resource approach.
We also develop a lower bound on the optimal schedule length of this mapping
problem. Simulation results comparing the performance of our algorithms with
that of a baseline algorithm of list scheduling, which does not consider the co-allocation requirements, are then presented.
Finally, Chapter 7 gives a brief summary and some concluding remarks. It
also identifies interesting research directions for future work.
Chapter 2
Background
In this chapter, we discuss the background of heterogeneous and grid computing. First, we start with some definitions. Then we give a motivational example for HC
systems and computational grids. Several taxonomies of HC systems from the
literature are presented next. Finally, we overview some HC and grid research
projects. We discuss the MSHN, Globus, Legion, AppLeS, and VDCE projects.
2.1 Definitions
Many definitions of heterogeneous computing (HC) exist in the literature [32, 43, 54]. Khokhar et al. [54] define HC as "the well-orchestrated and coordinated
effective use of a suite of diverse high-performance machines to provide efficient
processing for computationally demanding tasks with diverse computing needs".
HC is defined in [32] as a special form of parallel and distributed computing that
performs computations using a single autonomous computer operating in both
SIMD and MIMD modes, or using a number of connected autonomous computers.
Freund and Siegel [43] define HC as the tuned use of diverse processing hardware
to meet distinct computational needs.
In general, HC and grid computing are computing paradigms that exploit
the heterogeneity of applications and system resources to enable the construction
of high-performance systems. In this dissertation, we consider a HC system (or a
computational grid) to be the system that makes use of several compute resources
with different capabilities, I/O devices, data repositories, and other resources, all
interconnected by heterogeneous local and wide area networks to optimize the
performance of the system.
2.2 A Motivational Example
Conventional High-Performance Computing (HPC) systems utilize a number of
homogeneous processors and support one execution mode in a given machine
(e.g., MIMD, SIMD, vector processing, and so on). Such systems provide efficient solutions to applications which require one particular execution mode that matches the one the system supports. However, many applications usually consist of several tasks exhibiting different computing requirements. For such applications, homogeneous systems cannot adequately meet all application requirements and may perform poorly because a machine spends most of the execution time on
codes for which it is unsuited. Using more powerful and more expensive homogeneous systems does not solve the problem, because the improvement will only affect the execution time of the codes with an execution mode that matches the mode supported by the systems.
HC systems and computational grids offer a cost-effective approach to this problem by utilizing several machines with different computing and architectural capabilities. In such a system, the execution of an application's tasks is coordinated such that each task runs on a machine that matches its embedded parallelism type, thus increasing application performance. One important advantage of HC systems compared to homogeneous distributed or parallel systems is the possibility of superlinear speedup (relative to the execution speed on some serial
baseline systems) for heterogeneous applications whose tasks are well matched
with architectures that constitute the underlying HC system [27].
Let us consider a hypothetical example application [54], as shown in Figure 2.1, which consists of four tasks with different types of execution modes. The first task is best suited to execute on a vector machine, the second task is best suited to execute on a SIMD machine, the third task is best suited to execute on a MIMD machine, and the fourth task is best suited to execute on a special-purpose machine. Executing the whole application on a vector machine improves the performance given by the serial machine by a factor of two. With a vector machine, the vector portion of the application can be executed much faster.
Figure 2.1: A hypothetical example of the advantage of using HC systems. [Figure: an application divided into vector, SIMD, MIMD, and special-purpose portions, each best suited to a different machine type.]
However, other portions may have only a slight improvement in execution time
due to the mismatch between their execution mode types and the vector machine architecture. Using a HC system with four machines that support all the four
different execution modes of the tasks can result in an overall execution time 25
times faster than the baseline serial machine. If the tasks are dependent on any
shared data, then a communication overhead occurs when using HC systems.
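The shape of this argument can be made concrete with a small calculation. The portion sizes and speedups used below are illustrative assumptions, not the exact figures of the example in [54], so the resulting factors only approximate the ones quoted above.

# Hypothetical serial times of the four application portions (time units)
serial = {"vector": 50, "simd": 20, "mimd": 20, "special": 10}

def total_time(speedup_per_portion):
    return sum(serial[p] / speedup_per_portion[p] for p in serial)

# Vector supercomputer only: large gain on the vector portion, marginal
# improvement (1.1x) on the portions whose execution mode does not match.
vector_only = total_time({"vector": 25, "simd": 1.1, "mimd": 1.1, "special": 1.1})

# HC suite: every portion runs on a machine matching its execution mode.
hc_suite = total_time({"vector": 25, "simd": 25, "mimd": 25, "special": 25})

baseline = sum(serial.values())
print(f"vector machine only: {baseline / vector_only:.1f}x speedup")   # about 2x
print(f"matched HC suite:    {baseline / hc_suite:.1f}x speedup")      # 25x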
2.3 Taxonomies of HC Systems
Different classifications and taxonomies have been proposed in the literature for
HC systems [27, 32, 80]. In [80], HC systems are divided into either mixed-mode HC systems or mixed-machine HC systems. A mixed-mode HC system is a single parallel processing machine capable of operating in either SIMD or MIMD parallelism modes and can switch between them dynamically with negligible overhead.
A mixed-machine HC system is a heterogeneous suite of independent machines
of different types interconnected by a high-speed network.
A more general taxonomy based on execution modes and machine models
is proposed in [27]. The taxonomy is called EM3 (Execution Mode, Machine Model). Execution mode is defined by the type of parallelism supported by the system (e.g., SIMD, MIMD, vector). Machine model is defined as the machine architecture and machine performance. The EM3 taxonomy classifies HC systems based on the number of execution modes and the number of machine models into four categories: (1) SESM (single execution mode, single machine model), (2) SEMM (single execution mode, multiple machine model), (3) MESM (multiple execution mode, single machine model), and (4) MEMM (multiple execution mode, multiple machine model). The SESM class includes homogeneous systems. The SEMM class includes systems in which different nodes use different machine models but still support only one execution mode. The MESM and MEMM classes
correspond to mixed-mode and mixed-machine systems, respectively.
In [32], two groups of HC are proposed: (a) system heterogeneous computing (SHC) and (b) network heterogeneous computing (NHC). In SHC, a single multiprocessor machine supports SIMD and MIMD parallelism modes. SHC is divided into multimode SHC and mixed-mode SHC. In multimode SHC, computations simultaneously execute in both parallelism modes. However, in mixed-mode SHC, the execution mode switches between SIMD and MIMD. In NHC, several independent machines are connected to execute one or more concurrent tasks. There are two types of NHC: multimachine and mixed-machine NHC. In multimachine NHC, all machines are identical, but in mixed-machine NHC systems different
types of machines exist.
In this dissertation, the HC system we are considering can be classified as a
mixed-machine system (based on [80]), a MEMM system (based on [27]), or a
mixed-machine NHC system (based on [32]).
2.4 An Overview of Some Research Projects
In this section, we provide background information on some of the leading HC
and grid research projects and their respective philosophies. In the following, we
discuss the MSHN, Globus, Legion, AppLeS, and VDCE projects.
2.4.1 MSHN
The Management System for Heterogeneous Networks (MSHN) [46] project was a collaborative effort between DoD (Naval Postgraduate School), academia (NPS, USC, and Purdue University), and industry (NOEMIX). Our research was a part
of the MSHN effort. MSHN (pronounced "mission") designed and implemented a
Resource Management System (RMS) for distributed and shared environments.
The main goal of MSHN is to determine the best way to support the execution
of many different applications, each with its own QoS requirements, in a distributed heterogeneous environment. MSHN assumes heterogeneity in resources
and processes. Processes may have different priorities, deadlines, and compute
characteristics.
MSHN supports adaptive applications that can exist in several different versions. These versions may differ in the precision of computation or input data, and therefore have different values to a user. Unlike other HC and grid projects, MSHN seeks to determine how to meet QoS requirements of multiple applications simultaneously. MSHN has defined an optimization metric, which is a weighted sum of values that represents the benefits and costs of delivering the required
QoS within the specified deadlines.
MSHN's RMS schedules and passively monitors distributed and heterogeneous resources in shared environments so as to deliver acceptable end-to-end
QoS for a collection of applications. The RMS performs several key functions.
Figure 2.2: MSHN architecture. [Figure: the Client Library, Scheduling Advisor, Resource Requirements Database, Resource Status Server, MSHN Daemon, and Application Emulator components, and the interactions among them.]
These include: (1) the monitoring of general resource availability, (2) the on-line measurement of resource and system state, (3) the transparent sensing of the resource requirements of an application, (4) mapping of application tasks onto a heterogeneous suite of resources in a way that exploits heterogeneity, and (5) the meeting of QoS requirements including those of real-time deadlines, fault
tolerance, security, and priorities.
The MSHN architecture consists of several distributed, potentially replicated, components that communicate with one another using CORBA. The main components of the architecture (shown in Figure 2.2) are:
• The Scheduling Advisor (SA)
The primary responsibility of the SA is to determine the best assignment
of system resources to a set of applications, based on the optimization of a
global measure. The SA depends on the RSS and the RRD to identify an
operating point that optimizes the global measure. The SA incorporates
scheduling techniques for different types of resources.
• The Resource Requirements Database (RRD)
The RRD is a repository of information pertaining to the resource usage of
applications. The RRD provides this information to the SA.
• The Resource Status Server (RSS)
The RSS maintains a repository of information about the resources available
to MSHN. The RSS responds to SA requests with estimates of currently
available resources.
• A Client Library (CL)
The Client Library provides a transparent interface to all the other MSHN
components. The CL intercepts system calls that initiate new processes
and consults the SA for the best place to execute that process.
• A MSHN Daemon
The MSHN Daemon executes on all compute resources available for use by
the RMS. Its sole purpose is to start applications as requested by the CL.
Users submit their jobs at any of the client sites. First, the CL checks
the request against a list of applications managed by MSHN. If the requested
application is not on that list, the CL simply passes the request directly to the
local operating system. If the application is on the list, it instead passes the
request to the SA. The SA queries the RRD for historical and other information
on the job, and also the RSS for information on the status of resources. The SA
then determines the resources that are to be allocated for the job. The job is
executed on these resources. Finally, the RRD and RSS are updated.
2.4.2 Globus
Globus is a joint research and development project between researchers at the
Argonne National Laboratory (ANL) and the Information Sciences Institute at
the University of Southern California (ISI/USC) [45]. Globus follows a layered
approach in building Grid infrastructure. The most fundamental layer consists
of a set of core services or the Globus toolkit. The toolkit allows applications to
use grid services without having to adopt a particular programming model.
The Globus toolkit comprises a set of components that implement basic services for security, resource management, communication, and so on. These services are distinct and have well-defined interfaces, so that they can be incorporated into applications or tools in an incremental fashion. Currently, the toolkit
contains the following tools and services:
1. The Globus Resource Allocation Manager (GRAM), which provides resource allocation and process creation, monitoring, and management services. A Globus-based system typically contains many GRAMs. Each
GRAM is responsible for a set of resources operating under the same site-
specific allocation policy. Resource and computation management services
are implemented in a hierarchical fashion. An individual GRAM supports
the creation and management of a set of processes on a set of local resources. A computation created by a global service may then consist of
one or more jobs, each created by a request to a GRAM and managed via
management functions implemented by that GRAM. The GRAM provides
a standard interface to local resource management systems. Hence, grid
tools and applications can express resource allocation and process management requests in terms of a standard GRAM API, while individual sites
are not constrained in their choice of resource management tools.
2. The Metacomputing Directory Service (MDS), which is an information service that allows applications to discover characteristics of their execution
environment dynamically, and then either configure aspects of system and
application behavior for efficient, robust execution or adapt behavior during program execution. A run-time directory service is extremely important
for computational grid environments. The utilization and availability of resources constantly changes, making it inadequate for programmers to rely
on default or standard configurations when building applications.
3. The Grid Security Infrastructure (GSI), which provides authentication services.
4. The Global Access to Secondary Storage (GASS) service, which implements
a variety of data movement and remote data access strategies.
5. Nexus communication library, which provides communication services within
the Globus toolkit.
6. The Heartbeat Monitor (HBM), which is used to detect failure of Globus
system components or application processes.
7. The Globus Network Performance Monitor (GloPerf) service, which pro
vides online information about the latency and bandwidth observed on
network links.
Figure 2.3 shows the Globus resource management architecture [24]. The architecture consists of three main components: the Metacomputing Directory Service (MDS), Globus Resource Allocation Managers (GRAMs), and various types of co-allocation agents (co-allocators). Co-allocators implement strategies to discover and allocate multiple resources simultaneously to meet application
[Figure 2.3 (diagram): applications and brokers pass RSL specifications, via co-allocators and the information service, as ground RSL to GRAMs that interface to local resource managers such as LSF, EASY-LL, and NQE.]
Figure 2.3: The Globus resource management architecture.
QoS requirements. Two co-allocation strategies have been developed: an atomic
transaction strategy and an interactive transaction strategy [23]. In the atomic
strategy, the co-allocation request succeeds if all the required resources are al
located. All required resources should be specified at the request time. In the
interactive strategy, the contents of the co-allocation request can be modified
to enable greater application-level control. The Globus project constructed two
co-allocators: an atomic transaction co-allocator called the Globus Resource Al
location Broker (GRAB) and an interactive transaction co-allocator called the
Dynamically Updated Resource Online Co-allocator (DUROC).
The original Globus resource management architecture proposed in [24] has
been extended recently by the Globus Architecture for Reservation and Allo
cation (GARA) [38] to account for different types of resources and to consider
advance resource reservation. Globus technologies have been deployed in the
Globus Ubiquitous Supercomputing Testbed (GUSTO), which includes 40 sites
in eight countries. Different groups are developing applications for GUSTO.
These include remote visualization of scientific simulations, real-time analysis
of satellite data, and distributed parameter studies [45].
2.4.3 Legion
Legion [58] is a metasystem software project begun in the Fall of 1993 at the University of Virginia. Legion is a flexible, wide-area operating system designed to build a virtual computer from millions of distributed hosts and trillions of objects while presenting the image of a single computer to the user. In general, Legion is a grid operating system. It provides the services of a traditional operating system (such as process creation and control, file system, resource management, etc.) on a grid. Legion uses an object-oriented approach to metacomputing system design. Each system and application component in Legion is an object. The object-based approach enables modularity, data and fault encapsulation, and replaceability [66].
The philosophy of Legion is to hide the complexity of resource scheduling,
load balancing, data movements, etc. from the application developer. Legion
finds and schedules resources and handles security issues among disparate oper
ating systems and objects written in different languages. That frees users from
the need to negotiate with outside systems and administrators. Distributed ap
plication components are represented as independent, active objects, enabling the
programmer to work with the uniform abstraction of distributed objects. The
development of distributed applications and tools is thus simplified.
There are ten design objectives of the Legion project: site autonomy; an extensible core; scalability; an easy-to-use, seamless computational environment; high performance via parallelism; a single persistent name space; security for users and service providers; management and exploitation of resource heterogeneity; multi-language support and interoperability; and fault tolerance.
The main components of the Legion resource management model are the
basic resources (hosts and vaults), the information database (the Collection),
the Scheduler, the schedule implementor (the Enactor), and an execution Moni
tor [20]. Legion currently provides two types of resources: hosts (computational
resources) and vaults (storage resources). The Collection acts as an information
repository describing the state of the system resources. The Scheduler computes
the mapping of objects to resources using the information provided by the Collec
tion. The Enactor uses a resource reservation approach to implement the mapping
[Figure 2.4 (diagram): the Scheduler, the schedule implementor (Enactor), and the resource database (Collection) interacting with the underlying system resources.]
Figure 2.4: The Legion resource management model.
computed by the Scheduler. It is not an objective of Legion to directly develop efficient schedulers. Therefore, Legion only provides simple, generic default schedulers. Scheduling algorithm designers can use Legion's resource management infrastructure and mechanisms to develop more sophisticated and efficient schedulers.
Figure 2.4 shows the Legion resource management model [58]. The Collection
interacts with resource objects to collect state information describing the system
(step 1). The Scheduler queries the Collection to determine a set of available
resources that match the Scheduler's requirements (step 2). After computing a
schedule, or set of desired schedules, the Scheduler passes a list of schedules to
the Enactor for implementation (step 3). The Enactor then makes reservations
with the individual resources (step 4), and reports the results to the Scheduler
(step 5). Upon approval by the Scheduler, the Enactor places objects on the
hosts, and monitors their status (step 6).
Currently, Legion is running on over 300 hosts across the United States and Europe. Legion is being used by researchers from a number of disciplines, including biochemistry, molecular biology, materials science, aerospace, information retrieval, climate modeling, astronomy, neuroscience, and computer graphics [66]. There is a rough correspondence between some Legion and Globus components (e.g., the Collection and the MDS). However, Legion differs from Globus in its object-oriented approach. Globus presents a "bag-of-services" architecture layered over preexisting components, while Legion presents a "whole-cloth" design.
2.4.4 AppLeS
The focus of the AppLeS (Application-Level Scheduling) project [6] is to provide
a mechanism for adaptively and dynamically scheduling individual applications
on computational grids. AppLeS schedulers are based on the principle that ev
ery potential scheduling decision has a performance impact on the application [9].
AppLeS's approach is to enhance application performance by evaluating each system component in terms of its impact on the application's execution. Scheduling decisions are based on dynamic resource availability and performance information, application performance models, and user specifications.
[Figure 2.5 (diagram): an AppLeS schedule planner drawing on application and system information sources and acting on the computational grid.]
Figure 2.5: AppLeS architecture.
Part of the AppLeS project focuses on developing AppLeS agents to provide a
mechanism for scheduling individual applications. Each application will have its
own AppLeS agent whose function is to select resources, determine a performance-
efficient schedule, and implement that schedule [10]. AppLeS agents utilize the
Network Weather Service (NWS) [82] to get dynamic forecasts of resource load
and availability. Each AppLeS agent uses static and dynamic application and
system information to resources. AppLeS agent then interacts with the relevant
RMS to implement application tasks. Figure 2.5 shows the general AppLeS
architecture and its information sources.
Each AppLeS agent performs the following functions in order to develop a
schedule [83]:
• Resource Selection
AppLeS agents select a collection of candidate resource sets from the avail
able resources. Selected resource sets are ranked with respect to their po
tential as an execution platform using an application-specific resource usage
function.
• Schedule Planning
Possible schedules are developed for each candidate resource set. Then,
the most performance-efficient schedule among the schedules is determined
based on an application-specific cost function.
• Application Deployment
Finally, the application is deployed or "actuated” using the most performance-
efficient schedule. Some AppLeS are developed to support rescheduling
during execution in response to dynamic system events or variation in ap
plication requirements.
Several AppLeS agents have been developed for a number of applications including: (1) two-dimensional and three-dimensional Jacobi iterative applications, (2) a Genetic Algorithm application, (3) a Mandelbrot application, (4) a protein docking application based on DOT, and (5) a Synthetic Aperture Radar Atlas (SARA) application.
2.4.5 VDCE
VDCE (Virtual Distributed Computing Environment) is a research project at the
University of Arizona [76]. The goal of the VDCE project is to develop a complete
framework for application development, configuration, and execution. The main
philosophy of VDCE is to provide a general software development environment in
which to build and execute large-scale applications on a network of heterogeneous
resources. At each VDCE site, the server software, called site manager, handles
the inter-site communications and bridges the VDCE modules to the web-based
repository.
VDCE software consists of three modules: (1) the Application Editor, a user-friendly application development environment that generates the Application Flow Graph (AFG) for an application, (2) the Application Scheduler, which provides an efficient task-to-resource mapping of an AFG, and (3) the VDCE Run-time System, which is responsible for running and managing application execution and monitoring the VDCE resources.
The Application Editor is a web-based graphical user interface for developing
parallel and distributed applications. The end-user establishes a URL connection
to the VDCE Server software within the site, namely the Site Manager. The pro
cess of building an application with the Application Editor consists of two steps:
building the Application Flow Graph (AFG), and specifying the task properties
of the application.
The Application Editor provides menu-driven task libraries that are grouped
in terms of their functionality. A selected task is represented as a clickable and
draggable graphical icon in the active editor area. The user can add new tasks,
and specify connections between them. The tasks and links constitute the AFG.
For each task, the user can specify preferences such as computational mode, ma
chine type, and the number of processors to be used in a parallel implementation.
The main function of the Application Scheduler is to interpret the AFG and
to assign the current best available resources so as to minimize the completion
time. List scheduling heuristics are used. The first step of the heuristic is to select
the node with the highest priority. The next step is to select the best available
processor to run the selected task. These steps are repeated until all nodes of the
graph are covered. The level of each node is used to determine its priority.
The VDCE Run-time System sets up the execution environment for a given
application and manages the execution to meet the hardware/software require
ments of the application. The runtime system has two important components:
the control manager and the data manager. The control manager measures the
loads on the resources periodically and monitors the resources for possible fail
ures. The data manager provides low latency and high-speed communication and
synchronization services for inter-task communications.
Chapter 3
Mapping in HC Systems and Computational
Grids
In this chapter we give background information on the mapping problem in HC systems and computational grids. First, we define the mapping problem and then we present its NP-completeness results. Next, we discuss some mapping taxonomies from the literature. Finally, we give an overview of several mapping algorithms proposed by researchers for HC and grid environments.
3.1 The Mapping Problem
The general scheduling problem has been described in a number of different ways
in different fields. The problem in general assumes a set of consumers serviced
by a set of resources according to a certain policy. The goal is to find an efficient
policy for managing the access to and the use of the resources by the consumers
to optimize a desired performance measure such as schedule length. In distributed
and parallel systems, the scheduling problem occurs because program tasks must
be arranged in time and space so that the overall performance of the system is
optimized [30]. The goal of scheduling in such systems is to determine an assignment (or allocation) of tasks to processing elements and an order in which tasks are executed such that any precedence constraints among the tasks are satisfied. DAGs are usually used to represent precedence constraints and communication requirements among application tasks. If there are no precedence constraints among tasks, the problem is known as the task allocation problem [31] or the task assignment problem [74, 51]. In this case, program tasks are represented by an undirected graph called a Task Interactive Graph (TIG), where an undirected edge links any two tasks that need to communicate during their execution. In the task allocation problem, specifying the order of executing the tasks is not required. In other words, tasks might interact or communicate without imposed precedence relations.
The scheduling problem in homogeneous environments can be considered as a
special case of the problem in heterogeneous environments. One way to take ad
vantage of HC systems is to decompose submitted applications into several tasks
based on their computational requirements such that each task is computationally
homogeneous. Different tasks may require different architectural capabilities and
may be best suited for different machines. Once an application is decomposed into
tasks, each task needs to be assigned to a suitable machine (matching problem) and task executions need to be ordered in time (scheduling problem) to optimize a given objective function. Matching the task type to the machine type adds more constraints on the original scheduling problem. In homogeneous systems, the matching problem does not exist since all machines are identical. In such systems, load balancing can be an effective way to optimize the performance of the system. Although some scheduling concepts and techniques for homogeneous environments can be applied to matching and scheduling for heterogeneous environments, there is a fundamental distinction between the two approaches. In HC systems, a task may execute most effectively on a particular type of architecture, and matching tasks to appropriate machines is more important than balancing the load among all machines [69].
In this dissertation, we refer to the matching and scheduling problem as the
mapping problem. With the increased interest in HC and computational grids,
the mapping problem became the focus of many recent research efforts. Recent
literature has proposed several mapping tools, frameworks, and policies. Several
new issues related to the mapping problem are emerging due to advances in
communication networks, system architectures, and application structures.
3.2 NP-Completeness of the Mapping Problem
The problem of scheduling program tasks in (homogeneous) parallel and distributed systems is known to be NP-complete in general, as well as in several restricted cases [44]. Optimal solutions are known for three simple cases [31]:
1. Scheduling a tree-structured task graph to an arbitrary number of proces
sors.
2. Scheduling an arbitrary DAG to two processors.
3. Scheduling an interval-ordered DAG to an arbitrary number of processors.
A DAG is called interval-ordered if every two precedence-related nodes can be mapped to two non-overlapping intervals on the real number line.
In all the above cases, the communication cost is not considered and all tasks
are assumed to have the same execution time. When communication cost is
considered, optimal algorithms exist for scheduling an interval-ordered DAG to
an arbitrary number of processors [5] and for scheduling a tree-structured task
graph (in-forest/out-forest) on two processors [28]. In both cases, the execution time is the same for all tasks and is identical to the communication cost.
Several versions of the scheduling problem in homogeneous parallel and distributed systems were proven to be NP-complete when communication among tasks is not considered and the target system is assumed to be fully connected [31, 56]. The first one is the general scheduling problem, which considers the scheduling of an arbitrary DAG with arbitrary task execution times to an arbitrary number of homogeneous processors. Other problems, which are special cases of the general scheduling problem, are: (1) scheduling a DAG with unit execution times to an arbitrary number of processors, (2) scheduling a DAG with one or two time units to two processors, (3) scheduling an interval-ordered DAG with arbitrary execution times to two processors, and (4) scheduling an opposing forest with unit execution times to an arbitrary number of processors.
In HC environments, the mapping problem includes the scheduling problem as well as the matching problem. It is well known that the mapping problem in HC environments is, in general, NP-complete [34]. Obtaining an optimal solution for the mapping problem is therefore not feasible in practice. Optimal solutions can be found through an exhaustive search, but because there are n^m different ways in which m tasks can be assigned to n machines, an exhaustive search is often not possible. For example, consider a system with five machines and an application consisting of 30 tasks. This means that there are 5^30 possible mappings. If it takes only one nanosecond to evaluate the quality of one mapping, then more than 1000 years are needed to evaluate and compare all possible mappings to obtain the optimal one [70]. Therefore, many near-optimal heuristic algorithms for HC systems and computational grids have been proposed in the literature. In Section 3.4 we give an overview of several of these algorithms.
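The arithmetic behind this example is easy to check; the short sketch below merely reproduces it.

# Size of the exhaustive mapping search space: each of m tasks can be placed
# on any of n machines, giving n**m candidate mappings.
n_machines = 5
m_tasks = 30

num_mappings = n_machines ** m_tasks            # 5**30 ~= 9.3e20 mappings
eval_time_ns = 1                                # assume 1 ns per evaluation

total_seconds = num_mappings * eval_time_ns * 1e-9
total_years = total_seconds / (3600 * 24 * 365)

print(f"{num_mappings:.3e} mappings, ~{total_years:.0f} years at 1 ns each")
# Prints roughly 9.313e+20 mappings and on the order of tens of thousands of
# years -- well over a thousand years, so exhaustive search is infeasible.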
3.3 Mapping Taxonomies
Several taxonomies for classifying scheduling and mapping algorithms exist in the literature [19, 31, 56, 14]. Casavant and Kuhl [19] presented a taxonomy for classifying scheduling techniques in general-purpose distributed systems. The taxonomy consists of hierarchical and flat classifications. Several examples of different scheduling techniques from the published literature are given in [19], with each classified by the taxonomy. El-Rewini et al. [31] presented a hierarchical taxonomy that can be used to classify scheduling approaches in parallel and distributed systems. The taxonomy in [31] starts by dividing scheduling approaches into two classes: deterministic scheduling and nondeterministic scheduling. In deterministic scheduling, all information about tasks and their relations is known a priori. In nondeterministic scheduling, the task graph and execution and communication costs are not known before execution.
In [56], Kwok and Ahmad proposed a taxonomy for classifying static DAG scheduling algorithms in multiprocessor systems. The highest level of the taxonomy divides the scheduling algorithms into two categories, depending upon whether the task graph is of an arbitrary structure or a restricted structure. Several examples of each class of the taxonomy are given in [56]. While both previous taxonomies [19, 31] describe the general scheduling problem in parallel and distributed systems, this taxonomy is partial and only focuses on static DAG scheduling techniques. However, all three taxonomies consider homogeneous environments only.
Braun et al. [14] presented a taxonomy, called the Purdue HC Taxonomy, which can be used to classify mapping algorithms for HC systems. The taxonomy is defined by three major parts: (1) application model characterization, (2) platform model characterization, and (3) mapping strategy characterization. The taxonomy differs from previous taxonomies in its consideration of application model and platform model characterizations. Information about the target hardware platform and the application being executed is necessary to achieve better mapping decisions.
Application model characteristics in the Purdue HC Taxonomy include: application size; application type (e.g., DAG or independent tasks); communication patterns between tasks; whether tasks have deadlines, priorities, or other QoS requirements; and the temporal distribution of submitted applications (i.e., whether the complete set of tasks is known a priori (static applications) or tasks can arrive at run-time in a non-deterministic manner (dynamic applications)). Platform model characteristics include: number of machines, number of connections, machine heterogeneity, machine architectures, interconnection networks, and whether concurrent send and receive or overlapped computation and communication are supported. Mapping strategy characteristics include: application model supported, platform model supported, objective function, control location (centralized or distributed), dynamic or static strategy, and whether data forwarding, task duplication, preemption, or remapping are considered.
3.4 An Overview of Mapping Algorithms
Many algorithms exist in the literature for mapping applications in HC systems
and computational grids. In this section we give an overview of several of these
mapping algorithms. General information about the algorithms is given in Table 3.1. The table gives each algorithm's name, authors, and the year it was proposed.
We present the mapping algorithms in three groups based on the application model supported by the algorithms. The main application models considered in the literature are: DAG, TIG, and independent jobs (or meta-task). Application tasks that are independent and indivisible are represented as independent jobs (a meta-task). A DAG represents applications by capturing the precedence relations between tasks. A TIG represents applications in which multiple tasks can run concurrently without precedence constraints among them.
For applications consisting of several tasks and represented by DAGs, many static and dynamic mapping algorithms have been proposed (see Table 3.2). Dynamic algorithms include the Hybrid Remapper [62], the Generational algorithm [40], as well as others [49, 57, 78]. Leangsuksun et al. [57] proposed two
Algorithm          Authors                         Year
Stone              H. Stone [74]                   1977
A-D                Ibarra et al. [48]              1977
Lo                 Virginia Lo [60]                1988
MH                 El-Rewini and Lewis [29]        1990
DLS                Sih and Lee [71]                1993
TNZ93              Tao et al. [75]                 1993
LMT                Iverson et al. [50]             1995
KNN                Leangsuksun et al. [57]         1995
GQE                Leangsuksun et al. [57]         1995
Minmin             Freund et al. [41]              1996
Maxmin             Freund et al. [41]              1996
SiY96              Singh and Youssef [72]          1996
GSA                Shroff et al. [68]              1996
Generational       Freund et al. [40]              1996
HuC97              Hui and Chanson [47]            1997
WSR+97             Wang et al. [79]                1997
Cluster-M          Eshghian and Wu [33]            1997
Max Edge           Kopidakis et al. [55]           1997
MBA                Kopidakis et al. [55]           1997
OASS               Kafil and Ahmad [51]            1998
OAPS               Kafil and Ahmad [51]            1998
Hybrid Remapper    Maheswaran and Siegel [62]      1998
IvO98              Iverson and Ozguner [49]        1998
MAS+99             Maheswaran et al. [61]          1999
BSB+99             Braun et al. [15]               1999
VeR99              Venkataramana et al. [78]       1999
HEFT               Topcuoglu et al. [77]           1999
CPOP               Topcuoglu et al. [77]           1999
PBSA               Ahmad and Kwok [2]              1999
XSufferage         Casanova et al. [17]            2000
Table 3.1: Mapping algorithms from HC and grid literature.
dynamic algorithms: the K Nearest-Neighboring (KNN) algorithm and the Global
Queue Equalizer (GQE) algorithm. The KNN algorithm is a distributed heuristic
while the GQE algorithm is a centralized heuristic. In [49], Iverson and Ozguner
proposed a distributed dynamic heuristic for mapping multiple application DAGs
competing for system resources. Venkataramana and Ranganathan [78] proposed
a framework for dynamic mapping based on a learning automata model. The
framework can adapt itself to changes in the hardware and network environment,
and it works for any cost metric.
Several DAG static algorithms are described in [2, 33, 50, 29, 72, 68, 71, 77, 79]. The algorithms are based on several mapping approaches such as list scheduling [50, 29, 71, 77], genetic algorithms [72, 68, 79], and clustering techniques [33]. In [2], Ahmad and Kwok introduced a parallel algorithm that performs mapping using parallel processors. The algorithm, called PBSA (Parallel Bubble Scheduling and Allocation), maps both the tasks and the messages and is applicable to systems with arbitrary network topologies using homogeneous or heterogeneous processors.
The problem of mapping TIGs in HC systems was first introduced by Stone [74]. The problem is known in the distributed systems literature as the task allocation problem [31] or the task assignment problem [74, 51]. In Stone's work, a graph-theoretic approach is used where a Max Flow/Min Cut algorithm can be utilized to
Algorithm          Control Location   Algorithm Type   Comments
MH                 Centralized        Static           Based on list scheduling.
DLS                Centralized        Static
LMT                Centralized        Static           Based on list scheduling.
KNN                Distributed        Dynamic
GQE                Centralized        Dynamic
SiY96              Centralized        Static           Genetic algorithm.
Generational       Centralized        Dynamic
GSA                Centralized        Static           Genetic simulated annealing.
WSR+97             Centralized        Static           Genetic algorithm.
Cluster-M          Centralized        Static           Clustering algorithm.
Hybrid Remapper    Centralized        Dynamic          Improves an initial static mapping;
                                                       3 heuristics based on list scheduling.
IvO98              Distributed        Dynamic          Based on list scheduling.
VeR99              Centralized        Dynamic          Learning automaton model.
HEFT               Centralized        Static           Based on list scheduling.
CPOP               Centralized        Static           Based on list scheduling.
PBSA               Centralized        Static           Parallel algorithm;
                                                       explicitly schedules communications.
Table 3.2: DAG mapping algorithms.
find assignments which minimize total execution and communication costs. Optimal solutions can be found for two-processor systems. Lo [60] extends Stone's model by introducing the concept of interference costs (resource contention costs), which are incurred when tasks are assigned to the same processor. In [75], three heuristic algorithms based on simulated annealing, tabu search, and a stochastic probe approach were proposed. Hui and Chanson [47] proposed a fast heuristic algorithm. Kopidakis et al. [55] presented two heuristic algorithms: the Matching Based Algorithm (MBA) and the Max-Edge algorithm. Both algorithms are based on a transformation of the initial minimization problem to a maximization one.
Kafil and Ahmad [51] proposed two algorithms based on a best-first search
technique called the A* algorithm. The A* algorithm, an informed-search algo
rithm from the area of artificial intelligence, guarantees an optimal solution, but
doesn’t work for large problems because of its high time and space complexity.
The first algorithm, the Optimal Assignment with Sequential Search (OASS), is a sequential algorithm that reduces the search space. The second algorithm, the Optimal Assignment with Parallel Search (OAPS), is a parallel algorithm that has a lower time complexity compared to the OASS algorithm. Both algorithms find
the optimal solution that minimizes the load on the heaviest-loaded processor.
This is equivalent to minimizing the completion time of the entire program since
the time needed by the heaviest-loaded processor will determine the program's
completion time. Several TIG mapping algorithms are given in Table 3.3.
Algorithm    Control Location   Algorithm Type   Comments
Stone        Centralized        Static           Assumes identical communication links.
Lo           Centralized        Static           Introduces interference costs;
                                                 assumes identical communication links.
TNZ93        Centralized        Static           3 heuristics based on tabu search,
                                                 simulated annealing, and stochastic probe.
HuC97        Centralized        Static           Assumes identical communication links.
Max Edge     Centralized        Static           Assumes identical communication links.
MBA          Centralized        Static           Assumes identical communication links.
OASS         Centralized        Static           Sequential optimal algorithm based on A* search.
OAPS         Centralized        Static           Parallel optimal algorithm based on A* search.
Table 3.3: TIG mapping algorithms.
Table 3.4 gives some characteristics of several meta-task mapping algorithms.
Ibarra and Kim [48] proposed five static heuristics for mapping independent tasks
on nonidentical (heterogeneous) processors. The algorithms are called the A, B, C, D, and E algorithms. Braun et al. [15] compared eleven static heuristics from the literature and analyzed them under one set of common assumptions. Three dynamic algorithms were proposed by Maheswaran et al. [61]. The algorithms were grouped into two categories: on-line mode and batch-mode heuristics. In on-line mode, a task is mapped as soon as it arrives. In batch-mode, tasks are collected into a set that is examined for mapping at specific mapping events. Casanova et al. [17] modified three existing meta-task heuristics (Maxmin, Minmin, and Sufferage) to schedule parameter sweep applications with file I/O requirements. An
Algorithm     Control Location   Algorithm Type   Comments
A-D           Centralized        Static           5 algorithms (A, B, C, D, and E).
Minmin        Centralized        Static           Based on algorithm D from [48].
Maxmin        Centralized        Static           Based on algorithm E from [48].
BSB+99        Centralized        Static           Compares 11 heuristics.
MAS+99        Centralized        Dynamic          3 algorithms (SA, KPB, Sufferage).
XSufferage    Centralized        Dynamic          Considers file sharing.
Table 3.4: Meta-task mapping algorithms.
extension of Sufferage, called XSufferage, was proposed which takes advantage of file sharing to achieve better performance than the other heuristics.
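For concreteness, the following sketch outlines the batch-mode Min-min heuristic listed in Table 3.4 in its commonly described form; the data structures used here (an ECT table and per-machine ready times) are simplifying assumptions and are not the exact formulation of the cited papers.

# Simplified sketch of the batch-mode Min-min heuristic for meta-tasks.
# ect[t][m] is the estimated computation time of task t on machine m, and
# ready[m] is the time at which machine m becomes free (assumed inputs).

def min_min(tasks, machines, ect, ready):
    mapping = {}
    unmapped = set(tasks)
    while unmapped:
        best = None                      # (completion_time, task, machine)
        for t in unmapped:
            # Best machine for this task given current machine ready times.
            ct, m = min((ready[m] + ect[t][m], m) for m in machines)
            if best is None or ct < best[0]:
                best = (ct, t, m)
        ct, t, m = best                  # task with the overall minimum
        mapping[t] = m                   # completion time is mapped first
        ready[m] = ct                    # its machine is now busy until ct
        unmapped.remove(t)
    return mapping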
Chapter 4
The Proposed Mapping Framework
In this chapter, we define our proposed mapping framework for HC systems and
computational grids. Figure 4.1 shows the key components of the framework.
The framework consists of four main components: system model, application
model, mapping problem, and mapping algorithms. In the following we discuss
each component.
4.1 System Model
The system model captures the main characteristics of the target hardware plat
form such as number and types of available resources and the interconnection
network. The connectivity of the target system can be represented using an
undirected graph where an edge represents a communication link between two
resources. Some researchers assume arbitrary network topologies [47, 51, 74, 78,
[Figure 4.1 (diagram): the application model and the system model together define the mapping problem, which in turn drives the mapping algorithms.]
Figure 4.1: Main components of our mapping framework.
29, 71, 33, 2] while others assume fully connected networks [62, 49, 57, 77, 50, 79, 68, 72, 61, 15, 48].
The system model also represents the communication times between system resources, and whether the system supports concurrent sends and receives, overlapped computation and communication, and advance resource reservation. In systems that support advance resource reservation, system resources can be reserved in advance for specific time intervals. Simultaneous allocation of resources can be easily achieved if advance reservation is supported in the system. Also, the support of resource reservation is vital for providing guaranteed QoS [21]. Advance reservation provides an increased expectation that resources can be allocated when needed. In the absence of a reservation system, one encounters either increased cost due to excess overprovisioning or degraded service for critical traffic [38]. Techniques for advance reservations have been proposed for real-time connections [35], for predictive flows [25], for mobile flows [12], and for handling multicast [11]. Other researchers [81] proposed a model for advance resource reservation for continuous media applications.
An advance reservation request must specify (1) the amount of resource capacity required, (2) the starting time, and (3) the duration. It may be useful for the reservation system to set a limit on how early a reservation request can be submitted. This prevents the system from storing too much reservation information [81]. The reservation duration issue also needs to be considered. Most advance reservation strategies require a definite reservation duration [81, 35, 25], while others may allow for indefinite durations [12]. When the actual duration does not correspond to the reservation duration, several issues arise [81]. If the actual duration is shorter, then the unused resource capacity should be freed and made available for other applications. If the actual duration is longer, the system may or may not have sufficient resources. If enough resources are available, it may be a good strategy to allow for reservation extension to encourage applications not to overbook resources in order to guarantee services against unpredicted longer durations.
In this dissertation, we consider a HC system with m compute resources (machines), M = {m_1, m_2, ..., m_m}, and a set of r non-compute resources, R = {r_1, r_2, ..., r_r}. Compute resources can be HPC platforms, workstations, personal computers, etc. A non-compute resource r_k ∈ R can be a data repository, an input/output device, etc. We assume that only one task can access any resource (compute or non-compute resource) at any given time. Resources are interconnected by heterogeneous communication links. Communication costs are assumed to be source and destination dependent. System resources may not be available over some time intervals due to advance reservations. We assume that system resources can be reserved in advance for specific time intervals by other users or by RMSs. We assume that available time intervals for machine m_j are given by MA(m_j) and available time intervals for resource r_k are given by RA(r_k).
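To make the availability notation concrete, the following sketch shows one possible way to represent MA(m_j) and RA(r_k) as lists of free time intervals and to update them when an advance reservation is made; this interval representation is an illustrative assumption, not part of the model itself.

# Illustrative representation of resource availability under advance
# reservations. Each resource keeps a list of (start, end) intervals during
# which it is free; reserving a sub-interval splits the containing interval.

from typing import List, Tuple

Interval = Tuple[float, float]          # (start_time, end_time), end may be inf

def reserve(avail: List[Interval], start: float, end: float) -> List[Interval]:
    """Remove [start, end) from a resource's availability intervals."""
    updated = []
    for s, e in avail:
        if end <= s or start >= e:      # reservation does not touch this slot
            updated.append((s, e))
            continue
        if s < start:                   # keep the free part before the booking
            updated.append((s, start))
        if end < e:                     # keep the free part after the booking
            updated.append((end, e))
    return updated

# Example: machine m1 is initially free from time 0 onward (MA(m1)),
# then another user reserves it for the interval [10, 20).
MA_m1 = [(0.0, float("inf"))]
MA_m1 = reserve(MA_m1, 10.0, 20.0)      # -> [(0.0, 10.0), (20.0, inf)]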
4.2 Application Model
The application model captures the main characteristics of the applications to be
mapped. These characteristics include the size and type of the application, resource requirements, QoS requirements, and communication patterns between application tasks. For an application model to be sufficient, it must efficiently represent: (1) application structure (type), (2) application resource requirements, and (3) QoS requirements (general and application-specific requirements).
Some of the most commonly used representations to capture application structure are DAGs [2, 3, 33, 49, 50, 57, 62, 29, 71, 68, 72, 77, 78, 79], TIGs [47, 51, 55, 60, 64, 74, 75], and independent jobs (meta-tasks) [15, 41, 48, 61]. Application tasks that
[Figure 4.2 (diagram): an undirected task interaction graph (TIG) shown next to a directed acyclic graph (DAG) of tasks.]
Figure 4.2: Example of TIG and DAG representations.
are independent and indivisible are represented as independent jobs (meta-task).
DAGs and TIGs are used to represent applications that can be decomposed into
several tasks such as parallel programs. A TIG is different from a DAG in that the
former is an undirected graph with no precedence constraints among the tasks.
Figure 4.2 shows an example of DAG and TIG representations.
Application representations must also reflect communication patterns between
tasks. Some researchers assume that one-time communication is done between task executions. This is usually the case when using the DAG representation. With a DAG, a task communicates with its successor tasks by sending output data after
its execution. A task cannot start execution until it receives all input data from
its predecessor tasks. Another communication pattern researchers may assume
is simultaneous execution and communication. With a TIG, for example, a task communicates with its neighbors during its execution.
Application models must also represent the resource and QoS requirements of application tasks. Some tasks may require specific resources with specific architectures. QoS requirements of tasks include deadlines, priorities, throughput,
latency, security, and other application-specific requirements such as jitter level
and frame rate.
In this dissertation, we consider the DAG representation, which can also be considered a generalization of the meta-task representation. We assume that the whole set of applications to be mapped is known a priori (static applications). Each application consists of a set of communicating tasks. In our model, resource and QoS requirements are specified at the task level. The data dependencies among the tasks are assumed to be known and are represented by a DAG, G = (T, E). The set of tasks of an application to be mapped is represented by T = {t_1, t_2, ..., t_k}, where k ≥ 1, and E represents the data dependencies and communication between tasks. An edge e_ij indicates that there is a communication from task t_i to task t_j, and its weight, |e_ij|, denotes the amount of communication.
We assume that an estimate of the computation time of a given task t_i on machine m_j is available at compile-time. These estimated computation times are given in an Estimated Computation Time (ECT) matrix. Thus, ECT(t_i, m_j) gives the estimated computation time for task t_i on machine m_j. If t_i cannot be executed on m_j, then ECT(t_i, m_j) is set to infinity.
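For illustration, an ECT matrix can be stored as a simple two-dimensional table; the task names, machine names, and values below are invented solely for the example.

# Illustrative ECT matrix: ECT[t][m] is the estimated computation time of
# task t on machine m; infinity marks machines that cannot run the task.
import math

INF = math.inf

ECT = {
    "t1": {"m1": 12.0, "m2": 30.0, "m3": INF},   # t1 cannot run on m3
    "t2": {"m1": 25.0, "m2": 10.0, "m3": 18.0},
    "t3": {"m1": INF,  "m2": 42.0, "m3":  7.0},  # t3 cannot run on m1
}

def ect(task, machine):
    """Estimated computation time of a task on a machine (inf if infeasible)."""
    return ECT[task][machine]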
4.3 Mapping Problem
Mapping problems define the objective functions (or performance measures).
Mapping algorithms are developed to optimize a given objective function based
on specific system and application models. An objective function can be (for
example):
• Balancing the load.
• Maximizing resource utilization.
• Satisfying some QoS requirements.
• Minimizing the total completion time.
The total completion time is also known as the schedule length or makespan. Minimizing the schedule length is the objective of several mapping algorithms (e.g., [62, 49, 77, 29, 50, 71, 79, 61, 15, 48, 51, 47, 75]). For the algorithms proposed in [55, 60, 74], the objective function is to minimize the sum of computations and communications.
In this dissertation, our objective function is to minimize the overall schedule length for a collection of applications that compete for system resources. This strategy (i.e., optimizing the performance of a collection of applications as opposed to that of a single application) has been taken by SmartNet [42] and MSHN [46]. On the other hand, the emphasis in other projects, such as AppLeS [9], is to optimize the performance of an individual application rather than to cooperate with other applications sharing the resources. Since multiple users share the resources, optimizing the performance of an individual application may dramatically affect the completion time of other applications. We can formally define our objective function as:

\[ \text{Minimize} \ \Big\{ \max_{1 \le i \le N} \text{FinishTime}(A_i) \Big\} \]

where FinishTime(A_i) is the completion time of application A_i, and N is the total number of submitted applications.

4.4 Mapping Algorithms

Mapping algorithms are generally developed for specific system and application models with specific assumptions. Algorithms designed for one model do not usually work for another model. Based on system and application models, mapping algorithms are developed to optimize a given objective function subject to a set of constraints. The complexity and quality of mapping algorithms largely depend on application and system models.
Since the mapping problem is NP-complete in general, obtaining an optimal
solution for the mapping problem is not feasible. Therefore, most mapping al
gorithms in the literature are heuristic algorithms. Mapping algorithms can be
either distributed or centralized based on their control locations. In the central
ized approach, a central mapper collects global system information and makes all
mapping decisions. Mapping decisions in the distributed approach are taken by
several local mappers rather than by a single mapper. Mapping algorithms can
be also classified as static or dynamic algorithms. In static algorithms, mapping
decisions are made at compile-time before the execution of applications. On the
other hand, all mapping decisions in the dynamic algorithms are made on-line
during the execution of the applications.
Different mapping algorithms implement different mapping policies and techniques, such as list scheduling, task duplication, and clustering. A mapping policy is a set of rules used in a mapping algorithm to match tasks to resources and schedule task executions. For example, in list scheduling, all submitted tasks are placed in a list according to some priority assigned to each task. A task cannot be mapped until all its predecessors have been mapped. Tasks are considered for mapping in the order of their priorities.
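A bare-bones sketch of this list-scheduling pattern is shown below. The priority function and the earliest-finish-time machine selection are simplifying assumptions introduced only to make the loop concrete; they are not the specific policies of any one algorithm discussed here.

# Minimal list-scheduling loop: repeatedly map the highest-priority ready
# task (all predecessors already mapped) to the machine that finishes it
# earliest. priority(t), ect(t, m), and comm(mu, mj, p, t) are assumed given.

def list_schedule(tasks, preds, machines, priority, ect, comm):
    finish, mapping = {}, {}
    machine_free = {m: 0.0 for m in machines}     # machine -> next free time
    unmapped = set(tasks)

    while unmapped:
        ready = [t for t in unmapped if all(p in finish for p in preds[t])]
        t = max(ready, key=priority)              # highest-priority ready task

        # Earliest finish time over all machines, honoring predecessor data.
        best_m, best_f = None, float("inf")
        for m in machines:
            data_ready = max((finish[p] + comm(mapping[p], m, p, t)
                              for p in preds[t]), default=0.0)
            f = max(machine_free[m], data_ready) + ect(t, m)
            if f < best_f:
                best_m, best_f = m, f

        mapping[t], finish[t] = best_m, best_f
        machine_free[best_m] = best_f
        unmapped.remove(t)
    return mapping, finish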
The output of a mapping algorithm is a schedule that specifies the matching of tasks to resources and the start time of each task on its allocated resources. A schedule f is feasible if it satisfies the precedence relations and all constraints among tasks. Gantt charts can be used to illustrate the schedules produced by mapping algorithms, where the start and finish times of all tasks on their allocated resources can be easily shown. Mapping algorithms are evaluated based on the goodness of the produced schedule and the efficiency of the mapping policy [31]. The schedule is judged based on the objective function we are trying to optimize. The mapping policy (or the mapping algorithm itself) can be evaluated based on its time complexity.
In this dissertation, our proposed mapping algorithms are static and central
ized. In our algorithms, all resource requirements are considered at the same time
in a unified manner to achieve better mapping decisions. System resources, in our
model, are available to execute tasks only during certain time intervals as they
are reserved (by other users) at other times. Our algorithms can be considered
as insertion-based algorithms. A higher resource utilization can be achieved with insertion-based algorithms compared to non-insertion-based algorithms. In non-insertion-based algorithms, the earliest time when a resource is available is calculated as the finish time of the last task assigned to this resource. On the other hand, insertion-based algorithms consider a possible insertion of each task in the earliest idle time slot between two already-mapped tasks on any given resource.
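The difference between the two policies can be made concrete with a small sketch of the insertion-based slot search; the list-of-busy-intervals representation is an assumption used only for illustration.

# Insertion-based slot search: given the busy intervals already scheduled on a
# resource, find the earliest start time at which a task of length 'duration'
# fits, either in an idle gap between two mapped tasks or after the last one.

def earliest_insertion_start(busy, duration, earliest=0.0):
    """busy: sorted list of (start, end) intervals already occupied."""
    candidate = earliest
    for start, end in sorted(busy):
        if candidate + duration <= start:     # fits in the gap before this task
            return candidate
        candidate = max(candidate, end)       # otherwise try after this task
    return candidate                          # after the last mapped task

# Example: with tasks occupying [0, 4) and [10, 15), a task of length 3 can be
# inserted at time 4, whereas a non-insertion-based policy would start it at 15.
print(earliest_insertion_start([(0, 4), (10, 15)], 3))   # -> 4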
Chapter 5
Mapping with Multiple Resource Requirements
and Data Replication
In this chapter, the mapping framework of Chapter 4 is used to develop heuristic
algorithms for mapping a set of application DAGs with multiple resource re
quirements and data replication. Our algorithms are of two types: level-by-level
algorithms and greedy algorithms. As shown by simulation results, our algorithms lead on average to a 50% improvement in the overall schedule length
over a baseline algorithm that does not consider all resource requirements at the
same time. In general, our results show that it is advantageous to consider all re
source requirements simultaneously when making mapping decisions rather than
mapping based on each type of resource separately.
5.1 Introduction
As shown in Chapter 3, most mapping algorithms focus on compute resources only. However, many applications in HC and grid systems need to access compute resources as well as other resources, such as data repositories, during their executions. For example, in data-intensive computing [65], applications access high-volume data from distributed data repositories such as databases and archival
storage systems. Data may need to be transported to remote sites in order to be
processed by HPC platforms that are not available locally. Most of the execution
time of these applications is in data movement. To achieve high performance for
such applications, mapping decisions must be based on all resource requirements.
Assigning an application task to the machine that gives its best execution time
may result in poor performance due to the cost of retrieving the required input
data from data repositories.
In this chapter, we study the mapping problem where a set of applications
have multiple resource requirements and input data sets are replicated. We con
sider compute resources as well as other resources such as the communication
network and data repositories. An application consists of several tasks and is
represented by a DAG. Resource requirements are specified at the task level. A
task's input data can be data items from its predecessor tasks and data sets from
data repositories. We allow input data sets to be replicated: a data set can be
accessed from one or more data repositories. In our mapping approach, sources of
input data, execution times of tasks on various machines, and resource availabil
ity are considered simultaneously when making mapping decisions. The objective
is to minimize the overall schedule length of all submitted applications.
To illustrate our ideas, we present several heuristic algorithms. Our algorithms
are static and are based on two approaches: a level-by-level approach and a greedy approach. In the level-by-level approach, all submitted DAGs are combined into
one DAG. The combined DAG is then partitioned into levels of independent
tasks. Independent tasks have no precedence constraints among them and can
be executed concurrently. During mapping process, tasks are examined level by
level. When all tasks in a specific level are mapped onto selected resources, tasks
in the next level are considered for mapping. In the second approach, which is a
greedy approach, all ready tasks from all levels are considered for mapping at the
same time. A task is ready for mapping if all its predecessors have been mapped
onto selected resources.
The experimental results show that our algorithms lead on average to a 50% improvement in the overall schedule length over a baseline algorithm that does
not consider all resource requirements at the same time. In general, as shown by
our simulation results, it is advantageous to consider all resource requirements
simultaneously when making mapping decisions rather than mapping based on
each type of resource separately.
5.2 Problem Definition
5.2.1 System Model
We consider a HC system with m compute resources (machines), M = {m_1, m_2, ..., m_m}, and f data repositories (file servers), R = {r_1, r_2, ..., r_f}. Available time intervals for machine m_j are given by MA(m_j) and available time intervals for data repository r_f are given by RA(r_f).
Communication costs are given by two matrices: mm_comm and rm_comm. The mm_comm matrix gives the communication cost for transferring a byte of data among machines and the rm_comm matrix gives the communication cost for transferring a byte of data between data repositories and machines. The communication cost within the same machine, mm_comm(m_j, m_j), is assumed to be zero.
5.2.2 Application Model
In the HC system we are considering, N applications, {A_1, A_2, ..., A_N}, compete for the shared resources of the system. Applications with various requirements are submitted from participant sites. Each application consists of a set of communicating tasks. The data dependencies among the tasks are assumed to be known and are represented by a DAG, G = (T, E). The set of tasks of an application to be mapped is represented by T = {t_1, t_2, ..., t_k}, where k ≥ 1, and E
[Figure 5.1 (diagram): two sample application DAGs, each with its own task and edge structure.]
Figure 5.1: Example of two application DAGs.
represents the data dependencies and communication between tasks. An edge e_ij indicates that there is a communication from task t_i to task t_j, and its weight, |e_ij|, denotes the amount of communication. Figure 5.1 shows an example of two application DAGs.
As discussed in Chapter 4, we assume that the estimated computation times of tasks on machines are given in an Estimated Computation Time matrix (ECT). Thus, ECT(t_i, m_j) gives the estimated computation time of task t_i on machine m_j. If t_i cannot be executed on m_j, then ECT(t_i, m_j) is set to infinity.
A task's input data can be data items from its predecessors and data sets from data repositories. We allow for input data sets to be replicated. In systems with multiple copies of data sets, one or more data repositories can provide the required data sets. dataset(t_i) specifies the required input data sets for task t_i along with their amounts and possible locations. All of a task's input data (data items and data sets) must be retrieved before its execution. A task is ready for execution if all its predecessors have completed and it has received all the input data needed for its execution. After a task's completion, the generated output data may be forwarded to successor tasks and/or written back to data repositories.
Let START(t_i, m_j) and FINISH(t_i, m_j) be the earliest start time and the earliest finish time of task t_i on machine m_j, respectively. FINISH(t_i, m_j) is calculated as

\[ \text{FINISH}(t_i, m_j) = \text{START}(t_i, m_j) + \text{ECT}(t_i, m_j) \]

where START(t_i, m_j) is defined as the earliest time when: (1) machine m_j is available for a duration of ECT(t_i, m_j), (2) all input data items from the predecessor tasks of t_i are available at m_j, and (3) all required input data sets for t_i are available at m_j. START(t_i, m_j) is calculated as

\[ \text{START}(t_i, m_j) = \max\{\, \text{MA}(m_j),\ \text{Data\_Items}(t_i, m_j),\ \text{Data\_Sets}(t_i, m_j) \,\} \]

where Data_Items(t_i, m_j) gives the earliest time when all input data items from every task in the predecessor set of t_i, Pre(t_i), are available at machine m_j, and Data_Sets(t_i, m_j) gives the earliest time when all required data sets for t_i are available at m_j. For every task t_k in Pre(t_i), there is an edge from t_k to t_i in the DAG (i.e., e_{ki} ≠ 0). Data_Items(t_i, m_j) is calculated as

\[ \text{Data\_Items}(t_i, m_j) = \max_{\forall t_k \in \text{Pre}(t_i)} \{\, \text{FINISH}(t_k, m_u) + \text{mm\_comm}(m_u, m_j) \times |e_{ki}| \,\} \]

where m_u is the machine assigned to t_k. Data_Sets(t_i, m_j) is calculated as

\[ \text{Data\_Sets}(t_i, m_j) = \max_{\forall d_x \in \text{dataset}(t_i)} \ \min_{\forall r_f \in \text{DRS}(t_i, d_x)} \{\, \text{One\_DataSet}(t_i, m_j, d_x, r_f) \,\} \]

where One_DataSet(t_i, m_j, d_x, r_f) gives the earliest time for data set d_x, which is needed by t_i, to be available at m_j if it is retrieved from data repository r_f. DRS(t_i, d_x) is the Data Repositories Set of task t_i, which specifies all data repositories that can be used to retrieve d_x. In our model, we allow for input data sets to be pre-staged to their destination machines.
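To make the timing model concrete, the following sketch evaluates START and FINISH for a single task on a single machine directly from the definitions above. The helper functions (machine_available, one_dataset_time, and so on) are assumed to exist and are named here only for illustration.

# Sketch of the START/FINISH computation for task t_i on machine m_j,
# following the equations above. ect, mm_comm, one_dataset_time,
# machine_available, assigned_machine, and finish are assumed inputs.

def data_items_time(t_i, m_j, preds, edge_weight, finish, assigned_machine,
                    mm_comm):
    """Earliest time all data items from predecessors of t_i reach m_j."""
    times = [finish[t_k]
             + mm_comm(assigned_machine[t_k], m_j) * edge_weight[(t_k, t_i)]
             for t_k in preds[t_i]]
    return max(times, default=0.0)

def data_sets_time(t_i, m_j, dataset, drs, one_dataset_time):
    """Earliest time all required data sets for t_i are available at m_j,
    choosing the best replica for each data set."""
    times = [min(one_dataset_time(t_i, m_j, d_x, r_f)
                 for r_f in drs[(t_i, d_x)])
             for d_x in dataset[t_i]]
    return max(times, default=0.0)

def start_and_finish(t_i, m_j, ect, machine_available, items_time, sets_time):
    start = max(machine_available(m_j, ect(t_i, m_j)), items_time, sets_time)
    return start, start + ect(t_i, m_j)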
5.2.3 Problem Statement
We can formally state our mapping problem as follows.
Given:
• A HC system with m machines and f data repositories.
• N applications, {A_1, ..., A_N}, where each application is represented by a DAG.
• Communication costs among the various resources as given by the mm_comm and rm_comm matrices.
• Estimated computation times of tasks on various machines as given by the ECT matrix.
• Amounts and locations of all input data sets that are needed for each task t_i, as given by dataset(t_i), and
• Advance reservation times for system resources as given by MA and RA.
Find a mapping to:

\[ \text{Minimize} \ \Big\{ \max_{1 \le j \le N} \text{FinishTime}(A_j) \Big\} \]

where the mapping determines the selected resources to execute each task and the estimated start times of all tasks on the selected resources.
Subject to the following constraints:
• A task can execute only after all its predecessors have completed, all input data items have been received from its predecessors, and all input data sets have been retrieved from data repositories.
• All advance resource reservations must be preserved.
• Only one task can execute on any machine at any given time, and
• At most one task can access any data repository at any given time.
5.3 Our Mapping Approach
As in state-of-the-art systems, we assume a central mapper with a given set of
static applications to map. Applications from all sites are sent to the central
mapper to determine the mapping for each task so that the global objective is
achieved. The information about the submitted tasks and the status of various
resources is communicated to the central mapper. This centralized mapper then
makes appropriate decisions achieving better utilization of the resources.
Mapping in HC systems, even if we map based on compute resources only, is
known to be NP-complete. One popular mapping method is the well known list
scheduling approach [1]. In list scheduling all tasks are placed in a list according
to some priority assigned to each task. A task cannot be mapped until all its
predecessors have been mapped. Ready tasks are considered for mapping in order
of their priorities.
In this section, we develop modified versions of list scheduling for our mapping
problem with multiple resource requirements and data replication. Our heuristic
algorithms are static and are based on two approaches: level-by-level approach
and greedy approach. In the following, we describe these two approaches.
5.3.1 Level-By-Level Approach
In the level-by-level approach, we combine all submitted DAGs into one DAG, G,
as follows. A hypothetical node is created and linked, with zero communication
time edges, to the root nodes of all the submitted DAGs to obtain the combined
DAG. This dummy node has zero computation time. Figure 5.2 shows the combined
DAG for the two application DAGs in Figure 5.1. Now, minimizing the
maximum time to complete this combined DAG achieves our global objective.
We develop two heuristic algorithms based on the level-by-level approach:
the Min-FINISH and Max-FINISH algorithms. Pseudo code for the level-by-level
mapping approach is shown in Figure 5.3. In the first step of the approach,
we combine all submitted DAGs into one DAG as explained above. In step
2, we partition the combined DAG into l levels of tasks such that each level
contains independent tasks (i.e., there are no dependencies between the tasks in
the same level). Therefore, all the tasks in a level can be executed concurrently.
The dummy node is in level 0. Level 1 contains all tasks that do not have any
incident edges originally, i.e., the tasks without any predecessors in the original
DAGs. All tasks in level l have no successors. For each task t_i in level k, all
of its predecessors are in levels 0 to k-1, and at least one of them is in level k-1.
Figure 5.2: The combined DAG for the applications in Figure 5.1.

Level-by-Level Approach
1. Combine all submitted DAGs into a single DAG G.
2. Do level partitioning of G.
3. For each level in G do
4. Ready = {all tasks in the current level}.
5. While Ready is not empty do
6. For each task t_i in Ready do
   Find the best finish time of t_i, FINISH(t_i, m_best).
7. Select the task t_i with the minimum/maximum FINISH(t_i, m_best).
8. Map t_i to machine m_best and the associated data repositories.
9. Update MA and RA based on this mapping.
10. Remove t_i from Ready.
11. endwhile
12. endfor
End
Figure 5.3: Pseudo code of the level-by-level approach.
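The selection rule of Figure 5.3 can be written as a small Python sketch. The sketch is only illustrative: finish_time() and commit() are hypothetical helpers standing in for the FINISH(t_i, m_best) computation (which must account for predecessor data, data-repository retrieval, and advance reservations) and for the MA/RA bookkeeping, respectively.

def map_level_by_level(levels, machines, finish_time, commit, use_min=True):
    # levels: level partitioning of the combined DAG (a list of lists of tasks)
    # finish_time(task, machine): best finish time under current resource availability
    # commit(task, machine): record the mapping and update the MA and RA lists
    schedule = []
    for level in levels:
        ready = list(level)
        while ready:
            best = {}
            for task in ready:
                machine = min(machines, key=lambda m: finish_time(task, m))
                best[task] = (finish_time(task, machine), machine)
            # Min-FINISH maps the smallest best finish time first, Max-FINISH the largest
            pick = min if use_min else max
            task = pick(ready, key=lambda t: best[t][0])
            finish, machine = best[task]
            commit(task, machine)
            schedule.append((task, machine, finish))
            ready.remove(task)
    return schedule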
Figure 5.4 shows the levels of the combined DAG in Figure 5.2. The combined
DAG in this example has four levels.
The mapping then proceeds by considering tasks in one level at a time.
In step 4, the set of ready tasks, Ready, is initialized to contain all tasks in
this level. Then, for each task t_i in the Ready set, we find its best finish time,
FINISH(t_i, m_best), where m_best is the machine that gives the best finish time
for t_i based on the current availability of resources. The advance reservations
of compute resources and data repositories are handled by choosing the first-fit
time interval to optimize the finish time of a task. In step 7, we select the task
Figure 5.4: Level partitioning for the combined DAG in Figure 5.2.
with the minimum FINISH(t_i, m_best) to be mapped first in the Min-FINISH
algorithm. In the second algorithm, Max-FINISH, the task with the maximum
FINISH(t_i, m_best) is selected to be mapped first. In step 8, we map the selected
task to the machine and data repositories that achieve the best finish
time, FINISH(t_i, m_best), for this task. In step 9, availability times of allocated
resources are updated. Then, the selected task is removed from the Ready set.
The mapping procedure continues by repeating steps 3-12 until all tasks are
mapped.
The idea behind the Min-FINISH algorithm, as in algorithm D in [48] and the
Min-min algorithm in SmartNet [42], is that at each step, we attempt to minimize
the finish time of the last task in the ready set. On the other hand, the idea in
Max-FINISH, as in algorithm E in [48] and the Max-min algorithm in SmartNet [42],
is to minimize the worst-case finishing time for critical tasks by giving them the
opportunity to be mapped to their best resources.
5.3.2 Greedy Approach
In the level-by-level approach, we are creating dependency among various applications.
However, tasks in a specific level l of the combined DAG belong to
different independent applications. Further, the completion times of levels of
different applications can vary widely, and the level-by-level algorithms may not
perform well for some types of applications. For example, consider two application
DAGs as shown in Figure 5.5. The first application is a DAG with a small
degree of parallelism. Degree of parallelism is defined as the maximum number
of tasks in any level of the DAG (i.e., the maximum number of tasks that can
be executed concurrently). The second application is a DAG with a large degree
of parallelism. With the level-by-level approach, task 3 of application 1 would not
be considered for mapping until all tasks of application 2 have been mapped.
Therefore, the schedule length of application 1 will be long due to the dependency
created by the level-by-level approach between applications. Although
the level-by-level approach provides good performance in minimizing the overall
schedule length for all submitted applications, it may not provide a good schedule
length for individual applications with a small degree of parallelism. In the greedy
approach, we are trying to overcome this problem by considering all ready tasks
from all levels at each mapping event. This will advance the execution of different
tasks by different amounts and will attempt to achieve the global objective and
provide good response times for applications with a small degree of parallelism at
the same time.
We develop two heuristics based on the greedy approach: Min-FINISH-ALL
and Max-FINISH-ALL algorithms. As in the level-by-level approach, we consider
both minimum and maximum finish times of all ready tasks in determining the
Figure 5.5: A motivation example for the greedy approach.
mapping order. The two greedy algorithms, Min-FINISH-ALL and Max-FINISH-ALL,
are similar to the Min-FINISH and Max-FINISH algorithms, respectively.
They only differ with respect to the Ready set. In the greedy approach,
the Ready set may contain tasks from several levels. Initially, the Ready set
contains all tasks in level 1 from all applications. After mapping a task, we check
if any of its successors are ready to be considered for mapping and add them to
the Ready set. A task cannot be considered for mapping until all its predecessors
have been mapped (as in the level-by-level approach).
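The only difference in the greedy variants is how the Ready set is maintained. A minimal sketch of that bookkeeping is given below; predecessors and successors are assumed to be precomputed adjacency maps of the combined DAG, and select_and_map() is a hypothetical stand-in for the Min-FINISH-ALL/Max-FINISH-ALL selection and mapping step.

def run_greedy(tasks, predecessors, successors, select_and_map):
    # predecessors[t] / successors[t]: sets of tasks adjacent to t in the combined DAG
    mapped = set()
    ready = {t for t in tasks if not predecessors[t]}   # initially, level-1 tasks of all DAGs
    while ready:
        task = select_and_map(ready)        # pick by min or max best finish time, then map
        mapped.add(task)
        ready.discard(task)
        for succ in successors.get(task, ()):
            # a successor joins Ready only after all of its predecessors have been mapped
            if predecessors[succ] <= mapped:
                ready.add(succ)
    return mapped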
5.3.3 Main Characteristics of Our Algorithms
Our heuristic algorithms have two unique characteristics: simultaneous selection
and unified mapping. In the following we explain these two characteristics.
5.3.3.1 Simultaneous Selection
A traditional list scheduling algorithm performs two steps at each mapping event.
First, it finds the next task to be mapped from a sorted list of submitted tasks.
Second, it selects a machine to execute that task in order to optimize a given
objective function. It has been shown by Sih and Lee [71] that this independent
selection of tasks and machines is not practical. Their approach is to select a
task and a machine at the same time. At each mapping event, a pair of a ready
task and a machine, which maximizes a quantity called the Dynamic Level, is
selected for mapping. All possible combinations of ready tasks and machines are
examined. Sih and Lee showed that this strategy is superior to independently
selecting either the task or the machine.
In our algorithms, we follow a similar strategy. At each mapping event, we
examine all possible combinations of ready tasks, machines, and data repositories.
Then a group of one task, one machine, and one or more data repositories that
minimizes (or maximizes) a specific quantity is selected for mapping at the same
time.
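A rough sketch of this simultaneous selection is shown below. It enumerates every combination of a ready task, a machine, and one repository per required data set; finish_time() is a hypothetical cost function, and drs holds the data-repository choices of each task (the DRS sets of Section 5.2).

from itertools import product

def simultaneous_selection(ready, machines, drs, finish_time, pick=min):
    # drs[task]: {data_set: [candidate repositories]}
    # finish_time(task, machine, repos): finish time when 'repos' fixes one repository
    #                                    for every data set needed by 'task'
    per_task_best = {}
    for task in ready:
        datasets = list(drs[task])
        options = []
        for machine in machines:
            for combo in product(*(drs[task][d] for d in datasets)):
                repos = dict(zip(datasets, combo))
                options.append((finish_time(task, machine, repos), machine, repos))
        per_task_best[task] = min(options, key=lambda o: o[0])
    # pick=min gives the Min-FINISH-style rule, pick=max the Max-FINISH-style rule
    task = pick(per_task_best, key=lambda t: per_task_best[t][0])
    return task, per_task_best[task]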
5.3.3.2 Unified Mapping
As shown in Chapter 3, classical DAG mapping algorithms for HC systems consider
compute resources only. With the inclusion of data repositories, one can
obtain mapping schedules for compute resources and data repositories independently
and combine the schedules. Our approach in designing our mapping algorithms
is to use a unified mapping approach. With the unified mapping, all
resource requirements of tasks are considered at the same time when making
mapping decisions. To map a task, all required resources are considered simultaneously
to achieve better mapping decisions.
In the following, we show with a simple example that the separated mapping
approach is not efficient (as compared to the unified approach) with respect
to schedule length. Let us consider an application with five tasks as shown
in Figure 5.6. In this example, we assume a fully connected system with three
machines and three data repositories. The estimated computation times for tasks
(in time units) are given in Table 5.1. Table 5.2 gives the communication costs
(in time units) for transferring one data unit between the data repositories and
the machines. We assume that each task needs an input data set, which can be
retrieved from one or more data repositories as given in Table 5.3. In this example,
we use a fast static algorithm for mapping DAGs in HC environments [62, 79].
The algorithm is based on the list scheduling approach. It partitions a DAG into
blocks (levels). Then all the tasks are ordered such that the tasks in block k
come before the tasks in block b, where k < b. The tasks in the same block are
sorted in a descending order based on the number of children of each task (ties
are broken arbitrarily). The tasks are considered for mapping in this order. The
mapping order for our example is t_1, t_2, t_3, t_4, then t_5. A task is mapped onto the
machine that gives the minimum finish time for that particular task. Since the
original algorithm does not consider data repositories, we implemented a modified
version of the algorithm. In the modified version, the algorithm chooses a data
repository that gives the best retrieval time of the input data set.
The schedule based on the separated approach, when mapping the machines
first, is shown in Figure 5.7. The length of this schedule is 52 tim e units. For
this case, we first map the tasks to the machines as they are the only resources
in the system. Then for each task we select the data repository that gives the
best retrieving (delivery) time of the required data set, thus minimizing the finish
time of this task. The length of the schedule based on the separated approach,
when mapping the data repositories first, is 39 time units as shown in Figure 5.8.
For this case, we map the tasks to the data repositories as they are the only
system resources. Then for each task we select the machine which gives the best
finish time for that task when using the pre-selected data repository to get the
required data set. Figure 5.9 shows the schedule based on the unified approach.
The length of the unified schedule is 28.5 time units. In the unified approach, for
each task, we select a machine and a data repository at the same time in order
to minimize its finish time. We check all possible combinations of machines and
data repositories for every ready task.
Figure 5.6: Application DAG for the example in Section 5.3.3.2.
The previous example shows clearly that the mapping based on the separated
approach is not efficient with respect to schedule length compared to the unified
approach. Further, with advance reservations, separated mapping can lead to
poor utilization of resources when one type of resource is not available while
others are available.
        m1    m2    m3
t1       5     4     8
t2      20     5     3
t3       6    10     4
t4      10     4     2
t5      DC     6     5

Table 5.1: Estimated computation times for the tasks in Figure 5.6.

        m1    m2    m3
r1       5     6     6
r2       1     4     3
r3       4    1.5    5

Table 5.2: Communication costs (time units/data unit) for the example in Section 5.3.3.2.
Task    Amount of the Input Data Set    Data Repository Choices
t1      3 units                         r1 or r2
t2      10 units                        r2 or r3
t3      2 units                         r1 or r3
t4      1 unit                          r1 or r2
t5      5 units

Table 5.3: Input requirements for the tasks in Figure 5.6.
Figure 5.7: Separated mapping (machines first).
Figure 5.8: Separated mapping (data repositories first).
Figure 5.9: Unified mapping.
5.4 Performance Evaluation
In this section, we evaluate the performance of the algorithms we have developed
for the mapping problem defined in Section 5.2. A software simulator was
implemented for the evaluation process. In the following, we first explain our
simulation procedure and then discuss the experimental results.
5.4.1 Simulation Procedure
To define a HC system, the number of machines (no_machines) and the number of data
repositories (no_repositories) are given to the software simulator as inputs. Communication
costs among all resources are selected randomly from a uniform distribution
with a mean equal to ave_comm.
The workload consists of randomly generated DAGs. A random DAG is generated
as follows. The number of tasks in the graph, no_tasks, the maximum out-degree
of a task, max_outdegree, the average computation cost of a task, ave_comp, and the
average message size to be transferred among tasks, ave_msg_size, are given as inputs.
First, the number of tasks at the first level of the DAG is randomly selected. Then,
the simulator randomly selects the number of tasks for the next levels until the total
number of generated tasks is equal to no_tasks. This determines the number of
levels in the generated DAG. The number of tasks at any level is randomly selected
from a uniform distribution with a mean equal to $\alpha \times \sqrt{no\_tasks}$, where $\alpha$ is the
shape parameter of the DAG. A dense DAG (a graph with high parallelism) can
be generated by selecting $\alpha > 1$. Starting with the first task, the number of children
(out-degree) for this task is randomly selected between 1 and max_outdegree.
Then, children are randomly selected from the next levels. At least one child should
be selected from the following level. The weight of each edge in the DAG is randomly
selected from a uniform distribution with a mean equal to ave_msg_size.

The computation time of each task on every compute resource is randomly
selected from a uniform distribution with a mean equal to ave_comp. The number of
data repositories that can be used to retrieve a data set is randomly selected from
a uniform distribution between 1 and no_repositories. The amounts of data sets are
randomly selected from a uniform distribution with a mean equal to ave_data_set.
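The generator described above can be approximated with the sketch below. The parameter names mirror the ones in the text; the exact uniform ranges (0 to twice the mean) are an assumption, since only the means are specified.

import math
import random

def generate_dag(no_tasks, max_outdegree, ave_comp, ave_msg_size, alpha, no_machines):
    mean_width = alpha * math.sqrt(no_tasks)
    # split the tasks into levels whose widths average alpha * sqrt(no_tasks)
    levels, created = [], 0
    while created < no_tasks:
        width = min(no_tasks - created, random.randint(1, max(1, int(2 * mean_width))))
        levels.append(list(range(created, created + width)))
        created += width
    edges = {}
    for li, level in enumerate(levels[:-1]):
        below = [t for lv in levels[li + 1:] for t in lv]
        for task in level:
            out_degree = random.randint(1, max(1, int(max_outdegree)))
            children = set(random.sample(below, min(out_degree, len(below))))
            children.add(random.choice(levels[li + 1]))   # at least one child in the next level
            for child in children:
                edges[(task, child)] = random.uniform(0, 2 * ave_msg_size)
    # computation time of every task on every machine (mean ave_comp)
    ect = {(t, m): random.uniform(0, 2 * ave_comp)
           for t in range(no_tasks) for m in range(no_machines)}
    return levels, edges, ect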
5.4.2 Experimental Results
For our experiments, the number of machines (no_machines) was set to 20 and the number
of data repositories (no_repositories) was set to 5. Random DAGs were generated
with $\alpha$ = 1, ave_comp = 50, ave_msg_size = 50 Kbyte, ave_data_set = 500
Kbyte, and max_outdegree = no_tasks/10. We assume that each task needs an
input data set from data repositories. This data set can be retrieved from one or
more data repositories. We compare our algorithms with the separated version
of the baseline algorithm used in the example of Section 5.3.3.2. The comparison
is based on the overall schedule length as the performance metric.
Figure 5.10: Performance of the level-by-level algorithms with varying number of tasks.

Figure 5.11: Performance of the greedy approach algorithms with varying number of tasks.
Figure 5.12: Performance of the level-by-level algorithms with different CCR.
Figure 5.10 compares the level-by-level algorithms and the baseline algorithm
with different numbers of tasks. The number of tasks is varied from 50 tasks
to 250 tasks in increments of 50. Figure 5.11 shows a similar comparison for
the greedy approach algorithms and the baseline algorithm. Each point in the
figures is an average of 50 different runs. The algorithms of the same approach
have almost the same performance. The level-by-level algorithms have from 43%
to 73% improvement in the overall schedule length over the baseline algorithm
while the greedy algorithms have from 34% to 53% improvement over the baseline
algorithm. As the number of tasks increases, the performance improvement of
our algorithms increases.
In Figure 5.12 and Figure 5.13, we compare our algorithms to the baseline
algorithm with different communication to computation ratios (CCR). CCR is defined
as the ratio of ave_data_set (in Kbyte) to ave_comp (in seconds). Low CCR
Figure 5.13: Performance of the greedy approach algorithms with different CCR.
indicates compute-intensive applications, while high CCR indicates data-intensive
applications. Figure 5.12 shows the comparison between the level-by-level
algorithms and the baseline algorithm. The level-by-level algorithms have at least
32% improvement in the schedule length over the baseline algorithm. The greedy
approach algorithms are compared to the baseline algorithm in Figure 5.13. The
greedy algorithms have at least 18% improvement in the schedule length over the
baseline algorithm. The figures show that the improvement of our algorithms over
the baseline algorithm increases as the data requirements from data repositories
increase. Figures 5.10-5.13 clearly show the advantage of our algorithms over
the baseline algorithm that does not consider data repository requirements.
Throughout our experiments, we found that the level-by-level algorithms have
better performance than the greedy algorithms. Figure 5.14 compares all algorithms
with different numbers of tasks. The level-by-level algorithms achieve 7%
Figure 5.14: Comparison of our algorithms with varying number of tasks.

Figure 5.15: Comparison of the two approaches based on the average schedule length.
to 13% improvement in the schedule length over the greedy algorithms. However,
the greedy algorithms are expected to provide better schedule lengths for
individual applications in some cases, as we explained in Section 5.3.2. This is
illustrated in Figure 5.15, where we compare the Max-FINISH algorithm and the
Max-FINISH-ALL algorithm with different numbers of submitted applications. The
comparison is based on the average schedule length of all submitted applications.
In the previous experiments, the comparison was based on the overall schedule
length of submitted applications. For this experiment, applications are randomly
generated such that they have different shapes with different degrees of parallelism.
The size of each application is randomly selected between 50 tasks and
500 tasks. $\alpha$ is randomly selected between 0.1 and 7.0. Figure 5.15 shows that, on
average, the greedy algorithm provides better schedule lengths for individual
applications compared to the level-by-level algorithm.
In general, as shown by our simulation results, it is advantageous to consider
all resource requirements simultaneously when making mapping decisions rather
than mapping each type of resource separately. Our algorithms lead on average
to a 50% improvement in the overall schedule length over a baseline algorithm
that does not consider all resource requirements at the same time.
5.5 Related Work
As shown in Chapter 3, classical DAG mapping algorithms focus on compute
resources only. Also, they assume that a task receives all its input data from its
predecessor tasks only. Therefore, their mapping decisions are based on machine
performance for the tasks and the cost of receiving input data from predecessor
tasks. On the other hand, our algorithms consider compute resources as well as
data repositories. We also assume that input data sets from data repositories
and input data items from predecessor tasks are required for task executions.
The impact of accessing data servers on scheduling decisions has been considered
in the context of developing an AppLeS agent for the Digital Sky Survey
Analysis (DSSA) application [8]. The DSSA AppLeS selects where to run a
statistical analysis according to the amount of required data from data servers.
However, the primary motivation is to optimize the performance of a particular
application. In our algorithms, we optimize the performance of a set of submitted
applications as opposed to that of a single application.
The work described in [17] proposed an adaptive scheduling algorithm for
parameter sweep applications in computational grids. They consider a set of
independent sequential tasks without precedence constraints. They assume that
a task needs a set of files as input which makes that work similar to our mapping
problem. However, a major difference between our work and [17] is that we
consider a more general application model. Also, they assume compute resources
are grouped in clusters and each cluster has a dedicated storage. We assume that
system resources can be located anywhere on the network. Unlike our approach,
they do not consider the impact of accessing data repositories on the mapping
decisions. On the other hand, they studied the impact of the variation of the
estimated computation and communication costs with the actual run-time costs,
which is outside the scope of this paper.
5.6 Summary
In this chapter, we developed several static algorithms for mapping a set of application
DAGs with multiple resource requirements in HC systems and computational
grids. Application tasks need to access different types of resources
(compute resources and data repositories) during their execution. Input data
sets are replicated and can be retrieved from one or more data repositories. The
mapping algorithms we have developed for this problem are of two types: level-by-level
algorithms and greedy algorithms. Our algorithms are designed in such a
way that all resource requirements are considered at the same time when
making mapping decisions (unified mapping). As shown by simulation results,
our algorithms lead on average to a 50% improvement in the overall schedule
length over a baseline algorithm that does not consider all resource requirements
at the same time. In general, our results show that it is advantageous to consider
all resource requirements simultaneously when making mapping decisions rather
than mapping based on each type of resource separately.
Chapter 6
Mapping with Resource Co-Allocation
Requirements
In this chapter, the mapping framework of Chapter 4 is used to develop heuristic
algorithms for mapping a set of application DAGs that have resource co-allocation
requirements. Application tasks have two types of constraints to be
satisfied: precedence constraints and resource sharing constraints. Two different
approaches are used to develop the heuristic algorithms: the independent-set approach
and the critical-resource approach. We also develop a lower bound on the
optimal schedule length of this mapping problem. Simulation results show that
the performance of our algorithms is very close to the lower bound. The results
also show that our algorithms have a performance improvement of up to 30% over
a baseline algorithm of list scheduling which does not consider the co-allocation
requirements.
6.1 Introduction
In HC systems and computational grids, applications often require concurrent
access to multiple resources of different types at the same time. For example, an
interactive data analysis application may require simultaneous access to a storage
system holding a copy of the data, a supercomputer for analysis, network elements
for data transfer, and a display device for interaction [38]. For such applications,
the allocation of all required resources is necessary. In general, this problem is the
resource co-allocation problem. The need for resource co-allocation is a common
characteristic of applications running in HC systems and computational grids.
In this chapter, we develop several heuristic algorithms for mapping a set
of applications with resource co-allocation requirements onto HC systems and
computational grids. Each application consists of several communicating tasks
and is represented by a DAG. Application tasks require multiple (and possibly
different) resources to be allocated simultaneously. In Chapter 5, we considered
the problem of mapping applications with multiple resource requirements, but
application tasks were not required to access different types of resources at the
same time.
In the classical mapping problems, DAGs are used to represent precedence
constraints among tasks. In this chapter, the co-allocation requirements add
another type of constraint: the resource sharing constraint. Tasks that share
one or more resources cannot be executed concurrently even if they have no
precedence constraints among them. Known mapping algorithms for the classical
DAG mapping problem cannot be directly used for our mapping problem since
they only consider precedence constraints.
Two approaches are used to develop our mapping algorithms: the independent-set
approach and the critical-resource approach. In the first approach, DAG and
Resource-Sharing Graph (defined in Section 6.3.1.2) representations are used to
select sets of independent tasks. Independent tasks have no precedence or resource
sharing constraints among them and can be executed concurrently. The resource-sharing
graph is used in this approach to capture resource sharing constraints
among tasks. The second approach, the critical-resource approach, is based on
the idea of identifying critical resources that are in high demand. During
mapping, priority is given to the tasks that require these resources.
In this chapter, we also develop a lower bound on the optimal schedule length
of our mapping problem. The lower bound is developed by considering precedence
and resource sharing constraints at the same time. An undirected graph
called the Dependency Graph (DG) is used to capture both constraints simultaneously.
The weight of the maximum weighted clique of the DG yields a very
good lower bound, where the weight of each node (task) is its best possible
execution time. Simulation results show that the performance of our algorithms is
very close to the lower bound. The results also show that our algorithms have
up to 30% improvement in the overall schedule length over a baseline algorithm
of list scheduling which does not consider the co-allocation requirements. In general,
the experimental results show the importance of considering co-allocation
requirements in mapping decisions.
A version of the resource co-allocation problem has been introduced by the
Globus project [45]. The co-allocation problem is defined as the provision of allocation,
configuration, and monitoring/control functions for the resource ensemble
required by a single application [23]. The Globus toolkit provides a flexible set
of co-allocation mechanisms that can be used to construct application-specific
co-allocation strategies. Two co-allocation strategies have been developed: an
atomic transaction strategy and an interactive transaction strategy [23]. In the
atomic strategy, the co-allocation request succeeds if all the required resources
are allocated. In the interactive strategy, the contents of the co-allocation request
can be modified to enable greater application-level control. The Globus project
constructed two co-allocator implementations: an atomic transaction co-allocator
called the Globus Resource Allocation Broker (GRAB) and an interactive transaction
co-allocator called the Dynamically Updated Resource Online Coallocator
(DUROC).
The notion of co-allocation was also considered in the Legion project [58].
Legion uses an object-oriented approach to metacomputing system design. It
currently provides two types of resources: hosts (computational resources) and
vaults (storage resources). The main components of the Legion resource management
model are the basic resources (hosts and vaults), the information database
(the Collection), the Scheduler, the schedule implementor (the Enactor), and an
execution Monitor. The Collection acts as an information repository describing
the state of the system resources. The Scheduler computes the mapping of objects
to resources using the information provided by the Collection. The Enactor
provides a mechanism to co-allocate compute and storage resources (hosts and
vaults) to a single application. The co-allocation is implemented using a resource
reservation approach. The Globus and Legion projects focus on implementation issues
of the co-allocation process. Algorithms for efficient mapping with co-allocation
requirements are not considered. Also, both projects focus on executing a single
application. The problem becomes challenging when a number of applications
share system resources.
6.2 Problem Definition
In this section we formulate our mapping problem with resource co-allocation
requirements. In the following we define the system model, the application model,
and the objective function.
6.2.1 System Model
We consider a HC system with m compute resources (machines), {m_1, m_2, ..., m_m},
and a set of r non-compute resources, R = {r_1, r_2, ..., r_r}. Compute resources
can be HPC platforms, workstations, personal computers, etc. A non-compute
resource r_k ∈ R can be a data repository, an input/output device, etc. MA(m_j)
gives a list of time slots when machine m_j is available and RA(r_k) gives a list of
time slots when resource r_k is available. As the mapping proceeds, these lists are
updated.

Communication costs are given by two matrices: mm_comm and rm_comm,
where mm_comm(m_i, m_j) gives the communication cost for transferring a byte
of data from machine m_i to machine m_j and rm_comm(r_k, m_j) gives the communication
cost for transferring a byte of data from resource r_k to machine m_j.
The communication cost within the same machine, mm_comm(m_j, m_j), is assumed
to be zero.
6.2.2 Application Model
In this HC system, a set of N applications, A = {A_1, ..., A_N}, compete for the
shared resources. Each submitted application consists of several tasks and is
modeled by a DAG. All submitted application DAGs are combined into a single
DAG, G = (T, E), where T represents the set of tasks to be executed from
all applications, T = {t_1, t_2, ..., t_n}, and E represents the data dependencies and
communication between tasks. Edge e_{ij} indicates that there is communication
from task t_i to task t_j and its weight denotes the amount of communication.

We assume that each task t_i needs concurrent access to a set of resources:
one compute resource and a number of non-compute resources as specified by
its resource requirements set R(t_i), where R(t_i) ⊆ R. The amount of data to
be transferred between t_i and r_k, where r_k ∈ R(t_i), is given by data(t_i, r_k). A
task t_i cannot start execution until all its required resources are available. All
required resources will be allocated to t_i during its execution. These resources
will be available for other tasks after t_i completes its execution. We assume that
all required resources are acquired at the same time (atomic transaction). We
say that task t_i and task t_j are compatible if R(t_i) ∩ R(t_j) = ∅; otherwise t_i and
t_j are incompatible. Incompatible tasks cannot be executed concurrently even if
they have no precedence constraints among them. Therefore, in our framework,
tasks may not be executed concurrently for either of the following reasons: (1)
precedence constraints, or (2) resource sharing constraints.
We assume that ECT(t_i, m_j) gives the estimated computation time for task
t_i on machine m_j. The execution time of task t_i on machine m_j, Exec(t_i, m_j),
depends on the computation time of t_i on m_j and the data transfer times between
m_j and all resources that t_i needs to access during its execution. For example,
for systems that assume computation and communication cannot be overlapped,
Exec(t_i, m_j) can be calculated as

$$Exec(t_i, m_j) = ECT(t_i, m_j) + \sum_{\forall r_k \in R(t_i)} \left( data(t_i, r_k) \times rm\_comm(r_k, m_j) \right)$$

where the last term gives the total time to transfer any required data between
machine m_j and every resource r_k ∈ R(t_i). Exec(t_i, m_j) can also be calculated in
different ways to consider the overlap between computations and communications
as well as other communication models. The average execution time of task t_i,
$\overline{Exec}(t_i)$, is defined as

$$\overline{Exec}(t_i) = \sum_{j=1}^{m} Exec(t_i, m_j) / m$$

ST(t_i, m_j) and FT(t_i, m_j) are the earliest start time and the earliest finish time
of task t_i on machine m_j, respectively, if t_i were to be mapped on m_j. ST(t_i, m_j)
is defined as the earliest time when: (1) machine m_j is available for a duration of
Exec(t_i, m_j) and (2) t_i receives all the needed data from all tasks in its predecessor
set, Pre(t_i). FT(t_i, m_j) is defined as

$$FT(t_i, m_j) = ST(t_i, m_j) + Exec(t_i, m_j)$$
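Under the non-overlapped communication model above, the execution and finish times translate directly into code. In this sketch ect, data and rm_comm are plain dictionaries, and the start time is assumed to be computed elsewhere from machine availability and predecessor data arrivals.

def exec_time(task, machine, ect, data, rm_comm, required):
    # Exec(t_i, m_j) = ECT(t_i, m_j) + sum over r_k in R(t_i) of data(t_i, r_k) * rm_comm(r_k, m_j)
    transfer = sum(data[(task, r)] * rm_comm[(r, machine)] for r in required[task])
    return ect[(task, machine)] + transfer

def finish_time(task, machine, start_time, ect, data, rm_comm, required):
    # FT(t_i, m_j) = ST(t_i, m_j) + Exec(t_i, m_j)
    return start_time + exec_time(task, machine, ect, data, rm_comm, required)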
6.2.3 Objective Function
Our goal is to determine an assignment (matching) of application tasks to compute
resources and schedule their executions on all required resources such that
the overall schedule length (or makespan) of all submitted applications is minimized
while satisfying all application-specified precedence constraints and implied
resource sharing constraints. Thus, our objective function is to minimize
the maximum completion time among all tasks of submitted applications. The
goal is to optimize the performance of the set of submitted applications rather
than optimizing the performance of each application individually. Since multiple
applications share the resources, optimizing the performance of an individual application
may dramatically affect the completion time of other applications. We
can formally define our objective function as
$$\text{Minimize} \left\{ \max_{i=1}^{N} \left[ FinishTime(A_i) \right] \right\}$$

where FinishTime(A_i) is the completion time of application A_i and N is the
total number of submitted applications.
6.3 Mapping Algorithms
Algorithms for the classical DAG mapping problems do not consider resource
co-allocation requirements. In such problems, application DAGs are partitioned
into levels such that tasks in the same level have no data dependencies among
them. Therefore, all tasks in the same level can be executed concurrently. In
our model, incompatible tasks in the same level cannot be executed concurrently
due to resource sharing constraints among them. Therefore, mapping algorithms
for the classical DAG mapping problems (e.g., [2, 33, 49, 62, 77, 79]) cannot be
directly used for our problem.
In this section, we develop several heuristic algorithms for our mapping problem
with resource co-allocation requirements. Two different approaches are used
to develop the heuristic algorithms: independent-set approach and critical-resource
approach. In the following we describe both approaches.
6.3.1 Independent-Set Approach
The first approach for developing our heuristic algorithms is the independent-set
approach. In the following, we first explain the general mapping steps of the
independent-set approach. Then, we define the Resource-Sharing Graph (RSG)
which we use to capture resource sharing constraints among tasks. Next, we
develop a static mapping algorithm (called the independent-set mapping algorithm)
based on the independent-set approach. The independent-set mapping
algorithm can be used with different strategies and heuristics, which are given in
Sections 6.3.1.4 and 6.3.1.5.
6.3.1.1 General Mapping Steps
The general mapping steps of the independent-set approach are shown in Figure
6.1. This approach is based on selecting sets of independent tasks. Independent
tasks have no precedence or resource sharing constraints among them
and can be executed concurrently. In step 1, all submitted application DAGs are
combined into one DAG. One simple way to do this is to use zero-weight edges
to connect root nodes of all application DAGs to a hypothetical zero-cost entry
node, as discussed in Section 5.3.1. A set of tasks that have no precedence constraints
among them is selected (step 3). Top-down or bottom-up level-by-level
partitioning are two possible ways for this step. Step 3 ensures that precedence
constraints are satisfied. Resource sharing constraints among the selected tasks
in step 3 are satisfied in step 4 by selecting sets of compatible tasks among them.
Steps 3 and 4 ensure that selected sets have independent tasks only. In step 5,
a scheduling order of tasks within the selected sets is determined. In the last step,
step 6, selected tasks are mapped to required resources, in their scheduling order,
such that the finish time of each task is minimized.
6.3.1.2 Resource-Sharing Graph
To capture the implied resource sharing constraints among tasks that may belong
to the same or different applications, we use the resource-sharing graph (RSG),
G_r = (V, E), where vertex v_i denotes task t_i and edge e_{ij} exists if and only if task
Independent-Set Approach
1. Combine all submitted application DAGs into a single DAG G.
2. While not all tasks are mapped do:
3. Let RDY be a set of tasks from G that have no precedence constraints
among them.
4. Select sets of compatible tasks from RDY.
5. Find a scheduling order of tasks within the selected sets.
6. Map tasks, in their scheduling order, to required resources such that the
finish time of each task is minimized.
Figure 6.1: General mapping steps of the independent-set approach.
t_i and task t_j are incompatible. Recall that t_i and t_j are incompatible if and only
if R(t_i) ∩ R(t_j) ≠ ∅.
An independent set of an undirected graph g is defined as a set of vertices
of g such that no two vertices of the set are adjacent [22]. An independent set
is called a maximal independent set if there is no other independent set of g
that contains it. A maximal independent set with the largest number of vertices
among all maximal independent sets is called a maximum independent set [22].
The maximum independent set problem is NP-complete [44]. In our model, a
maximal independent set of a RSG represents a maximal set of tasks that have
no resource sharing constraints and can be executed concurrently if there are no
precedence constraints among them.
Task    Resource Requirements
V1      r1, r2
V2      r2, r3
V3      r3, r5
V4      r1, r4
V5      r4, r5, r6
V6      r6

Table 6.1: An example showing six tasks and their resource requirements.
Figure 6.2: The resource-sharing graph for the tasks shown in Table 6.1.
As an example, consider a set of six tasks that have no precedence constraints
among them. Each task needs concurrent access to a set of resources as specified
in Table 6.1. The RSG G_r for this example is shown in Figure 6.2. The
maximal independent sets of G_r are {V1, V5}, {V2, V5}, {V1, V3, V6}, {V2, V4, V6},
and {V3, V4, V6}. The last three sets are maximum independent sets.
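The RSG and a maximal independent set around a chosen critical node can be built as in the sketch below. With the requirements of Table 6.1 and tasks visited in name order, grow_maximal(reqs, "V1") returns {"V1", "V3", "V6"}, one of the maximum independent sets listed above. The function names are illustrative only.

def build_rsg(requirements):
    # requirements: {task: set of resources}; an edge joins every incompatible pair
    tasks = sorted(requirements)
    return {(a, b) for i, a in enumerate(tasks) for b in tasks[i + 1:]
            if requirements[a] & requirements[b]}

def grow_maximal(requirements, seed):
    # greedily enlarge an independent set of the RSG that contains the critical node 'seed'
    chosen = {seed}
    for task in sorted(requirements):
        if task not in chosen and all(not (requirements[task] & requirements[c])
                                      for c in chosen):
            chosen.add(task)
    return chosen

# the resource requirements of Table 6.1
reqs = {"V1": {"r1", "r2"}, "V2": {"r2", "r3"}, "V3": {"r3", "r5"},
        "V4": {"r1", "r4"}, "V5": {"r4", "r5", "r6"}, "V6": {"r6"}}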
6.3.1.3 The Independent-Set Mapping Algorithm
Pseudo code for the independent-set mapping algorithm that we developed based
on the independent-set approach is shown in Figure 6.3. Given a set of application
tasks and their resource requirements, we first find a set of tasks that have no
precedence constraints and then we select maximal independent sets among them
for allocation. RSG is used to find maximal independent sets. Our approach for
selecting maximal independent sets is given in Section 6.3.1.4.
Let RDY be the set of tasks that have no precedence constraints among them
and that are ready for allocation. A task is ready for allocation if for each predecessor
all required resources have been allocated. Let W be the set of tasks that
are waiting for allocation, and ALLOCATED be the set of allocated tasks. After
executing the algorithm, the list SCHEDULE will give the resulting scheduling
order and resource assignments of tasks and length will give the resulting
schedule length.
In the first step of the algorithm, we combine all submitted application DAGs
into a single DAG, G, by using zero-weight edges to connect the root nodes (tasks)
of all DAGs to a hypothetical zero-cost entry node. In step 2, we partition G into
l levels such that level 0 contains the entry node and level 1 contains all tasks
that do not have any predecessors in the submitted DAGs. All tasks in level l
have no successors. For each task t_i in level k, all of its predecessors are in levels
0 to k-1, and at least one of them is in level k-1. All tasks in the same level have
no precedence constraints among them.
The algorithm proceeds level-by-level as follows. For each level l of G, we construct
the RSG G_r for all tasks in that level (step 6). G_r is used to find maximal
independent sets of tasks that can be executed concurrently. Our approach for
selecting a maximal independent set is based on first choosing a critical node v_c,
and then finding a maximal independent set that contains v_c. The first maximal
independent set of tasks to be allocated is selected in steps 7 and 8. A critical
node v_c is chosen in step 7 and a maximal independent set that contains v_c is
selected in step 8.

In steps 10-16, all tasks in the selected maximal independent set are allocated
to their required resources. For the allocation, we first find a scheduling order of
tasks. Several heuristics for this step are given in Section 6.3.1.5. The scheduling
order is used to assign a compute resource m_j to each task t_i such that its finish
time, FT(t_i, m_j), is minimized. Availability times (MA(m_j) and RA(r_k)) of all
resources required by t_i are updated based on ST(t_i, m_j) and FT(t_i, m_j).

In steps 17-24, a new maximal independent set of tasks among all waiting
tasks is selected to be allocated at the next allocation event. The next allocation
event is calculated as the earliest finish time, FT(t_i, m_j), among all allocated
tasks. An allocated task v_x with the earliest finish time is identified in step
17 and removed from the ALLOCATED set in step 18. Initially, the set of
candidate tasks that can be allocated next, C, contains all waiting tasks (step
19). C is updated in step 20 by removing all tasks that are incompatible with
any allocated task. Then G_r is used to find a maximal independent set of tasks
from C. The algorithm repeats steps 10-24 until all tasks in this level have been
allocated.
6.3.1.4 Maximal Independent Sets Selection
The schedule length of any mapping is influenced by the selection of maximal
independent sets and by the order in which these sets are considered for mapping.
This is shown in the following example using the RSG in Figure 6.2. For this
example, we assume that each task needs a compute resource (m1, m2, or m3)
and other resources as specified in Table 6.1. The execution times of tasks on the
different compute resources are shown in Table 6.2. Example schedules are given
in Figure 6.4. These schedules have different schedule lengths. The optimal
schedule length for this example is 11 time units. This is achieved by schedules
2 and 3. In schedules 1 and 2, two different maximum independent sets were
selected to be mapped first. Schedule 1 has a length of 13 time units, while
schedule 2 has the optimal length. This clearly shows the importance of the
order in which the selected sets are considered for mapping. From schedule 3, we
can also see that it is not always efficient to select a maximum independent set
to be mapped first. Schedule 3, which starts with the maximal independent set
The Independent-Set Mapping Algorithm
Begin
1. Combine all submitted DAGs into a single DAG G.
2. Do level partitioning of G.
3. Let SCHEDULE = ∅ and length = 0.
4. For level 1 to l do
5. W = {all tasks in the current level} and ALLOCATED = ∅.
6. Construct the resource-sharing graph G_r for all tasks in W.
7. Pick a critical node v_c from W.
8. Find a maximal independent set of tasks, S, from W such that v_c ∈ S.
9. While W is not empty do
10. Find a scheduling order of tasks in S, then add them to SCHEDULE.
11. For each task t_i in S (in their scheduling order) do
12. Assign a machine m_j to t_i in order to minimize its FT(t_i, m_j).
13. Update MA(m_j) and RA(r_k), ∀ r_k ∈ R(t_i).
14. If FT(t_i, m_j) > length then length = FT(t_i, m_j).
15. endfor
16. Add all tasks in S to ALLOCATED and remove them from W.
17. Let v_x be the allocated task with the lowest finish time.
18. Remove v_x from ALLOCATED.
19. Let C = W. C is the set of candidate tasks that can be allocated next.
20. Remove all tasks from C that are incompatible with any allocated task.
21. If C ≠ ∅ then
22. Pick a critical node v_c from C.
23. Find a maximal independent set of tasks S from C such that v_c ∈ S.
24. endif
25. endwhile
26. endfor
End

Figure 6.3: Pseudo code of our mapping algorithm based on the independent-set
approach.
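The control flow of Figure 6.3 can be condensed into the Python skeleton below. The resource bookkeeping and finish-time model are hidden behind the allocate() callback, and incompatible() stands for the R(t_i) ∩ R(t_j) ≠ ∅ test; both are placeholders rather than the implementation used in the experiments.

def independent_set_mapping(levels, pick_critical, grow_set, order_tasks,
                            allocate, incompatible):
    schedule = []
    for level in levels:
        waiting = set(level)
        allocated = {}                                   # task -> finish time
        selected = grow_set(waiting, pick_critical(waiting))
        while waiting:
            for task in order_tasks(selected):           # scheduling order (Section 6.3.1.5)
                allocated[task] = allocate(task)         # assign a machine, update MA and RA
                schedule.append(task)
            waiting -= selected
            if not waiting:
                break
            # next allocation event: the allocated task with the earliest finish time
            earliest = min(allocated, key=allocated.get)
            del allocated[earliest]
            candidates = {t for t in waiting
                          if not any(incompatible(t, a) for a in allocated)}
            selected = grow_set(candidates, pick_critical(candidates)) if candidates else set()
    return schedule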
        m1    m2    m3
V1       9     6     5
V2       8    10     6
V3       5     3     2
V4       4     5     7
V5       6     1     4
V6      10     3     8

Table 6.2: Execution times for the tasks in Figure 6.2.
{V2, V5} (not a maximum independent set), has the optimal length, while schedule
1, which starts with the maximum set {V3, V4, V6}, has a non-optimal length of
13 time units.
Since the maximum independent set problem is NP-complete [44], we use
a heuristic approach for finding maximal independent sets. Our approach for
finding a maximal independent set S is to select a critical node v_c and add it
to S, which is initially empty. Then we attempt to enlarge S by traversing G_r.
Critical nodes need to be selected carefully. Different strategies can be used for
selecting critical nodes. In the following we describe some of these strategies.
S1 Highest average execution time. In this strategy, we give priority to
tasks that need more time for execution since they can be critical tasks.
The node which represents the task with the highest average execution
time is selected as the critical node. In HC systems and computational grids,
tasks have different execution times on different machines. Therefore, we
use the average execution time $\overline{Exec}(t_i)$ as the selection criterion.
Figure 6.4: Example schedules (Schedule 1, Schedule 2, and Schedule 3) for the example in Section 6.3.1.4.
S2 Highest degree. The node out-degree in a DAG has been used in many list
scheduling heuristics as a priority function. The out-degree of a node t_i gives
the number of tasks that have immediate precedence constraints with t_i.
The idea is to advance the execution of tasks with high out-degree. Thus,
many tasks can be ready for mapping once high out-degree tasks finish
execution. In our framework, the out-degree of node t_i, which represents
task t_i, in the combined DAG G does not reflect all dependencies between t_i
and other tasks since G only captures the precedence constraints. Resource
sharing constraints should also be considered. Therefore, we define the
degree of task t_i as the sum of its out-degree in G and its degree in the RSG
G_r. This number gives a better indication of the number of tasks that
can be ready for mapping once t_i completes its execution, either because
those tasks have precedence or resource sharing dependencies with t_i.
S3 Critical path nodes. In some situations, the average execution time or
the degree of a task t_i cannot reflect how important it is for other tasks that
t_i finishes execution as soon as possible. The successors of t_i may not be
critical tasks and advancing their execution may not improve the schedule
length. For these reasons, selecting critical path nodes as critical tasks can
be a good strategy. A Critical Path (CP) in a DAG is a set of nodes and
edges that form a path from an entry node to an exit node with the largest
sum of execution and communication costs. We use average execution times
and average communication costs to find the critical path. In this paper,
we implement two variations of this strategy:
S3.1 In this version, the task that is on the critical path is selected as a
critical task. If there is no such task among the current set of candidate
tasks, the task with the highest average execution time is selected as
a critical task.
S3.2 This version is similar to S3.1 except for the case when there is no
critical path node in the current set of candidate tasks. In this case,
the task with the highest degree is selected as a critical task.
6.3.1.5 Allocation Heuristics
After selecting a maximal independent set of tasks, careful allocation of these
tasks to required resources is necessary to achieve our objective. Different heuristics
can be used for allocating tasks of the selected maximal independent set S
to their required resources. In the following we describe some of our allocation
heuristics. The idea behind our heuristics is to advance the execution of tasks
that may be critical in order to minimize the overall schedule length.
1. Highest Average-Execution-Time First (HAETF)
In this heuristic, the average execution time is used as a priority function
to place tasks in a list. All tasks are placed in a list in the order of non-
increasing average execution times. Using this order, each task is allocated
to its required resources such that its finish time is minimized.
2. Maximum Finish-Time First (MAX)
For each task, we calculate the best finish time that can be achieved. Then
we select the task with the maximum best finish time among all tasks. The
selected task is allocated its required resources such that its finish time is
minimized. We repeat until all tasks are allocated (a small sketch of this
ordering is given after this list).
3. Minimum Finish-Time First (MIN)
This heuristic is similar to the Maximum Finish-Time First (MAX) heuristic
except that we select the task with the minimum best finish time instead
of selecting the task with the maximum best finish time.
4. Highest Degree First (HDF)
In this heuristic, all tasks are placed in a list in a descending order according
to their degrees (ties are broken arbitrarily). Then, tasks are allocated
one-by-one to required resources such that the finish time for each task is
minimized.
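The MAX and MIN orderings (and, with a fixed key, HAETF and HDF) can be expressed as one small routine. In the sketch below, best_finish() is a stand-in for the combined machine/co-allocated-resource finish-time evaluation, and allocate() commits the choice and updates availabilities; both are assumptions, not the dissertation's code.

def order_and_allocate(selected_set, machines, best_finish, allocate, maximize=True):
    # MAX: repeatedly take the task whose best achievable finish time is largest;
    # MIN: take the smallest instead.
    remaining = set(selected_set)
    order = []
    while remaining:
        best = {t: min(best_finish(t, m) for m in machines) for t in remaining}
        pick = max if maximize else min
        task = pick(remaining, key=best.get)
        allocate(task)                      # map to the machine and co-allocated resources
        order.append(task)
        remaining.remove(task)
    return order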
6.3.2 Critical-Resource Approach
Our second approach for solving the mapping problem defined in Section 6.2 is
the critical-resource approach. In this approach, we try to identify resources in
high demand and give high mapping priorities to tasks that need these bottleneck
(or critical) resources. In the following, we first define the critical resource. Then,
we develop a mapping algorithm based on this approach.
6.3.2.1 Critical Resource
Let the Resource-Tasks-Set of a resource r_k, RTS(r_k), be the set of all tasks that
need to access r_k during their executions. Formally,

$$RTS(r_k) = \{ t_i \text{ such that } r_k \in R(t_i) \}$$

where R(t_i) is the resource requirements set of task t_i.

Let w(r_k) be the weight of a resource r_k. w(r_k) is calculated as the sum of
execution times of all tasks in RTS(r_k) on the machines that give the best finish times
for the tasks. w(r_k) is defined as

$$w(r_k) = \sum_{\forall t_i \in RTS(r_k)} Exec^{*}(t_i)$$

where Exec*(t_i) is the execution time of task t_i on the machine that gives the best
finish time for t_i among all machines based on resource availabilities. We define
the Critical Resource (cr) as the resource with the maximum weight among all
resources, i.e., w(cr) ≥ w(r_k), ∀ r_k ∈ R.
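These definitions translate into a few lines of code. In the sketch below, best_exec(task) is a hypothetical helper returning Exec*(t_i), the execution time on the machine that currently gives the task its best finish time.

def resource_task_sets(level_tasks, requirements):
    # RTS(r_k) = { t_i : r_k in R(t_i) }, built from the tasks of the current level only
    rts = {}
    for task in level_tasks:
        for resource in requirements[task]:
            rts.setdefault(resource, set()).add(task)
    return rts

def critical_resource(rts, best_exec):
    # w(r_k) = sum of Exec*(t_i) over t_i in RTS(r_k); the critical resource maximizes w
    weights = {r: sum(best_exec(t) for t in tasks) for r, tasks in rts.items()}
    return max(weights, key=weights.get)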
6.3.2.2 Dynamic-Critical-Resource Algorithm
In this section we develop a mapping algorithm based on finding the critical
resource as defined above. The idea is to give high priority in mapping to tasks
that need the critical resource. Our algorithm is called the Dynamic-Critical-Resource
Algorithm (DCRA).

Pseudo code for the DCRA algorithm is shown in Figure 6.5. In steps 1 and 2 of
the algorithm, we combine all submitted applications into a single DAG G and
partition G into l levels using the same procedure described in Section 6.3.1.3.
Then the algorithm proceeds level-by-level as follows. For each level l of G, we
construct RTS(r_k) for every resource r_k (step 4). RTS(r_k) is constructed by
considering the resource requirements set R(t_i) of every task t_i in the current level
only. Then the mapping proceeds by calculating the weights of all resources
(step 6), finding the critical resource cr (step 7), and mapping all tasks in
RTS(cr) to their required resources (step 8). The allocation heuristics developed in
Section 6.3.1.5 can be used for step 8. In step 9, all mapped tasks are removed
from RTS(r_k) of every resource r_k.
The DCRA algorithm is dynamic in the sense that the weights of resources
are calculated after each mapping event. In a static approach, the weights of
Dynamic-Critical-Resource Algorithm
Begin
1.  Combine all submitted application DAGs into a single DAG G.
2.  Do level partitioning of G.
3.  For level 1 to l do
4.      Construct RTS(r_k) for every resource r_k.
5.      While not all tasks in the current level are mapped do
6.          Calculate the weights of all resources.
7.          Find the critical resource cr.
8.          Map all tasks in RTS(cr) to their required resources.
9.          Delete mapped tasks from all RTSs.
10.     end while
11. end for
End
Figure 6.5: Pseudo code of the dynamic-critical-resource algorithm.
resources would be calculated only once, and all resources would be sorted in
a list based on their weights in non-increasing order. Then task mapping would
proceed in this order. In our dynamic approach, only the critical resource is
identified at each mapping event. Resource weights may change during the mapping
process because Exec*(t_i) may not be the same at each mapping event, since
resource availabilities change as the mapping proceeds.
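The control flow of Figure 6.5 can be rendered roughly as the following sketch; combine_dags(), level_partition(), and allocate() are hypothetical helpers, and critical_resource() is the sketch given earlier in this section.

# Sketch of the DCRA control flow of Figure 6.5. combine_dags(),
# level_partition(), allocate(), and critical_resource() are assumed helpers,
# not part of any real library.

def dcra(applications, resources, machines):
    g = combine_dags(applications)            # step 1
    levels = level_partition(g)               # step 2
    for level in levels:                      # step 3
        unmapped = set(level)
        while unmapped:                       # step 5
            # steps 6-7: weights are recomputed at every mapping event, so
            # the critical resource is identified dynamically
            cr, cr_tasks = critical_resource(unmapped, resources)
            for task in cr_tasks:             # step 8
                allocate(task, machines)      # minimize the task's finish time
            unmapped -= set(cr_tasks)         # step 9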
6.4 Performance Evaluation
6.4.1 A Lower Bound
To evaluate the quality of our algorithms, in this section we develop a lower
bound, lb, on the optimal schedule length of our mapping problem with
resource co-allocation requirements. Recall that a path p in a DAG G is defined
as a chain of nodes and edges that starts with an entry node and ends with an
exit node. The length of p is the sum of its execution and communication costs
(the total weight of nodes and edges) for a given mapping. The critical path
(cp) of G is the path with the maximum length among all paths in G. Tasks on
the critical path, as well as tasks belonging to any path in G, cannot be executed
concurrently due to precedence constraints among them and can only be executed
sequentially. Therefore, the schedule length of a given mapping A, SL_A, cannot
be less than the length of its critical path. Similarly, the schedule length of any
optimal mapping, SL_opt, cannot be less than the length of the critical path based
on this optimal mapping.

Let us define Γ_p(G) as the best-length of a path p in a DAG G. Γ_p(G) is
calculated as

Γ_p(G) = Σ_{t_i ∈ p} best_Exec(t_i)

where best_Exec(t_i) is the best execution time of task t_i and is equal to the execution
time of t_i on machine m_b, Exec(t_i, m_b), where Exec(t_i, m_b) ≤ Exec(t_i, m_j)
for every machine m_j. The best-length of the critical path cp, Γ_cp(G), is the
maximum among all paths in G. Obviously, Γ_cp(G) is a lower bound on the
optimal schedule length, SL_opt, since it uses the best possible execution costs without
considering any communication costs. Thus, it is clear that SL_opt ≥ Γ_cp(G).
The above lower bound is based on precedence constraints. In our mapping
problem, precedence constraints are not the only reason that tasks cannot be executed
concurrently. Resource sharing constraints are another reason. Incompatible tasks
cannot be executed concurrently even if they have no precedence constraints
among them. Therefore, another lower bound based on resource sharing
constraints can be developed using the RSG. Let W_q^A(G_r) be the weight of a clique q in
an RSG G_r for a given mapping A. The weight of each vertex v_i in G_r is equal
to the value of Exec(t_i, m_j), where m_j is the machine assigned to task t_i by A.
The weight of a clique is the sum of the weights of all tasks belonging to this clique.
The clique with the maximum weight among all cliques is called the maximum
weighted clique.

Each clique in G_r represents a set of tasks that have resource sharing constraints
among them and cannot be executed concurrently. Tasks belonging to the
same clique must be executed sequentially. Therefore, the schedule length of
mapping A, SL_A, cannot be less than W_Q^A(G_r), the weight of the maximum
weighted clique Q of G_r based on the resource assignments of A. Similarly, the
optimal schedule length, SL_opt, cannot be less than W_Q^opt(G_r) for any optimal
mapping opt. We call the weight of the maximum weighted clique Q of an RSG
G_r the best-weight, BW_Q(G_r), if the weight of each vertex v_i in G_r is assigned
the value of best_Exec(t_i). Hence, it is clear that BW_Q(G_r) is a lower bound on
the optimal schedule length.

In our model, tasks cannot run concurrently for either of the following reasons:
(1) precedence constraints, or (2) resource sharing constraints. A better lower
bound can be achieved if we consider both types of constraints at the same time.
In the following, we develop another lower bound by considering both precedence
and resource sharing constraints simultaneously.
Let the Dependency Graph (DG) be an undirected graph where task t_i is
represented by vertex v_i. An edge e_ij exists between v_i and v_j if t_i and t_j are
incompatible or if there is a path between t_i and t_j in the application DAG. The
dependency graph captures both precedence and resource sharing constraints. A
clique in a dependency graph represents a set of tasks that cannot be executed
concurrently. Tasks belonging to the same clique need to be executed sequentially to
ensure that both precedence and resource sharing constraints are satisfied. This
representation may not follow the precedence order of tasks, but this does not affect
the value of the lower bound. Let Q be the maximum weighted clique in a DG G_d
if the weight of each vertex v_i in G_d is assigned the value of best_Exec(t_i). We call
the weight of Q the best-weight of the maximum weighted clique Q of a DG
G_d, BW_Q(G_d). Based on the definition of Q, the optimal schedule length, SL_opt,
cannot be less than BW_Q(G_d) for any optimal mapping opt. Hence, BW_Q(G_d) is
a lower bound on the optimal schedule length SL_opt, i.e., SL_opt ≥ BW_Q(G_d).
From the definitions of the dependency graph (DG) and the resource-sharing graph
(RSG), the DG is a generalization of the RSG. For any set of tasks S with an RSG G_r
and a DG G_d, G_r is a subgraph of G_d. Both graphs have the same number of
vertices but different numbers of edges. G_d has all the edges in G_r plus additional
edges that represent precedence constraints. Any clique q in G_r is also a clique
in G_d. Therefore, the maximum weighted clique of G_r is also a clique in G_d but
not necessarily the maximum weighted clique of G_d. This is due to the fact that
the additional edges of G_d may form a clique with a greater weight than the
maximum weighted clique of G_r. Also, from the definition of the DG, any path p in
G is a clique in G_d. If q_p is the clique in G_d that represents the path p in G, then
the best-weight of q_p, BW_qp(G_d), is equal to Γ_p(G). Similarly, the best-weight
of the clique q_cp, which represents the critical path cp of G, is equal
to the best-length of cp, Γ_cp(G). The clique q_cp is not necessarily the maximum
weighted clique of G_d. G_d has edges that represent resource sharing constraints
and that may form cliques with best-weights greater than BW_qcp(G_d).

Based on the previous discussion, we can define a lower bound, lb, for our
mapping problem based on all constraints among tasks as follows:

lb = BW_Q(G_d)
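As noted later in Section 6.4.5, the maximum weighted clique is found in our experiments by an exhaustive search with backtracking. A minimal sketch of that idea, assuming the dependency graph is supplied as an adjacency-set dictionary adj with vertex weights equal to best_Exec values, is given below; it is exponential in the worst case and therefore practical only for small task sets.

# Sketch: lb = BW_Q(G_d) via exhaustive backtracking search for the maximum
# weighted clique of the dependency graph. adj[v] is the set of vertices
# adjacent to v (incompatibility or a DAG path between the tasks); weight[v]
# is best_Exec of the corresponding task.

def max_weighted_clique_weight(adj, weight):
    best = [0.0]

    def extend(clique_weight, candidates):
        if not candidates:
            best[0] = max(best[0], clique_weight)   # reached a maximal clique
            return
        # bound: even adding every remaining candidate cannot beat the best
        if clique_weight + sum(weight[v] for v in candidates) <= best[0]:
            return
        for v in list(candidates):
            candidates.remove(v)                    # branch "without v" later
            extend(clique_weight + weight[v], candidates & adj[v])

    extend(0.0, set(adj))
    return best[0]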
6.4.2 Baseline Algorithm
As shown in Chapter 3, many mapping algorithms exist in the literature for mapping
application DAGs in HC systems. None of these algorithms considers the
resource co-allocation problem we define in this chapter. Therefore, we will use
a simple list scheduling algorithm as a baseline algorithm to evaluate our heuristic
algorithms and to show the importance of considering resource co-allocation
requirements at mapping decisions.

The baseline algorithm is a fast static algorithm for mapping DAGs in HC
environments. The algorithm does not consider co-allocation requirements at
mapping decisions. It partitions the tasks in the DAG into levels using an
algorithm similar to the level partitioning algorithm described in Section 6.3.1.3.
Then all the tasks are ordered such that the tasks in level k come before the
tasks in level k + 1. The tasks in the same level are sorted in descending order
based on the number of children (out-degree) of each task (ties are broken
arbitrarily). The tasks are considered for mapping in this order. A task is mapped
to the required resources such that its finish time is minimized, based on compute
resource requirements only. The baseline algorithm is similar to our algorithms
in the sense that all algorithms proceed level-by-level.
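For comparison, the ordering used by the baseline can be sketched as follows; level_partition() and map_to_best_machine() are hypothetical helpers, and non-compute resource requirements are deliberately ignored, as in the description above.

# Sketch of the baseline list-scheduling order: level by level, and within a
# level by descending out-degree. level_partition() and map_to_best_machine()
# are assumed helpers; co-allocation requirements are intentionally ignored.

def baseline(dag, machines):
    for level in level_partition(dag):
        ordered = sorted(level, key=lambda t: -t.out_degree)  # ties arbitrary
        for task in ordered:
            # choose the machine minimizing the task's finish time, looking
            # only at compute resource requirements
            map_to_best_machine(task, machines)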
6.4.3 Implementation Issues
The focus of this chapter is the mapping problem with resource co-allocation
requirements in HC systems. The implementation details for the co-allocation
process are outside the scope of this chapter. A good discussion of implementation
issues can be found in [23]. In the following, for the sake of completeness, we briefly
state our assumptions regarding the co-allocation implementation.

We assume that a task t_i cannot start execution until all its required resources
are available. These resources are acquired at the same time. Once a task
t_i completes its execution, all its allocated resources are released and become
available for other tasks. We assume that any allocation request for any resource
will be granted as long as the resource is available. In this chapter, we do not
consider resource failures, which can occur in HC systems and
computational grids.
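For concreteness, these assumptions can be expressed as a small sketch; the resource-state dictionary and task fields below are illustrative assumptions, not an implementation described in this dissertation.

# Sketch of the co-allocation assumption used in this chapter: a task starts
# only when every required resource is free, acquires them all at once, and
# releases them all when it completes. Resource failures are not modeled.

def try_start(task, resources_state, now):
    required = task.required_resources
    if all(resources_state[r] is None for r in required):   # all free?
        for r in required:
            resources_state[r] = task                        # acquire together
        task.start_time = now
        return True
    return False                                             # wait and retry

def finish(task, resources_state):
    for r in task.required_resources:                        # release together
        resources_state[r] = None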
6.4.4 Simulation Procedure
A software simulator was implemented to evaluate the performance of our
algorithms. To define the HC system, the number of machines (no.machines) and the
number of resources (no.resources) are given to the simulator as inputs. Communication
costs among all resources are selected randomly from a uniform distribution with
a mean equal to ave.comm. The communication costs are source and destination
dependent.

The workload consists of randomly generated DAGs. A random DAG is generated
as follows. The number of tasks in the graph, no.tasks, the maximum out-degree
of a task, max.outdegree, the average computation cost of a task, ave.comp, and the
average message size to be transferred among tasks, ave.msg.size, are given as inputs.
First, the number of tasks at the first level of the DAG is randomly selected. Then,
the simulator randomly selects the number of tasks for subsequent levels until the total
number of generated tasks is equal to no.tasks. This determines the number of
levels in the generated DAG. The number of tasks at any level is randomly selected
from a uniform distribution with a mean equal to α·√no.tasks, where α is the
shape parameter of the DAG. A dense DAG (a graph with high parallelism) can
be generated by selecting α >> 1. Starting with the first task, the number of
children (out-degree) for this task is randomly selected between 1 and max.outdegree.
Then, children are randomly selected from the next levels. At least one child should
be selected from the following level. The computation time of each task on every
compute resource is randomly selected from a uniform distribution with a mean
equal to ave.comp. The weight of each edge in the DAG is randomly selected
from a uniform distribution with a mean equal to ave.msg.size.

Resource requirements for each task are randomly selected from the available
resources. The number of required resources is randomly selected from a uniform
distribution with a mean equal to √no.resources. The amount of data to be
transferred to/from each resource in the resource requirements set is randomly
selected from a uniform distribution with a mean equal to ave.datasize.
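A condensed sketch of this DAG generator is given below. Parameter names follow the text, while details such as the exact bounds of the uniform distributions are assumptions made for illustration and may differ from the simulator actually used.

# Sketch of the random DAG generator described above. Level sizes, edge
# weights, and computation costs follow the stated means; the specific
# distribution bounds are assumptions.
import math, random

def random_dag(no_tasks, max_outdegree, ave_comp, ave_msg_size, alpha):
    mean_level = alpha * math.sqrt(no_tasks)
    levels, produced = [], 0
    while produced < no_tasks:                       # pick level sizes
        n = min(no_tasks - produced,
                random.randint(1, max(1, int(2 * mean_level))))
        levels.append(list(range(produced, produced + n)))
        produced += n
    edges = {}
    for li, level in enumerate(levels[:-1]):
        below = [t for lvl in levels[li + 1:] for t in lvl]
        for t in level:
            k = random.randint(1, max_outdegree)
            children = {random.choice(levels[li + 1])}           # one from next level
            children |= set(random.sample(below, min(k, len(below))))
            for c in children:
                edges[(t, c)] = random.uniform(0, 2 * ave_msg_size)  # message size
    comp = {t: random.uniform(0, 2 * ave_comp) for lvl in levels for t in lvl}
    return levels, edges, comp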
6.4.5 Experimental Results
For our experiments, no.machines was set to 10 and no.resources to 20. Random
DAGs were generated with α = 1, ave.comp = 50, ave.msg.size = 50 Kbyte,
ave.datasize = 300 Kbyte, and max.outdegree = no.tasks/10. We use
two performance metrics to evaluate the performance of our algorithms. These
metrics are: (1) the makespan, or schedule length, and (2) the running time
of the algorithm on a Sun Enterprise 250 with 1 Gbyte of memory.

Figure 6.6 shows the performance of our algorithms compared to the lower
bound developed in Section 6.4.1. The algorithms are compared to the lower
bound for application sizes up to 70 tasks only, since the maximum weighted clique
problem is NP-complete [44]. We found that it is infeasible to find the lower
bound for applications with more than 70 tasks with our available resources.
[Figure: bar chart of schedule length versus number of tasks, comparing the proposed algorithms with the lower bound.]
Figure 6.6: Comparison with the lower bound.
We implemented an exhaustive-search algorithm with backtracking to find maximum
weighted cliques. The independent-set approach algorithms shown in the figure
use strategy S1 for selecting critical nodes. Figure 6.6 clearly shows how close
our algorithms are to the lower bound. It should be noted that the lower bound
may not always be achievable, and the optimal schedule length may be
greater than this bound.

The mapping algorithm we have developed in Section 6.3.1.3 based on the
independent-set approach can be used with four different strategies for selecting
critical nodes and with four different allocation heuristics. Therefore, we have
16 different algorithms based on the independent-set approach. Throughout our
experiments, we found that all 16 algorithms have relatively the same
performance. The difference in schedule length between the best and the worst
Number of tasks    20       40       60       80       100
minimum            0.0%     0.1%     1.0%     0.9%     1.3%
maximum            13.9%    13.9%    10.4%    9.6%     7.9%
average            4.9%     4.8%     4.7%     4.3%     3.4%

Table 6.3: Minimum, maximum, and average percentage difference in schedule
length between independent-set approach algorithms with different numbers of
tasks.
algorithms was less than 5% on average. The explanation is that, after the first
selected maximal independent set, the following maximal independent sets are selected
in an incremental way based on the next task to finish execution. This causes
the algorithms to have relatively similar subsequent maximal independent sets.
Therefore, the allocation heuristics have little chance to show their performance
differences. The algorithms differ in the first selected maximal independent
set, but as the number of tasks increases, the effect of this difference becomes minimal.
Table 6.3 shows the minimum, maximum, and average differences in schedule length
between the 16 algorithms, based on 100 different runs with different numbers
of tasks. The table shows that as the number of tasks increases, the performance
difference between the algorithms decreases, for the reasons given above. Therefore,
in the rest of our results, for the sake of simplicity and without loss of generality,
we will use one algorithm out of the 16 to represent the independent-set
approach algorithms. The selected algorithm uses strategy S3.1 for critical
node selection and the Maximum Finish-Time First heuristic for task allocation.
We will call this algorithm the MAX algorithm.
[Figure: bar chart of schedule length versus number of tasks for the compared algorithms.]
Figure 6.7: Comparison of schedule lengths with different numbers of tasks.
Figure 6.7 and Figure 6.8 compare the MAX, DCRA, and baseline algorithms with
different numbers of tasks ranging from 100 to 500 tasks in increments of
100. Each point in the figures is an average of 50 different runs. The figures clearly
show the advantage of our algorithms over the baseline algorithm, which does
not consider co-allocation requirements. The MAX algorithm has a relatively better
schedule length compared to the DCRA algorithm. With 500 tasks, DCRA's schedule
length is 5.6% longer than MAX's schedule length. On the other hand, DCRA is
faster than the MAX algorithm. With 100 tasks, DCRA is 2.7% faster than MAX,
and the difference increases as the number of tasks increases, reaching 35.9% with 500
tasks.

The comparison with respect to different application structures is shown in
Figure 6.9 and Figure 6.10. In both figures, we changed the value of the shape
parameter α from 0.5 to 3.0 in increments of 0.5. For each value we ran 50 different
executions.
[Figure: bar chart of running time versus number of tasks for the compared algorithms.]
Figure 6.8: Comparison of running times with different numbers of tasks.
The figures show that MAX is better than DCRA for wider
DAGs with high parallelism (α > 1.5). For applications with longer DAGs, DCRA
is much faster than MAX with relatively the same schedule length. As the value of
α increases, the running time of DCRA increases and the running time of MAX
decreases. With α = 3, DCRA is 30% slower than MAX and its schedule length
is 7.5% longer. In our simulation study, we found that the number of machines
and the number of resources did not have a significant impact on the performance
of our algorithms.

In general, our experimental results show the importance of considering
co-allocation requirements at mapping decisions. From our results we can conclude
that any mapping algorithm that considers the resource co-allocation requirements
of application tasks will be better than the baseline algorithm.
[Figure: bar chart of schedule length versus the shape parameter α (Alpha) for the MAX, DCRA, and baseline algorithms.]
Figure 6.9: Comparison of schedule lengths with different application structures.
[Figure: plot of running time versus the shape parameter α (Alpha) for the MAX and DCRA algorithms.]
Figure 6.10: Comparison of running times with different application structures.
6.5 Summary
In this chapter we studied the problem of mapping applications with resource
co-allocation requirements onto HC systems and computational grids. We for
mulated the problem and developed several algorithms for solving this problem
using two different approaches: the independent-set approach and the critical-
resource approach. We also developed a lower bound on the optimal schedule
length of this mapping problem. The lower bound is developed by considering
precedence and resource sharing constraints at the same time. Simulation results
showed that the performance of our algorithms is very close to the lower bound.
The results also showed that our algorithms achieve a performance improvement
of up to 30% over a baseline list scheduling algorithm that does not consider
the co-allocation requirements. In general, our simulation results showed the
importance of considering the co-allocation requirements when mapping applications
onto HC systems and computational grids.
Chapter 7
Conclusions and Future Directions
7.1 Conclusions
Heterogeneous computing and grid computing are emerging as paradigms for
high performance computing due to the improvements in communication capa
bility among geographically distributed systems. In general, HC systems and
computational grids are considered to be systems that make use of several
compute resources with different capabilities, I/O devices, data repositories, and
other resources, all interconnected by heterogeneous local and wide area networks
to optimize the performance of the system.
A major challenge in using HC systems and computational grids is to effec
tively use available resources. One way to take advantage of these systems is
to decompose an application into several tasks based on the computational re
quirements. Different tasks may be best suited for different machines. Once the
application is decomposed into tasks, each task needs to be assigned to a suitable
machine (matching problem) and task executions need to be ordered in time
(scheduling problem) to optimize a given objective function. The focus of this
dissertation is the matching and scheduling (defined as mapping) of application
tasks onto HC systems and computational grids.
In the context of HC systems and computational grids, we introduced a unified
framework that can be used to map applications onto system resources. Our
framework consists of four key components: system model, application model,
mapping problem, and mapping algorithms. The framework incorporates the
concept of advance reservation where system resources can be reserved in advance
for specific time intervals. Our mapping algorithms are developed in such a way
that all resource requirements are considered at the same time in a unified manner
to achieve better mapping decisions.
We used our mapping framework to study two novel mapping problems in
HC systems and computational grids. The first problem is mapping a set of
application DAGs with multiple resource requirements and data replication. The
algorithms we developed for this problem are of two types: level-by-level
algorithms and greedy algorithms. As shown by simulation results, our
algorithms lead on average to a 50% improvement in the overall schedule length
over a baseline algorithm that does not consider all resource requirements at the
same time. In general, our results showed that it is advantageous to consider
all resource requirements simultaneously when making mapping decisions rather
than mapping each type of resource separately.
The second mapping problem that we have considered in this dissertation is
the problem of mapping a set of application DAGs with resource co-allocation re
quirements. Application tasks have two types of constraints to be satisfied: prece
dence constraints and resource sharing constraints. Two different approaches
were used to develop the heuristic algorithms: independent-set approach and
critical-resource approach. In the first approach. DAG and Resource-Sharing
Graph representations are used to find sets of independent tasks that can be
executed concurrently. The critical-resource approach is based on identifying re
sources in a high demand. The priority in mapping is given to tasks that require
these resources. We also developed a lower bound on the optimal schedule length
of this mapping problem. The lower bound was developed by considering prece
dence and resource sharing constraints at the same time. The Dependency Graph
is used to capture both constraints simultaneously. The weight of the maximum
weighted clique of the dependency graph yields a very' good lower bound. Sim
ulation results showed that our algorithms are very close to the lower bound.
The results also showed that our algorithms have a performance improvement
up to 309£ over a baseline algorithm of list scheduling which does not consider
the co-allocation requirements. In general, our simulation results showed the
importance of considering the co-allocation requirements at mapping decisions.
In conclusion, we believe that this dissertation has made fundamental research
contributions toward efficient mapping algorithms for important problems in HC
systems and computational grids. A unified mapping framework was introduced,
and its usefulness was demonstrated by applying it to two novel mapping prob
lems. In the following section we briefly describe some future research directions
and identify several challenging mapping problems.
7.2 Future Directions
In the previous chapters, we have discussed our unified mapping framework for
HC systems and computational grids. We used the framework to develop heuris
tic algorithms for two novel mapping problems. To the best of our knowledge,
our work can be considered as one of the early efforts in formalizing the map
ping problem in HC systems and computational grids. Several research issues
remain to be explored. Also, several mapping problems can be studied using our
framework, and these are discussed in this section.
7.2.1 Mapping with Run-Time Adaptation
Most mapping algorithms are static and assume that perfect estimates of computation
and communication costs are available at compile-time (e.g., [2, 15, 50, 68,
71, 77, 79]). However, at run-time, computation and communication costs may
differ from the estimated costs, and this may greatly affect the performance of static
algorithms. Therefore, several dynamic mapping algorithms have been proposed
(e.g., [40, 49, 57, 61, 62]). One problem with most previous static and dynamic
mapping algorithms is that they consider compute resources only. Most
algorithms assume that only compute resources are needed for task execution, and
they do not consider resource co-allocation requirements. In Chapter 6, we
studied the mapping problem with co-allocation requirements, but we did not
consider the variations of computation and communication costs at run-time.
Also, with resource co-allocation, applications may not hold all their allocated
resources for their entire executions at run-time. Some resources may be released
before a task finishes execution, once the task no longer needs them.
For example, a data repository that is needed by a task to get input data may
be released once all input data have been retrieved. Also, scarce and expensive
resources that are used at only some stages of a task's execution, such as a
supercomputer that is needed to process or analyze data at an early stage, can be released
as soon as the task finishes using them. For these cases, run-time adaptation is
needed to account for changes in computation and communication costs and to
take advantage of the early release of resources. Adaptation may involve advancing
the execution of some tasks or changing their resource assignments in order
to minimize the overall execution time for the whole set of tasks.
We are investigating the general mapping problem where a set of indepen
dent tasks compete for the shared resources of an HC system or a computational
grid. Each task requires multiple and different resources to be allocated simulta
neously. At run-time, a task may release some of its allocated resources during
its execution and before its finish time. This early release of resources cannot be
predicted at compile-time.
Our initial approach for solving this mapping problem is a two-phase ap
proach. The first phase of our approach is the off-line planning phase. The goal
of this phase is to generate a valid schedule plan that minimizes the overall sched
ule length of all submitted tasks while satisfying all resource sharing constraints
among them. The schedule plan gives a scheduling order and resource assign
ments of tasks. Also, it specifies estimated start and finish times of each task
on all required resources. Start and finish times, as well as resource assignments,
are selected by the static mapping algorithm used in this phase and are based on
estimated computation and communication costs. In this phase, we assume that
all required resources will be held by a task for its entire execution time since
the early release of resources cannot be predicted at compile-time. The mapping
algorithms of Chapter 6 can be used for this phase.
The second phase of our approach is the run-time adaptation phase. The
goal of the second phase is to improve the performance of the schedule plan
generated at compile-time by adapting to all run-time changes. At run-time,
actual computation and communication costs may differ from estimated costs
used in the off-line planning phase to generate the schedule plan. Also, tasks may
release some of their allocated resources during their executions and before their
finish times (as opposed to the assumptions made in the first phase). Therefore,
we need to adapt to all run-time changes, and we may need to modify
the original schedule plan (i.e., the scheduling order and resource assignments of
tasks) in order to improve the overall schedule length.
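As an illustration of what the off-line plan might contain, the following sketch defines one possible plan record; the field names and the run-time hook are assumptions introduced only for this example, not a fixed interface.

# Sketch of a schedule-plan record for the two-phase approach: the off-line
# planning phase fills in estimated times, and the run-time phase updates or
# reorders entries as actual costs and early resource releases are observed.
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class PlanEntry:
    task_id: str
    resources: List[str]                 # resources co-allocated to the task
    est_start: float                     # estimated at compile-time
    est_finish: float
    actual_finish: Optional[float] = None  # filled in at run-time

@dataclass
class SchedulePlan:
    order: List[str] = field(default_factory=list)       # scheduling order
    entries: Dict[str, PlanEntry] = field(default_factory=dict)

    def on_early_release(self, task_id, released, now):
        """Run-time hook: a task released some resources before finishing;
        later tasks waiting only on `released` may be advanced or remapped."""
        # the adaptation policy itself is the open research question discussed above
        pass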
7.2.2 Mapping with QoS Requirements
It is common in HC systems and computational grids that multiple applica
tions with different QoS requirements compete and share system resources. QoS
requirements include deadline, security level, latency, throughput, etc. As an
increasing number of applications have QoS requirements, mapping algorithms
must account for the QoS desired by application tasks. Mapping algorithms
must provide the required QoS and ensure the logical correctness of those appli
cations. Our framework of Chapter 4 can be used to study this mapping problem
where precedence-constrained applications have multiple QoS requirements such
as deadlines, priorities, and security requirements.
For application tasks that are represented by DAGs and have precedence
constraints, QoS requirements (such as deadline, priority, and throughput) are
usually specified globally for the overall application. Since mapping is done at
the task level, translation mechanisms are needed to derive task-level QoS
requirements from the overall QoS requirements specified at the application level.
[Figure: flow chart of the proposed approach: an application set goes through assignment of virtual QoS requirements, mapping of ready tasks, and re-assignment of virtual QoS requirements, repeated until all tasks are done.]
Figure 7.1: Our initial approach for mapping with QoS requirements.
An approach for developing mapping algorithms for
precedence-constrained applications with multiple QoS requirements is shown in
Figure 7.1. The approach can be summarized as follows.
1. Assignment of virtual QoS requirements
In this step, different mechanisms are needed to translate global QoS
requirements (if necessary) to "virtual" requirements at the task level. Different
translation mechanisms are needed for different QoS attributes. For the priority
attribute, for example, a simple translation strategy can be used to set
tasks' "virtual" priorities equal to the global priority. This simple strategy
may not be efficient for other QoS attributes such as deadline. A problem
of this simple strategy with deadline assignment is that it does not consider
the execution times of successor tasks when assigning a "virtual" deadline to a
task. The deadline attribute needs more elaborate mechanisms.
The problem of assigning "virtual" deadlines at the task level based on the
overall application deadline was studied in [52]. Guidelines for deriving
deadlines for tasks from global end-to-end deadlines were presented. No
mapping algorithms were proposed by the authors. Also, only the serial-
parallel task model was considered in [52] while we are considering a more
general DAG model. The system model assumed in [52] is homogeneous
and communication costs have not been considered in deadline assignments.
In our approach, the following issues must be considered when assigning
"virtual" deadlines to tasks: (1) critical paths. (2) worst-case (or average)
execution times among all machines, and (3) worst-case (or average) com
munication times.
2. Mapping ready tasks
In this step, ready tasks will be matched to suitable resources and scheduled
to run. A task is considered ready for mapping if all its predecessors have
completed, and it has received all the input data needed for its execution.
3. Reassignment of virtual QoS requirements
After each mapping event, the pre-assigned virtual QoS values for the suc
cessors of the mapped task need to be modified in order to improve their
mapping decisions. The initial “virtual" QoS assignment of a task consid
ers the execution and communication times of the successor tasks along the
critical path. Since we are assuming HC systems, these execution times
may be the maximum (or average) of all possible execution times on differ
ent machines. The same applies to communication times. Therefore, after
the mapping of a task, the virtual values of all successor tasks along the
critical path will be recalculated to improve the future mapping decisions.
The reassignment strategies used for this step will largely depend on the
techniques used for the first step.
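The following sketch illustrates one possible deadline translation of the kind discussed in step 1: each task's "virtual" deadline leaves room for the worst-case execution and communication times of its successors along its longest downstream path. The helpers succ, wc_exec, and wc_comm are hypothetical, and the translation itself is an assumption introduced only for illustration, not the mechanism ultimately adopted.

# Illustrative sketch of one possible "virtual" deadline translation. succ[t]
# lists the successors of task t; wc_exec(t) and wc_comm(t, s) are assumed
# helpers giving worst-case (or average) costs over all machines.
from functools import lru_cache

def virtual_deadlines(tasks, succ, wc_exec, wc_comm, global_deadline):
    @lru_cache(maxsize=None)
    def downstream(t):
        # heaviest remaining work strictly after t completes
        return max((wc_comm(t, s) + wc_exec(s) + downstream(s)
                    for s in succ.get(t, [])), default=0.0)
    return {t: global_deadline - downstream(t) for t in tasks}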
7.2.3 Mapping in Time-Sharing Environments
In this dissertation, we assume that at most one task can access any resource
at any given time. In other words, tasks have dedicated access to system
resources. This assumption has been made by several researchers in order to
simplify the mapping problems. However, in real systems, multiple tasks may
share the same resource simultaneously.

In the future, we plan to extend our mapping framework to consider
time-sharing scenarios. Thus, we need to modify our communication and computation
cost models. The ETC matrix will be considered as the base (estimated)
computation cost. The execution time of a task on a given machine will be a
function of its base computation cost on that machine as well as the number (and
types) of other tasks running on the machine.
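Purely as an illustration of this direction, the time-shared execution time could be modeled as the base ETC cost inflated by the competing load on the machine; the linear inflation below is an assumption, not a finalized model.

# Sketch of a time-shared execution-time model: the base ETC cost of a task
# on a machine is inflated by the aggregate weight of the other tasks
# currently sharing that machine. The linear inflation is an assumption made
# for illustration only.

def shared_exec_time(base_etc, task, machine, running):
    """base_etc[task][machine]: estimated dedicated execution time.
    running: list of (other_task, weight) pairs currently on `machine`."""
    load = sum(w for _, w in running)          # aggregate competing load
    return base_etc[task][machine] * (1.0 + load)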
Reference List
[1] T. Adam, K. Chandy, and J. Dickson. A comparison of list schedules for parallel processing systems. Communications of the ACM, 17(12):685-690, December 1974.
[2] I. Ahmad and Y.-K. Kwok. On parallelizing the multiprocessor scheduling problem. IEEE Transactions on Parallel and Distributed Systems, 10(4):414-432, April 1999.
[3] A. Alhusaini, V. K. Prasanna, and C. S. Raghavendra. A unified resource scheduling framework for heterogeneous computing environments. In 8th Heterogeneous Computing Workshop (HCW '99), pages 156-165, April 1999.
[4] A. Alhusaini, V. K. Prasanna, and C. S. Raghavendra. A framework for mapping with resource co-allocation in heterogeneous computing systems. In 9th Heterogeneous Computing Workshop (HCW 2000), pages 273-286, May 2000.
[5] H. Ali and H. El-Rewini. The time complexity of scheduling interval orders with communication is polynomial. Parallel Processing Letters, 3(1):53-58, 1993.
[6] AppLeS Web Page. http://apples.ucsd.edu.
[7] M. Baker, R. Buyya, and D. Laforenza. The grid: A survey on global efforts in grid computing. Technical Report TR-2001/92, Monash University, May 2001.
[8] F. Berman. High-performance schedulers, pages 279-309. In Foster and Kesselman [37], 1999.
[9] F. Berman and R. Wolski. Scheduling from the perspective of the application. In 5th IEEE International Symposium on High Performance Distributed Computing, August 1996.
[10] F. Berman and R. Wolski. The AppLeS project: A status report. In 8th NEC Research Symposium, Berlin, Germany, May 1997.
[11] S. Berson, R. Lindell, and R. Braden. An architecture for advance reservations in the internet. Work in progress.
[12] V. Bharghavan, K. Lee, S. Lu, S. Hu, J.-R. Li, and D. Dwyer. The TIMELY adaptive resource management architecture. IEEE Personal Communications, 5(4):20-31, August 1998.
[13] P. B. Bhat. Communication Scheduling Techniques for Distributed Heterogeneous Systems. PhD thesis, University of Southern California, August 1999.
[14] T. Braun, H. J. Siegel, N. Beck, L. Boloni, M. Maheswaran, A. Reuther, J. Robertson, M. Theys, and B. Yao. A taxonomy for describing matching and scheduling heuristics for mixed-machine heterogeneous computing systems. In Workshop on Advances in Parallel and Distributed Systems (APADS), West Lafayette, IN, October 1998.
[15] T. Braun, H. J. Siegel, N. Beck, L. Boloni, M. Maheswaran, A. Reuther, J. Robertson, M. Theys, B. Yao, D. Hensgen, and R. Freund. A comparison study of static mapping heuristics for a class of meta-tasks on heterogeneous computing systems. In 8th Heterogeneous Computing Workshop (HCW '99), pages 15-29, April 1999.
[16] R. Buyya, S. Chapin, and D. DiNucci. Architectural models for resource management in the Grid. In 1st IEEE/ACM International Workshop on Grid Computing, pages 18-35, Bangalore, India, December 2000.
[17] H. Casanova, A. Legrand, D. Zagorodnov, and F. Berman. Heuristics for scheduling parameter sweep applications in grid environments. In 9th Heterogeneous Computing Workshop (HCW 2000), pages 349-363, May 2000.
[18] H. Casanova and J. Dongarra. NetSolve: A network server for solving computational science problems. Technical Report CS-95-313, University of Tennessee, November 1995.
[19] T. Casavant and J. Kuhl. A taxonomy of scheduling in general-purpose distributed computing systems. IEEE Transactions on Software Engineering, 14(2):141-154, February 1988.
[20] S. J. Chapin, D. Katramatos, J. Karapovich, and A. Grimshaw. Resource management in Legion. Technical Report CS-98-09, University of Virginia, February 1998.
[21] S. Chen and K. Nahrstedt. Hierarchical scheduling for multiple classes of applications in connection-oriented integrated-service networks. In ICMCS '99, June 1999.
[22] N. Christofides. Graph Theory: An Algorithmic Approach. Academic Press, 1975.
[23] K. Czajkowski, I. Foster, and C. Kesselman. Resource co-allocation in computational grids. In 7th IEEE Symposium on High Performance Distributed Computing, pages 219-228, 1999.
[24] K. Czajkowski, I. Foster, C. Kesselman, S. Martin, W. Smith, and S. Tuecke. A resource management architecture for metacomputing systems. In 4th Annual Workshop on Job Scheduling Strategies for Parallel Processing, Orlando, FL, March 1998.
[25] M. Degermark, T. Kohler, S. Pink, and O. Schelen. Advance reservations for predictive service. ACM/Springer Verlag Journal on Multimedia Systems, 5(3), 1997.
[26] T. DeWitt, T. Gross, B. Lowekamp, N. Miller, P. Steenkiste, J. Subhlok, and D. Sutherland. ReMoS: A resource monitoring system for network-aware applications. Technical Report CMU-CS-97-194, School of Computer Science, Carnegie Mellon University, December 1997.
[27] I. Ekemecic, I. Tartalja, and V. Milutinovic. EM3: A taxonomy of heterogeneous computing systems. IEEE Computer, 28(12):68-70, 1995.
[28] H. El-Rewini and H. Ali. On considering communication in scheduling task graphs on parallel processors. Journal of Parallel Algorithms and Applications, 3:177-191, 1994.
[29] H. El-Rewini and T. G. Lewis. Scheduling parallel tasks onto arbitrary target machines. Journal of Parallel and Distributed Computing, 9:138-153, 1990.
[30] H. El-Rewini and T. G. Lewis. Distributed and Parallel Computing. Manning Publications, Greenwich, CT, 1998.
[31] H. El-Rewini, T. G. Lewis, and H. H. Ali. Task Scheduling in Parallel and Distributed Systems. PTR Prentice Hall, Englewood Cliffs, NJ, 1994.
[32] M. M. Eshagian, editor. Heterogeneous Computing. Artech House, Norwood, MA, 1996.
[33] M. M. Eshagian and Y. Wu. Mapping heterogeneous task graphs onto heterogeneous system graphs. In 6th Heterogeneous Computing Workshop (HCW '97), pages 147-160, 1997.
[34] D. Fernandez-Baca. Allocating modules to processors in a distributed system. IEEE Transactions on Software Engineering, SE-15(11):1427-1436, November 1989.
[35] D. Ferrari, A. Gupta, and G. Ventre. Distributed advance reservation of real-time connections. ACM/Springer Verlag Journal on Multimedia Systems, 5(3), 1997.
[36] I. Foster and C. Kesselman. Computational Grids, pages 15-51. In [37], 1999.
[37] I. Foster and C. Kesselman, editors. The Grid: Blueprint for a New Computing Infrastructure. Morgan Kaufmann, San Francisco, CA, 1999.
[38] I. Foster, C. Kesselman, C. Lee, B. Lindell, K. Nahrstedt, and A. Roy. A distributed resource management architecture that supports advance reservations and co-allocation. In Int'l Workshop on Quality of Service, 1999.
[39] I. Foster, C. Kesselman, and S. Tuecke. The anatomy of the grid. International Journal of Supercomputer Applications, 2001. To appear.
[40] R. Freund, B. Carter, D. Watson, E. Keith, and F. Mirabile. Generational scheduling for heterogeneous computing systems. In Int'l Conf. Parallel and Distributed Processing Techniques and Applications (PDPTA '96), pages 769-778, August 1996.
[41] R. Freund, M. Gherrity, S. Ambrosius, M. Campbell, M. Halderman, D. Hensgen, E. Keith, T. Kidd, M. Kussow, J. Lima, F. Mirabile, L. Moore, B. Rust, and H. J. Siegel. Scheduling resources in multi-user, heterogeneous computing environments with SmartNet. In 7th Heterogeneous Computing Workshop (HCW '98), pages 184-199, March 1998.
[42] R. Freund, T. Kidd, D. Hensgen, and L. Moore. SmartNet: A scheduling framework for heterogeneous computing. In The International Symposium on Parallel Architectures, Algorithms, and Networks, Beijing, China, June 1996.
[43] R. Freund and H. J. Siegel. Guest editors' introduction: Heterogeneous processing. IEEE Computer, 26(6):13-17, June 1993.
[44] M. R. Garey and D. S. Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman and Company, San Francisco, CA, 1979.
[45] Globus Web Page. http://www.globus.org.
[46] D. Hensgen, T. Kidd, D. St. John, M. Schnaidt, H. Siegel, T. Braun, M. Maheswaran, S. Ali, J. Kim, C. Irvine, T. Levin, R. Freund, M. Kussow, M. Godfrey, A. Duman, P. Carff, S. Kidd, V. Prasanna, P. Bhat, and A. Alhusaini. An overview of MSHN: The management system for heterogeneous networks. In 8th Heterogeneous Computing Workshop (HCW '99), pages 184-198, April 1999.
[47] C.-C. Hui and S. T. Chanson. Allocating task interaction graphs to processors in heterogeneous networks. IEEE Transactions on Parallel and Distributed Systems, 8(9):908-925, September 1997.
[48] O. Ibarra and C. Kim. Heuristic algorithms for scheduling independent tasks on nonidentical processors. Journal of the ACM, 24(2):280-289, April 1977.
[49] M. Iverson and F. Ozguner. Dynamic, competitive scheduling of multiple DAGs in a distributed heterogeneous environment. In 7th Heterogeneous Computing Workshop (HCW '98), pages 70-78, March 1998.
[50] M. Iverson, F. Ozguner, and G. J. Follen. Parallelizing existing applications in a distributed heterogeneous environment. In 4th Heterogeneous Computing Workshop (HCW '95), pages 93-100, April 1995.
[51] M. Kafil and I. Ahmad. Optimal task assignment in heterogeneous distributed computing systems. IEEE Concurrency, 6(3):42-49, July-September 1998.
[52] B. Kao and H. Garcia-Molina. Deadline assignment in a distributed soft real-time system. IEEE Transactions on Parallel and Distributed Systems, December 1997.
[53] K. Kennedy. Compilers, Languages, and Libraries, pages 181-204. In Foster and Kesselman [37], 1999.
[54] A. Khokhar, V. K. Prasanna, M. Shaaban, and C. L. Wang. Heterogeneous computing: Challenges and opportunities. IEEE Computer, 26(6):18-27, June 1993.
[55] Y. Kopidakis, M. Lamari, and V. Zissimopoulos. On the task assignment problem: Two new efficient heuristic algorithms. Journal of Parallel and Distributed Computing, 42:21-29, 1997.
[56] Y.-K. Kwok and I. Ahmad. Static scheduling algorithms for allocating directed task graphs to multiprocessors. ACM Computing Surveys, 31(4):406-471, 1999.
[57] C. Leangsuksun, J. Potter, and S. Scott. Dynamic task mapping algorithms for a distributed heterogeneous computing environment. In 4th Heterogeneous Computing Workshop (HCW '95), pages 30-34, April 1995.
[58] Legion Web Page. http://legion.virginia.edu.
[59] M. Litzkow, M. Livny, and M. Mutka. Condor: A hunter of idle workstations. In 8th Int'l Conference of Distributed Computing Systems, pages 104-111, 1988.
[60] V. M. Lo. Heuristic algorithms for task assignment in distributed systems. IEEE Transactions on Computers, 37(11):1384-1397, November 1988.
[61] M. Maheswaran, S. Ali, H. J. Siegel, D. Hensgen, and R. Freund. Dynamic matching and scheduling of a class of independent tasks onto heterogeneous computing systems. In 8th Heterogeneous Computing Workshop (HCW '99), pages 30-44, April 1999.
[62] M. Maheswaran and H. J. Siegel. A dynamic matching and scheduling algorithm for heterogeneous computing systems. In 7th Heterogeneous Computing Workshop (HCW '98), pages 57-69, March 1998.
[63] P. M. Melliar-Smith and L. E. Moser. Network Protocols, pages 453-478. In Foster and Kesselman [37], 1999.
[64] I. Milis. Task assignment in distributed systems using network flow methods. In 8th Franco-Japanese and 4th Franco-Chinese Conference, pages 396-405, Brest, France, July 1995.
[65] R. Moore, C. Baru, R. Marciano, A. Rajasekar, and M. Wan. Data-intensive computing, pages 105-129. In Foster and Kesselman [37], 1999.
[66] A. Natrajan, M. Humphry, and A. Grimshaw. Capacity and capability computing using Legion. In International Conference on Computational Science, May 2001.
[67] C. Neuman. Security, Accounting, and Assurance, pages 395-420. In Foster and Kesselman [37], 1999.
[68] P. Shroff, D. W. Watson, N. S. Flann, and R. F. Freund. Genetic simulated annealing for scheduling data-dependent tasks in heterogeneous environments. In 5th Heterogeneous Computing Workshop (HCW '96), pages 98-117, April 1996.
[69] H. J. Siegel, J. K. Antonio, R. Metzger, M. Tan, and Y. A. Li. Heterogeneous computing. In A. Zomaya, editor, Parallel and Distributed Computing Handbook, pages 725-761. McGraw-Hill, New York, 1996.
[70] H. J. Siegel, M. Maheswaran, and T. D. Braun. Heterogeneous distributed computing. In J. Webster, editor, Encyclopedia of Electrical and Electronics Engineering. John Wiley & Sons, New York. To appear.
[71] G. C. Sih and E. A. Lee. A compile-time scheduling heuristic for interconnection-constrained heterogeneous processor architectures. IEEE Transactions on Parallel and Distributed Systems, 4(2):175-187, February 1993.
[72] H. Singh and A. Youssef. Mapping and scheduling heterogeneous task graphs using genetic algorithms. In 5th Heterogeneous Computing Workshop (HCW '96), pages 86-97, April 1996.
[73] L. Smarr and C. E. Catlett. Metacomputing. Communications of the ACM, 35(6):45-52, June 1994.
[74] H. S. Stone. Multiprocessor scheduling with the aid of network flow algorithms. IEEE Transactions on Software Engineering, SE-3(1):85-93, January 1977.
[75] L. Tao, B. Narahari, and Y. Zhao. Heuristics for mapping parallel computations to heterogeneous parallel architectures. In 2nd Workshop on Heterogeneous Processing (WHP '93), pages 36-41, April 1993.
[76] H. Topcuoglu, S. Hariri, W. Furmanski, J. Valente, I. Ra, D. Kim, Y. Kim, X. Bing, and B. Ye. The software architecture of a Virtual Distributed Computing Environment. In 6th IEEE Int'l Symp. on High Performance Distributed Computing, 1997.
[77] H. Topcuoglu, S. Hariri, and M.-Y. Wu. Task scheduling algorithms for heterogeneous processors. In 8th Heterogeneous Computing Workshop (HCW '99), pages 3-14, April 1999.
[78] R. D. Venkataramana and N. Ranganathan. Multiple cost optimization for task assignment in heterogeneous computing systems using learning automata. In 8th Heterogeneous Computing Workshop (HCW '99), pages 137-145, April 1999.
[79] L. Wang, H. J. Siegel, V. P. Roychowdhury, and A. A. Maciejewski. Task matching and scheduling in heterogeneous computing environments using a genetic-algorithm-based approach. Journal of Parallel and Distributed Computing, 47(1):8-22, November 1997.
[80] D. W. Watson, H. J. Siegel, J. K. Antonio, M. A. Nicholes, and M. J. Atallah. A framework for compile-time selection of parallel modes in an SIMD/SPMD heterogeneous environment. In 2nd Workshop on Heterogeneous Processing (WHP '93), pages 57-64, April 1993.
[81] L. C. Wolf, L. Delgrossi, R. Steinmetz, S. Schaller, and H. Witting. Issues of reserving resources in advance. In NOSSDAV '95, April 1995.
[82] R. Wolski, N. Spring, and J. Hayes. The network weather service: A distributed resource performance forecasting service for metacomputing. Journal of Future Generation Computing Systems, 1999.
[83] D. Zagorodnov, F. Berman, and R. Wolski. Application scheduling on the information power grid. International Journal of High-Performance Computing. Submitted.