RESOURCE MANAGEMENT FOR SCIENTIFIC WORKFLOWS by Gideon M. Juve A Dissertation Presented to the FACULTY OF THE USC GRADUATE SCHOOL UNIVERSITY OF SOUTHERN CALIFORNIA In Partial Fulfillment of the Requirements for the Degree DOCTOR OF PHILOSOPHY (COMPUTER SCIENCE) May 2012 Copyright 2012 Gideon M. Juve
Object Description
Title | Resource management for scientific workflows |
Author | Juve, Gideon M. |
Author email | juve@usc.edu;gideonjuve@gmail.com |
Degree | Doctor of Philosophy |
Document type | Dissertation |
Degree program | Computer Science |
School | Viterbi School of Engineering |
Date defended/completed | 2012-03-27 |
Date submitted | 2012-04-13 |
Date approved | 2012-04-16 |
Restricted until | 2012-04-16 |
Date published | 2012-04-16 |
Advisor (committee chair) | Deelman, Ewa |
Advisor (committee member) | Jordan, Thomas H.; Chervenak, Ann; Nakano, Aiichiro |
Abstract | Scientific workflows are a parallel computing technique used to orchestrate large, complex, multi-stage computations for data analysis and simulation in many academic domains. Resource management is a key problem in the execution of workflows because they often involve large computations and data that must be distributed across many resources in order to complete in a reasonable time. Traditionally, resources in distributed computing systems such as clusters and grids were allocated to workflow tasks through the process of batch scheduling. The tasks were submitted to a batch queue and matched to available resources just prior to execution. Recently, due to performance and quality of service considerations on the grid, and the development of cloud computing, it has become advantageous and, in the case of cloud computing, necessary for workflow applications to explicitly provision resources ahead of execution. This trend toward resource provisioning has created many new problems and opportunities in the management of scientific workflows. This thesis explores several of these resource management issues and describes some potential solutions.
This thesis makes the following contributions:
1. It describes several problems associated with resource provisioning in cluster and grid environments, and presents a new provisioning approach based on pilot jobs that has many benefits for both resource owners and application users in terms of performance, quality of service, and efficiency. It also describes the design and implementation of a system based on pilot jobs that enables applications to bypass restrictive grid scheduling policies and is shown to reduce the makespan of several workflow applications by 32%-48% on average.
2. It describes the challenges of provisioning resources for workflows and other distributed applications in Infrastructure as a Service (IaaS) clouds and presents a new technique for modeling complex, distributed applications that is based on directed acyclic graphs. This model is used to develop a system for automatically deploying and managing distributed applications in infrastructure clouds. The system has been used to provision hundreds of virtual clusters for executing scientific workflows in the cloud.
3. It describes the challenges and benefits of running workflow applications in infrastructure clouds and presents the results of several studies investigating the cost and performance of running workflow applications on Amazon EC2 using a variety of different resource types and storage systems. These studies compared the performance of workflows in grids and clouds, characterized the virtualization overhead of workflow applications in the cloud, compared the cost and performance of using different storage systems with workflows in the cloud, and evaluated the long-term costs of hosting workflow applications in the cloud.
4. It investigates the issue of predicting the resource needs of workflow applications using historical data, and describes a technique for collecting detailed resource usage records for workflow applications that is applied to several real applications. In addition to estimating resource requirements, this data can also be used as input for simulations of scheduling algorithms and workflow management systems, and for identifying problems and optimization opportunities in workflows. This technique is used to collect the resource usage of six different workflow applications, which is then analyzed to identify potential bugs and opportunities for optimizing the workflows.
5. It investigates issues related to dynamic provisioning of resources for workflow ensembles and describes three different algorithms (1 offline and 2 online) that were developed for provisioning and scheduling workflow ensembles under deadline and budget constraints. The relative performance of these algorithms is evaluated using several different applications under a variety of realistic conditions including resource provisioning delays and task estimation errors. It shows that the offline algorithm is able to achieve higher performance given perfect conditions, but the online algorithms are better able to adapt to errors and delays without exceeding the constraints. |
Keyword | scheduling; resource management; provisioning; scientific workflows |
Language | English |
Part of collection | University of Southern California dissertations and theses |
Publisher (of the original version) | University of Southern California |
Place of publication (of the original version) | Los Angeles, California |
Publisher (of the digital version) | University of Southern California. Libraries |
Provenance | Electronically uploaded by the author |
Type | texts |
Legacy record ID | usctheses-m |
Contributing entity | University of Southern California |
Rights | Juve, Gideon M. |
Physical access | The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the author, as the original true and official version of the work, but does not grant the reader permission to use the work if the desired use is covered by copyright. It is the author, as rights holder, who must provide use permission if such use is covered by copyright. The original signature page accompanying the original submission of the work to the USC Libraries is retained by the USC Libraries and a copy of it may be obtained by authorized requesters contacting the repository e-mail address given. |
Repository name | University of Southern California Digital Library |
Repository address | USC Digital Library, University of Southern California, University Park Campus MC 7002, 106 University Village, Los Angeles, California 90089-7002, USA |
Repository email | cisadmin@lib.usc.edu |
Archival file | uscthesesreloadpub_Volume1/etd-JuveGideon-605.pdf |