OPTIMIZING DISTRIBUTED STORAGE IN CLOUD ENVIRONMENTS Copyright 2013 by Maheswaran Sathiamoorthy A Dissertation Presented to the FACULTY OF THE USC GRADUATE SCHOOL UNIVERSITY OF SOUTHERN CALIFORNIA In Partial Fulfillment of the Requirements for the Degree DOCTOR OF PHILOSOPHY (ELECTRICAL ENGINEERING) December 2013 Maheswaran Sathiamoorthy
Object Description
Title | Optimizing distributed storage in cloud environments |
Author | Sathiamoorthy, Maheswaran |
Author email | callsmahesh@gmail.com |
Degree | Doctor of Philosophy |
Document type | Dissertation |
Degree program | Electrical Engineering |
School | Viterbi School of Engineering |
Date defended/completed | 2013-08-14 |
Date submitted | 2013-11-24 |
Date approved | 2013-11-26 |
Restricted until | 2013-11-26 |
Date published | 2013-11-26 |
Advisor (committee chair) | Krishnamachari, Bhaskar |
Advisor (committee member) | Neely, Michael J.; Yu, Minlan; Dimakis, Alexandros G.; Bai, Fan |
Abstract | Cloud storage, in the context of this research, is defined as the abstraction of storage spanning multiple machines into a single storage pool that end-users can access without knowing the internal details of where or how the storage is maintained. Traditionally, cloud storage refers to the storage pool in data centers. In our work, in addition to data-center-based cloud storage, we also consider vehicular-network-based cloud storage, that is, storage obtained by pooling together the storage on vehicles, typically connected by a vehicular network. ❧ In this thesis, we optimize the distributed storage in these two cloud environments. Specifically, we identify two challenges in each of the two cloud environments and propose solutions to these challenges. ❧ In Chapter 3, we consider the first important challenge in the vehicular cloud, namely the high latencies of on-demand content access. We investigate the benefits of using erasure codes in reducing the content access latencies through both analysis and realistic trace-based simulations. We show that a key parameter affecting the file download latency is the ratio of file size to download bandwidth. When this ratio is small, so that a file can be communicated in a single encounter, we find that coding techniques offer very little benefit over simple file replication. However, we analytically show that for large ratios, for a memoryless contact model, distributed erasure coding yields a latency benefit of N/α over uncoded replication, where N is the number of vehicles and α is the redundancy factor. Effectively, in this regime, coding yields the same performance as replicating all the files at all other vehicles, but uses much less storage. We also evaluate the benefits of coded storage using large real vehicle traces of taxis in Beijing and buses in Chicago. 
These simulations, which include a realistic radio link quality model for an IEEE 802.11p dedicated short-range communication (DSRC) radio, validate the observations from the analysis, demonstrating that coded storage dramatically speeds up the download of large files in vehicular networks. ❧ In Chapter 4, we consider the second challenge, namely the problem of helper node allocation. In order to relay a file from a node that has the file to another that wants the file, it may be necessary to enlist the help of other relaying nodes. When there are multiple types of files, an existing pool of helper nodes cannot help the dissemination of all the files due to storage and bandwidth constraints. In the chapter, we mathematically formulate and address this fundamental problem of allocating helper nodes as a resource for disseminating multiple contents. We consider a stochastic homogeneous contact process for the nodes in the vehicular network, or more generally an intermittently connected mobile network. We consider and solve two variations of the problem: one in which the goal is to maximize the expected number of demands satisfied and another in which the goal is to minimize the time taken to disseminate the files. Besides the global optimization perspective, we also examine the problem from a game-theoretic perspective in which a central agent auctions the storage to competing content providers, and show how self-interested decisions impact the social welfare. ❧ In the second half of the thesis, the data center cloud is considered. In Chapter 5 we investigate how to optimize the repair traffic in data centers, while keeping the storage overhead as low as possible. Node failures are frequent in data centers, and repairing failed nodes consumes network traffic (called repair traffic). Replication has the lowest possible repair traffic; however, replication has large storage overhead. 
The storage overhead can be reduced by using Reed-Solomon codes, but they generate significantly more repair traffic than replication. We implement in Hadoop HDFS a new class of erasure codes called Locally Repairable Codes (LRCs) that can reduce the repair traffic by approximately 2x as compared to Reed-Solomon codes, while only requiring 14% more storage (for our particular implementation). ❧ The last challenge we consider is the problem of placement of blocks in a data center. When placing replicas or erasure encoded blocks in a data center, the common approach is to place them on separate racks. While this can help reduce the probability of permanent data loss, it creates cross-rack traffic when repairing failed nodes. This can slow down repair, thereby affecting reliability. In Chapter 6, we identify a tradeoff between fault tolerance and repair speed when placing data in a data center and capture this tradeoff in a single metric called Mean Time To Data Loss (MTTDL). We use this metric to determine how to store blocks to maximize reliability. ❧ Even though vastly different, the two cloud environments offer some similarities in the challenges or the approaches we can take to handle these challenges. For example, we advocate the use of erasure codes in both these storage environments for storing certain types of data. ❧ A different perspective to understand the work presented in this thesis is to consider the solutions to the four challenges presented here as the answers to the questions that ask how to store data and where to store data in each of these two distributed environments. While Chapters 3 and 4 address the how and where questions for the vehicular cloud, Chapters 5 and 6 do the same for the data center cloud. |
Keyword | cloud environment; data centers; distributed storage; erasure codes; optimization; vehicular networking |
Language | English |
Format (imt) | application/pdf |
Part of collection | University of Southern California dissertations and theses |
Publisher (of the original version) | University of Southern California |
Place of publication (of the original version) | Los Angeles, California |
Publisher (of the digital version) | University of Southern California. Libraries |
Provenance | Electronically uploaded by the author |
Type | texts |
Legacy record ID | usctheses-m |
Contributing entity | University of Southern California |
Rights | Sathiamoorthy, Maheswaran |
Physical access | The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the author, as the original true and official version of the work, but does not grant the reader permission to use the work if the desired use is covered by copyright. It is the author, as rights holder, who must provide use permission if such use is covered by copyright. The original signature page accompanying the original submission of the work to the USC Libraries is retained by the USC Libraries and a copy of it may be obtained by authorized requesters contacting the repository e-mail address given. |
Repository name | University of Southern California Digital Library |
Repository address | USC Digital Library, University of Southern California, University Park Campus MC 7002, 106 University Village, Los Angeles, California 90089-7002, USA |
Repository email | cisadmin@lib.usc.edu |
Filename | etd-Sathiamoor-2181.pdf |
Archival file | uscthesesreloadpub_Volume8/etd-Sathiamoor-2181.pdf |
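The abstract's Chapter 5 figures (roughly 2x less repair traffic for about 14% more storage) can be sanity-checked with a short arithmetic sketch. The code parameters below are assumptions for illustration and are not stated in this record: a Reed-Solomon code with 10 data and 4 parity blocks, and an LRC that adds 2 local parity blocks so that a single lost block is rebuilt from 5 blocks instead of 10. The function and constant names are hypothetical.

```python
from fractions import Fraction


def storage_overhead(data_blocks: int, parity_blocks: int) -> Fraction:
    """Return stored blocks per data block for the given block counts."""
    return Fraction(data_blocks + parity_blocks, data_blocks)


# Assumed parameters (not given in the abstract).
RS_DATA, RS_PARITY = 10, 4   # Reed-Solomon: 10 data + 4 parity blocks
LRC_LOCAL_PARITY = 2         # LRC: 2 extra local parity blocks
LRC_REPAIR_READS = 5         # blocks read to rebuild one lost block with the LRC

rs_overhead = storage_overhead(RS_DATA, RS_PARITY)
lrc_overhead = storage_overhead(RS_DATA, RS_PARITY + LRC_LOCAL_PARITY)

# Extra storage of the LRC relative to Reed-Solomon.
extra_storage = lrc_overhead / rs_overhead - 1

# Reed-Solomon rebuilds one lost block by reading as many blocks as there are data blocks.
rs_repair_reads = RS_DATA
repair_ratio = Fraction(rs_repair_reads, LRC_REPAIR_READS)

print(f"extra storage: {float(extra_storage):.1%}")
print(f"repair read reduction: {float(repair_ratio):.0f}x")
```

Under these assumed parameters the extra storage works out to 1/7 and the repair reads drop by a factor of 2, which is consistent with the figures quoted in the abstract; other parameter choices would give different numbers.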