University of Southern California Dissertations and Theses
Design and evaluation of a fault tolerance protocol in Bistro (USC Thesis)
A FAULT TOLERANCE PROTOCOL IN BISTRO

by Leslie Chi-Keung Cheung

A Thesis Presented to the FACULTY OF THE SCHOOL OF ENGINEERING, UNIVERSITY OF SOUTHERN CALIFORNIA, in Partial Fulfillment of the Requirements for the Degree MASTER OF SCIENCE (COMPUTER SCIENCE)

May 2004

Copyright 2004 Leslie Chi-Keung Cheung

Reproduced with permission of the copyright owner. Further reproduction prohibited without permission.

UMI Microform 1421761. Copyright 2004 by ProQuest Information and Learning Company. All rights reserved. This microform edition is protected against unauthorized copying under Title 17, United States Code. ProQuest Information and Learning Company, 300 North Zeeb Road, P.O. Box 1346, Ann Arbor, MI 48106-1346.

Acknowledgements

First of all, I would like to express my thanks to my advisor, Leana Golubchik. Without her guidance I would not have completed this thesis. On technical matters, she gave me a lot of useful comments to improve this thesis. Personally, she gave me a tremendous amount of support and encouragement. I am grateful to Banu Ozden and Cyrus Shahabi for serving on my thesis committee, and for giving me comments to improve this thesis. I also want to thank my co-worker, Yan Yang.
He designed the protocol with me, and gave me a lot of useful feedback on improving this thesis. Thanks also go to Cheng-Fu Chou for providing me with his previous work on this topic. His work provided an excellent outline for this thesis. I thank my colleagues at the Internet Multimedia Lab for participating in discussions on this work, and for providing feedback on earlier versions of this thesis. Thank you for providing me a nice place to work, and for letting me occupy the conference table all the time when I was in the lab. Finally, I wish to express my love to my parents. They provided a lot of emotional support for me, especially during my hard times.

Contents

Acknowledgements
List of Tables
List of Figures
Abstract
1 Introduction
2 Related Work
  2.1 Fault Tolerance in Distributed Systems
  2.2 Erasure Codes in Computer Networking
3 Overview of Erasure Codes
4 Fault Tolerance Protocol
  4.1 Timestamp Step
  4.2 Data Transfer Step
  4.3 Data Collection Step
5 Analytical Models
  5.1 Modeling Reliability of Checksum Groups
    5.1.1 Independent Packet Losses
    5.1.2 Independent Bistro Failures
    5.1.3 Gilbert Model
  5.2 Overall Reliability Model
  5.3 Performance Model
    5.3.1 Server Performance in the Timestamp Step
    5.3.2 Size of Timestamp Messages
  5.4 Cost Function
6 Results
  6.1 Setting Weights
  6.2 Varying the Number of Checksum Groups in a FEC Group
  6.3 Varying the Number of Parity Packets in a FEC Group
  6.4 Varying the Number of FEC Groups in a File
  6.5 Varying the Probability of Losing a Packet
7 Future Work
8 Conclusions
Bibliography

List of Tables

4.1 Meaning of Variables
5.1 Summary of Content of Checksum Groups

List of Figures

1.1 Depiction of upload processes with and without Bistro
3.1 Illustration of an Erasure Code
4.1 Timestamp Step
4.2 FEC Groups and Checksum Groups
4.3 Data Transfer Step
4.4 Data Collection Step
5.1 Gilbert Model
5.2 Average Time to Compute Hash of Different Number of Checksums
6.1 Cost Function - varying weights
6.2 Cost Function - varying Z
6.3 Cost Function - varying n − k
6.4 Cost Function - varying k
6.5 Cost Function - varying p

Abstract

This thesis investigates fault tolerance issues in Bistro, a wide area upload architecture. In Bistro, to achieve scalability and to avoid hot spots when deadlines approach, clients first upload their data to intermediaries, known as bistros. A destination server then pulls data from bistros as needed. However, during the server pull process, bistros can be unavailable due to failures, or they can be malicious, i.e., they might intentionally corrupt data. As a result, we need to provide a fault tolerance protocol within the Bistro architecture. Thus, in this thesis, we develop such a protocol, which employs erasure codes in order to make the data uploading process more reliable. We develop analytical models to study reliability and performance characteristics of this protocol, and we derive a cost function to study the tradeoff between reliability and performance in this context.

Chapter 1 Introduction

High demand for some services or data creates hot spots, which is a major hurdle to achieving scalability in Internet-based applications. In many cases, hot spots are associated with real life events.
For instance, releasing a new version of certain software may create a huge demand for it soon after the release, which may overload the servers that distribute that software. There has been a lot of research aimed at relieving hot spots in the Internet due to download applications, so that Internet applications can operate efficiently even under heavy loads. Examples include replication of services (e.g., replication of DNS servers), data replication (e.g., web caching and Akamai [1]), and data replacement (e.g., different streaming rates for audio and video streaming). The types of communication modes that this research addresses are mostly: one-to-one such as email and instant messaging, one-to-many such as web downloads, and many-to-many such as chatrooms and video conferencing. To the best of our knowledge, however, there are no research attempts to relieve hot spots in many-to-one applications, or upload applications, except Bistro [4]. Examples of upload applications are: interactive TV polling, online tax form submissions, homework submissions in distance education, and conference paper submissions.

Figure 1.1: Depiction of upload processes with and without Bistro

Current upload applications usually make use of many individual one-to-one transfers. This is not a scalable solution because the server or the network around the server can be saturated. This problem is exacerbated when real life deadlines are approaching, during which there can be a large number of clients. Bistro attempts to relieve hot spots in upload applications. In the design of Bistro, we take advantage of the fact that data are usually not consumed right after the deadline. Instead, the application does not want senders to change their data after the deadline, e.g., to achieve fairness.
Therefore, as long as the data remains unchanged after the deadline, data transfer can take place later within a reasonable amount of time. In Bistro, an upload process is broken down into three steps. Figure 1.1, taken from [4], depicts upload processes with and without Bistro. First, in the timestamp step, clients send hashes of their files to the server, and obtain timestamps. These timestamps clock clients' submission time. After this point, clients cannot change their data without the server detecting this. In the data transfer step, clients send their data to intermediaries called bistros. Note that bistros are not trusted, so clients encrypt their files to prevent unauthorized access. In the last step, called the data collection step, the server coordinates bistros to transfer clients' data to itself. The server then matches the hashes of the received files against the hashes it received directly from the clients. The server accepts files that pass this test, and asks the clients to resubmit otherwise. This completes the upload procedure.

We are interested in developing and analyzing a fault tolerance protocol in this thesis in the context of the Bistro architecture. In the original implementation of Bistro, if bistros are not available at the data collection step due to, for example, power failures, disk failures or network problems, all files on the unavailable bistros are lost, hence the destination server needs to ask clients to resubmit. In addition, malicious bistros can intentionally corrupt data. Although the destination server can detect corrupted data from the hash check, it has no way of recovering the data; hence the destination server needs to ask clients to resubmit. In this work, we are interested in using forward error correction techniques to recover corrupted or lost data.
The fault tolerance protocol, on the other hand, brings in additional storage and network transfer costs for the redundant data as a result of employing the protocol. The goal of this thesis is to provide better performance when intermediaries fail, while minimizing the amount of redundant data brought on by the fault tolerance protocol. We propose analytical models to evaluate our fault tolerance protocol. In particular, we develop reliability models to analyze the reliability characteristics of bistros. We also derive performance models to estimate the performance penalty of employing our protocol. We study the tradeoff between reliability and performance using a cost function.

This thesis is organized as follows. We discuss related work in Chapter 2. We give an overview of erasure codes in Chapter 3. Chapter 4 describes our fault tolerance protocol. We derive analytical models for this protocol in Chapter 5. Chapter 6 presents some results showing the tradeoff between performance and reliability characteristics of our protocol. We describe future work in Chapter 7. Finally, we conclude in Chapter 8.

Chapter 2 Related Work

We focus on fault tolerance issues in the Bistro framework, a wide area upload architecture, and we propose to use erasure codes to provide fault tolerance in such systems. This chapter describes fault tolerance in other large-scale data transfer applications, and discusses other uses of erasure codes in the context of computer networking.

2.1 Fault Tolerance in Distributed Systems

In the context of download applications, one approach to achieve fault tolerance is through service replication. Replication of DNS servers [22, 23] is one such example. The root directory servers are replicated, so if any root server fails, DNS service is still available.
Each ISP is likely to host a number of DNS servers, and most clients are configured with primary and alternate DNS servers. Therefore, even if some DNS servers fail, clients can contact an alternate DNS server to make DNS lookup requests. In Bistro, the service of intermediaries is replicated, where intermediaries provide interim storage of data until the destination server retrieves it.

In storage systems [31, 16, 15], data replication techniques, such as RAID techniques [26], are commonly used to provide better fault tolerance characteristics. In case of disk failures, file servers are able to reconstruct data on the failed disk once the failed disk is replaced, and data is available even before replacing the failed disks. Although data replication can provide better fault tolerance characteristics, the storage overhead is high. For example, the storage overhead of mirroring is 100%. We are interested in providing fault tolerance with smaller storage overhead in this work.

Some caching techniques can be used to improve fault tolerance. Disconnected operation in Coda [17] makes use of clients' local cache to achieve better fault tolerance. When a client accesses a file in Coda, the file server sends the whole file to the client, and allows the client to keep a cached copy of the file in local storage. When the file server fails, the client can still access the files in local storage, which allows work to be performed in the event of file server failure. In upload applications, however, caching is not feasible because the destination server reads the data only once. For fault tolerance reasons, recent research in distributed storage systems has been moving from centralized servers to serverless systems; [13, 3, 30, 18, 28] are examples of such systems.
A centralized file server is a single point of failure, as the system cannot operate once the file server breaks down. Using a peer-to-peer communication model to provide file service can provide better fault tolerance because failure of one server is assumed to be independent from failures of other servers. By combining this with replication techniques such as RAID-type striping, these systems are able to provide high availability of data.

The destination server is a single point of failure in the Bistro architecture. Once the destination server fails, the entire system breaks down. Future research should investigate how to eliminate this problem in the Bistro framework.

2.2 Erasure Codes in Computer Networking

There are many uses of erasure codes in computer networking. A number of multicast applications employ erasure codes to protect against losses [19, 25]. When there are losses in multicast applications, clients can either tolerate the losses, or attempt to recover the packets. One way to recover lost packets is to ask for retransmissions, but this solution is not scalable when there are a large number of clients. Using forward error correction techniques allows clients to reconstruct lost packets without contacting the sender.

Erasure codes are also useful in bulk data distribution; e.g., [6] describes a way to use erasure codes where clients can reconstruct the data as long as a small fraction of the erasure-encoded files is received. This scheme allows clients to choose from a large set of servers, resulting in good fault tolerance and better performance characteristics than traditional bulk data distribution techniques.

In wireless networking, using forward error correction techniques can reduce packet loss rates by recovering parts of lost packets [2, 10, 21].
Packet loss rates in wireless networks are much higher because propagation errors occur more frequently when the data is transmitted through air. Employing forward error correction techniques can improve reliability and reduce retransmissions.

Multimedia streaming also employs erasure codes [14, 24, 11]. Retransmitting packets may not be feasible because streaming applications are usually time-critical, so streaming applications must often operate under packet losses. However, when a lot of packets are lost, the quality of the video is often poor. Using erasure codes to recover parts of lost packets can improve the quality of the video.

These applications of erasure codes assume that packets are either received successfully or are lost. They assume that there are other ways to detect corrupted packets, e.g., using TCP checksums. In Bistro, however, this assumption is not valid because packets can be intentionally corrupted by intermediate bistros. In Chapter 4, we describe one way to detect corrupted packets using checksums, such that we can treat corrupted packets as losses.

Chapter 3 Overview of Erasure Codes

Figure 3.1: Illustration of an Erasure Code

This chapter provides an overview of erasure codes. An erasure code takes a file of k data packets, adds (n − k) parity packets, and creates an n-packet encoded file. After encoding, a sender typically transmits the encoded file using a number of channels. A receiver can use an erasure code decoder to reproduce the original file as long as any k packets are received with no errors. Figure 3.1 illustrates this process.
Many erasure codes are based on linear algebra. Let x_i be a data packet, and let x = x_0 x_1 ... x_{k-1} be the original data. An encoding procedure produces encoded data y = y_0 y_1 ... y_{n-1} by multiplying an n × k generator matrix G and x, i.e.,

    y = Gx

At the receiver side, assume that some of the y_i are lost. Let z = z_0 z_1 ... z_{k-1} be the received data. Let H = { rth row of G : y_r is received }. That is, if y_r is received, we add the rth row of G to H. Hence, H is a k × k matrix. Note that

    z = Hx

So,

    x = H^{-1} z

Therefore, we can reconstruct the original data by multiplying H^{-1} and z. This procedure requires the rows of H to be linearly independent, which is true exactly when H is invertible. Note that an erasure code decoder assumes that all received packets are correct. That is, it does not correct corrupted data; it only recovers lost data.

Some erasure codes make use of Vandermonde matrices; this is based on finite field theory. Vandermonde matrices are used in traditional error-correcting codes such as Reed-Solomon codes. However, these codes are not efficient, since finite field operations are very expensive; [29] is one such example. Other erasure codes take advantage of the fact that the finite field operations in erasure codes can be reduced to XOR operations. This can improve the performance of erasure codes because XOR operations are much more efficient than finite field operations; [5] is one such implementation. This code uses Cauchy matrices, for which there is an efficient inversion algorithm. Recall that we invert H in the decoding procedure, so making such operations efficient can improve the performance of the code. For instance, [12] develops a Reed-Solomon-like erasure code which uses only XOR operations. Instead of using Cauchy matrices, this code exploits the structure of an RS-based erasure code and gives an improved algorithm to perform encoding and decoding.
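The decoding procedure above can be illustrated concretely. The following sketch (not from the thesis; all parameters are illustrative) encodes k = 4 data packets with a Vandermonde generator matrix and recovers them from any k of the n = 6 encoded packets by solving Hx = z via Gaussian elimination. Exact rational arithmetic with Python's Fraction stands in for the finite field arithmetic a real code would use:

```python
from fractions import Fraction

def vandermonde(n, k):
    # G[r][c] = (r+1)^c over the rationals; any k rows are linearly independent
    return [[Fraction(r) ** c for c in range(k)] for r in range(1, n + 1)]

def matvec(M, x):
    return [sum(row[j] * x[j] for j in range(len(x))) for row in M]

def solve(H, z):
    # Gauss-Jordan elimination with exact rational arithmetic: solve H x = z
    k = len(H)
    A = [row[:] + [z[i]] for i, row in enumerate(H)]
    for col in range(k):
        piv = next(r for r in range(col, k) if A[r][col] != 0)
        A[col], A[piv] = A[piv], A[col]
        A[col] = [v / A[col][col] for v in A[col]]
        for r in range(k):
            if r != col and A[r][col] != 0:
                f = A[r][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[r][k] for r in range(k)]

k, n = 4, 6
x = [Fraction(v) for v in [7, 1, 4, 9]]   # original data "packets"
G = vandermonde(n, k)
y = matvec(G, x)                          # encoded packets: y = Gx

received = [0, 2, 4, 5]                   # any k of the n packets survive
H = [G[r] for r in received]              # rows of G for received packets
z = [y[r] for r in received]
assert solve(H, z) == x                   # decoder recovers the original data
```

A production code would work over a finite field such as GF(2^8), or use the XOR-only constructions discussed above, for efficiency; the rational-arithmetic version only demonstrates the linear algebra.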
Tornado Codes [20] are another example of erasure codes. Unlike other erasure codes that are based on linear algebra, Tornado Codes are based on bipartite graphs. Let G(V, E) be a bipartite graph. Let S be a set of vertices representing the original data, and let P be a set of vertices representing parity data, where V = S ∪ P. Let P_j = { s_i ∈ S | (s_i, p_j) ∈ E }. Encoding is done by performing XOR on every element in P_j to produce p_j, for all j. Decoding is done in a similar fashion, in which XOR operations are performed on received data to find the lost data. This is an efficient implementation since it uses only XOR operations. It, however, requires slightly more than k packets to guarantee that the decoder is able to reconstruct the original data.

Chapter 4 Fault Tolerance Protocol

This chapter provides details of our fault tolerance protocol. The protocol is broken down into three parts, as in the original Bistro protocol described in [8]. The timestamp step verifies clients' submissions. Actual data transfer is done in the data transfer step, in which clients stripe their files across a number of bistros, instead of sending their files to one bistro as in the original protocol. In the data collection step, the destination server coordinates data transfers from intermediaries to itself. Note that only the timestamp step has to be done before the real life deadline, since our protocol can detect if the files have changed after the deadline. File transfer can be done later, as any changes to the file after the issuing of the timestamp can be detected. We provide details about each step in this chapter with a focus on fault tolerance aspects, and discuss related design decisions.

4.1 Timestamp Step

The timestamp step verifies clients' submissions. Clients generate hashes of their data and send them over to the destination server.
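A minimal sketch of this client-side hashing, using Python's hashlib with SHA-1 (the digest algorithm the protocol uses as an example); the file contents here are made up:

```python
import hashlib

def file_digest(data: bytes) -> bytes:
    # One SHA-1 digest over the entire file, as in the original
    # (non-fault-tolerant) timestamp request.
    return hashlib.sha1(data).digest()

upload = b"tax-form-submission " * 1024   # made-up file contents
h = file_digest(upload)
assert len(h) == 20                       # SHA-1 digests are 160 bits
```

As discussed below, a single whole-file digest means that any corrupted packet forces a full resubmission, which motivates the multiple-checksum scheme of the fault tolerance protocol.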
The destination server replies to clients with tickets, which consist of timestamps and the hashes of the messages clients have just sent. Figure 4.1 depicts the timestamp step.

Figure 4.1: Timestamp Step (the client sends H = h(T_1) + h(T_2) + ... + h(T_X); the destination server returns the ticket ξ = K_priv(h(H), σ))

In the original protocol, clients send a checksum of the whole file to the destination server in the timestamp step. If any packets are lost or corrupted, the checksum check would fail, and the destination server would have to discard all packets that correspond to that checksum because it does not know which packets are corrupted. This would mean that losing any packet would result in retransmissions in the original protocol.

Figure 4.2: FEC Groups and Checksum Groups (the original W-packet file is divided into Y FEC groups; each FEC group is encoded by the erasure code into n packets and divided into Z checksum groups of g packets each)

Table 4.1: Meaning of Variables

  T_0  The original file
  T    The original file encoded by the erasure code
  W    Total number of packets in the original file
  X    Number of checksums in the timestamp request message
  Y    Number of FEC groups in a file
  Z    Number of checksum groups in a FEC group
  n    Number of data and parity packets in a FEC group
  k    Number of data packets in a FEC group

To solve this problem, we send multiple checksums in the fault tolerance protocol. Assume that each client has W data packets to send. The data packets are divided into Y FEC (forward error correction) groups of k packets each.
For each FEC group, a client encodes the k data packets into n packets (data + parity), arranges the n packets into Z checksum groups, each of size g = n/Z, and generates one checksum for each checksum group using a message digest algorithm such as SHA-1. We assume that Z is a factor of n, because we want the size of all checksum groups to be the same, which simplifies our reliability model in Chapter 5. There are altogether X = Y·Z checksums, which are concatenated and sent in one message to the destination server. Figure 4.2 illustrates the relationship between FEC groups and checksum groups. Table 4.1 summarizes the meaning of the variables we use in this context.

Note that the size of a checksum group cannot exceed the number of parity packets per FEC group (g ≤ n − k). Recall that erasure codes do not correct corrupted packets, so we drop all packets in a checksum group if any packet within the checksum group is lost or corrupted, and then we try to recover the dropped packets using the erasure code. If g > n − k and a checksum group is dropped, then we lose more than n − k packets in at least one FEC group, which we are not able to recover because fewer than k packets within that FEC group are received. Hence we would have to ask for retransmissions if any packet in the file is lost or corrupted. So, if g > n − k, we are back to the problem of the original protocol, where losing any packet results in retransmissions.

The above argument also implies that there must be at least two checksum groups per FEC group. This raises an interesting question. To provide the best fault tolerance, we should choose Z to be n, i.e., we generate a checksum for every packet, so losing one packet does not affect other packets. However, in the timestamp step, if we generate a checksum for every packet, the size of the message to be sent to the destination server increases, and hence more network resources are used. This problem is exacerbated when a large number of clients try to send messages to the destination server moments before the real life deadline, i.e., the original problem solved by Bistro. We derive a cost function to study this tradeoff in Chapter 5. Also, note that erasure codes keep replication as a special case when we set n to be a multiple of k. For example, if we set n = 2k, we can consider this scheme to be sending two copies of the original file to intermediaries.

Timestamp Step Algorithm

1. The client divides his original file, T_0, into Y FEC groups, each of size k. Then the client passes T_0 to an erasure code encoder to get an encoded file, T, where the size of each FEC group in T is n.

2. The encoded file T is divided into X parts (T = T_1 + T_2 + ... + T_X), and the client generates checksums h(T_i) for all i using a message digest algorithm such as SHA-1.

3. The client concatenates the checksums generated in the previous step, and sends the result to the destination server:

   H = h(T_1) + h(T_2) + ... + h(T_X)

4. Upon receiving the message, the destination server generates a timestamp σ.
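The client-side grouping and hashing described above can be sketched end to end. This is an illustrative mock-up: the zero-padding "encoder" merely stands in for a real erasure encoder, and all sizes are made up:

```python
import hashlib

# Illustrative parameters: W = Y*k data packets, n encoded packets per FEC
# group, Z checksum groups of g = n // Z packets each.
PKT, k, n, Z = 64, 4, 6, 3
g = n // Z

def encode_fec_group(packets):
    # Placeholder for a real erasure encoder: appends n - k "parity" packets.
    return packets + [bytes(PKT)] * (n - k)

def timestamp_message(data_packets):
    groups = [data_packets[i:i + k] for i in range(0, len(data_packets), k)]
    digests = []
    for grp in groups:                  # one FEC group at a time
        enc = encode_fec_group(grp)
        for j in range(Z):              # Z checksum groups per FEC group
            chunk = b"".join(enc[j * g:(j + 1) * g])
            digests.append(hashlib.sha1(chunk).digest())
    return b"".join(digests)            # H = h(T_1) + h(T_2) + ... + h(T_X)

W = 8                                    # two FEC groups of k = 4 packets
packets = [bytes([i]) * PKT for i in range(W)]
H = timestamp_message(packets)
assert len(H) == (W // k) * Z * 20       # X = Y*Z SHA-1 digests, 20 bytes each
```

The message the client actually sends is just this concatenation H; the server stores the individual digests so it can later check each checksum group independently.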
However, in the timestamp step, if we generate a checksum for every packet, the size of the message to be sent to the destination server increases, and hence more network resources would be used. This problem is exacerbated when a large number of clients try to send messages to the destination server moments before the real life deadline, i.e., the original problem solved by Bistro. We derive a cost function to study this tradeoff in Chapter 5. Also, note that erasure codes can keep replication as a special case when we set n to be multiples of For example, if we set n = 2& , we can consider this scheme to be sending two copies of the original file to intemediaries. 1. Client divides his original file. To, into Y FEC groups, each of size k. Then client passes 7^ to an erasure code encoder to get an encoded file, T, where size of each FEC group in T is n. 2. The encoded file T is divided into % parts (T = 7i + Ts - + 7%), and client generates checksums h(7) for all i using a message digest algorithm such as SHAl. 3. Client concatenates checksums he generated in the previous step, and send the result, to the destination server. H = h { T i ) + + • • • + h ( T x ) 4. Upon receiving the message, destination server generates a timestamp cr. 14 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. 5. Destination server stores information about the client, the received checksums, and timestamp it has just produced into a local database. 6. Destination server computes the hash value of the received message 7. Destination server concatenates the timestamp tr and hash digitally signs them with its private key, and sends this message, i.e., tickets to client. ( — K p r i v { h { H ) , (j) 4.2 Data Transfer Step bistrol Client Client bistrol bistroB Figure 4.3: Data Transfer Step In the data transfer step, clients send their hies to intermediate bistros which are not trusted. 
Upon receiving the data from clients, bistros send receipts to clients and the destination server to verify their submissions. Figure 4.3 depicts the data transfer step. In [9], the assignment problem is studied. That is, how a client should choose a bistro to which it sends its file to. However, in that case, only one bistro out of a pool of bistros is chosen. In the case of striping, a client needs to choose B > 1 bistros. We leave the choice of which B bistro clients should stripe its hie to, and how clients determine the value of B to future work. 15 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. In our fault tolerance protocol, we stripe the data across a number of bistros instead of sending the whole hie to one bistro, as in the original protocol. [27] suggests that data dispersal can provide better fault tolerance, if failure of one storage device is inde pendent of failure of other storage devices in the system. Note, we treat failures and malicious behavior similarly. Since we do not trust intermediate bistros, clients need to encrypt their data to protect it against unauthorized accesses or modihcations. This property is inherited from the original protocol, except that we need to generate a number of session keys instead of just one since we are striping files across a number of bistros. Data Step AfgontAm 1. Client chooses B bistros to send their data to. 2. Client generates a session key for each bistro it has chosen. 3. Client divides the file into B parts. For each part of the file, client encrypts it with a session key and sends that part to intermediate bistros f. Client also sends bistro i session key Kgesi and ticket ^ encrypted with public key of destination server. 4. Each bistro i generates a receipt p, and sends it to both client and destination server. 
   ρ_i = K_{i,priv}(h(T_i), K_pub(K_ses,i, ξ)), K_{i,pub}

4.3 Data Collection Step

In the data collection step, the destination server coordinates intermediate bistros to collect data. Figure 4.4 depicts this step.

Figure 4.4: Data Collection Step

When an erasure code is used, we do not always need to collect all the data, as some of it is redundant. We only need k out of the n packets from each FEC group in a file. After receiving k packets for each FEC group, the destination server has two choices for reconstructing clients' data. First, the destination server can pass the received packets to an erasure code decoder. Second, the destination server can ask intermediaries to transfer the remaining data. There is a tradeoff between the computational cost of erasure code decoding and network resource requirements: the first scheme involves more computation, while the second scheme requires more network bandwidth. We leave the study of this tradeoff as future work, and in this thesis we assume that the destination server collects all packets for every file.

Data Collection Step Algorithm

1. When the destination server wants to retrieve data from bistro i, it sends a retrieval request along with the receipt ρ_i: Retrieve(FID, ρ_i).

2. Upon receiving a retrieval request from the destination server, bistro i sends the file T_i along with the encrypted session key and ticket for decryption.
It then matches this checksum against the one it received during the timestamp step. If the two checksums match, the destination server accepts all packets in the checksum group; otherwise, it discards them.
5. After the destination server has retrieved data from all intermediate bistros, it passes the packets that passed the checksum check to an erasure code decoder, provided it has received at least k packets from every FEC group, and the erasure code decoder reconstructs the original file F_0. If the destination server receives fewer than k packets from any FEC group, it contacts the clients and asks them to resubmit the lost data.

Chapter 5

Analytical Models

In this chapter we propose analytical models to evaluate our fault tolerance protocol. We derive reliability models to study how the reliability characteristics of bistros affect system reliability. We also derive a performance model to estimate the performance penalty of employing our protocol. Lastly, we derive a cost function to study the tradeoff between reliability and performance.

5.1 Modeling Reliability of Checksum Groups

We begin the discussion of the reliability models by considering the reliability of checksum groups. Recall that if a checksum check fails, all packets within that checksum group are discarded, because we have no way of determining which of the packets are corrupted. Let p_g be the probability that there is no loss within a checksum group. Hence, the probability that at least one packet is lost within a checksum group is 1 - p_g. In the following sections, we derive p_g using different reliability models. These models make different assumptions about the packet loss and corruption characteristics.
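The accept-or-discard rule recalled above (step 4 of the data collection algorithm) can be sketched in a few lines. This is an illustrative Python sketch, not the Bistro implementation: SHA-1 and the helper names `group_checksum` and `filter_groups` are our own choices.

```python
import hashlib

def group_checksum(packets):
    """Checksum of one group: SHA-1 over the concatenation of its packets.
    (SHA-1 and these helper names are illustrative, not Bistro's.)"""
    h = hashlib.sha1()
    for pkt in packets:
        h.update(pkt)
    return h.digest()

def filter_groups(received_groups, timestamp_checksums):
    """Keep only the checksum groups whose recomputed checksum matches the
    checksum the destination server received in the timestamp step."""
    return [packets
            for packets, expected in zip(received_groups, timestamp_checksums)
            if group_checksum(packets) == expected]

# One corrupted packet causes its whole checksum group to be discarded.
original = [[b"pkt0", b"pkt1"], [b"pkt2", b"pkt3"]]
expected = [group_checksum(g) for g in original]
received = [[b"pkt0", b"pkt1"], [b"pktX", b"pkt3"]]  # second group corrupted
print(len(filter_groups(received, expected)))        # only the first group survives
```

The surviving groups are then handed to the erasure code decoder, which succeeds as long as at least k packets per FEC group remain.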
5.1.1 Independent Packet Losses

In the independent packet loss model, we assume that losing or corrupting one packet is independent of losing or corrupting other packets within the same checksum group. This is a good model for analyzing packet losses and corruptions if we have no information about how striping is done. Let p be the probability that a packet is lost or corrupted. Then, for a checksum group of g packets,

p_g = (1 - p)^g

This model does not allow correlation between consecutive packet losses. For example, if a bistro is malicious, then given that a packet is corrupted, the probability that the next packet from the same bistro is also corrupted should be higher. We describe the Gilbert Model in Section 5.1.3 to capture this correlation.

5.1.2 Independent Bistro Failures

This model assumes that all packets in the same checksum group are sent to the same bistro. We also assume that the failure of one bistro is independent of the failures of other bistros, so if a bistro fails, all packets on that bistro are lost. This model attempts to capture this effect. Let p_f be the probability that a bistro fails or is malicious. Then the probability that the whole checksum group is sent over successfully is

p_g = 1 - p_f

This model assumes that malicious bistros always corrupt all packets, which is not necessarily the case, since bistros do not have to corrupt all packets to be considered malicious.

5.1.3 Gilbert Model

The Gilbert Model takes the middle ground between the two models we have just described: it allows correlations between lost or corrupted packets. Although it is typically used to model network losses, we believe it is also a good model for understanding the reliability characteristics of bistros. The Gilbert Model is a two-state Markov chain. Being in state 0 means that the previous packet was not lost, while being in state 1 means that the previous packet was lost.
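Before turning to the details of the Gilbert Model, the two simpler models can be captured in a few lines; this sketch is illustrative (the function names are ours):

```python
def pg_independent(p, g):
    """Section 5.1.1: each of the g packets in a checksum group is lost or
    corrupted independently with probability p."""
    return (1.0 - p) ** g

def pg_bistro_failure(pf):
    """Section 5.1.2: the whole group lives on one bistro, which survives
    with probability 1 - pf."""
    return 1.0 - pf

# With p = 0.01 and checksum groups of g = 10 packets, versus a bistro
# that fails with probability pf = 0.05:
print(pg_independent(0.01, 10))   # ~0.904
print(pg_bistro_failure(0.05))    # 0.95
```

Which model is more pessimistic depends on the parameters: here a 5% bistro failure rate gives a higher p_g than ten independent 1% packet losses.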
Figure 5.1 depicts the discrete-time version of the Gilbert Model.

Figure 5.1: Gilbert Model

Let π_0 be the steady-state probability of state 0, and π_1 be the steady-state probability of state 1. Solving the Markov chain, we have

π_0 = p_{10} / (p_{01} + p_{10})
π_1 = p_{01} / (p_{01} + p_{10})

Let P(a, b) be the probability that a packets are sent by the clients to the destination server via bistro i, and b of them are lost or corrupted. We can define P(a, b) recursively as follows:

P(a, b) = π_0 P(a - 1, b) + π_1 P(a - 1, b - 1)

The boundary conditions are given as follows:

P(a, b) = 0 if a < b or a < 0 or b < 0
P(1, 0) = π_0 p_{00} + π_1 p_{10}
P(1, 1) = π_0 p_{01} + π_1 p_{11}

Assuming we send all packets in a checksum group to one bistro, the probability of no loss or corruption within a checksum group of g packets is given by

p_g = P(g, 0)

The Gilbert Model degenerates to the independent packet loss model of Section 5.1.1 when p_{00} = 1 - p, p_{01} = p, p_{10} = 1 - p, and p_{11} = p; this says that no matter which state the system is in, we have the same probability of losing the current packet. If we add an additional state that goes to state 0 with probability 1 - p_f and to state 1 with probability p_f, and we set p_{00} = 1, p_{01} = 0, p_{10} = 0, and p_{11} = 1, we can represent the bistro failure model of Section 5.1.2.

5.2 Overall Reliability Model

Let V be a random variable that represents the number of checksum groups that pass the checksum check within a FEC group. Then

P(V = v) = C(Z, v) p_g^v (1 - p_g)^(Z - v)

In other words, V is a binomial random variable with parameters Z and p_g. The minimum number of checksum groups required to reconstruct a FEC group is ⌈k/g⌉, because we need at least k packets and the packets are organized in groups of size g. As a result, the probability that the destination server is able to reconstruct a FEC group is given by
P(V ≥ ⌈k/g⌉) = Σ_{v=⌈k/g⌉}^{Z} C(Z, v) p_g^v (1 - p_g)^(Z - v)

We are also interested in finding the expected number of data packets lost within a FEC group, L(j), where j is the number of checksum groups lost. In other words, L(j) is the expected number of data packets that cannot be recovered by the erasure code. If k or more packets are received from a FEC group, then L(j) is 0, because the erasure code can recover all data packets. The contents of checksum groups can be classified as follows.

• Contain only data packets. There are ⌊k/g⌋ such checksum groups.
• Contain both data packets and parity packets. If k is divisible by g, there are no such checksum groups; otherwise, there is exactly one.
• Contain only parity packets. There are Z - ⌈k/g⌉ such checksum groups.

In the first case, the number of data packets per checksum group is g. In the second case, it is k mod g. In the last case, it is 0, as those checksum groups hold only parity packets. Table 5.1 summarizes this information.

Table 5.1: Summary of Contents of Checksum Groups

Content                    | Quantity                                | # data packets
Data packets only          | ⌊k/g⌋                                   | g
Data and parity packets    | 0 if k is divisible by g, 1 otherwise   | k mod g
Parity packets only        | Z - ⌈k/g⌉                               | 0

First, assume that k is a multiple of g; as a result, no checksum group contains both data and parity packets. Let N(j) be the number of different ways to distribute checksum group losses among the two classes of checksum groups listed in Table 5.1, given that j out of Z checksum groups are lost. Hence,

N(j) = Σ_{x=0}^{j} C(k/g, x) C(Z - k/g, j - x)

where x is the number of lost checksum groups that contain only data packets. So, the expected number of data packets lost is

L(j) = (1/N(j)) Σ_{x=0}^{j} x g C(k/g, x) C(Z - k/g, j - x)

Now assume that k is not a multiple of g, and that j checksum groups are lost. We then need to distribute the j losses among all three classes of checksum groups.
Let x be the number of lost checksum groups that contain only data packets, and let y be the number of lost checksum groups that contain both data and parity packets. The number of ways to distribute the j checksum group losses is then

N(j) = Σ_{x=0}^{j} Σ_{y=0}^{1} C(⌊k/g⌋, x) C(1, y) C(Z - ⌊k/g⌋ - 1, j - x - y)

So, the expected number of data packets lost is

L(j) = (1/N(j)) Σ_{x=0}^{j} Σ_{y=0}^{1} (x g + y (k mod g)) C(⌊k/g⌋, x) C(1, y) C(Z - ⌊k/g⌋ - 1, j - x - y)

The expected number of data packet losses per FEC group, L, is obtained by weighting L(j) by the probability that exactly j checksum groups are lost:

L = Σ_{j} L(j) C(Z, j) (1 - p_g)^j p_g^(Z - j)

where the sum runs over the values of j for which fewer than k packets survive, i.e., j > Z - ⌈k/g⌉ (recall that L(j) = 0 otherwise). Hence, the average data packet loss rate per FEC group, L_rate, is given by

L_rate = L / k

The whole file is transferred successfully if all FEC groups can be reconstructed, i.e., if at least k packets arrive from each of the Y FEC groups. Therefore, the probability that the file is transferred successfully is (P(V ≥ ⌈k/g⌉))^Y, and the probability that part of a file is lost is 1 - (P(V ≥ ⌈k/g⌉))^Y. The average data packet loss rate for the whole file is also L_rate, because that quantity is already normalized.

5.3 Performance Model

This section describes performance models for evaluating our fault tolerance protocol. We consider the performance penalty in the timestamp step only. If the performance of the timestamp step is poor, then we are back to the original problem, where many clients try to send large amounts of data to a server at the same time. In the data transfer step, performance is not as critical, because with striping clients are less likely to overload intermediate bistros. In the data collection step, the destination server is able to coordinate the bistros, so performance is again not as critical as in the timestamp step, as there is no hard deadline.

5.3.1 Server Performance in the Timestamp Step

In this section, we are interested in the performance of the computation needed at the destination server in the timestamp step. When the destination server receives a timestamp request message from a client, it computes a hash value of the checksums generated by that client, and digitally signs the reply message, which we term the ticket. Let t_ds be the average time the destination server takes to digitally sign a ticket, and let t_h(x) be the average time to compute a hash value of a timestamp request message consisting of x checksums. Notice that t_ds is independent of the number of checksums we generate, because the size of the return message is fixed: it is the size of the hash of the original message plus the size of the timestamp. The total time the destination server needs to compute a reply to a message is t(x) = t_ds + t_h(x). Results from a
When the destination server receives a times- tamp request message from a client, it computes a hash value of the checksums gener ated by that client, and digitally signs the reply message, which we term as the ticket. Let fda be the average time the destination server takes to digitally sign a ticket, and let be the average time to compute a hash value of a timestamp request message consisting of z checksums. Notice that is independent of the number of checksums we generate, because the size of the return message is fixed, which is the size of the hash of the original message plus the size of the timestamp. Total time a destination server needs to compute a reply to a message is t(z) = -t- th(z). Results from a 26 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. previous work [7] suggest that tj, is approximately 0.0041s on an 800 MHZ Pentium- Ill PC running Linux, and we are interested in looking at how changes when a client sends different number of checksums. 3 , 0 E 0 0 c o CL W 0 0.0007 0.0006 0.0005 0.0004 0.0003 0.0002 0.0001 0 0 200 400 600 800100012001400160018002000 number of checksums Figure 5.2: Average Time to Compute Hash of Different Number of Checksums In order to estimate we emulated the destination server by running OpenSSL on different number of checksums on an 800 MHZ Pentium-Ill PC running Linux. Fig ure 5.2 shows the time taken to compute SHAl hashes in our simulation. From Figure 5.2, even if clients send 2000 checksums to the destination server, it takes about 0.0006s to computer a hash of the checksums, which is an order of magnitude faster than pro ducing a digital signature. Thus we believe that the generation of a hash of multiple checksums should not overload the destination server. 27 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. 
5.3.2 Size of Timestamp Messages

It is likely that the overloading of network resources, due to sending a greater number of checksums, is more important than the additional computation needed on the server. If every client sends 2000 checksums to the destination server, and clients use the SHA-1 message digest algorithm, which produces a 20-byte checksum, then every client will send about 40KB of data in the timestamp step. This might overload network resources around the deadline time, i.e., we would be back to the original problem of a large number of clients trying to send large amounts of data to the server in a short period of time. As a result, we use the size of the timestamp message as a metric for determining the performance drawbacks of our scheme.

The total number of checksums a client sends is X = YZ, where Y is the number of FEC groups in the file. Now, we want a normalized metric for the size of a timestamp message, because we do not want to penalize large files for sending more checksums than small files. One possible metric is the number of checksums normalized by the file size: for a file of W = Yk data packets, this quantity is given by

X / W = YZ / (Yk) = Z / k

This is the number of checksum groups per data packet. In what follows, we use this quantity as a metric for evaluating the performance penalties of our fault tolerance protocol.

5.4 Cost Function

Now that we have a reliability model and a performance model, the question is how to combine the effects of both in order to study the tradeoff between reliability and performance. This section describes a cost function which we propose to use to achieve this goal. Let C_1 be the cost computed using the reliability model, and let C_2 be the cost computed using the performance model for the timestamp step. Our cost function is

C = w_1 C_1 + w_2 C_2

where w_1 and w_2 are the weights of the two factors.
We derive two different costs from the reliability model, namely the probability of losing part of a file and the average data packet loss rate. We can use either for C_1 and evaluate the differences between these two metrics. The performance cost in the timestamp step is given by the number of checksum groups per data packet. In the next chapter, we study how each parameter affects the overall cost function.

Chapter 6

Results

This chapter provides results on varying different parameters of the cost function discussed in the last chapter. The parameters of interest are as follows.

1. Number of checksum groups per FEC group, Z. We mentioned this tradeoff in Chapter 4. Setting Z to be large can provide better reliability, because losing a packet affects fewer packets: we drop the entire checksum group whenever any packet from that group is lost or corrupted. On the other hand, large values of Z result in large timestamp messages, which can have adverse effects on network resources.

2. Number of parity packets per FEC group, n - k. For reliability reasons, we want to send a large number of parity packets, but this increases the number of checksums we send, as we are interested in adding parity checksum groups.

3. Number of data packets per FEC group, k. Given a file of W packets, we want to study the difference between dividing the file into a few large FEC groups or many small FEC groups. Dividing files into many small FEC groups gives better fault tolerance, but clients send more checksums, which makes the timestamp request message large, because we require that clients produce at least two checksums per FEC group.

4. Probability of losing a packet, p. We want to see how sensitive the cost function is to p.
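Under the independent packet loss model of Section 5.1.1, the reliability metric, the performance metric, and the cost function of Chapter 5 fit together as in the following illustrative sketch (the function names are ours, and we assume Z divides n, so each checksum group holds g = n/Z packets):

```python
import math

def p_group_ok(p, g):
    """Independent losses: a checksum group of g packets survives iff
    none of its packets is lost (Section 5.1.1)."""
    return (1.0 - p) ** g

def p_fec_ok(p, n, k, Z):
    """P(V >= ceil(k/g)): enough checksum groups pass for the erasure
    code to recover the FEC group (Section 5.2)."""
    g = n // Z                       # packets per checksum group (Z divides n)
    pg = p_group_ok(p, g)
    need = math.ceil(k / g)
    return sum(math.comb(Z, v) * pg**v * (1.0 - pg) ** (Z - v)
               for v in range(need, Z + 1))

def cost(p, n, k, Z, Y, w1, w2):
    """C = w1*C1 + w2*C2, with C1 = prob. of losing part of a file of
    Y FEC groups and C2 = Z/k checksum groups per data packet."""
    c1 = 1.0 - p_fec_ok(p, n, k, Z) ** Y
    c2 = Z / k
    return w1 * c1 + w2 * c2

# Parameters of Section 6.2: Y = 5 FEC groups, n = 20, k = 10, p = 0.01.
print(cost(0.01, 20, 10, 2, 5, 0.9, 0.1))   # ~0.06
```

Sweeping Z, n - k, k, or p through this function reproduces the qualitative shapes discussed in the sections below.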
All results presented in this chapter use the independent packet loss model, because of its simplicity; the other reliability models give similar results. We use both reliability metrics, i.e., the probability of losing part of a file and the average data packet loss rate. The performance metric used is the number of checksum groups per data packet. For each experiment, we show graphs of each reliability metric, the performance metric, and the cost function computed using each of the reliability metrics.

6.1 Setting Weights

Figure 6.1: Cost Function - varying weights

In this section, we study how to set the weights in the cost function in order to obtain a convex curve for studying the tradeoff. Recall that w_1 is the weight corresponding to a reliability metric, w_2 is the weight corresponding to the performance metric, and w_1 + w_2 = 1. We depict the results as a function of Z; we plot these results by setting w_1 to 0.1, 0.5, 0.9, 0.99, and 0.999. Figure 6.1 illustrates these results. From the graphs, we can see that when w_1 = 0.1, the cost is strictly increasing. When w_1 = 0.5, the cost is very close to a strictly increasing line. We observe that the cost is a convex curve when w_1 = 0.9, and the cost becomes strictly decreasing when w_1 is 0.99 or 0.999. The values of both reliability metrics are between 0 and 1, and they approach 0 as Z increases. On the other hand, the values of checksum groups per data packet range from 0 to 2 in the plotted range, and grow further as we keep increasing Z.
As a result, in order to obtain a convex curve, we need to set w_1 to be around 0.9.

6.2 Varying the Number of Checksum Groups in a FEC Group

We now study the effect of Z. Setting Z to be large can provide good reliability, because losing one packet leads to dropping fewer packets. On the other hand, large values of Z make the timestamp message large, and hence we are more likely to overload network resources. Our results are plotted in Figure 6.2, with Y = 5, n = 20, k = 10, p = 0.01, w_1 = 0.9, and w_2 = 0.1.

Figure 6.2: Cost Function - varying Z

The two upper graphs show how the reliability metrics change with Z. Both reliability metrics drop dramatically when Z is between 1 and 2, and are close to 0 when Z > 4. This is because when Z ≥ 2, the destination server is able to reconstruct a FEC group even if some checksum groups are dropped. Checksums per data packet increase linearly, because Z increases linearly while k is fixed. The cost is high when Z is small, because the probability of losing part of a file is high. The cost decreases as Z goes from 1 to 2, because the probability of losing part of a file improves. For Z > 2, the cost goes up again, because the size of the timestamp message becomes too large.
6.3 Varying the Number of Parity Packets in a FEC Group

Figure 6.3: Cost Function - varying n - k

We now consider the effects of adding parity packets to our system. Intuitively, adding more parity packets should lead to better reliability. Here we vary n - k, the number of parity packets, as well as Z: we are actually interested in the effect of adding "parity checksum groups". If we fixed Z, the new parity packets would be squeezed into the existing checksum groups, and this would not provide the improvement in reliability one might expect. Figure 6.3 shows the results, with k = 10, p = 0.01, g = 2 (so Z grows as parity packets are added), w_1 = 0.9, and w_2 = 0.1.

The two upper graphs show that both reliability metrics drop as the number of parity packets increases. This makes intuitive sense, because adding more parity packets provides better reliability. Checksums per data packet increase linearly with the number of parity packets per FEC group, because Z increases linearly as we add more parity packets, while k is fixed. In the cost function graphs, when n - k < 5, the cost decreases, because the reliability metrics decrease. From n - k = 5 in the middle right graph and n - k = 10 in the bottom graph, the cost increases, because the number of checksums per data packet keeps growing while both reliability metrics approach 0.
6.4 Varying the Number of FEC Groups in a File

In this section we study how to choose k, the number of data packets in each FEC group. We are interested in the following question: should we group the data packets into a few large FEC groups, or into many smaller FEC groups? The results are plotted in Figure 6.4, with W = 100, n = 2k, Z = 2, p = 0.01, w_1 = 0.9, and w_2 = 0.1.

Figure 6.4: Cost Function - varying k

The two upper graphs show that reliability drops as k increases. That is, large FEC groups do not tolerate faults as well as small FEC groups. However, using smaller FEC groups means that we send more checksums, as we need at least two checksums for each FEC group. Since Z is fixed while k increases, the number of checksum groups per data packet, given by Z/k, decreases as 1/k. In the cost function graphs, the cost is high when k is small, because this results in a lot of checksums. The cost drops when k is between 1 and 10, as we send fewer checksums and the corresponding reliability penalty does not grow as fast. Eventually, when k > 10, since larger FEC groups are not as fault tolerant, the cost goes up as k increases.
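The effect of k can be computed in closed form under the independent-loss model. The sketch below uses this section's parameters (W = 100, n = 2k, Z = 2; the helper name is ours) and shows the probability of losing part of the file growing with k while the checksum overhead Z/k shrinks:

```python
import math

def part_loss_prob(p, k, W, Z=2):
    """Prob. of losing part of a W-packet file split into W/k FEC groups,
    each with n = 2k packets organized in Z checksum groups
    (independent packet losses, Section 5.1.1)."""
    n = 2 * k
    g = n // Z                       # packets per checksum group
    pg = (1.0 - p) ** g
    need = math.ceil(k / g)
    fec_ok = sum(math.comb(Z, v) * pg**v * (1.0 - pg) ** (Z - v)
                 for v in range(need, Z + 1))
    return 1.0 - fec_ok ** (W // k)

p, W = 0.01, 100
for k in (5, 10, 25, 50):
    print("k=%2d  P(partial loss)=%.4f  checksum groups/data packet=%.2f"
          % (k, part_loss_prob(p, k, W), 2 / k))
```

Larger k lowers the timestamp overhead but raises the loss probability, which is exactly the tradeoff the cost function balances.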
6.5 Varying the Probability of Losing a Packet

Figure 6.5: Cost Function - varying p

We are interested in how the cost function changes with the probability of losing a packet, p. Figure 6.5 depicts our results, with Y = 5, n = 20, k = 10, Z = 2, w_1 = 0.9, and w_2 = 0.1. Since both Z and k are fixed, changes in cost reflect changes in the reliability metrics. As p increases from 0.1 to 0.6, the cost increases rapidly. For p > 0.6, the cost increases at a decreasing rate; this is because both the probability of losing part of a file and the average data packet loss rate approach 1 as p increases. This result makes intuitive sense: as p increases, the reliability metrics should increase.

Chapter 7

Future Work

In this chapter, we outline possible directions for future work. In the data transfer step, a client needs to choose B bistros for striping its data. Clients also need to choose how much data to send to each of the B bistros according to their reliability and performance characteristics. For example, clients may want to send their files to faster bistros to minimize their response time, and to more reliable bistros to minimize the chance of losing part of their files. We believe that this problem can be formulated as a variant of the transportation problem, which has been studied extensively in the area of operations research.

After the destination server has collected k packets from each FEC group, it has the choice of reconstructing the data without retrieving the remaining n - k packets.
This approach can reduce the network bandwidth requirement, but it can increase the computational cost of decoding at the destination server. The remaining packets, on the other hand, could be on malicious bistros and hence may be unavailable. Since some erasure code decoders do not benefit from more than k packets, attempting to retrieve the remaining packets could be a waste if they are unlikely to pass the checksum check.

In this work we investigated data loss and data corruption. However, recall that we rely on bistros to send session keys to the destination server. Although the session keys are encrypted with the public key of the destination server, malicious bistros can corrupt session keys to prevent data from properly reaching the destination server. Even though the destination server can treat such data as corrupted, since it will not pass the checksum check, we can look for ways to prevent this from happening, hence making the system even more fault tolerant. One way to achieve this goal is to have the clients also send the session keys to the destination server in the timestamp step. This would make the timestamp message even larger, especially when a client stripes files across a large number of bistros.

Blacklisting of malicious bistros can provide better fault tolerance, as we can eliminate unreliable bistros from future upload events. If data from a particular bistro frequently fails the checksum checks, we can assume that the bistro is malicious, or has software or hardware problems (i.e., is unreliable). We can then blacklist that bistro, notify its administrators, and remove it from future upload events.

Another fault tolerance issue in Bistro is the failure of a destination server. If the destination server fails, an event owner receives no data, since all information about the clients and files is unavailable.
We need to figure out what is needed to reconstruct the data on the destination server; then we can derive an algorithm to reconstruct the database of the failed destination server from the data residing on intermediate bistros.

Chapter 8

Conclusions

Hot spots are a major hurdle to making Internet applications scalable. Much research has addressed this problem for one-to-one, one-to-many, and many-to-many applications. Yet, to the best of our knowledge, there is no work on relieving hot spots for many-to-one, or upload, applications except for Bistro, which has been shown to have a scalable and secure design.

The goal of this thesis is to develop a fault tolerance protocol that improves performance in the face of failures or malicious behavior of intermediaries. In the original protocol, if any bistro is not available during the data collection step, the files on that bistro are lost, and the destination server has to ask for resubmissions. Also, if bistros are malicious, i.e., they intentionally corrupt data, the destination server can detect this, but it cannot recover the corrupted files; hence we have to request them from the clients again. Our goal is to provide redundancy using erasure codes to tolerate failures or corruption of some intermediate bistros, while minimizing the storage and network transfer costs incurred by employing the protocol.

We developed such a fault tolerance protocol in this thesis. As in the original protocol, the proposed fault tolerance mechanism is scalable and can provide data security. We encode files with erasure codes, divide them into checksum groups, and generate a checksum for each checksum group. We concatenate the checksums and send them to the destination server in the timestamp step.
We stripe the data across a number of bistros in the data transfer step. Finally, the destination server collects the data residing on the intermediaries in the data collection step.

We evaluated our protocol using the proposed analytical models. We provided a reliability model for computing the probability of losing part of a file as well as the average data packet loss rate. Our performance model uses the number of checksums per data packet as its metric. Furthermore, we use a cost function that combines the reliability and performance metrics in order to study their combined effects and the resulting tradeoff. We studied the resulting cost as a function of a number of parameters, including the number of data packets per FEC group, the number of parity packets, and the number of checksum groups per FEC group. We also studied the sensitivity of the cost function to the probability of losing a packet and to the weights of the cost function.

In conclusion, we believe fault tolerance is important in wide-area data transfer applications. We developed a fault tolerance protocol for the Bistro architecture, and we believe it can provide fault tolerance while minimizing the additional cost of employing the protocol. Better fault tolerance also leads to fewer retransmissions due to packet losses or corruptions, resulting in better system performance.

Bibliography

[1] Akamai. http://www.akamai.com.

[2] Jong-Suk Ahn and John Heidemann. An adaptive FEC algorithm for mobile wireless networks. Technical Report ISI-TR-555, USC/Information Sciences Institute, March 2002.

[3] T. Anderson, M. Dahlin, J. Neefe, D. Patterson, D. Roselli, and R. Wang. Serverless network file systems. In 15th Symposium on Operating Systems Principles (SOSP 15), December 1995.

[4] Samrat Bhattacharjee, William C. Cheng, Cheng-Fu Chou, Leana Golubchik, and Samir Khuller.
Bistro: a framework for building scalable wide-area upload applications. ACM SIGMETRICS Performance Evaluation Review, 28(2):29-35, September 2000.

[5] J. Blomer, M. Kalfane, R. Karp, M. Karpinski, M. Luby, and D. Zuckerman. An XOR-based erasure-resilient coding scheme. Technical report, International Computer Science Institute, Berkeley, California, 1995.

[6] John Byers, Michael Luby, Michael Mitzenmacher, and Ashutosh Rege. A digital fountain approach to reliable distribution of bulk data. In ACM SIGCOMM, September 1998.

[7] William C. Cheng, Cheng-Fu Chou, and Leana Golubchik. Performance of batch-based digital signatures. In Proceedings of the 10th IEEE/ACM International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems, pages 291-302, October 2002.

[8] William C. Cheng, Cheng-Fu Chou, Leana Golubchik, and Samir Khuller. A secure and scalable wide-area upload service. In Proceedings of the International Conference on Internet Computing, volume 2, pages 733-739, June 2001.

[9] William C. Cheng, Cheng-Fu Chou, Leana Golubchik, and Samir Khuller. A performance study of Bistro, a scalable upload architecture. ACM SIGMETRICS Performance Evaluation Review, 29(4):31-39, 2002.

[10] G. Ding, H. Ghafoor, and B. Bhargava. Resilient video transmission over wireless networks. In IEEE International Symposium on Object-oriented Real-time Distributed Computing, May 2003.

[11] Nick Feamster and Hari Balakrishnan. Packet loss recovery for streaming video. In IEEE Packet Video 2002, April 2002.

[12] Gui-Liang Feng, Yan Yang, Robert Deng, and Feng Bao. A novel Reed-Solomon-like code scheme with only XOR operations. Technical report, Center for Advanced Computer Studies, University of Louisiana at Lafayette, 2003.

[13] G. Gibson, D. Nagle, K. Amiri, F. Chang, E. Feinberg, H. Gobioff, C. Lee, B. Ozceri, E. Riedel, D. Rochberg, and J. Zelenka.
File server scaling with network-attached secure disks. In ACM M GAfEZTdCS^ InrgmorfonaZ Cor^/mcg on Mgaynnenzgnr on<f Mofle/zng Computer S^stenw, pages 272-284, June 1997. [14] Leana Golubchik, John C.S.Lui, Tak Fu Tung, Lik Hang Chow, W.J. Lee, G. Franceschinis, and C. Anglano. Multi-path continuous media streaming: What are the benefits? Eer/brmonce EvoJuohon Tbumol, 39:429— 449, September 2002. [15] J. Hartman and J. Ousterhout. The zebra striped network file system. ACM Trans- octrons on Computer S^rtenw, 13(3), August 1995. [16] J. Howard, M. Kazar, S. Menees, D. Nichols, M. Satyanarayanan, R. Sidebotham, and M. West. Scale and performance in a distributed file system. ACM Trongoc- tions on Computer Systems, 6(1):51-81, February 1988. [17] J. Kistler and M. Satyanarayanan. Disconnected operation in the coda file system. In 13th Symposium on Cperutzng Sy^temj ErtnczpZea (5C5E '91 ), pages 213-225. [18] J. Kubiatowicz, D. Bindel, Y . Chen, S. Czerwinski, P. Eaton, D. Geels, R. Gum- madi, S. Rhea, H. Weatherspoon, W. Weimer, C. Wells, and B. Zhao. Oceanstore: An architecture for global-scale persistent storage. In A5EEC5 9, November 2000. [19] M. Luby, L. Vicisano, J. Gemmell, L. Rizzo, M. Handley, and J. Crowcroft. RFC3453 : The use of forward error correction (EEC) in rehable multicast, December 2002. [20] Michael G. Luby, Michael Mitzenmacher, M. Amin Shokrollahi, and Daniel A. Spielman. Efficient erasure correcting codes. IEEE Trunfoctronf on In/brmutton Theory, 47(2):569-584, February 2001. [21] Philip McKinley and Arun Mani. An experimental study of adaptive forward error correction for wireless collaborative computing. In IEEE 5ympo.yium on Apphco- tfon; omJ the Internet (5ÆVT 2001), January 2001. 43 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. [22] P. V . Mockapetris. RFC 1034: Domain names — concepts and facilities, Novem ber 1987. [23] P. V . Mockapetris. 
RFC 1035: Domain names — implementation and specidca- tion, November 1987. [24] T. Nguyen and A. Zakhor. Distributed video streaming with forward error correc tion. In IEEE Eac&et VW eo 2002, April 2002. [25] Jorg Nonnenmacher and Ernst Biersack. Reliable multicast: where to use REC. In Ewtocok ybr Agtwort;, pages 134-148,1996. [26] David A. Patterson, Garth Gibson, and Randy H. Katz. A case for redundant arrays of inexpensive disks (raid). In q/" fA e I98& ACM 5/GMOD mtgmarionaZ con/èrencg on Afonogg/ngnf pages 109-116. ACM Press, 1988. [27] Michael O. Rabin. Efficient dispersal of information for security, load balancing, and fault tolerance. Towmo/ q/^rA g ACM, 36(2):335-348, April 1989. [28] Sean Rhea, Patrick Eaton, Dennis Geels, Hakim Weatherspoon, Ben Zhao, and John Kubiatowicz. Pond: the oceanstore prototype. In 2wf f/5'EMY Con/grgncg on E iZ g on<f Aorogg 7gcAno/ogig.r '03), March 2003. [29] Luigi Rizzo. Effective erasure codes for reliable computer communication proto cols. ACM Compufgr Conununzcohon Rgvigw, 27(2):24— 36, April 1997. [30] A. Rowstron and P. Druschel. Storage management and caching in PAST, a large- scale, persistent, peer-to-peer storage utility. In SCSE 78, October 2001. [31] R. Sandberg, D. Goldberg, S. Kleiman, D. Walsh, and B. Lyon. Design and imple mentation of the stm network hlesystem. In Swnmgr 7985 C8EMX Con^/encg, pages 119-130, June 1985. 44 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission.
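The tradeoff summarized above can be sketched numerically. The following is a minimal illustration, not the thesis's actual model: it assumes independent packet losses with probability p, takes the probability that an FEC group is unrecoverable (more packets lost than there are parity packets) as the reliability metric, the number of checksums per data packet as the performance metric, and combines them with hypothetical weights w_rel and w_perf. The function names and the exact form of the combination are illustrative assumptions.

```python
from math import comb

def group_loss_prob(n_data, n_parity, p):
    """Probability that an FEC group is unrecoverable, i.e. more than
    n_parity of its n_data + n_parity packets are lost, assuming
    independent losses with per-packet probability p (an assumption
    of this sketch, not necessarily of the thesis's model)."""
    total = n_data + n_parity
    return sum(comb(total, i) * p**i * (1 - p)**(total - i)
               for i in range(n_parity + 1, total + 1))

def cost(n_data, n_parity, n_checksum_groups, p, w_rel=0.5, w_perf=0.5):
    """Illustrative weighted cost combining the reliability metric
    (probability of an unrecoverable group) with the performance
    metric (checksum groups per data packet); the weights and the
    linear combination are hypothetical."""
    reliability = group_loss_prob(n_data, n_parity, p)
    checksums_per_packet = n_checksum_groups / n_data
    return w_rel * reliability + w_perf * checksums_per_packet
```

With such a sketch one can reproduce the qualitative behavior studied above: adding parity packets lowers the reliability term, while adding checksum groups raises the performance term, so the weighted cost exposes the tradeoff between the two.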