TECHNIQUES FOR PEER-TO-PEER CONTENT DISTRIBUTION OVER MOBILE AD HOC NETWORKS

by Chao-Chin Chou

A Dissertation Presented to the FACULTY OF THE GRADUATE SCHOOL, UNIVERSITY OF SOUTHERN CALIFORNIA, In Partial Fulfillment of the Requirements for the Degree DOCTOR OF PHILOSOPHY (ELECTRICAL ENGINEERING)

December 2007

Copyright 2007 Chao-Chin Chou

Dedication

This dissertation is dedicated to my beloved family. To my father Hung-Chu Chou, you make me have no fear of disturbance in the rear when pursuing my goal in life, and to my mother Hsing-Yi Huang, you support me all my life to discover my potential. I will always be indebted and grateful to you. To my brother Yang-Cheng Chou, you are always a trustworthy friend. I am grateful to have you take care of mom and dad when I was not around. To my wife-to-be and best friend for life Kate Hung, you are the sunshine dispelling my sorrow and depression. You give my life meaning and complete my being. With all the love and respect to my family.

Acknowledgements

I am grateful to C.-C. Jay Kuo, my advisor, mentor and friend. Without his support, I could not even have started my Ph.D. study. His encouragement and guidance led the way for me to realize my ideas through our research. His understanding in technical and personal matters was a motive for me to continue through tough times. Special thanks go to Xiaojiang Chen, my mentor and friend, who also served on my qualification and defense exams. Without his support I could not have reached this far. Thanks to my committee member Konstantinos Psounis for his useful discussions and comments on the dissertation and for serving on my qualification and defense exams. Thanks also go to Kai Hwang for serving on my qualification committee. I would like to thank all my colleagues in the Multimedia Communications Lab. Their interaction enriched my experience and inspired me.

Table of Contents

Dedication
Acknowledgements
List of Tables
List of Figures
Abstract
Chapter 1: Introduction
  1.1 Significance of the Research
  1.2 Review of Previous Work
    1.2.1 P2P systems on MANETs
    1.2.2 Routing of P2P Query Messages
    1.2.3 Anonymous Networking
  1.3 Contributions of the Research
  1.4 Organization of the Dissertation
Chapter 2: Background Review of P2P Networks and MANETs
  2.1 Peer-to-peer Networks
    2.1.1 Unstructured P2P Networks
      2.1.1.1 Classification of Unstructured P2P Networks
      2.1.1.2 Alternatives for Query Search
    2.1.2 Structured P2P Networks
  2.2 The Mobile Ad Hoc Networks
  2.3 Similarities between P2P Networks and MANETs
Chapter 3: Bloom Filter-Based Probabilistic Routing
  3.1 Protocol Design
    3.1.1 Bloom Filter Overview
    3.1.2 Algorithm Design
      3.1.2.1 Bloom filter set maintenance
      3.1.2.2 Query processing
      3.1.2.3 Bandwidth saving
      3.1.2.4 Dynamic Bloom filter size adjustment
  3.2 Performance Analysis
    3.2.1 Space Complexity
    3.2.2 The Impact of Dynamic Bloom Filter Size Adjustment
      3.2.2.1 Time to trigger the size adjustment
      3.2.2.2 Value of size adjustment
  3.3 Simulations
    3.3.1 Route strength
    3.3.2 Query success rate
    3.3.3 Normalized Bandwidth overhead
    3.3.4 Impact of Bloom Filter Size Adjustment
Chapter 4: Evaluation of Unstructured P2P Content Discovery Techniques over MANETs
  4.1 Content Discovery Techniques for MANETs
    4.1.1 Query Flooding
    4.1.2 Expanding Ring
    4.1.3 Random Walk
    4.1.4 BF-based Probabilistic Routing
  4.2 Performance Analysis
    4.2.1 Query Success Rate
      4.2.1.1 Query Flooding
      4.2.1.2 Expanding Ring
      4.2.1.3 Random Walk
      4.2.1.4 VABF
    4.2.2 Query Route Stretch
      4.2.2.1 Flooding-based Algorithms
      4.2.2.2 Random Walk
      4.2.2.3 VABF
    4.2.3 Search Cost
      4.2.3.1 Query Flooding
      4.2.3.2 Expanding Ring
      4.2.3.3 Random Walk
      4.2.3.4 VABF
    4.2.4 Unsynchronized State Duration in VABF
  4.3 Simulation Results
    4.3.1 Static Environment
    4.3.2 Mobile Environment
    4.3.3 VABF Message Overhead
Chapter 5: Lightweight Anonymous Communication Protocol
  5.1 Design Rationale
  5.2 Protocol Design
    5.2.1 Query Phase
    5.2.2 Data Transmission Phase
    5.2.3 Extension of MAPCP
  5.3 Security Analysis
    5.3.1 Degree of Anonymity
    5.3.2 Traffic Analysis
      5.3.2.1 Timing attacks and flooding attacks
      5.3.2.2 Message coding attacks
  5.4 Performance Evaluation
    5.4.1 Degree of anonymity
    5.4.2 Performance of packet delivery
    5.4.3 Protocol Overhead
      5.4.3.1 Normalized number of packet transmissions
      5.4.3.2 Energy consumption
    5.4.4 Effect of multipath in hostile environments
Chapter 6: Conclusion and Future Work
  6.1 Conclusion
  6.2 Future Work
Bibliography

List of Tables

Table 3.1: Global variables for the VABF node procedures
Table 3.2: Symbol definitions
Table 4.1: Notation used

List of Figures

Figure 2.1: The architecture of a semi-P2P network
Figure 2.2: The architecture of a fully decentralized P2P network
Figure 2.3: The architecture of a semi-centralized P2P network
Figure 2.4: Illustration of expanding ring search
Figure 2.5: Illustration of random walk search
Figure 2.6: An example to explain how the distributed hash table works
Figure 2.7: An example of the mobile ad hoc network
Figure 3.1: The scenario in which node j is x hops away from node i
Figure 3.2: A (d+1)-ary tree of multi-level filters
Figure 3.3: The size of the compressed broadcast message decreases logarithmically as the number of shared objects decreases even if the Bloom filter size remains unchanged
Figure 3.4: Performance of the implemented Bloom filters in VABF
Figure 3.5: Query route strength with (a) λ = 1.0 and (b) λ = 10.0
Figure 3.6: Query success rate in the mobile networks with (a) λ = 1.0, and (b) λ = 10.0
Figure 3.7: Normalized bandwidth overhead when (a) λ = 1.0 and (b) λ = 10.0
Figure 3.8: The impact of increasing shared files and dynamic Bloom filter size adjustment on the query success rate
Figure 4.1: CDFs of the path length for networks of different sizes
Figure 4.2: The Markov chain model of the query routing process in VABF
Figure 4.3: The average size of an update message vs. network diameter
Figure 4.4: Illustration of content updates and their services in a VABF network
Figure 4.5: An example of a w x w square network being divided into 4 areas. Nodes are assumed homogeneous with radio transmission range r
Figure 4.6: Theoretical and simulation results of the query success rate in static networks
Figure 4.7: Theoretical and simulation results of the route stretch in static networks
Figure 4.8: Theoretical and simulation results of the search cost in static networks
Figure 4.9: Comparison of query success rates in mobile networks with (a) λ = 1.0, and (b) λ = 10.0
Figure 4.10: Comparison of the query route stretch in mobile networks with (a) λ = 1.0, and (b) λ = 10.0
Figure 4.11: Comparison of the search cost in mobile networks with (a) λ = 1.0, and (b) λ = 10.0
Figure 4.12: Theoretical and simulation results of the size of a compressed VABF content update message as a function of the network size and the number of content objects in static networks
Figure 4.13: The propagation delay of content updates in (a) the static and (b) the mobile environments
Figure 5.1: Tradeoff between hop-by-hop encryption/decryption schemes and broadcast-based schemes
Figure 5.2: Probability assignment results of flooding control in (a) a grid topology, (b) a randomly generated topology in the 700m-by-700m network field, and (c) a randomly generated topology in the 1000m-by-1000m network field. S is the sender, and R is the receiver
Figure 5.3: By traffic analysis such as timing analysis and payload matching, colluded attackers (represented by black nodes) can divide the network space into smaller cells and shrink the anonymity set into a specific cell
Figure 5.4: Degree of anonymity in the 700m-by-700m field divided into (a) 1 cell, (b) 2 cells, and (c) 9 cells. (d) Degree of anonymity with a larger α value
Figure 5.5: Packet delivery fraction and end-to-end delay in the 700m-by-700m field with (a)(b) high mobility and (c)(d) low mobility
Figure 5.6: Packet delivery fraction and end-to-end delay in the 1000m-by-1000m field with (e)(f) high mobility and (g)(h) low mobility
Figure 5.7: Overhead in terms of (a) normalized number of control packets, (b) normalized number of data packets, (c) energy consumption in route construction phase, and (d) energy consumption in data transmission phase
Figure 5.8: Simulation results in the hostile environments. The packet delivery fraction and end-to-end delay in (a)(b) the 700m-by-700m field and (c)(d) the 1000m-by-1000m field

Abstract

The mobile ad-hoc network (MANET) is emerging as a new paradigm of wireless communication in both civilian and military applications. Recently, efforts have been made to migrate peer-to-peer (P2P) applications from the wired Internet to the MANET system, which are expected to be the major impetus to MANET commercialization. Several fundamental technologies that facilitate the deployment of P2P content distribution systems over MANET are investigated. In particular, we have focused on efficient P2P content discovery and privacy protection in P2P file sharing.

First, a Bloom filter (BF)-based probabilistic routing (VABF) is proposed to improve query efficiency. The VABF protocol constructs its routing tables using Bloom filters in a distributed manner without the knowledge of the global network. Each node forwards P2P queries to its closest object holders by the shortest path.
Simulation results show that VABF outperforms several popular unstructured P2P search algorithms in terms of the query success rate and the route strength.

Second, the performance of several unstructured P2P content discovery techniques over MANETs is analyzed. They include query flooding, expanding ring search, random walk and BF-based probabilistic routing. The chosen performance metrics are the query success rate, the route stretch and the search cost. Mathematical analysis is conducted to predict their behavior in static networks. In addition, the overhead introduced by the BF-based probabilistic routing is modeled using an M/G/1 queue. Finally, extensive computer simulation is performed to validate our analytical results in a static environment and shed light on the behavior of these schemes in a mobile environment.

Third, the MANET Anonymous Peer-to-peer Communication Protocol (MAPCP) is proposed for privacy protection. MAPCP uses broadcasts with probabilistic flooding control to establish multiple anonymous paths between communication peers. It requires no hop-by-hop encryption/decryption along anonymous paths and builds multiple paths to multiple peers within a single query phase without using an extra route discovery process. The analysis and simulations show that MAPCP always maintains a higher degree of anonymity than a MANET anonymous single-path routing protocol in a hostile environment.

Chapter 1: Introduction

1.1 Significance of the Research

The peer-to-peer (P2P) network has drawn increasing attention and has been widely deployed over the Internet for various purposes, including distributed data storage, content distribution service, collaborative computing and Internet telephony. The P2P technology is attractive for its scalability, fault tolerance, self-management and low cost of deployment. Examples of P2P content distribution systems include Napster, Gnutella, Kazaa, eDonkey and the BitTorrent network.
Regardless of the legitimacy of distributed contents, these systems have satisfied one of the most natural desires of human beings, namely, data sharing, which has become one of the most important P2P applications today.

In the meanwhile, the mobile ad hoc network (MANET) has been proposed as an alternative to cellular networks for use in areas where a fixed network infrastructure such as base stations is unavailable. Traditionally, MANET has often been associated with applications in digital battlefields and/or disaster areas. More recently, however, MANET is emerging as a new paradigm of wireless communication for civilian applications. Nowadays, portable devices such as laptops, PDAs and mobile phones are ubiquitous and used in people's daily lives. The materialization of wireless technologies has changed the application contexts of MANET greatly. A representative example of civilian MANET applications is the vehicular ad hoc network (VANET), which aims to provide safety and commercial applications on the vehicle-vehicle and vehicle-roadside communication networks.

MANET resembles the P2P network in some ways. First, both networks lack a fixed infrastructure and topology. The P2P peers join and leave frequently and unpredictably while MANET nodes move randomly. Second, both networks require no intermediation of a centralized server or authority. Instead, they both require the cooperation of network nodes for communication. Third, the formation of both networks reflects the nature of human behaviors. People with wireless mobile devices gathering together form a mobile ad hoc network and, when people gather, they tend to share and exchange information with each other in a peer-to-peer manner. These similarities lead to a natural binding of the P2P networks and MANETs, which provides us with a strong research motivation.

There are already efforts made to migrate P2P applications from the wired Internet to MANETs. For example, Kortuem et al.
[32] proposed a software platform called Proem to ease the development of P2P applications over MANETs. Furthermore, in spite of multihop communication inefficiency and transient node population in MANET, it is still an attractive platform for P2P content distribution, especially in the sharing of noncritical contents. This is supported by the growing R&D efforts on P2P file sharing technology development in MANETs [7], [16], [19], [29]. Emerging civilian P2P applications over MANETs are expected to arise in many interesting scenarios, including sharing business cards, presentation slides, and various multimedia clips among mobile hand-held devices, and sharing safety information (e.g. collision avoidance) and nonsafety information (e.g. traffic congestion and routing information, mobile infotainment) among moving vehicles.

The transient nature of wireless communication channels and the constraints of limited energy and computing power of portable devices impose greater challenges on the deployment of P2P systems over MANET than over the wired Internet. The main technical challenges include the following.

• Construction of a locality-aware P2P overlay.
For the wired Internet, neighbors on the P2P overlay may be physically hundreds of kilometers away from each other with many hops in between. This is usually not a concern due to fast data transmission and relay in broadband wired networks. However, the throughput of a multihop communication path in MANET decays quickly as the number of hops increases. As a result, node locality should be taken into account in the construction of a P2P overlay over MANET.

• Efficient content discovery and delivery algorithms.
Most existing unstructured P2P networks employ flooding-based query. When employed in MANETs, flooding is not efficient, and it may cause severe interference and sometimes the broadcast storm problem [40]. In contrast, structured P2P networks build distributed hash tables (DHTs) to locate the desired content in an efficient way.
However, maintaining these hash tables introduces a significant amount of control traffic as well as the locality problem described above. Thus, there is a need to develop an efficient content discovery and delivery algorithm customized for P2P systems over MANETs.

• Privacy issues in the multihop communication.
Providing peer privacy in the P2P network is an important problem. Several technical challenges exist in ensuring P2P privacy on MANET, as explained below. First, the open environment in MANET makes its radio signals vulnerable to eavesdropping. Second, the multihop communication in MANET involves untrustworthy nodes in a private conversation. Third, MANET nodes are constrained by limited battery and computing power, which prevents computation-intensive schemes such as public-key cryptography from being adopted. Therefore, existing solutions in the wireline Internet cannot be directly applied to MANET for P2P communication without a considerable amount of modification.

1.2 Review of Previous Work

Previous related work on migrating P2P technologies from the wired Internet to MANETs and on privacy assurance in both environments is reviewed in this section.

1.2.1 P2P systems on MANETs

Most work on deploying P2P over MANET transplants existing Internet P2P technologies to MANET with primary focus on P2P file sharing [7], [16], [19], [29]. For example, Turi, Conti and Gregori [16] investigated the performance of Gnutella, one of the most popular unstructured P2P systems, on MANET and proposed a cross-layer design for Gnutella to interact with the MANET routing protocols. It was shown that the cross-layer design has much better performance in the P2P overlay construction. However, there is little innovation in their content discovery algorithm, in which the flooding-based query is used as the core.

Research has been done to apply Internet-based distributed hash tables (DHTs) to MANET. Examples include Ekta [46] (Pastry on DSR) and MADPastry [62] (Pastry on AODV).
Although both present good results on adapting DHTs to MANET based on the cross-layer design, the publishing/unpublishing overheads in DHT-based mechanisms make them less efficient than flooding-based mechanisms in the search of highly replicated contents [35]. Furthermore, building a DHT overlay requires global knowledge of the network, which is usually unavailable due to the transient nature of mobile connections in MANET.

1.2.2 Routing of P2P Query Messages

The use of Bloom filters to assist data dissemination in P2P networks has been proposed in various contexts. The attenuated Bloom filter was proposed in [52] to build content indices of nearby peers to improve the performance of searching replicas near the query source. The Exponentially Decaying Bloom Filter (EDBF) was proposed in [34] to construct a probabilistic routing table for queries at each node. While both leverage Bloom filters well for content indices and query forwarding, they target large-scale P2P systems on the Internet and do not address the mobility issues. Repantis and Kalogeraki [51] used Bloom filters for P2P data dissemination in a mobile environment. That is, nodes construct content synopses of their data with Bloom filters and disseminate them to selected nodes. Queries are then routed according to content synopses stored at each node. However, the performance of the proposed algorithm was not thoroughly studied.

1.2.3 Anonymous Networking

For the wired Internet, most solutions for anonymous communications use application-layer routing to achieve anonymity. For example, the mix-net [11] and the onion routing [49] rely on pre-selected proxies called mix servers or onion routers to relay packets between the sender and the receiver so as to hide the sender-receiver relationship. The sender wraps its outgoing packets with encrypted layers to form an onion, and these layers are torn off one by one at each proxy en route to reveal the next-hop node until the packet reaches the destination.
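
The layering just described can be illustrated with a small sketch. It is a toy under stated assumptions, not the mix-net or onion routing implementation from [11] or [49]: the function names (toy_cipher, build_onion, peel_layer), the JSON layer format and the proxy names are hypothetical, and the XOR keystream stands in for a real symmetric cipher and offers no security.

```python
import hashlib
import json
from typing import List, Tuple


def toy_cipher(key: bytes, data: bytes) -> bytes:
    """XOR data with a SHA-256 counter keystream (illustrative only, NOT secure).

    XOR is its own inverse, so the same call encrypts and decrypts.
    """
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream.extend(hashlib.sha256(key + counter.to_bytes(8, "big")).digest())
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


def build_onion(payload: bytes, route: List[Tuple[str, bytes]], destination: str) -> bytes:
    """Wrap the payload in one layer per proxy, building the innermost layer first.

    route is a list of (proxy_name, shared_key) pairs in forwarding order
    (assumed to be pre-agreed between the sender and each proxy).
    """
    next_hop = destination
    blob = payload
    for name, key in reversed(route):
        layer = json.dumps({"next": next_hop, "data": blob.hex()}).encode()
        blob = toy_cipher(key, layer)
        next_hop = name
    return blob  # handed to the first proxy in the route


def peel_layer(blob: bytes, key: bytes) -> Tuple[str, bytes]:
    """Remove one layer at a proxy: reveals only the next hop and the inner blob."""
    layer = json.loads(toy_cipher(key, blob))
    return layer["next"], bytes.fromhex(layer["data"])
```

Each proxy learns only its successor, which is what hides the sender-receiver relationship. The sketch also makes the cost visible: the sender needs one shared secret per proxy and performs one encryption per hop.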
Crowds [50] provides an example that uses groups of forwarding proxies to conceal the communication pairs. While application-layer solutions are attractive in providing anonymity, the mobile and ad hoc nature of MANET makes the pre-selection of mix nodes infeasible. Furthermore, composing an onion is computationally intensive at the sender. It requires common secrets between the sender and all proxies en route as well as hop-by-hop decryption along the routing path. Such requirements are too expensive for MANET nodes that are constrained by limited energy and computation power.

Most previous work on anonymous communication in MANETs considers network-layer solutions, i.e. anonymous routing protocols [30], [63]. In general, these solutions consist of two phases: (1) the anonymous route discovery phase and (2) the anonymous data transmission phase. In the first phase, the sender broadcasts a route request message to discover an anonymous route to its communication target. The entire process usually involves hop-by-hop encryption/decryption to conceal the route information from eavesdroppers. Once the anonymous route is established, the sender proceeds to the anonymous data transmission phase and begins to send data packets via the anonymous route.

ANODR, proposed by Kong and Hong [30] [31], is the first identity-free anonymous on-demand MANET routing protocol. ANODR employs the Trapdoor Boomerang Onion (TBO), a variant of onion that uses only symmetric key cryptography, to build an anonymous routing path. ANODR-TBO greatly reduces the cryptographic overhead when building the anonymous routing path by trading some performance for route pseudonym update and hop-by-hop payload shuffle. ANODR also achieves strong untraceability. The major flaw of this protocol is that it is sensitive to node mobility, and the route information is partially revealed if one or more nodes en route are compromised.
MASK [63] employs an anonymous neighborhood authentication protocol to establish its routing path instead of using the onion structure. MASK is claimed to have a lower computational complexity as compared with ANODR.

All these anonymous routing protocols achieve good performance in privacy protection for point-to-point unicast communications. However, when they are applied to peer-to-peer applications over MANETs, a lot of overhead is introduced due to the characteristics of P2P applications. Most P2P applications involve two phases: the query phase and the data transmission phase. In the query phase, the file requester broadcasts its query message to the entire network, and the file holders reply to the requester with the metadata of the requested file. When the requester has received enough query replies, it establishes a unicast connection to each file holder and proceeds to the data transmission phase. In order to provide privacy in P2P applications, the communications in both the query phase and the data transmission phase should be anonymous. Therefore, the routing protocols are supposed to guarantee the anonymity of broadcast queries in the first phase, and then establish an anonymous route between the file requester and the file holder. This means two or more rounds of message broadcasts are necessary, since the construction of anonymous routes also requires the broadcast of route discovery messages. The condition is even worse when the requester requests files from multiple file holders simultaneously, which is a common scenario in P2P applications, not to mention the hop-by-hop encryption/decryption overheads for building a single anonymous route.

1.3 Contributions of the Research

In this research, we study the behavior of existing Internet P2P content distribution systems as well as the characteristics of MANETs, and propose several new algorithms to solve the problems described above. Specific contributions of this work are summarized below.

First, a Bloom filter (BF)-based probabilistic routing, which is a variation of the attenuated Bloom filter (VABF), is proposed in Chapter 3 to improve the efficiency of query routing. Research in Chapter 3 has the following contributions.

• Avoidance of query flooding
Message flooding is usually a nightmare in MANETs. VABF abandons the flooding-based query scheme used by most unstructured P2P networks and forwards the query message to its destination precisely using preconstructed routing tables. It is shown by experimental results that VABF performs almost as well as flooding-based algorithms and achieves better performance under heavy query traffic.

• Low maintenance overhead
The size of periodically exchanged content indices at each VABF node does not increase as the number of shared objects increases. Furthermore, unlike DHT-based algorithms, VABF has no extra overhead in maintaining the P2P overlay topology, particularly with peer joins, departures and failures. We have performed the analysis on the space complexity and the size of a single broadcast message in VABF. Furthermore, we have done simulations on the normalized bandwidth overhead in VABF and several popular search algorithms; namely, flooding, random walk and expanding ring search. Our simulation shows that the normalized bandwidth overhead in VABF remains almost constant as node mobility increases, and it is much lower than others under heavy query traffic.

• Fully distributed operation
The construction of routing tables in VABF is fully localized; namely, it relies solely on message exchanges among neighboring nodes. Unlike DHT-based algorithms, it does not need the global knowledge of the P2P network.

• Preservation of physical node locality
There is no need for a VABF node to relocate its shared content or index of shared content to another specific node in the network, and every query message is forwarded to its corresponding content holder that is closest to the query originator via paths with the minimum number of hops.
We have done simulations to demonstrate the route strength of VABF and several popular search algorithms. It is shown by simulation that the actual query routing path in VABF is almost identical to the shortest path (in terms of hop counts) between these two nodes.

Second, comprehensive performance comparison of several unstructured P2P content discovery techniques for MANETs is conducted through mathematical analysis and computer simulation in Chapter 4. The P2P content discovery techniques under study include query flooding, expanding ring search, random walk and BF-based probabilistic routing. Each technique is evaluated by its query success rate, route stretch and search cost. Some of the main research results are summarized below.

• The path probability, which is the probability for two randomly selected nodes in an ad hoc network to be connected by at least one path, serves as an upper bound of the query success rate for all content discovery techniques.

• The BF-based probabilistic routing scheme outperforms flooding-based and random walk schemes. The only disadvantage with the BF-based probabilistic routing is that the control packet size is much larger, and it increases as the number of nodes or shared objects increases.

• The node density has a great impact on the route stretch of flooding-based and random walk schemes, but little impact on the BF-based probabilistic routing.

• Node mobility has a very limited impact on the query route stretch and the search cost.

• Random walk is not resilient to node mobility as compared with flooding-based and BF-based probabilistic routing schemes.

• Node mobility facilitates the propagation of content updates in the BF-based probabilistic routing.

• Our theoretical analysis is well corroborated by computer simulation in most cases.

Third, the MANET Anonymous Peer-to-peer Communication Protocol (MAPCP) is proposed in Chapter 5 for privacy protection. The following contributions have been made in this chapter.

• Avoidance of expensive hop-by-hop encryption/decryption in the anonymous route construction
MAPCP uses broadcasts together with a probabilistic flooding control algorithm to provide a light-weight scheme of establishing anonymous routing paths, which lowers the barrier of energy and computation power for MANET nodes to provide anonymity. We have done extensive simulations to visualize anonymous route constructions in MAPCP, which show that the proposed flooding control algorithm can effectively confine the broadcast messages within an acceptable range and form a probabilistic routing path.

• Provision of a flexible middleware between the P2P applications and MANET routing protocols
MAPCP is designed to be a middleware on top of the network layer. Applications which need no privacy can bypass MAPCP to avoid the overhead brought by anonymity. We have implemented MAPCP in the ns-2 simulator as a stand-alone transport-layer agent which sits between the application layer (a Gnutella-like P2P client) and the network layer (MANET routing protocols).

• Establishment of multiple anonymous routes within a single query round
MAPCP uses a probabilistic way to establish multiple anonymous paths within a single round of query. As compared with existing anonymous routing protocols that build anonymous routes one at a time, MAPCP saves more resources in bandwidth and computation power. Our simulation shows that MAPCP introduces lower control overhead and energy consumption in the anonymous route construction phase, and is more resilient to node mobility, failures, and passive attacks by malicious nodes.

• Provision of a higher anonymity degree in a hostile environment
Broadcasting with a fake identity inherently provides source and destination anonymity. With the aid of cover traffic and multiple probabilistic paths, the forwarding route can also be concealed from others.
MAPCP uses controlled broadcasts with cover traffic and multiple probabilistic paths and, hence, gives the best privacy protection against colluding malicious nodes. We have quantified and analyzed the anonymity degree achieved by MAPCP, and our simulation shows that MAPCP always achieves a higher anonymity degree as compared with single-path anonymous routing protocols when colluding adversaries divide the network into smaller cells.

1.4 Organization of the Dissertation

The rest of this dissertation is organized as follows. A brief review of P2P and MANET systems is given in Chapter 2. The BF-based probabilistic routing for P2P query messages over MANET is proposed in Chapter 3. The performance evaluation of existing unstructured P2P content discovery techniques for MANETs is presented in Chapter 4. The broadcast-based anonymous P2P communication protocol is introduced and discussed in Chapter 5. Finally, concluding remarks and future research topics are given in Chapter 6.

Chapter 2: Background Review of P2P Networks and MANETs

In this chapter, we provide a review of the peer-to-peer (P2P) network and the mobile ad hoc network (MANET).

2.1 Peer-to-peer Networks

A peer-to-peer (P2P) network is a network of interconnected peer nodes that self-organize into an ad hoc network topology without the support of a centralized server or authority [8]. In other words, the organization and operation of P2P networks rely mainly on the resources of participating nodes. A pure P2P network consists of only equal peer nodes that act as both clients and servers to other peer nodes in the network, i.e. there is no client-server architecture. P2P networks have the following advantages.

• Self management
The P2P network adapts itself in the presence of node joins, node departures or even node failures without the intermediation of a centralized server, and it is able to adjust its network topology dynamically.

• Scalability
In P2P networks, each participating node provides its own resources, including computing power, storage and bandwidth. Therefore, the total capacity of the system increases as more nodes join the network. In contrast, in the client-server architecture with a fixed set of servers, an increasing number of clients implies more burden on servers and less resource per demanding client.

• Fault tolerance
The lack of centralized servers spares P2P networks the problem of a single point of failure. Furthermore, by replicating data over multiple peers in a distributed manner, P2P networks are more resilient to node or link failure.

The P2P network technology has been widely employed for various applications nowadays. They include collaborative computing (e.g. the Human Genome Project [1] and SETI@home [5]), instant messaging (e.g. AOL, ICQ, Yahoo, MSN and Jabber [2]), Internet telephony (e.g. Skype [6]), distributed database systems (e.g. PIER [25] and LRM [57]), and content distribution (e.g. Napster, Kazaa [3], Freenet [13], BitTorrent [14], the OceanStore project [33], Gnutella [53] and PAST [55]). This research focuses on the area of content distribution, which is the area that most of the popular P2P systems fall within. Based on how peer nodes are linked to each other and how their shared contents are placed, P2P networks can be classified as unstructured or structured as described below.

2.1.1 Unstructured P2P Networks

In an unstructured P2P network, network overlay links are established arbitrarily while the placement of content is unrelated to its underlying overlay topology. Content discovery and location in unstructured P2P networks usually involves query flooding or some less-complex strategies such as random walk [36] and expanding ring search. However, there is no guarantee of a successful query (especially for rare data shared by only a few peers) for the latter.
Since flooding demands high bandwidth overhead, the unstructured P2P network has poor search efficiency and scalability. However, it has some advantages such as high flexibility and low overhead in overlay maintenance. Unstructured P2P networks are generally more suitable for transient node populations and dynamic contents.

2.1.1.1 Classification of Unstructured P2P Networks

Unstructured P2P networks can be further classified as semi-P2P, fully decentralized P2P and semi-centralized P2P networks as described below.

• Semi-P2P networks
As illustrated in Fig. 2.1, a global centralized index server is used to store the content information shared by all peers in the semi-P2P network. The index server answers file queries from peers and provides some information about the queried file such as the location of file owners. After successfully receiving the query reply from the index server, peers establish a direct connection between each other for file transmission. File discovery in such networks is quite efficient since the information about shared files is managed in a centralized manner. However, the centralized index server is a single point of failure, and its capacity also limits the scalability of the network. The notorious Napster file sharing network belonged to this class.

Figure 2.1: The architecture of a semi-P2P network.

• Fully decentralized P2P networks
As shown in Fig. 2.2, a fully decentralized P2P network has a flat network architecture that does not maintain any centralized index server. File discovery in such networks is achieved by flooding the query messages to the entire network or to part of it (by setting the TTL value). When the query is flooded partially, there is no guarantee of a query response even though the queried file may exist in the network. Decentralized P2P networks are flexible and resilient to peer dynamics and failures. However, they usually have poor scalability due to the inefficiency of query flooding.
The Gnutella file sharing network [53] is an example of the fully decentralized P2P network.

Figure 2.2: The architecture of a fully decentralized P2P network.

• Semi-centralized P2P networks
Semi-centralized P2P networks use supernodes to improve scalability as depicted in Fig. 2.3. In such networks, peers with powerful computing ability and fast network connections automatically become supernodes and act as temporary index servers for other slower peers. Each peer that is not a supernode picks a supernode as its index server and uploads the list of its shared files to this supernode. It also sends its query messages to this supernode. The supernodes communicate with each other for query processing. After receiving enough query replies, peers build direct connections between each other for file transmission. The Kazaa file sharing network (also known by the name of its underlying P2P protocol, FastTrack) is a representative of the semi-centralized P2P network.

Figure 2.3: The architecture of a semi-centralized P2P network.

2.1.1.2 Alternatives for Query Search

Query flooding is a brute-force and inefficient way of query search. Two more bandwidth-efficient alternatives have been proposed, as detailed below.

• Expanding Ring Search
As illustrated in Fig. 2.4, the flooding of the query message can be controlled by setting its time-to-live (TTL) to a small value so that the query message is not forwarded any further once its TTL reaches zero. This is called the expanding ring search. If the query gets no response, another query message with a larger TTL value is broadcast. The process is repeated until a response is successfully received by the query originator. Compared with query flooding, expanding ring search is much more bandwidth efficient when the queried file is nearby. However, if the file is far away from the query originator, this approach could be even worse than query flooding.
Figure 2.4: Illustration of expanding ring search.

• Random Walk Search
With random walk search, a node forwards the query message in a probabilistic way. For example, the query message is forwarded to one of its neighboring nodes with an equal probability until it reaches the node that can satisfy the query. This concept is shown in Fig. 2.5. Random walk search is much more efficient than query flooding when the queried file is popular or highly replicated. However, if the queried file is scarce and held by only a few nodes, random walk search may have a higher probability of query failure.

Figure 2.5: Illustration of random walk search.

Figure 2.6: An example to explain how the distributed hash table works.

2.1.2 Structured P2P Networks

Structured P2P networks attempt to address the scalability issue of unstructured P2P networks by maintaining a Distributed Hash Table (DHT), in which a hash function is applied to each node identifier (e.g. a node address) and each content identifier (e.g. a filename) to get the hash values. Then, the overlay topology and the content placement strategy are decided according to the calculated hash values. The system provides a mapping between the content and the node so that a query for the content based on its hashed identifier will be efficiently routed to the corresponding node. Consequently, structured P2P systems are generally more scalable than unstructured ones.

The basic idea of how a DHT works is shown in Fig. 2.6. In this example, a peer node publishes the file foo.mp3. Its content identifier, which is the filename here, is hashed by the hash function H to produce a hash value h (5 in this example). The hash value is then looked up in the mapping table to get its corresponding peer node (node 3). Then, this file or an index to this file will be stored at node 3. When a peer node receives a query for foo.mp3, it hashes the filename using the same hash function H to get the desired hash value 5.
Likewise, with the mapping table, the query will be forwarded to node 3.

The disadvantage of structured P2P networks is their high overhead of DHT maintenance and content publishing/unpublishing in the presence of transient node populations and dynamic contents. Another disadvantage is the lack of keyword search (or fuzzy search), which is straightforward in unstructured P2P networks. To the best of our knowledge, building a substrate of keyword search over the DHT overlay remains an open problem. Representatives of DHT networks include CAN [47], Pastry [54] (infrastructure of PAST), Chord [60] and Tapestry [64] (infrastructure of OceanStore).

2.2 The Mobile Ad Hoc Networks

A mobile ad hoc network (MANET) is a collection of mobile users that communicate with each other over wireless links (e.g. Bluetooth or 802.11) without the aid of fixed base stations or any central administration. MANETs have the following characteristics [23].

• Dynamic network topology
Since nodes may be mobile, links between nodes may change as nodes move, which results in a varying network topology.

• Multihop communication
Due to the limited radio transmission range and the lack of fixed base stations, each MANET node should serve as a router as well as a host to relay packets on behalf of other nodes whenever necessary.

• Unstable wireless links
Mobility results in transient links between nodes, and the aggregated link errors along the multihop communication path cause fluctuation in link capacity.

• Power constraints
Mobile devices are usually battery-driven and have a tight power budget and a smaller memory size, which affect their computing and communication power. Thus, power consumption should be considered in the design of algorithms to be run on MANET nodes.

An example scenario of a mobile ad hoc network formed by laptops, PDAs and smart phones is shown in Fig. 2.7.
Due to the limited radio transmission range and the random node location, the communication between mobile devices may involve a single hop or multiple hops. Furthermore, internal mobile nodes can access the Internet via edge nodes, which may be connected to an 802.11 access point or to a GPRS/3G smart phone.

Figure 2.7: An example of the mobile ad hoc network.

The routing protocols in MANETs can be roughly classified into two types, proactive and reactive, as described below.

• Proactive routing
Proactive (or table-driven) routing protocols rely on a regular exchange of information about the network topology among nodes to maintain a routing table at every single node. The advantage is that there is minimal delay in determining the route to be taken, which is good for real-time traffic. The drawback is that a significant amount of bandwidth will be consumed by control messages as the node mobility increases, since the rate of control message exchange must reflect the dynamics of the network to keep the routing table valid. Thus, proactive routing protocols are best for networks with low node mobility and frequent data transmission. Representatives of proactive MANET routing protocols include the Optimized Link State Routing (OLSR) [26] and the Destination-Sequenced Distance-Vector Routing (DSDV) [43].

• Reactive routing
In contrast, reactive (or on-demand) routing protocols construct a route only when it is needed. When there is a packet to send, such a protocol sends a route request message to discover the route, which prevents the node from sending unnecessary route updates. The drawback is that there is a significant delay before the packet can be sent. Furthermore, the route request messages are usually flooded to the network, which also results in a significant amount of control traffic. Reactive routing protocols are best for networks with high node mobility and infrequent data transmission.
Representatives of reactive MANET routing protocols include the Dynamic Source Routing (DSR) [27], the Temporally Ordered Routing Algorithm (TORA) [41] and the Ad Hoc On-Demand Distance Vector (AODV) [42].

2.3 Similarities between P2P Networks and MANETs

P2P networks and MANETs share some similar characteristics [56]. First, they are both decentralized and self-organized, and both lack central administration. Second, they both form flat and ad hoc network topologies. Peers join and leave the overlay arbitrarily in P2P networks while the random movement of nodes leads to a varying topology in MANETs. Third, they both establish hop-by-hop connections and support multihop communication. Multihop communication has been widely used by P2P systems for various purposes such as anonymous communication and Voice over IP (VoIP) services. In the former case, packets of the communication between two peers are relayed by several intermediate "proxy nodes" to conceal the real sender and the receiver from the outside world. In the latter case, the voice stream of a conversation between two peers which both reside behind a firewall is in general relayed by another peer in the public IP domain.

Chapter 3
Bloom Filter-Based Probabilistic Routing

This chapter focuses on the search issues, and proposes a Bloom filter-based probabilistic routing protocol for P2P query routing over MANETs, called the varied attenuated Bloom filter (VABF), which is a variation of the attenuated Bloom filter proposed by Rhea et al. [52]. The VABF protocol is designed with the following objectives:

• Avoid query flooding. Flooding in MANET is inefficient and leads to the broadcast storm problem [40], which causes severe interference and packet loss, and hence low bandwidth efficiency.

• Improve search efficiency. The query can always be forwarded to an adaptive node which is closer to the final destination. If the node knows in advance that the queried object does not exist in the network, it can simply drop the query to save bandwidth.
• Shorten the communication path. The multi-hop communication in MANET is expensive. Therefore, the search algorithm should try to locate the nearest object holder and return the shortest path in terms of hop counts to this node.

The proposed design is evaluated through both analytical modeling and extensive simulations. The analysis presented in this document provides insight into the approximate overhead a deterministic query routing may incur. Through the analytical modeling we derive expressions for the query success rate and the correlation between the frequency of content updates and the delay for the updates to be propagated to the entire network. Our simulation curves closely match the predictions of our analysis. Our simulation also compares the VABF protocol with three popular search algorithms, namely flooding, random walk, and expanding ring search, adopted by unstructured P2P systems. Simulation shows that VABF has much higher search efficiency, especially under heavy query traffic.

3.1 Protocol Design

The basic concept of content-based query routing is to let each node forward the query to the neighbor which is explicitly one hop closer to the final destination, with the forwarding decision made based on the knowledge the node has. An intuitive way to achieve this is to propagate the metadata of shared objects at each node to every other node in the network so that every node knows who has the object and where to route its corresponding queries. However, the size of this metadata is proportional to the number of shared objects and the number of nodes in the network, and could therefore consume a very large amount of memory and bandwidth. Therefore, VABF adopts a space-efficient data structure, i.e. the Bloom filter [9], to represent the set of shared objects. The following section gives an overview of the Bloom filter, and Section 3.1.2 presents the details of the VABF protocol.
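As a rough illustration of why a Bloom filter is attractive here, the following minimal Python sketch represents a set of shared filenames in a fixed number of bits. It is not the VABF implementation: the class name `BloomFilter`, the use of SHA-256 with double hashing to derive the k bit indexes, and the example sizes are all assumptions made for illustration. The helper `false_positive_rate` evaluates the standard approximation $(1 - e^{-kn/m})^k$, which appears as Theorem 3.1.1 in the next section.

```python
import hashlib
import math


class BloomFilter:
    """Plain m-bit Bloom filter with k hash functions (illustrative sketch)."""

    def __init__(self, m: int, k: int):
        self.m = m
        self.k = k
        self.bits = 0  # a Python int used as an m-bit array

    def _indexes(self, item: str):
        # Assumption: k indexes are derived from one SHA-256 digest by
        # double hashing. Indexes run over 0..m-1 rather than 1..m.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + j * h2) % self.m for j in range(self.k)]

    def add(self, item: str) -> None:
        # Set the k bits selected by the hash functions.
        for idx in self._indexes(item):
            self.bits |= 1 << idx

    def __contains__(self, item: str) -> bool:
        # True means "probably present"; False means "definitely absent".
        return all((self.bits >> idx) & 1 for idx in self._indexes(item))


def false_positive_rate(m: int, k: int, n: int) -> float:
    """Approximate false positive rate (1 - e^{-kn/m})^k."""
    return (1.0 - math.exp(-k * n / m)) ** k
```

With m = 1000 bits and k = 7, a filter holding n = 100 objects has an approximate false positive rate below 1%, and its size does not depend on the length of the object names.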
3.1.1 Bloom Filter Overview

A Bloom filter is a probabilistic data structure for representing a set of elements and assisting membership queries. It uses an $m$-bit array and $k$ independent hash functions $h_1, \ldots, h_k$ with hash values in the range $[1, \ldots, m]$ to represent a set $S$ of $n$ elements, $S = \{x_1, \ldots, x_n\}$. Each element $x_i$ is hashed by these $k$ hash functions to get $k$ values $h_1(x_i), \ldots, h_k(x_i)$, and the corresponding bits in the $m$-bit array indexed by the $k$ hash values are set to 1. A query checking whether an element $y$ is in $S$ checks all the corresponding bits indexed by $h_j(y)$, $j = 1, \ldots, k$. If all these $k$ bits are 1, then $y$ is in $S$ with probability $(1 - P_{fp})$, where $P_{fp}$ is the false positive rate. Otherwise $y$ is definitely not in $S$. The false positive rate can be made sufficiently small to be acceptable for many applications. The following three theorems describe some properties of the Bloom filter, and their proofs are in [9] and [10].

Theorem 3.1.1 (False positive rate) Given an $m$-bit Bloom filter with $k$ perfectly random hash functions, the false positive rate of this Bloom filter when representing a set of $n$ elements is:

$$P_{fp} = \left(1 - \left(1 - \frac{1}{m}\right)^{kn}\right)^{k} \approx \left(1 - e^{-kn/m}\right)^{k}$$

Theorem 3.1.2 (Fraction of zero bits) Given the number of elements $n$, the Bloom filter size $m$, and the optimum number of hash functions $k$ such that the resulting false positive rate is minimum, the expected fraction of zero bits in the Bloom filter is $e^{-kn/m} \approx \frac{1}{2}$.

Theorem 3.1.3 (Union of Bloom filters) Given two sets of elements $S_1$ and $S_2$ and two Bloom filters $B_1$ and $B_2$, representing these two sets respectively, with the same number of bits and the same set of hash functions, the Bloom filter representing the union of these two sets $S_1 \cup S_2$ can be obtained by taking the bit-wise OR of $B_1$ and $B_2$.

Counting Bloom filter. Traditional Bloom filters support insertion of elements, but not deletion. To support deletion of elements, Fan et al.
proposed the counting Bloom filter [20], in which an integer counter is used instead of a single bit for each Bloom filter entry. The corresponding counters increase by one when an element is inserted, and decrease by one when an element is removed. In practice, 4 bits per counter should suffice for most applications [20]. To convert a counting Bloom filter to a traditional Bloom filter, simply set the bits corresponding to nonzero counters to one.

3.1.2 Algorithm Design

The basic idea behind our VABF protocol is that each node maintains a set of Bloom filters organized in a multi-level structure, where filters at level $h$ of the node represent the set of objects shared by the nodes within $h$-hop distance. Each node periodically broadcasts its own set of Bloom filters to its 1-hop neighbors for update. The detailed procedures are described below.

3.1.2.1 Bloom filter set maintenance

Each VABF node maintains the multi-level filters and periodically broadcasts them to its 1-hop neighbors. The multi-level filter is defined as follows:

Definition 3.1.1 (Multi-level filters) The level-0 filter of node $i$, denoted by $B^0_i$, is the Bloom filter which represents the objects shared by node $i$ alone. The level-$h$ filter of node $i$, $h \geq 1$, denoted by $B^h_i$, is obtained by taking the bit-wise OR of node $i$'s level-$(h-1)$ filter $B^{h-1}_i$ and its 1-hop neighbors' level-$(h-1)$ filters $B^{h-1}_j$.

Initially each VABF node maintains a counting Bloom filter to represent its own shared objects. The counting Bloom filter is then converted to a traditional Bloom filter, which is its level-0 filter. The node also sets an update timer to periodically trigger the broadcast of its own filters to its 1-hop neighbors. Before the next broadcast, the node collects level-$h$ filters, $h \geq 0$, from its neighbors, caches them, and makes its own higher level-$(h+1)$ filters. The making of higher level filters stops when its current highest level filter contains all the filters received from its neighbors.
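The level-building step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis implementation: filters are modelled as Python integers used as bit sets, and the names `contains` and `make_levels`, as well as the layout of the `cached` dictionary (neighbor identifier mapped to that neighbor's list of filters, indexed by level), are hypothetical.

```python
from typing import Dict, List


def contains(outer: int, inner: int) -> bool:
    """True if every 1 bit of `inner` is also set in `outer`."""
    return outer & inner == inner


def make_levels(level0: int, cached: Dict[str, List[int]]) -> List[int]:
    """Build this node's multi-level filters from its own level-0 filter
    and the filters most recently received from its 1-hop neighbors."""
    levels = [level0]
    h = 0
    while True:
        # Level-h filters received from neighbors (if they have that level).
        incoming = [f[h] for f in cached.values() if len(f) > h]
        # Keep only filters that add bits not yet covered by our level-h filter.
        fresh = [b for b in incoming if not contains(levels[h], b)]
        if not fresh:
            break  # highest level already contains everything received
        nxt = levels[h]
        for b in fresh:
            nxt |= b  # bit-wise OR, as in Definition 3.1.1
        levels.append(nxt)
        h += 1
    return levels
```

For three nodes in a line A, B, C with level-0 filters 0b001, 0b010 and 0b100, node B builds two levels after hearing from A and C. Node A, once it has received both of B's filters, builds three levels, so its highest level equals its hop distance to C.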
At the next broadcast these filters are again broadcast to its neighbors. The level of filters at each node increases as the above procedure goes on until the propagation of Bloom filters reaches a steady state in which the highest level filters at all nodes are identical; this is called the synchronized state.

Theorem 3.1.4 (The synchronized highest level filters) Let $B^{H_i}_i$ be the highest level filter of node $i$ in the synchronized state, and let $l_{i,j}$ be the number of hops on the shortest path between node $i$ and node $j$, $j \neq i$. Then

$$H_i \geq l_{i,j} \quad \forall j \neq i$$

Proof: Assume that there exists a node $j$ which is $x$ hops away from node $i$ along the shortest path, with $x = H_i + 1$. Let $i_1, i_2, \ldots, i_{x-1}$ be the nodes along the shortest path between node $i$ and node $j$, with node $i_1$ being closest to node $i$ and node $i_{x-1}$ being closest to node $j$, as illustrated in Fig. 3.1. According to the VABF protocol, in the synchronized state, the level-0 filter of node $j$, $B^0_j$, must have been propagated to node $i_{x-1}$, which must have triggered node $i_{x-1}$ to make its level-1 filter, $B^1_{i_{x-1}}$. Likewise, node $i_{x-1}$ must have broadcast $B^1_{i_{x-1}}$ to node $i_{x-2}$ and triggered it to make $B^2_{i_{x-2}}$. The process continues until node $i_1$ received $B^{x-2}_{i_2}$, made its level-$(x-1)$ filter $B^{x-1}_{i_1}$, and broadcast it to node $i$. Therefore node $i$, on receiving $B^{x-1}_{i_1}$ from node $i_1$, must have made its level-$x$ filter $B^x_i$. However, $x = H_i + 1 > H_i$, which contradicts the claim that $B^{H_i}_i$ is the highest level filter of node $i$. Therefore, such a node $j$ does not exist.

Figure 3.1: The scenario in which node j is x hops away from node i.

3.1.2.2 Query processing

The set of Bloom filters maintained by a VABF node $i$ can be organized into a $(d+1)$-ary tree, where $d$ is the degree of this node, i.e. the number of neighbors, as shown in Fig. 3.2. The tree root is the level-0 filter of this node, and the leftmost node at level $h$ is its level-$h$ filter. The remaining leaves at level $h$ are level-$(h-1)$ filters received from neighbors of this node.
According to Theorem 3.1.4, the height of this tree $H_i$ is equal to the maximum number of hops between this node and any other node, along the shortest paths, in the network. When a query arrives, the Bloom filter of this query $B_Q$ is first checked against the tree root $B^0$. If $B^0$ contains $B_Q$, this node has the queried object with high probability. Otherwise, it performs a depth-first search on this tree until a filter which contains $B_Q$ is found, say $B^h$; it then switches to a breadth-first search until a filter which contains $B_Q$ is found, say $B^{h-1}_j$, $j \neq i$. In the ideal case of no false positives, the breadth-first search must locate such a filter. The query is then forwarded to the neighbor node $j$. However, if no filter containing $B_Q$ is found throughout the search process, we conclude that the queried object does not exist in the network.

Figure 3.2: A (d+1)-ary tree of multi-level filters

3.1.2.3 Bandwidth saving

Due to the constraints on wireless channels, bandwidth saving is important in MANETs. To reduce the bandwidth consumed by Bloom filter propagation, the arithmetic coding algorithm [61] is used in VABF to compress the multi-level filters of a node before they are broadcast to the neighbors. Since the compression ratio of arithmetic coding is determined by the entropy of the input string, the lower the entropy, the higher the compression ratio. Therefore, VABF performs delta compression on the multi-level filters to reduce the entropy before arithmetic coding. Let $B = \{B^0, B^1, \ldots, B^h\}$ be the set of filters to be broadcast. The delta compression eliminates the duplicated 1 bits in these filters by taking the bit-wise exclusive-OR of each pair of adjacent levels to get a new set of filters $B' = \{B^0, B^{1\prime}, \ldots, B^{h\prime}\}$, where $B^{h\prime} = B^{h-1} \oplus B^h$, $h \geq 1$. The input to the arithmetic encoder is then formed by concatenating the filters in $B'$ into a single bit string.
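The delta step can be sketched in a few lines. The following is an illustrative sketch only: filters are again modelled as Python integers, the function names are hypothetical, and the arithmetic encoder itself is omitted. The `binary_entropy` helper is the standard binary entropy function, used here as a proxy for how compressible the concatenated bit string is.

```python
import math
from typing import List


def delta_compress(filters: List[int]) -> List[int]:
    """Keep B^0 and replace each higher level by B^{h-1} XOR B^h."""
    return [filters[0]] + [
        filters[h - 1] ^ filters[h] for h in range(1, len(filters))
    ]


def delta_restore(deltas: List[int]) -> List[int]:
    """Invert delta_compress by XOR-ing each delta onto the previous level."""
    restored = [deltas[0]]
    for d in deltas[1:]:
        restored.append(restored[-1] ^ d)
    return restored


def zero_fraction(filters: List[int], m: int) -> float:
    """Fraction of 0 bits over all filters, each m bits long."""
    ones = sum(bin(f).count("1") for f in filters)
    return 1.0 - ones / (m * len(filters))


def binary_entropy(p: float) -> float:
    """Entropy in bits per symbol of a binary source with P(0) = p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)
```

For the 8-bit filters 0b00000001, 0b00000011 and 0b00000111, the deltas are 0b001, 0b010 and 0b100. The fraction of zero bits rises from 0.75 to 0.875, so the per-bit entropy falls, and the receiver can recover the original filters with `delta_restore`. Because each level contains the level below it, the exclusive-OR keeps exactly the bits that are new at that level.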
The output of the arithmetic encoder, which is smaller than the concatenation of the original multi-level filters, is then broadcast to the 1-hop neighbors.

Theorem 3.1.5 (Entropy of filters after delta compression) The multi-level filters in VABF have a lower entropy value after applying the delta compression.

Proof: Recall that in VABF, the Bloom filter size $m$ is selected to accommodate the maximum possible number of shared objects in the network. Therefore, based on Theorem 3.1.2, with the optimum number of hash functions $k$, the fraction of zero bits in the highest level Bloom filter at each node is greater than or equal to $1/2$, and hence the fraction of zero bits in the lower level Bloom filters is also greater than or equal to $1/2$. Furthermore, applying delta compression to these Bloom filters can only increase the fraction of zero bits, since it removes from the higher level filters the 1 bits which are already in the lower level filters. According to the binary entropy function $H(p) = -p \log_2 p - (1-p) \log_2 (1-p)$, where $p$ is the fraction of zero bits, the entropy value is 1 when $p = 1/2$, and the entropy decreases as $p$ increases beyond $1/2$.

3.1.2.4 Dynamic Bloom filter size adjustment

When a VABF node joins the P2P network, it listens to the network for any Bloom filter update message and learns the Bloom filter size currently adopted by other peers in the network. If no message is heard after a period of time, it selects a size according to the number of its own shared objects. The selected size should be as large as possible to accommodate a possible increase in the number of shared objects in the future, either from itself or from other newly joined peers.

However, the node may not agree with the Bloom filter size it learned from other peers, e.g. the node may share many more objects than other peers, more than the current filter size can accommodate. Moreover, the total number of shared objects in the network is unknown and changes from time to time as peers join and leave.
The initially selected Bloom filter size may not be able to accommodate all the shared objects in the future as the P2P network grows. In such a case, the high false positive rate will lead to significant performance degradation in a query routing protocol. Therefore, VABF provides a means for each node to negotiate the Bloom filter size dynamically in a distributed manner.

The basic concept is that each VABF node constantly monitors the number of 1 bits in its own highest level filter and keeps the fraction of 1 bits in an acceptable range. Let $B^{H_i}_i$ be the highest level filter of node $i$ (level-$H_i$ filter), and $m^1_i$ be the fraction of 1 bits in $B^{H_i}_i$. According to Theorem 3.1.2, the optimum Bloom filter size with minimum false positive rate has a fraction of 1 bits very close to $\frac{1}{2}$. Therefore, the goal of the dynamic filter size adjustment is to maintain an acceptable false positive rate by keeping $m^1_i$ close to $\frac{1}{2}$.

Here is how the size adjustment works. When node $i$ finds that $m^1_i$ has exceeded the threshold, or that the estimated false positive rate has exceeded its acceptable value, for the last $H_i T$ seconds (where $T$ is its broadcast period), it appends a request to change (RTC) message, which contains its newly selected Bloom filter size, to its next broadcast message. (The selection of thresholds, the new filter size and how to estimate the false positive rate will be discussed in Section 3.2.2.) Node $i$ then waits for another $H_i T$ seconds. If another peer node $j$ receives the RTC message and agrees with the new size, it leaves the RTC untouched and forwards it in its next broadcast. Otherwise, it overwrites the RTC with its own, carrying the new size it selects, before forwarding it. If node $i$ did not receive any RTC with a different size during the last $H_i T$ seconds, it starts to make its Bloom filters of the new size. Otherwise, the size negotiation continues for another $H_i T$ seconds.
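The check that starts a negotiation can be sketched numerically. The following Python functions are a hedged illustration, not the protocol code: the function names are hypothetical, and the formulas are the estimate $P_{fp} = (m_1/m)^k$ and conditions (3.1) and (3.2) derived in Section 3.2.2, where $m_1$ is the number of 1 bits in the highest level filter, $m$ the current filter size, $k$ the number of hash functions and $\rho$ the target false positive rate.

```python
import math


def estimated_fp_rate(m1: int, m: int, k: int) -> float:
    """Estimate the false positive rate from the observed number of 1 bits."""
    return (m1 / m) ** k


def needs_resize(m1: int, m: int, k: int, rho: float) -> bool:
    """Condition (3.1): the fraction of 1 bits exceeds min(1/2, rho^(1/k))."""
    return m1 / m > min(0.5, rho ** (1.0 / k))


def minimum_new_size(m1: int, k: int, rho: float) -> int:
    """Condition (3.2): smallest size m' with (m1/m')^k <= rho and m' >= 2*m1."""
    return math.ceil(max(m1 * rho ** (-1.0 / k), 2 * m1))
```

For example, with m = 10000 bits, k = 7 and a target rate of 1%, the threshold in (3.1) is 1/2, so a filter with 6000 one bits triggers a request to change while one with 4000 does not. The smallest admissible new size for m1 = 6000 is 12000 bits, and a node would normally pick something larger to leave headroom.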
Algorithms 1 to 6 illustrate the functionality of the VABF protocol in pseudocode, including node initialization and routines, routing table maintenance, Bloom filter size checking, and incoming message processing. The global variables used by the procedures in these algorithms are listed in Table 3.1.

Table 3.1: Global variables for the VABF node procedures

Type      Variable name            Description
boolean   negotiatingFilterSize    The state of size negotiation
integer   newFilterSize            The new filter size to negotiate
integer   currentFilterSize        The filter size being used
integer   maxLevel                 The maximum level among all filters of this node
time_t    T                        Periodic broadcast time interval
time_t    timeToEndNegotiation     Time to end size negotiation and switch to the new size

Algorithm 1 Node initialization and routines

Procedure NodeStart()
    Initialize()
    GoToWork()

Procedure Initialize()
    negotiatingFilterSize := TRUE
    newFilterSize := -1
    currentFilterSize := DEFAULT_FILTER_SIZE
    maxLevel := 0
    Set up the update timer with period T
    timeToEndNegotiation := CURRENT_TIME + T
    while CURRENT_TIME ≤ timeToEndNegotiation do
        if received the broadcast Bloom filters from a neighbor then
            currentFilterSize := the size used by the received Bloom filters
            break
        end if
    end while
    negotiatingFilterSize := FALSE

Procedure GoToWork()
    while TRUE do
        MakeFilters()
        CheckFilterEfficiency()
        BroadcastFilters()
        Wait and listen for T seconds
    end while

Algorithm 2 Bloom filter size checking

Procedure CheckFilterEfficiency()
    Let r be the ratio of 1 bits in my level-maxLevel filter $B^{maxLevel}$
    if r has exceeded the threshold for the last (maxLevel × T) seconds then
        Calculate the new Bloom filter size and assign the result to newFilterSize
        negotiatingFilterSize := TRUE
        timeToEndNegotiation := CURRENT_TIME + (maxLevel × T)
    end if

Algorithm 3 Routing tables maintenance

Procedure MakeFilters()
    if negotiatingFilterSize = TRUE then
        if timeToEndNegotiation ≤ CURRENT_TIME then
            negotiatingFilterSize := FALSE
            currentFilterSize := newFilterSize
        end if
    end if
    Make my level-0 filter $B^0$ of size currentFilterSize
    h := 0
    while at least one level-h filter $B^h_i$ is in my cache do
        $B^{h+1}$ := $B^h$
        for all cached level-h filters $B^h_i$ do
            if $B^h$ contains $B^h_i$ then
                Drop $B^h_i$
            else
                $B^{h+1}$ := $B^{h+1} \cup B^h_i$
            end if
        end for
        h := h + 1
        maxLevel := h
    end while

Procedure BroadcastFilters()
    String Tx := EncodeMyFilters()
    if negotiatingFilterSize = TRUE then
        Append to Tx an RTC message of the value newFilterSize
    end if
    Broadcast Tx to all my 1-hop neighbors

Procedure EncodeMyFilters()
    String S := DeltaCompression(my filters)
    S := ArithmeticEncoding(S)
    return S

Algorithm 4 Incoming message processing

Procedure ProcessMessage(CQRMessage msg)
    if msg is a query message then
        ProcessQuery(msg.queryString)
    else if msg is a Bloom filter update message then
        Extract the neighbor ID and its Bloom filters from msg
        Cache the neighbor ID and its Bloom filters
        if a RequestToChange message is embedded then
            ProcessRequestToChange(msg.RequestToChange.size)
        end if
    end if

Algorithm 5 Incoming message processing: query processing

Procedure ProcessQuery(string Q)
    Make the Bloom filter $B_Q$ from Q
    for all my level-h filters $B^h$ do
        if $B^h$ contains $B_Q$ then
            if h = 0 then
                return query success
            else
                for all cached level-(h-1) filters $B^{h-1}_i$ do
                    if $B^{h-1}_i$ contains $B_Q$ then
                        forward Q to node i
                        return query success
                    end if
                end for
            end if
        end if
    end for
    return query not found

Algorithm 6 Incoming message processing: RTC processing

Procedure ProcessRequestToChange(int size)
    if negotiatingFilterSize = TRUE then
        if newFilterSize ≠ size then
            if size is acceptable then
                newFilterSize := size
                timeToEndNegotiation := CURRENT_TIME + maxLevel × T
            end if
        end if
    else
        if size ≠ currentFilterSize then
            if size is acceptable then
                newFilterSize := size
            else
                Calculate the new Bloom filter size and assign the result to newFilterSize
            end if
            negotiatingFilterSize := TRUE
            timeToEndNegotiation := CURRENT_TIME + maxLevel × T
        end if
    end if

Table 3.2: Symbol definitions

Symbol        Description
$n$           The number of distinct objects shared by all mobile nodes.
$m$           The number of bits of conventional Bloom filters, and also the number of counters used by counting Bloom filters.
$k$           The number of hash functions used by both conventional Bloom filters and counting Bloom filters.
$N$           The set of all mobile nodes in the network.
$H_i$, $H$    The maximum number of hops from node $i$ to all other nodes, along the shortest path, in the network. $H = \max\{H_i\}$.
$D$           The average node degree.
$d_i$         The degree of node $i$, i.e. the number of node $i$'s neighbors.
$B^h_i$       The level-$h$ Bloom filter of node $i$, which represents the objects shared by all nodes within $h$ hops from node $i$, including the objects shared by node $i$ itself. $B^0_i$ represents the objects shared by node $i$ itself.

3.2 Performance Analysis

This section presents the analysis of the performance and overheads of the VABF protocol. Symbols used in the following analysis are listed in Table 3.2.

3.2.1 Space Complexity

Space complexity is defined as the memory space required by a VABF node for maintaining its multi-level filters, which is given by the following theorem.

Theorem 3.2.1 (Space complexity in VABF) The space complexity of a VABF node is $O(m \cdot H \cdot D)$, where $D$ is the average node degree.

Proof: A VABF node needs to keep a counting Bloom filter for its shared objects, and caches all its neighbors' Bloom filters and node identifications. Let $b$ be the number of bits per counter used in the counting Bloom filter, and $l_{nid}$ be the length of a node identification in bits.
Therefore, the space complexity $S$ of an arbitrary VABF node $i$ is:

$$S \leq mb + m H_i d_i + l_{nid} d_i = O(m \cdot H \cdot D)$$

When the size of the Bloom filter and the network territory are fixed, according to Theorem 3.2.1, the space complexity of a VABF node is proportional only to the number of its neighbors.

3.2.2 The Impact of Dynamic Bloom Filter Size Adjustment

The size of the Bloom filter dominates the false positive rate and hence the query success rate in VABF. Therefore, VABF provides a way to dynamically adjust the Bloom filter size in a distributed manner when it detects that the size currently in use is inappropriate. The size adjustment should follow three principles as described below:

• Maintain an acceptable false positive rate, since the false positive rate dominates the performance of query in VABF.

• Keep the fraction of 0 bits in any Bloom filter greater than or equal to $\frac{1}{2}$. A decrease in 0 bits implies an increase in shared objects in the network and possibly the need to increase the filter size. Moreover, a filter string in which fewer than half of the bits are 0 may lead to a poor compression ratio in the arithmetic encoding and hence higher bandwidth overhead.

• Adjust only when necessary. A change of the Bloom filter size breaks the synchronization of routing tables among peers, and causes ripples of Bloom filter updates to propagate. The performance of query routing therefore drops significantly until all peers reach another synchronized state, so the dynamic size adjustment should be triggered only when necessary.

3.2.2.1 Time to trigger the size adjustment

The size adjustment should be triggered only when (1) the increased number of shared objects results in an unacceptable false positive rate or a Bloom filter having a fraction of 1 bits greater than one-half, or (2) the number of shared objects decreases dramatically.
In the former case, a VABF node $i$ keeps monitoring the fraction of 1 bits in its highest level filter $B_i^H$, and estimates the current false positive rate. Let $m_1$ be the number of 1 bits in $B_i^H$. Since the number of shared objects in the network is unknown, the false positive rate can only be estimated using $m_1$. Recall from Theorems 3.1.1 and 3.1.2 that the false positive rate of a Bloom filter can be expressed as:

$$P_{fp} = \left(1 - \frac{m - m_1}{m}\right)^k = \left(\frac{m_1}{m}\right)^k$$

Letting $\rho$ be the target false positive rate to be maintained, we obtain a necessary and sufficient condition for triggering the size adjustment:

$$\frac{m_1}{m} > \min\left\{\frac{1}{2},\ \rho^{1/k}\right\} \qquad (3.1)$$

In the second case, in which the number of shared objects decreases, the Bloom filters currently in use may become larger than needed and cause extra bandwidth overhead. However, as the number of shared objects in the network decreases, the fraction of 0 bits in the Bloom filter increases, which makes the filters sparser and hence improves the compression ratio when encoding the broadcast messages. Figure 3.3 illustrates the size of a single compressed broadcast message of VABF node $i$ versus the number of shared objects when the Bloom filter is fixed at the maximum size, and compares it with the size obtained when the Bloom filter size is dynamically adjusted according to the number of shared objects. As seen, the size of the compressed broadcast message decreases logarithmically as the number of shared objects decreases, even if the Bloom filter size remains unchanged. Therefore, when the number of shared objects decreases, the bandwidth overhead caused by the oversized Bloom filters is mitigated by the improved compression ratio on these sparse filters, and a leisurely adjustment of the Bloom filters is tolerable.

3.2.2.2 Value of size adjustment

Let $\rho$ be the target false positive rate to be maintained, $m_1$ be the number of 1 bits in the current highest level Bloom filter $B_i^H$ of node $i$, and $m'$ be the new Bloom filter size.
To maintain the desired false positive rate, the following condition should be satisfied:

$$\left(\frac{m_1}{m'}\right)^k \le \rho$$

Therefore, the necessary condition for the new Bloom filter size can be obtained as:

$$m' \ge \max\left\{m_1 \rho^{-1/k},\ 2m_1\right\} \qquad (3.2)$$

To avoid frequent size adjustments, the new filter size should be set to a value larger than the above bound to accommodate fluctuations or a possible increase in the number of shared objects.

Figure 3.3: The size of the compressed broadcast message decreases logarithmically as the number of shared objects decreases even if the Bloom filter size remains unchanged. In this example, $k = 7$, $N = 50$, and $H_i = 5$. The optimal Bloom filter size is calculated as $10n$ (10 times the number of shared objects), and the fixed Bloom filter size is 10000 bits.

3.3 Simulations

The simulations are conducted within ns-2 [4]. The VABF protocol is implemented as a transport-layer agent on top of the AODV network layer agent, and a Gnutella-like P2P client is implemented at the application layer to simulate the behavior of P2P file sharing. Two kinds of MAC layers provided by ns-2 are simulated: the perfect MAC layer, in which there is no collision and no buffering delay, and the IEEE 802.11 MAC with the distributed coordination function (DCF) for wireless LANs. The radio model uses characteristics similar to Lucent's WaveLAN, with 2 Mbps channel capacity, 250 m radio propagation range, and the two-ray ground reflection propagation model as the physical-layer path loss model. Three popular search algorithms adopted by most unstructured P2P systems, namely flooding, random walk, and expanding ring search, are also simulated for performance comparison.

Selection of hash functions. The hash functions used by the Bloom filters determine the false positive rate and hence the search performance and computational overhead in VABF. However, no hash function is perfectly random, and hence a practical implementation of Bloom filters may not perform as well as the theoretical prediction.
Our simulation adopts a modification of the MD5 hash function, which is one of the most popular hash functions for computing checksums. The 128-bit output string from MD5 is divided into several substrings of identical length according to the Bloom filter size to provide sufficient hash values. For example, an $m$-bit Bloom filter needs hash values within $[0, m-1]$, which can be represented by $\lceil \log_2 m \rceil$ bits. Therefore, the MD5 hash can provide $\lfloor 128 / \lceil \log_2 m \rceil \rfloor$ hash values by dividing its 128-bit output into substrings of $\lceil \log_2 m \rceil$ bits each. Fig. 3.4 shows that the false positive rates produced by our adopted hash functions are close to those produced by perfect hash functions.

The VABF protocol is investigated using the following metrics:

3.3.1 Route stretch

The route stretch is defined as the ratio of the hop count of the actual query routing path to the hop count of the topological shortest path. A higher route stretch therefore implies a longer query delay and higher costs in the subsequent file transmissions. The VABF protocol is simulated in mobile networks with different query arrival rates $\lambda$ and is compared with unstructured search algorithms. Recall that the VABF protocol always tries to forward the query along the shortest path to the object holder, which is confirmed by the results shown in Figs. 3.5(a) and 3.5(b). The route stretch in VABF is very close to 1 even when the query traffic is high ($\lambda = 10$). By contrast, both flooding and expanding ring search have a higher route stretch, since flooding causes severe interference and more collisions along the shortest path. The random walk algorithm, as expected, has the longest route path.

Figure 3.4: Performance of the implemented Bloom filters in VABF.

The simulation of route stretch assumes that each queried object has at most one instance in the network, i.e. there is no replica. This setting may be unfavorable to the random walk algorithm because longer routing paths are expected.
However, since VABF always returns the closest instance (as do flooding and expanding ring search if there is no collision), VABF will still outperform random walk in terms of route stretch when there are replicas. Therefore, due to the page limitation, only the simulations without replicas are presented.

3.3.2 Query success rate

The query success rate is defined as the ratio of the number of successful queries received by the object holders to the number of sent queries, with the assumption that all queries are satisfiable. The VABF protocol is simulated in mobile networks with various query arrival rates and broadcast time intervals $T$. The query arrival is modeled as a Poisson process with average arrival rate $\lambda$ being 1 and 10 queries per second. The three unstructured search algorithms are also simulated in the mobile environment for performance comparison.

Figure 3.5: Query route stretch with (a) $\lambda = 1.0$ and (b) $\lambda = 10.0$. [Plots: route stretch versus maximum node speed (m/s) in a 700 m x 700 m network; curves for flooding, expanding ring (TTL_init = 3), random walk (TTL_max = 50), and VABF (T = 3.0).]

Figs. 3.6(a) and 3.6(b) show the results in the mobile environment. As seen, VABF has performance comparable to that of the flooding-based algorithms, and even outperforms them when the query traffic is high ($\lambda = 10$). The degradation of the flooding-based algorithms under high query traffic comes from their redundant traffic, which leads to a higher collision rate. With expanding ring search, the improvement is still quite limited. The random walk algorithm, as expected, has the lowest success rate of all. However, as seen in Fig.
3.7(a) and the discussion later, random walk is more efficient than the flooding-based algorithms in terms of bandwidth/packet overhead under low query traffic.

As seen in both figures, a smaller broadcast interval $T$ improves the query success rate in VABF, especially under high mobility. However, a smaller $T$ introduces more packet transmissions, as shown in Figs. 3.7(a) and 3.7(b), which demonstrates the tradeoff between query performance and bandwidth overhead.

Figure 3.6: Query success rate in the mobile networks with (a) $\lambda = 1.0$ and (b) $\lambda = 10.0$. [Plots: query success rate (%) versus maximum node speed (m/s) in a 700 m x 700 m network; curves for flooding, expanding ring (TTL_init = 3), random walk (TTL_max = 50), VABF (T = 1.0), and VABF (T = 3.0).]

3.3.3 Normalized Bandwidth Overhead

The normalized bandwidth overhead is defined as the ratio of the number of packets sent per successful query to the query success rate. These packets include all packets sent during the query process: the query messages, the route discovery/reply messages, and the periodic broadcast messages in VABF. The VABF protocol is simulated with different broadcast time intervals and is compared with the three unstructured search algorithms. Figs. 3.7(a) and 3.7(b) show the results with the query arrival rate $\lambda$ being 1 and 10, respectively. As seen, the flooding-based algorithms have the highest packet overhead in both the low-traffic and high-traffic scenarios. The random walk algorithm introduces fewer packets than the flooding-based algorithms. In all cases, VABF introduces the lowest bandwidth overhead in terms of the number of packets, especially under high query traffic volume.
3.3.4 Impact of Bloom Filter Size Adjustment

A continually increasing number of shared files in the network raises the false positive rate of fixed-length Bloom filters and hence degrades the performance of VABF. Fig. 3.8 shows the impact of an increasing number of shared files on the query routing in VABF with and without dynamic Bloom filter size adjustment. The simulation is done in two different networks to demonstrate the impact of different network diameters. In the first network, there are 50 nodes uniformly distributed within a 700m-by-700m territory (Fig. 3.8(a)), while in the second network, 75 nodes are uniformly distributed within an 860m-by-860m territory¹ (Fig. 3.8(b)). Both the file arrival and the query arrival are simulated as Poisson processes. The average file arrival rate ranges from 0.1 files per second to 1.0 files per second, and the average query arrival rate is fixed at 0.1 queries per second.

Figure 3.7: Normalized bandwidth overhead when (a) $\lambda = 1.0$ and (b) $\lambda = 10.0$. [Plots: number of packets per successful query versus maximum node speed (m/s) in a 700 m x 700 m network; curves for flooding, expanding ring (TTL_init = 3), random walk (TTL_max = 50), VABF (T = 1.0), and VABF (T = 3.0).]

The threshold that triggers the size adjustment is given by (3.1), derived in Section 3.2.2.1. After the size adjustment is triggered, each VABF node decides its new filter size $m'$ according to the following formula:

$$m' = \begin{cases} m_1 \left(\frac{\rho}{2}\right)^{-1/k} & \text{if } m_1 \left(\frac{\rho}{2}\right)^{-1/k} > 2m_1 \\ 3m_1 & \text{otherwise} \end{cases} \qquad (3.3)$$

where $m_1$ is the current number of 1 bits in its highest level filter, and $\rho$ is the target false positive rate.
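As a rough illustration, the trigger condition (3.1) and the sizing rule (3.3) can be sketched in a few lines of Python. The function names and the rounding of $m'$ up to a whole number of bits are assumptions made for this sketch; they are not taken from the simulated implementation.

```python
import math


def needs_resize(m1: int, m: int, rho: float, k: int) -> bool:
    """Trigger condition (3.1): the fraction of 1 bits exceeds min(1/2, rho^(1/k))."""
    return m1 / m > min(0.5, rho ** (1.0 / k))


def next_filter_size(m1: int, rho: float, k: int) -> int:
    """New filter size m' following (3.3).

    Rounding up to an integer number of bits is an assumption of this sketch.
    """
    candidate = m1 * (rho / 2.0) ** (-1.0 / k)
    if candidate > 2 * m1:
        return math.ceil(candidate)
    return 3 * m1


if __name__ == "__main__":
    m, k, rho = 1000, 7, 0.01
    for m1 in (400, 600):
        if needs_resize(m1, m, rho, k):
            print(m1, "-> resize to", next_filter_size(m1, rho, k))
        else:
            print(m1, "-> keep current size")
```

Using $\rho/2$ rather than $\rho$ in the first branch, and $3m_1$ rather than $2m_1$ in the second, gives the margin above the bound (3.2).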
The size decided by (3.3) satisfies the necessary condition for the new filter size given by (3.2) in Section 3.2.2.2, and also leaves a margin to accommodate fluctuations or a possible increase in the number of shared objects.

As seen in Fig. 3.8, without dynamic size adjustment (the curves marked "without DSA") the query success rate decays quickly as the file arrival rate increases. With dynamic size adjustment enabled, the query success rate remains almost constant in both scenarios, and in both static and mobile environments. However, compared with Fig. 3.6, in which the number of shared files remains constant, the average query success rate is about 10 percent lower. This drop occurs because a change of the Bloom filter size breaks the synchronized state among all nodes, and queries that arrive before the system reaches the next synchronized state will probably be dropped due to incomplete routing tables.

¹ Due to the limitation of the ns simulator and our machines, we only simulated up to 75 nodes. To maintain the same node density as in the 50-node scenario, the network territory in the 75-node scenario is expanded to 860 m by 860 m.

Figure 3.8: The impact of increasing shared files and dynamic Bloom filter size adjustment on the query success rate. VABF is running in networks of (a) a 700m-by-700m territory with 50 nodes, and (b) an 860m-by-860m territory with 75 nodes.

Chapter 4
Evaluation of Unstructured P2P Content Discovery Techniques over MANETs

There are several obstacles to popularizing MANETs in the commercial world. One is their poor performance in multihop communications. Another is the lack of suitable applications from the user's perspective. While deploying a distributed file system on a large-scale MANET is not foreseeable in the near future, a moderate-scale MANET supporting peer-to-peer (P2P) file sharing among a group of end users appears to be practical and meaningful [15].
Several conceivable scenarios include: 1) P2P gaming among mobile devices, 2) sharing multimedia files in a conference room among laptops and PDAs, 3) exchanging contact information, ring tones and audio/video clips among smartphones, and 4) auto collision avoidance signal transmission in VANETs.

This chapter presents a comprehensive performance comparison of several unstructured P2P content discovery techniques for MANETs through mathematical analysis and computer simulation. The P2P content discovery techniques under study include query flooding, expanding ring search, random walk and Bloom filter (BF)-based probabilistic routing. Each technique is evaluated by its query success rate, route stretch and search cost. Our research objective is to provide a thorough understanding of the impact of the wireless ad hoc environment on the behavior of P2P content discovery techniques. This knowledge will benefit the development of proper end-user applications in MANETs.

Some of our main results in this chapter are summarized below.

• The path probability, which is the probability that two randomly selected nodes in an ad hoc network are connected by at least one path, serves as an upper bound on the query success rate for all content discovery techniques.

• The BF-based probabilistic routing scheme outperforms the flooding-based and random walk schemes. The only disadvantage of BF-based probabilistic routing is that the control packet size is much larger, and it increases as the number of nodes or shared objects increases.

• The node density has a great impact on the route stretch of the flooding-based and random walk schemes, but little impact on BF-based probabilistic routing.

• Node mobility has very limited impact on the query route stretch and the search cost.

• Random walk is not resilient to node mobility as compared with the flooding-based and BF-based probabilistic routing schemes.

• Node mobility facilitates the propagation of content updates in BF-based probabilistic routing.
• Our theoretical analysis is well corroborated by computer simulation in most cases.

4.1 Content Discovery Techniques for MANETs

In this section, we review several unstructured P2P content discovery techniques developed for the Internet setting and discuss how to apply them in the context of MANETs.

4.1.1 Query Flooding

Flooding query messages to the entire network is the simplest way to discover content. It has been used in unstructured P2P systems, e.g., the early version of Gnutella [53]. Flooding demands little maintenance overhead. Besides, it is robust to the dynamics of the network topology. However, flooding in a wireless environment introduces broadcast storms, which result in severe packet collision and loss. Hence, the overall system performance degrades.

4.1.2 Expanding Ring

Expanding ring aims to reduce the flooding overhead by limiting the flooding area. In this scheme, query messages are flooded to a successively larger area until one of them hits the content owner. This is done at the querist by increasing the time-to-live (TTL) value of query messages until it receives a query reply. Intuitively, the expanding ring search prevents unnecessary broadcast traffic when the content owner is close to the querist. The Ad hoc On-Demand Distance Vector Routing (AODV) [42] uses the expanding ring scheme in its route discovery phase.

4.1.3 Random Walk

Chawathe et al. [12] proposed a random walk content discovery scheme to replace flooding, with the objective of making Gnutella-like systems scalable. They used a biased random walk scheme in selecting the next peer for query message forwarding. In structured P2P networks, the random walk scheme has been shown by Gkantsidis et al. [24] to outperform flooding if the P2P topology is clustered and/or queried contents are highly replicated. A simple way to implement random walk in MANETs is to select an unvisited one-hop neighbor randomly for query message forwarding. This is the approach used in this work for analysis and simulations.
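The simple random walk described above (forward to a randomly chosen unvisited one-hop neighbor) might look as follows. This is a minimal sketch, not the simulated implementation: the adjacency-dictionary representation, the function name, and the choice to abort when no unvisited neighbor remains are assumptions made for illustration.

```python
import random
from typing import Dict, Hashable, List, Optional


def random_walk_query(
    adj: Dict[Hashable, List[Hashable]],
    source: Hashable,
    owner: Hashable,
    ttl_max: int,
    rng: Optional[random.Random] = None,
) -> Optional[List[Hashable]]:
    """Walk from source toward owner, never revisiting a node.

    Returns the list of visited nodes if the owner is reached within
    ttl_max hops; returns None if the TTL is exhausted or the walker
    has no unvisited neighbor left (assumed failure behaviour).
    """
    rng = rng or random.Random()
    path = [source]
    visited = {source}
    while path[-1] != owner:
        if len(path) - 1 >= ttl_max:
            return None  # TTL exhausted before reaching the owner
        candidates = [v for v in adj[path[-1]] if v not in visited]
        if not candidates:
            return None  # dead end: every one-hop neighbor was visited
        nxt = rng.choice(candidates)
        visited.add(nxt)
        path.append(nxt)
    return path


if __name__ == "__main__":
    # A 4-node chain: 0 - 1 - 2 - 3
    chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
    print(random_walk_query(chain, source=0, owner=3, ttl_max=3))
```

The visited set plays the role of the list of visited nodes carried in the query message.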
4.1.4 BF-based Probabilistic Routing

This scheme utilizes a probabilistic framework to store and propagate the content information of each node. Each node uses the collected information to build its own routing table and routes query messages accordingly. The Bloom filter was first proposed by Bloom in [9]. More recently, several variations such as the attenuated Bloom filter (ABF) [52] and the exponentially decaying Bloom filter (EDBF) [34] were proposed as space-saving data structures for the construction and dissemination of content indices in P2P networks over the Internet. For adaptation to the wireless ad hoc environment, a variation of the attenuated Bloom filter (abbreviated as VABF) is proposed in this work.

The basic idea of VABF is that each node maintains a set of Bloom filters organized in a multi-level structure. The level-$h$ filter represents the set of objects shared by all nodes within the $h$-hop distance from this node. Each VABF node periodically broadcasts its own set of Bloom filters to its one-hop neighbors, and constructs its $(h+1)$-level filter by taking the union of its $h$-level filter and all $h$-level filters collected from neighbors. Then, the level of filters at each node increases as the exchange of filters proceeds until it reaches a steady state. Note that the highest level filter of each node is identical at the steady state, which is called the synchronized state of the VABF network.

The set of Bloom filters maintained by node $i$ in VABF can be organized into a $(d+1)$-ary tree, where $d$ is the degree of this node, i.e. the number of its neighbors. The tree root is the level-0 filter of this node, and the leftmost node at level $h$ is its level-$h$ filter. The remaining nodes at level $h$ are level-$(h-1)$ filters received from its neighbors. To handle a query for content with index $Q$, a Bloom filter $B_Q$ is constructed from $Q$ and then $B_Q$ is compared with the tree root $B_0$. If $B_Q \subset B_0$, this node may have object $Q$ with a certain false positive rate.
Otherwise, VABF performs the depth-first search on this tree until a filter that contains $B_Q$ is found. Then, VABF switches to the breadth-first search until a filter that contains $B_Q$ is found. (Please note that, if $B_Q \subset B_0$, VABF must be able to locate a filter in the breadth-first search when there is no false positive.) The query is then forwarded to the corresponding neighbor node. On the other hand, if no filter containing $B_Q$ is found throughout the search process, we conclude that the object being queried does not exist in the network.

VABF differs from the classic ABF scheme [52] in several aspects. First, due to the small scale of a practical MANET, the content information of each VABF node is propagated through the entire network and, unlike ABF, it does not decay during propagation. Second, the range of the information represented by each Bloom filter is different. A level-$h$ ABF represents the content of nodes which are exactly $h$ hops away, while the level-$h$ VABF filter represents the content of all nodes within the $h$-hop distance. Thus, the highest level filter of a VABF node provides the content information of the entire network. The advantage of VABF is that a query for a non-existing object can be resolved and dropped immediately by checking the highest level filter, which avoids unnecessary wireless transmissions.

4.2 Performance Analysis

The performance of the unstructured P2P content discovery techniques discussed in Sec. 4.1 is evaluated with three metrics: the query success rate, the query route stretch and the search cost. Furthermore, the expected duration for a MANET to reach its synchronized state in VABF is also investigated. In this section, we provide mathematical analysis for the various schemes under the assumptions of no node mobility and no interference. Extensive computer simulations for both static and mobile networks are conducted in Sec. 4.3 to corroborate the analytical results. Uniform popularity of all contents (i.e.
exactly one instance of each content in the MANET of concern) is assumed in our analysis. This assumption may not be favorable to the random walk scheme. However, considering the fact that a MANET is usually formed by a limited number of nodes in an ad hoc manner, uniform popularity appears to be a reasonable choice. Table 4.1 summarizes the notation used in the following analysis.

4.2.1 Query Success Rate

The query success rate is defined as the probability of a query message being successfully delivered to its corresponding content owner. In ad hoc networks, a successful query happens only when there is at least one path, single-hop or multi-hop, between the querist and the content owner. Let $P_{path}$ be the path probability, i.e., the probability that two randomly selected nodes in an ad hoc network are connected via at least one path. Intuitively, $P_{path}$ is the upper bound on the query success rate that a content discovery technique can achieve.

Table 4.1: Notation used
Symbol | Meaning
$r$ | The radio transmission range of a single node.
$n$ | The number of distinct objects shared by all mobile nodes.
$m$ | The number of bits of the Bloom filters in VABF.
$k$ | The number of hash functions used by the Bloom filters in VABF.
$t_{init}$ | The initial TTL value in the expanding ring search.
$t_{max}$ | The maximum TTL value in the random walk or the expanding ring search.
$\mathcal{N}$, $N$ | $\mathcal{N}$ is the set of all mobile nodes in the network. $|\mathcal{N}| = N$.
$H$, $H_i$ | $H_i$ is the maximum number of hops from node $i$ to all other nodes, along the shortest path, in the network. $H$ is the maximum value of $H_i$, $\forall i \in \mathcal{N}$.
$B_i^h$ | The $h$-level Bloom filter of node $i$, which represents the objects shared by all nodes within $h$ hops from node $i$, including objects shared by node $i$ itself. $B_i^0$ represents objects shared by node $i$ itself.

The path probability in one-dimensional (1D) MANETs has been widely studied [22]. However, the path probability analysis in two-dimensional (2D) MANETs still remains an open problem.
Here, we approach this problem by statistical analysis of simulated node distributions on a square region. For each network territory ranging from 700 m by 700 m to 1500 m by 1500 m, we generated at least 1000 different topologies with CMU's scenario generator inside ns-2 [4]. Each network consists of 50 uniformly distributed static nodes with transmission range equal to 250 meters. The path length (in terms of hop counts) between each node pair is averaged. For disconnected nodes, the path length is set to zero. The relative frequency is used to estimate the cumulative distribution function (CDF) of the path length. Fig. 4.1 shows the CDFs of the path length for networks of different sizes.

4.2.1.1 Query Flooding

In query flooding, the query message is simply flooded to the entire network. Thus, the query success rate is equal to the path probability, i.e., $QSR_{flooding} = P_{path}$.

Figure 4.1: CDFs of the path length for networks of different sizes. [Plot: cumulative probability versus number of hops for 50 nodes with r = 250 m; one curve per network size, with legend labels 700 m^2, 800 m^2, ..., 1500 m^2.]

4.2.1.2 Expanding Ring

In the expanding ring search, the querist keeps sending new query messages with an increasing TTL value until it receives a query reply or the maximum TTL value is reached. For this case, the query success rate is defined as the ratio of the number of successful queries to the number of sent queries. Please note that if there is no path to the content owner, or the maximum TTL is not large enough to reach the content owner, the number of successful queries is equal to zero.

Let $\Gamma$ be the random variable representing the link distance between two nodes that are randomly positioned in an $x$-by-$y$ ($x \le y$) rectangular area, and let $F(\gamma) = \Pr\{\Gamma \le \gamma\}$ be the distribution function of $\Gamma$.
According to Miller's work [38], we have

$$
F(\gamma) =
\begin{cases}
0, & \xi < 0 \\[4pt]
\zeta\xi^2\left[\tfrac{1}{2}\zeta\xi^2 - \tfrac{4}{3}\xi(1+\zeta) + \pi\right], & 0 \le \xi < 1 \\[4pt]
\tfrac{2}{3}\zeta(2\xi^2+1)\sqrt{\xi^2-1} - \tfrac{1}{6}\zeta(8\xi^3 + 6\zeta\xi^2 - \zeta) + 2\zeta\xi^2\sin^{-1}\!\left(\tfrac{1}{\xi}\right), & 1 \le \xi < \zeta^{-1} \\[4pt]
\tfrac{2}{3}\zeta(2\xi^2+1)\sqrt{\xi^2-1} - \tfrac{1}{2}\zeta^2\left(\xi^4 + 2\xi^2 - \tfrac{1}{3}\right) + \tfrac{2}{3}(2\zeta^2\xi^2+1)\sqrt{\xi^2-\zeta^{-2}} \\
\quad + \tfrac{1}{6}\zeta^{-2} - \xi^2 + 2\zeta\xi^2\sin^{-1}\!\left(\tfrac{1}{\xi}\right) - 2\zeta\xi^2\cos^{-1}\!\left(\tfrac{1}{\zeta\xi}\right), & \zeta^{-1} \le \xi < \sqrt{1+\zeta^{-2}} \\[4pt]
1, & \sqrt{1+\zeta^{-2}} \le \xi
\end{cases}
\qquad (4.1)
$$

where

$$\xi = \frac{\gamma}{x}, \qquad \zeta = \frac{x}{y} \le 1.$$

Let $h$ be the hop distance between the querist and its corresponding content owner, $n_{sentQ}$ be the number of query messages sent by the querist, $n_{succQ}$ be the number of successful queries, and $\phi$ be $(t_{max} - t_{init} + 1)$. The query success rate in the expanding ring search can be obtained as

$$QSR_{er} = \frac{E[n_{succQ}]}{E[n_{sentQ}]}, \qquad (4.2)$$

where

$$E[n_{succQ}] = \Pr\{\text{successful query}\} \cdot P_{path} = F(r \cdot t_{max}) \cdot P_{path},$$

and

$$
\begin{aligned}
E[n_{sentQ}] &= 1 \cdot \Pr\{h \le t_{init}\} + 2 \cdot \Pr\{t_{init} < h \le t_{init}+1\} + \cdots \\
&\quad + (\phi-1) \cdot \Pr\{t_{max}-2 < h \le t_{max}-1\} + \phi \cdot \Pr\{t_{max}-1 < h\} \\
&= F(r\,t_{init}) + \phi\left(1 - F(r(t_{max}-1))\right) + \sum_{i=2}^{\phi-1} i\left(F(r(t_{init}+i-1)) - F(r(t_{init}+i-2))\right).
\end{aligned}
$$

4.2.1.3 Random Walk

The result of an $\alpha$-step random walk can be modeled as $\alpha$ independent uniform samplings, as shown by Gkantsidis et al. in [24]. To make the scheme more efficient, a list of visited nodes is appended to the query message so that a random walker will not visit a node more than once. Then, the query success rate of the random walk scheme can be written as

$$QSR_{rw} = \left(1 - \prod_{i=1}^{t_{max}}\left(1 - \frac{1}{N-i}\right)\right) \cdot P_{path} = \frac{t_{max} \cdot P_{path}}{N-1}. \qquad (4.3)$$

4.2.1.4 VABF

Since false positives exist in the Bloom filter, a node whose level-0 Bloom filter contains the query may not have the object being queried. Similarly, a node whose level-$h$ ($h > 0$) Bloom filter contains the query may not be able to locate a neighbor node to forward this query to, since the union of all neighbors' level-$(h-1)$ Bloom filters may have a false positive, too. When this occurs, Bloom filters of higher levels also inherit this false positive.
Consequently, a query may be misled to a node in which the query cannot be handled even when the object being queried exists somewhere in the network. These situations all lead to query failures and are analyzed below.

Consider the representation of a set of $n$ elements using an $m$-bit Bloom filter with $k$ perfectly random hash functions. The false positive rate can be expressed as [9]

$$FPR = \left(1 - \left(1 - \frac{1}{m}\right)^{kn}\right)^k \approx \left(1 - e^{-kn/m}\right)^k. \qquad (4.4)$$

The query routing process in VABF in a static $x$-by-$y$ network can be modeled by a discrete Markov chain with $(H+2)$ states (where $H$ is defined in Table 4.1), as illustrated in Fig. 4.2. The model assumes that the network is connected and that the propagation of Bloom filters eventually reaches the steady state. Therefore, the highest-level Bloom filter at each node satisfies every query.

Figure 4.2: The Markov chain model of the query routing process in VABF.

Each node in the network can be classified as being in one of these states during the query routing process. A node is in state $h$ ($1 \le h \le H$) if it receives a query that matches its level-$h$ Bloom filter and at least one of its cached level-$(h-1)$ Bloom filters. A node is in state 0 if it receives the query and has the object being queried (i.e., a successful query). A node is in state $F$ if it receives a query, and it can neither satisfy this query nor locate a neighbor to forward this query to (i.e., a failed query). Since a VABF node tries to forward a received query to its neighbor that is one hop closer to the destination, there is no transition from state $h$ to state $(h-\delta)$ with $\delta > 1$, nor from state $h$ to state $(h+\delta)$ with $\delta \ge 1$, as shown in Fig. 4.2. The transition probabilities between adjacent states depend on the false positive rate in each state, which may differ from one state to another, since a different level of Bloom filters represents a different number of objects.
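To give a feel for the magnitudes involved in (4.4), the following sketch evaluates both the exact expression and its exponential approximation. The function names are chosen for illustration, and the parameter values ($k = 7$, $m = 10n$) simply reuse the example setting from the caption of Figure 3.3.

```python
import math


def fpr_exact(m: int, k: int, n: int) -> float:
    """Left-hand form of (4.4): (1 - (1 - 1/m)^(k*n))^k."""
    return (1.0 - (1.0 - 1.0 / m) ** (k * n)) ** k


def fpr_approx(m: int, k: int, n: int) -> float:
    """Right-hand form of (4.4): (1 - exp(-k*n/m))^k."""
    return (1.0 - math.exp(-k * n / m)) ** k


if __name__ == "__main__":
    k = 7
    for n in (100, 500, 1000):
        m = 10 * n  # filter sized at 10 bits per shared object
        print(n, m, fpr_exact(m, k, n), fpr_approx(m, k, n))
```

For these settings both forms give a false positive rate a little below 1 percent, and the two expressions agree closely once $m$ is in the thousands.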
Let $p_h$ be the false positive rate in state $h$, $Q_S$ be the event that a query is successfully forwarded to its corresponding content owner, and $Q_h$ be the event that the query is initiated in state $h$, where $h \in \{0, \ldots, H, F\}$ and $Q_0 = Q_F = 0$. Then, the conditional probability $P(Q_S \mid Q_h)$, which is nothing but the $h$-step transition probability from state $h$ to state 0, can be written as

$$P(Q_S \mid Q_h) = \prod_{j=1}^{h}(1 - p_j).$$

Then, the unconditional probability of $Q_S$ can be obtained via

$$P(Q_S) = \sum_{h=1}^{H} P(Q_S \mid Q_h)\,P(Q_h). \qquad (4.5)$$

Distribution of initial query states. Let $r$ be the radio transmission range of a mobile node. Since a query for a random object is initiated by a node chosen uniformly at random, the probability that a query originates in state $h$, $P(Q_h)$, can be approximated as the probability that the link distance of two randomly selected nodes is less than or equal to $hr$ but greater than $(h-1)r$, which can be derived as

$$P(Q_h) = F(hr) - F((h-1)r). \qquad (4.6)$$

False positive rate in each state. A node in state $h$ indicates that its level-$h$ Bloom filter contains the received query. Recall that the level-$h$ Bloom filter in VABF represents all objects shared by nodes which are at most $h$ hops away from this node. Let $n_h$ be the number of distinct objects shared by nodes within the $h$-hop distance. Assume that there is no border effect and that all $n$ distinct shared objects are uniformly distributed on $N$ nodes within an $x$-by-$y$ network territory. According to (4.1), $n_h$ can be estimated to be $n \cdot F(hr)$. Then, the false positive rate in state $h$ can be obtained from (4.4) as

$$FPR(h) = \left(1 - \left(1 - \frac{1}{m}\right)^{k n F(hr)}\right)^k \approx \left(1 - e^{-k n F(hr)/m}\right)^k. \qquad (4.7)$$

Finally, based on (4.5)-(4.7), the query success rate of VABF can be computed as

$$QSR_{VABF} = P(Q_S) = \sum_{h=1}^{H}\left(\prod_{j=1}^{h}\left(1 - FPR(j)\right)\right)\left(F(hr) - F((h-1)r)\right). \qquad (4.8)$$

4.2.2 Query Route Stretch

The query route stretch is defined as the ratio of the hop count of the actual query routing path to that of the topological shortest path. Generally speaking, a larger route stretch implies a longer query response time.
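Looking back at the VABF query success rate of Section 4.2.1.4, the evaluation of (4.8) can be sketched numerically as follows. This is an illustrative sketch only: `link_distance_cdf` is a direct transcription of the piecewise form (4.1), `qsr_vabf` uses the exponential approximation in (4.7), and all function names as well as the parameter values in the demo (700 m by 700 m territory, $r = 250$ m, $H = 4$, $n = 500$, $k = 7$) are assumptions chosen for the example rather than results from the dissertation.

```python
import math
from typing import Callable


def link_distance_cdf(gamma: float, x: float, y: float) -> float:
    """F(gamma) of (4.1) for two random points in an x-by-y rectangle (x <= y)."""
    xi, zeta = gamma / x, x / y
    if xi < 0:
        return 0.0
    if xi < 1:
        return zeta * xi**2 * (0.5 * zeta * xi**2 - (4 / 3) * xi * (1 + zeta) + math.pi)
    if xi >= math.sqrt(1 + zeta**-2):
        return 1.0
    # Terms shared by the third and fourth cases of (4.1).
    common = ((2 / 3) * zeta * (2 * xi**2 + 1) * math.sqrt(xi**2 - 1)
              + 2 * zeta * xi**2 * math.asin(1 / xi))
    if xi < 1 / zeta:
        return common - (1 / 6) * zeta * (8 * xi**3 + 6 * zeta * xi**2 - zeta)
    root = math.sqrt(max(0.0, xi**2 - zeta**-2))  # guard against round-off
    return (common
            - 0.5 * zeta**2 * (xi**4 + 2 * xi**2 - 1 / 3)
            + (2 / 3) * (2 * zeta**2 * xi**2 + 1) * root
            + (1 / 6) * zeta**-2 - xi**2
            - 2 * zeta * xi**2 * math.acos(min(1.0, 1 / (zeta * xi))))


def qsr_vabf(F: Callable[[float], float], r: float, H: int,
             n: int, m: int, k: int) -> float:
    """Query success rate of VABF following (4.8)."""
    total, survive = 0.0, 1.0
    for h in range(1, H + 1):
        fpr_h = (1.0 - math.exp(-k * n * F(h * r) / m)) ** k  # (4.7), approximate form
        survive *= 1.0 - fpr_h                                  # product in (4.8)
        total += survive * (F(h * r) - F((h - 1) * r))          # weighted by (4.6)
    return total


if __name__ == "__main__":
    F = lambda g: link_distance_cdf(g, 700.0, 700.0)
    for m in (1000, 5000, 20000):
        print(m, round(qsr_vabf(F, r=250.0, H=4, n=500, m=m, k=7), 4))
```

Note that (4.8) assumes a connected network, so the path probability $P_{path}$ does not appear in this computation.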
4.2.2.1 Flooding-based Algorithms

The query flooding and the expanding ring search schemes have a query route stretch equal to one (under the assumption of no collision or interference), since the query message is flooded to the content owner in both algorithms. It is however worthwhile to emphasize that the delay due to successive TTL increments is not reflected by the query route stretch analysis in the expanding ring search. This discrepancy has to be taken into account in the query response time analysis.

4.2.2.2 Random Walk

The query route stretch in random walk can be estimated by the ratio of the number of walks needed for the random walker to reach the content owner from the querist to the hop count of the shortest path between them. Let $h$ be the hop distance between the querist and its corresponding content owner, and $n_{wks}$ be the number of walks in a successful query. The expected value of $n_{wks}$ can be obtained as

$$E[n_{wks}] = \frac{1 \cdot \Pr\{h \le 1\}}{N-1} + \frac{2 \cdot \Pr\{h \le 2\}}{N-1} + \cdots + \frac{t_{max} \cdot \Pr\{h \le t_{max}\}}{N-1} = \sum_{i=1}^{t_{max}}\left(\frac{i}{N-1}\right)F(ir). \qquad (4.9)$$

Let $h_{shortest}$ be the length of the shortest path, in terms of the hop count, between two randomly selected nodes. The expected value of $h_{shortest}$ can be calculated as

$$E[h_{shortest}] = F(r) + \sum_{i=2}^{t_{max}} i\left(F(ir) - F((i-1)r)\right). \qquad (4.10)$$

Then, the query route stretch for random walk is equal to $E[n_{wks}] / E[h_{shortest}]$.

4.2.2.3 VABF

The query route stretch of VABF is equal to one, as characterized by the following theorem.

Theorem 4.2.1 If a query is successfully delivered from the querist to its corresponding content owner in a synchronized VABF network, it must be forwarded along the shortest path (in terms of the hop count) between them.

Proof: Let node $i$ be the only node which has object $O$, and node $j$ be the querist that is $h$ hops away from node $i$. According to the VABF protocol, the lowest level Bloom filter of node $j$ which contains the information of object $O$ must be level $h$, which is denoted by $B_j^h$.
Assume that node $j$ sends a query message for $O$, and this query is successfully forwarded to node $i$ along an $h'$-hop path consisting of nodes $\{j, j+1, j+2, \ldots, j+h'-1, i\}$. Then, by the VABF protocol, the level-1 filter of node $(j+h'-1)$, i.e. $B_{j+h'-1}^{1}$, must contain the information of object $O$. By induction, the level-$\delta$ filter of node $(j+h'-\delta)$, $1 \le \delta \le h'$, must contain the information of object $O$. Therefore, the level-$h'$ filter of node $j$, $B_j^{h'}$, must contain the information of object $O$. Recall that a VABF node resolves a query message by the depth-first search among its own multi-level filters from level 0 until it locates the first filter that matches the query, which is $B_j^{h'}$ in this case. Since the lowest level Bloom filter of node $j$ that contains the information of object $O$ is $B_j^h$, we conclude that $h' = h$. In other words, the forwarding path is the shortest path.

4.2.3 Search Cost

In our work, we characterize the search cost with the number of packet transmissions per successful query.

4.2.3.1 Query Flooding

Since all nodes connected to the querist help forward the query messages in query flooding, its search cost can be found as

$$SC_{flooding} = \frac{1 + (N-1) P_{path}}{QSR_{flooding}}. \tag{4.11}$$

4.2.3.2 Expanding Ring

The number of forwarding nodes in the expanding ring search is determined by the TTL value in the query message. Let $h$ be the hop distance between the querist and its corresponding content owner, $n_{sentPkt}$ the number of packet transmissions in querying an object, and $\hat{N}(x) = N \cdot F(r \cdot x)$ the number of nodes within the $x$-hop distance.
The search cost in the expanding ring scheme can be calculated as

$$SC_{er} = \frac{E[n_{sentPkt}]}{E[n_{succQ}]}, \tag{4.12}$$

where $E[n_{succQ}]$ is given in (4.2), and

$$\begin{aligned}
E[n_{sentPkt}] ={}& \Pr\{h \le t_{init}\}\, \hat{N}(t_{init}-1) + \Pr\{t_{init} < h \le t_{init}+1\} \bigl(\hat{N}(t_{init}-1) + \hat{N}(t_{init})\bigr) \\
& + \cdots + \Pr\{t_{max}-2 < h \le t_{max}-1\} \sum_{i=t_{init}-1}^{t_{max}-2} \hat{N}(i) + \Pr\{t_{max}-1 < h\} \sum_{i=t_{init}-1}^{t_{max}-1} \hat{N}(i) \\
={}& F(r t_{init})\, \hat{N}(t_{init}-1) + \sum_{i=t_{init}+1}^{t_{max}-1} \bigl(F(ri) - F(r(i-1))\bigr) \sum_{j=t_{init}-1}^{i-1} \hat{N}(j) \\
& + \bigl(1 - F(r(t_{max}-1))\bigr) \sum_{i=t_{init}-1}^{t_{max}-1} \hat{N}(i).
\end{aligned}$$

4.2.3.3 Random Walk

The search cost in random walk is equal to the number of walks for the query message to reach its content owner in a successful query, or the maximum TTL value in a failed query. Based on (4.3) and (4.9), the search cost in random walk can be written as

$$SC_{rw} = t_{max} (1 - QSR_{rw}) + E[n_{wks}] \cdot QSR_{rw}. \tag{4.13}$$

4.2.3.4 VABF

Although the search cost includes all packet transmissions at nodes along the query forwarding path and periodical broadcast messages in VABF, the dominating one is the cost of periodical broadcast messages. Let $T$ be the broadcast time interval and $\lambda_q$ the average query arrival rate. Then, the search cost of VABF can be obtained as

$$SC_{VABF} = \frac{N}{\lambda_q T \cdot QSR_{VABF}}, \tag{4.14}$$

where $QSR_{VABF}$ can be computed according to (4.8).

Besides the number of broadcast messages, the size of broadcast messages, which is usually much larger than the query message, is another concern. To reduce the message size, the arithmetic coding algorithm [61] can be used to compress the multi-level filters of a VABF node before broadcasting. The compression ratio is determined by the entropy of input Bloom filter strings, which can be further reduced by the delta compression. That is, if $B = \{B_0, B_1, \ldots, B_h\}$ is the set of filters to be broadcast, the delta compression scheme eliminates duplicated "1" bits in these filters by taking the bit-wise exclusive-OR operation on each adjacent level of filters to get a new set of filters $B' = \{B_0, B_1', \ldots, B_h'\}$, where $B_h' = B_{h-1} \oplus B_h$, for $h \ge 1$.
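The delta compression step just described can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: each Bloom filter is represented as a Python integer used as a bit string, the function names are hypothetical, and the toy 8-bit filters are made up so that each level is a superset of the level below it, as the multi-level construction implies.

```python
from typing import List


def delta_compress(filters: List[int]) -> List[int]:
    """Return [B_0, B'_1, ..., B'_h] with B'_h = B_{h-1} XOR B_h for h >= 1."""
    deltas = [filters[0]]
    for lower, upper in zip(filters, filters[1:]):
        deltas.append(lower ^ upper)  # keeps only bits that are new at this level
    return deltas


def delta_decompress(deltas: List[int]) -> List[int]:
    """Invert delta_compress: B_h = B_{h-1} XOR B'_h."""
    filters = [deltas[0]]
    for delta in deltas[1:]:
        filters.append(filters[-1] ^ delta)
    return filters


def count_ones(bitstrings: List[int]) -> int:
    """Total number of 1 bits across a list of filters."""
    return sum(bin(b).count("1") for b in bitstrings)


# Toy 8-bit filters (assumed values); level h contains every bit of level h-1.
levels = [0b00010001, 0b00110001, 0b01110101]
deltas = delta_compress(levels)

if __name__ == "__main__":
    print([format(d, "08b") for d in deltas])
    print(count_ones(levels), "->", count_ones(deltas))
```

Because XOR is its own inverse, a receiver can rebuild every level from $B_0$ and the deltas, while the encoder sees far fewer 1 bits, which is what lowers the entropy of the string handed to the arithmetic coder.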
After delta compression, the filters in $B'$ are concatenated into a single bit string, which is the input of the arithmetic encoder.

Let $\Gamma$ be the set of bits representing the concatenated Bloom filters, $\Gamma'$ be the set of bits after delta compression of $\Gamma$, and $\Gamma''$ be the set of bits after arithmetic encoding of $\Gamma'$. The size of $\Gamma$ and $\Gamma'$ for an arbitrary VABF node $i$ can be obtained as

$$|\Gamma| = |\Gamma'| = m(H_i + 1) \equiv M. \tag{4.15}$$

Then, $\Gamma'$ is an $M$-bit Bloom filter. By [39], we know that the size of a compressed $m$-bit Bloom filter can be approximated by $mH(p)$, where $p$ is the expected fraction of 0 bits in the Bloom filter, and $H(p) = -p \log_2 p - (1-p) \log_2 (1-p)$ is the entropy function. Thus, the size of $\Gamma''$ can be obtained as $|\Gamma''| = M \cdot H(p')$, where $p'$ is the fraction of 0 bits in $\Gamma'$. The $p'$ value can be determined using the following theorem.

Theorem 4.2.2 Let $p'$ be the fraction of 0 bits in $\Gamma'$, which is the concatenated Bloom filters after delta compression. Then $p'$ can be obtained by:

$$p' \approx \frac{H_i + e^{-\frac{k n' d_i^{H_i}}{m}}}{H_i + 1}, \quad \text{where } n' = \frac{n}{N}. \tag{4.16}$$

Proof: Assume that perfect random hash functions are used in MABF, and $n$ shared objects are distributed uniformly over $N$ nodes. Therefore, the number of distinct objects shared by each node is $n' = n/N$. For an arbitrary MABF node $i$, the set of bits of $\Gamma$ is the concatenation of its $H_i + 1$ Bloom filters $\{B_i^0, \ldots, B_i^{H_i}\}$. Let $\tau_h$ be the expected number of 0 bits in Bloom filter $B_i^h$, $0 \le h \le H_i$, which can be obtained as $\tau_h = m(1 - 1/m)^{k n' d_i^h}$. Let $B_i'^{h}$ be the Bloom filter after delta compression of $B_i^h$, and let $\tau_h'$ be the expected number of 0 bits in Bloom filter $B_i'^{h}$. Since the delta compression eliminates duplicated 1 bits from higher level Bloom filters, we have $\tau_0' = \tau_0$ and $\tau_h' = m - \tau_{h-1} + \tau_h$. Let $\tau$ be the expected number of 0 bits in $\Gamma'$, and hence $p' = \tau / M$.
Since $\Gamma'$ is the concatenation of the $H_i + 1$ compressed Bloom filters $\{B_i'^{0}, \ldots, B_i'^{H_i}\}$, we have

$$\tau = \sum_{h=0}^{H_i} \tau_h' = m H_i + m \left(1 - \frac{1}{m}\right)^{k n' d_i^{H_i}} \approx m \left(H_i + e^{-\frac{k n' d_i^{H_i}}{m}}\right). \tag{4.17}$$

Fig. 4.3 shows the size of a single update message, before and after compression, in networks of different sizes in terms of network diameter. As seen, the compression greatly reduces the broadcast message size, especially when the network diameter increases.

Figure 4.3: The average size of an update message vs. network diameter (m = 1000 bits, k = 7).

4.2.4 Unsynchronized State Duration in VABF

The query routing in the BF-based probabilistic routing scheme relies on the content information stored at each node, which is synchronized by periodical broadcasts of Bloom filters. An unsynchronized state lasts from the occurrence of a content update at an arbitrary node until the network reaches the synchronized state, in which all nodes in the network are aware of the update. Unsynchronized states result in a higher probability of incorrect query results, thus damaging the overall query performance. To address this problem, the broadcast rate should be carefully chosen to shorten the duration of unsynchronized states while keeping a good balance between the system performance and the extra traffic cost required by synchronization. In this subsection, we analyze the effect of the content update rate on the expected duration of an unsynchronized state in a static VABF network. It is assumed that VABF is running on a perfect MAC layer that introduces no collision and buffer delay. Simulation results for both static and mobile networks are presented in Section 4.3.3.

Let $C_i$ denote the $i$th content update and $\sigma_i$ be its service time, which is the time span from the arrival of $C_i$ to the time this information has propagated to the entire network. An example of content updates and their propagation in the VABF network is shown in Fig. 4.4.
The y-axis in the figure is the remaining time $U(t)$ for the network to clean all existing content updates and enter the synchronized state. It can also be referred to as the unfinished work of this system at time $t$.

Figure 4.4: Illustration of content updates and their services in a VABF network.

Based on the service time definition, each content update is served immediately upon its arrival. Therefore, an arrival of content update $C_i$ which finds a synchronized network state will terminate this synchronized state and initiate a new unsynchronized state. This unsynchronized state ends when all content updates that arrive during this unsynchronized state have been propagated to the entire network. Synchronized and unsynchronized states of the VABF network are referred to as the idle and the busy periods of this system, respectively. The alternation between busy periods $(B_1, B_2, \ldots)$ and idle periods $(I_1, I_2, \ldots)$ is shown clearly in Fig. 4.4.

The average length of busy periods is equal to the average duration of the unsynchronized state, which is a function of the content update rate and the service time. Assume that the arrival of content updates is a Poisson process with rate $\lambda$, and the service time $\sigma$ follows a generic distribution with values in $[\lceil HT/2 \rceil, \ldots, HT]$, where $H$ is the network diameter, and $T$ is the time interval between two consecutive broadcasts of Bloom filters at a node. Since every update is served immediately upon arrival, the system can be modeled by an $M/G/\infty$ queue with arrival rate $\lambda$ and generic service time $\sigma$. Let $B$ be the random variable representing the generic busy period of this $M/G/\infty$ queue. According to the result in [37], the expected value of $B$ can be approximated by

$$E[B] \approx \frac{K}{\lambda (1 - K)}, \tag{4.18}$$

where $K = 1 - e^{-\lambda E[\sigma]}$.

Service time distribution. The service time of a content update is determined by the node location where the update occurs. It is intuitive that an update occurring at an edge node demands longer service time than that occurring at the center of the network. Consider a square network of size $w \times w$ as shown in Fig.
4.5, which consists of $N$ uniformly distributed nodes with the same radio transmission range, denoted by $r$. We can partition the whole square region into several subregions according to the distance from an arbitrary node to the four corners. Nodes in areas marked $A_h$ are at least $h$ hops away from the furthest network border, where $h = h_1, h_2, \cdots$ is a positive integer between $\gamma_1 = \left(\left\lfloor \frac{\sqrt{2}\, w}{2r} \right\rfloor + 1\right)$ and $\gamma_2 = \left\lceil \frac{\sqrt{2}\, w}{r} \right\rceil$. For example, if $w = 1000$ and $r = 260$, the entire square region can be divided into 4 subregions $\{A_{h_1}, A_{h_2}, A_{h_3}, A_{h_4}\} = \{A_3, A_4, A_5, A_6\}$. Each subregion covers nodes that are 3, 4, 5 and 6 hops away from the furthest border, respectively.

Let $T$ be the time interval between two consecutive broadcasts at a node. The average residual time from the arrival of a content update to the next broadcast time of the node is $\frac{T}{2}$.

Figure 4.5: An example of a $w \times w$ square network being divided into 4 areas $A_{h_1}, \ldots, A_{h_4}$. Nodes are assumed homogeneous with radio transmission range $r$. $\{h_1, \ldots, h_4\} = \left\{\left\lfloor \frac{\sqrt{2w^2}}{2r} \right\rfloor + 1, \ldots, \left\lceil \frac{\sqrt{2w^2}}{r} \right\rceil\right\}$ are consecutive positive integers.
Then, the average service time of a content update at a node in area $A_h$ is $\frac{hT}{2}$, and the expected service time of this system $E[\sigma]$ is

$$E[\sigma] = \sum_{h=\gamma_1}^{\gamma_2} \frac{A_h}{w^2} \cdot \frac{hT}{2}, \tag{4.19}$$

where

$$A_h = \begin{cases}
0 & \text{if } h < \gamma_1, \\[4pt]
\displaystyle\int_{\delta_1}^{w/2} [f_1(x) - f_2(x)]\,dx + \int_{w/2}^{\delta_2} [f_3(x) - f_4(x)]\,dx - A_{h-1} & \text{if } \gamma_1 \le h \le \left\lfloor \frac{\sqrt{5}\, w}{2r} \right\rfloor, \\[8pt]
\displaystyle w^2 - A_{h-1} - \int_{0}^{\delta_3} [w - f_1(x) + f_2(x)]\,dx - \int_{\delta_4}^{w} [w - f_3(x) + f_4(x)]\,dx & \text{if } \left\lfloor \frac{\sqrt{5}\, w}{2r} \right\rfloor < h < \gamma_2, \\[8pt]
\displaystyle w^2 - \sum_{i=1}^{h-1} A_i & \text{if } h = \gamma_2,
\end{cases}$$

and where

$$\begin{aligned}
\delta_1 &= w - \tfrac{1}{2}\sqrt{4h^2r^2 - w^2}, & \delta_2 &= \sqrt{h^2r^2 - w^2/4}, \\
\delta_3 &= w - \sqrt{h^2r^2 - w^2}, & \delta_4 &= \sqrt{h^2r^2 - w^2}, \\
f_1(x) &= \sqrt{h^2r^2 - (x-w)^2}, & f_2(x) &= w - \sqrt{h^2r^2 - (w-x)^2}, \\
f_3(x) &= \sqrt{h^2r^2 - x^2}, & f_4(x) &= w - \sqrt{h^2r^2 - x^2}.
\end{aligned}$$

The expected duration of an unsynchronized state in VABF can be computed by (4.18) and (4.19).

4.3 Simulation Results

Computer simulations were conducted with ns-2 [4]. The content discovery protocols under our study were implemented as transport-layer agents on top of network-layer agents, and a Gnutella-like P2P client was implemented at the application layer to simulate the behavior of P2P file sharing. The IEEE 802.11 MAC with the distributed coordination function (DCF) for wireless LANs was used as the MAC layer. The radio model uses characteristics similar to Lucent's WaveLAN. That is, the channel capacity was 2 Mbps, the radio propagation range was 250 m, and the two-ray ground reflection propagation model was adopted as the physical-layer path loss model.

Content discovery protocols were first simulated in static networks to validate our analytical results, where the network size varied from 700x700 to 1500x1500 square meters with 50 uniformly distributed static nodes. Then, protocols were simulated in mobile networks to understand their behaviors in a mobile environment, where 50 nodes roam around using the random waypoint model with the maximum speed varying from 2 to 20 meters per second (or 7.2-72 km per hour).
In both static and mobile networks, the content popularity is set to 1/50; namely, there is exactly one instance of each content in the network. The query arrival process was modeled as a Poisson process with an average arrival rate ($\lambda_q$) equal to 1 query per second. Only results of the following simulation settings are presented due to the space limit. For both static and mobile networks, the initial TTL in the expanding ring search ($TTL_{init}$) is 3. The maximum TTL in the random walk ($TTL_{max}$) is 50, which is the same as the node number. The broadcast time interval in VABF ($T$) is 3 sec. For the mobile network, results of 700x700 square-meter networks are presented, and results of VABF with $T = 1$ are also presented to demonstrate the impact of different broadcast intervals in a mobile environment.

4.3.1 Static Environment

This section presents the simulation results in static networks and compares them with the analytical results. We see a close match between theoretical and simulation results in static networks. As shown in Fig. 4.6, the path probability (indicated by the curve of the flooding scheme) provides an upper bound on all schemes. The success rate decreases as the network density becomes lower since a lower density results in a lower path probability. The success rate of the expanding ring search decreases much faster than that of the other schemes as the network size increases, since there is a higher probability that the queried object is located too far away to be reached within the initial broadcast TTL.

Figure 4.6: Theoretical and simulation results of the query success rate in static networks. (Plot: query success rate (%) vs. network width (m), 700 to 1500 m; simulated and theoretical curves for Flooding, Expanding Ring ($TTL_{init}$ = 3), Random Walk ($TTL_{max}$ = 50) and VABF (T = 3.0).)
The random walk scheme performs as well as the flooding and VABF schemes in this case, since its maximum TTL is equal to the node number, which allows it to visit every reachable node (recall that an improved random walk was used in our simulation, in which no node is visited twice).

The route stretch of the random walk scheme is much higher than that of the others as shown in Fig. 4.7, which indicates that random walk has a much longer delay in the query response. The route stretch of random walk drops quickly as the network becomes sparser. The reason is that there are fewer options in the selection of the next forwarding node in a sparse network. Thus, there is a higher possibility that the forwarding path of a successful query is just the shortest path. The VABF scheme has the optimal route stretch, which is equal to one. Since there exists interference in the query flooding and the expanding ring search for a dense network, their route stretch is slightly higher than one. The interference is alleviated for a sparser network so that the route stretch of the flooding and the expanding ring decreases gradually to one.

Figure 4.7: Theoretical and simulation results of the route stretch in static networks. (Plot: route stretch vs. network width (m), 700 to 1500 m; upper panel shows Random Walk ($TTL_{max}$ = 50), simulated and theoretical; lower panel shows Flooding, Expanding Ring ($TTL_{init}$ = 3) and VABF (T = 3.0).)

As to the search cost, the query flooding has the highest cost in terms of the packet number per successful query as shown in Fig. 4.8. With confined flooding, the expanding ring search successfully reduces the amount of redundant packets. However, its cost is still much higher than that of random walk and VABF. VABF has the lowest search cost in terms of the control packet number. However, we need to consider the size of a VABF message, since it is usually much larger than a query message. This will be examined in Sec. 4.3.3.
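For readers who want to reproduce the theoretical search cost curves, the closed forms (4.11), (4.13) and (4.14) are simple enough to evaluate directly. The Python sketch below is only an illustration: the function names are hypothetical, and the success rates, path probability and expected number of walks are taken as inputs (they come from (4.2)-(4.9) in the analysis) rather than computed here.

```python
def flooding_cost(N: int, p_path: float, qsr_flooding: float) -> float:
    """Eq. (4.11): packets per successful query for query flooding."""
    return (1 + (N - 1) * p_path) / qsr_flooding


def random_walk_cost(t_max: int, qsr_rw: float, expected_walks: float) -> float:
    """Eq. (4.13): t_max hops on failure, E[n_wks] hops on success."""
    return t_max * (1 - qsr_rw) + expected_walks * qsr_rw


def vabf_cost(N: int, lambda_q: float, T: float, qsr_vabf: float) -> float:
    """Eq. (4.14): N broadcasts per interval T over lambda_q*T*QSR successes."""
    return N / (lambda_q * T * qsr_vabf)


if __name__ == "__main__":
    # N, lambda_q and T follow the simulation settings of Section 4.3;
    # success rates and path probability of 1.0 are assumed for illustration.
    print(flooding_cost(N=50, p_path=1.0, qsr_flooding=1.0))
    print(vabf_cost(N=50, lambda_q=1.0, T=3.0, qsr_vabf=1.0))
```

Under these idealized inputs the flooding cost equals the node count, while the VABF cost is the node count divided by the number of queries served per broadcast interval, which is why VABF becomes cheaper as the query rate grows.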
Figure 4.8: Theoretical and simulation results of the search cost in static networks. (Plot: number of packets per successful query vs. network width (m), 700 to 1500 m; simulated and theoretical curves for Flooding, Expanding Ring ($TTL_{init}$ = 3), Random Walk ($TTL_{max}$ = 50) and VABF (T = 3.0).)

4.3.2 Mobile Environment

Theoretical analysis of P2P content delivery techniques in a mobile environment is difficult. Instead, simulation results are provided to shed some light on their behaviors. We see from Fig. 4.9(a) that the two flooding-based schemes are most resilient to mobility in their query success rate. VABF achieves comparable performance and performs better as the broadcast time interval $T$ becomes smaller. However, a smaller $T$ value introduces more packet transmissions as shown in Fig. 4.11(a). Thus, there exists a tradeoff between the query performance and the search cost. The random walk performs the worst in the query success rate, and its success rate drops much faster than that of the others as mobility increases. This demonstrates the severe performance degradation of multihop communications in a mobile environment, since the random walk scheme demands a longer routing path as shown in Fig. 4.10(a). By comparing Figs. 4.7 and 4.10(a), we see that the random walk scheme has an even higher route stretch in mobile networks than in static networks. In contrast, the route stretch of the two flooding-based schemes and VABF is not affected by node mobility. We see from Fig. 4.11(a) that the query flooding scheme still has the highest search cost, and the expanding ring scheme with a confined search range does not help much in reducing redundant packets in the mobile environment. Generally speaking, the node mobility has little impact on the search cost of all schemes.
4.3.3 VABF Message Overhead

We study the impact of the network size and the number of content objects on the average size of a VABF message in this subsection. As shown in Fig. 4.12, the message size is proportional to the number of content objects and it grows much faster in a dense network, which shows the major weakness of VABF.

Figure 4.9: Comparison of query success rates in mobile networks with (a) $\lambda = 1.0$, and (b) $\lambda = 10.0$. (Plots: query success rate (%) vs. max. node speed (m/s), 2 to 20 m/s, 700 m x 700 m network; curves for Flooding, Expanding Ring ($TTL_{init}$ = 3), Random Walk ($TTL_{max}$ = 50), VABF (T = 1.0) and VABF (T = 3.0).)

We show the effect of the content arrival rate on the expected duration of an unsynchronized state in Figs. 4.13(a) and 4.13(b). The simulation was conducted in a 700m-by-700m network territory with various content update rates. The arrival of content updates was modeled as a Poisson process with an average rate $\lambda_{cu}$ ranging from 0.1 to 1.0 updates per second. For mobile networks, results of maximum node mobility of 4 and 8 meters per second are shown. We see a close match between theoretical and simulation results in static networks in Fig. 4.13(a), where the expected unsynchronized duration grows exponentially as the content update rate increases, as predicted. However, for broadcast interval $T = 1$, the unsynchronized duration is still short (less than 5 seconds) even when the average content update rate is as high as 1. The expected unsynchronized duration in the mobile network is even lower than that in the static network as shown in Fig. 4.13(b), which demonstrates that node mobility assists the propagation of content updates in VABF to some degree.
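The theoretical curves for the static case follow from the busy period approximation (4.18). As a hedged illustration, the Python sketch below evaluates (4.18) using the $E[\sigma]$ values that appear in the legend of Fig. 4.13(a) (1.759, 3.518 and 5.277 seconds for T = 1, 2 and 3); the function name is hypothetical. Substituting $K = 1 - e^{-\lambda E[\sigma]}$ shows that (4.18) is algebraically the same as $(e^{\lambda E[\sigma]} - 1)/\lambda$, which makes the exponential growth in the update rate explicit.

```python
import math


def expected_busy_period(lam: float, mean_service: float) -> float:
    """Eq. (4.18): E[B] ~ K / (lam * (1 - K)) with K = 1 - exp(-lam * E[sigma])."""
    K = 1.0 - math.exp(-lam * mean_service)
    return K / (lam * (1.0 - K))


if __name__ == "__main__":
    # E[sigma] values as listed in the legend of Fig. 4.13(a).
    for T, e_sigma in [(1.0, 1.759), (2.0, 3.518), (3.0, 5.277)]:
        for lam in (0.1, 0.5, 1.0):
            busy = expected_busy_period(lam, e_sigma)
            print(f"T={T:.1f}  lambda_cu={lam:.1f}  E[B]={busy:.2f} s")
```

For T = 1 and one update per second the expression stays below 5 seconds, in line with the observation above.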
This benefit of node mobility is especially obvious for a large $T$ value.

Figure 4.10: Comparison of the query route stretch in mobile networks with (a) $\lambda = 1.0$, and (b) $\lambda = 10.0$. (Plots: route stretch vs. max. node speed (m/s), 2 to 20 m/s, 700 m x 700 m network; upper panels show Random Walk ($TTL_{max}$ = 50); lower panels show Flooding, Expanding Ring ($TTL_{init}$ = 3) and VABF (T = 3.0).)

Figure 4.11: Comparison of the search cost in mobile networks with (a) $\lambda = 1.0$, and (b) $\lambda = 10.0$. (Plots: number of packets per successful query vs. max. node speed (m/s), 2 to 20 m/s, 700 m x 700 m network; curves for Flooding, Expanding Ring ($TTL_{init}$ = 3), Random Walk ($TTL_{max}$ = 50), VABF (T = 1.0) and VABF (T = 3.0).)

Figure 4.12: Theoretical and simulation results of the size of a compressed VABF content update message as a function of the network size and the number of content objects in static networks. (Plot axes: number of objects (n), 100 to 1000; network size, 700 to 1500; message size (bits), 0 to 5 x 10^4.)

Figure 4.13: The propagation delay of content updates in (a) the static and (b) the mobile environments. (Plots: average busy period length (sec) vs. average number of updates per second ($\lambda_{cu}$), 0.1 to 1.0. Panel (a), static nodes: VABF simulated and theoretical curves for T = 3.0 ($E[\sigma]$ = 5.277), T = 2.0 ($E[\sigma]$ = 3.518) and T = 1.0 ($E[\sigma]$ = 1.759). Panel (b), mobile nodes: VABF at 4 m/s and 8 m/s for T = 3.0, 2.0 and 1.0.)

Chapter 5

Lightweight Anonymous Communication Protocol

Providing peer privacy in the P2P network has always been an important topic, which poses even more challenges when facing a P2P system over MANET. First, the open environment in MANET makes its radio signals vulnerable to eavesdropping. Second, the multihop communication in MANET involves untrustworthy nodes in a private conversation. Third, MANET nodes are constrained by limited battery and computing power, which makes computation-intensive schemes such as public-key cryptography too expensive to be adopted. Therefore, existing solutions for the wireline Internet cannot be applied directly on MANET for P2P communication without considerable modifications.

We propose an efficient anonymous communication protocol for P2P applications over MANET, called MANET Anonymous Peer-to-peer Communication Protocol (MAPCP). MAPCP is designed to be a flexible middleware between the P2P applications and most MANET routing protocols. MAPCP employs a broadcast-based mechanism together with a probabilistic flooding control algorithm to establish anonymous paths between peers, which requires no hop-by-hop encryption/decryption, and hence introduces very low complexity in terms of computation and power consumption.
MAPCP establishes multiple unfixed anonymous paths between communication peers within a single query phase, and is highly resilient to node mobility, failure, and malicious attacks. Furthermore, MAPCP provides schemes for communication peers to control the tradeoff between anonymity degree and bandwidth efficiency.

5.1 Design Rationale

Hop-by-hop encryption/decryption does provide excellent anonymity and content privacy. However, previous studies [21] [45] show that the computational complexity and power consumption of a public-key encryption (e.g. RSA) are several orders of magnitude greater than those of a symmetric-key encryption (e.g. AES) and a packet transmission. Therefore, we argue that cryptography should be used conservatively in MANET, in which resources are scarce. The MANET communication usually involves one or multiple local broadcasts, even for unicast communication. As discussed in previous work [30] [59], broadcast without specifying the receiver's real identity effectively achieves receiver anonymity and thwarts many security attacks [28]. Therefore, we believe that a good solution for anonymous P2P communication over MANET should deal with the tradeoff between resource efficiency (bandwidth efficiency, energy consumption and computational intensity) and the degree of anonymity. Such a solution should lie somewhere between the pure broadcast scheme and the pure cryptographic scheme, as shown in Fig. 5.1.

Figure 5.1: Tradeoff between hop-by-hop encryption/decryption schemes and broadcast-based schemes

5.2 Protocol Design

The design of MAPCP assumes that each node is in the promiscuous receiving mode on its wireless network interface (which is mandatory for 802.11-based nodes in the ad-hoc mode) and is capable of manipulating the source IP and MAC address of its outgoing packets. Similar to most P2P applications, communication in MAPCP consists of two phases: the query phase and the data transmission phase. MAPCP uses only local broadcasts in both phases.
To prevent the broadcast storm problem [40], MAPCP employs a probabilistic algorithm to control packet flooding in the data transmission phase. Conceptually, every node is assigned a rebroadcast probability for each communication session. Nodes along the selected optimal paths are assigned the highest probability while nodes not on the optimal paths are assigned a lower or zero probability. At each node, the forwarding of a data packet depends on the calculated rebroadcast probability. To realize this, each MAPCP node maintains two tables: a destination table of five fields (the destination ID pseudonym, the path pseudonym, the $\delta$ value, the $\tau$ value and the session key) and a path table of four fields (the source ID pseudonym, the path pseudonym, the $\delta$ value and the $\tau$ value).

5.2.1 Query Phase

The file requester, $S$, first generates a one-time public/private key pair $PK_S$ and $PK_S^{-}$, a 128-bit random nonce $N_S$ (used as its identity pseudonym) and a random positive integer $\delta = \delta_S > 1$. The overhead of key and pseudonym generation can be traded off against storage since the node can generate a number of keys and pseudonyms in advance. Then, $S$ broadcasts to its neighbors the query message with a forged source address, e.g. the broadcast address. The query message includes $PK_S$, $N_S$, $\delta$ and the query string $QString$. This is expressed as

$$S \rightarrow * : \{PK_S, N_S, \delta, QString\}.$$

Besides, $S$ keeps the entry $\{null, null, \delta_S, MAX\_INT, null\}$ in its own destination table, where $MAX\_INT$ is a very large positive integer.

When node $i$ $(i \ne S)$ receives a nonduplicate query message, it increases $\delta$ by 1 and forwards the query message to its neighbors. Node $i$ checks whether the query can be satisfied. If not, it keeps the entry $\{N_S, null, \min(\delta), MAX\_INT\}$ in its path table, where $\min(\delta)$ is the minimum $\delta$ value among all received query messages.
Otherwise, if $i$ can satisfy the query ($i$ is a file holder), it generates a random positive number $\tau = \tau_i > 1$, a 128-bit random nonce $N_i$, another 128-bit random nonce $N_{P_i}$, and a one-time symmetric key $SK_i$. Here, $N_i$ is its identity pseudonym, $N_{P_i}$ is the path pseudonym and $SK_i$ is the session key for further communication with query originator $S$. Then, it broadcasts to its neighbors the query reply, which includes $N_S$, $N_{P_i}$, $\tau$, and a $PK_S$-encrypted part which contains $N_i$, $SK_i$ and the metadata of the requested file, as shown below:

$$i \rightarrow * : \{N_S, N_{P_i}, \tau, [N_i, SK_i, metadata]_{PK_S}\}.$$

Note that $N_S$ is used to identify the recipient of this query reply. Node $i$ keeps the entry $\{N_S, N_{P_i}, \min(\delta), \tau_i, SK_i\}$ in its destination table.

When node $j$ receives a nonduplicate query reply, it increases $\tau$ by 1 and forwards this message to its neighbors. If $j \ne S$, it updates the entry $\{N_S, null, \min(\delta), MAX\_INT\}$ in its path table to $\{N_S, N_{P_i}, \min(\delta), \min(\tau)\}$, where $\min(\tau)$ is the minimum $\tau$ value among all received query replies. Otherwise, if $j = S$, it decrypts the encrypted part with $PK_S^{-}$ to get $N_i$, $SK_i$ and the metadata, and updates the entry $\{null, null, \delta_S, MAX\_INT, null\}$ in its destination table to $\{N_i, N_{P_i}, \delta_S, \min(\tau), SK_i\}$.

5.2.2 Data Transmission Phase

Once node $S$ collects enough query replies, data transmission between $S$ and each file holder $R_i$ can be done anonymously as follows. $S$ looks up $R_i$'s pseudonym $N_{R_i}$ in its destination table to get $N_{P_{R_i}}$, $\delta_S$, $\min(\tau)$ and session key $SK_{R_i}$, and broadcasts a data message to its neighbors, which contains $N_{P_{R_i}}$, $N_{R_i}$, a positive number $\alpha = \delta_S + \min(\tau)$, and an $SK_{R_i}$-encrypted part consisting of $N_S$ and the data (e.g. a request for file).
This can be written as

$$S \rightarrow * : \{N_{P_{R_i}}, N_{R_i}, \alpha, [N_S, data]_{SK_{R_i}}\}.$$

When an intermediate node $j$ $(j \ne S, R_i)$ receives a nonduplicate data message, it looks up $N_{P_{R_i}}$ in its path table to get $\min(\delta)$ and $\min(\tau)$, and calculates its rebroadcast probability $p_j$ as

$$\mu = \frac{\alpha}{\min(\delta) + \min(\tau)}, \qquad
p_j = \begin{cases}
\mu \lambda^{(\min(\delta) + \min(\tau) - \alpha)}, & \text{if } \mu < 1, \\
1, & \text{otherwise},
\end{cases} \tag{5.1}$$

where $0 \le \lambda \le 1$ is a real number selected by the protocol. Then, node $j$ forwards this message according to its rebroadcast probability $p_j$.

When node $R_i$ receives a nonduplicate data message identified by $N_{R_i}$, it decrypts the encrypted part with session key $SK_{R_i}$ to get $N_S$ and the data. Likewise, if node $R_i$ intends to send a data message to $S$ (e.g. the requested file), it looks up $N_S$ in its destination table to get $N_{P_{R_i}}$, $\min(\delta)$, $\tau_{R_i}$ and session key $SK_{R_i}$, and then broadcasts a data message containing $N_{P_{R_i}}$, $N_S$, a positive number $\alpha' = \min(\delta) + \tau_{R_i}$ and the requested file to its neighbors. When receiving the data message, each intermediate node $j$ $(j \ne S, R_i)$ calculates its rebroadcast probability $p_j'$ using (5.1) and forwards the data message according to $p_j'$.

The selection of $\lambda$ represents the tradeoff between anonymity and performance. If $\lambda = 1$, the system has the highest anonymity but lower forwarding efficiency, since dummy packets contribute to collision. If $\lambda$ is close to zero, the system generates the fewest dummy packets and has higher forwarding efficiency. However, since the algorithm establishes multiple anonymous paths in most cases, an acceptable degree of anonymity is guaranteed even when $\lambda$ is set to zero. The analysis of anonymity degree will be conducted in Section 5.3.

Propagation Delay or Hop Count?

The optimal path can be a path with minimum propagation delay or minimum hop count. On the Internet, the propagation delay reflects more precisely the distance between two nodes since a one-hop away node may be kilometers away physically.
However, in MANET, the one-hop distance is limited by the radio transmission range of a node, and more hops introduce more processing overhead and energy consumption. Furthermore, we observed that the propagation time reflects poorly the real distance of two nodes in MANET, especially when traffic is heavy. Routing protocols such as AODV [42] buffer the broadcast packets for a random time before sending them to the MAC layer. Moreover, the 802.11 MAC layer senses the carrier before transmitting a broadcast packet, and postpones the transmission if it senses a busy channel. Therefore, when the traffic load is high, a packet may be queued for a very long time, and a node that receives a packet earlier than other nodes may be the last to forward it. As a result, a route decision made according to the propagation time in a high traffic load period (e.g. the query phase, in which the network is overwhelmed by broadcast messages) may not be the right decision when the traffic load is back to normal. For this reason, the flooding control algorithm in MAPCP uses hop count information to decide the optimal path between two nodes.

The identity pseudonyms and path pseudonyms are used in MAPCP to identify the packet receiver and the rebroadcast probability, respectively, for each communication session. Therefore, no pseudonym collision is allowed among all live communication sessions in the network. In case of pseudonym collision, the packet may be forwarded to the wrong target. Currently MAPCP ignores this problem and leaves it to the applications for the following reasons. First, as studied in [30], for $l$-bit pseudonyms, the probability of collision $p_{collision}$ when $m$ pseudonyms are selected is

$$p_{collision} = 1 - \frac{\prod_{i=0}^{m-1} (2^l - i)}{(2^l)^m},$$

which decreases exponentially as $l$ increases linearly, and is extremely small when $l$ is equal to 128 bits (see footnote 1), as used in MAPCP.
Second, since the receiver is identified by the identity pseudonym, in case of path pseudonym collision there is still a chance for the receiver to receive the packet, due to the broadcast-based communication nature of MAPCP. Third, since the identity pseudonyms can be renewed at each packet exchange, in case of identity pseudonym collision the error can be confined within a single packet transmission.

¹ As shown in Kong's work [30], the probability is even smaller than the probability of detection failure of a 128-bit MD5 checksum.

Figure 5.2: Probability assignment results of flooding control in (a) a grid topology, (b) a randomly generated topology in the 700m-by-700m network field, and (c) a randomly generated topology in the 1000m-by-1000m network field. S is the sender, and R is the receiver.

Fig. 5.2 shows examples of probability assignment results of the flooding control algorithm with $\lambda$ equal to 0.9. Nodes marked by the darkest color are assigned rebroadcast probability one. The lighter the node color, the lower its probability. Nodes marked by the lightest color are assigned a probability lower than 0.5. The samplings are conducted in a static 700m-by-700m network field, and nodes are homogeneous with a radio transmission range of 250m. Fig. 5.2(a) presents evenly distributed nodes with a distance of 100m between vertical and horizontal neighbors. This figure shows an ideal result in which all nodes on possible shortest paths (in terms of hop count) are assigned the highest probability. Figs. 5.2(b) and 5.2(c) present the randomly generated topologies, and show that the probability assignments are not always perfect (a perfect assignment selects only nodes on optimal paths) due to random topologies and unpredictable collisions of query messages and query replies.

5.2.3 Extension of MAPCP

MAPCP can also be extended to provide anonymity for general communication between nodes in a MANET that share common secrets with each other.
In this case, the query term in the query message is replaced by a trapdoor [18] which can only be opened by the intended communication party. Once the destination node receives the query message, it responds to the source node with a reply message embedded with a proof of the opened trapdoor, and the subsequent communication can then be realized anonymously along the anonymous paths. If the communication parties share a common symmetric key, the encryption and decryption of the identity pseudonyms in the query message and query reply can use symmetric cryptography instead of the expensive public key cryptography.

5.3 Security Analysis

Attacks on P2P communication protocols can be roughly divided into two categories: service attacks, in which attackers try to paralyze the P2P service (e.g. DoS attacks) or steal the message content, and anonymity attacks, in which attackers try to pin down the communication parties. The design of MAPCP aims at protection against anonymity attacks, and leaves service attacks to existing solutions such as content encryption. This section discusses the anonymity degree of MAPCP under different attack scenarios. First, the anonymity degree is quantified using the entropy-based metric proposed by Díaz et al. [17] and Serjantov et al. [58]. Second, we discuss popular anonymity attacks and how MAPCP thwarts these attacks.

5.3.1 Degree of Anonymity

We consider the sender anonymity (the receiver anonymity can be obtained in a similar way, and the anonymity degree will be about the same in two-way communication). Throughout the analysis of anonymity, we follow the definition of anonymity given by Pfitzmann and Köhntopp in [44]: "Anonymity is the state of being not identifiable within a set of subjects, the anonymity set", and the anonymity set is defined as "the set of all possible subjects who might cause an action".
In a hostile environment, adversaries can assign each suspicious node a probability of being the message sender. The fewer the suspicious nodes (i.e. the smaller the anonymity set), the higher the probability each suspicious node gets. Clearly, an anonymity set that includes all nodes in a system, with all nodes equally suspicious, provides the highest degree of anonymity. Unfortunately, the wireless network is an open environment, in which all messages are broadcast in the air and are vulnerable to eavesdropping. By monitoring node activities and the traffic in the air, adversaries are able to gather information that distinguishes nodes with different probabilities, and thereby shrink the anonymity set.

The degree of anonymity can be quantified by the entropy-based metric proposed by Díaz et al. [17] and Serjantov et al. [58]. Consider a set $\phi$ of $N$ nodes ($|\phi| = N$), and let the anonymity attackers assign each node $i$ in $\phi$ a probability $p_i$ of being the sender according to the information eavesdropped from the system. The entropy of this system, $H(\phi)$, is defined as:

\[ H(\phi) = -\sum_{i \in \phi} p_i \log_2(p_i). \]

The system has the maximum entropy $H_{max}$ when all nodes in $\phi$ are equally suspicious, i.e. $p_i = \frac{1}{N}\ \forall i \in \phi$. Therefore:

\[ H_{max} = -\sum_{i \in \phi} \frac{1}{N} \log_2\!\left(\frac{1}{N}\right) = \log_2(N). \]

The degree of anonymity provided by the system, $d_{\phi}$, can now be defined as:

\[ d_{\phi} = \frac{H(\phi)}{H_{max}}. \]

Clearly, $d_{\phi}$ is zero when $|\phi| = 1$ (the anonymity set consists of only one node), and $0 \le d_{\phi} \le 1$.

Therefore, if adversaries observe that $n$ nodes are involved in a communication session while the other $(N-n)$ nodes are quiet, they can shrink the anonymity set to a smaller set $\phi'$ that consists of only these $n$ active nodes ($|\phi'| = n$), and assign each node in $\phi'$ the probability $\frac{1}{n}$ and all other nodes probability zero.
The anonymity degree of this system now becomes:

\[ d_{\phi'} = \left( -\sum_{i \in \phi'} \frac{1}{n} \log_2\!\left(\frac{1}{n}\right) \right) \frac{1}{\log_2(N)} = \frac{\log_2(n)}{\log_2(N)}. \]

For a single-path routing protocol such as AODV and ANODR, the value of $n$ is roughly equal to the number of hops of its discovered route. In MAPCP, since the anonymous paths are decided by the rebroadcast probability $p_i^{\mathrm{rebroadcast}}$ of each node, the value of $n$ is determined by the number of relay nodes, which differs in each communication session (a single run of packet exchange between the sender and the receiver). Let us define the random variables $R_i$, $i = 1, \ldots, N$, by

\[ R_i = \begin{cases} 1 & \text{if node } i \text{ rebroadcasts the packet,} \\ 0 & \text{otherwise.} \end{cases} \]

Then the value of $n$, which is equal to the expected number of relay nodes in a communication session, is found to be

\[ n = \sum_{i=1}^{N} E[R_i] = E[R] = \sum_{i=1}^{N} p_i^{\mathrm{rebroadcast}}, \]

where $R = \sum_{i=1}^{N} R_i$. Since the flooding control algorithm of MAPCP assigns rebroadcast probability one to all nodes on all possible optimal paths (when $X = \delta_S + \min(\tau)$), even with the settings of lowest anonymity (i.e. $\lambda = 0$), the value of $n$ is still much larger than the hop count of a single path. Therefore, MAPCP always provides a higher anonymity degree than single-path (anonymous) routing protocols.

5.3.2 Traffic Analysis

In a more hostile environment, adversaries can detect the flow of packets and track down the source and destination by means of traffic analysis attacks. Traffic analysis can be launched by analyzing the timing correlations (timing attack) or the content correlations (message coding attack) exhibited by packets, as described below.

5.3.2.1 Timing attacks and flooding attacks

In timing analysis attacks [48], adversaries monitor a specific area and use the temporal dependency between transmissions to trace a victim message's forwarding path. An effective way to thwart timing attacks is to introduce more randomness into transmissions to hide the real traffic patterns.
The Mix-net [11] uses playout buffers in the mix nodes to store and reorder received data packets, and to inject dummy packets into the buffer if necessary. However, this can be compromised by sending $n-1$ messages to trace a victim message when a playout buffer of size $n$ is used by each mix node, which is called a flooding attack. In ANODR [30], a variant playout-buffer scheme is used to thwart timing attacks, and hop-by-hop payload shuffling is used to stop flooding attacks.

MAPCP adopts schemes similar to those used in Mask [63], which rely on collaboratively generated dummy packets to conceal the real traffic patterns. Furthermore, we observed that the timing information required for launching a timing attack is much more difficult to obtain in wireless networks than in wired networks, especially when the wireless channels are overwhelmed by broadcast packets. Routing protocols such as AODV buffer broadcast packets for a random time before sending them to the MAC layer. Moreover, the 802.11 MAC layer senses the carrier before transmitting a broadcast packet, and postpones the transmission if it senses a busy channel. Therefore, when the traffic load is high, there is a good chance that a node receives a packet earlier than other nodes but forwards it much later than some of them. This makes the measurement of propagation delay insignificant, since it no longer precisely reflects the location of nodes or the forwarding paths. This observation, together with the artificially and probabilistically generated dummy packets and the multipath characteristics of MAPCP, constitutes an effective defense against timing attacks.

5.3.2.2 Message coding attacks

Signatures of packets such as identical content, identification, and unchanged packet length can be clues for adversaries to recognize the correlation of packets and track the flow of packets.
Hop-by-hop encryption, payload shuffling and random padding on forwarded packets effectively thwart this type of attack, while introducing cryptographic overhead and performance degradation [30][63]. MAPCP does not need to employ hop-by-hop encryption/decryption since the anonymous paths are constructed probabilistically, and it does not need pair-wise shared keys between adjacent nodes. However, the path pseudonym, which is used by the relay nodes for determining the rebroadcast probability, is unchanged during the entire communication session. Nevertheless, in MAPCP, the path pseudonym does not reveal the real transmission paths: every node with rebroadcast probability greater than zero may rebroadcast the received packets. Adversaries can only see that there is a crowd of nodes forwarding packets with an identical path pseudonym, and the observed crowd changes from time to time since nodes forward packets probabilistically. Furthermore, there is no link between the communication parties' real identities and the identity pseudonyms they are using, and the identity pseudonyms can also be changed by the sender or receiver at any time (since they share the session key and the sender's public key). Therefore, the information gained from message coding attacks is quite limited.

To analyze the degree of anonymity provided by MAPCP under traffic analysis attacks, consider the scenario in which colluding attackers are able to divide the network space into smaller cells, as shown in Fig. 5.3. Suppose node S sends a message to node D, and the attackers divide the network space into nine cells. By timing and payload analysis, the attackers may find out that the message originated in cell 7. Therefore, they can assume that the sender must reside in cell 7, and assign active nodes in this cell the highest probability of being the sender, while assigning nodes in other cells probability zero. Therefore, the size of the anonymity set shrinks to the number of active nodes in that cell.
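The effect of this shrinkage on the entropy-based degree of Section 5.3.1 can be illustrated numerically. The following Python sketch is not part of the original evaluation; the function names and the node counts are hypothetical, and it assumes the attackers spread their suspicion uniformly over the active nodes of the identified cell.

```python
import math


def anonymity_degree(probabilities, total_nodes):
    """Entropy-based degree d = H / H_max, with H_max = log2(total_nodes)."""
    if total_nodes < 2:
        return 0.0  # a single-node system offers no anonymity
    # Nodes with zero probability contribute nothing to the entropy.
    entropy = sum(-p * math.log2(p) for p in probabilities if p > 0)
    return entropy / math.log2(total_nodes)


def degree_after_shrinking(active_in_cell, total_nodes):
    """Uniform suspicion 1/n over the n active nodes of one cell, zero elsewhere."""
    uniform = [1.0 / active_in_cell] * active_in_cell
    return anonymity_degree(uniform, total_nodes)


# Hypothetical scenario: N = 50 nodes, and the attackers narrow the sender
# down to cells that contain fewer and fewer active nodes.
for n in (50, 16, 8, 4, 1):
    print(n, round(degree_after_shrinking(n, 50), 3))
```

For uniform suspicion the result coincides with $\log_2(n)/\log_2(N)$, so every halving of the active set lowers the degree by the same amount, $1/\log_2(N)$.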
The more cells the attackers create, the smaller the anonymity set is. Clearly, dummy packets and multiple paths increase the size of the anonymity set. Simulation results presented in the next section demonstrate the impact of traffic analysis on the anonymity degree.

Figure 5.3: By traffic analysis such as timing analysis and payload matching, colluding attackers (represented by black nodes) can divide the network space into smaller cells and shrink the anonymity set into a specific cell.

5.4 Performance Evaluation

The simulation is performed with ns-2 [4]. MAPCP is implemented as a transport agent sitting on top of the routing agent, and a Gnutella-like P2P client is implemented at the application layer to simulate the behavior of P2P applications. We compare the performance of two systems: (1) a P2P client on top of MAPCP with AODV as its routing protocol (MAPCP system), and (2) a P2P client on top of AODV directly² (AODV system). IEEE 802.11 with the distributed coordination function (DCF) for wireless LANs is used as the MAC layer in the simulation. The radio model uses characteristics similar to Lucent's WaveLAN, with 2 Mbps channel capacity, 250m radio propagation range, and the two-ray ground reflection propagation model as the physical-layer path loss model. 50 nodes are randomly distributed within the 700m-by-700m and 1000m-by-1000m fields respectively. Each simulation lasts 900 seconds and each result is averaged over at least 10 runs with randomly generated topologies. MAPCP is evaluated using the following metrics:

² Though AODV is not an anonymous routing protocol, it can still be used to represent single-path anonymous routing protocols in this case.

5.4.1 Degree of anonymity

We investigate the degree of sender anonymity in the scenario in which colluding attackers, by means of traffic analysis, divide the network into smaller cells. Recall that parameters $\lambda$ and $\alpha$ determine the anonymity degree of MAPCP.
MAPCP is first evaluated under different $\lambda$ with $\alpha$ set to $\delta_S + \min(\tau) + \sigma$, where $\sigma = 0$. Then, the value of $\lambda$ is fixed at 0 and $\alpha$ is increased by one ($\sigma = 1$) to evaluate the effect of $\alpha$ on the anonymity degree. We simulate 100 randomly selected one-to-one communication pairs over 20 randomly generated static network topologies, and each sender sends out one 512-byte data packet. The entropy metric defined in Section 5.3 is used to measure the anonymity degree.

Figs. 5.4(a), 5.4(b) and 5.4(c) show the anonymity degree of MAPCP and AODV when the network is divided into 1, 2 and 9 cells respectively. The ticks on the x-axis represent the upper bound of the linear distance between the sender and the receiver. For example, a point with x = 500 represents the averaged anonymity degree of all sender-receiver pairs with a distance less than 500 meters but greater than or equal to 250 meters. Since the radio transmission range of each node is 250 meters, the x-axis also represents the linear distance in terms of hop count. These figures show that the anonymity degree of both systems increases as the distance increases, since more nodes are involved in packet forwarding. Furthermore, MAPCP achieves higher anonymity than single-path routing protocols (represented by AODV) in all scenarios, which supports broadcast as an effective approach to providing anonymous communication. The figures also show that the anonymity degree of MAPCP increases as $\lambda$ increases, since more nodes are involved in packet forwarding and in the generation of dummy packets. However, this is accompanied by a degradation of packet delivery efficiency, since higher traffic leads to more packet collisions.

Moreover, as seen in the figures, when the sender is one hop away from the receiver, both protocols achieve the lowest anonymity degree, especially when the number of cells (created by adversaries) increases. AODV and MAPCP with $\lambda = 0$ provide almost zero anonymity for peers within one-hop distance when the network is divided into 2 or more cells.
The reason is that no relay node is needed over this short distance, and the anonymity set consists of only the sender and the receiver themselves if no other node helps generate dummy packets. This suggests that an anonymous communication protocol should provide covering when communication pairs are close to each other, e.g. trusted nodes generate dummy traffic to cover the real traffic patterns. In MAPCP, the covering can be provided by using a larger $\alpha$, e.g. $\alpha > (\delta_S + \min(\tau))$, as shown in Fig. 5.4(d). The increase of $\alpha$ involves more neighbor nodes in packet forwarding and hence helps conceal the location of the sender.

Figure 5.4: Degree of anonymity in the 700m-by-700m field divided into (a) 1 cell, (b) 2 cells, and (c) 9 cells. (d) Degree of anonymity with a larger $\alpha$ value.

5.4.2 Performance of packet delivery

MAPCP is evaluated in terms of its packet delivery performance and is compared to the routing performance of AODV. Both protocols are evaluated in high mobility and low mobility environments. In a high mobility environment, the node speed ranges from 0 to 20m/s with zero pause time (nonstop movement), while in a low mobility environment, the node speed is fixed at 20m/s and the pause time ranges from 0 to 900sec. The random waypoint mobility model is used for both scenarios. The simulation uses CBR sessions to generate data traffic at a rate of 4 packets per second with 512-byte data packets. To demonstrate the impact of traffic load, two different traffic settings are evaluated. The low-traffic setting constantly maintains 5 live communication pairs during the 900sec simulation, while the high-traffic setting constantly maintains 10 pairs. Each pair exchanges 100 data packets. In this MAPCP simulation, $\lambda = 0$ and $\alpha = \delta_S + \min(\tau)$.
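To make this parameter choice concrete, the rebroadcast rule of Eq. (5.1) can be sketched in a few lines of Python. This is a minimal illustration rather than the simulated implementation: the function names and the per-node hop counts are hypothetical, and the form $\mu\,\lambda^{(\min(\delta)+\min(\tau)-\alpha)}$ is assumed for the case $\mu < 1$.

```python
def rebroadcast_probability(alpha, min_delta, min_tau, lam):
    """Rebroadcast probability of an intermediate node, following Eq. (5.1)."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lambda must lie in [0, 1]")
    hops_via_node = min_delta + min_tau  # shortest sender -> node -> receiver length
    mu = alpha / hops_via_node
    if mu >= 1.0:
        return 1.0  # the node lies on a path no longer than alpha
    return mu * lam ** (hops_via_node - alpha)


def expected_relays(alpha, nodes, lam):
    """Sum of rebroadcast probabilities, i.e. the expected number of relay nodes."""
    return sum(rebroadcast_probability(alpha, d, t, lam) for d, t in nodes)


# Hypothetical (min_delta, min_tau) hop counts of five intermediate nodes;
# the optimal sender-receiver path is assumed to be 3 hops long.
nodes = [(1, 2), (2, 1), (2, 2), (1, 3), (3, 3)]

for alpha, lam in [(3, 0.0), (3, 0.9), (3, 1.0), (4, 0.0)]:
    probs = [round(rebroadcast_probability(alpha, d, t, lam), 4) for d, t in nodes]
    print(f"alpha={alpha} lambda={lam}: p={probs} "
          f"expected relays={expected_relays(alpha, nodes, lam):.3f}")
```

With $\lambda = 0$ and $\alpha$ equal to the optimal hop count, only the nodes on optimal paths forward. Raising $\alpha$ by one extends probability-one forwarding to nodes on paths that are one hop longer, which corresponds to the covering effect of a larger $\alpha$.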
Two performance metrics are used: (1) the packet delivery fraction (PDF), which is the ratio of the number of data packets received by the receiver to the number of data packets sent by the sender; (2) the average end-to-end delay of data packets, which is the duration from the generation of a data packet by the sender to its reception by the receiver. To simulate the cryptographic overhead in MAPCP, the computational delay of the ECAES public key cryptography (42ms for decryption and 160ms for encryption) [30] is added to the sender and the receiver upon the reception of each query reply and query message, respectively.

Figs. 5.5 and 5.6 show the packet delivery performance of both protocols in the 700m-by-700m and 1000m-by-1000m network fields respectively. As seen, MAPCP does not perform as well as AODV in packet delivery ratio, which is expected, since MAPCP trades performance for anonymity and has not been optimized for end-to-end communication. The major reason for the performance degradation in MAPCP is that the broadcast-based communication causes more collisions, since broadcast packets are sent without the RTS/CTS exchange that 802.11 DCF uses for channel reservation. The situation is worse when the traffic load gets higher. As seen in both Fig. 5.5(a) and Fig. 5.5(c), the PDF of MAPCP is about 95% in the 5-pair scenario, but only about 90% in the 10-pair scenario. However, the PDF of MAPCP does not degrade significantly as the node mobility increases, which indicates that the broadcast-based communication scheme adapts well to node mobility. Fig. 5.5(b) and Fig. 5.5(d) show that the average end-to-end delay of data packets in MAPCP increases as the traffic load increases, which is also expected, since a higher traffic load means more chances of the 802.11 MAC layer sensing a busy channel, and hence longer buffering before transmitting the broadcast packets.
Moreover, the collision of query replies may lead to the retransmission of query messages, which also introduces more encryption delay at the receiver end.

Figure 5.5: Packet delivery fraction and end-to-end delay in the 700m-by-700m field with (a)(b) high mobility and (c)(d) low mobility.

The figures also show that the PDF of both protocols degrades in the network of lower node density, as shown in Fig. 5.6(a) and Fig. 5.6(c). Furthermore, the low-node-density environment magnifies the impact of node mobility. Clearly, the discovery of relay nodes for multihop communication is much harder in a sparse network than in a dense network. Nevertheless, the figures show that the end-to-end delay of MAPCP under the 10-pair traffic load decreases significantly when the node density is lower. Part of the reason is that the channel is less busy at lower node density, which shortens the buffering delay in the MAC layer and hence decreases the end-to-end delay of data packets.

Figure 5.6: Packet delivery fraction and end-to-end delay in the 1000m-by-1000m field with (a)(b) high mobility and (c)(d) low mobility.

5.4.3 Protocol Overhead

The overhead of MAPCP is measured in terms of the normalized number of packet transmissions and its energy consumption.

5.4.3.1 Normalized number of packet transmissions

We measure the normalized number of control packets, which is the ratio of the total number of control packets transmitted by any node to the total number of data packets received by all receivers, and the normalized number of data packets, which is the ratio of the total number of data packets transmitted by any node to the total number of data packets received by all receivers. We compare the overhead of MAPCP with that of AODV in one-to-many communication, in which senders and receivers are randomly chosen. Fig. 5.7(a) shows the performance in terms of the normalized number of control packets.
As seen, the control overhead introduced by MAPCP remains almost the same, while the overhead in AODV is proportional to node mobility. For an anonymous routing protocol, more control overhead means higher cryptographic overhead and higher energy consumption. Furthermore, the normalized control overhead in MAPCP decreases significantly as the number of receivers increases. This shows that MAPCP establishes anonymous paths from one peer to multiple peers more efficiently. The normalized number of data packets shown in Fig. 5.7(b) indicates that MAPCP generates more redundant packets in the data transmission phase, which is expected since MAPCP provides anonymity by generating dummy traffic. However, these packet transmissions are spread over all involved nodes instead of being concentrated on nodes en route. Therefore, the energy consumption per MAPCP node, as seen in the following discussion, is still acceptable.

5.4.3.2 Energy consumption

We compare the energy consumption of MAPCP with that of single-path anonymous routing protocols using hop-by-hop encryption/decryption. A general hop-by-hop encryption/decryption protocol is implemented to imitate the behavior of ANODR. The implementation consists of two phases: the anonymous route discovery phase and the anonymous data forwarding phase. In the anonymous route discovery phase, the route discovery (RD) packets are broadcast to the entire network, while the route reply (RR) packets are unicast back to the source. Each node, upon receiving a nonduplicate RD packet, performs one AES encryption (to hide the route) and one AES decryption (to decrypt the trapdoor information). Each node en route, upon receiving a nonduplicate RR packet, performs one AES decryption.

Figure 5.7: Overhead in terms of (a) normalized number of control packets, (b) normalized number of data packets, (c) energy consumption in route construction phase, and (d) energy consumption in data transmission phase.
In the anonymous data forwarding phase, data packets are forwarded along the anonymous path established in the previous phase. For a comparison with MAPCP, nodes en route also generate dummy packets to show the extra energy consumption. The number of dummy packets generated per node en route is an adjustable parameter in the simulation. The simulation is conducted in the 700m-by-700m high-mobility environment. We measure the total energy consumption, which includes the energy consumed by key generations, encryptions, decryptions, packet broadcasts and unicasts, according to the numbers provided in [21] and [45]. For a fair comparison, the MAPCP extension for non-P2P applications described in Section 5.2.3 is used, and common secrets are assumed between the communication parties in both protocols. Our imitating protocol may not operate exactly the same as ANODR; however, the number of these cryptographic operations will be roughly the same. Furthermore, note that for P2P applications, the extra overhead of query broadcasts should always be added when anonymous routing protocols are used.

Figs. 5.7(c) and 5.7(d) show the energy consumption in the route construction phase and the data transmission phase respectively. As seen, in the route construction phase, the energy consumed by MAPCP remains constant, while that of the hop-by-hop encryption/decryption based protocol increases linearly as node mobility increases, which is due to more route rediscovery processes as the mobility increases. However, MAPCP has slightly higher energy consumption when the mobility is zero. The reason is that the query replies in MAPCP are broadcast back to the sender, while in hop-by-hop encryption/decryption based protocols the route replies are unicast back to the sender, which consumes less energy because fewer packets are transmitted.
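The growth of the hop-by-hop protocol's route-construction cost with rediscoveries can be made concrete with a rough count of AES operations, following the per-packet rules of the imitating protocol described above. The sketch below is illustrative only: the function names are hypothetical, it assumes every RD flood reaches all other nodes, and the node and hop counts are example values rather than measured results.

```python
def aes_ops_per_route_discovery(flooded_nodes, nodes_en_route):
    """AES operations triggered by one route discovery of the imitating protocol."""
    # One encryption and one decryption per node that receives a nonduplicate RD packet.
    rd_ops = 2 * flooded_nodes
    # One decryption per en-route node that receives a nonduplicate RR packet.
    rr_ops = nodes_en_route
    return rd_ops + rr_ops


def total_aes_ops(discoveries, flooded_nodes, nodes_en_route):
    """Total AES operations when the route has to be (re)discovered several times."""
    return discoveries * aes_ops_per_route_discovery(flooded_nodes, nodes_en_route)


# Hypothetical example: a 50-node network in which the RD flood reaches the
# other 49 nodes and the discovered route has 3 nodes en route.
for discoveries in (1, 5, 10):
    print(discoveries, total_aes_ops(discoveries, flooded_nodes=49, nodes_en_route=3))
```

Under these assumptions the count is dominated by the network-wide RD flood, so each additional rediscovery adds roughly two AES operations per node in the network.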
In the data transmission phase, we compare MAPCP with the hop-by-hop encryption/decryption based protocol that also generates different numbers of dummy packets, which reflects the extra energy consumption from generating dummy packets. We measure the total energy consumed in the data transmission phase during the entire simulation process. The result is then normalized by the number of nodes involved in the communication and the resulting packet delivery fraction. The ratio in the legend of Fig. 5.7(d) indicates the ratio of the total number of sent data packets to the total number of sent dummy packets. Since MAPCP broadcasts the data packets onto all anonymous paths it has established between the communication parties, the number of packet transmissions in MAPCP is expected to be much larger. As seen in Fig. 5.7(b), the number of packet transmissions is about 5 times more than that in the single-path anonymous routing protocol without dummy packets. However, as seen in Fig. 5.7(d), the energy consumed by MAPCP is as low as that consumed by the hop-by-hop encryption/decryption based protocol with a 1:5 ratio of data packets to dummy packets. This shows that when providing the same anonymity degree, the energy consumption in the data transmission phase is similar in both protocols. Recall that MAPCP consumes much lower energy in the route construction phase. Therefore, MAPCP is expected to prolong the network lifetime compared to the hop-by-hop encryption/decryption based protocols.

5.4.4 Effect of multipath in hostile environments

We investigate the effect of the multiple paths created by MAPCP in hostile environments where compromised nodes perform selective attacks. The selective attack is the simplest passive attack, in which the compromised node drops data packets traveling through it. For a comparison with single-path routing protocols, AODV is also simulated.
We evaluate both protocols in networks with 10% and 30% compromised nodes, and 5 CBR session pairs are constantly maintained during the 900sec simulation period. Each CBR session sends 100 512-byte data packets at a rate of 4 packets per second. The results are shown in Fig. 5.8. As seen in Fig. 5.8(a), AODV achieves only about 85% and 75% in PDF when there are 10% and 30% compromised nodes respectively, while MAPCP still maintains a PDF higher than 90% in both cases. The difference in performance between the two protocols is more significant in a sparse network, as seen in Fig. 5.8(c). The results show that providing multiple paths is an effective defense against malicious attacks, and is essential to a secure communication protocol. Furthermore, by comparing Figs. 5.5(b) and 5.6(b) with Figs. 5.8(b) and 5.8(d), we found that the delay of MAPCP is almost unaffected, while the delay of AODV increases significantly, especially in the sparse network (Fig. 5.6(b)). The increase in delay partially comes from the increased number of route rediscovery processes in AODV when packets are maliciously dropped. For an anonymous communication protocol, more route rediscovery processes mean more broadcasts of route request packets and more cryptographic overhead, which is a real concern in a resource-constrained environment such as a MANET. An interesting scenario shown in Fig. 5.8(a) is that the selective attack somewhat improves the PDF of MAPCP when the traffic load is high, which is due to the alleviation of packet collisions when redundant data packets are dropped by the compromised nodes.

Figure 5.8: Simulation results in the hostile environments. The packet delivery fraction and end-to-end delay in (a)(b) the 700m-by-700m field and (c)(d) the 1000m-by-1000m field.

Chapter 6
Conclusion and Future Work

The deployment of P2P content distribution systems over MANETs is more challenging than that over the wired Internet.
In this research, we have attempted to understand the behavior of P2P networks and the characteristics of MANETs, and then to find the best solution for performing P2P content delivery over MANETs. The research work conducted so far is summarized, and several possible future research topics are described, in this chapter.

6.1 Conclusion

A probabilistic query routing protocol based on the Bloom filter, called VABF, for P2P applications over MANETs was studied in Chapter 3. VABF maintains the content information of each P2P node in a distributed way and forwards query messages to the closest object holder via the shortest path. The proposed protocol was evaluated by theoretical analysis as well as extensive computer simulations. The analysis provides a good estimate of the approximate overhead required by a deterministic query routing. We derived expressions for the query success rate and the correlation between the frequency of content updates and the delay for updates to be propagated to the entire network. Simulation curves closely match those obtained by theoretical analysis. We also compared the VABF protocol with three popular search algorithms, namely flooding, random walk and expanding ring search, which are adopted by unstructured P2P systems. It was shown by simulation that VABF has much higher search efficiency, especially under heavy query traffic.

To address the privacy issue in the multihop communication scenario, an efficient anonymous communication protocol, called MAPCP, was proposed in Chapter 5 for P2P applications over MANETs. MAPCP uses a broadcast-based communication scheme and a probabilistic flooding control strategy to establish multiple anonymous paths within a single query phase. It was shown by computer simulation that MAPCP achieves a high anonymity degree even when colluding adversaries divide the network into several smaller cells. MAPCP also maintains a high packet delivery fraction even under selective attacks.
MAPCP is designed to be a middleware protocol lying between applications and network layer routing protocols, and can be easily implemented on any existing MANET.

To facilitate MANET applications in our daily life, it is critical to develop improved solutions for system performance and user-oriented applications such as P2P file sharing. To find a good reference for the design of future user applications over MANETs, several unstructured P2P content discovery techniques over MANETs were evaluated in Chapter 4. These techniques are query flooding, expanding ring search, random walk and BF-based probabilistic routing. We conducted mathematical analysis on performance metrics such as the query success rate, the route stretch and the search cost, and performed extensive simulations in both static and mobile environments for these schemes. Our main observation is that the BF-based probabilistic routing scheme, called VABF, outperforms flooding-based and random walk schemes under various performance metrics. The only disadvantage of VABF is that the control packet size increases as the number of shared objects increases. However, this may not impose a severe constraint on a middle-sized MANET.

6.2 Future Work

For future research, it is interesting to consider improving the throughput and security of P2P content distribution over several promising MANET applications. These applications include the vehicular ad-hoc network (VANET) and underwater networks. Both are variations of MANETs under different constraints.

A VANET is actually a hybrid form of a MANET. That is, it consists of high-speed mobile nodes (moving vehicles) and a fixed infrastructure (road-side stations). The movement of mobile nodes in VANETs is usually monotonic, i.e. along a single direction or two opposite directions, and network territories usually have long rectangular shapes.
The underwater network is another variation of MANETs, in which the network topology is usually three-dimensional instead of two-dimensional as assumed in traditional MANETs. Thus, traditional MANET routing protocols should be redesigned in order to adapt to high node mobility, take advantage of the assistance from fixed road-side stations in VANETs, or handle the three-dimensional scenario in underwater networks.

After successfully locating the desired content via the query routing protocol, a peer node will attempt to establish steady connections between itself and selected content owners to download the content. Building steady connections between the content requestor and multiple content owners is a more interesting problem than building a steady connection between two single nodes. The transient nature of wireless links in MANETs and the unique characteristics of VANETs and underwater networks make this problem even more challenging.

We may begin with studies on the impact of a peer selection algorithm on data transmission throughput. The peer selection algorithm decides which peers among all available content owners to download from, and also the number of concurrent downloading tasks from different peers. Peer selection is always a critical issue for the performance of P2P networks, and it has been widely studied for Internet P2P systems (especially structured P2P systems). A good peer selection algorithm can improve the download speed significantly, as evidenced by the BitTorrent network [14], where a client simultaneously downloads pieces of the file from all members of the overlay that share one specific file. In contrast, clients in Gnutella and Napster download a file from a single peer over a P2P overlay that changes frequently. As a result, the BitTorrent network has much higher throughput in its download tasks than other popular P2P file sharing networks.
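As a rough illustration of the two decisions a peer selection algorithm makes (which content owners to use and how many downloads to run concurrently), the following Python sketch ranks candidate owners by a cost and keeps the cheapest few. The `Candidate` fields, the weights and the linear form of `cost` are assumptions made purely for illustration; they are not taken from the protocols studied in this dissertation.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Candidate:
    """A content owner found during the query phase (hypothetical fields)."""
    peer_id: str
    hop_count: int          # route length from the requestor to this owner
    link_stability: float   # 0.0 (very transient) to 1.0 (very stable)


def cost(c: Candidate, hop_weight: float = 1.0,
         stability_weight: float = 2.0) -> float:
    """Illustrative cost: fewer hops and more stable routes are cheaper."""
    return hop_weight * c.hop_count + stability_weight * (1.0 - c.link_stability)


def select_peers(candidates: List[Candidate],
                 max_concurrent: int) -> List[Candidate]:
    """Return at most max_concurrent owners, lowest cost first."""
    if max_concurrent < 1:
        return []
    # Ties are broken by peer_id so the result is deterministic.
    ranked = sorted(candidates, key=lambda c: (cost(c), c.peer_id))
    return ranked[:max_concurrent]


owners = [
    Candidate("a", hop_count=1, link_stability=0.9),
    Candidate("b", hop_count=3, link_stability=0.5),
    Candidate("c", hop_count=2, link_stability=1.0),
]
chosen = select_peers(owners, max_concurrent=2)
```

Here `max_concurrent` caps the number of simultaneous downloads. In a multihop wireless setting this cap would presumably be kept small, but choosing its value is exactly the open question raised in this section.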
While concurrent downloading from different peers provides good throughput in P2P systems over the Internet, it may cause serious interference and, hence, throughput degradation when applied to a multihop wireless network. The situation could be even worse if the selected peers are close to each other or if downloading tasks are not scheduled carefully. Compared with the Internet, there are more factors that influence the download throughput and should be taken into consideration in the selection of downloading peers. These extra factors include the physical distance between nodes, the node location, the residual energy, and the velocity and direction of node mobility. In a heterogeneous environment such as a VANET, there are also factors introduced by variations among relaying nodes, including different computing power, radio transmission range and storage capacity. It is important to analyze these issues and include these factors in a cost function. By optimizing the cost function, we can obtain the peer selection decision that maximizes the download throughput.

Furthermore, we may study the peer selection problem in the following scenarios.

• Real-time Short Message
  This scenario includes real-time safety and control messages in VANETs, which tolerate neither delay nor flaws in message integrity.

• Asynchronous Content Distribution
  This scenario often occurs in P2P file sharing. It requires the integrity of the shared file but imposes a looser constraint on delivery time.

• Synchronous Content Distribution
  This scenario happens in real-time audio or video broadcasting, online conferencing and online gaming, which generate real-time traffic and, hence, have a strict jitter requirement. However, it can tolerate a certain amount of erroneous or lost packets.

• On Demand Content Retrieval
  This scenario includes non-real-time P2P audio or video streaming, which has a strict jitter requirement and some tolerance of packet errors depending on the QoS requirement.

Bibliography

[1] The Human Genome Project. [Online]. Available: http://www.ornl.gov/sci/techresources/Human_Genome/home.shtml
[2] The Jabber website. [Online]. Available: http://www.jabber.org/
[3] Kazaa. [Online]. Available: http://www.kazaa.com/
[4] The Network Simulator - ns-2. [Online]. Available: http://www.isi.edu/nsnam/ns/
[5] SETI@home. [Online]. Available: http://setiathome.ssl.berkeley.edu/
[6] Skype. [Online]. Available: http://www.skype.com
[7] D. Ahmet and C.-C. Shen, "Mobile ad hoc p2p file sharing," in Proc. IEEE WCNC'04, 2004, pp. 114-119.
[8] S. Androutsellis-Theotokis and D. Spinellis, "A survey of peer-to-peer content distribution technologies," ACM Computing Surveys, vol. 36, no. 4, Dec. 2004.
[9] B. H. Bloom, "Space/time trade-offs in hash coding with allowable errors," Communications of the ACM, vol. 13, no. 7, pp. 422-426, 1970.
[10] A. Broder and M. Mitzenmacher, "Network applications of bloom filters: A survey," no. 4, pp. 485-509, 2005.
[11] D. L. Chaum, "Untraceable electronic mail, return addresses, and digital pseudonyms," Commun. ACM, vol. 24, no. 2, pp. 84-90, 1981.
[12] Y. Chawathe, S. Ratnasamy, L. Breslau, N. Lanham, and S. Shenker, "Making Gnutella-like p2p systems scalable," in Proc. ACM SIGCOMM'03, 2003, pp. 407-418.
[13] I. Clarke, O. Sandberg, B. Wiley, and T. W. Hong, "Freenet: A distributed anonymous information storage and retrieval system," in Proc. International Workshop on Design Issues in Anonymity and Unobservability, 2001.
[14] B. Cohen, "Incentives build robustness in BitTorrent," in Proc. Workshop on Economics of Peer-to-Peer Systems, 2003.
[15] M. Conti and S. Giordano, "Multihop ad hoc networking: the theory," Communications Magazine, IEEE, vol. 45, pp. 78-86, April 2007.
[16] M. Conti, E. Gregori, and G. Turi, "A cross-layer optimization of gnutella for mobile ad hoc networks," in Proc. ACM MobiHoc'05, 2005, pp. 343-354.
[17] C. Díaz, S. Seys, J. Claessens, and B. Preneel, "Towards measuring anonymity," in Proc. Privacy Enhancing Technologies Workshop (PET'02), Apr. 2002.
[18] W. Diffie and M. E. Hellman, "New directions in cryptography," vol. IT-22, no. 6, pp. 644-654, Nov. 1976.
[19] G. Ding and B. Bhargava, "Peer-to-peer file-sharing over mobile ad hoc networks," in Proc. IEEE PERCOMW'04, 2004.
[20] L. Fan, P. Cao, J. Almeida, and A. Z. Broder, "Summary cache: a scalable wide-area Web cache sharing protocol," IEEE/ACM Transactions on Networking, vol. 8, no. 3, pp. 281-293, 2000.
[21] L. M. Feeney and M. Nilsson, "Investigating the energy consumption of a wireless network interface in an ad hoc networking environment," in Proc. IEEE Infocom'01, Anchorage, AK, US, 2001.
[22] C. Foh, G. Liu, B. Lee, B. Seet, K. Wong, and C. Fu, "Network connectivity of one-dimensional manets with random waypoint movement," Communications Letters, IEEE, vol. 9, no. 1, pp. 31-33, 2005.
[23] M. Frodigh, P. Johansson, and P. Larsson, "Wireless ad hoc networking: The art of networking without a network," Ericsson Review, no. 4, 2000.
[24] C. Gkantsidis, M. Mihail, and A. Saberi, "Random walks in peer-to-peer networks," in Proc. IEEE INFOCOM'04, vol. 1, 2004.
[25] R. Huebsch, J. M. Hellerstein, N. Lanham, B. T. Loo, S. Shenker, and I. Stoica, "Querying the internet with PIER," in Proc. VLDB'03, 2003.
[26] P. Jacquet, P. Muhlethaler, T. Clausen, A. Laouiti, A. Qayyum, and L. Viennot, "Optimized link state routing protocol," in Proc. IEEE INMIC'01, 2001.
[27] D. Johnson, D. Maltz, and J. Broch, DSR The Dynamic Source Routing Protocol for Multihop Wireless Ad Hoc Networks. Addison-Wesley, ch. 5, pp. 139-172.
[28] C. Karlof and D. Wagner, "Secure routing in wireless sensor networks: Attacks and countermeasures," Elsevier's AdHoc Networks Journal, Special Issue on Sensor Network Applications and Protocols, vol. 1, no. 2-3, pp. 293-315, Sep. 2003.
[29] A. Klemm, C. Lindemann, and O. P. Waldhorst, "A special-purpose peer-to-peer file sharing system for mobile ad hoc networks," in Proc. MADNET'03, 2003, pp. 41-49.
[30] J. Kong and X. Hong, "Anodr: Anonymous on demand routing with untraceable routes for mobile ad-hoc networks," in Proc. ACM MobiHoc'03, June 2003.
[31] J. Kong, X. Hong, M. Sanadidi, and M. Gerla, "Mobility changes anonymity: Mobile ad hoc networks need efficient anonymous routing," in Proc. IEEE ISCC'05, 2005.
[32] G. Kortuem, J. Schneider, D. Preuitt, T. G. C. Thompson, S. Fickas, and Z. Segall, "When peer-to-peer comes face-to-face: Collaborative peer-to-peer computing in mobile ad hoc networks," in Proc. IEEE P2P'01, 2001.
[33] J. Kubiatowicz, D. Bindel, Y. Chen, P. Eaton, D. Geels, R. Gummadi, S. Rhea, H. Weatherspoon, W. Weimer, C. Wells, and B. Zhao, "Oceanstore: An architecture for global-scale persistent storage," in Proc. ACM ASPLOS'00. ACM, Nov. 2000.
[34] A. Kumar, J. Xu, and E. W. Zegura, "Efficient and scalable query routing for unstructured peer-to-peer networks," in Proc. IEEE INFOCOM'05, 2005.
[35] B. T. Loo, J. M. Hellerstein, R. Huebsch, S. Shenker, and I. Stoica, "Enhancing p2p file-sharing with an internet-scale query processor," in Proc. VLDB'04, 2004.
[36] Q. Lv, P. Cao, E. Cohen, K. Li, and S. Shenker, "Search and replication in unstructured peer-to-peer networks," in Proc. ACM ICS'02, 2002, pp. 84-95.
[37] A. M. Makowski, "On a random sum formula for the busy period of the M/G/1 queue with applications," The Institute for Systems Research, University of Maryland, Tech. Rep. CSHCN TR 2001-4, 2001.
[38] L. E. Miller, "Distribution of link distances in a wireless network," Journal of Research of the National Inst. Standards and Technology, vol. 106, no. 2, pp. 401-412, 2001.
[39] M. Mitzenmacher, "Compressed bloom filters," in Proc. ACM PODC'01, 2001, pp. 144-150.
[40] S.-Y. Ni, Y.-C. Tseng, Y.-S. Chen, and J.-P. Sheu, "The broadcast storm problem in a mobile ad hoc network," in Proc. ACM MobiCom'99, New York, NY, USA, 1999, pp. 151-162.
[41] V. D. Park and M. S. Corson, "A highly adaptive distributed routing algorithm for mobile wireless networks," in Proc. IEEE INFOCOM'97, 1997, pp. 1405-1413.
[42] C. Perkins and E. Royer, "Ad-hoc on-demand distance vector routing," in Proc. IEEE WMCSA'99, 1999, pp. 90-100.
[43] C. Perkins and P. Bhagwat, "Highly dynamic destination-sequenced distance-vector routing (DSDV) for mobile computers," in Proc. ACM SIGCOMM'94, 1994, pp. 234-244.
[44] A. Pfitzmann and M. Köhntopp, "Anonymity, unobservability, and pseudonymity: A proposal for terminology," in Proc. Workshop on Design Issues in Anonymity and Unobservability, 2000, pp. 1-9.
[45] N. R. Potlapally, S. Ravi, A. Raghunathan, and N. K. Jha, "Analyzing the energy consumption of security protocols," in Proc. ISLPED'03, 2003.
[46] H. Pucha, S. M. Das, and Y. C. Hu, "Ekta: An efficient DHT substrate for distributed applications in mobile ad hoc networks," in Proc. IEEE WMCSA'04, 2004.
[47] S. Ratnasamy, P. Francis, M. Handley, R. Karp, and S. Shenker, "A scalable content-addressable network," in Proc. ACM SIGCOMM'01, New York, NY, USA, 2001, pp. 161-172.
[48] J.-F. Raymond, "Traffic analysis: protocols, attacks, design issues, and open problems," in Proc. International workshop on Designing privacy enhancing technologies. New York, NY, USA: Springer-Verlag New York, Inc., 2001, pp. 10-29.
[49] M. G. Reed, P. F. Syverson, and D. M. Goldschlag, "Anonymous connections and onion routing," vol. 16, no. 4, 1998.
[50] M. K. Reiter and A. D. Rubin, "Anonymous web transactions with crowds," Commun. ACM, vol. 42, no. 2, pp. 32-48, 1999.
[51] T. Repantis and V. Kalogeraki, "Data dissemination in mobile peer-to-peer networks," in Proc. ACM MCM'05, 2005, pp. 211-219.
[52] S. C. Rhea and J. Kubiatowicz, "Probabilistic location and routing," in Proc. IEEE INFOCOM'02, 2002.
[53] M. Ripeanu, "Peer-to-peer architecture case study: Gnutella network," in Proc. IEEE P2P'01.
[54] A. Rowstron and P. Druschel, "Pastry: Scalable, distributed object location and routing for large-scale peer-to-peer systems," in Proc. IFIP/ACM Middleware'01, Heidelberg, Germany, Nov. 2001, pp. 329-350.
[55] A. I. T. Rowstron and P. Druschel, "Storage management and caching in PAST, a large-scale, persistent peer-to-peer storage utility," in Symposium on Operating Systems Principles, 2001, pp. 188-201.
[56] R. Schollmeier, I. Gruber, and M. Finkenzeller, "Routing in mobile ad hoc and peer-to-peer networks. a comparison," in Proc. the International Workshop on Peer-to-Peer Computing, 2002.
[57] L. Serafini, F. Giunchiglia, F. Mylopoulos, and P. Bernstein, "The local relational model: A logical formalization of database coordination," in Proc. CONTEXT'03, 2003.
[58] A. Serjantov and G. Danezis, "Towards an information theoretic metric for anonymity," in Proc. Privacy Enhancing Technologies Workshop (PET'02), Apr. 2002.
[59] C. Shields and B. N. Levine, "A protocol for anonymous communication over the internet," in Proc. ACM CCS'00, 2000, pp. 33-42.
[60] I. Stoica, R. Morris, D. Karger, M. F. Kaashoek, and H. Balakrishnan, "Chord: A scalable peer-to-peer lookup service for internet applications," in Proc. ACM SIGCOMM'01, Aug. 2001, pp. 149-160.
[61] I. H. Witten, R. M. Neal, and J. G. Cleary, "Arithmetic coding for data compression," Communications of the ACM, vol. 30, no. 6, pp. 520-540, 1987.
[62] T. Zahn and J. Schiller, "MADPastry: A DHT substrate for practicably sized MANETs," in Proc. ASWN'05, 2005.
[63] Y. Zhang, W. Liu, and W. Lou, "Anonymous communications in mobile ad hoc networks," in Proc. IEEE INFOCOM'05, 2005.
[64] B. Y. Zhao, L. Huang, J. Stribling, S. C. Rhea, A. D. Joseph, and J. Kubiatowicz, "Tapestry: A resilient global-scale overlay for service deployment," IEEE Journal on Selected Areas in Communications, vol. 22, no. 1, Jan. 2004.
Abstract
The mobile ad-hoc network (MANET) is emerging as a new paradigm of wireless communication in both civilian and military applications. Recently, efforts have been made to migrate peer-to-peer (P2P) applications, which are expected to be the major impetus to MANET commercialization, from the wired Internet to MANETs. This dissertation investigates several fundamental technologies that facilitate the deployment of P2P content distribution systems over MANETs. In particular, we focus on efficient P2P content discovery and privacy protection in P2P file sharing.
Asset Metadata
Creator: Chou, Chao-Chin (author)
Core Title: Techniques for peer-to-peer content distribution over mobile ad hoc networks
School: Viterbi School of Engineering
Degree: Doctor of Philosophy
Degree Program: Electrical Engineering
Publication Date: 10/10/2007
Defense Date: 09/04/2007
Publisher: University of Southern California (original), University of Southern California. Libraries (digital)
Tag: content distribution, manet, mobile ad hoc network, OAI-PMH Harvest, p2p, peer-to-peer
Language: English
Advisor: Kuo, C.C. Jay (committee chair), Chen, Xiaojiang (committee member), Psounis, Konstantinos (committee member)
Creator Email: chaochic@usc.edu
Permanent Link (DOI): https://doi.org/10.25549/usctheses-m860
Unique identifier: UC1173091
Identifier: etd-Chou-20071010 (filename), usctheses-m40 (legacy collection record id), usctheses-c127-557695 (legacy record id), usctheses-m860 (legacy record id)
Legacy Identifier: etd-Chou-20071010.pdf
Dmrecord: 557695
Document Type: Dissertation
Rights: Chou, Chao-Chin
Type: texts
Source: University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Repository Name: Libraries, University of Southern California
Repository Location: Los Angeles, California
Repository Email: cisadmin@lib.usc.edu