Gradient-based active query routing in wireless sensor networks
(USC Thesis Other)
GRADIENT-BASED ACTIVE QUERY ROUTING IN WIRELESS SENSOR NETWORKS

by

Jabed Faruque

A Dissertation Presented to the
FACULTY OF THE GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(COMPUTER ENGINEERING)

August 2007

Copyright 2007 Jabed Faruque

Dedication

To my parents and my wife Nushrat, for their love and support.

Acknowledgements

It has been a great opportunity to pursue my graduate study at the USC Viterbi School of Engineering. During this period, extremely bright and talented faculty members have taught me essential knowledge and skills for academic and professional life. I would like to acknowledge the help and cooperation of many people during my graduate study at USC.

Firstly, I would like to thank my PhD advisor, Prof. Ahmed Helmy. It has been my distinct honor to work with him. I am really thankful for his guidance, inspiration and support in overcoming difficult times. His constructive feedback made it easier for me to find solutions to critical problems. I would also like to acknowledge the feedback of Prof. Bhaskar Krishnamachari and Prof. Konstantinos Psounis at different stages of, and on different problems in, my PhD work.

I am really thankful to all the members of the NOMAD group, especially Fan, Narayan, Karim, Ganesha, Shamim, Shirin, Shao-Cheng, Wei-jen and Sapon. I have enjoyed their cooperation, and it has been my great pleasure to work and discuss problems with them. Also, being in the same lab, I have had many opportunities to discuss different ideas with the members of the ANRG group, especially Marco, Sundeep, Shyam, Kiran and Avinash.

Also, I would like to acknowledge Prof. Ramesh Govindan for allowing me to use the Tutornet test-bed. I have received significant support in using the test-bed effectively from the members of ENL, especially Kiyoung, Omprakash and Jeongyeup. Further, I would like to acknowledge Prof. Cyrus Shahabi, my MS advisor, for his guidance and support in the early days of my graduate study at USC.

Finally, I would like to acknowledge my family members, especially my parents and my wife, Nushrat. A wonderful and loving family has made this journey possible. Their patience and cooperation have helped me to reach my goal.

Table Of Contents
Dedication
Acknowledgements
List Of Figures
Abstract
Chapter 1: Introduction
1.1 Diffusion Information in the Environment
1.2 RUGGED: Information Gradient-based Routing
1.3 Analysis of Gradient-based Routing Approaches
1.4 PBS: A Virtual Grid Architecture for Querying
1.5 TABS: Link Loss Tolerant Data Routing Protocol
Chapter 2: Related Work
2.1 Query Dissemination
2.2 Query Processing
2.3 Data Routing in Wireless Sensor Networks
2.3.1 Wireless Link Characteristics
2.3.2 Hop-by-hop retransmission
2.3.3 Blacklisting and Link Reliability Metric
2.3.4 Wireless Broadcast Advantage
2.3.5 Multiple Path Routing
2.3.6 Opportunistic Forwarding
Chapter 3: RUGGED - Information Gradient-based Routing
3.1 Protocol
3.2 Simulation Model and Performance Evaluation
3.2.1 Single-value Query
3.2.2 Global Maxima Search
3.2.3 Multiple Event Detection
3.3 Conclusion
Chapter 4: Analysis of Gradient-based Routing Approaches
4.1 Query Routing Approaches
4.1.1 Single-path approach
4.1.2 Multiple-path approach
4.2 Analytical Model
4.2.1 Assumptions and Metrics
4.2.2 Single-path Approach
4.2.3 Multiple-path Approach
4.3 Simulations and Results
4.3.1 Simulation Model
4.3.2 Query Success Rate i.e., Probability of Success
4.3.3 Overhead i.e., Energy Dissipation
4.3.4 Path Quality
4.3.5 Wireless Link Loss Effect
4.4 Summary and Conclusions
Chapter 5: PBS - A Virtual Grid Architecture for Querying
5.1 Overview of PBS Architecture
5.2 Approximate Estimation of 'd'
5.3 Query Processing Algorithm
5.3.1 Aggregate query - Count, Sum and Average
5.3.2 Aggregate query - Max and Min
5.3.3 Combined Query
5.4 Analysis of the Algorithm
5.4.1 Assumptions
5.4.2 Count Query
5.4.3 Max Query
5.4.4 Combined Query
5.5 Simulations and Performance
5.5.1 Simulation Model
5.5.2 Count Query
5.5.3 Max Query
5.5.4 Combined Query
5.6 Conclusion
Chapter 6: TABS - Link Loss Tolerant Data Routing Protocol
6.1 Motivation
6.2 Protocol Overview
6.3 Protocol Design
6.3.1 Node State
6.3.2 Packet Format
6.3.3 Packet Forwarding
6.3.4 Packet Reception
6.3.5 Handling poor quality links or link failure
6.3.6 Minimum Progress Limit
6.3.7 Suppress Extra Forwarding Paths
6.3.8 State Transition of Node
6.4 Evaluation
6.4.1 Implementation and Evaluation Methodology
6.4.2 Brief Description of Other Routing Protocols
6.4.3 ARQ-based Routing
6.4.4 Multi-Path Routing
6.4.5 ARQ-based Routing with Link Quality Estimation
6.4.6 Results
6.4.6.1 Approximate Linear Deployment
6.4.6.2 Dynamic Topology (Query-Reply Scenario)
6.4.6.3 ARQ-based Routing with Link Quality Estimation
6.5 Conclusion
Chapter 7: Contributions
Chapter 8: Future Work
8.1 Query Dissemination
8.2 Query Processing
8.3 Reply Mechanism
Bibliography

List Of Figures
1.1 Environment model: (a) Events are at the peaks and their effect reduces with distance. (b) The event is located at "E". The radial color gradient represents the event's effect. Black dots denote good sensor nodes and gray dots (e.g., "M") denote malfunctioning sensor nodes. Nodes in the white region are the flat information region nodes.
3.1 Routing protocol: The event is at 'E' and the querier 'Q' is located in the flat information region. The effect of 'E' follows a diffusion law. $M_x$ is a local maximum and $M_n$ is a local minimum.
3.2 Sensor layout: (a) uniform random grid; sensors within the dotted rectangle are removed to create a sensor hole, (b) uniform random grid with the sensor hole. "E" denotes the location of the event.
3.3 Effect of flat information region nodes (3% environmental noise and 15% malfunctioning nodes).
3.4 Comparison of FBQ, ERS, and our information gradient-based routing (GBR). Here, the ERS ring sizes are 3, 5 and 7. For GBR, s1, s2 and s3 indicate 66%, 47%, and 36% flat information region nodes, respectively, and a1 and a2 indicate β = 0.7 and β = 0.5, respectively.
3.5 Query failure rate when routing a query around the sensor hole of the second sensor layout.
3.6 For global maxima search, effect of flat information region nodes, with 3% environmental noise and 15% malfunctioning nodes. Note that the y-scales of (a) and (b) are 0-14000 and 0-8000, respectively.
3.7 Multiple event detection, with 3% environmental noise and 15% malfunctioning nodes.
4.1 A regular grid topology. Here, $f_j$ indicates the magnitude of information and $f_0 < f_1 < \cdots < f_d$. The triangular pattern represented by white dots is present eight times in the grid. The information magnitude near the source is $f_d$ and gradually reduces towards the edge.
4.2 Single path approach with look-ahead parameter r.
4.3 Query forwarding pattern using the multiple path approach. The patterns differ depending on the position of the sink. Here, the white dots indicate the nodes participating in forwarding the query towards the source, and d is the distance between the source and the sink.
4.4 Probability of query success of both approaches. 'A' and 'S' indicate the analytical and simulation results, respectively.
4.5 Comparison of the query success rate of the single-path and the multiple-path routing approaches using analytical results.
    (Simulation results yield very similar plots.)
4.6 Comparison of the overhead of the improved single-path and the multiple-path approaches using analytical results.
4.7 Percentage of energy saving of the multiple-path approach over the improved single-path approach for $d \le 25$ using analytical models.
4.8 Path length increase factor for the improved single-path and the multiple-path approaches. The exponent of the probabilistic function is β = 0.60.
4.9 Query success rate of the improved single-path approach under varying lossy link conditions. The probability of malfunctioning nodes is $p_f = 0.05$.
4.10 Comparison of the query success rate of the improved single-path and the multiple-path approaches in the presence of link loss, $p_c = 0.05$. The exponent of the probabilistic function is β = 0.65.
5.1 Virtual grid of PBS with virtual queriers, $Q_v$. If S is located at $(x, y)$, then $(x_1, y_1)$ and $(x_3, y_3)$ are $(x + d/\sqrt{2},\ y + d/\sqrt{2})$ and $(x - d/\sqrt{2},\ y - d/\sqrt{2})$, respectively.
5.2 Light diffusion patterns in two different environments. Here, squares and triangles represent the measured data. Curve fitting is used to determine the diffusion equations.
5.3 $E_1$, $E_2$ and $E_3$ are three events of the same type. The magnitudes of $E_1$ and $E_2$ are X or more, while $E_3$ is much smaller. Here, small dots represent sensor nodes.
5.4 $E_1$, $E_2$ and $E_3$ are three events of the same type, where $E_3 > E_2 > E_1$. The cell labels a, b, c, ..., o indicate the order of visit. Scoped flooding is used in cell a and then in cell b. An information gradient is perceived in cell b, which determines the size of cell c. Again, cells e and g have more powerful event sources, so the cell size is increased.
    For cells h, i, and j, the centers have already been visited, so $Q_v$ is moved diagonally to the first unvisited node of those cells. Here, the result of the Max query is the magnitude of $E_3$.
5.5 Count query processing overhead for various numbers of sources and cells in the sensor field.
5.6 Count query using scoped flooding (SF) and information gradient-based routing (IR) with β = 0.7. Here, DOI = 0.05 and Pr(link loss) = 0.1.
5.7 (Normalized) overhead of the Count query using scoped flooding (SF) and information gradient-based routing (IR) with β = 0.7. Here, DOI = 0.05 and Pr(link loss) = 0.1.
5.8 Max query using scoped flooding (SF) and information gradient-based routing (IR) with β = 0.7. Here, Pr(link loss) = 0.1.
5.9 Combined query using scoped flooding (SF) and information gradient-based routing (IR) with β = 0.7. Here, DOI = 0.05 and Pr(link loss) = 0.1.
6.1 Characteristics of the functions that are used to compute the "minimum progress limit" from the RetryCount of a packet.
6.2 State transition of a node when it receives a new data packet.
6.3 Layout of the test-bed with 56 nodes.
6.4 Linear deployment with different node densities over an office floor.
6.5 Protocol performance in linear networks with different node densities.
6.6 Protocol performance evaluation using dynamic topologies (query-reply scenario).
6.7 Performance of ARQ-based routing with periodic link quality estimation (using ETX) and TABS routing between 11AM and 3PM on weekdays.
    The overhead of ARQ-based routing does not include the overhead of link quality estimation.

Abstract

Every physical event results in a natural information gradient in the proximity of the phenomenon. Many physical phenomena follow known diffusion laws. In wireless sensor networks, this natural gradient information can be exploited to design robust and energy-efficient protocols. In this thesis, we design an information gradient-based active query routing framework for wireless sensor networks. In our analysis and design, we target scenarios where the event's effect can be sensed in the area surrounding the event's source.

To process a user's query about events, we design protocols for three basic tasks: (1) query dissemination, (2) query processing, and (3) query reply. In these designs, we aim to improve energy efficiency and robustness. These protocols reduce energy overhead by eliminating proactive phases and unnecessary flooding. Also, several robustness issues, such as a noisy natural information gradient, regions without an information gradient (i.e., flat information regions), environmental noise, and lossy wireless links, are modeled and considered in our analysis. In addition to mechanism design, we use simulations, analytical modeling and test-bed implementation for performance evaluation. Our results show that the proposed protocols are significantly more robust and energy efficient than state-of-the-art solutions. More specifically, our case studies include the following:

1. Design a multiple-path greedy mechanism to disseminate queries over a noisy natural information gradient. Using an analytical model and simulations, we analyze the robustness and energy efficiency of the protocol in the presence of erroneous readings, environmental noise, regions without an information gradient, and lossy wireless links.
   Also, we propose a new single-path approach for query routing in the presence of an information gradient. Further, we compare the single-path and multiple-path approaches using an analytical model and simulations.
2. Develop an information gradient-based querying architecture that exploits geographical information and the perceivable gradient of an event's effect to reduce search overhead. We introduce a virtual grid that leverages geographical information, while the spread of the event's effect is used to determine the size of the grid cells. Probing is then used to identify the availability of the required information before disseminating a query in a cell. We develop and evaluate algorithms for aggregate queries and combined queries using this architecture.
3. Design a link loss tolerant data (or reply) routing protocol for multi-hop wireless sensor networks. The proposed data routing protocol effectively combines the benefits of the wireless broadcast advantage with traditional retransmission-based data routing. Also, this protocol instantaneously adapts to network dynamics without periodic link quality maintenance. We evaluate the performance of an implementation of this protocol on a 56-node test-bed. In addition, we compare the performance of this protocol with that of hop-by-hop retransmission-based routing with or without periodic link quality estimation, and with routing that exploits the wireless broadcast advantage.

Chapter 1: Introduction

Sensor networks are envisioned to be widely used for habitat and environmental monitoring applications, where the attached tiny sensors sample various physical phenomena. The introduction of tiny devices capable of limited computation, tetherless communication through low-power wireless, and sensing has extended the applicability of networking technology to observing natural and man-made phenomena at finer spatial scales than before.
In addition, advances in Micro-Electro-Mechanical Systems (MEMS) technology make it possible to develop sensors that detect and/or measure a wide variety of physical phenomena, such as temperature, light, sound, radiation, humidity, chemical contamination, and the nitrate level in water. With these advancements in sensor technology, wireless sensor networks extend the computing platform into the physical world: wireless sensor nodes combine physical sensing with networking and computing capabilities. In practice, sensor nodes are not only able to measure real-world phenomena, but can also filter, share and aggregate the measurements.

In real life, every physical event leaves fingerprints in the environment in the form of the event's effect; e.g., fire increases the temperature, a chemical spill increases contamination, a nuclear leak increases radiation, and so on. Moreover, most physical phenomena follow known diffusion laws [3][42] with distance, i.e., $f(d) \propto 1/d^{\alpha}$, where d is the distance from the point having the maximum effect of the event, f(d) is the magnitude of the event's effect, and α is the exponent of the diffusion function, which depends on the type of effect and the medium; e.g., α = 2 for light and α = 1 for heat. If the effect of the event spreads over a relatively significant area (spanning multiple sensor nodes) around the source of the event, then the information gradient may be exploited to design energy-efficient and effective querying mechanisms for habitat and environmental monitoring applications. Throughout this document, the term "source" refers to the source of the event; e.g., the contaminant, the epicenter of an earthquake, etc.

Query dissemination in sensor networks differs from that in traditional IP-based networks. Sensor networks use a data-centric paradigm [23] for routing, storage and querying, where all communication is for named data or information, rather than for the specific nodes that sense or hold the data. Thus, unlike IP-based networks, there is no communication overhead for address binding. Also, event data in the physical environment often exhibit strong correlation, especially in dense sensor networks. Sensor networks therefore use in-network processing, such as data aggregation and compression, to reduce the number of flows and the amount of data transmitted.

Traditional data-centric query dissemination (i.e., routing) protocols for sensor networks are based mostly on flooding (e.g., Directed Diffusion [25]) or random walks (e.g., Rumor routing [5], ACQUIRE [40]). However, these approaches do not utilize domain-specific knowledge, i.e., the information gradient of the monitored phenomenon. It is important to note that routing in sensor networks is more than a message transport mechanism: it needs to optimize for both information gathering and aggregation. Thus, for effective query dissemination, the routing structure must match the process by which the physical information is generated. Furthermore, perceiving the existence of an information gradient in a region before disseminating a query may significantly reduce the search overhead of query routing.

In the physical world, for most natural events, such as earthquakes, nuclear leaks and chemical contamination, information diffuses after the occurrence of the event. Also, moving objects, such as vehicles and animals, can be tracked from their diffused acoustic or magnetic signals. However, in real life, sensors are not always perfect and are subject to malfunction due to obstacles or failures.
Also, the characteristics of sensor nodes and networks, e.g., limited battery life, energy-expensive wireless communication, and the unstructured nature of sensor networks, make both query dissemination and query processing based on the information gradient a challenging problem.

After query processing, a reverse path forwarding mechanism [10] is usually used to send the reply to a query back to the querier or sink. However, in the presence of a significant number of asymmetric or unidirectional links in low-power wireless networks, existing reverse path approaches based on the child-parent relationship (of the forwarding tree) towards the sink may fail frequently.

In this dissertation, we develop a gradient-based querying framework that routes queries efficiently and effectively towards event source(s) while overcoming the constraints mentioned above. In our first study, we design two novel on-demand query dissemination protocols that effectively exploit the natural information gradient repository: one uses a single-path approach, while the other uses a fully distributed, braided multiple-path approach. The second study presents a detailed performance analysis of gradient-based on-demand query approaches under ideal and lossy wireless link conditions, using analytical models and simulations. Leveraging geographical information and the diffusion property of an event's effect, our third study proposes a virtual grid-based querying architecture, Probe-before-Spray (PBS), to process information gradient-based queries efficiently. Based on PBS, we design new algorithms to process aggregate queries (count, sum, average, max and min) and combined queries. The fourth study presents the design and implementation of a resilient data (or reply) routing protocol that blends the benefits of traditional routing with the wireless broadcast advantage. These studies were chosen so that together they cover gradient-based querying in wireless sensor networks completely.
Most of the studies combine mechanism design, modeling, analysis, simulations, empirical measurements and implementation. As sensor technology improves, it will become possible to collect more accurate information gradients using sensor nodes; such technological advancement may further improve the effectiveness and efficiency of the proposed framework and protocols. It is important to note that the information gradient concept is not limited to physical phenomena. For example, a time gradient can be used to track a moving object, a velocity gradient can be used to monitor highway traffic conditions, especially congestion, and so on.

The following subsections describe the diffusion information pattern available in the environment and briefly summarize the four case studies that are part of this dissertation.

1.1 Diffusion Information in the Environment

Sensor nodes carry small and inexpensive sensors. Also, environmental effects such as wind and rain, and obstacles such as dust and foliage, further limit the sensing ability of these tiny sensors. Considering these adverse effects, the following components are used in this study to model the available information gradient of a natural or man-made event.

Figure 1.1: Environment model: (a) Events are at the peaks and their effect reduces with distance. (b) The event is located at "E". The radial color gradient represents the event's effect. Black dots denote good sensor nodes and gray dots (e.g., "M") denote malfunctioning sensor nodes. Nodes in the white region are the flat information region nodes.

• Area covered by the event's effect: When an event occurs, the diffusion of its effect is a function of distance d and time t. In particular, $f(d,t) \propto g(t)/d^{\alpha}$, where g(t) is a function of time t, and α is the diffusion parameter. Considering sensor readings at a particular time instant, say $t_1$, the diffusion can be expressed as a function of distance only, i.e., $f(d) \propto 1/d^{\alpha}$.
  Theoretically, the tail of the diffusion of the event's effect is infinite. In practice, however, sensors are unable to detect or measure the effect of an event below a certain threshold. Thus, beyond a certain distance from the event's location, sensors get a zero reading for that event. This creates a flat information region, where the information gradient is unavailable.
• Erroneous readings of malfunctioning sensors: Real sensors are subject to malfunction due to obstacles, sensor failures or calibration errors. The erroneous readings of some sensors may cause an irregular pattern, e.g., local maxima or minima, in the information gradient region of an event. To model this in the analysis and the simulations, we assume that malfunctioning sensors are uniformly distributed in the sensor network and assign them random readings.
• Environmental noise: Conditions in the surrounding environment, such as the direction of airflow or fluid flow, humidity, etc., create environmental noise. The effect of this noise gradually increases towards the low gradient zone. Due to this noise, sensor readings can increase or decrease by a certain amount. Let $d_i$ be the distance of the i-th location from the peak information point (the event). Let $f(d_i)$ denote the gradient information of the i-th location with environmental noise, $f_{max}$ the peak information, and $f^*(d_i)$ the gradient information of the i-th location without environmental noise. Then, we model the noise as follows:

  $$f(d_i) = f^*(d_i) \pm f_{EN}\big(f^*(d_i)\big), \qquad f_{EN}\big(f^*(d_i)\big) \propto f_{max} - f^*(d_i).$$

  Because $f_{EN}$ grows as $f^*(d_i)$ falls further below $f_{max}$, the noise is largest at the fringe of the gradient region, as stated above.

Thus, in the environment (Fig. 1.1), the information gradient of an event consists of a flat (i.e., zero) information region and a gradient region. Environmental noise is present only in the gradient region, while malfunctioning sensors are uniformly distributed in both regions.
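To make the three components concrete, the Python sketch below draws one sensor reading under the model above. It is a minimal illustration under our own assumptions, not the simulator used in this dissertation: the function `sensor_reading` and the parameters `f_max`, `threshold`, `noise_frac` and `p_malfunction` are hypothetical, the proportionality constants are arbitrary, the noise is drawn uniformly within ± `noise_frac` · ($f_{max} - f^*(d_i)$), and a malfunctioning node reports a value drawn uniformly from $[0, f_{max}]$.

```python
import random


def sensor_reading(d, rng, f_max=100.0, alpha=2.0, threshold=1.0,
                   noise_frac=0.03, p_malfunction=0.15):
    """Return one sensor's reading at distance d from the event (illustrative model)."""
    # Malfunctioning sensors are spread uniformly over both regions and
    # report an arbitrary (random) value.
    if rng.random() < p_malfunction:
        return rng.uniform(0.0, f_max)

    # Noise-free diffusion f*(d) = f_max / d**alpha, capped at f_max near the source.
    f_star = f_max / max(d, 1.0) ** alpha

    # Below the detection threshold the sensor reads zero: flat information region.
    if f_star < threshold:
        return 0.0

    # Environmental noise exists only in the gradient region; its bound grows
    # as f*(d) falls further below the peak f_max. Negative readings are clipped.
    bound = noise_frac * (f_max - f_star)
    return max(0.0, f_star + rng.uniform(-bound, bound))


if __name__ == "__main__":
    rng = random.Random(42)
    for d in (1, 2, 4, 8, 16):
        print(d, round(sensor_reading(d, rng), 2))
```

With the default α = 2 and threshold, readings vanish beyond roughly 10 distance units, which produces the flat information region; raising `p_malfunction` scatters random local maxima and minima over both regions.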
1.2 RUGGED: Information Gradient-based Routing

A natural information gradient is normally available as part of a natural or man-made phenomenon, without any communication. This information also provides directionality towards the source of an event. These observations inspire our first study: the design of the RUGGED (Routing on Fingerprint Gradients) protocol using multiple-path approaches. RUGGED is an information gradient-based query dissemination protocol, similar to the one-phase pull approach [24]. The protocol uses a reactive and fully distributed query dissemination mechanism and forwards a given query greedily along the direction of information improvement about the required event(s). It uses braided multiple-path exploration and controls the instantiation of paths using a probabilistic diffusion function based on the simulated annealing concept. Unlike other information-driven query dissemination protocols, the multiple-path based RUGGED is able to forward a query with or without the existence of an information gradient in the sensor network. Further, using the natural information gradient repository allows the protocol to eliminate flooding and the overhead of proactive information gradient maintenance.

We analyze the performance of the proposed protocol through extensive simulations using three different types of query: (1) single-value query, (2) global maxima search, and (3) multiple event detection (for the multiple-path based protocol only). In the simulations, we observe that the protocol is energy efficient and resilient to malfunctioning nodes and environmental noise. Also, the query success rate of the multiple-path based RUGGED protocol is over 98%, even in the presence of sensor holes and flat information regions. Further, this protocol detects most events in multiple-event scenarios with overlapping effects.
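The greedy, braided multiple-path forwarding described above can be illustrated with a small Python simulation on a grid. This is a hedged sketch, not RUGGED itself: the function `gradient_query`, its parameters and the 4-neighbor grid are hypothetical, and because this section does not give RUGGED's simulated-annealing-based diffusion function, the sketch substitutes a plain constant-probability rule. A neighbor whose reading is higher than the sender's always relays the query (information improvement); any other neighbor relays it only with probability `beta`; and each node relays a given query at most once.

```python
import random
from collections import deque


def gradient_query(readings, start, target, beta, rng):
    """Greedy multiple-path query dissemination on a grid (illustrative only).

    readings: dict mapping (x, y) -> sensor reading.
    Returns (found, transmissions), where transmissions counts broadcasts.
    """
    accepted = {start}          # nodes that have agreed to relay the query
    queue = deque([start])
    transmissions = 0
    while queue:
        u = queue.popleft()
        if u == target:         # the query has reached the event source
            return True, transmissions
        transmissions += 1      # u broadcasts the query once
        x, y = u
        for v in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if v not in readings or v in accepted:
                continue
            # Always follow information improvement; otherwise (flat region,
            # local maximum) relay only with the stand-in probability beta.
            if readings[v] > readings[u] or rng.random() < beta:
                accepted.add(v)
                queue.append(v)
    return False, transmissions


if __name__ == "__main__":
    n, src = 20, (15, 15)
    # Readings fall off with (Manhattan) distance from the source and become
    # zero far away, creating a flat information region around the querier.
    field = {}
    for x in range(n):
        for y in range(n):
            d = abs(x - src[0]) + abs(y - src[1])
            f = 100.0 / (1 + d) ** 2
            field[(x, y)] = f if f >= 1.0 else 0.0
    print(gradient_query(field, (0, 0), src, beta=0.7, rng=random.Random(1)))
```

Setting `beta=0` gives pure greedy forwarding, which stalls as soon as the query starts in a flat region; `beta=1` degenerates to flooding. Intermediate values trade overhead against query success, which is the role the probabilistic function plays in the protocol.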
1.3 Analysis of Gradient-based Routing Approaches

Several routing protocols have been proposed to exploit information gradients, especially their directionality towards the source. These query routing protocols use greedy forwarding and can be broadly classified into two categories: (i) the single-path approach and (ii) the multiple-path approach. Choosing the appropriate routing approach is critical for applications in wireless sensor networks (WSN), because it determines resilience and energy overhead, and the energy overhead is directly related to the lifetime of the sensor nodes and of the network. In this study, we first propose a reactive and resilient query routing protocol based on the single-path approach. Then, using a regular grid topology, we develop analytical models for the query success rate and the overhead of both approaches under ideal and lossy wireless link conditions. For consistency, we validate our analytical models using simulations on the grid topology. Both the analytical and the simulation models are used to characterize each approach in terms of overhead, query success rate and increase in path length.

From this study, we find that the multiple-path approach is generally more energy efficient than the single-path approach when the source is relatively close to the sink (e.g., 22 hops in the ideal-link case, according to our model), even with ideal links. Also, the multiple-path approach yields shorter paths than the single-path approach. Further, as the number of malfunctioning nodes in the network increases, the query success rate of the single-path approach degrades significantly faster than that of the multiple-path approach. Finally, in the lossy link case, the query success rate of the single-path approach drops drastically, while the multiple-path approach remains quite resilient. However, using ARQ and expanding ring search (ERS) when necessary improves the resilience of the single-path approach at the cost of increased overhead.
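The ARQ trade-off in the last sentence can be illustrated with a deliberately simple model that is not the analytical model developed in Chapter 4 and ignores ERS and malfunctioning nodes. Assume a single path of h hops, each transmission lost independently with probability p, and ARQ allowing up to r retransmissions per hop. A hop then succeeds with probability $1 - p^{r+1}$, the whole path with probability $(1 - p^{r+1})^{h}$, and each attempted hop costs on average $(1 - p^{r+1})/(1 - p)$ transmissions. The Python functions below (hypothetical names; `p_loss`, `hops` and `retries` stand for p, h and r) evaluate these expressions.

```python
def hop_success(p_loss, retries):
    """Probability that one hop delivers the query within retries + 1 attempts."""
    return 1.0 - p_loss ** (retries + 1)


def path_success(p_loss, hops, retries):
    """Probability that the query survives every hop of a single path."""
    return hop_success(p_loss, retries) ** hops


def expected_transmissions(p_loss, hops, retries):
    """Expected number of transmissions, counting only hops actually attempted."""
    if hops == 0:
        return 0.0
    if p_loss >= 1.0:
        return float(retries + 1)   # the first hop exhausts all attempts and fails
    # Attempt i (0-based) on a hop happens only if attempts 0..i-1 were lost.
    per_hop = (1.0 - p_loss ** (retries + 1)) / (1.0 - p_loss)
    s = hop_success(p_loss, retries)
    # Hop k (1-based) is attempted only if hops 1..k-1 succeeded: s**(k-1).
    reached = sum(s ** k for k in range(hops))
    return per_hop * reached


if __name__ == "__main__":
    for retries in (0, 1, 3):
        ps = path_success(0.1, 20, retries)
        tx = expected_transmissions(0.1, 20, retries)
        print(f"retries={retries}: success={ps:.3f}, transmissions={tx:.2f}")
```

Even in this simplified setting, a 10% per-transmission loss rate makes a 20-hop single path fail most of the time without ARQ, while a few retransmissions restore near-certain delivery at the price of more transmissions per query.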
1.4 PBS: A Virtual Grid Architecture for Querying

The diffusive nature of an event's effect makes it possible to perceive an information gradient within the region surrounding the event source. Throughout this document, this region is called "the geometry of the event's effect". This geometry depends on the magnitude and the type of the event source and may be determined using known diffusion laws, empirical data or local collaboration. Leveraging this geometry together with geographical information, we propose a virtual grid-based querying architecture, PBS (Probe-before-Spray), to process information gradient-based queries efficiently. PBS initiates a given query in a grid cell only if an information gradient is perceived in that cell. The grid cell size depends on the query type and on the geometry of the event's effect, i.e., on the query parameter describing the event's effect. Geographic routing [27] is used to route a query among the virtual grid cells.

Using the proposed PBS architecture, we design algorithms for computing aggregate queries (count, average, sum, max and min) and combined queries. We also analyze the characteristics of PBS for different query types using probabilistic tools. Further, using extensive simulations, we demonstrate that PBS reduces the search overhead of such queries significantly (by over 30%) while attaining accuracy over 99%.

1.5 TABS: Link Loss Tolerant Data Routing Protocol

Most existing data routing protocols use proactive mechanisms, such as blacklisting or link quality metrics, to obtain consistent, highly reliable paths between a source and a destination (i.e., the sink). In addition to the periodic overhead associated with these mechanisms, each one has limitations or side effects. With these mechanisms, a routing protocol is unable to adapt to network dynamics before the next link quality estimation phase. Also, blacklisting or best-neighbor selection based on link quality metrics may limit routing options.
Further, in the case of a sparse deployment of nodes, blacklisting may cause network partitions. On the other hand, routing protocols based on the wireless broadcast advantage use multiple relays to improve reliability, but they incur high transmission overhead. Moreover, because wireless links have a non-zero loss probability, these protocols are unable to attain high reliability without retransmissions.

This part of the study describes the design of TABS (Try-Ancestors-Before-Spreading), which combines the benefits of the wireless broadcast advantage with traditional retransmission-based data routing. TABS is designed to send reply or data packets to the sink (or querier). TABS broadcasts each packet. After successfully decoding a packet, a node decides to broadcast the packet further if the node is either the parent of the sender or received the packet through an opportunistic link that allows the packet to progress more than a limit called the "minimum progress limit". Unlike the schemes that exploit the wireless broadcast advantage, TABS initially deters a sender from sending a packet through multiple relays by setting a high value for the "minimum progress limit". Thus, TABS forwards a packet using traditional routing (based on the child-parent relationship) and also exploits long opportunistic links when available.

We evaluate the performance of a TABS implementation on a 56-node test-bed [1], where the nodes are TelosB motes equipped with 802.15.4 radios. We also compare the performance of TABS with that of hop-by-hop retransmission-based routing with and without periodic link quality estimation, and with routing that exploits the wireless broadcast advantage.

Chapter 2 Related Work

In this chapter, we briefly discuss related work on query dissemination and query processing mechanisms. We also discuss wireless channel characteristics and query reply mechanisms.

2.1 Query Dissemination

Several approaches have been proposed for query dissemination in sensor networks.
Directed Diffusion [25][26] is one of the first data-centric query dissemination protocols and is particularly useful for long-lived continuous queries. In this scheme, a node's interest in some data is initially distributed through the network via flooding to find the sources of the relevant data. Diffusion results in high-quality paths, and the initial flooding overhead is amortized over the duration of long flows. The work in [24] attempts to adapt directed diffusion to specific applications. However, Directed Diffusion does not utilize information gradients.

Several protocols [5][40][37][41][33] have been designed based on random walks. Asymptotically, the random-walk approach shows good performance, but in practice it causes high latency, and without directionality and/or a proper TTL value it may fail to resolve the query. For replicated data, however, it can efficiently process simple one-shot queries. In [41], Servetto and Barrenechea show that multiple random walks improve load balancing and reduce latency at the cost of increased communication. They also analyze the random-walk approach in regular/irregular and static/dynamic grid topologies, but they do not consider the existence of information gradients.

The major difference between the information gradient-based approach and the flooding and random-walk based approaches is that the former uses the sensors' measurements of the event's effect to make routing decisions. We now briefly summarize previously proposed information gradient-based query dissemination protocols.

Chu, Haussecker and Zhao [8] propose the Constrained Anisotropic Diffusion Routing (CADR) mechanism, designed especially for localization and target tracking. CADR uses a proactive sensor selection strategy for correlated information, based on a criterion that combines information gain and communication cost.
CADR is a single-path greedy algorithm that routes a query to its optimal destination using local gradients to maximize the information gain through the sensor network. Later, Liu, Zhao and Petrovic [31] proposed the min-hop routing algorithm to overcome CADR's inability to handle local maxima and minima. The algorithm uses a multiple-step look-ahead approach with single-path query forwarding. An initial network discovery phase determines the minimum look-ahead horizon (in hops) so that the path planning phase can avoid network irregularities. The algorithm improves the routing success rate at the cost of additional search overhead. Also, increasing the neighborhood size causes more communication between the cluster leaders and their neighbors, as we show through analysis and comparison with the multiple-path approach in Section 4.2.2.

In [30], a navigation protocol is proposed to guide navigation along the safest path using a distributed information repository about the area covered by the sensor network. The network adapts to sensor failures or the addition of new nodes by continuously updating the distributed information content. Both building and updating the information repository cause significant communication overhead. Another information gradient-based protocol is GRAdient Broadcast (GRAB) [47], which establishes a virtual gradient towards the sink through initial flooding, building a cost field towards the sink. Each source then uses a limited-size mesh to send data reliably to the sink. In the presence of asymmetric links, the established gradient is highly unreliable. Also, the cost gradient needs to be rebuilt for different sinks or when the sink is mobile. In contrast to this protocol, RUGGED uses the natural information gradient to forward the query towards the source.
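The cost field that GRAB builds by initial flooding can be illustrated with a minimal sketch. For simplicity we assume that a node's cost is its hop count to the sink (GRAB's actual cost is more general, e.g., energy-based) and that links are symmetric; a breadth-first flood from the sink assigns each node its cost, and a packet then descends the field. The graph representation and function names are hypothetical. The sketch also makes the limitation noted above concrete: the field is tied to one sink, so a different or moved sink requires a new flood.

```python
from collections import deque
from typing import Dict, List

Graph = Dict[str, List[str]]


def build_cost_field(graph: Graph, sink: str) -> Dict[str, int]:
    """Flood outward from the sink; each node's cost is its hop distance to it."""
    cost = {sink: 0}
    frontier = deque([sink])
    while frontier:
        node = frontier.popleft()
        for nbr in graph[node]:
            if nbr not in cost:            # first visit in BFS = fewest hops
                cost[nbr] = cost[node] + 1
                frontier.append(nbr)
    return cost


def descend(graph: Graph, cost: Dict[str, int], source: str) -> List[str]:
    """Forward hop by hop to the lowest-cost neighbor until the sink is reached.

    Assumes symmetric links and that `source` was reached by the flood.
    """
    path = [source]
    node = source
    while cost[node] > 0:
        node = min(graph[node], key=lambda n: cost.get(n, float("inf")))
        path.append(node)
    return path


if __name__ == "__main__":
    line = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
    field = build_cost_field(line, sink="a")
    print(field, descend(line, field, "d"))
```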
Our proposed query dissemination mechanism, RUGGED, also has some similarities with the techniques proposed in [13] and [21] for encounter-based routing in ad hoc networks, which use time gradient diffusion to find the destination. However, these techniques rely on the node mobility of ad hoc networks; the limited communication and node dynamics of static sensor networks make them inapplicable there.

RUGGED routes the query using a fully distributed decision-making procedure that effectively exploits the natural gradient information repository, which results from the fingerprint gradient of the physical phenomenon being monitored and follows well-established physical laws. Another key difference from existing information-driven protocols is its use of multiple-path exploration to discover the route or the event, with the instantiation of multiple paths controlled by a probabilistic function based on the simulated annealing concept.

From the above discussion, it is apparent that two major approaches are usually used to disseminate queries along information gradients: (i) the single-path approach and (ii) the multiple-path approach. It is important to note that all the above information-driven protocols based on the single-path approach use a proactive phase to prepare the information gradient repository. In our second case study, we analyze the performance of these query routing mechanisms. Because the single-path protocols [8][31][30] are designed to forward queries within an information gradient region, in that study we consider the performance of the query dissemination approaches only in the information gradient region.

2.2 Query Processing

There has been substantial work on query processing in the database community.
From the perspective of sensor networks, query processing is an effort to co-design the query processing and networking subsystems to enable efficient and scalable self-organized data retrieval and in-network processing in a reliable, energy efficient and timely manner.

Among several in-network query systems, Directed Diffusion [25] is a pioneering work. Instead of using a query language like SQL, this approach focuses on both query dissemination mechanisms and flexible in-network processing. All routing protocols based on this approach assume that queries in the network are described by interest messages, where an interest describes the query in detail. A sink node originates the interest and disseminates it in the network by flooding. The decision about where to forward the interest is based on cues provided within the query, for example a location attribute. However, this approach is unable to avoid query dissemination overhead within regions having no event.

Declarative query processing systems, like TinyDB [35] and Cougar [46], use flooding to disseminate queries in the network and collect the replies via a routing tree, whose root node is usually at the user's physical location. Queries are parsed and optimized at the user's PC and then injected into the tree-based sensor network for processing. As in Directed Diffusion, in-network processing can be done at leaf or intermediate nodes to reduce the amount of data flowing to the root.

Leveraging geographical information and the area of the region surrounding the event (i.e., the geometry of the event's effect), our proposed query processing mechanism introduces a virtual grid framework. This framework reduces search overhead when the required sources are absent and performs in-network processing without flooding. The mechanism can easily be combined with Directed Diffusion, TinyDB or Cougar to reduce flooding overhead for energy efficient in-network processing.
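The probe-before-spray idea on a virtual grid can be illustrated with a small Python sketch that also computes a count/sum/average/max/min aggregate. It is a simplification under stated assumptions, not the PBS algorithms themselves: the cell size is passed in directly (in PBS it is derived from the query type and the geometry of the event's effect), the "probe" is modeled as checking whether any node in a cell senses the event's effect above a hypothetical `probe_threshold`, and geographic routing between cells is not modeled. All names are hypothetical.

```python
import math
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Node:
    x: float          # geographic position (m)
    y: float
    reading: float    # sensed level of the event's effect (0.0 = no effect)


def cell_of(node: Node, cell_size: float) -> Tuple[int, int]:
    """Map a node to its virtual grid cell using its geographic position."""
    return (math.floor(node.x / cell_size), math.floor(node.y / cell_size))


def probe_before_spray(nodes: List[Node], cell_size: float,
                       probe_threshold: float, value_threshold: float):
    """Spray the query only in cells where the probe perceives the event's effect.

    Hypothetical probe: a cell qualifies if any member senses a level above
    probe_threshold. Returns (aggregates over matching readings, number of
    nodes that received the sprayed query).
    """
    cells: Dict[Tuple[int, int], List[Node]] = defaultdict(list)
    for n in nodes:
        cells[cell_of(n, cell_size)].append(n)

    matches: List[float] = []
    sprayed = 0
    for members in cells.values():
        if max(m.reading for m in members) <= probe_threshold:
            continue                       # no gradient perceived: skip the cell
        sprayed += len(members)            # every node in the cell gets the query
        matches.extend(m.reading for m in members if m.reading >= value_threshold)

    if not matches:
        return {"count": 0}, sprayed
    total = sum(matches)
    return {"count": len(matches), "sum": total, "avg": total / len(matches),
            "max": max(matches), "min": min(matches)}, sprayed


if __name__ == "__main__":
    field_nodes = [Node(1, 1, 0.0), Node(2, 3, 0.0),
                   Node(12, 2, 0.6), Node(14, 4, 0.9), Node(13, 8, 0.2)]
    print(probe_before_spray(field_nodes, cell_size=10.0,
                             probe_threshold=0.1, value_threshold=0.5))
```

In the usage example, only the cell that perceives the effect is sprayed, so the two nodes in the other cell are never searched; this illustrates where the search-overhead savings of a probe-before-spray design come from.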
It is important to mention that the proposed query processing mechanism is appropriate for a class of events whose effect can be perceived within the region surrounding the event source.

Several query systems define policies to avoid flooding for query dissemination and forward the query only to nodes that produce relevant results for a particular query. For example, [36] uses a semantic routing tree (SRT) to limit query dissemination to nodes having data values within a given range. Here, each node needs to collect data from its children or subtrees. The SRT concept is analogous to an index in a conventional database system and is suitable for less dynamic environments. In contrast, our proposed approach perceives the presence of event(s) through the diffusion of the event's effect and avoids search overhead if the required events are absent within a grid cell. Another example is [48], which discovers querying paths for target tracking. This approach uses an objective function to choose a node that optimizes the usefulness of sensor data and the corresponding communication costs along the paths.

The model-based data acquisition scheme [11] proposed by Deshpande et al. has some similarity with our use of diffusion models. Their proposed architecture combines model-based approximate query answering with optimized data gathering. However, we use known diffusion models only to estimate the geometry of the event's effect.

Also, the use of virtual grids in our proposed query processing technique has some similarity with the TTDD [32] approach for scalable and efficient data delivery to multiple mobile sinks. In TTDD, each data source establishes a grid as needed, and sensor nodes at the cross points of the grid receive data from the source. Compared to our approach, in TTDD the grid cell size is fixed and independent of the geometry of the event's effect. Also, TTDD is not suitable for in-network query processing.
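The semantic routing tree idea of [36] can be illustrated as follows: each node knows, per child, the range of values in that child's subtree, and a range query descends only into children whose range overlaps the query. This is a minimal sketch assuming a static tree with a single numeric attribute; the class and method names are hypothetical, and a real SRT caches the child ranges instead of recomputing them. It also shows why the SRT suits less dynamic environments: the ranges must be refreshed from the subtrees whenever readings change.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class SRTNode:
    name: str
    value: float
    children: List["SRTNode"] = field(default_factory=list)

    def subtree_range(self) -> Tuple[float, float]:
        """Range of values in this subtree (what a parent would cache)."""
        lo = hi = self.value
        for c in self.children:
            c_lo, c_hi = c.subtree_range()
            lo, hi = min(lo, c_lo), max(hi, c_hi)
        return lo, hi

    def query(self, lo: float, hi: float, visited: List[str]) -> List[str]:
        """Return names of nodes with value in [lo, hi], pruning subtrees."""
        visited.append(self.name)
        hits = [self.name] if lo <= self.value <= hi else []
        for c in self.children:
            c_lo, c_hi = c.subtree_range()   # recomputed here; cached in a real SRT
            if c_hi >= lo and c_lo <= hi:     # ranges overlap: forward the query
                hits += c.query(lo, hi, visited)
        return hits


if __name__ == "__main__":
    tree = SRTNode("r", 10, [SRTNode("a", 5, [SRTNode("a1", 4)]),
                             SRTNode("b", 30, [SRTNode("b1", 35)])])
    seen: List[str] = []
    print(tree.query(28, 40, seen), seen)
```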
In [15], three information-driven algorithms, DAM, EBAM and EMLAM, are proposed for constructing and maintaining sensor aggregates that collectively monitor target activity in the environment. All three algorithms are used for leader election without addressing query processing and the associated routing issues; a target count can be obtained by determining the total number of elected leaders.

In the proposed virtual grid framework for query processing, the sizes of the grid cells are determined by the geometry of the event's effect and the type of query, and the search overhead of query processing depends on these cell sizes. Using this framework, we develop algorithms for the basic aggregate queries (count, sum, average, max and min) and for combined queries.

2.3 Data Routing in Wireless Sensor Networks

There has been a great deal of work exploring low-power radio link characteristics in sensor networks and developing data routing mechanisms that improve reliability. In this section, we review related work on link characteristics and on approaches to improve performance, such as hop-by-hop retransmission, blacklisting, reliability metrics, the wireless broadcast advantage and multiple path routing.

2.3.1 Wireless Link Characteristics

The characteristics of low-power radios have been studied empirically in several papers. Ganesan et al. [18] use 150 motes with TR1000 radios in an obstacle-free outdoor environment to study the effect of the MAC, link and application layers on data communication. They identify asymmetric and unidirectional links as well as unreliable long links with non-zero packet reception probability. Further, this study and that of Zhou et al. [50] using Mica2 motes (with CC1000 radios) observe non-isotropic radio connectivity. Woo et al. [45] examine packet loss between pairs of motes and develop a packet loss model in which link quality varies with distance.
In [49], Zhao and Govindan perform a detailed study with 60 motes (Mica1) for three different environments, power levels and coding schemes. Using a simple linear topology, they observe that more than 10% of links are asymmetric and one third of links have a loss rate greater than 30%. Using non-linear topologies, a complementary study [6] by Cerpa et al. observes that the percentage of asymmetric links varies from 5% to 30%. Another empirical study [34] using 38 motes with CC2420 (ZigBee) radios observes that the percentage of asymmetric links varies from 21% to 36%.

2.3.2 Hop-by-hop retransmission

Hop-by-hop retransmission improves the data delivery rate of a link that has a non-zero loss probability. Thus, this mechanism improves the packet delivery success probability of whatever path has been selected for data routing. In [20], Gnawali et al. observe that link-layer retransmission is necessary to deliver a packet with high probability. TABS uses hop-by-hop retransmission to recover from failures due to poor quality links.

2.3.3 Blacklisting and Link Reliability Metric

Blacklisting [12] avoids poor quality links, i.e., it eliminates unreliable, lossy or asymmetric links from the set of links available for communication. This mechanism limits the routing options and requires a high-density deployment of nodes [20] to avoid partitions. Among link reliability metrics, ETX (expected number of transmissions), proposed by De Couto et al. [9], is the most popular. ETX considers both forward and reverse link reliability to identify high-throughput paths in a network. The LQI (Link Quality Indicator) information of IEEE 802.15.4 packets can also be used as a good link reliability metric [43].

For both blacklisting-based and metric-based routing protocols, the non-zero loss probability of wireless links requires hop-by-hop retransmission to achieve a high success probability [20]. Further, link quality may change over time due to environmental dynamics, transient noise or node failure.
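The ETX metric has a simple closed form [9]: if d_f and d_r are the measured forward and reverse delivery ratios of a link, the expected number of transmissions needed for both the data packet and its acknowledgment to succeed is 1/(d_f · d_r), and a path's ETX is the sum of its link ETX values. The following Python sketch computes it for two hypothetical paths; the delivery ratios are made-up illustrative values, not measurements from the studies cited above.

```python
from typing import Iterable, Tuple


def link_etx(d_forward: float, d_reverse: float) -> float:
    """ETX of one link: expected transmissions until data and ACK both succeed."""
    if d_forward <= 0 or d_reverse <= 0:
        return float("inf")               # unusable link
    return 1.0 / (d_forward * d_reverse)


def path_etx(links: Iterable[Tuple[float, float]]) -> float:
    """ETX of a path is the sum of its link ETX values."""
    return sum(link_etx(df, dr) for df, dr in links)


if __name__ == "__main__":
    # Hypothetical (forward, reverse) delivery ratios for two candidate paths.
    two_good_hops = [(0.9, 0.9), (0.9, 0.9)]
    one_asymmetric_hop = [(0.9, 0.3)]
    print(round(path_etx(two_good_hops), 2), round(path_etx(one_asymmetric_hop), 2))
```

Here the two-hop path is preferred because the poor reverse direction of the single asymmetric link inflates its ETX. Because d_f and d_r are estimated from periodic probes, a change in link quality is reflected only after the next estimation round.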
Because link quality can change over time, each node sends periodic beacons to blacklist poor quality links or to re-estimate link reliability with its neighbors. The proposed protocol, TABS, does not use blacklisting or a link reliability metric to select a path, and thus eliminates this periodic overhead.

2.3.4 Wireless Broadcast Advantage

The wireless broadcast advantage [28] allows a node within the transmission range of a sender to receive a packet even if the node is not the packet's next-hop destination. Thus, multiple nodes may receive a given packet and forward it through multiple paths toward the sink.

The recent innovative routing technique ExOR [4] (Extremely Opportunistic Routing) exploits the broadcast nature of the wireless channel for diversity when sending a batch of packets. In ExOR, the node that forwards a packet towards the sink is not determined before the packet is transmitted. Instead, among the nodes that receive a given packet, the node closest to the destination, which has the highest priority, broadcasts the packet further. The remaining forwarders then transmit, in order, only the unacknowledged packets of the batch to suppress duplicate forwarding. However, the non-zero probability of missing an ACK can cause duplicate transmissions and may instantiate multiple forwarding paths.

The theme of TABS has some similarity with ExOR. Unlike ExOR, TABS avoids the periodic collection of inter-node link information to prioritize candidate receivers; TABS only uses implicit and explicit ACKs to suppress unnecessary alternative paths toward the sink. Also, TABS limits the use of the wireless broadcast advantage when the inter-node link quality is good enough for traditional routing.

Another protocol, proposed in [14], also exploits the wireless broadcast advantage by buffering corrupt packets and combining them to build a complete one; however, this approach may be suitable for sensor networks with low data rates.

2.3.5 Multiple Path Routing

Multiple path routing is a basic solution to improve reliability.
Braided multi-path routing, proposed by Ganesan et al. [19], identifies multiple routes, one of which is primary and is mainly used for routing; several alternative paths are maintained for use when the primary path fails. Gradient Routing (GRAd) [38] maintains a cost field through all nodes in the network and allows multiple nodes to forward the same packet. Gradient Broadcast (GRAB), proposed by Ye et al. [47], is similar to GRAd but controls the energy-robustness tradeoff. ExOR [4] exploits multiple paths simultaneously through broadcast.

TABS forwards a packet through multiple paths when link quality is poor. Also, packet forwarding through opportunistic long links may create alternative paths. However, TABS uses implicit and explicit ACKs to suppress unnecessary paths and reduce the overhead.

2.3.6 Opportunistic Forwarding

Opportunistic forwarding protocols use channel conditions to choose the next forwarder. In [29], Larsson proposes a scheme that selects a forwarder based on the RTS SNR (signal-to-noise ratio) reported by all potential forwarders. To avoid control packet overhead, [7] uses historical observations of channel conditions. GeRaF [51] uses a similar approach but prioritizes the best forwarders based on the geographic progress of a packet towards the destination. The performance of these protocols depends on accurate prediction of channel conditions and distance. However, [2] has shown that packet delivery success probability is hard to predict using these measurements. In contrast, TABS avoids measuring channel conditions and determines the forwarding node(s) after the successful reception of a packet. This protocol also exploits long links opportunistically when available.

Chapter 3 RUGGED - Information Gradient-based Routing

Sensing is the primary task of a sensor network: harvesting information about the physical environment in order to answer a set of user queries or support other decision making functions.
Typical user queries involve detecting, locating, tracking or classifying physical phenomena of interest such as fires, chemical contamination, nuclear leakage and vehicle movements. Thus, routing in a sensor network for query dissemination is not just propagating data or a query from one node to another; it needs to be optimized for both data transport and information gathering.

The characteristics of sensor nodes and networks, i.e., limited battery life, energy-expensive and lossy wireless communication, a high probability of failure or malfunction, and the unstructured nature of the wireless sensor network (WSN), make routing in a WSN a challenging problem. Traditional WSN routing protocols are based mostly on flooding or random walks. These approaches, however, do not utilize domain-specific knowledge, i.e., the event's fingerprint gradient of a monitored phenomenon. In other words, such approaches are not optimized for information gathering.

Previous data-centric routing protocols based on information gradients [8][31][30] use a proactive phase to prepare a distributed or cluster-based gradient information repository towards a target or an event. To adapt to the dynamic behavior of a WSN, these approaches update the information repository periodically. To route a query from the sink to the source, most of these protocols use greedy routing algorithms based on the information gradient. Neither when preparing the gradient information repository nor when routing a query do these protocols utilize the well-established laws of physical events. Moreover, the query proceeds toward the source along a single path, which usually gets trapped at local maxima or minima, or reaches a dead end due to imperfect sensing devices and/or environmental noise. On the other hand, generating unlimited multiple paths to improve resilience resembles flooding.
In this study, we propose a scheme that uses the fingerprint gradient of the event's effect to avoid the proactive phase of preparing a distributed information gradient repository, together with a novel energy efficient, fully distributed and reactive routing protocol based on that information gradient. Our protocol effectively exploits the laws of physical events for routing decisions. It also overcomes the limitations that local maxima or minima impose on usual information-driven routing protocols, by applying simulated annealing concepts in a distributed manner, and it establishes an effective balance between single-path and multiple-path exploration to discover routes for query dissemination. To design and test the protocol, we consider a reasonably realistic model of the environment, consisting of a flat information region and environmental noise in addition to the region having a noisy information gradient about the event's effect. The target applications of the proposed protocol generally trigger queries to identify the origin of an event after its occurrence, where the event's effect follows diffusion laws; examples include fires, earthquakes, chemical contamination and tracking a moving vehicle. Our scheme works without location information. However, location information, when available, can make our protocol more energy efficient and robust.

3.1 Protocol

[Figure 3.1: Routing protocol. The event is at 'E' and the querier 'Q' is located in the flat information region; the effect of 'E' follows a diffusion law. Mx is a local maximum and Mn is a local minimum. (a) 'Q' forwards the query to its neighbors; all 'f' nodes are in the flat information region, so they flood again. (b) All 'g' nodes are in the gradient information region, so they switch the query mode to gradient mode and must forward again. (c) All neighbors (p) of Mx have less information, so they probabilistically forward the query to their neighbors. (d) All neighbors (g) of Mn have more information, so they forward the query to their neighbors.]

With the intuition of the natural information gradient discussed in Chapter 1 and the environment model presented in Sec. 1.1, we design our basic information-driven routing protocol. To prevent looping, each querier is assumed to generate a unique sequence number for each query it sends. Based on the environment model, each query can be in one of two modes: (1) flat region mode and (2) gradient region mode. A query starts in flat region mode and switches to gradient region mode as soon as it finds gradient information about the event's effect. Thus, in addition to other information, the query packet needs fields for the query ID and the query mode.

A query may be initiated from any arbitrary node. Upon receiving a query about an event, the querier sets the query mode to flat region mode and forwards the query to its neighbors together with its own gradient information level about the event's effect. Each neighbor then independently decides whether to forward the query according to the following algorithm.

• In the flat information region, if the query mode is flat, a node uses flooding to forward the query towards the gradient information region (Fig. 3.1(a)). Otherwise, the node uses probabilistic forwarding, described next. The query does not switch to gradient mode unless gradient information is found. Hence, in the absence of event(s), the gradient information is zero (ideal condition) and the protocol uses only this flooding approach.

• In the gradient information region, a node uses a greedy forwarding approach. If a node is able to improve the information level, it forwards the query to its neighbors for further improvement (Fig. 3.1(b) and 3.1(d)). Otherwise, the node performs probabilistic forwarding, described next.
Note that this greedy forwarding approach differs from basic greedy forwarding algorithms, which choose either the best neighbor or a set of best neighbors based on collected neighbor information such as information level or closeness to the destination. The underlying concept of our greedy forwarding approach is that if a node's information is higher than that of its parent node along the forwarding path, then the forwarding path through that node may reach node(s) with even higher information.

• The irregular patterns that can appear in the gradient information region due to erroneous sensor readings, as discussed in Sec. 1.1, are sharp drops or rises of the information level about the queried event. To overcome such local, isolated maxima or information holes, the protocol uses probabilistic forwarding (Fig. 3.1(c)), governed by a function based on the simulated annealing concept. The function takes as its parameter the hop count x in the information gradient region:

$$f_p(x) = \frac{1}{x^{\beta}}$$

where β depends on the diffusion parameter α and controls the reachability of the protocol. As discussed in Sec. 3.2, the performance is a function of the interplay between α and β.

Nodes use the reverse path as the basic mechanism to send the reply of a resolved query to the querier. However, depending on the type of query, the reply mechanism may be optimized to suppress unpromising responses, as discussed in Sec. 3.2.2. Initially, the protocol instantiates multiple paths to discover source(s), but in the absence of multiple sources most of the paths terminate after a few hops. Note that our algorithm does NOT require neighbor "Hello" messages (i.e., a node processing the query is not assumed to know all its neighbors' readings). This has proven to save significant overhead compared with using "Hello" messages.
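The per-node forwarding decision described in this section can be sketched in Python. This is a minimal illustration, not the simulator used in this study. The packet fields (query ID as querier plus sequence number, mode, the sender's information level, and the hop count x in gradient mode), the helper names, and two interpretive choices are assumptions made for the sketch: a reading of exactly zero is treated as the flat region, and x counts hops travelled since the query entered gradient mode. Duplicate suppression uses the querier's unique sequence number, as the protocol assumes.

```python
import random
from dataclasses import dataclass, replace
from typing import Optional, Set, Tuple

FLAT, GRADIENT = "flat", "gradient"


@dataclass(frozen=True)
class Query:
    query_id: Tuple[str, int]   # (querier, unique sequence number)
    mode: str                   # FLAT or GRADIENT
    parent_info: float          # information level of the node that sent it
    hops_in_gradient: int       # x: hops travelled in gradient mode (assumption)


def forward_probability(x: int, beta: float) -> float:
    """Simulated-annealing style f_p(x) = 1 / x**beta, with x >= 1."""
    return 1.0 / (max(x, 1) ** beta)


def on_receive(q: Query, my_info: float, seen: Set[Tuple[str, int]],
               beta: float, rng: random.Random) -> Optional[Query]:
    """Return the query to rebroadcast to all neighbors, or None to drop it."""
    if q.query_id in seen:                 # loop prevention via sequence number
        return None
    seen.add(q.query_id)

    in_flat_region = my_info == 0.0        # assumption: no sensed effect = flat
    if in_flat_region and q.mode == FLAT:
        # Flat region, flat mode: flood towards the gradient region.
        return replace(q, parent_info=my_info)

    if not in_flat_region and q.mode == FLAT:
        # First gradient information found: switch mode and must forward.
        return replace(q, mode=GRADIENT, parent_info=my_info, hops_in_gradient=1)

    x = q.hops_in_gradient + 1
    if not in_flat_region and my_info > q.parent_info:
        # Greedy step: information improved over the parent, keep exploring.
        return replace(q, parent_info=my_info, hops_in_gradient=x)

    # No improvement (or flat region in gradient mode): probabilistic forwarding.
    if rng.random() < forward_probability(x, beta):
        return replace(q, parent_info=my_info, hops_in_gradient=x)
    return None
```

A node with a lower reading than its parent (e.g., a neighbor of the local maximum Mx) still forwards with probability f_p(x), which is how the sketch lets a query escape isolated maxima while most non-improving branches die out.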
3.2 Simulation Model and Performance Evaluation

Simulations were carried out to analyze and characterize the performance of the proposed routing protocol for query dissemination. In the simulations, we use two different sensor layouts. The first layout is a regular 100×100 grid of 10000 sensor nodes, where each node has eight neighbors. The second is a uniform random grid of 90 sensors over a 225 × 375 m² area (Fig. 3.2(a)), with a sensor hole as shown in Fig. 3.2(b). Here, the grid points are perturbed with independent Gaussian noise N(0, 25), and each node's communication range is 50 m. For all simulations, the parameter α of the phenomenon diffusion function is set to 0.8.

[Figure 3.2: Sensor layout. (a) Uniform random grid; sensors within the dotted rectangle are removed to create a sensor hole. (b) Uniform random grid with sensor hole. "E" denotes the location of the event.]

We consider two metrics for performance evaluation: (1) query success rate, and (2) average energy dissipation, i.e., the total number of transmissions required to forward a query and get its reply. The performance of our routing algorithm is compared with flooding and expanding ring search (ERS), which uses an additive increase of the ring size. The effectiveness of the information gradient is analyzed by varying the percentage of flat information region nodes and the percentage of uniformly distributed malfunctioning nodes. We also tune the parameter β of the probabilistic function described in Sec. 3.1 to find the optimal trade-off between energy dissipation and query success rate. In the evaluation, we use three different types of query: (1) single-value query, (2) global maxima search, and (3) multiple event detection.

3.2.1 Single-value Query

The query searches for a specific value, and only one source has the response.
Here, the response concerns an event whose effect follows a diffusion law and creates the information gradient; an example is a search for the source of a chemical leak, where the information gradient is a fingerprint of the chemical contaminant. The event source is assumed to be stationary, and the source node uses the reverse path to send the reply.

[Figure 3.3: Effect of flat information region nodes (3% environmental noise and 15% malfunctioning nodes), for β (labeled a) from 0.50 to 0.90. (a) Failure rate: percentage of failure to find the source vs. percentage of nodes in the flat region. (b) Average energy dissipation vs. percentage of nodes in the flat region.]

In the first sensor layout, we simulate an event at (74, 49), and the querier can be any of the remaining nodes. Through simulation, we notice that as the number of flat information region nodes increases, flooding overhead becomes dominant: the protocol creates multiple paths, which improves the query success rate but increases the energy dissipation. In contrast, as the number of malfunctioning nodes increases, the protocol switches from flat region mode to gradient region mode rapidly, which reduces the flooding overhead but increases the query failure rate, especially for higher values of β, the parameter of the probabilistic forwarding function. For higher values of β, the probabilistic forwarding function $f_p(x) = \frac{1}{x^{\beta}}$ drops sharply and the protocol explores fewer nodes. As a result, both the query success rate and the average energy dissipation decrease. For β < α, the probabilistic function drops slowly and allows the query to follow the diffusion pattern due to α through probabilistic forwarding.
Thus, values of β < α but close to α give the optimal trade-off between the energy dissipation and the query success rate. In our simulations, although α is 0.8, Fig. 3.3 shows that β = 0.65 is optimal for the simulated scenario because of the simulated environmental noise. In addition, Figure 3.4 compares the average energy dissipation of our algorithm with that of flooding-based querying (FBQ) and the expanding ring search (ERS) algorithm, using the same layout configuration. Our routing protocol reduces the energy dissipation by 47-80% relative to FBQ when 66% or fewer of the nodes are in the flat information region. With 47% or fewer flat information region nodes, our protocol also reduces the energy dissipation by 18-50% relative to ERS.

Figure 3.4: Comparison of FBQ, ERS, and our information gradient based routing (GBR) (bar values: Flood 10000; ERS-3 3804; ERS-5 3942; ERS-7 4093; GBR-s1a1 5138; GBR-s1a2 5265; GBR-s2a1 3340; GBR-s2a2 3401; GBR-s3a1 1906; GBR-s3a2 2460). Here, the ERS ring sizes are 3, 5 and 7. For GBR, s1, s2 and s3 indicate 66%, 47% and 36% flat information region nodes respectively, and a1 and a2 indicate β = 0.7 and β = 0.5 respectively.

The second sensor network layout is used to test reachability in the presence of a deployment gap, or hole. The target is simulated at location "E", and the querier can be any node below the sensor hole of the layout shown in Fig. 3.2(b). In Figure 3.5, with 20% malfunctioning nodes, the fraction of flat information region nodes is varied from 20% to 94%. For smaller values of β, the success rate of our protocol in routing the query around the sensor hole is above 98%, even in the presence of 55% flat region nodes.

Figure 3.5: Query failure rate to route a query around the sensor hole of the second sensor layout (percentage of failures to find the source vs. percentage of nodes in the flat region, for β = 0.40 to 0.90).
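To make the role of β concrete, the following minimal sketch evaluates the probabilistic forwarding function $f_p(x) = 1/x^{\beta}$ for a few values of β. It assumes that x is a positive hop count and that a node forwards independently with probability $f_p(x)$; the helper names are hypothetical and not part of the protocol specification.

```python
import random


def forward_probability(x: int, beta: float) -> float:
    """Probabilistic forwarding function f_p(x) = 1 / x**beta, for hop count x >= 1."""
    if x < 1:
        raise ValueError("hop count x must be >= 1")
    return 1.0 / x ** beta


def should_forward(x: int, beta: float, rng: random.Random) -> bool:
    """Hypothetical per-node decision: forward with probability f_p(x)."""
    return rng.random() < forward_probability(x, beta)


if __name__ == "__main__":
    # A larger beta makes f_p fall off faster, so fewer nodes keep forwarding.
    for beta in (0.5, 0.65, 0.8, 0.9):
        probs = [round(forward_probability(x, beta), 3) for x in (1, 2, 5, 10, 20)]
        print(f"beta={beta}: {probs}")
```

Because the forwarding probability falls off faster with x for larger β, fewer nodes are explored, which is the energy/success trade-off discussed above.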
3.2.2 Global Maxima Search

The query searches for the maximum value of the event's effect. This important statistic gives the current critical status of the observed phenomenon. For the FBQ and ERS algorithms, deciding on the maximum value requires exploring all nodes of the WSN. Using the information gradient, however, our protocol determines the global maximum by exploring only a limited number of nodes.

Figure 3.6: For global maxima search, effect of flat information region nodes, with 3% environmental noise and 15% malfunctioning nodes: (a) average energy dissipation without the filter to avoid malfunctioning nodes; (b) average energy dissipation with the filter. Both panels plot average energy dissipation against the percentage of nodes in the flat region for β = 0.50 to 0.90. Note that the y-axis ranges of (a) and (b) are 0-14000 and 0-8000 respectively.

In this type of query processing, any node with some information gradient about the observed phenomenon can become a potential responder to the query, so the reply overhead for this type of query may become significant. Hence, we propose a reply suppression scheme in which intermediate nodes suppress non-promising replies by caching the maximum value of the responses passing through the node for the same query ID and comparing each new response against it. To make this scheme even more effective, a node may use a timer (per reply) that is set before a reply is sent or forwarded; while the timer is running, the node listens to other broadcast replies and suppresses unnecessary reports. The timeout is a function of the network size.
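The reply suppression scheme can be sketched as a small per-node cache keyed by query ID. This is a minimal illustration under stated assumptions: each reply carries a single numeric value, larger is better (global maxima search), and the timer variant is modeled simply as the list of replies overheard before the timer expires. The class and method names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ReplySuppressor:
    """Per-node cache of the best (maximum) reply value seen for each query ID."""

    best: Dict[int, float] = field(default_factory=dict)  # query_id -> max value seen

    def on_reply(self, query_id: int, value: float) -> bool:
        """Return True if the reply should be forwarded, False if it is suppressed."""
        current = self.best.get(query_id)
        if current is not None and value <= current:
            return False  # non-promising: no better than a reply already seen
        self.best[query_id] = value
        return True

    def on_timer_expiry(self, query_id: int, pending: float, overheard: List[float]) -> bool:
        """Timer variant (hypothetical): record the replies overheard while the
        timer ran, then forward `pending` only if it still beats all of them."""
        for v in overheard:
            self.best[query_id] = max(self.best.get(query_id, v), v)
        return self.on_reply(query_id, pending)
```

Note that a malfunctioning node reporting an arbitrarily high value always wins this comparison, which is why the filter discussed next is still needed.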
As shown in Figure 3.6(a), the reply overhead also increases with the fraction of flat information region nodes, because malfunctioning nodes with arbitrarily high values send replies. However, by using the filter [16] to detect isolated malfunctioning nodes, this overhead can be reduced significantly, as shown in Figure 3.6(b). We also notice that the query success rates and the effect of malfunctioning nodes are similar to those of the single-value query.

3.2.3 Multiple Event Detection

This type of query searches for multiple sources of the same type. Usually, for non-coherent sources of information, the gradients from multiple events are summed at each sensor node. The multiple-path exploration mechanism of our proposed protocol makes it possible to detect such multiple sources. However, as the number of sources increases, the resultant information gradient due to multiple events creates some plateau regions; hence, the protocol requires more probabilistic forwarding to forward the query towards the events through those regions. Thus, as shown in Figure 3.7, a smaller value of β is required to detect such events as the number of events in the network increases.

Figure 3.7: Multiple event detection, with 3% environmental noise and 15% malfunctioning nodes: (a) percentage of sources found vs. number of sources; (b) average energy dissipation vs. number of sources; curves for β = 0.30 to 0.65.

It is important to mention that the proposed protocol is not suitable for some multiple-event scenarios in which the events' effects do not overlap. In such a scenario, if a query is launched from one information gradient region, it may fail to reach the information gradient regions of the other events. Furthermore, the transmission overhead due to flooding becomes significant.
In study 3, we propose another mechanism to overcome these problems.

3.3 Conclusion

In this study, we presented a scheme to route on fingerprint gradients in sensor networks. The main contributions of this study are:
1. A novel scheme to exploit the natural information gradient repository, which is a consequence of the fingerprint gradients of the event's effect.
2. A novel reactive, fully distributed routing protocol for sensor networks, based on the above-mentioned information gradient repository.

Unlike other information-driven protocols for sensor networks, our scheme eliminates the overhead of preparing and maintaining the information gradient repository. Three different problems were studied using our scheme, and the performance of the routing protocol for each problem was demonstrated by simulations. The overall energy dissipation of the protocol was found to be significantly lower than that of FBQ and ERS, and its success rate in routing around a sensor hole was found to be over 98%.

Multiple-path exploration, together with controlling the instantiation of paths by simulated annealing, makes our protocol well suited to a broad range of applications, including time-gradient-based target tracking and event boundary detection. One possible future research direction is to develop protocols for target tracking and target counting using the proposed scheme. We also notice that the parameter β of the probabilistic function depends on the diffusion parameter α. So, another important direction for future work is to establish an analytical relationship between β and α to further reduce the energy dissipation.

Chapter 4
Analysis of Gradient-based Routing Approaches

Several routing protocols that exploit the information gradient have been designed for wireless sensor networks. They greedily follow the pattern of information improvement to disseminate queries and can be broadly classified into two categories:
1. Single-path approach [8][31][30], where the query reaches the source from the sink through a single path.
2. Multiple-path approach [16], where the query uses multiple paths to reach the source.

In sensor networks, the choice of routing approach is critical both for application performance in terms of resilience and for the lifetime of the sensor nodes and/or the network. Thus, it is important to analyze the characteristics of these two major routing approaches. In this study, we do not aim to design new routing protocols per se. Rather, the objective is to evaluate and analyze the general approaches to routing a query using the natural information gradient in sensor networks. Our interest is primarily in systematically analyzing the performance, e.g., the query success rate and the overhead, of the single-path and multiple-path approaches to designing data-centric routing protocols in the presence of a natural information gradient. In particular, we use a probabilistic framework to develop simple analytical models of the query success rate and the energy overhead for both approaches under ideal and lossy wireless link conditions. We also design a new single-path routing protocol to improve robustness. Our analysis is validated through extensive simulations.

For the analysis and the simulations, we consider only sensor networks with static nodes, which is usually the case for environmental monitoring, and we assume that queries are triggered from a sink to identify the origin (i.e., source) of the event after the event's occurrence. To keep the analysis simple, we ignore potential packet collisions, which can be (and usually are) effectively reduced by inserting a random delay before forwarding the query packet. However, wireless link loss is considered in both the analysis and the simulations. It is important to note that all of the above information-driven protocols based on the single-path approach use a proactive phase to prepare the information gradient repository.
In this study, we analyze the performance of the query routing mechanisms without considering the cost of the proactive phase. In addition, the protocols [8][31][30] based on the single-path approach are unable to forward a query in the flat information region, so we consider the performance of the various routing approaches only in the information gradient region.

4.1 Query Routing Approaches

In this section, we describe the two major information gradient-based routing approaches mentioned above. To describe them properly, we define the following terms:
• Active node: a node that currently holds the query.
• Candidate node: a node that has never received the query.

A brief description of both routing approaches follows.

4.1.1 Single-path approach

The query follows a single path from the sink to the source. At each forwarding step, the active node uses a look-ahead parameter r, r ≥ 1, to collect information from all candidate nodes within r hops. For r > 1, all nodes within r − 1 hops need to relay the active node's request to gather information about the event. Note that for r = 0, the single-path approach becomes a random walk and cannot use the information gradient repository. Single-path protocols can be designed in several ways, using different policies to select the next active node. In our study, we consider the following two policies:
a) Basic single-path approach: the protocol selects the node with the maximum information among all candidate nodes within r hops of the active node, but only when that node's information is higher than that of the active node. This selection policy is sensitive to local maxima and to the arbitrarily high readings of malfunctioning nodes that cause these local maxima. The resilience of protocols based on this approach can be improved by using filters to avoid such arbitrarily high readings.
In this document, we omit the analysis and results for this routing policy because its resilience is significantly lower than that of the other policy.
b) Improved single-path approach: the active node forwards the query to the node having the maximum information among all candidate nodes within r hops of the active node, even when all of these candidate nodes hold less information than the active node. Here, query forwarding ends either at the source node or at an active node that has no candidate nodes within r hops. In the rest of this document, "single-path approach" means the improved single-path approach.

4.1.2 Multiple-path approach

This approach forwards the query through multiple paths towards the source without any look-ahead phase. These paths are not necessarily disjoint. Usually, the active nodes forward the query greedily when the information level improves. In the presence of malfunctioning nodes holding wrong information, protocols based on this approach can use probabilistic forwarding. For example, the protocol described in Section ?? uses a diffusion function for probabilistic forwarding. It creates some extra paths, but the protocol can adaptively change the forwarding probability to control the instantiation of these extra paths. To capture this in the analysis, the forwarding probability is allowed to differ at each step of query forwarding. All query routing protocols considered in this study use unique query IDs to suppress duplicates and avoid loops.

4.2 Analytical Model

In this section, we derive models that describe the characteristics of the approaches used to design information-driven routing protocols for sensor networks. In addition to the ideal wireless link case, we also consider the lossy wireless link case.
In fact, several experimental studies on wireless sensor networks [45][49] have shown that, in practice, the wireless links of sensor networks can be extremely unreliable and deviate to a large extent from idealized perfect-reception-range models. Due to lossy links, the transmissions of a node may not reach some of its neighbors, which affects the performance of the routing protocols.

4.2.1 Assumptions and Metrics

Let a sensor network consist of N nodes deployed as a regular grid, as shown in Fig. 4.1. Assume that only one event occurs and that the effect of the event follows the diffusion law described previously. Assume also that the information gradient is available in the whole network, i.e., there is no flat information region. Further, consider that malfunctioning nodes hold arbitrary information and are uniformly distributed in the network, which may cause failures during route discovery. Let $p_f$ be the probability that a node is malfunctioning. The information stored in a malfunctioning node is arbitrarily high or arbitrarily low with equal probability. Finally, assume that each node is able to communicate via broadcast with its eight neighbors on the grid.

Figure 4.1: A regular grid topology. Here, $f_j$ indicates the magnitude of information, and $f_0 < f_1 < \cdots < f_d$. The triangular pattern represented by white dots is present eight times in the grid. The information magnitude near the source is $f_d$ and gradually decreases towards the edge. (In the diagram, the source holds $f_d$, successive rows hold $f_{d-1}, f_{d-2}, \ldots$, and the last row holds $f_0$, with positions labeled $i = 0, 1, \ldots, d$.)

Suppose that the querier, i.e., the sink, is located d hops away from the source node. The query is forwarded step by step, where the term "step" is defined as follows:
1) Single-path approach: the active node collects information from all candidate nodes within r hops, then forwards the query to the next active node, which is r hops away.
2) Multiple-path approach: the active nodes forward the query either greedily or probabilistically via broadcasts, and the query reaches the candidate nodes one hop away.

Due to greedy forwarding based on the information gradient, after each step the query moves one step closer to the source with some probability. In the case of lossy wireless links, let $p_c$ be the probability of a link loss, and assume that lossy links are uniformly distributed in the network. Assume also that no automatic repeat request (ARQ) is used to broadcast or forward the query towards the source, which is usually the case in sensor networks for energy conservation. Note, however, that ARQ is used to send the reply (if the query is successful) to the sink along the reverse path. Here, we are interested in developing analytical models of two metrics: (1) query success rate, and (2) overhead in terms of the number of transmissions.

4.2.2 Single-path Approach

Let $n_b$ be the number of nodes that are one hop away from the active node. Overlap of the sensor nodes' radio coverage causes some nodes to receive the same query multiple times; the query ID is used to suppress duplicate queries. If we consider the radio coverage of a node to be circular, with the same radius for all nodes, then simple geometry shows that the overlap is one-third. For all except the first step of query forwarding, let $n_c = \frac{2}{3} n_b$ denote the number of candidate nodes within one hop of the active node. It is then easy to show that the total numbers of neighbors and of candidate nodes within r hops of the active node are

$$n_B = \frac{r(r+1)}{2}\, n_b \quad \text{and} \quad n_C = \frac{2}{3}\, n_B,$$

respectively. However, for the first step of query forwarding, $n_C = n_B$.

Figure 4.2: Single-path approach with look-ahead parameter r: (a) $A_{j-1}$ and $A_j$ are the active nodes of steps $(j-1)$ and $j$; the diagram marks the candidate nodes of the $j$-th step and the overlapped nodes; (b) active node $A_j$ with its r-hop neighbors ($n_b, 2n_b, 3n_b, \ldots, r n_b$ nodes at successive hops, of which $n_h, 2n_h - 1, 3n_h - 2, \ldots$ have high information), with the information gradient increasing away from $A_j$.

Within one hop of the active node, let $n_h$ and $n_l$ be the numbers of candidate nodes having high and low information, respectively, according to the diffusion pattern of the event's effect in the grid, where $n_c = n_l + n_h$. Thus, except for the first step of query forwarding, Fig. 4.2 shows that the total number of high-information candidate nodes within r hops of the active node is

$$n_H = \frac{r(r+1)}{2}\, n_h - \frac{r(r-1)}{2}.$$

Finally, $n_L = n_C - n_H$ denotes the number of low-information candidate nodes within r hops, since $n_C = n_L + n_H$. Under this routing policy, at each step the protocol selects the node with the maximum information among all candidate nodes within r hops of the active node and forwards the query to that node. For protocols based on this approach, the query success rate and the overhead depend on the length of the path followed by the protocol, which may exceed the shortest path in the presence of malfunctioning nodes with arbitrary information. Let $l_j$ denote the length of the path after the $j$-th forward of the query. If all sensor nodes in the network were perfect, the query would follow the shortest path and $l_j - l_{j-1} = r$. However, due to malfunctioning nodes, some low-information candidate nodes may contain arbitrarily high information, with probability $p_f/2$. The probability of selecting such a node as the next active node is $\frac{p_f}{2}\frac{n_L}{n_C}$, in which case the path length difference per step is $l_j - l_{j-1} = r + L_{err}$, where $L_{err}$ is the average path length increase per step. For simplicity of the analysis, if we consider each step of query forwarding to be independent, the length of the path after the $j$-th step can be expressed as

$$l_j = \begin{cases} \dfrac{p_f}{2}\dfrac{n_L}{n_C}\,(l_{j-1} + r + L_{err}) + \left(1 - \dfrac{p_f}{2}\dfrac{n_L}{n_C}\right)(l_{j-1} + r), & j > 1,\\[2ex] r, & j = 1. \end{cases}$$
Thus, $l_d$ denotes the length of the path followed by the protocol when the actual distance of the source is $d$.

Here, query forwarding halts either at the source node or at an active node with no candidate nodes within r hops. In the information gradient repository, there are always some candidate nodes, as query forwarding proceeds from low- to high-information nodes. However, due to malfunction, with probability $p_f/2$, a high-information candidate node may contain arbitrarily low information, which may be lower than that of some low-information candidate nodes. When all high-information candidate nodes are malfunctioning and contain arbitrarily low information, query forwarding proceeds through a low-information candidate node. Such a node may be unable to find any candidate node, because all of its neighbors may already have received the query, and the query then fails to reach the source. Therefore, the probability of query success in the ideal link case is

$$P^{single}_{I} = \left[1 - \left(\frac{p_f}{2}\right)^{n_H}\right]^{\lceil l_d / r \rceil}. \tag{4.1}$$

Here, $1 - (p_f/2)^{n_H}$ is the probability that not all high-information candidate nodes are malfunctioning at a given step, and a total of $\lceil l_d / r \rceil$ such steps are required.

We compute the overhead by counting the total number of transmissions required to forward the query from the sink to the source over a path of length $l_d$ and to get the reply. Considering that nodes in the overlapping regions respond only once, the total overhead of this routing approach in the ideal link case is

$$T^{single}_{IO} = 1 + r^2 n_b + r + \left(\frac{l_d}{r} - 1\right)\left(1 + \frac{2 r^2 n_b}{3} + r\right) + l_d, \tag{4.2}$$

since, except for the first step of query forwarding, each remaining step requires $1 + \frac{2 r^2 n_b}{3} + r$ transmissions for the non-overlapping nodes. Here, $l_d$ transmissions are required to send the reply to the sink along the reverse path.

Due to the lossy links, at each step of query forwarding, the broadcast of the active node may not reach all candidate nodes within r hops.
Thus, at each step of query forwarding, $n_H(1-p_c)$ high-information candidate nodes receive the broadcast. Similarly, the active node receives responses from $n_H(1-p_c)^2$ high-information candidate nodes. In addition, the probability of forwarding the query to the next active node, which is r hops away, is $(1-p_c)^r$. Thus, for the improved single-path approach, the probability of success is

$$P^{single}_{Ic} = \left[\left(1 - \left(\frac{p_f}{2}\right)^{n_H (1-p_c)^2}\right)(1-p_c)^r\right]^{\lceil l_d / r \rceil}. \tag{4.3}$$

Here, $\left(1 - (p_f/2)^{n_H (1-p_c)^2}\right)(1-p_c)^r$ is the probability of success at each step of query forwarding, and a total of $\lceil l_d / r \rceil$ such steps are required.

In this routing approach, the first step of query forwarding requires $1 + \frac{r(r-1)}{2} n_b (1-p_c) + \frac{r(r+1)}{2} n_b (1-p_c) + r(1-p_c)$ transmissions, which equals $1 + r(1-p_c)(r n_b + 1)$. With probability $p_c$, the nodes of the overlapped region can be candidate nodes of the current active node, because they failed to receive the broadcast of the previous active node due to lossy links. Further, consider that the overlapped region nodes respond only once. Then each remaining step of query forwarding requires $1 + \frac{r^2 (1-p_c)}{3}\left(n_b (p_c + 2) + \frac{3}{r}\right)$ transmissions. Thus, the total number of transmissions is

$$T^{single}_{IOc} = 1 + r(1-p_c)(r n_b + 1) + \left(\frac{l_d}{r} - 1\right)\left[1 + \frac{r^2 (1-p_c)}{3}\left(n_b (p_c + 2) + \frac{3}{r}\right)\right] + \frac{l_d}{1-p_c}, \tag{4.4}$$

since, except for the first step of query forwarding, the overlapping region nodes must be considered in every remaining step. Also note that replying to the sink through the reverse path requires $l_d/(1-p_c)$ transmissions.

4.2.3 Multiple-path Approach

In this routing approach, except for the first step of query forwarding, multiple active nodes may forward the query to the candidate nodes without any look-ahead phase. The active nodes with lower information forward the query probabilistically. Let $p_j$ denote the probability of forwarding the query probabilistically at the $j$-th step of query forwarding.
So, at the $j$-th step of query forwarding, a high-information candidate node fails to forward the query with probability $q_j = \frac{p_f}{2}(1 - p_j)$, where $p_f/2$ is the probability that the high-information candidate node is malfunctioning and contains low information. For simplicity of the analysis, we assume that the query forwarding steps are independent. This simplified model still captures the characteristics of the multiple-path approach while keeping the analysis tractable.

Figure 4.3: Query forwarding pattern using the multiple-path approach, for (a) sink at i = 0, (b) sink at i = 1, (c) sink at i = d − 1, and (d) sink at i = d. The patterns differ depending on the position of the sink. Here, the white dots indicate the nodes participating in forwarding the query towards the source, and d is the distance between the source and the sink.

Let $P^{multiple}$ and $T^{multiple}$ denote the query success rate and the overhead of the multiple-path approach. Let $i$ denote the position of the querier, i.e., the sink, in the last row of the grid, as shown in Fig. 4.1 and Fig. 4.3. The query forwarding patterns, i.e., the numbers of participating nodes, differ for different values of $i$, as shown in Fig. 4.3. It is also easy to show that, according to the diffusion pattern in the grid, the average number of low-information candidate nodes is four at each step of query forwarding. These nodes forward the query probabilistically.
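The per-step quantities just introduced can be computed directly. The sketch below assumes, as in the simulations described later in Section 4.3.1, that $p_j = 1/j^{\beta}$, and that the x high-information candidate nodes and the four low-information candidate nodes at a step act independently; the function names are hypothetical.

```python
def forwarding_prob(j: int, beta: float) -> float:
    """Assumed probabilistic forwarding probability p_j = 1 / j**beta (j >= 1)."""
    return 1.0 / j ** beta


def high_node_failure(j: int, p_f: float, beta: float) -> float:
    """q_j = (p_f / 2) * (1 - p_j): a high-information candidate node is
    malfunctioning with low information AND does not forward probabilistically."""
    return (p_f / 2.0) * (1.0 - forwarding_prob(j, beta))


def advance_probability(x: int, j: int, p_f: float, beta: float, n_low: int = 4) -> float:
    """Probability that at least one node forwards the query at step j:
    none of the x high-information nodes forwards with probability q_j**x, and
    none of the n_low low-information nodes forwards with probability (1 - p_j)**n_low."""
    q_j = high_node_failure(j, p_f, beta)
    p_j = forwarding_prob(j, beta)
    return 1.0 - q_j ** x * (1.0 - p_j) ** n_low


if __name__ == "__main__":
    for j in range(1, 6):
        print(j, advance_probability(x=3, j=j, p_f=0.1, beta=0.65))
```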
According to the query forwarding patterns shown in Fig. 4.3, for an even value of $i$, $0 \le i \le d$, the query success probability is

$$P^{multiple}_{e}(i) = (1-q_1) \prod_{m=i/2}^{d-i/2} \left[1 - q_{m+1}^{\,i+1}(1-p_{m+1})^4\right] \prod_{m=1}^{i/2-1} \left\{\left[1 - q_{m+1}^{\,2m+1}(1-p_{m+1})^4\right]\left[1 - q_{d-m+1}^{\,2m+1}(1-p_{d-m+1})^4\right]\right\},$$

and for an odd value of $i$, $0 \le i \le d$, it equals $P^{multiple}_{o}(i)$, which differs from the above expression only in the limits of the products ($\lceil i/2 \rceil \le m \le d - \lceil i/2 \rceil$ and $1 \le m \le \lfloor i/2 \rfloor$ in the first and second products, respectively). Here, each term of the form $q_y^{\,x}$, $1 \le y \le d$, in the above equation expresses the probability that none of the $x$ high-information candidate nodes forwards the query at the $y$-th step, because they are all malfunctioning and contain less information. In addition, the surrounding four low-information candidate nodes fail to forward the query with probability $(1-p_y)^4$. Thus, at the $y$-th step, $1 - q_y^{\,x}(1-p_y)^4$ is the probability of forwarding the query towards the source. Detailed derivations of these equations, and of all the equations in the remainder of this document, are presented in [17].

Similarly, to compute the energy dissipation, we also consider the different forwarding patterns of the query shown in Fig. 4.3. The total number of transmissions required to forward the query to the source for an even value of $i$, $0 \le i \le d$, is

$$T^{multiple}_{e}(i) = \left[\frac{i^2}{2} - 1 + (i+1)(d-i+1)\right] - \left[\sum_{m=1}^{i/2}(2m-1)\,q_m + \sum_{m=1}^{i/2-1}(2m+1)\,q_{d-m+1} + (i+1)\sum_{m=1}^{d-i+1} q_{i/2+m}\right] + 4\sum_{m=1}^{d} p_{m+1}, \tag{4.5}$$

and for an odd value of $i$, $0 \le i \le d$, it equals $T^{multiple}_{o}(i)$, which differs from Equation (4.5) in the first term, $\left[\frac{(i+1)^2}{2} - 1 + (i+1)(d-i)\right]$, and in the limits of the sums in the second term ($1 \le m \le \frac{i+1}{2}$, $1 \le m \le \frac{i+1}{2} - 1$ and $1 \le m \le d-i$, respectively). Here, the first term of the above equation computes the number of transmissions due to high-information candidate nodes if all such nodes are working properly. However, some of these nodes are malfunctioning and unable to forward the query, with probability $q_y$, $1 \le y \le d$, at the $y$-th step of query forwarding, which reduces the overhead.
The second term of the above equation computes this reduction. Finally, the third term computes the overhead due to probabilistic forwarding by the four low-information candidate nodes. Now, if each value of $i$, $0 \le i \le d$, is equally likely, the average probability of success is

$$P^{multiple} = \frac{1}{d+1}\left[\sum_{k=0}^{\lfloor d/2 \rfloor} P^{multiple}_{e}(2k) + \sum_{k=1}^{\lceil d/2 \rceil} P^{multiple}_{o}(2k-1)\right]. \tag{4.6}$$

Similarly, the average number of transmissions required to forward the query from the sink to the source and get the reply along the reverse path is

$$T^{multiple} = \frac{1}{d+1}\left[\sum_{k=0}^{\lfloor d/2 \rfloor} T^{multiple}_{e}(2k) + \sum_{k=1}^{\lceil d/2 \rceil} T^{multiple}_{o}(2k-1)\right] + d. \tag{4.7}$$

4.3 Simulations and Results

In this section, we validate our analytical models through extensive simulations. In addition to the query success rate and the overhead, we also investigate another metric in the simulations: path quality. We define this metric as the path length increase factor, i.e., the ratio of the average length of the discovered path to the shortest path length between a set of sinks and the source. This metric is important for long-lived continuous queries.

4.3.1 Simulation Model

In our simulations, we use a 100 m × 100 m grid with $10^4$ sensor nodes placed 1 m apart. Except for the border nodes, each sensor node is able to communicate with eight neighbors. For all simulations, the exponent of the phenomenon diffusion function, i.e., the parameter α, is set to 0.8. To be consistent with the analytical models, the information gradient is available in the whole network, and the malfunctioning nodes are uniformly distributed with arbitrary values. The querier, i.e., the sink, and the source are distinct and can be any nodes. We use a flooding technique to find the set of sink nodes that are a specific shortest distance away from the source. In the simulations, we use only single-value queries, which search for a specific value and have a single response.
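A minimal sketch of the sink-selection step just described: flooding (breadth-first search) from the source over the eight-neighbor grid gives every node's hop distance, from which the sinks at a chosen distance d can be read off. The function names and the seeded random placement of malfunctioning nodes are illustrative assumptions, not the simulator used in this study.

```python
import random
from collections import deque
from typing import Dict, List, Set, Tuple

Node = Tuple[int, int]


def hop_distances(n: int, source: Node) -> Dict[Node, int]:
    """Flood from `source` over an n x n grid in which each node reaches its
    eight neighbors; return the hop count of every node (breadth-first order)."""
    dist = {source: 0}
    frontier = deque([source])
    while frontier:
        x, y = frontier.popleft()
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                nx, ny = x + dx, y + dy
                if (dx or dy) and 0 <= nx < n and 0 <= ny < n and (nx, ny) not in dist:
                    dist[(nx, ny)] = dist[(x, y)] + 1
                    frontier.append((nx, ny))
    return dist


def sinks_at_distance(n: int, source: Node, d: int) -> List[Node]:
    """All candidate sinks whose shortest hop distance to the source is exactly d."""
    return sorted(node for node, h in hop_distances(n, source).items() if h == d)


def malfunctioning_nodes(n: int, p_f: float, seed: int = 0) -> Set[Node]:
    """Mark each node malfunctioning independently with probability p_f."""
    rng = random.Random(seed)
    return {(x, y) for x in range(n) for y in range(n) if rng.random() < p_f}


if __name__ == "__main__":
    sinks = sinks_at_distance(100, (50, 50), 10)
    bad = malfunctioning_nodes(100, 0.05)
    print(len(sinks), len(bad))
```

With eight neighbors per node, the hop distance equals the Chebyshev distance on the grid, so the sinks at distance d form a square ring around the source.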
The simulated protocol based on the single-path approach uses a look-ahead parameter $r = 1$. For $r = 1$, it can easily be shown from Fig. 4.1 that $n_B = 8$, $L_{err} \approx 2$ and $n_H \approx 2.5$. Using the expressions of Section 4.2.2, we then get $n_C = \frac{2}{3} n_B \approx 5$ and $n_L \approx 2.5$. These parameter values are used in the analytical models of the single-path approach to compare the analytical results with the simulation results.

The simulated protocol based on the multiple-path approach uses a probabilistic diffusion function with exponent β, as specified in [16], for probabilistic forwarding. Thus, $p_j = f(j) = 1/j^{\beta}$, where $j$ is the hop count in the information gradient region and $\beta < \alpha$.

4.3.2 Query Success Rate, i.e., Probability of Success

For the single-path approach, the query success rate of the routing protocols depends on the availability of high-information candidate nodes. Fig. 4.4(a) shows that the analytical results are broadly in line with the simulation results. The number of high-information candidate nodes decreases as more nodes are explored, especially for large $d$, which causes some minor differences between the analytical and simulation results. This approach is more resilient to local maxima than the basic single-path approach because of its policy for selecting the next active node.

In the analytical model of the multiple-path approach, we consider each step of the query to be independent, with low-information candidate nodes forwarding the query probabilistically. However, due to correlation with previous steps of query forwarding, some extra nodes may also forward the query and create a few extra paths, which actually improve the query success rate when fewer nodes are malfunctioning.
Also, as the number of malfunctioning nodes increases, active nodes use more probabilistic forwarding, which results in fewer paths, and the query success rate drops. For these reasons, we notice some minor differences between the analytical and simulation results in Fig. 4.4(b).

Figure 4.4: Probability of query success of both approaches (P(success) vs. distance of the source): (a) improved single-path approach, for Err = 0%, 5% and 15%; (b) multiple-path approach, for Err = 0%, 5%, 10% and 15%, with exponent of the probabilistic function β = 0.65. "A" and "S" indicate the analytical and simulation results, respectively.

Figure 4.5: Comparison of the query success rate of the single-path and multiple-path routing approaches using analytical results (P(success) vs. distance of the source for Err = 0%, 5% and 15%). Simulation results yield very similar plots.

The use of multiple paths and probabilistic forwarding in the presence of malfunctioning nodes improves the query success rate of the multiple-path approach compared to that of the single-path approach, as shown in Fig. 4.5. For the single-path approach, note that the query success rate drops quickly as the number of malfunctioning nodes in the network increases.

4.3.3 Overhead, i.e., Energy Dissipation

In Fig. 4.6, the overhead of both approaches is compared using the analytical results. In our model, if the source is less than 22 hops away from the sink, the multiple-path approach is more energy efficient; otherwise, the single-path approach is preferable when only energy dissipation is considered.
For more distant sources, the overhead of the multiple-path approach grows faster because of the extra paths created by probabilistic forwarding.

Figure 4.6: Comparison of the overhead of the improved single-path and the multiple-path approaches using analytical results (average energy dissipation versus distance of the source, Err = 0% to 15%).

Figure 4.7: Percentage of energy saving of the multiple-path approach over the improved single-path approach for $d \le 25$, using analytical models (Err = 0% to 20%).

Fig. 4.7 shows the percentage of energy savings of the multiple-path approach over the single-path approach, computed with the analytical models. As the number of malfunctioning nodes increases, the overhead of the single-path approach increases because the path it follows becomes longer. In contrast, as the number of malfunctioning nodes increases, the multiple-path approach uses probabilistic forwarding more often. This creates fewer paths, and the overhead decreases.

4.3.4 Path Quality

As shown in Fig. 4.8, the multiple-path approach yields shorter paths, with lengths very close to the shortest-path length. The path length of the single-path approach increases with the number of malfunctioning nodes. As expected, in the presence of malfunctioning nodes, the single-path approach fails to follow the shortest path towards the source.
In contrast, instantiating multiple paths and using probabilistic forwarding help the multiple-path approach alleviate the problem caused by malfunctioning nodes.

Figure 4.8: Path length increase factor for the improved single-path and the multiple-path approaches (versus distance of the source, Err = 0% to 20%). The exponent of the probabilistic function is $\beta = 0.60$.

Figure 4.9: Query success rate of the improved single-path approach under varying lossy link conditions (P(success) versus distance of the source for $p_c$ = 0%, 1%, 3%, 5% and 7%; ‘A’ analytical, ‘S’ simulation). The probability of malfunctioning nodes is $p_f = 0.05$.

4.3.5 Wireless Link Loss Effect

So far, all the results presented in Section 4.3 assume ideal wireless links. However, wireless links are lossy and are likely to affect the query success rate of the routing protocols significantly. Fig. 4.10 shows the query success rate of both approaches in the presence of lossy links with loss probability $p_c = 0.05$.

Figure 4.10: Comparison of the query success rate of the improved single-path and the multiple-path approaches in the presence of link loss, $p_c = 0.05$ (P(success) versus distance of the source, Err = 0% to 15%). (a) Using analytical models. (b) Using simulation models. The exponent of the probabilistic function is $\beta = 0.65$.
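A rough illustration helps explain why link loss penalizes long single paths. The Python sketch below computes the probability that a query survives a chain of lossy links, both with and without per-hop retransmission. It makes several simplifying assumptions. Losses are independent, with probability $p_c$ on every link. The look-ahead and candidate-selection structure of Equation (4.3) is ignored, as is the redundancy of the multiple-path approach. The function names are hypothetical. The retry model (up to a fixed number of independent attempts per hop) is also a hypothetical stand-in and not the ARQ scheme evaluated in [17].

```python
def path_delivery_probability(p_c: float, hops: int) -> float:
    """Probability that a query crosses `hops` independent lossy links without any loss."""
    if not 0.0 <= p_c <= 1.0 or hops < 0:
        raise ValueError("need 0 <= p_c <= 1 and hops >= 0")
    return (1.0 - p_c) ** hops


def path_delivery_with_arq(p_c: float, hops: int, attempts: int) -> float:
    """Same path, but each hop may try up to `attempts` independent transmissions.

    Hypothetical retry model: a hop fails only if every attempt is lost (p_c ** attempts).
    """
    if not 0.0 <= p_c <= 1.0 or hops < 0:
        raise ValueError("need 0 <= p_c <= 1 and hops >= 0")
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    per_hop_success = 1.0 - p_c ** attempts
    return per_hop_success ** hops


if __name__ == "__main__":
    p_c = 0.05  # link loss probability used in Fig. 4.10
    for hops in (10, 30, 60):
        print(hops,
              round(path_delivery_probability(p_c, hops), 3),
              round(path_delivery_with_arq(p_c, hops, attempts=3), 3))
```

Without retransmission, the delivery probability decays geometrically with the number of hops. This mirrors the drop with distance seen for the single-path approach. Allowing even a few attempts per hop reduces the per-hop failure probability to $p_c^k$.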
In Fig. 4.10, the analytical (Fig. 4.10(a)) and simulation (Fig. 4.10(b)) results are identical. In both cases, the query success rate of the multiple-path approach drops quite slowly. It remains above 93% even when 15% of the nodes in the sensor network are malfunctioning. The ability to send the same query from multiple active nodes towards a candidate node significantly improves the resilience of this approach. For the improved single-path approach, the query success rate drops drastically as the distance between the source and the sink increases. Fig. 4.9 also shows that the query success rate drops further as the link loss probability $p_c$ increases. Equation (4.3) contains the term $(1-p_c)^r$, which corresponds to forwarding the query from the current active node to the next active node over lossy links. This term is responsible for the low success rate of this approach. Using ARQ, the query success rate can be improved significantly [17].

4.4 Summary and Conclusions

In this study, we have presented a detailed performance analysis of information-driven routing approaches under ideal and lossy wireless link conditions, using analytical models and simulations. Our analysis considers the effect of (various kinds of) noise, malfunctioning nodes and node failures. We find that the query success rate of the single-path approach drops quite fast as the number of malfunctioning nodes in the network increases, while the multiple-path approach retains a very high query success rate. We also find that the multiple-path approach is more energy efficient when the source is fewer than 22 hops away from the sink; otherwise, the single-path approach is more energy efficient. For example, in Fig. 4.7, consider 5% malfunctioning nodes and a source 15 hops away from the querier. There, the overhead of the multiple-path approach is only 75% of that of the improved single-path approach.
Further, the multiple-path approach results in shorter paths that are close to the shortest path. Finally, with lossy links, the query success rate of the single-path approach drops drastically as the link loss probability and the distance between the source and the querier increase. In contrast, the multiple-path approach achieves a success rate above 93% even when 15% of the nodes in the sensor network are malfunctioning.

The analytical models of both routing approaches can be used to determine the performance of a protocol for a large sensor network without simulations, or when simulations are not possible due to resource constraints. Our models can also determine the performance of a new protocol based on either of these two approaches. For example, in the model of the multiple-path approach, the forwarding probability $p_j$ can be replaced by the forwarding policy $f(j)$ of a new protocol. In Section 4.3, we use this technique to model the protocol proposed in [16]. These analytical models can also be used to quickly identify the performance bottlenecks of a protocol.

The analytical models also show that more efficient information-driven routing protocols can be designed from these two approaches by tuning the model parameters. For example, the overhead of the multiple-path approach can be reduced further by making the probabilistic forwarding function $p_j = f(j)$ more intelligent. This is one possible direction for future research.

Chapter 5
PBS - A Virtual Grid Architecture for Querying

If the distance from an event is known, the diffusion property of the event’s effect can be used to determine its magnitude at a distance. Conversely, for an event source of given magnitude, the spread (e.g., area) of its effect can be estimated through known diffusion laws, empirical data or the local collaboration of sensor nodes.
This property of the information gradient can be used for query processing, especially for on-demand queries about events. In this document, we refer to the area around an event source within which the event’s effect can be perceived as “the geometry of event’s effect”. One challenge in using this diffusion property is that the gradient is not perfect in reality: it suffers distortion due to various environmental effects. We take this into account in our design and analysis.

In recent years, sensor networks have been viewed as distributed databases that collect measurements of the physical world [44]. Users specify the named data they want to collect, or the event of interest, through application-specific or declarative queries. The infrastructure then efficiently collects and processes the data within the sensor network. Typical queries include on-demand simple one-shot, aggregate or combined queries, and long-lived continuous queries.

Existing declarative query processing systems, including TinyDB [35] and Cougar [46], resolve queries via a routing tree, which must be established by initially flooding the whole network. Application-specific query mechanisms, like Directed Diffusion [25], also disseminate the interest hop by hop throughout the network, much like flooding. However, they optimize interest forwarding based on query parameters, e.g., location. Random-walk-based mechanisms, like ACQUIRE [40], process simple one-shot and combined queries by leveraging replicated data. All of these approaches, however, ignore the physical properties of the events of interest.

Information gradient-based query mechanisms [8][31][16] exploit the diffusion patterns of events to obtain directionality towards the source(s) or node(s) that satisfy given query parameters. However, when there are multiple events, sources can be sparsely located and create non-overlapping information gradient regions.
Thus, existing information gradient-based approaches are unable to explore the gradients of all sources, and query processing may produce only partial results.

In this study, we propose a novel framework, Probe-before-Spray (PBS), that exploits the diffusion property to form a virtual grid-based architecture for processing information gradient-based queries. Using geographical information and the geometry of the event’s effect, the querier (i.e., the sink) establishes a virtual grid in the sensor field and initiates the query in each grid cell. The PBS grid uses the geometry of the event’s effect to limit the search scope, which reduces search overhead. The cell size can vary with the query type. PBS also uses probing to detect the occurrence or existence of events, which reduces search overhead, especially in regions without events. Further, PBS overcomes the limitation of existing information gradient-based approaches and explores the information gradients of sparsely located sources. Unlike existing grid-based architectures, PBS uses grid cells that are resizable and can vary in size, and the querier establishes the grid on demand. Based on the PBS architecture, we have designed algorithms for the basic aggregate queries (count, sum, average, max and min). We have also designed an algorithm for combined queries, which join multiple sub-queries with the conjunction operator.

In this study, we focus on events whose effect diffuses after they occur. We assume that sensor nodes can detect the changes caused by events; several recent works ([39, 22]) justify this assumption. Initially, we also assume that the region surrounding an event source is free of obstacles, so the information gradient of the event can diffuse. This assumption can be relaxed by using local collaboration to detect such obstacles.
Finally, throughout this study, we use an approximate geometry of the event’s effect based on empirical results and known diffusion laws. This geometry may change with environmental conditions. Precise computation of the geometry of the event’s effect is beyond the scope of this work. However, for performance evaluation, we model a distortion of diffusion that captures this effect to some extent.

5.1 Overview of PBS Architecture

The proposed virtual grid-based active querying architecture, PBS, rests on two foundations: (1) the geometry of the event’s effect, and (2) an underlying geographic routing scheme.

An event source with a specific magnitude (say, $X$) diffuses its effect in a sensor field. Depending on the sensitivity of the embedded sensors, this effect can be detected up to a certain area. At the periphery, the recorded magnitude of the event’s effect is much lower than $X$. However, even this small magnitude indicates that the required source(s) may exist within that area, which we call “the geometry of event’s effect”. Based on this key idea, a sensor node $S$ with minimum sensitivity $m$ (where $m < X$) can establish a virtual circular contour $C$ within which it can detect the presence of a source of magnitude $X$. Besides known diffusion laws, $C$ can also be determined from empirical data or local collaboration.

Figure 5.1 illustrates virtual grid formation. Let $d$ be the distance between the source and node $S$. According to the geometry of the event’s effect, the radius of contour $C$ is $d$. To estimate $d$ conservatively and to avoid the gaps or overlaps that circular regions would create, we use the inner square of $C$, denoted $C_{is}$. Thus, node $S$ can detect the presence of a source of magnitude $X$ within $C_{is}$. Each side of $C_{is}$ has length $d\sqrt{2}$.
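A short Python sketch can check the cell geometry above. Given $S = (x, y)$ and the detection radius $d$, it returns the two corners of the inner square $C_{is}$, labelled $(x_1, y_1)$ and $(x_3, y_3)$ in Figure 5.1, together with the side length $d\sqrt{2}$. The function name is hypothetical, and $d$ is taken as given; its estimation is the subject of Section 5.2.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def inner_square(x: float, y: float, d: float) -> Tuple[Point, Point, float]:
    """Corners (x1, y1), (x3, y3) and side of C_is for a contour C of radius d centred at S(x, y)."""
    if d <= 0:
        raise ValueError("radius d must be positive")
    half = d / math.sqrt(2)              # half of the side d * sqrt(2)
    top_right = (x + half, y + half)     # (x1, y1) in Figure 5.1
    bottom_left = (x - half, y - half)   # (x3, y3) in Figure 5.1
    return top_right, bottom_left, d * math.sqrt(2)


if __name__ == "__main__":
    (x1, y1), (x3, y3), side = inner_square(0.0, 0.0, 10.0)
    print(side, side ** 2)        # side d*sqrt(2), area 2*d^2
    print(math.hypot(x1, y1))     # each corner lies on the contour C
```

The squared side equals $2d^2$, and each corner lies exactly on $C$. This is why $C_{is}$ is the largest square cell that fits entirely inside the contour.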
In a two-dimensional sensor field, PBS uses $C_{is}$ as the grid cell and divides the region specified by the query parameters into grid cells, as shown in Figure 5.1. Depending on the query type, the cells can have equal sizes (e.g., count, sum, average) or variable sizes (e.g., max, min, combined).

Figure 5.1: Virtual grid of PBS with virtual queriers, $Q_v$. If $S$ is located at $(x, y)$, then $(x_1, y_1)$ and $(x_3, y_3)$ are $(x + d/\sqrt{2},\ y + d/\sqrt{2})$ and $(x - d/\sqrt{2},\ y - d/\sqrt{2})$, respectively.

For each cell, the node closest to the center of the cell is designated as the virtual querier, $Q_v$. The virtual queriers initiate the query in their grid cells on behalf of the querier. To initiate the query in a cell, the corresponding virtual querier $Q_v$ performs the following two tasks.

1) Information probing: $Q_v$ uses a probing phase to determine whether an information gradient exists in the cell. To improve the quality of probing, $Q_v$ also collects data about the required information gradient from its one-hop neighbors.

2) Query spray, i.e., dissemination: If information probing finds the required information gradient, $Q_v$ disseminates the query either by scoped flooding or by an information gradient-based dissemination mechanism such as RUGGED [16]. RUGGED uses braided multiple-path exploration and controls the instantiation of paths with a probabilistic function. Within the information gradient region, a node forwards the query greedily towards the region with the required level of information gradient (as specified by the query parameters). There, nodes use scoped flooding to find all nodes that satisfy the query. The query parameter(s) and the boundary of the grid cell limit the scope of flooding.

If probing does not find the required information gradient in the cell, $Q_v$ simply forwards the query to the $Q_v$ of the next cell.
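The Python sketch below shows grid construction and virtual-querier selection. It assumes a rectangular field with its origin at $(0, 0)$, square cells of a given side (for example $d\sqrt{2}$), and known node positions. As in the text, $Q_v$ is the node closest to the cell center. Here the search runs over all nodes, which matches what geographic delivery to the center’s location would return; as a result, a sparsely populated cell may get a $Q_v$ located just outside it. The helper names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]
Cell = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def build_grid(width: float, height: float, side: float) -> List[Cell]:
    """Tile a width x height field with square cells; cells on the far edges are clipped."""
    if side <= 0:
        raise ValueError("cell side must be positive")
    cells = []
    for r in range(math.ceil(height / side)):
        for c in range(math.ceil(width / side)):
            x0, y0 = c * side, r * side
            cells.append((x0, y0, min(x0 + side, width), min(y0 + side, height)))
    return cells


def virtual_queriers(cells: Sequence[Cell], nodes: Sequence[Point]) -> Dict[int, Point]:
    """Map each cell index to its virtual querier Q_v: the node closest to the cell centre.

    The search runs over all nodes (what geographic delivery to the centre would reach),
    so a nearly empty cell may be served by a node just outside it.
    """
    queriers = {}
    for i, (x0, y0, x1, y1) in enumerate(cells):
        centre = ((x0 + x1) / 2, (y0 + y1) / 2)
        queriers[i] = min(nodes, key=lambda n: math.dist(n, centre))
    return queriers


if __name__ == "__main__":
    cells = build_grid(10.0, 10.0, 5.0)
    print(virtual_queriers(cells, [(1.0, 1.0), (9.0, 9.0), (4.0, 5.0)]))
```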
In addition to disseminating the query within grid cells, the proposed mechanism uses a geographic routing protocol, GPSR [27], to route the query among the $Q_v$s and to return the reply. GPSR [27] is an existing protocol for delivering a packet or query to a node at a specified location. It has two modes: (1) greedy-mode forwarding, and (2) perimeter-mode traversal. In greedy mode, when a node receives a packet destined for location $(x, y)$, it forwards the packet to the neighbor closest to $(x, y)$. If no such neighbor exists, for example because of a void in the network, the node switches to perimeter-mode traversal, which uses the right-hand rule to route around the void.

The PBS architecture described above rests on three assumptions. First, all nodes know the geometry of the event’s effect, which depends on the type of application and the sensing modality. As mentioned earlier, this study uses an approximate geometry based on known diffusion laws (e.g., for light or temperature), empirical results or local collaboration. Section 5.2 details the approximate estimation of $d$. Second, all nodes know the approximate geographical perimeter of the network, which may be configured at deployment time or obtained with a simple discovery protocol. Finally, node locations can be determined with existing localization protocols.

Although the basic idea of PBS is simple, the main challenge is to design energy-efficient query processing algorithms for the various query types. We develop algorithms for the aggregate queries (count, sum, average, max, min) and for combined queries on top of the PBS architecture. The following sections describe the approximate estimation of ‘d’ and the query processing algorithms.
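The greedy mode of GPSR described in Section 5.1 can be sketched in a few lines of Python. This is a simplified illustration, not the GPSR implementation. It assumes each node knows its neighbors’ coordinates and uses hypothetical function names. At a local maximum the sketch simply stops and returns None; real GPSR would switch to perimeter-mode traversal with the right-hand rule, which is omitted here.

```python
import math
from typing import Callable, List, Optional, Sequence, Tuple

Point = Tuple[float, float]


def greedy_next_hop(current: Point, neighbors: Sequence[Point], dest: Point) -> Optional[Point]:
    """One greedy-mode step: the neighbour closest to `dest`, if strictly closer than `current`.

    Returns None at a local maximum (e.g., next to a void); real GPSR would then
    switch to perimeter mode (right-hand rule), which this sketch does not implement.
    """
    best = min(neighbors, key=lambda n: math.dist(n, dest), default=None)
    if best is None or math.dist(best, dest) >= math.dist(current, dest):
        return None
    return best


def greedy_route(start: Point, dest: Point,
                 neighbors_of: Callable[[Point], Sequence[Point]],
                 max_hops: int = 1000) -> List[Point]:
    """Follow greedy steps until `dest` is reached or greedy forwarding gets stuck."""
    path = [start]
    while path[-1] != dest and len(path) <= max_hops:
        nxt = greedy_next_hop(path[-1], neighbors_of(path[-1]), dest)
        if nxt is None:
            break  # stuck: perimeter mode would take over here
        path.append(nxt)
    return path
```

Because each greedy step strictly reduces the distance to the destination, the route cannot loop. The `max_hops` bound is only a safety guard.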
5.2 Approximate Estimation of ‘d’

Before describing how we estimate the approximate value of ‘d’, we present empirical results showing that an event’s effect follows a diffusion law in a real environment.

For empirical data collection, we measure light diffusion in two settings, both with ambient light present: an empty room (minimal surface reflection) and an office room (moderate surface reflection). We use a high-precision digital light meter (EXTECH, model 401025) and omni-directional light sources of different magnitudes, and we measure the change in light intensity caused by each source. In both scenarios, we observe a similar light diffusion pattern with diffusion parameter $\alpha = 2$, as shown in Figure 5.2. The same light sources are used in both rooms. However, the dark surfaces of the office room absorb some of the light, so the measured intensity in the office room is slightly lower than in the empty room. We also observe that the signals of multiple non-coherent sources (e.g., light sources) add up at each point where diffusion regions overlap. We use this empirical data set to emulate event sources in the simulations that evaluate the PBS architecture and the proposed algorithms.

Figure 5.2: Light diffusion patterns in two different environments (light intensity in foot-candles versus distance $d$ from the source in feet). Squares and triangles represent the measured data, and curve fitting is used to determine the diffusion equations. The fitted curves are $f_{es}(d) = 43.584/d^2$ (empty room, strong light source), $f_{os}(d) = 37.166/d^2$ (office room, strong light source), $f_{ew}(d) = 21.841/d^2$ (empty room, weak light source) and $f_{ow}(d) = 20.688/d^2$ (office room, weak light source).

To estimate the approximate value of ‘d’, let $m$ be the minimum change-detection sensitivity of a sensor node for the event of interest.
Also, assume that the event’s effect follows a diffusion law with diffusion parameter $\alpha$. Now consider a query to find node(s) with magnitude $X$. If a sensor node can measure an effect of magnitude $m$ at distance $d$ from the source of interest, then $d$ can be expressed as

$$d = \sqrt[\alpha]{\frac{X}{m}}. \qquad (5.1)$$

The value of $\alpha$ may change with environmental conditions and the elasticity of the medium. Thus, the above equation gives only an approximate geometry of the event’s effect.

5.3 Query Processing Algorithm

In this section, we describe new query processing algorithms that use the PBS architecture.

5.3.1 Aggregate query - Count, Sum and Average

The aggregate query Count counts the sensor nodes in a network that satisfy the given query parameters. An example is how many nodes in a sensor field record a temperature increase of 200°F or more due to a fire or a similar event. Besides the geographical scope, the query parameters also specify the event type and the magnitude(s) of interest, either of the source(s) or of the event’s effect (e.g., temperature, light). Existing approaches first flood the query within the specified geographic scope and then use in-network aggregation for counting. When a node and/or its descendants satisfy the query, the node reports the accumulated count to its parent node. The initial flooding causes significant energy dissipation. Our PBS-based Count algorithm instead uses the geometry of the event’s effect that corresponds to the query parameter. This reduces the search overhead in regions where the required event source(s) are not present.

Sum and Average queries are similar to the Count query. In addition to counting the nodes whose readings satisfy the query parameter(s), they also aggregate those readings. In this work, we describe only the Count algorithm.

Consider a query about an event of type $E$ with magnitude $X$ or more.
First, estimate the approximate geometry of an event with magnitude $X$ using Equation (5.1). Here, we assume an obstacle-free environment for the diffusion of the event’s effect; this assumption is relaxed later. Assume that the event’s effect diffuses up to a distance $d_x$, beyond which its magnitude drops below the minimum sensitivity $m$ of the sensor nodes. The steps of the Count algorithm are as follows:

1) Establish a virtual grid of cells. Every cell except the edge cells has area $d_x\sqrt{2} \times d_x\sqrt{2}$.

2) Each virtual querier $Q_v$ uses information probing, as described in Section 5.1, to determine whether an information gradient exists within its cell.
(a) If probing finds an information gradient in the cell, $Q_v$ disseminates the query in the cell, as described in Section 5.1, to find source(s) with magnitude $X$ or more.
(b) Otherwise, $Q_v$ skips query dissemination in that cell.
PBS repeats this step for all remaining cells in the sensor field.

Step (2a) targets only sources with magnitude $X$ or more. However, a source with magnitude below $X$ that is close to $Q_v$ may produce enough information gradient for probing to succeed, and $Q_v$ then triggers query dissemination in that cell. In this case, dissemination by scoped flooding causes some extra overhead. Information gradient-based dissemination, by contrast, stops forwarding the query after a few steps, since it cannot reach an information level of $X$.

If a cell contains obstacles, the information gradient pattern may differ among nearby nodes. Through local collaboration, nodes can divide the corresponding grid cell(s) so that each portion of the cell has a proper diffusion pattern.

Figure 5.3: $E_1$, $E_2$ and $E_3$ are three events of the same type. The magnitudes of $E_1$ and $E_2$ are $X$ or more, while $E_3$ is much smaller. Small dots represent sensor nodes.

5.3.2 Aggregate query - Max and Min

In a sensor field, the aggregate query Max finds the node that records the maximum magnitude of an event’s effect. Existing approaches collect data from all nodes and identify the maximum at the root node, i.e., the sink. To reduce data flow, intermediate nodes suppress non-promising responses. However, flooding-based query dissemination and collecting replies through a tree (based on parent-child relationships) still cause significant transmission overhead. Using the PBS architecture, we develop a new Max algorithm that significantly reduces energy overhead in most cases.

Figure 5.4: $E_1$, $E_2$ and $E_3$ are three events of the same type, where $E_3 > E_2 > E_1$. The cell labels a, b, c, ..., o indicate the order of visits. Scoped flooding is used in cell a and then in cell b. An information gradient is perceived in cell b, which determines the size of cell c. Cells e and g contain more powerful event sources, so the cell size is increased. For cells h, i and j, the centers have already been visited, so $Q_v$ is moved diagonally to the first unvisited node in each of those cells. The result of the Max query is the magnitude of $E_3$.

Consider a query to find the maximum magnitude, say $M$, of an event of type $E$. Assume $M_x$ is the maximum sensing limit of a node for events of type $E$. The steps to determine $M$ are as follows:

1) Determine the initial value of $M$ using scoped flooding within the virtual grid cell that corresponds to $M_x$. By Equation (5.1), this cell has area $d_x\sqrt{2} \times d_x\sqrt{2}$. Let $M_1$ be the maximum information gradient within the cell, where $M_1 \le M_x$; the initial value of $M$ is then $M_1$. Now assume that, by Equation (5.1), the virtual grid cell for the current value of $M$ (i.e., $M_1$) has area $d_1\sqrt{2} \times d_1\sqrt{2}$, where $d_1 \le d_x$.
This initialization step continues until some required information gradient is perceived.

2) This step is similar to step (2) of the Count algorithm, except that the current value of $M$ determines the current cell area. $M$ never decreases, so neither does the cell area corresponding to $M$. Depending on the result of information probing, $Q_v$ has two choices:
(a) If $Q_v$ perceives an information gradient that, by Equation (5.1), is higher than the current value of $M$ within the current cell, $Q_v$ disseminates the query. Using scoped flooding or an information gradient-based protocol such as RUGGED [16], $Q_v$ finds the maximum value within the current cell, say $M_c$. $M$ is then updated as $M = \max(M, M_c)$, which sets $M$ to $M_c$ if $M < M_c$.
(b) Otherwise, $Q_v$ skips query dissemination within the current cell.
These steps continue until the whole sensor field is covered.

In this algorithm, query dissemination between the $Q_v$s is not simple, because cell areas vary. The algorithm scans the sensor field horizontally in alternating directions (left to right, then right to left, and so on), and cell areas may change during the scan. To avoid gaps between the cells of two consecutive horizontal scans, the smallest cell of the most recently completed scan determines where the next scan starts, as shown in Figure 5.4. This creates some overlapping cells. As a result, the center node of a cell, its potential virtual querier, may already have been visited during the previous scan. In that case, the query is forwarded diagonally away from the center within the cell until an unvisited node is found, and that node becomes the cell’s virtual querier $Q_v$. Finally, if no source exists in the sensor field, the algorithm reduces to multiple scoped floodings in different parts of the field that together cover all nodes.
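The per-cell logic of the Count and Max algorithms can be sketched in Python as follows. The cell side comes from Equation (5.1), under the assumption that the effect decays as $X/d^{\alpha}$, the form fitted in Figure 5.2 with $\alpha = 2$. `probe` and `spray` are hypothetical callbacks standing in for the information-probing phase and the query spray (scoped flooding or a gradient-based protocol such as RUGGED). For Max, only the update rule is shown; the alternating scan and variable cell sizes are not modeled.

```python
import math
from typing import Callable, Hashable, Iterable, List


def detection_radius(X: float, m: float, alpha: float) -> float:
    """Eq. (5.1): d = (X / m) ** (1 / alpha), assuming the effect decays as X / d**alpha."""
    if X <= 0 or m <= 0 or alpha <= 0:
        raise ValueError("X, m and alpha must be positive")
    return (X / m) ** (1.0 / alpha)


def cell_side(X: float, m: float, alpha: float) -> float:
    """Step 1 of Count: side of the inner-square cell, d_x * sqrt(2)."""
    return detection_radius(X, m, alpha) * math.sqrt(2)


def pbs_count(cells: Iterable[Hashable],
              probe: Callable[[Hashable], bool],
              spray: Callable[[Hashable], List[float]],
              X: float) -> int:
    """Step 2 of Count: probe every cell and spray the query only where a gradient is seen."""
    count = 0
    for cell in cells:
        if not probe(cell):
            continue                  # 2(b): no information gradient, skip dissemination
        readings = spray(cell)        # 2(a): readings collected inside this cell
        count += sum(1 for r in readings if r >= X)
    return count


def update_max(M: float, M_c: float) -> float:
    """Step 2(a) of Max: M = max(M, M_c), so the running maximum never decreases."""
    return max(M, M_c)
```

For example, $X = 100$, $m = 1$ and $\alpha = 2$ give $d = 10$ and a cell side of $10\sqrt{2}$. A cell whose probe fails costs only the probe, which is where PBS saves energy over flooding.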
Steps similar to those of the above algorithm can also be used to design an algorithm that finds the event source with the minimum magnitude.

5.3.3 Combined Query

A combined query consists of several sub-queries joined by the conjunction operator. In a multi-modal sensor field, the sub-queries may concern different types of events with different magnitudes, and the corresponding diffusion patterns may follow different diffusion laws.

Consider a combined query consisting of $n$ sub-queries about $n$ different types of events, say $E_1, E_2, \ldots, E_n$, with magnitudes $X_1, X_2, \ldots, X_n$. Let $A_1, A_2, \ldots, A_n$ be the areas of the virtual cells corresponding to $X_1, X_2, \ldots, X_n$, where $A_i = 2d_{x_i}^2$ for $i = 1, 2, \ldots, n$ according to Equation (5.1). The set of possible cell areas is therefore $A = \{A_1, A_2, \ldots, A_n\}$. Using the PBS architecture, the combined query processing algorithm proceeds as follows:

1) Set the current cell area to $\min(A)$ and initiate information probing. This cell area allows $Q_v$ to perceive the information gradient of any remaining event of interest.

2) Depending on the result of probing, $Q_v$ chooses one of the following steps:
(a) If probing finds an information gradient, $Q_v$ disseminates the query within the cell to find node(s) that solve some of the unsolved sub-queries.
(b) Otherwise, $Q_v$ skips query dissemination in that cell.

3) Rebuild the set $A$ of possible cell areas from the remaining sub-queries. If $A = \emptyset$, the query has succeeded, and the reply is sent to the querier. If $A \neq \emptyset$ and the whole sensor field has been visited, the query has failed. If $A \neq \emptyset$ and the sensor field has not been fully visited, continue from step (1).

Cell areas are variable here as well, so query dissemination between the $Q_v$s works as in the Max algorithm described in Section 5.3.2.
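The control flow of steps 1 to 3 can be sketched in Python as follows. Sub-queries are keyed by hypothetical names and mapped to their cell areas $A_i$. `next_cell`, `probe` and `spray` are hypothetical callbacks standing in for, respectively:

- the variable-size scan (as in the Max algorithm);
- information probing;
- query spray.

`spray` is assumed to return the names of the sub-queries solved in that cell.

```python
from typing import Callable, Dict, Hashable, Iterable, Optional


def combined_query(areas: Dict[str, float],
                   next_cell: Callable[[float], Optional[Hashable]],
                   probe: Callable[[Hashable, Dict[str, float]], bool],
                   spray: Callable[[Hashable, Dict[str, float]], Iterable[str]]) -> bool:
    """Steps 1-3 of the combined query; returns True if every sub-query is solved."""
    remaining = dict(areas)  # unsolved sub-query -> cell area A_i = 2 * d_{x_i}^2
    while remaining:                       # A is not empty
        area = min(remaining.values())     # step 1: current cell area = min(A)
        cell = next_cell(area)
        if cell is None:                   # field fully visited while A is not empty
            return False
        if probe(cell, remaining):         # step 2(a); otherwise 2(b) skips the cell
            for name in spray(cell, remaining):
                remaining.pop(name, None)  # step 3: rebuild A from unsolved sub-queries
    return True                            # A is empty: reply to the querier
```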
5.4 Analysis of the Algorithm

In this section, we present a simple analysis that highlights the energy efficiency of the PBS architecture for the query processing algorithms described in Section 5.3.

5.4.1 Assumptions

Consider a rectangular 2-D sensor field with $N$ uniformly distributed nodes and an average neighborhood size of $n_b$. The energy overhead of information probing per virtual querier $Q_v$ is then $C_p = n_b + 1$: $Q_v$ broadcasts the probe once, and each of its neighbors replies. Collecting information about the required event from the neighbors, in addition to $Q_v$ itself, reduces the effect of environmental noise as well as the distortion of diffusion.

5.4.2 Count Query

Consider a Count query about an event of type $E$ with magnitude $X$ or more. Use Equation (5.1) to find the cell area, and assume that the sensor field contains $x$ cells. For simplicity of analysis, assume that all cells, including edge cells, have equal area. Each cell then has $n_c = N/x$ nodes, with $\sqrt{n_c}$ nodes along each side. Let $m$ be the number of sources in the sensor field with magnitude $X$ or more. For simplicity, assume that all sensor nodes reading $X$ or more for a given event lie in the same cell. If $m = 1$, the probability that a given cell does not contain the source is $(1 - 1/x)$. For $m$ sources, the probability of finding at least one source in a cell is therefore

$$P_e = 1 - \left(1 - \frac{1}{x}\right)^m \approx 1 - e^{-m/x}.$$

Here, $(1 - 1/x)^m$ is the probability that the cell contains no source. To obtain the worst-case energy overhead, assume that each $Q_v$ uses scoped flooding to find the node(s) in its cell with readings of $X$ or more. Since nodes are uniformly distributed, the query spray (i.e., dissemination) overhead per cell is $C_s = n_c = N/x$. The energy overhead per cell is therefore

$$T_c = P_e\,(C_p + C_s) + (1 - P_e)\,C_p.$$
The first term is the overhead when the cell contains source(s), and the second term is the overhead when it contains none.

The $Q_v$s route the query between themselves using a geographic (multi-hop) routing protocol. Since all cells are identical squares, the distance between two consecutive $Q_v$s is approximately the length of a cell side. The number of transmissions required to route the query among the $Q_v$s is therefore

$$T_{gr} = (x - 1)\sqrt{n_c} = (x - 1)\sqrt{\frac{N}{x}}.$$

The total energy overhead of processing the Count query is then

$$T = x\,T_c + T_{gr} \approx x(n_b + 1) + N\left(1 - e^{-m/x}\right) + (x - 1)\sqrt{\frac{N}{x}}. \qquad (5.2)$$

This expression captures the impact of both the number of sources in the sensor field and the number of cells, which depends on the query parameter.

Consider a sensor field of $N = 1000$ sensor nodes with average neighborhood size $n_b = 6$. Figure 5.5(a) shows the query processing overhead for this field as the number of cells varies from 1 to 100 and the number of sources from 1 to 20. For a fixed number of sources, the overhead initially decreases as the number of cells increases (i.e., when querying for smaller values), because the query spray overhead is higher for larger cells. As the number of cells increases further, the information probing overhead grows, but at a slower rate.

Figure 5.5: Count query processing overhead for various numbers of sources and cells in the sensor field. (a) Overhead (transmissions) for various numbers of sources and grid cells. (b) Optimal number of cells (shown by boxes) and the corresponding overhead (shown by dots) for each number of sources.
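Equation (5.2) is easy to evaluate numerically. The Python sketch below computes $T$ for given $N$, $n_b$, $x$ and $m$. It can optionally use the exact $P_e = 1 - (1 - 1/x)^m$ instead of the exponential approximation. It also finds the overhead-minimizing number of cells by a plain search over $x$ from 1 to 100, using the $N = 1000$, $n_b = 6$ setting from the text. The sketch evaluates only the analytical model and does not reproduce the simulations. The function names are hypothetical.

```python
import math


def count_overhead(N: int, n_b: int, x: int, m: int, exact: bool = False) -> float:
    """Eq. (5.2): T = x(n_b + 1) + N * P_e + (x - 1) * sqrt(N / x).

    P_e = 1 - (1 - 1/x)^m when exact=True, else the approximation 1 - exp(-m/x).
    """
    if x < 1 or m < 0:
        raise ValueError("need x >= 1 and m >= 0")
    p_e = 1 - (1 - 1 / x) ** m if exact else 1 - math.exp(-m / x)
    probing = x * (n_b + 1)                # one probe of n_b + 1 transmissions per cell
    spray = N * p_e                        # x cells * P_e * (N / x) nodes flooded per cell
    routing = (x - 1) * math.sqrt(N / x)   # geographic routing between consecutive Q_v's
    return probing + spray + routing


def optimal_cells(N: int, n_b: int, m: int, x_max: int = 100) -> int:
    """Number of cells in 1..x_max that minimises Eq. (5.2)."""
    return min(range(1, x_max + 1), key=lambda x: count_overhead(N, n_b, x, m))


if __name__ == "__main__":
    N, n_b = 1000, 6   # setting used for Figure 5.5
    for m in (1, 5, 10, 20):
        x_opt = optimal_cells(N, n_b, m)
        print(m, x_opt, round(count_overhead(N, n_b, x_opt, m), 1))
```

The three terms in the code correspond to probing, spray and inter-$Q_v$ routing. This makes the trade-off behind Figure 5.5 explicit: more cells shrink the spray term but add probing and routing cost.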
The minimum overhead and the corresponding optimal number of cells are shown in Figure 5.5(b) for different numbers of sources in the sensor field.

5.4.3 Max Query

In the absence of events in the sensor field, the Max query algorithm performs multiple scoped floodings, as discussed in Section 5.3.2. Taking $M_x$ as the maximum sensing limit and using Equation (5.1) to find the size of a grid cell, assume that there are $x$ cells within the sensor field. The overhead of the Max query-processing algorithm is then

$$T_{noevent} = N + (x - 1)\sqrt{\frac{N}{x}},$$

where the first term is the combined flooding overhead and the second term is the query-routing overhead between the $Q_v$'s. This overhead is larger than that of the usual flooding-based approach.

The overhead of the algorithm increases further if an information gradient is found during initialization and later steps while the current maximum, $M$, increases at every step of query processing, since this causes probing overhead in smaller cells in addition to the query dissemination overhead. However, due to the horizontal scans of the algorithm, this scenario is very unlikely to occur.

5.4.4 Combined Query

Consider a Combined query with $n$ sub-queries about $n$ different types of events, $E_1, E_2, \ldots, E_n$, with magnitudes $X_1, X_2, \ldots, X_n$, respectively. For simplicity of analysis, assume that the cells corresponding to $X_1, X_2, \ldots, X_n$ have the same area and that there are $x$ cells within the sensor field. According to Equation (5.1), if $\alpha_i \neq \alpha_j$ for $i \neq j$, then $X_i \neq X_j$ for events $E_i$ and $E_j$. Let $m_1, m_2, \ldots, m_n$ be the numbers of events of types $E_1, E_2, \ldots, E_n$ with the required magnitudes. Assuming that the events are independent and uniformly distributed in the sensor field, the probability of finding all events in a cell is

$$p = p_1 p_2 \cdots p_n = \prod_{i=1}^{n}\left(1 - e^{-m_i/x}\right),$$

where $p_i = 1 - (1 - 1/x)^{m_i} \approx 1 - e^{-m_i/x}$ is the probability that event $E_i$ is available in a cell. Note that $p$ changes (i.e., increases) after each event is found, due to probing and spray in cells.
Thus, the average overhead of information probing can be expressed as

$$T_{p,avg} \le (n_b + 1)\,\frac{1}{p} = \frac{n_b + 1}{\prod_{i=1}^{n}\left(1 - e^{-m_i/x}\right)},$$

where $(n_b + 1)$ is the overhead of each probe and $1/p$ is the expected number of cells that must be probed. In the worst case, all cells must be probed, so the worst-case overhead of information probing is $T_{p,w} = (n_b + 1)\,x$.

For query spray (i.e., dissemination), the actual overhead depends on the locations of the events in the sensor field, i.e., on the cells in which they fall. Suppose query spray is used in all $1/p$ probed cells. Then, using scoped flooding for query spray, the average-case overhead of query spray can be expressed as

$$T_{s,avg} \le \frac{1}{p}\cdot\frac{N}{x},$$

since the sensor field has $N$ nodes and $x$ cells. Therefore, the total average-case overhead of the PBS architecture for processing a Combined query can be expressed as

$$T_{avg} \le \frac{n_b + 1}{\prod_{i=1}^{n}\left(1 - e^{-m_i/x}\right)} + \frac{1}{p}\cdot\frac{N}{x} + (x - 1)\sqrt{\frac{N}{x}},$$

where the third term is the overhead of geographic routing between the $Q_v$'s, as in Section 5.4.2. In the worst case, the $n$ events are located in different cells, so with scoped flooding the worst-case overhead of query spray is $T_{s,w} = n\,N/x$. Therefore, assuming the overhead of the flooding-based approach is $N$, the PBS architecture is more energy-efficient than the flooding-based approach for Combined query processing, even in the worst case, if

$$(n_b + 1)\,x + n\,\frac{N}{x} + (x - 1)\sqrt{\frac{N}{x}} \le N.$$

5.5 Simulations and Performance

We evaluate the performance of the PBS architecture for the proposed query-processing algorithms through extensive simulations, using the following performance metrics:
1) Overhead, in terms of energy dissipation, is the average number of transmissions required to process a query.
2) Success ratio is the ratio of the value obtained through the query algorithm to the actual value. This metric is used for Count and Max queries.
3) Absolute success probability is the fraction of all queries for which the obtained value equals the actual value.
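The three metrics can be computed from per-query simulation records as sketched below. This is an illustration under stated assumptions, not the simulator used in the thesis: the `QueryRecord` layout is hypothetical, the success ratio is assumed to be averaged per query (the figures plot an "Avg. Success Ratio"), and queries whose actual value is 0 are assumed to be excluded from it because the ratio is undefined there.

```python
import math
from dataclasses import dataclass


@dataclass
class QueryRecord:
    """Outcome of one simulated query (hypothetical record layout)."""
    obtained: float      # value returned by the PBS query algorithm
    actual: float        # ground-truth value in the simulated field
    transmissions: int   # transmissions spent processing the query


def overhead(records: list) -> float:
    """Metric 1: average number of transmissions per query."""
    return sum(r.transmissions for r in records) / len(records)


def success_ratio(records: list) -> float:
    """Metric 2 (Count and Max queries): mean of obtained/actual over the queries.

    Queries whose actual value is 0 are skipped, since the ratio is undefined there.
    """
    ratios = [r.obtained / r.actual for r in records if r.actual != 0]
    return sum(ratios) / len(ratios) if ratios else math.nan


def absolute_success_probability(records: list) -> float:
    """Metric 3: fraction of queries whose obtained value equals the actual value."""
    return sum(r.obtained == r.actual for r in records) / len(records)


if __name__ == "__main__":
    runs = [QueryRecord(3, 3, 120), QueryRecord(2, 3, 100), QueryRecord(4, 4, 140)]
    print(overhead(runs), success_ratio(runs), absolute_success_probability(runs))
```

Exact equality is appropriate for Metric 3 here because Count results are integers; for real-valued Max results a tolerance would likely be needed.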
5.5.1 Simulation Model

In our simulations, we use a 100 ft × 100 ft uniform random grid with $10^4$ sensor nodes placed 1 ft apart. Except for border nodes, each node can communicate with eight neighbors. For the simulations of Count and Max queries, we use the empirical data set (Section 5.2) to emulate event source(s), where the exponent of the diffusion function, α, equals 2.0. For the Combined query, we simulate five different types of events with α equal to 2.0, 1.9, 1.8, 1.7 and 1.6. To model the distortion of information diffusion, we use the Degree of Irregularity (DOI) and a Weibull distribution with shape parameter 1.13 and scale parameter 0.28, similar to [50]. Both actual event(s) and small noisy events are uniformly distributed in the sensor field; the small events cannot satisfy the queries. We also consider lossy wireless links, and ARQ is used only for information probing. For query spray, both scoped flooding and information gradient-based routing are used. Information gradient-based routing, as specified in [16], uses a probabilistic diffusion function for probabilistic forwarding to avoid local maxima. This function is an inverse function of hop count with exponent β, i.e., $p_j = f(j) = 1/j^{\beta}$, where $j$ is the hop count in the gradient region.

The performance of query processing using PBS depends on both information probing and query spray. For query spray, scoped flooding is more robust but causes more energy overhead than information gradient-based routing. Thus, the robustness and energy efficiency achieved using scoped flooding mainly reflect the effectiveness of probing.

5.5.2 Count Query

In our simulations, the success ratio of the Count query is over 99% using scoped flooding, as shown in Fig. 5.6(a). For small query values, probing may occasionally fail due to noise (i.e., higher DOI). The probing quality can be improved further by collecting information from neighbors more than one hop away from the virtual querier. Using information gradient-based routing, the success ratio drops for large query values and in the presence of more events: the cell area is large for large query values, so gradient-based routing may be unable to find all nodes that satisfy the query in the presence of noise and lossy wireless links. However, the success ratio can be improved further by using a smaller value of β, as shown in [16]. Similarly, Fig. 5.6(b) shows that the absolute success probability is high when scoped flooding is used for spray. In the absence of events, the normalized overhead (average transmissions per node) of PBS is only 20%, as shown in Fig. 5.7(a).

[Figure 5.6: Count query using scoped flooding (SF) and information gradient-based routing (IR) with β = 0.7, DOI = 0.05 and Pr(link loss) = 0.1, for 1 to 4 events and query values from 20 to 100. (a) Average success ratio vs. query value. (b) Absolute success probability vs. query value.]

[Figure 5.7: (Normalized) overhead of Count query, as average number of transmissions per node vs. query value, using scoped flooding (SF) and information gradient-based routing (IR) with β = 0.7, DOI = 0.05 and Pr(link loss) = 0.1. (a) No event. (b) One or more events.]
However, Fig. 5.7(b) shows that the overhead increases with the number of events, as more nodes satisfy the query and more transmissions are required to find them. With scoped flooding, in addition to flooding within a bounded region, query forwarding cannot be stopped when the probing result is a false positive, which causes more overhead.

5.5.3 Max Query

Fig. 5.8(a) shows that the success ratio of the Max query is over 99%, i.e., the obtained maximum is very close to the actual maximum in our simulations, even in the presence of lossy wireless links and distortion. The overhead of query processing decreases as the number of events increases, as shown in Fig. 5.8(b), because fewer scoped floodings are required to obtain the initial Max value. Also, the Max value becomes high at the early stages of query processing, so probing helps to avoid query spray and further reduces the overhead.

[Figure 5.8: Max query using scoped flooding (SF) and information gradient-based routing (IR) with β = 0.7 and Pr(link loss) = 0.1, for DOI from 0.00 to 0.04 and 1 to 5 events. (a) Average success ratio vs. number of events. (b) Overhead per node (average transmissions per node) vs. number of events.]

5.5.4 Combined Query

[Figure 5.9(a): Absolute success probability vs. number of sub-queries per query.]
[Figure 5.9: Combined query using scoped flooding (SF) and information gradient-based routing (IR) with β = 0.7, DOI = 0.05 and Pr(link loss) = 0.1, for 1 or 2 events per type and equal (Eq.), linearly increasing (L.inc.) or exponentially increasing (Exp.inc.) cell areas. (b) Overhead per node (average transmissions per node) vs. number of sub-queries per query.]

We consider three sets of Combined queries, where the cell areas corresponding to the sub-queries are (1) equal (i.e., 1 : 1 : 1 : ...), (2) linearly increasing (i.e., 1 : 2 : 3 : ...) and (3) exponentially increasing (i.e., 1 : 2 : 4 : ...). The absolute success probabilities in all cases are over 99%, as shown in Fig. 5.9(a), even in the presence of diffusion distortion and lossy wireless links. Also, the query-processing overhead using gradient-based routing is below 50%, as shown in Fig. 5.9(b). However, when the cell areas corresponding to the sub-queries increase exponentially, the overhead of using scoped flooding for query spray (i.e., dissemination) increases sharply due to the large cell area.

5.6 Conclusion

In this study, we have presented a novel architecture, Probe-before-Spray (PBS), for information gradient-based active query processing. PBS reduces search overhead by exploiting geographical information and the diffusion spread to form resizable virtual cells within a query-specified region. Based on PBS, we developed query-processing algorithms for aggregate queries and the Combined query. We analyzed the performance of PBS using simple analytical models. Through simulations, we found that the Count, Max and Combined query algorithms based on PBS reduce search overhead by over 40%, 30% and 50%, respectively, compared with the usual flooding-based approach, while attaining accuracy over 99%.
In addition, the proposed architecture can easily be combined with Directed Diffusion [25] and with TinyDB [35] or Cougar [46] to reduce flooding overhead for energy-efficient in-network query processing. Further, by treating each virtual querier as a cluster head, PBS can also be used in hierarchical sensor networks.

Chapter 6
TABS - Link Loss Tolerant Data Routing Protocol

Routing techniques for multi-hop wireless sensor networks typically choose the best sequence of nodes between the source and sink to forward packets. In previous routing approaches, each sender forwards a packet to a node, called the parent (of the sender), that is one hop closer to the sink. However, the low-cost, low-power radios used in wireless sensor nodes give rise to several problems, such as asymmetric, unidirectional and unreliable links. To address these problems, several techniques have been proposed, including blacklisting, link reliability metrics and neighbor discovery. Blacklisting limits the routing options and may cause network partitions. Link reliability metrics (e.g., ETX [9]) allow routing protocols to consider cumulative link reliability over paths to find the most reliable path between the source and sink. Neighbor discovery sidesteps unreliable links through periodic beacons. However, link quality as well as neighborhoods change over time due to environmental dynamics, transient noise or node failures. Thus, all such approaches require periodic link quality estimation or neighbor discovery, which causes extra overhead proportional to the size of the network. Moreover, the data rate in many (low-rate) sensor networks may not be high enough to amortize this extra overhead.

In contrast, by exploiting the wireless broadcast advantage [28], a packet can be sent through multiple relays to improve the reliability of packet delivery.
Also, forwarding a packet through multiple paths eliminates the need to avoid unreliable, asymmetric and unidirectional links; instead, this scheme exploits such links opportunistically when they are available. However, sending a packet through multiple relays is not energy-efficient when the wireless link quality is good enough to reach the sender's parent using traditional routing.

This chapter describes TABS (Try Ancestors Before Spreading), which combines the benefits of the wireless broadcast advantage with a traditional data routing protocol. TABS broadcasts each packet. After successfully decoding a packet, a node decides to broadcast it further if the node is either the parent of the sender or received the packet through an opportunistic link that lets the packet progress by at least a limit called the “minimum progress limit”. Unlike schemes that exploit the wireless broadcast advantage, TABS initially deters a sender from sending a packet through multiple relays by setting a high value for the “minimum progress limit”. Thus, TABS forwards a packet using traditional routing (based on the child-parent relationship) while also exploiting long opportunistic links when available.

The major challenge in realizing TABS is ensuring reliable data delivery without proactive link quality measurements. When a sender fails to receive an implicit or explicit acknowledgment (ACK) after broadcasting a packet, TABS considers the link between the sender and its parent to be of poor quality. After identifying a poor-quality link, TABS exploits some of the gains of the wireless broadcast advantage on low-power radios and gradually lowers the “minimum progress limit” for that packet before each retransmission. This permits the sender to forward the packet through multiple relays, which helps to improve the success probability of packet delivery. Another key challenge is to eliminate unnecessary paths. TABS uses implicit and explicit ACK mechanisms to suppress extra paths.
The current sender on a forwarding path stops broadcasting (or rebroadcasting) a packet when it overhears a broadcast (of the same packet or an explicit ACK) from a node that is closer to the sink.

TABS can be used to send reply or data packets to the sink (or querier). On-demand query applications usually disseminate a query by flooding (or a variant of flooding), which establishes a hop-count gradient that decreases towards the sink. TABS exploits this hop-count gradient to deliver a reply to the querier (i.e., the sink). For data collection applications based on a static routing tree, TABS can use the level of a node in the tree to forward data packets towards the root (i.e., the sink).

We evaluate the performance of a TABS implementation on a 56-node test-bed [1] at the University of Southern California, whose nodes are TelosB motes equipped with 802.15.4 radios. TABS shows a success rate of over 98% for delivering packets between all pairs of nodes. We also compare the performance of TABS with that of traditional routing, routing based on periodic link quality estimation, and routing that exploits the wireless broadcast advantage.

6.1 Motivation

Most existing routing protocols use proactive mechanisms, such as blacklisting or routing metrics, to find consistently high-reliability paths. In addition to the periodic overhead associated with these mechanisms, each has its own limitations or side effects. Routing protocols that use these mechanisms cannot adapt to network dynamics instantaneously and must wait for the next proactive link quality estimation phase. Also, blacklisting or selecting the best neighbors based on routing metrics may limit routing options, and in a sparse deployment blacklisting may cause network partitions. On the other hand, multi-path routing protocols based on the wireless broadcast advantage not only incur high transmission overhead but also cannot attain high reliability, due to the non-zero loss probability of wireless links.
Based on these limitations of existing mechanisms, the initial motivation of TABS is to avoid a proactive phase and to initiate recovery only when a loss occurs due to poor link quality. Thus, TABS can adapt instantaneously to network dynamics. Further, given the energy constraints of wireless sensor networks, TABS is designed to reduce the overhead of data routing while attaining a high packet delivery success probability to the sink. We identify the scope for reducing the overhead using the following simple overhead model of data routing.

Consider a wireless sensor network with $N$ nodes. Assume that during each periodic link quality estimation, each node transmits $x$ beacons, and that one transmission per node is required to broadcast the statistics of all incoming beacons. The overhead of each periodic link quality estimation is therefore $(x + 1)N$.

Now, consider a source with an unlimited number of data packets to send to the sink, where after every $t$ packets the network must re-estimate link quality using beacons. Let $T_{fwd}$ denote the number of transmissions required to deliver a packet from the source to the sink. The routing overhead per packet is then

$$T = \frac{(x + 1)N + t\,T_{fwd}}{t} = (x + 1)Nf + T_{fwd}, \qquad (6.1)$$

where $f = 1/t$ is the frequency of periodic link quality estimation. The term $(x + 1)Nf$ represents the periodic link quality estimation overhead per packet and depends on $N$ and $f$.

In Equation (6.1), with $f = 0$, a protocol can use multiple retransmissions per link without periodic link quality estimation. However, the packet delivery success probability of such a protocol is significantly low (below 70%, even at night, in our test-bed experiments), even with three retransmissions per link. Also with $f = 0$, a protocol can exploit the wireless broadcast advantage to improve the packet delivery success probability. However, $T_{fwd}$ is significantly higher for this scheme, since a packet is forwarded through multiple relays.
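Equation (6.1) can be evaluated directly to see how quickly the estimation term is amortized as $t$ grows. The sketch below is illustrative only: $N = 56$ matches the test-bed size, but the beacon count $x = 10$ and $T_{fwd} = 5$ are hypothetical values rather than measurements, and the function names are ours.

```python
def estimation_overhead_per_packet(N: int, x: int, f: float) -> float:
    """Per-packet share of periodic link quality estimation: the (x + 1)Nf term."""
    # Each estimation round costs x beacons plus one statistics broadcast per node,
    # i.e. (x + 1) * N transmissions; one round every t = 1/f packets amortizes it.
    return (x + 1) * N * f


def per_packet_overhead(N: int, x: int, f: float, T_fwd: float) -> float:
    """Routing overhead per packet, Eq. (6.1): T = (x + 1)Nf + T_fwd."""
    return estimation_overhead_per_packet(N, x, f) + T_fwd


if __name__ == "__main__":
    N, x = 56, 10      # 56 nodes as in the test-bed; x = 10 beacons is a hypothetical choice
    T_fwd = 5.0        # hypothetical transmissions per delivered packet
    for t in (10, 100, 1000):
        T = per_packet_overhead(N, x, 1.0 / t, T_fwd)
        print(f"re-estimate every {t:4d} packets -> {T:.2f} transmissions per packet")
```

The value returned by `estimation_overhead_per_packet` is also the number of extra forwarding transmissions per packet that a protocol without periodic estimation can spend before its total cost exceeds that of an estimation-based protocol with the same $f$.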
Let the $T_{fwd}$ of routing protocols with and without periodic link quality estimation be denoted by $T_{f,le}$ and $T_f$, respectively. Eliminating periodic link quality estimation is then practical only if a protocol attains a high packet delivery success probability and its per-packet overhead $T_f$ satisfies

$$T_f - T_{f,le} \le (x + 1)Nf. \qquad (6.2)$$

The design of TABS combines the benefits of both the wireless broadcast advantage and traditional data routing based on hop-by-hop retransmission, while eliminating periodic link quality estimation to reduce overhead and satisfy Equation (6.2).

6.2 Protocol Overview

A simplified version of TABS might work as follows. A source node has a packet to send to a sink node. The intermediate nodes between the source and sink also run this protocol, and each node maintains an estimate of the cost of sending a packet to the sink. To form this estimate, each node can store the hop count while the sink disseminates a query, or use a local exchange of cost information similar to a link-state routing protocol. The source broadcasts the packet, and depending on wireless link quality, some subset of nodes receives it. A receiving node broadcasts the packet further if it is either the parent of the sender or received the packet through an opportunistic link that lets the packet progress by at least the “minimum progress limit”. The broadcast of the packet also serves as an implicit ACK to all nodes whose estimated cost to the sink is higher than that of the current sender. This process continues until the sink receives the packet.

6.3 Protocol Design

The design of TABS faces three key challenges. First, each node requires a mechanism to control the “minimum progress limit” that governs the use of opportunistic links, and each node should compute the current value of this limit locally.

Second, TABS avoids proactive link quality estimation. Thus, a sender should initiate a recovery mechanism when the link quality between the sender and its parent is poor.
This is important to ensure that a packet progresses towards the sink without proactive link quality estimation.

Third, there is a penalty for using opportunistic links and the wireless broadcast advantage, especially in a large, dense network: a large number of alternative routes may be initiated, which increases total overhead. Thus, TABS needs to suppress unnecessary and unpromising paths.

6.3.1 Node State

Each node maintains a sequence number and a state for each active packet. Each packet in the network has a unique sequence number, which prevents duplicate packets. The state of a node for a given packet can be PKT_RCVD, PKT_FWDED or PKT_FWD_ACKED. After receiving a new packet, the node stores the packet's sequence number and marks the corresponding state as PKT_RCVD. After the broadcast of the packet, the state changes to PKT_FWDED. If the node receives an implicit or explicit ACK for that packet before or after the broadcast, the state changes to PKT_FWD_ACKED.

Each node also maintains its cost-to-sink information (usually hop count) and its parent node id. If an application uses a dynamic topology, each sequence number is associated with the corresponding cost-to-sink information and parent node id. For an application that uses a static topology, a single pair of cost-to-sink information and parent node id is used for all sequence numbers. Further, each node buffers the packet temporarily, since a retransmission of the packet may be required.

After the broadcast of a packet, a node associates a retransmission timer with the packet's sequence number. If the timer expires before the node receives an ACK for the packet, the node rebroadcasts the packet.

6.3.2 Packet Format

A TABS packet has a few extra fields in addition to the usual fields for payload, data length and checksum. Each packet has a field, SeqNo, holding a unique sequence number to identify duplicate and old packets.
Two fields, SenderAddr and DestAddr, identify the sender and destination of the packet, respectively. The cost to the sink from the sender is indicated by a field, Cost. RetryCount indicates the number of retransmissions by the sender. Finally, DataOrigin indicates the source node of the data (or reply).

6.3.3 Packet Forwarding

A source prepares a data (or reply) packet with the available data (or the reply to a query) and sets the values of SeqNo, Cost (to the sink from the source) and DataOrigin; here, DataOrigin is the source itself. The source also uses its own address and its parent's address to set SenderAddr and DestAddr, respectively. RetryCount is zero for the first broadcast. Finally, the source broadcasts the packet.

A sender node other than the source uses its own address, its parent's address and its cost to the sink to update the SenderAddr, DestAddr and Cost fields, respectively, of the received packet. The sender also resets RetryCount to zero before the first broadcast.

After the broadcast, each sender starts a retransmission timer for the packet. The timer interval depends on the type of radio and the data rate; in our experiments, the retransmission timer interval is 100 ms.

6.3.4 Packet Reception

A node parses the header of every successfully decoded packet. In TABS, a node can accept a new packet either through traditional routing or through opportunistic links. Following traditional routing, if the node's address matches the DestAddr of a packet, the node accepts the packet for further broadcast. The node schedules a timer to broadcast the packet, which is abandoned if the node receives an implicit or explicit ACK for that packet from a node with a lower cost to the sink.

The “minimum progress limit” controls the acceptance of a packet through an opportunistic link for all nodes except the sink.
When the address of a node and the DestAddr of a packet differ, the node computes the progress of the packet towards the sink from the cost-to-sink information in the packet and at the node. If the progress is at least the “minimum progress limit”, the node accepts the packet for further broadcast; the sink always accepts a data (or reply) packet. To allow maximum progress of the packet, in this case the node broadcasts the packet immediately.

Packet reception at the sink is slightly different. After receiving a new data (or reply) packet, the sink broadcasts the packet immediately to send an implicit ACK to its neighbors. However, the sink does not schedule a retransmission timer for further rebroadcasts.

6.3.5 Handling Poor-Quality Links or Link Failure

After the broadcast of a packet, a sender expects an implicit or explicit ACK before the expiration of a timer called the retransmission timer. Without an ACK, the broadcast is considered a failure, and TABS uses two mechanisms to recover from it. First, similar to traditional routing protocols, the sender retransmits the same packet; after each retransmission, the sender waits for an ACK (implicit or explicit) using the retransmission timer.

Second, TABS adaptively changes the “minimum progress limit” with broadcast failures, where this limit is an inverse function of the retry count (the packet's RetryCount). The sender increments the packet's RetryCount before each retransmission. Other nodes compute the new value of the “minimum progress limit” for that packet from the RetryCount after decoding the packet successfully. This gradual relaxation of the “minimum progress limit” allows the packet to be forwarded through multiple relays, i.e., it exploits the wireless broadcast advantage.

Retransmissions help to overcome failures due to poor-quality links, while relaxing the “minimum progress limit” helps to overcome most link failures, including those caused by unidirectional or highly asymmetric links.
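The acceptance rule of Section 6.3.4 and the RetryCount-driven relaxation of Section 6.3.5 can be sketched in a few lines of Python. This is a hypothetical model, not the TinyOS implementation: the class and function names are ours, progress is taken as the packet's Cost minus the receiver's cost and compared with ≥ as in Figure 6.2, and the limit schedules `F_LA` and `F_A` (introduced in Section 6.3.6) use illustrative intermediate values, since the text fixes only their start (3 hops), their end (0) and the RetryCount at which each reaches 1. ACK generation and path suppression are not modeled.

```python
from dataclasses import dataclass, field, replace
from enum import Enum

# Hypothetical "minimum progress limit" schedules in hops, indexed by RetryCount.
# Only the endpoints (3 down to 0) and when each reaches 1 come from the text;
# the intermediate steps are illustrative guesses.
F_LA = [3, 3, 2, 1, 1, 0]   # less aggressive: limit reaches 1 at RetryCount 3
F_A = [3, 2, 1, 1, 0]       # aggressive: limit reaches 1 at RetryCount 2


class State(Enum):
    PKT_RCVD = 1
    PKT_FWDED = 2        # later transitions are not modeled in this sketch
    PKT_FWD_ACKED = 3


@dataclass
class Packet:
    seq_no: int
    sender_addr: int
    dest_addr: int       # the sender's parent
    cost: int            # the sender's cost (hop count) to the sink
    retry_count: int
    data_origin: int


@dataclass
class Node:
    addr: int
    cost: int            # this node's hop count to the sink
    is_sink: bool = False
    states: dict = field(default_factory=dict)   # seq_no -> State


def min_progress_limit(retry_count: int, schedule: list) -> int:
    """Limit for a packet; stays at the schedule's last entry (0) once exhausted."""
    return schedule[min(retry_count, len(schedule) - 1)]


def on_receive(node: Node, pkt: Packet, schedule: list) -> str:
    """Decide what a node does with a successfully decoded packet.

    Returns 'duplicate', 'reject', 'scheduled' (parent: broadcast after a timer
    that an overheard ACK may cancel) or 'immediate' (opportunistic link or sink).
    """
    if pkt.seq_no in node.states:
        return "duplicate"
    if node.is_sink:
        decision = "immediate"      # sink always accepts; its broadcast is an implicit ACK
    elif pkt.dest_addr == node.addr:
        decision = "scheduled"      # traditional routing: node is the sender's parent
    elif pkt.cost - node.cost >= min_progress_limit(pkt.retry_count, schedule):
        decision = "immediate"      # opportunistic link with enough progress
    else:
        return "reject"
    node.states[pkt.seq_no] = State.PKT_RCVD
    return decision


def prepare_retransmission(pkt: Packet) -> Packet:
    """Sender side: increment RetryCount before rebroadcasting after a timer expiry."""
    return replace(pkt, retry_count=pkt.retry_count + 1)
```

Under these assumed schedules, a sibling of the sender's parent (progress 1) rejects the first broadcast but accepts the packet after two retransmissions under `F_A`, while `F_LA` still rejects it at that point, mirroring the gradual opening of multiple relays described above.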
In our implementation of TABS, the smallest value of the “minimum progress limit” is zero.

6.3.6 Minimum Progress Limit

The “minimum progress limit” is an inverse function of a packet's retry count (RetryCount). This limit controls the use of opportunistic links as well as the use of the wireless broadcast advantage to recover from link failures.

We design this function based on the observation in [20] that blacklisting or link quality estimation (e.g., the ETX metric) with up to three retransmissions is a robust choice for achieving a high (over 98%) packet delivery success probability. We observe similar results for these two approaches on the test-bed [1] used for all experiments in this chapter. We design two different functions for TABS to change the value of the “minimum progress limit”: a less aggressive one (F_LA) and an aggressive one (F_A) for recovering from link failures.

In our implementation, the hop count of a node is used as that node's cost to the sink, and both functions gradually reduce the “minimum progress limit” for a packet from three hops to zero. The higher values of the limit (three and two) allow the packet to be forwarded through long opportunistic links in addition to the sender's parent. When the limit is one or zero, the packet can also be forwarded through the siblings of the sender's parent or the siblings of the sender, respectively. Here, TABS exploits the wireless broadcast advantage to recover from link failures.

[Figure 6.1: Characteristics of the functions F_LA and F_A used to compute the “minimum progress limit” (0 to 3) from the RetryCount (0 to 6) of a packet.]

Figure 6.1 shows the gradual change of the “minimum progress limit” for the two functions (F_LA and F_A) as the RetryCount of a packet increases.
The aggressive function, F_A, reduces the limit to one after two retransmissions, while the less aggressive function, F_LA, does so after three retransmissions. The aggressive function may be suitable for noisy or dynamic environments.

6.3.7 Suppressing Extra Forwarding Paths

Forwarding packets through opportunistic links, as well as through multiple relays to recover from link failures, generates some extra paths. TABS uses implicit and explicit ACKs to suppress unnecessary paths. An implicit ACK is the broadcast of the same packet by another node. An explicit ACK is broadcast by a node whose state for the packet is PKT_FWDED or PKT_FWD_ACKED when it receives the packet again from a node with a higher cost to the sink.

The current sender on a forwarding path stops transmitting or retransmitting a packet when it overhears an implicit or explicit ACK for the packet from a node that is closer to the sink. Also, both current and potential sender nodes change their state for the packet to PKT_FWD_ACKED after receiving either kind of ACK, when the ACK sender is closer to the sink than the ACK receiver in terms of cost. This helps to avoid generating unnecessary paths.

[Figure 6.2: State transition of a node when it receives a new data packet. States: Initial State, PKT_RCVD, PKT_FWDED, PKT_FWD_ACKED; the packet is accepted if the node's address equals the packet's DestAddr or Progress >= “minimum progress limit”.]

6.3.8 State Transition of a Node

Fig. 6.2 shows the state transitions of a node, based on the above description of TABS, for a given data packet with a new sequence number. After receiving the packet in the initial state, the node's state changes to PKT_RCVD. After broadcasting the packet, the node moves to the PKT_FWDED state.
While in this state, the node retransmits the packet whenever the retransmission timer expires. The node moves to the final state, PKT_FWD_ACKED, which is an absorbing state, from any other state if it receives an ACK (implicit or explicit) for the packet. Also, the node sends an explicit ACK if it receives a duplicate packet from a node with a higher cost to the sink while its state is either PKT_FWDED or PKT_FWD_ACKED.

6.4 Evaluation

6.4.1 Implementation and Evaluation Methodology

We have implemented all the details of TABS described in Section 6.3 in TinyOS 1.1; the implementation is about 500 lines of code. In addition, three existing data routing protocols have been implemented on the same platform for performance comparison with TABS. These three protocols are based on (1) Automatic Repeat Request, ARQ (using 3 retransmissions), (2) multi-path routing using the wireless broadcast advantage, and (3) ARQ (using 3 retransmissions) with periodic link quality estimation (using the ETX [9] link metric). Brief descriptions of these protocols are given in Section 6.4.2.

[Figure 6.3: Layout of the test-bed with 56 nodes.]

We evaluated our TABS implementation on the largest segment of Tutornet [1], a 56-node indoor wireless sensor network test-bed deployed over 1125 square meters of an office floor, as shown in Fig. 6.3. We also use the other segment of Tutornet to compare the performance of the protocols on an (approximately) linear deployment of nodes. Each Tutornet sensor node is a Moteiv Tmote with an 8 MHz Texas Instruments MSP430 microcontroller, 10 KB RAM and a 2.4 GHz 802.15.4 Chipcon wireless transceiver with a nominal bit rate of 250 Kbps. The motes also have a USB back channel that we use for logging traces.

We established both dynamic and static topologies with the 56 nodes of the test-bed. With a radio transmission power of −21 dBm, flooding-based query dissemination is used to generate a dynamic topology, i.e., a routing tree rooted at the sink (i.e., the querier) that can differ from query to query.
Consequently, each reply (i.e., data packet) may follow a different path between a source and the sink. Here, the hop count of the query when it reaches a node establishes that node's cost-to-sink information. The static-topology experiments use a separate link quality measurement to establish a connected static routing tree among the nodes of the test-bed, so every node has a fixed ancestor, i.e., a parent in the static routing tree. Here, the distance of a node from the sink in terms of hop count is the node's cost-to-sink information.

We ran each experiment at a consistent time during the day (with interference from more than five IEEE 802.11b/g wireless LANs) and at night (with low or no interference). Each experiment lasted at least an hour. During the experiment, each node logged every packet transmission and reception, and also counted the number of transmissions used for data routing and for the corresponding ACKs. Further, during the experiment, we strictly controlled the time synchronization of the motes as needed by sending commands from a PC to the motes through the USB back channel.

We consider two metrics to evaluate and compare the performance of TABS with other protocols. The first metric is the “Success Probability” of packet delivery from a source to the sink. We compute it as the total number of data (or reply) packets received at the sink divided by the total number of data (or reply) packets sent from the source. The other metric is “Overhead”, defined as the average number of transmissions required by all nodes to successfully deliver a data (or reply) packet.

6.4.2 Brief Description of Other Routing Protocols

In this section, we briefly explain the three existing protocols that are used for comparison with TABS.

6.4.3 ARQ-based Routing

This protocol strictly uses the routing tree to send a data packet from a source to the sink. With this simple protocol, a node forwards a data packet only to its parent, which is one hop closer to the sink than the node (the current sender).
After sending a packet, the node waits for an ACK using a retransmission timer and retransmits the packet when the timer expires. Also, a parent node sends an explicit ACK to its child node if it receives a duplicate packet from the child. In our ARQ-based routing implementation, a node can retransmit a packet at most three times before dropping it.

6.4.4 Multi-Path Routing

This protocol uses the cost-to-sink information (hop count) of all nodes in the network and exploits the wireless broadcast advantage. With this protocol, a node broadcasts every data packet so that it is sent through multiple relays. The packet includes a field indicating the cost-to-sink information of the sender node. Any receiver that finds its own cost smaller than the cost carried in the packet can forward the packet further, as long as the packet is not a duplicate. Before forwarding, the node updates the cost field of the packet with its own cost. No retransmissions or ACKs are used in this protocol.

6.4.5 ARQ-based Routing with Link Quality Estimation

This protocol is similar to the ARQ-based routing protocol described in Section 6.4.3. In addition, it uses periodic link quality estimation among the nodes of a network, and each node chooses its best $b_n$ neighbors based on the link quality metric. In our implementation, we use the ETX [9] metric to estimate link quality. Let $d_f$ and $d_r$ be the packet reception rates of a link in the forward and reverse directions, respectively. We then compute the ETX metric of the link as

$$\mathrm{ETX} = \frac{1}{d_f \times d_r}.$$

ETX helps to avoid unidirectional and highly asymmetric links. Also, in this protocol, a node can receive a packet only from its best $b_n$ neighbors.

6.4.6 Results

In this section, we present experimental results that validate the design of TABS and compare its performance with the three existing protocols mentioned above.
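To make the multi-path baseline (Section 6.4.4) and the two metrics of Section 6.4.1 concrete, here is a toy Monte Carlo sketch that pushes packets over a small graph. It rests on several assumptions:

- Losses are independent, with a uniform per-link delivery probability `p`.
- BFS hop counts from the sink stand in for the cost established by query flooding.
- Duplicates are resolved in simulation order rather than by real radio timing.
- “Overhead” is read as total transmissions divided by the number of delivered packets.

All function names are hypothetical.

```python
# Hypothetical toy model of the multi-path baseline; not the TinyOS code.
import random
from collections import deque


def hop_costs(adj, sink):
    """Cost to sink = hop count from the sink (BFS), as query flooding would set it."""
    cost = {sink: 0}
    queue = deque([sink])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in cost:
                cost[v] = cost[u] + 1
                queue.append(v)
    return cost


def multipath_send(adj, cost, p, source, sink, rng):
    """Send one packet: broadcast, no ACKs, no retransmissions.

    Returns (delivered, number_of_broadcasts)."""
    received = {source}              # nodes already holding the packet (duplicate check)
    to_broadcast = deque([source])
    transmissions = 0
    delivered = False
    while to_broadcast:
        u = to_broadcast.popleft()
        transmissions += 1           # one broadcast by u; the packet carries cost[u]
        for v in adj[u]:
            if rng.random() >= p or v in received:
                continue             # lost on this link, or a duplicate that is dropped
            received.add(v)
            if v == sink:
                delivered = True
            elif cost[v] < cost[u]:
                to_broadcast.append(v)   # v forwards with its own, smaller cost
    return delivered, transmissions


def evaluate(adj, source, sink, p, n_packets=1000, seed=1):
    """Return (success probability, transmissions per delivered packet)."""
    rng = random.Random(seed)
    cost = hop_costs(adj, sink)
    delivered = total_tx = 0
    for _ in range(n_packets):
        ok, tx = multipath_send(adj, cost, p, source, sink, rng)
        delivered += ok
        total_tx += tx
    success = delivered / n_packets
    overhead = total_tx / delivered if delivered else float("inf")
    return success, overhead


# Diamond: sink 0, two relays (1, 2), source 3. With perfect links both relays
# forward, so 3 broadcasts per delivered packet instead of the 2 a single path needs.
diamond = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
print(evaluate(diamond, source=3, sink=0, p=1.0))
```

The diamond example illustrates the extra forwarding paths that Section 6.3.7 aims to suppress. Without ACKs, every relay that receives the packet and is closer to the sink pays for its own broadcast.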
6.4.6.1 Approximate Linear Deployment

We consider three approximately linear deployments, as shown in Fig. 6.4, where node density is highest in Line-1 (Fig. 6.4(a)) and lowest in Line-3 (Fig. 6.4(c)). In each deployment, the sink and the source are node 60 and node 80, respectively. Further, the sink uses flooding to disseminate a query to the source. These deployments are used to evaluate the robustness of TABS as the node density changes, as well as to understand the effectiveness of its design components.

Figure 6.4: Linear deployment with different node densities over an office floor. (a) Line-1. (b) Line-2. (c) Line-3.

Figure 6.5: Protocols performance in linear networks with different node density. (a) Success probability. (b) Overhead (number of transmissions per packet delivery). Each panel compares ARQ, Multi-path, TABS with $F_{LA}$ and TABS with $F_A$ on Line-1, Line-2 and Line-3.

The number of routing options decreases as the node density of a network decreases. Thus, both the packet delivery success probability and the overhead of the multi-path based routing protocol fall as the network becomes sparser, as shown in Fig. 6.5. Surprisingly, however, the packet delivery success probability of the ARQ-based (hop-by-hop retransmission) protocol improves as the node density decreases, as shown in Fig. 6.5(a). In a low-density deployment, the fraction of poor-quality links is smaller, so the probability that a query packet is forwarded over a poor-quality link is also low. Query dissemination therefore mostly uses good-quality links to reach the source, and consequently the data (i.e., reply) delivery success probability of ARQ-based routing improves in a sparse network. TABS effectively combines the benefits of both the ARQ and multi-path approaches and achieves a high packet delivery success probability in both dense and sparse networks (Fig. 6.5(a)).
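The sparse-network behavior of ARQ can be illustrated with a simple independent-loss model. This model is our simplifying assumption, not something measured in these experiments. Suppose a link delivers each attempt with probability $p$ and a node may retransmit up to 3 times (4 attempts in total, as in our ARQ implementation). Then the data packet crosses that hop with probability $1-(1-p)^4$, and the path succeeds only if every hop on the query-established path does. A single poor link therefore dominates the end-to-end success probability. The function names are hypothetical.

```python
# Hypothetical independent-loss model of hop-by-hop ARQ (illustrative only).
from math import prod


def arq_hop_success(p, retransmissions=3):
    """P(data crosses one link) with 1 + retransmissions independent attempts.

    Lost ACKs only trigger duplicate retransmissions, so in this model they
    do not reduce delivery."""
    return 1.0 - (1.0 - p) ** (retransmissions + 1)


def arq_path_success(link_prrs, retransmissions=3):
    """End-to-end success: every hop on the path must succeed."""
    return prod(arq_hop_success(p, retransmissions) for p in link_prrs)


good_path = [0.9] * 5            # five good links
mixed_path = [0.9] * 4 + [0.3]   # same length, one poor link
print(round(arq_path_success(good_path), 4))    # 0.9995
print(round(arq_path_success(mixed_path), 4))   # 0.7596
```

Under this model, three retransmissions almost fully mask moderate loss but cannot rescue a poor link. This is consistent with the observation that ARQ benefits when query dissemination happens to avoid poor links.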
However, TABS incurs slightly higher overhead than the other two routing protocols. In terms of overhead, as shown in Fig. 6.5(b), TABS with $F_A$ controlling the “minimum progress limit” is suitable for dense networks, while $F_{LA}$ is appropriate for sparse networks. In a dense network, TABS with $F_A$ finds an alternative path quickly, and the ACK mechanism suppresses non-promising paths. Sparse networks, on the other hand, have few alternative paths, and the low node density makes the ACK mechanism less effective at suppressing extra paths.

6.4.6.2 Dynamic Topology (Query-Reply Scenario)

This experimental setup uses the 56 non-linearly deployed nodes shown in Fig. 6.3. For all experiments using this setup, node 4 is the sink. Considering the node deployment pattern of the test-bed, we choose node 40 and node 52 as two different sources. Note that node 40 is surrounded by fewer nodes than node 52. Also, with a radio transceiver (CC2420) power of -21dBm, we observe that the number of hops between the sink (node 4) and the source (node 40 or node 52) is the maximum in the network. As in the previous experiment (Section 6.4.6.1), the sink disseminates each query to the source through flooding, which forms different topologies among the nodes.

We ran this experiment on multiple days in the early morning (between 2AM and 2:50AM) to ensure low interference, since during the daytime the ARQ and multi-path based routing protocols perform poorly without link quality estimation. For every five-minute interval, we compute the average success probability and overhead of the data (i.e., reply) routing protocols, as shown in Fig. 6.6.

Figure 6.6: Protocols performance evaluation using dynamic topologies (query-reply scenario). (a) Success probability (source id 52). (b) Overhead (source id 52). (c) Success probability (source id 40). (d) Overhead (source id 40). Each panel plots ARQ, Multi-path, TABS with $F_{LA}$ and TABS with $F_A$ over time (2:10AM to 2:50AM); overhead is the number of transmissions per packet delivery.

Fig. 6.6 shows that TABS delivers all data (i.e., reply) packets from both sources most of the time. For both sources, the overhead of TABS is lower than that of multi-path based routing, but slightly higher than that of ARQ-based routing, as shown in Fig. 6.6(b) and 6.6(d). As in the previous experiment, the success probability of ARQ-based routing is higher than that of multi-path based routing for source node 40, as shown in Fig. 6.6(c). This is because only a few nodes surround node 40, which limits the number of alternative paths.

6.4.6.3 ARQ-based Routing with Link Quality Estimation

This experiment uses the ETX [9] metric for periodic link quality estimation among the nodes of the test-bed (Fig. 6.3). The frequency of link quality estimation is $f$ per packet, where $f$ equals 1/32, 1/16, 1/8, 1/4 and 1/2. After each estimation period, each node chooses at most $b_n$ best neighbors based on the ETX metric, where $b_n$ equals 5 or 10. As in the previous experiment (Section 6.4.6.2), the sink is node 4 and the sources are node 52 and node 40. The sink disseminates a query through flooding, and each node is allowed to forward the query packet only to its $b_n$ best neighbors, so each query reaches the source over high-quality links. Finally, the ARQ-based routing protocol is used to send a data (i.e., reply) packet from the source to the sink.

We ran this experiment on multiple weekdays between 11AM and 3PM to assess the effectiveness of periodic link quality estimation. We plot the success probability and overhead of ARQ-based routing for all combinations of $f$ and $b_n$ in Fig. 6.7. Fig. 6.7 also shows the performance of TABS, without link quality estimation, at similar times on weekdays. The overhead of periodic link quality estimation is not included in Fig. 6.7(b) and Fig. 6.7(d).

Figure 6.7: Performance of ARQ-based routing with periodic link quality estimation (using ETX) and TABS routing between 11AM and 3PM on weekdays. The overhead of ARQ-based routing does not include the overhead of link quality estimation. (a) Success probability (source id 52). (b) Overhead (source id 52). (c) Success probability (source id 40). (d) Overhead (source id 40). Each panel shows ARQ with max. $b_n$ = 5 and ARQ with max. $b_n$ = 10 for $f$ = 1/32, 1/16, 1/8, 1/4 and 1/2, alongside TABS with $F_{LA}$ and TABS with $F_A$; overhead is the number of transmissions per packet delivery.

The success probability of ARQ-based routing depends on both $f$ (the frequency of periodic link quality estimation) and $b_n$ (the number of best neighbors), as shown in Fig. 6.7(a) and Fig. 6.7(c). For both sources, the success probability increases with $f$ when $b_n$ equals 5. However, when $b_n$ equals 10, the links between a node and some of its chosen neighbors may be of poor quality. Moreover, our experimental log files show that several nodes do not have 10 neighbors. If query dissemination uses such poor-quality links to reach the source, ARQ-based routing may fail to deliver the data (i.e., reply) packet. Thus, ARQ-based routing achieves a higher success probability for $b_n$ = 5 than for $b_n$ = 10, as shown in Fig. 6.7(a) and Fig. 6.7(c).
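The neighbor selection used by this baseline can be sketched as follows. The sketch assumes each node already has per-neighbor delivery ratios $d_f$ and $d_r$ from its most recent estimation period; probe scheduling at frequency $f$ is not modeled. It ranks neighbors by $\mathrm{ETX} = 1/(d_f \times d_r)$ and keeps at most $b_n$ of them. The function names and sample ratios are hypothetical. With a larger $b_n$, a node that has only a few usable neighbors ends up including its poor links, which mirrors the $b_n = 10$ observation above.

```python
# Hypothetical sketch of ETX-based best-b_n neighbor selection (illustrative only).
import math


def etx(d_f, d_r):
    """Expected transmission count of a link: 1 / (d_f * d_r)."""
    if d_f <= 0.0 or d_r <= 0.0:
        return math.inf              # unidirectional or dead link
    return 1.0 / (d_f * d_r)


def best_neighbors(link_ratios, b_n):
    """Return up to b_n neighbor ids with the lowest finite ETX.

    link_ratios maps neighbor id -> (d_f, d_r) measured by this node."""
    usable = [n for n, (d_f, d_r) in link_ratios.items() if etx(d_f, d_r) < math.inf]
    usable.sort(key=lambda n: etx(*link_ratios[n]))
    return usable[:b_n]


ratios = {
    "A": (0.9, 0.9),   # good symmetric link, ETX about 1.23
    "B": (0.9, 0.2),   # highly asymmetric link, ETX about 5.56
    "C": (0.6, 0.8),   # moderate link, ETX about 2.08
    "D": (0.0, 1.0),   # unidirectional link, excluded
}
print(best_neighbors(ratios, 2))    # ['A', 'C']
print(best_neighbors(ratios, 10))   # ['A', 'C', 'B']
```

The multiplication of $d_f$ and $d_r$ is what penalizes asymmetric links such as B. With ARQ, both the data packet and its ACK must cross the link.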
On the other hand, TABS achieves a similar or higher success probability without periodic link quality estimation, although its overhead is higher than that of ARQ-based routing, as shown in Fig. 6.7. However, once the per-packet overhead of periodic link quality estimation is taken into account, TABS has much lower overhead than ARQ-based routing with link quality estimation. The overhead of periodic link quality estimation depends on the implementation and the size of the network, in addition to $f$.

6.5 Conclusion

In this study, we have presented a novel data routing protocol, TABS, that combines the benefits of the wireless broadcast advantage with traditional retransmission-based routing. This protocol eliminates the need for periodic link quality estimation or blacklisting. TABS is suitable for both dynamic and static topologies. A TABS implementation on a 56-node test-bed achieves over 98% success probability while link quality is highly dynamic. The protocol works for both sparse and dense deployments of nodes. Also, the performance of the TABS implementation is similar to that of an ETX-metric-based routing protocol.

Chapter 7 Contributions

In this chapter, we briefly summarize the contributions of this research effort.

1) In this research, we first propose using the natural information gradient of an event's effect for query dissemination in wireless sensor networks, which avoids the proactive phase of preparing a distributed information gradient repository.
2) We design an information gradient-based framework for querying about events. To process users' queries about events, we design protocols for three basic tasks: (1) query dissemination, (2) query processing, and (3) reply.
3) We propose a novel multiple-path greedy protocol, RUGGED, to disseminate queries over a noisy natural information gradient. The protocol is fully distributed, reactive and energy efficient. Unlike other information gradient-based approaches, our protocol is also able to forward the query when no information gradient is available.
To control the instantiation of multiple paths, we use a probabilistic function based on the simulated annealing concept. In addition to the protocol, we also design a simple filter using known diffusion laws to reduce reply overhead. Further, we develop a simple model for environmental noise. Through simulation, we find that the protocol is robust enough to route queries around sensor holes and achieves a success rate of around 98%.
4) We develop a probabilistic framework to analyze two major approaches to query dissemination: (1) the single-path approach and (2) the multiple-path approach. We consider two metrics, query success rate and energy overhead, to compare the approaches using a regular grid topology under both ideal and lossy wireless link conditions. We also compare these two approaches through extensive simulations.
5) Exploiting geographical information and the area of the region surrounding an event from which the event's effect can be perceived (i.e., the geometry of the event's effect), we develop an information gradient-based query processing methodology to reduce search overhead. Here, we introduce a virtual grid framework leveraging geographical information, where the grid cell size is determined by the geometry of the event's effect. In addition, we develop algorithms for several aggregate queries (count, max/min) and complex queries.
6) We design a link-loss-tolerant data (or reply) routing protocol for multi-hop wireless sensor networks. This protocol effectively combines the benefits of the wireless broadcast advantage with traditional retransmission-based data routing. The protocol also adapts instantaneously to network dynamics without periodic link quality maintenance. Further, the proposed protocol is suitable for both dense and sparse networks.

Chapter 8 Future Work

In this chapter, we discuss possible directions for future work on gradient-based active query routing.
8.1 Query Dissemination

The multiple-path based query dissemination protocol suffers failures due to the non-zero loss probability of wireless links. Incorporating retransmissions may improve the success probability, but at the expense of significant energy overhead. We believe that adding selective retransmissions would improve the performance of the protocol significantly with only a negligible increase in overhead.

8.2 Query Processing

The proposed query processing architecture, PBS, can be extended to other query types, such as on-demand range queries. Also, an on-demand querying architecture such as PBS can be compared with the existing GHT-based query architecture.

8.3 Reply Mechanism

TABS can also be used in a network that uses periodic link quality estimation. We believe that such a combination may somewhat increase the overhead of data forwarding, but would help to reduce the frequency of periodic link quality estimation significantly. It would also improve the success probability significantly compared to that of simple ARQ-based routing.

Bibliography

[1] http://testbed.usc.edu
[2] D. Aguayo, J. Bicket, S. Biswas, G. Judd, and R. Morris. “Link-level Measurements from an 802.11b Mesh Network”. In ACM SIGCOMM, 2004.
[3] D. R. Askeland. “The Science and Engineering of Materials”. PWS Publishing Co., 1994.
[4] S. Biswas and R. Morris. “ExOR: Opportunistic Multi-hop Routing for Wireless Networks”. In ACM SIGCOMM, 2005.
[5] D. Braginsky and D. Estrin. “Rumor Routing Algorithm for Sensor Networks”. In WSNA, 2002.
[6] A. Cerpa, N. Busek, and D. Estrin. “SCALE: A tool for Simple Connectivity Assessment in Lossy Environments”. Technical Report 0021, CENS Technical Report, September 2003.
[7] R. Roy Choudhury and N. Vaidya. “MAC Layer Anycasting in Wireless Networks”. In HotNets II, November 2003.
[8] M. Chu, H. Haussecker, and F. Zhao. “Scalable Information-Driven Sensor Querying and Routing for Ad hoc Heterogeneous Sensor Networks”. International Journal on High Performance Computing Applications, 16(3):90–110, Fall 2002.
[9] D. Couto, D. Aguayo, J. Bicket, and R. Morris. “A High-Throughput Path Metric for Multi-Hop Wireless Routing”. In MobiCom, 2003.
[10] Y. K. Dalal and R. M. Metcalfe. “Reverse Path Forwarding of Broadcast Packets”. Communications of the ACM, 21(12), December 1978.
[11] A. Deshpande, C. Guestrin, S. Madden, J. Hellerstein, and W. Hong. “Model-Driven Data Acquisition in Sensor Networks”. In VLDB, 2004.
[12] R. Dube, C. Rais, K. Wang, and S. Tripathi. “Signal Stability based Adaptive Routing (SSA) for Ad Hoc Mobile Networks”. IEEE Personal Communications, February 1997.
[13] H. Dubois-Ferrière, M. Grossglauser, and M. Vetterli. “Age Matters: Efficient Route Discovery in Mobile Ad Hoc Networks Using Encounter Ages”. In MobiHoc, 2003.
[14] H. Dubois-Ferrière, D. Estrin, and M. Vetterli. “Packet Combining in Sensor Networks”. In ACM SenSys, 2005.
[15] Q. Fang, F. Zhao, and L. Guibas. “Lightweight Sensing and Communication Protocols for Target Enumeration and Aggregation”. In MobiHoc, 2003.
[16] J. Faruque and A. Helmy. “RUGGED: RoUting on finGerprint Gradients in Sensor Networks”. In IEEE International Conference on Pervasive Services (ICPS), 2004.
[17] J. Faruque, K. Psounis, and A. Helmy. “Analysis of Gradient-based Routing Protocols in Sensor Networks”. Technical report, USC CS Technical Report, 2005.
[18] D. Ganesan, D. Estrin, A. Woo, D. Culler, B. Krishnamachari, and S. Wicker. “Complex Behavior at Scale: An Experimental Study of Low-Power Wireless Sensor Networks”. Technical Report 02-0013, UCLA CS Technical Report, 2002.
[19] D. Ganesan, R. Govindan, S. Shenker, and D. Estrin. “Highly-Resilient, Energy-Efficient Multipath Routing in Wireless Sensor Networks”. ACM MC2R, 5(4), October 2001.
[20] O. Gnawali, M. Yarvis, J. Heidemann, and R. Govindan. “Interaction of Retransmission, Blacklisting, and Routing Metrics for Reliability in Sensor Network Routing”. In IEEE SECON, 2004.
[21] M. Grossglauser and M. Vetterli. “Locating nodes with EASE: Last Encounter Routing for Ad Hoc Networks through Mobility Diffusion”. In InfoCom, 2003.
[22] L. Gu, D. Jia, P. Vicaire, T. Yan, L. Luo, A. Tirumala, Q. Cao, T. He, J. A. Stankovic, T. F. Abdelzaher, and B. H. Krogh. “Lightweight Detection and Classification for Wireless Sensor Networks in Realistic Environments”. In ACM SenSys, 2005.
[23] J. Heidemann, F. Silva, C. Intanagonwiwat, R. Govindan, D. Estrin, and D. Ganesan. “Building Efficient Wireless Sensor Networks with Low-level Naming”. In SOSP, 2001.
[24] J. Heidemann, F. Silva, and D. Estrin. “Matching Data Dissemination Algorithms to Application Requirements”. In SenSys, 2003.
[25] C. Intanagonwiwat, R. Govindan, and D. Estrin. “Directed Diffusion: A Scalable and Robust Communication Paradigm for Sensor Networks”. In MobiCom, 2000.
[26] C. Intanagonwiwat, D. Estrin, R. Govindan, and J. Heidemann. “Impact of Network Density on Data Aggregation in Wireless Sensor Networks”. In ICDCS, 2002.
[27] B. Karp and H. T. Kung. “GPSR: Greedy Perimeter Stateless Routing for Wireless Networks”. In MobiCom, 2000.
[28] J. N. Laneman and G. Wornell. “Exploiting Distributed Spatial Diversity in Wireless Networks”. In Allerton Conference on Communication, Control and Computing, 2000.
[29] P. Larsson. “Selection Diversity Forwarding in a Multi-hop Packet Radio Network with Fading Channel and Capture”. ACM MC2R, 5(4), 2001.
[30] Q. Li, M. D. Rosa, and D. Rus. “Distributed Algorithms for Guiding Navigation across a Sensor Network”. In MobiCom, 2003.
[31] J. Liu, F. Zhao, and D. Petrovic. “Information-Directed Routing in Ad Hoc Sensor Networks”. In WSNA, 2003.
[32] H. Luo, F. Ye, J. Cheng, S. Lu, and L. Zhang. “TTDD: Two-Tier Data Dissemination in Large-Scale Wireless Sensor Networks”. Wireless Networks, 11:161–175, 2005.
[33] Q. Lv, P. Cao, E. Cohen, K. Li, and S. Shenker. “Search and Replication in Unstructured Peer-to-Peer Networks”. In ICS, 2002.
[34] D. Lymberopoulos, Q. Lindsey, and A. Savvides. “An Empirical Analysis of Radio Signal Strength Variability in IEEE 802.15.4 Networks using Monopole Antennas”. In EWSN, 2006.
[35] S. Madden, M. Franklin, J. Hellerstein, and W. Hong. “TAG: a Tiny AGgregation Service for Ad-Hoc Sensor Networks”. In OSDI, 2002.
[36] S. Madden, M. Franklin, J. Hellerstein, and W. Hong. “The Design of an Acquisitional Query Processor for Sensor Networks”. In SIGMOD, 2003.
[37] H. Matsuo and K. Mori. “Accelerated Ants Routing in Dynamic Networks”. In Intl. Conf. on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing, pages 333–339, 2001.
[38] R. Poor. “Gradient Routing in Ad Hoc Networks”. http://www.media.mit.edu/ pia/Research/ESP/texts/poorieeepaper.pdf.
[39] L. Reznik, G. V. Pless, and T. Karim. “Embedding Intelligent Sensor Signal Change Detection into Sensor Network Protocols”. In IEEE SECON, 2005.
[40] N. Sadagopan, B. Krishnamachari, and A. Helmy. “Active Query Forwarding in Sensor Networks (ACQUIRE)”. In SNPA, 2003.
[41] S. D. Servetto and G. Barrenechea. “Constrained Random Walks on Random Graphs: Routing Algorithms for Large Scale Wireless Sensor Networks”. In WSNA, 2002.
[42] J. F. Shackelford. “Intro to Materials Science For Engineers”. Prentice Hall, 2000.
[43] K. Srinivasan and P. Levis. “RSSI is Under-Appreciated”. In EmNets-III, 2006.
[44] A. Woo, S. Madden, and R. Govindan. “Networking Support for Query Processing in Sensor Networks”. Communications of the ACM, 47(6), June 2004.
[45] A. Woo, T. Tong, and D. Culler. “Taming the Underlying Issues for Reliable Multihop Routing in Sensor Networks”. In ACM SenSys, 2003.
[46] Y. Yao and J. Gehrke. “Query Processing for Sensor Networks”. In CIDR, 2003.
[47] F. Ye, G. Zhong, S. Lu, and L. Zhang. “GRAdient Broadcast: A Robust Data Delivery Protocol for Large Scale Sensor Networks”. ACM WINET (Wireless Networks), 11(3):285–298, 2005.
[48] F. Zhao, J. Liu, L. Guibas, and J. Reich. “Collaborative Signal and Information Processing: An Information Directed Approach”. Proceedings of the IEEE, 91(8), 2003.
[49] J. Zhao and R. Govindan. “Understanding Packet Delivery Performance in Dense Wireless Sensor Networks”. In ACM SenSys, 2003.
[50] G. Zhou, T. He, S. Krishnamurthy, and J. Stankovic. “Impact of Radio Irregularity on Wireless Sensor Networks”. In MobiSys, 2004.
[51] M. Zorzi and R. Rao. “Geographic Random Forwarding (GeRaF) for Ad Hoc and Sensor Networks: Multi-hop performance”. IEEE Transactions on Mobile Computing, 2(4), October 2003.
Linked assets
University of Southern California Dissertations and Theses
Conceptually similar
Robust routing and energy management in wireless sensor networks
Rate adaptation in networks of wireless sensors
A protocol framework for attacker traceback in wireless multi-hop networks
Aging analysis in large-scale wireless sensor networks
Efficient and accurate in-network processing for monitoring applications in wireless sensor networks
Distributed wavelet compression algorithms for wireless sensor networks
Cooperation in wireless networks with selfish users
Techniques for efficient information transfer in sensor networks
Realistic modeling of wireless communication graphs for the design of efficient sensor network routing protocols
Transport layer rate control protocols for wireless sensor networks: from theory to practice
Multichannel data collection for throughput maximization in wireless sensor networks
Congestion control in multi-hop wireless networks
Reconfiguration in sensor networks
Analysis and countermeasures of worm propagations and interactions in wired and wireless networks
Towards interference-aware protocol design in low-power wireless networks
Language abstractions and program analysis techniques to build reliable, efficient, and robust networked systems
Understanding and exploiting the acoustic propagation delay in underwater sensor networks
Dynamic routing and rate control in stochastic network optimization: from theory to practice
Algorithmic aspects of throughput-delay performance for fast data collection in wireless sensor networks
Relative positioning, network formation, and routing in robotic wireless networks
Asset Metadata
Creator
Faruque, Jabed
(author)
Core Title
Gradient-based active query routing in wireless sensor networks
School
Viterbi School of Engineering
Degree
Doctor of Philosophy
Degree Program
Computer Engineering
Publication Date
07/26/2007
Defense Date
05/07/2007
Publisher
University of Southern California
(original),
University of Southern California. Libraries
(digital)
Tag
gradient-based,OAI-PMH Harvest,routing,sensor networks
Language
English
Advisor
Helmy, Ahmed (committee chair), Govindan, Ramesh (committee member), Krishnamachari, Bhaskar (committee member)
Creator Email
faruque@usc.edu
Permanent Link (DOI)
https://doi.org/10.25549/usctheses-m698
Unique identifier
UC1295702
Identifier
etd-Faruque-20070727 (filename),usctheses-m40 (legacy collection record id),usctheses-c127-522074 (legacy record id),usctheses-m698 (legacy record id)
Legacy Identifier
etd-Faruque-20070727.pdf
Dmrecord
522074
Document Type
Dissertation
Rights
Faruque, Jabed
Type
texts
Source
University of Southern California
(contributing entity),
University of Southern California Dissertations and Theses
(collection)
Repository Name
Libraries, University of Southern California
Repository Location
Los Angeles, California
Repository Email
cisadmin@lib.usc.edu
Tags
gradient-based
routing
sensor networks