EFFICIENT DATA COLLECTION IN WIRELESS SENSOR NETWORKS: MODELING AND ALGORITHMS

by

Lorenzo Rossi

A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(ELECTRICAL ENGINEERING)

December 2011

Copyright 2011 Lorenzo Rossi

Dedication

Dedicated to my dear parents.

Acknowledgements

I would like to thank my friends and all the USC faculty and staff who have been very patient with me.

Table of Contents

Dedication
Acknowledgements
List of Tables
List of Figures
Abstract

Chapter 1: Introduction
1.1 Diffusion Models for Data Gathering
1.1.1 Significance of the Research
1.1.2 Motivations of the Research
1.1.3 Contributions of the Research
1.2 Spatially Nonstationary Models
1.2.1 Significance of the problem
1.2.2 Related work
1.2.3 Contributions
1.3 Energy Efficient Data Gathering via Distributed Classification
1.3.1 Significance of the Problem
1.3.2 Literature Review
1.3.3 Contributions
1.4 Outline of the Dissertation

Chapter 2: Background Review
2.1 Wireless Sensor Networks: Applications and Tasks
2.1.1 Applications
2.1.2 Main tasks
2.2 Diffusion Partial Differential Equation (PDE)
2.2.1 Introduction
2.2.2 Analytic solution of the heat equation
2.2.3 Numerical solution
2.3 Model Based Identification
2.4 Information Theory for Sensor Networks and Spatially Nonstationary Models
2.4.1 Joint coding
2.4.2 Distributed coding
2.4.3 Correlated data gathering
2.4.4 Spatially nonstationary models

Chapter 3: PDE Parameter Estimation via Sensor Nets: Basic Framework
3.1 Overview of Proposed Framework
3.2 Problem Formulation
3.3 Systems with Known Boundary Conditions
3.3.1 Model discretization and clustering
3.3.2 Parameter estimation via Kalman filtering
3.3.3 Identifiability
3.3.4 Convergence analysis
3.3.5 Uncertainty on node location
3.4 Systems with Unknown Boundary Conditions
3.5 Systems with Active Sources
3.6 Experimental Results
3.6.1 Locating the sampling point
3.6.2 Smoothing property
3.6.3 Discussion

Chapter 4: PDE Parameter Estimation via Sensor Nets II: Extensions and Implementation Issues
4.1 Extension to 2-D
4.1.1 Discretization of the PDE model
4.1.2 Incorporation of measurements
4.1.3 Incorporation of boundary conditions
4.1.4 Treatment of unknown boundary conditions
4.1.5 Treatment of spatially varying parameters
4.1.6 Simulation results
4.2 Hybrid Data and Decision Fusion Techniques
4.2.1 Simulations
4.3 Energy Consumption and Data Gathering
4.4 Turbulent Diffusion

Chapter 5: Optimal Sink Deployment for Distributed Sensing of Spatially Nonstationary Phenomena
5.1 Problem Formulation
5.2 Massively Dense Sensor Networks
5.3 Analysis of the Rate Profile
5.3.1 A class of nonstationary covariance models
5.3.2 Variability with respect to the sink location
5.3.3 Sensitivity to node perturbations
5.3.4 Node density
5.3.5 Maximum steepness of the incremental rate
5.4 Experimental Results

Chapter 6: Energy Efficient Data Gathering via Distributed Classification
6.1 Problem Formulation and Efficiency Constraints
6.1.1 System Assumptions
6.1.2 Problem Formulation for Error Free Classification
6.1.3 Formulation and Constraints under Misclassification Errors
6.2 Admissible Features
6.3 Content Analysis via Intensity Histograms
6.3.1 Efficient encoding of the histogram
6.3.2 Methods to increase the efficiency
6.4 The Effect of Subsampling over Classification
6.4.1 Impact of Subsampling over Intensity Histogram
6.4.2 Analytical Computation of the Error Rates and the Mean Square Error
6.4.3 Mean Square Error
6.5 Experimental Results
6.5.1 Methodology
6.5.2 Experiments (No Subsampling)
6.5.3 Simulations and Experiments to Study Impact of Subsampling
6.5.4 Impact of Subsampling over Effective Efficiency
6.6 Distributed Classification Decision from Incomplete Data
6.6.1 Problem Statement
6.6.2 Confidence Metric
6.6.3 Algorithm for Distributed Feature Evaluation

Chapter 7: Conclusion and Future Work
7.1 Conclusion
7.1.1 Diffusion Models for Data Gathering
7.1.2 Spatially Nonstationary Models
7.1.3 Energy Efficient Data Gathering via Distributed Classification
7.2 Open Problems and Future Work
7.2.1 Diffusion Models for Data Gathering
7.2.2 Spatially Nonstationary Models
7.2.3 Distributed Classification
Bibliography

List of Tables

4.1 Comparison between data and decision fusion.
6.1 Pseudo code for distributed histogram acquisition.
6.2 Distributed vs. centralized decision.

List of Figures

2.1 Overview of a generic sensor network system, where the nodes send information to a remote base station communicating through a sink node. The information can be sent either on the autonomous initiative of the network or in answer to queries from the base station.
2.2 The diffusivity ranges in cm^2/s for gases, solids and liquids.
2.3 An example of a 1-D heat diffusion field with x(ξ,0) = δ(ξ) and D = 1.
2.4 An example of a data gathering tree.
3.1 An overview of the proposed system.
3.2 State variables, x_i(k), and a noisy sensor measurement, y_1(k), for the monodimensional scalar field x(ξ,t).
3.3 The sampled BCs, y_l(k) and y_r(k), are sent to the cluster head in a local model.
3.4 Cluster splitting.
3.5 The convergence rate of the algorithm versus the sensor location and the noise level (top), and the profile of the scalar field at the initial state and at the steady state (bottom).
3.6 The mean estimation error versus the sensor location and the noise level.
3.7 The scalar field described by the parabolic equation (3.30), where an advection phenomenon can be observed and the concentration is moving toward one of the boundaries.
3.8 The Fourier transform with respect to space at three different time instances, showing the smoothing effect.
3.9 The squared error in the estimation of a scalar field.
3.10 The estimated diffusion coefficient θ_1 (top), covariance matrix (middle) and prediction error (bottom) for the parabolic problem given by Eq. (3.30).
3.11 Estimates of coefficients θ_2 (top) and θ_3 (bottom).
4.1 The computational molecule for the Laplacian operator and other spatial derivatives.
4.2 The state space grid for a rectangular sensor field and boundary points. The locations of sensor nodes are not shown.
4.3 The square-cell approximation of a measurement point.
4.4 An example of domain decomposition in a 2-D region, where the parameter θ(ξ) is assumed to be constant over each subregion.
4.5 Comparison of local estimates of parameters with their true values.
4.6 The scalar field described by the parabolic Eq. (4.30), where an advection phenomenon is observed and the concentration is moving toward one of the boundaries.
4.7 Estimates of coefficients via data fusion of three sensors, one sensor, and a switch between three/one sensor.
4.8 The scalar field estimation using a hybrid method.
4.9 The 1-D uniform sensor field, where nodes are equally spaced and the sink is located at the center.
4.10 The physical field x(ξ,t) over space at different sampled times.
4.11 The physical field x(ξ,t) over time at different sampled locations.
4.12 An example of the cluster-based data gathering system.
5.1 Example of placement of three sinks (black dots) and partition of the sensor region (dashed lines).
5.2 Profile of α(x) and partition of the 1D sensor field into two subregions with optimal sink locations at s*_1 = .46 and s*_2 = .83.
5.3 Profile of α(x) = log_2(2 + 10x) and optimal sink location at s* = .6.
5.4 Data aggregation cost as a function of the sink location.
5.5 Absolute variation of the estimated variance over space (rainfall data set) and bounding line according to condition (5.26).
5.6 Absolute variation of an estimated covariance coefficient over space (rainfall data set) and bounding line according to condition (5.26).
5.7 Covariance coefficient and bounding exponential (34.12 e^(-.30Δx)) according to condition (5.27).
5.8 Normalized error, 100 |α_1(x,y) − α_2(x,y)|/|α_1(x,y)|, between two data aggregation density functions for two different sink locations.
5.9 The two different sets of children considered for computing the incremental rate at node i for the sink locations s_1 and s_2. In this case there is no overlap between the sets.
5.10 Variable node density, stationary correlation model, Slepian-Wolf coding.
5.11 Data aggregation function (incremental rate) for a 1D subset of the rain data set.
5.12 Subregion considered for the experiment, delimited by the rectangular box, and optimal sink location, 'X'.
5.13 Rate profiles for a 2D subset of the rainfall data, computed using the explicit communication strategy.
5.14 Rate profiles for a 2D subset of the rainfall data, computed using the optimal Slepian-Wolf strategy.
5.15 Data aggregation cost profile for explicit communication aggregation associated with a 2D subset of the rainfall data. The optimal sink location is the centre of the region.
6.1 Maximum allowed size of the feature vector with respect to the number of hops for a binary balanced tree network.
6.2 Node topology over the 'dry region' for 3 and 4 hops.
6.3 Effective versus estimated efficiency for 3 and 4 hops, 4-bin and 2-bin undersampled histograms.
6.4 Error rates for falsely detected and missed sensing frames for various lengths of the training sequence, 3- and 4-hop networks and 4- and 3-bin histograms.
6.5 The MSE for regular (a) and uniform random (b) subsampling vs. number of samples and signals with different spectral characteristics.
6.6 Comparison of MSE for different types of regular, uniform and Bernoulli sampling, for white processes.
6.7 Analytical vs. experimental MSE for uniform random subsampling with m constant (a) and comparison of experimental MSE vs. 1/m (b).
6.8 Comparison of analytical and experimental error rates for uniform random (a) and Bernoulli (b) subsampling over a 5-year period of the rainfall data set (2-bin histograms).
6.9 Comparison of analytical, approximated and experimental average MSE for uniform random sampling over a 4-year period of the rainfall data set (2-bin histograms).
6.10 Comparison of error rates for regular, Bernoulli and uniform random subsampling (rainfall data set, 5-year period, 2-bin histograms).
6.11 Sensor network topology for the whole rainfall data set region: 165 nodes, 6 hops.
6.12 Error rates and effective efficiency for sensing frame classification via 8-bin histograms.
6.13 Error rates and effective efficiency for sensing frame classification via 4-bin histograms.
6.14 Sensor network topology and locations of the decision making nodes (indicated with the black diamond). A random perturbation has been added to the locations of these nodes in order to help visualization.
6.15 Two examples of the scalar field for the confidence metric on two rainy days. The peaks are placed on the nodes that took the decision.
7.1 Peaks of the diffusion field have a higher dynamic range in the transient state.

Abstract

This dissertation focuses on data gathering for wireless sensor networks. Data gathering deals with the problem of transmitting measurements of physical phenomena from the sensor nodes to one or more sinks in the most efficient manner. It is usually the main task performed by a sensor network and therefore the main cause of energy depletion for the nodes.
The research efforts presented here propose insightful models for the phenomena sampled by sensor networks, with the purpose of designing more energy efficient data gathering schemes.

We first focus on phenomena that can be characterized by a diffusive process. We propose to model the data via discretized diffusion partial differential equations (PDEs). The rationale is that a few equation coefficients, plus initial and contour conditions, may have the potential to completely describe such spatio-temporal phenomena in a compact manner. We propose and study an algorithm for the in-network identification of the diffusion coefficients. Then, we adopt a spatially nonstationary correlation model and study how this impacts correlation-based data gathering and, in particular, the problem of optimally placing a sink node in a sensor network region. Finally, we view each round of sensor measurements as a still image and represent it via intensity histograms. This way, we can adopt image content analysis tools (intensity histogram matching) to analyze the data and determine which rounds of measurements are of interest to the final users; energy can therefore be saved by transmitting only some rounds of measurements to the base station. We study the above models and the performance of data collection algorithms via analysis and experiments on synthetic and real data.

Chapter 1: Introduction

The main focus of this dissertation is on data gathering for wireless sensor networks. Data gathering deals with the problem of transmitting measurements of physical phenomena from the sensor nodes to one or more sinks in the most efficient manner, in order neither to deplete too fast the limited energy supply of the sensor nodes, nor to lose data. This is probably the most widely studied problem in the wireless sensor network literature. The common denominator of the research efforts presented here is the attempt to propose insightful models for the phenomena sampled by sensor networks. The final goal of studying new data models is the design of more efficient data gathering schemes.

We first focus our attention on the problem of monitoring natural phenomena that can be characterized by a diffusive process (Chapters 3 and 4). We propose to model the data via discretized diffusion partial differential equations (PDEs), with the purpose of designing the data gathering process as the distributed identification and transmission of the parameters of the PDE models. The rationale is that a few equation coefficients, plus initial and contour conditions, may have the potential to completely describe some spatio-temporal phenomena in a compact manner.

Then, also in order to address more general problems, we switch to a statistical model and study the impact of spatially nonstationary statistical models on data gathering and, in particular, on optimal sink placement (Chapter 5). Here our goal is to adopt a model more realistic than the spatially stationary models generally assumed by other works in the literature. We then study how our assumptions impact the design of sensor network systems, focusing in particular on the optimal sink placement problem.

Finally, in Chapter 6, we propose a model that further abstracts from the physics of the phenomenon, by viewing each round of sensor measurements as a still image and then using image content analysis tools (intensity histogram matching) to analyze the data. The goal of the data analysis is to transmit to the sink only the rounds of data that are of interest to the users and therefore save energy in the data gathering task.

For the rest of this chapter, we discuss in detail the significance, related work and contributions of each of the above studies.

1.1 Diffusion Models for Data Gathering

The problem of estimating parameters of physical phenomena via wireless sensor networks is studied in Chapters 3 and 4. It is assumed that a phenomenon being monitored by a sensor network, e.g.
the diffusion of a certain gas in an environment, is modeled by a partial differential equation (PDE) whose coefficients are unknown. The physical phenomena considered in this work belong to the class of diffusion phenomena, which are modeled by parabolic partial differential equations. However, many of our results can be extended to other classes of physical phenomena, e.g., the wave field. Research significance, motivations and main contributions are presented in Secs. 1.1.1, 1.1.2 and 1.1.3, respectively, of this introductory chapter.

1.1.1 Significance of the Research

Wireless sensor networks aim to provide a smart interaction with the physical world. They can be deployed at low cost and in large numbers in remote environments to provide autonomous and intelligent measurements, answer queries and perform monitoring tasks. An important set of application scenarios consists in monitoring phenomena having a diffusion behavior over space. Examples of relevant phenomena include the temperature field in an ecosystem, clouds of gases in the air, polluting agents in the sea, etc. In such cases, a sensor network may be required to transmit and keep updated a map of the phenomenon to a remote base station (BS) and/or to answer queries from the BS.

Several approaches have been proposed to tackle the above problems, and most of them rely on periodically transferring all samples of the phenomenon measured at the various sensor locations to the BS. This is often referred to as data gathering. In this context, one studies the optimal way (say, in the sense of least energy consumption) to transfer node measurements to a remote BS. A typical assumption is that all nodes sample the sensor field uniformly in time and generate a message at each round of measurements. All message packets are delivered to the BS through various aggregation strategies [27].

It is worthwhile to point out that many of the aforementioned methods treat the sampled data as if they were an i.i.d. process. However, natural phenomena are typically characterized by strong spatial and temporal correlations and, therefore, the i.i.d. assumption may lead to redundancy in computation and communication. Usually, sensor network devices are severely energy limited. In many real world applications, such correlations can be characterized by relatively simple physical models, e.g., one or a couple of partial differential equations (PDEs). For instance, partial differential (diffusion) equations can be used to model phenomena such as the propagation of a gas in the air, the diffusion of a polluting fluid in the water, or the evolution of the temperature/heat distribution over a space [23].

Thus, physical models can provide fundamental insights into many phenomena that play a critical role in sensor network applications. Furthermore, they can be exploited in the design of more effective wireless sensor network algorithms, since some useful prior knowledge of the system is embedded in the problem formulation. For example, physics can be used to design prediction schemes, to provide a compact parametric description of the phenomenon [53], or to design more efficient encoding schemes for sensor data [3].

Under the above framework, one problem of interest in this research is the identification of unknown parameters (e.g., the diffusion coefficient of a certain gas) of physical models through sampled sensor data. The knowledge of these parameters, along with the boundary and initial conditions, allows the complete characterization of the monitored phenomena and has a wide range of applications. There are many mathematical challenges in the estimation of parameters of diffusion models even in the traditional centralized setting [43].
These challenges become even greater in a distributed sensor network environment, due to the very limited computation and communication resources as well as the uncertainties (e.g., node locations, measurement errors, communication errors, boundary conditions and sources) associated with these environments.

1.1.2 Motivations of the Research

Research on physics-based sensor networks is an emerging field. To the best of our knowledge, the first group of papers published on this topic appeared in 2004. Beferull-Lozano et al. [3] proposed a compression scheme for temperature fields on a circular ring topology, where the fields were modeled by a parabolic PDE. A source localization problem for a diffusion model was studied in [41]. Servetto [20] studied some distributed problems related to the wave equation. All these papers assumed knowledge of the coefficients of the PDE models.

In contrast with the above efforts, our work addresses the problem of estimating the parameters of PDE models via wireless sensor networks [51–53]. Traditionally, identification of physical models has been conducted by a centralized processing system with wired sensor nodes [2,43]. This approach does not take into account typical constraints of wireless sensor networks, such as the limitations in computation, transmission and memory. Furthermore, such systems usually assume perfect knowledge of sensor locations, source signals and boundary conditions. These assumptions may not be realistic in a general sensor network environment. Therefore, there is a great need for a new approach to the physical model identification problem in the context of wireless sensor networks.

In particular, diffusion phenomena are examined in this work. Many physical phenomena relevant to wireless sensor network applications fall into this category. For example, security monitoring of the ground, water and air against the risk of contaminants and polluting agents is among the primary applications of interest in sensor networks, and all of these can be modeled by diffusion PDEs. Furthermore, many of our results can be applied to other classes of physical models, such as the wave field.

1.1.3 Contributions of the Research

The main contributions of this research are listed below.

• A distributed diffusion system identification problem is formulated based on the PDE modeling of a physical field. A solution to this problem is developed, which can be conducted by a cluster-based architecture so that it is suitable for wireless sensor network implementation. Generally speaking, this problem can be represented by simple discrete state space models defined independently at each cluster and solved accordingly. Some novel algorithmic results are stated below.

1. An adaptive distributed approach to the identification of the PDE parameters is proposed based on the Extended Kalman Filter (EKF). Its computational cost in a sensor network context is evaluated.

2. The boundary conditions can be used to model phenomena affected by sources external to the sensor field. The treatment of the boundary conditions is done with care. The boundary conditions may or may not be known in advance; in the latter case, nodes in a sensor network must estimate the initially unknown information about the boundaries.

3. The scenario of spatially varying parameters is also addressed and a simplified estimation method is proposed. This proposed solution decomposes a problem of spatially varying parameters into several local subproblems characterized by spatially invariant parameters.

4. A hybrid data and decision scheme is proposed, where the algorithm switches from initial data fusion to decision fusion iterations in order to save communication energy.
• It is observed that sensor nodes achieve better estimation performance in locations of the field characterized by higher time dynamics and far from points of forced motion of the field (e.g., near sources or constrained boundaries). The measurements at those points provide better estimates, but they are not necessarily characterized by a higher signal-to-noise ratio (SNR). This result may be seen as contrasting with the common view that the performance of signal processing algorithms depends merely on the SNR.

• Some preliminary results on the communication cost of the algorithm are provided. The cost of transmitting the physical field to the base station via PDE models is compared with that of sample-based data gathering.

1.2 Spatially Nonstationary Models

The topic of energy efficient acquisition of a spatio-temporal phenomenon via wireless sensor networks, namely correlated data gathering, is investigated in Chapter 5. In a data gathering round, each node aggregates its own sensed data with data from other nodes, performing entropy coding, and forwards them to its parent node. The final destinations of all data packets are nodes called sinks. The overall energy needed by the (battery supplied) nodes for gathering data depends on the total length of the set of routes to the sinks, which is called the data gathering tree, and on the rates generated by the nodes. The rate depends on the spatial correlation among the data.

In the sensor network literature, problems of power efficient data gathering under an implicit (or explicit) assumption of spatially stationary statistics have been extensively studied, e.g., [58], [45], [14], [70]. However, real phenomena are typically characterized by nonstationary spatial covariance models, due to heterogeneity (e.g., variations in the altimetric profile) in the underlying environment [57]. So far, little is known on the specific impact of nonstationarity on correlated data gathering, which is the major focus of this research.
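To see why spatial correlation drives the per-node rate, consider a toy jointly Gaussian model in which each node encodes its sample conditioned on an already-received neighbour, so its incremental rate is h(X_i | X_j) = 0.5 log2(2πe σ²(1 − ρ²)). The exponential covariance and the parameter values below are illustrative assumptions on our part, not the covariance models analyzed in Chapter 5:

```python
import numpy as np

def incremental_rate(sigma2, rho):
    """Differential entropy (bits/sample) of a Gaussian measurement
    conditioned on one already-encoded neighbour: h(X_i | X_j)."""
    return 0.5 * np.log2(2.0 * np.pi * np.e * sigma2 * (1.0 - rho**2))

# Exponential covariance rho(d) = exp(-d / L): a larger correlation
# length L means neighbours predict each other better, so each node
# needs to forward fewer bits toward the sink.
d = 1.0  # inter-node spacing (arbitrary units, our assumption)
rates = {L: incremental_rate(1.0, np.exp(-d / L)) for L in (0.5, 2.0, 8.0)}
```

A spatially nonstationary phenomenon is one where the effective correlation length varies over the region, which is exactly what makes the rate profile uneven.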
The problem of optimal deployment of sinks for power efficient data gathering of a spatially nonstationary phenomenon is examined in this work. An evident consequence of spatial nonstationarity is that the rate profile has an uneven (or non-uniform) spatial distribution. Some areas may be characterized by more correlated data, therefore generating lower rates, while other areas may generate higher rates due to the higher spatial harshness of the monitored phenomenon. Intuitively, if the rate profile has an uneven distribution with respect to the region center, placing the sink closer to the higher rate area could be more energy efficient than keeping it at the region center. Under the assumption of an even spatial distribution of the rates generated by the sensor nodes, the center would be the most energy efficient location, which is the case when the monitored phenomenon is characterized by stationary spatial statistics. The optimal sink placement problem under spatial nonstationarity of the phenomenon is, however, not well studied in the literature.

We address this problem by studying the properties of rate profiles for a general class of nonstationary spatial covariance models, both analytically and numerically. An interesting aspect is that we are able to relate geo-statistical properties of a physical phenomenon to rate profiles and, hence, to optimal sink locations over a certain region of observation. The main challenges arise from the complexity of nonstationary spatial covariance models and the many factors that affect the rate profile (e.g., the type of distributed source coding scheme being adopted, the structure of the data gathering tree, and the location and density of the nodes). For instance, it is difficult to rely on a well defined graph structure to solve the problem, because the data gathering tree depends on the sink location, which is however the unknown to be solved for.
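The intuition above can be made concrete with a toy single-hop cost proxy: weight each node's distance to the sink by its rate and search for the minimizing location. With a uniform rate profile the optimum sits at the region center, while a skewed profile pulls it toward the high-rate side. This proxy is ours for illustration only; the full multihop, correlation-aware analysis developed in Chapter 5 leads to a different conclusion about how far the optimum actually moves.

```python
import numpy as np

def best_sink(positions, rates):
    """Brute-force the sink location minimising the single-hop proxy
    cost sum_i rate_i * |pos_i - s| over a grid of candidate locations."""
    candidates = np.linspace(positions.min(), positions.max(), 1001)
    costs = [np.sum(rates * np.abs(positions - s)) for s in candidates]
    return candidates[int(np.argmin(costs))]

pos = np.linspace(0.0, 1.0, 51)
uniform_rates = np.ones_like(pos)
skewed_rates = 1.0 + 4.0 * pos        # higher rates toward the right edge

s_uniform = best_sink(pos, uniform_rates)  # lands at the region center
s_skewed = best_sink(pos, skewed_rates)    # pulled toward the high-rate area
```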
Our main result is that the optimal location of the sink turns out to be close (usually within one hop) to the center of the sensor region for a general class of nonstationary covariance models, even though preliminary intuition would suggest that the optimal location may be significantly distant from the center.

1.2.1 Significance of the problem

The study of power efficient data gathering strategies is one of the fundamental areas of research in wireless sensor networks, e.g., [58], [45], [14], [21], [70], [63]. The wide interest in this particular subject is motivated by the fact that wireless sensor nodes rely on limited energy supplied by non-replaceable batteries, and data gathering is typically one of the most recurring and energy consuming tasks performed by a sensor network. In the aforementioned research effort, as one would expect, there is a trend of moving towards more realistic physical assumptions. For example, an independent identical distribution (i.i.d.) was assumed for the phenomena of interest in early work. Then, researchers started considering statistical correlation among observations of the phenomenon in a more realistic setting in recent work [14]. With few and very recent exceptions [35], spatial covariances are assumed to be stationary and isotropic.

Stationary models may be inaccurate for phenomena over heterogeneous environments. For instance, in the case of a mixed indoor-outdoor deployment of temperature sensors, the indoor temperature could ideally be assumed independent of the outdoor one. This implies that a spatially nonstationary model would be more appropriate than a stationary one for this scenario. The outdoor temperature can be considered independent of the indoor one even at points very close to the indoor environment, and it is less spatially smooth. Nonstationary models could also be more appropriate in industrial applications of multimodal sensor networks.
To give an example, we may consider the case of monitoring two completely different phenomena in contiguous environments (e.g., the temperature field and the condition of certain machinery).

1.2.2 Related work

Problems studied in the literature on correlated data gathering include: optimal selection of aggregation paths [14], optimization of the placement of sensor nodes [21], and evaluation of the energy gain of optimal correlation aware routing schemes with respect to shortest path routing schemes [45,70]. More details on the main results of some of these works are elaborated in Section 2.4. The problem of multisink deployment was addressed in [29], where each node was assumed to generate a constant rate under a regular grid topology. All the aforementioned efforts rely on an explicit (or implicit) assumption of stationarity and isotropy of the covariance models, or even of i.i.d. phenomena [29]. To the best of our knowledge, only Guestrin et al. [35] studied the problem of optimal node placement under the assumption of possibly nonstationary phenomena. Their energy metric considered only the number of retransmissions, but neither the rate nor the weight of paths. Nonstationary models for geophysical phenomena have been studied with increasing interest in the field of geostatistics [57], [44], [60].

Overall, there has been little work that addresses the impact of more realistic physical assumptions about the monitored phenomena on wireless sensor network problems. One of the leading motifs in our research is to understand the impact of more accurate physical assumptions on wireless sensor network problems [51]. As mentioned, the heterogeneity of environments makes the stationarity assumption unrealistic in many application scenarios. Abandoning the assumption of stationarity opens several interesting issues on correlated data gathering and may require the definition of new node deployment and data aggregation strategies. The study of sink deployment is one of these new issues.
It requires finding a relationship between the physics of the phenomenon and the spatial distribution of the rate in data aggregation tasks. The understanding of this relationship may also be useful for other sensor network problems (e.g., node deployment).

1.2.3 Contributions

To the best of our knowledge, the problem of the energy efficient deployment of sinks under the assumption of nonstationary phenomena is novel in sensor networks. The main contributions of our effort include the following.

1. We present some analytical solutions for the optimal region partition and sink placement under the assumption of nonstationary statistical models of the phenomena being monitored.

2. We provide analytical and numerical results on the relationship between the spatial statistics of the phenomenon and the spatial distribution of the rate allocation over the sensor region for different distributed coding schemes. In other words, a link between geostatistical modeling and information theory for sensor networks is established.

3. We show, via analysis and experimental results based on real data, that the optimal location of the sink is close to the sensor region center for a general class of nonstationary covariance models.

1.3 Energy Efficient Data Gathering via Distributed Classification

In this problem, we consider the scenario of a network of sensor nodes collecting measurements (e.g., of temperature) from a natural environment. The spatio-temporal recordings of the phenomenon provided by the sensor network can be seen as a sequence of images, where each node represents a different pixel.¹ Here, we use the term sensing frame to indicate a complete spatial round of sensor measurements at a certain time. The goal of this work is to study efficient distributed techniques to automatically analyze and classify sensing frames (i.e., rounds of measurements in the sensing data) and to exploit those techniques for the task of efficient data gathering.
In other words, the network should perform content analysis in order to transmit only frames belonging to a class of interest to the end users, instead of transmitting each sensing frame to the base station as in a typical data gathering application. Consider, for instance, the case of a sensor network collecting rainfall data over a certain region. The end users may be interested in getting data from the network only in the case of intense rain over the whole region. The network could send all the data to the base station, where the data of interest could be extracted by the end users. However, the amount of energy spent by the network could be reduced if the nodes had the ability to perform some energy efficient content analysis on the data and then transmit to the base station only the sensing frames belonging to a class of interest to the end users (e.g., 'rainy day').

¹ Such pixels are not necessarily placed on a regular grid, due to the random placement of the nodes in the sensor field.

Automatic image content analysis requires the extraction of features, such as color and intensity, from the data [16]. In this work, we adopt the intensity histogram as a feature to analyze the content of a sensing frame (a round of measurements). The classification task can be performed by computing the distances of the histograms from a set of reference histograms. The distributed extraction of features from the sensing data, and hence the content analysis task, has a cost for the sensor network. Therefore, if the goal of in-network content analysis is to increase the efficiency of the data gathering process, there are strict constraints on the number and size of the messages (e.g., feature vectors) that can be exchanged for the purpose of analyzing the content.
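The histogram-distance classification just described can be sketched as follows (a minimal sketch: the bin count, value range, L1 distance and the synthetic 'dry'/'rainy' frames are illustrative assumptions, not the thesis's actual parameters):

```python
import numpy as np

# Illustrative sketch of histogram-based sensing frame classification.
BINS = 16
VALUE_RANGE = (0.0, 50.0)   # e.g., rainfall in mm (assumed range)

def frame_histogram(frame):
    """Normalized intensity histogram of one sensing frame (one value per node)."""
    counts, _ = np.histogram(frame, bins=BINS, range=VALUE_RANGE)
    return counts / counts.sum()

def classify(frame, references):
    """Assign the frame to the class whose reference histogram is closest in L1 distance."""
    h = frame_histogram(frame)
    distances = {label: float(np.abs(h - ref).sum()) for label, ref in references.items()}
    return min(distances, key=distances.get)

# Reference histograms averaged over labeled example frames (100 nodes each).
gen = np.random.default_rng(1)
dry_frames = gen.exponential(2.0, size=(20, 100)).clip(*VALUE_RANGE)
rainy_frames = (20.0 + gen.exponential(8.0, size=(20, 100))).clip(*VALUE_RANGE)
references = {
    "dry": np.mean([frame_histogram(f) for f in dry_frames], axis=0),
    "rainy": np.mean([frame_histogram(f) for f in rainy_frames], axis=0),
}

# A new frame would be forwarded to the base station only if it matches a
# class of interest to the end users (here, "rainy").
new_frame = (25.0 + gen.exponential(8.0, size=100)).clip(*VALUE_RANGE)
label = classify(new_frame, references)
```

This centralized sketch ignores the communication cost of assembling the histogram; the distributed, energy-constrained version of exactly this step is what the rest of the section studies.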
In this work, we study such energy constraints and their impact on the design and performance of classification techniques in a sensor network context. We also study suitable methods to increase the potential efficiency of these techniques, by reducing the feature size (e.g., grouping bins and/or subsampling data) and by extracting the features from incomplete subsets of the sensing data. In addition, we study how to estimate the potential energy savings from the network topology and historical series of sensing data, and how spatial subsampling of the measurements can affect classification performance via histograms. Several methods and results presented here can be generalized to other in-network classification methods; throughout, plain data gathering is used as the benchmark.

1.3.1 Significance of the Problem

Data gathering is one of the fundamental problems in wireless sensor networks. If the applications of sensor networks are going to spread widely, and hence the amount of sensing data is going to grow, it is sensible to investigate techniques for the automatic analysis of the sensing content. Moreover, the in-network implementation of such techniques can be adopted to transmit to the base station only data of interest to the end user(s), with potential energy savings for the network. Furthermore, addressing these problems in a sensor network context requires designing methods that work within very strict energy efficiency constraints, and therefore opens new research problems.

1.3.2 Literature Review

There is a broad literature on the topics of automatic still image retrieval [16] and video indexing [40]. Usually the approaches presented for such problems are centralized. There appears to be little literature on the automatic classification of ecosystem data (e.g., temperature). In [61], Sukharev et al. analyze multivariate 3D time series of climatic data (temperature, index of salinity, etc.).
They perform principal component analysis to reduce the size of the data and then k-means clustering to classify data segments of fixed duration into homogeneous classes. All the data processing is done in a centralized manner.

In the sensor network literature, there are several works addressing the problems of distributed edge (contour) detection [9], [37], and distributed classification of objects [17] and of human activities [66]. However, to the best of our knowledge, none of these and similar works attempts to address the problem with the goal of efficiently skimming interesting data to transmit to the base station. Moreover, in some cases it is not clear whether extracting those features in a distributed manner at the sensor network is more energy efficient than performing data gathering and then the extraction at the sink or at the base station. Nor are there many results on the temporal indexing of sensing data on natural phenomena.

A distributed decision tree based classifier is presented in [8]. In this approach, each node classifies its own time series of measurements separately. However, the parameters of the classifiers are updated and shared among the nodes by means of pseudorandom training sequences. Note that in our work we are trying to classify the whole sensing frame at a certain time, instead of the various measurement time series at each node.

Another somewhat related topic is distributed anomaly detection [48]. Its main goal is to detect security attacks on the sensor network, as well as faulty measurements from the sensors. Again, the focus of these works is on analyzing sensing streams at the nodes rather than the sensing frame.

In the second part of this chapter, we study the impact of subsampling on classification via intensity histograms. Some related works are listed here.
The intensity or color histogram has been used in the image and video processing fields for several applications: still image queries/matching [16], additive noise detection [49], steganalysis [26], and shot cut detection in videos [22]. The impact of subsampling images on histogram based retrieval has not yet been studied in depth. A reason is probably that there is very little to gain in terms of computational and memory cost when processing histograms of subsampled images on a PC. In [6] it is explicitly stated that the subsampling of images is not advised when using histograms for retrieval purposes, because aliasing effects can smooth the shape of the histogram. The problem of image retrieval via thumbnail images is studied in [11]; however, no analytical results are provided on the relationship between the histogram of an image and the histogram of its thumbnail. Recently, some works on reversible data hiding techniques [34] and steganalysis [36], [69], [32] have highlighted that a stego image (i.e., the hidden image) can be detected by analyzing the histogram of a subsampled version of the stego message (i.e., the 'cover + stego' image).² However, some of the assumptions of those authors do not apply to our problems, because the number of pixels of an image is much larger than the number of 'pixels' of a sensing frame. Besides, they do not focus on the problem of histogram matching. The effect of subsampling on intensity histograms has also been studied in medical imaging [4].

The problem of serial decision fusion has been studied for target detection [64]. A fuzzy logic scheme is suggested in [31] for a mobile sensor network where nodes can take decisions from incomplete sets of data. However, those results do not seem to apply to our specific problem.

1.3.3 Contributions

To the best of our knowledge, this is the first attempt to study the problem of distributed content classification for sensor networks in a comprehensive manner.
We claim the following contributions.

• We define a class of admissible features that satisfies strict energy efficiency constraints. We propose algorithms for the distributed real time extraction of such features and the automatic classification of rounds of sensing data. Our work highlights trade-offs and key design goals and guidelines to deal with the aforementioned problems by means of analytical and numerical results.

• We provide an analytical and experimental (via synthetic and real data) study of the impact that spatial subsampling has on the histogram feature and hence on the classification process (Sec. 6.4). This is the most challenging problem addressed by this work; our approach seems to be in the right direction to provide a complete analytical characterization of the problem.

• We study the classification performance for the rainfall data set over spatially subsampled data via several experiments (Subsection 6.5.3).

• We propose an approach for distributed classification of the sensing frame based on a heuristic measure of confidence, with some preliminary experimental results. With this framework, nodes other than the sink can also take classification decisions for the whole network (Sec. 6.6).

² The histogram of the sum of a cover image and a stego image is equal to the convolution between the histograms of the stego and cover images, by analogy with the PDF of the sum of two independent random variables [26].

1.4 Outline of the Dissertation

The remaining chapters of this thesis are organized as follows. Some background information on wireless sensor networks, PDE and diffusion phenomena, spatially nonstationary correlation models, information theory for sensor networks and image retrieval is provided in Chapter 2. The basic framework for the identification of diffusion parameters via sensor networks is presented in Chapter 3, where the sensor region is a one-dimensional space and the coefficients are constant.
Some extensions of the basic framework to the case of two-dimensional sensor regions, spatially varying parameters and a decision fusion implementation are given in Chapter 4. Chapter 5 presents the study on optimal sink placement for sensor networks monitoring spatially nonstationary models and adopting correlated data gathering. The problem of efficient data gathering via distributed classification of the rounds of sensing data is studied in Chapter 6. Finally, concluding remarks, open problems and future directions of research are discussed in Chapter 7.

Chapter 2
Background Review

This chapter provides some background information for this research. Some basic notions on wireless sensor networks are discussed in Sec. 2.1. Diffusion phenomena and their PDE modeling are addressed in Sec. 2.2. Previous work on physical-model based identification using traditional centralized systems is reviewed in Sec. 2.3. Finally, information theory for sensor networks and spatially nonstationary models are covered in Sec. 2.4.

2.1 Wireless Sensor Networks: Applications and Tasks

A wireless sensor network consists of a set of sensing devices with computation, storage and communication capabilities. Such devices are required to adopt intelligent cooperation strategies to autonomously process their sensed data and extract useful information from them. This information is sent to a possibly remote base station either automatically or in response to a specific query, as shown in Fig. 2.1.

One of the research goals in this field is to manufacture small (< 1 cm) and inexpensive (< $1) devices so that they can be deployed on a very large scale to provide very dense measurements of the environment. This may enable different interesting applications not yet feasible with current technologies. For this reason, wireless sensor networks are seen as one of the most promising technologies for the 21st century [10]. The requirement of inexpensive devices brings limitations in energy supply, computation, storage and communication abilities.
In many scenarios, a sensor node becomes unusable once its battery is drained, since the battery is not replaceable. The above constraints, along with the requirements of large scale deployment and autonomous operation, lead to many research challenges in the scientific community [18].

Research on wireless sensor networks spans a very broad set of subjects, such as computer science, networking, signal processing, distributed computing, operations research, information theory and telecommunications. Furthermore, the evolution of the field is very fast, with hundreds of new publications every year. Thus, it is difficult to assess the emerging trends and identify key milestone papers. Even articles that are referred to many times today may not turn out to be essential in building effective applications tomorrow. Therefore, it is challenging to prepare an extensive and updated survey on the development of sensor network technologies. In the following, we will discuss some applications of sensornet technologies and then outline the main tasks that are fundamental to a generic sensor network application.

[Figure 2.1: Overview of a generic sensor network system, where the nodes send information to a remote base station communicating through a sink node. The information can be sent either by autonomous initiative of the network or as an answer to queries from the base station.]

2.1.1 Applications

Some of the main application classes are given below [10].

• Detection, localization and tracking of localized objects.
Target tracking has been among the first set of applications of this technology [5]. Most of the initial funding for this research came from planners of military systems.

• Monitoring of diffusion phenomena and/or environments.
Ecosystem monitoring is one of the motivating applications for sensor network research [18], due to the potentially dense sampling of environments achievable thanks to this technology. Other applications are related to monitoring the security of water and air against possible biological or chemical attacks.

• Monitoring of infrastructures.
To give an example, sensor networks can be applied to constructions such as bridges, dams and high-rise buildings to study their reactions to earthquakes.

• Industrial sensing.
Wireless sensor networks are potentially interesting to commercial industry, since they may allow replacing a wired sensor system with a wireless one, thus eliminating a large number of wires. The tasks performed by sensor nodes may span from monitoring the health of machines to controlling levels of chemical reagents and/or lubrication levels.

• Traffic control.
Inexpensive sensor devices may enable the monitoring of traffic at road intersections [10], while currently deployed sensors cover only critical intersections due to cost constraints. Furthermore, people envision sensors being applied to cars to propagate traffic information from vehicle to vehicle.

2.1.2 Main tasks

In this subsection, we review the classes of tasks to be performed in a generic sensor network application and briefly outline some related research problems.

• Node deployment.
This is the first step. The nodes can be either manually deployed at ad hoc locations or randomly deployed, e.g., thrown from an aircraft over a certain environment.

• Network initialization.
The network initialization includes all preliminary tasks that the sensor nodes need to perform in order to set up an autonomous sensor network, such as self localization (most applications require nodes to know their locations in the field), time synchronization (all nodes should have a common time reference), in situ sensor calibration (for nodes to normalize their measurements), node clustering, and tasks to create radio links among the nodes. Many of these are open problems.

• In-network data acquisition and processing.
In this stage, sensor nodes sample the environment and process their data. The tasks may be as simple as taking samples from the environment at a certain sampling time, or as complicated as performing collaborative signal processing algorithms. The fact that there is cooperation among nodes means that inter-node communication is involved. Often, the nodes are partitioned into clusters to perform distributed compression, target tracking, edge detection, etc. The problem of this research, i.e., estimating diffusion parameters, falls in this category, too.

• Off-network data transmission.
The off-network data transmission consists of all operations to transport data from nodes to the sink and then from the sink to the base station. It also includes sending data from the base station to nodes; those data can be either queries or software updates. The problems involved include network connectivity, routing and security, to name a few. Also, the task of continuously collecting all node measurements and sending them round by round to the sink is often referred to as data gathering.

Most applications involve a recursive alternation of in-network data acquisition/processing and off-network data transmission. The information is extracted within the network and then sent to the base station.
We observe that there is usually a trade-off between the amount of information sent to the sink and the volume of data processing during the in-network acquisition stage. Thus, at one extreme we can have a network simply taking samples, without inter-node processing, and then sending all the raw data to the base station (data gathering). Or, at the other extreme, we can have the network do a lot of in-network processing and return, for instance, only answers to queries to the base station. Our research problem, i.e., the estimation of diffusion parameters, involves both in-network and off-network communications. It demands a larger amount of in-network processing to reduce the amount of information to be sent to the sink.

The lifetime of a sensor node is limited due to the energy supply from nonreplaceable batteries. The radio is the most energy consuming component of a sensor node. In particular, the transmission cost is higher than the receiving and standby costs; however, the latter two are not negligible [47]. In this regard, researchers have investigated the fundamental bounds on the lifetime of a sensor network. It is interesting to note that they typically assume a data gathering model, where little processing of data is done at the nodes and most measurements are sent to the sink round by round [33].

2.2 Diffusion Partial Differential Equation (PDE)

2.2.1 Introduction

This section introduces the diffusion partial differential equation. A partial differential equation (PDE) is an equation relating partial derivatives of the dependent variable. Examples of PDEs are:

x_t = x_ξξ − x_ξ,   (2.1)

x_tt = x_ξξ + u(ξ,t),   (2.2)

where t and ξ are the time and space domain variables, respectively, and x_t and x_tt (or x_ξ and x_ξξ) denote the first and second-order partial derivatives with respect to t (or ξ). Note that Eq. (2.1) is a parabolic equation, while Eq. (2.2) is a hyperbolic equation. PDEs can be solved with the knowledge of the initial condition (IC), i.e., x(ξ,0) = x_0(ξ), and boundary conditions (BCs).
There are several types of boundary conditions, modeling different kinds of physical constraints at the boundaries. The two most common types of BCs are the Dirichlet and Neumann conditions. Given a boundary point ξ_i = 0 or L, the Dirichlet condition prescribes a certain value of the scalar field:

x(ξ_i, t) = ψ_i(t).   (2.3)

The Neumann condition imposes a certain flux at the boundary:

x_ξ(ξ,t)|_{ξ=ξ_i} = ψ_i(t).   (2.4)

Consider a space and time varying scalar field x(ξ,t) associated with a certain physical phenomenon. The diffusion equation is a PDE of the form [23]:

x_t(ξ,t) = D x_ξξ(ξ,t),   0 < ξ < L,   (2.5)

where the scalar D is called the diffusivity. Typically, x(ξ,t) represents a concentration. The meaning of Eq. (2.5) is that there is a net flow of particles from regions characterized by a higher concentration to those of a lower concentration. Heat diffusion is another example of the diffusion process. In such a case, Eq. (2.5) is called the heat equation and x(·) represents a temperature, T(·). The general expression of a higher-dimensional diffusion equation can be written as

x_t = D ∇² x,   (2.6)

where ∇² is the Laplacian operator.

An interesting property of the diffusion equation is its smoothing effect. Thanks to the proportionality between the time derivative x_t(ξ,t) and the local curvature x_ξξ(ξ,t), Eq. (2.5) has a low pass filtering action on the spatial profile of x(ξ,t) over time. Besides, the diffusion equation is invariant under reflection operations in space, i.e., x_t(ξ,t) = x_t(−ξ,t). However, it is irreversible with respect to time t. In more general cases, the diffusion equation can have lower order derivative terms x_ξ(ξ,t), x(ξ,t) and a source u(ξ,t). That is, we have

x_t(ξ,t) = θ_1 x_ξξ(ξ,t) + θ_2 x_ξ(ξ,t) + θ_3 x(ξ,t) + u(ξ,t).   (2.7)

The parameter θ_2 is sometimes called the velocity, while θ_3 is referred to as dissipation or dispersion. The parameters {θ_i} can be space and time varying.
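The low pass (smoothing) action of Eq. (2.5) can be checked numerically with a short simulation (a sketch; the diffusivity, grid size, step count and initial profile are arbitrary illustrative choices, and the update rule is the standard explicit finite-difference scheme reviewed in Sec. 2.2.3):

```python
import numpy as np

# Simulate x_t = D x_ξξ on (0, 1) with homogeneous Dirichlet boundaries,
# starting from a deliberately jagged (high-curvature) profile, and verify
# that the high-frequency content decays (illustrative parameters).
D = 1.0
h = 0.01                        # spatial step
t_s = 0.4 * h**2 / (2.0 * D)    # sampling time, safely below the bound h^2/(2D)
x = np.zeros(101)
x[30:70:2] = 1.0                # alternating 1/0 spikes: very rough profile

def roughness(v):
    """Sum of squared first differences: a proxy for high-frequency content."""
    return float(np.sum(np.diff(v) ** 2))

r0 = roughness(x)
coeff = D * t_s / h**2          # = 0.2, within the stability region
for _ in range(500):
    # Explicit update: x_i(k+1) = x_i(k) + coeff * [x_{i-1} - 2 x_i + x_{i+1}]
    x[1:-1] = x[1:-1] + coeff * (x[:-2] - 2.0 * x[1:-1] + x[2:])
r1 = roughness(x)               # r1 << r0: the jagged profile has been smoothed
```

The highest spatial frequency decays fastest, which is exactly the low pass filtering behavior described above.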
Under particular hypotheses on the boundaries and on the parameters (say, they must be constant), the diffusion equation can be solved analytically, e.g., by Fourier series. Otherwise, numerical methods are needed.

Fig. 2.2 shows the ranges in cm²/s for the diffusivities of gases, liquids and solids. Note that the diffusivities of gases and liquids vary over tight ranges, while those of solids vary over a much wider range. The diffusivities of gases and liquids are a function of the temperature [23].

[Figure 2.2: The diffusivity ranges in cm²/s: gases 10 ÷ 10⁻¹, liquids 10⁻⁴ ÷ 10⁻⁶, solids 10⁻⁸ ÷ 10⁻²⁰.]

2.2.2 Analytic solution of the heat equation

For the following 1-D heat equation

x_t = D x_ξξ,   −∞ < ξ < ∞,   x(ξ,0) = f(ξ),   (2.8)

we have the closed form solution

x(ξ,t) = (1/√(4πDt)) ∫_{−∞}^{∞} f(s) exp(−(ξ−s)²/(4Dt)) ds.   (2.9)

Note that when f(ξ) = δ(ξ−a), the solution becomes

x(ξ,t) = (1/√(4πDt)) exp(−(ξ−a)²/(4Dt)).   (2.10)

An example of this solution is plotted in Fig. 2.3 at different time instances.

[Figure 2.3: An example of a 1-D heat diffusion field with x(ξ,0) = δ(ξ) and D = 1, plotted at t = 0.001 and t = 0.032.]

Green's functions allow an integral representation of the solution of a PDE [23]. In this context, the solution can be seen as a space and time varying convolution between the source term and the Green's function. Consider as an example the heat equation with an active source u(ξ,t), given as

x_t = x_ξξ + u(ξ,t),   0 < ξ < 1,   t > 0,   (2.11)

subject to

x(ξ,0) = f(ξ),   0 < ξ < 1,   (2.12)
x(0,t) = x(1,t) = 0.   (2.13)

Then, the solution can be written as [23]

x(ξ,t) = ∫₀¹ f(s) G(ξ; s, t) ds + ∫₀ᵗ ∫₀¹ u(s,τ) G(ξ; s, t−τ) ds dτ,   (2.14)

where the Green's function for this case is given by

G(ξ; s, t−τ) = 2 Σ_{n=1}^{∞} sin(nπs) · sin(nπξ) · exp(−(nπ)²(t−τ)).   (2.15)

2.2.3 Numerical solution

To solve the diffusion equation by numerical methods, the derivatives must be approximated with finite differences as

x_ξ(ξ,t)|_{ξ=ih} ≈ (x_{i+1}(t) − x_i(t)) / h,   (2.16)

and

x_ξξ(ξ,t)|_{ξ=ih} ≈ (x_{i+1}(t) + x_{i−1}(t) − 2x_i(t)) / h²,   (2.17)

where h is the spatial sampling period. The application of the above finite differences to Eq. (2.5) at a generic point of discretization (ξ,t) = (ih, k t_s) gives

x_i(k+1) = (D t_s / h²) [x_{i−1}(k) − 2x_i(k) + x_{i+1}(k)] + x_i(k),   (2.18)

where t_s is the sampling time and x_i(k) = x(ih, k t_s). It can be shown that the sampling time t_s must obey the following inequality for the discrete approach to converge [23]:

t_s < h² / (2D).   (2.19)

Since the size of the sampling space can also be constrained by the profile of the scalar field, the inequality in Eq. (2.19) can pose a severe restriction on the sampling time.

2.3 Model Based Identification

The problem of identification of the parameters of diffusion models has been investigated through mathematical and numerical methods by researchers in the area of materials science [30,67]. Besides, the identification and control of systems modeled by PDEs have been studied by researchers in the fields of automatic control and applied mathematics [2,43]. In this context, systems represented by PDEs are referred to as infinite dimensional or distributed systems¹ [15]. Moreover, a lumped system indicates a system that has discrete components (e.g., the discretized version of an infinite dimensional system). Here, we follow the terminology of system theory, but reserve the term distributed algorithms to indicate algorithms using multi-point independent computations, and use the term distributed computation with the same meaning as commonly used in the sensor network context.

Several adaptive algorithms for the identification of the parameters of infinite dimensional systems in the continuous time-space domain have been presented [2,43].
One major application of this study is the automatic control of PDE-modeled industrial plants. Adaptive control schemes that rely on the estimation of the system states and unknown parameters of a plant were proposed by Baumeister et al. [2]. They also developed a theory on the use of a finite dimensional approximation to the infinite dimensional estimator. Orlov and Bentsman [43] studied the problem of identifying an infinite dimensional system with spatially varying parameters. They derived some constructive identifiability conditions that allow one to determine persistently exciting input sources. In [68], Syrmos et al. formulated the problem via discrete-time state space equations and then estimated the parameters using nonlinear filtering techniques such as the extended Kalman filter (EKF).

¹ In this context, the term distributed systems has a different meaning from that normally given in the sensor network literature.

All the aforementioned algorithms are centralized, and sensors are assumed to be connected to the computing center directly. Besides, the computing center is assumed to have perfect knowledge of the boundary conditions and manipulability of the sources. These assumptions are reasonable in classic system control and identification problems, but not applicable in a wireless sensor network environment. In the latter context, the amount of data being exchanged among nodes is a critical parameter, due to the limited amount of energy at sensor nodes and the relatively high communication cost. We address some of the above constraints in this work, as described in the next chapters.

2.4 Information Theory for Sensor Networks and Spatially Nonstationary Models

2.4.1 Joint coding

Sensor networks deal with the sampling of continuous quantities. In this work, nodes are assumed to perform uniform quantization with step size equal to Δ. Some results on joint coding used in our work are given below [12].
The joint entropy of a set of n random variables {Z_i} with density f(z_1, z_2, ..., z_n) is approximately

H(Z_1, Z_2, ..., Z_n) ≈ h(Z_1, Z_2, ..., Z_n) − n log Δ,   (2.20)

where log denotes log_2 and h(·) denotes the differential entropy (see [12] for the definition). If {Z_i} have a multivariate normal distribution N_n(μ, K), i.e.,

f(z) = (1 / ((√(2π))^n |K|^{1/2})) exp(−(1/2)(z−μ)^T K^{−1} (z−μ)),   (2.21)

where μ is the vector of the means and K is the covariance matrix, then the joint entropy of {Z_i} is approximately [12]

H(Z_1, Z_2, ..., Z_n) ≈ (1/2) log_2 [(2πe)^n |K|] − n log_2 Δ.   (2.22)

2.4.2 Distributed coding

In a sensor network framework, the rate at the nodes (i.e., the joint entropy) depends on the transmission structure (namely, the data gathering tree) and on the coding strategy being adopted. Figure 2.4 shows an example of a data gathering tree. The two main distributed lossless coding approaches studied in sensor networks are [14]:

• explicit communication;
• Slepian-Wolf coding.

In the explicit communication scheme, a node encodes its data conditioned on the data received from its children. The amount of side information available at a certain node i depends on the number of children of i and on the spatial correlation among the data of the children.

Example 1. In Figure 2.4, node Z_2 encodes its own data conditioned on the data from node Z_3 and then sends them to node Z_1. The incremental rate at Z_2 is at least H(Z_2/Z_3); the incremental rate at Z_1 is at least H(Z_1/Z_2,Z_3). The total rate at node Z_1 is at least

R(1) = H(Z_3) + H(Z_2/Z_3) + H(Z_1/Z_2,Z_3).

[Figure 2.4: An example of a data gathering tree.]

In the Slepian-Wolf scheme, a node encodes its data based on global knowledge of the statistics of the phenomenon, without necessarily considering the side information from received data [14].
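Equation (2.22) and the rate bookkeeping of Example 1 can be sketched numerically for jointly Gaussian, uniformly quantized sources (a sketch: the unit-variance exponential covariance and the step size Δ below are illustrative assumptions):

```python
import numpy as np

# Approximate entropies of quantized Gaussians via Eq. (2.22), and the
# explicit-communication total rate of Example 1 (illustrative parameters).
DELTA = 0.01   # quantization step size

def H(K):
    """Approximate joint entropy (bits) of quantized Gaussians, Eq. (2.22)."""
    K = np.atleast_2d(np.asarray(K, dtype=float))
    n = K.shape[0]
    return 0.5 * np.log2((2.0 * np.pi * np.e) ** n * np.linalg.det(K)) - n * np.log2(DELTA)

# Covariance of (Z1, Z2, Z3) on a line, with the model K(d) = exp(-d).
dist = np.abs(np.subtract.outer(np.arange(3.0), np.arange(3.0)))
K = np.exp(-dist)
sub = lambda *i: K[np.ix_(i, i)]           # covariance of a subset of sources

H123 = H(K)
H3 = H(sub(2))
H2_given_3 = H(sub(1, 2)) - H(sub(2))      # H(Z2 / Z3)
H1_given_23 = H123 - H(sub(1, 2))          # H(Z1 / Z2, Z3)

# Example 1's explicit-communication total at node Z1 equals the joint
# entropy (the Slepian-Wolf sum rate) by the entropy chain rule:
R_explicit = H3 + H2_given_3 + H1_given_23
# Coding each source independently ignores the correlation and costs more:
R_independent = H(sub(0)) + H(sub(1)) + H(sub(2))
```

For this single-chain tree the chain rule makes the explicit-communication total coincide with the joint entropy; the schemes differ in which tree structures and side information they require, as discussed next.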
Slepian and Wolf showed that two correlated random sources can encode their own data at an overall rate equal to the joint entropy, R_1 + R_2 = H(Z_1, Z_2), without the need of communicating, as long as their individual rates are at least equal to the conditional entropies H(Z_1/Z_2) and H(Z_2/Z_1) [12]. The result can be generalized to the n-dimensional case. Note that if the two sources do not know the global statistics and do not communicate, then their overall rate is R_1 + R_2 = H(Z_1) + H(Z_2) ≥ H(Z_1, Z_2). Slepian-Wolf is the optimal distributed coding strategy, because it achieves the best exploitation of the correlation among the data.

2.4.3 Correlated data gathering

Correlated data gathering studies problems related to energy efficient distributed sampling and compression of spatio-temporal fields. All the sensing data are sent via multi-hopping to one or more sinks. The nodes are assumed to have a limited energy supply. Data transmission is the most energy consuming operation performed by a node. Typically, the cost for transmitting a packet from a node is expressed as [rate] × [distance]^k for channels affected by low noise and as [exp(rate)] × [distance]^k for very noisy channels [50], with k = 2,...,4. In order to decrease the rate and save energy, a node can compress its own data either using the side information received from other nodes or using a priori information on the correlation of the phenomenon (Slepian-Wolf scheme). The set of routes heading to the sink (the data gathering tree, see Fig. 2.4) also needs to be chosen properly so as to minimize the energy consumption. In large sensor network deployments there may be more than one sink. This usually implies a partition of the nodes, so that each node communicates either directly or indirectly with one and only one sink.

Cristescu et al. proved that the design of the optimal correlation-aware data gathering tree is an NP-complete problem for explicit communication schemes [14].
In this case, the optimal structure is the Steiner minimum tree (SMT). If global Slepian-Wolf coding is performed, the optimal transmission structure is the shortest path tree (SPT), and the most energy efficient coding strategy consists in conditioning one node's data on the data of all the nodes closer to the sink [14]. This strategy is called optimal Slepian-Wolf. It assigns the lowest possible rate to the nodes most distant from the sink.

In [70], the authors showed that the ratio of the energy cost of shortest path aggregation strategies over the energy cost of optimal correlation driven data gathering strategies (Steiner minimum tree) grows slowly, as Θ(√(log N)), where N is the number of nodes in the network. The practical implication of this result is that routing driven aggregation strategies, such as the shortest path tree, seem preferable to correlation aware structures because of the simplicity in the construction of the data gathering tree and the comparable, although not optimal, energy cost. All the aforementioned works usually consider stationary spatial covariance models (often K(d) = exp(−a d^k), where d is the distance between two sensing nodes).

2.4.4 Spatially nonstationary models

Geostatistics [13], [24] deals with the problem of developing statistical models for geophysical processes from a limited number of observations:

Z(x,t),  (x,t) ∈ R^p × R.   (2.23)

Second order moments of Z(·) are used for prediction of the random field and various other tasks. The covariance depends on the space-time coordinates (x_1,t_1) and (x_2,t_2). Usually, simplifying assumptions are made. The random field Z has a separable covariance function if

K[Z(x_1,t_1), Z(x_2,t_2)] = K_S(x_1,x_2) K_T(t_1,t_2).   (2.24)

Separability is one of the assumptions used in our work; we focus only on the spatial statistics of the phenomena.
A process Z(x,t) has a spatially stationary covariance if C[Z(x_1,t_1), Z(x_2,t_2)] depends on the observation points x_1 and x_2 only via the spatial separation vector x_1 − x_2.

A key property of covariance functions is non-negative definiteness. A covariance function needs to be non-negative definite to be associated with a physically realizable process, i.e.:

∑_{i=1}^{n} ∑_{j=1}^{n} a_i a_j K(x_i, x_j) ≥ 0  for all a_i, a_j, x_i, x_j.   (2.25)

This property is hard to verify for arbitrary nonstationary covariance functions. In other words, simply plugging some spatially varying parameters into a stationary model may not lead to a valid correlation model or, at least, to a model that can be shown to be valid.

Nonstationary spatial correlation models have been studied in the field of geostatistics since 1992, with most of the recent results, including closed-form expressions, appearing in the last four years. Many of the models assume separability of time and space, with temporal independence (possibly after some preprocessing of the processes). The requirement of non-negative definiteness poses the main challenge in this research field.

Perhaps the simplest way to model a nonstationary spatial covariance is by means of a set of locally stationary orthogonal models. This method, although mathematically simple, is not very flexible for representing real phenomena [56]. There are three main classes of approaches to define nonstationary spatial correlations: spatial deformation approaches, convolutional methods, and closed-form expressions.

In [57], nonstationary spatial statistics are defined through a bidimensional mapping from R^2 to R^2 of the sensor region. The observation points in the geographical plane (G-plane) are re-mapped to a deformed plane (D-plane) defined so that their spatial correlation structure is stationary over the D-plane. Those methods adopt multidimensional scaling (MDS) to determine the mapping of the observation points to the deformed region.
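The requirement in Eq. (2.25) can be probed numerically on any finite set of points by checking the eigenvalues of the matrix [K(x_i, x_j)]. The sketch below contrasts a naive plug-in of a spatially varying parameter (which is not even symmetric, hence not a covariance) with a covariance of the closed-form family discussed in this section; the 1-D construction used here, with a scalar Σ(x) = ℓ(x)² and a Gaussian shape, is an illustrative assumption:

```python
import numpy as np

def gram(cov, pts):
    """Matrix [K(x_i, x_j)] of Eq. (2.25) on a finite set of points."""
    return np.array([[cov(a, b) for b in pts] for a in pts])

pts = np.linspace(0.0, 1.0, 30)

# Naive plug-in: spatially varying decay a(x1) in a stationary model.
a = lambda x: 1.0 + 5.0 * x
k_naive = lambda x1, x2: np.exp(-a(x1) * abs(x1 - x2))
G_naive = gram(k_naive, pts)
assert not np.allclose(G_naive, G_naive.T)  # not symmetric: not a covariance

# Closed-form construction: Sigma(x) reduces to a scalar ell(x)^2 and the
# kernel choice yields a Gaussian shape; non-negative definite by construction.
ell = lambda x: 0.05 + 0.5 * x              # length scale growing with x
def k_valid(x1, x2):
    s1, s2 = ell(x1) ** 2, ell(x2) ** 2
    s12 = 0.5 * (s1 + s2)
    return (s1 * s2) ** 0.25 / np.sqrt(s12) * np.exp(-(x1 - x2) ** 2 / s12)

G = gram(k_valid, pts)
assert np.allclose(G, G.T)
assert np.linalg.eigvalsh(G).min() > -1e-10  # satisfies Eq. (2.25)
```

The eigenvalue check is only a necessary condition on the chosen point set, but it catches invalid plug-in models immediately.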
However, finding the general mathematical relation between the G-plane and the D-plane has generally high complexity [24].

In [28], a convolutional approach to define nonstationary correlation models is presented. The idea is to define a nonstationary spatial process as a moving average model based on spatially varying kernels. The choice of the kernels determines desired properties in the correlation function, e.g., anisotropy. The motivation for defining the correlation through the kernels is to ensure positive definiteness of the correlation by construction [56].

There are also recent works that derived closed-form expressions for nonstationary covariance functions [44], [60]. The following theorem provides a closed-form expression for a nonstationary covariance [60]:

Theorem. Suppose Σ is a mapping from R^p to the positive definite p×p matrices, μ is a nonnegative measure on [0,∞], and g(·,x) ∈ L²(μ) for each x ∈ R^p. Then an expression for a covariance function is given by:

K(x_1, x_2) = (|Σ(x_1)|^(1/4) |Σ(x_2)|^(1/4) / |Σ(x_1,x_2)|^(1/2)) ∫_0^∞ e^(−w Q(x_1,x_2)) g(w,x_1) g(w,x_2) μ(dw),   (2.26)

where Σ(x_1,x_2) = (1/2)Σ(x_1) + (1/2)Σ(x_2) and Q(x_1,x_2) = (x_1 − x_2)' Σ(x_1,x_2)^(−1) (x_1 − x_2).

The covariance expression in (2.26) can be tuned to have desired short and long term properties via proper choices of g(·,x) and Σ.

Chapter 3
PDE Parameter Estimation via Sensor Nets: Basic Framework

The basic framework to identify the parameters of partial differential equation (PDE) models via a sensor network system is described in this chapter. Our discussion will focus on parabolic PDE's. This class of PDE's can model a broad set of physical phenomena related to the diffusion of gases and fluids in various environments, which is of interest to many sensor network applications. However, this framework is applicable to physical phenomena modeled by other classes of PDE's as well (e.g., acoustic waves).
To convey the basic idea that is essential to our treatment, we consider the one-dimensional sensor field in this chapter. The extension to the two-dimensional sensor field will be given in the next chapter.

The chapter is organized as follows. An overview of the proposed framework is given in Sec. 3.1. The problem formulation is presented in Sec. 3.2. The PDE modeling with known boundary conditions is examined in Sec. 3.3. A Kalman filtering technique is proposed for parameter estimation in Sec. 3.3.2. The PDE modeling with unknown boundary conditions and active sources in the sensor field are treated in Secs. 3.4 and 3.5, respectively. Finally, simulation results are shown and discussed in Sec. 3.6.

3.1 Overview of Proposed Framework

The first step in our framework is to derive a discrete-time state-space model from the PDE, which is continuous in both space and time, to represent the physical field being monitored by sensor nodes. Here, the state of the system is the spatially sampled physical field. The sensor nodes are assumed to be partitioned into clusters. The state space model can either represent the physical field being monitored by the entire sensor network (i.e., the global model) or a portion of the physical field being sampled by a given cluster (i.e., the local model). The boundary conditions (BC's) of the PDE are among the factors involved in the definition of the state space model. We address the cases of known and unknown BC's in Secs. 3.3 and 3.4, respectively.

After the set-up of the model, a cluster head adaptively processes measurements received from its member nodes. This corresponds to a data fusion architecture. Under the assumption of Gaussian noise, the optimal way to estimate the physical model from randomly deployed nodes is to perform state prediction through Kalman filtering. Here, we are interested in estimating the parameters of the model (or, equivalently, the PDE coefficients).
Therefore, the state space system is augmented by including these parameters as additional state variables. The parameters and the physical field are estimated jointly via the extended Kalman filter (EKF). The parameter estimates may then be sent to the base station or used for other network tasks.

Figure 3.1: An overview of the proposed system.

The estimation of parameters of PDE-based physical models can have many applications in sensor networks. For instance, the parameters can be estimated in response to a query from the BS for the study of physical properties of a particular environment. Moreover, the knowledge of the parameters along with the initial and boundary conditions allows us to solve the PDE problem. Thus, the estimates of the PDE coefficients could be used by nodes to perform in-network prediction (or reconstruction) of the physical field via standard Kalman filtering. Under such a scenario, only relevant events need to be notified to the base station. Alternatively, the estimates could be sent to the BS to supply the information necessary to reconstruct and predict the phenomenon without the need for the sensor network to continuously send raw data to the BS. Note that the latter option requires the transmission of a smaller amount of data with respect to data gathering, since the rate of variation of the parameters in time and space is slower than that of the field itself. Furthermore, the estimates can be used for coding purposes [3].

3.2 Problem Formulation

We consider a finite set of sensor nodes S = {s_i} deployed over the one-dimensional sensor field [0,L]. The nodes measure discrete-time samples of the scalar field x(ξ,t), continuous in time for t > 0 and in space for 0 < ξ < L. It is assumed that the evolution in time and space of x(·)
can be modeled by the following sourceless diffusion equation

x_t(ξ,t) = θ_1 x_ξξ(ξ,t) + θ_2 x_ξ(ξ,t) + θ_3 x(ξ,t),   (3.1)

with initial condition (IC) x_0(ξ). The boundaries, ξ = 0 and ξ = L, are subject either to Dirichlet or to Neumann conditions, not necessarily homogeneous. For simplicity, we consider only the one-dimensional (1-D) case in space here. The extension to a higher dimensional case will be given in Chapter 4. Also, note that Eq. (3.1) can be used to model real phenomena such as the diffusion of a fluid in a water channel or the propagation of heat in a metallic rod. Notice that, even though we did not include a source term in our model (3.1), the BC's can be used to model external sources. Hence, the model in Eq. (3.1) can reflect many real world scenarios. Sec. 3.5 treats the case of parameter identification when there are active sources in the field.

It is assumed that node measurements are corrupted by additive zero-mean white Gaussian noise, which models thermal noise from the analog/digital circuitry connected to the sensor device and also the finite precision effect of the A/D converter. Thermal noise usually dominates over the quantization effect. Furthermore, we assume that nodes {s_i} know their absolute locations {p_i} in the field and are partitioned into clusters {S_j}. We also assume that there is reliable communication among nodes (i.e., no packet loss) and that member nodes can talk simultaneously to the leader without interference within a cluster. The task of the sensor network is to identify the parameters {θ_j}, j = 1,2,3, at some nodes {s_i}.

3.3 Systems with Known Boundary Conditions

This section describes our approach to the identification of the parameters of diffusion models under the assumption that the boundary conditions are known a priori and that this knowledge is embedded in the nodes of the sensor network.
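A forward-time, centered-space step of Eq. (3.1) on a uniform grid can be sketched as follows; the coefficient values, the grid, and the Dirichlet data are illustrative assumptions, not values from the dissertation:

```python
import numpy as np

# Explicit finite-difference step for Eq. (3.1),
# x_t = th1*x_xx + th2*x_x + th3*x, with Dirichlet BC's.
th1, th2, th3 = 0.1, 0.02, -0.05        # illustrative parameter values
L, N = 1.0, 50
h = L / N
ts = 0.4 * h * h / (2 * th1)            # respects the bound ts < h^2 / (2*th1)

xi = np.linspace(0.0, L, N + 1)
x = 2 * np.cos(np.pi * xi / 2)          # illustrative initial condition
x[0], x[-1] = 2.0, 0.0                  # Dirichlet boundary values

for _ in range(500):
    x[1:-1] = (
        x[1:-1]
        + ts * th1 * (x[2:] - 2 * x[1:-1] + x[:-2]) / h**2   # theta1 * x_xx
        + ts * th2 * (x[2:] - x[:-2]) / (2 * h)              # theta2 * x_x
        + ts * th3 * x[1:-1]                                  # theta3 * x
    )
assert np.all(np.isfinite(x))           # stable under the sampling constraint
assert x.max() <= 2.0 + 1e-9            # no spurious growth of the field
```

With the time step chosen inside the stability bound, the explicit scheme remains bounded; this is the same discretization that underlies the lumped model derived below.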
This may be the case in some controlled plant application scenarios, or when the sensor region extends over the phenomenon region so that homogeneous Dirichlet BC's can be assumed. Then, a global discrete model for the physical phenomenon being monitored can be considered. The proposed discrete time-space model is presented in Sec. 3.3.1. The algorithm to identify the system parameters is derived in Sec. 3.3.2.

3.3.1 Model discretization and clustering

We derive a discretized model for the PDE problem described in Eq. (3.1) to be adopted by the nodes in order to identify the parameters. We assume that the information on the exact BC's is provided to the nodes (e.g., by means of some information flooding mechanism). The nodes are partitioned into clusters, either statically or dynamically. Here we do not specify the mechanisms to select the cluster heads or to select the active clusters. For instance, some distributed election techniques for node clustering are described in [62].

Figure 3.2: State variables x_i(k) and a noisy sensor measurement y_1(k) for the monodimensional scalar field x(ξ,t).

The scalar field x(ξ,t) is sampled in space and time at points (ih, k t_s), where h and t_s are the sampling periods in space and time, respectively, and i and k are integers. The discrete samples x_i(k) represent the state of the system. Let N = L/h; then there are N+1 sampling points in the sensor field. We will show below that the number of state variables can be less than or equal to N+1, depending on the type of BC's. The sampling time t_s must be selected according to the inequality constraint given in Sec. 2.2, namely,

t_s < h² / (2D).   (3.2)

The discretization in space and time of the PDE given in Eq. (3.1) requires approximating the spatial and time derivatives by means of finite differences, Eqs. (2.16)-(2.17). The complete discretized model, called the lumped system, includes the noisy sensor measurements. Fig.
3.2 illustrates the spatial sampling of a scalar field.

For a generic point ih not adjacent to the boundaries, the expression for x_i(k) is similar to the one in Eq. (2.18). However, there may be more terms if lower order space derivatives are also included in the model. If the point is next to the boundaries, i.e., for i = 1 or i = N, the effect of the BC's needs to be considered. We express the Dirichlet boundary condition, Eq. (2.3), by introducing the boundary value as a virtual source term in the state equation. For example, we define x_1(k), which is next to the boundary point ξ = 0 where x_0(k) = ψ_0(t)|_{t=k t_s}, as

x_1(k+1) = (θ_1 t_s / h²) [x_0(k) − 2x_1(k) + x_2(k)] + x_1(k)
         = (θ_1 t_s / h²) [−2x_1(k) + x_2(k)] + x_1(k) + (θ_1 t_s / h²) ψ_0(k t_s).   (3.3)

In approximating the Neumann BC, Eq. (2.4), we must include the boundary variable x(ξ_i,t) in the state variables, since it is not constrained as in the Dirichlet case. Then, we can express x(ξ_i,t) by a finite difference approximation of the actual BC. For example, the BC

x_ξ(ξ,t)|_{ξ=0} = ψ_0(t)   (3.4)

can be approximated in the discrete space as

x_ξ(ξ,t)|_{ξ=0} ≈ (x_1(t) − x_0(t)) / h,   (3.5)

and this leads to

x_0(k+1) = (θ_1 t_s / h²) [x_0(k) − 2x_1(k) + x_2(k)] + x_1(k) − h ψ_0(k),   (3.6)
x_1(k+1) = (θ_1 t_s / h²) [x_0(k) − 2x_1(k) + x_2(k)] + x_1(k).   (3.7)

To derive (3.6), we discretize Eq. (3.5) to obtain

x_0(k+1) = x_1(k+1) − h ψ_0(k)   (3.8)

and then replace x_1(k+1) with expression (3.7).

The discretization of the PDE in space and time and of the measurements leads to the lumped model described by the following state-space equations:

x(k+1) = A(θ) x(k) + B(θ) u(k),   (3.9)
y_j(k) = C_j x(k) + v(k),   (3.10)

where j is the index of cluster S_j, the state vector x(k) represents a uniformly sampled version of the scalar field x(ξ,t) with x_i(k) = x(ih, k t_s), and the state matrix A(θ) is a square matrix dependent on the spatial derivatives of x(·) on the right hand side of Eq.
(3.1), the BC's, the sampling periods h and t_s, and the set θ of parameters to identify. If the parameters are constant over space, A(θ) is given by

A(θ) = t_s ∑_{m=1}^{3} θ_m A_m + I,   (3.11)

where I is the identity matrix and the matrices A_m are obtained from the finite difference approximation of the spatial derivatives on the right hand side of Eq. (3.1).

Eq. (3.10) expresses the sensor measurements y_j(k) in cluster S_j as a linear interpolation, C_j x(k), of the state variables, affected by the AWGN term v(k). Note that nodes in general have random locations with respect to the sampling points {ξ = ih : i = 1,2,...}. Thus, in most cases, Eq. (3.10) is a linear approximation of the relationship between state variables and noisy sensor measurements. Eq. (3.10) should be written as

y_j(k) = g_j(x(k)) + v(k),   (3.12)

where g_j(x(k)) is a nonlinear vectorial function of the state variables. Hence, a polynomial interpolation may approximate g_j(·) better than a linear one. We will analyze the effect of this approximation and of eventual uncertainties on the node locations in Sec. 3.3.5.

In some cases, it may be necessary to include in Eq. (3.9) an additional term to account for noise in the environment. Thus, we have

x(k+1) = A(θ) x(k) + B(θ) u(k) + w,   (3.13)

where w is a vector of zero mean Gaussian terms whose covariance is R_w.

Example 2 In this example, we derive a lumped model for the infinite dimensional system described by the diffusion equation x_t(ξ,t) = θ x_ξξ(ξ,t), where t > 0 and 0 < ξ < L, with mixed Dirichlet and Neumann boundary conditions, i.e., x(0,t) = ψ(t), x_ξ(L,t) = 0. There are two sensors in the field, in locations ξ_1 = 3L/10 and ξ_2 = 3L/5, the spatial sampling rate is h = L/5, and there is only one cluster formed by the two sensors. The following 5×5 state matrix A(θ) can be derived in a straightforward manner through the finite difference approximation presented in Eq. (2.17).
That is,

A(θ) = (t_s θ / h²) ·
[ −2   1   0   0   0 ]
[  1  −2   1   0   0 ]
[  0   1  −2   1   0 ]
[  0   0   1  −2   1 ]
[  0   0   1  −2   1 ]  + I,

B(θ) u(t) = (t_s θ / h²) [ψ(t) 0 ... 0]^T,

and the measurement matrix C can be written as

C = [ 1/2  1/2  0  0  0 ]
    [  0    0   1  0  0 ].

In the above framework, some kind of information flooding mechanism is needed so that active nodes can share the knowledge of all the model parameters, e.g., the location of the boundaries and the spatial sampling rate h needed to determine the matrix C_j. Furthermore, under the assumption of known boundaries, nodes share the same state matrix A(θ), but have different measurement equations given by (3.10). In an application scenario, the selection of the initial parameters could be led by the BS or by a leader node in the sensor network to avoid mismatch in the model definition.

3.3.2 Parameter estimation via Kalman filtering

The next step lies in the adaptive identification of the unknown parameters θ. For this purpose, we adopt the extended Kalman filter (EKF) [39] with an approach similar to the one in [68]. The key idea of the EKF approach is to treat the unknown parameters as additional state variables, by defining an augmented system whose state vector is

z(k) = [x(k); θ].

The augmented system is nonlinear, since the parameters θ are multiplied by the state variables x(k). Here, the Kalman filter is used as the state predictor for the linearized version of the augmented system, and the estimates of the parameters θ are obtained because they are treated as additional state variables in the augmented system. The augmented system can be written as

z(k+1) = f(z(k), u(k)) = [A(θ) x(k) + B(θ) u(k); θ],   (3.14)
y_j(k) = C_j x(k) + v(k).   (3.15)

The linearized version of the augmented system is obtained by approximating it with its first order Taylor series expansion. Therefore, the Jacobians of the state and measurement equations must be derived. These are defined as follows from Eqs. (3.14) and (3.15):

J(k) = ∂f(z(k), u(k)) / ∂z |_{z=ẑ},   (3.16)
H_j = [C_j 0].   (3.17)

Then, the Kalman filter can be applied.
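Before turning to the filter, the lumped model of Example 2 can be assembled and sanity-checked numerically; the sketch below uses illustrative values for θ, L, and t_s, while the matrix structure follows the example:

```python
import numpy as np

# Lumped model of Example 2 (theta, L and ts are illustrative values).
theta, L = 0.05, 1.0
h = L / 5
ts = 0.4 * h * h / (2 * theta)          # within the stability bound, Eq. (3.2)

A0 = np.array([
    [-2,  1,  0,  0,  0],
    [ 1, -2,  1,  0,  0],
    [ 0,  1, -2,  1,  0],
    [ 0,  0,  1, -2,  1],
    [ 0,  0,  1, -2,  1],   # homogeneous Neumann at xi = L
], dtype=float)
A = ts * theta / h**2 * A0 + np.eye(5)

def Bu(psi):                            # Dirichlet value as a virtual source
    b = np.zeros(5)
    b[0] = ts * theta / h**2 * psi
    return b

C = np.array([[0.5, 0.5, 0, 0, 0],      # sensor midway between samples 1 and 2
              [0.0, 0.0, 1, 0, 0]])     # sensor exactly on sample 3

x = np.zeros(5)
for _ in range(5000):
    x = A @ x + Bu(2.0)                 # constant boundary value psi(t) = 2
# With zero flux at xi = L, the field equilibrates to the boundary value:
assert np.allclose(x, 2.0, atol=1e-3)
assert np.allclose(C @ x, 2.0, atol=1e-3)
```

The equilibration to a flat profile is a quick structural check: a fixed Dirichlet value on one side and no flux on the other leaves no steady-state gradient.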
Given the mean of the state vector, z̄_0, the initialization for the covariance matrix P can be found conventionally as

P_0 = E[(z − z̄_0)(z − z̄_0)^T].   (3.18)

Then, at each measurement time, the filter equations are [39]:

L(k) = P(k/k−1) H^T [H P(k/k−1) H^T + R]^(−1),
P(k/k) = P(k/k−1) − L(k) H P(k/k−1),   (3.19)
ẑ(k/k) = ẑ(k/k−1) + L(k) (y(k) − C_j x̂(k/k−1)),   (3.20)

where R is the covariance matrix of the noise measured by the sensors, and the fact that H ẑ = C x̂ is applied in the derivation of Eq. (3.20). Then, the state estimate and the covariance are propagated to the next measurement time via

P(k+1/k) = J(k) P(k/k) J^T(k),   (3.21)
ẑ(k+1/k) = f(ẑ(k/k), u(k)).   (3.22)

3.3.3 Identifiability

The identifiability problem studies the proper conditions under which a system identification algorithm is able to converge to the correct parameter estimates. The EKF algorithm performs identification via state estimation, so that identifiability is related to the observability of the system state.

Observability is one of the fundamental problems in linear dynamical systems. A system is said to be observable at time k_0 if there is a finite k_1 > k_0 such that the knowledge of the outputs y from k_0 to k_1 is sufficient to determine the state x(k_0). Conversely, an unobservable system is one where the values of some of the state variables may not be determined from the system output regardless of the number of samples taken. By its nature, a Kalman filter applied to an unobservable system will not work. Similarly, the EKF approach will not work if it is applied to a nonlinear unobservable system. The observability test is simple to do for linear systems, but much more difficult for nonlinear ones. For the latter case, only some approximated methods are available [7].
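Putting Eqs. (3.14)-(3.22) together, a minimal EKF sketch for joint field/parameter estimation is given below. It estimates a single diffusivity θ for x_t = θ x_ξξ with homogeneous Dirichlet BC's; the grid, the noise level, and the assumption of one sensor per sampling point are all illustrative simplifications, not the dissertation's setup:

```python
import numpy as np

rng = np.random.default_rng(0)

# True field: x_t = theta * x_xx on (0, 1), homogeneous Dirichlet BC's.
N, h = 9, 0.1
theta_true = 1 / np.pi**2
ts = 0.4 * h * h / (2 * theta_true)
D2 = (np.diag(-2 * np.ones(N)) + np.diag(np.ones(N - 1), 1)
      + np.diag(np.ones(N - 1), -1)) / h**2      # Laplacian with Dirichlet BC's

def A(theta):
    return np.eye(N) + ts * theta * D2

C = np.eye(N)                                    # one sensor per sample point
H = np.hstack([C, np.zeros((N, 1))])             # Eq. (3.17)
R = 1e-4 * np.eye(N)

x = 2 * np.sin(np.pi * np.linspace(h, 1 - h, N)) # true initial field
z = np.r_[x, 0.5 * theta_true]                   # augmented state, biased theta guess
P = np.eye(N + 1)

for _ in range(300):
    x = A(theta_true) @ x                        # true field evolves
    y = C @ x + 1e-2 * rng.standard_normal(N)    # noisy measurements
    # Update, Eqs. (3.19)-(3.20):
    Lg = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)
    z = z + Lg @ (y - C @ z[:N])
    P = P - Lg @ H @ P
    # Predict, Eqs. (3.21)-(3.22), with the Jacobian of f:
    J = np.zeros((N + 1, N + 1))
    J[:N, :N] = A(z[N])
    J[:N, N] = ts * (D2 @ z[:N])                 # df/dtheta
    J[N, N] = 1.0
    P = J @ P @ J.T
    z = np.r_[A(z[N]) @ z[:N], z[N]]

assert abs(z[N] - theta_true) / theta_true < 0.2  # parameter recovered
```

The cross-covariance between the field states and θ, built up by the Jacobian, is what lets a filter that only observes the field pull the parameter toward its true value.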
Alternatively, a practical method to evaluate the identifiability of the PDE parameters relies on studying the analytical solution of the PDE problem as a function of the parameters and checking whether the parameters are identifiable from the measurements.

3.3.4 Convergence analysis

One of the most important works on the convergence analysis of the EKF as a parameter estimator for linear systems is the work by Ljung in 1979 [38]. Ljung showed that the asymptotic properties of the algorithm can be analyzed by studying the stability properties of an associated differential equation. He proved that the extended Kalman filter may diverge or converge to biased estimates of the parameters, because the associated differential equations can have several stationary points. The EKF converges to biased estimates if the covariance of the state noise is unknown. In [38], Ljung proposed a modified version of the EKF that achieves better global stability by adding an extra term in the Kalman gain computation.

It is worthwhile to point out that the aforementioned method for convergence analysis is analytically feasible only for a simple EKF (e.g., when the dimension of the state is equal to one). Otherwise, numerical methods are needed. It appears difficult to analyze how other design factors of the EKF (such as the dimension of the state or the number of measurements) may affect the convergence of the algorithm. Of course, from intuition and empirical evidence, increasing the number of measurements typically improves the accuracy of the estimates. The difficulty in analyzing the EKF essentially arises from the fact that the state estimates are performed based on the linear projection of a nonlinear system.

3.3.5 Uncertainty on node location

In Sec. 3.3, we simplified the expression of the measurement equation (3.12) with Eq. (3.10), where there is a linear relation between states and measurements.
The effect of this approximation on the EKF estimation is negligible, since the EKF requires a linear approximation of the measurement equation as well. Therefore, the measurement matrix C_j should come close to the Jacobian ∂g_j(x(k))/∂z |_{z=ẑ}.

On the other hand, the uncertainty regarding the node locations may have an impact on the convergence of the EKF. Wrong values for the node locations will produce an offset ΔC_j on the value of the measurement matrix C_j. The size of the offset depends on the average location error and on the slope of the physical field. In fast spatially varying physical fields, the consequences of node location errors can be more dramatic. The impact is on the measurement equation, i.e.,

ẑ(k/k) = ẑ(k/k−1) + L(k) (y(k) − (ΔC_j + C_j) x̂(k/k−1)).   (3.23)

This mismatch can lead to a bias in the parameter estimates.

3.4 Systems with Unknown Boundary Conditions

In many realistic sensor network scenarios, a priori knowledge of the BC's may not be available. Then, it may be necessary to sample on the boundaries of the node region. Here, we propose an approach where each cluster acquires the BC's through the samples of the nodes located at its boundaries. This implies the definition of a local state space model for each cluster.

All the assumptions given in Sec. 3.3 remain valid except that the BC's are unknown. The knowledge of the BC's is necessary to define the state space model and, therefore, to identify the parameters of the diffusion model. A possible approach to estimate the BC's is to put some of the nodes in charge of sampling the boundaries. Here, we consider a more general case where the boundaries can be defined not only globally for the sensor field, but also locally for a specific subregion. In order to save communication energy and also keep a low latency, it appears reasonable to have the cluster heads receive the sampled boundary information from nearby nodes (one hop away).
In this framework, the monitored phenomenon is "sliced" into many subregions, covered by clusters, and a local lumped model is defined in each cluster.

Consider a cluster of nodes S_j ⊆ S and its subset of boundary nodes b_j. For the 1-D case, the boundary locations are indicated as ξ_l^(j) and ξ_r^(j), respectively, as shown in Fig. 3.3. The local PDE is given by Eq. (3.1), but only in the space interval ξ_l^(j) < ξ < ξ_r^(j). Moreover, the BC's are known through the noisy samples of the nodes b_j and can be considered as Dirichlet BC's. That is, the estimated boundaries are

x̂(ξ_l,t) = x(ξ_l,t) + w_l(t),   (3.24)
x̂(ξ_r,t) = x(ξ_r,t) + w_r(t),   (3.25)

where w_l(t) and w_r(t) are AWGN terms. Therefore, after defining the local sampling space and time (i.e., h_j and t_j), and imposing the above BC's, the state space model for cluster S_j can be written as

x_j(k+1) = A_j(θ_j) x_j(k) + B_j(θ) (u_j(k) + w_j(k)),   (3.26)
y_j(k) = C_j x_j(k) + v(k),   (3.27)

where w(k) is a zero-mean white Gaussian noise that affects the state of the system. If the parameters are not space varying, we have θ_j = θ.

Figure 3.3: The sampled BC's, y_l(k) and y_r(k), are sent to the cluster head in a local model.

The implication of Eqs. (3.26) and (3.27) is that the update of the covariance matrix, Eq. (3.21), in Sec. 3.3.2 now also includes a term related to the covariance matrix Q of the state noise. That is,

P(k+1/k) = J(k) P(k/k) J^T(k) + B(θ_j) Q B(θ_j)^T.   (3.28)

The above framework for local models is suitable for the identification of spatially varying parameters. If the variation of the parameters is slow enough, we may assume them to be constant over a limited subregion. Then, we may apply the model defined in Eqs. (3.26) and (3.27) to estimate the parameters as if they were constant over a cluster.
Then, the globally varying parameters can be reconstructed by interpolating the estimates performed at the clusters, as will be shown in Sec. 4.1.6.

3.5 Systems with Active Sources

The diffusion equation (3.1) does not include a term for the input source, u(ξ,t). This means that no active sources are in the sensor field during the monitoring/estimation process (i.e., u(ξ,t) = 0 for ξ ∈ [0,L] and t > 0). This is suitable for scenarios where sources either have been active in the field only before the sensor network starts the monitoring activity (e.g., after the explosion of a gas container) or are active outside the field and the BC's take the external activities into account. Under other circumstances, one or more sources may be active in the field while the sensor network is sampling the environment. If the source signal is known, it suffices to include a source term in the prediction equation (3.22) for the EKF. That is,

ẑ(k+1/k) = f(ẑ(k/k), u(k)).   (3.29)

This may be doable in scenarios such as the monitoring of industrial plants, where source signals can be sampled and transmitted to the nodes. However, the information about the sources may not always be available. Then, the design of a blind source and parameter identification approach may be challenging due to the difficulty of making statistical assumptions on the sources that are realistic yet easy to treat mathematically.

An approach to perform identification without the knowledge of the source signals may be possible by excluding the sources from the sensor field. This can be achieved by forming clusters that do not include sources within their boundaries and by relying on local models to estimate the parameters. The parameters would then be estimated as described in Sec. 3.4.

If the number of sources and their locations are unknown a priori, the sensor network must detect and localize the sources in order to exclude them when forming clusters. Some model-based methods for source localization in centralized sensor networks are described in [1]. However, the model parameters are not available in our current context, and they need to be estimated after the clustering stage.
One possibility is to attempt to localize the sources based on some coarse model of the physical field. Actually, physical parameters such as the diffusivity of a certain gas in the air can be known a priori, at least in terms of the value range. Alternatively, simpler heuristics such as peak detection can be adopted.

If we assume that nodes are partitioned into a set of clusters S_j, a scheme that localizes sources and re-clusters the nodes can be designed based on a clustering technique proposed in [55]. That is, we conduct the following.

1. Perform intra-cluster source detection and localization (e.g., a peak-based approach).
2. If a source is detected in cluster S_i, split S_i into the child sub-clusters S_i1 and S_i2 so that S_i1 and S_i2 do not contain the source.
3. Merge new sub-clusters with the rule that two siblings cannot be merged together.

A possible criterion for splitting a cluster would be to separate the nodes according to the line connecting the base station (BS) to the source location, as shown in Fig. 3.4. In this way, the new clusters keep more or less the same distance to the BS and the communication cost is not affected. Criteria on the maximum/minimum cluster size may lead the merging phase.

Figure 3.4: Cluster splitting.

3.6 Experimental Results

The goal of the experiments is to evaluate the performance of the algorithms under different conditions that may arise in a sensor network scenario. In particular, we want to understand the relationship between performance and the location of the sampling point in the field, as well as the performance when the sampled scalar field has an initially large spatial bandwidth.

3.6.1 Locating the sampling point

In this subsection, we want to study how the quality of the estimate varies with respect to noise and the location of the nodes that sample the field. In this setup, a single node estimates the diffusivity parameter from its own noisy readings.
We measure how the estimation error varies with the location of the node in space and with the node level in the data gathering tree. The measured scalar field is modeled by the diffusion equation x_t(ξ,t) = (1/π²) x_ξξ(ξ,t), defined on the interval 0 < ξ < 1 for t > 0, with initial condition x(ξ,0) = 2cos(πξ/2) and Dirichlet boundary conditions x(0,t) = 2, x(1,t) = 0. A single node s at position p estimates the diffusivity parameter with sampling time t_s = 0.004, about 1/10th of the right-hand side of the inequality given by Eq. (2.19). The sampling space of the lumped model given by (3.9)-(3.10) is set to 1/11. We assume that the node knows the BC's. The position p of s varies uniformly on the ξ axis. Different noise levels σ²_n are considered. The average percent error is measured over 200 Monte Carlo trials and for p = k/11, k ∈ {1,2,...,10}. We observe that the algorithm may not always converge; the divergent iterations are excluded from the computation of the mean error. The estimates are performed by one single node, which explains the relatively high divergence rate in noisy cases. The performance would be much better when processing data from multiple nodes. Besides increasing the number of sensors, more robust estimates may also be obtained through modified versions of the EKF. The percentage of convergent trials is shown in the upper part of Fig. 3.5, while the mean error with respect to the sensor location is shown in Fig. 3.6 for two different noise levels. The profiles of the scalar field at the initial time and at the steady state are shown in Fig. 3.5. It can be noticed that the performance degrades as measurements are taken closer to either boundary, and this degradation is not related to the SNR value in a simple way. The SNR value is monotonically decreasing when approaching ξ = 1, but the estimation performance seems to depend also on the rate of temporal variability of the sampled scalar field.
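The simulation set-up just described can be sketched as follows. This is a hedged reconstruction: the explicit finite-difference (FTCS) solver, the numpy usage and the additive Gaussian noise model are assumed details; the parameter values follow the text.

```python
import numpy as np

# Sketch of the set-up in Sec. 3.6.1: an explicit finite-difference solver
# for x_t = (1/pi^2) x_xixi on [0,1] with x(0,t)=2, x(1,t)=0 and
# x(xi,0) = 2*cos(pi*xi/2), sampled by one node at position p with
# additive Gaussian noise of variance sigma2 (names are illustrative).

def simulate_readings(p, sigma2, n_steps=500, h=1/11, ts=0.004, seed=0):
    D = 1 / np.pi**2
    assert ts <= h**2 / (2 * D)            # FTCS stability condition (2.19)
    xi = np.arange(0, 1 + h/2, h)
    x = 2 * np.cos(np.pi * xi / 2)         # initial condition
    k = int(round(p / h))                  # grid index of the sampling node
    rng = np.random.default_rng(seed)
    readings = []
    for _ in range(n_steps):
        x[1:-1] += ts * D * (x[2:] - 2*x[1:-1] + x[:-2]) / h**2
        x[0], x[-1] = 2.0, 0.0             # Dirichlet boundaries
        readings.append(x[k] + rng.normal(0, np.sqrt(sigma2)))
    return np.array(readings)
```

With t_s = 0.004 and h = 1/11, the stability bound h²/(2D) ≈ 0.041 is satisfied with roughly the factor-of-ten margin stated in the text; the noiseless readings relax toward the linear steady state 2(1 − ξ).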
3.6.2 Smoothing property

In this subsection, we consider the parabolic equation

x_t(ξ,t) = (1/50) x_ξξ(ξ,t) − (1/50) x_ξ(ξ,t) − 0.32 x(ξ,t). (3.30)

Figure 3.5: The convergence rate of the algorithm versus the sensor location and the noise level (top), and the profile of the scalar field at the initial state and at the steady state (bottom).

Since all the coefficients are nonzero, the equation can model a phenomenon of advection-diffusion with dissipation. In other words, besides diffusion, there is a motion of the concentration x(ξ,t) (advection) due to the term −(1/50) x_ξ(ξ,t), and also a dissipation due to the term −0.32 x(ξ,t). The initial condition is x(ξ,0) = exp(−(ξ−0.3)²/0.08), with homogeneous Dirichlet BC's x(0,t) = x(1,t) = 0. Note that the initial condition x(ξ,0) here has a larger bandwidth with respect to that of the problem in Sec. 3.6.1. The field x(ξ,t) is shown in Fig. 3.7, and its smoothing property is highlighted by Fig. 3.8: the spatial bandwidth of the field narrows over time. A cluster of 5 centrally located nodes is used. After 200 Monte Carlo iterations, the mean error plus the standard deviation in the estimation of the three parameters is equal to 1.70+3.41, 1.90+8.01 and 9.38+21.7, respectively. The estimates of the PDE parameters are displayed in Figs. 3.10 and 3.11.

3.6.3 Discussion

Several factors affect the performance of distributed parameter estimation of a diffusion model. As expected, the number of sensor measurements and the SNR value are among those factors. As shown in Figs. 3.10 and 3.11, the algorithm in general converges quite fast, and only a relatively small number of iterations is needed to estimate these parameters.
From experimental evidence and theoretical considerations, we believe that one effect of decreasing the spatial sampling step h is to improve the approximation of the Jacobian of the measurement equation, i.e.,

E[C_j] → ∂g_j(x(k))/∂x |_{x=x̂}  as h → 0. (3.31)

Figure 3.6: The mean estimation error versus the sensor location and the noise level.

However, this does not necessarily lead to better estimates, since it increases the dimension of the state vector x and, therefore, the number of parameters that the EKF must estimate. It seems reasonable that the criteria to choose h should depend on the spatial bandwidth of the physical field and on the number of measurement points. From the experiment described in Sec. 3.6.1 and other tests, we observed that some nodes can provide better estimates than others, even if the former have a lower SNR than the latter. Recall that the BC's were assumed to be known in those experiments. We also observed that the nodes giving unexpectedly poor performance despite good SNR's are usually located close to the boundaries. In other words, the SNR value is not the only factor affecting the estimation performance. In the case of constant BC's (x(0,t) = c), we observe that, as ξ → 0, the field tends to be stationary, i.e., x_t(ξ,t) → 0. Therefore, the PDE tends to degenerate into an ordinary differential equation (ODE), and its parameters are no longer identifiable. In other words, the field is close to an equilibrium point (or the steady state), and properties such as diffusivity, which are related to the dynamics of the field, are no longer measurable.
We observed the same behavior, i.e., a degradation of the estimation performance, as nodes approach the boundaries for time-varying BC's.

Figure 3.7: The scalar field described by the parabolic equation (3.30), where an advection phenomenon can be observed and the concentration is moving toward one of the boundaries.

This can also be explained in terms of equilibrium (or steadiness of the state). Note that the steady state of a linear system does not necessarily mean that its output signal is constant. Another way to interpret this phenomenon is to consider the boundary signal as a source: the medium acts as a time-varying spatial filter, and the measurement can be interpreted as an output signal. If the measurement point is too close to the boundaries, it is difficult to measure the properties of the medium accurately, since the filtering effect is still limited — the spatial filter is still "thin". This can be seen from the behavior of Green's function, too. Therefore, the estimation performance depends not only on the SNR value but also on the node distance from the boundaries. This seems to be an interesting result. We can conclude that it is important to deploy nodes close to constrained boundaries and sources to measure BC's if they are unknown. Some other nodes need to be placed at separate locations in the sensor region, where they can better measure the dynamics of the physical field and thus provide better estimates of the parameters. This fact suggests a different criterion to decide which nodes to activate in the field.

Figure 3.8: The Fourier transform with respect to space at three different time instants, showing the smoothing effect.
Figure 3.9: The squared error in the estimation of a scalar field.

Figure 3.10: The estimated diffusion coefficient θ_1 (top), covariance matrix (middle) and prediction error (bottom) for the parabolic problem given by Eq. (3.30).

Figure 3.11: Estimates of coefficients θ_2 (top) and θ_3 (bottom).

Chapter 4

PDE Parameter Estimation via SensorNets II: Extensions and Implementation Issues

Some extensions and implementation details of the basic framework for the identification of physical parameters via sensor networks given in Chapter 3 are examined here. The generalizations include the cases of two-dimensional (2-D) sensor regions and spatially varying parameters. Furthermore, we discuss a different strategy of cooperation among nodes, involving decision fusion processing in order to save communication cost. Finally, we analyze the energy cost of the proposed method and compare it with that of sample-based data gathering.

4.1 Extension to 2-D

The region where a sensor network is deployed can often be assumed flat enough that the sensor field can be modeled as a 2-D space rather than a 3-D one. The 3-D model may be needed if there are significant variations in the elevation of the sensor field, e.g., underwater or over steep grounds. In this section, we formulate the lumped model for 2-D sensor fields and discuss sensor network issues related to boundary conditions and parameter identification through a sample advection-diffusion problem in a 2-D field.
4.1.1 Discretization of the PDE model

Consider a physical field x(ξ_1,ξ_2,t) and the following advection-diffusion problem in a rectangular space:

x_t(ξ_1,ξ_2,t) = D ∇² x(ξ_1,ξ_2,t) + v_{ξ_1} x_{ξ_1}(ξ_1,ξ_2,t), (4.1)

where 0 < ξ_1 < L_1, 0 < ξ_2 < L_2, t > 0, subject to the boundary and initial conditions:

x(ξ_1,ξ_2,t) = 0 on the boundaries, (4.2)
x(ξ_1,ξ_2,0) = f(ξ_1,ξ_2). (4.3)

Recall that the Laplacian operator ∇² in R² is defined as ∇² x(ξ_1,ξ_2,t) = x_{ξ_1 ξ_1} + x_{ξ_2 ξ_2}. Let t_s and h be the time and space sampling intervals for the state space model, respectively. Without loss of generality, we assume that L_1 ≤ L_2. The dimension of the state space is n_1 × n_2, where n_i = ⌊L_i/h⌋ + 1. The discrete sample point is denoted by x_{i,j}(k) = x(ξ_1,ξ_2,t) |_{ξ_1=ih, ξ_2=jh, t=k t_s}. We replace all the differential terms in Eq. (4.1) by finite difference approximations. The 5-point finite difference approximation for ∇² x is given as

∇² x(ξ_1,ξ_2,t) |_{ξ_1=ih, ξ_2=jh} ≈ [x_{i+1,j}(t) + x_{i−1,j}(t) + x_{i,j+1}(t) + x_{i,j−1}(t) − 4 x_{i,j}(t)] / h². (4.4)

This is also shown in Fig. 4.1. Thus, Eq. (4.1) becomes

[x_{i,j}(k+1) − x_{i,j}(k)] / t_s = D [x_{i+1,j}(k) + x_{i−1,j}(k) + x_{i,j+1}(k) + x_{i,j−1}(k) − 4 x_{i,j}(k)] / h² + v_{ξ_1} [x_{i+1,j}(k) − x_{i,j}(k)] / h. (4.5)

We define the state vector x_j^T(k), scanning the sensor region horizontally, as [x_{1,j}(k), x_{2,j}(k), ..., x_{n_1,j}(k)] for j = 1,2,...,n_2. Also, we can define

x^T(k) = [x_{1,1}(k) ... x_{n_1,1}(k)  x_{1,2}(k) ... x_{n_1,2}(k)  ···  x_{1,n_2}(k) ... x_{n_1,n_2}(k)], (4.6)

which is a vector of length n_1 × n_2. Thus, the state transition matrix is an n_1 n_2 × n_1 n_2 block matrix, having n_2 × n_2 blocks of size n_1 × n_1.

Figure 4.1: The computational molecule for the Laplacian operator and other spatial derivatives.

The state equation can be written by grouping all the expressions of type (4.5) according to the above definition of x(k).
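One explicit time step of Eq. (4.5) can be sketched with vectorized array slices. This is an illustrative reconstruction (numpy assumed, names hypothetical), not the dissertation's code.

```python
import numpy as np

# Sketch of one explicit time step of Eq. (4.5): the 5-point Laplacian plus
# a one-sided difference for the advection term. Interior points only; the
# boundary ring is held at the Dirichlet value (zero here, per Eq. (4.2)).

def step_2d(x, D, v, h, ts):
    """x is an (n1+2) x (n2+2) array including the boundary ring."""
    lap = (x[2:, 1:-1] + x[:-2, 1:-1] + x[1:-1, 2:] + x[1:-1, :-2]
           - 4 * x[1:-1, 1:-1]) / h**2
    adv = (x[2:, 1:-1] - x[1:-1, 1:-1]) / h   # forward difference along xi_1
    x_new = x.copy()
    x_new[1:-1, 1:-1] = x[1:-1, 1:-1] + ts * (D * lap + v * adv)
    return x_new
```

With v = 0 and homogeneous boundaries, one step spreads a unit concentration at the center to its four neighbors while conserving the interior total, as expected of the 5-point stencil.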
Hence, the state space version of problem (4.1) becomes

x(k+1) = (t_s D A_D + t_s v_{ξ_1} A_{v_{ξ_1}} + I) x(k), (4.7)

where A_D is the block tridiagonal matrix

A_D = (1/h²) ·
[ A_{L,n_1}  I_{n_1}    0        ...      0         ]
[ I_{n_1}    A_{L,n_1}  I_{n_1}  0        ...       ]
[ ...        ...        ...      ...      ...       ]
[ 0          ...        0        I_{n_1}  A_{L,n_1} ], (4.8)

with I_{n_1} being the n_1 × n_1 identity matrix and A_{L,n_1} the n_1 × n_1 tridiagonal matrix

A_{L,n_1} =
[ −4   1   0  ...   0 ]
[  1  −4   1   0  ... ]
[ ...  ... ... ... ...]
[  0  ...  0   1   −4 ], (4.9)

and where A_{v_{ξ_1}} is the block diagonal matrix

A_{v_{ξ_1}} = (1/h) · diag(A_{d,n_1}, ..., A_{d,n_1}), (4.10)

with

A_{d,n_1} =
[ −1   1   0  ...   0 ]
[  0  −1   1   0  ... ]
[ ...  ... ... ... ...]
[  0  ...  0   0   −1 ]. (4.11)

It can be shown that, for the lumped model of Eq. (4.1) to converge, the sampling time t_s must obey the following inequality [23]:

t_s ≤ h² / (4D). (4.12)

Note that the above inequality is more restrictive than the 1-D one given by (2.19).

Figure 4.2: The state space grid for a rectangular sensor field and boundary points. The locations of sensor nodes are not shown.

4.1.2 Incorporation of measurements

For simplicity of discussion, we assume that sensor nodes are placed only over the points of a rectangular grid (ih, jh). Thus, the measurement equation at cluster S_j is defined as

y_j(k) = C_j x(k) + v(k),

where v(k) denotes the vector of measurement noise. The above idealized assumption on node locations leads to a simplified expression for the measurement matrix C: every row of C has one and only one nonzero element, equal to one. If nodes are randomly located in the field, the measurement equation instead involves a nonlinear matrix function g(·) representing the relation between the state variables of the model and the noisy measurements of the 3-D physical field, i.e.,

y_j(k) = g(x(k)) + v(k).

To implement the EKF, we need to estimate the Jacobian of g(x(k)) to linearize the expression near the estimate x̂ of the physical field.
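Returning to the state model, the block matrices of Eqs. (4.8)-(4.11) and the transition matrix of Eq. (4.7) can be assembled compactly with Kronecker products. This is an equivalent construction sketched for illustration (numpy assumed), not the original implementation.

```python
import numpy as np

# Sketch of the block matrices of Eqs. (4.8)-(4.11) built with Kronecker
# products, and the transition matrix of Eq. (4.7).

def build_matrices(n1, n2, h):
    I1 = np.eye(n1)
    # A_L: tridiagonal, -4 on the diagonal and 1 off the diagonal (Eq. 4.9)
    A_L = -4 * I1 + np.eye(n1, k=1) + np.eye(n1, k=-1)
    # off-diagonal identity blocks couple neighbouring grid rows (Eq. 4.8)
    off = np.eye(n2, k=1) + np.eye(n2, k=-1)
    A_D = (np.kron(np.eye(n2), A_L) + np.kron(off, I1)) / h**2
    # A_d: upper bidiagonal -1/1 for the one-sided advection term (Eq. 4.11)
    A_d = -I1 + np.eye(n1, k=1)
    A_v = np.kron(np.eye(n2), A_d) / h              # Eq. (4.10)
    return A_D, A_v

def transition_matrix(D, v, ts, A_D, A_v):
    return ts * D * A_D + ts * v * A_v + np.eye(A_D.shape[0])   # Eq. (4.7)
```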
Note that g(x(k)) is unknown, and recall that the Jacobian represents the best linear approximation of a differentiable function near a given point.

Figure 4.3: The square-cell approximation of a measurement point.

By extending the methodology presented in Sec. 3.3, we approximate the lth row of the linearized measurement equation by a weighted sum of the state variables x_{i,j} close to the node location. For simplicity, we consider only the x_{i,j} terms associated with the vertices of the square cell containing the sensor node. The set-up is illustrated in Fig. 4.3, where the sensor node is represented by y_l. Let d_i be the distance from the measurement point to a vertex of the square cell. We use the inverse of the distance as a weight; namely,

y_l(k) = [ (1/d_1) x_{i,j} + ··· + (1/d_4) x_{i,j+1} ] / Σ_{i=1}^{4} (1/d_i) + v_l. (4.13)

Note that if the sensor is located over one of the vertices, say x_{i,j}, then we have y_l(k) = x_{i,j} + v_l.

4.1.3 Incorporation of boundary conditions

Eq. (4.7) is valid only for homogeneous (null) Dirichlet boundary conditions. We can model general Dirichlet BC's by introducing virtual source terms into the state equation. Each virtual source term, associated with a boundary point, appears in the state equation of the neighboring internal point. As an example, consider x_{1,3}(k+1), which is associated with the location next to the boundary point (0,3) shown in Fig. 4.2; we denote the term x_{0,3}(k) as a virtual source u_3(k). That is, we have

x_{1,3}(k+1) = (t_s D / h²) [x_{2,3}(k) + x_{1,4}(k) + x_{1,2}(k) − 4 x_{1,3}(k)]
             + (t_s v_{ξ_1} / h) [x_{2,3}(k) − x_{1,3}(k)] + x_{1,3}(k) + (t_s D / h²) u_3(k). (4.14)

An m-point boundary following Dirichlet BC's requires m source terms in the state equation.
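The virtual-source update of Eq. (4.14) amounts to a single scalar line; a minimal sketch (plain Python, names illustrative):

```python
# Sketch of Eq. (4.14): the Dirichlet boundary value x_{0,3}(k), renamed u3,
# enters the update of the adjacent internal point x_{1,3} as a virtual
# source term scaled by ts*D/h^2.

def update_x13(x13, x23, x14, x12, u3, D, v, h, ts):
    diffusion = ts * D * (x23 + x14 + x12 - 4.0 * x13) / h ** 2
    advection = ts * v * (x23 - x13) / h
    return x13 + diffusion + advection + ts * D * u3 / h ** 2
```

With a constant field equal to the boundary value and no advection, the point is at equilibrium and the update leaves it unchanged, as expected.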
The Neumann condition in a multi-dimensional space is of the following form:

c ∂x/∂n_i |_{r_i} = ψ_i(r_i, t), (4.15)

where r_i denotes the coordinate at the ith boundary and n_i is the unit vector orthogonal to the boundary surface and directed outside the domain region. This type of condition imposes a certain flux at the boundaries. Let us consider an example with the Neumann BC

∂x(ξ_1,ξ_2,t)/∂ξ_1 |_{ξ_1=0} = ψ_i(t), (4.16)

whose corresponding finite difference expression can be written as

[x_{1,j}(k) − x_{0,j}(k)] / h = ψ(k),  j = 1, ..., n_2. (4.17)

As in Sec. 3.3, the boundary value x_{0,j}(k) can be replaced by one auxiliary state equation. An m-point boundary following Neumann BC's requires the lumped model to include m additional state variables. To summarize, Dirichlet BC's can be modeled by virtual source terms, while Neumann BC's also require some additional state space equations. Therefore, the general state space equation is of the following form:

x(k+1) = A(θ) x(k) + B(θ) u(k) + w(k),

where u(k) denotes the (virtual) source terms and w(k) indicates noise affecting the physical field. Note that the information on the BC's, the model dimension and the sampling intervals must be available to the cluster heads, as mentioned in Sec. 3.3.

4.1.4 Treatment of unknown boundary conditions

If the BC's are unknown, the sensor network needs to acquire this information in order to provide the cluster heads with proper state space models for parameter identification. Recall that, in our framework, the boundary problem can also be solved locally at each cluster, leading to a set of local state space models. Thus, the approach proposed in Sec. 3.4, which uses nodes at the boundaries of the sensor region or of the clusters to sample the boundaries, is still valid for the 2-D case. However, the boundary region is composed of 1-D curves in this case, implying a higher cost for transferring samples from the boundary sensors to the other nodes in the network.
Besides, if the BC's are spatially varying, there is also the issue of collecting samples from enough nodes to avoid spatial aliasing. Thus, it is important to develop good techniques to solve the boundary problem at a low cost, for example, by estimating a flux at the boundaries to determine the Neumann BC's or by adopting some prediction method to decrease the communication cost. This is an open issue for future research. Several real-world problems can be characterized by a mix of known and unknown BC's. Consider a sensor network deployed in a gallery to measure the concentration of some toxic gas. In this case, the BC's at the entrances are probably unknown, but most likely there is no flux at the walls, implying null Neumann BC's, i.e., ∂x/∂n_i |_{r_i} = 0.

4.1.5 Treatment of spatially varying parameters

So far, it has been assumed that the model parameters are constant over space and time. However, the parameters (e.g., the diffusivity) may vary with space and time. Since the EKF is by nature an adaptive algorithm, it can track parameters θ_i(t) that vary slowly in time. In this subsection, we focus on the case of spatially varying parameters. A parameter varying with space, θ(ξ), results in a set of parameters θ_i in the state space matrix A(θ). Each parameter θ_i in the lumped system corresponds to a sample θ(ξ)|_{ξ=ih} of the parameter in the infinite dimensional system. We provide a simple example to explain this concept.

Example 3. Consider an infinite dimensional system described by the following diffusion equation:

x_t(ξ,t) = θ(ξ) x_ξξ(ξ,t), (4.18)

with 0 < ξ < 1, t > 0 and some given BC's and IC's, where the diffusion coefficient θ varies with space. The corresponding state matrix becomes

A(θ) = (1/h²) ·
[ −2θ_1   θ_1    0    ...    0    ]
[  θ_2   −2θ_2   θ_2   0   ...    ]
[  ...    ...    ...  ...   ...   ]
[  0      ...    0    θ_n  −2θ_n  ]. (4.19)

From Eq. (4.19), we see that the general expression of the lumped model in Eqs. (3.9) and (3.10) remains valid even if the coefficients θ_i vary with space.
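The matrix A(θ) of Eq. (4.19) can be assembled directly from the samples θ_i; a sketch (numpy assumed, function name illustrative):

```python
import numpy as np

# Sketch of the state matrix A(theta) of Eq. (4.19) for a spatially varying
# diffusion coefficient: row i uses the local sample theta_i.

def A_of_theta(theta, h):
    """theta: sequence of n sampled coefficients theta_i = theta(i*h)."""
    n = len(theta)
    A = np.zeros((n, n))
    for i in range(n):
        A[i, i] = -2.0 * theta[i]
        if i > 0:
            A[i, i - 1] = theta[i]
        if i < n - 1:
            A[i, i + 1] = theta[i]
    return A / h ** 2
```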
However, space-varying coefficients imply an increase in the number of parameters to be estimated. For instance, the diffusion problem given in Eq. (4.18) requires the estimation of n state variables x_i(k) and n parameters θ_i. A high number of parameters to be estimated may affect the convergence of the EKF, especially if the number of measurements is low. On the other hand, the rate of spatial variation of physical parameters is typically slow. If this is the case, the problem can be simplified by considering the parameters constant over a certain subinterval. Hence, θ(ξ) can be approximated by a piecewise constant function, i.e.,

θ(ξ) ≈ θ_j,  for G_j ≤ ξ < G_{j+1}. (4.20)

Let m be the total number of subintervals [G_j, G_{j+1}] in [0,1], i.e., G_0 = 0 and G_{m+1} = 1. Therefore, we can write

θ(ξ) ≈ Σ_{j=1}^{m} θ_j (H(ξ − G_j) − H(ξ − G_{j+1})), (4.21)

where H(ξ) is the Heaviside function, i.e., H(ξ) = 1 for ξ ≥ 0 and H(ξ) = 0 for ξ < 0. We can decompose the PDE problem of Eq. (4.18) into a set of subproblems defined according to the sub-region decomposition given in Eq. (4.21). Each subproblem is then characterized by constant coefficients:

x^j_t(ξ,t) = θ_j x^j_ξξ(ξ,t),  G_j ≤ ξ < G_{j+1},  j = 1, ..., m. (4.22)

Without going into mathematical detail, it can be shown that the set of functions {x^j(ξ,t)} builds an approximation of the field x(ξ,t) [46]. The important result is that we can adopt such a scheme to estimate the parameter θ(ξ) by decomposing one physical model into a set of sub-models having constant parameters. For each cluster, we define a local model characterized by constant parameters by applying the method presented in Sec. 3.4. The parameters are assumed to be constant within a cluster domain. The global parameter θ(ξ) is reconstructed by interpolating the local estimates θ̂_j. This implies a smaller number of parameters to estimate with respect to the global model, provided the number of clusters is kept smaller than n, which is a reasonable assumption.
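The piecewise-constant approximation of Eqs. (4.20)-(4.21) can be sketched as follows. Taking θ_j as the value of θ at the subinterval midpoint is an assumed choice made for illustration; in the dissertation's scheme the θ_j would be the local cluster estimates.

```python
import numpy as np

# Sketch of Eqs. (4.20)-(4.21): replace theta(xi) by one constant per
# subinterval [G_j, G_{j+1}), here sampled at the subinterval midpoint
# (an assumed choice for illustration).

def piecewise_theta(theta, G):
    """G = [G_0, ..., G_{m+1}] are the subinterval edges on [0,1];
    returns the piecewise-constant approximation theta_hat(xi)."""
    mids = [(G[j] + G[j + 1]) / 2 for j in range(len(G) - 1)]
    vals = [theta(m) for m in mids]
    def theta_hat(xi):
        j = int(np.searchsorted(G, xi, side='right')) - 1
        j = min(max(j, 0), len(vals) - 1)
        return vals[j]
    return theta_hat
```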
Furthermore, the convergence of the EKF should benefit from the decrease in the number of parameters to identify. Since our approach is already cluster-based, this method does not lead to any increase in computation in comparison with the global model case. We show an experiment on domain decomposition in Sec. 4.1.6. The method is applicable to the 2-D case as well; Fig. 4.4 shows an example of the domain decomposition of a 2-D region.

Figure 4.4: An example of domain decomposition in a 2-D region, where the parameter θ(ξ) is assumed to be constant over each subregion.

4.1.6 Simulation results

In this subsection, we examine the problem of estimating spatially varying parameters under unknown BC's via computer simulation. Let us consider the following PDE system:

x_t(ξ,t) = (θ(ξ) x_ξ(ξ,t))_ξ (4.23)

on the interval 0 < ξ < 1 for t > 0, with initial condition x(ξ,0) = 2cos(πξ/2) and boundary conditions x(0,t) = 2 (Dirichlet) and x_ξ(1,t) = 0 (Neumann). The diffusion coefficient varies with space and can be written as

θ(ξ) = θ_0 + θ_1 f(ξ) = 0.1 − 0.2(ξ − 0.5)². (4.24)

Thus, Eq. (4.23) can be rewritten as

x_t(ξ,t) = θ(ξ) x_ξξ(ξ,t) + θ_ξ(ξ) x_ξ(ξ,t). (4.25)

To identify the coefficients, we partition the sensor field into clusters and assume the parameters to be constant over each cluster. The time-varying boundary conditions are acquired by boundary nodes. The estimates of the parameters are compared with the true coefficients in Fig. 4.5 to show their accuracy. Hence, multiple local models with constant parameters can be used to identify and approximate space-varying parameters. This allows a limited number of parameters to be estimated in each cluster.

Figure 4.5: Comparison of local estimates of parameters with their true values.
4.2 Hybrid Data and Decision Fusion Techniques

The distributed algorithm described in Chapter 3 requires that the member nodes of a cluster pass their measurements to the cluster head so that the head can perform the Extended Kalman Filtering (EKF) task. This process is called data fusion. The transmission of the collected time series from member nodes may be expensive in terms of energy consumption, due to the large number of data samples being transmitted, even if the size of a cluster is kept small. An alternative approach is to let each node in a cluster process its own measurements and then pass the estimated parameter values to the cluster head. This is known as decision fusion, which is less expensive in energy consumption. The cluster head only needs to average the received estimates as

θ̂_j = (1/|S_j|) Σ_{s_l ∈ S_j} θ̂_{s_l}, (4.26)

where |S_j| is the number of member nodes s_l in cluster S_j. However, this approach may result in poor estimates, since only the measurements from one sensor are available to the EKF algorithm (i.e., y_{s_i} is a scalar) at each node. A critical factor to consider for the decision-fusion approach is the spatial bandwidth of the sampled process. Switching too soon to decision fusion can lead to spatial aliasing and, thus, wrong estimates, since the spatial sampling frequency decreases. On the other hand, the smoothing effect described in Sec. 2.2 progressively decreases the bandwidth of the sampled field. To find a good balance between estimate quality and communication cost, we consider a hybrid strategy that integrates data fusion and decision fusion over time. Some preliminary iterations are performed within each cluster using the data fusion mode; then, the algorithm switches to the decision fusion mode. This saves energy, since no time series are exchanged among nodes after the initial iterations. We propose two criteria for the switching decision: 1.
a function defined on some diagonal terms of the covariance matrix associated with the parameter estimates; and 2. the number of samples being sent. The algorithm switches to the decision fusion mode if either

f(p_i(θ)) < t_1, (4.27)

or

n_it > t_2, (4.28)

where t_1 and t_2 are two thresholds and n_it is the number of samples sent to the fusion point since the beginning of the iterative algorithm. The two conditions above should also be integrated with a spatial frequency test to avoid the risk of aliasing. Condition (4.28) is used to avoid excessive battery consumption at the participating nodes. Threshold t_2 can be decreased as the remaining energy level of the nodes decreases, in order to extend the lifetime of the network with soft performance degradation. The simple arithmetic average of the parameter estimates may have some drawbacks, since the estimates from some of the nodes can be less reliable than others. Our experiments show that the SNR is not the only factor influencing the quality of the estimates (see Sec. 3.6.1); the performance depends also on the rate of variability of the field. For example, areas where the scalar field is close to stationary cannot lead to good estimates, since the PDE problem turns into an ODE problem at the equilibrium and becomes under-determined. At equilibrium, there is no longer a net diffusion of concentration to observe and, therefore, the diffusion coefficients cannot be estimated. Hence, weighted averages given by

θ̂_j = (1/|S_j|) Σ_{s_l ∈ S_j} w_l θ̂_{s_l}, (4.29)

should be used instead, giving more relevance to the estimates coming from areas with better dynamics.

4.2.1 Simulations

For the first simulation case, we consider the same set-up as that in Sec. 3.6.2. That is, the parabolic equation can be written as

x_t(ξ,t) = (1/50) x_ξξ(ξ,t) − (1/50) x_ξ(ξ,t) − 0.32 x(ξ,t). (4.30)

The equation models a phenomenon of advection-diffusion with dissipation, due to the nonzero velocity and dissipation parameters.
The IC is x(ξ,0) = exp(−(ξ−0.3)²/0.04), with homogeneous Dirichlet BC's x(0,t) = x(1,t) = 0. The field x(ξ,t) is shown in Fig. 4.6. It is initially characterized by a relatively large spatial bandwidth, as it is a narrow pulse. In Fig. 4.7, we compare the estimates of the parameters obtained according to three different modes: 1. data fusion of three sensor nodes; 2. only one sensor node processing its own data; and 3. data fusion with three nodes, switched to one node after 20 iterations. On one hand, one node alone gives poor estimates from the first iteration, due to the aliasing problem. On the other hand, thanks to the smoothing effect described in Sec. 2.2, switching to one node after a few iterations of data fusion within a cluster gives results comparable to data fusion alone, as can be seen by comparing the estimated and original scalar fields shown in Figs. 4.6 and 4.8.

Figure 4.6: The scalar field described by the parabolic Eq. (4.30), where an advection phenomenon is observed and the concentration is moving toward one of the boundaries.

For the second experiment, the measured scalar field is modeled by the following diffusion equation: x_t(ξ,t) = (1/π²) x_ξξ(ξ,t), defined on the interval 0 < ξ < 1 for t > 0, with initial condition x(ξ,0) = 2cos(πξ/2) and Dirichlet boundary conditions x(0,t) = 2 and x(1,t) = 0. We compare the performance of data fusion with two different decision fusion methods, using the arithmetic and the weighted averages of the estimates of the three nodes, respectively. The errors in estimating the diffusivity are compared in Table 4.1. The weights are defined to be proportional to the distance of the nodes from the boundaries.

Table 4.1: Comparison between data and decision fusion.

              data fusion   decision fusion   weighted dec. fusion
Mean error    0.44 %        1.85 %            1.78 %
Std.          3.57 %        7.98 %            2.61 %

Figure 4.7: Estimates of coefficients via data fusion of three sensors, one sensor, and a switch between three/one sensors.

4.3 Energy Consumption and Data Gathering

As mentioned before, data gathering is one of the possible applications of this research. Under this scenario, the PDE model can be used to represent a physical field as an alternative to a distributed time series. For instance, in a sourceless scenario, the field can be expressed by the PDE coefficients, BC's and IC's. The purpose of this section is to analyze the energy cost of our approach. To this end, we consider a monitoring application and make a comparison between PDE-model-based and sample-based data gathering. In this section, we consider a sensor network regularly deployed in the 1-D space ξ ∈ [0,1]. The number of nodes, including the sink, is N+1. The nodes are equally spaced so that the distance between two contiguous nodes is 1/N. Without loss of generality, we assume that the sink is located at the center of the sensor region (i.e., ξ = 1/2) and that N is an even number. The sensor field is represented in Fig. 4.9. The sensor network is monitoring the physical field x(ξ,t) characterized by the following diffusion PDE:

x_t = D x_ξξ,  −1/N < ξ < 1 + 1/N,  t > 0, (4.31)
x(−1/N, t) = x(1 + 1/N, t) = 0, (4.32)
x(ξ,0) = δ(ξ − 1/2). (4.33)

Figure 4.8: The scalar field estimation using the hybrid method.

Figure 4.9: The 1-D uniform sensor field, where the nodes are equally spaced and the sink is located at the center.

Note that the BC's are defined in this way so that the nodes at ξ = 0 and ξ = 1 are not on the boundaries, which simplifies the mathematical expressions.
The BC's are known to the network. The physical field can then be approximated by the following analytical expression:

x(ξ,t) ≈ (1/√(4πDt)) exp(−(ξ − 1/2)² / (4Dt)). (4.34)

This field is plotted over space at different sampled times in Fig. 4.10, and over time at different sampled locations in Fig. 4.11. The task of the sensor network is to enable the base station (BS) to reconstruct x(ξ,t). The communication to the BS is performed through the sink, as shown in Fig. 4.9. As mentioned, we want to compare the energy cost of having all nodes continuously send measured samples to the sink (namely, sample-based data gathering) with that of having some of the nodes perform initial estimates of the diffusivity D and then send the estimate along with an initial condition, not necessarily at time t = 0 (namely, PDE-model-based data gathering).

Figure 4.10: The physical field x(ξ,t) over space at different sampled times.

Since our approach to PDE model identification is based on a cluster architecture, we compare it with a cluster-based data gathering system. The system architecture is shown in Fig. 4.12. The nodes are partitioned into N_c balanced clusters, each having S_c member nodes (including the cluster head), where S_c = N/N_c. We define the cost of transmitting an information packet to a point at distance d as

E_T(d) = E_0 + E_1 d². (4.35)

For simplicity, we consider only the cost of transmitting a packet, neglecting for the time being the costs of receiving packets, sensing and computing. An information packet can be a measurement, a parameter estimate or an aggregation of measurements performed at a cluster. To justify the latter point, we assume perfect data aggregation at the cluster head, i.e., the aggregation of m packets at the cluster head (CH) still gives one and only one packet of the standard size.
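The cost model of Eq. (4.35) and the approximate field of Eq. (4.34) translate directly into code; a sketch (numpy assumed, names illustrative):

```python
import numpy as np

# Sketch of the transmission cost model of Eq. (4.35) and the approximate
# analytical field of Eq. (4.34), centred at the sink position xi = 1/2.

def tx_cost(d, E0, E1):
    return E0 + E1 * d ** 2                                     # Eq. (4.35)

def field(xi, t, D):
    return np.exp(-(xi - 0.5) ** 2 / (4 * D * t)) / np.sqrt(4 * np.pi * D * t)
```

The Gaussian profile is symmetric about the sink and spreads and flattens as t grows, matching the plots of Figs. 4.10 and 4.11.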
Assuming that the nodes talk directly to the sink, the cost of data gathering in one round becomes

$$E_g(N) = N E_0 + \frac{2E_1}{N^2} \sum_{k=1}^{N/2} k^2. \qquad (4.36)$$

Figure 4.11: The physical field $x(\xi,t)$ over time at different sampled locations ($\xi = .1$ and $\xi = .4$).

Figure 4.12: An example of the cluster-based data gathering system.

If the communication is performed through cluster heads, then the cost is the sum of the energy required to aggregate the measurements at each cluster and the energy needed for the cluster heads to pass their measurements to the sink; namely,

$$E_c(N, N_c) = N E_0 + N_c \frac{2E_1}{N^2} \sum_{k=1}^{S_c/2} k^2 + \frac{2E_1}{N^2} \sum_{k=1}^{N_c/2} \left(k S_c - \frac{S_c - 1}{2}\right)^2. \qquad (4.37)$$

Let $n_r$ be the number of data transmission rounds; the total cost for data gathering is

$$E_{dg}(N) = n_r\, E_c(N, N_c). \qquad (4.38)$$

For the PDE-model-based approach, we have to consider energy costs at two stages: parameter estimation and parameter transmission. The cost of $n_i$ iterations of parameter estimation at a cluster of size $S_c$ is

$$E_{est}(N, S_c) = n_i \left[ (S_c - 1) E_0 + \frac{2E_1}{N^2} \sum_{k=1}^{S_c/2} k^2 \right]. \qquad (4.39)$$

To make a full comparison between sample-based and PDE-model-based data gathering, we have to evaluate the number of clusters needed to estimate the model parameters (since the number of active nodes can actually be lower than $N$) and the number of iterations needed for the estimate to converge. For this purpose, two important parameters to compare are the sampling rate $t_g$ for data gathering and the time stepsize $t_s$ for the PDE computation. Recall that $t_s$ must obey the inequality $t_s \le h^2/(2D)$. In general, to obtain good accuracy, the parameter $t_g$ should be set around 10-20 times smaller than the Nyquist rate. On the other hand, if $D = 1$ and $h = 0.1$, then $t_s \le 0.005$.
From the numerical evaluation of the Fourier transform of $x(\xi,t)$, the upper limit for $t_s$ should be around 10 times smaller than the Nyquist sampling rate, and therefore similar to the range of $t_g$. However, note that if the node spacing is too small, say $h = 0.01$, the value of the time stepsize $t_s$ becomes much smaller than the data gathering sampling time $t_g$.

In the data transport phase, we require that the sensor network send BC's, IC's and parameters to the base station. In this example, the parameter is constant and the BC's are known, so they can be ignored in the analysis. Note that our approach can provide IC's at an arbitrary time instance after convergence, as it uses a joint state and parameter prediction method (the EKF). Thus, the IC information can be aggregated with the parameter estimates for an efficient delivery to the BS.

The above analysis can be extended to the 2-D case with a similar approach (i.e., nodes on a 2-D regular grid). In general, if the boundary conditions are unknown, the parameter estimation and data transport for the PDE-model-based method will demand more energy. However, this should involve only a subset of the nodes. In the 2-D case, for a network with $N$ nodes the number of boundary nodes is $O(\sqrt{N})$. Therefore, it should still be cheaper than performing data gathering throughout the whole network.

If $t_s$ is much smaller than the Nyquist sampling time, parameter estimation can be more energy demanding than data gathering. However, it must be pointed out that parameter estimation may involve only a subset of the nodes, and only for a limited amount of time, since it stops when the algorithm converges. Therefore, the cost due to a possibly higher sampling rate for the preliminary estimation of the parameters should be compensated by the smaller number of nodes involved and the shorter sampling time.
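The cost expressions (4.36)-(4.38) can be compared numerically. A minimal sketch, with hypothetical energy constants $E_0 = E_1 = 1$ (the exponent on the cluster-head term follows the $d^2$ cost model of (4.35)):

```python
def E_g(N, E0=1.0, E1=1.0):
    """Per-round cost of direct-to-sink gathering, eq. (4.36)."""
    return N * E0 + (2 * E1 / N ** 2) * sum(k ** 2 for k in range(1, N // 2 + 1))

def E_c(N, Nc, E0=1.0, E1=1.0):
    """Per-round cost of cluster-based gathering, eq. (4.37): intra-cluster
    aggregation plus cluster-head-to-sink transport."""
    Sc = N // Nc
    intra = Nc * (2 * E1 / N ** 2) * sum(k ** 2 for k in range(1, Sc // 2 + 1))
    heads = (2 * E1 / N ** 2) * sum((k * Sc - (Sc - 1) / 2) ** 2
                                    for k in range(1, Nc // 2 + 1))
    return N * E0 + intra + heads

direct, clustered = E_g(100), E_c(100, 10)
```

With $N = 100$ and $N_c = 10$ clusters, the clustered scheme is cheaper per round than direct transmission, since the quadratic distance term dominates for far nodes.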
4.4 Turbulent Diffusion

The purpose of this section is to discuss an open issue from last year's attempted qualifying exam and also to give further motivation for this research direction. The estimation of the coefficients of the parabolic PDE model on gas concentration data from the NIST fire simulator failed. At that time, the failure seemed mainly due to the attempt to capture a 3-D phenomenon with a 2-D model. However, upon further consideration, the main reason for the inability to estimate the PDE coefficients appears to be the attempt to capture the behavior of a macroscopic phenomenon with a model mainly suitable for molecular phenomena.

The fact is that macroscopic diffusion is characterized by turbulent motion of the particles. As a matter of fact, the NIST fire simulator relies on the Navier-Stokes equations, whose solutions are characterized by turbulent diffusion. The diffusion generated by the turbulent motion of the particles is actually much more effective than molecular diffusion for a given diffusivity coefficient $D$. For example, the diffusion of a certain quantity of sugar in a cup of tea would take days, instead of the usual few seconds, without stirring. The operation of stirring introduces turbulence, which dominates over molecular diffusion. Turbulent diffusion is characterized by a random motion of the particles on eddies of variable size. The maximum eddy size depends on the size of the phenomenon [42]. In a parabolic PDE modeling turbulent diffusion, at least one of the coefficients is a random process:

$$x_t = (D\, x_\xi)_\xi + u\, x_\xi, \qquad (4.40)$$

where $u$ is a random velocity term modeling turbulence, and the subscripts $t$ and $\xi$ indicate partial derivatives with respect to time and space, respectively.
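A qualitative illustration of (4.40): an explicit finite-difference integration on a periodic grid, with $u(t)$ drawn at random at each step as a spatially constant velocity. All parameters (grid size, diffusivity, velocity range) are illustrative assumptions, and the time step respects the stability bound $t_s \le h^2/(2D)$ discussed in Section 4.3.

```python
import random

def turbulent_diffusion(n=200, D=1e-3, steps=2000, seed=1):
    """Explicit finite differences for x_t = D x_xixi + u x_xi (4.40) on a
    periodic grid, u(t) a random spatially constant velocity (toy model).
    dt is set to half the diffusion stability limit h^2 / (2D)."""
    random.seed(seed)
    h = 1.0 / n
    dt = h ** 2 / (4 * D)
    x = [0.0] * n
    x[n // 2] = 1.0 / h                      # discrete approximation of a delta IC
    for _ in range(steps):
        u = random.uniform(-0.2, 0.2)        # random turbulent velocity draw
        x = [x[j]
             + dt * D * (x[(j + 1) % n] - 2 * x[j] + x[j - 1]) / h ** 2
             + dt * u * (x[(j + 1) % n] - x[j - 1]) / (2 * h)
             for j in range(n)]
    return x

field = turbulent_diffusion()
```

With periodic boundaries the scheme conserves total mass exactly (the update terms telescope), while the random advection shifts the plume back and forth, which is the high temporal variability that makes tracking the instantaneous coefficient impractical.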
The random coefficient of a PDE model for turbulent diffusion is characterized by high temporal variability, and it is not practical to try to track its instantaneous value [42], as we did in our attempt to identify the PDE parameters associated with the fire data. Therefore, nonstationary covariance models seem a better tool than PDE's for dealing with macroscopic spatio-temporal phenomena. However, it must be pointed out that the initial study on PDE's gave us some very important insights into the characterization of spatio-temporal phenomena that still guide our research, such as the spatio-temporal variability of the bandwidth and the necessity of adopting nonstationary covariance models.

Chapter 5
Optimal Sink Deployment for Distributed Sensing of Spatially Nonstationary Phenomena

The topic of energy efficient acquisition of a spatio-temporal phenomenon via wireless sensor networks, namely correlated data gathering, is investigated in this chapter. In a data gathering round, each node aggregates its own sensed data with data from other nodes, performing entropy coding, and forwards them to its parent node. The final destinations of all data packets are nodes called sinks. The overall energy needed by the (battery supplied) nodes for gathering data depends on the total length of the set of routes to the sinks, called the data gathering tree, and on the rates generated by the nodes. The rate depends on the spatial correlation among the data.

This chapter is organized as follows. The main assumptions are made and the problem is formulated in Section 5.1. Section 5.2 presents analytical results on multisink placement under the assumption of densely deployed sensor networks. Numerical and analytical results on the spatial profile of the rate are given in Section 5.3. Experimental results are shown in Section 5.4. Section 7.2.3 outlines in detail the plan for future work.
Finally, Section 4.4 gives some further motivation for this work in relation to our previous work on the identification of diffusion partial differential equation models.

5.1 Problem Formulation

We consider a set of $N$ sensor nodes deployed over a compact region $\Omega \subseteq \mathbb{R}^p$, sampling a spatio-temporal process $Z(x,t)$, $x \in \Omega$, $t \in \mathbb{R}$, characterized by a separable spatial covariance model $K(Z(x_1,t_1), Z(x_2,t_1))$. All the nodes are connected via a tree structure, the data gathering tree, to a special node called the sink, the root of the tree. The samples of the field $Z(\cdot)$ are transmitted to the sink through the data gathering tree. Our goal is to find the location $s$ of the sink in $\Omega$ that minimizes the overall network cost of transmitting the data to the sink.

We assume that a node performs lossless compression of its own sensing data, either via an explicit communication scheme, i.e., using the side information from its children, or via optimal Slepian-Wolf coding. See Subsec. 2.4.3 or [14]. In both cases, we adopt the shortest path tree (SPT) as the data gathering structure. The SPT is not optimal for the explicit communication case. However, the result provided by [70] justifies choosing the SPT for explicit communication as well, because of the relatively limited energy saved by an optimal data aggregation tree structure (see Subsec. 2.4.3).

The total amount of information transmitted by a node $i$, including the data received from its children, is the rate, $R(i)$. We define the incremental rate, $r(i)$, as the amount of information generated by node $i$ encoding its own sensing data. Let $T_i$ be the tree rooted at node $i$; then $r(i)$ is given by:

$$r(i) = R(i) - \sum_{j \in T_i \setminus \{i\}} R(j). \qquad (5.1)$$

Thus the incremental rate is the net amount of information at node $i$. For the explicit communication scheme, we have that

$$r(i) \ge H(Z_i \mid \{Z_j\}), \quad j \in T_i \setminus \{i\}. \qquad (5.2)$$

We also assume that the sampling time at the sensor nodes is synchronized and that the network is faster than the phenomenon.
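Definition (5.1) can be illustrated with a tiny helper. The sketch below interprets the subtracted set as the children of node $i$, i.e., the packets the node forwards (this is how $T_i \setminus \{i\}$ is read in Sec. 5.3 for explicit communication); the toy tree and rates are made up for illustration.

```python
def incremental_rates(R, children):
    """Incremental rate (5.1): what node i adds beyond the traffic it forwards.
    R maps node -> total transmitted rate R(i); children maps node -> the list
    of its children in the data gathering tree."""
    return {i: R[i] - sum(R[j] for j in children.get(i, [])) for i in R}

# Toy tree: node 0 forwards for children 1 and 2; node 1 forwards for leaf 3.
R = {0: 10.0, 1: 4.0, 2: 3.0, 3: 3.0}
children = {0: [1, 2], 1: [3]}
r = incremental_rates(R, children)
```

Here node 0 transmits 10 bits but 7 of them are forwarded traffic, so its incremental rate is 3; leaves contribute their whole rate.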
Therefore, the effects of temporal correlation can be disregarded. In general, the energy spent by each node $i$, located at $x_i$, for transmitting a packet depends on the rate $R(i)$ (i.e., the number of bits) transmitted and on the distance $d_i$ to the receiver of the packet. Therefore, a typical expression for the total energy spent by the network for a round of data gathering is [21]:

$$E = \sum_{i=1}^{N} R(i)\, d_i^k, \qquad (5.3)$$

where $k = 2, 4$ (in noisy wireless channels, $\exp[R(i)]$ is considered instead of $R(i)$ [50]). We derive an approximate expression for the energy cost (5.3) that makes explicit the dependency on the sink location and depends on the incremental rate.

Proposition 1. Under the assumptions of fixed radio range communications and a shortest path data gathering tree, an approximation up to a constant factor of the energy cost function is given by:

$$E' = \sum_{i=1}^{N} r(i)\, d(x_i, s), \qquad (5.4)$$

where $c_r$ is the radio range and $d(x_i, s)$ is the Euclidean distance between the node location $x_i$ and the sink location $s$.

Proof: We are interested in minimizing an energy cost function equivalent to (5.4). Due to the fixed radio range assumption, $d_i = c_r$ for every node $i$ and

$$E = \sum_{i=1}^{N} R(i)\, d_i^k = \sum_{i=1}^{N} R(i)\, c_r^k. \qquad (5.5)$$

We rewrite (5.5) as a function of the incremental rate $r(i)$:

$$E = \sum_{i=1}^{N} r(i) \sum_{j=1}^{h_i} c_r^k = c_r^{k-1} \sum_{i=1}^{N} r(i) \sum_{j=1}^{h_i} c_r,$$

where $h_i$ is the total number of hops from node $i$ to the sink. We approximate $\sum_{j=1}^{h_i} c_r$ with the Euclidean distance between node $x_i$ and the sink, $d(x_i, s)$. Hence

$$E \approx c_r^{k-1} \sum_{i=1}^{N} r(i)\, d(x_i, s).$$

Since we are interested in the minimization of the energy expression, we can disregard the constant factor $c_r^{k-1}$ and consider

$$E'(s) = \sum_{i=1}^{N} r(i)\, d(x_i, s)$$

as a cost function.

For a scenario of $N_s$ sinks at a set of locations $s_j$, the energy cost becomes:

$$E = \sum_{j=1}^{N_s} \sum_{i=1}^{N_j} r(i)\, d(x_i, s_j).$$
(5.6)

Such an energy metric would not be valid for aggregation structures such as the Steiner Minimum Tree (SMT), because in the latter case the routes to the sink are generally longer than the number of direct hops [70]. Therefore, the Euclidean distance would not be a good approximation for the total length of the communication links between a node and the sink.

Figure 5.1: Example of the placement of three sinks (black dots) and partition of the sensor region (dashed lines).

The optimal sink location, $s^*$, can be found by minimizing expression (5.6), i.e.:

$$s^* = \arg\min_{s} \sum_{i=1}^{N} r(i)\, d(x_i, s), \qquad (5.7)$$

where $s, x_i \in \Omega$. In the case of multiple sinks, we assume that each sensor node communicates either directly or indirectly with one and only one sink, i.e., the data gathering trees rooted at the sinks define a partition of the sensor nodes. Therefore, the generalized problem consists in finding the optimal sink locations $s_i$ and region partition $\{\Omega_i\}$, i.e.:

$$\{\Omega_i^*, s_i^*\}_{i=1}^{N_s} = \arg\min_{\Omega_i,\, s_i} \sum_{j=1}^{N_s} \sum_{i=1}^{N_j} r(i)\, d(x_i, s_j), \qquad (5.8)$$

where $N_s$ is the number of sinks and $\bigcup_{i=1}^{N_s} \Omega_i = \Omega$.

The optimization problems (5.7) and (5.8) require the computation of the rate allocation at the nodes, $r(i)$. The rate allocation depends on several factors, such as the data aggregation strategy, the data aggregation structure and the spatial covariance structure of the phenomenon. Note in particular that different sink locations can lead to different rate profiles, because the sink is the root of the data gathering tree and thus its location impacts the tree structure. Therefore, this may seem a sort of 'chicken and egg' problem: we need to know the rate allocation in order to find an optimal sink location, and we need a sink location in order to compute the rate assignments. It would seem that this problem could be solved only by brute force, evaluating and comparing the energy aggregation cost for all the different candidate sink locations.
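The brute-force approach to (5.7) itself is straightforward once the rates are fixed. A minimal sketch (node layout, rates and candidate set are made-up examples):

```python
import math

def best_sink(nodes, rates, candidates):
    """Brute-force solution of (5.7): choose the candidate location that
    minimizes the rate-weighted total Euclidean distance to the nodes."""
    cost = lambda s: sum(r * math.dist(x, s) for x, r in zip(nodes, rates))
    return min(candidates, key=cost)

# Uniform rates on a symmetric grid put the sink at the centre; skewing the
# rates toward one end of a line pulls the optimum that way.
grid = [(i, j) for i in range(3) for j in range(3)]
line = [(i, 0) for i in range(11)]
```

In one dimension with the Euclidean metric, the minimizer is the rate-weighted median of the node positions, which is why a skewed rate profile displaces the sink from the geometric centre.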
However, in Section 5.3 we will show that rate profiles generally exhibit a weak sensitivity to variations in the sink location. This allows considering the rate allocation as a geographical weighting of the sensor region, depending on the statistical properties of the phenomenon being monitored. The correlation, and hence the rates at the nodes, can be estimated from some preliminary measurements over the region of interest, for instance through the deployment of a preliminary sensor network [35]. Then the problem can be solved off-line in a centralized manner.

5.2 Massively Dense Sensor Networks

The aim of this section is to find analytical solutions of problems (5.7) and (5.8) under some simplifying assumptions. We consider a massively dense sensor network performing correlated data gathering, so that the rate profile can be considered a continuous function. The idea of representing sensor networks via a dense spatial distribution of nodes, instead of a discrete set of sparse node locations, has been adopted before (e.g., [63]). As in [63], we assume that the nodes are distributed over a compact region $\Omega \subseteq \mathbb{R}^p$, $p = 1, 2$, with spatial density $\delta(x)$. We denote the area (or, if $p = 1$, the length) of $\Omega$ as $|\Omega|$. Each node performs data gathering and generates an amount of bits equal to the incremental rate. The information is transmitted toward the sink via multihopping. We assume that the density $\delta(x)$ is such that the incremental rate can be regarded as a continuous function over $\Omega$. To refer to the incremental rate as a scalar field over a continuous space domain, we introduce the expression data aggregation density function, defined as:

$$\alpha(x, s) := r(x, s), \qquad (5.9)$$

where we have highlighted the dependency of the incremental rate on the sink location $s$ (see Sec. 5.1). Note that $\alpha(x,s)$ also depends on the node density $\delta(x)$, on the aggregation strategy chosen and on the covariance function $K(Z(x_1), Z(x_2)) = K(x_1, x_2)$.
2 The data aggre- gation density function is nonnegative: α(x,s)≥0 everywhere in the sensor region, Ω. It is measured in bits/m p . The total amount of information generated over Ω can be 2 [63] uses a information density function that is the spatial distribution of the rate. They do not consider any kind of compression. 68 expressed as: I(Ω,s) = Z Ω α(x,s)dx. (5.10) This is the total information collected at the sink after a data gathering round. Based on Proposition 1, we express the cost of sendingα(x) bits fromx to the sink, at point s, as dE =α(x)kx−sk, where kx−sk is the Euclidean distance between the two points. Hence, the total cost of a data gathering round for the network can be written as: E(Ω,s) = Z Ω α(x,s)kx−skdx. (5.11) Therefore the optimal sink location problem, under the assumption of massively dense sensor networks, is written as: s ∗ =argmin s E(Ω,s) (5.12) In the multiple sink case, the data aggregation cost can be expressed as: E(Ω,s i )= X i Z Ω i α i (x,s)kx−s i kdx. (5.13) Findingtheminimalaggregation energy requiresfindingtheoptimal partition{Ω i }and then the optimal sink location within each subregion Ω i , i.e.: {Ω ∗ i ,s ∗ i } n i=1 =arg min {Ω i ,s i } E(Ω,s i ). (5.14) We now show the convexity of the objective function (5.11). Proposition 2 Consider a certain data aggregation density function α(x,s) defined over a compact region Ω. The total information I(Ω ′ ), Ω ′ j Ω is a monotonically increasing function of the monitored area. Proof: If Ω 1 jΩ 2 , then I(Ω 1 )≤I(Ω 2 ), (5.15) because |Ω 1 |≤|Ω 2 | and α(x,s)≥0. A similar result holds for the energy: Lemma 1 Consider a certain data aggregation density function α(x) defined over a compact region Ω. For a fixed sink location s, the total aggregation energy E(Ω,s) is a monotonically increasing function of the monitored area Ω. Proof: It follows from the fact that bothα(x,s) andks−xk are nonnegative for every x. 
In the rest of this section, to further simplify the analysis, we disregard the dependency of $\alpha(\cdot)$ on the sink location $s$ and therefore denote it as $\alpha(x)$. This simplification is shown to be valid in Section 5.3.

Proposition 3. If $\{x_j : \alpha(x_j) = 0\}$ has zero measure, the objective function (5.11) is convex for $\Omega \subseteq \mathbb{R}$.

Proof: Let $\Omega = [0, L]$. By setting the first order derivative of (5.11) with respect to $s$ equal to zero, we have:

$$\int_0^{s_{opt}} \alpha(x)\, dx = \int_{s_{opt}}^{L} \alpha(x)\, dx. \qquad (5.16)$$

This implies that the objective function has only one minimum point, because $I(\cdot)$ is a monotonically increasing function. Moreover, it can be shown that the second order derivative of (5.11) is positive. Therefore the objective function is convex.

This means that in one dimension, the optimal location of the sink is such that the amount of information associated with the two subregions $[0, s^-]$ and $[s^+, L]$ is the same. The convexity of (5.11) seems to hold in the multidimensional case as well.

We also provide a general result for the deployment of multiple sinks in a one-dimensional region.

Proposition 4. Consider the deployment of 2 sinks in the interval $[0, L] \subseteq \mathbb{R}$. The optimal sink locations $s_1, s_2$ and partition point $v$ have the following properties: (1) each sink location $s_i$ satisfies condition (5.16) over its subregion, and (2) $v = (s_1 + s_2)/2$.

Proof: Without loss of generality, we consider $\Omega = [0, 1]$. We define

$$P_\alpha(y) := \int_0^y \alpha(x)\, dx, \qquad (5.17)$$
$$P_{\alpha x}(y) := \int_0^y \alpha(x)\, x\, dx, \qquad (5.18)$$

so that $\int_a^b \alpha(x)\, dx = P_\alpha(b) - P_\alpha(a)$ and $dP_\alpha(y)/dy = \alpha(y)$. The cost is

$$E = \int_0^v \alpha(x)\, |s_1 - x|\, dx + \int_v^1 \alpha(x)\, |s_2 - x|\, dx \qquad (5.19)$$
$$= 2 P_\alpha(s_1)\, s_1 - P_\alpha(0)\, s_1 - P_\alpha(v)\, s_1 - 2 P_{\alpha x}(s_1) + P_{\alpha x}(0) + P_{\alpha x}(v)$$
$$\quad +\, 2 P_\alpha(s_2)\, s_2 - P_\alpha(v)\, s_2 - P_\alpha(1)\, s_2 - 2 P_{\alpha x}(s_2) + P_{\alpha x}(v) + P_{\alpha x}(1),$$

where each integral has been split at the sink location to resolve the absolute values.
Setting the partial derivatives to zero:

$$\partial E / \partial s_1 = 0 \;\Rightarrow\; P_\alpha(s_1) = \frac{P_\alpha(0) + P_\alpha(v)}{2}, \qquad (5.20)$$
$$\partial E / \partial s_2 = 0 \;\Rightarrow\; P_\alpha(s_2) = \frac{P_\alpha(v) + P_\alpha(1)}{2}, \qquad (5.21)$$
$$\partial E / \partial v = 0 \;\Rightarrow\; v = \frac{s_1 + s_2}{2}. \qquad (5.22)$$

Note that expression (5.20) (as well as (5.21)) can be rewritten as

$$\int_0^{s_1} \alpha(x)\, dx = \int_{s_1}^{v} \alpha(x)\, dx,$$

which is the optimality condition we derived in the single sink case. Solving the system of equations (5.20)-(5.22) gives the optimal sink locations and the partition point of $[0,1]$.

Lemma 2. The above result can be extended to the case of $n$ sinks.

Properties (1) and (2) of Proposition 4 are useful for defining an iterative algorithm for sink placement, where the region partition is iteratively adjusted after computation of the optimal sink locations. We also formulate the following conjecture:

Conjecture 1. A necessary condition for a partition $\{\Omega_i\}$ of $\Omega$ and related sink placement $s_i$, $i = 1, \ldots, n_s$, to be optimal is that the aggregation energy is the same over each subregion, i.e., $E_i = E(\Omega)/n_s$ over each subregion $\Omega_i$ for all $i$.

Proof (sketch): The conjecture may actually require some additional hypotheses on $\alpha(\cdot)$. The necessity can be argued by considering that when the sink locations and the partitions are optimal, the energy over each $\Omega_i$ must be equal, because the gradient of the energy is zero.

Example 4. We consider the optimal partition and sink placement for $\alpha(x) = x$ on $[0,1]$. We need to find:

$$(v^*, s_1^*, s_2^*) = \arg\min_{(v,\, s_1,\, s_2)} \int_0^v x\, |s_1 - x|\, dx + \int_v^1 x\, |s_2 - x|\, dx.$$

We get $v^* = .65$, $s_1^* = .46$ and $s_2^* = .83$. The profile of the data aggregation function and the optimal sink locations and partition point are shown in Figure 5.2. We also observed that the energy aggregation cost over the two subregions $[0, .65]$ and $(.65, 1]$ turns out to be the same. This seems to support the validity of Conjecture 1, i.e., the optimal solution for the multisink case is characterized by the same data gathering cost for each subregion.
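The iterative scheme suggested by Proposition 4 can be sketched directly: alternate between placing each sink at the information-balance point of its subregion (condition (5.16)) and resetting the partition point to the midpoint of the sinks (condition (5.22)). Grid size and iteration count below are arbitrary choices.

```python
def optimal_two_sinks(alpha, n=4000, iters=100):
    """Iterative two-sink placement on [0,1] based on Proposition 4."""
    h = 1.0 / n
    xs = [(i + 0.5) * h for i in range(n)]
    w = [alpha(x) * h for x in xs]            # information mass per cell

    def balance_point(lo, hi):
        # location splitting the information mass of [lo, hi) in half, as in (5.16)
        cells = [(x, m) for x, m in zip(xs, w) if lo <= x < hi]
        half = sum(m for _, m in cells) / 2
        acc = 0.0
        for x, m in cells:
            acc += m
            if acc >= half:
                return x
        return hi

    v = 0.5
    for _ in range(iters):
        s1 = balance_point(0.0, v)
        s2 = balance_point(v, 1.0)
        v = 0.5 * (s1 + s2)                   # condition (5.22)
    return v, s1, s2

v, s1, s2 = optimal_two_sinks(lambda x: x)    # Example 4: alpha(x) = x
```

For $\alpha(x) = x$ the iteration converges to $v \approx .65$ and $s_1 \approx .46$, matching Example 4; the discretized solver lands at $s_2 \approx .84$, close to the reported $.83$.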
Figure 5.2: Profile of $\alpha(x)$ and partition of the 1-D sensor field into two subregions, with optimal sink locations at $s_1^* = .46$ and $s_2^* = .83$.

Example 5. We consider here a slowly increasing data aggregation function, $\alpha(x) = \log_2(a + bx)$, with $a, b > 0$ and $x \in [0,1]$. For $a = 2$ and $b = 10$, the optimal sink location is $s^* = .6$; see Figure 5.3. Moreover, the ratio between the energy cost associated with the optimal sink location and the cost associated with the sink at the centre is $E(.6)/E(.5) = .95$. In other words, in this case there is a 5% energy saving in placing the sink at the optimal location rather than at the centre of the sensor region. The value of the energy as a function of the sink location is shown in Figure 5.4.

Figure 5.3: Profile of $\alpha(x) = \log_2(2 + 10x)$ and optimal sink location at $s^* = .6$.

Figure 5.4: Data aggregation cost as a function of the sink location.

A closed form solution for the case of Euclidean distance is difficult to evaluate in the 2-D case. However, it is interesting to notice that, when changing the distance metric from Euclidean to quadratic, the optimal sink location is the centroid of $\Omega$. Thus, if

$$E(\Omega, s) = \int_{\Omega} \alpha(x)\, \|x - s\|^2\, dx, \qquad (5.23)$$

then

$$s^* = \frac{\int_{\Omega} \alpha(x)\, x\, dx}{\int_{\Omega} \alpha(x)\, dx}. \qquad (5.24)$$

This result is useful because it allows a closed form calculation of the optimal sink location, although for a different energy metric, simply from the spatial distribution of the incremental rate.

5.3 Analysis of the Rate Profile

In this section, we study the relationship between the correlation model and the spatial profile of the incremental rate, which we refer to as the data aggregation density function, $\alpha(x, s)$, $x, s \in \Omega$. As mentioned, $\alpha(\cdot)$
depends on several factors, such as: (1) the spatial covariance of the monitored phenomenon, (2) the coding strategy adopted (e.g., Slepian-Wolf vs. explicit communication), (3) the location of the sink and (4) the locations of the nodes. Our goal is to analyze how these factors affect the profile of the incremental rate. To this end, we need some further assumptions on the structure of the spatial correlation model, besides the fact that it is possibly nonstationary. Thus, we introduce a general class of nonstationary covariance models (Subsec. 5.3.1) characterized by slow variability over space. Then, we analyze the sensitivity of $\alpha(\cdot)$ to the sink location and to node perturbations (Subsections 5.3.2 and 5.3.3), evaluate the impact of node density in this context (Subsec. 5.3.4) and study the steepness of the data aggregation density function (Subsec. 5.3.5).

We assume that the phenomenon $Z$ has a multivariate Gaussian spatial distribution $\mathcal{N}_n(\mu, K)$, where $n$ is the number of observation points and $K$ is the covariance matrix. Thus, the incremental rate $r(i)$ at node $i$ can be computed via the following expression [12], [21]:

$$r(i) := \frac{1}{2} \log_2\!\left( (2\pi e)\, \frac{\det(K[T_i])}{\det(K[T_i \setminus \{i\}])} \right) - \log_2 \Delta, \qquad (5.25)$$

where $\Delta$ is the quantization step size and $K[T_i]$ is the covariance matrix of the subset of nodes $T_i$. In the case of explicit communication, $T_i$ is the subtree of nodes rooted at $i$ and $T_i \setminus \{i\}$ is the set of children of $i$. For the optimal Slepian-Wolf aggregation, $T_i$ is the union of $i$ itself with the set of nodes that are closer than $i$ to the sink [14].

5.3.1 A class of nonstationary covariance models

Closed form expressions for nonstationary spatial covariance functions have been derived only in the last 2-3 years and are usually complicated, because they need to guarantee the property of nonnegative definiteness [44], [60]. See Subsec. 2.4.4. For the purpose of our analysis, we need correlation models that are as simple as they are general.
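Returning to (5.25), the rate computation is a small exercise in Gaussian conditional entropy. A minimal sketch with a toy exponential covariance over three collinear nodes (positions and quantization step are assumed values):

```python
import math

def det(M):
    """Determinant via Gaussian elimination (adequate for SPD covariance blocks)."""
    A = [row[:] for row in M]
    d = 1.0
    for i in range(len(A)):
        d *= A[i][i]
        for j in range(i + 1, len(A)):
            f = A[j][i] / A[i][i]
            for k in range(i, len(A)):
                A[j][k] -= f * A[i][k]
    return d

def incremental_rate(K, idx, cond, delta=1e-3):
    """Eq. (5.25): rate of node idx given the nodes in cond, with quantization
    step delta, in bits."""
    sub = cond + [idx]
    Kn = [[K[a][b] for b in sub] for a in sub]
    Kc = [[K[a][b] for b in cond] for a in cond]
    ratio = det(Kn) / det(Kc) if cond else det(Kn)
    return 0.5 * math.log2(2 * math.pi * math.e * ratio) - math.log2(delta)

# Exponential covariance over three nodes at positions 0, 1, 2 (a toy model).
pos = [0.0, 1.0, 2.0]
K = [[math.exp(-abs(a - b)) for b in pos] for a in pos]
r_alone = incremental_rate(K, idx=1, cond=[])
r_cond = incremental_rate(K, idx=1, cond=[0])   # side information from a child
```

Conditioning on a correlated neighbor reduces the rate, which is exactly the mechanism that shapes the profile $\alpha(\cdot)$.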
Thus, we define a class of covariance models by bounding their 'deviation from stationarity'. Consider the two covariance coefficients $K_1 = K(x, x + \Delta x)$ and $K_2 = K(y, y + \Delta x)$, with $x, y \in \Omega \subseteq \mathbb{R}^p$ and the vector $\Delta x \in \mathbb{R}^p$. We bound the absolute value of their difference, $|K_1 - K_2|$, by a function growing linearly with the distance separating the two pairs of measurements, i.e., $|x - y|$. If $K(\cdot)$ is isotropic and stationary, $K_1 = K_2$.

In defining this class of models, we also observe that nonstationary covariance models usually exhibit a behavior similar to stationary models in the long range: as the distance between two measurement points increases, their covariance decreases, for instance with exponential decay [60]. For the short range behavior, we impose that the covariance between two points can grow at most linearly with the distance from a given pair of points, because we assume that a model for many nonstationary natural phenomena would still exhibit a quasi-stationary behavior in the short range [57]. Therefore, for the points $x, y, w \in \Omega \subseteq \mathbb{R}^p$ and the vector $\Delta x \in \mathbb{R}^p$, we consider covariance functions obeying the following additional conditions on short and long range behavior:

$$|K(x, x + \Delta x) - K(y, y + \Delta x)| \le a\, |x - y|, \qquad (5.26)$$
$$K(x, w) \le b\, e^{-c\, |x - w|}, \qquad (5.27)$$

for some real constants $a$, $b$ and $c$. We refer to condition (5.26) as distance dependent variability and to condition (5.27) as exponential decay. These conditions seem valid for many natural phenomena and are respected by some existing nonstationary correlation models [13], [28].

Figure 5.5: Absolute variation of the estimated variance over space (rain fall data set) and the bounding line of condition (5.26).

Figure 5.6: Absolute variation of an estimated covariance coefficient over space (rain fall data set) and the bounding line of condition (5.26).
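Conditions (5.26)-(5.27) are easy to check numerically for a candidate model. The sketch below uses an assumed toy nonstationary covariance (exponential decay with a slowly space-varying standard deviation) and verifies both conditions on a grid; the constants $a$, $b$, $c$ are candidate values, not derived bounds.

```python
import math

def K(x, y):
    """Toy nonstationary covariance: exponential decay scaled by a slowly
    varying standard deviation sigma(x) = 1 + 0.05 x (an assumed model)."""
    return (1 + 0.05 * x) * (1 + 0.05 * y) * math.exp(-abs(x - y))

a, b, c = 0.2, 2.25, 1.0                   # candidate constants for (5.26)-(5.27)
pts = [0.5 * i for i in range(21)]         # grid on [0, 10]
holds_26 = all(abs(K(x, x + d) - K(y, y + d)) <= a * abs(x - y) + 1e-12
               for x in pts for y in pts for d in (0.0, 0.5, 1.0))
holds_27 = all(K(x, w) <= b * math.exp(-c * abs(x - w)) + 1e-12
               for x in pts for w in pts)
```

The same grid check applied to empirical covariance estimates is essentially what Figures 5.5-5.7 show for the rain fall data.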
Figure 5.7: Covariance coefficient and bounding exponential ($34.12\, e^{-.30\Delta x}$) according to condition (5.27).

Example 6. To test the validity of our distance dependent variability condition (5.26), we used rain fall data provided by the University of Washington (see Section 5.4 for more details on the data set). We considered one-dimensional subsets of the data. We rescaled the space domain so that the locations of each set of measurements map to the points $x = 0, 1, \ldots$. We estimated the covariance coefficients $K(x,x)$ and $K(x, x+1)$ and then plotted $|\hat{K}(0,0) - \hat{K}(x,x)|$ and $|\hat{K}(0,1) - \hat{K}(x, x+1)|$, $x = 0, 1, \ldots$, with their bounding lines (Figures 5.5 and 5.6). In both cases, the estimated covariance shows a clearly nonstationary behavior (if stationary, it would be constant over $x$). And in both cases condition (5.26) is respected, because the difference between covariance coefficients grows no faster than linearly with distance in the short range. We also notice that this difference saturates beyond a certain distance. We also verified condition (5.27) and plotted $\hat{K}(0, \Delta x)$ with its bounding exponential in Figure 5.7. Different subsets of the rain data set yielded similar results.

5.3.2 Variability with respect to the sink location

We want to analyze the dependency of the data aggregation density function $\alpha(x, s)$, $x, s \in \Omega \subseteq \mathbb{R}^p$, on the sink location $s$. From numerical results, we have observed that this dependency is usually fairly weak and that, therefore, given two distinct sink locations $s_1$ and $s_2$, we can consider $\alpha(x, s_1) \approx \alpha(x, s_2)$ for the purpose of optimal sink placement in $\Omega$. We consider first an example.

Example 7. We considered a subset of the rain fall data (see Sec. 5.4) over a 5 by 6 rectangular grid. We computed the data aggregation density for two different locations of the sink: one at the center and the other at the corner (5,6) of the region, adopting the explicit communication scheme. Let $\alpha_0(\cdot)$ and $\alpha_1(\cdot)$ be
respectively the data aggregation density functions for the central and corner locations of the sink. We computed the error $|\alpha_0(x) - \alpha_1(x)|$ and normalized it. The average absolute percent error is around 5.5%. The absolute normalized error field is plotted in Figure 5.8. The plot shows that the peaks of the error are at the locations of the two sinks.

Figure 5.8: Normalized error, $100\, |\alpha_1(x,y) - \alpha_2(x,y)| / |\alpha_1(x,y)|$, between the data aggregation density functions for two different sink locations.

Intuitively, the structure of the data gathering tree will not change in the same way everywhere over $\Omega$ as the sink changes location. As Figure 5.8 shows, at some locations the error is zero, because the data aggregation tree has not changed there. At some other locations there is a small difference, because there is a lot of overlap among the nodes used to compute $\alpha(\cdot)$.

Figure 5.9: The two different sets of children considered for computing the incremental rate at node $i$ for the sink locations $s_1$ and $s_2$. In this case there is no overlap between the sets.

To gain better insight into the behavior of the data aggregation function, we want to find the upper bound

$$\max_{s_1,\, s_2,\, x_i} |\alpha(x_i, s_1) - \alpha(x_i, s_2)|, \quad s_1, s_2, x_i \in \Omega, \qquad (5.28)$$

under the conditions (5.26)-(5.27) on the correlation model and for the explicit communication case. To compute the upper bound of expression (5.28), we assume the worst case scenario: $s_1, s_2, x_i$ are such that the set of children of the node at $x_i$ changes completely as the sink switches from $s_1$ to $s_2$ (see Figure 5.9). Due to condition (5.27), only a limited number $n$ of children is used, as the others have little correlation with the point $x_i$.
Hence

$$|\alpha(x_i, s_1) - \alpha(x_i, s_2)| = \frac{1}{2} \left| \log_2 \frac{|K_n^{(1)}|\, |K_{n-1}^{(2)}|}{|K_{n-1}^{(1)}|\, |K_n^{(2)}|} \right|, \qquad (5.29)$$

where $K_{n-1}^{(j)}$ is the covariance matrix associated with the children of node $i$ when the sink is $s_j$, and $K_n^{(j)}$ is the covariance matrix associated with node $i$ and its children; the $(2\pi e)$ and quantization terms of (5.25) cancel in the difference. We express the matrices in the numerator as perturbations of the matrices in the denominator:

$$K_n^{(1)} = K_n^{(2)} + M_n. \qquad (5.30)$$

We have that $\|M_n\|$ is bounded due to condition (5.26). By expressing the determinants as products of eigenvalues and applying the Bauer-Fike theorem from matrix perturbation theory [25], we obtain:

$$\max_{s_1,\, s_2,\, x_i} |\alpha(x_i, s_1) - \alpha(x_i, s_2)| \le \log_2 \frac{(\lambda_{max}^{(1)} + \delta_n)^n\, (\lambda_{max}^{(2)} + \delta_{n-1})^{n-1}}{\prod_{j=1}^{n} \lambda_j^{(2)}\; \prod_{j=1}^{n-1} \lambda_j^{(1)}}, \qquad (5.31)$$

where $\delta_n = \|M_n\|$ and $\lambda_j^{(m)}$ is an eigenvalue of $K^{(m)}$. A similar result can be obtained for the Slepian-Wolf case as well.

In general, small perturbations of an ill conditioned matrix can lead to large variations of the eigenvalues. However, a covariance matrix is symmetric by definition. This implies that the maximum variation of an eigenvalue is bounded by a norm of the perturbation matrix alone, which is a small number because of the hypothesis of slow variation of the covariance model (5.26). Hence, small perturbations of the eigenvalues lead to a small variation of the determinant. Thus, the analytical results confirm the intuition from the numerical results: the spatial profile of the incremental rate has a weak sensitivity

Figure 5.10: Variable node density, stationary correlation model, Slepian-Wolf coding.

versus different choices of the sink location (and of the data gathering tree).

5.3.3 Sensitivity to node perturbations

From numerical tests, we observed that, for the same correlation model, the optimal location of the sink shows a weak sensitivity to perturbations of the node locations.
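This weak sensitivity can be illustrated with a small numerical example: perturb a toy node-plus-child covariance matrix by a small $M_n$ as in (5.30) and evaluate the rate difference (5.29). All matrix entries are assumed illustrative values.

```python
import math

# Covariance of a node and one child under sink s2 (toy numbers), and a small
# perturbation M_n (eq. (5.30)) standing for the child set under sink s1.
K2 = [[1.0, 0.5], [0.5, 1.0]]
M = [[0.01, 0.005], [0.005, 0.01]]
K1 = [[K2[i][j] + M[i][j] for j in range(2)] for i in range(2)]

det2 = lambda A: A[0][0] * A[1][1] - A[0][1] * A[1][0]

# Rate difference (5.29); the (2 pi e) and quantization terms cancel.
d_alpha = 0.5 * abs(math.log2(det2(K1) * K2[0][0] / (det2(K2) * K1[0][0])))
```

A perturbation of order $10^{-2}$ on the covariance entries changes the incremental rate by well under a hundredth of a bit, consistent with the bound (5.31).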
For example, we compared the results for different realizations of a uniform random deployment of the nodes. From the analytical point of view, we obtain a result analogous to the bound (5.31) presented in Subsection 5.3.2, because a perturbation of the node locations leads to a perturbation of the covariance matrices in the expression for the computation of \alpha(\cdot).

5.3.4 Node density

From numerical tests, it seems that a variable node density over a sensor region \Omega does not impact the optimal sink location as long as the spatial sampling frequency is above the Nyquist frequency over each subregion of \Omega (see Figure 5.10). On the other hand, if the nodes, not uniformly placed in the field, are sampling a spatially stationary phenomenon and their spacing is such that their measurements are uncorrelated, then the sink location depends on the node locations, because the rate at each node is identical.

5.3.5 Maximum steepness of the incremental rate

If the data aggregation density function has a nearly flat or even profile, then the optimal location for the sink is the centre of the region. Hence, we want to evaluate how steep the profile \alpha(\cdot) can be, and therefore how eccentric the optimal sink location can be, for a correlation structure obeying the conditions of smooth spatial nonstationarity given in (5.26)-(5.27). For simplicity, we assume a one-dimensional regular deployment of nodes at x_i = 0, 1, \ldots and the data to be independent. Under such assumptions, the data aggregation function is computed as:

\alpha(x_i) = \frac{1}{2} \log_2\big( (2\pi e) K(x_i, x_i) \big) - \log_2 \Delta.    (5.32)

Since the autocovariance K(x_i, x_i) can grow at most linearly as i increases, due to condition (5.26), we have that

\alpha(x_i) \le \frac{1}{2} \log_2\big( (2\pi e)(a |x_i - x_0| + K(0,0)) \big) - \log_2 \Delta, \quad i = 0, 1, \ldots,    (5.33)

where a is a positive constant. Hence the steepest profile that the incremental rate can have is at most logarithmic.
We showed in Example 5 that this implies that the optimal sink location is generally close to the centre, and that therefore the amount of energy saved by optimally placing a sink is relatively low with respect to placing the sink at the centre. An analytical bound on the maximum steepness of \alpha(\cdot) seems harder to obtain for the case of correlated data. Experiments and some preliminary analysis seem to suggest that also in this case the incremental rate profile is still bounded by a slowly growing function over the space. However, the bound (5.33) does not seem to apply to the correlated case.

5.4 Experimental Results

We considered the rain fall data set provided by the University of Washington [65]. The data set provides the daily rain concentration in the North Western region of the US (Washington) for a period of 49 years, for multiple observation points located on a regular grid. The minimum distance between two points is 50 km. This data set has been used in different works in the sensor network literature [45], [35]. A step needed to estimate the spatial covariances is to remove any seasonal trend from the data, because the time series need to be stationary for the consistency of the spatial estimates. We considered the 365-day difference of the data (i.e. W(x,t) := Z(x,t) - Z(x,t-365)) and then the 10-day cumulative over a period of 80 days during the rainy season, for a period of 30 years. Thus the spatial covariance between measurements at the locations x_i and x_j is estimated as:

\hat{K}(x_i, x_j) = \frac{1}{T} \sum_{t=t_1}^{t_2} Z(x_i, t) Z(x_j, t).    (5.34)

The resulting covariance estimates showed spatial nonstationarity, especially in subregions characterized by mixed terrain (e.g. plains and mountains, or coastal and inland regions).

Figure 5.11: Data aggregation function (incremental rate) for a 1D subset of the rain data set.

From the covariance estimates, we computed the rate profiles for different subsets of the nodes, coding schemes and sink locations.
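The detrending and covariance-estimation steps can be sketched as follows. The series below is synthetic, standing in for the rain record; the station count, series length and noise level are illustrative assumptions, not the values used in the experiments:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic daily series for 4 stations over 3 "years" of 365 days,
# with a shared seasonal component (a stand-in for the rain data).
days, n_stations = 3 * 365, 4
season = np.sin(2 * np.pi * np.arange(days) / 365.0)
Z = season[:, None] + 0.5 * rng.standard_normal((days, n_stations))

# Remove seasonality via the 365-day difference:
# W(x, t) = Z(x, t) - Z(x, t - 365).
W = Z[365:] - Z[:-365]

# Sample spatial covariance over the detrended series, as in Eq. (5.34).
T = W.shape[0]
K_hat = (W.T @ W) / T
print(K_hat)
```

The resulting matrix is symmetric with non-negative diagonal, as a covariance estimate must be; on the real data this estimate is computed pairwise over the observation grid.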
The data were assumed to have a multivariate Gaussian distribution. Overall, the rate profiles showed a fairly even distribution; e.g., see Figure 5.11. For the sink placement, we used the Matlab optimization toolbox to minimize the energy metric given below (gradient search method):

E(s) = \sum_{i=1}^{N} r(i) \, \| s - x_i \|.    (5.35)

We also determined the optimal sink location by computing the data aggregation cost for all the grid locations within a subregion (brute force method). Regardless of the method chosen, for the different subsets of the rain data, the optimal sink locations turn out to be within one hop of the centre of the regions being considered. See the following example.

Example 8 We considered the subregion of the observation field shown in the box in Figure 5.12. We set the centre as the initial sink location. We computed the incremental rate profiles for both the explicit communication and Slepian-Wolf schemes (Figures 5.13 and 5.14). In both cases, the optimal sink location turned out to be very close to the centre of the region (less than one hop); see Figure 5.12. The final result did not change when choosing different initial sink locations. We also show the spatial profile of the data aggregation cost in Figure 5.15. Note that Figures 5.13 and 5.14 show the different profiles of the bit rate for the explicit communication and Slepian-Wolf cases. In the former scheme, the highest rates are usually at the boundary nodes, because those have no side information available to code their data. On the other hand, in the Slepian-Wolf scheme, the lowest rates are close to the sink. In any case, the optimal sink location was the same for both schemes.

Figure 5.12: Subregion considered for the experiment, delimited by the rectangular box, and optimal sink location, 'X'.

The experiments confirm what was suggested by the analysis (e.g. see Subsec. 5.3.5).
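The brute-force placement used as a cross-check above can be sketched in a few lines. The grid of nodes and the flat rate profile below are hypothetical; Eq. (5.35) is the cost being minimized:

```python
import numpy as np

# Hypothetical 7x7 grid of nodes with a uniform per-node rate r(i).
xs, ys = np.meshgrid(np.arange(7.0), np.arange(7.0))
nodes = np.column_stack([xs.ravel(), ys.ravel()])
rates = np.ones(len(nodes))  # flat incremental-rate profile

def cost(s):
    """Data aggregation cost E(s) = sum_i r(i) * ||s - x_i|| (Eq. 5.35)."""
    return float(np.sum(rates * np.linalg.norm(nodes - s, axis=1)))

# Brute force: evaluate E(s) at every grid location and keep the minimum.
best = min(map(tuple, nodes), key=lambda s: cost(np.array(s)))
print(best)  # a flat rate profile places the optimal sink at the centre
```

With a symmetric deployment and even rates the minimizer is the centre of the grid, which is the behavior the rain-data experiments exhibit as well.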
Despite the nonstationarity of the data, the incremental rates turn out to have a more or less even profile, mostly due to the effect of the logarithm calculation. We will consider different energy metrics in future work (Sec. 7.2.3).

Figure 5.13: Rate profiles for a 2D subset of the rain fall data, computed using the explicit communication strategy.

Figure 5.14: Rate profiles for a 2D subset of the rain fall data, computed using the optimal Slepian-Wolf strategy.

Figure 5.15: Data aggregation cost profile for explicit communication aggregation associated to a 2D subset of the rain fall data. The optimal sink location is the centre of the region.

Chapter 6

Energy Efficient Data Gathering via Distributed Classification

This chapter is organized as follows. Section 6.1 defines the constraints on the formulation for a preliminary energy analysis of the costs and potential savings of implementing content analysis in sensor networks. Analytical conditions on the maximum allowed size for a class of admissible feature vectors are presented in Section 6.2. In Section 6.3, we present an approach for the distributed extraction of the intensity histogram feature and the classification of the sensing frames. Section 6.4 studies analytically how spatial subsampling of the sensing frames can impact the performance of classification via intensity histograms. Experimental results are provided in Section 6.5. In Section 6.6, we also propose an approach for distributed classification of the sensing frames (i.e. decisions for the whole network can be taken at nodes other than the sink) and provide some preliminary results.

6.1 Problem Formulation and Efficiency Constraints

In this section, we define general energy constraints on the cost of classifying rounds of data (i.e.
sensing frames) and then of transmitting to the base station only the subset of those belonging to a class of interest. In this scenario, each sensing frame at a time t is considered as a still image, and the sensor network performs the task of classifying it. These tasks require the nodes to process and transmit data and therefore have a non-zero energy cost for the network. Here we define the problem with constraints so that the tasks of classification and transmission of data of interest to the base station are more energy efficient for the network than data gathering without content analysis.

The rest of this section is organized as follows. Our sensor network related assumptions are given in Subsection 6.1.1. Subsection 6.1.2 presents the problem formulation for the ideal case of classification without errors. The problem definition is extended to the case of classification errors in Subsection 6.1.3.

6.1.1 System Assumptions

In this work, we consider a set of N sensor nodes deployed over a compact region \Omega \subseteq R^p, sampling a spatio-temporal process Z(x,t), x \in \Omega, during a discrete temporal interval t \in [t_0, \ldots, t_0 + k\Delta t, \ldots], k = 1, 2, \ldots, K. All the nodes are connected via a tree structure, the data gathering tree, to a special node called the sink, the root of the tree. The samples of the field Z(\cdot) are transmitted to the sink through the data gathering tree. The sink is assumed to be located at the centre of the region \Omega. The sink transmits the data received from the nodes to a remote base station. Unless otherwise specified, we assume that the nodes have a limited energy supply, while the sink does not suffer any relevant limitation in its energy supply.

We adopt the shortest path tree (SPT) as the data gathering structure. We also assume that the sampling time at the sensor nodes is synchronized and that the network is faster than the phenomenon. We also require that the communication of the data to the sink is unidirectional.
That is, each node transmits its own data (sensor readings, data features) only once toward the sink, besides relaying data from other nodes. The purpose of the requirement of unidirectionality is to minimize the energy cost, as discussed in [59]. We already mentioned the term sensing frame. Here is a formal definition:

Definition 1 We say that a complete round of measurements Z(x_i, t), i = 1, \ldots, N at time t is a sensing frame.

6.1.2 Problem Formulation for Error Free Classification

Without loss of generality, we assume that each sensing frame {Z(x_i, t_0)} acquired by the network at a certain time t_0 can belong to one of two different classes, S_H and S_L, and that the end users are more interested in the sensing frames belonging to class S_H (e.g. intense rain over \Omega) than in those belonging to class S_L (e.g. moderate or no rain). Since we assume that the sink does not have energy limitations (e.g. because it is connected to the main electrical network), the only opportunity to save energy is in the multi-hop communication from the nodes to the sink. The energy saving can be achieved by limiting the number and/or the size of the messages sent from the nodes to the sink.

Let us denote with E_H the energy spent by the network to transmit a sensing frame of class S_H to the sink, and with E_L the energy spent for the similar task if the sensing frame belongs to class S_L. With the assumption of shortest path data gathering and fixed radio range communication, an expression to compute the energy cost E_H is given in [54]:

E_H = \sum_{n=1}^{N} r(n) \, d(x_n),    (6.1)

where r(n) and d(x_n) are respectively the size of the packet generated by node n and the hop distance from that node to the sink. We set E_L to be a fraction of E_H according to a certain scalar factor \alpha < 1, i.e. E_L = \alpha E_H. If \alpha = 0, the network does not transmit to the sink the samples classified as belonging to S_L.
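The per-frame cost of Eq. (6.1) can be sketched directly; the packet sizes and hop distances below are hypothetical toy values, not taken from the experiments:

```python
def frame_cost(packet_bits, hop_dist):
    """E_H = sum_n r(n) * d(x_n), Eq. (6.1): per-node packet size times
    the hop distance from that node to the sink along the SPT."""
    return sum(r * d for r, d in zip(packet_bits, hop_dist))

# Toy 5-node example (hypothetical sizes and distances):
r = [32, 32, 32, 32, 32]   # bits per packet
d = [1, 1, 2, 2, 3]        # hop distance of each node from the sink
E_H = frame_cost(r, d)
print(E_H)  # 32 * (1 + 1 + 2 + 2 + 3) = 288
```

Nodes farther from the sink contribute proportionally more, which is why pruning low-interest frames saves most of the multi-hop relay energy.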
We assume that over an interval of observation [1, K], there are K_H and K_L sensing frames of classes S_H and S_L respectively, with K = K_H + K_L. The total energy spent by the sensor network to perform data gathering, assuming that there are no errors in the classification of the samples, is:

E = K_H E_H + K_L E_L + K E_a,    (6.2)

where E_a is the additional energy spent by the network to analyze the data at time k in order to classify a sensing frame. In this work, we assume that E_a depends only on the additional communication messages needed to analyze the data, and not on the computation cost at the nodes. The packets transmitted via these additional communication messages can contain, for instance, feature vectors or data necessary to extract feature vectors. Some kinds of data analysis require a lot of inter-node communication (e.g. gradient calculations), and hence they may not be convenient to perform for the purpose of saving energy.

Let \beta be the ratio between the data analysis cost and E_H, i.e. \beta := E_a / E_H. In order to compare the cost of data gathering via content analysis with the cost of regular data gathering, we define the efficiency factor \eta as \eta := E / (K E_H), where E is given by equation (6.2). We can rewrite \eta as a function of \alpha and \beta:

\eta = p_H + \alpha (1 - p_H) + \beta,    (6.3)

where the term p_H := K_H / K is the frequency of occurrence of high interest sensing frames; it can be seen as an estimate of the probability of occurrence of the events of higher interest to the end users, and it depends on the nature of the phenomenon being analyzed by the network. The factors \alpha and \beta depend on the design of the system. Our goal is to design sensor network systems such that \eta < 1, i.e.:

p_H + \alpha (1 - p_H) + \beta < 1.

Definition 2 We say that an in-network classification system is efficient if \eta < 1.

We have implicitly assumed that the decision on the classification of a sensing frame is taken at the sink.
Therefore, the nodes need to be notified by the sink when the measurements of a sensing frame are to be transmitted to the base station. We introduce a term \theta to account for the cost of this notification. We assume \theta < 1. For simplicity, we consider the case \alpha = 0, i.e.:

\eta = p_H (1 + \theta) + \beta.    (6.4)

Proposition 5 A necessary condition for an in-network classification system to be efficient is that \beta < 1.

The proof of Proposition 5 is obvious. The relevant implication is a constraint on the energy that the system is allowed to spend to extract and transmit the features to the decision center (which is either the sink, or some nodes in the network) and to eventually transmit the classification decisions. A consequence is that some features whose extraction involves a lot of communication among the nodes may be too expensive to be handled by such a system.

6.1.3 Formulation and Constraints under Misclassification Errors

In a real situation, the system can classify some sensing frames incorrectly. Therefore, we consider two types of errors, respectively related to the misclassification of high and low relevance samples:

\epsilon_M = K_{L/H} / K_H,    (6.5)

\epsilon_F = K_{H/L} / K_L,    (6.6)

where K_{L/H} (K_{H/L}) is the number of sets of samples belonging to class S_H (S_L) classified as belonging to S_L (S_H). We have that K = K'_H + K'_L = (K_{H/H} + K_{H/L}) + (K_{L/H} + K_{L/L}). The cost of classifying and aggregating K sets of samples can be written as:

E' = (K_{H/H} + K_{H/L})(E_H + \theta E_H) + (K_{L/H} + K_{L/L})(E_L + \theta E_L) + K E_a,    (6.7)

and, recalling that E_L = \alpha E_H, it can be expressed as

E' = (K_{H/H} + \epsilon_F K_L)(E_H + \theta E_H) + (K_{L/L} + \epsilon_M K_H)(E_L + \theta E_L) + K \beta E_H.    (6.8)

Therefore, from equations (6.5)-(6.8):

\eta = [\, p_H + \alpha (1 - p_H) + (1 - \alpha) [\, (1 - p_H)\epsilon_F - p_H \epsilon_M \,] \,](1 + \theta) + \beta.    (6.9)

For \alpha = 0, the efficiency condition is given by:

\eta = p_H (1 + \theta) + \beta + [\, (1 - p_H)\epsilon_F - p_H \epsilon_M \,](1 + \theta) < 1.    (6.10)

The above expression ties together many parameters relevant to the system design.
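The efficiency factor can be evaluated numerically; a small sketch with illustrative parameter values (the operating point below is hypothetical, chosen only to show an efficient design):

```python
def efficiency(p_H, alpha, beta, theta=0.0, eps_F=0.0, eps_M=0.0):
    """Efficiency factor eta of Eq. (6.9); with theta = eps_F = eps_M = 0
    it reduces to the error-free expression of Eq. (6.3)."""
    return (p_H + alpha * (1 - p_H)
            + (1 - alpha) * ((1 - p_H) * eps_F - p_H * eps_M)) * (1 + theta) + beta

# Hypothetical operating point: events of interest 10% of the time,
# low-interest frames dropped (alpha = 0), analysis overhead beta = 0.2.
eta_ideal = efficiency(0.1, 0.0, 0.2)                               # error-free
eta_real = efficiency(0.1, 0.0, 0.2, theta=0.05, eps_F=0.08, eps_M=0.02)
print(eta_ideal, eta_real)  # both below 1: the in-network classifier saves energy
```

False alarms (eps_F) inflate eta because low-interest frames are transmitted at full cost, while missed detections (eps_M) deflate it at the price of lost information, matching the trade-off discussed next.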
Intuitively, a high false alarm error rate, \epsilon_F, increases the amount of energy spent by the system, while a high missed error rate, \epsilon_M, causes loss of information to the end users. The misclassification of high relevance samples decreases the energy cost, but at the expense of a loss of relevant information for the end users. Another potential trade-off, implicit in the above equations, is between the average cost factor for analyzing the data, \beta, and the classification errors. Intuitively, lowering \beta may increase the error rates.

6.2 Admissible Features

In Section 6.1, we presented a necessary condition on the cost of analyzing the data for content classification based data gathering to be more efficient than standard data gathering. In this section, we make further assumptions on the content analysis process in order to determine properties of the features that can be used for the content analysis task. Here we assume that at each node the content analysis consists of the extraction of one or more vectors of features computed from the measurement data, and of the aggregation of other feature vectors received from the children.

Let v be the number of bits used to encode the feature vectors at each node. We assume v to be constant over the network. Let p be the number of bits used to encode a standard measurement packet.^1 Let \rho := v/p.

Based on the energy cost model we adopted in (6.1), the cost of sending a standard packet to the sink is p \times h \times d, where h is the number of hops between the node and the sink and d is the hop distance. The total cost of a full round of data gathering for

^1 Note that both v and p also include a packet header.
On the other hand, the transmission cost involved in analyzing the data is equivalent to the cost of transmitting the feature vectors hop by hop all the way to the sink, which can be expressed as: E a = N h X h=1 v×n(h)×d =vdN, (6.12) where N is the total number of nodes in the network. Recall that we have assumed v to be constant over the network. The necessary condition defined in Proposition 5 (Subsec. 6.1.2) requires that β =E a /E <1. Therefore: β = vN p P N h h=1 n(h) =ρ N P N h h=1 n(h) <1. (6.13) Throughout the rest of this work, we refer to β as the feature to data gathering cost ratio. By solving the eq. (6.13) for ρ, we have : ρ< P N h h=1 n(h)h N . (6.14) Hence, the maximum allowed size of the feature vector depends on the number of hops and on the spatial distribution of the nodes. If due to the problem constraints, we need to chose ρ so that β ≤β 0 <1, we have: ρ≤β 0 P N h h=1 n(h)h N . Example 9 Consider the case of sensor network whose topology consists of a balanced binary tree, i.e. each node has exactly 2 children and n(h) = 2 h−1 . Therefore from eq. (6.13), we have: ρ< P N h h=1 2 h−1 h P N h h=1 2 h−1 = 2 N h (N h −1)+1 2 N h −1 In Figure 6.1, we show the value of ρ as a function of the number of hops forβ =1 and β =.5 respectively. For instance for a network with 4 hops, and β =.5, ρ =1.6. Therefore, it may not be very convenient to apply in-network content analysis tech- niques to networks with a small number of hops (e.g. 2). Even in networks with 4 or 90 1 2 3 4 5 6 0.5 1 1.5 2 2.5 3 3.5 4 4.5 5 5.5 Number of hops ρ max β=1 β=.5 Figure 6.1: Maximum allowed size of the feature vector with respect to the number of hops for a binary balanced tree network. more hops, the size of the feature vectors is strictly limited. This also poses restrictions on the kind of features allowed for in-network content analysis. 
In the next section we focus our study on a class of features that can satisfy the above constraints and can be used to classify sensing frames: the intensity histogram.

6.3 Content Analysis via Intensity Histograms

In Section 6.2, we assumed that the size of the feature vector is constant through the node-by-node extraction process over the network. A type of feature that respects this constraint is the intensity histogram, which has been widely used in image retrieval [16]. For instance, the histogram of the union of two distinct subsets of measurements is equal to the sum of the histograms of the two subsets. Therefore, we focus the rest of this work on the analysis of sensing frames via intensity histograms.

We give some formal definitions. Let Z \in N^N be a vector of sensing measurements, with N \in N and z(i) \in (a,b) \subset N, where i = 1, 2, \ldots, N and (a,b) is a finite subinterval.^2 The elements of Z are the measurements extracted at a time t_k by a set of N sensing nodes deployed over a compact region \Omega.

^2 Sensing measurements are quantized and can be assumed integer up to a constant scale factor.

Definition 3 We define the histogram of Z as h_Z : (a,b) \to N^N s.t.:

h_Z(u) := \sum_{i=1}^{N} x_i, \quad \forall u \in (a,b),    (6.15)

where

x_i = 1 if z(i) = u, and x_i = 0 if z(i) \ne u.    (6.16)

Usually the number of bins is smaller than the number of possible values of z.

Definition 4 We define the L-bin histogram of Z as \tilde{h}_Z : (a,b) \to N^L s.t.:

\tilde{h}_Z(k) := \sum_{i=1}^{N} \tilde{x}_i, \quad \forall k \in (1, L),    (6.17)

where

\tilde{x}_i = 1 if z(i) \in I_k, and \tilde{x}_i = 0 if z(i) \notin I_k,    (6.18)

and the subintervals I_k, k = 1, \ldots, L form a partition of the finite interval (a,b), i.e.: I_k \subset (a,b), \cup_k I_k = (a,b) and I_k \cap I_l = \emptyset, k \ne l.

The intensity histograms of a sensing frame can be extracted in a completely distributed manner by the sensor nodes. The sink can then perform classification by comparing the distance of the extracted histogram with a set of reference histograms. Each reference histogram represents a class for the sensing frames. The reference histograms can
The reference histograms can beobtainedinacentralizedmanneratthebasestationfromasetofhistoricaldata. The method we adopt for the distributed extraction of the histograms is described in Table 6.1. In the next two subsections, we discuss possible ways to increase the efficiency of the classification system. 6.3.1 Efficient encoding of the histogram Here we define a simple method for efficient lossless and lossy encoding of such feature. First of all, if the total number of nodes is known, the transmission of one of the bins in the histogram can be avoided This means that for example, ifρ=2 and the number of bits to encode a bin is equal to the size of a packet of sensing data, the histogram can have at least 3 bins. In general, at a certain nodez i with N d descendants, the number of bits needed to encode a bin, v b =⌈log 2 (N d )+1⌉. Therefore, if each node knows the exact number of 92 for each sampling time k do for each node (starting from most external hop)do compute histogram of data (1 bin) sum histogram to children histograms send histogram to parent node end for At central node(s) Compute distance metric d(k) =kh(k)−h ri k frame at time k associated to class of closest h ri end for Table 6.1: Pseudo code for distributed histogram acquisition. its descendants, we can design a histogram coding scheme that allocates progressively only the bits strictly necessary for each bin, increasing the number of bits as the nodes get closer to the sink. Forms of lossy encoding of the bins can also be adopted, for instance by quantizing the bins. As in the previous case, it is end of the scale value should be upgraded as the nodes get closer to the sink. 6.3.2 Methods to increase the efficiency Recall thatwedefinedtheefficiency inEq. (6.10). Theonlyparameterwecanpartially affect with the design of the feature extraction and analysis tasks is β, the feature to data gathering cost ratio (FDR). 
Several methods can be adopted to reduce the FDR and therefore potentially increase the efficiency \eta:

1. Reduce the number of bins.

2. Reduce the number of bits per bin via quantization.

3. Reduce the number of bits per bin and the number of messages needed to analyze the data by reducing the number of measurements used to compute the histograms, i.e. by subsampling the sensing frames.

4. Reduce the average number of messages needed to analyze a sensing frame by classifying the sensing frames on incomplete sets of data.

In the remaining part of this chapter (Sections 6.4 and 6.5), we study in detail the problem of histogram extraction and analysis from subsampled sensing frames (item 3 of the above list). In Section 6.6, we also show some preliminary results on distributed classification via incomplete data (item 4).

6.4 The Effect of Subsampling on Classification

In this section, we consider spatially subsampled sensing frames, analyze how the histograms of the original frames are affected by subsampling (Ssec. 6.4.1), and how this impacts the classification performance (Ssec. 6.4.2). We carry out this study by means of analytical results and simulations over synthetic and real data (Ssec. 6.5.3).

6.4.1 Impact of Subsampling on the Intensity Histogram

In this section, we model the histogram of a subsampled sensing frame as a random vector whose statistics are characterized by the histogram of the sensing frame before subsampling. We first give some preliminary definitions.

Let Z be an N-dimensional vector of sensing measurements. We gave a formal definition of the histogram of Z in Section 6.3. Let Y \in N^m, m < N, be a subsampled version of Z according to a subsampling factor D. In the analysis, the simulations on synthetic data, and the experiments of this work, we consider three types of subsampling:
Definition 6 Uniform random subsampling: Z is sampled at m random locations, uni- formly distributed over the N locations of Z; m =⌊ N D ⌋. Definition 7 Bernoulli subsampling: each element z(i)∈Z can be sampled with prob- ability p = 1 D . In this case, E[m] =⌊ N D ⌋. This methodology is easy to implement in a distributed system and therefore suitable for wireless sensor networks. We also define a subsampling operator. Definition 8 Let Z an N dimensional set of measurements, D an integer s.t. N/D = m is also an integer. We define a subsampling operator S D (.) s.t. S D (Z) =Y, where Y is a subsampled version of Z with subsampling factor D. Let the m−dimensional vector Y = S D (Z) = be a subsampled version of Z with subsampling factor D, (m < N). Let h Z and h Y be respectively the histograms of Z and Y. Our goal is to analyze the relationship between h Y and the histogram h Z of the the sensing frame Z prior subsampling. The histogram, h Y of the subsampled process Y can be seen as the realization of a random vector H Y . We formulate the following proposition. 94 Proposition 6 We assume for simplicity that m,N ∈N s.t. m N ∈N. Let Y a uniform random subsampled version of Z and h Y the histogram of Z. Then: E[H Y ]= 1 D h Z , (6.19) where E[.] is the statistical expectation. Proof: We evaluate the element H Y (u) of the random vector H Y . From the definition of histogram and the hypotheses: E[H Y (u)] =E " m X i=1 x i # = m X i=1 E[x i ]=mP (Y =u)= N D 1 N h Z (u) = 1 D h Z (u), where x i is defined in Eq. (6.16), P(Y = u) = 1 N h Z (u) thanks to the hypothesis of uniform random sampling. Therefore: E[H Y ] = 1 D h Z . Proposition 7 The result given (6.19) applies also to the case of Bernoulli subsam- pling. 
Proof: We consider an element H_Y(u) of the random vector H_Y, i.e.:

E[H_Y(u)] = E\left[ \sum_{i=1}^{N} x_i \right] = \frac{N}{D} P(Y = u) = \frac{1}{D} h_Z(u),

where

x_i = 1 if z(i) is picked (with probability 1/D), and x_i = 0 otherwise (with probability 1 - 1/D).

Corollary 1 Let Z be an N-dimensional set of measurements with histogram h_Z, D \in N s.t. m = N/D \in N, and Y = S_D(Z). Then, as N \to \infty,

h_Y \to \frac{1}{D} h_Z with probability P = 1.    (6.20)

Proof: Let {Z_i}, i = 1, 2, \ldots, be a sequence of N_i-dimensional vectors s.t. {h_{Z_i}} = {\frac{N_i}{N_0} h_{Z_0}}. Let {Y_i} be a sequence of m_i-dimensional vectors such that {Y_i} = {S_D(Z_i)}. We evaluate an element of E[H_{Y_i}]. From Proposition 6:

E[h_{Y_i}] = \frac{1}{D} h_{Z_i} = \frac{N_i}{D} \frac{h_{Z_0}}{N_0}.    (6.21)

By dividing the left and right hand sides by N_i, we see that all the elements of the random sequence E[h_{Y_i} / N_i] have the same expected value. Due to the strong law of large numbers, we have that:

h_{Y_i} \to \frac{N_i}{D N_0} h_{Z_0}, as i \to \infty.

Now let h_Z := \frac{N_i}{N_0} h_{Z_0} and N = N_i. Then i \to \infty implies N \to \infty, hence the proof.

We can see the process of subsampling Z as an estimation of a scaled version, h_Y, of h_Z. A small number of available samples N implies the likelihood of a larger estimation error. This estimation error impacts the classification of the subsampled sensing frames.

Proposition 8 Propositions 6 and 7 and Corollary 1 also hold for the case of L-bin histograms.

Proof: Let h_Y be the histogram of Y and \tilde{h}_Y the L-bin histogram of Y. We observe that for a certain interval I_j = (a_j, b_j):

\tilde{h}_Y(j) = \sum_{i=a_j}^{b_j} h_Y(i).

Hence, all the proofs follow by observing that:

E[\tilde{h}_Y(j)] = \sum_{i=a_j}^{b_j} E[h_Y(i)].

From evidence gathered via simulations (Ssec. 6.5.3), we make the following conjectures.

Conjecture 2 Let E^2 be the mean square error in the estimate of the scaled histogram of Z, for D > 1:

E^2 = \left\| \frac{h_Z - D h_Y}{N} \right\|^2.    (6.22)

Then

E[E^2] = O\left( \frac{1}{m} \right),    (6.23)

where recall that m = N/D.

Note that in a real sensor network scenario, the total number of nodes in the network may be relatively small, e.g. N < 50, and so may m.
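The subsampling schemes of Definitions 5-7 and the unbiasedness property (6.19) can be checked with a short simulation. The binary frame, frame size and trial count below are illustrative choices, not the experimental values:

```python
import numpy as np

rng = np.random.default_rng(2)

def regular(z, D):
    """Regular subsampling: y(i) = z(D*i) (Definition 5)."""
    return z[::D]

def uniform_random(z, D):
    """Uniform random subsampling: floor(N/D) locations, no replacement."""
    return z[rng.choice(len(z), size=len(z) // D, replace=False)]

def bernoulli(z, D):
    """Bernoulli subsampling: keep each sample with probability 1/D."""
    return z[rng.random(len(z)) < 1.0 / D]

# Binary frame, so h_Z(1) is the number of ones. Check E[H_Y] = h_Z / D
# (Proposition 6) by averaging many uniform random subsamples.
N, D = 200, 4
z = (rng.random(N) < 0.3).astype(int)
h1 = int(z.sum())
trials = [int(uniform_random(z, D).sum()) for _ in range(20000)]
print(np.mean(trials), h1 / D)  # empirical mean of H_Y(1) vs. h_Z(1)/D
```

The two printed numbers agree closely; with smaller N or larger D the spread around h_Z(1)/D widens, which is the estimation error the conjectures quantify.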
Therefore, we are interested in predicting the MSE for small m.

Conjecture 3 An approximation of the expected value of the MSE, Eq. (6.22), is given by:

E[E^2] \approx \frac{1}{m} \left( 1 - \frac{1}{D^2} \right).    (6.24)

If m is small, Equation (6.24) is affected by the term 1 - 1/D^2, which is zero when D = 1. See Figures 6.7.a and 6.7.b in Subsection 6.5.3.

6.4.2 Analytical Computation of the Error Rates and the Mean Square Error

In this subsection, we propose analytical expressions to predict the error rates for histogram based classification of sets of labeled sensing frames subsampled via uniform random or Bernoulli subsampling.

Let {Z_i}, i = 1, 2, \ldots, N_t, be a sequence of N-dimensional sensing frames, and {Y_i}, i = 1, 2, \ldots, N_t, the corresponding sequence of m_i-dimensional^3 subsampled sensing frames, with {h_{Z_i}} and {h_{Y_i}} the corresponding sequences of L-dimensional histograms. We can see all the histograms as lying on hyperplanes in an L-dimensional space, since:

\sum_{j=1}^{L} h_{Z_i}(j) = N, \quad \sum_{j=1}^{L} h_{Y_i}(j) = m_i.

We assume that each element of the sequence of histograms associated to the Z_i (and hence each of the sensing frames Z_i themselves) belongs either to a class S_1 or to a class S_2. The process of subsampling a sensing frame Z and then extracting the intensity histogram can be modeled as the mapping of the histogram of Z onto a different hyperplane in the L-dimensional space.

In the case of uniform random or Bernoulli subsampling of a frame Z, there is a finite set of possible Y = S_D(Z), and hence of possible histograms h_Y. Some of these histograms may belong to class S_1, while the others belong to class S_2. We indicate with {h_{Y_i}^{(l)}}, l = 1, \ldots, the sequence of all the possible histograms that can be obtained by subsampling Z_i, where Y_i = S_D(Z_i). Let S \ni Z_i and \bar{S} be the complement of S. In the case of uniform random subsampling and 2-bin histograms, the total error

^3 m_i = m \forall i for regular and uniform random subsampling.
probability can be written as:

P_U(e) = \frac{1}{N_t} \binom{N}{m}^{-1} \sum_{i=1}^{N_t} \sum_{l : h_{Y_i}^{(l)} \in \bar{S}, \, h_{Z_i} \in S} \binom{h_{Z_i}(1)}{h_{Y_i}^{(l)}(1)} \binom{N - h_{Z_i}(1)}{m - h_{Y_i}^{(l)}(1)},    (6.25)

where recall that \binom{n}{k} = \frac{n!}{k!(n-k)!} and m = \lfloor N/D \rfloor.

In the case of Bernoulli subsampling with subsampling factor D and 2-bin histograms, each specific subsample of size m occurs with probability (1/D)^m (1 - 1/D)^{N-m}, so, collecting the Bernoulli factors, the total error probability is given by:

P_B(e) = \frac{1}{N_t} \sum_{i=1}^{N_t} \sum_{m=0}^{N} \sum_{l : h_{Y_i}^{(l)} \in \bar{S}, \, h_{Z_i} \in S} \binom{h_{Z_i}(1)}{h_{Y_i}^{(l)}(1)} \binom{N - h_{Z_i}(1)}{m - h_{Y_i}^{(l)}(1)} \left( \frac{1}{D} \right)^m \left( 1 - \frac{1}{D} \right)^{N-m}.    (6.26)

The expressions (6.25) and (6.26) require a priori knowledge of the population of histograms of the sensing frames that are going to be subsampled, and can easily be generalized to the case of multiple-bin histograms. Equations (6.25) and (6.26) can also be used to compute misclassification and false classification error rates, if they are applied only to histograms associated to the class of interest, or of low interest, respectively. Figures 6.8.a and 6.8.b in Subsection 6.5.3 show a comparison between the analytical and experimental error rates.

If we have a certain population of N_h histogram prototypes H_{Z_i}, each of them with probability of occurrence P(H_{Z_i}), we can rewrite the total error probability expression (6.25) as:

P_U(e) = \binom{N}{m}^{-1} \sum_{i=1}^{N_h} P(H_{Z_i}) \sum_{l : H_{Y_i}^{(l)} \in \bar{S}, \, H_{Z_i} \in S} \binom{h_{Z_i}(1)}{h_{Y_i}^{(l)}(1)} \binom{N - h_{Z_i}(1)}{m - h_{Y_i}^{(l)}(1)}.    (6.27)

A similar generalization can be made for equation (6.26) as well.

6.4.3 Mean Square Error

With an approach similar to the one we adopted to compute the error rates, we can also compute analytical expressions for the average mean square error of a certain set of frames, given their histograms.
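Before computing the MSE, note that the inner sums of Eqs. (6.25)-(6.27) are tails of a hypergeometric distribution and can be evaluated exactly for a single 2-bin frame. The frame, the decision rule and the "bad" region of subsampled counts below are illustrative assumptions:

```python
from math import comb

def subsample_error_prob(N, m, h1, bad_counts):
    """Probability that a uniform random subsample of size m from a frame
    with h_Z(1) = h1 bin-1 samples yields a bin-1 count k in bad_counts:
    the hypergeometric inner sum of Eq. (6.25) for one frame."""
    total = comb(N, m)
    return sum(comb(h1, k) * comb(N - h1, m - k) for k in bad_counts) / total

# Hypothetical frame: N = 40 samples, h_Z(1) = 24 (60% in bin 1), with the
# class of interest defined as "majority in bin 1". After subsampling to
# m = 10, suppose the frame is misclassified whenever k <= 5 samples of
# the subsample fall in bin 1.
N, m, h1 = 40, 10, 24
p_err = subsample_error_prob(N, m, h1, bad_counts=range(0, 6))

# Sanity check: the hypergeometric pmf sums to 1 over all feasible k.
p_all = subsample_error_prob(N, m, h1, bad_counts=range(0, m + 1))
print(p_err, p_all)
```

Summing the same hypergeometric weights against a squared-error term instead of an indicator is exactly how the MSE expression below is obtained.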
It can be shown that for two-bin histograms and uniform random subsampling, the expected value of the MSE is given by:

E[ \|(h_Z - D h_Y)/N\|^2 ] = (1/N_t) \binom{Dm}{m}^{-1} \sum_{i=1}^{N_t} \sum_{k=0}^{\min(m, h_{Z_i}(1))} \binom{h_{Z_i}(1)}{k} \binom{Dm - h_{Z_i}(1)}{m - k} \frac{(Dk - m)^2}{D^2 m^2},   (6.28)

where we assume that N = Dm and that h_{Z_i}(2) \geq h_{Z_i}(1). Equation (6.28) can be easily modified for the case h_{Z_i}(2) < h_{Z_i}(1). In the case of Bernoulli random subsampling, we have:

E[ \|(h_Z - D h_Y)/N\|^2 ] = (1/N_t) \sum_{i=1}^{N_t} \sum_{m=0}^{N} \binom{N}{m}^{-1} \sum_{k=0}^{\min(m, h_{Z_i}(1))} \binom{h_{Z_i}(1)}{k} \binom{N - h_{Z_i}(1)}{m - k} \frac{(Dk - m)^2}{D^2 m^2} (1/D)^m (1 - 1/D)^{N-m}.   (6.29)

6.5 Experimental Results

In this section, we first describe the general methodology we use for our experiments (Subsection 6.5.1). Subsection 6.5.2 presents several experiments of sensing frame classification. Subsection 6.5.3 compares analytical results from Section 6.4 on the impact of sensing frame subsampling with experimental results on synthetic and real data. Finally, Subsection 6.5.4 studies the impact of subsampling on the effective efficiency for classification and gathering of sensing frames.

6.5.1 Methodology

As in our previous work, we use the rain fall data set provided by the University of Washington for our experiments [65]. (We also considered the Intel Lab data set, which consists of temperature, humidity and light samples from 54 sensors deployed at the Berkeley Intel Lab over a period of about 30 days. However, the problem with this data set is that the considered 30 day time span is quite short.) The data set provides the daily rain concentration in the North Western region of the US (Washington) for a period of 49 years, for multiple observation points remapped onto a regular grid. The minimum distance between two points is 50 km. Unless otherwise specified, our experiments consist of the following stages:

• Region selection: we usually select a sub-region of the region covered by the rain fall data set, for the sake of having more spatially homogeneous data, and a certain time period (t_1, t_2), e.g.
5 years starting from August 12th, 1960.

• Labeling: all the sensing frames for that sub-region and time period (e.g. 365 × 5 sensing frames) are labeled via an unsupervised learning method (k-means clustering) applied to an intensity histogram with a certain number of bins. The number of classes is decided a priori. The class of interest is usually represented by the sensing frames of the smallest cluster.

• Training: a temporal subset of the labeling set is used for training. Unsupervised learning is performed again over the training subset. The centroids of the clusters (which are histograms) are used as reference features by the sensor network for the classification task in the testing stage.

• Testing: the testing stage is performed over the temporal subset of the labeled sensing frames not used for training. The number of bins of the histograms for this stage is equal to or less than the number of bins chosen for the labeling stage. Each sensing frame is classified via a minimum distance criterion against the reference histograms.

In a real scenario, it is reasonable to assume that the training of the classifiers is performed in a centralized manner, with subsets of data sent from the sensor network to the base station. The classifier parameters can then be transmitted to the sink and to the sensor nodes involved in performing the classification. Note that because of the high similarity between the unsupervised learning method used to perform the labeling of the data and the in-network classification process, it is reasonable to expect a relatively low error rate in the testing stage. The goal of our experiments here is not to claim excellent classification performance, but to understand the key trade-offs for these systems.

6.5.2 Experiments (No Subsampling)

Experiment 1 In this set of experiments, a network of 36 nodes placed in the 'dry part' of the sensor region is considered. The experiments are run for 3 and 4 hop topologies. The testing has been performed over 5 years of rainfall data (i.e.
1825 sensing frames). The training has been performed over sequences whose length varied from 90 days to 2 years. All the frames of rainfall data used for testing and training (for a total of 7 years) have been labeled via k-means clustering into 3 classes. The class of interest is chosen as the rainfall data of the days of high intensity rain. Both 4 and 2 bin histograms are used for the classification of the sensing frames.

Figure 6.2: Node topology over the 'dry region' for 3 hops (a) and 4 hops (b).

The plots show the network topology (Fig. 6.2) and the effective vs. estimated efficiency ratio, η (Figure 6.3). Recall that if η < 1, the network is saving energy. Figure 6.4 shows the error rates for false and missed detections. It can be noticed that the rate of missed detections seems to increase with the number of training samples, although intuition would suggest the opposite. Further experiments have shown that this is an anomaly related to seasonality phenomena. In other words, some training segments may be better than others for certain testing segments, even if they are shorter. On the other hand, the rate of falsely detected frames tends to decrease as the number of training samples increases. Note that in the experiments discussed in this section, we assume the size of the header of the feature and data packets to be equal to 0 bits, while in other sets of experiments we assume it to be equal to 16 bits.

6.5.3 Simulations and Experiments to Study the Impact of Subsampling

This subsection compares analytical results from Section 6.4 with results from simulations and experiments on synthetic (Exp. 2) and real (Exp. 3) data. The goal is to study the impact of sensing frame subsampling on classification via histograms.
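Before turning to the subsampling experiments, the labeling, training and testing stages of Subsection 6.5.1 can be condensed into a small sketch. The data here are synthetic 2-bin histograms standing in for the rainfall sensing frames, and the deterministic centroid initialization is an assumption made only for reproducibility; the real pipeline runs k-means on the labeled rainfall histograms.

```python
def dist2(a, b):
    """Squared Euclidean distance between two histograms."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid_mean(cluster):
    n = len(cluster)
    return tuple(sum(v) / n for v in zip(*cluster))

def kmeans(points, centroids, iters=20):
    """Minimal k-means used for both the labeling and training stages."""
    k = len(centroids)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: dist2(p, centroids[c]))
            clusters[j].append(p)
        centroids = [centroid_mean(cl) if cl else centroids[j]
                     for j, cl in enumerate(clusters)]
    return centroids

def classify(hist, refs):
    """Testing stage: minimum distance rule against the reference histograms."""
    return min(range(len(refs)), key=lambda j: dist2(hist, refs[j]))

# Synthetic 2-bin histograms standing in for 'dry' vs 'rainy' sensing frames
# (all numbers are illustrative assumptions, not rainfall statistics).
import random
rng = random.Random(1)
dry = [(rng.gauss(30, 3), rng.gauss(6, 3)) for _ in range(80)]
rainy = [(rng.gauss(10, 3), rng.gauss(26, 3)) for _ in range(20)]
frames = dry + rainy
refs = kmeans(frames, centroids=[frames[0], frames[-1]])  # deterministic init
label = classify((9, 27), refs)  # lands in the 'rainy' cluster (index 1)
```

The smallest cluster (here, the rainy one) plays the role of the class of interest; the centroids in `refs` are the reference features the network would use at testing time.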
Experiment 2 We consider a vector X of N random samples drawn from a zero-mean, unit-variance Gaussian distribution, N(0,1). We filter X with a finite impulse response (FIR) filter to obtain a colored sequence Z with certain spectral properties. The filter coefficients have been designed via the Parks-McClellan algorithm. We perform regular as well as uniform random subsampling of Z for several values of the subsampling factor D.

Figure 6.3: Effective versus estimated efficiency for 3 hops (a) and 4 hops (b), with 4 bin and 2 bin undersampled histograms.

Let Y be the subsampled version of Z, and let h_Z and h_Y be the histograms of Z and Y respectively. We are interested in studying the behavior of the mean square error metric as defined in (6.22):

E[E^2] = E[ (1/N^2) \|h_Z - D h_Y\|^2 ].

We average the results over 500 Monte Carlo trials for the case of regular subsampling, and 10000 trials for uniform random or Bernoulli subsampling. The FIR filter was designed with 30 coefficients. We consider 10 bin histograms. We vary the number of samples N and the subsampling rate D. We generate the colored sequence Z via a low pass and a high pass filter, as well as without any filtering. The low and the high pass filters are designed to have symmetric spectra. Figure 6.5.a compares the average MSE for the regular subsampling of three different processes. Figure 6.5.b compares the average MSE for uniform random subsampling. The average MSE for regular, uniform random and Bernoulli subsampling is shown in Figure 6.6. Finally, Figure 6.7 compares the experimental MSE with the formula approximating the average MSE (Equation (6.24), Subsection 6.4.1).
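A scaled-down version of this experiment can be sketched directly. A plain moving average stands in for the Parks-McClellan design, the sizes and bin edges are assumptions, and the average MSE over the trials is compared against the approximation of Eq. (6.24):

```python
import random

def fir_filter(x, taps):
    """Direct-form FIR filtering (a moving average stands in for the
    Parks-McClellan design used in the actual experiment)."""
    return [sum(taps[j] * x[i - j] for j in range(len(taps)) if i - j >= 0)
            for i in range(len(x))]

def hist(samples, edges):
    """Bin b collects samples below edges[b] not already binned;
    the last bin catches the remainder (10 bins, as in the experiment)."""
    h = [0] * len(edges)
    for s in samples:
        for b, e in enumerate(edges):
            if s < e or b == len(edges) - 1:
                h[b] += 1
                break
    return h

rng = random.Random(0)
N, D, trials = 400, 8, 2000              # scaled-down sizes (assumed)
m = N // D
edges = [-1.5 + 0.3 * b for b in range(10)]
taps = [1.0 / 5] * 5                     # crude low-pass FIR
acc = 0.0
for _ in range(trials):
    x = [rng.gauss(0.0, 1.0) for _ in range(N)]  # white N(0,1) samples
    z = fir_filter(x, taps)                      # colored sequence Z
    y = rng.sample(z, m)                         # uniform random subsampling
    hz, hy = hist(z, edges), hist(y, edges)
    acc += sum((a - D * b) ** 2 for a, b in zip(hz, hy)) / N ** 2  # Eq. (6.22)
avg_mse = acc / trials
approx = (1.0 / m) * (1.0 - 1.0 / D) ** 2        # Eq. (6.24)
```

On this synthetic setup the empirical average and the approximation agree closely in order of magnitude, consistent with the 1/m scaling discussed below.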
We make the following observations:

• Aliasing due to subsampling has a slight effect on the MSE for regular subsampling (Fig. 6.5.a), but no impact on the MSE for uniform random subsampling (Fig. 6.5.b).

• The average MSE for Bernoulli subsampling is very similar to the average MSE for uniform random subsampling (Fig. 6.6).

• The MSE decreases as 1/m, where m is the number of samples of a subsampled sensing frame (Fig. 6.7).

Figure 6.4: Error rates for falsely detected and missed sensing frames for various lengths of the training sequence, 3 and 4 hop networks, and 4 and 2 bin histograms.

In conclusion, the number of available samples m seems to be the parameter most influential on the normalized mean square error.

Experiment 3 In the following experiments, we compare the error rates for regular, Bernoulli and uniform random subsampling over a 5 year period of the rain fall data set. We consider 2 classes and 2 bin histograms. In Figures 6.8.a and 6.8.b, we compare the error rates obtained analytically (Subsection 6.4.2, Eq. (6.25) and (6.26)) and experimentally. It can be noticed that both formulas predict the error rates quite accurately. Figure 6.9 shows the comparison of the analytical (Eq. (6.28)), approximated (Eq. (6.24)) and experimental MSE. The analytical MSE matches the experimental one very accurately, while there is a small bias with respect to the approximated MSE. In Figures 6.10.a and 6.10.b, we show the error rates and the MSE for the three types of subsampling on the same subset of the rain fall set, over a 5 year period.
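The hypergeometric terms inside Eq. (6.25) are easy to evaluate directly. The sketch below computes, for a single frame, the probability that uniform random subsampling flips a 2-bin classifier; the threshold rule and all the numbers are illustrative assumptions, not the trained classifier used in the experiments.

```python
from math import comb

def flip_probability(hz1, N, D, tau):
    """P(subsampling changes the decision) for one frame under uniform
    random subsampling: the full-frame rule is h_Z(1) >= tau, the
    subsampled rule is D * h_Y(1) >= tau.  The inner terms are the same
    hypergeometric factors that appear in Eq. (6.25)."""
    m = N // D
    full_decision = hz1 >= tau
    denom = comb(N, m)
    p = 0.0
    for k in range(min(m, hz1) + 1):
        # probability that the subsampled histogram has k samples in bin 1
        p_k = comb(hz1, k) * comb(N - hz1, m - k) / denom
        if (D * k >= tau) != full_decision:
            p += p_k
    return p

# Frames near the decision boundary are far more error-prone than frames
# deep inside a class (hz1, N, D and tau below are assumed values).
p_near = flip_probability(hz1=55, N=400, D=8, tau=50)
p_far = flip_probability(hz1=120, N=400, D=8, tau=50)
```

Summing such per-frame probabilities over a labeled population is exactly what (6.25) does.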
The experiments were repeated 500 times for uniform random and Bernoulli subsampling. In these experiments, uniform random subsampling leads to lower error rates than Bernoulli subsampling.

Figure 6.5: The MSE for regular (a) and uniform random (b) subsampling vs. the number of samples, for signals with different spectral characteristics.

6.5.4 Impact of Subsampling on the Effective Efficiency

Recall that the efficiency factor defined in Subsection 6.1.3 is given by:

η = p_H (1 + θ) + β + [(1 - p_H) ε_F - p_H ε_M] (1 + θ) < 1.

The goal of subsampling is to improve the efficiency (i.e. reduce η) by decreasing the size of the histogram packets, and therefore decreasing the feature to data gathering cost ratio (FDR, i.e. β). In our context, we use η along with ε_M as the main indicators to characterize the results of our experiments.

In our simulations, we consider a sensor network with N nodes and 1 sink acquiring data from the rain fall data set. The network performs Bernoulli subsampling of the sensing frames: at each data acquisition time t_k, each node uses its data for the distributed computation of the histogram with probability 1/D, where D is the subsampling factor. This is a way to perform subsampling for any type of sensor network topology. We also consider regular and uniform random subsampling for understanding purposes.

Experiment 4 In this experiment we considered 1 year of training frames and 3 years of testing frames over the whole region (163 nodes). The topology of a 6 hop network is shown in Figure 6.11. A 16 bit header is assumed for all the packets.
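The efficiency factor above is straightforward to evaluate numerically. The parameter values below are purely illustrative assumptions, chosen only to show how reducing the FDR β (e.g. by subsampling the histograms) can move η from above to below 1:

```python
def efficiency_factor(p_h, theta, beta, eps_f, eps_m):
    """Effective efficiency eta from Subsection 6.1.3; the network saves
    energy when eta < 1.  p_h: prior probability of a frame of interest,
    theta: header overhead ratio, beta: feature-to-data gathering cost
    ratio (FDR), eps_f / eps_m: false / missed classification rates."""
    return (p_h * (1 + theta) + beta
            + ((1 - p_h) * eps_f - p_h * eps_m) * (1 + theta))

# Halving the histogram packet size (beta) pushes eta below 1 here
# (all parameter values are illustrative assumptions).
eta_full = efficiency_factor(p_h=0.3, theta=0.1, beta=0.7, eps_f=0.05, eps_m=0.08)
eta_sub = efficiency_factor(p_h=0.3, theta=0.1, beta=0.35, eps_f=0.05, eps_m=0.08)
```

With these numbers, `eta_full` is slightly above 1 (content based gathering costs more than plain gathering), while `eta_sub` is well below 1.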
First, 8 bin histograms are used for the classification. (Note that subsampling also reduces the number of messages transmitted, when the measurements from leaf nodes (i.e. nodes without children) are not used: in this case, those nodes do not generate any packet.)

Figure 6.6: Comparison of the MSE for regular, uniform random and Bernoulli subsampling, for white processes.

Without any subsampling of the sensing frames, content based data gathering is less efficient than plain data gathering (η > 1). Figure 6.12.b shows how (Bernoulli) subsampling improves the effective efficiency, which becomes < 1 for D = 8 and D = 16. Then we perform labeling, training and testing for the same sequences of sensing frames via 4 bin histograms. Figure 6.13.a compares the error rates and the effective efficiency for regular, uniform random and Bernoulli subsampling. Regular subsampling leads to slightly better performance. Figure 6.13.b shows how the effective efficiency for the case of Bernoulli subsampling varies as the size of the header increases. As the header size increases, the potential improvement of the efficiency given by subsampling decreases.

6.6 Distributed Classification Decision from Incomplete Data

As we mentioned in Section 6.3, a way to increase the efficiency is to limit the number of messages that are exchanged for extracting and analyzing the histograms. This can be seen as an alternative approach to reducing the number of bits and/or the number of bins of the histogram. In this section, we propose an approach for the distributed classification of the sensing frames from incomplete sets of data. An incomplete set of data in this
An incomplete set of data in this 105 500 1000 1500 2000 2500 3000 10 −2 10 −1 Number of samples before subsampling Normalized MSE Experimental vs. Analytical MSE (m constant) U. random (m=25) U. random (m=50) m −1 (1−D −1 ) 2 (m=25) m −1 (1−D −1 ) 2 (m=50) 100 200 300 400 500 600 700 800 10 −3 10 −2 10 −1 10 0 Normalized MSE Experimental vs. Analytical MSE Number of samples before subsampling U. random, D=4 U. random, D=8 U. random, D=16 m −1 (1−D −1 ) 2 , D=8 m −1 (1−D −1 ) 2 , D=16 m −1 (1−D −1 ) 2 , D=4 (a) (b) Figure 6.7: Analytical vs. experimental MSE for uniform random subsampling with m constant (a) and Comparison of experiments MSE vs. 1/m (b). context is a set of measurements from subtree of nodes the sensor network. The feature extraction proceeds from the most external hops to the sink. The key idea here is that the nodes can perform the feature analysis on the partial data from the children measurements. Each node analyzes the features extracted from the partial data and, if there is a good degree of confidence, can take a decision for the whole network. If this is the case, that node sends a request to all the other nodes to quit the feature acquisition process and to send their measurements to the sink. This allows saving of energy with respect to the case of centralized decision, because the size of notification/request package is smaller than the one of a feature vector package. The amount of energy saved when one of the nodes ”decides” for the rest of the network depends on how many other nodes are notified early enough to interrupt the feature acquisition process and send their measurement to the sink. Therefore the total energy saved can potentially being increased by delaying the process at nodes with low confidence measure. Hence, we need an algorithm to increase or decrease the waiting time for a node to transmit the feature package to its parent according to value of the confidence measure. 
This turns out to be a very challenging problem and is not addressed in this work. Here we propose a method that combines irregular subsampling and distributed decision so that the network has the ability to stop the feature extraction process before it is complete.

Figure 6.8: Comparison of analytical and experimental error rates for uniform random (a) and Bernoulli (b) subsampling over a 5 year period of the rainfall data set (2 bin histograms).

6.6.1 Problem Statement

In the centralized feature analysis approach proposed in the previous sections of this work, the sink aggregates the histograms from all the subtrees into one single histogram vector h_Z. It then performs the classification by computing the distance \|h_Z - h_i\| between h_Z and the prototypes h_i obtained from the training process. The frame Z is assigned to the class S_i of the closest prototype.

In this case, given a partial set of measurements Y_j ⊂ Z (e.g. restricted to the nodes of a certain subtree T(j) rooted at a node j), we would like an estimate of P(S_i | Y_j). Note that in this case, P(S_i | Y_j) can be low for all the possible classes i, because only an incomplete set of measurements, from only a small subregion of the network, is available.
However, under some conditions on the phenomena being monitored by the network and on the subset of measurements, P(S_i | Y_j) can be high enough for a certain i.

Example 10 Consider a one dimensional network with N nodes plus one sink, placed in the middle of the network. We assume the nodes communicate with the sink via multi hop and that each node is on a different hop. We assume that, after the training process, a day is considered 'rainy' over the sensor field if there is intense rain over at least M nodes, with M < N/2. If, during the feature extraction process, a node j at l hops from the sink has 'counted' M - 1 rainy nodes among its children and its own measurement, that node has the potential to assess that the day is rainy over the field with a high degree of confidence. In this case, we have that:

P("Non rainy day over region") = P("All remaining N/2 + j - 1 nodes are not rainy").

For instance, if N/2 + j ≥ 11 and, at each node, P("rain") = 0.2, the confidence is larger than 91%.

Figure 6.9: Comparison of analytical, approximated and experimental average MSE for uniform random subsampling over a 4 year period of the rainfall data set (2 bin histograms).

6.6.2 Confidence Metric

We assume that the sink is placed at the center of the sensor region and that the nodes are placed uniformly. Let N be the total number of sensing nodes (sink excluded) and n_1 the number of nodes in the closest hop. A rough approximation for the average number of nodes on a tree rooted at the sink is N/n_1. Hence, each subtree is going to have on average at most N/n_1 measurements with which to take a decision. This implies that this system can only take a reliable decision if the phenomenon of interest covers a comparable sub-portion of the sensor region (e.g. n_1/N).
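The confidence figure quoted in Example 10 can be checked with one line of arithmetic. Here we take the number of remaining, uncounted nodes to be 11 (one illustrative reading of the bound N/2 + j ≥ 11) and, as in the example, idealize the node measurements as independent:

```python
def rainy_day_confidence(remaining_nodes, p_rain):
    """1 - P(all remaining nodes are non-rainy): the confidence that at
    least one more rainy node exists, i.e. that the day is rainy over the
    field.  Spatial independence is an idealization; real rainfall
    measurements are correlated."""
    return 1.0 - (1.0 - p_rain) ** remaining_nodes

conf = rainy_day_confidence(remaining_nodes=11, p_rain=0.2)  # about 0.914
```

With 11 remaining nodes and P("rain") = 0.2 per node, the confidence is 1 - 0.8^11 ≈ 91.4%, matching the "larger than 91%" figure.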
Otherwise, the number of available measurements can be insufficient to take any classification decision outside the sink.

Figure 6.10: Comparison of error rates for regular, Bernoulli and uniform random subsampling (rain fall data set, 5 year period, 2 bin histograms).

In our approach, the vectors of partial measurements at each node are augmented with the most likely a priori values (e.g. 'no rain') to match the size of the reference histograms computed using measurements from the entire network. Then a node computes its histogram from the augmented measurement vectors. The computation of the confidence metric consists in computing a distance between the histogram of the augmented measurements and the closest reference histogram. We propose the following confidence metric:

c_m(h(Z_i)) = \sum_{k=1}^{N_b} (O_k - X_k)^2 / X_k,   (6.30)

where N_b is the number of bins, and O_k and X_k are respectively the values of the k-th bin of the histogram of the partial augmented measurements and of the prototype of the most probable class, e.g. 'no rain over the region' (the null hypothesis). The notation c_m(h(Z_i)) indicates the confidence metric. If the confidence metric is above a certain threshold, which can be assessed through the training process, the node can take a decision for the whole network. Note that (6.30) is similar to the metric used for the χ-square test (for instance, see [48]).
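Equation (6.30) translates directly into code. The 4-bin prototype, the partial augmented histogram and the threshold below are illustrative assumptions; in the actual system these values come from the training stage.

```python
def confidence_metric(observed, null_prototype):
    """Chi-square-like confidence metric of Eq. (6.30): distance between
    the histogram of the augmented partial measurements (O) and the
    prototype of the most probable class (X)."""
    return sum((o - x) ** 2 / x for o, x in zip(observed, null_prototype))

# 4-bin example; the null hypothesis is 'no rain over the region'
# (all numbers are assumed for illustration).
no_rain_proto = [140.0, 15.0, 6.0, 2.0]
partial_aug = [120.0, 22.0, 14.0, 7.0]   # partial data augmented with a priori values
c = confidence_metric(partial_aug, no_rain_proto)
c0 = 10.0                                 # threshold assessed in training (assumed)
decide_locally = c >= c0
```

A large value of the metric means the partial data already deviate strongly from the null hypothesis, so the node may decide for the whole network.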
However, we cannot consider our approach a proper χ-square test, but just a heuristic, because the sample measurements in this case are spatially correlated (the χ-square test requires independence of the samples) and because of the augmentation of the observed vectors.

6.6.3 Algorithm for Distributed Feature Evaluation

This is the distributed decision method. We assume that a threshold c_0 on the confidence measure c_m(·) has been defined and passed to all the nodes in the network.

Figure 6.11: Sensor network topology for the whole rainfall data set region (165 nodes, 6 hops).

1. Begin from the nodes in the most external hop.

2. At each node Z_i, compute h(Z_i) as the sum of the feature vector from the node's own measurement and the feature vectors received from the children nodes.

3. Estimate c_m(h(Z_i)).

4. If c_m(h(Z_i)) ≥ c_0:

• send a transmission request packet;

• each node receiving the transmission request packet suspends the feature extraction process and re-transmits the packet to its neighbors and its measurements to the parent node.

5. Else, if c_m(h(Z_i)) < c_0:

• the node waits for a number of hop transmission times inversely proportional to c_m(h(Z_i));

• then it sends the feature vector h(Z_i) to the parent node.

If no decision is taken at the nodes, the sink collects all the feature vectors, sums them up into one vector representing the histogram of the sensing frame, and performs the classification.
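A minimal simulation of these steps on a static tree (without the variable waiting times, matching the proof-of-concept setting of Experiment 5, where the process simply stops at the first deciding node) might look as follows. The tree, feature vectors and toy confidence function are all assumed for illustration.

```python
def distributed_decision(tree, features, c0, confidence):
    """Sketch of the distributed feature evaluation: nodes are processed
    from the outermost hop inward (post-order); each node sums its own
    feature vector with its children's, and may decide for the whole
    network when the confidence exceeds c0."""
    partial = dict(features)            # node -> accumulated feature vector
    order = []                          # children-first processing order
    def visit(u):
        for v in tree.get(u, []):
            visit(v)
        order.append(u)
    visit('sink')
    for u in order:
        for v in tree.get(u, []):
            partial[u] = [a + b for a, b in zip(partial[u], partial[v])]
        if u != 'sink' and confidence(partial[u]) >= c0:
            return u, partial[u]        # early decision taken at node u
    return 'sink', partial['sink']      # centralized decision at the sink

# Hypothetical 4-node tree with 2-bin feature vectors (all values assumed).
tree = {'sink': ['a', 'b'], 'a': ['c'], 'b': []}
feats = {'sink': [3, 0], 'a': [2, 1], 'b': [3, 0], 'c': [0, 3]}
toy_confidence = lambda h: h[1]         # toy metric: count in the 'rainy' bin
node, h = distributed_decision(tree, feats, c0=4, confidence=toy_confidence)
```

Here node `a` accumulates its own vector with its child's, reaches the threshold, and decides before the features ever reach the sink.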
The reason for having different waiting times according to the value of the confidence measure c_m(h(Z_i)) is to have the feature vectors propagate faster over trees more likely to measure a high enough confidence to stop the process before the packets have arrived at the sink. Note that the notification messages have the minimum waiting time.

Figure 6.12: Error rates and effective efficiency for sensing frame classification via 8 bin histograms.

Performance metric                  Distributed decision   Centralized decision
Error rate                          5.2055%                0.5479%
Relative misclassification rate     3.5714%                7.1429%
Relative false detection rate       5.3412%                0%
Decisions taken not at the sink     12.33%                 0%

Table 6.2: Distributed vs. centralized decision.

Experiment 5 We consider also in this case a network of 36 nodes over the dry subregion of the rain fall data set. We perform classification using the distributed decision approach defined in Subsection 6.6.3 for a 365 day period, with a 4 bin histogram feature. Note that the variable waiting time feature for the nodes has not been implemented. Instead, we assume for proof of concept that the feature collection stops as soon as one node takes a classification decision.

Table 6.2 compares the performance of the distributed and centralized decision approaches. The distributed decision is characterized by a larger number of false alarms and a smaller number of missed detections. False detections have a negative impact on the energy cost. Therefore, this experiment shows that there is a trade-off between the performance of the distributed method and the energy cost.
Figure 6.13: Error rates and effective efficiency for sensing frame classification via 4 bin histograms.

Figure 6.14 shows the network topology, with black diamond marks on the nodes that took decisions. We added small random perturbations to the locations of those nodes in order to facilitate visualization. The plot confirms the data in Table 6.2: most of the decisions have been taken at the sink. The rest of the decisions happened at the nodes of the first hop. Figure 6.15 shows two snapshots of the scalar field of the confidence metric on two different rainy days.

Figure 6.14: Sensor network topology and locations of the decision making nodes (indicated with black diamonds). A random perturbation has been added to the locations of such nodes, in order to help visualization.

Figure 6.15: Two examples of the scalar field of the confidence metric on two rainy days. The peaks are placed on the nodes that took the decision.

Chapter 7 Conclusion and Future Work

In this chapter, we make concluding remarks and outline open problems and possible future work for the research efforts presented in this dissertation.

7.1 Conclusion

In this section we make concluding remarks on the work presented in Chapters 3, 4, 5 and 6.
Each subsection focuses on a different chapter.

7.1.1 Diffusion Models for Data Gathering

In Chapters 3 and 4, we focused on the monitoring of diffusion phenomena via autonomous sensor networks. Under this framework, partial differential equations (PDEs), which characterize the diffusion process, were adopted to model the time and space correlations of the physical phenomena observed by the sensor nodes. One of the motivations to adopt PDE models is that, in these scenarios, the sensor network can send only the model parameters and the initial and boundary conditions to the base station. If there are no sources in the region of interest, this set of information suffices to predict the evolution of the underlying phenomenon without the need for continuous updates of raw data from the sensor field to the base station. We note that this framework can only be applied to diffusion phenomena that can be modeled by a linear PDE model (for instance, solid state diffusion). Many real diffusion phenomena are dominated by a turbulent behavior and can therefore only be represented by non-linear PDE models.

One of our main contributions has been the distributed identification of the parameters of PDEs under various types of possibly unknown boundary conditions, for one- and two-dimensional sensor regions. Our identification method is based on extended Kalman filtering (EKF). We have addressed the problem of identifying spatially varying parameters, and proposed a hybrid data and decision fusion approach to the estimation of PDE parameters in order to limit the amount of communication involved in the process. We also made some evaluation of the amount of energy needed to estimate these parameters. Computer simulations were conducted to study the performance of the proposed scheme. Simulation results were shown and discussed.

This research has other contributions in addition to the distributed identification of PDE parameters.
With respect to a global model covering the entire sensor region, discrete-time space-independent local models were proposed as an alternative. Furthermore, we showed that the sensors providing better measurements for the estimation of the parameters are those located at points of the field characterized by higher free dynamics, and therefore not close to constrained boundaries or other stationary points. This may lead to new criteria to select "good" sensor nodes and to weigh measurement data. In particular, it is important to point out that the static SNR does not play a key role in the performance of the identification methods, which is contrary to common belief. Some open problems for this research are detailed in the next section.

7.1.2 Spatially Nonstationary Models

In Chapter 5, we studied the problem of the energy efficient deployment of sinks in a sensor region for the task of correlated data gathering. The fundamental assumption here is that the physical field being monitored is characterized by a nonstationary spatial covariance model. Nonstationarity leads to an uneven spatial distribution of the rates assigned to the nodes. This implies that the optimal sink location may not necessarily be the centre of the sensor region, unlike what typically holds in the case of stationary spatial statistics. We provided analytical results on the optimal sink location and studied the properties of the spatial profiles of the rates for a general class of nonstationary spatial covariance models. Our main conclusion, validated on a real data set as well, is that the optimal location of the sink turns out to be close (usually within one hop) to the centre of the sensor region. This result goes against preliminary intuition on the impact of nonstationarity on correlated data gathering.

7.1.3 Energy Efficient Data Gathering via Distributed Classification

In Chapter 6, each round of data collected by a wireless sensor network is seen as a still image (sensing frame).
We addressed the problem of increasing the efficiency of data gathering by performing in-network classification of the frames, in order to send to the base station only those of interest to the end users. This framework poses strict constraints on the cost of analyzing the sensing frames in order to be more energy efficient than plain data gathering. We focused in particular on the analysis of the data via intensity histograms. The aforementioned efficiency constraints limit the size of the histograms we extract to analyze the data and/or the number of messages that can be exchanged to extract and analyze the histograms.

We presented analytical results on the maximum size allowed for the histograms. We studied methods to increase the efficiency by subsampling the data, thus decreasing the number of bits needed to encode the histograms and also reducing the number of messages needed to analyze the data. We presented extensive analytical and experimental results to characterize the performance of these methods.

7.2 Open Problems and Future Work

7.2.1 Diffusion Models for Data Gathering

Although the main focus of Chapters 3 and 4 has been on the identification of physical models, the broad goal of this research is to gain a deeper understanding of the impact of physical modeling on the design of wireless sensor network algorithms. In the near future, wireless sensor networks will enable a very dense sampling of natural phenomena at a relatively low cost, provided that the scientific community keeps all its promises. Physical models capture the spatial and temporal relationships of natural phenomena. Thus, embedding physical models in sensor networks may potentially lead to better algorithms, since it provides important prior information on the behavior of the observed phenomena. With this in mind, our attempt to study physical models via sensor networks is useful in the sense that it offers insights into the understanding of physical phenomena as well as into the engineering of sensor networks.
Some interesting research directions to extend the current research are described below.

1. More experimental results and performance evaluation
In particular, it would be interesting to evaluate the performance of 2-D algorithms and the trade-off between the energy cost and the accuracy of the estimates. It would also be interesting to evaluate the performance when processing real data.

2. Network initialization
In order to initialize a network, we should have a better understanding of the following subjects.

• Node clustering technique
Our framework relies on a cluster-based architecture. Several clustering techniques are available from the distributed systems [62] as well as the sensor network [19] literature. In the latter case, the algorithms try to elect the nodes closest to the source as cluster leaders to ensure a better SNR value. For the problem of our interest, we may consider ad-hoc clustering methods, since it has slightly different requirements. For example, the sources are not inside the cluster region. Besides, it is interesting to study how an approximate knowledge of the physics can be exploited to form proper clusters.

• Data acquisition rate
Another issue is the study of methods for the (semi)autonomous in-network acquisition of the state space model of the phenomenon being monitored. An example question for this problem is the selection of the spatial/temporal sampling resolutions. Low sampling resolutions allow lower order matrices for the Extended Kalman Filter, while large ones potentially have a better accuracy in representing the physical field.

• Determination of the optimal number of operating nodes
The evaluation of the optimal number of nodes needed to perform the estimation process is another research problem. From the experiments, it is clear that only some of the nodes are needed.
A criterion for activating the nodes should be based on the intensity of the time derivative of the signal x_t, rather than on the amplitude of x itself, since the goal is to measure dynamic properties of the field.

3. Treatment of boundary conditions
The boundary conditions (or at least some of the boundaries) may not be available in advance. Sampling the boundaries using some of the nodes to continuously update the cluster heads can be demanding in terms of the energy cost. Thus, we would like to develop techniques to exploit the spatial and temporal correlation of the field at the boundaries in order to reduce the amount of data being transmitted. Also, alternative identification methods that do not require an explicit solution of the boundary problem would be worth investigating.

4. Peer-to-peer node collaboration strategy
Our current framework is a cluster-based processing method. A completely distributed architecture could be considered so that we can eliminate the leader election phase. A tool suitable for this purpose is the information form of the Kalman Filter, since it allows a simple additive form of the update equation.

5. Impact of multi-hop communications
So far, we only consider one-hop communication between the member nodes and the cluster head. It is worthwhile to examine the impact of multi-hop communications in collecting the measurements. Nodes communicating through multiple hops will introduce latency in the estimation process, since the cluster head needs to wait for the measurements from all nodes. Other problems in this area are the possibility of updating the estimates of the parameters jointly with their routing to the sink, and the optimal criteria for switching from data to decision fusion.

6. Physics-based processing and queries
Most wireless sensor networks make simplified assumptions about the model of the phenomena under observation. They generally do not adopt PDE models. We discussed the potential benefits of using PDEs in the case of data gathering.
The work in [3] considers PDE models for distributed compression. Physics-based processing can be promising in other applications as well (e.g., queries) and should be investigated from both the theoretical and the practical points of view.

7. Physics-based sensor networks under uncertainty
An accurate mathematical model for a physical phenomenon may not always be available, because it is either too expensive to estimate or too complicated to process efficiently. In real scenarios, suitable physical models can be very complicated, e.g., including multiple sources and space- and time-varying parameters. Thus, identifying and adopting real physical models might be impractical in some wireless sensor network applications. A complementary approach is to exploit an approximate or partial knowledge of the physical models of the monitored phenomena. An example is given by the smoothing property of diffusion phenomena. Even without exact knowledge of the spatio-temporal equations characterizing gas diffusion, we can still predict the overall behavior of the phenomenon. This knowledge can be exploited in the selection of sensor network parameters such as the sampling period and/or the spatial densities of active sensors. It can also be used in some sensor network tasks such as finding the extrema of a gas field. For the latter case, the physics tells us that, under some hypotheses, points of extrema are more likely to be located in an area of higher temporal variation, as shown in Fig. 7.1.

7.2.2 Spatially Non Stationary Models
The broad goal of the work presented in Chapter 5 is to gain a deeper understanding of the impact of more accurate modeling of spatio-temporal phenomena on wireless sensor network problems and algorithms.
In particular, considering nonstationary models for the spatio-temporal phenomena may potentially lead to better algorithms, since it provides important prior information on the behavior of the observed phenomena. In the following subsections, we list several open problems we foresaw when we studied the subject.

Figure 7.1: Peaks of the diffusion field have a higher dynamic range in the transient state.

• More analytical results
We want to provide further analytical results on some of the issues presented in this document. We are analyzing the maximum steepness of the incremental rate for correlated data obeying conditions (5.26) and (5.27), in order to evaluate the maximum distance of the optimal sink location from the centre. We presented the result for independent data in Subsection 5.3.5. From the experiments, the slow spatial growth of the rates should be confirmed for correlated data as well. Besides, we would like to prove conjecture (1) and to extend the proof of convexity given in Section 5.2 to the two-dimensional case.

• A different real data set
An additional real data set that can be used to study the problem is a set of minimum and maximum daily temperatures for 113 weather stations across Oregon, from January 1, 2000 to December 31, 2004. The stations are at different elevations, and this altimetric heterogeneity may lead to an interesting structure of the spatial covariance.

• Different energy metrics
In this work, we considered an internode energy cost model of the type [rate] × [path weight], where the path weight is usually d^k, with d the distance between two nodes. This is a metric commonly adopted in the wireless sensor network literature [14]. However, alternative energy metrics should also be evaluated and could lead to different methods for finding the optimal sink location and to different conclusions (e.g., the optimal sink location could be far from the centre). In particular, two energy cost metrics could be considered.
For the first one, the transmission cost becomes proportional to [exp(rate)] × [path weight], which is considered suitable for highly noisy wireless channels [50]. The second one is adopted in the work [35]: here the energy cost is proportional to the expected number of retransmissions that a node has to perform in order to send its data to a destination. In this case, a certain packet loss probability is defined for every communication link.

• Modeling temporal nonstationarity
For simplicity, we have assumed temporal stationarity of the phenomena, focusing only on the spatial structure of the covariance model. To model spatio-temporal phenomena more accurately, we want to consider nonstationary spatio-temporal covariance models. Note that, to the best of our knowledge, most of the current sensor network literature assumes stationary temporal models. There are many challenges in dealing with nonstationary spatio-temporal models (e.g., estimation of the statistics). Cyclostationarity (i.e., periodic stationarity) is probably the simplest kind of temporal nonstationarity, and it seems to be a realistic assumption for many natural phenomena. A consequence of considering temporal nonstationarity is that the spatial correlation model becomes time varying. Other types of nonstationary temporal models can also be investigated, perhaps by considering bounds on the covariance similar to conditions (5.26) and (5.27) that we introduced for spatial covariance models.

• Other applications
There are other problems that can be studied under the assumption of nonstationary spatio-temporal covariances besides optimal sink placement. Two naturally related problems seem to be node deployment and on-off topology control. The former problem can be viewed as a generalization of the sink deployment problem. A multiresolution (quad-tree) approach is a possible way to treat it.
For the latter problem, the goal is to study how to schedule some sensor nodes to sleep based on the temporal statistics of the monitored phenomenon.

7.2.3 Distributed Classification
Some possible extensions of the work presented in Chapter 6 are discussed below.

• Approximate computation of the efficiency
In Chapter 6, we have conjectured that the average MSE for the histograms of a population of subsampled sensing frames is given by a function of 1/m and D, where m is the number of nodes after subsampling and D is the subsampling factor (see Eq. 6.24). We would like to prove the conjecture and then use the result to derive an approximate expression for the efficiency η as a function only of m and D. This would have a practical application as a quick way to obtain a preliminary estimate of the performance of the system.

• Other kinds of features
The chapter has focused on the use of the intensity histogram feature. However, there are other types of features that could be adopted to analyze the content of sensing frames, e.g., variations of the intensity histogram or shape features. Texture features do not seem to have any application in a sensor network context. The extraction process for these new features can be different from the one for the intensity histogram. Hence the expressions for the computation of the efficiency, or for analyzing the impact of subsampling, can also differ from those for intensity histograms.

• Classification of independent subsets of the sensing frames
The sensor network can be partitioned into independent sub-networks. Each sub-network can perform classification of its own related subframes. An open problem here is how to define the partition of the nodes in order to optimize the classification performance.

• Temporal segmentation of sequences of sensing frames
An alternative approach for performing efficient content-aware data gathering is to regard consecutive temporal sequences of sensing frames as video streams and then to analyze features in order to detect temporal transition points (e.g., a
transition from rain to no rain). When this is the case, or after a maximum number of frames has been analyzed without the detection of any transition, the network can send to the base station a summary of the sensing data collected between the latest and the previous transition points, along with the times at which these transitions occur. Hence there is a potential for energy savings with respect to regular data gathering. This scenario implies a latency of several collection rounds for the transmission of the data to the base station.

References

[1] M. E. Alpay and M. Shor. Model-based solution techniques for the source localization problem. IEEE Transactions on Control Systems Technology, 8(6):895–904, November 2000.
[2] J. Baumeister, W. Scondo, M. Demetriou, and I. Rosen. On-line parameter estimation for infinite dimensional dynamical systems. SIAM J. Control Optimization, 35(2):678–713, March 1997.
[3] B. Beferull-Lozano, R. L. Konsbruck, and M. Vetterli. Rate-distortion problem for physics based distributed sensing. In Information Processing in Sensor Networks (IPSN), Berkeley, CA, April 26-27, 2004.
[4] P. Bromiley and N. Thacker. When less is more: Improvements in medical image segmentation through spatial sub-sampling. In Proc. MIUA, 2007.
[5] R. Brooks, P. Ramanathan, and A. Sayeed. Distributed target classification and tracking in sensor networks. Proceedings of the IEEE, Special Issue on Sensor Networks, September 2002.
[6] O. Chapelle, P. Haffner, and V. N. Vapnik. Support vector machines for histogram-based image classification. IEEE Transactions on Neural Networks, 10(5):1055, September 1999.
[7] C.-T. Chen. Linear System Theory and Design. Oxford University Press, 3rd edition, 1999.
[8] X. Cheng, J. Xu, J. Pei, and J. Liu. Hierarchical distributed data classification in wireless sensor networks. Computer Communications, 33(15), July 2010.
[9] K. Chintalapudi and R. Govindan. Localized edge detection in wireless sensor networks.
In IEEE ICC Workshop on Sensor Network Protocols and Applications, April 2003.
[10] C. Y. Chong and S. P. Kumar. Sensor networks: Evolution, opportunities and challenges. Proceedings of the IEEE, 91(8):1247–1256, August 2003.
[11] H. A. Cohen. Retrieval and browsing image database using image thumbnails. Journal of Visual Computing and Image Representation, 8(2):226–234, 1997.
[12] T. M. Cover and J. A. Thomas. Elements of Information Theory. Wiley, 1991.
[13] N. A. C. Cressie. Statistics for Spatial Data. Wiley, 1st edition, 1991.
[14] R. Cristescu, B. Beferull-Lozano, and M. Vetterli. On network correlated data gathering. In INFOCOM, Hong Kong, March 2004.
[15] R. Curtain and H. Zwart. An Introduction to Infinite-Dimensional Linear Systems Theory. Springer-Verlag, 1995.
[16] R. Datta, D. Joshi, J. Li, and J. Z. Wang. Image retrieval: Ideas, influences, and trends of the new age. ACM Computing Surveys, 40(2), April 2008.
[17] M. F. Duarte and Y. H. Hu. Vehicle classification in distributed sensor networks. Journal of Parallel and Distributed Computing, 64:826–838, 2004.
[18] D. Estrin, L. Girod, G. Pottie, and M. Srivastava. Instrumenting the world with wireless sensor networks. In Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2001.
[19] Q. Fang, F. Zhao, and L. Guibas. Lightweight sensing and communication protocols for target enumeration and aggregation. In International Conference on Mobile Computing and Networking (MobiHoc), pages 165–176, June 2002.
[20] S. D. S. G. N. Lilis, M. Zhao. Distributed sensing and actuation on wave fields. In Second Sensor and Actor Networks Protocols and Applications (SANPA), Boston, MA, August 2004.
[21] D. Ganesan, R. Cristescu, and B. Beferull-Lozano. Power-efficient sensor placement and transmission structure for data gathering under distortion constraints. ACM Transactions on Sensor Networks, 2005.
[22] U. Gargi, R. Kasturi, and S. Strayer.
Performance characterization of video-shot-change detection methods. IEEE Transactions on Circuits and Systems for Video Technology, 10(1), February 2000.
[23] R. Ghez. Diffusion Phenomena. Kluwer Academic/Plenum Publishers, 2nd edition, 2001.
[24] T. Gneiting, M. G. Genton, and P. Guttorp. Geostatistical space-time models, stationarity, separability and full symmetry. Technical Report 475, Department of Statistics, University of Washington, February 2005.
[25] G. H. Golub and C. F. Van Loan. Matrix Computations. The Johns Hopkins University Press, 1990.
[26] J. Harmsen and W. Pearlman. Steganalysis of additive noise modelable information hiding. In SPIE, 2003.
[27] W. R. Heinzelman, A. Chandrakasan, and H. Balakrishnan. Energy efficient communication protocol for wireless sensor networks. In Proc. International Conference on System Sciences, volume 33, January 2000.
[28] D. Higdon, J. Swall, and J. Kern. Non-stationary spatial modeling. Bayesian Statistics, 6:761–768, June 1998.
[29] W. Hu, C. T. Chou, S. Jha, and N. Bulusu. Deploying long-lived and cost-effective hybrid sensor networks. In (BroadNets) BaseNets, 2004.
[30] V. Isakov and S. Kindermann. Identification of the diffusion coefficient in a one-dimensional parabolic equation. Inverse Problems, 16:665–680, 2000.
[31] X. (James), M. Zhang, K. E. Nygard, S. Guizani, and H.-H. Chen. Self-healing sensor networks with distributed decision making. International Journal of Sensor Networks, 2(5/6):289–298, 2007.
[32] J.-C. Joo, T.-W. Oh, J.-H. Choi, and H.-K. Lee. Steganalysis scheme using the difference image of calibrated sub-sampling. In Sixth International Conference on Intelligent Information Hiding and Multimedia Signal Processing. IEEE, 2010.
[33] K. Kalpakis, K. Dasgupta, and P. Namjoshi. Maximum lifetime data gathering and aggregation in wireless sensor networks. In IEEE International Conference on Networking, pages 685–696, August 2002.
[34] K.-S. Kim, M.-J. Lee, H.-K. Lee, and Y.-H. Suh. Histogram-based reversible data hiding technique using subsampling.
In MM&Sec, pages 69–74, 2008.
[35] A. Krause, C. Guestrin, A. Gupta, and J. Kleinberg. Near-optimal sensor placements: Maximizing information while minimizing communication cost. In Fifth International Conference on Information Processing in Sensor Networks (IPSN '06), April 2006.
[36] X. Li, T. Zeng, and B. Yang. Detecting LSB matching by applying calibration technique for difference image. In Proceedings of the 10th ACM Workshop on Multimedia and Security, pages 133–138, New York, 2008. ACM.
[37] P. Liao, M. K. Chang, and C.-C. J. Kuo. A statistical approach to contour line estimation in wireless sensor networks with practical considerations. IEEE Transactions on Vehicular Technology, 58(7):3579, September 2009.
[38] L. Ljung. Asymptotic behavior of the extended Kalman filter as a parameter estimator for linear systems. IEEE Trans. on Automatic Control, AC-24(1):36–50, February 1979.
[39] J. M. Mendel. Lessons in Estimation Theory for Signal Processing, Communications and Control. Prentice Hall, 1995.
[40] A. G. Money and H. Agius. Video summarisation: A conceptual framework and survey of the state of the art. Journal of Visual Communication & Image Representation, 19(2):121–143, February 2008.
[41] G. Nofsinger and K. W. Smith. Plume source detection using a process query system. In Defense and Security Symposium 2004, Orlando, FL, April 12-16, 2004.
[42] A. Okubo. Diffusion and Ecological Problems: Mathematical Models. Springer-Verlag, 1st edition, 1980.
[43] Y. Orlov and J. Bentsman. Adaptive distributed parameter systems identification with enforceable identifiability conditions and reduced-order spatial differentiation. IEEE Trans. on Automatic Control, 45(2):36–50, February 2000.
[44] C. J. Paciorek and M. J. Schervish. Nonstationary covariance functions for Gaussian process regression. In Advances in Neural Information Processing Systems, volume 16. MIT Press, 2003.
[45] S. Pattem, B. Krishnamachari, and R. Govindan.
The impact of spatial correlation on routing with compression in wireless sensor networks. In IPSN, pages 28–35, Berkeley, CA, USA, April 26-27, 2004.
[46] O. Pironneau. Derivatives with respect to piecewise constant coefficients of a PDE with applications. In ECCOMAS Conference, Jyvaskyla, Finland, July 2004.
[47] V. Raghunathan, C. Schurgers, S. Park, and M. B. Srivastava. Energy-aware wireless microsensor networks. IEEE Signal Processing Magazine, 19, March 2002.
[48] S. Rajasegarar, C. Leckie, and M. Palaniswami. Anomaly detection in wireless sensor networks. Topics on Security in Ad Hoc and Sensor Networks, August 2008.
[49] K. Rank, M. Lendl, and R. Unbehauen. Estimation of image noise variance. IEE Proceedings - Vision, Image and Signal Processing, 146(2), 1999.
[50] T. Rappaport. Wireless Communications: Principles and Practice. Prentice-Hall, Englewood Cliffs, NJ, 1996.
[51] L. A. Rossi, B. Krishnamachari, and C.-C. J. Kuo. Distributed parameter estimation for monitoring diffusion phenomena using physical models. In 1st IEEE Conference on Sensor and Ad Hoc Communications and Networks (SECON), Santa Clara, CA, October 4-7, 2004.
[52] L. A. Rossi, B. Krishnamachari, and C.-C. J. Kuo. Hybrid data and decision fusion techniques for model-based data gathering in wireless sensor networks. In IEEE VTC2004-Fall, Los Angeles, CA, September 26-29, 2004.
[53] L. A. Rossi, B. Krishnamachari, and C.-C. J. Kuo. Monitoring of diffusion processes with PDE models in wireless sensor networks. In Defense and Security Symposium 2004, Orlando, FL, April 12-16, 2004.
[54] L. A. Rossi, B. Krishnamachari, and C.-C. J. Kuo. Optimal sink deployment for distributed sensing of spatially nonstationary phenomena. In Proceedings of IEEE Pacific Visualization 2009 Symposium, November 2007.
[55] L. A. Rossi and C.-C. J. Kuo. Semi-dynamic approaches to node clustering for sensor networks. In ITCOM 2004, 2003.
[56] P. Sampson, D. Damian, and P. Guttorp.
Advances in modeling and inference for environmental processes with nonstationary spatial covariance. In GeoENV 2000: Geostatistics for Environmental Applications, 2000.
[57] P. Sampson and P. Guttorp. Nonparametric estimation of nonstationary spatial covariance structure. Journal of the American Statistical Association, 87:108–119, 1992.
[58] A. Scaglione and S. Servetto. On the interdependence of routing and data compression in multi-hop sensor networks. Wireless Networks, pages 149–160, Springer-Verlag, January 2005.
[59] G. Shen, S. Pattem, and A. Ortega. Energy-efficient graph-based wavelets for distributed coding in wireless sensor networks. In ICASSP, 2009.
[60] M. L. Stein. Non-stationary spatial covariance functions. Technical report, Department of Statistics, University of Washington, January 2005.
[61] J. Sukharev, C. Wang, K.-L. Ma, and A. T. Wittenberg. Correlation study of time-varying multivariate climate data sets. In Proceedings of IEEE Pacific Visualization 2009 Symposium, April 2009.
[62] G. Tel. Introduction to Distributed Algorithms. Cambridge University Press, 1994.
[63] S. Toumpis and G. A. Gupta. Optimal placement of nodes in large sensor networks under a general physical layer model. In SECON, San Jose, September 2005.
[64] R. Viswanathan, S. Thomopoulos, and R. Tumuluri. Optimal serial distributed decision fusion. IEEE Transactions on Aerospace and Electronic Systems, 24(4):366–376, August 1988.
[65] M. Widmann and C. S. Bretherton. 50 km resolution daily precipitation for the Pacific Northwest. Technical report, University of Washington, 1999.
[66] C. Wren and E. Munguia-Tapia. Toward scalable activity recognition for sensor networks. In Proceedings of the Workshop on Location and Context-Awareness, 2006.
[67] S. Xiong and Y. Li. Evolutionary algorithm for parameter identification inverse problem in parabolic systems. In 4th World Congress on Intelligent Control and Automation, pages 1817–1821, June 2002.
[68] J. Yin, V.
Syrmos, and D. Yun. System identification using nonlinear filtering methods with applications to medical imaging. In 39th IEEE Conference on Decision and Control, volume 4, pages 3313–3318, December 2000.
[69] T. Zeng and B. Yang. A further study on steganalysis of LSB matching by calibration. In Proc. IEEE International Conference on Image Processing, pages 2072–2075. IEEE, 2008.
[70] Y. Zhu, K. Sundaresen, and R. Sivakumar. Practical limits on achievable energy improvements and usable delay tolerance in correlation aware data gathering in wireless sensor networks. In SECON, San Jose, September 2005.