COMMUNICATION AND ESTIMATION IN NOISY NETWORKS

by

Satish Vedantam

A Dissertation Presented to the
FACULTY OF THE GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the Requirements for the Degree
DOCTOR OF PHILOSOPHY (ELECTRICAL ENGINEERING)

May 2009

Copyright 2009 Satish Vedantam

Dedication

This dissertation is dedicated to Indira and my family – without them and their support, none of this would be possible.

Acknowledgements

The start of every journey is the first step. This thesis is the culmination of all the time I have spent at USC, and none of that would have been possible but for the support of my advisor, Dr. Urbashi Mitra. She has been the quintessential advisor, guiding me through thick and thin as I plodded on towards this thesis, and I am grateful for her support.

I have gained a lot from my interactions with other professors at USC. Dr. Giuseppe Caire's class on Information Theory and his clarity of presentation were the motivation to look to this field for a research problem. Dr. Boris Rozovsky (who advised me for my Masters Thesis in Applied Mathematics) and his engineering approach to solving complicated mathematical problems were a revelation and one of the driving forces for my interest in research. I also benefited from interactions with Dr. Kenneth Alexander, Dr. Peter Baxendale, Dr. Ronald Bruck, Dr. Solomon Golomb, Dr. Remigijus Mikulevicius, Dr. Michael Neely, Dr. Robert Scholtz, Dr. Jianfeng Zhang and Dr. Zhen Zhang, in and outside classes. These are the people who helped me develop the tools I employ in this thesis. I am also thankful to Dr. Gerhard Kramer for extensive discussions about how cost constraints make the simple MAC problem a hard one to deal with. This document gained immeasurably from his important contribution. Finally, I have to mention Dr. Ashutosh Sabharwal and Dr. Wenyi Zhang. We collaborated on a number of results which are presented in this thesis; they have always been sounding boards for my ideas and were immensely helpful in setting the direction of this thesis.

I cannot begin to describe the hard work, dedication and support that my parents and my little sister have put into my life. I am blessed and hope that I turn out to be at least half the person they deserve. Indira has been another person who has always been at my side over the past few years and I am thankful for her support. No mention of my life at USC would be complete without a mention of my friends. Arjun, Cecilia, Gautam, Hari, Madhavan, Nick, Sameera, Shiva, Sridhar, Stefan and Sundeep helped me keep a balanced and inquisitive mind. I am thankful for all the fruitful discussions we had on topics not restricted to research. Finally, I would be remiss not to acknowledge the contributions my friends from before USC made to this thesis. Anish, Nayak, Nagendra, Meher and a few others whose names slip my mind kept me grounded and helped me concentrate on the job at hand.

Life at USC was made easy by the expert handling and ever-present help provided by Milly Montenegro, Mayumi Thrasher, Gerrielyn Ramos, Anita Fung and Diane Demetras. I never had a question that they could not help me with.

Table of Contents

Dedication
Acknowledgements
List of Figures
Abstract
Chapter 1: Introduction
Chapter 2: The Single Hop Problem: Two Particular Examples
  2.1 Signal Model and Problem Description
  2.2 BSC with Additive Channel State and Hamming Distortion
  2.3 Fading Gaussian Channel with the MSE Distortion Metric
  2.4 Results
  2.5 Conclusions
Chapter 3: The Capacity-Distortion Trade-off
  3.1 Introduction
  3.2 One Hop Signal Model
  3.3 A One Hop Constrained Channel Coding Formulation
  3.4 Illustrative Examples for the Point-to-Point Case
    3.4.1 Uniform Estimation Costs
    3.4.2 A Binary Multiplicative Channel
    3.4.3 Fading Channel with Binary Signaling
    3.4.4 Two Parallel Channels
  3.5 Multiple Access Channel with Channel State Estimation
  3.6 Linear Two-Hop Network with Channel State Estimation
    3.6.1 An achievable rate
    3.6.2 The converse
    3.6.3 Information Losslessness
  3.7 Illustrative Examples for the Two-Hop Case
    3.7.1 Scalar Additive Channels
    3.7.2 Scalar Multiplicative Channels
  3.8 Conclusions
Chapter 4: A Multi-Hop Problem
  4.1 Introduction
  4.2 Signal Model and Problem Definition: Linear Network
  4.3 The Encode-and-Forward Protocol: Linear Network
  4.4 The Amplify-and-Forward Protocol: Linear Network
  4.5 Many-to-One Relay: Protocols and Bounds
    4.5.1 Signal Model
    4.5.2 First Hop Many-to-Relay Protocol
    4.5.3 Amplify-and-Forward
    4.5.4 Encode-and-Forward
  4.6 Numerical Results
    4.6.1 Linear Topology
    4.6.2 Many-to-One Topology
  4.7 Comparisons with Extrinsic Sensing and Multi-terminal Communication
  4.8 Conclusions
Chapter 5: Optimal Location for a Mobile Relay
  5.1 Introduction
  5.2 Data Collection
    5.2.1 Setup
    5.2.2 Collected Data
  5.3 Underwater Communication
  5.4 Problem Setup With Static Fusion Center
  5.5 Problem Setup with a Mobile Fusion Center
  5.6 The Two-Hop Signal Model
  5.7 Minimizing Distortion
    5.7.1 Pure Estimation
    5.7.2 Estimation Followed by Communication
    5.7.3 Multiple Sensors with one Relay
  5.8 Minimizing Delay
    5.8.1 The Two-Hop Network
    5.8.2 Multiple Sensors with one Relay
  5.9 Minimizing Energy
    5.9.1 The Two-Hop Network
  5.10 Multiple Constraints
    5.10.1 Minimizing Delay with a Given Energy Constraint
    5.10.2 Minimize Distortion Given a Delay Constraint
  5.11 Conclusions
Chapter 6: Conclusions
Bibliography

List of Figures

1.1 Two particular network topologies considered. The first is a multi-hop linear network while the second is the node topology for the Many-to-One Relay problem.
2.1 The Generalized Signal Model.
2.2 The signal model for the binary case.
2.3 The signal model for the fading channel.
2.4 The achievable $(C_A, D)$ region for the binary channel when $p_n$ changes from 0.05 to 0.45, $T_c$ is held fixed at 10 and $p_s$ is fixed at 0.05.
2.5 A comparison between the achievable rate regions (for the binary case) given by joint communication and estimation as compared to allocating a fixed number of time slots to estimation and communication. ($T_c = 10$, $p_s = 0.05$ and $p_n = 0.2$.)
2.6 The achievable $(C, D)$ region for the block fading case as $\sigma^2$ increases.
2.7 The achievable $(C, D)$ region as compared to an upper bound on the disjoint estimation/communication problem.
3.1 The general signal model without CSIT.
3.2 Channel model for joint communication and channel estimation.
3.3 Capacity-distortion function for the scalar multiplicative channel.
3.4 The achievable rate with training and joint communication and estimation for the block multiplicative channel when $D = 0$.
3.5 The trade-off curve for $Y_k = S_k X_k + Z_k$ when $\sigma_Z^2 = 0.1$ and $\sigma_s^2$ varies (the curves for $\sigma_s^2 = 0.5, 1, 2, 4, 8$ are shown here).
3.6 The trade-off curve for $Y_k = S_k X_k$ when $\sigma_s^2$ varies (the curves for $\sigma_s^2 = 5, 10, 20, 30$ are shown here).
3.7 Two parallel channels where we want to estimate both the channel parameters $S_1$ and $S_2$.
3.8 The capacity-distortion region for two symmetric binary multiplication channels when $P(S_i = 1) = 0.2$ for varying $D$.
3.9 The 2-MAC-CSE.
3.10 A general two-hop relay network where the destination is interested in estimates of both the intermediate channels and the message transmitted.
3.11 Relay processing structure for an achievable rate. SQ is the noisy source quantizer for $S_1$ at the relay node.
3.12 Capacity-distortion function for the two-hop additive channel model with $\sigma_{z_i} = 0$, $i = 1, 2$. Results for $\sigma^2 = 1, 2, 3, 4$ are shown.
3.13 Bounds on the capacity-distortion function for additive signal models with noise. We assume $\sigma_{s_1}^2 = \sigma_{s_2}^2 = 5$ and $\sigma_{z_1}^2 = \sigma_{z_2}^2 = 0.1$.
3.14 Capacity-distortion region for the two-hop multiplicative channel model $Y_k = S_k X_k$. (The curves for $\sigma_s^2 = 6, 7, 8, 10, 12, 15$ are displayed.)
3.15 Upper bound on the capacity-distortion function for the two-hop fading channel model. (The curves for $\sigma_{s_1}^2 = \sigma_{s_2}^2 = 0.5, 1, 2, 4, 8$ while $\sigma_{z_1}^2 = \sigma_{z_2}^2 = 0.1$ are shown here.)
3.16 Lower bound on the capacity-distortion function for the two-hop fading channel model. (The curves for $\sigma_{s_1}^2 = \sigma_{s_2}^2 = 0.5, 1, 2, 4, 8$ while $\sigma_{z_1}^2 = \sigma_{z_2}^2 = 0.1$ are shown here.)
4.1 A graphical demonstration of the contributions to the distortion bounds for the two-hop network assuming the encode-and-forward protocol at $\mathrm{SNR}_1 = \mathrm{SNR}_2 = 20$ dB.
4.2 MSEs for the TRLUE as compared to the encode-and-forward bounds for a two-hop network at an SNR of 30 dB as the time allocated to the first hop changes.
4.3 Comparison of the rate of decay of MSE for the first hop with SNR assuming that $T_1 = T_2 = T_3 = 1/3$.
4.4 MSEs for the TRLUE using three hops assuming that $T_1 = T_2 = T_3 = 1/3$.
4.5 Comparison of the bounds on distortion of the first hop in a network assuming the encode-and-forward protocol as the number of sources in the first hop changes.
4.6 Comparison of the bounds on the distortion of the first hop in a two-level tree network assuming the encode-and-forward protocol as the SNR for each of the nodes in the network changes.
4.7 Comparison of the bounds on distortion of the first hop in a network assuming the encode-and-forward protocol as the network topology changes.
5.1 Setup for testing and benchmarking the NAMOS system.
5.2 Buoy setup to collect data.
5.3 Sum distortion at the fusion center as a function of the initial transmit SNR at each of the buoys for $f_c = 1$ kHz and $f_c = 24$ kHz.
5.4 The number of active buoys for the minimum sum distortion problem with a fixed fusion center for $f_c = 1$ kHz and $f_c = 24$ kHz.
5.5 The minimum sum distortion at the fusion center as a function of $\mathrm{SNR}_0$ optimized for the location of the fusion center for $f_c = 1$ kHz and $f_c = 24$ kHz. We also present the case when the fusion center is fixed.
5.6 The number of active buoys to minimize the sum distortion at the mobile fusion center as a function of $\mathrm{SNR}_0$. We also present the results when the fusion center is fixed.
5.7 Signal model for the two-hop problem with a mobile relay.
5.8 End-to-end MSE for the two-hop case with pure estimation as the decay exponent changes from 2 to 4.
5.9 Optimal location for the relay as a function of the power available at the relay for different decay exponents ($\alpha = 2, 3, 4$).
5.10 End-to-end MSE as a function of the location of the mobile relay when the SNR for the first hop obeys an exponential decay with different decay exponents ($\alpha = 1, 2, 3, 4$ and $5$ are shown here).
5.11 End-to-end MSE as a function of $d_{SR}$ with required communication rates of 0, 1, 2, 3 and 4 bps.
5.12 Optimal location as a function of the power available at the mobile relay with the additional constraint of a minimum of 0, 1, 2, 3 and 4 bps of guaranteed communication between the relay and destination.
5.13 Signal model for the problem where we have multiple sensors and a relay to facilitate communication with the destination node D.
5.14 Minimum end-to-end MSE for the network with two sensors and one mobile relay node.
5.15 Total time taken up in the network, as a function of where the mobile sensor node is placed after actuation.
5.16 Total energy consumed by the two-hop network, as a function of where the mobile sensor node is placed after actuation.
5.17 Minimum delay for the network as a function of the maximum energy we are willing to expend on the network.
5.18 Delay-distortion trade-off curves for $\alpha = 2, 3$ and $4$.

Abstract

In this thesis, we present a study of two problems relevant to sensor networks. We first study a new form of sensor network in which the parameters of interest are the inter-node time-varying channel(s), and then study the problem of determining the optimal location of a mobile relay node to optimize a certain cost function for the sensor network.

Networks which collect information about the channels themselves are becoming relevant in environmental monitoring systems, such as underwater tsunami-detection networks.
We first present two examples to drive home the point that joint channel estimation and communication can outperform training-based schemes. First, the binary symmetric channel is examined; an achievable capacity-distortion trade-off is derived for both joint and time-orthogonal protocols. For the flat-fading, additive, white, Gaussian noise channel, a novel joint communication and estimation scheme using low-correlation sequences is presented. It is observed that in most situations joint communication and estimation performs better than a scheme where communication and estimation are performed individually; furthermore, the gains of joint communication and estimation over individual communication and estimation can be significant as the distortion tolerance increases. It is also observed that even a slight tolerance to errors in the channel parameters, close to the theoretical lower bounds, yields significant improvements in the rate at which reliable communication is achievable.

We then formulate a joint communication and estimation problem: simultaneous communication over a one-hop noisy channel and estimation of certain channel parameters. The trade-off between the achievable rate and the distortion in estimating the channel parameters is then quantified for a variety of channels. Finally, we formulate the information-theoretic problem corresponding to maximizing the achievable rate while simultaneously meeting the distortion constraint for channel estimation at the destination. The capacity for this problem is evaluated and the theory is applied to examples to highlight the results presented. These results are then extended from the one-hop scenario to the multiple access channel (MAC) and the two-hop relay, though we only present achievable rates and outer bounds for the latter network. It is also noted that these bounds coincide for particular networks, and one particular class of such networks is presented. The ideas behind the theoretical results presented are bolstered with examples as appropriate.

More general networks are then studied where the only objective at the destination is to minimize the end-to-end distortion in collecting channel estimates for all inter-node channels. Two particular protocol classes, amplify-and-forward and encode-and-forward, are analyzed in this study. First, we show that asymptotically in SNR, amplify-and-forward can outperform encode-and-forward and in fact can achieve the maximum possible distortion diversity order of unity. Second, we compare two topologies, a linear network and a tree network operating with orthogonal access, and conclude that fewer hops are beneficial for better end-to-end distortion performance. We derive lower and upper bounds on distortion for both protocols and both networks, which can be used to optimize finite-SNR performance.

Finally, we study the problem of determining the optimal location of a mobile relay node in a sensor network to optimize some end-to-end metric. Three such metrics are considered: distortion, delay and energy. For each of these cases, when we want to optimize the end-to-end metric in the estimation of a certain phenomenon, the problem is reduced to previously solved problems, and the solutions are applied to gain intuition about the resulting designs. We also study problems with multiple cost constraints, such as minimizing end-to-end distortion while imposing an upper bound on the delay across the network.
It is observed that the mobility of the relay node gives us additional degrees of freedom to optimize over, while simultaneously making the optimization problem hard as the number of nodes in the network increases.

Chapter 1
Introduction

In this thesis, we propose and analyze two new problems of relevance in sensor networks. We first address the problem where the features to be sensed are the time-varying inter-node communication channels, and then address the problem of optimizing the location of a mobile relay in a network.

For the first problem, as the communication channel is integral to the communication network itself, we denote this class of sensing as intrinsic. For several key applications, intrinsic sensing is fundamental. In cognitive radio networks, the spectrum manager is required to build a "radio-map" of spectral usage [21], or the network uses an estimate of the interference topology [20]; these operations are akin to measuring the inter-node channels. Diverse underwater applications, from active sonar [69] to tsunami detection [40] using underwater acoustic communication networks, also require information about the communication channel, which enables inferences about detected objects (sonar) or about local changes in channel conditions which can predict large-scale changes (seismic).

For the second problem, we optimize the location of a mobile relay node in a sensor network when the metric to be optimized is one (or more) of the end-to-end distortion, the delay in the network and the total energy consumed in the network. We also address the problem from the viewpoint of underwater channels, where the decay profile is significantly different compared to over-the-air channels. While this optimization problem is straightforward to set up, it is observed that it becomes non-convex in certain situations, thereby making a general solution hard to come by.

Our contributions are sixfold. First, we study two particular channels and show that joint communication and estimation outperforms training-based schemes for most networks. Second, we formulate and solve the one-hop joint communication and channel estimation problem. Third, we analyze and compare two classes of protocols for intrinsic sensing in multi-hop networks. Fourth, we propose the capacity-distortion problem and analyze achievable rates for two particular signal models. Fifth, we mathematically evaluate the capacity of the proposed problem for a general discrete memoryless channel (DMC). Sixth, we study the problem of the optimal location of a mobile relay in a sensor network to optimize a given cost function.

Note that our current work differs from extrinsic sensor network problems such as [13,39,55], where the parameters of interest are external to the network. In those prior works, the resource of contention between sensing and communication is typically power; in our scenario, time (or bandwidth) is the resource of contention.

[Figure 1.1: Two particular network topologies considered. The first is a multi-hop linear network while the second is the node topology for the Many-to-One Relay problem.]

To appreciate the trade-offs relevant to multihop intrinsic sensing, consider the linear topology of Figure 1.1, where there are $n+1$ nodes and $n$ inter-node channel states. Thus a relaying node, $i$, must provide signaling to facilitate the sensing of channel $i$ and must simultaneously forward "estimates" of all prior channel states $(1, 2, \dots, i-1)$.
Resources are thus balanced between the estimation of channels and the communication of channel estimates. At first glance, this appears to be the classical problem of sharing resources between training and communication to maximize capacity. However, in contrast to [28], which explicitly treats the training issue, or [67], which implicitly treats channel estimation through the examination of non-coherent communication, where the objective is the reliable transmission of bits, our goal is to optimize the end-to-end distortion. Furthermore, our metric is not a per-hop metric.

Data aggregation and forwarding is integral to many sensor network problems (e.g. [17]); however, again, we underscore that these prior works focused on extrinsic parameters. Thus channels are used for communication alone and are not themselves of interest. A work of more direct connection to our current problem is the single-hop problem of [54], which examines the communication of channel state along with communication to the destination. However, in sharp contrast to the current problem framework, [54] presumes channel state information at the transmitter. Furthermore, we explicitly examine multi-hop networks. Finally, we observe that there has been significant work on cooperative communication for sensor networks (see e.g. [25,34]); however, this work is focused on two-hop networks and of course ignores the impact of sensing altogether.

We first propose two classes of protocols, denoted encode-and-forward and amplify-and-forward, and compare their end-to-end mean-squared error distortion. (These protocols are inspired by well-known protocols for data forwarding in relay channels [9,34].) Additionally, we examine two different topologies as building blocks for more complex and practical networks: the linear topology previously discussed and a two-hop tree (many-to-one) topology, also depicted in Figure 1.1. We then move on to formulating the joint communication and estimation problem and present achievable rates for two particular channel models. The intuition gleaned from these particular examples helps us formulate and fully characterize the trade-off between communication and estimation for a single-hop network when the channel is a DMC.
Further generalizations become complicated and we only present bounds for more complicated networks. These results hold in a very general setting, and the extension of these results to multiple hops is still open. Our contributions are summarized as follows. We first define a measure, asymptotic in SNR, called the end-to-end distortion diversity, and straightforwardly show that the largest distortion diversity possible is unity. Upper and lower bounds on the end-to-end distortion are provided for the encode-and-forward and amplify-and-forward protocols for the linear network and extended to the many-to-one network. The encode-and-forward results judiciously exploit results from the classic work [63] on noisy communication over noisy channels. The amplify-and-forward protocol converts the problem to one of pure estimation, and thus variants of the Cramer-Rao bounds on estimation variance [7] are relevant.

The single-hop network is the first network that needs to be analyzed to derive an analytical trade-off function for intrinsic sensor networks. Towards this end, we first propose achievable schemes for two particular signal models for the single-hop network. Using the intuition from these results, we present a characterization of the trade-off function for an arbitrary discrete memoryless system. These results are then generalized to the MAC and a two-hop relay network.

For a general linear network, we show that for amplify-and-forward, a non-trivial upper bound on the distortion diversity (which is defined in a later chapter) is unity, and we provide an achievable channel estimation scheme which numerically achieves distortion diversity of one. Thus, for high SNR, amplify-and-forward can achieve the optimal distortion diversity. This result is independent of how resources are allocated to the different nodes (assuming some time allocation). Of course, time allocation is relevant for performance at finite SNR. For encode-and-forward, it can be shown that the distortion diversity is strictly less than one for a network with more than one hop. The performance advantage of amplify-and-forward can be attributed to the fact that soft information is preserved. In contrast, encode-and-forward suffers from error propagation, since estimates are formed at each hop.

However, optimal asymptotic performance is not a guarantee of optimal finite-SNR performance. Thus, for moderate SNR, encode-and-forward can outperform amplify-and-forward with respect to end-to-end distortion.

For many-to-one networks, we determine an optimal first-hop protocol which minimizes the sum distortion at the relay. The resultant protocol is an orthogonal time-division multiplexing scheme and is applicable to both protocols. Given the orthogonality of the first hop, the encode-and-forward many-to-one protocol yields performance equivalent to that of multiple two-hop networks in the case where all channels have the same quality (SNR). Numerical results bolster the intuition that, given a set of nodes, the overlaid network topology should be as hierarchical as possible: too many hops increase distortion. Given the orthogonal communication schemes, our results suggest resource distribution policies for networks which employ collision avoidance, as is often suggested for dense, large-scale networks. In orthogonal access networks, we show that, when employing the encode-and-forward protocol, the first hop should get the least amount of time, since it conveys the least amount of information, and subsequent hops should be allocated more time.

Finally, while we introduce the intrinsic sensing problem, we note that some of the results presented here are similar to classical results for similar networks with either pure communication or extrinsic estimation. Consider the linear network. Cutset bounds [11] for linear AWGN networks show that the worst-case link determines the end-to-end capacity; this echoes what we see in the diversity analysis for the encode-and-forward case (Corollary 3). However, in sharp contrast, the diversity analysis for amplify-and-forward (Corollary 5) shows no loss with respect to distortion diversity. So, fundamentally, linear network performance is not limited by the worst-case link for intrinsic sensor networks. Similarly, for many-to-one networks, if we employ the encode-and-forward protocol, the distortion diversity is bounded by the least time allocated to the links that communicate a particular set of estimates (Corollary 7), which is equivalent to the cutset bounds for the corresponding communication network. However, again, since the amplify-and-forward protocol can achieve optimal performance (Corollary 6), the fundamental performance of the many-to-one network is again quite different compared to the corresponding communication network.
Once we ignore the measurement noise terms in the MSE expressions derived in [55] for extrinsic sensing, we note that the distortion diversity for both the linear and the many-to-one topology is unity (for both the amplify-and-forward case and the estimate-and-forward protocol). It is then straightforward to see that intrinsic sensing gives up a certain fraction of this optimal distortion diversity (for the encode-and-forward protocol) to sustain inter-node communication. Thus, there seems to be a trade-off between the amount of inter-node communication that needs to be sustained and the best distortion diversity that can be achieved for a given network in the intrinsic sensing case.

For the problem of solving for the optimal location of a mobile relay node in a sensor network, we take advantage of existing results, such as [63], [43] and [41], to obtain simple expressions for the end-to-end distortion in the sensor network for a particular placement of the relay node. We then optimize this expression over all locations of the relay node to obtain the best possible location. These schemes are then generalized to problems where we consider the end-to-end delay and the total energy consumed in the network as our cost functions. Finally, we address the situation where more than one cost constraint is active and present a numerical way to solve this optimization problem.

The remainder of this thesis is organized as follows: Chapter 2 presents two example channel models where joint communication and channel estimation provides an advantage over training-based schemes. Chapter 3 presents an information-theoretic treatment of the capacity-distortion function, first for the one-hop network, and then generalizes the results to the MAC and the two-hop relay channel. Chapter 4 then addresses the problem of pure channel estimation over a linear and a many-to-one network and presents an analysis of the performance of two particular communication schemes between the nodes. Chapter 5 presents the optimization problem of solving for the optimal location of a mobile relay node in a sensor network with varied cost constraints, and finally Chapter 6 concludes the thesis.

Chapter 2
The Single Hop Problem: Two Particular Examples

In this chapter, we consider the problem of joint channel estimation and data transmission over a channel with a time-varying channel state in a single-hop network. The objective is for the receiver to determine both the information transmitted from the source and the channel over which the information was transmitted. It is assumed that the transmitter has no channel state information. While we examine a fairly circumscribed problem herein, our formulation is motivated by several real-world applications such as tsunami detection with acoustic signaling in an underwater communications network [50], cognitive radio [15], and wireless network topology determination [39]. Whereas we consider a single point-to-point link, of future interest is the consideration of multihop links; see [59]. In that previous work, we considered disjoint communication and sensing. In contrast, here we consider joint sensing and communication. A key distinct feature of our problem formulation is that the objective is both an estimation and a communication problem.

While it is clear that joint estimation and communication should yield improved performance and/or rate over a disjoint scheme, disjoint (or orthogonal) schemes are often considered due to the prohibitive complexity of the joint problem.
A common strategy based on training sequences known at the receiver is often employed. Information-theoretic evaluation of such an approach is examined in [29,35,37] (for example). This prior work differs from our formulation in that channel estimation is employed only to increase the achievable rate. Thus, the quality of the channel estimate is not traded off against the rate of information as we do here. We also note that there are other, purely information-theoretic approaches to communication over channels with unknown states in which no explicit channel estimation is involved [30,38].

The interplay between estimation (minimum mean-squared error, MMSE, in particular) and information has been previously examined; however, there estimation is mainly an approach to information transmission rather than a separate goal. To this end, [19] establishes the optimality of lattice source codes coupled with MMSE estimation of the source at the receiver. The work [27] examines the relationship between mutual information and MMSE. Again, neither problem formulation considers an unknown channel and its estimation at the receiver. The problem formulation in [10,54] bears some similarity to the one we consider in that the receiver is interested in both communication and channel estimation. However, it differs from our work in a crucial way: the memoryless channel state is assumed known at the transmitter in a non-causal way.

In this chapter, we consider the case where the channel state is unknown (except for its distribution) to both the transmitter and the receiver, for the binary symmetric channel with a Hamming distortion and an additive channel model, as well as for Gaussian fading channels, with a multiplicative channel and additive noise coupled to the MSE distortion metric. For both scenarios, we determine achievable rates which are compared to the achievable rate-distortion curves for orthogonal, training-based systems. We show that for sufficiently large coherence blocks, our developed bounds clearly demonstrate the superiority of the joint channel estimation and communication schemes over orthogonal, training-based schemes.

2.1 Signal Model and Problem Description

We assume that the channel is ergodic and block-fading with block size $T_c$. The transmitter and receiver are both assumed to know where each channel block starts, but not the channel state itself.

[Figure 2.1: The Generalized Signal Model.]

The transmitter encodes one of $M$ possible messages into one of $M$ sequences of blocklength $T_c$ chosen from $\mathcal{X}^{T_c}$, where $\mathcal{X}$ is the channel input alphabet and $\|\cdot\|$ is a norm on this space. The codeword $x$ is transmitted through a channel with output alphabet $\mathcal{Y}$ and noise alphabet $\mathcal{N}$. The channel has a state $s$, selected from the space $\mathcal{S}$, which parametrizes the transition probability $P(y|x,s)$. The form of $P(\cdot|\cdot)$ is assumed to be known at both the transmitter and the receiver. The received distorted codeword $y$ is processed at the destination to yield estimates of $s$ (denoted $\hat{s}$) and $x$ (denoted $\hat{x}$), given distortion measures on $\mathcal{S}$ and $\mathcal{X}$ which we denote $d_s(s,\hat{s})$ and $d_x(x,\hat{x})$. We now define the capacity-distortion function:

$$C_A(D) = \max_{p_x:\ \mathbb{E}[d_s(s,\hat{s})] \le D,\ \mathbb{E}\|x\|^2 \le P} I(X;Y) \qquad (2.1)$$

where $C_A(D)$ is the maximum reliable rate of communication that can be sustained by the channel while satisfying the distortion bound $D$. Note that we are interested in communicating over an unknown channel at a reliable rate $C_A(D)$ and, in parallel, estimating the channel to within a distortion constraint $D$.
A $(c,d)$ pair is said to be achievable if $c \le C_A(d)$ for $C_A(D)$ as defined in Equation (2.1). To distinguish between the capacity-distortion function defined here and the classical rate-distortion function, let us recall the definition of the classical rate-distortion function:

$$R^{(I)}(D) = \min_{p(\hat{x}|x):\ \sum_{(x,\hat{x})} p(x)\,p(\hat{x}|x)\,d(x,\hat{x}) \le D} I(X;\hat{X}) \qquad (2.2)$$

While Equation (2.1) looks like the classical rate-distortion function (Equation (2.2)), these two quantities are inherently different. While the rate-distortion curve characterizes the description rate of a given source needed to meet a distortion constraint, the problem proposed here characterizes the trade-off between the rate of reliable pure information transmission and the distortion of the channel state estimate at the receiver of the channel. In a sense, the problem considered here quantifies a trade-off between an information-theoretic quantity (rate) and an estimation-theoretic one (channel state distortion at the receiver). This trade-off is also characterized in [54] for the channel $y = x + s + n$, where $n$ is a Gaussian noise term, under the assumption of perfect channel state information at the transmitter (CSIT); our formulation does not require CSIT.

In the sequel, we provide several results for sufficiently large $T_c$, yielding results within an $\epsilon$ of a key quantity. Thus we ignore such $\epsilon$ terms under the assumption that the coherence time of the channel is large. We will use the notation $\doteq$ to indicate such an equality; it can be interpreted as saying that the "order" of growth of the two terms is the same for large $T_c$. Mathematically, $A \doteq B \iff |A - B| = o(T_c)$. That is to say that as $T_c$ grows large, the formulas we present grow closer and closer to the actual values. We employ this scheme to highlight the trade-off between communication and estimation even at relatively large $T_c$.

2.2 BSC with Additive Channel State and Hamming Distortion

Consider the binary channel represented by the input-output relation:

$$y = x \oplus s \oplus n \qquad (2.3)$$

[Figure 2.2: The signal model for the binary case.]

where the output of the channel, $y$, the input to the channel, $x$, the channel state, $s$, and the noise, $n$, are all binary (i.e. $\mathcal{X} = \mathcal{S} = \mathcal{N} = \mathcal{Y} = \{0,1\}$) and $\oplus$ represents binary addition. Further, the distortion metric is the standard Hamming distortion metric:

$$d_x(a,b) = d_s(a,b) = \mathrm{wt}(a \oplus b) \qquad (2.4)$$

where $\mathrm{wt}(a)$ is the number of non-zero terms in $a$. This signal model is also presented in Figure 2.2 for ease of understanding.

As stated in Section 2.1, the channel state, $s$, is assumed to be fixed for blocks of $T_c$ symbol durations and then to change to another independent value (this is the block-fading assumption made frequently in the literature). For the rest of this chapter, we shall also assume that $s$ and $n$ are independent and that information about neither is available at the receiver or the transmitter. Further, we also assume that $s$ is ergodic in time.

For a binary random variable $b$, we use the notation $p_b = P[b = 1]$. We will also use the notation $p \otimes q := p(1-q) + q(1-p)$, the "convolution" of $p$ and $q$. Note that if $P[A = 1] = p$ and $P[B = 1] = q$, then $P[A \oplus B = 1] = p \otimes q$. Finally, we define $H_2(p) := -p\log p - (1-p)\log(1-p)$, the binary entropy of a source with probability $p$, and $K_2(p) = \mathrm{KL}(p\,\|\,1-p) := (1-2p)\ln\frac{1-p}{p}$, the Kullback-Leibler divergence between a source and its negation. All these formulas hold for any probabilities $p, q \in [0,1]$.

Theorem 1 (An Achievable Rate for the Binary Case).
Given a sufficiently large coherence time $T_c$ (in symbol durations), $p_s$, the probability that the channel introduces an error, and $p_n$, the probability that $n$ is 1, an achievable rate $C_A$ as a function of the Hamming distortion $D$ in the estimation of $s$ at the destination (over the interval $T_c$) is given in parametric form (using the parameter $p_x$) as:

$$D(p_x) \doteq \exp\{-T_c K_2(p_n \otimes p_x)\} \qquad (2.5)$$
$$C_A(p_x) = H_2(p_x \otimes p_s \otimes p_n) - H_2(p_n \otimes p_s) \qquad (2.6)$$

Proof. Over blocks of size $T_c$ on which $s$ remains fixed, forming an estimate of $s$ corresponds to detecting whether $s = 0$ or $s = 1$. In view of Equation (2.3), we can thus form the following hypothesis testing problem:

$$H_0:\ s = 0,\quad y = x \oplus n$$
$$H_1:\ s = 1,\quad y = \bar{x} \oplus n$$

where all the vectors are of size $T_c$ and $\bar{x} = 1 \oplus x$ is the negation of $x$. Then, using the notation defined in the theorem statement, $p_y = P[x \oplus n = 1] = p_x \otimes p_n$. The two pmfs governing the scalar observations underlying each hypothesis are thus $H_0: p_y = p_x \otimes p_n$ and $H_1: p_y = 1 - p_x \otimes p_n$.

This problem is addressed in [12], among other places, where it is established that for sufficiently large vector sizes (i.e. sufficiently large $T_c$) the probability of error in forming an optimal estimate of $s$ is given by:

$$P_e(s) \doteq \exp(-T_c K_2(p_n \otimes p_x)) \qquad (2.7)$$

Given the Hamming distortion metric, the distortion in the estimation of $s$ at the receiver is the same as the probability of error in its detection, and hence $D = P_e(s)$.

Now, given $p_x = P[x = 1]$, we can encode $m$ to have this distribution as long as $H(m) \le H_2(p_x)$. Further:

$$I(X;Y) \stackrel{\Delta}{=} H(Y) - H(Y|X) \qquad (2.8)$$
$$\stackrel{(a)}{=} H(Y) - H(Y \oplus X \mid X) \qquad (2.9)$$
$$\stackrel{(b)}{=} H_2(p_x \otimes p_s \otimes p_n) - H(S \oplus N \mid X) \qquad (2.10)$$
$$\stackrel{(c)}{=} H_2(p_x \otimes p_s \otimes p_n) - H(S \oplus N) \qquad (2.11)$$
$$\stackrel{(d)}{=} H_2(p_x \otimes p_s \otimes p_n) - H_2(p_s \otimes p_n) \qquad (2.12)$$

where
- (a) follows by the definition of conditional entropy,
- (b) holds by the identity $s \oplus n = x \oplus y$ and since $P[Y = 1] = p_x \otimes p_s \otimes p_n$,
- (c) follows since $S, N$ are independent of $X$,
- (d) holds since $P[S \oplus N = 1] = p_s \otimes p_n$ by the definition of $\otimes$.

Observe that to achieve the rate presented in Equation (2.12), we have to code over coherence blocks and exploit the ergodic nature of $s$.

Note that when $p_n$ or $p_x$ is 0.5, the distortion in $s$ is always 1. This is because we can no longer distinguish between $x \oplus n$ and $\bar{x} \oplus n$, since they both have the same statistics, $P(x \oplus n = 1) = 0.5 = P(\bar{x} \oplus n = 1)$, and so using these statistics to distinguish between $s = 0$ and $s = 1$ yields no benefit. Further, when $p_s = 0.5$, our achievable rate is 0. This is because we ignore the block structure of $s$ in the derivation of the bound.

The following corollary establishes a similar result for the case when we restrict ourselves to estimating $s$ and communicating over non-overlapping time intervals.

Corollary 1. Given a large enough coherence time $T_c$, the noise probability $p_n$, and a certain distortion target $D$ for $s$, we need to allocate at least

$$k \doteq \frac{-\ln D}{K_2(p_n)} \qquad (2.13)$$

symbols to training to attain this target. Further, the achievable rate for this distortion target is then given by

$$C_A = \frac{T_c - k}{T_c}\,\bigl(1 - H_2(p_n \otimes p_s)\bigr) \qquad (2.14)$$

Proof. We first observe that when we send training over the channel, we can assume that the transmitted sequence is known at the receiver and can set $p_x = 0$; hence, we can solve for the number of symbols over which training needs to be carried out using Equation (2.5), which yields Equation (2.13). The rest of the time can be used exclusively for communication, and this leads to the rate expression in the corollary once we set $p_x = 0.5$.
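The parametric curve of Theorem 1 and the time-orthogonal baseline of Corollary 1 are straightforward to evaluate numerically. The following Python sketch (our illustration, not code from the thesis; all function names are our own) compares the two for the parameters quoted for Figure 2.5, sweeping $p_x$ for the joint scheme and rounding the training interval of Equation (2.13) up to an integer number of symbols, which produces the "stepped" behavior discussed later in the chapter.

```python
import numpy as np

def H2(p):
    """Binary entropy in bits; endpoints clipped so H2(0) = H2(1) = 0."""
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

def K2(p):
    """KL divergence between a Bernoulli(p) source and its negation, in nats."""
    return (1 - 2 * p) * np.log((1 - p) / p)

def conv(p, q):
    """Binary 'convolution' p (x) q = p(1-q) + q(1-p)."""
    return p * (1 - q) + q * (1 - p)

def joint_curve(Tc, ps, pn, n_points=500):
    """Parametric (D, C_A) pairs from Eqs. (2.5)-(2.6), swept over p_x."""
    px = np.linspace(1e-3, 0.5, n_points)
    D = np.exp(-Tc * K2(conv(pn, px)))                    # Eq. (2.5)
    CA = H2(conv(conv(px, ps), pn)) - H2(conv(pn, ps))    # Eq. (2.6)
    return D, CA

def training_rate(Tc, ps, pn, D_target):
    """Rate of the time-orthogonal scheme of Corollary 1 for a target D."""
    k = np.ceil(-np.log(D_target) / K2(pn))   # training symbols, Eq. (2.13)
    k = min(k, Tc)                            # cannot train longer than a block
    return (Tc - k) / Tc * (1 - H2(conv(pn, ps)))   # Eq. (2.14)

if __name__ == "__main__":
    Tc, ps, pn = 10, 0.05, 0.2   # parameters used for Figure 2.5
    D, CA = joint_curve(Tc, ps, pn)
    for d in (0.1, 0.3, 0.5):
        c_joint = CA[D <= d].max() if np.any(D <= d) else 0.0
        print(f"D={d}: joint {c_joint:.3f} vs training "
              f"{training_rate(Tc, ps, pn, d):.3f} bits/use")
```

Running this reproduces the qualitative behavior of Figure 2.5: the joint curve dominates the training curve over most distortion targets, with the gap widening as the distortion tolerance grows.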
[Figure 2.3: The signal model for the fading channel.]

2.3 Fading Gaussian Channel with the MSE Distortion Metric

The signal model for this scenario is:

$$y = sx + n \qquad (2.15)$$

where now $y$, $x$ and $n$ are real, i.e. $\mathcal{N} = \mathcal{X} = \mathcal{Y} = \mathbb{R}$; we restrict the channel state to be positive, $\mathcal{S} = \mathbb{R}^+$. This can happen, for example, when the receiver knows the channel phase perfectly; we then assume that this phase is zero, leading to Equation (2.15). Furthermore, the norm defining the distortion is the standard $L_2$ norm, yielding the mean-squared error distortion metric. Optimal estimation is achieved by the conditional mean estimator (see for example [8]), which we will bound by a simpler structure. We assume that $n$ is a mean-zero Gaussian random variable with known variance $\sigma_n^2$. Again, $s$ is assumed to remain fixed over blocks of $T_c$ symbols, changing independently between blocks.

We require the following lemma on order statistics.

Lemma 1 (Correlated Order Statistics). (From [4] or [14, pp. 311]) Given $G$ equicorrelated, identically distributed Gaussian random variables $\sim \mathcal{N}(0,1)$ with cross-correlation $\rho$, the largest of these variables is $\sim \mathcal{N}\bigl(\sqrt{2(1-\rho)\ln G},\ \rho\bigr)$ for large $G$.

Note that Lemma 1 holds only for $G$ large enough that the other terms in the resultant distribution can be neglected.

Theorem 2 (An Achievable Rate for the Fading Channel). Given a sufficiently large coherence time $T_c$ (as measured in symbol durations), an achievable $(C_A, D)$ region is given by:

$$D \doteq \sqrt{\frac{3(1-\mathbb{E}[p_e(s)])}{\mathrm{SNR}^2 T_c^2}} + \sqrt{\mathbb{E}\left[p_e(s)\left(A^4 + \frac{6cA^2}{\mathrm{SNR}\,T_c} + \frac{3c^2}{\mathrm{SNR}^2 T_c^2}\right)\right]} \qquad (2.16)$$

$$C_A = \log\frac{M}{M-1} + \mathbb{E}[1-p_e(s)]\log(M-1) - H_2(\mathbb{E}[p_e(s)]) \qquad (2.17)$$

where the expectations are with respect to the distribution of $s$, $p_e(s) \doteq Q\bigl(s\sqrt{(1-c)T_c\,\mathrm{SNR}} - \sqrt{2\ln(M-1)}\bigr)$, $A = s(1-c) - \sqrt{\frac{2(1-c)\ln(M-1)}{\mathrm{SNR}\,T_c}}$, $M$ is the largest number of real sequences of length $T_c$ with cross-correlations $c$, and $\mathrm{SNR} = P/\sigma_n^2$.

Proof. We summarize the major steps of the communication process.

Modulation: We generate $M$ real sequences $\{x_1, x_2, \dots, x_M\}$, each of length $T_c$, such that:

$$\|x_i\|^2 = PT_c \qquad (2.18)$$
$$|x_i^T x_j| = cPT_c \quad \forall\, i \ne j,\ 1 \le i,j \le M \qquad (2.19)$$

where $P$ is the power constraint for the problem. Given a message $m$, uniformly distributed over $\mathcal{M} = \{1,2,3,\dots,M\}$, we transmit $x_m$ over the next $T_c$ periods of time.

Channel Transmission: The encoder transmits $x_i$ when it receives message $i$ ($1 \le i \le M$). Conditioned on message $i$ being transmitted, the receiver then receives $sx_i + n$, where $n$ is a column vector of all the noise symbols over a block of length $T_c$.

Demodulation: At the decoder, we receive $y = sx + n$. The decoder estimates the message by

$$\hat{m} = \arg\max_{i=1,\dots,M} \tilde{z}_i \quad \text{where}\ \tilde{z}_i = y^T x_i \qquad (2.20)$$

With $\hat{m}$ in hand, $s$ is estimated from the correlator output (see Equation (2.31)). We next treat each operation of the decoder one by one to establish the theorem.

Detection: Recall the detector structure in Equation (2.20) above. Without loss of generality, we assume that $m = 1$ (recall that $m$ is uniformly distributed over $\mathcal{M}$). The correlator output distributions (given each $x_i$, and using $x_i^T x_1 = cPT_c$ for $i \ne 1$) are given by

$$z_1 \sim \mathcal{N}(sPT_c,\ \sigma_n^2 PT_c) \qquad (2.21)$$
$$z_i \sim \mathcal{N}(scPT_c,\ \sigma_n^2 PT_c), \quad i \ne 1 \qquad (2.22)$$

We are interested in the behavior of the largest correlator output. Each of the $z_i$, $i \ne 1$, is Gaussian and identically distributed with mean $scPT_c$ and variance $\sigma^2 = \sigma_n^2 PT_c$. Further, the cross-correlation between any two of these variables is ($i, j \ne 1$):

$$\mathbb{E}[(z_i - \mathbb{E}z_i)(z_j - \mathbb{E}z_j)] = c\,\sigma_n^2 PT_c \qquad (2.23)$$

To optimize the bounds formed, $M$ is taken to be the largest cardinality of a set of sequences (with elements in $\mathbb{R}$) of length $T_c$ and normalized aperiodic in-phase cross-correlation $c$.
As $T_c$ grows large, the size of $M$ grows with it for a given $c$. For large $T_c$, we can then safely assume that $M$ is large too. Note that the characterization of the rate of growth of $M$ with $T_c$ for a fixed $c$ has been a long-standing open problem, which we do not attempt to address here.

Let $z_{\backslash 1,\max} = \max_{1 < i \le M} z_i$. After appropriately normalizing all the variables involved and applying Lemma 1, we observe that $z_{\backslash 1,\max}$ is a Gaussian random variable with parameters:

$$\mathbb{E}[z_{\backslash 1,\max}] \doteq scPT_c + \sigma_n\sqrt{2(1-c)PT_c\ln(M-1)} \qquad (2.24)$$
$$\mathrm{var}(z_{\backslash 1,\max}) \doteq c\,\sigma_n^2 PT_c \qquad (2.25)$$

Now, given the statistics of $z_1$ (the same as those of $\tilde{z}_1$ presented in Equation (2.21)) and those of $z_{\backslash 1,\max}$ (Equations (2.24) and (2.25)), we can evaluate the probability that the decoder makes a wrong decision as to which $x_i$ was transmitted. This is the probability that $z_1$ is smaller than $z_{\backslash 1,\max}$:

$$p_e(s) = P[z_1 \le z_{\backslash 1,\max}] = P[z_1 - z_{\backslash 1,\max} \le 0] \qquad (2.26)$$

Let us use the notation $z = z_1 - z_{\backslash 1,\max}$ and $z_{\max} = \max\{z_1, z_{\backslash 1,\max}\}$ for convenience. Then the statistics of $z$ are given by

$$\mathbb{E}[z] \doteq s(1-c)PT_c - \sigma_n\sqrt{2(1-c)PT_c\ln(M-1)} \qquad (2.27)$$
$$\mathrm{var}(z) \doteq \sigma_n^2(1-c)PT_c \qquad (2.28)$$

We are interested in $p_e(s) = P[z \le 0]$. Given the statistics of $z$, this probability is evaluated as:

$$p_e(s) \doteq Q\bigl(s\sqrt{(1-c)T_c\,\mathrm{SNR}} - \sqrt{2\ln(M-1)}\bigr) \qquad (2.29)$$

where $\mathrm{SNR} = P/\sigma_n^2$ and $Q$ is the complementary cumulative distribution function of a standard normal random variable:

$$Q(x) = \frac{1}{\sqrt{2\pi}}\int_x^\infty e^{-\frac{t^2}{2}}\,dt \qquad (2.30)$$

Once we have $p_e(s)$, we can bound the distortion in the estimation of $s$ and the rate of reliable communication.

Bound on the distortion in $s$: With $p_e(s)$ and $z_{\max}$ from the previous discussion in hand, we can proceed as follows (we still assume that message 1 is transmitted). From the statistics of $z_1$ given in Equation (2.21), we observe that we can form a Best Linear Unbiased Estimate (BLUE) of $s$. Note that we employ an unbiased estimator to ensure that the error variance does not depend on $s$ itself (as would be the case if we employed the Linear Minimum Mean Squared Error estimator, for example). While the LMMSE estimator might attain a lower error variance, we employ the BLUE for convenience. The estimator is then given by:

$$\hat{s} = \frac{z_{\max}}{PT_c} \qquad (2.31)$$

$$D = \min_{\hat{s}} \mathbb{E}[(s-\hat{s})^2] \le \mathbb{E}\left[\left(s - \frac{z_{\max}}{PT_c}\right)^2\right] \qquad (2.32)$$

where the minimization is over all estimators $\hat{s}$ and the inequality holds since the BLUE is one particular estimator. Note that the expectation in Equation (2.32) is over both $s$ and $n$. Consider Equation (2.32). We know that $z_{\max}$ has the statistics of either $z_1$ or $z_{\backslash 1,\max}$ with the appropriate probability:

$$D \le \underbrace{\mathbb{E}\left[\left(s - \frac{z_1}{PT_c}\right)^2 \mathbf{1}_{\{z_{\max}=z_1\}}\right]}_{D_1} + \underbrace{\mathbb{E}\left[\left(s - \frac{z_{\backslash 1,\max}}{PT_c}\right)^2 \mathbf{1}_{\{z_{\max}\ne z_1\}}\right]}_{D_2} \qquad (2.33)$$

$$D_1 \le \sqrt{\frac{3\sigma_n^4\,(1-\mathbb{E}[p_e(s)])}{P^2T_c^2}} \qquad (2.34)$$

$$D_2 \le \sqrt{\mathbb{E}\left[p_e(s)\left(A^4 + \frac{6c\,\sigma_n^2}{PT_c}A^2 + \frac{3c^2\sigma_n^4}{P^2T_c^2}\right)\right]} \qquad (2.35)$$

where Equation (2.33) follows by conditioning, and Equations (2.34) and (2.35) follow by an application of the Cauchy-Schwartz inequality and by using Equations (2.21), (2.24) and (2.25), and where $A = s(1-c) - \frac{\sigma_n}{\sqrt{PT_c}}\sqrt{2(1-c)\ln(M-1)}$. Note that the averaging in Equations (2.34) and (2.35) is with respect to $s$ alone and no longer with respect to the noise variable. This averaging will depend on the statistics assumed for the fading parameter $s$ of the channel.

Achievable Rate Calculation: Since we perform demodulation at the destination before anything else, we can replace the channel by an $M$-input, $M$-output discrete memoryless channel for each block over which we communicate.
Equation (2.29) gives us the probability that $\hat{m} \ne m$, so $1 - p_e(s) = P[\hat{m} = m \mid s]$, and all the other transition probabilities must sum to $p_e(s)$ for any given $m$ and $s$. Since we do not assume any channel state information at the receiver, we cannot assume that $p_e(s)$ is known at the destination. However:

$$P[\hat{m} \ne m] = \mathbb{E}\bigl[P[\hat{m} \ne m \mid s]\bigr] = \mathbb{E}[p_e(s)] \qquad (2.36)$$

Given Equation (2.36), we can employ Fano's inequality (see [11], Section 2.11, for example) to get:

$$H(X|Y) \le H_2(\mathbb{E}[p_e(s)]) + \mathbb{E}[p_e(s)]\log(M-1) \qquad (2.37)$$

and hence,

$$C_A = I(X;Y) = H(X) - H(X|Y)$$
$$\stackrel{(a)}{\ge} H(X) - \bigl(H_2(\mathbb{E}[p_e(s)]) + \mathbb{E}[p_e(s)]\log(M-1)\bigr)$$
$$\stackrel{(b)}{=} \log M - \bigl(H_2(\mathbb{E}[p_e(s)]) + \mathbb{E}[p_e(s)]\log(M-1)\bigr)$$
$$= \log\frac{M}{M-1} + \mathbb{E}[1-p_e(s)]\log(M-1) - H_2(\mathbb{E}[p_e(s)])$$

where
- (a) follows by the inequality in (2.37),
- (b) holds since we can assume that the source symbols are uniformly distributed over all the possible input symbols.

This establishes the theorem.

Given the form of Equation (2.29), we can improve performance by either decreasing $c$ or increasing SNR. While we can increase SNR independently of all the other variables involved, it is interesting to observe that as $c$ decreases, the number of sequences that satisfy the cross-correlation constraint in Equation (2.19) decreases, and hence the number of messages that can be transmitted reliably decreases too, thereby reducing the achievable rate in Equation (2.17). Thus, there is an inherent trade-off between the best $c$ and $M$ for a given channel, depending on the distortion bound that we are required to meet.

2.4 Results

[Figure 2.4: The achievable $(C_A, D)$ region for the binary channel when $p_n$ changes from 0.05 to 0.45, $T_c$ is held fixed at 10 and $p_s$ is fixed at 0.05.]

Figure 2.4 presents the achievable rate for the binary channel using the results of Theorem 1. As $p_n$ increases, the channel gets noisier and noisier, and at $p_n = 0.5$ the achievable rate is 0 for any distortion constraint. It is enlightening to note that the achievable rate curve rises sharply from $D = 0$. This indicates that even a small tolerance to error in channel estimation can enable us to communicate at a considerable rate while still meeting the distortion constraint.

[Figure 2.5: A comparison between the achievable rate regions (for the binary case) given by joint communication and estimation as compared to allocating a fixed number of time slots to estimation and communication. ($T_c = 10$, $p_s = 0.05$ and $p_n = 0.2$.)]

Figure 2.5 presents a comparison of the achievable rates for the cases when we perform estimation and communication jointly or independently, still for the binary case. We first observe the "stepped" nature of the curve when we perform communication and estimation independently. This is because, while $p_x$ is a real number, the time allocated to either communication or estimation needs to be a natural number. It is readily observed that one can achieve a higher rate with joint communication and estimation than with the two performed individually for most distortion bounds $D$. Further, at high distortions, these gains can be quite significant. We also note that at the "corner" points our bounding scheme for the jointly achievable rate is loose and falls below the achievable rate of the scheme that estimates and communicates in separate intervals of time.
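As noted below, the expectations in Theorem 2 generally require numerical evaluation. The following Python sketch (ours, not the author's code) does this by Monte Carlo for a Rayleigh-distributed state $s$ and the simplex codebook used for Figure 2.6, assuming $M = T_c + 1$ sequences with cross-correlation magnitude $c = 1/(T_c - 1)$ as quoted in the text, and interpreting Rayleigh($\sigma^2$) as a Rayleigh law with scale $\sigma$; the rate of Equation (2.17) is reported per coherence block, since the $M$ codewords span $T_c$ symbols.

```python
import numpy as np
from scipy.stats import norm

def H2(p):
    """Binary entropy in bits."""
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

def theorem2_point(Tc, snr, sigma2, n_mc=200_000, seed=0):
    """Monte Carlo evaluation of the (C_A, D) bounds of Theorem 2.

    Assumptions (ours): s ~ Rayleigh with scale sqrt(sigma2); simplex
    codebook with M = Tc + 1 and |c| = 1/(Tc - 1).
    """
    rng = np.random.default_rng(seed)
    M = Tc + 1
    c = 1.0 / (Tc - 1)
    s = rng.rayleigh(scale=np.sqrt(sigma2), size=n_mc)

    # Per-state detection error, Eq. (2.29); norm.sf is the Q-function.
    pe = norm.sf(s * np.sqrt((1 - c) * Tc * snr) - np.sqrt(2 * np.log(M - 1)))
    Epe = pe.mean()

    # Distortion bound, Eq. (2.16): D = D1 + D2.
    A = s * (1 - c) - np.sqrt(2 * (1 - c) * np.log(M - 1) / (snr * Tc))
    D1 = np.sqrt(3 * (1 - Epe)) / (snr * Tc)
    D2 = np.sqrt(np.mean(pe * (A**4 + 6 * c * A**2 / (snr * Tc)
                               + 3 * c**2 / (snr * Tc) ** 2)))

    # Achievable rate, Eq. (2.17), in bits per coherence block.
    CA = np.log2(M / (M - 1)) + (1 - Epe) * np.log2(M - 1) - H2(Epe)
    return max(CA, 0.0), D1 + D2

if __name__ == "__main__":
    for sigma2 in (1.0, 5.0, 20.0):
        CA, D = theorem2_point(Tc=100, snr=10.0, sigma2=sigma2)
        print(f"sigma^2 = {sigma2:5.1f}:  C_A ~ {CA:5.2f} bits/block,  D ~ {D:.4f}")
```

Sweeping `snr` (or `M` and `c` for other codebooks) traces out the achievable region; the sharp rise of the rate at small but nonzero distortion mirrors the behavior seen in Figure 2.6.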
We present the bounds of Theorem 2 for a particular distribution of $s$ in Figure 2.6: $s$ is assumed to be Rayleigh($\sigma^2$) for different $\sigma$, $T_c = 100$, and we use the vertices of a simplex in $T_c$ dimensions (note that there are $T_c + 1$ such sequences). These sequences attain the simplex bound for cross-correlation ($c = \frac{1}{T_c - 1}$). Note that, in general, the expectations in the statement of Theorem 2 need to be evaluated numerically, except in certain special cases. As in the binary case, note the large achievable rate once we tolerate even a little distortion.

[Figure 2.6: The achievable $(C, D)$ region for the block fading case as $\sigma^2$ increases.]

Finally, Figure 2.7 presents a comparison between the achievable rate presented in this chapter and an upper bound on the capacity of the fading channel with the same parameters employing disjoint communication and estimation. The gains of the proposed scheme are apparent. The upper bound was obtained by exploiting the concavity of the log function. For this plot, we employ the parameter $\sigma^2 = 5$.

[Figure 2.7: The achievable $(C, D)$ region as compared to an upper bound on the disjoint estimation/communication problem.]

2.5 Conclusions

We have presented a problem linking together results from information theory and estimation theory. Two specific cases of the formulation were analyzed, and achievable rates were presented for both. The analysis yields the useful engineering insight that joint channel communication and estimation is better than the current practice of carrying the two out independently for most values of the distortion constraint. Further work includes improving the bounds we have presented, a full characterization of the capacity region for the special cases considered, and a general framework for the analysis of a joint communication and estimation problem using the schemes introduced in this chapter.

Chapter 3
The Capacity-Distortion Trade-off

While Chapter 2 proposes the capacity-distortion problem and presents achievable rates for two particular signal models, it is impossible to evaluate the (sub-)optimality of any such scheme without a result on the best possible transmission rate that can be sustained for a given signal model. We address this problem in this chapter and present a coding theorem for the capacity-distortion problem. The problem framework is similar to that of Chapter 2, except that the channel is now assumed to be a DMC and hence changes to an independent realization at every instant of time. Interestingly, this makes the problem easier to address in an information-theoretic setting. We note, however, that by considering blocks of size $T_c$ (the coherence time of the channels in the previous chapter) and by augmenting the alphabet sizes of the encoder and the decoder ($\mathcal{X}_{\mathrm{new}} = \mathcal{X}^{T_c}$ and $\mathcal{Y}_{\mathrm{new}} = \mathcal{Y}^{T_c}$), we can address those problems using the DMC framework introduced below.

3.1 Introduction

In this chapter, we consider the problem of joint communication and channel estimation over a channel with a time-varying channel state.
3.1 Introduction

In this chapter, we consider the problem of joint communication and channel estimation over a channel with a time-varying channel state. For a noisy channel with a random channel state that evolves with time, the objective is to have the receiver recover both the information transmitted from the transmitter as well as the state of the channel over which the information was transmitted, under the presumption that the channel state is not available to either the transmitter or the receiver.

Figure 3.1: The general signal model without CSIT.

The problem setting may prove relevant for situations such as environmental monitoring in sensor networks [45], underwater acoustic/sonar applications [50], and cognitive radio [48].

A distinct feature of our problem formulation is that both communication and channel estimation are required. The interplay between mutual information and estimation (minimum mean-squared error (MMSE)) has long been investigated where the focus is on estimation and thus communication of data; see, e.g., [27] and references therein. Channel estimation with the goal of facilitating information transmission, rather than as a separate goal in itself, has also been extensively studied [28,68]. In those prior works, the quality of the channel estimate is not traded off with the information rate as we consider in this work. Additionally, we examine multi-terminal scenarios, where the need to estimate multiple channel states provides further challenges. Our problem statement is in contrast to the significant amount of work which examines communication in multi-terminal scenarios such as cooperative communication as in [25,34] or optimal relay processing in linear networks as in [57]. We have previously considered a pure estimation problem with communication constraints over two particular network topologies in [58,59]; however, no messages independent of the channel states were transmitted as we do herein.

The problem formulation in [10,54] bears some similarity to the point-to-point case we consider – the receiver is interested in both communication and channel estimation. However, there is a critical distinction: in [10,54] the channel state is assumed known at the transmitter. In our formulation, neither transmitter nor receiver has a priori channel state information. Furthermore, we examine more complex networks than a single link, such as the multiple access channel and a two-hop linear network. Even the solutions for the point-to-point network in our work and in [10,54] are quite different, as we will show.

Intuitively, there exists a trade-off between a channel's capability of transferring information and its capability of exhibiting state. Information transmission is accomplished by exercising random quantities as channel inputs, thereby increasing the randomness of channel outputs and reducing the receiver's capability of estimating channel states. Channel state estimation, in turn, favors the transmission of deterministic quantities to facilitate estimation, limiting any information transmission through the channel. We quantify this fundamental tension in this work.

The main contributions of this work are summarized as follows:

• We show that the optimal trade-off can be formulated as a constrained channel coding problem, with the channel input distribution constrained by an average estimation cost (or distortion) constraint. The problem of designing the optimal encoder then reduces to selecting those codewords from the original encoder for the unconstrained problem which meet this cost constraint.
• The results from the point-to-point case are then extended to the MAC with channel state estimation at the destination, employing techniques akin to [22]. The important difference is that our distortion constraint on the channel state estimation translates to a coupled constraint for the encoders, as opposed to the independent energy constraints at each of the encoders in [22]. Thus, given a distortion constraint to meet on the channel state estimation problem, the encoders need to collaboratively optimize their distributions to maximize rate given this constraint, before the start of communication.

• Finally, we analyze the linear two-hop network where the destination is interested in estimates of the states of both the active channels. We first need to trade off the distortion in the estimation of the channel state for the second hop with the rate that we can achieve over the second link. The relay node also needs to design its transmission to expose the second channel state while communicating enough information about the first channel state for the destination to mitigate just enough of the distortion introduced over the second channel. This introduces an additional constraint relative to the point-to-point case. We provide an achievable rate and a converse for this two-hop network. These bounds coincide, leading to the capacity distortion region of the network, when the two-hop network is said to be "information lossless". The condition on the quality of information available about the first channel state at the relay node under which we have information losslessness is also presented.

The rest of this chapter is organized as follows. Section 3.2 introduces the channel model and the capacity-distortion function, and Section 3.3 formulates the equivalent constrained channel coding problem. Section 3.4 illustrates the application of the capacity-distortion function through several simple examples. Section 3.5 extends the result to the multiple access channel with channel state estimation and presents an illustrative example. Section 3.6 presents an upper bound for the linear two-hop network where the destination is interested in channel estimates for each of the links, and Section 3.7 presents some illustrative examples for this network. Finally, Section 3.8 concludes the chapter.

3.2 One Hop Signal Model

We first consider the point-to-point channel presented in Figure 3.2. We want to communicate a message $M$ selected equiprobably from $\mathcal{M} = \{1, \ldots, \lfloor 2^{nR} \rfloor\}$. This message is encoded by the encoder, generating the corresponding channel inputs $\{X^1, \ldots, X^n\}$ of length $n$, which are then transmitted over the channel. The alphabet of the encoder ($\mathcal{X}$) is determined by the channel model.

Figure 3.2: Channel model for joint communication and channel estimation.

Definition 1. An encoder is defined by a function $f_n : \mathcal{M} = \{1, \ldots, \lfloor 2^{nR} \rfloor\} \to \mathcal{X}^n$, for each $n \in \mathbb{N}$.

The channel is described by a transition function $P(y|x,s)$, which is the probability distribution of the channel output $Y$, conditioned on the channel input $X$ and the channel state $S$. Upon receiving a length-$n$ block of channel outputs, the joint decoder and estimator declares $\hat{M} \in \{1, \ldots, \lfloor 2^{nR} \rfloor\}$ as the decoded message, and a length-$n$ block of estimates of the channel state, $\hat{S}^{1:n}$. For technical purposes, in this chapter, we assume that the random channel state evolves with time in a memoryless fashion.
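The following toy simulation of the setup in Figure 3.2 is a sketch only: the binary multiplicative law $Y^i = S^i X^i$, the repetition codebook and the state prior are illustrative choices. It shows one use of the model, with the state sequence drawn i.i.d. and seen by neither encoder nor decoder.

```python
import numpy as np

rng = np.random.default_rng(1)
n, r = 8, 0.2                              # block length and P(S = 1) = r

def channel(x):
    """Memoryless state channel: S^i i.i.d., Y^i = S^i * X^i elementwise.
    The true state is returned only for inspection; the decoder never sees it."""
    s = (rng.random(n) < r).astype(int)
    return s * x, s

codebook = {0: np.zeros(n, dtype=int), 1: np.ones(n, dtype=int)}  # toy encoder
m = 1
y, s_true = channel(codebook[m])
print("sent m =", m, "| y =", y, "| true s =", s_true)
```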
We note that this model encompasses the block interference channel model, because we can treat a block as a super-symbol and thus convert a block interference channel into a memoryless channel.

Definition 2. An estimate for a variable $Z$ given $X$ (denoted $\hat{Z}(X)$) is a mapping $h : \mathcal{X} \to \mathcal{Z}$.

Definition 3. A joint decoder and estimator is defined by a pair of functions, $g_n : \mathcal{Y}^n \to \mathcal{M}$ and $h_n : \mathcal{Y}^n \to \mathcal{S}^n$, for each $n \in \mathbb{N}$.

This definition differs from that for the conventional channel decoder (e.g., [11]) in that it explicitly requires estimation of the channel state $S$ at the receiver. The quality of estimation is measured by a distortion function $d : \mathcal{S} \times \mathcal{S} \to \mathbb{R}^+ \cup \{0\}$. That is, if $\hat{S}^i$ is the $i$th element of $h_n(Y^{1:n})$, then $d(S^i, \hat{S}^i)$ denotes the distortion at time $i$, $i = 1, \ldots, n$. For technical convenience, we assume that $d(\cdot,\cdot)$ is bounded from above so that there exists a finite $T > 0$ with $d(s, s') \le T < \infty$ for any $s, s' \in \mathcal{S}$. Note that for length-$n$ block coding schemes, the average distortion is given by

\bar{d}(S^{1:n}, \hat{S}^{1:n}) = \frac{1}{n} \sum_{i=1}^{n} d(S^i, \hat{S}^i).   (3.1)

Finally, we have the following definitions.

Definition 4. A non-negative number $R(D)$ is an achievable rate if there exist a sequence of encoders and corresponding joint decoders and estimators such that (a) the average probability of decoding error

P_e^{(n)} = \frac{1}{\lfloor 2^{nR(D)} \rfloor} \sum_{m=1}^{\lfloor 2^{nR(D)} \rfloor} \Pr[\hat{M} \neq m \mid M = m]

tends to zero as $n \to \infty$; and (b) the average distortion in channel state estimation meets a constraint;

\limsup_{n\to\infty} E\,\bar{d}(S^{1:n}, \hat{S}^{1:n}) \le D.   (3.2)

Definition 5. The capacity-distortion function is defined as

C(D) = \sup_{f_n, g_n, h_n} R(D).   (3.3)

Remark: It is important to distinguish between the capacity-distortion function and the rate-distortion function in lossy source coding [11]. The capacity-distortion function is defined to characterize the fundamental trade-off between the rate of information transmission and the distortion of state estimation. In contrast, the rate-distortion function is defined with respect to a source distribution, seeking to characterize the fundamental trade-off between the rate of its lossy description and the achievable distortion due to that same description.

3.3 A One Hop Constrained Channel Coding Formulation

In this section, we show that the joint communication and channel estimation problem can be equivalently formulated as a constrained channel coding problem. For this purpose, the following minimum conditional distortion function for each possible realization of the channel input $X$ will be important:

d^*(x) = \inf_{h : \mathcal{X} \times \mathcal{Y} \to \mathcal{S}} E[d(S, h(x, Y))],   (3.4)

where the expectation is with respect to the channel state $S$ and the channel output $Y$ conditioned upon the channel input $X = x$, and $h : \mathcal{X} \times \mathcal{Y} \to \mathcal{S}$ denotes an arbitrary one-shot estimator of $S$ given the channel input and output. The following theorem establishes the constrained channel coding formulation.

Theorem 3. The capacity-distortion function for the channel model in Figure 3.2 is given by

C(D) = \sup_{P_X \in \mathcal{P}_D} I(X; Y),   (3.5)

where

\mathcal{P}_D = \left\{ P_X : \sum_{x \in \mathcal{X}} P_X(x) d^*(x) \le D \right\}.   (3.6)

Remark: Theorem 3 applies to general input/output/state alphabets. If $X$ is a continuous random variable, the summation in Equation (3.6) should be understood as an integral over $\mathcal{X}$.
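For finite alphabets, $d^*(x)$ in Equation (3.4) can be computed directly: for each output $y$, pick the estimate minimizing the posterior-weighted distortion. The sketch below does this for the scalar multiplicative channel $Y = SX$ with Hamming distortion and $P(S = 1) = r$, an example revisited in Section 3.4.2; the value of $r$ is an illustrative choice.

```python
r = 0.2                                      # P(S = 1), illustrative
S_A, X_A, Y_A = (0, 1), (0, 1), (0, 1)       # finite alphabets
p_s = {0: 1 - r, 1: r}

def p_y_xs(y, x, s):                         # channel law: Y = S * X
    return 1.0 if y == s * x else 0.0

def hamming(s, s_hat):
    return float(s != s_hat)

def d_star(x):
    """d*(x) = sum_y min_{s_hat} sum_s P(s) p(y|x,s) d(s, s_hat)."""
    return sum(min(sum(p_s[s] * p_y_xs(y, x, s) * hamming(s, sh)
                       for s in S_A)
                   for sh in S_A)
               for y in Y_A)

print({x: d_star(x) for x in X_A})           # {0: 0.2, 1: 0.0}, i.e. {0: r, 1: 0}
```

The output matches the values $d^*(0) = r$ and $d^*(1) = 0$ derived analytically in Section 3.4.2.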
In order to prove Theorem 3, we shall employ the following lemmas.

Lemma 2. For any $(f_n, g_n, h_n)$-sequence that achieves $C(D)$, as $n \to \infty$, the achieved average distortion, $\bar{d}(S^{1:n}, h_n(Y^{1:n}))$, is (in probability) equal to $\bar{d}(S^{1:n}, \hat{S}^{*,1:n})$, where

\hat{S}^{*,1:n} = h_n^*(X^{1:n}, Y^{1:n})   (3.7)

and $h_n^*(X^{1:n}, Y^{1:n})$ denotes the block-$n$ estimator that achieves the minimum average distortion conditioned upon both the block-$n$ channel inputs and outputs:

h_n^*(X^{1:n}, Y^{1:n}) = \arg\min_{h : \mathcal{X}^n \times \mathcal{Y}^n \to \mathcal{S}^n} \frac{d(S^{1:n}, h(X^{1:n}, Y^{1:n}))}{n}.   (3.8)

Proof: For each $n$, let us replace the estimator $h_n$ in Equation (3.4) by $h_n^*$ in Equation (3.7), with its first argument being the channel inputs $\hat{X}^{1:n}$ corresponding to the decoded message $\hat{M}$. When $\hat{M} = M$, the minimum average distortion is achieved by $h_n^*$; when $\hat{M} \neq M$, the increment in the average distortion due to replacing $h_n$ by $h_n^*$ is bounded from above because $d(\cdot,\cdot) \le T < \infty$. By Definitions 4 and 5, as $n \to \infty$, the average probability of decoding error $P_e^{(n)} \to 0$. Hence as $n \to \infty$, the minimum average distortion is achieved by $h_n^*(\hat{X}^{1:n}, Y^{1:n})$, which is further equal to the term in Equation (3.7), in probability, and hence the lemma is established.

Lemma 2 shows that the joint decoder and estimator can utilize the reliably decoded channel inputs for channel state estimation. The next lemma further shows that the length-$n$ block estimator can be decomposed into $n$ one-shot estimators, one for each channel use.

Lemma 3. For any $(f_n, g_n, h_n)$-sequence that achieves $C(D)$, as $n \to \infty$, the achieved average distortion in Inequality (3.2) is (in probability) equal to that achieved by

\hat{S}^{*,i} = h^*(X^i, Y^i), \quad i = 1, \ldots, n,   (3.9)

where $h^*(X^i, Y^i)$ denotes the one-shot estimator that achieves the minimum expected distortion for $S^i$ conditioned upon both the channel input $X^i$ and output $Y^i$.

Proof: From Lemma 2, as $n \to \infty$, $h_n(Y^{1:n})$ is in probability equivalent to $h_n^*(X^{1:n}, Y^{1:n})$. The decomposition in Equation (3.9) follows because the channel is memoryless. For each fixed $n$, we have

P(S^{1:n} | X^{1:n}, Y^{1:n}) = \frac{P(X^{1:n}, Y^{1:n}, S^{1:n})}{P(X^{1:n}, Y^{1:n})}   (3.10)
\overset{(a)}{=} \frac{P(Y^{1:n} | X^{1:n}, S^{1:n}) P(X^{1:n}) P(S^{1:n})}{\sum_{S^{1:n}} P(Y^{1:n} | X^{1:n}, S^{1:n}) P(S^{1:n}) P(X^{1:n})}   (3.11)
\overset{(b)}{=} \frac{\prod_{i=1}^{n} P(Y^i | X^i, S^i) P(S^i)}{\prod_{i=1}^{n} \left( \sum_{S^i} P(Y^i | X^i, S^i) P(S^i) \right)}   (3.12)
\overset{(c)}{=} \prod_{i=1}^{n} \frac{P(Y^i | X^i, S^i) P(S^i) P(X^i)}{P(Y^i | X^i) P_X(X^i)}   (3.13)
= \prod_{i=1}^{n} \frac{P(S^i, X^i, Y^i)}{P(Y^i | X^i) P_X(X^i)} = \prod_{i=1}^{n} P(S^i | X^i, Y^i),   (3.14)

where,
(a) holds by the definition of conditional probabilities,
(b) holds once we cancel $P(X^{1:n})$ and by the memoryless nature of the channel and the i.i.d. nature of $S$,
(c) holds by the definition of conditional probabilities once we multiply numerator and denominator by $\prod_{i=1}^{n} P(X^i)$.

As we take $n \to \infty$, the lemma is established.

Proof of Theorem 3: From Lemma 3, we can rewrite the average distortion constraint in Inequality (3.2) as

\limsup_{n\to\infty} \frac{1}{n} \sum_{i=1}^{n} E\,d(S^i, \hat{S}^i) \le D   (3.15)
\Rightarrow \limsup_{n\to\infty} \frac{1}{n} \sum_{i=1}^{n} E\,d(S^i, h^*(X^i, Y^i)) \le D.   (3.16)

Utilizing Equation (3.4) and the fact that the channel is memoryless, we can further deduce from Inequality (3.16) that

E\,d^*(X) \le D,   (3.17)

in the limit $n \to \infty$. So now the constraints in Definition 4 reduce to having $P_e^{(n)} \to 0$ as $n \to \infty$, subject to the constraint of Inequality (3.17). This is exactly the problem of channel coding with a cost constraint on the input distribution, and Theorem 3 directly follows from standard proofs for channel capacity; see [23] for example.
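A minimal numerical rendering of Theorem 3 for a binary-input channel: maximize $I(X;Y)$ over $p = P(X = 1)$ subject to $(1-p)d^*(0) + p\,d^*(1) \le D$. The sketch again uses $Y = SX$ with $P(S = 1) = r$ (so $d^*(0) = r$ and $d^*(1) = 0$, as computed above); the grid search is for illustration only, since the problem is a concave program in $P_X$, and rates are in bits.

```python
import numpy as np

r = 0.2                                        # P(S = 1), illustrative
d_star = {0: r, 1: 0.0}                        # estimation costs for Y = S*X

def I_xy(p):
    """I(X;Y) in bits, computed from the induced joint pmf of (X, Y)."""
    pxy = {(0, 0): 1 - p, (1, 0): p * (1 - r), (1, 1): p * r}
    px = {0: 1 - p, 1: p}
    py = {0: 1 - p * r, 1: p * r}
    return sum(q * np.log2(q / (px[x] * py[y]))
               for (x, y), q in pxy.items() if q > 0)

def capacity_distortion(D, grid=20_001):
    """sup I(X;Y) over p with average estimation cost at most D."""
    best = 0.0
    for p in np.linspace(0.0, 1.0, grid):
        if (1 - p) * d_star[0] + p * d_star[1] <= D + 1e-12:
            best = max(best, I_xy(p))
    return best

for D in (0.0, 0.05, 0.1, 0.2):
    print(f"D = {D:4.2f}  ->  C(D) = {capacity_distortion(D):.4f} bits")
```

Note that at $D = 0$ the only feasible input is $p = 1$, giving $C(0) = 0$, in agreement with the scalar-channel behavior derived in Section 3.4.2.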
Discussion: (1) The proof of Theorem 3 suggests the joint decoder and estimator first decode the transmitted message in a "non-coherent" fashion, then utilize the reconstructed channel inputs along with the channel outputs to estimate the channel states. As the coding block length grows large, such a two-stage procedure becomes asymptotically optimal.

(2) For each $x \in \mathcal{X}$, $d^*(x)$ quantifies its associated minimum distortion. Alternatively, $d^*(x)$ can be viewed as the "estimation cost" due to signaling with $x$. Hence the average distortion constraint in Equation (3.6) regulates the input distribution such that the signaling is estimation-efficient. We note that $d^*(x)$ is dependent on the channel through the distribution of the channel state $S$, and thus differs from other usual costs such as symbol energies or time durations.

(3) A key condition that leads to the constrained channel coding formulation is that the channel is memoryless. Due to the memoryless property, we can decompose a block estimator into multiple one-shot estimators, without loss of optimality asymptotically. If the channel state evolves with time in a correlated fashion, then such a decomposition is generally suboptimal.

3.4 Illustrative Examples for the Point-to-Point Case

In this section, we discuss several simple examples to illustrate the application of Theorem 3. We will start with simple binary examples to illustrate how the theorem is applied and then move on to other, more practical, examples.

3.4.1 Uniform Estimation Costs

A special case is that $d^*(x) = d_0$ for all $x \in \mathcal{X}$. For such types of channels, the average cost constraint in Equation (3.6) exhibits a singular behavior. If $D < d_0$, then the joint communication and channel estimation problem is infeasible; otherwise, $\mathcal{P}_D$ consists of all possible input distributions, and thus the capacity-distortion function $C(D)$ is equal to the unconstrained capacity of the channel. One of the simplest channels with uniform estimation costs is the additive channel $Y_i = X_i + S_i$, for which, once the receiver reliably decodes $M$, it can subtract off $X_i$ from $Y_i$.

3.4.2 A Binary Multiplicative Channel

Consider the following channel

Y_i = S_i X_i,   (3.18)

where $X$ and $Y$ are length-$K$ blocks, so that the super-symbols in the block memoryless channel have alphabets $\mathcal{X}^K = \mathcal{Y}^K = \{0,1\}^K$, and the multiplication is in the sense of real numbers. The channel state $S \in \mathcal{S} = \{0,1\}$ remains fixed for each block, and changes in a memoryless fashion across blocks. We adopt the Hamming distance as the distortion measure: $d(s, \hat{s}) = 1$ if and only if $\hat{s} \neq s$ and zero otherwise.

First, let us consider the $K = 1$ case, when the block multiplicative channel reduces to a scalar multiplicative channel:

Y_i = S_i X_i.   (3.19)

We can view $S$ as the status of a jamming source, a fading level, or the status of a primary transmitter. Activating $S$ to its "effective status" $S = 0$ essentially shuts down the link between $X$ and $Y$; otherwise, the link $X \to Y$ is essentially noiseless. The trade-off between communication and channel estimation is straightforward to observe from the nature of the channel: for good estimation of $S$, we want $x = 1$ as often as possible, whereas this would reduce the achieved information rate. Let us assume that $P(S = 1) = r \le 1/2$. We shall optimize $P(X = 1)$, denoted by $p \in [0,1]$. The channel mutual information is $I(X;Y) = H_2(pr) - p \cdot H_2(r)$, where $H_2(\cdot)$ denotes the binary entropy function: $H_2(t) = -t \log t - (1-t) \log(1-t)$.
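This mutual-information expression is easy to sanity-check numerically against a direct computation from the joint pmf; in the sketch below, $p$ and $r$ are arbitrary illustrative values and entropies are in bits.

```python
import numpy as np

def H(probs):
    probs = [q for q in probs if q > 0]
    return -sum(q * np.log2(q) for q in probs)

H2 = lambda t: H([t, 1 - t])

p, r = 0.3, 0.2                      # P(X=1), P(S=1): arbitrary choices
# Joint pmf over (x, y) for Y = S*X: Y = 1 only when X = 1 and S = 1.
pxy = {(0, 0): 1 - p, (1, 0): p * (1 - r), (1, 1): p * r}
I_direct = H([1 - p, p]) + H([1 - p * r, p * r]) - H(pxy.values())
print(I_direct, H2(p * r) - p * H2(r))   # the two values agree
```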
For $x = 0$, the optimal one-shot estimator is $\hat{S} = 0$ (note that $P(S = 1) = r \le 1/2$), and the resulting minimum conditional distortion is $d^*(0) = r$. For $x = 1$, the optimal one-shot estimator is $\hat{S} = Y = S$, leading to $d^*(1) = 0$. Therefore the input distribution should satisfy $(1-p)r \le D$. We find that the optimal solution is given by

If D \ge r - \left[1 + e^{H_2(r)/r}\right]^{-1}: \quad p^* = \frac{1}{r}\left[1 + e^{H_2(r)/r}\right]^{-1}, \quad C(D) = H_2(p^* r) - p^* \cdot H_2(r);
otherwise: \quad p^* = 1 - \frac{D}{r} \quad and \quad C(D) = H_2(r - D) - \left(1 - \frac{D}{r}\right) H_2(r).

From the solution, we observe the following. For relatively large $D$, the average distortion constraint is not active, and thus the optimal input distribution coincides with that for the unconstrained channel capacity. As the estimation distortion constraint $D$ falls below a threshold, the average distortion constraint becomes active, and the capacity-distortion function $C(D)$ decreases from the unconstrained channel capacity. In fact, this is what we expect all $C(D)$ curves to look like in general. For this channel, we can show from the expression of $C(D)$ that, as $D \to 0$,

C(D) = \frac{\log(1-r)}{-r} D + o(D),   (3.20)

which reflects a linear increase in capacity as we loosen the distortion requirement. However, as $D$ increases, we have diminishing returns and the relationship becomes sub-linear. Figure 3.3 depicts $C(D)$ versus $D$ for different values of $r$. We notice that the trade-off between communication rates and estimation distortions is evident.

Figure 3.3: Capacity-distortion function for the scalar multiplicative channel.

Let us now consider the case when $K \ge 2$. For such a channel, there are $2^K$ possible vectors for an input super-symbol. However, we note that all of them except the all-zero $x = 0$ are similar. This is because they all lead to the same conditional distribution for $Y$ as well as the same minimum conditional distortion $d^*(x) = 0$, $\forall x \neq 0$. So from the concavity property of channel mutual information in input distributions, the optimal input distribution should take the following form:

P_X(0) = 1 - p, \quad and \quad P_X(x) = p/(2^K - 1), \quad \forall x \neq 0.

We can find that the channel mutual information per channel use is

\frac{I(X;Y)}{K} = \frac{1}{K}\left[ H_2(pr) + p \cdot \left( r \log(2^K - 1) - H_2(r) \right) \right],   (3.21)

and that the average distortion constraint is

(1-p) r \le D,   (3.22)

the same as that in the scalar multiplicative channel case. After some manipulations, we find that the resulting optimal solution for general $K \ge 1$ is

Case 1: 2^K > 1 + (1-r)^{-1/r}: \quad p^* = 1, \quad C(D) = \frac{r \log(2^K - 1)}{K} > 0.
Case 2: 2^K \le 1 + (1-r)^{-1/r}:
  if D \ge r - \left[1 + \frac{1}{2^K - 1} e^{H_2(r)/r}\right]^{-1} \ge 0:
    p^* = \frac{1}{r}\left[1 + \frac{1}{2^K - 1} e^{H_2(r)/r}\right]^{-1},
    C(D) = \frac{1}{K}\left[ H_2(p^* r) + p^* \left( r \log(2^K - 1) - H_2(r) \right) \right];
  otherwise:
    p^* = 1 - \frac{D}{r},
    C(D) = \frac{1}{K}\left[ H_2(r - D) + \left(1 - \frac{D}{r}\right)\left( r \log(2^K - 1) - H_2(r) \right) \right].

If the channel block length $K$ is sufficiently large such that $2^K > 1 + (1-r)^{-1/r}$, the resulting $p^*$ as given by Case 2 would be greater than one, which is impossible for a valid probability. In Case 1, we have $P_X(0) = 0$, and all the nonzero symbols are selected with equal probability $1/(2^K - 1)$. In fact, Case 1 kicks in for rather small values of $K$. In our channel model we have assumed $r \in [0, 1/2]$. For $r$ smaller than 0.175, Case 1 arises for $K \ge 2$; and for $r$ larger than 0.175, Case 1 arises for $K \ge 3$.

In the scalar multiplicative channel case ($K = 1$), we have noticed that $C(D)$ linearly scales to zero as $D \to 0$; see Equation (3.20). For $K > 1$, however, we have

C(0) = \frac{r \log(2^K - 1)}{K} > 0.   (3.23)
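The two-branch solution for the scalar case ($K = 1$) can be coded up directly. One caveat on units: the threshold test and the expression $e^{H_2(r)/r}$ presuppose natural-log entropies (the maximizer $p^*$ itself is base-independent), while the rates below are reported in bits; the values of $r$ and $D$ are illustrative.

```python
import numpy as np

def H2(t, base=2.0):
    """Binary entropy in the given base; H2(0) = H2(1) = 0."""
    if t <= 0.0 or t >= 1.0:
        return 0.0
    return -(t * np.log(t) + (1 - t) * np.log(1 - t)) / np.log(base)

def C_scalar(D, r):
    """Closed-form C(D) for Y = SX, P(S=1) = r <= 1/2, Hamming distortion."""
    p_unc = (1.0 / r) / (1.0 + np.exp(H2(r, np.e) / r))   # unconstrained p*
    p = p_unc if D >= r * (1.0 - p_unc) else 1.0 - D / r  # constraint active?
    return H2(p * r) - p * H2(r)                          # rate in bits

r = 0.2
for D in (0.0, 0.05, 0.124, 0.2):
    print(f"D = {D:5.3f}  ->  C(D) = {C_scalar(D, r):.4f} bits")
```

For $r = 0.2$ the unconstrained optimum gives $p^* \approx 0.379$ and $C \approx 0.114$ bits once $D$ exceeds the threshold, matching the grid-search sketch following Theorem 3.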
For comparison, let us consider a suboptimal approach based upon training: the source transmits $X = 1$ in the first channel use in each channel block. The receiver can thus perfectly estimate the channel state $S$ and achieve $D = 0$. The encoder then can use the remaining $(K-1)$ channel uses in each channel block to encode information, and the resulting achievable rate is

R(0) = \frac{r \log(2^{K-1})}{K}.   (3.24)

Comparing $C(0)$ and $R(0)$, we notice that their ratio approaches one as $K \to \infty$, consistent with the intuition that training usually leads to negligible rate loss for channels with long coherence blocks. However, as seen in Figure 3.4, for small coherence blocks, our approach outperforms the training based approach for the particular case when $D = 0$.

Figure 3.4: The achievable rate with training and joint communication and estimation for the block multiplicative channel when $D = 0$.

3.4.3 Fading Channel with Binary Signaling

Consider the signal model

Y_i = S_i X_i + Z_i,   (3.25)

where $i$ is the time index, $\mathcal{X} = \{0,1\}$ and $\mathcal{S} = \mathcal{Z} = \mathbb{C}$. We shall assume that $S \sim \mathcal{CN}(0, \sigma_s^2)$ and $Z \sim \mathcal{CN}(0, \sigma_z^2)$, and that they are both i.i.d., circularly symmetric and independent of each other. The distortion metric employed at the destination is the standard $L_2$-norm, $d(S, \hat{S}) = \|S - \hat{S}\|^2$. If we assume that $P(X = 1) = p$ ($0 \le p \le 1$), the probability density function for $Y$ is given by

f_Y(y) = \frac{p}{\sqrt{\pi(\sigma_s^2 + \sigma_z^2)}} \exp\left(-\frac{y^2}{\sigma_s^2 + \sigma_z^2}\right) + \frac{1-p}{\sqrt{\pi \sigma_z^2}} \exp\left(-\frac{y^2}{\sigma_z^2}\right).

If $X$ were always picked to be 1, we would have $Y_i = S_i + Z_i$, and the MMSE for $S_i$ given this signal model is

D(1) = \frac{\sigma_s^2 \sigma_z^2}{\sigma_s^2 + \sigma_z^2}.

Further, we can trivially show that $D(0) = \sigma_s^2$. The distortion constraint for the estimation of $S$ at the destination node now becomes

p \frac{\sigma_s^2 \sigma_z^2}{\sigma_s^2 + \sigma_z^2} + (1-p)\sigma_s^2 \le D.   (3.26)

Since $H(Y)$ cannot be evaluated in closed form, we evaluate $H(Y)$ numerically for different $p$'s which satisfy the distortion constraint. Using the Karush-Kuhn-Tucker conditions (see [8] for example), we obtain Figure 3.5 for the case when $\sigma_z^2$ is kept fixed and $\sigma_s^2$ increases. Note that, unlike the earlier cases where the $C(D)$ curve starts from the origin, here the capacity-distortion function remains identically zero until $D \ge \frac{\sigma_s^2 \sigma_z^2}{\sigma_s^2 + \sigma_z^2} > 0$.

Figure 3.5: The trade-off curve for $Y_k = S_k X_k + Z_k$ when $\sigma_z^2 = 0.1$ and $\sigma_s^2$ varies (the curves for $\sigma_s^2 = 0.5, 1, 2, 4, 8$ are shown here).

To gain some more insight into this problem, let us consider the case when $\sigma_z^2 = 0$. This can be seen as the limit $\sigma_z^2 \to 0$, which is equivalent to the limit SNR $\to \infty$. The signal model then reduces to $Y_i = S_i X_i$. Since $S$ is a Gaussian variable, $h(S) = \log(2\pi e \sigma_s^2)$. We now have $H(X|Y) = 0$, and so

I(X;Y) = H_2(p).   (3.27)

If $X = 1$, we make no error in estimating $S$, while if $X = 0$, we have no information to estimate $S$ from. This leads to a minimum average distortion of $(1-p)\sigma_s^2$ in the estimation of $S$. We then have the following optimization problem:

maximize H_2(p) \quad subject to \quad (1-p)\sigma_s^2 \le D.

Solving this problem gives us the optimal $p$:

p^* = \frac{1}{2} \text{ if } D > \frac{\sigma_s^2}{2}, \quad p^* = 1 - \frac{D}{\sigma_s^2} \text{ otherwise},

and $C(D) = H_2(p^*)$ in either case. This trade-off curve is presented in Figure 3.6. We note that, as in the binary case above, the capacity decreases with increasing $\sigma_s^2$.

Figure 3.6: The trade-off curve for $Y_k = S_k X_k$ when $\sigma_s^2$ varies (the curves for $\sigma_s^2 = 5, 10, 20, 30$ are shown here).
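The solution above takes only a few lines of code; in this sketch, $\sigma_s^2$ and the values of $D$ are illustrative, and rates are in bits so that the unconstrained capacity is 1.

```python
import numpy as np

def H2(t):
    """Binary entropy in bits."""
    t = min(max(t, 1e-12), 1 - 1e-12)
    return -t * np.log2(t) - (1 - t) * np.log2(1 - t)

def C_noiseless_fading(D, sigma2_s):
    """maximize H2(p) s.t. (1-p) sigma2_s <= D, for Y = SX with binary X."""
    p_star = 0.5 if D > sigma2_s / 2 else 1.0 - D / sigma2_s
    return H2(p_star)

for D in (0.0, 1.0, 2.5, 5.0):
    print(f"D = {D:3.1f}  ->  C(D) = {C_noiseless_fading(D, sigma2_s=5.0):.4f}")
```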
Further, the maximal communication rate (or the unconstrained capacity) is 1 (since $X$ is binary) and is attained at $D = \frac{\sigma_s^2}{2}$, as validated by our results above.

3.4.4 Two Parallel Channels

Consider the network presented in Figure 3.7. We have two source nodes represented by the two independent encoders. These sources have independent messages $M_1$ and $M_2$ to transmit to the destination with a vanishing probability of error. In addition to recovering these messages, the destination (represented by the decoder) is also interested in estimates of the channel parameters $S_1$ and $S_2$. Again, we impose a distortion constraint on these estimates. We are then interested in the capacity region:

C(D) = \cup \{ (R_1, R_2) : R_1 \text{ and } R_2 \text{ are achievable and } D_1 + D_2 \le D \},   (3.28)

where we have imposed a sum-distortion bound of $D$ on the two channels.

Figure 3.7: Two parallel channels where we want to estimate both the channel parameters $S_1$ and $S_2$.

Employing the results in Section 3.3, we can show the following:

C(D) = \bar{\cup}_{D_1 + D_2 \le D} \left\{ (R_1, R_2) : R_1 \le C_1(D_1) \text{ and } R_2 \le C_2(D_2) \right\},   (3.29)

where $C_1(D) = \max_{p_{x_1} \in \mathcal{P}_D} I(X_1; Y_1)$ and $C_2(D) = \max_{p_{x_2} \in \mathcal{P}_D} I(X_2; Y_2)$, and $\bar{\cup}$ indicates that we take the convex closure of the set obtained by varying $D_1$ and $D_2$.

For example, let us pick both the channels in Figure 3.7 to be the binary multiplicative channels presented earlier: $Y_1^i = S_1^i X_1^i$ and $Y_2^i = S_2^i X_2^i$ (where $i$ is the time index). Further, let us assume that the channels are symmetric: $P(S_1 = 1) = P(S_2 = 1) = 0.2$. We then have the capacity distortion region presented in Figure 3.8. Note that as $D$ increases, the shape of the capacity distortion region tends towards a square, which implies that for large distortion constraints, we can effectively design the two channels independently.

Figure 3.8: The capacity distortion region for two symmetric binary multiplicative channels when $P(S_i = 1) = 0.2$ for varying $D$.
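Tracing the boundary of this region only requires the scalar $C(D)$ function from Section 3.4.2: sweep the split $D_1 + D_2 = D$ and record $(C_1(D_1), C_2(D - D_1))$. The sketch below uses $r = 0.2$ and $D = 0.1$ as illustrative values (with the natural-log caveat noted earlier); the region in Figure 3.8 is the convex closure of such points together with time sharing.

```python
import numpy as np

def H2(t, base=2.0):
    if t <= 0.0 or t >= 1.0:
        return 0.0
    return -(t * np.log(t) + (1 - t) * np.log(1 - t)) / np.log(base)

def C_scalar(D, r):
    """C(D) of the scalar multiplicative channel (see Section 3.4.2)."""
    p_unc = (1.0 / r) / (1.0 + np.exp(H2(r, np.e) / r))
    p = p_unc if D >= r * (1.0 - p_unc) else 1.0 - D / r
    return H2(p * r) - p * H2(r)

r, D = 0.2, 0.1                              # symmetric channels, sum budget
for D1 in np.linspace(0.0, D, 6):
    R1, R2 = C_scalar(D1, r), C_scalar(D - D1, r)
    print(f"D1 = {D1:5.3f}:  (R1, R2) <= ({R1:.4f}, {R2:.4f})")
```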
Figure 3.9: The 2-MAC-CSE.

3.5 Multiple Access Channel with Channel State Estimation

We will start by analyzing the two transmitter MAC; the case where we have more than two transmitters will then follow as a corollary. Figure 3.9 presents the signal model for a general two transmitter MAC with the channel state given by $S$ (assumed to be i.i.d.) and the decoder required to form an estimate of the channel parameter, $\hat{S}$. While the encoders (ENC 1 and ENC 2) are similar to the classical encoders for a regular MAC, the major difference lies in the decoder structure imposed here. Also note the difference between the MAC and the parallel channel signal models: while $Y_1$ and $Y_2$, the outputs of each of the parallel channels, are available directly to the decoder in the parallel channel model, for the MAC we have only one channel output $Y$ from which we need to guess both the transmitted messages and the channel state. Let us first present a few definitions.

Definition 6. A two source discrete memoryless multiple access channel with channel state estimation (2-MAC-CSE) consists of four alphabets $\mathcal{X}_1$, $\mathcal{X}_2$, $\mathcal{S}$ and $\mathcal{Y}$, a probability transition matrix $p(y|x_1, x_2, s)$, and a constraint $D$ on the estimation error for $S$ at the destination.

Note that by the memoryless property of the channel we have

P(y^{1:n} | x_1^{1:n}, x_2^{1:n}, s^{1:n}) = \prod_{i=1}^{n} P(y^i | x_1^i, x_2^i, s^i).   (3.30)

Definition 7. A $((\lfloor 2^{nR_1} \rfloor, \lfloor 2^{nR_2} \rfloor), n)$ code for the multiple access channel with channel state estimation consists of two sets of integers $\mathcal{M}_1 = \{1, 2, \ldots, \lfloor 2^{nR_1} \rfloor\}$ and $\mathcal{M}_2 = \{1, 2, \ldots, \lfloor 2^{nR_2} \rfloor\}$ called the message sets, two encoding functions,

X_1 : \mathcal{M}_1 \to \mathcal{X}_1^n,   (3.31)
X_2 : \mathcal{M}_2 \to \mathcal{X}_2^n,   (3.32)

and two decoding functions,

g_n : \mathcal{Y}^n \to \mathcal{M}_1 \times \mathcal{M}_2,   (3.33)
\hat{S}^{1:n} : \mathcal{Y}^n \times \mathcal{X}_1^n \times \mathcal{X}_2^n \to \mathcal{S}^n.   (3.34)

Assuming that the distribution of messages selected by the transmitters is uniform (i.e., the messages are independent and equally likely), we define the average probability of error for the $((\lfloor 2^{nR_1} \rfloor, \lfloor 2^{nR_2} \rfloor), n)$ code as:

P_e^{(n)} = \frac{1}{\lfloor 2^{n(R_1+R_2)} \rfloor} \sum_{(m_1, m_2) \in \mathcal{M}_1 \times \mathcal{M}_2} P\{ g(Y^{1:n}) \neq (m_1, m_2) \mid (m_1, m_2) \text{ sent} \}.   (3.35)

Now, suppose we are given a map $d : \mathcal{S} \times \hat{\mathcal{S}} \to \mathbb{R}^+ \cup \{0\}$ which we call the distortion function, and let us assume that $d$ is bounded above and that $d$ splits additively over higher dimensions:

d(S^{1:n}, \hat{S}^{1:n}) = \sum_{i=1}^{n} d(S^i, \hat{S}^i).   (3.36)

Definition 8. A rate pair $(R_1, R_2)$ is said to be achievable for the 2-MAC-CSE if there exists a sequence of $((\lfloor 2^{nR_1} \rfloor, \lfloor 2^{nR_2} \rfloor), n)$ codes with $P_e^{(n)} \to 0$ and if

\limsup_{n\to\infty} E\, d(S^{1:n}, \hat{S}^{1:n}) \le D.   (3.37)

Definition 9. The capacity distortion region of the 2-MAC-CSE is the convex closure of the set of achievable $(R_1, R_2)$ rate pairs.

Analogous to the formulation in Section 3.3, let us define the conditional minimum distortion function:

d^*(x_1, x_2) = \inf_{h : \mathcal{Y} \times \mathcal{X}_1 \times \mathcal{X}_2 \to \mathcal{S}} E[d(S, h(Y, x_1, x_2))],   (3.38)

and we again note that we have a distortion cost $d^*(x_1, x_2)$ for each $(x_1, x_2) \in \mathcal{X}_1 \times \mathcal{X}_2$ and that the expression above is averaged over the statistics of $S$ and $Y$.

Theorem 4 (2-MAC-CSE capacity distortion region). The capacity distortion region of the 2-MAC-CSE $(\mathcal{X}_1 \times \mathcal{X}_2, p(y|x_1, x_2, s), \mathcal{Y}, \mathcal{S}, D)$ is the convex closure of all $(R_1, R_2, Q)$ satisfying

R_1 < I(X_1; Y | X_2, Q),   (3.39)
R_2 < I(X_2; Y | X_1, Q),   (3.40)
R_1 + R_2 < I(X_1, X_2; Y | Q),   (3.41)

for some product distribution $p(q) p_1(x_1|q) p_2(x_2|q)$ on $\mathcal{Q} \times \mathcal{X}_1 \times \mathcal{X}_2$ which satisfies:

\sum_{(q, x_1, x_2) \in \mathcal{Q} \times \mathcal{X}_1 \times \mathcal{X}_2} p(q) p_1(x_1|q) p_2(x_2|q) d^*(x_1, x_2) \le D.   (3.42)

Note that $|\mathcal{Q}|$ is bounded (see [11] for more details).

Note that the distortion constraint at the destination imposes an additional constraint, Inequality (3.42), on the capacity distortion region as compared to the MAC without the channel state estimation constraint. We note that the result presented in Theorem 4 is similar to the result presented in [22], with a key distinction. While [22] presents the case where we have independent constraints on $X_1$ and $X_2$, the distortion constraint we consider here imposes a coupled constraint on the two encoders. We note that in this scenario, the two encoders cannot optimize their distributions independently and need to jointly agree on their distributions before transmission. We will employ the following lemmas to prove this theorem.

Lemma 4. For every $n > 0$, there exists $\epsilon_n$ such that $H(X_1^{1:n}, X_2^{1:n} | Y^{1:n}) \le n\epsilon_n$ and $\epsilon_n \to 0$.

Proof. Note that by construction (Definition 7), $P(E) = P((M_1, M_2) \neq g_n(Y^{1:n})) \to 0$, and we are done by Fano's Inequality (see [11]).

Lemma 5. As $n \to \infty$, $\hat{S}^{1:n}(Y^{1:n}) = \hat{S}^{1:n}(X_1^{1:n}, X_2^{1:n}, Y^{1:n})$ for an achievable rate pair.

Proof. The proof of this lemma is similar to that of Lemma 2.

Lemma 6. We have the following decomposition:

\lim_{n\to\infty} \hat{S}^{1:n}(Y^{1:n}) = \{ \hat{S}^1(X_1^1, X_2^1, Y^1), \ldots, \hat{S}^n(X_1^n, X_2^n, Y^n) \}.

Proof.
This follows from Lemma 5 and the memoryless property assumed for the channel. Consider the following:

P(S^{1:n} | X_1^{1:n}, X_2^{1:n}, Y^{1:n}) = \frac{P(X_1^{1:n}, X_2^{1:n}, Y^{1:n}, S^{1:n})}{P(X_1^{1:n}, X_2^{1:n}, Y^{1:n})}   (3.43)
= \frac{P(Y^{1:n} | X_1^{1:n}, X_2^{1:n}, S^{1:n}) P(X_1^{1:n}) P(X_2^{1:n}) P(S^{1:n})}{\sum_{S^{1:n}} P(Y^{1:n} | X_1^{1:n}, X_2^{1:n}, S^{1:n}) P(X_1^{1:n}) P(X_2^{1:n}) P(S^{1:n})}   (3.44)
= \prod_{i=1}^{n} P(S^i | X_1^i, X_2^i, Y^i),   (3.45)

by an argument similar to that used in Lemma 3, which establishes the lemma.

We will need one additional lemma to establish the convexity of the capacity distortion region for the 2-MAC-CSE.

Lemma 7. The capacity distortion region $C(D)$ of the 2-MAC-CSE is convex, i.e., if $(R_1, R_2) \in C(D)$ and $(R_1', R_2') \in C(D)$, then $(\lambda R_1 + (1-\lambda) R_1', \lambda R_2 + (1-\lambda) R_2') \in C(D)$ for $0 \le \lambda \le 1$.

Proof. The proof is based on the notion of time-sharing (see [11] for example). Given two sequences of codes at different rate pairs $R = (R_1, R_2)$ and $R' = (R_1', R_2')$, we can construct a third codebook at a rate $\lambda R + (1-\lambda) R' = (\lambda R_1 + (1-\lambda) R_1', \lambda R_2 + (1-\lambda) R_2')$ by using the first codebook for the first $\lambda n$ symbols and the second for the last $(1-\lambda) n$ symbols. The number of $X_1$ codewords in the new code is $\lfloor 2^{n(\lambda R_1 + (1-\lambda) R_1')} \rfloor$. Further, the distortion bound $D$ is met for each of the codebooks and is hence met for the time-shared version.

Proof of Theorem 4: Given Lemmas 5 and 6, we can now formulate the second condition to be met for a rate to be achievable (Definition 8) as follows:

E\, d(S^{1:n}, \hat{S}^{1:n}(Y^{1:n})) \le D   (3.46)
\overset{(a)}{\iff} \sum_{i=1}^{n} E\, d(S^i, \hat{S}^i(Y^i, X_1^i, X_2^i)) \le D   (3.47)
\overset{(b)}{\iff} E_{X_1, X_2}\left[ E_{S,Y}[ d(S^i, \hat{S}^i(Y^i, X_1^i, X_2^i)) ] \mid (X_1, X_2) \right] \le D   (3.48)
\iff \sum_{(q, x_1, x_2) \in \mathcal{Q} \times \mathcal{X}_1 \times \mathcal{X}_2} p(q) p_1(x_1|q) p_2(x_2|q) d^*(x_1, x_2) \le D,   (3.49)

where (a) follows from Lemmas 5 and 6, and (b) follows from the ergodic assumption and the properties of conditional expectation once we note that $D(x_1, x_2) = E_{S,Y}[d(S^i, \hat{S}^i(Y^i, x_1^i, x_2^i))]$ is the averaged version of the distortion observed in $S$ given a particular realization of $(X_1, X_2)$. The constraint on the densities $p_1$ and $p_2$ presented in Inequality (3.42) now follows. The rest of the proof for this theorem follows from the proof for the capacity distortion region of the regular MAC (see [11] for example), given Lemma 7.

(Achievability): Let us assume that the constraint presented in Inequality (3.42) is satisfied for some fixed $p(x_1, x_2) = p_1(x_1) p_2(x_2)$. Generate $\lfloor 2^{nR_1} \rfloor$ independent codewords $X_1(i) \sim \prod_{i=1}^{n} p_1(x_1^i)$, for $i \in \{1, 2, \ldots, \lfloor 2^{nR_1} \rfloor\}$, where each codeword has length $n$. Similarly, generate $\lfloor 2^{nR_2} \rfloor$ independent codewords $X_2(j)$, $j \in \{1, 2, \ldots, \lfloor 2^{nR_2} \rfloor\}$, where each element is i.i.d. $\sim \prod_{i=1}^{n} p_2(x_2^i)$. These codewords form the codebook which is revealed to both the transmitters and the receiver. To send index $i$, sender 1 sends the codeword $X_1(i)$; similarly, to send $j$, sender 2 sends $X_2(j)$. The decoding rule is as follows. Let $A_\epsilon^{(n)}$ denote the set of typical $(x_1, x_2, y)$ sequences. The receiver chooses the pair $(i, j)$ such that $(x_1(i), x_2(j), y) \in A_\epsilon^{(n)}$ if such a pair $(i, j)$ exists and is unique; otherwise, an error is declared.

Let us now analyze the probability of error of this scheme. By the symmetry of the random code construction, the conditional probability of error does not depend on which pair of indices is sent. So, without loss of generality, we can assume that $(i, j) = (1, 1)$. We have an error if either the correct codewords are not typical with the received sequence or there is a pair of incorrect codewords that are typical with the received sequence. Define the events

E_{ij} = \{ (X_1(i), X_2(j), Y) \in A_\epsilon^{(n)} \}.
Then by the union bound,

P_e^{(n)} \le P(E_{11}^c) + \sum_{i \neq 1, j = 1} P(E_{i1}) + \sum_{i = 1, j \neq 1} P(E_{1j}) + \sum_{i \neq 1, j \neq 1} P(E_{ij}),

where $P$ is the conditional probability given that $(1,1)$ was sent. From the AEP, $P(E_{11}^c) \to 0$. Further, we have

P(E_{i1}) \le 2^{-n(I(X_1; Y | X_2) - 3\epsilon)}, \quad i \neq 1,   (3.50)
P(E_{1j}) \le 2^{-n(I(X_2; Y | X_1) - 3\epsilon)}, \quad j \neq 1,   (3.51)
P(E_{ij}) \le 2^{-n(I(X_1, X_2; Y) - 4\epsilon)}, \quad i \neq 1, j \neq 1,   (3.52)

and then the achievability part of the theorem follows since we now have:

P_e^{(n)} \le P(E_{11}^c) + 2^{nR_1} 2^{-n(I(X_1;Y|X_2) - 3\epsilon)} + 2^{nR_2} 2^{-n(I(X_2;Y|X_1) - 3\epsilon)} + 2^{n(R_1+R_2)} 2^{-n(I(X_1,X_2;Y) - 4\epsilon)} \to 0,

provided the rate conditions (3.39)-(3.41) hold; since $\epsilon > 0$ can be chosen arbitrarily small, the region follows.

(Converse): We want to prove:

R_1 < I(X_1; Y | X_2, Q),   (3.53)
R_2 < I(X_2; Y | X_1, Q),   (3.54)
R_1 + R_2 < I(X_1, X_2; Y | Q),   (3.55)

for some product distribution $p(q) p(x_1|q) p(x_2|q) p(s) p(y|x_1, x_2, s)$, where $Q \in \mathcal{Q}$ and $|\mathcal{Q}| \le 4$, with the additional constraint:

\sum_{(q, x_1, x_2) \in \mathcal{Q} \times \mathcal{X}_1 \times \mathcal{X}_2} p(q) p_1(x_1|q) p_2(x_2|q) d^*(x_1, x_2) \le D.   (3.56)

We will first establish the following equation, which will be used in the subsequent derivation:

P(y^{1:n} | x_1^{1:n}, x_2^{1:n}) = \sum_{s^{1:n} \in \mathcal{S}^n} P(y^{1:n} | x_1^{1:n}, x_2^{1:n}, s^{1:n}) P(s^{1:n})   (3.57)
\overset{(a)}{=} \sum_{s^{1:n} \in \mathcal{S}^n} \prod_{i=1}^{n} \left\{ P(y^i | x_1^i, x_2^i, s^i) P(s^i) \right\}   (3.58)
\overset{(b)}{=} \prod_{i=1}^{n} \left\{ \sum_{s^i \in \mathcal{S}} P(y^i | x_1^i, x_2^i, s^i) P(s^i) \right\}   (3.59)
= \prod_{i=1}^{n} P(y^i | x_1^i, x_2^i),   (3.60)

where (a) follows by the memoryless property of the channel and the i.i.d. nature of $S$, and (b) by algebra. Now consider,

nR_1 = H(M_1)   (3.61)
= I(M_1; Y^{1:n}) + H(M_1 | Y^{1:n})   (3.62)
\overset{(a)}{\le} I(M_1; Y^{1:n}) + n\epsilon_n   (3.63)
\overset{(b)}{\le} I(X_1^{1:n}; Y^{1:n}) + n\epsilon_n   (3.64)
\overset{(c)}{\le} H(X_1^{1:n} | X_2^{1:n}) - H(X_1^{1:n} | Y^{1:n}, X_2^{1:n}) + n\epsilon_n   (3.65)
= I(X_1^{1:n}; Y^{1:n} | X_2^{1:n}) + n\epsilon_n   (3.66)
\overset{(d)}{=} H(Y^{1:n} | X_2^{1:n}) - \sum_{i=1}^{n} H(Y^i | Y^{1:i-1}, X_1^{1:n}, X_2^{1:n}) + n\epsilon_n   (3.67)
\overset{(e)}{=} H(Y^{1:n} | X_2^{1:n}) - \sum_{i=1}^{n} H(Y^i | X_1^i, X_2^i) + n\epsilon_n   (3.68)
\overset{(f)}{\le} \sum_{i=1}^{n} H(Y^i | X_2^{1:n}) - \sum_{i=1}^{n} H(Y^i | X_1^i, X_2^i) + n\epsilon_n   (3.69)
\overset{(g)}{\le} \sum_{i=1}^{n} H(Y^i | X_2^i) - \sum_{i=1}^{n} H(Y^i | X_1^i, X_2^i) + n\epsilon_n   (3.70)
= \sum_{i=1}^{n} I(X_1^i; Y^i | X_2^i) + n\epsilon_n,   (3.71)

where (a) follows from Fano's inequality, (b) from the data processing inequality, (c) because the messages and the encoding schemes are independent of each other, (d) by the chain rule, (e) by the memorylessness of the channel and the i.i.d. nature of $S$ (as established in Equation (3.60)), (f) by the chain rule and by removing conditioning, and (g) by removing conditioning. So, we have

R_1 \le \frac{1}{n} \sum_{i=1}^{n} I(X_1^i; Y^i | X_2^i) + \epsilon_n.

Similarly, we have

R_2 \le \frac{1}{n} \sum_{i=1}^{n} I(X_2^i; Y^i | X_1^i) + \epsilon_n.

Finally, to bound the sum rate, we have

n(R_1 + R_2) = H(M_1, M_2)   (3.72)
\overset{(a)}{\le} I(M_1, M_2; Y^{1:n}) + n\epsilon_n   (3.73)
\overset{(b)}{\le} I(X_1^{1:n}, X_2^{1:n}; Y^{1:n}) + n\epsilon_n   (3.74)
\overset{(c)}{=} H(Y^{1:n}) - \sum_{i=1}^{n} H(Y^i | Y^{1:i-1}, X_1^{1:n}, X_2^{1:n}) + n\epsilon_n   (3.75)
\overset{(d)}{=} H(Y^{1:n}) - \sum_{i=1}^{n} H(Y^i | X_1^i, X_2^i) + n\epsilon_n   (3.76)
\overset{(e)}{\le} \sum_{i=1}^{n} H(Y^i) - \sum_{i=1}^{n} H(Y^i | X_1^i, X_2^i) + n\epsilon_n   (3.77)
= \sum_{i=1}^{n} I(X_1^i, X_2^i; Y^i) + n\epsilon_n,   (3.78)

where (a) follows by Fano's inequality, (b) by the data processing inequality, (c) by the chain rule, (d) by the memoryless property of the channel and since $S$ is i.i.d. (as in Equation (3.60)), and (e) by the chain rule and by removing conditioning. Hence, we have

R_1 + R_2 \le \frac{1}{n} \sum_{i=1}^{n} I(X_1^i, X_2^i; Y^i) + \epsilon_n.

Let us pick $Q \sim$ Uniform$\{1, 2, \ldots, n\}$.
We then have (we will consider only the bound on $R_1$ – the rest are similar)

R_1 \le \frac{1}{n} \sum_{i=1}^{n} I(X_1^i; Y^i | X_2^i) + \epsilon_n   (3.79)
= \frac{1}{n} \sum_{i=1}^{n} I(X_1^Q; Y^Q | X_2^Q, Q = i) + \epsilon_n   (3.80)
= I(X_1^Q; Y^Q | X_2^Q, Q) + \epsilon_n   (3.81)
= I(X_1; Y | X_2, Q) + \epsilon_n,   (3.82)

where $X_1 \triangleq X_1^Q$, $X_2 \triangleq X_2^Q$ and $Y \triangleq Y^Q$ are new random variables whose distributions depend on $Q$ in the same way as the distributions of $X_1^i$, $X_2^i$ and $Y^i$ depend on $i$. Since $M_1$ and $M_2$ are independent, so are $X_1$ and $X_2$, and hence, taking limits as $n \to \infty$, we have the converse

R_1 < I(X_1; Y | X_2, Q),   (3.83)
R_2 < I(X_2; Y | X_1, Q),   (3.84)
R_1 + R_2 < I(X_1, X_2; Y | Q),   (3.85)

for some product distribution $p(q) p(x_1|q) p(x_2|q) p(s) p(y|x_1, x_2, s)$. The cardinality bound for $Q$ follows from Theorem 14.3.4 in [11] once we note Lemma 7. We also have the additional constraint:

\sum_{(q, x_1, x_2) \in \mathcal{Q} \times \mathcal{X}_1 \times \mathcal{X}_2} p(q) p_1(x_1|q) p_2(x_2|q) d^*(x_1, x_2) \le D,   (3.86)

and hence we are done.

The observations made in the point-to-point case described in Section 3.3 still hold for the MAC. In fact, this result can be extended to an arbitrary number, $K$, of transmitters from the two considered here. The result then generalizes to:

Corollary 2. The capacity distortion region of the K-MAC-CSE $(\mathcal{X}_1 \times \mathcal{X}_2 \times \cdots \times \mathcal{X}_K, p(y|x_1, x_2, \ldots, x_K, s), \mathcal{Y}, \mathcal{S}, D)$ is the convex closure of all $(R_1, R_2, \ldots, R_K, Q)$ satisfying

R(\mathcal{A}) \le I(X(\mathcal{A}); Y | X(\mathcal{A}^c), Q) \quad \text{for all } \mathcal{A} \subseteq \{1, 2, \ldots, K\},   (3.87)

for some product distribution $p(q) p_1(x_1|q) p_2(x_2|q) \ldots p_K(x_K|q)$ on $\mathcal{Q} \times \mathcal{X}_1 \times \mathcal{X}_2 \times \cdots \times \mathcal{X}_K$ which satisfies:

\sum_{(q, x_1, \ldots, x_K) \in \mathcal{Q} \times \mathcal{X}_1 \times \cdots \times \mathcal{X}_K} p(q) p_1(x_1|q) p_2(x_2|q) \ldots p_K(x_K|q) d^*(x_1, x_2, \ldots, x_K) \le D.   (3.88)

Proof. The ideas used for this proof are exactly those used for the proof of Theorem 4 and are omitted for brevity.

We note that both Theorem 4 and Corollary 2 are similar to the classical MAC result – the key difference being the additional constraint that the estimation of $S$ poses at the sources. Correspondingly, the capacity distortion regions for these cases are strictly inside the capacity regions of the corresponding unconstrained MAC. Further, for the 2-MAC-CSE, we can evaluate the capacity distortion region by removing those pentagons from the capacity region of the two user MAC which do not satisfy the additional distortion constraint.

3.6 Linear Two-Hop Network with Channel State Estimation

Figure 3.10: A general two-hop relay network where the destination is interested in estimates of both the intermediate channels and the message transmitted.

We now present a two-hop network model formed by two links connected in tandem. Consider the network presented in Figure 3.10. The first encoder, ENC 1, represents the source node, and the last decoder, DEC 2, represents the destination node. The encoding and decoding functions are exactly as described for the point-to-point link. Note that we do not assume any particular estimate-and-forward processing at the relay node: its output, $X_2$, may depend on its input $Y_1$ directly, and need not depend only on estimates of the message and the channel state of the first link. We use the notation $\hat{A}_{1,b}$ to denote the estimate of $A_1$ at node $b$. We are interested in the trade-off region:

C(D_1, D_2) = \sup\{ R : R \text{ is achievable and the achieved distortions are at most } D_1 \text{ and } D_2 \},   (3.89)

as we now have two distortion constraints to satisfy.
For the rest of this section, we will often use C to mean C(D 1 ,D 2 ), the capacity of the network and will qualify any other capacity terms to eliminate confusion. We first present a few definitions and lemmas that we will use in the subsequent proof of Theorem 6 below. Definition 10. The rate-estimation error function for S 1 given a distortion bound, D, is defined as R S1 (D) = inf P(X2|Y1)∈P D I(Y 1 ; ˆ S 1,d (Y 2 )|X 1 ) (3.90) where, P D = {P(X 2 |Y 1 ) : E[d(S 1 , ˆ S ∗ 1,d (Y 2 |X 1 ))] ≤ D} and ˆ Z ∗ (X|Y) is the best estimate of Z given X conditioned on Y being known to both the transmitter and receiver: ˆ Z ∗ (X|Y) =arg min ˆ Z:X→Z Ed(Z, ˆ Z(X)|Y) (3.91) If P D is empty, we set R S1 (D) = ∞ to reflect the fact that the minimization problem is infeasible. Note that we are optimizing over the processing performed at the relay node while the encoder associated with the source node is fixed. Lemma 8. R S1 (D) is concave in D. Proof. The lemma follows via a timesharing argument (see [11] for example.) Say S 1:n 1,Q = h(Y 1:n 1 ) is a source coded version of the information about S 1 available at the relay node. If we are interested in developing an achievable rate, we need to minimize the lower of I(S 1 ;S 1,Q ) and I(S 1 ;S 1,Q |X 1 ). The following lemma is useful towards comparing these two terms. Lemma 9. There is no advantage to employing the information we have about X 1 at the relay node to reduce the source quantization rate for S 1 . 57 Proof. Consider, I(S 1 ;S 1,Q )−I(S 1 ;S 1,Q |X 1 ) = h(S 1 )−h(S 1 |S 1,Q )−h(S 1 |X 1 )+h(S 1 |X 1 ,S 1,Q ) (3.92) (a) = h(S 1 )−h(S 1 |S 1,Q )−h(S 1 )+h(S 1 |X 1 ,S 1,Q ) (3.93) = −I(S 1 ;X 1 |S 1,Q ) (b) ≤ 0, (3.94) where, (a) holds since S 1 and X 1 are assumed to be independent and (b) holds by the non- negativity of mutual information. Definition 11. The rate - quantization error function for S 1 given a distortion bound, D, is given by R(D) = inf P(S 1,Q |Y1)∈P D I(S 1 ;S 1,Q ) (3.95) where, P D ={P(S 1,Q |Y 1 ) :E[d(S 1 , ˆ S ∗ 1,Q (Y 2 ))]≤D}. WenotethattheminimizationinDefinition11isw.r.ttherelaytransitionfunctionP(S 1,Q |Y 1 ) and not w.r.t the input-output characterization of the channel as in the definition of the indirect rate-distortion function (see [18], for example). Lemma 10. H(M|Y 1:n 1 )≤H(M|Y 1:n 2 )≤n n . Proof. Since the destination can reconstruct the message M with vanishing probability of error, this result follows from Fano’s inequality (see [11] for example.) Note that Lemma 10 implies that both the relay and the destination can decode the message successfully. 3.6.1 An achievable rate We develop an achievable rate by exploiting the results of [62] which addresses the problem of efficient estimation of a noisy source (with respect to a distortion measure, d(.,.)) over a two-hop 58 network. Inthatcase, itisshownthattheoptimalprocessingattherelaynodecorrespondstothe best encoding/decoding schemes implied by a modified distortion measure ˆ d(.,.). The advantage of this reduction is that the design of the relay is now effectively decoupled and we no longer have to jointly optimize over the decoder-encoder pair at the relay node. Since we need not restrict our relay node to a decoder-encoder pair (this is just one of the possibilities for the relay node), the application of [62] to our problem only yields an achievable rate, in general. We now present an achievable end-to-end rate for the two-hop network. Theorem 5. 
The following rate is achievable on the two-hop network:

R(D_1, D_2) = \max_{P_{X_1} \in \mathcal{P}_{D_1}} \min\{ I(X_1; Y_1), C_2(D_2) - R(D_1) \},   (3.96)

where $C_2$ is the point-to-point capacity distortion trade-off function for the second hop (as in Section 3.3), $R(\cdot)$ is the rate-quantization error function for $S_1$, and $\mathcal{P}_{D_1}$ is given by:

\mathcal{P}_{D_1} = \left\{ P_{X_1} : \sum_{x_1 \in \mathcal{X}_1} P_{X_1}(x_1) d^*(x_1) \le D_1 \right\}, \quad d^*(x_1) = \inf_{h : \mathcal{X} \times \mathcal{Y} \to \mathcal{S}} E[d(S_1, h(x_1, Y_1))].   (3.97)

Proof. We first note that $P_{X_1} \in \mathcal{P}_{D_1}$ ensures that the distortion in the estimation of $S_1$ at the relay node is upper bounded by $D_1$ for any feasible rate. That $R(D_1, D_2) = \max_{P_{X_1} \in \mathcal{P}_{D_1}} I(X_1; Y_1)$ is achievable over the first hop then follows by the achievability proof of the capacity distortion function for a single hop. For the second constraint, let us first design the input distribution for $X_1$ such that the relay decodes the message at rate $R(D_1, D_2)$, and quantizes the received $Y_1^{1:n}$ block to $S_{1,Q}^{1:n}$. The quantization rate is given by $R(D_1)$ to achieve an end-to-end distortion of $D_1$ in the estimation of $S_1$ at the destination. Next, the relay picks the distribution for $X_2$ as the capacity-distortion optimal one (as discussed in Theorem 3 in Section 3.3), so that the second hop achieves a rate of $C_2(D_2)$. However, the message now is the combination of two: the original message, and the quantization of $S_1$, which takes up $R(D_1)$. Since we are operating under the capacity of the second hop, we can decode these messages reliably (in the limit of large blocks) at the destination. We then have the message and an estimate of $S_1$ which meets the distortion constraint (namely $S_{1,Q}$) at the destination, and we are done.

Figure 3.11: Relay processing structure for an achievable rate. SQ is the noisy source quantizer for $S_1$ at the relay node.

The structure of the processing implied by this lower bound is presented in Figure 3.11. Note the similarity of this structure to the one presented in Figure 1 of [62]. The only difference is that the generic encoder-decoder pair at the relay node is replaced by a source coding block for the noisy information about $S_1$ available at the relay node, and a channel coding block which time multiplexes between transmitting this source coded version and the message over the second hop.

3.6.2 The converse

Theorem 6. The following upper bound holds for the two-hop network:

C(D_1, D_2) \le \max_{P_{X_1} \in \mathcal{P}_{D_1}} \min\{ I(X_1; Y_1), C_2(D_2) - R_{S_1}(D_1) \},   (3.98)

where $C_2$ is the point-to-point capacity distortion trade-off function for the second hop (as in Section 3.3), $R_{S_1}(\cdot)$ is as defined in Equation (3.90), and $\mathcal{P}_{D_1}$ is defined as in Theorem 5. Note that $R_{S_1}(D_1)$ implicitly depends on the $P_{X_1}$ selected to optimize the bound.

Remarks: We note that $R_{S_1}(D) \le R(D)$ because quantization is one of the permissible encoding strategies in the evaluation of $R_{S_1}(D)$.

Proof. For a rate to be achievable, we should have a $P_{X_1} \in \mathcal{P}_{D_1}$ – else we sustain a distortion exceeding $D_1$ in the estimation of $S_1$ at the relay node and, consequently, the end-to-end distortion constraint on $S_1$ is violated.
Considering the link from the source to the relay node, we have:

nC(D_1, D_2) = H(M) = H(X_1^{1:n})   (3.99)
\overset{(a)}{\le} H(X_1^{1:n}) - H(X_1^{1:n} | Y_1^{1:n}) + n\epsilon_n   (3.100)
= I(X_1^{1:n}; Y_1^{1:n}) + n\epsilon_n   (3.101)
= \sum_{i=1}^{n} \left[ H(Y_1^i | Y_1^{1:i-1}) - H(Y_1^i | X_1^{1:n}, Y_1^{1:i-1}) \right] + n\epsilon_n   (3.102)
\overset{(b)}{\le} \sum_{i=1}^{n} \left[ H(Y_1^i) - H(Y_1^i | X_1^i) \right] + n\epsilon_n   (3.103)
= \sum_{i=1}^{n} I(X_1^i; Y_1^i) + n\epsilon_n,   (3.104)

where (a) follows by Lemma 10 and (b) follows since removing conditioning can only increase entropy and by the memorylessness of the first channel. We now have the first term in the minimization in Equation (3.98) once we observe that the point-to-point capacity distortion region is convex (see Section 3.3 for a proof). For the second constraint, consider the following chain of inequalities:

I(M; Y_2^{1:n}) - I(M; Y_2^{1:n} | X_2^{1:n}) = H(Y_2^{1:n}) - H(Y_2^{1:n} | M) - H(Y_2^{1:n} | X_2^{1:n}) + H(Y_2^{1:n} | X_2^{1:n}, M) = I(X_2^{1:n}; Y_2^{1:n}) - I(X_2^{1:n}; Y_2^{1:n} | M),   (3.105)

which leads to

I(M; Y_2^{1:n}) = I(M; Y_2^{1:n} | X_2^{1:n}) + I(X_2^{1:n}; Y_2^{1:n}) - I(X_2^{1:n}; Y_2^{1:n} | M)   (3.106)
\overset{(c)}{=} I(X_2^{1:n}; Y_2^{1:n}) - I(X_2^{1:n}; Y_2^{1:n} | M)   (3.107)
= \sum_{i=1}^{n} \left[ H(Y_2^i | Y_2^{1:i-1}) - H(Y_2^i | Y_2^{1:i-1}, X_2^{1:n}) \right] - I(X_2^{1:n}; Y_2^{1:n} | M)   (3.108)
\overset{(d)}{\le} \sum_{i=1}^{n} \left[ H(Y_2^i) - H(Y_2^i | Y_2^{1:i-1}, X_2^{1:n}) \right] - I(X_2^{1:n}; Y_2^{1:n} | M)   (3.109)
\overset{(e)}{=} \sum_{i=1}^{n} \left[ H(Y_2^i) - H(Y_2^i | X_2^i) \right] - I(X_2^{1:n}; Y_2^{1:n} | M)   (3.110)
= \sum_{i=1}^{n} I(X_2^i; Y_2^i) - I(X_2^{1:n}; Y_2^{1:n} | M)   (3.111)
\overset{(f)}{\le} \sum_{i=1}^{n} I(X_2^i; Y_2^i) - I(X_2^{1:n}; \hat{S}_{1,d}^{1:n} | M)   (3.112)
\overset{(g)}{\le} \sum_{i=1}^{n} I(X_2^i; Y_2^i) - I(Y_1^{1:n}; \hat{S}_{1,d}^{1:n} | M)   (3.113)
\overset{(h)}{\le} \sum_{i=1}^{n} I(X_2^i; Y_2^i) - I(Y_1^{1:n}; \hat{S}_{1,d}^{1:n} | X_1^{1:n}) + n\delta_n   (3.114)
= \sum_{i=1}^{n} I(X_2^i; Y_2^i) - \sum_{i=1}^{n} \left[ H(Y_1^i | Y_1^{1:i-1}, X_1^{1:n}) - H(Y_1^i | \hat{S}_{1,d}^{1:n}, Y_1^{1:i-1}, X_1^{1:n}) \right] + n\delta_n   (3.115)
\overset{(i)}{=} \sum_{i=1}^{n} I(X_2^i; Y_2^i) - \sum_{i=1}^{n} \left[ H(Y_1^i | X_1^i) - H(Y_1^i | \hat{S}_{1,d}^{1:n}, Y_1^{1:i-1}, X_1^{1:n}) \right] + n\delta_n   (3.116)
\overset{(j)}{\le} \sum_{i=1}^{n} I(X_2^i; Y_2^i) - \sum_{i=1}^{n} \left[ H(Y_1^i | X_1^i) - H(Y_1^i | \hat{S}_{1,d}^i, X_1^i) \right] + n\delta_n   (3.117)
= \sum_{i=1}^{n} I(X_2^i; Y_2^i) - \sum_{i=1}^{n} I(Y_1^i; \hat{S}_{1,d}^i | X_1^i) + n\delta_n   (3.118)
\overset{(k)}{\le} \sum_{i=1}^{n} I(X_2^i; Y_2^i) - \sum_{i=1}^{n} R_{S_1}(d(S_1^i, \hat{S}_{1,d}^{i,*})) + n\epsilon_n + n\delta_n,   (3.119)

where (c) holds since $M \to X_2^{1:n} \to Y_2^{1:n}$ is a Markov chain, (d) holds since the removal of conditioning can only increase entropy, (e) holds by the memorylessness of the second hop signal model $Y_2$ (note that given $X_2^i$ all the other conditioning is unnecessary), (f) holds by the data processing inequality since $X_2 \to Y_2 \to \hat{S}_{1,d}$ forms a Markov chain, (g) holds by the data processing inequality again since $Y_1 \to X_2 \to \hat{S}_{1,d}$ forms a Markov chain, (h) holds by Lemma 10: since all the nodes in the network can decode the message in probability (for large $n$), and since the codebooks are made known to all nodes before communication starts, each node has $X_1^{1:n}$; this holds in the limit of large $n$, so for finite $n$ let us say that we make an error of $n\delta_n$, where, by Lemma 10, $\delta_n \to 0$ as $n \to \infty$, (i) holds since the first channel is memoryless, (j) holds since conditioning can only reduce entropy, and (k) holds by the definition of the rate-estimation error function for $S_1$ as defined in Equation (3.90).
Finally, we have the following set of inequalities as a consequence:

C(D_1, D_2) \overset{(l)}{\le} \frac{1}{n} \sum_{i=1}^{n} I(X_2^i; Y_2^i) - \frac{1}{n} \sum_{i=1}^{n} R_{S_1}(d(S_1^i, \hat{S}_{1,d}^{i,*})) + \epsilon_n + \delta_n   (3.120)
\overset{(m)}{\le} \frac{1}{n} \sum_{i=1}^{n} I(X_2^i; Y_2^i) - R_{S_1}\left( \frac{1}{n} \sum_{i=1}^{n} d(S_1^i, \hat{S}_{1,d}^{i,*}) \right) + \epsilon_n + \delta_n   (3.121)
\overset{(n)}{\le} \frac{1}{n} \sum_{i=1}^{n} I(X_2^i; Y_2^i) - R_{S_1}(D_1) + \epsilon_n + \delta_n,   (3.122)

where (l) holds by Fano's inequality as in Lemma 10, (m) holds by the concavity of the rate-estimation error function, which is established in Lemma 8, and (n) holds by the definition of the averaged estimation error. Now, since we can maximize the bound in (k) over all encoders, decoders and estimators for $S_2$ at the destination, we can apply Theorem 3 from Section 3.3 to upper bound the first term in (k):

\frac{1}{n} \sum_{i=1}^{n} I(X_2^i; Y_2^i) \le C_2(D_2).   (3.123)

We now take the limit $n \to \infty$ and the theorem is established.

Again, we note that Theorem 6 presents a coupled optimization problem – the source distribution $P_{X_1}$ affects both $I(X_1; Y_1)$ and $R_{S_1}(D_1)$.

3.6.3 Information Losslessness

Definition 12. A two-hop network is said to be information lossless if $R_{S_1}(D) = R(D)$ $\forall D \ge 0$.

Comparing Theorems 5 and 6, we note that for an information lossless network, Theorem 6 (or Theorem 5) provides the capacity distortion function for the network. Further, note that we are assured of information losslessness if we can show $R_{S_1}(D) \ge R(D)$ $\forall D \ge 0$, since $R_{S_1}(D) \le R(D)$ $\forall D \ge 0$ (see the remark following Theorem 6). In the particular case when we have an undistorted version of the channel state, the following theorem shows that we always have information losslessness by establishing this inequality.

Theorem 7. If $S_1 = f(Y_1, X_1)$, where $f(\cdot,\cdot)$ is a known deterministic function, then we have information losslessness and the capacity of the two-hop network is given by Equation (3.96).

Proof. Expanding $I(Y_1; S_1, \hat{S}_{1,d} | X_1)$ in two ways, we have

I(Y_1; S_1 | X_1) + I(Y_1; \hat{S}_{1,d} | S_1, X_1) = I(Y_1; \hat{S}_{1,d} | X_1) + I(Y_1; S_1 | X_1, \hat{S}_{1,d}).   (3.124)

Let us now analyze a few particular terms in Equation (3.124).

I(Y_1; S_1 | X_1) = h(S_1 | X_1) - h(S_1 | X_1, Y_1)   (3.125)
\overset{(a)}{=} h(S_1 | X_1),   (3.126)

where (a) holds since $S_1 = f(X_1, Y_1)$. Further,

I(Y_1; S_1 | X_1, \hat{S}_{1,d}) = h(S_1 | X_1, \hat{S}_{1,d}) - h(S_1 | X_1, \hat{S}_{1,d}, Y_1)   (3.127)
\overset{(b)}{=} h(S_1 | X_1, \hat{S}_{1,d}),   (3.128)

where (b) again holds since $S_1 = f(X_1, Y_1)$. Substituting these results into Equation (3.124), and after rearranging terms, we get

I(Y_1; \hat{S}_{1,d} | X_1) = I(S_1; \hat{S}_{1,d} | X_1) + I(Y_1; \hat{S}_{1,d} | X_1, S_1)   (3.129)
\overset{(c)}{\ge} I(S_1; \hat{S}_{1,d} | X_1)   (3.130)
= h(S_1 | X_1) - h(S_1 | X_1, \hat{S}_{1,d})   (3.131)
\overset{(d)}{=} h(S_1) - h(S_1 | X_1, \hat{S}_{1,d})   (3.132)
= h(S_1) - h(S_1 | \hat{S}_{1,d}) + h(S_1 | \hat{S}_{1,d}) - h(S_1 | X_1, \hat{S}_{1,d})   (3.133)
= I(S_1; \hat{S}_{1,d}) + I(S_1; X_1 | \hat{S}_{1,d})   (3.134)
\overset{(e)}{\ge} I(S_1; \hat{S}_{1,d}),   (3.135)

where (c) holds by the non-negativity of mutual information, (d) holds since $S_1$ and $X_1$ are independent, and (e) holds by the non-negativity of mutual information. We are done once we note that if we employ quantize-and-forward processing at the relay, we can achieve exactly the rate presented in Equation (3.135), in view of the remarks following Theorem 6.

This result is similar to that presented in [32] and speaks to the possible relation between the problem studied here and in that work. Finally, we observe that the achievable rate presented in Theorem 5 can be generalized to any number of hops in a linear network.
However, while the converse presented in Theorem 6 is generalizable to an arbitrary number of hops, it gets harder to evaluate $R_{S_i}$ as $i$ grows large.

3.7 Illustrative Examples for the Two-Hop Case

We now apply our results to two particular channel models. Note that the information losslessness for two of the cases considered implies that we calculate the capacity distortion function for these two-hop networks.

3.7.1 Scalar Additive Channels

Consider the signal model

Y_i^k = X_i^k + S_i^k + Z_i^k, \quad i = 1, 2,   (3.136)

where $k$ is the time index and $i = 1$ corresponds to the first point-to-point link and $i = 2$ to the second. Further, $\mathcal{X}_i = \mathcal{Y}_i = \mathcal{S}_i = \mathcal{Z}_i = \mathbb{C}$, $S_i \sim \mathcal{CN}(0, \sigma_{si}^2)$, $Z_i \sim \mathcal{CN}(0, \sigma_{zi}^2)$, and $S_i$ and $Z_i$ are i.i.d. distributed, for $i = 1, 2$. The distortion measure used to evaluate our estimates at the destination is the $L_2$ norm: $d(S_i, \hat{S}_{i,d}) = E\|S_i - \hat{S}_{i,d}\|^2$. Finally, let us impose a power constraint for the transmitter and the relay node: $E\|X_i\|^2 \le P_i$, $i = 1, 2$.

Let us first analyze the case when $\sigma_{zi}^2 = 0$, $i = 1, 2$. For this network, we can have perfect information about $S_1$ at the relay node (that is, $\hat{S}_{1,r} = S_1$ – see Section 3.4.1) and $S_2$ at the destination node ($\hat{S}_{2,d} = S_2$) as long as the rate constraints are met. The network is then information lossless (trivially, by Theorem 7, since $S_1$ itself is available at the relay node). We then have the following expression for the capacity distortion function for the network (from Theorem 7):

C = \min\left\{ \log\left(1 + \frac{P_1}{\sigma_{s1}^2}\right), \log\left(1 + \frac{P_2}{\sigma_{s2}^2}\right) - \log\frac{\sigma_{s1}^2}{D_1} \right\},   (3.137)

where we have substituted the Gaussian rate-distortion function for $R_{S_1}$, assuming that we are willing to tolerate a maximum distortion of $D_1$ in the estimation of $S_1$ at the destination node. Consider a particular example where $P_i = 10$ and $\sigma_{s1}^2 = \sigma_{s2}^2 = \sigma^2 = 1$. Figure 3.12 presents the capacity distortion function in Equation (3.137) as a function of the distortion bound on $S_1$.

Figure 3.12: Capacity distortion function for the two-hop additive channel model with $\sigma_{zi} = 0$, $i = 1, 2$. Results for $\sigma^2 = 1, 2, 3, 4$ are shown.

Note that the capacity distortion function is a non-decreasing function of $D$. Further, as in the channel models with non-binary inputs considered in Section 3.4, the rate stays zero until $D_1$ crosses a certain threshold. We can evaluate this threshold explicitly as

D_{threshold} = \frac{\sigma_{s1}^2}{1 + \frac{P_2}{\sigma_{s2}^2}}.

We then have

C = \log\frac{D}{D_{threshold}} \quad \Rightarrow \quad \left.\frac{dC}{dD}\right|_{C=0} = \frac{1}{D_{threshold}},   (3.138)

when $D_{threshold} \le D \le \sigma^2$. In contrast to the point-to-point case discussed in Section 3.3, we note that as $\sigma^2$ increases in Figure 3.12, capacity decreases. This is because the AWGN capacity term decreases with increasing $\sigma^2$, as does the rate-distortion term in Equation (3.137).

When $\sigma_{zi}^2 > 0$, $i = 1, 2$, the analysis becomes more involved. We first consider the upper bound presented in Theorem 6. To this end, we obtain a lower bound to the rate-estimation error function following Definition 10:

I(Y_1; \hat{S}_{1,d} | X_1) = h(Y_1 | X_1) - h(Y_1 | \hat{S}_{1,d}, X_1)   (3.139)
\overset{(a)}{=} h(S_1 + Z_1) - h(S_1 + Z_1 | \hat{S}_{1,d})   (3.140)
= \log(2\pi e(\sigma_{s1}^2 + \sigma_{z1}^2)) - h(S_1 + Z_1 | \hat{S}_{1,d})   (3.141)
\overset{(b)}{\ge} \log(2\pi e(\sigma_{s1}^2 + \sigma_{z1}^2)) - h(S_1 - \hat{S}_{1,d} + Z_1)   (3.142)
\overset{(c)}{\ge} \log(2\pi e(\sigma_{s1}^2 + \sigma_{z1}^2)) - \log(2\pi e(D_1 + \sigma_{z1}^2 + 2\sigma_{z1}\sqrt{D_1}))   (3.143)
= \log\left( \frac{\sigma_{s1}^2 + \sigma_{z1}^2}{D_1 + \sigma_{z1}^2 + 2\sigma_{z1}\sqrt{D_1}} \right),   (3.144)
When \sigma_{zi}^2 > 0, i = 1,2, the analysis becomes more involved. We first consider the upper bound presented in Theorem 6. To this end, we obtain a lower bound on the rate-estimation error function following Definition 10:

I(Y_1;\hat{S}_{1,d}|X_1) = h(Y_1|X_1) - h(Y_1|\hat{S}_{1,d},X_1)   (3.139)
\overset{(a)}{=} h(S_1+Z_1) - h(S_1+Z_1|\hat{S}_{1,d})   (3.140)
= \log(2\pi e(\sigma_{s1}^2+\sigma_{z1}^2)) - h(S_1+Z_1|\hat{S}_{1,d})   (3.141)
\overset{(b)}{\ge} \log(2\pi e(\sigma_{s1}^2+\sigma_{z1}^2)) - h(S_1-\hat{S}_{1,d}+Z_1)   (3.142)
\overset{(c)}{\ge} \log(2\pi e(\sigma_{s1}^2+\sigma_{z1}^2)) - \log(2\pi e(D_1+\sigma_{z1}^2+2\sigma_{z1}\sqrt{D_1}))   (3.143)
= \log\left(\frac{\sigma_{s1}^2+\sigma_{z1}^2}{D_1+\sigma_{z1}^2+2\sigma_{z1}\sqrt{D_1}}\right)   (3.144)

where (a) holds since S_1 and Z_1 are independent of X_1, (b) holds because entropy is translation invariant and since removing conditioning can only increase uncertainty, and (c) holds once we factor in the distortion constraint on the estimation of S_1 at the destination and bound under the assumption that the noise Z_1 and the errors (S_1-\hat{S}_{1,d}) are perfectly correlated Gaussian random variables, which majorizes the entropy of their sum.

We next evaluate a lower bound on the achievable rate presented in Theorem 5. Towards this end, we obtain an upper bound on the rate-quantization error function for S_1:

I(S_1;S_{1,Q}) = h(S_1) - h(S_1|S_{1,Q})   (3.145)
= \log(2\pi e\sigma_{s1}^2) - h(S_1|S_{1,Q})   (3.146)
\overset{(a)}{\ge} \log(2\pi e\sigma_{s1}^2) - \log(2\pi e D_1)   (3.147)
= \log\left(\frac{\sigma_{s1}^2}{D_1}\right)   (3.148)

where (a) holds because of the distortion constraint E\|S_{1,Q}-S_1\|^2 \le D_1 and since the Gaussian distribution maximizes differential entropy for a given variance. While Inequality (3.148) presents only a lower bound on the rate-quantization error function for S_1, this lower bound is actually achievable. Towards this end, we define S_{1,MMSE} to be the MMSE estimate of S_1 at the relay node, S_{1,MMSE} = E[S_1|Y_1,X_1], since we assume that X_1 is available in probability at the relay node. Due to the assumption of Gaussian distributions, we can write S_{1,MMSE} = \alpha(Y_1-X_1) = \alpha(S_1+Z_1), where \alpha = \sigma_{s1}^2/(\sigma_{s1}^2+\sigma_{z1}^2), and the corresponding mean squared error is \sigma_{s1}^2\sigma_{z1}^2/(\sigma_{s1}^2+\sigma_{z1}^2). For an achievable rate, consider a scheme where we quantize S_{1,MMSE} to form S_{1,Q}, the quantized version of S_1 that we transmit from the relay node to the destination. As [63] shows, this scheme achieves the optimal rate-distortion trade-off presented in Inequality (3.148) with equality as long as D_1 \ge \sigma_{s1}^2\sigma_{z1}^2/(\sigma_{s1}^2+\sigma_{z1}^2). Further, for any D_1 below this threshold, the end-to-end capacity is identically zero.

Combining the results in Inequalities (3.144) and (3.148), we have

\min\left\{C_1(D_1),\; C_2(D_2)-\log\left(\frac{\sigma_{s1}^2}{D_1}\right)\right\} \le C \le \min\left\{C_1(D_1),\; C_2(D_2)-\log\left(\frac{\sigma_{s1}^2+\sigma_{z1}^2}{D_1+\sigma_{z1}^2+2\sigma_{z1}\sqrt{D_1}}\right)\right\}   (3.149)

where C_i(D_i) = \log(1 + P_i/(\sigma_{si}^2+\sigma_{zi}^2)) when D_i \ge \sigma_{si}^2\sigma_{zi}^2/(\sigma_{si}^2+\sigma_{zi}^2) and is 0 otherwise, for i = 1,2. This holds since we can use P_{X_1} to optimize the first term in Theorems 6 and 5 alone, as we have bounded the other terms in these theorems by quantities which no longer depend on P_{X_1}. These bounds are presented in Figure 3.13. Note that as \sigma_{z1}^2 \to 0, we recover the earlier result leading to information losslessness. Since \sigma_{s1}^2 = 5 in Figure 3.13, the two bounds coincide for D > 5 and are equal to the capacity of the channel with no channel estimation error constraint. Further, the two bounds have different x-intercepts, as governed by their corresponding expressions in Equation (3.149).

Figure 3.13: Bounds on the capacity distortion function for additive signal models with noise. We assume \sigma_{s1}^2 = \sigma_{s2}^2 = 5 and \sigma_{z1}^2 = \sigma_{z2}^2 = 0.1.
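The two bounds in (3.149) can be evaluated directly; a minimal sketch follows, assuming the Figure 3.13 parameters and an illustrative power P = 10 (the figure's power value is not stated in this excerpt).

import numpy as np

P, s2, z2 = 10.0, 5.0, 0.1
mmse = s2 * z2 / (s2 + z2)                      # quantize-and-forward floor on D1

def bounds(D1):
    c = np.log(1.0 + P / (s2 + z2))             # C_1 = C_2 with these parameters
    lower = np.minimum(c, c - np.log(s2 / D1))
    upper = np.minimum(c, c - np.log((s2 + z2) / (D1 + z2 + 2.0 * np.sqrt(z2 * D1))))
    lower = np.where(D1 >= mmse, np.maximum(lower, 0.0), 0.0)
    return np.maximum(upper, 0.0), lower

for D1 in (0.5, 1.0, 2.0, 5.0):
    up, lo = bounds(np.array(D1))
    print(f"D1={D1}: {lo:.3f} <= C <= {up:.3f} nats")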
3.7.2 Scalar Multiplicative Channels

We now consider the signal model

Y_i^k = X_i^k S_i^k + Z_i^k, \quad i = 1,2   (3.150)

where k is the time index and i = 1 corresponds to the first point-to-point link and i = 2 to the second. Here, \mathcal{X}_i = \{0,1\}, \mathcal{Y}_i = \mathcal{S}_i = \mathcal{Z}_i = \mathbb{C}, S_i \sim \mathcal{CN}(0,\sigma_{si}^2), Z_i \sim \mathcal{CN}(0,\sigma_{zi}^2), and S_i, Z_i are i.i.d. for i = 1,2. The distortion measure used to evaluate our estimates at the destination is the L_2 norm: d(S_i,\hat{S}_{i,d}) = E\|S_i-\hat{S}_{i,d}\|^2.

Let us first consider the case when \sigma_{zi}^2 = 0, i = 1,2. An application of Theorem 6 yields

C \le I(X_1;Y_1),   (3.151)
C \le C_2(D_2) - R_{S_1}(D_1)   (3.152)

where P_{X_1} is chosen so that E\,d(\hat{S}_{1,d},S_1) \le D_1. We next evaluate a bound on R_{S_1}(D_1). Let us set P(X_1=1) = p and consider

I(Y_1;\hat{S}_{1,d}(Y_1)|X_1) = h(Y_1|X_1) - h(Y_1|X_1,\hat{S}_{1,d})   (3.153)
= p\,h(S_1) - h(X_1\hat{S}_{1,d} + X_1(S_1-\hat{S}_{1,d})|X_1,\hat{S}_{1,d})   (3.154)
\overset{(a)}{=} p\log(2\pi e\sigma_1^2) - h(X_1(S_1-\hat{S}_{1,d})|X_1,\hat{S}_{1,d})   (3.155)
\overset{(b)}{=} p\log(2\pi e\sigma_1^2) - h(X_1(S_1-\hat{S}_{1,d})|X_1)   (3.156)
= p\log(2\pi e\sigma_1^2) - p\,h(S_1-\hat{S}_{1,d}|X_1=1)   (3.157)
\overset{(c)}{\ge} p\log(2\pi e\sigma_1^2) - p\log(2\pi e D_{11})   (3.158)
= p\log\left(\frac{\sigma_1^2}{D_{11}}\right)   (3.159)

where (a) holds since differential entropy is translation invariant, (b) holds by the orthogonality principle (as in [8], for example) for the estimation of S_1 at the destination, and (c) holds because the Gaussian distribution maximizes entropy for a given variance, with D_{11} given by

D_{11} = \left(\frac{D_1-(1-p)\sigma_{s1}^2}{p}\right)^+

where (x)^+ = x if x \ge 0 and is zero otherwise. Note that if (D_1-(1-p)\sigma_{s1}^2)/p is negative (or zero), the distortion constraint D_1 cannot be met given P(X_1=1) = p, which implies that this source distribution can be excluded when we form the capacity distortion region. So, we can safely pick D_{11} to meet the distortion constraint at the destination. Equation (3.152) then becomes

C \le C_2(D_2) - p\log\left(\frac{\sigma_1^2}{D_{11}}\right).   (3.160)

Finally, we have the bound

C \le \max_{0\le p\le 1} \min\left\{ H_2(p),\; C_2(D_2) - p\log\left(\frac{\sigma_1^2}{D_{11}}\right) \right\}.   (3.161)

Note that C_2(D_2) is given by Section 3.4.2 after substituting the appropriate values. Now we evaluate the achievable rate presented in Theorem 5. Towards this end, consider

I(S_1;S_{1,Q}|X_1) = p\,I(S_1;S_{1,Q}|X_1=1),   (3.162)

which leads to I(S_1;S_{1,Q}|X_1) = p\log(\sigma_1^2/D_{11}) since, when X_1 = 1, we have S_1 available at the relay node and we need to quantize it to achieve a distortion of D_{11} at the destination. By Inequality (3.94), we then have an information lossless network. Figure 3.14 presents the network capacity distortion function as a function of D_1 when we set \sigma_1^2 = \sigma_2^2 = \sigma^2 and when D_2 is held fixed at \sigma_2^2. Again, note that the capacity distortion function is a non-decreasing function of the distortion bound, like all our other results. Comparing this set of results to those presented in Figure 3.6, we note the loss we incur due to the additional hop and other concomitant constraints.

Figure 3.14: Capacity distortion region for the two-hop multiplicative channel model Y^k = S^k X^k. (The curves for \sigma_s^2 = 6,7,8,10,12,15 are displayed.)
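The maximization in (3.161) is a one-dimensional search over p; the following sketch evaluates it on a grid, assuming \sigma_1^2 = \sigma_2^2 and taking C_2(D_2) = 1 bit when D_2 = \sigma_2^2 – a stand-in for the Section 3.4.2 expression, justified because with no effective constraint on S_2 the noiseless binary-input multiplicative link is error free, so \max_p H_2(p) = 1 bit.

import numpy as np

def H2(p):
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

def c_mult_noiseless(D1, sigma2=1.0, C2=1.0):
    p = np.linspace(1e-3, 1 - 1e-3, 999)
    D11 = (D1 - (1 - p) * sigma2) / p            # per-use quantizer distortion
    valid = D11 > 0                               # p's that can meet D1 at all
    rd = p * np.log2(sigma2 / np.maximum(D11, 1e-12))
    return np.max(np.where(valid, np.minimum(H2(p), C2 - rd), -np.inf))

for D1 in (0.2, 0.5, 0.8, 1.0):
    print(f"D1={D1}: C={max(c_mult_noiseless(D1), 0.0):.3f} bits")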
When \sigma_{zi}^2 > 0, i = 1,2, we have

I(Y_1;\hat{S}_{1,d}(Y_1)|X_1) = h(Y_1|X_1) - h(Y_1|X_1,\hat{S}_{1,d})   (3.163)
= p\log(2\pi e(\sigma_{s1}^2+\sigma_{z1}^2)) + (1-p)\log(2\pi e\sigma_{z1}^2) - h(Y_1|X_1,\hat{S}_{1,d})   (3.164)

by the definition of conditional entropy. Now,

h(Y_1|X_1,\hat{S}_{1,d}) = p\,h(Y_1|X_1=1,\hat{S}_{1,d}) + (1-p)\,h(Y_1|X_1=0,\hat{S}_{1,d})   (3.165)
\overset{(a)}{=} p\,h(Y_1|X_1=1,\hat{S}_{1,d}) + (1-p)\,h(Y_1|X_1=0)   (3.166)
= p\,h(S_1+Z_1|X_1=1,\hat{S}_{1,d}) + (1-p)\,h(Z_1)   (3.167)
\overset{(b)}{\le} p\log(2\pi e(D_{11}+\sigma_{z1}^2+2\sigma_{z1}\sqrt{D_{11}})) + (1-p)\log(2\pi e\sigma_{z1}^2)   (3.168)

where (a) holds because when X_1 = 0 we have no information from which to estimate S_1, and (b) holds once we factor in the distortion constraint and upper bound under the assumption that the errors (S_1-\hat{S}_{1,d}) and the noise Z_1 are perfectly correlated Gaussian random variables, since this majorizes the entropy of their sum. Substituting for h(Y_1|X_1,\hat{S}_{1,d}) in Equation (3.164) leads to the bound

R_{S_1}(D_1) \ge p\log\left(\frac{\sigma_{s1}^2+\sigma_{z1}^2}{D_{11}+\sigma_{z1}^2+2\sigma_{z1}\sqrt{D_{11}}}\right).   (3.169)

Further,

I(X_1;Y_1) = h(Y_1) - h(Y_1|X_1)   (3.170)
= h(Y_1) - p\log(2\pi e(\sigma_{s1}^2+\sigma_{z1}^2)) - (1-p)\log(2\pi e\sigma_{z1}^2).   (3.171)

Note that since Y_1 is a Gaussian mixture, h(Y_1) needs to be evaluated numerically, as in Section 3.4.3. Therefore, by an application of Theorem 6,

C(D_1,D_2) \le \max_{0\le p\le 1} \min\left\{ h(Y_1) - p\log(2\pi e(\sigma_{s1}^2+\sigma_{z1}^2)) - (1-p)\log(2\pi e\sigma_{z1}^2),\; C_2(D_2) - p\log\left(\frac{\sigma_{s1}^2+\sigma_{z1}^2}{D_{11}+\sigma_{z1}^2+2\sigma_{z1}\sqrt{D_{11}}}\right) \right\}.   (3.172)

For the achievable rate given in Theorem 5, let us again consider a scheme where S_{1,Q} is a quantized version of S_{1,MMSE}, the MMSE estimate of S_1 at the relay node (and hence S_1 \to S_{1,MMSE} \to S_{1,Q}). When X_1 = 0, we have no information about S_1 at the relay node to estimate it from, and hence S_{1,MMSE} = S_{1,Q} = 0. This translates to no information about S_{1,Q} being transmitted over the second hop when X_1 = 0; further, in this case, the mean squared estimation error is \sigma_{s1}^2. When X_1 = 1, however, we have Y_1 = S_1 + Z_1, a noisy version of S_1, to estimate it from. In this case, \hat{S}_{1,MMSE} = \alpha Y_1, where \alpha = \sigma_{s1}^2/(\sigma_{s1}^2+\sigma_{z1}^2), and the mean squared error in the estimation of S_1 at the relay node is \sigma_{s1}^2\sigma_{z1}^2/(\sigma_{s1}^2+\sigma_{z1}^2). Now consider

I(S_1;S_{1,Q}) \overset{(a)}{=} p\,I(S_1;S_{1,Q}|X_1=1)   (3.173)
\overset{(b)}{=} p\log\left(\frac{\sigma_{s1}^2}{D_{11}}\right)   (3.174)

where (a) holds because we do not transmit information about S_{1,Q} over the second hop when X_1 = 0 (as discussed above), and (b) holds by a calculation exactly similar to that required to achieve Inequality (3.148); again, we have equality in (b) because of the achievability part of the proof in [63]. So, by Theorem 5, we have the following lower bound on the end-to-end capacity distortion function for the network:

C(D_1,D_2) \ge \max_{0\le p\le 1} \min\left\{ h(Y_1) - p\log(2\pi e(\sigma_{s1}^2+\sigma_{z1}^2)) - (1-p)\log(2\pi e\sigma_{z1}^2),\; C_2(D_2) - p\log\left(\frac{\sigma_{s1}^2}{D_{11}}\right) \right\}.   (3.175)

From Equations (3.172) and (3.175), we observe that when \sigma_{z1}^2 = 0 we have information losslessness, as noted in the earlier part of this section. These bounds are presented in Figures 3.15 and 3.16.

Figure 3.15: Upper bound on the capacity distortion function for the two-hop fading channel model. (The curves for \sigma_{s1}^2 = \sigma_{s2}^2 = 0.5,1,2,4,8 with \sigma_{z1}^2 = \sigma_{z2}^2 = 0.1 are shown here.)

Figure 3.16: Lower bound on the capacity distortion function for the two-hop fading channel model. (The curves for \sigma_{s1}^2 = \sigma_{s2}^2 = 0.5,1,2,4,8 with \sigma_{z1}^2 = \sigma_{z2}^2 = 0.1 are shown here.)

Comparing these bounds to the single hop result presented in Section 3.4.3, we see that the two-hop network cannot sustain any communication until the distortion constraint is considerably more relaxed than in the single hop case. However, when the distortion constraint is lax enough, the rates for the two cases equal the unconstrained capacity of the point-to-point link.
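Since h(Y_1) in (3.172) and (3.175) has no closed form, it must be computed numerically. A Monte Carlo sketch for a real-valued analogue of the model is given below (the complex case adds only bookkeeping); all parameter values are illustrative.

import numpy as np

rng = np.random.default_rng(0)

def h_mixture(p, s2, z2, n=200_000):
    # Y1 = S1 + Z1 w.p. p (X1 = 1) and Y1 = Z1 w.p. 1 - p (X1 = 0)
    x = rng.random(n) < p
    y = np.where(x, rng.normal(0, np.sqrt(s2 + z2), n), rng.normal(0, np.sqrt(z2), n))
    def pdf(v, var):
        return np.exp(-v**2 / (2 * var)) / np.sqrt(2 * np.pi * var)
    f = p * pdf(y, s2 + z2) + (1 - p) * pdf(y, z2)
    return -np.mean(np.log(f))                    # h(Y1) = -E[log f(Y1)], in nats

s2, z2, p = 1.0, 0.1, 0.5
hY = h_mixture(p, s2, z2)
# first bracketed term of (3.172), i.e. I(X1;Y1), with real-Gaussian entropies
i_xy = hY - p * 0.5 * np.log(2 * np.pi * np.e * (s2 + z2)) \
          - (1 - p) * 0.5 * np.log(2 * np.pi * np.e * z2)
print("h(Y1) ~", hY, "nats; I(X1;Y1) ~", i_xy, "nats")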
3.8 Conclusions

We consider the problem of communicating over a time-varying channel while also estimating the channel state. A constrained communication formulation of the joint communication and channel estimation problem is presented. We present the capacity distortion region for the point-to-point network and the multiple access channel, and upper and lower bounds on the capacity distortion function for the network in the two-hop case. We also discuss a particular condition under which we can evaluate the capacity distortion function for the network. The results are explained using examples derived from practical signal models.

For the point-to-point problem, we establish that the problem can be reduced to a constrained communication problem – a distortion constraint on the estimation of the channel at the destination can be converted to a constraint on the distribution of symbols at the source. Further, the communication scheme implied is one where we first communicate the message non-coherently to the destination and then use the decoded message as training to estimate the channel parameters. This idea is extended to the MAC, where a distortion constraint on the estimation of the channel at the destination is reduced to a coupled constraint on the source distributions of the source encoders. Finally, for the two-hop network, it is shown that quantizing the information available at the relay node about the first channel is, in general, a suboptimal approach and that joint processing at the relay node is the optimal scheme to employ.

Future work includes extensions to other topologies – in particular, the broadcast channel, the interference channel and the general relay channel. These cases are more involved since destinations no longer have all the information that is transmitted over the network; thus, the encoding and decoding processes are necessarily more complicated. Establishing a necessary and sufficient condition for information losslessness is another avenue for research. While noisy source coding has been explored (see [18], for example), a result similar to that presented in [24] seems hard to establish due to the absence of closed-form expressions for noisy source and channel coding. Finally, we are interested in examples involving more complicated, yet realistic, channel models to explore the applicability of the framework developed here.

Chapter 4

A Multi-Hop Problem

The linear network is the basic building block of many network topologies. It is also the topology where the trade-offs between communication and estimation are most apparent, since the amount of communication between nodes affects the estimation performance for all the previous nodes. As we will show in this chapter, the fundamental performance of joint communication/estimation schemes in this topology can still be optimal (if we employ the amplify-and-forward protocol).

4.1 Introduction

In this chapter, we propose and analyze a class of sensor networks where the parameters to be sensed are the time-varying inter-node communication channels. As the communication channel is integral to the communication network itself, we denote this class of sensing as intrinsic. In cognitive radio networks, the spectrum manager is required to build a "radio map" of spectral usage [21], or the network uses an estimate of the interference topology [20] – operations which are akin to measuring the inter-node channels. Other underwater applications, from active sonar [69] to tsunami detection [40] using underwater acoustic communication networks, also require information about the communication channel, which enables inference about detected objects (sonar) or local changes in channel conditions which can predict large-scale changes (seismic). Consider the problem of using an underwater sensor network to detect the presence of submarines in a certain location (similar to the setup in [69], Section III). The presence or absence of a submarine is inferred from the inter-node communication channels – but this information also needs to be relayed to the destination node over the network, which is one of the scenarios where intrinsic sensing is relevant.
Here, we propose, analyze, and compare two classes of protocols for intrinsic sensing in multi-hop networks. Our current work differs from extrinsic sensor network problems such as [13,39,55], where the parameters of interest are external to the communication network. In those prior works, the resource of contention between sensing and communication is typically power, while, for our scenario, time (or bandwidth) is the resource of contention.

To appreciate the tradeoffs relevant to multihop intrinsic sensing, consider the linear topology of Figure 1.1, where there are n+1 nodes and n inter-node channel states. Thus a relaying node i must provide signaling to facilitate the sensing of channel i and must simultaneously forward "estimates" of all prior channel states (1,2,...,i-1). At first glance, this appears to be the classical problem of sharing resources between training and communication to maximize capacity. However, in contrast to [28] (which explicitly treats the training issue) or [67] (which implicitly treats channel estimation through the examination of non-coherent communication), where the objective is the reliable transmission of bits, our goal is to optimize the end-to-end distortion. Furthermore, our metric is not a per-hop metric but the end-to-end distortion, with the goal of better estimation at the destination.

Data aggregation and forwarding is integral to many sensor network problems (e.g. [17]); however, again, we underscore that most prior work focuses on extrinsic parameters, that is, parameters which do not influence the communication network itself. A work of more direct connection to our current problem is the single hop problem of [54], which examines the communication of channel state and data to the destination. However, in sharp contrast to the current problem framework, [54] presumes the availability of channel state information at the transmitter. Furthermore, we explicitly examine multi-hop networks. Finally, we observe that there has been significant work on cooperative communication for sensor networks (see e.g. [25,34]); however, this work is focused on two-hop networks and ignores the impact of sensing/estimation altogether.

We analyze two standard classes of protocols, denoted encode-and-forward and amplify-and-forward, and compare their end-to-end mean-squared error distortion.^1 Additionally, we examine two different topologies as building blocks for more complex and practical networks: the linear topology previously discussed and a two-hop tree (many-to-one) topology, also depicted in Figure 1.1.

^1 These protocols are inspired by well-known protocols for data forwarding in relay channels [9,34].

Our contributions are summarized as follows. We first define a measure, the end-to-end distortion exponent, which is asymptotic in SNR, and straightforwardly show that the largest distortion exponent possible is unity. We provide upper and lower bounds on the end-to-end distortion for the encode-and-forward and amplify-and-forward protocols for the linear network and extend them to the many-to-one network. The encode-and-forward results judiciously exploit results from the classic work [63] on noisy communication over noisy channels. The amplify-and-forward protocol converts the problem to one of pure estimation, and thus variants of the Cramer-Rao bounds on estimation variance [7] are relevant. We show that a non-trivial upper bound on the distortion exponent is unity for amplify-and-forward, and provide an achievable channel estimation scheme which attains this optimal distortion exponent of one.
We then propose a practical channel estimator which is based on the estimation of the multihop product channel. A two-hop product channel estimator, also based on linear minimum mean squared error (LMMSE) estimation principles, is examined in [42]. A key distinction is that we require the individual channel estimates and thus further processing is necessary. We also note that a relatively straightforward channel estimation scheme works for our purposes. For encode-and-forward, it can be shown that the distortion exponent is strictly less than one for a network with more than one hop. The performance advantage of amplify-and-forward can be attributed to the fact that soft information is preserved and explicit communication is not necessary. In contrast, encode-and-forward suffers from error propagation, since we form hard estimates at each hop because of the finite rate processing constraint at each of the intermediate nodes, which is a penalty for sustaining inter-node communication. This result mimics that of [26] for the pure communication problem, where it is established that encode-and-forward might be suboptimal from a network capacity point of view. However, optimal asymptotic performance is not a guarantee of optimal finite-SNR performance. Thus, for moderate SNR, encode-and-forward can outperform amplify-and-forward with respect to end-to-end distortion.

For many-to-one networks, we determine an optimal first hop protocol which minimizes the sum distortion at the relay. The resultant protocol is an orthogonal time-division multiplexing scheme and is applicable to both protocols. Given the orthogonality of the first hop, the encode-and-forward many-to-one protocol yields performance which is equivalent to that of multiple two-hop networks for the case of all channels having the same quality (SNR). Numerical results bolster the intuition that, given a set of nodes, the overlaid network topology should be as hierarchical as possible. Too many hops increase distortion due to the additional processing involved at each intermediate node for the encode-and-forward protocol. We observe that, given the orthogonal communication schemes, our results suggest resource distribution policies for networks which employ collision avoidance, as is often suggested for dense, large scale networks. It is interesting to note that the results presented in this chapter apply to networks of finite size – we neither require the network size to be infinite for our results to hold, nor restrict the network to the regular three/four node networks studied in classical information theory.

To put our results in perspective, we compare our results for intrinsic sensing to recent results in extrinsic sensing [55] and pure communication [11]. We observe that extrinsic sensing, with no additional communication, does not appear to be constrained by the topology of the network over which sensing occurs, while pure communication and intrinsic encode-and-forward experience bottlenecks due to the quality of individual communication links within the overall topology.

The rest of this chapter is organized as follows. Section 4.2 introduces the formulation of the problem as a minimization and defines the different communication protocols: encode-and-forward and amplify-and-forward.
Upper and lower bounds on the end-to-end distortion and the associated distortion exponents for encode-and-forward are presented in Section 4.3 for the linear network. Section 4.4 provides a commensurate analysis for the amplify-and-forward protocol. Section 4.5 extends the results presented for linear networks to the many-to-one topology. Section 4.6 provides numerical results. Comparisons of the sensing schemes proposed here with pure communication and with sensing schemes where the parameters to be sensed are extrinsic to the communication network appear in Section 4.7. Final conclusions are presented in Section 4.8.

4.2 Signal Model and Problem Definition: Linear Network

Consider the n-node linear network in Figure 1.1. At each node i, Y_i is the received message and X_i is the corresponding transmitted message. The frequency flat fading coefficient for the channel between node i-1 and node i is denoted by h_i; thus, for example, the frequency flat fading coefficient for the channel between the source (node 0) and node 1 is h_1. We use upper case (like Y and X) to denote random variables and lower case (like y and x) for particular realizations of these random variables. Further, we use \hat{h}_i^j to denote a reconstruction of h_i at the j-th node.^2 We assume a sufficiently large sampling rate (which is the same for each of the nodes) and that all time intervals are integer multiples of the sample time. Let T (\in \mathbb{Z}) be the coherence interval of the channel (post sampling). We also use the notation Y_i and X_i to denote the discrete time versions of the received and transmitted messages at node i. Note that size(Y_i) = size(X_i) = T for all i.

^2 We use \hat{\alpha} to denote an estimate of a parameter \alpha at a subsequent node.

The following equations describe the signal model for the n-hop linear network – all values not displayed are assumed to be zero:

X_S(l) = X_0(l) = \sqrt{P_1}, \quad l \in I_1   (4.1)
Y_1(l) = \sqrt{P_1}\,h_1 + Z_1, \quad l \in I_1   (4.2)
X_j(l) = \sqrt{P_{j+1}}\,\beta_j^k\,a_j^k(Y_j,l), \quad l \in I_j^k,\; k = 1,\ldots,j   (4.3)
Y_{j+1}(l) = \sqrt{P_{j+1}}\,\beta_j^k\,a_j^k(Y_j,l)\,h_{j+1} + Z_{j+1}(l), \quad l \in I_j^k,\; k = 1,\ldots,j,   (4.4)

where Y_D = Y_n and I_j^k is the time interval over which we transmit information about h_k in hop j. Further, we impose the constraints \sum_k |I_j^k| = T_j for all j and \sum_{j,k} |I_j^k| = T, where T_j is the time for which channel j is active. The channel gains are standard Gaussian random variables, h_j \sim \mathcal{N}(0,1), and the additive channel noises (or thermal noise at the receivers) are modeled as Gaussian random variables as well, Z_j(l) \sim \mathcal{N}(0,\sigma_j^2). The channel coefficients and noises are assumed to be mutually independent. The channels are further assumed to have a common coherence interval of T seconds, such that all channels change synchronously to a new realization every T seconds. Finally, for simplicity, we make the assumption that communication in the network is time orthogonal, i.e. when one node is transmitting, every other node is silent.

Both the encode-and-forward scheme and the amplify-and-forward scheme can be described by this signal model by choosing the functions a_j^k appropriately. For the encode-and-forward scheme, the functions a_j^k in Equation (4.4) are restricted so that the processing at each individual node is digital (i.e. both the input and the output alphabet of each of these functions is of finite size). This proves to be a crucial difference between the amplify-and-forward and the encode-and-forward cases, as we will see in the subsequent sections.
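To make the signal model concrete, the following Python sketch generates one coherence interval of the n-hop model (4.1)-(4.4). The mapping a_j used here (cumulate and repeat the received samples) is purely illustrative – the actual choice is protocol dependent, as discussed next – and the slot lengths and powers are assumptions.

import numpy as np

rng = np.random.default_rng(1)

def one_interval(n=3, P=10.0, sigma2=0.1, slots=8):
    h = rng.normal(0, 1, n + 1)                       # h_1 .. h_n (index 0 unused)
    y_prev = np.sqrt(P) * h[1] + rng.normal(0, np.sqrt(sigma2), slots)  # (4.2)
    received = [y_prev]
    for j in range(1, n):
        a = np.mean(y_prev) * np.ones(slots)          # node j's illustrative a_j
        beta = 1.0 / np.sqrt(np.mean(a**2) + 1e-12)   # power normalization
        y_next = np.sqrt(P) * beta * a * h[j + 1] \
                 + rng.normal(0, np.sqrt(sigma2), slots)   # (4.4)
        received.append(y_next)
        y_prev = y_next
    return h[1:], received

h, ys = one_interval()
print("true channels:", h)
print("destination observation (last hop):", ys[-1][:4], "...")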
For the amplify-and-forward scheme, these functions are the statistics used for the estimation of h_k from Y_j (this will be made precise later). The normalization constants \beta_j^k are chosen to make the total transmit power equal to P_{j+1} in each interval I_j^k for each of these protocols.

The processing at each of the individual nodes for the amplify-and-forward scheme is relatively straightforward and will be discussed in Section 4.4. We now discuss the encode-and-forward scheme, which is defined as follows:

1. Initialization for node 1: For the first N coherence intervals (NT seconds), only the channel between the source and node 1, h_1, is measured. For each measurement \hat{h}_{1,i}^1, the channel measurement probe is sent for T_1 \le T units of time. During this phase, the second channel is not measured. (We note that this initialization is only one of many possible, each yielding the same results.)

2. Initialization for each subsequent node: The N \times 1 vector of measurements [\hat{h}_{1,1}^1,\ldots,\hat{h}_{1,N}^1]^T of the channel between the source and node 1 (denoted \hat{H}_1^1) is encoded as a_1(\hat{H}_1^1) (with a_1^k = a_1 for all k) and transmitted in each coherence interval for T_2 units of time when node 1 is in transmission mode. At the end of N such transmissions, node 2 forms two estimates: \hat{H}_2^2, the vector of estimates of the channel between node 1 and node 2, and \hat{H}_1^2, the vector of estimates of the channel between the source and node 1.

Continuing in this fashion, we finally reach the destination, which forms a set of estimates for each of the previous channels. Our goal is for the destination to produce estimates of every channel between the source and the destination such that a distortion measure is minimized. The distortion of interest between the estimate and the actual channel is the end-to-end mean-squared error,

D_i = E\|h_i - \hat{h}_i^d\|^2   (4.5)

where \hat{h}_i^d is the reconstruction of h_i at the destination and the expectation is taken over both the noise and the channel variables. The feasible distortion region \mathcal{D} is then described as all those n-tuples (D_1,D_2,\ldots,D_n) which can be simultaneously achieved. Of interest is the achievable distortion exponent for the intrinsic sensing problem. We define the distortion exponent as the exponent of the decay rate of the end-to-end distortion as a function of the signal-to-noise ratio (SNR). That is, we achieve distortion exponent d_i for estimates of the channel impulse response h_i if

\lim_{\forall i:\, SNR_i = SNR,\; SNR\to\infty} D_i = O\big(SNR^{-d_i}\big)   (4.6)

where SNR_i := P_i/\sigma_i^2 is the SNR of the i-th channel. Consider the single hop system

Y_1(l) = \sqrt{P_1}\,h_1 + Z_1(l), \quad 1 \le l \le T_1.   (4.7)

When all resources are allocated towards the estimation of h_1, and none towards communication, the MMSE estimator is linear and achieves the mean-squared error D_1 = 1/(SNR_1T_1+1), given a total transmission time of T_1. For the single hop problem, the distortion exponent is thus unity, i.e. d_1 = 1. Further, since an increase in the number of hops introduces more channel parameters to be estimated given the same time and SNR constraints as a single hop system, we note that d_i \le 1 for all i.
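The single-hop benchmark D_1 = 1/(SNR_1T_1+1) is easy to confirm by simulation, and doing so illustrates what a distortion exponent of one looks like empirically: each tenfold increase in SNR reduces D_1 roughly tenfold. Parameters below are illustrative.

import numpy as np

rng = np.random.default_rng(2)
T1, trials = 4, 100_000

for snr in (1.0, 10.0, 100.0):
    P, s2 = snr, 1.0                                  # SNR_1 = P_1 / sigma_1^2
    h = rng.normal(0, 1, trials)
    y = np.sqrt(P) * h[:, None] + rng.normal(0, np.sqrt(s2), (trials, T1))
    # MMSE estimate from the sufficient statistic sum(y); linear in the Gaussian case
    h_hat = np.sqrt(P) * y.sum(axis=1) / (P * T1 + s2)
    emp = np.mean((h - h_hat) ** 2)
    print(f"SNR={snr}: empirical D1={emp:.5f}, theory={1.0/(snr*T1+1):.5f}")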
For encode-and-forward (Section 4.3), we assume that the only messages to be transmitted are the channel estimates themselves, since the allocation of resources to communication can only lead to higher distortion levels for the channel estimates at the destination. Thus, for example, the first node only transmits training over the first channel, while from the second channel onwards the only information transmitted between nodes is the channel estimates. In contrast, amplify-and-forward does not transmit explicit estimates, but scaled versions of the received signal (Section 4.4).

The following section presents upper and lower bounds on the achievable distortion exponent for a linear network assuming the encode-and-forward protocol. The schemes presented employ results from both estimation and information theory.

4.3 The Encode-and-Forward Protocol: Linear Network

With the distortion metric defined as the mean-squared error, the optimal estimator is the MMSE estimator, which is non-linear for our scenario. We denote the MMSE estimate for channel i at node j as \hat{h}_i^j = E[h_i|Y_j]. Recall that the total distortion for h_i at the destination node is denoted D_i. Finally, we define D_i^j to be the contribution to the distortion in h_i at the j-th node; mathematically, D_i^j = E[\hat{h}_i^{j-1} - \hat{h}_i^j]^2. Exploiting the orthogonality of the MMSE estimator, following the arguments of [63], the node-i-to-destination distortion can be decomposed as D_i = \sum_{j=i}^{n} D_i^j under the assumption of optimal encoding functions.

We now present an outline of the rest of this section. We first bound the per-hop distortion via results from rate-distortion theory [11,63] in Lemma 11. Using this result and the orthogonality property of conditional means, we extend this bound to an "end-to-end" distortion bound for an arbitrary number of hops in Theorem 8. Using Theorem 8, we then bound the distortion exponent for the linear network under the encode-and-forward protocol in Corollary 3.

Lemma 11. The per-hop distortion D_i^j is lower bounded by

D_i^j \ge var(\hat{h}_i^{j-1})\,(1+SNR_j)^{-T_j/T}, \quad where \quad \hat{h}_i^j = E[h_i|Y_j].   (4.8)

Proof. Recall the signal model for the j-th hop in Equation (4.4). We condition on h_j. The parameters P_j and \beta_j^k are deterministic functions of the allocated power. The function a_j^k(Y_j,l) is the optimal encoding function for encode-and-forward. Note that Y_j is random and serves as our source. We have the following inequality (Problem 8, Chapter 13, [11]):

R(D) \ge h(Y_j) - \log(2\pi eD) \;\Longrightarrow\; D \ge \frac{1}{2\pi e}\exp\big(h(Y_j) - R(D)\big).   (4.9)

To maximize this lower bound, we make Y_j a white, i.i.d. Gaussian vector. We now exploit results on the transmission of a noisy source over a noisy channel [63]. The per-hop distortion for h_i over the j-th hop is then given, in the limit of infinite block length (assuming that the mapping a_j^k is completely dedicated to the transmission of information about h_i), by Theorem 1 in [63] and Equation (4.9), as D_i^j = var(\hat{h}_i^{j-1})\,e^{-C_{nc,j}}, where C_{nc,j} is the non-coherent capacity of the j-th link. Let C_j be the corresponding coherent channel capacity. Since C_j > C_{nc,j}, i.e. the coherent channel capacity upper bounds the non-coherent channel capacity, we use C_j to provide a lower bound on D_i^j. Thus we have

D_i^j \ge var(\hat{h}_i^{j-1})\exp(-C_j) = var(\hat{h}_i^{j-1})\exp\left(-\frac{T_j}{T}\,E_{h_j}\left[\log\left(1+\frac{P_j|h_j|^2}{\sigma_j^2}\right)\right]\right)   (4.10)
\ge var(\hat{h}_i^{j-1})\exp\left(-\frac{T_j}{T}\log\left(1+\frac{P_j}{\sigma_j^2}\right)\right) = var(\hat{h}_i^{j-1})\,(1+SNR_j)^{-T_j/T}   (4.11)

where Equation (4.10) uses the capacity of the coherent fading channel; the factor T_j/T appears because we allocate only that fraction of the available time to communication over the j-th link. Equation (4.11) follows by Jensen's inequality, recalling that E[|h_j|^2] = 1.
Lemma 12. Given Y_i, the aggregate distortion D_i is lower bounded as

D_i \ge \frac{1}{SNR_iT_i+1}.   (4.12)

Proof. If we assume that {h_1,\ldots,h_{i-1}} is known in addition to Y_i, this bound follows from the estimation error of the MMSE estimate for single-hop channels.

Theorem 8. Given the signal model described in Section 4.2, we can form the following lower bounds on the end-to-end distortion for encode-and-forward:

D_n \ge \frac{1}{SNR_nT_n+1}   (4.13)
D_i \ge 1 - \prod_{j\ge i}\left(1 - (1+SNR_j)^{-T_j/T}\right), \quad i < n.   (4.14)

Proof. The bound on the distortion of h_n (Equation (4.13)) is derived from the distortion of the MMSE estimate given Y_D using Lemma 12. Recall that \hat{h}_i^j is the MMSE estimate of \hat{h}_i^{j-1} at node j. Then, by definition, var(\hat{h}_i^j) = E[(\hat{h}_i^j)^2] - [E\hat{h}_i^j]^2 = E[(\hat{h}_i^j)^2], since the MMSE estimator is unbiased. Now we employ the orthogonality property of MMSE estimates, which can be stated as

E[(\hat{h}_i^{j-1} - \hat{h}_i^j)\,\hat{h}_i^j] = 0.   (4.15)

Using Equation (4.15), we can show that

D_i^j = E[(\hat{h}_i^{j-1})^2] - E[(\hat{h}_i^j)^2] = var(\hat{h}_i^{j-1}) - var(\hat{h}_i^j).   (4.16)

Note that Equation (4.16) lends itself to recursion. We then have

D_i = \sum_{j\ge i} D_i^j = 1 - var(\hat{h}_i^n).   (4.17)

Further, employing Equation (4.16) and Lemma 11, we have

D_i^j = var(\hat{h}_i^{j-1}) - var(\hat{h}_i^j) \ge var(\hat{h}_i^{j-1})\,(1+SNR_j)^{-T_j/T}   (4.18)
\Longrightarrow var(\hat{h}_i^j) \le var(\hat{h}_i^{j-1})\left(1 - (1+SNR_j)^{-T_j/T}\right).   (4.19)

Equation (4.19) leads to

var(\hat{h}_i^n) \le \prod_{j\ge i}\left(1 - (1+SNR_j)^{-T_j/T}\right).   (4.20)

Equation (4.20) together with Equation (4.17) then yields the result.

Note that the result in Theorem 8 holds for all SNR_j's and for all time allocations. For a system with fixed SNR_j's, as the number of hops increases, var(\hat{h}_i^n) decreases – the estimate gets swamped by the noise. This decrease in the variance of the estimate increases the end-to-end distortion by Equation (4.17). However, for an arbitrarily large but fixed n, as the SNR available to each of the individual nodes increases, the estimates of h_i at the destination improve and, consequently, D_i decreases. The next corollary upper bounds the best possible decay rate of this end-to-end distortion as the SNR grows.

Corollary 3. For encode-and-forward, the distortion exponent for estimating h_i is upper bounded by

d_i \le \min_{i<j\le n} \frac{T_j}{T}.   (4.21)

Proof. Define A_j = (1+SNR)^{-T_j/T}, where SNR_j = SNR for all j. Using Theorem 8, we then have

D_i \ge 1 - \prod_{j\ge i}(1-A_j)   (4.22)
\ge \sum_{j\ge i} A_j - \sum_{j,k\ge i,\,j\ne k} A_jA_k + \ldots   (4.23)

This leads to

d_i = -\lim_{SNR\to\infty}\frac{\log D_i}{\log SNR}   (4.24)
\le -\lim_{SNR\to\infty}\frac{\log\left(\sum_{j\ge i}A_j + \sum_{j=2}^{n-i}(-1)^{j-1}\sum_{k_1\ne k_2\ne\cdots\ne k_j = i}^{n} A_{k_1}\cdots A_{k_j}\right)}{\log SNR}   (4.25)
\le \min_{j>i}\frac{T_j}{T}   (4.26)

since -\lim_{SNR\to\infty}\log(A_j)/\log(SNR) = T_j/T.

Thus we see that for encode-and-forward schemes, the bottleneck is the link between node i and the destination which has been allocated the least amount of resources in terms of time. Thus, with respect to the upper bounds, equal distortion exponent bounds are achieved by allocating equal resources to each link, assuming that the channel and noise statistics are identical across all links; however, as will be seen in the numerical results, such a strategy would not minimize the individual distortions at finite SNR.
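The behavior promised by Corollary 3 can be seen numerically from the bound (4.14): with equal time allocation T_j/T = 1/n, the local log-log slope of the bound versus SNR approaches 1/n. A short sketch, with illustrative values:

import numpy as np

def ef_bound(snr, frac):
    # the right-hand side of (4.14), frac = list of T_j / T for the links past node i
    return 1.0 - np.prod([1.0 - (1.0 + snr) ** (-f) for f in frac])

n = 4
frac = [1.0 / n] * n
for snr in (1e2, 1e4, 1e6):
    d1, d2 = ef_bound(snr, frac), ef_bound(snr * 10, frac)
    slope = -(np.log(d2) - np.log(d1)) / np.log(10.0)
    print(f"SNR={snr:.0e}: bound={d1:.3e}, local slope~{slope:.3f} (vs 1/n={1/n})")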
We now present a lower bound on the distortion exponent by outlining an achievable time multiplexing scheme.

Corollary 4. For the encode-and-forward protocol, the distortion exponent for estimating h_i is lower bounded by

d_i \ge \min_{i<j\le n} \frac{T_j}{jT}.   (4.27)

Proof. For each hop j, we allocate the time resources equally to the communication of h_i for all i \le j by dividing the available time for hop j into j equal intervals I_j^k, 1 \le k \le j, with the convention that I_j^k is used to communicate information about h_k exclusively. During each interval I_j^k, we employ a Gaussian codebook (which is capacity achieving for Gaussian fading channels – see [5], for example) to transmit information about h_k. Since we allocate only T_j/j units of time to the transmission of h_k for each k \le j, we end up with a distortion exponent lower than that in Theorem 8 by a factor of j. The corollary follows.

The next section deals with the amplify-and-forward protocol for linear networks and establishes that one can attain the best possible distortion exponent by employing this scheme. The tools used to establish this result are borrowed from standard estimation theory constructs.

4.4 The Amplify-and-Forward Protocol: Linear Network

In this section, we consider the amplify-and-forward protocol, where each relay node simply scales and re-transmits the signals it receives. While this protocol brings about a reduction in computational load at the nodes, it also incurs higher computational complexity at the destination in comparison to encode-and-forward. We show that the distortion exponent upper bound of one is a non-trivial bound and that, in contrast to encode-and-forward, amplify-and-forward achieves this optimal bound. Our approach is to examine the Cramer-Rao bound (CRB) averaged over all realizations of the channel fading parameters {h_i}_{i=1}^n (which we denote \overline{CRB}) for the end-to-end channel estimation problem for each channel coefficient h_i. The CRB provides a lower bound on the estimation error variance; we implicitly assume unbiased estimators. For multiple hops, explicit computation of the CRB is intractable, so we consider upper and lower bounds. We then determine the distortion exponent associated with the CRB, which is unity. We also propose and test a modified version of the classical LMMSE estimation scheme for the amplify-and-forward protocol and establish that it attains the optimal distortion exponent.

For our analysis, we use a specific protocol which could incur a loss in optimality at moderate SNR; however, as noted above, our results show that the maximal possible distortion exponent for the CRB is achieved with this protocol. In our protocol we set |I_j^k| = T_j/j for k = 1,\ldots,j; that is, the time allocated to the j-th hop is divided equally among the estimation of all the previous hops. Each node accumulates the received signal over the interval I_j^i, resulting in the statistic M_{i,j} = \sum_{l\in I_j^i} Y_j(l). The collection {M_{i,j}}_{i=1}^j forms a statistic for the estimation of all channel coefficients at node j. A scaled version of M_{i,j} (scaled by \beta_j^i to satisfy the power constraint) is transmitted in the slot allocated to h_i at node j (the i-th slot in what follows). Since each node normalizes its output power, the normalization factor for each hop is given (after some algebra) by

\frac{1}{(\beta_n^i)^2} = |I_n^i|\,\big(\sigma_n^2 + P_n|I_n^i|\big).

Given the channel coefficients {h_1,\ldots,h_n}, the m_{i,j} are Gaussian distributed. Further, the conditional statistics of m_{i,j} can be determined using the following recursive equations:

E_h[m_{i,j}] = \sqrt{P_j}\,\beta_j^i h_j\,\frac{T_j\,E_h[m_{i,j-1}]}{j}   (4.28)
var_h[m_{i,j}] = \frac{T_j}{j}\left(\sigma_j^2 + P_j(\beta_j^i)^2|h_j|^2\,var_h[m_{i,j-1}]\right)   (4.29)

where E_h[\cdot] = E[\cdot|h_1,h_2,\ldots,h_n] and var_h[\cdot] = var[\cdot|h_1,h_2,\ldots,h_n] are the conditional mean and variance,
and the initial condition is m_{i,i}|h_1,\ldots,h_n \sim \mathcal{N}\left(\frac{\sqrt{P_i}\,h_iT_i}{i},\,\frac{\sigma_i^2T_i}{i}\right) for all i. We observe that, due to the mutual independence of the channel coefficients, p(m_{i,j},h_1,\ldots,h_j) = p(m_{i,j}|h_1,\ldots,h_j)\prod_{k=1}^{j}p(h_k). As the channel coefficients are random parameters, a lower bound on the estimation variance is given by averaging the conditional CRB over the random parameters ([56], pp. 84–85). For the multihop case, we require the vector parameter version of the CRB [7]. We first compute the Fisher information matrix (FIM) for the vector case, which is now averaged over the noise in the hops as well as over the variables h_i:

\mathcal{J}^n = [\mathcal{J}_{ij}^n], \quad where \quad \mathcal{J}_{ij}^n = -E\left[\frac{\partial^2\log p_{m_{i,n}}}{\partial h_j\,\partial h_i}\right], \quad i,j = 1,\ldots,n   (4.30)

where again we point out that the expectation is a total expectation, taken over the h_i and over the noises on each of the intermediate channels. Once the FIM is formed, the averaged version of the CRB (averaged over all realizations of the channel fading parameters h_i) is

E\|h_i - \hat{h}_{i,n}\|^2 \ge [\mathcal{J}^n]^{-1}_{[i,i]}   (4.31)

where [\mathcal{J}^n]^{-1}_{[i,i]} is the i-th element on the diagonal of the inverse of the FIM \mathcal{J}^n. Note that m_{i,n} depends only on h_n,\ldots,h_i, and consequently the FIM is lower-triangular. Thus, since we are only interested in the diagonal terms of [\mathcal{J}^n]^{-1}, we only need to compute the diagonal terms of \mathcal{J}^n to evaluate the CRB for each hop in the network.

The CRB can be explicitly evaluated for n = 2. This case is treated in [59], where a lower bound on the amplify-and-forward distortion region is determined to be:

\frac{1}{D_1} \le 1 + \frac{P_0T_1T_2}{2\sigma_1^2}\left(1 - \sqrt{2\pi}\,\alpha\,e^{\alpha^2/2}Q(\alpha)\right),   (4.32)
\frac{1}{D_2} \le \frac{P_2T_2}{2\sigma_2^2} + \sqrt{2\pi}\,e^{\alpha^2/2}Q(\alpha)\left(\alpha + \frac{1}{\alpha} + \frac{P_0T_1T_2}{2\alpha\sigma_1^2}\right)   (4.33)

where

\alpha = \frac{\sigma_2}{\sqrt{P_2T_1}\,\beta_2^1\,\sigma_1} = \sqrt{\frac{SNR_1T_1+1}{SNR_2}} \quad and \quad Q(x) = \frac{1}{\sqrt{2\pi}}\int_x^{\infty} e^{-t^2/2}\,dt.   (4.34)

Further, in contrast to the encode-and-forward case, [59] also establishes that the distortion exponent of the CRB for n = 2 is unity – the best possible.
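The closed-form two-hop bounds (4.32)-(4.34) are straightforward to evaluate; the sketch below does so using SciPy's Gaussian tail function for Q(\cdot), with illustrative powers, times and noise variances.

import numpy as np
from scipy.stats import norm

def crb_bounds(P0, P2, T1, T2, s1, s2):
    # s1, s2 are the noise variances sigma_1^2, sigma_2^2
    snr1, snr2 = P0 / s1, P2 / s2
    alpha = np.sqrt((snr1 * T1 + 1.0) / snr2)
    q = norm.sf(alpha)                                 # Q(alpha)
    g = np.sqrt(2.0 * np.pi) * np.exp(alpha**2 / 2.0) * q
    inv_d1 = 1.0 + (P0 * T1 * T2 / (2.0 * s1)) * (1.0 - alpha * g)
    inv_d2 = (P2 * T2 / (2.0 * s2)
              + g * (alpha + 1.0 / alpha + P0 * T1 * T2 / (2.0 * alpha * s1)))
    return 1.0 / inv_d1, 1.0 / inv_d2                  # lower bounds on D1, D2

for P in (1.0, 10.0, 100.0):
    d1, d2 = crb_bounds(P, P, 4, 4, 1.0, 1.0)
    print(f"P={P}: D1 >= {d1:.4e}, D2 >= {d2:.4e}")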
For n \ge 3, directly computing the CRB for the first channel h_1 is challenging due to the presence of the so-called "product channel" – determining the exact densities of the m_{1,n} for n \ge 3 is computationally intractable. Thus we resort to computing upper and lower bounds on the CRB and then show that these bounds are asymptotically tight in a distortion exponent sense. Our upper bound follows as a simple consequence of the Arithmetic Mean–Harmonic Mean inequality (see [1], for example), which states that for any collection of n positive real numbers {x_1,x_2,\ldots,x_n}:

\frac{x_1+\cdots+x_n}{n^2} \ge \left(\frac{1}{x_1}+\cdots+\frac{1}{x_n}\right)^{-1}.   (4.35)

We require the following lemma to establish the lower bound on the CRB.

Lemma 13. If h_1,\ldots,h_n are each distributed according to the law \mathcal{N}(0,1), the following inequality holds:

E\left[\frac{A\prod_{i=1}^n|h_i|^2}{A_0 + A_1|h_n|^2 + A_2|h_n|^2|h_{n-1}|^2 + \cdots + A_{n-1}|h_n|^2|h_{n-1}|^2\cdots|h_2|^2}\right] \ge \frac{AB_n}{A_0+A_1+\cdots+A_{n-1}}   (4.36)

where A, A_0, \ldots, A_{n-1} are arbitrary positive constants and B_n = \exp(n(\gamma/2-\ln 2)), with \gamma the Euler–Mascheroni constant, \gamma = 0.57721566\ldots.

Proof. Denote the denominator inside the expectation in (4.36) by DEN and consider the following chain:

L.H.S. = E\left[A\exp\left(\log\frac{\prod_{i=1}^n|h_i|^2}{DEN}\right)\right]
= E\left[A\exp\left(\sum_{i=1}^n\log|h_i|^2 - \log(DEN)\right)\right]   (4.37)
\ge A\exp\left(\sum_{i=1}^n E\log|h_i|^2 - E\log(DEN)\right)   (4.38)
= AB_n\exp\left(-E\log(A_0 + A_1|h_n|^2 + \cdots + A_{n-1}|h_n|^2|h_{n-1}|^2\cdots|h_2|^2)\right)   (4.39)
\ge AB_n\exp\left(-\log(A_0 + A_1E|h_n|^2 + \cdots + A_{n-1}E[|h_n|^2|h_{n-1}|^2\cdots|h_2|^2])\right)   (4.40)
= AB_n\exp(-\log(A_0 + A_1 + A_2 + \cdots + A_{n-1}))   (4.41)
= \frac{AB_n}{A_0+A_1+\cdots+A_{n-1}} = R.H.S.   (4.42)

Equation (4.38) follows by Jensen's inequality, since exp is a convex function; Equation (4.39) follows from the identity

E\log|h_i|^2 = \int_{-\infty}^{\infty}\ln(x^2)\,\frac{e^{-x^2/2}}{\sqrt{2\pi}}\,dx = \frac{\gamma}{2} - \ln 2   (4.43)

where \gamma is the Euler–Mascheroni constant (see [1], for example); and Equation (4.40) follows by Jensen's inequality again, since -\log is a convex function and exp is monotonic. The lemma is established.

We are now in a position to establish a theorem bounding the CRB for any n.

Theorem 9. The CRB for h_1 in an n > 2 hop amplify-and-forward network is bounded above via

\overline{CRB}^{-1} \ge 1 + \frac{B_n\prod_{i=1}^n(P_iT_i^2)\prod_{i=2}^n(\beta_i^1)^2}{(n!)^2\sum_{i=0}^{n-1}\prod_{j=n-i}^{n}\frac{P_j(\beta_j^1)^2T_j}{j}}   (4.44)

where B_n is as defined in Lemma 13, and bounded below via

\overline{CRB}^{-1} \le 1 + \frac{\prod_{i=1}^n(P_iT_i^2)\prod_{i=2}^n(\beta_i^1)^2}{(n!)^2\,nT_n\sigma_n^2} + \sum_{i=1}^{n-1}\frac{\prod_{j=1}^{n-i-1}(P_jT_j)\prod_{j=1}^{n}T_j\prod_{j=2}^{n-i-1}(\beta_j^1)^2}{n!\,(n-i-1)!}.   (4.45)

Proof. Let us evaluate the diagonal term contributing to the CRB for the first hop in an n-hop network:

\mathcal{J}_{1,1}^n = 1 + E\left[\frac{(E_h[m_{1,n}])^2}{var_h[m_{1,n}]}\right]   (4.46)

where the outer expectation is with respect to the flat fading coefficients h_1,\ldots,h_n, and

(E_h[m_{1,n}])^2 = \frac{\prod_{i=1}^n(P_i|h_i|^2T_i^2)\prod_{i=2}^n(\beta_i^1)^2}{(n!)^2}   (4.47)
var_h[m_{1,n}] = \frac{T_n}{n}\left(\sigma_n^2 + P_n(\beta_n^1)^2|h_n|^2\,var_h[m_{1,n-1}]\right)   (4.48)

from Equations (4.28) and (4.29). To obtain an upper bound on the CRB (which is the inverse of Equation (4.46)), we need a lower bound on \mathcal{J}_{1,1}^n. Towards this end, we exploit Lemma 13 with

A = \frac{\prod_{i=1}^n(P_iT_i^2)\prod_{i=2}^n(\beta_i^1)^2}{(n!)^2}, \quad A_0 = \frac{T_n\sigma_n^2}{n}, \quad A_i = \prod_{j=n-i}^{n}\frac{P_j(\beta_j^1)^2T_j}{j}, \quad 0 < i \le n-1.   (4.49)

We then obtain the following lower bound on \mathcal{J}_{1,1}^n:

\mathcal{J}_{1,1}^n \ge 1 + \frac{B_n\prod_{i=1}^n(P_iT_i^2)\prod_{i=2}^n(\beta_i^1)^2}{(n!)^2\sum_{i=0}^{n-1}A_i}   (4.50)

where B_n is as in Lemma 13, and substituting the values of the A_i establishes the upper bound on the CRB. Now we establish the lower bound on the CRB. From Equation (4.46), we again have

\mathcal{J}_{1,1}^n = 1 + E\left[\frac{A\prod_{i=1}^n|h_i|^2}{A_0 + \sum_{i=1}^{n-1}A_i\prod_{j=n+1-i}^{n}|h_j|^2}\right]   (4.51)
= 1 + E\left[\left(\frac{A_0}{A\prod_{i=1}^n|h_i|^2} + \sum_{i=1}^{n-1}\frac{A_i}{A\prod_{i=1}^{n-i}|h_i|^2}\right)^{-1}\right]   (4.52)

where A, A_0 and A_i are as in Equation (4.49). Using the AM \ge HM inequality (see [1]) in the form 1/AM \le 1/HM, and considering the denominator of Equation (4.52) as an arithmetic mean of n quantities, we have the upper bound

\mathcal{J}_{1,1}^n \le 1 + E\left[\frac{A\prod_{i=1}^n|h_i|^2}{n^2A_0} + \sum_{i=1}^{n-1}\frac{A\prod_{i=1}^{n-i}|h_i|^2}{n^2A_i}\right] = 1 + \frac{A}{n^2A_0} + \sum_{i=1}^{n-1}\frac{A}{n^2A_i}   (4.53)

which corresponds to the HM of the terms used in the AM, and where we have used the assumptions that E[|h_i|^2] = 1 for all i and that the h_i are independent of each other. Again, note that Equation (4.53) leads to a lower bound on the CRB. Substituting the values of A, A_0 and A_i establishes the theorem.

Once we have Theorem 9, it is straightforward to establish that the corresponding distortion exponent for a linear amplify-and-forward network is unity (the best achievable). Note that we need to evaluate the distortion exponent of the upper bound too, since we are not guaranteed that it is bounded by unity (our expression is not an MMSE expression).

Corollary 5. The distortion exponent of the CRB for the first hop channel in a linear amplify-and-forward network with n hops is unity.

Proof.
We first introduce the \doteq notation to denote that two terms have the same exponential order, i.e.

\nu_1 \doteq \nu_2 \iff \lim_{SNR\to\infty}\frac{\nu_1}{\nu_2} < \infty \quad and \quad \lim_{SNR\to\infty}\frac{\nu_2}{\nu_1} < \infty.   (4.54)

For n = 2, we have already established this result in Equations (4.32) and (4.33). Therefore, by mathematical induction, we need to establish that the distortion exponent for h_1 is unity for an arbitrary n-hop network. Now we observe that

\lim_{SNR_i\to\infty} P_i(\beta_i^1)^2 = \lim_{SNR_i\to\infty}\frac{i^2\,SNR_i}{T_i(SNR_iT_i+i)} \to \frac{i^2}{T_i^2} = constant \quad \forall i.   (4.55)

Hence we can neglect all terms which include these products for the purpose of evaluating the distortion exponent. In Equation (4.44), the numerator has one more P term than the denominator, which instead has an additional \sigma_n^2 term; this leads to the final distortion exponent result:

\mathcal{J}_{1,1}^n \ge 1 + \frac{B_n\prod_{i=1}^n(P_iT_i^2)\prod_{i=2}^n(\beta_i^1)^2}{(n!)^2\sum_{i=0}^{n-1}\prod_{j=n-i}^{n}\frac{P_j(\beta_j^1)^2T_j}{j}} \doteq SNR.   (4.56)

Equation (4.56) leads to a distortion exponent of unity for the upper bound on the CRB for the first hop in an n-hop network. Now, using Equation (4.55) in Equation (4.45), we have, for the lower bound on the CRB for the first hop in an n-hop network,

\mathcal{J}_{1,1}^n \le 1 + \frac{\prod_{i=1}^n(P_iT_i^2)\prod_{i=2}^n(\beta_i^1)^2}{(n!)^2\,nT_n\sigma_n^2} + \sum_{i=1}^{n-1}\frac{\prod_{j=1}^{n-i-1}(P_jT_j)\prod_{j=1}^{n}T_j\prod_{j=2}^{n-i-1}(\beta_j^1)^2}{n!\,(n-i-1)!}   (4.57)
\doteq SNR   (4.58)

which proves that the distortion exponent for the CRB of the first hop of an n-hop network is unity.

Since we allocate the time available at each node uniformly towards the estimation of each of the previous channel fading parameters, we should have d_1 \le d_2 \le d_3 \le \cdots \le d_n. Further, we already know from Section 4.2 that the distortion exponent is always upper bounded by unity. We have thus established that the CRB for the amplify-and-forward scenario yields a distortion exponent of unity for the estimation of each of the hops of the channel.

The CRB provides a lower bound on the estimation variance, but says nothing about the achievability of the bound. To assess how tight this bound is, we investigated several estimation algorithms for the channel coefficients at the destination. Two logical estimators to consider are the maximum likelihood (ML) estimator and the conditional mean (CM) estimator. The latter is the MMSE estimator; both the CM and ML estimators are non-linear due to the structure of the received signal, and both appear to be intractable for even a moderate number of hops. This is in large part due to the presence of the product channel, which necessitated the consideration of upper and lower bounds on the CRB. An iterative algorithm based on estimating h_n first, then using \hat{h}_n to estimate \hat{h}_{n-1}, etc., was also examined; however, if \hat{h}_k \approx 0, this approach yields unbounded estimates for the other channels.

To motivate the estimation scheme employed in this chapter, we first consider the linear unbiased estimator (LUE) for h_i, i.e. \hat{h}_{i,LUE} = k_im_{i,n}. The constant k_i is selected to minimize the MSE:

E\|\hat{h}_{i,LUE} - h_i\|^2 = k_i^2\left(\prod_{j=i}^{n}(P_j\beta_j^nT_j) + E[var_h[m_{i,n}]]\right) + 1.   (4.59)

It is clear from Equation (4.59) that the LUE is a pathological one, since k_i = 0 for all i minimizes Equation (4.59), and hence \hat{h}_{i,LUE} = 0 for all i. Thus, based on our prior investigations, we propose the following estimator structure, which we term the ratio LUE (RLUE) estimator. First we define the product channel parameter \theta_i = \prod_{k=i}^{n}h_k. We next form the LUE for \theta_i at the n-th node:
\hat{\theta}_i = \frac{n!\,m_{i,n}}{\prod_{j=1}^{i}\sqrt{P_j}\,\beta_j^iT_j}.   (4.60)

Further, the statistics of \hat{\theta}_i are given by

\hat{\theta}_i|h_1,\ldots,h_n \sim \mathcal{N}\left(\theta_i,\;\frac{(n!)^2\,var_h[m_{i,n}]}{(i!)^2\prod_{j=1}^{i}P_j(\beta_j^i)^2T_j^2}\right)

where var_h[m_{i,n}] is given in Equation (4.29). Note that [42] also estimates the product channel, for two-hop networks, using LMMSE; however, our problem requires the individual channel estimates as opposed to estimates of the product channel. Our channel estimates are now given by

\hat{h}_n = \hat{\theta}_n \quad and \quad \hat{h}_i = \frac{\hat{\theta}_i}{\hat{\theta}_{i+1}}.   (4.61)
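Before analyzing the RLUE asymptotically, it is instructive to simulate the ratio idea of (4.60)-(4.61) for two hops. The observation model below is a simplified stand-in for the cumulated statistics m_{i,n} (unbiased noisy observations of \theta_1 = h_1h_2 and \theta_2 = h_2), and a threshold is applied to the divisor, anticipating the TRLUE of (4.74) below; all parameters, including the threshold \tau, are illustrative. Note how the thresholding leaves a residual distortion floor at high SNR.

import numpy as np

rng = np.random.default_rng(3)

def rlue_two_hop(P=100.0, s2=1.0, T=8, trials=50_000, tau=0.05):
    h1, h2 = rng.normal(0, 1, trials), rng.normal(0, 1, trials)
    # unbiased noisy statistics for theta_1 = h1*h2 and theta_2 = h2
    th1 = h1 * h2 + rng.normal(0, np.sqrt(s2 / (P * T)), trials)
    th2 = h2 + rng.normal(0, np.sqrt(s2 / (P * T)), trials)
    h2_hat = th2
    # thresholded ratio: avoid blow-ups when the divisor is in a deep fade
    h1_hat = np.where(np.abs(th2) >= tau,
                      th1 / np.where(th2 == 0, 1.0, th2), 0.0)
    return np.mean((h1 - h1_hat) ** 2), np.mean((h2 - h2_hat) ** 2)

for P in (1e2, 1e3, 1e4):
    d1, d2 = rlue_two_hop(P=P)
    print(f"P={P:.0e}: D1~{d1:.4f}, D2~{d2:.6f}")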
The asymptotic performance of the RLUE scheme is studied in the following theorem.

Theorem 10. The distortion exponent of the RLUE scheme is unity.

Proof. We are interested in \lim_{SNR\to\infty}\min\{E[|\hat{h}_i-h_i|]^2,1\}. Since this value is always bounded above by 1 (the variance of h_i), by the bounded convergence theorem we can switch the order of the limit and the expectation and evaluate \min\{E[\lim_{SNR\to\infty}|\hat{h}_i-h_i|]^2,1\} to get the same result. Consider

D_i = \min\left\{E\,E_h\big(\hat{h}_i-h_i\big)^2,\,1\right\} \overset{(a)}{=} \min\left\{E\,E_h\left(\frac{\hat{\theta}_i}{\hat{\theta}_{i+1}}-h_i\right)^2,\,1\right\}   (4.62)
= \min\left\{E\,E_h\left(\frac{\hat{\theta}_i-\hat{\theta}_{i+1}h_i}{\hat{\theta}_{i+1}}\right)^2,\,1\right\}   (4.63)
\overset{(b)}{=} \min\left\{E\,E_h\left(\frac{\Delta\hat{\theta}_i}{\hat{\theta}_{i+1}}\right)^2,\,1\right\}   (4.64)

where we have used the law of iterated expectations, (a) follows from Equation (4.61) and the statistics of \hat{\theta}_i and \hat{\theta}_{i+1}, and (b) holds with the definition \Delta\hat{\theta}_i = \hat{\theta}_i - \hat{\theta}_{i+1}h_i, a random variable with

\Delta\hat{\theta}_i|h_1,\ldots,h_n \sim \mathcal{N}\left(0,\;\frac{(n!)^2var_h[m_{i+1,n}]}{((i+1)!)^2\prod_{j=1}^{i+1}P_j(\beta_j^i)^2T_j^2} + \frac{(n!)^2var_h[m_{i,n}]\,h_i^2}{(i!)^2\prod_{j=1}^{i}P_j(\beta_j^i)^2T_j^2}\right).

It is known that the ratio of two standard normal variables has a Cauchy distribution (see [1], for example), which has undefined mean and variance. However, when the two independent, normally distributed random variables have different means and variances, Y_1 \sim \mathcal{N}(\mu_1,\sigma_1^2) and Y_2 \sim \mathcal{N}(\mu_2,\sigma_2^2), the probability density function of their ratio, R = Y_1/Y_2, is given by

f_R(r) = \frac{e^{-c/2}}{\pi\sigma_1\sigma_2\,a(r)} + \frac{b(r)}{\sqrt{2\pi}\,\sigma_1\sigma_2\,(a(r))^{3/2}}\exp\left(-\frac{1}{2}\left(c - \frac{b(r)^2}{a(r)}\right)\right)\mathrm{erf}\left(\frac{b(r)}{\sqrt{2a(r)}}\right)

where

a(r) = \frac{r^2}{\sigma_1^2} + \frac{1}{\sigma_2^2}, \quad b(r) = \frac{\mu_1r}{\sigma_1^2} + \frac{\mu_2}{\sigma_2^2}, \quad c = \frac{\mu_1^2}{\sigma_1^2} + \frac{\mu_2^2}{\sigma_2^2},

and \mathrm{erf}(x) = \frac{2}{\sqrt{\pi}}\int_0^x e^{-t^2}dt is the error function. The first term in f_R can be interpreted as the Cauchy part of the distribution and corresponds to the case when f_{Y_2}(0) > 0. If \sigma_2 \ll \mu_2, we can handle this term using a formal Taylor series expansion (see [1], for example). In our case, \mu_2 = \theta_{i+1} and \sigma_2^2 = \frac{(n!)^2var_h[m_{i+1,n}]}{((i+1)!)^2\prod_{j=1}^{i+1}P_j(\beta_j^i)^2T_j^2}. Further, since we are operating in the limit of large SNR, we can always find SNR_0 such that for all SNR > SNR_0 we have \sigma_2 \ll \mu_2, and then the Taylor series expansion for \hat{\theta}_{i+1}^{-1} = (\theta_{i+1}+\tilde{\theta}_{i+1})^{-1}, with \tilde{\theta}_{i+1} \sim \mathcal{N}(0,\sigma_2^2), becomes

\frac{1}{\theta_{i+1}+\tilde{\theta}_{i+1}} = \frac{1}{\theta_{i+1}}\left(1 + \frac{\tilde{\theta}_{i+1}}{\theta_{i+1}}\right)^{-1}   (4.65)
= \frac{1}{\theta_{i+1}}\left(1 - \frac{\tilde{\theta}_{i+1}}{\theta_{i+1}} + \frac{\tilde{\theta}_{i+1}^2}{\theta_{i+1}^2} - \ldots\right).   (4.66)

Let us now restrict attention to the case |h_i| \ge \epsilon for all i, for some \epsilon > 0. Since {h_i}_{i=1}^n are standard Gaussian random variables, this happens with probability p_1 = (2Q(\epsilon))^n, where Q(\cdot) is the complementary cumulative distribution function of a standard normal random variable. When |h_i| \ge \epsilon for all i, we can make the SNR so large that Equation (4.66) is well approximated by 1/\theta_{i+1}. Note that we can make \epsilon as small as we wish by driving the SNR larger and larger; in the limit of infinite SNR we have p_1 \to 1 and

D_i \simeq E\left[\left(\frac{(n!)^2var_h[m_{i+1,n}]}{((i+1)!)^2\prod_{j=1}^{i+1}P_j(\beta_j^i)^2T_j^2} + \frac{(n!)^2var_h[m_{i,n}]h_i^2}{(i!)^2\prod_{j=1}^{i}P_j(\beta_j^i)^2T_j^2}\right)\frac{1}{|\theta_{i+1}|^2}\right]   (4.67)
= E\left[\left(\frac{(n!)^2var_h[m_{i+1,n}]}{((i+1)!)^2\prod_{j=1}^{i+1}P_j(\beta_j^i)^2T_j^2} + \frac{(n!)^2var_h[m_{i,n}]h_i^2}{(i!)^2\prod_{j=1}^{i}P_j(\beta_j^i)^2T_j^2}\right)\frac{1}{|h_{i+1}|^2|h_{i+2}|^2\cdots|h_n|^2}\right]   (4.68)
\doteq \frac{f(\epsilon)}{SNR}   (4.69)

where we exploit the fact that the SNR is large, as in Theorem 9, and f(\epsilon) is a deterministic, bounded function of \epsilon (once n is fixed) which is evaluated once we note that {h_i}_{i=1}^n are i.i.d. standard normal random variables; under the assumption |h_i| \ge \epsilon for all i, we use the integral

2\int_{\epsilon}^{\infty}\frac{e^{-t^2/2}}{t^2\sqrt{2\pi}}\,dt = \mathrm{erf}\left(\frac{\epsilon}{\sqrt{2}}\right) + \sqrt{\frac{2}{\pi}}\,\frac{e^{-\epsilon^2/2}}{\epsilon} - 1 = G(\epsilon) \;(say)   (4.70)

which establishes the theorem.

We have excluded an elaborate discussion of how f(\epsilon) is calculated, since the computation is similar to the discussion in Theorem 9. For example, when n = 2, Equation (4.68) can be evaluated (substituting the various statistics involved) as

D_i = E\left[\frac{2\sigma_1^2}{P_1T_1T_2} + \frac{2\sigma_2^2}{P_1P_2(\beta_1^1)^2T_1^2T_2h_2^2} + \frac{h_1^2\sigma_2^2}{\sigma_2^2 + \frac{P_2T_2}{2}h_2^2}\right]   (4.71)
= \frac{2}{SNR\,T_1T_2} + E\left[\frac{2}{SNR\,T_2h_2^2}\right] + E[h_1^2]\,E\left[\frac{1}{h_2^2}\right]\frac{2}{2+SNR\,T_2}   (4.72)
= \frac{2}{SNR\,T_1T_2} + \frac{2G(\epsilon)}{SNR\,T_2} + \frac{2G(\epsilon)}{2+SNR\,T_2}   (4.73)

where we have used Equations (4.55) and (4.70) along with the independence of h_1 and h_2.

While our results for the RLUE hold asymptotically in SNR, in practical implementations (even at reasonably high SNR) we always have the possibility that \hat{\theta}_{i+1} is extremely small. In such situations, the RLUE scheme exhibits unpredictable behavior. We therefore propose the thresholded ratio LUE (TRLUE), a more practical version of the RLUE:

\hat{h}_n = \hat{\theta}_n \quad and \quad \hat{h}_i = \frac{\hat{\theta}_i}{\hat{\theta}_{i+1}} \;\; if \;\; |\hat{\theta}_{i+1}| \ge \tau_i, \quad and \;\; 0 \;\; otherwise,   (4.74)

where we note that the thresholds {\tau_i}_{i=1}^n are determined experimentally. The inclusion of the threshold is motivated by our previous discussion of maximum likelihood estimators; that is, the threshold prevents small estimates of the channels due to deep fades from yielding pathologically large estimates of previous channels. Note that the TRLUE has degraded performance compared to the RLUE at high SNR because of the distortion hit we take due to the thresholding. The performance of this scheme for one particular network is presented in Section 4.6.

The following section presents results for a network topology different from the linear one we have considered thus far. Some of the results presented in the previous sections are employed, and it is established that it is advantageous (from an end-to-end distortion perspective) to layer the nodes of a network as hierarchically as possible.

4.5 Many-to-One Relay: Protocols and Bounds

Our ultimate goal is to investigate the joint sensing and communication problem for arbitrary network topologies. To this end, an important generalization of the linear network is a two-level network where multiple nodes communicate to a relay and the relay then communicates over a single channel to the destination. We call this topology the many-to-one network. In this section, we first determine an appropriate communication protocol for the multiple sources to the relay node (see Figure 1.1). As the initial source nodes merely transmit training, our strategy is applicable to both the encode-and-forward and the amplify-and-forward protocols. After the optimal first hop protocol is determined, we examine the end-to-end distortions for the encode-and-forward and amplify-and-forward approaches.

4.5.1 Signal Model

The relay receives training signals from M sources for the M channels.
The relay then jointly codes the estimated channels if we employ the encode-and-forward approach, or amplifies each of the received signals if we employ the amplify-and-forward approach. This signal is then transmitted over the final link to the destination. The destination must then estimate all M+1 channels of interest (M channels from the sources to the relay and one channel from the relay to the destination). Let T_R be the total time available for communication between the source nodes and the relay node, and let I_j be the sub-interval of [0,T_R] over which source j transmits training. The transmitted signal from each source is given by

X_S^j(i) = \sqrt{P_0^j}, \; i \in I_j \quad and \quad Y_1(i) = \sum_{j=1}^{M}\sqrt{P_0^j}\,h_j\,1_{(i\in I_j)} + Z_1(i), \quad 1 \le i \le T_R   (4.75)

where 1_{(\cdot)} is the indicator function and, again as in Section 4.2, the elements not presented here are assumed to be zero. Note that I_j \cap I_k may not be empty; sources may transmit in overlapping intervals. At the relay, we accumulate over each interval I_k to form a set of statistics for estimation. We note that these statistics are sufficient for the source-to-relay estimation problem,

Y_R = A\sqrt{P}\,h + n \quad where \quad P = diag\big(P_0^1,P_0^2,\ldots,P_0^M\big)   (4.76)

and where the vector signals above are defined as Y_R = [Y_{R1},Y_{R2},\ldots,Y_{RM}]^T, h = [h_1,h_2,\ldots,h_M]^T and n = [n_1,n_2,\ldots,n_M]^T. Observe that, due to potentially overlapping signals, n might be colored and is distributed as \mathcal{N}(0,\sigma^2A). Recall that h \sim \mathcal{N}(0,I). Further, note that A_{j,j} = \sum_l 1_{(l\in I_j)} = |I_j| = T_j, the duration of the interval I_j. Since we have a total time constraint across all the source nodes, we impose the additional constraint tr(A) = T_R. The relay node processes its received signal and then transmits over the relay-destination link:

X_1(i) = \sqrt{P_1}\,\beta_1^k a_1^k(Y_1,i), \quad i \in [T-T_R,T],\; k = 1,\ldots,M,   (4.77)
Y_D(i) = Y_2(i) = \sqrt{P_1}\,\beta_1^k a_1^k(Y_1,i)\,h_2 + Z_2(i), \quad i \in [T-T_R,T],\; k = 1,\ldots,M   (4.78)

where the notation is as in Section 4.2.

4.5.2 First Hop Many-to-Relay Protocol

To determine the optimal first hop protocol, we consider the sum distortion of all of the source-to-relay channel estimates. A protocol for the first hop of this problem is defined to be the allocation of time over which training is transmitted. We determine the optimal allocation for the sum distortion cost function at the relay and show that a time orthogonal solution achieves the optimal sum distortion.

Theorem 11. One of the optimal protocols (with respect to the sum distortion at the relay node) for the sources to transmit training over the first hop of a many-to-one network corresponds to time orthogonal multiplexing among the different sources, where we allocate T_i units of time to source i with

T_i = \frac{\sigma_i^2}{P_0^i}\left(\sqrt{\frac{P_0^i}{\mu}}-1\right)^+   (4.79)

where \mu is obtained by solving the equation

\sum_{i=1}^{M}\frac{\sigma_i^2}{P_0^i}\left(\sqrt{\frac{P_0^i}{\mu}}-1\right)^+ = T_R   (4.80)

and (x)^+ is the positive part of x.
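Before turning to the proof, we note that the water-filling level \mu in (4.80) is easily found numerically, since the left-hand side is monotone in \mu. A bisection sketch with illustrative powers and noise variances:

import numpy as np

def allocate(P0, sig2, TR, iters=100):
    # solve (4.80) for mu by bisection, then return the T_i of (4.79)
    P0, sig2 = np.asarray(P0, float), np.asarray(sig2, float)
    def total(mu):
        return np.sum(sig2 / P0 * np.maximum(np.sqrt(P0 / mu) - 1.0, 0.0))
    lo, hi = 1e-12, P0.max()              # total(hi) = 0, total(lo) very large
    for _ in range(iters):                # total() is decreasing in mu
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if total(mid) > TR else (lo, mid)
    mu = 0.5 * (lo + hi)
    return sig2 / P0 * np.maximum(np.sqrt(P0 / mu) - 1.0, 0.0), mu

T, mu = allocate(P0=[10.0, 5.0, 1.0], sig2=[1.0, 1.0, 1.0], TR=12.0)
print("T_i =", np.round(T, 3), " sum =", round(T.sum(), 3), " mu =", mu)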
Proof. Given the signal model in Section 4.5.1, we form the MMSE estimate, ĥ, of h from Y_R. The error covariance matrix is the covariance of h − ĥ and is given by

$$K=I-\sqrt{P}A^T\left(APA^T+\sigma^2A\right)^{-1}A\sqrt{P}.$$

The sum distortion at the relay node is then the trace of this matrix,

$$D_S=\mathrm{tr}(K)=M-\sum_{i=1}^{M}\frac{P_0^i\lambda_i^2}{P_0^i\lambda_i^2+\sigma_i^2\lambda_i}=\sum_{i=1}^{M}\frac{\sigma_i^2}{P_0^i\lambda_i+\sigma_i^2} \qquad (4.81)$$

where {λ_1, ..., λ_M} are the eigenvalues of A. Our desired optimization is then:

$$\text{minimize}\quad\sum_{i=1}^{M}\frac{\sigma_i^2}{P_0^i\lambda_i+\sigma_i^2}\qquad\text{subject to}\qquad\sum_{i=1}^{M}\lambda_i=\mathrm{tr}(A)=T_R. \qquad (4.82)$$

This problem can be solved using Lagrange multipliers and the Karush-Kuhn-Tucker conditions (since each λ_i must be non-negative; see, for example, [8]), yielding a solution in which λ_i is given by the expression for T_i in Equation (4.79) and μ is specified by Equation (4.80). Setting A = diag[λ_1, ..., λ_M], we have a diagonal matrix whose eigenvalues satisfy the conditions outlined above, and hence the theorem is established. In other words, we can allocate source i an orthogonal time interval of duration λ_i (that is, set T_i = |I_i| = λ_i), choose each source to transmit in a distinct interval (that is, set I_i ∩ I_j = ∅ ∀ i ≠ j), and achieve the minimal sum distortion, D_S.

There are other time allocations and protocols which achieve the minimal distortion at the relay; the time-orthogonal allocation is just one. We underscore that Theorem 11 provides the optimal time-sharing for the objective of minimizing the sum distortion at the relay alone. Thus, alternative schemes may provide superior performance when considering the sum distortion at the destination, at the expense of computational intractability or complexity. We next provide the end-to-end distortion with optimized orthogonal signaling for the first hop, considering both amplify-and-forward and encode-and-forward for the second hop.

4.5.3 Amplify-and-Forward

We first provide an upper bound on the distortion exponent.

Corollary 6. For the many-to-one network with M sources, an upper bound on the individual distortion diversities for the end-to-end distortion of any source is 1.

Proof. Let the interval [T − T_R, T] be sub-divided into M+1 orthogonal time slots, I′_i. The relayed signal corresponding to source i is transmitted over I′_i, i = 1,...,M, and training for the relay-to-destination channel is sent over I′_{M+1}. For h_{M+1}, single channel MMSE estimation is possible using the signal within I′_{M+1}, and unit distortion exponent is achievable. For all other channels, joint channel estimation for h_{M+1} and h_i is conducted using signals from intervals I′_{M+1} and I′_i, i = 1,...,M. Thus we have M two-hop links, and by Corollary 5 the upper bound on the distortion exponent is unity. The optimal protocol, which jointly processes the signals from all intervals simultaneously to jointly estimate h_i, i = 1,...,M+1, cannot do worse than the protocol specified above; thus the individual distortion exponent is upper bounded by one, non-trivially.

Determining the CRBs for the M source-to-relay channels under joint estimation with h_{M+1} appears to be intractable; however, the upper bounds on the CRBs of Theorem 9 provide upper bounds for the many-to-one network, as implied above in the determination of the distortion exponent.

4.5.4 Encode-and-Forward

We observe that if we assume a time-orthogonal communication scheme over the first hop, the components of Y_R = [Y_{R1}, ..., Y_{RM}], the received signal at the relay, are statistically independent; that is,

$$Y_R\sim N\!\left(0,\ \mathrm{diag}\left(T_1^2P_0^1+T_1\sigma_1^2,\ \cdots,\ T_M^2P_0^M+T_M\sigma_M^2\right)\right)$$

where T_i = |I_i| ∀i. The next theorem allows us to generalize the results presented in Section 4.3 to arbitrary tree topologies.
Theorem 12. The optimal joint encoding of the M sources at the relay node for a many-to-one topology is a form of reverse water-filling among the estimates of the sources at the relay node, and it achieves a sum distortion (of the sources at the destination) that is lower bounded by:

$$D_{\mathrm{sum}}\geq(M-M')+\sum_{i=1}^{M'}\frac{1}{\mathrm{SNR}_iT_i+1}+M'\,Q_{\min}\,(1+\mathrm{SNR}_{M+1})^{-\frac{T-T_R}{M'}}\exp\!\left(\frac{1}{M'}\sum_{i=1}^{M'}\log\gamma_i\right) \qquad (4.83)$$

where

$$Q_{\min}=\min_{i=1..M}\frac{\mathrm{SNR}_iT_i}{\mathrm{SNR}_iT_i+1}\qquad\text{and}\qquad\gamma_i=\frac{\mathrm{SNR}_iT_i}{(\mathrm{SNR}_iT_i+1)\,Q_{\min}}. \qquad (4.84)$$

Further, the sources are ordered according to the Q_i, and the first M′ sources are the ones jointly encoded for transmission over the second hop.

Proof. To compute the end-to-end distortion for h_i, i = 1,...,M, we observe that the results of Wolf and Ziv [63] are easily extended to vector sources. Let h_M = [h_1, h_2, ..., h_M]^T. Our goal is to bound the end-to-end mean-squared error between h_M and the estimate made for this channel vector at the destination, ĥ_M(Y_D). Let ĥ_i(Y) = E[h_i | Y] be the MMSE estimate of h_i (corresponding to source i) given output Y, and let ĥ_M(Y) = [ĥ_1(Y), ..., ĥ_M(Y)]^T be the column vector of all the estimates given output Y. Now,

$$D_{\mathrm{sum}}=E\left\|h_M-\hat{h}_M(Y_D)\right\|^2=E\left\|h_M-\hat{h}_M(Y_R)\right\|^2+E\left\|\hat{h}_M(Y_R)-\hat{h}_M(Y_D)\right\|^2 \qquad (4.85)$$

$$=\left[\sum_{i=1}^{M}E\left(h_i-\hat{h}_i(Y_R)\right)^2\right]+E\left\|\hat{h}_M(Y_R)-\hat{h}_M(Y_D)\right\|^2 \qquad (4.86)$$

$$=\left[\sum_{i=1}^{M}E\left(h_i-\hat{h}_i(Y_{Ri})\right)^2\right]+E\left\|\hat{h}_M(Y_R)-\hat{h}_M(Y_D)\right\|^2 \qquad (4.87)$$

$$=\left[\sum_{i=1}^{M}\mathrm{MSE}_{Ri}\right]+\sum_{i=1}^{M}E\left\|\hat{h}_i(Y_R)-\hat{h}_i(Y_D)\right\|^2. \qquad (4.88)$$

Equation (4.85) follows from the orthogonality principle, Equation (4.86) from the decomposability of the mean-squared error criterion, and Equation (4.87) from the fact that the components of Y_R are statistically independent, so the conditional mean estimator of h_i conditioned on Y_R is equivalent to that conditioned on Y_{Ri}. Finally, we note that MSE_{Ri} is the MSE for the estimation of h_i at the relay node R. We do not decompose the second term in the MSE because joint coding of parallel Gaussian sources achieves lower distortion than the sum of the distortions due to individual coding [11]. The mean-squared estimation error for the first estimation, at the relay, of h_i is as in Lemma 11:

$$D_i'=\mathrm{MSE}_{Ri}=\frac{1}{\mathrm{SNR}_iT_i+1}. \qquad (4.89)$$

To compute the distortion accrued over the relay-destination link, we once again consider the transmission of a coded source over a noisy link. We summarize the results from rate-distortion theory for the joint coding of parallel Gaussian sources with independent, zero-mean components of variance Q_i [[11], Theorem 13.3.3]. Let Δ be the sum distortion and δ_i the individual distortions; then the rate-distortion function is given by

$$R(\Delta)=\sum_{i=1}^{M}\log\frac{Q_i}{\delta_i}\quad\text{where}\quad\delta_i=\begin{cases}\lambda & \text{if }\lambda<Q_i\\ Q_i & \text{if }\lambda\geq Q_i\end{cases}\quad\text{such that}\quad\sum_{i=1}^{M}\delta_i=\Delta. \qquad (4.90)$$

Given the reverse water-filling solution above, let M′ denote the number of sources with sufficient power to be jointly coded at the relay node R. Then we have

$$R(\delta)=\sum_{i=1}^{M'}\log\frac{Q_i}{\delta}=\sum_{i=1}^{M'}\log\frac{Q_{\min}\gamma_i}{\delta}=M'\log\frac{Q_{\min}}{\delta}+\sum_{i=1}^{M'}\log\gamma_i \qquad (4.91)$$

where

$$Q_{\min}=\min_i\{Q_i:Q_i>\delta\}\qquad\text{and}\qquad\gamma_i=\frac{Q_i}{Q_{\min}}\ (\geq 1) \qquad (4.92)$$

$$\Rightarrow\quad\delta=Q_{\min}\exp\!\left(-\frac{1}{M'}\left[R(\delta)-\sum_{i=1}^{M'}\log\gamma_i\right]\right). \qquad (4.93)$$

We underscore the fact that all coded sources achieve the same distortion level, δ. The ergodic non-coherent capacity of the relay-destination link is bounded as before:

$$C_{M+1}\leq\log\left(1+\frac{P_{M+1}}{\sigma_{M+1}^2}\right).$$

Further, using Equation (4.89), we note that the variance of the estimate ĥ_i(Y_R) is given by:

$$\mathrm{var}(\hat{h}_i(Y_R))=\frac{\mathrm{SNR}_iT_i}{\mathrm{SNR}_iT_i+1}. \qquad (4.94)$$

Plugging our bound for R and Equation (4.94) for Q_i into Equation (4.93), and noting that we have T − T_R units of time available for communication over the relay-destination link, we observe that the individual distortion for each source due to the relay-destination link is bounded as

$$\delta_i\geq Q_{\min}\,(1+\mathrm{SNR}_{M+1})^{-\frac{T-T_R}{M'}}\exp\!\left(\frac{1}{M'}\sum_{i=1}^{M'}\log\gamma_i\right) \qquad (4.95)$$

where

$$\gamma_i=\frac{\mathrm{SNR}_iT_i}{(\mathrm{SNR}_iT_i+1)\,Q_{\min}}\qquad\text{and}\qquad Q_{\min}=\min_{i=1..M}\frac{\mathrm{SNR}_iT_i}{\mathrm{SNR}_iT_i+1}. \qquad (4.96)$$

Now, substituting Equations (4.89) and (4.95) into Equation (4.88), we form our final lower bound on the sum distortion of the source channels at the destination node.
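The reverse water-filling step of the proof (Equation (4.90)) is easy to reproduce numerically. The sketch below is a simplified illustration, with hypothetical estimate variances Q_i and a total rate budget in nats matching the log convention of Equation (4.90).

```python
import numpy as np

def reverse_waterfill(Q, R_total, iters=200):
    """Reverse water-filling (Eq. 4.90): choose a water level lam so that
    delta_i = min(lam, Q_i) and sum_i log(Q_i / delta_i) equals R_total
    (rate in nats). Returns the per-source distortions delta_i."""
    Q = np.asarray(Q, float)

    def rate(lam):
        return np.sum(np.log(Q / np.minimum(lam, Q)))

    lo, hi = 1e-12, Q.max()              # rate(hi) = 0, rate(lo) is huge
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if rate(mid) > R_total:
            lo = mid                      # water level too low: raise it
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return np.minimum(lam, Q)

# Example: three estimate variances sharing R_total = 2 nats over the link;
# sources whose Q_i exceeds the water level all end at the same distortion.
print(reverse_waterfill(Q=[0.9, 0.5, 0.1], R_total=2.0))
```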
From Theorem 12, it is straightforward to determine the distortion exponent for the many-to-one EF protocol.

Corollary 7. Given an M source, many-to-one relay network with the encode-and-forward protocol, and assuming that SNR_j = SNR ∀j, the distortion exponent for estimating h_i for i = 1,...,M is upper bounded by

$$d_i\leq\min\left(\frac{T_i}{T},\ \frac{T-T_R}{MT}\right). \qquad (4.97)$$

Proof. Under the assumption of equal SNR, M′ = M and log γ_i = 0. Further, due to the symmetry, the lower bound on the estimation error of any individual source channel is then simply D_sum/M. It is straightforward to show that lim_{SNR→∞} Q_min = 1. The result then follows by an application of the methods employed in Theorem 1 for the two-hop case, appropriately adjusting the MSE bounds using the expressions of Equation (4.95).

The scheme outlined here can be applied recursively to any number of hops and can be extended to trees in that manner. This generalization is conceptually straightforward but somewhat involved; it does not yield any meaningful intuition beyond what is presented and is thus omitted. Intuitively, given a fixed number of nodes, a hierarchical topology imposed over the nodes (versus, say, a linear topology) reduces the MSE of the channel coefficients. This is because a hierarchical topology minimizes the number of hops from a source to the destination, and an increased number of hops leads to increased distortion. In the sequel, we provide numerical results which underscore the conclusions we have drawn from the developed theory.

4.6 Numerical Results

4.6.1 Linear Topology

Figure 4.1: A graphical demonstration of the contributions to the distortion bounds for the two hop network assuming the encode-and-forward protocol at SNR_1 = SNR_2 = 20 dB.

We first present bounds on the distortion for each hop of the simple two-hop network for the encode-and-forward and amplify-and-forward protocols. We have assumed that SNR_1 = SNR_2 = 20 dB; as such, both links are of the same quality. Figure 4.1 presents the bounds on the distortion for the EF protocol as a function of the time allocated to the first hop, T_1. Not surprisingly, D_2 increases as the time allocated to the second hop decreases. We note that the estimation of h_1 is a function of the quality of both channels. Figure 4.1 also presents the bounds on the CRB for the amplify-and-forward protocol. It is interesting to note that the bounds for the two protocols intersect and that the encode-and-forward curve is lower than the amplify-and-forward curve for certain time allocations.
It is thus seen that at reasonable SNRs, it may sometimes be better to employ the encode-and-forward protocol, even though it has a lower distortion exponent.

Figure 4.2: MSEs for the TRLUE as compared to the encode-and-forward bounds for a two hop network at an SNR of 30 dB as the time allocated to the first hop changes.

Figure 4.2 presents a comparison of the performance of the TRLUE scheme for amplify-and-forward against the upper and lower bounds on encode-and-forward presented in Section 4.3 and the CRB for two hops presented in Section 4.4. Here, SNR = 30 dB for each of the two nodes. This plot shows that while amplify-and-forward achieves asymptotic superiority over encode-and-forward (as evidenced by the distortion exponent results; see Figure 4.3), at low to moderate SNRs encode-and-forward is clearly superior to the realizable TRLUE. This speaks to the need for further investigation of estimator structures for both protocols.

We next examine three-hop linear networks. Figure 4.3 compares the upper and lower bounds for the distortion of h_1 computed for the amplify-and-forward (Section 4.4) and encode-and-forward protocols (Section 4.3) when T_1 = T_2 = T_3 = 1/3. We note that the amplify-and-forward curves are for the CRB. We first observe that the amplify-and-forward upper and lower bounds are essentially coincident for the range of SNRs considered.

Figure 4.3: Comparison of the rate of decay of MSE for the first hop with SNR assuming that T_1 = T_2 = T_3 = 1/3.

The decay rates of the three sets of curves are those predicted by the theory, that is: the amplify-and-forward CRB's slope is −1, encode-and-forward's upper bound slope is −1/3, and encode-and-forward's lower bound slope is −1/9. Thus, for very high SNR environments, the promise of amplify-and-forward is significant.

We next examine simulation results for the achieved MSE of the TRLUE algorithm for the amplify-and-forward protocol. The MSE versus SNR for the three-hop case is shown in Figure 4.4. Recall that the TRLUE scheme determines the LMMSE estimator for the product channels, θ_i = Π_{j=i}^n h_j, to determine the estimates of h_i (see Section 4.4). We first observe that the estimators for θ_i achieve unity distortion exponent; the estimators for h_i are of higher variance, but also obey a decay rate of minus one as a function of SNR. Thus, we have an empirical result indicating that estimation algorithms for the amplify-and-forward protocol can indeed achieve the optimal distortion exponent. However, as observed in Figure 4.2, for moderate to low SNR the MSE achieved by TRLUE can be significantly higher than the upper bound on the MSE for the encode-and-forward protocol.

Figure 4.4: MSEs for the TRLUE using three hops assuming that T_1 = T_2 = T_3 = 1/3.

4.6.2 Many-to-One Topology

We next consider the many-to-one topology. Herein, we shall focus on the encode-and-forward protocol. We consider scenarios where the channels for the M sources are the same, that is, SNR_i = SNR ∀ i ≤ M. As such, the distortion bounds are the same for any of the sources.
In Figure 4.5, we examine the effect of increasing the number of sources, M, for SNR = 20 dB. The bounds on the distortion increase as the number of sources increases. This is also seen more directly by examining the distortion exponent bound computed in Corollary 7. Figure 4.6 considers M = 2 and SNR_1 = 20 dB and studies the effect of increasing SNR_2; not surprisingly, increasing SNR_2 improves the distortion. An interesting comparison between Figures 4.5 and 4.6 is the slope at which the distortion on the sources decreases as a function of the time allocated to the first hop. This slope is far more sensitive to the number of sources than it is to the SNR; this is predictable, as MSE ∼ SNR^(−K/M) for a constant K.

Figure 4.5: Comparison of the bounds on distortion of the first hop in a network assuming the encode-and-forward protocol as the number of sources in the first hop changes.

Figure 4.6: Comparison of the bounds on the distortion of the first hop in a two level tree network assuming the encode-and-forward protocol as the SNR for each of the nodes in the network changes.

Finally, we compare the effect of topology structure. In Figure 4.7, we compare the encode-and-forward bounds for a many-to-one network with M = 2 against the two-hop and three-hop linear topologies. SNRs have been appropriately normalized for the number of nodes to allow for a fair comparison.

Figure 4.7: Comparison of the bounds on distortion of the first hop in a network assuming the encode-and-forward protocol as the network topology changes.

We first observe that for the 2-to-1 network and the three-hop linear network there are a comparable number of channels to be estimated; however, the many-to-one network achieves lower distortion. This is because the overall number of hops is smaller, so lower overall distortion is incurred even though the number of channels to be estimated is the same. Note that for the 2-to-1 network, since we assume equal SNRs at each of the nodes, joint encoding and individual encoding yield the same scheme and thus attain the same distortion. However, as the source SNRs become more and more imbalanced, we expect these two schemes to diverge and the joint encoding scheme to perform better.

4.7 Comparisons with Extrinsic Sensing and Multi-terminal Communication

In this chapter, we discuss the problem of estimating intrinsic parameters as we communicate over the channel. Here, we compare this sensing framework and its achievable performance with the more traditional extrinsic sensing paradigm (where the parameters to be sensed lie outside the communication network, e.g., temperature, wind speed, etc.) and with pure communication (in an information-theoretic sense) in multi-node networks. For all single hop systems, optimal performance is trivial to achieve; thus, we examine topologies with more than two hops. Power allocation for parameter estimation in a sensor network subject to a total network power constraint is analyzed in [55]. The system set-up is such that multiple sensors observe a common random parameter and then transmit their observations to a fusion center.
Intermediate relay nodes either amplify-and-forward or estimate-and-forward based on the signals received. A critical aspect of the scenario in [55] is the presence of measurement noise at each sensor node; no analogous feature exists in the sensing problem considered in this chapter. For non-zero measurement noise and an arbitrary power allocation for the many-to-one two-hop network, one can show that the distortion exponent for both amplify-and-forward and estimate-and-forward protocols is zero, as the measurement noise introduces an error floor. However, if the measurement noise is set to zero (perfect measurements at each sensor), one can show that the distortion exponent is unity for both protocols. It is important to observe that both amplify-and-forward and estimate-and-forward protocols transmit soft information from the relay to subsequent nodes. Similarly, amplify-and-forward for our sensing problem can achieve the optimal distortion exponent through the transmission of soft information.

For pure communication, the network topology can impose bottlenecks on performance. Consider the linear network. By applying the cut-set bounds [11] for additive white Gaussian channels, one can show that the capacity of the linear network from relay i to the destination node is given by

$$C_i^\star=\min_{j>i}C_j \qquad (4.98)$$

where C_j is the capacity of the link between nodes j and j+1. Equation (4.98) mimics the distortion exponent of the encode-and-forward scheme in our sensing problem (Corollary 1): in both cases, the worst-case link is the bottleneck that determines network performance. Again we reiterate that amplify-and-forward can achieve the optimal distortion exponent in our sensing scenario. We infer that, fundamentally, linear network sensing performance is not limited by the worst-case link. In fact, we further conjecture that sensing with soft information is not constrained by network topology, while the inherently "hard" nature of decoding is so limited. An interpretation of the encode-and-forward distortion exponent result for our sensing problem is that encode-and-forward must expend a fraction of the optimal distortion exponent to sustain inter-node communication; and communication must be distinguished from sensing. Thus, there appears to be a trade-off between the amount of inter-node communication that needs to be sustained and the best distortion exponent that can be achieved for a given network topology under encode-and-forward in our sensing scenario.
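For concreteness, the cut-set computation of Equation (4.98) above is a one-line bottleneck search; a small sketch with hypothetical link SNRs:

```python
import numpy as np

def linear_network_capacity(link_snrs, i):
    """Cut-set bound of Eq. (4.98): the capacity from relay i to the
    destination is the minimum link capacity C_j = log(1 + SNR_j) over
    the downstream links (0-based indexing)."""
    C = np.log(1.0 + np.asarray(link_snrs, float))   # per-link capacities
    return C[i:].min()                               # bottleneck link

# Example: a four-link linear network; the weakest downstream link governs.
print(linear_network_capacity([10.0, 3.0, 30.0, 1.0], i=1))
```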
Similarly, for many-to-one networks, if we employ the encode-and-forward protocol for our sensing problem, the distortion exponent is bounded by the least time allocated to the links that communicate a particular set of estimates (Corollary 7), which is equivalent to the cut-set bounds for the corresponding communication network. However, again, since the amplify-and-forward protocol can achieve optimal performance (Corollary 6), the fundamental performance of the many-to-one network is again quite different from that of the corresponding communication network.

4.8 Conclusions

In this chapter, we analyze a class of sensing problems for sensor networks where the parameters of interest are intrinsic to the communication network. In particular, we examine the problem of sensing the channels of all active links in a pre-defined network topology, with the goal of achieving high-fidelity channel estimates at a sink node. To this end, we derive upper and lower bounds on performance for two relay protocols: amplify-and-forward and encode-and-forward. The performance measures of interest are the end-to-end distortion and a measure of performance in the high-SNR regime, the distortion exponent. We establish that encode-and-forward cannot achieve the optimal distortion exponent of unity, but amplify-and-forward can. However, at finite SNR we observe that encode-and-forward often achieves performance superior to that of amplify-and-forward. Both linear and tree-type network topologies are examined. We provide further justification for the intuition that network topologies should be layered so as to minimize the number of hops from any source to the sink. An initial comparison with the more traditional extrinsic sensing, as well as with pure communication, reveals that sensing does not share the same network constraints as pure communication; to this end, the transmission of soft information in both intrinsic and extrinsic sensing can result in limited loss, if any.

A topic of interest is further study of practical processing algorithms for both the encode-and-forward and amplify-and-forward frameworks. Additional future work would include the extension of our results to arbitrary tree topologies and other networks.

Chapter 5

Optimal Location for a Mobile Relay

While the previous chapters address the problem of estimating the channel state while simultaneously communicating over it, in this chapter we address a different problem: positioning a mobile relay in a sensor network so as to optimize some global cost function. Particular instances of the cost function range from end-to-end distortion to delay in the network to the total energy consumed by the network. It is interesting to note that some particular realizations of these problems lead to non-convex optimization problems, which can be an issue as the size of the network is ramped up to address larger-scale networks.

5.1 Introduction

We are currently instrumenting robotic boats to have underwater acoustic communications capabilities. These robotic boats are an integral part of the Networked Aquatic Microbial Observing System (NAMOS) project¹, which is used in studies of microbial communities in freshwater and marine environments [16], [53]. The NAMOS testbed is a system of anchored buoys (the static nodes) and a robotic boat (the mobile robot) capable of measuring temperature and chlorophyll concentrations. Thus, the testbed is a collection of static nodes as well as mobile/actuated nodes. A key research goal of the NAMOS project is determining how to coordinate the mobile robots and the static nodes such that the error associated with the estimation of a scalar field is minimized, subject to the constraint that the energy available to the mobile robot is bounded.

¹See http://robotics.usc.edu/~namos for more information.

Figure 5.1: Setup for testing and benchmarking the NAMOS system.

Specifically, if each static node makes a measurement, and the total energy available to the network is known, what path should the mobile robot take to minimize the integrated mean square error (MSE) associated with the reconstruction of the entire field? Typically, it had been assumed that the energy consumed by communication and sensing was negligible compared to the energy consumed by moving the mobile robot [66]. Another common assumption was that the mobile robot could communicate essentially error-free with all the static nodes and acquire sensor readings from them. In the currently implemented NAMOS system, communication is over a multi-hop wireless ad-hoc network described in [16].
This network employs 802.xx radios and thus communicates over the air. With the inclusion of underwater acoustic communication, we will have a proxy for a truly underwater mobile robot such as a glider or small submarine. In this new context, the cost of communications and the variability of the communication channel cannot be ignored. Thus, in addition to examining mobile node location as a function of estimation accuracy, we shall also consider the impact of endeavoring to communicate as well as estimate over the channel. We shall consider both radio wireless channels and underwater acoustic channels.

The problem of communicating over a link while estimating the parameters associated with the link is addressed in [60]. There, it is established that a distortion constraint on the quality of the channel estimates at the destination node can be translated, under some mild conditions, to a constraint on the codebook employed at the encoder. The problem is also generalized to the Multiple Access Channel (MAC) and the two-hop network, where we are interested in estimates of all the channels involved in the network. The results of that work are directly applicable to this problem when we are interested in forming channel estimates in addition to communicating over the network.

We will first give an example of how optimizing the location of a mobile relay node can help in a practical setting. We will then optimize the network design for three particular metrics on which the network can be scored: end-to-end distortion, total delay in the network, and total energy consumed by the network. Once we understand these basic parameters, we will set up the problem of optimizing one metric while holding one or more of the others within particular limits. While the NAMOS testbed system has a number of static nodes, we shall limit our attention to a small number of static sensing nodes. It will be observed that interesting tradeoffs exist even in the context of these idealized sensor network topologies. In fact, even for two-hop networks, we must often contend with non-convex optimization problems, as in the case when we want to minimize end-to-end distortion when one of the links is an underwater link and the other link is over the air (see Section 5.7.1 for example).

In the sensor networking community, adaptive sampling techniques have been used to address the trade-off between network lifetime and estimation accuracy. In the typical setting, the network is used to estimate a field by means of extrapolation from samples: measurements made at the locations where the network nodes are deployed. Approaches to treating this trade-off vary from multi-scale wavelet-based techniques [61] to linear filtering for sampling rate adaptation [31]. In the traditional paradigm, energy expenditure in a sensor network is largely ascribed to communication, so prolonging network lifetime is often equated with switching off the radios on selected sensor nodes. If the sensor network is heterogeneous, and some nodes within it are mobile, then the energy expenditure is largely due to mobility.
The approach due to [33] assumes that many sensors are deployed at the beginning, the correlations between the values of the scalar field at different locations are learned from the data collected from the sensors and the sensors are then redeployed for optimal sampling. In the mobile robotics community, the related problem of efficient data gathering tours by mobile robots has received someattention. Thesettingisusuallyinthecontextofaccurateself-localizationbytherobot[47], or simultaneous self-localization and environment mapping [49] or efficient environment coverage via mobility [64]. In the theoretical computer science community, similar problems have been studied as variants of TSP [2] and [6]. Thisworkdrawsheavilyfromtheresultsfromclassicalestimationtheory(see[56]forexample) and some new developments linking the fields of information theory and estimation theory – [27] in particular. We also employ results derived in [41,43] to solve the minimum distortion problem when we have multiple sensors and a single mobile relay node to facilitate communication with the destination node. We will assume no fading for the remainder of this chapter. Further, we will make simplifying assumptions to drive home the tradeoffs that we want to highlight. This will help us generate baseline estimates for the optimizing parameters which can then be improved by employing the applicable fading models. Finally, we assume a unit bandwidth for each link, for the purposes of this chapter since we are concerned with the optimal location of the relay. The problem under the availability of a band of frequencies can be addressed using the tools presented herein. One particular feature of the underwater channel is the decay in available bandwidth with increasing inter-node distances. This particular issue will be addressed at a later stage. 123 In this chapter, we first address the problem of solving for the optimal location of a mobile relay in a sensor network to optimize end-to-end distortion, total network delay and total net- work energy consumed individually. We then introduce the problem of optimizing for one of these parameters while one or more of the remaining parameters meet certain thresholds. From solvingtheseproblems, weobservethatevenforrelativelysimpletwo-hopnetworks, theoptimiza- tion problems might be non-convex (see 5.7.1 for example). Further, the nature of the problem changes dramatically when we consider links to have exponential degradation with distance (like for underwater links) as opposed to the more classical polynomial degradation seen in over-the-air links. The remainder of this chapter is organized as follows: Section 5.2 describes the data collection setupandpresentsstatisticsforthedatacollectedfromaparticularnetwork. Section5.3provides a brief description of the underwater communication channel. Sections 5.4 and 5.5 present results for the problem of minimizing sum distortion for the case that the base station is at a fixed position and the case that we can optimize over the location of the base station respectively. Section 5.6 introduces the two-hop signal model with a sensor and a mobile relay to assist the sensor communicate its estimates to the destination node. Section 5.7 introduces the problem of minimizing the end-to-end distortion in the estimation of a variable that is sensed at the sensor node. 
The case where there is a minimum guaranteed communication rate between the relay and the destination is also addressed, before the problem is generalized to one with multiple sensors and one mobile relay node. Sections 5.8 and 5.9 address the problems of optimizing the total delay and the total energy, respectively, in communicating a fixed number of bits from the sensor node to the destination. Section 5.10 introduces problems where we are interested in optimizing over two or more of the constraints discussed in the previous sections. Finally, Section 5.11 concludes the chapter and presents avenues for future work.

5.2 Data Collection

5.2.1 Setup

Figure 5.2: Buoy setup to collect data.

The data used for this chapter was collected at James Reserve (Lake Fulmor), CA, from August 28th to September 1st, 2006. Nine buoys were deployed in the lake and extensive amounts of data were collected over the time frame of the experiment. Some raw results are available on the NAMOS webpage. We will be using only a section of the data collected. In particular, we are interested in the surface temperature recorded at each of the buoys. Since surface temperature varies from buoy to buoy, each buoy has information to transmit to the fusion center.

5.2.2 Collected Data

Table 5.1 presents the surface temperature at each of the buoys and their locations for the network presented in Figure 5.2. Note that the data from node 106 has an especially high mean and variance (approximately 33 and 118, respectively), leading one to suspect that the measurements on that node are bad. We will therefore refrain from using the data from this node for the remainder of this chapter and concentrate on the eight remaining nodes as the sources of interest.

Table 5.1: Statistics for the data collected from the network presented in Figure 5.2.

Node Number   Longitude (°W)   Latitude (°N)   Mean (°C)   Variance
101           -116.780733      33.804367       20.1285     0.8016
102           -116.779700      33.804967       19.8805     1.8452
103           -116.779400      33.805217       19.5718     1.9775
106           -116.781117      33.804400       33.6061     118.6918
107           -116.779083      33.805617       19.4780     1.9810
109           -116.780450      33.804600       19.9139     1.1469
110           -116.780067      33.804733       19.9810     1.5435
112           -116.779667      33.805000       19.8571     1.7236
114           -116.780733      33.804483       19.7265     1.0698

5.3 Underwater Communication

Characterizing the underwater acoustic channel has been a subject of great interest in the literature over the past decades [44, 52]. As pointed out in [51], fundamental differences distinguish the underwater acoustic channel from a radio frequency (RF) land mobile channel or a satellite channel. Uniquely for underwater acoustic channels, the signal frequency plays an important role in determining the absorption loss due to the transfer of acoustic energy into heat. Relying on extensive laboratory and field experiment data, the empirical formula for the attenuation coefficient β, with units of dB/km in sea water, is presented in [36] and shown in Equation (5.1):

$$\beta=\begin{cases}\dfrac{0.11f^2}{1+f^2}+\dfrac{44f^2}{4100+f^2}, & f\in(100\,\mathrm{Hz},\,3\,\mathrm{kHz}],\\[8pt] 8.68\times10^3\left(\dfrac{SAf_Tf^2}{f_T^2+f^2}+\dfrac{Bf^2}{f_T}\right)\left(1-6.54\times10^{-4}P\right), & f\in(3\,\mathrm{kHz},\,0.5\,\mathrm{MHz}],\end{cases} \qquad (5.1)$$

where f is the frequency in kHz, A = 2.34×10⁻⁶, B = 3.38×10⁻⁶, S is the salinity, P is the hydrostatic pressure in kg/cm², and

$$f_T=21.9\times10^{\,6-\frac{1520}{T+273}}$$

is a relaxation frequency in kHz, with T the temperature in °C. f_T varies between 59 kHz and 210 kHz as the temperature ranges from 0 °C to 30 °C. For the lower range of frequencies (i.e., 100 Hz to 3 kHz), the empirical formula for β is also known as Thorp's formula.
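As a sanity check, Equation (5.1) is straightforward to evaluate numerically. The sketch below implements both branches exactly as stated (f in kHz); the default salinity, pressure, and temperature for the high-frequency branch are illustrative assumptions.

```python
def attenuation_db_per_km(f_khz, S=35.0, P=1.0, T=20.0):
    """Absorption coefficient beta of Eq. (5.1) in dB/km; f_khz in kHz.
    S (salinity), P (hydrostatic pressure, kg/cm^2) and T (temperature,
    deg C) enter only the high-frequency branch."""
    f2 = f_khz ** 2
    if 0.1 < f_khz <= 3.0:                  # Thorp's formula
        return 0.11 * f2 / (1 + f2) + 44 * f2 / (4100 + f2)
    elif 3.0 < f_khz <= 500.0:              # high-frequency branch
        A, B = 2.34e-6, 3.38e-6
        fT = 21.9 * 10 ** (6 - 1520.0 / (T + 273.0))   # relaxation freq, kHz
        return 8.68e3 * (S * A * fT * f2 / (fT ** 2 + f2)
                         + B * f2 / fT) * (1 - 6.54e-4 * P)
    raise ValueError("frequency outside the range of Eq. (5.1)")

print(attenuation_db_per_km(1.0))    # low-frequency (Thorp) branch
print(attenuation_db_per_km(24.0))   # high-frequency branch
```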
Once the attenuation coefficient is determined, the transmission loss can be evaluated via

$$20\log_{10}\frac{\mathrm{SNR}_d}{\mathrm{SNR}_0}=-\beta d \qquad (5.2)$$

where SNR_0 is the signal-to-noise ratio at the transmitter and SNR_d is that at a distance of d km. The transmission loss due to sound absorption leads to a fundamental bandwidth limitation of the underwater acoustic channel. The other characteristics of the underwater acoustic channel are described in greater detail in [44, 51, 52]; these features include time-varying channels due to Doppler effects, large delay spreads due to sparse multipath, and significant latencies due to the slow speed of sound in water. For the current work, however, we shall focus on the strong attenuation of underwater acoustic propagation as a function of distance.

5.4 Problem Setup With Static Fusion Center

We will model the data collected at each of the sources by an i.i.d. Gaussian source with mean and variance as given in Table 5.1: X_i ∼ N(m_i, σ_i²). Since the amount of data that each node in our network needs to communicate to the fusion center is relatively small, we will assume that the lower frequencies are used for communication between the nodes and the fusion center, and so use Thorp's formula (the first branch of Equation (5.1)). Let us assume a centre frequency of 1 kHz for communication; Equation (5.1) then yields β ∼ 44. However, if we are interested in smaller form-factor devices, we can attempt to use a higher frequency such as 24 kHz, and then β ∼ 124.

Let us model each of the links from the buoys to the fusion center as an AWGN channel with an SNR of SNR_0 at each of the buoys. Given the β we just calculated, we can evaluate the corresponding communication rate R_i for the link between buoy i and the fusion center:

$$R_i=\frac{1}{2}\log_2\left(1+\mathrm{SNR}_0\,10^{-\frac{\beta d}{20}}\right). \qquad (5.3)$$

Let T_i be the fraction of time that we allocate to node i in the TDMA setting. The effective rate at which node i can communicate information to the fusion center is then R_iT_i. Given this rate of communication between node i and the fusion center, if we assume a mean-squared distortion metric, the minimum distortion that we can sustain in the estimation of X_i is

$$D_i=\sigma_i^2e^{-2R_iT_i} \qquad (5.4)$$

by the rate-distortion result for Gaussian random variables. The minimum sum distortion problem can then be formulated as

$$\text{minimize}\quad D=\sum_{i=1}^{n}D_i=\sum_{i=1}^{n}\sigma_i^2e^{-2R_iT_i}\qquad\text{subject to}\qquad\sum_{i=1}^{n}T_i\leq 1\ \text{and}\ T_i\geq 0.$$

We can form the Lagrangian for this problem (see [8] for example) as:

$$L(T_1,\ldots,T_n,\lambda)=\sum_{i=1}^{n}\sigma_i^2e^{-2R_iT_i}+\lambda\left(\sum_{i=1}^{n}T_i-1\right). \qquad (5.5)$$

Setting the partial derivatives with respect to T_i (i = 1...n) and λ to zero and solving, we arrive at the solution:

$$T_i=\begin{cases}\dfrac{\log(2\sigma_i^2R_i)/(2R_i)}{\sum_{j=1}^{n}\log(2\sigma_j^2R_j)/(2R_j)}, & R_i>\dfrac{1}{2\sigma_i^2}\\[8pt] 0, & \text{otherwise}\end{cases} \qquad (5.6)$$

for i = 1...n.
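A short numerical sketch of Equation (5.6), using link rates from Equation (5.3); the buoy distances, source variances, and SNR_0 below are illustrative assumptions rather than the experiment's values.

```python
import numpy as np

def tdma_allocation(sigma2, R):
    """Evaluate Eq. (5.6): nodes with R_i > 1/(2 sigma_i^2) receive time
    proportional to log(2 sigma_i^2 R_i) / (2 R_i); the rest get zero."""
    sigma2, R = np.asarray(sigma2, float), np.asarray(R, float)
    active = R > 1.0 / (2.0 * sigma2)
    w = np.where(active, np.log(2.0 * sigma2 * R) / (2.0 * R), 0.0)
    return w / w.sum()                  # normalize so the T_i sum to one

# Link rates from Eq. (5.3) for hypothetical buoy distances (in km).
beta, SNR0 = 44.0, 1e4
d = np.array([0.05, 0.10, 0.20])
R = 0.5 * np.log2(1.0 + SNR0 * 10.0 ** (-beta * d / 20.0))
sigma2 = np.array([0.80, 1.85, 1.98])
print(tdma_allocation(sigma2, R))
```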
The results of this optimization are presented in Figure 5.3 as a function of the transmit SNR at each buoy. Figure 5.4 presents the number of active buoys that the fusion center communicates with as a function of the transmit SNR of the buoys.

Figure 5.3: Sum distortion at the fusion center as a function of the initial transmit SNR at each of the buoys for f_c = 1 kHz and f_c = 24 kHz.

Figure 5.4: The number of active buoys for the minimum sum distortion problem with a fixed fusion center for f_c = 1 kHz and f_c = 24 kHz.

It is observed that at low transmit SNRs the fusion center selectively communicates with only those nodes that have high source variances (σ_i² as given in Table 5.1), while at high transmit SNRs it performs a form of water-filling (see [11] for example) across the buoys. We also note that high source variances translate to more information content in the information-theoretic sense, so the low-SNR behavior can also be interpreted as selectively communicating with the nodes that have more information to transmit than the others.

Let us assume that all the nodes have the same transmit SNR. As this SNR increases from zero, it is observed that the buoys are communicated with in order of decreasing source variance: the buoy with the highest source variance (node 107) is turned on first, while the one with the lowest source variance (node 101) is turned on last. When SNR_0 is large, all nodes are allocated some time and no T_i is zero. In this case, Equation (5.6) gives T_i ∝ log(2σ_i²R_i)/R_i, so the time allocated to a particular buoy is directly proportional to the log of its observed variance and increases with its distance from the fusion center (since rate decreases as distance increases). It is then straightforward to see that buoy 107 always has the largest time allocation of all the buoys in the network (since it has both the largest variance and the largest distance from the fusion center) and is always the first node to be turned on for communication.

5.5 Problem Setup with a Mobile Fusion Center

We now address the case when the fusion center is mobile; in effect, we optimize the results of the last section as we vary the location of the base station from the fixed one assumed there. Some gains are immediately obvious: at low SNR_0, we would locate the fusion center at the node with the largest variance and turn all the other nodes off, thereby realizing gains over the fixed base station approach. Figures 5.5 and 5.6 present the results of this optimization as a function of SNR_0.

Figure 5.5: The minimum sum distortion at the fusion center as a function of SNR_0, optimized for the location of the fusion center, for f_c = 1 kHz and f_c = 24 kHz. The case when the fusion center is fixed is also shown.

Figure 5.6: The number of active buoys to minimize the sum distortion at the mobile fusion center as a function of SNR_0. The results when the fusion center is fixed are also shown.

We note that while there are gains over the fixed base station scheme, they are not large when f_c = 1 kHz; the gains are more striking when f_c = 24 kHz. We conjecture that this is because of the small distances involved in this network. The largest distance is on the order of hundreds of meters, and for the attenuation coefficient implied by Equation (5.1) at f_c = 1 kHz, the losses over the underwater links are not significant enough to factor majorly into the problem. When f_c = 24 kHz, however, β is large enough that moving the fusion center produces significant gains. We will now present a general framework for the optimization of certain cost functions for networks where the nodes can be mobile.

5.6 The Two-Hop Signal Model

Figure 5.7: Signal model for the two-hop problem with a mobile relay.

Consider the signal model in Figure 5.7 with a sensor, S (or a certain phenomenon to be measured), and a data sink, D, that requires measurements of processes observed at S and is separated from S by a distance of A m. We want to place a mobile relay, which we call the relay node R, between S and D such that a parameter of interest is optimized while facilitating the estimation of a parameter X which is available at S.
The relay faces an additive white Gaussian noise (AWGN) channel (with noise power N_0) from the sensor, and the channel between the relay and the data sink is also modeled as an AWGN channel (again with noise power N_0). We characterize the SR and RD channels by their individual SNRs. For over-the-air transmission,

$$\mathrm{SNR}_1=\frac{P_1}{N_0d_{SR}^{\alpha_1}}\qquad\text{and}\qquad\mathrm{SNR}_2=\frac{P_2}{N_0d_{RD}^{\alpha_2}},$$

while for underwater communication,

$$\mathrm{SNR}_1=\frac{P_1e^{-\alpha_1d_{SR}}}{N_0}\qquad\text{and}\qquad\mathrm{SNR}_2=\frac{P_2e^{-\alpha_2d_{RD}}}{N_0}.$$

The signal model for the SR channel is

$$Y_1=\sqrt{\mathrm{SNR}_1}\,X+N_1 \qquad (5.7)$$

where N_1 ∼ N(0,1), and the signal model for the RD channel is

$$Y_2=\sqrt{\mathrm{SNR}_2}\,X_1+N_2 \qquad (5.8)$$

where, again, N_2 ∼ N(0,1).

5.7 Minimizing Distortion

In this section, we minimize the end-to-end distortion from the phenomenon to the destination node with respect to the location of the mobile relay node in the network. For this section, we assume that communication happens over a very large time horizon and that the transmission powers at S and R are fixed; in later sections, we will optimize over these parameters too.

5.7.1 Pure Estimation

Consider the signal model described in Section 5.6, and suppose we observe a certain random variable at S that we want to describe at D. We want to evaluate the ideal location of the relay node, R, to facilitate this task with minimum distortion between what S observes and what is reconstructed at D. Note that if we locate R close to S, we reduce losses in the transmission of the measurements over the SR link; however, the attenuation between the relay and destination nodes is then maximized, hampering communication over the RD link, and vice versa. We endeavor to quantify this trade-off under some simplifying assumptions.

Let us assume that we are interested in minimizing the MSE between the original phenomenon, X, and its reconstruction X̂ at the destination. For Gaussian variables communicated over AWGN channels, it has been established (see [56] for example) that

$$\hat{X}^*=\frac{\mathrm{SNR}\,\sigma_X^2\,Y}{\mathrm{SNR}\,\sigma_X^2+1} \qquad (5.9)$$

where SNR is the signal-to-noise ratio of the channel, σ_X² is the variance of X, and Y is the signal received at the destination (we shall assume σ_X² = 1 for the remainder of this section for convenience). This is the minimum mean squared error (MMSE) estimate of X at the destination. Further, the minimum mean squared estimation error is given by:

$$\min_{\hat{X}=f(Y)}E[X-\hat{X}]^2=E[X-\hat{X}^*]^2=\frac{1}{\mathrm{SNR}+1}. \qquad (5.10)$$

It has also been established that for Gaussian variables transmitted over AWGN channels, the end-to-end MSE can be written as a sum of the MSEs over each of the individual hops in the network (see [63] for example). Given our assumptions about the channels and their SNRs (see Section 5.6), we then have the following expression for the end-to-end MSE:

$$\mathrm{MSE}=\frac{1}{\mathrm{SNR}_1+1}+\frac{1}{\mathrm{SNR}_2+1} \qquad (5.11)$$

$$=\frac{1}{\frac{P_1}{N_0d_{SR}^{\alpha_1}}+1}+\frac{1}{\frac{P_2}{N_0d_{RD}^{\alpha_2}}+1} \qquad (5.12)$$

$$=\frac{N_0d_{SR}^{\alpha_1}}{P_1+N_0d_{SR}^{\alpha_1}}+\frac{N_0d_{RD}^{\alpha_2}}{P_2+N_0d_{RD}^{\alpha_2}}. \qquad (5.13)$$

Minimizing Equation (5.13) subject to d_SR + d_RD = A m, we obtain the following equation for the optimizing d_SR:

$$\frac{P_1\alpha_1d_{SR}^{\alpha_1-1}}{\left(P_1+N_0d_{SR}^{\alpha_1}\right)^2}=\frac{P_2\alpha_2(A-d_{SR})^{\alpha_2-1}}{\left(P_2+N_0(A-d_{SR})^{\alpha_2}\right)^2}. \qquad (5.14)$$

Note that Equation (5.14) always has a solution, since the L.H.S. is a monotonically increasing function of d_SR (increasing from 0) while the R.H.S. monotonically decreases to 0 with d_SR.
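The optimal d_SR can also be found directly by a one-dimensional grid search over [0, A]. The sketch below evaluates the end-to-end MSE of Equations (5.13) and (5.15) for both decay laws, using as assumptions the example values P_1/N_0 = 1, P_2/N_0 = 4, and A = 1 m from the figures that follow.

```python
import numpy as np

def mse_polynomial(d_sr, A, p1, p2, alpha):
    """End-to-end MSE of Eq. (5.13), over-the-air decay; p_i = P_i/N_0."""
    d_rd = A - d_sr
    return (d_sr**alpha / (p1 + d_sr**alpha)
            + d_rd**alpha / (p2 + d_rd**alpha))

def mse_exponential_first_hop(d_sr, A, p1, p2, alpha1, alpha2):
    """End-to-end MSE of Eq. (5.15): underwater SR link, over-the-air RD."""
    d_rd = A - d_sr
    return (1.0 / (p1 * np.exp(-alpha1 * d_sr) + 1.0)
            + d_rd**alpha2 / (p2 + d_rd**alpha2))

# Grid search for the minimizing relay position on [0, A].
A, p1, p2 = 1.0, 1.0, 4.0
d = np.linspace(0.0, A, 10001)
for alpha in (2, 3, 4):
    mse = mse_polynomial(d, A, p1, p2, alpha)
    print(f"alpha={alpha}: d_SR* = {d[np.argmin(mse)]:.3f}")
mse = mse_exponential_first_hop(d, A, p1, p2, alpha1=5.0, alpha2=2.0)
print(f"underwater first hop (alpha1=5): d_SR* = {d[np.argmin(mse)]:.3f}")
```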
Figure 5.8 presents the end-to-end MSE as a function of the location of the mobile relay for the case where α_1 = α_2 = α, P_1 = 1 and P_2 = 4.

Figure 5.8: End-to-end MSE for the two hop case with pure estimation as the decay exponent changes from 2 to 4.

Note that the end-to-end MSE is a convex function of the distance from the sensor to the relay, and that there is a well-defined global minimum at d_SR ≈ 0.2 m when α = 2. Figure 5.9 presents how the optimal location for the relay varies with the power available at the relay for transmission as α changes. As expected, when the relay has very little power it is optimal to place it close to the destination, and vice versa. Further, we note that as α increases, the convexity of the MSE curves increases, and there is a particular point, P_2/N_0 ∼ 2, at which the optimal location for the mobile node is always close to the middle of the line joining S and D.

Figure 5.9: Optimal location for the relay as a function of the power available at the relay for different decay exponents (α = 2, 3, 4).

Under certain scenarios, one or both hops of this two-hop network might be exclusively underwater. In this case, it is appropriate to choose an exponential decay law for the SNR of those hops. For example, if the first hop (the link between the sensor and the mobile relay) is an underwater link, we can model SNR_1 as SNR_1 = exp(−αd_SR)/N_0. The MSE expression for this case is given by

$$\mathrm{MSE}=\frac{e^{\alpha d_{SR}}}{P_1+N_0e^{\alpha d_{SR}}}+\frac{d_{RD}^{\alpha_2}}{P_2+N_0d_{RD}^{\alpha_2}}. \qquad (5.15)$$

The end-to-end MSE expression above is a non-convex function of d_SR for particular values of α; one such example is shown in Figure 5.10 for α = 5. We note that the MSE curve remains convex until α = 4 and then becomes non-convex. The end-to-end optimization for the MSE can still be solved for all extrema, and one can then search over these points for the one yielding the least MSE. Thus we see that for a pure estimation problem, in general, we can provide a set of locations (corresponding to the extrema of the end-to-end MSE expression) containing the optimal location for the mobile sensor.

Figure 5.10: End-to-end MSE as a function of the location of the mobile relay when the SNR for the first hop obeys an exponential decay, with decay exponents α = 1, 2, 3, 4 and 5 shown.

5.7.2 Estimation Followed by Communication

Consider the signal model in Figure 5.7. We want a minimum dedicated communication rate of R between the relay and D (for control information or otherwise), while the remaining resources are utilized towards communicating information about the parameter X to the destination. The MSE for the estimation of X at the relay is

$$\mathrm{MSE}_1=\frac{1}{\mathrm{SNR}_1+1}. \qquad (5.16)$$

Similarly, the RD link can support a maximal transmission rate of

$$C_2=\frac{1}{2}\log_2(1+\mathrm{SNR}_2). \qquad (5.17)$$

Suppose we need a sustained communication rate of R from the relay to the destination for control purposes, with the remaining capacity used to communicate the estimate of X; we then have an available rate of C_2 − R for communicating estimates of X to the destination. The relay node needs to quantize its estimates using the rate-distortion function of a Gaussian random variable with unit variance,

$$R(E)=\frac{1}{2}\log_2\frac{1}{E}, \qquad (5.18)$$

where E is the estimation error and R(E) is the corresponding minimum rate of transmission (see [11] for example). After some algebra, we observe that an achievable distortion over the second hop is

$$\mathrm{MSE}_2=\frac{2^{2R}}{\mathrm{SNR}_2+1}. \qquad (5.19)$$
Therefore, the end-to-end MSE is given by

$$\mathrm{MSE}=\mathrm{MSE}_1+\mathrm{MSE}_2 \qquad (5.20)$$

$$=\frac{1}{\mathrm{SNR}_1+1}+\frac{2^{2R}}{\mathrm{SNR}_2+1}. \qquad (5.21)$$

Substituting for SNR_1 and SNR_2 and minimizing this expression with respect to d_SR, we obtain

$$\frac{P_1\alpha_1d_{SR}^{\alpha_1-1}}{\left(P_1+N_0d_{SR}^{\alpha_1}\right)^2}=\frac{P_2\alpha_2(A-d_{SR})^{\alpha_2-1}\,2^{2R}}{\left(P_2+N_0(A-d_{SR})^{\alpha_2}\right)^2}. \qquad (5.22)$$

To highlight the difference that even a relatively minor required communication rate imposes, consider Figures 5.11 and 5.12, which present the MSE as a function of d_SR and the optimal location of the mobile sensor as a function of the power available for transmission at the relay node, respectively, for different required communication rates. It is seen that even R = 1 bps causes a major change in the nature of the problem.

Figure 5.11: End-to-end MSE as a function of d_SR with required communication rates of 0, 1, 2, 3 and 4 bps.

Figure 5.12: Optimal location as a function of the power available at the mobile relay with the additional constraint of a minimum of 0, 1, 2, 3 and 4 bps of guaranteed communication between the relay and destination.

Finally, note that the conclusions from Section 5.7.1 still hold: even if the end-to-end MSE expression is non-convex, we can provide a set of extremal points among which to test for the optimal location of the mobile relay, even when there is a requirement of a guaranteed rate of communication between the relay node and the destination.
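Numerically, the guaranteed rate only inflates the second-hop term of the objective by 2^{2R}; a minimal extension of the earlier grid-search sketch, with the same assumed parameter values:

```python
import numpy as np

def mse_with_rate(d_sr, A, p1, p2, alpha, R):
    """End-to-end MSE of Eq. (5.21): a guaranteed rate R (bits) inflates
    the second-hop term by 2^(2R); p_i = P_i/N_0, both hops polynomial."""
    snr1 = p1 / d_sr**alpha
    snr2 = p2 / (A - d_sr)**alpha
    return 1.0 / (snr1 + 1.0) + 2.0**(2 * R) / (snr2 + 1.0)

A, p1, p2, alpha = 1.0, 1.0, 4.0, 2
d = np.linspace(1e-6, A - 1e-6, 10001)   # avoid the endpoints
for R in (0, 1, 2):
    d_opt = d[np.argmin(mse_with_rate(d, A, p1, p2, alpha, R))]
    print(f"R={R} bps: d_SR* = {d_opt:.3f}")
```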
Figure 5.13: Signal model for the problem where we have multiple sensors and a relay to facilitate communication with the destination node D.

5.7.3 Multiple Sensors with one Relay

For this section, we will assume that there exists a source that each of the sensors S_i, i = 1..K, observes over an AWGN channel, with measurement noise varying according to the applicable decay law and the environment. The signal model for this scenario is presented in Figure 5.13. Further, we will assume that the sensors need to jointly relay information about this source to the destination through the use of a mobile relay node. Note that the relay serves both as a destination for the messages from each of the sensors and as a source for the transmission of its (possibly quantized) estimates about the source to the destination. The first part of the network (the source, the sensors, and the relay) forms the network of the classical CEO problem, the only difference being that the location of the relay can be modified to minimize end-to-end distortion. Note again that since the sensors are distributed over space, and since we finally need to relay the information from the relay to the destination, the optimal location for the relay may vary dramatically for different distributions of the sensors. The problem of evaluating the optimal location for the relay then reduces to solving an MSE problem with the first term in the end-to-end MSE of Equation (5.14) replaced by the lowest distortion of the Gaussian CEO problem when the relay is located d_RD away from the destination node. Once we assume AWGN channels between the communicating nodes, the problem reduces to a Gaussian CEO problem [3]. The rate-distortion region for this problem was established in [41]. Further, [43] establishes that for the Gaussian scenario, even allowing inter-relay communication does not improve the optimal region.

Mathematically, suppose we have a variable X, estimated at the sensors, that we want to communicate to the destination with the least possible distortion. Further, assume that X ∼ N(0,1). The observation at sensor i is given by

$$Y_i=\sqrt{\mathrm{SNR}_{1,i}}\,X+Z_{1,i},\quad i=1..K \qquad (5.23)$$

where we have assumed that we have K sensors to aid the communication process. Each Z_{1,i} is independent AWGN: Z_{1,i} ∼ N(0, σ_{1,i}²). Further, SNR_{1,i} is determined by the power decay law assumed and the distance between the source of the observation X and sensor i. Each sensor then transmits its (quantized) estimate over its link to the mobile relay node. The relay then quantizes its estimate to meet the rate constraint over the second hop, which is again governed by the decay law and the distance between the relay and the destination node. Since we assume that the RD channel is AWGN too, the signal received at the destination is given by

$$Y_D=\sqrt{\mathrm{SNR}_2}\,X_2+Z_2 \qquad (5.24)$$

where X_2 = f(Y_i) is a coded version of what is received at the relay, and Z_2 ∼ N(0,1) is assumed independent of Z_{1,i} ∀i. Since all variables are Gaussian, the quantization rate at the relay is governed by the Gaussian rate-distortion function

$$R(E_2)=\frac{1}{2}\log_2\frac{1}{E_2} \qquad (5.25)$$

where E_2 is the MSE over the second hop and R(E_2) is the corresponding minimum rate of transmission over the RD link. Given these expressions, we can employ the rate region of the Gaussian CEO problem to solve for the minimum end-to-end MSE as well as the relay location leading to this distortion. This region is given by R(E_1) = ∪_{(r_1..r_K)∈F(E_1)} R(r_1..r_K) where

$$\mathcal{R}(r_1,\ldots,r_K)\stackrel{\mathrm{def}}{=}\Bigg\{(R_1,\ldots,R_K):\sum_{k\in B}R_k\geq\sum_{k\in B}r_k+\frac{1}{2}\log\frac{1}{E_1}-\frac{1}{2}\log\left(1+\sum_{k\in B^c}p_k\right);\ \forall B\subset\{1,\ldots,K\},\ B\neq\emptyset\Bigg\} \qquad (5.26)$$

and

$$F(E_1)=\left\{(r_1..r_K)\in\mathbb{R}_+^K:1+\sum_{k=1}^{K}p_k=\frac{1}{E_1}\right\} \qquad (5.27)$$

where p_k = (1 − exp(−2r_k))/σ_k². The final end-to-end distortion in the estimation of X at the destination node is then given by E = E_1 + E_2. Note that the expression above gives us the rate region as a function of a distortion bound E_1, while we want to determine the lowest admissible distortion E_min = min_{d_SR} E as a function of the location of the relay node. We evaluate this expression through simulation once we assume the appropriate decay functions over each of the links in the network.

Figure 5.14: Minimum end-to-end MSE for the network with two sensors and one mobile relay node.

Figure 5.14 presents the minimum MSE curves for over-the-air and underwater links as a function of d_SR for the case where we have two sensors, α = 2 (for both over-the-air and underwater communication), A = 1 m, P_1/N_0 = 1, P_2/N_0 = 4, the sensor noise for the estimation of X at both sensors is 0.5, and the two sensors are symmetrically placed at (0, 0.2) and (0, −0.2), with the destination node at (1, 0) (all distances in m). In view of Figure 5.14, note that we want to place the mobile relay close to the destination to minimize the MSE for the estimation of X at the destination.

5.8 Minimizing Delay

For this section, we will assume that the mobile sensor node cannot communicate and move at the same time. Further, in spite of the finite time duration for communication and the limited number of bits to transmit, we will assume that communication between two nodes can occur at a fraction 1/K of the capacity of the link between them, and that the mobile relay node employs a half-duplex communication scheme: it first receives all the data from the sensor and then, in the next instant of time, communicates this information to the destination node. This will help us quantify the dependence of the communication rate between nodes on the distance between them.
5.8.1 The Two-Hop Network

Consider the signal model from Figure 5.7. Assume the sensor node has B bits of information to convey to the destination and that the mobile relay node is originally parked at the destination. Let us first analyze the case where the relay node must stay at a fixed location even after the sensor finishes transmission. Then, for each distance d_SR between the sensor and the mobile relay, the relay needs to move d_RD to reach that position. Suppose the mobile relay node can move at v m/s. The total time taken for the messages to be received at the destination node is then

$$T=\frac{d_{RD}}{v}+\frac{2BK}{\log_2(1+\mathrm{SNR}_1)}+\frac{2BK}{\log_2(1+\mathrm{SNR}_2)} \qquad (5.28)$$

where SNR_1 and SNR_2 are the SNRs of the SR and RD links. These can be parametrized by the inter-node distances as in Section 5.7.1: SNR_1 = P_1/(N_0 d_SR^α) and SNR_2 = P_2/(N_0 d_RD^α). Employing a decay rate of α = 2 for both links and assuming the powers as in Section 5.7.1 yields the total delay curve (the solid line) in Figure 5.15 for v = 1 m/s (roughly the speed of a NAMOS boat), K = 2, A = 10 m and B = 10 bits.

Figure 5.15: Total time taken up in the network as a function of where the mobile sensor node is placed after actuation.

Differentiating Equation (5.28) with respect to d_SR, we can see that for particular values of v and the other parameters of the problem, the optimization can be non-convex, thereby leading to (possibly) multiple local minima. Again, we need to find these extrema and evaluate the delay function at each of them to determine the globally optimal location for the mobile sensor.

For the case where the relay may move after the sensor finishes transmission, we need to solve two optimization problems: the first for when the sensor is transmitting its information to the relay, and the second for when the relay is communicating its estimates to the destination. These problems are:

$$\text{(1)}\quad\text{minimize}\quad\frac{d_{RD}}{v}+\frac{2BK}{\log_2(1+\mathrm{SNR}_1)}\quad\text{to obtain } d_{SR}^{*,1};$$

$$\text{(2)}\quad\text{minimize}\quad\frac{|d_{SR}-d_{SR}^{*,1}|}{v}+\frac{2BK}{\log_2(1+\mathrm{SNR}_2)}.$$

Once we solve these two optimization problems, we compare the resultant total delay to that from the one-shot optimization problem framed above to evaluate the globally optimal scheme. The difference between the two-stage optimization and the single optimization can be dramatic, as shown in Figure 5.15. The dashed line is the global minimum obtained from solving the two optimization problems (as opposed to the single one in the solid line) and is an order of magnitude better: the dashed line is at ∼3.33 minutes, while the minimum delay from the single optimization is ∼15.3 minutes.
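Both the one-shot and the two-stage delay optimizations are one-dimensional searches; a sketch with the parameter values quoted above (v = 1 m/s, K = 2, A = 10 m, B = 10 bits, and P_1/N_0 = 1, P_2/N_0 = 4 as assumptions):

```python
import numpy as np

def one_shot_delay(d_sr, A, v, B, K, p1, p2, alpha):
    """Eq. (5.28): the relay drives out d_RD, then both hops transmit."""
    d_rd = A - d_sr
    t1 = 2 * B * K / np.log2(1 + p1 / d_sr**alpha)
    t2 = 2 * B * K / np.log2(1 + p2 / d_rd**alpha)
    return d_rd / v + t1 + t2

A, v, B, K, p1, p2, alpha = 10.0, 1.0, 10.0, 2.0, 1.0, 4.0, 2
d = np.linspace(1e-3, A - 1e-3, 100000)

# One-shot optimization: the relay position is fixed for both hops.
print("one-shot min delay:", one_shot_delay(d, A, v, B, K, p1, p2, alpha).min())

# Two-stage optimization: the relay may move again between the hops.
stage1 = (A - d) / v + 2 * B * K / np.log2(1 + p1 / d**alpha)
d1 = d[np.argmin(stage1)]
stage2 = np.abs(d - d1) / v + 2 * B * K / np.log2(1 + p2 / (A - d)**alpha)
print("two-stage min delay:", stage1.min() + stage2.min())
```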
5.8.2 Multiple Sensors with one Relay

Consider the problem of minimizing the delay of communication over the network presented in Figure 5.13. This problem can be solved by employing the multiple-optimization scheme outlined above: we optimize the location of the mobile sensor node after each round of communication to minimize the end-to-end delay for that round. In a sense, we are employing a greedy approach to the general problem, and so we need to compare the result of our approach to the one that results from a global minimization. The latter reduces to solving a Travelling Salesman Problem (TSP), which is known to be NP-hard as the number of sensors in the problem increases. However, we can still employ our solution to obtain one particular solution to the problem.

5.9 Minimizing Energy

For this section, we are interested in the problem of minimizing the energy consumed in the network. While optimizing the communication power employed at the transmitters has been studied earlier, the novelty of our setup is the additional optimization over the energy required to move the mobile sensor from location to location. We will see in the following sections that the relative expense of this task compared to inter-node communication is one of the deciding factors in the decision of where to locate the mobile node.

5.9.1 The Two-Hop Network

Since we are optimizing over the energy alone and not over the other design parameters, we need to impose a rate constraint on the network. Let us say that we require a sustained communication rate of $R$ bps between the sensor and the destination for $T$ seconds. We are then interested in minimizing the energy used by the network to attain this goal. Again, we assume that the mobile relay node starts off at the destination node and can be moved any distance $d$ at a cost of $kd$ units of energy. The problem then is to evaluate the optimal position of the mobile relay that minimizes the energy consumption in the network. Note that if we leave the mobile relay at the destination node, the sensor needs to allocate a lot of power towards attaining the minimum required transmission rate $R$. There is then an optimal location to which to move the mobile relay node, which depends on how high the actuation cost $k$ is relative to the communication cost.

We again have the signal model described in Figure 5.7, and the link SNRs are again given by $\mathrm{SNR}_1 = \frac{P_1}{N_0 d_{SR}^{\alpha}}$ and $\mathrm{SNR}_2 = \frac{P_2}{N_0 d_{RD}^{\alpha}}$. The total energy consumed over a duration of $T$ seconds is then

\[ E = k\, d_{RD} + T\left(P_1^2 + P_2^2\right) \qquad (5.29) \]

where, to meet the end-to-end rate constraint of $R$ bps on the network, we need to have

\[ R \le \frac{1}{2K}\log_2(1+\mathrm{SNR}_1) \qquad (5.30) \]

\[ R \le \frac{1}{2K}\log_2(1+\mathrm{SNR}_2) \qquad (5.31) \]

since, again, we are assuming that we can only transmit at a fraction $\frac{1}{K}$ of the maximum theoretical capacity of the channel. Figure 5.16 presents the energy consumed in the network over $T = 1$ s to sustain an end-to-end rate of 6 bps with $k = 100$. The $d_{SR}$ which minimizes the energy consumption in the network is seen to be $d_{SR}^{*} = 0.595$ m.

[Figure 5.16: Total energy consumed by the two-hop network, as a function of where the mobile sensor node is placed after actuation.]

Note the convex form of the curve presented. This convexity is lost when the polynomial decay of SNR with distance for over-the-air networks is replaced by the exponential decay of the underwater communication channel, as noted in previous sections. Finally, we note that we can further reduce the energy consumed in the network by allowing the mobile relay to move once the sensor is finished with its communication. This holds under a number-of-bits constraint instead of the sustained-rate constraint that we impose in this problem; given the sustained end-to-end rate constraint here, however, it is not a practical solution. Again, these results can be extended to multiple nodes; however, the computational load grows exponentially with the number of nodes.
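A direct way to reproduce this computation: for each relay position, choose the smallest per-hop powers that meet constraints (5.30)-(5.31) with equality, then evaluate Equation (5.29). The sketch below does this; $A$, $N_0$ and the unit conventions are assumptions, so the exact constants behind Figure 5.16 (e.g. $d_{SR}^{*} = 0.595$ m) need not be reproduced.

```python
import numpy as np

A, alpha, N0 = 1.0, 2.0, 1.0      # assumed geometry and noise normalization
R, K, T, k = 6.0, 2.0, 1.0, 100.0

d_sr = np.linspace(0.05, A - 0.05, 1000)
d_rd = A - d_sr

# Invert R = (1/2K) log2(1 + P/(N0 d^alpha)) for the minimal per-hop powers.
snr_req = 2.0 ** (2.0 * K * R) - 1.0
P1 = N0 * d_sr**alpha * snr_req
P2 = N0 * d_rd**alpha * snr_req

E = k * d_rd + T * (P1**2 + P2**2)   # Eq. (5.29): actuation + transmit energy
i = E.argmin()
print("optimal d_SR = %.3f m, minimum energy = %.3g units" % (d_sr[i], E[i]))
```

The design choice visible here is the tension between the two terms of Eq. (5.29): moving the relay toward the sensor shrinks $P_1$ but grows both $P_2$ and the actuation cost $k\, d_{RD}$, which is what produces an interior minimizer under polynomial decay.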
5.10 Multiple Constraints

While the previous sections dealt with the optimal location for the mobile node when only one optimization metric is active, we now address the problem in the presence of multiple metrics over which we want to optimize. We will present one example of this problem: minimizing delay while the energy used in the network is kept below a threshold.

5.10.1 Minimizing Delay with a Given Energy Constraint

Consider the two-hop network in Figure 5.7 again. We are now interested in minimizing the delay in transmitting $B$ bits of information from the sensor to the destination while the energy used in the network is lower than a given threshold. To simplify the problem, we will not allow the mobile sensor node to move after the sensor finishes transmission. Note that since we have a fixed number of bits to transmit, we need to reformulate the energy consumed; we cannot use Equation (5.29). The total energy consumed for this problem is

\[ E = k\, d_{RD} + \frac{2 P_1^2 B K}{\log_2(1+\mathrm{SNR}_1)} + \frac{2 P_2^2 B K}{\log_2(1+\mathrm{SNR}_2)} \qquad (5.32) \]

where $\frac{1}{K}$ is the fraction of capacity at which we can communicate and $k$ is the energy required to move the mobile sensor a distance of 1 m. The optimization problem then is:

\[ \text{Minimize } \frac{d_{RD}}{v} + \frac{2BK}{\log_2(1+\mathrm{SNR}_1)} + \frac{2BK}{\log_2(1+\mathrm{SNR}_2)} \quad \text{subject to} \quad E \le E_{\text{threshold}}. \]

Figure 5.17 presents the minimum delay across the network as a function of $E_{\text{threshold}}$ when we assume the additional requirement that at least $R = 6$ bps of information be transmitted from the sensor to the destination. Here, $E_{\min} = 415.2$ units. Note that the decreasing nature of the curve is expected: as we allow more energy to be spent in the network, we have a larger feasible set to work with, thereby leading to lower delays.

[Figure 5.17: Minimum delay for the network as a function of the maximum energy we are willing to expend on the network.]

For a general optimization problem with more nodes and constraints, we can employ the schemes presented here to obtain the optimal locations for the mobile sensor nodes. However, we note that as the number of nodes in the network increases, the complexity of the optimization problem grows exponentially; the problem becomes intractable for even a limited number of mobile nodes.

5.10.2 Minimize Distortion Given a Delay Constraint

Consider the two-hop network presented in Section 5.6 again. We are now interested in the problem of minimizing the end-to-end distortion in the transmission of a Gaussian random variable (with variance 1), observed at the sensor $S$, to the destination $D$, using the mobile node as a relay $R$. Given a delay threshold $T_{\text{threshold}}$, if we can get $N$ bits across the network reliably, then from the rate-distortion expression for a Gaussian variable (see Equation (5.18)) we can attain a distortion of $E = e^{-2N}$. The optimization problem then becomes:

\[ \text{Minimize } e^{-2N} \quad \text{subject to} \quad \frac{d_{RD}}{v} + \frac{2NK}{\log_2(1+\mathrm{SNR}_1)} + \frac{2NK}{\log_2(1+\mathrm{SNR}_2)} \le T_{\text{threshold}}. \]

Figure 5.18 presents the result of this optimization problem for different values of the over-the-air decay coefficient $\alpha$.

[Figure 5.18: Delay-Distortion trade-off curves for $\alpha = 2$, 3 and 4.]
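Since the delay constraint is linear in $N$, the inner maximization has a closed form: the largest feasible $N$ for a given relay position follows by inverting the constraint. The sketch below uses this, with the same caveats as before: $v$, $K$, $A$, the link powers and the delay budget $T_{\text{threshold}}$ are assumed values, not the ones behind Figure 5.18.

```python
import numpy as np

v, K, A, alpha = 1.0, 2.0, 1.0, 2.0
P1_N0, P2_N0 = 1.0, 4.0           # assumed link power-to-noise ratios
T_thr = 5.0                       # hypothetical delay budget, seconds

d_sr = np.linspace(0.05, A - 0.05, 500)
d_rd = A - d_sr
snr1 = P1_N0 / d_sr**alpha
snr2 = P2_N0 / d_rd**alpha

# Delay = d_RD/v + 2NK/log2(1+SNR1) + 2NK/log2(1+SNR2); solve for the
# largest N meeting the budget (clipped at 0 when even moving takes too long).
per_bit = 2.0 * K / np.log2(1.0 + snr1) + 2.0 * K / np.log2(1.0 + snr2)
N = np.maximum(T_thr - d_rd / v, 0.0) / per_bit
D = np.exp(-2.0 * N)              # distortion E = e^(-2N)

i = D.argmin()
print("best relay position d_SR = %.3f m, distortion = %.3g" % (d_sr[i], D[i]))
```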
5.11 Conclusions

The problem of solving for the optimal location of a mobile sensor node in a sensor network is addressed. Different optimization metrics are considered and the differences in their solutions are highlighted, even for the simple two-hop network. A many-sensor problem is considered from an end-to-end distortion-optimization point of view. Finally, the problem of solving for the optimal location of a mobile sensor node under multiple cost constraints is considered and solved for particular cases. It is noted that as the number of sensor nodes increases, the problem becomes intractable very quickly and efficient heuristics are needed. One such heuristic, the greedy solution, is presented for the particular case of minimizing end-to-end delay in the sensor network, and it is a likely candidate heuristic for the general problem.

Chapter 6

Conclusions

In this thesis we have explored the problems of estimating a channel state while communicating over it and of optimizing the location of a mobile relay node in a sensor network so as to optimize a given cost function. A number of other open problems present avenues for future research. A few of these are:

• The problem of joint communication and channel state estimation over a broadcast link. Note that the distributiveness of the probability distribution of the channel state estimate (stated in Lemma 6) fails to hold for this channel, and other approaches need to be explored to single-letterize the capacity region of this channel.

• The notion of information losslessness for a two-hop relay network needs to be fully characterized. We present one particular case where information losslessness holds, but a general characterization would be extremely useful and might provide additional insight.

• A cutset bound for joint communication and channel estimation needs to be explored. This would enable us to derive upper bounds on the end-to-end communication rates for a general network requiring joint communication and channel state estimation.

• The dual problem to the one-hop capacity-distortion problem needs to be explored and defined.

• The results from Chapter 5 need to be generalized to arbitrary networks. Towards this, tools need to be developed to handle the network optimization problem, which can become very complicated as the number of nodes in the network increases.

These are just a few of the problems that remain to be explored. We note that, given the novelty of the problems considered in this thesis, there is very little previous work to draw on, and most results one can derive about these problems are of theoretical interest, since they overcome the notion of having channel state information for "free" at the transmitter and receiver.