JOINT COMMUNICATION AND SENSING OVER STATE DEPENDENT CHANNELS

by

Chiranjib Choudhuri

A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(ELECTRICAL ENGINEERING)

May 2013

Copyright 2013 Chiranjib Choudhuri

Dedication

This dissertation is dedicated to my mother and brother, who always stood by me through thick and thin.

Acknowledgements

This dissertation would not have been possible without the help of so many people in so many ways. I would like to take this opportunity to recognize all the people who have contributed in various capacities at various stages of my dissertation. I am forever indebted to all my teachers, who have shaped me into the person that I am today. Most importantly, I would like to express my deepest appreciation to my advisor, Dr. Urbashi Mitra. She has given me unparalleled opportunities for learning, thinking, and self-discovery. Without her guidance and persistent help this dissertation would not have been possible. I have also gained a lot from my interactions with Dr. Giuseppe Caire at USC; his class on information theory and his clarity of presentation were the motivation to look to this field for a research problem. I also benefited from interactions with Dr. Rahul Jain, Dr. Alex Dimakis, Dr. Solomon Golomb, Dr. Gaurav S. Sukhatme, Dr. Shrikanth Narayanan and Dr. Zhen Zhang in and outside classes. These are the people who helped me develop the tools I employ in this thesis. Finally, I have to mention Behzad Ahmadi, Dr. Osvaldo Simeone, Dr. Young-Han Kim and Dr. Geoff Hollinger, with whom I collaborated on a number of the results presented in this thesis. Their insights and intuitions were immensely helpful in setting up the direction of this thesis.

I cannot begin to describe the hard work, dedication and support that my mother and my brother have put into my life. The belief that they have in me and their constant motivation have always been a source of energy for me. I am blessed to have them in my life and hope that I turn out to be at least half the person they deserve.

No mention of my life at USC would be complete without a mention of my friends. Subhankar, Sivaditya, Sunil, Abhishek, Srinivas and Prithviraj helped me keep a balanced and inquisitive mind. I am thankful for all the fruitful discussions we had on topics not restricted to research. Life at USC was made easy by the expert handling and ever-present help provided by Gerrielyn Ramos, Anita Fung and Diane Demetras. I never had a question that they could not help me with.

Table of Contents

Dedication
Acknowledgements
List of Tables
List of Figures
Abstract
Chapter 1: Introduction
1.1 Related Work
1.2 Contributions
Chapter 2: Relay with finite memory
2.1 Channel Models and Capacity Relationships
2.2 Bounds on the Capacity of the n-CGRC
2.2.1 Achievable Rate: Decode-and-Forward
2.2.2 Achievable Rate: Compress-and-Forward
2.2.3 Capacity Upper Bound
2.2.4 Optimality of the achievable rates
2.3 Limiting Capacity of the Relay Channel with memory
2.4 Illustrative Examples
2.4.1 Relay with equal bandwidths
2.4.2 Relay with unequal bandwidths
2.4.3 Comparison of achievable rates
2.5 Symbol asynchronous relay channel: MIMO Relay channel with memory
2.5.1 Channel Model
2.5.2 Achievable rates and upper bound
2.6 Concluding Remarks
Chapter 3: Causal state communication
3.1 Problem Setup and Main Result
3.1.1 Proof of Achievability
3.1.2 Proof of the Converse
3.1.3 Lossless Communication
3.2 Capacity–Distortion Tradeoff
3.2.1 Proof of Achievability
3.2.2 Proof of the Converse
3.2.3 Injective Deterministic Channels
3.2.4 Gaussian Channel with Additive Gaussian State
3.2.5 Binary Symmetric Channel with Additive Bernoulli State
3.3 Causal State Communication
3.3.1 Gaussian Channel with Additive Gaussian State
3.3.2 Binary Symmetric Channel with Additive Bernoulli State
3.3.3 Five-Card Trick
3.4 Concluding Remarks
Chapter 4: On non-causal channel state information
4.1 Problem Setup and Main Result
4.1.1 Proof of Achievability
4.1.2 Proof of the Converse
4.2 Illustrative Examples
4.2.1 Lossless Communication
4.2.2 Quadratic Gaussian state communication
4.2.3 Binary Symmetric Channel with Additive Bernoulli State
4.3 Source coding with vending machine at the encoder
4.4 Discrete memoryless implicit channel
4.4.1 Lossless Communication
4.4.2 Binary Symmetric Channel with Additive Bernoulli State
4.5 Witsenhausen counterexample
4.5.1 Proof of the Converse
4.5.2 Proof of Achievability
4.5.3 Comparison of lower bounds with optimal MMSE
4.6 Concluding Remarks
Chapter 5: Action dependent state: channel coding
5.1 Joint Communication and Estimation: Problem Formulation
5.2 Action dependent Causal state communication
5.2.1 Non-adaptive action
5.2.2 Proof of Achievability
5.2.3 Proof of the Converse
5.2.4 Adaptive Action
5.3 Illustrative Examples
5.3.1 Actions Seen by Decoder
5.3.2 Gaussian Channel with Additive Action Dependent State
5.3.3 State dependent MAC
5.4 Non-causal action dependent communication
5.4.1 Proof of the Converse
5.4.2 Adaptive action dependent Gaussian channel
5.5 Cost constrained sampling: Probing Capacity
5.5.1 Gaussian Probing Channel
5.5.1.1 Proof of Achievability
5.5.1.2 Proof of the Converse
5.6 Concluding Remarks
Chapter 6: Distortion metric for Robotic sensor networks
6.1 Problem Formulation: Single Source
6.1.1 Motion Planning
6.1.2 Communication Strategy
6.2 Tracking a source with unknown location
6.3 Moving Sources
6.4 Properties of the Distortion Metric
6.5 Motion Planning Algorithms
6.6 Simulated Experiments
6.6.1 Stationary Sources
6.6.2 Moving Sources
6.6.3 Vehicle Dynamics
6.7 Concluding Remarks
Chapter 7: Action dependent side information: source coding
7.1 Cascade Source Coding with A Side Information Vending Machine
7.1.1 System Model
7.1.2 Rate-Distortion-Cost Region
7.1.2.1 Proof of the converse
7.1.3 Lossless Compression
7.2 Cascade-Broadcast Source Coding with A Side Information Vending Machine
7.2.1 System Model
7.2.2 Lossless Compression
7.2.3 Example: Switching-Dependent Side Information
7.2.4 Lossy Compression with Common Reconstruction Constraint
7.2.4.1 Proof of the converse
7.3 Adaptive Actions
7.4 Concluding Remarks
Chapter 8: Conclusions and Future Work
8.1 Summary
8.2 Future directions
8.2.1 Linear Two Hop Network
8.2.2 Action dependent channel with asymmetric messages
References

List of Tables

4.1 Equivalence of the achievable scheme of [43] to our proposed coding scheme
5.1 Equivalence of setting in [130] to our formulation of adaptive action-dependent channel
5.2 The impact of adaptive action on action-dependent channel
8.1 Equivalence of setting of relay channel of [26] to our formulation of action-dependent channel with asymmetric message

List of Figures

1.1 Visualization of an autonomous vehicle gathering data from a deployment of sensors monitoring a source signal. We propose a novel metric based on the squared error distortion to optimize the trajectory of the vehicle.
1.2 A multi-hop computer network in which intermediate and end nodes can access side information by interrogating remote databases via cost-constrained actions.
2.1 Channel model of a single-relay channel.
2.2 Decomposition of relay with ISI into parallel memoryless relays in frequency domain.
2.3 Relay with equal bandwidths on each link.
2.4 Decomposition of a relay with unequal bandwidths into a relay with equal bandwidth and a two hop channel with "remaining" bandwidths.
2.5 Channel model of a single-relay channel with ISI.
2.6 Achievable rates and upper bound when h = 0.25 km, d_SD = 1 km and a varies from 0 to 1.0 km.
2.7 Discrete time equivalent of a symbol-asynchronous relay channel.
2.8 Rates for symbol-asynchronous relay network for P_S = 10, P_R = 10, α = 2 and varying d.
3.1 Strictly causal state communication.
3.2 The capacity–distortion function of the binary symmetric channel with additive Bernoulli state (p = q = 1/4) when the state information is available strictly causally (C_SC) or causally (C_C) at the encoder.
4.1 Non-causal state communication.
4.2 Source coding with side information vending machine at the encoder.
4.3 Channel model for implicit communication.
4.4 Vector Witsenhausen's counterexample.
4.5 Pictorial description of the estimation problem.
4.6 Comparison of our optimal amplification factor a* with the optimal a used in the DPC strategy of [24].
4.7 Comparison of prior lower bounds with the minimum distortion of Theorem 20.
5.1 Channels with adaptive action-dependent states.
5.2 Capacity–distortion function: adaptive vs. non-adaptive action.
5.3 State dependent MAC with strictly causal CSI at both encoders.
5.4 State dependent MAC with asymmetric CSI at both encoders [130].
5.5 Equivalence of the setting of probing the channel state at the encoder to that of channels with action dependent states.
6.1 Channel model for the sensor network for a particular location of the AUV. The channel is a two hop communication channel with the sensors acting as a relay. We additionally assume that the second hop channel is a function of the source signal.
6.2 Visualization of an autonomous vehicle gathering data from a deployment of sensors monitoring a source signal. We assume the location of the source is uniformly distributed in the source cloud and the dimension of the cloud is known to both the sensors and the AUV.
6.3 Visualization of an autonomous vehicle gathering data from a deployment of sensors monitoring a moving source signal. We assume that at each instant of time the source is uniformly distributed in the source cloud and the cloud is changing position because the source is moving.
6.4 Comparison of the BCDM-RRT sampling-based motion planning algorithm to a gradient-based approach and two heuristic strategies in a 10 km × 10 km environment with obstacles. The simulated vehicle is capable of unconstrained motion at a maximum speed of 1 km/hr. Results shown for a single source and for five sources. The proposed method provides improved estimation of the source signal for a given trajectory length. Each data point is averaged over 1000 random sensor deployments, and error bars are one SEM.
6.5 Simulations with a moving source in a 10 km × 10 km environment with obstacles. Similar to the results with stationary sources, the gradient methods and the BCDM-RRT provide improved minimization of squared error versus the heuristic and random methods. Each data point is averaged over 1000 random sensor deployments, and error bars are one SEM.
6.6 Trajectory generated for an autonomous underwater vehicle (AUV) with dynamics. The information source is shown as a square, and the sensors are shown as diamonds. The vehicle's starting point is shown as a circle, and its trajectory is shown as a solid line. The proposed algorithm minimizes the expected distortion along the path and also incorporates general constraints on the vehicle's trajectory. The vehicle moves in a trajectory that veers towards the bottom sensor before moving towards a location between the information source and its closest sensor.
7.1 (a) Cascade source coding problem and (b) cascade-broadcast source coding problem.
7.2 Cascade source coding problem with a side information "vending machine" at Node 3.
7.3 Cascade source coding problem with a side information "vending machine" at Node 2 and Node 3.
7.4 Cascade-broadcast source coding problem with a side information "vending machine" at Node 2.
7.5 The side information S-channel p(w|x) used in the example of Sec. 7.2.3.
7.6 Difference between the weighted sum-rate R_1 + ηR_2 obtained with the greedy and with the optimal strategy as per Corollary 7 (R_b = 0.4, δ = 0.6).
8.1 State communication problem for two-hop state dependent channel.
8.2 Action dependent channel with a private message at the action encoder.
8.3 Channel model for a degraded memoryless relay channel.
8.4 The problem of joint communication and estimation with cost constrained action dependent side information at both the encoder and the decoder.

Abstract

The fundamental trade-off between communication rate and estimation error in sensing the channel state at the decoder is investigated for a discrete memoryless channel with discrete memoryless action-dependent state when the state information is available either partially or fully at the encoder. We first investigate the capacity of a relay channel with finite memory, where the action-independent, fixed channel state information is assumed to be known at both the encoder and decoder, and we then investigate the problem of determining the trade-off between capacity and distortion for channels with states known only at the encoder.

The relay channel with finite memory is modeled via channels with inter-symbol interference (ISI) and additive colored Gaussian noise. The channel state, i.e., the set of channel impulse responses, is assumed to be known at both the encoders and the decoder. Prior results are used to show that the capacity of this channel can be computed by examining the circular degraded relay channel in the limit of infinite block length. The thesis provides single-letter expressions for the achievable rates with decode-and-forward (DF) and compress-and-forward (CF) processing employed at the relay.
Additionally, the cut-set bound for the relay channel is generalized to the ISI/colored Gaussian noise scenario. All results hinge on showing the optimality of the decomposition of the relay channel with ISI/colored Gaussian noise into an equivalent collection of coupled parallel, scalar, memoryless relay channels. The region of optimality of the DF and CF achievable rates is also discussed. The resulting rates are illustrated through the computation of numerical examples.

The problem of state communication over a discrete memoryless channel with discrete memoryless state when the state information is available strictly causally at the encoder is then studied. It is shown that block Markov encoding, in which the encoder communicates a description of the state sequence in the previous block by incorporating side information about the state sequence at the decoder, yields the minimum state estimation error. When the same channel is used to send additional independent information at the expense of a higher channel state estimation error, the optimal tradeoff between the rate of the independent information and the state estimation error is characterized via the capacity–distortion function. It is shown that any optimal tradeoff pair can be achieved via rate-splitting. These coding theorems are then extended optimally to the case of causal channel state information at the encoder using the Shannon strategy.

For non-causal channel state knowledge at the encoder, information-theoretic lower and upper bounds (based respectively on ideas from hybrid coding and rate–distortion theory) are derived on the capacity–distortion function. Some examples are provided for which the capacity–distortion functions are characterized by showing that the two bounds match. These coding theorems are then extended to the case of source coding with a side information vending machine at the encoder (introduced in [5]) to provide an improved lower bound on the rate–distortion function. In some communication scenarios, however, the decoder is not interested in estimating the state directly, but instead wants to reconstruct a function of the state with maximum fidelity. This problem of modified state estimation over a discrete memoryless implicit channel (DMIC) with discrete memoryless (DM) states is studied when the state information is available non-causally at the encoder. Lower and upper bounds on the optimal distortion in estimating the input of the implicit channel are derived. The methods developed for the DMIC with DM state model are then used to investigate the optimal distortion for the asymptotic version of the Witsenhausen counterexample, one of the fundamental problems in distributed control theory. The minimum distortion is characterized for the counterexample; furthermore, it is shown that the combination of linear coding and dirty-paper coding (DPC) proposed in [42] in fact achieves the minimum distortion for the Gaussian case when the proper amplification factor is determined.

The results obtained for discrete memoryless state-dependent channels are then extended to channels with action-dependent states, as defined in [123]. While [123] investigated the scenario of message-dependent non-adaptive action sequences, this work focuses on characterizing the benefits of allowing adaptive action sequences, where the action is not only a function of the message but also depends strictly causally on the past observed state sequence. To compare the two frameworks, the problem of joint communication and state estimation is considered over an action-dependent channel. The capacity–distortion tradeoff of such a channel is characterized for the case when the state information is available strictly causally/causally at the channel encoder. It is shown that although adaptive action is not useful in increasing the unconstrained capacity of the channel, it helps in achieving a better capacity–distortion function by decreasing the state estimation error at the decoder. Since the capacity–distortion function remains open with non-causal channel state information at the encoder, the capacity of such a channel is characterized, and it is shown that adaptive action is not useful in increasing the capacity. The result is illustrated with an example of an action-dependent additive Gaussian channel, whose capacity is characterized by showing the equivalence of the current setting to the problem of the cooperative multiple access channel (MAC) with asymmetric state information at the encoders [130].

To illustrate the results of state communication with a practical example, the problem of planning the trajectory of a robotic vehicle to gather data from a deployment of stationary sensors is studied. The purpose of collecting data from the sensors is to monitor a source signal present in the environment. Here the source signal is the state of the system, which affects the communication channel between the sensors and the autonomous vehicle. The robotic vehicle and the sensors are equipped with wireless modems (e.g., radio in terrestrial environments or acoustic in underwater environments), which provide noisy communication across limited distances. In such scenarios, the robotic vehicle can improve its efficiency by planning an informed data gathering trajectory. Prior work has proposed information-theoretic performance metrics for these problems based on mutual information and Fisher information, but such metrics do not properly account for stochastic variations in the quantity being measured, and they do not explicitly provide a reconstructed source signal, which is one of the main objectives of monitoring an unknown source signal. A novel performance metric for data gathering in robotic sensor networks based on the concept of squared error distortion is proposed. This metric provides a principled approach for modeling source variations and communication limitations during data collection. The formal properties of the distortion function are analyzed, and the squared error distortion with correlated sources, sources with unknown location and sources with unknown kinematics is determined using our results on state communication over state-dependent channels. A sampling-based motion planning algorithm for optimizing data gathering tours for minimal distortion is also proposed, and the proposed algorithm is compared in simulation to show that distortion metrics provide significant improvements in data gathering efficiency.

Lastly, the problem setting is extended to study the rate–distortion region of some distributed source coding problems, where an action-dependent side information "vending machine" is available to some of the decoders. These source coding problems can be thought of as a source coding dual of the channel coding problems with action-dependent side information at the encoders.
The model of a side information "vending machine" (VM) accounts for scenarios in which the measurement of side information sequences can be controlled via the selection of cost-constrained actions. In this thesis, the three-node cascade source coding problem is studied under the assumption that a side information VM is available at the intermediate and/or the end node of the cascade. A single-letter characterization of the achievable trade-off among the transmission rates, the distortions in the reconstructions at the intermediate and at the end node, and the cost for acquiring the side information is derived for a number of relevant special cases. It is shown that a joint design of the description of the source and of the control signals used to guide the selection of the actions at downstream nodes is generally necessary for an efficient use of the available communication links. In particular, for all the considered models, layered coding strategies prove to be optimal, whereby the base layer fulfills two network objectives: determining the actions of downstream nodes and simultaneously providing a coarse description of the source. The design of the optimal coding strategy is shown via examples to depend on both the network topology and the action costs. Examples also illustrate the involved performance trade-offs across the network.

Chapter 1

Introduction

The problem of information transmission over channels with state (also referred to as state-dependent channels) is classical. One of the most interesting models is the scenario in which the channel state is available at the encoder either causally or noncausally. This framework has been extensively studied for independent and identically distributed (i.i.d.) states, starting from the pioneering work of Shannon [99], Kuznetsov and Tsybakov [70], Gelfand and Pinsker [38], and Heegard and El Gamal [47]; see the recent survey by Keshet, Steinberg, and Merhav [61]. Most of the existing literature has focused on determining the channel capacity or devising practical capacity-achieving coding techniques for this channel.

In certain communication scenarios, however, the encoder may additionally wish to help reveal the channel state to the decoder. In this thesis, we study this problem of joint communication and state estimation over a discrete memoryless channel (DMC) with discrete memoryless (DM) state, in which the encoder has either partial or full channel state information and wishes to help reveal it to the decoder subject to some fidelity criterion. This problem is motivated by a wide array of applications, including multimedia information hiding [86], digital watermarking [14], data storage over memory with defects [70, 47], secret communication systems [74], dynamic spectrum access systems [83] (and later [33]), and underwater acoustic/sonar applications [104]. Each of these problems can be expressed as a problem of conveying the channel state to the decoder. For instance, the encoder may be able to monitor the interference level in the channel; it only attempts to carry out communication when the interference level is low and additionally assists the decoder in estimating the interference for better decoder performance. The same channel can also be used to send additional independent information. This is, however, accomplished at the expense of a higher channel state estimation error.
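As standard background for the settings just described (classical results, not contributions of this thesis), recall that for an i.i.d. state S known noncausally at the encoder, the Gelfand–Pinsker capacity is

C = max_{p(u|s), x(u,s)} [ I(U;Y) − I(U;S) ],

where U is an auxiliary random variable, while for causal state knowledge the Shannon strategy gives C = max_{p(t)} I(T;Y), where T is a random map from states to channel inputs, chosen independently of S. The capacity–distortion tradeoffs developed in this thesis can be read against these unconstrained benchmarks.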
We characterize the tradeoff between the amount of independent information that can be reliably transmitted and the accuracy with which the decoder can estimate the channel state via the capacity–distortion function, which is to be distinguished from the usual rate–distortion function in source coding.

Although we wish to eventually characterize the trade-off between communication and estimation, we start with the problem of determining the capacity of a relay channel with finite memory, where the fixed channel state information is assumed to be known at both the encoder and decoder. In this thesis, we determine achievable rates and an upper bound on the capacity of the classical three-node simple relay channel, but with intersymbol interference (ISI) and additive colored Gaussian noise. Such channels are of interest as most wireless standards are bandlimited in nature; further, underwater acoustic channels and other wideband channels such as ultrawideband exhibit both ISI and colored noise (see [106, 116, 13]). We remark that the capacity of the memoryless relay channel is a long-standing open problem, with solutions only for scenarios under very specific conditions (see [65], [28]).

We then formulate the problem of joint communication and state estimation over a discrete memoryless channel (DMC) with discrete memoryless (DM) state, in which the encoder has either strictly causal, causal or non-causal state information. In the strictly causal case, the encoder has access to the channel state sequence of the previous transmission times, while in the causal case, the current state is additionally available at the encoder. With non-causal channel state information, the encoder knows the entire state sequence a priori. In this problem, the decoder is not only interested in decoding the message sent by the transmitter with vanishing probability of error, but also wants to estimate the state of the channel with the minimum possible distortion. We characterize the tradeoff between the amount of independent information that can be reliably transmitted and the accuracy with which the decoder can estimate the channel state via the capacity–distortion function.
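Operationally (an informal restatement; the precise definitions are given in Chapter 3), the capacity–distortion function C(D) is the supremum of all rates R for which there exists a sequence of codes with 2^{nR} messages such that the probability of decoding error vanishes as n → ∞, while the decoder's state estimate Ŝ^n satisfies

limsup_{n→∞} E[ (1/n) Σ_{i=1}^{n} d(S_i, Ŝ_i) ] ≤ D

for a prescribed per-letter distortion measure d(·,·). Relaxing the distortion constraint recovers the unconstrained channel capacity, while R = 0 corresponds to pure state communication at the minimum achievable estimation error.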
We next extend our setting to consider the problem of conveying a modified version of the state over a state-dependent channel. In particular, our focus is on a discrete memoryless, state-dependent channel with states available non-causally at the encoder. In contrast to the previous setting, where the decoder wishes to optimally estimate the channel state, here the receiver is interested in estimating a function X of both the encoder input U and the channel state S with the minimum possible distortion. As the encoder can fully observe the channel state S, it can choose the channel input U in a clever way to make the function X easier to estimate; the resulting communication channel is called a discrete memoryless implicit channel (DMIC). Example applications include target classification, where the decoder is not interested in estimating the source S directly, but wants to reconstruct a function X(S) of the source, sent over a noisy communication channel. Here, X(S) may represent a set of feature vectors corresponding to the target S. As another motivating example, when our setting is specialized to the Gaussian source and noise case with the mean-squared error distortion, our setting becomes equivalent to the problem of Assisted Interference Suppression as introduced in [42]. Therein, the decoder wishes to reconstruct the source with a helper in the presence of an interferer known non-causally at the helper. As detailed in [42], this problem is also closely related to Witsenhausen's counterexample in optimal distributed control theory [124], a connection we exploit herein. In fact, we specialize our results for DMICs to characterize the minimum distortion of the asymptotic version of Witsenhausen's counterexample.

So far in our discussion, we have considered the problem of joint communication and estimation over a DMC with DM state, where the channel state information is assumed to be available for free at the encoder. However, in practical problems, the channel state is not always chosen by nature (e.g., in the uplink of a cellular network, the codeword of an unwanted user acts as the channel state), nor is the channel state information available to the encoder without incurring any cost. To include this scenario, we consider a state-dependent discrete memoryless channel (DMC) with action-dependent states (introduced in [123]), where the state sequence is independent and identically distributed (i.i.d.) over time given an action sequence. The cost-constrained actions not only affect the formation of the channel states, but also capture the cost of acquiring the channel state information at the channel encoder. Action-dependent channels are mainly classified into two groups with respect to the availability of state information at the action encoder: (i) non-adaptive action, where the action encoder selects an action sequence a^n based only on the input message m of the current block; (ii) adaptive action, where at channel use i, the action encoder uses all its knowledge of the states up to s^{i−1} and the input message m to decide on the action a_i(m, s^{i−1}). With respect to the availability of state information at the channel encoder, the problem can be further classified into three groups: (i) non-causal state information, (ii) causal state information and (iii) strictly causal channel state information. In this thesis, we investigate the impact of the adaptive action framework relative to the non-adaptive one on such state-dependent channels.

To study the difference between the non-adaptive and adaptive action-dependent frameworks, we consider the problem of joint communication and estimation over these channels. This joint communication and estimation problem formulation over an action-dependent channel is relevant to a wide array of applications including active classification [88] and underwater path planning [119, 52], just to name a few. Each of these problems can be expressed as a problem of conveying an action-dependent state to the destination. For example, an autonomous vehicle performing an active classification task has control over how it views the environment or state S. The vehicle could take actions such as changing its position, modifying parameters of its sensor, or even manipulating the environment to improve its view. The autonomous vehicle may also be able to take actions adaptively by modifying its plan as new information from viewing the object becomes available. This example can be mathematically formulated as a problem of planning the trajectory of a robotic vehicle to gather data from a deployment of stationary sensors.

Figure 1.1: Visualization of an autonomous vehicle gathering data from a deployment of sensors monitoring a source signal. We propose a novel metric based on the squared error distortion to optimize the trajectory of the vehicle.
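In symbols (a notational recap of the two action models above, not a new result): a non-adaptive action encoder maps the message alone to the entire action sequence,

a^n = a^n(m),

whereas an adaptive action encoder may also react to the states observed so far,

a_i = a_i(m, s^{i−1}),  i = 1, ..., n.

Chapter 5 quantifies when this extra freedom helps (it can improve the capacity–distortion function) and when it does not (it leaves the unconstrained capacity unchanged).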
The purpose of collecting data from the sensors is to monitor a source signal present in the environment. The accurate measurement and interpolation of large-scale spatio-temporal processes is becoming increasingly important for sciences such as biology, climatology, geology, and oceanography. In terrestrial environments, phenomena of interest include seismic activity [68], volcanic activity, and catastrophic weather patterns. In marine environments, harmful algal blooms [101], oil spills, and other oceanographic events are extremely challenging to monitor effectively with available technology (e.g., satellites, drifters, and human-operated surface craft). Recent advances in autonomous robotic vehicles and sensor networks have made it feasible to study and predict these phenomena across large spatial scales and long periods of time, but a number of challenges still remain. For example, many currently deployed sensors must be removed from the field to download their data. The ability to gather sensor data in situ would improve the cost-effectiveness and lifespan of the sensors and would make such deployments feasible across larger scales. To improve the efficiency of spatio-temporal monitoring, we discuss the use of an autonomous robotic vehicle acting as a "data mule" to gather data from a large-scale sensor deployment. Based on our work on state estimation, we develop a communication strategy (an encoding strategy at the sensors and a decoding strategy at the vehicle), which quantifies the effective distortion in estimating a dynamic source sequence along the trajectory of the vehicle. We also propose a sampling-based motion planning algorithm, which can be used to generate effective trajectories using limited computational resources.

Figure 1.2: A multi-hop computer network in which intermediate and end nodes can access side information by interrogating remote databases via cost-constrained actions.

Lastly, the problem setting is extended to study the rate–distortion region of some distributed source coding problems, where an action-dependent side information "vending machine" is available to some of the decoders. These source coding problems can be thought of as a source coding dual of the channel coding problems with action-dependent side information at the encoders. In this thesis, we extend the point-to-point set-up to cascade models, which provide baseline scenarios in which to study fundamental aspects of communication in multi-hop networks; such networks are central to the operation of, e.g., sensor or computer networks (see Fig. 1.2). Standard information-theoretic models for cascade scenarios assume the availability of given side information sequences at the nodes (see, e.g., [111]-[15]). Here, instead, we account for the cost of acquiring the side information by introducing side information VMs at an intermediate node and/or at the final destination of a cascade model. As an example of the applications of interest, consider the computer network of Fig. 1.2, where the intermediate and end nodes can obtain side information from remote databases, but only at the cost of investing system resources such as time or bandwidth. Another example is a sensor network in which acquiring measurements entails an energy cost.

1.1 Related Work

The relay channel was introduced by van der Meulen [82, 117, 118] and has been extensively studied since that time.
A significant set of contributions on the analysis of such channels was provided by Cover and El Gamal in [26], wherein capacity-achieving coding strategies were provided for degraded, reversely degraded and feedback relay channels. The bulk of the research on relay channels has focused on memoryless channels either with or without feedback (see, e.g., [136, 94, 65, 97]). In this thesis, we determine achievable rates and an upper bound on the capacity of the classical three-node simple relay channel, but with intersymbol interference (ISI) and additive colored Gaussian noise. Important prior work on this problem includes [39], which provided the link between circular multi-terminal networks with ISI and linear ones. A single-letter expression for the two-user broadcast channel is given in [39]; however, the computational methods used therein do not directly apply to our case due to the presence of the multihop link. In fact, the challenge for the three-node relay network stems from the intermediate processing at the relay node. While the defining expressions for the capacity of the relay channel with finite memory are provided in [80] (classically, capacity computations in the presence of memory require examination of the entire signal), a single-letter expression is not provided.

The problem of joint communication and state estimation was introduced in [110], which studied the capacity–distortion tradeoff for the Gaussian channel with additive Gaussian state when the state information is noncausally available at the encoder; see Sutivong [109] for the general case. The other extreme case was studied later in [132], in which both the encoder and the decoder are assumed to be oblivious of the channel state; the capacity of the channel subject to a distortion constraint is determined. This thesis connects these two sets of prior results by considering causal (i.e., temporally partial) information of the state at the encoder. Note that the problem of communicating the causally (or noncausally) available state and independent information over a state-dependent channel was also studied in [63], and its dual problem of communicating independent information while masking the state was studied by Merhav and Shamai [81]. Instead of reconstructing the state subject to some fidelity criterion, however, the focus in [63] was the optimal tradeoff between the information transmission rate and the state uncertainty reduction rate (the list decoding exponent of the state). We will elucidate the connection between the results in [63] and our results in Chapter 3.

The problem of conveying a modified version of the state over a state-dependent channel was studied in [42] in connection with Witsenhausen's counterexample, one of the fundamental open problems in distributed control theory. The original scalar version of Witsenhausen's counterexample [124] is a classical open problem in distributed control systems. In [124], Witsenhausen underscores the complexity of distributed systems by showing that even for a two-stage distributed linear quadratic Gaussian (LQG) system, a nonlinear control strategy can outperform all linear laws. It is now clear that Witsenhausen's problem itself is quite hard; the optimal strategy for the problem is still unknown. The non-convexity of the problem makes the search for an optimal strategy challenging, and various modifications of the problem have been considered in [7, 95] that still admit linear solutions. In fact, it was shown in [49, 90] that the discrete approximations of the problem are NP-complete. Various schemes [6, 73, 69] were developed using numerical optimization techniques for the scalar problem, with [69] having the lowest cost computed to date. We refer to [42] and references therein for a review of the early literature. Recently, [126] formulated the problem as an optimization problem involving the quadratic Wasserstein distance and the minimum mean-square error. For the original scalar problem, [126] shows that affine controllers are asymptotically optimal in the weak signal regime, but strictly sub-optimal for strong signals. Motivated by the success of asymptotic results in information theory and the existence of an implicit communication channel in Witsenhausen's counterexample (as observed in [50, 84]), a vector extension of the counterexample was formulated in [42] (recently, a more general version of the Assisted Interference Suppression problem, where the desired signal is also available at the encoder, was investigated in [16]). As noted in [42], the problem is one of many closely related information theory problems on state estimation, some of which have been recently addressed in the literature [24, 81, 63, 132, 20, 113]. Using results from rate-distortion theory and dirty paper coding (see [24]), lower and upper bounds on the minimum distortion were provided in [42], which were then specialized in [41] for the finite blocklength case to compute bounds for the original counterexample for all values of the parameters. The lower bound on the asymptotic problem was further improved in [43] and then in [16], which relies on a novel application of the relation between relative entropy and mismatched estimation in Gaussian noise, in contrast to our approach for the vector version of the problem.

The notion of channels with non-adaptive action-dependent state was introduced in [123]. The capacity of such a channel is characterized both for the case where the channel encoder inputs are allowed to depend noncausally on the state sequence and for the case where they are restricted to causal dependence. The problem setting was revisited in [5] to include the scenario where cost-constrained actions are taken to acquire any partial or complete channel state information by the encoder, the decoder or both. Both of these works have focused on determining the channel capacity or devising practical capacity-achieving coding techniques for this channel, while we focus on studying the trade-off between communication and estimation when acquiring channel state information is associated with the expenditure of costly system resources.

There has been increasing interest in the robotics community in the problem of coordinating robotic data mules for data collection tasks [98]. Recent research has often focused on ground robots constrained to download data from all deployed sensors. If the network is sparse, it can be efficient to partition the sensors into sub-networks and optimize the collection over each sub-network [125]. For denser networks, if the communication range of the sensors can be modeled as a fixed radius, it is possible to develop efficient motion planning methods based on the Traveling Salesperson Problem (TSP) with Neighborhoods [129]. Such techniques have been implemented on robots operating in real-world environments, showing the feasibility of robotic data mules with current technology [10].
For many applications of robotic sensor networks, a fixed communication radius is not a valid assumption, due to gradual degradation of the packet error rate over distance [85]. To provide more realistic communication modeling, two-ring communication models have been explored [112], as well as methods that optimize based on expected network latency [114]. Similar approaches have also been applied to improving the placement of sensors to maximize communication efficiency [67]. Robotic data collection from sensor networks has also been applied to applications in underwater domains. In such domains, communication is limited to long-range and low-bandwidth acoustic communication [105] or shorter-range and higher-bandwidth optical communication [119]. In some prior work, robotic motion planning techniques with realistic acoustic communication modeling were investigated for the problems of station-keeping [4] and underwater search [53]. In some recent work, the problem of robotic data collection in underwater sensor networks under a probabilistic communication model [51] was explored. While all these prior works properly integrate network power consumption and latency, they utilize information metrics based on information gain, which are not fundamentally tied to the properties of the communication system. In this thesis, we introduce a new metric for such tasks that measures the integrity of sensed information that is communicated over a channel, which improves the performance of the data collection. (This measure can be extended to the case where we wish to optimize between sending sensed information and additional signals; this is an avenue for future work.)

The concept of a side information "vending machine" (VM) was introduced in [91] for a point-to-point model, in order to account for source coding scenarios in which acquiring the side information at the receiver entails some cost and thus should be done efficiently. Various works have extended the results in [91] to multi-terminal models. Specifically, [17, 2] considered a set-up analogous to the Heegard-Berger problem [45, 58], in which the side information may or may not be available at the decoder. The more general case, in which both decoders have access to the same vending machine and either the side information produced by the vending machine at the two decoders satisfies a degradedness condition or lossless source reconstructions are required at the decoders, is solved in [17]. In [1], a distributed source coding setting that extends [8] to the case of a decoder with a side information VM is investigated, along with a cascade source coding model to be discussed below. Finally, in [137], a related problem is considered in which the sequence to be compressed is dependent on the actions taken by a separate encoder. We will shed light on the relation in Chapter 7.

1.2 Contributions

The main contributions of our thesis are as follows:

• We discuss two important coding strategies at the relay, (a) the Decode-and-Forward (DF) protocol and (b) the Compress-and-Forward (CF) protocol, and derive the corresponding achievable rates. In addition, we generalize the cut-set bound for the converse to our scenario of interest. A key consequence of our work is that a parallel decomposition is optimal for the computation of both the DF and CF achievable rates, and thus permuting channels at the relay [133] cannot improve the bounds.
The resulting parallel relays are coupled via the power constraint for the DF case and via both a power and a rate constraint for the CF case, which affects the optimal power allocation strategies.

• For the causal state communication problem, we show that block Markov encoding, in which the encoder communicates a description of the state sequence in the previous block by incorporating side information about the state sequence at the decoder, is optimal for communicating the state when the state information is strictly causally available at the encoder. For the causal case, this block Markov coding scheme, coupled with incorporating the current channel state using the Shannon strategy, turns out to be optimal. With an additional independent information rate, we show that any optimal tradeoff can be achieved via rate-splitting, whereby the encoder appropriately allocates its rate between information transmission and state communication.

• We derive an information-theoretic upper bound on the capacity–distortion function with non-causal channel state information. The upper bound requires a careful choice of auxiliary random variable, and to the best of our knowledge, this is the first upper bound proposed for this non-causal state communication problem. To illustrate the utility of the bounds proposed, we provide a few examples where these upper and lower bounds match. We also revisit the setting of source coding with a side information vending machine at the encoder introduced in [91], and using our methods for the channel coding problem, we derive a stronger converse for the source coding problem than that provided in [91].

• We provide lower and upper bounds on the minimum distortion of the DMIC and specialize the results to lossless communication and the state-dependent binary implicit channel. We show that the lower bounds proposed in [43] and [16] for the Gaussian DMIC are loose and that our lower bound is tighter due to a clever choice of a key auxiliary random variable.

• We specialize our proposed bounds to the Gaussian DMIC setting to characterize the minimum distortion of the asymptotic Witsenhausen counterexample. This result could potentially provide stronger bounds on the optimal cost of the original scalar counterexample.

• The tighter lower bound obtained in the DMIC problem enables us to prove that the achievable strategy of combining linear coding with DPC (proposed in [42]) in fact achieves the minimum distortion in the asymptotic Witsenhausen problem. This optimal achievable strategy is a vector extension of the "soft quantization" scheme proposed in [6], where it was shown that the optimal strategies may use "slopey" quantization (this connection is explicitly discussed in [42]). We show that the optimal amplification factor of the DPC strategy herein is different from the one derived in [24] to maximize the rate over this state-dependent channel.

• In the action-dependent setting, we show that with non-adaptive action and strictly causal channel state knowledge at the channel encoder, a two-stage encoding scheme achieves the capacity–distortion function. In the first stage, the message is communicated through the action sequence; then, conditioned on the action sequence, a block Markov strategy similar to the one in [18] is optimal.
• The capacity–distortion function with adaptive action is also characterized, and it is shown that although adaptive action is not useful for increasing the capacity, it helps the receiver obtain a better estimate of the channel state compared to the non-adaptive framework.

• These coding theorems are then extended optimally to the case of causal channel state information at the channel encoder using the Shannon strategy. Beyond merely generalizing previously considered problems involving coding with states known at the transmitter, we show that this adaptive action framework also captures scenarios considered in [71, 77] pertaining to multiple access channels (MAC) with states.

• Since the capacity–distortion function is open with non-causal channel state information at the encoder, we study the capacity (no state estimation) of the action-dependent channel with non-causal channel state information (CSI) at the channel encoder. We show that in the case in which both the action and channel encoders send only the common message, adaptive action does not increase the capacity, and hence the capacity of our channel model is the same as that of the non-adaptive action-dependent channel. In fact, the inutility of the adaptive action for the case of causal CSI at the channel encoder was noted in [123], but the scenario of non-causal CSI with adaptive action was left open.

• We note that the inutility of the strictly causal availability of the states at the action encoder is a direct consequence of the fact that with non-causal CSI, the channel encoder can itself cancel the effect of the interference completely using a variation of the standard Gelfand–Pinsker coding scheme [38], with no need to jointly transmit the compressed version of the state. This fact is re-established by studying a memoryless action-dependent Gaussian channel model in which both the noise and the action-dependent state are additive and Gaussian. In this case, we characterize the capacity region of our model (again left open in [123]) by showing the equivalence of our setting to the problem of the cooperative MAC with asymmetric state information at the encoders [130].

• We extend our problem formulation to study adaptive action-dependent probing channels (introduced in [5]). We observe the inutility of the adaptive action in this setting as well. We then revisit the Gaussian probing example of [5]; we provide an upper bound on its capacity and propose an achievable strategy that gives a better lower bound on the capacity than the previous one in [5].

• In the path planning problem, we propose a novel metric for coordinating the actions of a robotic vehicle collecting data from a stationary sensor network. The problem setting is inherently estimation-theoretic in nature, and hence the metric we propose to measure the fidelity of the source estimate is based on minimizing the squared error distortion along the trajectory of the vehicle (written out explicitly after this list), which captures the average error in estimating the stochastic source. The distortion metric is widely used in the wireless communications literature (see [110, 81, 63, 19, 20] for details) to study the problem of lossy reconstruction of source sequences, but to our knowledge it is unknown in the robotics literature.

• It is straightforward to show that, in the general case, finding an optimal trajectory for a vehicle to minimize distortion is NP-hard. We also show that distortion does not possess the properties of monotonicity or submodularity, which are often associated with information gathering metrics. However, we demonstrate that sampling-based motion planning algorithms can be used to generate effective trajectories using limited computational resources. The resulting robotic data collection trajectories improve the efficiency of sensing and move towards tighter integration between motion planning and communication.

• In the action-dependent source coding setup, we derive the achievable rate-distortion-cost trade-offs for a set-up in which a side information VM exists at Node 3, while the side information Y is known at both Node 1 and Node 2 and satisfies the Markov chain X → Y → Z. This characterization extends the result of [15] discussed above to a model with a VM at Node 3.

• We also study a source coding problem with the cascade-broadcast model in which a VM exists at both Node 2 and Node 3. In order to enable the action to be taken by both Node 2 and Node 3, we assume that the information about which action should be taken by Node 2 and Node 3 is sent by Node 1 on the broadcast link of rate R_b. Under the constraint of lossless reconstruction at Node 2 and Node 3, we obtain a characterization of the rate-cost performance. This conclusion generalizes the result in [17] discussed above to the case in which the rates R_1 and/or R_2 are non-zero.

• We then tackle the cascade-broadcast problem under the more general requirement of lossy reconstruction. Conclusive results are obtained under the additional constraints that the side information at Node 3 is degraded and that the source reconstructions at Node 2 and Node 3 can be recovered with arbitrarily small error probability at Node 1. This is referred to as the common reconstruction (CR) constraint following [103], and it is of relevance in applications in which the data being sent is of a sensitive nature and unknown distortions in the receivers' reconstructions are not acceptable (see [103] for further discussion). This characterization extends the result of [3] mentioned above to the set-up with a side information VM, and also in that both rates R_1 and R_b are allowed to be non-zero.

• Finally, we revisit the source coding results above by allowing the decoders to select their actions in an adaptive way, based not only on the received messages but also on the previous samples of the side information, extending [22]. Note that the effect of adaptive actions on the rate–distortion–cost region was open even for a simple point-to-point communication channel with a non-causal side information VM at the decoder. In this thesis, we conclude that, in all of the considered examples where applicable, adaptive selection of the actions does not improve the achievable rate-distortion-cost trade-offs.
Chapter 5 extends the joint communication and state estimation framework to the set up where encoders can take actions to choose a more favorable channel and acquiring channel state information requires spending a part of avavilable resources. To illustrate the usefulness of this framework of joint information transmission and state estimation, we consider a practical set up of planning the trajectory of a robotic vehicle to gather data from a deployment of stationary sensors in Chapter 6. Chapter 7 deals with some multi-terminal distributed lossy source coding duals of the channel coding problems investigated in the thesis, where action dependent side information is available at some of the decoders. The rate–distortion functions are characterized for some specific problems. Finally, Chapter 8 concludes the thesis, and discuss future work. Throughout the thesis, we closely follow the standard notation. In particular, a random variable is denoted by an upper case letter (e.g., X,Y,Z) and its realization is denoted by a lower case letter (e.g., x,y,z). The shorthand notationX n is used to denote the tuple (or the column vector) of random variables (X 1 ,...,X n ), andx n is used to denote their realizations. The notationX n ∼p(x n ) means thatp(x n ) is the probability mass function (pmf) of the random vectorX n . Similarly,Y n |{X n = x n }∼ p(y n |x n ) means thatp(y n |x n ) is the conditional pmf ofY n given{X n = x n }. ForX∼ p(x) andǫ∈ (0,1), we define the set ofǫ-typicaln-sequencesx n (or the typical set in short) [89] as T (n) ǫ (X) = x n :|#{i: x i =x}/n−p(x)|≤ǫp(x) for allx∈X . 13 We say thatX→ Y → Z form a Markov chain ifp(x,y,z) = p(x)p(y|x)p(z|y), that is,X andZ are conditionally independent of each other givenY . Finally, C(x) = (1/2)log(1+x) denotes the Gaussian capacity function. 14 Chapter 2 Relay with finite memory In this chapter, we would like to study the problem of sending information over a Gaussian relay channel with state. The state here is the finite length channel impulse responses of various links and the state is assumed to be fixed over the horizon of communication. We will also assume that channel state informa- tion is available at both the encoder and decoder. Hence, the decoder knowing the channel state is only interested in decoding the message transmitted by the encoder. The problem is motivated by applications in underwater acoustic communication, in which due to the frequency and distance dependent attenua- tion, long distance communication is not possible without the use of intermediate relays. We would like to underscore the fact that the capacity of a scalar relay channel, which is a special case of our problem formulation, is a long standing open problem with solutions for scenarios under very specific conditions (see [65],[28]). So the objective of this chapter is to comment on the relationship between a Gaussian relay channel with memory and a memoryless relay channel, so that the large set of existing results for scalar, memoryless relay channel can be extended to the relay channel with memory. The remainder of the chapter is organized as follows. In Section 2.1, we introduce relevant channel models and establish the equivalence between different models. Section 2.2 computes the bounds on the capacity ofn-block channel and the condition of optimality of the different achievable rates. In Section 2.3, we examine the bounds in the limit of infinite block length. Section 2.4 provides a few illustrative examples. 
Section 2.5 extends the framework to the MIMO relay channel with memory concentrating on the particular example of a symbol asynchronous relay channel. Finally, in Section 2.6, we present the conclusions of the chapter. 15 Figure 2.1: Channel model of a single-relay channel. 2.1 Channel Models and Capacity Relationships In this section, we introduce our channel model and the related circular Gaussian relay channel building on the formulations of [39]. We consider the capacity of the discrete-time relay channel model as shown in Fig. 2.1. The signal transmitted by the source and relay are given by {X Sk } and {X Rk }, respectively. The stationary, additive Gaussian noise processes at the relay and destination, denoted by {V Rk } and {V Dk }, have zero-mean, and autocorrelation functionsR R [i] andR D [i], respectively, of finite supporti max . Let {h qi } m i=0 , q ∈{SR,SD,RD} denote the channel impulse responses (CIRs) of the three links, with common memory lengthm. Without loss of generality, we only consider the case,m≥i max 1 . The output sequences at the relay and destination are, {Y Rk } and {Y Dk }, respectively, with, Y Rk = m X i=0 h SRi X S(k−i) +V Rk , Y Dk = m X i=0 (h SDi X S(k−i) +h RDi X R(k−i) )+V Dk . (2.1) The following power constraints are assumed for the source and relay signals, for alln, 1 n n X k=1 E[X 2 qk ] = 1 n β n q ≤P q , q∈{S,R}. (2.2) For a givenm, this channel is called the linear Gaussian relay channel (LGRC) with finite memorym (see [39]) as the output is a linear convolution of the input codeword with the channel impulse response. Clearly, the channels have ISI since the channel output at timek depends on the input symbols{X q (k)} q∈{S,R} at timek as well as previous input symbols{X q (i)} q∈{S,R} ,i<k. In addition, the noise samples{V q (k)} q∈{R,D} at timek are correlated with the noise samples{V q (i)} q∈{R,D} at timesi<k, howeverV R (k) is independent ofV D (k). 1 Ifm <imax , CIRs can be zero padded to make them equal. 16 We now define then-block circular Gaussian relay channel (n-CGRC) forn>m, an-block memo- ryless channel obtained by modifying the LGRC with memorym. This channel model will play a pivotal role in the capacity computation as seen in the sequel. Specifically, then-CGRC over eachn-block has input vectors{X Sk } n k=1 ,{X Rk } n k=1 which produce output vectors{Y c Rk } n k=1 at the relay and{Y c Dk } n k=1 at the destination with, Y nc R = H c SR X n S +V nc R , Y nc D = H c SD X n S + H c RD X n R +V nc D , (2.3) where H c is the circulant channel matrix, whose first row is defined as, H c q (1,:) = [h q0 ,0,··· ,0,h qm ,··· ,h q2 ,h q1 ], q∈{SR,SD,RD}, and each subsequent row is a single cyclic shift to the right. The only difference with the linear channel model is that the channel output is the circular convolution of the input codeword with the channel impulse response instead of a linear one. The circular noise processes over each n-block{V c qk } n k=1 , q ∈ {R,D} have periodic autocorrelation function and can be found in [39]. Noise samples from different n-blocks are independent since the channel is n-block memoryless. Then-CGRC inherits the LGRCŠs power constraints. We next define key notation which will be used throughout the chapter. We letΣ q =E[X n q (X n q ) † ], and N q =E[V nc q (V nc q ) † ], q∈{S,R} be the source and relay input correlation matrices and noise correlation matrices, respectively. The cross-correlation matrices are defined as Σ SR = E[X n S (X n R ) † ] = Σ † RS . 
We shall repeatedly make use of the fact that circulant matrices can be diagonalized by the Discrete Fourier Transform (DFT) matrix, which we denote as F. Thus,X nf q = FX n q , q∈{S,R} is the DFT of the input signal X n q with Ψ q = E[X nf q (X nf q ) † ], q∈{S,R}. The cross-correlation matrices in the frequency domain can be similarly defined asΨ SR =E[X nf S (X nf R ) † ] = Ψ † RS . Also for a matrix A,|A| denotes the absolute value of the determinant of A. Additionally, the following diagonal matrices are defined: D l = FH c l F † , l∈{SR,SD,RD}, (2.4) C q = FN q F † , q∈{R,D}. (2.5) 17 Finally, we define X n = X n S − ˆ X n S (X n R ), the error in the estimate of X n S given X n R . Then, Σ = E[X n (X n ) † ] = Σ S −Σ SR Σ −1 R Σ † SR andΨ =E[X nf (X nf ) † ] = FΣF † = Ψ S −Ψ SR Ψ −1 R Ψ † SR . Direct computation of the capacity of then-block LGRC is challenged by the presence of inter-block interference due to the fact that the channel impulse responses have memory and also noises are correlated across the block. In [39], it is shown that if we extend the definitions of then-LGRC, then-CGRC to a synchronous Gaussian multi-terminal channel, then the capacity region of the two multi-terminal channels is the same in the limit asn goes to infinity. As the relay channel is a special case of a synchronous multi- terminal channel, we get our desired result. Thus, the capacityC of the LGRC can be computed as the limit of then-CGRC, asn grows to infinity. It is easier to deal with then-CGRC model as it avoids the interblock interference by converting the linear convolution ofn-LGRC to a circular convolution. In the sequel, we derive single-letter expressions for a variety of rate bounds: achievable rates for DF and CF, and a generalization of the cut-set bound for then-LGRC by exploiting this equivalence. 2.2 Bounds on the Capacity of then-CGRC 2.2.1 Achievable Rate: Decode-and-Forward Since then-CGRC defined in Eqn. (2.3) is ann-block memoryless relay channel, its achievable rate under DF coding strategy follows directly from [26] if we replace(X,X 1 ,Y 1 ,Y) by(X n S ,X n R ,Y nc R ,Y nc D ). The DF achievable rate is thus given by C c nDF (P S ,P R ) = sup p(x n S ,x n R ) 1 n min{I(X n S ;Y nc R |X n R ),I(X n S ,X n R ;Y nc D )}, satisfying the power constraints given by (2.2). To simplify notation, we define the following power constraint set P D = ( α(·),P S (·),P R (·) : 0≤α(ω i )≤ 1, 1 n n X i=1 P S (ω i )≤P S , 1 n n X i=1 P R (ω i )≤P R ) . 18 Theorem 1. The achievable rate for an-block CGRC with finite memorym, where the relay employs DF , is given by C c nDF (P S ,P R ) = max P D min{C c 1nDF ,C c 2nDF }, where, C c 1nDF = 1 2n n X i=1 C α(ω i )|H SR (ω i )| 2 P S (ω i ) N R (ω i ) ! , C c 2nDF = 1 2n n X i=1 C P(ω i ) N D (ω i ) , H q (ω i ) = D qii ,q∈{SR,SD,RD}, (2.6) N q (ω i ) = C qii ,q∈{R,D}, (2.7) and P(ω i ) = |H SD (ω i )| 2 P S (ω i )+|H RD (ω i )| 2 P R (ω i )+2 q ¯ α(ω i )|H SD (ω i )H RD (ω i )| 2 P S (ω i )P R (ω i ). Additionally,C(x) = log(1+x). TheH q (ω i ) are the channel components for frequency bini and link q; theN q (ω i ) are similarly defined noise components. TheP S (ω i ) andP R (ω i ) are the powers allocated by the source and the relay, respectively, for i’th component of the channel and 0≤ α(ω i )≤ 1 is the cross-correlation between the input signals as defined in [26] and ¯ α(ω i ) = 1−α(ω i ). Before proving the theorem, we introduce two key lemmas. We first require a property of the maxi- mizing input probability distribution from [80]. Lemma 1. 
[80] The capacity of the degraded relay channel with finite memory of lengthm is C DF = lim n→∞ sup q 1 n min{I(X n S ;Y n R |X n R ),I(X n S ,X n R ;Y n D )}, where the maximization is taken over the input distributionq = Q n i=1 p(x Si |x Si−1 ,x Ri )p(x Ri |x Ri−1 ). Lemma 1 implies that the processX R is allowed to evolve without any dependence on the processX S , while the processX S may be causally dependent onX R . The key to showing Theorem 1 is proving that the DFT decomposition is optimal for DF. To this end, we must show that a certain correlation structure holds for the source and relay signals,X R andX S . 19 Lemma 2. Given Ψ R diagonal, for jointly Gaussian input (X nf S ,X nf R ) satisfying the form of the input distribution of Lemma 1, the matrices Ψ, D will be diagonal if and only if Ψ S and Ψ SR are diagonal matrices, where D = D SD Ψ S D † SD + D RD Ψ R D † RD +2Re(D SD Ψ SR D † RD )+ C D . (2.8) Proof. For a given diagonalΨ R , ifΨ S andΨ SR are diagonal matrices, it is easy to see that bothΨ and D are diagonal. To show that diagonal Ψ and D implies diagonal Ψ S and Ψ SR for a given diagonal Ψ R , we proceed as follows. As the input vectors are jointly, multivariate, Gaussian we can decompose our source, in the DFT domain, into X nf S = VX nf R + WX nf S0 , (2.9) where,X nf S ,X nf R are the DFTs of the input symbolsX n S ,X n R , V and W are generaln×n matrices and X nf S0 are a set ofn independent Gaussian random variables, also independent ofX nf R . From Lemma 1, it is sufficient to consider only lower triangular matrices V and W. Substituting the value ofX nf S from Eqn. (2.9) inΨ, we get Ψ = VΨ R V † + WΨ S0 W † − VΨ R Ψ −1 R Ψ R V † = WΨ S0 W † , where Ψ S0 is the covariance matrix ofX nf S0 , and is diagonal by construction. As the product of a non- singular lower and upper triangular matrix is diagonal if and only if they themselves are diagonal, W must be diagonal forΨ to be diagonal, as W is lower triangular andΨ S0 W † is upper triangular. To show the related result for D, we first assume, without loss of generality, that the channel diagonal matrices are the identity. Then, substituting Eqn. (2.8), we have, D = (V+ I)Ψ R (V+ I) † + WΨ S0 W † . As argued above for W, only a diagonal V, diagonalizes (V + I)Ψ R (V + I) † for a diagonal Ψ R . Since V, W andΨ R are diagonal,Ψ S andΨ SR must be diagonal as well. With Lemma 1 and 2 in hand, we can prove Theorem 1. 20 Proof. For the Gaussian relay channel C c 1nDF ≡ I(X n S ;Y nc R |X n R ) =h(Y nc R |X n R )−h(Y nc R |X n R ,X n S ) = h(H c SR X n S +V nc R |X n R )− 1 2 log2πe|N R | (a) ≤ 1 2 log2πe H c SR ΣH c† SR + N R − 1 2 log2πe|N R | (b) ≤ 1 2 log2πe D SR ΨD † SR + C R − 1 2 log2πe|C R |, (2.10) where (a) follows from the fact that a Gaussian distribution maximizes the entropy; (b) follows from the fact that both the channel impulse response H c l and noise correlation N R matrices are circulant and thus can be diagonalized by the DFT matrix F and due to the fact that F is unitary and hence has a unity determinant. Similarly, C c 2nDF ≡I(X n S ,X n R ;Y nc D ) ≤ 1 2 log2πe D SD Ψ S D † SD + D RD Ψ R D † RD +2Re(D SD Ψ SR D † RD ) | {z } ≡D +C D − 1 2 log2πe|C D | = 1 2 log2πe|C D + D|− 1 2 log2πe|C D |. (2.11) Equality occurs when the input vectors are multivariate Gaussian distributed. We assume, as in Lemma 2, that without loss of generality, the diagonal channel matrices are identity matrices. 
Thus we have C c 2nDF (a) ≤ 1 2 log D 1 + V 1 Ψ R V † 1 (b) = 1 2 log Ψ −1 R + V 1 D −1 1 V † 1 |Ψ R ||D 1 |, (2.12) where, V 1 = V + I and D 1 = C D + WΨ S0 W † and (a) follows from making these substitutions in Eqn.(2.11) and (b) follows from the Generalized Matrix-Determinant Lemma [44]. By Hadamard’s in- equality (see e.g.[29]), the determinant of a positive definite matrix is maximized when the matrix is diagonal, thus a diagonal Ψ R maximizesC c 2nDF . As we have that a diagonal Ψ R is necessary, Lemma 2 provides our final desired result. The final expression in the theorem results from manipulating Eqns.(2.10) 21 Figure 2.2: Decomposition of relay with ISI into parallel memoryless relays in frequency domain. and (2.11) and employing the definitions in Eqns. (2.6) and (2.7). The implication of Theorem 1 is that if we designX S in the DFT domain, a codeword which is white across sub-channels is optimal when the blocklengthn→∞. This implies that treating the relay channel as a set of n parallel and independent scalar relay channels is optimum for the computation of the DF rate (as shown in Fig. 2.2) and it can be easily shown that permuting the channels at relay via channel matching as was done in [133] for the multi-hop channel is sub-optimal, as one cannot exploit potential cooperative gain in a single sub-channel. Observe however, that the input power constraints are coupled for then parallel channels. So coding across subcarriers is not required to achieve the DF capacity. 2.2.2 Achievable Rate: Compress-and-Forward In DF, the relay completely decodes the source codeword and then retransmits a related signal of lower rate to the destination. In contrast, in CF, the relay quantizes the received signal and transmits this quantized information to the destination. Since the maximizing input distribution is not known for the Gaussian CF relay channel, we consider inputs with normal pdf and a Gaussian quantizer at the relay. We will show that under the above assumptions, the CF rate can be computed by decomposing the relay with memory into a set of parallel, scalar relay channels. 22 Theorem 2. Forn-block CGRC defined in the last section, the CF achievable rate with Gaussian inputs is given by, C c nCF = sup ˆ NR(ωi)≥0,1≤i≤n n X i=1 1 2n C P S (ω i ) |H SD (ω i )| 2 N D (ω i ) + |H SR (ω i )| 2 ˆ N(ω i ) !! , (2.13) subject to the input power constraint (2.2) and n X i=1 log ˆ N R (ω i ) ≥ n X i=1 log P S (ω i ) |H SR (ω i )| 2 N D (ω i )+|H SD (ω i )| 2 ˆ N(ω i ) + ˆ N(ω i )N D (ω i ) |H SD (ω i )| 2 P S (ω i )+|H RD (ω i )| 2 P R (ω i )+N D (ω i ) , (2.14) where, ˆ N(ω i ) =N R (ω i )+ ˆ N R (ω i ) and ˆ N R (ω i ) is the variance of the quantization noise in thei-th sub- band. As in Theorem 1, theH q (ω i ) are the channel components for frequency bini and linkq; theN q (ω i ) are similarly defined noise components. TheP S (ω i ) andP R (ω i ) are the powers allocated by the source and the relay, respectively, fori’th component of the channel. Before proving the Theorem 2, we introduce one key lemma which states the optimality of decomposed relay for the computation of CF rate. Lemma 3. A diagonalΨ S andΨ R maximizesC c nCF in Theorem 2. Proof. To prove this Lemma, we need to show that Eqn. (2.20) subject to the power constraints (2.2) and rate constraint (2.21) is maximized by a diagonalΨ S andΨ R . We prove this by showing that diagonalΨ S and Ψ R maximizes the objective function Eqn. (2.20) without altering the constraint set (2.21). Consider the first term in Eqn. (2.20). 
The matrix inside the determinant operator can be decomposed into two parts. We observe that maximizing the expression below overΨ S argmax ΨS 1 2 log C D 0 0 C R + ˆ C R + D SD Ψ S D † SD D SD Ψ S D † SR D SR Ψ S D † SD D SR Ψ S D † SR 23 is equivalent to maximizing, argmax ΨS 1 2 log D −1 SD C D D −1† SD | {z } C 0 0 D −1 SR C R + ˆ C R D −1† SR | {z } D + Ψ S Ψ S Ψ S Ψ S (a) = argmax ΨS 1 2 log Ψ S D+Ψ S C+Ψ S Ψ S (b) = argmax ΨS 1 2 log|Ψ S | Ψ S −(C+Ψ S )Ψ −1 S (D+Ψ S ) = 1 2 log|Ψ S | CΨ −1 S D+ C+ D . (2.15) Here, (a) follows since exchanging rows does not change the absolute value of the determinant and (b) follows from the properties of the determinant of a block matrix (see [54]). By Hadamard’s inequality [29], the expression in Eqn.(2.15) is maximized whenΨ S is diagonal. Although we have shown that diagonal Ψ S maximizes the objective function, it remains to be shown that by choosing Ψ S and Ψ R to be diagonal, the compression rate ˆ C R at the relay is not altered. The compression rate is determined by the inequality constraint (2.21), which can be rewritten as, log ˆ C R ≥ log D SR Ψ S D † SR + C R + ˆ C R D SR Ψ S D † SD D SD Ψ S D † SR D SD Ψ S D † SD + C D −log D SD Ψ S D † SD + D RD Ψ R D † RD + C D . . = f(Ψ S ,Ψ R ) (2.16) Now we have to establish that diagonal Ψ S and Ψ R does not reduce the compression rate constraint set. We prove this by showing that for every choice of arbitrary Ψ S and Ψ R , we can find a diagonal Ψ ∗ S and Ψ ∗ R , which either gives the same constraint on the compression rate or improves it,.i.e., f(Ψ S ,Ψ R ) ≥ f(Ψ ∗ S ,Ψ ∗ R ) First for any choice (diagonal or non-diagonal) of Ψ S , we will find a condition on Ψ R which will maximally enlarge the compression rate constraint set. We consider the second term in the RHS of (2.16), which is a function ofΨ R , whereΨ R is a general non-negative definite matrix, not necessarily diagonal. 24 log D SD Ψ S D † SD + D RD Ψ R D † RD + C D ≡ log Ψ S + D −1 SD D RD Ψ R D † RD D †−1 SD + D −1 SD C D D †−1 SD | {z } ≡C1 (a) = log E S Λ S E † S + C 1 (b) = log Λ S + E † S C 1 E S (c) ≤ log|Λ S +Λ C1 | (2.17) where in (a), Ψ S = E S Λ S E † S is the spectral decomposition of Ψ S with E S being the unitary eigenvector matrix associated with the diagonal eigenvalue matrixΛ S . (b) is true because eigenvector matrix is unitary and (c) follows from the well known Hadamard’s inequality (see e.g.[29]), where Λ C1 is the eigenvalue matrix of C 1 . Since increase in the second term in the RHS of (2.16) will make the constraint set larger, for a fixed choice of non-diagonal Ψ S , equality is achieved in (2.17) if Ψ R is chosen from a family of positive definite matrices (by varying Λ C1 ) diagonalized by E S and of the form Ψ R = D −1 RD D SD (E S Λ C1 E † S − D −1 SD C D D †−1 SD )D † SD D †−1 RD , satisfying the power constraint (2.2) at the relay and maximizing the determi- nant in (2.17). For diagonalΨ S , it is easy to verify that diagonalΨ R will make C 1 diagonal. 25 Now with such choice of Ψ R , we will show that it is sufficient to consider only diagonal Ψ S . 
Let us consider the RHS of (2.16) again, log D SR Ψ S D † SR + C R + ˆ C R D SR Ψ S D † SD D SD Ψ S D † SR D SD Ψ S D † SD + C D −log D SD Ψ S D † SD + D RD Ψ R D † RD + C D ≡ log C R + ˆ C R 0 0 C D + D SR 0 0 D SD Ψ S Ψ S Ψ S Ψ S D † SR 0 0 D † SD −log|Ψ S + C 1 | ≡ log C 2 0 0 C 3 + Ψ S Ψ S Ψ S Ψ S −log|Ψ S + C 1 | = log Ψ S Ψ S + C 3 Ψ S + C 2 Ψ S −log|Ψ S + C 1 | (a) = log|Ψ S |+log (Ψ S + C 2 )Ψ −1 S (Ψ S + C 3 )−Ψ S −log|Ψ S + C 1 | = log|Ψ S |+log C 2 + C 3 + C 2 Ψ −1 S C 3 −log|Ψ S + C 1 | ≡ log|Ψ S |+log Ψ −1 S + C ′ 1 −log|Ψ S + C 1 | = log I n +Ψ S C ′ 1 −log|Ψ S + C 1 | | {z } ≡g(ΨS) (2.18) where C 2 = D † SR C R + ˆ C R D SR , C 3 = D † SD C D D SD and C ′ 1 = (C 2 + C 3 ) C −1 2 C −1 3 are all diagonal matrices and (a) follows from the fact thatdet A B C D = det(A)det D− CA −1 B , with A = D = Ψ S , B = Ψ S + C 3 and C = Ψ S + C 2 . The constraint set{Ψ S : tr(Ψ S )≤ P S } is closed and bounded andg(Ψ S ) is continuous everywhere. Thus by the extreme value theorem (see [96]),g(Ψ S ) has a maxima and mininma in the constraint set and the extreme points can be shown to be attained by the diagonal Ψ S by differentiatingg(Ψ S ) w.r.t. Ψ S and solving the KKT conditions (the proof is similar to the proof of Theorem 3 in [36] and thus omitted for brevity). So for every non-diagonal Ψ S , we can find a diagonal Ψ S which gives the same value ofg(Ψ S ). Thus by choosingΨ S andΨ R to be diagonal, the compression rate is not reduced and a diagonal Ψ S maximizes the objective function Eqn. (2.20). This completes the proof. 26 With the Lemma 3 in hand, we can now prove Theorem 2. Proof. As the n-CGRC with memory m is block memoryless, we can extend the CF achievable rate results for scalar relay channels [26] to the vector relay channels to yield the following lower bound on the capacity, C c nCF = sup 1 n I(X n S ;Y nc D , ˆ Y nc R |X n R ) with, I(X n R ;Y nc D )≥I(Y nc R ; ˆ Y nc R |X n R ,Y nc D ), (2.19) where ˆ Y nc R is the quantized version ofY nc R and the supremum is taken over all joint distributions of the form, p(x n S ,x n R ,y nc R ,ˆ y nc R ,y nc D ) = p(x n S )p(x n R )p(y nc D ,y nc R x n S ,x n R )p(ˆ y nc R |y nc R ,x n R ). Using Eqn. (2.3) and our assumption of Gaussian inputs, we have, I(X n S ;Y nc D , ˆ Y nc R |X n R ) =h(Y nc D , ˆ Y nc R |X n R )−h(Y nc D , ˆ Y nc R |X n S ,X n R ) (a) =h(H c SD X n S + H c RD X n R +V nc D ,Y nc R + ˆ V nc R |X n R ) −h(H c SD X n S + H c RD X n R +V nc D ,Y nc R + ˆ V nc R |X n S ,X n R ) (b) = 1 2 log(2πe) 2 H c SD Σ S H c† SD + N D H c SD Σ S H c† SR H c† SD Σ S H c SR H c SR Σ S H c† SR + N R + ˆ N R − 1 2 log(2πe) 2 N D 0 0 N R + ˆ N R (c) = 1 2 log(2πe) 2 D SD Ψ S D † SD + C D D SD Ψ S D † SR D † SD Ψ S D SR D SR Ψ S D † SR + C R + ˆ C R − 1 2 log(2πe) 2 C D 0 0 C R + ˆ C R . (2.20) Here, (a) follows from the fact that we assume that the quantized version ofY nc R , ˆ Y nc R is given byY nc R + 27 ˆ V nc R , where ˆ V nc R is a sequence of independent random variables (whose covariance matrix ˆ N R = F † ˆ C R F will be optimized to maximize the achievable rate), which are also independent of the input vectors and additive noises; (b) follows from the fact that the input vectors are jointly Gaussian, and (c) holds because both the channel matrix H c l and noise covariance matrix N q are circulant by construction; hence they are diagonalized by the unitary matrix F. The inequality constraint with the Gaussian inputs can be simplified as follows. 
I(X n R ;Y nc D ) = 1 2 log(2πe) D SD Ψ S D † SD + D RD Ψ R D † RD + C D −h(Y nc D |X n R ). Similarly, for the RHS of Inequality (2.19), we have I(Y nc R ; ˆ Y nc R |X n R ,Y nc D ) = h( ˆ Y nc R ,Y nc D |X n R )−h(Y nc D |X n R )−h( ˆ V nc R ) = 1 2 log(2πe) 2 D SR Ψ S D † SR + C R + ˆ C R D SR Ψ S D † SD D SD Ψ S D † SR D SD Ψ S D † SD + C D −h(y nc D |x n R ) − 1 2 log(2πe) ˆ C R . Thus the inequality constraint is given by log ˆ C R ≥ log D SR Ψ S D † SR + C R + ˆ C R D SR Ψ S D † SD D SD Ψ S D † SR D SD Ψ S D † SD + C D −log D SD Ψ S D † SD + D RD Ψ R D † RD + C D . (2.21) Now using Lemma 3, we can not only restrict Ψ S and Ψ R to be a diagonal matrices, but as with the computation of the DF achievable rate, a stronger result, which says that the CF achievable rate with Gaussian inputs is maximized when we decompose the network inton parallel, scalar relay channels, can be proved. Note that a diagonal Ψ S and Ψ R alone does not achieve the desired decomposition into parallel relay channels as a diagonal Ψ S and Ψ R does not block diagonalize the matrices in Eqn. (2.20) and (2.21). 28 The implied statistical independence by diagonal Ψ S and Ψ R coupled with a proper orthonormal permu- tation matrix does achieve our desired result, which we will show next. We define such an orthonormal permutation matrix, P, such that, P 2j−1,j = 1 for j = 1,··· ,n P 2j−2n,j = 1 for j =n+1,··· ,2n. Employing the defined P and usingdet(I n + PAP T ) = det(I n + A), we rewrite the Eqn.(2.20) as, C c nCF = 1 2 log I 2n + C −1 D 0 0 C R + ˆ C R −1 D SD Ψ S D † SD D SD Ψ S D † SR D SR Ψ S D † SD D SR Ψ S D † SR = 1 2 log I 2n + P C −1 D 0 0 C R + ˆ C R −1 P T P |{z} I D SD Ψ S D † SD D SD Ψ S D † SR D SR Ψ S D † SD D SR Ψ S D † SR P T . (2.22) It can be easily shown that, P C −1 D 0 0 C R + ˆ C R −1 P T , is a purely diagonal matrix, however if we treat it as a block-diagonal matrix, then itsi-th 2×2 diagonal block is C −1 Dii 0 0 C Rii + ˆ C Rii −1 = 1 ND(ωi) 0 0 1 NR(ωi)+ ˆ NR(ωi) , whereas P D SD Ψ S D † SD D SD Ψ S D † SR D SR Ψ S D † SD D SR Ψ S D † SR P T 29 is a2n×2n block diagonal matrix where thei-th diagonal block is a2×2 matrix given by, D SDii Ψ Sii D ∗ SDii D SDii Ψ Sii D ∗ SRii D SRii Ψ Sii D ∗ SDii D SRii Ψ Sii D ∗ SRii = |H SD (ω i )| 2 P S (ω i ) H SD (ω i )H ∗ SR (ω i )P S (ω i ) H ∗ SD (ω i )H SR (ω i )P S (ω i ) |H SR (ω i )| 2 P S (ω i ) . Thus, the channel is decoupled, as the inputs corresponding to different coordinates do not interfere. Thus we have, C c nCF = 1 n n X i=1 I(X Si ;Y c Di , ˆ Y c Ri |X Ri ), Similarly it can be shown for the compression rate constraint (2.21). With the decoupled channel, it is very easy to derive the expression in Theorem 2. As obvious from the Lemma 3 that the decomposition across the frequency band is also optimal for the CF achievable rate just as with the DF protocol. But note that in contrast to the case of DF, then parallel memoryless relay channels for CF are not only coupled via the input power constraints but also by the compression rate constraint. 2.2.3 Capacity Upper Bound For the upper bound, we generalize the max-flow-min-cut theorem stated in [26, 29] for the block memo- ryless relay channel, C cup n (P S ,P R ) = sup q 1 n min{C cup 1n ,C cup 2n } where C cup 1n =I(X n S ;Y nc R ,Y nc D |X n R ), C cup 2n =I(X n S ,X n R ;Y nc D ) . The maximization is taken over the same input distribution as in the lower bound (see Lemma 1). 
For the Gaussian relay channel, using similar matrix manipulations as done in the cases of the achievable rates, we can obtain, C cup 1n ≤ n X i=1 1 2 C α(ω i )P S (ω i ) |H SR (ω i )| 2 N R (ω i ) + |H SD (ω i )| 2 N D (ω i ) !! , (2.23) 30 C cup 2n = 1 n n X i=1 1 2 C P(ω i ) N D (ω i ) , (2.24) where,P(ω i ) is as defined in the Theorem 1. This result implies that decomposition of the network into parallel scalar relay channels can also be effectively employed in calculating an upper bound on capacity. 2.2.4 Optimality of the achievable rates We next examine scenarios under which CF and DF are capacity achieving. To this end, we generalize the definition of degradedness [26] to then-block memoryless relay channel as, Definition 1. An block memoryless vector relay channel (X n S ×X n R ,p(y nc R ,y nc D |x n S ,x n R ),Y nc R ×Y nc D ) is said to be degraded if, p(y nc R ,y nc D |x n S ,x n R ) = p(y nc R |x n S ,x n R )p(y nc D |x n R ,y n R ). An alternative statement of Definition 1 holds for the n-CGRC with ISI if we exploit properties of the DFT matrix and undertake some matrix manipulation: Definition 2. An-block memoryless Gaussian circular relay channel is said to be degraded if the following condition is satisfied, |H SR (ω i )| 2 N R (ω i ) ≥ |H SD (ω i )| 2 N D (ω i ) , ∀ 1≤i≤n. where variables are defined as before. Thus, if the source-to-relay channel SNR is better than the source- to-destination SNR in all of the frequency sub-bands, then the circular relay channel is degraded. If the relay channel is degraded, then it can be readily seen that (see [26]) the achievable rate using DF coding coincides with the cut-set upper bound. In contrast, with the compress-and-forward coding scheme, capacity is achieved if ˆ Y nc R is a deter- ministic, invertible function of the relay input Y nc R . To see this, we re-examine the achievable rate for compress-and-forward strategy (see Eqn (2.19)). If ˆ Y nc R is a deterministic, invertible function ofY nc R then, C c nCF = supI(X n S ;Y nc D , ˆ Y nc R |X n R ) = supI(X n S ;Y nc D ,Y nc R |X n R ), 31 subject to the constraint, I(X n R ;Y nc D ) ≥ I(Y nc R ; ˆ Y nc R |X n R ,Y nc D ) I(X n R ;Y nc D ) (a) ≥ h(Y nc R |X n R ,Y nc D ) I(X n R ;Y nc D ) ≥ I(X n S ;Y nc R |X n R ,Y nc D ) I(Y n S ;Y nc D |X n R )+I(X n R ;Y nc D ) ≥ I(X n S ;Y nc D |X n R )+I(X n S ;Y nc R |X n R ,Y nc D ) I(X n S ,X n R ;Y nc D ) ≥ I(X n S ;Y nc R ,Y nc D |X n R ). Here, (a) follows from the fact that h( ˆ Y nc R |Y nc R ,X n R ,Y nc D ) = 0 as ˆ Y nc R is a deterministic, invertible function of Y nc R . Under this condition it is clear that the cut-set upper bound is also given by, C up n = supI(X n S ;Y nc R ,Y nc D |X n R ) which in turn is equivalent to C c nCF . This condition can be realized when the relay and destination are co-located. In that case, the relay does not have to quantize its input Y nc R ( ˆ Y nc R =Y nc R ) as the ultimate receiver at the destination can decode the uncompressed relay input because of physical proximity and hence the CF coding rate achieves the cut-set upper bound. This result is in ac- cordance with the corresponding results on memoryless relay channels (see [65]) and will be emphasized with examples later in the chapter. 2.3 Limiting Capacity of the Relay Channel with memory As implied by the results in [39], our desired capacity bounds for the linear Gaussian relay channel with finite memory will be obtained if we take the limit asn goes to infinity for then-CGRC capacity bounds. 
However, due to the presence of the relay, taking such limits requires careful treatment. In this section, we take such limits to yield the primary results of the chapter. As before, we define the following power constraint set P C = α(·),P S (·),P R (·) : 0≤α(ω)≤ 1, 1 2π Z π −π P S (ω)dω≤P S , 1 2π Z π −π P R (ω)dω≤P R . Thus, we have, 32 Theorem 3. The achievable rate for a LGRC with finite memory m, where the relay uses decode-and- forward coding strategy, is given by, C DF (P S ,P R ) = max P C min{C 1DF ,C 2DF } where C 1DF = 1 4π R π −π C α(ω)|HSR(ω)| 2 PS(ω) NR(ω) dω, C 2DF = 1 4π R π −π C P(ω) ND(ω) dω, and P(ω) = |H SD (ω)| 2 P S (ω)+|H RD (ω)| 2 P R (ω)+2 q ¯ α(ω)|H SD (ω)H RD (ω)| 2 P S (ω)P R (ω). We have to show that the limit asn→∞, the expression in Theorem 1 converges to the one in Theorem 3. Due to the multihop nature of the probem, the rate expression in Theorem 1 involves a minimum over two functions, the result does not follow directly from the results of [48] and [39], which deal with single link systems. We now give a sketch of the proof. Proof. We need to show that, lim n→∞ C c nDF (P S ,P R ) = C DF (P S ,P R ). To prove this we will use a theorem on minimax optimization [79] stated below. Given,Φ :X×Z7→R, consider the minimax optimization problemmin x∈X max z∈Z Φ(x,z). Define, r x (z) =−Φ(x,z) if z∈Z,r(z) = max x∈X r x (z), t z (x) = Φ(x,z) if x∈X,t(x) = max z∈Z t z (x). Theorem 4. [79] If X and Z are convex and compact, t z (.) and r x (.) are closed and convex for each z∈Z andx∈X, respectively, thenmin x∈X max z∈Z Φ(x,z) = max z∈Z min x∈X Φ(x,z). Now, C c nDF (P S ,P R ) = max P D min{C c 1nDF ,C c 2nDF } (a) = max P D min 0≤λ≤1 (λC c 1nDF + ¯ λC c 2nDF ), 33 where, ¯ λ = 1−λ and here (a) follows from the fact thatλC c 1nDF + ¯ λC c 2nDF is a line connectingC c 1nDF andC c 2nDF ,λ = 0 andλ = 1 are two extreme points of this line. Let, Z n ≡ ( (α(ω i ),P S (ω i ),P R (ω i )) : 0≤α(ω i )≤ 1, 1 n n X i=1 P S (ω i )≤P S , 1 n n X i=1 P R (ω i )≤P R ) . X ≡ {λ : 0≤λ≤ 1}, Φ(x,z n ) =λC c 1n + ¯ λC c 2n . Using Theorem 4, it can be shown that max {zn∈Zn} min {x∈X} (λC c 1nDF + ¯ λC c 2nDF ) = min {x∈X} max {z∈Zn} (λC c 1nDF + ¯ λC c 2nDF ). Now we show that, lim n→∞ max Zn (λC c 1nDF + ¯ λC c 2nDF ) = max Z (λC 1DF + ¯ λC 2DF ) = max Z Φ(x,z), where, Z ≡ (α(ω),P S (ω),P R (ω)) : 0≤α(ω)≤ 1, 1 2π Z π −π P S (ω)≤P S , 1 2π Z π −π P R (ω)≤P R . Since,Φ(x,z) is a strictly concave function in each of the arguments inz and since each of arguments ofz lies in a convex constraint set, Φ(x,z) achieves its maximum at some uniquez =z ∗ . Similarly, Φ(x,z n ) attains its maxima at some uniquez n =z ∗ n . If a function is bounded and almost everywhere continuous on the interval [−π,π] then it is Riemann integrable on the interval,i.e., lim n→∞ 1 n P n i=1 f(x i ) = 1 2π R π −π f(x)dx. Since the capacity of a power constrained system is finite, it can be shown that (for details see Lemma 3 of [121]) lim n→∞ Φ(x,z ∗ n ) = Φ(x,z ∗ ) and lim n→∞ max Zn (λC c 1nDF + ¯ λC c 2nDF ) = max Z (λC 1DF + ¯ λC 2DF ). AsZ is closed and convex, applying Theorem 4 again to exchange the min-max, we get our desired result. We have the associated CF result: Theorem 5. The achievable rate for a LGRC with finite memorym, where the relay uses compress-and- forward coding strategy, is given by, C CF (P S ,P R ) = sup 1 4π Z π −π C P S (ω) |H SD (ω)| 2 N D (ω) + |H SR (ω)| 2 N R (ω)+ ˆ N R (ω) !! 
dω, 34 subject to the input power constraints as in the earlier bounds and, Z π −π log ˆ N R (ω)dω≥ Z π −π log P S (ω) |H SR (ω)| 2 N D (ω)+|H SD (ω)| 2 ˆ N(ω) + ˆ N(ω)N D (ω) |H SD (ω)| 2 P S (ω)+|H RD (ω)| 2 P R (ω)+N D (ω) dω. Proof. It is easy to see that the input power constraints and the inequality (2.14) form a closed but non- convex constraint set. But as the objective function is continuous and bounded in [−π,π] and concave in P S (ω) and convex in ˆ N R (ω), by the properties of the Riemann integral Eqns. (2.13) and (2.14) converge to the desired result. Finally, the cut-set bound for the relay channel with memory is similarly derived, yielding the result below, Theorem 6. An upper bound on the capacity of a LGRC with memory m is given by, C up (P S ,P R ) = max P C min{C up 1 ,C up 2 }, whereC up 1 = 1 4π R π −π C α(ω)P S (ω) |HSR(ω)| 2 NR(ω) + |HSD(ω)| 2 ND(ω) dω andC up 2 = C 2DF withC 2DF is as given in Theorem 3. 2.4 Illustrative Examples In this section, we discuss several simple examples to illustrate the computation and relationships of dif- ferent achievable rates and the upper bound. We will start with simple example of relay channel with equal transmission bandwidth on all the links to illustrate how the theorems are applied and then move on to other, more practical, examples. 2.4.1 Relay with equal bandwidths We examine the simplest case when all the channels are the same ideal low-pass filters of bandwidthW and the noise is AWGN (see Fig. 2.3). Let, N R (ω) =N 1 ,N D (ω) =N 1 +N 2 =N, 0≤ω≤ W 2π . 35 Figure 2.3: Relay with equal bandwidths on each link. Due to the common channel assumption and the noise variances, the relay channel is degraded. We make the following idealized assumption: we can achieve both strict bandwidth and time limitation simultane- ously, the key is that we require finite memory. It can be shown that the channel capacity is given by, C = max 0≤α≤1 minW C αP S N 1 W ,C P S +P R +2 √ ¯ αP S P R NW . Uniform input PSD’s achieve capacity, which is expected since all the parallel degraded relay channels are identical. This result is a generalization of the discrete memoryless Gaussian relay channel [26] in the bandwidth limited case. If PS N1 ≤ PR N2 , thenα = 1 maximizes the capacity and its given byWC PS N1W and if PS N1 > PR N2 then the capacity is given byWC α ∗ PS N1W , whereα ∗ is solution of αPS N1W = PS+PR+2 √ ¯ αPSPR NW . 2.4.2 Relay with unequal bandwidths Our second example considers a single-relay channel with different bandwidths across different links. This type of channel is common in DSL (see [128]), underwater communication where channel bandwidth depends on internode separation (see [135, 134]). Let, W SD ,W SR and W RD (W SD < W SR ,W SD < W RD ) be the bandwidths of the three links (see Fig. 2.4). Furthermore, we assume that all channels are ideal lowpass filters. Noise is AWGN and has the same PSD as in the earlier example. It can be readily seen that the relay channel is degraded and it can be shown that the network reduces to a degraded relay of 36 Figure 2.4: Decomposition of a relay with unequal bandwidths into a relay with equal bandwidth and a two hop channel with “remaining” bandwidths. bandwidthW SD and a 2-hop channel with link bandwidthsW SR −W SD andW RD −W SD . Hence the capacity of this channel is given by, C = minW SD C αP S1 N 1 W SD ,C P S1 +P R1 +2 √ ¯ αP S1 P R1 NW SD +min (W SR −W SD )C P S2 N 1 (W SR −W SD ) ,(W RD −W SD )C P R2 N(W RD −W SD ) , where,P S1 +P S2 ≤P S ,P R1 +P R2 ≤P R . 
The decomposition is very similar to the decomposition ob- served in [97] for the rate-constrained relay channel, where because of the constraint on the relay encoding rate, the source splits its rate between direct transmission and cooperative transmission using the relay. In our example, due to the excess bandwidth available on the 2 hop link, the source splits the rate between the two parallel sub-channels. 2.4.3 Comparison of achievable rates The example in this subsection is motivated by underwater acoustic communication. UW channels are characterized by unique physical and statistical properties. The physical property that we are interested in 37 Figure 2.5: Channel model of a single-relay channel with ISI. is the attenuation which depends on the propagation distance and carrier frequency of the transmitted sig- nal. For the statistical part, the channel is assumed to be WSSUS (Wide Sense Stationary with Uncorrelated Scattering). We model a UW channel by taking into account both properties to form a frequency-dependent fading multipath channel with colored Gaussian noise (for detailed channel model see [106, 92, 62]). In this example, we compare DF achievable rate studied with respect to the direct transmission from source to destination and 2 hop relaying with the total input power constraint remaining same in all the cases. For simulation, we have taken (Δf) c = 3.33 KHz for the SR and RD link and (Δf) c = 5 KHz for the SD link. A carrier frequency off c of27 KHz and an available transmission bandwidth of10 KHz are considered for all the 3 links. The physical attenuation of the channel is calculated using Thorp’s formula (see [9]). Rayleigh fading model is investigated where each channel tap is a complex Gaussian random process, whose variances sum up to1. Noise is colored Gaussian as defined in [106]. A total input power constraint of P t = 20 dB is considered for all the schemes. We also assume that the the fading realizations are independent of each other and the channel state information is available at all the nodes. For the performance analysis, we average the achieved rate for a particular realization of the channel over all realizations of the fading states in order to capture the effects of fading. We use the channel model of Fig. 2.5. Fig. 2.6 plots the capacity bounds in bits/sec/Hz as we vary the relay positiona. Comparing 2-hop, direct transmission, and the cooperative DF scheme, not surprisingly relay cooperation provides the best achievable rate in underwater acoustic communication. 38 Figure 2.6: Achievable rates and upper bound whenh = 0.25 Km,d SD = 1 Km anda varies from 0 to 1.0 Km. 2.5 Symbol asynchronous relay channel: MIMO Relay channel with memory A relay channel is said to have asynchronism among the nodes if the codewords transmitted from the source and relay do not coincide in time at the reciever. Frame synchronism refers to the ability of the nodes to receive or initiate the transmission of their codewords in unison. In many practical situations, it is perfectly reasonable to assume that this type of synchronism is achievable with the help of channel feedback or cooperation among transmitters. In contrast, symbol synchronism is far more challenging to achieve due to the smaller time scales. Asynchronism in multiuser networks has been examined in multiple contexts over the years from an information theoretic perspective; (see, e.g., [56, 121] and references therein). 
In this chapter, we find the bounds on the capacity of a symbol-asynchronous Gaussian single-relay channel in which the node i linearly modulates its symbols employing a fixed waveformξ i (t)– this could be a signature code or a pulse shape. We exploit the results of [121] , which examined the symbol asynchronous Gaussian MAC, to show that our symbol asynchronous relay network is equivalent to a relay with finite memory. With this equivalence in hand, we can exploit our prior results for relay channels with inter-symbol interference in Section IV to determine closed form expressions for upper and lower bounds on the capacity of the symbol 39 asynchronous relay channel. 2.5.1 Channel Model For a single-relay channel, assuming frame synchronism and additive white Gaussian noise model, we can write the continuous time received signals as, Y R (t) = n X i=1 X S (i)ξ S (t−iT−τ SR )+N R (t) (2.25) Y D (t) = n X i=1 X S (i)ξ S (t−iT−τ SD )+X R (i)ξ R (t−iT−τ RD )+N D (t), (2.26) where{X S (i)} n i=1 ,{X R (i)} n i=1 are the input sequences of the source and relay, respectively.ξ S (t),ξ R (t) are the unit energy modulating waveforms of support [0,T] used by the source and relay. The delays or offsetsτ SR ,τ SD ,τ RD account for the symbol asynchronism between the users and are known to the receiver. N R (t),N D (t) are additive white Gaussian noise at the relay and destination with power spectral density equal toσ 2 R andσ 2 D , respectively. We assume the same input power constraints as in (2.2). We can obtain an equivalent channel model with discrete-time outputs whose capacity is same as that of the continuous time one described above by considering the projection of the observation process {Y R (t),Y D (t)} along the direction of the unit energy signals{ξ S (t)} and{ξ R (t)} and their T-shifts: Y R (i) = Z (i+1)T+τSR iT+τSR Y R (t)ξ S (t−iT−τ SR )dt Y Dj (i) = Z (i+1)T+τRD iT+τjD Y D (t)ξ j (t−iT−τ jD )dt, (2.27) where,j∈{R,S}. By substituting (2.25) into (2.27) and by defining the cross-correlations between the assigned signature waveformsξ S (t) andξ R (t) as (assuming without loss of generality thatτ RD ≤τ SD ) ρ RS = Z T 0 ξ R (t)ξ S (t+τ RD −τ SD )dt, ρ SR = Z T 0 ξ R (t)ξ S (t+T +τ RD −τ SD )dt, 40 Figure 2.7: Discrete time equivalent of a symbol-asynchronous relay channel. it follows easily that the discrete-time channel output is given by, Y R (i) = X S (i)+N R (i) (2.28) Y DR (i) Y DS (i) = X |j|≤1 H(j) X R (i+j) X S (i+j) + N DR (i) N DS (i) , (2.29) where,H(0), H(−1) = H T (1) are given by, H(0) = 1 ρ RS ρ RS 1 , H(1) = 0 ρ SR 0 0 and 1≤ i≤ n (X S (0) = X S (n+1) = 0,X R (0) = X R (n+1) = 0); the discrete-time noise process {[ N DR (i) N DS (i) ] T } is Gaussian with zero mean and covariance matrix: E N DR (i) N DS (i) N DR (j) N DS (j) =σ 2 D H(i−j). Since the receivers at the relay and destination know the assigned waveformsξ S (t) andξ R (t) as well as the symbol epochs{iT+τ q }, q∈{SR,SD,RD}, these receivers can compute{Y R (i)} n i=1 ,{Y DR (i)} n i=1 and{Y DS (i)} n i=1 by passing the observations through two matched filters for signalsξ S (t) andξ R (t), re- spectively. The key observation is that this operation yields sufficient statistics for the transmitted messages [75] and the equivalent channel model is a MIMO relay channel with memory 2 (see Fig. 2.7). 
It is clear from the equivalent discrete time channel model that the capacity is independent of the delay of the signal 41 coming from source to relay, as the channel impulse responses of the three links in the relay channel are functions of the relative offsets between the users in the MAC (multiple-access channel) portion of the relay. If eitherρ RS = 1 orρ SR = 0, then the channel becomes memoryless, as signals are symbol syn- chronous. For example, if the users are assigned the same signal and the channel is symbol synchronous, both outputs coincide and are equal to Y R (i) = X S (i)+N R (i) Y D (i) = X S (i)+X R (i)+N D (i). The channel is then a conventional discrete-time Gaussian relay channel, whose capacity is discussed in [26]. If the assigned signals are not equal, but the users remain symbol synchronous, then the outputs reduce to the memoryless MIMO relay channel, Y R (i) = X S (i)+N R (i) Y DR (i) Y DS (i) = 1 ρ ρ 1 X R (i) X S (i) + N DR (i) N DS (i) , where{[N DR (i)N DS (i)] T } is an independent Gaussian process with covariance matrix given by, σ 2 D σ 2 D ρ σ 2 D ρ σ 2 D , andρ = R T 0 ξ S (t)ξ R (t)dt. The capacity of memoryless MIMO relay channel is studied in [122]. 2.5.2 Achievable rates and upper bound We examine two relay coding strategies: a) Decode-and-Forward (DF) and b) Compress-and-forward (CF) and the "cut-set" upper bound. As CF and DF have differing regimes in which they offer the best rate [65] for memoryless channels, it is of interest to investigate both coding strategies for the symbol-synchronous relay channel as we do herein. 42 Theorem 7. The DF achievable rate of a symbol asynchronous relay channel with input power constraints (2.2) is given by, C DF (P S ,P R ) = sup 1 4π min Z 2π 0 C 1 σ 2 R α(ω)P S (ω) dω, Z 2π 0 C 1 σ 2 D (P S (ω)+P R (ω)+2 p (1−α(ω))P S (ω)P R (ω)ρ(ω) + 1 σ 4 D α(ω)P S (ω)P R (ω)(1−ρ 2 (ω)) dω , where,C(x) = log(1+x),ρ(ω) =ρ RS +ρ SR cos(ω) andP S (ω) andP R (ω) are the power allocated by the source and relay in the bandω andα(ω) is the correlation between the source and relay codewords in the bandω. Proof. Before proving the theorem, we introduce two key lemmas. Lemma 4. det I 2n + 1 σ 2 D E[x n x nT ]G = det I 2n + 1 σ 2 D Σ R Σ RS Σ SR Σ S I n S S T I n , where, S T = ρ RS I n +ρ SR 0 1 0 0 0 1 0 0 . . . 0 1 0 0 1 1 0 0 , G = 1 ρ RS ρ SR ρ RS 1 ρ SR ρ SR 1 ρ RS . . . ρ SR 1 ρ RS ρ SR ρ RS 1 . This Lemma can be easily derived from Lemma 1 of [121]. The key to showing Theorem 7 is proving 43 that the DFT decomposition is again optimal for DF. To this end, we will need the following Lemma. Lemma 5. det I 2n + 1 σ 2 D Ψ R Ψ RS Ψ SR Ψ S I n D D ∗ I n ≤ n Y i=1 det I 2 + 1 σ 2 D ψ Rii ψ RSii ψ SRii ψ Sii 1 d ii d ∗ ii 1 , where, D is a diagonal matrix withd ii as it’si-th diagonal element. The proof of this Lemma uses appropriate permutation matrix P to block-diagonalize the LHS, similar to the proof of Lemma 3. With Lemma 4 and 5 in hand, we can now prove the Theorem 7. The input-output relation of a discrete-time Gaussian circular MIMO relay channel model is given by, Y c R (i) = X S (i)+N c R (i), (2.30) ¯ Y c D (i) = Y c DR (i) Y c DS (i) = 0 ρ SR 0 0 X R (i−1) n X S (i−1) n + 1 ρ RS ρ RS 1 X R (i) X S (i) + 0 0 ρ SR 0 X R (i+1) n X S (i+1) n + N c DR (i) N c DS (i) , (2.31) for1≤i≤n; where((i) n ) equalsi modulon except when s is zero or an integer multiple ofn, in which case ((i) n ) = n. 
The noise processes ovar each n-block{N c R (i)} and N c DR (i) N c DS (i) are circular and its autocorrelation is a periodic repitition of the autocorrelation of the original noise samples within ann-block as defined in [39]. The output of the channel in vector form can be written as, Y c R (1) Y c R (2) . . . Y c R (n−1) Y c R (n) = I n X S (1) X S (2) . . . X S (n−1) X S (n) + N c R (1) N c R (2) . . . N c R (n−1) N c R (n) , (2.32) 44 Y c DR (1) Y c DS (1) Y c DR (2) . . . Y c DR (n) Y c DS (n) = G X R (1) X S (1) X R (2) . . . X R (n) X S (n) + N c DR (1) N c DS (1) N c DR (2) . . . N c DR (n) N c DS (n) , (2.33) where, the Gaussian noise processes{N c R (i)} and N c DR (i) N c DS (i) have covariance matrix σ 2 R I n andσ 2 D G, respectively. So, I(X n S ,X n R ; ¯ Y nc D ) = h( ¯ Y nc D )−h( ¯ Y nc D |X n S ,X n R ) = h(GX n + ¯ N nc D )−h(GX n + ¯ N nc D |X n S ,X n R ) (a) = h(GX n + ¯ N nc D )−h( ¯ N nc D ) (b) ≤ 1 2 logdet cov(GX n + ¯ N nc D ) − 1 2 logdet σ 2 D G = 1 2 logdet I 2n + 1 σ 2 D E[X n X nT ]G (c) = det I 2n + 1 σ 2 D Σ R Σ RS Σ SR Σ S I n S S T I n (d) = 1 2 logdet I 2n + 1 σ 2 D Ψ R Ψ RS Ψ SR Ψ S I n D D ∗ I n (e) ≤ n Y i=1 det I 2 + 1 σ 2 D ψ Rii ψ RSii ψ SRii ψ Sii 1 d ii d ∗ ii 1 (2.34) where,X n ={X R (i),X S (i)} n i=1 and (a) is justified since noise is independent of the inputs, (b) follows as Gaussian input maximizes entropy for a given input covariance matrix, (c) follows from the fact that S is a circulant matrix and hence it gets diagonalized by the DFT matrix F i.e., S = FDF † , (d) follows from the Lemma 4 and (e) is true because of Lemma 5. Here,d ii is thei-th eigenvalue of the circulant matrix S and is given by the DFT of the first column of S and thus d ii = ρ RS +ρ SR e −jωi = ρ(ω i ). Let, ψ Rii = P R (ω i ),ψ Sii = P S (ω i ) be the are the power allocated by the source and the relay fori-th component of the channel andα(ω i ) be the correlation 45 betweenX Si andX Ri as defined in [26], thenψ RSii = p (1−α(ω i ))P S (ω i )P R (ω i ) and subtituting these values in (2.34) we get, I(X n S ,X n R ; ¯ Y nc D ) ≤ 1 2 n X i=1 log 1+ 1 σ 2 D (P S (ω i )+P R (ω i )+2 p (1−α(ω i ))P S (ω i )P R (ω i )ρ(ω i )) + 1 σ 4 D α(ω)P S (ω i )P R (ω i )(1−ρ 2 (ω i )) (2.35) Similarly, I(X n S ;Y nc R |X n R ) = h(X n c +N nc R )−h( ¯ N nc R ) ≤ 1 2 logdet I n + 1 σ 2 R Σ = 1 2 logdet I n + 1 σ 2 R Ψ (a) ≤ 1 2 n X i=1 log 1+ 1 σ 2 R α(ω i )P S (ω i ) (2.36) where,X n c = X n S |X n R and Σ is the conditional covariance matrix and it can be expressed in terms of the input covariance matrices and its given by Σ = E[X n c (X n c ) T ] = Σ S −Σ SR Σ −1 R Σ † SR . (a) can be shown using Lemma 2. Now combining Eqn. (2.35) and (2.36) and taking the limit in the block-lengthn, we get the expression of Theorem 7. Theorem 8. The CF achievable rate of a symbol asynchronous relay channel with input power constraints (2.2) is given by, C CF (P S ,P R ) = sup 1 4π Z 2π 0 log A(ω) 1+ ˆ N R (ω) dω Z 2π 0 log A(ω) B(ω) dω ≤ 0, where, A(ω) = 1 + ˆ N R (ω) + 2P S (ω) + ˆ N R (ω)P S (ω), B(ω) = ˆ N R (ω)((1 +P S (ω))(1 +P R (ω))− P S (ω)P R (ω)ρ 2 (ω)) and ˆ N R (ω) is the compression noise at the relay, which limits the amount of com- pression that can be performed at the relay. 46 Theorem 9. 
The upper bound on the capacity of a symbol asynchronous relay channel with input power constraints (2.2) is given by, C upper (P S ,P R ) = sup 1 4π min Z 2π 0 C 1 σ 2 R + 1 σ 2 D α(ω)P S (ω) dω, Z 2π 0 C 1 σ 2 D (P S (ω)+P R (ω)+2 p (1−α(ω))P S (ω)P R (ω)ρ(ω) + 1 σ 4 D α(ω)P S (ω)P R (ω)(1−ρ 2 (ω)) dω , The proof of these two Theorem 8 and 9 are very similar to the Theorem 7 and thus omitted for brevity. The results can be easily extended to the case where the transmitters only know that the crosscorre- lations (ρ RS ,ρ SR ) that parametrize the channel belong to an uncertainty set Γ, which is determined by the choice of the signature waveforms. For example, if both users are assigned a rectangular waveform then the uncertainty set is equal to the segment Γ ={0≤ ρ RS ≤ 1,0≤ ρ SR ≤ 1,ρ SR +ρ RS = 1}. The achievable rate of the Gaussian asynchronous relay channel under this condition is obtained by tak- ing infimum of the rates in Theorem 7 and 8 over the set Γ. This follows simply because for reliable communication a code has to be good no matter which actual channel is in effect. We will now compare DF and CF achievable rate for symbol-asynchronous relay and show the respec- tive optimal regions with an example. We place the nodes such that the source, relay and destination are collinear, and the distances of the direct paths in each of the links are given byd SR =d,d RD = 1−d and d SD = 1. This corresponds to the relay channel of Fig. 2.5 withh = 0 anda =d. The channel impulse response at each of the three links follows inverse power law, .i.e.,H SR =d − α 2 ,H SR = (1−d) − α 2 and H SD = 1, whereα is the attenuation constant. Modulating waveformsξ S (t) andξ R (t) are rectangular waveforms of unit energy and finite supportT . For simplicity, the additive noises at the relay and destina- tion are white, Gaussian and of unit power. The relative delays between the users are proportional to the distance, for e.g.,τ SR = Td,τ RD = T(1−d) andτ SD = T , which means if the relay is at the source, τ SR = 0 andτ RD =τ SD =T , and similarly, if the relay and destination are co-located,τ SR =τ SD =T andτ RD = 0. With equal power constraints at the source and relay, we plot DF and CF achievable rates and upper bound on the capacity of this channel model in Fig. 2.8, as we move the relay from source to the desti- nation. It can be seen that when the relay and destination are co-located (d→ 1), then relay can send its input directly to the destination without quantizing it and thus ˆ N Rii = 0, ∀i. In this case, the CF rate 47 Figure 2.8: Rates for symbol-asynchronous relay network forP S = 10,P R = 10,α = 2 and varyingd. performs optimally, matching the cut-set upper bound as ˆ Y nc R =Y nc R , a deterministic function ofY nc R . In contrast, when d→ 0, DF is optimal. Note that, when d→ 0 the channel is symbol-synchronous and memoryless asτ RD = τ SD andξ S (t) = ξ R (t). These trends for CF and DF for the LGRC mimic those for the memoryless relay [65]. 2.6 Concluding Remarks In this chapter, we have derived single-letter expressions for the achievable rates and an upper bound on the capacity of a state-dependent relay channel with states known at all the nodes. As argued in the introduc- tion, such systems find wide application in a variety of wideband wireless communication systems, such as underwater acoustic and terrestrial ultrawideband. 
We have examined two important coding strategies for the relay channel, decode-and-forward and compress-and-forward; further, we have provided an upper bound that is a generalization of the cut-set bound for multi-terminal networks. Our results suggest the optimality of OFDM input signaling for relay channels with memory. While this chapter addresses the information-theoretic limits of communication over a relay channel with states, it is natural to also consider the corresponding estimation problem, in which the encoder is partially or fully aware of the channel state and wishes to reveal it to the decoder in addition to sending independent information. We formulate this joint communication and sensing problem in the next chapter.

Chapter 3

Causal state communication

In this chapter, we formulate the problem of simultaneous transmission of the channel state and independent additional information over a state-dependent channel with partial or full state information at the encoder. There is a clear tension between the transmission rate and the error in estimating the channel state. For example, if the decoder wishes to estimate the state with maximum fidelity, the encoder should allocate all the available rate to sending a fine description of its state knowledge, which drives the information rate to zero. Conversely, if the encoder uses part of the available rate to send an independent message, the state estimation error at the decoder goes up, since the remaining rate can only provide a coarser description of the channel state. In this chapter, we study this trade-off between the independent information rate and the estimation error for a point-to-point state-dependent channel.

The rest of this chapter is organized as follows. Section 3.1 describes the basic channel model with discrete alphabets, characterizes the minimum distortion in estimating the state, establishes its achievability, and proves the converse part of the theorem. Section 3.2 extends the results to the information rate–distortion tradeoff setting, wherein we evaluate the capacity–distortion function with strictly causal state at the encoder. Since the intuition gained from the study of the strictly causal setup carries over when the encoder has causal knowledge of the state sequence, the causal case is treated only briefly in Section 3.3, with key examples provided for the causal case. Finally, Section 3.4 concludes the chapter.

3.1 Problem Setup and Main Result

Consider a point-to-point communication system with state depicted in Fig. 3.1. Suppose that the encoder has strictly causal access to the channel state sequence $S^n$ and wishes to communicate the state to the decoder.

Figure 3.1: Strictly causal state communication.

We assume a DMC with a DM state model $(\mathcal X \times \mathcal S, p(y|x,s)p(s), \mathcal Y)$ that consists of a finite input alphabet $\mathcal X$, a finite output alphabet $\mathcal Y$, a finite state alphabet $\mathcal S$, and a collection of conditional pmfs $p(y|x,s)$ on $\mathcal Y$. The channel is memoryless in the sense that, without feedback, $p(y^n|x^n,s^n) = \prod_{i=1}^n p_{Y|X,S}(y_i|x_i,s_i)$, and the state is memoryless in the sense that the sequence $(S_1, S_2, \ldots)$ is independent and identically distributed (i.i.d.) with $S_i \sim p_S(s_i)$.
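Before introducing codes formally, the memoryless model above is easy to simulate. The following sketch (our illustration, not part of the dissertation) instantiates $p(y|x,s)$ as the binary channel $Y = X \oplus S \oplus Z$ studied later in this chapter; the rule in `encode` is a purely hypothetical strictly causal encoder, included only to show that $x_i$ may depend on $s^{i-1}$ alone.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical binary instance of the model: S i.i.d. Bern(q),
# and p(y|x,s) realized as Y = X xor S xor Z with Z ~ Bern(p).
n, p, q = 8, 0.1, 0.3
S = (rng.random(n) < q).astype(int)          # i.i.d. state sequence

def encode(i, past_states):
    """Strictly causal encoder: x_i may depend only on s^{i-1}."""
    return 0 if i == 0 else int(past_states[-1])  # toy rule, not optimized

X = np.array([encode(i, S[:i]) for i in range(n)])
Z = (rng.random(n) < p).astype(int)
Y = X ^ S ^ Z                                # one memoryless channel use per symbol
print(S, X, Y, sep="\n")
```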
An $(|\mathcal S|^n, n)$ code for strictly causal state communication over the DMC with DM state consists of

• an encoder that assigns a symbol $x_i(s^{i-1}) \in \mathcal X$ to each past state sequence $s^{i-1} \in \mathcal S^{i-1}$ for $i \in [1:n]$, and

• a decoder that assigns an estimate $\hat s^n \in \hat{\mathcal S}^n$ to each received sequence $y^n \in \mathcal Y^n$.

The fidelity of the state estimate is measured by the expected distortion
\[
\mathrm E\bigl(d(S^n, \hat S^n)\bigr) = \frac1n \sum_{i=1}^n \mathrm E\bigl(d(S_i, \hat S_i)\bigr),
\]
where $d : \mathcal S \times \hat{\mathcal S} \to [0,\infty)$ is a distortion measure between a state symbol $s \in \mathcal S$ and a reconstruction symbol $\hat s \in \hat{\mathcal S}$. Without loss of generality, we assume that for every symbol $s \in \mathcal S$ there exists a reconstruction symbol $\hat s \in \hat{\mathcal S}$ such that $d(s, \hat s) = 0$. A distortion $D$ is said to be achievable if there exists a sequence of $(|\mathcal S|^n, n)$ codes such that
\[
\limsup_{n\to\infty} \mathrm E\bigl(d(S^n, \hat S^n)\bigr) \le D.
\]
We next characterize the minimum distortion $D^*$, which is the infimum of all achievable distortions $D$.

Theorem 10. The minimum distortion for strictly causal state communication is
\[
D^* = \min \mathrm E\bigl(d(S, \hat S)\bigr),
\]
where the minimum is over all conditional pmfs $p(x)p(u|x,s)$ and functions $\hat s(u,x,y)$ such that
\[
I(U,X;Y) \ge I(U,X;S).
\]

To illustrate this result, we consider the following.

Example 1 (Quadratic Gaussian state communication). Consider the Gaussian channel with additive Gaussian state [24]
\[
Y = X + S + Z,
\]
where the state $S \sim \mathrm N(0,Q)$ and the noise $Z \sim \mathrm N(0,N)$ are independent. Assume an expected average transmission power constraint
\[
\sum_{i=1}^n \mathrm E\bigl(x_i^2(S^{i-1})\bigr) \le nP,
\]
where the expectation is over the random state sequence $S^n$. We assume the squared error (quadratic) distortion measure $d(s,\hat s) = (s - \hat s)^2$. We compare different transmission strategies for estimating the state at the decoder.

In the classical communication paradigm, the encoder would ignore its knowledge of the channel state (since strictly causal state information at the encoder does not increase the channel capacity) and transmit an agreed-upon training sequence to the decoder. The minimum distortion is achieved by estimating the state $S_i$ via minimum mean squared error (MMSE) estimation from the noisy observation $Y_i = X_i + S_i + Z_i$ (with the MMSE estimate $\hat S_i = \mathrm E(S_i|Y_i)$) and is given by
\[
D = \mathrm E\bigl((\hat S_i - S_i)^2\bigr) = \mathrm E(S_i^2) - \frac{\mathrm E(S_i Y_i)^2}{\mathrm E(Y_i^2)} = \frac{QN}{Q+N}.
\]
Note that the result is independent of the particular sequence $X_i$, i.e., one could "send" $X_i = 0$, $i \in [1:n]$. This distortion is optimal when the encoder is oblivious of the state sequence, as shown in [132].

Alternatively, a block Markov coding scheme can be used, in which the encoder communicates a description of the state sequence in the previous block using a capacity-achieving code. This strategy is similar to a source–channel separation scheme, whereby the state sequence is treated as a source and the compressed version of the source is sent across the noisy channel at a rate lower than the capacity. Since the distortion–rate function of the state is $D(R) = Q 2^{-2R}$ (see, for example, [29]) and the capacity of the channel (with strictly causal state information at the encoder) is $C = \mathrm C(P/(Q+N))$, the distortion achieved by this coding scheme is $D = D(C) = Q(Q+N)/(P+Q+N)$. It is straightforward to see that for the same values of $P$, $Q$, and $N$, ignoring the state knowledge at the encoder can offer a lower distortion than using this (suboptimal) block Markov encoding scheme.
The minimum distortion can be achieved by a different block Markov coding scheme, wherein the encoder communicates a description of the state sequence in the previous block by incorporating the side information $(X,Y)$ about the state $S$ of the previous block at the decoder and using Wyner–Ziv coding at a rate equal to the capacity of the channel, $C = \mathrm C(P/(Q+N))$. Thus, this strategy replaces $D(R) = Q 2^{-2R}$ of the last scheme with the Wyner–Ziv distortion–rate function $D_{WZ}(R) = (QN/(Q+N)) 2^{-2R}$ (see [127]), and the minimum distortion can be determined by computing $D^* = D_{WZ}(C) = QN/(P+Q+N)$. (The proof of optimality is provided in Subsection 3.1.2.) Note that $D^*$ can be attained by setting $X = \alpha U \sim \mathrm N(0,P)$ and $U = S + \tilde S$ in Theorem 10, where $\tilde S \sim \mathrm N(0, Q/P)$ is independent of $(S,X)$ and $\hat S = \mathrm E(S|U,X,Y) = \mathrm E(S|S+\tilde S, S+Z)$.

In the following two subsections, we prove Theorem 10.
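The three distortion levels in Example 1 are easy to check numerically. The following sketch (our illustration, with hypothetical parameter values) evaluates the training-based, plain block Markov, and Wyner–Ziv-based schemes, and verifies that evaluating the two distortion–rate functions at the channel capacity reproduces the closed forms above.

```python
import math

def C(x):              # Gaussian capacity function, C(x) = (1/2) log2(1 + x)
    return 0.5 * math.log2(1 + x)

def D_rate(Q, R):      # distortion-rate function of the state, D(R) = Q 2^{-2R}
    return Q * 2 ** (-2 * R)

def D_wz(Q, N, R):     # Wyner-Ziv distortion-rate, (QN/(Q+N)) 2^{-2R}
    return (Q * N / (Q + N)) * 2 ** (-2 * R)

P, Q, N = 2.0, 1.0, 0.5                      # hypothetical values
R = C(P / (Q + N))                           # capacity of the channel
print("training (no state use):", Q * N / (Q + N))
print("block Markov:", D_rate(Q, R), "=", Q * (Q + N) / (P + Q + N))
print("Wyner-Ziv:   ", D_wz(Q, N, R), "=", Q * N / (P + Q + N))
```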
3.1.1 Proof of Achievability

We use $b$ transmission blocks, each block consisting of $n$ symbols. In block $j$, a description of the state sequence $S^n(j-1)$ in block $j-1$ is sent.

Codebook generation. Fix a conditional pmf $p(x)p(u|x,s)$ and function $\hat s(u,x,y)$ such that $I(U,X;Y) > I(U,X;S)$, and let $p(u|x) = \sum_s p(s) p(u|x,s)$. For each $j \in [1:b]$, randomly and independently generate $2^{nR_S}$ sequences $x^n(l_{j-1})$, $l_{j-1} \in [1:2^{nR_S}]$, each according to $\prod_{i=1}^n p_X(x_i)$. For each $l_{j-1} \in [1:2^{nR_S}]$, randomly and conditionally independently generate $2^{n\tilde R_S}$ sequences $u^n(k_j|l_{j-1})$, $k_j \in [1:2^{n\tilde R_S}]$, each according to $\prod_{i=1}^n p_{U|X}(u_i|x_i(l_{j-1}))$. Partition the set of indices $k_j \in [1:2^{n\tilde R_S}]$ into equal-size bins $\mathcal B(l_j) = [(l_j-1)2^{n(\tilde R_S - R_S)} + 1 : l_j 2^{n(\tilde R_S - R_S)}]$, $l_j \in [1:2^{nR_S}]$. This defines the codebook
\[
\mathcal C_j = \bigl\{ x^n(l_{j-1}), u^n(k_j|l_{j-1}) : l_{j-1} \in [1:2^{nR_S}],\ k_j \in [1:2^{n\tilde R_S}] \bigr\}, \quad j \in [1:b].
\]
The codebook is revealed to both the encoder and the decoder.

Encoding. By convention, let $l_0 = 1$. At the end of block $j$, the encoder finds an index $k_j$ such that
\[
(s^n(j), u^n(k_j|l_{j-1}), x^n(l_{j-1})) \in \mathcal T^{(n)}_{\epsilon'}.
\]
If there is more than one such index, it selects one of them uniformly at random. If there is no such index, it selects an index from $[1:2^{n\tilde R_S}]$ uniformly at random. In block $j+1$ the encoder transmits $x^n(l_j)$, where $l_j$ is the bin index of $k_j$.

Decoding. Let $\epsilon > \epsilon'$. At the end of block $j+1$, the decoder finds the unique index $\hat l_j$ such that $(x^n(\hat l_j), y^n(j+1)) \in \mathcal T^{(n)}_{\epsilon}$. (If there is more than one such index, it selects one of them uniformly at random. If there is no such index, it selects an index from $[1:2^{nR_S}]$ uniformly at random.) It then finds the unique index $\hat k_j \in \mathcal B(\hat l_j)$ such that $(u^n(\hat k_j|\hat l_{j-1}), x^n(\hat l_{j-1}), y^n(j)) \in \mathcal T^{(n)}_{\epsilon}$. Finally it computes the reconstruction sequence as $\hat s_i(j) = \hat s(u_i(\hat k_j|\hat l_{j-1}), x_i(\hat l_{j-1}), y_i(j))$ for $i \in [1:n]$.

Analysis of expected distortion. Let $L_{j-1}, K_j, L_j$ be the indices chosen in block $j$. We bound the distortion averaged over the random choice of the codebooks $\mathcal C_j$, $j \in [1:b]$. Define the "error" event
\[
\mathcal E(j) = \bigl\{ (S^n, U^n(\hat K_j|\hat L_{j-1}), X^n(\hat L_{j-1}), Y^n(j)) \notin \mathcal T^{(n)}_{\epsilon} \bigr\}
\]
and consider the events
\[
\begin{aligned}
\mathcal E_1(j) &= \{ (S^n, U^n(K_j|L_{j-1}), X^n(L_{j-1}), Y^n(j)) \notin \mathcal T^{(n)}_{\epsilon} \}, \\
\mathcal E_2(j-1) &= \{ \hat L_{j-1} \ne L_{j-1} \}, \quad \mathcal E_2(j) = \{ \hat L_j \ne L_j \}, \quad \mathcal E_3(j) = \{ \hat K_j \ne K_j \}.
\end{aligned}
\]
Then by the union of events bound,
\[
\mathrm P\{\mathcal E(j)\} \le \mathrm P\{\mathcal E_1(j)\} + \mathrm P\{\mathcal E_2(j-1)\} + \mathrm P\{\mathcal E_2(j)\} + \mathrm P\{\mathcal E_2^c(j-1) \cap \mathcal E_2^c(j) \cap \mathcal E_3(j)\}.
\]
We bound each term. For the first term, let
\[
\tilde{\mathcal E}_1(j) = \{ (S^n, U^n(K_j|L_{j-1}), X^n(L_{j-1})) \notin \mathcal T^{(n)}_{\epsilon'} \}
\]
and note that $\mathrm P\{\mathcal E_1(j)\} \le \mathrm P\{\tilde{\mathcal E}_1(j)\} + \mathrm P\{\tilde{\mathcal E}_1^c(j) \cap \mathcal E_1(j)\}$. By the independence of the codebooks (in particular, the independence of $L_{j-1}$ and $\mathcal C_j$) and the covering lemma [35, Sec. 3.7], $\mathrm P\{\tilde{\mathcal E}_1(j)\}$ tends to zero as $n \to \infty$ if $\tilde R_S > I(U;S|X) + \delta(\epsilon')$. Since $\epsilon > \epsilon'$ and $Y^n(j) \mid \{U^n(K_j|L_{j-1}) = u^n, X^n(L_{j-1}) = x^n, S^n(j) = s^n\} \sim \prod_{i=1}^n p_{Y|X,S}(y_i|x_i,s_i)$, by the conditional typicality lemma [35, Sec. 2.5], $\mathrm P\{\tilde{\mathcal E}_1^c(j) \cap \mathcal E_1(j)\}$ tends to zero as $n \to \infty$. Next, by the same independence of the codebooks and the packing lemma [35, Sec. 3.2], $\mathrm P\{\mathcal E_2(j-1)\}$ and $\mathrm P\{\mathcal E_2(j)\}$ tend to zero as $n \to \infty$ if
\[
R_S < I(X;Y) - \delta(\epsilon). \tag{3.1}
\]
Finally, following the same steps as in the analysis of the Wyner–Ziv coding scheme [35, Sec. 11.3] (in particular, the analysis of $\mathcal E_3$), it can readily be shown that $\mathrm P\{\mathcal E_2^c(j-1) \cap \mathcal E_2^c(j) \cap \mathcal E_3(j)\}$ tends to zero as $n \to \infty$ if
\[
\tilde R_S - R_S < I(U;Y|X) - \delta(\epsilon). \tag{3.2}
\]
Combining the bounds (3.1) and (3.2), we have shown that $\mathrm P\{\mathcal E(j)\}$ tends to zero as $n \to \infty$ if $I(U,X;Y) > I(U;S|X) + \delta(\epsilon') + 2\delta(\epsilon) = I(U,X;S) + \delta'(\epsilon)$, which is satisfied by our choice of $p(x)p(u|x,s)$ for $\epsilon$ sufficiently small.

When there is no "error", $(S^n, U^n(\hat K_j|\hat L_{j-1}), X^n(\hat L_{j-1}), Y^n(j)) \in \mathcal T^{(n)}_{\epsilon}$. Thus, by the law of total expectation and the typical average lemma [35, Sec. 2.4], the asymptotic distortion averaged over the random codebook, encoding, and decoding is upper bounded as
\[
\limsup_{n\to\infty} \mathrm E\bigl(d(S^n(j), \hat S^n(j))\bigr) \le \limsup_{n\to\infty} \bigl( d_{\max} \mathrm P\{\mathcal E(j)\} + (1+\epsilon)\, \mathrm E(d(S,\hat S))\, \mathrm P\{\mathcal E^c(j)\} \bigr) \le (1+\epsilon)\, \mathrm E(d(S,\hat S)),
\]
where $d_{\max} = \max_{(s,\hat s) \in \mathcal S \times \hat{\mathcal S}} d(s,\hat s) < \infty$. By taking $\epsilon \to 0$ and $b \to \infty$, any distortion larger than $\mathrm E(d(S,\hat S))$ is achievable for a fixed conditional pmf $p(x)p(u|x,s)$ and function $\hat s(u,x,y)$ satisfying $I(U,X;Y) > I(U,X;S)$. Finally, by the continuity of the mutual information terms in $p(x)p(u|x,s)$, the same conclusion holds when we relax the strict inequality to $I(U,X;Y) \ge I(U,X;S)$. This completes the achievability proof of Theorem 10.

Remark 1. While the above achievability proof is for finite alphabets, it can easily be adapted to the Gaussian setting in Example 1 by incorporating cost constraints on the channel input and applying the standard discretization argument [35, Sections 3.4 and 3.8].

3.1.2 Proof of the Converse

In this section, we prove that for every code, the achieved distortion is lower bounded as $D \ge D^*$. Given an $(|\mathcal S|^n, n)$ code, we identify the auxiliary random variables $U_i = (S^{i-1}, Y^n_{i+1})$, $i \in [1:n]$. Note that, as desired, $U_i \to (X_i, S_i) \to Y_i$ form a Markov chain for $i \in [1:n]$. Consider
\[
\begin{aligned}
\sum_{i=1}^n I(U_i, X_i; S_i) &= \sum_{i=1}^n I(S^{i-1}, Y^n_{i+1}, X_i; S_i) \\
&\overset{(a)}{=} \sum_{i=1}^n I(S^{i-1}, Y^n_{i+1}; S_i) \\
&= \sum_{i=1}^n \bigl( I(S^{i-1}; S_i) + I(Y^n_{i+1}; S_i \mid S^{i-1}) \bigr) \\
&\overset{(b)}{=} \sum_{i=1}^n I(Y^n_{i+1}; S_i \mid S^{i-1}) \\
&\overset{(c)}{=} \sum_{i=1}^n I(S^{i-1}; Y_i \mid Y^n_{i+1}) \\
&\le \sum_{i=1}^n I(S^{i-1}, Y^n_{i+1}, X_i; Y_i) = \sum_{i=1}^n I(U_i, X_i; Y_i),
\end{aligned} \tag{3.3}
\]
where (a) follows since $X_i$ is a function of $S^{i-1}$, (b) follows since $S^n$ is i.i.d., and (c) follows by the Csiszár sum identity [30, 46, 35, Sec. 2.3].

Let $T$ be the standard time-sharing random variable, uniformly distributed over $[1:n]$ and independent of $(X^n, S^n, Y^n)$, and let $U = (T, U_T)$, $X = X_T$, $S = S_T$, and $Y = Y_T$. It can easily be verified that $X$ is independent of $S$, that $U \to (X,S) \to Y$ form a Markov chain, and that $I(U,X;S) \le I(U,X;Y)$. To lower bound the expected distortion of the given code, we rely on the following result.

Lemma 6. Suppose $Z \to V \to W$ form a Markov chain and $d(z,\hat z)$ is a distortion measure. Then for every reconstruction function $\hat z(v,w)$, there exists a reconstruction function $\hat z^*(v)$ such that
\[
\mathrm E\bigl(d(Z, \hat z^*(V))\bigr) \le \mathrm E\bigl(d(Z, \hat z(V,W))\bigr).
\]
This extremely useful lemma traces back to Blackwell's notion of channel ordering [12, 93] and can be interpreted as a "data processing inequality" for estimation. In the context of network information theory, it was utilized by Kaspi [59] (see also [35, Section 20.3.3]) and appeared in the above simple form in [18].

Proof. Using the law of iterated expectations, we have
\[
\mathrm E[d(Z, \hat z(V,W))] = \mathrm E_V\bigl[\mathrm E[d(Z, \hat z(V,W)) \mid V]\bigr]. \tag{3.4}
\]
Now, for each $v \in \mathcal V$,
\[
\begin{aligned}
\mathrm E[d(Z, \hat z(V,W)) \mid V = v] &= \sum_{z \in \mathcal Z,\, w \in \mathcal W} p(z|v) p(w|v) d(z, \hat z(v,w)) \\
&= \sum_{w \in \mathcal W} p(w|v) \sum_{z \in \mathcal Z} p(z|v) d(z, \hat z(v,w)) \\
&\ge \min_{w \in \mathcal W} \sum_{z \in \mathcal Z} p(z|v) d(z, \hat z(v,w)) \\
&= \sum_{z \in \mathcal Z} p(z|v) d(z, \hat z(v, w^*(v))), 
\end{aligned}\tag{3.5}
\]
where $w^*(v)$ attains the minimum in (3.5) for a given $v$. Define $\hat z^*(v) = \hat z(v, w^*(v))$. Then (3.4) becomes
\[
\mathrm E[d(Z, \hat z(V,W))] = \mathrm E_V\bigl[\mathrm E[d(Z, \hat z(V,W)) \mid V]\bigr] \ge \mathrm E_V\Bigl[\sum_{z \in \mathcal Z} p(z|v) d(z, \hat z^*(v))\Bigr] = \mathrm E[d(Z, \hat z^*(V))],
\]
which completes the proof.

Now consider
\[
\mathrm E\bigl(d(S^n, \hat S^n)\bigr) = \frac1n \sum_{i=1}^n \mathrm E\bigl(d(S_i, \hat s_i(Y^n))\bigr) \overset{(a)}{\ge} \frac1n \sum_{i=1}^n \min_{\hat s^*(i, u_i, x_i, y_i)} \mathrm E\bigl(d(S_i, \hat s^*(i, U_i, X_i, Y_i))\bigr) = \min_{\hat s^*(u,x,y)} \mathrm E\bigl(d(S, \hat s^*(U,X,Y))\bigr),
\]
where (a) follows from Lemma 6 by identifying $S_i$ as $Z$, $(U_i, X_i, Y_i) = (S^{i-1}, X_i, Y^n_i)$ as $V$, and $Y^{i-1}$ as $W$, and noting that $S_i \to (S^{i-1}, X_i, Y^n_i) \to Y^{i-1}$ form a Markov chain. This completes the proof of Theorem 10.

3.1.3 Lossless Communication

Suppose that the state sequence needs to be communicated losslessly, i.e., $\lim_{n\to\infty} \mathrm P\{\hat S^n \ne S^n\} = 0$. We can establish the following consequence of Theorem 10.

Corollary 1. If $H(S) < \Delta^* = \max_{p(x)} I(X,S;Y)$, then the state sequence can be communicated losslessly. Conversely, if the state sequence can be communicated losslessly, then $H(S) \le \Delta^*$.

To prove this, consider the special case of $\hat{\mathcal S} = \mathcal S$ and the Hamming distortion measure $d(s,\hat s)$ (i.e., $d(s,\hat s) = 0$ if $s = \hat s$ and $1$ if $s \ne \hat s$). By setting $U = S$ in the achievability proof of Theorem 10 in Subsection 3.1.1 and noting that no "error" implies that $S^n = \hat S^n$, we can conclude that the state sequence can be communicated losslessly if $\Delta^* > H(S)$ for some $p(x)$. The converse follows immediately, since the lossless condition that the block error probability $\mathrm P\{\hat S^n \ne S^n\}$ tends to zero as $n \to \infty$ implies the zero Hamming distortion condition that the average symbol error probability $(1/n) \sum_{i=1}^n \mathrm P\{\hat S_i \ne S_i\}$ tends to zero as $n \to \infty$. Combining this observation with the converse proof of Theorem 10 in Subsection 3.1.2, we can conclude that $H(S)$ must be less than or equal to $\Delta^*$.

Remark 2. If we define $\Delta^* = \max_{p(x)} I(X,S;Y)$, then $\min\{H(S), \Delta^*\}$ characterizes the state uncertainty reduction rate, which captures the performance of the optimal list decoder for the state sequence (see [63] for the exact definition). The proof of this result again follows from Theorem 10 by letting $\hat{\mathcal S}$ be the set of pmfs on $\mathcal S$ and $d(s,\hat s) = \log(1/\hat s(s))$ be the logarithmic distortion measure, and adapting the technique of Courtade and Weissman [25].

3.2 Capacity–Distortion Tradeoff

Now suppose that in addition to the state sequence $S^n$, the encoder wishes to communicate a message $M$ independent of $S^n$. What is the optimal tradeoff between the rate $R$ of the message and the distortion $D$ of the state estimate? A $(2^{nR}, n)$ code for strictly causal state communication consists of

• a message set $[1:2^{nR}]$,

• an encoder that assigns a symbol $x_i(m, s^{i-1}) \in \mathcal X$ to each message $m \in [1:2^{nR}]$ and past state sequence $s^{i-1} \in \mathcal S^{i-1}$ for $i \in [1:n]$, and

• a decoder that assigns a message estimate $\hat m \in [1:2^{nR}]$ (or an error message $\mathrm e$) and a state sequence estimate $\hat s^n \in \hat{\mathcal S}^n$ to each received sequence $y^n \in \mathcal Y^n$.
We assume that $M$ is uniformly distributed over the message set. The average probability of error is defined as $P^{(n)}_e = \mathrm P\{\hat M \ne M\}$. As before, the channel state estimation error is defined as $\mathrm E(d(S^n, \hat S^n))$. A rate–distortion pair is said to be achievable if there exists a sequence of $(2^{nR}, n)$ codes such that $\lim_{n\to\infty} P^{(n)}_e = 0$ and $\limsup_{n\to\infty} \mathrm E(d(S^n, \hat S^n)) \le D$. The capacity–distortion function $C_{SC}(D)$ is the supremum of the rates $R$ such that $(R,D)$ is achievable. We characterize this optimal tradeoff between information transmission rate (capacity $C$) and state estimation (distortion $D$) as follows.

Theorem 11. The capacity–distortion function for strictly causal state communication is
\[
C_{SC}(D) = \max \bigl( I(U,X;Y) - I(U,X;S) \bigr), \tag{3.6}
\]
where the maximum is over all conditional pmfs $p(x)p(u|x,s)$ with $|\mathcal U| \le |\mathcal S| + 2$ and functions $\hat s(u,x,y)$ such that $\mathrm E(d(S,\hat S)) \le D$.

The proof of Theorem 11 is similar to that of the zero-rate case in Theorem 10, so we highlight only the key steps here.

Proof. Before proving Theorem 11, we summarize a few useful properties of $C_{SC}(D)$ in Lemma 7. Similar properties of the capacity–distortion function were discussed in [132] for the case in which the channel state information is not available.

Lemma 7. The capacity–distortion function $C_{SC}(D)$ in Theorem 11 has the following properties:

(1) $C_{SC}(D)$ is a nondecreasing concave function of $D$ for all $D \ge D^*$.

(2) $C_{SC}(D)$ is a continuous function of $D$ for all $D > D^*$.

(3) $C_{SC}(D^*) = 0$ if $D^* \ne 0$ and $C_{SC}(D^*) \ge 0$ if $D^* = 0$.

The monotonicity is trivial. The concavity can be shown by the standard time-sharing argument. The continuity is a direct consequence of the concavity. The last property follows from the characterization of the minimum distortion $D^*$ in Section 3.1. With these properties in hand, let us prove Theorem 11.

3.2.1 Proof of Achievability

We use $b$ transmission blocks, each consisting of $n$ symbols. The encoder uses a rate-splitting technique, whereby in block $j$ it appropriately allocates its rate between transmitting independent information and a description of the state sequence $S^n(j-1)$ in block $j-1$.

Codebook generation. Fix a conditional pmf $p(x)p(u|x,s)$ and function $\hat s(u,x,y)$ that attain $C_{SC}(D/(1+\epsilon))$, where $D$ is the desired distortion, and let $p(u|x) = \sum_s p(s)p(u|x,s)$. For each $j \in [1:b]$, randomly and independently generate $2^{n(R+R_S)}$ sequences $x^n(m_j, l_{j-1})$, $m_j \in [1:2^{nR}]$, $l_{j-1} \in [1:2^{nR_S}]$, each according to $\prod_{i=1}^n p_X(x_i)$. For each $m_j \in [1:2^{nR}]$, $l_{j-1} \in [1:2^{nR_S}]$, randomly and conditionally independently generate $2^{n\tilde R_S}$ sequences $u^n(k_j|m_j, l_{j-1})$, $k_j \in [1:2^{n\tilde R_S}]$, each according to $\prod_{i=1}^n p_{U|X}(u_i|x_i(m_j, l_{j-1}))$. Partition the set of indices $k_j \in [1:2^{n\tilde R_S}]$ into equal-size bins $\mathcal B(l_j) = [(l_j - 1) 2^{n(\tilde R_S - R_S)} + 1 : l_j 2^{n(\tilde R_S - R_S)}]$, $l_j \in [1:2^{nR_S}]$. This defines the codebook
\[
\mathcal C_j = \bigl\{ x^n(m_j, l_{j-1}), u^n(k_j|m_j, l_{j-1}) : m_j \in [1:2^{nR}],\ l_{j-1} \in [1:2^{nR_S}],\ k_j \in [1:2^{n\tilde R_S}] \bigr\}, \quad j \in [1:b].
\]
The codebook is revealed to both the encoder and the decoder.

Encoding. By convention, let $l_0 = 1$. At the end of block $j$, the encoder finds an index $k_j$ such that
\[
(s^n(j), u^n(k_j|m_j, l_{j-1}), x^n(m_j, l_{j-1})) \in \mathcal T^{(n)}_{\epsilon'}.
\]
If there is more than one such index, it selects one of them uniformly at random. If there is no such index, it selects an index from $[1:2^{n\tilde R_S}]$ uniformly at random. In block $j+1$ the encoder transmits $x^n(m_{j+1}, l_j)$, where $m_{j+1}$ is the new message index to be sent in block $j+1$ and $l_j$ is the bin index of $k_j$.

Decoding. Let $\epsilon > \epsilon'$.
At the end of block $j+1$, the decoder finds the unique index pair $(\hat m_{j+1}, \hat l_j)$ such that $(x^n(\hat m_{j+1}, \hat l_j), y^n(j+1)) \in \mathcal T^{(n)}_{\epsilon}$; this yields the decoded message index $\hat m_{j+1}$ for block $j+1$. It then finds the unique index $\hat k_j \in \mathcal B(\hat l_j)$ such that $(u^n(\hat k_j|\hat m_j, \hat l_{j-1}), x^n(\hat m_j, \hat l_{j-1}), y^n(j)) \in \mathcal T^{(n)}_{\epsilon}$. Finally it computes the reconstruction sequence as $\hat s_i(j) = \hat s(u_i(\hat k_j|\hat m_j, \hat l_{j-1}), x_i(\hat m_j, \hat l_{j-1}), y_i(j))$ for $i \in [1:n]$.

Following the analysis of the minimum distortion in Section 3.1, it can readily be shown that the scheme achieves any rate up to the capacity–distortion function given in Theorem 11.

3.2.2 Proof of the Converse

We need to show that for any sequence of $(2^{nR}, n)$ codes with $\lim_{n\to\infty} P^{(n)}_e = 0$ and $\mathrm E(d(S^n, \hat S^n)) \le D$, we must have $R \le C_{SC}(D)$. We identify the auxiliary random variables $U_i := (M, S^{i-1}, Y^n_{i+1})$, $i \in [1:n]$, with $S^0 = Y^{n+1} = \emptyset$. Note that, as desired, $U_i \to (X_i, S_i) \to Y_i$ form a Markov chain for $i \in [1:n]$. Consider
\[
\begin{aligned}
nR = H(M) &\overset{(a)}{\le} I(M; Y^n) + n\epsilon_n \\
&= \sum_{i=1}^n I(M; Y_i \mid Y^n_{i+1}) + n\epsilon_n \\
&\le \sum_{i=1}^n I(M, Y^n_{i+1}; Y_i) + n\epsilon_n \\
&= \sum_{i=1}^n I(M, Y^n_{i+1}, S^{i-1}; Y_i) - \sum_{i=1}^n I(S^{i-1}; Y_i \mid M, Y^n_{i+1}) + n\epsilon_n \\
&\overset{(b)}{=} \sum_{i=1}^n I(M, Y^n_{i+1}, S^{i-1}, X_i; Y_i) - \sum_{i=1}^n I(S^{i-1}; Y_i \mid M, Y^n_{i+1}) + n\epsilon_n \\
&\overset{(c)}{=} \sum_{i=1}^n I(M, Y^n_{i+1}, S^{i-1}, X_i; Y_i) - \sum_{i=1}^n I(Y^n_{i+1}; S_i \mid M, S^{i-1}) + n\epsilon_n \\
&\overset{(b)}{=} \sum_{i=1}^n I(M, Y^n_{i+1}, S^{i-1}, X_i; Y_i) - \sum_{i=1}^n I(Y^n_{i+1}; S_i \mid M, S^{i-1}, X_i) + n\epsilon_n \\
&\overset{(d)}{=} \sum_{i=1}^n I(M, Y^n_{i+1}, S^{i-1}, X_i; Y_i) - \sum_{i=1}^n I(M, S^{i-1}, X_i, Y^n_{i+1}; S_i) + n\epsilon_n \\
&= \sum_{i=1}^n I(U_i, X_i; Y_i) - \sum_{i=1}^n I(U_i, X_i; S_i) + n\epsilon_n,
\end{aligned} \tag{3.7}
\]
where (a) follows by Fano's inequality [29, Theorem 7.7.1], which states that $H(M|Y^n) \le n\epsilon_n$ for some $\epsilon_n \to 0$ as $n \to \infty$ for any code satisfying $\lim_{n\to\infty} P^{(n)}_e = 0$; (b) follows since $X_i$ is a function of $(M, S^{i-1})$; (c) follows by the Csiszár sum identity [30, 46, 35, Sec. 2.3]; and (d) follows since $(M, S^{i-1}, X_i)$ is independent of $S_i$. So now we have
\[
\begin{aligned}
R &\le \frac1n \left( \sum_{i=1}^n I(U_i, X_i; Y_i) - \sum_{i=1}^n I(U_i, X_i; S_i) \right) + \epsilon_n \\
&\overset{(a)}{\le} \frac1n \sum_{i=1}^n C_{SC}\bigl(\mathrm E(d(S_i, \hat s_i(U_i, X_i, Y_i)))\bigr) + \epsilon_n \\
&\overset{(b)}{\le} C_{SC}\left(\frac1n \sum_{i=1}^n \mathrm E(d(S_i, \hat s_i(U_i, X_i, Y_i)))\right) + \epsilon_n \\
&\overset{(c)}{\le} C_{SC}(D),
\end{aligned} \tag{3.8}
\]
where (a) follows from the definition of the capacity–distortion function, (b) follows by the concavity of $C_{SC}(D)$ (see Property 1 in Lemma 7), and (c) follows from Lemmas 6 and 7. This completes the proof of Theorem 11.

Remark 3. Note that the inverse of the capacity–distortion function, namely, the distortion–capacity function for strictly causal state communication, is
\[
D_{SC}(C) = \min \mathrm E\bigl(d(S, \hat S)\bigr), \tag{3.9}
\]
where the minimum is over all conditional pmfs $p(x)p(u|x,s)$ and functions $\hat s(u,x,y)$ such that $I(U,X;Y) - I(U,X;S) \ge C$. By setting $C = 0$ in (3.9), we recover Theorem 10. (More interestingly, we can recover Theorem 11 from Theorem 10 by considering a supersource $S' = (S,W)$, where the message source $W$ is independent of $S$, and two distortion measures—the Hamming distortion measure $d(w,\hat w)$ and a generic distortion measure $d(s,\hat s)$.) At the other extreme, by setting $D = \infty$ in (3.6), we recover the capacity expression
\[
C = \max_{p(x)} I(X;Y) \tag{3.10}
\]
of a DMC with DM state when the state information is available strictly causally at the encoder. (Unlike the general tradeoff in Theorem 11, strictly causal state information is useless when communicating the message alone.) Finally, by setting $U = \emptyset$ in Theorem 11, we recover the result in [132] on the capacity–distortion function when the state information is not available at the encoder.
Remark 4. Theorem 11 (as well as Theorem 10) holds for any finite delay, that is, whenever the encoder is defined as $x_i(m, s^{i-d})$ for some $d \in [1:\infty)$. More generally, it continues to hold as long as the delay is sublinear in the block length $n$.

Remark 5. The characterization of the capacity–distortion function in Theorem 11, albeit very compact, does not bring out the intrinsic tension between state estimation and independent information transmission. It can alternatively be written as
\[
C_{SC}(D) = \max_{p(x),\, D_x :\, \mathrm E_X(D_X) \le D} \bigl( I(X;Y) - \mathrm E_X[R^{(X)}_{WZ}(D_X)] \bigr), \tag{3.11}
\]
where
\[
R^{(x)}_{WZ}(D) = \min_{p(u|x,s),\, \hat s(u,x,y) :\, \mathrm E[d(S, \hat S(U,x,Y))] \le D} I(U; S \mid x, Y), \quad x \in \mathcal X,
\]
is the Wyner–Ziv rate–distortion function [127] with side information $(x, Y)$. The rate $R^{(x)}_{WZ}(D_x)$ can be viewed as the price the encoder pays to estimate the channel state at the decoder under distortion $D_x$ by signaling with $x$. In particular, if $R^{(x)}_{WZ}(D)$ is independent of $x$ for a fixed $D$ (i.e., $R^{(x)}_{WZ}(D) = R_{WZ}(D)$), then by the convexity of the Wyner–Ziv rate–distortion function, the alternative characterization of $C_{SC}(D)$ in (3.11) simplifies to
\[
C_{SC}(D) = C_{SC}(\infty) - R_{WZ}(D), \tag{3.12}
\]
where $R_{WZ}(D) = R^{(x)}_{WZ}(D)$, $x \in \mathcal X$. Thus, in this case the capacity is achieved by splitting the unconstrained capacity $C_{SC}(\infty)$ between information transmission and lossy source coding of the past state sequence with side information $(X,Y)$. This simple characterization will be very useful in evaluating the capacity–distortion function in several examples.

Remark 6. Along the same lines as [63], the optimal tradeoff between the state uncertainty reduction rate $\Delta$ and the independent information transmission rate $R$ can be characterized as the set of $(R, \Delta)$ pairs such that
\[
R \le I(X;Y), \qquad \Delta \le H(S), \qquad R + \Delta \le I(X,Y;S)
\]
for some $p(x)$. This result includes both the state uncertainty reduction rate in Remark 2 and the channel capacity in (3.10) as special cases.

In the following subsections, we illustrate Theorem 11 via simple examples.

3.2.3 Injective Deterministic Channels

Suppose that the channel output $Y = y(X,S)$ is a function of $X$ and $S$ such that for every $x \in \mathcal X$, the function $y(x,s)$ is injective (one-to-one) in $s$. This condition implies that $H(Y|X) = H(S)$ for every $p(x)$. For this class of injective deterministic channels, the characterization of the capacity–distortion function in Theorem 11 can be greatly simplified.

Proposition 1. The capacity–distortion function of the injective deterministic channel is
\[
C_{SC}(D) = C_{SC}(0) = \max_{p(x)} I(X;Y) = \max_{p(x)} \bigl( H(Y) - H(S) \bigr). \tag{3.13}
\]

In other words, we can achieve the unconstrained channel capacity as well as perfect state estimation. This is no surprise, since the injective condition implies that given the channel input $X$ and output $Y$, the state $S$ can be recovered losslessly. Note that this result is independent of the distortion measure $d(s,\hat s)$ as long as our critical assumption—that for every $s$ there exists an $\hat s$ with $d(s,\hat s) = 0$—is satisfied.

To prove achievability in Proposition 1, substitute $U = Y$ in Theorem 11 (which satisfies the Markovity condition $U \to (X,S) \to Y$ since $Y$ is a function of $(X,S)$). For the converse, consider
\[
\begin{aligned}
I(U,X;Y) - I(U,X;S) &= I(X;Y) - \bigl( I(U;S|X) - I(U;Y|X) \bigr) \\
&= I(X;Y) - \bigl( H(U|Y,X) - H(U|X,S) \bigr) \\
&\overset{(a)}{=} I(X;Y) - \bigl( H(U|Y,X) - H(U|Y,X,S) \bigr) \\
&= I(X;Y) - I(U;S|Y,X) \\
&\overset{(b)}{=} I(X;Y),
\end{aligned}
\]
where (a) follows since $Y = y(X,S)$ and (b) follows from the injective condition.

Example 2 (Gaussian channel with additive Gaussian state and no noise). Consider the channel
\[
Y = X + S,
\]
where the state $S \sim \mathrm N(0,Q)$. Assume the squared error distortion measure and an expected average power constraint $P$ on $X$.
The capacity–distortion function of this channel is $C_{SC}(D) = \mathrm C(P/Q)$ for all $D$, which is the capacity without state estimation.

Example 3 (Binary symmetric channel with additive Bernoulli state and no noise). Consider the channel
\[
Y = X \oplus S,
\]
where $X$ and $Y$ are binary and the state $S \sim \mathrm{Bern}(q)$. Assume the Hamming distortion measure. The capacity–distortion function of this channel is $C_{SC}(D) = 1 - H(q)$ for all $D$.

In the following subsections, we extend the above two examples to the more general cases where there is additive noise.

3.2.4 Gaussian Channel with Additive Gaussian State

We revisit the Gaussian channel with additive Gaussian noise (see Example 1)
\[
Y = X + S + Z,
\]
where $S \sim \mathrm N(0,Q)$ and $Z \sim \mathrm N(0,N)$. As before, we assume an expected average power constraint $P$ and the squared error distortion measure $d(s,\hat s) = (s - \hat s)^2$. We note the following extreme cases of the capacity–distortion function:

• If $N = 0$, then $C_{SC}(D) = C_{SC}(\infty) = \infty$.

• If $D \le D^* = QN/(P+Q+N)$ (the optimal distortion mentioned in Example 1), then $C_{SC}(D) = 0$.

• If $D \ge QN/(Q+N)$ (the minimum distortion achievable when the encoder has no knowledge of the state), then $C_{SC}(D) = C_{SC}(\infty) = \mathrm C(P/(Q+N))$, which is achieved by first decoding the codeword $X^n$ in a "noncoherent" fashion and then utilizing $X^n$ along with the channel output $Y^n$ to estimate $S^n$ (see [132]).

More generally, we have the following.

Proposition 2. The capacity–distortion function of the Gaussian channel with additive Gaussian state when the state information is strictly causally available at the encoder is
\[
C_{SC}(D) =
\begin{cases}
0, & 0 \le D < \dfrac{QN}{P+Q+N}, \\[1.5ex]
\mathrm C\left(\dfrac{(P+Q+N)D - QN}{QN}\right), & \dfrac{QN}{P+Q+N} \le D < \dfrac{QN}{Q+N}, \\[1.5ex]
\mathrm C\left(\dfrac{P}{Q+N}\right), & D \ge \dfrac{QN}{Q+N}.
\end{cases}
\]

Proposition 2 can be proved by evaluating the characterization in Theorem 11 with the optimal choice of the auxiliary random variable $U$ and the estimation function $\hat s(u,x,y)$. However, the alternative characterization in Remark 5 provides a more direct proof. Since the Wyner–Ziv rate–distortion function [127] for the Gaussian source $S$ with side information $Y = x + S + Z$ is independent of $x$, it follows immediately from (3.12) that $C_{SC}(D) = C_{SC}(\infty) - R_{WZ}(D)$, which is equivalent to the expression given in Proposition 2.

3.2.5 Binary Symmetric Channel with Additive Bernoulli State

Consider the binary symmetric channel
\[
Y = X \oplus S \oplus Z,
\]
where the state $S \sim \mathrm{Bern}(q)$, $q \in [0,1/2]$, and the noise $Z \sim \mathrm{Bern}(p)$, $p \in [0,1/2]$, are independent of each other. Assume the Hamming distortion measure $d(s,\hat s) = s \oplus \hat s$. We note the following extreme cases of the capacity–distortion function:

• If $p = 0$, then $D^* = 0$ and $C_{SC}(D) \equiv 1 - H(q)$.

• If $q = 0$, then $D^* = 0$ and $C_{SC}(D) \equiv 1 - H(p)$.

• If $p = 1/2$, then $D^* = q$ and $C_{SC}(D) \equiv 0$.

• If $q = 1/2$, then $D^* = p$ and $C_{SC}(D) \equiv 0$.

• If $D \ge q$, then $C_{SC}(D) = C_{SC}(\infty) = 1 - H(p * q) = 1 - H(p(1-q) + q(1-p))$.

More generally, we have the following.

Proposition 3. The capacity–distortion function of the binary symmetric channel with additive Bernoulli state when the state information is strictly causally available at the encoder is
\[
C_{SC}(D) = 1 - H(p * q) - R_{WZ}(D),
\]
where
\[
R_{WZ}(D) = \min_{\alpha, \beta \in [0,1] :\, \alpha\beta + (1-\alpha)q \le D} \alpha \bigl( H(\beta * p) - H(p * q) + H(q) - H(\beta) \bigr) \tag{3.14}
\]
is the Wyner–Ziv rate–distortion function for the Bernoulli source and Hamming distortion measure.

As in the Gaussian case, the proof of the proposition follows immediately from the alternative characterization of the capacity–distortion function in Remark 5. Here the Wyner–Ziv rate–distortion function again follows from [127].
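Proposition 3 is easy to evaluate numerically. The following sketch (ours; parameter values and grid resolution are arbitrary choices) computes $R_{WZ}(D)$ in (3.14) by a brute-force grid search over $(\alpha, \beta)$ and then forms $C_{SC}(D)$; the result is clamped at zero, since the expression applies above the minimum distortion $D^*$.

```python
import numpy as np

def Hb(x):
    """Binary entropy in bits, with H(0) = H(1) = 0."""
    x = np.clip(x, 1e-12, 1 - 1e-12)
    return -x * np.log2(x) - (1 - x) * np.log2(1 - x)

def conv(a, b):                      # binary convolution a * b = a(1-b) + b(1-a)
    return a * (1 - b) + b * (1 - a)

def R_wz(D, p, q, grid=400):
    """Grid-search evaluation of the Wyner-Ziv rate-distortion in (3.14)."""
    al = np.linspace(0, 1, grid)[:, None]
    be = np.linspace(0, 1, grid)[None, :]
    feasible = al * be + (1 - al) * q <= D
    rate = al * (Hb(conv(be, p)) - Hb(conv(p, q)) + Hb(q) - Hb(be))
    return np.where(feasible, rate, np.inf).min()

def C_sc(D, p, q):
    # clamp at 0: the closed form is meaningful only for D >= D*
    return max(0.0, 1 - Hb(conv(p, q)) - R_wz(D, p, q))

p = q = 0.25
for D in (0.05, 0.15, 0.25):
    print(D, C_sc(D, p, q))          # at D = q this recovers 1 - H(p * q)
```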
3.3 Causal State Communication

So far in our discussion, we have assumed that the encoder has strictly causal knowledge of the state sequence. What happens if the encoder has causal knowledge of the state sequence, that is, if at time $i \in [1:n]$ the past and current states $s^i$ are available at the encoder? Now a $(2^{nR}, n)$ code, probability of error, achievability, and the capacity–distortion function are defined as in the strictly causal case in Section 3.2, except that the encoder is of the form $x_i(m, s^i)$, $i \in [1:n]$. It turns out that the optimal tradeoff between capacity and distortion can be achieved by a simple modification of the block Markov coding scheme for the strictly causal case.

Theorem 12. The capacity–distortion function for causal state communication is
\[
C_C(D) = \max \bigl( I(U,V;Y) - I(U,V;S) \bigr), \tag{3.15}
\]
where the maximum is over all conditional pmfs $p(v)p(u|v,s)$ with $|\mathcal V| \le \min\{(|\mathcal X| - 1)|\mathcal S| + 1,\, |\mathcal Y|\} + 1$ and $|\mathcal U| \le |\mathcal S| + 2$ and functions $x(v,s)$ and $\hat s(u,v,y)$ such that $\mathrm E(d(S,\hat S)) \le D$.

At one extreme point, if $D = \infty$, then the theorem recovers the unconstrained channel capacity
\[
C_C(\infty) = \max_{p(v)p(u|v,s),\, x(v,s)} \bigl( I(U,V;Y) - I(U,V;S) \bigr) = \max_{p(v),\, x(v,s)} I(V;Y)
\]
established by Shannon [99]. At the other extreme point, the optimal distortion for causal state communication is
\[
D^* = \min \mathrm E\bigl(d(S, \hat S)\bigr),
\]
where the minimum is over all conditional pmfs $p(v)p(u|v,s)$ and functions $x(v,s)$ and $\hat s(u,v,y)$ such that $I(U,V;Y) \ge I(U,V;S)$. Moreover, the condition for zero Hamming distortion can be shown to be $\max_{p(x|s)} I(X,S;Y) \ge H(S)$, which was proved in [63]. Note that by setting $V = X$ in the theorem, we recover the capacity–distortion function $C_{SC}(D)$ for strictly causal communication in Theorem 11.

To prove achievability for Theorem 12, we use the Shannon strategy [99] (see also [35, Sec. 7.5]) and perform encoding over the set of all functions $\{x_v(s) : \mathcal S \mapsto \mathcal X\}$, indexed by $v$, as the input alphabet. This induces a DMC with DM state $p(y|v,s)p(s) = p(y|x(v,s),s)p(s)$ with the state information strictly causally available at the encoder, and we can immediately apply Theorem 11 to prove the achievability of $C_C(D)$. For the converse, we identify the auxiliary random variables $V_i = (M, S^{i-1})$ and $U_i = Y^n_{i+1}$, $i \in [1:n]$. Note that $(U_i, V_i) \to (X_i, S_i) \to Y_i$ form a Markov chain, $V_i$ is independent of $S_i$, and $X_i$ is a function of $(V_i, S_i)$, as desired. The rest of the proof utilizes Lemma 6 and the concavity of $C_C(D)$, and follows steps similar to those for the strictly causal case.

In the following subsections, we illustrate Theorem 12 through simple examples.

3.3.1 Gaussian Channel with Additive Gaussian State

We revisit the Gaussian channel (see Example 1 and Subsection 3.2.4)
\[
Y = X + S + Z.
\]
While the complete characterization of $C_C(D)$ is not known even for the unconstrained case ($D = \infty$), the optimal distortion can be characterized as
\[
D^* = \frac{QN}{(\sqrt P + \sqrt Q)^2 + N}.
\]
Achievability follows by setting $U = V = \emptyset$, $X = \sqrt{P/Q}\, S$, and $\hat S = \mathrm E(S|Y)$. The converse follows from the fact that $D^*$ is also the optimal distortion when the state information is known noncausally at the encoder (see [110]). It is evident that knowing the channel state causally helps the encoder coherently choose the channel codeword $X$ to amplify the channel state $S$, unlike the strictly causal case, where $X$ and $S$ are independent of each other.

3.3.2 Binary Symmetric Channel with Additive Bernoulli State

We revisit the binary symmetric channel (see Subsection 3.2.5)
\[
Y = X \oplus S \oplus Z,
\]
where $S \sim \mathrm{Bern}(q)$ and $Z \sim \mathrm{Bern}(p)$ are independent of each other. We note the following extreme cases of the capacity–distortion function:

• If $p = 0$, then $D^* = 0$ and $C_C(D) \equiv 1 - H(q)$.

• If $q = 0$, then $D^* = 0$ and $C_C(D) \equiv 1 - H(p)$.

• If $p = 1/2$, then $D^* = q$ and $C_C(D) \equiv 0$.

• If $D \ge q$, then $C_C(D) = C_C(\infty) = 1 - H(p)$, which is achieved by canceling the state at the encoder ($X = V \oplus S$); a quick simulation of this cancellation is sketched below.
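The state-cancellation step in the last bullet is easy to check empirically. The following sketch (ours, with arbitrary parameter values) confirms that with $X = V \oplus S$ the end-to-end channel from $V$ to $Y$ behaves as a BSC($p$), so the state no longer affects the message link.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, q = 200_000, 0.25, 0.25
V = rng.integers(0, 2, n)                    # codeword bits (uniform)
S = (rng.random(n) < q).astype(int)          # Bern(q) state
Z = (rng.random(n) < p).astype(int)          # Bern(p) noise
X = V ^ S                                    # causal state cancellation
Y = X ^ S ^ Z                                # = V ^ Z: the state drops out
print("empirical crossover:", np.mean(V != Y), "should be close to p =", p)
```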
In general, the capacity–distortion function is given by the following proposition.

Proposition 4. The capacity–distortion function of the binary symmetric channel with additive Bernoulli state when the state information is causally available at the encoder is
\[
C_C(D) = 1 - H(p) - H(q) + H(D), \quad D \le q.
\]

Proof. For the proof of achievability, observe that if we cancel the state at the encoder and split the unconstrained capacity between information transmission and lossy source coding of the past state sequence (without side information, since $V$ and $Y$ are independent of $S$), then $C_C(\infty) - R(D) = (1 - H(p)) - (H(q) - H(D))$ is achievable. This corresponds to evaluating Theorem 11 with $X = V \oplus S$, $U = V \oplus S \oplus \tilde S$, and $\hat S = U \oplus V = S \oplus \tilde S$, where $V \sim \mathrm{Bern}(1/2)$ and $\tilde S \sim \mathrm{Bern}(D)$ are independent of $S$. (Note the similarity to the rate splitting for the strictly causal case discussed in Remark 5.) For the proof of the converse, consider
\[
\begin{aligned}
I(U,V;Y) - I(U,V;S) &= I(U,V,S;Y) - I(U,V,Y;S) \\
&= H(Y) - H(Y|U,V,S) - H(S) + H(S|U,V,Y) \\
&\overset{(a)}{=} H(Y) - H(Y|X,S) - H(S) + H(S|U,V,Y) \\
&\overset{(b)}{=} H(Y) - H(Y|X,S) - H(S) + H(S \oplus \hat S \mid U,V,Y) \\
&\le 1 - H(p) - H(q) + H(S \oplus \hat S) \\
&= 1 - H(p) - H(q) + H(D),
\end{aligned}
\]
where (a) follows since $X$ is a function of $(V,S)$ and $(U,V) \to (X,S) \to Y$ form a Markov chain, and (b) follows since $\hat S$ is a function of $(U,V,Y)$. This completes the proof of the proposition.

Figure 3.2 compares the capacity–distortion function with causal state information in Proposition 4 to that with strictly causal state information in Proposition 3 when $p = q = 1/4$.

Figure 3.2: The capacity–distortion function of the binary symmetric channel with additive Bernoulli state ($p = q = 1/4$) when the state information is available strictly causally ($C_{SC}$) or causally ($C_C$) at the encoder.

3.3.3 Five-Card Trick

We next consider the classical five-card trick. Two information theorists, Alice and Bob, perform a "magic" trick with a shuffled deck of $N$ cards, numbered from $0$ to $N-1$. Alice asks a member of the audience to select $K$ cards at random from the deck. The audience member passes the $K$ cards to Alice, who examines them and hands one back. Alice then arranges the remaining $K-1$ cards in some order and places them face down in a neat pile. Bob, who has not witnessed these proceedings, then enters the room, looks at the $K-1$ cards, and determines the missing $K$-th card, held by the audience member. There are two key questions:

• Given $K$, what is the maximum number of cards $N$ for which this trick can be performed?

• How is the trick performed?

This trick (discussed in [27], [87]) can be formulated as state communication at zero Hamming distortion with causal state knowledge at the encoder.

Proposition 5. The maximum number of cards $N$ for which the trick can be performed is $K! + K - 1$.

Proof. To show that the maximum cannot be larger than $K! + K - 1$, that is, to prove the converse, we suppose that multiple rounds of the trick are performed. In the framework of causal state communication, the state $S$ corresponds to an unordered tuple of $K$ cards selected by the audience member, which is uniformly distributed over all possible choices of $K$ cards. The channel input $X$ (as well as the channel output $Y$) corresponds to the ordered tuple of $K-1$ cards placed and received, respectively, by Alice and Bob. Since Bob has to recover the missing card losslessly, the problem is equivalent to reproducing the state $S$ itself with zero Hamming distortion (by combining the remaining card with the received $K-1$ cards). Now by Theorem 12, the necessary condition for zero Hamming distortion is given by $\max_{p(x|s)} (H(X) - H(S)) \ge 0$, or equivalently,
\[
\max_{p(x|s)} \bigl( H(X|S) - H(S|X) \bigr) \ge 0. \tag{3.16}
\]
Since $S$ is uniform and the maximum is attained by the (conditionally) uniform $X$, the condition in (3.16) simplifies to $\log(K!) \ge \log(N - (K-1))$, or equivalently, $N \le K! + K - 1$.

We now show that we need only one round of communication to achieve this upper bound on causal state communication. Without loss of generality, assume that the selected cards $(c_0, \ldots, c_{K-1})$ are ordered with $c_0 < c_1 < \cdots < c_{K-1}$. Alice selects card $c_i$ to hand back to the audience, where $i = c_0 + c_1 + \cdots + c_{K-1} \pmod K$. Observe that
\[
c_0 + c_1 + \cdots + c_{K-1} = K r_1 + i \tag{3.17}
\]
for some integer $r_1$. The remaining $K-1$ cards $(c_{j_1}, \ldots, c_{j_{K-1}})$ (where $c_{j_0} = c_i$ is the deleted card) are summed and decomposed as
\[
c_{j_1} + c_{j_2} + \cdots + c_{j_{K-1}} = K r_2 + s \tag{3.18}
\]
for some integer $r_2$. Since all $K$ cards sum to $i \pmod K$, the missing card $c_{j_0} = c_i$ must be congruent to $-s + i \pmod K$. Thus
\[
c_{j_0} = c_i = K(r_1 - r_2) - s + i. \tag{3.19}
\]
Therefore, if we renumber the $N - (K-1)$ cards from $0$ to $K! - 1$ (by removing the $K-1$ retained cards), the hidden card's new number is congruent to $-s \pmod K$: since there are $i$ cards $c_0, \ldots, c_{i-1}$ before card $c_i$, the hidden card's new number $c_i - i$ equals $K(r_1 - r_2) - s$. There are exactly $(K-1)!$ possibilities remaining for the hidden card's number, which can be conveyed by a predetermined permutation of the $K-1$ retained cards. This completes the achievability proof.
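The achievability argument above is fully constructive and fits in a few lines of code. The sketch below is our implementation of the scheme in the proof (function names are our own); it conveys the $(K-1)!$ possibilities through the lexicographic index of the pile's ordering, and exhaustively checks every selection for $K = 3$, $N = K! + K - 1 = 8$.

```python
import math
from itertools import combinations

def nth_perm(sorted_items, t):
    """t-th lexicographic permutation of sorted_items (factorial number system)."""
    items, out = list(sorted_items), []
    for j in range(len(items), 0, -1):
        f = math.factorial(j - 1)
        out.append(items.pop(t // f))
        t %= f
    return out

def perm_index(seq, sorted_items):
    """Inverse of nth_perm: lexicographic index of seq."""
    idx, items = 0, list(sorted_items)
    for x in seq:
        k = items.index(x)
        idx += k * math.factorial(len(items) - 1)
        items.pop(k)
    return idx

def alice(cards, K):
    c = sorted(cards)
    i = sum(c) % K                                  # hand back card c_i
    hidden, kept = c[i], c[:i] + c[i + 1:]
    t = ((hidden - i) - ((-sum(kept)) % K)) // K    # index in [0, (K-1)! - 1]
    return nth_perm(kept, t)                        # the ordered pile conveys t

def bob(pile, K):
    kept = sorted(pile)
    t = perm_index(pile, kept)
    new = t * K + ((-sum(kept)) % K)                # renumbered hidden card
    for r in kept:                                  # undo the renumbering
        if r <= new:
            new += 1
    return new

K, N = 3, math.factorial(3) + 2
for c in combinations(range(N), K):
    pile = alice(c, K)
    hidden = (set(c) - set(pile)).pop()
    assert bob(pile, K) == hidden
print("all", math.comb(N, K), "selections decoded correctly")
```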
3.4 Concluding Remarks

The problem of joint information transmission and channel state estimation over a DMC with DM state was studied in [132] (no state information at the encoder) and in [109, 110] (full state information at the encoder). In this chapter, we have bridged the gap between these two results by studying the case in which the encoder has strictly causal or causal knowledge of the channel state. The resulting capacity–distortion function permits a systematic investigation of the tradeoff between information transmission and state estimation. The missing link in this set of results is the important open problem of finding the capacity–distortion function $C_{NC}(D)$ for a general DMC with DM state under an arbitrary distortion measure when the state sequence is available non-causally at the encoder. This problem is discussed in the next chapter, along with the case in which the decoder wants to estimate a function of the state instead of reconstructing the channel state directly.

Chapter 4

On non-causal channel state information

In the first part of this chapter, we investigate the capacity–distortion trade-off for state-dependent channels with non-causal channel state knowledge at the encoder. We derive lower and upper bounds on the capacity–distortion function and show that the two bounds match for some example channels. In the remaining part of the chapter, we slightly change the state communication framework considered so far. Instead of estimating the source signal, we consider the scenario where the decoder is interested in estimating a known function of the state and the channel input. Since we assume non-causal channel state knowledge at the encoder, the encoder has the freedom to choose the channel input in such a way that the modified state becomes easy to estimate.
The rest of this chapter is organized as follows. Section 4.1 describes the basic channel model with discrete alphabets, characterizes the lower and upper bounds on the capacity–distortion function, establishes achievability, and proves the converse part of the theorem. Section 4.2 then provides key examples to illustrate the tightness of the proposed bounds. Section 4.3 extends the results to the rate–distortion tradeoff setting, wherein we consider a lossy source coding setup with a side information vending machine at the encoder and derive a stronger converse for this problem setting. Section 4.4 describes the basic channel model of implicit communication and establishes upper and lower bounds on the minimum distortion. Section 4.5 specializes our results to the asymptotic version of the Witsenhausen counterexample, characterizes the minimum distortion, establishes achievability, and proves the converse. Finally, Section 4.6 concludes the chapter.

Figure 4.1: Non-causal state communication.

4.1 Problem Setup and Main Result

Consider a point-to-point communication system with state depicted in Fig. 4.1. Suppose that the encoder has non-causal access to the channel state sequence $S^n$ and wishes to communicate the state to the decoder. Also suppose that, in addition to the state sequence $S^n$, the encoder wishes to communicate a message $M$ independent of $S^n$. We assume a DMC with a DM state model $(\mathcal X \times \mathcal S, p(y|x,s)p(s), \mathcal Y)$ that consists of a finite input alphabet $\mathcal X$, a finite output alphabet $\mathcal Y$, a finite state alphabet $\mathcal S$, and a collection of conditional pmfs $p(y|x,s)$ on $\mathcal Y$. The channel is memoryless in the sense that, without feedback, $p(y^n|x^n,s^n) = \prod_{i=1}^n p_{Y|X,S}(y_i|x_i,s_i)$, and the state is memoryless in the sense that the sequence $(S_1, S_2, \ldots)$ is independent and identically distributed (i.i.d.) with $S_i \sim p_S(s_i)$.

A $(2^{nR}, n)$ code for non-causal state communication consists of

• a message set $[1:2^{nR}]$,

• an encoder that assigns a symbol $x_i(m, s^n) \in \mathcal X$ to each message $m \in [1:2^{nR}]$ and each state sequence $s^n \in \mathcal S^n$, and

• a decoder that assigns a message estimate $\hat m \in [1:2^{nR}]$ (or an error message $\mathrm e$) and a state sequence estimate $\hat s^n \in \hat{\mathcal S}^n$ to each received sequence $y^n \in \mathcal Y^n$.

We assume that $M$ is uniformly distributed over the message set. The average probability of error is defined as $P^{(n)}_e = \mathrm P\{\hat M \ne M\}$. The fidelity of the state estimate is measured by the expected distortion
\[
\mathrm E\bigl(d(S^n, \hat S^n)\bigr) = \frac1n \sum_{i=1}^n \mathrm E\bigl(d(S_i, \hat S_i)\bigr),
\]
where $d : \mathcal S \times \hat{\mathcal S} \to [0,\infty)$ is a distortion measure between a state symbol $s \in \mathcal S$ and a reconstruction symbol $\hat s \in \hat{\mathcal S}$. Without loss of generality, we assume that for every symbol $s \in \mathcal S$ there exists a reconstruction symbol $\hat s \in \hat{\mathcal S}$ such that $d(s,\hat s) = 0$. A rate–distortion pair is said to be achievable if there exists a sequence of $(2^{nR}, n)$ codes such that $\lim_{n\to\infty} P^{(n)}_e = 0$ and $\limsup_{n\to\infty} \mathrm E(d(S^n, \hat S^n)) \le D$. The capacity–distortion function $C_{NC}(D)$ is the supremum of the rates $R$ such that $(R,D)$ is achievable. We next characterize this optimal tradeoff between information transmission rate (capacity $C$) and state estimation (distortion $D$). The following theorem gives a lower bound on the capacity–distortion function $C_{NC}(D)$.

Theorem 13. The capacity–distortion function for non-causal state communication is lower bounded by
\[
C_{NC}(D) \ge C^l_{NC}(D) = \max \bigl( I(U;Y) - I(U;S) \bigr), \tag{4.1}
\]
where the maximum is over all conditional pmfs $p(u|s)p(x|u,s)$ and functions $\hat s(u,y)$ such that $\mathrm E(d(S,\hat S)) \le D$.
The achievability scheme essentially follows the coding scheme proposed in [109] for the same setting, which was later also utilized in the hybrid coding schemes of [78] and [16] for related problems. For completeness, we provide the proof of this theorem below.

4.1.1 Proof of Achievability

Although the achievability proof essentially follows [109], the analysis of the expected distortion requires a more rigorous argument than the conventional proof of the channel coding theorem [35].

Codebook generation. Fix a conditional pmf $p(u|s)p(x|u,s)$ and function $\hat s(u,y)$ that attain $C^l_{NC}(D/(1+\epsilon))$, where $D$ is the desired distortion, and let $p(u) = \sum_s p(s)p(u|s)$. Randomly and independently generate $2^{n(R+\tilde R_S)}$ sequences $u^n(m,l)$, $m \in [1:2^{nR}]$, $l \in [1:2^{n\tilde R_S}]$, each according to $\prod_{i=1}^n p_U(u_i)$. The codebook is revealed to both the encoder and the decoder.

Encoding. Suppose $m \in [1:2^{nR}]$ is the new message index to be sent to the decoder. The encoder finds an index $l$ such that
\[
(s^n, u^n(m,l)) \in \mathcal T^{(n)}_{\epsilon'}.
\]
If there is more than one such index, it selects one of them uniformly at random. If there is no such index, it selects an index from $[1:2^{n\tilde R_S}]$ uniformly at random. Based on the sequence $u^n(m,l)$, the encoder chooses a channel input sequence $x^n$ such that $(x^n, s^n, u^n(m,l)) \in \mathcal T^{(n)}_{\epsilon'}$ and transmits this $x^n$ over the state-dependent channel.

Decoding. Let $\epsilon > \epsilon'$. At the end of the block, the decoder finds the unique index pair $(\hat m, \hat l)$ such that $(u^n(\hat m, \hat l), y^n) \in \mathcal T^{(n)}_{\epsilon}$. If there is more than one such pair, an error is declared. Finally it computes the reconstruction sequence as $\hat s_i = \hat s(u_i(\hat m, \hat l), y_i)$ for $i \in [1:n]$.

Analysis of expected distortion. Let $M, L$ be the indices chosen for communication. We bound the probability of error and the distortion averaged over the random choice of the codebooks. Define the "error" event
\[
\mathcal E = \bigl\{ (S^n, U^n(\hat M, \hat L), X^n, Y^n) \notin \mathcal T^{(n)}_{\epsilon} \bigr\}
\]
and consider the events
\[
\mathcal E_1 = \{ (S^n, U^n(M,L), X^n, Y^n) \notin \mathcal T^{(n)}_{\epsilon} \}, \qquad
\mathcal E_2 = \{ \hat M \ne M \} \cup \{ \hat L \ne L \}.
\]
Then by the union of events bound, $\mathrm P\{\mathcal E\} \le \mathrm P\{\mathcal E_1\} + \mathrm P\{\mathcal E_2\}$. We bound each term. For the first term, let
\[
\tilde{\mathcal E}_1 = \{ (S^n, U^n(M,L), X^n) \notin \mathcal T^{(n)}_{\epsilon'} \}
\]
and note that $\mathrm P\{\mathcal E_1\} \le \mathrm P\{\tilde{\mathcal E}_1\} + \mathrm P\{\tilde{\mathcal E}_1^c \cap \mathcal E_1\}$. By the independence of the codebooks (in particular, the independence of $L$ and the codebook) and the covering lemma [35, Sec. 3.7], $\mathrm P\{\tilde{\mathcal E}_1\}$ tends to zero as $n \to \infty$ if $\tilde R_S > I(U;S) + \delta(\epsilon')$. Since $\epsilon > \epsilon'$ and $Y^n \mid \{U^n(M,L) = u^n, X^n = x^n, S^n = s^n\} \sim \prod_{i=1}^n p_{Y|X,S}(y_i|x_i,s_i)$, by the conditional typicality lemma [35, Sec. 2.5], $\mathrm P\{\tilde{\mathcal E}_1^c \cap \mathcal E_1\}$ tends to zero as $n \to \infty$. Next, using the proof strategies of [78, 72], we can show that $\mathrm P\{\mathcal E_2\}$ tends to zero as $n \to \infty$ if $R + \tilde R_S < I(U;Y) - \delta(\epsilon)$. Combining the bounds and eliminating $\tilde R_S$, we have shown that $\mathrm P\{\mathcal E\}$
Finally, by the continuity of mutual information terms inp(u|s)p(x|u,s), the same conclusion holds when we relax the strict inequality toR≤ I(U;Y)−I(U;S). This completes the achievability proof. Remark 7. Following an argument similar to that made in [38], it can be shown that it suffices to maximize the lower bound onC NC (D) using a deterministic mappingx(u,s), since for a fixedp(u|s),I(U;Y)− I(U;S) is a convex function ofp(x|u,s). Remark 8. Note that the inverse of the capacity–distortion function, namely, the distortion–capacity func- tion for non-causal state communication is D NC (C) = min E(d(S, ˆ S)), (4.2) where the minimum is over all conditional pmfsp(u|s) and functionsx(u,s),ˆ s(u,y) such thatI(U;Y)− I(U;S)≥ C. By settingC = 0, we can provide an upper bound on the minimum distortionD ∗ . At the other extreme, by settingD =∞ in (4.1), we recover the capacity expression C = max p(u|s)x(u,s) I(U;Y)−I(U;S) (4.3) of a DMC with DM state (i.e. the Gelfand–Pinsker channel [38]) when the state information is available non-causally at the encoder. Finally, by choosingU = (V,W) withp(u|s)x(u,s) =p(v)p(w|v,s)x(v,s) 77 we recover the causal state communication result in [20] and similarly by settingU =∅ in Theorem 13, we recover the result in [132] on the capacity–distortion function when the state information is not available at the encoder. Theorem 14. The capacity–distortion function for non-causal state communication is upper bounded by C NC (D)≤C u NC (D) = max I(X,S;Y)−I(U,Y;S) , (4.4) where the maximum is over all conditional pmfsp(u|s)p(x|u,s) and functionsˆ s(u,y) such that E(d(S, ˆ S))≤ D. Remark 9. Observe that the upper bound onC NC (D) can also be written as I(X,S;Y)−I(U,Y;S) (a) = I(X,S,U;Y)−I(U,Y;S) = I(U,S;Y)−I(U,Y;S)+I(X;Y|U,S) = I(U;Y)−I(U;S)+I(X;Y|U,S), where (a) follows sinceU→ (X,S)→ Y forms a Markov chain. From this alternate characterization of the upper bound, it is easy to see that if we restrictp(x|u,s) = x(u,s) then the upper bound matches with the lower bound. It may not be sufficient to maximize the upper bound over the functionx(u,s), such a choice maximizesI(U;Y)−I(U;S), however it minimizes the second termI(X;Y|U,S) in the upper bound. Thus the upper bound may be strictly larger than the lower bound. Remark 10. Note that from the definition of the inverse of the capacity–distortion function, namely, the distortion–capacity function for non-causal state communication, we can provide a lower bound on the minimum distortionD ∗ using Theorem 14: D ∗ ≥ min E(d(S, ˆ S)), where the minimum is over all conditional pmfsp(u|s)p(x|u,s) and the function ˆ x(u,y) such that I(X,S;Y)≥I(U,Y;S). 78 4.1.2 Proof of the Converse Before proving Theorem 14, we summarize a few useful properties ofC u NC (D) in Lemma 8 without proof. Lemma 8. The upper bound on capacity–distortion functionC u NC (D) in Theorem 14 has the following properties: (1)C u NC (D) is a nondecreasing concave function ofD. (2)C u NC (D) is a continuous function ofD. To prove the converse, we need to show that given any sequence of(2 nR ,n) codes withlim n→∞ P (n) e = 0 and E(d(S n , ˆ S n ))≤ D, we must have R ≤ C u NC (D). We identify the auxiliary random variables U i := (M,S n i+1 ,Y i−1 ,Y n i+1 ), i ∈ [1 : n] with Y 0 = Y n+1 = S n+1 = ∅. Note that, as desired, U i → (X i ,S i )→Y i form a Markov chain fori∈ [1:n]. 
Consider
\[
nR = H(M) \overset{(a)}{\le} I(M; Y^n) + n\epsilon_n = I(M; Y^n) - I(M, S^n; Y^n) + I(M, S^n; Y^n) + n\epsilon_n, \tag{4.5}
\]
where (a) follows by Fano's inequality [29, Theorem 7.7.1], which states that $H(M|Y^n) \le n\epsilon_n$ for some $\epsilon_n \to 0$ as $n \to \infty$ for any code satisfying $\lim_{n\to\infty} P^{(n)}_e = 0$. Now consider
\[
\begin{aligned}
I(M, S^n; Y^n) &= H(Y^n) - H(Y^n \mid M, S^n) \\
&\overset{(a)}{=} H(Y^n) - H(Y^n \mid M, X^n, S^n) \\
&= \sum_{i=1}^n \bigl( H(Y_i \mid Y^{i-1}) - H(Y_i \mid M, X^n, S^n, Y^{i-1}) \bigr) \\
&\overset{(b)}{\le} \sum_{i=1}^n \bigl( H(Y_i) - H(Y_i \mid M, X^n, S^n, Y^{i-1}) \bigr) \\
&\overset{(c)}{=} \sum_{i=1}^n \bigl( H(Y_i) - H(Y_i \mid X_i, S_i) \bigr) = \sum_{i=1}^n I(Y_i; X_i, S_i),
\end{aligned} \tag{4.6}
\]
where (a) follows since $X_i$ is a function of $(M, S^n)$ for each $i \in [1:n]$, (b) follows since conditioning reduces entropy, and (c) follows from the memoryless property of the channel $p(y|x,s)$. Similarly,
\[
\begin{aligned}
I(M, S^n; Y^n) - I(M; Y^n) &= I(S^n; Y^n \mid M) \\
&= \sum_{i=1}^n I(S_i; Y^n \mid M, S^n_{i+1}) \\
&= \sum_{i=1}^n I(S_i; M, Y^n, S^n_{i+1}) \\
&= \sum_{i=1}^n I(S_i; M, Y^{i-1}, Y^n_{i+1}, S^n_{i+1}, Y_i) = \sum_{i=1}^n I(S_i; U_i, Y_i).
\end{aligned} \tag{4.7}
\]
Now substituting (4.6) and (4.7) in (4.5), we get
\[
\begin{aligned}
R &\le \frac1n \sum_{i=1}^n \bigl( I(X_i, S_i; Y_i) - I(U_i, Y_i; S_i) \bigr) + \epsilon_n \\
&\overset{(a)}{\le} \frac1n \sum_{i=1}^n C^u_{NC}\bigl(\mathrm E(d(S_i, \hat s_i(U_i, Y_i)))\bigr) + \epsilon_n \\
&\overset{(b)}{\le} C^u_{NC}\left(\frac1n \sum_{i=1}^n \mathrm E(d(S_i, \hat s_i(U_i, Y_i)))\right) + \epsilon_n \\
&\overset{(c)}{\le} C^u_{NC}(D),
\end{aligned} \tag{4.8}
\]
where (a) follows from the definition of $C^u_{NC}(\cdot)$, (b) follows by the concavity of $C^u_{NC}(D)$ (see Property 1 of Lemma 8), and (c) follows from the fact that $\hat s_i = \hat s_i(Y^n)$ is a function of the pair $(U_i, Y_i)$ and $C^u_{NC}(\cdot)$ is a nondecreasing continuous function of its argument. This completes the proof of Theorem 14.

Remark 11. Note that we could have chosen the auxiliary random variable $U_i := (M, S^n_{i+1}, Y^{i-1})$, $i \in [1:n]$, with $Y^0 = S^{n+1} = \emptyset$ in our problem setting and then, following steps similar to the Gelfand–Pinsker converse (see [38]), shown that
\[
R \le \frac1n \sum_{i=1}^n \bigl( I(U_i; Y_i) - I(U_i; S_i) \bigr) + \epsilon_n,
\]
which is the same as the lower bound $C^l_{NC}(D)$. However, with this choice of the auxiliary random variable $U_i$, proving that the distortion constraint is satisfied is beyond our current technique of using estimation-theoretic inequalities such as [20, Lemma 1], since the Markov chain condition $Y^n \to (U_i, Y_i) \to S_i$ fails to hold with non-causal state information at the encoder. (Recall that this Markov chain condition holds for strictly causal and causal state communication, which enabled the complete characterization of the capacity–distortion function for those two cases in [20].)

4.2 Illustrative Examples

In this section, we discuss several simple examples to illustrate the computation and optimality of the achievable rate and the proposed upper bound.

4.2.1 Lossless Communication

Suppose that the state sequence needs to be communicated losslessly, i.e., $\lim_{n\to\infty} \mathrm P\{\hat S^n \ne S^n\} = 0$. We can establish the following consequence of Theorems 13 and 14.

Corollary 2. $H(S) < \Delta^* = \max_{p(x|s)} I(X,S;Y)$ $\Leftrightarrow$ the state sequence can be communicated losslessly.

To prove this, consider the special case of $\hat{\mathcal S} = \mathcal S$ and the Hamming distortion measure $d(s,\hat s)$ (i.e., $d(s,\hat s) = 0$ if $s = \hat s$ and $1$ if $s \ne \hat s$). By setting $U = S$ in the achievability proof of Theorem 13 (for the case $R = 0$) and noting that no "error" implies that $S^n = \hat S^n$, we can conclude that the state sequence can be communicated losslessly if $\Delta^* > H(S)$ for some $p(x|s)$. The converse follows immediately, since the lossless condition that the block error probability $\mathrm P\{\hat S^n \ne S^n\}$ tends to zero as $n \to \infty$ implies the zero Hamming distortion condition that the average symbol error probability $(1/n) \sum_{i=1}^n \mathrm P\{\hat S_i \ne S_i\}$ tends to zero as $n \to \infty$.
Combining this observation with the converse proof of Theorem 14, we can conclude that $H(S)$ must be less than or equal to $\Delta^*$.

4.2.2 Quadratic Gaussian state communication

Consider the Gaussian channel with additive Gaussian state [24]:
\[
Y = X + S + Z,
\]
where the state $S \sim \mathrm N(0,Q)$ and the noise $Z \sim \mathrm N(0,N)$ are independent. Assume an expected average transmission power constraint
\[
\sum_{i=1}^n \mathrm E\bigl(x_i^2(S^n)\bigr) \le nP,
\]
where the expectation is over the random state sequence $S^n$. We assume the squared error (quadratic) distortion measure $d(s,\hat s) = (s - \hat s)^2$. We wish to characterize the minimum distortion $D^*$ achievable for this channel. This problem was studied in [110], and the optimal distortion was characterized as
\[
D^* = \frac{QN}{(\sqrt P + \sqrt Q)^2 + N}.
\]
Achievability follows from [110] by choosing $X = \sqrt{P/Q}\, S$ and $\hat S = \mathrm E(S|Y)$. Here we give an alternative simple proof of the converse. For the converse, consider the expression in Theorem 14:
\[
\begin{aligned}
I(X,S;Y) - I(U,Y;S) &= h(Y) - h(Y|X,S) - h(S) + h(S|U,Y) \\
&\overset{(a)}{=} h(Y) - h(Z) - h(S) + h(S - \hat S \mid U, Y) \\
&\overset{(b)}{\le} h(Y) - h(Z) - h(S) + h(S - \hat S) \\
&\overset{(c)}{\le} \tfrac12 \log\Bigl(2\pi e\bigl((\sqrt P + \sqrt Q)^2 + N\bigr)\Bigr) - \tfrac12 \log\bigl((2\pi e)^2 QN\bigr) + \tfrac12 \log(2\pi e D) \\
&= \tfrac12 \log \frac{\bigl((\sqrt P + \sqrt Q)^2 + N\bigr) D}{QN},
\end{aligned}
\]
where (a) follows since $\hat S$ is a function of $(U,Y)$, (b) follows since conditioning reduces entropy, and (c) follows from the maximum differential entropy lemma [35]. Invoking Remark 10 on Theorem 14 completes the proof of the converse.
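As a sanity check on the achievability side, the state-amplification strategy $X = \sqrt{P/Q}\,S$ makes $Y = (1+\sqrt{P/Q})S + Z$, and the resulting scalar MMSE can be evaluated both in closed form and by simulation. The sketch below (our illustration, with arbitrary $P$, $Q$, $N$) confirms that it matches $D^* = QN/((\sqrt P + \sqrt Q)^2 + N)$.

```python
import numpy as np

rng = np.random.default_rng(2)
P, Q, N, n = 4.0, 1.0, 0.5, 1_000_000     # hypothetical parameters
a = 1 + np.sqrt(P / Q)                    # Y = a S + Z under X = sqrt(P/Q) S

S = rng.normal(0, np.sqrt(Q), n)
Z = rng.normal(0, np.sqrt(N), n)
Y = a * S + Z
S_hat = (a * Q / (a**2 * Q + N)) * Y      # linear MMSE estimate E(S|Y)

print("empirical distortion:", np.mean((S - S_hat) ** 2))
print("closed form D*:     ", Q * N / ((np.sqrt(P) + np.sqrt(Q)) ** 2 + N))
```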
The decoder side informationY n is generated as the output of a DMCp(y|x,a) whose input is the pair (X,A). The reconstruction sequence ˆ X n is then based on the received indexW and on the side information sequenceY n . Note that whenW =∅, the problem reduces to the problem studied in Section II. This problem setting is previously studied in [91] and they provided lower and upper bounds on the rate–distortion function for this channel. In this paper, we will provide an improved lower bound by using a technique similar to that used in Section II. A (2 nR ,n) lossy source code for action-dependent side information available non-causally at the de- coder consists of • an encoder that assigns an indexw(x n )∈ [1:2 nR ] to eachx n ∈X n , • an action encoder that assigns an action sequencea i (x n )∈A to eachx n ∈X n fori∈ [1:n], • the side informationy n will be then the output of the memoryless channelp(y|x,a) whose input is (x n ,a n (w)) and • a decoder that assigns an estimate ˆ x i (w,y n ) to each received indexw∈ [1:2 nR ] and side informa- tion sequencey n ∈Y n . 84 The rate–distortion function with non-causal action-dependent side information available at the decoder, R SI−D (D), is the infimum of the rates R such that there exists a sequence of (2 nR ,n) codes with limsup n→∞ E(d(X n , ˆ X n ))≤ D. The following theorems summarizes the lower and upper bound on R SI−D (D) provided in [91]. Theorem 15. [91] The rate–distortion function for the channel model is upper bounded by R SI−D (D)≤ min I(X;A)−I(Y;A)+I(X;U|Y,A) , where the minimum is over all conditional pmfsp(a,u|x) and function ˆ x(a,u,y) such that E(d(X, ˆ X(A,U,Y)))≤D. Theorem 16. [91] The rate–distortion function for the channel model is lower bounded by R SI−D (D)≥ min I(X; ˆ X)−I(Y;X,A) , where the minimum is over all conditional pmfsp(a|x)p(ˆ x|y,a,x) such that E(d(X, ˆ X))≤D. It is easy to see that the two bounds do not coincide. We next derive an improved lower bound for this problem setting, we propose the following: Theorem 17. The rate–distortion function for the channel model is lower bounded by R SI−D (D)≥ min I(X;U,Y)−I(Y;X,A) , where the minimum is over all conditional pmfsp(a|x)p(u|a,x) such that E(d(X, ˆ X(U,Y)))≤D. Remark 12. Note that the lower bound of Theorem 17 improves that of Theorem 16 since I(X;U,Y)−I(Y;X,A) (a) =I(X;U,Y, ˆ X)−I(Y;X,A) ≥I(X; ˆ X)−I(Y;X,A). 85 Remark 13. The lower bound proposed in Theorem 17 can be rewritten as I(X;U,Y)−I(Y;X,A) = I(U;X|A,Y)+I(X;Y,A) −I(Y;X,A)−I(X;A|U,Y) = I(U;X|A,Y)+I(X;A) −I(Y;A)−I(X;A|U,Y). Now comparing this alternate characterization with the upper bound of Theorem 15, we can see that the gap between the two bounds is given by the termI(X;A|U,Y). So for the two bounds to match, the joint pmf minimizing the lower bound should satisfy the Markov chain conditionA→ (U,Y)→X. Proof. We need to show that given any sequence of (2 nR ,n) codes with E(d(X n , ˆ X n ))≤ D, we must haveR≥ R l SI−D (D). We identify the auxiliary random variablesU i := (X i−1 ,Y n\i ,W),i∈ [1 :n] 86 withn\i = [1 :n]−i andX 0 = Y 0 = Y n+1 =∅. Note that, as desired,U i → (X i ,A i )→ Y i form a Markov chain. 
Consider nR =H(W) (a) ≥I(X n ;W|Y n ) =I(X n ;Y n ,W)−I(X n ;Y n ) =H(X n )−H(X n |Y n ,W)−H(Y n )+H(Y n |X n ) (b) = n X i=1 H(X i )−H(X n |Y n ,W) −H(Y n )+H(Y n |X n ,A n ) (c) = n X i=1 (H(X i )−H(X i |X i−1 ,Y n\i ,W,Y i ) −H(Y i |Y i−1 )+H(Y i |X i ,A i )) (d) ≥ n X i=1 (H(X i )−H(X i |X i−1 ,Y n\i ,W,Y i ) −H(Y i )+H(Y i |X i ,A i )) = n X i=1 (H(X i )−H(X i |U i ,Y i ) −H(Y i )+H(Y i |X i ,A i )) = n X i=1 (I(U i ,Y i ;X i )−I(X i ,A i ;Y i )), where (a) follows from the fact thatW is a function ofX n , (b) follows sinceA n is a function ofX n , (c) follows sinceY i−1 ,X n\i ,A n\i →X i ,A i →Y i , and (d) follows since conditioning reduces entropy. Noting that ˆ X i = ˆ X i (W,Y n ) is a function of the pair(U i ,Y i ), the proof can be now completed using the standard time sharing argument similar to that employed in Section II. 4.4 Discrete memoryless implicit channel In the last few sections, we have studied the problem of state communication when the channel state infor- mation is available non-causally at the encoder. For this state communication problem, we proposed lower 87 Figure 4.3: Channel model for implicit communication and upper bound on the capacity–distortion function, which matches for Gaussian and binary state depen- dent channels. In this section, we consider a slightly different problem, where the decoder is not interested in estimating the state, but it wants to reconstruct a function of the state with maximum fidelity. Consider a point-to-point communication system with state depicted in Fig. 4.3. Suppose that the encoder has non- causal access to the channel state sequenceS n and wishes to communicate a modified version of the state to the decoder. We assume a DMC with a DM state model(U×S×X,x(u,s)p(y|u,s)p(s),Y) that con- sists of a finite input alphabetU, a finite output alphabetY, a finite state alphabetS, a finite modified state alphabetX , a deterministic functionx(u,s) onX , and a collection of conditional pmfsp(y|u,s) onY. The channel is memoryless in the sense that, without feedback,p(y n |u n ,s n ) = Q n i=1 p Y|U,S (y i |u i ,s i ), and the state is memoryless in the sense that the sequence (S 1 ,S 2 ,...) is independent and identically distributed (i.i.d.) with S i ∼ p S (s i ). The enocder wants to communicate the modified state sequence X n ={x i (u i ,s i )} n i=1 to the decoder. An(|X| n ,n) code for modified state communication over the DMIC with DM state consists of • an encoder that assigns a symbolu i (s n )∈U to each state sequences n ∈S n fori∈ [1:n], • the modified state signalx i ∈X is then obtained by a symbol-by-symbol mapping of the channel inputu i and the channel states i , and • a decoder that assigns an estimate ˆ x n ∈ ˆ X n to each received sequencey n ∈Y n . The fidelity of the estimate is measured by the expected distortion E(d(X n , ˆ X n )) = 1 n n X i=1 E(d(X i , ˆ X i )), where d : X× ˆ X → [0,∞) is a distortion measure between a modified state symbol x ∈ X and a reconstruction symbol ˆ x∈ ˆ X . Without loss of generality, we assume that for every symbolx∈X there 88 exists a reconstruction symbol ˆ x∈ ˆ X such thatd(x,ˆ x) = 0. A distortionD is said to be achievable if there exists a sequence of(|X| n ,n) codes such that limsup n→∞ E(d(X n , ˆ X n ))≤D. We next characterize the minimum distortionD ∗ , which is the infimum of all the achievable distortions D. The following theorem gives an upper bound on the minimum distortionD ∗ . Theorem 18. 
For a discrete, memoryless, state-dependent channel p(y|u,s) with non-causal state in- formation S n ∼ Q n i=1 p(s i ) at the encoder and a disotortion measure on the modified state signal, d(x,ˆ x) :X× ˆ X→R, the minimum achievable distortionD ∗ is upper bounded by D ∗ ≤ min E(d(X, ˆ X)), where the minimum is over all conditional pmfsp(v|s) and functionsu(v,s) and ˆ x(v,y) such that I(U,S;Y)≥I(V,Y;S). The achievability scheme essentially follows the coding scheme proposed in [109] for the setting of non-causal state amplification, which was later also utilized in the hybrid coding scheme of [78] and in the problem of estimation with a helper in [16]. The upper bound of Theorem 18 can be easily retrieved from the proof of Theorem 13 by substitutingR = 0. Theorem 19. For a discrete, memoryless, state-dependent channel p(y|u,s) with non-causal state in- formation S n ∼ Q n i=1 p(s i ) at the encoder and a disotortion measure on the modified state signal, d(x,ˆ x) :X× ˆ X→R, the minimum achievable distortionD ∗ is lower bounded by D ∗ ≥ min E(d(X, ˆ X)), where the minimum is over all conditional pmfsp(v|s),p(u|v,s) and function ˆ x(v,y) such that I(U,S;Y)≥I(V,Y;S). Note that the upper bound of Theorem 18 and lower bound of Theorem 19 onD ∗ differ in the choice 89 of the optimizing pmf. While the upper bound looks for the optimal function u(v,s), the lower bound optimizes the distortion over all possible choices on the conditional pmfp(v|u,s). The proof of this lower bound follows from the proof of Theorem 14. 4.4.1 Lossless Communication Suppose that the modified state sequenceX n needs to be communicated losslessly, i.e.,lim n→∞ P{ ˆ X n 6= X n } = 0. We can establish the following congruences of Theorem 18 and 19. Similar results on lossless stateS communication have been reported in [20]. Corollary 3. IfH(S)≤ Δ ∗ = max p(u|s) I(U,S;Y)+H(S|X), then the modified state sequence can be communicated losslessly. Conversely, if the modified state sequence can be communicated losslessly, then H(S)≤ Δ ∗ . We provide a sketch of the proof here. Consider the special case of ˆ X =X and Hamming distortion measure d(x,ˆ x) (i.e., d(x,ˆ x) = 0 if x = ˆ x and 1 if x6= ˆ x). By setting V = X in the achievability of Theorem 18 and noting that no “error” implies that X n = ˆ X n , we can conclude that the modified state sequence can be communicated losslessly if Δ ∗ > H(S) for some p(u|s). The converse follows immediately since the lossless condition implies that the block error probability P{ ˆ X n 6= X n } tends to zero asn→∞ implies the zero Hamming distortion condition that the average symbol error probability (1/n) P n i=1 P{ ˆ X i 6= X i } tends to zero asn→∞. Combining this observation with the converse proof of Theorem 19 in Subsection 3.1.2, we can conclude thatH(S) must be less than or equal toΔ ∗ . 4.4.2 Binary Symmetric Channel with Additive Bernoulli State We next show how our results enable an alternative proof for a result in [16]. Consider the binary symmet- ric channel Y =U⊕S⊕Z =X⊕Z, 90 where the stateS∼ Bern(q),q∈ [0,1/2], and the noiseZ∼ Bern(p),p∈ [0,1/2], are independent of each other. The inputU is assumed to be binary and it is cost constrained by E(ρ(U))≤C, whereρ(U) = 1 ifU = 1 and0 otherwise. Assuming the Hamming distortion measured(x,ˆ x) =x⊕ˆ x, we wish to find the minimum distortionD ∗ achievable for this channel model. The binary channel model was also studied in [16] in connection with problems in estimation with a helper who knows the interference. 
The minimum distortionD ∗ achievable for this problem is open with lower and upper bounds onD ∗ proposed in [16]. Specializing our lower bound in Theorem 19 to this case of binary estimation with a helper problem, we can give a simpler, alternative proof of the lower bound on D ∗ proposed in [16, Theorem 3]. Corollary 4 ([16]). A lower bound for the achievable distortion for the problem of binary estimation with a helper is given by D ∗ ≥ minH −1 2 (H 2 (p)+H 2 (q)−H 2 (Y))− E(U), where we defineH −1 2 (·) = 0 if the argument is negative or greater than 1, and the minimization is over the conditional pmfp(u|s) such that E(U) = E(ρ(U))≤C. Proof. Consider the lower bound proposed in Theorem 19, I(U,S;Y)−I(V,Y;S) =H 2 (Y)−H 2 (Y|U,S)−H 2 (S)+H(S|V,Y) (a) =H 2 (Y)+H 2 (S⊕ ˆ X⊕Y|V,Y)−H 2 (q)−H 2 (p) =H 2 (Y)+H 2 (S⊕X⊕Y⊕X⊕ ˆ X|V,Y)−H 2 (q)−H 2 (p) =H 2 (Y)+H 2 (S⊕Z⊕X⊕ ˆ X|V,Y)−H 2 (q)−H 2 (p) =H 2 (Y)+H 2 (S⊕Z⊕Y⊕X⊕ ˆ X|V,Y)−H 2 (q)−H 2 (p) =H 2 (Y)+H 2 (U⊕X⊕ ˆ X|V,Y)−H 2 (q)−H 2 (p) ≤H 2 (Y)+H 2 (U⊕X⊕ ˆ X)−H 2 (q)−H 2 (p) (b) ≤H 2 (Y)+H 2 (D ∗ + E(U))−H 2 (q)−H 2 (p), (4.9) 91 Figure 4.4: Vector Witsenhausen’s counterexample. where,(a) follows since ˆ X is a function of(V,Y), and(b) follows since E(U⊕X⊕ ˆ X)≤ E(U)+E(X⊕ ˆ X) =D ∗ + E(U). This completes the alternative proof of the corollary. 4.5 Witsenhausen counterexample While the above distortion bounds are proved for finite alphabets, it can be easily adapted to the Gaussian setting by incorporating cost constraints on the channel input and applying the standard discretization argument [35, Sections 3.4 and 3.8]. In this section, we specialize the DMIC results to the Gaussian setting, which is equivalent to the asymptotic version of Witsenhausen’s counterexample (see Fig. 4.4, introduced in [124]), which has the following channel model: Y i =U i (S n )+S i +Z i , 1≤i≤n =X i +Z i , where • S n andZ n are independent,S n is i.i.d.∼ N(0,σ 2 ) andZ n is i.i.d.∼ N(0,1). • The channel inputs are confined to 1 n E n X i=1 (U 2 i (S n )) ≤P. • The decoder wants to estimate the modified host signalX n with minimum possible distortion. • We assume the squared error (quadratic) distortion measured(x,ˆ x) = (x− ˆ x) 2 . We wish to compute the minimum distortion D ∗ achievable for this channel. Based on our result on modified state estimation over implicit communication channel, we derive an improved lower bound on 92 D ∗ and we show that the achievable strategy proposed in [42] achieves our improved lower bound and thus we characterize the minimum distortion of the asymptotic Witsenhausen problem for all values of the parametersP andσ 2 . Theorem 20. For the vector Witsenhausen problem with E(U 2 )≤ P , the minimum mean-squared esti- mation errorD ∗ for the estimation ofX n is given by D ∗ = min σSU σ(σ 2 +σ SU )−(Pσ 2 −σ 2 SU ) √ P +σ 2 +2σ SU +1 + 2 (P +σ 2 +2σ SU +1)(Pσ 2 +σ 2 −σ 2 SU ) 2 , whereσ SU ∈ [−σ √ P,0] and(x) + = max{0,x}. Remark 14. IfP = 0, the encoder can not do anything to help the decoder (sinceU = 0). Hence the minimum distortion is achieved by estimating the stateS i =X i via minimum mean squared error (MMSE) estimation from the noisy observationY i =S i +Z i and D ∗ = E(( ˆ X i −X i ) 2 ) = E(( ˆ S i −S i ) 2 ) = E(S 2 i )− E(S i Y i ) 2 E(Y 2 i ) = σ 2 σ 2 +1 , By substitutingσ SU =P = 0 in Theorem 20, we recover the result. Remark 15. The following two conditions achieve zero minimum distortion at the decoder, i.e.D ∗ = 0 if 1. P≥σ 2 , and 2. P≥ 1− 1 P+σ 2 +1 The first condition was observed in [42], and the second in [16]. 
As a sanity check, we now show that these two conditions can also be recovered from Theorem 20. • Consider the minimum distortionD ∗ whenP≥ σ 2 . Let us chooseσ SU =−σ 2 ≥−σ √ P . Then substituting the value ofσ SU in Theorem 20 we getD ∗ = 0 as σ(σ 2 +σ SU )−(Pσ 2 −σ 2 SU ) p P +σ 2 +2σ SU +1 + = σ(σ 2 −σ 2 )−(Pσ 2 −σ 4 ) p P−σ 2 +1 + = −σ 2 (P−σ 2 ) p P−σ 2 +1 + = 0. 93 • For the second condition chooseσ SU = 0 in Theorem 20. Then the condition for zero distortion is given by σ 3 −Pσ 2 p P +σ 2 +1≤ 0 P 2 +Pσ 2 −σ 2 ≥ 0 P(P +σ 2 +1)≥P +σ 2 P≥ 1− 1 P +σ 2 +1 . This completes the proof. In the following two subsections, we give a proof of Theorem 20. 4.5.1 Proof of the Converse In this subsection, we prove that for every code, the achieved distortion is lower bounded as D≥ D ∗ . Consider the lower bound proposed in Theorem 19, I(U,S;Y)−I(V,Y;S) =h(Y)−h(Y|U,S)−h(S)+h(S|V,Y) =h(Y)+h(S|V,Y)−h(S)−h(Z) (a) =h(Y)+h(S−α ˆ X−βY|V,Y)−h(S)−h(Z) (b) ≤h(Y)+h(S−α ˆ X−βY)−h(S)−h(Z) =h(Y)+h(S−α( ˆ X−X)−βY−αX)−h(S)−h(Z) =h(Y)+h(S−αX−βY | {z } A −α( ˆ X−X) | {z } B )−h(S)−h(Z) (c) ≤ 1 2 log(2πe) E(Y 2 )+ 1 2 log(2πe) E((A+B) 2 )− 1 2 log(2πe) 2 σ 2 , (4.10) 94 where the inequality in(a) follows since ˆ X is a function of(V,Y),(b) follows since conditioning reduces entropy, and (c) follows from the maximum entropy lemma [35]. Now E(Y 2 ) = E((U +S +Z) 2 )≤ ˜ P +σ 2 +2σ SU +1, where ˜ P = E(U 2 )≤P andσ SU = E(US). Similarly, E((A+B) 2 ) = E(A 2 )+ E(B 2 )+2 E(AB) (a) ≤ E(A 2 )+ E(B 2 )+2 p E(A 2 ) p E(B 2 ) = p E(A 2 )+ p E(B 2 ) 2 , where (a) follows from the Cauchy-Schwartz inequality with equality iffS−αX−βY is some scalar multiple of ˆ X−X. By the definition of the code,I(U,S;Y)−I(V,Y;S)≥ 0, substituting the value of E(Y 2 ) in (4.10) we get 1 2 log( ˜ P +σ 2 +2σ SU +1)≥ 1 2 logσ 2 − 1 2 log E((A+B) 2 ) = 1 2 log σ 2 E((A+B) 2 ) ≥ 1 2 log σ 2 p E(A 2 )+ p E(B 2 ) 2 . Sincelog(·) is an increasing function we can write p E(A 2 )+ p E(B 2 ) 2 ≥ σ 2 ˜ P +σ 2 +2σ SU +1 . (4.11) Continuing with the chain of inequalities we get from (4.11) and the definition ofB =α(X− ˆ X) E(( ˆ X−X) 2 )≥ 1 α 2 s σ 2 ˜ P +σ 2 +2σ SU +1 − p E(A 2 ) + 2 , (4.12) where E(A 2 ) = (1−(α+β)) 2 σ 2 +(α+β) 2 ˜ P +β 2 −2(α+β)(1−(α+β))σ SU . To compute the lower bound, we choose specific values of the parametersα andβ (the reason for choosing these values will be clear from the achievable strategy) given by α+β = σ p ˜ P +σ 2 +2σ SU +1 α = ( ˜ P− σ 2 SU σ 2 ) η , (4.13) 95 where the parameterη is given by η = ˜ P− σ 2 SU σ 2 (σ 2 +σ SU )+σ p ˜ P +σ 2 +2σ SU +1 ( ˜ Pσ 2 +σ 2 −σ 2 SU ) . (4.14) Substituting the value ofη in (4.13) and then substituting back the value ofα andβ in (4.12), we can show that the lower bound on E( ˆ X−X) 2 is given by E( ˆ X−X) 2 ≥ min σSU σ(σ 2 +σ SU )−( ˜ Pσ 2 −σ 2 SU ) p ˜ P +σ 2 +2σ SU +1 + 2 ( ˜ P +σ 2 +2σ SU +1)( ˜ Pσ 2 +σ 2 −σ 2 SU ) 2 = min σSU σ(σ 2 +σSU) √ ˜ P+σ 2 +2σSU+1 −( ˜ Pσ 2 −σ 2 SU ) + 2 ( ˜ Pσ 2 +σ 2 −σ 2 SU ) 2 , (4.15) where ˜ P∈ [0,P],σ SU ∈ [−σ p ˜ P,σ p ˜ P]. It is easy to see that forP≥σ 2 , E(X− ˆ X) 2 ≥ 0 by choosing σ SU =−σ 2 . ForP≤ σ 2 , we claim that it is sufficient to considerσ SU ∈ [−σ p ˜ P,0] while optimizing (4.15). To prove this, we have to show that for a fixed value ofσ SU ∈ [0,σ p ˜ P] σ(σ 2 +σ SU ) p ˜ P +σ 2 +2σ SU +1 (a) ≥ σ(σ 2 −σ SU ) p ˜ P +σ 2 −2σ SU +1 → (σ 2 +σ SU ) 2 ( ˜ P +σ 2 −2σ SU +1) (b) ≥ (σ 2 −σ SU ) 2 ( ˜ P +σ 2 +2σ SU +1) → ˜ Pσ 2 +σ 2 ≥σ 2 SU , (4.16) which is true for all values of ˜ P andσ SU . 
Here, (a) follows since the sign ofσ SU only affects the term σ(σ 2 +σSU) √ ˜ P+σ 2 +2σSU+1 in (4.15) and(b) follows sinceσ 2 ≥|σ SU | for allP≤σ 2 . Finally, it can be easily shown that the minimum distortionD ∗ is a decreasing function of ˜ P and hence ˜ P = P achieves the minimum distortion. This completes the proof of the converse. Remark 16. The lower bound in [43] was derived by computing the following expression D ∗ ≥ min E(d(X, ˆ X)), where the minimum is over all conditional pmfs p(u|s),p(ˆ x|u,y,s) such that I(U,S;Y) ≥ I( ˆ X;S). Observe that by lettingβ = 0 andα≥ 0 in (4.10), we recover the lower bound of [43]. Before proving the optimality of our proposed lower bound, we comment on the looseness of the lower bound proposed in [43], which corresponds to the special case ofβ = 0 in our proposed lower bound. 96 Figure 4.5: Pictorial description of the estimation problem. In Fig. 4.5, we give a geometric description of the random variables involved in the estimation problem. The proposed lower bound in the last subsection will be tight if we chooseα andβ such that • S−α ˆ X−βY is orthogonal to the linear plane containing(V,Y), and • S−αX−βY is a scalar multiple of ˆ X−X. From Fig. 4.5, we can observe that the first condition would be satisfied if we chooseα andβ such that α ˆ X +βY = ˆ S = E(S|V,Y), the MMSE estimator of S given (V,Y). For this choice of α and β, if we can show thatS− ˆ S = S− E(S|V,Y) is a scalar multiple of ˆ X−X = E(X|V,Y)−X (it is clear from the Fig. 4.5 since bothS− ˆ S and ˆ X−X are orthogonal to the plane containing(V,Y)), the second condition is implied sinceS−αX−βY = (S− ˆ S)+α( ˆ X−X). For the lower bound of [43] to be tight, one needs to chooseα such that • S−α ˆ X is orthogonal to ˆ X, and • S−αX is a scalar multiple of ˆ X−X. From Fig. 4.5, we can observe that the first condition would be satisfied if we chooseα such thatα ˆ X = ˆ S = E(S|V,Y) or in other words, ˆ X and ˆ S are collinear (it is shown using dotted line in Fig. 4.5). For this choice ofα, the second condition is implied sinceS−αX = (S− ˆ S)+α( ˆ X−X) andS− ˆ S is a scalar multiple of ˆ X−X. However the first condition restricts the possible choices of the encoding function (and hence also the reconstruction function ˆ X) as the plane containing the resulting modified stateX and channel stateS has to be orthogonal to the (V,Y) plane. Thus the lower bound proposed in [43] can be strictly suboptimal for some values ofP andσ 2 and we will illustrate this with plots in the sequel. 97 In the next subsection, we will show that a combination of linear coding and dirty-paper coding (DPC) proposed in [42] satisfies both the conditions outlined above with the proper selection of combining coeffi- cients and hence it achieves the minimum distortionD ∗ for the asymptotic Witsenhausen counterexample. 4.5.2 Proof of Achievability The achievability proof is trivial forP≥σ 2 , since in this case zero minimum distortion can be achieved by choosing the channel inputU =−S, which precancels the interference. Hence we prove the achievability for P ≤ σ 2 . For achievability, we specialize Theorem 18 to the Gaussian case by considering jointly Gaussian(S,V,U,X,Y) with the following conditions: • V =ηS +N v , whereN v ∼ N(0,σ 2 v ) is independent ofS andZ andη is given by (4.14). • U =V +γS = (η+γ)S +N v , sinceU is a function ofV andS. • X =U +S = (1+η+γ)S +N v . • Y =X +Z = (1+η+γ)S +N v +Z. • Thenσ SU = E(US) = (η+γ)σ 2 , i.e.,η+γ = σSU σ 2 . 
• P = E(U 2 ) = (η +γ) 2 σ 2 +σ 2 v = σ 2 SU σ 2 +σ 2 v , and this impliesσ 2 v = P− σ 2 SU σ 2 and sinceσ 2 v ≥ 0, we have− √ Pσ 2 ≤σ SU ≤ 0. • Choose ˆ X(V,Y) = E(X|V,Y), the minimum mean-squared error (MMSE) estimator ofX given (V,Y). The achievable strategy proposed here is essentially similar to the coding scheme proposed in [42, 43, 16], which also uses a combination of linear coding and DPC scheme, but with the following conditions: • U =U dpc +U lin , whereU lin =−bS andU dpc is a Gaussian random variable independent ofS and Z. • V =U dpc +a(1−b)S, wherea is the amplification factor similar to the DPC scheme (see [24] for details). • Choose ˆ X(V,Y) = E(X|V,Y). 98 Table 4.1: Equivalence of the achievable scheme of [43] to our proposed coding scheme Proposed coding scheme Linear coding + DPC [43] N v U dpc η a(1−b) η+γ −b The mapping of this coding scheme to our proposed coding strategy is delineated in Table 4.1. We will evaluate the distortion achieved by the coding strategy and we will show that the coding scheme achieves the lower bound onD ∗ derived in the last subsection. Although the coding strategy considered here is same as that of in [42, 16], in contrast to these prior works, we are able to evaluate the optimal values of the parameters η and γ (hence a and b) that achieves the minimum distortion D ∗ and these optimal values enable us to get a better understanding of the estimation problem over the implicit communication channel. Before proving the optimality of the coding scheme, we introduce two key propositions. Proposition 7. For jointly Gaussian(S,V,U,X,Y) and α = E(SV) E(Y 2 )− E(SY) E(VY) E(VY) = η(σ 2 +Pσ 2 −σ 2 SU )−(σ 2 +σ SU )(P− σ 2 SU σ 2 ) η(σ 2 +σ SU )+P− σ 2 SU σ 2 β = E(SY) E(V 2 )− E(SV) E(VY) E(VY) = η(σ 2 SU −Pσ 2 )+(σ 2 +σ SU )(P− σ 2 SU σ 2 ) η(σ 2 +σ SU )+P− σ 2 SU σ 2 , it can be shown thatα ˆ X +βY =α E(X|V,Y)+βY = E(S|V,Y). Proposition 7 shows that for jointly Gaussian (S,V,U,X,Y),α ˆ X +βY is orthogonal to, and hence independent of(V,Y), since they are jointly Gaussian. The proof of this proposition is provided below. Proof. Since(X,V,Y) is jointly Gaussian, we can write E(X|V,Y) =νV +θY. 99 Here ν = E(XV) E(Y 2 )− E(XY) E(VY) |Det| (a) = E(VY) E(Y 2 )− E(XY) E(VY) |Det| = E(VY)(E(Y 2 )− E(XY)) |Det| = E(VY) E(Y(Y−X)) |Det| = E(VY) E(YZ) |Det| (b) = E(VY) |Det| , (4.17) where|Det| = E(Y 2 ) E(V 2 )− E 2 (VY) (E(Y 2 ) E(V 2 )− E 2 (VY)≥ 0 by Cauchy–Schwartz inequality) and (a) follows since E(VY) = E(V(X +Z)) = E(XV)+ E(VZ) = E(XV), and (b) follows since E(YZ) = E((X +Z)Z) = E(Z 2 ) = 1. Similarly, θ = E(XY) E(V 2 )− E(XV) E(VY) |Det| (a) = E(XY) E(V 2 )− E 2 (VY) |Det| , (4.18) where (a) follows since E(XV) = E(VY). Similarly for jointly Gaussian (S,V,Y), E(S|V,Y) can be written as E(S|V,Y) =λV +φY. Here λ = E(SV) E(Y 2 )− E(SY) E(VY) |Det| φ = E(SY) E(V 2 )− E(SV) E(VY) |Det| . (4.19) 100 Now expandingα ˆ X +βY =α E(X|V,Y)+βY = E(S|V,Y) we get α(νV +θY)+βY =λV +φY ανV +(αθ+β)Y =λV +φY. Let us chooseα andβ such that λ =αν (4.20) and φ =αθ+β =α(θ−1)+α+β. (4.21) From (4.20) it can be radily seen that α = λ ν = E(SV) E(Y 2 )− E(SY) E(VY) E(VY) . (4.22) Considering (4.21) we obtain β =φ−αθ (a) = (1−α(γ +1))φ (b) = |Det| E(VY) φ = E(SY) E(V 2 )− E(SV) E(VY) E(VY) , (4.23) where(a) follows from the fact that for jointly Gaussian(S,V,U,Y),θ = (γ+1)φ, and(b) follows from (4.38). This completes the proof of the proposition. Proposition 8. 
For jointly Gaussian(S,V,U,X,Y),S−βY−αX = (1−(α+β))S−(α+β)U−βZ is linearly dependent with ˆ X−X = E(X|V,Y)−X, that is, one is a scalar multiple of the other. 101 The proposition can be shown as follows. Proof. For the choice ofα andβ in Proposition 8, from Proposition 7 we have ˆ S =α ˆ X +βY S− ˆ S =S−α ˆ X−βY S− ˆ S =S−α( ˆ X−X)−βY−αX S− ˆ S +α( ˆ X−X) =S−βY−αX, (4.24) where ˆ S = E(S|V,Y) and ˆ X = E(X|V,Y). From (4.24), it can be readily seen that showingS−βY−αX is linearly dependent to ˆ X−X is same as showing thatS− ˆ S is a scalar multiple of ˆ X−X. Now ˆ X−X = E(X|V,Y)−X = E(U +S|V,Y)−U−S = E(U|V,Y)−U + E(S|V,Y)−S (a) = E(V +γS|V,Y)−V−γS + ˆ S−S = (γ +1)( ˆ S−S), (4.25) where(a) follows from the choice of auxiliary variableV . This completes the proof of the proposition. With Proposition 7 and 8 in hand, we can prove the achievability of Theorem 20. The achievability proof requires much manipulation, for conciseness, we highlight only the key steps of the proof, relegating the details to [21]. For jointly Gaussian(S,U,V,X,Y), let us consider the upper bound proposed in Theorem 18: I(U,S;Y)−I(V,Y;S) =h(Y)−h(Y|U,S)−h(S)+h(S|V,Y) =h(Y)−h(Z)−h(S)+h(S−α ˆ X−βY|V,Y) (a) =h(Y)−h(Z)−h(S)+h(S−α ˆ X−βY) =h(Y)−h(Z)−h(S)+h((1−(α+β))S−(α+β)U−βZ−α( ˆ X−X)) (b) = 1 2 log(P +σ 2 +2σ SU +1)− 1 2 logσ 2 + 1 2 log( p E(A 2 )+ p E(B 2 )) 2 , (4.26) 102 where (a) follows from Proposition 7, and (b) follows from Proposition 8 and the definitions ofA andB if A =µB withµ ≥ 0. (4.27) Equation (4.26) implies that E(( ˆ X−X) 2 )≥ 1 α 2 s σ 2 P +σ 2 +2σ SU +1 − p E(A 2 ) + 2 , (4.28) where E(A 2 ) = (1−(α+β)) 2 σ 2 +(α+β) 2 P +β 2 −2(α+β)(1−(α+β))σ SU ,(x) + = max{0,x} and the value ofα andβ are given by Proposition 7. The upper bound on the distortion is given by E( ˆ X−X) 2 = E(E(X|V,Y) 2 −X) 2 subject to the conditions (4.28) and (4.27). For the upper bound to match with the lower bound, we have to show that the value ofη given in (4.14) satisfies both the conditions (4.28) and (4.27). Now, let us compute the value of E(E(X|V,Y) 2 −X) 2 . From Proposition 7 we have S− ˆ S = (S−βY−αX) | {z } A −α( ˆ X−X) | {z } B . (4.29) From (4.29) we can write E(S− ˆ S) 2 (a) = E(B 2 )+ E(A 2 )+2 p E(B 2 ) p E(A 2 ) = ( p E(B 2 )+ p E(A 2 )) 2 → E( ˆ X−X) 2 = 1 α 2 (( q E(S− ˆ S) 2 − p E(A 2 )) + ) 2 , (4.30) 103 where (a) follows provided the condition of (4.27) holds. We consider the estimation error in estimating the stateS, i.e., E(S− ˆ S) 2 = E(S− E(S|ηS +N v ,(1+γ)S +Z)) 2 (a) = E(S 2 )− 1 |Det| (η 2 E(S 2 ) 2 +(1+γ) 2 E(S 2 ) 2 E(N 2 v )) = E(S 2 )− η 2 E(S 2 ) 2 +(1+γ) 2 E(S 2 ) 2 E(N 2 v ) (1+γ) 2 E(S 2 ) E(N 2 v )+η 2 E(S 2 )+ E(N 2 v ) = σ 2 (η−1) 2 σ 2 +2σ SU (1−η)+1+ σ 2 SU σ 2 + η 2 σ 2 P− σ 2 SU σ 2 . (4.31) where(a) follows since(S,V,Y) are jointly Gaussian and|Det| = (1+γ) 2 E(S 2 ) E(N 2 v )+η 2 E(S 2 )+ E(N 2 v ). To satisfy the condition of (4.28), from (4.30) we must have P +σ 2 +2σ SU +1 ≥ (η−1) 2 σ 2 +2σ SU (1−η)+1+ σ 2 SU σ 2 + η 2 σ 2 P− σ 2 SU σ 2 , (4.32) resulting in a quadratic equation inη given by (Pσ 2 +σ 2 −σ 2 SU )η 2 −2η(σ 2 +σ SU ) P− σ 2 SU σ 2 − P− σ 2 SU σ 2 2 ≤ 0. (4.33) The discriminant of the quadratic equation is given by 4(σ 2 +σ SU ) 2 P− σ 2 SU σ 2 2 +4(Pσ 2 +σ 2 −σ 2 SU ) P− σ 2 SU σ 2 2 ≥ 0 and hence the quadratic equation (4.33) has real roots. 
Manipulating (4.33) we have, (P +σ 2 +2σ SU +1)η 2 σ 2 − η(σ 2 +σ SU )+ P− σ 2 SU σ 2 2 ≤ 0, (4.34) which implies that either η≤ P− σ 2 SU σ 2 (σ 2 +σ SU )+σ √ P +σ 2 +2σ SU +1 (Pσ 2 +σ 2 −σ 2 SU ) , (4.35) 104 or η≥ P− σ 2 SU σ 2 (σ 2 +σ SU )−σ √ P +σ 2 +2σ SU +1 (Pσ 2 +σ 2 −σ 2 SU ) . (4.36) We now consider the condition (4.27) to determine the range of values ofη that are feasible. From Propo- sition 7 we have, S−αX−βY (a) =− 1 γ +1 ( ˆ X−X)+α( ˆ X−X) = 1 α(γ +1) −1 (−α)( ˆ X−X), (4.37) where(a) follows from Proposition 8. Comparing (4.37) with the condition of (4.27), we get, µ = 1 α(γ +1) −1 (a) = 1 σ 2 +σ SU σ 2 | {z } η ′′ 1 −η η− (σ 2 +σ SU ) P− σ 2 SU σ 2 (Pσ 2 +σ 2 −σ 2 SU ) | {z } η ′ 1 , (4.38) where (a) follows from Proposition 7, Proposition 8 and some algebraic manipulations. Thus, to satisfy the condition (4.27) we should selectη such thatη ′ 1 ≤η≤η ′′ 1 sinceη ′′ 1 ≥η ′ 1 . It is clear that for some fixedσ SU ∈ [−σ √ P,0], when η ′′ 1 ≥ P− σ 2 SU σ 2 (σ 2 +σ SU )+σ √ P +σ 2 +2σ SU +1 (Pσ 2 +σ 2 −σ 2 SU ) =η ∗ , (4.39) condition (4.27) is staisfied atη = η ∗ and hence for this choice of η, the achievable distortion is given from (4.30) as E( ˆ X−X) 2 = 1 α 2 (( q E(S− ˆ S) 2 − p E(A 2 )) + ) 2 (a) = 1 α 2 q E(S− ˆ S) 2 − q (1−α(γ +1)) 2 E(S− ˆ S) 2 + 2 = E(S− ˆ S) 2 α 2 1− p (1−α(γ +1)) 2 + 2 , (4.40) 105 where(a) follows from the fact thatS−βY−αX | {z } A = (1−α(γ+1))(S− ˆ S). The value of(1−α(γ+1)) 2 is given by (1−α(γ +1)) = 1 η ∗ σ P− σ 2 SU σ 2 p P +σ 2 +2σ SU +1, (4.41) Now substituting this value of p (1−α(γ +1)) 2 in (4.40) we get E( ˆ X−X) 2 = E(S− ˆ S) 2 α 2 1− p (1−α(γ +1)) 2 + 2 = σ(σ 2 +σ SU )−(Pσ 2 −σ 2 SU ) √ P +σ 2 +2σ SU +1 + 2 (P +σ 2 +2σ SU +1)(Pσ 2 +σ 2 −σ 2 SU ) 2 . (4.42) Finally, consider the case when for some choice ofσ SU ∈ [−σ √ P,0] σ 2 +σ SU σ 2 ≤ P− σ 2 SU σ 2 (σ 2 +σ SU )+σ √ P +σ 2 +2σ SU +1 (Pσ 2 +σ 2 −σ 2 SU ) σ(σ 2 +σ SU )≤ (Pσ 2 −σ 2 SU ) p P +σ 2 +2σ SU +1. (4.43) In this case, if we choose input power ˆ P =σ 2 SU /σ 2 ≤P , the LHS in (4.43) becomes larger than the RHS, which reduces to0. Since(Pσ 2 −σ 2 SU ) √ P +σ 2 +2σ SU +1 is a continuous function ofP , we can then conclude that there exists an input power assignmentP ∗ withP ∗ ≤P such that σ(σ 2 +σ SU ) = (P ∗ σ 2 −σ 2 SU ) p P ∗ +σ 2 +2σ SU +1. Hence for this choice of input power, E( ˆ X−X) 2 = 0 from (4.42). This completes the proof of the achievability. Remark 17. For optimal η = η ∗ and η ∗ +γ =−b = σ SU /σ 2 6=−1 (if b = 1, the interference is pre-canceled and hence there is no need to perform the DPC strategy), the optimal amplification factora ∗ that achieves the minimum MMSE is given by a ∗ = Pσ 2 −σ 2 SU Pσ 2 +σ 2 −σ 2 SU 1+ σ √ P +σ 2 +2σ SU +1 σ 2 +σ SU . 106 Figure 4.6: Comparison of our optimal amplification factora ∗ with the optimala used in DPC strategy of [24]. Note that the optimal amplification factor a of the DPC strategy proposed in [24], which achieves the capacity of the state-dependent channelY =U dpc +(1−b)S +Z is given by a = E(U dpc ) 2 E(U dpc ) 2 +1 = E(U +bS) 2 E(U +bS) 2 +1 = Pσ 2 −σ 2 SU Pσ 2 +σ 2 −σ 2 SU . We can observe that the DPC strategy is optimal for estimating the modified stateX, albeit with a different amplification factora than the one which maximizes the rate over the state-dependent channel. This obser- vation enables us to understand the difference between communication and estimation in the presence of an interfering signal which is known at the transmitter. 
While the objective of the DPC strategy considered in [24] was to choose an amplification factora that maximize the rate of communication without violating the decodability condition ofV =U dpc +a(1−b)S, the estimation problem that we consider here instead looks for an amplification factora ∗ that minimizes the mean squared error E(X−E(X|V,Y)) 2 in estimat- ing the modified stateX subject to correct decoding ofV . Figure 4.6 compares the optimal amplification factora ∗ witha as the input powerP varies from0 to1 withσ 2 = 1. Note that withσ 2 = 1, considering input powerP≥ 1 eliminates the need to use the DPC strategy, since the encoder has enough power to 107 Figure 4.7: Comparison of prior lower bounds with the minimum distortion of Theorem 20. pre-cancel the interference (b = 1). Also from Figure 4.6, we can observe that the optimal amplification factora ∗ = 1 for input powerP≥ 0.57. The implication ofa ∗ = 1 is that the encoder is able to cancel the effect of interference completely, i.e.,Y =V +Z, and since the decoder correctly decodesV , the min- imum distortion achieved isD ∗ = 0. This fact is verified in Figure 4.7, which plots the optimal distortion as a function of input power. 4.5.3 Comaparison of lower bounds with optimal MMSE In Fig. 4.7, we compare the optimal distortionD ∗ given in Theorem 20 to the lower bounds onD ∗ proposed in [16, Theorem 8] and [43, Corollary 1] for a fixed value ofσ 2 = 1 and asP varies from 0 to 0.8. As can be seen from the plot, the lower bound given in [16] is better than the one in [43], but both the lower bounds are strictly suboptimal. Observe that forP close toσ 2 = 1, the prior lower bounds are optimal as they all satisfy the condition of zero distortion discussed in Remark 15. 108 4.6 Concluding Remarks In this chapter, we study the problem of state or modified state estimation over a state dependent chan- nel with non-causal channel state information at the encoder. Although the capacity–distortion function can be optimally characterized with strictly causal or causal channel knowledge, treating the non-causal channel state information for the discrete memoryless case seems beyond our current techniques of identi- fying auxiliary random variables and using estimation-theoretic inequalities such as Lemma 6. The upper bound on the capacity–distortion function that we propose in this chapter seems to match with the lower bound for all the examples that we consider here, a fact, we exploit herein to characterize the minimum distortion in vector version of Witsenhausen counterexample. However, we could not shed any light on the optimality/suboptimality of the proposed bounds, which needs to be explored in future. So far in our discussion, we assumed that a genie has made the i.i.d. state sequences available to the encoder for free of cost. However, in practical problems acquiring some knowledge about the channel state will require some expenditure of the system resources. Extending the joint communication and estimation framework to these cost constrained channel state information setting is the focus of our next chapter. 109 Chapter 5 Action dependent state: channel coding The problem of joint information transmission and state communication introduced in Chapter 3 hinges on two asumptions; (a) the state sequence is i.i.d. and is decided by the nature and the users have no control over the channel, and (b) the encoders can acquire the channel state informations at no price. But in a practical setup, both these assumptions are likely to be failed. 
For e.g., in a cognitive radio setup, the encoder take cost constrained actions to determine the available white spaces in a licensed spectrum and choose to allocate resources to a the channel that supports the best rate among the available options. In this chapter, we extend the joint communication and sensing framework to include the notion of “action” dependent states and cost constrained action dependent channel state information at the channel enoders. The rest of this chapter is organized as follows. Section 5.1 describes the basic channel model of action- dependent channels with discrete alphabets. Section 5.2 considers the problem of joint communication and estimation over these channel with strictly causal or causal perfect channel state knowledge at the channel encoder, and characterizes the capacity–distortion function for both adaptive and non-adaptive action encoder. Section 5.3 illustrate the results with few examples. Section 5.4 studies the capacity of adaptive action dependent channel with non-causal state knowledge and illustrates the result with the example of the action-dependent additive Gaussian channel. Section 5.5 extends the results to add a cost to the acquisition of channel state information at the channel encoder. Finally, Section 5.6 concludes the chapter. 110 Figure 5.1: Channels with adaptive action-dependent states. 5.1 Joint Communication and Estimation: Problem Formulation Consider a point-to-point communication system with action dependent state depicted in Fig. 5.1. Suppose that the encoder has access to the noisy channel state sequenceS n e and wishes to communicate the state S n to the decoder in addition to the messageM. We assume a DMC with discrete memoryless (DM) state model (X×S×S e ×A,p(y|x,s)p(s|a)p(s e |s,a),Y) that consists of a finite input alphabetX , a finite output alphabetY, a finite state alphabetS, a finite noisy state alphabetS e , a finite action alphabetA and a collection of conditional pmfsp(y|x,s) onY, p(s|a) onS andp(s e |s,a) onS e . The communication channel is memoryless in the sense that, without feedback,p(y n |x n ,s n ) = Q n i=1 p Y|X,S (y i |x i ,s i ), con- ditioned on the action and state(S e1 ,S e2 ,...) is independent and identically distributed (i.i.d.) and given the action sequence, the state is memoryless in the sense that(S 1 ,S 2 ,...) is i.i.d. withS i ∼p S (s i |a i ). A(2 nR ,n) code for action-dependent state communication consists of • a message set[1:2 nR ], • a non-adaptive action action encoder that assigns an action sequencea n (m)∈A n to each message m∈ [1:2 nR ], • lets n ∈S n be the channel state sequence generated in response to the action sequencea n ∈A n . Given nature generated state sequences n and messagem dependent action sequencea n , channel encoder receives partial state informations n e ∈S n e through a DMC characterized by the conditional pmfp(s e |s,a), • a channel encoder that assigns a symbolx i (m,s k e )∈X to each messagem∈ [1 : 2 nR ] and each state sequences k ∈S k and 111 • a decoder that assigns a message estimate ˆ m∈ [1 : 2 nR ] (or an error message e) to each received sequencey n ∈Y n , • a decoder that assigns an estimate ˆ s n ∈ ˆ S n to each received sequencey n ∈Y n . In case of an adaptive action encoder, the encoder assigns an action sequence a i (m,s i−1 e )∈A to each messagem∈ [1:2 nR ] and past state sequences i−1 e ∈S i−1 fori∈ [1:n]. 
Also depending on the amount of channel state information at the channel encoder, we can further classify the action dependent channel into three groups: (a) strictly causal channel state knowledgek =i−1, (b) causal channel state knowledgek =i, and(c) non-causal channel state informationk =n. We assume thatM is uniformly distributed over the message set. The average probability of error is defined asP (n) e = P{ ˆ M6=M}. The fidelity of the state estimate is measured by the expected distortion E(d(S n , ˆ S n )) = 1 n n X i=1 E(d(S i , ˆ S i )), whered :S× ˆ S→ [0,∞) is a distortion measure between a state symbols∈S and a reconstruction symbol ˆ s∈ ˆ S. Without loss of generality, we assume that for every symbol s∈S there exists a re- construction symbol ˆ s∈ ˆ S such thatd(s,ˆ s) = 0. A rate–distortion pair is said to be achievable if there exists a sequence of(2 nR ,n) codes such thatlim n→∞ P (n) e = 0 andlimsup n→∞ Ed(S n , ˆ S n )≤D. The capacity–distortion functionC A SC (D) is the supremum of the ratesR such that (R,D) is achievable. We characterize this optimal tradeoff between information transmission rate (capacity C) and state estima- tion (distortionD) for both non-adaptive and adaptive action dependent channels with various degrees of channel state information at the encoder in the following sections. 5.2 Action dependent Causal state communication The action dependent channel model described in Section II considers two important aspects of state dependent channel. The formation channel states is assumed to be controlled by an action encoder, and the acquisition of channel state information at the channel encoder is associated with the expenditure of costly system resources. Deriving capacity–distortion function for the combined channel model of Fig. 5.1 is an open problem. So we treat both these aspects of the channel model separately. In this section, we characterize the capacity–distortion function of the action dependent channel with strictly causal (k =i−1 in Fig. 5.1) or causal (k =i in Fig. 5.1) perfect channel state information (S e =S) at the channel encoder. 112 The setting where the channel encoder observes a partial channel state information S e though a noisy communication channelp(s e |s,a) is considered later in the chapter. 5.2.1 Non-adaptive action Initially the action encoder is assumed to be non-adaptive, which means that the action sequence chosen by the action encoder is a function of the messagem∈ [1 : 2 nR ]. We characterize the optimal tradeoff between information transmission rate (capacityC) and state estimation (distortionD) for this channel as follows. Theorem 21. The capacity–distortion function for strictly causal action dependent state communication is C A SC (D) = max I(U,A,X;Y)−I(U,X;S|A) , where the maximum is over all conditional pmfsp(a)p(x|a)p(u|x,s,a) and functionˆ s(u,x,a,y) such that E(d(S, ˆ S))≤D andI(U,X;Y|A)−I(U,X;S|A)≥ 0. Remark 18. It might sometimes be natural to consider channels of the formp(y|s,x,a). The capacity– distortion expression remains unchanged for this more general channel model. This follows directly by defining a new stateS ′ = (S,A) and applying the above characterization. Remark 19. When both the sender and the receiver is oblivious of the channel state, the capacity– distortion function for action dependent state communication can be obtained by choosing U =∅ and is given by, C A (D) = maxI(X,A;Y), where the maximum is over all conditional pmfsp(a)p(x) and function ˆ s(x,a,y) such that E(d(S, ˆ S))≤ D. Remark 20. 
The capacity–distortion function can be rewritten as C A SC (D) = maxI(A;Y)+ I(U,X;Y|A)−I(U,X;S|A) , where the maximum is over all conditional pmfsp(a)p(x|a)p(u|x,s,a) and functionˆ s(u,x,a,y) such that E(d(S, ˆ S))≤ D andI(U,X;Y|A)−I(U,X;S|A)≥ 0. The expression shows that capacity achieving 113 schemes are of a 2-stage form: The first stage involves the choice and communication of a state sequence, through whichI(A;Y) bits can be communicated per channel use, while the second stage consists of a block Markov coding strategy with decoder side information similar to the achievability scheme of [20], since conditioned on the action sequence, which has been deciphered following the first stage, the prob- lem is equivalent to the strictly causal state communication in [20]. In fact, the information inequality I(U,X;Y|A)−I(U,X;S|A)≥ 0 stems from the non-negativity of the rate in second stage of commu- nication. Hence, the achievability scheme essentially follows the coding scheme proposed in [20] for the setting of strictly-causal state amplification. For completeness, the proof of the theorem is provided below. Remark 21. In [64], [131], also a similar two-stage coding condition appears in a similar fashion as an extra constraint resulted from the additional reconstruction requirements in the two-stage communication setting. While [64] considers the constrained channel coding problem where the channel state is allowed to depend on an action sequence and the decoder is additionally required to reconstruct the channel input signal reliably (reversible input (RI) constraint in the terminology of [108]), [131] study the two-user state-dependent multiaccess channel in which the states of the channel are known non-causally to one of the encoders and only strictly causally to the other encoder. Our problem is setting is different from both these works, since we consider strictly causal channel knowledge at the channel encoder and we wish reconstruct the state (not the channel input) at the decoder. Remark 22. So far in our discussion, we have assumed that the channel encoder has strictly causal knowledge of the state sequence. The results can be easily extended using the Shannon strategy (see [99]) to include the scenario when the channel encoder has causal knowledge of the state sequence, that is, at timei∈ [1:n] the previous and current state sequences i is available at the channel encoder. In this case, the capacity–distortion function is given by C A C (D) = max I(U,V,A;Y)−I(U,V;S|A) , where the maximum is over all conditional pmfsp(v),p(a),p(u|v,s,a) and functionsx(v,s),ˆ s(u,v,a,y) such that E(d(S, ˆ S)) ≤ D and I(U,V;Y|A)−I(U,V;S|A) ≥ 0. The capacity–distortion function with causal state information can be strictly better than that of with strictly causal state knowledge, as by substitutingV =X we can recover the result in Theorem 21. 114 5.2.2 Proof of Achievability We now give a sketch of achievability of Theorem 21. We useb transmission blocks, each consisting ofn symbols. The channel encoder uses rate-splitting technique, whereby in blockj, it appropriately allocates it’s rate between cooperative transmission of common messagem j and a description of the state sequence S n (j−1) in blockj−1. Codebook generation. Fix a conditional pmfp(a)p(x|a)p(u|x,s,a) and function ˆ s(u,x,y,a) that attain C A SC (D/(1+ǫ)), whereD is the desired distortion, and letp(u|x,a) = P s p(s|a)p(u|x,s,a). 
For each j∈ [1:b], randomly and independently generate 2 nR sequencesa n (m j ),m j ∈ [1:2 nR ], each according to Q n i=1 p A (a i ) and for eacha n (m j ), generate2 nRS sequencesx n (m j ,l j−1 ),m j ∈ [1:2 nR ],l j−1 ∈ [1: 2 nRS ], each according to Q n i=1 p X|A (x i |a i ). For eachm j ∈ [1 : 2 nR ],l j−1 ∈ [1 : 2 nRS ], randomly and conditionally independently generate 2 n ˜ RS sequencesu n (k j |m j ,l j−1 ),k j ∈ [1 : 2 n ˜ RS ], each according to Q n i=1 p U|X,A (u i |x i (m j ,l j−1 ),a i (m j )). Partition the set of indicesk j ∈ [1:2 n ˜ RS ] into equal-size bins B(l j ) = [(l j −1)2 n( ˜ RS−RS) +1 :l j 2 n( ˜ RS−RS) ],l j ∈ [1 : 2 nRS ]. The codebook is revealed to the both encoder and the decoder. Encoding. By convention, letl 0 = 1. At the end of blockj, the sender finds an indexk j such that (s n (j),u n (k j |m j ,l j−1 ),x n (m j ,l j−1 ),a n (m j ))∈T (n) ǫ ′ . If there is more than one such index, it selects one of them uniformly at random. If there is no such index, it selects an index from [1 : 2 n ˜ RS ] uniformly at random. In blockj +1, the action encoder chooses the action sequencea n (m j+1 ), wherem j+1 is the new message index to be sent in blockj+1. Lets n (j+1) be the channel state sequence generated in response to the action sequence. The channel encoder then transmitsx n (m j+1 ,l j ) over the state dependent channel in blockj +1, wherel j is the bin index ofk j . Decoding. Letǫ > ǫ ′ . At the end of blockj +1, the receiver finds the unique index ˆ m j+1 , ˆ l j such that (x n (ˆ m j+1 , ˆ l j ),y n (j +1),a n (ˆ m j+1 ))∈T (n) ǫ . It then looks for the unique compression index ˆ k j ∈B( ˆ l j ) such that (u n ( ˆ k j |ˆ m j , ˆ l j−1 ),x n (ˆ m j , ˆ l j−1 ),a n (ˆ m j ),y n (j))∈T (n) ǫ and ˆ k j ∈B( ˆ l j ). Finally it computes the reconstruction sequence as ˆ s i (j) = ˆ s(u i ( ˆ k j |ˆ m j , ˆ l j−1 ),x i (ˆ m j , ˆ l j−1 ),a i (ˆ m j ),y i (j)) fori∈ [1:n]. Following the analysis of capacity–distortion function in [18], it can be easily shown that the scheme can achieve any rate up to the capacity-distortion function given in Theorem 21. 115 Before proving the converse of Theorem 21, we recall Lemma 6 and summarize a few useful properties ofC A SC (D) (similar to the [18, Corollary 1]). Corollary 5. The capacity-distortion functionC A SC (D) in Theorem 21 has the following properties: (1)C A SC (D) is a non-decreasing concave function ofD for allD≥D ∗ , (2)C A SC (D) is a continuous function ofD for allD>D ∗ , (3)C A SC (D ∗ ) = 0 ifD ∗ 6= 0 andC SC (D ∗ )≥ 0 ifD ∗ = 0, whereD ∗ is the minimum distortion with strictly causal channel state at the sender akin to the zero rate case in [18]. With this lemma in hand, we now prove the converse of Theorem 21. 5.2.3 Proof of the Converse We need to show that given any sequence of(2 nR ,n)-codes withlim n→∞ P (n) e = 0 and E(d(S n , ˆ S n ))≤ D, we must haveR≤C A SC (D). We identify the auxiliary random variablesU i := (M,S i−1 ,Y n i+1 ,A n\i ), i∈ [1 :n] with (S 0 ,Y n+1 ) = (∅,∅). Note that, as desired, (U i ,A i )→ (X i ,S i )→ Y i form a Markov chain. 
Consider nR =H(M) (a) ≤I(M;Y n )+nǫ n = n X i=1 I(M;Y i |Y n i+1 )+nǫ n ≤ n X i=1 I(M,Y n i+1 ;Y i )+nǫ n = n X i=1 (I(M,Y n i+1 ,S i−1 ;Y i )−I(S i−1 ;Y i |M,Y n i+1 ))+nǫ n (b) = n X i=1 I(M,Y n i+1 ,S i−1 ,A n ;Y i )− n X i=1 I(Y n i+1 ;S i |M,S i−1 ,A n )+nǫ n (c) = n X i=1 I(M,Y n i+1 ,S i−1 ,A n ;Y i )− n X i=1 I(M,S i−1 ,Y n i+1 ,A n\i ;S i |A i )+nǫ n (d) = n X i=1 (I(U i ,X i ,A i ;Y i )−I(U i ,X i ;S i |A i ))+nǫ n , 116 where (a) can be shown by Fano’s inequality [29, Theorem7.7.1], (b) follows from the Csis´ zar sum iden- tity [35, Sec. 2.3] and sinceA n is a function ofM, (c) follows from the fact that givenA i ,(M,S i−1 ,A n\i ) is independent ofS i , and (d) is true asX i is a function of(M,S i−1 ). Similarly, for this choice ofU i , n X i=1 I(U i ,X i ;S i |A i ) = n X i=1 I(M,S i−1 ,Y n i+1 ,A n\i ,X i ;S i |A i ) = n X i=1 I(Y n i+1 ;S i |M,S i−1 ,A n ) (b) = n X i=1 I(S i−1 ;Y i |M,Y n i+1 ,A n ) ≤ n X i=1 I(M,S i−1 ,Y n i+1 ,A n\i ;Y i |A i ) (d) = n X i=1 I(U i ,X i ;Y i |A i ). So now we have R≤ 1 n n X i=1 I(U i ,X i ,A i ;Y i )− n X i=1 I(U i ,X i ;S i |A i )+nǫ n (a) ≤ 1 n n X i=1 C A SC (E(d(S i ,ˆ s i (U i ,X i ,A i ,Y i ))))+nǫ n (b) ≤C A SC 1 n n X i=1 E(d(S i ,ˆ s i (U i ,X i ,A i ,Y i ))) +nǫ n (c) ≤C A SC (D), where (a) follows from the definition of capacity-distortion function, (b) follows by the concavity of C A SC (D) (see Property 1 of Corollary 5), and (c) can be shown using Lemma 6 and Corollary 5. This completes the proof of Theorem 21. 5.2.4 Adaptive Action It is natural to wonder whether “feedback” from the past states at the action stage (a i (m,s i−1 ), see Fig. 5.1) increases the capacity-distortion function or not. For an extreme example, consider a channel for whichp(y|s,x,a) = p(y|s,a). Clearly, the capacity–distortion function for any such channel with only message dependent non-adaptive action (a n (m)) is same as that of no CSI, since the action encoder is oblivious of the channel state. But with adaptive action, the action encoder can perform block Markov 117 strategy to yield a potentially larger capacity–distortion function, which is summarized below. Theorem 22. The capacity–distortion function for strictly causal adaptive action dependent state commu- nication is C AA SC (D) = max I(U,A,X;Y)−I(U,X;S|A) , where the maximum is over all conditional pmfsp(a)p(x|a)p(u|x,s,a) and functionˆ s(u,x,a,y) such that E(d(S, ˆ S))≤D. Note that the unconstrained capacity remains unchanged even if we allow the actions to depend on the past states. But in generalC AA SC (D)≥ C A SC (D) as the adaptive action helps the receiver to get a better estimate of the state. Finally, by setting A =∅ in Theorem 22, we recover the result by [18] on the capacity–distortion function when the i.i.d. state information is available strictly causally at the encoder. Remark 23. When the past states are available at both the encoders, the encoders cooperate to send information consisting of the common message and a description of the state in previous block (similar to sending a common message over multiple access channel (MAC)), whereas in the non-adaptive action scenario, while the common message is sent cooperatively, description of the state is a private message of the channel encoder. 
The achievability proof with adaptive action thus follows by treating the action encoder and channel encoder as a single unit and applying the block Markov coding strategy of [20] (with slight modfication to deal with fact that the channel state is conditionally independent given the actions), which exploits decoder side information. Thus the proof of the achievability is omitted here for brevity. Remark 24. In the converse proof of Theorem 21, we have used the following Markov chain condition (M,S i−1 ,A n\i )→A i →S i , which need not hold when allowing adaptive actions and hence the converse proof for adaptive action necessitates different definition of the key auxiliary random variable given by U i := (M,S i−1 ,Y n i+1 ,A i−1 ). Note that although this choice ofU i satisfiesR≤ 1 n P n i=1 I(U i ,X i ,A i ;Y i )− P n i=1 I(U i ,X i ;S i |A i )+nǫ n , we can not use it in the converse proof of Theorem 21, since with this choice ofU i we are unable to show that the information inequality P n i=1 I(U i ,X i ;S i |A i )≤ P n i=1 I(U i ,X i ;Y i |A i ) holds. Remark 25. Just like the non adaptive action dependent scenario, the result in Theorem 22 can be ex- tended to causal channel state knowledge at the channel encoder. In this case, the capacity–distortion function is given by C AA C (D) = max I(U,V,A;Y)−I(U,V;S|A) , 118 where the maximum is over all conditional pmfsp(v),p(a),p(u|v,s,a) and functionsx(v,s),ˆ s(u,v,a,y) such that E(d(S, ˆ S))≤D. 5.3 Illustrative Examples In the following subsections, we illustrate Theorem 21 and Theorem 22 through simple examples. 5.3.1 Actions Seen by Decoder: Consider the case where the decoder also has access to the actions taken. Noting that this is a special case of our setting by taking the pair (Y,A) as the new channel output, thatU→ (X,S,A)→ Y if and only ifU→ (X,S,A)→ (Y,A), we obtain that the capacity–distortion function for the case of message depepdent action is given by C A SC (D) = max H(A)+I(U,X;Y|A)−I(U,X;S|A) , where the maximization is over the same set of distribution and same feasible set as in Theorem 21. Similarly we can evaluate the capacity–distortion function for the case of adaptive actions. This expression is quite intuitive: The amount of information per symbol that can be conveyed through the actions in the first stage is represented by the termH(A). In the second stage, both encoder and decoder know the action sequence, so can condition on it and can perform the usual block Markov strategy on each subsequence associated with each action symbol, achieving a rate ofI(U,X;Y|A)−I(U,X;S|A). The maximization is a search for the optimal tradeoff between the amount of information that can be conveyed by the actions, and the quality of the second stage channel that they induce. 5.3.2 Gaussian Channel with Additive Action Dependent State Consider the Gaussian channel with additive action dependent state [123] Y =X +S +Z =X +A+ ˜ S +Z, 119 where ˜ S∼ N(0,Q) and the noise Z∼ N(0,N) are independent. Assume an expected average power constraint on both the channel and action encoder n X i=1 E(x 2 i (m,S i−1 ))≤nP X , n X i=1 E(a 2 i )≤nP A . We consider the squared error (quadratic) distortion measured(s,ˆ s) = (s−ˆ s) 2 . When the action sequnce is only a function of the message, using Theorem 21 we have the following. Proposition 9. The capacity–distortion function of the Gaussian channel with message dependent action is C A SC (D) = 0, 0≤D<D A min , 1 2 log P A QN/D , D A min ≤D<D max , C ( √ PX+ √ PA) 2 Q+N , D≥D max . 
whereD A min = QN PX+Q+N ,D max = QN Q+N andP A =P X +Q+N+P A +2 q P A (P X −( QN D −(Q+N))). When we allow the action encoder to observe the past states, the capacity–distortion follows from Theorem 22 and it has the similar form of Proposition 9, but P A and D A min are replaced by P AA and D AA min , respectively, whereP AA =P X +Q+N +P A +2 √ P A P X andD AA min =QN/P AA . The proof of the proposition is as shown below. 120 Proof. We will prove Proposition 9 for the non adaptive action framework first. For the prove of the converse, consider I(U,A,X;Y)−I(U,X;S|A) = I(X,A;Y)−(I(U;S|X,A)−I(U;Y|X,A)) = I(X,A;Y)−(h(U|X,A)−h(U|S,X,A)−h(U|X,A)+h(U|X,Y,A)) = I(X,A;Y)−(h(U|X,Y,A)−h(U|S,X,A)) (a) = I(X,A;Y)−(h(U|X,Y,A)−h(U|S,X,Y,A)) = I(X,A;Y)−I(U;S|X,Y,A) = h(Y)−h(Y|X,A)−h(S|X,Y,A)+h(S|U,X,Y,A) (b) = h(X +A+ ˜ S +Z)−h(X +A+ ˜ S +Z|X,A)−h(A+ ˜ S|X,A,X +A+ ˜ S +Z) +h(S− ˆ S|U,X,Y,A) (c) = h(X +A+ ˜ S +Z)−h( ˜ S +Z)−h( ˜ S|S +Z)+h(S− ˆ S|U,X,Y,A) (d) ≤ h(X +A+ ˜ S +Z)−h( ˜ S +Z)−h( ˜ S|S +Z)+h(S− ˆ S) (e) = h(X +A+ ˜ S +Z)− 1 2 log2πe(Q+N)− 1 2 log2πe QN Q+N +h(S− ˆ S) (f) ≤ 1 2 log2πe(P X +P A +Q+N +2σ XA )− 1 2 log2πe(Q+N)− 1 2 log2πe QN Q+N + 1 2 log2πeD = 1 2 log (P X +P A +Q+N +2σ XA )D QN , (5.1) where (a) follows from the fact thatU→ [S,X]→ Y , (b) is true since ˆ S = f(U,X,Y,A), (c) follows from the independence of(X,A) with( ˜ S,Z),(d) follows from the fact that conditioning reduces entropy, (e) is true because( ˜ S, ˜ S+Z) are jointly Gaussian, and(f) follows fromσ XA = E(XA) and the maximum differential entropy lemma [35] with|σ XA |≤ √ P X P A . For the Gaussian action dependent channel, the information inequality can be rewritten similarly as 0 ≤ I(U,X;Y|A)−I(U,X;S|A) ≤ 1 2 log (P A (P X +Q+N)−σ 2 XA )D P A QN →σ 2 XA ≤ P A P X − QN D −(Q+N) . 121 Figure 5.2: Capacity–distortion function: adaptive vs. non-adaptive action Sinceσ 2 XA ≤ P A P X by definition, we can write that ifD≥ D max = QN/(Q+N),σ 2 = P A P X and σ 2 = P A P X − QN D −(Q+N) , otherwise. Now substituting back the value ofσ XA in (5.1), we can complete the converse proof. For the achievability we take,X∼ N(0,P X ),A∼ N(0,P A ) with E(XA) =σ XA = min{P A P X ,P A P X − QN D −(Q+N) }, and U = S + ˆ Z, where ˆ Z ∼ N(0,σ 2 z ) is a Gaussian random variable independent of all other ran- dom variables and σ 2 z = QN QN D −(Q+N) . And for the reconstruction we choose ˆ S = h(U,X,A,Y) = E[S|U,X,Y,A],.i.e., ˆ S is the minimum mean-square error (MMSE) estimator givenU,X,A,Y . It is easy to show that the achievability scheme achieves the capacity–distortion function given in Proposition 9. For the adaptive action dependent Gaussian channel, since the action encoder and the channel encoder can be combined to form a single super encoder, Proposition 9 can be easily proved in this case by consid- eringX ′ = (X,A) and applying the capacity–distortion function result of Gaussian strictly causal state communication in [20]. Note that sinceP AA ≥ P A , the capacity–distortion function is larger in the adaptive action scenario 122 Figure 5.3: State dependent MAC with strictly causal CSI at both encoders. (see Figure 8). In fact, the minimum distortion achievable with adaptive action is smaller than that of non-adaptive action. But the unconstrained capacity (capacity–distion function forD≥ D max ) is same in both the cases, which implies that adaptive action in useful in estimation rather than in information transmission. Finally by substitutingP A = 0, both the capacity–distortion functions converges to the one in [18]. 
5.3.3 State dependent MAC

Consider communicating a common message over a memoryless state-dependent MAC (see Figure 5.3) characterized by $p(y|s,x_1,x_2)$, where the state sequence is known strictly causally to both encoders. This problem can be seen as a special case of our adaptive action setting via the following associations:
$$A = X_2, \quad X = X_1, \quad p(s|a) = p(s), \quad p(y|s,a,x) = p(y|s,x_1,x_2).$$

Figure 5.3: State dependent MAC with strictly causal CSI at both encoders.

Applying Theorem 22 to this case, keeping in mind Remark 18 (following the statement of Theorem 22) about channels of the form $p(y|s,x,a)$, we get that the capacity–distortion function is given by
$$C^{S}_{SC}(D) = \max\left(I(U,X_2,X_1;Y) - I(U,X_1;S|X_2)\right),$$
where the maximum is over $p(x_1,x_2)p(u|x_1,s,x_2)$ and functions $\hat{s}(u,x_1,x_2,y)$ such that $E(d(S,\hat{S})) \le D$. This setting was considered in [71, 77], and our result recovers the common message capacity results of [71, 77]. One can also consider a scenario where the state sequence is known strictly causally to the first encoder, but unknown at the second encoder and at the receiver. This problem, motivated by multiterminal communication scenarios involving transmitters with different degrees of channel state information, is a special case of Theorem 21: the capacity–distortion function $C^{AS}_{SC}(D)$ is the same as $C^{S}_{SC}(D)$ with the additional constraint $I(U,X_1;Y|X_2) - I(U,X_1;S|X_2) \ge 0$ on the feasible distributions. Clearly $C^{S}_{SC}(D) \ge C^{AS}_{SC}(D)$, since with symmetric channel state information the encoders can jointly perform both message and state cooperation, as opposed to only message cooperation when the state information is available at only one of the encoders.

5.4 Non-causal action dependent communication

The capacity–distortion function of a state-dependent channel with non-causal channel state information ($k = n$ in Fig. 5.1) at the channel encoder is a long-standing open problem; see [109, 23]. In this section we therefore only consider the problem of information transmission over these action-dependent channels, and we compare the capacity of such channels with non-adaptive and adaptive action encoders. We again assume that the channel state knowledge at the channel encoder is perfect, i.e., $S_e = S$. The capacity of a non-adaptive action-dependent channel with noncausal state knowledge at the channel encoder was characterized in [123]. In this chapter, we close the temporal gap by determining the capacity with an adaptive action encoder as follows.

Theorem 23. The capacity of the adaptive action-dependent channel with noncausal channel state knowledge at the channel encoder is
$$C = \max\left(I(U,A;Y) - I(U;S|A)\right),$$
where the maximum is over all conditional pmfs $p(a)p(u|s,a)$ and functions $x(u,s)$.

Remark 26. Note that a two-stage encoding strategy similar to [123] achieves capacity. The first stage involves the choice and communication of an action sequence, through which $I(A;Y)$ bits can be communicated, while the second stage consists of coding over a Gelfand–Pinsker channel with states whose distribution is the conditional one given the action sequence, which has been decoded at the decoder after the first stage of communication.

Remark 27. The capacity of our model in Theorem 23 is the same as that of the model with noncausal state $S^n$ at the channel encoder and no state at all at the action encoder, established in [123]. This shows that strictly causal knowledge of the state at the action encoder does not increase capacity.
An achievable strategy can therefore simply ignore the strictly causal state knowledge at the action encoder and exactly follow the achievable strategy in [123] for the non-adaptive action-dependent channel. However, our converse proof does not follow directly from the converse part of the capacity proof for the non-adaptive action-dependent channels in [123] because, at time $i$, the action encoder sends inputs that are functions not only of the message to transmit, but also of the past state sequence $S^{i-1}$. The converse proof thus necessitates a different definition of the key auxiliary random variable $U$.

5.4.1 Proof of the Converse

We need to show that given any sequence of $(2^{nR},n)$ codes with $\lim_{n\to\infty} P_e^{(n)} = 0$, we must have $R \le C$. We identify the auxiliary random variables $U_i := (M,S^{i-1},Y^n_{i+1},A^{i-1})$, $i \in [1:n]$, with $S_0 = A_0 = Y_{n+1} = \emptyset$. Note that, as desired, $U_i \to (X_i,S_i) \to Y_i$ form a Markov chain. Consider
$$\begin{aligned}
nR &= H(M) \\
&\overset{(a)}{\le} I(M;Y^n) + n\epsilon_n = \sum_{i=1}^n I(M;Y_i|Y^n_{i+1}) + n\epsilon_n \\
&\le \sum_{i=1}^n I(M,Y^n_{i+1};Y_i) + n\epsilon_n \\
&= \sum_{i=1}^n \left(I(M,Y^n_{i+1},S^{i-1};Y_i) - I(S^{i-1};Y_i|M,Y^n_{i+1})\right) + n\epsilon_n \\
&\overset{(b)}{=} \sum_{i=1}^n I(M,Y^n_{i+1},S^{i-1},A_i;Y_i) - \sum_{i=1}^n I(Y^n_{i+1};S_i|M,S^{i-1},A_i) + n\epsilon_n \\
&\overset{(c)}{=} \sum_{i=1}^n I(M,Y^n_{i+1},S^{i-1},A^{i-1},A_i;Y_i) - \sum_{i=1}^n I(M,S^{i-1},Y^n_{i+1},A^{i-1};S_i|A_i) + n\epsilon_n \\
&\overset{(d)}{=} \sum_{i=1}^n \left(I(U_i,A_i;Y_i) - I(U_i;S_i|A_i)\right) + n\epsilon_n, \qquad (5.2)
\end{aligned}$$
where (a) follows from Fano's inequality (see [29, Theorem 7.7.1]), (b) follows from the Csiszár sum identity [35, Sec. 2.3] and since $A_i$ is a function of $(M,S^{i-1})$, (c) follows since given $A_i$, $(M,S^{i-1},A^{i-1})$ is independent of $S_i$, and (d) follows from the definition of the auxiliary random variable $U_i$.

Let $T$ be the standard time-sharing random variable, uniformly distributed over $[1:n]$ and independent of $(X^n,S^n,Y^n)$, and let $U = (T,U_T)$, $A = A_T$, $X = X_T$, $S = S_T$, and $Y = Y_T$. It can be easily verified that given $A$, $S$ is independent of $T$, that $U \to (X,S) \to Y$ form a Markov chain, and that $R \le I(U,A;Y) - I(U;S|A)$. This completes the proof of the converse.

5.4.2 Adaptive action dependent Gaussian channel

While the above achievability and converse proofs are for finite alphabets, they can easily be adapted to the Gaussian setting by incorporating cost constraints on the channel input and applying the standard discretization argument [35, Sections 3.4 and 3.8]. In this section, we consider the "writing on clean paper and then writing on its corrupted version" channel (introduced in [123]), which has the following channel model:
$$Y_i = X_i(M,S^n) + S_i + Z_i = X_i(M,S^n) + A_i(M,S^{i-1}) + W_i + Z_i \overset{(a)}{\equiv} X_i(M,S^n) + A_i(M) + W_i + Z_i, \quad 1 \le i \le n,$$
where (a) follows from Theorem 23, since adaptive actions do not achieve a higher rate than non-adaptive actions, and
• $S_i = A_i(M,S^{i-1}) + W_i$, $1 \le i \le n$;
• $W^n$ and $Z^n$ are independent, $W^n$ is i.i.d. $\sim \mathrm{N}(0,\sigma_w^2)$ and $Z^n$ is i.i.d. $\sim \mathrm{N}(0,\sigma_n^2)$;
• the actions are cost constrained:
$$\frac{1}{n}\,E\left(\sum_{i=1}^n A_i^2(M,S^{i-1})\right) \le P_A;$$
• the subsequent channel inputs are power constrained as well:
$$\frac{1}{n}\,E\left(\sum_{i=1}^n X_i^2(M,S^n)\right) \le P_X.$$
We wish to compute the capacity of this channel. A lower bound for this channel with non-adaptive actions was provided in [123] by assuming that $(A,S,U,X,Y)$ are jointly Gaussian. In the following theorem we compute the capacity of this channel and show that jointly Gaussian $(A,S,U,X,Y)$ is indeed the optimal choice of distributions.

Figure 5.4: State dependent MAC with asymmetric CSI at both encoders [130].

Theorem 24.
The capacity of the adaptive action-dependent Gaussian channel under individual power constraints on the channel and action inputs is given by
$$C = \max_{\rho_{XA},\rho_{XW}} \frac{1}{2}\log\left(1 + \frac{P_1}{\sigma_n^2}\right) + \frac{1}{2}\log\left(1 + \frac{(\sqrt{P_A} + \rho_{XA}\sqrt{P_X})^2}{N_1}\right),$$
where the maximization is performed over all $\rho_{XA} \in [0,1]$ and $\rho_{XW} \in [-1,0]$ such that $\rho_{XA}^2 + \rho_{XW}^2 \le 1$. Here $P_1 = P_X(1-\rho_{XA}^2-\rho_{XW}^2)$, $N_1 = P_1 + (\sigma_w + \rho_{XW}\sqrt{P_X})^2 + \sigma_n^2$, and $\rho_{XA}$ and $\rho_{XW}$ are the correlation coefficients of the correlated pairs $(X,A)$ and $(X,W)$, respectively.

The theorem can be proved by noting the equivalence of our adaptive action-dependent channel setting with that of a cooperative MAC with states known noncausally at one encoder and only strictly causally at the other encoder (see [130]), as depicted in Figure 5.4. In our setting of Figure 5.1, the channel encoder knows the state sequence $S^n = A^n(M) + W^n$ noncausally, but since both the action encoder and the channel encoder send a common message $M$, the channel encoder also knows the input sequence $A^n(M)$ of the action encoder; hence it can subtract the action sequence $A^n(M)$ from the observed state sequence $S^n$ to learn the i.i.d. sequence $W^n$. This establishes the equivalence delineated in Table 5.1. Finally, since strictly causal knowledge of the state at one of the encoders in the MAC does not increase the common message capacity if the other encoder is informed of the state noncausally (for details see [130]), we can use the common message capacity of the additive Gaussian GGP channel (the Gaussian state-dependent MAC with noncausal CSI at only one of the encoders) derived in [102] to arrive at Theorem 24.¹

Table 5.1: Equivalence of the setting in [130] to our formulation of the adaptive action-dependent channel

  MAC with asymmetric CSI [130]   Adaptive action-dependent channel
  S                               W
  W_c                             M
  X_1                             X
  X_2                             A
  Y                               Y
  p(y|x_1,x_2,s)                  p(y|x,a,w)

¹ After submission of [22], we were informed of the work in [34], where the capacity of the adaptive action-dependent Gaussian channel of Figure 5.4 has also been independently solved in the same manner.

Remark 28. In fact, the framework of the DM MAC with asymmetric state information in [130] can be seen as a special case of our problem setting (a similar connection was made in [123] with the cooperative MAC problem of [102]) by associating the MAC encoder with strictly causal state knowledge with the action encoder ($X_2 \equiv A$) and the other MAC encoder, which has noncausal state knowledge, with the channel encoder ($X_1 \equiv X$). Applying Theorem 23 to this case, we can recover the common message capacity result of [130], since the capacity expression of Theorem 23 remains almost unchanged for the more general channel of the form $p(y|s,x,a)$ (similar to [123]); the only difference is that $X$ has to be chosen of the form $x(u,s,a)$ rather than $x(u,s)$.
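The two-term maximization in Theorem 24 is low dimensional and can be evaluated by brute force. The sketch below performs a grid search over the feasible pairs $(\rho_{XA},\rho_{XW})$; the parameter values in the usage line are illustrative assumptions only.

```python
import numpy as np

def thm24_capacity(PX, PA, sw2, sn2, grid=300):
    """Brute-force the maximization over (rho_XA, rho_XW) in Theorem 24 (bits)."""
    best = 0.0
    for rxa in np.linspace(0.0, 1.0, grid):
        for rxw in np.linspace(-1.0, 0.0, grid):
            if rxa**2 + rxw**2 > 1.0:
                continue                        # outside the feasible disc
            P1 = PX * (1.0 - rxa**2 - rxw**2)
            N1 = P1 + (np.sqrt(sw2) + rxw * np.sqrt(PX))**2 + sn2
            rate = 0.5 * np.log2(1.0 + P1 / sn2) \
                 + 0.5 * np.log2(1.0 + (np.sqrt(PA) + rxa * np.sqrt(PX))**2 / N1)
            best = max(best, rate)
    return best

print(thm24_capacity(PX=1.0, PA=1.0, sw2=1.0, sn2=1.0))
```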
5.5 Cost constrained sampling: Probing Capacity

So far in our discussion, we have focused on action-dependent channels with perfect state knowledge at the channel encoder, acquired without incurring any loss of system resources. This section is aimed at capturing and understanding the tradeoffs involved in natural scenarios where the acquisition of channel state information is associated with the expenditure of costly system resources. The encoder and decoder actions are cost constrained, creating tension between the achievable rate, the estimation error in estimating the channel state, and the cost of acquiring the channel state (or defect) information. The setting is depicted in Fig. 5.1, with the additional assumption that the channel state is independent of the actions taken at the encoder, i.e., $p(s|a) = p(s)$. The capacity–distortion function of these probing channels with strictly causal or causal noisy state information $S_e$, and the capacity with non-causal noisy state knowledge at the encoder, follow directly from the results of the previous sections by showing that the probing channel is equivalent to an action-dependent channel with perfect state knowledge. The strategy is outlined only for deriving the capacity of the adaptive action-dependent probing channel with non-causal noisy state information.

Figure 5.5: Equivalence of the setting of probing the channel state at the encoder to that of channels with action dependent states.

Theorem 25. The capacity of the adaptive action-dependent probing channel is given by
$$C = \max\left(I(U,A;Y) - I(U;S_e|A)\right),$$
where the maximum is over all conditional pmfs $p(a)p(u|s_e,a)$ and functions $x(u,s_e)$.

As mentioned earlier, the theorem can be proved by showing the equivalence (shown in Fig. 5.5) of this channel model with the action-dependent channel of Section 5.4 (as done in [5] for the non-adaptive action scenario).

5.5.1 Gaussian Probing Channel

Using standard arguments, the capacity results of the previous section can be shown to carry over to continuous-alphabet channels, as for the original problems of coding with transmitter state information, such as in [24]. In this section, we consider the "Learning to write on a dirty paper" channel (introduced in [5]), which has the following relations between channel inputs, channel outputs, states, and actions:
$$Y^n = X^n(M,S_e^n) + S^n + Z^n.$$
• The channel state, or interference, $S^n$ is i.i.d. with $S^n \sim \mathrm{N}(0,QI)$ and independent of the i.i.d. noise $Z^n \sim \mathrm{N}(0,NI)$.
• We now consider the setting of Figure 5.5. While in writing on dirty paper the interference (channel state) was assumed to be completely available, this might not be true in real systems: one might have to pay a price to acquire this information. Hence, in contrast to writing on a paper where the intensity and positions of all dirt spots are known, here we have to take actions to learn where the paper is most dirty; hence the name "Learning to write on a dirty paper." Actions are binary, with cost function $\Lambda(a) = a$. Here $A = 1$ corresponds to an observation of the channel state, while $A = 0$ corresponds to a lack of an observation. So
$$S_e = h(S,A) = \begin{cases} * & \text{if } A = 0, \\ S & \text{if } A = 1, \end{cases}$$
where $*$ stands for erasure, or no information.
• Channel inputs are cost constrained:
$$\frac{1}{n}\,E\left(\sum_{i=1}^n X_i^2\right) \le P.$$
Invoking Theorem 25, the capacity of this channel is
$$C(\Gamma,P) = \max\left(I(U,A;Y) - I(U;S_e|A)\right),$$
where the maximum is over all conditional pmfs $p(a)p(u|s_e,a)$ and functions $x(u,s_e)$ such that $P(A=1) \le \Gamma$ and $E(X^2) \le P$. This channel was studied in [5], which provides a lower bound on the capacity via a simple power-splitting achievable scheme that performs better than the trivial time-sharing lower bound. In this section, we provide an upper bound and an improved lower bound on the capacity–cost function of the probing Gaussian channel. The following proposition gives a lower bound on the capacity–cost function which is better than the one proposed in [5].

Proposition 10.
The capacity of the Gaussian adaptive action-dependent probing channel under individual power constraints on the channel and action inputs is lower bounded by
$$C(\Gamma,P) \ge \max_{\alpha,\sigma_{XS}}\ h(Y) + \frac{1}{2}\log\frac{\left(Q(\alpha P/\Gamma + N) - \sigma_{XS}^2\right)^{\Gamma}}{(2\pi e)\left(\alpha P/\Gamma + Q + N + 2\sigma_{XS}\right)^{\Gamma}(QN)^{\Gamma}(Q+N)^{1-\Gamma}},$$
where $Y \sim \Gamma\,\mathrm{N}(0,\alpha P/\Gamma + Q + N + 2\sigma_{XS}) + (1-\Gamma)\,\mathrm{N}(0,(1-\alpha)P/(1-\Gamma) + Q + N)$ and the maximization is performed over all $\alpha \in [0,1]$ and $\sigma_{XS}$ such that $|\sigma_{XS}| \le \sqrt{\alpha PQ/\Gamma}$.

Remark 29. Note that substituting $\sigma_{XS} = 0$ in Proposition 10 recovers the power-splitting lower bound proposed in [5]. The proposed lower bound can thus be strictly larger than that of [5], which we also illustrate through a plot in the sequel.

5.5.1.1 Proof of Achievability

To establish an inner bound on the capacity, choose $(S,X,Y)$ jointly Gaussian and $U = X \sim \mathrm{N}\left(0,\frac{(1-\alpha)P}{1-\Gamma}\right)$ when $A = 0$. Note that when $A = 1$, $Y$ can be expanded as follows:
$$\tilde{Y} = X + S + Z = X - \hat{X}(A{=}1,S) + S + \hat{X}(A{=}1,S) + Z,$$
where $\hat{X}(A{=}1,S)$ is the optimal linear estimator (in the MMSE sense) of $X$ given $(A{=}1,S)$. Denoting
$$\tilde{X} = X - \hat{X}(A{=}1,S), \qquad \tilde{S} = \left(1 + \frac{\sigma_{XS}}{Q}\right)S,$$
we get an alternate representation of $\tilde{Y}$:
$$\tilde{Y} = \tilde{X} + \tilde{S} + Z.$$
Since given $A = 1$, $(S,X,Y)$ is jointly Gaussian and $\tilde{X}$ is the error in the optimal estimation of $X$ given $(A{=}1,S)$, we get that $(\tilde{X},\tilde{S},Z)$ are independent Gaussian random variables. Conditioned on $A = 1$, this brings us back to Costa's model (see [24]) and implies that Costa's choice of auxiliary random variable, in our notation
$$\tilde{U} = \tilde{X} + \alpha_{\mathrm{costa}}\tilde{S}, \qquad \alpha_{\mathrm{costa}} = \frac{E(\tilde{X}^2)}{E(\tilde{X}^2) + N},$$
is optimal in our original problem setting too. Here $E(\tilde{X}^2)$ is given by
$$E(\tilde{X}^2) = \frac{\alpha P}{\Gamma} - \frac{\sigma_{XS}^2}{Q}.$$
Since $Y = X + S + Z$, we have $Y|A{=}0 \sim \mathrm{N}(0,(1-\alpha)P/(1-\Gamma) + Q + N)$ and $Y|A{=}1 \sim \mathrm{N}(0,\alpha P/\Gamma + Q + N + 2\sigma_{XS})$. Hence $Y$ follows a Gaussian mixture distribution, $Y \sim \Gamma\,\mathrm{N}(0,\alpha P/\Gamma + Q + N + 2\sigma_{XS}) + (1-\Gamma)\,\mathrm{N}(0,(1-\alpha)P/(1-\Gamma) + Q + N)$. This distribution gives the following lower bound on capacity:
$$\begin{aligned}
I(U,A;Y) - I(U;S_e|A) &= I(A;Y) + I(U;Y|A) - I(U;S_e|A) \\
&= h(Y) - \Gamma h(Y|A{=}1) - (1-\Gamma)h(Y|A{=}0) + \Gamma\left(I(U;Y|A{=}1) - I(U;S|A{=}1)\right) + (1-\Gamma)I(U;Y|A{=}0) \\
&= h(Y) - \Gamma h(Y|A{=}1) + \Gamma\left(I(U;Y|A{=}1) - I(U;S|A{=}1)\right) - (1-\Gamma)h(Y|A{=}0,X) \\
&= h(Y) - \Gamma h(Y|A{=}1) + \Gamma\left(I(U;Y|A{=}1) - I(U;S|A{=}1)\right) - (1-\Gamma)h(S+Z) \\
&= h(Y) - \Gamma\,\tfrac{1}{2}\log(2\pi e)(\alpha P/\Gamma + Q + 2\sigma_{XS} + N) - (1-\Gamma)\,\tfrac{1}{2}\log(2\pi e)(Q+N) \\
&\quad + \Gamma\left(I(U;Y|A{=}1) - I(U;S|A{=}1)\right),
\end{aligned}$$
where $h(Y)$ denotes the differential entropy of a continuous random variable following the Gaussian mixture model; it is well known that the differential entropy of a Gaussian mixture does not admit a closed-form expression (see [55] for details). By direct application of Costa's calculation [24],
$$I(U;Y|A{=}1) - I(U;S|A{=}1) = \frac{1}{2}\log\left(1 + \frac{E(\tilde{X}^2)}{N}\right) = \frac{1}{2}\log\left(1 + \frac{\alpha P/\Gamma - \sigma_{XS}^2/Q}{N}\right).$$
This concludes the achievability proof of Proposition 10.

Proposition 11. The capacity of the Gaussian adaptive action-dependent probing channel under individual power constraints on the channel and action inputs is upper bounded by
$$C(\Gamma,P) \le \max_{\alpha,\sigma_{XS}} \frac{1}{2}\log\frac{(P + 2\Gamma\sigma_{XS} + Q + N)\left(Q(\alpha P/\Gamma + N) - \sigma_{XS}^2\right)^{\Gamma}}{\left(\alpha P/\Gamma + Q + N + 2\sigma_{XS}\right)^{\Gamma}(QN)^{\Gamma}(Q+N)^{1-\Gamma}},$$
where the maximization is performed over all $\alpha \in [0,1]$ and $\sigma_{XS}$ such that $|\sigma_{XS}| \le \sqrt{\alpha PQ/\Gamma}$.
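Since the two bounds differ only in whether $h(Y)$ is evaluated for the Gaussian mixture or for a Gaussian of matched variance, they are easy to compare numerically: the mixture entropy can be estimated by Monte Carlo. The sketch below does this; the grid resolution, sample size, and channel parameters are illustrative assumptions.

```python
import numpy as np

def probing_bounds(Gamma, P, Q, N, grid=24, n_mc=40_000,
                   rng=np.random.default_rng(0)):
    """Evaluate the Prop. 10 lower and Prop. 11 upper bounds (in nats)."""
    z = rng.standard_normal(n_mc)          # reused standard normal samples
    probe = rng.random(n_mc) < Gamma       # mixture component indicator (A)
    best_lb = best_ub = -np.inf
    for alpha in np.linspace(0.0, 1.0, grid):
        smax = np.sqrt(alpha * P * Q / Gamma)
        for s in np.linspace(-smax, smax, grid):
            v1 = alpha * P / Gamma + Q + N + 2 * s        # Var(Y | A = 1)
            v0 = (1 - alpha) * P / (1 - Gamma) + Q + N    # Var(Y | A = 0)
            common = 0.5 * np.log(
                (Q * (alpha * P / Gamma + N) - s**2) ** Gamma
                / (2 * np.pi * np.e * v1**Gamma
                   * (Q * N) ** Gamma * (Q + N) ** (1 - Gamma)))
            # Monte Carlo h(Y) for the Gaussian mixture of Proposition 10
            y = np.where(probe, np.sqrt(v1) * z, np.sqrt(v0) * z)
            pdf = (Gamma * np.exp(-y**2 / (2 * v1)) / np.sqrt(2 * np.pi * v1)
                   + (1 - Gamma) * np.exp(-y**2 / (2 * v0)) / np.sqrt(2 * np.pi * v0))
            best_lb = max(best_lb, -np.mean(np.log(pdf)) + common)
            # Proposition 11: Gaussian Y with the matched variance
            h_gauss = 0.5 * np.log(2 * np.pi * np.e * (P + 2 * Gamma * s + Q + N))
            best_ub = max(best_ub, h_gauss + common)
    return best_lb, best_ub

print(probing_bounds(Gamma=0.5, P=1.0, Q=1.0, N=0.1))
```

Restricting the inner loop to $\sigma_{XS} = 0$ recovers the power-splitting bound of [5], as noted in Remark 29.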
5.5.1.2 Proof of the Converse

Recall that the capacity of the adaptive action-dependent probing channel is given in Theorem 25. Now
$$\begin{aligned}
I(U,A;Y) - I(U;S_e|A) &= I(A;Y) + I(U;Y|A) - I(U;S_e|A) \\
&= I(A;Y) + P(A{=}1)\left(I(U;Y|A{=}1) - I(U;S_e|A{=}1)\right) + P(A{=}0)\left(I(U;Y|A{=}0) - I(U;S_e|A{=}0)\right) \\
&\overset{(a)}{=} I(A;Y) + P(A{=}1)\left(I(U;Y|A{=}1) - I(U;S|A{=}1)\right) + P(A{=}0)I(U;Y|A{=}0) \\
&\le I(A;Y) + P(A{=}1)\left(I(U;Y,S|A{=}1) - I(U;S|A{=}1)\right) + P(A{=}0)I(U;Y|A{=}0) \\
&= I(A;Y) + P(A{=}1)I(U;Y|A{=}1,S) + P(A{=}0)I(U;Y|A{=}0) \\
&\overset{(b)}{\le} I(A;Y) + P(A{=}1)I(X;Y|A{=}1,S) + P(A{=}0)I(U;Y|A{=}0) \\
&\overset{(c)}{=} I(A;Y) + P(A{=}1)I(X;Y|A{=}1,S) + P(A{=}0)I(X;Y|A{=}0),
\end{aligned}$$
where (a) follows from the fact that $S_e = S$ if $A = 1$ and $S_e = *$ if $A = 0$, (b) follows from the data processing inequality since $U \to (X,S) \to Y$ form a Markov chain, and (c) is true since given $A = 0$, $X$ is a function of $U$, and given $A = 0$ and $X$, $Y$ and $U$ are independent of each other. Continuing with the chain of inequalities,
$$\begin{aligned}
I(U,A;Y) - I(U;S_e|A) &\le I(A;Y) + P(A{=}1)I(X;Y|A{=}1,S) + P(A{=}0)I(X;Y|A{=}0) \\
&= h(Y) - h(Y|A) + P(A{=}1)\left(h(Y|A{=}1,S) - h(Y|A{=}1,S,X)\right) + P(A{=}0)\left(h(Y|A{=}0) - h(Y|A{=}0,X)\right) \\
&= h(Y) - P(A{=}1)\left(h(Y|A{=}1) - h(Y|A{=}1,S)\right) - P(A{=}1)h(Y|A{=}1,S,X) - P(A{=}0)h(Y|A{=}0,X) \\
&= h(Y) - P(A{=}1)I(S;Y|A{=}1) - P(A{=}1)h(X+S+Z|A{=}1,S,X) - P(A{=}0)h(X+S+Z|A{=}0,X) \\
&\overset{(a)}{=} h(Y) - P(A{=}1)I(S;Y|A{=}1) - P(A{=}1)h(Z) - P(A{=}0)h(S+Z) \\
&= h(Y) + P(A{=}1)h(S|A{=}1,Y) - P(A{=}1)h(S|A{=}1) - P(A{=}1)h(Z) - P(A{=}0)h(S+Z) \\
&\overset{(b)}{=} h(Y) + P(A{=}1)h(S|A{=}1,Y) - \frac{1}{2}\log\left((2\pi e)^{1+\Gamma}(QN)^{\Gamma}(Q+N)^{1-\Gamma}\right),
\end{aligned}$$
where (a) follows from the fact that given $A = 0$, $X$ is independent of $S$, and (b) is true since the state $S \sim \mathrm{N}(0,Q)$ and the noise $Z \sim \mathrm{N}(0,N)$ are independent of each other (here we take $P(A{=}1) = \Gamma$, since the bound is largest when the full probing budget is used).

Obviously, for fixed second moments, the differential entropy $h(Y)$ is maximized if $Y$ is Gaussian, i.e.,
$$h(Y) \le \frac{1}{2}\log(2\pi e)E(Y^2) \overset{(a)}{=} \frac{1}{2}\log(2\pi e)\,E_A\!\left(E(Y^2|A)\right) = \frac{1}{2}\log(2\pi e)\left(P(A{=}0)(P_0+Q+N) + P(A{=}1)(P_1+Q+N+2\sigma_{XS})\right) \overset{(b)}{=} \frac{1}{2}\log(2\pi e)(P+Q+2\Gamma\sigma_{XS}+N),$$
where (a) follows from the law of iterated expectations and (b) follows from the fact that $P(A{=}0)P_0 + P(A{=}1)P_1 \le P$ and $h(Y)$ is an increasing function of $P$, with $\sigma_{XS} = E(XS|A{=}1)$. Similarly, the conditional entropy $h(S|A{=}1,Y)$ is maximized if $(S,Y)$ is jointly Gaussian given $A = 1$. Now denote by $\hat{S}(A{=}1,Y) = E(S|A{=}1,Y)$ the MMSE estimator of $S$ given $(A{=}1,Y)$ and observe that
$$\begin{aligned}
h(S|A{=}1,Y) &= h(S - \hat{S}(A{=}1,Y)|A{=}1,Y) \\
&\le h(S - \hat{S}(A{=}1,Y)) = h(S - E(S|A{=}1,X+S+Z)) \\
&\le \frac{1}{2}\log(2\pi e)\,E\left(S - E(S|A{=}1,X+S+Z)\right)^2 \\
&= \frac{1}{2}\log(2\pi e)\,\frac{Q(P_1+N) - \sigma_{XS}^2}{P_1+Q+N+2\sigma_{XS}},
\end{aligned}$$
where in fact all the inequalities are attained with equality if $(S,X,Y)$ are jointly Gaussian given $A = 1$. Here $P_1$ is the power allocated to the transmitter when $A = 1$ and can be written as $P_1 = \alpha P/\Gamma$ with $\alpha \in [0,1]$. Substituting the value of $P_1$, we get
$$h(S|A{=}1,Y) \le \frac{1}{2}\log(2\pi e)\,\frac{Q(\alpha P/\Gamma + N) - \sigma_{XS}^2}{\alpha P/\Gamma + Q + N + 2\sigma_{XS}}.$$
Combining all of this, we can rewrite the upper bound as
$$C(\Gamma,P) \le \frac{1}{2}\log\frac{(P + 2\Gamma\sigma_{XS} + Q + N)\left(Q(\alpha P/\Gamma + N) - \sigma_{XS}^2\right)^{\Gamma}}{\left(\alpha P/\Gamma + Q + N + 2\sigma_{XS}\right)^{\Gamma}(QN)^{\Gamma}(Q+N)^{1-\Gamma}},$$
where by the Cauchy–Schwarz inequality $|\sigma_{XS}| \le \sqrt{\alpha PQ/\Gamma}$. This completes the proof of the converse.

Note that the lower bound of Proposition 10 and the upper bound of Proposition 11 differ only in the differential entropy $h(Y)$. While the lower bound is based on computing $h(Y)$ with $Y$ following a Gaussian mixture model, the upper bound is evaluated by assuming a Gaussian $Y$ with the same variance as that of the Gaussian mixture used in the lower bound; thus, by the maximum differential entropy lemma [35], the upper bound can be strictly greater than the proposed lower bound.
Table 5.2: The impact of adaptive action on action-dependent channels

                                 CSI               Non adaptive   Adaptive
  Communication
    perfect CSIT                 Strictly causal   [99]           [99]
                                 Causal            [123]          [123]
                                 Non causal        [123]          Our work
    imperfect CSIT               Strictly causal   [99]           [99]
                                 Causal            [5]            [5]
                                 Non causal        [5]            Our work
  Communication and Estimation
    perfect CSIT                 Strictly causal   Our work       Our work
                                 Causal            Our work       Our work
                                 Non causal        Open           Open
    imperfect CSIT               Strictly causal   Our work       Our work
                                 Causal            Our work       Our work
                                 Non causal        Open           Open

5.6 Concluding Remarks

In this chapter, we extended the state communication setting to include two interesting practical constraints. We let the encoders take actions adaptively, based on past observations of the channel, in order to influence the current channel state, which is then not only easier to estimate at the decoder, but also provides a good channel for communicating additional independent information. We studied different cases with various degrees of channel state information at the channel encoder and quantitatively characterized the utility (or inutility) of adaptive actions in those settings. We concluded that allowing the actions to depend on past channel observations is not useful for communicating at a higher rate over the state-dependent channel, but adaptiveness does allow the decoder to obtain a better estimate of the channel state. We have summarized the contributions of this chapter in Table 5.2 (CSIT = channel state information at transmitter), which places our results in the perspective of the existing literature.

Chapter 6

Distortion metric for Robotic sensor networks

The scope of the problem of state communication is not limited to settings where the decoder wants to estimate the channel state or the channel impulse response in order to equalize an information signal distorted by the channel. The framework of state communication over a state-dependent channel is quite generic, as we illustrate with an example of online informative motion planning for an autonomous vehicle collecting data from a network of sensing agents. The objective of the problem is to monitor a dynamic source signal of interest.

The remainder of this chapter is organized as follows. We first formulate the robotic data collection problem for a set of correlated jointly Gaussian sources, introduce the distortion metric, propose a communication strategy, and evaluate the distortion metric for this communication strategy along a trajectory (Section 6.1). The communication strategy is then extended to a source with unknown location in Section 6.2 and to moving sources in Section 6.3. The properties of the proposed distortion metric are derived in Section 6.4. We then propose a number of approximate algorithms for optimizing the distortion metric efficiently (Section 6.5). Finally, we validate our proposed approach through simulated experiments (Section 6.6) and draw conclusions (Section 6.7).

6.1 Problem Formulation: Single Source

In this section, we formulate the problem of mobile data collection from stationary sensors using an autonomous vehicle.¹ We consider a pre-deployed network of $K$ sensors located in $\mathbb{R}^d$ with $d \in \{2,3\}$, which yields the 2D and 3D problems, respectively.

¹ Throughout the chapter, we closely follow standard notation. In particular, a random variable is denoted by an upper case letter (e.g., $X,Y,Z$) and its realization is denoted by a lower case letter (e.g., $x,y,z$).
The shorthand notation $X^n$ is used to denote the tuple (or the column vector) of random variables $(X_1,\ldots,X_n)$, and $x^n$ is used to denote its realization. The notation $X^n \sim p(x^n)$ means that $p(x^n)$ is the probability mass function (pmf) of the random vector $X^n$; similarly, $Y^n|\{X^n = x^n\} \sim p(y^n|x^n)$ means that $p(y^n|x^n)$ is the conditional pmf of $Y^n$ given $\{X^n = x^n\}$.

Figure 6.1: Channel model for the sensor network for a particular location of the AUV. The channel is a two-hop communication channel with the sensors acting as relays. We additionally assume that the second-hop channel is a function of the source signal.

We assume that the location $l_s(:,k) \in \mathbb{R}^d$ is given for each sensor $k \in [1:K]$, where $K$ is the total number of deployed sensors.

The objective of this chapter is to plan the trajectory of a robotic vehicle that gathers data from a deployment of stationary sensors in order to estimate a set of correlated dynamic sources $\{S^{(m)}\}_{m=1}^M$. The sources are assumed to be located at $l^{(m)} \in \mathbb{R}^d$, $\forall\, 1 \le m \le M$, and can be modeled as discrete-time stochastic processes $\{S^{(m)}_i\}_{i\ge1}$, $1 \le m \le M$. The stochastic process $\{S^{(m)}_i\}_{i\ge1}$ for each $m \in [1:M]$ is assumed to be i.i.d. over time, whereas the different sources at any instant of time $i$ can be arbitrarily correlated with each other; in other words, $\{S^{(m)}_i\}_{m=1}^M$ are jointly distributed random variables with joint density function $f_M(s^{(1)}_i,\cdots,s^{(M)}_i)$ for each $i \ge 1$. For the purposes of this chapter, we assume that $\{S^{(m)}_i\}_{m=1}^M$ is zero mean jointly Gaussian with a given covariance matrix, i.e., $(S^{(1)}_i,\cdots,S^{(M)}_i)^T \sim \mathrm{N}(0,\Sigma_S)$. The covariance matrix $\Sigma_S$ is a function of the correlation coefficients between the different sources and is assumed to be constant over time and known at the sensors and the AUV. Note that when the covariance matrix $\Sigma_S$ is diagonal, the sources are independent of each other. Later in this chapter we relax the assumption of knowing the correlation coefficients, and we show that the estimation error at the AUV remains unchanged even without knowledge of $\Sigma_S$.

The i.i.d. assumption on the source distributions implies that the time-domain sampling rate of the sources is larger than the coherence time of the sources (i.e., the time over which a time-varying source may be considered constant). From a practical perspective, this assumption maps to the case wherein the sensors only sample the source when there is a change in the source; as such, innovative information is collected and can be thought of as statistically independent from the previous samples.

Each sensor observes the sources through a noisy broadcast channel $p(s_1,\ldots,s_K|s)$, as shown in Fig. 6.1. The observation of each sensor $k$, which we refer to as the state of sensor $k$, can be modeled as a discrete-time stochastic process $\{S_{ki}\}_{i\ge1}$. The stochastic process $\{S_{ki}\}_{i\ge1}$ for each $k \in [1:K]$ is assumed to be i.i.d. over time, whereas the states of different sensors at any instant of time $i$ can be arbitrarily correlated with each other; in other words, $\{S_{ki}\}_{k=1}^K$ are jointly distributed random variables with joint density function $f_K(s_{1i},\cdots,s_{Ki})$ for each $i \ge 1$. For the purposes of this chapter, we assume that $\{S_{ki}\}_{k=1}^K$ is zero mean jointly Gaussian with a given covariance matrix, i.e., $(S_{1i},\cdots,S_{Ki})^T \sim \mathrm{N}(0,\Sigma)$.
The covariance matrix can be evaluated by assuming an additive white Gaussian noise channel between the source and each of the sensors, given by
$$S_{ki} = \sum_{m=1}^M h_m(L(l^{(m)},l_s(:,k)))\,S^{(m)}_i + Z_{ki}, \quad \forall\, 1 \le k \le K, \qquad (6.1)$$
where the receiver noise at the $k$-th sensor is $Z_{ki} \sim \mathrm{N}(0,1)$, and $h_m(\cdot)$ is the channel coefficient, which is some deterministic function of the distance $L(\cdot)$ between source $m$ and sensor $k$. Note that the covariance matrix $\Sigma_{K\times K}$ is time invariant since the process is i.i.d.

We consider the problem of data collection from such a sensor network using a mobile autonomous vehicle equipped with a wireless modem. The moving vehicle can be thought of as a receiver that processes the received output to estimate the source sequence. The location $l_v \in \mathbb{R}^d$ of the vehicle is assumed to be known with reasonable fidelity (e.g., using an onboard localization system). The movement of the vehicle is controlled and may be subject to constraints, such as obstacles or vehicle kinematics. Based on these constraints, a traversal cost $C(l_1,l_2)$ is defined for all pairs of points $l_1,l_2 \in \mathbb{R}^d$.

Each stationary sensor $k$ is capable of transmitting a function of its observations, $X_{ki} = f(S_k^i)$ (note that $X_{ki}$ is a causal function of the sensor state $S_k$), to the vehicle through a communication channel, which is influenced not only by the receiver noise, but also by the presence of the acoustic source in the medium. We assume an expected average transmission power constraint at the sensors such that
$$\sum_{i=1}^n E\left(x_{ki}^2(S_k^i)\right) \le nP_k, \quad \forall\, 1 \le k \le K, \qquad (6.2)$$
where the expectation is over the random source sequences $S^{(m)n}$, and $n$ is the number of samples taken by the vehicle at each location along its path. We also assume that the amount of time $n$ the vehicle stays at a particular location is sufficient to invoke the law of large numbers. The communication channel between the sensors and the AUV is modeled as a noisy state-dependent Gaussian multiple access channel $p_t(y|s_1,\ldots,s_K)$, whose output is given by
$$Y_i = \sum_{k=1}^K h_k(L(l_v(:,t+1),l_s(:,k)))\,X_{ki}(S_k^i) + \sum_{m=1}^M S^{(m)}_i + Z_i, \qquad (6.3)$$
where the receiver noise is $Z_i \sim \mathrm{N}(0,1)$ and $\{h_k\}_{k=1}^K$ are channel coefficients, which are again functions of the distance between the vehicle and the corresponding sensor. We would like to underscore the difference between our channel model and the standard communication channel: in addition to the additive noise and the encoded source signals sent by the sensors, the channel output $Y_i$ at the AUV also depends directly on the dynamic sources $\{S^{(m)}_i\}_{m=1}^M$ (see Fig. 6.1, which models the channel between the sensors and the AUV as a state-dependent channel with the source acting as the channel state). A motivating example for this state-dependent channel setting is a dynamic acoustic source, which is to be estimated at the autonomous moving vehicle with the help of a pre-deployed sensor network.
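To make the model concrete, the following sketch simulates $n$ time steps of (6.1) and (6.3), assuming an inverse-distance gain law $h(L) = 1/L$ and the amplify-and-forward encoders introduced in Section 6.1.2 below; the geometry, source covariance, and power values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
d, K, M, n = 2, 4, 2, 1000
l_s = rng.uniform(0.0, 10.0, size=(d, K))      # sensor locations (columns)
l_src = rng.uniform(0.0, 10.0, size=(d, M))    # source locations
l_v = np.array([5.0, 5.0])                     # current AUV location

h = lambda L: 1.0 / L                          # assumed gain law h(L) = 1/L
Hm = np.array([[h(np.linalg.norm(l_src[:, m] - l_s[:, k])) for m in range(M)]
               for k in range(K)])             # K x M source-to-sensor gains
hk = np.array([h(np.linalg.norm(l_v - l_s[:, k])) for k in range(K)])

Sigma_S = np.array([[1.0, 0.5], [0.5, 1.0]])   # assumed source covariance
S = rng.multivariate_normal(np.zeros(M), Sigma_S, size=n)        # n x M sources
S_sens = S @ Hm.T + rng.standard_normal((n, K))                  # states, eq. (6.1)
Sigma = Hm @ Sigma_S @ Hm.T + np.eye(K)        # covariance of sensor states
alpha = np.sqrt(np.ones(K) / np.diag(Sigma))   # unit power budgets P_k = 1
X = S_sens * alpha                             # amplify-and-forward inputs
Y = X @ hk + S.sum(axis=1) + rng.standard_normal(n)              # output, eq. (6.3)
```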
The vehicle's goal is to move along a trajectory collecting data from the sensors so as to estimate the underlying sensor field with maximum fidelity. The fidelity of the estimate of each source at a particular location is measured by the expected distortion
$$D(l_v)^{(m)} = E\left(d(S^{(m)n},\hat{S}^{(m)n})\right) = \frac{1}{n}\sum_{i=1}^n E\left(d(S^{(m)}_i,\hat{S}^{(m)}_i(Y^n))\right), \quad \forall\, 1 \le m \le M, \qquad (6.4)$$
where $d: \mathcal{S}\times\hat{\mathcal{S}} \to [0,\infty)$ is a distortion measure between a state symbol $s \in \mathcal{S}$ and a reconstruction symbol $\hat{s} \in \hat{\mathcal{S}}$, and the reconstruction is a function of the observations $y^n$ at a particular location $l_v$. Without loss of generality, we assume that for every symbol $s \in \mathcal{S}$ there exists a reconstruction symbol $\hat{s} \in \hat{\mathcal{S}}$ such that $d(s,\hat{s}) = 0$. In this chapter, we consider the squared error distortion $d(s,\hat{s}) = (s-\hat{s})^2$. The aim of the autonomous vehicle is to estimate the underlying $m$-th source $S^{(m)n}$ with minimum mean squared error (MMSE), i.e.,
$$D(l_v)^{(m)} = \min_{f_t(\cdot),\,\hat{S}^{(m)}_t(\cdot)} \frac{1}{n}\sum_{i=1}^n E\left(S^{(m)}_i - \hat{S}^{(m)}_i(Y^n)\right)^2,$$
where $f_t(\cdot)$ is the encoding function at the sensors. The estimate is made at each location and updated along the trajectory as the vehicle gathers more and more information.

6.1.1 Motion Planning

In the context of gathering data from pre-deployed sensor fields, the motion planning optimization problem is to generate a trajectory for an autonomous vehicle that retrieves data from the sensors and minimizes the traversal cost of the trajectory. The autonomous vehicle moves along a trajectory $P = [l_v(:,1),\cdots,l_v(:,T)]$ (a trajectory is represented as a collection of points in $\mathbb{R}^d$) to gather data. As the vehicle receives more information from the sensors along its trajectory of travel, it updates the effective distortion $D_e(P_{t+1})^{(m)}$ in the estimation of each underlying source $m$, which is given by
$$D_e(P_{t+1})^{(m)} = \frac{t\,D_e(P_t)^{(m)} + D(l_v(:,t+1))^{(m)}}{t+1}, \qquad (6.5)$$
where $D_e(P_1)^{(m)} = D(l_v(:,1))^{(m)}$, and $D(l_v(:,t))^{(m)}$ is given by (6.4). The path planning problem can then be defined depending on the objective function. If we want to minimize the weighted effective distortion of the different sources, we can state the following formal optimization problem.

Problem 1. Given a trajectory cost function $C(P) = \sum_{t=1}^T c(P(t-1),P(t))$ and a set of possible trajectories $P \in \Psi$, find
$$P^* = \arg\min_{P\in\Psi} \sum_{m=1}^M \beta_m D_e(P)^{(m)} \quad \text{s.t.} \quad \beta_m \ge 0,\ \sum_{m=1}^M \beta_m = 1,\ \text{and}\ C(P) \le B,$$
where the $\beta_m$ are pre-established weights that signify the relative importance of the different sources, $T$ is the index of the last point on the trajectory, and $B$ is a budget threshold on the cost of the trajectory (e.g., maximum mission time, battery life, or remaining fuel).

If, instead, we minimize the maximum distortion, then the path planning problem becomes
$$P(t)^* = \arg\min_{P(t)\in\Psi(t)} \max\left\{D(l_v(:,t))^{(m)}\right\}_{m=1}^M \quad \text{s.t.}\ C(P) \le B,$$
where $P^*(t)$ is the sampling position chosen by the AUV at time $t$ and $P^* = \cup\{P(t)^*\}_{t=1}^T$.

Remark 30. Instead of using the distortion metric as a fidelity criterion for the source estimate, the amount of information that the receiver retrieves about the source sequence can also be measured by the blockwise normalized mutual information, as is customary in measuring the security of cipher systems in the Shannon theory literature (see [81] for an example). The blockwise normalized mutual information gain can be defined as $I = \frac{1}{n}I(S^n;Y^n)$. An alternative to minimizing the mean squared error of the estimate is to maximize this blockwise mutual information. This more commonly used metric of normalized mutual information for arbitrary source sequences is related to the squared error distortion metric; the relation can be summarized as follows (see [110]):
$$E\left(d(S^n,\hat{S}^n)\right) \ge D\left(R = \frac{1}{n}I(S^n;Y^n)\right), \qquad (6.6)$$
where $D(R)$ is the distortion–rate function, i.e., the minimum distortion achieved when reconstructing a source at a remote decoder given a fixed rate $R$ (see [29] for a detailed definition).
So this information-theoretic measure guarantees that if there is coding involved at the sensors, then a large value of the associated mutual information implies reliable estimation at the receiver of the underlying source sequence. For Gaussian sources with additive white Gaussian noise, which is the focus of this chapter, it can be shown that the two metrics are equivalent (see [110, 81] for details); i.e., path planning performed with maximum mutual information gain as the cost function yields the same optimal path as planning with minimum squared error estimation error. For any general source distribution, this mutual information measure would never be smaller than the rate–distortion function (from (6.6)) of the source sequence, computed at the distortion given by the minimum value of $D$ from Equation (6.4). Hence, minimizing the distortion remains a valid approach even for non-Gaussian source sequences.

Although both metrics are quite effective for planning the trajectory of the robotic vehicle, unlike the squared error distortion metric, maximizing the normalized mutual information gain along the trajectory does not explicitly provide a reconstructed source signal, which is one of the main objectives of our source monitoring problem. In fact, the normalized mutual information gain $\frac{1}{n}I(S^n;Y^n) = \frac{1}{n}\left(h(S^n) - h(S^n|Y^n)\right)$ captures the difference between the original source uncertainty and the residual source uncertainty after observing the channel output. So, instead of reconstructing the source subject to some fidelity criterion, the focus of the normalized mutual information gain is to maximize the source uncertainty reduction rate, in which the decoder forms a list of possible reconstructed source sequences. There is a fundamental difference between the source uncertainty reduction rate and distortion: under some distortion measures, it may happen that a source sequence not on the decoder's list is closer to the original source sequence in the defined distortion measure.

6.1.2 Communication Strategy

In this subsection, we propose an encoding strategy at the sensors and a decoding strategy at the AUV that minimize the one-step distortion $D(l_v(:,t+1))^{(m)}$ in estimating source $m$ when the AUV is at a particular location $l_v(:,t+1)$ at time $t+1$. If the vehicle were to know the sensor observations $(s_{1i},\cdots,s_{Ki})$ perfectly, then the optimal estimate of the underlying source $S^{(m)}_i$ would be given by $\hat{S}^{(m)}_i(s_{1i},\ldots,s_{Ki}) = E\left(S^{(m)}_i \,\middle|\, (S_{1i},\ldots,S_{Ki}) = (s_{1i},\ldots,s_{Ki})\right)$. Since the underlying sources $\{S^{(m)}_i\}_{m=1}^M$ are jointly Gaussian with $(S_{1i},\ldots,S_{Ki})$, the optimal estimate $\hat{S}^{(m)}_i$ for the $m$-th source is a linear function of the sensor observations $(s_{1i},\ldots,s_{Ki})$, given by
$$\hat{S}^{(m)}_i = \left[E(S^{(m)}_i S_{1i}) \ \cdots\ E(S^{(m)}_i S_{Ki})\right]\Sigma^{-1}\begin{bmatrix} s_{1i} \\ \vdots \\ s_{Ki}\end{bmatrix} = \Sigma_c(m,:)\,\Sigma^{-1}\begin{bmatrix} s_{1i} \\ \vdots \\ s_{Ki}\end{bmatrix} = \Lambda(m)\begin{bmatrix} s_{1i} \\ \vdots \\ s_{Ki}\end{bmatrix} = \sum_{k=1}^K \lambda_k(m)\,s_{ki}.$$
Here $\Sigma_c(m,:) = \left[E(S^{(m)}_i S_{1i}) \ \cdots\ E(S^{(m)}_i S_{Ki})\right]$ is a row vector with $K$ elements, whose $k$-th element is the cross-correlation between the $m$-th source and the noisy observation at sensor $k$. Similarly, $\Lambda(m) = \Sigma_c(m,:)\Sigma^{-1}$ is a row vector of dimension $K$. Thus, the aim of the vehicle (the receiver in this case) is to estimate this linear function of the sensor observations at each location and update it along the path, based on the strategy given below. As discussed in the introduction, the optimal encoding strategy at the sensors that achieves the minimum distortion is an open problem.
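For reference, the genie-aided linear MMSE described above is a one-line computation. The sketch below evaluates the weights $\Lambda(m) = \Sigma_c(m,:)\Sigma^{-1}$ and the corresponding distortion $\Sigma_S(m,m) - \Lambda(m)\Sigma_c(m,:)^T$ for an assumed gain matrix; this baseline is not achievable over the noisy MAC and only lower bounds the one-step distortion derived next.

```python
import numpy as np

# Genie-aided baseline: linear MMSE of the sources given the sensor states.
# Hm (K x M source-to-sensor gains) and Sigma_S are illustrative stand-ins.
Hm = np.array([[1.0, 0.3], [0.4, 0.9], [0.7, 0.7]])   # K = 3 sensors, M = 2 sources
Sigma_S = np.array([[1.0, 0.5], [0.5, 1.0]])
Sigma = Hm @ Sigma_S @ Hm.T + np.eye(3)               # sensor-state covariance
Sigma_c = Sigma_S @ Hm.T                              # M x K cross-covariance
Lam = Sigma_c @ np.linalg.inv(Sigma)                  # weights Lambda(m), row-wise
D_genie = np.diag(Sigma_S) - np.sum(Lam * Sigma_c, axis=1)
print("genie-aided distortion per source:", D_genie)
```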
We assume that the stationary sensors have limited capabilities, and hence we choose the encoding function $X_{ki} = f(S_k^i) = \alpha_k S_{ki}$, where $\alpha_k = \sqrt{P_k/\Sigma(k,k)}$ is a constant chosen to satisfy the input power constraint $P_k$ at sensor $k$. We note that $\alpha_k$ is a function of the distances between the sources and the sensors and of the source covariance matrix $\Sigma_S$; we assume that the sensors have this information in order to calculate the amplification factor, and we discuss the effect of not knowing these parameters later in the sequel. Note that this simple amplification strategy at the sensors may be suboptimal, but under the practical constraint of limited processing power at the sensors, amplify-and-forward is the most natural coding strategy to consider, and it offers good performance, as seen in the sequel. In fact, it was shown in [37] that in the single source case, the analog scheme using full power is optimal in the completely symmetric case: the scenario where the sensors observe the true source $S$ under the same amount of noise contamination and all sensor encoders have the same amount of power to use. This symmetry requirement is, however, unnecessarily restrictive. Using the results of [113], it can easily be seen that the uncoded scheme is optimal as long as $P_i\sigma_i^4(\sigma^2+\sigma_i^2)^{-1}$ remains constant across the sensor encoders, where $\sigma^2$, $P_i$, and $\sigma_i^2$ are the variance of the hidden Gaussian source $S$, the transmission power constraint, and the variance of the sensor noise at sensor $i$, respectively. Note that we have assumed all sensor noise powers to be unity, so in our notation $\sigma_i^2 = \frac{1}{h(L(l,l_s(:,i)))^2}$.

Let us compute the one-step distortion $D(l_v(:,t+1))^{(m)}$ in estimating the $m$-th source for the proposed encoder and decoder as the data gathering vehicle moves from position $l_v(:,t)$ to $l_v(:,t+1)$. We assume that when the vehicle is at location $l_v(:,t)$, the effective distortion in estimating source $m$ up to time $t$ is given by $D_e(P_t)^{(m)}$. To compute the one-step distortion, consider the received output at the moving vehicle when it is at location $l_v(:,t+1)$:
$$Y_i = \sum_{k=1}^K h_k(L(l_v(:,t+1),l_s(:,k)))\,\alpha_k S_{ki} + \sum_{m=1}^M S^{(m)}_i + Z_i = [S_{1i}\ \ldots\ S_{Ki}]\,h + [S^{(1)}_i\ \ldots\ S^{(M)}_i]\,\mathbf{1} + Z_i,$$
where $h$ and $\mathbf{1}$ are, respectively, $K$- and $M$-dimensional column vectors with $j$-th components $h_j = h_j(L(l_v(:,t+1),l_s(:,j)))\,\alpha_j$ and $\mathbf{1}_j = 1$.

We choose $\hat{S}^{(m)}_i(y_i) = E(S^{(m)}_i|Y_i = y_i) = \frac{E(S^{(m)}Y)}{E(Y^2)}\,y_i$; then the expected distortion in estimating the $m$-th source at the AUV, when it is at location $l_v(:,t+1)$, is
$$D(l_v(:,t+1))^{(m)} = \sigma^2 - \frac{E(S^{(m)}Y)^2}{E(Y^2)},$$
with $\sigma^2 = \Sigma_S(m,m)$ the variance of the $m$-th source, where
$$\frac{E(S^{(m)}Y)^2}{E(Y^2)} = \frac{\left(\Sigma_c(m,:)\,h + \Sigma_S(m,:)\,\mathbf{1}\right)^2}{h^T\Sigma h + 2h^T\Sigma_c^T\mathbf{1} + \mathbf{1}^T\Sigma_S\mathbf{1} + 1}. \qquad (6.7)$$
Here $\Sigma_c$ is the $M\times K$ cross-correlation matrix between the sources and the sensor observations defined earlier. We remark that the one-step distortion $D(l_v(:,t+1))^{(m)}$ is a function of the distances between the sources and the sensors and between the sensors and the AUV, and of the source covariance matrix $\Sigma_S$; to calculate this distortion function, the AUV requires knowledge of these parameters. We will relax these assumptions later in the chapter and propose a communication strategy that enables the AUV to compute the distortion function without any prior knowledge of these parameters. As the vehicle traverses from $l_v(:,t)$ to $l_v(:,t+1)$, the effective distortion $D_e(P_{t+1})^{(m)}$ in estimating source $m$ is given by (6.5).
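The closed form (6.7) is straightforward to evaluate. A minimal sketch, assuming the inverse-distance gain law used throughout and unit-variance sensor noise, computes the per-source one-step distortion at a candidate AUV position and applies the running-average update (6.5):

```python
import numpy as np

def one_step_distortion(l_v, l_s, Hm, Sigma_S, P):
    """Per-source one-step MMSE distortion (6.7) at AUV position l_v, assuming
    gains h(L) = 1/L and amplify-and-forward sensors with power budgets P."""
    K, M = Hm.shape
    Sigma = Hm @ Sigma_S @ Hm.T + np.eye(K)       # sensor-state covariance
    alpha = np.sqrt(P / np.diag(Sigma))           # amplification factors
    hk = np.array([1.0 / np.linalg.norm(l_v - l_s[:, k]) for k in range(K)])
    h = hk * alpha                                # effective sensor-to-AUV gains
    ones = np.ones(M)
    Sigma_c = Sigma_S @ Hm.T                      # M x K cross-covariance
    denom = h @ Sigma @ h + 2 * h @ Sigma_c.T @ ones + ones @ Sigma_S @ ones + 1.0
    return np.diag(Sigma_S) - (Sigma_c @ h + Sigma_S @ ones) ** 2 / denom

def update_effective(D_eff, D_step, t):
    """Running-average update (6.5) after sampling the (t+1)-st location."""
    return (t * D_eff + D_step) / (t + 1)
```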
Remark 31. The problem of online informative motion planning to estimate an underlying source using a network of sensing agents, each subject to dynamic constraints and sensor limitations, has also been considered in [76], which focuses on a fixed vector source and quantifies the informativeness of a node, given its measurement sequence, as a function of the Fisher information metric, owing to its connection to the Cramér–Rao lower bound (CRLB) on the error covariance of unbiased estimators [29]. Indeed, the Cramér–Rao matrix is exactly the inverse of the Fisher information matrix. However, one caveat of this approach is that the Cramér–Rao lower bound is not always achievable, and hence the Fisher information metric is not a true measure of the actual information gain incurred in such settings. In contrast, we consider an underlying stochastic source (as compared to a fixed vector source), and hence the Fisher information metric in its usual form (see [76]) is not a lower bound on the error covariance in estimating the dynamic source. The dynamic source in our problem setting naturally suggests a stochastic version of the CRLB (see [32]), wherein a further averaging of the CRLB is needed to account for the stochastic nature of the source; this bound tends to be even looser if the CRLB for the fixed source is not tight. In contrast to prior work, we provide an encoding strategy at the sensors and a decoding strategy at the receiver that quantify the actual information gain at a node in terms of the effective mean squared error distortion. It can easily be shown that, conditioned on the amplify-and-forward encoding strategy at the sensors (which, as discussed above, may be suboptimal), the decoding strategy at the moving vehicle is optimal, as it minimizes the mean squared error in estimating the source sequence.

Remark 32. Instead of considering individual power constraints at the sensors, we can also consider the scenario where the sensors have a coupled transmission power constraint given by
$$\sum_{k=1}^K\sum_{i=1}^n \frac{1}{n}\,E(X_{ki}^2) \le P.$$
Since the sensors share a sum power constraint, not all sensors are active at each instant of time. If the channel between a sensor and the source or the AUV is very noisy, then it is better not to allocate any power to this sensor. However, it might not always be possible to find the best sensor, as the sensor closest to the source may not be the one closest to the AUV. Hence, at each instant of time only a subset of sensors remains active. The one-step distortion $D(l_v)$ in estimating a source is then a minimization over all possible power allocations at the sensors:
$$D(l_v) = \min_{\{P_k\}_{k=1}^K,\ \sum_{k=1}^K P_k \le P}\ \sigma^2 - \frac{E(SY)^2}{E(Y^2)},$$
where $\frac{E(SY)^2}{E(Y^2)}$ is given in (6.7). Note that $D(l_v)$ is a function of the power allocations at the sensors, since the received output $Y$ at the AUV is a function of $(P_1,\ldots,P_K)$ through the amplification factor $\alpha_k$ that sensor $k$ chooses to satisfy its transmission power constraint $P_k$.

Remark 33. The results can easily be extended to the case where the sensors and the AUV are aware of the variances $\Sigma_S(m,m)$ of the $M$ sources, but are oblivious to the correlation coefficients $\rho_{ij} = \Sigma_S(i,j)/\sqrt{\Sigma_S(i,i)\Sigma_S(j,j)}$, $\forall\, 1 \le i \le M$, $1 \le j \le M$, between the different sources.
We claim that even without knowledge of the off-diagonal elements of $\Sigma_S$, the estimation error in estimating the dynamic sources remains the same. We prove the claim for $M = 2$ sources; the result is easily extendible to more than two sources. To prove this, we use a communication strategy that involves two stages. In the first stage, which we call the training phase, the sensors and the AUV both listen to the sources to learn the unknown correlation coefficient, and the sensors do not communicate their noisy observations of the sources to the AUV. The variance of the noisy observation at sensor $k$ in the training phase is given by $E(S_k^2) = h_1^2\,\Sigma_S(1,1) + h_2^2\,\Sigma_S(2,2) + 2h_1h_2\,\rho_{12}\sqrt{\Sigma_S(1,1)\Sigma_S(2,2)} + 1$ (recall we assumed $M = 2$); since the variance of the sensor observation is a one-to-one function of the unknown correlation coefficient $\rho_{12}$, estimating $\rho_{12}$ is equivalent to estimating the variance of $S_k$, which is a normally distributed random variable with mean $0$ and unknown variance. It is well known that the best unbiased estimator (achieving the Cramér–Rao lower bound) of the variance of a Gaussian random variable is the sample variance $T = \frac{1}{n}\sum_{i=1}^n S_{ki}^2$, and the mean squared error achieved by this unbiased estimator is $\mathrm{Var}(T) = 2(E(S_k^2))^2/n$ (see [60]), which approaches $0$ in the limit as $n \to \infty$. Since the sensors collect a sufficiently large number of source samples for each location of the AUV, we can assume that $n$ is large, and thus the sensors can learn the correlation coefficient $\rho_{12}$ perfectly. Similarly, the AUV can also learn $\rho_{12}$. Given that the sensors and the AUV estimate $\rho_{12}$ perfectly in the first stage, the second stage involves estimation of the sources with known $\Sigma_S$, and we can use the amplify-and-forward encoding strategy at the sensors and the linear MMSE estimator at the AUV to achieve the same one-step distortion given in (6.7). Note that since the jointly Gaussian sources are i.i.d. over time, the covariance matrix $\Sigma_S$ is fixed, and hence we need to estimate the correlation parameters only once; thus the number of samples used in the training phase does not affect the overall long-term effective estimation error along a trajectory.
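A minimal sketch of the training phase in Remark 33 for $M = 2$: the sample variance of one sensor's observation is inverted for $\rho_{12}$. The gains $h_1,h_2$ and the true $\rho_{12}$ are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(7)
n, h1, h2 = 50_000, 0.8, 0.5        # training samples; assumed source-sensor gains
v1 = v2 = 1.0                       # known variances Sigma_S(1,1), Sigma_S(2,2)
rho = 0.4                           # unknown correlation coefficient rho_12
c = rho * np.sqrt(v1 * v2)
src = rng.multivariate_normal([0.0, 0.0], [[v1, c], [c, v2]], size=n)
s_k = h1 * src[:, 0] + h2 * src[:, 1] + rng.standard_normal(n)   # sensor state
T = np.mean(s_k ** 2)               # sample variance (the sources are zero mean)
# invert E(S_k^2) = h1^2 v1 + h2^2 v2 + 2 h1 h2 rho sqrt(v1 v2) + 1 for rho
rho_hat = (T - h1**2 * v1 - h2**2 * v2 - 1.0) / (2 * h1 * h2 * np.sqrt(v1 * v2))
print(f"true rho_12 = {rho}, estimate = {rho_hat:.3f}")
```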
6.2 Tracking a source with unknown location

So far in our discussion, we have assumed full channel state information at the sensors and the AUV, i.e., knowledge of $\{h_m\}_{m=1}^M$ in (6.1) and $\{h_k\}_{k=1}^K$ in (6.3). Since the channel coefficients are functions of the internode separations, this means that the sensors know the position of the source, and the AUV has knowledge of the locations of both the sensors and the source. This assumption may not be very realistic: one could argue that if the AUV knew the position of the source exactly, one could directly place the AUV close to the source, and there would be no need to deploy the sensors to track the dynamic source. Another example where the assumption of full channel state knowledge at the sensors and the AUV fails is the scenario where the dynamic source is not fixed at a particular location, but is moving across the field.

In this section, we extend our framework to the situation where the sensors and the AUV are oblivious to the exact location of the source. We continue to assume that the AUV knows the placement of the sensors. The sensors and the AUV attempt to estimate the exact location of the source. If the unbiased estimator used by the sensors and the AUV to estimate the position of the source has mean $\mu$ and variance $\sigma^2$, then we can assume that the position of the source is distributed according to some distribution $(p(l),\ l \in S)$ over a $d$-dimensional sphere $S$, with center at distance $\mu$ from the exact location of the source and radius given by $\sigma^2$ (see Fig. 6.2).

Figure 6.2: Visualization of an autonomous vehicle gathering data from a deployment of sensors monitoring a source signal. We assume the location of the source is uniformly distributed in the source cloud and the dimensions of the cloud are known to both the sensors and the AUV.

Suppose that the exact location of the source is $l \in S$. The sensors observe the source through a noisy communication channel, whose output at sensor $k$ is given by (6.1) with $M = 1$. The sensors again send an amplified version of their observations to the AUV to satisfy the transmission power constraint. We assume an individual input power constraint at each of the sensors, as in (6.2). The amplification factor
$$\alpha(k) = \sqrt{\frac{P_k}{\Sigma(k,k)}} = \sqrt{\frac{P_k}{h^2(L(l,l_s(:,k)))\,\sigma^2 + 1}}$$
is a function of the channel coefficient, which in turn depends on the exact location $l$ of the source. Since the sensors are oblivious to the exact location of the source, and since the input power constraint has to be met for all possible locations of the source inside the sphere $S$, for a fixed $P_k$ each sensor finds a location on or within the source cloud which is the solution of the following optimization problem:
$$\hat{l} = \arg\max_{l\in S}\ E(S_k)^2 \equiv \arg\max_{l\in S}\ h^2(L(l,l_s(:,k))). \qquad (6.8)$$
The modified amplification factor at sensor $k$ is then determined by calculating $\alpha(k)$ under the assumption that the source is at $\hat{l}$. An $\alpha(k)$ derived in this fashion always satisfies the input power constraint, since for any source position $l \in S$,
$$E(X_k)^2 = E(\alpha_k S_k)^2 = \frac{P_k\left(h^2(L(l,l_s(:,k)))\,\sigma^2 + 1\right)}{h^2(L(\hat{l},l_s(:,k)))\,\sigma^2 + 1} \le P_k.$$
If we assume $h(L(l,l_s(:,k))) = 1/L(l,l_s(:,k))$, then $E(S_k)^2 = E\left(h(L(l,l_s(:,k)))S_i + Z_{ki}\right)^2$ is a decreasing function of the distance between sensor $k$ and the source. Hence, the solution of the optimization problem (6.8) in this case is the point on or within the sphere $S$ whose distance from sensor $k$ is minimum. In fact, this point can be characterized exactly as the intersection of the surface of the sphere with the line joining sensor $k$ and the center of the sphere, and $h(L(\hat{l},l_s(:,k))) = 1/(d_k - \sigma)$, where $d_k$ is the distance between sensor $k$ and the center of the sphere $S$.

With this communication strategy at the sensors, the AUV performs path planning by evaluating the one-step distortion in estimating the source based on the MMSE estimator discussed in the previous sections. However, since the one-step distortion is a function of the exact location of the source (which is not known to the AUV), the objective of the AUV is to determine a value for the one-step distortion $D(l_v)$ that is close to the one-step distortion value it would obtain if the true location of the source were known. Suppose that the exact location of the source is $l \in S$, and let $D_l(l_v)$ denote the one-step distortion function when the AUV knows the exact source location. Then, without knowledge of the source location, the AUV chooses the one-step distortion function
$$D(l_v) = \int_S D_l(l_v)\,p(l)\,dS = E(D_l(l_v)),$$
where $p(l)\,dS$ is the probability of the source being in an infinitesimal volume around $l \in S$. To calculate $E(D_l(l_v))$, the AUV randomly selects $n$ points $l(j)$, $1 \le j \le n$, on or within the sphere $S$, i.i.d. according to the distribution $p(\cdot)$, and the one-step distortion $D(l_v)$ is then computed as
$$D(l_v) = \frac{1}{n}\sum_{j=1}^n D_{l(j)}(l_v).$$
This empirical mean converges to $E(D_l(l_v))$ in the limit as the number of sample points $n \to \infty$, by the law of large numbers.
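A minimal sketch of this Monte Carlo average for a single source ($M = 1$), assuming a uniform distribution $p(\cdot)$ over a spherical cloud and reusing the illustrative one_step_distortion helper defined after (6.7):

```python
import numpy as np

def sample_cloud(center, radius, n, rng):
    """Draw n points uniformly from a d-dimensional ball (the source cloud)."""
    d = len(center)
    pts = rng.standard_normal((n, d))
    pts /= np.linalg.norm(pts, axis=1, keepdims=True)
    return center + pts * (radius * rng.random(n) ** (1.0 / d))[:, None]

def averaged_distortion(l_v, l_s, center, radius, P, n=500,
                        rng=np.random.default_rng(0)):
    """Monte Carlo estimate of D(l_v) = E(D_l(l_v)) over the source cloud."""
    Sigma_S = np.array([[1.0]])                 # single source of unit variance
    vals = []
    for l in sample_cloud(center, radius, n, rng):
        Hm = np.array([[1.0 / np.linalg.norm(l - l_s[:, k])]
                       for k in range(l_s.shape[1])])   # K x 1 gains
        vals.append(one_step_distortion(l_v, l_s, Hm, Sigma_S, P)[0])
    return float(np.mean(vals))
```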
As the vehicle traverses from $l_v(:,t)$ to $l_v(:,t+1)$, the effective distortion $D_e(P_{t+1})$ in estimating the source can again be calculated using (6.5).

Remark 34. Although for analytical simplicity we have modeled the uncertainty region about the source location as a sphere, the results can easily be extended to any other shape of the source cloud, e.g., an ellipsoid, which could be more relevant in practice, as the variance of the error in estimating the true source location may differ along different dimensions. Similarly, the results could be extended to account for different sizes of the uncertainty region at the different sensors and the AUV.

6.3 Moving Sources

So far in our discussion, we have assumed that the source is fixed at a particular location throughout the data collection problem. What happens if the source starts moving, with the dynamics of its motion unknown to the sensors and the AUV? We model the problem of data collection from a moving source as an extension of the problem of path planning with unknown source location discussed in the last section (see Fig. 6.3). Since the source is moving with unknown dynamics, the sensors and the AUV know the position of the source only up to an uncertainty region (a sphere $S$, as in the last section) at each instant of time.

Figure 6.3: Visualization of an autonomous vehicle gathering data from a deployment of sensors monitoring a moving source signal. We assume that at each instant of time the source is uniformly distributed in the source cloud, and the cloud changes position because the source is moving.

The difference between the moving source and the source with unknown location is that, while in the unknown-source-location framework, once the position of the source is chosen it is fixed throughout the data collection time (and hence the uncertainty sphere is stationary), in the moving source case the uncertainty sphere moves across the sensor field according to the dynamics of the source motion. For example, if the source moves along a straight dotted line as shown in Fig. 6.3, the union of the uncertainty spheres across time forms a tube. So while in the unknown-source-location case we have a fixed communication channel with unknown channel coefficients, in the moving source case the communication channel is not only unknown, but also time varying.

Let us look into our problem formulation more closely. The problem setting has three important parameters, which define our regions of operation:
(a) $T_{\mathrm{sam}}$, the time duration between two successive samples at the AUV;
(b) $T_{\mathrm{coh}}$, the coherence time of the broadcast channel, which measures the duration of time beyond which the channel coefficients change significantly due to the movement of the source; and
(c) $T_{\mathrm{auv}}$, the time duration that the AUV stays at any particular location.
Comparing these parameters with the parameters of a time-varying communication channel, we can see that $T_{\mathrm{sam}}$ corresponds to the time duration of a channel input symbol, $T_{\mathrm{coh}}$ is the coherence time of the time-varying channel, and $T_{\mathrm{auv}}$ is the delay constraint, or the duration of one block of channel input symbols. Based on this comparison, we can safely assume that the sampling interval $T_{\mathrm{sam}}$ at the AUV is much smaller than both the coherence time of the channel $T_{\mathrm{coh}}$ and $T_{\mathrm{auv}}$. Depending on the relative values of $T_{\mathrm{coh}}$ and $T_{\mathrm{auv}}$, we then operate in one of two different regimes: if $T_{\mathrm{coh}}$ is much smaller than $T_{\mathrm{auv}}$, we say that the channel is fast fading, and if $T_{\mathrm{coh}}$ is of the order of $T_{\mathrm{auv}}$, the channel is a slow-fading communication channel.

In both regimes, we assume that the uncertainty region about the source location remains fixed for $T_{\mathrm{auv}}$ and changes its position after every $T_{\mathrm{auv}}$. In the fast fading environment, since the location of the source changes after every $T_{\mathrm{coh}}$ with $T_{\mathrm{coh}} \ll T_{\mathrm{auv}}$, we assume that in the $i$-th block of communication, $[iT_{\mathrm{auv}},(i+1)T_{\mathrm{auv}}]$, the source moves to a random position inside the (within this block, fixed) uncertainty cloud after every $T_{\mathrm{coh}}$. In the slow fading environment, however, since $T_{\mathrm{coh}} \approx T_{\mathrm{auv}}$, the position of the source is a fixed unknown random variable inside the uncertainty cloud of the $i$-th block. So the slow fading environment is similar to the problem of path planning with unknown source location discussed in the last section; the only difference is that, while the uncertainty cloud was assumed fixed throughout the data collection time in the last section, here the cloud changes position after every $T_{\mathrm{auv}}$.

In both of these regimes, the decoding strategy at the AUV is similar to the one used in the last section: at each instant of time, the AUV calculates the one-step distortion function by averaging the distortions computed over all possible realizations of the source location within the uncertainty region. So the path planning solution is the same for both regimes. It is easy to see that the decoding strategy employed by the AUV is optimal (conditioned on the amplify-and-forward strategy at the sensors) for the fast fading environment, since in that case, at a fixed location of the AUV, the channel coefficient is averaged over all its possible realizations inside the uncertainty cloud.

Remark 35. Note that our estimation strategy at the AUV is the same for both the slow fading and fast fading environments: we replace the true one-step distortion function, which requires knowledge of the source position at the AUV, with an average one-step distortion function, averaged over all possible realizations of the source location in the source cloud. Although this averaging of the distortion is a very apt choice for the fast fading environment (a similar averaging of rate is performed when sending a digital message over a fast fading communication channel; see [115]), the average distortion may not be the best alternative when working in a slow fading environment. In fact, the difference between the one-step distortion computed by our strategy, $D(l_v)$, and the one-step distortion with the source position known at the AUV can be as large as
$$\max_{l\in S} D_l(l_v) - D(l_v) = \max_{l\in S} D_l(l_v) - E(D_l(l_v))$$
in the worst case. Depending on the size of the uncertainty sphere and the distribution of the source position within the source cloud, this upper bound can be quite large.
The problem of communicating digital information over a slow-fading channel is widely investigated in the wireless community, where the "$\epsilon$-outage rate" (see [115] for details) is believed to be the right metric for characterizing the achievable rate region of a communication channel. Our problem setting is different in that we are sending an analog source (not digital messages) over a fading communication channel and, to the best of our knowledge, there is no work on determining the right metric for the error in estimating a source sent over a slow fading communication channel. For given $\delta$ and $\epsilon$, we could similarly define an $\epsilon$-outage one-step distortion $D(l_v)$ for slowly moving sources as the distortion which solves
$$\mathrm{P}\big(|D_l(l_v) - D(l_v)| > \delta\big) \le \epsilon,$$
where $D_l(l_v)$, the one-step distortion with source location $l \in S$, is a random variable with $l$ following the distribution $p(l)$ over the uncertainty region $S$. This metric would ensure that, for any location of the source inside the source cloud, the difference between the true distortion and the computed one is bounded by $\delta$ with high probability. However, unlike the $\epsilon$-outage rate, it is extremely difficult to solve this equation in closed form, and hence a more tractable metric is needed as an estimate of the one-step distortion function for slowly moving sources.

6.4 Properties of the Distortion Metric

In this section, we show that the distortion metric defined in the previous section is neither monotonic nor submodular, two properties often associated with information optimization objectives [66]. These formal properties will provide insight into the design of motion planning algorithms suitable for optimizing data gathering trajectories that minimize distortion.

Definition 3. Let $\Omega$ be the set of grid points of the sensor field. A function $f: 2^{\Omega} \to \mathbb{R}$ defined on the subsets of $\Omega$ is said to be monotonically decreasing if for every $T \subseteq S \subseteq \Omega$ we have $f(T) \ge f(S)$.

Definition 4. A function $f(\cdot)$ defined as above is said to be submodular iff for every $X \subseteq Y \subseteq \Omega$ and $x \in \Omega \setminus Y$ we have $f(X \cup \{x\}) - f(X) \ge f(Y \cup \{x\}) - f(Y)$.

It is easy to see that the effective distortion function $D_e(\cdot)$ is not monotonically decreasing, since it is inversely proportional to the channel qualities between the vehicle and the sensors (and thus proportional to the Euclidean distance between the sensors and the vehicle, according to our channel model). Hence, along a randomly chosen trajectory $\mathcal{P}$, the effective distortion may increase or decrease depending on the varying channel qualities along the trajectory.

The effective distortion metric $D_e(\cdot)$ is also not submodular. To see this, consider trajectories $\mathcal{P}_1 \subseteq \mathcal{P}_2$ and a new point $\{x\} \in \Omega \setminus \mathcal{P}_2$. The effective distortion increments along the trajectories are given by
$$D_e(\mathcal{P}_1 \cup \{x\}) - D_e(\mathcal{P}_1) = \frac{D(x) - D_e(\mathcal{P}_1)}{|\mathcal{P}_1| + 1},$$
$$D_e(\mathcal{P}_2 \cup \{x\}) - D_e(\mathcal{P}_2) = \frac{D(x) - D_e(\mathcal{P}_2)}{|\mathcal{P}_2| + 1},$$
where $D(x)$ is the one-step distortion of the source observation $S_i$ at location $x \in \Omega$. Since the effective distortion of the source observations $D_e(\cdot)$ is not monotonic (it varies randomly along a trajectory depending on the variation of channel quality along the path), there is no definite ordering between $D_e(\mathcal{P}_1 \cup \{x\}) - D_e(\mathcal{P}_1)$ and $D_e(\mathcal{P}_2 \cup \{x\}) - D_e(\mathcal{P}_2)$ for a randomly selected path in the sensor field. Thus the effective distortion metric $D_e(\cdot)$ is also not submodular.
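The failure of submodularity can be checked numerically from the increment formula above. The distortion values below are illustrative numbers of our choosing, not outputs of the channel model; they exploit the non-monotonicity of $D_e(\cdot)$.

```python
def increment(D_x, D_e_P, path_len):
    """Effective-distortion increment (D(x) - D_e(P)) / (|P| + 1)."""
    return (D_x - D_e_P) / (path_len + 1)

# Illustrative values: P1 is a prefix of P2, but D_e is not monotone,
# so the longer path may have the smaller effective distortion.
D_x = 0.4                  # one-step distortion at the candidate point x
D_e_P1, len_P1 = 0.6, 3    # effective distortion and length of P1
D_e_P2, len_P2 = 0.2, 6    # effective distortion and length of P2 (P1 is a subset of P2)

g1 = increment(D_x, D_e_P1, len_P1)   # = (0.4 - 0.6)/4 = -0.05
g2 = increment(D_x, D_e_P2, len_P2)   # = (0.4 - 0.2)/7 = +0.0286

# Submodularity would require g1 >= g2; here g1 < g2, so D_e is not submodular.
print(g1, g2, "submodular inequality holds:", g1 >= g2)
```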
6.5 Motion Planning Algorithms

We now discuss a sampling-based motion planning algorithm that efficiently generates trajectories to minimize the distortion metric while also maintaining the cost budget constraints. As discussed above, the distortion metric is neither monotone nor submodular, which excludes motion planning algorithms that rely on these assumptions [11, 100]. The distortion metric is also not convex, and it often contains a number of local minima. Thus, gradient-based methods are likely to perform poorly (see the simulations in the following section).

Our approach extends the RRT* [57] and Information-Rich RRT [76] algorithms to provide optimized distortion minimization. The key idea is to sample the configuration space of the vehicle (i.e., locations the vehicle may visit) and to build up a tree of possible trajectories by incrementally extending candidate trajectories towards the sampled points. The main challenges presented by the distortion metric are (1) calculating the distortion at each node of the tree in an efficient manner, and (2) focusing the tree generation such that candidate paths satisfy the budget requirements. We now discuss how we handle these two challenges; a code sketch follows below.

One desirable property of the distortion metric is that the distortion at time $t+1$ is fully defined by the next segment of the vehicle's trajectory, the locations of the sensors, the location of the information source, and the source distortions $D_e(\mathcal{P}_t)^{(m)}$ at time $t$ along that trajectory. In the case where we know the locations of the sensors and the location of the information source, it is straightforward to build trajectories in an incremental fashion by storing the trajectory segments and the matrix $D_e(\mathcal{P}_t)^{(m)}$ at each node. The memory requirements grow as $O(MN)$, where $N$ is the number of nodes in the tree and $M$ is the number of sources. In the case of an unknown source, it is sufficient to store an estimate of the distortion and then propagate that estimate forward.

The problem of minimizing distortion subject to a budget constraint differs in several key ways from problems typically solved using sampling-based motion planners. In many problem domains, a vehicle must move from one point to another while minimizing the trajectory cost [57, 76]. For the problem of distortion minimization, there is no fixed goal point. Instead, there is a hard budget constraint at which point the mission time has expired or the vehicle has run out of fuel. To apply sampling-based motion planners to these problems, we make the following modification: if a candidate trajectory would exceed the budget, it is never extended towards a sampled point.
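A minimal Python sketch of these two mechanisms follows: each tree node caches the per-source effective distortions $D_e(\mathcal{P}_t)^{(m)}$ so that extensions are incremental, and an extension that would exceed the budget is rejected. The field names and the toy distortion function are our illustrative assumptions, not part of the implementation evaluated in Sec. 6.6.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    x: tuple                      # vehicle configuration
    cost: float                   # accumulated trajectory cost
    path_len: int                 # number of points on the trajectory so far
    d_eff: list = field(default_factory=list)  # cached D_e(P_t)^(m), one per source
    parent: "Node" = None

def extend(node, x_new, step_cost, budget, one_step_distortion):
    """Try to extend `node` towards x_new; return the child or None if over budget."""
    new_cost = node.cost + step_cost
    if new_cost > budget:          # challenge (2): hard budget constraint,
        return None                # the node is closed and never extended
    # Challenge (1): incremental update of the effective distortion per source,
    # D_e(P u {x}) = D_e(P) + (D(x) - D_e(P)) / (|P| + 1).
    k = node.path_len + 1
    d_new = [d + (one_step_distortion(x_new, m) - d) / k
             for m, d in enumerate(node.d_eff)]
    return Node(x_new, new_cost, k, d_new, node)

# Example usage with a toy distortion model for M = 2 sources:
root = Node(x=(0.0, 0.0), cost=0.0, path_len=1, d_eff=[0.5, 0.8])
child = extend(root, (1.0, 0.0), step_cost=1.0, budget=10.0,
               one_step_distortion=lambda x, m: 0.3 + 0.1 * m)
print(child.d_eff if child else "extension rejected: budget exceeded")
```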
We maintain a closed list of nodes on the tree that represent completed trajectories. (This method can also be used in a receding-horizon manner by planning over a budget increment and then re-planning after the budget increment is expended.) As new points are generated, trajectories that are not yet completed are extended and eventually lead to completed trajectories. This approach builds up a large number of completed trajectories that efficiently explore the space of trajectories and provide different levels of distortion minimization.

The final optimization that we make for the sampling-based motion planner is to rewire the tree if a new trajectory appears to provide lower distortion at a lower cost (see [57]). We also store the locations of the nodes in a KD-tree, which allows for efficient retrieval of the nearest nodes. The full Budget-Constrained Distortion Minimization RRT (BCDM-RRT) algorithm is described in Algorithm 1. The general algorithm allows for constraints on the vehicle's motion (e.g., non-holonomic constraints) and can also account for obstacles and "no-fly zones" in the environment.

Figure 6.4: Comparison of the BCDM-RRT sampling-based motion planning algorithm to a gradient-based approach and two heuristic strategies (random walk and move-to-source) in a 10 km × 10 km environment with obstacles, for a single stationary source and for five stationary sources, with mission times of 10 and 20 hours. The simulated vehicle is capable of unconstrained motion at a maximum speed of 1 km/hr. The proposed method provides improved estimation of the source signal for a given trajectory length. Each data point is averaged over 1000 random sensor deployments, and error bars are one SEM.

Figure 6.5: Simulations with a moving source in a 10 km × 10 km environment with obstacles. Similar to the results with stationary sources, the gradient method and the BCDM-RRT provide improved minimization of squared error versus the heuristic and random methods. Each data point is averaged over 1000 random sensor deployments, and error bars are one SEM.

We can also extend the proposed algorithm to multiple vehicles by sampling in the space generated by the cross product of the vehicles' individual configuration spaces. In the multi-vehicle case, a large number of samples may be required to sufficiently cover the sampling space and generate desirable trajectories.
We leave a detailed discussion of multi-vehicle data gathering for distortion metrics to future work.

6.6 Simulated Experiments

We now provide simulated experiments to test the proposed motion planning techniques and their effectiveness in minimizing the distortion metric. The simulations were performed in C++ on an Ubuntu Linux desktop with a 3.2 GHz Intel i7 processor and 9 GB of RAM. We first examine the performance of the proposed BCDM-RRT sampling-based motion planner in a 10 km × 10 km 2D environment with 10 randomly placed sensors and a randomly placed source. The vehicle is capable of unconstrained motion with a maximum speed of 1 km/hr. The environment contains a varying number of circular obstacles generated with random radii of up to 5 km. In these simulations, the cost constraint considered is the mission time, which represents the time that the vehicle may remain deployed.

We compare the BCDM-RRT method to a gradient-based approach. Gradient-based optimization methods have previously been used in mobile sensor networks to optimize for localization accuracy [107]. We also compare to a heuristic that moves directly to the source, which has been applied in prior work on robotic data muling [10].

Algorithm 1 Budget-Constrained Distortion Minimization RRT (BCDM-RRT)
1: Input: Step size $\Delta$, budget $B$, workspace $\mathcal{X}_{all}$, free space $\mathcal{X}_{free}$, sensor/source locations $S_{locations}$, starting configuration $x_{start}$
2: % Calculate initial distortion
3: $D_{init} \leftarrow$ InitialDistortion($x_{start}$, $S_{locations}$)
4: % Initialize cost and starting node
5: $C_{init} \leftarrow 0$, $n \leftarrow \langle x_{start}, C_{init}, D_{init} \rangle$
6: $T \leftarrow \{n\}$, $T_{closed} \leftarrow \emptyset$
7: while processing time remains do
8: % Sample configuration space of vehicle
9: $x_{samp} \leftarrow$ Sample($\mathcal{X}_{all}$)
10: % Find nearest point in tree that can be extended
11: $n_{near} \leftarrow$ Nearest($x_{samp}$, $T \setminus T_{closed}$)
12: % Extend towards nearest point
13: $x_{new} \leftarrow$ Steer($x_{n_{near}}$, $x_{samp}$, $\Delta$)
14: if $x_{new} \in \mathcal{X}_{free}$ then
15: % Update distortion and cost
16: $D_{new} \leftarrow$ Distortion($D_{n_{near}}$, $x_{new}$, $S_{locations}$)
17: $c(x_{new}) \leftarrow$ EvaluateCost($x_{n_{near}}$, $x_{samp}$, $\Delta$)
18: $C_{new} \leftarrow C_{n_{near}} + c(x_{new})$
19: $n_{new} \leftarrow \langle x_{new}, C_{new}, D_{new} \rangle$
20: % Set parent of new node
21: parent($n_{new}$) $\leftarrow n_{near}$
22: % Add to closed list if budget exceeded
23: if $C_{new} > B - \Delta$ then
24: $T_{closed} \leftarrow T_{closed} \cup \{n_{new}\}$
25: end if
26: $T \leftarrow T \cup \{n_{new}\}$
27: end if
28: % Get near nodes within pre-specified radius $r$
29: $N_{near} \leftarrow$ Near($x_{n_{new}}$, $T \setminus T_{closed}$, $r$)
30: % Rewire $N_{near}$ if cost/distortion both smaller [57]
31: $T \leftarrow$ Rewire($T$, $N_{near}$)
32: end while
33: $\mathcal{P} \leftarrow$ MinDistortionPath($T_{closed}$)

6.6.1 Stationary Sources

Figure 6.4 shows the results from data gathering tours using 1000 random deployments with an increasing number of obstacles for two mission times. Examples are shown for the case of a single source and for the case of multiple (five) sources. In all cases, the BCDM-RRT outperforms the gradient-based method due to its ability to escape local minima in the distortion function and find a more globally optimal path. The advantage of the BCDM-RRT is greater with increasing mission times and with fewer obstacles (due to fewer constraints on the vehicle's motion). The BCDM-RRT was run with 100,000 samples, which took approximately 10 seconds per deployment.

The benefit of BCDM-RRT over the gradient-based method is also significant in the multi-source case, where there is increased variation of the objective function caused by the presence of multiple sources.
We also compare to two heuristics that are unaware of the underlying distortion: a random walk and a strategy that moves directly to the information source. The BCDM-RRT and gradient-based methods both outperform these heuristics, which demonstrates the importance of considering distortion in the trajectory optimization.

6.6.2 Moving Sources

We also examine the benefit of utilizing the distortion metrics for the case of a single moving source. Figure 6.5 shows results from simulations using a source that moves on a fixed trajectory. The trajectory of the source is known to the vehicle, but the exact location is only known to within 1 km (i.e., the uncertainty region is a sphere with radius 1 km). In these simulations, since the location of the source is not known exactly, the heuristic strategy moves to the center of the uncertainty region.

Similar to the case of stationary sources, the BCDM-RRT method (with 10,000 samples) and the distortion gradient method outperform the heuristic methods. The gradient-based method is more competitive here because the necessity of estimating the position of the source negates some of the benefit of long-term planning. Even in this challenging scenario, the BCDM-RRT still provides improved performance.

6.6.3 Vehicle Dynamics

We next apply our BCDM-RRT algorithm to an autonomous underwater vehicle (AUV) with dynamics. The BCDM-RRT was integrated into the Open Motion Planning Library (OMPL) [31]. The 3D position of the AUV is defined as $x = (x, y, z)$, and the controls are modeled as $u = (u_f, u_z, u_\theta)$. Simple vehicle dynamics are modeled using the following equations: $\ddot{x} = u_f \cos\theta$, $\ddot{y} = u_f \sin\theta$, $\ddot{z} = u_z$, and $\ddot{\theta} = u_\theta$ (a minimal integrator for these dynamics is sketched below). Figure 6.6 shows an example trajectory in a 3-sensor network where the vehicle veers towards one sensor before moving towards the source and its closest sensor. This sophisticated behavior is possible due to the generality of the BCDM-RRT motion planning algorithm. The demonstration of the proposed techniques on vehicles with motion constraints shows the broad applicability of our approach across platforms and application environments.

Figure 6.6: Trajectory generated for an autonomous underwater vehicle (AUV) with dynamics. The information source is shown as a square, and the sensors are shown as diamonds. The vehicle's starting point is shown as a circle, and its trajectory is shown as a solid line. The proposed algorithm minimizes the expected distortion along the path and also incorporates general constraints on the vehicle's trajectory. The vehicle moves in a trajectory that veers towards the bottom sensor before moving towards a location between the information source and its closest sensor.
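As a minimal sketch of how such dynamics can be propagated inside the planner's Steer step, the following semi-implicit Euler integrator implements the equations above; the state layout and step size are our illustrative choices (OMPL provides its own state propagators).

```python
import math

def propagate(state, control, dt):
    """One semi-implicit Euler step of the AUV dynamics
    x'' = u_f cos(theta), y'' = u_f sin(theta), z'' = u_z, theta'' = u_theta.
    state = (x, y, z, theta, vx, vy, vz, vtheta), control = (u_f, u_z, u_theta)."""
    x, y, z, th, vx, vy, vz, vth = state
    u_f, u_z, u_th = control
    # Integrate velocities with the current accelerations...
    vx += u_f * math.cos(th) * dt
    vy += u_f * math.sin(th) * dt
    vz += u_z * dt
    vth += u_th * dt
    # ...then positions with the updated velocities.
    return (x + vx * dt, y + vy * dt, z + vz * dt, th + vth * dt,
            vx, vy, vz, vth)

# Example: apply a constant forward thrust and turn rate for 10 steps.
s = (0.0,) * 8
for _ in range(10):
    s = propagate(s, control=(0.5, 0.0, 0.1), dt=0.1)
print(s[:4])  # final (x, y, z, theta)
```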
6.7 Concluding Remarks

This chapter has presented an example of a path planning problem that can be modeled as a state communication problem over a discrete memoryless state-dependent channel. The notion of "action" is evident from this example, since the trajectory chosen by the AUV can be thought of as taking adaptive actions based on its channel observations in order to minimize the error in estimating the source signal. In addition, we have observed that designing a path planning algorithm for tracking a source can be extremely challenging when the underlying source is dynamic. We proposed a sampling-based motion planning algorithm for optimizing data gathering tours for minimal distortion, and we showed that planning using distortion metrics provides significant improvements in data gathering efficiency over naive methods.

Chapter 7

Action dependent side information: source coding

The capacity–distortion results on channels with action-dependent states and action-dependent side information at the encoder naturally lead to the question of the rate–distortion function of the corresponding distributed source coding problem with an action-dependent side information vending machine (VM) at the decoder. The problem of characterizing the rate-distortion region for a point-to-point link with action-dependent side information was investigated in [91]. However, the problem for cascade source coding models, even with conventional side information sequences (i.e., without VMs, as in Fig. 7.1) at Node 2 and Node 3, is generally open. We refer to [111] and references therein for a review of the state of the art on the cascade problem and to [120] for the cascade-broadcast problem.

In this chapter, we focus on the cascade source coding problem with side information VMs. The basic cascade source coding model consists of three nodes arranged so that Node 1 communicates with Node 2 and Node 2 with Node 3 over finite-rate links, as illustrated for a computer network scenario in Fig. 1.2 and schematically in Fig. 7.1(a). Both Node 2 and Node 3 wish to reconstruct a, generally lossy, version of source $X$ and have access to different side information sequences. An extension of the cascade model is the cascade-broadcast model of Fig. 7.1(b), in which an additional "broadcast" link of rate $R_b$ exists that is received by both Node 2 and Node 3.

Two specific instances of the models in Fig. 7.1 for which a characterization of the rate-distortion performance has been found are the settings considered in [15] and in [3], which we briefly review here for their relevance to the present work. In [15], the cascade model in Fig. 7.1(a) was considered for the special case in which the side information $Y$ measured at Node 2 is also available at Node 1 (i.e., $X = (X,Y)$) and we have the Markov chain $X - Y - Z$, so that the side information at Node 3 is degraded with respect to that of Node 2. Instead, in [3], the cascade-broadcast model in Fig. 7.1(b) was considered

Figure 7.1: (a) Cascade source coding problem and (b) cascade-broadcast source coding problem.

for the special case in which either rate $R_b$ or $R_1$ is zero, and the reconstructions at Node 2 and Node 3 are constrained to be retrievable also at the encoder, in the sense of the Common Reconstruction (CR) constraint introduced in [103] (see below for a rigorous definition).

The rest of this chapter is organized as follows. Section 7.1 describes the basic channel model with discrete alphabets and derives the achievable rate-distortion-cost trade-offs for the cascade source coding setup with a side information vending machine. Section 7.2 studies the cascade-broadcast source coding setup with a side information vending machine, determines the rate for lossless reconstruction at the decoders, and investigates the rate–distortion region with a common reconstruction constraint at the decoders.
Section 7.3 extends the results to the setting where the side information at the decoders is a function of adaptive actions. Finally, Section 7.4 concludes the chapter.

7.1 Cascade Source Coding with a Vending Machine

In this section, we first describe the system model for the cascade source coding problem with a side information vending machine of Fig. 7.2. We then present the characterization of the corresponding rate-distortion-cost performance in Sec. 7.1.2.

Figure 7.2: Cascade source coding problem with a side information "vending machine" at Node 3.

7.1.1 System Model

The problem of cascade source coding of Fig. 7.2 is defined by the probability mass functions (pmfs) $p_{XY}(x,y)$ and $p_{Z|AY}(z|a,y)$ and the discrete alphabets $\mathcal{X}, \mathcal{Y}, \mathcal{Z}, \mathcal{A}, \hat{\mathcal{X}}_1, \hat{\mathcal{X}}_2$, as follows. The source sequences $X^n$ and $Y^n$, with $X^n \in \mathcal{X}^n$ and $Y^n \in \mathcal{Y}^n$, are such that the pairs $(X_i, Y_i)$ for $i \in [1,n]$ are independent and identically distributed (i.i.d.) with joint pmf $p_{XY}(x,y)$. Node 1 measures the sequences $X^n$ and $Y^n$ and encodes them into a message $M_1$ of $nR_1$ bits, which is delivered to Node 2. Node 2 estimates a sequence $\hat{X}_1^n \in \hat{\mathcal{X}}_1^n$ within given distortion requirements to be discussed below. Moreover, Node 2 maps the message $M_1$ received from Node 1 and the locally available sequence $Y^n$ into a message $M_2$ of $nR_2$ bits, which is delivered to Node 3. Node 3 wishes to estimate a sequence $\hat{X}_2^n \in \hat{\mathcal{X}}_2^n$ within given distortion requirements. To this end, Node 3 receives the message $M_2$ and, based on this, selects an action sequence $A^n$, where $A^n \in \mathcal{A}^n$. The action sequence affects the quality of the measurement $Z^n$ of the sequence $Y^n$ obtained at Node 3. Specifically, given $A^n$ and $Y^n$, the sequence $Z^n$ is distributed as $p(z^n|a^n,y^n) = \prod_{i=1}^{n} p_{Z|A,Y}(z_i|y_i,a_i)$. The cost of the action sequence is defined by a cost function $\Lambda: \mathcal{A} \to [0, \Lambda_{max}]$ with $0 \le \Lambda_{max} < \infty$, as $\Lambda(a^n) = \sum_{i=1}^{n} \Lambda(a_i)$. The estimated sequence $\hat{X}_2^n$ with $\hat{X}_2^n \in \hat{\mathcal{X}}_2^n$ is then obtained as a function of $M_2$ and $Z^n$. The estimated sequences $\hat{X}_j^n$ for $j = 1,2$ must satisfy distortion constraints defined by functions $d_j(x, \hat{x}_j): \mathcal{X} \times \hat{\mathcal{X}}_j \to [0, D_{max}]$ with $0 \le D_{max} < \infty$ for $j = 1,2$, respectively. A formal description of the operations at the encoder and the decoder follows.

Definition 5. An $(n, R_1, R_2, D_1, D_2, \Gamma, \epsilon)$ code for the set-up of Fig. 7.2 consists of two source encoders, namely
$$g_1: \mathcal{X}^n \times \mathcal{Y}^n \to [1, 2^{nR_1}], \qquad (7.1)$$
which maps the sequences $X^n$ and $Y^n$ into a message $M_1$;
$$g_2: \mathcal{Y}^n \times [1, 2^{nR_1}] \to [1, 2^{nR_2}], \qquad (7.2)$$
which maps the sequence $Y^n$ and the message $M_1$ into a message $M_2$; an "action" function
$$\ell: [1, 2^{nR_2}] \to \mathcal{A}^n, \qquad (7.3)$$
which maps the message $M_2$ into an action sequence $A^n$; and two decoders, namely
$$\mathrm{h}_1: [1, 2^{nR_1}] \times \mathcal{Y}^n \to \hat{\mathcal{X}}_1^n, \qquad (7.4)$$
which maps the message $M_1$ and the measured sequence $Y^n$ into the estimated sequence $\hat{X}_1^n$; and
$$\mathrm{h}_2: [1, 2^{nR_2}] \times \mathcal{Z}^n \to \hat{\mathcal{X}}_2^n, \qquad (7.5)$$
which maps the message $M_2$ and the measured sequence $Z^n$ into the estimated sequence $\hat{X}_2^n$; such that the action cost constraint $\Gamma$ and the distortion constraints $D_j$ for $j = 1,2$ are satisfied, i.e.,
$$\frac{1}{n}\sum_{i=1}^{n} \mathrm{E}[\Lambda(A_i)] \le \Gamma \qquad (7.6)$$
and
$$\frac{1}{n}\sum_{i=1}^{n} \mathrm{E}[d_j(X_i, \mathrm{h}_{ji})] \le D_j \quad \text{for } j = 1,2, \qquad (7.7)$$
where we have defined as $\mathrm{h}_{1i}$ and $\mathrm{h}_{2i}$ the $i$th symbols of the functions $\mathrm{h}_1(M_1, Y^n)$ and $\mathrm{h}_2(M_2, Z^n)$, respectively.

Definition 6. Given a distortion-cost tuple $(D_1, D_2, \Gamma)$, a rate tuple $(R_1, R_2)$ is said to be achievable if, for any $\epsilon > 0$ and sufficiently large $n$, there exists an $(n, R_1, R_2, D_1+\epsilon, D_2+\epsilon, \Gamma+\epsilon)$ code.

Definition 7.
The rate-distortion-cost region $\mathcal{R}(D_1, D_2, \Gamma)$ is defined as the closure of all rate tuples $(R_1, R_2)$ that are achievable given the distortion-cost tuple $(D_1, D_2, \Gamma)$.

Remark 36. For side information $Z$ available causally at Node 3, i.e., with the decoding function (7.5) at Node 3 modified so that $\hat{X}_i$ is a function of $M_2$ and $Z^i$ only, the rate-distortion region $\mathcal{R}(D_1, D_2, \Gamma)$ has been derived in [1].

In the rest of this section, for simplicity of notation, we drop the subscripts from the definition of the pmfs, thus identifying a pmf by its argument.

7.1.2 Rate-Distortion-Cost Region

In this section, a single-letter characterization of the rate-distortion-cost region is derived.

Proposition 12. The rate-distortion-cost region $\mathcal{R}(D_1, D_2, \Gamma)$ for the cascade source coding problem illustrated in Fig. 7.2 is given by the union of all rate pairs $(R_1, R_2)$ that satisfy the conditions
$$R_1 \ge I(X; \hat{X}_1, A, U|Y) \qquad (7.8a)$$
$$\text{and } R_2 \ge I(X,Y;A) + I(X,Y;U|A,Z), \qquad (7.8b)$$
where the mutual information terms are evaluated with respect to the joint pmf
$$p(x,y,z,a,\hat{x}_1,u) = p(x,y)\, p(\hat{x}_1,a,u|x,y)\, p(z|y,a), \qquad (7.9)$$
for some pmf $p(\hat{x}_1,a,u|x,y)$ such that the inequalities
$$\mathrm{E}[d_1(X, \hat{X}_1)] \le D_1, \qquad (7.10a)$$
$$\mathrm{E}[d_2(X, \mathrm{f}(U,Z))] \le D_2, \qquad (7.10b)$$
$$\text{and } \mathrm{E}[\Lambda(A)] \le \Gamma \qquad (7.10c)$$
are satisfied for some function $\mathrm{f}: \mathcal{U} \times \mathcal{Z} \to \hat{\mathcal{X}}_2$. Finally, $U$ is an auxiliary random variable whose alphabet cardinality can be constrained as $|\mathcal{U}| \le |\mathcal{X}||\mathcal{Y}||\mathcal{A}| + 3$ without loss of optimality.

Remark 37. For side information $Z$ independent of the action $A$ given $Y$, i.e., for $p(z|a,y) = p(z|y)$, the rate-distortion region $\mathcal{R}(D_1, D_2, \Gamma)$ in Proposition 12 reduces to that derived in [15].

The achievability follows as a combination of the techniques proposed in [91] and [15, Theorem 1]. Here we briefly outline the main ideas, since the technical details follow from standard arguments. For the scheme at hand, Node 1 first maps the sequences $X^n$ and $Y^n$ into the action sequence $A^n$ using the standard joint typicality criterion. This mapping requires a codebook of rate $I(X,Y;A)$ (see, e.g., [35, pp. 62-63]). Given the sequence $A^n$, the sequences $X^n$ and $Y^n$ are further mapped into a sequence $U^n$. This requires a codebook of rate $I(X,Y;U|A)$ for each action sequence $A^n$, from standard rate-distortion considerations [35, pp. 62-63]. Similarly, given the sequences $A^n$ and $U^n$, the sequences $X^n$ and $Y^n$ are further mapped into the estimate $\hat{X}_1^n$ for Node 2 using a codebook of rate $I(X,Y; \hat{X}_1|U,A)$ for each codeword pair $(U^n, A^n)$. The codewords thus obtained are then communicated to Node 2 and Node 3 as follows. By leveraging the side information $Y^n$ available at Node 2, conveying the codewords $A^n$, $U^n$ and $\hat{X}_1^n$ to Node 2 requires rate $I(X,Y;U,A) + I(X,Y;\hat{X}_1|U,A) - I(U,A,\hat{X}_1;Y)$ by the Wyner-Ziv theorem [35, p. 280], which equals the right-hand side of (7.8a). Then, the sequences $A^n$ and $U^n$ are sent by Node 2 to Node 3, which requires a rate equal to the right-hand side of (7.8b). This follows from the rates of the codebooks used and from the Wyner-Ziv theorem, given the side information $Z^n$ available at Node 3 upon application of the action sequence $A^n$. Finally, Node 3 produces $\hat{X}_2^n$ through a symbol-by-symbol function as $\hat{X}_{2i} = \mathrm{f}(U_i, Z_i)$ for $i \in [1,n]$.

We will now prove the converse of Proposition 12 for the more general case of adaptive actions, to be defined in Sec. 7.3.

7.1.2.1 Proof of the converse

Here, we prove the converse part of Proposition 15.
Since the setting of Proposition 12 is more restrictive, as it does not allow for adaptive actions, the converse proof for Proposition 12 follows immediately. For any $(n, R_1, R_2, D_1+\epsilon, D_2+\epsilon, \Gamma+\epsilon)$ code, we have
$$
\begin{aligned}
nR_1 &\ge H(M_1) \ge H(M_1|Y^n)\\
&\overset{(a)}{=} I(M_1; X^n, Z^n|Y^n)\\
&= H(X^n, Z^n|Y^n) - H(X^n, Z^n|M_1, Y^n)\\
&= H(X^n|Y^n) + H(Z^n|X^n,Y^n) - H(Z^n|Y^n,M_1) - H(X^n|Z^n,Y^n,M_1)\\
&\overset{(a,b)}{=} H(X^n|Y^n) + H(Z^n|X^n,Y^n,M_1,M_2) - H(Z^n|Y^n,M_1,M_2) - H(X^n|Z^n,Y^n,M_1,M_2)\\
&\overset{(c)}{=} H(X^n|Y^n) - H(X^n|Z^n,Y^n,M_1,M_2,A^n,\hat{X}_1^n)\\
&\quad + \sum_{i=1}^{n} H(Z_i|Z^{i-1},X^n,Y^n,M_1,M_2) - H(Z_i|Z^{i-1},Y^n,M_1,M_2)\\
&\overset{(c,d)}{\ge} \sum_{i=1}^{n} \big(H(X_i|Y_i) - H(X_i|X^{i-1},Y^i,M_2,A_i,Z^n,\hat{X}_{1i})\big)\\
&\quad + \sum_{i=1}^{n} H(Z_i|Z^{i-1},X^n,Y^n,M_1,M_2,A_i) - H(Z_i|Z^{i-1},Y^n,M_1,M_2,A_i)\\
&\overset{(e)}{=} \sum_{i=1}^{n} I(X_i; \hat{X}_{1i}, A_i, U_i|Y_i) + H(Z_i|Y_i,A_i) - H(Z_i|Y_i,A_i)\\
&= \sum_{i=1}^{n} I(X_i; \hat{X}_{1i}, A_i, U_i|Y_i), \qquad (7.11)
\end{aligned}
$$
where $(a)$ follows since $M_1$ is a function of $(X^n, Y^n)$; $(b)$ follows since $M_2$ is a function of $(M_1, Y^n)$; $(c)$ follows since $A_i$ is a function of $(M_2, Z^{i-1})$ and since $\hat{X}_1^n$ is a function of $(M_1, Y^n)$; $(d)$ follows since conditioning decreases entropy and since $X^n$ and $Y^n$ are i.i.d.; and $(e)$ follows by defining $U_i = (M_2, X^{i-1}, Y^{i-1}, A^{i-1}, Z^{n\setminus i})$ and since $(Z^{i-1}, X^n, Y^{n\setminus i}, M_1, M_2) - (A_i, Y_i) - Z_i$ forms a Markov chain by construction. We also have
$$
\begin{aligned}
nR_2 &\ge H(M_2) = I(M_2; X^n, Y^n, Z^n)\\
&= H(X^n, Y^n, Z^n) - H(X^n, Y^n, Z^n|M_2)\\
&= H(X^n, Y^n) + H(Z^n|X^n,Y^n) - H(Z^n|M_2) - H(X^n,Y^n|M_2,Z^n)\\
&= \sum_{i=1}^{n} H(X_i,Y_i) + H(Z_i|Z^{i-1},X^n,Y^n) - H(Z_i|Z^{i-1},M_2) - H(X_i,Y_i|X^{i-1},Y^{i-1},M_2,Z^n)\\
&\overset{(a)}{=} \sum_{i=1}^{n} H(X_i,Y_i) + H(Z_i|Z^{i-1},X^n,Y^n,M_2,A_i) - H(Z_i|Z^{i-1},M_2,A_i)\\
&\quad - H(X_i,Y_i|X^{i-1},Y^{i-1},M_2,Z^n,A_i)\\
&\overset{(b)}{\ge} \sum_{i=1}^{n} H(X_i,Y_i) + H(Z_i|X_i,Y_i,A_i) - H(Z_i|A_i) - H(X_i,Y_i|U_i,A_i,Z_i), \qquad (7.12)
\end{aligned}
$$
where $(a)$ follows because $M_2$ is a function of $(M_1, Y^n)$ and thus of $(X^n, Y^n)$, and because $A_i$ is a function of $(M_2, Z^{i-1})$; and $(b)$ follows since conditioning decreases entropy, since the Markov chain relationship $Z_i - (X_i, Y_i, A_i) - (X^{n\setminus i}, Y^{n\setminus i}, M_2)$ holds, and by using the definition of $U_i$.

Defining $Q$ to be a random variable uniformly distributed over $[1,n]$ and independent of all the other random variables, and with $X \triangleq X_Q$, $Y \triangleq Y_Q$, $Z \triangleq Z_Q$, $A \triangleq A_Q$, $\hat{X}_1 \triangleq \hat{X}_{1Q}$, $\hat{X}_2 \triangleq \hat{X}_{2Q}$ and $U \triangleq (U_Q, Q)$, from (7.11) we have
$$R_1 \ge I(X; \hat{X}_1, A, U|Y, Q) \overset{(a)}{\ge} H(X|Y) - H(X|\hat{X}_1, A, U, Y) = I(X; \hat{X}_1, A, U|Y),$$
where in $(a)$ we have used the fact that $(X^n, Y^n)$ is i.i.d. and that conditioning reduces entropy. Moreover, from (7.12) we have
$$
\begin{aligned}
R_2 &\ge H(X,Y|Q) + H(Z|X,Y,A,Q) - H(Z|A,Q) - H(X,Y|U,A,Z,Q)\\
&\overset{(a)}{\ge} H(X,Y) + H(Z|X,Y,A) - H(Z|A) - H(X,Y|U,A,Z)\\
&= I(X,Y;U,A,Z) - I(Z;X,Y|A)\\
&= I(X,Y;A) + I(X,Y;U|A,Z),
\end{aligned}
$$
where $(a)$ follows since $(X^n, Y^n)$ is i.i.d., since conditioning decreases entropy, by the definition of $U$, and by the problem definition. We note that the defined random variables factorize as (7.9), since we have the Markov chain relationship $X - (A,Y) - Z$ by the problem definition, and $\hat{X}_2$ is a function $\mathrm{f}(U,Z)$ of $U$ and $Z$ by the definition of $U$. Moreover, from the cost and distortion constraints (7.6)-(7.7), we have
$$D_j + \epsilon \ge \frac{1}{n}\sum_{i=1}^{n} \mathrm{E}[d_j(X_i, \hat{X}_{ji})] = \mathrm{E}[d_j(X, \hat{X}_j)], \quad \text{for } j = 1,2, \qquad (7.13a)$$
$$\text{and } \Gamma + \epsilon \ge \frac{1}{n}\sum_{i=1}^{n} \mathrm{E}[\Lambda(A_i)] = \mathrm{E}[\Lambda(A)]. \qquad (7.13b)$$
To bound the cardinality of the auxiliary random variable $U$, we fix $p(z|y,a)$ and factorize the joint pmf $p(x,y,z,a,u,\hat{x}_1)$ as
$$p(x,y,z,a,u,\hat{x}_1) = p(u)\, p(\hat{x}_1,a,x,y|u)\, p(z|y,a).$$
Therefore, for fixed $p(z|y,a)$, the quantities (7.8a)-(7.10c) can be expressed in terms of integrals $\int g_j(p(\hat{x}_1,a,x,y|u))\, dF(u)$, for $j = 1, \ldots, |\mathcal{X}||\mathcal{Y}||\mathcal{A}|+3$, of functions $g_j(\cdot)$ that are continuous on the space of probabilities over the alphabet $\mathcal{X} \times \mathcal{Y} \times \mathcal{A} \times \hat{\mathcal{X}}_1$. Specifically, $g_j$ for $j = 1, \ldots, |\mathcal{X}||\mathcal{Y}||\mathcal{A}|-1$ is given by the pmf $p(a,x,y)$ for all values of $x \in \mathcal{X}$, $y \in \mathcal{Y}$ and $a \in \mathcal{A}$ (except one); $g_{|\mathcal{X}||\mathcal{Y}||\mathcal{A}|} = H(X|A,Y,\hat{X}_1, U=u)$; $g_{|\mathcal{X}||\mathcal{Y}||\mathcal{A}|+1} = H(X,Y|A,Z,U=u)$; and $g_{|\mathcal{X}||\mathcal{Y}||\mathcal{A}|+1+j} = \mathrm{E}[d_j(X,\hat{X}_j)|U=u]$ for $j = 1,2$. The proof is concluded by invoking the Fenchel–Eggleston–Caratheodory theorem [35, Appendix C].

7.1.3 Lossless Compression

Suppose that the source sequence $X^n$ needs to be communicated losslessly to both Node 2 and Node 3, in the sense that $d_j(x,\hat{x}_j)$ is the Hamming distortion measure for $j = 1,2$ (i.e., $d_j(x,\hat{x}_j) = 0$ if $x = \hat{x}_j$ and $d_j(x,\hat{x}_j) = 1$ if $x \ne \hat{x}_j$) and $D_1 = D_2 = 0$. We can establish the following immediate consequence of Proposition 12.

Corollary 6. The rate-distortion-cost region $\mathcal{R}(0,0,\Gamma)$ for the cascade source coding problem illustrated in Fig. 7.2 with Hamming distortion metrics is given by the union of all rate pairs $(R_1, R_2)$ that satisfy the conditions
$$R_1 \ge I(X;A|Y) + H(X|A,Y) \qquad (7.14a)$$
$$\text{and } R_2 \ge I(X,Y;A) + H(X|A,Z), \qquad (7.14b)$$
where the mutual information terms are evaluated with respect to the joint pmf
$$p(x,y,z,a) = p(x,y)\, p(a|x,y)\, p(z|y,a), \qquad (7.15)$$
for some pmf $p(a|x,y)$ such that $\mathrm{E}[\Lambda(A)] \le \Gamma$.

7.2 Cascade-Broadcast Source Coding with a Side Information Vending Machine

In this section, the cascade-broadcast source coding problem with a side information vending machine illustrated in Fig. 7.3 is studied. At first, the rate-cost performance is characterized for the special case in which the reproductions at Node 2 and Node 3 are constrained to be lossless. The lossy version of the problem is then considered in Sec. 7.2.4, with an additional common reconstruction requirement in the sense of [103] and assuming degradedness of the side information sequences.

Figure 7.3: Cascade source coding problem with a side information "vending machine" at Node 2 and Node 3.

7.2.1 System Model

In this section, we describe the general system model for the cascade-broadcast source coding problem with a side information vending machine. We emphasize that, unlike the setup of Fig. 7.2, here the vending machine is at both Node 2 and Node 3. Moreover, we assume that an additional broadcast link of rate $R_b$, received by both Node 2 and Node 3, is available to enable Node 2 and Node 3 to take concerted actions in order to affect the side information sequences. We assume the action sequence taken by Node 2 and Node 3 to be a function only of the broadcast message $M_b$ sent over the broadcast link of rate $R_b$.

Figure 7.4: Cascade-broadcast source coding problem with a side information "vending machine" at Node 2.

The problem is defined by the pmfs $p_X(x)$, $p_{YZ|AX}(y,z|a,x)$ and the discrete alphabets $\mathcal{X}, \mathcal{Y}, \mathcal{Z}, \mathcal{A}, \hat{\mathcal{X}}_1, \hat{\mathcal{X}}_2$, as follows. The source sequence $X^n$ with $X^n \in \mathcal{X}^n$ is i.i.d. with pmf $p_X(x)$. Node 1 measures the sequence $X^n$ and encodes it into messages $M_1$ and $M_b$ of $nR_1$ and $nR_b$ bits, respectively, which are delivered to Node 2. Moreover, the message $M_b$ is broadcast also to Node 3. Node 2 estimates a sequence $\hat{X}_1^n \in \hat{\mathcal{X}}_1^n$ and Node 3 estimates a sequence $\hat{X}_2^n \in \hat{\mathcal{X}}_2^n$. To this end, Node 2 receives the messages $M_1$ and $M_b$ and, based only on the latter message, it selects an action sequence $A^n$, where $A^n \in \mathcal{A}^n$.
Node 2 maps the messages $M_1$ and $M_b$, received from Node 1, and the locally available sequence $Y^n$ into a message $M_2$ of $nR_2$ bits, which is delivered to Node 3. Node 3 receives the messages $M_2$ and $M_b$ and, based only on the latter message, it selects an action sequence $A^n$, where $A^n \in \mathcal{A}^n$. Given $A^n$ and $X^n$, the sequences $Y^n$ and $Z^n$ are distributed as $p(y^n, z^n|a^n, x^n) = \prod_{i=1}^{n} p_{YZ|A,X}(y_i, z_i|a_i, x_i)$. The cost of the action sequence is defined as in the previous section. A formal description of the operations at the encoder and the decoders follows.

Definition 8. An $(n, R_1, R_2, R_b, D_1, D_2, \Gamma, \epsilon)$ code for the set-up of Fig. 7.4 consists of two source encoders, namely
$$g_1: \mathcal{X}^n \to [1, 2^{nR_1}] \times [1, 2^{nR_b}], \qquad (7.16)$$
which maps the sequence $X^n$ into messages $M_1$ and $M_b$, respectively;
$$g_2: [1, 2^{nR_1}] \times [1, 2^{nR_b}] \times \mathcal{Y}^n \to [1, 2^{nR_2}], \qquad (7.17)$$
which maps the sequence $Y^n$ and the messages $(M_1, M_b)$ into a message $M_2$; an "action" function
$$\ell: [1, 2^{nR_b}] \to \mathcal{A}^n, \qquad (7.18)$$
which maps the message $M_b$ into an action sequence $A^n$; and two decoders, namely
$$\mathrm{h}_1: [1, 2^{nR_1}] \times [1, 2^{nR_b}] \times \mathcal{Y}^n \to \hat{\mathcal{X}}_1^n, \qquad (7.19)$$
which maps the messages $M_1$ and $M_b$ and the measured sequence $Y^n$ into the estimated sequence $\hat{X}_1^n$; and
$$\mathrm{h}_2: [1, 2^{nR_2}] \times [1, 2^{nR_b}] \times \mathcal{Z}^n \to \hat{\mathcal{X}}_2^n, \qquad (7.20)$$
which maps the messages $M_2$ and $M_b$ and the measured sequence $Z^n$ into the estimated sequence $\hat{X}_2^n$; such that the action cost constraint (7.6) and the distortion constraints (7.7) are satisfied.

Achievable rates $(R_1, R_2, R_b)$ and the rate-distortion-cost region are defined analogously to Definitions 6 and 7. The rate–distortion–cost region for the system model described above is open even for the case without a VM at Node 2 and Node 3 (see [120]). Hence, in the following subsections, we characterize the rate region for a few special cases. As in the previous section, subscripts are dropped from the pmfs for simplicity of notation.

7.2.2 Lossless Compression

In this section, a single-letter characterization of the rate-cost region $\mathcal{R}(0,0,\Gamma)$ is derived for the special case in which the distortion metrics are assumed to be Hamming and the distortion constraints are $D_1 = 0$ and $D_2 = 0$.

Proposition 13. The rate-cost region $\mathcal{R}(0,0,\Gamma)$ for the cascade-broadcast source coding problem illustrated in Fig. 7.3 with Hamming distortion metrics is given by the union of all rate triples $(R_1, R_2, R_b)$ that satisfy the conditions
$$R_b \ge I(X;A) \qquad (7.21a)$$
$$R_1 + R_b \ge I(X;A) + H(X|A,Y) \qquad (7.21b)$$
$$\text{and } R_2 + R_b \ge I(X;A) + H(X|A,Z), \qquad (7.21c)$$
where the mutual information terms are evaluated with respect to the joint pmf
$$p(x,y,z,a) = p(x,a)\, p(y,z|a,x), \qquad (7.22)$$
for some pmf $p(a|x)$ such that $\mathrm{E}[\Lambda(A)] \le \Gamma$.

Remark 38. If $R_1 = 0$ and $R_2 = 0$, the rate-cost region $\mathcal{R}(\Gamma)$ of Proposition 13 reduces to the one derived in [17, Theorem 1].

Remark 39. The rate region (7.21) also describes the rate-distortion region under the more restrictive requirement of lossless reconstruction in the sense of the probabilities of error $\Pr[X^n \ne \hat{X}_j^n] \le \epsilon$ for $j = 1,2$, as follows from standard arguments (see [35, Sec. 3.6.4]). A similar conclusion applies for Corollary 6.

The converse proof for the bound (7.21a) follows immediately since $A^n$ is selected only as a function of the message $M_b$. As for the other two bounds, namely (7.21b)-(7.21c), the proof of the converse can be established following cut-set arguments and using the point-to-point result of [91]. For achievability, we use the code structure proposed in [91] along with rate splitting. Specifically, Node 1 first maps the sequence $X^n$ into the action sequence $A^n$. This mapping requires a codebook of rate $I(X;A)$.
This rate has to be conveyed over the link of rate $R_b$ by the definition of the problem and is thus received by both Node 2 and Node 3. Given the so-obtained sequence $A^n$, communicating $X^n$ losslessly to Node 2 requires rate $H(X|A,Y)$. We split this rate into two rates, $r_{1b}$ and $r_{1d}$, such that the message corresponding to the first rate is carried over the broadcast link of rate $R_b$ and the second over the direct link of rate $R_1$. Note that Node 2 can thus recover the sequence $X^n$ losslessly. The rate $H(X|A,Z)$, which is required to send $X^n$ losslessly to Node 3, is then split into two parts, of rates $r_{2b}$ and $r_{2d}$. The message corresponding to the rate $r_{2b}$ is sent to Node 3 over the broadcast link of rate $R_b$ by Node 1, while the message of rate $r_{2d}$ is sent by Node 2 to Node 3. In this way, Node 1 and Node 2 cooperate to transmit $X^n$ to Node 3. As per the discussion above, the following inequalities have to be satisfied:
$$r_{2b} + r_{2d} + r_{1b} \ge H(X|A,Z),$$
$$r_{1b} + r_{1d} \ge H(X|A,Y),$$
$$R_1 \ge r_{1d}, \quad R_2 \ge r_{2d}, \quad \text{and } R_b \ge r_{1b} + r_{2b} + I(X;A).$$
Applying Fourier-Motzkin elimination [35, Appendix C] to the inequalities above, the inequalities in (7.21) are obtained.

7.2.3 Example: Switching-Dependent Side Information

We now consider the special case of the model in Fig. 7.3 in which the action $A \in \mathcal{A} = \{0,1,2,3\}$ acts as a switch that decides whether Node 2, Node 3, both, or neither node gets to observe a side information $W$. The side information $W$ is jointly distributed with the source $X$ according to the joint pmf $p(x,w)$. Moreover, defining e as an "erasure" symbol, the conditional pmf $p(y,z|x,a)$ is as follows: $Y = Z = \mathrm{e}$ for $A = 0$ (neither Node 2 nor Node 3 observes the side information $W$); $Y = W$ and $Z = \mathrm{e}$ for $A = 1$ (only Node 2 observes the side information $W$); $Y = \mathrm{e}$ and $Z = W$ for $A = 2$ (only Node 3 observes the side information $W$); and $Y = Z = W$ for $A = 3$ (both nodes observe the side information $W$).¹ We also select the cost function such that $\Lambda(j) = \lambda_j$ for $j \in \mathcal{A}$. When $R_1 = R_2 = 0$, this model reduces to the one studied in [17, Sec. III]. The following is a consequence of Proposition 13.

Corollary 7. For the setting of switching-dependent side information described above, the rate-cost region (7.21) is given by
$$R_b \ge I(X;A) \qquad (7.23a)$$
$$R_1 + R_b \ge H(X) - p_1 I(X;W|A=1) - p_3 I(X;W|A=3) \qquad (7.23b)$$
$$\text{and } R_2 + R_b \ge H(X) - p_2 I(X;W|A=2) - p_3 I(X;W|A=3), \qquad (7.23c)$$
where the mutual information terms are evaluated with respect to the joint pmf
$$p(x,y,z,a) = p(x,a)\, p(y,z|a,x), \qquad (7.24)$$
for some pmf $p(a|x)$ such that $\sum_{j=0}^{3} p_j \lambda_j \le \Gamma$, where we have denoted $p_j = \Pr[A=j]$ for $j \in \mathcal{A}$.

¹ This implies that $p(y,z|x,a) = \sum_w p(w|x)\,\delta(y-w)\,\delta(z-\mathrm{e})$ for $a = 1$, and similarly for the other values of $a$.

Proof. The region (7.23) is obtained from the rate-cost region (7.21) by noting that in (7.21b) we have $I(X;A) + H(X|A,Y) = H(X) - I(X;Y|A)$, and similarly for (7.21c).

In the following, we elaborate upon two specific instances of the switching-dependent side information example.

Figure 7.5: The side information S-channel $p(w|x)$ used in the example of Sec. 7.2.3.

Binary Symmetric Channel (BSC) between $X$ and $W$: Let $(X,W)$ be binary and symmetric, so that $p(x) = p(w) = 1/2$ for $x,w \in \{0,1\}$ and $\Pr[X \ne W] = \delta$ for $\delta \in [0,1]$. Moreover, let $\lambda_j = \infty$ for $j = 0,3$ and $\lambda_j = 1$ otherwise. We set the action cost constraint to $\Gamma = 1$. Note that, given this definition of $\Lambda(a)$, at each time Node 1 can choose whether to provide the side information $W$ to Node 2 or to Node 3, with no further constraints. By symmetry, it can be seen that we can set the pmf $p(a|x)$ with $x \in \{0,1\}$ and $a \in \{1,2\}$ to be a BSC with transition probability $q$.
This implies that $p_1 = \Pr[A=1] = q$ and $p_2 = \Pr[A=2] = 1-q$. We now evaluate the inequality (7.23a) as $R_b \ge 0$; inequality (7.23b) as $R_1 + R_b \ge 1 - p_1 I(X;W|A=1) = 1 - qH(\delta)$; and similarly inequality (7.23c) as $R_2 + R_b \ge 1 - (1-q)H(\delta)$. From these inequalities it can be seen that, in order to trace the boundary of the rate-cost region, one in general needs to consider all values of $q$ in the interval $[0,1]$. This corresponds to appropriate time-sharing between providing side information to Node 2 (for a fraction $q$ of the time) and Node 3 (for the remaining fraction of the time). Note that, as shown in [17, Sec. III], if $R_1 = R_2 = 0$, it is optimal to set $q = \frac{1}{2}$, and thus equally share the side information between Node 2 and Node 3, in order to minimize the rate $R_b$. The difference here is due to the fact that in the cascade model at hand, it can be advantageous to provide more side information to one of the two encoders, depending on the desired trade-off between the rates $R_1$ and $R_2$ in the achievable rate-cost region.

S-Channel between $X$ and $W$: We now consider the special case of Corollary 7 in which $(X,W)$ are jointly distributed so that $p(x) = 1/2$ and $p(w|x)$ is the S-channel characterized by $p(0|0) = 1-\delta$ and $p(1|1) = 1$ (see Fig. 7.5). Moreover, we let $\lambda_1 = 1$, $\lambda_2 = 0$, $\lambda_0 = \lambda_3 = \infty$ as above, while the cost constraint is set to $\Gamma \le 1$. As discussed in [17, Sec. III] for this example with $R_1 = R_2 = 0$, providing side information to Node 2 is more costly and should thus be done efficiently. In particular, given Fig. 7.5, it is expected that biasing the choice $A = 2$ when $X = 1$ (i.e., providing side information to Node 2 mainly when $X = 0$) may lead to some gain (see [17]). Here we show that in the cascade model this gain depends on the relative importance of the rates $R_1$ and $R_2$. To this end, we set $p(a|x)$ as $p(1|0) = \alpha$ and $p(1|1) = \beta$ for $\alpha, \beta \in [0,1]$. We now evaluate the inequality (7.23a) as $R_b \ge 0$; inequality (7.23b) as
$$R_1 + R_b \ge 1 - \frac{\alpha+\beta}{2}\left[H\!\left(\frac{(1-\delta)\alpha}{\alpha+\beta}\right) - H(1-\delta)\,\frac{\alpha}{\alpha+\beta}\right]; \qquad (7.25)$$
and inequality (7.23c) as
$$R_2 + R_b \ge 1 - \frac{2-\alpha-\beta}{2}\left[H\!\left(\frac{(1-\delta)(1-\alpha)}{2-\alpha-\beta}\right) - H(1-\delta)\,\frac{1-\alpha}{2-\alpha-\beta}\right]. \qquad (7.26)$$
We now evaluate the minimum weighted sum-rate $R_1 + \eta R_2$ obtained from (7.25)-(7.26) for $R_b = 0.4$, $\delta = 0.6$, and both $\Gamma = 0.1$ and $\Gamma = 0.9$. The parameter $\eta \ge 0$ determines the relative importance of the two rates. For comparison, we also compute the performance attainable by imposing that the action $A$ be selected independently of $X$, which we refer to as the greedy approach [91]. Fig. 7.6 plots the difference between the two weighted sum-rates $R_1 + \eta R_2$. It can be seen that, as $\eta$ decreases and thus minimizing the rate $R_1$ to Node 2 becomes more important, one can achieve larger gains by choosing the action $A$ to be dependent on $X$. Moreover, this gain is more significant when the action cost budget $\Gamma$ allows Node 2 to collect a larger fraction of the side information samples.

Figure 7.6: Difference between the weighted sum-rate $R_1 + \eta R_2$ obtained with the greedy and with the optimal strategy as per Corollary 7 ($R_b = 0.4$, $\delta = 0.6$).
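The comparison in Fig. 7.6 can be reproduced qualitatively with a simple grid search over $(\alpha, \beta)$ subject to the cost constraint $\mathrm{E}[\Lambda(A)] = (\alpha+\beta)/2 \le \Gamma$, the greedy approach corresponding to $\alpha = \beta$. The Python sketch below is our illustrative reimplementation of the bounds (7.25)-(7.26), not the code used to generate the figure.

```python
import numpy as np

def Hb(p):
    """Binary entropy in bits, with H(0) = H(1) = 0."""
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

def bounds(alpha, beta, delta):
    """Right-hand sides of (7.25) and (7.26) for the S-channel example."""
    s1, s2 = max(alpha + beta, 1e-12), max(2 - alpha - beta, 1e-12)
    b1 = 1 - (s1 / 2) * (Hb((1 - delta) * alpha / s1) - Hb(1 - delta) * alpha / s1)
    b2 = 1 - (s2 / 2) * (Hb((1 - delta) * (1 - alpha) / s2)
                         - Hb(1 - delta) * (1 - alpha) / s2)
    return b1, b2

def min_weighted_sum_rate(eta, gamma, rb=0.4, delta=0.6, greedy=False, grid=101):
    best = np.inf
    for alpha in np.linspace(0, 1, grid):
        betas = [alpha] if greedy else np.linspace(0, 1, grid)
        for beta in betas:
            if (alpha + beta) / 2 > gamma:   # action cost constraint
                continue
            b1, b2 = bounds(alpha, beta, delta)
            r1, r2 = max(b1 - rb, 0.0), max(b2 - rb, 0.0)
            best = min(best, r1 + eta * r2)
    return best

for eta in (0.2, 1.0, 5.0):
    gap = (min_weighted_sum_rate(eta, gamma=0.9, greedy=True)
           - min_weighted_sum_rate(eta, gamma=0.9, greedy=False))
    print(f"eta = {eta}: greedy minus optimized weighted sum-rate = {gap:.4f}")
```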
7.2.4 Lossy Compression with Common Reconstruction Constraint

In this section, we turn to the problem of characterizing the rate-distortion-cost region $\mathcal{R}(D_1, D_2, \Gamma)$ for $D_1, D_2 > 0$. In order to make the problem tractable², we impose the degradedness condition $X - (A,Y) - Z$ (as in [17]), which implies the factorization
$$p(y,z|a,x) = p(y|a,x)\, p(z|y,a); \qquad (7.27)$$
and we require that the reconstructions at Nodes 2 and 3 be reproducible by Node 1. As discussed above, this latter condition is referred to as the CR constraint [103]. Note that this constraint is automatically satisfied in the lossless case. To be more specific, an $(n, R_1, R_2, R_b, D_1, D_2, \Gamma, \epsilon)$ code is defined per Definition 8, with the difference that there are two additional functions for the encoder, namely
$$\psi_1: \mathcal{X}^n \to \hat{\mathcal{X}}_1^n \qquad (7.28a)$$
$$\text{and } \psi_2: \mathcal{X}^n \to \hat{\mathcal{X}}_2^n, \qquad (7.28b)$$
which map the source sequence into the estimated sequences at the encoder, namely $\psi_1(X^n)$ and $\psi_2(X^n)$, respectively; and the CR requirements are imposed, i.e.,
$$\Pr[\psi_1(X^n) \ne \mathrm{h}_1(M_1, M_b, Y^n)] \le \epsilon \qquad (7.29a)$$
$$\text{and } \Pr[\psi_2(X^n) \ne \mathrm{h}_2(M_2, M_b, Z^n)] \le \epsilon, \qquad (7.29b)$$
so that the encoder's estimates $\psi_1(\cdot)$ and $\psi_2(\cdot)$ are equal to the decoders' estimates (cf. (7.19)-(7.20)) with high probability.

² As noted earlier, the problem is open even in the case with no VM [120].

Proposition 14. The rate-distortion region $\mathcal{R}(D_1, D_2, \Gamma)$ for the cascade-broadcast source coding problem illustrated in Fig. 7.3 under the CR constraint and the degradedness condition (7.27) is given by the union of all rate triples $(R_1, R_2, R_b)$ that satisfy the conditions
$$R_b \ge I(X;A) \qquad (7.30a)$$
$$R_1 + R_b \ge I(X;A) + I(X; \hat{X}_1, \hat{X}_2|A,Y) \qquad (7.30b)$$
$$R_2 + R_b \ge I(X;A) + I(X; \hat{X}_2|A,Z) \qquad (7.30c)$$
$$\text{and } R_1 + R_2 + R_b \ge I(X;A) + I(X; \hat{X}_2|A,Z) + I(X; \hat{X}_1|A,Y,\hat{X}_2), \qquad (7.30d)$$
where the mutual information terms are evaluated with respect to the joint pmf
$$p(x,y,z,a,\hat{x}_1,\hat{x}_2) = p(x)\, p(a|x)\, p(y|x,a)\, p(z|a,y)\, p(\hat{x}_1,\hat{x}_2|x,a), \qquad (7.31)$$
such that the inequalities
$$\mathrm{E}[d_j(X, \hat{X}_j)] \le D_j \quad \text{for } j = 1,2, \qquad (7.32a)$$
$$\text{and } \mathrm{E}[\Lambda(A)] \le \Gamma \qquad (7.32b)$$
are satisfied.

Remark 40. If either $R_1 = 0$ or $R_b = 0$ and the side information $Y$ is independent of the action $A$ given $X$, i.e., $p(y|a,x) = p(y|x)$, the rate-distortion region $\mathcal{R}(D_1, D_2, \Gamma)$ of Proposition 14 reduces to the one derived in [3, Proposition 10].

The achievability follows similarly to Proposition 13. Specifically, Node 1 first maps the sequence $X^n$ into the action sequence $A^n$. This mapping requires a codebook of rate $I(X;A)$. This rate has to be conveyed over the link of rate $R_b$ by the definition of the problem and is thus received by both Node 2 and Node 3. The source sequence $X^n$ is mapped into the estimate $\hat{X}_2^n$ for Node 3 using a codebook of rate $I(X; \hat{X}_2|A)$ for each sequence $A^n$. Communicating $\hat{X}_2^n$ to Node 2 requires rate $I(X; \hat{X}_2|A,Y)$ by the Wyner-Ziv theorem. We split this rate into two rates, $r_{2b}$ and $r_{2d}$, such that the message corresponding to the first rate is carried over the broadcast link of rate $R_b$ and the second over the direct link of rate $R_1$. Note that Node 2 can thus recover the sequence $\hat{X}_2^n$. Communicating $\hat{X}_2^n$ to Node 3 requires rate $I(X; \hat{X}_2|A,Z)$ by the Wyner-Ziv theorem. We split this rate into two rates, $r_{0b}$ and $r_{0d}$. The message corresponding to the rate $r_{0b}$ is sent to Node 3 over the broadcast link of rate $R_b$ by Node 1, while the message of rate $r_{0d}$ is sent by Node 2 to Node 3. In this way, Node 1 and Node 2 cooperate to transmit $\hat{X}_2^n$ to Node 3. Finally, the source sequence $X^n$ is mapped by Node 1 into the estimate $\hat{X}_1^n$ for Node 2 using a codebook of rate $I(X; \hat{X}_1|A, \hat{X}_2)$ for each pair of sequences $(A^n, \hat{X}_2^n)$. Using Wyner-Ziv coding, this rate is reduced to $I(X; \hat{X}_1|A,Y,\hat{X}_2)$ and split into two rates, $r_{1b}$ and $r_{1d}$, which are sent through the links of rates $R_b$ and $R_1$, respectively.
As per the discussion above, the following inequalities have to be satisfied:
$$r_{0b} + r_{0d} + r_{2b} \ge I(X; \hat{X}_2|A,Z),$$
$$r_{2b} + r_{2d} \ge I(X; \hat{X}_2|A,Y),$$
$$r_{1b} + r_{1d} \ge I(X; \hat{X}_1|A,Y,\hat{X}_2),$$
$$R_1 \ge r_{1d} + r_{2d}, \quad R_2 \ge r_{0d}, \quad \text{and } R_b \ge r_{1b} + r_{2b} + r_{0b} + I(X;A).$$
Applying Fourier-Motzkin elimination [35, Appendix C] to the inequalities above, the inequalities in (7.30) are obtained. We now provide the proof of the converse.

7.2.4.1 Proof of the converse

Here, we prove the converse parts of Proposition 14 and Proposition 16. We start by proving Proposition 14. The proof of Proposition 16 will follow by setting $Z = \emptyset$ and noting that, in the proof below, the action $A_i$ can be made a function of $Y^{i-1}$, in addition to being a function of $M_b$, without modifying any steps of the proof. By the CR requirements (7.29), we first observe that for any $(n, R_1, R_2, R_b, D_1+\epsilon, D_2+\epsilon, \Gamma+\epsilon)$ code, we have the Fano inequalities
$$H(\psi_1(X^n)|\mathrm{h}_1(M_1, M_b, Y^n)) \le n\delta(\epsilon) \qquad (7.33a)$$
$$\text{and } H(\psi_2(X^n)|\mathrm{h}_2(M_2, M_b, Z^n)) \le n\delta(\epsilon), \qquad (7.33b)$$
where $\delta(\epsilon)$ denotes any function such that $\delta(\epsilon) \to 0$ as $\epsilon \to 0$. Next, we have
$$
\begin{aligned}
nR_b &\ge H(M_b) \overset{(a)}{=} I(M_b; X^n, Y^n) = H(X^n, Y^n) - H(X^n, Y^n|M_b)\\
&\overset{(a)}{=} H(X^n) + H(Y^n|X^n, M_b) - H(X^n, Y^n|M_b)\\
&\overset{(b)}{=} \sum_{i=1}^{n} H(X_i) + H(Y_i|Y^{i-1}, X^n, M_b, A_i) - H(X_i, Y_i|X^{i-1}, Y^{i-1}, M_b, A_i)\\
&= \sum_{i=1}^{n} H(X_i) + H(Y_i|Y^{i-1}, X^n, M_b, A_i) - H(X_i|X^{i-1}, Y^{i-1}, M_b, A_i) - H(Y_i|X^i, Y^{i-1}, M_b, A_i)\\
&\overset{(c)}{=} \sum_{i=1}^{n} H(X_i) + H(Y_i|X_i, A_i) - H(X_i|X^{i-1}, Y^{i-1}, M_b, A_i) - H(Y_i|X_i, A_i)\\
&\overset{(d)}{\ge} \sum_{i=1}^{n} H(X_i) - H(X_i|A_i), \qquad (7.34)
\end{aligned}
$$
where $(a)$ follows since $M_b$ is a function of $X^n$; $(b)$ follows since $A_i$ is a function of $M_b$ and since $X^n$ is i.i.d.; $(c)$ follows since $(Y^{i-1}, X^{n\setminus i}, M_b) - (A_i, X_i) - Y_i$ forms a Markov chain by the problem definition; and $(d)$ follows since conditioning reduces entropy. In the following, for simplicity of notation, we write $\mathrm{h}_1, \mathrm{h}_2, \psi_1, \psi_2$ for the values of the corresponding functions defined in Sec. 7.2.4.
Next, we can also write
$$
\begin{aligned}
n(R_1+R_b) &\ge H(M_1, M_b) \overset{(a)}{=} I(M_1, M_b; X^n, Y^n, Z^n)\\
&= H(X^n, Y^n, Z^n) - H(X^n, Y^n, Z^n|M_1, M_b)\\
&= H(X^n) + H(Y^n, Z^n|X^n) - H(Y^n, Z^n|M_1, M_b) - H(X^n|Y^n, Z^n, M_1, M_b)\\
&\overset{(b)}{=} H(X^n) + H(Y^n, Z^n|X^n, M_b) - H(Y^n|M_1, M_b)\\
&\quad - H(Z^n|M_1, M_b, Y^n, A^n) - H(X^n|Y^n, Z^n, M_1, M_b, M_2, A^n)\\
&\overset{(b,c)}{=} \sum_{i=1}^{n} H(X_i) + H(Y_i, Z_i|X_i, A_i) - H(Y_i|Y^{i-1}, M_1, M_b, A_i)\\
&\quad - H(Z_i|Z^{i-1}, M_1, M_b, Y^n, A^n) - H(X_i|X^{i-1}, Y^n, Z^n, M_1, M_b, A^n, M_2, \mathrm{h}_1, \mathrm{h}_2)\\
&\overset{(d)}{\ge} \sum_{i=1}^{n} H(X_i) + H(Y_i|X_i, A_i) + H(Z_i|Y_i, A_i) - H(Y_i|A_i) - H(Z_i|Y_i, A_i) - H(X_i|Y_i, A_i, \mathrm{h}_1, \mathrm{h}_2)\\
&= \sum_{i=1}^{n} I(X_i; Y_i, A_i, \mathrm{h}_1, \mathrm{h}_2) - I(Y_i; X_i|A_i)\\
&= \sum_{i=1}^{n} I(X_i; Y_i, A_i, \mathrm{h}_1, \mathrm{h}_2, \psi_1, \psi_2) - I(X_i; \psi_1, \psi_2|Y_i, A_i, \mathrm{h}_1, \mathrm{h}_2) - I(Y_i; X_i|A_i)\\
&\overset{(e)}{\ge} \sum_{i=1}^{n} I(X_i; Y_i, A_i, \psi_1, \psi_2) - H(\psi_1, \psi_2|Y_i, A_i, \mathrm{h}_1, \mathrm{h}_2)\\
&\quad + H(\psi_1, \psi_2|Y_i, A_i, \mathrm{h}_1, \mathrm{h}_2, X_i) - I(Y_i; X_i|A_i)\\
&\overset{(f)}{\ge} \sum_{i=1}^{n} I(X_i; Y_i, A_i, \psi_1, \psi_2) - I(Y_i; X_i|A_i) - n\delta(\epsilon)\\
&= \sum_{i=1}^{n} I(X_i; A_i) + I(X_i; \psi_1, \psi_2|Y_i, A_i) - n\delta(\epsilon), \qquad (7.35)
\end{aligned}
$$
where $(a)$ follows because $(M_1, M_b)$ is a function of $X^n$; $(b)$ follows because $M_b$ is a function of $X^n$, $A^n$ is a function of $M_b$, and $M_2$ is a function of $(M_1, M_b, Y^n)$; $(c)$ follows since $H(Y^n, Z^n|X^n, M_b) = \sum_{i=1}^{n} H(Y_i, Z_i|Y^{i-1}, Z^{i-1}, X^n, M_b, A_i) = \sum_{i=1}^{n} H(Y_i, Z_i|X_i, A_i)$, since $\mathrm{h}_1$ and $\mathrm{h}_2$ are functions of $(M_1, M_b, Y^n)$ and $(M_2, M_b, Z^n)$, respectively, and because $(Y_i, Z_i) - (X_i, A_i) - (X^{n\setminus i}, Y^{i-1}, Z^{i-1}, M_b)$ forms a Markov chain; $(d)$ follows since conditioning reduces entropy, since the side information VM satisfies $p(y^n, z^n|a^n, x^n) = \prod_{i=1}^{n} p_{Y|A,X}(y_i|a_i, x_i)\, p_{Z|A,Y}(z_i|a_i, y_i)$ from (7.27), and because $Z_i - (Y_i, A_i) - (Y^{n\setminus i}, Z^{i-1}, M_1, M_b)$ forms a Markov chain; $(e)$ follows by the chain rule for mutual information and the fact that mutual information is non-negative; and $(f)$ follows by the Fano inequality (7.33) and because entropy is non-negative.
We can also write
$$
\begin{aligned}
n(R_2+R_b) &\ge H(M_2, M_b) \overset{(a)}{=} I(M_2, M_b; X^n, Y^n, Z^n)\\
&= H(X^n, Y^n, Z^n) - H(X^n, Y^n, Z^n|M_2, M_b)\\
&\overset{(a)}{=} H(X^n) + H(Y^n, Z^n|X^n, M_b) - H(Z^n|M_2, M_b) - H(X^n, Y^n|Z^n, M_2, M_b)\\
&\overset{(b)}{=} \sum_{i=1}^{n} H(X_i) + H(Y_i, Z_i|Y^{i-1}, Z^{i-1}, X^n, M_b, A_i) - H(Z_i|Z^{i-1}, M_2, M_b, A_i)\\
&\quad - H(X_i, Y_i|X^{i-1}, Y^{i-1}, M_2, M_b, Z^n, A_i)\\
&= \sum_{i=1}^{n} H(X_i, Y_i) - H(Y_i|X_i) + H(Y_i, Z_i|Y^{i-1}, Z^{i-1}, X^n, M_b, A_i)\\
&\quad - H(Z_i|Z^{i-1}, M_2, M_b, A_i) - H(X_i, Y_i|X^{i-1}, Y^{i-1}, M_2, M_b, Z^n, A_i)\\
&\overset{(c)}{=} \sum_{i=1}^{n} H(X_i, Y_i) - H(Y_i|X_i) + H(Y_i|X_i, A_i) + H(Z_i|A_i, Y_i, X_i)\\
&\quad - H(Z_i|Z^{i-1}, M_2, M_b, A_i) - H(X_i, Y_i|X^{i-1}, Y^{i-1}, M_2, M_b, Z^n, A_i)\\
&\overset{(d)}{=} \sum_{i=1}^{n} H(X_i, Y_i) - I(Y_i; A_i|X_i) + H(Z_i|A_i, Y_i, X_i) - H(Z_i|Z^{i-1}, M_2, M_b, A_i)\\
&\quad - H(X_i, Y_i|X^{i-1}, Y^{i-1}, M_2, M_b, \mathrm{h}_2, Z^n, A_i)\\
&\overset{(e)}{\ge} \sum_{i=1}^{n} H(X_i, Y_i) + I(X_i; A_i) - I(Y_i, X_i; A_i) + H(Z_i|A_i, Y_i, X_i) - H(Z_i|A_i) - H(X_i, Y_i|\mathrm{h}_2, A_i, Z_i)\\
&= \sum_{i=1}^{n} I(X_i, Y_i; \mathrm{h}_2, A_i, Z_i, \psi_{2i}) - I(X_i, Y_i; \psi_{2i}|\mathrm{h}_2, A_i, Z_i) + I(X_i; A_i)\\
&\quad - I(Y_i, X_i; A_i) - I(X_i, Y_i; Z_i|A_i)\\
&\ge \sum_{i=1}^{n} I(X_i, Y_i; A_i, Z_i, \psi_{2i}) - H(\psi_{2i}|\mathrm{h}_2, A_i, Z_i) + H(\psi_{2i}|\mathrm{h}_2, A_i, X_i, Y_i, Z_i)\\
&\quad + I(X_i; A_i) - I(X_i, Y_i; Z_i, A_i)\\
&\overset{(f)}{\ge} \sum_{i=1}^{n} I(X_i; A_i) + I(X_i, Y_i; \psi_{2i}|A_i, Z_i) - n\delta(\epsilon), \qquad (7.36)
\end{aligned}
$$
where $(a)$ follows since $M_b$ is a function of $X^n$ and because $M_2$ is a function of $(M_1, M_b, Y^n)$ and thus of $(X^n, Y^n)$; $(b)$ follows since $A_i$ is a function of $M_b$ and since $X^n$ is i.i.d.; $(c)$ follows since $(Y_i, Z_i) - (X_i, A_i) - (X^{n\setminus i}, Y^{i-1}, Z^{i-1}, M_b)$ forms a Markov chain and since $p(y^n, z^n|a^n, x^n) = \prod_{i=1}^{n} p_{Y|A,X}(y_i|a_i, x_i)\, p_{Z|A,Y}(z_i|a_i, y_i)$; $(d)$ follows since $\mathrm{h}_2$ is a function of $(M_2, M_b, Z^n)$; $(e)$ follows since conditioning reduces entropy; and $(f)$ follows since entropy is non-negative and by using the Fano inequality.
Moreover, with the definition $M = (M_1, M_2, M_b)$, we have the chain of inequalities
$$
\begin{aligned}
n(R_1+R_2+R_b) &\ge H(M) \overset{(a)}{=} I(M; X^n, Y^n, Z^n) = H(X^n, Y^n, Z^n) - H(X^n, Y^n, Z^n|M)\\
&\overset{(a)}{=} H(X^n) + H(Y^n, Z^n|X^n, M_b) - H(X^n, Y^n, Z^n|M)\\
&= I(X^n; A^n) + H(Y^n, Z^n|X^n, M_b) - H(Y^n, Z^n|M) - H(X^n|Y^n, Z^n, M) + H(X^n|A^n)\\
&= I(X^n; A^n) + H(Y^n, Z^n|X^n, M_b) - H(Y^n, Z^n|M) + I(X^n; Y^n, Z^n, M|A^n)\\
&= I(X^n; A^n) + I(M; X^n|Y^n, A^n, Z^n) + H(Y^n, Z^n|X^n, M_b) - H(Y^n, Z^n|M) + I(X^n; Y^n, Z^n|A^n)\\
&\overset{(b)}{=} H(X^n) - H(X^n|A^n) + H(X^n|Y^n, A^n, Z^n) - H(X^n|Y^n, A^n, Z^n, M)\\
&\quad - H(Y^n, Z^n|M) + H(Y^n, Z^n|A^n)\\
&= H(X^n) - H(X^n|A^n) + H(X^n, Y^n, Z^n|A^n) - H(X^n|Y^n, A^n, Z^n, M) - H(Y^n, Z^n|M)\\
&= H(X^n) + H(Y^n, Z^n|A^n, X^n) - H(X^n|Y^n, A^n, Z^n, M) - H(Y^n, Z^n|M)\\
&\overset{(c)}{=} \sum_{i=1}^{n} H(X_i) + H(Y_i|A_i, X_i) + H(Z_i|A_i, Y_i) - H(X_i|X^{i-1}, Y^n, A^n, Z^n, M)\\
&\quad - H(Z_i|Z^{i-1}, M, A_i) - H(Y_i|Y^{i-1}, Z^n, M, A_i)\\
&\overset{(d)}{=} \sum_{i=1}^{n} H(X_i) + H(Y_i|A_i, X_i) + H(Z_i|A_i, Y_i) - H(X_i|X^{i-1}, Y^n, A^n, Z^n, M, \mathrm{h}_1, \mathrm{h}_2)\\
&\quad - H(Z_i|Z^{i-1}, M, A_i) - H(Y_i|Y^{i-1}, Z^n, M, A_i, \mathrm{h}_2)\\
&\ge \sum_{i=1}^{n} H(X_i) + H(Y_i|A_i, X_i) + H(Z_i|A_i, Y_i) - H(X_i|Y_i, A_i, \mathrm{h}_1, \mathrm{h}_2) - H(Z_i|A_i) - H(Y_i|Z_i, A_i, \mathrm{h}_2)\\
&\overset{(e)}{\ge} \sum_{i=1}^{n} I(X_i; A_i, Y_i, \psi_1, \psi_2) + H(Y_i|A_i, X_i) + H(Z_i|A_i, Y_i) - H(Z_i|A_i) - H(Y_i|Z_i, A_i, \psi_2) - n\delta(\epsilon), \qquad (7.37)
\end{aligned}
$$
where $(a)$ follows since $(M_1, M_b)$ is a function of $X^n$ and $M_2$ is a function of $(M_1, M_b, Y^n)$; $(b)$ follows since $H(Y^n, Z^n|X^n, M_b) = \sum_{i=1}^{n} H(Y_i, Z_i|Y^{i-1}, Z^{i-1}, X^n, M_b, A_i) = \sum_{i=1}^{n} H(Y_i, Z_i|X_i, A_i) = H(Y^n, Z^n|X^n, A^n)$; $(c)$ follows since $A_i$ is a function of $M_b$; $(d)$ follows since $\mathrm{h}_1$ and $\mathrm{h}_2$ are functions of $(M, Y^n)$ and $(M, Z^n)$, respectively; and $(e)$ follows since entropy is non-negative and by Fano's inequality. Next, from (7.37) we have
$$
\begin{aligned}
n(R_1+R_2+R_b) &\ge \sum_{i=1}^{n} I(X_i; A_i, Y_i, \psi_1, \psi_2) + H(Y_i|A_i, X_i) + H(Z_i|A_i, Y_i) - H(Z_i|A_i)\\
&\quad - H(Y_i, Z_i|A_i, \psi_2) + H(Z_i|A_i, \psi_2) - n\delta(\epsilon)\\
&= \sum_{i=1}^{n} I(X_i; A_i, Y_i, \psi_1, \psi_2) + H(Y_i|A_i, X_i) - H(Z_i|A_i) - H(Y_i|A_i, \psi_2) + H(Z_i|A_i, \psi_2) - n\delta(\epsilon)\\
&= \sum_{i=1}^{n} I(X_i; A_i, Y_i, \psi_1, \psi_2) - I(X_i; Y_i|A_i, \psi_2) - I(Z_i; \psi_2|A_i) - n\delta(\epsilon)\\
&\overset{(a)}{=} \sum_{i=1}^{n} I(X_i; A_i, Y_i, \psi_1, \psi_2) - I(X_i; Y_i|A_i, \psi_2) - I(Y_i; A_i|X_i) - I(Z_i; Y_i|A_i)\\
&\quad + I(Y_i; A_i, \psi_2|X_i) + I(Z_i; Y_i|\psi_2, A_i) - n\delta(\epsilon)\\
&\overset{(b)}{=} \sum_{i=1}^{n} I(X_i; A_i, Y_i, \psi_1, \psi_2) - I(X_i; Y_i|A_i, \psi_2) + I(X_i; A_i) - I(Y_i, X_i; A_i) - I(Z_i; X_i, Y_i|A_i)\\
&\quad + I(X_i, Y_i; A_i, \psi_2) + I(Z_i; X_i, Y_i|\psi_2, A_i) - I(X_i; A_i, \psi_2) - n\delta(\epsilon)\\
&= \sum_{i=1}^{n} I(X_i; A_i) + I(X_i; A_i, Y_i, \psi_1, \psi_2) + I(X_i, Y_i; A_i, \psi_2, Z_i) - I(A_i, Z_i; X_i, Y_i)\\
&\quad - I(X_i; Y_i, A_i, \psi_2) - n\delta(\epsilon)\\
&= \sum_{i=1}^{n} I(X_i; A_i) + I(X_i; A_i, Y_i, \psi_1, \psi_2) + I(X_i, Y_i; \psi_2|A_i, Z_i) - I(X_i; Y_i, A_i, \psi_2) - n\delta(\epsilon)\\
&= \sum_{i=1}^{n} I(X_i; A_i) + I(X_i, Y_i; \psi_2|A_i, Z_i) + I(X_i; \psi_1|A_i, Y_i, \psi_2) - n\delta(\epsilon), \qquad (7.38)
\end{aligned}
$$
where $(a)$ is true since
$$
\begin{aligned}
&I(Y_i; A_i|X_i) + I(Z_i; Y_i|A_i) - I(Y_i; A_i, \psi_2|X_i) - I(Z_i; Y_i|\psi_2, A_i)\\
&= H(Y_i|X_i) - H(Y_i|X_i, A_i) + H(Z_i|A_i) - H(Z_i|A_i, Y_i) - H(Y_i|X_i) + H(Y_i|X_i, A_i)\\
&\quad - H(Z_i|\psi_2, A_i) + H(Z_i|A_i, Y_i)\\
&= H(Z_i|A_i) - H(Z_i|\psi_2, A_i);
\end{aligned}
$$
and $(b)$ follows because $I(Z_i; X_i, Y_i|A_i) = I(Z_i; Y_i|A_i)$ and $I(Z_i; X_i, Y_i|A_i, \psi_2) = I(Z_i; Y_i|A_i, \psi_2)$.

Next, define $\hat{X}_{ji} = \psi_{ji}(X^n)$ for $j = 1,2$ and $i = 1,2,\ldots,n$, and let $Q$ be a random variable uniformly distributed over $[1,n]$ and independent of all the other random variables, with $X \triangleq X_Q$, $Y \triangleq Y_Q$, $A \triangleq A_Q$. From (7.34) we have
$$R_b \ge H(X|Q) - H(X|A,Q) \overset{(a)}{\ge} H(X) - H(X|A) = I(X;A),$$
where $(a)$ follows since $X^n$ is i.i.d. and since conditioning decreases entropy.
Next, from (7.35), we have
$$R_1 + R_b \ge I(X;A|Q) + I(X; \hat{X}_1, \hat{X}_2|Y,A,Q) \overset{(a)}{\ge} I(X;A) + I(X; \hat{X}_1, \hat{X}_2|Y,A),$$
where $(a)$ follows since $X^n$ is i.i.d., since conditioning decreases entropy, and by the problem definition. From (7.36), we also have
$$
\begin{aligned}
R_2 + R_b &\ge I(X;A|Q) + I(X,Y; \hat{X}_2|A,Z,Q)\\
&\overset{(a)}{\ge} I(X;A) + H(X,Y|A,Z,Q) - H(X,Y|A,Z,\hat{X}_2)\\
&\overset{(b)}{=} I(X;A) + H(Y|A,Z) + H(X|A,Y,Z) - H(X,Y|A,Z,\hat{X}_2)\\
&= I(X;A) + I(X,Y; \hat{X}_2|A,Z)\\
&\ge I(X;A) + I(X; \hat{X}_2|A,Z),
\end{aligned}
$$
where $(a)$ follows since $X^n$ is i.i.d. and since conditioning reduces entropy, and $(b)$ follows by the problem definition. Finally, from (7.38), we have
$$
\begin{aligned}
R_1 + R_2 + R_b &\ge I(X;A|Q) + I(X,Y; \hat{X}_2|A,Z,Q) + I(X; \hat{X}_1|A,Y,\hat{X}_2,Q)\\
&\overset{(a)}{\ge} I(X;A) + H(X,Y|A,Z,Q) - H(X,Y|A,Z,\hat{X}_2) + I(X; \hat{X}_1|A,Y,\hat{X}_2)\\
&\overset{(b)}{=} I(X;A) + H(Y|A,Z) + H(X|A,Y,Z) - H(X,Y|A,Z,\hat{X}_2) + I(X; \hat{X}_1|A,Y,\hat{X}_2)\\
&= I(X;A) + I(X,Y; \hat{X}_2|A,Z) + I(X; \hat{X}_1|A,Y,\hat{X}_2)\\
&\ge I(X;A) + I(X; \hat{X}_2|A,Z) + I(X; \hat{X}_1|A,Y,\hat{X}_2), \qquad (7.39)
\end{aligned}
$$
where $(a)$ follows since $X^n$ is i.i.d., since conditioning decreases entropy, and by the problem definition; and $(b)$ follows by the problem definition. From the cost constraint (7.6), we have
$$\Gamma + \epsilon \ge \frac{1}{n}\sum_{i=1}^{n} \mathrm{E}[\Lambda(A_i)] = \mathrm{E}[\Lambda(A)]. \qquad (7.40)$$
Moreover, let $\mathcal{B}$ be the event $\mathcal{B} = \{(\psi_1(X^n) \ne \mathrm{h}_1(M_1, M_b, Y^n)) \wedge (\psi_2(X^n) \ne \mathrm{h}_2(M_2, M_b, Z^n))\}$. Using the CR requirement (7.29), we have $\Pr(\mathcal{B}) \le \epsilon$. For $j = 1,2$, we have
$$
\begin{aligned}
\mathrm{E}\big[d_j(X, \hat{X}_j)\big] &= \frac{1}{n}\sum_{i=1}^{n} \mathrm{E}\big[d_j(X_i, \hat{X}_{ji})\big]\\
&= \frac{1}{n}\sum_{i=1}^{n} \mathrm{E}\big[d_j(X_i, \hat{X}_{ji})\,\big|\,\mathcal{B}\big]\Pr(\mathcal{B}) + \frac{1}{n}\sum_{i=1}^{n} \mathrm{E}\big[d_j(X_i, \hat{X}_{ji})\,\big|\,\mathcal{B}^c\big]\Pr(\mathcal{B}^c)\\
&\overset{(a)}{\le} \frac{1}{n}\sum_{i=1}^{n} \mathrm{E}\big[d_j(X_i, \hat{X}_{ji})\,\big|\,\mathcal{B}^c\big]\Pr(\mathcal{B}^c) + \epsilon D_{max}\\
&\overset{(b)}{\le} \frac{1}{n}\sum_{i=1}^{n} \mathrm{E}[d_j(X_i, \mathrm{h}_{ji})] + \epsilon D_{max}\\
&\overset{(c)}{\le} D_j + \epsilon D_{max}, \qquad (7.41)
\end{aligned}
$$
where $(a)$ follows using the fact that $\Pr(\mathcal{B}) \le \epsilon$ and that the distortion is upper bounded by $D_{max}$; $(b)$ follows by the definitions of $\hat{X}_{ji}$ and $\mathcal{B}$; and $(c)$ follows by (7.7).

7.3 Adaptive Actions

In this section, we assume that the actions taken by the nodes are not only a function of the message $M_2$, for the model of Fig. 7.2, or of $M_b$, for the models of Fig. 7.3 and Fig. 7.4, but also a function of the past observed side information samples. Following [22], we refer to this case as the one with adaptive actions. Note that for the cascade-broadcast problem we consider the model in Fig. 7.4, which differs from the one in Fig. 7.3 considered thus far in that the side information $Z$ is not available at Node 3. At this time, it appears to be problematic to define adaptive actions in the presence of two nodes that observe different side information sequences. For the cascade model in Fig. 7.2, an $(n, R_1, R_2, D_1, D_2, \Gamma)$ code is defined per Definition 5, with the difference that the action encoder (7.3) is modified to
$$\ell: [1, 2^{nR_2}] \times \mathcal{Z}^{i-1} \to \mathcal{A}, \qquad (7.42)$$
which maps the message $M_2$ and the past observed decoder side information sequence $Z^{i-1}$ into the $i$th symbol of the action sequence, $A_i$. Moreover, for the cascade-broadcast model of Fig. 7.4, the "action" function (7.18) in Definition 8 is modified to
$$\ell: [1, 2^{nR_b}] \times \mathcal{Y}^{i-1} \to \mathcal{A}, \qquad (7.43)$$
which maps the message $M_b$ and the past observed decoder side information sequence $Y^{i-1}$ into the $i$th symbol of the action sequence, $A_i$.

Proposition 15. The rate-distortion-cost region $\mathcal{R}(D_1, D_2, \Gamma)$ for the cascade source coding problem illustrated in Fig. 7.2 with adaptive action-dependent side information is given by the rate region described in Proposition 12.
7.4 with adaptive action-dependent side information is given by the region described in Proposition 14 by setting $Z = \emptyset$.

Remark 41. The results above show that enabling adaptive actions does not increase the achievable rate-distortion-cost region. These results generalize the observations in [22] for the point-to-point setting, wherein a similar conclusion is drawn. To establish the propositions above, we only need to prove the converse; the converse proofs for Proposition 15 and Proposition 16 are given above.

7.4 Concluding Remarks

In this chapter, we considered a few distributed source coding problems for multi-terminal networks with action-dependent side information at some of the decoders. These source coding problems can be interpreted as the source coding duals of the channel coding problems considered in Chapter 5. Our results extend to multi-hop scenarios the conclusion in [91] that a joint representation of data and control messages enables an efficient use of the available communication links. In particular, layered coding strategies prove to be optimal for all the considered models, in which the base layer fulfills two objectives: determining the actions of downstream nodes and simultaneously providing a coarse description of the source. Moreover, the examples provided in this chapter demonstrate the dependence of the optimal coding design on the network topology and on the action costs.

Chapter 8

Conclusions and Future Work

The problem of joint information transmission and channel state estimation over a DMC with DM state was studied in this thesis. To study the trade-off between the rate of information transmission and the error in estimating the channel state, we started with the problem of computing the fundamental limit of communication (with no state estimation) over a state-dependent relay channel, and then went on to investigate the capacity–distortion function. We summarize the results of the thesis below.

8.1 Summary

In Chapter 2, we derived single-letter expressions for the achievable rates and an upper bound on the capacity of a relay channel with inter-symbol interference and additive colored Gaussian noise. The channel state, i.e., the channel impulse responses, is assumed to be known at both the encoders and the decoders. We examined two important relay channel coding strategies, decode-and-forward and compress-and-forward; further, we provided an upper bound that is a generalization of the cut-set bound for multi-terminal networks. Some of the conditions under which the channel capacity can be computed, such as degraded relay channels, are delineated. The proof methods rely on the decomposition of the multipath channel into parallel channels via a DFT decomposition. Thus, our results suggest the optimality of OFDM input signaling even for relay channels. As such, optimal achievable schemes for memoryless channels are likely to have similar properties when extended to relay channels with finite memory.
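To illustrate the decomposition that drives these results, the following minimal sketch (NumPy; the block length and channel taps are arbitrary illustrative choices, not values from the thesis) verifies numerically that a circulant channel matrix, the building block of the circular relay channel above, is diagonalized by the unitary DFT, so that the n-block channel splits into n parallel scalar channels whose gains are the DFT of the impulse response.

```python
import numpy as np

n = 8                          # block length of the circular channel
h = np.array([1.0, 0.5, 0.2])  # illustrative impulse response (memory 2)

# Circulant channel matrix: y = H x + z realizes circular convolution with h
H = np.zeros((n, n))
for i in range(n):
    for j, tap in enumerate(h):
        H[i, (i - j) % n] = tap

# The unitary DFT matrix diagonalizes any circulant matrix
F = np.fft.fft(np.eye(n)) / np.sqrt(n)
D = F @ H @ F.conj().T

# Off-diagonal entries vanish; the diagonal is the DFT of the zero-padded taps
assert np.allclose(D - np.diag(np.diag(D)), 0, atol=1e-10)
assert np.allclose(np.diag(D), np.fft.fft(np.r_[h, np.zeros(n - len(h))]))
print(np.abs(np.diag(D)))      # per-subchannel gains used by OFDM signaling
```

This is the algebraic fact behind the optimality of OFDM-style input signaling noted above: coding can be done independently on each scalar subchannel, subject to a joint power allocation.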
The problem of joint information transmission and channel state estimation over a DMC with DM state was then studied in Chapter 3, where we bridged the gap between the two results of [132] (no state information at the encoder) and [109, 110] (full state information at the encoder) by studying the case in which the encoder has strictly causal or causal knowledge of the channel state information. The resulting capacity–distortion function permits a systematic investigation of the tradeoff between information transmission and state estimation. We showed that block Markov coding, coupled with channel state estimation that treats the decoded message and the received channel output as side information at the decoder, is optimal for communicating the state. Additional information transmission requires a simple rate-splitting strategy.

In Chapter 4, we extended this state communication setting to study the problem of estimating a modified state $X^n$, which is a per-symbol deterministic function of the encoder output $U^n$ and the channel state $S^n$. We derived lower and upper bounds on the minimum distortion $D^*$ for the estimation problem and specialized our results to the Gaussian case with the mean squared error distortion between the reconstruction and the signal. This Gaussian case is equivalent to the asymptotic, information-theoretic version of Witsenhausen's counterexample [42]. We characterized the minimum distortion $D^*$ for this asymptotic counterexample by showing that the two bounds derived for the DMIC problem match for this example channel. In fact, our results tighten the previously known lower bounds for the Witsenhausen problem in [43], [16] and also show that the achievable strategy combining linear coding with DPC suggested in [42] achieves the proposed lower bound if the proper amplification factor is employed.

So far in our discussion, we have considered the problem of joint communication and estimation over a DMC with DM state, where the channel state information is assumed to be available for free at the encoder. However, in practical problems, the channel state is not always chosen by nature (e.g., in the uplink of a cellular network, the codeword of an unwanted user acts as the channel state), nor is the channel state information always available to the encoder without incurring any cost. To include this scenario, we considered the state communication problem over an action-dependent channel. The notion of "action" in the channel coding framework was introduced in [123], which studied the capacity for the scenario in which the state sequence is generated based on a message-dependent nonadaptive action sequence. The scenario of adaptive action-dependent communication, although more practical in real-life examples, was left open. In Chapter 5, we studied the utility of adaptive actions by considering the scenario in which the receiver is not only interested in decoding the message, but also in estimating the channel state within a given distortion. We characterized the capacity–distortion function of such channels when the state is available strictly causally or causally at the channel encoder, for both non-adaptive and adaptive action encoders. We showed that although adaptive action is not useful for increasing the capacity, it provides an improved estimate of the state sequence at the decoder compared to a message-dependent non-adaptive action. These coding theorems were then extended optimally to the case of non-causal channel state information at the channel encoder to compare the rates (with no state estimation) under non-adaptive and adaptive action encoders. It is found that adaptive action is not useful in this scenario either. Similar conclusions about the inutility of adaptive action are shown to hold for channels with cost-constrained imperfect channel state information.
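To make the capacity–distortion tradeoff underlying Chapters 3–5 concrete, here is a small numerical sketch in the spirit of the constrained channel coding formulation of [132] (state known at neither terminal ahead of time; the decoder estimates the state symbol-by-symbol from the decoded input and the channel output). The binary channel used here, a noisy AND gate, and all parameter values are hypothetical choices for illustration only.

```python
import numpy as np

# Hypothetical binary example: S ~ Bern(1/2), Y = (X AND S) flipped w.p. eps.
eps = 0.1
pS = np.array([0.5, 0.5])

def p_y_given_xs(y, x, s):
    return 1 - eps if y == (x & s) else eps

def rate_and_distortion(q):
    """I(X;Y) and the Hamming distortion of the MAP estimate s_hat(x, y),
    for an input distribution X ~ Bern(q)."""
    pX = np.array([1 - q, q])
    joint = np.array([[[pX[x] * pS[s] * p_y_given_xs(y, x, s)
                        for y in (0, 1)] for s in (0, 1)] for x in (0, 1)])
    pXY = joint.sum(axis=1)            # marginalize out the state
    pY = pXY.sum(axis=0)
    I = sum(pXY[x, y] * np.log2(pXY[x, y] / (pX[x] * pY[y]))
            for x in (0, 1) for y in (0, 1) if pXY[x, y] > 0)
    # MAP error probability = sum over (x, y) of the smaller state mass
    D = sum(min(joint[x, 0, y], joint[x, 1, y]) for x in (0, 1) for y in (0, 1))
    return I, D

points = [rate_and_distortion(q) for q in np.linspace(0.01, 0.99, 99)]
for Dmax in (0.05, 0.15, 0.25):        # sweep the allowed distortion
    feasible = [I for I, D in points if D <= Dmax]
    print(Dmax, max(feasible) if feasible else "infeasible")
```

The sweep shows the qualitative tradeoff: inputs that probe the state (here $X = 1$) lower the estimation distortion while changing the achievable information rate, which is exactly the tension the capacity–distortion function quantifies.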
In Chapter 6, we examined the problem of cost-constrained motion planning of a robotic vehicle that gathers data from a network of stationary sensors tracking a dynamic source, which originally motivated us to look at this state communication problem. Since the underlying objective of path planning is to collect data from the sensors in order to estimate a stochastic source sequence, we proposed a performance metric based on minimizing the squared error distortion in communicating the sensed signal. We analyzed the formal properties of the distortion function, proposed a communication strategy, and evaluated the distortion metric for this communication strategy with a set of correlated jointly Gaussian sources. In addition, we extended our results to sources with unknown location and to moving sources, which is of immense practical importance for many spatio-temporal monitoring applications. We introduced a sampling-based motion planning algorithm for optimizing data gathering tours for minimal distortion, and we showed that planning using distortion metrics provides significant improvements in data gathering efficiency over naive methods.

Finally, we ended the thesis by studying the source coding dual of our state communication problem. In an increasing number of applications, communication networks are expected to convey not only data, but also information about control for actuation over multiple hops. In Chapter 7, we tackled the analysis of a baseline communication model with three nodes connected in a cascade, with the possible presence of an additional broadcast link. We characterized the optimal trade-off between rate, distortion and cost for actuation in a number of relevant cases of interest. In general, the results point to the advantages of leveraging a joint representation of data and control information in order to utilize the available communication links in the most efficient way. Specifically, in all the considered models, a layered coding strategy, possibly coupled with rate splitting, has been proved to be optimal. This strategy is such that the base layer has the double role of guiding the actions of the downstream nodes and of providing a coarse description of the source, similar to [91]. Moreover, it is shown that this base compression layer should be designed in a way that depends on the network topology and on the relative cost of activating the different links.
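Before turning to future directions, the following sketch illustrates the kind of distortion metric used for path planning in Chapter 6: for a hypothetical jointly Gaussian model of one source and K stationary sensors, it computes the MMSE of estimating the source from the sensors visited by a tour, and exhaustively searches small tours for the minimum-distortion subset. The covariance, the budget of two sensors, and the omission of the communication-noise component of the Chapter 6 metric are all simplifying assumptions made for illustration.

```python
import numpy as np
from itertools import combinations

# Hypothetical jointly Gaussian model: source 'theta' and K sensor readings,
# zero mean with a randomly generated (illustrative) covariance.
rng = np.random.default_rng(0)
K = 5
A = rng.normal(size=(K + 1, K + 1))
Sigma = A @ A.T + np.eye(K + 1)        # covariance of [theta, z_1, ..., z_K]

def mmse_distortion(visited):
    """MSE of the linear MMSE estimate of theta from the visited sensors."""
    idx = [1 + s for s in visited]
    S_tz = Sigma[0, idx]                # cross-covariance: theta vs. readings
    S_zz = Sigma[np.ix_(idx, idx)]      # covariance of the collected readings
    return Sigma[0, 0] - S_tz @ np.linalg.solve(S_zz, S_tz)

# Exhaustive search over all budget-2 tours for the minimum-distortion subset,
# the role played by the sampling-based planner in Chapter 6.
best = min(combinations(range(K), 2), key=mmse_distortion)
print(best, mmse_distortion(best))
```

A planner replaces the exhaustive search with tree sampling, but the objective evaluated at each candidate tour is exactly this conditional-covariance computation.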
8.2 Future directions

The joint communication and estimation framework considered in this thesis covers a wide range of settings. We have characterized the capacity–distortion function of a few point-to-point communication systems with state. There are various extensions of these results that one can consider; we point out a few of them in the subsections below.

8.2.1 Linear Two Hop Network

In the distributed source coding setup of Chapter 7, we considered multi-terminal source coding problems with action-dependent side information at some or all of the decoders. It is therefore natural to consider the dual multi-terminal channel coding problem, with the objective of characterizing the capacity–distortion function. As in the source coding setup, the most natural extension of the point-to-point state-dependent channel is the two-hop network shown in Fig. 8.1.

Consider the two-hop communication system with state depicted in Fig. 8.1. Suppose that the encoder has strictly causal access to the channel state sequence $S_1^n$ and wishes to communicate the state $S_1$ to the relay and the decoder. The relay sends a strictly causal function of the channel output $Y_1^n$ of the first hop and the channel state $S_2^n$ of the second hop to the decoder. The relay wants to estimate the channel state $S_1^n$ of the first hop with the minimum possible distortion, and the decoder is interested in estimating both channel states $S_1^n$ and $S_2^n$. We assume a DMC with a DM state model $(\mathcal{X}_1 \times \mathcal{X}_2 \times \mathcal{S}_1 \times \mathcal{S}_2,\; p(y_1|x_1,s_1)p(y_2|x_2,s_2)p(s_1)p(s_2),\; \mathcal{Y}_1 \times \mathcal{Y}_2)$ that consists of finite input alphabets $(\mathcal{X}_1, \mathcal{X}_2)$, finite output alphabets $(\mathcal{Y}_1, \mathcal{Y}_2)$, finite state alphabets $(\mathcal{S}_1, \mathcal{S}_2)$, and a collection of conditional pmfs $p(y_1|x_1,s_1)$ on $\mathcal{Y}_1$ and $p(y_2|x_2,s_2)$ on $\mathcal{Y}_2$. The channels are memoryless in the sense that, without feedback, $p(y_j^n|x_j^n,s_j^n) = \prod_{i=1}^n p_{Y_j|X_j,S_j}(y_{ji}|x_{ji},s_{ji})$ for all $j \in \{1,2\}$, and the states are memoryless in the sense that the sequence $(S_{j1}, S_{j2}, \dots)$ is independent and identically distributed (i.i.d.) with $S_{ji} \sim p_{S_j}(s_{ji})$ for $j \in \{1,2\}$.

Figure 8.1: State communication problem for a two-hop state-dependent channel.

An $(|\mathcal{S}_1|^n, |\mathcal{S}_2|^n, n)$ code for strictly causal state communication over the two-hop DMC with DM state consists of
• an encoder that assigns a symbol $x_{1i}(s_1^{i-1}) \in \mathcal{X}_1$ to each past channel state sequence $s_1^{i-1} \in \mathcal{S}_1^{i-1}$ of the first hop for $i \in [1:n]$,
• a relay that assigns a symbol $x_{2i}(s_2^{i-1}, y_1^{i-1}) \in \mathcal{X}_2$ to each past channel state sequence $s_2^{i-1} \in \mathcal{S}_2^{i-1}$ of the second hop and past channel output sequence $y_1^{i-1} \in \mathcal{Y}_1^{i-1}$ of the first hop for $i \in [1:n]$,
• a relay that assigns an estimate $\hat s_{11}^n \in \hat{\mathcal{S}}_{11}^n$ to each received sequence $y_1^n \in \mathcal{Y}_1^n$, and
• a decoder that assigns estimates $\hat s_{12}^n \in \hat{\mathcal{S}}_{12}^n$ and $\hat s_2^n \in \hat{\mathcal{S}}_2^n$ to each received sequence $y_2^n \in \mathcal{Y}_2^n$.

The fidelity of the state estimates is measured by the expected distortion; the definition is shown only for the channel state $S_2^n \in \mathcal{S}_2^n$ of the second hop (the other distortions are defined similarly):
$$\mathrm{E}\bigl(d(S_2^n, \hat S_2^n)\bigr) = \frac{1}{n} \sum_{i=1}^n \mathrm{E}\bigl(d(S_{2i}, \hat S_{2i})\bigr),$$
where $d \colon \mathcal{S}_2 \times \hat{\mathcal{S}}_2 \to [0,\infty)$ is a distortion measure between a state symbol $s_2 \in \mathcal{S}_2$ and a reconstruction symbol $\hat s_2 \in \hat{\mathcal{S}}_2$. Without loss of generality, we assume that for every symbol $s_2 \in \mathcal{S}_2$ there exists a reconstruction symbol $\hat s_2 \in \hat{\mathcal{S}}_2$ such that $d(s_2, \hat s_2) = 0$. A distortion 3-tuple $(D_{11}, D_{12}, D_2)$ is said to be achievable if there exists a sequence of $(|\mathcal{S}_1|^n, |\mathcal{S}_2|^n, n)$ codes such that
$$\limsup_{n \to \infty} \mathrm{E}\bigl(d(S_j^n, \hat S_j^n)\bigr) \leq D_j, \quad \forall j \in \{11, 12, 2\}.$$
We would like to characterize this region of achievable distortions. We would then also like to extend these results to characterize the capacity–distortion function for two-hop channels in which the encoder has some additional independent information to be communicated to the decoder. To begin with, we can consider a simpler version of the problem in which the second-hop channel is not state dependent ($S_2 = \emptyset$), so that the relay and the decoder are both interested in estimating the channel state $S_1^n$ of the first hop with the minimum possible distortion. First, we observe that, since the channel output $Y_2^n$ at the decoder is a degraded version of the channel output $Y_1^n$ of the first hop, the distortion achieved at the decoder will be larger than that achieved at the relay.
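The following Monte Carlo sketch illustrates this degradedness observation on a hypothetical binary instance with $S_2 = \emptyset$: both hops are modeled as BSCs, and the encoder and relay use a naive uncoded, strictly causal baseline (repeat the previous state or observation) rather than the block Markov scheme discussed next. All parameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
p1, p2 = 0.1, 0.1                     # illustrative crossover probabilities

s1 = rng.integers(0, 2, n)            # i.i.d. first-hop state (S2 is empty)

# Uncoded strictly causal baseline: the encoder repeats yesterday's state,
# the relay forwards yesterday's observation.
x1 = np.roll(s1, 1); x1[0] = 0        # x_{1i} = s_{1,i-1}
y1 = x1 ^ (rng.random(n) < p1)        # first-hop BSC output
x2 = np.roll(y1, 1); x2[0] = 0        # x_{2i} = y_{1,i-1}
y2 = x2 ^ (rng.random(n) < p2)        # second-hop BSC output

# One-step (resp. two-step) delayed Hamming estimates of s1
d_relay = np.mean(y1[1:] != s1[:-1])
d_dec = np.mean(y2[2:] != s1[:-2])
print(d_relay, d_dec)                 # decoder distortion exceeds the relay's
```

The relay's distortion concentrates near $p_1$ while the decoder's concentrates near $p_1 + p_2 - 2p_1 p_2$, a numerical instance of the data-processing ordering noted above.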
From the results on strictly causal state communication over point-to-point state-dependent channels in Chapter 3, it is not surprising that the most natural achievable strategy is a block Markov encoding strategy that incorporates the side information at the relay and the decoder. The encoder sends a description $U_2^n$ of the state sequence $S_1^n$ of the previous block by using the Wyner–Ziv coding strategy (see [127]), incorporating the side information at the relay and the decoder (see Chapter 3); this description is decoded at both the relay and the decoder. The decoder uses a function $\hat s_{12i}(U_{2i}, Y_{2i})$ to estimate the channel state. Conditioned on $U_2^n$ and the side information at the relay, the encoder also sends a finer description of the channel state $S_1^n$ of the previous block, which is decoded only at the relay. The relay then uses a function $\hat s_{11i}(U_{1i}, U_{2i}, Y_{1i})$ to estimate the channel state. Although this block Markov encoding strategy incorporating side information at the decoder is shown to be optimal for point-to-point communication systems (see Chapter 3 for the proof), the proof of the converse for the two-hop network discussed here seems beyond our current techniques of identifying auxiliary random variables and using estimation-theoretic inequalities such as Lemma 6.

8.2.2 Action dependent channel with asymmetric messages

In Chapter 5, both the action encoder and the channel encoder are assumed to transmit a common message $M$. A natural extension of these problems is to consider the case in which one or both of the encoders have a private message to communicate in addition to the shared common message. In Fig. 8.2, we consider a point-to-point communication system with action-dependent state and a private message at the action encoder. Specifically, we consider a communication system where encoding is in two parts: given a common message and a private message, an action sequence $A^n$ is created. The actions affect the formation of the channel states $S^n$, which are accessible to the sender strictly causally when producing the channel input sequence. The receiver, after observing the channel output $Y^n$, wants to decode the messages and also wants to estimate the channel state. We assume a discrete memoryless channel (DMC) with discrete memoryless state (DMS) model $(\mathcal{X} \times \mathcal{S} \times \mathcal{A},\; p(y|x,s)p(s|a),\; \mathcal{Y})$ that consists of a finite input alphabet $\mathcal{X}$, a finite output alphabet $\mathcal{Y}$, a finite state alphabet $\mathcal{S}$, a finite action alphabet $\mathcal{A}$, and a collection of conditional pmfs $p(y|x,s)$ on $\mathcal{Y}$. The channel is memoryless in the sense that, without feedback, $p(y^n|x^n,s^n) = \prod_{i=1}^n p_{Y|X,S}(y_i|x_i,s_i)$, and given the action sequence, the state is memoryless in the sense that $(S_1, S_2, \dots)$ are independent and identically distributed (i.i.d.) with $S_i \sim p_S(s_i|a_i)$.

Figure 8.2: Action dependent channel with a private message at the action encoder.

A $(2^{nR_0}, 2^{nR_1}, n)$ code for strictly causal action-dependent state communication consists of
• two message sets $m_0 \in [1:2^{nR_0}]$ and $m_1 \in [1:2^{nR_1}]$,
• an action encoder that assigns an action sequence $a^n(m_0, m_1) \in \mathcal{A}^n$ to each message pair $m_0 \in [1:2^{nR_0}]$ and $m_1 \in [1:2^{nR_1}]$,
• a channel encoder that assigns a symbol $x_i(m_0, s^{i-1}) \in \mathcal{X}$ to each message $m_0 \in [1:2^{nR_0}]$ and past state sequence $s^{i-1} \in \mathcal{S}^{i-1}$ for $i \in [1:n]$, and
• a decoder that assigns message estimates $\hat m_0 \in [1:2^{nR_0}]$, $\hat m_1 \in [1:2^{nR_1}]$ (or an error message $e$) and a state sequence estimate $\hat s^n \in \hat{\mathcal{S}}^n$ to each received sequence $y^n \in \mathcal{Y}^n$.
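As a sanity check of the causality structure in this definition, the following sketch simulates one block of such a code on binary alphabets. The particular maps $a^n(m_0, m_1)$ and $x_i(m_0, s^{i-1})$ and the pmfs are placeholders chosen only to respect the information constraints of the bullets above; they are not a proposed code.

```python
import numpy as np

rng = np.random.default_rng(2)
n, R0, R1 = 8, 0.25, 0.25              # illustrative block length and rates

m0 = rng.integers(0, 2 ** int(n * R0))  # common message
m1 = rng.integers(0, 2 ** int(n * R1))  # private message of the action encoder

# Action sequence depends on both messages (placeholder map)
a = np.array([(m0 + m1 + i) % 2 for i in range(n)])
# State generated memorylessly given the action: S_i ~ p(s | a_i)
s = (rng.random(n) < np.where(a == 1, 0.8, 0.2)).astype(int)

x = np.zeros(n, dtype=int)
y = np.zeros(n, dtype=int)
for i in range(n):
    # Channel input may use only (m0, s^{i-1}): strictly causal state access
    x[i] = (m0 + int(s[:i].sum())) % 2
    # Placeholder memoryless channel p(y | x, s)
    y[i] = x[i] ^ s[i] ^ (rng.random() < 0.1)
print(a, s, x, y, sep="\n")
```

Note that the private message $m_1$ reaches the decoder only through the actions' effect on the state, which is what makes the tradeoff region nontrivial.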
We assume that $M_0$ and $M_1$ are independent and uniformly distributed over their respective message sets. The average probability of error is defined as $P_e^{(n)} = \mathrm{P}\bigl(\{\hat M_0 \neq M_0\} \cup \{\hat M_1 \neq M_1\}\bigr)$. The fidelity of the state estimate is measured by the expected distortion
$$\mathrm{E}\bigl(d(S^n, \hat S^n)\bigr) = \frac{1}{n} \sum_{i=1}^n \mathrm{E}\bigl(d(S_i, \hat S_i)\bigr),$$
where $d \colon \mathcal{S} \times \hat{\mathcal{S}} \to [0,\infty)$ is a distortion measure between a state symbol $s \in \mathcal{S}$ and a reconstruction symbol $\hat s \in \hat{\mathcal{S}}$. Without loss of generality, we assume that for every symbol $s \in \mathcal{S}$ there exists a reconstruction symbol $\hat s \in \hat{\mathcal{S}}$ such that $d(s, \hat s) = 0$. A rate–distortion triple is said to be achievable if there exists a sequence of $(2^{nR_0}, 2^{nR_1}, n)$ codes such that $\lim_{n\to\infty} P_e^{(n)} = 0$ and $\limsup_{n\to\infty} \mathrm{E}\, d(S^n, \hat S^n) \leq D$. The capacity–distortion region $\mathcal{C}^A_{SC}(D)$ is defined as in [132] and is the closure of the set of all rate pairs $(R_0, R_1)$ such that $(R_0, R_1, D)$ is achievable. We wish to characterize the optimal tradeoff region for this problem.

Remark 42. Note that this problem includes the degraded relay channel of Fig. 8.3 (see [26] for a detailed definition of the degraded memoryless relay channel) as a special case. The correspondence between the channel models is shown in Table 8.1 below. Note that the relay channel is degraded since the channel outputs satisfy the Markov chain condition $X_S \to (X_R, Y_R) \to Y_D$. The difference between our channel model and the degraded relay channel considered in [26] is that the decoder in our model is additionally interested in estimating the output $Y_R^n$ of the source-to-relay channel.

Figure 8.3: Channel model for a degraded memoryless relay channel.

Table 8.1: Equivalence of the relay channel setting of [26] to our formulation of the action-dependent channel with asymmetric messages

  Action-dependent channel with asymmetric messages    Degraded relay channel of Fig. 8.3
  $M_0$                                                 $\emptyset$
  $M_1$                                                 $M$
  $S$                                                   $Y_R$
  $A$                                                   $X_S$
  $X$                                                   $X_R$
  $Y$                                                   $Y_D$
  $p(s|a)p(y|x,s)$                                      $p(y_R|x_S)p(y_D|x_R,y_R)$

The capacity of a degraded relay channel is characterized in [26], and our framework would extend this result to the joint communication and estimation setup. If we consider the setting where the action encoder is allowed to depend adaptively on past observations of the channel state, then the special case of the setting with $(M_0, M_1) = (\emptyset, M_1)$ would be a degraded relay channel with output feedback from the relay to the source.

There are a number of other open problems that present avenues for future research. A few of these are:

• For the problem of the relay channel with finite memory, an important future direction would be to extend these results to highly asynchronous links and to develop practical coding strategies that achieve the theoretical performance. Future work would also include the investigation of the amplify-and-forward (AF) strategy at the relay. Although we numerically observe that decomposition into parallel channels is sub-optimal for the AF relay, proper linear processing at the relay could outperform the DF and CF achievable rates for some values of the channel parameters.

• For the state communication part, we recall an important open problem: finding the capacity–distortion function $C_{NC}(D)$ for a general DMC with DM state with an arbitrary distortion measure,
when the state sequence is known noncausally at the encoder. The problem was studied in [109], which established the lower bound on $C_{NC}(D)$
$$C_{NC}(D) \geq \max \bigl[ I(U;Y) - I(U;S) \bigr], \qquad (8.1)$$
where the maximum is over all conditional pmfs $p(u|s)$ and functions $x(u,s)$ and $\hat s(u,y)$ such that $\mathrm{E}(d(S, \hat S)) \leq D$. While it is believed that this lower bound is tight in general (see, for example, [110] for the case of Gaussian channels with additive Gaussian states and quadratic distortion measure), the proof of the converse seems beyond our current techniques of identifying auxiliary random variables and using estimation-theoretic inequalities such as Lemma 6. Although we proposed an upper bound for the problem in Chapter 4, we are not able to show the optimality or sub-optimality of the proposed bound. (A small numerical sketch of the lower bound (8.1) is given after this list.)

• For the modified state estimation problem, we believe that our results can lead to three sets of new work: (a) combined with the finite blocklength results of [41], we can potentially provide better bounds on the original scalar Witsenhausen counterexample; (b) new results on extensions of the counterexample (see [40]); and (c) assistance in the examination of problems of estimation with a helper who knows the interference, as studied in [16].

• We observe that a simple encoding and decoding strategy performs well in the robotic path planning problem. It would be interesting to explore the fundamental limit of the proposed distortion metric as future work. From the algorithmic point of view, one important area for future work is to examine the asymptotic optimality of the BCDM-RRT. Another area to consider is more informed sampling techniques for biasing the tree generation towards areas of minimal distortion. These refinements will move towards improved estimation-theoretic planning for a wide variety of robotic data collection tasks.

• So far in our discussion, we have only considered the case when the encoder can take cost-constrained actions not only to select the quality of the state-dependent communication channel, but also to control the amount of noisy state information available at the encoder. However, the decoder can also take adaptive actions based on the channel output and past observations of the noisy channel state, and then obtain its own partial state information, which is used to decode the transmitted message and construct an estimate of the channel state, as shown in Fig. 8.4. We can motivate this general setting by considering an extension of the path planning problem considered in Chapter 6, where we no longer restrict the sensors (the encoders) to be fixed; the sensors, along with the autonomous vehicle, can then take actions by changing their positions in order to get a better view of the target or the source.

Figure 8.4: The problem of joint communication and estimation with cost-constrained action-dependent side information at both the encoder and the decoder.
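As promised above, here is a small numerical sketch of the lower bound (8.1) for a hypothetical binary instance ($S \sim \mathrm{Bern}(1/2)$, $Y = X \oplus S$ flipped with probability $\epsilon$, Hamming distortion, $|\mathcal{U}| = 2$). It grids over $p(u|s)$, enumerates the deterministic maps $x(u,s)$, and uses the MAP estimator $\hat s(u,y)$; all values are illustrative.

```python
import numpy as np
from itertools import product

eps, Dmax = 0.1, 0.2
pS = [0.5, 0.5]

def p_y(y, x, s):
    return 1 - eps if y == (x ^ s) else eps

best = -1.0
for a, b in product(np.linspace(0, 1, 21), repeat=2):    # grid over p(U=1|S=s)
    pU_S = [[1 - a, a], [1 - b, b]]
    for xmap in product((0, 1), repeat=4):               # deterministic x(u,s)
        x = lambda u, s: xmap[2 * u + s]
        # joint pmf p(s, u, y)
        j = np.array([[[pS[s] * pU_S[s][u] * p_y(y, x(u, s), s)
                        for y in (0, 1)] for u in (0, 1)] for s in (0, 1)])
        # MAP estimator s_hat(u, y): Hamming distortion = residual error mass
        D = sum(j[:, u, y].min() for u in (0, 1) for y in (0, 1))
        if D > Dmax:
            continue
        pUY = j.sum(axis=0); pU = pUY.sum(axis=1); pY = pUY.sum(axis=0)
        pSU = j.sum(axis=2)
        h = lambda p: -sum(q * np.log2(q) for q in np.ravel(p) if q > 0)
        IUY = h(pU) + h(pY) - h(pUY)                     # I(U;Y)
        IUS = h(pU) + h(pS) - h(pSU)                     # I(U;S)
        best = max(best, IUY - IUS)
print("lower bound on C_NC(%.2f) ~ %.3f" % (Dmax, best))
```

Even on this toy instance, the search makes the tension in (8.1) visible: inputs that help the decoder estimate $S$ (keeping $D$ small) tend to shrink the Gelfand-Pinsker rate term $I(U;Y) - I(U;S)$.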
References

[1] B. Ahmadi and O. Simeone. Distributed and cascade lossy source coding with a side information "vending machine". Preprint available at http://arxiv.org/abs/1109.6665, 2011.
[2] B. Ahmadi and O. Simeone. Robust coding for lossy computing with receiver-side observation costs. In Information Theory Proceedings (ISIT), 2011 IEEE International Symposium on, Jul. 31-Aug. 5, 2011.
[3] B. Ahmadi, R. Tandon, O. Simeone, and H. V. Poor. Heegard-Berger and cascade source coding problems with common reconstruction constraints. Preprint available at arXiv:1112.1762v3, 2011.
[4] F. Arrichiello, D. N. Liu, S. Yerramalli, A. Pereira, J. Das, U. Mitra, and G. S. Sukhatme. Effects of underwater communication constraints on the control of marine robot teams. In Proc. Int. Conf. Robot Communication and Coordination, pages 1-8, 2009.
[5] H. Asnani, H. Permuter, and T. Weissman. Probing capacity. IEEE Trans. Inf. Theory, 57:7317-7332, 2011.
[6] M. Baglietto, T. Parisini, and R. Zoppoli. Nonlinear approximations for the solution of team optimal control problems. In IEEE Conference on Decision and Control (CDC), pages 4592-4594, San Diego, USA, 1997.
[7] R. Bansal and T. Basar. Stochastic teams with nonclassical information revisited: When is an affine control optimal? IEEE Transactions on Automatic Control, 32:554-559, 1987.
[8] T. Berger and R. W. Yeung. Multiterminal source encoding with one distortion criterion. IEEE Trans. Inf. Theory, 35(2):228-236, September 2006.
[9] L. Berkhovskikh and Y. Lysanov. Fundamentals of Ocean Acoustics. New York: Springer, 1984.
[10] D. Bhadauria, O. Tekdas, and V. Isler. Robotic data mules for collecting data over sparse sensor fields. J. Field Robotics, 28(3):388-404, 2011.
[11] J. Binney and G. S. Sukhatme. Branch and bound for informative path planning. In Proc. IEEE Int. Conf. Robotics and Automation, pages 2147-2154, 2012.
[12] D. Blackwell. Equivalent comparisons of experiments. Ann. Math. Statist., 24:265-272, 1953.
[13] Cecilia Carbonelli and Urbashi Mitra. Cooperative multihop communication for underwater acoustic networks. In Proceedings of the 1st ACM international workshop on Underwater networks, WUWNet '06, pages 97-100, New York, NY, USA, 2006. ACM.
[14] B. Chen and G. W. Wornell. Quantization index modulation: A class of provably good methods for digital watermarking and information embedding. IEEE Trans. Inf. Theory, 47(4):1423-1443, May 2001.
[15] Y. K. Chia, H. Permuter, and T. Weissman. Cascade, triangular and two way source coding with degraded side information at the second user. Preprint available at http://arxiv.org/abs/1010.3726, 2010.
[16] Y.-K. Chia, R. Soundararajan, and T. Weissman. Estimation with a helper who knows the interference. Preprint available at arXiv:1203.4311v1, 2012.
[17] Yeow-Khiang Chia, H. Asnani, and T. Weissman. Multi-terminal source coding with action dependent side information. In Information Theory Proceedings (ISIT), 2011 IEEE International Symposium on, pages 2035-2039, Jul. 31-Aug. 5, 2011.
[18] C. Choudhuri, Y.-H. Kim, and U. Mitra. Capacity-distortion trade-off in channels with state. In Proc. 48th Ann. Allerton Conf. Comm. Control Comput., pages 1311-1318, Allerton, IL, September 2010.
[19] C. Choudhuri, Y.-H. Kim, and U. Mitra. Causal state amplification. In Proc. IEEE Int. Symp. Inf. Theory, pages 2110-2114, St. Petersburg, Russia, August 2011.
[20] C. Choudhuri, Y.-H. Kim, and U. Mitra. Causal state communication. IEEE Trans. Inf. Theory, Nov. 2012.
[21] C. Choudhuri and U. Mitra. Discrete memoryless implicit communication with application to Witsenhausen's counterexample: Addendum. 2012.
[22] C. Choudhuri and U. Mitra. How useful is adaptive action? In Global Communications Conference (Globecom), 2012 IEEE Conference on, Anaheim, CA, USA, Dec. 2012.
[23] C. Choudhuri and U. Mitra. On non-causal side information at the encoder. In Communication, Control and Computing, 2012 Allerton Conference on, Urbana, IL, USA, Oct. 2012.
[24] Max H. M. Costa. Writing on dirty paper. IEEE Trans. Inf. Theory, 29(3):439-441, 1983.
[25] T. A. Courtade and T. Weissman. Multiterminal source coding under logarithmic loss. arXiv:1110.3069 [cs.IT], 2011.
[26] T. Cover and A. El Gamal. Capacity theorems for the relay channel. Information Theory, IEEE Transactions on, 25(5):572-584, Sep. 1979.
[27] T. M. Cover. Conflict between state information and intended information. In Proc. IEEE Inf.
Theory Workshop, Metsovo, Greece, June 1999.
[28] Thomas M. Cover and Young-Han Kim. Capacity of a class of deterministic relay channels. In Information Theory, 2007. ISIT 2007. IEEE International Symposium on, pages 591-595, June 2007.
[29] Thomas M. Cover and Joy A. Thomas. Elements of Information Theory. Wiley, New York, second edition, 2006.
[30] Imre Csiszár and János Körner. Broadcast channels with confidential messages. IEEE Trans. Inf. Theory, 24(3):339-348, 1978.
[31] I. A. Şucan, M. Moll, and L. E. Kavraki. The Open Motion Planning Library. IEEE Robotics & Automation Mag., 2012. To appear.
[32] J.-P. Delmas and H. Abeida. Cramer-Rao bounds of DOA estimates for BPSK and QPSK modulated signals. Signal Processing, IEEE Transactions on, 54(1):117-126, Jan. 2006.
[33] Natasha Devroye, Patrick Mitran, and Vahid Tarokh. Achievable rates in cognitive radio channels. IEEE Trans. Inf. Theory, 52(5):1813-1827, May 2006.
[34] L. Dikstein, H. Permuter, and S. Shamai. MAC with action-dependent state information at one encoder. In Proc. IEEE Int. Symp. Inf. Theory, Cambridge, MA, USA, 2012.
[35] A. El Gamal and Y.-H. Kim. Network Information Theory. Cambridge University Press, 2012.
[36] A. El Gamal, M. Mohseni, and S. Zahedi. Bounds on capacity and minimum energy-per-bit for AWGN relay channels. Information Theory, IEEE Transactions on, 52(4):1545-1561, April 2006.
[37] M. Gastpar. Uncoded transmission is exactly optimal for a simple Gaussian network. Information Theory, IEEE Transactions on, 54(11):5247-5251, Nov. 2008.
[38] S. I. Gelfand and M. S. Pinsker. Coding for channel with random parameters. Probl. Control Inf. Theory, 9(1):19-31, 1980.
[39] A. J. Goldsmith and M. Effros. The capacity region of broadcast channels with intersymbol interference and colored Gaussian noise. Information Theory, IEEE Transactions on, 47(1):219-240, Jan. 2001.
[40] P. Grover, S. Y. Park, and A. Sahai. On the generalized Witsenhausen counterexample. In Allerton Conference on Communication, Control, and Computing, Illinois, USA, 2009.
[41] P. Grover, S. Y. Park, and A. Sahai. The finite-dimensional Witsenhausen counterexample. To appear in IEEE Transactions on Automatic Control, 2012.
[42] P. Grover and A. Sahai. Witsenhausen's counterexample as assisted interference suppression. Special Issue on "Information Processing and Decision Making in Distributed Control Systems" of the International Journal of Systems, Control and Communications, 2010.
[43] P. Grover, A. B. Wagner, and A. Sahai. Information embedding meets distributed control. Preprint available at http://arxiv.org/pdf/1003.0520v1.pdf, 2010.
[44] D. A. Harville. Matrix Algebra From a Statistician's Perspective. Springer-Verlag.
[45] C. Heegard and T. Berger. Rate distortion when side information may be absent. Information Theory, IEEE Transactions on, 31(6):727-734, Nov. 1985.
[46] Chris Heegard. Capacity and coding for computer memory with defects. Ph.D. thesis, Stanford University, Stanford, CA, November 1981.
[47] Chris Heegard and Abbas El Gamal. On the capacity of computer memories with defects. IEEE Trans. Inf. Theory, 29(5):731-739, 1983.
[48] W. Hirt and J. L. Massey. Capacity of the discrete-time Gaussian channel with intersymbol interference. Information Theory, IEEE Transactions on, 34(3):38, May 1988.
[49] Y. C. Ho and T. Chang. Another look at the nonclassical information structure problem. IEEE Transactions on Automatic Control, 25(3):537-540, 1980.
[50] Y. C. Ho, M. P. Kastner, and E. Wong.
Teams, signaling, and information theory. IEEE Transactions on Automatic Control, 23:305-312, 1978.
[51] G. Hollinger, S. Choudhary, P. Qarabaqi, C. Murphy, U. Mitra, G. Sukhatme, M. Stojanovic, H. Singh, and F. Hover. Underwater data collection using robotic sensor networks. IEEE J. Selected Areas in Communications, 30(5):899-911, 2012.
[52] G. Hollinger, U. Mitra, and G. Sukhatme. Active classification: Theory and application to underwater inspection. In Proc. International Symposium on Robotics Research (ISRR), Flagstaff, AZ, August 2011.
[53] G. Hollinger, S. Yerramalli, S. Singh, U. Mitra, and G. S. Sukhatme. Distributed coordination and data fusion for underwater search. In Proc. IEEE Conf. Robotics and Automation, pages 349-355, 2011.
[54] R. A. Horn and C. R. Johnson. Matrix Analysis. Cambridge University Press, 1990.
[55] M. F. Huber, T. Bailey, H. Durrant-Whyte, and U. D. Hanebeck. On entropy approximation for Gaussian mixture random vectors. In Multisensor Fusion and Integration for Intelligent Systems, 2008. MFI 2008. IEEE International Conference on, pages 181-188, Aug. 2008.
[56] J. Hui and P. Humblet. The capacity region of the totally asynchronous multiple-access channel. Information Theory, IEEE Transactions on, 31(2):207-216, Mar. 1985.
[57] S. Karaman and E. Frazzoli. Sampling-based algorithms for optimal motion planning. Int. J. Robotics Research, 30(7):846-894, 2011.
[58] A. H. Kaspi. Rate-distortion function when side-information may be present at the decoder. Information Theory, IEEE Transactions on, 40(6):2031-2034, Nov. 1994.
[59] Amiram H. Kaspi. Two-way source coding with a fidelity criterion. IEEE Trans. Inf. Theory, 31(6):735-740, 1985.
[60] S. M. Kay. Fundamentals of Statistical Signal Processing, Volume I: Estimation Theory. Prentice Hall, 1993.
[61] Guy Keshet, Yossef Steinberg, and Neri Merhav. Channel coding in the presence of side information. Found. Trends Comm. Inf. Theory, 4(6):445-586, 2008.
[62] D. B. Kilfoyle and A. B. Baggeroer. The state of the art in underwater acoustic telemetry. Oceanic Engineering, IEEE Journal of, 25(1):4-27, Jan. 2000.
[63] Y.-H. Kim, A. Sutivong, and T. M. Cover. State amplification. IEEE Trans. Inf. Theory, 54(5):1850-1859, May 2008.
[64] K. Kittichokechai, T. J. Oechtering, and M. Skoglund. Coding with action-dependent side information and additional reconstruction requirements. Preprint available at http://arxiv.org/abs/1202.1484, 2012.
[65] G. Kramer, M. Gastpar, and P. Gupta. Cooperative strategies and capacity theorems for relay networks. Information Theory, IEEE Transactions on, 51(9):3037-3063, Sept. 2005.
[66] A. Krause and C. Guestrin. Submodularity and its applications in optimized information gathering. ACM Trans. Intelligent Systems and Technology, 2(4), 2011.
[67] A. Krause, C. Guestrin, A. Gupta, and J. Kleinberg. Robust sensor placements at informative and communication-efficient locations. ACM Trans. Sensor Networks, 7(4), 2011.
[68] A. Krishnan, E. Nissen, S. Saripalli, and R. Arrowsmith. Change detection using airborne Lidar: application to earthquakes. In Proc. Int. Symp. Experimental Robotics, 2012.
[69] J. Kron, A. Gattami, T. J. Oechtering, and M. Skoglund. Iterative source-channel coding approach to Witsenhausen's counterexample. Preprint available at http://arxiv.org/abs/1205.4563, 2012.
[70] A. V. Kusnetsov and B. S. Tsybakov. Coding in a memory with defective cells. Probl. Control Inf. Theory, 10(2):52-60, April 1974.
[71] A. Lapidoth and Y. Steinberg.
The multiple access channel with causal and strictly causal side information at the encoders. In Proc. International Zurich Seminar on Communications, March 2010.
[72] A. Lapidoth and S. Tinguely. Sending a bivariate Gaussian over a Gaussian MAC. Information Theory, IEEE Transactions on, 56(6):2714-2752, June 2010.
[73] J. T. Lee, E. Lau, and Y. C. L. Ho. The Witsenhausen counterexample: a hierarchical search approach for nonconvex optimization problems. IEEE Transactions on Automatic Control, 46(3):382-397, 2001.
[74] W. Lee and D. Xiang. Information-theoretic measures for anomaly detection. In IEEE Symposium on Security and Privacy, 2001.
[75] E. L. Lehmann. Testing Statistical Hypotheses. New York: Wiley, 1959.
[76] D. S. Levine. Information-rich path planning under general constraints using rapidly-exploring random trees. Master's thesis, Massachusetts Institute of Technology, June 2010.
[77] M. Li, O. Simeone, and A. Yener. Multiple access channels with states causally known at transmitters. arXiv:1011.6639, 2011.
[78] S. H. Lim, P. Minero, and Y.-H. Kim. Lossy communication of correlated sources over multiple access channels. In Communication, Control, and Computing (Allerton), 2010 48th Annual Allerton Conference on, pages 851-858, Oct. 2010.
[79] D. G. Luenberger. Optimization by Vector Space Methods. New York: Wiley, 1969.
[80] N. Marina, A. Kavcic, and N. T. Gaarder. Capacity theorems for relay channels with ISI. In Information Theory, 2008. ISIT 2008. IEEE International Symposium on, pages 479-483, July 2008.
[81] N. Merhav and S. Shamai. Information rates subject to state masking. IEEE Trans. Inf. Theory, 53(6):2254-2261, June 2007.
[82] E. C. van der Meulen. Transmission of information in a T-terminal discrete memoryless channel. Ph.D. thesis, Dep. of Statistics, University of California, Berkeley, 1968.
[83] J. Mitola. Cognitive Radio: An Integrated Agent Architecture for Software Defined Radio. Ph.D. thesis, KTH Royal Inst. Techn., Stockholm, Sweden, 2000.
[84] S. K. Mitter and A. Sahai. Information and control: Witsenhausen revisited. In Yamamoto, Y. and Hara, S. (Eds.): Learning, Control and Hybrid Systems: Lecture Notes in Control and Information Sciences, Springer, New York, 241:281-293, 1999.
[85] A. F. Molisch. Wireless Communications. Wiley, 2011.
[86] Pierre Moulin and Joseph A. O'Sullivan. Information-theoretic analysis of information hiding. IEEE Trans. Inf. Theory, 49(3):563-593, 2003.
[87] C. Mulcahy. Fitch Cheney's five card trick. Math Horizons, Feb. 2003.
[88] M. Naghshvar and T. Javidi. Active M-ary sequential hypothesis testing, August 2010.
[89] A. Orlitsky and J. R. Roche. Coding for computing. IEEE Trans. Inf. Theory, 47(3):903-917, March 2001.
[90] C. H. Papadimitriou and J. N. Tsitsiklis. Intractable problems in control theory. SIAM Journal on Control and Optimization, 24:639-654, 1986.
[91] H. Permuter and T. Weissman. Source coding with a side information "vending machine". IEEE Trans. Inf. Theory, 57:4530-4544, 2011.
[92] J. G. Proakis. Digital Communications. McGraw-Hill, fourth edition.
[93] M. Raginsky. Shannon meets Blackwell and Le Cam: channels, codes, and statistical experiments. In Proc. IEEE Int. Symp. Inf. Theory, pages 1220-1224, St. Petersburg, Russia, August 2011.
[94] A. Reznik, S. R. Kulkarni, and S. Verdu. Degraded Gaussian multirelay channel: capacity and optimal power allocation. Information Theory, IEEE Transactions on, 50(12):3037-3046, Dec. 2004.
[95] M. Rotkowitz.
Linear controllers are uniformly optimal for the Witsenhausen counterexample. In Decision and Control, 2006 45th IEEE Conference on, pages 553-558, Dec. 2006.
[96] W. Rudin. Principles of Mathematical Analysis. McGraw-Hill, third edition, 1976.
[97] A. Sabharwal and U. Mitra. Bounds and protocols for a rate-constrained relay channel. Information Theory, IEEE Transactions on, 53(7):2616-2624, July 2007.
[98] R. C. Shah, S. Roy, S. Jain, and W. Brunette. Data mules: Modeling and analysis of a three-tier architecture for sparse sensor networks. Ad Hoc Networks, 1(2-3):215-233, 2003.
[99] Claude E. Shannon. Channels with side information at the transmitter. IBM J. Res. Develop., 2(4):289-293, 1958.
[100] A. Singh, A. Krause, C. Guestrin, and W. Kaiser. Efficient informative sensing using multiple robots. J. Artificial Intelligence Research, 34:707-755, 2009.
[101] R. N. Smith, Y. Chao, P. P. Li, D. A. Caron, B. H. Jones, and G. S. Sukhatme. Planning and implementing trajectories for autonomous underwater vehicles to track evolving ocean processes based on predictions from a regional ocean model. Int. J. Robotics Research, 29(12):1475-1497, 2010.
[102] A. Somekh-Baruch, S. Shamai (Shitz), and S. Verdu. Cooperative multiple-access encoding with states available at one transmitter. IEEE Trans. Inf. Theory, 54(10):4448-4469, October 2008.
[103] Y. Steinberg. Coding and common reconstruction. Information Theory, IEEE Transactions on, 55(11):4995-5010, Nov. 2009.
[104] M. Stojanovic. Recent advances in high-speed underwater acoustic communications. IEEE Journal of Oceanic Engineering, 21(2):125-137, April 1996.
[105] M. Stojanovic. On the relationship between capacity and distance in an underwater acoustic communication channel. ACM SIGMOBILE Mobile Computing and Communications Review, 11(4):34-43, 2007.
[106] Milica Stojanovic. On the relationship between capacity and distance in an underwater acoustic communication channel. In Proceedings of the 1st ACM international workshop on Underwater networks, WUWNet '06, pages 41-47, New York, NY, USA, 2006. ACM.
[107] E. Stump, V. Kumar, B. Grocholsky, and P. M. Shiroma. Control for localization of targets using range-only sensors. Int. J. Robotics Research, 28(6):743-757, 2009.
[108] O. Sumszyk and Y. Steinberg. Information embedding with reversible stegotext. In Electrical and Electronics Engineers in Israel, 2008. IEEEI 2008. IEEE 25th Convention of, pages 394-395, Dec. 2008.
[109] A. Sutivong. Channel capacity and state estimation for state-dependent channel. Ph.D. thesis, Stanford University, Palo Alto, CA, 2003.
[110] A. Sutivong, M. Chiang, T. M. Cover, and Y.-H. Kim. Channel capacity and state estimation for state-dependent Gaussian channels. IEEE Trans. Inf. Theory, 51(4):1486-1495, April 2005.
[111] R. Tandon, S. Mohajer, and H. V. Poor. Cascade source coding with erased side information. In Information Theory Proceedings (ISIT), 2011 IEEE International Symposium on, pages 2944-2948, Jul. 31-Aug. 5, 2011.
[112] O. Tekdas, D. Bhadauria, and V. Isler. Efficient data collection from wireless nodes under the two-ring communication model. Int. J. Robotics Research, 31(6):774-784, 2012.
[113] C. Tian. Amplification of the hidden Gaussian channel states. In Information Theory Proceedings (ISIT), 2012 IEEE International Symposium on, pages 3068-3072, July 2012.
[114] Y. Tirta, Z. Li, Y.-H. Lu, and S. Bagchi. Efficient collection of sensor data in remote fields using mobile collectors. In Proc. Int. Conf.
Computer Communications and Networks, pages 515-519, 2004.
[115] David N. C. Tse and Pramod Viswanath. Fundamentals of Wireless Communication. Cambridge University Press, Cambridge, 2005.
[116] M. Vajapeyam, S. Vedantam, U. Mitra, J. C. Preisig, and M. Stojanovic. Distributed space-time cooperative schemes for underwater acoustic communications. Oceanic Engineering, IEEE Journal of, 33(4):489-501, Oct. 2008.
[117] E. van der Meulen. Three-terminal communication channels. Adv. Appl. Prob., 3:120-154, 1971.
[118] E. van der Meulen. A survey of multi-way channels in information theory: 1961-1976. Information Theory, IEEE Transactions on, 23(1):1-37, Jan. 1977.
[119] I. Vasilescu, K. Kotay, D. Rus, M. Dunbabin, and P. Corke. Data collection, storage, and retrieval with an underwater sensor network, 2005.
[120] D. Vasudevan, C. Tian, and S. N. Diggavi. Compression with actions. In Communication, Control, and Computing (Allerton), 2006 44th Annual Allerton Conference on, Sept. 2006.
[121] S. Verdu. The capacity region of the symbol-asynchronous Gaussian multiple-access channel. Information Theory, IEEE Transactions on, 35(4):733-751, Jul. 1989.
[122] B. Wang, J. Zhang, and A. Host-Madsen. On the capacity of MIMO relay channels. Information Theory, IEEE Transactions on, 51(1):29-43, Jan. 2005.
[123] T. Weissman. Capacity of channels with action-dependent states. IEEE Trans. Inf. Theory, 56:5396-5411, 2010.
[124] H. S. Witsenhausen. A counterexample in stochastic optimum control. SIAM Journal on Control, 6:131-147, 1968.
[125] F. Wu, C. Huang, and Y. Tseng. Data gathering by mobile mules in a spatially separated wireless sensor network. In Int. Conf. Mobile Data Management: Systems, Services and Middleware, pages 293-298, 2009.
[126] Y. Wu and S. Verdu. Witsenhausen's counterexample: A view from optimal transport theory. In Decision and Control and European Control Conference (CDC-ECC), 2011 50th IEEE Conference on, pages 5732-5737, Dec. 2011.
[127] A. D. Wyner and J. Ziv. The rate-distortion function for source coding with side information at the decoder. IEEE Trans. Inf. Theory, 22(1):1-10, January 1976.
[128] Wei Yu, G. Ginis, and J. M. Cioffi. Distributed multiuser power control for digital subscriber lines. Selected Areas in Communications, IEEE Journal on, 20(5):1105-1115, Jun. 2002.
[129] B. Yuan, M. Orlowska, and S. Sadiq. On the optimal robot routing problem in wireless sensor networks. IEEE Trans. Knowledge and Data Engineering, 19(9):1252-1261, 2007.
[130] A. Zaidi, P. Piantanida, and S. Shamai. Multiple access channel with states known noncausally at one encoder and only strictly causally at the other encoder. In Proc. IEEE Int. Symp. Inf. Theory, St. Petersburg, Russia, 2011.
[131] A. Zaidi, P. Piantanida, and S. Shamai. Wyner-Ziv type versus noisy network coding for a state-dependent MAC. Preprint available at http://arxiv.org/abs/1202.1209, 2012.
[132] W. Zhang, S. Vedantam, and U. Mitra. Joint transmission and state estimation: A constrained channel coding approach. IEEE Trans. Inf. Theory, 57(10):7084-7095, October 2011.
[133] Wenyi Zhang and U. Mitra. Channel-adaptive frequency-domain relay processing in multicarrier multihop transmission. In Acoustics, Speech and Signal Processing, 2008. ICASSP 2008. IEEE International Conference on, pages 3229-3232, Mar. 31-Apr. 4, 2008.
[134] Wenyi Zhang, M. Stojanovic, and U. Mitra. Analysis of a linear multihop underwater acoustic network. Oceanic Engineering, IEEE Journal of, 35(4):961-970, Oct. 2010.
[135] Wenyi Zhang, Milica Stojanovic, and Urbashi Mitra. Analysis of a simple multihop underwater acoustic network. In Proceedings of the third ACM international workshop on Underwater Networks, WuWNeT '08, pages 3-10, New York, NY, USA, 2008. ACM.
[136] Zhen Zhang. Partial converse for a relay channel. Information Theory, IEEE Transactions on, 34(5):1106-1110, Sep. 1988.
[137] Lei Zhao, Yeow-Kiang Chia, and T. Weissman. Compression with actions. In Communication, Control, and Computing (Allerton), 2011 49th Annual Allerton Conference on, pages 164-171, Sept. 2011.
Abstract
The fundamental trade-off between communication rate and estimation error in sensing the channel state at the decoder is investigated for a discrete memoryless channel with discrete memoryless action-dependent state when the state information is available either partially or fully at the encoder. We first investigate the capacity of a relay channel with finite memory, where the action-independent, fixed channel state information is assumed to be known at both the encoder and decoder, and then go on to investigate the problem of determining the trade-off between capacity and distortion for the channel with states known only at the encoder.

The relay channel with finite memory is modeled by channels with inter-symbol interference (ISI) and additive colored Gaussian noise. The channel state, i.e., the channel impulse responses, are assumed to be known at both the encoders and the decoder. Prior results are used to show that the capacity of this channel can be computed by examining the circular degraded relay channel in the limit of infinite block length. The thesis provides single-letter expressions for the achievable rates with decode-and-forward (DF) and compress-and-forward (CF) processing employed at the relay. Additionally, the cut-set bound for the relay channel is generalized to the ISI/colored Gaussian noise scenario. All results hinge on showing the optimality of the decomposition of the relay channel with ISI/colored Gaussian noise into an equivalent collection of coupled parallel, scalar, memoryless relay channels. The region of optimality of the DF and CF achievable rates is also discussed. The resulting rates are illustrated through the computation of numerical examples.

The problem of state communication over a discrete memoryless channel with discrete memoryless state when the state information is available strictly causally at the encoder is then studied. It is shown that block Markov encoding, in which the encoder communicates a description of the state sequence in the previous block by incorporating side information about the state sequence at the decoder, yields the minimum state estimation error. When the same channel is used to send additional independent information at the expense of a higher channel state estimation error, the optimal tradeoff between the rate of the independent information and the state estimation error is characterized via the capacity–distortion function. It is shown that any optimal tradeoff pair can be achieved via rate-splitting. These coding theorems are then extended optimally to the case of causal channel state information at the encoder using the Shannon strategy.

For non-causal channel state knowledge at the encoder, information-theoretic lower and upper bounds (based respectively on ideas from hybrid coding and rate–distortion theory) are derived on the capacity–distortion function. Some examples are provided for which the capacity–distortion functions are characterized by showing that the two bounds match. These coding theorems are then extended to the case of source coding with a side information vending machine at the encoder (introduced in [5]) to provide an improved lower bound on the rate–distortion function. In some communication scenarios, however, the decoder is not interested in estimating the state directly, but wants to reconstruct a function of the state with maximum fidelity.
This problem of modified state estimation over a discrete memoryless implicit channel (DMIC) with discrete memoryless (DM) states is studied when the state information is available non-causally at the encoder. Lower and upper bounds on the optimal distortion in estimating the input of the implicit channel are derived. The methods developed for the DMIC with DM state model are then used to investigate the optimal distortion for the asymptotic version of the Witsenhausen counterexample, one of the fundamental problems in distributed control theory. The minimum distortion is characterized for the counterexample.