Data-Driven Optimization for Indoor Localization

by

Nachikethas Ananthakrishnan Jagadeesan

A Dissertation Presented to the
FACULTY OF THE GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(Electrical Engineering)

August 2018

Copyright 2018 Nachikethas Ananthakrishnan Jagadeesan

Dedication

To my parents

Acknowledgements

There is much pleasure to be gained from useless knowledge.
Bertrand Russell

One of the enduring delights of spending time in a university is the opportunity that it affords to explore and find oneself. I shall remain eternally grateful to my advisor, Prof. Bhaskar Krishnamachari, who allowed me the freedom to pursue a varied set of topics and interests. But for his patience and guidance through my umpteen false starts and diversions, I can scarcely imagine writing this dissertation. One of my fondest memories is a group meeting, back when I had just started on my PhD program, where the surprise topic was "Why do a PhD?". Through many such discussions, advice, and his infectious enthusiasm for knowledge, his influence on me extends far beyond the contents of this thesis.

Over the years, I have had the good fortune to interact with, and learn from, multiple faculty. I would like to especially thank Prof. Leana Golubchik for introducing me to various topics in privacy and learning. I would also like to thank Prof. Leana Golubchik and Prof. Murali Annavaram for their insightful comments on my research, and for serving on my thesis committee. My coursework at USC has contributed greatly to my overall development as a researcher. Specifically, I would like to thank Prof. Gary Rosen for his humour-filled classes that first introduced me to mathematical statistics, Prof. Nicolai Haydn for a delightful introduction to analysis, Prof. Larry Goldstein for a wonderful semester learning statistical learning, and Prof. Minlan Yu for teaching me the elements of distributed systems. I would also like to thank Prof. Ali Zahid for his unwavering support and patient guidance throughout my PhD.

I thank my labmates for a fun and intellectually stimulating atmosphere. It has been a treat to observe and interact with them. The diversity of their talents never ceases to amaze me. Thanks Suvil, Mahesh, Pranav, Parisa, Vlad, Keyvan, Shangxing, Quynh, Pradipta, Jason, Kwame, Pedro, and Martin. Many thanks to the numerous resourceful people at USC who took care of the administrative burdens, especially Diane Demetras and Shane Goodoff.

My life outside of school was made memorable by my friends. Dileep, Arunima, Mythili, Sundar, Jagan, Ranjan, Saket, and Saurov were wonderful companions in navigating LA. The innumerable bike expeditions with Dileep and Saurov, the countless tennis matches with Jagan and Sundar, the entertaining conversations with Ranjan, the hilarious poker games organized by Mythili, and the wonderful dinners with Arunima remain very fond memories.

I thank my father, Jagadeesan Ganapathy Ananthakrishnan, and my mother, Usha Kumari Jagadeesan, for their infinite patience, support, and encouragement throughout this journey. My brother, Rishikesh Ananthakrishnan Jagadeesan, remains a constant source of enthusiasm and support. Thank you for everything, Ashley. You bring me untold joy and happiness.

Table of Contents

Dedication
Acknowledgements
List of Tables
List of Figures
Abstract
Chapter 1: Introduction
  1.1 Motivation
    1.1.1 Optimization based Framework
    1.1.2 Distributional Robustness
  1.2 Contributions
Chapter 2: A Unifying Framework For Deriving Localization Algorithms
  2.1 Literature Survey
  2.2 Problem Setting
  2.3 Optimization based approach to Localization
Chapter 3: A Unifying Framework For Evaluating Localization Algorithms
  3.1 Distance based Cost Functions
  3.2 Comparing Algorithms using Stochastic Dominance
  3.3 Comparison based on Upper Bound of Error CDFs
  3.4 Comparison with the Cramér–Rao Bound
  3.5 Numerical Results
    3.5.1 Fingerprinting Methods
    3.5.2 Simulation Model
    3.5.3 Simulation and Trace Results
Chapter 4: Distributionally Robust Localization Algorithms
  4.1 Preliminaries
    4.1.1 Robustness to Outliers
    4.1.2 Robustness to Parameter Uncertainty
    4.1.3 Distributional Robustness
  4.2 Distributionally Robust Formulation
  4.3 Uncertainty Set Construction
  4.4 Initial Estimate for Uncertainty Sets
Chapter 5: Efficient Computation Of The Distributionally Robust Estimate
  5.1 Low Complexity Formulation
    5.1.1 Deriving the Dual Form
  5.2 Parameter Selection
  5.3 Numerical Results
    5.3.1 Large-Scale Fading
    5.3.2 Small-Scale Fading
    5.3.3 Non-Stationary Behavior
    5.3.4 Real World Experiments
Chapter 6: Conclusion and Future Work
  6.1 Attainability of the optimal error CDF
  6.2 Uncertainty Sets for the Distribution of Observations
  6.3 Room Localization
    6.3.1 Optimization based approach to room localization
    6.3.2 Incorporating Search
Reference List

List of Tables

2.1 Literature Survey on Localization Methods
2.2 Recovering existing algorithms in our framework
3.1 Normalized Performance Results

List of Figures

2.1 Localization Algorithms
2.2 An illustration of the estimates returned by different localization algorithms.
2.3 Illustration of the posterior distribution of R using observations taken from a log-normal distribution. For this illustration, four transmitters were placed in a 16 m × 16 m area. A log-normal path loss model was used to determine the signal strengths. Each subplot shows the posterior distribution of R constructed by the receiver upon receiving a different vector of observations.
3.1 Illustration of the empirically estimated distribution of O using signal strength measurements taken at different locations. Each subplot refers to the distribution of signal strength from a unique access point. The left subplots refer to measurements taken at night, while the right subplots refer to measurements taken during daytime. Our framework makes better use of the higher variance data.
3.2 Comparison of error CDFs for different localization algorithms. Traditional fingerprinting based on matching the mean signal strengths is indicated by the moniker 'FING'. Each subplot indicates the algorithm performance for a different set of test data. The left subplots refer to the case when the signal strengths are tightly clustered around the mean, while the right subplots refer to the measurements with more variance.
3.3 Variation of distance error with the size of the training data set. For each fraction of the original data set, we compute the distance error for 100 random choices of the data points. The regression line for each algorithm is also shown. The results indicate that our empirical estimate of the prior improves with increasing training data, which results in better algorithm performance.
3.4 An illustration of stochastic dominance. We plot the error CDFs of MAP, MMSE, MEDE, and a naive baseline algorithm (NBS) that returns the location of the base station with the highest signal strength. For this illustration, three transmitters were placed evenly on a line and log-normal fading was assumed. The baseline algorithm is outperformed by MAP, MMSE, and MEDE, as demonstrated by their strict stochastic dominance over the baseline.
4.1 Illustration of robust formulation
5.1 Valid parameter selection
5.2 Impact of the second parameter of the robust estimator
5.3 An illustration of the trajectory followed by the robust estimate (r̂) for various values of the two parameters of the robust estimator. The MMSE estimate and the actual receiver location (r₀) are shown as well. For this simulation, log-normal fading with correlated observations was assumed.
The actual receiver location and the initial estimates for the uncertainty set were kept fixed. The trajectory of the robust estimate does not seem to follow any obvious pattern.
5.4 Performance of the MMSE and robust estimates on increasing noise variance. For this illustration, a log-normal fading model was used to generate the observations in a rectangular 60 m by 80 m space, using identical parameters to those used in Figure 5.2. In this plot, the MMSE estimator correctly assumes that the observations are drawn from a log-normal distribution. Note that the performance of the robust estimate tracks that of the MMSE estimator very closely.
5.5 An illustration of the impact of the strength of the induced correlations on the performance of the MMSE and robust estimates. For this illustration, the correlation strength was varied from 2.0 in Figure 5.5a to 10.0 in Figure 5.5e in equal increments. Figure 5.5f corresponds to IID observations (the limit as the correlation strength tends to infinity). A log-normal fading model was used to generate the observations in a rectangular 60 m by 80 m space, using identical parameters to those used in Figure 5.2. In this plot, the MMSE estimator correctly assumes that the observations are drawn from a log-normal distribution.
5.6 Performance of the MMSE and robust estimates on increasing noise variance. For this illustration, a Rayleigh fading model was used to generate the observations in a rectangular 60 m by 80 m space, using identical parameters to those used in Figure 5.4. The observations are correlated. In this plot, the MMSE estimator incorrectly assumes that the observations are drawn from a log-normal distribution. The robust estimate performs better than the MMSE estimator here. Moreover, the robust estimate improves on increasing the noise variance.
5.7 Performance of the MMSE and robust estimates on increasing noise variance. For this illustration, a Rayleigh fading model with different noise variances for the mean power was used to simulate the non-stationary behavior of signal strength measurements. The observations were generated in a rectangular 60 m by 80 m space, using identical parameters to those used in Figure 5.6. The observations are correlated. In this plot, the MMSE estimator incorrectly assumes that the observations are drawn from a log-normal distribution with a fixed standard deviation. The robust estimate performs better than the MMSE estimator here. Moreover, the robust estimate improves on increasing the noise variance.
5.8 Performance of the MMSE and robust estimates on a real dataset. In this experiment, two sets of observations were collected from a rectangular 4 m by 2 m space in an indoor office environment. Plot 5.8a shows the performance of the robust estimate on observations taken during the night, when there was little movement in the environment. Plot 5.8b shows the performance of the robust estimate on observations taken during the daytime, when there was regular movement of people inside the office. The parameters of the robust estimator were unchanged between the two cases. We see that the robust estimate performs better than MMSE in both cases, and the performance improvement is larger during the daytime.
Abstract

We consider the problem of estimating the location of an RF device using observations, such as received signal strengths, generated according to a possibly unknown distribution from a set of transmitters with known locations. We survey the literature on this problem, showing that previous authors have, implicitly or explicitly, considered various metrics. We contend that the literature is disconnected and disorganized, and that it is hard to decipher any unified theory that fairly evaluates these algorithms across different metrics of interest. Moreover, while most works employ the services of a model to obtain the distribution of observations, not enough attention is given to the issue of how well the location estimate performs when the actual distribution deviates from that predicted by the model. We address these issues in this thesis.

We present a Bayesian optimization framework that unifies the localization literature and shows how to optimize the location estimate with respect to a given metric. We demonstrate how the framework can incorporate a general class of algorithms, including both model-based methods and data-driven algorithms such as fingerprinting. This is illustrated by re-deriving the most popular algorithms within this framework. Furthermore, we propose using the error CDF as a unified way of comparing algorithms, based on two methods: (i) stochastic dominance, and (ii) an upper bound on error CDFs. We prove that an algorithm that optimizes any distance based cost function is not strictly stochastically dominated by any other algorithm.

We further present a distributionally robust formulation of the localization problem that explicitly takes into account the uncertainty in the distribution that generates the observations. We identify the structure of the robust solution and demonstrate how to construct the optimization problem so that it is easily computed and always yields the optimal solution. We show that the robust estimate outperforms traditional methods in the presence of modeling errors, while remaining close to the traditional estimate when the modeling is exact.

Chapter 1

Introduction

The problem of consistently and accurately estimating the location of a wireless device in an indoor environment merits significant scientific attention. In addition to the rich set of research challenges that this problem affords, there exists a plethora of applications that benefit from advances in solving this important problem [2–4]. These include indoor location based smartphone applications and services, search and rescue operations, geo-fencing, and asset tracking, to name a few. Localization services that utilize received signal strength (RSS) measurements are particularly attractive, since they impose no additional hardware requirements on the wireless devices, while remaining sufficiently accurate for many applications. Consequently, advances in localization algorithms based on RSS measurements can be rolled out easily as a software upgrade for a variety of wireless devices.

A significant number of researchers have tackled this fundamental problem and proposed various algorithms for RSS based localization. Many works adopt standard algorithms from signal processing, specifically estimation theory, such as Maximum Likelihood Estimation [5], Minimum Mean Squared Error Estimation [6], and the Best Linear Unbiased Estimator [7], while other techniques, such as fingerprinting [8] and sequence-based localization [9], are somewhat more heuristically derived.
These algorithms are typically evaluated using numerical and trace-based simulations, with varied metrics such as the mean squared position error, the absolute distance error, etc.

1.1 Motivation

1.1.1 Optimization based Framework

We contend that the literature is disconnected and disorganized, and that it is hard to decipher any unified theory that fairly evaluates these algorithms across different metrics of interest. We argue that the state-of-the-art approach to localization in the literature, which typically involves first presenting an algorithm and then evaluating its performance according to a particular metric or a set of metrics, is akin to putting the proverbial cart before the horse. For instance, it is not uncommon for algorithms to be evaluated on metrics for which they are not explicitly or implicitly optimized.

We advocate a systematic way of designing location estimation algorithms, which we refer to as the "optimization-based approach to localization". In this approach, first the localization metric is defined in terms of a suitable cost function, and then an appropriate estimation algorithm is derived to minimize that cost function. In addition, our optimization framework is applicable to any deterministic or stochastic model of the observations (assumed to be known) and can accommodate any prior distribution for location. Our framework also applies to data-driven approaches such as fingerprinting. We show that, in such data-driven settings, our framework makes better use of the available data compared to traditional methods. Fundamentally a Bayesian approach, this framework is also compatible with Bayesian filtering for location tracking over time [10].

As a first illustration of our framework, we consider a common metric used in the evaluation of localization algorithms, the absolute distance error, and derive an algorithm which yields location estimates that minimize the expected distance error (MEDE). As a second illustration, we consider another metric, the probability that the location estimate is within a given radius d of the true location (P(d)), and derive an algorithm which maximizes this probability. Furthermore, we show that standard algorithms such as MLE and MMSE can be derived similarly by optimizing the corresponding metrics (likelihood and mean squared error, respectively).

In conjunction with our framework for deriving algorithms that optimize a specified metric, we also consider the problem of comparing different localization algorithms with each other, for which we make use of the error CDF. For an important class of cost functions that can be expressed as non-negative monotonically increasing functions of the distance error, we prove that there is in effect a partial ordering among the various estimation algorithms. Certain algorithms dominate other algorithms for all such cost functions, and we show the necessary condition for this to happen. But there could also be two algorithms, A1 and A2, and two metrics, M1 and M2, such that A1 is better than A2 in terms of M1 while the reverse is true for M2. Thus we show that there is, in general, no single best localization algorithm, but rather a "Pareto set" of algorithms that are each optimized for different cost functions.

We evaluate the optimization-based framework for location estimation using numerical simulations, traces [11], and data obtained from experiments in an indoor office environment.
We illustrate how our framework can incorporate a variety of localization algorithms, including fingerprinting based methods. Our evaluation confirms what is predicted by the theory: no single algorithm outperforms others with respect to all metrics, thus underlining the need for an optimization based approach such as the one we propose.

1.1.2 Distributional Robustness

The literature on RSS based localization is rich and varied. Many works focus on the software systems and infrastructure required to enable localization services, while others focus on the design of algorithms that estimate the device location [10,12–17]. Underlying each of these works is a model that explains how the RSS observations vary with the device location. These models may be empirically derived, as in the case of fingerprinting based methods, or analytical, such as the log-normal path loss model. In general, these models can be represented as a distribution of observations, f_{O|R}(o|r). Adopting a Bayesian view, this distribution of observations may be used to derive a posterior belief of the device location, f_{R|O}(r|o). Most localization algorithms may be viewed as methods that derive a location estimate from this posterior distribution [18]. However, the accuracy of the location estimates returned by these methods, as measured by various metrics such as the mean squared error (MSE), expected distance error (EDE), likelihood, etc., depends crucially on how well the chosen model approximates the actual distribution of the RSS observations.

While most works employ the services of a model to obtain the distribution of observations, f_{O|R}(o|r), not enough attention is given to the issue of how well the location estimate performs when the actual distribution deviates from that predicted by the model. This issue persists even when the distribution is empirically derived. Empirically derived distributions are sensitive to small changes in the environment; moreover, such distributions are often non-stationary, and hence change depending on the time of day. This mismatch, between the distribution used to obtain the location estimate and the actual distribution generating the observations, results in subpar performance of localization algorithms. The performance guarantees of a localization algorithm are only valid as long as the model upon which the algorithm is based accurately tracks reality. Consequently, there is a need for an approach that explicitly takes into account the inherent ambiguity in modeling the environment, specifically the distribution of observations, f_{O|R}(o|r).

We address these deficiencies by explicitly specifying the model ambiguities and deriving a location estimate that is resilient to such uncertainties in the model. Uncertainty in the distribution of observations, f_{O|R}(o|r), results in an uncertain posterior, f_{R|O}(r|o). We demonstrate how to construct an uncertainty set that contains all possible posterior distributions that we may wish to consider, and further show how to derive the robust location estimate. This robust formulation demonstrates better performance compared to traditional approaches that do not take into account ambiguities in the underlying distribution. Moreover, even in the case where the distribution predicted by the model is accurate, the robust formulation tracks the performance of traditional localization methods very closely.
In other words, the robust formulation presented here gains increased resilience to inaccuracies in the model while giving up very little performance compared to when the model is exactly true. This makes the robust formulation a very useful tool to deploy whenever we do not have perfect knowledge of the environment. Furthermore, the uncertainty set construction takes into account the confidence that we have in the model: the uncertainty set can be made more or less stringent in the distributions that it admits, depending on how close we think the model is to the true distribution. Moreover, the proposed robust formulation is computationally feasible and, as a consequence of the data parallelism inherent in the construction, its run time performance improves considerably when run in multi-core or multi-processor environments.

1.2 Contributions

- We describe an optimization-based Bayesian framework that unifies and puts in context various previously proposed techniques for localization and provides a systematic basis for developing new algorithms.
- We introduce a partial ordering over the set of algorithms by considering a stochastic dominance relationship between their error CDFs. We prove that any algorithm that optimizes a distance based cost function is not strictly stochastically dominated by any other algorithm.
- We also present how algorithms may be compared based on how 'close' an algorithm gets to the upper bound on error CDFs. We propose one such measure of closeness (area of difference) and identify MEDE as the optimal algorithm under that measure.
- We illustrate how our framework encompasses both model-based approaches and data-driven methods such as fingerprinting, through simulations and real-world experiments.
- We describe a distributionally robust formulation of the indoor localization problem that takes into account uncertainty in the distribution of observations, f_{O|R}(o|r). We derive and illustrate the structure of the robust solution, and further demonstrate how to construct the optimization problem in a form that is computationally feasible and easily computed using standard software tools.
- We demonstrate that the robust formulation performs better than traditional methods when there are errors in the distribution of observations given by the model. In the case when the model distribution is accurate, we show that the robust estimate closely tracks the performance of traditional methods.

Chapter 2

A Unifying Framework For Deriving Localization Algorithms¹

2.1 Literature Survey

In this section, we survey the existing literature on RSS based localization algorithms with the intention of comparing, across papers, the metrics used to evaluate the algorithms. The results of the survey are summarized in Table 2.1. We identify certain metrics that are commonly used across the literature:

MSE: Mean Squared Error is the expected value of the square of the Euclidean distance between our estimate and the true location. Often the square root of this quantity, the Root Mean Squared Error (RMSE), is given instead of the MSE. As the RMSE may be derived from the MSE, we shall only use the MSE in our discussions in this document. The minimum mean squared error (MMSE) algorithm returns an estimate that minimizes the MSE.

EDE: The Expected Distance Error (EDE) is the expected value of the Euclidean distance between our estimate and the true location. The minimum expected distance error (MEDE) algorithm returns an estimate that minimizes the EDE.
¹ The material in this chapter is based in part on the work in [1].

Figure 2.1: Localization Algorithms. (Block diagram: the distribution of observations, the prior distribution, and the observation vector are input to a localization algorithm, which outputs an estimate.)

P(d): P(d) indicates the probability that the receiver location is within a distance of d from our location estimate. P(d) is closely related to the metric D(p), which gives the radius at which an open ball around our location estimate yields a probability of at least p. The MP(d) algorithm returns an estimate that maximizes P(d). A small numerical illustration of these metrics follows Table 2.1.

As evidenced by Table 2.1, it is with striking regularity that one encounters a mismatch between an algorithm and the metric used for its evaluation. While there is hardly anything amiss in checking how an algorithm performs on a metric that it is not optimized for, it is shortsighted to draw a conclusion as to the efficacy of the said algorithm based on such an evaluation. An awareness of the metric that an algorithm is implicitly or explicitly optimized for is essential to its fair assessment. We believe that such a notion of consistent evaluation of algorithms across all important metrics of interest has been absent in the community so far. In addition, while the literature on localization abounds in algorithms that yield a location estimate, there is no unifying theory that relates them to each other with appropriate context.

For instance, [19] picks four algorithms for evaluation, independent of the metrics used to evaluate the algorithms. Such an approach makes it unclear whether an algorithm is optimal with respect to any of the given metrics. In this case, we can only make (empirical) inferences regarding the relative ordering of the chosen algorithms among the chosen metrics. Consequently, there are no theoretical guarantees on algorithm performance, and it becomes hard, if not impossible, to accurately predict how a chosen algorithm will behave when evaluated with a metric that was not considered. Moreover, while error CDFs have been used earlier to evaluate localization algorithms [20–23], they are typically used to derive inferences about algorithm performance with respect to the Euclidean distance and D(p) metrics. In the absence of the unifying theory presented in this dissertation, it is unclear how one may draw meaningful conclusions regarding the relative performance of algorithms across various metrics based on their error CDFs. Our proposed unifying framework places the commonly employed subjective reading of error CDFs on a firm theoretical footing and enables a better understanding of algorithm performance than what was previously possible. Moreover, our framework is computationally tractable, as the optimization is typically done over a reasonably sized discrete set of possible locations.
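To make the metrics above concrete, the short sketch below estimates each of them by Monte Carlo from samples of an algorithm's distance error. The Rayleigh-distributed errors and all numbers are synthetic placeholders, not results from this dissertation.

```python
import numpy as np

# Synthetic Monte Carlo samples of the distance error D_A (in metres)
# of some hypothetical localization algorithm A.
rng = np.random.default_rng(1)
d_samples = rng.rayleigh(scale=1.5, size=10_000)

mse = np.mean(d_samples ** 2)         # MSE: E[D_A^2]
rmse = np.sqrt(mse)                   # RMSE, derived directly from the MSE
ede = np.mean(d_samples)              # EDE: E[D_A]
p_of_d = np.mean(d_samples <= 2.0)    # P(d): Pr[D_A <= d], here at d = 2 m
d_of_p = np.quantile(d_samples, 0.9)  # D(p): radius capturing probability p = 0.9

print(f"MSE = {mse:.2f} m^2, RMSE = {rmse:.2f} m, EDE = {ede:.2f} m")
print(f"P(2 m) = {p_of_d:.2f}, D(0.9) = {d_of_p:.2f} m")
```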
Table 2.1: Literature Survey on Localization Methods

Study | Algorithm | Model | Metric
WLAN location determination via clustering and probability distributions [24] | MLE | Fingerprinting | P(d)
Indoor Localization Without the Pain [13] | Genetic Algorithm | Log-Normal | D(p)
RADAR: An In-Building RF-based user location and tracking system [8] | Clustering | Fingerprinting | EDE
The Horus WLAN location determination system [15] | MLE | Fingerprinting | EDE
Locating in fingerprint space: Wireless indoor localization with little human intervention [12] | Clustering | Fingerprinting | EDE
Weighted centroid localization in Zigbee-based sensor networks [25] | RSSI weighted position | Free Space Path Loss | EDE
Sequence based localization in wireless sensor networks [9] | SBL | Free Space Path Loss | EDE
Best linear unbiased estimator algorithm for RSS based localization [7] | Best linear unbiased estimate | Log-Normal | MSE
Cooperative Received Signal Strength-Based Sensor Localization With Unknown Transmit Powers [26] | MLE | Log-Normal | MSE
Relative location estimation in wireless sensor networks [16] | MLE | Log-Normal | MSE
RSS-Based Wireless Localization via SDP: Noncooperative and Cooperative Schemes [27] | MLE | Log-Normal | MSE
A Study of Localization Accuracy Using Multiple Frequencies and Powers [22] | MP(d) | Log-Normal | MSE
Maximum likelihood localization estimation based on received signal strength [5] | MLE | Log-Normal | MSE
Distance Estimation from RSS under log-normal shadowing [28] | Best unbiased estimate | Log-Normal | MSE

Table 2.1 also indicates that there is considerable interest in the community in the EDE metric. However, it is interesting to note that none of the algorithms evaluated using that metric are explicitly optimized for it. In the following sections we show how such metrics fit into our framework. More importantly, it is our hope that thinking in terms of the framework below shall lead to a clearer understanding of the trade-offs involved in choosing an algorithm and a better specification of the criteria necessary for its adoption.

The optimization-based framework for localization presented here is inspired in part by the optimization-based approach to networking developed since the late 90's [29], which has shown successfully that efficient medium access, routing, and congestion control algorithms, protocols, and architectures can all be derived from suitably specified network utility maximization problems [30]. Moreover, Bayesian optimization has been gaining in popularity in recent years [31], including applications in cognitive radio networks [32,33], largely due to the increased availability of both abundant data and the computational power needed to process that data. Our proposed framework is poised to leverage both these trends.

The Bayesian structure of the localization problem, as presented here, bears similarities to the formulation of the simultaneous localization and mapping (SLAM) problem commonly employed by the robotics community [34,35]. In general, SLAM algorithms tackle the problem of incrementally building a map of the environment with a mobile robot, in addition to estimating the location of the robot within this constructed map. SLAM algorithms are particularly suited for tracking a mobile robot over time. However, this imposes additional assumptions, such as knowledge of a model that describes the motion of the robot.
While SLAM algorithms may be used to provide simple localization services, for instance by assuming that the robot is not mobile, the focus there is more on maintaining a consistent view of the posterior belief of the location over time, rather than on deriving an estimate of the location from such a belief. In contrast, in this dissertation we mainly focus on the derivation and evaluation of indoor localization algorithms, employing an optimization-based approach. Incorporating posterior belief updates, using Bayesian filters [36] or algorithms based on SLAM, to enable location tracking services within an optimization framework remains an area of future work.

In this chapter, we present a unifying optimization based approach to localization. We base our approach on a Bayesian view of parameter estimation that can be found in classical statistics and signal processing texts [37,38]. We adapt the general theory of Bayesian estimation to the indoor localization setting, pointing out the constraints and advantages this entails. We show how existing algorithms can be derived in this framework and point out how alternate algorithms may be derived.

2.2 Problem Setting

Let S ⊂ R² be the two-dimensional space of interest in which localization is to be performed². We assume that S is closed and bounded. Let the location of the receiver (the node whose location is to be estimated) be denoted as r = [x_r, y_r]. Using a Bayesian viewpoint [37,38], we assume that this location is a random variable with some prior distribution f_R(r). This prior distribution is used to represent knowledge about the possible position, obtained, for instance, from previous location estimates or from knowledge of the corresponding user's mobility characteristics in the space; in the absence of any prior knowledge, it could be set to be uniform over S.

Let o ∈ R^N represent the location dependent observation data that was collected. As an example, o could represent the received signal strength values from transmitters whose locations are known. Mathematically, we only require that the observation vector is drawn from a known distribution that depends on the receiver location r: f_O(o | R = r). In the case of RSS measurements, this distribution characterizes the stochastic radio propagation characteristics of the environment and the locations of the transmitters. Note that this distribution could be expressed in the form of a standard fading model whose parameters are fitted with observed data, such as the well-known simple path loss model with log-normal fading [39].

The distribution f_O(o | R = r) is general enough to incorporate more data-driven approaches such as the well-known fingerprinting procedure. In fingerprinting, there is a training phase in which statistical measurements are obtained at the receiver at various known locations and used to estimate the distribution of received signal strengths at each location³. Fundamentally, the data-driven approach constructs f_O(o | R = r) empirically, while model-dependent approaches take the distribution over observations directly from the model.

² It is trivial to extend the framework to 3-D localization; for simplicity, we focus on the more commonly considered case of 2-D localization here.
³ We note that in many implementations of fingerprinting, only the mean received signal strength from each transmitter is used, which of course is a special case, equivalent to assuming a deterministic signal strength measurement with a unit step function cumulative distribution function.
Using the conditional distribution of the observed vector and the prior over R, we obtain the conditional distribution over the receiver locations using Bayes' rule:

f_R(r | O = o) = f_O(o | R = r) f_R(r) / ∫_{r∈S} f_O(o | R = r) f_R(r) dr.    (2.1)

Algorithms for localization are essentially methods that derive a location estimate from the above posterior distribution. In fact, any localization algorithm A is a mapping from

- the observation vector o,
- the prior distribution over the location, f_R(r), and
- the conditional distribution over o, f_O(o | R = r),

to a location estimate r̂, as illustrated in Figure 2.1. A visualization of the posterior distribution for the popular simple path loss model with log-normal fading is given in Figure 2.3.

2.3 Optimization based approach to Localization

The starting point for estimating the receiver location is a cost function that must be defined a priori. In the most general terms, the cost function is modeled as C(r, r̃, o), i.e., a function of the true location r, a given proposed location estimate r̃, and the observation vector o. We define the expected cost function given an observation vector as follows:

E[C(r, r̃, o)] = ∫_{r∈S} C(r, r̃, o) f_R(r | O = o) dr.    (2.2)

Given any cost function C, the optimal location estimation algorithm can be obtained in a unified manner by solving the following optimization for any given observation vector to obtain the optimal estimate r̂:

r̂ = argmin_{r̃} E[C(r, r̃, o)].    (2.3)

Note that this optimization may be performed to obtain an arbitrarily near-optimal solution by numerically computing E[C(r, r̃, o)] over a discretization of a two or three dimensional search space. Given recent gains in computing power, the optimization is feasible for typical indoor localization problems. Moreover, the optimization naturally lends itself to parallel execution, since the computations of the expected cost at the candidate locations are independent of each other. Assuming uniform coverage, the solution improves upon increasing the number of points in our search space. In practice, for RSS localization, these points could be spaced apart on the order of tens of centimeters.

Figure 2.2: An illustration of the distribution of true location given the observations, and the locations that correspond to different optimizations. In this example, we show the different estimates returned by the various localization algorithms (MAP, MMSE, MEDE, and MP(1.5)) for a unimodal, asymmetric posterior probability density function of the form f(x) = (4/5)(1 + x) for −1 ≤ x < 0, and f(x) = (4/5)(1 − x²) for 0 ≤ x ≤ 1. The asymmetry of the distribution function pulls estimates other than MAP away from the mode, with MP(d) being the most affected. In this example, it may be shown that for d ≤ 1.5 the MP(d) estimate is x̂_MP(d) = d/6; thus the MP(d) estimate moves closer to the MAP estimate with decreasing d. This example serves to illustrate how differing optimization objectives can yield very different estimates for the same posterior distribution, thereby underlining the importance of deciding on an optimization objective upfront.
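As a minimal end-to-end sketch of this discretized computation, the code below builds the posterior (2.1) on a grid of candidate locations under a simple log-normal path loss model and then solves (2.3) for the expected distance error cost, yielding the MEDE estimate. The grid size, transmitter placement, path loss parameters, and true location are all illustrative assumptions, not the settings used in the evaluations reported later.

```python
import numpy as np

# Hypothetical setup: a 16 m x 16 m space discretized into a grid of
# candidate locations, with four transmitters at known corner positions.
GRID = 40
xs = np.linspace(0.0, 16.0, GRID)
cand = np.array([[x, y] for x in xs for y in xs])    # candidate locations r
tx = np.array([[0.0, 0.0], [0.0, 16.0], [16.0, 0.0], [16.0, 16.0]])

P0, ETA, SIGMA = -40.0, 3.0, 6.0  # assumed path loss parameters

def mean_rss(points):
    """Mean RSS (dBm) at each point from each transmitter (simple path loss)."""
    d = np.linalg.norm(points[:, None, :] - tx[None, :, :], axis=2).clip(min=0.1)
    return P0 - 10.0 * ETA * np.log10(d)

def posterior(o):
    """Posterior f_R(r | O = o) on the grid, per (2.1), with a uniform prior."""
    # Log-normal shadowing: the RSS in dB is Gaussian around the path loss mean.
    loglik = -0.5 * np.sum(((o - mean_rss(cand)) / SIGMA) ** 2, axis=1)
    w = np.exp(loglik - loglik.max())  # subtract the max for numerical stability
    return w / w.sum()

def mede_estimate(o):
    """Solve (2.3) by grid search for the cost C(r, r~, o) = ||r~ - r||_2."""
    post = posterior(o)
    dists = np.linalg.norm(cand[:, None, :] - cand[None, :, :], axis=2)
    expected_cost = dists @ post       # E[C] for every candidate estimate r~
    return cand[np.argmin(expected_cost)]

# Example: a noisy observation vector generated at a true location of (4, 10).
rng = np.random.default_rng(0)
o = mean_rss(np.array([[4.0, 10.0]]))[0] + rng.normal(0.0, SIGMA, size=4)
print("MEDE estimate:", mede_estimate(o))
```

Swapping the distance matrix for its elementwise square gives the MMSE estimate, and thresholding it at a radius d gives MP(d), so the same grid computation covers each cost function considered in this framework.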
Existing algorithms such as MLE, MMSE, MEDE and MP(d) can be recovered in this framework using suitable choices of the cost function C. For instance, it is straightforward to verify that minimizing the expected distance error yields MEDE. Perhaps more interestingly, the MLE estimate can also be recovered using an appropriate distance based cost function, as can be seen by employing Theorem 1 with a uniform prior. Table 2.2 lists the choice of cost function and the optimization problem to be solved that results in each of these algorithms, where ε denotes an arbitrarily small radius. Figure 2.2 provides an example of how these different optimizations can yield very different location estimates.

Table 2.2: Recovering existing algorithms in our framework

Algorithm | Cost function | Optimization
MMSE | C(r, r̃, o) = ‖r̃ − r‖₂² | r_MMSE = argmin_{r̃} E[‖r̃ − r‖₂²]
MEDE | C(r, r̃, o) = ‖r̃ − r‖₂ | r_MEDE = argmin_{r̃} E[‖r̃ − r‖₂]
MP(d) | C(r, r̃, o) = P(‖r̃ − r‖₂ ≤ d) | r_MP(d) = argmax_{r̃} P(‖r̃ − r‖₂ ≤ d)
MLE | C(r, r̃, o) = P(‖r̃ − r‖₂ ≤ ε) | r_MLE = lim_{ε→0} argmax_{r̃} P(‖r̃ − r‖₂ ≤ ε)

The most compelling aspect of this unified optimization-based approach to localization is its generality. Being Bayesian in nature, it can incorporate both model and data-driven approaches to characterizing the radio environment in a given space, and can accommodate prior information in a natural way (as such, it is also highly compatible with location tracking approaches that use Bayesian filtering). In addition, the framework gets better over time, as more observations or inputs help improve the prior. While we present and evaluate our framework using RSS measurements for ease of exposition, it is not limited to such measurements. Other modalities such as ToA, TDoA and AoA [40,41] are easily incorporated as well.

Figure 2.3: Illustration of the posterior distribution of R using observations taken from a log-normal distribution. For this illustration, four transmitters were placed in a 16 m × 16 m area. A log-normal path loss model was used to determine the signal strengths. Each subplot shows the posterior distribution of R constructed by the receiver upon receiving a different vector of observations.

Chapter 3

A Unifying Framework For Evaluating Localization Algorithms¹

In addition to the previously defined unifying framework, we also propose the use of the distance error CDF as a unified way of evaluating localization algorithms. For a localization algorithm, say A, the L₂ (Euclidean) distance between the estimate (r̂_A) and the true location (r) is represented by the random variable D_A. Note that

D_A = ‖r̂_A − r‖₂.    (3.1)

The CDF of D_A, also termed the error CDF of algorithm A, may be characterized by averaging the probability that the true location lies within a certain distance, say d, of our estimate over all possible receiver locations. This notion is defined below.

Definition 1. Let A ∈ 𝒜, where 𝒜 denotes the set of all localization algorithms. Denote by r̂_A a location estimate returned by the algorithm A. Then the error CDF of A is a monotonically increasing function F_A : Q ⊆ R≥0 → [0, 1] such that

F_A(d) = ∫_{r∈S} P[D_A ≤ d] f_R(r) dr.    (3.2)

¹ The material in this chapter is based in part on the work in [1].

Let d* be the maximum distance between any two points in S. Then Q is the closed interval [0, d*].
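The error CDF of Definition 1 can be estimated by direct Monte Carlo, as in the sketch below: draw true locations from the prior, draw observations, run the algorithm, and tabulate the distance errors. The one-dimensional space, the additive-noise observation model, and the placeholder algorithm are toy assumptions chosen only to keep the example short.

```python
import numpy as np

rng = np.random.default_rng(2)

def algorithm_a(o):
    # Placeholder algorithm: return the noisy observation as the estimate.
    return o

errors = []
for _ in range(20_000):
    r = rng.uniform(0.0, 10.0)               # r ~ f_R, a uniform prior over S
    o = r + rng.normal(0.0, 1.0)             # o ~ f_O(. | R = r)
    errors.append(abs(algorithm_a(o) - r))   # D_A = ||r_hat_A - r||
errors = np.sort(np.array(errors))

def F_A(d):
    """Empirical error CDF: the fraction of trials with D_A <= d."""
    return np.searchsorted(errors, d, side="right") / errors.size

print([round(F_A(d), 3) for d in (0.5, 1.0, 2.0)])
```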
Using the error CDF, we may meaningfully define an ordering over the class of localization algorithms using the concept of stochastic dominance.

Definition 2. Let A₁, A₂ ∈ 𝒜. We say that A₁ stochastically dominates A₂ if

F_{A₁}(d) ≥ F_{A₂}(d)  for all d ∈ Q.    (3.3)

Definition 3. Let A₁, A₂ ∈ 𝒜. We say that A₁ strictly stochastically dominates A₂ if, in addition to equation (3.3), there exist d₁, d₂ ∈ Q such that d₁ < d₂ and

F_{A₁}(d) > F_{A₂}(d)  for all d ∈ [d₁, d₂].    (3.4)

3.1 Distance based Cost Functions

We now restrict our attention to an important class of metrics: those cost functions that are monotonically increasing in the distance between the true and estimated positions. We show that localization algorithms in fact form a partially ordered set with respect to this important class of metrics. We also show that localization algorithms derived using the optimization-based approach for these metrics lie essentially on a "Pareto boundary" of the set of all localization algorithms. The localization cost function, formally defined below, generalizes most metrics commonly used in the localization literature.

Definition 4. Let g : Q ⊆ R≥0 → R≥0 be a monotonically increasing function. Denote the set of all such functions by 𝒢. For a localization algorithm A, g(D_A) is the distance error localization cost function, and E[g(D_A)] is the expected cost of the algorithm A.

We use the above notion of expected cost as a metric to compare different localization algorithms. Note that this cost function is a special case of the more general cost function introduced in the previous section: here we are only interested in cost functions that depend on the distance between the true location and our estimate. Many localization algorithms of interest optimize some distance based cost function, either explicitly or implicitly. We have seen already that MMSE, MEDE and MP(d) have distance based cost functions. Although perhaps not immediately apparent, MAP may also be computed using a distance based cost function. Specifically, we may retrieve the MAP estimate using MP(d) with an adequately small radius d, as shown in the theorem below and also borne out by our evaluation results.

Theorem 1. If the posterior distribution f_{R|O} is continuous and the MAP estimate lies in the interior of S, then for any ε > 0 there exists δ > 0 such that

|P₁(d) − P₂(d)| ≤ ε  for all 0 < d < δ,    (3.5)

where P₁(d) and P₂(d) give the probability of the receiver location being within distance d of the MAP and MP(d) estimates, respectively.

Proof. For ease of notation, denote the posterior distribution f_{R|O} by f. Define the open ball of radius d > 0 around r as

B_d(r) = {x ∈ S : ‖x − r‖₂ < d}.    (3.6)

We denote the interior of S by S°. Let r₁ ∈ S° be the MAP estimate. Pick any ε > 0 and define

ε′ = ε / ∫_{r∈S} dr.    (3.7)

From the continuity of f, there exists δ > 0 such that

f(r₁) − f(r) < ε′  for all ‖r₁ − r‖₂ < δ.    (3.8)

Consequently, for any 0 < d < δ we have

∫_{r∈B_d(r₁)} (f(r₁) − ε′) dr ≤ ∫_{r∈B_d(r₁)} f(r) dr.    (3.9)

Let r₂ be the MP(d) estimate. By definition,

∫_{r∈B_d(r₁)} f(r) dr ≤ ∫_{r∈B_d(r₂)} f(r) dr.    (3.10)

Since r₁ is the MAP estimate,

∫_{r∈B_d(r₂)} f(r) dr ≤ ∫_{r∈B_d(r₂)} f(r₁) dr.    (3.11)

Combining the above inequalities, we conclude

(f(r₁) − ε′) ∫_{r∈B_d(r₁)} dr ≤ P₁(d) ≤ P₂(d) ≤ f(r₁) ∫_{r∈B_d(r₂)} dr.    (3.12)

Note that the difference between the upper and lower bounds in the above inequality is at most ε. Consequently, the difference |P₁(d) − P₂(d)| is also bounded by ε:

|P₁(d) − P₂(d)| ≤ ε  for all 0 < d < δ.    (3.13)
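Theorem 1 is easy to check numerically: on a discretized posterior, the MP(d) estimate approaches the posterior mode (the MAP estimate) as d shrinks. The skewed one-dimensional density below is an arbitrary stand-in, not a posterior from this dissertation.

```python
import numpy as np

# A toy posterior on [0, 1], proportional to x (1 - x)^4, with mode at x = 0.2.
x = np.linspace(0.0, 1.0, 2001)
post = x * (1.0 - x) ** 4
post /= post.sum()

def mp_estimate(d):
    """MP(d): the candidate c maximizing P(|X - c| <= d) under the posterior."""
    coverage = [post[np.abs(x - c) <= d].sum() for c in x]
    return x[int(np.argmax(coverage))]

print("MAP:", x[int(np.argmax(post))])
for d in (0.2, 0.1, 0.05, 0.01):
    print(f"MP({d}):", mp_estimate(d))
```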
3.2 Comparing Algorithms using Stochastic Dominance

In this section, we explore how we may meaningfully compare algorithms using our optimization based framework. If we are interested in a particular cost function, then comparing two algorithms is straightforward: compute their expected costs, and the algorithm with the lower cost is better. However, with stochastic dominance we can deduce something more powerful. For any two localization algorithms A₁ and A₂, if A₁ stochastically dominates A₂, then the expected cost of A₁ does not exceed that of A₂ for any distance based cost function. More formally:

Theorem 2. For any two localization algorithms A₁, A₂ ∈ 𝒜, if A₁ stochastically dominates A₂, then

E[g(D_{A₁})] ≤ E[g(D_{A₂})]  for all g ∈ 𝒢.    (3.14)

If A₁ strictly stochastically dominates A₂, then

E[g(D_{A₁})] < E[g(D_{A₂})]  for all g ∈ 𝒢.    (3.15)

Proof. Recall that the domain of the cost function is given by Q ⊆ R≥0. For any element p in the range of the cost function, the inverse image of p is given by

g⁻¹(p) = {d ∈ Q : g(d) = p}.    (3.16)

Let h(p) = inf g⁻¹(p). From the monotonicity of g(d), we conclude that h(p) is monotonically increasing in p. Moreover, we have the following relation for any algorithm A ∈ 𝒜:

P(g(D_A) ≤ p) = P(D_A ≤ h(p)).    (3.17)

This allows us to characterize the CDF of g(D_A) as F_A(h(g(d_A))). Moreover, since cost functions are non-negative, g(D_A) ≥ 0. Thus the expected value of g(D_A) for any algorithm A may be expressed as

E[g(D_A)] = ∫₀^{sup(Q)} (1 − F_A(h(g(d_A)))) dd_A.    (3.18)

Consider any two algorithms A₁ and A₂ such that A₁ stochastically dominates A₂. From (3.3), we have

F_{A₁}(h(g(d))) ≥ F_{A₂}(h(g(d)))  for all d ∈ Q.    (3.19)

From (3.18) and (3.19), we conclude

E[g(D_{A₁})] ≤ E[g(D_{A₂})],    (3.20)

which proves (3.14). If A₁ strictly dominates A₂, then in addition to (3.19), A₁ and A₂ satisfy

F_{A₁}(h(g(d))) > F_{A₂}(h(g(d)))  for all d ∈ [d₁, d₂].    (3.21)

From (3.18), (3.19) and (3.21), we have

E[g(D_{A₁})] < E[g(D_{A₂})].    (3.22)

Theorem 2 is the first step towards ranking algorithms based on stochastic dominance. It also gives us a first glimpse of what an optimal algorithm might look like: from Theorem 2, an algorithm that stochastically dominates every other algorithm is clearly optimal for the entire set of distance based cost functions. However, it is not obvious that such an algorithm need even exist.
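A quick numerical illustration of Theorem 2, using two synthetic error distributions where the first is stochastically smaller than the second: the dominating algorithm's expected cost is lower for every monotone cost function tried.

```python
import numpy as np

rng = np.random.default_rng(3)
d1 = rng.rayleigh(1.0, 50_000)  # distance errors of algorithm A1
d2 = rng.rayleigh(1.5, 50_000)  # distance errors of A2 (stochastically larger)

grid = np.linspace(0.0, 10.0, 200)
F1 = np.array([(d1 <= d).mean() for d in grid])
F2 = np.array([(d2 <= d).mean() for d in grid])
print("A1 dominates A2 on the grid:", bool(np.all(F1 >= F2)))

for name, g in [("g(D) = D", lambda d: d),
                ("g(D) = D^2", lambda d: d ** 2),
                ("g(D) = sqrt(D)", np.sqrt)]:
    print(name, "=> E[g(D_A1)] < E[g(D_A2)]:", g(d1).mean() < g(d2).mean())
```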
On the other hand, we can compute algorithms that are optimal with respect to a particular cost function. As given in the following theorem, such optimality implies that the algorithm is not strictly dominated by any other algorithm. In other words, if algorithm A is optimal with respect to a distance based cost function g, then A is not strictly stochastically dominated by any other algorithm B.

Theorem 3. For a localization algorithm A ∈ 𝒜, if there exists a distance based cost function g ∈ 𝒢 such that for any other localization algorithm B ∈ 𝒜

E[g(D_A)] ≤ E[g(D_B)],    (3.23)

then for every algorithm B ∈ 𝒜 there exists a distance d ∈ Q such that

F_A(d) ≥ F_B(d).    (3.24)

Proof. Say A ∈ 𝒜 is the optimal algorithm for the distance based cost function g ∈ 𝒢. Further assume that for some algorithm B ∈ 𝒜,

F_A(d) ≤ F_B(d)  for all d ∈ Q,    (3.25)

and that there exists some d₀ ∈ Q such that F_A(d₀) < F_B(d₀). If these conditions implied that A is strictly dominated by B, then by Theorem 2 we would have

E[g(D_A)] > E[g(D_B)].    (3.26)

However, from the optimality of A with respect to g,

E[g(D_A)] ≤ E[g(D_B)],    (3.27)

which contradicts (3.26). To complete the proof of Theorem 3, we need to show the strict dominance of B over A. Since the error CDFs are right continuous, for any ε > 0 there exists a δ > 0 such that

F_B(d) − F_B(d₀) < ε  for all d ∈ [d₀, d₀ + δ],    (3.28)

and

F_A(d) − F_A(d₀) < ε  for all d ∈ [d₀, d₀ + δ].    (3.29)

Choosing ε < (F_B(d₀) − F_A(d₀))/2 implies

F_B(d) > F_A(d)  for all d ∈ [d₀, d₀ + δ],    (3.30)

which proves that B strictly dominates A.

Theorems 2 and 3 establish the utility of ranking algorithms based on stochastic dominance. However, given two algorithms, it is not necessary that one should dominate the other. As Theorem 4 shows, if they do not conform to a stochastic dominance ordering, the algorithms are incomparable.

Theorem 4. For any two localization algorithms A₁ and A₂, if A₂ does not stochastically dominate A₁ and vice versa, then there exist distance based cost functions g₁, g₂ ∈ 𝒢 such that

E[g₁(D_{A₁})] < E[g₁(D_{A₂})],    (3.31)

and

E[g₂(D_{A₂})] < E[g₂(D_{A₁})].    (3.32)

Proof. Since A₂ does not stochastically dominate A₁, there exists some d₁ ∈ Q such that

F_{A₁}(d₁) > F_{A₂}(d₁).    (3.33)

Since A₁ does not stochastically dominate A₂, there exists some d₂ ∈ Q such that

F_{A₁}(d₂) < F_{A₂}(d₂).    (3.34)

Define the cost functions g₁ and g₂ as

g₁(D) = 0 if D ≤ d₁, and 1 if D > d₁,    (3.35)

and

g₂(D) = 0 if D ≤ d₂, and 1 if D > d₂.    (3.36)

Thus,

E[g₁(D_{A₁})] = 1 − F_{A₁}(d₁)    (3.37)
             < 1 − F_{A₂}(d₁) = E[g₁(D_{A₂})],    (3.38)

where the inequality follows from (3.33). Similarly,

E[g₂(D_{A₂})] = 1 − F_{A₂}(d₂)    (3.39)
             < 1 − F_{A₁}(d₂) = E[g₂(D_{A₁})],    (3.40)

where the inequality in the second step follows from (3.34).

Theorem 4 establishes the existence of a "Pareto boundary" of the set of all localization algorithms. Choosing an algorithm from within this set depends on additional considerations, such as its performance on specific cost functions of interest.

3.3 Comparison based on Upper Bound of Error CDFs

In the previous section, we focused on using stochastic dominance to rank and compare algorithms, without paying much attention to what an ideal algorithm might look like. In this section, we explore this topic in more detail. To begin, we ask: does there exist an algorithm that dominates every other algorithm? From Theorem 2 we know that such an algorithm, if it exists, will be the best possible algorithm for the class of distance based cost functions. Moreover, the error CDF of such an algorithm will be an upper bound on the error CDFs of all algorithms A ∈ 𝒜.

Definition 5. We denote by F* the upper envelope of the error CDFs of all possible algorithms A ∈ 𝒜.

We now turn our attention to formally defining the error bound F*. Our definition also provides us with a way to compute F*. Let D_A represent the distance error for algorithm A, and consider the following class of MP(d) cost functions. For each d ∈ Q, let

g_d(D) = 0 if D ≤ d, and 1 if D > d.    (3.41)

Then the value of F* at any distance d ∈ Q may be computed using the MP(d) cost function at that distance. More formally:

Definition 6. The upper envelope F* of the error CDFs of all possible algorithms A ∈ 𝒜 is defined as

F*(d) = sup_{A∈𝒜} {1 − E[g_d(D_A)]}  for all d ∈ Q.    (3.42)

The upper envelope of error CDFs, F*, satisfies the following properties:

1. F* stochastically dominates every algorithm A ∈ 𝒜;
2. F* is monotonically increasing on [0, d*];
3. F* is Riemann integrable over [0, d*].

The monotonicity of F* is a direct consequence of the monotonicity of CDFs. Moreover, since F* is monotonic, it is also Riemann integrable [42, p. 126].
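Conditioned on a single observation, the envelope of Definition 6 can be computed directly, since for each d the supremum in (3.42) is attained by the MP(d) estimate. A minimal sketch on a toy one-dimensional posterior (the same arbitrary stand-in used earlier, not a distribution from this dissertation):

```python
import numpy as np

x = np.linspace(0.0, 1.0, 1001)
post = x * (1.0 - x) ** 4   # toy posterior
post /= post.sum()

def f_star(d):
    """F*(d): the best coverage probability any estimate can achieve at radius d."""
    return max(post[np.abs(x - c) <= d].sum() for c in x)

for d in (0.05, 0.1, 0.2, 0.4):
    print(f"F*({d}) = {f_star(d):.3f}")
```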
In general, F* may not be attainable by any algorithm. However, as we show below, it is achievable under certain circumstances, which lends credence to its claim as a useful upper bound that may serve as a basis of comparison of localization algorithms. Given the ideal performance characteristics of F*, it is worthwhile to investigate whether it is ever attained by an algorithm. A trivial case is when the MP(d) algorithm yields the same estimate for all distances of interest in the domain. In this particular case, MAP and MP(d) are optimal, since the error CDF of the MAP or MP(d) estimate traces F*. As an illustration, consider a continuous, symmetric, unimodal posterior distribution over a circular space with the mode located at the center of the circle. Clearly, the MAP estimate is given by the center. Moreover, the MP(d) estimate is the same at all distances, namely the center of the circle. Thus we immediately have that both the MAP and MP(d) estimates attain F*. An extensive discussion of the attainability of F* can be found in [18].

Thus we see that there exist conditions under which F* is attained by an algorithm. Consequently, it is worthwhile to search for algorithms that come close to this bound, or even attain it under more general settings. This leads us directly to the second method of comparing algorithms: we identify how close the error CDFs of our algorithms get to the upper bound F*. Consider algorithms A, B ∈ 𝒜. Intuitively, if the error CDF of A is closer to F* than that of B, then it seems reasonable to expect A to perform better. To make this idea precise, we need to define our measure of closeness to F*. Below, we propose one such measure of how close the error CDF of an algorithm A is to F*. Our proposal satisfies a nice property: searching for an algorithm that is optimal under this measure is equivalent to searching for an algorithm that minimizes a particular distance based cost function. Consequently, to specify the algorithm we only need to identify this cost function.

Definition 7. The area between the error CDF of algorithm A and the upper envelope of error CDFs is given by

Δ_A = ∫₀^{d*} (F*(x) − F_A(x)) dx.    (3.43)

The intuition behind our measure can be summarized easily: we seek an algorithm A that minimizes the "area enclosed" by F* and the error CDF of A. Note that Δ_A ≥ 0 for all A ∈ 𝒜. In general, it is not clear that every measure of closeness between F* and F_A will yield a cost function for us to minimize. However, if we do find such a cost function, then we have the advantage of not needing to explicitly know F* in the execution of our algorithm. This is the case for Δ_A, as proved in the theorem below.

Theorem 5. The algorithm that minimizes the area between its error CDF and the upper envelope of the error CDFs of all possible algorithms is the MEDE algorithm.

Proof. As F* dominates every algorithm, the area under the CDF curve of any algorithm is no greater than the area under F*. Consequently, minimizing Δ_A is equivalent to maximizing the area under the CDF curve of the algorithm.
Each subplot refers to the distribution of signal strength from a unique access point. The left subplots refer to measurements taken at night, while the right subplots refer to measurements taken during daytime. Our framework makes better use of the higher-variance data.

Thus an algorithm that minimizes $\Delta_A$ minimizes $\mathbb{E}[D_A]$ as well, and vice versa. More formally, for all $A \in \mathcal{A}$,

$\Delta_A = \int_0^{d^*} \left(F^*(x) - F_A(x)\right) dx$   (3.44)
$= \int_0^{d^*} \left(1 - F_A(x)\right) dx + \int_0^{d^*} \left(F^*(x) - 1\right) dx$   (3.45)
$= \mathbb{E}[D_A] + \kappa$,   (3.46)

where the last equality follows from the fact that the random variable $D_A$ is non-negative, and $\kappa$ is the constant given by $\kappa = \int_0^{d^*} \left(F^*(x) - 1\right) dx$, which does not depend on $A$.

As a consequence of Theorem 5, we note that if $F^*$ is attainable by any localization algorithm, then it is attained by MEDE. This in turn yields a simple test for ruling out the existence of an algorithm that attains $F^*$: on plotting the error CDFs of different algorithms, if we find an algorithm that is not dominated by MEDE, then we may conclude that $F^*$ is unattainable. Thus it is relatively easy to identify cases where there is a gap between $F^*$ and MEDE. However, confirming that MEDE has attained $F^*$ is more difficult, as it involves a search over the set of all algorithms.

In summary, the utility of $F^*$ lies in its ability to pinpoint the strengths and weaknesses of a proposed algorithm. As we have seen, some algorithms, such as MP(d), are designed to do well at specific distances, while others, such as MEDE, aim for satisfactory performance at all distances. Other algorithms lie somewhere in between. Consequently, choosing one algorithm over another depends on the needs of the application utilizing the localization algorithm. Therein lies the strength of our proposed framework: it allows us to effectively reason about the applicability of an algorithm for the use case at hand.

3.4 Comparison with the Cramér–Rao Bound

The Cramér–Rao Bound (CRB) is often used as an aid in evaluating localization algorithms [16,17,28,43]. However, the CRB is not necessarily a good choice as an absolute measure of performance for every localization algorithm. Let $D_A$ denote the distance error corresponding to an algorithm $A$. Define $d_A = \mathbb{E}[D_A]$. Since the ideal distance error is 0, $d_A$ is the bias of algorithm $A$. The expected mean squared error of algorithm $A$ may then be expressed as

$\mathbb{E}[D_A^2] = \mathbb{E}\left[(D_A - d_A)^2\right] + d_A^2$   (3.47)
$= \mathrm{Var}[D_A] + (\mathrm{Bias}[D_A])^2$.   (3.48)

In our setting, the CRB is a lower bound on $\mathrm{Var}[D_A]$. Thus an algorithm that attains the CRB is optimal only if (i) our objective is to minimize the mean squared error, and (ii) the class of algorithms under consideration is unbiased. In our framework, the MMSE algorithm given in Table 2.2 shares the same objective as that of an algorithm evaluated against the CRB. However, the MMSE algorithm considers both variance and bias simultaneously, allowing for estimates that have a small, non-zero bias combined with a small variance.

3.5 Numerical Results

We evaluate the proposed framework using simulations, traces, and real-world experiments. In Section 3.5.1, we provide an illustration of how fingerprinting methods fit into our Bayesian optimization framework, using real-world data collected from an indoor office environment. We show that while many implementations of fingerprinting use only the mean signal strength from each transmitter, we are able to better utilize the collected data by building an empirical distribution of the received observations.
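A minimal sketch of this empirical construction follows (the histogram binning, Laplace smoothing, and helper names are illustrative choices of ours, not the exact procedure of Chapter 2):

```python
import numpy as np

def fit_empirical_likelihood(train_rss, bins):
    """Histogram estimate of f_O(o | R = r), per location and per access point.

    train_rss: dict mapping location id -> array of shape (samples, n_aps)
    bins:      1-D array of RSS bin edges shared by all access points
    """
    models = {}
    for loc, samples in train_rss.items():
        probs = []
        for ap in range(samples.shape[1]):
            counts, _ = np.histogram(samples[:, ap], bins=bins)
            probs.append((counts + 1) / (counts.sum() + len(counts)))  # smoothing
        models[loc] = np.array(probs)  # shape (n_aps, n_bins)
    return models

def log_likelihood(models, bins, observation):
    """log f_O(o | R = r) for each candidate location, assuming independent APs."""
    idx = np.clip(np.digitize(observation, bins) - 1, 0, len(bins) - 2)
    return {loc: np.log(p[np.arange(len(idx)), idx]).sum()
            for loc, p in models.items()}
```

Given more training data, the histogram estimate sharpens, which is the convergence property exploited below.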
Figure 3.2: Comparison of error CDFs for different localization algorithms. Traditional fingerprinting based on matching the mean signal strengths is indicated by the moniker 'FING'. Each subplot indicates the algorithm performance for a different set of test data. The left subplots refer to the case when the signal strengths are tightly clustered around the mean, while the right subplots refer to the measurements with more variance.

We also evaluate the performance of the MLE, MP(d), MMSE, and MEDE algorithms using simulations as well as using traces [11]. In both cases the signal propagation was modelled using a simplified path loss model with log-normal shadowing [39]. We assume that the prior distribution ($f_R$) is uniform over $\mathcal{S}$.

Figure 3.3: Variation of distance error with the size of the training data set. For each fraction of the original data set, we compute the distance error for 100 random choices of the data points. The regression line for each algorithm is also shown. The results indicate that our empirical estimate of the prior improves with increasing training data, which results in better algorithm performance.

3.5.1 Fingerprinting Methods

Model-based methods assume that the distribution of observations given a receiver location is known. In contrast, fingerprinting methods avoid the need to model the distribution of observations by noting that one only needs to identify the change in the distribution of observations from one location to another. Most implementations simplify matters even further by assuming that the mean of the observations is distinct across different locations in our space of interest. The estimated mean is thus said to 'fingerprint' the location.

This approach works well only in cases when the distribution of signal strengths is mostly concentrated around the mean. In this case, the approach of using only the mean signal strength amounts to approximating the signal strength distribution with a normal distribution centered at the estimated mean signal strength and with variance approaching zero. However, if the distribution has significant variance, this approach is likely to fail. Indeed, in the regime of significantly varying signal strengths, keeping only the mean amounts to throwing away much of the information that one has already taken pains to collect.

As already indicated in Chapter 2, we make better use of the collected data by empirically constructing the distribution of observations $f_O(o|R=r)$. This formulation allows us to use the same algorithms as in the model-based approach, as the only difference here is in the construction of $f_O(o|R=r)$. This is in contrast to many existing implementations, where one resorts to heuristics such as clustering. Indeed, under mild assumptions it is well known that the empirical distribution converges with probability one to the true distribution [44], which gives our approach the nice property that it can always do better given more data.
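The traditional baseline just described, matching the test vector to the nearest mean fingerprint, can be sketched in a few lines (hypothetical data layout; this is the rule used by 'FING' in Figure 3.2):

```python
import numpy as np

def fit_fingerprints(train_rss):
    """Mean-RSS fingerprint per location: dict loc -> (n_aps,) mean vector."""
    return {loc: samples.mean(axis=0) for loc, samples in train_rss.items()}

def fing_locate(fingerprints, observation):
    """Return the location whose mean fingerprint is closest in Euclidean distance."""
    return min(fingerprints,
               key=lambda loc: np.linalg.norm(fingerprints[loc] - observation))
```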
As a proof of concept, we compare the performance of our approach with that of traditional fingerprinting methods in two different settings. The data was collected from a 4 m x 2 m space inside an office environment. The space was divided into eight 1 m x 1 m squares, and signal strength samples were collected from the center of each square. Two hundred and fifty signal strength readings were collected for the ten strongest access points detected using the WiFi card on a laptop running Linux. The beacon interval for each access point was approximately 100 ms. The signal strength measurements were taken 400 ms apart. Two sets of data were collected, one at night and the other during the day. The measurements taken at night show that the observed signal strengths are highly concentrated around the mean, as can be seen from the left subplots of Figure 3.1. The measurements taken during daytime show slightly more variability, as can be seen in the right subplots of Figure 3.1.

Ten percent of the collected data is randomly chosen for evaluating algorithm performance. The remaining data is used to construct the empirical distribution $\hat{f}_O(o|R=r)$ from which the MMSE, MAP, and MEDE estimates are derived. It is also used to compute the mean signal strength vector, or fingerprint, for each location. For the algorithm denoted as 'FING' in Figure 3.2, the fingerprint closest to the test observation vector (in terms of Euclidean distance) is used to predict the location.

From the performance results given in Figure 3.2, we see that, as expected, the traditional fingerprinting approach works very well when the variability in the signal strength data is low. On the other hand, even with slight variability in the data, the estimates derived using our Bayesian framework outperform traditional fingerprinting.

To illustrate that our framework performs better given more observations over time, we investigate the variation of distance error with increasing size of the training data set. The data set with more variability was chosen for the purposes of this illustration. For each fraction of the original data set, we compute the distance error for 100 random choices of the data points, for the MAP, MEDE, and MMSE algorithms. Figure 3.3 shows how the average of these distance errors varies on increasing the size of the training data set. As can be seen from Figure 3.3, with increasing data we are able to better estimate the empirical distribution $\hat{f}_O(o|R=r)$ from which the MMSE, MAP, and MEDE estimates are derived, thereby resulting in better performance.

3.5.2 Simulation Model

Say $\{l_1, l_2, \ldots, l_N\}$ ($N > 2$) are the known positions of $N$ wireless transmitters. We assume each transmitter is located on a planar surface given by $\mathcal{S} = [0,l] \times [0,b]$, where $l, b \in \mathbb{R}_{>0}$. The locations of the transmitters are given by the two-dimensional vectors $l_i = (x_i, y_i) \in \mathcal{S}$ for all $i \in \{1, 2, \ldots, N\}$. We wish to estimate the receiver location, given by the vector $r = (x, y)$, from the received signal strengths. For a given transmitter-receiver pair, say $i$, the relationship between the received signal power ($P_r^i$) and the transmitted signal power ($P_t^i$) may be modelled by the simplified path loss model $P_r^i = P_t^i K \left[\frac{d_0}{d_i}\right]^{\gamma} W_i$, where the distance between the receiver and the $i$th transmitter is given by $d_i(r) = \sqrt{(x - x_i)^2 + (y - y_i)^2}$, and $W_i$ represents noise that is log-normally distributed, with zero mean and variance $\sigma^2$ in log scale. In log scale, the path loss model is given by

$P_r^i|_{\mathrm{dBm}} = P_t^i|_{\mathrm{dBm}} + K|_{\mathrm{dB}} - 10\gamma \log_{10}\frac{d_i}{d_0} + W_i|_{\mathrm{dB}}$,   (3.49)

where $K$ is a constant given by the gains of the receive and transmit antennas and possibly the frequency of transmission, and $d_0$ is a reference distance, taken to be 1 m.
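A minimal sketch of this observation model follows (the helper name, default parameter values, and the sign convention for $K$ are assumptions of ours for illustration):

```python
import numpy as np

def simulate_rssi(tx_pos, r, p_t_dbm=16.0, k_db=-39.13, gamma=3.93,
                  sigma=4.0, d0=1.0, rng=None):
    """Draw one RSSI vector at receiver location r under the log-normal model (3.49).

    tx_pos: (N, 2) array of transmitter locations
    r:      (2,) receiver location
    """
    rng = rng or np.random.default_rng()
    d = np.linalg.norm(tx_pos - r, axis=1)                   # d_i(r)
    mean_dbm = p_t_dbm + k_db - 10.0 * gamma * np.log10(d / d0)
    return mean_dbm + rng.normal(0.0, sigma, size=len(d))    # add shadowing W_i

# Example: four transmitters at the corners of a 50 m x 70 m area.
tx = np.array([[0, 0], [50, 0], [0, 70], [50, 70]], dtype=float)
obs = simulate_rssi(tx, np.array([20.0, 35.0]))
```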
In this setting, our estimation problem may be restated as follows. We are given measurements of the received signal strengths $P_r^1, P_r^2, \ldots, P_r^N$ from which we are to estimate the receiver location $r$. Thus, our observation vector $O$ may be written as

$O_i = P_r^i|_{\mathrm{dBm}} - P_t^i|_{\mathrm{dBm}} - K|_{\mathrm{dB}} = W_i|_{\mathrm{dB}} - 10\gamma \log_{10}\frac{d_i}{d_0}$,

for all $i \in \{1, \ldots, N\}$. In other words, the distribution of each observation is given by $O_i \sim \mathcal{N}\left(-10\gamma \log_{10}[d_i(r)], \sigma^2\right)$. Finally, the distribution of the observation vector $f_O(o|R=r)$ can be obtained from the above by taking the product of the individual observation pdfs.

3.5.3 Simulation and Trace Results

The parameters for the simulation were chosen to be identical to those of the traces. The dimensions of the area of interest ($\mathcal{S}$) were 50 m x 70 m. Sixteen transmitters were chosen randomly, and 100 RSSI readings were taken for each transmitter at 5 distinct receiver locations. The transmit power was kept constant at 16 dBm. The estimated model parameters were a path loss of K = 39.13 dB at reference distance $d_0$ = 1 m, fading deviation $\sigma$ = 16.16, and path loss exponent $\gamma$ = 3.93. We used two distances for the MP(d) algorithm: (i) $\epsilon$ = 0.5 m, which is relatively small, and (ii) $d$ = 3 m, which covers a more sizable area. The normalized performance results are presented in Table 3.1. Each row evaluates the performance of the indicated algorithm across different metrics, while each column demonstrates how different algorithms perform under the given metric.

Table 3.1: Normalized Performance Results

Simulations:
            Likelihood   P(eps)   P(d)     MSE      EDE
  MLE       1.0000       0.9222   0.8994   1.4359   1.1240
  MP(eps)   0.9808       1.0000   0.9165   1.3172   1.0997
  MP(d)     0.6963       0.7573   1.0000   1.2860   1.0857
  MMSE      0.6806       0.6980   0.8737   1.0000   1.0643
  MEDE      0.7247       0.7455   0.9080   1.1272   1.0000

Traces:
            Likelihood   P(eps)   P(d)     MSE      EDE
  MLE       1.0000       0.9989   0.9865   1.1583   1.1808
  MP(eps)   0.9976       1.0000   0.9881   1.1522   1.1785
  MP(d)     0.8213       0.8596   1.0000   1.2605   1.3760
  MMSE      0.9013       0.9171   0.9797   1.0000   1.1101
  MEDE      0.8529       0.8685   0.9517   1.1569   1.0000

The performance value for each metric is normalized by the performance of the best algorithm for that metric. Thus the fact that each algorithm performs best on the metric for which it is optimized is reflected in the ones along the diagonal of the table. As the algorithms presented here are each optimal for a specific cost function, the theory predicts that none of them is strictly stochastically dominated by any other algorithm. The results confirm this theoretical prediction. Moreover, choosing $\epsilon$ to be small results in the performance of MP($\epsilon$) being nearly identical to that of MLE, which is in line with what we expect from Theorem 1.

Intuitively, MP(d) tries to identify the region of a given radius that 'captures' most of the a posteriori pdf $f_R(r|O=o)$. Consequently, for a sufficiently small radius $d$, MP(d) will return a region that contains the MLE estimate. As a result, we are justified in thinking of the MP(d) algorithm as a generalization of MLE (or MAP, in case our prior is non-uniform). In practice, the value of $d$ to be used will be dictated by the needs of the application that makes use of the localization service; a grid-based sketch of the MP(d) rule is given after the figure below.

Figure 3.4: An illustration of stochastic dominance. We plot the error CDFs of MAP, MMSE, MEDE, and a naive baseline algorithm (NBS) that returns the location of the base station with the highest signal strength. For this illustration, three transmitters were placed evenly on a line and log-normal fading was assumed. The baseline algorithm is outperformed by MAP, MMSE, and MEDE, as demonstrated by their strict stochastic dominance over the baseline.
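The grid-based sketch referenced above (helper names are ours; the posterior is assumed given as weights on grid points):

```python
import numpy as np

def mp_d_estimate(grid, posterior, d):
    """MP(d): pick the grid point whose radius-d ball captures the most posterior mass.

    grid:      (n, 2) candidate locations
    posterior: (n,) posterior weights summing to 1
    d:         radius of interest in meters
    """
    # Pairwise distances between candidate estimates and posterior support points.
    dist = np.linalg.norm(grid[:, None, :] - grid[None, :, :], axis=2)
    mass = (dist <= d) @ posterior   # P(||r_hat - R|| <= d) for each candidate
    return grid[np.argmax(mass)]
```

Running this with a small d recovers an estimate near the posterior mode, matching the observation above that MP(d) generalizes MLE and MAP.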
Chapter 4

Distributionally Robust Localization Algorithms

4.1 Preliminaries

In this section, we give a brief overview of robust optimization methods and review the literature on robust localization. In general, robust methods aim to maintain satisfactory performance in the face of small variations from the model assumptions [45]. In this approach, we first specify an appropriate uncertainty set that captures the possible model variations we wish to consider. The robust solution maximizes its performance with respect to the worst possible model contained within this uncertainty set [46]. The heart of the problem lies in choosing a set that sufficiently models the complexities of the problem at hand while remaining computationally tractable.

Recent years have seen a dramatic increase in both the availability of data and the computational capability needed to process that data. Robust optimization methods have adapted to this trend by designing uncertainty sets that better utilize the data at hand [47,48]. In this chapter, we make use of these methods to develop data-driven distributionally robust solutions for the problem of indoor localization. While not addressing the problem of indoor localization, robust optimization methods have been employed in a similar setting [49], where objectives such as minimizing energy consumption were considered under distance uncertainty.

Many indoor positioning systems base their location estimation on a model-based probabilistic description of location-dependent observations, such as received signal strength (RSS) measurements [3]. Invariably, such a model makes assumptions about the observations and the environment that may not strictly hold in practice, giving rise to the need for robust solutions. We may develop different robust solutions depending on the model assumptions we wish to be robust against. We list below three common robustness targets that are most relevant to the indoor localization problem.

4.1.1 Robustness to Outliers

In this regime, we wish to be robust to arbitrary variation of a small subset of our observations. Such variation may be intentional, as in the case of an attack mounted against the localization infrastructure. Thus, this notion is equivalent to Secure Localization, where we aim to protect the localization system against arbitrary tampering with a subset of observations. The literature on robust indoor localization has focused mostly on this robustness target [50-58].

4.1.2 Robustness to Parameter Uncertainty

In this regime, we wish to be robust to small variations in the model parameters. In the context of indoor localization, this paradigm assumes that the distribution of the observation vectors is known. For instance, it is often assumed that the noise corrupting indoor received signal strength measurements has a log-normal distribution [59]. Consequently, we may wish to be robust to small variations in the mean and variance of these observations.

4.1.3 Distributional Robustness

In this regime, we aim to be robust to changes in the underlying distribution of our observation vector. We specify an appropriate class of distributions that our observations may belong to, and then optimize over the worst-case distribution from that class. Note that this is a generalization of robustness to parameter uncertainty.
Indeed, being robust to parameter uncertainty is equivalent to specifying that our distribution set includes only distributions of a specified form (say, the normal distribution), but with the distribution parameters (say, the mean and variance) taking values from within a specified set.

We focus our attention in this chapter on the commonly considered case of 2D localization. We begin by formulating indoor localization as a Bayesian optimization problem [18], which allows us to introduce our robustness requirements in a simple and natural manner. Let $\mathcal{S} \subset \mathbb{R}^2$ be the two-dimensional space of interest in which localization is to be performed. We assume that $\mathcal{S}$ is convex, closed, and bounded. Let the location of the receiver (the node whose location is to be estimated) be denoted as $r = [x_r, y_r]$. Using a Bayesian viewpoint, we assume that this location is a random vector with some prior distribution $f_R(r)$. This prior distribution is used to represent knowledge about the possible position obtained, for instance, from previous location estimates or from knowledge of the corresponding user's mobility characteristics in the space; in the absence of any prior knowledge, it could be set to be uniform over $\mathcal{S}$.

Let $o \in \mathbb{R}^N$ represent the location-dependent observation data that was collected. As an example, $o$ could represent the received signal strength values from transmitters whose locations are known. Mathematically, we only require that the observation vector is drawn from a distribution that depends on the receiver location $r$: $f_{O|R}(o|r)$. In the case of RSS measurements, this distribution characterizes the stochastic radio propagation characteristics of the environment and the locations of the transmitters. Note that this distribution could be expressed in the form of a standard fading model whose parameters are fitted with observed data, such as the well-known simple path loss model with log-normal fading [59].

Using the conditional distribution of the observed vector and the prior over $R$, we obtain the posterior distribution over the receiver locations using Bayes' rule:

$f_{R|O}(r|o) = \dfrac{f_{O|R}(o|r)\, f_R(r)}{\int_{r \in \mathcal{S}} f_{O|R}(o|r)\, f_R(r)\, dr}$.   (4.1)

Traditionally, algorithms for localization are methods that derive a location estimate from the above posterior distribution. In this view, a localization algorithm $A$ is a mapping from the observation vector $o$, the prior distribution over the location, $f_R(r)$, and the conditional distribution over $o$, $f_{O|R}(o|r)$, to a location estimate $\hat{r}$. Consequently, the usefulness of this location estimate is intimately tied to the validity of the derived posterior distribution. Our objective in this chapter is to obtain a location estimate that accounts for ambiguity in this posterior. Such ambiguity might stem from uncertainty about the prior, from uncertainty about the conditional distribution of the observation vector, or both. In this chapter, our focus is on dealing with uncertainty about the conditional distribution of the observation vector, leaving the investigation of other cases for future work.

4.2 Distributionally Robust Formulation

Our uncertainty about the posterior distribution is specified by constructing a set of possible posterior distributions, denoted by $\mathcal{F}$. Given an estimate, say $\hat{r}$, of the true location, we incur a cost that is assumed to depend only on the Euclidean distance between $\hat{r}$ and the true location $r$. Denote the cost function as $g(\|r - \hat{r}\|)$. We assume that $g : \mathbb{R}_{\ge 0} \mapsto \mathbb{R}_{\ge 0}$ is a non-decreasing continuous function.
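As a concrete sketch of this pipeline (the grid discretization, helper names, and parameter values are ours; the likelihood reuses the log-normal model of Section 3.5.2), the posterior of equation (4.1) and its first two moments, which reappear as the initial estimates in Section 4.4, can be computed as:

```python
import numpy as np

def grid_posterior(grid, prior, tx_pos, obs, gamma=3.93, sigma=4.0):
    """Posterior f_{R|O}(r | o) on a grid of candidate locations via Bayes' rule (4.1)."""
    d = np.linalg.norm(grid[:, None, :] - tx_pos[None, :, :], axis=2)  # d_i(r), (n, m)
    mean = -10.0 * gamma * np.log10(np.maximum(d, 1e-6))               # E[O_i | r]
    log_lik = -0.5 * (((obs - mean) / sigma) ** 2).sum(axis=1)         # independent APs
    w = prior * np.exp(log_lik - log_lik.max())                        # unnormalized
    return w / w.sum()

def posterior_moments(grid, post):
    """Posterior mean and covariance on the grid (these later serve as mu_0, Sigma_0)."""
    mu = post @ grid
    c = grid - mu
    return mu, (c * post[:, None]).T @ c
```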
In the classical non-robust formulation, we construct a single posterior distribution using Bayes' rule as given in equation (4.1). This posterior, say $f_{R|O}(r|o)$, is used to derive a location estimate $\hat{r}$ that solves the optimization problem

$\hat{r} = \arg\min_{r' \in \mathcal{S}} \mathbb{E}_f\left[g(\|r' - R\|)\right]$,   (4.2)

where the expectation is over $R \sim f_{R|O}(r|o)$ and $\mathcal{S}$ is our space of interest. In the distributionally robust formulation, we minimize the cost of our estimate over the worst possible posterior distribution in $\mathcal{F}$. This can be expressed as

$\hat{r} = \arg\min_{r' \in \mathcal{S}} \max_{f \in \mathcal{F}} \mathbb{E}_f\left[g(\|r' - R\|)\right]$.   (4.3)

The key to solving this robust formulation efficiently is to find an appropriate uncertainty set $\mathcal{F}$ that is not overly conservative while still providing satisfactory robustness guarantees. We discuss this issue in the following section.

4.3 Uncertainty Set Construction

We define our uncertainty set $\mathcal{F}$ to be the class of distributions whose mean and covariance matrix are close to our best estimates of the mean and covariance of $R$. Denote the estimate of the mean by $\mu_0$, and the estimate of the covariance by $\Sigma_0$. These estimates may be derived by utilizing the collected observation data. For a choice of tunable parameters, $\alpha_1 \ge 0$ and $\alpha_2 \ge 1$, that express our confidence in the estimated mean and covariance, consider the set of distributions, $\mathcal{F}(\mathcal{S}, \mu_0, \Sigma_0, \alpha_1, \alpha_2)$, such that each distribution function $f \in \mathcal{F}$ satisfies

$P(R \in \mathcal{S}) = 1$,   (4.4a)
$(\mathbb{E}_f[R] - \mu_0)^{\top} \Sigma_0^{-1} (\mathbb{E}_f[R] - \mu_0) \le \alpha_1$,   (4.4b)
$\mathbb{E}_f\left[(R - \mu_0)(R - \mu_0)^{\top}\right] \preceq \alpha_2 \Sigma_0$.   (4.4c)

These constraints define a set of distributions that satisfy the following properties: (i) the mean lies within an ellipsoid of size $\alpha_1$ centered at $\mu_0$, and (ii) the covariance matrix lies within a positive semidefinite cone defined by the matrix inequality. It has been shown that such a construction of the uncertainty set often results in computationally tractable optimization problems that can be solved using mature, widely available software packages [48,60]. Moreover, this construction is conceptually simple while still encompassing a rich collection of interesting uncertainty sets [60], and it naturally incorporates the use of all the available data [47].

The initial estimate of the mean, $\mu_0$, and the covariance, $\Sigma_0$, is crucial to the above formulation. These may be derived using the available observation data. In the following section we illustrate how to incorporate our observation data in the uncertainty set construction. In addition, we identify the properties of the robust solution and simplify the above problem to a form that is easily solved using a software solver.

4.4 Initial Estimate for Uncertainty Sets

Typically, localization systems collect observations that depend on location and then describe this relationship either using an analytical model or using an empirically derived distribution. Such descriptions are used to construct the distribution of the observations, $f_{O|R}(o|r)$. Thus the source of uncertainty in the posterior, $f_{R|O}(r|o)$, may be viewed as stemming from uncertainty about the conditional distribution of the observation vector, $f_{O|R}(o|r)$. Given an uncertainty set containing possible distributions for $f_{O|R}(o|r)$, we could conceptually derive an uncertainty set for the posterior by applying Bayes' rule to each candidate distribution for $f_{O|R}(o|r)$. However, such a naive approach is not computationally feasible, since the uncertainty sets are not necessarily finite.
Thus, we use a model-derived or empirically estimated distribution for $f_{O|R}(o|r)$ to derive the estimates $\mu_0$ and $\Sigma_0$. For RSS data, the log-normal model may be used. Denote this model-derived or empirically estimated distribution as $\hat{f}_{O|R}(o|r)$. Using this distribution, we can derive a candidate posterior using Bayes' rule. Denote this derived posterior distribution as $\hat{f}_{R|O}(r|o)$. Then,

$\mu_0 = \mathbb{E}_{\hat{f}}[R]$,   (4.5)
$\Sigma_0 = \mathbb{E}_{\hat{f}}\left[(R - \mu_0)(R - \mu_0)^{\top}\right]$.   (4.6)

The estimates given above, along with a choice of the parameters $\alpha_1 \ge 0$ and $\alpha_2 \ge 1$, complete the description of our uncertainty set, $\mathcal{F}(\mathcal{S}, \mu_0, \Sigma_0, \alpha_1, \alpha_2)$, for the posterior.

Figure 4.1: Illustration of the key steps involved in the construction of the uncertainty set and the computation of the robust estimate. The uncertainty in the posterior distribution, $f_{R|O}(r|o)$, stems from the uncertainty about the distribution of the observations, $f_{O|R}(o|r)$. This uncertain posterior is used to derive the initial estimates $\mu_0$ and $\Sigma_0$. These estimates, along with the parameters $\alpha_1$ and $\alpha_2$, determine the uncertainty set used to derive the robust estimate $\hat{r} = \arg\min_{r' \in \mathcal{S}} \max_{f \in \mathcal{F}} \mathbb{E}_f[g(\|r' - R\|)]$.

Chapter 5

Efficient Computation of the Distributionally Robust Estimate

In this chapter we investigate the structure of the solution to the inner moment problem

$\max_{f \in \mathcal{F}} \mathbb{E}_f\left[g(\|r' - R\|)\right]$,

for sufficiently large values of the parameters $\alpha_1$ and $\alpha_2$, leaving the discussion of the impact of $\alpha_1$ and $\alpha_2$ for the subsequent section. The following formulation is inspired by Scarf's classical result in inventory theory [61]. We show that for any given candidate location $r'$, the distribution that yields the maximum cost will have positive support only on the boundary of our space, $\partial\mathcal{S}$. This is formalized in the following theorem.

Theorem 6. For any $r' \in \mathcal{S}$ and a non-decreasing continuous cost function $g : \mathbb{R}_{\ge 0} \mapsto \mathbb{R}_{\ge 0}$, there exists $r^* \in \partial\mathcal{S}$ such that

$\max_{f \in \mathcal{F}} \mathbb{E}_f\left[g(\|r' - R\|)\right] \le g(\|r' - r^*\|)$.   (5.1)

Proof. Let $r^* = \arg\max_{r \in \mathcal{S}} g(\|r' - r\|)$. Since $\mathcal{S}$ is closed and bounded, $r^*$ exists. Assume that no point on the boundary attains the maximum cost $c^* = g(\|r^* - r'\|)$. Fix a value of $r^*$ that lies in the interior of $\mathcal{S}$ and attains $c^*$. Consider the ray $z(\lambda) = r' + \lambda(r^* - r')$ where $\lambda \ge 0$. Since $\mathcal{S}$ is convex, $z(\lambda) \in \mathcal{S}$ for all $\lambda \in [0, 1]$. Since $r^*$ lies in the interior of $\mathcal{S}$, there exists a $\lambda_0 > 1$ such that $z(\lambda_0)$ lies in the interior of $\mathcal{S}$. Since $\mathcal{S}$ is closed and bounded, the ray must intersect the boundary for some value of $\lambda$ greater than $\lambda_0$. In other words, we can find a $\hat{\lambda} > \lambda_0 > 1$ such that the ray $z(\lambda)$ intersects the boundary $\partial\mathcal{S}$ at $\hat{r} = z(\hat{\lambda}) \in \partial\mathcal{S}$. Then,

$\|\hat{r} - r'\| = \hat{\lambda}\, \|r^* - r'\| > \|r^* - r'\|$.   (5.2)

Since the cost function $g$ is non-decreasing,

$g(\|\hat{r} - r'\|) \ge g(\|r^* - r'\|) = c^*$.   (5.3)

Since $c^*$ is the maximum cost by definition, we must have $g(\|\hat{r} - r'\|) = c^*$, which contradicts our initial assumption that no point on the boundary attains the maximum cost $c^*$. Thus there exists $r^* \in \partial\mathcal{S}$ such that $r^* = \arg\max_{r \in \mathcal{S}} g(\|r' - r\|)$. For any posterior distribution $f \in \mathcal{F}$,

$\mathbb{E}_f\left[g(\|r' - R\|)\right] = \int_{r \in \mathcal{S}} g(\|r' - r\|)\, f_{R|O}(r|o)\, dr$   (5.4)
$\le g(\|r' - r^*\|) \int_{r \in \mathcal{S}} f_{R|O}(r|o)\, dr$   (5.5)
$= g(\|r' - r^*\|)$.   (5.6)

Theorem 6 indicates that attempting to maximize our expected cost, $\mathbb{E}_f[g(\|r' - R\|)]$, pushes the support of the resulting distribution $f$ closer to the boundary of our space $\mathcal{S}$.
Furthermore, since our space of interest $\mathcal{S}$ is convex and in $\mathbb{R}^2$, we can efficiently approximate $\mathcal{S}$ using a convex polygon [62]. Using such an approximation allows us to further simplify the structure of the worst-case distribution. Let $\hat{\mathcal{S}}$ be our convex polygon approximation of $\mathcal{S}$. For any point $r' \in \hat{\mathcal{S}}$, the point in $\hat{\mathcal{S}}$ that is farthest from $r'$ is one of the vertices of $\hat{\mathcal{S}}$. Consequently, for a convex polygon $\hat{\mathcal{S}}$ in $\mathbb{R}^2$, we have the following refinement of Theorem 6.

Theorem 7. If $\hat{\mathcal{S}}$ is a convex polygon in $\mathbb{R}^2$, then for each $r' \in \hat{\mathcal{S}}$ there exists a vertex $v \in \hat{\mathcal{S}}$ such that

$\max_{f \in \mathcal{F}} \mathbb{E}_f\left[g(\|r' - R\|)\right] \le g(\|r' - v\|)$.   (5.7)

Proof. It follows from Theorem 6 that the upper bound is attained by a point $r^* \in \partial\hat{\mathcal{S}}$ such that $r^* = \arg\max_{r \in \hat{\mathcal{S}}} g(\|r' - r\|)$. Since the function $g$ is non-decreasing, $r^* = \arg\max_{r \in \hat{\mathcal{S}}} \|r' - r\|$. Hence, it remains to be shown that the maximum of $\|r' - r\|$ over $r \in \hat{\mathcal{S}}$ is attained by a vertex of $\hat{\mathcal{S}}$. Let $\hat{V} = \{v_1, \ldots, v_n\}$ be the set of vertices of $\hat{\mathcal{S}}$. Then $\hat{\mathcal{S}}$ may be represented as the convex hull of its vertices, $\mathrm{conv}(\hat{V}) = \hat{\mathcal{S}}$. In other words, for any $r \in \hat{\mathcal{S}}$ there exist $\{\lambda_i\}_{i=1}^n$ such that $r = \sum_{i=1}^n \lambda_i v_i$, where $\lambda_i \in [0,1]$ for all $i \in \{1, 2, \ldots, n\}$ and $\sum_{i=1}^n \lambda_i = 1$. Fix some $r' \in \hat{\mathcal{S}}$. Then for any $r \in \hat{\mathcal{S}}$, we have

$\|r' - r\| = \left\|r' - \sum_{i=1}^n \lambda_i v_i\right\|$   (5.8a)
$= \left\|\sum_{i=1}^n \lambda_i (r' - v_i)\right\|$   (5.8b)
$\le \sum_{i=1}^n \lambda_i \|r' - v_i\|$   (5.8c)
$\le \|r' - v_j\|$,   (5.8d)

where (5.8c) follows from the triangle inequality, and $v_j = \arg\max_{1 \le i \le n} \|r' - v_i\|$. Clearly the upper bound in (5.8d) is attained by setting $\lambda_j = 1$ and $\lambda_i = 0$ for all $i \ne j$, $i \in \{1, \ldots, n\}$. In other words, the maximum of $\|r' - r\|$ over $r \in \hat{\mathcal{S}}$ is attained by setting $r$ to be an appropriate vertex of $\hat{\mathcal{S}}$.

While the above results are presented for a two-dimensional space $\mathcal{S}$, they are easily extended to a three-dimensional setting [63]. Theorem 6 indicates that the worst-case distribution has support only on the boundary of our space. Moreover, in case we employ a convex polygon approximation of our space of interest, Theorem 7 suggests that we only need to consider distributions that have support on the vertices. However, it is worth noting that these bounds are not guaranteed to be attained by a distribution within $\mathcal{F}$ for all values of $\alpha_1$ and $\alpha_2$. For instance, consider the scenario where we attempt to simplify the inner moment problem, $\max_{f \in \mathcal{F}} \mathbb{E}_f[g(\|r' - R\|)]$, by reducing our search space to include only those distributions within $\mathcal{F}$ that have support only on the vertices of $\hat{\mathcal{S}}$. There always exist such distributions that satisfy constraint (4.4b), for any choice of $\mu_0$ and $\alpha_1$. However, the same cannot be said for constraint (4.4c). Considering only distributions with support on the vertices of $\hat{\mathcal{S}}$ effectively imposes a lower bound on the covariance matrix of $R$, and hence the parameters $\alpha_2$ and $\Sigma_0$ need to be chosen with some care. We will revisit this issue of parameter selection in the following section.

5.1 Low Complexity Formulation

We now turn our attention to the issue of simplifying our optimization problem (4.3) to a form that is easily computed using standard software tools. Assume that $\mathcal{S}$ is a convex polygon. If the original space is convex but not a polygon, we can always find a convex polygon approximation with a desired level of accuracy [62]. We then discretize the distributions within $\mathcal{F}$ by allowing them to have support only on a discrete grid-like set of locations, say $V$, within $\mathcal{S}$. This imposes nearly no compromises, since we can get arbitrarily close to the continuous setting by making our grid progressively finer. The vertices of $\mathcal{S}$ are always included in $V$.
This ensures that the convex hull of $V$ always returns $\mathcal{S}$: $\mathrm{conv}(V) = \mathcal{S}$.

This reduction of the uncertainty set to include only discrete distributions allows us to write our optimization problem in a simpler manner. Let $V = \{r_i\}_{i=1}^n$ represent the locations of the grid points within $\mathcal{S}$. The vector $p$ represents a distribution over this set of grid points. Define

$A_i = (r_i - \mu_0)(r_i - \mu_0)^{\top}$,   (5.9)

$B_i = \begin{bmatrix} \alpha_1 \Sigma_0 & (r_i - \mu_0) \\ (r_i - \mu_0)^{\top} & 1 \end{bmatrix}$,   (5.10)

for all $i \in \{1, 2, \ldots, n\}$. For each candidate location $r \in \mathcal{S}$, let $g_r$ represent the cost vector for that location, $(g_r)_i = g(\|r - r_i\|)$ for all $i \in \{1, 2, \ldots, n\}$. Then for each candidate location $r$, the inner moment problem in (4.3) can be represented by the following semidefinite program (SDP) [64]:

maximize $g_r^{\top} p$   (5.11a)
subject to $\sum_{i=1}^n p_i A_i \preceq \alpha_2 \Sigma_0$,   (5.11b)
$\sum_{i=1}^n p_i B_i \succeq 0$,   (5.11c)
$\sum_{i=1}^n p_i = 1$,   (5.11d)
$p_i \ge 0$ for all $i \in \{1, 2, \ldots, n\}$.   (5.11e)

The solution to the above optimization problem gives us the worst-case distribution corresponding to the candidate location $r$. Note that, for any given $r$, the above inner moment problem (5.11) is an SDP that can be solved efficiently both in theory and in practice [65]. As we typically deal with a finite number of possible location estimates, we can potentially enumerate the solution of problem (5.11) for all possible location estimates and then choose the best among them. Thus the problem remains computationally feasible in its current form. Moreover, as the optimizations at the candidate locations are independent of each other, we can further accelerate the process using parallel or distributed computing. Alternately, we can formulate the inner moment problem (5.11) in its dual form and use the fact that the minimization operations may be performed jointly.

5.1.1 Deriving the Dual Form

The following SDP formulation yields the robust location estimate:

minimize over $Z_1, Z_2, \nu, r$:   $\beta$   (5.12a)
subject to $r \in \mathcal{S}$,   (5.12b)
$Z_1, Z_2 \succeq 0$,   (5.12c)
$\beta \ge \alpha_2 \mathrm{tr}(\Sigma_0 Z_1) + \nu$,   (5.12d)
$\mathrm{tr}(A_i Z_1) - \mathrm{tr}(B_i Z_2) + \nu - (g_r)_i \ge 0$,   (5.12e)

for all $i \in \{1, 2, \ldots, n\}$.

Proof. Note that the matrices $A_i$ and $B_i$ are symmetric for all $i \in \{1, 2, \ldots, n\}$. The Lagrangian associated with the primal problem (5.11) is given by

$L(p, Z_1, Z_2, \nu, \eta) = \sum_{i=1}^n p_i\, \mathrm{tr}(A_i Z_1) - \alpha_2\, \mathrm{tr}(\Sigma_0 Z_1) - \sum_{i=1}^n p_i\, \mathrm{tr}(B_i Z_2) + \nu\left(\sum_{i=1}^n p_i - 1\right) - \sum_{i=1}^n p_i (g_r)_i - \sum_{i=1}^n \eta_i p_i$.   (5.13)

Grouping together the terms with the variable coefficients yields

$L(p, Z_1, Z_2, \nu, \eta) = \sum_{i=1}^n p_i\left[\mathrm{tr}(A_i Z_1) - \mathrm{tr}(B_i Z_2) + \nu - \eta_i - (g_r)_i\right] - \alpha_2\, \mathrm{tr}(\Sigma_0 Z_1) - \nu$.   (5.14)

The dual function is given by

$h(Z_1, Z_2, \nu, \eta) = \inf_p L(p, Z_1, Z_2, \nu, \eta)$.   (5.15)

Consequently, we need

$\mathrm{tr}(A_i Z_1) - \mathrm{tr}(B_i Z_2) + \nu - \eta_i - (g_r)_i = 0$,   (5.16)

for all $i \in \{1, 2, \ldots, n\}$, to ensure that the dual function lies above $-\infty$. In addition, the dual variables $Z_1, Z_2$ are symmetric, and $\eta_i \ge 0$ for all $i \in \{1, 2, \ldots, n\}$. Thus the dual program can be expressed as

minimize $\beta$   (5.17a)
subject to $Z_1, Z_2 \succeq 0$,   (5.17b)
$\beta \ge \alpha_2 \mathrm{tr}(\Sigma_0 Z_1) + \nu$,   (5.17c)
$\mathrm{tr}(A_i Z_1) - \mathrm{tr}(B_i Z_2) + \nu - (g_r)_i \ge 0$,   (5.17d)

for all $i \in \{1, 2, \ldots, n\}$.

The SDP formulation given above (5.12) can be solved easily using a standard solver such as CVXPY [66] or Convex.jl [67]; a short CVXPY sketch of the primal form (5.11) is given below.
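This sketch solves the inner moment problem (5.11) for one fixed candidate location (variable names, the grid, and the solver choice are ours; $B_i$ is taken as reconstructed in (5.10)):

```python
import cvxpy as cp
import numpy as np

def worst_case_cost(grid, g_r, mu0, sigma0, alpha1, alpha2):
    """Inner moment problem (5.11): worst-case expected cost at one candidate location.

    grid: (n, 2) support points; g_r: (n,) costs g(||r - r_i||).
    Returns the worst-case expected cost and the worst-case distribution p.
    """
    n = len(grid)
    diffs = grid - mu0                                   # r_i - mu_0
    A = [np.outer(d, d) for d in diffs]                  # A_i, eq. (5.9)
    B = [np.block([[alpha1 * sigma0, d[:, None]],        # B_i, eq. (5.10)
                   [d[None, :], np.ones((1, 1))]]) for d in diffs]

    p = cp.Variable(n, nonneg=True)                      # (5.11e)
    constraints = [
        alpha2 * sigma0 - sum(p[i] * A[i] for i in range(n)) >> 0,  # (5.11b)
        sum(p[i] * B[i] for i in range(n)) >> 0,                    # (5.11c)
        cp.sum(p) == 1,                                             # (5.11d)
    ]
    # Feasibility depends on alpha1, alpha2 and the grid; see Section 5.2.
    prob = cp.Problem(cp.Maximize(g_r @ p), constraints)
    prob.solve(solver=cp.SCS)
    return prob.value, p.value
```

Enumerating this over all candidate locations and taking the minimizer implements the outer problem in (4.3).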
The dual form (5.12) has the advantage that we can obtain both the robust estimate and the corresponding posterior distribution, through the dual variables of SDP (5.12), using a single optimization program. An illustration is given in Figure 5.2 for various values of the parameter $\alpha_2$. As can be seen in that figure, different choices of the parameter $\alpha_2$ yield very different solutions. For large values of $\alpha_2$, the support of the posterior gets progressively closer to the vertices of $\mathcal{S}$. This raises the question of how we should choose these parameters for a particular problem at hand. More fundamentally, we need to know the range of parameters for which we are guaranteed that an optimal solution exists. We explore this topic below.

Figure 5.1: An illustration of valid parameter selection. Assume that we restrict $p$ to have support only on the vertices. Say we have an initial estimate $\mu_0 = (0, 1)$. Then the covariance matrix $\mathbb{E}_p\left[(R - \mu_0)(R - \mu_0)^{\top}\right]$ is lower bounded by $\begin{bmatrix} 2 & 0 \\ 0 & 1 \end{bmatrix}$ for small values of $\alpha_1$. In particular, the problem (4.3) is infeasible for the choice of parameters $\alpha_1 = 1$, $\alpha_2 = 1.5$, and $\Sigma_0 = I_2$, where $I$ is the identity matrix. The optimization becomes feasible if $\Sigma_0 = 2I$.

5.2 Parameter Selection

In this section, we investigate the issue of how to choose the parameters involved in the construction of our uncertainty set, $\mathcal{F}(\mathcal{S}, \mu_0, \Sigma_0, \alpha_1, \alpha_2)$, which consequently determines the optimization problem (5.12) that generates the robust estimate. As indicated in Section 4.4, the mean $\mu_0$ and the covariance matrix $\Sigma_0$ of the posterior distribution $f_{R|O}(r|o)$ are obtained either using a model, such as the log-distance path loss model [39], or using an empirically estimated distribution $\hat{f}_{O|R}(o|r)$.

Figure 5.2: An illustration of the posterior distribution corresponding to the robust estimate ($\hat{r}$), for various values of the parameter $\alpha_2$ (the subplots show $\alpha_2$ = 4, 8, 16, 32, 64, and 128). The MMSE estimate ($\mu_0$) and the actual receiver location ($r_0$) are shown as well. As predicted by Theorems 6 and 7, the worst-case distribution is pushed towards the boundary and vertices of the rectangular 60 m by 80 m space.

It remains to be shown that the low complexity formulation (5.12) always yields an optimal solution for any choice of $\alpha_1 > 0$ and $\alpha_2 > 1$. Recall that each distribution in our uncertainty set $\mathcal{F}$ must satisfy the constraints (4.4a), (4.4b), and (4.4c). In the low complexity formulations (5.11) and (5.12), our search space involves distributions that have support on a discrete set of locations within $\mathcal{S}$. Since all locations with positive support lie in $\mathcal{S}$, constraint (4.4a) is always satisfied. Since $\mathcal{S}$ is a closed, convex polygon in $\mathbb{R}^2$, any point in its interior can be represented as a convex combination of its vertices [63]. In other words, for any choice of $\mu_0$ in the interior of $\mathcal{S}$, there exists a distribution $p$ such that $\mathbb{E}_p[R] = \mu_0$. Thus the constraint (4.4b) is satisfied strictly for any choice of $\alpha_1 > 0$. In addition, algorithms to compute convex hulls of a finite set of points in $\mathbb{R}^2$ are well known in the literature [68,69], so this distribution can be explicitly computed if need be.

The satisfiability of the final constraint (4.4c) is sensitive to our choice of $\Sigma_0$ and our choice of $V$, the discretization of $\mathcal{S}$. To illustrate this, consider the example given in Figure 5.1. Assume that we force the distribution $p$ to have positive support only on the vertices of $\mathcal{S}$.
Then for a sufficiently small value of $\alpha_1$, constraint (4.4b) determines $p$, which in turn determines the covariance matrix $\hat{\Sigma} = \mathbb{E}_p\left[(R - \mu_0)(R - \mu_0)^{\top}\right]$. Thus, if we happen to choose an initial estimate $\Sigma_0 \prec \hat{\Sigma}$, then our optimization problem (5.12) is infeasible for some values of $\alpha_2 \ge 1$. In summary, restricting $p$ to have support only on the vertices effectively imposes a lower bound on the covariance matrix, which needs to be taken into consideration while choosing $\alpha_2$ and $\Sigma_0$.

A similar situation holds, albeit to a significantly lesser degree, when we discretize $\mathcal{S}$ to $V$. As might be expected, the lower bound on the covariance matrix is now determined by the points in $V$ closest to $\mu_0$. Indeed, as we make $V$ finer, the lower bound becomes progressively looser, and it disappears entirely when there exists a location in $V$ that coincides with $\mu_0$. This can always be ensured while constructing $V$. Hence the constraint (4.4c) is satisfied strictly for any choice of $\alpha_2 > 1$.
The MMSE estimate ( 0 ) and the actual receiver location (r 0 ) are shown as well. For this simulation, log-normal fading with correlated observations was assumed. The actual receiver location and the initial estimates for the uncertainty set were kept xed. The trajectory of the robust estimate does not seem to follow any obvious pattern. 59 Thirdly, we incorporate the non-stationary behavior exhibited by signal strength measure- ments [70]. In this setting, the variance of the observations changes depending on the time of the day. A higher variance is used to model observations generated during working hours (daytime), while a lower variance is used for nighttime observations. Finally, we evaluate the performance of the robust estimate under a real world setting. We collect received signal strength observations in an indoor environment under two scenarios, namely, daytime and nighttime. The daytime observations correspond to a setting with relatively high variance, while the nighttime observations correspond to a relatively low variance setting. In all of these cases, the initial estimates for the uncertainty set are obtained using the model assumed by the MMSE estimator. 5.3.1 Large-Scale Fading 3.0 3.5 4.0 4.5 5.0 5.5 6.0 Standard Deviation ( ) 2.0 2.5 3.0 3.5 4.0 RMSE Robust MMSE Figure 5.4: Performance of the MMSE and Robust estimates on increasing noise variance. For this illustration, a log-normal fading model was used to generate the observations in a rectangular 60 m by 80 m space using identical parameters as that used in Figure 5.2. In this plot, the MMSE estimator correctly assumes that the observations are drawn from a log-normal distribution. Note that the performance of the robust estimate tracks that of the MMSE estimator very closely. 60 Sayfl 1 ;l 2 ;:::;l m g (m> 2) are the known positions of (m) wireless transmitters. We assume each transmitter is located on a planar surface given byS = [0;l] [0;b] where l;b2R >0 . The locations of the transmitters are given by the two dimensional vector l i = (x i ;y i )2S 8i2 f1; 2;:::;mg. We wish to estimate the receiver locations, given by the vectorr = (x;y), from the received signal strengths. For a given transmitter-receiver pair, say i, the relationship between the received signal power (P i r ) and the transmitted signal power (P i t ) may be modelled by the simplied path loss model P i r =P i t K d 0 d i W i ; (5.18) where the distance between the receiver and thei th transmitter is given byd i (r) = q (xx i ) 2 + (yy i ) 2 , and W i represents our noise that is log-normally distributed with zero mean and variance 2 . In log scale, the path loss model is given by P i r j dBm =P i t j dBm +Kj dB 10 log 10 d i d 0 +W i j dB ; (5.19) where K is a constant given by the gains of the receiver and transmit antennas and possibly the frequency of transmission. d 0 is a reference distance, taken to be 1m. In this setting, our estimation problem may be restated as follows. We are given measurements of the receiver signal strengths P 1 r ;P 2 r ;:::;P m r from which we are to estimate the receiver location r. Thus, our observation vectorO may be written as O i =P i r j dBm P i t j dBm Kj dB =W i j dB 10 log 10 d i d 0 ; for all i2f1;:::;Ng. In other words, the distribution of each observation is given by O i N 10 ln [d i (r)]; 2 . 61 For the setting where the received observations are IID, the distribution of the observation vector f OjR (ojr) can be obtained from the above by taking the product of all the marginal observation pdfs. 
To evaluate the setting where the observations may be correlated, we transform the observation vector that was obtained above. Under the IID assumption, the covariance matrix ofOjR is given by ~ =E f OjR h (O o) (O o) > i (5.20) = 2 I; (5.21) where I is the identity matrix, and o = E f OjR [O]. Using a transformation matrix C, we can transform the observation vector as O = CO. This yields the updated covariance matrix, ~ = 2 CC > . The transformed observations are now jointly normal with covariance ~ . We would like the transformation matrix to encode the property that the observations coming from transmitters that are closer to each other are likely to be more correlated compared to a pair that is spaced farther apart. This can be achieved by setting C ij = 1 1+dij , where d ij is the distance between the transmitters i and j, and is a parameter that can be used to tune the strength of the induced correlations. Large values of corresponds to situations where the elements of the observation vector are only weakly correlated with each other. As we set !1, the covariance matrix of the observation vector becomes diagonal, corresponding to IID observations being generated. As we make smaller, the observations get increasingly correlated with each other. In the limit ! 0, the elements of the observation vector become identical. Figure 5.5 illustrates how the performance of the robust estimate varies under dierent levels of correlation among the observations. In this manner, we can evaluate the robust formulation under the setting of correlated observa- tions. The specic values of the parameters used in the evaluation are given below. The dimensions of the area of interest (S) was 60 m 80 m. Sixteen transmitters were chosen randomly and 64 62 3.0 3.5 4.0 4.5 5.0 5.5 6.0 Standard Deviation ( ) 12.2 12.4 12.6 12.8 13.0 13.2 13.4 13.6 13.8 RMSE Robust MMSE (a) 3.0 3.5 4.0 4.5 5.0 5.5 6.0 Standard Deviation ( ) 7.1 7.2 7.3 7.4 7.5 7.6 7.7 7.8 7.9 RMSE Robust MMSE (b) 3.0 3.5 4.0 4.5 5.0 5.5 6.0 Standard Deviation ( ) 4.8 5.0 5.2 5.4 5.6 5.8 6.0 RMSE Robust MMSE (c) 3.0 3.5 4.0 4.5 5.0 5.5 6.0 Standard Deviation ( ) 3.6 3.8 4.0 4.2 4.4 4.6 4.8 5.0 5.2 RMSE Robust MMSE (d) 3.0 3.5 4.0 4.5 5.0 5.5 6.0 Standard Deviation ( ) 3.00 3.25 3.50 3.75 4.00 4.25 4.50 4.75 RMSE Robust MMSE (e) 3.0 3.5 4.0 4.5 5.0 5.5 6.0 Standard Deviation ( ) 2.0 2.5 3.0 3.5 4.0 RMSE Robust MMSE (f) Figure 5.5: An illustration of the impact of the strength of the induced correlations on the performance of the MMSE and the robust estimate. For the illustration, correlation strength was varied from = 2:0 in Figure 5.5a to = 10:0 in Figure 5.5e in equal increments. Figure 5.5f corresponds to IID observations or !1. For this illustration, a log-normal fading model was used to generate the observations in a rectangular 60 m by 80 m space using identical parameters as that used in Figure 5.2. In this plot, the MMSE estimator correctly assumes that the observations are drawn from a log-normal distribution. 63 RSSI readings were taken for each transmitter at 300 distinct receiver locations. The transmit power was kept constant at 16 dBm. The model parameters are a path loss of K = 39:13 dB at reference distance d 0 = 1 m, and path loss exponent = 3:93. The robust estimator uses the cost function corresponding to the MSE, g (kr ^ rk) =kr ^ rk 2 . The parameters of the robust estimator are, 1 = 8 and 2 = 8. 
The specific values of the parameters used in the evaluation are as follows. The dimensions of the area of interest ($\mathcal{S}$) were 60 m x 80 m. Sixteen transmitters were chosen randomly, and 64 RSSI readings were taken for each transmitter at 300 distinct receiver locations. The transmit power was kept constant at 16 dBm. The model parameters are a path loss of K = 39.13 dB at reference distance $d_0$ = 1 m, and path loss exponent $\gamma$ = 3.93. The robust estimator uses the cost function corresponding to the MSE, $g(\|r - \hat{r}\|) = \|r - \hat{r}\|^2$. The parameters of the robust estimator are $\alpha_1 = 8$ and $\alpha_2 = 8$. The prior distribution was assumed to be uniform over $\mathcal{S}$, which corresponds to the case where we have no prior knowledge of the device location.

In Figure 5.4, the MMSE estimator correctly fits a log-normal distribution over the received power levels, which is used to generate the initial estimates, $\mu_0$ and $\Sigma_0$, used by the robust estimator. In this setting, the MMSE estimator is expected to be the best performing estimator as measured by the RMSE metric. This behavior is shown in Figure 5.4. However, it is notable that the robust estimate does not deviate too far from the MMSE estimate, with the mean RMSE of the robust estimate being at most 0.2 m above that of the MMSE estimate.

Figure 5.6: Performance of the MMSE and robust estimates with increasing noise variance. For this illustration, a Rayleigh fading model was used to generate the observations in a rectangular 60 m by 80 m space using parameters identical to those used in Figure 5.4. The observations are correlated. In this plot, the MMSE estimator incorrectly assumes that the observations are drawn from a log-normal distribution. The robust estimate performs better than the MMSE estimator here. Moreover, the advantage of the robust estimate grows as the noise variance increases.

5.3.2 Small-Scale Fading

Consider the setting where the signal strength at any given location is the result of the superposition of different multipath components, none of which is dominant. Then the in-phase and quadrature-phase components of the received signal are each the sum of many random components, and are well approximated by a normal distribution via the central limit theorem [39]. Consequently, the absolute amplitude of the received signal follows a Rayleigh distribution. The squared amplitude, and hence the power, follows an exponential distribution. Let the mean received signal power, for a given transmitter-receiver pair, say $i$, be given by $P_r^i$. Note that this local mean, $P_r^i$, follows a log-normal distribution as given in equation (5.18). The received signal power, say $\tilde{P}_r^i$, fluctuates around this average value according to an exponential distribution,

$\tilde{P}_r^i = P_r^i Z_i$,   (5.22)

where $Z_i \sim \mathrm{Exp}(1)$. The distribution of the received power, $\tilde{P}_r^i$, is thus the product of a log-normal random variable, corresponding to the mean power level, and an exponential random variable, corresponding to fluctuations about this mean.

In Figure 5.6, the MMSE estimator effectively ignores these fluctuations about the mean value: it fits a log-normal distribution over the received power levels, ignoring the fluctuations due to small-scale fading. This incorrect distribution used by the MMSE estimator is used to generate the initial estimates, $\mu_0$ and $\Sigma_0$, used by the robust estimator. As can be seen in Figure 5.6, the robust estimator outperforms the MMSE estimator in this case, demonstrating its usefulness in the presence of modeling errors.

5.3.3 Non-Stationary Behavior

It is well known that signal strength measurements in indoor environments exhibit non-stationary behavior [70]. Specifically, the variability exhibited by the signal strength measurements changes depending on the time of day. During the daytime, the presence of an increased number of people indoors creates a dynamic environment that results in increased variability of the signal strength measurements. This effect is reduced during the evening and nighttime.
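A minimal sketch of such a generator follows (helper names and parameter values are ours; it composes the log-normal local mean of (5.19) with the exponential fluctuation of (5.22), with a day/night switch on the shadowing deviation as used in the evaluation below):

```python
import numpy as np

def simulate_small_scale(tx_pos, r, sigma_db, gamma=3.93, p_t_dbm=16.0,
                         k_db=-39.13, rng=None):
    """Received power under Rayleigh fading: local mean (5.19) times Exp(1) noise (5.22)."""
    rng = rng or np.random.default_rng()
    d = np.linalg.norm(tx_pos - r, axis=1)
    mean_dbm = p_t_dbm + k_db - 10.0 * gamma * np.log10(d)          # d_0 = 1 m
    local_mean_dbm = mean_dbm + rng.normal(0.0, sigma_db, len(d))   # log-normal shadowing
    p_mw = 10.0 ** (local_mean_dbm / 10.0)                          # to linear scale (mW)
    return 10.0 * np.log10(p_mw * rng.exponential(1.0, len(d)))     # apply Z_i ~ Exp(1)

# Non-stationary setting: higher shadowing deviation during the day than at night.
tx = np.array([[0.0, 0.0], [60.0, 0.0], [0.0, 80.0], [60.0, 80.0]])
day = simulate_small_scale(tx, np.array([30.0, 40.0]), sigma_db=6.0)
night = simulate_small_scale(tx, np.array([30.0, 40.0]), sigma_db=3.0)
```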
To evaluate the robust formulation under this setting, we generate two sets of observations using the small-scale fading model. The ratio of the standard deviations of the two sets of observations is given by $\sigma_1/\sigma_2 = \tau$, where $\tau$ is a parameter. For the illustration in Figure 5.7, we choose $\tau = 2$. The smaller standard deviation is given by $\sigma_2 = \sigma\sqrt{\frac{2}{1 + \tau^2}}$, where $\sigma$ is a parameter we can use to vary the overall variance of the combined set of observations. This parameter is used as the standard deviation of the log-normal model used by the MMSE estimator, and thus to generate the initial estimates for the robust formulation. The performance of the robust formulation and the MMSE estimator is shown in Figure 5.7. The robust formulation performs better in this setting as well, further strengthening the case for its adoption when the model is not known exactly.

Figure 5.7: Performance of the MMSE and robust estimates with increasing noise variance. For this illustration, a Rayleigh fading model with different noise variances for the mean power was used to simulate the non-stationary behavior of signal strength measurements. The observations were generated in a rectangular 60 m by 80 m space using parameters identical to those used in Figure 5.6. The observations are correlated. In this plot, the MMSE estimator incorrectly assumes that the observations are drawn from a log-normal distribution with standard deviation $\sigma$. The robust estimate performs better than the MMSE estimator here. Moreover, the robust estimate improves as the noise variance increases.

5.3.4 Real World Experiments

We evaluated our robust formulation using real-world data. The data was collected from a 4 m x 2 m space inside an office environment. The space was divided into eight 1 m x 1 m squares, and signal strength samples were collected from the center of each square. Two hundred and fifty signal strength readings were collected for the sixteen strongest access points detected using the WiFi card on a laptop running Linux. The beacon interval for each access point was approximately 100 ms. The signal strength measurements were taken 400 ms apart. Two sets of data were collected, one at night and the other during the day. The measurements taken at night show that the observed signal strengths are highly concentrated around the mean. The measurements taken during daytime show more variability.

Thirty percent of the collected data was randomly chosen for evaluating algorithm performance. The remaining data was used to fit a log-normal distribution from which the MMSE estimates and the initial estimates for the robust formulation were derived. The parameters of the robust estimator are $\alpha_1 = 1$ and $\alpha_2 = 1$. The performance of the robust estimator on this real dataset is shown in Figure 5.8. The results show that the robust estimate outperforms the MMSE estimate on both the daytime and the nighttime datasets. The performance increase is more pronounced in the daytime, which is expected, since the observations display more variability during the daytime. This echoes the trend seen in the simulation results.

Figure 5.8: Performance of the MMSE and robust estimates on a real dataset.
In this experiment, two sets of observations were collected from a rectangular 4 m by 2 m space in an indoor office environment. Plot 5.8a shows the performance of the robust estimate on observations taken during the night, when there was little movement in the environment. Plot 5.8b shows the performance of the robust estimate on observations taken during the daytime, when there was regular movement of people inside the office. The parameters of the robust estimator were unchanged between the two cases. We see that the robust estimate performs better than MMSE in both cases, and the performance gain is larger during the daytime.

Chapter 6

Conclusion and Future Work

We have introduced an optimization-based approach to localization using a general unified framework that also allows for a fair and consistent comparison of algorithms across various metrics of interest. We have demonstrated how this framework may be used to derive localization algorithms, including fingerprinting methods. We have shown the existence of a partial ordering over the set of localization algorithms using the concept of stochastic dominance, showing further that the optimality of an algorithm over a particular distance-based cost function implies that the algorithm is not stochastically dominated by another. We have identified key properties of an "ideal" localization algorithm whose performance corresponds to the upper bound on error CDFs, and highlighted how we may compare different algorithms relative to this performance. Specifically, we have shown that MEDE minimizes the area between its own error CDF and the performance of such an ideal algorithm. We believe that the framework presented here goes a long way towards unifying the localization literature. The optimization-based approach places the localization algorithm desideratum at the forefront, where we believe it belongs.

We also introduced a distributionally robust approach to the problem of indoor localization based on RSS observations that explicitly takes into account the inherent uncertainty in the distribution of observations. We have identified the structure of the robust solution and illustrated how the solution changes on varying the parameters. We have demonstrated how to construct the problem such that it is easily computed using standard software tools and always returns an optimal solution. We have evaluated our robust solution under realistic channel fading models. Our results show that the robust solution outperforms the traditional approach in the presence of modeling errors, while remaining close to the traditional estimate when the modeling is exact. To the best of our knowledge, this is the first work that addresses distributionally robust indoor localization. In the following sections, we outline some open research problems and ideas for future work.

6.1 Attainability of the optimal error CDF

As indicated in Section 3.3, if we have a symmetric unimodal distribution over a circular area with the mode located at the center, then the MAP estimate is optimal. On the other hand, from an algorithmic perspective, MEDE has the nice property, following Theorem 5, that if there exists an algorithm that attains $F^*$, then the MEDE algorithm will attain it as well. However, it is not clear a priori if $F^*$ is attainable. A sufficient and necessary condition for the attainability of $F^*$ may be obtained in light of the following observations. By design, the MP(d) estimate matches the performance of $F^*$ for the specific distance $d \in Q$.
6.1 Attainability of the optimal error CDF

As indicated in Section 3.3, if we have a symmetric unimodal distribution over a circular area with the mode located at the center, then the MAP estimate is optimal. On the other hand, from an algorithmic perspective, MEDE has the nice property, following Theorem 3, that if there exists an algorithm that attains F*, then the MEDE algorithm will attain it as well. However, it is not clear a priori if F* is attainable. A necessary and sufficient condition for the attainability of F* may be obtained in light of the following observations. By design, the MP(d) estimate matches the performance of F* at the specific distance d ∈ Q; however, there are no guarantees on the performance of the estimate at other distances. Moreover, the MP(d) algorithm need not return a unique location estimate. This reflects the fact that there may be multiple locations in S that maximize the P(d) metric for the given posterior distribution. Consequently, consider a modified version of the MP(d) algorithm that enumerates all possible MP(d) estimates for the given distance d. If F* is attainable by some algorithm A, then the estimate returned by A, say \hat{r}_A, must appear in the list returned by the MP(d) algorithm. Note that this condition holds for any distance d ∈ Q used for the MP(d) algorithm. Consequently, if the intersection of the estimate lists returned by MP(d) for all d ∈ Q is non-empty, then we conclude that F* is attained by each estimate in the intersection. If the intersection is empty, then we conclude that F* is unattainable, thereby giving us the necessary and sufficient condition for the existence of F*.

While the above test calls for the repeated execution of the MP(d) algorithm over a potentially large search space, each execution is independent of the others, so considerable speed improvements may be gained through parallel processing. In some cases, it may be possible to carry out this test analytically; the condition enumerated in Section 3.3 is a case in point. The task of identifying the cases when we can obtain the algorithm that attains F* is a topic of future research. A slight generalization of the example given in Section 3.3 leads to the following conjecture.

Conjecture 1. If the posterior distribution f_R(r | O = o) is continuous, symmetric, and unimodal with the mode located at the centroid of a convex space S, then the MAP estimate is optimal.

Let d₀ be the maximal radius of a neighbourhood N_d(c) ⊆ S centered on the centroid c of our space S. For all d ≤ d₀,

\inf_{r \in N_d(c)} f_R(r \mid O = o) \;\geq\; \sup_{r \in N_d(r_0) \setminus N_d(c)} f_R(r \mid O = o), \qquad (6.1)

for any r₀ ∈ S. Note that this follows directly from the assumption that the posterior is unimodal. Consequently, for all d ≤ d₀ the MAP estimate is one of the best performing estimates. The optimality of MAP at distances beyond d₀ follows immediately if

\int_{S \cap (N_d(c) \setminus N_d(r_0))} dr \;\geq\; \int_{S \cap (N_d(r_0) \setminus N_d(c))} dr, \qquad (6.2)

for any r₀ ∈ S. We intuit that the above condition holds since c is the centroid of S, but a proof remains elusive.
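The intersection test described above is straightforward to prototype once the space is discretized. In the following sketch, `points` is a hypothetical grid over S, `posterior` holds the posterior probability mass on those grid points, and `Q` is a finite set of distances; the tolerance guards against floating-point ties. This is a sketch under those assumptions, not the implementation used in this dissertation.

```python
import numpy as np

def mp_maximizers(points, posterior, d, tol=1e-9):
    """Grid indices maximizing the P(d) metric: posterior mass within distance d."""
    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    mass = (dists <= d) @ posterior  # P(d) for every candidate estimate
    return set(np.flatnonzero(mass >= mass.max() - tol))

def fstar_attainable(points, posterior, Q):
    """Intersect MP(d) maximizer sets over d in Q; empty means F* is unattainable."""
    common = None
    for d in Q:  # each call is independent, so this loop parallelizes naturally
        maximizers = mp_maximizers(points, posterior, d)
        common = maximizers if common is None else common & maximizers
        if not common:
            return set()
    return common  # every surviving grid point attains F*
```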
6.2 Uncertainty Sets for the Distribution of Observations

The robust formulation presented in this dissertation defines an uncertainty set of posterior distributions, f_{R|O}(r|o), from which we derive the robust estimate. As illustrated in Figure 4.1, the uncertainty in the posterior distribution fundamentally stems from uncertainty about the distribution of observations, f_{O|R}(o|r). This suggests an alternate approach: we may define an uncertainty set over the distribution of observations, and then derive a robust estimate using this new uncertainty set. The key challenge in this approach lies in translating the uncertainty constraints to the posterior distribution. Specifically, consider the situation where we define a set of distributions similar to the definition in equation (4.4). Assume that we have collected a set of observations, {o₁, o₂, …, o_m}, from a particular unknown location in S. The empirical mean and covariance matrix of the observations may then be computed for that location:

\hat{\mu} = \frac{1}{m} \sum_{i=1}^{m} o_i, \qquad (6.3)

\hat{\Sigma} = \frac{1}{m-1} \sum_{i=1}^{m} (o_i - \hat{\mu})(o_i - \hat{\mu})^\top. \qquad (6.4)
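For concreteness, equations (6.3) and (6.4) amount to the following computation; `obs` is a hypothetical m × N array holding one observation vector per row.

```python
import numpy as np

def empirical_moments(obs):
    """Sample mean (6.3) and unbiased sample covariance (6.4) of the observations."""
    m = obs.shape[0]
    mu_hat = obs.mean(axis=0)                    # equation (6.3)
    centered = obs - mu_hat
    sigma_hat = centered.T @ centered / (m - 1)  # equation (6.4)
    return mu_hat, sigma_hat
```

Equivalently, `np.cov(obs, rowvar=False)` returns the same covariance matrix.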
This empirical mean and covariance matrix may be used as the initial estimates, μ₀ and Σ₀, to construct an uncertainty set similar to equation (4.4), but for the distribution of observations, f_{O|R}(o|r). However, without further assumptions, it is unclear how these uncertainty constraints for f_{O|R}(o|r) can be translated to define an uncertainty set for the posterior distribution, f_{R|O}(r|o). One possible way forward is to further assume that the relevant distributions have a certain functional form. Specifically, assume that the observations are generated from a certain family of distributions, such as the log-normal family. By choosing an appropriate conjugate prior, we can fix the family of the posterior distributions. The constraints on the mean and covariance matrix of the distribution of observations can then be translated to constraints on the mean and covariance matrix of the posterior. This approach has the disadvantage that we are fixing the family of distributions of the prior, and hence that of the posterior, a priori. Depending on our prior knowledge of the location, or lack thereof, such an assumption may not be appropriate. More work needs to be done to investigate the feasibility and performance of such an approach, and to compare it with the robust formulation presented here. The current robust formulation, however, has the appealing feature of being conceptually close to how localization algorithms are designed and implemented today: current methods specify a single model for the observation vector and do not impose any restriction on the choice of the prior, and both features are retained in the current robust formulation.

In Section 5.1, we discussed how to select the parameters, γ₁ and γ₂, such that the resulting optimization problem always returns an optimal solution. Ideally, we would like to derive these parameters from the received observations in a manner that yields a probabilistic guarantee that our solution is robust with respect to the true posterior distribution. For instance, if we construct the distribution of observations, f_{O|R}(o|r), empirically from a large data set, then we are reasonably justified in choosing γ₁ close to 0 and γ₂ close to 1. However, further work is needed to fully characterize this dependence. To this end, a first step is to obtain a confidence region for the mean and the covariance matrix of R, based solely on the received set of observations. Finally, we aim to choose our initial estimates μ₀ and Σ₀, and the parameters γ₁ and γ₂, using the received set of observations in a manner that guarantees that the true posterior distribution lies within the uncertainty set F(S; μ₀, Σ₀, γ₁, γ₂) with high probability.

In [48], under a different problem setting, the authors specify how to obtain the parameters of the uncertainty set from historical data, such that the true distribution lies within the constructed set with high probability. Specifically, the authors consider problems of the form

\min_{x \in X} \; \max_{f \in F} \; \mathbb{E}_f[h(x, \xi)], \qquad (6.5)

where X is a convex set of feasible solutions, ξ is some random vector of parameters, h(x, ξ) is a cost function that is convex in x and concave in ξ, and F is the set of distributions of ξ. In this setting, it is assumed that one has access to an independent set of samples {ξ₁, ξ₂, …, ξ_m} generated according to an unknown distribution. This set of samples is used to construct the set F such that it contains the unknown distribution of ξ with high probability. The current work differs from the above primarily in that we construct the uncertainty set F for the posterior distribution, f_{R|O}(r|o), and not for the distribution of observations, f_{O|R}(o|r). Consequently, as illustrated in Figure 4.1, our estimates of the mean and covariance matrix involved in the construction of F are not the empirical mean and covariance matrix of the received set of observations, but those of R derived from the observations with the help of a model. The problem of obtaining performance guarantees for the robust estimate under this setting remains an area of future work.

6.3 Room Localization

We consider the problem of identifying the room that contains an RF-device. We begin by formulating indoor localization as a Bayesian optimization problem [18, 37, 38], which allows us to introduce our room localization requirements in a simple and natural manner.

Let S ⊂ R² be the two-dimensional space of interest in which localization is to be performed. We assume that S is convex, closed, and bounded. A room, R ⊆ S, is a subset of our space. Rooms are assumed to be disjoint simple polygons in S, and we do not require that they are convex. We have n_r rooms, R = {R₁, …, R_{n_r}}, and assume that we have access to a map that returns the unique room index for each location in S. Let the location of the receiver (the node whose location is to be estimated) be denoted by r ∈ S. The receiver is located in one of the rooms, identified by the room index m(r), which is unknown. Using a Bayesian viewpoint, we assume that this location is a random variable with some prior distribution f_R(r). This prior distribution is used to represent knowledge about the possible position, obtained, for instance, from previous location estimates or knowledge of the corresponding user's mobility characteristics in the space; in the absence of any prior knowledge, it could be set to be uniform over S.

Let o ∈ R^N represent the location dependent observation data that was collected. As an example, o could represent the received signal strength values from transmitters whose locations are known. Mathematically, we only require that the observation vector is drawn from a known distribution that depends on the receiver location r: f_{O|R}(o|r). In the case of RSS measurements, this distribution characterizes the stochastic radio propagation characteristics of the environment and the locations of the transmitters. Note that this distribution could be expressed in the form of a standard fading model whose parameters are fitted with observed data, such as the well-known simple path loss model with log-normal fading [39]. Using the conditional distribution of the observed vector and the prior over R, we obtain the conditional distribution over the receiver locations using Bayes' rule:

f_{R|O}(r \mid o) = \frac{f_{O|R}(o \mid r) \, f_R(r)}{\int_{r \in S} f_{O|R}(o \mid r) \, f_R(r) \, dr}. \qquad (6.6)

Algorithms for localization are essentially methods that derive a location estimate from the above posterior distribution. In fact, any localization algorithm A is a mapping from

- the observation vector o,
- the prior distribution over the location, f_R(r), and
- the conditional distribution over o, f_{O|R}(o|r),

to a location estimate \hat{r}.
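A minimal grid-based sketch of the posterior computation in equation (6.6) follows. It assumes, purely for illustration, the simple path loss model with log-normal fading mentioned above; the transmitter locations `tx` and the parameters `p0`, `eta`, and `sigma` are hypothetical.

```python
import numpy as np

def posterior_on_grid(points, prior, tx, obs, p0=-40.0, eta=3.0, sigma=4.0):
    """f(r|o) on the grid `points`, given one RSS reading per transmitter in `obs`."""
    d = np.linalg.norm(points[:, None, :] - tx[None, :, :], axis=2)
    mean_rss = p0 - 10.0 * eta * np.log10(np.maximum(d, 0.1))  # path-loss mean
    log_lik = -0.5 * np.sum(((obs - mean_rss) / sigma) ** 2, axis=1)
    post = prior * np.exp(log_lik - log_lik.max())  # Bayes' rule, unnormalized
    return post / post.sum()                        # normalize over the grid
```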
6.3.1 Optimization based approach to room localization

Most localization algorithms may be derived as the solution to optimization problems with carefully chosen cost functions [18]. Adopting this approach, we define a cost function that captures the room localization objective; the room localization algorithm can then be derived as the solution to an optimization problem using this cost function. In the most general terms, the cost function is modeled as C(r, \tilde{r}, o), i.e., a function of the true location r, a given proposed location estimate \tilde{r}, and the observation vector o. We define the expected cost function given an observation vector as follows:

\mathbb{E}[C(r, \tilde{r}, o)] = \int_{r \in S} C(r, \tilde{r}, o) \, f_{R|O}(r \mid o) \, dr. \qquad (6.7)

Given any cost function C, the optimal location estimate \hat{r} can be obtained by solving the following optimization for any given observation vector:

\hat{r} = \arg\min_{\tilde{r}} \mathbb{E}[C(r, \tilde{r}, o)]. \qquad (6.8)

The receiver is located in one of the rooms, identified by the room index m(r), which is unknown. In room localization, we estimate this room index. This objective can be specified using the cost function

C(r, \tilde{r}, o) = \begin{cases} -1 & \text{if } \tilde{r} \in R_{m(r)}, \\ 0 & \text{if } \tilde{r} \notin R_{m(r)}. \end{cases} \qquad (6.9)

In this case, the expected cost is given by

\mathbb{E}[C(r, \tilde{r}, o)] = -P[\tilde{r} \in R_{m(r)}] \qquad (6.10)
 = -P[r \in R_{m(\tilde{r})}] \qquad (6.11)
 = -\int_{r \in R_{m(\tilde{r})}} f_{R|O}(r \mid o) \, dr. \qquad (6.12)

This leads to the following location estimate:

\hat{r} = \arg\max_{\tilde{r}} P[r \in R_{m(\tilde{r})}]. \qquad (6.13)

It is easily seen that for an estimate \hat{r}, any other location in the same room matches the performance of \hat{r}. This property is a consequence of our choice of the cost function, and it matches the objective of room localization, where we do not distinguish between locations within the same room. Consequently, we can reduce our location estimate to an estimate of the room containing the device:

\hat{m} = \arg\max_{i} P[r \in R_i]. \qquad (6.14)

6.3.2 Incorporating Search

In certain settings, such as search and rescue operations, each room (say i) has a cost, c_i > 0, associated with it. This cost may represent the difficulty of accessing the room or its distance from the origin. The cost can be incorporated into the original cost function as follows:

C(r, \tilde{r}, o) = \begin{cases} -\frac{1}{c_{m(r)}} & \text{if } \tilde{r} \in R_{m(r)}, \\ 0 & \text{if } \tilde{r} \notin R_{m(r)}. \end{cases} \qquad (6.15)

In this case, the room estimate is given by

\hat{m} = \arg\max_{i} \frac{P[r \in R_i]}{c_i}. \qquad (6.16)

This also indicates the order in which we should search for the receiver, in case we wish to search until it is found. Specifically, list the rooms in decreasing order of P[r ∈ R_i]/c_i, and carry out the search in this order.
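The room estimate (6.14) and the search ordering implied by (6.16) reduce to summing posterior mass per room and sorting. In this sketch, `room_of` is a hypothetical integer array mapping each grid point to its room index, and `costs` holds the per-room access costs c_i; with unit costs, the first entry of the returned order recovers the room estimate of (6.14).

```python
import numpy as np

def room_probabilities(posterior, room_of, n_rooms):
    """P[r in R_i] obtained by summing posterior mass over each room's grid points."""
    return np.bincount(room_of, weights=posterior, minlength=n_rooms)

def search_order(posterior, room_of, costs):
    """Room indices in decreasing order of P[r in R_i] / c_i, as in (6.16)."""
    p = room_probabilities(posterior, room_of, len(costs))
    return np.argsort(-p / np.asarray(costs, dtype=float))
```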
Reference List

[1] N. A. Jagadeesan and B. Krishnamachari, "A unifying Bayesian optimization framework for radio frequency localization," IEEE Transactions on Cognitive Communications and Networking, vol. 4, no. 1, pp. 135–145, March 2018.
[2] R. Zekavat and R. Buehrer, Handbook of Position Location: Theory, Practice and Advances, ser. IEEE Series on Digital & Mobile Communication. Wiley, 2011. [Online]. Available: https://books.google.co.in/books?id=2kJuNmtDMkgC
[3] H. Liu, H. Darabi, P. Banerjee, and J. Liu, "Survey of wireless indoor positioning techniques and systems," IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), vol. 37, no. 6, pp. 1067–1080, Nov. 2007.
[4] J. Xiao, Z. Zhou, Y. Yi, and L. M. Ni, "A survey on wireless indoor localization from the device perspective," ACM Comput. Surv., vol. 49, no. 2, pp. 25:1–25:31, Jun. 2016. [Online]. Available: http://doi.acm.org/10.1145/2933232
[5] A. Waadt, C. Kocks, S. Wang, G. Bruck, and P. Jung, "Maximum likelihood localization estimation based on received signal strength," in Proc. IEEE ISABEL'10, Nov. 2010, pp. 1–5.
[6] Y.-F. Huang, Y.-T. Jheng, and H.-C. Chen, "Performance of an MMSE based indoor localization with wireless sensor networks," in Proc. IEEE NCM'10, Aug. 2010, pp. 671–675.
[7] L. Lin and H. So, "Best linear unbiased estimator algorithm for received signal strength based localization," in Proc. IEEE EUSIPCO'11, Aug. 2011, pp. 1989–1993.
[8] P. Bahl and V. Padmanabhan, "RADAR: An in-building RF-based user location and tracking system," in Proc. IEEE INFOCOM'00, vol. 2, Mar. 2000, pp. 775–784.
[9] K. Yedavalli and B. Krishnamachari, "Sequence-based localization in wireless sensor networks," IEEE Trans. Mobile Comput., vol. 7, no. 1, pp. 81–94, Jan. 2008.
[10] D. Fox, J. Hightower, L. Liao, D. Schulz, and G. Borriello, "Bayesian filtering for location estimation," IEEE Pervasive Comput., vol. 2, no. 3, pp. 24–33, Jul. 2003.
[11] K. Bauer, E. W. Anderson, D. McCoy, D. Grunwald, and D. C. Sicker, "CRAWDAD data set cu/rssi (v. 2009-05-28)," downloaded from http://crawdad.org/cu/rssi/, May 2009.
[12] Z. Yang, C. Wu, and Y. Liu, "Locating in fingerprint space: Wireless indoor localization with little human intervention," in Proc. ACM MobiCom'12, 2012, pp. 269–280.
[13] K. Chintalapudi, A. Padmanabha Iyer, and V. N. Padmanabhan, "Indoor localization without the pain," in Proceedings of the Sixteenth Annual International Conference on Mobile Computing and Networking, ser. MobiCom '10. New York, NY, USA: ACM, 2010, pp. 173–184.
[14] N. B. Priyantha, A. Chakraborty, and H. Balakrishnan, "The Cricket location-support system," in Proc. ACM MobiCom'00, 2000, pp. 32–43.
[15] M. Youssef and A. Agrawala, "The Horus WLAN location determination system," in Proc. ACM MobiSys'05, 2005, pp. 205–218.
[16] N. Patwari, A. Hero, M. Perkins, N. Correal, and R. O'Dea, "Relative location estimation in wireless sensor networks," IEEE Trans. Signal Process., vol. 51, no. 8, pp. 2137–2148, Aug. 2003.
[17] J. Wang, J. Chen, and D. Cabric, "Cramér–Rao bounds for joint RSS/DoA-based primary-user localization in cognitive radio networks," IEEE Transactions on Wireless Communications, vol. 12, no. 3, pp. 1363–1375, Mar. 2013.
[18] N. A. Jagadeesan and B. Krishnamachari, "A unifying Bayesian optimization framework for radio frequency localization," USC ANRG Technical Report, ANRG-2017-03, http://anrg.usc.edu/www/papers/tr 201703.pdf.
[19] H. Aksu, D. Aksoy, and I. Korpeoglu, "A study of localization metrics: Evaluation of position errors in wireless sensor networks," Comput. Netw., vol. 55, no. 15, pp. 3562–3577, Oct. 2011.
[20] E. Elnahrawy, X. Li, and R. Martin, "The limits of localization using signal strength: A comparative study," in Proc. IEEE SECON'04, Oct. 2004, pp. 406–414.
[21] R. M. Vaghefi and R. M. Buehrer, "Received signal strength-based sensor localization in spatially correlated shadowing," in 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, May 2013, pp. 4076–4080.
[22] X. Zheng, H. Liu, J. Yang, Y. Chen, R. P. Martin, and X. Li, "A study of localization accuracy using multiple frequencies and powers," IEEE Transactions on Parallel and Distributed Systems, vol. 25, no. 8, pp. 1955–1965, Aug. 2014.
[23] B. Wang, S. Zhou, W. Liu, and Y. Mo, "Indoor localization based on curve fitting and location search using received signal strength," IEEE Transactions on Industrial Electronics, vol. 62, no. 1, pp. 572–582, Jan. 2015.
[24] M. Youssef, A. Agrawala, and A. Udaya Shankar, "WLAN location determination via clustering and probability distributions," in Proc. IEEE PerCom'03, Mar. 2003, pp. 143–150.
[25] J. Blumenthal, R. Grossmann, F. Golatowski, and D. Timmermann, "Weighted centroid localization in Zigbee-based sensor networks," in Proc. IEEE WISP'07, Oct. 2007, pp. 1–6.
[26] R. M. Vaghefi, M. R. Gholami, R. M. Buehrer, and E. G. Ström, "Cooperative received signal strength-based sensor localization with unknown transmit powers," IEEE Transactions on Signal Processing, vol. 61, no. 6, pp. 1389–1403, March 2013.
[27] R. W. Ouyang, A. K. S. Wong, and C. T. Lea, "Received signal strength-based wireless localization via semidefinite programming: Noncooperative and cooperative schemes," IEEE Transactions on Vehicular Technology, vol. 59, no. 3, pp. 1307–1318, March 2010.
[28] S. Chitte, S. Dasgupta, and Z. Ding, "Distance estimation from received signal strength under log-normal shadowing: Bias and variance," IEEE Signal Process. Lett., vol. 16, no. 3, pp. 216–218, Mar. 2009.
[29] S. H. Low and D. E. Lapsley, "Optimization flow control I: Basic algorithm and convergence," IEEE/ACM Trans. Netw., vol. 7, no. 6, pp. 861–874, Dec. 1999.
[30] S. Shakkottai and R. Srikant, Network Optimization and Control, ser. Foundations and Trends in Networking. Boston - Delft: Now Publishers Inc, Jan. 2008.
[31] B. Shahriari, K. Swersky, Z. Wang, R. P. Adams, and N. de Freitas, "Taking the human out of the loop: A review of Bayesian optimization," Proceedings of the IEEE, vol. 104, no. 1, pp. 148–175, Jan. 2016.
[32] J. Jacob, B. R. Jose, and J. Mathew, "Spectrum prediction in cognitive radio networks: A Bayesian approach," in 2014 Eighth International Conference on Next Generation Mobile Apps, Services and Technologies, Sep. 2014, pp. 203–208.
[33] X. Xing, T. Jing, Y. Huo, H. Li, and X. Cheng, "Channel quality prediction based on Bayesian inference in cognitive radio networks," in 2013 Proceedings IEEE INFOCOM, April 2013, pp. 1465–1473.
[34] H. Durrant-Whyte and T. Bailey, "Simultaneous localization and mapping: Part I," IEEE Robotics Automation Magazine, vol. 13, no. 2, pp. 99–110, June 2006.
[35] S. Thrun, W. Burgard, and D. Fox, "A probabilistic approach to concurrent mapping and localization for mobile robots," Mach. Learn., vol. 31, no. 1-3, pp. 29–53, Apr. 1998.
[36] S. Särkkä, Bayesian Filtering and Smoothing. New York, NY, USA: Cambridge University Press, 2013.
[37] G. Casella and R. L. Berger, Statistical Inference, 2nd ed. Cengage Learning, 2001.
[38] H. Van Trees, K. Bell, and Z. Tian, Detection Estimation and Modulation Theory, Part I. Wiley, 2013.
[39] A. F. Molisch, Wireless Communications, 2nd ed. Chichester, West Sussex, U.K.: Wiley, 2010.
[40] D. Niculescu and B. Nath, "Ad hoc positioning system (APS) using AoA," in Proc. IEEE INFOCOM'03, vol. 3, Mar. 2003, pp. 1734–1743.
[41] A. Savvides, C.-C. Han, and M. B. Strivastava, "Dynamic fine-grained localization in ad-hoc networks of sensors," in Proc. ACM MobiCom'01, 2001, pp. 166–179.
[42] W. Rudin, Principles of Mathematical Analysis, 3rd ed. New York: McGraw-Hill Science/Engineering/Math, Jan. 1976.
[43] M. Angjelichinoski, D. Denkovski, V. Atanasovski, and L. Gavrilovska, "Cramér–Rao lower bounds of RSS-based localization with anchor position uncertainty," IEEE Transactions on Information Theory, vol. 61, no. 5, pp. 2807–2834, May 2015.
[44] H. G. Tucker, "A generalization of the Glivenko-Cantelli theorem," Ann. Math. Statist., vol. 30, no. 3, pp. 828–830, Sep. 1959.
[45] P. J. Huber and E. M. Ronchetti, Robust Statistics. John Wiley & Sons.
[46] A. Ben-Tal and A. Nemirovski, "Robust convex optimization," Math. Oper. Res., vol. 23, no. 4, pp. 769–805, Nov. 1998.
[47] D. Bertsimas, V. Gupta, and N. Kallus, "Data-driven robust optimization," Mathematical Programming, pp. 1–58, 2017.
[48] E. Delage and Y. Ye, "Distributionally robust optimization under moment uncertainty with application to data-driven problems," Operations Research, vol. 58, no. 3, pp. 595–612, Jan. 2010.
[49] W. Ye and F. Ordóñez, "Robust optimization models for energy-limited wireless sensor networks under distance uncertainty," IEEE Transactions on Wireless Communications, vol. 7, no. 6, pp. 2161–2169, June 2008.
[50] L. Lazos and R. Poovendran, "HiRLoc: High-resolution robust localization for wireless sensor networks," IEEE Journal on Selected Areas in Communications, vol. 24, no. 2, pp. 233–246, Feb. 2006.
[51] L. Lazos, R. Poovendran, and S. Capkun, "ROPE: Robust position estimation in wireless sensor networks," in IPSN 2005. Fourth International Symposium on Information Processing in Sensor Networks, April 2005, pp. 324–331.
[52] S. Capkun and J. P. Hubaux, "Secure positioning of wireless devices with application to sensor networks," in Proceedings IEEE 24th Annual Joint Conference of the IEEE Computer and Communications Societies, vol. 3, March 2005, pp. 1917–1928.
[53] L. Lazos and R. Poovendran, "SeRLoc: Secure range-independent localization for wireless sensor networks," in Proceedings of the 3rd ACM Workshop on Wireless Security, ser. WiSe '04, 2004, pp. 21–30.
[54] Z. Li, W. Trappe, Y. Zhang, and B. Nath, "Robust statistical methods for securing wireless localization in sensor networks," in IPSN 2005. Fourth International Symposium on Information Processing in Sensor Networks, April 2005, pp. 91–98.
[55] D. Liu, P. Ning, and W. K. Du, "Attack-resistant location estimation in sensor networks," in IPSN 2005. Fourth International Symposium on Information Processing in Sensor Networks, April 2005, pp. 99–106.
[56] B. Krishnamachari and K. Yedavalli, Secure Localization and Time Synchronization for Wireless Sensor and Ad Hoc Networks, R. Poovendran, S. Roy, and C. Wang, Eds. Boston, MA: Springer US, 2007.
[57] R. Casas, D. Cuartielles, A. Marco, H. J. Gracia, and J. L. Falco, "Hidden issues in deploying an indoor location system," IEEE Pervasive Computing, vol. 6, no. 2, pp. 62–69, April 2007.
[58] P. Li and X. Ma, "Robust acoustic source localization with TDOA based RANSAC algorithm," in Emerging Intelligent Computing Technology and Applications: 5th International Conference on Intelligent Computing, ICIC 2009, Ulsan, South Korea, September 16-19, 2009, Proceedings. Berlin, Heidelberg: Springer Berlin Heidelberg, 2009, pp. 222–227.
[59] T. Rappaport, Wireless Communications: Principles and Practice, 2nd ed. Upper Saddle River, NJ, USA: Prentice Hall PTR, 2001.
[60] W. Wiesemann, D. Kuhn, and M. Sim, "Distributionally robust convex optimization," Operations Research, vol. 62, no. 6, pp. 1358–1376, 2014. [Online]. Available: http://dx.doi.org/10.1287/opre.2014.1314
[61] H. E. Scarf, A Min-Max Solution of an Inventory Problem. Santa Monica, Calif.: Rand Corp., 1957.
[62] E. M. Bronstein, "Approximation of convex sets by polytopes," Journal of Mathematical Sciences, vol. 153, no. 6, pp. 727–762, 2008.
[63] R. Rockafellar, Convex Analysis, ser. Princeton Landmarks in Mathematics and Physics. Princeton University Press, 2015.
[64] S. Boyd and L. Vandenberghe, Convex Optimization. New York, NY, USA: Cambridge University Press, 2004.
[65] L. Vandenberghe and S. Boyd, "Semidefinite programming," SIAM Review, vol. 38, no. 1, pp. 49–95, 1996.
[66] S. Diamond and S. Boyd, "CVXPY: A Python-embedded modeling language for convex optimization," Journal of Machine Learning Research, vol. 17, no. 83, pp. 1–5, 2016.
[67] M. Udell, K. Mohan, D. Zeng, J. Hong, S. Diamond, and S. Boyd, "Convex optimization in Julia," SC14 Workshop on High Performance Technical Computing in Dynamic Languages, 2014.
[68] M. de Berg, O. Cheong, M. van Kreveld, and M. Overmars, Computational Geometry: Algorithms and Applications, 3rd ed. Santa Clara, CA, USA: Springer-Verlag TELOS, 2008.
[69] F. P. Preparata and S. J. Hong, "Convex hulls of finite sets of points in two and three dimensions," Commun. ACM, vol. 20, no. 2, pp. 87–93, Feb. 1977. [Online]. Available: http://doi.acm.org/10.1145/359423.359430
[70] K. Kaemarungsi and P. Krishnamurthy, "Analysis of WLAN's received signal strength indication for indoor location fingerprinting," Pervasive and Mobile Computing, vol. 8, no. 2, pp. 292–316, 2012, special issue: Wide-Scale Vehicular Sensor Networks and Mobile Sensing. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S1574119211001234