On the Theory and Applications of Structured Bilinear Inverse Problems to Sparse Blind Deconvolution, Active Target Localization, and Delay-Doppler Estimation

by

Sunav Choudhary

A Dissertation Presented to the
FACULTY OF THE GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(ELECTRICAL ENGINEERING)

December 2016

Copyright 2016 Sunav Choudhary

Acknowledgments

As I finish my doctoral studies at USC and look back, I am surprised at the sheer number of people who were involved in the completion of this dissertation. It indeed takes a village, and if I were to thank each and every individual and organization who played a role at various stages of the research that has gone into this dissertation, I would be left wanting for a substantially larger acknowledgment section and a lot more time. I will try my best to thank all the people and organizations who were substantially involved in the process and led to the timely completion of this dissertation. If I have forgotten someone, please accept my sincere apologies.

First and foremost, I want to thank my advisor, Prof. Urbashi Mitra, for her patience and guidance. A student only realizes in retrospect the difficulty of the multiple balancing roles played by his advisor. I am especially grateful for her insistence on being a good presenter in both written and oral forms, and for the countless hours she has spent over the years improving my ability to make good presentations and to write papers that would otherwise be unreadable. Funding a graduate student who wants to work on a moderately long-term idea during his initial years is a substantial risk on the advisor's part, and I am glad and grateful to have received that opportunity from my advisor.

I would like to thank the members of my qualifying exam and defense committees (Prof. Antonio Ortega, Prof. Shaddin Dughmi, Prof. Larry Goldstein, Prof. Justin Haldar and Prof. Suvrajeet Sen) for the great questions and suggestions they offered. It helped a lot to get viewpoints from researchers in different domains of expertise. On that note, USC deserves my thanks in no small part for the wide breadth of courses that it offers to every graduate student and the freedom and encouragement to enroll in them. Personally, besides the coursework in the Electrical Engineering department, I benefited immensely from courses in the Industrial and Systems Engineering, Computer Science and Mathematics departments that were taught by dedicated instructors (who would never tire of answering my numerous questions) and gave me differing viewpoints towards my research efforts. Indeed, bringing together ideas from such diverse fields was one of the most joyous and fruitful experiences I have ever had in my life, and it gave me great confidence in standing by the seemingly contentious results of my research during scrutinized presentations at various conferences. I was lucky to have had numerous interactions at various conferences, workshops and seminars with fellow graduate students and distinguished faculty that helped shape the research questions that were of interest to me and helped me learn about other related fields of interest.

Graduate school would have been a lot more dull if it were not for my friends, group-mates, officemates, roommates, and co-authors with whom I have had the privilege of having many intellectual discussions, brainstorming sessions and memorable lunches and dinners.
My hearty thanks to Srinivas, Chiru, Daphney, Karthikeyan, Arash, Hao, Sucha, Kuan-Wen, Dileep, Naveen, Karthik, Chitresh, Kunal, Emrah, Nicolo and many others for enriching my overall experience at USC. My special thanks go to my co-authors (Naveen, Sajjad, Prof. Shri Narayanan, Prof. Franz Hover, Dr. Geoff Hollinger) for the amazing experience of collaborative research. I should also thank my mentee interns over three summers, who taught me a lot about mentorship and gave me an enjoyable learning experience from the other side of the table. The administrative staff in the Electrical Engineering department were kind and helpful throughout my years here, and I am grateful to all of them for going out of their way to help whenever I needed it. My hearty thanks to Gerrielyn, Diane, Corine, Tim, Susan, Anita and Tracy for the same. Last but not least, I thank my parents and my brother for their love and support from halfway around the world!

Abstract

Linear models are the fundamental building blocks for numerous estimation algorithms throughout science and engineering. Decades of research into the estimation of linear models have yielded strong theoretical foundations as well as a rich set of algorithms for this class of problems. While still widely used, linear modeling has become increasingly limited in its ability to capture the nature of many real datasets in today's applications, necessitating research into the estimation of non-linear models. Although significant progress has been made towards developing generalizable theories for the estimation of non-linear models, an intuitive understanding of the characteristics of such estimation problems remains elusive. While theoretical and practical understanding of non-linear estimation problems is likely to keep researchers busy for the next few decades, any estimation problem (regardless of the linearity of the underlying model) needs to address the following two questions at a minimum:

1. Is the problem well-posed?

2. If the problem is well-posed, does there exist an efficient algorithm to solve it?

While the specific definitions of 'well-posed' and 'efficient' vary with the application in question and the scale of the data, the intuition associated with these questions is fairly standard. Well-posedness characterizes the information theoretic feasibility of the estimation problem assuming no restriction on computational resources, whereas efficiency refers to the achievability of the estimate with limited computational resources for a well-posed problem. This dissertation was born out of the need to understand a generalizable framework for analyzing the well-posedness of one class of non-linear estimation problems. Herein, fundamental questions about signal identifiability and efficient recovery for cone constrained bilinear models are raised and partially answered.

Chapter 2 abstractly formulates finite dimensional inverse problems with bilinear observations and conic constraints on the unknowns. Conic constraints are chosen since they represent a large class of regularizing techniques (like sparsity, low-rank structure, etc.), and bilinearity is studied since it is arguably the simplest form of non-linear modeling after piecewise-linearity. Some general dimensionality based techniques are developed for studying well-posedness in terms of the existence of a unique solution in the absence of noise.
The flexibility and generalizability of the proposed approach stem from the lifting procedure in optimization [1], which is a key step in the technical development of the results. Some identifiability results for one specific and important problem (blind deconvolution under separable conic constraints) are worked out to illustrate the power of the proposed approach.

Chapter 3 is devoted to an in-depth study of the single channel blind deconvolution problem under specific conic constraints like canonical sparsity and repetition coding. Blind deconvolution is chosen owing to its ubiquity in signal processing applications and its notorious ill-posedness without application specific assumptions. This chapter fleshes out the full power of the lifting technique for the identifiability analysis machinery developed in Chapter 2 by fully characterizing the ambiguity space for single channel blind deconvolution and obtaining surprisingly strong unidentifiability results.

Chapter 4 studies the problem of localizing a target from samples of its separable and decaying field signature. This problem is a disguised instance of a bilinear inverse problem with conic constraints. However, unlike the focus on identifiability analysis in Chapters 2 and 3, the goal here is to derive a localization algorithm with theoretical guarantees on convergence rate by leveraging the approximate bilinearity in the noisy target signature. It is further proved that both the number of samples and the computational effort necessary to achieve a given localization accuracy are much lower than those needed by a naive matrix completion based algorithm with passive sampling. The reduced sample requirement of the proposed approach stems from exploiting the unimodal approximation of the target signature field. The unimodal assumption is data-driven and is yet another example of a conic regularizing constraint. Besides reducing the number of required samples, unimodality also provides robustness against simultaneously sparse and low-rank observation noise. This type of noise significantly degrades the performance of low-rank matrix completion based estimators in the absence of prior information.

Chapter 5 studies the identifiability and recovery of the delay-Doppler components of a narrowband time-varying communication channel with leakage caused by finite blocklength and finite transmission bandwidth. The lifting technique is used to illustrate that this is a low-rank matrix recovery problem with complex exponential structural constraints. It is further shown that if the global optimum of the estimation problem is uniquely identifiable then there are no other local optima, and that a naive low-rank matrix recovery algorithm like nuclear norm minimization is bound to fail due to unidentifiability stemming from ignorance of the structural constraints. Since low-rank matrix structure can be equivalently transformed into a bilinear model, a simple and efficient estimation algorithm is designed based on the alternating minimization heuristic and shown to work well where basis pursuit based approaches fail.

Contents

Acknowledgments
Abstract
Contents
List of Figures
List of Publications

1 Introduction
  1.1 Organization
  1.2 Summary for Chapter 2
  1.3 Summary for Chapter 3
  1.4 Summary for Chapter 4
  1.5 Summary for Chapter 5

2 Identifiability Scaling Laws for Bilinear Inverse Problems
  2.1 Introduction
    2.1.1 Contributions
    2.1.2 Related Work
    2.1.3 Organization, Reading Guide and Notation
  2.2 System Model
    2.2.1 Bilinear Maps and Bilinear Inverse Problems (BIPs)
    2.2.2 Identifiability Definition
    2.2.3 Lifting
  2.3 Identifiability Results
    2.3.1 Universal Identifiability
    2.3.2 Deterministic Instance Identifiability
    2.3.3 A Random Rank One Model
    2.3.4 Random Instance Identifiability
  2.4 Discussion
    2.4.1 A Measure of Geometric Complexity
    2.4.2 The Role of Conic Prior
    2.4.3 The Role of δ
    2.4.4 Interpretation of Lemma 2.12
    2.4.5 The Gaussian and Bernoulli Special Cases
    2.4.6 Distinctions between Theorems 2.13 and 2.14
  2.5 Numerical Results on Blind Deconvolution
    2.5.1 Bi-orthogonally Supported Uniform Distributions
    2.5.2 Null Space of Linear Convolution
    2.5.3 Verification Methodology
    2.5.4 Small Complexity of N(S, 2) ∩ M
    2.5.5 Large Complexity of N(S, 2) ∩ M
    2.5.6 Infinite Complexity of N(S, 2) ∩ M
  2.6 Conclusions
  2.7 Proofs
    2.7.1 Proof of Theorem 2.1
    2.7.2 Proof of Corollary 2.2
    2.7.3 Proof of Proposition 2.3
    2.7.4 Proof of Theorem 2.4
    2.7.5 Proof of Corollary 2.5
    2.7.6 Proof of Lemma 2.6
    2.7.7 Proof of Lemma 2.7
    2.7.8 Proof of Theorem 2.10
    2.7.9 Proof of Corollary 2.11
    2.7.10 Proof of Lemma 2.8
    2.7.11 Proof of Lemma 2.9
    2.7.12 Proof of Lemma 2.12
    2.7.13 Proof of Theorem 2.13
    2.7.14 Proof of Theorem 2.14

3 Identifiability Limits of Sparse Blind Deconvolution
  3.1 Introduction
    3.1.1 Intuition
    3.1.2 Contributions and Organization
    3.1.3 Related Work
    3.1.4 A Note on the Usage of Dimension/Degrees of Freedom
    3.1.5 Notational Conventions
  3.2 System Model
    3.2.1 The Blind Deconvolution Problem
    3.2.2 Lifting
    3.2.3 Anti-Diagonal Sum Interpretation
  3.3 Parameterizing the Rank-Two Null Space
  3.4 Main Unidentifiability Results
    3.4.1 Non-sparse Blind Deconvolution
    3.4.2 Canonical-Sparse Blind Deconvolution
    3.4.3 Mixed Extensions
  3.5 Unidentifiability Results with Coding
    3.5.1 Repetition Coding
    3.5.2 Partially Cooperative Coding
  3.6 Conclusions
  3.7 Proofs
    3.7.1 Proof of Proposition 3.2
    3.7.2 Proof of Theorem 3.3
    3.7.3 Proof of Proposition 3.4
    3.7.4 Proof of Lemma 3.5
    3.7.5 Proof of Theorem 3.6
    3.7.6 Proof of Theorem 3.7
    3.7.7 Proof of Theorem 3.8
    3.7.8 Proof of Lemma 3.9
    3.7.9 Proof of Corollary 3.10
    3.7.10 Proof of Corollary 3.11
    3.7.11 Proof of Corollary 3.12
    3.7.12 Proof of Corollary 3.13
    3.7.13 Proof of Corollary 3.14
    3.7.14 Proof of Corollary 3.15

4 Active Target Localization on Decaying Separable Fields
  4.1 Introduction
    4.1.1 Contributions and Organization
    4.1.2 Related Work
    4.1.3 Notation
  4.2 System Model
    4.2.1 Target Field Assumptions
    4.2.2 Lifted Formulation
  4.3 Sampling and Reconstruction Approach
    4.3.1 The PAMCUR Algorithm
    4.3.2 Correctness and Localization-Accuracy Trade-off
    4.3.3 Complexity Computations
  4.4 Baseline algorithms
    4.4.1 Matrix Completion based variants (MConly and MCuni)
    4.4.2 Surface Interpolation (interp)
    4.4.3 Mean-shift based Gradient Ascent (MS)
  4.5 Numerical Experiments: Synthetic Data
  4.6 Numerical Experiments: Elevation Dataset
    4.6.1 3-D road network dataset
    4.6.2 Results
  4.7 Conclusions
  4.8 Proofs and Supplementary Material
    4.8.1 Proof of Theorem 4.1
    4.8.2 Proof of Lemma 4.2
    4.8.3 Proof of Theorem 4.3
    4.8.4 Proof of Lemma 4.4
    4.8.5 Proof of Lemma 4.5
    4.8.6 Proof of Lemma 4.6
    4.8.7 Relaxing Positivity and Sampling Grid Assumptions in Algorithm 1
    4.8.8 Coherence Computation
    4.8.9 Mean-shift based gradient ascent (MS) algorithm

5 Delay-Doppler Estimation of Channels with Leakage
  5.1 Introduction
  5.2 System Model and Low-Rank Structure
  5.3 A Recovery Algorithm
  5.4 Discussion and Simulations
    5.4.1 Selecting Weights
    5.4.2 Failure of Nuclear Norm Minimization
    5.4.3 Practical Convergence Rate and Stopping Criterion
    5.4.4 Parameter Regimes to avoid Ill-conditioning
    5.4.5 Numerical Results: One Dominant Component
    5.4.6 Numerical Results: Multiple Dominant Components
  5.5 Conclusions
  5.6 Proofs
    5.6.1 Supporting Lemmas
    5.6.2 Proof of Theorem 5.2
    5.6.3 Proof of Theorem 5.1

6 Conclusions and Future Directions
  6.1 Summary
  6.2 Outlook

Bibliography

List of Figures

2.1 Lifted matrices S_k ∈ R^(m×n) for the linear convolution map with m = 3, n = 4, q = m + n − 1 = 6 and 1 ≤ k ≤ q.
2.2 Exponentially decaying behavior of the theoretically predicted failure probability bound in Theorem 2.14 w.r.t. n for fixed values of m, for parameters ε = 0.1, δ = 10^(−4) and p = m + n − 3 for the lifted linear convolution map (p defined as in Theorem 2.14).
2.3 Linear scaling behavior of log(failure probability) with log n for fixed values of m. The absolute value of the fitted slope is 0.48.
2.4 Linear scaling behavior of log(failure probability) with problem dimension n for fixed values of m. The absolute values of the fitted slopes are between 0.093 and 0.094.
2.5 Exponentially decaying behavior of the simulated failure probability w.r.t. n for fixed values of m, for parameter μ = 0.8 and the lifted linear convolution map. The absolute values of the fitted slopes are between 0.94 and 1.08.
3.1 Lifted matrices S_k ∈ R^(m×n) for the linear convolution map with m = 3, n = 4 and 1 ≤ k ≤ m + n − 1.
3.2 Illustration of the anti-diagonal sum interpretation of the lifted linear convolution operator S(·) for (m, n) = (3, 4), satisfying S(W) = z. The shorthand w_kl = W(k, l) has been used with 1 ≤ k ≤ m = 3 and 1 ≤ l ≤ n = 4.
3.3 Venn diagram displaying the subset/superset relationships between N(S(m,n), 2), N_0(m,n), N_2(m,n) and M(m,n), as indicated by Theorem 3.3 for m, n ≥ 3.
3.4 An arbitrary vector s ∈ K_0(Λ, d) with Λ = {3, 4, 7, 8, 9, 12} and d = 14. Every s ∈ K_0(Λ, d) is zero on the index set Λ (indicated by blue dots). Heights of the black dashed stems, indicating values on Λ^c, can vary across different vectors in K_0(Λ, d).
3.5 Two-hop relay assisted communication link topology between the source S and the destination D with k intermediate relays R_1, R_2, ..., R_k. The j-th SISO channel is scaled by g(l_j) and delayed by l_j units (represented as g(l_j) D^(−l_j) h) for 1 ≤ j ≤ k.
3.6 An arbitrary vector s ∈ K_b(Λ, d) with Λ = {3, 4, 7, 8, 9, 12}, d = 14 and b = (−0.5, −0.835, 0.3, 0.5, 0.835, 0.15)^T. The specific vector s in the left plot satisfies s(Λ) = cb with c = −1, verifying (3.29) with code vector b as in the right plot. Every s ∈ K_b(Λ, d) is collinear with b on the index set Λ (indicated by the union of solid red and dashed red stems), with Λ representing the identities of the cooperating relays. Heights of the double dashed black stems, indicating values on Λ^c, represent contributions from non-cooperating relays and can vary across different vectors in K_b(Λ, d). Letting Λ* = {1, 3, 4}, (Λ, b) is a type 1 pair by Definition 3.2, validated by s(Λ ∩ (Λ − 1)) = −b(Λ*) (indicated by solid red stems on both plots) being collinear with s(Λ ∩ (Λ − 1) + 1) = −b(Λ* + 1) (pointed to by black arrows on both plots). The union of the blue braces on the left plot denotes the index set Λ ∪ (Λ − 1) = {2, 3, 4, 6, 7, 8, 9, 11, 12}.
4.1 An underwater side-scan sonar image with a pseudo-synthetic target signature. The background noise and artifacts are due to reflections from the seabed.
4.2 Plot of dominant singular values for approximately separable fields (approximately rank-1 fields).
4.3 A heatmap of the ratio η(σ, σ_0, ζ) / √(1 − ζ²/σ²) over the domain 0 ≤ ζ ≤ σ ≤ σ_0. In the high SNR regime, σ/σ_0 ≈ 1 and the bound η(σ, σ_0, ζ) ≥ √(1 − ζ²/σ²) is very tight irrespective of the value of ζ.
4.4 Plot of an arbitrary non-negative unimodal vector u_0 ∈ R^401 with unit ℓ2-norm, satisfying u_0(l) ∝ exp(−|0.1 l − 20.1|) over 1 ≤ l ≤ 401. Choosing ζ_0 = 0.3√2, calculations give l_r^BL = 176 and l_r^BR = 226. The threshold l_r^BR − l_r^BL is less than one-eighth of the length of u_0.
4.5 Trade-off showing the localization bound (normalized by the size of the search space) achievable for a given accuracy bound √(1 − ζ²/σ²) (and hence for a given sampling budget) for discretized versions of standard Gaussian, Laplacian and Cauchy fields.
4.6 Variation of the probability of correct localization by the MConly algorithm to within 4% of the search space, averaged over 10 trials. Results are for three different decay profiles across a range of sampled window sizes and field spread factors with Gaussian distributed background noise.
4.7 A low resolution visualization of the elevations in the road network dataset. The regions where readings are not available are shown in blue.
4.8 Trade-off between the number of samples collected and the localization accuracy achieved for the MConly, MCuni, interp, MS and PAMCUR algorithms, averaged over 500 runs for two different grid sizes, viz. n = 50 and n = 100.
4.9 Localization error performance vs. number of samples for the PAMCUR algorithm on the elevation dataset in Figure 4.7. Each color represents the result of 500 independent trials plotted individually.
4.10 A three dimensional embedding of the vectors u_M, u and u_0 for aiding visualization in the proof of Theorem 4.3.
5.1 A single path time-varying channel, H_1, represented in the delay-Doppler domain. Here K = 128 and M = 128.
5.2 Estimation accuracy results averaged over 10 realizations of the observation noise vector. The closer the value of f̂/f to 1, the higher the relative accuracy of the estimate.
5.3 Scatter plot of the relative accuracies {f̂(l)/f(l) : 1 ≤ l ≤ 4} of all components in the estimated frequency vector f̂ versus frequency factor η ∈ {0.8, 1, 1.2} at a constant oversampling factor of ρ = 1.5 and 20 dB SNR. Results for different η are coded by markers of different color/shape. For a given η, the stronger the clustering of the markers around 1 along the y-axis, the higher the overall relative accuracy of f̂.
5.4 Plot of normalized MSE performance of BP and AMALR versus frequency factor η ∈ {0.8, 1, 1.2} at a constant oversampling factor of ρ = 1.5 and 20 dB SNR. BP fails completely due to non-utilization of structural properties. AMALR gives less than 2% normalized MSE.

List of Publications

[1] S. Choudhary, S. Beygi, and U. Mitra, "Delay-Doppler Estimation via Structured Low-Rank Matrix Recovery," in 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Shanghai, China, Mar. 2016, pp. 3786–3790.
[2] S. Choudhary, N. Kumar, S. Narayanan, and U. Mitra, "Active Target Localization using Low-Rank Matrix Completion and Unimodal Regression," ArXiv e-prints, vol. abs/1601.07254, Jan. 2016. [Online]. Available: http://arxiv.org/abs/1601.07254.
[3] S. Honnungar, S. Choudhary, and U. Mitra, "On Target Localization with Communication Costs via Tensor Completion: A Multi-modal Approach," in 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Shanghai, China, Mar. 2016, pp. 6400–6404.
[4] S. Choudhary and U. Mitra, "Fundamental Limits of Blind Deconvolution Part II: Sparsity-Ambiguity Trade-offs," ArXiv e-prints, vol. abs/1503.03184, Mar. 2015. [Online]. Available: http://arxiv.org/abs/1503.03184.
[5] ——, "Analysis of Target Detection via Matrix Completion," in 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brisbane, Australia, Apr. 2015, pp. 3771–3775.
[6] U. Mitra, S. Choudhary, F. Hover, R. Hummel, N. Kumar, S. Narayanan, M. Stojanovic, and G. Sukhatme, "Structured Sparse Methods for Active Ocean Observation Systems with Communication Constraints," IEEE Commun. Mag., vol. 53, no. 11, pp. 88–96, Nov. 2015.
[7] S. Choudhary, D. Kartik, N. Kumar, S. Narayanan, and U. Mitra, "Active Target Detection with Navigation Costs: A Randomized Benchmark," in 52nd Annual Allerton Conference on Communication, Control, and Computing (Allerton), Monticello, USA, Sep. 2014, pp. 109–115.
[8] S. Choudhary, N. Kumar, S. Narayanan, and U. Mitra, "Active Target Detection with Mobile Agents," in 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Florence, Italy, May 2014, pp. 4218–4222.
[9] S. Choudhary and U. Mitra, "Identifiability Scaling Laws in Bilinear Inverse Problems," ArXiv e-prints, vol. abs/1402.2637, Feb. 2014, submitted to IEEE Transactions on Information Theory. [Online]. Available: http://arxiv.org/abs/1402.2637.
[10] ——, "Fundamental Limits of Blind Deconvolution Part I: Ambiguity Kernel," ArXiv e-prints, vol. abs/1411.3810, Nov. 2014. [Online]. Available: http://arxiv.org/abs/1411.3810.
[11] ——, "Sparse Blind Deconvolution: What Cannot Be Done," in 2014 IEEE International Symposium on Information Theory (ISIT), Honolulu, USA, Jun. 2014, pp. 3002–3006.
[12] ——, "On The Impossibility of Blind Deconvolution for Geometrically Decaying Subspace Sparse Signals," in 2nd IEEE Global Conference on Signal and Information Processing (GlobalSIP), Atlanta, USA, Dec. 2014, pp. 463–467.
[13] ——, "Identifiability Bounds for Bilinear Inverse Problems," in 47th Asilomar Conference on Signals, Systems and Computers, Pacific Grove, USA, Nov. 2013, pp. 1677–1681.
[14] ——, "On Identifiability in Bilinear Inverse Problems," in 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Vancouver, Canada, May 2013, pp. 4325–4329.
[15] ——, "Sparse Recovery from Convolved Output in Underwater Acoustic Relay Networks," in 2012 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), Hollywood, USA, Dec. 2012, pp. 1–8.
[16] G. Hollinger, S. Choudhary, P. Qarabaqi, C. Murphy, U. Mitra, G. Sukhatme, M. Stojanovic, H. Singh, and F. Hover, "Underwater Data Collection Using Robotic Sensor Networks," IEEE J. Sel. Areas Commun., vol. 30, no. 5, pp. 899–911, Jun. 2012.
[17] ——, "Communication Protocols for Underwater Data Collection using a Robotic Sensor Network," in 2011 IEEE GLOBECOM Workshops (GC Wkshps), Houston, USA, Dec. 2011, pp. 1308–1313.
Chapter 1

Introduction

As the world heads towards the use of systems with ever-increasing complexity, nonlinear effects have become an unavoidable part of the modeling, observation, control and reconstruction processes for high dimensional sampled data applications. In many large datasets of natural images, sparse signal models and low-dimensional non-linear manifold based models explain and predict observed data better than classical linear models [2]. In the integrated circuit design of sampling front-ends, secondary and tertiary non-linear effects have become dominant energy sinks as circuit densities have increased exponentially. In control theoretic applications, simple non-linear controllers outperform their linear counterparts in many critical centralized and decentralized architectures [3]. In compression and reconstruction applications, non-linear reconstruction strategies can regularly outperform traditional linear filtering for undersampled data [4], [5]. Classical filter design tools in sampling and signal processing systems have been inherently linear, and are not well suited for native extension to non-linear models or systems. On the other hand, optimization theory based tools for certain classes of non-linear objectives and constraints have become increasingly feasible with the dramatic increase in computing power over the last decade [6]. It is therefore natural to expect that optimization-centric non-linear processing capabilities should inform and broaden our current modeling and sampling choices for problem specific non-linear models in contemporary applications.

Classically, the approach to modeling and sampling non-linearity has been one of avoidance, i.e. designing, sampling and reconstructing approximately linearly. This is not a universally desirable approach, despite the widespread popularity of linear models in science and engineering. To see why this is the case, one needs to look no further than typical sources of non-linearity in some contemporary applications.

1. Observing unknown signals through partially known linear systems: Consider a corporation (like Google) that maintains a database of a few basic facts on customers visiting its website. Suppose that the corporation wants to strategically market the launch of a new product to maximize revenue. From prior experience, the corporation knows that some linear function of a sufficiently detailed profile of a customer serves as a fairly good predictor of that customer's response to the new product. However, government regulations only allow the corporation to have access to a small subset of the full customer profile, and the company is left with the task of simultaneously determining the linear function (the unknown signal) and predicting the missing information in its customer profiles (the partially known linear system). In the machine learning literature, this task falls under semi-supervised learning [7].

2. Non-linear manifold based signal models: Suppose that researchers in proteomics [8] discover a hitherto unknown protein molecule responsible for an important cellular physiological process. To probe the structure of this newly discovered molecule, researchers use X-rays to bombard a crystallized form of the protein. However, X-ray detectors that can measure the phase of the reflected rays tend to be very expensive to manufacture due to technological limitations, and it is common practice to work with much cheaper detectors that can only measure the intensity of the reflected X-rays over a wide bandwidth.
The price paid in measuring only spectral intensities comes in the form of the observed intensities no longer being linear functions of the unknown signal (the protein structure). Inferring the unknown signal now becomes a non-linear inverse problem; this particular problem is known as the phase retrieval problem [9].

The above categories are by no means exhaustive, but they strongly suggest the inadequacy of linear models in important applications. Besides modeling necessities, there are multiple other reasons to embrace non-linearities.

1. Acquiring linear observations is prohibitively expensive for some applications compared to acquiring certain classes of non-linear observations, e.g. phaseless measurements in X-ray crystallography [10], bilinear composite channels in multihop communications [11], [12], matrix factorization for topic modeling [13], etc.

2. Non-linear models can lead to efficient sampling and reconstruction schemes for high dimensional applications where sample complexity or latency is a bottleneck, e.g. magnetic resonance imaging [14], hyperspectral imaging [15], collaborative filtering [16], wideband communications [17], etc. To further exemplify this point, the theory and application of compressed sensing [4] and low-rank matrix completion [5] allow the use of random linear sensing on a non-linear low-dimensional model of a dataset of high ambient dimension to break the sampling lower bound of Shannon and Nyquist.

Working with non-linear models of course comes at a price. While the mathematical theory of linear systems is simple to comprehend and is at a mature stage of development, no analogous mathematical theory for non-linear systems holds in such generality. This makes both algorithmic and information theoretic development for non-linear models a much harder task than for their linear counterparts. Furthermore, application specific considerations cannot always be directly incorporated by the current state of the art in sparse signal approximation theory. A successful transition from theory to practice necessarily requires novel solutions to such roadblocks, leading to modifications of the existing theory and appropriate generalizations to classes of non-linear and structured linear observation schemes. This dissertation is an attempt to raise and partially address fundamental questions about signal identifiability and recovery for finite dimensional bilinear inverse problems with structural constraints.

1.1 Organization

Chapters 2 and 3 describe the results of the author's research efforts towards the information theoretic characterization of finite dimensional inverse problems with bilinear observations, one of the simplest forms of non-linear observation models. Emphasis is put on the theoretical development of identifiability in such inverse problems under regularizing constraints like sparsity. Much of the justification for the specific classes of assumptions made in this theoretical research is inspired by an effort to identify the characteristics of the sparsity constrained blind deconvolution problem (a notoriously difficult bilinear inverse problem in signal processing). Chapter 4 describes how approximate bilinearity can benefit a practical application like active target localization.
Here the emphasis is on using separability (a form of bilinear parameterization) and monotonicity/unimodality (a convex conic constraint) to derive a localization algorithm with theoretical guarantees on convergence rate and the localization-accuracy trade-off, and on demonstrating the regularizing properties of separability on a real dataset that is not strictly separable. The explicit connection to bilinear observation systems stems from the separability and low-rank structures on the low-dimensional physical field, i.e. the minimal parametrization of the field enters the description in a bilinear fashion. Chapter 5 investigates how structured bilinearity stemming from the delay-Doppler domain parametrization of narrowband communication channels can be used to guarantee identifiability of the channel estimation inverse problem, when straightforward formulations based on low-rank matrix recovery are fraught with non-convexities that are difficult to relax without losing identifiability. Below, we summarize the background, related work and contributions for each of these chapters.

1.2 Summary for Chapter 2

A number of ill-posed inverse problems like blind deconvolution [18], blind source separation [19] and dictionary learning [15] in signal processing, matrix factorization in machine learning [20], blind equalization in wireless communications [21], etc. share the common characteristic of being bilinear inverse problems (BIPs), i.e. the observation model is a function of two variables and, conditioned on one variable being known, the observation is a linear function of the other variable. A key issue that arises for such inverse problems is that of identifiability, i.e. whether the observation is sufficient to unambiguously determine (beyond trivial ambiguities) the pair of inputs that generated the observation. Of particular interest are signal recovery problems from under-determined systems of measurement where additional structure is needed in order to ensure recovery, and the observation model is non-linear in the parametrization of the problem.

Identifiability (and signal reconstruction) for linear inverse problems with sparsity and low-rank structures has received considerable attention in the context of compressed sensing and low-rank matrix recovery, respectively, and is now quite well understood [22]. In a nutshell, both the compressed sensing and low-rank matrix recovery theories guarantee that the unknown sparse/low-rank signal can be unambiguously reconstructed from relatively few properly designed linear measurements using algorithms with runtime growing polynomially in the signal dimension. For non-linear inverse problems (including BIPs), however, the characterization of identifiability (and signal reconstruction) still remains largely open. To illustrate that analyzing identifiability is nontrivial, we present a simple example. Consider the blind linear deconvolution problem represented by Problem (P1). Suppose that we have the observation z = (1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0)^T ∈ R^17. It is not difficult to verify that both

x = (1, 0, 0, 0, 1, 0, 0)^T, y = (1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1)^T (1.1a)

and

x = (1, 0, 1, 0, 1, 0, 1)^T, y = (1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0)^T (1.1b)

are valid solutions to Problem (P1). Furthermore, it is not immediately obvious what structural constraints would disambiguate between the above two solutions.
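This ambiguity is easy to check numerically. The following minimal NumPy snippet (added to this transcript for illustration; it is not from the dissertation) verifies that both pairs in (1.1a) and (1.1b) convolve to the same z, and notes that both pairs even carry the same total number of nonzeros, so a canonical sparsity prior alone cannot separate them:

    import numpy as np

    # Observation from the example: z in R^17.
    z = np.array([1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0])

    # Candidate solution (1.1a).
    x_a = np.array([1, 0, 0, 0, 1, 0, 0])
    y_a = np.array([1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1])

    # Candidate solution (1.1b).
    x_b = np.array([1, 0, 1, 0, 1, 0, 1])
    y_b = np.array([1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0])

    # Both pairs produce exactly the same observation.
    assert np.array_equal(np.convolve(x_a, y_a), z)
    assert np.array_equal(np.convolve(x_b, y_b), z)

    # Both pairs also have the same total support size (6 nonzeros).
    assert np.count_nonzero(x_a) + np.count_nonzero(y_a) == 6
    assert np.count_nonzero(x_b) + np.count_nonzero(y_b) == 6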
We have shown identifiability and constructed fast recovery algorithms in [11] when x (possibly sparse) lies on a one dimensional subspace (modulo a global sign flip), whereas we show negative results for the more general sparse (with respect to the canonical basis) blind deconvolution problem, with an extension to geometrically decaying subspace sparse signals, in Chapter 3.

To the best of our knowledge, a unified theoretical treatment of identifiability for BIPs has not been attempted in prior literature. We found only literature addressing identifiability for specific BIPs under specific assumptions (with potential to be generalized), which we summarize below. The generality of our approach includes, in particular, the linear convolution model of [23], the circular convolution model of [24], [25] and the compressed bilinear observation model of [26] as special cases. Our technical approach hinges on the lifting procedure from optimization [1], which has previously been employed to derive convex relaxations for hard optimization problems but never to study identifiability. Furthermore, existing identifiability or recovery results do not generalize to cases where the sampling operator admits rank-two matrices in its null space. In [25], a recoverability analysis for the blind circular deconvolution problem is undertaken, but knowledge of the sparsity pattern of one input signal is needed. Some identifiability results for blind deconvolution are summarized in [27], but the treatment therein is inflexible to the inclusion of side information about the input signals. For the dictionary learning problem, an identifiability analysis is developed in [28] leveraging results from [29], and exact recoverability of over-complete dictionaries from training samples (only polynomially large in the dimensions of the dictionary) has been proved in [13] assuming a sparse (but unknown) coefficient matrix. While every BIP can in principle be recast as a dictionary learning problem, such a transformation would result in additional structural constraints on the dictionary that may or may not be trivial to incorporate in the existing analyses. This is especially true for bilinear maps over vector pairs. In contrast, we develop our methods to specifically target bilinear maps over vector pairs (e.g. the convolution map) and thus obtain definitive results where the dictionary learning based formulations would most likely fail. Identifiability for non-negative matrix factorization was examined in [20], exploiting geometric properties of the non-negative orthant. Although our results can be easily visualized in terms of geometry, they can also be stated purely in terms of linear algebra (Theorem 2.4). Identifiability results for low-rank matrix completion [5], [30] are provided in [31] via algebraic and combinatorial conditions using graph theoretic tools, but there is no straightforward way to extend these results to more general lifted linear operators like the convolution map. Overall, a unified flexible treatment of identifiability in BIPs with deterministic operators has not been developed to date. In this chapter, we present such a framework incorporating conic constraints on the input signals (which include sparse signals in particular). Details of the contributions are given below.
1. We cast conic prior constrained BIPs as low-rank matrix recovery problems, establish the validity of the 'lifting' procedure (Section 2.2.3) and develop deterministic sufficient conditions for identifiability (Section 2.3.2), while bridging the gap to necessary conditions in a special case. Our characterization agrees with the intuition that identifiability subject to priors should depend on the joint geometry of the signal space and the bilinear map. Our results are geared towards bilinear maps that admit a nontrivial rank-two null space, as is the case with many important BIPs like blind deconvolution.

2. We develop trade-offs between the probability of identifiability of a random instance of the input signal and the complexity of the rank-two null space of the lifted bilinear map under three classes of signal ensembles, viz. dependent but uncorrelated, independent Gaussian, and independent Bernoulli (Section 2.3.4). Specifically, we demonstrate that instance identifiability can be characterized by the complexity of the restricted rank-two null space, measured by the covering number of the set {(C(X), R(X)) | X ∈ N(S, 2) ∩ M \ {0}}, where C(X) and R(X) denote, respectively, the column and row spaces of the matrix X, and N(S, 2) ∩ M denotes the rank-two null space of the lifted bilinear map S(·) restricted by the prior on the signal set to M. To the best of our knowledge, this gives new structural results based solely on the bilinear measurement model and is thus applicable to general BIPs.

3. We demonstrate that the rank-two null space of the lifted bilinear map can be partly characterized in at least one important case (blind deconvolution), and conjecture that the same should be possible for other bilinear maps of interest (dictionary learning, blind source separation, etc.). Based on this characterization, we present numerical simulations for selected variations on the blind deconvolution problem to demonstrate the tightness of our scaling laws (Section 2.5). The characterization of the rank-two null space of blind deconvolution is addressed further in Chapter 3.

1.3 Summary for Chapter 3

Continuing from Chapter 2, this chapter uses the machinery developed therein for a more in-depth analysis of the blind linear deconvolution problem under sparsity constraints. We chose to address blind deconvolution in particular since it is an important and ubiquitous non-linear inverse problem in its own right (even without identifying it as a BIP), with applications in control theory [27], [32], wireless communications [18], [21], [33]–[35] and image processing [36], [37]. In the absence of additional constraints, blind deconvolution is known to be ill-posed, and each application mentioned above imposes some form of prior knowledge on the underlying signal structures to render the inverse problem better behaved. Prior research on blind system and channel identification [21], [27], [35] has mainly focused on single-in-multiple-out (SIMO) and multiple-in-multiple-out (MIMO) systems, also known as the blind multi-channel finite-impulse-response (FIR) estimation problem. A key property necessary in this setting for successful identifiability and recovery of the multiple channel vectors is that the channels should display sufficient diversity or richness, either stochastically (cyclostationary second order statistics) [21], [38], [39] or deterministically (no common zero across all channels) [35], [40]–[43].
As pointed out in [18], [38], such diversity is generally unavailable in single-in-single-out (SISO) systems, thus making them extremely challenging. In the absence of inter-symbol interference [44], the problem is easily solved. While there have been a few attempts at exploiting sparsity priors for blind deconvolution type problems [19], [24], [25], [28], [45], [46], satisfactory results have not been obtained on the key issue of identifiability of such sparse models, except under very restrictive constraints. A promising identifiability analysis was proposed in [28], leveraging results from [29] on matrix factorization for sparse dictionary learning using the non-convex ℓ1 norm and ℓq quasi-norm for 0 < q < 1. At this time, however, extending this machinery to the single channel blind linear deconvolution problem remains elusive.

In this chapter, we quantify unidentifiability for certain families of noiseless sparse blind and semi-blind deconvolution problems in a non-asymptotic and non-statistical setup. Specifically, given model orders m, n ∈ Z+, we investigate some application driven choices of the separable domain restriction (x*, y*) ∈ K ⊆ R^m × R^n and establish that the vectors x* and y* cannot be uniquely determined from their linearly convolved resultant vector z* = x* ⋆ y*, even up to scaling ambiguities. The specific application we consider, to motivate our choices in Section 3.5, is multi-hop sparse channel estimation for relay assisted communication. Our focus is on an algorithm independent identifiability analysis, and hence we shall not examine efficient/polynomial-time algorithms, but rather show information theoretic impossibility results. Our approach leads to the following novelties.

1. We explicitly demonstrate the almost everywhere unidentifiable nature of unconstrained blind deconvolution by constructing families of adversarial signal pairs for even model orders m and n. This is a much stronger unidentifiability result than any result in prior literature.

2. We show that sparsity in the canonical basis is not sufficient to ensure identifiability, even in the presence of perfect model order information, and we construct non-zero dimensional unidentifiable subsets of the domain K for any given support set of x* (or y*) as evidence.

3. We state and prove a measure theoretically tight, non-linear, recursive characterization of the rank-two null space of the lifted linear convolution map (the lifted map is illustrated in the sketch after this list). To the best of our knowledge, this is a new result. Because of the simplicity of the linear convolution map, a subset of this rank-two null space can be parametrized to yield an analytically useful representation that is instrumental in the derivation of our results. Our proofs are constructive and demonstrate the rotational ambiguity phenomenon in general bilinear inverse problems. For blind deconvolution, the rotational ambiguity phenomenon is the reason for the existence of a large dimensional set of unidentifiable input pairs.

4. We consider a theoretical abstraction of a multi-hop channel estimation problem [47], [48] and extend our unidentifiability results to this setting. Specifically, we show that other types of side information, like repetition coding or geometric decay for the unknown vectors, are still insufficient for identifiability.
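To make the lifted linear convolution map concrete, the sketch below (an illustration following the anti-diagonal sum interpretation of Figure 3.2, not code from the dissertation; lifted_conv is a hypothetical helper name) implements S(·) as the anti-diagonal sums of the lifted m × n matrix and checks that S(xyᵀ) reproduces the linear convolution x ⋆ y:

    import numpy as np

    def lifted_conv(W):
        # Lifted linear convolution map S(.): the k-th output entry is
        # the sum of the k-th anti-diagonal of the m x n matrix W,
        # i.e. the sum of W[i, j] over all (i, j) with i + j = k.
        m, n = W.shape
        return np.array([np.fliplr(W).diagonal(n - 1 - k).sum()
                         for k in range(m + n - 1)])

    rng = np.random.default_rng(0)
    x = rng.standard_normal(3)
    y = rng.standard_normal(4)

    # Applying S(.) to the rank-one lifted variable x y^T recovers the
    # linear convolution, so blind deconvolution becomes recovery of a
    # rank-one matrix from the linear measurements S(.).
    assert np.allclose(lifted_conv(np.outer(x, y)), np.convolve(x, y))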
1.4 Summary for Chapter 4

Detecting and localizing a target from samples of its induced field is an important problem of interest, with manifestations in a wide variety of applications like environmental monitoring, cyber-security, medical diagnosis and military surveillance. Often, the target field admits structural properties that enable the design of detection strategies that achieve good performance from fewer samples. In this chapter, we study a variation of the target detection and localization problem with sampling constraints on the induced target field. The possibility of reducing the number of samples required for target detection/localization is of interest for time critical applications where speed of acquisition is a bottleneck, as in magnetic resonance imaging due to the slow sampling process, and in underwater sonar imaging due to large search spaces.

An early survey of active target detection consisting of statistical and signal processing approaches that assume availability of the full target field/signature can be found in [49] (see also [50], [51]). The field of anomaly detection [52] further generalizes the scope of target detection and employs tools from machine learning; e.g., [53]–[58] perform window based target detection in full sonar images. General theoretical analysis of either of these problems is plagued by the lack of good models for experimental scenarios that are amenable to tractable analysis. In [58]–[64] there is a focus on path planning for active sensing of structured fields (in particular, [62] uses compressed sensing) with an explicit consideration of the navigation cost and stopping time. In contrast, the goal of this chapter is to explore theoretical properties of adaptive sensing for structured fields stemming from the exploration-exploitation trade-off. Early work [65] focusing on target detection in multiple-in-multiple-out (MIMO) radar used a statistical approach, which was refined in [66]–[68] using a combination of joint sparse sensing and low-rank matrix completion ideas, relying on the strong theoretical guarantees of low-rank matrix completion from random samples [69]–[71]. The focus in the papers [66]–[68] is to adapt the design of the MIMO radar array to optimize coherence, which is also very different from our goal here of studying the detection and localization error performance of low-rank matrix completion. Finally, we note that distilled sensing [72]–[74] has a somewhat similar algorithmic philosophy to ours for target detection, but therein the field is assumed to be sparse rather than low-rank, thus facing basis mismatch challenges [75] that we can avoid completely.

This chapter designs a sampling and localization strategy which exploits separability and unimodality in target fields, and theoretically analyzes the trade-off achieved between sampling density, noise level and the convergence rate of localization. In particular, the strategy adopts an exploration-exploitation approach to target detection and utilizes the theory of low-rank matrix completion, coupled with unimodal regression, on decaying and approximately separable target fields. The assumptions on the field are fairly generic and applicable to many decay profiles, since no specific knowledge of the field is necessary beyond the fact that it admits an approximately rank-one representation. Besides regularization, separability is also shown to give computational benefits by enabling unimodal regression on vectors (instead of unimodal regression on matrices).
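As a toy illustration of why separability helps (a sketch on a fully observed synthetic field with an assumed Laplacian-like decay; the actual algorithm of Chapter 4 samples adaptively and applies unimodal regression to the factor estimates), the best rank-one approximation of an approximately separable field factors it into two vectors whose peaks localize the target, reducing 2-D localization to two 1-D peak searches:

    import numpy as np

    # Hypothetical separable, decaying target field F = u v^T plus noise.
    n = 101
    grid = np.arange(n)
    u = np.exp(-0.1 * np.abs(grid - 30))   # row profile, peak at 30
    v = np.exp(-0.1 * np.abs(grid - 70))   # column profile, peak at 70
    rng = np.random.default_rng(1)
    F = np.outer(u, v) + 0.05 * rng.standard_normal((n, n))

    # The top singular vectors estimate u and v up to sign and scale,
    # so the target location is read off from two vector peak searches.
    U, s, Vt = np.linalg.svd(F)
    u_hat = np.abs(U[:, 0])
    v_hat = np.abs(Vt[0])
    print(np.argmax(u_hat), np.argmax(v_hat))   # approximately (30, 70)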
In contrast to our results, most prior literature on noisy low-rank matrix completion investigates bounds on the mean squared estimation error, and very little is known about the performance of matrix completion for other tasks (such as detection or localization). Coherence factors are computed for exponential and power-law fields to theoretically approximate the number of samples necessary for the algorithm, and numerical experiments and comparisons are performed to test the efficacy and robustness of the presented approach on synthetically generated data as well as on a real 3-D road network dataset. A somewhat surprising phenomenon is demonstrated: Laplacian fields achieve a better localization vs. accuracy trade-off under a fixed sampling budget, as compared to Gaussian or Cauchy fields. The presented approach is shown to be fairly robust to missing data points and to the presence of multiple local peaks, demonstrating the regularizing properties of separability. Furthermore, the proposed method also outperforms competing approaches at low sampling densities, a scenario which is fairly common when sampling is a bottleneck.

1.5 Summary for Chapter 5

The estimation of a narrowband time-varying communication channel is a central problem of interest in the design of the physical layer of any communication system. In challenging environments like underwater acoustic communications, practical considerations of finite block length and finite transmission bandwidth become much more important than in conventional terrestrial mobile communication systems. The conventional least-squares and Wiener-filtering estimators do not take advantage of the inherent structure of the channel. Another drawback of Wiener filtering is that knowledge of the scattering function is required [76]; however, the scattering function is typically not known at the receiver. Often, a flat spectrum in the delay-Doppler domain is assumed, which introduces performance degradation due to the mismatch with respect to the true scattering function [76]. While contemporary approaches based on compressed sensing [77]–[79] can exploit the inherent sparsity structure of dominant components in the channel, finite block length and finite transmission bandwidth reduce the usable sparsity of channels in practical communication systems owing to model mismatch. This effect, known as channel leakage [77], [79], has been shown to significantly degrade the performance of compressed sensing based methods.

This chapter designs an estimation strategy for the delay-Doppler components of the communication channel that explicitly accounts for the spectrum leakage effects. Both separability of the channel matrix in the delay-Doppler domain and the non-linear complex exponential structure induced by Doppler shifts are utilized to show that the channel matrix is of low rank (equal to the number of dominant paths) and to develop the estimation algorithm. Furthermore, it is shown that while the overall estimation problem is a low-rank matrix recovery problem with structural constraints, naive low-rank matrix recovery via nuclear norm minimization is bound to fail due to non-utilization of the inherent structure in the problem beyond that of low rank.
In fact, it is shown that the structural constraint due to the Doppler shift must be taken into consideration in order to guarantee identifiability under the observation model induced by the training sequence, and that there are no local optima for the weighted minimum-mean-square-error (MMSE) optimization problem if the global optimum is uniquely identifiable (a non-trivial statement for non-linear/non-convex inverse problems). The algorithmic approach involves alternating minimization to directly use the bilinear structure arising out of a factored representation of the low-rank channel matrix, accompanied by careful selection of the weights and the initialization point to accelerate convergence (as compared to unweighted MMSE estimation). Numerical experiments show that the proposed method estimates a multipath channel from a small number of measurements and very small separation between Doppler shifts; a regime where competing approaches based on naive use of sparsity (as in Basis Pursuit [80]) or low-rank structure fail. Extensive justification is provided for the choice of simulation parameters, initialization strategies, and ill-posed regimes (avoided for simulation).

Chapter 2

Identifiability Scaling Laws for Bilinear Inverse Problems

2.1 Introduction

We examine the problem of identifiability in bilinear inverse problems (BIPs), i.e. input signal pair recovery for systems where the output is a bilinear function of two unknown inputs. Important practical examples of BIPs include blind deconvolution [18], blind source separation [19] and dictionary learning [15] in signal processing, matrix factorization in machine learning [20], blind equalization in wireless communications [21], etc. Of particular interest are signal recovery problems from under-determined systems of measurement where additional structure is needed in order to ensure recovery, and the observation model is non-linear in the parametrization of the problem.

Consider a discrete-time blind linear deconvolution problem. Let $x \in \mathcal{D}_x$ and $y \in \mathcal{D}_y$ be, respectively, $m$ and $n$ dimensional vectors from domains $\mathcal{D}_x \subseteq \mathbb{R}^m$ and $\mathcal{D}_y \subseteq \mathbb{R}^n$, and suppose that the noise free linear convolution of $x$ and $y$ is observed as $z$. Then the blind linear deconvolution problem can be represented as the following feasibility problem.

$$\text{find } (x, y) \quad \text{subject to } x \star y = z,\ x \in \mathcal{D}_x,\ y \in \mathcal{D}_y. \tag{P1}$$

We draw the reader's attention to the observation/measurement model $z = x \star y$. Notice that if either $x$ or $y$ were a fixed and known quantity, then we would have an observation model that is linear in the other variable. However, when both $x$ and $y$ are unknown variables, the linear convolution measurement model $z = x \star y$ is no longer linear in the variable pair $(x, y)$. Such a structural characteristic is referred to as a bilinear measurement structure (formally defined in Section 2.2). The blind linear deconvolution problem (P1) is the resulting inverse problem. Such inverse problems arising from a bilinear measurement structure shall be referred to as bilinear inverse problems (formally defined in Section 2.2).

A key issue in many under-determined inverse problems is that of identifiability: "Does a unique solution exist that satisfies the given observations?" Identifiability (and signal reconstruction) for linear inverse problems with sparsity and low-rank structure has received considerable attention in the context of compressed sensing and low-rank matrix recovery, respectively, and is now quite well understood [22].
In a nutshell, both compressed sensing and low-rank matrix recovery theories guarantee that the unknown sparse/low-rank signal can be unambiguously reconstructed from relatively few properly designed linear measurements, using algorithms with runtime growing polynomially in the signal dimension. For non-linear inverse problems (including BIPs), however, the characterization of identifiability (and signal reconstruction) still remains largely open.

To illustrate that analyzing identifiability is nontrivial, we present a simple example. Consider the blind linear deconvolution problem represented by Problem (P1). Suppose that we have the observation $z = (1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0)^T \in \mathbb{R}^{17}$ with $\mathcal{D}_x = \mathbb{R}^7$ and $\mathcal{D}_y = \mathbb{R}^{11}$. It is not difficult to verify that both

$$x = (1, 0, 0, 0, 1, 0, 0)^T, \quad y = (1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1)^T \tag{2.1a}$$

and

$$x = (1, 0, 1, 0, 1, 0, 1)^T, \quad y = (1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0)^T \tag{2.1b}$$

are valid solutions to Problem (P1). Furthermore, it is not immediately obvious what structural constraints would disambiguate between the above two solutions.
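The ambiguity in this example is easy to check numerically. The following minimal sketch (our own illustration, assuming only NumPy) verifies that both candidate pairs in (2.1) produce the identical observation $z$:

```python
import numpy as np

z = np.zeros(17)
z[0:16:2] = 1  # z = (1, 0, 1, 0, ..., 1, 0, 0)^T

# First candidate solution (2.1a)
x1 = np.array([1, 0, 0, 0, 1, 0, 0])
y1 = np.array([1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1])

# Second candidate solution (2.1b)
x2 = np.array([1, 0, 1, 0, 1, 0, 1])
y2 = np.array([1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0])

# Both pairs yield the same linear convolution z = x * y
assert np.array_equal(np.convolve(x1, y1), z)
assert np.array_equal(np.convolve(x2, y2), z)
print("both pairs are feasible for (P1)")
```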
We have shown identifiability and constructed fast recovery algorithms in [11] when $x$ (possibly sparse) lies on a one dimensional subspace (modulo global sign flip), whereas we show negative results for the more general sparse (with respect to the canonical basis) blind deconvolution problem, with an extension to geometrically decaying subspace sparse signals, in Chapter 3.

2.1.1 Contributions

1. We cast conic prior constrained BIPs as low-rank matrix recovery problems, establish the validity of the 'lifting' procedure (Section 2.2.3) and develop deterministic sufficient conditions for identifiability (Section 2.3.2), while bridging the gap to necessary conditions in a special case. Our characterization agrees with the intuition that identifiability subject to priors should depend on the joint geometry of the signal space and the bilinear map. Our results are geared towards bilinear maps that admit a nontrivial rank two null space, as is the case with many important BIPs like blind deconvolution.

2. We develop trade-offs between the probability of identifiability of a random instance and the complexity of the rank two null space of the lifted bilinear map under three classes of signal ensembles, viz. dependent but uncorrelated, independent Gaussian, and independent Bernoulli (Section 2.3.4). Specifically, we demonstrate that instance identifiability can be characterized by the complexity of the restricted rank two null space, measured by the covering number of the set $\{(\mathcal{C}(X), \mathcal{R}(X)) \mid X \in \mathcal{N}(\mathscr{S}, 2) \cap \mathcal{M} \setminus \{0\}\}$, where $\mathcal{C}(X)$ and $\mathcal{R}(X)$ denote, respectively, the column and row spaces of the matrix $X$, and $\mathcal{N}(\mathscr{S}, 2) \cap \mathcal{M}$ denotes the rank two null space of the lifted bilinear map $\mathscr{S}(\cdot)$ restricted by the prior on the signal set to $\mathcal{M}$. To the best of our knowledge, this gives new structural results based solely on the bilinear measurement model and is thus applicable to general BIPs.

3. We demonstrate that the rank two null space of the lifted bilinear map can be partly characterized in at least one important case (blind deconvolution), and conjecture that the same should be possible for other bilinear maps of interest (dictionary learning, blind source separation, etc.). Based on this characterization, we present numerical simulations for selected variations on the blind deconvolution problem to demonstrate the tightness of our scaling laws (Section 2.5).

2.1.2 Related Work

Our treatment of BIPs draws on several different ideas. We employ 'lifting' from optimization [1], which enables the creation of good relaxations for intractable optimization problems. This can come at the expense of an increase in the ambient dimension of the optimization variables. Lifting was used in [81] for analyzing the phase retrieval problem and in [25] for the analysis of blind circular deconvolution. We employ lifting in the same spirit as [25], [81], but our goals are different. Firstly, we deal with general BIPs, which include the linear convolution model of [23], the circular convolution model of [24], [25] and the compressed bilinear observation model of [26] as special cases. Secondly, we focus solely on identifiability (as opposed to recoverability by convex optimization [25], [81]), enabling far milder assumptions on the distribution of the input signals.

After lifting, we have a rank one matrix recovery problem, subject to inherited conic constraints. While encouraging results have been shown for low-rank matrix recovery using the nuclear norm heuristic [70], quite stringent incoherence assumptions are needed between the sampling operator and the true matrix. Furthermore, the results do not generalize to an analysis of identifiability when the sampling operator admits rank two matrices in its null space. We are able to relax the incoherence assumptions in special cases for analyzing identifiability and also consider sampling operators with a non-trivial rank two null space. Since the works [82]–[84] can be interpreted as solving BIPs with the lifted map drawn from a Gaussian random ensemble, thus leading to a trivial rank two null space with high probability, the results therein are not directly comparable to ours. In [25], a recoverability analysis for the blind circular deconvolution problem is undertaken, but knowledge of the sparsity pattern of one input signal is needed. Taking our Problem (P1) as an example, [25] assumes $\mathcal{D}_x = \mathcal{C}(B)$ and $\mathcal{D}_y = \mathcal{C}(C)$ for some tall deterministic matrix $B$ and a tall Gaussian random matrix $C$, where for any matrix $X$, $\mathcal{C}(X)$ denotes the column space of $X$. In contrast, we shall make less stringent assumptions on $x$ and $y$ and show that identifiability holds with high probability in the presence of rank two matrices in the null space of the lifted linear operator (sampling operator).

A closely related (but different) problem is that of retrieving the phase of a signal from the magnitude of its Fourier coefficients (the Fourier phase retrieval problem). This is equivalent to recovering the signal given its autocorrelation function [85]. In terms of our example blind deconvolution problem (P1), phase retrieval is equivalent to having the additional constraints $\mathcal{D}_x = \mathcal{D}_y$ and $y$ being the time reversed version of $x$. While the Fourier phase retrieval problem may seem superficially similar to the blind deconvolution problem, there are major differences between the two, so much so as to ensure identifiability and efficient recoverability for the sparsity regularized (in the canonical basis) version of the former [86], while one can explicitly show unidentifiability for the canonical-sparsity regularized version of the latter (as shown in Chapter 3), even with oracle knowledge of the supports of both signals. The difference arises because the Fourier phase retrieval problem is a (non-convex) quadratic inverse problem rather than a BIP, and it satisfies additional properties (constant trace of the lifted variable) which make it better conditioned for efficient recovery algorithms [87].
For the dictionary learning problem, an identifiability analysis is developed in [28] leveraging results from [29] on matrix factorization for sparse dictionary learning using the $\ell_1$ norm and $\ell_p$ quasi-norm for $0 < p < 1$. More recently, exact recoverability of over-complete dictionaries from training samples (only polynomially large in the dimensions of the dictionary) has been proved in [13] assuming a sparse (but unknown) coefficient matrix. While every BIP can be recast as a dictionary learning problem in principle, such a transformation would result in additional structural constraints on the dictionary that may or may not be trivial to incorporate into the existing analyses. This is especially true for bilinear maps over vector pairs. In contrast, we develop our methods to specifically target bilinear maps over vector pairs (e.g. the convolution map) and thus obtain definitive results where the dictionary learning based formulations would most likely fail.

Some identifiability results for blind deconvolution are summarized in [27], but the treatment therein is inflexible to the inclusion of side information about the input signals. Identifiability for non-negative matrix factorization was examined in [20] exploiting geometric properties of the non-negative orthant. Although our results can be easily visualized in terms of geometry, they can also be stated purely in terms of linear algebra (Theorem 2.4). Identifiability results for low-rank matrix completion [5], [30] are provided in [31] via algebraic and combinatorial conditions using graph theoretic tools, but there is no straightforward way to extend these results to more general lifted linear operators like the convolution map. Overall, to the best of our knowledge, a unified flexible treatment of identifiability in BIPs has not been developed to date. In this chapter, we present such a framework incorporating conic constraints on the input signals (which include sparse signals in particular).

2.1.3 Organization, Reading Guide and Notation

The remainder of the chapter is organized as follows. The first half of Section 2.2 formally introduces BIPs and a working definition of identifiability. Section 2.2.3 describes the lifting technique to reformulate BIPs as rank one matrix recovery problems, and characterizes the validity of the technique. Section 2.3 states our main results on both deterministic and random instance identifiability. Section 2.4 elaborates on the intuitions, ideas, assumptions and subtle implications associated with the results of Section 2.3. Section 2.5 is devoted to results of numerical verification and Section 2.6 concludes the chapter. Detailed proofs of all the results in the chapter appear in Sections 2.7.1 to 2.7.14.

In order to maintain linearity of exposition to the greatest extent possible, we chose to create a separate section (Section 2.4) for elaborating on the intuitions, ideas, assumptions and implications associated with the important results of the chapter. Thus, with the exception of Section 2.4, the rest of the chapter can be read in a linear fashion. However, we recommend that the reader switch between Sections 2.3 and 2.4 as necessary, to better interpret the results presented in Section 2.3.

We now state the notational conventions used throughout the rest of the chapter. All vectors are assumed to be column vectors unless stated otherwise. We shall use lowercase boldface letters to denote column vectors (e.g. $z$) and uppercase boldface letters to denote matrices (e.g. $A$).
The all zero (respectively all one) vector/matrix shall be denoted by $\mathbf{0}$ (respectively $\mathbf{1}$) and the identity matrix by $I$. The canonical base matrices for the space of $m \times n$ real matrices will be denoted by $E_{i,j}$ for $1 \le i \le m$, $1 \le j \le n$ and are defined (element-wise) as

$$(E_{i,j})_{k,l} = \begin{cases} 1, & i = k,\ j = l, \\ 0, & \text{otherwise.} \end{cases} \tag{2.2}$$

For vectors and/or matrices, $(\cdot)^T$, $\operatorname{Tr}(\cdot)$ and $\operatorname{rank}(\cdot)$ respectively denote the transpose, trace and rank of their argument, whenever applicable. Special sets are denoted by uppercase blackboard bold font (e.g. $\mathbb{R}$ for real numbers). Other sets are denoted by uppercase calligraphic font (e.g. $\mathcal{S}$). Linear operators on matrices are denoted by uppercase script font (e.g. $\mathscr{S}$). The set of all matrices of rank at most $k$ in the null space of a linear operator $\mathscr{S}$ will be denoted by $\mathcal{N}(\mathscr{S}, k)$, defined as

$$\mathcal{N}(\mathscr{S}, k) \triangleq \left\{ X \in \mathbb{R}^{m \times n} \;\middle|\; \operatorname{rank}(X) \le k,\ \mathscr{S}(X) = 0 \right\}, \tag{2.3}$$

and referred to as the 'rank $k$ null space'. For any matrix $X$, we denote the row and column spaces by $\mathcal{R}(X)$ and $\mathcal{C}(X)$ respectively. The projection matrix onto the column space (respectively row space) of $X$ shall be denoted by $P_{\mathcal{C}(X)}$ (respectively $P_{\mathcal{R}(X)}$). For any rank one matrix $M$, an expression of the form $M = \sigma u v^T$ denotes the singular value decomposition of $M$, with the vectors $u$ and $v$ each having unit $\ell_2$ norm. The standard Euclidean inner product on a vector space will be denoted by $\langle \cdot, \cdot \rangle$ and the underlying vector space will be clear from the usage context. All logarithms are with respect to (w.r.t.) base $e$ unless specified otherwise. We shall use the $O(h)$, $o(h)$ and $\Theta(h)$ notation to denote the order of growth of any function $f : \mathbb{R} \to \mathbb{R}$ of $h \in \mathbb{R}$ w.r.t. its argument. We have,

$$f(h) = O(h) \iff \lim_{h \to \infty} \frac{f(h)}{h} < \infty, \tag{2.4a}$$

$$f(h) = o(h) \iff \lim_{h \to \infty} \frac{f(h)}{h} = 0, \tag{2.4b}$$

$$f(h) = \Theta(h) \iff \lim_{h \to \infty} \frac{f(h)}{h} \in (0, \infty). \tag{2.4c}$$

2.2 System Model

This section introduces the bilinear observation model and the associated bilinear inverse problem in Subsection 2.2.1 and our working definition of identifiability in Subsection 2.2.2. Subsection 2.2.3 describes the equivalent linear inverse problem obtained by lifting and the conditions under which the equivalence holds. This equivalence is used to establish all of our identifiability results in Section 2.3.

2.2.1 Bilinear Maps and Bilinear Inverse Problems (BIPs)

Definition 2.1 (Bilinear Map). A mapping $S : \mathbb{R}^m \times \mathbb{R}^n \to \mathbb{R}^q$ is called a bilinear map if $S(\cdot, y) : \mathbb{R}^m \to \mathbb{R}^q$ is a linear map $\forall y \in \mathbb{R}^n$ and $S(x, \cdot) : \mathbb{R}^n \to \mathbb{R}^q$ is a linear map $\forall x \in \mathbb{R}^m$.

We shall consider the generic bilinear system/measurement model

$$z = S(x, y), \tag{2.5}$$

where $z$ is the vector of observations, $S : \mathbb{R}^m \times \mathbb{R}^n \to \mathbb{R}^q$ is a given bilinear map, and $(x, y)$ denotes the pair of unknown signals with a given domain restriction $(x, y) \in \mathcal{K}$. We are interested in solving for the vectors $x$ and $y$ from the noiseless observation $z$ given by (2.5). The BIP corresponding to the observation model (2.5) is represented by the following feasibility problem.

$$\text{find } (x, y) \quad \text{subject to } S(x, y) = z,\ (x, y) \in \mathcal{K}. \tag{P2}$$

The non-negative matrix factorization problem [20] serves as an illustrative example of such a problem. Let $X \in \mathbb{R}^{m \times k}$ and $Y \in \mathbb{R}^{k \times n}$ be two element-wise non-negative, unknown matrices and suppose that we observe the matrix product $Z = XY$, which clearly has a bilinear structure. The non-negative matrix factorization problem is represented by the feasibility problem

$$\text{find } (X, Y) \quad \text{subject to } Z = XY,\ X \ge 0,\ Y \ge 0, \tag{P3}$$

where the expressions $X \ge 0$ and $Y \ge 0$ constrain the matrices $X$ and $Y$ to be elementwise non-negative. The elementwise non-negativity constraints $X \ge 0$, $Y \ge 0$ form a domain restriction in Problem (P3), in the same way as the constraint $(x, y) \in \mathcal{K}$ serves to restrict the feasible set in Problem (P2).
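As a concrete illustration of Definition 2.1 (a sketch of our own, assuming NumPy; it is not part of the development that follows), the linear convolution map of Problem (P1) can be checked numerically to be linear in each argument with the other held fixed:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 5, 7
x1, x2 = rng.standard_normal(m), rng.standard_normal(m)
y = rng.standard_normal(n)
a, b = 2.0, -3.0

S = lambda x, y: np.convolve(x, y)  # linear convolution as a bilinear map

# Linearity in the first argument for fixed y (and symmetrically in the second)
assert np.allclose(S(a * x1 + b * x2, y), a * S(x1, y) + b * S(x2, y))
```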
2.2.2 Identifiability Definition

Notice that every BIP has an inherent scaling ambiguity due to the identity

$$S(x, y) = S\left(\alpha x, \tfrac{1}{\alpha} y\right), \quad \forall \alpha \ne 0, \tag{2.6}$$

where $S(\cdot, \cdot)$ represents the bilinear map. Thus, a meaningful definition of identifiability, in the context of BIPs, must disregard this type of scaling ambiguity. This leads us to the following definition of identifiability.

Definition 2.2 (Identifiability). A vector pair $(x, y) \in \mathcal{K} \subseteq \mathbb{R}^m \times \mathbb{R}^n$ is identifiable w.r.t. the bilinear map $S : \mathbb{R}^m \times \mathbb{R}^n \to \mathbb{R}^q$ if $\forall (x', y') \in \mathcal{K} \subseteq \mathbb{R}^m \times \mathbb{R}^n$ satisfying $S(x, y) = S(x', y')$, $\exists \alpha \ne 0$ such that $(x', y') = \left(\alpha x, \tfrac{1}{\alpha} y\right)$.

Remark 2.1. It is straightforward to see that our definition of identifiability in turn defines an equivalence class of solutions. Thus, we seek to identify the equivalence class induced by the observation $z$ in (2.5). Later, in Section 2.2.3, we shall 'lift' Problem (P2) to Problem (P4), where every equivalence class in the domain $(x, y) \in \mathcal{K}$ of the former problem maps to a single point in the domain $W \in \mathcal{K}'$ of the latter problem.

Remark 2.2. The scaling ambiguity represented by (2.6) is common to all BIPs, and our definition of identifiability (Definition 2.2) only allows for this kind of ambiguity. There may be other types of ambiguities depending on the specific BIP. For example, the forward system model associated with Problem (P3) is given by the matrix product operation $S(X, Y) = XY$, which exhibits the following matrix multiplication ambiguity:

$$S(X, Y) = S\left(XT, T^{-1} Y\right), \tag{2.7}$$

where $T^{-1}$ is the right inverse of $T$. It is possible to define weaker notions of identifiability to allow for this kind of ambiguity. In this chapter, we shall not address this question any further and limit ourselves to the stricter notion of identifiability given by Definition 2.2.

2.2.3 Lifting

While Problem (P2) is an accurate representation of the class of BIPs, the formulation does not easily lend itself to an identifiability analysis. We next rewrite Problem (P2) to facilitate analysis, subject to some technical conditions (see Theorem 2.1 and Corollary 2.2). The equivalent problem is a matrix rank minimization problem subject to linear equality constraints:

$$\underset{W}{\text{minimize}}\ \operatorname{rank}(W) \quad \text{subject to } \mathscr{S}(W) = z,\ W \in \mathcal{K}', \tag{P4}$$

where $\mathcal{K}' \subseteq \mathbb{R}^{m \times n}$ is any set satisfying

$$\mathcal{K}' \cap \left\{ W \in \mathbb{R}^{m \times n} \mid \operatorname{rank}(W) \le 1 \right\} = \left\{ x y^T \mid (x, y) \in \mathcal{K} \right\}, \tag{2.8}$$

and $\mathscr{S} : \mathbb{R}^{m \times n} \to \mathbb{R}^q$ is a linear operator that can be deterministically constructed from the bilinear map $S(\cdot, \cdot)$, with the optimization variable $W$ in Problem (P4) being related to the optimization variable pair $(x, y)$ in Problem (P2) by the relation $W = x y^T$. The transformation of Problem (P2) to Problem (P4) is an example of 'lifting' and we shall refer to $\mathscr{S}(\cdot)$ as the 'lifted linear operator' w.r.t. the bilinear map $S(\cdot, \cdot)$. Other examples of lifting can be found in [25], [81].

Before stating the equivalence results between Problems (P2) and (P4), we describe the construction of $\mathscr{S}(\cdot)$ from $S(\cdot, \cdot)$. Let $\phi_j : \mathbb{R}^q \to \mathbb{R}$ be the $j$th coordinate projection operator from $q$ dimensional vectors to scalars, i.e. if $z = (z_1, z_2, \ldots, z_q)$ then $\phi_j(z) = z_j$. Clearly, $\phi_j$ is a linear operator and hence the composition $\phi_j \circ S : \mathbb{R}^m \times \mathbb{R}^n \to \mathbb{R}$ is a bilinear map.
Since $S$ is a finite dimensional operator, it is a bounded operator; hence by the Riesz Representation Theorem [88], there exists a unique matrix $S_j \in \mathbb{R}^{m \times n}$ satisfying

$$\phi_j \circ S(x, y) = \langle x, S_j y \rangle, \quad \forall x \in \mathbb{R}^m,\ y \in \mathbb{R}^n, \tag{2.9}$$

where $\langle \cdot, \cdot \rangle$ denotes an inner product operation in $\mathbb{R}^m$. Using (2.9), we can convert the bilinear equality constraint in Problem (P2) into a set of $q$ linear equality constraints as follows:

$$z_j = \phi_j \circ S(x, y) = x^T S_j y = \left\langle x y^T, S_j \right\rangle \tag{2.10}$$

for each $1 \le j \le q$, where the last inner product in (2.10) is the trace inner product on the space $\mathbb{R}^{m \times n}$ and $z_j$ denotes the $j$th coordinate of the observation vector $z$. Setting $W = x y^T$ in (2.10), the $q$ linear equality constraints in (2.10) can be compactly represented, using operator notation, by the vector equality constraint $\mathscr{S}(W) = z$, where $\mathscr{S} : \mathbb{R}^{m \times n} \to \mathbb{R}^q$ is a linear operator acting on $W \in \mathbb{R}^{m \times n}$. This derivation uniquely specifies $\mathscr{S}(\cdot)$ using the matrices $S_j$, $1 \le j \le q$, and we have the identity

$$\mathscr{S}\left(x y^T\right) = S(x, y), \quad \forall (x, y) \in \mathbb{R}^m \times \mathbb{R}^n. \tag{2.11}$$

For the sake of completeness, we state the definitions of equivalence and feasibility in the context of optimization problems (Definitions 2.3 and 2.4). Thereafter, the connection between Problems (P2) and (P4) is described via the statements of Theorem 2.1 and Corollary 2.2.

Definition 2.3 (Equivalence of optimization problems). Two optimization problems P and Q are said to be equivalent if every solution to P gives a solution to Q and every solution to Q gives a solution to P.

Definition 2.4 (Feasibility). An optimization problem is said to be feasible if the domain of the optimization variable is non-empty.

Theorem 2.1. Let Problem (P2) be feasible and let $\mathcal{K}_{\mathrm{opt}}$ and $\mathcal{K}'_{\mathrm{opt}}$ denote the set of solutions to Problems (P2) and (P4), respectively. Then the following are true.

1. Problem (P4) is feasible with solution(s) of rank at most one.

2. $\mathcal{K}'_{\mathrm{opt}} \subseteq \left\{ x y^T \mid (x, y) \in \mathcal{K}_{\mathrm{opt}} \right\}$.

3. $\mathcal{K}'_{\mathrm{opt}} = \left\{ x y^T \mid (x, y) \in \mathcal{K}_{\mathrm{opt}} \right\}$ if and only if $\{0\} \subsetneq \left\{ x y^T \mid (x, y) \in \mathcal{K}_{\mathrm{opt}} \right\}$ does not hold.

Proof. Section 2.7.1.

[Figure 2.1: Lifted matrices $S_k \in \mathbb{R}^{m \times n}$ for the linear convolution map with $m = 3$, $n = 4$, $q = m + n - 1 = 6$ and $1 \le k \le q$.]

Notice that $\mathcal{K}_{\mathrm{opt}}$ and $\mathcal{K}'_{\mathrm{opt}}$ in Theorem 2.1 depend on the observation vector $z$, so that the statements of Theorem 2.1 have a hidden dependence on $z$. Since the observation vector $z$ is a function of the input signal pair $(x, y)$, it is desirable to have statements analogous to Theorem 2.1 that do not depend on the observation vector $z$. This is the purpose of Corollary 2.2 below, which makes use of $\mathcal{N}(\mathscr{S}, 1)$, the rank one null space of the lifted operator $\mathscr{S}(\cdot)$ (see (2.3)).

Corollary 2.2. Let Problem (P2) be feasible and let $\mathcal{K}_{\mathrm{opt}}(z)$ and $\mathcal{K}'_{\mathrm{opt}}(z)$ respectively denote the set of optimal solutions to Problems (P2) and (P4) for a given observation vector $z$. Problems (P2) and (P4) are equivalent, i.e. $\mathcal{K}'_{\mathrm{opt}}(z) = \left\{ x y^T \mid (x, y) \in \mathcal{K}_{\mathrm{opt}}(z) \right\}$, for every $z \in \{ S(x, y) \mid (x, y) \in \mathcal{K} \}$, if and only if $\{0\} \subsetneq \mathcal{K}' \cap \mathcal{N}(\mathscr{S}, 1)$ does not hold.

Proof. Section 2.7.2.

Remark 2.3. The statements of Theorem 2.1 and Corollary 2.2 are needed to establish the validity of lifting for general BIPs with $\mathcal{N}(\mathscr{S}, 1) \ne \{0\}$. In case $\mathcal{N}(\mathscr{S}, 1) = \{0\}$ (e.g. blind deconvolution), Corollary 2.2 immediately implies that lifting is valid.

Remark 2.4. Notice that lifting Problem (P2) to Problem (P4) allows us some freedom in the choice of the set $\mathcal{K}'$.
Also, we have the additional side information that the optimal solution to Problem (P4) is a rank one matrix. These factors could potentially be helpful in developing tight and tractable relaxations to Problem (P4) that work better than the simple nuclear norm heuristic [89] (e.g. see [13]). We do not pursue this question here.

The transformation from Problem (P2) to Problem (P4) gives us several advantages.

1. Problem (P4) has linear equality constraints as opposed to the bilinear equality constraints of Problem (P2). The former are much easier to handle, from an optimization as well as an algorithmic perspective, than the latter.

2. A convex relaxation for the nonconvex rank objective in Problem (P4) is well known [89], which is an important requirement from an algorithmic perspective. In contrast, no convex relaxation is known for a generic bilinear constraint.

3. The bilinear map is completely determined by the set of matrices $S_j$ and is separated from the variable $W$ in Problem (P4). Thus, Problem (P4) can be used to study generic BIPs. Figure 2.1 illustrates a toy example involving the linear convolution map (see also the sketch following this list).

4. For every BIP there is an inherent scaling ambiguity (see (2.6)) associated with the bilinear constraint. However, in Problem (P4), this scaling ambiguity has been taken care of implicitly when $W = x y^T$ is the variable to be determined. Clearly, $W$ is unaffected by the type of scaling ambiguity described in (2.6). Norm constraints on $x$ or $y$ can be used to recover $x$ and $y$ from $W$, but these constraints do not affect Problem (P4).

5. If $x$ and/or $y$ are sparse in some known dictionary (possibly over-complete), then the dictionaries can be absorbed into the mapping matrices $S_j$ without altering the structure of Problem (P4). Indeed, if $A$ and $B$ are dictionaries such that $x = A\beta$ and $y = B\gamma$, then we have

$$x^T S_j y = \beta^T \left(A^T S_j B\right) \gamma = \left\langle \beta \gamma^T, A^T S_j B \right\rangle \tag{2.12}$$

for each $1 \le j \le q$. It is clear that Problem (P4) can be rewritten with $W = \beta \gamma^T$ as the optimization variable (with a corresponding modification to $\mathcal{K}'$), and comparing (2.12) and (2.10) we see that the matrix $A^T S_j B$ plays the same role in the rewritten Problem (P4) as $S_j$ played in the original Problem (P4).

Thus, without loss of generality, we can consider Problem (P4) to be our lifted problem that retains all available prior information from Problem (P2) (assuming that the equivalence conditions in Corollary 2.2 are satisfied).
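To make the construction of the matrices $S_j$ concrete for the linear convolution map (cf. Figure 2.1; the entries of $S_k$ are written out in (2.37) of Section 2.5.2), the following minimal sketch (assuming NumPy; the helper name `lifted_convolution_matrices` is ours for illustration) builds the lifted matrices and confirms the identity (2.11), i.e. that $\langle x y^T, S_k \rangle$ reproduces the $k$th sample of $x \star y$:

```python
import numpy as np

def lifted_convolution_matrices(m, n):
    """Matrices S_k with (S_k)_{ij} = 1 iff i + j = k + 1 (1-indexed), so that
    <x y^T, S_k> equals the k-th entry of the linear convolution x * y."""
    q = m + n - 1
    S = np.zeros((q, m, n))
    for k in range(q):          # k = 0, ..., q-1 in 0-indexed terms
        for i in range(m):
            j = k - i           # anti-diagonal i + j = k (0-indexed)
            if 0 <= j < n:
                S[k, i, j] = 1.0
    return S

m, n = 3, 4
S = lifted_convolution_matrices(m, n)
rng = np.random.default_rng(1)
x, y = rng.standard_normal(m), rng.standard_normal(n)
W = np.outer(x, y)

# The lifted operator applied to W = x y^T reproduces the convolution x * y
z_lifted = np.array([np.sum(W * S[k]) for k in range(m + n - 1)])
assert np.allclose(z_lifted, np.convolve(x, y))
```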
2.3 Identifiability Results

We state our main results in this section, starting with deterministic characterizations of identifiability in Subsections 2.3.1 and 2.3.2 that are simple to state but computationally hard to check for a given BIP. Subsequently, in Subsection 2.3.4 we investigate whether identifiability holds for most inputs if the input is drawn from some distribution over the domain. Since we have some freedom in the selection of the set $\mathcal{K}'$ according to Remark 2.4, we will work with an arbitrary $\mathcal{K}'$ satisfying (2.8). The extreme cases of $\mathcal{K}' = \left\{ x y^T \mid (x, y) \in \mathcal{K} \right\}$ and $\mathcal{K}' = \mathbb{R}^{m \times n}$ will sometimes be used for examples and to build intuition. Also, for some of the results, we have converse statements only for one of the extreme cases. We shall use the set $\mathcal{M}$ to denote the difference $\mathcal{K}' - \mathcal{K}'$, defined as

$$\mathcal{M} = \mathcal{K}' - \mathcal{K}' \triangleq \{ X_1 - X_2 \mid X_1, X_2 \in \mathcal{K}' \}. \tag{2.13}$$

2.3.1 Universal Identifiability

As a straightforward consequence of lifting, we have the following necessary and sufficient condition for Problem (P4) to succeed for all values of the observation $z = S(x, y)$.

Proposition 2.3. Let $\mathcal{K}' = \left\{ x y^T \mid (x, y) \in \mathcal{K} \right\}$. The solution to Problem (P4) will be correct for every observation $z = S(x, y)$ if and only if $\mathcal{N}(\mathscr{S}, 2) \cap \mathcal{M} = \{0\}$.

Proof. Section 2.7.3.

Remark 2.5. Notice that the "only if" part of Proposition 2.3 requires uniqueness of an observation $z$ that is valid for Problem (P2) as well, and not just for Problem (P4). The latter could have observations that arise because of the freedom in the choice of $\mathcal{K}'$, but those may not be valid for the former. As a result, the conclusion of the "only if" part of Proposition 2.3 is somewhat weaker in that it does not imply $\mathcal{N}(\mathscr{S}, 2) = \{0\}$.

When $\mathcal{K} = \mathbb{R}^m \times \mathbb{R}^n$, $\mathcal{M}$ represents the set of all rank two matrices in $\mathbb{R}^{m \times n}$, so that Proposition 2.3 reduces to the more familiar result: $\mathcal{N}(\mathscr{S}, 2) = \{0\}$ is necessary and sufficient for the action of the linear operator $\mathscr{S}$ to be invertible on the set of all rank one matrices, where the inversion of the action of $\mathscr{S}$ is achieved as the solution to Problem (P4). While the characterization of $\mathcal{N}(\mathscr{S}, 2)$ for arbitrary linear operators $\mathscr{S}(\cdot)$ is challenging, it has been shown that if $\mathscr{S}(\cdot)$ is picked as a realization from some desirable distribution, then $\mathcal{N}(\mathscr{S}, 2) = \{0\}$ (which implies $\mathcal{N}(\mathscr{S}, 2) \cap \mathcal{M} = \{0\}$) is satisfied with high probability. As an example, [82], [83] show that if $\mathscr{S} : \mathbb{R}^{m \times n} \to \mathbb{R}^q$ is picked from a Gaussian random ensemble, then $\mathcal{N}(\mathscr{S}, 2) = \{0\}$ is satisfied with high probability for $q = O(\max(m, n))$.

2.3.2 Deterministic Instance Identifiability

When $\mathscr{S}(\cdot)$ is sampled from less desirable distributions, as for matrix completion [5], [30] or matrix recovery for a specific given basis [70], one does not have $\mathcal{N}(\mathscr{S}, 2) = \{0\}$ with high probability. To guarantee identifiability (and unique reconstruction) for such realizations of $\mathscr{S}(\cdot)$, significant domain restrictions via the set $\mathcal{K}$ (or $\mathcal{K}'$) are usually needed, so that $\mathcal{N}(\mathscr{S}, 2) \cap \mathcal{M} = \{0\}$ and Proposition 2.3 comes into effect. Unfortunately, for many important BIPs (blind deconvolution, blind source separation, matrix factorization, etc.) the lifted linear operator $\mathscr{S}(\cdot)$ does have a non-trivial $\mathcal{N}(\mathscr{S}, 2)$ set. This makes identifiability an important issue in practice. Fortunately, we still have $\mathcal{N}(\mathscr{S}, 1) = \{0\}$ in many of these cases, so that Corollary 2.2 implies that lifting is valid. For such maps, we have the following deterministic sufficient condition (Theorem 2.4) for a rank one matrix $M \in \mathcal{K}' \subseteq \mathbb{R}^{m \times n}$ to be identifiable as a solution of Problem (P4). Theorem 2.4 is heavily used for the results in the sequel.

Theorem 2.4. Let $\mathcal{N}(\mathscr{S}, 1) \cap \mathcal{M} = \{0\}$ and let $M = \sigma u v^T$ be a rank one matrix in $\mathcal{K}' \subseteq \mathbb{R}^{m \times n}$. Suppose that for every $X \in \mathcal{N}(\mathscr{S}, 2) \cap \mathcal{M} \setminus \{0\}$ either $u \notin \mathcal{C}(X)$ or $v \notin \mathcal{R}(X)$ is true. Then, given the observation $z = \mathscr{S}(M)$, $M$ can be successfully recovered by solving Problem (P4).

Proof. Section 2.7.4.
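In principle, the condition of Theorem 2.4 can be tested numerically once a candidate $X \in \mathcal{N}(\mathscr{S}, 2) \cap \mathcal{M} \setminus \{0\}$ is in hand. The sketch below (our own illustrative helper, assuming NumPy) checks the subspace memberships $u \in \mathcal{C}(X)$ and $v \in \mathcal{R}(X)$ via orthogonal projections:

```python
import numpy as np

def defeats_theorem_2_4(X, u, v, tol=1e-10):
    """Return True if u lies in C(X) AND v lies in R(X), i.e. if this
    particular X defeats the sufficient condition of Theorem 2.4."""
    U, s, Vt = np.linalg.svd(X)
    r = int(np.sum(s > tol * max(s[0], 1.0)))  # numerical rank of X
    P_col = U[:, :r] @ U[:, :r].T              # projector onto C(X)
    P_row = Vt[:r].T @ Vt[:r]                  # projector onto R(X)
    return (np.allclose(P_col @ u, u, atol=1e-8)
            and np.allclose(P_row @ v, v, atol=1e-8))
```

Sweeping this check over (a parametrization of) $\mathcal{N}(\mathscr{S}, 2) \cap \mathcal{M} \setminus \{0\}$, such as the one given in Section 2.5.2 for linear convolution, certifies identifiability of $M = \sigma u v^T$ whenever no $X$ defeats the condition.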
Theorem 2.4 gives only a sufficient condition for identifiability. We bridge the gap to the necessary conditions in a special case in Corollary 2.5 below. We use the notation $M - \mathcal{K}'$ to denote the set $\{ M - Y \mid Y \in \mathcal{K}' \}$.

Corollary 2.5. Let $\mathcal{N}(\mathscr{S}, 1) \cap \mathcal{M} = \{0\}$ and let $M = \sigma u v^T$ be a rank one matrix in $\mathcal{K}' \subseteq \mathbb{R}^{m \times n}$. Suppose that every matrix $X \in \mathcal{N}(\mathscr{S}, 2) \cap (M - \mathcal{K}') \setminus \{0\}$ admits a singular value decomposition with $\sigma_1(X) = \sigma_2(X)$. Let us denote such a decomposition as $X = \sigma^* u_1 v_1^T + \sigma^* u_2 v_2^T$, and let $u = \alpha_1 u_1 + \alpha_2 u_2$ and $v = \alpha_3 v_1 + \alpha_4 v_2$ for some $\alpha_1, \alpha_2, \alpha_3, \alpha_4 \in \mathbb{R}$ with $\alpha_1^2 + \alpha_2^2 = \alpha_3^2 + \alpha_4^2 = 1$. Given the observation $z = \mathscr{S}(M)$, Problem (P4) successfully recovers $M$ if and only if for every $X \in \mathcal{N}(\mathscr{S}, 2) \cap (M - \mathcal{K}') \setminus \{0\}$, $\alpha_1 \alpha_3 + \alpha_2 \alpha_4 \le 0$.

Proof. Section 2.7.5.

Intuitively, Corollary 2.5 exploits the fact that all nonzero singular values of a matrix are of the same sign. Indeed, $(\alpha_1, \alpha_2)$ (respectively $(\alpha_3, \alpha_4)$) is an element of the two dimensional space of representation coefficients of $u$ w.r.t. $\mathcal{C}(X)$ (respectively $v$ w.r.t. $\mathcal{R}(X)$) with a fixed representation basis. Corollary 2.5 says that identifiability of $M$ holds if and only if the vectors $(\alpha_1, \alpha_2)$ and $(\alpha_3, \alpha_4)$ do not form an acute angle between them. The assumption $\sigma_1(X) = \sigma_2(X)$ has been made in Corollary 2.5 for ease of intuition. Although we do not state it here, an analogous result holds for $\sigma_1(X) \ne \sigma_2(X)$, with the condition on the inner product $\langle (\alpha_1, \alpha_2), (\alpha_3, \alpha_4) \rangle \le 0$ replaced by the same condition on a weighted inner product, where the weights depend on the ratio of $\sigma_1(X)$ to $\sigma_2(X)$.

For arbitrary lifted linear operators $\mathscr{S}(\cdot)$, checking Theorem 2.4 for a given rank one matrix $M$ is usually hard, unless a simple characterization of $\mathcal{N}(\mathscr{S}, 2)$ or $\mathcal{N}(\mathscr{S}, 2) \cap \mathcal{M}$ is available. It is reasonable to ask "How many rank one matrices $M$ are identifiable?", given any particular lifted linear operator $\mathscr{S}(\cdot)$ and assuming that the rank one matrices $M$ are drawn at random from some distribution. It is highly desirable that most rank one matrices $M$ be identifiable. Before we can show such a result, we need to define a random model for the rank one matrix $M$.

2.3.3 A Random Rank One Model

We consider $M = x y^T$ as a random rank one matrix drawn from an ensemble with the following properties:

(A1) $x$ (and $y$) is a zero mean random vector with an identity covariance matrix.

(A2) $x$ and $y$ are mutually independent.

As a practical motivation for this random model, consider a blind channel estimation problem where the transmitted signal $x$ passes through an unknown linear time invariant channel with impulse response $y$. In the absence of measurement noise, the observed signal at the receiver would be the linear convolution $z = x \star y$, which is a bilinear map. A practical modeling choice makes the channel realization $y$ statistically independent of the transmitted signal $x$. Furthermore, if the channel phase is rapidly varying, then the sign of each entry of $y$ is equally likely to be positive or negative, so that the resultant mean is zero. The transmitted signal $x$ can be assumed to be zero mean with independent and identically distributed entries (and thus identical variance per entry) under Binary-Phase-Shift-Keying and other balanced Phase-Shift-Keying modulation schemes. The assumption of equal variance per tap is somewhat idealistic for the channel $y$, but strictly speaking, this requirement is not absolutely necessary for our identifiability results.

Dependent Entries

First, we consider the case when the elements of $x$ (respectively $y$) are not independent. We shall be interested in the following two possible properties of $x$ and $y$:

(A3) The distribution of $x$ (respectively $y$) factors into a product of marginal distributions of $\|x\|_2$ and $x / \|x\|_2$ (respectively $\|y\|_2$ and $y / \|y\|_2$).

(A4) $\exists r > 0$ such that $\|x\|_2 \ge r$ (respectively $\|y\|_2 \ge r$) a.s.

We state the following technical lemmas, which will be needed in the proofs of Theorem 2.10 and Corollary 2.11. Lemma 2.7 is mainly useful when assumption (A3) cannot be satisfied but one needs bounds that closely resemble those of Lemma 2.6. These lemmas allow us to upper bound the probability that $x$ (respectively $y$) is close to one of the key subspaces in Theorem 2.4, i.e. $\mathcal{C}(X)$ (respectively $\mathcal{R}(X)$), where $X$ is in the appropriately constrained subset of $\mathcal{N}(\mathscr{S}, 2)$.
Lemma 2.6. Given any $m \times n$ real matrix $X \in \mathcal{N}(\mathscr{S}, 2) \cap \mathcal{M} \setminus \{0\}$ and a constant $\delta \in (0, 1)$, a rank one random matrix $M = x y^T = \sigma u v^T$ satisfying assumptions (A1)-(A3) also satisfies

$$\Pr\left( \left\| P_{\mathcal{C}(X)} u \right\|_2^2 \ge 1 - \delta \right) \le \frac{2}{m (1 - \delta)} \tag{2.14a}$$

and

$$\Pr\left( \left\| P_{\mathcal{R}(X)} v \right\|_2^2 \ge 1 - \delta \right) \le \frac{2}{n (1 - \delta)}. \tag{2.14b}$$

Proof. Section 2.7.6.

Lemma 2.7. Given any $m \times n$ real matrix $X \in \mathcal{N}(\mathscr{S}, 2) \cap \mathcal{M} \setminus \{0\}$ and a constant $\delta \in (0, 1)$, a rank one random matrix $M = x y^T = \sigma u v^T$ satisfying assumptions (A1)-(A2), with $x$ (respectively $y$) satisfying (A4) for a constant $r = r_x$ (respectively $r = r_y$), also satisfies

$$\Pr\left( \left\| P_{\mathcal{C}(X)} u \right\|_2^2 \ge 1 - \delta \right) \le \frac{2}{r_x^2 (1 - \delta)} \tag{2.15a}$$

and

$$\Pr\left( \left\| P_{\mathcal{R}(X)} v \right\|_2^2 \ge 1 - \delta \right) \le \frac{2}{r_y^2 (1 - \delta)}. \tag{2.15b}$$

Proof. Section 2.7.7.

Remark 2.6. Lemma 2.7 gives non-trivial bounds if $r_x$ (respectively $r_y$) goes to $\infty$ fast enough as $m$ (respectively $n$) goes to $\infty$, and this growth rate could be slower than $\Theta(\sqrt{m})$ (respectively $\Theta(\sqrt{n})$).

An example where Lemma 2.7 is applicable but Lemma 2.6 is not can be constructed as follows. As before, let $y$ represent a channel impulse response independent of $x$, so that (A2) is satisfied. Let $x$ represent a coded data stream under Pulse-Amplitude-Modulation such that $\|x\|_2 \in \left\{ \sqrt{m/3}, \sqrt{2m/3} \right\}$ with equal probability, $\mathbb{E}[x] = 0$, and $x_m$ is coded as a function of $\|x\|_2$, yielding the following conditional correlation matrices:

$$\mathbb{E}\left[ x x^T \,\middle|\, \|x\|_2 = \sqrt{\tfrac{m}{3}} \right] = \frac{1}{3} I + \frac{1}{6} (E_{1,m} + E_{m,1}) \tag{2.16a}$$

and

$$\mathbb{E}\left[ x x^T \,\middle|\, \|x\|_2 = \sqrt{\tfrac{2m}{3}} \right] = \frac{2}{3} I - \frac{1}{6} (E_{1,m} + E_{m,1}), \tag{2.16b}$$

where $E_{i,j} \in \mathbb{R}^{m \times m}$ is the matrix with elements given by

$$(E_{i,j})_{k,l} = \begin{cases} 1, & i = k,\ j = l, \\ 0, & \text{otherwise,} \end{cases} \tag{2.17}$$

for every $1 \le i, j \le m$. The expressions in (2.16) clearly imply that $\|x\|_2$ and $x / \|x\|_2$ are dependent, so that (A3) does not hold. Nonetheless, by construction, we have

$$\Pr\left( \|x\|_2 = \sqrt{\tfrac{m}{3}} \right) = \Pr\left( \|x\|_2 = \sqrt{\tfrac{2m}{3}} \right) = \frac{1}{2}, \tag{2.18}$$

so that (2.16) implies $\mathbb{E}\left[x x^T\right] = I$, thus satisfying (A1). Also, $\|x\|_2 \ge \sqrt{m/3}$ a.s., so that (A4) is satisfied. Thus, Lemma 2.7 is applicable with $r_x = \sqrt{m/3}$.

Independent Entries

While Lemma 2.6 provides useful bounds, it does not suffice for many problems where $\mathcal{N}(\mathscr{S}, 2) \cap \mathcal{M}$ is large. We can get much stronger bounds than Lemma 2.6 if the elements of the vector $x$ (respectively $y$) come from independent distributions, by utilizing the concentration of measure phenomenon [90]. We shall consider the standard Gaussian and the symmetric Bernoulli distributions, and sharpen the bounds of Lemma 2.6 in the two technical lemmas to follow. Note that a zero mean independent and identically distributed assumption on the elements of $x$ and $y$ already implies assumptions (A1)-(A3). The bounds of Lemmas 2.9 and 2.8 have an interpretation similar to the restricted isometry property [91] and are used in the proofs of Theorems 2.13 and 2.14, respectively. We retain the assumption $\mathcal{N}(\mathscr{S}, 1) \cap \mathcal{M} = \{0\}$ from Theorem 2.4 and follow the convention that a random variable $Z$ has a symmetric Bernoulli distribution if $\Pr(Z = +1) = \Pr(Z = -1) = 1/2$.

Lemma 2.8. Let $\mathcal{N}(\mathscr{S}, 1) \cap \mathcal{M} = \{0\}$. Given any $m \times n$ real matrix $X \in \mathcal{N}(\mathscr{S}, 2) \cap \mathcal{M} \setminus \{0\}$ and a constant $\delta \in (0, 1)$, a random vector $x \in \mathbb{R}^m$ with each element drawn independently from a standard normal distribution satisfies

$$\Pr\left( \left\| P_{\mathcal{C}(X)} x \right\|_2^2 \ge (1 - \delta) \|x\|_2^2 \right) \le \exp\left( -m \log\frac{1}{\sqrt{\delta}} + 2 \log m - \frac{2}{m} + 2 - \log\frac{2\delta}{1 - \delta} \right). \tag{2.19}$$

Proof. Section 2.7.10.

Lemma 2.9. Let $\mathcal{N}(\mathscr{S}, 1) \cap \mathcal{M} = \{0\}$. Given any $m \times n$ real matrix $X \in \mathcal{N}(\mathscr{S}, 2) \cap \mathcal{M} \setminus \{0\}$ and a constant $\delta \in (0, 1)$, a random vector $x \in \mathbb{R}^m$ with each element drawn independently from a symmetric Bernoulli distribution satisfies

$$\Pr\left( \left\| P_{\mathcal{C}(X)} x \right\|_2^2 \ge (1 - \delta) \|x\|_2^2 \right) \le \exp\left( -\frac{m (1 - \delta)}{4} + \log 4 \right). \tag{2.20}$$

Proof. Section 2.7.11.

Section 2.4.6 provides additional remarks on Lemma 2.9.
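The concentration behavior promised by Lemma 2.8 is simple to observe empirically. The following Monte Carlo sketch (our own illustration, assuming NumPy; the helper name is hypothetical) estimates $\Pr\left( \| P_{\mathcal{C}(X)} x \|_2^2 \ge (1 - \delta) \|x\|_2^2 \right)$ for a standard Gaussian $x$ and a fixed two dimensional column space standing in for $\mathcal{C}(X)$; the estimate decays rapidly in $m$, consistent with the exponential bound in (2.19):

```python
import numpy as np

def projection_tail_probability(m, delta, trials=20000, seed=0):
    """Monte Carlo estimate of Pr(||P_C x||^2 >= (1-delta) ||x||^2) for a
    standard Gaussian x and a fixed two dimensional column space C."""
    rng = np.random.default_rng(seed)
    # A fixed rank-two column space (stand-in for C(X), X in N(S,2) \ {0})
    Q, _ = np.linalg.qr(rng.standard_normal((m, 2)))
    x = rng.standard_normal((trials, m))
    proj_sq = np.sum((x @ Q) ** 2, axis=1)  # ||P_C x||^2 via orthonormal basis
    return np.mean(proj_sq >= (1 - delta) * np.sum(x ** 2, axis=1))

for m in (4, 8, 16, 32):
    print(m, projection_tail_probability(m, delta=0.5))
```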
2.3.4 Random Instance Identifiability

We first consider the special case where the size of the set $\mathcal{N}(\mathscr{S}, 2) \cap \mathcal{M}$ is small w.r.t. $mn$. We then use the same intuition to appropriately partition the set $\mathcal{N}(\mathscr{S}, 2) \cap \mathcal{M}$ when its size is large (possibly infinite) with respect to $m + n$. For future reference, we define the set $\mathcal{H}$ as

$$\mathcal{H} \triangleq \left\{ (\mathcal{C}(X), \mathcal{R}(X)) \,\middle|\, X \in \mathcal{N}(\mathscr{S}, 2) \cap \mathcal{M} \setminus \{0\} \right\}. \tag{2.21}$$

Small Complexity of $\mathcal{N}(\mathscr{S}, 2) \cap \mathcal{M}$

It is intuitive to expect that the number of rank one matrices $M$ that are identifiable as optimal solutions to Problem (P4) should depend inversely on the size/complexity of $\mathcal{N}(\mathscr{S}, 2) \cap \mathcal{M}$. Below, we make this notion precise by lower bounding the probability of satisfaction of the sufficient conditions in Theorem 2.4.

Theorem 2.10. Let $\mathcal{N}(\mathscr{S}, 1) \cap \mathcal{M} = \{0\}$ and let $M = \sigma u v^T \in \mathcal{K}' \subseteq \mathbb{R}^{m \times n}$ be a rank one random matrix satisfying assumptions (A1)-(A3). Suppose that the set $\mathcal{H}$ is finite with cardinality $f_{\mathscr{S},\mathcal{M}}(m, n)$. For any constant $\delta \in (0, 1)$, the sufficient conditions of Theorem 2.4 are satisfied with probability greater than

$$1 - \frac{4 f_{\mathscr{S},\mathcal{M}}(m, n)}{m n (1 - \delta)}.$$

Proof. We describe the basic idea behind the proof and defer the full proof to Section 2.7.8. The proof consists of the following important steps.

(a) We fix the matrix $X \in \mathcal{N}(\mathscr{S}, 2) \cap \mathcal{M} \setminus \{0\}$ and then relax the "hard" event of subspace membership $\{u \in \mathcal{C}(X)\}$ to the "soft" event of being close to the subspace in $\ell_2$ norm, $\left\{ \left\| u - P_{\mathcal{C}(X)} u \right\|_2^2 \le \delta \right\}$. This "soft" event describes a body of nonzero volume in $\mathbb{R}^m$. A similar argument holds for the vector $v$ as well.

(b) Next, the volumes (probabilities) of both these bodies (events) are computed individually and, utilizing the independence between the realizations of $u$ and $v$, the probability of the intersection of these events is easily computed. The bounds of Lemma 2.6 are used in this step.

(c) Lastly, we employ a union bound over the set of valid matrices $X \in \mathcal{N}(\mathscr{S}, 2) \cap \mathcal{M} \setminus \{0\}$ to make our results universal in nature.

Sections 2.4.1, 2.4.2 and 2.4.3 provide additional remarks on Theorem 2.10.

In Theorem 2.10, we can drive the probability of identifiability $1 - \frac{4 f_{\mathscr{S},\mathcal{M}}(m,n)}{m n (1 - \delta)}$ arbitrarily close to one by increasing $m$ and/or $n$, provided that $f_{\mathscr{S},\mathcal{M}}(m, n)$ grows as $o(mn)$. For many important BIPs (blind deconvolution, blind source separation, matrix factorization, etc.) this growth rate requirement on $f_{\mathscr{S},\mathcal{M}}(m, n)$ is too pessimistic. Tighter versions of Theorem 2.10, with more optimistic growth rate requirements on $f_{\mathscr{S},\mathcal{M}}(m, n)$, are possible if the assumptions of Lemma 2.8 or 2.9 are satisfied. This is the content of Theorems 2.13 and 2.14 described below.

We provide a corollary to Theorem 2.10 for when assumption (A3) does not hold, so that Lemma 2.6 is inapplicable. The result uses Lemma 2.7 in place of Lemma 2.6 for the proof. The bound is asymptotically useful if $f_{\mathscr{S},\mathcal{M}}(m, n)$ grows as $o\left(r_x^2(m)\, r_y^2(n)\right)$.

Corollary 2.11. Let $\mathcal{N}(\mathscr{S}, 1) \cap \mathcal{M} = \{0\}$ and let $M = \sigma u v^T \in \mathcal{K}' \subseteq \mathbb{R}^{m \times n}$ be a rank one random matrix satisfying assumptions (A1)-(A2), with $x$ (respectively $y$) satisfying (A4) for a constant $r = r_x(m)$ (respectively $r = r_y(n)$). Suppose that the set $\mathcal{H}$ is finite with cardinality $f_{\mathscr{S},\mathcal{M}}(m, n)$. For any constant $\delta \in (0, 1)$, the sufficient conditions of Theorem 2.4 are satisfied with probability greater than

$$1 - \frac{4 f_{\mathscr{S},\mathcal{M}}(m, n)}{r_x^2(m)\, r_y^2(n)\, (1 - \delta)}.$$

Proof. Section 2.7.9.

Large/Infinite Complexity of $\mathcal{N}(\mathscr{S}, 2) \cap \mathcal{M}$

When the complexity of $\mathcal{N}(\mathscr{S}, 2) \cap \mathcal{M}$ is infinite or exponentially large in $m + n$, the bounds above become trivially true for large enough $m$ or $n$.
We investigate an alternative bounding technique for this situation using covering numbers. Intuitively speaking, covering numbers measure the size of discretized versions of uncountable sets. The advantage of using such an approach is that the results are not contingent upon the exact geometry of $\mathcal{N}(\mathscr{S}, 2) \cap \mathcal{M}$. Thus, like Theorem 2.10, the technique and subsequent results are applicable to every bilinear map. We shall see that, to arrive at any sensible results, we will need to use the tighter estimates given by Lemmas 2.8 and 2.9, which are only possible when our signals $x$ and $y$ are component-wise independent.

Definition 2.5 (Covering Number and Metric Entropy [92]). For any two sets $\mathcal{B}, \mathcal{D} \subseteq \mathbb{R}^n$, the minimum number of translates of $\mathcal{B}$ needed to cover $\mathcal{D}$ is called the covering number of $\mathcal{D}$ w.r.t. $\mathcal{B}$ and is denoted by $N(\mathcal{D}, \mathcal{B})$. The quantity $\log N(\mathcal{D}, \mathcal{B})$ is known as the metric entropy of $\mathcal{D}$ w.r.t. $\mathcal{B}$.

It is known that if $\mathcal{D} \subseteq \mathbb{R}^n$ is a bounded convex body that is symmetric about the origin, and we let $\mathcal{B} = \epsilon \mathcal{D} \triangleq \{ \epsilon x \mid x \in \mathcal{D} \}$ for some $0 < \epsilon < 1$, then the covering number $N(\mathcal{D}, \epsilon \mathcal{D})$ obeys [92]

$$\left( \frac{1}{\epsilon} \right)^n \le N(\mathcal{D}, \epsilon \mathcal{D}) \le \left( \frac{2}{\epsilon} + 1 \right)^n. \tag{2.22}$$

We can equivalently say that the metric entropy $\log N(\mathcal{D}, \epsilon \mathcal{D})$ equals $n \log \Theta(1/\epsilon)$. We shall use this notation for specifying the metric entropies of key sets in the theorems to follow.

We state a technical lemma needed to prove Theorems 2.13 and 2.14. The lemma bounds the difference between norms of topologically close projection operators as a function of the covering resolution, thus providing a characterization of the sets used to cover the space of interest.

Lemma 2.12. Let $\mathcal{G}(m) = \left\{ Y \in \mathbb{R}^{m \times 2} \mid Y^T Y = I \right\}$, $\mathcal{D}(m) = \left\{ [y_1, y_2] \in \mathbb{R}^{m \times 2} \mid \max_{j=1,2} \|y_j\|_2 \le 1 \right\}$ and $0 < \epsilon < 1$. There exists a covering of $\mathcal{G}(m)$ with metric entropy $\le 2m \log \Theta(1/\epsilon)$ w.r.t. $\epsilon \mathcal{D}(m)$ such that for any $Y, Z \in \mathcal{G}(m)$ satisfying $Y - Z \in \epsilon \mathcal{D}(m)$ we have

$$\left| \left\| P_{\mathcal{C}(Y)} x \right\|_2 - \left\| P_{\mathcal{C}(Z)} x \right\|_2 \right| \le \sqrt{2}\, \epsilon\, \|x\|_2 \tag{2.23}$$

for all $x \in \mathbb{R}^m$.

Proof. Section 2.7.12.

Section 2.4.4 provides additional remarks on Lemma 2.12. We are now ready to extend Theorem 2.10 to the case where the complexity of $\mathcal{N}(\mathscr{S}, 2) \cap \mathcal{M}$ is large (possibly infinite). We shall do so for Bernoulli and Gaussian priors (as illustrative distributions) in Theorems 2.13 and 2.14, respectively. The proofs of both these theorems follow along the same lines as that of Theorem 2.10, except that the probability bounds of Lemma 2.6 are replaced by those of Lemmas 2.9 and 2.8 for Bernoulli and Gaussian priors, respectively. We define the sets $\mathcal{H}_1(m)$ and $\mathcal{H}_2(n)$ as below for future use.

$$\mathcal{H}_1(m) \triangleq \mathcal{G}(m) \cap \left\{ \mathcal{C}(X) \,\middle|\, X \in \mathcal{N}(\mathscr{S}, 2) \cap \mathcal{M} \setminus \{0\} \right\}, \tag{2.24a}$$

$$\mathcal{H}_2(n) \triangleq \mathcal{G}(n) \cap \left\{ \mathcal{R}(X) \,\middle|\, X \in \mathcal{N}(\mathscr{S}, 2) \cap \mathcal{M} \setminus \{0\} \right\}. \tag{2.24b}$$

[Figure 2.2: Exponentially decaying behavior of the theoretically predicted failure probability bound in Theorem 2.14, w.r.t. $n$ for fixed values of $m$ ($m = 2, 3, 4, 5$), for parameters $\epsilon = 0.1$, $\delta = 10^{-4}$ and $p = m + n - 3$ for the lifted linear convolution map ($p$ defined as in Theorem 2.14).]

Theorem 2.13. Let $\mathcal{N}(\mathscr{S}, 1) \cap \mathcal{M} = \{0\}$, let the sets $\mathcal{G}(m), \mathcal{G}(n)$ and $\mathcal{D}(m), \mathcal{D}(n)$ be defined according to Lemma 2.12, and let $M = x y^T \in \mathbb{R}^{m \times n}$ be a rank one random matrix with the components of $x$ (respectively $y$) drawn independently from a symmetric Bernoulli distribution, with $\mathcal{K}'$ chosen as

$$\mathcal{K}' = \left\{ \lambda x y^T \,\middle|\, x \in \{-1, 1\}^m,\ y \in \{-1, 1\}^n,\ \lambda \in \mathbb{R} \right\} \tag{2.25}$$

and $\mathcal{M} = \mathcal{K}' - \mathcal{K}'$. Let $p_c \log \Theta(1/\epsilon)$ denote the metric entropy of the set $\mathcal{H}_1(m)$ w.r.t. $\epsilon \mathcal{D}(m)$ and $p_r \log \Theta(1/\epsilon)$ denote the metric entropy of the set $\mathcal{H}_2(n)$ w.r.t. $\epsilon \mathcal{D}(n)$, for any $1 > \epsilon \ge \epsilon_0 > 0$, and let $p = p_c + p_r$.
For any constant $\delta_0 \in \left(0,\, 1 - 2\epsilon^2\right)$, the sufficient conditions of Theorem 2.4 are satisfied with probability greater than

$$1 - 16 \exp\left( p \log \Theta\!\left(\frac{1}{\epsilon}\right) - (m + n)\, \frac{1 - \delta}{4} \right)$$

with $\delta = 1 - \left( \sqrt{1 - \delta_0} - \sqrt{2}\, \epsilon \right)^2$.

Proof. Section 2.7.13.

Theorem 2.14. Let $\mathcal{N}(\mathscr{S}, 1) = \{0\}$, let the sets $\mathcal{G}(m), \mathcal{G}(n)$ and $\mathcal{D}(m), \mathcal{D}(n)$ be defined according to Lemma 2.12, and let $M = x y^T \in \mathbb{R}^{m \times n}$ be a rank one random matrix with the components of $x$ (respectively $y$) drawn independently from a standard Gaussian distribution. Let $p_c \log \Theta(1/\epsilon)$ denote the metric entropy of the set $\mathcal{H}_1(m)$ w.r.t. $\epsilon \mathcal{D}(m)$ and $p_r \log \Theta(1/\epsilon)$ denote the metric entropy of the set $\mathcal{H}_2(n)$ w.r.t. $\epsilon \mathcal{D}(n)$, for any $0 < \epsilon < 1$, and let $p = p_c + p_r$. For any constant $\delta_0 \in \left(0,\, 1 - 2\epsilon^2\right)$, the sufficient conditions of Theorem 2.4 are satisfied with probability greater than

$$1 - C(m, n, \delta) \exp\left( p \log \Theta\!\left(\frac{1}{\epsilon}\right) - (m + n) \log \frac{1}{\sqrt{\delta}} \right),$$

where

$$C(m, n, \delta) = \exp\left( 2 \log mn + 4 - 2 \log \frac{2\delta}{1 - \delta} \right) = \left( \frac{1}{\delta} - 1 \right)^2 \Theta\left(m^2 n^2\right) \tag{2.26}$$

and $\delta = 1 - \left( \sqrt{1 - \delta_0} - \sqrt{2}\, \epsilon \right)^2$.

Proof. Section 2.7.14.

Sections 2.4.5 and 2.4.6 provide additional remarks on Theorems 2.13 and 2.14. A non-trivial illustration of the theoretical scaling law bound of Theorem 2.14 is provided in Figure 2.2, with $\mathscr{S}(\cdot)$ as the lifted linear convolution map. Since the bound is parametrized by $(\epsilon, \delta)$, we choose $\epsilon = 0.1$ and $\delta = 10^{-4}$ for the illustration. Quite surprisingly (and fortunately), the metric entropy parameter $p$ in Theorem 2.14 can be exactly characterized when $\mathscr{S}(\cdot)$ represents the lifted linear convolution map. Specifically, we have $p = m + n - 3$. We refer the reader to Proposition 2.15 in Section 2.5.2 for details.

Remark 2.7. We can obtain results analogous to Theorems 2.13 and 2.14 when $x$ and $y$ are drawn from non-identical distributions, e.g. $x$ is component-wise i.i.d. symmetric Bernoulli and $y$ is component-wise i.i.d. standard Gaussian. The argument is a straightforward modification of the proof.

2.4 Discussion

In this section, we elaborate on the intuitions, ideas, assumptions and subtle implications associated with the main results of this chapter that were presented in Section 2.3.

2.4.1 A Measure of Geometric Complexity

For the purpose of measuring the size/complexity of $\mathcal{N}(\mathscr{S}, 2) \cap \mathcal{M}$ in Theorem 2.10, we used the cardinality $f_{\mathscr{S},\mathcal{M}}(m, n)$ of the set $\mathcal{H} = \left\{ (\mathcal{C}(X), \mathcal{R}(X)) \mid X \in \mathcal{N}(\mathscr{S}, 2) \cap \mathcal{M} \setminus \{0\} \right\}$ as a surrogate. This set essentially lists the distinct pairs of row and column spaces in the rank two null space of the lifted linear operator $\mathscr{S}(\cdot)$ that are not excluded by the domain restriction $M \in \mathcal{K}'$. We note that the cardinality of the set $\mathcal{N}(\mathscr{S}, 2) \cap \mathcal{M} \cap \left\{ X \in \mathbb{R}^{m \times n} \mid \|X\|_F = 1 \right\}$ could be infinite while its complexity could be finite in the sense just described. The same measure of complexity is used for the extensions of Theorem 2.10 in Theorems 2.13 and 2.14. Throughout the rest of the chapter, any reference to the complexity of a set of matrices $\mathcal{M}' \subseteq \mathbb{R}^{m \times n}$ is in the sense just described, i.e. through the cardinality of the set $\{ (\mathcal{C}(X), \mathcal{R}(X)) \mid X \in \mathcal{M}' \setminus \{0\} \}$.

2.4.2 The Role of the Conic Prior

There are three distinct aspects to the prior knowledge in terms of the conic constraint $M \in \mathcal{K}'$ on the unknown signal.

1. Probability Bounds: A key advantage of prior knowledge about the signal is apparent from the union bounding step in the proof of Theorem 2.10. Union bounding over the set $\mathcal{N}(\mathscr{S}, 2) \cap \mathcal{M} \setminus \{0\}$ always gives better bounds than union bounding over the superset $\mathcal{N}(\mathscr{S}, 2) \setminus \{0\}$, the quantitative difference being the number $f_{\mathscr{S},\mathcal{M}}(m, n)$ in the bound of Theorem 2.10. In general, the difference could be exponentially large in $m$ or $n$ (see Theorems 2.13 and 2.14).
We also note that $\mathcal{K}'$ does not need to be a cone in order to exploit this approach to improve the probability bounds.

2. Computational Trade-offs: Recalling Remark 2.4, the size of $\mathcal{K}'$ also trades off the ease of computation against the identifiability bounds of Theorem 2.10. If the size of $\mathcal{K}'$ needs to be increased to ease computation, an effort must be made to avoid a substantial increase in the size/complexity of the set $\mathcal{N}(\mathscr{S}, 2) \cap \mathcal{M} \setminus \{0\}$. For high dimensional problems ($m$ or $n$ is large), non-convex conic priors like the sparse cone in compressed sensing [4] and the low-rank cone in matrix completion [5] have been shown to admit good computationally tractable relaxations.

3. Geometric Complexity Measure: The measure of geometric complexity described in Section 2.4.1 followed naturally from Theorem 2.4, in an effort to describe the identifiability of a BIP in terms of quantities like row and column spaces familiar from linear algebra. This measure of complexity is invariant w.r.t. conic extensions in the following way. Let $\mathcal{M}' \subseteq \mathbb{R}^{m \times n}$ be any set of matrices and let $\mathcal{M}''$ denote its conic extension, defined as

$$\mathcal{M}'' \triangleq \left\{ \lambda X \,\middle|\, X \in \mathcal{M}',\ \lambda \in \mathbb{R}_+ \right\}. \tag{2.27}$$

Then, we have

$$\{ (\mathcal{C}(X), \mathcal{R}(X)) \mid X \in \mathcal{M}' \setminus \{0\} \} = \{ (\mathcal{C}(X), \mathcal{R}(X)) \mid X \in \mathcal{M}'' \setminus \{0\} \}. \tag{2.28}$$

Qualitatively speaking, the flavor of results in this chapter could also be derived for non-conic priors, but the measure of geometric complexity that is used is implicitly based on conic extensions. Thus, there is no significant loss of generality in restricting ourselves to conic priors.

2.4.3 The Role of $\delta$

Although the parameter $\delta \in (0, 1)$ appears in Theorem 2.10 as an artifact of our proof strategy, it has an important practical consequence. It represents a tolerance parameter for approximate versus exact prior information on the input signals. Specifically, Theorem 2.10 is a statement about identifiability up to a $\delta$-neighborhood around the true signal $(x, y)$. The same holds true for Theorems 2.13 and 2.14, which describe the large/infinite complexity case.

2.4.4 Interpretation of Lemma 2.12

Lemma 2.12 can be informally restated as follows. Keeping (2.23) satisfied, $\mathcal{G}(m)$ can always be covered by $\epsilon \mathcal{D}(m)$ with metric entropy $\le 2m \log \Theta(1/\epsilon)$. In Theorems 2.13 and 2.14, we are interested in covering the subset $\mathcal{H}_1(m) \subseteq \mathcal{G}(m)$ by $\epsilon \mathcal{D}(m)$, and suppose that the resulting metric entropy is $p_c \log \Theta(1/\epsilon)$. In a sense, Lemma 2.12 represents the worst case scenario in which $p_c$ is upper bounded by $2m$ and no better upper bound is known. In the worst case, the aforementioned subset of $\mathcal{G}(m)$ has nearly the same complexity as $\mathcal{G}(m)$, and this happens when the set $\mathcal{N}(\mathscr{S}, 2) \cap \mathcal{M}$ does not represent a large enough structural restriction on the set of rank two matrices in $\mathbb{R}^{m \times n}$. For large $m$, to guarantee identifiability for most inputs, we would (realistically) want $p_c$ to be less than $m$ by at least a constant factor. This is implied by Theorems 2.13 and 2.14. Informally, a smaller or more structured $\mathcal{N}(\mathscr{S}, 2) \cap \mathcal{M}$ implies a smaller value of $p_c$, which in turn implies identifiability for a greater fraction of the input ensemble.

2.4.5 The Gaussian and Bernoulli Special Cases

A standard Gaussian prior on the elements of $x$ and $y$ gives an example of the set $\mathcal{N}(\mathscr{S}, 2) \cap \mathcal{M}$ with infinite complexity, provided that $\mathcal{N}(\mathscr{S}, 2)$ is complex enough. In this case, $\mathcal{K} = \{(x, y) \mid x \in \mathbb{R}^m,\ y \in \mathbb{R}^n\}$ in Problem (P2), implying that $\mathcal{K}' \supseteq \{W \in \mathbb{R}^{m \times n} \mid \operatorname{rank}(W) \le 1\}$ from (2.8). Thus, $\mathcal{M} \supseteq \{W \in \mathbb{R}^{m \times n} \mid \operatorname{rank}(W) \le 2\} \supseteq \mathcal{N}(\mathscr{S}, 2)$ and hence $\mathcal{N}(\mathscr{S}, 2) \cap \mathcal{M} = \mathcal{N}(\mathscr{S}, 2)$. Since $\mathcal{M}$ is superfluous in this case, Theorem 2.14 omits all references to it.
If the row or column spaces of matrices in $\mathcal{N}(\mathscr{S}, 2)$ are parametrized by one or more real parameters (see Section 2.5.2 for an example involving the linear convolution operator), then $\mathcal{N}(\mathscr{S}, 2)$ has infinite complexity.

The scenario of a Bernoulli prior on the elements of $x$ and $y$ gives an example of the set $\mathcal{N}(\mathscr{S}, 2) \cap \mathcal{M}$ with finite (but exponentially large in $m + n$) complexity, provided that $\mathcal{N}(\mathscr{S}, 2)$ is complex enough. The precise statement requires a little more care than the Gaussian case described above. The motivation behind considering Bernoulli priors is to restrict the unit vectors $x / \|x\|_2$ and $y / \|y\|_2$ to take values from a large but finite set while adhering to the requirement of a conic prior on $(x, y)$ according to Problem (P2). Thus, in this case we have $\mathcal{K} = \{ (\lambda_1 x, \lambda_2 y) \mid x \in \{-1, 1\}^m,\ y \in \{-1, 1\}^n,\ \lambda_1 \in \mathbb{R},\ \lambda_2 \in \mathbb{R} \}$. Let us select $\mathcal{K}'$ according to (2.8), but without any relaxation, as

$$\mathcal{K}' = \left\{ \lambda x y^T \,\middle|\, x \in \{-1, 1\}^m,\ y \in \{-1, 1\}^n,\ \lambda \in \mathbb{R} \right\}. \tag{2.29}$$

Clearly, matrices in $\mathcal{K}'$ can account for at most $2^{m-1}$ distinct column spaces and $2^{n-1}$ distinct row spaces, thus implying that matrices in $\mathcal{M} = \mathcal{K}' - \mathcal{K}'$ are generated by at most $\binom{2^{m-1}}{2} \le 2^{2m-2}$ distinct column spaces and at most $\binom{2^{n-1}}{2} \le 2^{2n-2}$ distinct row spaces. Thus, $\mathcal{N}(\mathscr{S}, 2) \cap \mathcal{M} \subseteq \mathcal{M}$ is of finite complexity. It is clear that the complexity of $\mathcal{M}$ is $\exp(\Theta(m + n))$, so that if $\mathcal{M} \setminus \mathcal{N}(\mathscr{S}, 2)$ is small enough then the complexity of $\mathcal{N}(\mathscr{S}, 2) \cap \mathcal{M}$ is exponentially large in $m + n$.

2.4.6 Distinctions between Theorems 2.13 and 2.14

Assumptions on $\epsilon$

We prevent an arbitrarily small $\epsilon$ in Theorem 2.13 by imposing a strictly positive lower bound $\epsilon_0 > 0$. This is necessary for Bernoulli priors on $x$ and $y$, since $\mathcal{N}(\mathscr{S}, 2) \cap \mathcal{M}$ has finite complexity, implying that the covering numbers of $\mathcal{H}_1(m)$ w.r.t. $\epsilon \mathcal{D}(m)$ and of $\mathcal{H}_2(n)$ w.r.t. $\epsilon \mathcal{D}(n)$ have an absolute upper bound independent of $\epsilon$. Thus, the logarithmic dependence (of the key metric entropies) on $1/\epsilon$ cannot hold unless $\epsilon$ is bounded away from zero. Theorem 2.14, in contrast, allows for arbitrarily small $\epsilon$, since $\mathcal{N}(\mathscr{S}, 2) \cap \mathcal{M} = \mathcal{N}(\mathscr{S}, 2)$ has infinite complexity for Gaussian priors on $x$ and $y$. Despite this distinction between Theorems 2.13 and 2.14, we choose to present our results in the stated form to emphasize the similarity in the theorem statements and proofs.

A constant factor loss

We lose a constant factor of approximately 2 in the exponent on the r.h.s. of (2.20) as compared to (2.19) for a fixed $\delta \in (0, 1)$ (compared using a first order approximation of $\log \delta$). While this seems to be an artifact of the proof strategy, it is unclear whether a better constant can be obtained for the symmetric Bernoulli distribution (or, more generally, for subgaussian distributions [93]). Indeed, for the proof of Lemma 2.8 in Section 2.7.10, we have used the rotational invariance property of the multivariate standard normal distribution. This property does not carry over to general subgaussian distributions.

2.5 Numerical Results on Blind Deconvolution

We observe that if $\mathcal{N}(\mathscr{S}, 2) \cap \mathcal{M} = \{0\}$ then $f_{\mathscr{S},\mathcal{M}}(m, n) = 0$, and Theorem 2.10 correctly predicts that the input signals are identifiable with probability one (in agreement with Proposition 2.3). Below, we consider example bilinear maps and input distributions with $\mathcal{N}(\mathscr{S}, 2) \cap \mathcal{M} \ne \{0\}$ and numerically examine the scaling behavior suggested by Lemmas 2.6, 2.9 and 2.8 and Theorems 2.10, 2.13 and 2.14. Since Lemma 2.6 and Theorem 2.10 impose only broad constraints on the input distribution, for the purpose of numerical simulations, we construct a specific input distribution that satisfies assumptions (A1)-(A3) in Section 2.5.1. Since this research was motivated by our interest in understanding the cone constrained blind deconvolution problem (P1), our selection of example bilinear maps is closely related to the linear convolution map. We provide a partial description of the rank two null space of the linear convolution map in Section 2.5.2.

2.5.1 Bi-orthogonally Supported Uniform Distributions

A bi-orthogonal set of vectors is a collection of orthonormal vectors and their additive inverses. It is widely used for signal representation in image processing and as a modulation scheme in communication systems. We can construct a uniform distribution over a bi-orthogonal set, and it would satisfy assumptions (A1)-(A3), as shown below. Let $\{e_1, e_2, \ldots, e_m\}$ be an orthonormal basis for $\mathbb{R}^m$ and let the random unit vector $u \in \{\pm e_1, \pm e_2, \ldots, \pm e_m\}$ be drawn according to the law

$$\Pr(u = +e_j) = \Pr(u = -e_j) = \frac{1}{2m}, \quad \forall 1 \le j \le m, \tag{2.30}$$

where $u$ has the same meaning as in Lemma 2.6 and Theorem 2.10. Let $\|x\|_2$ be drawn from a distribution (independent of $u$) supported on the non-negative real axis with $\mathbb{E}\left[\|x\|_2^2\right] = m$. Then, by construction, $x = \|x\|_2 \cdot u$ satisfies assumption (A3), and it also satisfies assumption (A2) if $\|y\|_2$ and $v$ are drawn analogously but independently of $u$ and $\|x\|_2$. Using (2.30), we further observe that

$$\mathbb{E}[u] = \sum_{j=1}^m \left[ \Pr(u = +e_j) - \Pr(u = -e_j) \right] \cdot e_j = 0 \tag{2.31}$$

and

$$\mathbb{E}\left[u u^T\right] = \sum_{j=1}^m \left[ \Pr(u = +e_j) + \Pr(u = -e_j) \right] \cdot e_j e_j^T = \frac{1}{m} \sum_{j=1}^m e_j e_j^T = \frac{1}{m} I, \tag{2.32}$$

where the last equality in (2.32) holds since $\{e_1, e_2, \ldots, e_m\}$ is an orthonormal basis for $\mathbb{R}^m$. By independence of $\|x\|_2$ from $u$ we have

$$\mathbb{E}[x] = \mathbb{E}[\|x\|_2] \cdot \mathbb{E}[u] = 0 \tag{2.33}$$

from (2.31), and

$$\mathbb{E}\left[x x^T\right] = \mathbb{E}\left[\|x\|_2^2\right] \cdot \mathbb{E}\left[u u^T\right] = m \cdot \frac{1}{m} I = I \tag{2.34}$$

from (2.32). Hence, $x$ is a zero mean random vector with an identity covariance matrix and thus satisfies assumption (A1).

Following the same line of reasoning as in Section 2.4.5, we can show that a bi-orthogonally supported uniform prior on $x$ and $y$ gives an example of the set $\mathcal{N}(\mathscr{S}, 2) \cap \mathcal{M}$ with small complexity in the sense described in Section 2.4.1. Indeed, we have

$$\mathcal{K} = \{ (\lambda_1 e_i, \lambda_2 f_j) \mid 1 \le i \le m,\ 1 \le j \le n,\ \lambda_1 \in \mathbb{R},\ \lambda_2 \in \mathbb{R} \}, \tag{2.35}$$

where $\{e_1, e_2, \ldots, e_m\}$ and $\{f_1, f_2, \ldots, f_n\}$ respectively form an orthonormal basis for $\mathbb{R}^m$ and $\mathbb{R}^n$. Let us select $\mathcal{K}'$ according to (2.8), but without any relaxation, as

$$\mathcal{K}' = \left\{ \lambda e_i f_j^T \,\middle|\, 1 \le i \le m,\ 1 \le j \le n,\ \lambda \in \mathbb{R} \right\}. \tag{2.36}$$

It is clear that matrices in $\mathcal{K}'$ can account for at most $m$ distinct column spaces and $n$ distinct row spaces, thus implying that matrices in $\mathcal{M} = \mathcal{K}' - \mathcal{K}'$ are generated by at most $\binom{m}{2} \le m^2$ distinct column spaces and by at most $\binom{n}{2} \le n^2$ distinct row spaces. Thus, $\mathcal{N}(\mathscr{S}, 2) \cap \mathcal{M} \subseteq \mathcal{M}$ is of small complexity (only polynomially large in $m$ and $n$). In fact, exhaustive search for Problem (P4) is tractable for any bi-orthogonally supported uniform prior, owing to the small complexity of $\mathcal{N}(\mathscr{S}, 2) \cap \mathcal{M}$.
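The construction above is straightforward to sample. The minimal sketch below (assuming NumPy; the helper name is ours) draws $u$ according to (2.30) for the canonical basis and empirically confirms (2.31) and (2.32):

```python
import numpy as np

def sample_biorthogonal(m, trials, rng):
    """Draw u uniformly from {+/- e_1, ..., +/- e_m} as in (2.30)."""
    idx = rng.integers(m, size=trials)
    signs = rng.choice([-1.0, 1.0], size=trials)
    U = np.zeros((trials, m))
    U[np.arange(trials), idx] = signs
    return U

rng = np.random.default_rng(3)
m, trials = 5, 200000
U = sample_biorthogonal(m, trials, rng)

# Empirically, E[u] ~ 0 and E[u u^T] ~ I/m, matching (2.31) and (2.32)
print(np.max(np.abs(U.mean(axis=0))))                    # close to 0
print(np.max(np.abs(U.T @ U / trials - np.eye(m) / m)))  # close to 0
```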
Since this research was motivated by our interest in understanding the cone constrained blind deconvolution problem $(P_1)$, our selection of example bilinear maps is closely related to the linear convolution map. We provide a partial description of the rank-two null space for the linear convolution map in Section 2.5.2.

2.5.1 Bi-orthogonally Supported Uniform Distributions

A bi-orthogonal set of vectors is a collection of orthonormal vectors and their additive inverses. It is widely used for signal representation in image processing and as a modulation scheme in communication systems. We can construct a uniform distribution over a bi-orthogonal set, and it would satisfy assumptions (A1)-(A3) as shown below. Let $\{e_1,e_2,\ldots,e_m\}$ be an orthonormal basis for $\mathbb{R}^m$ and let the random unit vector $u\in\{\pm e_1,\pm e_2,\ldots,\pm e_m\}$ be drawn according to the law
\[
\Pr(u=+e_j) = \Pr(u=-e_j) = \frac{1}{2m}, \qquad \forall 1\le j\le m, \tag{2.30}
\]
where $u$ has the same meaning as in Lemma 2.6 and Theorem 2.10. Let $\|x\|_2$ be drawn from a distribution (independent of $u$) supported on the non-negative real axis with $\mathbb{E}\big[\|x\|_2^2\big]=m$. Then, by construction, $x=\|x\|_2\cdot u$ satisfies assumption (A3), and it also satisfies assumption (A2) if $\|y\|_2$ and $v$ are drawn analogously but independently of $u$ and $\|x\|_2$. Using (2.30), we further observe that
\[
\mathbb{E}[u] = \sum_{j=1}^m \big[\Pr(u=+e_j)-\Pr(u=-e_j)\big]\,e_j = 0 \tag{2.31}
\]
and
\[
\mathbb{E}\big[uu^T\big] = \sum_{j=1}^m \big[\Pr(u=+e_j)+\Pr(u=-e_j)\big]\,e_je_j^T = \frac1m\sum_{j=1}^m e_je_j^T = \frac1m I, \tag{2.32}
\]
where the last equality in (2.32) is true since $\{e_1,e_2,\ldots,e_m\}$ is an orthonormal basis for $\mathbb{R}^m$. By independence of $\|x\|_2$ from $u$ we have
\[
\mathbb{E}[x] = \mathbb{E}[\|x\|_2]\cdot\mathbb{E}[u] = 0 \tag{2.33}
\]
from (2.31), and
\[
\mathbb{E}\big[xx^T\big] = \mathbb{E}\big[\|x\|_2^2\big]\cdot\mathbb{E}\big[uu^T\big] = m\cdot\frac1m I = I \tag{2.34}
\]
from (2.32). Hence, $x$ is a zero mean random vector with an identity covariance matrix and thus satisfies assumption (A1).

Following the same line of reasoning as in Section 2.4.5, we can show that a bi-orthogonally supported uniform prior on $x$ and $y$ gives an example of the set $\mathcal{N}(\mathcal{S},2)\cap\mathcal{M}$ with small complexity in the sense described in Section 2.4.1. Indeed, we have
\[
\mathcal{K} = \{(\lambda_1 e_i,\lambda_2 f_j) \mid 1\le i\le m,\ 1\le j\le n,\ \lambda_1\in\mathbb{R},\ \lambda_2\in\mathbb{R}\}, \tag{2.35}
\]
where $\{e_1,e_2,\ldots,e_m\}$ and $\{f_1,f_2,\ldots,f_n\}$ respectively form an orthonormal basis for $\mathbb{R}^m$ and $\mathbb{R}^n$. Let us select $\mathcal{K}'$ according to (2.8), but without any relaxation, as
\[
\mathcal{K}' = \{\lambda e_if_j^T \mid 1\le i\le m,\ 1\le j\le n,\ \lambda\in\mathbb{R}\}. \tag{2.36}
\]
It is clear that matrices in $\mathcal{K}'$ can account for at most $m$ distinct column spaces and $n$ distinct row spaces, thus implying that matrices in $\mathcal{M}=\mathcal{K}'-\mathcal{K}'$ are generated by at most $\binom m2\le m^2$ distinct column spaces and at most $\binom n2\le n^2$ distinct row spaces. Thus, $\mathcal{N}(\mathcal{S},2)\cap\mathcal{M}\subseteq\mathcal{M}$ is of small complexity (only polynomially large in $m$ and $n$). In fact, exhaustive search for Problem $(P_4)$ is tractable for any bi-orthogonally supported uniform prior, owing to the small complexity of $\mathcal{N}(\mathcal{S},2)\cap\mathcal{M}$.

2.5.2 Null Space of Linear Convolution

The following proposition establishes a parametric representation of a subset of $\mathcal{N}(\mathcal{S},2)$, where $\mathcal{S}:\mathbb{R}^{m\times n}\to\mathbb{R}^{m+n-1}$ denotes the lifted equivalent of the linear convolution map in Problem $(P_1)$. This result is essentially Lemma 3.5 from Chapter 3. As described by (2.10) in Section 2.2.3, let $S_k\in\mathbb{R}^{m\times n}$, $1\le k\le m+n-1$, denote a basis for $\mathcal{S}(\cdot)$. For $1\le i\le m$, $1\le j\le n$ and $1\le k\le m+n-1$, we have the description
\[
(S_k)_{ij} = \begin{cases} 1, & i+j = k+1, \\ 0, & \text{otherwise}. \end{cases} \tag{2.37}
\]
Figure 2.1 illustrates a toy example of the linear convolution map with $m=3$ and $n=4$.

Proposition 2.15. If $X\in\mathbb{R}^{m\times n}$ admits a factorization of the form
\[
X = \begin{bmatrix} u & 0 \\ 0 & -u \end{bmatrix}\begin{bmatrix} 0 & v^T \\ v^T & 0 \end{bmatrix} \tag{2.38}
\]
for some $v\in\mathbb{R}^{n-1}$ and $u\in\mathbb{R}^{m-1}$, then $X\in\mathcal{N}(\mathcal{S},2)$.

Proof. See the proof of Lemma 3.5 in the next chapter (Chapter 3).
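To make the construction concrete, the following short Python sketch (ours, not part of the dissertation) builds the basis matrices $S_k$ of (2.37), checks that the resulting lifted map reproduces linear convolution, and numerically confirms Proposition 2.15 for a random choice of $u$ and $v$.

```python
import numpy as np

def lifted_conv(W):
    """Apply the lifted linear convolution map S: R^{m x n} -> R^{m+n-1}.

    By (2.37), the k-th output coordinate is <S_k, W> with S_k the 0/1
    Hankel matrix supported on the anti-diagonal i + j = k + 1, i.e. S
    sums the entries of W along its anti-diagonals.
    """
    m, n = W.shape
    return np.array([sum(W[i, k - i]
                         for i in range(max(0, k - n + 1), min(m, k + 1)))
                     for k in range(m + n - 1)])

rng = np.random.default_rng(0)
m, n = 5, 7

# Sanity check: S(x y^T) equals the linear convolution x * y.
x, y = rng.standard_normal(m), rng.standard_normal(n)
assert np.allclose(lifted_conv(np.outer(x, y)), np.convolve(x, y))

# Proposition 2.15: matrices of the form (2.38) lie in N(S, 2).
u, v = rng.standard_normal(m - 1), rng.standard_normal(n - 1)
L = np.column_stack([np.append(u, 0.0), np.append(0.0, -u)])  # [u 0; 0 -u]
R = np.vstack([np.append(0.0, v), np.append(v, 0.0)])         # [0 v^T; v^T 0]
X = L @ R
assert np.linalg.matrix_rank(X) <= 2
assert np.allclose(lifted_conv(X), 0.0)
print("S(X) = 0 and rank(X) <= 2, as claimed")
```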
Since the set of $m\times n$ rank-two matrices has $2(m+n-2)$ DoF and $\mathcal{S}(\cdot)$ maps $\mathbb{R}^{m\times n}$ to $\mathbb{R}^{m+n-1}$ with $\mathcal{N}(\mathcal{S},1)=\{0\}$, $\mathcal{N}(\mathcal{S},2)$ has at most $(2m+2n-4)-(m+n-1)=m+n-3$ DoF. We see that the representation on the r.h.s. of (2.38) also has $m+n-3$ DoF, so that our parametrization is tight up to DoF. The converse of Proposition 2.15 is false in general (see Proposition 3.2 in Chapter 3).

2.5.3 Verification Methodology

We test identifiability by (approximately) solving the following optimization problem:
\[
\underset{X}{\text{minimize}}\ \operatorname{rank}(X) \quad \text{subject to}\quad \|X-M\|_F\le\mu,\ \ \mathcal{S}(X)=0, \tag{P_5}
\]
where $M=xy^T$ is the true matrix and $\mu$ is a tuning parameter. The rationale behind solving Problem $(P_5)$ is as follows. If the sufficient conditions of Theorem 2.4 are not satisfied, then $\exists X\in\mathcal{N}(\mathcal{S},2)\cap\mathcal{M}$ such that both $x\in\mathcal{C}(X)$ and $y\in\mathcal{R}(X)$ are true. We approximate the event
\[
\mathcal{E}_1 = \{\exists X\in\mathcal{N}(\mathcal{S},2) \text{ with } u\in\mathcal{C}(X),\ v\in\mathcal{R}(X)\} \tag{2.39a}
\]
by the event
\[
\mathcal{E}_2 = \{\exists X\in\mathcal{N}(\mathcal{S},2) \text{ such that } \|X-M\|_F\le\mu\}. \tag{2.39b}
\]
As Problem $(P_5)$ is itself NP-hard to solve exactly, we can employ the re-weighted nuclear norm heuristic [94] to solve Problem $(P_5)$ approximately. If the resulting solution to Problem $(P_5)$ has rank two, then we declare that event $\mathcal{E}_2$ has happened. Clearly, we have $\mathcal{E}_2\subseteq\mathcal{E}_1$, so that the sufficient conditions for identifiability by Theorem 2.4 fail if event $\mathcal{E}_2$ took place. The examples we consider in Sections 2.5.4 to 2.5.6 are, however, motivated by the representation in (2.38) and share the same parametrization structure for $\mathcal{N}(\mathcal{S},2)$. This enables us to use approximate verification techniques that are faster than the re-weighted nuclear norm heuristic, especially if the search space is discrete and finite. The re-weighted nuclear norm heuristic is still useful if no parametrization structure is available for $\mathcal{N}(\mathcal{S},2)$.

2.5.4 Small Complexity of $\mathcal{N}(\mathcal{S},2)\cap\mathcal{M}$

Let $x\in\{e_1',e_2',\ldots,e_m'\}$ and $y\in\{f_1',f_2',\ldots,f_n'\}$ be drawn from bi-orthogonally supported uniform distributions, as described in Section 2.5.1, where $\{e_1',e_2',\ldots,e_m'\}$ and $\{f_1',f_2',\ldots,f_n'\}$ respectively represent the canonical bases for $\mathbb{R}^m$ and $\mathbb{R}^n$. We consider a lifted linear operator $\mathcal{S}(\cdot)$ with the following description: $\mathcal{N}(\mathcal{S},2)\cap\mathcal{M}$ consists of $\lfloor\sqrt m\rfloor\cdot\lfloor\sqrt n\rfloor$ parts, and the $(i,j)$-th part $\mathcal{P}_{ij}$, $1\le i\le\lfloor\sqrt m\rfloor$, $1\le j\le\lfloor\sqrt n\rfloor$, is given by
\[
\mathcal{P}_{ij} = \left\{ \lambda \begin{bmatrix} e_i & 0 \\ 0 & -e_i \end{bmatrix}\begin{bmatrix} 0 & f_j^T \\ f_j^T & 0 \end{bmatrix} \,\middle|\, \lambda\in\mathbb{R} \right\} \tag{2.40}
\]
where $\{e_1,e_2,\ldots,e_{m-1}\}$ and $\{f_1,f_2,\ldots,f_{n-1}\}$ respectively denote the canonical bases for $\mathbb{R}^{m-1}$ and $\mathbb{R}^{n-1}$, and $\lfloor\cdot\rfloor$ is the floor function. Clearly, the elements of $\mathcal{P}_{ij}$ are closely related to the representation in (2.38). For this lifted linear operator, the bound of Theorem 2.10 is applicable with $f_{\mathcal{S},\mathcal{M}}(m,n)=\lfloor\sqrt m\rfloor\cdot\lfloor\sqrt n\rfloor$, implying that the probability of failure to satisfy the sufficient conditions of Theorem 2.4 decreases as $O(1/\sqrt{mn})$. Since exhaustive search for event $\mathcal{E}_2$ is tractable (see Section 2.5.1), we employ it to compute the failure probability. The results are plotted in Figure 2.3 on a log-log scale. Note that we have plotted the best linear fit for the simulated parameter values, since the probabilities can be locally discontinuous in $\log n$ due to the appearance of the $\lfloor\cdot\rfloor$ function in the expression for $f_{\mathcal{S},\mathcal{M}}(m,n)$. We see that the simulated order of growth of the failure probability, $O(n^{-0.48})$ for every fixed value of $m$ (exponent determined by the slope of the plot in Figure 2.3), almost exactly matches the theoretically predicted order of growth of $O(n^{-0.5})$.
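For this example the exhaustive search over event $\mathcal{E}_2$ reduces to an index test: by (2.40), the column spaces of the matrices in $\mathcal{P}_{ij}$ equal $\operatorname{Span}(e_i',e_{i+1}')$ and the row spaces equal $\operatorname{Span}(f_j',f_{j+1}')$, so failure occurs exactly when the support index of $x$ lies in $\{1,\ldots,\lfloor\sqrt m\rfloor+1\}$ and that of $y$ lies in $\{1,\ldots,\lfloor\sqrt n\rfloor+1\}$. A minimal Monte Carlo sketch of this computation (ours, not from the dissertation) follows.

```python
import numpy as np

rng = np.random.default_rng(1)

def failure_prob(m, n, trials=200_000):
    """Estimate Pr[sufficient conditions of Theorem 2.4 fail] for Sec. 2.5.4.

    For the operator defined by (2.40), u = +/- e'_k lies in the column
    space of some part P_ij iff k <= floor(sqrt(m)) + 1, and similarly
    for v, so the exhaustive search over E_2 reduces to an index test.
    """
    kx = rng.integers(1, m + 1, size=trials)  # support index of x
    ky = rng.integers(1, n + 1, size=trials)  # support index of y
    fail = (kx <= int(np.sqrt(m)) + 1) & (ky <= int(np.sqrt(n)) + 1)
    return fail.mean()

for m in (4, 16, 64):
    for n in (64, 256, 1024):
        print(m, n, failure_prob(m, n))  # decays roughly like 1/sqrt(m*n)
```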
2.5.5 Large Complexity of $\mathcal{N}(\mathcal{S},2)\cap\mathcal{M}$

Let $x\in\mathbb{R}^m$ and $y\in\mathbb{R}^n$ be drawn component-wise independently from a symmetric Bernoulli distribution (see Section 2.4.5), and let $\tau\in(0,1)$ be a constant. Following our guiding representation (2.38), we consider a lifted linear operator $\mathcal{S}(\cdot)$ with the following description: $\mathcal{N}(\mathcal{S},2)$ consists of $2^{\lfloor\tau m\rfloor}\times 2^{\lfloor\tau n\rfloor}$ parts, and the $(i,j)$-th part $\mathcal{P}_{ij}$, $0\le i\le 2^{\lfloor\tau m\rfloor}-1$, $0\le j\le 2^{\lfloor\tau n\rfloor}-1$, is given by
\[
\mathcal{P}_{ij} = \left\{ \lambda \begin{bmatrix} g_i & 0 \\ \mathbf{1} & -g_i \\ 0 & -\mathbf{1} \end{bmatrix}\begin{bmatrix} 0 & h_j^T & \mathbf{1}^T \\ h_j^T & \mathbf{1}^T & 0 \end{bmatrix} \,\middle|\, \lambda\in\mathbb{R} \right\} \tag{2.41}
\]
where $g_i\in\{-1,1\}^{\lfloor\tau m\rfloor}$ (respectively $h_j\in\{-1,1\}^{\lfloor\tau n\rfloor}$) denotes the binary representation of $i$ (respectively $j$) of length $\lfloor\tau m\rfloor$ bits (respectively $\lfloor\tau n\rfloor$ bits) expressed in the alphabet $\{-1,1\}$, and the all-one column vectors in (2.41) are of appropriate dimensions so that the elements of $\mathcal{P}_{ij}$ are matrices in $\mathbb{R}^{m\times n}$; i.e., the matrices in $\mathcal{P}_{ij}$ have the form (2.38) with $u=(g_i;\mathbf{1})$ and $v=(h_j;\mathbf{1})$.

Figure 2.3: Linear scaling behavior of log(Failure Probability) with $\log n$ for fixed values of $m$. The absolute value of the fitted slope is 0.48.

Figure 2.4: Linear scaling behavior of log(Failure Probability) with problem dimension $n$ for fixed values of $m$. The absolute values of the fitted slopes are between 0.093 and 0.094.

The bound in Theorem 2.13 is applicable to this example. We employ exhaustive search for event $\mathcal{E}_2$ for small values of $m$ and $n$ (it is computationally intractable for large $m$ or $n$). The results are plotted in Figure 2.4 on a semilog scale, where we have used $\tau=0.2$ and $\delta'=0.3$, with $\delta'$ as in the statement of Theorem 2.13. As in the case of Figure 2.3, we plot the best linear fit for the simulated parameter values to disregard local discontinuities introduced by the use of the $\lfloor\cdot\rfloor$ function. Since it is hard to analytically compute the metric entropies $p_c$ and $p_r$, we settle for a numerical verification of the scaling law with problem dimension and an approximate argument for the validity of the predictions made by Theorem 2.13 for this example. By construction, we have the bounds $p_c\le\lfloor\tau m\rfloor$ and $p_r\le\lfloor\tau n\rfloor$, but the careful reader will note that, because of the element-wise constant magnitude property of a symmetric Bernoulli random vector, $x$ does not lie in the column span of any of the matrices in $\mathcal{N}(\mathcal{S},2)$ as described by the generative description in (2.41), but can be arbitrarily close to such a span as $m$ increases. We thus expect that $p_c=\kappa_c m$ and $p_r=\kappa_r n$ for some parameters $\kappa_c$ and $\kappa_r$ close to zero. By choice of parameters, $\epsilon\le\sqrt{(1-\delta')/2}=0.59$. With $\kappa_c=\kappa_r=0$ and setting $\epsilon=0.01$, the theoretical prediction for the absolute value of the slope is 0.073, which is quite close to the simulated value of 0.093. We clearly recover the linear scaling behavior of the logarithm of the failure probability with the problem dimension $n$.

2.5.6 Infinite Complexity of $\mathcal{N}(\mathcal{S},2)\cap\mathcal{M}$

Let $x\in\mathbb{R}^m$ and $y\in\mathbb{R}^n$ be drawn component-wise independently from a standard normal distribution. We consider the linear convolution operator from Problem $(P_1)$, letting $\mathcal{S}(\cdot)$ denote the lifted linear convolution map.
A representation of $\mathcal{S}(\cdot)$ and a description of the rank-two null space $\mathcal{N}(\mathcal{S},2)$ were given in the prequel (Section 2.5.2). The bound in Theorem 2.14 is applicable to this example. However, unlike the examples in Sections 2.5.4 and 2.5.5, we cannot employ exhaustive search over $\mathcal{N}(\mathcal{S},2)$ to test identifiability, since the search space is uncountably infinite by Proposition 2.15. We resort to the method described in Section 2.5.3, relying on the re-weighted nuclear norm heuristic.

Figure 2.5: Exponentially decaying behavior of the simulated failure probability w.r.t. $n$ for fixed values of $m$, for parameter $\mu=0.8$ and the lifted linear convolution map. The absolute values of the fitted slopes are between 0.94 and 1.08.

The results are plotted in Figure 2.5 on a semilog scale, where we have used $\mu=0.8$ to detect the occurrence of the event $\mathcal{E}_2$ as described by (2.39b), and $M$ in Problem $(P_5)$ is normalized such that $\|M\|_F=1$. A relatively high value of $\mu=0.8$ is used to ensure that the rare event $\mathcal{E}_2$ admits a large enough probability of occurrence. Only data points that satisfy $n\ge m$ are plotted, since the behavior of the convolution operator is symmetric w.r.t. the order of its inputs. Since the re-weighted nuclear norm heuristic does not always converge monotonically in a small number of steps, we stopped execution after a finite number of steps, which might explain the small deviation from linearity observed in Figure 2.5 as compared to the respective best linear fits on the same plot. Nonetheless, we approximately recover the theoretically predicted qualitative linear scaling law of the logarithm of the failure probability with the problem dimension $n$, for fixed values of $m$. There does not seem to be an easy way of comparing the constants involved in the simulated result to their theoretical counterparts as predicted by Theorem 2.14.
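The verification step of Section 2.5.3 can be prototyped with off-the-shelf convex optimization tools. The sketch below is ours: it substitutes a single nuclear-norm solve for the full re-weighted heuristic of [94], and the function name detect_E2 is our own; it approximately tests for the event $\mathcal{E}_2$ of (2.39b).

```python
import cvxpy as cp
import numpy as np

def detect_E2(x, y, mu=0.8):
    """Approximate test for event E_2 in (2.39b): does some X in N(S, 2)
    lie within Frobenius distance mu of M = x y^T (normalized)?

    A single nuclear-norm solve is used here as a simplified stand-in
    for the re-weighted nuclear norm heuristic of [94].
    """
    m, n = len(x), len(y)
    M = np.outer(x, y)
    M /= np.linalg.norm(M, "fro")
    X = cp.Variable((m, n))
    constraints = [cp.norm(X - M, "fro") <= mu]
    # S(X) = 0: every anti-diagonal of X must sum to zero.
    for k in range(m + n - 1):
        idx = [(i, k - i) for i in range(max(0, k - n + 1), min(m, k + 1))]
        constraints.append(cp.sum(cp.hstack([X[i, j] for i, j in idx])) == 0)
    cp.Problem(cp.Minimize(cp.normNuc(X)), constraints).solve()
    # Declare E_2 if the relaxed solution has (numerical) rank two.
    svals = np.linalg.svd(X.value, compute_uv=False)
    return np.sum(svals > 1e-6 * svals[0]) == 2

rng = np.random.default_rng(2)
m, n = 4, 8
print(detect_E2(rng.standard_normal(m), rng.standard_normal(n)))
```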
2.6 Conclusions

Bilinear transformations occur in a number of signal processing problems like linear and circular convolution, matrix products, linear mixing of multiple sources, etc. Identifiability and signal reconstruction for the corresponding inverse problems are important in practice, and identifiability is a precursor to establishing any form of reconstruction guarantee. In the present chapter, we determined a series of sufficient conditions for identifiability in conic prior constrained Bilinear Inverse Problems (BIPs) and investigated the probability of achieving those conditions under three classes of random input signal ensembles, viz. dependent but uncorrelated, independent Gaussian, and independent Bernoulli. The theory is unified in the sense that it is applicable to all BIPs, and is specifically developed for bilinear maps over vector pairs with non-trivial rank-two null space. Universal identifiability is absent for many interesting and important BIPs owing to the non-triviality of the rank-two null space, but a deterministic characterization of input instance identifiability is still possible (though it may be hard to check). Our probabilistic results were formulated as scaling laws that trade off the probability of identifiability against the complexity of the restricted rank-two null space of the bilinear map in question, and results were derived for three different levels of complexity, viz. small (polynomial in the signal dimension), large (exponential in the signal dimension) and infinite. In each case, identifiability can hold with high probability depending on the relative geometry of the null space of the bilinear map and the signal space. Overall, most random input instances are identifiable, with the probability of identifiability scaling inversely with the complexity of the rank-two null space of the bilinear map. An especially appealing aspect of our approach is that the rank-two null space can be partly or fully characterized for many bilinear problems of interest. We demonstrated this by partly characterizing the rank-two null space of the linear convolution map, and presented numerical verification of the derived scaling laws on examples based on variations of the blind deconvolution problem, exploiting the representation of its rank-two null space. Overall, the results in this chapter indicate that lifting is a powerful technique for identifiability analysis of general cone constrained BIPs.

2.7 Proofs

2.7.1 Proof of Theorem 2.1

1. Suppose that $(x_0,y_0)\in\mathcal{K}$ is a solution to Problem $(P_2)$ for a given observation $z=z_0=S(x_0,y_0)$. Setting $W_0=x_0y_0^T$ and using (2.8) we have $W_0\in\mathcal{K}'$. Using (2.11) we get $\mathcal{S}(W_0)=S(x_0,y_0)=z_0$ and $\operatorname{rank}(W_0)=\operatorname{rank}(x_0y_0^T)\le 1$. Thus, $W_0$ is a feasible point of Problem $(P_4)$ with rank at most one. As there exists a rank $\le 1$ matrix $W$ satisfying $\mathcal{S}(W)=z$ and $W\in\mathcal{K}'$, the solution of Problem $(P_4)$ must be of rank one or less.

2. Any $W\in\mathcal{K}'_{\mathrm{opt}}\subseteq\mathcal{K}'$ satisfies $\operatorname{rank}(W)\le 1$ (see the proof of the first part) and $\mathcal{S}(W)=z$. Thus,
\[
\begin{aligned}
\mathcal{K}'_{\mathrm{opt}} &\subseteq \mathcal{K}'\cap\{W\in\mathbb{R}^{m\times n} \mid \operatorname{rank}(W)\le 1\}\cap\{W\in\mathbb{R}^{m\times n} \mid \mathcal{S}(W)=z\} && \text{(2.42a)} \\
&= \{xy^T \mid (x,y)\in\mathcal{K}\}\cap\{W \mid \mathcal{S}(W)=z\} && \text{(2.42b)} \\
&= \{xy^T \mid (x,y)\in\mathcal{K}\}\cap\{xy^T \mid \mathcal{S}(xy^T)=z\} = \{xy^T \mid (x,y)\in\mathcal{K}\}\cap\{xy^T \mid S(x,y)=z\} && \text{(2.42c)} \\
&= \{xy^T \mid (x,y)\in\mathcal{K}_{\mathrm{opt}}\}, && \text{(2.42d)}
\end{aligned}
\]
where (2.42b) is due to (2.8), (2.42c) is due to (2.11), and (2.42d) is true because Problem $(P_2)$ is a feasibility problem.

3. The feasible set for Problem $(P_4)$ is $\mathcal{K}'\cap\{W\in\mathbb{R}^{m\times n} \mid \mathcal{S}(W)=z\}$. From the proof of the second part, we know that
\[
\mathcal{K}'\cap\{W \mid \operatorname{rank}(W)\le 1\}\cap\{W \mid \mathcal{S}(W)=z\} = \{xy^T \mid (x,y)\in\mathcal{K}_{\mathrm{opt}}\}. \tag{2.43}
\]
Thus, clearly
\[
\{xy^T \mid (x,y)\in\mathcal{K}_{\mathrm{opt}}\} \subseteq \mathcal{K}'\cap\{W \mid \mathcal{S}(W)=z\}. \tag{2.44}
\]
We shall prove the contrapositive statement in each direction. First assume that $\{0\}\subsetneq\{xy^T \mid (x,y)\in\mathcal{K}_{\mathrm{opt}}\}$. By (2.44), $0$ is a feasible point for Problem $(P_4)$ and thus $\operatorname{rank}(0)=0$ is the optimal value for this problem. Since every $W\in\mathbb{R}^{m\times n}\setminus\{0\}$ has rank strictly greater than zero, we conclude that $\mathcal{K}'_{\mathrm{opt}}=\{0\}\ne\{xy^T \mid (x,y)\in\mathcal{K}_{\mathrm{opt}}\}$. Conversely, suppose that $\mathcal{K}'_{\mathrm{opt}}\ne\{xy^T \mid (x,y)\in\mathcal{K}_{\mathrm{opt}}\}$. Since $\mathcal{K}'_{\mathrm{opt}}\subseteq\{xy^T \mid (x,y)\in\mathcal{K}_{\mathrm{opt}}\}$ (see the proof of the second part), $\exists(x_0,y_0)\in\mathcal{K}_{\mathrm{opt}}$ such that $x_0y_0^T\notin\mathcal{K}'_{\mathrm{opt}}$. By (2.44), $x_0y_0^T$ is a feasible point for Problem $(P_4)$ and hence the optimal value of this problem is strictly less than $\operatorname{rank}(x_0y_0^T)\le 1$. The only way for this to be possible is to have $\operatorname{rank}(x_0y_0^T)=1$ and the optimal value of Problem $(P_4)$ equal to zero. Since the only matrix of rank zero is the all-zero matrix, we conclude that $\{0\}=\mathcal{K}'_{\mathrm{opt}}\subsetneq\{xy^T \mid (x,y)\in\mathcal{K}_{\mathrm{opt}}\}$.

2.7.2 Proof of Corollary 2.2

From (2.3), (2.42a) and (2.42d) we have
\[
\mathcal{K}'\cap\mathcal{N}(\mathcal{S},1) = \{xy^T \mid (x,y)\in\mathcal{K}_{\mathrm{opt}}(0)\}. \tag{2.45}
\]
We shall prove the contrapositive statements. First assume that $\{0\}\subsetneq\mathcal{K}'\cap\mathcal{N}(\mathcal{S},1)$. Using (2.45), we have $\{0\}\subsetneq\{xy^T \mid (x,y)\in\mathcal{K}_{\mathrm{opt}}(0)\}$, and the last part of Theorem 2.1 implies that $\mathcal{K}'_{\mathrm{opt}}(0)\ne\{xy^T \mid (x,y)\in\mathcal{K}_{\mathrm{opt}}(0)\}$. Since $\mathcal{K}_{\mathrm{opt}}(0)$ is nonempty, $0\in\{S(x,y) \mid (x,y)\in\mathcal{K}\}$. Thus, Problems $(P_2)$ and $(P_4)$ are not equivalent (equivalence fails for $z=0$).
Conversely, suppose that $\exists z\in\{S(x,y) \mid (x,y)\in\mathcal{K}\}$ resulting in $\mathcal{K}'_{\mathrm{opt}}(z)\ne\{xy^T \mid (x,y)\in\mathcal{K}_{\mathrm{opt}}(z)\}$. Using the last part of Theorem 2.1, we have $\{0\}\subsetneq\{xy^T \mid (x,y)\in\mathcal{K}_{\mathrm{opt}}(z)\}$, which is possible only if $z=\mathcal{S}(0)=0$. Now using (2.45) we get $\{0\}\subsetneq\mathcal{K}'\cap\mathcal{N}(\mathcal{S},1)$.

2.7.3 Proof of Proposition 2.3

Problem $(P_4)$ fails if and only if it admits more than one optimal solution. Let $\mathcal{N}(\mathcal{S},2)\cap\mathcal{M}=\{0\}$ and, for the sake of contradiction, suppose that $W_1\in\mathcal{K}'$ and $W_2\in\mathcal{K}'$ denote two solutions to Problem $(P_4)$ for some observation $z$, so that $(W_1-W_2)\in\mathcal{M}$. Then $\mathcal{S}(W_1)=\mathcal{S}(W_2)$, so that $(W_1-W_2)$ is in the null space of $\mathcal{S}$. But $\operatorname{rank}(W_1-W_2)\le\operatorname{rank}(W_1)+\operatorname{rank}(W_2)\le 2$, so that we have $W_1-W_2=0$ and Problem $(P_4)$ has a unique solution. Conversely, let Problem $(P_4)$ have a unique solution for every observation $z=S(x,y)$. For the sake of contradiction, suppose that there is a matrix $Y\in\mathcal{N}(\mathcal{S},2)\cap\mathcal{M}\setminus\{0\}$. Since $Y\in\mathcal{M}$, $\exists Y_1,Y_2\in\mathcal{K}'$ such that $Y=Y_1-Y_2$. Further, $Y\ne 0$ is in the null space of $\mathcal{S}$, so that $z=\mathcal{S}(Y_1)=\mathcal{S}(Y_2)$ with $Y_1\ne Y_2$, implying that $Y_1$ and $Y_2$ are both valid solutions to Problem $(P_4)$ for the observation $z$. Since $\mathcal{K}'=\{xy^T \mid (x,y)\in\mathcal{K}\}$, $\exists(x_1,y_1),(x_2,y_2)\in\mathcal{K}$ such that $S(x_1,y_1)=\mathcal{S}(Y_1)$ and $S(x_2,y_2)=\mathcal{S}(Y_2)$, so that $z=\mathcal{S}(Y_1)=\mathcal{S}(Y_2)$ is a valid observation. This violates the unique solution assumption on Problem $(P_4)$ for the valid observation $z$. Hence $\mathcal{N}(\mathcal{S},2)\cap\mathcal{M}=\{0\}$, completing the proof.

2.7.4 Proof of Theorem 2.4

Let $M_*\in\mathcal{K}'$ be a solution to Problem $(P_4)$ such that $M_*\ne M$. Since $M$ is a valid solution to Problem $(P_4)$, we have $\operatorname{rank}(M_*)=\operatorname{rank}(M)=1$ and $X=M-M_*\in\mathcal{N}(\mathcal{S},2)\cap\mathcal{M}\setminus\{0\}$. If $M_*=\sigma_*u_*v_*^T$, then $\mathcal{R}(X)=\operatorname{Span}(v,v_*)$ and $\mathcal{C}(X)=\operatorname{Span}(u,u_*)$. This contradicts the assumption that at least one of $u\notin\mathcal{C}(X)$ or $v\notin\mathcal{R}(X)$ is true, and completes the proof.

2.7.5 Proof of Corollary 2.5

We start with the "if" part. For $M$ to be identifiable, we need $\operatorname{rank}(M-X)>1$ for every matrix $X\ne 0$ in the null space of $\mathcal{S}(\cdot)$ that also satisfies $M-X\in\mathcal{K}'$. Since $\operatorname{rank}(M-X)\ge\operatorname{rank}(X)-\operatorname{rank}(M)=\operatorname{rank}(X)-1$, it is sufficient to consider matrices $X$ with $\operatorname{rank}(X)\le 2$. Thus, for identifiability of $M$, we need $\operatorname{rank}(M-X)>1$, $\forall X\in\mathcal{N}(\mathcal{S},2)\cap(M-\mathcal{K}')\setminus\{0\}$. Using $\mathcal{N}(\mathcal{S},1)\cap\mathcal{M}=\{0\}$ and $X\in\mathcal{N}(\mathcal{S},2)\cap(M-\mathcal{K}')\setminus\{0\}$, we have $u\in\mathcal{C}(X)$ and $v\in\mathcal{R}(X)$, and by assumption we have $\sigma_1(X)=\sigma_2(X)$. Let $X=\sigma_*u_1v_1^T+\sigma_*u_2v_2^T$ and $u=\alpha_1u_1+\alpha_2u_2$, $v=\alpha_3v_1+\alpha_4v_2$ for some $\alpha_1,\alpha_2,\alpha_3,\alpha_4\in\mathbb{R}$ with $\alpha_1^2+\alpha_2^2=\alpha_3^2+\alpha_4^2=1$. It is easy to check that $X$ has the following equivalent singular value decompositions:
\[
X = \sigma_*u_1v_1^T+\sigma_*u_2v_2^T = \sigma_*(\alpha_1u_1+\alpha_2u_2)(\alpha_1v_1+\alpha_2v_2)^T + \sigma_*(\alpha_2u_1-\alpha_1u_2)(\alpha_2v_1-\alpha_1v_2)^T. \tag{2.46}
\]
Using the representations for $u$ and $v$, we have
\[
M-X = -\sigma_*(\alpha_2u_1-\alpha_1u_2)(\alpha_2v_1-\alpha_1v_2)^T + (\alpha_1u_1+\alpha_2u_2)\big[(\sigma\alpha_3-\sigma_*\alpha_1)v_1+(\sigma\alpha_4-\sigma_*\alpha_2)v_2\big]^T. \tag{2.47}
\]
As the column vectors $u=\alpha_1u_1+\alpha_2u_2$ and $u'=\alpha_2u_1-\alpha_1u_2$ on the right hand side of (2.47) are linearly independent, $\operatorname{rank}(M-X)=1$ is possible if and only if every column of $M-X$ combines $u$ and $u'$ in the same ratio. This means that the row vectors on the r.h.s. of (2.47) are scalar multiples of each other. Thus, for $\operatorname{rank}(M-X)=1$ it is necessary that
\[
\frac{\sigma\alpha_3-\sigma_*\alpha_1}{\alpha_2} = \frac{\sigma\alpha_4-\sigma_*\alpha_2}{-\alpha_1} \tag{2.48a}
\]
or equivalently,
\[
\sigma\alpha_1\alpha_3+\sigma\alpha_2\alpha_4 = \sigma_*\alpha_1^2+\sigma_*\alpha_2^2 = \sigma_*, \tag{2.48b}
\]
which is not possible unless
\[
\alpha_1\alpha_3+\alpha_2\alpha_4 = \frac{\sigma_*}{\sigma} > 0. \tag{2.48c}
\]
So, $\alpha_1\alpha_3+\alpha_2\alpha_4\le 0$ implies that $\operatorname{rank}(M-X)>1$. As $X\in\mathcal{N}(\mathcal{S},2)\cap(M-\mathcal{K}')\setminus\{0\}$ is arbitrary, $M$ is identifiable by Problem $(P_4)$.

Next we prove the "only if" part.
Let $M$ be identifiable and $X\in\mathcal{N}(\mathcal{S},2)\cap(M-\mathcal{K}')\setminus\{0\}$, so that $M-X$ is feasible for Problem $(P_4)$. As before, we have $u\in\mathcal{C}(X)$, $v\in\mathcal{R}(X)$ and $\sigma_1(X)=\sigma_2(X)$. If $X=\sigma_*u_1v_1^T+\sigma_*u_2v_2^T$, then $u=\alpha_1u_1+\alpha_2u_2$, $v=\alpha_3v_1+\alpha_4v_2$ for some $\alpha_1,\alpha_2,\alpha_3,\alpha_4\in\mathbb{R}$ with $\alpha_1^2+\alpha_2^2=\alpha_3^2+\alpha_4^2=1$. It is simple to check that (2.46) and (2.47) are valid. We shall now assume $\alpha_1\alpha_3+\alpha_2\alpha_4>0$ and arrive at a contradiction. Since multiplying a matrix by a nonzero scalar does not change its row or column space and scales every nonzero singular value in the same ratio, we can take $\sigma_*=\sigma(\alpha_1\alpha_3+\alpha_2\alpha_4)$ without violating any assumptions on $X$. Thus, $\alpha_1\alpha_3+\alpha_2\alpha_4=\sigma_*/\sigma$ and we have (2.48c) $\Rightarrow$ (2.48b) $\Rightarrow$ (2.48a) $\Rightarrow$ $\operatorname{rank}(M-X)=1$ (the last implication is due to (2.47)), thus contradicting the identifiability of $M$.

2.7.6 Proof of Lemma 2.6

Using assumption (A1), we have
\[
\mathbb{E}\big[\|x\|_2^2\big] = \mathbb{E}\big[x^Tx\big] = \mathbb{E}\big[\operatorname{Tr}(xx^T)\big] = \operatorname{Tr}\big(\mathbb{E}\big[xx^T\big]\big) = \operatorname{Tr}(I) = m. \tag{2.49}
\]
Hence,
\[
\begin{aligned}
\mathbb{E}\Big[\big\|P_{\mathcal{C}(X)}u\big\|_2^2\Big] &= \tfrac1m\,\mathbb{E}\big[\|x\|_2^2\big]\,\mathbb{E}\Big[\big\|P_{\mathcal{C}(X)}u\big\|_2^2\Big] && \text{(2.50a)} \\
&= \tfrac1m\,\mathbb{E}\Big[\|x\|_2^2\,\big\|P_{\mathcal{C}(X)}u\big\|_2^2\Big] && \text{(2.50b)} \\
&= \tfrac1m\,\mathbb{E}\Big[\big\|P_{\mathcal{C}(X)}x\big\|_2^2\Big] && \text{(2.50c)} \\
&= \tfrac1m\,\mathbb{E}\big[x^TP_{\mathcal{C}(X)}x\big] && \text{(2.50d)} \\
&= \tfrac1m\,\mathbb{E}\big[\operatorname{Tr}\big(P_{\mathcal{C}(X)}xx^T\big)\big] = \tfrac1m\operatorname{Tr}\big(P_{\mathcal{C}(X)}\mathbb{E}\big[xx^T\big]\big) && \text{(2.50e)} \\
&= \tfrac1m\operatorname{Tr}\big(P_{\mathcal{C}(X)}I\big) && \text{(2.50f)} \\
&\le \tfrac2m && \text{(2.50g)}
\end{aligned}
\]
where (2.50a) follows from (2.49), (2.50b) and (2.50c) are true because $u=x/\|x\|_2$ and assumption (A3) implies independence of $\|x\|_2$ and $u$, (2.50d) is true since $P^2=P$ for any projection matrix $P$, (2.50e) is true since the expectation operator commutes with the trace and projection operators, (2.50f) follows from assumption (A1), and (2.50g) is true since $X\in\mathcal{N}(\mathcal{S},2)\cap\mathcal{M}\setminus\{0\}$ is a matrix of rank at most two.

Finally, applying the Markov inequality to the non-negative random variable $\|P_{\mathcal{C}(X)}u\|_2^2$ and using the computed estimate of $\mathbb{E}\big[\|P_{\mathcal{C}(X)}u\|_2^2\big]$ from (2.50) gives
\[
\Pr\Big(\big\|P_{\mathcal{C}(X)}u\big\|_2^2 \ge 1-\delta\Big) \le \frac{\mathbb{E}\big[\|P_{\mathcal{C}(X)}u\|_2^2\big]}{1-\delta} = \frac{2}{m(1-\delta)}. \tag{2.51}
\]
We have thus established (2.14a). Using the exact same sequence of steps for the random vector $v$ gives the bound in (2.14b).

2.7.7 Proof of Lemma 2.7

Notice that (2.50d)-(2.50g) in the proof of Lemma 2.6 in Section 2.7.6 do not use assumption (A3). Hence, reusing the same arguments we get
\[
\mathbb{E}\Big[\big\|P_{\mathcal{C}(X)}x\big\|_2^2\Big] = \mathbb{E}\big[x^TP_{\mathcal{C}(X)}x\big] \le 2. \tag{2.52}
\]
Thus, we have
\[
\begin{aligned}
\Pr\Big(\big\|P_{\mathcal{C}(X)}u\big\|_2^2 \ge 1-\delta\Big) &= \Pr\Big(\big\|P_{\mathcal{C}(X)}x\big\|_2^2 \ge (1-\delta)\|x\|_2^2\Big) && \text{(2.53a)} \\
&\le \Pr\Big(\big\|P_{\mathcal{C}(X)}x\big\|_2^2 \ge (1-\delta)r_x^2\Big) && \text{(2.53b)} \\
&\le \frac{\mathbb{E}\big[\|P_{\mathcal{C}(X)}x\|_2^2\big]}{r_x^2(1-\delta)} && \text{(2.53c)} \\
&\le \frac{2}{r_x^2(1-\delta)} && \text{(2.53d)}
\end{aligned}
\]
where (2.53a) is true since $u=x/\|x\|_2$, (2.53b) holds because of assumption (A4), (2.53c) follows from applying the Markov inequality to the non-negative random variable $\|P_{\mathcal{C}(X)}x\|_2^2$, and (2.53d) follows from (2.52). Thus, the derivation (2.53) establishes (2.15a). Using the same sequence of steps for the random vector $v$ gives the bound in (2.15b).
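The isotropy computation (2.50) underlying Lemma 2.6 is easy to check numerically; the following sketch (ours, using a Gaussian draw as one distribution satisfying assumptions (A1) and (A3)) verifies that $\mathbb{E}\big[\|P_{\mathcal{C}(X)}u\|_2^2\big]=2/m$ for a fixed two-dimensional subspace.

```python
import numpy as np

# Numerical check (not from the dissertation): for the isotropy-based
# estimate (2.50), a unit vector u built as in assumptions (A1)/(A3)
# satisfies E||P_C u||^2 = Tr(P_C)/m = 2/m for any fixed
# two-dimensional subspace C.
rng = np.random.default_rng(3)
m, trials = 40, 200_000
C, _ = np.linalg.qr(rng.standard_normal((m, 2)))  # orthonormal basis of C
x = rng.standard_normal((trials, m))              # isotropic, E[xx^T] = I
u = x / np.linalg.norm(x, axis=1, keepdims=True)
proj_sq = (u @ C) ** 2                            # squared coefficients in C
print(proj_sq.sum(axis=1).mean(), "vs", 2 / m)
```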
2.7.8 Proof of Theorem 2.10

For any constant $\delta\in(0,1)$, let $\mathcal{A}(\delta)$ denote the event that $\exists X\in\mathcal{N}(\mathcal{S},2)\cap\mathcal{M}\setminus\{0\}$ satisfying both $\|u-P_{\mathcal{C}(X)}u\|_2^2\le\delta$ and $\|v-P_{\mathcal{R}(X)}v\|_2^2\le\delta$. We note that $\mathcal{A}(\delta)$ constitutes a non-decreasing sequence of sets as $\delta$ increases. Hence, using continuity of the probability measure from above, we have
\[
\Pr(\mathcal{A}(0)) \le \Pr(\mathcal{A}(\delta)) \tag{2.54a}
\]
for any $\delta\in(0,1)$, and
\[
\Pr(\mathcal{A}(0)) = \lim_{\delta\to 0}\Pr(\mathcal{A}(\delta)). \tag{2.54b}
\]
Note that $\mathcal{A}(0)$ denotes the event that $\exists X\in\mathcal{N}(\mathcal{S},2)\cap\mathcal{M}\setminus\{0\}$ satisfying both $u\in\mathcal{C}(X)$ and $v\in\mathcal{R}(X)$, which is a "hard" event. The event $\mathcal{A}(0)^c$ corresponds precisely to the sufficient conditions of Theorem 2.4. Hence, it is sufficient to obtain an appropriate lower bound on $\Pr(\mathcal{A}(0)^c)$ to make our desired statement. Drawing inspiration from (2.54a) and (2.54b), we shall upper bound $\Pr(\mathcal{A}(0))$ by $\Pr(\mathcal{A}(\delta))$.

For any given $X\in\mathcal{N}(\mathcal{S},2)\cap\mathcal{M}\setminus\{0\}$ we have
\[
\begin{aligned}
\Pr\Big(\big\|u-P_{\mathcal{C}(X)}u\big\|_2^2\le\delta,\ \big\|v-P_{\mathcal{R}(X)}v\big\|_2^2\le\delta\Big) &= \Pr\Big(\|u\|_2^2-\big\|P_{\mathcal{C}(X)}u\big\|_2^2\le\delta,\ \|v\|_2^2-\big\|P_{\mathcal{R}(X)}v\big\|_2^2\le\delta\Big) && \text{(2.55a)} \\
&= \Pr\Big(\big\|P_{\mathcal{C}(X)}u\big\|_2^2\ge 1-\delta,\ \big\|P_{\mathcal{R}(X)}v\big\|_2^2\ge 1-\delta\Big) && \text{(2.55b)} \\
&= \Pr\Big(\big\|P_{\mathcal{C}(X)}u\big\|_2^2\ge 1-\delta\Big)\Pr\Big(\big\|P_{\mathcal{R}(X)}v\big\|_2^2\ge 1-\delta\Big) && \text{(2.55c)} \\
&\le \frac{4}{mn(1-\delta)^2} && \text{(2.55d)}
\end{aligned}
\]
where (2.55a) is true because $I-P_{\mathcal{C}(X)}$ (respectively $I-P_{\mathcal{R}(X)}$) is the orthogonal projection matrix onto the orthogonal complement of $\mathcal{C}(X)$ (respectively $\mathcal{R}(X)$), (2.55b) is true because we have $\|u\|_2=\|v\|_2=1$, (2.55c) is true by independence of $u$ and $v$, and (2.55d) comes from applying Lemma 2.6. Next we employ union bounding over all $X\in\mathcal{N}(\mathcal{S},2)\cap\mathcal{M}\setminus\{0\}$ representing distinct pairs of column and row subspaces $(\mathcal{C}(X),\mathcal{R}(X))$ to upper bound $\Pr(\mathcal{A}(\delta))$. We denote the number of these distinct pairs $(\mathcal{C}(X),\mathcal{R}(X))$ over $X\in\mathcal{N}(\mathcal{S},2)\cap\mathcal{M}\setminus\{0\}$ by $f_{\mathcal{S},\mathcal{M}}(m,n)$. Finally, using (2.55) we have
\[
\Pr(\mathcal{A}(\delta)) \le \sum_{(\mathcal{C}(X),\mathcal{R}(X))} \Pr\Big(\big\|P_{\mathcal{C}(X)^\perp}u\big\|_2^2\le\delta,\ \big\|P_{\mathcal{R}(X)^\perp}v\big\|_2^2\le\delta\Big) \tag{2.56a}
\]
\[
= f_{\mathcal{S},\mathcal{M}}(m,n)\,\Pr\Big(\big\|P_{\mathcal{C}(X)^\perp}u\big\|_2^2\le\delta,\ \big\|P_{\mathcal{R}(X)^\perp}v\big\|_2^2\le\delta\Big) \le \frac{4f_{\mathcal{S},\mathcal{M}}(m,n)}{mn(1-\delta)^2} \tag{2.56b}
\]
where (2.56a) is a union bounding step. Hence,
\[
\begin{aligned}
\Pr(\mathcal{A}(0)^c) = 1-\Pr(\mathcal{A}(0)) &\ge 1-\Pr(\mathcal{A}(\delta)) && \text{(2.57a)} \\
&\ge 1-\frac{4f_{\mathcal{S},\mathcal{M}}(m,n)}{mn(1-\delta)^2} && \text{(2.57b)} \\
&\ge 1-\frac{4f_{\mathcal{S},\mathcal{M}}(m,n)}{mn(1-\delta')} && \text{(2.57c)}
\end{aligned}
\]
where (2.57a) is from (2.54a), (2.57b) is from (2.56), and $\delta'=1-(1-\delta)^2\in(0,1)$.

2.7.9 Proof of Corollary 2.11

The proof is essentially identical to that of Theorem 2.10 in Section 2.7.8, with one important difference: we use Lemma 2.7 instead of Lemma 2.6 when bounding the right hand side of (2.55c). This gives us the bound
\[
\Pr\Big(\big\|u-P_{\mathcal{C}(X)}u\big\|_2^2\le\delta,\ \big\|v-P_{\mathcal{R}(X)}v\big\|_2^2\le\delta\Big) \le \frac{4}{r_x^2(m)\,r_y^2(n)\,(1-\delta)^2} \tag{2.58}
\]
which leads to the bound
\[
\Pr(\mathcal{A}(\delta)) \le \frac{4f_{\mathcal{S},\mathcal{M}}(m,n)}{r_x^2(m)\,r_y^2(n)\,(1-\delta)^2} \tag{2.59}
\]
in place of (2.56b). Finally,
\[
\begin{aligned}
\Pr(\mathcal{A}(0)^c) = 1-\Pr(\mathcal{A}(0)) &\ge 1-\Pr(\mathcal{A}(\delta)) && \text{(2.60a)} \\
&\ge 1-\frac{4f_{\mathcal{S},\mathcal{M}}(m,n)}{r_x^2(m)\,r_y^2(n)\,(1-\delta)^2} && \text{(2.60b)} \\
&\ge 1-\frac{4f_{\mathcal{S},\mathcal{M}}(m,n)}{r_x^2(m)\,r_y^2(n)\,(1-\delta')} && \text{(2.60c)}
\end{aligned}
\]
where (2.60a) is from (2.54a), (2.60b) is from (2.59), and $\delta'=1-(1-\delta)^2\in(0,1)$.

2.7.10 Proof of Lemma 2.8

This is a Chernoff-type bound. We set
\[
Y = \big\|P_{\mathcal{C}(X)}x\big\|_2^2-(1-\delta)\|x\|_2^2 = \delta\big\|P_{\mathcal{C}(X)}x\big\|_2^2-(1-\delta)\big\|P_{\mathcal{C}(X)^\perp}x\big\|_2^2 \tag{2.61}
\]
and compute the bound
\[
\Pr(Y\ge 0) \le \mathbb{E}[\exp(tY)] \tag{2.62}
\]
which holds for all values of the parameter $t\ge 0$ for which the right hand side of (2.62) exists. Using properties of Gaussian random vectors under linear transforms, $P_{\mathcal{C}(X)}x$ and $P_{\mathcal{C}(X)^\perp}x$ are statistically independent Gaussian random vectors, implying
\[
\mathbb{E}\Big[\exp\Big(t\delta\big\|P_{\mathcal{C}(X)}x\big\|_2^2-t(1-\delta)\big\|P_{\mathcal{C}(X)^\perp}x\big\|_2^2\Big)\Big] = \mathbb{E}[\exp(t\delta Z_1)]\cdot\mathbb{E}[\exp(-t(1-\delta)Z_2)] \tag{2.63}
\]
with
\[
Z_1 = \big\|P_{\mathcal{C}(X)}x\big\|_2^2 \quad\text{and}\quad Z_2 = \big\|P_{\mathcal{C}(X)^\perp}x\big\|_2^2. \tag{2.64}
\]
Since $\mathcal{N}(\mathcal{S},1)\cap\mathcal{M}=\{0\}$, both $\mathcal{C}(X)$ and $\mathcal{R}(X)$ are two dimensional spaces. On rotating coordinates to the basis given by $\{\mathcal{C}(X),\mathcal{C}(X)^\perp\}$, it can be seen that $Z_1$ is the sum of squares of two i.i.d. standard Gaussian random variables and hence has a $\chi^2$ distribution with two DoF. By the same argument, $Z_2$ is a $\chi^2$ distributed random variable with $(m-2)$ DoF. Recall that the moment generating function of a $\chi^2$ distributed random variable $Z$ with $k$ DoF is given by
\[
\mathbb{E}[\exp(tZ)] = (1-2t)^{-k/2}, \qquad \forall t<1/2. \tag{2.65}
\]
Using (2.61), (2.62), (2.63) and (2.65) we have the bound
\[
\Pr(Y\ge 0) \le (1-2t\delta)^{-1}\big(1+2t(1-\delta)\big)^{-(m-2)/2} = \exp\Big(-\frac{m-2}{2}\log\big(1+2t(1-\delta)\big)-\log(1-2t\delta)\Big) \tag{2.66}
\]
which can be optimized over $t$. It can be verified by differentiation that the best bound is obtained for
\[
t_* = \frac{m-2}{2\delta m}-\frac{1}{m(1-\delta)}. \tag{2.67}
\]
Plugging this value of $t$ into (2.66) and using (2.61) we get the desired result.
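The bound (2.66) evaluated at $t_*$ from (2.67) can be sanity-checked by simulation; the following sketch (ours, not from the dissertation) compares the empirical probability $\Pr(Y\ge 0)$ against the Chernoff bound for a representative choice of $m$ and $\delta$.

```python
import numpy as np

# Monte Carlo check (not from the dissertation) of the Chernoff bound in
# the proof of Lemma 2.8: with Z1 ~ chi^2_2, Z2 ~ chi^2_{m-2} and
# Y = delta*Z1 - (1-delta)*Z2, the bound at t* from (2.67) should
# dominate the empirical Pr(Y >= 0).
rng = np.random.default_rng(4)
m, delta, trials = 10, 0.5, 1_000_000
Z1 = rng.chisquare(2, trials)
Z2 = rng.chisquare(m - 2, trials)
emp = np.mean(delta * Z1 - (1 - delta) * Z2 >= 0)
t = (m - 2) / (2 * delta * m) - 1 / (m * (1 - delta))          # (2.67)
bound = (1 - 2 * t * delta) ** -1 * (1 + 2 * t * (1 - delta)) ** (-(m - 2) / 2)
print(f"empirical {emp:.4f} <= bound {bound:.4f}")
```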
2.7.11 Proof of Lemma 2.9

This is also a Chernoff-type bound. Although the final results of Lemmas 2.8 and 2.9 look quite similar, we cannot reuse the manipulations of Section 2.7.10 for this proof and proceed by a slightly different route (also applicable to other subgaussian distributions), since the symmetric Bernoulli distribution does not share the rotational invariance property of the multivariate standard normal distribution. Let $\{c_1,c_2\}$ denote an orthonormal basis for $\mathcal{C}(X)$ and set
\[
Y = \big\|P_{\mathcal{C}(X)}x\big\|_2^2-(1-\delta)\|x\|_2^2. \tag{2.68}
\]
Notice that $\|x\|_2=\sqrt m$, so we have
\[
\begin{aligned}
\Pr(Y\ge 0) &= \Pr\Big(\big\|P_{\mathcal{C}(X)}x\big\|_2^2\ge m(1-\delta)\Big) = \Pr\Big(|\langle c_1,x\rangle|^2+|\langle c_2,x\rangle|^2\ge m(1-\delta)\Big) \\
&\le \Pr\Big(\bigcup_{j=1,2}\big\{|\langle c_j,x\rangle|^2\ge\tfrac m2(1-\delta)\big\}\Big) && \text{(2.69a)} \\
&\le 2\Pr\Big(|\langle c,x\rangle|^2\ge\tfrac m2(1-\delta)\Big) && \text{(2.69b)} \\
&= 2\Pr\Big(|\langle c,x\rangle|\ge\sqrt{\tfrac m2(1-\delta)}\Big) = 4\Pr\Big(\langle c,x\rangle\ge\sqrt{\tfrac m2(1-\delta)}\Big) && \text{(2.69c)} \\
&\le 4\exp\Big(\frac{t^2}{2}-t\sqrt{\tfrac m2(1-\delta)}\Big) && \text{(2.69d)}
\end{aligned}
\]
where (2.69a) and (2.69b) utilize elementary union bounds, $c$ is a generic unit vector, (2.69c) uses the symmetry of the distribution of $x$ about the origin, and (2.69d) is the Chernoff bounding step that utilizes the following computation:
\[
\begin{aligned}
\mathbb{E}[\exp(t\langle c,x\rangle)] &= \mathbb{E}\Big[\exp\Big(\sum_{j=1}^m tx_jc_j\Big)\Big] = \mathbb{E}\Big[\prod_{j=1}^m\exp(tx_jc_j)\Big] = \prod_{j=1}^m\mathbb{E}[\exp(tx_jc_j)] && \text{(2.70a)} \\
&= \prod_{j=1}^m\frac{e^{tc_j}+e^{-tc_j}}{2} && \text{(2.70b)} \\
&= \prod_{j=1}^m\sum_{k=0}^\infty\frac{(tc_j)^{2k}}{(2k)!} && \text{(2.70c)} \\
&\le \prod_{j=1}^m\sum_{k=0}^\infty\frac{(tc_j)^{2k}}{2^kk!} && \text{(2.70d)} \\
&= \prod_{j=1}^m\exp\big(t^2c_j^2/2\big) = \exp\Big(\frac{t^2}{2}\sum_{j=1}^m c_j^2\Big) = \exp\Big(\frac{t^2}{2}\Big) && \text{(2.70e)}
\end{aligned}
\]
where (2.70a) uses independence of the elements of $x$, (2.70b) is true because each element of $x$ has a symmetric Bernoulli distribution, (2.70c) uses the series expansion of the exponential function, (2.70e) follows from $\|c\|_2=1$, and (2.70d) is due to
\[
(2k)! = 2^k\prod_{r=0}^{k-1}(2r+1) \ge 2^k\prod_{r=0}^{k-1}(r+1) = 2^kk!. \tag{2.71}
\]
The bound in (2.69d) can be optimized over $t$, with the optimum achieved at
\[
t_* = \sqrt{\tfrac m2(1-\delta)}. \tag{2.72}
\]
Plugging this value of $t$ into (2.69d) gives the desired result.

2.7.12 Proof of Lemma 2.12

Consider the norm $\|\cdot\|_{2,\infty}$ on $\mathbb{R}^{m\times 2}$ defined as
\[
\|Y\|_{2,\infty} = \max\{\|y_1\|_2,\|y_2\|_2\} \tag{2.73}
\]
for all $Y=[y_1,y_2]\in\mathbb{R}^{m\times 2}$. It is clear that $D(m)$ is the unit ball $\{Y\in\mathbb{R}^{m\times 2} \mid \|Y\|_{2,\infty}\le 1\}$ of this norm, which is a convex body symmetric about the origin. Hence, using (2.22) we have the metric entropy of $D(m)$ w.r.t. $\epsilon D(m)$ as $2m\log\Theta(1/\epsilon)$. It is clear that $\mathcal{G}(m)=\{Y\in\mathbb{R}^{m\times 2} \mid Y^TY=I\}\subsetneq D(m)$, implying that the metric entropy of $\mathcal{G}(m)$ w.r.t. $\epsilon D(m)$ is $\le 2m\log\Theta(1/\epsilon)$.

Let $Y=[y_1,y_2]$ and $Z=[z_1,z_2]$ be two elements from $\mathcal{G}(m)$ such that $Y-Z\in\epsilon D(m)$, and let $x\in\mathbb{R}^m$ be arbitrary. Then,
\[
\begin{aligned}
\big\|P_{\mathcal{C}(Y)}x\big\|_2 &= \sqrt{|\langle y_1,x\rangle|^2+|\langle y_2,x\rangle|^2} = \sqrt{\sum_{j=1,2}|\langle z_j,x\rangle+\langle y_j-z_j,x\rangle|^2} \\
&\le \sqrt{\sum_{j=1,2}\big(|\langle z_j,x\rangle|+|\langle y_j-z_j,x\rangle|\big)^2} && \text{(2.74a)} \\
&\le \sqrt{\sum_{j=1,2}\big(|\langle z_j,x\rangle|+\epsilon\|x\|_2\big)^2} && \text{(2.74b)} \\
&= \left\|\epsilon\|x\|_2\begin{bmatrix}1\\1\end{bmatrix}+\begin{bmatrix}|\langle z_1,x\rangle|\\|\langle z_2,x\rangle|\end{bmatrix}\right\|_2 \le \sqrt 2\,\epsilon\|x\|_2+\big\|P_{\mathcal{C}(Z)}x\big\|_2 && \text{(2.74c)}
\end{aligned}
\]
where (2.74a) is due to $(x+y)^2\le(|x|+|y|)^2$, $\forall x,y\in\mathbb{R}$, (2.74b) is due to the Cauchy-Schwarz inequality and the bound $\|y_j-z_j\|_2\le\epsilon$, $j=1,2$, as $Y-Z\in\epsilon D(m)$, and (2.74c) is due to the triangle inequality. Since $Y$ and $Z$ are interchangeable in the derivation of (2.74c) and $x$ is arbitrary, we immediately arrive at (2.23).
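The moment generating function estimate (2.70) is the only distribution-specific ingredient in the proof of Lemma 2.9; the short sketch below (ours, not from the dissertation) verifies $\mathbb{E}[\exp(t\langle c,x\rangle)]\le\exp(t^2/2)$ empirically for a symmetric Bernoulli vector $x$ and a random unit vector $c$.

```python
import numpy as np

# Quick numerical check (not from the dissertation) of the subgaussian
# MGF estimate (2.70): for symmetric Bernoulli x and any unit vector c,
# E[exp(t <c, x>)] <= exp(t^2 / 2).
rng = np.random.default_rng(5)
m, trials = 30, 500_000
c = rng.standard_normal(m)
c /= np.linalg.norm(c)
x = rng.choice([-1.0, 1.0], size=(trials, m))
for t in (0.5, 1.0, 2.0):
    emp = np.mean(np.exp(t * (x @ c)))
    print(f"t={t}: E[exp(t<c,x>)] ~ {emp:.4f} <= {np.exp(t * t / 2):.4f}")
```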
2.7.13 Proof of Theorem 2.13

We follow a proof strategy similar to that of Theorem 2.10. For any constant $\delta\in(0,1)$, let $\mathcal{B}_c(\delta)$ (respectively $\mathcal{B}_r(\delta)$) denote the event that $\exists X\in\mathcal{N}(\mathcal{S},2)\cap\mathcal{M}\setminus\{0\}$ satisfying $\|P_{\mathcal{C}(X)}x\|_2^2\ge(1-\delta)\|x\|_2^2$ (respectively $\|P_{\mathcal{R}(X)}y\|_2^2\ge(1-\delta)\|y\|_2^2$), and let $\mathcal{A}(\delta)$ denote the event that $\exists X\in\mathcal{N}(\mathcal{S},2)\cap\mathcal{M}\setminus\{0\}$ satisfying both $\|P_{\mathcal{C}(X)}x\|_2^2\ge(1-\delta)\|x\|_2^2$ and $\|P_{\mathcal{R}(X)}y\|_2^2\ge(1-\delta)\|y\|_2^2$. We note that $\mathcal{A}(\delta)$ constitutes a non-decreasing sequence of sets as $\delta$ increases. Hence, using continuity of the probability measure from above, we have
\[
\Pr(\mathcal{A}(0)) \le \Pr(\mathcal{A}(\delta)) \tag{2.75a}
\]
for any $\delta\in(0,1)$, and
\[
\Pr(\mathcal{A}(0)) = \lim_{\delta\to 0}\Pr(\mathcal{A}(\delta)). \tag{2.75b}
\]
Note that $\mathcal{A}(0)$ denotes the event that $\exists X\in\mathcal{N}(\mathcal{S},2)\cap\mathcal{M}\setminus\{0\}$ satisfying both $x\in\mathcal{C}(X)$ and $y\in\mathcal{R}(X)$, which is a "hard" event. The event $\mathcal{A}(0)^c$ corresponds precisely to the sufficient conditions of Theorem 2.4. Hence, it is sufficient to obtain an appropriate lower bound on $\Pr(\mathcal{A}(0)^c)$, or alternatively, to upper bound $\Pr(\mathcal{A}(0))$ using (2.75a). It is straightforward to see that
\[
\Pr(\mathcal{A}(\delta)) \le \Pr\big(\mathcal{B}_r(\delta)\cap\mathcal{B}_c(\delta)\big) \tag{2.76a}
\]
\[
= \Pr(\mathcal{B}_r(\delta))\Pr(\mathcal{B}_c(\delta)), \tag{2.76b}
\]
where (2.76a) is because $\mathcal{A}(\delta)$ happens only when $\mathcal{B}_r(\delta)$ and $\mathcal{B}_c(\delta)$ are caused by the same matrix $X\in\mathcal{N}(\mathcal{S},2)\cap\mathcal{M}\setminus\{0\}$, and (2.76b) is due to the mutual independence of $x$ and $y$.

We have $x$ and $y$ drawn component-wise i.i.d. from a symmetric Bernoulli distribution. For any given $Y\in\mathcal{N}(\mathcal{S},2)\cap\mathcal{M}\setminus\{0\}$ we have a bound on $\Pr(\|P_{\mathcal{C}(Y)}x\|_2\ge\sqrt{1-\delta}\,\|x\|_2)$ from Lemma 2.9. We focus on the union bounding step to compute $\Pr(\mathcal{B}_c(\delta))$. The proof of Lemma 2.12 assures us that as long as $Y$ and $Z$ are close enough with $Y,Z\in\mathcal{H}_1(m)$, i.e. within the same $\epsilon D(m)$ ball for some $1>\epsilon\ge\epsilon_0>0$, we are guaranteed tight control over $\|P_{\mathcal{C}(Y)}x\|_2-\|P_{\mathcal{C}(Z)}x\|_2$ for any arbitrary $x$. In fact, using (2.74) we have the bound
\[
\Pr\Big(\exists Y\in Z+\epsilon D(m),\ \big\|P_{\mathcal{C}(Y)}x\big\|_2\ge\sqrt{1-\delta}\,\|x\|_2\Big) \le \Pr\Big(\big\|P_{\mathcal{C}(Z)}x\big\|_2\ge\big(\sqrt{1-\delta}-\sqrt 2\,\epsilon\big)\|x\|_2\Big). \tag{2.77}
\]
Letting $Z_k\in\mathbb{R}^{m\times 2}$ denote the center of the $k$-th $\epsilon D(m)$ ball, we have $k$ ranging from 1 to $\exp[p_c\log\Theta(1/\epsilon)]$. We thus have $\Pr(\mathcal{B}_c(\delta))$ upper bounded by
\[
\begin{aligned}
&\sum_k\Pr\Big(\exists Y\in Z_k+\epsilon D(m),\ \big\|P_{\mathcal{C}(Y)}x\big\|_2\ge\sqrt{1-\delta}\,\|x\|_2\Big) && \text{(2.78a)} \\
&\le \sum_k\Pr\Big(\big\|P_{\mathcal{C}(Z_k)}x\big\|_2\ge\big(\sqrt{1-\delta}-\sqrt 2\,\epsilon\big)\|x\|_2\Big) && \text{(2.78b)} \\
&\le \exp\Big(p_c\log\Theta\Big(\frac1\epsilon\Big)\Big)\Pr\Big(\big\|P_{\mathcal{C}(Z)}x\big\|_2\ge\sqrt{1-\delta'}\,\|x\|_2\Big) && \text{(2.78c)} \\
&\le 4\exp\Big(p_c\log\Theta\Big(\frac1\epsilon\Big)-\frac{m(1-\delta')}{4}\Big) && \text{(2.78d)}
\end{aligned}
\]
where (2.78a) is from an elementary union bound, (2.78b) is from (2.77), (2.78c) uses
\[
\delta' = 1-\big(\sqrt{1-\delta}-\sqrt 2\,\epsilon\big)^2, \tag{2.79}
\]
with $Z$ being generic, and (2.78d) is true due to Lemma 2.9. Replicating a similar sequence of steps to bound $\Pr(\mathcal{B}_r(\delta))$, one readily obtains the bound
\[
\Pr(\mathcal{B}_r(\delta)) \le 4\exp\Big(p_r\log\Theta\Big(\frac1\epsilon\Big)-\frac{n(1-\delta')}{4}\Big) \tag{2.80}
\]
with $\delta'$ given by (2.79). Hence, combining (2.75a), (2.76b), (2.78d) and (2.80) we get
\[
\Pr(\mathcal{A}(0)) \le \Pr(\mathcal{B}_r(\delta))\Pr(\mathcal{B}_c(\delta)) \le 16\exp\Big((p_c+p_r)\log\Theta\Big(\frac1\epsilon\Big)-(m+n)\frac{1-\delta'}{4}\Big) \tag{2.81}
\]
which yields the desired bound for $\Pr(\mathcal{A}(0)^c)$ with $p=p_c+p_r$.

2.7.14 Proof of Theorem 2.14

We have $x$ and $y$ drawn component-wise i.i.d. from a standard Gaussian distribution. The proof is essentially similar to that of Theorem 2.13, with one important difference (besides replacing all occurrences of $\mathcal{N}(\mathcal{S},2)\cap\mathcal{M}$ by $\mathcal{N}(\mathcal{S},2)$ and letting $\epsilon$ assume values in $(0,1)$): we use the bound given by Lemma 2.8 instead of Lemma 2.9 when evaluating $\Pr(\|P_{\mathcal{C}(Z)}x\|_2\ge\sqrt{1-\delta'}\,\|x\|_2)$ in (2.78c). This gives us the bounds
\[
\Pr(\mathcal{B}_c(\delta')) \le C_0(m,\delta')\exp\Big(p_c\log\Theta\Big(\frac1\epsilon\Big)-m\log\frac{1}{\sqrt{\delta'}}\Big) \tag{2.82a}
\]
and (analogously)
\[
\Pr(\mathcal{B}_r(\delta')) \le C_0(n,\delta')\exp\Big(p_r\log\Theta\Big(\frac1\epsilon\Big)-n\log\frac{1}{\sqrt{\delta'}}\Big) \tag{2.82b}
\]
where
\[
C_0(m,\delta') = \exp\Big(2\log m-\frac2m+2-\log\frac{2\delta'}{1-\delta'}\Big) = 0.5\exp(2)\,\frac{1-\delta'}{\delta'}\exp\Big(2\log m-\frac2m\Big) = \frac{1-\delta'}{\delta'}\,\Theta(m^2). \tag{2.83}
\]
As in (2.81), we have
\[
\Pr(\mathcal{A}(0)) \le \Pr(\mathcal{B}_r(\delta'))\Pr(\mathcal{B}_c(\delta')) \le C_0(m,\delta')\,C_0(n,\delta')\exp\Big((p_r+p_c)\log\Theta\Big(\frac1\epsilon\Big)\Big)\exp\Big(-(m+n)\log\frac{1}{\sqrt{\delta'}}\Big) \tag{2.84}
\]
which gives the desired bound, since $p=p_c+p_r$ and
\[
C_0(m,\delta')\,C_0(n,\delta') = \Big(\frac{1-\delta'}{\delta'}\Big)^2\,\Theta(m^2)\,\Theta(n^2) = C(m,n,\delta'). \tag{2.85}
\]
Chapter 3

Identifiability Limits of Sparse Blind Deconvolution

3.1 Introduction

Blind deconvolution is a challenging inverse problem that is ubiquitous in applications of signal processing, control theory and wireless communications, such as blind image restoration [36], [37], blind system identification [27], [32], and blind channel estimation and equalization [18], [21], [33]-[35], [95]. In the absence of additional constraints, blind deconvolution is known to be ill-posed (in the sense of Hadamard [96], [97]), and each application mentioned above imposes some form of prior knowledge on the underlying signal structures to render the inverse problem better behaved. Sparsity based models have been used extensively in the past decade to capture hidden signal structures in many applications of interest. Prominent examples of the exploitation of sparsity include natural images admitting sparse wavelet domain representations [14], [98], ultra wide band communication channels exhibiting sparsity in Doppler-delay domain representations [99], [100], and user preferences and topic models displaying low-rank structures [16], [101] (sparsity in the eigenvalue domain). While there have been a few attempts at exploiting sparsity priors for blind deconvolution and related problems [19], [24], [25], [28], [45], [46], there does not appear to be a general characterization of identifiability for such sparse models, except under very restrictive constraints. For example, [25] assumes a single random subspace signal model as opposed to the more commonly used union of subspaces signal model in compressed sensing [4]. In the present chapter, we prove some surprising negative results for signal identifiability in the non-sparse and sparsity constrained blind and semi-blind deconvolution problems, even for the single subspace model. Before explaining our contributions in detail in Section 3.1.2, we present below the rotational ambiguity phenomenon underlying our impossibility results.

3.1.1 Intuition

The discrete-time unconstrained blind linear deconvolution problem can be stated as the task of recovering (up to scalar multiplicative ambiguities) the unknown signal pair $(x_*,y_*)\in\mathcal{K}=\mathbb{R}^m\times\mathbb{R}^n$ from the observation of their noise free linear convolution $z_*=x_*\star y_*\in\mathbb{R}^{m+n-1}$, where $\star:\mathbb{R}^m\times\mathbb{R}^n\to\mathbb{R}^{m+n-1}$ denotes the linear convolution operator. It is intuitive to see that $(x_*,y_*)$ is unidentifiable (impossible to recover uniquely) if there exists another pair $(x,y)\in\mathcal{K}$ such that $x\star y=x_*\star y_*$ with $x$ and $x_*$ non-collinear. All unidentifiability results in this chapter rely on the rotational ambiguity phenomenon, which yields the existence of many unidentifiable signal pairs given just one unidentifiable pair. This ambiguity is illustrated by the following simple constructive example.

Let $(x_1,y_1)\in\mathcal{K}=\mathbb{R}^m\times\mathbb{R}^n$ denote an unidentifiable signal pair. Thus, there exists a signal pair $(x_2,y_2)\in\mathcal{K}$ such that $x_1$ and $x_2$ are non-collinear and $x_1\star y_1=x_2\star y_2=z_0\in\mathbb{R}^{m+n-1}$. Now consider the parameterized vectors $x_1',x_2'\in\mathbb{R}^m$ and $y_1',y_2'\in\mathbb{R}^n$ defined as
\[
x_1' = x_1\cos\theta-x_2\sin\theta, \qquad y_1' = y_1\sin\phi-y_2\cos\phi, \tag{3.1a}
\]
\[
x_2' = x_1\cos\phi-x_2\sin\phi, \qquad y_2' = y_1\sin\theta-y_2\cos\theta, \tag{3.1b}
\]
where $\theta\ne\phi\pmod\pi$ are the parameters. Clearly, $x_1'$ and $x_2'$ are non-collinear since $\theta\ne\phi\pmod\pi$ and $x_1$ and $x_2$ are non-collinear. Simple algebraic manipulations reveal that
\[
x_1'\star y_1' = x_1\star y_1\cos\theta\sin\phi+x_2\star y_2\sin\theta\cos\phi-x_2\star y_1\sin\theta\sin\phi-x_1\star y_2\cos\theta\cos\phi = z_0\sin(\theta+\phi)-x_2\star y_1\sin\theta\sin\phi-x_1\star y_2\cos\theta\cos\phi \tag{3.2a}
\]
and
\[
x_2'\star y_2' = x_1\star y_1\cos\phi\sin\theta+x_2\star y_2\sin\phi\cos\theta-x_2\star y_1\sin\phi\sin\theta-x_1\star y_2\cos\phi\cos\theta = z_0\sin(\theta+\phi)-x_2\star y_1\sin\theta\sin\phi-x_1\star y_2\cos\theta\cos\phi. \tag{3.2b}
\]
Since (3.2a) and (3.2b) have the same r.h.s., both $(x_1',y_1')$ and $(x_2',y_2')$ are unidentifiable signal pairs in $\mathcal{K}$. Since $(\theta,\phi)\in[0,\pi)^2$ describes a two dimensional parameter space, (3.2) implies that the unidentifiable subset of $\mathcal{K}$ is at least two dimensional. We note that this rotational ambiguity phenomenon is different from (and a generalization of) the well known shift ambiguities of blind deconvolution. In fact, the rotational ambiguity phenomenon can be interpreted as a sophisticated side-effect of the shift ambiguity phenomenon. Furthermore, the intuition from the illustrative example above also carries over to the canonical-sparsity constrained blind deconvolution problem that we treat later in the chapter.
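The construction (3.1)-(3.2) is easy to verify numerically. In the sketch below (ours, not from the dissertation), an unidentifiable pair is manufactured by regrouping the factors of a degree-four polynomial, and the rotated pairs are confirmed to produce identical observations.

```python
import numpy as np

# Build one unidentifiable pair by regrouping polynomial factors:
# x1 * y1 = x2 * y2 = z0 with x1, x2 non-collinear, then rotate as in (3.1).
x1, y1 = np.convolve([1, 1], [1, 2]), np.convolve([1, 3], [1, 4])  # (1+t)(1+2t), (1+3t)(1+4t)
x2, y2 = np.convolve([1, 1], [1, 3]), np.convolve([1, 2], [1, 4])  # (1+t)(1+3t), (1+2t)(1+4t)
assert np.allclose(np.convolve(x1, y1), np.convolve(x2, y2))       # same z0

theta, phi = 0.7, 1.9                                              # theta != phi (mod pi)
xp1 = x1 * np.cos(theta) - x2 * np.sin(theta)                      # (3.1a)
yp1 = y1 * np.sin(phi) - y2 * np.cos(phi)
xp2 = x1 * np.cos(phi) - x2 * np.sin(phi)                          # (3.1b)
yp2 = y1 * np.sin(theta) - y2 * np.cos(theta)

# (3.2): the two rotated pairs produce identical observations, so each
# pair is unidentifiable, for every valid choice of (theta, phi).
assert np.allclose(np.convolve(xp1, yp1), np.convolve(xp2, yp2))
print("x1' * y1' == x2' * y2' :", np.convolve(xp1, yp1))
```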
3.1.2 Contributions and Organization

In this chapter, we quantify unidentifiability for certain families of noiseless sparse blind and semi-blind deconvolution problems under a non-asymptotic and non-statistical setup. Specifically, given model orders $m,n\in\mathbb{Z}_+$, we investigate some application driven choices of the separable domain restriction $(x_*,y_*)\in\mathcal{K}\subseteq\mathbb{R}^m\times\mathbb{R}^n$ and establish that the vectors $x_*$ and $y_*$ cannot be uniquely determined from their linearly convolved resultant vector $z_*=x_*\star y_*$, even up to scaling ambiguities. The specific application we consider, to motivate our choices in Section 3.5, is multi-hop sparse channel estimation for relay assisted communication. Our focus is on an algorithm independent identifiability analysis, and hence we shall not examine efficient/polynomial-time algorithms, but rather show impossibility results. Towards developing a principled approach to studying these negative results, we cast blind deconvolution as a rank-one matrix recovery problem using the lifting technique and study the rank-two null space of the resultant linear operator. We primarily use algebraic proof techniques, but are limited by the parametrizability of this rank-two null space. To mitigate this, we resort to measure theoretic arguments to reason only about dimension-wise significant subsets of this rank-two null space, while still using algebraic techniques. With the exception of Theorem 3.3, which is a purely algebraic result, all other results make use of this hybrid proof technique. Our approach leads to the following novelties.

1. We are able to analyze unidentifiability in blind deconvolution by studying the rank constrained null space of a linear operator on matrices. We explicitly demonstrate the almost everywhere unidentifiable nature of unconstrained blind deconvolution by constructing families of adversarial signal pairs for known even model orders $m$ and $n$. This is the content of Theorem 3.6 in Section 3.4.1 and is a much stronger unidentifiability result than any counterpart in the literature.

2. We show that sparsity in the canonical basis is not sufficient to ensure identifiability, even in the presence of perfect model order information. Specifically, given any canonical-sparse signal pair $(x_*,y_*)\in\mathcal{K}\subseteq\mathbb{R}^m\times\mathbb{R}^n$ with known support set, we construct non-zero dimensional unidentifiable subsets of the domain $\mathcal{K}$ as evidence and compute a non-zero lower bound on the dimension of such an unidentifiable subset.
This is the content of Theorems 3.7 and 3.8 in Section 3.4.2.

3. We state and prove a dimension-wise tight, non-linear, recursive characterization of the rank-two null space of the lifted linear convolution map. To the best of our knowledge, this is a new result. Because of the simplicity of the linear convolution map, a subset of this rank-two null space can be parametrized to yield an analytically useful representation that is instrumental in the derivation of our results. Characterizing the rank-two null space is useful for a variety of applications, including the study of identifiability scaling laws in bilinear inverse problems (see Chapter 2). This is the content of Section 3.3; in particular, Theorem 3.3. Furthermore, our proofs are constructive and demonstrate the rotational ambiguity phenomenon in general bilinear inverse problems. For blind deconvolution, the rotational ambiguity phenomenon is the reason for the existence of a large dimensional set of unidentifiable input pairs.

4. We consider a theoretical abstraction of a multi-hop channel estimation problem and extend our unidentifiability results to this setting. Specifically, we show that other types of side information, like repetition coding or geometric decay for the unknown vectors, are still insufficient for identifiability. This is the content of Corollaries 3.12 to 3.15 in Section 3.5.

The rest of the chapter is organized as follows. The remainder of Section 3.1 reviews related literature and introduces the notation and the notion of dimension used in the rest of the chapter. Section 3.2 describes the system model, sets up the notion of identifiability up to a suitably defined equivalence class, and presents the lifted reformulation of the blind deconvolution problem as a rank-one matrix recovery problem. Section 3.3 characterizes the rank-two null space of the lifted linear convolution map, exploiting both parametric and recursive definitions. Section 3.4 presents the key unidentifiability results for both the non-sparse and canonical sparse cases, exploiting the parametric representation of the rank-two null space of the lifted linear convolution map. Section 3.5 extends the techniques and results of Section 3.4 to more general families of subspace based structural priors, including repetition coding and geometrically decaying canonical representations. Section 3.6 concludes the chapter. Detailed proofs of all the results in the chapter appear in Sections 3.7.1-3.7.14.

3.1.3 Related Work

Prior research on blind system and channel identification [21], [27], [35] has mainly focused on single-in-multiple-out (SIMO) and multiple-in-multiple-out (MIMO) systems, also known as the blind multi-channel finite-impulse-response (FIR) estimation problem. These systems have multiple output channels that need to be estimated from the observed outputs when the channels are driven by either a single source (SIMO system) or multiple sources (MIMO system). A key property necessary for successful identifiability and recovery of the multiple channel vectors is that the channels should display sufficient diversity or richness, either stochastically (cyclostationary second order statistics) [21], [38], [39] or deterministically (no common zero across all channels) [35], [40]-[43]. With the exception of non-Gaussian i.i.d. sources [102], such diversity is generally unavailable in single-in-single-out (SISO) systems [18], [38], thus making them extremely challenging.
However, SISO systems are our primary concern due to their equivalence to blind deconvolution problems. In this chapter, we shall not be concerned with stochastic formulations where channel taps are assumed to be drawn from a distribution. Instead, we shall consider sparse and non-sparse channel instances and characterize their instance identifiabilities. When blind deconvolution is treated as a channel estimation problem, [44] shows that guard intervals of sufficient length between blocks of transmitted symbols enable successful blind channel identification, even for the deterministic SISO system type. More specifically, for an $n$ length channel impulse response (CIR), a guard interval of length $n-1$ is needed between consecutive blocks of transmitted source symbols, resulting in an absence of any inter-block or inter-symbol interference (ISI) in the convolved output of the channel. We consider scenarios involving ISI and show unidentifiability w.r.t. a wide variety of domain restrictions. Although our definition of identifiability is similar to that in [44], our results are substantially different in nature. Firstly, we deal with real fields (not algebraically closed) and with sparse subsets of real vector spaces (induced by measures that are not absolutely continuous w.r.t. the Lebesgue measure), as opposed to the algebraically closed complex fields and vector spaces considered in [44]. This requires different proof techniques. Secondly, we focus on unidentifiability results and characterize the dimension of the unidentifiable subsets (albeit semi-heuristically through free parameters) in addition to showing existence results. Furthermore, our proofs constructively exhibit unidentifiable subsets, as opposed to non-constructive existence proofs. However, considering that the differential geometry inspired approach adopted in [44] is closely tied to the notion of dimension, we suspect (although this is not explicitly pursued in [44]) that a simple free parameter counting heuristic could be applied to interpret the results therein and arrive at a dimensional characterization of the unidentifiability in non-sparse blind deconvolution that would bear similarities with (but would probably be weaker than) the results on the non-sparse case in this chapter. Indeed, the notion of non-singularity of the Jacobian matrix of a transformation that is mentioned in the appendix of [44] is also used in some proofs in this chapter, although the transformations in question are unrelated.

To the best of our knowledge, blind deconvolution was first cast as a rank-one matrix recovery problem in [23]; we adopted this framework in Chapter 2 on the characterization of identifiability in general bilinear inverse problems. Herein, we specifically consider the blind deconvolution problem and characterize its inherent unidentifiability under some application-motivated sparsity and subspace constraints. Further, [23] focused on developing heuristic recovery algorithms for the deconvolution problem and did not explicitly address identifiability. The subsequent paper [25], by the authors of [23], does (implicitly) address identifiability (through a study of recoverability by convex programming), but assumes knowledge of the support of the sparse signal. Although blind deconvolution is an instance of a bilinear inverse problem, the results in this chapter differ significantly from those in Chapter 2 in two ways.
Firstly, Chapter 2 focused on identifiability results (analogous to achievability results in information theory), whereas the present chapter derives unidentifiability results (akin to converse/impossibility results in information theory). Secondly, the present chapter explores more explicit sparsity priors motivated by applications like cooperative communication [47], [48], channel estimation for wide-band communication [100] and musical source separation [46], whereas Chapter 2 developed the theory of regularized bilinear inverse problems more abstractly, making assumptions no more specific than non-convex cone constrained priors. A consequence of focusing specifically on blind deconvolution is that we are able to develop tight converse bounds that match our achievability bounds in Chapter 2. The tractability of this analysis hinges on the analytical simplicity of the convolution operator after lifting. During the course of this research, a generalization of the approach in [103] was developed in [104] using group theoretic ideas. The results could potentially be adapted to sparse blind deconvolution. We contrast our results with the message in [104] in the later part of this chapter.

A promising identifiability analysis was proposed in [28], leveraging results from [29] on matrix factorization for sparse dictionary learning using the $\ell_1$-norm and the $\ell_q$ quasi-norm for $0<q<1$. Their approach and formulation differ from ours in two important aspects. Firstly, we are interested in SISO systems whereas [28] deals with SIMO systems. Secondly, [28] analyzes identifiability as a local optimum of a non-convex $\ell_1$ (or $\ell_q$ for $0<q<1$) optimization and hence is heavily dependent on the algorithmic formulation, whereas we consider the solution as the local/global optimum of the $\ell_0$ optimization problem, and our impossibility results are information theoretic in nature, implying that they hold regardless of algorithmic formulation. We emphasize that the constrained $\ell_1$ optimization formulation in [28] is non-convex and therefore does not imply the existence of provably correct and efficient recovery algorithms, despite identifiability of the channel. Although it would be interesting to try to extend their approach to SISO systems and compare with our results, this is non-trivial and beyond the scope of this dissertation.

An inverse problem closely related to blind deconvolution is the Fourier phase retrieval problem [10], [85], [105], where a signal has to be reconstructed from its autocorrelation function. This is clearly a special case of the blind deconvolution problem with far fewer degrees of freedom, and it allows identifiability and tractable recovery with a sparsity prior on the signal [85]. A second important difference is that after lifting [1], the Fourier phase retrieval problem has one linear constraint involving a positive semidefinite matrix. This characteristic is known to be helpful in the conditioning of the inverse problem and in the development of recovery algorithms [87]. While the blind deconvolution problem does not enjoy the same advantage, this approach seems to be a good avenue to explore if additional constraints are allowed. This is a future direction of research.

3.1.4 A Note on the Usage of Dimension/Degrees of Freedom

Intuitively, the 'intrinsic dimension' or 'degrees of freedom' of a set/space refers to the minimum number of free/independent parameters or coordinates that are necessary to specify any arbitrary point within it.
This naturally leads to the idea of the parameter counting heuristic when called upon to analyze the complexity of a given object. For the purposes of this chapter, we will only be interested in this intuition of parameter counting to quantify dimension or degrees of freedom. On a more rigorous note, we rely on the notion of Hausdorff dimension owing to its close relation to the Lebesgue measure, but we shall not bridge the gap between parameter counting and Hausdorff dimension, owing to the substantial technical machinery needed to do so while being peripheral to the main message of the chapter. A detailed treatment of Hausdorff dimension can be found in [106].

3.1.5 Notational Conventions

All vectors are assumed to be column vectors unless stated otherwise. We shall use lowercase boldface letters to denote column vectors (e.g. $a$) and uppercase boldface letters to denote matrices (e.g. $A$). MATLAB indexing rules will be used to denote parts of a vector/matrix (e.g. $A(2{:}3,4{:}6)$ denotes the sub-matrix of $A$ formed by the rows $\{2,3\}$ and columns $\{4,5,6\}$). The all zero vector/matrix (respectively the all one vector) shall be denoted by $\mathbf{0}$ (respectively $\mathbf{1}$), with dimensions clear from the usage context. We will use symbolic shorthand representations for block matrices with the understanding that they are dimension-wise consistent upon element-wise expansion, e.g.
\[
\begin{bmatrix} 0 & v^T \\ v^T & 0 \end{bmatrix} = \begin{bmatrix} 0 & v(1) & v(2) & \cdots & v(n-1) \\ v(1) & v(2) & \cdots & v(n-1) & 0 \end{bmatrix}. \tag{3.3}
\]
For vectors and/or matrices, $(\cdot)^T$, $\operatorname{Tr}(\cdot)$ and $\operatorname{rank}(\cdot)$ respectively return the transpose, trace and rank of their argument, whenever applicable. Special sets are denoted by uppercase blackboard bold font (e.g. $\mathbb{R}$ for the real numbers). Other sets are denoted by uppercase calligraphic font (e.g. $\mathcal{S}$). For any set $\mathcal{S}$, $|\mathcal{S}|$ shall denote its cardinality. Linear operators on matrices are denoted by uppercase script font (e.g. $\mathcal{S}$). For any matrix $M$, we denote its column space by $\mathcal{C}(M)$. The standard Euclidean inner product on a vector space will be denoted by $\langle\cdot,\cdot\rangle$, with the underlying vector space clear from the usage context. To avoid unnecessarily heavy notation, we shall adopt the following convention: the scope of both vector variables (like $x$, $y$, $u$, $v$, etc.) and matrix variables (like $X$, $Y$, etc.) is restricted to individual theorems and/or proofs. Their meanings are allowed to differ across theorems and proofs (and even across disjoint subparts of the same proof when there is no risk of confusion), thus facilitating the reuse of variable names across different theorems and avoiding heavy notation.

3.2 System Model

Since blind deconvolution is a type of bilinear inverse problem, this section, which describes the system model, draws heavily from its counterpart, Section 2.2, in the last chapter.

3.2.1 The Blind Deconvolution Problem

We shall consider the noiseless linear convolution system model
\[
z = x\star y, \tag{3.4}
\]
where $(x,y)$ denotes the pair of unknown signals with a given application specific domain restriction $(x,y)\in\mathcal{K}\subseteq\mathbb{R}^m\times\mathbb{R}^n$, $\star:\mathbb{R}^m\times\mathbb{R}^n\to\mathbb{R}^{m+n-1}$ denotes the linear convolution map, and $z\in\mathbb{R}^{m+n-1}$ is the vector of observations given by
\[
z(l) = \begin{cases} \displaystyle\sum_{j=1}^{\min(l,m)} x(j)\,y(l+1-j), & 1\le l\le n, \\[6pt] \displaystyle\sum_{j=l+1-n}^{\min(l,m)} x(j)\,y(l+1-j), & 1\le l-n\le m-1. \end{cases} \tag{3.5}
\]
We are interested in solving for the vectors $x$ and $y$ from the noiseless observation $z$ in (3.4). The blind linear deconvolution problem corresponding to (3.4) is represented by the feasibility problem
\[
\text{find } (x,y) \quad \text{subject to}\quad x\star y = z,\ \ (x,y)\in\mathcal{K}. \tag{P_6}
\]
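As a quick sanity check (ours, not from the dissertation), the observation model (3.5) coincides with standard full linear convolution, e.g. as computed by np.convolve:

```python
import numpy as np

def conv35(x, y):
    """Direct evaluation of the observation model (3.5).

    The text uses 1-based indices; this implementation is 0-based, so
    z[l] sums x[j] * y[l - j] over all valid j.
    """
    m, n = len(x), len(y)
    z = np.zeros(m + n - 1)
    for l in range(m + n - 1):
        for j in range(max(0, l - n + 1), min(m, l + 1)):
            z[l] += x[j] * y[l - j]
    return z

rng = np.random.default_rng(6)
x, y = rng.standard_normal(3), rng.standard_normal(4)
assert np.allclose(conv35(x, y), np.convolve(x, y))
```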
Assuming that the model orders $m$ and $n$, respectively, of the vectors $x$ and $y$ are fixed and known a priori, we are concerned with whether the pair $(x,y)$ can be uniquely identified in a meaningful sense. Notice that the deconvolution problem $(P_6)$ has an inherent scaling ambiguity due to the identity
\[
x\star y = \alpha x\star\frac1\alpha y, \qquad \forall\alpha\ne 0, \tag{3.6}
\]
stemming from the bilinearity of the convolution operator. Thus, any meaningful definition of identifiability for blind deconvolution must disregard this type of scaling ambiguity. This leads us to the following definition of identifiability.

Definition 3.1 (Identifiability). A vector pair $(x,y)\in\mathcal{K}\subseteq\mathbb{R}^m\times\mathbb{R}^n$ is identifiable with respect to the linear convolution map $\star$ if, for all $(x',y')\in\mathcal{K}\subseteq\mathbb{R}^m\times\mathbb{R}^n$ satisfying $x\star y=x'\star y'$, $\exists\alpha\ne 0$ such that $(x',y')=(\alpha x,\frac1\alpha y)$.

This definition is in the same spirit as the notion of noise free identifiability described in [44], but restricted to a set $\mathcal{K}\subseteq\mathbb{R}^m\times\mathbb{R}^n$. It is easy to see that Definition 3.1 induces an equivalence structure on the set of identifiable pairs in $\mathcal{K}$, and identifiability refers to the identification of the equivalence class in $\mathcal{K}$ that generated the observation $z$ in (3.4). For future reference, we define the equivalence relation $\mathrm{Id}_R:\mathcal{K}\times\mathcal{K}\to\{0,1\}$ as follows. Given any $(x,y),(x',y')\in\mathcal{K}$, $\mathrm{Id}_R((x,y),(x',y'))=1$ if and only if $\exists\alpha\ne 0$ such that $(x',y')=(\alpha x,\frac1\alpha y)$. It is straightforward to check that $\mathrm{Id}_R(\cdot,\cdot)$ is indeed an equivalence relation. Let $\mathcal{K}/\mathrm{Id}_R$ denote the set of equivalence classes of $\mathcal{K}$ induced by $\mathrm{Id}_R(\cdot,\cdot)$, and for any $(x,y)\in\mathcal{K}$ let $[(x,y)]\in\mathcal{K}/\mathrm{Id}_R$ denote the equivalence class containing $(x,y)$. Then Definition 3.1 amounts to declaring a vector pair $(x,y)\in\mathcal{K}$ identifiable if and only if every $(x',y')\in\mathcal{K}$ with $[(x',y')]\ne[(x,y)]$ satisfies $x\star y\ne x'\star y'$.

Remark 3.1. The set $\mathcal{K}$ is introduced to capture application specific constraints on the signal pair $(x,y)$. We consider $\mathcal{K}$ to be a separable product of cones in $\mathbb{R}^m\times\mathbb{R}^n$, i.e. there exist sets $\mathcal{D}_1\subseteq\mathbb{R}^m$ and $\mathcal{D}_2\subseteq\mathbb{R}^n$ such that $\mathcal{K}=\mathcal{D}_1\times\mathcal{D}_2$, and for every $\alpha>0$, $(x,y)\in\mathcal{K}$ implies that $(\alpha x,\alpha y)\in\mathcal{K}$. Such separability of $\mathcal{K}$ is motivated by the observation that in many applications of interest $x$ and $y$ are unrelated (e.g. $x$ and $y$ may respectively represent the unknown source and the unknown channel in a blind channel estimation problem). Throughout this chapter, we shall solely consider separable domains $\mathcal{K}$.

3.2.2 Lifting

While Problem $(P_6)$ is an accurate representation of a blind deconvolution problem, it is not easily amenable to an identifiability analysis in the sense of Definition 3.1. We use the lifting technique from optimization [1] to rewrite Problem $(P_6)$ as a rank minimization problem subject to linear equality constraints [23], [107]; a form that is better suited for an identifiability analysis:
The exact steps involved in the lifting technique, and a proof of equivalence between the lifted and the original problems in the much broader context of bilinear inverse problems, were discussed in the last chapter. Our unidentifiability results in Section 3.4 will be based on an analysis of Problem (P7).

Remark 3.2. It is well known from functional analysis [108] that any finite dimensional linear operation can be decomposed into a set of inner product operations that collectively define the linear operation. The lifted linear convolution map $\mathscr{S}(\cdot)$ can be decomposed into a functionally equivalent set, comprising $(m+n-1)$ matrices, using coordinate projections. Let $\mathbf{S}_j \in \mathbb{R}^{m \times n}$ denote the $j$th matrix in the decomposition and $\phi_j : \mathbb{R}^{m+n-1} \to \mathbb{R}$ denote the $j$th coordinate projection operator of $(m+n-1)$ dimensional vectors to scalars, i.e. if $\mathbf{z} \in \mathbb{R}^{m+n-1}$ then $\phi_j(\mathbf{z}) = \mathbf{z}(j)$, for $1 \le j \le m+n-1$. For all $(\mathbf{x},\mathbf{y}) \in \mathbb{R}^m \times \mathbb{R}^n$, we have the relation

$$\phi_j(\mathbf{x} \star \mathbf{y}) = \phi_j \circ \mathscr{S}\left(\mathbf{x}\mathbf{y}^T\right) = \left\langle \mathbf{S}_j, \mathbf{x}\mathbf{y}^T \right\rangle = \mathbf{x}^T \mathbf{S}_j \mathbf{y}, \quad \forall\, 1 \le j \le m+n-1, \tag{3.9}$$

where $\langle \cdot, \cdot \rangle$ denotes the trace inner product on the space of matrices $\mathbb{R}^{m \times n}$. By way of exposition, we compute $\mathbf{S}_3$ for $(m,n) = (3,4)$. Let $(\mathbf{x},\mathbf{y}) \in \mathbb{R}^3 \times \mathbb{R}^4$. Invoking the expansion in (3.5), we have

$$\phi_3(\mathbf{x} \star \mathbf{y}) = \sum_{j=1}^{3} \mathbf{x}(j)\,\mathbf{y}(4-j) = \begin{bmatrix} \mathbf{x}(1) & \mathbf{x}(2) & \mathbf{x}(3) \end{bmatrix} \begin{bmatrix} 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} \mathbf{y}(1) \\ \mathbf{y}(2) \\ \mathbf{y}(3) \\ \mathbf{y}(4) \end{bmatrix} = \mathbf{x}^T \mathbf{S}_3 \mathbf{y}. \tag{3.10}$$

The matrix $\mathbf{S}_3 \in \mathbb{R}^{3 \times 4}$ is given by

$$\mathbf{S}_3(k,l) = \begin{cases} 1, & k+l = 4, \\ 0, & \text{otherwise}, \end{cases} \tag{3.11}$$

for $1 \le k \le 3$ and $1 \le l \le 4$. In general, it is not hard to see that the matrices $\mathbf{S}_j$, $1 \le j \le m+n-1$, are Hankel matrices in $\{0,1\}^{m \times n} \subset \mathbb{R}^{m \times n}$ specified as

$$\mathbf{S}_j(k,l) = \begin{cases} 1, & k+l = j+1, \\ 0, & \text{otherwise}, \end{cases} \tag{3.12}$$

for $1 \le k \le m$ and $1 \le l \le n$. Figure 3.1 illustrates the matrices forming this decomposition of the lifted linear convolution operator $\mathscr{S}(\cdot)$ for the case $(m,n) = (3,4)$.

Figure 3.1: Lifted matrices $\mathbf{S}_k \in \mathbb{R}^{m \times n}$ for the linear convolution map with $m = 3$, $n = 4$ and $1 \le k \le m+n-1$.

Figure 3.2: Illustration of the anti-diagonal sum interpretation of the lifted linear convolution operator $\mathscr{S}(\cdot)$ for $(m,n) = (3,4)$ satisfying $\mathscr{S}(\mathbf{W}) = \mathbf{z}$. The shorthand $w_{kl} = \mathbf{W}(k,l)$ has been used with $1 \le k \le m = 3$ and $1 \le l \le n = 4$.
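A small sketch, assuming the 1-based indexing of (3.12), that builds the Hankel matrices $\mathbf{S}_j$ and verifies the bilinear-form identity $\phi_j(\mathbf{x} \star \mathbf{y}) = \mathbf{x}^T\mathbf{S}_j\mathbf{y}$ from (3.9):

```python
import numpy as np

def S_mat(j, m, n):
    """Hankel matrix S_j of (3.12): ones where k + l = j + 1 (1-based k, l)."""
    S = np.zeros((m, n))
    for k in range(1, m + 1):
        for l in range(1, n + 1):
            if k + l == j + 1:
                S[k - 1, l - 1] = 1.0
    return S

m, n = 3, 4
x, y = np.random.randn(m), np.random.randn(n)
z = np.convolve(x, y)
for j in range(1, m + n):
    assert np.isclose(x @ S_mat(j, m, n) @ y, z[j - 1])   # phi_j(x * y) = x^T S_j y
```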
3.2.3 Anti-Diagonal Sum Interpretation

We now illustrate an interpretation of the lifted linear convolution operator that forms a key step in the proofs of our results in Section 3.3 and has its origins in the shift ambiguities associated with blind deconvolution. Let $\mathbf{z} \in \mathbb{R}^{m+n-1}$ be the result of applying the lifted convolution map $\mathscr{S}(\cdot)$ to the matrix $\mathbf{W} \in \mathbb{R}^{m \times n}$, i.e. $\mathscr{S}(\mathbf{W}) = \mathbf{z}$, and consider the case $(m,n) = (3,4)$ in Figure 3.2 as an example. It is clear that $\mathbf{z}(j)$ is formed by adding all elements of $\mathbf{W}$ that lie on the $j$th arrow (the $j$th anti-diagonal) in Figure 3.2. In other words, the linear operator $\mathscr{S}(\cdot)$ sums the elements of its argument along the anti-diagonals to generate the output. From the definition of linear convolution in (3.5), it is easy to see that this interpretation of the lifted convolution operator holds regardless of the numerical values of the dimensions $m$ and $n$. As an immediate consequence, we make the following observation (also an important part of the proofs of the results in Section 3.3). Let $\mathbf{W}_0 \in \mathbb{R}^{(m-1) \times (n-1)}$ be an arbitrary matrix and consider the matrices $\mathbf{W}_1, \mathbf{W}_2 \in \mathbb{R}^{m \times n}$ constructed as

$$\mathbf{W}_1 = \begin{bmatrix} \mathbf{0} & \mathbf{W}_0 \\ 0 & \mathbf{0}^T \end{bmatrix}, \qquad \mathbf{W}_2 = \begin{bmatrix} \mathbf{0}^T & 0 \\ \mathbf{W}_0 & \mathbf{0} \end{bmatrix}. \tag{3.13}$$

Clearly, the elements of $\mathbf{W}_2$ along the $j$th anti-diagonal are a circularly shifted copy of the elements of $\mathbf{W}_1$ along the $j$th anti-diagonal, i.e. $\mathbf{W}_2$ is formed by shifting the elements of $\mathbf{W}_1$ down by one unit along the anti-diagonals. In particular, this implies that the sum of elements along the $j$th anti-diagonal is the same for both $\mathbf{W}_1$ and $\mathbf{W}_2$. Using the sum along anti-diagonals interpretation of the lifted linear convolution map $\mathscr{S}(\cdot)$, we have

$$\mathscr{S}(\mathbf{W}_1) = \mathscr{S}(\mathbf{W}_2) = \begin{bmatrix} 0 \\ \mathscr{S}'(\mathbf{W}_0) \\ 0 \end{bmatrix}, \tag{3.14}$$

where $\mathscr{S}' : \mathbb{R}^{(m-1) \times (n-1)} \to \mathbb{R}^{m+n-3}$ denotes the lifted linear convolution map in one lower dimension than $\mathscr{S}(\cdot)$ w.r.t. both rows and columns. Thus, $\mathbf{W}_1$ and $\mathbf{W}_2$ are indistinguishable under the action of the linear operator $\mathscr{S}(\cdot)$.

3.3 Parameterizing the Rank-Two Null Space

Suppose that $\mathscr{S}(\cdot)$ is the lifted linear operator corresponding to the linear convolution map. We denote the rank-two null space of $\mathscr{S}(\cdot)$ by $\mathcal{N}(\mathscr{S}, 2)$, defined as

$$\mathcal{N}(\mathscr{S}, 2) \triangleq \left\{ \mathbf{Q} \in \mathbb{R}^{m \times n} \,\middle|\, \operatorname{rank}(\mathbf{Q}) \le 2,\; \mathscr{S}(\mathbf{Q}) = \mathbf{0} \right\}. \tag{3.15}$$

In this section, we establish a partially parametric and partially dimension recursive characterization of $\mathcal{N}(\mathscr{S}, 2)$ that is critical to our subsequent results in Section 3.4. We first establish Lemma 3.1, which describes a subset of $\mathcal{N}(\mathscr{S}, 2)$. Proposition 3.2 shows the existence of subsets of $\mathcal{N}(\mathscr{S}, 2)$ not covered by Lemma 3.1. Theorem 3.3 then provides an almost everywhere characterization of $\mathcal{N}(\mathscr{S}, 2)$. Finally, Proposition 3.4 shows the existence of subsets of $\mathcal{N}(\mathscr{S}, 2)$ that are not covered by Theorem 3.3. We shall make use of the fact that the rank-one null space of the linear convolution operator is trivial, i.e.

$$\mathcal{N}(\mathscr{S}, 1) \triangleq \left\{ \mathbf{Q} \in \mathbb{R}^{m \times n} \,\middle|\, \operatorname{rank}(\mathbf{Q}) \le 1,\; \mathscr{S}(\mathbf{Q}) = \mathbf{0} \right\} = \{\mathbf{0}\}, \tag{3.16}$$

which follows from interpreting convolution as polynomial multiplication, since the product of two real polynomials is identically zero if and only if at least one of them is identically zero.

Lemma 3.1. Let $m,n \ge 2$ and let $\mathbf{Q} \in \mathbb{R}^{m \times n}$ admit a factorization of the form

$$\mathbf{Q} = \begin{bmatrix} \mathbf{u} & \mathbf{0} \\ 0 & -\mathbf{u} \end{bmatrix} \begin{bmatrix} 0 & \mathbf{v}^T \\ \mathbf{v}^T & 0 \end{bmatrix} \tag{3.17}$$

for some $\mathbf{v} \in \mathbb{R}^{n-1}$ and $\mathbf{u} \in \mathbb{R}^{m-1}$. Then $\mathbf{Q} \in \mathcal{N}(\mathscr{S}, 2)$.

Proof. Let $\mathbf{Q}$ admit a factorization as in (3.17). Then

$$\mathbf{Q} = \underbrace{\begin{bmatrix} \mathbf{0} & \mathbf{u}\mathbf{v}^T \\ 0 & \mathbf{0}^T \end{bmatrix}}_{\mathbf{Q}_1} + \underbrace{\begin{bmatrix} \mathbf{0}^T & 0 \\ -\mathbf{u}\mathbf{v}^T & \mathbf{0} \end{bmatrix}}_{\mathbf{Q}_2}. \tag{3.18}$$

Clearly, $\mathbf{Q}_2$ in (3.18) is obtained by shifting the elements of $\mathbf{Q}_1$ down by one unit along the anti-diagonals and then flipping the sign of each element. Since the convolution operator $\mathscr{S}(\cdot)$ sums elements along the anti-diagonals (see Figure 3.2 for an illustration and Section 3.2.3 for details), the representation of $\mathbf{Q}$ as in (3.18) immediately implies that $\mathscr{S}(\mathbf{Q}) = \mathbf{0}$. Since (3.17) implies that $\operatorname{rank}(\mathbf{Q}) \le 2$, we have $\mathbf{Q} \in \mathcal{N}(\mathscr{S}, 2)$.

We notice that $\mathscr{S}(\cdot)$ maps $\mathbb{R}^{m \times n}$ to $\mathbb{R}^{m+n-1}$. On counting the number of free parameters in the singular value decomposition, an $m \times n$ rank-two matrix can be observed to have $2(m+n-2)$ free parameters [109], so that $\mathcal{N}(\mathscr{S}, 2)$ has at most $(2m+2n-4) - (m+n-1) = (m+n-3)$ free parameters. Since the representation on the r.h.s. of (3.17) also has $(m+n-3)$ free parameters and $\mathcal{N}(\mathscr{S}, 1) = \{\mathbf{0}\}$, our parametrization is dimension-wise tight. However, the converse of Lemma 3.1 is false in general, as we show in Proposition 3.2 below. This implies that while unidentifiability results for blind deconvolution can be shown by proving the existence of matrices satisfying (3.17), proofs of identifiability results for deterministic input signals, on the other hand, require substantially more mathematical effort and careful analysis of the subset of $\mathcal{N}(\mathscr{S}, 2)$ that is not element-wise representable in the form of (3.17).
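The anti-diagonal sum interpretation of Section 3.2.3 makes Lemma 3.1 easy to verify numerically. The sketch below is an added illustration; `lifted_conv` is our own helper implementing $\mathscr{S}(\cdot)$ as anti-diagonal sums. It constructs $\mathbf{Q}$ as in (3.17) and checks that it lies in $\mathcal{N}(\mathscr{S}, 2)$:

```python
import numpy as np

def lifted_conv(W):
    """S(W): sum W along its anti-diagonals, so that S(x y^T) = x * y (Section 3.2.3)."""
    m, n = W.shape
    return np.array([np.trace(np.fliplr(W), offset=n - 1 - d) for d in range(m + n - 1)])

m, n = 5, 6
x, y = np.random.randn(m), np.random.randn(n)
assert np.allclose(lifted_conv(np.outer(x, y)), np.convolve(x, y))   # property (3.8)

u, v = np.random.randn(m - 1), np.random.randn(n - 1)
Q = (np.column_stack([np.r_[u, 0.0], np.r_[0.0, -u]])    # [u 0; 0 -u]
     @ np.vstack([np.r_[0.0, v], np.r_[v, 0.0]]))        # [0 v^T; v^T 0]
assert np.linalg.matrix_rank(Q) <= 2 and np.allclose(lifted_conv(Q), 0)
```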
Proposition 3.2. For $m,n \ge 3$, there exists an $(m+n-3)$ dimensional set $\mathcal{M}$ such that $\mathcal{M} \subset \mathcal{N}(\mathscr{S}, 2) \subset \mathbb{R}^{m \times n}$ and no $\mathbf{M} \in \mathcal{M}$ is representable in the form of (3.17).

Proof. Section 3.7.1.

Figure 3.3: Venn diagram displaying the subset/superset relationships between $\mathcal{N}(\mathscr{S}^{(m,n)}, 2)$, $\mathcal{N}_0(m,n)$, $\mathcal{N}_2(m,n)$ and $\mathcal{M}(m,n)$ as indicated by Theorem 3.3 for $m,n \ge 3$.

We are now ready to state an almost everywhere description of $\mathcal{N}(\mathscr{S}, 2)$ for the lifted linear convolution map $\mathscr{S} : \mathbb{R}^{m \times n} \to \mathbb{R}^{m+n-1}$. Our description will involve a recursion over dimension. To avoid any confusion, we shall explicitly augment the dimension to the symbolic representation of the lifted operator. Thus, the lifted linear convolution map $\mathscr{S} : \mathbb{R}^{m \times n} \to \mathbb{R}^{m+n-1}$ will be denoted by $\mathscr{S}^{(m,n)}$ and the associated rank-two null space will be denoted by $\mathcal{N}(\mathscr{S}^{(m,n)}, 2)$. For $m,n \ge 2$, we define the following sets:

$$\mathcal{M}(m,n) \triangleq \left\{ \mathbf{Q} \in \mathcal{N}(\mathscr{S}^{(m,n)}, 2) \,\middle|\, \mathbf{Q}(m,1) = \mathbf{Q}(1,n) = 0 \right\}, \tag{3.19}$$

$$\mathcal{N}_0(m,n) \triangleq \left\{ \begin{bmatrix} \mathbf{u} & \mathbf{0} \\ 0 & -\mathbf{u} \end{bmatrix} \begin{bmatrix} 0 & \mathbf{v}^T \\ \mathbf{v}^T & 0 \end{bmatrix} \,\middle|\, \mathbf{u} \in \mathbb{R}^{m-1},\; \mathbf{v} \in \mathbb{R}^{n-1} \right\}, \tag{3.20}$$

$$\mathcal{N}_2(m,n) \triangleq \left\{ \begin{bmatrix} \mathbf{u} & \mathbf{0} \\ 0 & \mathbf{u}_* \end{bmatrix} \begin{bmatrix} 0 & \mathbf{v}^T \\ \mathbf{v}_*^T & 0 \end{bmatrix} \,\middle|\, \mathbf{u}\mathbf{v}^T + \mathbf{u}_*\mathbf{v}_*^T \in \mathcal{N}(\mathscr{S}^{(m-1,n-1)}, 2) \setminus \{\mathbf{0}\} \right\}. \tag{3.21}$$

Note that Proposition 3.2 effectively implies that both $\mathcal{N}_0(m,n)$ and $\mathcal{N}(\mathscr{S}^{(m,n)}, 2) \setminus \mathcal{N}_0(m,n)$ are dimension-wise equally large sets. The set $\mathcal{M}$ of Proposition 3.2 can be taken to be $\mathcal{N}_2(m,n)$.

Theorem 3.3. The following relationships hold.

1) $\mathcal{N}(\mathscr{S}^{(1,n)}, 2) = \{\mathbf{0}\}$ and $\mathcal{N}(\mathscr{S}^{(m,1)}, 2) = \{\mathbf{0}\}$ for all positive integers $m,n$.

2) $\mathcal{N}(\mathscr{S}^{(m,2)}, 2) = \mathcal{N}_0(m,2)$ and $\mathcal{N}(\mathscr{S}^{(2,n)}, 2) = \mathcal{N}_0(2,n)$ for integers $m,n \ge 2$.

3) $\mathcal{N}(\mathscr{S}^{(m,n)}, 2) \supseteq \mathcal{N}_0(m,n) \cup \mathcal{N}_2(m,n)$ for integers $m,n \ge 3$.

4) $\mathcal{N}(\mathscr{S}^{(m,n)}, 2) \setminus \mathcal{M}(m,n) = \left( \mathcal{N}_0(m,n) \cup \mathcal{N}_2(m,n) \right) \setminus \mathcal{M}(m,n)$ for integers $m,n \ge 3$.

Proof. Section 3.7.2 provides the complete proof, which depends on Lemma 3.1.

The sets $\mathcal{N}(\mathscr{S}^{(2,n)}, 2) = \mathcal{N}_0(2,n)$ and $\mathcal{N}(\mathscr{S}^{(m,2)}, 2) = \mathcal{N}_0(m,2)$ are explicitly parametrized $\forall m,n \ge 2$. In principle, to construct elements of $\mathcal{N}_2(m,n)$ for arbitrary $m,n \ge 3$ with $n \ge m$, we can start from the explicitly parameterized set $\mathcal{N}(\mathscr{S}^{(2,n-m+2)}, 2) = \mathcal{N}_0(2, n-m+2)$ and apply (3.21) recursively $(m-2)$ times; a numerical sketch of one such recursion step is given after Proposition 3.4 below. Figure 3.3 shows a Venn diagram displaying the relationships among the different sets in Theorem 3.3. We note that Theorem 3.3 proves equality of the sets $\mathcal{N}(\mathscr{S}^{(m,n)}, 2)$ and $\mathcal{N}_0(m,n) \cup \mathcal{N}_2(m,n)$ when restricted to $\mathcal{M}(m,n)^c$. In Proposition 3.4 below, we show that this restriction is necessary when $m = n \ge 3$. For $m,n \ge 3$ and $m \ne n$, as per the definition in (3.19), it is trivially clear that $\mathcal{M}(m,n)$ is a non-empty set, since (using the definition in (3.20) and Lemma 3.1) it contains the matrices

$$\mathbf{Q} = \begin{bmatrix} \mathbf{u} & \mathbf{0} \\ 0 & -\mathbf{u} \\ 0 & 0 \end{bmatrix} \begin{bmatrix} 0 & \mathbf{v}^T & 0 \\ \mathbf{v}^T & 0 & 0 \end{bmatrix} \in \mathcal{N}_0(m,n) \tag{3.22}$$

for $\mathbf{u} \in \mathbb{R}^{m-2}$ and $\mathbf{v} \in \mathbb{R}^{n-2}$. However, it is presently unclear whether $\mathcal{M}(m,n) \setminus \left( \mathcal{N}_0(m,n) \cup \mathcal{N}_2(m,n) \right)$ is a non-empty set. A related but different question in this context is whether $\exists \mathbf{Q} \in \mathcal{M}(m,n)$ that simultaneously satisfies $\mathbf{Q}(:,1) \ne \mathbf{0}$, $\mathbf{Q}(:,n) \ne \mathbf{0}$, $\mathbf{Q}(1,:) \ne \mathbf{0}^T$ and $\mathbf{Q}(m,:) \ne \mathbf{0}^T$. We leave this as an open question.

Proposition 3.4. For $n \ge 3$, $\mathcal{M}(n,n) \setminus \left( \mathcal{N}_0(n,n) \cup \mathcal{N}_2(n,n) \right)$ is a non-empty set of dimension at least $(n-1)$.

Proof. Section 3.7.3.
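As promised above, here is a sketch of one recursion step of (3.21), following the construction used in the proof of Proposition 3.2 (Section 3.7.1); the mixing matrix $\mathbf{A}$ below is an arbitrary invertible choice, and `lifted_conv` is the anti-diagonal sum helper from the earlier sketch.

```python
import numpy as np

def lifted_conv(W):
    m, n = W.shape
    return np.array([np.trace(np.fliplr(W), offset=n - 1 - d) for d in range(m + n - 1)])

m, n = 5, 7
u0, v0 = np.random.randn(m - 2), np.random.randn(n - 2)
L = np.column_stack([np.r_[u0, 0.0], np.r_[0.0, -u0]])    # X = L @ R is in N0(m-1, n-1)
R = np.vstack([np.r_[0.0, v0], np.r_[v0, 0.0]])
A = np.array([[1.0, 1.0], [1.0, -2.0]])                    # any invertible 2x2 mixing
u1, u2 = (L @ A).T                                         # independent columns u1, u2
v1, v2 = np.linalg.inv(A) @ R                              # matching rows v1^T, v2^T
Y = (np.column_stack([np.r_[u1, 0.0], np.r_[0.0, u2]])     # embed one level up, per (3.21)
     @ np.vstack([np.r_[0.0, v1], np.r_[v2, 0.0]]))
assert np.allclose(lifted_conv(Y), 0) and np.linalg.matrix_rank(Y) == 2
```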
Both these parts are equally significant since they are of the same dimension, and whatever part ofN S (m,n) , 2 is outsideN 0 (m,n) S N 2 (m,n), is contained withinM(m,n) (see Figure 3.3 for illustration) and is therefore dimension-wise insignificant. Hence, we claim that Theorem 3.3 provides an almost everywhere characterization ofN S (m,n) , 2 . The setM(m,n) is dimension-wise insignificant sinceM(m,n)⊆N S (m,n) , 2 but according to (3.19),M(m,n) must satisfy two more constraints than required byN S (m,n) , 2 , namely Q(m, 1) = Q(1,n) = 0,∀Q ∈ M(m,n). Since the number of free parameters inN S (m,n) , 2 is (m +n− 3), the dimension ofM(m,n) is atmost (m +n− 5). Of the two dimension-wise significant parts, Lemma 3.1 gives an analytically simple parametrization for one part, namelyN 0 (m,n). The other part, namelyN 2 (m,n), admits a dimension recursive definition. At this time, an analytically simple parametrization ofN 2 (m,n) analogous toN 0 (m,n) remains elusive. These results hint that development of provably correct non-randomized coding strategies promoting signal identifiability under blind deconvolution may need to be quite sophisticated. In particular, the codes must disallow ambiguities arising from the recursive definition ofN 2 (m,n). For provably correct simple randomized linear coding strategies like [25], Proposition 3.2 hints that a coding redundancy of Θ(m +n) is intuitively necessary to prevent bad realizations of random codes with high probability. Proposition 3.4 lower bounds the dimension of the insignificant setM(m,n) in the special case of m =n. This might be interpreted as the additional ambiguity arising from having to distinguish between equi- dimensional vectorsx,y∈R n under a bilinear operator like linear convolution; since (x,y) being a solution to the unconstrained blind deconvolution problem also implies (y,x) as a solution. This ambiguity does not arise whenx∈R m ,y∈R n and m6=n. 3.4 Main Unidentifiability Results We shall use identifiability in the sense of Definition 3.1. Lemma 3.5 states a re-parameterization result that will be used extensively in the proofs of the results in this section. We show a strong unidentifiability result for non-sparse blind deconvolution in Section 3.4.1. The proof strategy yields valuable insight for the sparsity constrained blind deconvolution identifiability results that we present in Section 3.4.2. Throughout this section, we assume thatK represents the (not necessarily convex) feasible cone in Problem (P 6 ), i.e.∀(x,y)∈K one has (αx,αy)∈K for every α6= 0, butK is allowed to change from theorem to theorem. To prove the results in this section, Lemma 3.1 suffices and we do not need the full power of Theorem 3.3. Further, 61 to be consistent with the notation for canonical-sparse cones in (3.25), we denote the set of unconstrained non-pathological d dimensional vectors by K(∅,d), w∈R d w(1)6= 0,w(d)6= 0 . (3.23) Lemma 3.5. Let d≥ 4 be an even integer andw∈R d be an arbitrary vector. Then, there exists a vector w ∗ ∈R d−1 and a scalar φ∈R such that w = w ∗ 0 0 −w ∗ cosφ sinφ . (3.24) Proof. Section 3.7.4. We note that unidentifiability is ultimately a result of ambiguities, some of which may be trivial. For blind linear deconvolution, the scaling and shift ambiguities are trivial and well known. 
We note that unidentifiability is ultimately a result of ambiguities, some of which may be trivial. For blind linear deconvolution, the scaling and shift ambiguities are trivial and well known. While the scaling ambiguity has already been eliminated by an appropriate notion of identifiability in Definition 3.1, the shift ambiguities will be eliminated by requiring the feasible set $\mathcal{K}$ to be a separable product of cones from the family $\mathcal{K}_0(\Lambda, d)$ defined in (3.25). This implies that the model orders $m$ and $n$ are known, which is critical for identifiability in SISO systems [110]. Theorems 3.6 and 3.8 explore additional, non-trivial ambiguities in blind deconvolution.

3.4.1 Non-sparse Blind Deconvolution

Theorem 3.6. Let $m,n \ge 4$ be even integers and let $\mathbf{x} \in \mathcal{K}(\emptyset, m)$ be an arbitrary vector. Let $\mathcal{K} = \mathcal{K}(\emptyset, m) \times \mathbb{R}^n$ be the feasible set. Then $(\mathbf{x},\mathbf{y}) \in \mathcal{K}$ is unidentifiable almost everywhere w.r.t. any measure over $\mathbf{y}$ that is absolutely continuous w.r.t. the $n$ dimensional Lebesgue measure.

Proof. The proof is presented in Section 3.7.5. The construction of the vector pair $(\mathbf{x}',\mathbf{y}')$ in (3.66) from the representation of the vector pair $(\mathbf{x},\mathbf{y})$ in (3.64) utilizes the rotational ambiguity inherent in blind deconvolution (see Section 3.1.1).

Theorem 3.6 shows that unidentifiability is the norm rather than the exception. In fact, the number of identifiable pairs is insignificant over continuous distributions. The theorem could also be stated for the feasible set $\mathcal{K}(\emptyset, m) \times \mathcal{K}(\emptyset, n)$ with essentially no modification of the proof strategy. Proposition 3.2 (and the discussion leading to it) intuitively suggests, from a heuristic free parameter counting argument, that $\Theta(m)$ additional constraints on $\mathbf{x}$ in Problem (P6) are almost necessary for identifiability w.r.t. $\mathcal{K} = \mathcal{K}(\emptyset, m) \times \mathbb{R}^n$. In fact, $\Theta\left(m \log^3 m\right)$ additional random linear constraints are assumed for the theoretical proofs in [25]. If such additional constraints on $\mathbf{x} \in \mathcal{K}(\emptyset, m)$ are allowed, Problem (P6) turns into a semi-blind deconvolution problem. Theorem 3.6 provides somewhat more concrete evidence towards the same intuition. Theorem 3.8 in Section 3.4.2 below also supports this intuition for sparse blind deconvolution, where the sparsity prior introduces additional constraints into Problem (P6). Our prior work [11] discusses an example of multi-hop channel estimation where $m-1$ additional subspace constraints are imposed on $\mathbf{x} \in \mathbb{R}^m$ by system design, and this leads not only to identifiability but also to efficient and provably correct recovery algorithms. A philosophically similar situation is discussed in [44], [111] using a guard interval based system for blind channel identification.

It is also instructive to compare Theorem 3.6 to the identifiability results for blind circular deconvolution under generic subspace and sparsity constraints in [104]. Since circular convolution is related to linear convolution by periodic extension of one of the two input signals (say the signal with ambient dimension $m$), Theorem 3.6 also says that blind circular deconvolution does not get any easier even with the constraint that only the first $m$ out of the $2m$ coefficients can be non-zero. However, if the subspace is chosen generically as in [104] (from a distribution with intrinsic dimension strictly larger than $m$), then elementary probabilistic/dimensional intuition suggests that the set of unidentifiable signals is avoided with probability one, which is the essence of the results in [104].
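Independent of the rotational ambiguity exploited in the proof of Theorem 3.6, the polynomial multiplication viewpoint already yields concrete unidentifiable pairs with fixed model orders: exchanging equal-degree factors between the two inputs preserves the convolution. A small added illustration with hypothetical coefficients:

```python
import numpy as np

a, b = np.array([1.0, 2.0]), np.array([1.0, -1.0, 3.0])
c, d = np.array([2.0, -3.0]), np.array([1.0, 4.0, 1.0])
x1, y1 = np.convolve(a, b), np.convolve(c, d)     # m = n = 4
x2, y2 = np.convolve(c, b), np.convolve(a, d)     # same model orders, factors swapped
assert np.allclose(np.convolve(x1, y1), np.convolve(x2, y2))
# (x2, y2) is not a scaled copy of (x1, y1), so (x1, y1) is unidentifiable (Def. 3.1):
assert np.linalg.matrix_rank(np.vstack([x1, x2])) == 2
```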
Figure 3.4: An arbitrary vector $\mathbf{s} \in \mathcal{K}_0(\Lambda, d)$ with $\Lambda = \{3,4,7,8,9,12\}$ and $d = 14$. Every $\mathbf{s} \in \mathcal{K}_0(\Lambda, d)$ is zero on the index set $\Lambda$ (indicated by blue dots). Heights of the black dashed stems, indicating values on $\Lambda^c$, can vary across different vectors in $\mathcal{K}_0(\Lambda, d)$.

We note that Theorem 3.6 states a much stronger (almost everywhere w.r.t. Lebesgue measure) unidentifiability result than any counterparts in the literature (e.g. [28]), which only assert the existence of some unidentifiable input. The requirement that the model orders $m$ and $n$ be even positive integers arises because $\mathbb{R}$ is not algebraically closed. A weaker version of Theorem 3.6, asserting only the existence of an unidentifiable signal pair in $\mathcal{K} = \mathcal{K}(\emptyset, m) \times \mathbb{R}^n$, follows readily from Theorem 3.7 in Section 3.4.2, and indeed does not require $m$ or $n$ to be even. This observation agrees with the absence of any conditions on the model orders in [28]. We mention in passing that the statement of Theorem 3.6 is asymmetric w.r.t. $\mathbf{x}$ and $\mathbf{y}$, since it applies to every $\mathbf{x}$ but not to every $\mathbf{y}$ (only almost every $\mathbf{y}$). This is slightly stronger than the measure theoretically symmetric version of Theorem 3.6 that asserts unidentifiability almost everywhere w.r.t. the $(m+n)$ dimensional Lebesgue product measure over the pair $(\mathbf{x},\mathbf{y})$. We further note that Theorem 3.6 cannot be strengthened to assert unidentifiability for every pair $(\mathbf{x},\mathbf{y}) \in \mathcal{K}$, as a result of a simple thought experiment: if such a statement were true, then reinterpreting convolution as polynomial multiplication would imply that no polynomial over the real field admits exactly two polynomial factors over the reals, which is clearly false.

The unidentifiability results in this subsection are an attempt to quantify the ill-posedness of non-sparse blind deconvolution. The main message is that if an application exhibits a bilinear observation model of linear convolution and no additional application specific structure can be imposed on the unknown variables, then it is necessary to drastically revise the system design specifications. Such revision may incorporate some form of randomized precoding of the unknowns, so that the effective bilinear operator governing the observation model looks substantially different from the convolution operator (e.g. the Gaussian random precoding used in [25]). Alternatively, sparsity in non-canonical bases could also be helpful (e.g. the Rademacher random vector signal model used in [103]), as canonical-sparsity constraints are shown to be insufficient for identifiability in the following subsection.

3.4.2 Canonical-Sparse Blind Deconvolution

In the presence of a sparsity prior on $\mathbf{y} \in \mathbb{R}^n$, Theorem 3.6 no longer applies, since a sparsity prior is necessarily generated from a measure that is not absolutely continuous w.r.t. the $n$ dimensional Lebesgue measure. Assuming that the sparsity prior is w.r.t. the canonical basis, we first prove a weak negative result in Theorem 3.7 below, asserting the existence of unidentifiable inputs. Building on these proof ideas, Theorem 3.8 is then presented, which strengthens the conclusions of Theorem 3.7. For any integer $d \ge 3$ and index set $\Lambda \subseteq \{2,3,\ldots,d-1\}$, we define the canonical-sparse domain

$$\mathcal{K}_0(\Lambda, d) \triangleq \left\{ \mathbf{w} \in \mathbb{R}^d \,\middle|\, \mathbf{w}(1) \ne 0,\; \mathbf{w}(d) \ne 0,\; \mathbf{w}(\Lambda) = \mathbf{0} \right\}. \tag{3.25}$$

We have intentionally imposed $\mathbf{w}(1) \ne 0$ and $\mathbf{w}(d) \ne 0$ for every $\mathbf{w} \in \mathcal{K}_0(\Lambda, d)$ in our generic definition of a canonical-sparse domain $\mathcal{K}_0(\Lambda, d) \subset \mathbb{R}^d$ in (3.25), in order to exclude all pathological examples.

Theorem 3.7. Let $m,n \ge 5$ be arbitrary integers. For any given index sets $\emptyset \ne \Lambda_1 \subseteq \{3,4,\ldots,m-2\}$ and $\emptyset \ne \Lambda_2 \subseteq \{3,4,\ldots,n-2\}$, let $\mathcal{K} = \mathcal{K}_0(\Lambda_1, m) \times \mathcal{K}_0(\Lambda_2, n)$. Then there exists an unidentifiable pair $(\mathbf{x},\mathbf{y}) \in \mathcal{K}$.

Proof. Section 3.7.6.
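A minimal membership check for the canonical-sparse cone (3.25), using the index set from Figure 3.4 (1-based indices, as in the text; an added sketch):

```python
import numpy as np

def in_K0(w, Lam):
    """w in K0(Lam, d) per (3.25): nonzero endpoints, zero on Lam (1-based indices)."""
    idx = np.asarray(sorted(Lam)) - 1
    return w[0] != 0 and w[-1] != 0 and not np.any(w[idx])

s = np.zeros(14)
s[[0, 1, 4, 5, 9, 10, 12, 13]] = np.random.randn(8)   # support = complement of Lam
assert in_K0(s, {3, 4, 7, 8, 9, 12})
```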
As a visualization aid, an arbitrary vector $\mathbf{s}$ in the canonical-sparse cone $\mathcal{K}_0(\Lambda, d) = \mathcal{K}_0(\{3,4,7,8,9,12\}, 14)$ is shown in Figure 3.4. A couple of comments about the premise of Theorem 3.7 are in order. Let $(\mathbf{x}_*, \mathbf{y}_*) \in \mathcal{K}$ denote an unidentifiable pair in the feasible set of Theorem 3.7.

1. Our proof technique hinges on an adversarial construction of $\mathbf{x}_*\left(\Lambda_1 \cup (\Lambda_1 - 1)\right)$ and $\mathbf{y}_*\left(\Lambda_2 \cup (\Lambda_2 - 1)\right)$. By feasibility of $(\mathbf{x}_*, \mathbf{y}_*)$, $0 \notin \{\mathbf{x}_*(1), \mathbf{x}_*(m), \mathbf{y}_*(1), \mathbf{y}_*(n)\}$ needs to be satisfied, which might conflict with our adversarial construction if $\{1,m\} \cap \left(\Lambda_1 \cup (\Lambda_1 - 1)\right) \ne \emptyset$ or $\{1,n\} \cap \left(\Lambda_2 \cup (\Lambda_2 - 1)\right) \ne \emptyset$ holds. Clearly, both of these possibilities are rendered impossible if we insist on $\Lambda_1 \subseteq \{3,4,\ldots,m-2\}$ and $\Lambda_2 \subseteq \{3,4,\ldots,n-2\}$.

2. We insist on $\Lambda_1 \ne \emptyset$ and $\Lambda_2 \ne \emptyset$ to have a strictly non-trivial realization of the sparse blind deconvolution problem, i.e. an instance which violates some assumption of Theorem 3.6 other than the model orders $m$ and $n$ being even. It is easy to see that if $\Lambda = \emptyset$ in (3.25), then $\mathcal{K}_0(\Lambda, d)$ admits a non-zero $d$ dimensional Lebesgue measure. Hence, if either $\Lambda_1$ or $\Lambda_2$ is empty, then the sparse blind deconvolution problem instance so generated falls under the purview of Theorem 3.6 (assuming even $m$ and $n$). Furthermore, $\Lambda_1 \ne \emptyset$ and $\Lambda_2 \ne \emptyset$ respectively imply $m \ge 5$ and $n \ge 5$.

Exploiting the equivalence between bilinear inverse problems and rank-one matrix recovery problems (see Theorem 2.1 in Chapter 2), the result of Theorem 3.7 can be interpreted as evidence of the null space of the convolution operator admitting simultaneously sparse and low-rank matrices. For rank-one matrix completion problems [5], it is relatively straightforward to see that a random sampling operator on a sparse rank-one matrix will return zeros on most samples, thus rendering it impossible to distinguish the rank-one matrix in question from the all zero matrix. However, the same observation is not at all straightforward for a rank-one matrix recovery problem [70] when the sampling operator is fixed to the lifted linear convolution map. Theorem 3.7 asserts that this is indeed true and that the lifted linear convolution map $\mathscr{S}(\cdot)$ admits non-zero canonical-sparse matrices within its rank-two null space.

While Theorem 3.7 asserts the existence of unidentifiable inputs within the feasible domain $\mathcal{K}_0(\Lambda_1, m) \times \mathcal{K}_0(\Lambda_2, n)$, it says nothing about the size of the set of all unidentifiable inputs in this domain. Indeed, the feasible domain $\mathbb{R}^m \times \mathbb{R}^n$ in Theorem 3.6 is essentially unstructured and therefore the set of all unidentifiable inputs is quite large (almost every input is unidentifiable). In contrast, the feasible domain of Theorem 3.7 is far more structured and hence it is intuitive to expect that the set of all unidentifiable inputs within this domain should be much smaller. This intuition is quantified by Theorem 3.8, which strengthens the conclusions of Theorem 3.7 by improving the analysis in its proof. We recall that $\mathcal{K}/\mathrm{Id}_R$ denotes the set of all equivalence classes induced by the equivalence relation $\mathrm{Id}_R(\cdot,\cdot)$ on $\mathcal{K}$, where $\mathrm{Id}_R(\cdot,\cdot)$ is the equivalence relation defined in Section 3.2.1. Let us denote the Minkowski sum of the sets $\{-1\}$ and $\Lambda$ by the shorthand notation $\Lambda - 1$, defined as

$$\Lambda - 1 = \{-1\} + \Lambda \triangleq \{ j-1 \mid j \in \Lambda \}. \tag{3.26}$$

The proof of Theorem 3.8 requires Lemma 3.9, which is also stated below.

Theorem 3.8. Let $m,n \ge 5$ be arbitrary integers. For any given index sets $\emptyset \ne \Lambda_1 \subseteq \{3,4,\ldots,m-2\}$ and $\emptyset \ne \Lambda_2 \subseteq \{3,4,\ldots,n-2\}$, let $\mathcal{K} = \mathcal{K}_0(\Lambda_1, m) \times \mathcal{K}_0(\Lambda_2, n)$ and define $p_j \triangleq \left|\Lambda_j \cup (\Lambda_j - 1)\right|$ for $j \in \{1,2\}$. Then there exists a set $\mathcal{G}_* \subseteq \mathcal{K}/\mathrm{Id}_R$ of dimension $(m+n-1-p_1-p_2)$ such that every $(\mathbf{x},\mathbf{y}) \in \mathcal{G}_*$ is unidentifiable.

Proof. Section 3.7.7.

We notice that the assumptions of Theorem 3.8 imply $p_1 \le (m-3)$ and $p_2 \le (n-3)$, so that the unidentifiable subset of $\mathcal{K}/\mathrm{Id}_R$ is at least 5 dimensional and is always non-trivial.
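The dimension bookkeeping in Theorem 3.8 is easy to tabulate; for instance, with the hypothetical index sets below (chosen to satisfy $\Lambda_1 \subseteq \{3,\ldots,m-2\}$ and $\Lambda_2 \subseteq \{3,\ldots,n-2\}$):

```python
m, n = 12, 10
Lam1, Lam2 = {3, 4, 9}, {5, 6}
p1 = len(Lam1 | {j - 1 for j in Lam1})      # |Lam1 U (Lam1 - 1)| = 5 <= m - 3
p2 = len(Lam2 | {j - 1 for j in Lam2})      # |Lam2 U (Lam2 - 1)| = 3 <= n - 3
dim_G = m + n - 1 - p1 - p2                 # = 13 >= 5, as guaranteed above
```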
We also note that the canonical-sparse feasible domain of Theorem 3.8 is far more structured than the feasible set of Theorem 3.6 and hence the set of all unidentifiable inputs in Theorem 3.8 is dimension-wise much smaller. Nonetheless, the canonical-sparsity structure is not strong enough to guarantee identifiability for all canonical-sparse vectors. Additionally, note that $\Lambda_1$ and $\Lambda_2$ denote sets of zero indices, so that a larger cardinality of $\Lambda_1$ or $\Lambda_2$ implies a sparser problem instance. Yet another way of interpreting Theorem 3.8 is that regardless of the sparse support pattern specified by the sets $\Lambda_1$ and $\Lambda_2$, there exists a sizable set of input signals satisfying these support constraints while being unidentifiable under blind linear deconvolution. Note that this does not preclude the existence of a sizable set of input signals satisfying the support constraints that are identifiable under blind linear deconvolution.

Lemma 3.9. Let $d \ge 2$ be an arbitrary integer and let $\mathbf{w} \in \left\{ \mathbf{w}' \in \mathbb{R}^d \mid \mathbf{w}'(1) \ne 0,\; \mathbf{w}'(d) \ne 0 \right\}$ be an arbitrary vector. The quotient set $\mathcal{Q}_\sim(\mathbf{w}, d)$ defined as

$$\mathcal{Q}_\sim(\mathbf{w}, d) \triangleq \left\{ (\mathbf{w}_*, \gamma) \in \mathbb{R}^{d-1} \times [0, 2\pi) \,\middle|\, \mathbf{w} = \begin{bmatrix} \mathbf{w}_* & \mathbf{0} \\ 0 & -\mathbf{w}_* \end{bmatrix} \begin{bmatrix} \cos\gamma \\ \sin\gamma \end{bmatrix} \right\} \tag{3.27}$$

is finite, with cardinality at most $(2d-2)$.

Proof. Section 3.7.8.

It is important to note that Theorems 3.6 and 3.8 are of different flavors and are not comparable, since they make different assumptions on the feasible domain. In particular, neither theorem universally implies the other, even if we consider special cases of each theorem. To make this point more explicit, we make the following observations.

1. Theorem 3.6 asserts almost everywhere unidentifiability within the feasible set $\mathcal{K}(\emptyset, m) \times \mathbb{R}^n$, but this does not imply the conclusions of Theorem 3.8, since the feasible set in the latter theorem is $\mathcal{K}_0(\Lambda_1, m) \times \mathcal{K}_0(\Lambda_2, n)$, which is a measure zero set w.r.t. the $(m+n)$ dimensional Lebesgue measure associated with the Cartesian product space $\mathcal{K}(\emptyset, m) \times \mathbb{R}^n$.

2. In Theorem 3.8, even if we assume the model orders $m$ and $n$ to be even and the zero index sets $\Lambda_1$ and $\Lambda_2$ to be empty (so that the feasible set $\mathcal{K}_0(\Lambda_1, m) \times \mathcal{K}_0(\Lambda_2, n)$ is equal to $\mathcal{K}(\emptyset, m) \times \mathbb{R}^n$ almost everywhere w.r.t. the $(m+n)$ dimensional Lebesgue measure), we can only draw the conclusion that there exists an $(m+n)$ dimensional unidentifiable subset (call it $\mathcal{G}_{\mathrm{unID}}$) of the feasible set. This is clearly insufficient to show almost everywhere unidentifiability (as claimed by Theorem 3.6), since the complement of $\mathcal{G}_{\mathrm{unID}}$ could be an $(m+n)$ dimensional set as well, admitting non-zero $(m+n)$ dimensional Lebesgue measure.

3. If $\Lambda_1 = \Lambda_2 = \emptyset$ is considered with even model orders $m,n \ge 4$, so that the feasible sets in Theorems 3.6 and 3.8 are equal almost everywhere w.r.t. the $(m+n)$ dimensional Lebesgue measure, then the conclusion of Theorem 3.6 is stronger, since almost everywhere unidentifiability over $\mathcal{K}(\emptyset, m) \times \mathbb{R}^n$ automatically implies the existence of an $(m+n)$ dimensional unidentifiable subset.
3.4.3 Mixed Extensions

It is possible to fuse the ideas from Theorems 3.6 and 3.8 to develop analogous results for domains $\mathcal{K}$ formed by the Cartesian product of a sparse and a non-sparse set. Below, we develop two corollaries that extend the results of Theorems 3.6 and 3.8 differently. Corollary 3.10 represents a weak unidentifiability result, extending primarily from Theorem 3.8, whereas Corollary 3.11 represents a strong unidentifiability result, implementing an extension of Theorem 3.6. Unlike Theorems 3.6 and 3.8, the corollaries below are in fact comparable (although only for even model order $n \ge 4$), and the implications of Corollary 3.11 can be clearly seen to be stronger than those of Corollary 3.10.

Corollary 3.10. Let $m \ge 5$ and $n \ge 2$ be arbitrary integers. For any given index set $\emptyset \ne \Lambda \subseteq \{3,4,\ldots,m-2\}$, let $\mathcal{K} = \mathcal{K}_0(\Lambda, m) \times \mathbb{R}^n$ and define $p \triangleq \left|\Lambda \cup (\Lambda - 1)\right|$. Then there exists a set $\mathcal{G}_* \subseteq \mathcal{K}/\mathrm{Id}_R$ of dimension $(m+n-p-1)$ such that every $(\mathbf{x},\mathbf{y}) \in \mathcal{G}_*$ is unidentifiable.

Proof. Section 3.7.9.

Corollary 3.11. Let $m \ge 5$ be an arbitrary integer and $n \ge 4$ an even integer. For any given index set $\emptyset \ne \Lambda \subseteq \{3,4,\ldots,m-2\}$, let $\mathcal{K} = \mathcal{K}_0(\Lambda, m) \times \mathbb{R}^n$ and define $p \triangleq \left|\Lambda \cup (\Lambda - 1)\right|$. Then there exists a set $\mathcal{G}_0 \subseteq \mathcal{K}_0(\Lambda, m) \cap \left\{ \mathbf{w} \in \mathbb{R}^m \mid \|\mathbf{w}\|_2 = 1 \right\}$ of dimension $(m-p-1)$ such that $\forall \mathbf{x} \in \mathcal{G}_0$, $(\mathbf{x},\mathbf{y}) \in \mathcal{K}$ is unidentifiable almost everywhere w.r.t. any measure over $\mathbf{y}$ that is absolutely continuous w.r.t. the $n$ dimensional Lebesgue measure.

Proof. Section 3.7.10.

Note that Corollary 3.11 is a non-trivial extension of Theorem 3.6, since Definition 3.1 defines identifiability of an input signal pair within some feasible set $\mathcal{K}$, and this set is different for Theorem 3.6 and Corollary 3.11. In particular, if $(\mathbf{x}_*, \mathbf{y}_*) \in \mathcal{K}$ is an unidentifiable pair in the feasible set of Corollary 3.11, then the proof of the result involves the construction of a candidate adversarial input $\mathbf{x}_* \in \mathcal{K}_0(\Lambda, m)$ as in Theorem 3.8, as well as the vector $\mathbf{y}_* \in \mathbb{R}^n$ as in Theorem 3.6.

Through the unidentifiability results in this section, we have attempted to quantify the ill-posedness of blind deconvolution under canonical-sparsity constraints and to expose the underlying geometric reasons for it. Roughly speaking, if the set of zero indices for the signal vector $\mathbf{x} \in \mathbb{R}^m$ has cardinality $|\Lambda|$, then the dimension of the set of unidentifiable choices for $\mathbf{x}$ is reduced from $m$ in Theorem 3.6 to somewhere between $(m - |\Lambda| - 1)$ and $(m - 2|\Lambda|)$ in Corollary 3.11. The main message is that if an application exhibits a bilinear observation model of linear convolution and the additional application specific structure is that of canonical sparsity of the unknown variables, then it is necessary to drastically revise the system design specifications for any hope of signal identifiability. Such revision may incorporate some form of randomized precoding of the unknowns, so that the effective bilinear operator governing the observation model looks substantially different from the convolution operator (e.g. the Gaussian random precoding used in [25]). Alternatively, sparsity in non-canonical bases could also be helpful (e.g. the Rademacher random vector signal model used in Theorem 2.13 of Chapter 2 to show identifiability).

3.5 Unidentifiability Results with Coding

Structural information about unknown input signals is almost exclusively a result of specific application dependent data or system deployment architectures. In this section, we consider an abstraction of a problem in multi-hop channel estimation [11], [48] and analyze other subspace based structural priors relevant to this application.
Studying constrained adversarial realizations of these abstractions has utility far beyond communication systems. For example, when communication subsystems are parts of a larger complex system involving sensing and navigation for autonomous vehicles [112]–[116], non-linear inter-system interactions may become unavoidable. In this section, we extend our unidentifiability results from Section 3.4 to simple forms of coding across the transmitters for the second-hop channel estimation problem. Section 3.5.1 considers the somewhat idealized, but simpler to interpret, scenario of repetition coding across transmitters and states unidentifiability results for both unstructured and canonical-sparse channels. The ideas are generalized by Corollary 3.15 in Section 3.5.2, and Corollary 3.16 states yet another important special case pertaining to geometrically decaying subspace sparse signals, previously considered in [117]. In the spirit of using commonly employed notation, we shall denote the unknown pair $(\mathbf{x},\mathbf{y})$ by the pair $(\mathbf{g},\mathbf{h})$, with $\mathbf{g}$ pertaining to a topological configuration of the network and $\mathbf{h}$ representing the channel impulse response, as described next.

In a previous paper [48], a structured channel model is developed based on the multichannel approximation for the second hop in a relay assisted communication topology, shown in Figure 3.5. Specifically, we consider $k$ relays $R_1, R_2, \ldots, R_k$ and let $\mathbf{h} \in \mathbb{R}^n$ denote the propagation delay and power adjusted SISO channel impulse response (CIR) common to each relay-destination $R_j \to D$ channel, $j \in \{1,2,\ldots,k\}$.

Figure 3.5: Two hop relay assisted communication link topology between the source S and the destination D with $k$ intermediate relays $R_1, R_2, \ldots, R_k$. The $j$th SISO channel is scaled by $\mathbf{g}(l_j)$ and delayed by $l_j$ units (represented as $\mathbf{g}(l_j)\mathbf{D}^{-l_j}\mathbf{h}$) for $1 \le j \le k$.

Thus, the effective CIR at the destination is formed by a superposition of $k$ distinct delayed scalar multiples of the vector $\mathbf{h}$. If the maximum propagation delay of $R_k$ relative to $R_1$ is upper bounded by $m$, then there exist a vector $\mathbf{g} \in \mathbb{R}^m$ and an index subset $\{l_1, l_2, \ldots, l_k\} \subseteq \{1,2,\ldots,m\}$ (with $l_1 = 1$) such that the propagation delay adjusted SISO CIR for the $R_j \to D$ channel is $\mathbf{g}(l_j)\mathbf{h}$ and that $\mathbf{g}(l) = 0$, $\forall l \notin \{l_1, l_2, \ldots, l_k\}$. With these definitions, and letting $\mathbf{D}^{-l} \in \{0,1\}^{(m+n-1) \times n}$ denote the Toeplitz matrix representation of delay by $l$ units, $\forall\, 1 \le l \le m$, the effective CIR at the destination is given by the linear convolution $\mathbf{z} = \sum_{j=1}^{k} \mathbf{g}(l_j)\mathbf{D}^{-l_j}\mathbf{h} = \mathbf{g} \star \mathbf{h} \in \mathbb{R}^{m+n-1}$, and the channel estimation problem at the destination is to recover the vector pair $(\mathbf{g},\mathbf{h})$ from the observation of $\mathbf{z}$. Since the vectors $\mathbf{g}$ and $\mathbf{h}$ are not arbitrary, but have physical interpretations leading to structural restrictions, we can incorporate this knowledge through a constraint of the form $(\mathbf{g},\mathbf{h}) \in \mathcal{K}$ for some set $\mathcal{K} \subseteq \mathbb{R}^m \times \mathbb{R}^n$. For example, if $\mathbf{h}$ represents a SISO CIR for underwater acoustic communication [99], [100] (or, more generally, for wide-band communication [118]), then $\mathbf{h}$ has been observed to exhibit canonical-sparsity. Now, associating the pair $(\mathbf{g},\mathbf{h})$ with a feasible solution to Problem (P6), the success of channel estimation at the destination is critically contingent upon the identifiability of $(\mathbf{g},\mathbf{h})$ as a solution to Problem (P6) by the criterion in Definition 3.1. Like all the prior results, we shall consider $\mathcal{K} = \mathcal{D}_1 \times \mathcal{D}_2$ to be a separable product of cones in $\mathbb{R}^m \times \mathbb{R}^n$ with $\mathbf{g} \in \mathcal{D}_1 \subseteq \mathbb{R}^m$ and $\mathbf{h} \in \mathcal{D}_2 \subseteq \mathbb{R}^n$.
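A sketch of the multichannel identity $\mathbf{z} = \sum_j \mathbf{g}(l_j)\mathbf{D}^{-l_j}\mathbf{h} = \mathbf{g} \star \mathbf{h}$, with hypothetical delays and gains (delays are 1-based with $l_1 = 1$):

```python
import numpy as np

m, n, delays = 8, 5, [1, 3, 6]                 # k = 3 relays, relative delays l_j
h = np.random.randn(n)                          # common SISO CIR
g = np.zeros(m)
g[np.asarray(delays) - 1] = np.random.randn(len(delays))
z = np.zeros(m + n - 1)
for l in delays:
    z[l - 1 : l - 1 + n] += g[l - 1] * h        # g(l_j) * (h delayed by l_j units)
assert np.allclose(z, np.convolve(g, h))        # superposition equals g * h
```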
In many applications, it is typical for $\mathcal{D}_1$ (respectively, $\mathcal{D}_2$) to be a low-dimensional subset of $\mathbb{R}^m$ (respectively, $\mathbb{R}^n$). An important interpretation of $\mathbf{g} \in \mathcal{D}_1$ is that $\mathbf{g}$ can be considered a coded representation (of ambient dimension $m$) of an unknown whose actual dimension is much smaller (equal to the Hausdorff dimension of $\mathcal{D}_1$). However, this coded representation is invoked to help strengthen the identifiability properties of the problem, rather than to help with error correction in the presence of observation noise. This is a somewhat different interpretation of coding than the classical goal of correcting noise induced errors. Operationally, both interpretations rely on redundancy, albeit for different purposes.

3.5.1 Repetition Coding

Suppose that a few of the relays in Figure 3.5 cooperatively decide to maintain the same relative transmission powers and phases, and that this decision is communicated to the destination for use as side information for channel estimation. A physical motivation for allowing few rather than all relays to cooperate may be attributed to the infeasibility of such coordination for widely separated relays and/or for a large number of relays. Mathematically, this translates to the use of a smaller feasible set $\mathcal{K}$ by virtue of a stricter structural restriction on $\mathbf{g}$ via a set $\mathcal{D}_1$ that is smaller than $\mathbb{R}^m$. Specifically, let $\Lambda' \subseteq \{l_1, l_2, \ldots, l_k\} \subseteq \{1,2,\ldots,m\}$ be the index set of propagation delays corresponding to the cooperating relays. Assuming that the only structure on $\mathbf{g} \in \mathcal{D}_1$ is that induced by the cooperating relays, we define the repetition coded domain as $\mathcal{D}_1 = \mathcal{K}_1(\Lambda', m)$, where $(\Lambda', m)$ act as parameters to the following parametrized definition of a family of cones for any dimension $d \ge 2$ and any index set $\Lambda \subseteq \{1,2,\ldots,d\}$,

$$\mathcal{K}_1(\Lambda, d) \triangleq \left\{ \mathbf{w} \in \mathbb{R}^d \,\middle|\, \mathbf{w}(1) \ne 0,\; \mathbf{w}(d) \ne 0,\; \exists c \in \mathbb{R} \setminus \{0\} \text{ such that } \mathbf{w}(\Lambda) = c\mathbf{1} \right\}. \tag{3.28}$$

In other words, the set $\mathcal{D}_1$ imposes the structure that each of its member vectors has the same non-zero value $c$ on the index subset $\Lambda'$, with the value of $c$ allowed to vary across members. Even for a large number of cooperating relays (as measured by the cardinality $|\Lambda'|$), Corollaries 3.12 and 3.13 stated below imply unidentifiability results for $\mathbf{h} \in \mathbb{R}^n$ (unstructured channel) and $\mathbf{h} \in \mathcal{K}_0(\Lambda'', n)$ (canonical-sparse channel), respectively. Corollary 3.12 makes a statement for the repetition coded domain $\mathcal{K}_1(\Lambda', m)$ that is analogous to the statement made by Corollary 3.10 for the canonical-sparse domain $\mathcal{K}_0(\Lambda, m)$. However, the proof of Corollary 3.12 needs different constructions for a candidate unidentifiable input signal than those used in the proof of Corollary 3.10, since the feasible domain $\mathcal{K}$ is different across these results. The proof of Corollary 3.13 is based mostly on that of Corollary 3.12, with key modifications to accommodate the difference in the feasible domain. Corollary 3.13 can further be interpreted as extending the unidentifiability results of Theorem 3.8 from the canonical-sparse product domain $\mathcal{K}_0(\Lambda_1, m) \times \mathcal{K}_0(\Lambda_2, n)$ to the mixed product of repetition coded and canonical-sparse domains $\mathcal{K}_1(\Lambda', m) \times \mathcal{K}_0(\Lambda'', n)$. Roughly speaking, because we only consider separable domains $\mathcal{K}$, one can simply fuse the different constructions of a candidate adversarial signal $\mathbf{x}_* \in \mathcal{K}_1(\Lambda', m)$ from Corollary 3.12 and $\mathbf{y}_* \in \mathcal{K}_0(\Lambda'', n)$ from Corollary 3.10 to produce a candidate unidentifiable pair $(\mathbf{x}_*, \mathbf{y}_*) \in \mathcal{K}_1(\Lambda', m) \times \mathcal{K}_0(\Lambda'', n)$ for Corollary 3.13.
This ability to fuse individual adversarial signal constructions on the sets $\mathcal{D}_1$ and $\mathcal{D}_2$ to produce candidate unidentifiable signal pairs within the Cartesian product domain $\mathcal{K} = \mathcal{D}_1 \times \mathcal{D}_2$ underlies the simplicity of the arguments in the proof of Corollary 3.15. At present, it is unclear whether there exist adaptations of the constructions used in the proof of Corollary 3.15 that would enable analogous statements for non-separable domains $\mathcal{K}$ in a generalizable way.

Corollary 3.12. Let $m \ge 3$ and $n \ge 2$ be arbitrary integers. For any given index set $\emptyset \ne \Lambda' \subseteq \{2,3,\ldots,m-1\}$, let $\mathcal{K} = \mathcal{K}_1(\Lambda', m) \times \mathbb{R}^n$ and define $p \triangleq \left|\Lambda' \cup (\Lambda' - 1)\right|$. Then there exists a set $\mathcal{G}_* \subseteq \mathcal{K}/\mathrm{Id}_R$ such that every $(\mathbf{g},\mathbf{h}) \in \mathcal{G}_*$ is unidentifiable. If $\Lambda' \cap (\Lambda' - 1) \ne \emptyset$, then $\mathcal{G}_*$ is of dimension at least $(m+n-p)$; otherwise $\mathcal{G}_*$ is of dimension at least $(m+n-p+1)$.

Proof. Section 3.7.11.

Corollary 3.13. Let $m \ge 3$ and $n \ge 5$ be arbitrary integers. For any given index sets $\emptyset \ne \Lambda' \subseteq \{2,3,\ldots,m-1\}$ and $\emptyset \ne \Lambda'' \subseteq \{3,4,\ldots,n-2\}$, let $\mathcal{K} = \mathcal{K}_1(\Lambda', m) \times \mathcal{K}_0(\Lambda'', n)$ and let $p_1 \triangleq \left|\Lambda' \cup (\Lambda' - 1)\right|$ and $p_2 \triangleq \left|\Lambda'' \cup (\Lambda'' - 1)\right|$. Then there exists a set $\mathcal{G}_* \subseteq \mathcal{K}/\mathrm{Id}_R$ such that every $(\mathbf{g},\mathbf{h}) \in \mathcal{G}_*$ is unidentifiable. If $\Lambda' \cap (\Lambda' - 1) \ne \emptyset$, then $\mathcal{G}_*$ is of dimension at least $(m+n-p_1-p_2)$; otherwise $\mathcal{G}_*$ is of dimension at least $(m+n+1-p_1-p_2)$.

Proof. Section 3.7.12.

We note that, in contrast to the results for the canonical-sparse domain $\mathcal{K}_0(\Lambda, m)$ in Corollary 3.10, the dimension of the unidentifiable subset for the repetition coded domain $\mathcal{K}_1(\Lambda', m)$ as given by Corollary 3.12 depends on whether $\Lambda' \cap (\Lambda' - 1)$ is non-empty, i.e. whether at least one pair of adjacent indices for $\mathbf{g} \in \mathcal{K}_1(\Lambda', m)$ is repetition coded and hence belongs to $\Lambda'$. Corollary 3.13 essentially says that even with a canonical-sparse prior structure on the channel vector $\mathbf{h} \in \mathbb{R}^n$ and nearly full cooperation between intermediate relays ($|\Lambda'| = \Theta(m)$) via repetition coding on $\mathbf{g} \in \mathbb{R}^m$, there may be non-zero dimensional unidentifiable signal subsets in the feasible domain $\mathcal{K}$. We recall that the index subsets $\Lambda'$ and $\Lambda''$ in Corollary 3.13 are known at the receiver, and the unidentifiable signal set exists despite the availability of this side information. Thus, the repetition coding based system architecture laid out in this section is bound to fail in practice, although it may be an intuitive choice. An important implication of Corollary 3.13 is that it is necessary for identifiability to have $\Lambda' \cap \{1,m\} \ne \emptyset$, which is equivalent to the requirement that either the first relay or the last relay (when ordered by increasing propagation delays) must be present in the cooperating subset of relays.

Figure 3.6: An arbitrary vector $\mathbf{s} \in \mathcal{K}_{\mathbf{b}}(\Lambda, d)$ with $\Lambda = \{3,4,7,8,9,12\}$, $d = 14$ and $\mathbf{b} = (-0.5, -0.835, 0.3, 0.5, 0.835, 0.15)^T$. The specific vector $\mathbf{s}$ in the left plot satisfies $\mathbf{s}(\Lambda) = c\mathbf{b}$ with $c = -1$, verifying (3.29) with the code vector $\mathbf{b}$ as in the right plot. Every $\mathbf{s} \in \mathcal{K}_{\mathbf{b}}(\Lambda, d)$ is collinear with $\mathbf{b}$ on the index set $\Lambda$ (indicated by the union of solid red and dashed red stems), with $\Lambda$ representing the identities of the cooperating relays. Heights of the double dashed black stems, indicating values on $\Lambda^c$, represent contributions from non-cooperating relays and can vary across different vectors in $\mathcal{K}_{\mathbf{b}}(\Lambda, d)$. Letting $\Lambda_* = \{1,3,4\}$, $(\Lambda, \mathbf{b})$ is a type 1 pair by Definition 3.2, validated by $\mathbf{s}\left(\Lambda \cap (\Lambda - 1)\right) = -\mathbf{b}(\Lambda_*)$ (indicated by solid red stems on both plots) being collinear with $\mathbf{s}\left(\Lambda \cap (\Lambda - 1) + 1\right) = -\mathbf{b}(\Lambda_* + 1)$ (pointed to by black arrows on both plots). The union of the blue braces on the left plot denotes the index set $\Lambda \cup (\Lambda - 1) = \{2,3,4,6,7,8,9,11,12\}$.
3.5.2 Partially Cooperative Coding

The results in Section 3.5.1 can be generalized to other codes. Consider the system described in Section 3.5.1, except that the cooperating relays decide on a structural restriction for $\mathbf{g} \in \mathcal{D}_1 \subseteq \mathbb{R}^m$ that is different from Section 3.5.1. As before, let $\Lambda' \subseteq \{1,2,\ldots,m\}$ denote the index set of propagation delays corresponding to the cooperating relays. We let $\mathcal{D}_1 = \mathcal{K}_{\mathbf{b}}(\Lambda', m)$ denote a partially cooperative coded domain, where the vector $\mathbf{b} \in \mathbb{R}^{|\Lambda'|}$ denotes the cooperative code (alternatively, $\mathbf{b}$ could also be interpreted as a power-delay profile across cooperating relays) and $(\Lambda', m)$ are specific parameters to the following parametrized definition of a family of cones for any dimension $d \ge 2$ and any index set $\Lambda \subseteq \{1,2,\ldots,d\}$,

$$\mathcal{K}_{\mathbf{b}}(\Lambda, d) \triangleq \left\{ \mathbf{w} \in \mathbb{R}^d \,\middle|\, \mathbf{w}(1) \ne 0,\; \mathbf{w}(d) \ne 0,\; \exists c \in \mathbb{R} \setminus \{0\} \text{ such that } \mathbf{w}(\Lambda) = c\mathbf{b} \right\}. \tag{3.29}$$

As a visual example, Figure 3.6 shows an arbitrary vector $\mathbf{s}$ in the partially cooperative coded domain $\mathcal{K}_{\mathbf{b}}(\Lambda, d) = \mathcal{K}_{\mathbf{b}}(\{3,4,7,8,9,12\}, 14)$ for the code vector $\mathbf{b} = (0.5, 0.835, -0.3, -0.5, -0.835, -0.15)^T$. It is important to note that the code vector $\mathbf{b} \in \mathbb{R}^{|\Lambda'|}$ is known at the destination, so that the decoder has explicit knowledge of the set $\mathcal{D}_1 = \mathcal{K}_{\mathbf{b}}(\Lambda', m)$. Furthermore, the partially cooperative coded domain $\mathcal{K}_{\mathbf{b}}(\Lambda, d)$ unifies and generalizes both the repetition coded domain $\mathcal{K}_1(\Lambda, d)$ and the canonical-sparse domain $\mathcal{K}_0(\Lambda, d)$. To see this, we first note that $\mathbf{b} = \mathbf{0}$ implies $\mathcal{K}_{\mathbf{b}}(\Lambda, d) = \mathcal{K}_0(\Lambda, d)$ by definition, and that the definitions (3.28) and (3.29) are equivalent for $\mathbf{b} = \mathbf{1}$. Secondly, if the code vector $\mathbf{b} \in \mathbb{R}^{|\Lambda_1 \cup \Lambda_2|}$ admits a partition of the form $\left[\mathbf{b}(\Lambda_1)^T, \mathbf{b}(\Lambda_2)^T\right] = \left[\mathbf{0}^T, \mathbf{1}^T\right]$ for some disjoint index subsets $\Lambda_1, \Lambda_2 \subseteq \{1,2,\ldots,d\}$, then the definitions in (3.29), (3.28) and (3.25) imply that $\mathcal{K}_{\mathbf{b}}(\Lambda_1 \cup \Lambda_2, d) = \mathcal{K}_0(\Lambda_1, d) \cap \mathcal{K}_1(\Lambda_2, d)$. This can be checked directly by substitution.

Corollaries 3.14 and 3.15 stated below imply unidentifiability results for certain joint properties of the $(\Lambda', \mathbf{b})$ pair, i.e. for certain joint configurations of the cooperating relay index set and the cooperative code employed. The unidentifiability results may hold for both $\mathbf{h} \in \mathbb{R}^n$ (unstructured channel) and $\mathbf{h} \in \mathcal{K}_{\mathbf{b}}(\Lambda'', n)$ (subspace-sparse channel), even for a large number of cooperating relays (as measured by the cardinality $|\Lambda'|$). In what follows, $\Lambda_* + 1$ denotes the Minkowski sum of the sets $\{1\}$ and $\Lambda_*$.

Corollary 3.14. Let $m \ge 3$ and $n \ge 2$ be arbitrary integers. For any given index set $\emptyset \ne \Lambda' \subseteq \{2,3,\ldots,m-1\}$, let $\mathbf{b} \ne \mathbf{0}$, $\mathcal{K} = \mathcal{K}_{\mathbf{b}}(\Lambda', m) \times \mathbb{R}^n$ and define $p \triangleq \left|\Lambda' \cup (\Lambda' - 1)\right|$. If $\Lambda' \cap (\Lambda' - 1) = \emptyset$, then there exists a set $\mathcal{G}_* \subseteq \mathcal{K}/\mathrm{Id}_R$ of dimension at least $(m+n-p+1)$ such that every $(\mathbf{g},\mathbf{h}) \in \mathcal{G}_*$ is unidentifiable. Otherwise, if $\Lambda' \cap (\Lambda' - 1) \ne \emptyset$, then let $\emptyset \ne \Lambda_* \subseteq \{1,2,\ldots,|\Lambda'|\}$ denote the index subset such that $\forall \mathbf{g} \in \mathcal{K}_{\mathbf{b}}(\Lambda', m)$, $\mathbf{g}\left(\Lambda' \cap (\Lambda' - 1)\right)$ is collinear with $\mathbf{b}(\Lambda_*)$. If $\mathbf{b}(\Lambda_*)$ and $\mathbf{b}(\Lambda_* + 1)$ are collinear, then there exists a set $\mathcal{G}_* \subseteq \mathcal{K}/\mathrm{Id}_R$ of dimension at least $(m+n-p)$ such that every $(\mathbf{g},\mathbf{h}) \in \mathcal{G}_*$ is unidentifiable.

Proof. Section 3.7.13.
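A membership sketch for the family (3.29); by construction it also covers $\mathcal{K}_0$ ($\mathbf{b} = \mathbf{0}$) and $\mathcal{K}_1$ ($\mathbf{b} = \mathbf{1}$) as the special cases noted above (1-based $\Lambda$, with $\mathbf{b}$ indexed along sorted $\Lambda$; an added illustration):

```python
import numpy as np

def in_Kb(w, Lam, b):
    """w in K_b(Lam, d) per (3.29): w(1), w(d) nonzero and w(Lam) = c*b with c != 0."""
    idx = np.asarray(sorted(Lam)) - 1
    if w[0] == 0 or w[-1] == 0:
        return False
    wl, b = w[idx], np.asarray(b, dtype=float)
    if not np.any(b):                              # b = 0 recovers K0(Lam, d)
        return not np.any(wl)
    if not np.any(wl):                             # c must be non-zero
        return False
    return np.linalg.matrix_rank(np.vstack([wl, b])) == 1

w = np.random.randn(10)
w[[2, 3, 6]] = 2.5 * np.array([0.5, -1.0, 0.25])   # w(Lam) = c*b with c = 2.5
assert in_Kb(w, {3, 4, 7}, [0.5, -1.0, 0.25])
assert in_Kb(np.array([1.0, 3.0, 3.0, 3.0, 2.0]), {2, 3, 4}, np.ones(3))  # K1 case
```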
Corollary 3.14 makes a statement for the partially cooperative coded domain $\mathcal{K}_{\mathbf{b}}(\Lambda', m)$ that is essentially analogous to the statement of Corollary 3.12 for the repetition coded domain $\mathcal{K}_1(\Lambda', m)$, the only technical distinction being a more elaborate specification of the unidentifiable configurations of the pair $(\Lambda', \mathbf{b})$ in the former result. Some of the technical conditions on the pair $(\Lambda', \mathbf{b})$ for generating unidentifiable configurations are automatically satisfied when $\mathbf{b} = \mathbf{1}$, and hence we chose to present this simpler instantiation of Corollary 3.14 in the form of Corollary 3.12 under the rubric of repetition coding in the last subsection. Corollary 3.15 stated below fuses together the unidentifiability results of Corollaries 3.13 and 3.14 as an attempt at generalization. In particular, both domains $\mathcal{K}_{\mathbf{b}}(\Lambda', m)$ and $\mathcal{K}_1(\Lambda', m)$ share the same construction for a candidate unidentifiable input signal, provided the premise of Corollary 3.14 is satisfied. Furthermore, Corollaries 3.10, 3.12 and 3.14 motivate the following categorization of the $(\Lambda, \mathbf{b})$ pairs in preparation for the statement of Corollary 3.15.

Definition 3.2. Let $d \ge 3$ be an arbitrary integer, $\emptyset \ne \Lambda \subseteq \{2,3,\ldots,d-1\}$ a given index set and $\mathbf{b} \in \mathbb{R}^{|\Lambda|}$ a code vector. Let $p \triangleq \left|\Lambda \cup (\Lambda - 1)\right|$. The following mutually exclusive (but not exhaustive) categories for the pair $(\Lambda, \mathbf{b})$ are defined.

0) If $d \ge 5$, $\emptyset \ne \Lambda \subseteq \{3,4,\ldots,d-2\}$ and $\mathbf{b} = \mathbf{0}$, then $(\Lambda, \mathbf{b})$ is of type 0.

1) Let $\mathbf{b} \ne \mathbf{0}$ and $\Lambda \cap (\Lambda - 1) \ne \emptyset$. If $\mathbf{b}$ is collinear with $\mathbf{1} \in \mathbb{R}^{|\Lambda|}$, then $(\Lambda, \mathbf{b})$ is of type 1. If $\mathbf{b}$ is not collinear with $\mathbf{1} \in \mathbb{R}^{|\Lambda|}$, then let $\emptyset \ne \Lambda_* \subseteq \{1,2,\ldots,|\Lambda|\}$ denote the index subset such that $\forall \mathbf{w} \in \mathcal{K}_{\mathbf{b}}(\Lambda, d)$, $\mathbf{w}\left(\Lambda \cap (\Lambda - 1)\right)$ is collinear with $\mathbf{b}(\Lambda_*)$. If $\mathbf{b}(\Lambda_*)$ and $\mathbf{b}(\Lambda_* + 1)$ are collinear, then $(\Lambda, \mathbf{b})$ is of type 1.

2) If $\mathbf{b} \ne \mathbf{0}$ and $\Lambda \cap (\Lambda - 1) = \emptyset$, then $(\Lambda, \mathbf{b})$ is of type 2.

For a visual example, we note that Figure 3.4 represents a $(\Lambda, \mathbf{b})$ pair of type 0, while Figure 3.6 shows a type 1 pair with $\Lambda_* = \{1,3,4\}$ ($\Lambda_*$ as defined above in Definition 3.2).

Corollary 3.15. Let $m,n \ge 3$ be arbitrary integers. For any given index sets $\emptyset \ne \Lambda' \subseteq \{2,3,\ldots,m-1\}$ and $\emptyset \ne \Lambda'' \subseteq \{2,3,\ldots,n-1\}$, let $\mathcal{K} = \mathcal{K}_{\mathbf{b}}(\Lambda', m) \times \mathcal{K}_{\mathbf{b}'}(\Lambda'', n)$ and let $p \triangleq \left|\Lambda' \cup (\Lambda' - 1)\right|$ and $p' \triangleq \left|\Lambda'' \cup (\Lambda'' - 1)\right|$. If the pairs $(\Lambda', \mathbf{b})$ and $(\Lambda'', \mathbf{b}')$ are respectively of types $t$ and $t'$ for $t, t' \in \{0,1,2\}$, then there exists a set $\mathcal{G}_* \subseteq \mathcal{K}/\mathrm{Id}_R$ of dimension at least $(m+n-1-p-p'+t+t')$ such that every $(\mathbf{g},\mathbf{h}) \in \mathcal{G}_*$ is unidentifiable.

Proof. Section 3.7.14.

Corollary 3.15 utilizes the constructions of candidate adversarial inputs over different domains, as developed in Corollaries 3.10 and 3.14, and fuses them together to generate candidate unidentifiable signal pairs by exploiting the separability of the feasible domain $\mathcal{K}$. It is straightforward to check that Theorem 3.8 and Corollaries 3.10, 3.12, 3.13 and 3.14 can all be derived from Corollary 3.15 by suitable choices of the parameter pairs $(\Lambda', \mathbf{b})$ and $(\Lambda'', \mathbf{b}')$; e.g. setting $\mathbf{b} = \mathbf{1}$ and $\mathbf{b}' = \mathbf{0}$ gives back Corollary 3.13. In essence, barring Corollary 3.11, which requires special assumptions, Corollary 3.15 encompasses all of the unidentifiability results for constrained blind deconvolution developed in this part of the paper. The main message of Corollary 3.15 is that the coded subspaces represented by the vectors $\mathbf{b}$ and $\mathbf{b}'$ are critical to the identifiability of the blind linear deconvolution problem (P6). If these coded subspaces are solely determined by the application or the natural system configuration (like the multi-hop channel estimation example above), then sparsity may not be sufficient to guarantee identifiability for the blind deconvolution problem. A certain amount of design freedom in both of the coded subspaces (represented by the vectors $\mathbf{b}$ and $\mathbf{b}'$) is necessary to guarantee identifiability under blind deconvolution (as in [21], [25], [103]).
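Definition 3.2 is mechanical enough to encode directly. A sketch of the type classification (our own helper, which for brevity omits the range checks on $d$ and $\Lambda$ required by type 0; pairs outside the three categories return None, reflecting that the categories are not exhaustive):

```python
import numpy as np

def pair_type(Lam, b):
    """Classify (Lam, b) per Definition 3.2; Lam is 1-based, b indexed along sorted Lam."""
    Lam, b = sorted(Lam), np.asarray(b, dtype=float)
    overlap = sorted(set(Lam) & {j - 1 for j in Lam})      # Lam intersect (Lam - 1)
    if not np.any(b):
        return 0
    if not overlap:
        return 2
    pos = [Lam.index(j) for j in overlap]                  # Lam* (0-based positions)
    bs, bs1 = b[pos], b[[q + 1 for q in pos]]              # b(Lam*) and b(Lam* + 1)
    collinear = np.linalg.matrix_rank(np.vstack([bs, bs1])) <= 1
    return 1 if collinear else None

assert pair_type({3, 4, 7, 8, 9, 12}, np.zeros(6)) == 0    # Figure 3.4: type 0
assert pair_type({3, 6, 9}, np.ones(3)) == 2               # no adjacent coded indices
```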
As yet another important special case for wireless communications, we interpret the vector $\mathbf{b}$ as a power-delay profile and assume it to be geometrically decaying over some contiguous index subset $\Lambda$, i.e. $\mathbf{b}$ satisfies $\mathbf{b}(j) = r\mathbf{b}(j-1)$ for some $0 < |r| < 1$ and all $j \in \{2,3,\ldots,|\Lambda|\}$. Then we get the geometrically decaying cone $\mathcal{K}_{\mathbf{b}}(\Lambda, d)$ of all $d$-dimensional real vectors that are geometrically decaying on the index subset $\Lambda$, and it admits the following unidentifiability result (in the statement below, $\mathcal{K}_{\mathbf{b}}(\Lambda_2, n)$ represents a geometrically decaying cone).

Corollary 3.16. Let $m \ge 3$ and $n \ge 5$ be arbitrary integers. For any given index subset $\emptyset \ne \Lambda_1 \subseteq \{3,4,\ldots,m-2\}$ and any contiguous index subset $\emptyset \ne \Lambda_2 \subseteq \{2,3,\ldots,n-1\}$, let $\mathcal{K} = \mathcal{K}_0(\Lambda_1, m) \times \mathcal{K}_{\mathbf{b}}(\Lambda_2, n)$ and define $p_1 \triangleq \left|\Lambda_1 \cup (\Lambda_1 - 1)\right|$ and $p_2 \triangleq |\Lambda_2| + 1$. Then there exists a set $\mathcal{G} \subseteq \mathcal{K}/\mathrm{Id}_R$ such that every $(\mathbf{g},\mathbf{h}) \in \mathcal{G}$ is unidentifiable. If $|\Lambda_2| = 1$, then $\mathcal{G}$ is of dimension at least $(m+n+1-p_1-p_2)$; otherwise $\mathcal{G}$ is of dimension at least $(m+n-p_1-p_2)$.

The proof is omitted, since it can be observed to be a direct specialization of Corollary 3.15. The occurrence of a geometrically decaying power-delay profile is common in all wide-band wireless communication channels [119]–[121].
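A one-line construction of a geometrically decaying code vector in the sense of Corollary 3.16, with a hypothetical ratio $r$:

```python
import numpy as np

r, k = 0.6, 5                           # hypothetical ratio 0 < |r| < 1 and k = |Lam2|
b = r ** np.arange(k)                   # b = (1, r, r^2, ..., r^(k-1))
assert np.allclose(b[1:], r * b[:-1])   # b(j) = r * b(j-1), the geometric decay model
```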
3.6 Conclusions

Blind deconvolution is an important non-linear inverse problem commonly encountered in signal processing applications. Natively, blind deconvolution is ill-posed from the viewpoint of signal identifiability, and it is often assumed that additional application specific constraints (like sparsity) suffice to guarantee identifiability for this inverse problem. In this chapter, we showed that (somewhat surprisingly) sparsity in the canonical basis is insufficient to guarantee identifiability under fairly generic assumptions on the support of the sparse signal. Specifically, we explicitly demonstrated a form of rotational ambiguity that holds for sparse vectors as well. Our approach builds on the lifting technique from optimization to reformulate blind deconvolution as a rank-one matrix recovery problem, and analyzes the rank-two null space of the resultant linear operator. While this approach is philosophically applicable to other bilinear inverse problems (like dictionary learning), it is the simplicity of the convolution operator in the lifted domain that makes our analysis tractable. We developed scaling laws quantifying the ill-posedness of canonical-sparse blind deconvolution by bounding the dimension of the unidentifiable sparse signal set. For non-sparse blind deconvolution, we proved Lebesgue almost everywhere unidentifiability for even model orders; a much stronger impossibility result than any counterparts in the existing literature. More specifically, we explicitly demonstrated a rotational ambiguity in blind deconvolution for both canonical-sparse and non-sparse input vectors (a non-trivial generalization of the well-known shift ambiguities in blind deconvolution) that generates a dimension-wise large set of unidentifiable inputs. Our proofs are constructive, dimensions of key sets are explicitly computed, and the results hold over the real field, which is not algebraically closed. When applied to a second-hop sparse channel estimation problem, our methods also revealed the insufficiency of side information involving repetition coding or geometrically decaying signals towards guaranteeing identifiability under blind linear deconvolution.

To establish our results, we developed a dimension-wise tight, partially parametric and partially recursive characterization of the rank-two null space of the linear convolution map. This is a precursory result to non-randomized code design strategies for guaranteeing signal identifiability under the bilinear observation model of linear convolution. The design of such codes is a topic of ongoing research. Finally, an unanswered question that ensues from our analysis is whether the dimension-wise insignificant part $\mathcal{M}(m,n) \setminus \left(\mathcal{N}_0(m,n) \cup \mathcal{N}_2(m,n)\right)$ of the rank-two null space $\mathcal{N}(\mathscr{S}^{(m,n)}, 2)$ of blind deconvolution is empty for $m,n \ge 3$ and $m \ne n$.

3.7 Proofs

3.7.1 Proof of Proposition 3.2

The proof involves contradiction by an explicit dimension recursive construction. Let $m,n \ge 3$ be arbitrary integers and let $\mathscr{S}' : \mathbb{R}^{(m-1) \times (n-1)} \to \mathbb{R}^{m+n-3}$ denote the lifted linear convolution map in one lower dimension
We have φ 1 ◦S (2,n) (X) = 0 and by definition ofS (2,n) (·), we haveX(1, 1) =φ 1 ◦S (2,n) (X) (see Figure 3.2 for 72 illustration), implying thatX(1, 1) = 0. By a similar argument,X(2,n) =φ n+1 ◦S (2,n) (X) = 0. By definition ofS (2,n) (·) we have, S (2,n) (X) = 0 X(1, 2 :n) +X(2, 1 :n− 1) 0 = 0 (3.35) implying thatX(1, 2 :n) =−X(2, 1 :n− 1). Lettingv T =X(1, 2 :n) we get X = 1 0 0 −1 0 v T v T 0 ∈N 0 (2,n). (3.36) SinceX∈N S (2,n) , 2 is arbitrary, we haveN S (2,n) , 2 ⊆N 0 (2,n). 3. Let m,n≥ 3 be arbitrary integers. We’ll show thatN 0 (m,n) S N 2 (m,n)⊆N S (m,n) , 2 . From Lemma 3.1, we haveN 0 (m,n)⊆N S (m,n) , 2 . It remains to show thatN 2 (m,n)⊆N S (m,n) , 2 . To this end, we borrow constructions from the proof of Proposition 3.2. Let X =u 1 v T 1 +u 2 v T 2 ∈ N S (m−1,n−1) , 2 \{0} for someu 1 ,u 2 ∈R m−1 andv 1 ,v 2 ∈R n−1 . We constructY ∈N 2 (m,n)⊂ R m×n as in (3.32) and following the chain of arguments (3.32)→ (3.34) using the anti-diagonal sum interpretation of Section 3.2.3, we have S (m,n) (Y ) = 0 S (m−1,n−1) u 1 v T 1 +u 2 v T 2 0 = 0 S (m−1,n−1) (X) 0 = 0 (3.37) sinceX∈N S (m−1,n−1) , 2 . Further, (3.32)impliesthatrank(Y )≤ 2andthereforeY ∈N S (m,n) , 2 . SinceX∈N S (m−1,n−1) , 2 \{0} is arbitrary, we haveN 2 (m,n)⊆N S (m,n) , 2 . 4. Let m,n ≥ 3 be arbitrary integers. From the preceding part, we have N 0 (m,n) S N 2 (m,n) ⊆ N S (m,n) , 2 implying that N 0 (m,n) S N 2 (m,n) \M(m,n)⊆N S (m,n) , 2 \M(m,n). It remains to show thatN S (m,n) , 2 \M(m,n)⊆ N 0 (m,n) S N 2 (m,n) \M(m,n). LetX∈N S (m,n) , 2 \ M(m,n)⊂R m×n . We have φ 1 ◦S (m,n) (X) = 0 and by definition ofS (m,n) (·), we haveX(1, 1) = φ 1 ◦S (m,n) (X) (see Figure 3.2 for illustration), implying that X(1, 1) = 0. By a similar argument, X(m,n) =φ m+n−1 ◦S (m,n) (X) = 0. Our proof strategy will be to show that X always admits a factorization like (3.32) and then deduce that this necessarily implies our result. SinceX6∈M(m,n), eitherX(m, 1)6= 0 orX(1,n)6= 0 is true. Let us first assume that X(m, 1)6= 0 implying thatX(:, 1)6= 0. We have two further possibilities, viz.X(1, :)6= 0 T andX(1, :) = 0 T . 4a) Suppose thatX(1, :)6= 0 T . We setu 2 =X(2 :m, 1)∈R m−1 so that 0,u T 2 =X(:, 1) T . Since N S (m,n) , 1 = {0}, we have rank(X) = 2 and since X(:, 1) 6= 0 we conclude that∃j 0 ∈ {2, 3,...,n} such that X(:, 1) and X(:,j 0 ) are linearly independent. We choose u 1 ∈R m−1 as follows. IfX(m,j 0 ) = 0 (e.g.X(m,n) = 0 for j 0 =n), then we setu 1 =X(1 :m− 1,j 0 ), else we set u 1 = X(m, 1) X(m,j 0 ) X(1 :m− 1,j 0 )−X(1 :m− 1, 1). (3.38) We claim that the vectors u T 1 , 0 and 0,u T 2 are linearly independent and spanC(X), which would imply thatX admits a factorization of the form X = u 1 0 0 u 2 v T v T ∗ (3.39) 73 for some vectors v,v ∗ ∈R n . Indeed, if X(m,j 0 ) = 0 then u T 1 , 0 =X(:,j 0 ) T and X(:, 1) T = 0,u T 2 are linearly independent and spanC(X) since rank(X) = 2. In caseX(m,j 0 )6= 0, setting α =X(m, 1)/X(m,j 0 ) we observe that u 1 0 0 u 2 = X(:, 1) X(:,j 0 ) −1 1 α 0 . (3.40) Since X(m, 1)6= 0 and X(m,j 0 )6= 0, α∈ R\{0} and (3.40) imply that u T 1 , 0 and 0,u T 2 are linearly independent and spanC(X) owing to the linear independence ofX(:, 1) andX(:,j 0 ) and rank(X) being equal to two. Thus, (3.39) holds implying thatu 1 (1)v(1) =X(1, 1) = 0 and u 2 (m− 1)v ∗ (n) =X(m,n) = 0. We haveu 2 (m− 1) =X(m, 1)6= 0 by construction sov ∗ (n) = 0. Further, ifu 1 (1) = 0 were true then (3.39) would implyX(1, :) = 0 T contradicting our assumption ofX(1, :)6= 0 T . Thus,u 1 (1)6= 0 which impliesv(1) =X(1, 1)/u 1 (1) = 0. 
Letting $v_1 = v(2{:}n)$ and $v_2 = v_*(1{:}n{-}1)$, (3.39) implies
\[
X = \begin{bmatrix} u_1 & 0 \\ 0 & u_2 \end{bmatrix}\begin{bmatrix} 0 & v_1^T \\ v_2^T & 0 \end{bmatrix} = \underbrace{\begin{bmatrix} 0 & u_1v_1^T \\ 0 & 0^T \end{bmatrix}}_{X_1} + \underbrace{\begin{bmatrix} 0^T & 0 \\ u_2v_2^T & 0 \end{bmatrix}}_{X_2}. \tag{3.41}
\]
Following the same chain of arguments as (3.32) to (3.34) with the anti-diagonal sum interpretation, (3.41) implies
\[
\begin{bmatrix} 0 \\ \mathcal{S}_{(m-1,n-1)}\big(u_1v_1^T + u_2v_2^T\big) \\ 0 \end{bmatrix} = \mathcal{S}_{(m,n)}(X_1) + \mathcal{S}_{(m,n)}(X_2) = 0 \tag{3.42}
\]
since $X\in\mathcal{N}\big(\mathcal{S}_{(m,n)},2\big)$. Therefore $u_1v_1^T + u_2v_2^T\in\mathcal{N}\big(\mathcal{S}_{(m-1,n-1)},2\big)$. If $u_1v_1^T + u_2v_2^T\ne 0$ then (3.41) and (3.21) imply that $X\in\mathcal{N}_2(m,n)$. Instead, if $u_1v_1^T + u_2v_2^T = 0$ then $u_2v_2^T = -u_1v_1^T$ and (3.41) implies
\[
X = X_1 + X_2 = \begin{bmatrix} 0 & u_1v_1^T \\ 0 & 0^T \end{bmatrix} + \begin{bmatrix} 0^T & 0 \\ -u_1v_1^T & 0 \end{bmatrix} = \begin{bmatrix} u_1 & 0 \\ 0 & -u_1 \end{bmatrix}\begin{bmatrix} 0 & v_1^T \\ v_1^T & 0 \end{bmatrix} \tag{3.43}
\]
and thus $X\in\mathcal{N}_0(m,n)$ by (3.20). Since $X\notin\mathcal{M}(m,n)$ by assumption, we get $X\in\big(\mathcal{N}_0(m,n)\cup\mathcal{N}_2(m,n)\big)\setminus\mathcal{M}(m,n)$.

4b) Now suppose that $X(1,:) = 0^T$, so that
\[
X = \begin{bmatrix} 0^T \\ Y \end{bmatrix} \tag{3.44}
\]
for some matrix $Y\in\mathbb{R}^{(m-1)\times n}$. We use mathematical induction on $m$ to settle this case.

Induction Step: Let $\mathcal{N}\big(\mathcal{S}_{(m',n)},2\big)\setminus\mathcal{M}(m',n)\subseteq\big(\mathcal{N}_0(m',n)\cup\mathcal{N}_2(m',n)\big)\setminus\mathcal{M}(m',n)$ be true for $m' = m-1\ge 3$. Since $\mathcal{S}_{(m,n)}(\cdot)$ sums elements along the anti-diagonals (see Figure 3.2 for an illustration) and $X\in\mathcal{N}\big(\mathcal{S}_{(m,n)},2\big)$, we have
\[
\begin{bmatrix} 0 \\ \mathcal{S}_{(m',n)}(Y) \end{bmatrix} = \mathcal{S}_{(m,n)}(X) = 0 \tag{3.45}
\]
implying that $Y\in\mathcal{N}\big(\mathcal{S}_{(m',n)},2\big)$. Further, $X(m,1)\ne 0$ by assumption and $Y(m',1) = X(m,1)$ by (3.44), implying that $Y\notin\mathcal{M}(m',n)$ by definition. Thus $Y\in\mathcal{N}\big(\mathcal{S}_{(m',n)},2\big)\setminus\mathcal{M}(m',n)$ and, by the induction hypothesis, either $Y\in\mathcal{N}_2(m',n)$ or $Y\in\mathcal{N}_0(m',n)$. We separate the analysis for each of these cases below.

(i) Suppose that $Y\in\mathcal{N}_2(m',n)$. Then there exist vectors $u_1,u_2\in\mathbb{R}^{m'-1}$ and $v_1,v_2\in\mathbb{R}^{n-1}$ such that
\[
Y = \begin{bmatrix} u_1 & 0 \\ 0 & u_2 \end{bmatrix}\begin{bmatrix} 0 & v_1^T \\ v_2^T & 0 \end{bmatrix} \tag{3.46}
\]
and $u_1v_1^T + u_2v_2^T\in\mathcal{N}\big(\mathcal{S}_{(m'-1,n-1)},2\big)\setminus\{0\}$. Now using (3.44) we get
\[
X = \begin{bmatrix} 0 & 0 \\ u_1 & 0 \\ 0 & u_2 \end{bmatrix}\begin{bmatrix} 0 & v_1^T \\ v_2^T & 0 \end{bmatrix} = \begin{bmatrix} u_3 & 0 \\ 0 & u_4 \end{bmatrix}\begin{bmatrix} 0 & v_1^T \\ v_2^T & 0 \end{bmatrix} \tag{3.47}
\]
where we have set $u_3^T = \big(0,u_1^T\big)$ and $u_4^T = \big(0,u_2^T\big)$. Clearly, $u_3,u_4\in\mathbb{R}^{m-1}$. To conclude that $X\in\mathcal{N}_2(m,n)$, we need to show that $u_3v_1^T + u_4v_2^T\in\mathcal{N}\big(\mathcal{S}_{(m-1,n-1)},2\big)\setminus\{0\}$. We have
\[
u_3v_1^T + u_4v_2^T = \begin{bmatrix} 0 \\ u_1 \end{bmatrix}v_1^T + \begin{bmatrix} 0 \\ u_2 \end{bmatrix}v_2^T = \begin{bmatrix} 0^T \\ u_1v_1^T + u_2v_2^T \end{bmatrix} \tag{3.48}
\]
and therefore
\[
\mathcal{S}_{(m-1,n-1)}\big(u_3v_1^T + u_4v_2^T\big) = \begin{bmatrix} 0 \\ \mathcal{S}_{(m-2,n-1)}\big(u_1v_1^T + u_2v_2^T\big) \end{bmatrix} = 0 \tag{3.49}
\]
where the first equality holds because $\mathcal{S}_{(m-1,n-1)}(\cdot)$ sums elements along the anti-diagonals, and the second equality follows from the fact that $u_1v_1^T + u_2v_2^T\in\mathcal{N}\big(\mathcal{S}_{(m'-1,n-1)},2\big)\setminus\{0\}$ and $m' = m-1$. Since $u_1v_1^T + u_2v_2^T\ne 0$, (3.48) implies that $u_3v_1^T + u_4v_2^T\ne 0$. Further, $\operatorname{rank}\big(u_3v_1^T + u_4v_2^T\big)\le 2$ and thus (3.49) implies that $u_3v_1^T + u_4v_2^T\in\mathcal{N}\big(\mathcal{S}_{(m-1,n-1)},2\big)\setminus\{0\}$, and hence $X\in\mathcal{N}_2(m,n)$. By assumption, $X\notin\mathcal{M}(m,n)$ and therefore $X\in\mathcal{N}_2(m,n)\setminus\mathcal{M}(m,n)$.

(ii) Assume that $Y\in\mathcal{N}_0(m',n)$. Then there exist vectors $u\in\mathbb{R}^{m'-1}$ and $v\in\mathbb{R}^{n-1}$ such that
\[
Y = \begin{bmatrix} u & 0 \\ 0 & -u \end{bmatrix}\begin{bmatrix} 0 & v^T \\ v^T & 0 \end{bmatrix} \tag{3.50}
\]
and (3.44) gives
\[
X = \begin{bmatrix} 0 & 0 \\ u & 0 \\ 0 & -u \end{bmatrix}\begin{bmatrix} 0 & v^T \\ v^T & 0 \end{bmatrix} = \begin{bmatrix} u_* & 0 \\ 0 & -u_* \end{bmatrix}\begin{bmatrix} 0 & v^T \\ v^T & 0 \end{bmatrix} \tag{3.51}
\]
where we have set $u_*^T = \big(0,u^T\big)$. Clearly, $u_*\in\mathbb{R}^{m-1}$ and (3.51) implies $X\in\mathcal{N}_0(m,n)$ by definition. By assumption, $X\notin\mathcal{M}(m,n)$ and therefore $X\in\mathcal{N}_0(m,n)\setminus\mathcal{M}(m,n)$.

Induction Basis: We need to show that $\mathcal{N}\big(\mathcal{S}_{(3,n)},2\big)\setminus\mathcal{M}(3,n)\subseteq\big(\mathcal{N}_0(3,n)\cup\mathcal{N}_2(3,n)\big)\setminus\mathcal{M}(3,n)$. Let $Z\in\mathcal{N}\big(\mathcal{S}_{(3,n)},2\big)\setminus\mathcal{M}(3,n)$. If $Z(1,:)\ne 0^T$ then it follows from Part 4a of the proof that $Z\in\big(\mathcal{N}_0(3,n)\cup\mathcal{N}_2(3,n)\big)\setminus\mathcal{M}(3,n)$. Instead, if $Z(1,:) = 0^T$ then we have
\[
\begin{bmatrix} 0 \\ \mathcal{S}_{(2,n)}\big(Z(2{:}3,:)\big) \end{bmatrix} = \mathcal{S}_{(3,n)}(Z) = 0 \tag{3.52}
\]
where the first equality is due to $\mathcal{S}_{(2,n)}(\cdot)$ summing elements along the anti-diagonals, and the second equality follows from $Z\in\mathcal{N}\big(\mathcal{S}_{(3,n)},2\big)$. Therefore $Z(2{:}3,:)\in\mathcal{N}\big(\mathcal{S}_{(2,n)},2\big)$, implying that $Z(2{:}3,:)\in\mathcal{N}_0(2,n)$ from Part 2 of the proof. Thus, there exist $u\in\mathbb{R}$ and $v\in\mathbb{R}^{n-1}$ such that
\[
Z(2{:}3,:) = \begin{bmatrix} u & 0 \\ 0 & -u \end{bmatrix}\begin{bmatrix} 0 & v^T \\ v^T & 0 \end{bmatrix} \tag{3.53}
\]
implying that
\[
Z = \begin{bmatrix} 0^T \\ Z(2{:}3,:) \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ u & 0 \\ 0 & -u \end{bmatrix}\begin{bmatrix} 0 & v^T \\ v^T & 0 \end{bmatrix} = \begin{bmatrix} u_* & 0 \\ 0 & -u_* \end{bmatrix}\begin{bmatrix} 0 & v^T \\ v^T & 0 \end{bmatrix} \tag{3.54}
\]
where $u_*^T = [0,u]$. Thus, (3.54) implies that $Z\in\mathcal{N}_0(3,n)$. Since $Z\notin\mathcal{M}(3,n)$, we have $Z\in\mathcal{N}_0(3,n)\setminus\mathcal{M}(3,n)$.

We have proved that if $X(m,1)\ne 0$ and $X\in\mathcal{N}\big(\mathcal{S}_{(m,n)},2\big)$ then $X\in\mathcal{N}_0(m,n)\cup\mathcal{N}_2(m,n)$. To finish the proof, we need to consider the other scenario for $X\notin\mathcal{M}(m,n)$, i.e. show that $X(1,n)\ne 0$ and $X\in\mathcal{N}\big(\mathcal{S}_{(m,n)},2\big)$ imply $X\in\mathcal{N}_0(m,n)\cup\mathcal{N}_2(m,n)$. The arguments for this case are conceptually similar to our approach for $X(m,1)\ne 0$, and hence we omit them.
3.7.3 Proof of Proposition 3.4

We explicitly define an $(n-1)$ dimensional parameterized set of matrices $\mathcal{M}_2(n)\subset\mathbb{R}^{n\times n}$ as follows,
\[
\mathcal{M}_2(n) \triangleq \left\{ \begin{bmatrix} 0 & -u^T & 0 \\ u & 0 & \lambda u \\ 0 & -\lambda u^T & 0 \end{bmatrix} \;\middle|\; u\in\mathbb{R}^{n-2}\setminus\{0\},\ \lambda\in\mathbb{R}\setminus\{0\} \right\} \tag{3.55}
\]
and show that $\mathcal{M}_2(n)\subseteq\big(\mathcal{N}\big(\mathcal{S}_{(n,n)},2\big)\cap\mathcal{M}(n,n)\big)\setminus\big(\mathcal{N}_0(n,n)\cup\mathcal{N}_2(n,n)\big)$. Let $X$ be an arbitrary element of $\mathcal{M}_2(n)$ and let $u\in\mathbb{R}^{n-2}\setminus\{0\}$, $\lambda\in\mathbb{R}\setminus\{0\}$ denote some representative parameters certifying that $X\in\mathcal{M}_2(n)$ according to (3.55). It is easily verified that $X$ is skew-symmetric. Since $\mathcal{S}_{(n,n)}(\cdot)$ sums elements along anti-diagonals (see Figure 3.2 for an illustration), the skew-symmetry of $X$ implies that $X$ is in the null space of $\mathcal{S}_{(n,n)}(\cdot)$. Further, $X$ admits the factorization
\[
X = \begin{bmatrix} 0 & -u^T & 0 \\ u & 0 & \lambda u \\ 0 & -\lambda u^T & 0 \end{bmatrix} = \begin{bmatrix} 0 & 1 \\ u & 0 \\ 0 & \lambda \end{bmatrix}\begin{bmatrix} 1 & 0^T & \lambda \\ 0 & -u^T & 0 \end{bmatrix} \tag{3.56}
\]
implying that $\operatorname{rank}(X)\le 2$ and therefore $X\in\mathcal{N}\big(\mathcal{S}_{(n,n)},2\big)$. Since $X(n,1) = X(1,n) = 0$, we can also conclude that $X\in\mathcal{N}\big(\mathcal{S}_{(n,n)},2\big)\cap\mathcal{M}(n,n)$.

To finish the proof, we need to show that $X\notin\mathcal{N}_0(n,n)\cup\mathcal{N}_2(n,n)$. We proceed by contradiction. First, assume that $X\in\mathcal{N}_2(n,n)$. Then there exist vectors $u_1,u_2,v_1,v_2\in\mathbb{R}^{n-1}$ such that
\[
X = \begin{bmatrix} u_1 & 0 \\ 0 & u_2 \end{bmatrix}\begin{bmatrix} 0 & v_1^T \\ v_2^T & 0 \end{bmatrix}. \tag{3.57}
\]
Since $u\ne 0$ and $\lambda\ne 0$, (3.56) implies $X(1,:)\ne 0^T$ and $X(n,:)\ne 0^T$. Now (3.57) implies that $X(1,:) = \big(0,\,u_1(1)v_1^T\big)$ and $X(n,:) = \big(u_2(n-1)v_2^T,\,0\big)$. Therefore $u_1(1)\ne 0$ and $u_2(n-1)\ne 0$. Using the two different representations of $\mathcal{C}(X)$ in (3.56) and (3.57), we must have
\[
\begin{bmatrix} 0 \\ u \\ 0 \end{bmatrix} = \gamma\begin{bmatrix} u_1 \\ 0 \end{bmatrix} + \beta\begin{bmatrix} 0 \\ u_2 \end{bmatrix} \tag{3.58}
\]
for some $[\gamma,\beta]^T\in\mathbb{R}^2\setminus\{0\}$. However, (3.58) implies $0 = \gamma u_1(1)$ and $0 = \beta u_2(n-1)$, and thus $\gamma = \beta = 0$ since $u_1(1)\ne 0$ and $u_2(n-1)\ne 0$. This leads to a contradiction and therefore $X\notin\mathcal{N}_2(n,n)$. Next, we assume $X\in\mathcal{N}_0(n,n)$. All of our arguments from the preceding case of $X\in\mathcal{N}_2(n,n)$ remain valid if we set $u_2 = -u_1$. Therefore, once again we reach a contradiction and $X\notin\mathcal{N}_0(n,n)$. So we have $X\notin\mathcal{N}_0(n,n)\cup\mathcal{N}_2(n,n)$ and the proof is complete.
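A quick numerical sanity check of this construction is possible. The sketch below (Python/NumPy, our own illustration with arbitrary parameter choices) instantiates an element of $\mathcal{M}_2(n)$ per (3.55) and verifies skew-symmetry, that all anti-diagonal sums vanish (membership in the null space of the lifted map), and that the rank is two:

```python
import numpy as np

n = 7
rng = np.random.default_rng(2)
u = rng.standard_normal(n - 2)
lam = 1.5
X = np.zeros((n, n))
X[0, 1:n - 1] = -u            # first row:    [0, -u^T, 0]
X[1:n - 1, 0] = u             # first column: [0; u; 0]
X[1:n - 1, -1] = lam * u      # last column:  [0; lam*u; 0]
X[-1, 1:n - 1] = -lam * u     # last row:     [0, -lam*u^T, 0]
assert np.allclose(X, -X.T)   # skew-symmetric, as claimed
# anti-diagonal sums via traces of the vertically flipped matrix
s = np.array([X[::-1].trace(offset=k) for k in range(-(n - 1), n)])
print(np.allclose(s, 0), np.linalg.matrix_rank(X))   # True, 2
```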
3.7.4 Proof of Lemma 3.5

If $w(1) = 0$ then we simply set $w_* = w(2{:}d)$ and $\phi = -\pi/2$ to satisfy (3.24). If $w(d) = 0$ then we simply set $w_* = w(1{:}d{-}1)$ and $\phi = 0$ to satisfy (3.24). So we assume that $w(1)\ne 0$ and $w(d)\ne 0$, which respectively imply $w_*(1)\ne 0$ and $w_*(d-1)\ne 0$. We shall construct a vector $s\in\mathbb{R}^{d-1}$ such that $w^T$ can be written as a linear combination of the vectors $\big(0,s^T\big)$ and $\big(s^T,0\big)$, and then algebraically manipulate this representation to fit the form in (3.24). Since $w(1)\ne 0$ and $w(d)\ne 0$, we have $s(1)\ne 0$ and $s(d-1)\ne 0$. We assume $w(1) = 1$ w.l.o.g. since, if the parameters $s\in\mathbb{R}^{d-1},\phi\in\mathbb{R}$ certify (3.24) for the vector $w/w(1)$, then the parameters $w(1)s\in\mathbb{R}^{d-1},\phi\in\mathbb{R}$ certify (3.24) for the vector $w$ with any $w(1)\ne 0$. We select $s(1) = 1$ and let $\theta$ represent a parameter such that
\[
\tan\theta = -\frac{w(d)}{s(d-1)}. \tag{3.59}
\]
Since $w(d)\ne 0$ and $s(d-1)\ne 0$, $\tan\theta\notin\{0,\pm\infty\}$, and we solve the following system of equations for $s(2{:}d{-}1)$ with $w$ known,
\[
w(2{:}d{-}1) = \begin{bmatrix} s(2{:}d{-}2) & -1 \\ s(d{-}1) & -s(2{:}d{-}2) \end{bmatrix}\begin{bmatrix} 1 \\ \tan\theta \end{bmatrix}. \tag{3.60}
\]
We need $2\le d-2$ for (3.60) to make sense; this results in the restriction $d\ge 4$. Equation (3.60) is equivalent to the set of equations
\[
w(j) = s(j) - s(j-1)\tan\theta, \qquad j = 2,3,\dots,d-1. \tag{3.61}
\]
Using (3.59) to substitute for $\tan\theta$ in (3.61), we have
\[
s(j-1) = \frac{s(d-1)}{w(d)}\big(w(j) - s(j)\big), \qquad j = 2,3,\dots,d-1, \tag{3.62}
\]
which can be solved recursively for $s(d-2),s(d-3),\dots,s(1)$ (in that order) as polynomial expressions in the variable $s(d-1)$. Since $s(1) = 1$ by assumption, (3.62) for $j = 2$ represents a consistency equation. Further, (3.62) implies that the degree of $s(d-1)$ in the expression for $s(j-1)$ is one higher than in the expression for $s(j)$. Since $s(d-2)$ is a quadratic polynomial in $s(d-1)$ (checked by substituting $j = d-1$ in (3.62)), it follows that $s(1) = s(d-(d-1))$ is a degree-$(d-1)$ polynomial in $s(d-1)$. Finally, $s(1)\ne 0$ and $w(d)\ne 0$ imply that the degree-$(d-1)$ polynomial equation in $s(d-1)$ (the consistency equation) admits a non-zero constant term, and hence a non-zero real root if $d-1$ is odd (equivalently, if $d$ is even). Thus, (3.62) admits a consistent solution and, using (3.59) and (3.60), we get the real vector $w_* = s\sec\theta\in\mathbb{R}^{d-1}$ satisfying
\[
w = \cos\phi\begin{bmatrix} w_* \\ 0 \end{bmatrix} - \sin\phi\begin{bmatrix} 0 \\ w_* \end{bmatrix} = \begin{bmatrix} w_* & 0 \\ 0 & -w_* \end{bmatrix}\begin{bmatrix} \cos\phi \\ \sin\phi \end{bmatrix} \tag{3.63}
\]
for $\phi = \theta$.

3.7.5 Proof of Theorem 3.6

As in the statement of the theorem, let $(x,y)\in\mathcal{K}$ be an arbitrary vector pair with $\mathcal{K} = \mathcal{K}(\emptyset,m)\times\mathbb{R}^n$. Since $m,n\ge 4$ are even integers, invoking Lemma 3.5 once each for $x$ and $y$, we can construct vectors $u\in\mathbb{R}^{m-1}$, $v\in\mathbb{R}^{n-1}$ and scalars $\theta,\phi\in\mathbb{R}$ such that
\[
x = \begin{bmatrix} u & 0 \\ 0 & -u \end{bmatrix}\begin{bmatrix} \cos\theta \\ \sin\theta \end{bmatrix}, \qquad y = \begin{bmatrix} 0 & v \\ v & 0 \end{bmatrix}\begin{bmatrix} \sin\phi \\ -\cos\phi \end{bmatrix}. \tag{3.64}
\]
Equation (3.64) implies that $y$ depends uniformly continuously on the $n$ real numbers $(\phi,v(1),v(2),\dots,v(n-1))$ (with a non-singular Jacobian matrix), and these $n$ real numbers completely parametrize $y$. Further, since the measure over $y$ is absolutely continuous w.r.t. the $n$ dimensional Lebesgue measure (implied by Lemma 3.9), it is possible to choose a measure over $\phi$ that is absolutely continuous w.r.t. the one dimensional Lebesgue measure, and therefore $\phi\notin\{l\pi/2 \mid l\in\mathbb{Z}\}$ is true almost everywhere w.r.t. the measure over $\phi$. Finally, since $x$ and $y$ have disjoint parameterizations, $\phi-\theta\notin\{l\pi \mid l\in\mathbb{Z}\}$ is true almost everywhere w.r.t. the measure over $\phi$ (owing to absolute continuity w.r.t. the one dimensional Lebesgue measure). Thus, $\phi\notin\{l\pi/2,\ \theta+l\pi \mid l\in\mathbb{Z}\}$ is true almost everywhere w.r.t. the measure over $\phi$, and therefore also w.r.t. the measure over $y$.

Once we have the vectors $u$ and $v$, we consider the decomposition
\[
X = \begin{bmatrix} u & 0 \\ 0 & -u \end{bmatrix}\begin{bmatrix} \cos\theta \\ \sin\theta \end{bmatrix}\begin{bmatrix} \sin\phi & -\cos\phi \end{bmatrix}\begin{bmatrix} 0 & v^T \\ v^T & 0 \end{bmatrix} + \begin{bmatrix} u & 0 \\ 0 & -u \end{bmatrix}\begin{bmatrix} \cos\phi \\ \sin\phi \end{bmatrix}\begin{bmatrix} -\sin\theta & \cos\theta \end{bmatrix}\begin{bmatrix} 0 & v^T \\ v^T & 0 \end{bmatrix} = \sin(\phi-\theta)\begin{bmatrix} u & 0 \\ 0 & -u \end{bmatrix}\begin{bmatrix} 0 & v^T \\ v^T & 0 \end{bmatrix} \tag{3.65}
\]
for $X\in\mathcal{N}(\mathcal{S},2)$, as in (3.17). Setting
\[
x' = \begin{bmatrix} u & 0 \\ 0 & -u \end{bmatrix}\begin{bmatrix} \cos\phi \\ \sin\phi \end{bmatrix}, \qquad y' = \begin{bmatrix} 0 & v \\ v & 0 \end{bmatrix}\begin{bmatrix} \sin\theta \\ -\cos\theta \end{bmatrix} \tag{3.66}
\]
and observing (3.64) and (3.65), we conclude that $xy^T - x'(y')^T = X\in\mathcal{N}(\mathcal{S},2)$ and therefore the pairs $(x,y)$ and $(x',y')$ produce the same convolved output. Since $x(1)\ne 0$, (3.64) implies that $u(1)\ne 0$ and therefore (3.66) implies $x'(1)\ne 0$ for $\phi\notin\{l\pi/2 \mid l\in\mathbb{Z}\}$. So, assuming $\phi\notin\{l\pi/2 \mid l\in\mathbb{Z}\}$ implies that $(x',y')\in\mathcal{K}$. Further, $x$ and $x'$ are linearly independent if $\phi-\theta\notin\{l\pi \mid l\in\mathbb{Z}\}$, implying that $(x,y)$ is unidentifiable by Definition 3.1 with $(x',y')$ as the certificate. The proof is complete since $\phi\notin\{l\pi/2,\ \theta+l\pi \mid l\in\mathbb{Z}\}$ is true almost everywhere w.r.t. the measure over $y$.
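The ambiguity asserted by the theorem can be exercised directly. The sketch below (Python/NumPy, our own illustration with arbitrary parameter choices) generates $(x,y)$ from (3.64) and $(x',y')$ from (3.66) and confirms that the two pairs convolve to the same output while $x$ and $x'$ are linearly independent:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 6, 8                                  # even dimensions, as in the theorem
u = rng.standard_normal(m - 1)
v = rng.standard_normal(n - 1)
theta, phi = 0.4, 1.1                        # avoid multiples of pi/2 and theta + l*pi
x  = np.r_[u, 0] * np.cos(theta) - np.r_[0, u] * np.sin(theta)    # (3.64)
y  = np.r_[0, v] * np.sin(phi)   - np.r_[v, 0] * np.cos(phi)
xp = np.r_[u, 0] * np.cos(phi)   - np.r_[0, u] * np.sin(phi)      # (3.66)
yp = np.r_[0, v] * np.sin(theta) - np.r_[v, 0] * np.cos(theta)
print(np.allclose(np.convolve(x, y), np.convolve(xp, yp)))        # True
print(np.linalg.matrix_rank(np.column_stack([x, xp])))            # 2: distinct inputs
```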
3.7.6 Proof of Theorem 3.7

The proof relies on the use of (3.64) as a generative model to construct vectors $x\in\mathcal{K}_0(\Lambda_1,m)$ and $y\in\mathcal{K}_0(\Lambda_2,n)$ such that $(x,y)$ is unidentifiable in $\mathcal{K}$. To do so, we construct a family of vectors $u'\in\mathcal{M}\subset\mathbb{R}^{m-1}$ such that any vector $x\in\mathbb{R}^m$ that admits the representation in (3.64) (certified by some vector $u\in\mathcal{M}$) satisfies $x(1)\ne 0$, $x(m)\ne 0$ and $x(\Lambda_1) = 0$ (thus implying $x\in\mathcal{K}_0(\Lambda_1,m)$). For every $j\in\Lambda_1$ we set $u'(j-1) = 0$. Since $x(j) = 0$, we get $u'(j) = 0$ for every $j\in\Lambda_1$ if (3.64) is to be consistent regardless of the value of $\theta$. We assign arbitrary non-zero values to all other elements of $u'(2{:}m{-}1)$, i.e. for $j\notin\Lambda_1\cup(\Lambda_1-1)$ we are free to choose $u'(j)$. Letting $\mathcal{M}$ denote the set of all vectors $u'\in\mathbb{R}^{m-1}$ so constructed, we have
\[
\mathcal{M} = \Big\{ w\in\mathbb{R}^{m-1} \;\Big|\; w(j) = 0\ \forall j\in\Lambda_1\cup(\Lambda_1-1) \text{ and } w(k)\ne 0\ \forall k\notin\Lambda_1\cup(\Lambda_1-1) \Big\}. \tag{3.67}
\]
For $k\in\{1,m-1\}$, clearly $k\notin\Lambda_1\cup(\Lambda_1-1)$, so that $\mathcal{M}\setminus\{0\}$ is non-empty and any vector $u\in\mathcal{M}$ satisfies $u(1)\ne 0$, $u(m-1)\ne 0$ and $u(j-1) = u(j) = 0$ for every $j\in\Lambda_1$. Therefore, any vector $x\in\mathbb{R}^m$ that is representable as in (3.64), with a vector $u\in\mathcal{M}$ acting as its certificate, satisfies $x(1)\ne 0$, $x(m)\ne 0$ and $x(\Lambda_1) = 0$ for $\theta\notin\{l\pi/2 \mid l\in\mathbb{Z}\}$, thus implying $x\in\mathcal{K}_0(\Lambda_1,m)$.

A similar construction leads to a family of vectors $v'\in\mathcal{M}'\subset\mathbb{R}^{n-1}$ such that any vector $y\in\mathbb{R}^n$, admitting a representation as in (3.64) with some vector $v\in\mathcal{M}'$ as its certificate, satisfies $y(1)\ne 0$, $y(n)\ne 0$ and $y(\Lambda_2) = 0$ for $\phi\notin\{l\pi/2 \mid l\in\mathbb{Z}\}$, thus implying $y\in\mathcal{K}_0(\Lambda_2,n)$.

Next, we arbitrarily select $\theta$ and $\phi$ satisfying $\phi-\theta\notin\{l\pi \mid l\in\mathbb{Z}\}$, $\tan\theta\notin\{0,\pm\infty\}$ and $\tan\phi\notin\{0,\pm\infty\}$, and let $u\in\mathcal{M}$ and $v\in\mathcal{M}'$ be chosen arbitrarily as well, to invoke (3.64) and (3.66) and respectively generate the pairs $(x,y)$ and $(x',y')$. By (3.65), the pairs $(x,y)$ and $(x',y')$ are indistinguishable under the linear convolution map, since $xy^T - x'(y')^T = X\in\mathcal{N}(\mathcal{S},2)$. Since $(x,y),(x',y')\in\mathcal{K}$, and since $x$ and $x'$ are linearly independent by the choice $\phi-\theta\notin\{l\pi \mid l\in\mathbb{Z}\}$, $(x,y)\in\mathcal{K}$ is unidentifiable by Definition 3.1.
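The zero-pattern construction is also easy to exercise numerically. The sketch below (Python/NumPy, our own illustration; the supports are hypothetical and 0-indexed, unlike the 1-indexed notation of the text) zeroes $u$ on $\Lambda_1\cup(\Lambda_1-1)$ and $v$ on $\Lambda_2\cup(\Lambda_2-1)$, then confirms that the two resulting sparse pairs respect their supports and convolve to the same output:

```python
import numpy as np

m, n = 8, 9
Lam1, Lam2 = np.array([3]), np.array([4])    # 0-indexed zero positions for x, y
rng = np.random.default_rng(3)
u = rng.standard_normal(m - 1)
v = rng.standard_normal(n - 1)
u[Lam1] = u[Lam1 - 1] = 0.0                  # u vanishes on Lam1 and Lam1 - 1
v[Lam2] = v[Lam2 - 1] = 0.0                  # v vanishes on Lam2 and Lam2 - 1
theta, phi = 0.5, 1.2
x  = np.r_[u, 0] * np.cos(theta) - np.r_[0, u] * np.sin(theta)   # (3.64)
y  = np.r_[0, v] * np.sin(phi)   - np.r_[v, 0] * np.cos(phi)
xp = np.r_[u, 0] * np.cos(phi)   - np.r_[0, u] * np.sin(phi)     # (3.66)
yp = np.r_[0, v] * np.sin(theta) - np.r_[v, 0] * np.cos(theta)
print(x[Lam1], y[Lam2], xp[Lam1], yp[Lam2])  # all zero: supports respected
print(np.allclose(np.convolve(x, y), np.convolve(xp, yp)))       # True
```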
3.7.7 Proof of Theorem 3.8

The proof builds on the ideas used to prove Theorem 3.7. Let $u\in\mathcal{K}_0(\Lambda_1\cup(\Lambda_1-1),m-1)$ and $v\in\mathcal{K}_0(\Lambda_2\cup(\Lambda_2-1),n-1)$ be chosen arbitrarily. Further, we choose $\theta$ and $\phi$ to satisfy
\[
(\theta,\phi)\in\Big\{ (\beta,\gamma)\in[0,2\pi)^2 \;\Big|\; \beta,\gamma\notin\{l\pi/2 \mid l\in\mathbb{Z}\},\ \beta-\gamma\notin\{l\pi \mid l\in\mathbb{Z}\} \Big\}. \tag{3.68}
\]
We use the 4-tuple $(u,v,\theta,\phi)$ in (3.64) to generate the vector pair $(x,y)$ and in (3.66) to generate the pair $(x',y')$. By construction, $0\notin\{u(1),u(m-1),v(1),v(n-1)\}$ and $0\notin\{\sin\theta,\cos\theta,\sin\phi,\cos\phi\}$, so (3.64) implies $0\notin\{x(1),x(m),y(1),y(n)\}$ and (3.66) implies $0\notin\{x'(1),x'(m),y'(1),y'(n)\}$. We also have $u(\Lambda_1) = u(\Lambda_1-1) = 0$ and $v(\Lambda_2) = v(\Lambda_2-1) = 0$ by construction, and
\begin{align}
x(\Lambda_1) &= u(\Lambda_1)\cos\theta - u(\Lambda_1-1)\sin\theta = 0, & y(\Lambda_2) &= -v(\Lambda_2)\cos\phi + v(\Lambda_2-1)\sin\phi = 0, \tag{3.69a}\\
x'(\Lambda_1) &= u(\Lambda_1)\cos\phi - u(\Lambda_1-1)\sin\phi = 0, & y'(\Lambda_2) &= -v(\Lambda_2)\cos\theta + v(\Lambda_2-1)\sin\theta = 0, \tag{3.69b}
\end{align}
where (3.69a) follows from (3.64) and (3.69b) follows from (3.66). Therefore $x,x'\in\mathcal{K}_0(\Lambda_1,m)$ and $y,y'\in\mathcal{K}_0(\Lambda_2,n)$, implying that $(x,y),(x',y')\in\mathcal{K}_0(\Lambda_1,m)\times\mathcal{K}_0(\Lambda_2,n) = \mathcal{K}$. By (3.65), the pairs $(x,y)$ and $(x',y')$ are indistinguishable under the linear convolution map, since $xy^T - x'(y')^T = X\in\mathcal{N}(\mathcal{S},2)$. Since $x$ and $x'$ are linearly independent by the choice $\phi-\theta\notin\{l\pi \mid l\in\mathbb{Z}\}$, $(x,y)\in\mathcal{K}$ is unidentifiable by Definition 3.1.

We construct a set $\mathcal{G}$ as follows. A vector pair $(x,y)\in\mathcal{G}$ if and only if all of the following conditions are satisfied.
(A1) $(x,y)\in\mathbb{R}^m\times\mathbb{R}^n$ is generated from a 4-tuple $(u,v,\theta,\phi)\in\mathbb{R}^{m-1}\times\mathbb{R}^{n-1}\times[0,2\pi)\times[0,2\pi)$ using (3.64).
(A2) $\theta$ and $\phi$ satisfy (3.68).
(A3) $u\in\mathcal{K}_0(\Lambda_1\cup(\Lambda_1-1),m-1)$ and $v\in\mathcal{K}_0(\Lambda_2\cup(\Lambda_2-1),n-1)$.

From the arguments in the previous paragraph, we have $\emptyset\ne\mathcal{G}\subset\mathcal{K}_0(\Lambda_1,m)\times\mathcal{K}_0(\Lambda_2,n) = \mathcal{K}$ and, furthermore, every $(x_*,y_*)\in\mathcal{G}$ is unidentifiable within $\mathcal{K}$. Let us consider the sets
\[
\mathcal{G}'_1 = \big\{ (u,\theta)\in\mathbb{R}^{m-1}\times[0,2\pi) \;\big|\; \text{(A2) and (A3) are true} \big\}, \qquad \mathcal{G}'_2(\theta) = \big\{ (v,\phi)\in\mathbb{R}^{n-1}\times[0,2\pi) \;\big|\; \text{given } \theta\in[0,2\pi)\setminus\{l\pi/2 \mid l\in\mathbb{Z}\},\ \text{(A2) and (A3) are true} \big\}. \tag{3.70}
\]
By the definition in (3.25), $\mathcal{K}_0(\Lambda_1\cup(\Lambda_1-1),m-1)$ is an $m-1-|\Lambda_1\cup(\Lambda_1-1)| = (m-1-p_1)$ dimensional Borel subset of $\mathbb{R}^{m-1}$, and $\mathcal{K}_0(\Lambda_2\cup(\Lambda_2-1),n-1)$ is an $n-1-|\Lambda_2\cup(\Lambda_2-1)| = (n-1-p_2)$ dimensional Borel subset of $\mathbb{R}^{n-1}$. Further, $[0,2\pi)\setminus\{l\pi/2 \mid l\in\mathbb{Z}\}$ is a one dimensional Borel subset of $\mathbb{R}$ and, given a value of $\theta\in[0,2\pi)$, $[0,2\pi)\setminus\{l\pi/2,\ \theta+l\pi \mid l\in\mathbb{Z}\}$ is also a one dimensional Borel subset of $\mathbb{R}$. Hence, (3.70), (A2) and (A3) imply that
1. $\mathcal{G}'_1$ is a $(m-1-p_1)+1 = (m-p_1)$ dimensional Borel subset of $\mathbb{R}^m$,
2. given $\theta\in[0,2\pi)\setminus\{l\pi/2 \mid l\in\mathbb{Z}\}$, $\mathcal{G}'_2(\theta)$ is a $(n-1-p_2)+1 = (n-p_2)$ dimensional Borel subset of $\mathbb{R}^n$,
3. $(\theta,\phi)$ spans a two dimensional Borel subset of $\mathbb{R}^2$, and
4. $(u,v,\theta,\phi)$ spans a $(m-1-p_1)+(n-1-p_2)+2 = (m+n-p_1-p_2)$ dimensional Borel subset of $\mathbb{R}^{m+n}$.

Next, we compute the dimension of the set $\mathcal{G}$. Consider a factorization of $\mathcal{G}$ into
\[
\mathcal{G}_1 = \{x_*\in\mathbb{R}^m \mid (x_*,y_*)\in\mathcal{G}\} \quad\text{and}\quad \mathcal{G}_2(x_*) = \{y_*\in\mathbb{R}^n \mid (x_*,y_*)\in\mathcal{G}\}, \tag{3.71}
\]
i.e. $\mathcal{G} = \{(x_*,y_*) \mid x_*\in\mathcal{G}_1,\ y_*\in\mathcal{G}_2(x_*)\}$. Equations (3.71) and (3.64) imply that every $x\in\mathcal{G}_1$ is generated as the result of a uniformly continuous map from $(u,\theta)\in\mathcal{G}'_1$ (with a non-singular Jacobian matrix). From Lemma 3.9, the quotient set $\mathcal{Q}_\sim(x,m)$ is finite for every $x\in\mathcal{G}_1$, and therefore each $x\in\mathcal{G}_1$ can be generated by at most a finite number of elements $(u,\theta)\in\mathcal{G}'_1$ using (3.64). Given some $x\in\mathcal{G}_1$, let $\Theta\subset[0,2\pi)$ be a set such that for every $\theta\in\Theta$ there exists $(u,\theta)\in\mathcal{G}'_1$ generating $x$ using (3.64). Clearly, $|\Theta|\le(2m-2)$ from Lemma 3.9. Then, (3.71) and (3.64) imply that every $y\in\mathcal{G}_2(x)$ is generated as the result of a uniformly continuous map from $(v,\phi)\in\bigcup_{\theta\in\Theta}\mathcal{G}'_2(\theta)$ (with a non-singular Jacobian matrix). Using Lemma 3.9, the quotient set $\mathcal{Q}_\sim(y,n)$ is finite for every $y\in\mathcal{G}_2(x)$, and therefore each $y\in\mathcal{G}_2(x)$ can be generated by at most a finite number of elements $(v,\phi)\in\bigcup_{\theta\in\Theta}\mathcal{G}'_2(\theta)$ using (3.64). Since $\Theta$ is a finite set, the above arguments and (3.71) imply that every element $(x,y)\in\mathcal{G}$ is generated by at most a finite number of 4-tuples $(u,v,\theta,\phi)$ satisfying (A2) and (A3), using the uniformly continuous maps in (3.64) (with non-singular Jacobian matrices). Since the set of all 4-tuples $(u,v,\theta,\phi)$ satisfying (A2) and (A3) is a $(m+n-p_1-p_2)$ dimensional Borel subset of $\mathbb{R}^{m+n}$, we get $\mathcal{G}$ as a Borel subset of $\mathbb{R}^{m+n}$ of dimension $(m+n-p_1-p_2)$.

We let $\mathcal{G}^* = \mathcal{G}/\mathrm{Id}_\mathbb{R}$, i.e. $\mathcal{G}^*$ is the quotient set of $\mathcal{G}$ w.r.t. the equivalence relation $\mathrm{Id}_\mathbb{R}$. Clearly, $\mathcal{G}^*\subseteq\mathcal{K}/\mathrm{Id}_\mathbb{R}$. Since each element $(x,y)\in\mathcal{G}/\mathrm{Id}_\mathbb{R}$ is a representative for a one dimensional Borel subset $\big\{\big(\alpha x,\tfrac{1}{\alpha}y\big)\big\}\subset\mathcal{G}\subset\mathbb{R}^m\times\mathbb{R}^n$, the dimension of $\mathcal{G}/\mathrm{Id}_\mathbb{R}$ is one less than the dimension of $\mathcal{G}$. Hence, $\mathcal{G}^*$ is a $(m+n-1-p_1-p_2)$ dimensional set.
3.7.8 Proof of Lemma 3.9

The proof borrows ideas from the proof of Lemma 3.5. Let $(w'_*,\gamma')\in\mathcal{Q}_\sim(w,d)$ be arbitrarily selected. Since $0\notin\{w(1),w(d)\}$ by assumption, we have $0\notin\{w'_*(1),w'_*(d-1),\sin\gamma',\cos\gamma'\}$ by definition of $\mathcal{Q}_\sim(w,d)$, and therefore division by any element of the set $\{w(1),w(d),w'_*(1),w'_*(d-1),\sin\gamma',\cos\gamma'\}$ is well defined. We have
\[
\cos\gamma' = w(1)/w'_*(1), \qquad \sin\gamma' = -w(d)/w'_*(d-1), \tag{3.72}
\]
and for every $2\le j\le d-1$,
\[
w(j) = w'_*(j)\cos\gamma' - w'_*(j-1)\sin\gamma'. \tag{3.73}
\]
We define the vectors $c = w/w(d)\in\mathbb{R}^d$ and $s = w'_*/w'_*(d-1)\in\mathbb{R}^{d-1}$. For $j = 2,3,\dots,d-1$, we have
\begin{align}
s(j-1) = \frac{w'_*(j-1)}{w'_*(d-1)} &= \frac{1}{w'_*(d-1)}\big[w'_*(j)\cot\gamma' - w(j)\csc\gamma'\big] \tag{3.74a}\\
&= \frac{1}{w'_*(d-1)}\left[-w'_*(j)\,\frac{w(1)}{w(d)}\,\frac{w'_*(d-1)}{w'_*(1)} + w(j)\,\frac{w'_*(d-1)}{w(d)}\right] \tag{3.74b}\\
&= \frac{w(j)}{w(d)} - \frac{w(1)}{w(d)}\,\frac{w'_*(j)}{w'_*(d-1)}\,\frac{w'_*(d-1)}{w'_*(1)} = c(j) - c(1)\,\frac{s(j)}{s(1)} \tag{3.74c}
\end{align}
where (3.74a) follows from (3.73), (3.74b) follows from (3.72) using basic trigonometric relationships, and (3.74c) follows from the definitions of the vectors $c$ and $s$. Equation (3.74) represents a set of constraints on the vector variables and conceptually resembles (3.62) in Section 3.7.4. Since $s(d-1) = 1$ by definition, (3.74) can be solved recursively for $s(d-2),s(d-3),\dots,s(1)$ (in that order) as polynomial expressions in the variable $1/s(1)$. Note that $s(1) = w'_*(1)/w'_*(d-1)\ne 0$ by definition. Specifically, (3.74) for $j = 2$ expresses $s(1)$ as a degree-$(d-2)$ polynomial in the variable $1/s(1)$ and represents a consistency equation that must be satisfied by every solution to (3.74), and hence by every element of the set $\mathcal{Q}_\sim(w,d)$. Furthermore, dividing the consistency equation by $s(1)$ on both the l.h.s. and the r.h.s. results in a polynomial equation of degree $(d-1)$ in the variable $1/s(1)$, and therefore $s(1)$ can admit at most $(d-1)$ distinct values. If $S$ denotes the set of admissible values of $s(1)$ then $|S|\le(d-1)$. For a given value of $s(1)\in S$, (3.74) uniquely determines $s(j)$ for every $2\le j\le d-2$, and we have $s(d-1) = 1$ by definition. From (3.72), we get
\[
\tan\gamma' = -\frac{w(d)}{w(1)}\,\frac{w'_*(1)}{w'_*(d-1)} = -\frac{s(1)}{c(1)} \tag{3.75}
\]
implying that, given a value of $s(1)\in S$, $\tan\gamma'$ is uniquely determined, and $\csc\gamma'\in\big\{\pm\sqrt{1+\tan^{-2}\gamma'}\big\}$ and $\gamma'\in[0,2\pi)$ each admit at most two distinct values. Therefore, the pair of scalars $(s(1),\gamma')$ admits at most $2|S|$ distinct values. Using (3.72) and the definition of $s$, we have $w'_* = w'_*(d-1)\cdot s = -s\cdot w(d)\cdot\csc\gamma'$, implying that $w'_*$ is uniquely determined by $(s(1),\gamma')$. Therefore, $(w'_*,\gamma')$ admits at most $2|S|\le(2d-2)$ distinct values. Since $(w'_*,\gamma')$ represents an arbitrary element of $\mathcal{Q}_\sim(w,d)$, we conclude that $\mathcal{Q}_\sim(w,d)$ is finite with cardinality at most $(2d-2)$.
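The finiteness claim can be probed numerically by counting the roots of the consistency condition. In the sketch below (Python/NumPy, our own illustration), the residual forward-solves (3.73) for $w'_*$ given a candidate $\gamma'$ and returns the mismatch in the last coordinate; neighborhoods of $\cos\gamma' = 0$ are excluded since the residual has poles there. For a generic $w$, the sign-change count stays within the $2d-2$ bound:

```python
import numpy as np

def residual(gamma, w):
    # forward-solve w(j) = w*(j)cos(g) - w*(j-1)sin(g) for w*, then report
    # the consistency mismatch in the last coordinate of w
    ws = [w[0] / np.cos(gamma)]
    for j in range(1, len(w) - 1):
        ws.append((w[j] + ws[-1] * np.sin(gamma)) / np.cos(gamma))
    return w[-1] + ws[-1] * np.sin(gamma)

rng = np.random.default_rng(4)
w = rng.standard_normal(6)                       # d = 6, generic vector
eps, roots = 1e-4, 0
for a, b in [(eps, np.pi / 2 - eps),
             (np.pi / 2 + eps, 3 * np.pi / 2 - eps),
             (3 * np.pi / 2 + eps, 2 * np.pi - eps)]:   # avoid cos(gamma) = 0
    g = np.linspace(a, b, 100000)
    r = np.array([residual(x, w) for x in g])
    roots += np.sum(np.sign(r[:-1]) != np.sign(r[1:]))
print(roots, "<=", 2 * len(w) - 2)               # consistent with Lemma 3.9
```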
3.7.9 Proof of Corollary 3.10

This follows as a consequence of the proof strategy used for Theorem 3.8 in Section 3.7.7. Let $u\in\mathcal{K}_0(\Lambda\cup(\Lambda-1),m-1)$ and $v\in\mathbb{R}^{n-1}$ be chosen arbitrarily, and choose $\theta$ and $\phi$ to satisfy (3.68). The 4-tuple $(u,v,\theta,\phi)$ is used in (3.64) to generate the vector pair $(x,y)$ and in (3.66) to generate the pair $(x',y')$. Since $u(\Lambda) = u(\Lambda-1) = 0$ by construction, using $\Lambda_1 = \Lambda$ in (3.69) gives $x(\Lambda) = x'(\Lambda) = 0$, thus implying $x,x'\in\mathcal{K}_0(\Lambda,m)$. We have $y,y'\in\mathbb{R}^n$ from (3.64) and (3.66), and therefore $(x,y),(x',y')\in\mathcal{K}_0(\Lambda,m)\times\mathbb{R}^n = \mathcal{K}$. By (3.65), the pairs $(x,y)$ and $(x',y')$ are indistinguishable under the linear convolution map, since $xy^T - x'(y')^T = X\in\mathcal{N}(\mathcal{S},2)$. Since $x$ and $x'$ are linearly independent by the choice $\phi-\theta\notin\{l\pi \mid l\in\mathbb{Z}\}$, $(x,y)\in\mathcal{K}$ is unidentifiable by Definition 3.1.

We construct a set $\mathcal{G}$ as follows. A vector pair $(x,y)\in\mathcal{G}$ if and only if the conditions (A1) and (A2) from Section 3.7.7 and (A4) below are satisfied.
(A4) $u\in\mathcal{K}_0(\Lambda\cup(\Lambda-1),m-1)$.
Clearly, $\emptyset\ne\mathcal{G}\subset\mathcal{K}_0(\Lambda,m)\times\mathbb{R}^n = \mathcal{K}$ and, from the above arguments, every $(x_*,y_*)\in\mathcal{G}$ is unidentifiable within $\mathcal{K}$. Let us consider the sets
\[
\mathcal{G}'_1 = \big\{ (u,\theta)\in\mathbb{R}^{m-1}\times[0,2\pi) \;\big|\; \text{(A2) and (A4) are true} \big\}, \qquad \mathcal{G}'_2(\theta) = \big\{ (v,\phi)\in\mathbb{R}^{n-1}\times[0,2\pi) \;\big|\; \text{given } \theta\in[0,2\pi)\setminus\{l\pi/2 \mid l\in\mathbb{Z}\},\ \text{(A2) is true} \big\}. \tag{3.76}
\]
Analogous to Section 3.7.7, we get that
1. $\mathcal{K}_0(\Lambda\cup(\Lambda-1),m-1)$ is a $(m-1-p)$ dimensional Borel subset of $\mathbb{R}^{m-1}$,
2. $[0,2\pi)\setminus\{l\pi/2 \mid l\in\mathbb{Z}\}$ is a one dimensional Borel subset of $\mathbb{R}$, and
3. given a value of $\theta\in[0,2\pi)$, $[0,2\pi)\setminus\{l\pi/2,\ \theta+l\pi \mid l\in\mathbb{Z}\}$ is a one dimensional Borel subset of $\mathbb{R}$.
Hence, (3.76), (A2) and (A4) imply that
1. $\mathcal{G}'_1$ is a $(m-1-p)+1 = (m-p)$ dimensional Borel subset of $\mathbb{R}^m$,
2. given $\theta\in[0,2\pi)\setminus\{l\pi/2 \mid l\in\mathbb{Z}\}$, $\mathcal{G}'_2(\theta)$ is a $(n-1)+1 = n$ dimensional Borel subset of $\mathbb{R}^n$,
3. $(\theta,\phi)$ spans a two dimensional Borel subset of $\mathbb{R}^2$, and
4. $(u,v,\theta,\phi)$ spans a $(m-1-p)+(n-1)+2 = (m+n-p)$ dimensional Borel subset of $\mathbb{R}^{m+n}$.
To compute the dimension of the set $\mathcal{G}$, we consider the factorization of $\mathcal{G}$ given by (3.71) and follow the same sequence of arguments as in Section 3.7.7 to conclude that every element $(x,y)\in\mathcal{G}$ is generated by at most a finite number of 4-tuples $(u,v,\theta,\phi)$ satisfying (A2) and (A4), using the uniformly continuous maps in (3.64) (with non-singular Jacobian matrices). Since the set of all 4-tuples $(u,v,\theta,\phi)$ satisfying (A2) and (A4) is a $(m+n-p)$ dimensional Borel subset of $\mathbb{R}^{m+n}$, we get $\mathcal{G}$ as a Borel subset of $\mathbb{R}^{m+n}$ of dimension $(m+n-p)$. We let $\mathcal{G}^* = \mathcal{G}/\mathrm{Id}_\mathbb{R}\subseteq\mathcal{K}/\mathrm{Id}_\mathbb{R}$. By the same argument as in Section 3.7.7, the dimension of $\mathcal{G}/\mathrm{Id}_\mathbb{R}$ is one less than the dimension of $\mathcal{G}$, and hence $\mathcal{G}^*$ is a $(m+n-1-p)$ dimensional set.

3.7.10 Proof of Corollary 3.11

This proof is based on ideas from the proofs of Theorem 3.6 in Section 3.7.5 and Theorem 3.8 in Section 3.7.7. Let $u\in\mathcal{K}_0(\Lambda\cup(\Lambda-1),m-1)$ and $y\in\mathbb{R}^n$ be arbitrarily chosen. Since $n\ge 4$ is an even integer, invoking Lemma 3.5 for $y$ yields a vector $v\in\mathbb{R}^{n-1}$ and a scalar $\phi\in[0,2\pi)$ such that the second relationship in (3.64) is satisfied. Next, we select $\theta\in[0,2\pi)\setminus\{l\pi/2 \mid l\in\mathbb{Z}\}$ and use the parameters $(u,\theta)$ in (3.64) to generate the vector $x\in\mathbb{R}^m$. We reuse the parameter 4-tuple $(u,v,\theta,\phi)$ in (3.66) to generate the pair $(x',y')\in\mathbb{R}^m\times\mathbb{R}^n$. Given the value of $\theta$, we assume $\phi\in[0,2\pi)\setminus\{l\pi/2,\ \theta+l\pi \mid l\in\mathbb{Z}\}$. By construction, we have $0\notin\{\sin\phi,\cos\phi,\sin\theta,\cos\theta,u(1),u(m-1)\}$, so that (3.64) and (3.66) imply $0\notin\{x(1),x(m),x'(1),x'(m)\}$. Furthermore, $u(\Lambda) = u(\Lambda-1) = 0$ by construction, so using $\Lambda_1 = \Lambda$ in (3.69) gives $x(\Lambda) = x'(\Lambda) = 0$, thus implying $x,x'\in\mathcal{K}_0(\Lambda,m)$. Since $y,y'\in\mathbb{R}^n$, we have $(x,y),(x',y')\in\mathcal{K}_0(\Lambda,m)\times\mathbb{R}^n = \mathcal{K}$. By (3.65), the pairs $(x,y)$ and $(x',y')$ are indistinguishable under the linear convolution map, since $xy^T - x'(y')^T = X\in\mathcal{N}(\mathcal{S},2)$. Since $x$ and $x'$ are linearly independent by the choice $\phi-\theta\notin\{l\pi \mid l\in\mathbb{Z}\}$, $(x,y)\in\mathcal{K}$ is unidentifiable by Definition 3.1.

To show that, given any vector $u\in\mathcal{K}_0(\Lambda\cup(\Lambda-1),m-1)$ and any $\theta\in[0,2\pi)\setminus\{l\pi/2 \mid l\in\mathbb{Z}\}$, almost every choice of $y\in\mathbb{R}^n$ yields an unidentifiable pair $(x,y)\in\mathcal{K}$, it suffices to show that $\phi\notin\{l\pi/2,\ \theta+l\pi \mid l\in\mathbb{Z}\}$ holds almost everywhere w.r.t. the measure over $y$. From (3.64), $y$ depends uniformly continuously on the $n$ real numbers $(\phi,v(1),v(2),\dots,v(n-1))$, and these $n$ real numbers completely parametrize $y$. Further, since the measure over $y$ is absolutely continuous w.r.t. the $n$ dimensional Lebesgue measure, it is possible to choose a measure over $\phi$ that is absolutely continuous w.r.t. the one dimensional Lebesgue measure. Since $\{l\pi/2,\ \theta+l\pi \mid l\in\mathbb{Z}\}$ is a zero dimensional set (given the value of $\theta$), $\phi\notin\{l\pi/2,\ \theta+l\pi \mid l\in\mathbb{Z}\}$ is true almost everywhere w.r.t. the measure over $\phi$, and therefore also w.r.t. the measure over $y$.

As was already computed in Section 3.7.9, $\mathcal{K}_0(\Lambda\cup(\Lambda-1),m-1)$ is a $(m-1-p)$ dimensional Borel subset of $\mathbb{R}^{m-1}$ and $[0,2\pi)\setminus\{l\pi/2 \mid l\in\mathbb{Z}\}$ is a one dimensional Borel subset of $\mathbb{R}$.
Therefore, the set $\mathcal{H}$ defined as
\[
\mathcal{H} \triangleq \Big\{ (u',\theta') \;\Big|\; u'\in\mathcal{K}_0\big(\Lambda\cup(\Lambda-1),m-1\big),\ \theta'\in[0,2\pi)\setminus\{l\pi/2 \mid l\in\mathbb{Z}\} \Big\} \tag{3.77}
\]
is a $(m-1-p)+1 = (m-p)$ dimensional Borel subset of $\mathbb{R}^m$. Let $\mathcal{H}'\subseteq\mathcal{K}_0(\Lambda,m)$ denote the set of all vectors $x\in\mathcal{K}_0(\Lambda,m)$ such that $\exists(u,\theta)\in\mathcal{H}$ generating the vector $x$ using (3.64). Since (3.64) represents uniformly continuous maps with non-singular Jacobian matrices, $\mathcal{H}'$ is a $(m-p)$ dimensional Borel subset of $\mathbb{R}^m$. We set $\mathcal{G}' = \mathcal{H}'\cap\{w\in\mathbb{R}^m \mid \|w\|_2 = 1\}$. Observing that $\mathcal{H}'$ is a cone, and that we form $\mathcal{G}'\subset\mathcal{H}'\subset\mathbb{R}^m$ by selecting only those vectors from $\mathcal{H}'$ that have unit $\ell_2$-norm, it is easy to conclude that $\mathcal{G}'$ has one less degree of freedom than $\mathcal{H}'$. Thus, $\mathcal{G}'\subseteq\mathcal{K}_0(\Lambda,m)\cap\{w\in\mathbb{R}^m \mid \|w\|_2 = 1\}$ is a $(m-p-1)$ dimensional set. We note that, given any $x\in\mathcal{G}'$, Lemma 3.9 implies that the quotient set $\mathcal{Q}_\sim(x,m)$ is finite, and therefore each $x\in\mathcal{G}'$ can be generated by at most a finite number of elements $(u,\theta)\in\mathcal{H}$ using (3.64). Given $x\in\mathcal{G}'$, let $\Theta\subset[0,2\pi)$ be a set such that for every $\theta\in\Theta$ there exists $(u,\theta)\in\mathcal{H}$ generating $x$ using (3.64). Invoking the same measure-theoretic arguments as earlier in this proof, the set $\{l\pi/2,\ \theta+l\pi \mid l\in\mathbb{Z},\ \theta\in\Theta\}$ is countable and hence $\phi\notin\{l\pi/2,\ \theta+l\pi \mid l\in\mathbb{Z},\ \theta\in\Theta\}$ is true almost everywhere w.r.t. the measure over $y$, implying that $(x,y)\in\mathcal{K}$ is unidentifiable almost everywhere w.r.t. the measure over $y$. Since $x\in\mathcal{G}'$ was arbitrarily chosen, the proof is done.

3.7.11 Proof of Corollary 3.12

For notational convenience, we shall denote $g$ by $x$ and $h$ by $y$. Architecturally, the proof uses (3.64) and (3.66) as generative models and (3.65) as the ambiguity model, but the specific construction of the set $\mathcal{G}^*$ is different from that of Corollary 3.10. For intuitive clarity, we shall assume that this proof architecture holds and derive some sufficient conditions that lead to the desired construction for the set $\mathcal{G}^*$. We choose the parameters $v\in\mathbb{R}^{n-1}$, $\phi\in[0,2\pi)$ arbitrarily and generate the vector $y\in\mathbb{R}^n$ using the second relationship in (3.64). Let $u\in\mathbb{R}^{m-1}$, $\theta\in[0,2\pi)$ represent parameters such that $x\in\mathcal{K}_1(\Lambda',m)$ is generated by the first relationship in (3.64) and, using the 4-tuple $(u,v,\theta,\phi)$ in (3.66), we generate $(x',y')\in\mathcal{K} = \mathcal{K}_1(\Lambda',m)\times\mathbb{R}^n$. By definition of $\mathcal{K}_1(\Lambda',m)$, such a construction is consistent if and only if we have
\begin{align}
x(\Lambda') &= u(\Lambda')\cos\theta - u(\Lambda'-1)\sin\theta = c_1(\theta,\phi)\,\mathbf{1}, \tag{3.78a}\\
x'(\Lambda') &= u(\Lambda')\cos\phi - u(\Lambda'-1)\sin\phi = c_2(\theta,\phi)\,\mathbf{1}, \tag{3.78b}
\end{align}
for some scalars $c_1(\theta,\phi),c_2(\theta,\phi)\in\mathbb{R}\setminus\{0\}$ that could depend on $(\theta,\phi)$, and $0\notin\{u(1),u(m-1)\}$. Solving the system of equations in (3.78) for $u(\Lambda')$ and $u(\Lambda'-1)$, we get
\begin{align}
u(\Lambda') &= \frac{c_1(\theta,\phi)\sin\phi - c_2(\theta,\phi)\sin\theta}{\sin(\phi-\theta)}\,\mathbf{1}, \tag{3.79a}\\
u(\Lambda'-1) &= \frac{c_1(\theta,\phi)\cos\phi - c_2(\theta,\phi)\cos\theta}{\sin(\phi-\theta)}\,\mathbf{1}. \tag{3.79b}
\end{align}
Given a value of $\phi\in[0,2\pi)\setminus\{s\pi/2 \mid s\in\mathbb{Z}\}$, let $\mathcal{G}(\phi)\subseteq\mathbb{R}^{m-1}\times[0,2\pi)$ denote the set of solutions $(u,\theta)$ satisfying the system of equations in (3.79) with the restriction $\theta\notin\{\phi+s\pi,\ s\pi/2 \mid s\in\mathbb{Z}\}$. For the time being, we assume that $\mathcal{G}(\phi)$ is non-empty (this will be shown later in the proof). Thus, the construction of $(x,y),(x',y')\in\mathcal{K}$ is consistent, and invoking (3.65) shows that these vector pairs are indistinguishable under the linear convolution map since $xy^T - x'(y')^T = X\in\mathcal{N}(\mathcal{S},2)$. Since $x$ and $x'$ are linearly independent by the choice $\phi-\theta\notin\{s\pi \mid s\in\mathbb{Z}\}$ and $\theta,\phi\notin\{s\pi/2 \mid s\in\mathbb{Z}\}$, $(x,y)\in\mathcal{K}$ is unidentifiable by Definition 3.1.

Next we show that $\mathcal{G}(\phi)$ is non-empty. First, we assume that $\Lambda'\cap(\Lambda'-1) = \emptyset$.
In this case, $u(\Lambda')$ and $u(\Lambda'-1)$ do not share any variables, so that any $\theta\notin\{\phi+s\pi \mid s\in\mathbb{Z}\}$ yields a consistent assignment for $u\big(\Lambda'\cup(\Lambda'-1)\big)$ through (3.79), with $c_1(\theta,\phi)$ and $c_2(\theta,\phi)$ set to arbitrarily fixed non-zero constants. Assigning arbitrary non-zero values to the variables $u(j)$ for $j\notin\Lambda'\cup(\Lambda'-1)$ yields an element of $\mathcal{G}(\phi)$, thus proving its non-emptiness. Next, we assume $\Lambda'\cap(\Lambda'-1)\ne\emptyset$. In this case, $u(\Lambda')$ and $u(\Lambda'-1)$ share at least one variable. Any variable so shared is assigned different symbolic representations by (3.79a) and (3.79b), and thus these assignments must be identical for the consistency of the system of equations in (3.79). Since $\theta\notin\{\phi+s\pi \mid s\in\mathbb{Z}\}$, $\sin(\phi-\theta)$ is non-zero and we must have
\[
\frac{c_2(\theta,\phi)}{c_1(\theta,\phi)} = \frac{\sin\phi - \cos\phi}{\sin\theta - \cos\theta} \tag{3.80}
\]
for consistency. Choosing any $\phi\notin\{s\pi+\pi/4 \mid s\in\mathbb{Z}\}$, $\theta\notin\{\phi+s\pi,\ s\pi+\pi/4 \mid s\in\mathbb{Z}\}$ and setting $c_1(\theta,\phi) = \sin\theta - \cos\theta$ and $c_2(\theta,\phi) = \sin\phi - \cos\phi$ satisfies (3.80) and thus generates a consistent assignment to $u\big(\Lambda'\cup(\Lambda'-1)\big)$ through (3.79). Assigning arbitrary non-zero values to the variables $u(j)$ for $j\notin\Lambda'\cup(\Lambda'-1)$ yields an element of $\mathcal{G}(\phi)$, thus proving its non-emptiness.

Let $\mathcal{G}'\subseteq\mathcal{K}$ be a set such that $(x,y)\in\mathcal{G}'$ if and only if condition (A1) from Section 3.7.7 and the conditions (A5) and (A6) stated below are satisfied.
(A5) $\phi\in[0,2\pi)\setminus\{s\pi+\pi/4,\ s\pi/2 \mid s\in\mathbb{Z}\}$ and $\theta\in[0,2\pi)\setminus\{\phi+s\pi,\ s\pi+\pi/4,\ s\pi/2 \mid s\in\mathbb{Z}\}$.
(A6) $u$ satisfies (3.79) and $0\notin\{u(1),u(m-1)\}$.
From the arguments in the previous paragraph, it is clear that $\mathcal{G}'\ne\emptyset$ and every $(x_*,y_*)\in\mathcal{G}'$ is unidentifiable within $\mathcal{K}$. We consider the sets
\[
\mathcal{G}'_1 = \Big\{ (v,\theta,\phi)\in\mathbb{R}^{n-1}\times[0,2\pi)^2 \;\Big|\; \text{(A5) is true} \Big\}, \qquad \mathcal{G}'_2(\theta,\phi) = \big\{ u\in\mathbb{R}^{m-1} \;\big|\; \text{given } (\theta,\phi),\ \text{(A6) is true} \big\}. \tag{3.81}
\]
Since $(\theta,\phi)$ satisfying (A5) spans a two dimensional Borel subset of $\mathbb{R}^2$ and $v\in\mathbb{R}^{n-1}$ is unconstrained, (3.81) implies that $\mathcal{G}'_1$ is a $2+(n-1) = (n+1)$ dimensional Borel subset of $\mathbb{R}^{n+1}$. To compute the dimension of $\mathcal{G}'_2(\theta,\phi)$, we consider separately the cases $\Lambda'\cap(\Lambda'-1)\ne\emptyset$ and $\Lambda'\cap(\Lambda'-1) = \emptyset$. If $\Lambda'\cap(\Lambda'-1)\ne\emptyset$ then the arguments in the previous paragraph imply that for $u\in\mathcal{G}'_2(\theta,\phi)$, $u\big(\Lambda'\cup(\Lambda'-1)\big)$ lies on a one dimensional subspace collinear with $\mathbf{1}\in\mathbb{R}^p$, and $u(j)$ is unconstrained for $j\notin\Lambda'\cup(\Lambda'-1)$. Thus, $\mathcal{G}'_2(\theta,\phi)$ is a $1+(m-1-p) = (m-p)$ dimensional Borel subset of $\mathbb{R}^{m-1}$ for a given pair $(\theta,\phi)$ satisfying (A5) and $\Lambda'\cap(\Lambda'-1)\ne\emptyset$. On the other hand, if $\Lambda'\cap(\Lambda'-1) = \emptyset$ then, from the arguments in the previous paragraph, $u(\Lambda')$ and $u(\Lambda'-1)$ independently lie on one dimensional subspaces collinear with $\mathbf{1}\in\mathbb{R}^{p/2}$, and hence $u\big(\Lambda'\cup(\Lambda'-1)\big)$ lies on a two dimensional subspace of $\mathbb{R}^p$ for $u\in\mathcal{G}'_2(\theta,\phi)$. Since $u(j)$ is unconstrained for $j\notin\Lambda'\cup(\Lambda'-1)$, $\mathcal{G}'_2(\theta,\phi)$ is a $2+(m-1-p) = (m+1-p)$ dimensional Borel subset of $\mathbb{R}^{m-1}$ for a given pair $(\theta,\phi)$ satisfying (A5) and $\Lambda'\cap(\Lambda'-1) = \emptyset$.

By definition of $\mathcal{G}'$, it is clear that every $(x,y)\in\mathcal{G}'$ is generated from a triplet $(v,\theta,\phi)\in\mathcal{G}'_1$ and a vector $u\in\mathcal{G}'_2(\theta,\phi)$ using the uniformly continuous maps in (3.64) (with non-singular Jacobian matrices). Invoking Lemma 3.9 in a manner similar to Section 3.7.7, we conclude that every element $(x,y)\in\mathcal{G}'$ is generated by at most a finite number of 4-tuples $(u,v,\theta,\phi)$, and hence $\mathcal{G}'$ is a Borel subset of $\mathbb{R}^{m+n}$ of the same dimension as the set of all 4-tuples $(u,v,\theta,\phi)$ satisfying (A5) and (A6). Thus, $\mathcal{G}'\subseteq\mathbb{R}^{m+n}$ is a $(n+1)+(m-p) = (m+n+1-p)$ dimensional set for $\Lambda'\cap(\Lambda'-1)\ne\emptyset$, and a $(n+1)+(m+1-p) = (m+n+2-p)$ dimensional set for $\Lambda'\cap(\Lambda'-1) = \emptyset$. We let $\mathcal{G}^*\subseteq\mathcal{G}'/\mathrm{Id}_\mathbb{R}\subseteq\mathcal{K}/\mathrm{Id}_\mathbb{R}$. By the same argument as in Section 3.7.7, the dimension of $\mathcal{G}^*$ is one less than the dimension of $\mathcal{G}'$, which completes the proof.
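The disjoint case $\Lambda'\cap(\Lambda'-1) = \emptyset$ can be exercised numerically. The sketch below (Python/NumPy, our own illustration with a hypothetical 0-indexed support and arbitrary constants) assigns $u$ on $\Lambda'\cup(\Lambda'-1)$ via (3.79) and confirms that both $x(\Lambda')$ and $x'(\Lambda')$ are constant vectors while the two pairs convolve identically:

```python
import numpy as np

m, n = 9, 7
Lam = np.array([3, 5])                # 0-indexed; Lam and Lam - 1 are disjoint
theta, phi = 0.5, 1.3
c1, c2 = 1.0, 2.0                     # arbitrary fixed non-zero constants
rng = np.random.default_rng(5)
u = rng.standard_normal(m - 1)
den = np.sin(phi - theta)
u[Lam]     = (c1 * np.sin(phi) - c2 * np.sin(theta)) / den    # (3.79a)
u[Lam - 1] = (c1 * np.cos(phi) - c2 * np.cos(theta)) / den    # (3.79b)
v = rng.standard_normal(n - 1)
x  = np.r_[u, 0] * np.cos(theta) - np.r_[0, u] * np.sin(theta)   # (3.64)
xp = np.r_[u, 0] * np.cos(phi)   - np.r_[0, u] * np.sin(phi)     # (3.66)
y  = np.r_[0, v] * np.sin(phi)   - np.r_[v, 0] * np.cos(phi)
yp = np.r_[0, v] * np.sin(theta) - np.r_[v, 0] * np.cos(theta)
print(x[Lam], xp[Lam])                # constant vectors: c1*1 and c2*1
print(np.allclose(np.convolve(x, y), np.convolve(xp, yp)))       # True
```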
3.7.12 Proof of Corollary 3.13

The chain of arguments in this proof is similar in flavor to that in the proof of Corollary 3.12 in Section 3.7.11, with modifications borrowed from the proof of Corollary 3.10 in Section 3.7.9. For notational convenience, we denote $g$ by $x$ and $h$ by $y$. Let $v\in\mathcal{K}_0(\Lambda''\cup(\Lambda''-1),n-1)$ and $(\theta,\phi)$ satisfying (3.68) be chosen arbitrarily. We choose $u\in\mathbb{R}^{m-1}$ to satisfy (3.78) for some scalars $c_1(\theta,\phi),c_2(\theta,\phi)\in\mathbb{R}\setminus\{0\}$ that could depend on $(\theta,\phi)$, and $0\notin\{u(1),u(m-1)\}$. From the arguments in Section 3.7.11, (3.78) admits a non-empty set of solutions, so that $u$ admits at least one choice. We use the 4-tuple $(u,v,\theta,\phi)$ in (3.64) to generate the vector pair $(x,y)$ and in (3.66) to generate the pair $(x',y')$. By construction, we have $0\notin\{\sin\phi,\cos\phi,\sin\theta,\cos\theta\}$, so that (3.64) and (3.66) imply $0\notin\{y(1),y(n),y'(1),y'(n)\}$. Since $v(\Lambda'') = v(\Lambda''-1) = 0$ by construction, using $\Lambda_2 = \Lambda''$ in (3.69) gives $y(\Lambda'') = y'(\Lambda'') = 0$, thus implying $y,y'\in\mathcal{K}_0(\Lambda'',n)$. Equation (3.78) implies that $x(\Lambda')$ and $x'(\Lambda')$ are both collinear with $\mathbf{1}\in\mathbb{R}^{|\Lambda'|}$, and using $0\notin\{u(1),u(m-1)\}$ implies $0\notin\{x(1),x(m),x'(1),x'(m)\}$ from (3.64) and (3.66). Thus, $x,x'\in\mathcal{K}_1(\Lambda',m)$ by definition, and we have $(x,y),(x',y')\in\mathcal{K}_1(\Lambda',m)\times\mathcal{K}_0(\Lambda'',n) = \mathcal{K}$. By (3.65), the vector pairs $(x,y)$ and $(x',y')$ are indistinguishable under the linear convolution map since $xy^T - x'(y')^T = X\in\mathcal{N}(\mathcal{S},2)$. Since $y$ and $y'$ are linearly independent by the choice $\phi-\theta\notin\{s\pi \mid s\in\mathbb{Z}\}$ and $\theta,\phi\notin\{s\pi/2 \mid s\in\mathbb{Z}\}$, $(x,y)\in\mathcal{K}$ is unidentifiable by Definition 3.1.

Let $\mathcal{G}'\subseteq\mathcal{K}$ be a set such that $(x,y)\in\mathcal{G}'$ if and only if the conditions (A1) from Section 3.7.7, (A5) and (A6) from Section 3.7.11, and (A7) stated below are satisfied.
(A7) $v\in\mathcal{K}_0(\Lambda''\cup(\Lambda''-1),n-1)$.
From the arguments in the previous paragraph, it is clear that $\mathcal{G}'\ne\emptyset$ and every $(x_*,y_*)\in\mathcal{G}'$ is unidentifiable within $\mathcal{K}$. For notational brevity, we let $\mathcal{G}'_1 = \mathcal{K}_0(\Lambda''\cup(\Lambda''-1),n-1)$ and let $\mathcal{G}'_3\subseteq[0,2\pi)^2$ denote the set of all pairs $(\theta,\phi)$ satisfying condition (A5). Reusing the definition of $\mathcal{G}'_2(\theta,\phi)$ from (3.81), we have that every $(x,y)\in\mathcal{G}'$ is generated from a vector $v\in\mathcal{G}'_1$, a pair $(\theta,\phi)\in\mathcal{G}'_3$ and a vector $u\in\mathcal{G}'_2(\theta,\phi)$ using the uniformly continuous maps in (3.64) (with non-singular Jacobian matrices). Note that $\mathcal{G}'_1$ is a $(n-1-p_2)$ dimensional Borel subset of $\mathbb{R}^{n-1}$ and $\mathcal{G}'_3$ is a two dimensional Borel subset of $\mathbb{R}^2$, using arguments from Sections 3.7.9 and 3.7.11 respectively. Furthermore, the arguments in Section 3.7.11 imply that for a given pair $(\theta,\phi)\in\mathcal{G}'_3$, $\mathcal{G}'_2(\theta,\phi)\subseteq\mathbb{R}^{m-1}$ is a $(m-p_1)$ dimensional Borel set for $\Lambda'\cap(\Lambda'-1)\ne\emptyset$ and a $(m+1-p_1)$ dimensional Borel set for $\Lambda'\cap(\Lambda'-1) = \emptyset$. Invoking Lemma 3.9 in a manner analogous to Section 3.7.7, we conclude that every $(x,y)\in\mathcal{G}'$ is generated by at most a finite number of 4-tuples $(u,v,\theta,\phi)$, and hence $\mathcal{G}'$ is a Borel subset of $\mathbb{R}^{m+n}$ of the same dimension as the set of all 4-tuples $(u,v,\theta,\phi)$ satisfying (A5), (A6) and (A7). Thus, $\mathcal{G}'$ is a $(n-1-p_2)+2+(m-p_1) = (m+n+1-p_1-p_2)$ dimensional set for $\Lambda'\cap(\Lambda'-1)\ne\emptyset$, and a $(n-1-p_2)+2+(m+1-p_1) = (m+n+2-p_1-p_2)$ dimensional set for $\Lambda'\cap(\Lambda'-1) = \emptyset$. We let $\mathcal{G}^*\subseteq\mathcal{G}'/\mathrm{Id}_\mathbb{R}\subseteq\mathcal{K}/\mathrm{Id}_\mathbb{R}$. By the same argument as in Section 3.7.7, the dimension of $\mathcal{G}^*$ is one less than the dimension of $\mathcal{G}'$, which completes the proof.
3.7.13 Proof of Corollary 3.14

This proof is conceptually similar to the proof of Corollary 3.12 in Section 3.7.11, the only difference being the use of the cooperative code $b\in\mathbb{R}^{|\Lambda'|}$ in place of the repetition code $\mathbf{1}\in\mathbb{R}^{|\Lambda'|}$, and the ensuing change in the premise of Corollary 3.14 for the case $\Lambda'\cap(\Lambda'-1)\ne\emptyset$. For notational clarity, we denote $g$ by $x$ and $h$ by $y$. Let the parameters $v\in\mathbb{R}^{n-1}$, $\phi\in[0,2\pi)$ be arbitrarily chosen, and let $u\in\mathbb{R}^{m-1}$, $\theta\in[0,2\pi)$ represent parameters such that the 4-tuple $(u,v,\theta,\phi)$ generates $(x,y)\in\mathcal{K} = \mathcal{K}_b(\Lambda',m)\times\mathbb{R}^n$ from (3.64) and $(x',y')\in\mathcal{K}$ from (3.66). For the construction to be consistent we must have
\begin{align}
x(\Lambda') &= u(\Lambda')\cos\theta - u(\Lambda'-1)\sin\theta = c_1(\theta,\phi)\,b, \tag{3.82a}\\
x'(\Lambda') &= u(\Lambda')\cos\phi - u(\Lambda'-1)\sin\phi = c_2(\theta,\phi)\,b, \tag{3.82b}
\end{align}
for some scalars $c_1(\theta,\phi),c_2(\theta,\phi)\in\mathbb{R}\setminus\{0\}$ that could depend on $(\theta,\phi)$, and $0\notin\{u(1),u(m-1)\}$. Solving (3.82) for $u(\Lambda')$ and $u(\Lambda'-1)$, we get
\begin{align}
u(\Lambda') &= \frac{c_1(\theta,\phi)\sin\phi - c_2(\theta,\phi)\sin\theta}{\sin(\phi-\theta)}\,b, \tag{3.83a}\\
u(\Lambda'-1) &= \frac{c_1(\theta,\phi)\cos\phi - c_2(\theta,\phi)\cos\theta}{\sin(\phi-\theta)}\,b. \tag{3.83b}
\end{align}
Given a value of $\phi\in[0,2\pi)\setminus\{s\pi/2 \mid s\in\mathbb{Z}\}$, if there exists a solution $(u,\theta)$ satisfying (3.83) with the restriction $\theta\notin\{\phi+s\pi,\ s\pi/2 \mid s\in\mathbb{Z}\}$, then the construction of $(x,y),(x',y')\in\mathcal{K}$ is consistent. Invoking (3.65) shows that $(x,y)$ and $(x',y')$ are indistinguishable under the linear convolution map since $xy^T - x'(y')^T = X\in\mathcal{N}(\mathcal{S},2)$. Since $x$ and $x'$ are linearly independent by the choice $\phi-\theta\notin\{s\pi \mid s\in\mathbb{Z}\}$ and $\theta,\phi\notin\{s\pi/2 \mid s\in\mathbb{Z}\}$, $(x,y)\in\mathcal{K}$ is unidentifiable by Definition 3.1.

To show that the system in (3.83) admits at least one solution, we invoke the non-emptiness arguments of Section 3.7.11. For $\Lambda'\cap(\Lambda'-1) = \emptyset$, the arguments hold verbatim from Section 3.7.11. For $\Lambda'\cap(\Lambda'-1)\ne\emptyset$, $u\big(\Lambda'\cap(\Lambda'-1)\big)$ must receive consistent assignments from (3.83a) and (3.83b). By definition of the index subset $\Lambda_*\subseteq\{1,2,\dots,|\Lambda'|\}$ in the statement of Corollary 3.14, (3.83a) and (3.83b) respectively imply (3.84a) and (3.84b) below:
\begin{align}
u\big(\Lambda'\cap(\Lambda'-1)\big) &= \frac{c_1(\theta,\phi)\sin\phi - c_2(\theta,\phi)\sin\theta}{\sin(\phi-\theta)}\,b(\Lambda_*), \tag{3.84a}\\
u\big(\Lambda'\cap(\Lambda'-1)\big) &= \frac{c_1(\theta,\phi)\cos\phi - c_2(\theta,\phi)\cos\theta}{\sin(\phi-\theta)}\,b(\Lambda_*+1). \tag{3.84b}
\end{align}
By the hypotheses of Corollary 3.14, $b(\Lambda_*)$ and $b(\Lambda_*+1)$ are collinear. So there exists a constant $r\in\mathbb{R}\setminus\{0\}$ such that $b(\Lambda_*+1) = r\,b(\Lambda_*)$. Since $\sin(\phi-\theta)\ne 0$ for $\theta\notin\{\phi+s\pi \mid s\in\mathbb{Z}\}$, the consistency requirement in (3.84) reduces to
\[
c_1(\theta,\phi)\sin\phi - c_2(\theta,\phi)\sin\theta = r\,c_1(\theta,\phi)\cos\phi - r\,c_2(\theta,\phi)\cos\theta, \tag{3.85}
\]
which implies
\[
\frac{c_2(\theta,\phi)}{c_1(\theta,\phi)} = \frac{\sin\phi - r\cos\phi}{\sin\theta - r\cos\theta} = \frac{\sin(\phi-A)}{\sin(\theta-A)} \tag{3.86}
\]
for $A = \arctan(r)$. Choosing any $\phi\notin\{s\pi+A \mid s\in\mathbb{Z}\}$, $\theta\notin\{\phi+s\pi,\ s\pi+A \mid s\in\mathbb{Z}\}$ and setting $c_1(\theta,\phi) = \sin(\theta-A)$ and $c_2(\theta,\phi) = \sin(\phi-A)$ satisfies (3.86) and thus generates a consistent assignment to $u\big(\Lambda'\cup(\Lambda'-1)\big)$ through (3.83). This proves the existence of at least one solution to (3.83) for $\Lambda'\cap(\Lambda'-1)\ne\emptyset$.

Let $\mathcal{G}'\subseteq\mathcal{K}$ be a set such that $(x,y)\in\mathcal{G}'$ if and only if condition (A1) from Section 3.7.7 and the conditions (A8) and (A9) stated below are satisfied.
(A8) $\phi\in[0,2\pi)\setminus\{s\pi+A,\ s\pi/2 \mid s\in\mathbb{Z}\}$ and $\theta\in[0,2\pi)\setminus\{\phi+s\pi,\ s\pi+A,\ s\pi/2 \mid s\in\mathbb{Z}\}$.
(A9) $u$ satisfies (3.83) and $0\notin\{u(1),u(m-1)\}$.
In the previous paragraph, we proved that $\mathcal{G}'\ne\emptyset$, and by construction it is clear that every $(x_*,y_*)\in\mathcal{G}'$ is unidentifiable within $\mathcal{K}$. Considering the sets
\[
\mathcal{G}'_1 = \Big\{ (v,\theta,\phi)\in\mathbb{R}^{n-1}\times[0,2\pi)^2 \;\Big|\; \text{(A8) is true} \Big\}, \qquad \mathcal{G}'_2(\theta,\phi) = \big\{ u\in\mathbb{R}^{m-1} \;\big|\; \text{given } (\theta,\phi),\ \text{(A9) is true} \big\}, \tag{3.87}
\]
and following the same chain of arguments as in the dimension computation of Section 3.7.11, we get $\mathcal{G}'_1$ as a $(n+1)$ dimensional Borel subset of $\mathbb{R}^{n+1}$, and the dimension of $\mathcal{G}'_2(\theta,\phi)$ is computed as follows.
If $\Lambda'\cap(\Lambda'-1) = \emptyset$ then $u(\Lambda')$ and $u(\Lambda'-1)$ lie (independently of each other) on one dimensional subspaces collinear with $b\in\mathbb{R}^{p/2}$, and hence $u\big(\Lambda'\cup(\Lambda'-1)\big)$ lies on a two dimensional subspace of $\mathbb{R}^p$ for $u\in\mathcal{G}'_2(\theta,\phi)$. Since $u(j)$ is unconstrained for $j\notin\Lambda'\cup(\Lambda'-1)$, $\mathcal{G}'_2(\theta,\phi)$ is a $2+(m-1-p) = (m+1-p)$ dimensional Borel subset of $\mathbb{R}^{m-1}$ for a given pair $(\theta,\phi)$ satisfying (A8). On the other hand, if $\Lambda'\cap(\Lambda'-1)\ne\emptyset$ then (3.83) and (3.84) imply that for $u\in\mathcal{G}'_2(\theta,\phi)$, the entries $u(j)$ are unconstrained for $j\notin\Lambda'\cup(\Lambda'-1)$ and $u\big(\Lambda'\cup(\Lambda'-1)\big)$ lies on a one dimensional subspace of $\mathbb{R}^p$ given by
\[
\begin{bmatrix} u\big(\Lambda'\setminus(\Lambda'-1)\big) \\ u\big(\Lambda'\cap(\Lambda'-1)\big) \\ u\big((\Lambda'-1)\setminus\Lambda'\big) \end{bmatrix} = c(\theta,\phi)\begin{bmatrix} b(\Lambda_*^c) \\ b(\Lambda_*) \\ \tfrac{1}{r}\,b(\Lambda_*^c) \end{bmatrix} \tag{3.88}
\]
for some scalar $c(\theta,\phi)\in\mathbb{R}\setminus\{0\}$. Thus, for a given pair $(\theta,\phi)$ satisfying (A8), $\mathcal{G}'_2(\theta,\phi)$ is a $1+(m-1-p) = (m-p)$ dimensional Borel subset of $\mathbb{R}^{m-1}$. Setting $\mathcal{G}^*\subseteq\mathcal{G}'/\mathrm{Id}_\mathbb{R}\subseteq\mathcal{K}/\mathrm{Id}_\mathbb{R}$ and using arguments from Section 3.7.7, we conclude that the dimension of $\mathcal{G}^*$ is one less than the dimension of $\mathcal{G}'$. Following the same chain of arguments as in the concluding dimension computation of Section 3.7.11, we get $\mathcal{G}^*$ as a $(n+1)+(m-p)-1 = (m+n-p)$ dimensional set for $\Lambda'\cap(\Lambda'-1)\ne\emptyset$ and as a $(n+1)+(m+1-p)-1 = (m+n+1-p)$ dimensional set for $\Lambda'\cap(\Lambda'-1) = \emptyset$.

3.7.14 Proof of Corollary 3.15

This proof combines parts of the proofs of Corollaries 3.12, 3.13 and 3.14 in Sections 3.7.11, 3.7.12 and 3.7.13, respectively. For notational convenience, we denote $g$ by $x$ and $h$ by $y$. Since every unidentifiability result in this chapter uses (3.64) to construct unidentifiable pairs $(x,y)$ in the feasible set $\mathcal{K} = \mathcal{D}_1(m)\times\mathcal{D}_2(n)$, and the generative parameters for $x$ and $y$ in (3.64) are disjoint, the same generative use of (3.64) also works for the feasible set $\mathcal{K} = \mathcal{D}_2(m)\times\mathcal{D}_1(n)$ to produce analogous unidentifiability results. For example, Corollary 3.13 asserts that for $\mathcal{K} = \mathcal{K}_1(\Lambda',m)\times\mathcal{K}_0(\Lambda'',n)$ and $\Lambda'\cap(\Lambda'-1)\ne\emptyset$, there exists an unidentifiable subset of $\mathcal{K}/\mathrm{Id}_\mathbb{R}$ of dimension $(m+n-p_1-p_2)$. Invoking the (informal) disjoint generation argument, we get that for $\mathcal{K} = \mathcal{K}_0(\Lambda',m)\times\mathcal{K}_1(\Lambda'',n)$ and $\Lambda''\cap(\Lambda''-1)\ne\emptyset$, there exists an unidentifiable subset of $\mathcal{K}/\mathrm{Id}_\mathbb{R}$ of dimension $(m+n-p_1-p_2)$. This conclusion can be verified by going through the chain of arguments in Section 3.7.12 with appropriate modifications to account for the change in the feasible domain $\mathcal{K}$ w.r.t. Corollary 3.13. We conclude that it suffices to consider unordered pairs $(t,t')\in\{0,1,2\}^2$ for the purpose of this proof. Note that Theorem 3.8 is exactly equivalent to the case $(t,t') = (0,0)$, and Corollary 3.13 addresses special sub-cases of both $(t,t') = (1,0)$ and $(t,t') = (2,0)$. In the rest of the proof, we complete the treatment of the cases $(t,t')\in\{(1,0),(2,0)\}$ and also prove Corollary 3.15 for the cases $(t,t')\in\{(1,1),(2,1),(2,2)\}$.

Proof for $(t,t')\in\{(1,0),(2,0)\}$

From Definition 3.2, it is clear that Corollary 3.13 addressed the sub-case of $b$ collinear with $\mathbf{1}\in\mathbb{R}^{|\Lambda'|}$, under both $\Lambda'\cap(\Lambda'-1)\ne\emptyset$ and $\Lambda'\cap(\Lambda'-1) = \emptyset$. To address the other sub-case of $b$ not collinear with $\mathbf{1}\in\mathbb{R}^{|\Lambda'|}$, under both $\Lambda'\cap(\Lambda'-1)\ne\emptyset$ and $\Lambda'\cap(\Lambda'-1) = \emptyset$, we observe that the assumptions of this sub-case also occur in the premise of Corollary 3.14.
Comparing the proofs of Corollaries 3.14 and 3.12 in Sections 3.7.13 and 3.7.11 respectively, it can be seen that Section 3.7.13 follows the same sequence of arguments as Section 3.7.11 after replacing (3.78) and (3.79) by (3.82) and (3.83) respectively, thus changing the feasible set from $\mathcal{K}_1(\Lambda',m)\times\mathbb{R}^n$ in Corollary 3.12 to $\mathcal{K}_b(\Lambda',m)\times\mathbb{R}^n$ in Corollary 3.14. Since the proof of Corollary 3.13 in Section 3.7.12 uses the first part of the argument in Section 3.7.11 (i.e. the part pertaining to the use of (3.78) and (3.79)) without modification, and changes the feasible set from $\mathcal{K}_1(\Lambda',m)\times\mathbb{R}^n$ in Corollary 3.12 to $\mathcal{K}_1(\Lambda',m)\times\mathcal{K}_0(\Lambda'',n)$ in Corollary 3.13, it is possible to merge the domain changes between Corollaries 3.12 and 3.14 with the domain changes between Corollaries 3.12 and 3.13 to arrive at unidentifiability results for the feasible set $\mathcal{K}_b(\Lambda',m)\times\mathcal{K}_0(\Lambda'',n)$, under both $\Lambda'\cap(\Lambda'-1)\ne\emptyset$ and $\Lambda'\cap(\Lambda'-1) = \emptyset$. To see that this argument is indeed valid, we recall the (informal) disjoint generative parameters phenomenon mentioned at the beginning of the present section. The proof is finished by observing that, in (3.81), the change between Corollaries 3.12 and 3.13 modifies $\mathcal{G}'_1$ only and the change between Corollaries 3.12 and 3.14 modifies $\mathcal{G}'_2(\theta,\phi)$ only, so that both changes can be applied simultaneously.

Proof for $(t,t')\in\{(1,1),(2,1),(2,2)\}$

For this part of the proof, we use the sequence of arguments from Corollary 3.14 twice, viz. once for $x\in\mathcal{K}_b(\Lambda',m)$ and once for $y\in\mathcal{K}_{b'}(\Lambda'',n)$. Specifically, we choose the 4-tuple of parameters $(u,v,\theta,\phi)$ to generate $(x,y)\in\mathcal{K} = \mathcal{K}_b(\Lambda',m)\times\mathcal{K}_{b'}(\Lambda'',n)$ from (3.64) and $(x',y')\in\mathcal{K}$ from (3.66). For the construction to be consistent, $u$ must satisfy (3.82) for some scalars $c_1(\theta,\phi),c_2(\theta,\phi)\in\mathbb{R}\setminus\{0\}$ that could depend on $(\theta,\phi)$, and $0\notin\{u(1),u(m-1)\}$. Similarly, consistency also requires $v$ to satisfy
\begin{align}
y(\Lambda'') &= v(\Lambda''-1)\sin\phi - v(\Lambda'')\cos\phi = c'_1(\theta,\phi)\,b', \tag{3.89a}\\
y'(\Lambda'') &= v(\Lambda''-1)\sin\theta - v(\Lambda'')\cos\theta = c'_2(\theta,\phi)\,b', \tag{3.89b}
\end{align}
for some scalars $c'_1(\theta,\phi),c'_2(\theta,\phi)\in\mathbb{R}\setminus\{0\}$ that could depend on $(\theta,\phi)$, and $0\notin\{v(1),v(n-1)\}$. If $(\Lambda',b)$ is of type $t = 2$, then the arguments in Section 3.7.13 imply that (3.82) admits at least one solution $(u,\theta)$ via the assignment in (3.83). On the other hand, if $(\Lambda',b)$ is of type $t = 1$, then the arguments for the sub-case $\Lambda'\cap(\Lambda'-1)\ne\emptyset$ in Sections 3.7.11 and 3.7.13 together imply that there exists a solution $(u,\theta)$ satisfying (3.82) via the consistent assignment in (3.83). Details of the arguments in Sections 3.7.11 and 3.7.13 further imply that it is possible to choose $\phi\in[0,2\pi)\setminus\{s\pi/2 \mid s\in\mathbb{Z}\}$ and $\theta\in[0,2\pi)\setminus\{\phi+s\pi,\ s\pi/2 \mid s\in\mathbb{Z}\}$, so that $0\notin\{\sin\phi,\cos\phi,\sin\theta,\cos\theta\}$, and it follows from (3.64) and (3.66) that $0\notin\{x(1),x(m),x'(1),x'(m)\}$ and $0\notin\{y(1),y(n),y'(1),y'(n)\}$ on invoking $0\notin\{u(1),u(m-1),v(1),v(n-1)\}$. Since we concluded the existence of a solution to (3.82) using nothing more than the type information of the pair $(\Lambda',b)$, the same arguments and conclusions also hold for (3.89) if $(\Lambda'',b')$ is of type $t' = 1$ or type $t' = 2$, i.e. (3.89) admits at least one solution $(v,\phi)$, satisfying $\phi\in[0,2\pi)\setminus\{s\pi/2 \mid s\in\mathbb{Z}\}$ and $\theta\in[0,2\pi)\setminus\{\phi+s\pi,\ s\pi/2 \mid s\in\mathbb{Z}\}$, via the assignment
\begin{align}
v(\Lambda'') &= \frac{c'_1(\theta,\phi)\sin\theta - c'_2(\theta,\phi)\sin\phi}{\sin(\phi-\theta)}\,b', \tag{3.90a}\\
v(\Lambda''-1) &= \frac{c'_1(\theta,\phi)\cos\theta - c'_2(\theta,\phi)\cos\phi}{\sin(\phi-\theta)}\,b', \tag{3.90b}
\end{align}
which follows from (3.89) on solving for $v(\Lambda'')$ and $v(\Lambda''-1)$.
Hence, the construction of $(x,y),(x',y')\in\mathcal{K}$ is consistent, and (3.65) implies that $(x,y)$ and $(x',y')$ are indistinguishable under the linear convolution map since $xy^T - x'(y')^T = X\in\mathcal{N}(\mathcal{S},2)$. Since $x$ and $x'$ are linearly independent by the choice $\phi-\theta\notin\{s\pi \mid s\in\mathbb{Z}\}$ and $\theta,\phi\notin\{s\pi/2 \mid s\in\mathbb{Z}\}$, $(x,y)\in\mathcal{K}$ is unidentifiable by Definition 3.1. If $(\Lambda'',b')$ is of type $t' = 1$, we invoke the arguments in the sequence of implications (3.83) to (3.86) from Section 3.7.13 to infer the existence of a constant $r'\in\mathbb{R}\setminus\{0\}$ such that the consistency of the assignment in (3.90) reduces to
\[
c'_1(\theta,\phi)\sin\theta - c'_2(\theta,\phi)\sin\phi = r'\,c'_1(\theta,\phi)\cos\theta - r'\,c'_2(\theta,\phi)\cos\phi, \tag{3.91}
\]
which implies
\[
\frac{c'_1(\theta,\phi)}{c'_2(\theta,\phi)} = \frac{\sin\phi - r'\cos\phi}{\sin\theta - r'\cos\theta} = \frac{\sin(\phi-B)}{\sin(\theta-B)} \tag{3.92}
\]
for $B = \arctan(r')$. For this case, choosing any $\phi\notin\{s\pi+B \mid s\in\mathbb{Z}\}$, $\theta\notin\{\phi+s\pi,\ s\pi+B \mid s\in\mathbb{Z}\}$ and setting $c'_1(\theta,\phi) = \sin(\phi-B)$, $c'_2(\theta,\phi) = \sin(\theta-B)$ satisfies (3.92) and thus generates a consistent assignment to $v\big(\Lambda''\cup(\Lambda''-1)\big)$ through (3.90).

We let $\mathcal{G}'\subseteq\mathcal{K}$ be a set such that $(x,y)\in\mathcal{G}'$ if and only if the conditions (A1) from Section 3.7.7, (A9) from Section 3.7.13, and (A10) and (A11) stated below are satisfied.
(A10) $\phi\in[0,2\pi)\setminus\{s\pi+A,\ s\pi+B,\ s\pi/2 \mid s\in\mathbb{Z}\}$ and $\theta\in[0,2\pi)\setminus\{\phi+s\pi,\ s\pi+A,\ s\pi+B,\ s\pi/2 \mid s\in\mathbb{Z}\}$.
(A11) $v$ satisfies (3.90) and $0\notin\{v(1),v(n-1)\}$.
From the preceding arguments, it is clear that $\mathcal{G}'\ne\emptyset$ and every $(x_*,y_*)\in\mathcal{G}'$ is unidentifiable within $\mathcal{K}$. Considering the set $\mathcal{G}'_2(\theta,\phi)$ from (3.87) and the sets
\[
\mathcal{G}'_1(\theta,\phi) = \big\{ v\in\mathbb{R}^{n-1} \;\big|\; \text{given } (\theta,\phi),\ \text{(A11) is true} \big\}, \qquad \mathcal{G}'_3 = \Big\{ (\theta,\phi)\in[0,2\pi)^2 \;\Big|\; \text{(A10) is true} \Big\}, \tag{3.93}
\]
we invoke arguments similar to those in the concluding dimension computation of Section 3.7.13. Thus, $\mathcal{G}'_2(\theta,\phi)$ is a $(m+1-p)$ dimensional (respectively $(m-p)$ dimensional) Borel subset of $\mathbb{R}^{m-1}$ if $(\Lambda',b)$ is of type $t = 2$ (respectively type $t = 1$). This yields a simple analytical formula: $\mathcal{G}'_2(\theta,\phi)$ is a $(m-1-p+t)$ dimensional Borel subset of $\mathbb{R}^{m-1}$ if $(\Lambda',b)$ is of type $t\in\{1,2\}$. By symmetry of definition, we also get $\mathcal{G}'_1(\theta,\phi)$ as a $(n-1-p'+t')$ dimensional Borel subset of $\mathbb{R}^{n-1}$ if $(\Lambda'',b')$ is of type $t'\in\{1,2\}$. Clearly, $\mathcal{G}'_3$ is a two dimensional Borel subset of $\mathbb{R}^2$, and every $(x,y)\in\mathcal{G}'$ is generated from a pair $(\theta,\phi)\in\mathcal{G}'_3$, a vector $v\in\mathcal{G}'_1(\theta,\phi)$ and a vector $u\in\mathcal{G}'_2(\theta,\phi)$ using the uniformly continuous maps in (3.64) (with non-singular Jacobian matrices). Invoking Lemma 3.9 in a manner similar to Section 3.7.7, we conclude that every element $(x,y)\in\mathcal{G}'$ is generated by at most a finite number of 4-tuples $(u,v,\theta,\phi)$, and hence $\mathcal{G}'$ is a Borel subset of $\mathbb{R}^{m+n}$ of the same dimension as the set of all 4-tuples $(u,v,\theta,\phi)$ satisfying (A9), (A10) and (A11). Thus, $\mathcal{G}'$ is a $2+(m-1-p+t)+(n-1-p'+t') = (m+n-p-p'+t+t')$ dimensional set for $(\Lambda',b)$ of type $t$ and $(\Lambda'',b')$ of type $t'$, for $(t,t')\in\{(1,1),(2,1),(2,2)\}$. Setting $\mathcal{G}^*\subseteq\mathcal{G}'/\mathrm{Id}_\mathbb{R}\subseteq\mathcal{K}/\mathrm{Id}_\mathbb{R}$ and invoking the same argument as in Section 3.7.7, the dimension of $\mathcal{G}^*$ is one less than the dimension of $\mathcal{G}'$, which completes the proof.

Chapter 4
Active Target Localization on Decaying Separable Fields

4.1 Introduction

Detecting and localizing a target from samples of its induced field is an important problem of interest, with manifestations in a wide variety of applications like environmental monitoring, cyber-security, medical diagnosis and military surveillance. Because of its ubiquity, a rich literature has evolved around this problem and its application-specific variations, utilizing ideas from statistics, signal processing, information theory, machine learning and data mining.
In this chapter, we study a variation of the target detection and localization problem with sampling constraints on the induced target field. In particular, we consider the scenario where localization is desired from a set of samples that is information-theoretically insufficient to reconstruct the complete target field, and we construct a localization algorithm with an accompanying theoretical performance analysis. The only structural properties we assume are that the target field is separable (or equivalently, bilinearly parameterizable) and is monotonically non-increasing w.r.t. distance from the target location. The possibility of reducing the number of samples required for target detection/localization is of interest for time-critical applications where speed of acquisition is a bottleneck, as in magnetic resonance imaging due to the slow sampling process, and in underwater sonar imaging due to large search spaces.

As a simple illustrative example, consider the side-scan sonar image in Figure 4.1, acquired by an autonomous underwater vehicle (AUV) with the goal of locating the position of the target (marked by a region of high intensity reflection) amongst background clutter (reflections from the sea bed). Examining the complete image, it is easy to identify the location of the object of interest. However, we note that the target field in Figure 4.1 is highly structured and, recalling the philosophy of compressed sensing [4], good detection/localization may be possible from very few samples of the complete field in Figure 4.1, at the expense of using a more sophisticated (but computationally tractable) algorithm. This simple setup extends to the realistic time-critical application of searching for a missing aircraft. If an aircraft crashed into a water body and the flight data recorder (black box) needs to be located, an AUV may be deployed to measure the underwater acoustic signals from the black box, which decay in intensity as one moves farther away from the source. Acquiring physically distinct samples with an AUV is a slow process, since it involves underwater navigation. At the same time, searching for the flight recorder is time-critical: the longer the delay incurred in locating the black box, the lower the chances of finding it intact (or finding it at all).

Figure 4.1: An underwater side-scan sonar image with pseudo-synthetic target signature. The background noise and artifacts are due to reflections from the seabed.

4.1.1 Contributions and Organization

We consider a static separable target field whose magnitude decays monotonically with increasing distance from the true location of the target. We employ an approach based on low-rank matrix completion [69] that allows us to derive a localization algorithm that does not need knowledge of the target field decay profile; the only requirement is that the target field should be separable along some known directions. In particular, the algorithm can be viewed as a solution to the exploration-exploitation problem wherein the possible location of the target is unknown a priori and the sampling strategy enables coarse learning of the location and presence of the target, resulting in subsequent sampling in more informed locations. We prove correctness and convergence of the proposed algorithm, and further develop an analytical trade-off between the number of collected samples and the target localization error in the presence of noise when employing a uniformly random spatial pixel sampling strategy.
Besides lower sampling requirements, separability also gives a computational advantage to the unimodal regression subroutine, in the sense that it only needs to be run on vectors rather than matrices. In contrast to our results, most prior literature on noisy low-rank matrix completion investigates bounds on the mean squared estimation error, and very little is known about the performance of matrix completion for other tasks (like detection or localization). Our approach is fairly general and, as such, does not exploit specialized models for the background clutter beyond a reduced sharpness of the separability assumption. Thus, further improvement in performance may be possible by taking such information into consideration. For example, sonar images of the form in Figure 4.1 suffer from certain position dependent imaging artifacts that may be removed by intermediate processing. We perform extensive numerical experiments on synthetic and real datasets to validate the efficacy and robustness of the proposed approach. In particular, the proposed approach is shown to be robust to missing data points and multiple local peaks, which demonstrates the regularizing properties of separability.

The rest of the chapter is organized as follows. In the remainder of this section, we explore related prior art and define the mathematical notation used in the chapter. Section 4.2 describes and justifies our assumptions on the target field and introduces the lifted reformulation of the underlying field. Section 4.3 describes our localization algorithm and states theoretical results proving its correctness. Section 4.4 reviews other methods against which we compare our algorithm in numerical simulations. Sections 4.5 and 4.6 respectively describe our simulation results on synthetic and real data sets. Section 4.7 concludes the chapter. Detailed proofs of all results in the chapter and useful supplementary material can be found in Section 4.8.

4.1.2 Related Work

For an early survey of active target detection, we refer the reader to [49], consisting of statistical and signal processing approaches that assume availability of the full target field/signature (see also [50], [51]). The field of anomaly detection [52] further generalizes the scope of target detection and employs tools from machine learning; e.g. [53]-[58] perform window-based target detection in full sonar images. General theoretical analysis of either of these problems is plagued by the lack of good models for experimental scenarios that are amenable to tractable analysis. In [58]-[64] the focus is on path planning for active sensing of structured fields (in particular, [62] uses compressed sensing) with an explicit consideration of the navigation cost and stopping time. In contrast, the goal of this chapter is to explore theoretical properties of adaptive sensing for structured fields stemming from the exploration-exploitation trade-off. Early work [65] focusing on target detection in multiple-input-multiple-output (MIMO) radar used a statistical approach, which was refined in [66]-[68] using a combination of joint sparse sensing and low-rank matrix completion ideas, relying on the strong theoretical guarantees of low-rank matrix completion from random samples [69]-[71]. The focus in [66]-[68] is to adapt the design of the MIMO radar array to optimize coherence, which is also very different from our goal here of studying the detection and localization error performance of low-rank matrix completion.
Finally, we note that distilled sensing [72]-[74] has a somewhat similar algorithmic philosophy to ours for target detection, but therein the field is assumed to be sparse rather than low-rank, thus facing basis mismatch challenges [75] that we avoid completely.

4.1.3 Notation

We use lowercase boldface alphabets to denote column vectors (e.g. $z$) and uppercase boldface alphabets to denote matrices (e.g. $A$). MATLAB indexing rules will be used to denote parts of a vector/matrix (e.g. $A(2{:}3,4{:}6)$ denotes the sub-matrix of $A$ formed by the rows $\{2,3\}$ and columns $\{4,5,6\}$). The all-zero, all-one and identity matrices are respectively denoted by $0$, $\mathbf{1}$ and $I$, with dimensions dictated by context. $(\cdot)^T$ denotes the transpose operation and $\langle\cdot,\cdot\rangle$ denotes the standard inner product on $\mathbb{R}^n$. The functions $\|\cdot\|_F$ and $\|\cdot\|_*$ respectively return the Frobenius and nuclear norms of their matrix argument. The function $|\cdot|$ applied to a scalar (respectively a set) returns its absolute value (respectively cardinality). Vector inequalities are assumed to hold element-wise, e.g. if $z\in\mathbb{R}^n$ then $z\le 0$ is shorthand for the $n$ inequality relations $z(j)\le 0,\ \forall\,1\le j\le n$. $\mathbb{R}$ and $\mathbb{Z}_+$ respectively denote the set of real numbers and the set of positive integers. We use the $O(\cdot)$ notation to upper bound the order of growth of a function $f:\mathbb{R}\to\mathbb{R}$ w.r.t. its argument $h\in\mathbb{R}$, i.e.
\[
f(h) = O(h) \iff \lim_{h\to\infty}\frac{f(h)}{h} < \infty.
\]

4.2 System Model

4.2.1 Target Field Assumptions

Let the search region (see Figure 4.1) be the two dimensional unit square $[0,1]^2\subset\mathbb{R}^2$, and let $y = (y_c,y_r)\in[0,1]^2$ denote an arbitrary location in the search space. Let $H:\mathbb{R}^2\to\mathbb{R}$ denote the scalar valued field induced by the target, i.e. the target signature. Thus, a mobile agent measuring the field value at location $y\in\mathbb{R}^2$ would record the value $H(y) = H(y_c,y_r)\in\mathbb{R}$. We make the following key (physically motivated) assumptions on the field $H(y)$:
(A1) $H(y)$ is separable in some known basis of $\mathbb{R}^2$, independent of the true location of the target.
(A2) The magnitude of the field, $|H(y)|$, is a monotonically non-increasing function of the distance from the target in every direction.
(A3) $H(y)$ is spatially invariant relative to the target's position.
Without loss of generality (w.l.o.g.), we assume separability of $H(y)$ in the $y_c$ and $y_r$ directions (i.e. in the canonical basis $\{[1,0],[0,1]\}$) as per (A1). This means that there exist functions $F:\mathbb{R}\to\mathbb{R}$ and $G:\mathbb{R}\to\mathbb{R}$ such that $H(y) = F(y_c)G(y_r),\ \forall(y_c,y_r)\in\mathbb{R}^2$. Notice that if $H(y)$ is instead separable in the rotated directions $\Sigma(1,0)^T$ and $\Sigma(0,1)^T$ for some known $\Sigma\in\mathbb{R}^{2\times 2}$, then we can work in this rotated coordinate system. Assumption (A2) is intuitively clear and can be mathematically described by the inequality
\[
|H(t_1(y-y_0))| \ge |H(t_2(y-y_0))|, \tag{4.1}
\]
holding $\forall y\in\mathbb{R}^2,\ t_2 > t_1 > 0$, where $y_0$ represents the unknown location of the target. Assumption (A3) implies that if the target were moved from $y_0$ to a new position $y'_0$, then the new field at location $y$ would be given by $H(y - y'_0 + y_0)$, thus ensuring that (A1) holds in the canonical basis regardless of the target's position $y_0$. In this sense (A3) is stricter than necessary for our purposes, but we retain it for intuitive clarity.
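Assumptions (A1) and (A2) are easy to visualize numerically. The sketch below (Python/NumPy, our own illustration; the Gaussian field, grid size, and target location are hypothetical choices) discretizes a separable Gaussian field, cf. (4.2b) with a diagonal $\Sigma$, and checks exact rank-one structure and monotone decay away from the peak along the row and column through the peak:

```python
import numpy as np

t = np.linspace(0, 1, 41)
F = np.exp(-25 * (t - 0.6) ** 2)           # factor along y_c, target at y_c = 0.6
G = np.exp(-16 * (t - 0.3) ** 2)           # factor along y_r, target at y_r = 0.3
H = np.outer(G, F)                         # H(y) = F(y_c) G(y_r)
print(np.linalg.matrix_rank(H))            # 1: exact separability, (A1)
i0, j0 = np.unravel_index(np.argmax(H), H.shape)
row, col = H[i0, :], H[:, j0]              # slices through the peak
print(np.all(np.diff(row[:j0 + 1]) >= 0) and np.all(np.diff(row[j0:]) <= 0))
print(np.all(np.diff(col[:i0 + 1]) >= 0) and np.all(np.diff(col[i0:]) <= 0))
# both True: |H| is non-increasing away from the peak, consistent with (A2)
```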
For p = 1, we get two dimensional Laplacian fields,

  H(y) = H₀ exp(−‖Σy‖₁),    (4.2a)

and for p = 2, we get two dimensional Gaussian fields,

  H(y) = H₀ exp(−‖Σy‖₂²).    (4.2b)

2. Power law fields:

  H(y) = H₀ / [(a₁ + |y_c|^{p₁})^{r₁} (a₂ + |y_r|^{p₂})^{r₂}]    (4.3)

for constants H₀, p₁, p₂, a₁, a₂, r₁, r₂ > 0. With p₁ = p₂ = 2 and r₁ = r₂ = 1, we get a field that is separable as a product of two Cauchy fields,

  H(y) = H₀ / [(a₁ + y_c²)(a₂ + y_r²)].    (4.4)

3. Any multiplicative combination of fields satisfying our assumptions, e.g.

  H(y) = H₀ exp(−c₁y_c² − c₂|y_r|) / [(1 + |y_c|)(1 + y_r²)]    (4.5)

for some constants H₀, c₁, c₂ > 0. In particular, the set of separable fields is closed under multiplication.

In some cases, the target field may not strictly satisfy the separability assumption (A1). However, the algorithm we develop in the sequel also works with approximate separability (measured by how well the field can be approximated by a rank-1 matrix; see Section 4.2.2). For example:

1. The first ten singular values {σ_j | 1 ≤ j ≤ 10} of the sonar image in Figure 4.1 are shown in Figure 4.2a, relative to the first singular value σ₁. Clearly, σ₂ is about 7 dB below σ₁, which implies that the sonar image is approximately rank-1 (hence approximately separable).

2. The commonly occurring inverse square law field, H(y) = H₀/(y_c² + y_r²) for some constant H₀ > 0, is not separable in the sense of assumption (A1). Let H ∈ ℝ^{20×20} denote the discretized field matrix formed by sampling H(y) on the grid y ∈ {(y_c, y_r) | y_c, y_r ∈ {−9.5, −8.5, ..., 9.5}} with H₀ = 1. Plotting the first seven singular values {σ_j | 1 ≤ j ≤ 7} of H, relative to the first singular value σ₁, in Figure 4.2b shows that σ₂ is about 9 dB below σ₁. Thus, inverse square law fields can be approximately separable. Further, it can be computationally verified that the approximately 9 dB attenuation from σ₁ to σ₂ for discretized inverse square law fields also holds for a few other sampling grids, e.g. {(y_c, y_r) | y_c, y_r ∈ {−99.5, −98.5, ..., 99.5}} and {(y_c, y_r) | y_c, y_r ∈ {−19, −17, ..., 19}}.

Figure 4.2: Plot of dominant singular values for approximately separable fields (approximately rank-1 fields). (a) Plot of the first ten singular values of the sonar image in Figure 4.1, normalized w.r.t. the first singular value. (b) Plot of the first seven singular values of the discretized inverse square law field, normalized w.r.t. the first singular value.
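To make the approximate-separability check in item 2 above easy to reproduce, here is a minimal Python sketch (numpy only; the variable names are ours) that discretizes the inverse square law field on the stated 20 × 20 grid with H₀ = 1 and prints the attenuation from σ₁ to σ₂. Whether the quoted "9 dB" uses 10 log₁₀ or 20 log₁₀ is not stated in the text, so the print statement below assumes 20 log₁₀, as for amplitude ratios.

    import numpy as np

    # 20x20 grid with y_c, y_r in {-9.5, -8.5, ..., 9.5} and H0 = 1, as in item 2 above
    pts = np.arange(-9.5, 10.0, 1.0)
    yc, yr = np.meshgrid(pts, pts, indexing="ij")
    H = 1.0 / (yc**2 + yr**2)                  # inverse square law field (no grid point at the origin)
    s = np.linalg.svd(H, compute_uv=False)
    # attenuation of the second singular value relative to the first
    print("sigma2/sigma1 = %.1f dB" % (20 * np.log10(s[1] / s[0])))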
4.2.2 Lifted Formulation

By virtue of assumption (A2), localizing the target is synonymous with locating the peak of the induced field. In light of our assumptions, we can state the target detection problem as the following task: to determine the location of the peak in the field H(y) from its values at only a few locations y ∈ [0, 1]². We use the lifting technique from optimization [1] to demonstrate that the separability assumption (A1) implies a rank one structure on the field. This key observation allows large reductions in both the number of collected samples as well as the computational effort necessary for target detection, by utilizing existing theoretical results for high-dimensional low-rank matrix completion algorithms [69], [71].

Let H(y) = F(y_c)G(y_r) be the canonical separable representation of the target field and let H denote a high resolution discretized version of H(y) on an n_r × n_c rectangular (not necessarily uniform) grid V ⊂ [0, 1]². Let V = {y_r^1, y_r^2, ..., y_r^{n_r}} × {y_c^1, y_c^2, ..., y_c^{n_c}} be the representation of the grid for y_r^1, ..., y_r^{n_r}, y_c^1, ..., y_c^{n_c} ∈ [0, 1]. The set of all possible sampled values of the field on the set V is given by {H(y_c^i, y_r^j) | (y_c^i, y_r^j) ∈ V} and can be arranged in the form of the rank one matrix H ∈ ℝ^{n_r×n_c}, whose (i, j)-th entry H(i, j) is

  H(i, j) = H(y_c^i, y_r^j) = F(y_c^i) G(y_r^j),    (4.6)

where (y_c^i, y_r^j) is the physical location of the (i, j)-th point in V. The matrix H is clearly of rank one, since we can express it as the outer product H = fgᵀ, where fᵀ = (F(y_c^1), F(y_c^2), ..., F(y_c^{n_c})) and gᵀ = (G(y_r^1), G(y_r^2), ..., G(y_r^{n_r})). Without loss of generality, we assume that both y_r^1, ..., y_r^{n_r} and y_c^1, ..., y_c^{n_c} are sorted in ascending order, corresponding respectively to traversing the grid from top to bottom and from left to right. Because of the preceding derivation, we can refer to H as the target field with a slight abuse of terminology. Consequently, we can consider V in a rescaled sense to refer to the set of index pairs {1, 2, ..., n_r} × {1, 2, ..., n_c} for the matrix H.
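As a concrete instance of the lifted formulation, the following sketch (numpy; the Laplacian profile, grid size and target location are illustrative choices of ours, not prescribed above) builds the discretized separable field as the outer product of (4.6) and confirms both the rank one structure and the fact that the peak of H factors into the one dimensional peaks of the two factors.

    import numpy as np

    n = 64
    grid = np.linspace(0.0, 1.0, n)                 # uniform grid on [0, 1]
    y0c, y0r = 0.3, 0.7                             # illustrative target location
    F = np.exp(-8.0 * np.abs(grid - y0c))           # separable Laplacian factors
    G = np.exp(-8.0 * np.abs(grid - y0r))
    H = np.outer(F, G)                              # lifted field, H = f g^T as in (4.6)
    print("rank(H) =", np.linalg.matrix_rank(H))    # 1, up to numerical precision
    # the 2-D peak of H factors into the 1-D peaks of F and G
    i, j = np.unravel_index(np.argmax(np.abs(H)), H.shape)
    print((i, j), (np.argmax(F), np.argmax(G)))     # identical index pairs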
4.3 Sampling and Reconstruction Approach

To convey the main aspects of our approach, we shall assume that H(·) is a positive scalar field and the sampling grid V is square with n_r = n_c = n. These assumptions can be somewhat relaxed, as described in Section 4.8.7.

4.3.1 The PAMCUR Algorithm

We use standard low-rank noisy matrix completion followed by peak localization along each axis. The algorithm starts with n² possible locations of the peak and, after execution, returns a smaller set of index pairs that is guaranteed to contain the peak, provided that the error from the matrix completion step is sufficiently small. This can be considered as the "first pass" over the search region, giving us a coarse segmentation of the region into an area of interest that contains the peak, and its complement region, which can be discarded. The algorithmic procedure can be repeated on this smaller region of interest, giving rise to the exploration-exploitation interpretation of our hierarchical approach. The key steps for the first pass are described in Algorithm 1, with P_{V′}(·) denoting the projection operator onto the set of index pairs in V′.

Algorithm 1 PAMCUR: Partial Adaptive Matrix Completion with Unimodal Regression

Inputs:
1. The regular grid V = {1, 2, ..., n}².
2. Upper bound ε² on the noise power per sample (averaged across samples).

Output: Localization index bounds l_c^L, l_c^R, l_r^L, l_r^R ∈ ℤ₊ such that the target is located within the rectangular region formed by the index pairs in {l_c^L, l_c^L + 1, ..., l_c^R} × {l_r^L, l_r^L + 1, ..., l_r^R}.

Steps:

(S1) Select a subset V′ ⊂ V of O(n log²n) points uniformly and independently at random from the n² points in V, and measure the (possibly noisy) samples H(i, j) for every index pair (i, j) ∈ V′, i.e. record the projection P_{V′}(H).

(S2) Solve the convex nuclear norm heuristic for stable low-rank matrix completion [69],

  minimize_Q ‖Q‖_*  subject to  ‖P_{V′}(Q) − P_{V′}(H)‖_F ≤ ε|V′|,    (P8)

to obtain the solution Ĥ ∈ ℝ^{n×n}.

(S3) Compute the largest singular value of Ĥ and the corresponding singular vectors as the triplet (u, σ, v).

(S4) Compute the lower bounding index l_r^L ∈ {1, 2, ..., n} for localization as the solution to the optimization problem

  minimize_{z,l} l
  subject to l ∈ {1, 2, ..., n},
             z(j + 1) ≤ z(j), j = l, l + 1, ..., n − 1,
             z(j − 1) ≤ z(j), j = 2, 3, ..., l,
             ⟨z, u⟩ ≥ ‖z‖₂ √(1 − ζ²/σ²),
             z ≥ 0,    (P9)

where ζ is an upper bound on ‖H − Ĥ‖_F from the theory of low-rank matrix completion and depends only on n, ε and |V′|. If necessary, replace u by −u in Problem (P9) to make it feasible.
Note: Since Problem (P9) is parametrized by the known parameters n, u, ζ and σ, we shall refer to it as Problem P9(n, ζ, σ, u) if it is necessary to make the dependence explicit.

(S5) Compute the upper bounding index l_r^R ∈ {1, 2, ..., n} for localization as the solution to Problem (P9) with the objective function changed from l to −l.

(S6) Repeat steps (S4) and (S5) with Problem P9(n, ζ, σ, v) to respectively obtain the remaining two indices l_c^L and l_c^R (both in {1, 2, ..., n}). If Problem P9(n, ζ, σ, v) is not feasible, solve Problem P9(n, ζ, σ, −v) instead.

We remark that for every fixed value of l ∈ {1, 2, ..., n}, Problem (P9) reduces to a convex feasibility problem [122]. Given that l admits at most n distinct values, Problem (P9) is efficiently solvable. Further, results from low-rank matrix completion [69], [70] guarantee that a sample complexity of |V′| = O(n log²n) is sufficient for the solution to Problem (P8) to be a good reconstruction of H with high probability (w.h.p.) over the realizations of V′, and that the hidden constant depends on the coherence [70] of H with the canonical basis for matrices in ℝ^{n×n} (coherence parameters for decaying exponential and power-law fields are analytically computed in Section 4.8.8). It is intuitive to reason that good mean-squared error (MSE) leads to good peak localization in H. Theorems 4.1 and 4.3 below give precise results to the same effect.
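For prototyping, step (S2) can be handed to an off-the-shelf convex solver. The sketch below uses cvxpy, which is our tooling choice (the experiments reported later use LMaFit, and the complexity discussion in Section 4.3.3 assumes CVX with SeDuMi); it solves Problem (P8) from noisy samples on a random mask as in step (S1). An SDP-capable solver (e.g. SCS, shipped with cvxpy) is assumed to be installed.

    import numpy as np
    import cvxpy as cp

    def pamcur_step_S2(H_obs, mask, eps):
        """Nuclear norm heuristic (P8): min ||Q||_* s.t. ||P_V'(Q - H)||_F <= eps * |V'|."""
        Q = cp.Variable(H_obs.shape)
        resid = cp.multiply(mask, Q - H_obs)        # restriction to the sampled entries
        cons = [cp.norm(resid, "fro") <= eps * mask.sum()]
        cp.Problem(cp.Minimize(cp.normNuc(Q)), cons).solve()
        return Q.value

    # toy usage: rank-1 field, O(n log^2 n) random samples as in step (S1)
    n = 30
    H = np.outer(np.exp(-np.abs(np.arange(n) - 10) / 4.0),
                 np.exp(-np.abs(np.arange(n) - 20) / 4.0))
    m = int(np.ceil(n * np.log(n) ** 2))
    idx = np.random.choice(n * n, size=m, replace=False)
    mask = np.zeros((n, n)); mask.flat[idx] = 1.0
    H_hat = pamcur_step_S2(mask * H, mask, eps=1e-6)
    print("relative error:", np.linalg.norm(H_hat - H) / np.linalg.norm(H))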
4.3.2 Correctness and Localization-Accuracy Trade-off

For a quantitative comparison of the trade-offs involved, we present the following analysis, which holds w.h.p. over realizations of V′. Suppose that a fraction q = |V′|/n² of the total number of elements in H is sampled at random, where the sampling budget |V′| = Ω(n log²n) is sufficiently high with the right constants, as given by [71] or [69]. Using Theorem 7 from [69] on the mean-squared-error performance of the low-rank matrix completion subproblem (P8), we get bounds on the Frobenius norm of the reconstruction error matrix Z = H − Ĥ as

  ‖Z‖_F ≤ (2 + 4√(n(1 + 2/q))) ε|V′| = C(q, n) ε|V′| = ζ,    (4.7)

where we have used the shorthand notation C(q, n) = 2 + 4√(n(1 + 2/q)). In particular, we have a bound on the reconstruction error of the form ‖H − Ĥ‖_F ≤ ζ, for ζ depending only on n, ε and |V′|, where Ĥ is the solution to Problem (P8). Let H = σ₀u₀v₀ᵀ and Ĥ = σuvᵀ + Σ_{j=2}^n σ_j u_j v_jᵀ respectively denote singular value decompositions (SVDs), where σ is the largest singular value of the matrix Ĥ (in agreement with step (S3) of the algorithm). Then the SNR is (note that the noise power ε²|V′|² is computed only over the observed entries)

  SNR = ‖H‖_F²/(ε|V′|)² = σ₀²/(ε|V′|)²,    (4.8)

and we can rewrite (4.7) as

  ‖Z‖_F = ‖σ₀u₀v₀ᵀ − σuvᵀ − Σ_{j=2}^n σ_j u_j v_jᵀ‖_F ≤ C(q, n)σ₀/√SNR = ζ.    (4.9)

Both the correctness of the proposed algorithm and the localization-accuracy trade-off characterization follow from the theorem below, which lower bounds the magnitudes of the inner products ⟨v₀, v⟩ and ⟨u₀, u⟩.

Theorem 4.1. Let H = σ₀u₀v₀ᵀ and Ĥ = σuvᵀ + Σ_{j=2}^n σ_j u_j v_jᵀ respectively denote SVDs, where σ is the largest singular value of Ĥ, and let the bound ‖H − Ĥ‖_F ≤ ζ ≤ σ be satisfied. Then ⟨u₀, u⟩⟨v₀, v⟩ ≥ η(σ, σ₀, ζ), where

  η(σ, σ₀, ζ) ≜ 1 − σ/σ₀ + √((1 − σ/σ₀)² + (σ/σ₀)² − (ζ/σ₀)²).    (4.10)

Proof. Section 4.8.1.

In the moderate to high SNR regimes, we expect good reconstruction, so that the relative error ζ/σ is much smaller than 1. We also expect σ/σ₀ to be very close to 1, but slightly less than 1, since the nuclear norm formulation in Problem (P8) is known to bias solutions towards zero [69]. Assuming σ = σ₀, we have the approximate bound αβ ≥ √(1 − ζ²/σ₀²), which implies max{α, β} ≥ (1 − ζ²/σ₀²)^{1/4} and min{α, β} ≥ √(1 − ζ²/σ₀²). Assuming α = ⟨u₀, u⟩ > ⟨v₀, v⟩ = β and comparing the lower bound expressions with (4.9), we have the following SNR dependencies:
a) ⟨u₀, u⟩ scales as (1 − C²(q, n)/SNR)^{1/4}, and
b) ⟨v₀, v⟩ scales as (1 − C²(q, n)/SNR)^{1/2}.

Figure 4.3: A heatmap of the ratio η(σ, σ₀, ζ)/√(1 − ζ²/σ²) over the domain 0 ≤ ζ ≤ σ ≤ σ₀. In the high SNR regime, σ/σ₀ ≈ 1 and the bound η(σ, σ₀, ζ) ≥ √(1 − ζ²/σ²) is very tight irrespective of the value of ζ.

Figure 4.4: Plot of an arbitrary non-negative unimodal vector u₀ ∈ ℝ^{401} with unit ℓ₂-norm, satisfying u₀(l) ∝ exp(−|0.1·l − 20.1|) over 1 ≤ l ≤ 401. Choosing ζ′ = 0.3√2, calculations give l_r^{BL} = 176 and l_r^{BR} = 226. The threshold l_r^{BR} − l_r^{BL} is less than one-eighth of the length of u₀.

Lemma 4.2. For 0 ≤ ζ ≤ σ ≤ σ₀, η(σ, σ₀, ζ) ≥ √(1 − ζ²/σ²).

Proof. Section 4.8.2.

The main purpose of Lemma 4.2 is to bound η(σ, σ₀, ζ) in terms of quantities that are known to the algorithm during execution, and this is utilized in Problem (P9). Figure 4.3 demonstrates the tightness of the bound in Lemma 4.2, especially in the high SNR regime where σ ≈ σ₀.
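The bound of Theorem 4.1 and the simplification of Lemma 4.2 are easy to sanity-check numerically. The following sketch (numpy; the grid resolution is an arbitrary choice of ours) evaluates η(σ, σ₀, ζ) from (4.10) over the domain 0 ≤ ζ ≤ σ ≤ σ₀ and verifies that it never falls below √(1 − ζ²/σ²).

    import numpy as np

    def eta(sigma, sigma0, zeta):
        # equation (4.10)
        r = sigma / sigma0
        return (1 - r) + np.sqrt((1 - r) ** 2 + r ** 2 - (zeta / sigma0) ** 2)

    sigma0, worst = 1.0, np.inf
    for sigma in np.linspace(1e-3, sigma0, 200):
        for zeta in np.linspace(0.0, sigma, 200):
            gap = eta(sigma, sigma0, zeta) - np.sqrt(1.0 - (zeta / sigma) ** 2)
            worst = min(worst, gap)
    print("min of eta - sqrt(1 - zeta^2/sigma^2):", worst)  # non-negative, per Lemma 4.2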
Theorem 4.1 essentially utilizes error bounds on low-rank matrix completion and translates them into error bounds on the estimated singular vectors. Thereafter, it becomes conceptually straightforward to compute localization error bounds both numerically (by solving Problem (P9)) and analytically. Note that the dependence on the number of collected samples has been entirely captured in the quantity ζ. This level of abstraction also allows us to compare localization performance for different decay profiles under a fixed sampling budget that is high enough for all the decay profiles in question.

Since our localization algorithm is iterative in nature, to finish the proof of correctness we also need to show that it converges in a meaningful sense. The following theorem guarantees that the localized region shrinks geometrically in each application of Algorithm 1 until the localization boundaries are close enough to the true peak, provided that the observation noise is small enough for moderately good reconstruction in step (S2) and the target field admits a sufficiently sharp peak.

Theorem 4.3. Let H = σ₀u₀v₀ᵀ denote the SVD of the positive matrix H ∈ ℝ^{n×n}, where u₀ ≥ 0 and v₀ ≥ 0 are unimodal vectors (as described in Lemma 4.4) with respective peaks at l_r⁰ and l_c⁰. Assume that the following are true.

1) Step (S2) of Algorithm 1 achieves a reconstruction error ζ upper bounded as ζ²/σ² < 7/16. Let us define ζ′ ≜ 4√(1 − ζ²/σ²) − 3 > 0.

2) ‖u₀‖₁ = ρ_u√n and ‖v₀‖₁ = ρ_v√n for some ρ_u, ρ_v ≤ ζ′/2.

3) 1 ≤ l_c^L ≤ l_c^{BL}, l_c^{BR} ≤ l_c^R ≤ n, 1 ≤ l_r^L ≤ l_r^{BL} and l_r^{BR} ≤ l_r^R ≤ n, where

  l_r^{BL} = arg max_{1≤j≤n} j subject to ‖u₀(1:j)‖₁ ≤ ζ′/√2,    (4.11a)
  l_c^{BL} = arg max_{1≤j≤n} j subject to ‖v₀(1:j)‖₁ ≤ ζ′/√2,    (4.11b)
  l_r^{BR} = arg min_{1≤j≤n} j subject to ‖u₀(j:n)‖₁ ≤ ζ′/√2,    (4.11c)
  l_c^{BR} = arg min_{1≤j≤n} j subject to ‖v₀(j:n)‖₁ ≤ ζ′/√2.    (4.11d)

Then, Algorithm 1 gives a localized region of size

  (l_c^R − l_c^L) × (l_r^R − l_r^L) < 16ρ_u²ρ_v²n²/(ζ′)⁴.    (4.12)

Proof. Section 4.8.3.

Clearly, all assumptions in the above theorem are symmetric w.r.t. the vectors u₀ and v₀. We also make the following observations.

1. All assumptions of Theorem 4.3 depend on the quantity ζ′. If step (S2) of Algorithm 1 achieves a better reconstruction, then ζ²/σ² is smaller and hence ζ′ is larger. A larger value of ζ′ means that the ℓ₁-norm requirement on u₀ is less stringent (allowing for a milder sharpness requirement on the peak).

2. If u₀ showed a completely diffuse peak (all elements of equal magnitude), then we would have u₀ = 1/√n and ‖u₀‖₁ = √n, since ‖u₀‖₂ = 1 and u₀ ≥ 0. Thus, the requirement of ‖u₀‖₁ = ρ_u√n ≤ ζ′√n/2 as the criterion for sharpness of the peak is fairly modest, especially if ρ_u ≤ ζ′/2 is a constant independent of n.

3. The conditions in (4.11) help describe the state of Algorithm 1 at which one may expect geometric shrinkage of the localized region. In particular, geometric shrinkage continues only until the size of the localized region falls below (l_c^{BR} − l_c^{BL}) × (l_r^{BR} − l_r^{BL}). As illustrated in Figure 4.4, if the peak in u₀ (respectively v₀) is sufficiently sharp, then this threshold l_r^{BR} − l_r^{BL} (respectively l_c^{BR} − l_c^{BL}) for geometric shrinkage is fairly small.
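The thresholds in (4.11) are straightforward to evaluate for a given profile. The sketch below (numpy) reproduces the setting of Figure 4.4, the ℓ₂-normalized vector u₀(l) ∝ exp(−|0.1·l − 20.1|) on n = 401 with ζ′ = 0.3√2, and should return l_r^{BL} = 176 and l_r^{BR} = 226 as quoted in that caption.

    import numpy as np

    n = 401
    l = np.arange(1, n + 1)
    u0 = np.exp(-np.abs(0.1 * l - 20.1))
    u0 /= np.linalg.norm(u0)                  # unit l2-norm, as in Figure 4.4
    thr = (0.3 * np.sqrt(2.0)) / np.sqrt(2.0) # zeta'/sqrt(2) = 0.3, the bound in (4.11)
    csum = np.cumsum(u0)
    lBL = np.max(l[csum <= thr])              # (4.11a): largest j with ||u0(1:j)||_1 <= zeta'/sqrt(2)
    tail = csum[-1] - csum + u0               # ||u0(j:n)||_1 for each j
    lBR = np.min(l[tail <= thr])              # (4.11c): smallest j with ||u0(j:n)||_1 <= zeta'/sqrt(2)
    print(lBL, lBR)                           # expect 176, 226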
The proof of Theorem 4.3 relies on the following supporting lemmas, which also outline the high level proof strategy. One solution strategy for Problem (P9) involves solving a sequence of specific instances of Problem (P10). Lemma 4.4 is essentially an implication of weak duality for Problem (P10). Since Problem (P10) is stated as a feasibility problem, to simplify subsequent analysis, Lemma 4.5 states the equivalent optimization problem. Finally, Lemma 4.6 helps to identify the dominant bound in (4.14) for the special case relevant to the proof of Theorem 4.3.

Lemma 4.4. Define the feasibility problem

  find z
  subject to z(j + 1) ≤ z(j), j = l*, l* + 1, ..., n − 1,
             z(j − 1) ≤ z(j), j = 2, 3, ..., l*,
             ⟨z, v⟩ ≥ ρ,
             ‖z‖₂ = 1,
             z ≥ 0,    (P10)

w.r.t. z ∈ ℝⁿ and parametrized by l* ∈ {1, 2, ..., n}, 0 ≤ ρ ≤ 1 and v ∈ ℝⁿ₊. Let ‖v‖₂ = 1 and suppose that v is unimodal with peak at l₀ ∈ {l*, l* + 1, ..., n}, i.e. v(j + 1) ≤ v(j) for every l₀ ≤ j ≤ n − 1 and v(j − 1) ≤ v(j) for every 2 ≤ j ≤ l₀. For Problem (P10) to be feasible, it is necessary that

  ρ² ≤ ⟨1, v⟩² + (δ² − 2δ⟨1, v⟩)(k_R + k_L + 1) + δ²(k_R + k_L + 1)²    (4.13)

holds for any integers 0 ≤ k_L ≤ l₀ − l*, 0 ≤ k_R ≤ n − l₀, and any δ ∈ ℝ₊ satisfying

  δ ≤ ⟨1, v(j + 1:n)⟩/(l₀ + k_R − j), ∀j ∈ {l₀ − k_L − 1, ..., l₀ + k_R − 1}.    (4.14)

Proof. Section 4.8.4.

Lemma 4.5. For v ∈ ℝⁿ₊ and l* ∈ {1, 2, ..., n}, the optimization problem

  minimize_z −⟨z, v⟩
  subject to z(j + 1) ≤ z(j), j = l*, l* + 1, ..., n − 1,
             z(j − 1) ≤ z(j), j = 2, 3, ..., l*,
             ‖z‖₂² = 1,    (P11)

admits a solution z_opt that satisfies z_opt ≥ 0. Furthermore, z_opt is feasible for Problem (P10) if and only if the optimal value of Problem (P11) is no greater than −ρ.

Proof. Section 4.8.5.

Lemma 4.6. If v ∈ ℝⁿ₊ is a unimodal vector (as described in Lemma 4.4) with peak at l₀ ∈ {2, 3, ..., n}, then ⟨1, v(j + 1:n)⟩/(l₀ − j) is a monotonically non-decreasing function of j over 1 ≤ j ≤ l₀ − 1.

Proof. Section 4.8.6.

4.3.3 Complexity Computations

For a quantitative comparison of the trade-offs involved, we present the following analysis. In each round of sampling (each invocation of Algorithm 1), we collect O(νn log²n) random samples on an n × n sub-matrix H formed by sampling H(·) on a regular grid (using results from [70], the dependence of the number of samples on the coherence parameter ν has been factored in). Let us assume that sampling the field at the Nyquist rate would have required m²n² samples, i.e. discretization of H(·) into an mn × mn sized grid would allow for reconstruction of H(·) using linear low-pass filtering. Theorem 4.3 guarantees a geometric reduction in the size of the search space as long as the search space is large in an appropriate sense, implying that the search space becomes small after at most N_R = O(log mn) sampling rounds. Assuming that the small search space is of size independent of n and can be covered using a constant number of samples, the total number of samples collected equals

  N_S = N_R · O(νn log²n) = O(νn log²n · log mn) = O((1 − γ(n)) n log²n · log mn).    (4.15)

On the other hand, one-step naive matrix completion would have required O(νmn log²mn) samples, which is order-wise larger than N_S by a factor of O(m log mn / log²n). Let us denote the total runtime by N_T. To compute this, we denote the running time of the n × n matrix completion problem from O(νn log²n) random samples by R₁(n). We note that Algorithm 1 involves solving Problem (P8) once and solving Problem (P9) four times. Clearly, Problem (P8) is a matrix completion problem, and Problem (P9) can be solved by solving the n distinct instances of the feasibility problem P10(l*, ·, ·) for l* ∈ {1, 2, ..., n}. By Lemma 4.5, Problem (P10) is equivalent to Problem (P11), which is a very simple convex quadratic program with non-negativity constraints in its dual form (see (4.40)). Denoting the running time for solving Problem (P11) in its dual form by R₂(n), we have the complexity of Algorithm 1 as R₁(n) + 4nR₂(n) and the total running time as N_R times the complexity of Algorithm 1, or equivalently, N_T = O((R₁(n) + nR₂(n)) log mn). In contrast, one-step naive matrix completion would have required O(R₁(mn)) running time, which is substantially larger than O(R₁(n) log mn) when using general purpose semidefinite program solvers like SeDuMi with CVX [123], [124] that scale as R₁(p) = O(p^{3.5}). Thus, reconstructing the entire field turns out to be much worse from both sampling and computational viewpoints.
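To get a feel for the savings behind (4.15), the following back-of-the-envelope sketch compares the hierarchical sample count N_S against one-step completion on the full mn × mn grid; the sizes n = 100, m = 16, ν = 1 are purely illustrative choices of ours, and all hidden constants are set to one, so only the ratio is meaningful.

    import numpy as np

    n, m, nu = 100, 16, 1.0                          # illustrative sizes; hidden constants set to 1
    per_round = nu * n * np.log(n) ** 2              # samples per invocation of Algorithm 1
    rounds = np.log(m * n)                           # N_R = O(log mn)
    N_S = per_round * rounds                         # hierarchical total, cf. (4.15)
    naive = nu * (m * n) * np.log(m * n) ** 2        # one-step completion on the mn x mn grid
    print("hierarchical:", int(N_S), " naive:", int(naive), " ratio:", naive / N_S)
    # the ratio is m * log(mn) / log(n)^2, matching the factor quoted above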
4.4 Baseline algorithms

(This section is joint work with Naveen Kumar and Srikanth Narayanan and is presented here for completeness.)

In this section, we introduce some baseline methods which shall serve as points of comparison for our PAMCUR algorithm. Each baseline algorithm is similar to PAMCUR in terms of the multi-resolution approach, i.e. each stage involves sampling on a subset of an n × n grid within some region of interest (ROI), with the sampling resolution getting finer with each progressive stage. However, the various algorithms differ in the localization strategies employed to shrink down the ROI in subsequent iterations.

For all baseline algorithms presented below, we use a fixed scale κ ≤ 1 in simulations for progressive reduction of the ROI with each stage (leading to a geometric reduction at a fixed rate) till the update in target localization across consecutive stages falls below a tolerance threshold. In contrast, PAMCUR (by design) chooses the reduction in ROI adaptively at each stage and thus does not correspond to a fixed value of κ. For the purpose of comparison, we have considered fixed sized grids at each stage and presented results for n = 50 and n = 100. To solve the low-rank matrix completion problem (P8) we used the LMaFit implementation [125], [126], while the unimodal regression was solved using the Pair Adjacent Violators approach [127], [128].

4.4.1 Matrix Completion based variants (MConly and MCuni)

These two algorithms are closely related to PAMCUR and differ only in which parts of PAMCUR they employ. The MConly algorithm, at each stage, performs a standard noisy low-rank matrix completion followed by peak detection along the horizontal and vertical directions, as proposed in our previous paper [113]. The MCuni algorithm, at each stage, additionally finds the best unimodal fit (in the sense of the ℓ₂-norm) to the estimated singular vectors u and v after the matrix completion step but before the peak detection step; a sketch of such a unimodal fit is given below. This unimodal regression step provides robustness in the presence of spurious peaks and is posed analogously to Problem (P10).
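Since MCuni (and step (S4) of PAMCUR) lean on unimodal regression, we include a small self-contained sketch. It combines the classical pool-adjacent-violators routine for isotonic fits with a scan over peak positions, a simple O(n²) variant of the prefix-isotonic approach of [127], [128]; the implementation details are ours.

    import numpy as np

    def pav_nondecreasing(y):
        """Least-squares non-decreasing fit via pool-adjacent-violators."""
        vals, cnts = [], []
        for v in y:
            vals.append(float(v)); cnts.append(1)
            # merge adjacent blocks while the monotonicity constraint is violated
            while len(vals) > 1 and vals[-2] > vals[-1]:
                c = cnts[-1] + cnts[-2]
                vals[-2] = (vals[-1] * cnts[-1] + vals[-2] * cnts[-2]) / c
                cnts[-2] = c
                vals.pop(); cnts.pop()
        return np.repeat(vals, cnts)

    def unimodal_fit(y):
        """Best least-squares unimodal fit, scanning all split positions (O(n^2))."""
        best, best_fit = np.inf, None
        for p in range(len(y)):
            left = pav_nondecreasing(y[: p + 1])
            right = pav_nondecreasing(y[p + 1 :][::-1])[::-1]  # non-increasing fit
            fit = np.concatenate([left, right])                # always unimodal
            err = np.sum((fit - y) ** 2)
            if err < best:
                best, best_fit = err, fit
        return best_fit

    # usage: clean up a noisy singular vector that has spurious local peaks
    v = np.exp(-np.abs(np.linspace(-3, 3, 61))) + 0.1 * np.random.randn(61)
    v_uni = unimodal_fit(v)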
4.4.2 Surface Interpolation (interp)

This method, at each stage, attempts to impute missing data after sampling by employing interpolation over a smooth surface using a nearest-neighbor approach. This is done by simply searching for the nearest sampled location and duplicating the measurement at that location. Such nearest-neighbor interpolation could lead to a noisy completed matrix, so we smooth it further by a moving average mask before executing peak detection on the imputed matrix. If implemented efficiently (using space partitioning methods like k-d trees), this method admits O(n³) running time on an n × n grid [129].

4.4.3 Mean-shift based Gradient Ascent (MS)

Mean-shift (MS) is actually a popular algorithm used in pattern recognition for unsupervised clustering of data points in the feature space, and we present below a suitable adaptation to perform gradient ascent on the target field. Unlike the algorithms discussed so far, this method only exploits local information. The MS algorithm proceeds by collecting samples, approximating the local gradient from these samples and then performing a gradient ascent step to determine the next sampling neighborhood. Specifically, the gradient direction at a location v_k is computed using a mean-shift update over a window of size ω according to a center of mass type computation (p below denotes an arbitrary location):

  v_{k+1} = Σ_p p·U(p − v_k) / Σ_p U(p − v_k),  where U(p − v) = 1 if ‖p − v‖_c ≤ ω, and 0 otherwise.    (4.16)

Note that the new location v_{k+1} is in the direction of one of the eight locations adjacent to v_k, based on the direction of the gradient. Being local, this algorithm is quite susceptible to finding local peaks. To give this algorithm a fighting chance, within each stage we shall allow it to execute a few times with random starting points, so that multiple local peaks can be detected and the highest peak can be returned. The computational complexity is measured accordingly and (if implemented efficiently using space partitioning methods like k-d trees [129]) equals O(Mn³), where M is the number of restarts and n × n is the grid size. The pseudo-code for this method is outlined as Algorithm 2 in Section 4.8.9. A larger ω provides robustness to local noise characteristics at the cost of increasing the number of collected samples and the risk of smoothing out peaks with small spreads. Thus, the number of samples acquired depends not only on the length of the gradient ascent trail but also on ω.
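A minimal implementation of the update (4.16) might look as follows. Two reading choices of ours are baked in: the window norm ‖·‖_c is taken to be the ℓ_∞ (Chebyshev) norm, and the window points are weighted by the (assumed positive) sampled field values, matching the "center of mass" computation of Algorithm 2 (step 8) rather than the plain indicator average that (4.16) displays; the 8-neighbor quantization is done coarsely with sign() rather than explicit angle binning.

    import numpy as np

    def mean_shift_step(samples, locs, v, omega):
        """One update of (4.16): move v toward the weighted center of mass of its window.

        samples : positive field values at the sampled locations `locs` (shape (m, 2)).
        """
        in_win = np.max(np.abs(locs - v), axis=1) <= omega  # U(p - v) with an l_inf window
        if not np.any(in_win):
            return v
        com = samples[in_win] @ locs[in_win] / samples[in_win].sum()
        # coarse 8-neighbor quantization of the direction from v to the center of mass
        step = np.sign(np.round(com - v))
        return v + step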
4.5 Numerical Experiments: Synthetic Data

It is instructive to study the localization bound vs. accuracy trade-off for some known unimodal decay profiles from the exponential and power law families. Figure 4.5 shows the results for the standard Gaussian, Laplacian (F(x) = 0.5 exp(−|x|)) and Cauchy (F(x) = 1/(π + πx²)) fields, with u in Problem (P9) representing the discretization of a one dimensional continuous function. We see that for a given (sufficiently high) accuracy level √(1 − ζ²/σ²) (which translates to a fixed sampling budget), the Laplacian field admits the best one-step localization bound. This is somewhat surprising at first sight, since Gaussian fields are inherently far more localized than Laplacian fields. However, the same phenomenon was confirmed via actual simulation for these decay profiles across a range of window sizes and spread factors (see Figure 4.6). Intuitively, good localization per sample requires the right balance of "spread" and support of the "gradient" of the field, which seems to be better in the case of Laplacian fields; hence they show the best localization performance for a given sampling budget.

Figure 4.5: Trade-off showing the localization bound (normalized by size of search space) achievable for a given accuracy bound √(1 − ζ²/σ²) (and hence for a given sampling budget) for discretized versions of standard Gaussian, Laplacian and Cauchy fields.

Figure 4.6: Variation of the probability of correct localization by the MConly algorithm to within 4% of the search space, averaged over 10 trials. Results are for three different decay profiles ((a) Laplacian, (b) Gaussian, (c) Cauchy) across a range of sampled window sizes and field spread factors with Gaussian distributed background noise.

Figure 4.7: A low resolution visualization of the elevations on the road network dataset. The regions where readings are not available are shown in blue.

4.6 Numerical Experiments: Elevation Dataset

(This section is joint work with Naveen Kumar and Srikanth Narayanan and is presented here for completeness.)

4.6.1 3-D road network dataset

For testing on real data, we use an altitude dataset for road networks in North Jutland, Denmark [130]. The dataset covers a region of 185 × 135 km² and comprises elevation measurements in metres at 434874 locations sampled along physical roads. In the past, this dataset has mostly been used in unsupervised learning tasks or in applications that require accurate elevation information, e.g. eco-routing [131]. For the purpose of evaluation, the objective is to locate the region with the highest elevation in the map. Two such regions are clearly visible from the elevation heat map shown in Figure 4.7. To simulate the uniform grids assumed by the algorithms under consideration, if the data at a sampled location is missing, we fill it in by selecting the nearest available sample. A fixed grid size of n × n is used for all experiments, as mentioned earlier in Section 4.4. For all algorithms other than MS, we vary the number of collected samples by controlling the percentage of samples acquired on the n × n subgrid at each stage (denoted by the fraction α). For the MS algorithm, the number of collected samples depends on the number of restarts K.

4.6.2 Results

To evaluate each approach, we use the location of the highest peak (see the map in Figure 4.7) as ground truth. The average distance of the localization from the highest peak is then measured as a metric for the accuracy of the algorithm, and is plotted in Figure 4.8 against the number of samples collected (normalized w.r.t. the field size). We expect that as the number of collected samples is increased, the accuracy of localization should improve for each algorithm, resulting in a trade-off curve. This could happen through better accuracy of the smoothing or the low-rank reconstruction process as the number of samples is increased. Generally speaking, we do observe this to be true in Figure 4.8, in terms of the absolute log-mean square error being lower for a grid size of n = 100 than for a grid size of n = 50. We further note that all the algorithms based on matrix completion, viz. MConly, MCuni and PAMCUR, show very similar trade-off curves. The PAMCUR algorithm is somewhat more efficient at low sampling density (n = 50) owing to its built-in adaptive sampling strategy, and hence yields better localization for the same number of collected samples. The MS algorithm, which only employs local search, performs poorly at higher sampling density (n = 100), which may be attributed to the formation of multiple noise induced local peaks. Finally, we note that the best localization accuracy trade-off on the n = 100 sized grid is achieved by the interp algorithm, which is not at all surprising, since an exhaustive search is performed by this algorithm on the completed 2D-grid and the higher sampling density ensures sufficient smoothness. For n = 50, smoothness of the completed 2D-grid seems to be inadequate for the interp algorithm to outperform other approaches.

Figure 4.8: Trade-off between the number of samples collected and the localization accuracy achieved for the MConly, MCuni, interp, MS and PAMCUR algorithms, averaged over 500 runs for two different grid sizes, viz. (a) n = 100 and (b) n = 50.

Figure 4.9: Localization error performance vs. number of samples for the PAMCUR algorithm on the elevation dataset in Figure 4.7: (a) localization area vs. number of samples, (b) localization error vs. number of samples. Each color represents the result of 500 independent trials plotted individually.
Further confirmation of the efficacy of PAMCUR at low sampling density is evident from Figure 4.9, where the decrease in a) the size of the localization area (Figure 4.9a), and b) the localization error (Figure 4.9b) have been plotted against the increase in the number of samples, for different intra-stage sampling fractions α. It is clear that the trade-off vs. number of samples is better for lower values of α, subject to α being greater than the information theoretic lower limit for PAMCUR. This lower limit seems to be somewhere between α = 0.3 and α = 0.4, since we did not get interpretable results for the former while the latter gave algorithmic convergence. Finally, we make the pleasing observation that the initial rate of error reduction in the peak estimate, as well as the rate of shrinkage of the localization bounding box, is geometric and occurs with high probability over the realization of the sampling locations. This is in agreement with our theoretical result in Theorem 4.3.

4.7 Conclusions

In this chapter, the problem of target localization from incomplete samples of the target field was examined with the goal of reducing the number of samples necessary to solve the problem by utilizing the structural properties of the target field while adopting an exploration-exploitation approach. An algorithm (PAMCUR) was presented that exploited separability and unimodality of the decaying field around the target to use a low-rank matrix completion based approach coupled with unimodal regression at multiple resolutions, and a theoretical trade-off analysis between sampling density, noise level and convergence rate of localization was developed. Knowledge of exact decay profiles was shown to be unnecessary, and the algorithm is applicable to any decaying and approximately separable target field. Besides regularization, separability was also shown to give computational benefits in terms of enabling unimodal regression on vectors (instead of unimodal regression on matrices). It was demonstrated (somewhat surprisingly) that Laplacian fields achieve a better localization vs. accuracy trade-off under a fixed sampling budget, as compared to Gaussian or Cauchy fields. Numerical experiments and comparisons on synthetic and real datasets were performed to test the efficacy and robustness of the presented approach, and the results demonstrated the advantages of the PAMCUR algorithm (for low sampling density) over other approaches based on mean-shift clustering, surface interpolation and naive low-rank matrix completion with peak detection. The experiments also showed that the approach is fairly robust to missing data points and to the presence of multiple local peaks, and demonstrated the regularizing properties of separability. As a matter of future work, we intend to investigate the possibility of tighter integration of structural properties into low-rank matrix completion, to enhance the currently two-step process of matrix completion followed by unimodal regression.
4.8 Proofs and Supplementary Material

4.8.1 Proof of Theorem 4.1

We let P_u = uuᵀ and P_v = vvᵀ respectively denote the matrices projecting onto the vectors u and v, and let P_{u⊥} = I − P_u and P_{v⊥} = I − P_v denote the projection matrices onto the respective orthogonal complement spaces. We have

  ‖Z‖_F² = ‖(P_u + P_{u⊥})Z(P_v + P_{v⊥})‖_F² = ‖P_uZP_v + P_uZP_{v⊥} + P_{u⊥}ZP_v + P_{u⊥}ZP_{v⊥}‖_F²    (4.17a)
       = ‖P_uZP_v‖_F² + ‖P_uZP_{v⊥}‖_F² + ‖P_{u⊥}ZP_v‖_F² + ‖P_{u⊥}ZP_{v⊥}‖_F²    (4.17b)
       ≥ ‖P_uZP_v‖_F² + ‖P_uZP_{v⊥}‖_F² + ‖P_{u⊥}ZP_v‖_F²,    (4.17c)

where (4.17b) follows from (4.17a) since each term within the ‖·‖_F² expression in (4.17a) is orthogonal to the other three terms w.r.t. the standard trace inner product over the vector space of n × n real matrices. Furthermore, we neglect the last term in (4.17b) to arrive at (4.17c), since we anticipate it to be small in the high SNR regime. This is because the dominant singular vectors (u, v) of Ĥ should be a good approximation of the true singular vectors (u₀, v₀) at high SNR, so that projecting u₀ (respectively v₀) onto the orthogonal complement space u⊥ (respectively v⊥) should incur only a small amount of energy. We can evaluate each of the terms on the r.h.s. of (4.17c) as below:

  ‖P_uZP_v‖_F = ‖(σ₀⟨u₀, u⟩⟨v₀, v⟩ − σ)uvᵀ + 0‖_F = |σ₀⟨u₀, u⟩⟨v₀, v⟩ − σ|,    (4.18a)
  ‖P_uZP_{v⊥}‖_F = ‖σ₀⟨u₀, u⟩uv₀ᵀP_{v⊥} + 0‖_F = |⟨u₀, u⟩|σ₀‖v₀ᵀP_{v⊥}‖_F = |⟨u₀, u⟩|σ₀‖P_{v⊥}(v₀)‖_F = |⟨u₀, u⟩|σ₀√(1 − |⟨v₀, v⟩|²),    (4.18b)
  ‖P_{u⊥}ZP_v‖_F = ‖σ₀P_{u⊥}(u₀)⟨v₀, v⟩vᵀ + 0‖_F = |⟨v₀, v⟩|σ₀‖P_{u⊥}(u₀)‖_F = |⟨v₀, v⟩|σ₀√(1 − |⟨u₀, u⟩|²).    (4.18c)

For brevity of notation we let α = ⟨u₀, u⟩ ∈ [−1, 1] and β = ⟨v₀, v⟩ ∈ [−1, 1]. From the assumptions of the theorem, ‖Z‖_F ≤ ζ, and combining this with (4.17) and (4.18) implies

  ζ² ≥ (σ₀αβ − σ)² + σ₀²α²(1 − β²) + σ₀²β²(1 − α²) = σ² − 2σσ₀αβ − σ₀²α²β² + σ₀²(α² + β²)    (4.19a)
    ≥ σ² − 2σσ₀αβ − σ₀²α²β² + 2σ₀²|αβ|,    (4.19b)

where (4.19b) was obtained from (4.19a) using the relation α² + β² ≥ 2|αβ|. Because the signs of u and v can be switched globally without changing the estimate σuvᵀ, w.l.o.g. we assume α = ⟨u, u₀⟩ > 0. We have

  σ₀²α²β² + 2σ₀α(σβ − σ₀|β|) + ζ² − σ² ≥ 0,    (4.20)

which is a quadratic inequality in αβ. If (4.20) were satisfied with equality, then the corresponding quadratic equation w.r.t. the variable σ₀α would have roots in the set β^{−2}((σ₀|β| − σβ) ± √((σ₀|β| − σβ)² − β²(ζ² − σ²))), by the quadratic formula. Since ζ ≤ σ from the premise of the theorem, the two roots are of opposite signs. Further, σ₀α > 0 by assumption, and therefore, to satisfy (4.20), σ₀α must be greater than or equal to the larger root. Thus, we have

  σ₀α ≥ β^{−2}((σ₀|β| − σβ) + √((σ₀|β| − σβ)² − β²(ζ² − σ²))) > β^{−2}(σ₀|β| − σβ).    (4.21)

Since α ≤ 1 and |β| ≤ 1, assuming β < 0 leads to (4.21) implying that

  σ₀ ≥ σ₀α > β^{−2}(σ₀|β| − σβ) = β^{−2}(σ₀|β| + σ|β|) = (σ₀ + σ)/|β| ≥ σ₀ + σ,    (4.22)

which is a clear contradiction. Hence β > 0, and the left inequality in (4.21) yields the joint bound

  αβ ≥ (σ₀β)^{−1}((σ₀β − σβ) + √((σ₀β − σβ)² − β²(ζ² − σ²))) ≥ 1 − σ/σ₀ + √((1 − σ/σ₀)² + (σ/σ₀)² − (ζ/σ₀)²),    (4.23)

thus proving the theorem.
4.8.2 Proof of Lemma 4.2

Setting x = ζ/σ₀ and y = σ/σ₀, we have 0 ≤ ζ ≤ σ ≤ σ₀ ⟺ 0 ≤ x ≤ y ≤ 1. We also have √(1 − ζ²/σ²) = √(1 − x²/y²) and η(σ, σ₀, ζ) = (1 − y) + √((1 − y)² + y² − x²) from (4.10). To prove the result, it thus suffices to show that

  f(x, y) ≜ η(σ, σ₀, ζ) − √(1 − ζ²/σ²) = (1 − y) + √((1 − y)² + y² − x²) − √(1 − x²/y²)    (4.24)

is non-negative over the domain 0 ≤ x ≤ y ≤ 1. We have

  f(0, y) = (1 − y) + √((1 − y)² + y²) − 1 = √((1 − y)² + y²) − y ≥ 0.    (4.25)

We further have

  ∂f(x, y)/∂x = ∂/∂x [√((1 − y)² + y² − x²) − √(1 − x²/y²)] = −x/√((1 − y)² + y² − x²) + x/(y√(y² − x²)).    (4.26)

Since

  ∂f(x, y)/∂x ≥ 0 ⟺ x/(y√(y² − x²)) ≥ x/√((1 − y)² + y² − x²) ⟺ 1/(y√(y² − x²)) ≥ 1/√((1 − y)² + y² − x²)    (4.27a)
  ⟺ y√(y² − x²) ≤ √((1 − y)² + y² − x²) ⟺ y²(y² − x²) ≤ (1 − y)² + y² − x²    (4.27b)
  ⟺ 0 ≤ (1 − y)² + (1 − y²)(y² − x²)    (4.27c)

is true over 0 ≤ x ≤ y ≤ 1, it follows that ∂f(x, y)/∂x ≥ 0 over 0 ≤ x ≤ y ≤ 1. Thus, for every y ∈ [0, 1], f(x, y) is increasing w.r.t. x over 0 ≤ x ≤ y, and f(0, y) is non-negative. Therefore, f(x, y) ≥ f(0, y) ≥ 0 over 0 ≤ x ≤ y ≤ 1, completing the proof.

Figure 4.10: A three dimensional embedding of the vectors u_M, u and u₀ for aiding visualization in the proof of Theorem 4.3.
4.8.3 Proof of Theorem 4.3

We shall use ζ, σ, u and v as defined in steps (S3) and (S4) of Algorithm 1, and η(σ, σ₀, ζ) as defined in Theorem 4.1. The proof proceeds by separately bounding l_c^R − l_c^L and l_r^R − l_r^L. We shall only derive the bound on l_r^R − l_r^L, since both bounds follow from the same sequence of steps.

Theorem 4.1 and Lemma 4.2 together imply that ⟨u₀, u⟩⟨v₀, v⟩ ≥ η(σ, σ₀, ζ) ≥ √(1 − ζ²/σ²). Since |⟨v₀, v⟩| ≤ ‖v₀‖₂‖v‖₂ ≤ 1 (respectively |⟨u₀, u⟩| ≤ ‖u₀‖₂‖u‖₂ ≤ 1) by the Cauchy-Schwarz inequality, we have |⟨u₀, u⟩| ≥ √(1 − ζ²/σ²) (respectively |⟨v₀, v⟩| ≥ √(1 − ζ²/σ²)). We assume w.l.o.g. that ⟨u₀, u⟩ > 0, implying that ⟨u₀, u⟩ ≥ √(1 − ζ²/σ²) and that Problem P9(n, ζ, σ, u) is feasible at step (S4) of Algorithm 1 (otherwise ⟨u₀, u⟩ < 0, implying that ⟨u₀, −u⟩ ≥ √(1 − ζ²/σ²) and that Problem P9(n, ζ, σ, −u) is feasible). Let (z_r^L, l_r^L) denote a solution to Problem P9(n, ζ, σ, u). It is clear from the constraints in Problem (P9) that z_r^L is a unimodal vector and satisfies ⟨z_r^L, u⟩ ≥ ‖z_r^L‖₂√(1 − ζ²/σ²). For brevity of notation, we set u_M = z_r^L/‖z_r^L‖₂ and get ⟨u_M, u⟩ ≥ √(1 − ζ²/σ²). The proof proceeds by bounding ⟨u_M, u₀⟩ using the bounds on ⟨u_M, u⟩ and ⟨u₀, u⟩.

Let points A, B and C respectively represent the vectors u_M, u and u₀ in n-dimensional space with O as origin (see Figure 4.10 as an aid to visualization). Therefore, OA, OB and OC are all unit length line segments, and the inner products ⟨u_M, u⟩, ⟨u₀, u⟩ and ⟨u_M, u₀⟩ are respectively equal to cos∠AOB, cos∠BOC and cos∠COA. Using the cosine rule from elementary trigonometry on the triangle COA, we have

  AC = √(OA² + OC² − 2·OA·OC·cos∠COA) = √(2 − 2⟨u_M, u₀⟩).    (4.28)

Similarly, using the cosine rule on the triangles AOB and BOC respectively gives AB = √(2 − 2⟨u_M, u⟩) and BC = √(2 − 2⟨u₀, u⟩). By the triangle inequality, we have AC ≤ AB + BC, leading to

  √(2 − 2⟨u_M, u₀⟩) ≤ √(2 − 2⟨u_M, u⟩) + √(2 − 2⟨u₀, u⟩)
  ⟹ √(1 − ⟨u_M, u₀⟩) ≤ √(1 − ⟨u_M, u⟩) + √(1 − ⟨u₀, u⟩)
  ⟹ √(1 − ⟨u_M, u₀⟩) ≤ √(1 − √(1 − ζ²/σ²)) + √(1 − √(1 − ζ²/σ²))    (4.29a)
  ⟹ 1 − ⟨u_M, u₀⟩ ≤ 4(1 − √(1 − ζ²/σ²))
  ⟹ ⟨u_M, u₀⟩ ≥ 1 − 4(1 − √(1 − ζ²/σ²)) = 4√(1 − ζ²/σ²) − 3 ≜ ζ′,    (4.29b)

where (4.29a) uses the lower bounds on the inner products ⟨u_M, u⟩ and ⟨u₀, u⟩. For the r.h.s. of (4.29b) to be a useful bound, we need it to be positive, or equivalently, √(1 − ζ²/σ²) > 3/4 ⟺ ζ²/σ² < 7/16, which is assumed in the premise of this theorem.

Let us refer to Problems (P10) and (P11) as Problems P10(l*, ρ, v) and P11(l*, v) to make the dependence on the parameters l*, ρ and v explicit. From the premise of the theorem, u₀ ≥ 0 is a unimodal vector with its peak at the index l_r⁰. Recall that u_M is also a unimodal vector with its peak at index l_r^L, and suppose w.l.o.g. that l_r^L ≤ l_r⁰. It is clear that u_M is feasible for Problem P11(l_r^L, u₀), and (4.29b) implies that −⟨u_M, u₀⟩ ≤ −ζ′. By Lemma 4.5, we can assume that u_M ≥ 0 and that u_M is feasible for Problem P10(l_r^L, ζ′, u₀). Next, using Lemma 4.4, we get the bound in (4.13), provided that the restrictions on k_L, k_R and δ are satisfied. In the notation of Problem P10(l_r^L, ζ′, u₀), (4.13) says that

  (ζ′)² ≤ ⟨1, u₀⟩² + (δ² − 2δ⟨1, u₀⟩)(k_L + k_R + 1) + δ²(k_L + k_R + 1)² ≜ h(δ)    (4.30)

is true for any integers 0 ≤ k_L ≤ l_r⁰ − l_r^L, 0 ≤ k_R ≤ n − l_r⁰, and any δ ∈ ℝ₊ satisfying

  δ ≤ ⟨1, u₀(j + 1:n)⟩/(l_r⁰ + k_R − j), ∀j ∈ {l_r⁰ − k_L − 1, ..., l_r⁰ + k_R − 1}.    (4.31)

We will use k_L = l_r⁰ − l_r^L and k_R = 0. Invoking Lemma 4.6 on u₀ implies that ⟨1, u₀(j + 1:n)⟩/(l_r⁰ − j) is monotonically non-decreasing in j over 1 ≤ j ≤ l_r⁰ − 1. Hence, the dominating bound in (4.31) is obtained for j = l_r⁰ − k_L − 1 = l_r^L − 1. This gives the largest permissible value of δ as δ* = ⟨1, u₀(l_r^L:n)⟩/(l_r⁰ − l_r^L + 1), leading to

  h(δ*) = ⟨1, u₀⟩² + (δ*² − 2δ*⟨1, u₀⟩)(l_r⁰ − l_r^L + 1) + δ*²(l_r⁰ − l_r^L + 1)²
       = ⟨1, u₀⟩² − 2⟨1, u₀⟩δ*(l_r⁰ − l_r^L + 1) + δ*²(l_r⁰ − l_r^L + 2)(l_r⁰ − l_r^L + 1)
       = ⟨1, u₀⟩² − 2⟨1, u₀⟩⟨1, u₀(l_r^L:n)⟩ + ((l_r⁰ − l_r^L + 2)/(l_r⁰ − l_r^L + 1))⟨1, u₀(l_r^L:n)⟩²
       = (⟨1, u₀⟩ − ⟨1, u₀(l_r^L:n)⟩)² + ⟨1, u₀(l_r^L:n)⟩²/(l_r⁰ − l_r^L + 1)
       < ⟨1, u₀(1:l_r^L − 1)⟩² + ⟨1, u₀⟩²/(l_r⁰ − l_r^L + 1)
       ≤ (ζ′)²/2 + ρ_u²n/(l_r⁰ − l_r^L + 1),    (4.32)

where the last inequality follows from the premise ⟨1, u₀⟩ = ‖u₀‖₁ ≤ ρ_u√n and from using l_r^L − 1 < l_r^L ≤ l_r^{BL} with (4.11a). Using (4.32) in (4.30) gives

  (ζ′)² ≤ h(δ*) < (ζ′)²/2 + ρ_u²n/(l_r⁰ − l_r^L + 1) ⟹ l_r⁰ − l_r^L < 2ρ_u²n/(ζ′)² − 1.    (4.33)

Note that step (S5) of Algorithm 1 essentially uses Problem P9(n, ζ, σ, u) to find l_r^R, just like step (S4), except that we are now looking for the maximum index l permitting Problem P9(n, ζ, σ, u) to be feasible. Hence, all arguments in the preceding three paragraphs are still valid, with u_M now representing a unimodal vector with its peak at index l_r^R and l_r^R ≥ l_r⁰. Analogous to (4.33), this leads to the bound

  l_r^R − l_r⁰ < 2ρ_u²n/(ζ′)² − 1,    (4.34)

which, when added to (4.33), gives

  l_r^R − l_r^L < 4ρ_u²n/(ζ′)² − 2 < 4ρ_u²n/(ζ′)².    (4.35)

All arguments in this proof w.r.t. u₀ can be duplicated w.r.t. v₀, starting at the second paragraph from |⟨v₀, v⟩| ≥ √(1 − ζ²/σ²). Hence, analogous to (4.35), we can derive an upper bound on l_c^R − l_c^L, which when multiplied with (4.35) gives (4.12) and completes the proof.
4.8.4 Proof of Lemma 4.4

We shall reason about the feasibility of Problem (P10) by studying the closely related optimization problem (P11). Using Lemma 4.5, solutions to Problem (P11) can be translated to and from Problem (P10). Thus, it suffices to show that under the assumptions of this lemma, it is necessary for the inequality in (4.13) to hold if the optimal value of Problem (P11) is not to exceed −ρ.

The unimodality constraints (the first two constraints) in Problem (P11) can be written more compactly as the linear inequality constraint Az ≤ 0, where A ∈ ℝ^{(n−1)×n} is a bidiagonal matrix with non-zero elements

  A(j, j:j+1) = (1, −1) for 1 ≤ j ≤ l* − 1, and (−1, 1) for l* ≤ j ≤ n − 1.    (4.36)

The Lagrangian for Problem (P11) is

  L(z; λ, μ) = −⟨z, v⟩ + λᵀAz + μ(‖z‖₂² − 1) = zᵀ(Aᵀλ − v) + μzᵀz − μ,    (4.37)

and its partial first and second derivatives w.r.t. z are ∂L(z; λ, μ)/∂z = Aᵀλ − v + 2μz and ∂²L(z; λ, μ)/∂z² = 2μI, respectively. Clearly, L(z; λ, μ) is minimized w.r.t. z at z = (v − Aᵀλ)/(2μ) for μ > 0, implying that the Lagrangian dual function is

  g(λ, μ) = inf_z L(z; λ, μ) = −(1/2μ)‖v − Aᵀλ‖₂² + μ((1/4μ²)‖v − Aᵀλ‖₂² − 1) = −(1/4μ)‖v − Aᵀλ‖₂² − μ.    (4.38)

We further have ∂g(λ, μ)/∂μ = (4μ²)^{−1}‖v − Aᵀλ‖₂² − 1, so that g(λ, μ) is maximized w.r.t. μ over μ > 0 at μ = ‖v − Aᵀλ‖₂/2. Let z_opt be a solution to Problem (P11). From (4.38), we have

  sup_μ g(λ, μ) = −(1/4μ)(2μ)² − μ = −2μ = −‖v − Aᵀλ‖₂,    (4.39)

and using weak duality theory for Problem (P11) gives

  −⟨z_opt, v⟩ ≥ sup_{λ≥0} sup_μ g(λ, μ) = sup_{λ≥0} −‖v − Aᵀλ‖₂ = −inf_{λ≥0} ‖v − Aᵀλ‖₂.    (4.40)

Lemma 4.5 says that for feasibility of Problem (P10), we must have −⟨z_opt, v⟩ ≤ −ρ, which implies that ρ ≤ ‖v − Aᵀλ‖₂ for every λ ∈ ℝ₊^{n−1}. Next, we make an appropriate choice of λ ≥ 0 to get the inequality in (4.13).

Let 0 ≤ k_L ≤ l₀ − l* and 0 ≤ k_R ≤ n − l₀ denote integers and let δ > 0 be a real number, all chosen arbitrarily. The vector wᵀ ≜ λᵀA can be expressed piecewise as

  w(j) = λ(1) for j = 1,
       = λ(j) − λ(j − 1) for 2 ≤ j ≤ l* − 1,
       = −λ(l*) − λ(l* − 1) for j = l*,
       = −λ(j) + λ(j − 1) for l* + 1 ≤ j ≤ n − 1,
       = λ(n − 1) for j = n.    (4.41)

We select λ such that w satisfies

  w(j) = v(j) for 1 ≤ j ≤ l₀ − k_L − 1, j ≠ l*,
       = v(j) − δ for l₀ − k_L ≤ j ≤ l₀ + k_R,
       = v(j) for l₀ + k_R + 1 ≤ j ≤ n.    (4.42)

We do not explicitly set w(l*), but require it to satisfy the consistency of assignments using (4.41) and (4.42). We solve for λ(1:l* − 1) recursively element-wise starting from λ(1), and for λ(l*:n − 1) recursively element-wise starting from λ(n − 1). We get

  λ(j) = Σ_{k=1}^j v(k) for 1 ≤ j ≤ l* − 1,
       = −(k_R + k_L + 1)δ + Σ_{k=j+1}^n v(k) for l* ≤ j ≤ l₀ − k_L − 2,
       = −(l₀ + k_R − j)δ + Σ_{k=j+1}^n v(k) for l₀ − k_L − 1 ≤ j ≤ l₀ + k_R − 1,
       = Σ_{k=j+1}^n v(k) for l₀ + k_R ≤ j ≤ n − 1,    (4.43)

and for consistency, we have

  w(l*) = −λ(l*) − λ(l* − 1) = (k_R + k_L + 1)δ + v(l*) − ⟨1, v⟩.    (4.44)

Since λ ≥ 0 is needed, we must ensure in (4.43) that −(l₀ + k_R − j)δ + ⟨1, v(j + 1:n)⟩ ≥ 0 for every l₀ − k_L − 1 ≤ j ≤ l₀ + k_R − 1, or equivalently, (4.14) should hold to guarantee λ(l₀ − k_L − 1:l₀ + k_R − 1) ≥ 0. Since v ≥ 0, we already have λ(1:l* − 1) ≥ 0 and λ(l₀ + k_R:n − 1) ≥ 0 in (4.43). Further, λ(l*:l₀ − k_L − 2) ≥ 0 follows from (4.14) with j = l₀ − k_L − 1 and the simple observation that λ(l*) ≥ λ(l* + 1) ≥ ··· ≥ λ(l₀ − k_L − 1) ≥ 0. With w = Aᵀλ satisfying (4.42) and (4.44), we get

  ρ² ≤ ‖v − Aᵀλ‖₂² = (v(l*) − w(l*))² + ‖v(l₀ − k_L:l₀ + k_R) − w(l₀ − k_L:l₀ + k_R)‖₂²
    = ((k_R + k_L + 1)δ − ⟨1, v⟩)² + δ²(l₀ + k_R − l₀ + k_L + 1)
    = ⟨1, v⟩² − 2δ⟨1, v⟩(k_R + k_L + 1) + δ²(k_R + k_L + 1)² + δ²(k_R + k_L + 1)
    = ⟨1, v⟩² + (δ² − 2δ⟨1, v⟩)(k_R + k_L + 1) + δ²(k_R + k_L + 1)²,    (4.45)

completing the proof.
4.8.5 Proof of Lemma 4.5

Assuming z_opt ≥ 0, the second part of the lemma follows on observing that

1. the constraint ‖z_opt‖₂² = 1 is equivalent to the constraint ‖z_opt‖₂ = 1,
2. the optimal value of Problem (P11) is −⟨z_opt, v⟩ and −⟨z_opt, v⟩ ≤ −ρ ⟺ ⟨z_opt, v⟩ ≥ ρ, and
3. the remaining constraints in Problem (P10) are also present in Problem (P11) and z_opt ≥ 0.

To show the first part of the lemma, we start from an arbitrary solution z* of Problem (P11) and transform it into a vector in ℝⁿ₊ that is feasible for Problem (P11) and gives the same or a better value of the objective function than z*. If z* ≥ 0 then no transformation is necessary. Otherwise, we invoke the following sequence of arguments. For brevity, we refer to the first two constraints in Problem (P11) as the unimodality constraints.

1. If z*(l*) < 0, then form a vector z′ ∈ ℝⁿ that agrees with z* on the indices {1, ..., n}\{l*} and has z′(l*) = |z*(l*)| > z*(l*). Clearly, ‖z′‖₂² = ‖z*‖₂² = 1 and z′ satisfies the unimodality constraints, since l* is still the index of the largest element and the other elements are the same as those in z*. Further, v(l*) ≥ 0 implies that

  (−⟨z′, v⟩) − (−⟨z*, v⟩) = ⟨z*, v⟩ − ⟨z′, v⟩ = z*(l*)v(l*) − z′(l*)v(l*) = (z*(l*) − |z*(l*)|)v(l*) ≤ 0.    (4.46)

Thus, z′ is feasible for Problem (P11) and is optimal w.r.t. the value of the objective function, since −⟨z′, v⟩ ≤ −⟨z*, v⟩. Therefore, w.l.o.g. we subsequently assume z*(l*) ≥ 0.

2. If z*(l*) = 0 then z* ≤ 0, implying that the optimal value of Problem (P11) is −⟨z*, v⟩ = ⟨−z*, v⟩ ≥ 0 since v ≥ 0. Define the vector z′ ∈ ℝⁿ such that z′(l*) = 1 and z′({1, ..., n}\{l*}) = 0. Then z′ has unit length and trivially satisfies the unimodality constraints with peak at index l*, making it feasible for Problem (P11). Furthermore, −⟨z′, v⟩ = −v(l*) ≤ 0 implies that z′ achieves an objective function value that is at least as good as z*. Therefore, w.l.o.g. we may subsequently assume z*(l*) > 0.

3. Let Λ ⊆ {1, 2, ..., n}\{l*} denote the set of indices on which z* is negative. Since z* is monotonically non-decreasing on the index set {1, 2, ..., l* − 1}, Λ₁ ≜ Λ ∩ {1, 2, ..., l* − 1} is either empty or is the contiguous set {1, 2, ..., |Λ₁|}. By analogous reasoning, Λ₂ ≜ Λ ∩ {l* + 1, l* + 2, ..., n} is either empty or is the contiguous set {n − |Λ₂| + 1, n − |Λ₂| + 2, ..., n}. Consider a vector z′ ∈ ℝⁿ such that z′(Λ) = 0 and z′(Λᶜ) = z*(Λᶜ)/‖z*(Λᶜ)‖₂. Since l* ∈ Λᶜ and z*(l*) > 0, we have ‖z*(Λᶜ)‖₂ > 0 and z′(Λᶜ) is well defined. Clearly, ‖z′‖₂² = 1 and z′ satisfies the unimodality constraints because (a) on the index subset Λᶜ, z′ is a positively rescaled version of z* and thus honors the element-wise inequality constraints, (b) on the index subset Λ, z′ coincides with a zero vector and trivially satisfies the unimodality constraints with equality, and (c) the boundary cases 0 = z′(|Λ₁|) ≤ z′(|Λ₁| + 1) and z′(n − |Λ₂|) ≥ z′(n − |Λ₂| + 1) = 0 are also satisfied, since z′(|Λ₁| + 1) and z′(n − |Λ₂|) are non-negative by the definition of Λ₁ and Λ₂. Further, using the non-negativity of v,

  (−⟨z′, v⟩) − (−⟨z*, v⟩) = ⟨z*, v⟩ − ⟨z′, v⟩ = ⟨z*(Λ), v(Λ)⟩ + ⟨z*(Λᶜ), v(Λᶜ)⟩ − ⟨z′(Λᶜ), v(Λᶜ)⟩
    = −⟨−z*(Λ), v(Λ)⟩ + ⟨z*(Λᶜ), v(Λᶜ)⟩ − (1/‖z*(Λᶜ)‖₂)⟨z*(Λᶜ), v(Λᶜ)⟩
    ≤ 0 + ((‖z*(Λᶜ)‖₂ − 1)/‖z*(Λᶜ)‖₂)⟨z*(Λᶜ), v(Λᶜ)⟩ ≤ 0,    (4.47)

where the first inequality is because −z*(Λ) > 0 ⟹ ⟨−z*(Λ), v(Λ)⟩ ≥ 0, and the second inequality is because 0 < ‖z*(Λᶜ)‖₂ ≤ ‖z*‖₂ = 1 and z*(Λᶜ) ≥ 0 ⟹ ⟨z*(Λᶜ), v(Λᶜ)⟩ ≥ 0. Thus, z′ is feasible for Problem (P11), satisfies z′ ≥ 0 and is optimal w.r.t. the value of the objective function, since −⟨z′, v⟩ ≤ −⟨z*, v⟩. Therefore, w.l.o.g. we can assume z* ≥ 0, completing the proof.

4.8.6 Proof of Lemma 4.6

We will use mathematical induction. Let f(j) ≜ ⟨1, v(j + 1:n)⟩/(l₀ − j) be defined over 1 ≤ j ≤ l₀ − 1. We have ⟨1, v(l₀:n)⟩ ≥ v(l₀) from the element-wise non-negativity of v, and v(l₀) ≥ v(l₀ − 1) from the unimodality of v. This leads to the induction basis

  f(l₀ − 2) − f(l₀ − 1) = ⟨1, v(l₀ − 1:n)⟩/2 − ⟨1, v(l₀:n)⟩ = (v(l₀ − 1) − ⟨1, v(l₀:n)⟩)/2 ≤ (v(l₀ − 1) − v(l₀))/2 ≤ 0.    (4.48)

For the inductive step, we have

  f(j − 1) = ⟨1, v(j:n)⟩/(l₀ − j + 1) = v(j)/(l₀ − j + 1) + ((l₀ − j)/(l₀ − j + 1))·⟨1, v(j + 1:n)⟩/(l₀ − j) = v(j)/(l₀ − j + 1) + ((l₀ − j)/(l₀ − j + 1))f(j),    (4.49)

implying that f(j − 1) is a convex combination of v(j) and f(j). If v(j) ≤ f(j) is true, then we immediately have f(j − 1) ≤ f(j), since f(j − 1) must lie on the real line between v(j) and f(j). From the unimodality of v, we have v(l₀) ≥ v(l₀ − 1) ≥ ··· ≥ v(j), and therefore

  v(j) = Σ_{k=j+1}^{l₀} v(j)/(l₀ − j) ≤ Σ_{k=j+1}^{l₀} v(k)/(l₀ − j) = ⟨1, v(j + 1:l₀)⟩/(l₀ − j) ≤ ⟨1, v(j + 1:n)⟩/(l₀ − j) = f(j),    (4.50)

completing the proof.
4.8.7 Relaxing Positivity and Sampling Grid Assumptions in Algorithm 1

For a non-square sampling grid of size n_r × n_c, the sample complexity bound guaranteeing success of low-rank matrix completion w.h.p. changes [70] to |V′| = O((n_r + n_c) log² max{n_r, n_c}), and steps (S1) and (S2) in Algorithm 1 should be adjusted accordingly. Steps (S4) and (S5) should be changed to operate on Problem P9(n_r, ζ, σ, u) to give l_r^L, l_r^R ∈ {1, 2, ..., n_r}, and analogously, step (S6) should operate on Problem P9(n_c, ζ, σ, v) to give l_c^L, l_c^R ∈ {1, 2, ..., n_c}. The functional forms for ζ and C(q, n) change according to [69]; however, Theorem 4.1 is valid as is. Theorem 4.3 undergoes only a small change, with n replaced by n_r in all assumptions pertaining to u₀ and n replaced by n_c in all assumptions about v₀, implying that n² in the localization bound (4.12) is replaced by n_r·n_c.

To relax the positivity assumption on H(·), we note that Algorithm 1 works in exactly the same way even if only the weaker condition of |u₀| ≥ 0 and |v₀| ≥ 0 being unimodal is satisfied. This is apparent from examining the statements of Theorem 4.3 and Lemmas 4.4 and 4.5. The properties of positivity and unimodality of u₀ and v₀ are not used by Theorem 4.1, and by examining the proof of Theorem 4.3 we see that these properties of u₀ and v₀ are relevant only in steps (S4) through (S6) of Algorithm 1, through the use of Problem (P9). We make the following claim without proof (note that the absolute value operator |·| is understood to act element-wise on vectors).

Corollary 4.7. Consider a modification of Algorithm 1 with all instances of Problem P9(n, ζ, σ, u) replaced by Problem P9(n, ζ, σ, |u|) and all instances of Problem P9(n, ζ, σ, v) replaced by Problem P9(n, ζ, σ, |v|). The conclusion of Theorem 4.3 holds for this modified algorithm under the weaker assumption of |u₀| and |v₀| being unimodal vectors with respective peaks at l_r⁰ and l_c⁰, where H = σ₀u₀v₀ᵀ is the SVD of the not necessarily positive matrix H ∈ ℝ^{n×n}, provided that all other assumptions of Theorem 4.3 remain unchanged.

4.8.8 Coherence Computation

We follow the definitions laid out in [70]. Let H = σ₀u₀v₀ᵀ denote the SVD of H and let ν > 0 denote the coherence parameter, defined as the minimum value of ν′ satisfying the bounds

  max_{i,j} (|⟨u₀, e_i⟩|² + |⟨v₀, e_j⟩|²) ≤ 2ν′/√n and max_{i,j} |⟨u₀, e_i⟩|²·|⟨v₀, e_j⟩|² ≤ ν′/n.    (4.51)

Since H is formed by discretization of the function H(y) = F(y_c)G(y_r), for a high enough resolution of discretization we can write

  μ(u₀) ≜ max_i |⟨u₀, e_i⟩|²/‖u₀‖₂² ≈ max_i (∫_{i/√n}^{(i+1)/√n} F(y_c)dy_c)² / (∫_{−1/2}^{1/2} F²(y_c)dy_c · ∫_{i/√n}^{(i+1)/√n} dy_c)
       = max_i (√n ∫_{i/√n}^{(i+1)/√n} F(y_c)dy_c)² / (√n ∫_{−1/2}^{1/2} F²(y_c)dy_c) ≈ ess sup_{y_c} F²(y_c) / (√n ∫_{−1/2}^{1/2} F²(y_c)dy_c),    (4.52)

where we have assumed −1/2 ≤ y_c ≤ 1/2. If the function F(·) is highly localized within [−1/2, 1/2], then

  μ(u₀) = (1 − γ(n)) ess sup_{y_c} F²(y_c) / (√n ∫_{−∞}^{∞} F²(y_c)dy_c),    (4.53)

where the approximation factor (1 − γ(n)) encapsulates all of the foregoing approximations. Similarly,

  μ(v₀) ≜ max_j |⟨v₀, e_j⟩|²/‖v₀‖₂² = (1 − γ′(n)) ess sup_{y_r} G²(y_r) / (√n ∫_{−∞}^{∞} G²(y_r)dy_r).    (4.54)

Equation (4.51) implies that

  ν = max{(√n/2)(μ(u₀) + μ(v₀)), nμ(u₀)μ(v₀)}.    (4.55)

Barring the approximation factors of (1 − γ(n)) and (1 − γ′(n)), it is clear that √n·μ(u₀) and √n·μ(v₀) are independent of n as long as the approximations in (4.52), (4.53) and (4.54) are valid.
In particular, the coherence parameter ν is unchanged by sub-sampling on a √g × √g uniform grid as long as γ(n) ≈ γ(g) and γ′(n) ≈ γ′(g).

Exponential Fields

Let H(y) = H₀ exp(−a_c|y_c|^{p_c} − a_r|y_r|^{p_r}) with F(y_c) = √H₀ exp(−a_c|y_c|^{p_c}) and G(y_r) = √H₀ exp(−a_r|y_r|^{p_r}). We have

  μ(u₀) = ess sup_{y_c} exp(−2a_c|y_c|^{p_c}) / (√n ∫_{−∞}^{∞} exp(−2a_c|y_c|^{p_c})dy_c) = (2√n ∫_0^{∞} exp(−2a_c y_c^{p_c})dy_c)^{−1}    (4.56a)
       = (2√n · (1/(p_c(2a_c)^{1/p_c})) ∫_0^{∞} t^{1/p_c − 1} exp(−t)dt)^{−1}    (4.56b)
       = (2a_c)^{1/p_c} / (√n(2/p_c)Γ(1/p_c)),    (4.56c)

where (4.56b) is obtained from (4.56a) by the change of variables t = 2a_c y_c^{p_c}, and (4.56c) uses the definition of the Γ-function. Similarly,

  μ(v₀) = (2a_r)^{1/p_r} / (√n(2/p_r)Γ(1/p_r)),    (4.57)

and the coherence parameter is determined as in (4.55).

Power Law Fields

Let H(y) = H₀(a_c + |y_c|^{p_c})^{−1}(a_r + |y_r|^{p_r})^{−1} with F(y_c) = √H₀(a_c + |y_c|^{p_c})^{−1} and G(y_r) = √H₀(a_r + |y_r|^{p_r})^{−1}. We have

  μ(u₀) = ess sup_{y_c} (a_c + |y_c|^{p_c})^{−2} / (√n ∫_{−∞}^{∞} (a_c + |y_c|^{p_c})^{−2}dy_c) = (2a_c²√n ∫_0^{∞} (a_c + y_c^{p_c})^{−2}dy_c)^{−1}    (4.58a)
       = (2a_c²√n · (π/p_c)(1 − 1/p_c) / (a_c^{2−1/p_c} sin(π/p_c)))^{−1} for p_c ∈ (1/2, 1) ∪ (1, ∞), and (2a_c²√n · a_c^{−1})^{−1} for p_c = 1    (4.58b)
       = p_c² sin(π/p_c) / (2√nπ(p_c − 1)a_c^{1/p_c}) for p_c ∈ (1/2, 1) ∪ (1, ∞), and (2a_c√n)^{−1} for p_c = 1,    (4.58c)

where (4.58b) was obtained from (4.58a) by considering the following cases. For p_c = 1, we have

  ∫_0^{∞} (a_c + y_c^{p_c})^{−2}dy_c = ∫_0^{∞} (a_c + y_c)^{−2}d(a_c + y_c) = −(a_c + y_c)^{−1}|_{y_c=0}^{∞} = a_c^{−1}.    (4.59)

For p_c > 1, we invoke the definite integral formula

  ∫_0^{∞} t^m dt/(tⁿ + aⁿ)^r = (−1)^{r−1} π a^{m+1−nr} · Γ[(m + 1)/n] / (n sin[(m + 1)π/n] · (r − 1)! · Γ[(m + 1)/n − r + 1])    (4.60)

from [132], valid in the range n(r − 2) < m + 1 < nr, with the values m = 0, n = p_c, a = a_c^{1/p_c}, r = 2. To see that the range criterion is satisfied, we observe that n(r − 2) < m + 1 < nr reduces to 0 < 1 < 2p_c, which is true for p_c > 1/2. Similarly,

  μ(v₀) = p_r² sin(π/p_r) / (2√nπ(p_r − 1)a_r^{1/p_r}) for p_r ∈ (1/2, 1) ∪ (1, ∞), and (2a_r√n)^{−1} for p_r = 1,    (4.61)

and the coherence parameter is determined as in (4.55).
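The closed form (4.56c) is easy to validate against a direct discretization. The sketch below (numpy/scipy; the parameters a_c = 20, p_c = 2 and the grid size are arbitrary choices of ours, picked so that F is highly localized in [−1/2, 1/2] as assumed by (4.53)) compares the analytic μ(u₀) with max_i |⟨u₀, e_i⟩|²/‖u₀‖₂² computed from samples of F, with √n playing the role of the number of one dimensional grid points.

    import numpy as np
    from scipy.special import gamma

    a_c, p_c, N = 20.0, 2.0, 10_000         # illustrative parameters; N stands in for sqrt(n)
    x = (np.arange(N) + 0.5) / N - 0.5      # N-point grid on [-1/2, 1/2]
    F = np.exp(-a_c * np.abs(x) ** p_c)     # exponential-field factor (the H0 factor cancels)
    mu_numeric = np.max(F ** 2) / np.sum(F ** 2)
    mu_analytic = (2 * a_c) ** (1 / p_c) / (N * (2 / p_c) * gamma(1 / p_c))  # (4.56c)
    print(mu_numeric, mu_analytic)          # the two values should agree closely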
4.8.9 Mean-shift based gradient ascent (MS) algorithm

Algorithm 2 Modified mean-shift gradient ascent algorithm for sampling and peak detection on the current grid
Input: $n\times n$ grid selected at the current stage
Output: Locations of peaks $\mu_1, \mu_2, \ldots$, where the number of peaks is initially unknown
Steps:
1: Set the maximum number of iterations to $M$ and the size of the window to $\omega$
2: Cluster index $K\leftarrow 1$
3: for iter $\leftarrow 1$ to $M$ do
4:   Randomly select an initial location $X$ on the map
5:   while $X$ has not already been visited do
6:     // assuming we are going to find a new peak
7:     Mark $X$ as visited and belonging to cluster $K$
8:     Compute the new center of mass $X_c$ within a window of size $\omega$ around $X$
9:     Compute the direction from $X$ to $X_c$ and quantize the angle into 8 equi-partitioned bins in $[0, 2\pi)$
10:    Update the position of $X$ to one of the 8 adjacent positions based on the quantized direction
11:  end while
12:  Mark all points in the trail leading up to $X$ as belonging to the same cluster as $X$
13:  if $X$ belongs to the new cluster $K$ then
14:    // new peak found
15:    Append $X$ as $\mu_K$ to the set of peaks
16:    $K\leftarrow K + 1$
17:  end if
18: end for
19: Return the highest peak among $\mu_1, \mu_2, \ldots, \mu_K$
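A minimal NumPy rendering of Algorithm 2 follows; it is an illustrative sketch rather than the implementation used for the reported experiments. It assumes a non-negative field, simplifies the 8-bin angle quantization of steps 9-10 to per-axis sign steps (which selects among the same 8 neighbors), and uses arbitrary window and iteration-budget choices; a break at a fixed point is added as a practical guard.

import numpy as np

def mean_shift_peaks(grid, max_iters=50, window=3, seed=0):
    # Sketch of Algorithm 2: random restarts, hill-climbing toward the local
    # center of mass, clustering each trail (steps 4-12), new-peak test (13-16).
    rng = np.random.default_rng(seed)
    n = grid.shape[0]
    cluster = np.zeros(grid.shape, dtype=int)      # 0 means "not yet visited"
    peaks, K = [], 1
    for _ in range(max_iters):
        x = rng.integers(window, n - window, size=2)          # step 4
        trail = []
        while cluster[x[0], x[1]] == 0:                        # step 5
            cluster[x[0], x[1]] = K                            # step 7
            trail.append((int(x[0]), int(x[1])))
            r = np.arange(x[0] - window, x[0] + window + 1)
            c = np.arange(x[1] - window, x[1] + window + 1)
            patch = grid[np.ix_(r, c)]                         # step 8
            xc = np.array([(r * patch.sum(axis=1)).sum(),
                           (c * patch.sum(axis=0)).sum()]) / patch.sum()
            step = np.sign(np.round(xc - x))                   # steps 9-10, simplified
            if not step.any():
                break                                          # fixed point: local peak
            x = np.clip((x + step).astype(int), window, n - 1 - window)
        label = cluster[x[0], x[1]]
        for p in trail:                                        # step 12
            cluster[p] = label
        if label == K:                                         # steps 13-16
            peaks.append((int(x[0]), int(x[1])))
            K += 1
    return max(peaks, key=lambda p: grid[p])                   # step 19

On a grid with a single dominant peak, the returned location coincides with the peak of the field up to the step quantization.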
Chapter 5
Delay-Doppler Estimation of Channels with Leakage

5.1 Introduction¹

Wireless communications have enabled new systems such as intelligent traffic safety (vehicle-to-vehicle communication) [79], [133]-[135], robotic networks [112], underwater surveillance systems [136], [137], etc. To establish reliable, high data rate wireless communication between a transmitter and a receiver, accurate channel state information is needed at the receiver. Training-based methods, which probe the channel in time and frequency with known signals and reconstruct the channel response from the output signals, are most commonly used to accomplish this task (see [78] and references therein).

There are many well-known approaches to training-based channel estimation, for example least-squares (LS) [76], [78], Wiener filters [76], [138], compressed sensing (CS) methods based on the (element-wise) sparsity structure of dominant paths in the channel [77]-[79], and hybrid sparse and diffuse (HSD) estimators [139], [140]. The conventional LS and Wiener-filtering estimators do not take advantage of the inherent structure of the channel. The other drawback of Wiener filtering is that knowledge of the scattering function is required [76]; however, the scattering function is not typically known at the receiver. Often, a flat spectrum in the delay-Doppler domain is assumed, which introduces performance degradation due to the mismatch with respect to the true scattering function [76]. Compressed sensing (CS) methods [77]-[79] take advantage of the inherent sparsity structure of dominant components in the channel; however, due to finite block length and finite transmission bandwidth, the sparsity of the channel decreases in practical communication systems. This effect is known as channel leakage [77], [79]. It has been shown that leakage [77], [79] and basis mismatch [75] significantly degrade the performance of CS methods.

Contributions: We show that the channel matrix in the time-delay (or Doppler-delay) domain representation is a low-rank matrix with rank equal to the number of dominant paths. Using the low-rank structure, we develop an alternating minimization based approach to reconstruct the channel matrix using linear measurements at the receiver. Our approach optimizes a weighted mean squared error cost function and works directly in the low-rank parametrized space. We analyze the algorithm to show that the global optimum can be recovered in the absence of noise, even though the underlying problem is non-convex under the given parametrization. We justify our selection of weights to speed up convergence as compared to an unweighted MMSE estimate and highlight why a minimum nuclear norm initialization of the algorithm fails. Performance of the algorithm is demonstrated by numerical experiments in parameter regimes where the inverse problem is well-conditioned, and it is also shown that basis pursuit fails with gross errors.

¹This section is joint work with Sajjad Beygi and is presented here for completeness.

Organization: Section 5.2 derives the communication system model, shows that the model has a low-rank property, and introduces the weighted MMSE estimation problem. Section 5.3 describes our alternating minimization algorithm and states its theoretical properties. Section 5.4 is devoted to a discussion of parameter choices and presentation of numerical results. Section 5.5 concludes the chapter.

Notational Conventions: We use lowercase boldface letters to denote column vectors (e.g. $z$) and uppercase boldface letters to denote matrices (e.g. $A$). The MATLAB® indexing rules are used to denote parts of a vector/matrix (e.g. $A(2{:}3, 4{:}6)$ denotes the sub-matrix of $A$ formed by the rows $\{2,3\}$ and columns $\{4,5,6\}$), with one exception: indexing does not necessarily start at 1. The all zero, all one, and identity matrices are respectively denoted by $0$, $1$ and $I$, with dimensions dictated by context. A diagonal matrix with elements from $x$ is written as $\operatorname{diag}(x)$, and the unrolling of a matrix $A$ into a column vector is denoted by $\operatorname{vec}(A) = A(:)$. $(\cdot)^T$, $(\cdot)^*$ and $(\cdot)^H$ respectively denote the transpose, complex conjugation, and conjugate transpose operations, and $|\cdot|$ denotes absolute value. The functions $\|\cdot\|$, $\|\cdot\|_F$ and $\|\cdot\|_*$ respectively return the spectral, Frobenius and nuclear norms of their matrix argument. $\mathbb{R}$ and $\mathbb{C}$ respectively denote the sets of real and complex numbers.

5.2 System Model and Low-Rank Structure²

Let the transmitted signal $s(t)$ be generated by modulating the transmitted pilot sequence $s(n)$ onto the transmit pulse $p_t(t)$ as
$$s(t) = \sum_{n=-\infty}^{+\infty} s(n)\cdot p_t(t - nT_s), \tag{5.1}$$
where $T_s$ is the sampling period. This signal model is fairly general and encompasses Orthogonal-Frequency-Division-Multiplexing (OFDM) signals as well as single-carrier signals. The signal $s(t)$ is transmitted over a linear, time-varying channel. The received signal $y(t)$ can be written as
$$y(t) = \int_{-\infty}^{+\infty} h(t,\tau)\cdot s(t-\tau)\,d\tau + z(t). \tag{5.2}$$
Here, $h(t,\tau)$ is the channel's time-varying impulse response, and $z(t)$ is complex white Gaussian noise. A common model for the time-varying impulse response is
$$h(t,\tau) = \sum_{i=1}^{P} a_i\,\delta(\tau - \tau_i)\exp(j2\pi\nu_i t). \tag{5.3}$$
At the receiver, $y(t)$ is converted into a discrete-time signal using an anti-aliasing filter $p_r(t)$. That is,
$$y(n) = \int_{-\infty}^{+\infty} y(t)\cdot p_r(nT_s - t)\,dt. \tag{5.4}$$
The relationship between the discrete-time signal $s(n)$ and the received signal $y(n)$, using (5.1) through (5.4), simplifies to
$$y(n) = \sum_{m=-\infty}^{+\infty} h(n,m)\cdot s(n-m) + z(n), \tag{5.5}$$
where $h(n,m)$ is the discrete time-delay representation of the observed channel and is related to the continuous-time channel impulse response $h(t,\tau)$ by
$$h(n,m) = \iint_{-\infty}^{+\infty} h(t + nT_s,\tau)\cdot p_t(t - \tau + mT_s)\cdot p_r(-t)\,dt\,d\tau. \tag{5.6}$$

²This section is joint work with Sajjad Beygi and is presented here for completeness.

Without loss of generality, we assume that $p_r(t)$ has a root-Nyquist spectrum with respect to the sample duration $T_s$. This implies that $z(n)$ is a sequence of i.i.d. circularly symmetric complex Gaussian random variables with variance $\sigma_z^2$ and that $h(n,m)$ is causal with maximum delay $M-1$ (i.e. $h(n,m) = 0$ for $m \geq M$ and $m < 0$). To account for pulse shaping and a finite-length training sequence, we assume that $p_t(t)$ and $p_r(t)$ are causal with support $[0, T_{\mathrm{supp}})$. For simplicity, we begin by focusing on a single path channel and discuss the relaxation to multi-path channels towards the end of this section.
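To make the discrete model concrete, the following sketch (ours, not from the original text) simulates (5.3)-(5.5) under the simplifying assumptions of ideal Nyquist pulses and on-grid delays, so that $h(n,m)$ reduces to $\sum_i a_i e^{j2\pi\nu_i nT_s}\mathbb{1}[m = \tau_i]$ and there is no leakage; all parameter values are illustrative.

import numpy as np

rng = np.random.default_rng(0)
Ts, Nr, M, P = 1e-3, 256, 8, 2
tau = np.array([2, 5])            # path delays in samples (on-grid, assumed)
nu = np.array([10.0, -7.0])       # Doppler shifts in Hz (assumed)
a = rng.standard_normal(P) + 1j * rng.standard_normal(P)

n = np.arange(Nr)[:, None]        # time index n
m = np.arange(M)[None, :]         # delay index m
# h(n, m): sampled version of (5.3) under ideal pulses, cf. (5.6)
h = sum(a[i] * np.exp(2j * np.pi * nu[i] * n * Ts) * (m == tau[i])
        for i in range(P))

# 4-QAM pilots s(n) for n = -(M-1), ..., Nr-1, stored with offset M-1
s = (rng.choice([-1, 1], Nr + M - 1)
     + 1j * rng.choice([-1, 1], Nr + M - 1)) / np.sqrt(2)
# y(n) = sum_m h(n, m) s(n - m) + z(n), cf. (5.5)
y = np.array([h[k] @ s[k + M - 1 - np.arange(M)] for k in range(Nr)])
y = y + 0.01 * (rng.standard_normal(Nr) + 1j * rng.standard_normal(Nr))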
The contribution to the received signal from a single dominant path, i.e. $h_i(t,\tau) = a_i\,\delta(\tau-\tau_i)\exp(j2\pi\nu_i t)$ in (5.3), is of the form
$$s(t)\star h_i(t,\tau) = \sum_{l=-\infty}^{\infty} s(l)\cdot a_i\exp(j2\pi\nu_i t)\cdot p_t(t - lT_s - \tau_i) \approx \sum_{l=-\infty}^{\infty} s(l)\cdot a_i\exp\big(j2\pi\nu_i(lT_s + \tau_i)\big)\cdot p_t(t - lT_s - \tau_i), \tag{5.7}$$
where the approximation is valid if we make the (reasonable) assumption that $\nu_i T_{\mathrm{supp}} \ll 1$ ($\star$ denotes convolution). If we let $p(t) = p_t(t)\star p_r(t)$, we can write the contribution after filtering and sampling as
$$y_i(n) = \left.s(t)\star h_i(t,\tau)\star p_r(t)\right|_{t=nT_s} = \sum_{l=-\infty}^{\infty} s(l)\cdot a_i\exp\big(j2\pi\nu_i(lT_s + \tau_i)\big)\cdot p\big((n-l)T_s - \tau_i\big) = \sum_{m=-\infty}^{\infty} s(n-m)\cdot a_i\exp\big(j2\pi\nu_i((n-m)T_s + \tau_i)\big)\cdot p(mT_s - \tau_i). \tag{5.8}$$
For brevity, let us define
$$h_i(n,m) = a_i\exp\big(j2\pi\nu_i((n-m)T_s + \tau_i)\big)\cdot p(mT_s - \tau_i) \tag{5.9}$$
and observe that (5.8) can be written as
$$y_i(n) = \sum_{m=0}^{M-1} h_i(n,m)\cdot s(n-m) + z(n) \tag{5.10}$$
$$= \sum_{m=0}^{M-1}\sum_{k=-K}^{K} \frac{H_i(k,m)}{2K+1}\exp\left(j\frac{2\pi nk}{2K+1}\right)s(n-m) + z(n), \tag{5.11}$$
for $n = 0, 1, \ldots, N_r-1$, with $2K+1 \geq N_r$ denoting the total number of sample measurements and
$$H_i(k,m) = \sum_{n=0}^{N_r-1} h_i(n,m)\exp\left(-j\frac{2\pi nk}{2K+1}\right), \quad |k| \leq K, \tag{5.12}$$
representing the discrete delay-Doppler spreading function of the channel [141]. By substituting (5.9) in (5.12) we can write
$$H_i(k,m) = \tilde{a}_i\,h_{\tau,i}(m)\,h_{\nu,i}(k), \tag{5.13}$$
where $\tilde{a}_i = a_i\exp(-j2\pi\nu_i\tau_i)$, $h_{\tau,i}(m) = \exp(-j2\pi\nu_i mT_s)\cdot p(mT_s - \tau_i)$, and $h_{\nu,i}(k) = w\big(\tfrac{k}{2K+1}, \nu_i T_s\big)$ with $w(k,f)$ defined as
$$w(k,f) = \begin{cases}\dfrac{N_r}{2K+1}, & k - f = 0,\\[1ex] \exp\big(-j\pi(k-f)(N_r-1)\big)\cdot\dfrac{\sin\big(\pi(k-f)N_r\big)}{(2K+1)\sin\big(\pi(k-f)\big)}, & k - f \neq 0.\end{cases} \tag{5.14}$$

[Figure 5.1: A single-path time-varying channel $H_1$ represented in the delay-Doppler domain, with delay and Doppler axes; here $K = 128$ and $M = 128$.]

We note that the leakage in the delay and Doppler plane is due to the non-zero support of $h_\tau$ and $h_\nu$. The leakage with respect to Doppler decreases with the observation length $N_r$, and the leakage with respect to delay decreases with the bandwidth of the transmitted signal. By linearity of the discrete Fourier transform, we have
$$H(k,m) = \sum_i H_i(k,m), \tag{5.15}$$
where the summation is over all the dominant paths in the channel.

Let the transmit sequence $s(n)$ be of length $N_r + M - 1$ over $n = -(M-1), \ldots, N_r-1$ and let us collect the $N_r$ received samples in a column vector
$$y = [y(0), \ldots, y(N_r-1)]^T. \tag{5.16}$$
From (5.13), we have the representation $H_1 = hg^T$, where $g = \sqrt{\tilde{a}_1}\,h_{\tau,1} \in \mathbb{C}^M$ and $h \in \mathbb{C}^{N_r}$ is an intrinsically one-dimensional, non-vanishing and non-linear function of $f = \nu_1 T_s$ uniquely defined by (5.13) as $h(k)$. In Fig. 5.1, we have illustrated the channel matrix $H_1$ in the delay-Doppler domain. Let us represent the observation model (5.13) by the vector equation $y = \mathcal{A}(H_1) + z$, where $\mathcal{A}: \mathbb{C}^{N_r\times M}\to\mathbb{C}^{N_r}$ is the effective linear operator acting on $H_1$, and let $A_n \in \mathbb{C}^{N_r\times M}$ denote the unique matrices such that $y(n) = \operatorname{Tr}(A_n H_1) + z(n)$ is satisfied for $n = 0, 1, \ldots, N_r-1$; they are given by
$$A_n(k,:) = \begin{cases}0^T, & k \neq n,\\ s(n:-1:n-M+1)^T, & k = n.\end{cases} \tag{5.17}$$
For one dominant path, the weighted minimum-mean-squared-error (MMSE) estimate of $(g,h)$ with weight matrix $W$ can be expressed as the solution to the optimization problem
$$\underset{g,h}{\text{minimize}}\ \left\|W\big(y - \mathcal{A}(hg^T)\big)\right\|_2^2 \quad\text{subject to}\ h \in \mathcal{D}_h,\ g \in \mathcal{D}_g, \tag{P_{12}}$$
where $\mathcal{D}_h$ and $\mathcal{D}_g$ respectively denote the structural restrictions on the optimization variables $h$ and $g$. Clearly, $\mathcal{D}_h$ is completely parameterized by $f$. It is also apparent that $\mathcal{D}_g$ has a complicated analytical dependence on $f$. To simplify the derivation and analysis of our iterative algorithm to solve Problem $(P_{12})$, we relax the structural restriction $g \in \mathcal{D}_g$ to $g \in \mathbb{C}^M$. The price we pay for this relaxation is that we shall need more observations ($y$ would need to be longer) to guarantee that the weighted MMSE estimate of $(g,h)$ will recover the ground truth in the absence of noise.
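Since each $A_n$ in (5.17) has a single non-zero row, applying $\mathcal{A}$ amounts to taking, for each $n$, the inner product of row $n$ of the unknown matrix with the reversed pilot window. A small sketch of this (our illustration; the offset convention for the pilot array is the assumption carried over from the simulation above):

import numpy as np

def apply_A(X, s, M, Nr):
    # y(n) = Tr(A_n X) with A_n as in (5.17): only row n of A_n is non-zero,
    # so y(n) is row n of X dotted with s(n : -1 : n-M+1).
    # s stores s(-(M-1)), ..., s(Nr-1), i.e. s(k) lives at index k + M - 1.
    return np.array([X[n, :] @ s[n + M - 1 - np.arange(M)] for n in range(Nr)])

The non-mixing nature of $\mathcal{A}$ (observation $n$ touches only row $n$) is what makes the structure of $h_f$ essential for identifiability, as discussed in Section 5.4.2.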
The relaxed version of Problem $(P_{12})$ can be expressed as
$$\underset{g,f}{\text{minimize}}\ \left\|W\big(y - \mathcal{A}(h_f g^T)\big)\right\|_2^2 \quad\text{subject to}\ f \in (-0.5, 0.5],\ g \in \mathbb{C}^M, \tag{P_{13}}$$
where $h_f$ just denotes $h$ with its dependence on $f$ made explicit, and we constrain $f$ to lie in $(-0.5, 0.5]$ since $h_f$ is a periodic function of $f$ with period length one. In the sequel, we shall only be interested in non-negative diagonal weight matrices $W = \sqrt{\operatorname{diag}(w)} \in \mathbb{R}^{N_r\times N_r}$ for some non-negative weight vector $w \in \mathbb{R}^{N_r}$. For subsequent reference, we define
$$J_w(g,f) \triangleq \left\|W\big(y - \mathcal{A}(h_f g^T)\big)\right\|_2^2 = \sum_{n=0}^{N_r-1} w(n)\left|y(n) - \operatorname{Tr}\big(A_n h_f g^T\big)\right|^2 \tag{5.18}$$
as our objective function. If the number of dominant paths is $R > 1$ in (5.15), then our objective function simply becomes
$$J_w(G,f) = \sum_{n=0}^{N_r-1} w(n)\left|y(n) - \sum_{i=1}^{R}\operatorname{Tr}\big(A_n h_{f(i)} G(:,i)^T\big)\right|^2, \tag{5.19}$$
where the contribution to $H$ in (5.15) from the $i$-th dominant path is $H_i = h_{f(i)} G(:,i)^T$, and Problem $(P_{13})$ is transformed into
$$\underset{G,f}{\text{minimize}}\ J_w(G,f) \quad\text{subject to}\ f \in (-0.5, 0.5]^R,\ G \in \mathbb{C}^{M\times R}. \tag{P_{14}}$$
Let us define the matrix $H_f \in \mathbb{C}^{N_r\times R}$ such that $H_f(:,i) = h_{f(i)}$. Then (5.19) implies that
$$J_w(G,f) = \sum_{n=0}^{N_r-1} w(n)\left|y(n) - \operatorname{Tr}\big(A_n H_f G^T\big)\right|^2, \tag{5.20}$$
and a symbolic comparison to (5.10) reveals that
$$\operatorname{Tr}\big(A_n H_f G^T\big) = H_f(n,:)\,G^T s_n, \tag{5.21}$$
where $s_n = s(n:-1:n-M+1)$ for $n = 0, 1, \ldots, N_r-1$.

5.3 A Recovery Algorithm

In this section, we state our recovery algorithm along with some of its theoretical properties in the absence of noise. Detailed proofs of the results stated here can be found in Section 5.6. Let $(G_{\mathrm{opt}}, f_{\mathrm{opt}})$ be a global optimum of Problem $(P_{14})$. It is easy to see that Problem $(P_{14})$ is a non-convex optimization problem w.r.t. the parameterization in $(G,f)$. Observing that $H_fG^T$ is a rank-$R$ matrix, Problem $(P_{14})$ can also be regarded as a low-rank matrix recovery problem [70], but with added structural constraints on the matrix factors $G$ and $H_f$ (typically $R$ is much smaller than $M$ or $N_r$, thus making $H_fG^T$ a low-rank matrix). We shall adopt an alternating minimization based approach, as described in Algorithm 3, to solve for a local optimum of Problem $(P_{14})$. Our approach utilizes the low-rank structure of $H_fG^T$ directly, without a further relaxation to a nuclear norm minimization problem [5] (see Section 5.4.2 for an explanation of why nuclear norm minimization would fail). The 'arg loc min' operator in step 2 of the algorithm returns a local minimizer (as opposed to the 'arg min' operator that returns a global minimizer) and hence can be implemented using simple gradient descent.

Algorithm 3 AMALR
Input:
1. Weight vector $w \geq 0$
2. Observation vector $y \in \mathbb{C}^{N_r}$
3. Observation operator $\mathcal{A}: \mathbb{C}^{N_r\times M}\to\mathbb{C}^{N_r}$
4. Number of dominant paths $R \geq 1$
5. Stopping resolution $\varepsilon \geq 0$
Output: Matrix $G_{\mathrm{lo}} \in \mathbb{C}^{M\times R}$ and vector $f_{\mathrm{lo}} \in (-0.5, 0.5]^R$ representing a local optimum of Problem $(P_{14})$
Steps:
1. Initialize $f^{(0)} = 0 \in (-0.5, 0.5]^R$
2. At iteration $k \geq 1$ do
(a) $G^{(k)} \leftarrow \operatorname{arg\,loc\,min}_{G\in\mathbb{C}^{M\times R}}\ J_w\big(G, f^{(k-1)}\big)$
(b) $f^{(k)} \leftarrow \operatorname{arg\,loc\,min}_{f\in(-0.5,0.5]^R}\ J_w\big(G^{(k)}, f\big)$
3. Repeat until the value of the objective function has converged, i.e. until $J_w\big(G^{(k)}, f^{(k)}\big) - J_w\big(G^{(k+1)}, f^{(k+1)}\big) < \varepsilon$.

Remark 5.1. It is easy to see that Algorithm 3 terminates, since the objective function value $J\big(G^{(k)}, f^{(k)}\big)$ is lower bounded by zero and decreases with each iteration (thereby, the improvement in $J\big(G^{(k)}, f^{(k)}\big)$ will fall below the stopping resolution $\varepsilon > 0$ after a finite number of steps). Indeed, we have for every $k \geq 1$,
$$J\big(G^{(k)}, f^{(k)}\big) \geq J\big(G^{(k+1)}, f^{(k)}\big) \geq J\big(G^{(k+1)}, f^{(k+1)}\big), \tag{5.22}$$
where the first inequality is due to step (2a) and the second inequality is due to step (2b) of the algorithm, and at least one of the two inequalities is necessarily strict in each iteration, since the algorithm would otherwise be terminated by step 3 due to lack of progress.
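A compact sketch of Algorithm 3 for $R = 1$ follows; it is our illustration rather than the implementation behind the reported results. To keep it self-contained, $h_f$ is simplified to a pure complex exponential $h_f(n) = e^{j2\pi fn}$ (the Dirichlet-kernel leakage of (5.13)-(5.14) is ignored), the $G$-step is an exact weighted least-squares solve, and the $f$-step uses a bounded local search.

import numpy as np
from scipy.optimize import minimize

def amalr_r1(y, S, w, Nr, M, eps=1e-8, max_outer=100):
    # S[n] = s_n = s(n : -1 : n-M+1); J_w per (5.18), with (5.21) giving
    # Tr(A_n h_f g^T) = h_f(n) * (g^T s_n).
    hf = lambda f: np.exp(2j * np.pi * f * np.arange(Nr))
    J = lambda g, f: np.sum(w * np.abs(y - hf(f) * (S @ g)) ** 2)
    f, J_prev = 0.0, np.inf                          # step 1: f^(0) = 0
    for _ in range(max_outer):
        Phi = hf(f)[:, None] * S                     # step 2a: weighted LS in g
        g, *_ = np.linalg.lstsq(np.sqrt(w)[:, None] * Phi,
                                np.sqrt(w) * y, rcond=None)
        res = minimize(lambda ff: J(g, ff[0]), x0=[f], bounds=[(-0.5, 0.5)])
        f = float(res.x[0])                          # step 2b: local search in f
        if J_prev - J(g, f) < eps:                   # step 3: stopping rule
            break
        J_prev = J(g, f)
    return g, f

The closed-form least-squares solve is possible because, for fixed $f$, the objective is an ordinary (weighted) linear least-squares problem in $g$.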
We analyze the output of Algorithm 3 when the elements of the pilot sequences are drawn i.i.d. from a uniform distribution over the 4-QAM constellation $\left\{\pm\tfrac{1}{\sqrt{2}} \pm \tfrac{j}{\sqrt{2}}\right\}$. We have chosen the 4-QAM constellation for the ensuing simplicity of analytical computation; the results easily extend to other distributions that are symmetric about the origin and have finite fourth moment. Our main result (Theorem 5.1) is that under no noise and a sufficiently small stopping resolution, the output $(G_{\mathrm{lo}}, f_{\mathrm{lo}})$ of Algorithm 3 is the global optimum $(G_{\mathrm{opt}}, f_{\mathrm{opt}})$ of Problem $(P_{14})$ whenever the global optimum is uniquely identifiable.

Theorem 5.1. Let the elements of $s \in \mathbb{C}^{N_r+M-1}$ be drawn i.i.d. uniformly from the 4-QAM constellation $\left\{\pm\tfrac{1}{\sqrt{2}} \pm \tfrac{j}{\sqrt{2}}\right\}$ and define $\Delta \triangleq GH_f^T - G_{\mathrm{opt}}H_{f_{\mathrm{opt}}}^T \in \mathbb{C}^{M\times N_r}$. Given a generic weight vector $w \in \mathbb{R}_+^{N_r}$, in the absence of noise, $\Delta \neq 0$ implies $\frac{\partial}{\partial G}J_w(G,f) \neq 0$, provided that the global optimum $(G_{\mathrm{opt}}, f_{\mathrm{opt}})$ is uniquely identifiable.

Proof. Section 5.6.3.

To see why Theorem 5.1 implies that Problem $(P_{14})$ has no local optimum, we reason as follows. It is clear that w.r.t. the parameter matrix $G$, Problem $(P_{14})$ is an unconstrained minimization problem. It is therefore necessary that the partial derivative of the objective function w.r.t. $G$ vanishes at every local optimum of Problem $(P_{14})$. Since $G$ is a matrix over the field of complex numbers and $J_w(G,f)$ is a real scalar valued function, the first order optimality condition is $\frac{\partial}{\partial G}J_w(G,f) = 0$, using the notation of Wirtinger calculus [142]. In the sequel, we shall use $\frac{\partial}{\partial G}J_w(G,f)$ exclusively in the sense of Wirtinger calculus. Our proof strategy relies on showing that $\frac{\partial}{\partial G}J_w(G,f)$ cannot vanish unless evaluated at $(G_{\mathrm{opt}}, f_{\mathrm{opt}})$.

Remark 5.2. The unique identifiability of $(G_{\mathrm{opt}}, f_{\mathrm{opt}})$ as the solution to Problem $(P_{14})$ is necessary for any guarantee of recovering it as the output of Algorithm 3. To see this, note that if there is a second global optimum $(G^*, f^*)$ to Problem $(P_{14})$, then there is no way to determine which of the two solutions is the correct one. In this case, we have $\Delta^* = G^*H_{f^*}^T - G_{\mathrm{opt}}H_{f_{\mathrm{opt}}}^T \neq 0$, but global optimality of $(G^*, f^*)$ implies $\frac{\partial}{\partial G}J_w(G^*, f^*) = 0$, thus contradicting the conclusion of Theorem 5.1 if the unique identifiability clause were removed from the statement of the theorem. An illustration of how the particular structure of $H_f$ helps global identifiability in Problem $(P_{14})$ is provided in Section 5.4.2.

While Theorem 5.1 excludes the existence of local optima, it does not quantify the convergence rate of Algorithm 3. To do so involves knowing the mean and the variance of the partial derivative $\frac{\partial}{\partial G}J_w(G,f)$ over the random training sequences (Theorem 5.2). This idea is further explained in Section 5.4.3.
Theorem 5.2. Let the elements of $s \in \mathbb{C}^{N_r+M-1}$ be drawn i.i.d. uniformly from the 4-QAM constellation $\left\{\pm\tfrac{1}{\sqrt{2}} \pm \tfrac{j}{\sqrt{2}}\right\}$ and define $\Delta \triangleq GH_f^T - G_{\mathrm{opt}}H_{f_{\mathrm{opt}}}^T \in \mathbb{C}^{M\times N_r}$. In the absence of noise, we have
$$\mathbb{E}\left[\left\|\frac{\partial}{\partial G}J_w(G,f)\right\|_F^2\right] = \left\|\mathbb{E}\left[\frac{\partial}{\partial G}J_w(G,f)\right]\right\|_F^2 + (M-1)R\sum_{n=0}^{N_r-1} w^2(n)\,\|\Delta(:,n)\|_2^2 \tag{5.23}$$
and
$$\left\|\mathbb{E}\left[\frac{\partial}{\partial G}J_w(G,f)\right]\right\|_F^2 = \operatorname{vec}(\Delta^*)^H\,B\,\operatorname{vec}(\Delta^*), \tag{5.24}$$
where (using $\otimes$ to denote the Kronecker product of matrices) $B = \big(\operatorname{diag}(w)H_fH_f^H\operatorname{diag}(w)\big)^T \otimes I \in \mathbb{C}^{M\cdot N_r\times M\cdot N_r}$ is a Hermitian symmetric positive semidefinite matrix satisfying the eigenvalue bounds
$$0 \preceq B \preceq \|\operatorname{diag}(w)H_f\|^2\,I. \tag{5.25}$$

Proof. Section 5.6.2.
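The identity (5.23) is easy to probe numerically. The sketch below (ours) draws random 4-QAM pilots, evaluates the Wirtinger gradient via the closed form (5.39) derived in Section 5.6.2, and compares the two sides of (5.23); the columns of $H_f$ are stood in for by pure complex exponentials (which share the unit-modulus property $|H_f(n,l)| = 1$ used in the proof), and $\Delta$, the dimensions and the trial count are arbitrary.

import numpy as np

rng = np.random.default_rng(0)
M, Nr, R, trials = 6, 12, 2, 20000
w = np.ones(Nr); w[1:] = 1.0 / np.arange(1, Nr)          # weights of (5.27)
Hf = np.exp(2j * np.pi * np.outer(np.arange(Nr), [0.05, 0.11]))  # stand-in H_f
Delta = rng.standard_normal((M, Nr)) + 1j * rng.standard_normal((M, Nr))

mean_sq, mean_grad = 0.0, np.zeros((M, R), dtype=complex)
for _ in range(trials):
    s = (rng.choice([-1, 1], Nr + M - 1)
         + 1j * rng.choice([-1, 1], Nr + M - 1)) / np.sqrt(2)
    grad = np.zeros((M, R), dtype=complex)
    for n in range(Nr):
        sn = s[n + M - 1 - np.arange(M)]                 # s_n
        # closed form (5.39): w(n) s_n (s_n^H Delta*(:, n)) H_f(n, :)
        grad += w[n] * (sn.conj() @ Delta[:, n].conj()) * np.outer(sn, Hf[n, :])
    mean_sq += np.sum(np.abs(grad) ** 2) / trials
    mean_grad += grad / trials

gap = (M - 1) * R * np.sum(w ** 2 * np.sum(np.abs(Delta) ** 2, axis=0))
print(mean_sq, np.sum(np.abs(mean_grad) ** 2) + gap)     # two sides of (5.23)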
Specifically, it is intuitive to see that the above mentioned nuclear norm minimization strategy ignores all structural information inH f . According to (5.21), the n th observationy(n) recorded throughA has contributions only from the n th row of the unknown matrixH f G T . The specific structural relationship between rows ofH f is the only connection between the different observations obtained through A and fosters hope for the inverse problem (P 14 ) to be well posed. If this structural relationship between rows ofH f is ignored, then (5.21) implies that each observation inA is associated with R≥ 1 new variables that do not contribute to any other observation, e.g.y(n) will be associated with the R new variables in H f (n, :). On counting the degrees of freedom in the unknown matrixH f G T after ignoring the structural relationship between rows ofH f , it is straightforward to see that we have (M +N r )·R free parameters but only N r observations. This implies that using the solution to Problem (P 15 ) is highly unreliable to initialize Algorithm 3 for Problem (P 14 ). Intuitively, an arbitrary initialization should do equally well (or equally bad). 5.4.3 Practical Convergence Rate and Stopping Criterion Although Theorem 5.1 directly implies that Problem (P 14 ) does not have any local optima if it has a unique global optimum (G opt ,f opt ), it is not necessarily true that Algorithm 3 would output (G opt ,f opt ) as the answer. This is because the improvement in the value of the objective function J w (G,f) across iterations, although non-zero, may fall below the stopping resolution ε > 0 resulting in premature termination of Algorithm 3. In our experiments, we have observed that in such cases the algorithm’s output G lo ,f lo is nowhere near the right answer (G opt ,f opt ), resulting in gross errors. Usually, premature termination is linked to slow convergence rates. For Algorithm 3, it can be theoretically shown that the rate of convergence to the right answer (G opt ,f opt ) is quite slow (barring few exceptions) 121 when the current iterate G (k) ,f (k) is far away and is rapid (linear convergence rate) when the current iterate is near the right answer. This explains why we observe large errors on premature termination of the algorithm and further highlights the importance of initializing Algorithm 3 close enough to the correct answer. Some initialization thumb rules are mentioned in Section 5.4.4 below. Yet another reason for slow convergence of Algorithm 3 can be seen by studying Theorem 5.2. Given the current iterate G (k) ,f (k) , it is intuitive to expect that the best stepping direction for the next iteration should have a weak dependence on the realization of the training sequences and be strongly dependent on Δ k =G (k) H T f (k) −G opt H T f opt . However, Theorem 5.2 implies that even for a fixed Δ =GH T f −G opt H T f opt , the partial derivative ∂ ∂G J w (G,f) varies greatly between realizations of the training sequence s. This is evidenced by the relative magnitudes of E h ∂ ∂G J w (G,f) 2 F i and E ∂ ∂G J w (G,f) 2 F . Using (5.23) and (5.24), the former is roughly M times larger than the latter for weight vectorw = 1. The major implication of this observation is that for most realizations of the training sequences, the partial derivative ∂ ∂G J w (G,f) (also the negative stepping direction for gradient descent) points in a direction that is very different from the best stepping direction, resulting in slow convergence rates. 
5.4.2 Failure of Nuclear Norm Minimization

Since Problem $(P_{14})$ is a low-rank matrix recovery problem [70], a plausible initialization strategy could be as follows. We solve the nuclear norm regularized (to promote a low-rank solution) least squares problem
$$\underset{X}{\text{minimize}}\ \|W(y - \mathcal{A}(X))\|_2^2 + \lambda\|X\|_* \quad\text{subject to}\ X \in \mathbb{C}^{N_r\times M} \tag{P_{15}}$$
to obtain a global optimum $X^*$ for some parameter $\lambda > 0$. Thereafter, we find the best rank-$R$ approximation to $X^*$ using the singular value decomposition to obtain matrices $U \in \mathbb{C}^{N_r\times R}$, $\Sigma \in \mathbb{R}^{R\times R}$ and $V \in \mathbb{C}^{M\times R}$, respectively representing the left singular vectors, the singular values and the right singular vectors. Finally, we initialize Algorithm 3 by setting $G = V^*\Sigma$ and/or $H_f = U$.

The above initialization strategy does not work for Problem $(P_{14})$ because of the specific structure of the observation operator $\mathcal{A}$. Specifically, it is intuitive to see that the above mentioned nuclear norm minimization strategy ignores all structural information in $H_f$. According to (5.21), the $n$-th observation $y(n)$ recorded through $\mathcal{A}$ has contributions only from the $n$-th row of the unknown matrix $H_fG^T$. The specific structural relationship between the rows of $H_f$ is the only connection between the different observations obtained through $\mathcal{A}$, and it fosters hope for the inverse problem $(P_{14})$ to be well posed. If this structural relationship between the rows of $H_f$ is ignored, then (5.21) implies that each observation in $\mathcal{A}$ is associated with $R \geq 1$ new variables that do not contribute to any other observation; e.g. $y(n)$ will be associated with the $R$ new variables in $H_f(n,:)$. On counting the degrees of freedom in the unknown matrix $H_fG^T$ after ignoring the structural relationship between the rows of $H_f$, it is straightforward to see that we have $(M + N_r)\cdot R$ free parameters but only $N_r$ observations. This implies that using the solution to Problem $(P_{15})$ to initialize Algorithm 3 for Problem $(P_{14})$ is highly unreliable. Intuitively, an arbitrary initialization should do equally well (or equally badly).

5.4.3 Practical Convergence Rate and Stopping Criterion

Although Theorem 5.1 directly implies that Problem $(P_{14})$ does not have any local optima if it has a unique global optimum $(G_{\mathrm{opt}}, f_{\mathrm{opt}})$, it is not necessarily true that Algorithm 3 would output $(G_{\mathrm{opt}}, f_{\mathrm{opt}})$ as the answer. This is because the improvement in the value of the objective function $J_w(G,f)$ across iterations, although non-zero, may fall below the stopping resolution $\varepsilon > 0$, resulting in premature termination of Algorithm 3. In our experiments, we have observed that in such cases the algorithm's output $(G_{\mathrm{lo}}, f_{\mathrm{lo}})$ is nowhere near the right answer $(G_{\mathrm{opt}}, f_{\mathrm{opt}})$, resulting in gross errors.

Usually, premature termination is linked to slow convergence rates. For Algorithm 3, it can be theoretically shown that the rate of convergence to the right answer $(G_{\mathrm{opt}}, f_{\mathrm{opt}})$ is quite slow (barring few exceptions) when the current iterate $(G^{(k)}, f^{(k)})$ is far away, and is rapid (linear convergence rate) when the current iterate is near the right answer. This explains why we observe large errors on premature termination of the algorithm, and it further highlights the importance of initializing Algorithm 3 close enough to the correct answer. Some initialization rules of thumb are mentioned in Section 5.4.4 below.

Yet another reason for the slow convergence of Algorithm 3 can be seen by studying Theorem 5.2. Given the current iterate $(G^{(k)}, f^{(k)})$, it is intuitive to expect that the best stepping direction for the next iteration should have a weak dependence on the realization of the training sequences and be strongly dependent on $\Delta_k = G^{(k)}H_{f^{(k)}}^T - G_{\mathrm{opt}}H_{f_{\mathrm{opt}}}^T$. However, Theorem 5.2 implies that even for a fixed $\Delta = GH_f^T - G_{\mathrm{opt}}H_{f_{\mathrm{opt}}}^T$, the partial derivative $\frac{\partial}{\partial G}J_w(G,f)$ varies greatly between realizations of the training sequence $s$. This is evidenced by the relative magnitudes of $\mathbb{E}\big[\|\frac{\partial}{\partial G}J_w(G,f)\|_F^2\big]$ and $\|\mathbb{E}\big[\frac{\partial}{\partial G}J_w(G,f)\big]\|_F^2$. Using (5.23) and (5.24), the former is roughly $M$ times larger than the latter for the weight vector $w = 1$. The major implication of this observation is that for most realizations of the training sequences, the partial derivative $\frac{\partial}{\partial G}J_w(G,f)$ (also the negative stepping direction for gradient descent) points in a direction that is very different from the best stepping direction, resulting in slow convergence rates.

5.4.4 Parameter Regimes to Avoid Ill-Conditioning

Problem $(P_{14})$ may become ill-conditioned if the columns of $H_f$ have high correlation; e.g. if $R = 2$ and $f(1)$ is close enough to $f(2)$, then we get
$$H_fG^T = H_f(:,1)G(:,1)^T + H_f(:,2)G(:,2)^T \approx H_f(:,1)G(:,1)^T + H_f(:,1)G(:,2)^T = H_f(:,1)\big(G(:,1) + G(:,2)\big)^T, \tag{5.28}$$
implying that the rank-one matrix $H_f(:,1)(G(:,1) + G(:,2))^T$ is nearly indistinguishable from the right answer $H_fG^T$ of rank two. The condition number of the matrix $H_f^HH_f$ provides a convenient way to quantify the overall ill-conditioning in $H_f$ for a given set of frequencies $f$. Lower values of the condition number of $H_f^HH_f$ imply better conditioning of Problem $(P_{14})$. To provide meaningful simulation results, we only consider instances of $f$ for which the condition number of $H_f^HH_f$ is less than 10 (the lowest possible condition number is 1 and corresponds to $H_f$ having perfectly orthonormal columns). It is clear from (5.13) that each column of $H_f$ is a periodically sampled complex exponential of a different frequency, and a total of $N_r$ time instances are sampled. Thus, for the columns $H_f(:,l)$ and $H_f(:,l')$ to be perfectly orthonormal, we must have the frequencies $f(l)$ and $f(l')$ separated by an integer multiple of $1/N_r$. To keep the condition number of $H_f^HH_f$ less than 10, numerical computations reveal that

1. for $R = 2$, a frequency separation of at least $0.35/N_r$ is necessary, and
2. for $R = 4$, a frequency separation of at least $0.7/N_r$ is necessary.

We appropriately restrict our choice of frequencies in $f$ to ensure well-conditioned instances of $H_f$. We note that the conditioning of Problem $(P_{14})$ also depends on the condition number of $G^HG$, via an approximation analogous to (5.28). However, this is not a significant problem for our simulations, since we shall restrict ourselves to instances where the columns of $G$ are generated as i.i.d. circularly symmetric complex Gaussian random vectors with zero mean and identity covariance matrix.

As mentioned in Section 5.4.3, initialization plays an important role in determining the convergence rate of the proposed algorithm. With $R = 1$ dominant component, a stopping resolution of $\varepsilon = 10^{-5}$ and the weight vector $w$ as in (5.27), we found that the algorithm terminates prematurely whenever the right answer $f_{\mathrm{opt}}$ is farther than $1/N_r$ from the initialized frequency $f^{(0)}$, for a wide range of delay spread choices $M$ and no observation noise. With the same weight vector and stopping resolution but for $R = 4$ dominant components, the all zero initialization $f^{(0)} = 0$ gives good results when the right answer $f_{\mathrm{opt}}$ has frequency separation between successive indices in the interval $[0.9/N_r, 1.1/N_r]$, i.e. very close to $1/N_r$, in the absence of noise and for a wide range of delay spreads. For $R \geq 5$ dominant components, we could not make the algorithm work reliably with the all zero frequency initialization $f^{(0)} = 0$, even if the frequencies in $f$ are separated exactly by $1/N_r$.
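The condition-number criterion above is straightforward to evaluate. The following sketch (ours) builds $H_f$ from pure complex exponentials, which captures the dominant behavior of the columns described after (5.28), and scans the frequency separation for $R = 2$.

import numpy as np

def cond_HfHf(f, Nr):
    # Condition number of H_f^H H_f with H_f(:, l) a sampled complex
    # exponential at frequency f(l) (leakage factors of (5.13) omitted).
    Hf = np.exp(2j * np.pi * np.outer(np.arange(Nr), f)) / np.sqrt(Nr)
    return np.linalg.cond(Hf.conj().T @ Hf)

Nr = 54
for sep in [0.25, 0.35, 0.5, 1.0]:       # separation as a fraction of 1/Nr
    print(sep, round(cond_HfHf(np.array([0.0, sep / Nr]), Nr), 1))
# Separations equal to an integer multiple of 1/Nr give condition number 1;
# around 0.35/Nr the condition number crosses the threshold of 10 for R = 2.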
[Figure 5.2: Estimation accuracy results averaged over 10 realizations of the observation noise vector; the closer $\hat{f}/f$ is to 1, the higher the relative accuracy of the estimate. (a) Relative accuracy $\hat{f}/f$ of the estimated frequency $\hat{f}$ versus SNR at a constant frequency factor $\eta = 0.7$ with different oversampling factors $\rho \in \{1.1, 1.5, 2\}$, i.e. legend frequencies $f \in \{0.0179, 0.0130, 0.0097\}$ respectively. (b) Relative accuracy $\hat{f}/f$ versus SNR at a constant oversampling factor $\rho = 1.5$ with different frequency factors $\eta \in \{0.35, 0.7, 1\}$, i.e. $f \in \{0.0065, 0.0130, 0.0185\}$.]

5.4.5 Numerical Results: One Dominant Component

We have $R = 1$ and we set the delay spread to $M = 35$. The number of free variables in the model clearly equals $(M+1)\cdot R = M+1$. We set the number of observations as $N_r = \lfloor\rho\cdot(M+1)\cdot R\rfloor = \lfloor\rho\cdot(M+1)\rfloor$, where $\rho$ represents the oversampling factor. We vary $\rho$ in the range $[1, 2]$. We vary the Doppler shift $f$ as a fraction of $1/N_r$, i.e. we test the algorithm against a Doppler shift of $f = \eta/N_r$, where the frequency factor $\eta$ is in the range $[-1, 1]$. We assume that the observations are corrupted by additive circularly symmetric complex white Gaussian noise with signal-to-noise-ratio (SNR) between 5dB and 20dB.

Figure 5.2a shows the accuracy of frequency estimation $\hat{f}/f$ versus SNR for different oversampling factors $\rho$ while keeping the frequency factor $\eta$ constant (note that changing $\rho$ changes $N_r$ and hence changes the numerical value of the Doppler shift $f$ even though $\eta$ is constant). Not surprisingly, larger oversampling leads to better estimation performance at low SNR. In terms of actual numerical values, we see accurate estimation above 10dB SNR even for a high Doppler frequency of $f = \frac{\eta}{\lfloor\rho(M+1)\rfloor} = 0.0179$ at $(\eta,\rho) = (0.7, 1.1)$. Figure 5.2b shows the accuracy of frequency estimation $\hat{f}/f$ versus SNR for different frequency factors $\eta$ while keeping the oversampling factor $\rho$ constant. For $\eta < 1$, the relative accuracy $\hat{f}/f$ is fairly good even at SNRs as low as 5dB. However, for $\eta = 1$, estimation fails completely at all SNRs, suggesting that our algorithm may have terminated prematurely due to a slow convergence rate (see Section 5.4.3).
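The experimental parametrization above maps onto code directly; a sketch of the configuration grid (ours, with the noise standard deviation derived from the SNR under the assumption of unit-power observations):

import numpy as np

M, R = 35, 1
configs = []
for rho in [1.1, 1.5, 2.0]:
    Nr = int(np.floor(rho * (M + 1) * R))        # number of observations
    for eta in [0.35, 0.7, 1.0]:
        f = eta / Nr                             # Doppler shift, a fraction of 1/Nr
        for snr_db in [5, 10, 15, 20]:
            sigma = 10 ** (-snr_db / 20)         # noise std for unit-power signal
            configs.append((Nr, f, sigma))       # one cell of the experiment grid
print(len(configs), configs[0])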
5.4.6 Numerical Results: Multiple Dominant Components

We have $R = 4$ and we set the delay spread to $M = 35$. Borrowing the terminology from Section 5.4.5, we set the oversampling factor to $\rho = 1.5$ and the SNR to 20dB. We set the Doppler frequency vector as $f = \left(\frac{\eta}{N_r}, \frac{2\eta}{N_r}, \frac{3\eta}{N_r}, \frac{4\eta}{N_r}\right)$ and vary $\eta$ in $[0.8, 1.2]$, where the lower bound on $\eta$ is influenced by well-conditioning requirements and the upper bound is dictated by convergence rate requirements (see Section 5.4.4). Figure 5.3 shows a scatter plot of the estimation accuracy $\hat{f}(l)/f(l)$ of each frequency component $l \in \{1, 2, 3, 4\}$ versus the frequency factor $\eta$, while keeping the oversampling factor $\rho$ and the SNR constant. The relative accuracy of estimation $\big\{\hat{f}(l)/f(l)\big\}_{1\leq l\leq 4}$ is very good for $\eta = 1$, is fairly good at $\eta = 1.2$, but breaks down completely for $\eta = 0.8$, which might be attributed to a combination of a moderately high condition number and premature termination. Figure 5.4 shows the performance of basis pursuit (BP) relative to that of Algorithm 3 (AMALR) on the same set of frequencies, compared w.r.t. the normalized MSE for the effective channel matrix $H = H_fG^T$. It is clear that BP fails completely (even though (5.10) suggests that $H$ is sparse in the Fourier domain), and we attribute this to the non-utilization of the specific low-rank factored structure of the channel matrix and the non-mixing nature of the observation operator, as described in Section 5.4.2.

[Figure 5.3: Scatter plot of the relative accuracy $\{\hat{f}(l)/f(l)\}_{1\leq l\leq 4}$ of all components in the estimated frequency vector $\hat{f}$ versus frequency factor $\eta \in \{0.8, 1, 1.2\}$ at a constant oversampling factor of $\rho = 1.5$ and 20dB SNR; the three settings correspond to $f = (0.0037, 0.0074, 0.0111, 0.0148)$, $f = (0.0046, 0.0093, 0.0139, 0.0185)$ and $f = (0.0056, 0.0111, 0.0167, 0.0222)$. Results for different $\eta$ are coded by markers of different color/shape. For a given $\eta$, the stronger the clustering of the markers around 1 along the y-axis, the higher the overall relative accuracy of $\hat{f}$.]

[Figure 5.4: Plot of normalized MSE performance of BP and AMALR versus frequency factor $\eta \in \{0.8, 1, 1.2\}$ at a constant oversampling factor of $\rho = 1.5$ and 20dB SNR. BP fails completely, due to non-utilization of structural properties. AMALR gives less than 2% normalized MSE.]

5.5 Conclusions

In this chapter, we have investigated the estimation of a narrowband time-varying channel under finite block length and finite transmission bandwidth. A novel low-rank matrix recovery based formulation with structural constraints was proposed to estimate the channel in the delay-Doppler domain, utilizing separability in the Doppler and delay directions. An alternating minimization algorithm was proposed with a weighted MMSE cost function for the estimation step using noisy training signal measurements. Identifiability results accounting for the channel leakage (due to finite block length and finite transmission bandwidth) in both the delay and Doppler directions were developed. Extensive justification for the selection of weights and simulation parameters was given, and performance was verified by simulations. Investigation of other weighting strategies for the weighted MMSE cost function is left for future research.

5.6 Proofs

5.6.1 Supporting Lemmas

Lemma 5.3. Let the elements of $s \in \mathbb{C}^{N_r+M-1}$ be drawn i.i.d. uniformly from the 4-QAM constellation $\left\{\pm\tfrac{1}{\sqrt{2}} \pm \tfrac{j}{\sqrt{2}}\right\}$ and let $s_n = s(n:-1:n-M+1)$ for $n = 0, 1, \ldots, N_r-1$. We have $\mathbb{E}\big[s_ns_n^H\big] = I$ and
$$\mathbb{E}\big[s_ns_n^Hs_{n'}s_{n'}^H\big] = \begin{cases}MI, & n = n',\\ I, & n \neq n'.\end{cases} \tag{5.29}$$
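Before the proof, the statement is easy to confirm by simulation; the sketch below (ours) uses the overlapping windows $s_0$ and $s_3$ with $M = 4$, so that the independence argument in the proof is actually exercised.

import numpy as np

rng = np.random.default_rng(0)
M, Nr, trials = 4, 8, 100000
acc_same = np.zeros((M, M), dtype=complex)
acc_diff = np.zeros((M, M), dtype=complex)
for _ in range(trials):
    s = (rng.choice([-1, 1], Nr + M - 1)
         + 1j * rng.choice([-1, 1], Nr + M - 1)) / np.sqrt(2)
    s0 = s[0 + M - 1 - np.arange(M)]     # s_0 = s(0 : -1 : -(M-1))
    s3 = s[3 + M - 1 - np.arange(M)]     # s_3: overlaps s_0 since 3 < M
    acc_same += (s0.conj() @ s0) * np.outer(s0, s0.conj())   # s_0 s_0^H s_0 s_0^H
    acc_diff += (s0.conj() @ s3) * np.outer(s0, s3.conj())   # s_0 s_0^H s_3 s_3^H
print(np.round(acc_same / trials, 2))    # ~ M * I, cf. (5.29)
print(np.round(acc_diff / trials, 2))    # ~ I, cf. (5.29)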
Proof. The proof proceeds through an individual computation of each element of the matrix $\mathbb{E}\big[s_ns_n^Hs_{n'}s_{n'}^H\big]$ for all allowable values of the pair $(n,n') \in \{0, 1, \ldots, N_r-1\}^2$. It is clear that $s_n(k) = s(n-k)$, $|s_n(k)| = 1$ and $\mathbb{E}[s_n(k)] = 0$ for every $0 \leq k \leq M-1$. For brevity, let us define the random matrices $A_{nn'} = s_ns_n^Hs_{n'}s_{n'}^H = (s_n^Hs_{n'})\,s_ns_{n'}^H$ for all permitted pairs $(n,n')$.

For $n = n'$, we have
$$A_{nn'} = A_{nn} = (s_n^Hs_n)\,s_ns_n^H = \|s_n\|_2^2\,s_ns_n^H = \left(\sum_{k=0}^{M-1}|s_n(k)|^2\right)s_ns_n^H = M\,s_ns_n^H. \tag{5.30}$$
By the independence of $s_n(k)$ and $s_n(k')$ for $k \neq k'$, we have $\mathbb{E}[A_{nn}(k,k')] = M\cdot\mathbb{E}[s_n(k)s_n^*(k')] = M\cdot\mathbb{E}[s_n(k)]\,\mathbb{E}[s_n(k')]^* = 0$. Hence, $\mathbb{E}[A_{nn}] = M\,\mathbb{E}\big[s_ns_n^H\big] = MI$, on observing that $A_{nn}(k,k) = M\cdot s_n(k)s_n^*(k) = M\cdot|s_n(k)|^2 = M$.

For $n \neq n'$, we have $A_{nn'}(k,k') = (s_n^Hs_{n'})\,s_n(k)s_{n'}^*(k')$, implying that
$$\mathbb{E}[A_{nn'}(k,k')] = \mathbb{E}\left[\left(\sum_{l=0}^{M-1}s_n^*(l)s_{n'}(l)\right)s_n(k)s_{n'}^*(k')\right] = \sum_{l=0}^{M-1}\mathbb{E}\big[s^*(n-l)s(n'-l)s(n-k)s^*(n'-k')\big]. \tag{5.31}$$
For brevity, we set $l_1 = n-l$, $l_2 = n'-l$, $l_3 = n-k$ and $l_4 = n'-k'$. Let us evaluate the fourth order moment term $\mathbb{E}[s^*(n-l)s(n'-l)s(n-k)s^*(n'-k')] = \mathbb{E}[s^*(l_1)s(l_2)s(l_3)s^*(l_4)]$ on the r.h.s. of (5.31). The answer clearly depends on the redundancy in the set of the four indices $\{l_1, l_2, l_3, l_4\}$. We have the following cases.

1. An index in $\{l_1, l_2, l_3, l_4\}$ occurs an odd number of times: In this case, there must exist an index that occurs exactly once or exactly three times. If an index occurs exactly three times, then the fourth index in the set would occur exactly once. Thus, there must exist an index $l_0 \in \{l_1, l_2, l_3, l_4\}$ occurring exactly once, and therefore $s(l_0)$ is independent of the set $\{s(l_1), s(l_2), s(l_3), s(l_4)\}\setminus\{s(l_0)\}$. Utilizing independence, it is easy to see that we have $\mathbb{E}[s^*(l_1)s(l_2)s(l_3)s^*(l_4)] = 0$. For example, if $l_0 = l_1$ then
$$\mathbb{E}[s^*(l_1)s(l_2)s(l_3)s^*(l_4)] = \mathbb{E}[s(l_1)]^*\,\mathbb{E}[s(l_2)s(l_3)s^*(l_4)] = 0\cdot\mathbb{E}[s(l_2)s(l_3)s^*(l_4)] = 0. \tag{5.32}$$

2. $\{l_1, l_2, l_3, l_4\}$ has two distinct indices, each occurring twice: If $l_1 = l_2$, then $n - l = n' - l \implies n = n'$. This is precluded by the premise $n \neq n'$ under consideration. Thus, we cannot have the pairing $l_1 = l_2 \neq l_3 = l_4$. This leaves the following two possible pairings to consider.
(a) $l_1 = l_4 \neq l_2 = l_3$: By independence between $s(l_1)$ and $s(l_2)$, we have
$$\mathbb{E}[s^*(l_1)s(l_2)s(l_3)s^*(l_4)] = \mathbb{E}[s^*(l_1)s(l_2)s(l_2)s^*(l_1)] = \mathbb{E}\big[s^2(l_1)\big]^*\,\mathbb{E}\big[s^2(l_2)\big]. \tag{5.33}$$
Using the equivalent representation $\{\exp(jq\pi/4)\}_{q\in\{1,3,5,7\}}$ for the 4-QAM constellation, we have
$$\mathbb{E}\big[s^2(l_2)\big] = \sum_{q\in\{1,3,5,7\}}\Pr\left(s(l_2) = e^{jq\pi/4}\right)e^{jq\pi/2} = \frac{1}{4}\,(j - j + j - j) = 0, \tag{5.34}$$
and hence the r.h.s. of (5.33) evaluates to zero.
(b) $l_1 = l_3 \neq l_2 = l_4$: We have
$$\mathbb{E}[s^*(l_1)s(l_2)s(l_3)s^*(l_4)] = \mathbb{E}[s^*(l_1)s(l_2)s(l_1)s^*(l_2)] = \mathbb{E}\big[|s(l_1)|^2|s(l_2)|^2\big] = 1\cdot 1 = 1. \tag{5.35}$$
Further, $l_1 = l_3$ implies $n - l = n - k \implies k = l$, and $l_2 = l_4$ implies $n' - l = n' - k' \implies k' = l$. Therefore, we have $l = k = k'$ for this case.

3. One index occurs four times in $\{l_1, l_2, l_3, l_4\}$: This implies $l_1 = l_2$, in turn implying $n = n'$, and hence is precluded by the premise $n \neq n'$.

Overall, we have shown that under $n \neq n'$,
$$\mathbb{E}[s^*(l_1)s(l_2)s(l_3)s^*(l_4)] = \begin{cases}1, & l_1 = l_3 \neq l_2 = l_4,\\ 0, & \text{otherwise},\end{cases} \tag{5.36}$$
or equivalently,
$$\mathbb{E}[s^*(n-l)s(n'-l)s(n-k)s^*(n'-k')] = \begin{cases}1, & l = k = k',\\ 0, & \text{otherwise},\end{cases} \tag{5.37}$$
under $n \neq n'$. Using (5.37) to evaluate the r.h.s. of (5.31), we get the following.

1. If $k \neq k'$, then each of the $M$ terms being summed on the r.h.s. of (5.31) evaluates to zero. Hence, $\mathbb{E}[A_{nn'}(k,k')] = 0$ if $k \neq k'$ under $n \neq n'$.
2. If $k = k'$, then of the $M$ terms being summed on the r.h.s. of (5.31), only one term evaluates to one and corresponds to the summation index $l = k = k'$, whereas the other $M-1$ terms evaluate to zero. Hence, $\mathbb{E}[A_{nn'}(k,k)] = 1$ under $n \neq n'$.

Therefore, $\mathbb{E}[A_{nn'}] = I$ for $n \neq n'$ and the proof is complete.

Lemma 5.4. Let $X \in \mathbb{C}^{m\times n}$ and $C \in \mathbb{C}^{n\times n}$ be arbitrary matrices and let $\otimes$ denote the matrix Kronecker product operation. We have $\operatorname{Tr}\big(XCX^H\big) = \operatorname{vec}(X)^H\big(C^T\otimes I\big)\operatorname{vec}(X)$.

Proof. We define the shorthand $x = \operatorname{vec}(X) \in \mathbb{C}^{m\cdot n}$ and let $E_{i,j} \in \mathbb{R}^{n\times n}$ denote the matrix with the $(i,j)$-th element equal to one and all other elements equal to zero. Then standard linear algebra gives
$$\operatorname{Tr}\big(XCX^H\big) = \operatorname{Tr}\left(X\Bigg(\sum_{\substack{1\leq i\leq n\\ 1\leq j\leq n}}C(i,j)E_{i,j}\Bigg)X^H\right) = \sum_{\substack{1\leq i\leq n\\ 1\leq j\leq n}}C(i,j)\operatorname{Tr}\big(E_{i,j}X^HX\big) = \sum_{\substack{1\leq i\leq n\\ 1\leq j\leq n}}C(i,j)\,X(:,j)^HX(:,i) = \sum_{\substack{1\leq i\leq n\\ 1\leq j\leq n}}C(i,j)\,x\big((j-1)m+1:j\cdot m\big)^H\,x\big((i-1)m+1:i\cdot m\big) = x^H\big(C^T\otimes I\big)x, \tag{5.38}$$
finishing the proof.
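A quick numerical check of Lemma 5.4 (ours); note that the transpose on $C$ matters for non-symmetric $C$, which is why it is carried through in (5.24) and (5.41) as well.

import numpy as np

rng = np.random.default_rng(1)
m, n = 3, 4
X = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))
C = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
x = X.flatten(order="F")                       # vec(X): column-major stacking
lhs = np.trace(X @ C @ X.conj().T)
rhs = x.conj() @ np.kron(C.T, np.eye(m)) @ x
assert np.isclose(lhs, rhs)                    # Lemma 5.4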
5.6.2 Proof of Theorem 5.2

Let us define the shorthand $D = \operatorname{diag}(w)H_fH_f^H\operatorname{diag}(w)$, so that $B = D^T\otimes I$. From (5.46) and the definition of the matrices $A_n$, we get
$$\frac{\partial}{\partial G}J_w(G,f) = \sum_{n=0}^{N_r-1}w(n)\,s_n\operatorname{Tr}\big(A_n^*\Delta^H\big)H_f(n,:) = \sum_{n=0}^{N_r-1}w(n)\,s_ns_n^H\Delta^*(:,n)\,H_f(n,:). \tag{5.39}$$
From Lemma 5.3, $\mathbb{E}\big[s_ns_n^H\big] = I$ for every $0 \leq n \leq N_r-1$ and therefore,
$$\mathbb{E}\left[\frac{\partial}{\partial G}J_w(G,f)\right] = \sum_{n=0}^{N_r-1}w(n)\,\mathbb{E}\big[s_ns_n^H\big]\Delta^*(:,n)\,H_f(n,:) = \sum_{n=0}^{N_r-1}w(n)\,\Delta^*(:,n)\,H_f(n,:) = \Delta^*\operatorname{diag}(w)H_f. \tag{5.40}$$
Using Lemma 5.4, (5.40) gives
$$\left\|\mathbb{E}\left[\frac{\partial}{\partial G}J_w(G,f)\right]\right\|_F^2 = \operatorname{Tr}\left(\big(\Delta^*\operatorname{diag}(w)H_f\big)\big(\Delta^*\operatorname{diag}(w)H_f\big)^H\right) = \operatorname{Tr}\big(\Delta^*D\Delta^T\big) = \operatorname{vec}(\Delta^*)^H\big(D^T\otimes I\big)\operatorname{vec}(\Delta^*) \tag{5.41}$$
and proves (5.24). Next, we will invoke some standard properties of the Kronecker matrix product [143]. We have $(X_1\otimes X_2)^H = X_1^H\otimes X_2^H$ for arbitrary matrices $X_1$ and $X_2$, so that $B$ is Hermitian symmetric by the Hermitian symmetry of both $D^T$ (the transpose of a Hermitian matrix is its entrywise conjugate, which is again Hermitian) and $I$. Further, if $X_1 \in \mathbb{C}^{p\times p}$ and $X_2 \in \mathbb{C}^{q\times q}$ respectively have eigenvalues $\lambda_i$, $1\leq i\leq p$, and $\mu_j$, $1\leq j\leq q$ (listed with multiplicities), then $X_1\otimes X_2$ has the $p\cdot q$ eigenvalues (with multiplicities) $\lambda_i\mu_j$, $(i,j) \in \{1,\ldots,p\}\times\{1,\ldots,q\}$. Thus, the minimum and maximum eigenvalues of $B$ are (by definition) the same as those of $D^T$, which coincide with those of $D$ since transposition preserves the spectrum. Since
$$x^HDx = \left\|H_f^H\operatorname{diag}(w)\,x\right\|_2^2 \leq \left\|H_f^H\operatorname{diag}(w)\right\|^2\|x\|_2^2, \tag{5.42}$$
with equality achieved when $x$ is the leading right singular vector of $H_f^H\operatorname{diag}(w)$, we have $0 \preceq D \preceq \|H_f^H\operatorname{diag}(w)\|^2 I$. Therefore, $B$ is positive semidefinite with maximum eigenvalue $\|H_f^H\operatorname{diag}(w)\|^2 = \|\operatorname{diag}(w)H_f\|^2$, and (5.25) is proved.

From (5.39) and standard linear algebra, we have
$$\left\|\frac{\partial}{\partial G}J_w(G,f)\right\|_F^2 = \operatorname{Tr}\left(\left[\sum_{n=0}^{N_r-1}w(n)\,s_ns_n^H\Delta^*(:,n)H_f(n,:)\right]^H\left[\sum_{n'=0}^{N_r-1}w(n')\,s_{n'}s_{n'}^H\Delta^*(:,n')H_f(n',:)\right]\right) = \sum_{n=0}^{N_r-1}\sum_{n'=0}^{N_r-1}w(n)w(n')\operatorname{Tr}\left(H_f(n,:)^H\Delta(:,n)^T\,s_ns_n^Hs_{n'}s_{n'}^H\,\Delta^*(:,n')H_f(n',:)\right). \tag{5.43}$$
On using Lemma 5.3 to compute expectations, we get
$$\mathbb{E}\left[\left\|\frac{\partial}{\partial G}J_w(G,f)\right\|_F^2\right] = \sum_{n=0}^{N_r-1}\sum_{n'=0}^{N_r-1}w(n)w(n')\operatorname{Tr}\left(H_f(n,:)^H\Delta(:,n)^T\,\mathbb{E}\big[s_ns_n^Hs_{n'}s_{n'}^H\big]\,\Delta^*(:,n')H_f(n',:)\right)$$
$$= \sum_{n=0}^{N_r-1}\sum_{n'=0}^{N_r-1}w(n)w(n')\operatorname{Tr}\left(H_f(n,:)^H\Delta(:,n)^T\Delta^*(:,n')H_f(n',:)\right) + (M-1)\sum_{n=0}^{N_r-1}w^2(n)\operatorname{Tr}\left(H_f(n,:)^H\Delta(:,n)^T\Delta^*(:,n)H_f(n,:)\right)$$
$$= \operatorname{Tr}\left(\left[\sum_{n=0}^{N_r-1}w(n)\Delta^*(:,n)H_f(n,:)\right]^H\left[\sum_{n'=0}^{N_r-1}w(n')\Delta^*(:,n')H_f(n',:)\right]\right) + (M-1)\sum_{n=0}^{N_r-1}w^2(n)\,\|\Delta^*(:,n)\|_2^2\operatorname{Tr}\left(H_f(n,:)^HH_f(n,:)\right)$$
$$= \left\|\sum_{n=0}^{N_r-1}w(n)\Delta^*(:,n)H_f(n,:)\right\|_F^2 + (M-1)\sum_{n=0}^{N_r-1}w^2(n)\,\|\Delta(:,n)\|_2^2\,\|H_f(n,:)\|_2^2 = \left\|\mathbb{E}\left[\frac{\partial}{\partial G}J_w(G,f)\right]\right\|_F^2 + (M-1)R\sum_{n=0}^{N_r-1}w^2(n)\,\|\Delta(:,n)\|_2^2, \tag{5.44}$$
where the expression for the expected gradient $\mathbb{E}\big[\frac{\partial}{\partial G}J_w(G,f)\big]$ in the last equality follows from (5.40), and
$$\|H_f(n,:)\|_2^2 = \sum_{l=1}^{R}|H_f(n,l)|^2 = \sum_{l=1}^{R}1 = R \tag{5.45}$$
by definition. This proves (5.23).

5.6.3 Proof of Theorem 5.1

We will show the contrapositive, i.e. that if $\frac{\partial}{\partial G}J_w(G,f) = 0$ for a generic weight vector $w$, then $\Delta = 0$. Standard linear algebra, Wirtinger calculus [142], and the definition of the matrices $A_n$ imply
$$\frac{\partial}{\partial G}J_w(G,f) = \frac{\partial}{\partial G}\sum_{n=0}^{N_r-1}w(n)\left|y(n) - \operatorname{Tr}\big(A_nH_fG^T\big)\right|^2 = \sum_{n=0}^{N_r-1}w(n)\left(y(n)-\operatorname{Tr}\big(A_nH_fG^T\big)\right)^*\frac{\partial}{\partial G}\left(y(n)-\operatorname{Tr}\big(A_nH_fG^T\big)\right) = -\sum_{n=0}^{N_r-1}w(n)\operatorname{Tr}\left(A_n^*\big(G_{\mathrm{opt}}H_{f_{\mathrm{opt}}}^T - GH_f^T\big)^H\right)s_nH_f(n,:) = \sum_{n=0}^{N_r-1}w(n)\operatorname{Tr}\big(A_n^*\Delta^H\big)\,s_nH_f(n,:). \tag{5.46}$$
Clearly, if $\frac{\partial}{\partial G}J_w(G,f) = 0$ for generic vectors $w \in \mathbb{R}_+^{N_r}$, then we must have $\operatorname{Tr}\big(A_n^*\Delta^H\big)s_nH_f(n,:) = 0$ for every $0 \leq n \leq N_r-1$. Since $s_n \neq 0$ (by domain restriction) and $H_f(n,:) \neq 0^T$ (by definition), we must have $\operatorname{Tr}\big(A_n^*\Delta^H\big) = 0$ for every $0 \leq n \leq N_r-1$, or equivalently $\mathcal{A}\big(\Delta^T\big) = 0 \iff \mathcal{A}\big(H_{f_{\mathrm{opt}}}G_{\mathrm{opt}}^T\big) = \mathcal{A}\big(H_fG^T\big)$. Using the unique identifiability of the global optimum, this implies $H_{f_{\mathrm{opt}}}G_{\mathrm{opt}}^T = H_fG^T$, or equivalently $\Delta = 0$.
Chapter 6
Conclusions and Future Directions

In this dissertation, we have formulated and studied a class of estimation problems that we termed bilinear inverse problems (BIPs). Instances from this class of problems occur throughout signal processing applications in particular, and science and engineering in general. Often, these constitute hard problems in a field, since linear system theory is not directly applicable to these inverse problems, leading to concerns about well-posedness and efficient algorithmic recovery. The dissertation develops a somewhat generalizable, optimization-centric framework to analyze the well-posedness of such problems, and develops efficient recovery algorithms whenever the problems turn out to be well-posed. Special emphasis was given to problems involving blind deconvolution, active target localization and delay-Doppler estimation (all instances of bilinear inverse problems) to demonstrate how our thought process and framework could be applied.

6.1 Summary

Chapter 2 abstractly formulated finite dimensional inverse problems with bilinear observations and conic constraints on the unknowns. Conic constraints were chosen since they represent a large class of regularizing techniques (like sparsity, low-rank structure, etc.). Some general dimensionality based techniques were developed for studying well-posedness in terms of the existence of a unique answer in the absence of noise. The flexibility and generalizability of the proposed approach was a result of the lifting technique in optimization [1]. Some identifiability results for one specific and important problem (blind deconvolution under separable conic constraints) were worked out to illustrate the power of the proposed approach.

Chapter 3 was devoted to an in-depth study of the single channel blind deconvolution problem under specific conic constraints like canonical sparsity and repetition coding. Blind deconvolution was chosen owing to its ubiquity in signal processing applications and its notorious ill-posedness without application specific assumptions. This chapter fleshed out the full power of the lifting technique for the identifiability analysis machinery developed in Chapter 2 by fully characterizing the ambiguity space for single channel blind deconvolution, illustrating the rotational ambiguity phenomenon as the cause of ill-posedness, and obtaining surprisingly strong unidentifiability results.

Chapter 4 studied the problem of localizing a target from samples of its separable and decaying field signature. This problem was shown to be a disguised instance of a BIP with conic constraints. However, unlike the focus on identifiability analysis in Chapters 2 and 3, the goal here was to derive a localization algorithm with theoretical guarantees on convergence rate by leveraging the approximate bilinearity in the noisy target signature. It was further proved that both the number of samples and the computational effort necessary to achieve a given localization accuracy were much lower than those needed by a naive matrix completion based algorithm with passive sampling, and this was shown to be the result of the unimodal approximation on the target signature field (yet another example of a conic regularizing constraint). Besides allowing for the sufficiency of a lower number of samples, unimodality was also shown to provide robustness against simultaneously sparse and low-rank observation noise. It is well known that this type of noise significantly
The lifting technique was used to illustrate that this is a low-rank matrix recovery problem with complex exponential structural constraints. It was further shown that if the global optimum of the estimation problem is uniquely identifiable then no other local optima exist. However, a naive low-rank matrix recovery algorithm like nuclear norm minimization was shown to fail due to unidentifiability stemming from ignorance of structural constraints. Since low-rank matrix structure could be equivalently transformed into a bilinear model, a simple and efficient estimation algorithm was designed based on the alternating minimization heuristic and shown to work well where basis pursuit based approaches failed. 6.2 Outlook The main takeaway from this dissertation that needs a lot more research is the basic understanding of how to systematically relax constrained (usually conic constraints) low-rank matrix recovery problems encountered in various applications. Such relaxations, first and foremost, need to preserve identifiability of the underlying model and only then could be considered candidates for tractable recovery. As was demonstrated in this dissertation, identifiability of constrained low-rank matrix recovery problems cannot be taken for granted especially when they are engendered by certain classes of cone constrained BIPs. While identifiability analysis could help expose the fundamental limitations of different classes of BIPs, significant research efforts are needed to understand the underlying geometry of these problems giving rise to the unidentifiability conditions. It is my strong belief that optimization-centric techniques would be useful for such identifiability analysis that is flexible to different constraints even if good algorithms cannot be designed due to ill-posedness. The next logical step in refining the framework of identifiability analysis for BIPs involves its extension to families of BIPs other than vector-vector BIPs that potentially require weaker notions of identifiability to have any hope of being meaningfully posed [104]. Dictionary learning [29], [144], [145] and non-negative matrix factorization [20], [146] are examples of the class of matrix-matrix BIPs for which the identifiability notions need to be significantly relaxed for them to be meaningfully well-posed. These problem classes could potentially benefit from our framework of identifiability analysis for a fundamental and geometrical understanding of the limitations in presence of various regularizing conic constraints, even though tractable recovery algorithms have been found for these formulations under some instances of fairly strict assumptions. Finally, it would be very interesting (and potentially challenging) to extend this framework for identifiability analysis to general multilinear inverse problems since singular value decompositions may not always exist in such cases. A different but related line of research could try to address the question of developing recovery algorithms for cone constrained BIPs in a systematic fashion. Although it seems unlikely that the algorithmic question could be addressed at the level of generality at which the identifiability issue was addressed, it is necessary to go beyond the simple, provably correct, nuclear norm based low-rank matrix recovery relaxations to address many of the BIPs that occur in practice [84], [147], [148]. 
A related problem in the context of single channel blind deconvolution (and vector-vector BIPs in general) would be to design non-randomized precoding strategies to enable identifiability and/or tractable recovery (like what was achieved in [25] with randomized precoding), which would shed further light on the geometry of the rotational ambiguity phenomenon. The active target localization problem admits a natural extension to multiple targets that would lead to the violation of the approximate unimodality condition unless there is only one dominant target. Modeling this problem appropriately (perhaps using a variation of non-negative matrix factorization with added unimodality constraints) and designing a localization algorithm for the same would be an interesting and non-trivial theoretical and practical exercise, potentially leading to new tools and techniques for convergence analysis for a non-negative matrix factorization problem with added conic constraints (this is a fairly sophisticated instance of a matrix-matrix BIP). There is also the possibility of developing a localization algorithm that makes more effective use of the collected samples by proceeding with a joint optimization rather than by a two-step approach analogous to that in Chapter 4. A pleasant side-effect of such an analysis would be a theoretical study of the performance of matrix completion based approaches for tasks other than estimation.

The complex exponential constraints for the channel estimation problem in the delay-Doppler domain, as illustrated in Chapter 5, offer an excellent source of problems requiring a systematic and generalizable approach to relaxing constrained low-rank matrix recovery problems while being careful not to lose model identifiability in the process. A study of the convergence analysis of this class of constrained BIPs, including techniques for accelerating convergence, would be valuable in further refining our understanding of the geometry of BIPs with complex exponential constraints.

Bibliography

[1] E. Balas, "Projection, Lifting and Extended Formulation in Integer and Combinatorial Optimization," Ann. Oper. Res., vol. 140, pp. 125–161, 2005, issn: 0254-5330.
[2] M. Elad, M. A. T. Figueiredo, and Y. Ma, "On the Role of Sparse and Redundant Representations in Image Processing," Proc. IEEE, vol. 98, no. 6, pp. 972–982, Jun. 2010.
[3] Y.-C. Ho, "Review of the Witsenhausen problem," in 47th IEEE Conference on Decision and Control (CDC), Dec. 2008, pp. 1611–1613.
[4] D. L. Donoho, "Compressed Sensing," IEEE Trans. Inf. Theory, vol. 52, no. 4, pp. 1289–1306, 2006.
[5] E. J. Candès and B. Recht, "Exact Matrix Completion via Convex Optimization," Found. Comput. Math., vol. 9, no. 6, pp. 717–772, 2009, issn: 1615-3375.
[6] J. Mattingley and S. Boyd, "Real-Time Convex Optimization in Signal Processing," IEEE Signal Process. Mag., vol. 27, no. 3, pp. 50–61, May 2010, issn: 1053-5888.
[7] O. Chapelle, B. Schölkopf, and A. Zien, Eds., Semi-Supervised Learning (Adaptive Computation and Machine Learning series), 1st ed. The MIT Press, Jan. 2010, isbn: 9780262514125. [Online]. Available: http://amazon.com/o/ASIN/0262514125/.
[8] R. Twyman, Principles of Proteomics (Advanced Texts), 1st ed. Taylor & Francis, Sep. 2004. [Online]. Available: http://amazon.com/o/ASIN/1859962734/.
[9] M. V. Klibanov, P. E. Sacks, and A. V. Tikhonravov, "The phase retrieval problem," Inverse Problems, vol. 11, no. 1, pp. 1–28, 1995.
[10] E. J. Candès, Y. C. Eldar, T. Strohmer, and V. Voroninski, "Phase Retrieval via Matrix Completion," SIAM J. Imaging Sci., vol. 6, no. 1, pp. 199–225, 2013, issn: 1936-4954.
[11] S. Choudhary and U. Mitra, "Sparse Recovery from Convolved Output in Underwater Acoustic Relay Networks," in 2012 Asia-Pacific Signal Information Processing Association Annual Summit and Conference (APSIPA ASC), Hollywood, USA, Dec. 2012, pp. 1–8.
[12] ——, "Sparse Blind Deconvolution: What Cannot Be Done," in 2014 IEEE International Symposium on Information Theory (ISIT), Honolulu, USA, Jun. 2014, pp. 3002–3006.
[13] A. Agarwal, A. Anandkumar, and P. Netrapalli, "Exact Recovery of Sparsely Used Overcomplete Dictionaries," ArXiv e-prints, vol. abs/1309.1952, Sep. 2013. eprint: 1309.1952. [Online]. Available: http://arxiv.org/abs/1309.1952.
[14] M. Lustig, D. Donoho, and J. M. Pauly, "Sparse MRI: The Application of Compressed Sensing for Rapid MR Imaging," Magn. Reson. Med., vol. 58, no. 6, pp. 1182–1195, 2007, issn: 1522-2594.
[15] Z. Xing, M. Zhou, A. Castrodad, G. Sapiro, and L. Carin, "Dictionary learning for noisy and incomplete hyperspectral images," SIAM J. Imaging Sci., vol. 5, no. 1, pp. 33–56, 2012.
[16] Y. Zhou, D. Wilkinson, R. Schreiber, and R. Pan, "Large-Scale Parallel Collaborative Filtering for the Netflix Prize," in Algorithmic Aspects in Information and Management, ser. Lecture Notes in Computer Science, R. Fleischer and J. Xu, Eds., vol. 5034, Springer Berlin Heidelberg, 2008, pp. 337–348.
[17] N. Michelusi, U. Mitra, and M. Zorzi, "Hybrid Sparse/Diffuse UWB Channel Estimation," in 2011 IEEE 12th International Workshop on Signal Processing Advances in Wireless Communications (SPAWC), Jun. 2011, pp. 201–205.
[18] J. Hopgood and P. J. W. Rayner, "Blind Single Channel Deconvolution Using Nonstationary Signal Processing," IEEE Trans. Speech Audio Process., vol. 11, no. 5, pp. 476–488, 2003.
[19] P. D. O'Grady, B. A. Pearlmutter, and S. T. Rickard, "Survey of Sparse and Non-Sparse Methods in Source Separation," International Journal of Imaging Systems and Technology, vol. 15, no. 1, pp. 18–33, 2005.
[20] D. L. Donoho and V. Stodden, "When Does Non-Negative Matrix Factorization Give a Correct Decomposition into Parts?" in Advances in Neural Information Processing Systems 16, S. Thrun, L. Saul, and B. Schölkopf, Eds., MIT Press, 2004, pp. 1141–1148. [Online]. Available: http://papers.nips.cc/paper/2463-when-does-non-negative-matrix-factorization-give-a-correct-decomposition-into-parts.pdf.
[21] C. R. Johnson Jr., P. Schniter, T. J. Endres, J. D. Behm, D. R. Brown, and R. A. Casas, "Blind Equalization Using the Constant Modulus Criterion: A Review," Proc. IEEE, vol. 86, no. 10, pp. 1927–1950, Oct. 1998.
[22] V. Chandrasekaran, B. Recht, P. A. Parrilo, and A. S. Willsky, "The Convex Geometry of Linear Inverse Problems," Found. Comput. Math., vol. 12, no. 6, pp. 805–849, 2012.
[23] M. S. Asif, W. Mantzel, and J. K. Romberg, "Random Channel Coding and Blind Deconvolution," in 47th Annual Allerton Conference on Communication, Control, and Computing (Allerton), Sep. 2009, pp. 1021–1025.
[24] C. Hegde and R. G. Baraniuk, "Sampling and Recovery of Pulse Streams," IEEE Trans. Signal Process., vol. 59, no. 4, pp. 1505–1517, 2011.
[25] A. Ahmed, B. Recht, and J. Romberg, "Blind Deconvolution Using Convex Programming," IEEE Trans. Inf. Theory, vol. 60, no. 3, pp. 1711–1732, 2014.
[26] P. Walk and P. Jung, "Compressed Sensing on the Image of Bilinear Maps," in 2012 IEEE International Symposium on Information Theory Proceedings (ISIT), Jul. 2012, pp. 1291–1295.
[27] K. Abed-Meraim, W. Qiu, and Y. Hua, "Blind System Identification," Proc. IEEE, vol. 85, no. 8, pp. 1310–1322, Aug. 1997.
[28] A. Kammoun, A. Aissa El Bey, K. Abed-Meraim, and S. Affes, "Robustness of Blind Subspace Based Techniques using $\ell_p$ Quasi-norms," in 2010 IEEE Eleventh International Workshop on Signal Processing Advances in Wireless Communications (SPAWC), 2010, pp. 1–5.
[29] R. Gribonval and K. Schnass, "Dictionary Identification—Sparse Matrix-Factorization via $\ell_1$-Minimization," IEEE Trans. Inf. Theory, vol. 56, no. 7, pp. 3523–3539, 2010, issn: 0018-9448.
[30] E. J. Candès and T. C. Tao, "The Power of Convex Relaxation: Near-Optimal Matrix Completion," IEEE Trans. Inf. Theory, vol. 56, no. 5, pp. 2053–2080, 2010.
[31] F. Kiraly and R. Tomioka, "A Combinatorial Algebraic Approach for the Identifiability of Low-Rank Matrix Completion," ArXiv e-prints, vol. abs/1206.6470, Jun. 2012. eprint: 1206.6470. [Online]. Available: http://arxiv.org/abs/1206.6470.
[32] E. Bai and M. Fu, "Blind System Identification and Channel Equalization of IIR Systems without Statistical Information," IEEE Trans. Signal Process., vol. 47, no. 7, pp. 1910–1921, 1999.
[33] K. Diamantaras, "Blind Channel Identification Based on the Geometry of the Received Signal Constellation," IEEE Trans. Signal Process., vol. 50, no. 5, pp. 1133–1143, 2002.
[34] M. Vanderveen and A. Paulraj, "Improved Blind Channel Identification Using a Parametric Approach," IEEE Commun. Lett., vol. 2, no. 8, pp. 226–228, 1998.
[35] H. Liu, G. Xu, L. Tong, and T. Kailath, "Recent developments in blind channel equalization: From cyclostationarity to subspaces," Signal Processing, vol. 50, no. 1, pp. 83–99, 1996, Special Issue on Subspace Methods, Part I: Array Signal Processing and Subspace Computations, issn: 0165-1684.
[36] D. Kundur and D. Hatzinakos, "Blind Image Deconvolution," IEEE Signal Process. Mag., vol. 13, no. 3, pp. 43–64, May 1996.
[37] A. Levin, Y. Weiss, F. Durand, and W. T. Freeman, "Understanding Blind Deconvolution Algorithms," IEEE Trans. Pattern Anal. Mach. Intell., vol. 33, no. 12, pp. 2354–2367, Dec. 2011.
[38] O. Grellier, P. Comon, B. Mourrain, and P. Trébuchet, "Analytical Blind Channel Identification," IEEE Trans. Signal Process., vol. 50, no. 9, pp. 2196–2207, 2002, issn: 1053-587X.
[39] G. Giannakis and C. Tepedelenlioglu, "Basis Expansion Models and Diversity Techniques for Blind Identification and Equalization of Time-Varying Channels," Proc. IEEE, vol. 86, no. 10, pp. 1969–1986, Oct. 1998, issn: 0018-9219.
[40] C. Shin, R. Heath, and E. Powers, "Blind Channel Estimation for MIMO-OFDM Systems," IEEE Trans. Veh. Technol., vol. 56, no. 2, pp. 670–685, 2007.
[41] G. Xu, H. Liu, L. Tong, and T. Kailath, "A Least-Squares Approach to Blind Channel Identification," IEEE Trans. Signal Process., vol. 43, no. 12, pp. 2982–2993, 1995.
[42] E. de Carvalho and D. T. M. Slock, "Blind and Semi-Blind FIR Multichannel Estimation: (Global) Identifiability Conditions," IEEE Trans. Signal Process., vol. 52, no. 4, pp. 1053–1064, 2004, issn: 1053-587X.
[43] P. Vaidyanathan and B. Vrcelj, "Transmultiplexers as precoders in modern digital communication: A tutorial review," in Proceedings of the 2004 International Symposium on Circuits and Systems (ISCAS), vol. 5, 2004, pp. 405–412.
[44] J. H. Manton and W. D. Neumann, "Totally blind channel identification by exploiting guard intervals," Systems & Control Letters, vol. 48, no. 2, pp. 113–119, 2003.
Hero III, “Blind Reconstruction of Sparse Images with Unknown Point Spread Function,” Computational Imaging VI, vol. 6814, no. 1, C. A. Bouman, E. L. Miller, and I. Pollak, Eds., 68140K, 2008. [46] D. Barchiesi and M. D. Plumbley, “Dictionary Learning of Convolved Signals,” in 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), May 2011, pp. 5812– 5815. [47] M. Vajapeyam, S. Vedantam, U. Mitra, J. C. Preisig, and M. Stojanovic, “Distributed Space–Time Cooperative Schemes for Underwater Acoustic Communications,” IEEE J. Ocean. Eng., vol. 33, no. 4, pp. 489–501, Oct. 2008. [48] N. Richard and U. Mitra, “Sparse Channel Estimation for Cooperative Underwater Communications: A Structured Multichannel Approach,” in 2008 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Mar. 2008, pp. 5300–5303. [49] B. Bhanu, “Automatic Target Recognition: State of the Art Survey,” IEEE Trans. Aerosp. Electron. Syst., vol. AES-22, no. 4, pp. 364–379, Jul. 1986, issn: 0018-9251. [50] T. Aridgides, D. Antoni, M. F. Fernandez, and G. J. Dobeck, “Adaptive filter for mine detection and classification in side-scan sonar imagery,” in SPIE’s 1995 Symposium on OE/Aerospace Sensing and Dual Use Photonics, International Society for Optics and Photonics, 1995, pp. 475–486. [51] J. C. Hyland and G. J. Dobeck, “Sea mine detection and classification using side-looking sonar,” in SPIE’s 1995 Symposium on OE/Aerospace Sensing and Dual Use Photonics, International Society for Optics and Photonics, 1995, pp. 442–453. [52] V. Chandola, A. Banerjee, and V. Kumar, “Anomaly Detection: A Survey,” ACM Comput. Surv., vol. 41, no. 3, 15:1–15:58, Jul. 2009, issn: 0360-0300. 135 [53] P. F. Schweizer and W. Petlevich, “Automatic Target Detection and Cuing System for an Autonomous UnderwaterVehicle(auv),”inProceedings of the 6 th International Symposium on Unmanned Untethered Submersible Technology, Jun. 1989, pp. 359–371. [54] E. Dura, Y. Zhang, X. Liao, G. J. Dobeck, and L. Carin, “Active learning for detection of mine-like objects in side-scan sonar imagery,” IEEE J. Ocean. Eng., vol. 30, no. 2, pp. 360–371, Apr. 2005. [55] S. Reed, Y. Petillot, and J. Bell, “An automatic approach to the detection and extraction of mine features in sidescan sonar,” IEEE J. Ocean. Eng., vol. 28, no. 1, pp. 90–105, Jan. 2003. [56] K. Mukherjee, S. Gupta, A. Ray, and S. Phoha, “Symbolic analysis of sonar data for underwater target detection,” IEEE J. Ocean. Eng., vol. 36, no. 2, pp. 219–230, Apr. 2011. [57] N. Kumar, Q. F. Tan, and S. S. Narayanan, “Object classification in sidescan sonar images with sparse representation techniques,” in 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, Mar. 2012, pp. 1333–1336. [58] G. Hollinger, S. Choudhary, P. Qarabaqi, C. Murphy, U. Mitra, G. Sukhatme, M. Stojanovic, H. Singh, and F. Hover, “Underwater Data Collection Using Robotic Sensor Networks,” IEEE J. Sel. Areas Commun., vol. 30, no. 5, pp. 899–911, Jun. 2012. [59] N. K. Yilmaz, “Path planning of autonomous underwater vehicles for adaptive sampling,” PhD thesis, Massachusetts Institute of Technology, 2005. [60] F. Bourgault, T. Furukawa, and H. F. Durrant-Whyte, “Optimal Search for a Lost Target in a Bayesian World,” in Field and Service Robotics, ser. Springer Tracts in Advanced Robotics, vol. 24, Springer Berlin Heidelberg, 2006, pp. 209–222. [61] L. Mihaylova, T. Lefebvre, H. Bruyninckx, K. Gadeyne, and J. 
De Schutter, “A Comparison of Decision Making Criteria and Optimization Methods for Active Robotic Sensing,” in Numerical Methods and Applications, ser. Lecture Notes in Computer Science, I. Dimov, I. Lirkov, S. Margenov, and Z. Zlatev, Eds., vol. 2542, Springer Berlin Heidelberg, 2003, pp. 316–324, isbn: 978-3-540-00608-4. [62] R. Hummel, S. Poduri, F. Hover, U. Mitra, and G. Sukhatme, “Mission design for compressive sensing with mobile robots,” in 2011 IEEE International Conference on Robotics and Automation (ICRA), May 2011, pp. 2362–2367. [63] B. J. Englot, “Sampling-based coverage path planning for complex 3D structures,” PhD thesis, Massachusetts Institute of Technology, 2012. [64] G. Hollinger, B. Englot, F. Hover, U. Mitra, and G. Sukhatme, “Uncertainty-driven view planning for underwater inspection,” in 2012 IEEE International Conference on Robotics and Automation (ICRA), May 2012, pp. 4884–4891. [65] I. Bekkerman and J. Tabrikian, “Target Detection and Localization Using MIMO Radars and Sonars,” IEEE Trans. Signal Process., vol. 54, no. 10, pp. 3873–3883, Oct. 2006. [66] D. Kalogerias, S. Sun, and A. Petropulu, “Sparse sensing in colocated MIMO radar: A matrix completion approach,” in 2013 IEEE International Symposium on Signal Processing and Information Technology (ISSPIT), Dec. 2013, pp. 496–502. [67] D. S. Kalogerias and A. P. Petropulu, “Matrix completion in colocated MIMO radar: Recoverability, bounds & theoretical guarantees,” IEEE Trans. Signal Process., vol. 62, no. 2, pp. 309–321, 2014. [68] S. Sun, A. Petropulu, and W. Bajwa, “Target estimation in colocated MIMO radar via matrix completion,” in 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), May 2013, pp. 4144–4148. [69] E. J. Candes and Y. Plan, “Matrix Completion With Noise,” Proc. IEEE, vol. 98, no. 6, pp. 925–936, Jun. 2010. [70] D. Gross, “Recovering Low-Rank Matrices From Few Coefficients in Any Basis,” IEEE Trans. Inf. Theory, vol. 57, no. 3, pp. 1548–1566, 2011. 136 [71] S. Negahban and M. J. Wainwright, “Restricted Strong Convexity and Weighted Matrix Completion: Optimal Bounds with Noise,” J. Mach. Learn. Res., vol. 13, no. 1, pp. 1665–1697, May 2012. [72] J. Haupt, R. M. Castro, and R. Nowak, “Distilled sensing: Adaptive sampling for sparse detection and estimation,” IEEE Trans. Inf. Theory, vol. 57, no. 9, pp. 6222–6235, Sep. 2011. [73] J. Haupt, R. Baraniuk, R. Castro, and R. Nowak, “Sequentially designed compressed sensing,” in Proc. IEEE Statistical Signal Processing Workshop, Aug. 2012, pp. 401–404. [74] M. L. Malloy and R. D. Nowak, “Near-optimal adaptive Compressed Sensing,” in Conference Record of the Forty Sixth Asilomar Conference on Signals, Systems and Computers (ASILOMAR), Nov. 2012, pp. 1935–1939. [75] Y. Chi, L. L. Scharf, A. Pezeshki, and A. R. Calderbank, “Sensitivity to basis mismatch in compressed sensing,” IEEE Trans. Signal Process., vol. 59, no. 5, pp. 2182–2195, 2011. [76] A. F. Molisch, Wireless Communications, 2nd ed. Wiley, Dec. 2010. [77] G. Taubock, F. Hlawatsch, D. Eiwen, and H. Rauhut, “Compressive estimation of doubly selective channels in multicarrier systems: Leakage effects and sparsity-enhancing processing,” IEEE J. Sel. Topics Signal Process., vol. 4, no. 2, pp. 255–271, 2010. [78] W. U. Bajwa, J. Haupt, A. M. Sayeed, and R. Nowak, “Compressed channel sensing: A new approach to estimating sparse multipath channels,” Proc. IEEE, vol. 98, no. 6, pp. 1058–1076, 2010. [79] S. Beygi, U. Mitra, and E. 
Ström, “Nested Sparse Approximation: Structured Estimation of V2V Channels Using Geometry-Based Stochastic Channel Model,” ArXiv e-prints, vol. abs/1412.2999, Dec. 2014. eprint: 1412.2999. [Online]. Available: http://arxiv.org/abs/1412.2999. [80] S. S. Chen, D. L. Donoho, and M. A. Saunders, “Atomic decomposition by basis pursuit,” SIAM Rev., vol. 43, no. 1, pp. 129–159, 2001, Reprinted from SIAM J. Sci. Comput. 20 (1998), no. 1, 33–61 (electronic). [81] E. J. Candès, T. Strohmer, and V. Voroninski, “PhaseLift: Exact and Stable Signal Recovery from Magnitude Measurements via Convex Programming,” Comm. Pure Appl. Math., vol. 66, no. 8, pp. 1241– 1274, 2013. [82] B. Recht, W. Xu, and B. Hassibi, “Null space conditions and thresholds for rank minimization,” Math. Program., vol. 127, no. 1, Ser. B, pp. 175–202, 2011. [83] E. J. Candès and Y. Plan, “Tight oracle inequalities for low-rank matrix recovery from a minimal number of noisy random measurements,” IEEE Trans. Inf. Theory, vol. 57, no. 4, pp. 2342–2359, 2011. [84] K. Lee, Y. Wu, and Y. Bresler, “Near Optimal Compressed Sensing of Sparse Rank-One Matrices via Sparse Power Factorization,” ArXiv e-prints, vol. abs/1312.0525, Dec. 2013. [Online]. Available: http://arxiv.org/abs/1312.0525. [85] K. Jaganathan, S. Oymak, and B. Hassibi, “Sparse Phase Retrieval: Convex Algorithms and Limita- tions,” in 2013 IEEE International Symposium on Information Theory Proceedings (ISIT), Jul. 2013, pp. 1022–1026. [86] ——, “Sparse Phase Retrieval: Uniqueness Guarantees and Recovery Algorithms,” ArXiv e-prints, vol. abs/1311.2745, Nov. 2013. [Online]. Available: http://arxiv.org/abs/1311.2745. [87] A. Beck, “Convexity Properties Associated with Nonconvex Quadratic Matrix Functions and Appli- cations to Quadratic Programming,” J. Optim. Theory Appl., vol. 142, no. 1, pp. 1–29, 2009, issn: 0022-3239. [88] W. Rudin, Real and complex analysis, Third. New York: McGraw-Hill Book Co., 1987, pp. xiv+416, isbn: 0-07-054234-1. [89] B. Recht, M. Fazel, and P. A. Parrilo, “Guaranteed Minimum-Rank Solutions of Linear Matrix Equations via Nuclear Norm Minimization,” SIAM Rev., vol. 52, no. 3, pp. 471–501, 2010. 137 [90] M. Ledoux and M. Talagrand, Probability in Banach Spaces, ser. Classics in Mathematics. Berlin: Springer-Verlag, 2011, pp. xii+480, Isoperimetry and Processes, Reprint of the 1991 Edition. [91] E. J. Candès and T. C. Tao, “Near-Optimal Signal Recovery From Random Projections: Universal Encoding Strategies?” IEEE Trans. Inf. Theory, vol. 52, no. 12, pp. 5406–5425, 2006. [92] R. Vershynin. (2009). Lectures in Geometric Functional Analysis, [Online]. Available: http://www- personal.umich.edu/~romanv/papers/GFA-book/GFA-book.pdf. [93] R. Baraniuk, M. A. Davenport, M. F. Duarte, and C. Hegde. (Apr. 2011). An Introduction to Compressive Sensing, [Online]. Available: http://cnx.org/content/col11133/1.5/. [94] K. Mohan and M. Fazel, “Reweighted nuclear norm minimization with application to system identifi- cation,” in 2010 American Control Conference (ACC), Jun. 2010, pp. 2953–2959. [95] S. Lambotharan and J. A. Chambers, “A New Blind Equalization Structure for Deep-Null Communi- cation Channels,” IEEE Trans. Circuits Syst. II, vol. 45, no. 1, pp. 108–114, Jan. 1998. [96] J. Hadamard, “Sur les problèmes aux dérivées partielles et leur signification physique,” Princeton University Bulletin, vol. 13, no. 28, pp. 49–52, 1902. [97] A. N. Tikhonov and V. Y. Arsenin, Metody resheniya nekorrektnykh zadach, Third. “Nauka”, Moscow, 1986, p. 288. [98] D. L. 
Donoho, “Sparse Components of Images and Optimal Atomic Decompositions,” Constr. Approx., vol. 17, no. 3, pp. 353–382, 2001. [99] W. Li and J. Preisig, “Estimation of Rapidly Time-Varying Sparse Channels,” IEEE J. Ocean. Eng., vol. 32, no. 4, pp. 927–939, Oct. 2007. [100] C. R. Berger, S. Zhou, J. C. Preisig, and P. Willett, “Sparse Channel Estimation for Multicarrier Underwater Acoustic Communication: From Subspace Methods to Compressed Sensing,” IEEE Trans. Signal Process., vol. 58, no. 3, pp. 1708–1721, Mar. 2010. [101] C. H. Papadimitriou, H. Tamaki, P. Raghavan, and S. Vempala, “Latent Semantic Indexing: A ProbabilisticAnalysis,”in Proceedings of the seventeenth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems, ser. PODS ’98, Seattle, Washington, USA: ACM, 1998, pp. 159–168, isbn: 0-89791-996-3. [102] O. Shalvi and E. Weinstein, “New Criteria for Blind Deconvolution of Nonminimum Phase Systems (Channels),” IEEE Trans. Inf. Theory, vol. 36, no. 2, pp. 312–321, Mar. 1990. [103] S. Choudhary and U. Mitra, “Identifiability Scaling Laws in Bilinear Inverse Problems,” ArXiv e- prints, vol. abs/1402.2637, Feb. 2014, submitted to IEEE Transactions on Information Theory. arXiv: 1402.2637. [Online]. Available: http://arxiv.org/abs/1402.2637. [104] Y. Li, K. Lee, and Y. Bresler, “A Unified Framework for Identifiability Analysis in Bilinear Inverse Problems with Applications to Subspace and Sparsity Models,” ArXiv e-prints, vol. abs/1501.06120, Jan. 2015. [Online]. Available: http://arxiv.org/abs/1501.06120. [105] A. Fannjiang, “Absolute uniqueness of phase retrieval with random illumination,” Inverse Problems, vol. 28, no. 7, pp. 075008, 20, 2012, issn: 0266-5611. [106] P. Mattila, Geometry of sets and measures in Euclidean spaces, Fractals and rectifiability, ser. Cam- bridge Studies in Advanced Mathematics. Cambridge University Press, Cambridge, 1995, vol. 44, pp. xii+343, isbn: 0-521-46576-1; 0-521-65595-1. [107] S. Choudhary and U. Mitra, “On Identifiability in Bilinear Inverse Problems,” in 2013 IEEE Interna- tional Conference on Acoustics, Speech and Signal Processing (ICASSP), Vancouver, Canada, May 2013, pp. 4325–4329. [108] G. Helmberg, Introduction to Spectral Theory in Hilbert Space (Dover Books on Mathematics). Dover Publications, Jun. 2008, isbn: 9780486466224. 138 [109] A. Edelman and N. R. Rao, “Random matrix theory,” Acta Numer., vol. 14, pp. 233–297, 2005. [110] A. Liavas, P. Regalia, and J.-P. Delmas, “Blind Channel Approximation: Effective Channel Order Determination,” IEEE Trans. Signal Process., vol. 47, no. 12, pp. 3336–3344, 1999. [111] J. H. Manton, “An Improved Least Squares Blind Channel Identification Algorithm for Linearly and Affinely Precoded Communication Systems,” IEEE Signal Process. Lett., vol. 9, no. 9, pp. 282–285, Sep. 2002. [112] G. Hollinger, S. Choudhary, P. Qarabaqi, C. Murphy, U. Mitra, G. Sukhatme, M. Stojanovic, H. Singh, and F. Hover, “Communication Protocols for Underwater Data Collection using a Robotic Sensor Network,” in 2011 IEEE GLOBECOM Workshops (GC Wkshps), Houston, USA, Dec. 2011, pp. 1308–1313. [113] S. Choudhary, N. Kumar, S. Narayanan, and U. Mitra, “Active Target Detection with Mobile Agents,” in 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Florence, Italy, May 2014, pp. 4218–4222. [114] G. Hollinger, U. Mitra, and G. 
Sukhatme, “Autonomous data collection from underwater sensor networks using acoustic communication,” in 2011 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Sep. 2011, pp. 3564–3570. [115] L. Freitag, M. Johnson, M. Grund, S. Singh, and J. Preisig, “Integrated Acoustic Communication and Navigation for Multiple UUVs,” in OCEANS, 2001. MTS/IEEE Conference and Exhibition, vol. 4, Nov. 2001, pp. 2065–2070. [116] S. Choudhary, D. Kartik, N. Kumar, S. Narayanan, and U. Mitra, “Active Target Detection with Navigation Costs: A Randomized Benchmark,” in 52nd Annual Allerton Conference on Communication, Control, and Computing (Allerton), Monticello, USA, Sep. 2014, pp. 109–115. [117] S. Choudhary and U. Mitra, “On The Impossibility of Blind Deconvolution for Geometrically Decaying Subspace Sparse Signals,” in 2nd IEEE Global Conference on Signal and Information Processing (GlobalSIP), Atlanta, USA, Dec. 2014, pp. 463–467. [118] A. F. Molisch, “Ultra-Wide-Band Propagation Channels,” Proc. IEEE, vol. 97, no. 2, pp. 353–371, Feb. 2009, issn: 0018-9219. [119] A. Molisch, D. Cassioli, C.-C. Chong, S. Emami, A. Fort, B. Kannan, J. Karedal, J. Kunisch, H. Schantz, K. Siwiak, and M. Win, “A Comprehensive Standardized Model for Ultrawideband Propagation Channels,” IEEE Trans. Antennas Propag., vol. 54, no. 11, pp. 3151–3166, Nov. 2006, issn: 0018-926X. [120] A. Saleh and R. Valenzuela, “A Statistical Model for Indoor Multipath Propagation,” IEEE J. Sel. Areas Commun., vol. 5, no. 2, pp. 128–137, Feb. 1987, issn: 0733-8716. [121] N.Michelusi,B.Tomasi,U.Mitra,J.Preisig,andM.Zorzi,“AnEvaluationoftheHybridSparse/Diffuse Algorithm for Underwater Acoustic Channel Estimation,” in OCEANS 2011, Sep. 2011, pp. 1–10. [122] S. Boyd and L. Vandenberghe, Convex Optimization. Cambridge University Press, 2004, pp. xiv+716, isbn: 0-521-83378-7. [123] M. Grant and S. Boyd, “Graph implementations for nonsmooth convex programs,” in Recent Advances in Learning and Control, ser. Lecture Notes in Control and Information Sciences, V. Blondel, S. Boyd, and H. Kimura, Eds., http://stanford.edu/~boyd/graph_dcp.html, Springer-Verlag Limited, 2008, pp. 95–110. [124] I. CVX Research, Cvx: Matlab Software for Disciplined Convex Programming, version 2.0, http: //cvxr.com/cvx, Aug. 2012. [125] Y. Xu, W. Yin, Z. Wen, and Y. Zhang, “An alternating direction algorithm for matrix completion with nonnegative factors,” Frontiers of Mathematics in China, vol. 7, no. 2, pp. 365–384, 2012. [126] Y. Shen, Z. Wen, and Y. Zhang, “Augmented Lagrangian alternating direction method for matrix separation based on low-rank factorization,” Optim. Methods Softw., vol. 29, no. 2, pp. 239–263, 2014. 139 [127] M. Ayer, H. D. Brunk, G. M. Ewing, W. T. Reid, and E. Silverman, “An empirical distribution function for sampling with incomplete information,” Ann. Math. Statist., vol. 26, pp. 641–647, 1955. [128] Q. F. Stout, “Unimodal Regression via Prefix Isotonic Regression,” Comput. Stat. Data Anal., vol. 53, no. 2, pp. 289–297, Dec. 2008. [129] D. T. Lee and C. K. Wong, “Worst-case analysis for region and partial region searches in multidi- mensional binary search trees and balanced quad trees,” Acta Informat., vol. 9, no. 1, pp. 23–29, 1977. [130] M. Kaul, B. Yang, and C. S. Jensen, “Building accurate 3d spatial networks to enable next generation intelligent transportation systems,” in 2013 IEEE 14 th International Conference on Mobile Data Management (MDM), IEEE, vol. 1, Jun. 2013, pp. 137–146. [131] C. Guo, Y. Ma, B. Yang, C. S. 
Jensen, and M. Kaul, “Ecomark: Evaluating models of vehicular environmental impact,” in Proceedings of the 20 th International Conference on Advances in Geographic Information Systems, ser. SIGSPATIAL ’12, ACM, New York, NY, USA, 2012, pp. 269–278. [132] M. Spiegel, S. Lipschutz, and J. Liu, Schaum’s Outline of Mathematical Handbook of Formulas and Tables, 3ed (Schaum’s Outline Series), 3rd ed. McGraw-Hill, Aug. 2008, isbn: 9780071548557. [133] D. W. Matolak, “V2V communication channels: State of knowledge, new results, and what’s next,” in Communication Technologies for Vehicles, Springer, 2013, pp. 1–21. [134] S. Beygi, E. G. Ström, and U. Mitra, “Structured sparse approximation via generalized regulariz- ers: With application to V2V channel estimation,” in IEEE Global Telecommunications Conference (GLOBECOM), 2014. [135] S. Beygi, E. G. Strom, and U. Mitra, “Geometry-based stochastic modeling and estimation of vehicle to vehicle channels,” in 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2014, pp. 4289–4293. [136] U. Mitra, Z. Tang, S. Beygi, G. Leus, X. Tu, and S. Yerramalli, “Underwater Acoustic Communication System Design: Exploiting Multi-scale/Multi-lag Channels,” Submitted to the IEEE Communication Magazine, 2015. [137] S. Beygi and U. Mitra, “Multi-scale multi-lag channel estimation using low rank structure of received signal,” in 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2014, pp. 4299–4303. [138] S. Jnawali, S. Beygi, and H.-R. Bahrami, “RF impairments compensation and channel estimation in MIMO-OFDM systems,” in 2011 IEEE Vehicular Technology Conference (VTC Fall), 2011, pp. 1–5. [139] N. Michelusi, U. Mitra, A. Molisch, and M. Zorzi, “UWB Sparse/Diffuse Channels, Part I: Channel Models and Bayesian Estimators,” IEEE Trans. Signal Process., vol. 60, no. 10, pp. 5307–5319, Oct. 2012, issn: 1053-587X. [140] ——, “UWB Sparse/Diffuse Channels, Part II: Estimator Analysis and Practical Channels,” IEEE Trans. Signal Process., vol. 60, no. 10, pp. 5320–5333, Oct. 2012, issn: 1053-587X. [141] G. Matz, H. Bolcskei, and F. Hlawatsch, “Time-frequency foundations of communications: Concepts and tools,” IEEE Signal Process. Mag., vol. 30, no. 6, pp. 87–96, 2013. [142] R. Remmert, Theory of complex functions, ser. Graduate Texts in Mathematics. Springer-Verlag, New York, 1991, vol. 122, pp. xx+453, Translated from the second German edition by Robert B. Burckel, Readings in Mathematics, isbn: 0-387-97195-5. [143] R. A. Horn and C. R. Johnson, Topics in matrix analysis. Cambridge University Press, Cambridge, 1991, pp. viii+607. [144] J. Sun, Q. Qu, and J. Wright, “Complete Dictionary Recovery over the Sphere,” ArXiv e-prints, vol. abs/1504.06785, Apr. 2015. [Online]. Available: http://arxiv.org/abs/1504.06785. 140 [145] S. Arora, A. Bhaskara, R. Ge, and T. Ma, “More Algorithms for Provable Dictionary Learning,” ArXiv e-prints, vol. abs/1401.0579, Jan. 2014. [Online]. Available: http://arxiv.org/abs/1401.0579. [146] S. Arora, R. Ge, R. Kannan, and A. Moitra, “Computing a nonnegative matrix factorization - provably,” in Proceedings of the 44 th Symposium on Theory of Computing Conference (STOC), 2012, pp. 145–162. [147] A. Agarwal, A. Anandkumar, P. Jain, P. Netrapalli, and R. Tandon, “Learning Sparsely Used Overcomplete Dictionaries via Alternating Minimization,” ArXiv e-prints, vol. abs/1310.7991, Oct. 2013. [Online]. Available: http://arxiv.org/abs/1310.7991. [148] J. Bolte, S. Sabach, and M. 
Teboulle, “Proximal alternating linearized minimization for nonconvex and nonsmooth problems,” Math. Program., vol. 146, no. 1-2, Ser. A, pp. 459–494, 2014. 141
Abstract
Linear models are the fundamental building blocks of numerous estimation algorithms throughout science and engineering. Decades of research into the estimation of linear models have yielded strong theoretical foundations as well as a rich set of algorithms for this class of problems. While still widely used, linear modeling has become increasingly limited in its ability to capture the nature of many real datasets in today's applications, necessitating research into the estimation of non-linear models. Although significant progress has been made towards developing generalizable theories for the estimation of non-linear models, an intuitive understanding of the characteristics of such estimation problems is still lacking. While a complete theoretical and practical understanding of non-linear estimation problems is likely to keep researchers busy for the next few decades, any estimation problem (regardless of the linearity of the underlying model) needs to address at least the following two questions:

1. Is the problem well-posed?
2. If the problem is well-posed, does there exist an efficient algorithm to solve it?

While the specific definitions of 'well-posed' and 'efficient' vary with the application in question and the scale of the data, the intuition associated with these questions is fairly standard. Well-posedness characterizes the information-theoretic feasibility of the estimation problem assuming no restriction on computational resources, whereas efficiency refers to the achievability of the estimate with limited computational resources for a well-posed problem. This dissertation was born out of the need for a generalizable framework for analyzing the well-posedness of one class of non-linear estimation problems. Herein, fundamental questions about signal identifiability and efficient recovery for cone-constrained bilinear models are raised and partially answered.

Chapter 2 abstractly formulates finite-dimensional inverse problems with bilinear observations and conic constraints on the unknowns. Conic constraints are chosen because they capture a large class of regularizing structures (such as sparsity and low rank), and bilinearity is studied because it is arguably the simplest form of non-linear modeling after piecewise linearity. General dimensionality-based techniques are developed for studying well-posedness, in the sense of the existence of a unique answer in the absence of noise. The flexibility and generalizability of the proposed approach stem from the lifting procedure in optimization, which is a key step in the technical development of the results. Identifiability results for one specific and important problem (blind deconvolution under separable conic constraints) are worked out to illustrate the power of the proposed approach.

Chapter 3 is devoted to an in-depth study of the single-channel blind deconvolution problem under specific conic constraints such as canonical sparsity and repetition coding. Blind deconvolution is chosen owing to its ubiquity in signal processing applications and its notorious ill-posedness in the absence of application-specific assumptions. This chapter demonstrates the full power of the lifting technique within the identifiability analysis machinery developed in Chapter 2 by fully characterizing the ambiguity space of single-channel blind deconvolution and obtaining surprisingly strong un-identifiability results.
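To make the lifting step concrete, the following is a standard formulation of the idea (the notation is assumed here for illustration and is not quoted from the dissertation). A bilinear observation of an unknown pair (u, v) can be rewritten as a linear observation of the lifted rank-one matrix u v^T:

\[
y_i \;=\; u^{\mathsf{T}} A_i v \;=\; \big\langle A_i,\; u v^{\mathsf{T}} \big\rangle, \qquad i = 1, \dots, m,
\]

so identifiability of (u, v), up to the inherent scaling ambiguity (u, v) \mapsto (\alpha u, \alpha^{-1} v), is equivalent to uniqueness of a rank-one solution X = u v^T of the lifted linear system, subject to whatever conic constraints are imposed on the factors. Blind deconvolution fits this template because convolution is bilinear in its two arguments.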
Chapter 4 studies the problem of localizing a target from samples of its separable and decaying field signature. This problem is a disguised instance of a bilinear inverse problem with conic constraints. However, unlike the focus on identifiability analysis in Chapters 2 and 3, the goal here is to derive a localization algorithm with theoretical guarantees on its convergence rate by leveraging the approximate bilinearity of the noisy target signature. It is further proved that both the number of samples and the computational effort necessary to achieve a given localization accuracy are much lower than those needed by a naive matrix-completion-based algorithm with passive sampling. The sufficiency of the smaller number of samples for the proposed approach stems from the use of a unimodal approximation to the target signature field. The unimodal assumption is data-driven and is yet another example of a conic regularizing constraint. Besides allowing fewer samples to suffice, unimodality also provides robustness against simultaneously sparse and low-rank observation noise, a type of noise that significantly degrades the performance of low-rank matrix completion based estimators in the absence of prior information.

Chapter 5 studies the identifiability and recovery of the delay-Doppler components of a narrowband time-varying communication channel with leakage caused by finite blocklength and finite transmission bandwidth. The lifting technique is used to show that this is a low-rank matrix recovery problem with complex-exponential structural constraints. It is further shown that if the global optimum of the estimation problem is uniquely identifiable then there are no other local optima, and that a naive low-rank matrix recovery algorithm such as nuclear norm minimization is bound to fail owing to un-identifiability that stems from ignoring the structural constraints. Since the low-rank matrix structure can be equivalently transformed into a bilinear model, a simple and efficient estimation algorithm is designed based on the alternating minimization heuristic and is shown to work well where basis pursuit based approaches fail.
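As a rough illustration of the alternating minimization heuristic invoked in Chapter 5, below is a minimal sketch in Python. It is not the dissertation's algorithm: the function name alt_min_rank1, the generic Gaussian measurement model y_i = <A_i, u v^T>, and all parameter choices are illustrative assumptions. The key observation is that fixing one factor makes the observation model linear in the other, so each half-step reduces to an ordinary least-squares solve.

import numpy as np

def alt_min_rank1(A, y, iters=50, seed=0):
    """Alternating least squares for measurements y_i = <A_i, u v^T>.

    A : (m, n1, n2) array stacking the measurement matrices A_i
    y : (m,) array of bilinear observations
    Returns estimates (u, v) of the rank-one factors, up to scaling.
    """
    m, n1, n2 = A.shape
    rng = np.random.default_rng(seed)
    u = rng.standard_normal(n1)
    v = rng.standard_normal(n2)
    for _ in range(iters):
        # With v fixed, y ~= Mu @ u is linear in u (row i of Mu is A_i v).
        Mu = A @ v
        u, *_ = np.linalg.lstsq(Mu, y, rcond=None)
        # With u fixed, y ~= Mv @ v is linear in v (row i of Mv is u^T A_i).
        Mv = np.einsum('i,mij->mj', u, A)
        v, *_ = np.linalg.lstsq(Mv, y, rcond=None)
    return u, v

# Synthetic sanity check with generic Gaussian measurements (an assumption,
# not the structured delay-Doppler measurements studied in the dissertation).
rng = np.random.default_rng(1)
m, n1, n2 = 200, 10, 12
A = rng.standard_normal((m, n1, n2))
u_true = rng.standard_normal(n1)
v_true = rng.standard_normal(n2)
y = np.einsum('mij,i,j->m', A, u_true, v_true)
u_hat, v_hat = alt_min_rank1(A, y)
print(np.linalg.norm(np.outer(u_hat, v_hat) - np.outer(u_true, v_true)))

Each exact half-step cannot increase the squared residual, which is the main appeal of the heuristic; in the structured setting of Chapter 5, the unconstrained least-squares solves would be replaced by fits respecting the complex-exponential constraints.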
Conceptually similar
Novel optimization tools for structured signals recovery: channels estimation and compressible signal recovery
Exploitation of sparse and low-rank structures for tracking and channel estimation
On the interplay between stochastic programming, non-parametric statistics, and nonconvex optimization
New approaches to satellite formation-keeping and the inverse problem of the calculus of variations
Finite dimensional approximation and convergence in the estimation of the distribution of, and input to, random abstract parabolic systems with application to the deconvolution of blood/breath al...
Representation, classification and information fusion for robust and efficient multimodal human states recognition
Asset Metadata
Creator
Choudhary, Sunav
(author)
Core Title
On the theory and applications of structured bilinear inverse problems to sparse blind deconvolution, active target localization, and delay-Doppler estimation
School
Viterbi School of Engineering
Degree
Doctor of Philosophy
Degree Program
Electrical Engineering
Publication Date
09/12/2016
Defense Date
02/05/2016
Publisher
University of Southern California (original), University of Southern California. Libraries (digital)
Tag
active target localization, alternating minimization, bilinear inverse problems, blind deconvolution, identifiability, low-rank matrix recovery, non-convex optimization, OAI-PMH Harvest, parametric representation, rank two null space, rank-one matrix completion, sparse approximation
Format
application/pdf (imt)
Language
English
Contributor
Electronically uploaded by the author (provenance)
Advisor
Mitra, Urbashi (committee chair), Haldar, Justin (committee member), Sen, Suvrajeet (committee member)
Creator Email
sunavch@gmail.com, sunavcho@usc.edu
Permanent Link (DOI)
https://doi.org/10.25549/usctheses-c40-302431
Unique identifier
UC11280280
Identifier
etd-ChoudharyS-4782.pdf (filename), usctheses-c40-302431 (legacy record id)
Legacy Identifier
etd-ChoudharyS-4782.pdf
Dmrecord
302431
Document Type
Dissertation
Rights
Choudhary, Sunav
Type
texts
Source
University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Access Conditions
The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the a...
Repository Name
University of Southern California Digital Library
Repository Location
USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA