EXPLOITATION OF SPARSE AND LOW-RANK STRUCTURES FOR TRACKING AND CHANNEL ESTIMATION

by

Amr Elnakeeb

A Dissertation Presented to the
FACULTY OF THE GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the Requirements for the Degree
DOCTOR OF PHILOSOPHY (ELECTRICAL ENGINEERING)

August 2020

Copyright 2020 Amr Elnakeeb

Epigraph

"The gorilla study illustrates two important facts about our minds: we can be blind to the obvious, and we are also blind to our blindness."
—Daniel Kahneman, Thinking, Fast and Slow

"We flipped a coin to see who was going to pay for the meal. I lost and paid. He was about to thank me when he abruptly stopped and said that he paid for half of it probabilistically."
—Nassim Taleb, Fooled by Randomness: The Hidden Role of Chance in Life and in the Markets

"Understanding how to act under conditions of incomplete information is the highest and most urgent human pursuit."
—Nassim Taleb, The Black Swan: The Impact of the Highly Improbable

Dedication

To the Elnakeeb family: past, present and future...

Acknowledgements

First and foremost, I thank God, the most merciful and the most beneficent, to whom I attribute any success or achievement in my life.

I would like to express my sincere thanks and deepest gratitude to my PhD advisor, Prof. Urbashi Mitra, for her continuous support, thoughtfulness, advice and guidance throughout this work. Thank you, "Ubli."

I would like to thank my defense committee: Prof. Mahdi Soltanolkotabi and Prof. Larry Goldstein. I would like to extend my thanks to the qualifying exam committee, which included the two aforementioned faculty members in addition to Prof. Andreas Molisch and Prof. Mahta Moghaddam, for their thoughtful comments and recommendations on this dissertation.

A special thanks to Dr. David Maluf, my internship supervisor at Cisco Systems Inc., for his encouragement and his cheerful discussions. I have learned enormously from him on both the technical and personal levels, which contributed to shaping my way of thinking.

I would like to acknowledge my parents, my brothers Mohamed and Khaled, and my fiancée Sarah for their constant support, love and encouragement. I am incredibly privileged to have them in my life. Their presence made the PhD journey possible.

I would like to extend my thanks to Susan Wiedem, our business/budget administrator, for her time, flexibility and help with the paperwork, and for her cheerful conversations that made long, tough days go easy. All appreciation and respect go to her.

Speaking of friends, I would like to recognize the invaluable assistance and support that you all provided, as well as the joyful conversations we had during the journey: friends and lab-mates from USC, friends from Cairo University, and childhood friends. A special thanks goes to my dear friend, Ahmed Morsy, who is, in fact, not just a friend but a brother in bond. Thanks for the wonderful time we spent together during the PhD journey. I would like to thank Libin Liu and Dhruva Mokhasunavisu, my lab-mates, for their help and their awesome conversations throughout the PhD. I would like to extend my thanks to Ramy Tadros, a former graduate of USC, for answering logistical questions and passing down prior experience.

Finally, I would like to thank Starbucks, The Coffee Bean & Tea Leaf and illy for their dozens of supplies of coffee and lattes during the sleepless nights!
Table of Contents

Epigraph
Dedication
Acknowledgements
List of Tables
List of Figures
Abstract
Related Publications

1 Introduction
  1.1 Sparse and low-rank structures: "The Why"
  1.2 Thesis contributions: "The What"
  1.3 Thesis organization: "The How"

2 Line Constrained Estimation with Applications to Target Tracking: Exploiting Sparsity and Low-rank
  2.1 Introduction
  2.2 Related Work
  2.3 Signal Model and Optimization Problem
  2.4 Solving the Optimization
    2.4.1 Derivation of the update step for L
    2.4.2 Derivation of the update step for S
    2.4.3 Derivation of the update step for R
    2.4.4 Derivation of the update step for S_r
  2.5 Cramér–Rao bound
  2.6 Convergence Analysis
  2.7 Simulations
    2.7.1 Synthetic Data
    2.7.2 Real Data
    2.7.3 CRB Comparisons

3 Nonlinear Trajectory Tracking via Variety-Based Background Subtraction Models
  3.1 Introduction
  3.2 Signal Model and Optimization Problem
    3.2.1 RPCA: Prior related work
    3.2.2 Preliminaries
    3.2.3 Proposed optimization problem
  3.3 Solving the Optimization
  3.4 Convergence of VBSI
  3.5 Practical considerations
    3.5.1 General mapping
    3.5.2 Changes in derivations
  3.6 Numerical Results
    3.6.1 The Art of Tuning
    3.6.2 Performance Comparisons

4 Leaked Channel Estimation: Exploitation of Sparse and Low-rank Structures
  4.1 Introduction
  4.2 System Model
  4.3 Parametric Signal Representation
  4.4 Structured Estimation of Time-varying Narrowband Channels
    4.4.1 Nonconvex Alternating Direction Minimization
    4.4.2 Parametric Low-rank Atomic Norm
  4.5 Numerical Simulations
    4.5.1 Signal Parameters
    4.5.2 Performance Comparisons
    4.5.3 Doppler Estimation and Resolution Constraint
    4.5.4 Bit Error Performance Comparison

5 Leaked Channel Estimation: Extensions to MIMO OFDM and Performance Bounds
  5.1 Introduction
  5.2 System Model
  5.3 Solving the Optimization
    5.3.1 Convex Optimization via the Atomic Norm
    5.3.2 Optimality and Uniqueness
    5.3.3 Estimation Algorithm
  5.4 Cramér–Rao bound
    5.4.1 Cramér–Rao Bound Derivation
    5.4.2 Cramér–Rao Bound on the Whole Channel Estimator
  5.5 Sequences that Optimize the Cramér–Rao Bound
  5.6 Simulations
    5.6.1 Synthetic Data
    5.6.2 Experimental Data

6 Conclusions

A Line Constrained Estimation with Applications to Target Tracking: Exploiting Sparsity and Low-rank
  A.1 Proof of Theorem 1
    A.1.1 Background and preliminaries
    A.1.2 Derivations
  A.2 Proof of Corollary 1
  A.3 Proof of Theorem 2
  A.4 Proof of Theorem 3

B Nonlinear Trajectory Tracking via Variety-Based Background Subtraction Models
  B.1 Proof of Theorem 4
  B.2 Proof of Theorem 5
  B.3 Proof of Theorem 6
  B.4 Changes in Key Theorems 4-6 due to Practical Considerations

C Leaked Channel Estimation: Exploitation of Sparse and Low-rank Structures
  C.1 Proof of Theorem 7
  C.2 Proof of Theorem 8
  C.3 Proof of Theorem 9

D Leaked Channel Estimation: Extensions to MIMO OFDM and Performance Bounds
  D.1 Proof of Theorem 10
  D.2 Atomic Norm and Optimization Problem Construction
  D.3 Proof of Theorem 11
  D.4 Derivation of Approximate Identity 1

Bibliography

List of Tables

2.1 NMSE for L and S for various SNR values
3.1 NMSE of S, L and S_0 for the two varieties

List of Figures

2.1 Signal model
2.2 Error of the matrices L and S versus SNR
2.3 NMSE of position estimation versus SNR
2.4 NMSE of velocity estimation versus SNR
2.5 NMSE of position estimation versus θ for various SNR values
2.6 NMSE of velocity estimation versus θ for various SNR values
2.7 Frame 71
2.8 Frame 71 with noise (SNR = 30 dB)
2.9 Ground truth
2.10 Estimated sequence, SNR = 30 dB
2.11 Comparison for vertical (or horizontal) lines for different SNR values
2.12 Comparison of lines with various slopes (various θ) for different SNR values
2.13 Comparison of all techniques with the CRB for different SNR values
3.1 Signal model example
3.2 Shapes of second order and third order varieties
3.3 Convergence rate of VBSI for the two varieties
3.4 Comparison of different techniques versus SNR
3.5 Convergence for noisy scenario
3.6 MSE and the value of the bound derived in Approximation 1 vs. different system parameters
4.1 ±2σ normalized mean-squared-error (NMSE) curves of estimation strategies vs. signal-to-noise-ratio (SNR) at the receiver side—leaked channel estimation
4.2 NMSE vs. SNR at the receiver side—channel Doppler shift estimation using the PLAN approach (effect of n_T)
4.3 NMSE vs. SNR at the receiver side—channel Doppler shift estimation using the PLAN approach (effect of m_0 and p_0)
4.4 Doppler shifts are the roots of Q(ν); 1−Q(ν) vs. ν for n_T = 100, p_0 = 5, m_0 = 10
4.5 Doppler shifts are the roots of Q(ν); 1−Q(ν) vs. ν for n_T = 200, p_0 = 5, m_0 = 10
4.6 BER vs. SNR performance comparison
5.1 CRB/NMSE of delay estimation versus SNR
5.2 CRB/NMSE of channel estimation techniques versus SNR
5.3 NMSE/CRB of Doppler for various settings versus SNR
5.4 Distance from whiteness versus sample size
5.5 AF for delay-optimized sequences
5.6 AF for Doppler-optimized sequences
5.7 CAF between delay-optimized sequences and Doppler-optimized sequences
5.8 BER of SPACE'08 data for various OFDM blocks

Abstract

Sparse and low-rank structures have been actively considered in a variety of applications. In this work, we examine two important applications: target tracking and leaked channel estimation. First, trajectory estimation of moving targets is examined.
Background subtraction methods, exploiting low-rank backgrounds and sparse features of interest, are extended to incorporate linear constraints. The line constraint is enforced via a rotation that yields an additional low-rank condition. The optimization is solved via the Augmented Lagrange Multiplier method. Furthermore, the background subtraction model is extended to encompass nonlinear trajectory tracking via the inclusion of algebraic varieties. The optimization is solved via the Iteratively Reweighted Least Squares method. For both linear and nonlinear trajectory tracking, the convergence of the proposed algorithms is studied via a boundedness analysis. An average performance improvement of 5 dB is observed over previous background subtraction methods.

Second, time-varying narrowband leaked Multiple-Input Multiple-Output Orthogonal Frequency Division Multiplexing (MIMO OFDM) channel estimation is considered. It is shown that the leaked MIMO OFDM channel is effectively separable in the delay and Doppler domains. A convex optimization approach, based on the atomic norm heuristic, is leveraged to solve the recovery problem. The Cramér–Rao bound (CRB) on the leaked channel parameters is derived. Fixed-point expressions for the training sequences that optimize the decoupled CRB in the delay and Doppler domains are derived.

For both the trajectory tracking and channel estimation problems, the proposed algorithms are tested on experimental data, and performance improvements of 2–5 dB are observed over prior methods.

Related Publications

Conference Papers

1. Amr Elnakeeb and Urbashi Mitra, "On Training Sequence Optimization for Leaked MIMO OFDM Channels," in 2019 IEEE Global Communications Conference (GLOBECOM), 9-13 December 2019, Waikoloa, HI, USA.
2. Amr Elnakeeb and Urbashi Mitra, "Variety-Based Background Subtraction for Nonlinear Trajectory Tracking," in 2019 Asilomar Conference on Signals, Systems, and Computers, 3-6 November 2019, CA, USA.
3. Amr Elnakeeb and Urbashi Mitra, "On the Cramér–Rao Bound of Time-Varying Narrowband Leaked MIMO OFDM Channels," in 2018 IEEE 19th International Workshop on Signal Processing Advances in Wireless Communications (SPAWC), June 2018, pp. 1-5.
4. Amr Elnakeeb and Urbashi Mitra, "Cramér–Rao Bound for Line Constrained Trajectory Tracking," in 2018 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), April 2018, pp. 3894-3898.
5. Amr Elnakeeb and Urbashi Mitra, "Sparsity and Rank Exploitation for Time-Varying Narrowband Leaked OFDM Channel Estimation," in 2018 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), April 2018, pp. 3894-3898.
6. Amr Elnakeeb and Urbashi Mitra, "Low-rank, Sparse and Line Constrained Estimation: Applications to Target Tracking and Convergence," in 2017 IEEE International Symposium on Information Theory (ISIT), June 2017, pp. 2283-2287.

Journal Papers

7. Amr Elnakeeb and Urbashi Mitra, "On Non-linear Trajectory Tracking via Variety-Based Background Subtraction Models," submitted to IEEE Transactions on Signal Processing, June 2020.
8. Amr Elnakeeb and Urbashi Mitra, "On Leaked MIMO OFDM Channels: Performance Bounds and Applications to Underwater Communications," submitted to IEEE Transactions on Signal Processing, April 2019.
9. Amr Elnakeeb and Urbashi Mitra, "Line Constrained Estimation with Applications to Target Tracking: Exploiting Sparse and Low-rank Structures," IEEE Transactions on Signal Processing, vol. 66, no. 24, pp. 6488-6502, December 2018.
10. Sajjad Beygi, Amr Elnakeeb, Sunav Choudhary, and Urbashi Mitra, "Bilinear Matrix Factorization Methods for Time-Varying Narrowband Channel Estimation: Exploiting Sparsity and Rank," IEEE Transactions on Signal Processing, vol. 66, no. 22, pp. 6062-6075, November 2018.

Chapter 1
Introduction

1.1 Sparse and low-rank structures: "The Why"

Structured signal processing is an exciting, rapidly growing research area that has attracted considerable attention over the last years. In particular, sparse and low-rank structures have been actively considered in a variety of applications: computer vision and machine learning, demosaicing, pattern analysis, machine intelligence, image classification and reconstruction [1-6], face recognition [7], collaborative filtering [8], background modeling [9], and subspace segmentation [10, 11], as well as applications in wireless communications, in particular channel estimation [12-25]. The key insight is that leveraging more intrinsic structure in a problem yields better system performance. The definition of system performance depends substantially on the application at hand: it could be better estimation results, i.e., smaller MSE, or it could be power savings or savings in the number of measurements required to perform a certain task.

1.2 Thesis contributions: "The What"

In this work, we focus primarily on the development of theory and algorithms for sparse and low-rank recovery, with applications to target tracking and channel estimation. In subsequent chapters, we will see how the fundamentals of sparse and low-rank structures are expanded and extended to serve these applications.

Target tracking has been persistently studied over the years [26-29], with applications in video surveillance, traffic monitoring, human-computer interaction and vehicle navigation. Sparse and other structured representations have been applied to target tracking [1, 29-31], wherein background subtraction is employed for detection, recognition and tracking. In this work, we leverage sparse and low-rank structures for linear and nonlinear trajectory tracking. In particular, for background subtraction, the following decomposition is considered: a low-rank matrix representing the background, plus a sparse matrix consisting of the foreground objects [1-4, 6, 32, 33]; this approach is called Robust Principal Component Analysis (RPCA). Herein, we expand upon the RPCA model by adapting this low-rank and sparse framework to target tracking: for objects on linear trajectories via the incorporation of additional rotation/transformation constraints, and for objects on nonlinear trajectories via the inclusion of algebraic varieties [34, 35].

Wireless communications have enabled intelligent traffic safety [23, 36], automated robotic networks, underwater surveillance systems [18, 37], and many other useful technologies. In all of these systems, establishing a reliable, high-data-rate communication link between the transmitter and receiver is essential. To achieve this goal, accurate channel state information is needed to equalize the received signal and thus combat the effects of the wireless channel. One of the well-known approaches to acquire channel state information is to probe the channel in time/frequency with known signals, and to reconstruct the channel response from the output signals (see [38] and references therein). Least-squares (LS) and Wiener filters are classical examples of this approach.
However, these methods do not take advantage of the rich, intrinsic structure of wireless communication channels in their estimation process. In particular, many time-varying channels have sparse representations in the Doppler-delay domain. The main challenge with the classical approaches, i.e., LS and Wiener filtering, is that they require a large number of measurements, relative to the number of unknown parameters in the estimation problem, to perform well. To combat this challenge, we need to take advantage of side information about the structure of the unknown parameters. By exploiting inherent structure, we can reduce the size of the feasible set of solutions, which results in the need for fewer observations. In this work, sparse and low-rank properties of the channel are exploited, yielding better estimation results as well as a considerable reduction in the number of training symbols required for channel estimation.

1.3 Thesis organization: "The How"

The rest of this thesis is organized as follows. Chapter 2 presents the linear trajectory tracking application, in which sparse and low-rank structures are exploited. Related work is presented in Section 2.2. The signal model and optimization are introduced in Section 2.3. In Section 2.4, the multi-objective optimization is solved via the ALM method and an overall iterative algorithm is provided. The CRB is derived in Section 2.5. Convergence is discussed via a boundedness analysis in Section 2.6. Simulation results are given in Section 2.7. Appendices A.1, A.2, A.3 and A.4 present the proofs of Theorem 1, Corollary 1, Theorem 2, and Theorem 3 of this chapter.

The work in Chapter 2 is extended in Chapter 3 to encompass nonlinear trajectory tracking, via the inclusion of algebraic varieties. The signal model and optimization are introduced in Section 3.2. In Section 3.3, the multi-objective optimization is solved and the Variety-Based Background Subtraction-Iteratively Reweighted Least Squares (VBBS-IRLS) algorithm is provided. Convergence analysis and performance bounds are studied in Section 3.4. Simulation results are presented in Section 3.6. Appendices B.1 through B.3 provide the proofs of the technical results of this chapter.

Chapter 4 presents the leaked channel estimation work for single-carrier systems. Section 4.2 develops the communication system model, which is used in Section 4.3 to derive the discrete-time observation model that captures the bilinear structure of the problem. Section 4.4 derives the two strategies based on non-convex and convex programming. Section 4.5 is devoted to discussion and numerical results. Appendices C.1, C.2, and C.3 provide the proofs of the three major theorems of the work.

Chapter 5 extends the work of Chapter 4 to MIMO OFDM systems and presents theoretical performance bounds, i.e., the Cramér–Rao bound. The signal model is introduced in Section 5.2. In Section 5.3, the optimization problem is formulated and the attendant algorithm is provided; optimality, uniqueness, and the estimation algorithm are also given. Different variants of the CRB are derived in Section 5.4. The algorithm for finding the optimized training sequences is derived in Section 5.5. Numerical results based on both synthetic and experimental data are given in Section 5.6. Appendices D.1 through D.4 provide the proofs of the technical results.

Finally, Chapter 6 summarizes the work and concludes the thesis.
Chapter 2
Line Constrained Estimation with Applications to Target Tracking: Exploiting Sparsity and Low-rank

2.1 Introduction

We expand upon the RPCA model considered in [2, 3] by adapting this low-rank and sparse framework to target tracking for objects on nearly linear trajectories. Extracting linear features is a key problem in computer vision applications, object tracking, scene analysis, image compression, crack detection in materials, and robot navigation [39-41]. Further, line segments arise in a myriad of natural and synthetic objects. Many complex objects can be represented by a combination of multiple linear features [42]; thus the proposed approach could be exploited in a larger, more sophisticated system. We will apply our new methods to target tracking as an exemplar application.

Target tracking has been persistently studied over the years [26-29], with applications in video surveillance, traffic monitoring, human-computer interaction and vehicle navigation. Sparse and other structured representations have been applied to target tracking [1, 29-31], wherein background subtraction is employed for detection, recognition and tracking. In particular, the augmented Lagrange multipliers method [43-45] is used to tackle the optimization problem presented in [1], which we adapt in the current work. In [30], the object of interest is represented via multiple linear subspaces. A particle filter framework is exploited in [31]. In [29], detection and tracking are simultaneously considered by applying the RPCA model together with a graph cut algorithm for detection, and fractal analysis for tracking.

The rest of this chapter is organized as follows. Related work is presented in Section 2.2. The signal model and optimization are introduced in Section 2.3. In Section 2.4, the multi-objective optimization is solved via the ALM method and an overall iterative algorithm is provided. The CRB is derived in Section 2.5. Convergence is discussed via a boundedness analysis in Section 2.6. Simulation results are given in Section 2.7. Appendices A.1, A.2, A.3 and A.4 present the proofs of Theorem 1, Corollary 1, Theorem 2, and Theorem 3 of this work.

2.2 Related Work

In [2], RPCA is employed, in which the $\ell_1$-norm relaxation is considered for the sparse component. In several practical applications, sparse components are not uniformly and independently distributed, but have additional structure. Mixed norms allow one to exploit certain types of structure; e.g., an $\ell_{2,1}$-norm captures both element and group sparsity, as in [3], in which the spatial connection between foreground regions is considered. Another option is to employ Bayesian inference methods (see, e.g., [46, 47]). Alternative norms for RPCA are considered in [48] (capped $\ell_1$-norm and trace norm) and [49] (Schatten-$p$ norm and an $\ell_q$-norm); these approaches are non-convex and induce high complexity relative to our proposed approach, but they are tighter than their convex counterparts.

In [50], RPCA is exploited for a particular application, i.e., maritime scenes. A double-constrained RPCA, named Shape and Confidence Map-based RPCA (SCM-RPCA), is proposed to improve foreground object detection in maritime scenes; the optimization problem is formulated such that the sparse component is constrained by shape and confidence maps. Herein, a different application is examined: we consider how to add additional matrix-based structure to enable finding lines.
We find that exploiting additional structure, unique to a given application, can often provide strong performance improvements for sparse/low-rank based methods (see, e.g., [51-53]). We assume that we have a single object of interest, whose position and velocity are to be estimated. The expected linear trajectory is translated into additional constraints for the optimization. Furthermore, we compare with the target tracking methods presented in [27-29]. In [27], a technique to detect moving objects in videos with dynamic background is proposed via a two-step algorithm. First, real objects in a degraded video are detected using a two-level thresholding approach. Second, a generalized regression neural network is exploited to track the previously detected objects throughout the frames of the video. In [28], an object tracking algorithm that exploits a sparsity-based discriminative classifier (SDC) and a sparsity-based generative model (SGM) is proposed. In [29], group object detection and tracking is presented: RPCA is combined with a graph cut algorithm for the detection part, whilst the object is tracked via fractal analysis.

We also derive the Cramér–Rao Bound (CRB) to evaluate the potential optimality of the proposed estimation method. Our derivation builds upon the work in [54-58]. In particular, [54] presents the computation of the CRB for the traditional RPCA problem. Additionally, we examine the impact of the various constraints in the proposed problem. In [59], the value of multiple constraints is questioned for certain low-rank and sparse problems with a particular structure. We examine the impact of selecting different subsets of the multiple constraints and confirm that all constraints are, in fact, needed to achieve the best performance.

The main contributions of this work are summarized as follows:

1. We propose a new formulation using low-rank and sparse representations for tracking objects that move on straight lines, and introduce our algorithm, Line Estimation via the Augmented Lagrange Multiplier (LE-ALM). The introduction of an additional regularizing rotation allows the line structure to be captured by an additional low-rank constraint.

2. We provide a low-complexity solution to the multi-objective optimization via the Augmented Lagrange Multiplier (ALM) method [43, 44].

3. We adapt the derivations in [54-56] to accommodate the additional linear constraint in the CRB derivations. The computed CRB predicts that employing the additional constraint will reduce the mean-squared error, as evidenced by the numerical results.

4. We assess the convergence of the proposed iterative algorithm LE-ALM via a boundedness analysis, building on the work of [1]; we also show that the aggregated recovery error approaches zero.

5. We estimate the velocity and position of the target from the estimated sparse component. We apply our technique to synthetic data. Simulation studies show that the proposed method offers 5 dB and 3.5 dB improvement, on average, in position and velocity estimation, respectively, over two of the state-of-the-art background subtraction techniques when applied to this problem.
Furthermore, simulation studies show that the proposed method offers 7 dB, 6 dB and 5.5 dB improvement, on average, in position and velocity estimation over the three state-of-the-art target tracking techniques, respectively. More importantly, the proposed method's estimation error for the data matrix is very close to the computed CRB, with an average deviation of 1.2 dB, suggesting the near-optimality of our method.

6. We apply our target tracking algorithm to real video data, and observe that, even for quasi-linear trajectories, the algorithm works very well even at low SNR values.

7. We show the efficacy of exploiting multiple structures rather than a single structure, by removing one constraint and observing how the performance degrades accordingly.

2.3 Signal Model and Optimization Problem

We extend the RPCA model given in [2, 3]: $X = L + S$, where $X \in \mathbb{R}^{m\times n}$ and $L$, $S$ have the same size as $X$. The matrix $X$ collects the observed frames, which are vectorized column-wise. The matrix $L$ is low rank and captures the background information. The matrix $S$ is sparse and captures the motion of the object of interest. Fig. 2.1 provides a depiction of the signal model, wherein video frames are vectorized and collected in the observation matrix $X$; the object is represented by black pixels. Our optimization is as follows:

\begin{align}
\underset{L,S,R}{\text{minimize}} \quad & \|L\|_* + \lambda_1\|S\|_{2,1} + \lambda_2\|RS\|_* \nonumber\\
\text{subject to} \quad & X = L + S, \quad RR^T = I_m, \tag{2.1}
\end{align}

where $R \in \mathbb{R}^{m\times m}$ is a transformation matrix and $I_m$ is the $m\times m$ identity matrix. The nuclear norm of a general matrix $A \in \mathbb{R}^{m\times n}$ is given as $\|A\|_* = \operatorname{Tr}\big(\sqrt{A^TA}\big) = \sum_{i=1}^{\min(m,n)}\sigma_i(A)$. The mixed $\ell_{2,1}$ norm of a general matrix $A \in \mathbb{R}^{m\times n}$ is given as $\|A\|_{2,1} = \sum_{i=1}^{m}\sqrt{\sum_{j=1}^{n}a_{ij}^2}$.

The first constraint is the usual decomposition into low-rank and sparse components, as in [2, 3]. The second constraint guarantees that $R$ is a transformation matrix.

Figure 2.1. Signal model.

We observe that if the line is appropriately "de-rotated," the resulting matrix $RS$ will be rank one. Thus, the line structure is captured by the third term in the objective function. The matrix $R$ is a transformation matrix that preserves energy; that is, $\|RS\|_F^2 = \|S\|_F^2$. In particular, we seek an isometric transformation that is dimension-preserving. For the left-hand side, by definition of the Frobenius norm, $\|RS\|_F^2 = \operatorname{tr}(S^TR^TRS) = \operatorname{tr}(R^TRSS^T)$, where the last equality holds due to the trace properties. For the right-hand side, by definition of the Frobenius norm, $\|S\|_F^2 = \operatorname{tr}(S^TS)$. To ensure equality, one solution is $R^TR = I_m$, which for the square matrix $R$ is equivalent to the condition $RR^T = I_m$ stated in (2.1). Note also that the multiplication does not reduce the dimension of $S$, since $R$ is a square matrix. We also underscore that if $S$ is the identity matrix, or has certain diagonal structures, the transformation matrix $R$ would not induce a full rotation, and we would not obtain an exactly rank-one matrix $RS$. That said, these cases occur with low probability.

To facilitate the derivations, we define $S_r$ as the matrix $S$ after rotation via $R$, i.e., $S_r = RS$; the optimization model can then be rewritten as follows:

\begin{align}
\underset{L,S,R,S_r}{\text{minimize}} \quad & \|L\|_* + \lambda_1\|S\|_{2,1} + \lambda_2\|S_r\|_* \nonumber\\
\text{subject to} \quad & X = L + S, \quad RR^T = I_m, \quad S_r = RS. \tag{2.2}
\end{align}

2.4 Solving the Optimization

We decompose this problem into four subproblems that are solved iteratively.
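To make the three structural terms concrete, the following is a minimal NumPy sketch (an illustration, not code from the thesis) that evaluates the objective of (2.1); the regularizer defaults 0.4 and 2.5 are the values tuned by cross-validation in Section 2.7, and all other values are placeholders.

```python
import numpy as np

def le_alm_objective(L, S, R, lam1=0.4, lam2=2.5):
    """Evaluate ||L||_* + lam1*||S||_{2,1} + lam2*||RS||_* from (2.1)."""
    nuc_L = np.linalg.norm(L, ord="nuc")          # nuclear norm: sum of singular values
    l21_S = np.sqrt((S ** 2).sum(axis=1)).sum()   # row-wise mixed l_{2,1} norm
    nuc_RS = np.linalg.norm(R @ S, ord="nuc")     # becomes rank one after an ideal de-rotation
    return nuc_L + lam1 * l21_S + lam2 * nuc_RS
```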
By adapting the method of Augmented Lagrange Multipliers (ALM) [43], the Lagrangian of the problem is expressed as follows,

\begin{align}
\mathcal{L}(L,S,R,S_r,M,N,V,\beta) =\ & \|L\|_* + \lambda_1\|S\|_{2,1} + \lambda_2\|S_r\|_* + \langle M,\, X - L - S\rangle + \langle N,\, RR^T - I_m\rangle \nonumber\\
& + \langle V,\, S_r - RS\rangle + \frac{\beta}{2}\Big(\|X - L - S\|_F^2 + \|RR^T - I_m\|_F^2 + \|S_r - RS\|_F^2\Big), \tag{2.3}
\end{align}

where $M$, $N$, and $V$ are Lagrange multipliers, $\beta$ is a positive scalar, and the matrix inner product is given as $\langle A, B\rangle = \operatorname{Tr}(A^TB)$. The squared Frobenius norm of a general matrix $A \in \mathbb{R}^{m\times n}$ is given as $\|A\|_F^2 = \sum_{i=1}^{m}\sum_{j=1}^{n}|a_{ij}|^2$. Using the ALM method, we have

\begin{align}
(L_{k+1}, S_{k+1}, R_{k+1}, S_{r,k+1}) &= \arg\min_{L,S,R,S_r}\ \mathcal{L}(L,S,R,S_r,M_k,N_k,V_k,\beta_k), \nonumber\\
M_{k+1} &= M_k + \beta_k(X - L_{k+1} - S_{k+1}), \nonumber\\
N_{k+1} &= N_k + \beta_k(R_{k+1}R_{k+1}^T - I_m), \nonumber\\
V_{k+1} &= V_k + \beta_k(S_{r,k+1} - R_{k+1}S_{k+1}). \tag{2.4}
\end{align}

Solving (2.4) directly is computationally intractable [43-45]; thus, we use the alternating method and minimize over each component separately [43-45], that is,

\begin{align}
L_{k+1} &= \arg\min_{L}\ \mathcal{L}(L, S_k, R_k, S_{r,k}, M_k, N_k, V_k, \beta_k), \nonumber\\
S_{k+1} &= \arg\min_{S}\ \mathcal{L}(L_{k+1}, S, R_k, S_{r,k}, M_k, N_k, V_k, \beta_k), \nonumber\\
R_{k+1} &= \arg\min_{R}\ \mathcal{L}(L_{k+1}, S_{k+1}, R, S_{r,k}, M_k, N_k, V_k, \beta_k), \nonumber\\
S_{r,k+1} &= \arg\min_{S_r}\ \mathcal{L}(L_{k+1}, S_{k+1}, R_{k+1}, S_r, M_k, N_k, V_k, \beta_k). \tag{2.5}
\end{align}

Seeking stationary points via differentiation when solving for a single parameter, we obtain the solutions for the four alternating steps as follows.

2.4.1 Derivation of the update step for L

To derive the update for the matrix $L$, we minimize with respect to $L$ while keeping all other key matrices constant: $\hat{L}_{k+1} = \arg\min_{L}\mathcal{L}(L, S_k, R_k, S_{r,k}, M_k, N_k, V_k, \beta_k)$. Dropping the indices $k$ and $k+1$ for simplicity and discarding the terms that are not a function of $L$ yields

\[
\hat{L} = \arg\min_{L}\ \|L\|_* + \langle M, -L\rangle + \frac{\beta}{2}\|X - L - S\|_F^2
        = \arg\min_{L}\ \frac{1}{\beta}\|L\|_* + \frac{1}{2}\big\|L - (\beta^{-1}M + X - S)\big\|_F^2.
\]

Let $W = \beta^{-1}M + X - S$. Then

\[
\hat{L} = \arg\min_{L}\ \frac{1}{\beta}\|L\|_* + \frac{1}{2}\|L - W\|_F^2.
\]

Using the result from the Singular Value Thresholding algorithm [43-45], the resulting estimate of $L$ is given by

\begin{equation}
\hat{L} = U\,\mathcal{T}_{\frac{1}{\beta}}(\Sigma)\,V^T, \tag{2.6}
\end{equation}

where $U\Sigma V^T$ is the Singular Value Decomposition (SVD) of $W$, and $\mathcal{T}_\alpha(\cdot)$ is a soft-thresholding operator defined as $\mathcal{T}_\alpha(x) = \operatorname{sign}(x)\max\{|x| - \alpha,\ 0\}$; for matrices, $\mathcal{T}_\alpha(\cdot)$ is applied component-wise.

2.4.2 Derivation of the update step for S

To derive the update for the matrix $S$, we minimize with respect to $S$ while keeping all other key matrices constant: $\hat{S}_{k+1} = \arg\min_{S}\mathcal{L}(L_{k+1}, S, R_k, S_{r,k}, M_k, N_k, V_k, \beta_k)$. Dropping the indices and discarding the terms that are not a function of $S$ yields

\begin{align*}
\hat{S} &= \arg\min_{S}\ \lambda_1\|S\|_{2,1} + \langle M, -S\rangle + \langle V, -RS\rangle + \frac{\beta}{2}\Big(\|X - L - S\|_F^2 + \|S_r - RS\|_F^2\Big)\\
&= \arg\min_{S}\ \frac{\lambda_1}{\beta}\|S\|_{2,1} + \frac{1}{2}\Big(\big\|S - (\beta^{-1}M + X - L)\big\|_F^2 + \big\|RS - (\beta^{-1}V + S_r)\big\|_F^2\Big).
\end{align*}

Let $Z = \beta^{-1}M + X - L$ and $E = \beta^{-1}V + S_r$. Then

\[
\hat{S} = \arg\min_{S}\ \frac{\lambda_1}{\beta}\|S\|_{2,1} + \frac{1}{2}\Big(\|S - Z\|_F^2 + \|RS - E\|_F^2\Big).
\]

Differentiation with respect to $S$ yields the following equation for estimating $S$,

\begin{equation}
\frac{\lambda_1}{\beta}\,\Gamma S + (S - Z) + R^T(RS - E) = 0, \tag{2.7}
\end{equation}

where $\Gamma = \operatorname{diag}\big(\|S_{j\rightarrow}\|_2^{-1}\big)$ and $S_{j\rightarrow}$ is the $j$-th row of $S$. Solving Equation (2.7) iteratively for $S$ yields the solution $\hat{S}$.
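The closed-form update (2.6) (and, below, (2.9)) is a singular value thresholding step. The following is a small NumPy sketch of the component-wise soft threshold and the SVT operator, consistent with the derivation above but written here only as an illustration.

```python
import numpy as np

def soft_threshold(x, alpha):
    """Component-wise T_alpha(x) = sign(x) * max(|x| - alpha, 0)."""
    return np.sign(x) * np.maximum(np.abs(x) - alpha, 0.0)

def svt(W, tau):
    """Singular value thresholding: the minimizer of tau*||L||_* + 0.5*||L - W||_F^2,
    i.e., U T_tau(Sigma) V^T as in (2.6)."""
    U, sigma, Vt = np.linalg.svd(W, full_matrices=False)
    return (U * soft_threshold(sigma, tau)) @ Vt

# The L-update of (2.6): L_hat = svt(M / beta + X - S, 1.0 / beta)
```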
2.4.3 Derivation of the update step for R

To derive the update for the matrix $R$, we minimize with respect to $R$ while keeping all other key matrices constant: $\hat{R}_{k+1} = \arg\min_{R}\mathcal{L}(L_{k+1}, S_{k+1}, R, S_{r,k}, M_k, N_k, V_k, \beta_k)$. Dropping the indices $k$ and $k+1$ and discarding the terms that are not a function of $R$ yields

\begin{align*}
\hat{R} &= \arg\min_{R}\ \langle N, RR^T\rangle + \langle V, -RS\rangle + \frac{\beta}{2}\Big(\|RR^T - I_m\|_F^2 + \|S_r - RS\|_F^2\Big)\\
&= \arg\min_{R}\ \frac{1}{2}\Big(\big\|RR^T - (I_m - \beta^{-1}N)\big\|_F^2 + \big\|RS - (\beta^{-1}V + S_r)\big\|_F^2\Big).
\end{align*}

Let $K = I_m - \beta^{-1}N$, and use $E$ as defined before. Then

\[
\hat{R} = \arg\min_{R}\ \frac{1}{2}\Big(\|RR^T - K\|_F^2 + \|RS - E\|_F^2\Big).
\]

Differentiation with respect to $R$ yields the following equation for estimating $R$,

\begin{equation}
2RR^TR - (K^T + K)R + RSS^T - ES^T = 0. \tag{2.8}
\end{equation}

Again, solving Equation (2.8) iteratively provides the estimate $\hat{R}$.

2.4.4 Derivation of the update step for S_r

To derive the update for the matrix $S_r$, we minimize with respect to $S_r$: $\hat{S}_{r,k+1} = \arg\min_{S_r}\mathcal{L}(L_{k+1}, S_{k+1}, R_{k+1}, S_r, M_k, N_k, V_k, \beta_k)$. Dropping the indices and removing the terms that are not a function of $S_r$ yields

\begin{align*}
\hat{S}_r &= \arg\min_{S_r}\ \lambda_2\|S_r\|_* + \langle V, S_r\rangle + \frac{\beta}{2}\|S_r - RS\|_F^2\\
&= \arg\min_{S_r}\ \frac{\lambda_2}{\beta}\|S_r\|_* + \frac{1}{2}\big\|S_r - (RS - \beta^{-1}V)\big\|_F^2.
\end{align*}

Let $J = RS - \beta^{-1}V$. Then

\[
\hat{S}_r = \arg\min_{S_r}\ \frac{\lambda_2}{\beta}\|S_r\|_* + \frac{1}{2}\|S_r - J\|_F^2.
\]

Then, using the result from the Singular Value Thresholding algorithm [43-45],

\begin{equation}
\hat{S}_r = U\,\mathcal{T}_{\frac{\lambda_2}{\beta}}(\Sigma)\,V^T, \tag{2.9}
\end{equation}

where $U\Sigma V^T$ is the SVD of $J$, and $\mathcal{T}_\alpha(\cdot)$ is the soft-thresholding operator defined previously.

Combining these solutions (described by Equations (2.6) through (2.9)), the ALM-based approach is summarized in Algorithm 1. After the algorithm terminates, we extract the positions of the object, as well as its velocity, from the recovered matrix $S$, as described in Section 2.7. We define the composed error at iteration $k$ as

\begin{equation}
e_k = \|X - L_k - S_k\|_F^2 + \|R_kR_k^T - I_m\|_F^2 + \|S_{r,k} - R_kS_k\|_F^2. \tag{2.10}
\end{equation}

Algorithm 1 LE-ALM: Estimation of L and S via joint sparse and low-rank optimization
1: Given: X, λ1, λ2, β and ρ (ρ > 1).
2: while not converged do
3:   Obtain L, S, R, and S_r via (2.6) through (2.9), respectively.
4:   Update the Lagrange multipliers via (2.4).
5:   Update β: β_{k+1} = ρ β_k.
6:   Set k = k + 1 and go to step 2.
7: end while

Tuning of the various parameters $\lambda_1$, $\lambda_2$, $\beta$ and $\rho$ is given in Section 2.7. Convergence is declared when the error in each of the three constraints falls below a predefined accuracy $\gamma$; more detail is given in Section 2.7.

2.5 Cramér–Rao bound

In this section, we generalize the derivations presented in [54-56] to obtain the CRB for our new problem. Herein, we answer the question of how the CRB changes as we move from classical RPCA to a more constrained RPCA that exploits the special structure of line estimation. We define $y = \mathcal{A}(L + S) + n$, where $\mathcal{A}$ is a linear operator, $\mathcal{A}:\mathbb{R}^{m\times n}\rightarrow\mathbb{R}^{p}$, and $n$ is a Gaussian-distributed noise vector $\mathcal{N}(0, \Sigma)$. Hence, using matrix-vector notation, with $l = \operatorname{vec}(L)$ and $s = \operatorname{vec}(S)$, we have

\begin{equation}
y = A(l + s) + n, \tag{2.11}
\end{equation}

where $A \in \mathbb{R}^{p\times mn}$ is the matrix corresponding to the linear operator $\mathcal{A}$, and $\operatorname{vec}(\cdot)$ stacks the columns of a matrix. (We underscore that the vector $l = \operatorname{vec}(L)$ should not be confused with the all-ones vector $\mathbf{1}$.) We define the set

\begin{equation}
\mathcal{X}_{l,s} \stackrel{\text{def}}{=} \big\{(L,S)\in\mathbb{R}^{m\times n}\times\mathbb{R}^{m\times n} : \operatorname{rank}(L)\leq r,\ \|S\|_0\leq s,\ \operatorname{rank}(S)=1\big\}. \tag{2.12}
\end{equation}
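Before moving on, a compact sketch of Algorithm 1's outer loop is given below, reusing svt from the previous sketch. The helpers solve_S and solve_R are hypothetical placeholders for iterative solvers of the stationarity equations (2.7) and (2.8), which the text solves iteratively but does not spell out; the initialization of beta follows the choice reported in Section 2.7.

```python
import numpy as np

def le_alm(X, svt, solve_S, solve_R, lam1=0.4, lam2=2.5, rho=1.1,
           tol=1e-6, max_iter=500):
    """Outer loop of LE-ALM (Algorithm 1); solve_S / solve_R solve (2.7) / (2.8)."""
    m, n = X.shape
    L = np.zeros((m, n)); S = np.zeros((m, n))
    R = np.eye(m);        Sr = np.zeros((m, n))
    M = np.zeros((m, n)); N = np.zeros((m, m)); V = np.zeros((m, n))
    beta = 30.0 / np.linalg.norm(np.sign(X), 2)        # beta_0 from Section 2.7
    for _ in range(max_iter):
        L = svt(M / beta + X - S, 1.0 / beta)          # update (2.6)
        S = solve_S(L, R, Sr, M, V, X, beta, lam1)     # iterative solve of (2.7)
        R = solve_R(S, Sr, N, V, beta)                 # iterative solve of (2.8)
        Sr = svt(R @ S - V / beta, lam2 / beta)        # update (2.9)
        M = M + beta * (X - L - S)                     # multiplier updates (2.4)
        N = N + beta * (R @ R.T - np.eye(m))
        V = V + beta * (Sr - R @ S)
        e = (np.linalg.norm(X - L - S, "fro") ** 2
             + np.linalg.norm(R @ R.T - np.eye(m), "fro") ** 2
             + np.linalg.norm(Sr - R @ S, "fro") ** 2)  # composed error (2.10)
        if e < tol:
            break
        beta *= rho                                     # beta_{k+1} = rho * beta_k
    return L, S, R, Sr
```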
The first inequality in the definition of $\mathcal{X}_{l,s}$ represents the low-rank condition, and the second represents the sparsity constraint. To enforce the line trajectory in the structure of $S$, the third condition in $\mathcal{X}_{l,s}$ has been added; it is this third condition that necessitates adapting the results of [54] to our case. The rank-one condition presumes that the trajectory is already in the linear form of a horizontal or vertical line. Thus, the uncertainty of finding the appropriate rotation $R$ is removed from this computation. In this way, our computed CRB is a best-case scenario and should lie below the CRB that incorporates the estimation of $R$. We will see in the numerical results (Section 2.7) that the estimation of $R$ in fact introduces limited uncertainty: the idealized CRB is quite close to the achieved performance even though $R$ must still be found. In our numerical results, random lines are generated with arbitrary lengths and angles, and the auxiliary rotation is found via the LE-ALM algorithm. We then average the mean-squared error not only over the noise, but also over all of these possible realizations of the trajectory. Therefore, the closeness of the simulated algorithm performance to the idealized CRB suggests near-optimality of our proposed method.

Following the derivations in [54], there are no computational differences in handling the $L$ matrix, which captures the low-rank background information both in our work and in [54]. Differences arise in handling the sparse component $S$: herein, $S$ has the additional constraint of being rank one.

Let $L = U_0\Lambda_0V_0^T$ be the SVD of $L$, where $U_0 = [u_1, u_2, \ldots, u_r] \in \mathbb{R}^{m\times r}$, $\Lambda_0 = \operatorname{diag}([\lambda_1, \lambda_2, \ldots, \lambda_r]) \in \mathbb{R}^{r\times r}$, and $V_0 = [v_1, v_2, \ldots, v_r] \in \mathbb{R}^{n\times r}$. Furthermore, let $U_1$ and $V_1$ be unit orthogonal bases for the spaces orthogonal to $\operatorname{span}\{U_0\}$ and $\operatorname{span}\{V_0\}$, respectively. Then, we define $Q_l = [V_1\otimes U_0,\ V_0\otimes U_0,\ V_0\otimes U_1] \in \mathbb{R}^{mn\times[(m+n)r - r^2]}$, where $\otimes$ denotes the Kronecker product. As $Q_l$ is only a function of $L$, this $Q_l$ is the same as that in [54]. We next define $Q_s \in \mathbb{R}^{mn\times\min(m+n-1,\,s)}$ to be the matrix whose columns are $\operatorname{vec}\big(e_i^m(e_j^n)^T\big)$, $(i,j)\in\mathcal{S}$, where $\mathcal{S} = \operatorname{supp}(S) = \{(i,j)\in[m]\times[n] : S_{ij}\neq 0\}$. This $Q_s$ is, however, different from the $Q_s$ in [54]: herein, we need to ensure that $S$ is a rank-one matrix, per our signal model.

Theorem 1 For $(L,S)\in\mathcal{X}_{l,s}$, the MSE of any locally unbiased estimator [54, 55, 58, 60] $(\hat{L}, \hat{S})$ satisfies

\begin{equation}
\mathrm{MSE}_{L,S} \geq \operatorname{tr}\left(\left(\begin{bmatrix} Q_l^TA^T \\ Q_s^TA^T \end{bmatrix}\Sigma^{-1}\begin{bmatrix} AQ_l & AQ_s \end{bmatrix}\right)^{-1}\right), \tag{2.13}
\end{equation}

where $Q_l$, $Q_s$, and $A$ are as defined before.
Proof: see Appendix A.1.

Corollary 1 Let $\Sigma = \sigma^2I_p$. Moreover, for any set $\Omega$, let $P_\Omega \in \mathbb{R}^{mn\times mn}$ denote a diagonal matrix whose diagonal entries are one for indices in $\Omega$ and zero otherwise. Let $A$ be the selection operator that observes the entries of $L$ indexed by $\Omega$, which are randomly and uniformly selected, and let $\mathcal{S}$ be a random and uniform subset of $\Omega$. For RPCA, i.e., our case, with $\Omega \subseteq [m]\times[n]$ of size $p$, and under the assumption $p - \min(m+n-1,\,s) \geq c\,(N\log^3N)^2\log(N^2/\epsilon)$, where $N = (m+n)r - r^2$ and $c$, $\epsilon$ are constants, we have

\begin{equation}
\left(\min(s, m{+}n{-}1) - N + \frac{1}{3}\frac{pN}{p - \min(s, m{+}n{-}1)} + \frac{2}{3}\frac{mnN}{p - \min(s, m{+}n{-}1)}\right)\sigma^2 \leq \mathrm{CRB} \leq \left(\min(s, m{+}n{-}1) - N + 3\,\frac{pN}{p - \min(s, m{+}n{-}1)} + 2\,\frac{mnN}{p - \min(s, m{+}n{-}1)}\right)\sigma^2 \tag{2.14}
\end{equation}

with probability greater than $1 - 10e^{-c\epsilon^2}$. Further, if $p = mn$, we have

\begin{equation}
\left(\min(s, m{+}n{-}1) - N + \frac{mnN}{mn - \min(s, m{+}n{-}1)}\right)\sigma^2 \leq \mathrm{CRB} \leq \left(\min(s, m{+}n{-}1) - N + 5\,\frac{mnN}{mn - \min(s, m{+}n{-}1)}\right)\sigma^2 \tag{2.15}
\end{equation}

with probability greater than $1 - 10e^{-c\epsilon^2}$.
Proof: see Appendix A.2.

Examining our obtained CRB relative to the previous CRB without the rank-one constraint (in [54], Equation (26) therein), the difference is the term $\min(m+n-1,\,s)$ instead of $s$; this results from the different $Q_s$ matrix. We selected $(m+n)r - r^2 + \min(m+n-1,\,s)$ linearly independent rows of $[Q_l\ Q_s]$ through the operation $[AQ_l\ AQ_s]$ (see the right-hand side of (2.13)); this requires $A\in\mathbb{R}^{p\times mn}$ with $p$ sufficiently large to select these $(m+n)r - r^2 + \min(m+n-1,\,s)$ linearly independent rows of $[Q_l\ Q_s]$. Intuitively, it makes sense that this term is possibly smaller than the original, as we have fewer degrees of freedom in our current problem due to the linear trajectory constraint. This effect of $S$, and accordingly $Q_s$, propagates through the derivations, resulting in the bounds provided in (2.14) and (2.15). For a low sparsity level $s$, both CRBs approach each other, since in our newly obtained CRB the term $\min(m+n-1,\,s)$ equals $s$, and the bound reduces to the old CRB. On the other hand, as $s$ increases, the term $m+n-1$ dominates, and therefore our bound gives lower values than the old CRB.

2.6 Convergence Analysis

A legitimate question is whether the multiple iterative optimizations considered in this work (in Algorithm 1) converge. Given the four alternating terms in Equation (2.5), this analysis is an open problem [45]; however, we can adapt the derivations in [1] to conduct a boundedness analysis and confirm the feasibility of the accumulation points of the algorithm. We prove that the accumulation points do not diverge with the iterations and, in fact, exist. Furthermore, we prove that the composed error $e_k$ converges to zero.

Theorem 2 Any accumulation point $(L^*, S^*, R^*, S_r^*)$ of $(L_k, S_k, R_k, S_{r,k})$ is a feasible solution (i.e., $X = L^* + S^*$, $R^*R^{*T} = I_m$, and $S_r^* = R^*S^*$), and the composed error $e_k$ in (2.10) converges to zero, i.e., $\lim_{k\rightarrow\infty} e_k = 0$.
Proof: see Appendix A.3.

Theorem 3 The sequences $\{L_k\}$, $\{S_k\}$, $\{R_k\}$, and $\{S_{r,k}\}$ generated by Algorithm 1 are bounded.
Proof: see Appendix A.4.

These theorems indicate that the iterations of the algorithm are guaranteed to yield a decomposition into the low-rank plus sparse model $L + S$, with an appropriate transformation that leads to the low-rank matrix $S_r$, and thus the recovery of a (nearly) straight line is achieved. Theorem 2 proves that the accumulation points (if any) of $(L_k, S_k, R_k, S_{r,k})$ approach a feasible solution to the required decomposition with a composed error approaching zero, whereas Theorem 3 proves that such accumulation points do, in fact, exist. Proving that the solution we reach is globally optimal, in terms of exact recovery of position and velocity, needs further study. However, our simulation results (on both synthetic and real data) yield high performance for position and velocity estimation, suggesting that the alternating scheme may in fact be near optimal. Additionally, our simulation results show that we are extremely close to the CRB for the matrix estimation.
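As a quick numeric illustration (not part of the thesis), the closed-form bounds of (2.15) for the fully observed case $p = mn$ can be evaluated directly; the parameter values below mirror the synthetic setting of Section 2.7.1, and the sparsity level is an arbitrary example.

```python
def crb_bounds_full_observation(m, n, r, s, sigma2):
    """Lower and upper bounds on the CRB from (2.15) when p = mn."""
    N = (m + n) * r - r * r        # degrees of freedom of the low-rank component
    t = min(s, m + n - 1)          # sparsity term under the rank-one line constraint
    lower = (t - N + m * n * N / (m * n - t)) * sigma2
    upper = (t - N + 5 * m * n * N / (m * n - t)) * sigma2
    return lower, upper

# Example: m = 120, n = 300, r = 5 as in the synthetic data of Section 2.7.1.
lo, hi = crb_bounds_full_observation(120, 300, 5, 100, 1e-3)
```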
2.7 Simulations

2.7.1 Synthetic Data

Herein we consider numerical validation of our method, LE-ALM, in a noisy environment, and compare to two background subtraction techniques, [2] and [3], as well as three target tracking techniques, [27], [28] and [29]. Both [2] and [3] exploit RPCA and employ background subtraction. In [2], it was observed that the received sequence is comprised of two terms, a low-rank matrix (background) and a structured sparse outlier matrix (foreground); a saliency measure is then proposed to dynamically estimate the support of the foreground. In [3], a similar technique is devised that takes into consideration more information about the object (the sparse foreground structures), i.e., structured sparsity in addition to element-wise sparsity. Herein, we consider additional side information about the object over that considered by both of these techniques: the linear nature of the trajectory. From now on, we denote the method of [2] as Background Subtraction based on Matrix Decomposition (BS-MD) and that of [3] as Efficient Background Subtraction based on Matrix Decomposition (EBS-MD), where the abbreviation BS emphasizes that both methods are based on background subtraction. Since they do not exploit the side information of an object moving on a line, as we do, we expect performance improvement relative to these schemes.

In [27], a technique to detect moving objects in videos with dynamic background is proposed via a two-step algorithm. First, real objects are detected by separating objects from turbulence-induced motions using a two-level thresholding approach; the input source is a degraded video. Second, a generalized regression neural network is exploited to track the previously detected objects throughout the frames of the video. In [28], an object tracking algorithm that exploits a sparsity-based discriminative classifier (SDC) and a sparsity-based generative model (SGM) is proposed. In the SDC model, an efficient method is introduced to calculate the confidence value, with more weight assigned to the background than to the foreground. In the SGM model, a histogram-based technique that considers the spatial information of each patch is developed. In [29], detection and tracking are combined: detection is achieved by exploiting the RPCA model together with a graph cut algorithm, whereas tracking is achieved via fractal analysis; both are considered simultaneously in the same optimization problem. While the advantage of the first method [27] is dealing with dynamic backgrounds, the advantage of the second method [28] is dealing with complex backgrounds, and the advantage of the third method [29] is its use of fractal analysis for better tracking results, none of them takes into account the inherent side structure that the moving object might have. Herein, we take the special structure of the moving object into consideration, and hence, in addition to nearly perfect recovery of the background, we obtain excellent performance for object detection. We refer to [27] as TT-GRNN (Target Tracking using a Generalized Regression Neural Network), [28] as ROT-CM (Robust Object Tracking using a Collaborative Model), and [29] as GODT (Group Object Detection and Tracking).

The noise instances are taken entry-wise from the matrix $Z_0$, with $Z_0(i,j) \sim \mathcal{N}(0, \sigma^2)$. The matrices $X$, $L_0$, $S_0$, and $Z_0$ are of size $m\times n = 120\times 300$.
The matrix $L_0$ is low rank (with rank $r = 5$), and is generated as $L_0 = UV^T$, where $U$ is of size $m\times r$ and $V$ is of size $n\times r$. Both $U$ and $V$ have i.i.d. Gaussian entries whose standard deviation is chosen to make the singular values of $L_0$ much larger than those of $Z_0$ (we set it to 10). The entries of $S_0$ are independently distributed, each taking the value 0 with probability $p = 0.8$; object values are generated with probability $1 - p = 0.2$, with an arbitrarily fixed preselected value that represents the intensity of the object. Each run of a simulation (out of a total of $10^5$ runs) has randomly generated velocities drawn from a discrete uniform distribution on $[-10, 10]$. To generate a linear trajectory, the equation that describes the motion from position $(i,j)$ to position $(k,l)$ with velocity $v_{kl,ij}$ is $(k,l) = (i,j) + v_{kl,ij}$. This equation is translated into two equations, for the $x$ and $y$ axes respectively, as follows: $k = i + v_{kl,ij}|\cos(\gamma_{kl,ij})|$ and $l = j + v_{kl,ij}|\sin(\gamma_{kl,ij})|$, where $\gamma_{kl,ij} = \tan^{-1}\big((l - j)/(k - i)\big)$.
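The following sketch generates one synthetic realization in the spirit of this setup: a rank-5 background $L_0$, a sparse $S_0$ carrying a straight-line trajectory, and additive Gaussian noise. It is illustrative only; the start row, slope, and intensity of the trajectory are assumed values, not the thesis's exact choices.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, r = 120, 300, 5

# Low-rank background L0 = U V^T with i.i.d. Gaussian factors (std 10, as in the text).
U = 10.0 * rng.standard_normal((m, r))
Vf = 10.0 * rng.standard_normal((n, r))
L0 = U @ Vf.T

# Sparse foreground S0: a straight-line trajectory of fixed intensity.
S0 = np.zeros((m, n))
i0, slope, intensity = 20, 0.3, 1.0      # assumed start row, slope, and object value
for j in range(n):
    i = int(round(i0 + slope * j))
    if 0 <= i < m:
        S0[i, j] = intensity

X = (L0 + S0)
X = X / np.linalg.norm(X, "fro")         # normalization applied before adding noise
snr = 1e3                                 # linear SNR; noise variance = 1 / SNR
X_noisy = X + rng.standard_normal((m, n)) * np.sqrt(1.0 / snr)
```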
The Normalized Mean Square Error (NMSE) for the estimated positions and velocities are, respectively, given by, ||p 0 − ^ p|| 2 2 ||p 0 || 2 2 , and ||v 0 − ^ v|| 2 2 ||v 0 || 2 2 , where (p 0 ,v 0 ) are ground truth vectors, and ( ˆ p, ˆ v) are the estimates. The NMSE results of position and velocity estimation are presented in Fig. 2.3 and Fig. 2.4, resepctively; clearly showing the performance improvement offered by LE-ALM over both EBS-MD 25 10 20 30 40 50 SNR (dB) 0.001 0.003 0.005 NMSE of position LE-ALM EBS-MD BS-MD TT-GRNN ROT-CM GODT Figure 2.3. NMSE of Position Estimation versus SNR. 10 20 30 40 50 SNR (dB) 0.001 0.003 0.007 NMSE of velocity LE-ALM EBS-MD BS-MD TT-GRNN ROT-CM GODT Figure 2.4. NMSE of Velocity Estimation versus SNR. and BS-MD as well as the three target tracking methods [27–29] referred to TT-GRNN, ROT-CM and GODT, respectively. Finally, we assess the performance loss if the object trajectory deviates from the line model. Herein, when generating the matrixX, the positions of the moving object are on a piecewise linear trajectory composed of two connected line segments. Positions and velocities inS 0 are generated as before, but with two line-segments, rather than one, with the angle between the two segments denoted by θ; θ = π is thus a straight line. This angleθ is a measure of how linear the trajectory is. Herein, comparison with background subtraction methods suffices—since they do better than the target tacking methods as evidenced by Fig. 2.3 and Fig. 2.4. While our algorithm performs much better for linear trajectory estimation, it performs as good as other algorithms do for modest values ofθ as shown in Fig. 2.5 and Fig. 2.6, LE-ALM retains its superiority over other algorithms as far asθ approaches 180 o (the two lines reduce to one straight line). 26 100 120 140 160 180 (degrees) 0.001 0.003 0.006 NMSE of position LE-ALM EBS-MD BS-MD SNR= 50 dB SNR= 10 dB SNR= 30 dB Figure 2.5. NMSE of Position Estimation versus SNR. 100 120 140 160 180 (degrees) 0.001 0.003 0.006 NMSE of velocity LE-ALM EBS-MD BS-MD SNR= 10 dB SNR= 30 dB SNR= 50 dB Figure 2.6. NMSE of Velocity Estimation versus SNR. 2.7.2 Real Data Herein, we apply our method to real video data from [62, 63]. In [62, 63], there is a video of two moving boats following nearly linear trajectories. The following pre-processing was necessary in order to apply our method. Specifically, we extracted the segment between seconds 51 and 72, which correspondes to an approximately straight line of movement; see [64]. This part of the video is converted into a sequence of 641 frames (images). Each frame is of size 741× 1920. We focus on single object tracking—thus, the second small boat is removed. We used frames of size 741× 900. We used 10 frames. yielding matrices of size (741× 900)× 10 = 666900× 10. These matrices are the matrices used in simulations to estimate the matricesL andS. Since these matrices are too much large to display as images, what we do display as images are the ten images added together in one matrix of size 741× 900. To help the reader visualize the process, frame 71 is shown in Fig. 2.7. We tested our technique for different values of synthetic noise represented through the parameter SNR (as defined in previous subsections). For convenience, Fig. 2.8 27 shows the noisy version of frame 71 for SNR=30 dB. Fig. 2.9 represents the ideal scenario; whereas Fig. 2.10, as an example, is the estimated sequences for SNR=30 dB. 
The white objects represent the moving object for the ten frames stacked on top of each other. Table 2.1 summarizes the NMSE of the recovered background matrix and the recovered sparse object matrix for various SNR values. As can be seen from the table, the proposed LE-ALM method provides strong estimation results, with very small error even in the low-SNR regime.

Table 2.1. NMSE for L and S for various SNR values

SNR (dB)    e_L              e_S
5           7.89 x 10^-5     5.71 x 10^-4
10          6.97 x 10^-5     4.83 x 10^-4
15          5.34 x 10^-5     3.84 x 10^-4
20          4.92 x 10^-5     2.21 x 10^-4
25          3.71 x 10^-5     1.73 x 10^-4
30          2.69 x 10^-5     1.24 x 10^-4
35          1.94 x 10^-5     0.89 x 10^-4
40          0.85 x 10^-5     0.62 x 10^-4
45          0.21 x 10^-5     0.36 x 10^-4
50          0.013 x 10^-5    0.18 x 10^-4

A legitimate question is what would have happened had we kept the second (smaller) boat. Although in this work we are only interested in single-object estimation, we simulated the case in which the second boat is kept, and we encounter a performance loss of just 0.5 dB in position estimation and 0.88 dB in velocity estimation; both values are averaged over all values of SNR.

2.7.3 CRB Comparisons

In this subsection, we investigate the possible optimality of our method by comparing to the CRB derived in Section 2.5. Here we use synthetic data, where the matrices $X$, $L_0$, $S_0$, and $Z_0$ are generated in the same way as discussed in Section 2.7.1.

Figure 2.7. Frame 71.
Figure 2.8. Frame 71 with noise (SNR = 30 dB).
Figure 2.9. Ground Truth.
Figure 2.10. Estimated Sequence, SNR = 30 dB.

First, we compare our estimation error to the obtained CRB, with the orientation of the lines limited to vertical or horizontal lines only; this is depicted in Fig. 2.11. The two curves lie on top of each other, which means that our estimator achieves outstanding results for vertical or horizontal line estimation.

Figure 2.11. Comparison for vertical (or horizontal) lines for different SNR values.

Second, we compare our estimator's results to the computed CRB for various types of lines. We take into consideration lines that swing from a horizontal line ($\theta = 0^{\circ}$) to a vertical line ($\theta = 90^{\circ}$) through the parameter $\theta$. We average over two forms of randomness: for a fixed $\theta$, we generate numerous lines that correspond to this value; we also average over the noise for a specific SNR value. Fig. 2.12 shows the result of this comparison for different SNR values. This result shows that our estimation error is very small, i.e., very close to the CRB, for any type of line.

Figure 2.12. Comparison of lines with various slopes (various θ) for different SNR values.

Finally, Fig. 2.13 shows the NMSE of the matrix $X$ for our method in comparison with both the background subtraction methods and the target tracking methods, as well as the derived CRB. We compare to the previously noted methods: BS-MD [2], EBS-MD [3], TT-GRNN [27], ROT-CM [28] and GODT [29]. As these five methods do not exploit the linear side information, as we do, we expect a loss in performance relative to LE-ALM, as seen in Fig. 2.13. Furthermore, LE-ALM is the closest to the derived CRB, with a deviation of only 1.2 dB, on average. Finally, motivated by the work in [59], which shows that multiobjective optimization with multiple structures can do no better than exploiting only one of the structures, we take out the sparsity objective, i.e., $\lambda_1 \|S\|_{2,1}$, from the multiobjective optimization given in (2.1), and see how the performance is accordingly affected.
As can be seen from Fig. 2.13, the performance degrades, verifying that, for our particular problem of linear trajectory estimation, the multiobjective optimization does better than exploiting only a single structure.

Acknowledgments

We would like to thank the USC Robotic Embedded Systems Lab, in particular Hordur Heidarsson and Stephanie Kemna, for providing the video data used herein.

Figure 2.13. Comparison of all techniques with the CRB for different SNR values.

Chapter 3
Nonlinear Trajectory Tracking via Variety-Based Background Subtraction Models

3.1 Introduction

Target tracking has been persistently studied [26, 65] due to its use in several applications, e.g., traffic monitoring, video surveillance, navigation and human-computer interaction. Structured representations have recently been applied to target tracking, yielding better tracking results [1, 30, 53]. Background subtraction, which exploits robust principal component analysis (RPCA) [66], has attracted wide attention [1–3] and use in target tracking [1, 29–31]. In this framework, after a video is vectorized, the background is modeled as a low-rank component and the foreground as a sparse component.

In [35], low-rank matrix completion is extended beyond linear models to encompass cases where the data points belong to an algebraic variety. The rationale behind exploiting algebraic varieties [34] is that a myriad of non-linear curves and surfaces can be represented by a variety model. The key result of [35] is that the monomial features matrix is rank deficient if and only if the data points belong to a variety. A lower limit on the sampling rate is derived, and an efficient matrix completion algorithm that minimizes a convex or non-convex surrogate of the rank of the matrix of monomial features is provided. The kernel trick, which is commonly used in pattern analysis [67], is exploited to avoid dealing directly with the high-dimensional monomial matrix. Iteratively Reweighted Least Squares (IRLS) [68–70] is specifically chosen to solve the optimization, since it kernelizes very well in comparison to other matrix completion algorithms (e.g., singular value thresholding [71] and alternating minimization [72]).

In our prior work [73], we extended the background subtraction model considered in [2, 3] by adapting this low-rank and sparse framework to target tracking for objects on nearly linear trajectories. The expected linear trajectory is translated into additional constraints for the optimization; in particular, a secondary low-rank constraint is imposed. In [53, 73, 74], the optimization framework is presented, convergence is studied via a boundedness analysis, the Cramér–Rao bound (CRB) is derived for testing the goodness of our estimator, and simulation results on both synthetic and real data are provided, which show superior estimation results over other methods that do not consider the additional side structure.

Herein, we further extend the background subtraction model to incorporate non-linear trajectory tracking. We consider a particular category of nonlinearities: each sparse entry has a mapping to a 2D (or 3D) point in space, where these space points belong to an "algebraic variety," i.e., each space point is a solution to a system of polynomial equations.
Herein, we propose the new variety-based background subtraction framework to handle non-linear trajectories. The sparse trajectory is assumed to belong to an algebraic variety, necessitating an additional low-rank penalty function in the overall objective. The main contributions of this chapter are summarized as follows:

1. A variety-based background subtraction model is proposed. The multiobjective optimization is formulated such that the following three components are simultaneously optimized: the sparse trajectory, the low-rank background and the variety representing the non-linearity of the trajectory.
2. The efficient Variety-based Background Subtraction Iteratively Reweighted Least Squares (VBSI) algorithm is proposed to solve the optimization.
3. We prove that key iterates of our algorithm are bounded at every iteration, suggesting convergence of the algorithm. Furthermore, we show that any limit point of the algorithm is a stationary point. Finally, the error of the target component is shown to converge to zero.
4. The sensitivity of the iterative algorithm to its initialization is characterized.
5. Perfect recovery is observed for the noiseless scenario, and an improvement of 2.4 dB is observed over prior work for the noisy scenario, with fast convergence.

The rest of this chapter is organized as follows: the signal model and optimization are introduced in Section 3.2. In Section 3.3, the multi-objective optimization is solved and the VBSI algorithm is provided. Convergence analysis and performance bounds are studied in Section 3.4. Practical considerations are provided in Section 3.5. Simulation results are presented in Section 3.6. Appendices B.1 through B.4 provide the proofs of the technical results.

Notation: scalar values are denoted by lower-case letters, $x$, and vectors by bold lower-case letters, $\mathbf{x}$. A matrix is denoted by bold capital letters, $\mathbf{X}$. The transpose of $\mathbf{X}$ is given by $\mathbf{X}^T$ and its conjugate transpose by $\mathbf{X}^H$. The identity matrix is denoted $\mathbf{I}$, with its dimension given in the subscript. The set of real numbers is denoted by $\mathbb{R}$, and the set of complex numbers by $\mathbb{C}$. The Kronecker product is denoted by $\otimes$ and the entry-wise (Hadamard/Schur) product by $\circ$. Other notation is defined throughout the chapter whenever needed.

3.2 Signal Model and Optimization Problem

3.2.1 RPCA: Prior related work

The background subtraction model considered in [2, 3] is $X = L + S$, where $X, L, S \in \mathbb{R}^{m \times n}$, $m \geq n$. The matrix $X$ is the noiseless observation matrix that collects the observed frames, which are vectorized. The matrix $L$ is low rank and represents the background information. The matrix $S$ is sparse and describes the object of interest. Specifically, the optimization considered in [3] is
$$\arg\min_{L,S} \|L\|_* + \lambda_1 \|S\|_{2,1} \quad \text{s.t.} \quad X = L + S, \qquad (3.1)$$
where the nuclear norm is used to capture the low-rank component, the mixed $\ell_{2,1}$ norm is used to capture the sparse component, and the problem is constrained by the decomposition of the observation matrix $X$ into $L$ and $S$. The nuclear norm of a general matrix $A \in \mathbb{R}^{m \times n}$ is defined as $\|A\|_* = \mathrm{Trace}\big(\sqrt{A^T A}\big) = \sum_{i=1}^{\min(m,n)} \sigma_i(A)$. The mixed $\ell_{2,1}$ norm of a general matrix $A \in \mathbb{R}^{m \times n}$ is defined as $\|A\|_{2,1} = \sum_{j=1}^{n} \sqrt{\sum_{i=1}^{m} a_{ij}^2}$. The problem in (3.1) is convex due to the convexity of the nuclear norm and the mixed $\ell_{2,1}$ norm, as well as the imposed linear constraint.
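As a concrete reference for the two regularizers in (3.1), the following sketch evaluates the nuclear norm and the mixed $\ell_{2,1}$ norm directly from their definitions; the matrices and the weight are illustrative.

```python
import numpy as np

def nuclear_norm(A):
    # Sum of the singular values of A.
    return np.linalg.svd(A, compute_uv=False).sum()

def l21_norm(A):
    # Sum of the Euclidean norms of the columns of A.
    return np.linalg.norm(A, axis=0).sum()

rng = np.random.default_rng(0)
L = rng.standard_normal((100, 5)) @ rng.standard_normal((5, 20))   # low rank
S = np.where(rng.random((100, 20)) > 0.9, 1.0, 0.0)                # sparse
lam1 = 0.4
print(nuclear_norm(L) + lam1 * l21_norm(S))   # objective value of (3.1)
```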
Since convex approximations are not as tight as their non-convex counterparts, there are non-convex variants [75] (Schatten-$p$ norm and an $\ell_q$ norm) and [76] (capped $\ell_1$ norm and trace) that yield better results, at the expense of losing the global optimality ensured by convexity. We will consider the Schatten-$p$ norm herein.

3.2.2 Preliminaries: Algebraic varieties and lifting

An algebraic variety is the solution set of a system of polynomial equations [34]. Mathematically, an algebraic variety $V$ is defined as
$$V = \{y \in \mathbb{R}^m : p_1(y) = p_2(y) = \cdots = p_q(y) = 0\}. \qquad (3.2)$$

In [35], the matrix completion problem is extended beyond the linear framework. For matrix completion, denote the matrix to be completed by $Y = [y_1, \ldots, y_n] \in \mathbb{R}^{m \times n}$, where each column $y_i \in \mathbb{R}^m$ is viewed as one data point; thus $Y$ comprises $n$ data points. The goal in [35] was to complete the matrix $Y$ under the constraint that these data points are drawn from a non-linear model, specifically a variety. In [35], the mapping $\varphi_d : \mathbb{R}^m \to \mathbb{R}^M$ is defined such that it sends the vector $y = [y_1, \ldots, y_m]$ to the vector of all monomials of $y_1, \ldots, y_m$ of degree at most $d$. Mathematically,
$$\varphi_d(y) = (y^{\alpha})_{|\alpha| \leq d} \in \mathbb{R}^M, \qquad (3.3)$$
where $\alpha = (\alpha_1, \ldots, \alpha_m)$ is a multi-index of non-negative integers, with $y^{\alpha} \triangleq y_1^{\alpha_1} \cdots y_m^{\alpha_m}$ and $|\alpha| \triangleq \alpha_1 + \cdots + \alpha_m$. The matrix $\varphi_d(Y) \in \mathbb{R}^{M \times n}$ is the result of applying $\varphi_d$ to each column of $Y$; it is called the lifted matrix, since the dimension is increased from $m$ to $M$, where the latter is called the feature space dimension. The key result of [35] is that $\varphi_d(Y)$ is rank deficient if and only if the columns of $Y$ (the data points of $Y$) belong to an algebraic variety. The optimization problem can then be formulated as a rank minimization problem, via Schatten-$p$ quasi-norm minimization subject to a projection constraint.

Given the feature space dimension $M = \binom{m+d}{d} \sim \mathcal{O}(m^d)$, the kernel trick [67] is leveraged in [35] to create a more modest working dimension. For two matrices $Y = [y_1, \ldots, y_n] \in \mathbb{R}^{m \times n}$ and $Z = [z_1, \ldots, z_n] \in \mathbb{R}^{m \times n}$, the efficiently computable kernel function $k_d(Y, Z)$ is defined as
$$k_d(Y, Z) := \varphi_d(Y)^T \varphi_d(Z) = (Y^T Z + \mathbf{1})^d, \qquad (3.4)$$
where $(\cdot)^d$ is the entry-wise $d$-th power of a matrix and $\mathbf{1}$ is the all-ones matrix. The working dimension in (3.4) is $n$, which is much smaller than $M$.

3.2.3 Proposed optimization problem

A visualization of the system model is depicted in Fig. 3.1. The matrix $X$ consists of five frames, each of size $l \times k$. The object of interest is represented by black pixels. (Herein, without loss of generality, we represent the object by one pixel, i.e., one entry; the idea is easily extendable to an object represented by a rectangular window of pixels of size $w \times h$.) Additionally, the sparse entries are mapped into points in space ($p_1$ through $p_5$), where the $(x, y)$ pairs are stacked column-wise to construct a new matrix $S_0 \in \mathbb{R}^{2 \times 5}$. These space points (the columns of $S_0$) belong to a variety; in the provided example, it is $9y^2 - x^2 = 36$. It shall be noted that $S_0 \in \mathbb{R}^{2 \times k}$, where $k$ is the sparsity level of $S$, i.e., the number of nonzero entries in $S$. The idea is easily extendable to multi-dimensional structures, in which $S_0 \in \mathbb{R}^{b \times k}$, where $b$ is the dimension of the problem. In the sequel, without loss of generality, we assume 2D structures. The feature space dimension $M$ represents the number of unique polynomials in $m$ variables of degree at most $d$. This shall not be confused with $M_0$, defined as the number of unique polynomials in $m$ variables of degree exactly $d$; $M_0 = \binom{m+d-1}{d} = \binom{m+d-1}{m-1}$.
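A small numerical check of this machinery, using the variety $9y^2 - x^2 = 36$ from the example above: the kernel matrix $k_d(S_0, S_0) = (S_0^T S_0 + \mathbf{1})^d$ has the same rank as the lifted matrix $\varphi_d(S_0)$, so the rank deficiency can be verified without ever forming the $M$-dimensional features. The degree and number of sample points below are illustrative.

```python
import numpy as np
from math import comb

d, n = 2, 12
x = np.linspace(-10.0, 10.0, n)
y = np.sqrt((36.0 + x**2) / 9.0)       # points on the variety 9y^2 - x^2 = 36
S0 = np.vstack([x, y])                 # 2 x n matrix of space points

K = (S0.T @ S0 + 1.0) ** d             # entry-wise d-th power, Eq. (3.4)
M = comb(2 + d, d)                     # feature-space dimension, here 6
# One polynomial relation among the monomials => rank at most M - 1 = 5.
print(np.linalg.matrix_rank(K), "< min(n, M) =", min(n, M))
```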
Given how the matrix $X$ is constructed, we "lose" the algebraic variety embedded in the spatial representation. Thus, while $S$ captures the sparsity, it is $S_0$ that contains the spatial information. With the goal of determining the support of the sparse entries, we construct the mapping between $S$ and $S_0$. If we define the coordinate system axes as $x \in \mathbb{R}^{1 \times l}$ and $y \in \mathbb{R}^{k \times 1}$, then one can construct the matrix $C \in \mathbb{R}^{2 \times m}$ as
$$C = \begin{bmatrix} \mathrm{vec}(\mathcal{X})^T \\ \mathrm{vec}(\mathcal{Y})^T \end{bmatrix}, \qquad (3.5)$$
where $\mathcal{X} = \mathbf{1} \otimes x$, with $\mathbf{1} \in \mathbb{R}^{k \times 1}$, and $\mathcal{Y} = \mathbf{1} \otimes y$, with $\mathbf{1} \in \mathbb{R}^{1 \times l}$ (these are coordinate matrices, not to be confused with the observation matrix $X$). Under the assumption that the sparse matrix is binary, with ones representing the object pixels, the mapping is the matrix $C$, i.e., $S_0 = CS$. For the example provided in Fig. 3.1,
$$S_0 = CS = \begin{bmatrix} -10 & -5 & 0 & 5 & 10 \\ 3.88 & 2.6 & 2 & 2.6 & 3.88 \end{bmatrix},$$
where the columns of $S_0$ are the points $p_1$ through $p_5$ on the curve. The case of a non-binary matrix $S$ is discussed in Section 3.5.

Figure 3.1. Signal model example.

We define the mapping $\varphi_d : \mathbb{R}^2 \to \mathbb{R}^M$ such that it sends the vector $s_0 = [s_{01}, s_{02}]$ to the vector of all monomials of $s_{01}$ and $s_{02}$ of degree at most $d$; thus $M = \binom{2+d}{d}$. The lifted matrix $\varphi_d(S_0) \in \mathbb{R}^{M \times n}$ results from applying $\varphi_d$ to each column of $S_0$. From [35], $\varphi_d(S_0)$ is rank deficient if and only if the columns of $S_0$ belong to an algebraic variety. Since we already know that the columns of $S_0$ belong to an algebraic variety, $\varphi_d(S_0)$ is rank deficient. As in [35], we employ the Schatten-$p$ quasi-norm to promote low rank in our monomial matrix. Our final optimization problem is
$$\arg\min_{L,S,S_0} \|L\|_{S_q}^q + \lambda_1 \|S\|_{2,c}^c + \lambda_2 \|\varphi_d(S_0)\|_{S_p}^p \quad \text{subject to} \quad X = L + S, \quad CS = S_0, \qquad (3.6)$$
where the Schatten-$p$ quasi-norm $\|A\|_{S_p}$ of a general matrix $A$ is defined as $\|A\|_{S_p} = \big(\sum_{i=1}^{\min(m,n)} \sigma_i(A)^p\big)^{1/p}$, $0 < p \leq 1$. Note that the Schatten-$p$ quasi-norm reduces to the nuclear norm when $p = 1$ and to the Frobenius norm when $p = 2$. The mixed $\ell_{2,c}$ norm of a general matrix $A \in \mathbb{R}^{m \times n}$ is defined as $\|A\|_{2,c} = \big(\sum_{j=1}^{n} (\sum_{i=1}^{m} a_{ij}^2)^{c/2}\big)^{1/c}$. Further, $c = 1$ if the mixed $\ell_{2,1}$ norm is the intended sparsity norm. Note that, in (3.6), we have generalized the nuclear norm and the mixed $\ell_{2,1}$ norm of (3.1) to the Schatten-$q$ norm and the mixed $\ell_{2,c}$ norm, respectively. We reformulate (3.6) by integrating the constraints into the objective function:
$$\arg\min_{S} \|X - S\|_{S_q}^q + \lambda_1 \|S\|_{2,c}^c + \lambda_2 \|\varphi_d(CS)\|_{S_p}^p. \qquad (3.7)$$

3.3 Solving the Optimization

The major challenge in solving (3.7) is that the Schatten norm and the mixed $\ell_{2,c}$ norm are generally non-smooth, for example if $p$, $c$, or $q < 1$. Introducing regularization terms $\mu_i > 0$ [35, 70] is one way to smooth these norms. (One can choose $\mu_1 = \mu_2 = \mu_3 = \mu$; we assign different values to leave room for independent tuning. Further discussion is provided in Section 3.6.) Thus, (3.7) is rewritten as
$$\arg\min_{S} \mathcal{J}(S) = \arg\min_{S} \; \mathrm{tr}\big((X-S)^T(X-S) + \mu_1^2 I_n\big)^{q/2} + \lambda_1 \sum_{j=1}^{n} \big(\|S^{(j)}\|_2^2 + \mu_2^2\big)^{c/2} + \lambda_2\, \mathrm{tr}\big(\varphi_d^T(CS)\,\varphi_d(CS) + \mu_3^2 I_n\big)^{p/2}, \qquad (3.8)$$
where $S^{(j)}$ is the $j$-th column of $S$ and $I_n$ is the identity matrix of dimension $n$. Let
$$\mathcal{L}(S) = \mathrm{tr}\big((X-S)^T(X-S) + \mu_1^2 I_n\big)^{q/2}, \qquad (3.9)$$
$$\mathcal{Q}(S) = \sum_{j=1}^{n} \big(\|S^{(j)}\|_2^2 + \mu_2^2\big)^{c/2}, \quad \text{and} \qquad (3.10)$$
$$\mathcal{P}(S) = \mathrm{tr}\big(\varphi_d^T(CS)\,\varphi_d(CS) + \mu_3^2 I_n\big)^{p/2}. \qquad (3.11)$$
We endeavor to minimize $\mathcal{J}(S)$ via gradient methods; as such, we provide the gradient expressions with respect to $S$ for the constituent terms of the overall objective function: $\mathcal{L}$, $\mathcal{Q}$ and $\mathcal{P}$.
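For reference, the smoothed objective (3.8) can be evaluated term by term from (3.9)–(3.11); the sketch below does so with an eigenvalue routine for the matrix powers (the arguments are symmetric positive semidefinite). All inputs are illustrative.

```python
import numpy as np

def tr_pow(A, e):
    # trace(A^e) for a symmetric PSD matrix, computed via its eigenvalues.
    w = np.linalg.eigvalsh(A)
    return np.sum(np.clip(w, 0.0, None) ** e)

def J(S, X, C, lam1, lam2, q=1, c=1, p=1, d=3, mu=(1e-2, 1e-2, 1e-2)):
    n = S.shape[1]
    mu1, mu2, mu3 = mu
    R = X - S
    L_term = tr_pow(R.T @ R + mu1**2 * np.eye(n), q / 2)             # (3.9)
    Q_term = np.sum((np.sum(S**2, axis=0) + mu2**2) ** (c / 2))      # (3.10)
    K = ((C @ S).T @ (C @ S) + 1.0) ** d                             # kernel (3.4)
    P_term = tr_pow(K + mu3**2 * np.eye(n), p / 2)                   # (3.11)
    return L_term + lam1 * Q_term + lam2 * P_term
```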
Gradient of $\mathcal{L}(S)$:
$$\frac{\partial \mathcal{L}(S)}{\partial S} = q(S - X)\big((X-S)^T(X-S) + \mu_1^2 I_n\big)^{q/2 - 1} = q(S - X)M, \qquad (3.12)$$
where $M \triangleq \big((X-S)^T(X-S) + \mu_1^2 I_n\big)^{q/2 - 1}$.

Gradient of $\mathcal{Q}(S)$: consider first the column-wise differentiation, $\frac{\partial \mathcal{Q}(S)}{\partial S^{(j)}} = c\,S^{(j)} \big(\|S^{(j)}\|_2^2 + \mu_2^2\big)^{c/2 - 1}$. Thus,
$$\frac{\partial \mathcal{Q}(S)}{\partial S} = cSN, \qquad (3.13)$$
where $N$ is a diagonal matrix with $j$-th diagonal entry $N_{jj} \triangleq \big(\|S^{(j)}\|_2^2 + \mu_2^2\big)^{c/2 - 1}$.

Gradient of $\mathcal{P}(S)$:
$$\mathcal{P}(S) = \mathrm{tr}\big(\varphi_d^T(CS)\,\varphi_d(CS) + \mu_3^2 I_n\big)^{p/2} \stackrel{(a)}{=} \mathrm{tr}\big(k_d(CS, CS) + \mu_3^2 I_n\big)^{p/2} \stackrel{(b)}{=} \mathrm{tr}\Big(\big((CS)^T(CS) + \mathbf{1}\big)^d + \mu_3^2 I_n\Big)^{p/2} \qquad (3.14)$$
$$= \mathrm{tr}\big((S^T D S + \mathbf{1})^d + \mu_3^2 I_n\big)^{p/2}, \qquad (3.15)$$
where (a) and (b) hold due to (3.4), and $D \triangleq C^T C$. Thus,
$$\frac{\partial \mathcal{P}(S)}{\partial S} = pd\, DS\big(W \circ (S^T D S + \mathbf{1})^{d-1}\big) = pd\, DS(W \circ F), \qquad (3.16)$$
where $\circ$ denotes the entry-wise product between two matrices. Further, $F \in \mathbb{R}^{n \times n}$, $F \triangleq (S^T D S + \mathbf{1})^{d-1}$, and $W \in \mathbb{R}^{n \times n}$ is defined as the weight matrix corresponding to $\mathcal{P}(S)$: $W \triangleq \big(k_d(CS, CS) + \mu_3^2 I_n\big)^{p/2 - 1}$.

Combining (3.12), (3.13) and (3.16) yields
$$\frac{\partial \mathcal{J}(S)}{\partial S} = q(S - X)M + \lambda_1 cSN + \lambda_2 pd\, DS(W \circ F).$$
Setting the gradient $\frac{\partial \mathcal{J}(S)}{\partial S}$ to zero gives the stationary-point equation
$$q(S - X)M + \lambda_1 cSN + \lambda_2 pd\, DS(W \circ F) = 0. \qquad (3.17)$$
As can clearly be seen from (3.12), (3.13) and (3.16), the matrices $M$, $N$ and $W$ depend directly only on $S$; as such, the use of IRLS is natural. In each iteration, $S$ as well as $M$, $N$ and $W$ are updated. The IRLS algorithm, which we dub the Variety-based Background Subtraction Iteratively Reweighted Least Squares (VBSI) algorithm, is shown in Algorithm 2.

Algorithm 2: VBSI for Non-linear Trajectory Tracking
1: Given: $X \in \mathbb{R}^{m \times n}$, $\alpha_1, \alpha_2, \alpha_3 > 1$, $\lambda_1, \lambda_2 > 0$, and $\epsilon > 0$.
2: Initialize: $t = 0$, $\mu_1, \mu_2, \mu_3 > 0$, $M^t = N^t = W^t = I_n$. Furthermore, initialize $S^t$ by solving Equation (3.17).
3: while not converged do
4:   Update $S^{t+1}$ by solving Equation (3.17) for $S$.
5:   Update the weight matrices $M$, $N$, and $W$:
     $M^{t+1} = \big((X - S^{t+1})^T(X - S^{t+1}) + \mu_1^2 I_n\big)^{q/2 - 1}$,  (3.18)
     $N^{t+1}_{ij} = \big(\|S^{t+1 (j)}\|_2^2 + \mu_2^2\big)^{c/2 - 1}$ if $j = i$, and $0$ if $j \neq i$,  (3.19)
     $W^{t+1} = \big(k_d(CS^{t+1}, CS^{t+1}) + \mu_3^2 I_n\big)^{p/2 - 1}$.  (3.20)
6:   Update $\mu_1 = \mu_1/\alpha_1$, $\mu_2 = \mu_2/\alpha_2$, $\mu_3 = \mu_3/\alpha_3$.
7:   If $\|S^{t+1} - S^t\|_F \leq \epsilon$, break.
8:   Set $t = t + 1$.
9: end while
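A condensed Python sketch of Algorithm 2 follows. For fixed weight matrices, the stationary-point equation (3.17) is linear in $S$, so each pass solves a vectorized $mn \times mn$ linear system; that direct solve is only practical at small scales, and an iterative solver would replace it at the problem sizes used later. Initializing at $S = 0$ (so the first solve uses weights evaluated there) stands in for the initialization step of Algorithm 2.

```python
import numpy as np

def sym_pow(A, e):
    # A^e for a symmetric PSD matrix via its eigendecomposition.
    w, V = np.linalg.eigh(A)
    return (V * np.clip(w, 1e-12, None) ** e) @ V.T

def vbsi(X, C, lam1, lam2, q=1, c=1, p=1, d=3,
         mu0=1.0, alpha=1.2, max_iter=40, eps=1e-3):
    m, n = X.shape
    D = C.T @ C
    I_m, I_n = np.eye(m), np.eye(n)
    S = np.zeros_like(X)
    mu1 = mu2 = mu3 = mu0
    for _ in range(max_iter):
        # Weight updates (3.18)-(3.20), evaluated at the current iterate.
        R = X - S
        M = sym_pow(R.T @ R + mu1**2 * I_n, q / 2 - 1)
        N = np.diag((np.sum(S**2, axis=0) + mu2**2) ** (c / 2 - 1))
        G = S.T @ D @ S + 1.0
        W = sym_pow(G**d + mu3**2 * I_n, p / 2 - 1)   # k_d(CS,CS) = G^d
        F = G**(d - 1)
        # Solve (3.17) for S with the weights frozen (vec is column-major).
        A = (q * np.kron(M.T, I_m) + lam1 * c * np.kron(N, I_m)
             + lam2 * p * d * np.kron((W * F).T, D))
        b = q * (X @ M).flatten(order='F')
        S_new = np.linalg.solve(A, b).reshape((m, n), order='F')
        mu1, mu2, mu3 = mu1 / alpha, mu2 / alpha, mu3 / alpha
        if np.linalg.norm(S_new - S) <= eps:          # step 7 of Algorithm 2
            return S_new
        S = S_new
    return S
```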
3.4 Convergence of VBSI

In this section, we study the convergence of the VBSI algorithm. The optimization problem in (3.7) is non-convex for two reasons: (a) the exponents $p$, $c$, or $q$ may be less than 1; and (b) the monomial lifting. Therefore, achieving a globally optimal solution is not guaranteed. However, we can show that the iterates remain bounded and that $S^t$ converges to a stationary point of Equation (3.17), via Theorems 4 and 5.

Theorem 4. Assume that $S$ has binary-valued components. Then the sequence $\{S^t\}$ generated by Algorithm 2 satisfies the following:
I1) $\mathcal{J}(S^{t+1}) \leq \mathcal{J}(S^t)$, i.e., the regularized cost function is monotonically non-increasing.
I2) The sequence $\{S^t\}$ is bounded at every iteration.
I3) $\lim_{t \to \infty} e^t = 0$, where $e^t = \|S^t - S^{t+1}\|_F$.

Proof of Theorem 4: See Appendix B.1.

Theorem 5. Any limit point of the sequence $\{S^t\}$ generated by Algorithm 2 is a stationary point of the optimization problem given in (3.7).

Proof of Theorem 5: See Appendix B.2.

Theorem 6. At the termination of Algorithm 2, $\|S^* - \hat{S}\|_F^2$ satisfies
$$\|\hat{S} - S^*\|_F^2 \leq \frac{2\delta}{\gamma\,\big(q + \lambda_1 c + \lambda_2\, pd\, \eta_n(D)\big)}, \qquad (3.21)$$
for a required accuracy $|\mathcal{J}(S^*) - \mathcal{J}(\hat{S})| \leq \delta$, where $\eta_n(D)$ is the smallest eigenvalue of $D$, and $\gamma = \min\{\tau^{\frac{q}{q-2}}, (\lambda_1^{-1}\tau)^{\frac{c}{c-2}}, (\lambda_2^{-1}\tau)^{\frac{p}{p-2}}\}$, with $\tau = \mathcal{J}(S^0)$ the initial value of the objective function prior to iteration.

Proof of Theorem 6: See Appendix B.3.

Theorem 6 provides a tighter result than a pure Lipschitz-constant computation, and it gives the conditions under which closeness of the objective value of an estimated $S$ to the optimal value bounds the error of the estimated $S$ from the optimum. As can be seen, if all other parameters are kept fixed, then as either of the weighting parameters $\lambda_1$ (the weight of $S$ in (3.7)) or $\lambda_2$ (the weight of $S_0$ in (3.7)) increases, the error between the ground truth and the estimate decreases. Similarly, as the required accuracy is tightened, i.e., smaller $\delta$, the overall error is also decreased. Furthermore, as the variety degree $d$ increases, the error is decreased. This is expected because the larger the degree of the variety, the more rank-deficient the lifted matrix will be, which gives a better estimate for $S_0$, and in turn for $S$ as well. The impact of $\mathcal{J}(S^0) = \tau$ is clearly seen: a bad initialization leads to a large $\tau$, which in turn leads to a large overall error at termination.

3.5 Practical considerations

3.5.1 General mapping

In Section 3.2, the sparse matrix $S$ is assumed to be binary. However, realistic video does not conform to such an assumption. In this section, we extend our signal model and all subsequent derivations to sparse matrices that are not binary. The key is that we need to be able to clearly disambiguate target from non-target. The fix we introduce is $S_0 = C \tanh(S)$, where $\tanh(S)$ is the hyperbolic tangent function applied element-wise to the matrix $S$. The conditional mean estimator for a binary signal in Gaussian noise is the tanh operator applied to the scaled observation; thus, the tanh operator enables improved support detection of the sparse components. The tanh operator has been chosen as it has the desirable properties of being continuous and differentiable, which facilitate the iterative gradient approach used to solve our ultimate optimization. If $S$ is a binary matrix, $\tanh(S) = S$.

3.5.2 Changes in derivations

In this subsection, we highlight the key changes in the derivations as we go from the special case of binary sparse matrices to general sparse matrices. First, the optimization in (3.7) becomes
$$\arg\min_{S} \|X - S\|_{S_q}^q + \lambda_1 \|S\|_{2,c}^c + \lambda_2 \|\varphi_d(C \tanh(S))\|_{S_p}^p. \qquad (3.22)$$
After differentiation, the final stationary-point equation becomes
$$q(S - X)M + \lambda_1 cSN + \lambda_2 pd\,\mathrm{sech}^2(S) \circ \big(D \tanh(S)(F \circ W)\big) = 0, \qquad (3.23)$$
where $\mathrm{sech}^2(S)$ is the squared hyperbolic secant applied element-wise to the matrix $S$. For the key Theorems 4–6, if $S$ is a sparse matrix of moderately large size, we can derive the following approximation (tested empirically; the error is on the order of $10^{-5}$):
$$\mathrm{tr}\Big(\big(\tanh(S^t) - \tanh(S^{t+1})\big)^T D \tanh(S^{t+1})(F^t \circ W^t)\Big) \approx \mathrm{tr}\Big((S^t - S^{t+1})^T\, \mathrm{sech}^2(S^{t+1}) \circ \big(D \tanh(S^{t+1})(F^t \circ W^t)\big)\Big). \qquad (3.24)$$
The subsequent steps in the proofs of Theorems 4 and 5 follow (under the assumption that the approximation is exact), and the results approximately hold for a general sparse matrix. The bound of Theorem 6 becomes the following.

Approximation 1. For a general sparse matrix and under the assumption (3.24), at the termination of Algorithm 2, $\|S^* - \hat{S}\|_F^2$ satisfies
$$\|\hat{S} - S^*\|_F^2 \lesssim \frac{2\delta}{\gamma - \dfrac{2\lambda_2\, pd\, mn\, \eta_n(D)}{q + \lambda_1 c}}, \qquad (3.25)$$
where $\delta$, $\gamma$, $\eta_n(D)$ and $\tau$ are defined as in Theorem 6.

A sketch of the modified derivations is provided in Appendix B.4.
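A small illustration of the general mapping, assuming the object intensities are large relative to the background, as in the experiments of the next section:

```python
import numpy as np

def spatial_points(S, C):
    # General mapping of Section 3.5: S0 = C tanh(S); reduces toward C S
    # when S is (near-)binary.
    return C @ np.tanh(S)

# tanh saturates large object intensities to ~1 and leaves small background
# perturbations essentially untouched, acting as a smooth support detector.
s = np.array([0.0, 0.02, 30.0, -0.01, 30.0])
print(np.tanh(s))   # -> [ 0.    0.02  1.   -0.01  1. ] (approximately)
```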
3.6 Numerical Results

We employ $n = 20$ frames, each of size $l \times k$, where $l = 25$ and $k = 60$; thus $m = l \times k = 1500$. Hence, the matrices $L$, $S$, and $X$ are of size $1500 \times 20$. The rank $r$ of the low-rank matrix is set to 4, and each component of $L$ is generated i.i.d. from $\mathcal{N}(0, 0.09)$. Herein, we consider a general sparse matrix $S$ and thus assume the presence of the tanh function in the algorithm (Section 3.5). We consider two varieties, the first of degree 2 and the second of degree 3:

1. Second order: $y = -x^2$ for $-12 \leq x \leq 0$; $y = x^2$ for $0 \leq x \leq 12$.
2. Third order: $y = x^3$ for $-6 \leq x \leq 6$; $y = (-x + 12)^3$ for $6 \leq x \leq 18$,

which are shown in Fig. 3.2.

Figure 3.2. Shapes of the second order and third order varieties.

The coordinate system for the first experiment is $x_c \in \mathbb{R}^{1 \times l} = [-12 : 1 : 12]$ and $y_c \in \mathbb{R}^{k \times 1} = [-150 : 5 : 145]^T$. The coordinate system for the second experiment is $x_c \in \mathbb{R}^{1 \times l} = [-6 : 1 : 18]$ and $y_c \in \mathbb{R}^{k \times 1} = [-232 : 8 : 240]^T$. The locations of the objects in space, i.e., the columns of $S_0$, are sampled uniformly at random for each case. Frames are then constructed based on these generated locations and vectorized, with the object intensity set to 30 at the designated locations of the object and zero otherwise. We also set $p = 1$, $c = 1$, $q = 1$, $d = 3$ and $\epsilon = 10^{-3}$. All simulations are averaged over 10000 runs.

3.6.1 The Art of Tuning

There are five parameters that need to be tuned: $\mu_1$, $\mu_2$, $\mu_3$, $\lambda_1$ and $\lambda_2$. It is memory intensive to keep track of all the parameters simultaneously; thus, to reduce the number of parameters, we set $\mu_1 = \mu_2 = \mu_3 = \mu$. It is observed that the regularization parameter $\mu$ affects the rate of convergence of the algorithm, while the weighting parameters $\lambda_1$ and $\lambda_2$ emphasize the importance of one objective over the others. We next study the impact of the choices of these parameters.

The regularization parameter $\mu$ has to be decreased at every iteration to guarantee that the smoothed optimization problem is as close as possible to the original problem at algorithm termination [70]. We select $\mu^{t+1} = \mu^t/\alpha$, $\alpha > 1$. The choice of the initial value $\mu^0$ and the value of $\alpha$ is crucial to the rate of convergence. A good choice for $\mu^0$ is related to the spectral information of the observation matrix $X$ [66]; therefore, we set $\mu^0 = \mu_c \|X\|_2$, where $\mu_c$ is tuned via cross-validation with an eye to accuracy and convergence rate. We examine two metrics: convergence and the recovery error $e_S \triangleq \|S_0 - \hat{S}\|_F^2 / \|S_0\|_F^2$. We provide the details of tuning for the third order variety; a similar procedure can be followed for the second order variety. We observe that relatively small values of $\mu_c$ (i.e., 0.01 and 0.001) lead to fast convergence, but at the expense of large error. On the other hand, relatively large values of $\mu_c$ (i.e., 1 and 10) give better recovery, but at the expense of slow convergence. Cross-validation suggests that $\mu_c = 0.1$ achieves the best of both worlds. That being said, there is a range of values for $\mu_c$ that yields the same results (in terms of recovery error and convergence), and 0.1 is in this range. Similarly, we tune $\alpha$ to 1.2, $\lambda_1$ to 0.007 and $\lambda_2$ to 0.03. For the second order variety, the tuned values are: $\lambda_1 = 0.005$, $\lambda_2 = 0.045$, $\mu_c = 0.3$, and $\alpha = 1.1$.
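For concreteness, the sketch below samples object locations from the third-order variety on the stated coordinate grid and rasterizes them into vectorized frames with intensity 30; the rounding of continuous variety points to grid cells is our assumption about the rasterization, not a step spelled out in the text.

```python
import numpy as np

rng = np.random.default_rng(0)
l, k, n = 25, 60, 20
xc = np.arange(-6, 19)                   # x_c: l = 25 grid points
yc = np.arange(-232, 241, 8)             # y_c: k = 60 grid points

def third_order(x):
    # y = x^3 on [-6, 6]; y = (-x + 12)^3 on [6, 18]
    return x**3 if x <= 6 else (-x + 12.0)**3

S = np.zeros((l * k, n))
points = np.zeros((2, n))                # columns of S0 (space points)
for t in range(n):
    x = rng.choice(xc)                   # location sampled uniformly at random
    y = third_order(float(x))
    points[:, t] = [x, y]
    frame = np.zeros((k, l))
    frame[np.argmin(np.abs(yc - y)), np.argmin(np.abs(xc - x))] = 30.0
    S[:, t] = frame.reshape(-1)          # vectorized frame, intensity 30
```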
3.6.2 Performance Comparisons

For the noiseless scenario, Fig. 3.3 shows the convergence of our algorithm for the two considered varieties, employing the parameter values previously discussed; convergence occurs within 20 iterations.

Figure 3.3. Convergence rate of VBSI for the two varieties.

The algorithm terminates when $\|S^{t+1} - S^t\|_F \leq \epsilon$; we then calculate $\hat{L}$ and $\hat{S}_0$ from the recovered $\hat{S}$ via $\hat{L} = X - \hat{S}$ and $\hat{S}_0 = C \tanh(\hat{S})$, respectively. The normalized mean square error (NMSE) is calculated for each of the three quantities: ground truth is denoted by $(\cdot)_0$ and the estimates by $\hat{(\cdot)}$. The NMSE is defined as $\|(\cdot)_0 - \hat{(\cdot)}\|_F^2 / \|(\cdot)_0\|_F^2$, where $\|\cdot\|_F^2$ is the squared Frobenius norm. The results of our algorithm are shown in Table 3.1 in terms of the NMSE of the key matrices.

Table 3.1. NMSE of S, L and S_0 for the two varieties

         Third order      Second order
e_S      9.12 x 10^-5     12 x 10^-5
e_L      5.31 x 10^-4     7.8 x 10^-4
e_S0     1.65 x 10^-4     2.18 x 10^-4

We test the efficacy of our estimator in a noisy environment and compare with the traditional background subtraction method [3], which does not consider the side structure, i.e., the variety; we denote the method of [3] as Efficient Background Subtraction (EBS). The matrix $X$ is normalized: $X = X/\|X\|_F^2$. The SNR is defined as $\|X\|_F^2/\sigma^2$; as a result, the noise variance is $1/\mathrm{SNR}$. The noise matrix $Z$, with entries $Z_{ij} \sim \mathcal{N}(0, \sigma^2)$, is added to $X$. Key parameters are set to the same values given in the previous subsections. As in the noiseless case, VBSI converges within 20 iterations in the presence of noise. As shown in Fig. 3.4, VBSI offers an improvement of 2.4 dB on average over EBS, due to the variety side information.

Figure 3.4. Comparison of different techniques versus SNR.

Fig. 3.5 shows the NMSE of the algorithm versus SNR and iterations. As expected, performance improves with increasing SNR; even in low SNR regimes, the algorithm offers good recovery results. For all SNR values, the algorithm converges in 20 iterations, as in the noiseless scenario.

Figure 3.5. Convergence for the noisy scenario.

In Fig. 3.6, we examine the effect of four parameters, $p$, $\delta$, $\lambda_1$ and $\tau$, on the MSE of $S$, defined as $\|S_0 - \hat{S}\|_F$, as well as on the bound on the MSE provided in Approximation 1. The initial settings are: $p = 1$, $c = 1$, $q = 1$, $d = 3$, $\tau = 50$, $\delta = 10^{-3}$, $\lambda_1 = 0.007$, and $\lambda_2 = 0.03$. Each plot varies one of these values while holding the others fixed at the initial setting. The key observation is that the bound of Approximation 1 does indeed upper-bound the true MSE, and the trends predicted by Approximation 1 are reflected in the true MSE. Similar trends are seen for the other parameters considered in Approximation 1 ($q$, $c$, $d$, and $\lambda_2$).

Figure 3.6. MSE and the value of the bound derived in Approximation 1 vs. different system parameters.

Chapter 4
Leaked Channel Estimation: Exploitation of Sparse and Low-rank Structures

4.1 Introduction

Wireless communications have enabled intelligent traffic safety [23, 36], automated robotic networks, underwater surveillance systems [18, 37], and many other useful technologies.
In all of these systems, establishing a reliable, high data rate communication link between the transmitter and the receiver is essential. To achieve this goal, accurate channel state information is needed to equalize the received signal and thus combat the effects of the wireless channel. One well-known approach to acquiring channel state information is to probe the channel in time/frequency with known signals and reconstruct the channel response from the output signals (see [38] and references therein). Least-squares (LS) and Wiener filters are classical examples of this approach. However, these methods do not take advantage of the rich, intrinsic structure of wireless communication channels in their estimation process. In particular, many time-varying channels have sparse representations in the Doppler-delay domain.

The main challenge with the classical approaches, i.e., LS and Wiener filtering, is that they require a large number of measurements, relative to the number of unknown parameters in the estimation problem, to perform well. To combat this challenge, we need to take advantage of side information about the structure of the unknown parameters. By exploiting inherent structure, we can reduce the size of the feasible set of solutions, which results in the need for fewer observations. Methods for sparse channel estimation have existed for some time [12–18], and there has been a recent resurgence using modern signal processing methods for sparsity [17, 19, 20], group sparsity or mixed/hybrid (sparse and group-sparse) structures [21–23], and even rank [18, 24, 25]. The use of practical pulse shapes, due to finite block length and transmission bandwidth constraints, results in a loss of sparsity and degrades the performance of the modern sparse methods [19, 23]. This effect is defined as channel leakage in [19, 23], and it has been shown that the performance of compressed sensing (CS) methods is significantly degraded by the leakage effect in practice [19, 23, 77–79].

The mitigation of leakage effects via sparse basis expansion has been previously considered in [19, 77–79]. In [77], a compressive method for tracking doubly selective channels within multicarrier systems, including OFDM systems, is proposed. Using the recently introduced concept of modified compressed sensing (MOD-CS), the sequential delay-Doppler sparsity of the channel is exploited to improve estimation performance through a recursive estimation module. In [78], a compressive estimator of doubly selective channels for pulse-shaping multicarrier MIMO systems (including MIMO-OFDM as a special case) is proposed. The use of multichannel compressed sensing exploits the joint sparsity of the MIMO channel for improved performance, and a multichannel basis optimization for enhancing joint sparsity is also proposed. In [79], advanced compressive estimators of doubly dispersive channels within multicarrier communication systems (including classical OFDM systems) are considered; the performance of compressive channel estimation is shown to be limited by leakage components impairing the channel's effective delay-Doppler sparsity. In [19], compressed sensing is applied to the estimation of doubly selective channels within pulse-shaping multicarrier systems (with OFDM systems as a special case). By exploiting sparsity in the delay-Doppler domain, CS-based channel estimation allows for an increase in spectral efficiency through a reduction of the number of pilot symbols.
To combat the leakage effects that limit the delay-Doppler sparsity, a sparsity-enhancing basis expansion is considered and a method for optimizing the basis, with or without prior statistical information about the channel, is proposed.

In this work, it is shown that under these practical communication system constraints, the transmitted signal, after passing through a linear, time-varying narrowband channel, exhibits a parametric, low-rank, bilinear form. It is this signal description that enables methods distinct from the previously described work: the prior approaches are inherently one-dimensional in nature, whereas our methods are two-dimensional. The rank of this representation is determined by the number of dominant paths, which is small in both cellular and underwater environments [80]. The bilinear form is due to the separability of the pulse leakage effects in the delay and Doppler domains [23]. Herein, we propose two methods that directly exploit the bilinear, low-rank form.

Our first approach is a variation of gradient-based methods, motivated by the strong performance of this type of approach in other applications [81, 82], where a good initialization is easy to find. To solve the non-convex optimization problem, we use the alternating direction method [83], owing to the bilinearity of the measurement model. In the first step of this algorithm, we recover the channel in the delay direction, and in the second step we estimate the channel in the Doppler direction; we repeat the steps iteratively until we converge to a stationary point. If the optimal values of the channel estimates are identifiable, we can prove that, in the noiseless case, the gradient of the non-convex objective function is zero only at that point. Despite this positive result, we also show that the gradients have high variance, underscoring the likelihood of finding a local minimum, which is always a challenge with non-convex objective functions.

Our second approach is based on an alternative parameterization of the channel. We define a set of atoms to describe the set of rank-one matrices in our channel estimation problem. Utilizing this set of atoms, we show that the channel estimation problem can be stated as a parametric low-rank matrix recovery problem. Motivated by convex recovery for inverse problems via the atomic norm heuristic [84, 85], we develop a recovery algorithm employing the atomic norm to enforce the channel model and leakage structures via a convex optimization problem. We show that the solution of our convex optimization problem is the optimal solution of our channel estimation problem in the noiseless scenario. Furthermore, we discuss the conditions under which the solution of this convex program is unique. Finally, we develop a scaling law that relates the number of measurements needed to the probability of correct estimation of the leaked channel parameters. We analyze the algorithm to show that the global optimum can be recovered in the absence of noise. Numerical results show that the proposed algorithm provides a 5 dB to 12 dB improvement, in the SNR sense, for SNRs over 5 dB, compared to an $\ell_1$-based sparse approximation method. Furthermore, the proposed method offers a 2 dB improvement, on average, over the basis expansion method considered in [19], which takes the leakage effects into account.

The rest of this chapter is organized as follows.
Section 4.2 develops the communication system model, which is used in Section 4.3 to derive the discrete-time observation model that captures the bilinear structure of the problem. Section 4.4 derives the two strategies based on non-convex and convex programming. Section 4.5 is devoted to discussion and numerical results. Appendices C.1, C.2, and C.3 provide the proofs of the three major theorems of this work.

4.2 System Model

We assume that the transmitted signal $x(t)$ is generated by the modulation of a pilot sequence $x[n]$ onto the transmit pulse $p_t(t)$:
$$x(t) = \sum_{n=-\infty}^{+\infty} x[n]\, p_t(t - nT_s),$$
where $T_s$ is the sampling period. Note that this signal model is quite general, and encompasses OFDM signals as well as single-carrier signals. The signal $x(t)$ is transmitted over a linear, time-varying channel. The received signal $y(t)$ can be written as
$$y(t) = \int_{-\infty}^{+\infty} h(t, \tau)\, x(t - \tau)\, d\tau + z(t). \qquad (4.1)$$
Here, $h(t, \tau)$ is the channel's time-varying impulse response, and $z(t)$ is a white Gaussian noise process. A common model for the narrowband time-varying (TV) channel impulse response is
$$h(t, \tau) = \sum_{k=1}^{p_0} \eta_k\, \delta(\tau - t_k)\, e^{j 2\pi \nu_k t}, \qquad (4.2)$$
where $p_0$ denotes the number of dominant paths in the channel, and $\eta_k$, $t_k$, and $\nu_k$ denote the $k$-th channel path's attenuation gain, delay, and Doppler shift, respectively. At the receiver, $y(t)$ is converted into a discrete-time signal using an anti-aliasing filter $p_r(t)$. That is,
$$y[n] = \int_{-\infty}^{+\infty} y(t)\, p_r(nT_s - t)\, dt.$$
We assume that $p_t(t)$ and $p_r(t)$ are causal with support $[0, T_{\mathrm{supp}})$. Under the reasonable assumption $\nu_{\max} T_{\mathrm{supp}} \ll 1$, where $\nu_{\max} = \max(\nu_1, \ldots, \nu_{p_0})$ denotes the Doppler spread of the channel [36], and defining $p(t) = p_t(t) * p_r(t)$, we can write the received signal after filtering and sampling as [23]
$$y[n] = \sum_{m=0}^{m_0 - 1} h^{(l)}[n, m]\, x[n - m] + z[n], \qquad (4.3)$$
where $h^{(l)}[n, m] = \sum_{k=1}^{p_0} h_k^{(l)}[n, m]$ and
$$h_k^{(l)}[n, m] = \eta_k\, e^{j 2\pi \nu_k ((n - m)T_s - t_k)}\, p(mT_s - t_k), \qquad (4.4)$$
for $n \in \mathbb{N}$. The superscript $l$ denotes leakage. Here, $m_0 = \lfloor \tau_{\max}/T_s \rfloor + 1$, where $\tau_{\max}$ is the maximum delay spread of the channel, denotes the maximum discrete delay spread of the channel. The pulse shaping filter support does not impact the value of $\tau_{\max}$. Without loss of generality, if we assume that $p_r(t)$ has a root-Nyquist spectrum with respect to the sample duration $T_s$, then $z[n]$ is a sequence of i.i.d. circularly symmetric complex Gaussian random variables with constant variance $\sigma_z^2$.

The pulse leakage effect is due to the non-zero support of the pulse $p(\cdot)$ in Equation (4.4). The leakage with respect to Doppler can be decreased by increasing the pulse shape duration, while the leakage with respect to delay can be decreased by increasing the bandwidth of the transmitted signal. Given the practical constraints on pulse shape duration and bandwidth, the leakage effect increases the number of nonzero coefficients of the observed leaked channel at the receiver (for more details, see [19, 23]).

The main goal of channel estimation is the determination of the channel coefficients, i.e., $\{h_k^{(l)}[n, m] \mid 1 \leq k \leq p_0,\ 0 \leq m \leq m_0 - 1\}$ at time instance $n$, in order to equalize their effect on the transmitted signal. It is clear that at each time instance $n$, there exist $m_0 p_0$ (unknown) channel coefficients to be estimated. These coefficients are estimated via the observations $y[n]$ and the pilot sequence $x[n]$, which are both known at the receiver during channel estimation (see the signal model in Equation (4.3)). We observe that $n \in \{m_0, \ldots, n_T + m_0 - 1\}$, where $n_T$ denotes the total number of training symbols.
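A minimal simulation of the measurement model (4.3)–(4.4) is sketched below. The Gaussian pulse stands in for the combined pulse $p = p_t * p_r$, and the parameter draws follow the conventions of Section 4.5; all numerical choices are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
nT, m0, p0, Ts = 64, 10, 5, 1.0

def p(t):
    # Stand-in for the combined pulse p = p_t * p_r (Gaussian here).
    return np.exp(-0.5 * (t / (2.0 * Ts))**2)

eta = rng.rayleigh(size=p0)                    # path gains
tk = rng.random(p0) * (m0 - 1) * Ts            # path delays
nu = (rng.random(p0) - 0.5) / Ts               # Dopplers, nu_k * Ts in [-1/2, 1/2]

x = rng.choice([-1.0, 1.0], size=nT + m0 - 1)  # BPSK pilots x[1..nT+m0-1]

def h_leaked(n, m):
    # Leaked channel coefficient h^(l)[n, m] of Eq. (4.4), summed over paths.
    return sum(eta[k] * np.exp(2j * np.pi * nu[k] * ((n - m) * Ts - tk[k]))
               * p(m * Ts - tk[k]) for k in range(p0))

y = np.array([sum(h_leaked(n, m) * x[n - m - 1] for m in range(m0))
              for n in range(m0, nT + m0)])    # Eq. (4.3), noiseless part
sigma_z = 0.05
y += sigma_z * (rng.standard_normal(nT) + 1j * rng.standard_normal(nT)) / np.sqrt(2)
```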
In the next section, we show that the channel coefficients exhibit structures that we can explicitly exploit in our estimation process in order to shrink the set of feasible solutions for the channel coefficients, and thus improve the estimation fidelity.

4.3 Parametric Signal Representation

In this section, we exploit the intrinsic structures in the measurement model of Equation (4.3). We show that, even though the leakage effect diminishes the sparsity of the effective observed channel coefficients, leakage also introduces an elegant parametric low-rank structure that will enable high-accuracy channel estimation with a limited number of measurements.

Define $g_k(t) = p(t - t_k)\, e^{-j 2\pi \nu_k t}$; then, using Equations (4.3) and (4.4), the received signal from the $k$-th path can be written as
$$s_k[n] = \sum_{m=0}^{m_0 - 1} g_k(mT_s)\, x[n - m] = x_n^T g_k, \qquad (4.5)$$
where $g_k = [g_k(0 \cdot T_s), \ldots, g_k((m_0 - 1)T_s)]^T$ and $x_n = [x[n], x[n-1], \ldots, x[n - (m_0 - 1)]]^T$. The vectors $g_k$, for $1 \leq k \leq p_0$, contain only the (shifted) leakage pulse shape information, and the vector $x_n$ is described by the $m_0$ consecutive samples of the training signal up to time $n$. We can represent the (aggregated) received signal in Equation (4.3) as
$$y[n] = \sum_{k=1}^{p_0} \bar{\eta}_k\, s_k[n]\, e^{j 2\pi \bar{\nu}_k n}, \qquad (4.6)$$
where $\bar{\eta}_k = \eta_k e^{-j 2\pi \nu_k t_k}$ and $\bar{\nu}_k = \nu_k T_s \in [-\tfrac{1}{2}, \tfrac{1}{2}]$. If we stack the $s_k[n]$ for $m_0 \leq n \leq n_T + m_0 - 1$ in a vector as $s_k = [s_k[m_0], \ldots, s_k[n_T + m_0 - 1]]^T$, we can write
$$s_k = X g_k, \qquad (4.7)$$
where $X$ is an $n_T$-by-$m_0$ matrix whose $i$-th row equals $x_{i + m_0 - 1}^T$. In wireless communication systems, typically $n_T > m_0$; that is, all $s_k$ live in a common low-dimensional subspace spanned by the columns of the known $n_T \times m_0$ matrix $X$, with $n_T > m_0$. We assume that $\|g_k\|_2 = 1$ without loss of generality. Using Equation (4.5), recovery of $s_k$ is guaranteed if $g_k$ can be recovered. Therefore, the number of degrees of freedom in Equation (4.6) becomes $\mathcal{O}(m_0 p_0)$, which is smaller than the number of measurements $n_T$ when $p_0, m_0 \ll n_T$. Applying Equation (4.5), we can rewrite Equation (4.6) as
$$y[n] = \sum_{k=1}^{p_0} \bar{\eta}_k\, e^{j 2\pi n \bar{\nu}_k}\, x_n^T g_k. \qquad (4.8)$$
We define $d(\nu) = \big[e^{-j 2\pi m_0 \nu}, \ldots, e^{-j 2\pi (n_T + m_0 - 1)\nu}\big]^T$, which is a vector of all possible Doppler shifts in the channel representation. Thus, we have
$$y[n] = \Big\langle \sum_{k=1}^{p_0} \bar{\eta}_k\, g_k\, d(\nu_k)^H,\ x_n e_{n - m_0 + 1}^T \Big\rangle, \qquad (4.9)$$
for $n = m_0, \ldots, n_T + m_0 - 1$, where $\langle X, Y \rangle = \mathrm{trace}(Y^H X)$ and $e_n$, $1 \leq n \leq n_T$, are the canonical basis vectors of $\mathbb{R}^{n_T \times 1}$. We observe that the first factor in the matrix inner product in Equation (4.9) carries only the (narrowband) time-varying channel information, while the second factor is a function of the training sequence only. Hereafter, we define the leaked channel matrix as
$$H_l = \sum_{k=1}^{p_0} \bar{\eta}_k\, g_k\, d(\nu_k)^H. \qquad (4.10)$$
Since each term in the summation in Equation (4.10) is a rank-one matrix, we know that $\mathrm{rank}(H_l) \leq p_0$. Thus, Equation (4.9) leads to a parametrized rank-$p_0$ matrix recovery problem, which we write as
$$y = \Pi(H_l), \qquad (4.11)$$
where the linear operator $\Pi : \mathbb{C}^{m_0 \times n_T} \to \mathbb{C}^{n_T \times 1}$ is defined as $[\Pi(H_l)]_n = \langle H_l, x_n e_n^T \rangle$, and $H_l$ is a low-rank matrix with the parametric representation given in Equation (4.10).
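Continuing the simulation sketch from Section 4.2 (it reuses `x`, `eta`, `tk`, `nu`, `p`, and the dimensions defined there), the following lines assemble $H_l$ from Equation (4.10) and the operator $\Pi$ of Equation (4.11), and confirm that $\mathrm{rank}(H_l) \leq p_0$:

```python
import numpy as np

def d(nu_bar):
    # Doppler steering vector over n = m0, ..., nT + m0 - 1.
    return np.exp(-2j * np.pi * np.arange(m0, nT + m0) * nu_bar)

t_grid = np.arange(m0) * Ts
G = np.stack([p(t_grid - tk[k]) * np.exp(-2j * np.pi * nu[k] * t_grid)
              for k in range(p0)], axis=1)       # columns g_k, Eq. (4.5)
scales = np.linalg.norm(G, axis=0)
G = G / scales                                    # enforce ||g_k||_2 = 1
eta_bar = eta * np.exp(-2j * np.pi * nu * tk) * scales   # absorb the scaling

H = sum(eta_bar[k] * np.outer(G[:, k], d(nu[k] * Ts).conj())
        for k in range(p0))                       # Eq. (4.10)
print(np.linalg.matrix_rank(H) <= p0)             # True

def Pi(H):
    # [Pi(H)]_n = <H, x_n e_n^T> = x_n^H H e_n, Eq. (4.11); this reproduces
    # the noiseless y[n] of Eq. (4.3) for the same channel realization.
    out = np.zeros(nT, dtype=complex)
    for i in range(nT):
        xn = x[i : i + m0][::-1]                  # x_n = [x[n], ..., x[n-m0+1]]
        out[i] = xn.conj() @ H[:, i]
    return out
```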
4.4 Structured Estimation of Time-varying Narrowband Channels

In this section, we propose two algorithms to estimate the leaked channel matrix $H_l$ from the measurements $y[n]$, by exploiting the parametrized low-rank matrix structure described in Section 4.3. We show that, by leveraging the channel structure in the estimation process, the channel coefficients can be estimated with high accuracy and a small number of measurements. From this point on, we drop the subscript $l$ on $H_l$ for clarity; it is understood that $H$ always refers to the leaked channel matrix.

4.4.1 Nonconvex Alternating Direction Minimization

Recovering the channel from the linear measurement model $y = \Pi(H)$ of Equation (4.11) is equivalent to determining a low-rank matrix $H$ that satisfies this measurement model. Thus, we seek to solve
$$\hat{H}_l = \arg\min_{H} \|y - \Pi(H)\|_2^2 \quad \text{s.t.} \quad \mathrm{rank}(H) \leq p_0, \quad H = \sum_{k=1}^{p_0} \bar{\eta}_k g_k d(\nu_k)^H, \qquad (4.12)$$
where $p_0$ denotes the maximum number of dominant paths in the channel. Unfortunately, due to the rank constraint, this optimization problem is in general NP-hard [86]. A tractable relaxation of the rank constraint is the nuclear norm of the target matrix [86]. In particular, we can rewrite the relaxed optimization problem as
$$\hat{H}_l = \arg\min_{H} \|y - \Pi(H)\|_2^2 + \lambda \|H\|_* \quad \text{s.t.} \quad H = \sum_{k=1}^{p_0} \bar{\eta}_k g_k d(\nu_k)^H, \qquad (4.13)$$
where the parameter $\lambda$ in Equation (4.13) determines the trade-off between the fidelity of the solution to the measurements $y$ and its conformance to the low-rank model. Furthermore, from Equation (4.10), we know that the channel matrix $H$ can be represented as
$$H = \sum_{k=1}^{p_0} \bar{\eta}_k g_k d(\nu_k)^H = H_t H_\nu, \qquad (4.14)$$
where
$$H_t = [\bar{\eta}_1 g_1, \bar{\eta}_2 g_2, \ldots, \bar{\eta}_{p_0} g_{p_0}], \qquad (4.15)$$
$$H_\nu = [d(\nu_1), d(\nu_2), \ldots, d(\nu_{p_0})]^H. \qquad (4.16)$$
As can be seen from Equation (4.14), the sampled channel matrix is a function of both the delay and the Doppler values. One can estimate the sampled channel matrix factors, i.e., $g$ and $d(\nu)^H$, or attempt to estimate the delay and Doppler values that contribute to those factors. The Doppler values have a direct structural relationship with $H$ through the Vandermonde matrix. While the pulse shaping filters are known, endeavoring to exploit the resulting parametric relationship between the leakage vector $g$ and the delay (and also the Doppler) did not provide good performance in our direct estimation of the delay. This could be due to the highly non-linear nature of the pulse correlation function or the presence of the Doppler in the delay leakage vector. Thus, we are able to be parametric with respect to the Doppler, but not with respect to the delay. We treat $H_t$ as an effective delay matrix, although it depends on the Doppler as well, and do not seek to estimate the delays directly. In developing our equalizer, we only need $H_t$ (as a whole) and $H_\nu$.

Therefore, we can reformulate the optimization problem in Equation (4.13) with $H = H_t H_\nu$ as
$$\arg\min_{H_t, H_\nu} \|y - \Pi(H_t H_\nu)\|_2^2 + \lambda \|H_t H_\nu\|_*. \qquad (4.17)$$
From [87], we know that the minimization of the nuclear norm of a matrix product can be rewritten as a Frobenius norm minimization:
$$\min_{H_t, H_\nu} \|H_t H_\nu\|_* = \min_{H_t, H_\nu} \frac{1}{2}\big(\|H_t\|_F^2 + \|H_\nu\|_F^2\big). \qquad (4.18)$$
With this equivalence in hand, we consider the following alternative optimization problem:
$$\arg\min_{H_t, H_\nu} \|y - \Pi(H_t H_\nu)\|_2^2 + \frac{\lambda}{2}\big(\|H_t\|_F^2 + \|H_\nu\|_F^2\big). \qquad (4.19)$$
It shall be noted that the cost function in (4.19) is an upper bound on the cost function in (4.17), due to the following chain of inequalities:
$$\min_{H_t,H_\nu} \|y - \Pi(H_t H_\nu)\|_2^2 + \lambda_1 \|H_t H_\nu\|_* \stackrel{(a)}{\leq} \min_{H_t,H_\nu} \|y - \Pi(H_t H_\nu)\|_2^2 + \lambda_1 \sqrt{\mathrm{rank}(H_t H_\nu)}\, \|H_t H_\nu\|_F$$
$$\stackrel{(b)}{\leq} \min_{H_t,H_\nu} \|y - \Pi(H_t H_\nu)\|_2^2 + \lambda_1 \sqrt{p_0}\, \|H_t H_\nu\|_F \stackrel{(c)}{\leq} \min_{H_t,H_\nu} \|y - \Pi(H_t H_\nu)\|_2^2 + \lambda_1 \sqrt{p_0}\, \|H_t\|_F \|H_\nu\|_F$$
$$\stackrel{(d)}{\leq} \min_{H_t,H_\nu} \|y - \Pi(H_t H_\nu)\|_2^2 + \frac{\lambda_1 \sqrt{p_0}}{2}\big(\|H_t\|_F^2 + \|H_\nu\|_F^2\big) = \min_{H_t,H_\nu} \|y - \Pi(H_t H_\nu)\|_2^2 + \frac{\lambda}{2}\big(\|H_t\|_F^2 + \|H_\nu\|_F^2\big),$$
where $\lambda = \lambda_1 \sqrt{p_0}$; (a) holds since $\|A\|_* \leq \sqrt{\mathrm{rank}(A)}\,\|A\|_F$; (b) holds since $\sqrt{\mathrm{rank}(H_t H_\nu)} = \sqrt{\mathrm{rank}(H)} \leq \sqrt{p_0}$; (c) holds since $\|AB\|_F \leq \|A\|_F \|B\|_F$; and (d) holds since $\|A\|_F \|B\|_F \leq \frac{1}{2}(\|A\|_F^2 + \|B\|_F^2)$. The form of this upper bound facilitates optimization, although it is still non-convex, and thus our methods yield local optima only.

From Equation (4.16), we see that the matrix $H_\nu$ is a partial Vandermonde matrix, fully determined by the Doppler parameters $\boldsymbol{\nu} = [\nu_1, \ldots, \nu_{p_0}]$. Therefore, instead of optimizing the matrix $H_\nu$ over $\mathbb{C}^{p_0 \times n_T}$, we perform the optimization over the set of Doppler parameters $\boldsymbol{\nu} \in [-\tfrac{1}{2}, \tfrac{1}{2}]^{p_0}$ as follows:
$$\arg\min_{H_t, \boldsymbol{\nu}} \|y - \Pi(H_t H_\nu(\boldsymbol{\nu}))\|_2^2 + \frac{\lambda}{2} \|H_t\|_F^2 + \frac{\lambda}{2} \|H_\nu(\boldsymbol{\nu})\|_F^2.$$
Note that the above optimization problem is non-convex [88], due to the products of unknowns, i.e., $\Pi(H_t H_\nu)$. Owing to the separability of the objective function in the delay and Doppler directions (see Section V in [23]), we can employ the alternating projections algorithm [83], a space-efficient technique that stores the iterates in factored form. The algorithm is extraordinarily simple and easy to interpret: in the first step, we fix either $H_t$ or $H_\nu$ and optimize the other; in the second step, we substitute the matrix computed in the first step and optimize over the second matrix. We iterate over these two steps until the algorithm converges to an optimal (or stationary) point. Given the current estimates $H_t^k$ and $H_\nu^k$, the updating rules can be summarized as
$$H_\nu^{k+1} = \arg\min_{H_\nu(\boldsymbol{\nu})} \big\|y - \Pi\big(H_t^{k+1} H_\nu(\boldsymbol{\nu})\big)\big\|_2^2, \qquad (4.20)$$
$$H_t^{k+1} = \arg\min_{H_t} \big\|y - \Pi\big(H_t H_\nu(\boldsymbol{\nu}^k)\big)\big\|_2^2 + \frac{\lambda}{2} \|H_t\|_F^2. \qquad (4.21)$$
We can further simplify these iterations using Lemma 1.

Lemma 1 [see, e.g., [89]]: Suppose that $\Pi : \mathbb{C}^{m_0 \times n_T} \to \mathbb{C}^{n_T \times 1}$ is a linear operator where
$$[\Pi(H)]_n = \mathrm{tr}(X_n H) = \sum_{i=1}^{m_0} \sum_{j=1}^{n_T} X_n[j, i]\, H[i, j],$$
with $X_n = x_n e_n^T$. Then, for $H = H_t H_\nu$, we can write
$$\Pi(H) = A_t\, \mathrm{vec}(H_\nu) = A_\nu\, \mathrm{vec}(H_t), \qquad (4.22)$$
where $\mathrm{vec}(\cdot)$ stacks the columns of its matrix argument into a single column vector. The matrices $A_t$ and $A_\nu$ are defined as
$$A_t[k,\, l + p_0(j-1)] = \sum_{i=1}^{m_0} X_k[j, i]\, H_t[i, l], \quad \text{and} \quad A_\nu[k,\, i + m_0(l-1)] = \sum_{j=1}^{n_T} X_k[j, i]\, H_\nu[l, j],$$
where $l \in \{1, \ldots, p_0\}$.

Applying Lemma 1, we can rewrite each iteration of the alternating projection algorithm in Equations (4.20) and (4.21) as
$$H_\nu^{k+1} = \arg\min_{H_\nu(\boldsymbol{\nu})} \big\|y - A_t^{k+1}\, \mathrm{vec}(H_\nu(\boldsymbol{\nu}))\big\|_2^2, \qquad (4.23)$$
$$H_t^{k+1} = \arg\min_{H_t} \big\|y - A_\nu^k\, \mathrm{vec}(H_t)\big\|_2^2 + \frac{\lambda}{2} \|\mathrm{vec}(H_t)\|_2^2. \qquad (4.24)$$
In the rest of this work, we refer to this method of channel estimation as the non-convex alternating direction minimization (NADM) approach. From Equation (4.24), we see that the update in the delay direction is simply a ridge estimator, or Tikhonov regularization [90], while in the Doppler direction we need to find the roots of an (exponential) polynomial using a root-finding algorithm [91].
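A minimal sketch of the delay-direction update (4.24): the matrix $A_\nu$ of Lemma 1 is assembled row by row, and the $H_t$ update is a standard ridge solve. The Doppler-direction step (4.23), which calls for a polynomial root-finder, is omitted here, and the row construction assumes the same pilot indexing as the earlier sketches.

```python
import numpy as np

def build_A_nu(x, H_nu, nT, m0):
    # [Pi(H_t H_nu)]_n = x_n^H H_t h_n = (h_n^T kron x_n^H) vec(H_t),
    # with h_n the n-th column of H_nu; stack these rows into A_nu.
    p0 = H_nu.shape[0]
    A = np.zeros((nT, m0 * p0), dtype=complex)
    for i in range(nT):
        xn = x[i : i + m0][::-1]
        A[i] = np.kron(H_nu[:, i], xn.conj())
    return A

def ridge_update_Ht(y, A, lam, m0, p0):
    # argmin_Ht ||y - A vec(H_t)||^2 + (lam/2) ||vec(H_t)||^2, Eq. (4.24)
    v = np.linalg.solve(A.conj().T @ A + (lam / 2) * np.eye(m0 * p0),
                        A.conj().T @ y)
    return v.reshape((m0, p0), order='F')        # column-major vec
```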
One of the challenges with this approach is that it is sensitive to the initialization.

Remark 1. If we know the exact number of dominant paths in the channel, i.e., $\mathrm{rank}(H) = p_0$, then enforcing the separability of the channel into the leakage and Doppler matrices in Equation (4.17) implicitly enforces the low-rank structure of the channel matrix. Therefore, when the number of dominant paths in the channel is known exactly, we can set $\lambda = 0$ in Equation (4.17) for the channel estimation.

Considering the case in Remark 1, we state a theorem (Theorem 7) showing that, with no noise and a sufficiently small stopping resolution, the output $(H_t^{\mathrm{opt}}, H_\nu^{\mathrm{opt}})$ of the NADM algorithm is the global optimum $(H_t^*, H_\nu^*)$ of the channel estimation problem whenever the global optimum is uniquely identifiable. (Identifiability here means that, given sufficient information, we can unambiguously determine the pair of inputs $(H_t, H_\nu)$ that generated the observation.)

Theorem 7. Define $\Delta \triangleq H_t H_\nu - H_t^* H_\nu^*$ and $J(H_t, H_\nu) = \|y - \Pi(H_t H_\nu)\|_2^2$. In the absence of noise, $\Delta \neq 0$ implies that $\frac{\partial J(H_t, H_\nu)}{\partial H_t} \neq 0$ and $\frac{\partial J(H_t, H_\nu)}{\partial H_\nu} \neq 0$, if the global optimum $(H_t^{\mathrm{opt}}, H_\nu^{\mathrm{opt}})$ is uniquely identifiable.

Proof: See Appendix C.1.

Remark 2. The unique identifiability of $(H_t^*, H_\nu^*)$ as the solution of
$$(H_t^*, H_\nu^*) = \arg\min_{H_t, H_\nu} J(H_t, H_\nu) \qquad (4.25)$$
is necessary for any recovery guarantee. To see this, note that if there is a second global optimum $(H_t^{**}, H_\nu^{**})$ of (4.25), then there is no way to determine which of the two solutions is the correct one. In this case, we have $\Delta^{**} = H_t^{**} H_\nu^{**} - H_t^* H_\nu^* \neq 0$, but the global optimality of $(H_t^{**}, H_\nu^{**})$ implies $\frac{\partial J(H_t^{**}, H_\nu^{**})}{\partial H_t} = 0$, thus contradicting the conclusion of Theorem 7 if the unique identifiability clause were removed from the statement of the theorem.

While Theorem 7 excludes the existence of local optima, it does not quantify the convergence rate of the algorithm. Doing so requires knowing the mean and the variance of the partial derivative $\frac{\partial J(H_t, H_\nu)}{\partial H_t}$ over the random training sequence (Theorem 8). We analyze the output of the proposed algorithm when the elements of the pilot sequence $x$ are drawn i.i.d. from a random BPSK modulation, i.e., $\{-1, +1\}$ with equal probability.

Theorem 8. Let the elements of $x \in \mathbb{C}^{n_T + m_0 - 1}$ be generated by a sequence of random BPSK modulation, i.e., $\{-1, +1\}$ with equal probability, and define $\Delta \triangleq H_t H_\nu - H_t^* H_\nu^* = [\Delta_0, \ldots, \Delta_{n_T - 1}] \in \mathbb{C}^{m_0 \times n_T}$. In the absence of noise, we have
$$\mathbb{E}\left\|\frac{\partial J(H_t, H_\nu)}{\partial H_t}\right\|_F^2 = \left\|\mathbb{E}\left(\frac{\partial J(H_t, H_\nu)}{\partial H_t}\right)\right\|_F^2 + (m_0 - 1)\, p_0 \sum_{n=0}^{n_T - 1} \|\Delta_n\|_2^2, \qquad (4.26)$$
and
$$\left\|\mathbb{E}\left(\frac{\partial J(H_t, H_\nu)}{\partial H_t}\right)\right\|_F^2 = \mathrm{vec}(\Delta^*)^H\, B\, \mathrm{vec}(\Delta^*), \qquad (4.27)$$
where (using $\otimes$ to denote the Kronecker product of matrices) $B = H_\nu^H H_\nu \otimes I \in \mathbb{C}^{m_0 n_T \times m_0 n_T}$ is a Hermitian symmetric positive semidefinite matrix satisfying the eigenvalue bounds
$$0 \preceq B \preceq \|H_\nu\|_2^2\, I. \qquad (4.28)$$

Proof: See Appendix C.2.

In our simulations (see Section 4.5), we did observe the impact of bad initializations; i.e., local optima were found, resulting in poor performance [82]. This challenge motivates us to ask whether we can use a convex programming approach to enforce the parametric bilinear low-rank structure. In the next section, we employ the atomic norm heuristic, introduced in [92], to promote the sparsity of the number of dominant paths in the channel, as well as the parametric low-rank structure of the channel model.
4.4.2 Parametric Low-rank Atomic Norm

We know that the number of dominant paths in a narrowband time-varying wireless communication system is small. This means that the number of terms in Equation (4.8), $p_0$, is quite small compared to the number of training signal measurements, i.e., $p_0 \ll n_T$. In other words, the channel can be described as a summation of rank-one matrices of the form $g\, d(\nu)^H$ in Equation (4.10). This representation motivates the use of the atomic norm heuristic to promote this structure. For this purpose, we define an atom as $A(g, \nu) = g\, d(\nu)^H$, where $\nu \in [-\tfrac{1}{2}, \tfrac{1}{2}]$ and $g \in \mathbb{C}^{m_0 \times 1}$. Without loss of generality, we consider $\|g\|_2 = 1$. We then define the set of all atoms as
$$\mathcal{A} = \big\{ A(g, \nu) \mid \nu \in [-\tfrac{1}{2}, \tfrac{1}{2}],\ \|g\|_2 = 1,\ g \in \mathbb{C}^{m_0 \times 1} \big\}. \qquad (4.29)$$
Our goal here is to find a representation of the channel with a small number of dominant paths, i.e.,
$$\|H\|_{\mathcal{A},0} = \inf\Big\{ p : H = \sum_{k=1}^{p} \bar{\eta}_k g_k d(\nu_k)^H \Big\}. \qquad (4.30)$$
Due to the combinatorial nature of the norm defined in Equation (4.30), the above optimization problem is NP-hard. Thus, we instead consider its convex relaxation, the atomic norm associated with the set of atoms defined in Equation (4.29):
$$\|H\|_{\mathcal{A}} = \inf\{t > 0 : H \in t\,\mathrm{conv}(\mathcal{A})\} = \inf_{\bar{\eta}_k, \nu_k, \|g_k\|_2 = 1} \Big\{ \sum_k |\bar{\eta}_k| : H = \sum_k \bar{\eta}_k g_k d(\nu_k)^H \Big\}, \qquad (4.31)$$
where $\mathrm{conv}$ denotes the convex hull. This relaxation is similar to the relaxation of the $\ell_0$ norm of a vector by its $\ell_1$ norm, which is the prevalent relaxation in the compressed sensing literature for avoiding the combinatorial nature of the $\ell_0$ norm in the recovery algorithm.

Remark 3. The atomic representation in Equation (4.31) for the matrix $H$, i.e., $H = \sum_k \bar{\eta}_k A(g_k, \nu_k) = \sum_k \bar{\eta}_k g_k d(\nu_k)^H$, not only captures the functional forms of its elements, but also enforces the rank-one constraint on each term in the summation, i.e., $\mathrm{rank}(A(g_k, \nu_k)) = 1$.

To enforce the sparsity of the atomic representation (low rank) of the received signal, we solve
$$\underset{H}{\text{minimize}} \quad \|H\|_{\mathcal{A}} \quad \text{s.t.} \quad y = \Pi(H). \qquad (4.32)$$
From [84], we know that, due to the Vandermonde decomposition, the convex hull of the set of atoms $\mathcal{A}$ can be characterized by a semidefinite program (SDP). Therefore, $\|H\|_{\mathcal{A}}$ in Equation (4.32) admits an equivalent SDP representation.

Proposition 1 (see, e.g., [84, 93]). For any $H \in \mathbb{C}^{m_0 \times n_T}$,
$$\|H\|_{\mathcal{A}} = \inf_{z, W} \frac{1}{2 n_T} \Big(\mathrm{trace}\big(\mathrm{Toep}(z)\big) + n_T\, \mathrm{trace}(W)\Big) \quad \text{s.t.} \quad \begin{bmatrix} \mathrm{Toep}(z) & H^H \\ H & W \end{bmatrix} \succeq 0, \qquad (4.33)$$
where $z$ is a complex vector whose first element is real, $\mathrm{Toep}(z)$ denotes the $n_T \times n_T$ Hermitian Toeplitz matrix whose first column is $z$, and $W$ is a Hermitian $m_0 \times m_0$ matrix.

Therefore, we can use any efficient SDP solver, such as CVX [94], to solve the optimization problem in Equation (4.32). For noisy measurements, we consider
$$\underset{H}{\text{minimize}} \quad \|H\|_{\mathcal{A}} \quad \text{s.t.} \quad \|y - \Pi(H)\|_2 \leq \sigma_z^2. \qquad (4.34)$$
The above problem can similarly be cast as an SDP by adding the noisy convex constraint and, like the noiseless version, can be solved using CVX.

Next, we show that the solution of the optimization problem in Equation (4.32) is the optimal solution of our channel estimation problem in the noiseless scenario. Furthermore, we discuss the conditions under which the solution of this convex program is unique. Finally, we develop a scaling law that relates the number of measurements needed to the probability of correct estimation of the leaked channel parameters.
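Under the assumption that CVXPY with a complex-capable conic solver (e.g., SCS) is available, the SDP characterization (4.33) of problem (4.32) can be transcribed directly, as sketched below for small problem sizes. Building the LMI block as one Hermitian variable with Toeplitz equality constraints is an implementation convenience, not part of the formulation.

```python
import numpy as np
import cvxpy as cp

def plan_estimate(y, x, nT, m0):
    # Z = [[Toep(z), H^H], [H, W]], declared Hermitian so Z >> 0 is valid.
    Z = cp.Variable((nT + m0, nT + m0), hermitian=True)
    T, H, W = Z[:nT, :nT], Z[nT:, :nT], Z[nT:, nT:]
    constraints = [Z >> 0]
    # Force the upper-left block to be Toeplitz (constant along diagonals).
    constraints += [T[i, i] == T[0, 0] for i in range(1, nT)]
    for k in range(1, nT):
        constraints += [T[i, i + k] == T[0, k] for i in range(1, nT - k)]
    # Measurement constraints y = Pi(H), with x_n = [x[n], ..., x[n-m0+1]].
    for n in range(nT):
        xn = x[n : n + m0][::-1]
        constraints.append(xn.conj() @ H[:, n] == y[n])
    obj = cp.Minimize(cp.real(cp.trace(T)) / (2 * nT)
                      + cp.real(cp.trace(W)) / 2)
    cp.Problem(obj, constraints).solve(solver=cp.SCS)
    return H.value
```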
Optimality and Uniqueness

The dual of the optimization problem in Equation (4.32), obtained via standard Lagrangian analysis, can be written as

$$\underset{\lambda}{\text{maximize}} \;\; \Re\{\langle \lambda, y \rangle\} \quad \text{s.t.} \quad \|\Pi^*(\lambda)\|_{\mathcal{A}}^* \leq 1, \qquad (4.35)$$

where $\Pi^*(\lambda) = \sum_k \lambda(k)\, x_k\, e_{k - m_0 + 1}^T$ is the adjoint operator of $\Pi$, and $\|\cdot\|_{\mathcal{A}}^*$ denotes the dual norm of the atomic norm, which is defined through an inner product. Therefore, we have

$$\|\Pi^*(\lambda)\|_{\mathcal{A}}^* = \sup_{\|A\|_{\mathcal{A}} \leq 1} \Re\{\langle \Pi^*(\lambda), A \rangle\} = \sup_{\nu \in [-\frac{1}{2}, \frac{1}{2}],\, \|g\|_2 = 1} \Re\left\{\left\langle \Pi^*(\lambda),\, g\, d(\nu)^H \right\rangle\right\}. \qquad (4.36)$$

The second equality in Equation (4.36) holds since the set $\{g\, d(\nu)^H\}_{\nu, g}$ covers all the extremal points of the atomic-norm unit ball, i.e., $\{A : \|A\|_{\mathcal{A}} \leq 1\}$. If we define the vector-valued function $\psi(\nu) = \Pi^*(\lambda)\, d(\nu)$, then by the Cauchy–Schwarz inequality we have

$$\|\Pi^*(\lambda)\|_{\mathcal{A}}^* = \sup_{\nu \in [-\frac{1}{2}, \frac{1}{2}],\, \|g\|_2 = 1} \Re\{g^H \psi(\nu)\} \leq \sup_{\nu \in [-\frac{1}{2}, \frac{1}{2}]} \|\psi(\nu)\|_2. \qquad (4.37)$$

Now, if we impose the condition that, for all $\nu \in [-\frac{1}{2}, \frac{1}{2}]$,

$$\|\psi(\nu)\|_2 \leq 1, \qquad \text{(C-1)}$$

then we can rewrite the optimization problem in Equation (4.35) as

$$\underset{\lambda}{\text{maximize}} \;\; \Re\{\langle \lambda, y \rangle\} \quad \text{subject to} \quad \|\psi(\nu)\|_2 \leq 1, \qquad (4.38)$$

where $\psi(\nu) = \Pi^*(\lambda)\, d(\nu)$, or

$$\psi(\nu) = \sum_{n = m_0}^{n_T + m_0 - 1} \lambda(n - m_0 + 1)\, e^{j 2 \pi n \nu}\, x_n^H. \qquad (4.39)$$

Similarly, we have $\Re\{\langle \lambda, y \rangle\} = \Re\{\langle \Pi^*(\lambda), H \rangle\} = \sum_k \Re\{\bar{\eta}_k^*\, g_k^H\, \psi(\nu_k)\}$. If we further assume

$$\psi(\nu_k) = \mathrm{sign}(\bar{\eta}_k)\, g_k \qquad \text{(C-2)}$$

for $k \in \{1, \cdots, p_0\}$, then we have $\Re\{\langle \lambda, y \rangle\} = \sum_k |\bar{\eta}_k| \geq \|H\|_{\mathcal{A}}$. Moreover, using the Hölder inequality, we know that

$$\Re\{\langle \lambda, y \rangle\} \leq \|\Pi^*(\lambda)\|_{\mathcal{A}}^*\, \|H\|_{\mathcal{A}} \leq \|H\|_{\mathcal{A}}.$$

Therefore, if condition (C-2) holds, then $\Re\{\langle \lambda, y \rangle\} = \|H\|_{\mathcal{A}}$. In other words, under conditions (C-1) and (C-2), the primal (Equation (4.32)) and dual (Equation (4.35)) optimization problems exhibit zero duality gap. Thus, $H$ and $\lambda$ are optimal solutions of the primal and dual optimization problems, respectively.

Furthermore, using proof by contradiction, we can see that condition (C-2) ensures the uniqueness of the optimal solution. Suppose $\hat{H} = \sum_k \hat{\bar{\eta}}_k\, \hat{g}_k\, d(\hat{\nu}_k)^H$ is another optimal solution. Since $\hat{H}$ and $H$ are different, there are some $\hat{\nu}_k$ that are not in the support of $H$. Define $T_\nu = \{\nu_1, \nu_2, \cdots, \nu_{p_0}\}$ as the Doppler-shift support of $H$. Then, we have

$$\Re\{\langle \lambda, y \rangle\} = \Re\{\langle \Pi^*(\lambda), \hat{H} \rangle\} = \sum_{k \in T_\nu} \Re\{\bar{\eta}_k^*\, g_k^H\, \psi(\nu_k)\} + \sum_{k \notin T_\nu} \Re\{\hat{\bar{\eta}}_k^*\, \hat{g}_k^H\, \psi(\hat{\nu}_k)\} < \sum_{k \in T_\nu} |\bar{\eta}_k^*| + \sum_{k \notin T_\nu} |\hat{\bar{\eta}}_k^*|,$$

which contradicts the optimality of $\hat{H}$. Thus, we have shown that if we can guarantee the existence of a dual polynomial $\psi(\nu) = \Pi^*(\lambda)\, d(\nu)$ meeting the two key conditions (C-1) and (C-2), then the optimization problem in Equation (4.32) will find the optimal solution of the channel estimation problem. In Theorem 9, we show that a proper dual polynomial $\psi(\nu)$ exists under two main conditions: 1) a minimum Doppler separation, and 2) a sufficient number of measurements.

Theorem 9. Suppose $n_T \geq 64$ (see [93]) and the training sequence $\{x[n] \mid m_0 \leq n \leq n_T + m_0 - 1\}$ is generated by an i.i.d. random source with the Rademacher distribution.‡ Assume that $\min_{1 \leq i < j \leq p_0} |\nu_i - \nu_j| \geq \frac{4}{n_T}$. Then there exists a constant $c$ such that, for

$$n_T \geq c\, p_0\, m_0 \log^3 \frac{n_T\, p_0\, m_0}{\delta}, \qquad (4.40)$$

the proposed optimization problem in Equation (4.32) recovers $H$ with probability at least $1 - \delta$.

Proof: See Appendix C.3.

‡ The Rademacher distribution is a discrete probability distribution where $x_i[j] = \pm 1$ with probability $1/2$.

Channel Estimation Algorithm

After we evaluate the dual parameters by solving the optimization problem in Equation (4.32), we can construct the function $\psi(\nu)$ in Equation (4.39). Then, we can employ $\psi(\nu)$ to estimate the Doppler parameters by enforcing condition (C-2), since we know that $\|\psi(\nu_k)\|_2 = 1$ for $k \in \{1, \cdots, p_0\}$.
Towards this goal, we need to find the roots of the polynomial

$$Q(\nu) = 1 - \|\psi(\nu)\|_2^2 = 1 - \psi(\nu)^H \psi(\nu), \qquad (4.41)$$

which are equal to $\{\nu_k\}_{k=1}^{p_0}$. After the estimation of $\{\nu_k\}_{k=1}^{p_0}$, we can substitute them into Equation (4.8) to obtain a linear system of equations from which we evaluate $\{\bar{\eta}_k\, g_k\}_{k=1}^{p_0}$. Note that we do not need to compute the values of $\bar{\eta}_k$ and $g_k$ separately in order to equalize the channel distortion. As seen in Equation (4.8), to construct an equalizer we only require $\bar{\eta}_k\, g_k$ for $1 \leq k \leq p_0$; using Equation (5.7), we can rewrite the channel coefficients as

$$h^{(l)}[n, m] = \sum_{k=1}^{p_0} \bar{\eta}_k\, g_k[m]\, e^{j 2 \pi \bar{\nu}_k n}, \qquad (4.42)$$

so, to design a channel equalizer, one need only evaluate the coefficients in Equation (4.42). We refer to our convex channel estimation approach as the parametric low-rank atomic norm (PLAN) method.
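As a concrete illustration of this step, the following is a minimal numpy sketch that locates the Doppler shifts as the approximate roots of $Q(\nu)$ by scanning a fine grid. The grid density and the $10^{-3}$ acceptance threshold are our assumptions, standing in for exact polynomial root finding; `psi` is any callable evaluating the dual polynomial of (4.39).

```python
import numpy as np

def doppler_from_dual(psi, grid=1 << 14, tol=1e-3):
    """Return {nu_k}: local minima of Q(nu) = 1 - ||psi(nu)||^2 near zero."""
    nus = np.linspace(-0.5, 0.5, grid, endpoint=False)
    Q = np.array([1.0 - np.linalg.norm(psi(nu)) ** 2 for nu in nus])
    # keep grid points that are local minima of Q and essentially roots
    hits = [i for i in range(1, grid - 1)
            if Q[i] <= Q[i - 1] and Q[i] <= Q[i + 1] and Q[i] < tol]
    return nus[hits]
```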
4.5 Numerical Simulations

In this section, we perform several numerical experiments to validate the performance of the proposed channel estimation algorithms. Furthermore, we compare the performance of our proposed algorithms, namely NADM and PLAN, with a sparsity-based method (see, e.g., [20, 23] and the references therein). More details about the model of the channel coefficients and measurements for the element-wise sparsity-based method, i.e., the LASSO method, can be found in Section V of [23]. Additionally, we compare the performance of our proposed algorithms with the work in [19], which considers an iterative basis-optimization approach that exploits prior statistics and sparsity to mitigate the leakage effect. Following [19], basis-expansion optimization is coupled with OMP; we denote this approach OMP (optimized).

4.5.1 Signal Parameters

We construct a narrowband time-varying channel based on the model given in Equation (5.3). We first generate the channel delay, Doppler, and attenuation parameters randomly. In our experiments, the (normalized) delay parameters, in $(0, 1]$, and Doppler parameters, in $[-\frac{1}{2}, \frac{1}{2}]$, are generated via uniform random variables, and the channel attenuation parameters are generated using a Rayleigh random variable, unless otherwise stated. The transmit training signal $x = [x[1], x[2], \cdots, x[n_T + n_0 - 1]]^T$ is generated by random BPSK modulation, i.e., $\{-1, +1\}$ with equal probability. Moreover, the transmit and receive pulse shapes are Gaussian pulses with 50% window support over a single symbol interval.

Figure 4.1. $\pm 2\sigma$ normalized mean-squared-error (NMSE) curves of the estimation strategies (NADM, PLAN, LASSO, OMP (optimized), and the NADM best/worst cases) vs. signal-to-noise ratio (SNR) at the receiver side: leaked channel estimation.

4.5.2 Performance Comparisons

In the first numerical simulation, we compare the normalized mean-squared-error (NMSE) estimation performance of the different algorithms. We define the NMSE as follows:

$$\mathrm{NMSE} = \frac{\|\theta - \hat{\theta}\|}{\|\theta\|}, \qquad (4.43)$$

where $\theta$ is the true value of the target parameter and $\hat{\theta}$ denotes its estimated value. The results in Fig. 4.1 depict the NMSE, averaged over the total number of runs (10000 runs), of the estimation of the leaked channel matrix $H$ using the proposed NADM and PLAN approaches, the sparsity-based method, and the OMP (optimized) method, with $n_T = 64$ measurements of the training signal, $m_0 = 10$, and $p_0 = 5$. From Fig. 4.1, we observe that both the convex and non-convex approaches perform better than the method using the $\ell_1$-norm alone to promote the sparsity of the channel coefficients. This is due to the fact that the new methods directly exploit the leakage effects presented in Sections 4.3 and 4.4, while the sparsity-based method ignores the leakage effect. Furthermore, our method offers a 2 dB improvement, on average, over the basis expansion method proposed in [19], which does explicitly consider leakage. To explore the sensitivity of the non-convex NADM, we include $\pm 2\sigma$ bars on each plot. The error bars underscore that the NADM method has much higher variance than the other considered methods; specifically, NADM has approximately twice the variance of PLAN. The best and worst curves are determined from the best- and worst-performing cases over all the runs. From Fig. 4.1, we conclude that the NADM approach is very sensitive to initialization. Finding a good strategy for a proper initialization of the NADM approach is an interesting research problem and is left for future research.

Figure 4.2. Normalized mean-squared-error (NMSE) vs. signal-to-noise ratio (SNR) at the receiver side: channel Doppler-shift estimation using the PLAN approach (effect of $n_T$).

Figure 4.3. Normalized mean-squared-error (NMSE) vs. signal-to-noise ratio (SNR) at the receiver side: channel Doppler-shift estimation using the PLAN approach (effect of $m_0$ and $p_0$).

4.5.3 Doppler Estimation and Resolution Constraint

As discussed in Section 4.4.2, in our proposed PLAN approach we compute the Doppler-shift parameters $\nu_k$, $1 \leq k \leq p_0$, directly by finding the roots of the polynomial $Q(\nu)$ defined in Equation (4.41); we then use the evaluated Doppler shifts to estimate the leaked channel gains using Equation (4.8). In Fig. 4.2, with $m_0 = 10$ and $p_0 = 5$, the NMSE of our proposed convex algorithm, PLAN, for the estimation of the Doppler-shift parameters is shown as a function of SNR. In this figure, for SNR $\geq 5$ dB, we see that for all values of $n_T$ the proposed algorithm can estimate the Doppler parameters with at most 0.01 (normalized) error. Furthermore, the accuracy of Doppler-shift estimation improves as the number of measurements increases. Fig. 4.3 shows the effect of changing $m_0$ and $p_0$ for $n_T = 64$. As can clearly be seen from Fig. 4.3, as $m_0$ or $p_0$ increases, the accuracy of Doppler-shift estimation does not degrade much as long as the number of measurements is sufficient, i.e., on the order of $m_0 p_0$; however, the performance degrades much more if the quantity $m_0 p_0$ exceeds $n_T$ by a large factor. From Fig. 4.2 and Fig. 4.3, we conclude that our PLAN approach can perform quite well with a number of measurements only on the order of a constant factor of $m_0 p_0$.

Figure 4.4. Doppler shifts are the roots of $Q(\nu)$. This figure depicts $1 - Q(\nu)$ vs. $\nu$, for $n_T = 100$, $p_0 = 5$, and $m_0 = 10$.

Figs. 4.4 and 4.5 illustrate the Doppler-shift recovery using the (dual) function $\psi(\nu)$. The channel Doppler-shift parameters used to generate these figures were $\nu_k \in \{-0.4, -0.2, 0, 0.2, 0.3\}$. It is clear that, by increasing $n_T$ from 100 to 200 measurements, the spurious peaks in the curve in Fig. 4.5 become much smaller than the peaks at the locations of the Doppler shifts of the channel. In other words, by increasing the number of measurements $n_T$, the resolution constraint in Theorem 9 becomes smaller and we are able to estimate the Doppler shifts with higher accuracy.
Figure 4.5. Doppler shifts are the roots of $Q(\nu)$. This figure depicts $1 - Q(\nu)$ vs. $\nu$, for $n_T = 200$, $p_0 = 5$, and $m_0 = 10$.

4.5.4 Bit Error Performance Comparison

In this section, we compare the performance of our PLAN method and the element-sparsity-only (sparse approximation) approach based on the $\ell_1$-norm, for data recovery during the data transmission phase. For this comparison, we build a minimum mean square error (MMSE) equalizer from the channel matrix estimated by each of these two algorithms, to equalize the channel distortion (see, for example, Chapter 16.2 in [80]). We consider $m_0 = 10$, $p_0 = 5$, and $n_T = 150$ during the channel estimation phase. Then, we generate $n = 1000$ bits randomly, modulated by BPSK signaling, to compute the bit-error-rate (BER) during data transmission. The results in Fig. 4.6 show the BER vs. SNR performance of these algorithms.

Figure 4.6. BER vs. SNR performance comparison. The label "PLAN-not satisfied" indicates that the resolution constraint is not met in the channel realization, while "PLAN-satisfied" indicates the Doppler shifts are well-separated as given in Theorem 9.

The label "PLAN-not satisfied" indicates that the resolution constraint is not met for the Doppler-shift realizations of the channel, while "PLAN-satisfied" indicates the Doppler shifts are well-separated as suggested by Theorem 9. For both cases (well-separated and not), we generate random values of the Doppler shifts. We first select $\nu$ uniformly from $[-0.5, 0.5]$; for each realization, we check whether the resolution constraint is met. If it is, this set of values is labeled "well-separated;" if it is not, it is labeled "not well-separated." Simulations/computations are then done for these two sets of values, and performance is averaged accordingly. The performance of the sparse approximation method does not depend on the Doppler-shift separation constraint; thus, in Fig. 4.6, only one curve for this method is provided. From Fig. 4.6, we see that PLAN offers significantly superior performance over the sparsity-only method even when the resolution constraint is not met. Further improvement in performance is achieved if the constraint is met, i.e., the Doppler shifts are sufficiently separated. Additionally, a slight performance improvement over [19] is observed if the resolution constraint is not met, especially at low SNR values, whereas a much stronger improvement is observed if the resolution constraint is met.
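For reference, the following is a generic sketch of the linear MMSE equalizer used in these BER comparisons (cf. Chapter 16.2 of [80]). This is the textbook closed form for unit-power symbols, not code from the dissertation, and the BPSK decision rule in the usage line is likewise an assumption.

```python
import numpy as np

def mmse_equalizer(H_hat, sigma2_z):
    """Linear MMSE equalizer for y = H x + z, built from the channel estimate."""
    m = H_hat.shape[0]
    return H_hat.conj().T @ np.linalg.inv(
        H_hat @ H_hat.conj().T + sigma2_z * np.eye(m))

# usage (BPSK): bits_hat = np.sign(np.real(mmse_equalizer(H_hat, s2) @ y))
```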
Chapter 5

Leaked Channel Estimation: Extensions to MIMO OFDM and Performance Bounds

5.1 Introduction

As high-data-rate communication is enabled via high-performance channel estimation, numerous works have designed methods to accurately equalize channel effects at the receiver [38, 95–98]. In particular, [38] considers time-varying channel estimation via the inclusion of pilot signals known to both transmitter and receiver; Least Squares (LS) methods are a well-known example of this approach. These classical methods have been exploited to estimate time-varying OFDM [95, 96] and MIMO OFDM [97, 98] channels. One disadvantage of these classical methods is that they require a large number of measurements compared to the actual number of unknowns in the estimation problem. This is a result of not leveraging the intrinsic structure of wireless channels, e.g., sparsity.

There has been strong recent interest in employing structure to improve channel estimation. In particular, methods for sparse channel estimation have received recent attention [12–14, 16, 18], enabling improved performance while reducing the number of observations. However, sparsity is lost in the presence of practical constraints (finite bandwidth and blocklength), resulting in leaked channels [23, 78, 99, 100]. Leakage occurs in doubly selective channels that suffer from time and frequency propagation effects [78, 99, 100]. Furthermore, even slight synchronization errors also lead to leakage. One solution strategy for this issue is to employ sparsity-enhancing methods, such as those based on basis-expansion optimization for doubly selective channels [99]. Another approach is that of [78], where a modified compressed sensing technique, adapting orthogonal matching pursuit, is proposed. These prior works focus on OFDM or MIMO-OFDM signaling. In contrast, in our prior approach [100], it was shown that time-varying channels can be effectively modeled as bilinear in the leakage functions along the delay and Doppler domains. The resulting model is a channel matrix that is low rank. As such, convex optimization via the atomic norm [101–103] can be employed to determine the channel parameters.

Herein, we extend this approach to the multiple-input multiple-output (MIMO) OFDM case. We are able to adapt the optimality, uniqueness, and scaling-law results of [100] from single-carrier signaling to the MIMO OFDM scenario. Importantly, the decoupling of the channel matrix in the delay and Doppler domains continues to hold for MIMO OFDM. To analyze the performance and conduct system optimization, we compute the CRB for the key channel parameters (Doppler and delay) of the leaked MIMO OFDM channel. Our analysis reveals that the CRB is also decoupled in the delay and Doppler domains. We observe that there is prior art on the derivation of the CRB for MIMO OFDM channels [104]. In [105], the CRB derived in [106] is adapted for estimating channels under the availability of side information. In [107], the CRB of [105, 106] is further extended for the optimal design and placement of pilot symbols. However, [105–107] neither consider leakage effects nor exploit the bilinear nature of the channel; to this end, we achieve tight bounds for our proposed methods. Given the effective separability of the CRB in the delay and Doppler domains, we determine training sequences optimized for the estimation of delay and Doppler separately. The CRB is optimized through the training-sequence design. The optimization itself is carried out via the solution of fixed-point equations, one for delay and one for Doppler. The resulting sequences are analyzed by computing a whiteness metric [108] and the ambiguity function.

The main contributions of this work are summarized as follows:

1. We adapt the framework in [100] to MIMO OFDM systems, resulting in the Bilinear Leaked Channel Estimator (BLCE) algorithm based on the atomic norm, with updated technical results on optimality, uniqueness, and the scaling law.

2. We derive the CRB for the parameters of the leaked MIMO OFDM channel under the assumption of random and deterministic training sequences. We also prove that the CRB matrix is effectively decoupled in the delay and Doppler domains.

3. We derive fixed-point expressions for determining the training sequences that optimize the CRB in the delay and Doppler domains.
We investigate the relationships between the delay-optimal and the Doppler-optimal sequences via whiteness tests and the computation of (cross-)ambiguity functions. We also numerically observe that the optimized sequences achieve 5 dB and 2.5 dB gains over purely random sequences for Doppler and delay, respectively.

4. Numerical comparisons to sparse methods based on the $\ell_1$-norm [20, 23], an adaptive MIMO OFDM method [109], and a sparsity-enhancing basis-expansion method [78] reveal that our new method offers an average improvement of 6.3 dB, 2 dB, and 1.3 dB, respectively, with regard to the normalized mean-squared error. Comparison of the BLCE estimator results with the derived CRB shows that the proposed algorithm is essentially near optimal.

5. For the experimental data, SPACE'08 [110], in comparison to the methods in [110], our method provides a gain of 3 dB, on average, with regard to the BER.

The rest of this chapter is organized as follows. The signal model is introduced in Section 5.2. In Section 5.3, the optimization problem is formulated and the attendant algorithm is provided; optimality, uniqueness, and the estimation algorithm are also given. Different variants of the CRB are derived in Section 5.4. The algorithm for finding the optimized training sequences is derived in Section 5.5. Numerical results based on both synthetic and experimental data are given in Section 5.6. Appendices D.1 through D.4 provide the proofs of the technical results.

Notation: Scalar values are denoted by lower-case letters, $x$, and vectors by bold letters, $\mathbf{x}$. A matrix is denoted by bold capital letters, $\mathbf{X}$. The transpose of $\mathbf{X}$ is given by $\mathbf{X}^T$ and its conjugate transpose by $\mathbf{X}^H$. The identity matrix is denoted $\mathbf{I}$, with its dimension given as a subscript. The set of real numbers is denoted by $\mathbb{R}$, and the set of complex numbers by $\mathbb{C}$. The Kronecker product is denoted by $\otimes$. Other notation is defined throughout the chapter as needed.

5.2 System Model

We consider a time-varying MIMO-OFDM communication system. The transmit OFDM signal $x^{(n_t)}(t)$, on the $n_t$-th transmit antenna, over a time-varying system, is the modulation of a pilot sequence $x_l^{(n_t)}[n]$ onto the transmit pulse $p_t(t)$, and is given by

$$x^{(n_t)}(t) = \sum_{n=-\infty}^{+\infty} \frac{1}{\sqrt{L}} \sum_{l=0}^{L-1} x_l^{(n_t)}[n]\, e^{j 2 \pi f_l t}\, p_t(t - n T_s), \qquad (5.1)$$

where $T_s$ is the sampling period and $f_l$ is the $l$-th subcarrier frequency, $0 \leq l \leq L - 1$; $L$ is the total number of subcarriers in one OFDM symbol, and $1 \leq n_t \leq N_t$, where $N_t$ is the total number of transmit antennas. For a general signal $x(t)$ transmitted over a linear, time-varying channel, the received signal $y(t)$ is written as

$$y(t) = \int_{-\infty}^{+\infty} h(t, \tau)\, x(t - \tau)\, d\tau + z(t). \qquad (5.2)$$

Here, $h(t, \tau)$ represents the channel's time-varying impulse response, and $z(t)$ is a white Gaussian noise process. A common model for the narrowband time-varying (TV) channel impulse response is as follows (see, e.g., [38, 100]):

$$h(t, \tau) = \sum_{k=1}^{p_0} \eta_k\, \delta(\tau - t_k)\, e^{j 2 \pi \nu_k t}, \qquad (5.3)$$

where $p_0$ is the number of dominant paths in the channel, and $\eta_k$, $t_k$, and $\nu_k$ denote the $k$-th channel path's attenuation gain, delay, and Doppler shift, respectively. At the receiver side, after passing $y(t)$ through an anti-aliasing filter $p_r(t)$, the resulting discrete-time signal is written as

$$y[n] = \int_{-\infty}^{+\infty} y(t)\, p_r(n T_s - t)\, dt. \qquad (5.4)$$

The pulses $p_t(t)$ and $p_r(t)$ are assumed to be causal with support $[0, T_{\mathrm{supp}})$.
Assuming $\nu_{\max} T_{\mathrm{supp}} \ll 1$, with $\nu_{\max} = \max(\nu_1, \ldots, \nu_{p_0})$ denoting the Doppler spread of the channel, and defining the effective pulse $p(t)$ as $p(t) = p_t(t) * p_r(t)$, where $(*)$ denotes the convolution operator, the received signal after filtering and sampling is written as

$$y[n] = \sum_{m=0}^{m_0 - 1} h_{\mathrm{le}}[n, m]\, x[n - m] + z[n], \qquad (5.5)$$

where $n \in \mathbb{N}$, $m_0 = \lfloor \tau_{\max} / T_s \rfloor + 1$, with $\tau_{\max}$ being the maximum delay spread of the channel, and

$$h_{\mathrm{le}}[n, m] = \sum_{k=1}^{p_0} h_{\mathrm{le},k}[n, m], \qquad (5.6)$$

where

$$h_{\mathrm{le},k}[n, m] = \eta_k\, e^{j 2 \pi \nu_k ((n - m) T_s - t_k)}\, p(m T_s - t_k). \qquad (5.7)$$

The subscript "le" in Equations (5.6) and (5.7) denotes leakage. From Equation (5.7), the pulse leakage effect is a result of the non-zero support of the pulse $p(\cdot)$, which is a consequence of the practical constraints of finite transmit bandwidth and finite block length [23, 99]. The leakage effects destroy the sparsity in the delay–Doppler domains.
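To make the leakage model concrete, the following is a minimal numpy sketch that builds the leaked taps $h_{\mathrm{le}}[n, m]$ of Equations (5.6)–(5.7). The callable `pulse`, standing in for the effective pulse $p(t)$, and the argument layout are our assumptions.

```python
import numpy as np

def leaked_channel(eta, t, nu, Ts, pulse, m0, n_range):
    """Leaked taps h_le[n, m] per (5.6)-(5.7); `pulse` is p(t) = (p_t * p_r)(t)."""
    H = np.zeros((len(n_range), m0), dtype=complex)
    for i, n in enumerate(n_range):
        for m in range(m0):
            for k in range(len(eta)):   # sum over the p0 dominant paths
                H[i, m] += (eta[k]
                            * np.exp(2j * np.pi * nu[k] * ((n - m) * Ts - t[k]))
                            * pulse(m * Ts - t[k]))
    return H
```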
Substituting Equations (5.1) and (5.6) into Equation (5.5), while taking the multi-antenna setting into account, the received noiseless signal on the $n_r$-th antenna at time instance $n$ can be written as in Equation (5.8):

$$y^{(n_r)}[n] = \frac{1}{\sqrt{L}} \sum_{n_t=1}^{N_t} \sum_{m=0}^{m_0 - 1} \sum_{l=0}^{L-1} x_l^{(n_t)}[n - m] \sum_{k=1}^{p_0} \eta_k^{(n_t, n_r)}\, e^{j 2 \pi \nu_k^{(n_t, n_r)} (n - m) T_s}\, e^{j 2 \pi f_l \left((n - m) T_s - t_k^{(n_t, n_r)}\right)}\, p\left(m T_s - t_k^{(n_t, n_r)}\right), \qquad (5.8)$$

where $1 \leq n_r \leq N_r$, with $N_r$ denoting the number of receive antennas, and $m_0 \leq n \leq n_T + m_0 - 1$, with $n_T$ being the total number of training samples. Moreover, $x_l^{(n_t)}[n]$ is the signal transmitted by the $l$-th OFDM subcarrier on the $n_t$-th transmit antenna at time instance $n$. Furthermore, $\eta_k^{(n_t, n_r)}$, $t_k^{(n_t, n_r)}$, and $\nu_k^{(n_t, n_r)}$ are the $k$-th channel path's attenuation gain, delay, and Doppler shift, respectively, between the $n_t$-th and $n_r$-th antenna pair, with $1 \leq k \leq p_0$. We extend the result for a single measurement in Equation (5.8) to the vector of all measurements via the following theorem.

Theorem 10. The received signal vector at the $n_r$-th antenna of the MIMO OFDM system, for all time instances $n$, can be written as

$$y^{(n_r)} = [y^{(n_r)}[m_0],\, y^{(n_r)}[m_0 + 1],\, \ldots,\, y^{(n_r)}[n_T + m_0 - 1]]^T, \qquad (5.9)$$

where

$$y^{(n_r)}[n] = \frac{1}{\sqrt{L}} \left\langle X_n E_n,\; \sum_{k=1}^{p_0} T_k^{(n_r)\,T}\, D^H(\nu_k^{(n_r)}) \right\rangle, \qquad (5.10)$$

for $n = m_0, m_0 + 1, \cdots, n_T + m_0 - 1$, where the definitions of $X_n$, $E_n$, $T_k^{(n_r)}$, and $D^H(\nu_k^{(n_r)})$ are given in Appendix A; $\langle A, B \rangle = \mathrm{trace}(B^T A)$, $A^T$ denotes the transpose of the matrix $A$, and $A^H$ denotes its conjugate transpose. Consequently, the whole received MIMO OFDM matrix can be written as

$$Y = \left[ y^{(1)},\, y^{(2)},\, \ldots,\, y^{(N_r)} \right]_{n_T \times N_r}. \qquad (5.11)$$

Proof: See Appendix D.1.

From now on, we drop the time instance $n$ for simplicity. From Theorem 10, it is evident that the first term in the matrix inner product in Equation (5.10) is a function of the training sequence only, while the second term contains only the narrowband time-varying MIMO OFDM channel information. Therefore, the MIMO OFDM leaked channel matrix at the $n_r$-th receive antenna can be written as

$$H_{\mathrm{le}}^{(n_r)} = \sum_{k=1}^{p_0} T_k^{(n_r)\,T}\, D^H(\nu_k^{(n_r)}), \qquad (5.12)$$

where each factor $k$ in Equation (5.12) has rank at most $N_t$. Consequently, $\mathrm{rank}(H_{\mathrm{le}}^{(n_r)}) \leq N_t p_0$. Therefore, Equation (5.10) reduces to a parametrized rank-$N_t p_0$ matrix recovery problem, which can be written as

$$y^{(n_r)} = \Pi\left( H_{\mathrm{le}}^{(n_r)} \right), \qquad (5.13)$$

where the linear operator $\Pi : \mathbb{C}^{m_0 L N_t \times n_T} \to \mathbb{C}^{n_T \times 1}$ is defined by $[\Pi(H_{\mathrm{le}}^{(n_r)})]_n = \frac{1}{\sqrt{L}} \langle X_n E_n, \sum_{k=1}^{p_0} T_k^{(n_r)\,T} D^H(\nu_k^{(n_r)}) \rangle$, with $E_n \in \mathbb{R}^{m_0 \times n_T} = \mathbf{1}\, e_{n - m_0 + 1}^T$, where $e_i$ denotes the $i$-th canonical basis vector of $\mathbb{R}^{n_T \times 1}$, and $H_{\mathrm{le}}^{(n_r)}$ is a low-rank matrix with the parametric representation given in Equation (5.12). The channel matrix $H_{\mathrm{le}}^{(n_r)}$, given in Equation (5.12), can be written as

$$H_{\mathrm{le}}^{(n_r)} = \sum_{k=1}^{p_0} T_k^{(n_r)\,T}\, D^H(\nu_k^{(n_r)}) = H_{\mathrm{le},\tau}^{(n_r)}\, H_{\mathrm{le},\nu}^{(n_r)}, \qquad (5.14)$$

where

$$H_{\mathrm{le},\tau}^{(n_r)} = [T_1^{(n_r)\,T}, \ldots, T_{p_0}^{(n_r)\,T}] \in \mathbb{C}^{N_t L m_0 \times N_t p_0}, \qquad (5.15)$$

$$H_{\mathrm{le},\nu}^{(n_r)} = [D(\nu_1^{(n_r)}), \ldots, D(\nu_{p_0}^{(n_r)})]^H \in \mathbb{C}^{N_t p_0 \times n_T}. \qquad (5.16)$$

Given the factorization of Equation (5.14) into Equations (5.15) and (5.16), we conclude the following.

Remark. From Equations (5.15) and (5.16), we see that the matrix $H_{\mathrm{le},\nu}^{(n_r)}$ contains the channel components due to the Doppler values only; we have a parametric relationship between the Doppler values and the components of $D(\nu_k^{(n_r)})$. On the other hand, the matrix $H_{\mathrm{le},\tau}^{(n_r)}$ is a function of both delay and Doppler, as represented by the definition of $T_k^{(n_r)}$. Unfortunately, attempting to directly estimate the delay component led to performance degradation. Therefore, we call $H_{\mathrm{le},\tau}^{(n_r)}$ the "leakage" matrix, or the effective "delay matrix;" although it depends on Doppler as well, we do not seek to estimate the delays directly. As will be seen in Section 5.3.3, which describes the estimation algorithm, in developing our equalizer we only need $H_{\mathrm{le},\tau}^{(n_r)}$ and $H_{\mathrm{le},\nu}^{(n_r)}$, without estimating the implicit delay parameters at all. From now on, the term delay means the effective delay, or the leakage, and does not refer to the explicit delay values. Thus, we model the channel matrix as being approximately separable into leakage in the delay and Doppler domains. We originally derived this approximation in [100], and we refer the reader to Section II therein for the single-carrier case. Here, we note that a similar approximation holds in the MIMO-OFDM case.

5.3 Solving the Optimization

5.3.1 Convex Optimization via the Atomic Norm

In this section, motivated by the convex program considered in [100] for single-carrier signals, we extend that convex characterization to multi-antenna systems via Equation (5.12). We solve the optimization problem using the parametric low-rank atomic norm approach. The benefit of the atomic norm is that the basis functions are parameterized by continuous-valued parameters [85, 111]; thus, we do not need to determine a finite set of candidate basis functions. In [112], it is shown that a mismatch in the candidate basis functions can lead to large errors. If the unknown parameters are discretized, mismatch will typically occur; furthermore, if these unknown parameters are discretized too finely, the resulting set of basis functions will be ill-conditioned. The channel model given in Equation (5.12) can be interpreted as a summation of rank-$N_t$ matrices, each of the form $T_k^{(n_r)\,T} D(\nu_k^{(n_r)})^H$. We assume that $N_t p_0 \ll n_T$; that is, the overall channel is sparse and thus its cardinality is less than the number of training symbols.
In a noisy environment, the ultimate optimization problem using the atomic norm is written as

$$\underset{H_{\mathrm{le}}^{(n_r)}}{\text{minimize}} \;\; \left\| H_{\mathrm{le}}^{(n_r)} \right\|_{\mathcal{A}} \quad \text{s.t.} \quad \left\| y^{(n_r)} - \Pi\left( H_{\mathrm{le}}^{(n_r)} \right) \right\|_2 \leq \sigma_z^2, \qquad (5.17)$$

where $\sigma_z^2$ is the noise variance and $\|\cdot\|_{\mathcal{A}}$ is the atomic norm. More detail about the atomic norm can be found in [101, 103]. Details about the construction of the optimization problem in (5.17) are provided in Appendix D.2. Due to the Vandermonde decomposition, the convex hull of the set of atoms represented by $\mathcal{A}$ can be characterized by a semidefinite program (SDP), and the ultimate problem can be solved using off-the-shelf convex solvers such as CVX [94] (see, e.g., [84, 93, 100]).

5.3.2 Optimality and Uniqueness

The optimality and uniqueness analyses are nearly the same as those presented in our previous work for the single-carrier case [100]. Conditions for the optimality of the solutions produced by the atomic norm can be derived with proper adaptation of the proofs in [100] to accommodate the different, new linear operator $\Pi$ for the MIMO OFDM problem. Key matrices are defined differently, and the key conditions are checked for this new signal model. Herein, we have $m_0 p_0 L N_t N_r$ parameters to be estimated, rather than the $m_0 p_0$ parameters of the single-carrier case [100].

5.3.3 Estimation Algorithm

The Doppler parameters $\{\nu_k\}_{k=1}^{p_0}$ are estimated as in [100], by obtaining the dual parameters of the optimization problem presented in Equation (5.17), with appropriate resizing of the vectors and matrices presented therein. Having estimated $\{\nu_k\}_{k=1}^{p_0}$, one can directly determine $\{D(\nu_k)\}_{k=1}^{p_0}$ and, accordingly, $H_{\mathrm{le},\nu}$. Additionally, after estimating $\{\nu_k\}_{k=1}^{p_0}$, we can substitute them into Equation (D.3) for $m_0 \leq n \leq m_0 + n_T - 1$ to obtain a system of linear equations that we solve for $\{T_k\}_{k=1}^{p_0}$ and, accordingly, $H_{\mathrm{le},\tau}$. Finally, having determined $H_{\mathrm{le},\nu}$ and $H_{\mathrm{le},\tau}$, one can determine the effective leaked channel $H_{\mathrm{le}}$ via the multiplication of the two matrices. The complexity of the atomic-norm algorithm is comparable to that of solving a system of linear equations, which is on the order of the number of unknowns [84, 93], i.e., $m_0 p_0 L N_t N_r$.

5.4 Cramér–Rao Bound

5.4.1 Cramér–Rao Bound Derivation

In this section, we derive the CRB for the parameters of the leaked MIMO OFDM channel. We focus on the CRB as it has proven useful in training-sequence design, which is our ultimate goal [113, 114]. By substituting Equation (5.14) into Equation (5.10), we can write $y^{(n_r)}[n]$ as (with the factor $\frac{1}{\sqrt{L}}$ dropped for simplicity)

$$y^{(n_r)}[n] = \langle X_n E_n,\, H_{\mathrm{le}}^{(n_r)} \rangle = \langle X_n E_n,\, H_{\mathrm{le},\tau}^{(n_r)} H_{\mathrm{le},\nu}^{(n_r)} \rangle.$$

From now on, we drop the subscript "le" for simplicity. The received noisy vector at receive antenna $n_r$, for all measurements $n$, $m_0 \leq n \leq n_T + m_0 - 1$, is written as

$$y^{(n_r)} = \mathrm{diag}(X^T \mathcal{H}^{(n_r)}) + z, \qquad (5.18)$$

where $y^{(n_r)}$ and $z \in \mathbb{C}^{n_T \times 1}$, $z \sim \mathcal{N}(0_{n_T}, \sigma_z^2 I_{n_T})$, $\mathcal{H} \in \mathbb{C}^{N_t L m_0 n_T \times n_T} = 1_{n_T}^T \otimes \mathrm{vec}(H)$, $1_{n_T}$ and $0_{n_T}$ are the all-ones and all-zeros column vectors of dimension $n_T$, respectively, and $\otimes$ is the Kronecker product. Also, $X \in \mathbb{C}^{N_t L m_0 n_T \times n_T} = [\mathrm{vec}(X_{m_0} E_{m_0}), \ldots, \mathrm{vec}(X_{n_T + m_0 - 1} E_{n_T + m_0 - 1})]$. Further, the operators $\mathrm{vec}(\cdot)$ and $\mathrm{diag}(\cdot)$ denote column-wise matrix vectorization and extraction of the diagonal elements of a matrix, respectively.
The matrix $\mathcal{H}^{(n_r)}$ can be expressed as the multiplication of $\mathcal{H}_\tau^{(n_r)} \in \mathbb{C}^{N_t L m_0 n_T \times N_t p_0}$ and $\mathcal{H}_\nu^{(n_r)} \in \mathbb{C}^{N_t p_0 \times n_T}$, as previously considered in Equations (5.14)–(5.16). Thus, Equation (5.18) reduces to

$$y^{(n_r)} = \mathrm{diag}\left( X^T \mathcal{H}_\tau^{(n_r)} \mathcal{H}_\nu^{(n_r)} \right) + z. \qquad (5.19)$$

In the sequel, we drop the superscript $(n_r)$. We can rewrite Equation (5.19) in the following two forms:

$$y = \mathcal{A}_\tau(\mathcal{H}_\nu X^T)\, h_\tau + z = \mathcal{A}_\nu(X^T \mathcal{H}_\tau)\, h_\nu + z, \qquad (5.20)$$

where $h_\tau \in \mathbb{C}^{N_t^2 L m_0 p_0 n_T \times 1} = \mathrm{vec}(\mathcal{H}_\tau)$, $h_\nu \in \mathbb{C}^{N_t p_0 n_T \times 1} = \mathrm{vec}(\mathcal{H}_\nu)$, and $\mathcal{A}_\tau$ and $\mathcal{A}_\nu$ are operators applied to $\mathcal{H}_\nu X^T$ and $X^T \mathcal{H}_\tau$, respectively, with $\mathcal{A}_\tau : \mathbb{C}^{N_t p_0 \times N_t L m_0 n_T} \to \mathbb{C}^{n_T \times N_t^2 L m_0 p_0 n_T}$ and $\mathcal{A}_\nu : \mathbb{C}^{n_T \times N_t p_0} \to \mathbb{C}^{n_T \times N_t p_0 n_T}$. Specifically, the operation of $\mathcal{A}_\tau$ can be described as follows (with $A_\tau$ defined as the matrix representation of the operator $\mathcal{A}_\tau$, $A_\tau \in \mathbb{C}^{n_T \times N_t^2 L m_0 p_0 n_T}$):

$$\mathcal{A}_\tau(\mathcal{H}_\nu X^T) = A_\tau = \begin{bmatrix} [X^T]_1 \left( [\mathcal{H}_\nu^T]_1 \otimes I_{N_t L m_0 n_T} \right) \\ \vdots \\ [X^T]_{n_T} \left( [\mathcal{H}_\nu^T]_{n_T} \otimes I_{N_t L m_0 n_T} \right) \end{bmatrix}, \qquad (5.21)$$

where $[Z]_i$ denotes the $i$-th row of the matrix $Z$. Similarly, $A_\nu$ is defined as the matrix representation of the operator $\mathcal{A}_\nu$, $A_\nu \in \mathbb{C}^{n_T \times N_t p_0 n_T}$:

$$\mathcal{A}_\nu(X^T \mathcal{H}_\tau) = A_\nu = \begin{bmatrix} [X^T \mathcal{H}_\tau]_1 & \cdots & 0_{N_t p_0}^T \\ \vdots & \ddots & \vdots \\ 0_{N_t p_0}^T & \cdots & [X^T \mathcal{H}_\tau]_{n_T} \end{bmatrix}, \qquad (5.22)$$

where $0_{N_t p_0}$ is the all-zeros column vector of size $N_t p_0$. We have the following assumptions (A1)–(A4) and consequences (C1)–(C2):

A1) The training symbols $x_l^{(n_t)}[n]$ are drawn from an independent and identically distributed (i.i.d.) sequence with probability density function (pdf) $p_x(\cdot)$, zero mean, and variance $\sigma_x^2$.

A2) The elements of $h_\tau$ are drawn from an i.i.d. sequence with pdf $p_{h_\tau}(\cdot)$.

A3) The elements of $h_\nu$ are drawn from an i.i.d. sequence with pdf $p_{h_\nu}(\cdot)$ and zero mean.

A4) $\mathrm{vec}(X)$, $h_\tau$, $h_\nu$, and $z$ are mutually independent.

C1) From A4), the matrices $A_\tau$ and $A_\nu$ are independent.

C2) From A1) and A3), $\mathbb{E}(A_\tau) = 0$.

Theorem 11. Under the regularity conditions in [107], assumptions A1–A4, and consequences C1–C2, the MSE matrix of the leaked MIMO OFDM channel estimator $\hat{h}(y)$, defined as $\Sigma_h = \mathbb{E}\{[\hat{h}(y) - h][\hat{h}(y) - h]^H\}$, satisfies

$$\Sigma_h \geq \begin{bmatrix} \left( \frac{1}{\sigma_z^2} R_\tau + \rho_\tau I_\tau \right)^{-1} & 0 \\ 0 & \left( \frac{1}{\sigma_z^2} R_\nu + \rho_\nu I_\nu \right)^{-1} \end{bmatrix}, \qquad (5.23)$$

where the latter matrix is the complex CRB matrix [107, 115], $h = [h_\tau^H, h_\nu^H]^H$, $I_\tau = I_{N_t^2 L m_0 p_0 n_T}$ is the identity matrix of size $N_t^2 L m_0 p_0 n_T$, $I_\nu = I_{N_t p_0 n_T}$ is the identity matrix of size $N_t p_0 n_T$, $\rho_\tau = \mathbb{E}\left| \frac{\partial \ln p(h_\tau)}{\partial h_\tau^H} \right|^2$, $\rho_\nu = \mathbb{E}\left| \frac{\partial \ln p(h_\nu)}{\partial h_\nu^H} \right|^2$, $R_\tau = \mathbb{E}(A_\tau^H A_\tau)$, and $R_\nu = \mathbb{E}(A_\nu^H A_\nu)$.

Proof: See Appendix D.3.

The above result shows that we have an effectively decoupled CRB on $h_\tau$ and $h_\nu$. In other words,

$$\mathrm{CRB}\big|_{h_\tau} = \left( \frac{1}{\sigma_z^2} R_\tau + \rho_\tau I_\tau \right)^{-1}, \qquad (5.24)$$

$$\mathrm{CRB}\big|_{h_\nu} = \left( \frac{1}{\sigma_z^2} R_\nu + \rho_\nu I_\nu \right)^{-1}. \qquad (5.25)$$

As mentioned in Section 5.2, the estimation of $h_\nu$ is parametric with respect to the Doppler values. Therefore, we can directly derive a CRB on the Doppler values instead of $h_\nu$. We leverage the relationship for CRBs of functions of parameters [116]; that is, $\mathrm{CRB}|_{\bar{\nu}} = T^H\, \mathrm{CRB}|_{h_\nu}\, T$, where $\bar{\nu} \in \mathbb{C}^{N_t p_0 n_T}$ is the vector of all Doppler parameters to be estimated for each receive antenna for all $n_T$ measurements, and $T \in \mathbb{C}^{N_t p_0 n_T \times N_t p_0 n_T}$ is the Jacobian matrix of the transformation, defined as $T = \left\{ \frac{\partial h_{\nu, j}}{\partial \bar{\nu}_i} \right\}$, $j = 1, 2, \ldots, N_t p_0 n_T$, $i = 1, 2, \ldots, N_t p_0 n_T$. Thus, the CRB on the Doppler parameters can finally be written as

$$\mathrm{CRB}\big|_{\bar{\nu}} = T^H \left( \frac{1}{\sigma_z^2} R_\nu + \rho_\nu I_\nu \right)^{-1} T. \qquad (5.26)$$

In Section 5.4.2, we derive the CRB when the effective separability of the channel is not exploited, which allows us to determine how much gain we obtain when we consider the structure.
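To illustrate the decoupled bound of (5.25), the following is a minimal numpy sketch that assembles the block-diagonal $A_\nu$ of (5.22) for one realization and evaluates the trace of the bound. Two simplifications are ours: a single realization replaces the expectation $R_\nu = \mathbb{E}(A_\nu^H A_\nu)$ (a Monte Carlo average over realizations would estimate it), and $\rho_\nu$ is treated as a given scalar.

```python
import numpy as np

def crb_hnu_trace(X, H_tau, sigma2_z, rho_nu):
    """tr(CRB|h_nu) of (5.25), with A_nu built per the block structure of (5.22)."""
    M = X.T @ H_tau                   # rows [X^T H_tau]_i, shape (n_T, N_t p_0)
    nT, q = M.shape
    R_nu = np.zeros((nT * q, nT * q), dtype=complex)
    for i in range(nT):               # A_nu^H A_nu is block diagonal
        R_nu[i * q:(i + 1) * q, i * q:(i + 1) * q] = np.outer(M[i].conj(), M[i])
    crb = np.linalg.inv(R_nu / sigma2_z + rho_nu * np.eye(nT * q))
    return np.real(np.trace(crb))
```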
5.4.2 Cramér–Rao Bound on the Whole Channel Estimator

We first derive the CRB for the unstructured case, i.e., for the whole channel $\mathcal{H}$. We can rewrite Equation (5.18) as follows (with the superscript $(n_r)$ dropped for simplicity):

$$y = \mathrm{diag}(X^T \mathcal{H}) + z = \mathcal{A}(X^T)\, h + z, \qquad (5.27)$$

where $\mathcal{A}$ is an operator applied to $X^T$ such that $\mathcal{A} : \mathbb{C}^{n_T \times N_t L m_0 n_T} \to \mathbb{C}^{n_T \times N_t L m_0 n_T^2}$, and $h \in \mathbb{C}^{N_t L m_0 n_T^2 \times 1} = \mathrm{vec}(\mathcal{H})$. The operation of $\mathcal{A}$ can be described as follows (with $A$ defined as the matrix representation of the operator $\mathcal{A}$, $A \in \mathbb{C}^{n_T \times N_t L m_0 n_T^2}$):

$$\mathcal{A}(X^T) = A = \begin{bmatrix} [X^T]_1 & 0_{N_t L m_0 n_T}^T & \cdots & 0_{N_t L m_0 n_T}^T \\ 0_{N_t L m_0 n_T}^T & [X^T]_2 & \cdots & 0_{N_t L m_0 n_T}^T \\ \vdots & \vdots & \ddots & \vdots \\ 0_{N_t L m_0 n_T}^T & 0_{N_t L m_0 n_T}^T & \cdots & [X^T]_{n_T} \end{bmatrix}, \qquad (5.28)$$

where $0_{N_t L m_0 n_T}$ is the all-zeros column vector of size $N_t L m_0 n_T$. We make the following assumptions:

B1) The elements of $h$ are drawn from an i.i.d. sequence with pdf $p_h(\cdot)$ and zero mean.

B2) The elements of $\mathrm{vec}(X)$, $h$, and $z$ are mutually independent.

Theorem 12. Under the regularity conditions [107] and assumptions A1 and B1–B2, the MSE matrix of the leaked MIMO OFDM channel estimator $\hat{h}(y)$, defined as $\Sigma_h = \mathbb{E}\{[\hat{h}(y) - h][\hat{h}(y) - h]^H\}$, satisfies

$$\Sigma_h \geq \left[ \frac{1}{\sigma_z^2} R + \rho\, I_{N_t L m_0 n_T^2} \right]^{-1}, \qquad (5.29)$$

where the latter matrix is the complex CRB matrix [107, 115], $R = \mathbb{E}(A^H A)$, $\rho = \mathbb{E}\left| \frac{\partial \ln p(h)}{\partial h^H} \right|^2$, and $I_{N_t L m_0 n_T^2}$ is the identity matrix of size $N_t L m_0 n_T^2$.

Proof: It is very similar to that of Theorem 11.

We emphasize two points:

1. The decoupled CRB is derived for a vector $\theta = [h_\tau^H, h_\nu^H]^H$ of length $N_t^2 L m_0 p_0 n_T + N_t p_0 n_T$, i.e., $\mathcal{O}(n_T)$, because it leverages the effective "separability" of the channel in the delay and Doppler domains; whereas the undecoupled CRB is derived for a vector $\theta = h$ of length $N_t L m_0 n_T^2$, i.e., $\mathcal{O}(n_T^2)$.

2. To connect the different CRBs, we define the structured vector $h_s = [h_\tau^H, h_\nu^H]^H = [\mathrm{vec}(\mathcal{H}_\tau)^H, \mathrm{vec}(\mathcal{H}_\nu)^H]^H \in \mathbb{C}^{(N_t^2 L m_0 p_0 n_T + N_t p_0 n_T) \times 1}$. Invoking the relationship for CRBs of functions of parameters [116],

$$\mathrm{CRB}\big|_{h_{ws}} = G^H\, \mathrm{CRB}\big|_{h_s}\, G, \qquad (5.30)$$

where $h_{ws} \in \mathbb{C}^{N_t L m_0 n_T^2 \times 1}$ is the whole effective channel vector when the separability structure is leveraged (the subscript "ws" stands for whole structured), and $G \in \mathbb{C}^{(N_t^2 L m_0 p_0 n_T + N_t p_0 n_T) \times N_t L m_0 n_T^2}$ is the Jacobian matrix of the transformation, defined as $G = \left\{ \frac{\partial h_{s, j}}{\partial h_{ws, i}} \right\}$, $j = 1, 2, \ldots, N_t^2 L m_0 p_0 n_T + N_t p_0 n_T$, $i = 1, 2, \ldots, N_t L m_0 n_T^2$.

5.5 Sequences That Optimize the Cramér–Rao Bound

In this section, we seek the sequences that optimize the CRB. To that end, we assume the training sequences are deterministic. To compute the modified CRB, we have the following assumption D1 and consequence E1:

D1) The elements of $h_\tau$, $h_\nu$, and $z$ are mutually independent.

E1) From D1), the matrices $A_\tau$ and $A_\nu$ are independent.

Theorem 13. Under the regularity conditions [107], assumptions A2–A3 and D1, and consequences E1 and C2, the MSE matrix of the leaked MIMO OFDM channel estimator $\hat{h}(y)$, defined as $\Sigma_h = \mathbb{E}\{[\hat{h}(y) - h][\hat{h}(y) - h]^H\}$, satisfies

$$\Sigma_h \geq \begin{bmatrix} \left( \frac{1}{\sigma_z^2} R_\tau + \rho_\tau I_\tau \right)^{-1} & 0 \\ 0 & \left( \frac{1}{\sigma_z^2} R_\nu + \rho_\nu I_\nu \right)^{-1} \end{bmatrix}, \qquad (5.31)$$

where the latter matrix is the complex CRB matrix [107, 115], and $h$, $R_\tau$, $R_\nu$, $I_\tau$, $I_\nu$, $\rho_\tau$, and $\rho_\nu$ are as defined in Theorem 11.

Proof: It is very similar to that of Theorem 11.

We observe that the resulting CRBs in Theorems 11 and 13 are identical for deterministic and random sequences, owing to the i.i.d. assumptions.
To find the sequences $\hat{X}$ that minimize the CRB in the delay and Doppler domains separately, we pose the following optimization problems:

$$\arg\min_X \; J_\tau \quad \text{s.t.} \quad \mathrm{tr}\left( X X^H \right) = P, \qquad (5.32)$$

$$\arg\min_X \; J_\nu \quad \text{s.t.} \quad \mathrm{tr}\left( X X^H \right) = P, \qquad (5.33)$$

where $J_\tau = \mathrm{tr}\left( \frac{1}{\sigma_z^2} R_\tau + \rho_\tau I_\tau \right)^{-1}$, $J_\nu = \mathrm{tr}\left( \frac{1}{\sigma_z^2} R_\nu + \rho_\nu I_\nu \right)^{-1}$, and $P$ is a scalar that acts as a power constraint. We next introduce two approximate identities that will be employed to solve for the desired training sequences. The identities stem from seeking the stationary points of the objective functions specified in Equations (5.32) and (5.33). The optimal sequences are stationary points of these cost functions; however, solving for the optimal solutions yields fixed-point equations, which we ultimately solve iteratively. The term "approximate" is used here as there are two approximations we need to make in order to obtain the desired identities.

Approximate Identity 1. The training sequences $X$ that optimize the CRB in the delay domain are the solution of the optimization problem presented in Equation (5.32) and are approximately given by solving the following implicit equation for $X$:

$$-\frac{1}{\sigma_z^2} \left( \mathrm{tr}\left( C D^T \right) + \mathrm{tr}\left( E R \right) \right) \left( \frac{1}{\sigma_z^2} E^H H_{a\nu} (I_{n_T} \otimes X^T)^H N (I_{n_T} \otimes X^T) H_{a\nu} + \rho_\tau I_\tau \right)^{-2T} + \lambda X^T = 0, \qquad (5.34)$$

where the definitions of $H_{a\nu}$, $C$, $D$, $E$, $L$, $N$, and $\lambda$ are provided in the proof.

Proof: See Appendix D.4.

Approximate Identity 2. The training sequences $X$ that optimize the CRB in the Doppler domain are the solution of the optimization problem presented in Equation (5.33) and are approximately given by solving the following implicit equation for $X$:

$$-\frac{1}{\sigma_z^2} \left( \mathrm{tr}\left( W Q^T \right) + \mathrm{tr}\left( O P \right) \right) \left( \frac{1}{\sigma_z^2} E^H H_{a\tau} (I_{n_T} \otimes X^T)^H N (I_{n_T} \otimes X^T) H_{a\tau} + \rho_\nu I_\nu \right)^{-2T} + \lambda X^T = 0, \qquad (5.35)$$

where the definitions of $H_{a\tau}$, $W$, $Q$, $O$, $P$, and $\lambda$ are provided in the proof.

Proof: Similar to the proof of Approximate Identity 1.

The fixed-point equations (5.34) and (5.35) are solved iteratively: we start with an initial point and keep iterating until the left- and right-hand sides are nearly equal (to an accuracy of $10^{-6}$).
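As an illustration of this step, the following is a generic damped fixed-point iteration in numpy. Both the rearrangement of (5.34)/(5.35) into the form $X = G(X)$ (supplied here as the callable `F`) and the damping factor are our assumptions; the dissertation only states that the equations are iterated to an accuracy of $10^{-6}$.

```python
import numpy as np

def solve_fixed_point(F, X0, damping=0.5, tol=1e-6, max_iter=500):
    """Damped fixed-point iteration X <- (1-d) X + d F(X) for (5.34)/(5.35)."""
    X = X0
    for _ in range(max_iter):
        X_next = (1 - damping) * X + damping * F(X)
        if np.linalg.norm(X_next - X) < tol:   # left/right sides nearly equal
            return X_next
        X = X_next
    return X
```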
Our narrowband time-varying channel is constructed based on the model in Equation (5.3), in which the channel delay, attenuation and Doppler parameters are randomly generated. Delay parameters are normalized to be in (0, 1], and normalized Doppler parameters with support [− 1 2 , 1 2 ], are both generated via uniform random variables. Attenuation parameters are generated using a Rayleigh random variable (σ = 1). We generate transmitted MIMO OFDM BPSK training symbols withL subcarriers, on each transmit antennan t . All simulations are averaged over 1000 experiments. Ground truth is denoted by 0 , and the estimated matrices as ˆ . The Normalized Mean Square Error (NMSE) is defined as || 0 − ^ || 2 F || 0 || 2 F , where||.|| 2 F is the squared Frobenius norm. In Fig. 5.1, we provide the NMSE of the delay matrix estimate, and a similar plot is obtained for the Doppler values estimate (omitted due to space limits). We fix the In all simulations, as mentioned earlier, the term “delay” implicitly means the effective delay component, or the leakage componentT . 101 simulation parameters m 0 = 3, p 0 = 2 and L = 64; we vary n T as well as N t and N r , and see the impact on the NMSE of the Doppler and delay estimates. First, it is clear that, for very small SNR values and for all values ofn T , N t andN r , the NMSE is low and the CRB is essentially achieved. Second, as we increase n T , performance improves. Third, increasing (N t ,N r ) degrades performance for fixedn T , as the number of estimated parameters will increase in this case. We conclude that we achieve strong estimation performance as long as the number of measurements,n T , is on the order of the number of unknowns,m 0 p 0 LN t N r , which matches the scaling law of [100, 118]. Fig. 5.2 shows the NMSE of the estimation results of the leaked MIMO OFDM channel matrix, defined as ||H− ^ H|| 2 F /||H|| 2 F , for our method versus LASSO (l 1 -norm), ASCE and SOMP-optimized for different SNR values. The simulation parameters used aren T = 8000,m 0 = 3,p 0 = 2,L = 64, andN t = N r = 2. As can be clearly seen, our estimation method performs significantly better than LASSO with an average improvement of 6.3 dB. Moreover, our method does considerably better than ASCE with an improvement of 2 dB, on average. These two methods do not exploit the knowledge of the leakage as we do. More importantly, for the SOMP-optimized method that considers the leakage effect; our method has an improvement of 1.3 dB, on average. Clearly, our method nearly approaches the whole structured CRB (given in (5.30)), suggesting near optimality of our method. Furthermore, the whole structured CRB yields a gain of 2 dB, on average, over the whole unstructured CRB—underscoring that the more structure considered, i.e., separability and sparsity, the greater the performance. We study the impact of the training sequences that optimize the CRB in the delay and Doppler domains. Simulation parameters used arem 0 = 3,p 0 = 4,L = 16,N r = N t = 2, andn T = 800. We simulate the following instances: 1. CRB-Random: Random training sequences are used. 102 2. CRB/NMSE-Optimal: Optimal training sequences are used (See Approximate Identities 1 and 2, Equations (5.34) and (5.35)) for the CRB calculation/ for calculating the NMSE. 3. CRB-Mismatched: We use the optimal training sequence for delay to estimate Doppler and vice versa. 4. CRB/NMSE-Quantized Optimal: y The optimized sequences are quantized to BPSK and used for the CRB calculation/ for calculating the NMSE. In Fig. 
In Fig. 5.3, we provide the CRBs for the Doppler domain; a similar set of plots is obtained for the delay domain (omitted due to space limits). It is clear from Fig. 5.3 that the optimized, continuous-valued sequence offers the best performance. A performance gain on the order of 5 dB over purely random sequences is achieved in the Doppler domain (and a 2.5 dB gain in the delay domain). Quantization degrades the performance somewhat, but still offers an improvement over the random BPSK sequences. Mismatched sequences have the worst performance. Our algorithm's performance nearly approaches the structured CRB for optimal sequences as well as for quantized-optimal sequences. We emphasize that the optimized sequences used in these experiments are based on solving the individual optimizations of the delay objective $J_\tau$ and the Doppler objective $J_\nu$, as given in (5.32) and (5.33), respectively. The same optimization procedure can be applied jointly to both the delay and Doppler objectives, i.e., $J_{\mathrm{joint}} = J_\tau + \xi J_\nu$, where $\xi$ is a weighting parameter. That being said, our goal is to understand the differences between the two sets of optimizing sequences; thus, the separate optimization was conducted.

Figure 5.3. NMSE/CRB of Doppler for various settings versus SNR.

Figure 5.4. Distance from whiteness versus sample size (delay-optimized, Doppler-optimized, and white Gaussian sequences).

We examine a measure of how white (uncorrelated) a sequence is; that is, the distance to whiteness $W$ [108],

$$W(\varphi) = \log\left( \frac{1}{2\pi} \int_{-\pi}^{\pi} \varphi(w)\, dw \right) - \frac{1}{2\pi} \int_{-\pi}^{\pi} \log(\varphi(w))\, dw, \qquad (5.36)$$

where $\varphi(w)$ is the power spectral density (PSD) of the sequence. The function $W(\varphi)$ has two key properties [108]: $W(\varphi) = 0$ if and only if the sequence is white; and, as correlation increases, the value of $W(\varphi)$ increases as well. We compute the periodogram $I_N(w)$ [108] for the estimation of the spectral density, given by

$$I_N(w) = \frac{1}{2\pi N} \left| \sum_{k=1}^{N} X_k\, e^{j k w} \right|^2, \qquad (5.37)$$

where $N$ is the sample size and $X_k$ is the $k$-th element of the sequence to be tested. Consequently, the new statistic $\hat{W}_N$ is the estimate of (5.36) with $\varphi(w)$ replaced by $I_N(w)$. It is observed that $\lim_{N \to \infty} \hat{W}_N \xrightarrow{\text{a.s.}} W$ [108, 119]. Convergence in probability is sufficient for the whiteness test [108], and it is guaranteed by the (stronger) almost-sure convergence. We provide numerical results of $\hat{W}_N$ for the delay-optimized and Doppler-optimized training sequences, as well as for purely white Gaussian sequences, as a function of the sample size $N$. The white Gaussian sequences are included as a reference for comparison. The sample size has the dimension of a single column of $X$, i.e., $N_t L m_0 n_T$, and the results are averaged over all columns, $n_T$, each of which is first averaged over 1000 runs.
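As a concrete illustration, the following is a minimal numpy sketch of the whiteness statistic $\hat{W}_N$: the integrals in (5.36) are approximated by Riemann sums over the FFT frequency grid, with the periodogram (5.37) standing in for the PSD. The FFT-grid discretization and the small floor added inside the logarithm are our assumptions.

```python
import numpy as np

def whiteness_distance(x):
    """Estimate W_hat_N of (5.36) with the periodogram I_N(w) of (5.37)."""
    N = len(x)
    I_N = np.abs(np.fft.fft(x)) ** 2 / (2 * np.pi * N)  # periodogram at w_k = 2*pi*k/N
    # each bin has width 2*pi/N, so (1/2pi) * integral ~ mean over bins
    mean_psd = I_N.mean()
    mean_log = np.log(I_N + 1e-300).mean()              # floor avoids log(0)
    return np.log(mean_psd) - mean_log
```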
As can be seen from Fig. 5.4, as the sample size increases, all the sequences tend to become whiter; however, the Doppler-optimized training sequences achieve this faster than the delay-optimized sequences, leading to lower CRB values (as well as lower MSE values) for the Doppler-optimized training sequences than for the delay-optimized training sequences at any given SNR value. The white Gaussian sequences are the fastest. We emphasize the fact that a transmitter essentially transmits one sequence in the end. An exhaustive search is performed to choose the best linear combination of the delay-optimized and Doppler-optimized sequences so that the NMSE is minimized; the result is a weight of 0.64 on the Doppler-optimized sequences and 0.36 on the delay-optimized sequences.

Finally, we examine the ambiguity functions of both the delay-optimized and Doppler-optimized sequences, as well as the cross-ambiguity function between them. The ambiguity functions are used to visualize differences stemming from the two resulting sequences. Ambiguity functions are often employed by radar designers as a means of studying different waveforms [120]; they are used to determine the range and Doppler resolutions for a specific radar waveform. Mathematically, the ambiguity function of a waveform/sequence $s(t)$ is defined as

$$\mathcal{X}(\tau; \nu) = \int_{-\infty}^{\infty} s(t)\, s^*(t - \tau)\, e^{j 2 \pi \nu t}\, dt. \qquad (5.38)$$

The expression in (5.38) can be extended to the cross-ambiguity function between two sequences/waveforms $s_1(t)$ and $s_2(t)$ by replacing $s(t)$ with $s_1(t)$ and $s^*(t - \tau)$ with $s_2^*(t - \tau)$. Since we do not have a continuous representation for the sequences, the ambiguity/cross-ambiguity functions are numerically calculated for a large sample size (using the "ambgfun" command in MATLAB). We denote the Ambiguity Function by AF and the Cross-Ambiguity Function by CAF. For Figures 5.5 through 5.7, the simulation parameters used are $m_0 = 3$, $p_0 = 4$, $N_t = N_r = 2$, $L = 16$, and $n_T = 800$. The sample size is as previously defined for the whiteness tests. For the AFs of the delay-optimized sequences, we calculate the AF of a single column of $X$, i.e., of length $N_t L m_0 n_T$, and the results are averaged over all columns, $n_T$, each of which is first averaged over 1000 runs; similarly for the AFs of the Doppler-optimized sequences. For the CAF between the delay-optimized and Doppler-optimized sequences, we consider the CAF between each column of the delay-optimized sequence matrix and the corresponding column of the Doppler-optimized sequence matrix, and the results are averaged over all columns, $n_T$, each of which is also first averaged over 1000 runs.

(Footnote: We also considered the CAFs among the delay-optimized sequences separately and among the Doppler-optimized sequences separately. The delay-optimized sequences are highly correlated with each other, as are the Doppler-optimized sequences. Results are omitted due to space limits.)

(Footnote: We also considered the CAF for all possible combinations of columns between the delay-optimized and Doppler-optimized sequences, i.e., $n_T^2$ combinations. This also shows a high correlation between the delay- and Doppler-optimized sequences for each possible pair. These results are omitted due to space limits; we observe that the results for matching columns suffice.)
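The dissertation computes these surfaces with MATLAB's ambgfun; the following is a minimal numpy re-implementation of the discrete analogue of (5.38), shown only to make the computation concrete. The normalized-Doppler grid size is our assumption.

```python
import numpy as np

def ambiguity(s1, s2=None, n_dopplers=101):
    """Discrete (cross-)ambiguity function per (5.38): correlate s1 against
    Doppler-modulated copies over all integer lags; |af| is the plotted surface."""
    s2 = s1 if s2 is None else s2          # s2 = s1 gives the auto-AF
    N = len(s1)
    nus = np.linspace(-0.5, 0.5, n_dopplers)   # normalized Doppler axis
    n = np.arange(N)
    af = np.empty((n_dopplers, 2 * N - 1), dtype=complex)
    for i, nu in enumerate(nus):
        # sum_n s1[n] e^{j 2 pi nu n} conj(s2[n - tau]) over all lags tau
        af[i] = np.correlate(s1 * np.exp(2j * np.pi * nu * n), s2, mode="full")
    return nus, af
```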
Figure 5.5. AF for delay-optimized sequences.

Figure 5.6. AF for Doppler-optimized sequences.

Figure 5.7. CAF between delay-optimized sequences and Doppler-optimized sequences.

Figs. 5.5 and 5.6 show the absolute values of the normalized AF of the delay-optimized and Doppler-optimized sequences, respectively. Clearly, both sequences are nearly white, as predicted by $\hat{W}_N$. Furthermore, at $\nu = 0$ the absolute value of the AF reduces to the absolute value of the auto-correlation function, which confirms that the Doppler-optimized sequences are less correlated than the delay-optimized sequences. Additionally, Fig. 5.7 shows the absolute value of the normalized CAF between the delay-optimized and Doppler-optimized sequences. Similarly, at $\nu = 0$ the absolute value of the CAF reduces to the absolute value of the cross-correlation function, which shows that the two sequences are strongly correlated with one another.

5.6.2 Experimental Data

We use the data recorded during the 2008 Surface Processes and Acoustic Communications Experiment (SPACE'08) [110], which was conducted off the coast of Martha's Vineyard, MA, from October 14 to November 1, 2008. The bandwidth is denoted by $B$ and the subcarrier spacing by $\Delta f = \frac{B}{L}$, where $L$, the total number of subcarriers in one OFDM symbol, is equal to 1024. One OFDM symbol duration is denoted by $T = \frac{1}{\Delta f}$. Every OFDM block is followed by a guard interval of duration $T_g$ to avoid interblock interference; we also use the guard interval to estimate $p_0$ and $m_0$, as shown below. The sampling rate $f_s$ is 39.0625 kHz, and the carrier frequency is 13 kHz. The frequency band used is from 8.12 kHz to 17.88 kHz; thus, $B = 9.765625$ kHz. We use the first minute of the recordings, which has 2343750 samples; 480000 of these are training samples, whereas the other samples are used for zero-padding, synchronization, and guard intervals between consecutive blocks. We divide that minute into two segments, each with 240000 training samples: for the first 30 seconds, a $2 \times 2$ MIMO-OFDM setting is implemented, whereas for the second 30 seconds, a $2 \times 4$ MIMO-OFDM setting is implemented. Given the above values of the key parameters, $\Delta f = 9.5367$ Hz and $T = 104.8577$ ms, which corresponds to 16000 training samples; thus, we have a total of 30 OFDM blocks, and each MIMO setting accordingly comprises 15 OFDM blocks.

To apply our algorithms, $p_0$ and $m_0$ have to be known a priori. We estimate the two values from the $T_g$ guard interval. Samples whose power is larger than 50% of the power of the maximum-power sample are considered dominant paths; their count gives $p_0$. Further, $m_0$ is calculated as described in Section 5.2, Equation (5.5): we divide the maximum delay spread (the time corresponding to the last sample whose power exceeds 50% of that of the maximum-power sample) by the sampling time $T_s$ and round up to compute $m_0$. Additionally, necessary preprocessing is performed at the front end to bring the wideband nature of the underwater data to a narrowband realization, by means of resampling.
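The guard-interval heuristic just described is simple enough to state directly in code; the following numpy sketch assumes `guard` holds the (complex) guard-interval tap estimates and uses $m_0 = \lfloor \tau_{\max}/T_s \rfloor + 1$ as in Equation (5.5).

```python
import numpy as np

def estimate_p0_m0(guard, Ts, thresh=0.5):
    """Estimate p0 (dominant paths) and m0 from guard-interval taps."""
    power = np.abs(guard) ** 2
    dominant = power > thresh * power.max()       # taps above 50% of the peak
    p0 = int(dominant.sum())
    tau_max = np.flatnonzero(dominant)[-1] * Ts   # delay of last dominant tap
    m0 = int(np.floor(tau_max / Ts)) + 1          # per m0 in Equation (5.5)
    return p0, m0
```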
Each minute is divided into two 30 seconds portions, as mentioned earlier; with the two aforementioned MIMO settings. The BER is averaged over the two minutes for each MIMO OFDM setting. As shown in Fig. 5.8, our method is superior relative to SCE, with approximately an order of 2 (i.e., 3 dB), as the latter does not take into account leakage effects. It should also be mentioned that the abscissa in Fig. 5.8 represents the OFDM block index versus E b N 0 because, with real data, the noise variance is not known a priori. Acknowledgments We also would like to thank Prof. Shengli Zhou for providing the SPACE’08 experimental data used herein. 109 Chapter 6 Conclusions In Chapters 2 and 3, we extend the type of structures considered in the context of structured estimation beyond low rank and sparsity. In particular, in Chapter 2, we devise a formulation that enables the finding of “lines” in matrices. This generalization is applied to the tracking of a single object in video. We develop an iterative algorithm based on the ALM method to recover the underlying sparse and low-rank components of the matrix with the “line constraint.” The convergence of the iterative LE-ALM algorithm is assessed via a boundedness analysis of key components. Furthermore, we are able to show that an aggregate error converges to zero. We show that the accumulation points of the iterative algorithm both exist and are feasible solutions. The algorithm is assessed via simulation and shows measurable improvement over state-of-the-art background subtraction methods that do not exploit the line constraint. The robustness of LE-ALM to the line assumption is also investigated—when the trajectory deviates from a line, performance reduces to that of the previously proposed background subtraction methods exploiting sparsity and low-rank alone. Furthermore, our proposed technique has been compared to three target tracking techniques, and it has been shown that it gains over these techniques. Finally, a new CRB, that incorporates the additional line constraint, has been derived. Numerical results show 110 that the performance of LE-ALM is very close to the derived CRB, with only 1.2 dB deviation on average suggesting the near-optimality of LE-ALM. In Chapter 3, types of structures considered for background subtraction are extended beyond sparsity and low-rank. The incorporation of algebraic varieties into the background subtraction model is shown to be very efficient, in terms of tracking targets moving on nonlinear trajectories. We develop an iterative algorithm to solve the optimization and recover the low-rank and sparse components as well as the locations of the sparse object, induced by the “algebraic variety.” The proposed VBBS-IRLS algorithm is shown to be convergent as iterations progress. Exact recovery is observed in a noiseless environment, whereas a 2.4 dB performance improvement is observed over previous background subtraction methods that consider sparse and low-rank structures alone. Convergence of the proposed algorithm is thoroughly studied. A boundedness analysis is conducted where it is proved that the sequence is bounded at every iteration (not divergent), and any limit point of the sequence generated by the algorithm is a stationary point, in addition to an error approaching zero on the limit. A performance bound on the proposed algorithm is provided, which binds all system parameters together along with the sensitivity to initialization to achieve a required estimation accuracy. 
In Chapters 4 and 5, we develop new parametric descriptions of narrowband time-varying channels that lead to novel channel estimation strategies. In particular, the representations admit a low-rank, bilinear form as functions of the Doppler, delay, and pulse shape functions. These forms suggest matrix factorization methods for channel estimation. In Chapter 4, we consider the single-carrier framework. Despite the popularity of Wirtinger flow methods for solving non-convex optimizations with good initializations, we show that such a strategy does not work well for our channel estimation application: the alternating/gradient descent approach (NADM) for the non-convex objective function suffers from local minima. If the channel is identifiable, it can be shown that the gradient is zero only at the true channel in the noiseless case; however, it is also proven that the variance of the gradient is high, implying slow convergence. In contrast, a different parametric representation leads to a convex optimization wherein the atomic norm heuristic can be employed to promote sparsity/low rank. The parametric, low-rank, atomic norm (PLAN) approach can be implemented via a semi-definite program. Uniqueness conditions are established for PLAN, as well as optimality conditions. Finally, a scaling law is developed that depends on a practically achievable resolution constraint (how close two paths can be in Doppler and delay). A challenge for the future is to fully incorporate the functional form of both Doppler and delay. As true channel sparsity is lost when practical pulse shapes are employed, the proposed methods offer strong improvement over a classical estimation strategy that assumes a truly sparse observation, i.e., LASSO. Simulation results show that the proposed methods can offer anywhere from 5 to 12 dB improvement over LASSO. Furthermore, a 2 dB improvement is observed, on average, over the basis expansion method that considers leakage effects.

In Chapter 5, the MIMO-OFDM framework is considered. It is proven that the leaked MIMO-OFDM channel is effectively separable in the delay and Doppler domains. A convex approach to the resulting optimization problem is considered, in which the atomic norm heuristic is employed. Furthermore, the CRB for the leaked channel parameters is derived to measure the potential optimality of the proposed method; the CRB for the different parameters is proven to be effectively decoupled in the delay and Doppler domains. The CRB on the whole channel estimator is also derived to demonstrate that exploiting more structure can provide tighter bounds. Additionally, fixed-point expressions are derived for the training sequences that optimize the structured CRB in the delay and Doppler domains. Simulation results show an improvement of 6.3 dB on average, with respect to the mean square error, over a classical sparsity-based channel estimation method, and a 2 dB improvement over an adaptive sparse estimation method; these two methods do not consider leakage effects, as we do. Furthermore, our method offers a 1.3 dB improvement, with respect to the mean square error, over a sparse basis-expansion method that takes leakage effects into consideration. Simulation results show that the estimated parameters approach the different CRB variants, even in low SNR regimes, suggesting optimality of the proposed method.
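Both PLAN and the Chapter 5 estimator lean on the atomic norm, which is computed in practice via a semi-definite program. As a self-contained illustration (not the PLAN program itself), the following sketch solves the standard atomic-norm denoising SDP for a scalar line-spectral signal—the one-dimensional analogue of the Doppler-domain structure exploited here. The toy frequencies, noise level, and weight tau are assumptions, and cvxpy with the SCS solver is assumed available.

import numpy as np
import cvxpy as cp

rng = np.random.default_rng(1)
n = 16
true_freqs = [0.11, 0.34]                       # assumed toy frequencies
y0 = sum(np.exp(2j * np.pi * f * np.arange(n)) for f in true_freqs)
y = y0 + 0.05 * (rng.standard_normal(n) + 1j * rng.standard_normal(n))

tau = 2.0                                        # assumed regularization weight
M = cp.Variable((n + 1, n + 1), hermitian=True)  # [[T, x], [x^H, t]] >> 0
x, t = M[:n, n], cp.real(M[n, n])
constraints = [M >> 0]
# Force the top-left block to be Toeplitz, so its PSD-ness certifies
# an atomic (continuous line-spectral) decomposition of x.
constraints += [M[i, j] == M[i + 1, j + 1]
                for i in range(n - 1) for j in range(n - 1)]

obj = cp.Minimize(0.5 * cp.sum_squares(y - x)
                  + 0.5 * tau * (t + cp.real(cp.trace(M[:n, :n])) / n))
cp.Problem(obj, constraints).solve(solver=cp.SCS)
print(np.linalg.norm(x.value - y0) / np.linalg.norm(y0))  # denoising error

The Toeplitz-constrained PSD block is what encodes the continuous frequency dictionary; PLAN and the MIMO-OFDM estimator work in the same spirit, with the scalar atoms replaced by the parametric, leaked channel atoms.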
Optimized training sequences exhibit performance gains on the order of 5 dB and 2.5 dB over purely random training sequences, for Doppler and delay, respectively. Further, those training sequences are highly correlated with one another, as observed by simulations. The proposed method works considerably well with the SPACE'08 experimental data, offering a 3 dB improvement, with respect to the BER, over a traditional sparse recovery method applied to the same experimental dataset.

Appendix A

Line Constrained Estimation with Applications to Target Tracking: Exploiting Sparsity and Low-rank

A.1 Proof of Theorem 1

A.1.1 Background and preliminaries

We denote $x = [l^T, s^T]^T$, and hence the error covariance matrix and the MSE are given by

$$\Sigma_x = E\{(\hat{x}-x)(\hat{x}-x)^T\}, \qquad \mathrm{MSE}_x = E\{\|\hat{x}-x\|_2^2\} = \mathrm{tr}(\Sigma_x). \tag{A.1}$$

We consider locally unbiased estimators [54, 55, 58, 60]. The local unbiasedness condition imposes the unbiased constraint on parameters in the neighborhood of a single point. More specifically, it is required that $E\{\hat{x}(y)\} = z$ for all $z \in B_\epsilon(x) \cap \mathcal{X}_{l,s}$, where $B_\epsilon(x)$ is a ball about $x$ defined as $B_\epsilon(x) = \{z \in \mathbb{R}^{2mn} : \|z - x\|_2 \le \epsilon\}$. The expectation is taken over the noise $n$ for the model at hand, i.e., $y = Ax + n$. It shall be noted that $B_\epsilon(x)$ contains only those points $z$ that are close to $x$ and satisfy the constraint set imposed by the parameter space $\mathcal{X}_{l,s}$. Unlike the unconstrained case, both the bias $b(z)$ and the bias gradient $B(z) = \partial b/\partial z$ vanish at the considered point $z = x$.

From [55] (Lemma 2), the error covariance matrix of any locally unbiased estimator satisfies ($\dagger$ below denotes the pseudo-inverse)

$$\Sigma_x \ge P\,[P^T J_x P]^\dagger\, P^T, \tag{A.2}$$

provided that the $2mn \times 2mn$ Fisher information matrix $J_x = -E\{\nabla^2 \ln f_x\}$ has bounded elements and $\mathrm{rank}(P^T J_x P) = \mathrm{rank}(P)$. The function $f_x$ represents the probability density function for the given parameter $x$. Furthermore, $P$ represents any $2mn \times N$ matrix whose column space equals $\mathrm{span}\{\xi_1, \xi_2, \ldots, \xi_N\}$, where $\{\xi_i\}_{i=1}^N$ are $N$ linearly independent vectors with $x + \delta_i \xi_i \in \mathcal{X}_{l,s}$ for sufficiently small $\delta_i > 0$, and where, recalling Equation (2.12),

$$\mathcal{X}_{l,s} \stackrel{\mathrm{def}}{=} \{(L,S) \in \mathbb{R}^{m\times n}\times\mathbb{R}^{m\times n} : \mathrm{rank}(L) \le r,\ \|S\|_0 \le s,\ \mathrm{rank}(S) = 1\}.$$

An explanation of how this set is constructed was previously provided in Section 2.5.

A.1.2 Derivations

We derive the constrained CRB on the MSE (Equation (A.1)) for the model presented in (2.11) by applying Equation (A.2). The derivations presented in [54, 55] have been adapted to accommodate the additional line constraint. For any $x \in \mathcal{X}_{l,s}$, let $\{\xi_i\}_{i=1}^N$ be $N$ linearly independent vectors that satisfy $\{x_i = x + \delta_i \xi_i\}_{i=1}^N \in \mathcal{X}_{l,s}$ for sufficiently small $\delta_i > 0$. Following Lemma 2 in [55], we obtain a tighter lower bound if we select $\{\xi_i\}_{i=1}^N$ spanning a subspace as large as possible. As suggested by this Lemma, and motivated by the work in [54, 55], the next part of the derivation finds such directions for both $L$ and $S$.

Finding the $L$ directions: Since we make no change to the $L$ matrix, the $L$ directions in our CRB are the same as those provided in [54]. Let $L = U_0 \Sigma_0 V_0^T$ be the SVD of $L$, where $U_0 = [u_1, u_2, \ldots, u_r] \in \mathbb{R}^{m\times r}$, $\Sigma_0 = \mathrm{diag}([\lambda_1, \lambda_2, \ldots, \lambda_r]) \in \mathbb{R}^{r\times r}$, and $V_0 = [v_1, v_2, \ldots, v_r] \in \mathbb{R}^{n\times r}$. Let also $U = [U_0, U_1] \in \mathbb{R}^{m\times m}$ and $V = [V_0, V_1] \in \mathbb{R}^{n\times n}$, where $U_1$ and $V_1$ are unit orthogonal bases for the spaces orthogonal to $\mathrm{span}\{U_0\}$ and $\mathrm{span}\{V_0\}$, respectively.
Using the Kronecker product, the $L$ directions can be expressed as the column vectors of $V_1 \otimes U_0$, $V_0 \otimes U_0$, and $V_0 \otimes U_1$. Then

$$Q_l = [V_1 \otimes U_0,\ V_0 \otimes U_0,\ V_0 \otimes U_1] \in \mathbb{R}^{mn \times [(m+n)r - r^2]}.$$

It shall be noted that the columns of the matrix $Q_l$ are linearly independent, i.e., $Q_l^T Q_l = I_{(m+n)r - r^2}$; in other words, $\mathrm{rank}(Q_l) = (m+n)r - r^2$. (For the full derivations, see [54].)

Finding the $S$ directions: Recall $\mathcal{S} = \mathrm{supp}(S) = \{(i,j) \in [m]\times[n] : S_{ij} \neq 0\}$, where $S$ has sparsity level $s$ and has a line structure embedded in it. To find the linearly independent directions for $S$, we investigate the sparsity constraint and the line constraint separately, and combine afterwards. For the sparsity constraint, as in [54], the directions given by $\xi^s_{i,j} = \mathrm{vec}(e^m_i {e^n_j}^T)$, $(i,j) \in \mathcal{S}$, satisfy that $s + \delta_{i,j}\xi^s_{i,j}$ has at most $s$ nonzero components for sufficiently small $\delta_{i,j}$. Let then $Q_s \in \mathbb{R}^{mn\times s}$ be the matrix whose columns are $\mathrm{vec}(e^m_i {e^n_j}^T)$, $(i,j) \in \mathcal{S}$. For the rank-one constraint that we have added to the definition of $\mathcal{X}_{l,s}$ in (2.12), we follow the same procedure applied to the rank of $L$, except that here, for $S$, we have $r = 1$; the columns of $Q_s$ thus have to be linearly independent, and hence $Q_s \in \mathbb{R}^{mn\times(m+n-1)}$. Combining both conditions, we eventually have $Q_s \in \mathbb{R}^{mn\times\min(m+n-1,\,s)}$.

Combining the results presented for finding the $L$ directions and the $S$ directions, we define $P$ as

$$P = \begin{bmatrix} Q_l & 0 \\ 0 & Q_s \end{bmatrix} \in \mathbb{R}^{2mn \times [(m+n)r - r^2 + \min(m+n-1,\,s)]}. \tag{A.3}$$

Hence, the columns of $P$, namely the $\xi_i$, form a set of unit orthogonal directions such that $\{x_i = x + \delta_i\xi_i\}_{i=1}^N \in \mathcal{X}_{l,s}$ with $N = (m+n)r - r^2 + \min(m+n-1,\,s)$. Recall (A.2), $\Sigma_x \ge P[P^T J_x P]^\dagger P^T$. From this point on, the derivations of the CRB are adapted to the new matrix $P$ (because of the new matrix $Q_s$). Since the Fisher information matrix for the model presented in (2.11) is $J_x = A^T \Sigma^{-1} A$, we have

$$\Sigma_x \ge P\,[P^T(A^T \Sigma^{-1} A)P]^\dagger\,P^T. \tag{A.4}$$

Taking the trace of both sides of (A.4) and carrying out a bit of algebra yields

$$\mathrm{tr}(\Sigma_x) \ge \mathrm{tr}\left(\left(\begin{bmatrix} Q_l & 0 \\ 0 & Q_s\end{bmatrix}^T (A^T \Sigma^{-1} A) \begin{bmatrix} Q_l & 0 \\ 0 & Q_s\end{bmatrix}\right)^{-1}\right). \tag{A.5}$$

Rearranging the right-hand side of (A.5) and using the equality in Equation (A.1), namely $\mathrm{MSE}_{L,S} = \mathrm{tr}(\Sigma_x)$, yields

$$\mathrm{MSE}_{L,S} \ge \mathrm{tr}\left(\left(\begin{bmatrix} Q_l^T A^T \\ Q_s^T A^T \end{bmatrix}\Sigma^{-1}\begin{bmatrix} AQ_l & AQ_s \end{bmatrix}\right)^{-1}\right).$$

This completes the proof of Theorem 1.

It has to be mentioned (see also [54, 55]) that Theorem 1 holds as long as the condition $\mathrm{rank}([AQ_l\ \ AQ_s]) = (m+n)r - r^2 + \min(m+n-1,\,s)$ is satisfied. The rationale behind this is to select $(m+n)r - r^2 + \min(m+n-1,\,s)$ linearly independent rows of $[Q_l\ \ Q_s]$ through the operation $[AQ_l\ \ AQ_s]$, and this requires $A \in \mathbb{R}^{p\times 2mn}$ with $p$ being sufficiently large. Different from [54, 55], we herein have the $\min$ term, which limits the choices of $Q_s$ to be not only sparse but also rank-one.
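As a numerical sanity check on the structure of Theorem 1's bound, the following sketch instantiates (A.3)–(A.5) on a random example: it builds $Q_l$ from the SVD of a low-rank $L$ via Kronecker products (verifying $Q_l^T Q_l = I$), forms sparse directions and the block-diagonal $P$, and evaluates $\mathrm{tr}(P[P^T J_x P]^\dagger P^T)$ with $J_x = A^T A/\sigma^2$. The dimensions, the support set, and the Gaussian $A$ are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)
m, n, r, s, sigma2 = 6, 5, 2, 4, 0.1

# Low-rank L and its direction matrix Q_l = [V1⊗U0, V0⊗U0, V0⊗U1].
L = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))
U, _, Vt = np.linalg.svd(L)
U0, U1 = U[:, :r], U[:, r:]
V0, V1 = Vt.T[:, :r], Vt.T[:, r:]
Ql = np.hstack([np.kron(V1, U0), np.kron(V0, U0), np.kron(V0, U1)])
assert np.allclose(Ql.T @ Ql, np.eye((m + n) * r - r * r))  # orthonormal columns

# Sparse directions: one canonical vec(e_i e_j^T) per support entry,
# truncated to min(m+n-1, s) columns per the line (rank-one) constraint.
support = [(0, 0), (1, 1), (2, 2), (3, 3)][: min(m + n - 1, s)]
Qs = np.zeros((m * n, len(support)))
for k, (i, j) in enumerate(support):
    E = np.zeros((m, n)); E[i, j] = 1.0
    Qs[:, k] = E.flatten(order="F")   # vec() stacks columns

# Block-diagonal P and the projected CRB of (A.4)-(A.5), with Sigma = sigma2*I.
P = np.block([[Ql, np.zeros((m * n, Qs.shape[1]))],
              [np.zeros((m * n, Ql.shape[1])), Qs]])
A = rng.standard_normal((3 * m * n, 2 * m * n))   # assumed tall mixing matrix
J = A.T @ A / sigma2                              # Fisher information
crb = np.trace(P @ np.linalg.pinv(P.T @ J @ P) @ P.T)
print("constrained CRB on the MSE:", crb)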
A.2 Proof of Corollary 1

Following [54] (Equation (20) therein, adapted to the bound derived in Theorem 1), let $\Sigma = \sigma^2 I_p$ and let $P_\Omega \in \mathbb{R}^{mn\times mn}$ denote a diagonal matrix whose diagonal entries are ones for indices in $\Omega$ and zeros otherwise. For RPCA with $\Omega = [m]\times[n]$, of size $p = mn$, and under the assumption $p - \min(m+n-1,\,s) \ge c\,(N\log^3 N)^2 \log(N^2)$, where $N$ (throughout this section) denotes $(m+n)r - r^2$ and $c$ and $\epsilon$ are constants, applying the matrix inversion formula to the right-hand side of (2.13) gives

$$\mathrm{MSE}_{L,S} \ge \mathrm{CRB} = \big(\min(m+n-1,s) - N\big)\sigma^2 + \mathrm{tr}\big((Q_l^T P_{\Omega/\mathcal{S}} Q_l)^{-1}\big)\sigma^2 + \mathrm{tr}\big((Q_l^T P_{\Omega/\mathcal{S}} Q_l)^{-1}(Q_l^T P_\Omega Q_l)\big)\sigma^2. \tag{A.6}$$

Considering the third term in the CRB in Equation (A.6) and using the fact that $P_\Omega = P_{\Omega/\mathcal{S}} + P_\mathcal{S}$, we then have

$$\mathrm{tr}\big((Q_l^T P_{\Omega/\mathcal{S}} Q_l)^{-1}(Q_l^T P_\Omega Q_l)\big)\sigma^2 = N\sigma^2 + \mathrm{tr}\big((Q_l^T P_{\Omega/\mathcal{S}} Q_l)^{-1}(Q_l^T P_\mathcal{S} Q_l)\big)\sigma^2. \tag{A.7}$$

Substituting the result obtained in Equation (A.7) into Equation (A.6) yields

$$\mathrm{CRB} = \big(\min(m+n-1,s) - N\big)\sigma^2 + \mathrm{tr}\big((Q_l^T P_{\Omega/\mathcal{S}} Q_l)^{-1}\big)\sigma^2 + N\sigma^2 + \mathrm{tr}\big((Q_l^T P_{\Omega/\mathcal{S}} Q_l)^{-1}(Q_l^T P_\mathcal{S} Q_l)\big)\sigma^2. \tag{A.8}$$

Rearranging terms yields

$$\mathrm{CRB} = \min(m+n-1,s)\,\sigma^2 + \mathrm{tr}\Big((Q_l^T P_{\Omega/\mathcal{S}} Q_l)^{-1}\big[I_N + Q_l^T P_\mathcal{S} Q_l\big]\Big)\sigma^2. \tag{A.9}$$

For our case, i.e., RPCA, and with the previously mentioned definition of $P_\Omega$, we can write $P_\Omega = I_{mn} = P_{\Omega/\mathcal{S}} + P_\mathcal{S}$. On the other hand, with the choice of $\mathcal{S}$ as a random and uniform subset of $\Omega$ (recall the statement of Corollary 1), we have $P_\mathcal{S} = I_{mn} - P_{\mathcal{S}^c}$. Combining the two aforementioned statements gives $P_{\Omega/\mathcal{S}} = P_{\mathcal{S}^c}$. Substituting this into Equation (A.9) yields

$$\mathrm{CRB} = \min(m+n-1,s)\,\sigma^2 + \mathrm{tr}\Big((Q_l^T P_{\mathcal{S}^c} Q_l)^{-1}\big[I_N + Q_l^T P_\mathcal{S} Q_l\big]\Big)\sigma^2 = \Big[\min(m+n-1,s) - N + 2\,\mathrm{tr}\big((Q_l^T P_{\mathcal{S}^c} Q_l)^{-1}\big)\Big]\sigma^2. \tag{A.10}$$

Applying the concentration result derived in [55] (Theorem 2 therein) to the quantity $2(Q_l^T P_{\mathcal{S}^c} Q_l)^{-1}$ in Equation (A.10), we finally get

$$\Big[\min(m{+}n{-}1,s) - N + \tfrac{1}{3}\,\tfrac{pN}{p-\min(m+n-1,s)} + \tfrac{2}{3}\,\tfrac{mnN}{p-\min(m+n-1,s)}\Big]\sigma^2 \le \mathrm{CRB} \le \Big[\min(m{+}n{-}1,s) - N + \tfrac{3pN}{p-\min(m+n-1,s)} + \tfrac{2mnN}{p-\min(m+n-1,s)}\Big]\sigma^2 \tag{A.11}$$

with probability greater than $1 - 10e^{-c\epsilon^2}$. Furthermore, with the choice $p = mn$, (A.11) simplifies to

$$\Big[\min(m+n-1,s) - N + \tfrac{mnN}{mn-\min(m+n-1,s)}\Big]\sigma^2 \le \mathrm{CRB} \le \Big[\min(m+n-1,s) - N + \tfrac{5mnN}{mn-\min(m+n-1,s)}\Big]\sigma^2$$

with probability greater than $1 - 10e^{-c\epsilon^2}$. This completes the proof of Corollary 1.
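To give a feel for how the simplified bound scales, a short sketch (with purely illustrative values of $m$, $n$, $r$, $s$, and $\sigma^2$) evaluates the two-sided expression above for $p = mn$:

import numpy as np

def corollary1_bounds(m, n, r, s, sigma2):
    # Two-sided CRB bound of (A.11), specialized to p = mn.
    N = (m + n) * r - r * r
    k = min(m + n - 1, s)
    base = k - N
    lo = (base + m * n * N / (m * n - k)) * sigma2
    hi = (base + 5 * m * n * N / (m * n - k)) * sigma2
    return lo, hi

# Illustrative values only.
print(corollary1_bounds(m=100, n=80, r=5, s=50, sigma2=0.01))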
Rearranging terms yields, L k+1 −L k ≤ β k +β k−1 2 kX−L k −S k k 2 F +kR k R T k −I m k 2 F +kS r k −R k S k k 2 F (e) = β k +β k−1 2 e k (f) = β k +β k−1 2 ||M k −M k−1 || 2 F β 2 k−1 + ||N k −N k−1 || 2 F β 2 k−1 + ||V k −V k−1 || 2 F β 2 k−1 ! , (A.13) where equality (e) holds according to definition of the composed error, e k , given in Equation (2.10), and equality (f) follows from the substitution of the composed error from Equation (2.10) into (2.4). Further, from the last equality, the error is of the form: e k =O(β −2 k−1 ). Accordingly, L k+1 −L k ≤ β k +β k−1 2 e k = β k−1 (1 +ρ) 2 e k = (1 +ρ) κβ k−1 , whereκ is some constant. Sinceβ k is geometrically increasing sequence;L is bounded. Additionally, due to the form ofβ k and from (A.13), lim k→∞ e k = 0. This completes the proof of Theorem 2. A.4 Proof of Theorem 3 Before presenting the proof of Theorem 3, we need a key Lemma. 122 Lemma 2: The following limits hold: lim k→∞ β k 2 kX−L k+1 −S k+1 k 2 F +hM k ,X−L k+1 −S k+1 i = 0, lim k→∞ β k 2 kR k+1 R T k+1 −I m k 2 F +hN k ,R k+1 R T k+1 −I m i = 0, lim k→∞ β k 2 kS r k+1 −R k+1 S k+1 k 2 F + D V k ,S r k+1 −R k+1 S k+1 E = 0. Proof: See below. Proof of Lemma 2: For the first equality statement in Lemma 2, taking the Frobenius norm of both sides of the second equation of (2.4) yields, kM k+1 k 2 F =kM k k 2 F +β 2 k kX−L k+1 −S k+1 k 2 F + 2β k hM k ,X−L k+1 −S k+1 i. Simplifying yields, kM k+1 k 2 F −kM k k 2 F 2β k = β k 2 kX−L k+1 −S k+1 k 2 F +hM k ,X−L k+1 −S k+1 i. Since β k is geometrically increasing sequence, hence, β k 2 kX−L k+1 −S k+1 k 2 F + hM k ,X k −L k+1 −S k+1 i approaches zero ask gets large. Similarly, for the second equality statement in Lemma 2, taking the Frobenius norm of both sides of the third equation of (2.4) yields, kN k+1 k 2 F =kN k k 2 F +β 2 k kR k+1 R T k+1 −I m k 2 F + 2β k D N k ,R k+1 R T k+1 −I m E . Simplifying yields, kN k+1 k 2 F −kN k k 2 F 2β k = β k 2 kR k+1 R T k+1 −I m k 2 F + D N k ,R k+1 R T k+1 −I m E . 123 Again, since β k is geometrically increasing sequence, hence, β k 2 kR k+1 R T k+1 −I m k 2 F + D N k ,R k+1 R T k+1 −I m E approachs zero ask gets large. Finally, for the third equality statement in Lemma 2, taking the Frobenius norm of both sides of the fourth equation of (2.4) yields, kV k+1 k 2 F =kV k k 2 F +β 2 k kS r k+1 −R k+1 S k+1 k 2 F + 2β k D V k ,S r k+1 −R k+1 S k+1 E . Simplifying yields, kV k+1 k 2 F −kV k k 2 F 2β k = β k 2 kS r k+1 −R k+1 S k+1 k 2 F + D V k ,S r k+1 −R k+1 S k+1 E . This completes the proof of Lemma 2. Substituting the definition ofL k+1 from (2.3), L k+1 =L(L k+1 ,S k+1 ,R k+1 ,S r k+1 ,M k ,N k ,V k ,β k ) =kL k+1 k ∗ +λ 1 kS k+1 k 2,1 +λ 2 kS r k+1 k ∗ +hM k ,X−L k+1 −S k+1 i + D N k ,R k+1 R T k+1 −I m E + D V k ,S r k+1 −R k+1 S k+1 E + β k 2 kX−L k+1 −S k+1 k 2 F +kR k+1 R T k+1 −I m k 2 F +kS r k+1 −R k+1 S k+1 k 2 F Rearranging yields, kL k+1 k ∗ +λ 1 kS k+1 k 2,1 +λ 2 kS r k+1 k ∗ =L k+1 − hM k ,X−L k+1 −S k+1 i + β k 2 kX−L k+1 −S k+1 k 2 F − D N k ,R k+1 R T k+1 −I m E + β k 2 kR k+1 R T k+1 −I m k 2 F − D V k ,S r k+1 −R k+1 S k+1 E + β k 2 kS r k+1 −R k+1 S k+1 k 2 F . 124 Rearranging terms yields, kL k+1 k ∗ +λ 1 kS k+1 k 2,1 +λ 2 kS r k+1 k ∗ =L k+1 −hM k ,X−L k+1 −S k+1 i−hN k ,R k+1 R T k+1 −I m i−hV k ,S r k+1 −R k+1 S k+1 i − β k 2 kX−L k+1 −S k+1 k 2 F +kR k+1 R T k+1 −I m k 2 F kS r k+1 −R k+1 S k+1 k 2 F | {z } e k+1 . (A.14) From Theorem 2, we have thatL k+1 is bounded. 
From Lemma 2, all other terms on the right hand side of (A.14) are bounded, thus the left hand side is bounded. Consequently, L k ,S k , andS r k are bounded sequences, as ise k due to the fact that it is positive and it converges to zero, as proved in Theorem 2; recalling Equation (2.10),R k is; thus, bounded. This completes the proof of Theorem 3. 125 Appendix B Nonlinear Trajectory Tracking via Variety-Based Background Subtraction Models B.1 Proof of Theorem 4 As stated in Theorem 4, we need to prove three statements: I1)J (S t+1 )≤J (S t ). I2){S t }<∞∀t. I3) lim t→∞ kS t −S t+1 k F = 0. Before proceeding, we provide several key Lemmas from prior art. 126 Lemma 3: (See [70]) Assume thath(Y ) is concave and differential on the set of symmetric, positive semidefinite matricesS n ++ . Then, for any two matrices Y,Z ∈ S n ++ the following holds, h(Y )−h(Z)≥ tr (Y−Z) T ∇h(Y ) . Lemma 4: (See [121]) ForG∈R n×n , tr(G u ) is concave onS n ++ when 0<u< 1. Lemma 5: (See [70]) LetY andZ∈R m×n be general two matrices, such that each column ofY and Z is nonzero. Letf j (w), j = 1,...,n, be concave and differentiable function, then the the following holds, n X j=1 f j (kY (j) k 2 2 )−f j (kZ (j) k 2 2 )≥ tr (Y T Y−Z T Z)E , whereE is a diagonal matrixR n×n with E jj =∇f j (kY (j) k 2 2 ). For the special case wheref j (w) =w u 2 , 0<u< 2, the following holds, kYk c 2,c −kZk c 2,c ≥ c 2 tr (Y T Y−Z T Z)E , whereE jj = (kY (j) k 2 2 ) c 2 −1 . 127 Lemma 6: (See [122]) Let be two matricesY andZ∈S n ++ . Letl 1 (Y )≥ l 2 (Y )≥···≥ l n (Y )≥ 0, and l 1 (Z)≥ l 2 (Z)≥···≥ l n (Z)≥ 0 be ordered eigenvalues ofY andZ, respectively. Then, the following holds, tr(YZ)≥ n X i=1 l i (Y )l n−i+1 (Z). A) Proof of statement I1: First, from (3.9), L(S t )−L(S t+1 ) = tr (X−S t ) T (X−S t ) +μ 2 1 I n q 2 − tr (X−S t+1 ) T (X−S t+1 ) +μ 2 1 I n q 2 (d) ≥ q 2 tr (X−S t ) T (X−S t )− (X−S t+1 ) T (X−S t+1 ) T (X−S t ) T (X−S t ) +μ 2 1 I n q 2 −1 = q 2 tr (X−S t )− (X−S t+1 ) T (X−S t )− (X−S t+1 ) M t ! +q tr (X−S t )− (X−S t+1 ) T X−S t+1 M t ! = q 2 tr (S t+1 −S t ) T (S t+1 −S t )M t +q tr (S t+1 −S t ) T (X−S t+1 )M t , (B.1) where (d) holds by applying Lemma 4 by lettingh(G) = tr((G+μ 2 1 I n ) q 2 ) and applying Lemma 3 withY = (X−S t ) T (X−S t ) andZ = (X−S t+1 ) T (X−S t+1 ) . 128 Second, from (3.10), Q(S t )−Q(S t+1 ) = n X j=1 kS t (j) k 2 2 +μ 2 2 c 2 − n X j=1 kS (t+1) (j) k 2 2 +μ 2 2 c 2 , (e) ≥ c 2 tr (S T t S t −S T t+1 S t+1 )N t ≥ c 2 tr (S t −S t+1 ) T (S t −S t+1 )N t +c tr (S t −S t+1 ) T S t+1 N t , (B.2) where (e) holds by applying Lemma 5 by letting f j (w) = (w +μ 2 2 ) u 2 ,Y = S t and Z =S t+1 . Finally, from (3.11), P(S t )−P(S t+1 ) = tr φ T d (CS t )φ d (CS t ) +μ 2 3 I n p 2 − tr φ T d (CS t+1 )φ d (CS t+1 ) +μ 2 3 I n p 2 (f) = tr (S T t DS t + 1) d +μ 2 3 I n p 2 − tr S T t+1 DS t+1 + 1) d +μ 2 3 I n p 2 (g) ≥ pd 2 tr S T t DS t −S t+1 DS t+1 ! T (S T t DS t + 1) d−1 (S T t DS t + 1) d +μ 2 3 I n p 2 −1 ! (h) = pd 2 tr S T t DS t −S t+1 DS t+1 ! T F t W t ! = pd 2 tr CS t −CS t+1 T CS t −CS t+1 F t W t ! +pd tr CS t −CS t+1 T CS t+1 F t W t ! (i) = pd 2 tr (S t −S t+1 ) T D(S t −S t+1 )(F t W t ) +pd tr (S t −S t+1 )) T DS t+1 (F t W t ) , (B.3) 129 where (f) holds from (3.14), (g) holds by applying Lemma 4 by lettingh(G) = tr((G+ μ 2 3 I k ) p 2 ) and applying Lemma 3 withY =S T t DS t andZ =S T t+1 DS t+1 , (h) holds by substituting the definition ofW t andF t , and (i) holds due to substituting the definition C T C =D. 
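The majorization inequalities assembled above are exactly what drive the IRLS weight updates. Before they are combined below, here is a minimal sketch of one weight update, assuming the smoothed weight forms implied by the surrogates and by Lemma 5, i.e., $M_t = ((X-S_t)^T(X-S_t)+\mu_1^2 I)^{q/2-1}$ and $N_t$ diagonal with entries $(\|S_t^{(j)}\|_2^2+\mu_2^2)^{c/2-1}$; the variety weight $W_t$ has the same flavor and is omitted here.

import numpy as np

def psd_power(G, a, eps=0.0):
    # Matrix power (G + eps*I)^a for a symmetric PSD G, via eigendecomposition
    # (eigenvalues are clamped at zero for numerical safety).
    w, V = np.linalg.eigh(G)
    return (V * (np.maximum(w, 0.0) + eps) ** a) @ V.T

def irls_weights(X, S, q=0.5, c=0.5, mu1=1e-3, mu2=1e-3):
    # One IRLS weight update for the smoothed Schatten-q + l_{2,c} surrogates:
    #   M_t = ((X-S)^T (X-S) + mu1^2 I)^(q/2 - 1)
    #   N_t = diag( (||S_(j)||_2^2 + mu2^2)^(c/2 - 1) )  over columns j
    R = X - S
    M = psd_power(R.T @ R, q / 2.0 - 1.0, eps=mu1 ** 2)
    N = np.diag((np.sum(S ** 2, axis=0) + mu2 ** 2) ** (c / 2.0 - 1.0))
    return M, N

rng = np.random.default_rng(0)
X = rng.standard_normal((8, 6)); S = np.zeros((8, 6)); S[2, 3] = 1.0
M, N = irls_weights(X, S)
print(M.shape, N.shape)   # (6, 6) (6, 6)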
Combining Equations (B.1), (B.2) and (B.3) yields, J (S t )−J (S t+1 ) = (L(S t )−L(S t+1 )) +λ 1 (Q(S t )−Q(S t+1 )) +λ 2 (P(S t )−P(S t+1 )) ≥ q 2 tr (S t+1 −S t ) T (S t+1 −S t )M t +q tr (S t+1 −S t ) T (X−S t+1 )M t + λ 1 c 2 tr (S t −S t+1 ) T (S t −S t+1 )N t +λ 1 c tr (S t −S t+1 ) T S t+1 N t + λ 2 pd 2 tr (S t −S t+1 ) T D(S t −S t+1 )(F t W t ) +λ 2 pd tr (S t −S t+1 )) T DS t+1 (F t W t ) . (B.4) AsS t+1 is a stationary point of Equation (3.17), we have that, q(S t+1 −X)M t +λ 1 cS t+1 N t +λ 2 pdDS t+1 (F t W t ) = 0. (B.5) A dot product with (S t −S t+1 ) T on both sides of (B.5) and taking trace of both sides yield, qtr (S t −S t+1 ) T (S t+1 −X)M t +λ 1 ctr (S t −S t+1 )) T S t+1 N t + λ 2 pdtr (S t −S t+1 ) T DS t+1 (F t W t ) = 0. (B.6) 130 Substituting the result of (B.6) into (B.4) yields, J (S t )−J (S t+1 ) = (L(S t )−L(S t+1 )) +λ 1 (Q(S t )−Q(S t+1 )) +λ 2 (P(S t )−P(S t+1 )) ≥ q 2 tr (S t+1 −S t ) T (S t+1 −S t )M t + λ 1 c 2 tr (S t −S t+1 ) T (S t −S t+1 )N t + λ 2 pd 2 tr (S t −S t+1 ) T D(S t −S t+1 )(F t W t ) ≥ 0. (B.7) Thus,J (S t ) is non-increasing, i.e.,J (S t+1 )≤J (S t ), which proves statement (I1). B) Proof of statement I2: To prove the boundedness of the sequencesS t , one can use any norm in the objective (3.7). IfkX−Sk q Sq is used, then, kX−Sk q Sq (j) ≤ tr (X−S t ) T (X−S t ) +μ 2 1 I n q 2 (k) = tr(M 2 q−2 t M t ) = tr(M q q−2 t ) (l) ≤ tr(M q q−2 t ) +λ 1 tr(N c c−2 t ) +λ 2 tr(W p p−2 t ) (m) =J (S t ) (n) ≤J (S 0 ) = Δ τ, (B.8) where (j) holds since the value of the non-smooth norm is always less than or equal to the value of the smooth norm (see theorem 1 in [70]), (k) holds since M = (X−S) T (X−S) +μ 2 I q 2 −1 , (l) holds since λ 1 ,λ 2 ≥ 0 and N,W are PSD matrices, (m) holds from the definition ofJ (S t ) in (3.8), and (n) holds from (B.7), thus the sequence{S t } is bounded, which proves statement (I2). Similarly, ifkSk c 2,c is used, then, 131 λ 1 kSk c 2,c ≤λ 1 n X j=1 kS (j) k 2 2 +μ 2 2 c 2 =λ 1 tr(N 2 c−2 t N t ) =λ 1 tr(N c c−2 t ) ≤λ 1 tr(N c c−2 t ) + tr(M q q−2 t ) +λ 2 tr(W p p−2 t ) =J (S t )≤J (S 0 ) = Δ τ. (B.9) Similarly, ifkφ d (CS)k p Sp is used, then, λ 2 kφ d (CS)k p Sp ≤λ 2 tr φ T d (CS)φ d (CS) +μ 2 3 I n p 2 =λ 2 tr(W 2 p−2 t W t ) =λ 2 tr(W p p−2 t ) ≤λ 2 tr(W p p−2 t ) + tr(M q q−2 t ) +λ 1 tr(N c c−2 t ) =J (S t )≤J (S 0 ) = Δ τ. (B.10) C) Proof of statement I3: Corollary 2: Inequality (B.8) implies the following, min{min i η i (M t ), min i η i (N t ), min i η i (W t )} ≥ min{τ q q−2 , (λ −1 1 τ) c c−2 , (λ −1 2 τ) p p−2 } =γ, whereλ 1 ,λ 2 are weighting parameters andη i is thei th eigenvalue of the matrix. 132 Proof of Corollary 2: From (B.8), we have, tr(M q q−2 t ) +λ 1 tr(N c c−2 t ) +λ 2 tr(W p p−2 t )≤τ. ⇒ n X i=1 η i (M q q−2 t ) +λ 1 n X i=1 η i (N c c−2 t ) +λ 2 n X i=1 η i (W p p−2 t )≤τ. ⇒ min i η i (M q q−2 t ) +λ 1 min i η i (N c c−2 t ) +λ 2 min i η i (W p p−2 t )≤τ. ⇒ min i η i (M t ) q q−2 +λ 1 min i η i (N t ) c c−2 +λ 2 min i η i (W t ) p p−2 ≤τ. ⇒ min i η i (M t ) q q−2 ≤τ, and λ 1 min i η i (N t ) c c−2 ≤τ, and λ 2 min i η i (W t ) p p−2 ≤τ. ⇒ min i η i (M t )≥τ q 2−q , and min i η i (N t )≥ (λ −1 1 τ) c 2−c , and min i η i (W t )≥ (λ −1 2 τ) p 2−p ⇒ min{min i η i (M t ), min i η i (N t ), min i η i (W t )}≥ min{τ q q−2 , (λ −1 1 τ) c c−2 , (λ −1 2 τ) p p−2 } = Δ γ. 
From (B.7), we have, J (S t )−J (S t+1 )≥ q 2 tr (S t+1 −S t ) T (S t+1 −S t )M t + λ 1 c 2 tr (S t −S t+1 ) T (S t −S t+1 )N t + λ 2 pd 2 tr (S t −S t+1 ) T D(S t −S t+1 )(F t W t ) (o) ≥ q 2 n X i=1 η n−i+1 (M t )η i (S t −S t+1 ) T (S t −S t+1 ) + λ 1 c 2 n X i=1 η n−i+1 (N t )η i (S t −S t+1 ) T (S t −S t+1 ) + λ 2 pd 2 n X i=1 η n−i+1 (F t W t )η i S t −S t+1 ) T D(S t −S t+1 ) ! (p) ≥ q 2 n X i=1 η n−i+1 (M t )η i (S t −S t+1 ) T (S t −S t+1 ) + λ 1 c 2 n X i=1 η n−i+1 (N t )η i (S t −S t+1 ) T (S t −S t+1 ) + λ 2 pd 2 n X i=1 η n−i+1 (F t )η n (W t )η i S t −S t+1 ) T D(S t −S t+1 ) ! 133 (q) ≥ q 2 n X i=1 η n−i+1 (M t )η i (S t −S t+1 ) T (S t −S t+1 ) + λ 1 c 2 n X i=1 η n−i+1 (N t )η i (S t −S t+1 ) T (S t −S t+1 ) + λ 2 pd 2 n X i=1 η n (W t )η i S t −S t+1 ) T D(S t −S t+1 ) ! (r) ≥ γ 2 qkS t −S t+1 k 2 F +λ 1 ckS t −S t+1 k 2 F +λ 2 pdkC(S t −S t+1 )k 2 F , (B.11) where (o) holds by applying Lemma 6 and applying trace properties, and trace properties, (p) holds sinceF t andW t are PSD matrices, (q) holds since P i η i (F t )≥ 1, and (r) holds by applying Corollary 2.Summing up (B.11) for allt≥ 1, we have, τ =J (S 0 )≥ γ 2 ∞ X t=1 qkS t −S t+1 k 2 F +λ 1 ckS t −S t+1 k 2 F +λ 2 pdkC(S t −S t+1 )k 2 F . (B.12) From (B.12), essentially lim t→∞ kS t −S t+1 k F = 0, which proves statement (I3), and completes the proof of Theorem 4. B.2 Proof of Theorem 5 Since the sequence{S t } is bounded by Theorem 4, thus there exists a subsequence{S ˆ t } and a matrix ˆ S such that lim ˆ t→∞ S ˆ t → ˆ S. Since{S ˆ t+1 } solves (3.17), thus we have, q(S ˆ t+1 −X)M ˆ t +λ 1 cS ˆ t+1 N ˆ t +λ 2 pdDS ˆ t+1 (F ˆ t W ˆ t ) = 0. (B.13) Let ˆ t + 1→∞, then (B.13) implies thatS ˆ t+1 converges to some matrix ˜ S. 134 From Theorem 4, we have, lim t→∞ kS t −S t+1 k F = 0 ⇒k ˆ S− ˜ Sk F = lim ˆ t→∞ kS ˆ t −S ˆ t+1 k F = 0. Thus, ˆ S = ˜ S. Denote ˆ S asS ∗ and let ˆ t→∞, then (B.13) can be rewritten as, q(S ∗ −X)M ∗ +λ 1 cS ∗ N ∗ +λ 2 pdDS ∗ (FW ∗ ) = 0. whereM ∗ , N ∗ andW ∗ are as defined in (3.18), (3.19) and (3.20), respectively, by substitutingS t+1 withS ∗ . Thus,S ∗ satisfies the first-order optimality condition of the optimization problem given in (3.7) [121]. 135 B.3 Proof of Theorem 6 Repeating the analysis of Theorem 4 with replacingS t+1 byS yields (from (q)), J (S t )−J (S ∗ ) ≥ q 2 n X i=1 η n−i+1 (M t )η i (S t −S ∗ ) T (S t −S ∗ ) + λ 1 c 2 n X i=1 η n−i+1 (N t )η i (S t −S ∗ ) T (S t −S ∗ ) + λ 2 pd 2 n X i=1 η n (W t )η i S t −S ∗ ) T D(S t −S ∗ ) ! (s) ≥ q 2 n X i=1 η n−i+1 (M t )η i (S t −S ∗ ) T (S t −S ∗ ) + λ 1 c 2 n X i=1 η n−i+1 (N t )η i (S t −S ∗ ) T (S t −S ∗ ) + λ 2 pd 2 n X i=1 η n (W t )η n (D)η i S t −S ∗ ) T (S t −S ∗ ) (t) ≥ γ 2 qkS t −S t+1 k 2 F +λ 1 ckS t −S t+1 k 2 F +λ 2 pdη n (D)k(S t −S t+1 )k 2 F , (B.14) where (s) holds sinceη i (AB) =η i (BA) [123] and by applying min-max theory [124] and (t) holds by applying Corollary 2. With the required accuracy|J (S t )−J (S ∗ )|≤ δ being achieved at termination of the algorithm, att =T , at whichS t =S T = ˆ S, we have, γ 2 qk ˆ S−S ∗ k 2 F +λ 1 ck ˆ S−S ∗ k 2 F +λ 2 pdη n (D)k(S t −S t+1 )k 2 F ≤δ. (B.15) Rearranging (B.15) yields, k ˆ S−S ∗ k 2 F ≤ 2δ γ q +λ 1 c +λ 2 pdη n (D) , which completes the proof of Theorem 6. 136 B.4 Changes in Key Theorems 4-6 due to Practical Considerations Under the assumption in (3.24), Equation (B.6) becomes, qtr (S t −S t+1 ) T (S t+1 −X)M t +λ 1 ctr (S t −S t+1 )) T S t+1 N t + λ 2 pdtr (tanh(S t )− tanh(S t+1 )) T D tanh(S t+1 )(F t W t ) ! = 0. 
(B.16) Then, following same steps of Theorems 4 and 5, one can deduce that they hold in case of a general sparse matrix. For the derivation of Approximation 1, the same steps of proof of Theorem 6 are followed, then inequality (B.15) becomes, γ 2 qk ˆ S−S ∗ k 2 F +λ 1 ck ˆ S−S ∗ k 2 F +λ 2 pdη n (D)ktanh( ˆ S)− tanh(S ∗ )k 2 F ≤δ. (B.17) Using the boundktanh( ˆ S)− tanh(S ∗ )k 2 F ≤ktanh( ˆ S)k 2 F +ktanh(S ∗ )k 2 F and the fact that tanh(S ij )≤ 1 yieldsktanh( ˆ S)− tanh(S ∗ )k 2 F ≤ 2mn, then (B.17) becomes, γ 2 qk ˆ S−S ∗ k 2 F +λ 1 ck ˆ S−S ∗ k 2 F + 2λ 2 pdmnη n (D) ≤δ, which gives the following bound, k ˆ S−S ∗ k 2 F . 2δ γ − 2λ 2 pdmnη n (D) q +λ 1 c , which completes the derivation of Approximation 1, where the approximate inequality symbol is used to emphasize that the result is an approximation due to the assumption in (3.24). 137 Appendix C Leaked Channel Estimation: Exploitation of Sparse and Low-rank Structures C.1 Proof of Theorem 7 We will show the contrapositive, i.e., if ∂J(Ht,H ) ∂Ht = 0, then it implies = 0. Standard linear algebra, Wirtinger calculus [125], and the definition of the matricesX n imply, ∂J(H t ,H ) ∂H t = ∂ P n T −1 n=0 y[n]− Tr X n H t H T ν 2 ∂H t =− n T −1 X n=0 Tr X n H ∗ t H ∗ ν T − Tr X n H t H T ν ∗ X n H ν = n T −1 X n=0 Tr X ∗ n H x n H [n, :]. (C.1) Clearly, if ∂J(Ht,H ) ∂Ht = 0, then, we must have Tr X ∗ n H x n H [n, :] = 0 for every 0 ≤ n ≤ n T − 1. Since, x n 6= 0 (by domain restriction) andH [n, :] 6= 0 T (by 138 definition), we must have Tr X ∗ n H = 0 for every 0≤n≤n T − 1, or equivalently Π () = 0 ⇐⇒ Π H ∗ t H ∗ ν T = Π H t H ν T . Using the unique identifiability of the global optimum impliesH ∗ t H ∗ ν T =H t H T ν , or equivalently = 0. C.2 Proof of Theorem 8 Before starting the proof of this Theorem, we state Lemma 7 that is useful in the rest of our proof. Lemma 7: [i.e., see [126]] LetX∈C m×n andC∈C n×n be arbitrary matrices and let⊗ denote the matrix Kronecker product operation. We have, Tr XCX H = vec(X) H (C⊗I)vec(X). (C.2) Proof of Theorem 8: Let us define the shorthandD =H H H , so thatA =D⊗I. From (C.1) and the definition of the matricesX n , we get ∂J(H t ,H ) ∂H t = n T −1 X n=0 Tr(X ∗ n H )x n H [n, :] = n T −1 X n=0 x n x H n ∗ [:,n]H [n, :]. Since the training pilots are generated using a random BPSK sequence, we have E{x n x n H } =I for every 0≤n≤n T − 1 and therefore, E ( ∂J(H t ,H ) ∂H t ) = n T −1 X n=0 ∗ [:,n]H [n, :] = ∗ H . 139 Using Lemma 7, we have E ( ∂J(H t ,H ) ∂H t ) 2 F = Tr ∗ D T = vec( ∗ ) H (D⊗I)vec( ∗ ). (C.3) Next, we will invoke some standard properties of the Kronecker matrix product [126]. We have (Y 1 ⊗Y 2 ) H = Y H 1 ⊗Y 2 H for arbitrary matrices Y 1 and Y 2 , so thatB is Hermitian symmetric by the Hermitian symmetry of bothD andI. Further, ifY 1 ∈C p×p andY 2 ∈C q×q respectively have eigenvaluesλ i , 1≤ i≤ p and μ j , 1≤ j≤ q (listed with multiplicities), thenY 1 ⊗Y 2 has thepq eigenvalues (with multiplicities)λ i μ j , (i,j)∈{1,...,p}×{1,...,q}. Thus, the minimum and maximum eigenvalues ofB are (by definition) the same as that ofD. Since x H Dx = H H x 2 2 ≤ H H 2 2 kxk 2 2 , with equality achieved whenx is the leading eigenvector ofH H , we have 0 D H H 2 2 I. Therefore,B is positive semidefinite with maximum eigenvalue as H H 2 2 = kH k 2 2 and (4.28) is proved. 
From (C.3) and standard linear algebra, we have ∂J(H t ,H ) ∂H t 2 F = Tr " n T −1 X n=0 x n x H n ∗ [:,n]H [n, :] # H × n T −1 X n 0 =0 x n 0x H n 0 ∗ [:,n 0 ]H [n 0 , :] = n T −1 X n=0 n T −1 X n 0 =0 Tr H [n, :] H T [:,n]x n x H n ×x n 0x H n 0 ∗ [:,n 0 ]H [n 0 , :] . 140 Since the training pilots are generated using a random BPSK sequence, we get, E ∂J(H t ,H ) ∂H t 2 F = n T −1 X n=0 n T −1 X n 0 =0 Tr H [n, :] H T [:,n] ∗ [:,n 0 ]H [n 0 , :] + (m 0 − 1) n T −1 X n=0 Tr H [n, :] H T [:,n] ∗ [:,n]H [n, :] = n T −1 X n=0 ∗ [:,n]H [n, :] 2 F + (m 0 − 1) n T −1 X n=0 k[:,n]k 2 2 kH [n, :]k 2 2 = E ( ∂J(H t ,H ) ∂H t ) 2 F + (m 0 − 1)p 0 n T −1 X n=0 k[:,n]k 2 2 , where the expression for the expected gradient E n ∂J(Ht,H ) ∂Ht o in the last equality follows from (C.3) and kH [n, :]k 2 2 = p 0 X l=1 |H [n,l]| 2 = p 0 X l=1 1 =p 0 . This completes the proof of the Theorem 8. C.3 Proof of Theorem 9 According to our analysis in Section 4.4.2, if we could design a vector-valued function (polynomial) (ν) that satisfies conditions (C-1) and (C-2), then the optimization problem in Equation (4.32) will recover the optimal solution of our channel estimation problem. In this proof, we use a technique called dual certifier construction [84, 85, 93]. Based on this technique, we construct a randomized dual polynomial(ν), i.e., the dual certifier, with the use of Fejer’s kernel [93] and show that given enough number of measurements, the constructed dual certifier satisfies both conditions (C-1) and (C-2) with high probability. 141 Consider the the squared Fejer’s kernel as [93] f r (ν) = 1 n T n T +m 0 −1 X n=m 0 f n e −j2πnν , (C.4) wheref n = 1 n T P n T k=n−n T 1− |k| n T 1− |n−k| n T . Define the randomized matrix-valued version of the squared Fejer’s kernel as [84, 85, 93] F r (ν) = 1 n T n T +m 0 −1 X n=m 0 f n e −j2πnν x n x H n . (C.5) Since the training signal is generated using an i.i.d. random source with Rademacher distribution, it is clear that E{F r (ν)} = f r (ν)I m 0 and for its derivative we have E{F 0 r (ν)} = f 0 r (ν)I m 0 , where I m 0 denotes an m 0 × m 0 identity matrix. Now, we define a candidate vector-valued dual certifier polynomial(ν) as (ν) = p 0 X k=1 F r (ν−ν k ) k +F 0 r (ν−ν k ) k , (C.6) where k = [α k,1 ,··· ,α k,m 0 ] T and k = [β k,1 ,··· ,β k,m 0 ] T are constant coefficients. Clearly, the candidate(ν) defined in Equation (C.6) follows the valid form of(ν) given in Equation (4.39). Coefficients k and k are selected such that the candidate (ν) satisfies condition (C-2) and part of (C-1), namely (ν k ) = sign(¯ η k )g k , (C.7) 0 (ν k ) = 0, i.e., maximum occurs atν k . (C.8) We can summarize the above equations as T 1 ,··· , T p 0 ,γ T 1 ,··· ,γ T p 0 T =g, (C.9) 142 whereg = sign(¯ η 1 )g T 1 ,··· , sign(¯ η p 0 ) g T p 0 , 0 T ,··· , 0 T T ,γ = q |f 00 r (0)|, and the matrix can be written as = 1 n T n T +m 0 −1 X n=m 0 ( n H n )⊗ (x n x H n )f n , (C.10) where ⊗ is the Kronecker product and n = [e −j2πnν 1 ,..,e −j2πnνp 0 , j2πn γ e −j2πnν 1 ,.., j2πn γ e −j2πnνp 0 ] H . Thus, if we show that is invertible, then we can easily evaluate k and k from the system of equations in Equation (C.9) and accordingly(ν) will satisfy both (C.7) and (C.8). Lemma 8 shows that for enough number of measurements and well-separated Doppler shift parameters, the matrix is invertible with high probability. Lemma 8: [See Proposition 16 in [84] and Lemma 2.2 in [93] Define an event E = {k− E{}k ≤ } for the generated random i.i.d. sequencex n with Rademacher distribution, then 1. 
Let 0<δ < 1 and|ν i −ν j |≥ 1 n T for∀i6=j. Then for any∈ (0, 0.5], as long as n T ≥ 80m 0 p 0 2 log 4m 0 p 0 δ , eventE occurs with probability at least 1−δ. 2. Define ¯ = E{}. Let|ν i −ν j |≥ 1 n T for∀i6=j. Then ¯ is invertible. 3. Given that E holds for an ∈ (0, 0.25], then we have −1 − ¯ −1 ≤ 2 ¯ −1 andk −1 k≤ 2 ¯ −1 . Thus, the construction of (ν) in Equation (C.6) ensures condition (C-2) and 0 (ν k ) = 0 for∀k. To complete the proof we need to show thatk(ν)k 2 < 1 for allν∈ [−0.5, 0.5]\T ν to guarantee condition (C-1). We show that this condition will be satisfied by proposed(ν) in Lemma 10 and Lemma 11. But before stating these 143 lemmas, let us to define some notations and state Lemma 9 that we use in the proof of Lemma 10 and 11. Let −1 = [LR] whereL∈R 2m 0 p 0 ×m 0 p 0 andR∈R 2m 0 p 0 ×m 0 p 0 , then using (C.9), we have T 1 ,··· , T p 0 ,γ T 1 ,··· ,γ T p 0 T =Lg. If we multiply both side of above equation by (m) (ν) = 1 γ m h F (m) r (ν−ν 1 ),··· ,F (m) r (ν−ν p 0 ), 1 γ F (m+1) r (ν−ν 1 ),··· , 1 γ F (m+1) r (ν−ν 1 ) # H , (C.11) where (m) (ν) denotes the mth order derivative of the function (ν) for m = 0, 1, 2,··· , then we can present themth order entry-wise derivative of(ν) by 1 γ m (m) (ν) = (m) (ν) H Lg. (C.12) Similarly, if we define ¯ (ν) = E{(ν)} and ¯ −1 = [ ¯ L ¯ R], then we have, 1 γ m ¯ (m) (ν) = h E n (m) (ν) oi H ( ¯ L⊗I m 0 )g. (C.13) Furthermore, we can write (m) (ν) = 1 n T n T +m 0 −1 X n=m 0 ( j2πn γ ) m f n e j2πnν n ⊗x n x H n , (C.14) 144 and E n (m) (ν) o =! (m) (ν)⊗I, where ! (m) (ν) = 1 γ m h f (m) r (ν−ν 1 ),··· ,f (m) r (ν−ν p 0 ), 1 γ f (m+1) r (ν−ν 1 ),··· , 1 γ f (m+1) r (ν−ν 1 ) # H = 1 n T n T +m 0 −1 X n=m 0 ( j2πn γ ) m f n e j2πnν ( n ⊗I). (C.15) Now using these relationships, we can use Lemma 9 to show that (m) (ν) is concentrated around ¯ (m) (ν) with high probability. Lemma 9:[see the proof of Theorem 3 in [127]] Consider|ν i −ν j |≥ 1 n T for∀i6=j and letδ∈ (0, 1). Then, form = 0, 1, 2,··· , we have 1 γ m (m) (ν)− ¯ (m) (ν) 2 ≤ (C.16) forn T ≥ c m 0 p 0 2 log 3 n T m 0 p 0 δ , wherec is a constant number, with probability at least 1−δ. Let us defineT near ν =∪ p 0 k=1 [ν k −ν ,ν k +ν ] andT far ν = [−0.5, 0.5]\T near ν where ν =O 1 n T , e.g., sayν = 0.1 n T . Lemma 10: Assume|ν i −ν j |≥ 1 n T for∀i6=j and letδ∈ (0, 1). Then, k(ν)k 2 < 1, for∀ν∈T far ν with probability at least (1−δ) forn T ≥cm 0 p 0 log 3 n T m 0 p 0 δ . Proof: See below. 145 Proof of Lemma 10: We start by a relationship results from triangular inequality k(ν)k 2 ≤k(ν)− ¯ (ν)k 2 +k ¯ (ν)k 2 , (C.17) where ¯ (ν) = E{(ν)}. To complete the proof, since from Lemma 9, we know that k(ν)− ¯ (ν)k 2 approaches to zero with high probability forν∈ (0, 1), we just need to show thatk ¯ (ν)k 2 < 1 forν∈T far ν . From (C.13), we have k ¯ (ν)k 2 = sup x:kxk 2 =1 x H ¯ (ν) (C.18) = sup x:kxk 2 =1 x H [E{ (ν)}] H ( ¯ L⊗I)g (C.19) = sup x:kxk 2 =1 p 0 X k=1 [!(ν) H ¯ L] k (x H ¯ η k g k )< 0.99992. (C.20) The above inequality follows from the fact that (x H ¯ η k g k )≤ 1 and the proof of Lemma 2.4 in [93] forν∈T far ν (or see Lemma 10 in [85]). Lemma 11: Assume|ν i −ν j |≥ 1 n T for∀i6=j and letδ∈ (0, 1). Then, k(ν)k 2 < 1, for∀ν∈T near ν with probability at least (1−δ) forn T ≥cm 0 p 0 log 3 n T m 0 p 0 δ . Proof: See below. Proof of Lemma 11: Our choice of the coefficients implies that dk(ν)k 2 2 dν | ν=ν k = 0 and d 2 k(ν)k 2 2 dν 2 | ν=ν k =< ( 2(ν) H d(ν) dν ) | ν=ν k = 0. 
146 Thus, forν∈T near = [ν k −ν ,ν k +ν ], to proof the claim of the theorem, it is sufficient to show that d 2 k(ν)k 2 2 dν 2 < 0. Note that 1 2 d 2 k(ν)k 2 2 dν 2 =k 0 (ν)k 2 2 +<{ 00 (ν) H (ν)}, (C.21) forν∈T near ν . Using Lemma 9, we can write 1 γ 2 k 0 (ν)k 2 2 = 1 γ ( 0 (ν)− ¯ 0 (ν) + ¯ 0 (ν)) 2 2 ≤ 2 + 2k ¯ 0 (ν)k 2 γ + k ¯ 0 (ν)k 2 2 γ 2 . (C.22) Similar to calculations in Lemma 2.3 and 2.4 in [93], we havek ¯ 0 (ν)k 2 ≤ 1.6n T and γ > πn T √ 3 forn T ≥ 2. Therefore, we have 1 γ 2 k 0 (ν)k 2 2 ≤ 2 + 1.75 + k ¯ 0 (ν)k 2 2 γ 2 . (C.23) Similarly, we havek ¯ (ν)k 2 ≤ 1 andk ¯ 00 (ν)k 2 ≤ 21.15n 2 T forν∈T near , thus 1 γ 2 <{ 00 (ν) H (ν)} = 1 γ 2 < n ( 00 (ν)− ¯ 00 (ν) + ¯ 00 (ν)) H ((ν)− ¯ (ν) + ¯ (ν)) o ≤ 2 + 4.25 + <{ ¯ 00 (ν) H ¯ (ν)} γ 2 . (C.24) Therefore, substituting (C.23) and (C.24) in the inequality (C.21), we have 1 2γ 2 d 2 k(ν)k 2 2 dν 2 < 2 2 + 6 + 1 γ 2 k ¯ 0 (ν)k 2 2 +<{ ¯ 00 (ν) H ¯ (ν)} . Similar to argument in Lemma 2.3 in [93] (see equation 2.19 and afterward), we can conclude that 1 γ 2 k ¯ 0 (ν)k 2 2 +<{ ¯ 00 (ν) H ¯ (ν)} ≤−0.029. 147 Therefore, we have 1 2 d 2 k(ν)k 2 2 dν 2 ≤ 2 2 +6−0.029< 0, for small enough, e.g.,≤ 10 −5 . This completes the proof. Putting the results of Lemma 10 and Lemma 11 together, Theorem 9 is proved, since(ν) is verified to satisfy both conditions (C-1) and (C-2) with high probability for enough number of measurements. 148 Appendix D Leaked Channel Estimation: Extensions to MIMO OFDM and performance bounds D.1 Proof of Theorem 10 Define the following, l (nt,nr ) k (t) =p(t−t (nt,nr ) k )e −j2πν (n t ,nr) k t , x (nt) l [n] =x (nt) l [n]e j2πf l nTs , and η (nt,nr ) k,l =η (nt,nr ) k e −j2πf l t (n t ,nr) k . Thus, Equation (5.8) can be rewritten as, y (nr ) [n] = 1 √ L Nt X nt=1 m 0 −1 X m=0 L−1 X l=0 p 0 X k=1 x (nt) l [n−m] η (nt,nr ) k,l e j2πν (n t ,nr) k nTs l k (mT s ) . (D.1) 149 Furthermore, define the following, The matrixT (nr ) k ∈C Nt×NtLm 0 as T (nr ) k (0), T (nr ) k (1), ..., T (nr ) k (m 0 − 1) , (D.2) where, T (nr ) k (m) ∈ C Nt×NtL is equal to I Nt ⊗ (nt,nr ) k (m), where, I Nt is identity matrix of size N t , (nt,nr ) k (m) ∈ C 1×L equals [l (nt,nr ) k (m)η (nt,nr ) k,0 ,...,l (nt,nr ) k (m)η (nt,nr ) k,L−1 ]; with 0 ≤ m ≤ m 0 − 1, and⊗ denotes the Kronecker product. The matrixX n ∈C m 0 LNt×m 0 as [x n e T 1 ;x n−1 e T 2 ;... ;x n−(m 0 −1) e T m 0 ], wheree m is the canonical basis vectors, 1≤ m≤ m 0 , forR m 0 ×1 , the operation (;) is defined as vertical concatenation andx n ∈C NtL×1 is defined as [x (1) 0 [n],...,x (1) L−1 [n], ......,x (Nt) 0 [n],...,x (Nt) L−1 [n]] T . The vector of Doppler shifts (nr ) k (n)∈C 1×Nt for all values ofn t : 1≤n t ≤N t at time instancen as [e j2πν (1,nr) k nTs ,e j2πν (2,nr) k nTs ,......,e j2πν (N t ,nr) k nTs ] T . Therefore, we can rewrite Equation (D.1) as follows, y (nr ) [n] = 1 √ L p 0 X k=1 (nr ) k (n)T (nr ) k X n 1 m 0 , (D.3) where,1 is the all ones column vector of sizem 0 . 150 Let d(ν (nt,nr ) ) ∈ C n T ×1 denote a vector of all possible Doppler shifts in the channel representation, for a given antenna pair (n t ,n r ), and is defined as e −j2πm 0 ν (n t ,nr) , ... ,e −j2π(n T +m 0 −1)ν (n t ,nr) T . Spanning over all transmit antennas N t , we define the matrixD(ν (nr ) )∈C n T ×Nt as d(ν (1,nr ) ), ... ,d(ν (Nt,nr ) ) . Also, we define the set of matrices{E}∈ R m 0 ×n T such thatE n ∈ R m 0 ×n T = 1e T n−m 0 +1 ; wheree n is the canonical basis vectors,m 0 ≤n≤n T −m 0 + 1, forR n T ×1 . 
Thus, y (nr ) [n] = 1 √ L hX n E n , p 0 X k=1 T (nr ) k T D H (ν (nr ) k )i, forn =m 0 ,··· ,n T +m 0 − 1. We can stack all vectors of all receiving antennas to construct the received matrix as, Y [n] = y (1) , y (2) , ..., y (Nr ) n T ×Nr , where, y (nr ) = [y (nr ) [m 0 ],y (nr ) [m 0 + 1],...y (nr ) [n T +m 0 − 1]] T . D.2 Atomic Norm and Optimization Problem Construction We apply the concepts of atomic norm to the channel model described in Equation (5.9). The channel representation given in Equation (5.12) can be viewed as a summation of rank-N t matrices; each of which is of the form T (nr ) k T D H (ν (nr ) k ). We assume that N t p 0 n T ; that is, the overall channel is sparse and thus its cardinality is less than that of the number of training symbols. The matrix T (nr ) k , in Equation (D.2), is rewritten, with n r dropped for simplicity, as T k = L k k , where, L k = [l k (0)I Nt ,l k (1)I Nt ,....,l k (m 0 − 1)I Nt ] ∈ C Nt×Ntm 0 , and k = I Ntm 0 ⊗ [η k,0 ,η k,1 ,......η k,L−1 ]∈ C Ntm 0 ×Ntm 0 L . Thus, each of the rank-N t matrices is of the 151 form T L T D H (ν). Define the atomA(L,ν) =L T D H (ν), whereν∈ h − 1 2 , 1 2 i . The set of all atoms is written as, A = A (L,ν)|ν∈ [− 1 2 , 1 2 ],k[L] i k 2 = 1, [L] i ∈C 1×Ntm 0 , 1≤i≤N t , (D.4) where [L] i ∈C 1×Ntm 0 is thei-th row of the matrixL, 1≤i≤N t . We seek the solution of the following problem, kHk A,0 = inf p H = p X k=1 T k L T k D H (ν k ) , (D.5) which represents the leaked MIMO OFDM channel estimation problem via the minimum number of dominant paths. Due to the NP-hardness of (D.5), the problem is convexified via the atomic norm using the set of atoms, given in (D.4), as follows, kHk A = inf k ,ν k ,k[L] i k 2 =1 T k 1 : H = X k T k L T k D H (ν k ) , (D.6) where conv stands for the convex hull, and abs stands for the absolute function applied element-wise. The final optimization problem is in given in (5.17); see Section 5.3.1. D.3 Proof of Theorem 11 As in [115], the complex Fisher Information Matrix (FIM)J is defined by, J =E " ∂ lnp(y,h) ∂h ∗ #" ∂ lnp(y,h) ∂h ∗ # H , (D.7) 152 where,p(y,h) is the joint distribution ofy andh. The expectation is taken overh and y. We can rewriteJ as, J =E E ∂ lnp(y|h) ∂h ∗ ∂ lnp(y|h) ∂h ∗ H |h + ∂ lnp(h) ∂h ∗ ∂ lnp(h) ∂h ∗ H . (D.8) For the model given in Equation (5.20),J is written as, J =E 1 σ 2 z A H τ A τ A H τ A ν A H ν A τ A H ν A ν + ∂ lnp(h) ∂h ∗ ∂ lnp(h) ∂h ∗ H . Letq = " ∂ lnp(h l,τ ) ∂h ∗ l,τ # andr = " ∂ lnp(h l,ν ) ∂h ∗ l,ν # . Thus, J = 1 σ 2 z E A H τ A τ A H τ A ν A H ν A τ A H ν A ν + E n qq H o E n qr H o E n rq H o E n rr H o (A4), (C1) = 1 σ 2 z E(A H τ A τ ) E(A H τ )E(A ν ) E(A H ν )E(A τ ) E(A H ν A ν ) + E{qq H } E{q}E{r H } E{r}E{q H } E{rr H } (A3) = 1 σ 2 z E(A H τ A τ ) E(A H τ )E(A ν ) E(A H ν )E(A τ ) E(A H ν A ν ) + E{qq H } 0 0 E{rr H } (C2) = 1 σ 2 z E(A H τ A τ ) 0 0 E(A H ν A ν ) + E{qq H } 0 0 E{rr H } = 1 σ 2 z E(A H τ A τ ) +ρ τ I τ 0 0 1 σ 2 z E(A H ν A ν ) +ρ ν I ν , 153 where,ρ τ =E ∂ lnp(h l,τ ) ∂h H l,τ 2 ,ρ ν =E ∂ lnp(h l,ν ) ∂h H l,ν 2 . Then, Σ h is lower bounded byJ −1 as given below, h ≥ 1 σ 2 z E(A H τ A τ ) +ρ τ I τ 0 0 1 σ 2 z E(A H ν A ν ) +ρ ν I ν −1 = 1 σ 2 z R τ +ρ τ I τ −1 0 0 1 σ 2 z R ν +ρ ν I ν −1 , whereR τ =E(A H τ A τ ) andR ν =E(A H ν A ν ). D.4 Derivation of Approximate Identity 1 Substituting definition ofR τ =E(A H τ A τ ) into (5.32), arg min X tr 1 σ 2 z E(A H τ A τ ) +ρ τ I τ −1 s.t tr XX H =P. 
(D.9) We expand the first trace term as follows, tr 1 σ 2 z E(A H τ A τ ) +ρ τ I τ −1 (a) = tr 1 σ 2 z E [X T ] 1 [H ν T ] 1 ⊗I NtLm 0 n T . . . [X T ] n T [H ν T ] n T ⊗I NtLm 0 n T H [X T ] 1 [H ν T ] 1 ⊗I NtLm 0 n T . . . [X T ] n T [H ν T ] n T ⊗I NtLm 0 n T +ρ τ I τ −1 = tr 1 σ 2 z E ( n T X i=1 h [X T ] i [H ν T ] i ⊗I NtLm 0 n T i H h [X T ] i [H ν T ] i ⊗I NtLm 0 n T i ) +ρ τ I τ ! −1 = tr 1 σ 2 z E ( n T X i=1 h [H ν T ] i ⊗I NtLm 0 n T i H [X T ] H i [X T ] i h [H ν T ] i ⊗I NtLm 0 n T i ) +ρ τ I τ ! −1 , (D.10) 154 where (a) holds due to the definition of A τ as in Equation (5.21). We define the following matrices: H aν (with the bold subscript a denotes augmentation) as the h [H l,ν T ] i ⊗I NtLm 0 n T i matrices, augmented below one another ∀i = 1, 2,...,n T . Mathematically, each h [H l,ν T ] i ⊗I NtLm 0 n T i ∈ C NtLm 0 n T ×N 2 t Lm 0 p 0 n T . Thus,H aν ∈C NtLm 0 n 2 T ×N 2 t Lm 0 p 0 n T . M∈ R n T ×n T 2 as [e T i+n T (i−1) ;e T (i+1)+n T (i) ;...;e T (n T )+n T (n T −1) ]; 1≤ i≤ n T , and e m is the canonical basis vectors,m∈ [1, 2,...,n 2 T ], forR n 2 T ×1 . Thus, we can rewrite Equation (D.10) as follows, tr 1 σ 2 z E h M(I n T ⊗X T )H aν i H M(I n T ⊗X T )H aν +ρ τ I τ ! −1 = tr 1 σ 2 z E H H aν X H n T NX n T H aν +ρ τ I τ −1 , (D.11) whereN =M H M andX n T =I n T ⊗X T . Substituting the result of Equation (D.11) into the optimization problem (D.9), arg min X tr 1 σ 2 z E H H a l,ν X H n T NX n T H a l,ν +ρ τ I τ −1 s.t tr XX H =P, (D.12) which can be solved using the Lagrange multiplier method [128], i.e., via minimizing the Lagrangian function defined as, L(X,λ) = tr 1 σ 2 z E H H aν X H n T NX n T H aν +ρ τ I τ −1 +λ tr(XX H )−P , (D.13) 155 where, λ is the Lagrange multiplier. We seek the minimization ofL(X,λ), i.e., the solution(s) of ∂L(X,λ) ∂X H = 0, where ∂L(X,λ) ∂X H is given by, L(X,λ) ∂X H = ∂tr 1 σ 2 z E H H aν X H n T NX n T H aν +ρ τ I τ −1 ∂X H | {z } (A) + ∂λ tr(XX H )−P ∂X H | {z } (B) . (D.14) We defined the first term (A) and the second term (B) for later use. We seek simplifying the first term (A) in (D.14). From the chain rule Lemma [129, 130], we have, ∂tr(F ) ∂X = ∂tr(G) ∂X ∂tr(F ) ∂G (D.15) for a square matrixF ; that is a function of another square matrixG. We use the above Lemma (D.15) by defining, F = 1 σ 2 z E H H aν X H n T NX n T H aν +ρ τ I τ −1 , and G = 1 σ 2 z E H H aν X H n T NX n T H aν +ρ τ I τ , to get Equation (D.16). ∂tr 1 σ 2 z E H H aν X H n T NX n T H aν +ρ τ I τ −1 ∂X H = ∂tr 1 σ 2 z E H H aν X H n T NX n T H aν +ρ τ I τ ∂X H × ∂tr 1 σ 2 z E H H aν X H n T NX n T H aν +ρ τ I τ −1 ∂ 1 σ 2 z E H H aν X H n T NX n T H aν +ρ τ I τ . (D.16) For the first term of the RHS in (D.16), we have, 156 ∂tr 1 σ 2 z E H H aν X H n T NX n T H aν ∂X H + ∂tr ρ τ I τ ∂X H (b) = ∂tr 1 σ 2 z E H H aν (I n T ⊗X T ) H N(I n T ⊗X T )H aν ∂X H (c) = 1 σ 2 z ∂ E tr H H aν (I n T ⊗X T ) H N(I n T ⊗X T )H aν ! ∂X H (d) = 1 σ 2 z ∂ E tr (I n T ⊗X T ) H N(I n T ⊗X T )H aν H H aν ! ∂X H (e) = 1 σ 2 z ∂tr E (I n T ⊗X T ) H N(I n T ⊗X T )H aν H H aν ! 
∂X H (f) = 1 σ 2 z ∂tr (I n T ⊗X T ) H N(I n T ⊗X T )E H aν H H aν ∂X H (g) = 1 σ 2 z ∂tr (I n T ⊗X T ) H N(I n T ⊗X T )K Hν ∂X H (h) = 1 σ 2 z ∂tr (I n T ⊗X T ) H N(I n T ⊗X T )U Hν S Hν V H Hν ∂X H (i) = 1 σ 2 z ∂tr S 1 2 Hν V H Hν (I n T ⊗X T ) H N(I n T ⊗X T )U Hν S 1 2 Hν ∂X H (j) = 1 σ 2 z ∂tr S 1 2 Hν V H Hν (I n T ⊗X ∗ )A ∂X H + ∂tr B(I n T ⊗X T )U Hν S 1 2 Hν ∂X H (k) = 1 σ 2 z ∂tr (I n T ⊗X ∗ )AS 1 2 Hν V H Hν ∂X H + ∂tr (I n T ⊗X T )U Hν S 1 2 Hν B ∂X H (l) = 1 σ 2 z ∂ ∂X H AS 1 2 Hν V H Hν T , (I n T ⊗X ∗ ) + ∂ ∂X H U Hν S 1 2 Hν B T , (I n T ⊗X T ) (m) ≈ 1 σ 2 z ∂ ∂X H C⊗D , I n T ⊗X ∗ + ∂ ∂X H E⊗L), I n T ⊗X T (n) = 1 σ 2 z ∂ ∂X H tr I n T ⊗X ∗ C⊗D T + ∂ ∂X H tr I n T ⊗X T E⊗L T ! (o) = 1 σ 2 z ∂ ∂X H tr I n T ⊗X ∗ C T ⊗D T + ∂ ∂X H tr I n T ⊗X T E T ⊗L T ! 157 (p) ≈ 1 σ 2 z ∂ ∂X H tr I n T C T ⊗ X ∗ D T + ∂ ∂X H tr I n T E T ⊗ X T L T ! (q) = 1 σ 2 z ∂ ∂X H tr I n T C T tr X ∗ D T + ∂ ∂X H tr I n T E T tr X T L T (r) = 1 σ 2 z tr C ∂ ∂X H tr X ∗ D T + tr E ∂ ∂X H tr X T L T (s) = 1 σ 2 z tr C D T + tr E L , (D.17) where (b) holds since (ρ τ I τ ) is not a function of X, (c) holds due to trace and expectation properties, (d) holds due to trace properties, (e) holds due to trace and expectation properties, (f) holds since expectation is taken over the channel, i.e., the term (I n T ⊗X T ) H N(I n T ⊗X T ) is deterministic, (g) holds since we defineK Hν = E H aν H H aν , (h) holds since U Hν S Hν V H Hν is the Singular Value Decomposition (SVD) of K Hν , (i) holds due to trace properties, (j) holds due to trace properties and our definitions of A and B as follows: A = 1 2 N(I n T ⊗ X T )U Hν S 1 2 Hν ∈ C n 2 T ×NtLm 0 n 2 T ,B = 1 2 S 1 2 Hν V H Hν (I n T ⊗X ∗ )N ∈ C NtLm 0 n 2 T ×n 2 T , (k) holds due to trace properties, i.e., tr(XY ) = tr(YX) for general matricesX andY , (l) holds due to trace properties, i.e.,hX,Yi = tr(YX T ) for general matricesX andY , (m) holds since one can approximateC⊗D by (S Hν V H Hν ) T A T withC andD are shaped like I n T andX, respectively, and one can approximateE⊗L byB T (U Hν S 1 2 Hν ) T withE andL are shaped likeI n T andX T , respectively, (see e.g., [131]). Furthermore, (n) holds due to trace properties, (o) holds since (X⊗ Y ) T = X T ⊗Y T for general matricesX andY , (p) holds due to the mixed-product property: (X⊗Y )(W⊗Z) = (XW )⊗ (YZ) for general matrices X,Y,W and Z of such size that one can form the matrix products XW and YZ, (q) holds from the definition of trace of Kronecker products; that is, tr(X⊗Y ) = tr(X)tr(Y ), (r) holds since tr I n T C T and tr I n T E T are not function ofX and tr(A T ) = tr(A) In step (p), we have made the approximation that A is not a function of X, to facilitate the computation; as such we go from second order derivatives to first order derivatives. 158 for general matrix A, and (s) holds due to definitions of complex differntiation of matrices as given in [132] (Equations (229),(230), (246) and (247) therein). Note that, in general, Kronecker factorization might have more than one term [131]; that is, one could rewrite the factorizations above as, P r k=1 C k ⊗D k = (S Hν V H Hν ) T A T and P v k=1 E k ⊗L k =B T (U Hν S 1 2 Hν ) T . Without loss of generality, we assume a single term factorization for simplicity. For the second term in the RHS of (D.16), we have, ∂tr 1 σ 2 z E H H aν X H n T NX n T H aν +ρ τ I τ −1 ∂ 1 σ 2 z E H H aν X H n T NX n T H aν +ρ τ I τ (t) =− 1 σ 2 z E H H aν X H n T NX n T H aν +ρ τ I τ −2T , (D.18) where, (t) holds since ∂tr(A −1 ) ∂A =−A −2T for a general matrixA [132]. 
Combining terms and substituting Equations (D.17) and (D.18) into Equation (D.16) yield the following for the first term (A), ∂tr 1 σ 2 z E H H aν X H n T NX n T H aν +ρ τ I τ −1 ∂X H =− 1 σ 2 z tr C D T + tr E L 1 σ 2 z E H H aν X H n T NX n T H aν +ρ τ I τ −2T . (D.19) For the second term (B) in Equation (D.14), we have [132], ∂λ tr(XX H )−P ∂X H =λX T . (D.20) 159 Substituting the results of the first term (A) (Equation (D.19)) and the second term (B) (Equation (D.20)) into Equation (D.14) yields Equation (D.21). Setting Equation (D.21) to zero and solving iteratively forX gives ˆ X. ∂L(X,λ) ∂X H = ∂tr 1 σ 2 z E H H aν (I n T ⊗X T ) H N(I n T ⊗X T )H aν +ρ τ I τ −1 ∂X H + ∂λ tr(XX H )−P ∂X H =− 1 σ 2 z tr C D T + tr E L 1 σ 2 z E H H aν (I n T ⊗X T ) H N(I n T ⊗X T )H aν +ρ τ I τ −2T +λX T . (D.21) 160 Bibliography [1] O. Oreifej, X. Li, and M. Shah, “Simultaneous video stabilization and moving object detection in turbulence,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 35, no. 2, pp. 450–462, Feb. 2013. [Online]. Available: http://dx.doi.org/10.1109/TPAMI.2012.97 [2] X. Liu, G. Zhao, J. Yao, and C. Qi, “Background subtraction based on low-rank and structured sparse decomposition,” IEEE Transactions on Image Processing, vol. 24, no. 8, pp. 2502–2514, Aug 2015. [3] S. E. Ebadi, V . G. Ones, and E. Izquierdo, “Efficient background subtraction with low-rank and sparse matrix decomposition,” in 2015 IEEE International Conference on Image Processing (ICIP), Sept 2015, pp. 4863–4867. [4] B. Lois and N. Vaswani, “Online matrix completion and online robust pca,” in 2015 IEEE International Symposium on Information Theory (ISIT), June 2015, pp. 1826–1830. [5] E. J. Candes and Y . Plan, “Matrix completion with noise,” Proceedings of the IEEE, vol. 98, no. 6, pp. 925–936, June 2010. [6] T. Bouwmans, A. Sobral, S. Javed, S. K. Jung, and E.-H. Zahzah, “Decomposition into low-rank plus additive matrices for background/foreground separation: A review for a comparative evaluation with a large-scale dataset,” Computer Science Review, vol. 23, pp. 1 – 71, 2017. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S1574013715300459 [7] J. Wright, A. Y . Yang, A. Ganesh, S. S. Sastry, and Y . Ma, “Robust face recognition via sparse representation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 2, pp. 210–227, 2009. [8] M. Weimer, A. Karatzoglou, Q. V . Le, and A. Smola, “Cofirank maximum margin matrix factorization for collaborative ranking,” in Proceedings of the 20th International Conference on Neural Information Processing Systems, ser. NIPS’07. Red Hook, NY , USA: Curran Associates Inc., 2007, p. 1593–1600. 161 [9] E. J. Cand` es, X. Li, Y . Ma, and J. Wright, “Robust principal component analysis?” J. ACM, vol. 58, no. 3, Jun. 2011. [Online]. Available: https://doi.org/10.1145/1970392.1970395 [10] L. Canyi, J. Feng, Z. Lin, and S. Yan, “Correlation adaptive subspace segmentation by trace lasso,” 12 2013, pp. 1345–1352. [11] G. Liu, Z. Lin, and Y . Yu, “Robust subspace segmentation by low-rank representation,” in Proceedings of the 27th International Conference on International Conference on Machine Learning, ser. ICML’10. Madison, WI, USA: Omnipress, 2010, p. 663–670. [12] E. Aktas and U. Mitra, “Single-user sparse channel acquisition in multiuser DS-CDMA systems,” IEEE Transactions on Communications, vol. 51, no. 4, pp. 682–693, April 2003. [13] C. Carbonelli, S. Vedantam, and U. 
Mitra, “Sparse channel estimation with zero tap detection,” IEEE Transactions on Wireless Communications, vol. 6, no. 5, pp. 1743–1763, May 2007. [14] S. F. Cotter and B. D. Rao, “Sparse channel estimation via matching pursuit with application to equalization,” IEEE Transactions on Communications, vol. 50, no. 3, pp. 374–377, March 2002. [15] J. Homer, I. Mareels, R. R. Bitmead, B. Wahlberg, and A. Gustafsson, “Lms estimation via structural detection,” IEEE transactions on Signal Processing, vol. 46, no. 10, pp. 2651–2663, 1998. [16] M. Kocic, D. Brady, and M. Stojanovic, “Sparse equalization for real-time digital underwater acoustic communications,” in ’Challenges of Our Changing Global Environment’. Conference Proceedings. OCEANS ’95 MTS/IEEE, vol. 3, Oct 1995, pp. 1417–1422 vol.3. [17] C. R. Berger, S. Zhou, J. C. Preisig, and P. Willett, “Sparse channel estimation for multicarrier underwater acoustic communication: From subspace methods to compressed sensing,” IEEE Transactions on Signal Processing, vol. 58, no. 3, pp. 1708–1721, 2010. [18] S. Beygi and U. Mitra, “Multi-scale multi-lag channel estimation using low rank approximation for OFDM,” IEEE Transactions on Signal Processing, vol. 63, no. 18, pp. 4744–4755, Sept 2015. [19] G. Taub¨ ock, F. Hlawatsch, D. Eiwen, and H. Rauhut, “Compressive estimation of doubly selective channels in multicarrier systems: Leakage effects and sparsity-enhancing processing,” IEEE Journal of Selected Topics in Signal Processing, vol. 4, no. 2, pp. 255–271, April 2010. 162 [20] W. U. Bajwa, J. Haupt, A. M. Sayeed, and R. Nowak, “Compressed channel sensing: A new approach to estimating sparse multipath channels,” Proceedings of the IEEE, vol. 98, no. 6, pp. 1058–1076, June 2010. [21] N. Michelusi, U. Mitra, A. Molisch, and M. Zorzi, “UWB Sparse/Diffuse Channels, Part I: Channel Models and Bayesian Estimators,” vol. 60, no. 10, pp. 5307–5319, Oct. 2012. [22] N. Michelusi, U. Mitra, A. Molisch, and M. Zorzi, “UWB Sparse/Diffuse Channels, Part II: Estimator Analysis and Practical Channels,” vol. 60, no. 10, pp. 5320–5333, Oct. 2012. [23] S. Beygi, U. Mitra, and E. G. Str¨ om, “Nested sparse approximation: Structured estimation of V2V channels using geometry-based stochastic channel model,” IEEE Transactions on Signal Processing, vol. 63, no. 18, pp. 4940–4955, Sept 2015. [24] S. Beygi and U. Mitra, “Structured estimation of time-varying narrowband wireless communication channels,” in Acoustics, Speech and Signal Processing (ICASSP), 2017 IEEE International Conference on. IEEE, 2017, pp. 3529–3533. [25] S. Beygi and U. Mitra, “Time-varying narrowband channel estimation: Exploiting low-rank and sparsity via bilinear representation,” in Signals, Systems and Computers, 2016 50th Asilomar Conference on. IEEE, 2016, pp. 1235–1239. [26] A. Yilmaz, O. Javed, and M. Shah, “Object tracking: A survey,” ACM Comput. Surv., vol. 38, no. 4, Dec. 2006. [Online]. Available: http://doi.acm.org/10.1145/1177352.1177355 [27] K. K. Halder, M. Tahtali, and S. G. Anavatti, “Target tracking in dynamic background using generalized regression neural network,” in 2015 International Conference on Advanced Mechatronics, Intelligent Manufacture, and Industrial Automation (ICAMIMIA), Oct 2015, pp. 8–11. [28] W. Zhong, H. Lu, and M. H. Yang, “Robust object tracking via sparsity-based collaborative model,” in 2012 IEEE Conference on Computer Vision and Pattern Recognition, June 2012, pp. 1838–1845. [29] L. Lin, W. Lin, and S. 
Huang, “Group object detection and tracking by combining RPCA and fractal analysis,” Soft Computing, vol. 22, no. 1, pp. 231–242, Jan. 2018. [Online]. Available: https://doi.org/10.1007/s00500-016-2329-1
[30] F. Chen, Q. Wang, S. Wang, W. Zhang, and W. Xu, “Object tracking via appearance modeling and sparse representation,” Image Vision Comput., vol. 29, no. 11, pp. 787–796, Oct. 2011. [Online]. Available: http://dx.doi.org/10.1016/j.imavis.2011.08.006
[31] D. Shan and Z. Chao, “Improved l1-tracker using robust PCA and random projection,” Machine Vision and Applications, vol. 27, no. 4, pp. 577–583, May 2016. [Online]. Available: https://doi.org/10.1007/s00138-016-0750-1
[32] N. Vaswani, T. Bouwmans, S. Javed, and P. Narayanamurthy, “Robust subspace learning: Robust PCA, robust subspace tracking, and robust subspace recovery,” IEEE Signal Processing Magazine, vol. 35, no. 4, pp. 32–55, July 2018.
[33] F. Shang, J. Cheng, Y. Liu, Z. Q. Luo, and Z. Lin, “Bilinear factor matrix norm minimization for robust PCA: Algorithms and applications,” IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 1–1, 2018.
[34] D. A. Cox, J. Little, and D. O’Shea, Ideals, Varieties, and Algorithms: An Introduction to Computational Algebraic Geometry and Commutative Algebra, 4th ed. Springer Publishing Company, Incorporated, 2015.
[35] G. Ongie, R. Willett, R. D. Nowak, and L. Balzano, “Algebraic variety models for high-rank matrix completion,” in Proceedings of the 34th International Conference on Machine Learning, vol. 70, Aug. 2017, pp. 2691–2700. [Online]. Available: http://proceedings.mlr.press/v70/ongie17a.html
[36] G. Matz, H. Bolcskei, and F. Hlawatsch, “Time-frequency foundations of communications: Concepts and tools,” IEEE Signal Processing Magazine, vol. 30, no. 6, pp. 87–96, Nov. 2013.
[37] G. A. Hollinger, S. Choudhary, P. Qarabaqi, C. Murphy, U. Mitra, G. S. Sukhatme, M. Stojanovic, H. Singh, and F. Hover, “Underwater data collection using robotic sensor networks,” IEEE Journal on Selected Areas in Communications, vol. 30, no. 5, pp. 899–911, 2012.
[38] P. Bello, “Characterization of randomly time-variant linear channels,” IEEE Transactions on Communications Systems, vol. 11, no. 4, pp. 360–393, December 1963.
[39] S. Mahadevan and D. P. Casasent, “Detection of triple junction parameters in microscope images,” in Proc. SPIE, Optical Pattern Recognition XII, vol. 4387, 2001, pp. 204–214.
[40] P. Kahn, L. Kitchen, and E. M. Riseman, “A fast line finder for vision-guided robot navigation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 12, no. 11, pp. 1098–1102, Nov. 1990.
[41] P. Kahn, L. Kitchen, and E. Riseman, “Real-time feature extraction: A fast line finder for vision-guided robot navigation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 12, no. 11, pp. 1098–1102, Dec. 1990.
[42] S. El Mejdani, R. Egli, and F. Dubeau, “Old and new straight-line detectors: Description and comparison,” Pattern Recogn., vol. 41, no. 6, pp. 1845–1866, Jun. 2008. [Online]. Available: http://dx.doi.org/10.1016/j.patcog.2007.11.013
[43] M. Tao and X. Yuan, “Recovering low-rank and sparse components of matrices from incomplete and noisy observations,” SIAM J. on Optimization, vol. 21, no. 1, pp. 57–81, Jan. 2011. [Online]. Available: http://dx.doi.org/10.1137/100781894
[44] Y. Chen, Y. Wang, M. Li, and G. He, “Augmented Lagrangian alternating direction method for low-rank minimization via non-convex approximation,” Signal, Image and Video Processing, vol. 11, no. 7, pp.
1271–1278, Oct. 2017. [Online]. Available: https://doi.org/10.1007/s11760-017-1084-9
[45] Z. Lin, R. Liu, and H. Li, “Linearized alternating direction method with parallel splitting and adaptive penalty for separable convex programs in machine learning,” Machine Learning, vol. 99, no. 2, pp. 287–325, May 2015. [Online]. Available: https://doi.org/10.1007/s10994-014-5469-5
[46] N. Han, Y. Song, and Z. Song, “Bayesian robust principal component analysis with structured sparse component,” Computational Statistics and Data Analysis, vol. 109, pp. 144–158, 2017. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S016794731630295X
[47] X. Jia, X. Feng, W. Wang, C. Xu, and L. Zhang, “Bayesian inference for adaptive low rank and sparse matrix estimation,” Neurocomputing, vol. 291, pp. 71–83, 2018. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0925231218302030
[48] Q. Sun, S. Xiang, and J. Ye, “Robust principal component analysis via capped norms,” in Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ser. KDD ’13. New York, NY, USA: ACM, 2013, pp. 311–319. [Online]. Available: http://doi.acm.org/10.1145/2487575.2487604
[49] J. Wang, M. Wang, X. Hu, and S. Yan, “Visual data denoising with a unified Schatten-p norm and lq norm regularized principal component pursuit,” Pattern Recogn., vol. 48, no. 10, pp. 3135–3144, Oct. 2015. [Online]. Available: http://dx.doi.org/10.1016/j.patcog.2015.01.024
[50] A. Sobral, T. Bouwmans, and E.-H. Zahzah, “Double-constrained RPCA based on saliency maps for foreground detection in automated maritime surveillance,” in 2015 12th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), Aug. 2015, pp. 1–6.
[51] S. Beygi and U. Mitra, “Multi-scale multi-lag channel estimation using low rank approximation for OFDM,” IEEE Transactions on Signal Processing, vol. 63, no. 18, pp. 4744–4755, Sept. 2015.
[52] S. Beygi, U. Mitra, and E. G. Ström, “Nested sparse approximation: Structured estimation of V2V channels using geometry-based stochastic channel model,” IEEE Transactions on Signal Processing, vol. 63, no. 18, pp. 4940–4955, Sept. 2015.
[53] A. Elnakeeb, S. Beygiharchegani, S. Choudhary, and U. Mitra, “Bilinear matrix factorization methods for time-varying narrowband channel estimation: Exploiting sparsity and rank,” IEEE Transactions on Signal Processing, pp. 1–1, 2018.
[54] G. Tang and A. Nehorai, “Constrained Cramér–Rao bound on robust principal component analysis,” IEEE Transactions on Signal Processing, vol. 59, no. 10, pp. 5070–5076, Oct. 2011.
[55] G. Tang and A. Nehorai, “Lower bounds on the mean-squared error of low-rank matrix reconstruction,” IEEE Transactions on Signal Processing, vol. 59, no. 10, pp. 4559–4571, Oct. 2011.
[56] P. Stoica and B. C. Ng, “On the Cramér–Rao bound under parametric constraints,” IEEE Signal Processing Letters, vol. 5, no. 7, pp. 177–179, July 1998.
[57] X. Ding, L. He, and L. Carin, “Bayesian robust principal component analysis,” IEEE Transactions on Image Processing, vol. 20, no. 12, pp. 3419–3430, Dec. 2011.
[58] J. D. Gorman and A. O. Hero, “Lower bounds for parametric estimation with constraints,” IEEE Transactions on Information Theory, vol. 36, no. 6, pp. 1285–1301, Nov. 1990.
[59] S. Oymak, A. Jalali, M. Fazel, Y. C. Eldar, and B. Hassibi, “Simultaneously structured models with application to sparse and low-rank matrices,” IEEE Transactions on Information Theory, vol. 61, no. 5, pp.
2886–2908, May 2015.
[60] Z. Ben-Haim and Y. C. Eldar, “The Cramér–Rao bound for estimating a sparse parameter vector,” IEEE Transactions on Signal Processing, vol. 58, no. 6, pp. 3384–3389, June 2010.
[61] G. Tang and A. Nehorai, “Robust principal component analysis based on low-rank and block-sparse matrix decomposition,” in 2011 45th Annual Conference on Information Sciences and Systems, March 2011, pp. 1–5.
[62] S. Bhattacharya, S. Kim, H. Heidarsson, G. S. Sukhatme, and V. Kumar, “A topological approach to using cables to separate and manipulate sets of objects,” The International Journal of Robotics Research, vol. 34, no. 6, pp. 799–815, 2015. [Online]. Available: https://doi.org/10.1177/0278364914562236
[63] Full video. [Online]. Available: https://www.dropbox.com/s/adj31afj7ald5w3/GP040018.MP4?dl=0
[64] Cropped video. [Online]. Available: https://www.dropbox.com/s/8n9ul0sk8wvzsx8/video3.mp4?dl=0
[65] X. R. Li and V. P. Jilkov, “Survey of maneuvering target tracking. Part I: Dynamic models,” IEEE Transactions on Aerospace and Electronic Systems, vol. 39, no. 4, pp. 1333–1364, Oct. 2003.
[66] E. J. Candès, X. Li, Y. Ma, and J. Wright, “Robust principal component analysis?” J. ACM, vol. 58, no. 3, pp. 11:1–11:37, Jun. 2011. [Online]. Available: http://doi.acm.org/10.1145/1970392.1970395
[67] S. Theodoridis and K. Koutroumbas, Pattern Recognition, 4th ed. Orlando, FL, USA: Academic Press, Inc., 2008.
[68] M. Fornasier, H. Rauhut, and R. Ward, “Low-rank matrix recovery via iteratively reweighted least squares minimization,” SIAM J. on Optimization, vol. 21, no. 4, pp. 1614–1640, Dec. 2011. [Online]. Available: http://dx.doi.org/10.1137/100811404
[69] K. Mohan and M. Fazel, “Iterative reweighted algorithms for matrix rank minimization,” J. Mach. Learn. Res., vol. 13, no. 1, pp. 3441–3473, Nov. 2012. [Online]. Available: http://dl.acm.org/citation.cfm?id=2503308.2503351
[70] C. Lu, Z. Lin, and S. Yan, “Smoothed low rank and sparse matrix recovery by iteratively reweighted least squares minimization,” CoRR, vol. abs/1401.7413, 2014. [Online]. Available: http://arxiv.org/abs/1401.7413
[71] J. Cai, E. Candès, and Z. Shen, “A singular value thresholding algorithm for matrix completion,” SIAM Journal on Optimization, vol. 20, no. 4, pp. 1956–1982, 2010. [Online]. Available: https://doi.org/10.1137/080738970
[72] P. Jain, P. Netrapalli, and S. Sanghavi, “Low-rank matrix completion using alternating minimization,” in Proceedings of the Forty-fifth Annual ACM Symposium on Theory of Computing, ser. STOC ’13. New York, NY, USA: ACM, 2013, pp. 665–674. [Online]. Available: http://doi.acm.org/10.1145/2488608.2488693
[73] A. Elnakeeb and U. Mitra, “Low-rank, sparse and line constrained estimation: Applications to target tracking and convergence,” in 2017 IEEE International Symposium on Information Theory (ISIT), June 2017, pp. 2283–2287.
[74] A. Elnakeeb and U. Mitra, “Cramér–Rao bound for line constrained trajectory tracking,” in 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), April 2018, pp. 3444–3448.
[75] J. Wang, M. Wang, X. Hu, and S. Yan, “Visual data denoising with a unified Schatten-p norm and lq norm regularized principal component pursuit,” Pattern Recognition, vol. 48, no. 10, pp. 3135–3144, 2015.
[76] Q. Sun, S. Xiang, and J.
Ye, “Robust principal component analysis via capped norms,” in Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ser. KDD ’13. New York, NY, USA: Association for Computing Machinery, 2013, pp. 311–319. [Online]. Available: https://doi.org/10.1145/2487575.2487604
[77] D. Eiwen, G. Tauböck, F. Hlawatsch, and H. G. Feichtinger, “Compressive tracking of doubly selective channels in multicarrier systems based on sequential delay-Doppler sparsity,” in 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), May 2011, pp. 2928–2931.
[78] D. Eiwen, G. Tauböck, F. Hlawatsch, H. Rauhut, and N. Czink, “Multichannel-compressive estimation of doubly selective channels in MIMO-OFDM systems: Exploiting and enhancing joint sparsity,” in 2010 IEEE International Conference on Acoustics, Speech and Signal Processing, March 2010, pp. 3082–3085.
[79] D. Eiwen, G. Tauböck, F. Hlawatsch, and H. G. Feichtinger, “Group sparsity methods for compressive channel estimation in doubly dispersive multicarrier systems,” in 2010 IEEE 11th International Workshop on Signal Processing Advances in Wireless Communications (SPAWC), June 2010, pp. 1–5.
[80] A. F. Molisch, Wireless Communications. John Wiley & Sons, 2012, vol. 34.
[81] S. Oymak, B. Recht, and M. Soltanolkotabi, “Sharp time–data tradeoffs for linear inverse problems,” arXiv preprint arXiv:1507.04793, 2015.
[82] S. Tu, R. Boczar, M. Simchowitz, M. Soltanolkotabi, and B. Recht, “Low-rank solutions of linear matrix equations via Procrustes flow,” arXiv preprint arXiv:1507.03566, 2015.
[83] M. R. Hestenes, “Multiplier and gradient methods,” Journal of Optimization Theory and Applications, vol. 4, no. 5, pp. 303–320, 1969.
[84] G. Tang, B. N. Bhaskar, P. Shah, and B. Recht, “Compressed sensing off the grid,” IEEE Transactions on Information Theory, vol. 59, no. 11, pp. 7465–7490, Nov. 2013.
[85] Y. Chi, “Guaranteed blind sparse spikes deconvolution via lifting and convex optimization,” IEEE Journal of Selected Topics in Signal Processing, vol. 10, no. 4, pp. 782–794, June 2016.
[86] M. Fazel, H. Hindi, and S. Boyd, “Rank minimization and applications in system theory,” in Proceedings of the 2004 American Control Conference, vol. 4. IEEE, 2004, pp. 3273–3278.
[87] B. Recht, M. Fazel, and P. A. Parrilo, “Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization,” SIAM Review, vol. 52, no. 3, pp. 471–501, 2010.
[88] S. Choudhary, S. Beygi, and U. Mitra, “Delay-Doppler estimation via structured low-rank matrix recovery,” in 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2016, pp. 3786–3790.
[89] J. P. Haldar and D. Hernando, “Rank-constrained solutions to linear matrix equations using PowerFactorization,” IEEE Signal Processing Letters, vol. 16, no. 7, pp. 584–587, 2009.
[90] A. N. Tikhonov and V. Y. Arsenin, Solutions of Ill-Posed Problems. Winston, Washington, DC, 1977, vol. 14.
[91] W. H. Press, B. P. Flannery, S. A. Teukolsky, W. T. Vetterling, and P. B. Kramer, Numerical Recipes: The Art of Scientific Computing. AIP, 1987.
[92] V. Chandrasekaran, B. Recht, P. A. Parrilo, and A. S. Willsky, “The convex geometry of linear inverse problems,” Foundations of Computational Mathematics, vol. 12, no. 6, pp. 805–849, 2012.
[93] E. J. Candès and C. Fernandez-Granda, “Towards a mathematical theory of super-resolution,” Communications on Pure and Applied Mathematics, vol. 67, no. 6, pp.
906–956, June 2014. [Online]. Available: https://onlinelibrary.wiley.com/doi/abs/10.1002/cpa.21455
[94] M. Grant and S. Boyd, “CVX: Matlab software for disciplined convex programming, version 2.1,” http://cvxr.com/cvx, Mar. 2014.
[95] Y.-S. Choi, P. J. Voltz, and F. A. Cassara, “On channel estimation and detection for multicarrier signals in fast and selective Rayleigh fading channels,” IEEE Transactions on Communications, vol. 49, no. 8, pp. 1375–1387, Aug. 2001.
[96] I. Barhumi, G. Leus, and M. Moonen, “Equalization for OFDM over doubly selective channels,” IEEE Transactions on Signal Processing, vol. 54, no. 4, pp. 1445–1458, April 2006.
[97] W.-G. Song and J.-T. Lim, “Channel estimation and signal detection for MIMO-OFDM with time varying channels,” IEEE Communications Letters, vol. 10, no. 7, pp. 540–542, July 2006.
[98] Y. Qiao, S. Yu, P. Su, and L. Zhang, “Research on an iterative algorithm of LS channel estimation in MIMO OFDM systems,” IEEE Transactions on Broadcasting, vol. 51, no. 1, pp. 149–153, March 2005.
[99] G. Tauböck, F. Hlawatsch, D. Eiwen, and H. Rauhut, “Compressive estimation of doubly selective channels in multicarrier systems: Leakage effects and sparsity-enhancing processing,” IEEE Journal of Selected Topics in Signal Processing, vol. 4, no. 2, pp. 255–271, April 2010.
[100] S. Beygi, A. Elnakeeb, S. Choudhary, and U. Mitra, “Bilinear matrix factorization methods for time-varying narrowband channel estimation: Exploiting sparsity and rank,” IEEE Transactions on Signal Processing, vol. 66, no. 22, pp. 6062–6075, Nov. 2018.
[101] V. Chandrasekaran, B. Recht, P. A. Parrilo, and A. S. Willsky, “The convex geometry of linear inverse problems,” Foundations of Computational Mathematics, vol. 12, no. 6, pp. 805–849, Dec. 2012. [Online]. Available: https://doi.org/10.1007/s10208-012-9135-7
[102] Y. Chi, “Guaranteed blind sparse spikes deconvolution via lifting and convex optimization,” IEEE Journal of Selected Topics in Signal Processing, vol. 10, no. 4, pp. 782–794, June 2016.
[103] F. F. Bonsall, “A general atomic decomposition theorem and Banach’s closed range theorem,” The Quarterly Journal of Mathematics, vol. 42, no. 1, pp. 9–14, Jan. 1991. [Online]. Available: http://dx.doi.org/10.1093/qmath/42.1.9
[104] A. K. Jagannatham and B. D. Rao, “Complex constrained CRB and its application to semi-blind MIMO and OFDM channel estimation,” in 2004 Sensor Array and Multichannel Signal Processing Workshop Proceedings, July 2004, pp. 397–401.
[105] B. M. Sadler, R. J. Kozick, T. Moore, and A. Swami, “Bounds on SIMO and MIMO channel estimation and equalization with side information,” Journal of VLSI Signal Processing Systems for Signal, Image and Video Technology, vol. 30, no. 1, pp. 107–126, Jan. 2002. [Online]. Available: https://doi.org/10.1023/A:1014046808970
[106] J. D. Gorman and A. O. Hero, “Lower bounds for parametric estimation with constraints,” IEEE Transactions on Information Theory, vol. 36, no. 6, pp. 1285–1301, Nov. 1990.
[107] M. Dong and L. Tong, “Optimal design and placement of pilot symbols for channel estimation,” IEEE Transactions on Signal Processing, vol. 50, no. 12, pp. 3055–3069, Dec. 2002.
[108] K. Drouiche, “A new test for whiteness,” IEEE Transactions on Signal Processing, vol. 48, no. 7, pp. 1864–1871, July 2000.
[109] G. Gui and F. Adachi, “Adaptive sparse channel estimation for time-variant MIMO-OFDM systems,” in 2013 9th International Wireless Communications and Mobile Computing Conference (IWCMC), July 2013, pp. 878–883.
[110] J. Huang, J.
Huang, C. R. Berger, S. Zhou, and P. Willett, “Iterative sparse channel estimation and decoding for underwater MIMO-OFDM,” EURASIP Journal on Advances in Signal Processing, vol. 2010, no. 1, p. 460379, May 2010. [Online]. Available: https://doi.org/10.1155/2010/460379
[111] R. Heckel and M. Soltanolkotabi, “Generalized line spectral estimation via convex optimization,” IEEE Transactions on Information Theory, vol. 64, no. 6, pp. 4001–4023, June 2018.
[112] Y. Chi, L. L. Scharf, A. Pezeshki, and R. Calderbank, “The sensitivity to basis mismatch of compressed sensing for spectrum analysis and beamforming,” 2009.
[113] E. Hosseini and E. Perrins, “The Cramér–Rao bound for training sequence design for burst-mode CPM,” IEEE Transactions on Communications, vol. 61, no. 6, pp. 2396–2407, 2013.
[114] P. Stoica and O. Besson, “Training sequence design for frequency offset and frequency-selective channel estimation,” IEEE Transactions on Communications, vol. 51, no. 11, pp. 1910–1917, 2003.
[115] S. M. Kay, Fundamentals of Statistical Signal Processing: Estimation Theory. Upper Saddle River, NJ, USA: Prentice-Hall, Inc., 1993.
[116] L. L. Scharf, Statistical Signal Processing: Detection, Estimation, and Time Series Analysis, 1991, with Cedric Demeure collaborating on chapters 10 and 11.
[117] M. F. Duarte, S. Sarvotham, D. Baron, M. B. Wakin, and R. G. Baraniuk, “Distributed compressed sensing of jointly sparse signals,” in Conference Record of the Thirty-Ninth Asilomar Conference on Signals, Systems and Computers, Oct. 2005, pp. 1537–1541.
[118] A. Elnakeeb and U. Mitra, “Sparsity and rank exploitation for time-varying narrowband leaked OFDM channel estimation,” in 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), April 2018, pp. 3894–3898.
[119] M. B. Priestley, Spectral Analysis and Time Series, Vols. 1 and 2, 1981.
[120] B. Mahafza and A. Z. Elsherbeni, MATLAB Simulations for Radar Systems Design, 2003.
[121] S. Boyd and L. Vandenberghe, Convex Optimization. New York, NY, USA: Cambridge University Press, 2004.
[122] M. Lai, Y. Xu, and W. Yin, “Improved iteratively reweighted least squares for unconstrained smoothed lq minimization,” SIAM Journal on Numerical Analysis, vol. 51, no. 2, pp. 927–957, 2013.
[123] R. A. Horn and C. R. Johnson, Matrix Analysis. Cambridge University Press, 1990.
[124] R. Vershynin, High-Dimensional Probability: An Introduction with Applications in Data Science, ser. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, 2018.
[125] R. Remmert, Theory of Complex Functions. Springer Science & Business Media, 2012, vol. 122.
[126] K. B. Petersen, M. S. Pedersen et al., “The matrix cookbook,” Technical University of Denmark, vol. 7, p. 15, 2008.
[127] R. Heckel and M. Soltanolkotabi, “Generalized line spectral estimation via convex optimization,” arXiv preprint arXiv:1609.08198, 2016.
[128] S. Boyd and L. Vandenberghe, Convex Optimization. New York, NY, USA: Cambridge University Press, 2004.
[129] G. Trenkler, “Handbook of Matrices: Lütkepohl, H. (1996): New York: John Wiley & Sons, ISBN 0-471-96688-6,” Computational Statistics & Data Analysis, vol. 25, no. 2, pp. 243–243, July 1997. [Online].
Available: https://ideas.repec.org/a/eee/csdana/v25y1997i2p243-243.html
[130] G. A. Korn and T. M. Korn, Mathematical Handbook for Scientists and Engineers, May 2018.
[131] C. V. Loan and N. Pitsianis, “Approximation with Kronecker products,” in Linear Algebra for Large Scale and Real Time Applications. Kluwer Publications, 1993, pp. 293–314.
[132] K. B. Petersen and M. S. Pedersen, “The matrix cookbook,” Nov. 2012, version 20121115. [Online]. Available: http://www2.imm.dtu.dk/pubdb/p.php?3274
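As flagged in Appendix D, the derivative identity in Equation (D.20) can be confirmed numerically. The following minimal Python sketch (an illustration added here, not part of the dissertation; the dimensions and the values of lambda and P are arbitrary choices) checks it by central finite differences of the Wirtinger derivative, using the convention of [132] that the gradient with respect to the conjugate of each entry of X is lambda times that entry, with the transpose in (D.20) being purely a layout convention.

```python
import numpy as np

# Finite-difference check of (D.20): for f(X) = lambda_ * (tr(X X^H) - P),
# the Wirtinger derivative d f / d conj(x_ij) equals lambda_ * x_ij.
rng = np.random.default_rng(0)
m, n, lambda_, P, eps = 3, 4, 0.7, 1.0, 1e-6  # arbitrary illustrative values
X = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))

def f(X):
    # tr(X X^H) is real; .real discards numerical imaginary dust.
    return lambda_ * (np.trace(X @ X.conj().T).real - P)

grad = np.zeros((m, n), dtype=complex)
for i in range(m):
    for j in range(n):
        E = np.zeros((m, n))
        E[i, j] = 1.0
        d_re = (f(X + eps * E) - f(X - eps * E)) / (2 * eps)
        d_im = (f(X + 1j * eps * E) - f(X - 1j * eps * E)) / (2 * eps)
        # Wirtinger derivative w.r.t. conj(x_ij): (d/dRe + 1j * d/dIm) / 2
        grad[i, j] = 0.5 * (d_re + 1j * d_im)

assert np.allclose(grad, lambda_ * X, atol=1e-6)
print("(D.20) verified: the gradient w.r.t. X^H is lambda * X entrywise.")
```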
Abstract
Sparse and low-rank structures have been actively considered in a variety of applications. In this work, we examine two important applications: target tracking and leaked channel estimation.

First, trajectory estimation of moving targets is examined. Background subtraction methods, exploiting low-rank backgrounds and sparse features of interest, are extended to incorporate linear constraints. The line constraint is enforced via a rotation that yields an additional low-rank condition. The optimization is solved via the Augmented Lagrange Multiplier method. Furthermore, the background subtraction model is extended to encompass nonlinear trajectory tracking via the inclusion of algebraic varieties. The optimization is solved via the Iteratively Reweighted Least Squares method. For both linear and nonlinear trajectory tracking, the convergence of the proposed algorithms is studied via a boundedness analysis. An average performance improvement of 5 dB is observed over previous background subtraction methods.

Second, time-varying narrowband leaked Multiple-Input Multiple-Output Orthogonal Frequency Division Multiplexing (MIMO OFDM) channel estimation is considered. It has been shown that the leaked MIMO OFDM channel is effectively separable in the delay and Doppler domains. A convex optimization approach, based on the atomic norm heuristic, is leveraged to solve the recovery problem. The Cramér–Rao bound (CRB) on the leaked channel parameters is derived. Fixed-point expressions for the training sequences that optimize the decoupled CRB in the delay and Doppler domains are derived.

For both the trajectory tracking and channel estimation problems, the proposed algorithms are tested on experimental data, and performance improvements of 2–5 dB are observed over prior methods.
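To make the background-subtraction core of the abstract concrete, below is a minimal Python sketch of the standard low-rank-plus-sparse decomposition solved with an inexact Augmented Lagrange Multiplier scheme in the spirit of robust PCA [9]. It deliberately omits the dissertation's line and variety constraints; the function names and the mu heuristic are illustrative assumptions, not the author's implementation.

```python
import numpy as np

def svt(M, tau):
    # Singular-value thresholding: prox operator of tau * (nuclear norm).
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def soft(M, tau):
    # Entrywise soft threshold: prox operator of tau * (l1 norm).
    return np.sign(M) * np.maximum(np.abs(M) - tau, 0.0)

def rpca_alm(D, lam=None, mu=None, tol=1e-7, max_iter=500):
    """Inexact ALM for: min ||L||_* + lam * ||S||_1  s.t.  L + S = D."""
    m, n = D.shape
    lam = lam if lam is not None else 1.0 / np.sqrt(max(m, n))
    mu = mu if mu is not None else 0.25 * m * n / np.abs(D).sum()  # heuristic
    S = np.zeros_like(D)
    Y = np.zeros_like(D)                       # Lagrange multiplier
    for _ in range(max_iter):
        L = svt(D - S + Y / mu, 1.0 / mu)      # low-rank (background) update
        S = soft(D - L + Y / mu, lam / mu)     # sparse (foreground) update
        R = D - L - S                          # primal residual
        Y = Y + mu * R                         # dual ascent step
        if np.linalg.norm(R) <= tol * np.linalg.norm(D):
            break
    return L, S
```

Stacking vectorized video frames as the columns of D, L recovers the static background while S isolates the moving target; the dissertation's rotation-based line constraint and algebraic-variety extension are layered on top of exactly this decomposition.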
Asset Metadata
Creator: Elnakeeb, Amr (author)
Core Title: Exploitation of sparse and low-rank structures for tracking and channel estimation
School: Viterbi School of Engineering
Degree: Doctor of Philosophy
Degree Program: Electrical Engineering
Publication Date: 07/19/2020
Defense Date: 04/16/2020
Publisher: University of Southern California (original), University of Southern California. Libraries (digital)
Tag: channel estimation, low-rank, OAI-PMH Harvest, OFDM, sparsity, target tracking
Language: English
Contributor: Electronically uploaded by the author (provenance)
Advisor: Mitra, Urbashi (committee chair), Goldstein, Larry (committee member), Soltanolkotabi, Mahdi (committee member)
Creator Email: amr.y.elnakeeb@gmail.com, elnakeeb@usc.edu
Permanent Link (DOI): https://doi.org/10.25549/usctheses-c89-334498
Unique Identifier: UC11663576
Identifier: etd-ElnakeebAm-8702.pdf (filename), usctheses-c89-334498 (legacy record id)
Legacy Identifier: etd-ElnakeebAm-8702.pdf
Dmrecord: 334498
Document Type: Dissertation
Rights: Elnakeeb, Amr
Type: texts
Source: University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Access Conditions: The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the a...
Repository Name: University of Southern California Digital Library
Repository Location: USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA