Design and Analysis of Reduced Complexity Transceivers for Massive MIMO and UWB Systems

Ratnam Vishnu Vardhan

A Dissertation Presented to the FACULTY OF THE GRADUATE SCHOOL UNIVERSITY OF SOUTHERN CALIFORNIA In Partial Fulfillment of the Requirements for the Degree DOCTOR OF PHILOSOPHY (Electrical Engineering)

Thesis Committee: Prof. Alexander, Kenneth (External); Prof. Chugg, Keith M.; Prof. Lindsey, William C.; Prof. Molisch, Andreas F. (Chair)

Ming Hsieh Dept. of Electrical Engineering
University of Southern California
August 2018

In the loving memory of nanamma, who was always proud of her grandchildren.

Acknowledgements

During the six years that went into writing this thesis, I have been fortunate to have an incredible support system of faculty, colleagues, friends and family who made this journey ever so enjoyable. This journey would have been incredibly difficult if it were not for their contributions and support. I would therefore like to convey my heartfelt gratitude to everyone who has made this thesis possible. Although it is impossible to individually thank all the people who have inspired and helped me at every step of the way, there are some I would like to take a moment to acknowledge.

Firstly, I would like to thank my PhD adviser Prof. Andreas Molisch for taking me under his guidance and helping me formulate the problem statement for this thesis work. I am constantly amazed by his dedication to research and shall strive to show a similar commitment in my future endeavors. I also thank him for giving me the opportunity to visit so many countries in the guise of conference trips. I would also like to thank my qualifying and defense committee members Prof. Kenneth Alexander, Prof. Keith Chugg, Prof. William C. Lindsey and Prof. Antonio Ortega for taking the time to review my thesis and for providing me with constructive suggestions and feedback. Additionally, I express my gratitude to Prof.
Giuseppe Caire for being a mentor and a constant source of inspiration; Dr. Ozgun Bursalioglu and Dr. Haralabos Papadopoulos for providing me my first industrial research experience and helping me improve my academic writing skills; and Prof. Urbashi Mitra for providing me my first research experience during my undergraduate studies and for recommending me to the PhD program.

I would also like to extend my gratitude to the present and past members of the WiDeS team: Umit Bas, Hao Feng, Zheda Li, Shengqian Han, Rui Wang, Daoud Burghal, Vinod Kristem, Seun Sangodoyin, Sundar Aditya, Hussein Hamoud and MingChun Lee for fostering a cordial work environment, for the long discussions about work, future and social life (or lack thereof) and for establishing a healthy competitive environment which kept me on my toes. In particular, I would like to thank Umit, Rui, Hao and Zheda for all the advice over the years and for being great travel buddies. The trip to Cambodia, and the animated discussions about South Asian politics and diplomacy at a restaurant near Sacré-Coeur shall remain in my memory for a long time.

Next, I would like to thank my amazing (read difficult) roommates: Amulya Yadav, Pankaj Rajak and Swarnabha Chattaraj for the practical lesson in the art of patience. On a serious note, I thank them for keeping me sane during the strenuous PhD cycle and for condoning all my antics. I would also like to thank my pseudo-roommates: Pradipta Ghosh, Nasir FNU, Subhayan De and Arindam Jati for the intense late-night 'Settlers of Catan' bouts; my ex-roommate Chiranjib Choudhury for all the help and advice during the initial days of my PhD; and my undergraduate friends Yaswanth Cherivirala, Vikrant Mahajan and Somok Mondal for the constant support over the years, either in person or via chat. The multiple MB long chat histories are a true testament to how big an influence they have been in my life.
I would also like to thank my improv theater group Vidushak - my home away from home. Some of my fondest memories in these past six years have been with this amazing group of people: potlucks, Christmas parties, movie-nights, practice sessions and the Diwali plays. In particular, I would like to thank: Adarsh Shekar, Vikram Ramnarayanan, Krishnakali Dasgupta, Mallika Sanyal, Satyanarayan Rao, Sanmukh Rao, Hiteshi Sharma, Bhamati Dash and Kiran Sajjanshetty for all the fun times, for believing in me and for tolerating me and my jokes.

Lastly, but most importantly, I would like to thank my family who have always believed in me and supported me. In particular I would like to thank my grandmother Mrs. Bharati Ratnam, uncle RAdm. Srinivas Ratnam, aunt Mrs. Harika Ratnam and cousin brother Siddarth Ratnam for accommodating me and taking care of me during two crucial years of my student life. I would not be here today if it were not for their sacrifices. I would also like to thank my beloved sister Shruti Ratnam, for all the fights, arguments and laughs over the years. I would finally like to thank my parents Prof. R. V. Raja Kumar and Prof. Jayashree Ratnam, for their continued support and advice when the chips were down and for all the technical and non-technical advice they have provided me over the years. The values, ideals and scientific frame of mind imbibed in me by them have played a pivotal role throughout my life, let alone during my thesis work. Whatever I try to achieve today is just a tryst to make them proud.

Contents

1 Introduction
  1.1 Brief History of Reduced Complexity MIMO Transceivers
    1.1.1 Antenna Selection
    1.1.2 Beam Selection
    1.1.3 Hybrid Beamforming
    1.1.4 Non-coherent Receivers
    1.1.5 Channel Estimation Overhead
  1.2 Motivation for the thesis
  1.3 Connections to Ultra-Wide Band Systems
  1.4 Other Works

I Hybrid Beamforming with Selection

2 Introduction and System Model
  2.1 General Assumptions and Channel model
3 Capacity Analysis based on Coupled Doubly-correlated Wishart Matrices
  3.1 Introduction
  3.2 Joint distribution of channel eigenvalues
    3.2.1 First-order approximation and second-order statistics
    3.2.2 Joint Distribution of eigenvalues
  3.3 Instantaneous hSNR Capacity Analysis
    3.3.1 hSNR System capacity
    3.3.2 Sum of correlated lognormals
  3.4 Simulation Results
  3.5 Application to beamformer design for HBwS
  3.6 Summary and Conclusions
  3.A Appendix A
  3.B Appendix B
  3.C Appendix C
  3.D Appendix D
4 Beamformer Design in a Multi-user Scenario
  4.1 Problem Formulation
    4.1.1 Connections to limited-feedback precoding
  4.2 Transforming the search space
  4.3 Lower bound on the objective function
    4.3.1 Interpretation of the Fubini-Study distance metric - $f_{\rm FS}(\hat{T})$
  4.4 Design of the RD-beamformer
  4.5 Reducing the hardware and computational complexity
    4.5.1 Reducing hardware cost of the beamforming matrix
    4.5.2 Restricting the size of the switch position set
  4.6 Channel Estimation Overhead
  4.7 Simulation Results
    4.7.1 Influence of number of input ports ($L_{\rm tx}$)
    4.7.2 Influence of restricted switch positions ($S$)
    4.7.3 Influence of number of User Equipments (UEs) ($M_1$)
    4.7.4 Influence of the bound on $P\{T\}$ - $D$
  4.8 Anisotropic Channels
  4.9 Summary and Conclusions
  4.A Appendix A
  4.B Appendix B
  4.C Appendix C
  4.D Appendix D
  4.E Appendix E
5 Diversity versus Training overhead Trade-off in switched transceivers
  5.1 General Assumptions and Channel model
  5.2 Problem Formulation
  5.3 Computing the capacities
  5.4 Simulation Results
  5.5 Summary and Conclusions
  5.A Appendix A
  5.B Appendix B
6 Conclusions and Future research directions

II Analog Channel Estimation Techniques for massive MIMO

7 QAM capable Multi-differential Frequency Shift Reference UWB Radio
  7.1 General Assumptions and System Model
  7.2 Analysis of the Integrator Outputs
    7.2.1 Signal plus noise power estimation
    7.2.2 Signal Component Analysis
    7.2.3 Noise component analysis
    7.2.4 Pilot Training
  7.3 Performance Analysis
  7.4 Optimal Bit and Power Allocation
    7.4.1 Uncoded/Fixed Code-rate Systems
    7.4.2 Adaptive Code-rate Systems
  7.5 Simulation Results
  7.6 Conclusion
  7.A Appendix A
8 Multi-Antenna Frequency Shift Reference Receivers for massive MIMO
  8.1 General Assumptions and System model
  8.2 Analysis of the demodulation outputs
    8.2.1 Signal component analysis
    8.2.2 Noise component analysis
  8.3 Performance Analysis
    8.3.1 Power Allocation
  8.4 Simulation Results
  8.5 Improved MA-FSR designs
  8.6 Conclusion
  8.A Appendix A
9 Reference Tone Aided Transmission
  9.1 General Assumptions and System Model
  9.2 Analysis of the demodulation outputs
    9.2.1 Signal component analysis
    9.2.2 Noise component analysis
  9.3 Performance Analysis
    9.3.1 Power Allocation
    9.3.2 Digital beamforming with full resolution ADCs
  9.4 A Sample Initial Access Protocol
  9.5 Simulation Results
  9.6 Conclusion
  9.A Appendix A
10 Periodic Analog Channel Estimation aided Beamforming
  10.1 General Assumptions and System model
  10.2 Analog beamformer design at the receiver
    10.2.1 Recovery of the reference tone - using a single PLL
    10.2.2 Phase and amplitude offset estimation
    10.2.3 Recovery of the reference tone - using weighted carrier arraying
  10.3 Data transmission
  10.4 Initial access and aCSI estimation at the BS
  10.5 Simulation Results
  10.6 Conclusions
11 Conclusions and Future research directions

Acronyms
Index
Bibliography

Abstract

Spectral Efficiency, measured in bits/s/Hz, was the main performance criterion for design of wireless communication systems till about the last decade. However, with the advent of the 'internet of things' as well as with the increase in system bandwidth and transceiver complexities, cost and energy efficiency have also become important design metrics today. Thus, in the interest of making systems practically viable, a significant amount of focus has been laid on the design of 'reduced complexity' transceivers that can reap the benefits of state-of-the-art communication technologies, like massive Multiple Input Multiple Output (MIMO) and Ultra-Wide Band (UWB) communication, while being cost and energy efficient.
A common feature of these reduced complexity transceivers, e.g., antenna selection, hybrid beamforming, selective Rake receivers and transmit reference receivers, is reduction of the signal dimension via analog pre-processing/beamforming, thus facilitating the use of a lower dimensional digital processor. Such 'hybrid' signal processing techniques usually require some knowledge of the wireless propagation channel which is conventionally obtained using digital channel estimation (CE) techniques. However, due to the lack of full signal digitization in these transceivers, a significant fraction of transmission resources may have to be expended to obtain this channel knowledge. Thus, conventional CE techniques contribute to a large CE overhead.

In the first part of this thesis, a novel class of such reduced complexity transceivers, namely Hybrid Beamforming with Selection (HBwS), is presented, which utilizes switches to aid in the analog beamforming. The use of switches enables the analog beamforming to adapt to the instantaneous channel variations, providing better user separability, beamforming gain, and/or simpler hardware than some conventional reduced complexity transceivers. In this part, the capacity of a system with a HBwS transmitter is analyzed and good designs for the analog beamforming stage are proposed. The trade-off between the system capacity and the CE overhead is also characterized and incorporated into the proposed transceiver designs.

In the second part, it shall be shown that in sparse multi-path channels, estimation of the amplitude and phase of a single transmitted sinusoidal tone encompasses 'sufficient channel knowledge' for enabling hybrid signal processing at the receiver (RX). Since amplitude and phase estimation of a single tone is significantly simpler than conventional CE, it can be performed in the analog domain. Thus, by avoiding digital CE, the estimation overhead of reduced complexity receivers can be lowered significantly.
To illustrate this, three novel receiver architectures that can perform analog CE shall be presented. These new receivers admit both coherent and non-coherent implementations and are in fact inspired by a class of non-coherent receivers for UWB systems. The part concludes with a discussion about the advantages and limitations of such analog CE techniques and some approaches to resolve them.

Chapter 1

Introduction

With progress in digital technologies and the age of the internet, data communication has pervaded many aspects of life today. From high speed video streaming, to self-driving cars, to smart home devices, to sensors in factories, the sources, their requirements and applications are infinite. Since mobility is a desired feature for many of these applications, this enormous 'customer' base has to be supported by wireless links, at least at the last mile. With this rising amount of data traffic and the limited availability of wireless spectrum at micro-wave frequencies, it is expected that there will be a spectrum crunch in the near future. Throughput improvements via traditional approaches such as cell densification, better spectrum management and better modulation and coding, are nearing saturation and therefore there is a need for new ways to meet the rising demands. Another feature of this customer base is the variability in their requirements. For example, communications systems for bio-medical sensors may require ultra-low power techniques, sensors in factories may require ultra-low latencies, while autonomous vehicles (terrestrial or otherwise) may require resilience to high Doppler spreads. Resolving the spectrum crunch while also meeting the varied demands is a key requirement for the 5th generation (5G) of cellular technologies [1]. Of the numerous proposals within the 5G umbrella, one communication technique that has received significant limelight is MIMO.
MIMO systems [2–4], wherein both the transmitter and receiver have multiple antenna elements, promise large gains in spectral efficiency by offering spatial degrees of freedom for data transmission. In the diversity mode of operation, a MIMO system can offer a diversity gain [footnote 1] of up to $M_{\rm tx} M_2$, where $M_{\rm tx}$ is the total number of antennas at the Base Station (BS) and $M_2$ is the number of antennas at a UE, thus offering very good link margin and strong resilience to channel fading. In spatial multiplexing mode, a MIMO BS can transmit multiple spatial streams to one or more UEs on the same time-frequency resources. The number of such parallel spatial streams (multiplexing gain) is upper bounded by $\min\{M_{\rm tx}, M_{\rm rx}\}$, where $M_{\rm rx}$ is the aggregate number of antennas at all the UEs, thus promising a huge increase in throughput. Intuitively, such a system can create sharp beams in desired directions thereby directing transmit power more effectively and permitting simultaneous transmission of multiple data streams on the same spectral resources (see Fig. 1.1).

Footnote 1: Diversity gain is the negative slope of the curve between bit error rate and the signal-to-noise ratio in logarithmic scale: $-\log(P_{\rm BER})/\log({\rm SNR})$.

Figure 1.1: A simplified illustration of multi-user MIMO transmission.

In fact, with the progress in digital and analog hardware technologies, the development of low complexity precoding algorithms and also the possibility of using the millimeter wave (mm-wave) frequency band for cellular transmission [footnote 2], cellular networks are moving towards the massive MIMO regime, i.e., BSs with dozens or hundreds, and UEs with tens of antenna elements. First proposed in [5], massive MIMO systems offer several additional advantages beyond throughput increase, such as simplified precoding, higher beamforming gains and improved energy efficiency, to name a few [6].
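The diversity and multiplexing limits quoted above, together with the slope definition of diversity gain in footnote 1, can be made concrete with a short numerical sketch. This is only an illustration: the function names, the antenna counts and the two BER readings are hypothetical and are not taken from the thesis.

```python
import math


def max_diversity_gain(m_tx: int, m_2: int) -> int:
    """Upper limit on the diversity gain for one UE: M_tx * M_2."""
    return m_tx * m_2


def max_multiplexing_gain(m_tx: int, m_rx: int) -> int:
    """Upper limit on the number of parallel spatial streams: min{M_tx, M_rx}."""
    return min(m_tx, m_rx)


def empirical_diversity_gain(snr_db_a: float, ber_a: float,
                             snr_db_b: float, ber_b: float) -> float:
    """Negative slope of log10(BER) versus log10(SNR) between two points."""
    log_snr_a = snr_db_a / 10.0  # log10 of the linear SNR
    log_snr_b = snr_db_b / 10.0
    return -(math.log10(ber_b) - math.log10(ber_a)) / (log_snr_b - log_snr_a)


# Hypothetical sizes: 64 BS antennas, 4 UEs with 2 antennas each.
m_tx, m_1, m_2 = 64, 4, 2
print(max_diversity_gain(m_tx, m_2))           # 128
print(max_multiplexing_gain(m_tx, m_1 * m_2))  # 8

# Hypothetical BER readings: 1e-2 at 10 dB and 1e-4 at 20 dB.
print(round(empirical_diversity_gain(10.0, 1e-2, 20.0, 1e-4), 6))  # 2.0
```

In the last line the BER drops by two decades for one decade of SNR, so the measured slope corresponds to a diversity gain of 2, well below the upper limit of 128 for the assumed array sizes.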
Since its invention, massive MIMO has received significant interest and, in fact, it is this form of MIMO that is considered one of the key enablers for 5G [7]. However, full complexity massive MIMO systems, with a dedicated up/down-conversion chain (RF chain) for each antenna, are hard to implement in practice. This is because, while the antenna elements are themselves cost and power efficient, the RF chains, which include components like analogue-to-digital converters, digital-to-analogue converters, mixers, filters etc., are power hungry and expensive [8]. As a solution, reduced complexity MIMO transceivers have been proposed, where the large antenna array is connected to a smaller number of RF chains via analog hardware [footnote 3]. These transceivers are based on the premise that, without forsaking performance significantly, the data precoding/combining in MIMO channels can usually be broken down into a large dimensional, albeit simple, front-end processing followed by a possibly more involved, albeit reduced dimensional, back-end processing. The front-end processing is implemented in the analog domain using analog components such as switches or phase shifters, while the more involved back-end processing is performed in the digital domain via fewer RF chains. The analog hardware is cost effective and consumes less power than the RF chains, thereby making implementation of massive MIMO practically viable. Reduced complexity MIMO transceivers predate massive MIMO, and a brief literature review on them is presented in the next section.

Footnote 2: mm-wave band refers to the frequencies at or above 20 GHz, which offer at least 10 times more spectrum than micro-wave frequencies, albeit with a very strong channel attenuation. mm-wave is also a strong contender within the 5G umbrella.

Footnote 3: Another reduced complexity approach uses dedicated RF chains at each antenna with low bit resolution ADCs [9]. This approach is not considered within this thesis for brevity.
1.1 Brief History of Reduced Complexity MIMO Transceivers

For ease of exposition, the discussion here shall be restricted to the downlink scenario, where a reduced complexity BS with $M_{\rm tx}$ antennas and $K_{\rm tx}$ RF chains transmits to $M_1$ full-complexity UEs with $M_2$ antennas each ($M_{\rm rx} \triangleq M_1 M_2$). Therefore the phrases BS and transmitter (TX) or UE and RX shall be used interchangeably. Furthermore, the instantaneous wireless channel realizations shall be referred to as instantaneous Channel State Information (iCSI), while the average channel statistics, such as the transmit/receive spatial correlation matrices, shall be referred to as second order channel statistics (aCSI). The iCSI shall be assumed to stay constant only within a coherence time-bandwidth block, and aCSI shall be assumed to remain constant over a wide time and frequency range. We shall consider both the cases where the system bandwidth is smaller than the channel coherence bandwidth, known as the frequency-flat fading scenario, and where it is larger than the coherence bandwidth, known as the frequency-selective fading scenario.

Figure 1.2: An illustration of the different reduced complexity transceivers at a BS with one up-conversion chain: (a) Antenna selection, (b) Beam selection, (c) HBiCSI, (d) HBaCSI.

1.1.1 Antenna Selection

One of the first reduced complexity multi-antenna transceivers proposed was antenna selection, where a smaller number of RF chains ($K_{\rm tx}$) are connected to the $M_{\rm tx}$ transmit antennas via a bank of analog switches, as illustrated in Fig. 1.2a. For each channel realization, the analog switches connect a carefully chosen subset of $K_{\rm tx}$ BS antennas to the RF chains. Knowledge of the $M_{\rm tx} M_2$ instantaneous channel realizations, also known as iCSI, for all the $M_1$ UEs is usually required to pick this subset of antennas [footnote 4]. In one implementation involving a single UE, this subset is usually chosen by the UE and the corresponding antenna indices are fed back to the BS via a feedback channel.
In another implementation involving multiple UEs, the subset of antennas is usually chosen at the BS using iCSI (see Section 1.1.5 for details on iCSI acquisition at BS/UE). In spatial multiplexing mode of operation, antenna selection can support up to $\min\{K_{\rm tx}, M_{\rm rx}\}$ spatial data streams (spatial multiplexing gain), while in diversity mode of operation, antenna selection can provide the full $M_{\rm tx} M_2$ channel diversity gain to a single UE.

The performance of antenna selection is often hard to characterize in closed form and most available results are for special cases involving a frequency-flat fading channel with independent and identically distributed (i.i.d.) fading components. For the frequency-flat i.i.d. fading case, the signal to noise ratio (SNR) distribution for $M_{\rm rx} = 1$ was studied in [13, 14]. The uncoded Bit Error Rate (BER) for $M_1 = 1$ and diversity mode of operation was studied in [15]. The capacity in the high and low SNR regimes was characterized for $M_1 = 1$ in [16] and bounds on the capacity were derived in [17, 18]. In contrast to the above, asymptotic results in the large antenna regime ($M_{\rm tx} \gg 1$) were studied for $M_{\rm rx} = 1$ in [19], for $M_1 \gg 1, M_2 = 1$ in [20] and for $M_1 = 1, M_2 \gg 1$ in [21–23]. However, there are very few results available on the performance analysis in correlated channels. For frequency-flat correlated channels, SNR for single antenna selection was studied in [24], capacity bounds for the diversity mode of operation were derived in [25], and the spatial multiplexing mode was considered in [26]. The difficulty of such a capacity analysis is explained in more detail in Chapter 3.

Another challenge for antenna selection is the design of near-optimal antenna subset selection algorithms.
The capacity maximizing antenna subset selection for the frequency-flat Multiple Input Single Output (MISO) scenario is well known, and is obtained by picking the $K_{\rm tx}$ antenna elements with the instantaneously highest signal strengths, also known as Generalized Selection Transmission [14, 27]. However, finding the capacity maximizing antenna subset for $M_{\rm rx} > 1$ is a non-convex problem, even in the single UE scenario. Brute force search techniques are infeasible since they involve a search over $\binom{M_{\rm tx}}{K_{\rm tx}}$ subsets, leading to a high computational burden. A plethora of sub-optimal, fast subset selection algorithms have therefore been proposed based on greedy algorithms [17, 28–31] and convex optimization [32–34]. Some antenna selection algorithms where the switching happens more slowly, based only on aCSI, were also explored [10–12]. A detailed review on antenna selection is available in [27, 35].

While iCSI based antenna selection offers a large diversity gain in frequency-flat fading, this gain reduces in frequency-selective fading channels. This is because, in the limit of a large transmission bandwidth, frequency diversity across different coherence bands makes all the antenna subsets equivalent. Even with frequency-flat fading, the performance of antenna selection degrades quickly in the presence of channel correlation.

Footnote 4: A few exceptions exist where the selection happens, for example, based on aCSI [10–12].

1.1.2 Beam Selection

Consequently, to improve performance in correlated channels, beam selection has been proposed, where an $M_{\rm tx} \times M_{\rm tx}$ analog beamforming matrix $T$, implemented by analog hardware such as phase-shifters, sits between the antenna ports and the analog switches at the BS, as illustrated in Fig. 1.2b. The switch bank selects the best $K_{\rm tx}$ out of the $M_{\rm tx}$ beamformer ports for each channel realization.
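The MISO selection rule and the size of the brute-force search described for antenna selection in Section 1.1.1 can be illustrated with a minimal sketch. The channel values, the SNR and the capacity expression (unit noise power with matched-filter precoding over the selected antennas) are assumptions made for illustration only, and the function names are hypothetical.

```python
import itertools
import math


def select_antennas_miso(h, k_tx):
    """Indices of the k_tx antennas with the largest |h_i| (frequency-flat MISO)."""
    order = sorted(range(len(h)), key=lambda i: abs(h[i]), reverse=True)
    return sorted(order[:k_tx])


def miso_capacity(h, subset, snr):
    """Assumed capacity with matched-filter precoding over the chosen subset."""
    gain = sum(abs(h[i]) ** 2 for i in subset)
    return math.log2(1.0 + snr * gain)


def brute_force_select(h, k_tx, snr):
    """Exhaustive search over all subsets of size k_tx."""
    best = max(itertools.combinations(range(len(h)), k_tx),
               key=lambda s: miso_capacity(h, s, snr))
    return list(best)


# Hypothetical channel of a single-antenna UE seen from 6 BS antennas.
h = [0.3 + 0.1j, -1.2 + 0.4j, 0.05 - 0.2j, 0.7 + 0.9j, -0.1 - 0.1j, 0.5 + 0.0j]
k_tx, snr = 2, 10.0

fast = select_antennas_miso(h, k_tx)
slow = brute_force_select(h, k_tx, snr)
print(fast, slow)                # both are [1, 3]
print(math.comb(len(h), k_tx))   # 15 subsets for this toy example
print(math.comb(64, 8))          # search size for 64 antennas and 8 RF chains
```

For this MISO example the sorted selection and the exhaustive search agree, while the last line shows how quickly the number of candidate subsets grows with the array size, which is why exhaustive search is avoided for larger arrays.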
As in the case of antenna selection, this selection can be performed by the RX and fed back to the TX in a single UE scenario, or can be directly chosen at the TX using iCSI in the multi-UE scenario [footnote 5]. Beam selection essentially changes the basis for the switching from antenna domain to beam domain, along which the channels are less correlated and provide much larger beamforming gains.

Beam selection was first proposed in [36, 37] where a fixed Discrete Fourier Transform (DFT) beamformer was proposed. The optimality of this design in the large antenna limit for uniformly spaced arrays was studied in [38], and its performance in comparison to antenna selection was studied in [39, 40]. Subsequently, an alternate beamformer design that adapts to aCSI was proposed in [41], where the beamformer design is the right singular vector matrix of the transmit spatial covariance matrix. While this design requires adaptive analog hardware, it yields better performance than the DFT based designs. To reduce the hardware cost, the use of unit magnitude discrete phase-shifters to build $T$ was considered in [41, 42]. More recently, lens based implementations of the DFT beamformer have also been explored [43, 44], to lower the hardware cost.

Since finding the capacity maximizing subset of $K_{\rm tx}$ input ports is a non-convex problem, several fast selection algorithms have been proposed for beam selection. Maximum magnitude selection was proposed in [43], where the $K_{\rm tx}$ ports with the largest signal magnitude are selected for each channel realization. To better handle multi-user interference, signal-to-interference-plus-noise ratio based selection was considered in [45], greedy beam selection algorithms for sum rate maximization are considered in [46, 47] and an interference aware two level beam selection was considered in [48]. As is the case with antenna selection, the diversity gain of iCSI based beam selection diminishes in frequency selective fading channels.
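Beam selection with a fixed DFT beamformer and maximum magnitude selection, as described above, can be sketched for a single-antenna UE as follows. The unitary DFT scaling, the row-vector channel convention and the single plane-wave channel are assumptions chosen to keep the example small; the function names are hypothetical.

```python
import cmath
import math


def dft_beamformer(m_tx):
    """Unitary M_tx x M_tx DFT matrix; column b is one fixed analog beam."""
    scale = 1.0 / math.sqrt(m_tx)
    return [[scale * cmath.exp(-2j * math.pi * a * b / m_tx) for b in range(m_tx)]
            for a in range(m_tx)]


def beam_domain_channel(h, t):
    """Effective channel at each beamformer input port: g_b = sum_a h_a * T[a][b]."""
    m_tx = len(h)
    return [sum(h[a] * t[a][b] for a in range(m_tx)) for b in range(m_tx)]


def max_magnitude_beams(h, t, k_tx):
    """Indices of the k_tx beam ports with the largest |g_b|."""
    g = beam_domain_channel(h, t)
    order = sorted(range(len(g)), key=lambda b: abs(g[b]), reverse=True)
    return sorted(order[:k_tx])


# Hypothetical single-path channel whose direction lines up with DFT beam 3.
m_tx = 8
h = [cmath.exp(2j * math.pi * a * 3 / m_tx) for a in range(m_tx)]
t = dft_beamformer(m_tx)

g = beam_domain_channel(h, t)
print(max_magnitude_beams(h, t, 1))    # [3]
print(round(abs(g[3]) ** 2, 6))        # 8.0, the gain of the selected beam
print(max(abs(x) ** 2 for x in h))     # 1.0, the best single-antenna gain
```

In this directional example every antenna sees the same channel magnitude, so antenna selection gains nothing, whereas the selected beam collects the energy of all eight antennas. This is one way to see why switching in the beam domain is attractive in correlated channels.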
Despite providing diversity and/or beamforming gains, antenna and beam selection in their original form might not be suitable for massive MIMO systems. On one hand, antenna selection performs poorly in the highly directional channels encountered with massive MIMO. On the other hand, beam selection generally suffers from a large hardware cost owing to the large dimension of the beamformer $T$ for $M_{\rm tx} \gg 1$ [footnote 6]. Additionally, in most of the prior works the beamforming matrix offers orthogonal beam choices which may be inferior, especially in spatially sparse channels [50]. Finally, the full iCSI requirement poses a large CE overhead (see Section 1.1.5).

Footnote 5: In an alternate design the selection may be performed based on aCSI.

Footnote 6: Some works, however, have explored use of switches between the antennas and the analog beamforming matrix [49] to reduce this hardware cost.

1.1.3 Hybrid Beamforming

This led to the proposition of another class of reduced complexity transceivers that is more tailored for massive MIMO, called hybrid beamforming [51–53]. In hybrid beamforming, the switch bank is removed and the analog beamforming matrix $T$ directly connects to the $K_{\rm tx}$ RF chains on one side and the $M_{\rm tx}$ antennas on the other, thus having a dimension of $M_{\rm tx} \times K_{\rm tx}$. Since usually $K_{\rm tx} \ll M_{\rm tx}$, this architecture has much lower hardware cost than beam selection. This concept was first introduced in the mid-2000s in [41, 54, 55], motivated by the fact that the number of RF chains is only lower-limited by the number of data streams that are to be transmitted; in contrast, the beamforming gain and diversity gain are given by the number of antenna elements if suitable beamforming is done. The interest in hybrid beamforming has exploded over the past few years [56–71] thanks to the recent contributions by Heath and co-workers.
This was motivated by the fact that since the wireless channels for massive MIMO (especially at mm-wave frequencies) are inherently directional, analog hardware can be utilized to focus power into the dominant channel directions with minimal loss in system performance. While numerous architectures have been proposed, they can be broadly classified based on the channel state information utilized in designing/updating the beamforming matrix $\mathbf{T}$.

In one architecture called Hybrid Beamforming based on iCSI (HBiCSI) [54-63], the beamforming matrix, and therefore the analog precoding beams, adapt to the iCSI, as illustrated in Fig. 1.2c for a single UE case. In the case of a single UE with frequency-flat channel fading, the capacity maximizing beamforming matrix is composed of the $K_{tx}$ right singular vectors of the channel matrix, corresponding to the $K_{tx}$ largest singular values [55]. However, this analog beamformer design requires the use of analog components that can provide arbitrary amplitude and phase control, which imposes a large implementation cost. Therefore subsequent works have explored beamformer designs with unit gain and/or discrete phase-shift components. While the capacity maximizing design of the beamformer with such components is a non-convex optimization problem, several techniques have been proposed for the design of near optimal beamformers based on sparse recovery techniques [57], alternating optimization [58-60] or de-coupling of analog and digital precoding [61-63]. To reduce hardware cost, partially connected architectures, where each RF chain is only connected to a subset of the antennas, have also been proposed [58, 64]. While HBiCSI promises good performance (footnote 7), full iCSI corresponding to all the $M_1$ UEs may be required, leading to a large CE overhead, similar to antenna selection and beam selection. Additionally, HBiCSI imposes strict performance specification requirements on the analog hardware.
This is because the parameters of the analog hardware may have to be updated several times within each coherence time interval, which can be very short, especially at mm-wave frequencies.

In the other architecture called Hybrid Beamforming based on aCSI (HBaCSI), the beamforming matrix adapts to the slowly-varying aCSI, such as the transmit/receive spatial correlation matrices, as illustrated in Fig. 1.2d. Since aCSI changes slowly, it can be acquired with a low overhead and the analog hardware parameters need to be updated infrequently. Additionally, iCSI is only needed in the channel sub-space spanned by the analog precoding beams, leading to a significant reduction in the CE overhead in comparison to HBiCSI. Hybrid beamformers using aCSI for the analog part were first suggested in [41] for the single UE scenario, where it was shown that the capacity maximizing beamforming matrix is composed of the $K_{tx}$ right singular vectors of the TX spatial correlation matrix, corresponding to the $K_{tx}$ largest singular values. For multiple UEs, [64] proposed `Joint Space Division Multiplexing', which groups the UEs with similar spatial correlation matrices together and utilizes aCSI to beamform to each group while also suppressing inter group interference. Several alternate designs or extensions have also been considered with different architectures for the analog beamformer [65-71]. While it enjoys a lower CE overhead and hardware cost, the performance (excluding CE overhead) of HBaCSI may be worse than HBiCSI since the analog beams can only span a fixed channel subspace of dimension $K_{tx}$, which does not adapt to iCSI. This loss however may be small if either the system has a wide bandwidth with frequency selective fading, or the dimension of the dominant channel subspace at the BS (footnote 8), or equivalently the rank of the BS spatial correlation matrix, is $\le K_{tx}$ [64].

Footnote 7: Here system performance refers to the capacity excluding the CE overhead.
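The aCSI based design described above (analog beams given by the dominant singular vectors of the TX spatial correlation matrix) can be sketched numerically as follows. This is only an illustration under stated assumptions: the exponential correlation model and the helper name `acsi_beamformer` are hypothetical, and for a Hermitian positive semi-definite matrix the singular vectors coincide with the eigenvectors, which is why `numpy.linalg.eigh` is used.

```python
import numpy as np

def acsi_beamformer(R_tx, k_tx):
    """Return the k_tx dominant eigenvectors of a Hermitian PSD matrix R_tx."""
    eigvals, eigvecs = np.linalg.eigh(R_tx)        # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:k_tx]       # indices of the k_tx largest
    return eigvecs[:, order], eigvals[order]

# Toy exponential correlation model (assumption, not from the thesis).
m_tx, k_tx = 8, 2
idx = np.arange(m_tx)
R_tx = 0.9 ** np.abs(idx[:, None] - idx[None, :])

T, top_eigvals = acsi_beamformer(R_tx, k_tx)
captured = top_eigvals.sum() / np.trace(R_tx)      # share of average channel power in the beams
```

The quantity `captured` indicates how much of the average channel power lies in the fixed subspace; when it is close to one, the rank condition mentioned above is approximately met.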
1.1.4 Non-coherent Receivers

Another approach to reduce the system complexity of massive MIMO transceivers is by the use of non-coherent reception. However, unlike the approaches in the previous subsections, this approach can only reduce the hardware complexity of the RX and not the TX. In a non-coherent RX, only the amplitude information of the received signal is exploited for data demodulation and not its phase. Thus, recovery of the carrier at the RX is not required and the RX is resilient to the TX carrier phase noise. Some non-coherent schemes also do not require iCSI for symbol detection, thus significantly reducing CE overhead. Non-coherent RXs can be broadly classified into energy detection based techniques and auto-correlation based techniques. In energy detection, the TX transmits an amplitude modulated data stream and the RX performs energy detection at each antenna and combines these energies for symbol detection [72, 73]. Such an RX does not require iCSI and only requires one down-conversion chain for demodulation, thus reducing implementation cost. In one class of auto-correlation based techniques, the TX transmits differentially encoded symbols and the RX performs multi-antenna, multi-symbol differential detection [74-76]. However, these differential schemes usually require a full complexity receiver at each RX antenna, and therefore may require a large implementation cost. Another class of auto-correlation RXs with low hardware cost shall be explored later in this thesis in Chapter 8. Since these non-coherent techniques are mainly for complexity reduction at the RX, they shall not be the focus of discussion in this thesis (except in Chapters 7-8).

1.1.5 Channel Estimation Overhead

Note that for all the reduced complexity schemes discussed above, some form of Channel State Information (CSI), either iCSI or aCSI, is required to perform the analog and digital processing.
Full iCSI (which is required for antenna selection, beam selection and HBiCSI) can be obtained at the BS either via uplink training in Time Division Duplexing (TDD) mode, or via downlink training followed by iCSI feedback in Frequency Division Duplexing (FDD) mode of operation. The aCSI can be obtained in a similar way to iCSI, with the added advantage that in some scenarios it may have a very large coherence time and coherence bandwidth. The overhead for such CSI acquisition can be rather large when reduced complexity architectures are present at both the BS and the UEs, which is a scenario of interest for mm-wave systems. Unlike full complexity massive MIMO systems, the presence of fewer RF chains at the BS and UE necessitates transmission of multiple temporal pilots for performing such CE, even when the coherence bandwidth may be infinitely large [77-80]. This is because the limited number of RF chains limits the dimension of the channel that can be observed at the BS or UE at any one time, as illustrated in Fig. 1.3. As an illustration, even if the CSI coherence bandwidth can accommodate orthogonal frequency pilots for all $M_{rx}$ UE antennas in TDD or $M_{tx}$ antennas in FDD, full or exhaustive CE approaches [78] require $M_{tx} M_{rx} / (K_{tx} K_{rx})$ temporal pilot transmissions within each CSI coherence time. The overhead is larger for FDD since the channel estimates have to be further fed back to the TX via a feedback channel.

Footnote 8: The dominant channel subspace at the BS is the subspace along which a significant portion of the channel power is concentrated. Such a notion is quite common for massive MIMO and is used in several proposed schemes like Joint Space Division Multiplexing [64].
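To give a feel for the size of the exhaustive CE overhead quoted above, the short Python sketch below evaluates the pilot count for one example configuration. The function name `exhaustive_ce_pilots` and the example array sizes are hypothetical, and rounding up with `ceil` when the antenna counts are not multiples of the RF chain counts is an assumption that is not stated in the text.

```python
import math

def exhaustive_ce_pilots(m_tx, m_rx, k_tx, k_rx):
    """Temporal pilots needed to sound every TX/RX antenna pair.

    Each pilot observes at most k_tx x k_rx entries of the m_tx x m_rx channel.
    The ceil() handles non-divisible sizes (assumption, not from the text).
    """
    return math.ceil(m_tx / k_tx) * math.ceil(m_rx / k_rx)

# Example sizes (assumption): 64 BS antennas, 16 UE antennas, 4 and 2 RF chains.
pilots = exhaustive_ce_pilots(64, 16, 4, 2)   # 16 * 8 = 128 temporal pilots
single_chain = exhaustive_ce_pilots(8, 4, 1, 1)  # 8 * 4 = 32, the setting of Fig. 1.3
```

When the antenna counts are multiples of the RF chain counts, the result reduces to the ratio given in the text.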
Such a large pilot overhead may consume a significant portion of the time-frequency resources when the CSI coherence time is short, such as in vehicle-to-vehicle channels, in systems using narrow TX/RX beams, e.g., massive MIMO, or in channels with large carrier frequencies and high blocking probabilities, e.g., at mm-wave frequencies [81]. The overhead also increases system latency and makes the Initial Access (IA) (footnote 9) procedure very cumbersome [82-84].

Figure 1.3: An illustration of exhaustive CE with $K_{tx} = K_{rx} = 1$.

Consequently, several approaches have been proposed in the literature to reduce the CE overhead. The first such scheme was Limited Feedback Precoding (LFP), which was mainly designed to reduce the channel feedback overhead in FDD systems. In a single UE LFP ($M_1 = 1$), the BS and UE share an a priori codebook of precoding matrices, and the UE uses iCSI to pick the best precoding matrix from the codebook and feeds back only the codebook index to the BS. Several different designs for the codebooks based on Grassmannian sub-space packing [85-87], rotation and scaling [88] or random vector quantization [89, 90] have been proposed. Note that antenna selection and beam selection can be interpreted as a type of LFP in the single UE scenario. In a multi-UE scenario, LFP either involves the feedback of quantized CSI from the UEs [91] or uses opportunistic beamforming at the BS with SNR feedback from the UEs [92, 93]. A detailed overview of different LFP techniques is available in [94]. However, such LFP schemes are only beneficial for FDD systems and only reduce the feedback overhead but not the pilot overhead.

Several fast CE approaches have therefore been suggested to reduce the pilot overhead, which are discussed below assuming $M_{tx} = K_{tx} = K_{rx} = 1$ for convenience (footnote 10). Side information aided frequency-flat CE approaches utilize reciprocity of aCSI or temporal correlation to reduce the iCSI pilot overhead [79, 95-97].
Compressed sensing based CE approaches [77, 98-100] exploit the sparse nature of the massive MIMO channels to reduce the temporal pilots to $O[s \log(M_{rx}/s)]$, where $s$ is the channel sparsity level. Iterative angular domain CE uses progressively narrower search beams at the RX to reduce the required temporal pilot transmissions to $O(\log M_{rx})$ [84, 101, 102]. Approaches that utilize side information to improve iterative angular domain CE [103, 104] or perform angle domain tracking [105, 106] have also been considered. Sparse ruler based approaches exploit the possible Toeplitz structure of the spatial correlation matrix to reduce temporal pilots to $O(\sqrt{M_{rx}})$ [80, 107-110].

Footnote 9: Initial access refers to the phase wherein a UE and BS discover each other, synchronize, and coordinate to initiate communication.
Footnote 10: For $M_{tx} > 1$, the pilot overhead increases further, either multiplicatively or additively, by a function of $M_{tx}$, determined by the CE algorithm used at the TX.

Table 1.1: Comparison of coherent reduced complexity transceivers for massive MIMO

Scheme              Performance   Hardware Cost   Beamformer update   CE Overhead
Antenna Selection   Poor          Low             NA                  High
Beam Selection      Good          High            Slow                High
HBiCSI              Good          Low             Fast                High
HBaCSI              Moderate      Low             Slow                Moderate

1.2 Motivation for the thesis

To summarize the previous section (see also Table 1.1), antenna selection performs poorly while beam selection suffers from a large implementation cost in massive MIMO systems. While HBiCSI offers good performance, it may impose strict design constraints on the analog hardware, which may need to be updated frequently. Additionally, all these three schemes suffer from a large CE overhead. HBaCSI reduces CE overhead and allows analog hardware to update only based on average channel statistics, thereby lowering hardware cost. However, its performance may be inferior to HBiCSI since the analog beams can only span a fixed channel subspace of dimension $K_{tx}$, which does not adapt to iCSI.
The performance gap may be especially large in frequency-flat fading systems if the rank of the TX spatial correlation matrix, or equivalently the dimension of the dominant channel subspace, is much larger than $K_{tx}$, which is possible both at microwave [111] and mm-wave [81] frequencies. This problem is further aggravated in multi-UE systems since the BS RF chains have to be allocated to different UE groups, leading to only a small number of RF chains per UE/UE group. While there has been a significant amount of work on HBiCSI and HBaCSI under different system models and constraints, there is limited work on bridging the performance gap between the two. This raises the interesting question of whether it is possible to adapt the precoding beams to iCSI while still maintaining a low CE overhead and allowing the analog beamforming matrix to update infrequently based on aCSI.

A clue in this regard is provided by the beam selection scheme discussed in the previous section. In beam selection, the beamforming matrix provides several static or aCSI based options for analog precoding beams, and the switching ensures that the effective precoding beam adapts to the current channel realization. While this would still require switches to update to iCSI, unlike phase shifters which are used to build the analog beamformer, analog switches are cheap, have low insertion loss and can be easily designed to switch quickly based on iCSI, thereby reducing the hardware constraints [31, 41, 112-115]. However, note that beam selection in its conventional form suffers from a very large CE overhead and a large hardware cost. As a solution, in this thesis a generalization of beam selection, namely, Hybrid Beamforming with Selection (HBwS) is proposed, as illustrated in Fig. 1.4. In HBwS the beamforming matrix $\mathbf{T}$ has a dimension of $M_{tx} \times L_{tx}$, where $K_{tx} \le L_{tx} \le M_{tx}$, thus reducing the dimension of the analog beamforming matrix and therefore its cost in comparison to beam selection.
The switch bank connects the best $K_{tx}$ out-of-the $L_{tx}$ input ports of $\mathbf{T}$ to the RF chains based on iCSI. The precoding beam options provided by the beamformer may be non-orthogonal, and are carefully designed to reduce the overhead for obtaining this iCSI at the BS. In Part I of this thesis, HBwS is studied in detail, as a solution to achieve performance comparable to HBiCSI in frequency-flat fading systems, while still retaining some benefits of HBaCSI, i.e., infrequent update of analog hardware parameters, low hardware cost and low CE overhead.

Figure 1.4: An illustration of hybrid beamforming with selection at the BS

Another observation from Section 1.1 is that all forms of reduced complexity transceivers require some form of CSI at the BS and UEs, either iCSI or aCSI, and to reduce the overhead for such CSI acquisition an array of fast CE algorithms have been proposed. However, these fast algorithms are only partially successful in reducing the pilot overhead, especially when reduced complexity architectures are present both at the BS and the UEs. This is exemplified by the fact that the number of pilot re-transmissions required per coherence time still remains a function of $M_{rx}$ when $M_{tx} = 1$ in these algorithms. Furthermore, some of these fast CE approaches may not be applicable for the IA phase since they would require the timing and frequency synchronization [116, 117] to be performed without the BS/UE beamforming gain, which may be difficult at the low signal-to-noise ratio (SNR) and high phase noise levels expected in mm-wave systems. Some of these CE approaches also require the channel to be static during the re-transmissions and are only applicable for certain antenna configurations and/or channel models. Finally, since the CE approaches iterate over several analog beams, the pilots may have to be spaced sufficiently far apart [118] to reduce the impact of the transient effects of analog hardware on CE [119], thus potentially increasing the latency.
The main reason for the pilot overhead is that conventional CE approaches require processing in the digital domain, while the BS/UE only have a few up/down-conversion chains. In Part II of the thesis, a novel class of reduced complexity RXs is explored that can obtain the required CSI for beamforming via CE in the analog domain. These designs are based on the premise that in sparse multi-path channels, acquiring the amplitude and phase of a sinusoidal reference tone (that is transmitted along with the data) at each RX antenna is sufficient CSI to perform receive beamforming. Estimation of the amplitude and phase of a reference sinusoidal tone is significantly simpler than conventional CE and can be performed at each RX antenna by analog hardware. Such amplitude and phase estimation shall henceforth be referred to as analog channel estimation (ACE). As the CE is done in the analog domain, $O(1)$ pilots are sufficient to update the RX beamformer, thereby alleviating the CE overhead at the RX. In essence, while conventional reduced complexity receivers only push part of the beamforming/combining into the analog domain, this class additionally pushes the CE required to perform such beamforming/combining into the analog domain. Though this class of receivers is mainly suggested for use at the UEs, it shall be shown that their use at the UEs helps reduce the CE overhead at both the BS and UE.

1.3 Connections to Ultra-Wide Band Systems

Another popular communication protocol, with application areas ranging from short range communication and sensor networks to radar systems, is spread spectrum Ultra-Wide Band (UWB) communication. In spread spectrum UWB, the data signal is spread over a much wider bandwidth prior to transmission [120]. This spreading provides resilience to narrow band interference and small-scale fading.
Furthermore, the large bandwidth also allows resolution of a large number of channel multi-path components, thus providing good localization capabilities. Several approaches for such spectrum spreading have been studied. In time hopping impulse radio, each data signal is modulated by a sequence of UWB pulses, where pulses undergo pseudo-random temporal hops in each frame [121, 122]. In frequency hopping schemes, the data signal is modulated by a carrier signal, where the carrier `hops' to different frequencies as a function of time [120, 123]. Another spreading approach, which was popular for 3rd generation cellular technologies (3G), is direct sequence spread spectrum, where each data signal is modulated by a pseudo-random code occupying a much wider bandwidth and enjoying good auto-correlation and cross-correlation properties. Such spread spectrum UWB systems are mathematically analogous to frequency-flat fading Single Input Multiple Output (SIMO) systems. Just as multiple antennas at an RX can allow resolution of various channel multi-path components (MPCs) in the spatial domain, a UWB system allows their resolution in the delay domain due to its wide signal bandwidth [124, 125]. An All-Rake RX, illustrated in Fig. 1.5a, can be used to accumulate MPC power from the multiple resolvable delay bins. However, in a practical UWB system the number of such delay bins is usually large (around 100 or more), thereby needing a large number of Rake correlators. Therefore, similar to the reduced complexity transceivers discussed in Section 1.1, several reduced complexity RXs have been proposed for UWB systems, which are discussed next assuming the number of resolvable delay bins is $M_{rx}$.

Figure 1.5: An illustration of the different reduced complexity Rake receivers: (a) All Rake, (b) Selective Rake, (c) Partial Rake (footnote 11).
A Selective Rake RX [124, 126] is analogous to receive antenna selection, where the RX is equipped with $K_{rx}$ correlators, and a bank of switches connects the strongest $K_{rx}$ out-of-the $M_{rx}$ delay bins to correlators based on iCSI ($K_{rx} \le M_{rx}$), as illustrated in Fig. 1.5b (footnote 12). A Partial Rake RX [127] is analogous to HBaCSI, where the RX is equipped with $K_{rx}$ programmable delay elements and correlators, and the delays are chosen based on aCSI (such as the power delay profile), as illustrated in Fig. 1.5c (footnote 13). Analogous to the discussion in Section 1.1.5, such reduced complexity Rake receivers suffer from a large CE overhead. One solution to accumulate the MPC powers while avoiding such CE is via the use of non-coherent UWB reception. Apart from avoiding CE, non-coherent reception also does not require carrier synchronization, thus making the receiver resilient to phase noise. Non-coherent RXs for UWB can be broadly classified into energy detection and auto-correlation reception [128, 129]. Energy detection based impulse radio is analogous to the energy-based multi-antenna RX discussed in Section 1.1.4, and it integrates the received energy over a symbol duration to demodulate amplitude modulated transmit signals. Auto-correlation RXs on the other hand correlate the received signal with a shifted version of itself, the shift being either in time, frequency or code domains. Such a shift prior to integration additionally allows detection of the sign of the transmit signal along with the amplitude, thus allowing a higher modulation index than energy detection. The simplest form of auto-correlation reception is the transmit reference scheme [130], wherein the TX transmits a reference (unmodulated) UWB pulse before every data modulated pulse within each symbol duration. The RX multiplies the RX signal with a delayed version of itself, to use the RX signal corresponding to the reference pulse as a `matched filter' for combining the data modulated pulse, as illustrated in Fig.
1.6. A detailed overview of such non-coherent techniques for impulse radio UWB is given in [128].

Footnote 11: The delay elements are only for illustration and in an actual implementation, the delay elements are emulated by using appropriately delayed matched filters at the correlators.
Footnote 12: Under the uncorrelated scattering assumption, the resolved channel MPCs in a UWB system are uncorrelated. Therefore there is no need for a UWB counterpart to beam selection.
Footnote 13: In an alternate design for partial rake, the delay elements are fixed, and they combine only the first $K_{rx}$ resolvable delay bins.

Figure 1.6: An illustration of the transmit reference impulse radio UWB system: (a) Transmit structure, (b) RX block diagram

Due to the underlying similarities between the two, though the main part of this thesis is dedicated to massive MIMO systems, analogous problems for UWB systems shall also be treated where relevant. In particular, as shall be shown in Chapter 5, the judicious management of the CE overhead for HBwS leads us to a new type of Rake receiver called Judiciously Trained Selective Rake (JS-Rake) (see also [131]). Furthermore, the transmit reference schemes [128, 130, 132, 133] for UWB, discussed above, do not have multi-antenna analogues. As shall be shown in Part II, these UWB transmit reference schemes are the starting point for our work on ACE techniques for reduced complexity massive MIMO systems.

1.4 Other Works

During my PhD, I also had the privilege of working on several other topics not included in this thesis. One such work of notable contribution is the concept of `interlaced clustering'. It is an approach to improve the cell edge throughput in distributed antenna systems (also known as cloud-RANs) via the use of different spatially shifted (by less than a cell size) coverage patterns on different parts of the spectrum. The interested reader is referred to the corresponding publications [134, 135] for a more detailed discussion:

J1 V. V. Ratnam, A. F. Molisch and G.
Caire, "Capacity Analysis of Interlaced Clustering in a Distributed Transmission System With/Without CSIT," in IEEE Transactions on Wireless Communications, vol. 15, no. 4, pp. 2629-2641, April 2016.
C1 V. V. Ratnam, G. Caire and A. F. Molisch, "Capacity analysis of interlaced clustering in a distributed antenna system," IEEE International Conference on Communications (ICC), London, 2015, pp. 1727-1732.

Part I
Hybrid Beamforming with Selection

In this part of the thesis, a class of reduced complexity switched transceivers called Hybrid Beamforming with Selection (HBwS) shall be introduced and studied. The organization of this part is as follows: Chapter 3 analyzes the system capacity for HBwS, Chapter 4 studies the capacity maximizing beamformer design problem for HBwS and proposes techniques to reduce the hardware cost of the switch bank and the beamforming matrix, Chapter 5 studies the optimal trade-off between the system capacity and the channel estimation overhead for HBwS under certain simplifications, and finally Chapter 6 presents the conclusions and future research directions in this area. The patents and publications corresponding to this part are as follows:

P1 V. V. Ratnam, A. F. Molisch, et al., "System design and algorithm for judicious pilot training in low complexity transceivers," provisional patent - 62/398,447.
M1 A. F. Molisch, V. V. Ratnam, S. Han, et al., "Hybrid Beamforming for Massive MIMO: A Survey," in IEEE Communications Magazine, vol. 55, no. 9, pp. 134-141, 2017.
J2 V. V. Ratnam, A. F. Molisch and H. C. Papadopoulos, "MIMO Systems With Restricted Pre/Post-Coding - Capacity Analysis Based on Coupled Doubly Correlated Wishart Matrices," in IEEE Transactions on Wireless Communications, vol. 15, no. 12, pp. 8537-8550, Dec. 2016.
J3 V. V. Ratnam, O. Y. Bursalioglu, H. C. Papadopoulos and A. F. Molisch, "Hybrid Beamforming with Selection for Multi-user Massive MIMO Systems," accepted to IEEE Transactions on Signal Processing, 2018.
C2 V. V.
Ratnam, A. F. Molisch, et al., "JS-RAKE: Judiciously trained selective rake receiver for UWB systems," in IEEE International Conference on Ubiquitous Wireless Broadband (ICUWB), Nanjing, Oct. 2016. [BEST STUDENT PAPER AWARD]
C3 V. V. Ratnam, A. Molisch, et al., "Diversity versus training overhead trade-off for low complexity switched transceivers," in IEEE Global Communications Conference (Globecom), (Washington, USA), Dec. 2016.
C4 V. V. Ratnam, O. Y. Bursalioglu, H. C. Papadopoulos and A. F. Molisch, "Preprocessor design for hybrid preprocessing with selection in massive MISO systems," in IEEE International Conference on Communications (ICC), Paris, 2017.

Chapter 2
Introduction and System Model

As mentioned in Section 1.2, despite having a much lower CE overhead and only requiring infrequent update of the analog hardware, the performance (footnote 1) of HBaCSI may be worse than that of HBiCSI since the analog beams can only span a fixed channel subspace of dimension $K_{tx}$, which does not adapt to iCSI [64]. This capacity gap may be rather large in frequency-flat fading systems if the rank of the channel spatial correlation matrix, or equivalently the dimension of the dominant channel subspace, is much larger than $K_{tx}$, which is possible both at microwave [111] and mm-wave [81] frequencies. A good solution to bridge this capacity gap is via the use of selection techniques. However, conventional selection-based reduced complexity transceivers, i.e., antenna selection and beam selection, suffer from a large CE overhead, poor performance and/or a large hardware cost, as discussed in Section 1.1. Therefore, a generalization of such selection transceivers, namely HBwS, is proposed here as a solution to achieve performance (footnote 1) comparable to HBiCSI, while still retaining some benefits of HBaCSI, i.e., infrequent update of analog hardware parameters and low channel estimation overhead. The block diagram of a TX with HBwS is given in Fig. 2.1.
Similar to conventional hybrid beamforming (see Section 1.1), there again is an analog beamforming matrix $\mathbf{T}$ that is connected to the $M_{tx}$ antennas. However, unlike the conventional hybrid beamforming schemes (HBaCSI/HBiCSI), the number of input ports for the beamforming matrix ($L_{tx}$) is larger than the number of available up-conversion chains ($K_{tx}$). This matrix is preceded by a bank of $K_{tx}$ one-to-many radio frequency (RF) switches, each of which connects one up-conversion chain to one of several input ports. Note that each connection of the $K_{tx}$ up-conversion chains to $K_{tx}$ out-of-the $L_{tx}$ input ports corresponds to a distinct analog precoding beam in Fig. 1.4. While the beamforming matrix is designed based on aCSI, the switches exploit iCSI to optimize this $K_{tx}$ out-of-the $L_{tx}$ input port selection. Note that transmit antenna selection [27, 136, 137] is a special case with $\mathbf{T} = \mathbf{I}_{M_{tx}}$, and transmit beam selection [37, 41, 105, 138] is a special case with $L_{tx} = M_{tx}$ (see Section 1.1 for a detailed discussion on them). Since the beamformer uses only aCSI, HBwS is more resilient to the transient response of phase-shifters [112] than HBiCSI, thereby easing performance specification requirements on them (footnote 2). The premise for this design is that unlike phase shifters, RF switches are cheap, have low insertion loss and can be easily designed to switch quickly based on iCSI [31, 41, 112-115]. Since $L_{tx} > K_{tx}$ and switches adapt the effective precoding beams to iCSI, the analog precoding can span a larger channel subspace and a larger beamforming gain can be achieved in comparison to HBaCSI.

Figure 2.1: A block diagram of Hybrid Beamforming with Selection at the TX

Footnote 1: Here system performance refers to the capacity excluding the channel estimation overhead.
Footnote 2: Adapting to iCSI involves changing the precoding beam multiple times within a coherence interval (see Section 4.6).
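The port selection described above can be illustrated with a small Python sketch. It assumes a single-antenna UE, a randomly drawn (non-orthogonal) beamforming matrix and an exhaustive search over all subsets of $K_{tx}$ out-of-the $L_{tx}$ ports; the function name `hbws_select` is hypothetical, and the thesis itself may use different selection rules.

```python
import itertools
import numpy as np

def hbws_select(h, T, k_tx):
    """Exhaustively pick the k_tx input ports of T that maximize beamforming gain.

    h is the 1 x M_tx channel of a single-antenna UE (assumption). The gain of a
    port subset is the power of h inside the subspace spanned by the chosen beams.
    """
    best_ports, best_gain = None, -1.0
    for ports in itertools.combinations(range(T.shape[1]), k_tx):
        Q, _ = np.linalg.qr(T[:, list(ports)])   # orthonormal basis of the chosen beams
        gain = np.linalg.norm(h @ Q) ** 2
        if gain > best_gain:
            best_ports, best_gain = ports, gain
    return best_ports, best_gain

rng = np.random.default_rng(0)
m_tx, l_tx, k_tx = 8, 4, 2
T = rng.standard_normal((m_tx, l_tx)) + 1j * rng.standard_normal((m_tx, l_tx))
T /= np.linalg.norm(T, axis=0)                    # unit-norm beams (assumption)
h = rng.standard_normal(m_tx) + 1j * rng.standard_normal(m_tx)

ports, gain = hbws_select(h, T, k_tx)
# Gain when the first k_tx ports are always used, i.e. without any selection.
fixed_gain = np.linalg.norm(h @ np.linalg.qr(T[:, :k_tx])[0]) ** 2
```

Comparing `gain` with `fixed_gain` shows the benefit of letting the switches adapt to the instantaneous channel while the beamforming matrix itself stays fixed.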
Furthermore, due to its superior beam-shaping capabilities, HBwS provides better user separation than HBaCSI in a multi-user system. On the downside, since the analog beams span a larger channel subspace, the CE overhead for HBwS may be larger than for HBaCSI. However, by designing the beamformer carefully, the overhead can be made significantly lower than for full CE. Additionally, since the beamformer has a size of $M_{tx} \times L_{tx}$, as opposed to $M_{tx} \times K_{tx}$ for HBaCSI, a larger number of analog components are required for HBwS. The estimation overhead and techniques to reduce the number of analog components shall be discussed in more detail in Sections 4.6 and 4.5, respectively.

Notation used in this work is as follows: scalars are represented by light-case letters; vectors by bold-case letters; matrices are represented by capitalized bold-case letters; and sets and subspaces are represented by calligraphic letters. Additionally, $a_i$ represents the $i$-th element of a vector $\mathbf{a}$; $|\mathbf{a}|$ and $\|\mathbf{a}\|_P$ represent the $\ell_2$ and $\ell_P$ norms of a vector $\mathbf{a}$; $A_{i,j}$ represents the $(i,j)$-th element of a matrix $\mathbf{A}$; $[\mathbf{A}]_{c\{i\}}$ and $[\mathbf{A}]_{r\{i\}}$ represent the $i$-th column and row vectors of matrix $\mathbf{A}$, respectively; $\|\mathbf{A}\|_F$ represents the Frobenius norm of a matrix $\mathbf{A}$; $\mathbf{A}^{\dagger}$ is the conjugate transpose of a matrix $\mathbf{A}$; $\mathcal{P}\{\mathbf{A}\}$ represents the subspace spanned by the columns of a matrix $\mathbf{A}$; $|\mathbf{A}|$ represents the determinant of a matrix $\mathbf{A}$; and $|\mathcal{A}|$ represents the cardinality of a set $\mathcal{A}$ or dimension of a space $\mathcal{A}$. Also, $\binom{N}{k} = \frac{N!}{(N-k)!\,k!}$, where $N!$ is the factorial of $N$; $\overset{d}{=}$ is equivalence in distribution; $\succeq 0$ implies a positive semi-definite constraint; $\mathbb{E}\{\cdot\}$ represents the expectation operator; $\mathbb{P}$ is the probability operator; $\mathbf{I}_i$ and $\mathbf{O}_{i,j}$ are the $i \times i$ and $i \times j$ identity and zero matrices, respectively; and $\mathbb{R}$ and $\mathbb{C}$ represent the field of real and complex numbers.
2.1 General Assumptions and Channel model

The downlink of a single cell system is considered, having one BS with $M_{tx}$ antennas ($M_{tx} \gg 1$) and implementing HBwS, and multiple UEs, modeled as a multi-user massive MIMO broadcast channel. The presented results can also be extended to the uplink Multiple Access Channel (MAC) with HBwS at the BS. Similar to the system model in [64], it is assumed that the UEs can be spatially divided into UE groups, with common intra-group spatial channel statistics and orthogonal channels across the groups, as illustrated in Fig. 2.2a. If such a grouping for all UEs is not possible, a suitable UE selection algorithm can be used [96]. It is further assumed that the BS transmit resources, such as the RF chains, TX power, switches and analog hardware, are split among the different UE groups based on the average channel statistics, via an aCSI based resource sharing algorithm (footnote 3). While such algorithms already exist for HBaCSI [67, 69, 139], extending them for HBwS is beyond the scope of this work. Since the different UE groups have orthogonal channels and the resources are split among the groups based on aCSI, the transmission to each UE group can be treated independently. Therefore, without loss of generality, the analysis shall be restricted to a single representative UE group with $M_1$ UEs. The BS allocates $K_{tx}$ up-conversion chains to this UE group ($K_{tx} \le M_{tx}$). The portion of the TX RF analog beamforming matrix allocated to the UE group, $\mathbf{T}$, has a dimension of $M_{tx} \times L_{tx}$, and a corresponding subset of switches, denoted by a selection matrix $\mathbf{S}$, is used to connect $K_{tx}$ out-of-the $L_{tx}$ input ports of the beamforming matrix to the $K_{tx}$ up-conversion chains. Note that the $K_{tx}$ and $L_{tx}$ of the group do not exceed the corresponding totals at the BS, and that $\mathbf{T}$ and $\mathbf{S}$ are sub-matrices of the beamforming and selection matrices of the whole BS (see Section 2 and Fig. 2.1). The $M_1$ receivers in the group have $M_2$ antennas and $M_2$ down-conversion chains each.
Furthermore, the total number of RX antennas in the UE group is defined as $M_{rx} \triangleq M_1 M_2$ and it is assumed that $K_{tx} \ge M_{rx}$. A narrow-band system with a frequency-flat and temporally block-fading channel is considered.

Figure 2.2: Illustration of (a) a sample user layout and (b) aCSI based resource allocation to the user groups.

Footnote 3: While iCSI based inter-group resource sharing may potentially improve performance, it may pose stringent requirements on the system hardware and increase system complexity.

Under these assumptions, the downlink baseband received signal at UE $m$, for a given selection matrix $\mathbf{S}$, can be expressed as:
\begin{align}
\mathbf{y}_m(\mathbf{S}) &= \sqrt{\rho}\,\tilde{\mathbf{H}}_m \mathbf{T}\mathbf{S}\mathbf{x} + \mathbf{n}_m \tag{2.1a} \\
&= \sqrt{\rho}\,\tilde{\mathbf{H}}_m \mathbf{T}\mathbf{S}\mathbf{G}\mathbf{u} + \mathbf{n}_m \tag{2.1b}
\end{align}
where $\mathbf{y}_m(\mathbf{S})$ is the $M_2 \times 1$ received signal vector at UE $m$, $\rho$ is the mean receive SNR, $\tilde{\mathbf{H}}_m$ is the $M_2 \times M_{tx}$ downlink channel matrix for UE $m$, $\mathbf{S}$ is an $L_{tx} \times K_{tx}$ sub-matrix of the identity matrix $\mathbf{I}_{L_{tx}}$, formed by picking $K_{tx}$ out of the $L_{tx}$ columns, $\mathbf{x}$ is the $K_{tx} \times 1$ sub-vector of the $\bar{K}_{tx} \times 1$ digitally precoded transmit vector $\bar{\mathbf{x}}$ corresponding to the $K_{tx}$ allocated up-conversion chains, and $\mathbf{n}_m \sim \mathcal{CN}(\mathbf{O}_{M_2 \times 1}, \mathbf{I}_{M_2})$ is the normalized additive white Gaussian noise observed at UE $m$. Without loss of generality, one can define $\mathbf{u} \triangleq \mathbf{G}^{-1}\mathbf{x}$, where $\mathbf{G}$ is a $K_{tx} \times K_{tx}$ full-rank matrix that ortho-normalizes the columns of $\mathbf{T}\mathbf{S}$, i.e., $\mathbf{G}^{\dagger}\mathbf{S}^{\dagger}\mathbf{T}^{\dagger}\mathbf{T}\mathbf{S}\mathbf{G} = \mathbf{I}_{K_{tx}}$. Here it is implicitly assumed that $\mathbf{T}\mathbf{S}$ has linearly independent columns for each $\mathbf{S}$. The transmit power constraint can then be expressed as:
\begin{align}
& \mathrm{tr}\{\mathbf{T}\mathbf{S}\,E_{\mathbf{x}}\{\mathbf{x}\mathbf{x}^{\dagger}\}\,\mathbf{S}^{\dagger}\mathbf{T}^{\dagger}\} \le 1 \tag{2.2a} \\
\Rightarrow\; & E_{\mathbf{u}}\{\mathbf{u}^{\dagger}\mathbf{u}\} \le 1. \tag{2.2b}
\end{align}
Note that the ortho-normalization matrix $\mathbf{G}$ is defined only to simplify (2.2a) and does not constitute the entire digital base-band precoding, a part of which may still exist in $\mathbf{u}$. The channel is assumed to contain both a large scale fading and a small scale fading component. The small scale fading statistics are assumed to be Rayleigh in amplitude and doubly spatially correlated (at both the transmitter and receiver ends).
As illustrated in Fig. 2.2a, the UEs in a group are close enough to share the same set of local scatterers, but are sufficient wavelengths apart to undergo i.i.d. small scale fading. Therefore, it is assumed that the channels to the different UEs are independently distributed and follow the widely used Kronecker correlation model [140] with a common transmit spatial correlation matrix $\mathbf{R}_{tx}$ but individual receive correlation matrices $\mathbf{R}_{rx,m}$. Let $\mathbf{R}_{tx} = \mathbf{E}^{tx}\boldsymbol{\Lambda}^{tx}[\mathbf{E}^{tx}]^{\dagger}$ be the eigen-decomposition of the transmit correlation matrix such that the diagonal elements of $\boldsymbol{\Lambda}^{tx}$ are arranged in descending order of magnitude. Under these conditions, the channel matrices can be expressed as:
\begin{align}
\tilde{\mathbf{H}}_m &= \mathbf{R}_{rx,m}^{1/2}\,\mathbf{H}_m\,\mathbf{R}_{tx}^{1/2} \tag{2.3a} \\
&= \mathbf{R}_{rx,m}^{1/2}\,\hat{\mathbf{H}}_m\,[\boldsymbol{\Lambda}^{tx}]^{1/2}[\mathbf{E}^{tx}]^{\dagger}, \tag{2.3b}
\end{align}
where $\mathbf{H}_m$ and $\hat{\mathbf{H}}_m$ are $M_2 \times M_{tx}$ matrices with i.i.d. $\mathcal{CN}(0,1)$ entries. Without loss of generality, the mean pathloss for the UE group is included in $\rho$ and any UE specific large scale fading components are included in $\mathbf{R}_{rx,m}$. For ease of notation the former expansion, i.e., (2.3a), shall be used in Chapter 3 and the latter, i.e., (2.3b), shall be used in Chapter 4.

The BS is assumed to have perfect knowledge of the aCSI metrics $\{\mathbf{R}_{tx}, \mathbf{R}_{rx,1}, \ldots, \mathbf{R}_{rx,M_1}\}$, which is used to update the analog beamforming matrix. The BS is also assumed to have perfect knowledge of the effective iCSI channels for the UE group $\{\tilde{\mathbf{H}}_1\mathbf{T}, \ldots, \tilde{\mathbf{H}}_{M_1}\mathbf{T}\}$, which is used to update the selection matrix $\mathbf{S}$. Finally, each UE $m$ is assumed to know its effective channel after picking $\mathbf{T}$ and $\mathbf{S}$, i.e., $\tilde{\mathbf{H}}_m\mathbf{T}\mathbf{S}\mathbf{G}$. The corresponding channel estimation overhead is discussed later in Section 4.6.

A generic switching architecture is considered, where $\mathcal{S} \triangleq \{\mathbf{S}_1, \ldots, \mathbf{S}_{|\mathcal{S}|}\}$ denotes the set of all feasible selection matrices. Note that depending on the switch bank architecture, this set, referred to as the switch position set, may not involve all the $\binom{L_{tx}}{K_{tx}}$ choices.
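For each $\mathbf{S}_i \in \mathcal{S}$, let $\mathbf{G}_i$ be the corresponding orthogonalization matrix in (2.1b), i.e., $\mathbf{G}_i^{\dagger}\mathbf{S}_i^{\dagger}\mathbf{T}^{\dagger}\mathbf{T}\mathbf{S}_i\mathbf{G}_i = \mathbf{I}_{K_{tx}}$. Although $\mathbf{G}_i$ is also a function of $\mathbf{T}$, this dependence is not explicitly shown for ease of representation. For a given selection matrix $\mathbf{S}_i$, note that the downlink channel described in Section 2.1 is a broadcast channel with effective channel matrices $\tilde{\mathbf{H}}_m\mathbf{T}\mathbf{S}_i\mathbf{G}_i$ for each UE $m$. Therefore, using uplink-downlink duality [141], the instantaneous sum capacity achievable using Dirty Paper Coding (DPC) [142,143] can be expressed as (footnote 4):
\begin{align}
\tilde{C}(\mathbf{T}, \tilde{\mathbf{H}}) = \max_{1 \le i \le |\mathcal{S}|,\, \{\mathbf{P}_1, \ldots, \mathbf{P}_{M_1}\}} \log\left| \mathbf{I}_{K_{tx}} + \rho \sum_{m=1}^{M_1} \mathbf{G}_i^{\dagger}\mathbf{S}_i^{\dagger}\mathbf{T}^{\dagger}\tilde{\mathbf{H}}_m^{\dagger}\mathbf{P}_m\tilde{\mathbf{H}}_m\mathbf{T}\mathbf{S}_i\mathbf{G}_i \right| \tag{2.4}
\end{align}
\[
\text{subject to: } \mathbf{P}_m \succeq 0, \quad \sum_{m=1}^{M_1} \mathrm{Tr}\{\mathbf{P}_m\} \le 1,
\]
where $\mathbf{P}_m$ represents the $M_2 \times M_2$ dual-uplink MAC transmit covariance matrix at UE $m$ and one defines $\tilde{\mathbf{H}} = [\tilde{\mathbf{H}}_1^{\dagger}\; \tilde{\mathbf{H}}_2^{\dagger}\; \cdots\; \tilde{\mathbf{H}}_{M_1}^{\dagger}]^{\dagger}$. Although (2.4) is convex in $\{\mathbf{P}_m\}$ for each $i$, the optimal covariance matrices $\{\mathbf{P}_m\}$ are not known in closed form. Therefore, a sub-optimal solution $\mathbf{P}_m = \frac{1}{M_{rx}}\mathbf{I}_{M_2}$ shall be relied upon, which is optimal at large SNR if $\tilde{\mathbf{H}}$ has full row rank and $K_{tx} \ge M_{rx}$ [144, 145] (footnote 5). Using this solution in (2.4) and applying Sylvester's determinant identity [146], a tractable instantaneous sum capacity lower bound can be obtained as:
\begin{align}
C(\mathbf{T}, \tilde{\mathbf{H}}) \triangleq \max_{1 \le i \le |\mathcal{S}|} \log\left| \mathbf{I}_{M_{rx}} + \frac{\rho}{M_{rx}} \tilde{\mathbf{H}}\mathbf{T}\mathbf{S}_i\mathbf{G}_i\mathbf{G}_i^{\dagger}\mathbf{S}_i^{\dagger}\mathbf{T}^{\dagger}\tilde{\mathbf{H}}^{\dagger} \right|. \tag{2.5}
\end{align}
Since it is assumed that capacity-optimal dirty-paper coding is used instead of linear pre-coding, a base-band digital beamforming matrix does not show up in (2.5). Henceforth $C(\mathbf{T}, \tilde{\mathbf{H}})$ shall be referred to as the high SNR (hSNR) instantaneous sum capacity, recalling that $C(\mathbf{T}, \tilde{\mathbf{H}}) \le \tilde{C}(\mathbf{T}, \tilde{\mathbf{H}})$ in general, with equality in the high SNR regime.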
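As a numerical illustration of (2.5), the following Python sketch draws one realization of a Kronecker-correlated channel as in (2.3a), builds an ortho-normalization matrix for every column subset of a beamformer, and evaluates the lower bound by exhaustive search over the switch positions. It is a minimal sketch under stated assumptions and not the implementation used in this work: the function names, the exponential correlation model, the random phase-only beamformer, the Cholesky-based construction of G (any full-rank G satisfying the ortho-normalization condition would do) and all numerical values are illustrative choices, and a single UE ($M_1 = 1$) with an unrestricted switch bank is assumed.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)

def exp_corr(n, a):
    """Illustrative (assumed) correlation model: [R]_{ab} = a^{|a-b|}."""
    idx = np.arange(n)
    return a ** np.abs(idx[:, None] - idx[None, :])

def psd_sqrt(R):
    """Hermitian square root of a positive semi-definite matrix."""
    w, V = np.linalg.eigh(R)
    return (V * np.sqrt(np.clip(w, 0.0, None))) @ V.conj().T

def orthonormalizer(TS):
    """One valid G with G^H (TS)^H (TS) G = I, from a Cholesky factor."""
    Lc = np.linalg.cholesky(TS.conj().T @ TS)   # Gram matrix = Lc Lc^H
    return np.linalg.inv(Lc).conj().T           # G = Lc^{-H}

def hsnr_capacity(H_eff, T, K, rho):
    """Evaluate (2.5) by searching all K-column subsets of T (nats/s/Hz)."""
    M_rx, L = H_eff.shape[0], T.shape[1]
    best = -np.inf
    for cols in itertools.combinations(range(L), K):
        TS = T[:, list(cols)]                   # T S_i
        A = H_eff @ TS @ orthonormalizer(TS)    # H~ T S_i G_i
        M = np.eye(M_rx) + (rho / M_rx) * (A @ A.conj().T)
        best = max(best, np.linalg.slogdet(M)[1])
    return best

# Hypothetical system sizes, chosen only for illustration.
M_tx, L_tx, K_tx, M_rx, rho = 8, 4, 2, 2, 10.0

# Phase-only analog beamformer with random phases (an assumption).
T = np.exp(1j * rng.uniform(0.0, 2.0 * np.pi, (M_tx, L_tx))) / np.sqrt(M_tx)

# One channel realization following (2.3a).
H = (rng.standard_normal((M_rx, M_tx))
     + 1j * rng.standard_normal((M_rx, M_tx))) / np.sqrt(2.0)
H_eff = psd_sqrt(exp_corr(M_rx, 0.5)) @ H @ psd_sqrt(exp_corr(M_tx, 0.5))

C = hsnr_capacity(H_eff, T, K_tx, rho)
print(f"hSNR capacity lower bound: {C:.3f} nats/s/Hz")
```

Since adding columns can only enlarge the subspace spanned by the effective precoder, the value obtained with $K_{tx} = L_{tx}$ upper-bounds the value for any smaller $K_{tx}$, which offers a quick sanity check of such a sketch.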
From (2.3) and the fact that $\tilde{\mathbf{H}}_m$ are independent for all $m$, one obtains:
\begin{align}
\tilde{\mathbf{H}} &= \mathbf{R}_{rx}^{1/2}\,\mathbf{H}\,\mathbf{R}_{tx}^{1/2}, \tag{2.6a} \\
&= \mathbf{R}_{rx}^{1/2}\,\hat{\mathbf{H}}\,[\boldsymbol{\Lambda}^{tx}]^{1/2}[\mathbf{E}^{tx}]^{\dagger}, \tag{2.6b}
\end{align}
where $\mathbf{R}_{rx}$ is a block-diagonal matrix with the $m$-th diagonal block being $\mathbf{R}_{rx,m}$, and $\mathbf{H} = [\mathbf{H}_1^{\dagger}\; \mathbf{H}_2^{\dagger}\; \cdots\; \mathbf{H}_{M_1}^{\dagger}]^{\dagger}$, $\hat{\mathbf{H}} = [\hat{\mathbf{H}}_1^{\dagger}\; \hat{\mathbf{H}}_2^{\dagger}\; \cdots\; \hat{\mathbf{H}}_{M_1}^{\dagger}]^{\dagger}$ are $M_{rx} \times M_{tx}$ matrices with i.i.d. $\mathcal{CN}(0,1)$ entries.

Footnote 4: The channel is assumed to remain static for long enough that the capacity for each channel fading realization is a meaningful metric.

Footnote 5: The SNR here includes the analog beamforming gain, and additionally the UEs in a group have a comparable signal strength. Therefore these conditions are usually satisfied if $\mathbf{R}_{rx,m}$ has full rank for all $m$.

Chapter 3
Capacity Analysis based on Coupled Doubly-correlated Wishart Matrices

3.1 Introduction

In this chapter the instantaneous capacity of a TX with HBwS shall be characterized, for a fixed $\mathbf{T}$. Such an analysis can provide insight into the influence of different system parameters on the performance, and consequently can help design good, near-optimal systems. From (2.5), note that a general abstraction of the hSNR instantaneous system capacity for HBwS can be expressed as:
\begin{align}
C(\tilde{\mathbf{H}}) \triangleq \max_{\mathbf{Q}_i \in \mathcal{Q}} \log\left| \mathbf{I}_{M_{rx}} + \frac{\rho}{M_{rx}} \tilde{\mathbf{H}}\mathbf{Q}_i\mathbf{Q}_i^{\dagger}\tilde{\mathbf{H}}^{\dagger} \right|, \tag{3.1}
\end{align}
where $\mathcal{Q}$ is a `codebook' of $M_{tx} \times K_{tx}$ semi-unitary matrices and the dependence on $\mathbf{T}$ is dropped for convenience. This capacity structure is not confined to HBwS but is applicable to a wider range of systems called restricted pre-coded systems. As an illustration, consider the case of Limited Feedback Precoding (LFP), a technique to reduce the overhead of iCSI feedback from the UE to the base-station (BS) in frequency division duplexing systems. In single-user LFP ($M_1 = 1$), the BS and UE share an a priori codebook $\mathcal{Q}$ of precoding matrices, and the UE uses iCSI to pick the best precoding matrix from the codebook and feeds back only the codebook index to the BS.
It can be readily verified that the instantaneous capacity for LFP takes on a similar structure to (3.1). While the analysis in this chapter is in fact applicable to this wider class of restricted pre-coded systems, for convenience the discussion here shall be limited to the case of HBwS. The interested reader is referred to the journal paper [147] for a more extensive discussion of restricted precoding.

The performance analysis of restricted precoded systems is difficult in general and no closed form expressions (for the most general setting) are known to date, as shall be evident from the literature review below for LFP and Transmit Antenna Selection (TAS). For the case of LFP, lower bounds on the capacity with beamforming in an isotropic channel were proposed in [86]. System upper bounds under similar settings were considered in [87,148] etc. The lower bound in [86] was extended to the case of spatial multiplexing in [149]. However, bounds on the performance and the optimal design of the codebook for spatial multiplexing in a correlated channel are not available in the literature to the best of the authors' knowledge. Even in the relatively simple case of random vector quantized beamforming, performance bounds and good codebook designs for a correlated MIMO channel were found only recently [90]. A more complete discussion of the results prior to 2008 is available in [94]. Similarly, for the case of TAS, bounds on the distribution of the capacity for spatial multiplexing in an isotropic channel were found in [18]. The ergodic capacity in the high and low SNR regimes was discussed in [16] and asymptotic large antenna bounds are considered in [19, 23]. A loose upper bound on the outage probability for a spatial multiplexing system with receive antenna selection in a correlated fading channel was considered in [26].
However, the performance analysis of spatial multiplexing with TAS in a correlated channel is not available in the literature to the best of the authors' knowledge. A more complete review of the literature on antenna selection is available in [27,94,137].

As shall be shown in Sec. 3.2, some of the important performance measures, like the mean and outage hSNR capacities, are a function of the eigenvalues of a set of coupled (footnote 1), doubly-correlated Wishart matrices. Therefore, characterizing the joint eigenvalue distribution across these coupled Wishart matrices is an essential step towards characterizing the performance. The asymptotic eigenvalue distribution [150, 151], non-asymptotic diagonal distribution [152] and joint eigenvalue distributions [153] for a single Wishart matrix have been widely characterized, both with and without correlated entries. The joint eigenvalue distribution for a pair of correlated Wishart matrices was characterized in [154, 155]. However, the joint eigenvalue distribution across a larger set of correlated, let alone coupled, Wishart matrices has not been studied in the literature to the best of our knowledge.

The contributions of this chapter are as follows: The asymptotic second-order statistics and joint distribution of eigenvalues across a set of coupled, doubly-correlated Wishart matrices are derived. Additionally, a tight approximation for the joint distribution of eigenvalues in the non-asymptotic regime is proposed. These results are then used to approximate the distribution of the instantaneous hSNR capacity for a HBwS system. In the process, a new technique for finding the distribution of the largest element of a correlated Gaussian vector is proposed. As an application of the proposed techniques, an efficient beamformer for a HBwS system is also derived.

The rest of the chapter is organized as follows: The joint distribution of the channel eigenvalues is derived in Sec. 3.2.
Using these results, the approximate capacity distribution for a restricted precoded system is derived in Sec. 3.3. To study the effectiveness of the approximation, simulations under some practical channel parameters are performed in Sec. 3.4. As an application, a demonstration of how the results can be used to design a good beamformer for a beam selection system is presented in Sec. 3.5. Finally, the conclusions are presented in Sec. 3.6.

Footnote 1: Here coupling refers to the fact that the Gaussian matrices generating the set of Wishart matrices have some common elements.

Defining $\mathbf{Q}_i = \mathbf{T}\mathbf{S}_i\mathbf{G}_i$ and $\mathcal{Q} = \{\mathbf{T}\mathbf{S}_i\mathbf{G}_i \mid 1 \le i \le |\mathcal{S}|\}$, and from the analysis in Section 2.1, the instantaneous hSNR system capacity for HBwS can be re-expressed as:
\begin{align}
C(\tilde{\mathbf{H}}) = \max_{1 \le i \le |\mathcal{Q}|} \left\{ \sum_{m=1}^{M_{rx}} \log\left[ 1 + \frac{\rho}{M_{rx}} \tilde{\lambda}^i_m \right] \right\} \tag{3.2}
\end{align}
where $\tilde{\lambda}^i_m$ is the $m$-th largest eigenvalue of $\tilde{\mathbf{H}}\mathbf{Q}_i\mathbf{Q}_i^{\dagger}\tilde{\mathbf{H}}^{\dagger}$ and the dependence of $C$ on $\mathbf{T}$ is dropped for ease of representation. Note that from the definition of the ortho-normalization matrices $\mathbf{G}_i$, the $\mathbf{Q}_i$'s are semi-unitary, i.e., $\mathbf{Q}_i^{\dagger}\mathbf{Q}_i = \mathbf{I}_{K_{tx}}$ for all $i \in \{1, \ldots, |\mathcal{Q}|\}$. Since the effective channel for a given precoder matrix $\mathbf{Q}_i$ is $\tilde{\mathbf{H}}\mathbf{Q}_i$, $\tilde{\lambda}^i_m$ shall henceforth be referred to as the $m$-th "channel" eigenvalue for precoder $\mathbf{Q}_i$.

3.2 Joint distribution of channel eigenvalues

From (3.2) it is clear that the instantaneous hSNR capacity distribution depends only on the eigenvalues of $\tilde{\mathbf{H}}\mathbf{Q}_i\mathbf{Q}_i^{\dagger}\tilde{\mathbf{H}}^{\dagger}$. It can be easily verified from (2.6a) that $\{\tilde{\mathbf{H}}\mathbf{Q}_i\mathbf{Q}_i^{\dagger}\tilde{\mathbf{H}}^{\dagger} \mid 1 \le i \le |\mathcal{Q}|\}$ forms a set of coupled, doubly-correlated Wishart matrices. The coupling comes from the fact that all these matrices are generated from the same i.i.d. random matrix $\mathbf{H}$. In this section, the joint distribution of the eigenvalues of these coupled Wishart matrices is characterized. First the asymptotic second-order statistics and the joint distribution of the eigenvalues are derived in the large antenna limit, i.e., for $M_{tx}, K_{tx} \to \infty$ (with a fixed ratio) while $M_{rx}$ is kept fixed (finite).
This scaling is required for analytical tractability and, where necessary, approximations for the more practical finite antenna regime shall also be considered (including the case of $K_{tx} = M_{rx}$).

For the large antenna limit, the scaled parameters $M_{tx} = sM_o$ and $K_{tx} = sK_o$ are defined, where $M_o, K_o$ are constants and $s$ is the scaling factor. A family of $M_{tx} \times M_{tx}$ transmit correlation matrices $\mathbf{R}_{tx}$ and a family of codebooks $\mathcal{Q} = \{\mathbf{Q}_1, \ldots, \mathbf{Q}_{|\mathcal{Q}|}\}$ are also defined as a function of $s$. For the family of codebooks, the codebook size $|\mathcal{Q}|$ is fixed but the precoding matrices $\mathbf{Q}_i$ are $M_{tx} \times K_{tx}$ semi-unitary matrices as a function of $s$. The eigenvalues of $\tilde{\mathbf{H}}\mathbf{Q}_i\mathbf{Q}_i^{\dagger}\tilde{\mathbf{H}}^{\dagger}$ typically diverge as $s$ increases. Therefore, the eigenvalues of its normalized counterpart
\[
\mathbf{J}_i \triangleq \frac{\tilde{\mathbf{H}}\mathbf{Q}_i\mathbf{Q}_i^{\dagger}\tilde{\mathbf{H}}^{\dagger}}{\mathrm{Tr}\{\mathbf{Q}_i^{\dagger}\mathbf{R}_{tx}\mathbf{Q}_i\}}
\]
shall instead be characterized (footnote 2). We define the eigen decomposition $\mathbf{J}_i = \mathbf{E}_i\boldsymbol{\Lambda}_i\mathbf{E}_i^{\dagger}$, where $\mathbf{E}_i$ and $\boldsymbol{\Lambda}_i$ are the unordered eigenvector and eigenvalue matrices, respectively. These eigenvalues $\lambda^i_m = [\boldsymbol{\Lambda}_i]_{m,m}$ shall be referred to as the normalized channel eigenvalues. We also define the eigen decompositions $\mathbf{R}_{rx} = \mathbf{E}^{rx}\boldsymbol{\Lambda}^{rx}[\mathbf{E}^{rx}]^{\dagger}$ and $\mathbf{R}_{tx} = \mathbf{E}^{tx}\boldsymbol{\Lambda}^{tx}[\mathbf{E}^{tx}]^{\dagger}$, where $\lambda^{tx}_k = [\boldsymbol{\Lambda}^{tx}]_{kk}$ and $\lambda^{rx}_k = [\boldsymbol{\Lambda}^{rx}]_{kk}$ are the $k$-th largest eigenvalues of $\mathbf{R}_{tx}$ and $\mathbf{R}_{rx}$, respectively.

Footnote 2: Note that if $\mathrm{Tr}\{\mathbf{Q}_i^{\dagger}\mathbf{R}_{tx}\mathbf{Q}_i\} = 0$, the corresponding channel eigenvalues are trivially zero. Here only the non-trivial case of $\mathrm{Tr}\{\mathbf{Q}_i^{\dagger}\mathbf{R}_{tx}\mathbf{Q}_i\} > 0$ is considered.

3.2.1 First-order approximation and second-order statistics

The expressions for the normalized channel eigenvalues and their second-order statistics, in the large antenna limit, are given by the following theorem, which extends the results in [151] to the joint statistics case:

Theorem 3.2.1. Consider a family of transmit correlation matrices $\mathbf{R}_{tx}$ and a family of precoding matrices $\mathcal{Q} = \{\mathbf{Q}_1, \ldots, \mathbf{Q}_{|\mathcal{Q}|}\}$ as a function of $s$.
If the eigenvalues of $\mathbf{R}_{rx}$ are all distinct and $\lim_{s \to \infty} \frac{\|\mathbf{Q}_i^{\dagger}\mathbf{R}_{tx}\mathbf{Q}_i\|_F}{\mathrm{Tr}\{\mathbf{Q}_i^{\dagger}\mathbf{R}_{tx}\mathbf{Q}_i\}} = 0$ for all $i \in \{1, \ldots, |\mathcal{Q}|\}$, then as $s \to \infty$:
\begin{align}
\lambda^i_m &\simeq [\mathbf{e}^{rx}_m]^{\dagger}\mathbf{J}_i\,\mathbf{e}^{rx}_m \triangleq \dot{\lambda}^i_m \tag{3.3} \\
\mu^i_m &= E\{\lambda^i_m\} \simeq \lambda^{rx}_m \triangleq \dot{\mu}^i_m \tag{3.4} \\
K^{ij}_{mn} &= E\{\lambda^i_m\lambda^j_n\} - \mu^i_m\mu^j_n \simeq \frac{\delta_{mn}\,\lambda^{rx}_m\lambda^{rx}_n\,\|\mathbf{Q}_j^{\dagger}\mathbf{R}_{tx}\mathbf{Q}_i\|_F^2}{\mathrm{Tr}\{\mathbf{Q}_i^{\dagger}\mathbf{R}_{tx}\mathbf{Q}_i\}\,\mathrm{Tr}\{\mathbf{Q}_j^{\dagger}\mathbf{R}_{tx}\mathbf{Q}_j\}} \triangleq \dot{K}^{ij}_{mn} \tag{3.5}
\end{align}
for all $1 \le m, n \le M_{rx}$ and $i, j \in \{1, \ldots, |\mathcal{Q}|\}$, where "$\simeq$" denotes a first-order approximation (i.e., an equality in which the higher order terms that do not influence the asymptotic statistics of $\lambda^i_m$ as $s \to \infty$ are neglected), $\lambda^i_m = [\boldsymbol{\Lambda}_i]_{m,m}$ and $\mathbf{e}^{rx}_m = [\mathbf{E}^{rx}]_{c\{m\}}$.

Proof. See Appendix 3.A.

These normalized channel eigenvalues $[\lambda^i_1, \ldots, \lambda^i_{M_{rx}}]$ are unordered and are picked in the permutation such that the Hoffman-Wielandt inequality holds (see proof of Theorem 3.2.1). Note that the conditions required for the above theorem are somewhat difficult to verify since they depend on the codebook. A simpler sufficient condition, independent of the codebook, is given by the following proposition.

Proposition 3.2.1.1 (Simpler sufficient condition). The conditions of Theorem 3.2.1 are satisfied if the eigenvalues of $\mathbf{R}_{rx}$ are all distinct and either
\[
\lim_{s \to \infty} \frac{\sum_{k=1}^{K_{tx}} (\lambda^{tx}_k)^2}{\left[\sum_{\ell=1}^{K_{tx}} \lambda^{tx}_{M_{tx}+1-\ell}\right]^2} = 0
\quad \text{or} \quad
\lim_{s \to \infty} \frac{(\lambda^{tx}_1)^2}{\left(\sum_{\ell=1}^{K_{tx}} \lambda^{tx}_{M_{tx}+1-\ell}\right)^2} = 0.
\]

Proof. See Appendix 3.B.

Intuitively, the theorem states that as long as the eigenvalues of the transmit correlation matrix are not too skewed (so that the law of large numbers is applicable), the normalized channel eigenvalues asymptotically converge. Therefore, the first-order approximations to the normalized channel eigenvalues are valid for large $s$, and these are used to derive the second-order statistics. Some examples of families of transmit correlation matrices which satisfy the skewness constraints in Proposition 3.2.1.1 are discussed in Appendix 3.D. Though the presented results are asymptotic, one is also interested in how quickly the terms in (3.3)-(3.5) converge to their first-order approximations as a function of $s$.
It is worth mentioning that for the special case of a single receive antenna ($M_{rx} = 1$), (3.3)-(3.5) are exact for all values of $s$. For larger values of $M_{rx}$, the convergence of the first-order approximations to results from Monte-Carlo simulations is studied in Fig. 3.1 for a sample restricted precoded system. Here, the unordered eigenvalues of $\boldsymbol{\Lambda}_i$ are compared with the ordered eigenvalues obtained from Monte-Carlo simulations. Such a comparison is reasonable if the overlap between the marginal distributions of the unordered eigenvalues is low, i.e., if the eigenvalues of $\mathbf{R}_{rx}$ are sufficiently well separated and if $s \gg 1$. A comparison of the asymptotic first-order expression (3.3) to Monte-Carlo simulation results is shown in Fig. 3.1a. It shows that while the convergence of (3.3) is very quick with $K_{tx}$ for large eigenvalues, it is slower for the smaller eigenvalues.

Figure 3.1: Convergence of normalized channel eigenvalues to their first-order approximations for a restricted precoded system, as a function of $K_{tx}$: (a) Eigenvalue estimation: compares the mean square error $E\{[\lambda^i_m - \dot{\lambda}^i_m]^2\}$ normalized by $K^{11}_{mm}$, as a function of $K_{tx}$, for different values of the transmit correlation coefficient $\alpha$ (curves for $m = 1, 2$ and $\alpha = 0.9, 0.5, 0.2$, covering the larger and the smaller eigenvalue); (b) Eigenvalue statistics: compares $\mu^i_m$, $K^{ij}_{mn}$, $\dot{\mu}^i_m$, $\dot{K}^{ij}_{mn}$ as a function of $K_{tx}$ for $\alpha = 0.5$. System parameters: $M_{tx} = 2K_{tx}$, $M_{rx} = 2$, $[\mathbf{R}_{tx}]_{ab} = \alpha^{|a-b|}$, $[\mathbf{R}_{rx}]_{ab} = (0.5)^{|a-b|}$, $\mathbf{Q}_1 = [\mathbf{I}_{M_{tx}}]_{c\{1:K_{tx}\}}$ and $\mathbf{Q}_2 = [\mathbf{I}_{M_{tx}}]_{c\{1+\lfloor K_{tx}/2 \rfloor \,:\, \lfloor 3K_{tx}/2 \rfloor\}}$.

A comparison of the approximate second-order eigen statistics to Monte-Carlo simulations as a function of $s$ is presented in Fig. 3.1b. It shows that the second-order statistics match even for $K_{tx} = 2$, validating quick convergence.
Similar results have been observed for a wide variety of system parameters. The seemingly slow convergence of $\dot{\lambda}^i_m$ for small eigenvalues in Fig. 3.1a and of $\mu^i_m$ in Fig. 3.1b is a result of the comparison of ordered with unordered eigenvalues.

Approximation 3.2.1. Due to the accuracy and quick convergence of the first-order approximations, the $\dot{X}$'s shall be used in place of the $X$'s in (3.3)-(3.5), even in the non-asymptotic regime, i.e., for finite values of $s$.

3.2.2 Joint Distribution of eigenvalues

In this section the joint distribution of the normalized channel eigenvalues shall be found. Since the actual distribution is hard to characterize, the asymptotic joint distribution shall be found first and approximations for finite values of $K_{tx}$ shall be considered later. The following theorem gives partial results on the asymptotic joint distribution of eigenvalues:

Theorem 3.2.2. Consider a family of transmit correlation matrices $\mathbf{R}_{tx}$ and a family of precoding matrices $\mathcal{Q} = \{\mathbf{Q}_1, \ldots, \mathbf{Q}_{|\mathcal{Q}|}\}$ as a function of $s$. Then the vector of eigenvalues $\mathbf{v} = [\lambda^{i_1}_{m_1}, \lambda^{i_2}_{m_2}, \ldots, \lambda^{i_L}_{m_L}]$, for any finite $L$, $1 \le i_1, \ldots, i_L \le |\mathcal{Q}|$ and $1 \le m_1, \ldots, m_L \le M_{rx}$, is jointly Gaussian distributed as $s \to \infty$, with second-order statistics as given in Theorem 3.2.1, if:
1. The eigenvalues of $\mathbf{R}_{rx}$ are all distinct.
2. $\mathbf{Q}_{i_\ell} = \mathbf{U}_{i_\ell} \otimes \mathbf{V}$ for all $1 \le \ell \le L$, where $\otimes$ denotes the Kronecker product, $\mathbf{V}$ is an $s \times s$ unitary matrix and $\mathbf{U}_{i_\ell}$ is any fixed $M_o \times K_o$ semi-unitary matrix.
3. The transmit correlation matrix eigenvalues satisfy $\lim_{s \to \infty} \frac{(\lambda^{tx}_1)^2}{\sum_{k=1}^{s} (\lambda^{tx}_{M_{tx}+1-k})^2} = 0$.

Proof. See Appendix 3.C.
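The second-order statistics that Theorem 3.2.2 inherits from Theorem 3.2.1 can be checked numerically. The Python sketch below is a minimal Monte-Carlo experiment, assuming a single receive antenna ($M_{rx} = 1$ with $\mathbf{R}_{rx} = 1$, the case for which (3.3)-(3.5) are stated to be exact), an exponential transmit correlation model and two overlapping antenna-selection precoders. All variable names and parameter values are illustrative assumptions rather than the settings behind the figures of this chapter.

```python
import numpy as np

rng = np.random.default_rng(1)
M_tx, K_tx, n_trials = 16, 8, 20000       # hypothetical sizes

# Assumed transmit correlation model: [R_tx]_{ab} = 0.5^{|a-b|}.
idx = np.arange(M_tx)
R_tx = 0.5 ** np.abs(idx[:, None] - idx[None, :])
w, V = np.linalg.eigh(R_tx)
R_half = (V * np.sqrt(w)) @ V.T           # symmetric square root of R_tx

# Two antenna-selection precoders (column subsets of the identity)
# that share half of their columns.
I = np.eye(M_tx)
codebook = [I[:, :K_tx], I[:, K_tx // 2: K_tx // 2 + K_tx]]
traces = [np.trace(Q.T @ R_tx @ Q) for Q in codebook]

# Each row of H is one i.i.d. CN(0,1) channel draw (M_rx = 1, R_rx = 1),
# and the SAME draw is used for both precoders: this is the coupling.
H = (rng.standard_normal((n_trials, M_tx))
     + 1j * rng.standard_normal((n_trials, M_tx))) / np.sqrt(2.0)
H_eff = H @ R_half

# Normalized channel eigenvalue: the scalar J_i when M_rx = 1.
lam = np.stack([np.sum(np.abs(H_eff @ Q) ** 2, axis=1) / t
                for Q, t in zip(codebook, traces)])

emp_mean = lam.mean(axis=1)               # compare with (3.4)
emp_cov = np.cov(lam)                     # compare with (3.5)

pred_mean = np.ones(len(codebook))        # lambda^rx_1 = 1
pred_cov = np.array([[np.linalg.norm(Qj.T @ R_tx @ Qi, "fro") ** 2 / (ti * tj)
                      for Qj, tj in zip(codebook, traces)]
                     for Qi, ti in zip(codebook, traces)])

print("empirical mean      :", emp_mean)
print("empirical covariance:\n", emp_cov)
print("predicted covariance:\n", pred_cov)
```

The off-diagonal entry of the covariance is non-zero only because both precoders act on the same channel draw, which is the coupling effect that the joint statistics in (3.5) are meant to capture.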
Figure 3.2: Asymptotic convergence of the distribution of normalized channel eigenvalues for a restricted precoded system, as a function of $K_{tx}$: (a) Marginal PDF: plots the empirical probability distribution of the eigenvalue $\lambda^1_1$; (b) Skewness-Kurtosis: compares the kurtosis and skewness of the bi-variate random vectors $\mathbf{v} = \{\lambda^1_1, \lambda^2_1\}$ and $\mathbf{v} = \{\log\lambda^1_1, \log\lambda^2_1\}$ to a bi-variate Gaussian with the same second-order statistics. System parameters: $M_{tx} = 2K_{tx}$, $M_{rx} = 2$, $[\mathbf{R}_{tx}]_{ab} = (0.5)^{|a-b|}$, $[\mathbf{R}_{rx}]_{ab} = (0.5)^{|a-b|}$, $\mathcal{Q} = \{\mathbf{Q}_1, \mathbf{Q}_2\}$ where $\mathbf{Q}_1 = [\mathbf{I}_{M_{tx}}]_{c\{1:K_{tx}\}}$, $\mathbf{Q}_2 = [\mathbf{I}_{M_{tx}}]_{c\{1+\lfloor K_{tx}/2 \rfloor \,:\, \lfloor 3K_{tx}/2 \rfloor\}}$, and 5000 samples.

Some examples of families of transmit correlation matrices which satisfy the skewness constraint (condition 3) are discussed in Appendix 3.D. Intuitively, the theorem states that if the eigenvalues of the transmit correlation matrix are not too skewed (so that Lyapunov's central limit theorem is applicable), then the normalized channel eigenvalues corresponding to the precoding matrices that are sufficiently well separated in their column space are asymptotically jointly Gaussian. To characterize this convergence, the empirical distribution of the normalized channel eigenvalues for a sample restricted precoded system is plotted in Fig. 3.2a for different values of $K_{tx}$. Following the approach in [156], to test the joint normality, the kurtosis and skewness of a vector of eigenvalues $\mathbf{v}$ (corresponding to a well-separated codebook) are plotted in Fig. 3.2b as a function of $K_{tx}$. From [156], for a large sample set from a $p$-variate Gaussian distribution, the skewness and kurtosis converge to the values of $0$ and $p(p+2)$, respectively. Both figures suggest that though the joint distribution is asymptotically Gaussian, the convergence is very slow.
Such large values of $K_{tx}$ may be impractical and therefore other approximations to the joint distribution are required in the finite antenna regime.

In this chapter, a joint lognormal distribution is proposed as an approximation for the normalized channel eigenvalue distribution in the finite antenna regime. One can observe from Fig. 3.2a that in the non-asymptotic regime, a lognormal distribution may indeed be a better fit for the marginal distribution. In Fig. 3.2b, the kurtosis and skewness for the logarithm of the eigenvalues are also depicted. The quick convergence of these parameters with $K_{tx}$ provides further credence to this hypothesis. Apart from ensuring that the eigenvalues are always non-negative, a joint-lognormal approximation for the eigenvalues also ensures that the instantaneous hSNR capacity is Gaussian distributed in the high SNR regime. This is an intuitively pleasing result and is consistent with prior literature [157,158].

Approximation 3.2.2. In the rest of the chapter, the normalized channel eigenvalues for any set of precoding matrices shall be approximated to be jointly lognormally distributed in the non-asymptotic regime.

Unlike in Theorem 3.2.2, which considers only precoding matrices that are sufficiently well separated, here the eigenvalues corresponding to any set of precoding matrices are approximated to be jointly lognormally distributed. The validity of this approximation is studied in Fig. 3.3, wherein the skewness and kurtosis for the eigenvalues corresponding to all precoding matrices of a sample antenna selection system are plotted. These results also show that a jointly lognormal distribution is a better fit than a Gaussian fit for the normalized channel eigenvalues. However, even for the logarithm of the eigenvalues, the skewness and kurtosis values deviate partially from those of a Gaussian distribution, thereby suggesting that Approx. 3.2.2 is not very accurate. Nevertheless, this approximation is needed for analytical tractability. In Sec.
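3.3, it is demonstrated that the resulting approximation error in estimating the instantaneous hSNR capacity is relatively small.

Figure 3.3: Comparison of the skewness and kurtosis of the set of normalized channel eigenvalues $\mathbf{v} = \{\lambda^1_1, \ldots, \lambda^{|\mathcal{Q}|}_1\}$ and their logarithms $\mathbf{v} = \{\log\lambda^1_1, \ldots, \log\lambda^{|\mathcal{Q}|}_1\}$ to a Gaussian distribution with the same second-order statistics, in an antenna selection system. System parameters: $M_{tx} = 2K_{tx}$, $M_{rx} = 1$, $[\mathbf{R}_{tx}]_{ab} = (0.5)^{|a-b|}$, $[\mathbf{R}_{rx}]_{ab} = (0.5)^{|a-b|}$, $\mathcal{Q} = \{\mathbf{Q} \mid \mathbf{Q} \text{ is an } M_{tx} \times K_{tx} \text{ submatrix of } \mathbf{I}_{M_{tx}}\}$, and 10000 samples (footnote 3).

3.3 Instantaneous hSNR Capacity Analysis

Note from (3.2) that the instantaneous hSNR system capacity can be expressed as:
\begin{align}
C(\tilde{\mathbf{H}}) = \max_{1 \le i \le |\mathcal{Q}|} \{C_i(\tilde{\mathbf{H}})\} \tag{3.6}
\end{align}
where the individual hSNR instantaneous capacities for $i = 1, \ldots, |\mathcal{Q}|$ can be expressed in the form $C_i(\tilde{\mathbf{H}}) \triangleq \sum_{m=1}^{M_{rx}} \log\left(1 + \frac{\rho\,\mathrm{Tr}\{\mathbf{Q}_i^{\dagger}\mathbf{R}_{tx}\mathbf{Q}_i\}}{M_{rx}}\lambda^i_m\right)$. For sufficiently large values of $\frac{\rho\,\mathrm{Tr}\{\mathbf{Q}_i^{\dagger}\mathbf{R}_{tx}\mathbf{Q}_i\}}{M_{rx}}$, i.e., in the high SNR regime, one can approximate:
\begin{align}
C_i(\tilde{\mathbf{H}}) \approx \sum_{m=1}^{M_{rx}} \log\left( \frac{\rho\,\mathrm{Tr}\{\mathbf{Q}_i^{\dagger}\mathbf{R}_{tx}\mathbf{Q}_i\}}{M_{rx}}\lambda^i_m \right) \tag{3.7}
\end{align}
Now, from Approx. 3.2.2, for sufficiently high SNR one obtains:
\begin{align}
\{C_1(\tilde{\mathbf{H}}), \ldots, C_{|\mathcal{Q}|}(\tilde{\mathbf{H}})\} \sim \text{Jointly Gaussian} \tag{3.8}
\end{align}
Additionally, using Approx. 3.2.1, (3.7) and results on the second-order statistics of a lognormal random vector [159], one can easily show that:
\begin{align}
\mu_{C_i} &\triangleq E\{C_i(\tilde{\mathbf{H}})\} \approx \sum_{m=1}^{M_{rx}} \log\left[ \frac{\rho\,\mathrm{Tr}\{\mathbf{Q}_i^{\dagger}\mathbf{R}_{tx}\mathbf{Q}_i\}\,(\dot{\mu}^i_m)^2}{M_{rx}\sqrt{(\dot{\mu}^i_m)^2 + \dot{K}^{ii}_{mm}}} \right] \tag{3.9} \\
\sigma_{ij} &\triangleq E\{C_i(\tilde{\mathbf{H}})C_j(\tilde{\mathbf{H}})\} - E\{C_i(\tilde{\mathbf{H}})\}E\{C_j(\tilde{\mathbf{H}})\} \approx \sum_{m,n} \log\left[ \frac{\dot{\mu}^i_m\dot{\mu}^j_n + \dot{K}^{ij}_{mn}}{\dot{\mu}^i_m\dot{\mu}^j_n} \right] = \sum_{m=1}^{M_{rx}} \log\left[ \frac{\dot{\mu}^i_m\dot{\mu}^j_m + \dot{K}^{ij}_{mm}}{\dot{\mu}^i_m\dot{\mu}^j_m} \right] \tag{3.10}
\end{align}
A comparison of the approximate joint statistics and marginal distribution of $C_i(\tilde{\mathbf{H}})$ to Monte-Carlo simulations for a sample antenna selection system is given in Fig. 3.4, as a function of $K_{tx}$. The results show that the approximations are tight for $K_{tx} \ge 4$. In Fig.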
3.4c, the skewness and kurtosis of the vector of individual capacities corresponding to all precoding matrices for the antenna selection system are studied. The close fit to a Gaussian distribution suggests that the impact of Approx. 3.2.2 on the instantaneous hSNR capacity is small. Similar results have been observed for a wide variety of system parameters.

3.3.1 hSNR System capacity

Note that (3.8)-(3.10) provide a model that fully characterizes the joint distribution of the individual hSNR capacities $C_i(\tilde{\mathbf{H}})$. Therefore, from (3.6) and (3.8), the instantaneous hSNR capacity $C(\tilde{\mathbf{H}})$ can be expressed as the largest element of a correlated Gaussian vector.

Footnote 3: Though $\mathbf{v}$ is a $\binom{M_{tx}}{K_{tx}}$-dimensional random vector, the kurtosis and skewness are computed only for the dominant subspace, which has $M_{tx}$ principal components.

Figure 3.4: Individual hSNR capacity distribution of an antenna selection system: (a) Joint statistics of $C_i(\tilde{\mathbf{H}})$: compares the joint second-order statistics of the capacities across different precoding matrices as a function of $K_{tx}$; (b) PDF of $C_1(\tilde{\mathbf{H}})$: compares the empirical PDF of $C_1(\tilde{\mathbf{H}})$ (instantaneous hSNR capacity in nats/s/Hz) to a Gaussian distribution with mean and variance as given by (3.9)-(3.10); (c) Skewness-Kurtosis: compares the skewness and kurtosis of the set of channel capacities $\mathbf{v} = \{C_1(\tilde{\mathbf{H}}), \ldots, C_{|\mathcal{Q}|}(\tilde{\mathbf{H}})\}$ to a Gaussian distribution with the same second-order statistics. System parameters: $M_{tx} = 2K_{tx}$, $M_{rx} = 2$, $\rho = 10$, $[\mathbf{R}_{tx}]_{ab} = (0.5)^{|a-b|}$, $[\mathbf{R}_{rx}]_{ab} = (0.5)^{|a-b|}$, $\mathbf{Q}_1 = [\mathbf{I}_{M_{tx}}]_{c\{1:K_{tx}\}}$, $\mathbf{Q}_2 = [\mathbf{I}_{M_{tx}}]_{c\{1+\lfloor K_{tx}/2 \rfloor \,:\, \lfloor 3K_{tx}/2 \rfloor\}}$, $\mathbf{Q}_3 = [\mathbf{I}_{M_{tx}}]_{c\{2:(K_{tx}+1)\}}$, $\mathcal{Q} = \{\mathbf{Q} \mid \mathbf{Q} \text{ is an } M_{tx} \times K_{tx} \text{ submatrix of } \mathbf{I}_{M_{tx}}\}$, and 10000 samples (footnote 3).

For the maximum of a set of correlated Gaussian random variables, neither the exact distribution nor even the mean is known in closed form.
Existing methods [160] to solve for them are too cumbersome, especially when $|\mathcal{Q}|$ is large. Though several bounds exist on the mean [161-165], they are not uniformly tight across all correlation structures. On the other hand, numerical approaches like those in [166,167] are recursive and therefore are likely to accumulate a significant amount of error when the number of variables $|\mathcal{Q}|$ is large. This is specifically relevant to our scenario since the codebook size $|\mathcal{Q}|$ can be very large. Therefore, a new approach to compute the distribution and mean of the largest element of a correlated Gaussian vector shall be formulated here. Note that:
\begin{align}
C(\tilde{\mathbf{H}}) &= \left\| [C_1(\tilde{\mathbf{H}}), \ldots, C_{|\mathcal{Q}|}(\tilde{\mathbf{H}})] \right\|_{\infty} \nonumber \\
&= \log \left\| [e^{C_1(\tilde{\mathbf{H}})}, \ldots, e^{C_{|\mathcal{Q}|}(\tilde{\mathbf{H}})}] \right\|_{\infty} \nonumber \\
&\approx \log \left\| [e^{C_1(\tilde{\mathbf{H}})}, \ldots, e^{C_{|\mathcal{Q}|}(\tilde{\mathbf{H}})}] \right\|_{p} \quad \text{for } p \gg \log|\mathcal{Q}| \nonumber \\
&= \frac{1}{p} \log\left[ \sum_{i=1}^{|\mathcal{Q}|} e^{pC_i(\tilde{\mathbf{H}})} \right] \tag{3.11}
\end{align}
where the second last step follows from the norm inequalities $L^{-1/p}\|\mathbf{a}\|_p \le \|\mathbf{a}\|_{\infty} \le \|\mathbf{a}\|_p$ for any vector $\mathbf{a}$ of length $L$. Since $e^{pC_i(\tilde{\mathbf{H}})} \approx \prod_{m=1}^{M_{rx}} \left[ \frac{\rho\,\mathrm{Tr}\{\mathbf{Q}_i^{\dagger}\mathbf{R}_{tx}\mathbf{Q}_i\}\,\lambda^i_m}{M_{rx}} \right]^p$ is lognormally distributed, (3.11) shows that the largest element of a correlated Gaussian vector can be approximately represented as the logarithm of a sum of correlated lognormals.

3.3.2 Sum of correlated lognormals

It is well known that the sum of correlated lognormal random variables is approximately lognormal (see [168] and references therein). Of the many approximations for characterizing this sum, the moment and cumulant matching approaches, such as [169, 170], yield a poor fit in the lower tail regions of the distribution. On the other hand, the moment generating function matching approaches, like [171], are too cumbersome when the number of variables $|\mathcal{Q}|$ is large. Here, it is proposed to use the approach in [172] (reproduced here as Algorithm 3.1), which extends the work in [173] to the correlated case. Similar to [166], this algorithm is recursive and therefore also shares the same drawback of accumulating error.
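A minimal Python sketch of such a recursion is given below, following the update equations listed in Algorithm 3.1 applied to the $p$-norm form (3.11). It is an illustration under stated assumptions and not the implementation of [172]: the quantities $G_1$, $G_2$ and $G_3$ are taken here to be $E\{f(w)\}$, $E\{f^2(w)\}$ and $E\{(w - \mu_w) f(w)\}$ with $f(w) = \log(1 + e^w)$ for a Gaussian $w$, and are evaluated by Gauss-Hermite quadrature instead of the expressions in the Appendix of [172]; a guard is added for a zero-variance difference term; and the function names and example values are hypothetical.

```python
import numpy as np

# Gauss-Hermite nodes/weights for integrals against exp(-x^2).
_X, _W = np.polynomial.hermite.hermgauss(64)

def _g_moments(mu_w, var_w):
    """G1, G2, G3 for f(w) = log(1 + e^w), w ~ N(mu_w, var_w) (quadrature)."""
    w = mu_w + np.sqrt(2.0 * var_w) * _X
    f = np.logaddexp(0.0, w)                  # stable log(1 + e^w)
    wt = _W / np.sqrt(np.pi)
    return np.sum(wt * f), np.sum(wt * f ** 2), np.sum(wt * (w - mu_w) * f)

def max_gaussian_stats(mu_c, sigma, p):
    """Approximate mean/variance of max_i C_i via (1/p) log sum_i exp(p C_i).

    mu_c  : means of the jointly Gaussian C_i
    sigma : covariance matrix of the C_i
    p     : norm order in (3.11)
    """
    mu_c = np.asarray(mu_c, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    mu_s = p * mu_c[0]                        # mean of log-sum so far
    var_s = p ** 2 * sigma[0, 0]              # variance of log-sum so far
    q = p ** 2 * sigma[0].copy()              # Q(i, j) = Cov(s(i), p C_j)
    for i in range(1, mu_c.size):
        mu_w = p * mu_c[i] - mu_s
        var_w = p ** 2 * sigma[i, i] + var_s - 2.0 * q[i]
        if var_w <= 1e-12:                    # guard: w is deterministic
            mu_s += np.logaddexp(0.0, mu_w)
            continue
        g1, g2, g3 = _g_moments(mu_w, var_w)
        r = g3 / var_w
        var_new = var_s - g1 ** 2 + g2 + 2.0 * (q[i] - var_s) * r
        q = q * (1.0 - r) + p ** 2 * sigma[i] * r
        mu_s, var_s = mu_s + g1, var_new
    return mu_s / p, var_s / p ** 2

# Example: 40 equi-correlated Gaussians with mean 1 and covariance
# 0.5 + delta_ij, using p = 8 (values chosen only for illustration).
n = 40
mean_max, var_max = max_gaussian_stats(np.ones(n), 0.5 + np.eye(n), p=8)
print(f"approx. mean of max: {mean_max:.3f}, variance: {var_max:.3f}")
```

For two variables the recursion involves no lognormal approximation of an intermediate sum, so any error there comes from the quadrature and from the $p$-norm step alone; the accumulation of error discussed in the text arises only from the third variable onwards.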
However, the drawback is a by-product of the algorithm in [172] and not of our approach in (3.11). Any new results on the sum of correlated lognormals can readily be used to resolve this drawback. To check the goodness of fit, the empirical distribution of the largest element of a sample Gaussian vector and its $p$-norm approximation are compared to the distributions obtained using Algorithm 3.1, Clark [167] and second-order moment-matching [169] in Fig. 3.5. The results show that both Clark and Algorithm 3.1 give good approximations to the distribution of the largest element. In summary, the instantaneous hSNR capacity $C(\tilde{\mathbf{H}})$ can be approximated as a Gaussian random variable and its mean and variance can be computed via Algorithm 3.1.

Algorithm 3.1: Statistics of the largest element of a correlated Gaussian vector
Inputs: $p$, $\mu_{C_i}$, $\sigma_{ij}$ for all $1 \le i, j \le |\mathcal{Q}|$ {Defined as in (3.9)-(3.11).}
$\mu_w(1) = \mu_s(1) = p\,\mu_{C_1}$
$\sigma^2_w(1) = \sigma^2_s(1) = p^2\sigma_{11}$
$Q(1,\cdot) = p^2\sigma_{1\cdot}$
for $i = 2$ to $|\mathcal{Q}|$ do
    $\mu_w(i) = p\,\mu_{C_i} - \mu_s(i-1)$
    $\sigma^2_w(i) = p^2\sigma_{ii} + \sigma^2_s(i-1) - 2Q(i-1,i)$
    $\mu_s(i) = \mu_s(i-1) + G_1(\sigma_w(i), \mu_w(i))$
    $\sigma^2_s(i) = \sigma^2_s(i-1) - G_1^2(\sigma_w(i), \mu_w(i)) + G_2(\sigma_w(i), \mu_w(i)) + 2\left[Q(i-1,i) - \sigma^2_s(i-1)\right]\frac{G_3(\sigma_w(i), \mu_w(i))}{\sigma^2_w(i)}$
    for $j = 1$ to $|\mathcal{Q}|$ do
        $Q(i,j) = Q(i-1,j)\left[1 - \frac{G_3(\sigma_w(i), \mu_w(i))}{\sigma^2_w(i)}\right] + p^2\sigma_{ij}\frac{G_3(\sigma_w(i), \mu_w(i))}{\sigma^2_w(i)}$
    end for
end for
{$G_1(\cdot,\cdot)$, $G_2(\cdot,\cdot)$, $G_3(\cdot,\cdot)$ are as defined in the Appendix of [172].}
return $\mu_s(|\mathcal{Q}|)/p$ {Mean of $C(\tilde{\mathbf{H}})$}
return $\sigma^2_s(|\mathcal{Q}|)/p^2$ {Variance of $C(\tilde{\mathbf{H}})$}

Figure 3.5: Distribution of the largest element of a Gaussian vector (PDF): $\mathbf{x}$ is a jointly Gaussian vector and $\mathbf{y} \triangleq \exp[\mathbf{x}]$ (element-wise); Clark and moment-match refer to the solutions from [167] and [169], respectively. Simulation parameters: $|X| = 40$, $E\{X_i\} = 1$, $E\{X_iX_j\} = 1.5 + \delta_{ij}$, $p = 8$.

3.4 Simulation Results

Using the results derived in the previous sections, the system hSNR capacity of several HBwS systems shall now be analyzed. The simulation layout considers a single-UE cellular downlink channel operating at 2.4 GHz. Both the transmitter and receiver have a uniform linear antenna array with antenna spacings of $d_{tx} = 5$ cm and $d_{rx} = 2$ cm, respectively (unless otherwise stated). The transmitter experiences a Laplacian power angle spectrum (PAS) with mean angle of arrival (AoA) $= \pi/6$ rads and an angle spread (AS) of $\pi/10$ rads. The receiver on the other hand experiences a uniform power angle spectrum (footnote 4).

Footnote 4: The Tx/Rx correlation matrix is calculated as $[\mathbf{R}_x]_{ab} = \int \mathrm{PAS}(\theta)\,e^{2\pi j(a-b)d_x \sin\theta/\lambda}\,d\theta \,\big/ \int \mathrm{PAS}(\theta)\,d\theta$, where $j = \sqrt{-1}$, $\lambda$ is the wavelength at 2.4 GHz and $x$ = tx/rx. All arrays and multipath components are in the horizontal plane.

As a first simple example, an antenna selection system is considered in Fig. 3.6. In Fig. 3.6a, the PDF of the instantaneous hSNR capacity as computed by Algorithm 3.1 is compared to Monte-Carlo simulations. To quantify the origin of
(b) Tx antenna spacing Figure 3.6: Comparison of hSNR capacity of an antenna selection system as predicted by Algorithm3.1 to Monte-Carlo simulations and Gauss-approx (a) Plots PDF of instantaneous hSNR capacity for K tx = 2; 6 (b) Plots the mean hSNR capacity as a function of transmit antenna spacing (K tx = 2) system parameters: M tx = 2K, M rx = 2, SNR = 10 the mismatch in distributions, the PDF of the largest element of a Gaussian vector with the second-order statistics given by (3.9)-(3.10) is also plotted, labeled as Gauss-approx. The gap between Monte-Carlo and Gauss-approx quanties the error due to inaccuracy of Approx. 3.2.1 and 3.2.2. One observes that this gap does not increase much with K tx . On the other hand, the gap between Gauss-approx and Algorithm 1 quanties the error due to inaccuracy of the approach in [172]. This gap increases withK tx , owing to the error accumulation in the recursive steps of [172] for large codebooks jQj = Mtx Ktx . In Fig. 3.6b, the impact of transmit antenna spacing on the mean hSNR capacity is compared. As seen from the results, though Algorithm 1 overestimates the capacity, it accurately re ects the impact of system parameters like antenna spacing on the mean hSNR capacity. 3.5 Application to beamformer design for HBwS In this section, the analog beamformer design T of a beam selection system (a special case of HBwS) is optimized. While several heuristic designs for the beamformer T have been proposed in literature for correlated channels, for example the DFT, or spatial correlation matrix based designs [37,41], nding the mean capacity maximizing beamformer design is an open problem. This is partly because no uniformly tight (over the space of beamformers) bounds or approxima- tions to the instantaneous/mean hSNR capacity are available for a correlated channel. 
In this section, using Algorithm 3.1 as the objective function, a numerical-gradient ascent algorithm is used to search for an analog beamformer $\mathbf{T}$ that maximizes the mean hSNR capacity. Here, the gradient of the mean hSNR capacity (as predicted by Algorithm 3.1) with respect to $\mathbf{T}$ is computed numerically. For ease of pictorial representation, a setting with $M_{\rm tx} = 3$, $K_{\rm tx} = M_{\rm rx} = 1$ is considered, with the other parameters the same as in Sec. 3.4. The beam-patterns formed by the precoding vectors $\mathbf{Q}_i$ of the optimized beamformer are compared to the DFT beamformer in Fig. 3.7. The estimated PAS (see footnote 5) is also plotted for comparison. As the results show, unlike the DFT beamformer that is designed for generic correlated channels, the optimized beamformer here adapts to the UE PAS, leading to an increase in the mean hSNR capacity from 2.74 to 2.91 nats/s/Hz.

Figure 3.7: Comparison of beam-patterns for the optimized beamformer and the DFT beamformer. (a) Optimized beamformer, $\mathbb{E}\{C(\tilde{\mathbf{H}})\} = 2.91$ nats/s/Hz. (b) DFT beamformer, $\mathbb{E}\{C(\tilde{\mathbf{H}})\} = 2.74$ nats/s/Hz. System parameters: $M_{\rm tx} = 3$, $K_{\rm tx} = M_{\rm rx} = 1$, SNR = 10.

3.6 Summary and Conclusions

This chapter analyzes the capacity of a special class of MIMO systems called restricted precoded systems. Many practically relevant systems like HBwS and LFP fall under this class. It is shown that the instantaneous hSNR capacity of restricted precoded systems can be expressed as a function of the eigenvalues of a set of coupled doubly-correlated Wishart matrices. The eigenvalues are shown to be jointly Gaussian in the large antenna limit, if a set of conditions on the transmit correlation and codebook are satisfied. The asymptotic second-order statistics of the eigenvalues are also derived, and the simulations suggest that their convergence is very quick. It is proposed, and verified, that in the finite antenna regime a joint-lognormal distribution is a better fit to the eigenvalue distribution.
Using these results, and a few simplifying approximations, it is shown that the instantaneous hSNR capacity for a restricted precoded system can be approximated as the largest element of a correlated Gaussian vector. A new approach for characterizing its distribution is proposed and, in the process, it is shown that the problem of finding the distribution of the sum of lognormals and the problem of finding the distribution of the largest element of a Gaussian vector are equivalent. Simulation results suggest that the proposed algorithm slightly overestimates the hSNR capacity but accurately predicts the dependence of capacity on system parameters like the number of antennas and antenna spacing. One also observes that a significant portion of the mismatch comes from error accumulation in the recursive steps of the proposed algorithm. Any non-recursive approach to characterize the sum of lognormals can be useful in tackling this problem. It is also demonstrated, via an example, that the proposed algorithm can be used in the design of near-optimal analog beamformers for a beam selection or, more generally, a HBwS system.

Footnote 5: $\mathrm{Est.PAS}(\theta) = \sum_{a,b} [\mathbf{R}_{\rm tx}]_{ab}\, e^{2\pi j (a-b) d_{\rm tx} \sin(\theta)/\lambda}$, where $j = \sqrt{-1}$ and $\lambda$ is the wavelength at 2.4 GHz.

3.A Appendix A

(Proof of Theorem 3.2.1). For any value of $M_{\rm tx}$, $K_{\rm tx}$ and $\forall i \in \{1,..,|\mathcal{Q}|\}$, from (2.3a) we have:
\[
\mathbf{J}_i = \frac{\tilde{\mathbf{H}} \mathbf{Q}_i \mathbf{Q}_i^{\dagger} \tilde{\mathbf{H}}^{\dagger}}{\mathrm{Tr}\{\mathbf{Q}_i^{\dagger} \mathbf{R}_{\rm tx} \mathbf{Q}_i\}}
= \frac{\mathbf{R}_{\rm rx}^{1/2} \mathbf{H} \mathbf{R}_{\rm tx}^{1/2} \mathbf{Q}_i \mathbf{Q}_i^{\dagger} \mathbf{R}_{\rm tx}^{1/2} \mathbf{H}^{\dagger} \mathbf{R}_{\rm rx}^{1/2}}{\mathrm{Tr}\{\mathbf{Q}_i^{\dagger} \mathbf{R}_{\rm tx} \mathbf{Q}_i\}}
= \mathbf{R}_{\rm rx}^{1/2} \left[ \sum_{k=1}^{K_{\rm tx}} \hat{\mathbf{h}}^i_k [\hat{\mathbf{h}}^i_k]^{\dagger} \right] \mathbf{R}_{\rm rx}^{1/2} \tag{3.12}
\]
where $\hat{\mathbf{h}}^i_k = \mathbf{H} \mathbf{R}_{\rm tx}^{1/2} [\mathbf{Q}_i]_{c\{k\}} \big/ \sqrt{\mathrm{Tr}\{\mathbf{Q}_i^{\dagger} \mathbf{R}_{\rm tx} \mathbf{Q}_i\}}$.
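Defining $\hat{\mathbf{J}}_i = \sum_{k=1}^{K_{\rm tx}} \hat{\mathbf{h}}^i_k [\hat{\mathbf{h}}^i_k]^{\dagger}$ and taking expectations, one obtains:
\[
\mathbb{E}\{[\hat{\mathbf{J}}_i]_{ab}\}
= \sum_k \frac{\mathbb{E}\big\{ [\mathbf{H}]_{r\{a\}} \mathbf{R}_{\rm tx}^{1/2} [\mathbf{Q}_i]_{c\{k\}} [\mathbf{Q}_i]^{\dagger}_{c\{k\}} \mathbf{R}_{\rm tx}^{1/2} [\mathbf{H}]^{\dagger}_{r\{b\}} \big\}}{\mathrm{Tr}\{\mathbf{Q}_i^{\dagger} \mathbf{R}_{\rm tx} \mathbf{Q}_i\}}
= \sum_k \frac{[\mathbf{Q}_i]^{\dagger}_{c\{k\}} \mathbf{R}_{\rm tx}^{1/2}\, \mathbb{E}\big\{ [\mathbf{H}]^{\dagger}_{r\{b\}} [\mathbf{H}]_{r\{a\}} \big\}\, \mathbf{R}_{\rm tx}^{1/2} [\mathbf{Q}_i]_{c\{k\}}}{\mathrm{Tr}\{\mathbf{Q}_i^{\dagger} \mathbf{R}_{\rm tx} \mathbf{Q}_i\}}
= \delta_{ab} \tag{3.13}
\]
where $\delta_{ab} = 1$ if $a = b$ and $\delta_{ab} = 0$ if $a \neq b$ (the Kronecker delta function), and the last step follows from the fact that $\mathbf{H}$ has i.i.d. $\mathcal{CN}(0,1)$ entries. Similarly:
\begin{align}
\mathbb{E}\big\{ \big|[\hat{\mathbf{J}}_i]_{ab}\big|^2 \big\}
&= \sum_{k=1}^{K_{\rm tx}} \sum_{\ell=1}^{K_{\rm tx}} \mathbb{E}\big\{ [\hat{\mathbf{h}}^i_k]_a [\hat{\mathbf{h}}^i_k]^{\dagger}_b [\hat{\mathbf{h}}^i_\ell]_b [\hat{\mathbf{h}}^i_\ell]^{\dagger}_a \big\} \nonumber \\
&= \sum_{k,\ell} \Big[ \mathbb{E}\big\{ [\hat{\mathbf{h}}^i_k]_a [\hat{\mathbf{h}}^i_k]^{\dagger}_b \big\} \mathbb{E}\big\{ [\hat{\mathbf{h}}^i_\ell]_b [\hat{\mathbf{h}}^i_\ell]^{\dagger}_a \big\} + \mathbb{E}\big\{ [\hat{\mathbf{h}}^i_k]_a [\hat{\mathbf{h}}^i_\ell]^{\dagger}_a \big\} \mathbb{E}\big\{ [\hat{\mathbf{h}}^i_k]^{\dagger}_b [\hat{\mathbf{h}}^i_\ell]_b \big\} \Big] \tag{3.14a} \\
&= \big| \mathbb{E}\{[\hat{\mathbf{J}}_i]_{ab}\} \big|^2 + \sum_{k,\ell} \Big| \mathbb{E}\big\{ [\hat{\mathbf{h}}^i_k]^{\dagger}_b [\hat{\mathbf{h}}^i_\ell]_b \big\} \Big|^2 \tag{3.14b} \\
&= \big| \mathbb{E}\{[\hat{\mathbf{J}}_i]_{ab}\} \big|^2 + \sum_{k,\ell} \frac{\big| [\mathbf{Q}_i^{\dagger} \mathbf{R}_{\rm tx} \mathbf{Q}_i]_{\ell k} \big|^2}{\mathrm{Tr}\{\mathbf{Q}_i^{\dagger} \mathbf{R}_{\rm tx} \mathbf{Q}_i\}^2}
= \big| \mathbb{E}\{[\hat{\mathbf{J}}_i]_{ab}\} \big|^2 + \frac{\| \mathbf{Q}_i^{\dagger} \mathbf{R}_{\rm tx} \mathbf{Q}_i \|_F^2}{\mathrm{Tr}\{\mathbf{Q}_i^{\dagger} \mathbf{R}_{\rm tx} \mathbf{Q}_i\}^2} \tag{3.14c}
\end{align}
where (3.14a) follows from the result on the expectation of the product of four circularly symmetric jointly Gaussian random variables [174] and (3.14b) follows from the fact that the vector $\hat{\mathbf{h}}^i_k$ has i.i.d. entries $\forall k \in \{1,..,K_{\rm tx}\}$. Therefore, if $\lim_{s \to \infty} \| \mathbf{Q}_i^{\dagger} \mathbf{R}_{\rm tx} \mathbf{Q}_i \|_F \big/ \mathrm{Tr}\{\mathbf{Q}_i^{\dagger} \mathbf{R}_{\rm tx} \mathbf{Q}_i\} = 0$, from (3.12) and (3.14c) we have:
\[
\lim_{s \to \infty} \hat{\mathbf{J}}_i \overset{\rm ms}{=} \mathbf{I}_{M_{\rm rx}} \;\Rightarrow\; \lim_{s \to \infty} \mathbf{J}_i \overset{\rm ms}{=} \mathbf{R}_{\rm rx} \tag{3.15}
\]
where $\overset{\rm ms}{=}$ denotes element-wise mean square convergence. From the Hoffman-Wielandt inequality [175], there exists a permutation matrix $\mathbf{P}$ such that $\| \mathbf{P} \boldsymbol{\Lambda}_i \mathbf{P}^{\dagger} - \boldsymbol{\Lambda}_{\rm rx} \|_F \le \| \mathbf{J}_i - \mathbf{R}_{\rm rx} \|_F$, where $\boldsymbol{\Lambda}_i$, $\boldsymbol{\Lambda}_{\rm rx}$ are the eigenvalue matrices of $\mathbf{J}_i$, $\mathbf{R}_{\rm rx}$, respectively. Without loss of generality, assuming $\boldsymbol{\Lambda}_i$ is always picked in this permutation, we have:
\[
\lim_{s \to \infty} \boldsymbol{\Lambda}_i \overset{\rm ms}{=} \boldsymbol{\Lambda}_{\rm rx} \tag{3.16}
\]
Let $\mathbf{e}^i_m$ be an eigenvector of $\mathbf{J}_i$ corresponding to an eigenvalue $\lambda^i_m = [\boldsymbol{\Lambda}_i]_{m,m}$.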
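Now as $s \to \infty$:
\begin{align}
& \mathbf{J}_i \mathbf{e}^i_m = \lambda^i_m \mathbf{e}^i_m \tag{3.17a} \\
\Rightarrow\; & \mathbf{R}_{\rm rx} \mathbf{e}^i_m \overset{\rm ms}{=} \lambda^{\rm rx}_m \mathbf{e}^i_m \qquad \text{[From (3.15) and (3.16)]} \nonumber \\
\Rightarrow\; & \mathbf{e}^i_m \overset{\rm ms}{=} e^{j\theta} \mathbf{e}^{\rm rx}_m \tag{3.17b}
\end{align}
for some angle $\theta$ (which may be a function of $\mathbf{H}$), where (3.17b) follows from the fact that all the eigenvalues of $\mathbf{R}_{\rm rx}$ are distinct. Now since both the eigenvalues and eigenvectors converge in the mean square sense, following a similar procedure to [151], from (3.17a) we have:
\begin{align}
& [\mathbf{J}_i - \mathbf{R}_{\rm rx} + \mathbf{R}_{\rm rx}][\mathbf{e}^i_m - e^{j\theta}\mathbf{e}^{\rm rx}_m + e^{j\theta}\mathbf{e}^{\rm rx}_m] = [\lambda^i_m - \lambda^{\rm rx}_m + \lambda^{\rm rx}_m][\mathbf{e}^i_m - e^{j\theta}\mathbf{e}^{\rm rx}_m + e^{j\theta}\mathbf{e}^{\rm rx}_m] \nonumber \\
\Rightarrow\; & [\mathbf{J}_i - \mathbf{R}_{\rm rx}] e^{j\theta}\mathbf{e}^{\rm rx}_m + \mathbf{R}_{\rm rx}[\mathbf{e}^i_m - e^{j\theta}\mathbf{e}^{\rm rx}_m] \simeq [\lambda^i_m - \lambda^{\rm rx}_m] e^{j\theta}\mathbf{e}^{\rm rx}_m + \lambda^{\rm rx}_m [\mathbf{e}^i_m - e^{j\theta}\mathbf{e}^{\rm rx}_m] \tag{3.18a} \\
\Rightarrow\; & [\mathbf{e}^{\rm rx}_m]^{\dagger} [\mathbf{J}_i - \mathbf{R}_{\rm rx}] \mathbf{e}^{\rm rx}_m \simeq \lambda^i_m - \lambda^{\rm rx}_m \tag{3.18b} \\
\Rightarrow\; & \lambda^i_m \simeq [\mathbf{e}^{\rm rx}_m]^{\dagger} \mathbf{J}_i \mathbf{e}^{\rm rx}_m \tag{3.18c}
\end{align}
where, as in [151], "$\simeq$" denotes a first-order approximation (i.e., an equality in which the higher order terms that do not influence the asymptotic statistics of $\lambda^i_m$ as $s \to \infty$ are neglected). Note that (3.18a) follows by neglecting the higher order terms and (3.18b) follows by premultiplying both sides by $e^{-j\theta} [\mathbf{e}^{\rm rx}_m]^{\dagger}$. This proves the asymptotic first-order expression for the eigenvalues (3.3). By taking expectations on both sides of (3.18c), the asymptotic mean can be expressed as:
\[
\mu^i_m = \mathbb{E}\{\lambda^i_m\} \simeq [\mathbf{e}^{\rm rx}_m]^{\dagger}\, \mathbb{E}\{\mathbf{J}_i\}\, \mathbf{e}^{\rm rx}_m \simeq \lambda^{\rm rx}_m \qquad \text{[From (3.15)]} \tag{3.19}
\]
Similarly, for the cross-correlation we have:
\[
\mathbb{E}\{\lambda^i_m \lambda^j_n\} \simeq \mathbb{E}\big\{ [\mathbf{e}^{\rm rx}_m]^{\dagger} \mathbf{J}_i \mathbf{e}^{\rm rx}_m\, [\mathbf{e}^{\rm rx}_n]^{\dagger} \mathbf{J}_j \mathbf{e}^{\rm rx}_n \big\}
= \lambda^{\rm rx}_m \lambda^{\rm rx}_n \sum_{k=1}^{K_{\rm tx}} \sum_{\ell=1}^{K_{\rm tx}} \mathbb{E}\big\{ [\mathbf{e}^{\rm rx}_m]^{\dagger} \hat{\mathbf{h}}^i_k [\hat{\mathbf{h}}^i_k]^{\dagger} \mathbf{e}^{\rm rx}_m\, [\mathbf{e}^{\rm rx}_n]^{\dagger} \hat{\mathbf{h}}^j_\ell [\hat{\mathbf{h}}^j_\ell]^{\dagger} \mathbf{e}^{\rm rx}_n \big\} \tag{3.20}
\]
where the last step follows from (3.12). Notice that $[\mathbf{e}^{\rm rx}_m]^{\dagger} \hat{\mathbf{h}}^i_k$ are all circularly symmetric jointly Gaussian random variables for all $1 \le m \le M_{\rm rx}$, $1 \le i \le |\mathcal{Q}|$, $1 \le k \le K_{\rm tx}$.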
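From the result on the expectation of the product of four complex, circularly symmetric jointly Gaussian random variables [174], we have:
\begin{align}
\mathbb{E}\{\lambda^i_m \lambda^j_n\} \simeq{}& \lambda^{\rm rx}_m \lambda^{\rm rx}_n \sum_{k=1}^{K_{\rm tx}} \sum_{\ell=1}^{K_{\rm tx}} [\mathbf{e}^{\rm rx}_m]^{\dagger}\, \mathbb{E}\big\{ \hat{\mathbf{h}}^i_k [\hat{\mathbf{h}}^i_k]^{\dagger} \big\}\, \mathbf{e}^{\rm rx}_m\, [\mathbf{e}^{\rm rx}_n]^{\dagger}\, \mathbb{E}\big\{ \hat{\mathbf{h}}^j_\ell [\hat{\mathbf{h}}^j_\ell]^{\dagger} \big\}\, \mathbf{e}^{\rm rx}_n \nonumber \\
& + \lambda^{\rm rx}_m \lambda^{\rm rx}_n \sum_{k=1}^{K_{\rm tx}} \sum_{\ell=1}^{K_{\rm tx}} [\mathbf{e}^{\rm rx}_m]^{\dagger}\, \mathbb{E}\big\{ \hat{\mathbf{h}}^i_k [\hat{\mathbf{h}}^j_\ell]^{\dagger} \big\}\, \mathbf{e}^{\rm rx}_n\, [\mathbf{e}^{\rm rx}_n]^{\dagger}\, \mathbb{E}\big\{ \hat{\mathbf{h}}^j_\ell [\hat{\mathbf{h}}^i_k]^{\dagger} \big\}\, \mathbf{e}^{\rm rx}_m \nonumber \\
\Rightarrow\; \mathbb{E}\{\lambda^i_m \lambda^j_n\} - \mu^i_m \mu^j_n \simeq{}& \sum_{k,\ell} \lambda^{\rm rx}_m \lambda^{\rm rx}_n \frac{\Big| [\mathbf{e}^{\rm rx}_m]^{\dagger}\, \mathbb{E}\big\{ \mathbf{H} \mathbf{R}_{\rm tx}^{1/2} [\mathbf{Q}_i]_{c\{k\}} [\mathbf{Q}_j]^{\dagger}_{c\{\ell\}} \mathbf{R}_{\rm tx}^{1/2} \mathbf{H}^{\dagger} \big\}\, \mathbf{e}^{\rm rx}_n \Big|^2}{\mathrm{Tr}\{\mathbf{Q}_i^{\dagger} \mathbf{R}_{\rm tx} \mathbf{Q}_i\}\, \mathrm{Tr}\{\mathbf{Q}_j^{\dagger} \mathbf{R}_{\rm tx} \mathbf{Q}_j\}} \nonumber \\
={}& \sum_{k,\ell} \lambda^{\rm rx}_m \lambda^{\rm rx}_n \frac{\Big| [\mathbf{Q}_j]^{\dagger}_{c\{\ell\}} \mathbf{R}_{\rm tx}^{1/2}\, \mathbb{E}\big\{ \mathbf{H}^{\dagger} \mathbf{e}^{\rm rx}_n [\mathbf{e}^{\rm rx}_m]^{\dagger} \mathbf{H} \big\}\, \mathbf{R}_{\rm tx}^{1/2} [\mathbf{Q}_i]_{c\{k\}} \Big|^2}{\mathrm{Tr}\{\mathbf{Q}_i^{\dagger} \mathbf{R}_{\rm tx} \mathbf{Q}_i\}\, \mathrm{Tr}\{\mathbf{Q}_j^{\dagger} \mathbf{R}_{\rm tx} \mathbf{Q}_j\}} \nonumber \\
={}& \sum_{k,\ell} \delta_{mn} \lambda^{\rm rx}_m \lambda^{\rm rx}_n \frac{\Big| [\mathbf{Q}_j]^{\dagger}_{c\{\ell\}} \mathbf{R}_{\rm tx} [\mathbf{Q}_i]_{c\{k\}} \Big|^2}{\mathrm{Tr}\{\mathbf{Q}_i^{\dagger} \mathbf{R}_{\rm tx} \mathbf{Q}_i\}\, \mathrm{Tr}\{\mathbf{Q}_j^{\dagger} \mathbf{R}_{\rm tx} \mathbf{Q}_j\}} \nonumber \\
={}& \delta_{mn} \lambda^{\rm rx}_m \lambda^{\rm rx}_n \frac{\| \mathbf{Q}_j^{\dagger} \mathbf{R}_{\rm tx} \mathbf{Q}_i \|_F^2}{\mathrm{Tr}\{\mathbf{Q}_i^{\dagger} \mathbf{R}_{\rm tx} \mathbf{Q}_i\}\, \mathrm{Tr}\{\mathbf{Q}_j^{\dagger} \mathbf{R}_{\rm tx} \mathbf{Q}_j\}} \tag{3.21}
\end{align}
where the penultimate step follows from the fact that $\mathbf{H}$ has i.i.d. entries and $[\mathbf{e}^{\rm rx}_m]^{\dagger} \mathbf{e}^{\rm rx}_n = \delta_{mn}$. This concludes the proof.

3.B Appendix B

(Proof of Proposition 3.2.1.1). From Theorem 3.2.1, the required condition is that
\[
\lim_{K_{\rm tx} \to \infty} \frac{\| \mathbf{Q}_i^{\dagger} \mathbf{R}_{\rm tx} \mathbf{Q}_i \|_F}{\mathrm{Tr}\{\mathbf{Q}_i^{\dagger} \mathbf{R}_{\rm tx} \mathbf{Q}_i\}} = 0 \quad \forall i \in \{1,..,|\mathcal{Q}|\}.
\]
Note that:
\[
\frac{\| \mathbf{Q}_i^{\dagger} \mathbf{R}_{\rm tx} \mathbf{Q}_i \|_F}{\mathrm{Tr}\{\mathbf{Q}_i^{\dagger} \mathbf{R}_{\rm tx} \mathbf{Q}_i\}}
= \frac{\| \boldsymbol{\lambda}\{\mathbf{Q}_i^{\dagger} \mathbf{R}_{\rm tx} \mathbf{Q}_i\} \|_2}{\| \boldsymbol{\lambda}\{\mathbf{Q}_i^{\dagger} \mathbf{R}_{\rm tx} \mathbf{Q}_i\} \|_1}
= \frac{\| \boldsymbol{\lambda}\{\mathbf{Q}_i \mathbf{Q}_i^{\dagger} \mathbf{R}_{\rm tx}\} \|_2}{\| \boldsymbol{\lambda}\{\mathbf{Q}_i \mathbf{Q}_i^{\dagger} \mathbf{R}_{\rm tx}\} \|_1} \tag{3.22}
\]
where $\boldsymbol{\lambda}\{\mathbf{A}\}$ is the vector of eigenvalues for a square matrix $\mathbf{A}$. Let $\boldsymbol{\lambda}^{\uparrow}\{\mathbf{A}\}$ and $\boldsymbol{\lambda}^{\downarrow}\{\mathbf{A}\}$ denote sortings of $\boldsymbol{\lambda}\{\mathbf{A}\}$ in ascending and descending orders, respectively. From results on eigenvalue majorization [176, Eqn 3.20], we have for all $1 \le L \le M_{\rm tx}$:
\[
\prod_{\ell=1}^{L} \lambda^{\downarrow}_{\ell}\{\mathbf{Q}_i \mathbf{Q}_i^{\dagger}\}\, \lambda^{\uparrow}_{\ell}\{\mathbf{R}_{\rm tx}\}
\le \prod_{\ell=1}^{L} \lambda^{\downarrow}_{\ell}\{\mathbf{Q}_i \mathbf{Q}_i^{\dagger} \mathbf{R}_{\rm tx}\}
\le \prod_{\ell=1}^{L} \lambda^{\downarrow}_{\ell}\{\mathbf{Q}_i \mathbf{Q}_i^{\dagger}\}\, \lambda^{\downarrow}_{\ell}\{\mathbf{R}_{\rm tx}\} \tag{3.23}
\]
where $\lambda^{:}_{\ell}\{\mathbf{A}\}$ is the $\ell$-th element of $\boldsymbol{\lambda}^{:}\{\mathbf{A}\}$. By taking the logarithm on both sides we get:
\[
\log\big[ \boldsymbol{\lambda}^{\downarrow}\{\mathbf{Q}_i \mathbf{Q}_i^{\dagger}\} \odot \boldsymbol{\lambda}^{\uparrow}\{\mathbf{R}_{\rm tx}\} \big]
\prec \log\big[ \boldsymbol{\lambda}^{\downarrow}\{\mathbf{Q}_i \mathbf{Q}_i^{\dagger} \mathbf{R}_{\rm tx}\} \big]
\prec \log\big[ \boldsymbol{\lambda}^{\downarrow}\{\mathbf{Q}_i \mathbf{Q}_i^{\dagger}\} \odot \boldsymbol{\lambda}^{\downarrow}\{\mathbf{R}_{\rm tx}\} \big] \tag{3.24}
\]
where $\odot$ denotes the Hadamard product, the logarithm is taken element-wise and $\mathbf{a} \prec \mathbf{b}$ implies that $\mathbf{b}$ majorizes $\mathbf{a}$.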
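Now consider the function $f_{\kappa}(\mathbf{x}) = \| e^{\mathbf{x}} \|_{\kappa}$, where $\kappa \in \{1, 2, \ldots\}$ and the exponent $e^{\mathbf{x}}$ is taken element-wise. The function is clearly permutation invariant in the vector $\mathbf{x}$, and for any elements $x_a$, $x_b$ of the vector $\mathbf{x}$ we have:
\[
\left( \frac{\partial f_{\kappa}(\mathbf{x})}{\partial x_a} - \frac{\partial f_{\kappa}(\mathbf{x})}{\partial x_b} \right)(x_a - x_b)
= (x_a - x_b)\, \frac{e^{\kappa x_a} - e^{\kappa x_b}}{\| e^{\mathbf{x}} \|_{\kappa}^{\kappa - 1}} \ge 0 \tag{3.25}
\]
Therefore, from [176, Th 2.3.14], $f_{\kappa}(\mathbf{x})$ is a Schur-convex function. From the definition of a Schur-convex function and from (3.24) we have:
\begin{align}
\frac{f_2\big( \log[\boldsymbol{\lambda}^{\downarrow}\{\mathbf{Q}_i \mathbf{Q}_i^{\dagger} \mathbf{R}_{\rm tx}\}] \big)}{f_1\big( \log[\boldsymbol{\lambda}^{\downarrow}\{\mathbf{Q}_i \mathbf{Q}_i^{\dagger} \mathbf{R}_{\rm tx}\}] \big)}
&\le \frac{f_2\big( \log[\boldsymbol{\lambda}^{\downarrow}\{\mathbf{Q}_i \mathbf{Q}_i^{\dagger}\} \odot \boldsymbol{\lambda}^{\downarrow}\{\mathbf{R}_{\rm tx}\}] \big)}{f_1\big( \log[\boldsymbol{\lambda}^{\downarrow}\{\mathbf{Q}_i \mathbf{Q}_i^{\dagger}\} \odot \boldsymbol{\lambda}^{\uparrow}\{\mathbf{R}_{\rm tx}\}] \big)} \tag{3.26} \\
\Rightarrow\; \frac{\| \mathbf{Q}_i^{\dagger} \mathbf{R}_{\rm tx} \mathbf{Q}_i \|_F}{\mathrm{Tr}\{\mathbf{Q}_i^{\dagger} \mathbf{R}_{\rm tx} \mathbf{Q}_i\}}
&\le \frac{\sqrt{\sum_{k=1}^{K_{\rm tx}} (\lambda^{\rm tx}_k)^2}}{\sum_{\ell=1}^{K_{\rm tx}} \lambda^{\rm tx}_{M_{\rm tx}+1-\ell}} \tag{3.27}
\end{align}
where in the last step one uses the fact that $\mathbf{Q}_i$ is semi-unitary with dimension $M_{\rm tx} \times K_{\rm tx}$. Alternately, from Hölder's inequality:
\[
\frac{\| \boldsymbol{\lambda}\{\mathbf{Q}_i \mathbf{Q}_i^{\dagger} \mathbf{R}_{\rm tx}\} \|_2}{\| \boldsymbol{\lambda}\{\mathbf{Q}_i \mathbf{Q}_i^{\dagger} \mathbf{R}_{\rm tx}\} \|_1}
\le \frac{\| \boldsymbol{\lambda}\{\mathbf{Q}_i \mathbf{Q}_i^{\dagger} \mathbf{R}_{\rm tx}\} \|_{\infty}}{\| \boldsymbol{\lambda}\{\mathbf{Q}_i \mathbf{Q}_i^{\dagger} \mathbf{R}_{\rm tx}\} \|_2}
\]
Now following similar steps to before, we have:
\[
\frac{\| \mathbf{Q}_i^{\dagger} \mathbf{R}_{\rm tx} \mathbf{Q}_i \|_F}{\mathrm{Tr}\{\mathbf{Q}_i^{\dagger} \mathbf{R}_{\rm tx} \mathbf{Q}_i\}}
\le \frac{f_{\infty}\big( \log[\boldsymbol{\lambda}^{\downarrow}\{\mathbf{Q}_i \mathbf{Q}_i^{\dagger}\} \odot \boldsymbol{\lambda}^{\downarrow}\{\mathbf{R}_{\rm tx}\}] \big)}{f_2\big( \log[\boldsymbol{\lambda}^{\downarrow}\{\mathbf{Q}_i \mathbf{Q}_i^{\dagger}\} \odot \boldsymbol{\lambda}^{\uparrow}\{\mathbf{R}_{\rm tx}\}] \big)}
\le \frac{\lambda^{\rm tx}_1}{\sqrt{\sum_{\ell=1}^{K_{\rm tx}} (\lambda^{\rm tx}_{M_{\rm tx}+1-\ell})^2}} \tag{3.28}
\]
Therefore a sufficient condition for Theorem 3.2.1 is that the eigenvalues of $\mathbf{R}_{\rm rx}$ be distinct, and that the right hand side of either (3.27) or (3.28) goes to zero as $s \to \infty$.

3.C Appendix C

(Proof of Theorem 3.2.2). Since $K_{\rm tx} = s K_o \ge s$, we have:
\[
\frac{(\lambda^{\rm tx}_1)^2}{\sum_{k=1}^{K_{\rm tx}} (\lambda^{\rm tx}_{M_{\rm tx}+1-k})^2} \le \frac{(\lambda^{\rm tx}_1)^2}{\sum_{k=1}^{s} (\lambda^{\rm tx}_{M_{\rm tx}+1-k})^2} \tag{3.29}
\]
Therefore using (3.29) and conditions 1, 3 of the theorem statement, from Proposition 3.2.1.1, for any $1 \le i \le |\mathcal{Q}|$, $1 \le m \le M_{\rm rx}$, as $s \to \infty$:
\[
\lambda^i_m = [\mathbf{e}^{\rm rx}_m]^{\dagger} \mathbf{J}_i \mathbf{e}^{\rm rx}_m = \lambda^{\rm rx}_m \frac{\hat{\mathbf{g}}_m \mathbf{R}_{\rm tx}^{1/2} \mathbf{Q}_i \mathbf{Q}_i^{\dagger} \mathbf{R}_{\rm tx}^{1/2} \hat{\mathbf{g}}_m^{\dagger}}{\mathrm{Tr}\{\mathbf{Q}_i^{\dagger} \mathbf{R}_{\rm tx} \mathbf{Q}_i\}} \tag{3.30}
\]
where $\hat{\mathbf{g}}_m = [\mathbf{e}^{\rm rx}_m]^{\dagger} \mathbf{H}$ is a $1 \times M_{\rm tx}$ vector with i.i.d. $\mathcal{CN}(0,1)$ entries. Additionally, the second-order statistics of $\lambda^i_m$ are also as given by (3.4) and (3.5). Now, for any $1 \le m \le M_{\rm rx}$, consider a set of eigenvalues $\mathbf{v} = [\lambda^{i_1}_m, \lambda^{i_2}_m, \ldots, \lambda^{i_L}_m]$.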
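From the Cramér-Wold theorem [177], these eigenvalues are asymptotically jointly Gaussian iff any weighted sum converges to a Gaussian distribution as $s \to \infty$. A weighted sum is given by:
\[
\mathbf{w}^{\dagger} \mathbf{v} = \sum_{\ell=1}^{L} w_{\ell} \lambda^{i_\ell}_m
= \lambda^{\rm rx}_m\, \hat{\mathbf{g}}_m \mathbf{R}_{\rm tx}^{1/2} \left[ \sum_{\ell=1}^{L} \frac{w_{\ell}\, \mathbf{Q}_{i_\ell} \mathbf{Q}_{i_\ell}^{\dagger}}{\mathrm{Tr}\{\mathbf{Q}_{i_\ell}^{\dagger} \mathbf{R}_{\rm tx} \mathbf{Q}_{i_\ell}\}} \right] \mathbf{R}_{\rm tx}^{1/2} \hat{\mathbf{g}}_m^{\dagger}
\]
Using condition 2 of the theorem statement, we have:
\[
\mathbf{w}^{\dagger} \mathbf{v} = \lambda^{\rm rx}_m\, \hat{\mathbf{g}}_m \mathbf{R}_{\rm tx}^{1/2} \Bigg[ \underbrace{\left( \sum_{\ell=1}^{L} \frac{w_{\ell}\, \mathbf{U}_{i_\ell} \mathbf{U}_{i_\ell}^{\dagger}}{\mathrm{Tr}\{\mathbf{Q}_{i_\ell}^{\dagger} \mathbf{R}_{\rm tx} \mathbf{Q}_{i_\ell}\}} \right)}_{\mathbf{W}} \otimes \mathbf{V}\mathbf{V}^{\dagger} \Bigg] \mathbf{R}_{\rm tx}^{1/2} \hat{\mathbf{g}}_m^{\dagger}
\overset{d}{=} \sum_{n=1}^{M_{\rm tx}} \lambda^{\rm rx}_m\, \big| [\hat{\mathbf{g}}_m]_n \big|^2\, \lambda^{\downarrow}_n\big\{ \mathbf{R}_{\rm tx}^{1/2} [\mathbf{W} \otimes \mathbf{V}\mathbf{V}^{\dagger}] \mathbf{R}_{\rm tx}^{1/2} \big\} \tag{3.31}
\]
where $\overset{d}{=}$ represents equality in distribution and $\lambda^{\downarrow}_n\{\mathbf{A}\}$ is the $n$-th largest eigenvalue of a matrix $\mathbf{A}$. Using Lyapunov's central limit theorem, it is easy to show that (3.31) is asymptotically Gaussian if (see [151,178] for details):
\[
\lim_{s \to \infty} \frac{\big\| \boldsymbol{\lambda}\{ \mathbf{R}_{\rm tx}^{1/2} [\mathbf{W} \otimes \mathbf{V}\mathbf{V}^{\dagger}] \mathbf{R}_{\rm tx}^{1/2} \} \big\|_{\infty}}{\big\| \boldsymbol{\lambda}\{ \mathbf{R}_{\rm tx}^{1/2} [\mathbf{W} \otimes \mathbf{V}\mathbf{V}^{\dagger}] \mathbf{R}_{\rm tx}^{1/2} \} \big\|_2} = 0 \tag{3.32}
\]
where $\boldsymbol{\lambda}\{\mathbf{A}\}$ represents the vector of eigenvalues of a matrix $\mathbf{A}$. Now following similar steps to those in Appendix 3.B (see (3.26)), we have:
\begin{align}
\frac{\big\| \boldsymbol{\lambda}\{ \mathbf{R}_{\rm tx}^{1/2} [\mathbf{W} \otimes \mathbf{V}\mathbf{V}^{\dagger}] \mathbf{R}_{\rm tx}^{1/2} \} \big\|_{\infty}}{\big\| \boldsymbol{\lambda}\{ \mathbf{R}_{\rm tx}^{1/2} [\mathbf{W} \otimes \mathbf{V}\mathbf{V}^{\dagger}] \mathbf{R}_{\rm tx}^{1/2} \} \big\|_2}
&= \frac{\big\| \boldsymbol{\lambda}\{ [\mathbf{W} \otimes \mathbf{I}_s] \mathbf{R}_{\rm tx} \} \big\|_{\infty}}{\big\| \boldsymbol{\lambda}\{ [\mathbf{W} \otimes \mathbf{I}_s] \mathbf{R}_{\rm tx} \} \big\|_2} \tag{3.33} \\
&\le \frac{\big\| \boldsymbol{\lambda}^{\downarrow}\{\mathbf{W} \otimes \mathbf{I}_s\} \odot \boldsymbol{\lambda}^{\downarrow}\{\mathbf{R}_{\rm tx}\} \big\|_{\infty}}{\big\| \boldsymbol{\lambda}^{\downarrow}\{\mathbf{W} \otimes \mathbf{I}_s\} \odot \boldsymbol{\lambda}^{\uparrow}\{\mathbf{R}_{\rm tx}\} \big\|_2}
\le \frac{\lambda^{\downarrow}_1\{\mathbf{R}_{\rm tx}\}\, \lambda^{\downarrow}_1\{\mathbf{W}\}}{\sqrt{\sum_{\ell=1}^{s} \big[ \lambda^{\uparrow}_{\ell}\{\mathbf{R}_{\rm tx}\}\, \lambda^{\downarrow}_1\{\mathbf{W}\} \big]^2}}
= \frac{\lambda^{\rm tx}_1}{\sqrt{\sum_{\ell=1}^{s} (\lambda^{\rm tx}_{M_{\rm tx}+1-\ell})^2}} \tag{3.34}
\end{align}
where one uses the fact that $\mathbf{V}\mathbf{V}^{\dagger} = \mathbf{I}_s$ and $\boldsymbol{\lambda}^{\downarrow}\{\mathbf{W} \otimes \mathbf{I}_s\} = \boldsymbol{\lambda}^{\downarrow}\{\mathbf{W}\} \otimes \boldsymbol{\lambda}^{\downarrow}\{\mathbf{I}_s\}$. From (3.32), (3.34) and condition 3 of the theorem, $\mathbf{w}^{\dagger}\mathbf{v}$ is Gaussian distributed $\forall \mathbf{w}$ as $s \to \infty$. Therefore the vector $\mathbf{v} = [\lambda^{i_1}_m, \lambda^{i_2}_m, \ldots, \lambda^{i_L}_m]$ is asymptotically jointly Gaussian distributed. Note that the joint Gaussianity above trivially also ensures marginal Gaussianity. Now, the joint Gaussianity of $\lambda^i_m$ and $\lambda^j_n$ for $n \neq m$ directly follows from the marginal Gaussianity and the independence of $\hat{\mathbf{g}}_m$ and $\hat{\mathbf{g}}_n$ in (3.30).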
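3.D Appendix D

In this section some families of $M_{\rm tx} \times M_{\rm tx}$ transmit correlation matrices, where $M_{\rm tx} = M_o s$, shall be listed which satisfy: $\lim_{s \to \infty} (\lambda^{\rm tx}_1)^2 \big/ \sum_{k=1}^{s} (\lambda^{\rm tx}_{M_{\rm tx}+1-k})^2 = 0$. As discussed in Appendix 3.C, this condition also implies Proposition 3.2.1.1.

Exponential Correlation

In this case, the elements of the correlation matrix are given by $[\mathbf{R}_{\rm tx}]_{ab} = \rho^{|a-b|}$ for $|\rho| < 1$. That this matrix satisfies the required constraint can be verified using the bounds on eigenvalues as derived in [179].

Arbitrary power angle spectrum

A uniform linear antenna array at the transmitter with spacing $d_{\rm tx}$ is considered. Assuming the multipath components to be only in the horizontal plane, the elements of the transmit correlation matrix can be expressed as:
\[
[\mathbf{R}_{\rm tx}]_{ab} = \int_0^1 e^{j 2\pi f (a-b)} \underbrace{\left[ \sum_{n=-\infty}^{\infty} \Theta(f - n) \right]}_{\tilde{\Theta}(f)} df \tag{3.35}
\]
where
\[
\Theta(f) = \begin{cases}
\dfrac{\lambda \left[ \overline{\mathrm{PAS}}\big( \arcsin(f\lambda/d_{\rm tx}) \big) + \overline{\mathrm{PAS}}\big( \pi - \arcsin(f\lambda/d_{\rm tx}) \big) \right]}{\sqrt{d_{\rm tx}^2 - \lambda^2 f^2}} & \text{for } |f| \lambda / d_{\rm tx} \le 1 \\[2ex]
0 & \text{for } |f| \lambda / d_{\rm tx} > 1
\end{cases} \tag{3.36}
\]
$j = \sqrt{-1}$, $\lambda$ is the wavelength and $\overline{\mathrm{PAS}}(\theta)$ is the normalized PAS computed as $\overline{\mathrm{PAS}}(\theta) = \mathrm{PAS}(\theta) \big/ \int \mathrm{PAS}(\theta)\, d\theta$. We define $\mathcal{F}_a$ as the smallest, possibly non-contiguous, sub-interval of $[0,1]$ such that:
\[
\min_{f \in [0,1] \setminus \mathcal{F}_a} \{\tilde{\Theta}(f)\} \ge \max_{f \in \mathcal{F}_a} \{\tilde{\Theta}(f)\} \quad \text{and} \quad \int_{f \in \mathcal{F}_a} df = 1/a
\]
where $\tilde{\Theta}(f)$ is as defined in (3.35). Since $\mathbf{R}_{\rm tx}$ is a Toeplitz matrix, from Szegő's results on eigenvalues of Toeplitz matrices [180], as $s \to \infty$ we have:
\[
\frac{(\lambda^{\rm tx}_1)^2}{\sum_{k=1}^{s} (\lambda^{\rm tx}_{M_{\rm tx}+1-k})^2} = \frac{\max_{f \in \mathcal{F}_1} \tilde{\Theta}^2(f)}{M_{\rm tx} \int_{f' \in \mathcal{F}_{M_o}} \tilde{\Theta}^2(f')\, df'} \tag{3.37}
\]
It can be easily verified that the right hand side of (3.37) goes to zero if: $\mathrm{PAS}(\theta)$ is continuous, $\max_{\theta \in (-\pi, \pi]} \{\mathrm{PAS}(\theta)\} < \infty$ and $\mathrm{PAS}(\pi/2) = \mathrm{PAS}(-\pi/2) = 0$. The constant $M_o$ is such that $\int_{f' \in \mathcal{F}_{M_o}} \tilde{\Theta}^2(f')\, df' > 0$. The former condition is almost always satisfied for a sectored base-station. The latter condition is more stringent. However, it can be relaxed if the codebook $\mathcal{Q}$ is restricted such that $\forall i$, $\lim_{s \to \infty} \mathrm{rank}\{\mathbf{Q}_i^{\dagger} \mathbf{R}_{\rm tx} \mathbf{Q}_i\}/s > K_o - 1$.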
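The exponential-correlation case of this appendix can be checked numerically. The following is a minimal sketch, not part of the original analysis: it builds $[\mathbf{R}_{\rm tx}]_{ab} = \rho^{|a-b|}$ and evaluates the ratio $(\lambda^{\rm tx}_1)^2 / \sum_{k=1}^{s} (\lambda^{\rm tx}_{M_{\rm tx}+1-k})^2$ for growing $s$. The function name `tail_ratio` and the values $\rho = 0.5$, $M_o = 2$ are illustrative assumptions.

```python
import numpy as np

def tail_ratio(rho, s, M_o=2):
    """Ratio (largest eigenvalue)^2 / (sum of squares of the s smallest
    eigenvalues) for the M x M matrix [R]_ab = rho^|a-b|, with M = M_o * s.
    M_o = 2 is an illustrative choice, not a value taken from the text."""
    M = M_o * s
    idx = np.arange(M)
    R = rho ** np.abs(idx[:, None] - idx[None, :])
    eig = np.linalg.eigvalsh(R)  # eigenvalues in ascending order
    return eig[-1] ** 2 / np.sum(eig[:s] ** 2)

# The ratio should shrink as s grows if the condition of this appendix holds.
ratios = [tail_ratio(0.5, s) for s in (4, 16, 64)]
print(ratios)
```

Under these assumptions the printed ratios decrease with $s$, in line with the eigenvalue bounds cited from [179].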
Chapter 4

Beamformer Design in a Multi-user Scenario

While the hSNR system capacity of a multi-user HBwS system was characterized in Chapter 3 for a given analog beamformer $\mathbf{T}$, note that good near-optimal designs for the beamforming matrix are yet to be found. For the case of beam selection, several designs for the beamforming matrix using phase-shifters [37, 41] or lens antennas [105, 138] have been proposed. However, in most of these prior works the beamforming matrix offers orthogonal beam choices and the number of input ports ($L_{\rm tx}$) equals the number of transmit antennas ($M_{\rm tx}$), i.e., the beams span the whole channel dimension. This significantly increases the implementation cost of the analog beamformer in massive MIMO systems, where $M_{\rm tx} \gg 1$, as also mentioned in Section 1.1. Though some of these designs [41, 55] can be extended to the case of $L_{\rm tx} < M_{\rm tx}$, these designs are inferior, especially in spatially sparse channels [50]. Finally, none of these beamformer designs account for the CE overhead. A more generic design for the beamforming matrix, with $L_{\rm tx} \neq M_{\rm tx}$ and possibly non-orthogonal columns (i.e., offering non-orthogonal beam choices), will be considered in this chapter. The impact of the CE overhead and switch bank architecture on the beamformer design is also studied, and the hardware implementation cost of HBwS is investigated. The contributions of this chapter are as follows:

1. A generic architecture of HBwS for low complexity multi-user MIMO transceivers is proposed, wherein the beamforming matrix may be a rectangular matrix, i.e., $L_{\rm tx} \neq M_{\rm tx}$, with non-orthogonal columns.
2. For a channel with isotropic scattering within the subspace spanned by the beamformer, it is shown that a beamformer that maximizes a lower bound to the mean system sum-capacity with Dirty-paper coding can be obtained as a solution to a coupled Grassmannian subspace packing problem.
3. A good sub-optimal solution to this packing problem is found and algorithms to further improve it are proposed.
4. A two-stage architecture for the beamformer is proposed and a family of "good" switch positions is found, which help reduce the computational and hardware cost of HBwS while retaining good performance.
5. An extension of the beamformer design to channels with anisotropic scattering is also explored.

The organization of this chapter is as follows: the mean sum-capacity maximizing beamforming matrix design problem is formulated in Section 4.1; the search-space for the optimal beamformer is characterized in Section 4.2 and a closed-form lower bound to the mean sum-capacity is explored in Section 4.3; a good beamformer design and algorithms for further improving upon it are proposed in Section 4.4; strategies to reduce the hardware implementation cost for HBwS are discussed in Section 4.5; the CE overhead for HBwS is discussed in Section 4.6; the simulation results are presented in Section 4.7; the extension to anisotropic channels is considered in Section 4.8; and finally, the conclusions are summarized in Section 4.9.

4.1 Problem Formulation

As discussed in Chapter 2, BS resources are assumed to be split among the different UE groups based on aCSI, and therefore the discussion shall be limited to a single representative UE group.
For this user group, taking an expectation of the instantaneous hSNR system capacity in (2.5) with respect to the channel fading $\tilde{\mathbf{H}}$, one obtains the mean hSNR sum-capacity as:
\[
\bar{C}(\mathbf{T}) \triangleq \mathbb{E}_{\tilde{\mathbf{H}}} \left\{ \max_{1 \le i \le |\mathcal{S}|} \log \left| \mathbf{I}_{M_{\rm rx}} + \frac{\rho}{M_{\rm rx}} \tilde{\mathbf{H}} \mathbf{T} \mathbf{S}_i \mathbf{G}_i \mathbf{G}_i^{\dagger} \mathbf{S}_i^{\dagger} \mathbf{T}^{\dagger} \tilde{\mathbf{H}}^{\dagger} \right| \right\}. \tag{4.1}
\]
As a recap, from (2.3b) and the fact that $\tilde{\mathbf{H}}_m$ are independent for all $m$, one can express:
\[
\tilde{\mathbf{H}} = \mathbf{R}_{\rm rx}^{1/2} \mathbf{H} [\boldsymbol{\Lambda}_{\rm tx}]^{1/2} [\mathbf{E}_{\rm tx}]^{\dagger}, \tag{4.2}
\]
where $\mathbf{R}_{\rm rx}$ is a block-diagonal matrix with the $m$-th diagonal block being $\mathbf{R}_{{\rm rx},m}$, $\mathbf{H}$ is an $M_{\rm rx} \times M_{\rm tx}$ matrix with i.i.d. $\mathcal{CN}(0,1)$ entries and $\mathbf{R}_{\rm tx} = \mathbf{E}_{\rm tx} \boldsymbol{\Lambda}_{\rm tx} \mathbf{E}_{\rm tx}^{\dagger}$ is the eigen-decomposition of the transmit correlation matrix such that the diagonal elements of $\boldsymbol{\Lambda}_{\rm tx}$ are arranged in descending order of magnitude. The primary goal of this chapter is to find the analog beamformer $\mathbf{T}$ that maximizes the lower bound in (4.1), i.e.:
\[
\mathbf{T}_{\rm opt} = \operatorname*{argmax}_{\mathbf{T} \in \mathbb{C}^{M_{\rm tx} \times L_{\rm tx}}} \{ \bar{C}(\mathbf{T}) \} \quad \text{subject to: } |\mathcal{P}\{\mathbf{T}\}| \le D, \tag{4.3}
\]
where $\mathcal{P}\{\mathbf{T}\}$ represents the sub-space spanned by the columns of $\mathbf{T}$, $D$ is a bound on the dimension of this subspace, $\mathbf{T}_{\rm opt}$ is designed based on the knowledge of the aCSI statistics $\mathbf{R}_{\rm tx}$ and $\mathbf{R}_{\rm rx}$ and, with slight abuse of notation, $\operatorname{argmax}\{\cdot\}$ refers to any one of the (possibly many) maximizing arguments. Such a bound on $\mathcal{P}\{\mathbf{T}\}$ is required to limit the CE overhead, as shall be shown in Section 4.6. The optimal beamformer that maximizes an objective of the form $f(D)\, \bar{C}(\mathbf{T})$, where $f(\cdot)$ is any non-increasing function, can then be found by simply performing a line search over $D \in \{1, \ldots, M_{\rm tx}\}$. Note that (4.3) does not involve any magnitude or phase restrictions on the elements of the analog beamforming matrix $\mathbf{T}$. Such an unrestricted beamformer, while serving as a good reference for comparison, may also provide intuition for designing beamformers with unit magnitude and discrete phase constraints. As shall be seen later, an exact solution to (4.3) is intractable and we shall therefore restrict ourselves to a good sub-optimal solution that only requires knowledge of $\mathbf{R}_{\rm tx}$.
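The objective in (4.1) can be estimated by Monte-Carlo averaging over the channel model (4.2). The following is a minimal sketch under several simplifying assumptions that are not taken from the text: $\mathbf{R}_{\rm rx} = \mathbf{I}$, the switch set $\mathcal{S}$ contains every $K$-column subset of $\mathbf{T}$, the orthonormalizing matrices $\mathbf{G}_i$ are realized through a QR decomposition, and `rho` denotes the SNR scaling. The function name `mean_hsnr_capacity` and all numerical values are hypothetical.

```python
import itertools
import numpy as np

def mean_hsnr_capacity(T, R_tx, K, rho=10.0, M_rx=2, n_trials=500, seed=0):
    """Monte-Carlo estimate (in nats) of the mean of
    max_i log|I + (rho/M_rx) H~ Q_i Q_i^H H~^H|, assuming R_rx = I and a
    switch set made of all K-column subsets of T (illustrative assumptions)."""
    rng = np.random.default_rng(seed)
    M_tx, L = T.shape
    lam, E = np.linalg.eigh(R_tx)
    # Right factor [Lambda_tx]^{1/2} [E_tx]^H of the channel model (4.2)
    right = np.sqrt(np.clip(lam, 0.0, None))[:, None] * E.conj().T
    # Orthonormalised precoders Q_i = T S_i G_i; QR plays the role of G_i
    Qs = [np.linalg.qr(T[:, list(sel)])[0]
          for sel in itertools.combinations(range(L), K)]
    total = 0.0
    for _ in range(n_trials):
        H = (rng.standard_normal((M_rx, M_tx))
             + 1j * rng.standard_normal((M_rx, M_tx))) / np.sqrt(2)
        H_tilde = H @ right
        best = -np.inf
        for Q in Qs:
            A = H_tilde @ Q
            _, logdet = np.linalg.slogdet(np.eye(M_rx) + (rho / M_rx) * (A @ A.conj().T))
            best = max(best, logdet)
        total += best
    return total / n_trials

# Example: exponential correlation, beams taken as dominant eigenvectors of R_tx.
idx = np.arange(4)
R_tx = 0.7 ** np.abs(idx[:, None] - idx[None, :])
T3 = np.linalg.eigh(R_tx)[1][:, -3:]
c3 = mean_hsnr_capacity(T3, R_tx, K=2)          # three beams to choose from
c2 = mean_hsnr_capacity(T3[:, :2], R_tx, K=2)   # only two beams, no selection
print(c3, c2)
```

Because both calls reuse the same random seed, the estimate with three candidate beams can never fall below the two-beam estimate: the larger switch set contains the smaller one, so the per-realization maximum can only grow.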
4.1.1 Connections to limited-feedback precoding

As mentioned in Chapter 3, HBwS is an example of a restricted precoded system [147]. In fact, by considering the precoding matrices for the different switch positions $\{\mathbf{T}\mathbf{S}_i\mathbf{G}_i \,|\, \mathbf{S}_i \in \mathcal{S}\}$ as entries of a codebook, the single UE case ($M_1 = 1$) can be interpreted as a type of limited-feedback unitary precoding [94, 149]. However, in contrast to conventional LFP, the HBwS codebook entries $\{\mathbf{T}\mathbf{S}_i\mathbf{G}_i \,|\, \mathbf{S}_i \in \mathcal{S}\}$ are coupled, as they are generated from the columns of the same beamforming matrix $\mathbf{T}$. As a result, good codebook designs for limited-feedback unitary precoding [86,88,90,149] cannot be directly extended to find good designs for $\mathbf{T}$.

4.2 Transforming the search space

Notice that in (4.3), the search for $\mathbf{T}_{\rm opt}$ is over all possible $M_{\rm tx} \times L_{\rm tx}$ complex matrices with $|\mathcal{P}\{\mathbf{T}\}| \le D$. Without loss of generality, such a beamformer can be expressed as $\mathbf{T} = \mathbf{E}\hat{\mathbf{T}}$, where $\mathbf{E} \in \mathbb{C}^{M_{\rm tx} \times D}$ and $\hat{\mathbf{T}} \in \mathbb{C}^{D \times L_{\rm tx}}$. In this section, this search space is reduced by getting rid of some sub-optimal and redundant solutions. We first state the following theorem:

Theorem 4.2.1. If $\mathrm{rank}\{\mathbf{R}_{\rm tx}\} \le D$, there exists an optimal solution $\mathbf{T}_{\rm opt}$ to (4.3) such that $\mathbf{T}_{\rm opt} = \mathbf{E}^D_{\rm tx} \hat{\mathbf{T}}$, where $\hat{\mathbf{T}} \in \mathbb{C}^{D \times L_{\rm tx}}$ and $\mathbf{E}^D_{\rm tx} = [\mathbf{E}_{\rm tx}]_{c\{1:D\}}$ is the $M_{\rm tx} \times D$ principal sub-matrix of $\mathbf{E}_{\rm tx}$ corresponding to the $D$ largest eigenvalues.

Proof. See Appendix 4.A.

Note that such a low rank $\mathbf{R}_{\rm tx}$ may often be experienced in massive MIMO systems, both at cm and mm wave frequencies. It is also conjectured here that:

Conjecture 4.2.1. Theorem 4.2.1 holds for any $\mathrm{rank}\{\mathbf{R}_{\rm tx}\}$.

Intuitively, this conjecture states that if $\mathcal{P}\{\mathbf{T}\}$, and therefore the analog precoding beams, should lie in a channel subspace of dimension $D$, then it should be the dominant $D$-dimensional channel sub-space $\mathcal{P}\{\mathbf{E}^D_{\rm tx}\}$. Unfortunately, a general proof of this conjecture has eluded us. A proof under some additional conditions and $M_{\rm rx} = 1$ will be derived in Chapter 5 (see also the corresponding paper [79]).
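While the rest of the results in the chapter are exact for $\mathrm{rank}\{\mathbf{R}_{\rm tx}\} \le D$, this intuitive, albeit difficult to prove, conjecture shall be relied upon when $\mathrm{rank}\{\mathbf{R}_{\rm tx}\} > D$. From Theorem 4.2.1, Conjecture 4.2.1 and (4.1)-(4.2), problem (4.3) reduces to $\mathbf{T}_{\rm opt} = \mathbf{E}^D_{\rm tx} \hat{\mathbf{T}}_{\rm opt}$, where:
\begin{align}
\hat{\mathbf{T}}_{\rm opt} &= \operatorname*{argmax}_{\hat{\mathbf{T}} \in \mathbb{C}^{D \times L_{\rm tx}}} \big\{ \bar{C}^D(\hat{\mathbf{T}}) \big\}, \tag{4.4} \\
\bar{C}^D(\hat{\mathbf{T}}) &\triangleq \bar{C}(\mathbf{E}^D_{\rm tx} \hat{\mathbf{T}}) \nonumber \\
&= \mathbb{E}_{\mathbf{H}^D} \left\{ \max_{1 \le i \le |\mathcal{S}|} \log \left| \mathbf{I}_{M_{\rm rx}} + \frac{\rho}{M_{\rm rx}} \mathbf{R}_{\rm rx}^{1/2} \mathbf{H}^D [\boldsymbol{\Lambda}^D_{\rm tx}]^{1/2} \hat{\mathbf{T}} \mathbf{S}_i \hat{\mathbf{G}}_i \hat{\mathbf{G}}_i^{\dagger} \mathbf{S}_i^{\dagger} \hat{\mathbf{T}}^{\dagger} [\boldsymbol{\Lambda}^D_{\rm tx}]^{1/2} [\mathbf{H}^D]^{\dagger} \mathbf{R}_{\rm rx}^{1/2} \right| \right\}, \tag{4.5}
\end{align}
where $\boldsymbol{\Lambda}^D_{\rm tx}$ is the $D \times D$ principal submatrix of $\boldsymbol{\Lambda}_{\rm tx}$, $\mathbf{H}^D$ is an $M_{\rm rx} \times D$ matrix with i.i.d. $\mathcal{CN}(0,1)$ entries and $\hat{\mathbf{G}}_i$ ortho-normalizes the columns of $\hat{\mathbf{T}}\mathbf{S}_i$. Henceforth one shall restrict to finding the optimal solution $\hat{\mathbf{T}}_{\rm opt}$ to (4.4), since $\mathbf{T}_{\rm opt}$ can be found in a straightforward way from it. In fact, expressing $\mathbf{T}_{\rm opt}$ as $\mathbf{E}^D_{\rm tx} \hat{\mathbf{T}}_{\rm opt}$ may also help reduce the hardware cost for implementing the analog beamforming matrix, as shall be shown later in Section 4.5.1. To prevent any confusion, $\hat{\mathbf{T}}_{\rm opt}$ shall be referred to as the Reduced Dimensional (RD) beamformer. Though (4.4) reduces the search space from $\mathbb{C}^{M_{\rm tx} \times L_{\rm tx}}$ to $\mathbb{C}^{D \times L_{\rm tx}}$, it is still unbounded. This problem is remedied by the following theorem.

Theorem 4.2.2 (Bounding the search space). For any $\hat{\mathbf{T}} \in \mathbb{C}^{D \times L_{\rm tx}}$, both $\hat{\mathbf{T}}$ and $\hat{\mathbf{T}}\boldsymbol{\Delta}$ attain the same mean hSNR sum-capacity (4.5), where $\boldsymbol{\Delta}$ is any arbitrary $L_{\rm tx} \times L_{\rm tx}$ complex diagonal matrix.

Proof. See Appendix 4.B.

From Theorem 4.2.2, by replacing $\hat{\mathbf{T}}$ by $\hat{\mathbf{T}}\boldsymbol{\Delta}$ in (4.4), where:
\[
[\boldsymbol{\Delta}]_{\ell,\ell} = \frac{[\hat{T}_{1,\ell}]^{\dagger}}{\big\| [\hat{\mathbf{T}}]_{c\{\ell\}} \big\|\, |\hat{T}_{1,\ell}|} \quad \forall\, 1 \le \ell \le L_{\rm tx},
\]
the optimal RD-beamformer design problem can be reduced to:
\begin{align}
\hat{\mathbf{T}}_{\rm opt} &= \operatorname*{argmax}_{\hat{\mathbf{T}} \in \mathcal{T}_G} \big\{ \bar{C}^D(\hat{\mathbf{T}}) \big\} \quad \text{where,} \tag{4.6} \\
\mathcal{T}_G &= \Big\{ \hat{\mathbf{T}} \in \mathbb{C}^{D \times L_{\rm tx}} \;\Big|\; \big\| [\hat{\mathbf{T}}]_{c\{\ell\}} \big\| = 1,\ \mathrm{Im}\{\hat{T}_{1,\ell}\} = 0\ \ \forall \ell = 1, \ldots, L_{\rm tx} \Big\}, \nonumber
\end{align}
where $\mathrm{Im}\{\cdot\}$ represents the imaginary component. The constraints in (4.6) aid in resolving the ambiguities of $\hat{\mathbf{T}}$, similar to the approach used in [181].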
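The column normalization behind (4.6) can be sketched in a few lines. The snippet below is an illustration only: it scales each column of a candidate RD-beamformer to unit norm and rotates its phase so that the first-row entry becomes real and non-negative, which is one way of picking a representative in $\mathcal{T}_G$. The function name `to_canonical` and the handling of a zero first-row entry are assumptions made here, not taken from the text.

```python
import numpy as np

def to_canonical(T_hat):
    """Scale every column to unit norm and rotate its phase so that the
    first-row entry is real and non-negative (columns must be non-zero)."""
    first = T_hat[0, :]
    mag = np.abs(first)
    norms = np.linalg.norm(T_hat, axis=0)
    safe = np.where(mag > 0, mag, 1.0)
    # Phase factor conj(first)/|first|; left at 1 when the entry is zero
    phase = np.where(mag > 0, np.conj(first) / safe, 1.0)
    return T_hat * (phase / norms)

rng = np.random.default_rng(0)
T = rng.standard_normal((4, 6)) + 1j * rng.standard_normal((4, 6))
T_c = to_canonical(T)
print(np.linalg.norm(T_c, axis=0))  # unit column norms
print(T_c[0].imag)                  # first row is (numerically) real
```

Each column of `T_c` is a complex multiple of the corresponding column of `T`, so the one-dimensional subspaces spanned by the columns are unchanged.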
Note that since the mean hSNR sum-capacity $\bar{C}^D(\hat{\mathbf{T}})$ is invariant to complex scaling of the columns of $\hat{\mathbf{T}}$, each column $[\hat{\mathbf{T}}]_{c\{\ell\}}$ for $1 \le \ell \le L_{\rm tx}$ is representative of $\mathcal{P}\{[\hat{\mathbf{T}}]_{c\{\ell\}}\}$, i.e., it represents a point on the complex Grassmannian manifold $\mathcal{G}(D,1)$. For $a \ge b > 0$, the complex Grassmannian manifold $\mathcal{G}(a,b)$ is the set of all linear sub-spaces of dimension $b$ in $\mathbb{C}^{a \times 1}$. Therefore, (4.6) is actually an optimization problem over the complex Grassmannian manifold $\mathcal{G}(D,1)$.

4.3 Lower bound on the objective function

Though transformations to the search space were introduced in the previous section to reduce the search complexity, the mean hSNR sum-capacity $\bar{C}^D(\hat{\mathbf{T}})$ is not in closed form. A closed-form lower bound to $\bar{C}^D(\hat{\mathbf{T}})$ for the case of $M_{\rm rx} = 1$ was considered by us in [50], which was shown to be maximized by Grassmannian line packing the columns of $\hat{\mathbf{T}}$. However, this bound is independent of the switch position set $\mathcal{S}$ and cannot be generalized to $M_{\rm rx} > 1$. Note that an approximation to $\bar{C}^D(\hat{\mathbf{T}})$ can be obtained from the capacity analysis in Chapter 3. Though this approximation eliminates the need for taking an expectation as in (4.5), it has to be computed recursively and is accurate only when $D, K_{\rm tx} \gg M_{\rm rx}$. The cost of this recursive computation can become very high when $K_{\rm tx} \gg 1$. In contrast, in this section a closed-form lower bound to the mean sum-capacity is derived that depends on $\mathcal{S}$. Henceforth, for ease of analysis, it is assumed that $\boldsymbol{\Lambda}^D_{\rm tx} = \mathbf{I}_D$ (see footnote 1). Extension to more generic channels is considered later in Section 4.8. For any $a \ge b > 0$, the complex Stiefel manifold $\mathcal{U}(a,b)$ is defined as the set of all $a \times b$ matrices with ortho-normal columns. Such matrices shall be referred to as semi-unitary matrices. Furthermore, for $\mathbf{A}, \mathbf{B} \in \mathcal{U}(a,b)$ the `Fubini-Study distance' function is defined as:
\[
d_{\rm FS}(\mathbf{A}, \mathbf{B}) = \arccos \sqrt{ \big| \mathbf{A}^{\dagger} \mathbf{B} \mathbf{B}^{\dagger} \mathbf{A} \big| }. \tag{4.7}
\]
Here $d_{\rm FS}(\mathbf{A}, \mathbf{B})$ is not a distance measure between $\mathbf{A}$ and $\mathbf{B}$, but rather a distance measure between $\mathcal{P}\{\mathbf{A}\}$ and $\mathcal{P}\{\mathbf{B}\}$ on $\mathcal{G}(a,b)$.
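The distance in (4.7) is straightforward to evaluate numerically. The sketch below is an illustration, with the function name `fubini_study` and the test matrices chosen here for convenience; $|\cdot|$ is read as the determinant, and the value is clipped to $[0,1]$ only to guard against floating-point round-off.

```python
import numpy as np

def fubini_study(A, B):
    """d_FS(A, B) = arccos sqrt(det(A^H B B^H A)) for semi-unitary A and B."""
    G = A.conj().T @ B
    det = np.linalg.det(G @ G.conj().T).real
    return float(np.arccos(np.sqrt(np.clip(det, 0.0, 1.0))))

I4 = np.eye(4)
A = I4[:, :2]                                   # span{e1, e2}
swap = np.array([[0.0, 1.0], [1.0, 0.0]])
d_same = fubini_study(A, A @ swap)              # same subspace, different basis
d_orth = fubini_study(A, I4[:, 2:])             # span{e3, e4}, orthogonal to A
B_mid = np.column_stack([I4[:, 0], (I4[:, 1] + I4[:, 2]) / np.sqrt(2)])
d_mid = fubini_study(A, B_mid)                  # one shared direction, one at 45 degrees
print(d_same, d_orth, d_mid)
```

The first case illustrates the remark above: two different matrices spanning the same subspace are at distance zero, while mutually orthogonal subspaces attain the maximum value of $\pi/2$.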
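For ease of notation, we further define $\hat{\mathbf{Q}}_i \triangleq \hat{\mathbf{T}} \mathbf{S}_i \hat{\mathbf{G}}_i$ for each selection matrix $\mathbf{S}_i \in \mathcal{S}$. Note that $\hat{\mathbf{Q}}_i \in \mathcal{U}(D, K_{\rm tx})$ and $\mathcal{P}\{\hat{\mathbf{Q}}_i\} \in \mathcal{G}(D, K_{\rm tx})$ for all $i = 1,..,|\mathcal{S}|$. Then one obtains the following lemma:

Lemma 4.3.1 (Higher dimension lower bound). If $\boldsymbol{\Lambda}^D_{\rm tx} = \mathbf{I}_D$, then $\bar{C}^D(\hat{\mathbf{T}}) \ge \bar{C}^D_{\rm LB1}(\hat{\mathbf{T}})$ where:
\[
\bar{C}^D_{\rm LB1}(\hat{\mathbf{T}}) \triangleq \mathbb{E}_{\mathbf{H}^D} \mathbb{E}_{\mathbf{V}} \left\{ \max_{1 \le i \le |\mathcal{S}|} \log \Big[ 1 + \gamma\, \big| \mathbf{V}^{\dagger} \hat{\mathbf{Q}}_i \hat{\mathbf{Q}}_i^{\dagger} \mathbf{V} \big| \Big] \right\}, \tag{4.8}
\]
$\gamma = \big( \rho / M_{\rm rx} \big)^{M_{\rm rx}} \big| \mathbf{R}_{\rm rx} \mathbf{H}^D [\mathbf{H}^D]^{\dagger} \big|$, $\mathbf{H}^D$ is as defined in (4.5) and $\mathbf{V}$ is a random matrix uniformly distributed over $\mathcal{U}(D, K_{\rm tx})$, independent of $\mathbf{H}^D$.

Proof. See Appendix 4.C.

Note that in (4.5), each $\hat{\mathbf{Q}}_i$ is associated with a corresponding selection region:
\[
\Big\{ \mathbf{H}^D \in \mathbb{C}^{M_{\rm rx} \times D} \;\Big|\; i = \operatorname*{argmax}_{1 \le j \le |\mathcal{S}|} \Big| \mathbf{I}_{M_{\rm rx}} + \frac{\rho}{M_{\rm rx}} \mathbf{R}_{\rm rx}^{1/2} \mathbf{H}^D \hat{\mathbf{Q}}_j \hat{\mathbf{Q}}_j^{\dagger} [\mathbf{H}^D]^{\dagger} \mathbf{R}_{\rm rx}^{1/2} \Big| \Big\}.
\]
Essentially, Lemma 4.3.1 finds a lower bound where these selection regions are changed to
\[
\Big\{ \mathbf{V} \in \mathcal{U}(D, K_{\rm tx}) \;\Big|\; i = \operatorname*{argmax}_{1 \le j \le |\mathcal{S}|} \big| \mathbf{V}^{\dagger} \hat{\mathbf{Q}}_j \hat{\mathbf{Q}}_j^{\dagger} \mathbf{V} \big| \Big\}.
\]
These regions are easier to bound than those in (4.5), as exploited by the following theorem.

Theorem 4.3.1 (Fubini-Study lower bound). If $\boldsymbol{\Lambda}^D_{\rm tx} = \mathbf{I}_D$ and $D \gg 1$, then $\bar{C}^D(\hat{\mathbf{T}}) \ge \bar{C}^D_{\rm LB}(\hat{\mathbf{T}})$, where:
\begin{align}
\bar{C}^D_{\rm LB}(\hat{\mathbf{T}}) &\triangleq |\mathcal{S}| \left( \frac{1 - \cos^{2/K_{\rm tx}}(\delta/2)}{K_{\rm tx}} \right)^{D K_{\rm tx} + \epsilon} \big[ \beta + \log \cos^2(\delta/2) \big], \tag{4.9a} \\
\delta &= \min_{i \neq j} d_{\rm FS}(\hat{\mathbf{Q}}_i, \hat{\mathbf{Q}}_j) \triangleq f_{\rm FS}(\hat{\mathbf{T}}), \tag{4.9b} \\
\beta &= M_{\rm rx} \log \frac{\rho}{M_{\rm rx}} + \log |\mathbf{R}_{\rm rx}| + \sum_{m=1}^{M_{\rm rx}} \psi(D - m + 1), \tag{4.9c}
\end{align}
$\psi(\cdot)$ being the digamma function and $\epsilon = o(D)$, i.e., $\lim_{D \to \infty} \epsilon / D = 0$. Furthermore, if $\beta \ge 2$:
\[
\operatorname*{argmax}_{\hat{\mathbf{T}} \in \mathcal{T}_G} \big\{ \bar{C}^D_{\rm LB}(\hat{\mathbf{T}}) \big\} \equiv \operatorname*{argmax}_{\hat{\mathbf{T}} \in \mathcal{T}_G} \big\{ f_{\rm FS}(\hat{\mathbf{T}}) \big\}. \tag{4.10}
\]

Footnote 1: Any constant scaling factor in $\boldsymbol{\Lambda}^D_{\rm tx}$ is included into $\rho$, without loss of generality.

Proof. Let us define $\delta \triangleq \min_{i \neq j} d_{\rm FS}(\hat{\mathbf{Q}}_i, \hat{\mathbf{Q}}_j)$, and $\mathcal{Q}_i \subseteq \mathcal{G}(D, K_{\rm tx})$ for $1 \le i \le |\mathcal{S}|$ as:
\[
\mathcal{Q}_i \triangleq \Big\{ \mathcal{P}\{\mathbf{W}\} \;\Big|\; \mathbf{W} \in \mathcal{U}(D, K_{\rm tx}),\ d_{\rm FS}(\mathbf{W}, \hat{\mathbf{Q}}_i) < \delta/2 \Big\}
= \Big\{ \mathcal{P}\{\mathbf{W}\} \;\Big|\; \mathbf{W} \in \mathcal{U}(D, K_{\rm tx}),\ \big| \mathbf{W}^{\dagger} \hat{\mathbf{Q}}_i \hat{\mathbf{Q}}_i^{\dagger} \mathbf{W} \big| > \cos^2(\delta/2) \Big\}.
\]
Now consider $\mathbf{V}$ uniformly distributed over $\mathcal{U}(D, K_{\rm tx})$ as in Lemma 4.3.1. Since both $\mathbf{V}$ and $\hat{\mathbf{Q}}_i$ are semi-unitary, one obtains $0 \le \big| \mathbf{V}^{\dagger} \hat{\mathbf{Q}}_i \hat{\mathbf{Q}}_i^{\dagger} \mathbf{V} \big| \le 1$ [182].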
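By pessimistically assuming that $\big| \mathbf{V}^{\dagger} \hat{\mathbf{Q}}_i \hat{\mathbf{Q}}_i^{\dagger} \mathbf{V} \big| = 0$ when $\mathcal{P}\{\mathbf{V}\} \notin \bigcup_i \mathcal{Q}_i$ and $\big| \mathbf{V}^{\dagger} \hat{\mathbf{Q}}_i \hat{\mathbf{Q}}_i^{\dagger} \mathbf{V} \big| = \cos^2(\delta/2)$ when $\mathcal{P}\{\mathbf{V}\} \in \bigcup_i \mathcal{Q}_i$, one can lower bound $\bar{C}^D_{\rm LB1}(\hat{\mathbf{T}})$ in (4.8) as:
\[
\bar{C}^D_{\rm LB1}(\hat{\mathbf{T}}) \ge \mathbb{P}\Big( \mathcal{P}\{\mathbf{V}\} \in \bigcup_i \mathcal{Q}_i \Big)\, \mathbb{E}_{\mathbf{H}^D} \log\big[ \gamma \cos^2(\delta/2) \big]
= \mathbb{P}\Big( \mathcal{P}\{\mathbf{V}\} \in \bigcup_i \mathcal{Q}_i \Big) \big[ \beta + \log \cos^2(\delta/2) \big], \tag{4.11}
\]
where $\gamma$ denotes the scalar factor multiplying $\big| \mathbf{V}^{\dagger} \hat{\mathbf{Q}}_i \hat{\mathbf{Q}}_i^{\dagger} \mathbf{V} \big|$ in (4.8), and $\beta$ is given by (4.9c) and follows from the results on the log-determinant of a Wishart matrix [183] (see footnote 2). Since $\mathbf{V}$ is uniformly distributed over $\mathcal{U}(D, K_{\rm tx})$, based on results in [149, 184], one obtains:
\[
\mathbb{P}\Big( \mathcal{P}\{\mathbf{V}\} \in \bigcup_i \mathcal{Q}_i \Big) \ge |\mathcal{S}| \left( \frac{1 - \cos^{2/K_{\rm tx}}(\delta/2)}{K_{\rm tx}} \right)^{D K_{\rm tx} + \epsilon}, \tag{4.12}
\]
where $\epsilon = o(D)$. Using (4.11)-(4.12) and Lemma 4.3.1, one arrives at (4.9a). Note that $\hat{\mathbf{T}}$ affects $\bar{C}^D_{\rm LB}(\hat{\mathbf{T}})$ only via the term $\delta$ (for a fixed $L_{\rm tx}$). Therefore, if the partial derivative of $\bar{C}^D_{\rm LB}(\hat{\mathbf{T}})$ with respect to $\delta$ is non-negative, then maximizing $\delta$ is equivalent to maximizing $\bar{C}^D_{\rm LB}(\hat{\mathbf{T}})$. The required condition can be found as $\partial \bar{C}^D_{\rm LB}(\hat{\mathbf{T}}) / \partial \cos^2(\delta/2) \le 0$, i.e.,
\[
\frac{(D K_{\rm tx} + \epsilon) \cos^{2/K_{\rm tx}}(\delta/2)}{K_{\rm tx}} \big[ \beta + \log \cos^2(\delta/2) \big] \ge 1 - \cos^{2/K_{\rm tx}}(\delta/2). \tag{4.13}
\]
Since $\cos(\delta) = \min_{i \neq j} \sqrt{ \big| \hat{\mathbf{Q}}_i^{\dagger} \hat{\mathbf{Q}}_j \hat{\mathbf{Q}}_j^{\dagger} \hat{\mathbf{Q}}_i \big| } \ge 0$, we have $\cos^2(\delta/2) = \frac{\cos(\delta) + 1}{2} \ge \frac{1}{2}$. Therefore a sufficient condition for (4.13) can be obtained as:
\[
\Big( D + \frac{\epsilon}{K_{\rm tx}} \Big) \big[ \beta - \log 2 \big] \ge 2^{1/K_{\rm tx}} - 1.
\]
Letting $|\epsilon| \le K_{\rm tx} D / 2$ for $D \gg 1$ (since $\epsilon = o(D)$), it can be verified that the above holds for $\beta \ge 2$. Thus, (4.10) follows.

Footnote 2: Note that $\big| \mathbf{H}^D [\mathbf{H}^D]^{\dagger} \big|$ is the determinant of an $M_{\rm rx} \times M_{\rm rx}$ complex Wishart matrix with $D$ degrees of freedom.

Since the objective in (4.6) is not in closed form, for $\boldsymbol{\Lambda}^D_{\rm tx} = \mathbf{I}_D$ the sub-optimal RD-beamformer design problem that maximizes $f_{\rm FS}(\hat{\mathbf{T}})$ in (4.10) is considered, i.e., the focus is on finding:
\[
\hat{\mathbf{T}}_{\rm FS} = \operatorname*{argmax}_{\hat{\mathbf{T}} \in \mathcal{T}_G} \min_{i \neq j} d_{\rm FS}\big( \hat{\mathbf{Q}}_i, \hat{\mathbf{Q}}_j \big). \tag{4.14}
\]
While it only maximizes a lower bound $\bar{C}^D_{\rm LB}(\hat{\mathbf{T}})$ to the mean sum-capacity, the metric $f_{\rm FS}(\hat{\mathbf{T}})$ can be readily computed for each candidate $\hat{\mathbf{T}}$, unlike $\bar{C}^D(\hat{\mathbf{T}})$ in (4.5).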
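To illustrate how the metric in (4.14) can be evaluated for a candidate RD-beamformer, the following sketch computes the minimum pairwise Fubini-Study distance over a switch set. It assumes, for illustration only, that the switch set contains every $K$-column subset of the candidate and that the orthonormalization is done by a QR decomposition; the function names `fubini_study` and `f_fs` and all numerical values are hypothetical.

```python
import itertools
import numpy as np

def fubini_study(A, B):
    """d_FS between the subspaces spanned by semi-unitary A and B, as in (4.7)."""
    G = A.conj().T @ B
    det = np.linalg.det(G @ G.conj().T).real
    return float(np.arccos(np.sqrt(np.clip(det, 0.0, 1.0))))

def f_fs(T_hat, K):
    """Minimum pairwise d_FS over all K-column subsets of T_hat.
    Taking every subset as a switch position is an assumption of this sketch."""
    L = T_hat.shape[1]
    # QR orthonormalises each selected column set (the role of G_i)
    Qs = [np.linalg.qr(T_hat[:, list(sel)])[0]
          for sel in itertools.combinations(range(L), K)]
    return min(fubini_study(Qi, Qj) for Qi, Qj in itertools.combinations(Qs, 2))

rng = np.random.default_rng(1)
T = rng.standard_normal((4, 5)) + 1j * rng.standard_normal((4, 5))
scale = rng.standard_normal(5) + 1j * rng.standard_normal(5)
val = f_fs(T, K=2)
val_scaled = f_fs(T @ np.diag(scale), K=2)   # columns rescaled by complex factors
print(val, val_scaled)
```

Rescaling the columns leaves every selected subspace unchanged, so the two printed values agree up to round-off, consistent with the column-scaling invariance used to define $\mathcal{T}_G$.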
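4.3.1 Interpreting the Fubini-Study distance metric $f_{\rm FS}(\hat{\mathbf{T}})$

Note that for $\boldsymbol{\Lambda}^D_{\rm tx} = \mathbf{I}_D$, the mean hSNR sum-capacity of the RD-beamformer in (4.5) can be alternately expressed as:
\[
\bar{C}^D(\hat{\mathbf{T}}) = \mathbb{E}_{\mathbf{H}^D} \Big\{ \max_{1 \le i \le |\mathcal{S}|} \big\{ C^D_i(\hat{\mathbf{T}}, \mathbf{H}^D) \big\} \Big\}, \tag{4.15}
\]
where $C^D_i(\hat{\mathbf{T}}, \mathbf{H}^D) = \log \big| \mathbf{I}_{M_{\rm rx}} + \frac{\rho}{M_{\rm rx}} \mathbf{R}_{\rm rx}^{1/2} \mathbf{H}^D \hat{\mathbf{Q}}_i \hat{\mathbf{Q}}_i^{\dagger} [\mathbf{H}^D]^{\dagger} \mathbf{R}_{\rm rx}^{1/2} \big|$. Based on the analysis in Chapter 3, these individual instantaneous hSNR capacities $C^D_i(\hat{\mathbf{T}}, \mathbf{H}^D)$ are approximately jointly Gaussian distributed with second order statistics given by:
\begin{align}
\mathbb{E}\{ C^D_i(\hat{\mathbf{T}}, \mathbf{H}^D) \} &\approx M_{\rm rx} \log \left( \frac{\rho\, K_{\rm tx}^{3/2}}{M_{\rm rx} \sqrt{K_{\rm tx} + 1}} \right) + \log |\mathbf{R}_{\rm rx}|, \tag{4.16a} \\
\mathrm{Crosscov}\big\{ C^D_i(\hat{\mathbf{T}}, \mathbf{H}^D),\, C^D_j(\hat{\mathbf{T}}, \mathbf{H}^D) \big\} &\approx M_{\rm rx} \log \left( 1 + \frac{\| \hat{\mathbf{Q}}_j^{\dagger} \hat{\mathbf{Q}}_i \|_F^2}{K_{\rm tx}^2} \right) \tag{4.16b} \\
&\ge M_{\rm rx} \log \left( 1 + \frac{\big[ \cos d_{\rm FS}(\hat{\mathbf{Q}}_i, \hat{\mathbf{Q}}_j) \big]^{2/K_{\rm tx}}}{K_{\rm tx}} \right), \nonumber
\end{align}
where the last step follows by applying the AM-GM inequality on the eigenvalues of $\hat{\mathbf{Q}}_j^{\dagger} \hat{\mathbf{Q}}_i \hat{\mathbf{Q}}_i^{\dagger} \hat{\mathbf{Q}}_j$ and using (4.7). Therefore, by maximizing $f_{\rm FS}(\hat{\mathbf{T}})$ in (4.14), a lower bound to the largest cross-covariance term among the individual hSNR capacities $\{C^D_i(\hat{\mathbf{T}}, \mathbf{H}^D)\}$ is minimized (see footnote 3). This is an intuitively pleasing result, since reducing cross-covariance typically shifts the probability distribution of the maximum of a set of Gaussian random variables to the right [185].

Footnote 3: If one replaces the Fubini-Study distance $d_{\rm FS}(\cdot)$ in (4.14) by the chordal distance [149], the corresponding solution $\hat{\mathbf{T}}_{\rm chord}$ exactly minimizes the largest cross-covariance term in (4.16b). However, simulations show no improvement in mean sum-capacity with this replacement. Hence we stick to the Fubini-Study distance.

4.4 Design of the RD-beamformer

Since (4.14) tries to maximize the minimum Fubini-Study distance between the subspaces $\{\mathcal{P}\{\hat{\mathbf{Q}}_i\} \,|\, 1 \le i \le |\mathcal{S}|\}$, it may seem identical to the well studied problem of Grassmannian subspace packing, for which several efficient algorithms are available in the literature (see [186] and references therein).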
However, there is a subtle difference, which stems from the fact that $\hat{Q}_i = \hat{T} S_i \hat{G}_i$ for $i \in \{1, \ldots, |\mathcal{S}|\}$ are all generated from the same RD-beamformer $\hat{T}$. They are therefore coupled, making (4.14) a coupled Grassmannian sub-space packing problem. This is illustrated via a toy example in Fig. 4.1, where the $\mathcal{P}\{\hat{Q}_i\}$'s are represented as planes passing through the origin. Here, rotating $\mathcal{P}\{\hat{Q}_1\}$ about $[\hat{T}]_{c\{2\}}$ would require moving $[\hat{T}]_{c\{1\}}$, which may in turn change other sub-spaces that contain $[\hat{T}]_{c\{1\}}$, such as $\mathcal{P}\{\hat{Q}_2\}$ and $\mathcal{P}\{\hat{Q}_4\}$. Thus the $\mathcal{P}\{\hat{Q}_i\}$'s are coupled in general, and (4.14) should rather be interpreted as trying to pack the columns $[\hat{T}]_{c\{\ell\}}$ such that the planes (the $\mathcal{P}\{\hat{Q}_i\}$'s) are well separated. To the best of our knowledge, such a problem has not been addressed in literature before.

Figure 4.1: An illustration of the coupled subspace packing problem (4.14) for $D = 3$, $K_{tx} = 2$ and a real field. Here $\hat{T} S_1 = [\hat{T}]_{c\{1,2\}}$, $\hat{T} S_2 = [\hat{T}]_{c\{1,3\}}$, $\hat{T} S_3 = [\hat{T}]_{c\{3,L_{tx}\}}$ and $\hat{T} S_4 = [\hat{T}]_{c\{1,4\}}$.

Lemma 4.4.1. If $L_{tx} \le D$, any $\hat{T} \in \mathcal{U}(D, L_{tx})$ is optimal for (4.14).

Proof. From (4.7) it can be readily verified that $0 \le d_{FS}(\cdot) \le \pi/2$. Therefore from (4.14), $0 \le f_{FS}(\hat{T}) \le \pi/2$. Now, for $L_{tx} \le D$, consider a $D \times L_{tx}$ RD-beamformer $\hat{T}$ such that $\hat{T}^{\dagger} \hat{T} = I_{L_{tx}}$. For any $i, j \in \{1, \ldots, |\mathcal{S}|\}$ with $i \ne j$, there exists $\ell \in \{1, \ldots, L_{tx}\}$ such that $S_i$ picks $[\hat{T}]_{c\{\ell\}}$ but $S_j$ does not. Then one obtains $\hat{Q}_j^{\dagger} [\hat{T}]_{c\{\ell\}} = O_{K_{tx} \times 1}$, which follows from the fact that $\hat{T}$ has orthonormal columns and hence $\mathcal{P}\{\hat{T} S_j\} \perp [\hat{T}]_{c\{\ell\}}$. Furthermore, since $[\hat{T}]_{c\{\ell\}}$ lies in the column space of $\hat{Q}_i$, $\exists\, a \in \mathbb{C}^{K_{tx} \times 1}$ such that $\hat{Q}_i a = [\hat{T}]_{c\{\ell\}}$. Then:

$a^{\dagger} \hat{Q}_i^{\dagger} \hat{Q}_j \hat{Q}_j^{\dagger} \hat{Q}_i a = [\hat{T}]_{c\{\ell\}}^{\dagger} \hat{Q}_j \hat{Q}_j^{\dagger} [\hat{T}]_{c\{\ell\}} = 0 \;\Rightarrow\; d_{FS}(\hat{Q}_i, \hat{Q}_j) = \pi/2.$ (4.17)

Since $\hat{T}$ satisfies the upper bound on $f_{FS}(\cdot)$, the lemma follows.

Unfortunately, solutions to (4.14) are not known for the more interesting case of $L_{tx} > D$.
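As a numerical illustration of Lemma 4.4.1, the following minimal sketch evaluates the metric $f_{FS}(\hat{T})$ for a random semi-unitary $\hat{T}$ with $L_{tx} \le D$. It assumes that (4.7) defines $d_{FS}(\hat{Q}_i, \hat{Q}_j) = \arccos \sqrt{|\hat{Q}_i^{\dagger} \hat{Q}_j \hat{Q}_j^{\dagger} \hat{Q}_i|}$, which is how the distance is used in (4.17). The function names, the QR-based ortho-normalization and the choice of all $K_{tx}$-subsets as the switch position set are illustrative assumptions, not part of the original design.

```python
import itertools
import numpy as np


def orthonormal_basis(A):
    """Return a matrix with orthonormal columns spanning the columns of A (reduced QR)."""
    Q, _ = np.linalg.qr(A)
    return Q


def fubini_study_distance(Qi, Qj):
    """Assumed form of (4.7): arccos of |det(Qi^H Qj)| for orthonormal Qi, Qj."""
    c = abs(np.linalg.det(Qi.conj().T @ Qj))
    return float(np.arccos(np.clip(c, 0.0, 1.0)))


def f_fs(T_hat, switch_sets):
    """Minimum pairwise Fubini-Study distance over the selected sub-spaces.

    switch_sets: list of tuples of 0-based column indices, one tuple per selection matrix.
    """
    bases = [orthonormal_basis(T_hat[:, list(cols)]) for cols in switch_sets]
    return min(fubini_study_distance(a, b) for a, b in itertools.combinations(bases, 2))


# Hypothetical toy dimensions with L_tx <= D.
rng = np.random.default_rng(0)
D, L_tx, K_tx = 6, 4, 2
T_hat = orthonormal_basis(
    rng.standard_normal((D, L_tx)) + 1j * rng.standard_normal((D, L_tx))
)
all_sets = list(itertools.combinations(range(L_tx), K_tx))
value = f_fs(T_hat, all_sets)
print(value, np.pi / 2)
```

For orthonormal columns every pair of distinct selections attains the upper bound $\pi/2$, up to floating-point error, in line with the lemma.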
However, a related problem is that of Grassmannian line packing:

$\hat{T}_{LP} = \arg\max_{\hat{T} \in \mathcal{T}_G} \min_{i \ne j} \big\{ d_{FS}\big([\hat{T}]_{c\{i\}}, [\hat{T}]_{c\{j\}}\big) \big\},$ (4.18)

which tries to maximize the minimum Fubini-Study distance between the columns of $\hat{T}$, and for which several near-optimal solutions are available in literature [186, 187]. Both problems have identical solutions $\hat{T}_{FS} = \hat{T}_{LP}$ for $L_{tx} \le D$. While this is not true for $L_{tx} > D$, it is hypothesized here that $\hat{T}_{LP}$ might still serve as a good, analytically tractable, sub-optimal solution to (4.14). One important difference, however, is that unlike $\hat{T}_{FS}$, $\hat{T}_{LP}$ is independent of the switch position set $\mathcal{S}$, and may therefore perform poorly for certain $\mathcal{S}$ if $L_{tx} > D$. Therefore, some numerical optimization algorithms to adapt $\hat{T}_{LP}$ to $f_{FS}(\cdot)$ are explored in Appendix 4.D. These algorithms are used later in Section 4.7 to evaluate the quality of the line packed solution $\hat{T}_{LP}$ via simulations.

4.5 Reducing the hardware and computational complexity

For each UE group, it was assumed in Section 2.1 that $K_{tx}, L_{tx}$ are chosen by an inter-group resource sharing algorithm. Full flexibility in picking $K_{tx}, L_{tx}$ imposes a significant hardware cost for large values of $M_{tx}$. Additionally, a large $|\mathcal{S}|$ may also increase the computational complexity of picking the best selection matrix for each channel realization. This section discusses methods to reduce these hardware and computational costs when $K_{tx}, L_{tx}$ are pre-fixed values, i.e., inter-group resource sharing only involves power allocation. In this case, one can restrict the discussion to the beamformer and selection bank for a single UE group.

4.5.1 Reducing hardware cost of the beamforming matrix

In general, a variable gain phase-shifter is required to implement each element of the analog beamforming matrix $T$, thereby needing $M_{tx} L_{tx}$ components. This leads to a large implementation cost, especially when $L_{tx} > D$.
However, if $T$ is designed a priori for a fixed value of $D$, the hardware cost can be reduced significantly, as illustrated next. Note that the proposed beamforming matrix can be expressed as $T = T_{var} \hat{T}_{fix}$, where $T_{var} = E_{tx}^{D} [\hat{T}]_{c\{1:D\}}$, $\hat{T}_{fix} = \big[\, I_D \;\; [\hat{T}]_{c\{1:D\}}^{-1} [\hat{T}]_{c\{(D+1):L_{tx}\}} \,\big]$ and $\hat{T}$ is a $D \times L_{tx}$ RD-beamformer designed for either (4.14) or (4.18). Firstly, by implementing the two components $T_{var}$ and $\hat{T}_{fix}$ separately as shown in Fig. 4.2, the number of required variable gain phase-shifters reduces to $D(M_{tx} + L_{tx} - D)$, which can be a significant reduction when $M_{tx} \gg D$ and $L_{tx} > D$. The number of analog power dividers may however increase by a factor of $D/K_{tx}$. Secondly, since the design of $\hat{T}$ is independent of aCSI given $D$ and $\Lambda_{tx}^{D} = I_D$ (see (4.14)), the $D(L_{tx} - D)$ components of $\hat{T}_{fix}$ can be implemented using a fixed phase-shifter array. Later in Section 4.8 it shall be shown that this fixed structure is also applicable when $\Lambda_{tx}^{D} \ne I_D$. Further reduction in the hardware complexity is possible using unit gain, discrete phase-shifter components for the beamformer [41,55,66]. The use and impact of such components, however, is beyond the scope of this chapter.

Figure 4.2: Block diagram of a reduced complexity analog front-end design for HBwS, corresponding to one UE group.

4.5.2 Restricting the size of the switch position set

In this subsection, the size of the switch position set $\mathcal{S}$ for each UE group is restricted. The size restriction not only reduces the hardware cost of implementing the switch bank, but also reduces the computational effort of picking the best selection matrix for a channel realization. In fact, since the $\hat{Q}_i$'s are coupled, some selection matrices may contribute little to the overall system performance. Let us define for each selection matrix $S_i$ a corresponding set $\mathcal{B}_i \subseteq \{1, \ldots, L_{tx}\}$ such that $\ell \in \mathcal{B}_i$ iff $[S_i]_{c\{k\}} = [I_{L_{tx}}]_{c\{\ell\}}$ for some $1 \le k \le K_{tx}$, i.e., $S_i$ connects input port $\ell$ of $T$ to some up-conversion chain.
It can then be shown that ^ Q y i ^ Q j ^ Q y j ^ Q i hasjB i \B j j unity eigenvalues (see Appendix 4.E). Therefore, from the denition of d FS () in (4.7), we have for i6=j: cos d FS ( ^ Q i ; ^ Q j ) 2 h # Ktx f ^ Q y i ^ Q j ^ Q y j ^ Q i g i KtxjB i \B j j ; (4.19a) cos d FS ( ^ Q i ; ^ Q j ) 2 h # jB i \B j j+1 f ^ Q y i ^ Q j ^ Q y j ^ Q i g i KtxjB i \B j j ; (4.19b) where # k (A) represents thek-th largest eigenvalue of a matrix A. Bounds in (4.19) suggest that reducingjB i \B j j might help increase cos d FS ( ^ Q i ; ^ Q j ) . Therefore a good way of increasing C D LB ( ^ T) in Theorem 4.3.1 is to reducejB i \B j j for i6= j. However,jSj should also be kept as large as possible to minimize the performance loss. In other words, one wishes to nd the largest family of subsets ~ B such that: 4 ~ B = B 1 ;::;B j ~ Bj B i f1;::;L tx g;jB i j =K tx andjB i \B j j8i6=j : (4.20) Finding the largest such family is an open, but well studied, problem in the eld of extremal combinatorics. Based on some of these results, one obtains the following theorem: 4 ~ B =fB1;::;B j ~ Bj g is a set of subsets of column indices of T, each element of which corresponds to a selection matrix S2S. 49 Theorem 4.5.1 (K tx -uniform,f0 :g-intersecting subsets). Let ~ B be the largest subset of the power set off1;:::;L tx g such that (4.20) is satised. Then the cardinality of ~ B satises: L tx 2K tx +1 q +1 j ~ Bj Ltx +1 Ktx +1 if L tx 2K 2 tx ; (4.21) where q is the largest prime number such that qL tx =K tx . Proof. The upper bound is derived in [188] and an algorithm that achieves the lower bound was proposed in [189, Theorem 4.11], which is reproduced below for convenience. Let q be the largest prime number such that q L tx =K tx . If q K tx , a construction of a family of q +1 subsets with the required, bounded overlap is given by Algorithm 4.1. 
Algorithm 4.1: Frankl-Babai Construction [189]
(Here $\kappa$ denotes the maximum overlap allowed between any two subsets in (4.20).)
1: for $i = 1$ to $q^{\kappa+1}$ do
2:   $\mathcal{B}_i = \emptyset$
3:   for $j = 0$ to $\kappa$ do
4:     $a_j = \mathrm{mod}(\lfloor i / q^{j} \rfloor, q)$ {Here $\lfloor \cdot \rfloor$ implies integer division}
5:   end for
6:   $f(x) \triangleq \sum_{j=0}^{\kappa} a_j x^{j}$
7:   for $k = 0$ to $K_{tx} - 1$ do
8:     $\mathcal{B}_i = \mathcal{B}_i \cup \{kq + \mathrm{mod}(f(k), q) + 1\}$
9:   end for
10: end for

Now from Bertrand's postulate [190,191], there always exists a prime number $q$ between $L_{tx}/(2K_{tx})$ and $L_{tx}/K_{tx}$, i.e., $q \ge L_{tx}/(2K_{tx})$. Therefore a sufficient condition for Algorithm 4.1 is $L_{tx}/(2K_{tx}) \ge K_{tx}$. This concludes the theorem.

For $\tilde{\mathcal{B}}$ designed by Algorithm 4.1, each subset $\mathcal{B}_i$ picks exactly one element in the interval $(kq, (k+1)q]$ for $k = 0, \ldots, K_{tx} - 1$. Therefore the corresponding switch position set $\mathcal{S}_{Alg1(\kappa)} = \{S_1, \ldots, S_{|\mathcal{S}_{Alg1(\kappa)}|}\}$ can be implemented by equipping each up-conversion chain with a 1-to-$q$ switch as depicted in Fig. 4.2, i.e., each up-conversion chain has an exclusive set of input ports to connect to. This leads to a significant saving in hardware cost as opposed to a system with all possible selections. Note that this reduced complexity structure is analogous to the designs in [23,31,58] for conventional hybrid beamforming and antenna selection.

4.6 Channel Estimation Overhead

For performing HBwS to a UE group, the knowledge of $\{R_{tx}, \tilde{H}_1 T, \ldots, \tilde{H}_{M_1} T\}$ is required at the BS and that of $\{\tilde{H}_m T S G\}$ is required at each UE $m$. In this section the corresponding CE overhead is quantified for a narrow-band Orthogonal Frequency Division Multiplexing (OFDM) system. While the knowledge of $\{R_{rx,1}, \ldots, R_{rx,M_1}\}$ was also assumed to be available at the BS in Section 2.1, this knowledge is not utilized in the proposed beamformer designs (see Sections 4.4 and 4.8). Note that since the different UE groups are assumed to have orthogonal channels, pilots can be reused across the UE groups. Hence, without loss of generality, the CE overhead is quantified by considering a single representative group.
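The construction in Algorithm 4.1 can be sketched in a few lines. The snippet below is a minimal illustration, not a reference implementation: the function name, the argument name `max_overlap` (standing for the overlap bound in (4.20)) and the toy values $q = 5$, $K_{tx} = 3$ are assumptions made here. Each index $i$ is read as the base-$q$ digits of a polynomial over the integers modulo $q$; two distinct polynomials of degree at most `max_overlap` agree in at most `max_overlap` points, which is what bounds the pairwise intersection.

```python
from itertools import combinations


def frankl_babai_family(q, K_tx, max_overlap):
    """Build q**(max_overlap + 1) subsets of {1, ..., K_tx * q}, each of size K_tx.

    q is assumed to be prime with q >= K_tx, so that the evaluation points
    k = 0, ..., K_tx - 1 are distinct modulo q.
    """
    if q < K_tx:
        raise ValueError("the construction needs q >= K_tx")
    if q < 2 or any(q % p == 0 for p in range(2, int(q**0.5) + 1)):
        raise ValueError("q must be prime")
    family = []
    for i in range(1, q ** (max_overlap + 1) + 1):
        # Base-q digits of i serve as polynomial coefficients a_0, ..., a_max_overlap.
        coeffs = [(i // q**j) % q for j in range(max_overlap + 1)]
        subset = set()
        for k in range(K_tx):
            f_k = sum(a * k**j for j, a in enumerate(coeffs))
            # One port from the k-th block (kq, (k+1)q].
            subset.add(k * q + (f_k % q) + 1)
        family.append(frozenset(subset))
    return family


family = frankl_babai_family(q=5, K_tx=3, max_overlap=1)
largest_overlap = max(len(a & b) for a, b in combinations(family, 2))
print(len(family), largest_overlap)
```

Because each subset takes exactly one port from each block of $q$ consecutive ports, the resulting family maps directly onto the exclusive 1-to-$q$ switch per up-conversion chain described in the text.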
A study of the impact of channel quantization or estimation errors on system performance is beyond the scope of this chapter. Several algorithms have been proposed to acquire aCSI statistics, such as $R_{tx}$, with minimal training [80,99]. Additionally, since $R_{tx}$ remains constant for a long time duration and over a large bandwidth [192,193], it can be acquired at the BS with low overhead via uplink channel training, in both TDD and FDD systems. Similarly, the acquisition of $\tilde{H}_m T S G$ at each UE $m$ imposes a small overhead, since each element of $u$ in (2.1b) can use a different pilot sub-carrier and all the UEs can be trained in parallel via downlink training, once $S \in \mathcal{S}$ is picked. The main bottleneck is the estimation of $\{\tilde{H}_1 T, \ldots, \tilde{H}_{M_1} T\}$ at the BS. Note that it is sufficient to estimate these products for a sub-matrix of $T$ whose columns form a basis for $\mathcal{P}\{T\}$. Since $|\mathcal{P}\{T\}| \le \min\{D, L_{tx}\}$ from (4.3), this involves estimation of $\min\{D, L_{tx}\}\, M_{rx}$ channel coefficients. These coefficients can be obtained either via uplink pilot training in TDD, or via downlink training and feedback of $\tilde{H}_m T$ from each UE $m$ in FDD.

As an illustration, consider a TDD based system where the $M_1$ UEs transmit $\lceil \min\{D, L_{tx}\}/K_{tx} \rceil$ uplink pilot symbols in each coherence time. All the $M_{rx} = M_1 M_2$ UE antennas use orthogonal pilot sub-carriers for parallel training. By using a sequence of $S \in \mathcal{S}$ for the $\lceil \min\{D, L_{tx}\}/K_{tx} \rceil$ pilots, such that each column of $T$ is picked at least once, $\{\tilde{H}_1 T, \ldots, \tilde{H}_{M_1} T\}$ can be estimated at the BS. The corresponding system sum-throughput (including CE overhead) can be expressed as $O_{HBwS}\, C(T)$, where:

$O_{HBwS} = 1 - \Big\lceil \frac{\min\{D, L_{tx}\}}{K_{tx}} \Big\rceil \cdot \frac{\text{symbol duration}}{\text{coherence time}}.$ (4.22)

As is evident, there is a trade-off between the mean hSNR sum-capacity $C(T)$, which is a non-decreasing function of $D$ (see (4.3)), and the CE overhead factor $O_{HBwS}$, which is a non-increasing function of $D$.
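The pre-log factor in (4.22) is simple enough to tabulate directly. The following sketch evaluates it for a few values of $D$; the function name, the argument names and the example numbers ($L_{tx} = 20$, $K_{tx} = 4$, a symbol-to-coherence-time ratio of $0.01$) are illustrative assumptions chosen here, not values prescribed by the analysis.

```python
import math


def hbws_prelog_factor(D, L_tx, K_tx, symbol_to_coherence_ratio):
    """Pre-log factor of (4.22): one pilot symbol per group of K_tx estimated columns."""
    pilots = math.ceil(min(D, L_tx) / K_tx)
    return 1.0 - pilots * symbol_to_coherence_ratio


# Hypothetical example values; only D is varied.
factors = {D: hbws_prelog_factor(D, L_tx=20, K_tx=4, symbol_to_coherence_ratio=0.01)
           for D in (4, 10, 20, 40)}
for D, value in factors.items():
    print(D, round(value, 4))
```

The factor stops decreasing once $D$ exceeds $L_{tx}$, since the beamformer column space cannot have more than $\min\{D, L_{tx}\}$ dimensions to estimate.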
As mentioned in Section 4.1, a good beamformer that maximizes the system throughput $O_{HBwS}\, C(T)$ can therefore be obtained by performing a line search over $D \in \{1, \ldots, M_{tx}\}$. However, proposing a computationally efficient algorithm to find this $D$ is beyond the scope of this thesis. Investigations under certain simplifications shall be considered in Chapter 5 (see also [23,79]).

4.7 Simulation Results

For simulations a TDD based narrow-band OFDM system is considered, with one BS ($M_{tx} = 100$) implementing HBwS and one representative UE group. It is assumed that the shared spatial TX correlation matrix has isotropic scattering within the dominant $D$-dimensional sub-space, i.e., $\Lambda_{tx}^{D} = I_D$ and $E_{tx}$ may be arbitrary. Extensions to the anisotropic case are considered in the next section. The switch bank provides each up-conversion chain with an exclusive set of $\lfloor L_{tx}/K_{tx} \rfloor$ input ports for connection [105]. Unless otherwise stated, it is assumed that all the switch positions possible with this architecture are allowed, i.e.:

$\mathcal{S}_{all} = \Big\{ [I_{L_{tx}}]_{c\{\ell_1, \ldots, \ell_{K_{tx}}\}} \;\Big|\; (k-1)\frac{L_{tx}}{K_{tx}} < \ell_k \le k\frac{L_{tx}}{K_{tx}},\; k \in \{1, \ldots, K_{tx}\} \Big\}.$

For the results, the system sum throughput $O_{HBwS}\, C^{D}(\hat{T})$ is used as the metric, where $O_{HBwS}$ is from (4.22). Since $C^{D}(\hat{T})$ in (4.5) is not known in closed form, Monte-Carlo simulations are used throughout this section to obtain its sample-mean estimate. For each channel realization, a brute-force search is performed to pick the best $S \in \mathcal{S}$. The design of low-complexity algorithms for selecting $S$ is beyond the scope of this chapter (see [27,137] and references therein). Note that the system model and mean capacity bound in Sections 2.1 to 4.1 are also applicable to HBaCSI and HBiCSI, by setting $L_{tx} = K_{tx}$, $\mathcal{S} = \{I_{K_{tx}}\}$ (no selection stage) and letting $T$ depend on aCSI and iCSI, respectively. For limiting the CE overhead, the beamformers for HBaCSI and HBiCSI are also restricted to lie in the dominant $D$-dimensional channel subspace.
Their mean hSNR sum-capacity, with hSNR sum-capacity maximizing beamformers, can be obtained by replacingS =fI Ktx g with ^ T HBaCSI = [I D ] cf1:Ktxg and ^ T HBiCSI = E Ktx iCSI in (4.5), respectively, where E Ktx iCSI is the DK tx eigen-vector matrix of [E D tx ] y e H y e HE D tx , corresponding to the K tx largest eigenvalues. Similarly, their throughput can be obtained by using pre-log factors of O HBaCSI = 1 and O HBiCSI = 1dD=K tx e in (4.5), respectively. A proof of the above is skipped for brevity. The choice of D in the following results is arbitrary, and therefore further gains with HBwS are expected if a line search over D is performed. 4.7.1 In uence of number of input ports (L tx ) The system sum throughput with HBwS as a function of number of input ports (L tx ) is studied in Fig. 4.3. Here the performance of both the line packed RD-beamformer - ^ T LP and the RD-beamformer output of Algorithm 4.3 - ^ T Alg3 (see Appendix 4.D) are plotted. The results 0 5 10 15 20 25 30 3.5 4 4.5 5 5.5 6 6.5 7 7.5 8 8.5 Figure 4.3: Comparison of sum throughputs for dierent beamformers, as a function of L tx M tx = 100;D = 10;M rx = 2; = 10,S =S all , ^ T LP is generated using the line packing algorithm in [194], D tx =I D , R rx =I Mrx , = 10 2 . suggest that withL tx = 2D, HBwS outperforms HBaCSI by approximately half the throughput gap between HBiCSI and HBaCSI. One also observes that there is a diminishing increase in throughput as the number of ports L tx is increased. Therefore for a good trade-o between hardware cost and performance, L tx should be of the order of D. Finally, one observes that ^ T LP and ^ T Alg3 have almost identical throughput forS =S all . 52 4.7.2 In uence of restricted switch positions (S) In this sub-section, the performance of HBwS under the restricted switch position setsS Alg1() from Algorithm 4.1 is analyzed. Note thatS Alg1() S all and can therefore be implemented us- ing the switch bank considered here. 
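The switch position set $\mathcal{S}_{all}$ used in these simulations, in which chain $k$ may only connect to its own block of ports, can be enumerated as follows. This is a minimal sketch under the assumption that $L_{tx}$ is divisible by $K_{tx}$; the function name and the representation of a selection matrix by its tuple of 1-based port indices $(\ell_1, \ldots, \ell_{K_{tx}})$ are choices made here for illustration.

```python
from itertools import product


def enumerate_s_all(L_tx, K_tx):
    """List every switch position as a tuple (l_1, ..., l_K) of 1-based port indices.

    Chain k (1-based) may pick any port l with (k-1)*L_tx/K_tx < l <= k*L_tx/K_tx.
    Assumes K_tx divides L_tx.
    """
    if L_tx % K_tx != 0:
        raise ValueError("this sketch assumes K_tx divides L_tx")
    ports_per_chain = L_tx // K_tx
    blocks = [
        range(k * ports_per_chain + 1, (k + 1) * ports_per_chain + 1)
        for k in range(K_tx)
    ]
    return list(product(*blocks))


positions = enumerate_s_all(L_tx=20, K_tx=4)
print(len(positions), positions[0], positions[-1])
```

The set grows as $(L_{tx}/K_{tx})^{K_{tx}}$, which is what makes the brute-force search over $\mathcal{S}$ costly and motivates the restricted sets from Algorithm 4.1.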
The sum-throughput $C^{D}(\hat{T}_{LP})$ with $\mathcal{S}_{Alg1(\kappa)}$, where $\kappa$ denotes the maximum overlap, is compared in Fig. 4.4 to the average sum-throughput with $\hat{T}_{LP}$ and a randomly chosen switch position set $\mathcal{S}_{rand(\kappa)}$ of the same size as $\mathcal{S}_{Alg1(\kappa)}$, averaged over several realizations. The results support our claim that selection matrices with low overlap contribute more to mean hSNR sum-capacity than others. Results also show that while $|\mathcal{S}_{Alg1(\kappa)}|$ increases exponentially with $\kappa$, the increase in throughput is sub-linear, and therefore a small value of $\kappa$ is sufficient to achieve good performance. Fig. 4.4 also compares the sum throughput for $\hat{T}_{LP}$ and $\hat{T}_{Alg3}$, with $\mathcal{S} = \mathcal{S}_{Alg1(\kappa)}$.

Figure 4.4: Comparison of $O_{HBwS}\, C^{D}(\hat{T}_{LP})$ with $\mathcal{S} = \mathcal{S}_{Alg1(\kappa)}$ to the average sum-throughput $E\{O_{HBwS}\, C^{D}(\hat{T}_{LP})\}$ with $\mathcal{S} = \mathcal{S}_{rand(\kappa)}$, averaged over several realizations of $\mathcal{S}_{rand(\kappa)}$, as a function of the maximum overlap $\kappa$. Here, $\mathcal{S}_{Alg1(\kappa)}$ is designed using Algorithm 4.1 and $\mathcal{S}_{rand(\kappa)}$ is a random subset of $\mathcal{S}_{all}$ such that $|\mathcal{S}_{rand(\kappa)}| = |\mathcal{S}_{Alg1(\kappa)}|$. For $\mathcal{S} = \mathcal{S}_{Alg1(\kappa)}$, $O_{HBwS}\, C^{D}(\hat{T}_{Alg3})$ is also plotted. Parameters: $M_{tx} = 100$, $D = 10$, $L_{tx} = 20$, $K_{tx} = M_{rx} = 4$, SNR $= 10$, $\hat{T}_{LP}$ is from [194], $\Lambda_{tx}^{D} = I_D$, $R_{rx} = I_{M_{rx}}$, (symbol duration)/(coherence time) $= 10^{-2}$.

The results suggest that even for these reduced complexity switch position sets, the line packing solution $\hat{T}_{LP}$ is almost as good as $\hat{T}_{Alg3}$. A similar trend has also been observed for several other switch position sets, which are not discussed in this chapter for brevity. This suggests that the design $\hat{T}_{LP}$ cannot be improved upon via conventional approaches such as Algorithms 4.2 & 4.3, at least for the $\mathcal{S}$ considered here. The good performance of $\hat{T}_{LP}$ is probably because $f_{FS}(\hat{T}_{LP}) \approx \pi/2$ when $L_{tx}$ is of the order of $D$, and hence it is near optimal for (4.14) (see Lemma 4.4.1). Henceforth, the use of the computationally intensive Algorithms 4.2 & 4.3 shall be avoided, and only $\hat{T}_{LP}$ shall be used to study the performance of HBwS.
4.7.3 In uence of number of UEs (M 1 ) The case where the representative UE group has M 1 single antenna UEs i.e., M 2 = 1 is con- sidered next. The sum throughput of HBwS (normalized by sum throughput of HBiCSI) for 53 varying M 1 is studied in Fig. 4.5. Such a normalization allows comparison across dierent values of M 1 . From the results, one can observe that the slope of the normalized throughput curves increase with M 1 , suggesting that the additional beam choices with HBwS aid in UE separability. Apart from O HBwS C D ( ^ T LP ), which can be achieved via DPC (see Chapter 2 and also [144]), the normalized sum-throughput for Zero Forcing (ZF) precoding is also plotted in Fig. 4.5. Note that, unlike DPC, ZF precoding may yield poor performance when simultane- ously supporting allM 1 UEs [195]. Therefore the scheduling of a sub-set of UEs (M sc ) from the UE group shall be allowed. The corresponding sum-rate with max-min fairness to the scheduled UEs can be expressed as: C D ZF ( ^ T) =E e H max 1ijSj jM sc j log 1+ Tr e H sc E D tx ^ TS i ^ G i ^ G y i S y i ^ T y [E D tx ] y e H y sc 1 ; where e H sc is a sub-matrix of e H (see (4.1)) corresponding to the scheduled UEsM sc . Results 0 5 10 15 20 25 30 35 40 45 50 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 Figure 4.5: Sum throughput of HBwS (normalized by the sum throughput of HBiCSI) versus L tx , for varying number of UEs. For ZF precoding, the rstjM sc j UEs in the UE group are scheduled 9 M tx = 100;D = 10;M 2 = 1; = 10,S =S all , ^ T LP is from [194], D tx = I D , R rx =I M 1 , = 10 2 . show that HBwS helps reduce the throughput gap between the linear precoding scheme (ZF) and DPC, without requiring sophisticated user scheduling. 4.7.4 In uence of the bound on PfTg - D As mentioned in Section 4.6, there exists a trade-o between mean hSNR sum-capacity and CE overhead as the value of D changes. To illustrate this trade-o, the sum throughput of HBwS, HBaCSI and HBiCSI is studied as a function of D in Fig. 
4.6, for a channel with isotropic scattering, i.e., R tx = I Mtx . As evident from the results, there exists a D that maximizes the sum throughput. Furthermore, this D increases with K tx . Proposing a computationally 4 Sophisticated user scheduling algorithms such as [195] cannot be directly extended to HBwS due to the presence of switching. 54 0 5 10 15 20 25 30 1 2 3 4 5 6 7 8 9 Figure 4.6: Sum throuhgput of dierent schemes, as a function of D M tx = 100;L tx = 20; = 10,S =S all , ^ T LP is from [194], R tx =I Mtx , R rx =I Mrx , = 10 2 . ecient algorithm to nd D is beyond the scope of this chapter (see Chapter 5 for some investigations). However a simple rule of thumb is to ensure that D K tx = to reduce overhead. 4.8 Anisotropic Channels In this section, the beamformer design in Sections 4.2-4.4 is extended to anisotropic scattering in the dominant D dimensional sub-space, i.e., D tx 6= I D . Without loss of generality, the eigenvalues in D tx are assumed to be arranged in the descending order of magnitude. Similar to the approach used in [86,90,196], the companding trick is employed to adapt the beamforming matrix to D tx , as: T ani = E D tx D tx ^ T LP ; (4.23) where ^ T LP is the DL tx line packed RD-beamformer. Intuitively, D tx in (4.23) skews the columns of T ani , and therefore also the precoding beams T ani S i G i , to be more densely packed near the eigen-vectors corresponding to the larger eigenvalues of R tx . 5 For improving through- put, an additional optimization of T ani over the choice of D is required (see Section 4.6). Note that this skewed beamformer can still be implemented using the two stage design in Sec 4.5.1 by using T var = E D tx D tx [ ^ T LP ] cf1:Dg . For L tx D, not all line packed matrices ^ T LP yield good performance after skewing by (4.23). Therefore for L tx D, the use of ^ T LP = W O (DLtx)Ltx y in (4.23) is suggested, where W is the L tx L tx DFT matrix, i.e., [W] a;b =e j2ab L tx = p L tx . 
For simulations, the BS is assumed to have a half wavelength (=2) spaced uniform planar array of dimension 4010. The TX transmits to a UE group ofM 1 = 3 single antenna receivers 5 A more general approach involves using [ D tx ] in (4.23), where > 0 is a skewing parameter. 55 that share the same transmit PAS, given by: PAS(;) = 3 X i=1 exp h j ~ i jj ~ i j i ( ~ i ; ~ i ) (4.24) where: (;) = 1 forjj=20;jj=20 0 otherwise ; ~ = 3 10 ; 0; 5 , ~ = 6 10 ; 8 10 ; 7 10 , 2 [=2;=2) is the azimuth angle of departure, 2 [0;) is the elevation angle of departure and is a factor that controls the anisotropy of the channel. For this PAS, the transmit correlation matrix can be computed as: [R tx ] ab = R =2 =2 R 0 PAS(;)e j2d H (a;b) sin() sin()+j2d V (a;b) cos() sin()dd R =2 =2 R 0 PAS(;) sin()dd ; (4.25) where d H (a;b);d V (a;b) are the horizontal and vertical spacing (in wavelengths) between el- ements a and b, respectively. For this channel, the mean hSNR sum-capacity of the skewed 0 10 20 30 40 50 60 70 80 90 100 7 8 9 10 11 12 13 14 15 (a)fD;Ltx;Ktx;Mrxg =f24; 9; 3; 3g 0 10 20 30 40 50 60 70 80 90 100 7 8 9 10 11 12 13 14 15 (b)fD;Ltx;Ktx;Mrxg =f24; 51; 3; 3g Figure 4.7: Mean hSNR sum-capacity for the skewed beamformer T ani as a function of channel anisotropy. (a) ^ T LP = W O (DLtx)Ltx y , where W is the L tx L tx DFT matrix (b) ^ T LP is from [194] simulation parameters: M tx = 400,S =S all , R rx =I Mrx and = 1 . beamforming matrix T ani is compared to T LP = E D tx ^ T LP in Fig. 4.7 as a function of the channel anisotropy, for the both cases of L tx D and L tx >D. For a comparison to existing designs, the mean hSNR sum-capacity of T Sud = [E tx ] cf(1:Ltx)g , a generalization of the beamformer in [41], is also depicted where: 6 (`) = mod (` 1)K tx + j (` 1)K tx L tx k ;L tx + 1: 6 Here T Sud is a column permutation of [Etx] cf1:L tx g which ensures good performance withS all . 
One can observe from the results that the hSNR capacity gap between HBiCSI and HBaCSI reduces as the channel anisotropy increases. Also, unlike $T_{LP}$, the mean hSNR sum-capacity of $T_{ani}$ does not reduce with increasing anisotropy. Further, one can observe that $T_{ani}$ outperforms both $T_{Sud}$ and HBaCSI over the whole range of the anisotropy factor, for both $D \le L_{tx}$ and $D > L_{tx}$. Therefore the designed beamformer is not only more generic (it allows non-orthogonal columns) but also leads to a higher mean hSNR sum-capacity in comparison to existing designs.

4.9 Summary and Conclusions

In this chapter, the analog beamformer design problem is studied and it is shown that a beamforming matrix that maximizes a lower bound to the system mean sum-capacity is obtained by solving a coupled Grassmannian sub-space packing problem. A sub-optimal solution to this problem is proposed, and algorithms to improve the design for a given $\mathcal{S}$ are also explored. It is shown that designing the beamformer a priori for a fixed $D$ enables a two stage implementation, having a fixed and an adaptive stage. This implementation lowers the hardware implementation cost significantly when $L_{tx} > D$. It is also shown that switch positions with low overlap contribute more to mean hSNR sum-capacity than others, and an algorithm is provided for finding a family of such important switch positions. In addition, the impact of $L_{tx}$, the switch bank architecture, the number of UEs and the CE overhead on the system performance is also studied. Simulation results suggest that with $L_{tx} = 2D$, HBwS can achieve gains comparable to half the hSNR capacity gap between HBaCSI and HBiCSI. It is also observed that for the explored switch position sets, the RD-beamformer design $\hat{T}_{LP}$ cannot be improved further via conventional approaches such as Algorithms 4.2 & 4.3. Furthermore, for a good trade-off between performance and hardware cost, the number of input ports ($L_{tx}$) should be of the order of $D$.
However, larger values may be practical in a multi-user scenario, since a larger L tx can aid separation of multiple data streams. In particular, it helps make performance of linear ZF precoding comparable to the non-linear and capacity optimal DPC, without the need for sophisticated user scheduling algorithms. Picking DK tx = ensures a low CE overhead. Results for anisotropic channels suggest that skewing the line packed beamformer yields better performance in comparison to existing designs. 4.A Appendix A (Proof of Theorem 4.2.1). Let E D tx = [E tx ] cf1:Dg , and let us additionally dene E MtxD tx , [E tx ] cfD+1:Mtxg . Since E tx E y tx =I Mtx , for any beamforming matrix T one can write: T = E tx E y tx T = E D tx [E D tx ] y T + E MtxD tx [E MtxD tx ] y T: (4.26) Additionally let the principal DD submatrix of tx , that contains all non-zero eigenvalues, be given by D tx . Dening ^ T, [E D tx ] y T, for any UE m, from (2.1a) and (2.3b) we have: y m (S i ) = p R 1=2 rx;m H m [ tx ] 1=2 [E tx ] y TS i x + n m 57 = p R 1=2 rx;m H m [ D tx ] 1=2 [E D tx ] y E D tx ^ TS i x + n m ; (4.27) where (4.27) follows from (4.26) and the fact that rankfR tx g D. Now, from the transmit power constraint in (2.2a) we have: tr n TS i E x fxx y gS y i T y o 1 ) tr n E D tx [E D tx ] y E D tx ^ TS i E x fxx y gS y i T y o + tr n E MtxD tx [E MtxD tx ] y TS i E x fxx y gS y i T y o 1 (4.28a) ) tr n E D tx ^ TS i E x fxx y gS y i ^ T y [E D tx ] y o + tr n [E MtxD tx ] y TS i E x fxx y gS y i T y E MtxD tx o 1 (4.28b) ) tr n E D tx ^ TS i E x fxx y gS y i ^ T y [E D tx ] y o 1; (4.28c) where (4.28a) follows by replacing T using (4.26) and by using [E D tx ] y E D tx =I D , (4.28b) follows from the identity that trfA y Bg = trfBA y g for matrices A; B of same dimension, and (4.28c) follows by observing that both matrices on the left hand side are positive semi-denite. 
From (4.28c) and (4.27), for any T andEfxx y g, there exists aDL tx matrix ^ T such that if T satises the power constraint then so does E D tx ^ T and both T; E D tx ^ T yield the same instantaneous received signal y m (S i ). Therefore the theorem follows. 4.B Appendix B (Proof of Theorem 4.2.2). Let ^ T2 C DLtx be any DL tx matrix. For each ^ TS i we have a correspondingK tx K tx ortho-normalization matrix ^ G i . Now consider ^ T = ^ T where is any L tx L tx non-singular complex diagonal matrix. For each selection matrix S i , by dening a corresponding matrix ^ G i , S y i 1 S i ^ G i , we have: ^ G y i S y i ^ T y ^ T S i ^ G i = ^ G y i S y i [ 1 ] y S i S y i y ^ T y ^ T S i S y i 1 S i ^ G i = ^ G y i S y i S i S y i ^ T y ^ TS i S y i S i ^ G i (4.29a) = ^ G y i S y i ^ T y ^ TS i ^ G i (4.29b) = I Ktx ; where (4.29a) follows from the fact that S i S y i is diagonal and hence commutes with and (4.29b) uses the fact that S y i S i =I Ktx . This proves that ^ G i ortho-normalizes columns of ^ T S i . Using a similar sequence of steps it can be shown that ^ T S i ^ G i ^ G y i S y i ^ T y = ^ TS i ^ G i ^ G y i S y i ^ T y and hence from (4.5), C D ( ^ T ) = C D ( ^ T). This proves the theorem. 4.C Appendix C (Proof of Lemma 4.3.1). Consider the decomposition H D = U H D H DV y H D , where U H D is the M rx M rx left singular-vector matrix, H D is the M rx M rx diagonal matrix containing the 58 non-zero singular values, and V H D is theDM rx semi-unitary matrix containing theM rx right singular vectors (corresponding to the non-zero singular values) for H D in (4.5). 7 Now, using the fact thatjI + Aj 1 +jAj for any positive semi-denite matrix A, one can bound C D ( ^ T) in (4.5), for D tx =I D , as: C D ( ^ T)E H D E V H D max 1ijSj log h 1 + V y H D ^ Q i ^ Q y i V H D i ; (4.30) where = Mrx M rx jR rx jj H Dj 2 . Since H D has i.i.d. 
CN (0; 1) components, it is well known that H D and V H D are independently distributed and further, V H D is uniformly distributed overU(D;M rx ) [149, Lemma 4]. Consider a random DK tx matrix V, independent of H D and uniformly distributed over U(D;K tx ). Since the uniform measure is invariant under unitary transformation, for anyDD unitary matrix U, we have: V d = UV) [V] cf1:Mrxg d = U[V] cf1:Mrxg : In other words, [V] cf1:Mrxg is uniformly distributed over U(D;M rx ) and therefore V H D d = [V] cf1:Mrxg . Now replacing V H D in (4.30) by [V] cf1:Mrxg , we have: C D ( ^ T) E H D E V max 1ijSj log h 1+ [V] y cf1:Mrxg ^ Q i ^ Q y i [V] cf1:Mrxg i (4.31) Note that [V] y cf1:Mrxg ^ Q i ^ Q y i [V] cf1:Mrxg is the M rx M rx principal sub-matrix of V y ^ Q i ^ Q y i V. Therefore we have81iM rx : # i f[V] y cf1:Mrxg ^ Q i ^ Q y i [V] cf1:Mrxg g # i+(KtxMrx) fV y ^ Q i ^ Q y i Vg; (4.32a) 1 # k fV y ^ Q i ^ Q y i Vg 0 81kK tx ; (4.32b) where # k fAg represents the k-th largest eigenvalue of a square matrix A, (4.32a) follows from the Cauchy's Interlacing Theorem [176, Corollary 3.1.5] and (4.32b) is obtained by using the results on the spectral norm [182] and by observing that both V and ^ Q i have a largest singular value of 1. Since the determinant is the product of eigenvalues, from (4.31) and (4.32), one arrives at (4.8). Note thatj H Dj 2 = H D [H D ] y . 4.D Appendix D Here, some numerical algorithms to nd good solutions to (4.14) are explored, starting with an initial solution of ^ T LP . First the use of a greedy algorithm that permutes columns of ^ T LP to increase f FS () is proposed, as depicted in Algorithm 4.2. The performance of this permuted 7 Note that this is the compact singular value decomposition [197] which is slightly dierent from the usual approach of singular value decomposition. 59 matrix ^ T Alg2 may further be improved via a numerical gradient ascent of f FS ( ^ T), as depicted in Algorithm 4.3. 
Since $f_{FS}(\hat{T})$ involves the $\min\{\cdot\}$ function, which may not be differentiable, a differentiable relaxation of $f_{FS}(\hat{T})$ is used in Algorithm 4.3:

$$\tilde{f}_{FS}(\hat{T}) = -\frac{1}{10}\log\left[\sum_{i=1}^{|\mathcal{S}|}\sum_{j=i+1}^{|\mathcal{S}|}\exp\left\{-10\, d_{FS}(\hat{Q}_i,\hat{Q}_j)\right\}\right].$$

We therefore have $\hat{T}_{LP} \xrightarrow{\text{Algo 4.2}} \hat{T}_{Alg2} \xrightarrow{\text{Algo 4.3}} \hat{T}_{Alg3}$, where:

$$f_{FS}(\hat{T}_{LP}) \le f_{FS}(\hat{T}_{Alg2}) \le f_{FS}(\hat{T}_{Alg3}) \le f_{FS}(\hat{T}_{FS}).$$

The complexities of Algorithms 4.2 and 4.3 are both $O(D L_{tx} |\mathcal{S}|^2)$ per iteration, which can be substantial when $|\mathcal{S}|$ and/or $L_{tx}$ is large.

Algorithm 4.2: Greedy column permutation algorithm
  Inputs: $D$, $L_{tx}$, $\mathcal{S}$
  Initialize $\hat{T}$ {As obtained by line packing [50]}
  repeat
    for $\ell = 1$ to $L_{tx}$ do
      for $j = 1$ to $L_{tx}$ do
        $\hat{T}_{Alg2}(j) = \hat{T}$
        $[\hat{T}_{Alg2}(j)]_{c\{\ell\}} = [\hat{T}]_{c\{j\}}$ and $[\hat{T}_{Alg2}(j)]_{c\{j\}} = [\hat{T}]_{c\{\ell\}}$ {Swap columns $\ell$ and $j$}
      end for
      Find $j^*$ such that $f_{FS}(\hat{T}_{Alg2}(j^*))$ is largest
      if $f_{FS}(\hat{T}_{Alg2}(j^*)) > f_{FS}(\hat{T})$ then
        $\hat{T} = \hat{T}_{Alg2}(j^*)$
      end if
    end for
  until Convergence of $f_{FS}(\hat{T})$
  return $\hat{T}$

Algorithm 4.3: Gradient ascent algorithm for RD-beamformer design
  Inputs: $D$, $L_{tx}$, $\mathcal{S}$
  Initialize $\hat{T}$, step size $\mu$ and perturbation $\Delta T$
  $\hat{T}_{new} = \hat{T}$
  repeat
    $\hat{T} = \hat{T}_{new}$
    for $i = 1$ to $D$ do
      for $j = 1$ to $L_{tx}$ do
        $\hat{T}_{\Delta} = \hat{T}$
        $[\hat{T}_{\Delta}]_{ij} = [\hat{T}]_{ij} + \Delta T$
        $F_{ij} = \big(\tilde{f}_{FS}(\hat{T}_{\Delta}) - \tilde{f}_{FS}(\hat{T})\big)/\Delta T$ {Computing the gradient of the objective function}
      end for
    end for
    for $j = 1$ to $L_{tx}$ do
      $t = [F]_{c\{j\}} - [\hat{T}]_{c\{j\}}[\hat{T}]^{\dagger}_{c\{j\}}[F]_{c\{j\}}$ {Component of the gradient tangential to the unit norm constraint}
      $[\hat{T}_{new}]_{c\{j\}} = [\hat{T}]_{c\{j\}}\cos(\mu|t|) + t\,\sin(\mu|t|)/|t|$ {Update process ensures unit norm columns [197]}
    end for
  until $\tilde{f}_{FS}(\hat{T}_{new}) \le \tilde{f}_{FS}(\hat{T})$
  return $\hat{T}$

4.E Appendix E

Let us define for each selection matrix $S_i$ a corresponding set $\mathcal{B}_i \subseteq \{1,..,L_{tx}\}$ such that $\ell \in \mathcal{B}_i$ iff $[S_i]_{c\{k\}} = [I_{L_{tx}}]_{c\{\ell\}}$ for some $1 \le k \le K_{tx}$. Then for any $\ell \in \mathcal{B}_i \cap \mathcal{B}_j$, we have $[\hat{T}]_{c\{\ell\}} \in \mathcal{P}\{\hat{Q}_i\} \cap \mathcal{P}\{\hat{Q}_j\}$, i.e., there exist $K_{tx} \times 1$ vectors $\alpha^i_\ell, \alpha^j_\ell$ such that $\hat{Q}_i \alpha^i_\ell = \hat{Q}_j \alpha^j_\ell = [\hat{T}]_{c\{\ell\}}$. Furthermore, since $\hat{Q}_i^{\dagger}\hat{Q}_i = \hat{Q}_j^{\dagger}\hat{Q}_j = I_{K_{tx}}$, we have:

$$(\hat{Q}_j\hat{Q}_j^{\dagger})[\hat{T}]_{c\{\ell\}} = (\hat{Q}_j\hat{Q}_j^{\dagger})\hat{Q}_j\alpha^j_\ell$$
$$\Rightarrow \hat{Q}_j\hat{Q}_j^{\dagger}[\hat{T}]_{c\{\ell\}} = [\hat{T}]_{c\{\ell\}}$$
$$\Rightarrow \hat{Q}_j\hat{Q}_j^{\dagger}\hat{Q}_i\alpha^i_\ell = \hat{Q}_i\alpha^i_\ell$$
$$\Rightarrow \hat{Q}_i^{\dagger}\hat{Q}_j\hat{Q}_j^{\dagger}\hat{Q}_i\alpha^i_\ell = \alpha^i_\ell,$$

i.e., $\alpha^i_\ell$ is an eigenvector of $\hat{Q}_i^{\dagger}\hat{Q}_j\hat{Q}_j^{\dagger}\hat{Q}_i$ with eigenvalue 1. If the vectors $\{[\hat{T}]_{c\{\ell\}} \,|\, \ell \in \mathcal{B}_i \cap \mathcal{B}_j\}$ are linearly independent, then so are $\{\alpha^i_\ell \,|\, \ell \in \mathcal{B}_i \cap \mathcal{B}_j\}$. Therefore, $\hat{Q}_i^{\dagger}\hat{Q}_j\hat{Q}_j^{\dagger}\hat{Q}_i$ has $|\mathcal{B}_i \cap \mathcal{B}_j|$ unity eigenvalues.

Chapter 5
Diversity versus Training overhead Trade-off in switched transceivers

Consider the mean hSNR sum-capacity maximizing analog beamformer design problem (4.3) from Chapter 4. To limit the CE overhead, an arbitrary limit $D$ was imposed on the column-space dimension of the beamformer, $|\mathcal{P}\{T\}|$. As discussed in Section 4.6, CSI is only required within the channel subspace spanned by the beamformer, and therefore such a restriction on $\mathcal{P}\{T\}$ limits the CE overhead. However, since this also limits the number of channel dimensions that the TX can exploit for transmission, the system mean capacity is also slightly reduced. Therefore there is an inherent trade-off between the system mean capacity and the corresponding CE overhead, as also exemplified via simulations in Section 4.7.4.
Such a trade-off is not limited to HBwS but is also applicable to any reduced complexity switched transceiver. Take for example the case of Selective Rake receivers [198,199] in spread spectrum Ultra-Wide Band (UWB) systems. In spread spectrum UWB, the data signals are spread over a wide bandwidth to avail several benefits (see Section 1.3), one of which is access to a large number of resolvable delay bins, say $M_{rx}$. A selective Rake receiver, which is analogous to RX antenna selection in massive MIMO, accumulates power from the strongest $K_{rx}$ channel delay bins by using a bank of $K_{rx}$ correlators. However, this accumulation requires knowledge of the strengths of all the $M_{rx}$ delay bins, which imposes a CE overhead similar to HBwS. Limiting the CE to only $D$ resolvable bins reduces the CE overhead but also the mean capacity, i.e., there exists a similar trade-off.

In this chapter, this mean capacity versus CE overhead trade-off is characterized and the throughput maximizing $D$ is found for a system with a single antenna TX ($M_{tx} = 1$) and a reduced complexity switched RX. Such a deviation from the system model in Chapter 2 is inevitable if a combined treatment of both HBwS and selective Rake is desired. However, the results can be directly extended to the case where the switched architecture is at the TX and the RX has a single antenna, which is more aligned with the system model in Chapter 2. Additionally, the results presented here are only applicable to the special case of HBwS where the RX analog beamformer $T$ is an $M_{rx} \times D$ sub-matrix of either the DFT matrix or $E_{rx}$ ($D = L \le M_{rx}$), where $E_{rx}$ is the eigenvector matrix of $R_{rx}$. Generalization to an arbitrary analog beamformer is beyond the scope of this thesis.

Consider a reduced-complexity switched RX that experiences $M_{rx}$ diversity branches (footnote 1) but only has $K_{rx}$ down-conversion chains (footnote 2). Assuming a single antenna TX and in the absence of interference, the capacity optimal way of combining the received signals is Generalized Selection Combining (GSC) [200], also known as hybrid selection / maximum ratio combining [13]. In GSC, the instantaneously strongest $K_{rx}$ diversity branches are down-converted and maximum-ratio combined. The performance of a GSC system has been studied in great detail for independent Rayleigh fading [14], Nakagami-m fading [201] and arbitrary fading [202-204] channels. Similarly, the correlated channel scenarios have been discussed in [201, 205].

For implementing GSC, the receiver needs the CSI for all the diversity paths. This CSI can be acquired by transmitting a known pilot sequence from the transmitter during each channel coherence time interval. However, for low complexity systems, since the RX has only $K_{rx} < M_{rx}$ down-conversion chains, for each transmitted pilot sequence the RX can acquire CSI for only $K_{rx}$ diversity paths. Therefore, the pilot sequence has to be re-transmitted $\lceil M_{rx}/K_{rx} \rceil$ times to acquire the CSI for all the diversity paths (footnote 3). This training overhead can be especially large when $M_{rx} \gg K_{rx}$, the pilot sequence is long and/or the channel coherence time is short.

In this chapter a novel method to minimize the impact of this training overhead is proposed and analyzed. It is motivated by the fact that in practice, not all diversity branches have the same average power: for example, in a massive MIMO system, the spatial directions/dimensions at the receiver may carry different channel power levels on average. Consequently, some diversity branches might make only a minor contribution to boosting the system mean capacity. Therefore, it is better to acquire CSI for only a subset of $D$ paths, where $K_{rx} \le D \le M_{rx}$, and $D$ is a trade-off between the CE overhead and the performance gain from the increased diversity (see Fig. 5.1). Obviously, the determination of this subset has to be based on second-order statistics of the diversity paths only; those second-order statistics change very slowly with time, and thus can be easily tracked at the RX.

Figure 5.1: Illustration of CSI acquisition for a subset of diversity branches ($D = 8$, $K_{tx} = 5$): (a) Full CSI, (b) Reduced CSI.

Footnote 1: For HBwS at the RX, the $i$-th diversity branch refers to the channel sub-space spanned by the $i$-th column of $E_{rx}$ or the DFT matrix, and for a UWB selective Rake RX it refers to the $i$-th delay bin.
Footnote 2: Henceforth only the phrase "up/down-conversion chains" shall be used, though this always means "up/down-conversion chains for massive MIMO or correlators for Rake receivers."
Footnote 3: Fewer re-transmissions may be required with channel sparsity constraints (see Section 1.1). In this work no such sparsity constraints are assumed.

The main contributions of this chapter are as follows:
1. The problem of finding the data throughput maximizing $D$ and the corresponding set of paths $\mathcal{D}^*(D)$ is formulated.
2. It is proved that for a given value of $D$, a simple decision rule can be used to find the corresponding optimal set of paths $\mathcal{D}^*(D)$.
3. It is proved that the achievable data throughput is a unimodal function of $D$, which ensures that any locally optimum $D$ is also globally optimum.
4. A computationally efficient approximation for the mean capacity of a GSC system is proposed, which significantly reduces the complexity of finding the optimal $D$.
5. The diversity versus training overhead trade-off is studied in two practically important low complexity switched RXs.
The organization of this chapter is as follows: the channel model for this chapter is given in Section 5.1; the theoretical results on the optimal value of $D$ and the corresponding set of paths $\mathcal{D}^*(D)$ are presented in Section 5.2; a heuristic, computationally efficient approximation to the mean capacity is presented in Section 5.3; simulation results are in Section 5.4; and finally, the conclusions are summarized in Section 5.5.

Additional Notation: $\lceil a \rceil$ denotes the smallest integer greater than or equal to $a$, and $f_x$, $F_x$ denote the probability density and cumulative distribution functions of a random variable $x$, respectively.

5.1 General Assumptions and Channel model

A GSC system with a switched reduced complexity receiver and a single antenna TX is considered. The channel offers $M_{rx}$ diversity paths and the RX only picks $K_{rx}$ diversity paths for down-conversion. Without loss of generality, it is assumed that $M_{rx}$ is a multiple of $K_{rx}$ (footnote 4). Under these assumptions, the base-band equivalent received signal vector during any symbol duration can be represented as:

$$y = \sqrt{\rho}\, S h x + S n \qquad (5.1)$$

where $y$ is the $K_{rx} \times 1$ received signal vector corresponding to the $K_{rx}$ down-conversion chains, $\rho$ is the average SNR, $S$ is a $K_{rx} \times M_{rx}$ sub-matrix of $I_{M_{rx}}$ that picks the best $K_{rx}$ branches for down-conversion, $h$ is the $M_{rx} \times 1$ normalized channel vector corresponding to the $M_{rx}$ diversity paths, $x$ is the transmit data symbol and $n \sim \mathcal{CN}(\mathbb{O}_{M_{rx} \times 1}, I_{M_{rx}})$ is the $M_{rx} \times 1$ normalized additive white Gaussian noise vector. It is assumed that the channel diversity paths $h_i$ are independent but not identically distributed (i.n.i.d.) (footnote 5) and that their amplitudes follow a Nakagami-m distribution with the probability density function given by:

$$f_{|h_i|}(x) = \frac{2 m^m}{\Gamma(m)\,\Omega_i^m}\, x^{2m-1} \exp\left\{-\frac{m x^2}{\Omega_i}\right\} \qquad (5.2)$$

where the shape parameter ($m$) is fixed but the spread parameter ($\Omega_i$) is different for each diversity path $i$. Without loss of generality, the channel is normalized such that $\sum_{i=1}^{M_{rx}} \Omega_i = M_{rx}$. Some relevant distribution parameters of $h_i$ are listed in Table 5.1. The RX is assumed to have knowledge of the average power $E\{|h_i|^2\} = \Omega_i$ for all the $M_{rx}$ paths (footnote 6). Since the average power changes very slowly, it can be tracked for all the $M_{rx}$ paths with lower overhead (see Section 1.1 and references therein).

Table 5.1: Distribution parameters of diversity path $h_i$
  CDF, $F_{|h_i|}(x)$:          $\gamma_{lower,inc}(m, m x^2/\Omega_i)\,/\,\Gamma(m)$
  Avg. power, $E\{|h_i|^2\}$:   $\Omega_i$

Footnote 4: If this is not true, dummy diversity paths of zero magnitude can be appended.
Footnote 5: In a SIMO channel, the signals at the antenna elements are often correlated. However, in several HBwS systems [37, 41], the effective channel can be modeled as an i.n.i.d. vector, since the fading in different beam directions is independent [66].

The channel is assumed to be block fading, wherein the channel stays constant for a coherence time interval and then changes to another random realization with distribution as in (5.2). During each coherence time interval, the pilot sequence is re-transmitted $\lceil D/K_{rx} \rceil$ times to acquire the CSI for $D$ diversity paths ($K_{rx} \le D \le M_{rx}$). Let us define the CSI acquisition set $\mathcal{D} \subseteq \{1,..,M_{rx}\}$, with $|\mathcal{D}| = D$, as the set of indices of diversity paths whose CSI is acquired at the RX. Assuming perfect CE, it can be shown that the instantaneous SNR for GSC can be expressed as:

$$\gamma_{GSC}(\mathcal{D}) = \max_{\mathcal{S} \subseteq \mathcal{D},\, |\mathcal{S}| = K_{rx}} \left\{ \rho \sum_{i \in \mathcal{S}} |h_i|^2 \right\} \qquad (5.3)$$

Correspondingly, the achievable throughput can be expressed as:

$$R(\mathcal{D}) = \left(1 - \xi \lceil D/K_{rx} \rceil\right) C(\mathcal{D}) \qquad (5.4)$$

where $C(\mathcal{D}) = E\{\log(1 + \gamma_{GSC}(\mathcal{D}))\}$ is the mean capacity (footnote 7) and $\xi$ is the fraction of time-frequency resources consumed by the pilot sequence (footnote 8). From (5.4), there is a trade-off between the number of diversity branches used, $D$, and the amount of CSI training required, $\lceil D/K_{rx} \rceil$. In this chapter, the CSI acquisition set $\mathcal{D}_{opt}$ that maximizes the achievable throughput and its size $D_{opt} = |\mathcal{D}_{opt}|$ are found.
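The throughput expression (5.4) can be explored numerically with a small Monte-Carlo sketch. It is only an illustration under stated assumptions: a Nakagami-m amplitude with average power `omega_i` is generated through its squared magnitude, which is Gamma distributed with shape `m` and scale `omega_i / m`; the names `gsc_throughput`, `rho` (average SNR) and `xi` (pilot fraction) are hypothetical labels for the quantities in (5.1) to (5.4), and the trial count is arbitrary.

```python
import numpy as np

def gsc_throughput(omega, k_rx, m, rho, xi, n_trials=20000, seed=0):
    """Monte-Carlo estimate of R(D) in (5.4) for the acquired paths in `omega`.

    omega : average powers of the D paths whose CSI is acquired
    k_rx  : number of down-conversion chains
    m     : Nakagami shape parameter (common to all paths)
    rho   : average SNR (assumed symbol)
    xi    : fraction of resources used by one pilot transmission (assumed symbol)
    """
    rng = np.random.default_rng(seed)
    omega = np.asarray(omega, dtype=float)
    d = omega.size
    # |h_i|^2 for a Nakagami-m amplitude is Gamma(shape=m, scale=omega_i/m).
    power = rng.gamma(shape=m, scale=omega / m, size=(n_trials, d))
    # GSC keeps the k_rx instantaneously strongest paths, as in (5.3).
    strongest = np.sort(power, axis=1)[:, -k_rx:]
    snr = rho * strongest.sum(axis=1)
    mean_capacity = np.mean(np.log1p(snr))   # nats/s/Hz
    overhead = xi * np.ceil(d / k_rx)        # pilot is repeated ceil(D/K_rx) times
    return (1.0 - overhead) * mean_capacity
```

Calling the function for growing prefixes of a power profile sorted in descending order shows the trade-off directly: the capacity term grows with `D` while the pre-factor shrinks every time another pilot repetition is needed.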
Note that for the case of HBwS with a single UE antenna, expectations of both (2.4) and (2.5) with respect to $\tilde{H}$ yield a similar expression to $C(\mathcal{D})$ above. Similarly, the CE overhead in (5.4) is identical to the overhead for HBwS when $L = D$ (see Section 4.6). Thus the results derived in this chapter are also applicable to the case of HBwS with $M_{tx} \ge 1$ and $M_{rx} = 1$.

Footnote 6: The average path power corresponds to the Power Delay Profile in a UWB system and the Power Angle Spectrum in a SIMO system.
Footnote 7: The fading of the diversity paths is assumed to be quasi-static such that the capacity can be computed for each fading realization. In case of fast fading, the same expression holds for ergodic capacity.
Footnote 8: In general, $\xi$ is a function of not only the symbol duration but also of the number of active TXs with orthogonal pilots (see Section 5.4).

5.2 Problem Formulation

Consider the following family of optimization problems for $K_{rx} \le D \le M_{rx}$:

$$\mathcal{D}^*(D) = \operatorname*{argmax}_{\mathcal{D} \subseteq \{1,..,M_{rx}\},\, |\mathcal{D}| = D} R(\mathcal{D}) \qquad (5.5)$$

Then the throughput maximizing CSI acquisition set can be expressed as $\mathcal{D}_{opt} = \mathcal{D}^*(D_{opt})$, where:

$$D_{opt} = \operatorname*{argmax}_{K_{rx} \le D \le M_{rx}} \{R(\mathcal{D}^*(D))\} \qquad (5.6)$$

Theorem 5.2.1. An optimal solution $\mathcal{D}^*(D)$ to (5.5) is given by:

$$\mathcal{D}^*(D) = \{\pi_1, \pi_2, \ldots, \pi_D\} \qquad (5.7)$$

where $\pi$ is a permutation of the vector $[1, \ldots, M_{rx}]$ such that $\Omega_{\pi_i} \ge \Omega_{\pi_j}$ for all $i \le j$ (footnote 9).

Proof. See Appendix 5.A.

The result of this theorem is illustrated pictorially in Fig. 5.2. Though it seems intuitively correct to only acquire the CSI for the $D$ diversity paths with the highest average power, it is worth mentioning that Theorem 5.2.1 is non-trivial. For example, the theorem fails if the shape parameter $m$ were different for each diversity path. Since it is now known how to find $\mathcal{D}^*(D)$, the problem of finding $\mathcal{D}_{opt}$ is reduced to finding the optimal size $D_{opt}$ in (5.6).

Figure 5.2: Illustration of Theorem 5.2.1.

Theorem 5.2.2. $C(\mathcal{D}^*(D))$ is a non-negative, non-decreasing and concave function (footnote 10) of $D$, i.e.,

$$\Delta C_{D+1} \le \Delta C_D \quad \text{for } K_{rx} \le D \le M_{rx} - 1 \qquad (5.8)$$

where $\Delta C_D \triangleq C(\mathcal{D}^*(D)) - C(\mathcal{D}^*(D-1))$.

Proof. See Appendix 5.B.

Footnote 9: A version of this theorem for Rayleigh fading was proved by us in [131].
Footnote 10: Here concavity refers to the fact that the piece-wise linear function obtained by interpolating $C(\mathcal{D}^*(D))$ for non-integer $D$ is concave.

Since $C(\mathcal{D}^*(D))$ is a non-decreasing function of $D$, from (5.4) we have $R(\mathcal{D}^*(D)) \le R(\mathcal{D}^*(K_{rx}\lceil D/K_{rx} \rceil))$ for all $K_{rx} \le D \le M_{rx}$. Therefore one can simplify the optimization problem in (5.6) to:

$$D_{opt} = \operatorname*{argmax}_{D \in \{K_{rx}, 2K_{rx}, .., M_{rx}\}} \{R(\mathcal{D}^*(D))\} \qquad (5.9)$$

For ease of notation, let $C(\mathcal{D}^*(0)) = C(\mathcal{D}^*(M_{rx} + K_{rx})) = 0$. Then any $D^{\circ} \in \{K_{rx}, 2K_{rx}, .., M_{rx}\}$ is a local maximum of (5.9) iff:

$$R(\mathcal{D}^*(D^{\circ} - K_{rx})) \le R(\mathcal{D}^*(D^{\circ})) \ge R(\mathcal{D}^*(D^{\circ} + K_{rx}))$$
$$\Leftrightarrow \Delta C^{K_{rx}}_{D^{\circ}} \ge g(D^{\circ}) \ \text{ and } \ \Delta C^{K_{rx}}_{D^{\circ}+K_{rx}} \le g(D^{\circ} + K_{rx}) \qquad (5.10)$$

where $\Delta C^{K_{rx}}_{D} = C(\mathcal{D}^*(D)) - C(\mathcal{D}^*(D - K_{rx}))$ and:

$$g(D) \triangleq \frac{C(\mathcal{D}^*(D - K_{rx}))}{\frac{1}{\xi} - \frac{D}{K_{rx}}} \qquad (5.11a)$$

Since, from Theorem 5.2.2, $\Delta C^{K_{rx}}_{D}$ is a non-increasing function of $D$ and $g(D)$ is a non-decreasing function of $D$, any locally optimum $D^{\circ}$ for (5.9) is also a globally optimum solution. Therefore, instead of a brute-force search, the following linear search algorithm is used to find $D_{opt}$ (footnote 11):

Algorithm 5.1: Find $D_{opt}$
  Inputs: $M_{rx}$, $K_{rx}$, $m$, $\Omega$, $\rho$, $\xi$
  Initialize $D = K_{rx}$, $C(\mathcal{D}^*(0)) = 0$
  while $D < M_{rx}$ do
    Compute $C(\mathcal{D}^*(D))$
    Compute $C(\mathcal{D}^*(D + K_{rx}))$
    if $\Delta C^{K_{rx}}_{D} \ge g(D)$ and $\Delta C^{K_{rx}}_{D+K_{rx}} \le g(D + K_{rx})$ then
      return $D$
    end if
    $D = D + K_{rx}$
  end while
  return $M_{rx}$

Footnote 11: Though a binary search for a local optimum may have fewer iterations, the linear search can be used to recursively compute $C(\mathcal{D}^*(D))$, as shall be illustrated in Section 5.3.

5.3 Computing the capacities

In this section, a method to calculate $C(\mathcal{D}^*(D))$ is found. Most of the prior works on the performance analysis of GSC rely on finding the Moment Generating Function (MGF) of the SNR. Finding the MGF is in itself a computationally intensive exercise involving $K_{rx}\binom{D}{K_{rx}}$ one-dimensional integrals in general [203]. Therefore techniques to find the mean capacity from the MGF, such as [206,207], become computationally infeasible if $K_{rx}$ and/or $D$ are large.
Instead, this work shall rely on the upper bound on mean capacity:

$$C_{UB}(\mathcal{D}) \triangleq \log(1 + E\{\gamma_{GSC}(\mathcal{D})\}) \ge C(\mathcal{D}) \qquad (5.12)$$

to find a near-optimal $D_{opt}$ in Algorithm 5.1. It can be easily verified that Theorems 5.2.1 and 5.2.2 are also applicable if $C(\mathcal{D})$ is replaced by $C_{UB}(\mathcal{D})$. Computing $C_{UB}(\mathcal{D})$ (which is a function of the mean SNR) is also an involved exercise involving $K_{rx}^2\binom{D}{K_{rx}}$ one-dimensional integrals [203]. Though some works also find closed form results [14], they involve a larger number of iterations and thus do not necessarily reduce the computational complexity. This computational load can be extremely large, especially if $K_{rx}$ and/or $D$ are large, and therefore alternate approaches are required. Observing that $C_{UB}(\mathcal{D}^*(D - K_{rx}))$ is known while finding $C_{UB}(\mathcal{D}^*(D))$ in Algorithm 5.1, one can recursively define:

$$e^{C_{UB}(\mathcal{D}^*(D))} = e^{C_{UB}(\mathcal{D}^*(D-1))} + E\{\Delta\gamma_{GSC}(D)\} \qquad (5.13)$$

Using (5.25), one can write:

$$E\{\Delta\gamma_{GSC}(D)\} = \rho\int_{x=0}^{\infty} x^2 \left[ F_{|h^{D-1}_{(K_{rx})}|}(x)\, f_{|h_{\pi_D}|}(x) - f_{|h^{D-1}_{(K_{rx})}|}(x)\left(1 - F_{|h_{\pi_D}|}(x)\right) \right] dx \qquad (5.14)$$

where:

$$f_{|h^{D-1}_{(K_{rx})}|}(x) = \sum_{b \in \mathcal{P}^{(1:K_{rx}-1,\,K_{rx}+1:D-1)}_{D-1}} f_{|h_{b_{K_{rx}}}|}(x) \left[\prod_{i=1}^{K_{rx}-1} \left(1 - F_{|h_{b_i}|}(x)\right)\right] \left[\prod_{j=K_{rx}+1}^{D-1} F_{|h_{b_j}|}(x)\right] \qquad (5.15)$$

$$F_{|h^{D-1}_{(K_{rx})}|}(x) = \sum_{k=0}^{K_{rx}-1}\ \sum_{b \in \mathcal{P}^{(1:k,\,k+1:D-1)}_{D-1}} \left[\prod_{i=1}^{k} \left(1 - F_{|h_{b_i}|}(x)\right)\right] \left[\prod_{j=k+1}^{D-1} F_{|h_{b_j}|}(x)\right] \qquad (5.16)$$

and $\mathcal{P}^{(a:b,\,c:d)}_{D-1}$ is the set of all permutations of the vector $[\pi_1, .., \pi_{D-1}]$ such that, for all $b \in \mathcal{P}^{(a:b,\,c:d)}_{D-1}$, $b_a < b_{a+1} < .. < b_b$ and $b_c < b_{c+1} < .. < b_d$. In general, this recursive definition does not lead to any significant savings in computing $C_{UB}(\mathcal{D}^*(D))$. However, in the special case where $\mathcal{D}^*(D-1)$ has i.i.d. diversity paths, we have:

$$f^{iid}_{|h^{D-1}_{(K_{rx})}|}(x) = K_{rx}\binom{D-1}{K_{rx}} f_{|h^{iid}_{D-1}|}(x) \left[1 - F_{|h^{iid}_{D-1}|}(x)\right]^{K_{rx}-1} \left[F_{|h^{iid}_{D-1}|}(x)\right]^{D-K_{rx}-1} \qquad (5.17)$$

$$F^{iid}_{|h^{D-1}_{(K_{rx})}|}(x) = \sum_{k=0}^{K_{rx}-1} \binom{D-1}{k} \left[1 - F_{|h^{iid}_{D-1}|}(x)\right]^{k} \left[F_{|h^{iid}_{D-1}|}(x)\right]^{D-k-1} \qquad (5.18)$$

$$E\{\gamma^{iid}_{GSC}(\mathcal{D}^*(D-1))\} = \rho\sum_{k=1}^{K_{rx}} k\binom{D-1}{k} \int_{x=0}^{\infty} \left[ x^2 f_{|h^{iid}_{D-1}|}(x) \left[1 - F_{|h^{iid}_{D-1}|}(x)\right]^{k-1} \left[F_{|h^{iid}_{D-1}|}(x)\right]^{D-k-1} \right] dx \qquad (5.19)$$

where $f_{|h^{iid}_{D-1}|}(x)$ and $F_{|h^{iid}_{D-1}|}(x)$ are the marginal PDF and CDF, respectively, of $|h_i|$ for all $i \in \mathcal{D}^*(D-1)$. In this special case, computing $C_{UB}(\mathcal{D}^*(D))$ from $C_{UB}(\mathcal{D}^*(D-1))$ only involves computing $K_{rx}$ one-dimensional integrals.

To reduce the computation cost in the general i.n.i.d. case, while computing $E\{\Delta\hat{\gamma}_{GSC}(D)\}$ (footnote 12) from (5.14), $\mathcal{D}^*(D-1)$ is approximated to be composed of i.i.d. components. In other words, $f_{|h^{D-1}_{(K_{rx})}|}(x)$ and $F_{|h^{D-1}_{(K_{rx})}|}(x)$ are approximated by $f^{iid}_{|h^{D-1}_{(K_{rx})}|}(x)$ and $F^{iid}_{|h^{D-1}_{(K_{rx})}|}(x)$, respectively, where the i.i.d. spreading parameter $\Omega^{iid}_{D-1}$ is such that $E\{\gamma^{iid}_{GSC}(\mathcal{D}^*(D-1))\} = E\{\hat{\gamma}_{GSC}(\mathcal{D}^*(D-1))\}$. Note that from (5.13), $E\{\hat{\gamma}_{GSC}(\mathcal{D}^*(D-1))\}$ is already available when computing $E\{\Delta\hat{\gamma}_{GSC}(D)\}$. This procedure is detailed in Algorithm 5.2 and shall be referred to as 'Recursive-IID Approx'.

Algorithm 5.2: Compute $\hat{C}_{UB}(\mathcal{D}^*(D))$ recursively (footnote 12)
  Inputs: $E\{\hat{\gamma}_{GSC}(\mathcal{D}^*(D-1))\}$, $D$, $K_{rx}$, $m$, $\rho$, $\Omega_{\pi_D}$
  if $D \le K_{rx}$ then
    return $\hat{C}_{UB}(\mathcal{D}^*(D)) = \log(1 + E\{\hat{\gamma}_{GSC}(\mathcal{D}^*(D-1))\} + \rho\,\Omega_{\pi_D})$
  end if
  Find $\Omega^{iid}_{D-1}$ such that $E\{\gamma^{iid}_{GSC}(\mathcal{D}^*(D-1))\} = E\{\hat{\gamma}_{GSC}(\mathcal{D}^*(D-1))\}$, where $E\{\gamma^{iid}_{GSC}(\mathcal{D}^*(D-1))\}$ is as defined in (5.19) and:
    $f_{|h^{iid}_{D-1}|}(x) \triangleq \frac{2}{\Gamma(m)} \left[\frac{m}{\Omega^{iid}_{D-1}}\right]^{m} x^{2m-1} \exp\left\{-\frac{m x^2}{\Omega^{iid}_{D-1}}\right\}$
    $F_{|h^{iid}_{D-1}|}(x) \triangleq \gamma_{lower,inc}(m, m x^2/\Omega^{iid}_{D-1})\,/\,\Gamma(m)$
    {For example, using FSOLVE in MATLAB}
  Compute $E\{\Delta\hat{\gamma}_{GSC}(D)\}$ from (5.14) with $\hat{f}_{|h^{D-1}_{(K_{rx})}|}(x)$, $\hat{F}_{|h^{D-1}_{(K_{rx})}|}(x)$ as given by (5.17)-(5.18)
  $E\{\hat{\gamma}_{GSC}(\mathcal{D}^*(D))\} = E\{\hat{\gamma}_{GSC}(\mathcal{D}^*(D-1))\} + E\{\Delta\hat{\gamma}_{GSC}(D)\}$
  return $\hat{C}_{UB}(\mathcal{D}^*(D)) = \log(1 + E\{\hat{\gamma}_{GSC}(\mathcal{D}^*(D))\})$

The proposed approximation is accurate when the spreading parameters $\Omega_i$ are either equal for some $i$ and negligible for others, or are skewed such that $\sum_{i=1}^{K_{rx}} \Omega_{\pi_i} \gg \sum_{i=K_{rx}+1}^{M_{rx}} \Omega_{\pi_i}$. The accuracy of the approximation for several practically relevant power spectra ($\Omega$) is studied in Fig. 5.3. Here the mean capacity $C(\mathcal{D}^*(D))$ is compared to both the mean capacity upper bound $C_{UB}(\mathcal{D}^*(D))$ and the Recursive-IID Approx $\hat{C}_{UB}(\mathcal{D}^*(D))$ (as obtained via Algorithm 5.2). Since the exact computations of $C(\mathcal{D}^*(D))$ and $C_{UB}(\mathcal{D}^*(D))$ are infeasible, they are obtained via Monte-Carlo simulations. The results show that Recursive-IID Approx provides a very good approximation to $C_{UB}(\mathcal{D}^*(D))$. Though there is some gap between $C(\mathcal{D}^*(D))$ and $\hat{C}_{UB}(\mathcal{D}^*(D))$, the gap is more or less constant. As shall be shown in Section 5.4, the impact of this gap on $D_{opt}$ is minimal. Similar results are observed for other power spectra, barring a few heavy tail functions like the Zipf probability mass function.

Figure 5.3: Mean capacity as a function of $D$ for different diversity power spectra: (a) truncated exponential power spectrum $\Omega_i \propto \exp\{-\eta i\}$ with $\eta = 0.5, 0.2, 0$; (b) truncated Gaussian power spectrum $\Omega_i \propto \exp\left\{-\frac{(i - \lfloor M_{rx}/2 \rfloor)^2}{2\sigma^2}\right\}$ with $\sigma = 1.5, 3, 0.75$. System parameters: $m = 2$, $M_{rx} = 24$, $K_{rx} = 2$, $\rho = 1$ (footnote 13).

Footnote 12: Here $\hat{X}$ is used to denote an approximation for $X$ ($X = \gamma_{GSC}, C_{UB}$).
Footnote 13: The proportionality constant is chosen such that $\sum_{i=1}^{M_{rx}} \Omega_i = M_{rx}$.

5.4 Simulation Results

For simulations, a system with a single antenna TX and a low complexity switched RX is considered. Two relevant scenarios are explored: 1) a UWB system with impulse radio signalling and a selective Rake RX, and 2) an Orthogonal Frequency Division Multiplexed system with a multi-antenna RX. For finding the pilot overhead, it is assumed that there are $U$ such single antenna TXs in the system.
Orthogonal pilots are assigned to the TXs to prevent pilot contamination. The fractional pilot overhead in the two cases is computed as $\xi_{UWB} = T_{symb} U / T_{coh}$ and $\xi_{MIMO} = \tau_{rms} U / T_{coh}$ (footnote 15). The simulation parameters are summarized in Table 5.2 and are similar to the parameters in the IEEE 802.15.4a PAN (Personal Area Network) standard [208] and the cellular LTE (Long Term Evolution) standard [209], respectively.

Footnote 14 (on the $U$ single antenna TXs assumed above): The $U$ TXs may have dedicated RXs, such as in a peer-to-peer network, or may have a common RX, such as in a multiple access channel.
Footnote 15: This result is obtained considering that one pilot is required per TX in each coherence time and coherence bandwidth for OFDMA, and that the TXs have orthogonal time access in a UWB Personal Area Network [208].

Table 5.2: Simulation Parameters
  Parameter                                    UWB       SIMO
  No. of diversity paths ($M_{rx}$)            50        100
  No. of down-conversion chains ($K_{rx}$)     2         5
  Carrier freq.                                6 GHz     2 GHz
  Coherence time ($T_{coh}$)                   10 ms     10 ms
  Delay spread ($\tau_{rms}$)                  100 ns    500 ns
  Symbol duration ($T_{symb}$)                 8 µs      100 µs
  No. of users ($U$)                           25        200
  Fractional pilot overhead ($\xi$)            0.02      0.01

Assuming that the multiple TXs have orthogonal access in time, frequency or space (no interference), one can restrict attention to a single TX-RX analysis, as is considered in this work. The achievable rates for the two scenarios as a function of $D$ are presented in Fig. 5.4. Here, the achievable throughput $R(\mathcal{D}^*(D))$ (as obtained via Monte-Carlo simulations) is compared to the Recursive-IID Approx $\hat{R}_{UB}(\mathcal{D}^*(D))$ (as obtained via Algorithm 5.2 and (5.4)).
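As a quick arithmetic check of the pilot overhead values in Table 5.2, the two expressions above can be evaluated directly. The sketch below assumes the symbol duration of the UWB case is 8 microseconds (the unit prefix is read from context) and uses hypothetical helper names `pilot_fraction` and `full_csi_loss`; the second helper evaluates the fraction of resources lost when CSI is acquired for all $M_{rx}$ paths, i.e., the pilot fraction times the number of pilot repetitions.

```python
import math

def pilot_fraction(duration_s, n_users, t_coh_s):
    """Fraction of a coherence interval used by one round of orthogonal pilots."""
    return duration_s * n_users / t_coh_s

def full_csi_loss(xi, m_rx, k_rx):
    """Resource fraction spent on training when all m_rx paths are estimated."""
    return xi * math.ceil(m_rx / k_rx)

# UWB column of Table 5.2: T_symb = 8 us (assumed unit), U = 25, T_coh = 10 ms.
xi_uwb = pilot_fraction(8e-6, 25, 10e-3)
# SIMO column of Table 5.2: tau_rms = 500 ns, U = 200, T_coh = 10 ms.
xi_mimo = pilot_fraction(500e-9, 200, 10e-3)

# Training loss with full CSI: ceil(50/2) and ceil(100/5) pilot repetitions.
loss_uwb = full_csi_loss(xi_uwb, 50, 2)
loss_mimo = full_csi_loss(xi_mimo, 100, 5)
```

Up to floating point rounding, the two pilot fractions evaluate to 0.02 and 0.01, matching the last row of Table 5.2, and full CSI acquisition would consume roughly half of the resources in the UWB case and a fifth in the SIMO case.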
The results suggest that the proposed recursive IID algorithm predicts the value of $D_{opt}$ very accurately. Also, as expected, $D_{opt} \ll M_{rx}$ and this choice leads to a significant increase in achievable throughput (approximately 20-30%).

Figure 5.4: Achievable throughput (nats/s/Hz) as a function of the number of diversity branches used ($D$) for two practical scenarios: (a) a UWB system ($m = 5$, $\rho = 1$) with $\Omega_i \propto \sum_{j=1}^{6} \exp\left\{-\frac{j}{25} - \frac{2i - 10j}{15}\right\} u[2i - 10j]$ (where $u[i] = 1$ for $i \ge 0$ and $u[i] = 0$ otherwise); (b) a SIMO system ($m = 1$, $\rho = 10$) with $\Omega_i \propto \sum_{j=1}^{20} (j/20)^2 \exp\left\{-\frac{[1.8(i - 50) - \theta_j]^2}{50}\right\}$, where $\theta_j = 36(-1)^j\sqrt{-2\log(j/20)}$ (footnote 13).

5.5 Summary and Conclusions

In this chapter, the trade-off between diversity and training overhead in reduced complexity switched transceivers is studied. It is proved that, to maximize the achievable throughput, it is optimal to perform CE for only a subset of diversity branches (of size $D$) with the highest second moments, if the diversity branches have the same fading parameters but different mean powers. Note that this provides credence to Conjecture 4.2.1, albeit under some specific conditions. It is also proved that the achievable throughput is a unimodal function of the size of this subset, $D$, which ensures that any locally optimum $D$ is also globally optimum. A computationally efficient approximation for the mean capacity is introduced, which significantly reduces the complexity of finding the optimal $D$. Simulation results for some practically important settings suggest that the optimal choice of $D$ can improve throughput by 20-30%. Under typical scenarios $D_{opt} > K_{rx}$, suggesting that with judicious pilot training, selective Rake outperforms partial Rake [198] and HBwS outperforms HBaCSI, even after accounting for training overhead.
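The selection rule of Theorem 5.2.1 and the linear search of Algorithm 5.1 can be sketched together as follows. This is only an illustration: the mean capacity is passed in as a user-supplied callable `capacity(d)` (for example a Monte-Carlo estimate or the Recursive-IID approximation), the local-maximum test is written directly in terms of the throughput rather than the equivalent condition (5.10), and the names `csi_set`, `find_d_opt` and `xi` (the pilot fraction) are hypothetical.

```python
import math

def csi_set(omega, d):
    """Indices of the d paths with the largest average power (Theorem 5.2.1)."""
    order = sorted(range(len(omega)), key=lambda i: -omega[i])
    return order[:d]

def find_d_opt(capacity, m_rx, k_rx, xi):
    """Linear search over D in {K_rx, 2K_rx, ..., M_rx} for a local maximum of R.

    capacity : callable returning the mean capacity when the d strongest
               (on average) paths are acquired; assumed non-decreasing, concave
    xi       : fraction of resources used by one pilot transmission (assumed symbol)
    """
    def rate(d):
        # Convention from the text: capacity is taken as 0 outside 1..m_rx.
        if d <= 0 or d > m_rx:
            return 0.0
        return (1.0 - xi * math.ceil(d / k_rx)) * capacity(d)

    d = k_rx
    while d < m_rx:
        # By unimodality, the first local maximum is also the global one.
        if rate(d - k_rx) <= rate(d) >= rate(d + k_rx):
            return d
        d += k_rx
    return m_rx
```

A possible usage, with a toy concave stand-in for the capacity (not the GSC capacity of this chapter), is `find_d_opt(lambda d: math.log(1 + sum(sorted(omega, reverse=True)[:d])), len(omega), k_rx, xi)`; in practice the callable would cache previously computed values so that each step of the search reuses the result of the previous one.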
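5.A Appendix A

(Proof of Theorem 5.2.1). Suppose $\{\pi_1, \pi_2, \ldots, \pi_D\}$ is not an optimal solution to (5.5). Consider any optimal solution $\mathcal{D}^*(D) \ne \{\pi_1, \pi_2, \ldots, \pi_D\}$. Then there exist distinct numbers $a_1, .., a_p, b_1, .., b_p$ ($1 \le a_1, .., a_p \le D < b_1, .., b_p \le M_{rx}$) such that:

$$\mathcal{D}^*(D) \cup \{\pi_{a_1}, .., \pi_{a_p}\} \setminus \{\pi_{b_1}, .., \pi_{b_p}\} = \{\pi_1, \ldots, \pi_D\}$$

Note that, from the definition of $\pi$, we have $\Omega_{\pi_{a_j}} \ge \Omega_{\pi_{b_j}}$ for all $1 \le j \le p$. From (5.3), we then have:

$$\gamma_{GSC}(\mathcal{D}^*(D)) = \max_{\mathcal{S} \subseteq \mathcal{D}^*(D),\, |\mathcal{S}| = K_{rx}} \left\{ \rho\sum_{i \in \mathcal{S}} |h_i|^2 \right\} \le \max_{\mathcal{S} \subseteq \mathcal{D}^*(D),\, |\mathcal{S}| = K_{rx}} \left\{ \rho\sum_{i \in \mathcal{S}} \kappa_i |h_i|^2 \right\} \qquad (5.20)$$

where the constants $\kappa_i$ are defined as:

$$\kappa_i = \begin{cases} \Omega_{\pi_{a_j}} / \Omega_{\pi_{b_j}} & \text{for } i = \pi_{b_j},\ 1 \le j \le p \\ 1 & \text{otherwise} \end{cases} \qquad (5.21)$$

It can be easily verified from (5.2) that $|h_{\pi_{a_j}}| \overset{d}{=} \sqrt{\kappa_{\pi_{b_j}}}\,|h_{\pi_{b_j}}|$ for all $j$, where $\overset{d}{=}$ denotes equality in distribution. Now since the $h_i$ are independently distributed for $1 \le i \le M_{rx}$, from (5.20) we have:

$$\gamma_{GSC}(\{\pi_1, \ldots, \pi_D\}) \overset{d}{=} \max_{\mathcal{S} \subseteq \mathcal{D}^*(D),\, |\mathcal{S}| = K_{rx}} \left\{ \rho\sum_{i \in \mathcal{S}} \kappa_i |h_i|^2 \right\}$$
$$\Rightarrow \gamma_{GSC}(\{\pi_1, \ldots, \pi_D\}) \ge_d \gamma_{GSC}(\mathcal{D}^*(D)) \qquad (5.22)$$

where $\ge_d$ represents first order stochastic dominance of the left hand side over the right hand side. Using (5.4) and (5.22) we further have:

$$R(\mathcal{D}^*(D)) \le R(\{\pi_1, \ldots, \pi_D\}) \qquad (5.23)$$

which is in contradiction to our initial assumption. This concludes the proof.

5.B Appendix B

(Proof of Theorem 5.2.2). Since $\mathcal{D}^*(D) \subseteq \mathcal{D}^*(D+1)$, from (5.3) and (5.4), $C(\mathcal{D}^*(D))$ is a non-negative, non-decreasing function of $D$. For any $D$, let us define a new random vector $\bar{h}$ such that: $\bar{h}_i = h_i$ for $i \notin \{\pi_D, \pi_{D+1}\}$, $\bar{h}_{\pi_D} \overset{d}{=} h_{\pi_D}$ but is independent of $h$, and $\bar{h}_{\pi_{D+1}} = h_{\pi_D}\sqrt{\Omega_{\pi_{D+1}}/\Omega_{\pi_D}}$. It can be easily verified from (5.2) that $\bar{h} \overset{d}{=} h$. Let $|h^{D}_{(i)}|$ and $|\bar{h}^{D}_{(i)}|$ represent the magnitudes of the $i$-th largest diversity paths (in magnitude) from the sets $\{|h_j| \,:\, j \in \mathcal{D}^*(D)\}$ and $\{|\bar{h}_j| \,:\, j \in \mathcal{D}^*(D)\}$, respectively (footnote 16). Then from (5.3) we have:

$$\gamma_{GSC}(\mathcal{D}^*(D)) = \rho\sum_{i=1}^{K_{rx}} |h^{D}_{(i)}|^2 \qquad (5.24)$$

We also define:

$$\Delta\gamma_{GSC}(D) \triangleq \gamma_{GSC}(\mathcal{D}^*(D)) - \gamma_{GSC}(\mathcal{D}^*(D-1)) = \rho\left[\max\{|h^{D-1}_{(K_{rx})}|^2, |h_{\pi_D}|^2\} - |h^{D-1}_{(K_{rx})}|^2\right] \qquad (5.25)$$

Now the incremental capacity can be expressed as:

$$\Delta C_D = C(\mathcal{D}^*(D)) - C(\mathcal{D}^*(D-1)) = E\left\{ \int_{0}^{\Delta\gamma_{GSC}(D)} \frac{1}{1 + \gamma_{GSC}(\mathcal{D}^*(D-1)) + x}\, dx \right\} \qquad (5.26)$$

Since $\bar{h} \overset{d}{=} h$, one can also write:

$$\Delta C_{D+1} = E\left\{ \int_{0}^{\Delta\bar{\gamma}_{GSC}(D+1)} \frac{1}{1 + \bar{\gamma}_{GSC}(\mathcal{D}^*(D)) + x}\, dx \right\} \qquad (5.27)$$

where $\bar{\gamma}_{GSC}(\mathcal{D}^*(D))$ and $\Delta\bar{\gamma}_{GSC}(D+1)$ are as in (5.24)-(5.25) with the terms of $h$ replaced by the corresponding terms of $\bar{h}$. As $\mathcal{D}^*(D-1) \subseteq \mathcal{D}^*(D)$, from the definition of $\bar{h}$ it can be easily verified that $|\bar{h}^{D}_{(K_{rx})}| \ge |h^{D-1}_{(K_{rx})}|$ and $\bar{\gamma}_{GSC}(\mathcal{D}^*(D)) \ge \gamma_{GSC}(\mathcal{D}^*(D-1))$, for all channel realizations. Additionally, using Theorem 5.2.1, $|\bar{h}_{\pi_{D+1}}| \le |h_{\pi_D}|$ and so $\Delta\bar{\gamma}_{GSC}(D+1) \le \Delta\gamma_{GSC}(D)$. Using these results and from (5.26)-(5.27), the theorem follows.

Footnote 16: We set $|h^{D}_{(i)}| = 0$ if $i > |\mathcal{D}^*(D)|$.

Chapter 6
Conclusions and Future research directions

In this part, a generalization of reduced complexity switched transceivers, referred to as Hybrid Beamforming with Selection (HBwS), was proposed as an attractive solution to reduce the hardware cost of multi-user MIMO systems while retaining good performance. In Chapter 3 it was shown that the instantaneous hSNR capacity (footnote 1) of HBwS can be approximated as the largest element of a correlated Gaussian vector. While this allows finding an approximate PDF of the instantaneous capacity, it imposes a significant computational burden, especially when the set of all switching options $\mathcal{S}$ is large. In Chapter 4 the analog beamformer design problem for HBwS was considered, where the sub-space spanned by the beamformer was restricted to limit the CE overhead. It was shown that a beamformer design that maximizes a closed form lower bound to the hSNR mean capacity can be obtained via packing on the Grassmannian manifold. Additionally, the CE overhead for HBwS was quantified and several techniques for reducing the hardware cost of the analog beamforming matrix and the switch bank were also proposed.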
Finally in Chapter 5, the optimal trade-off between the CE overhead and the system mean capacity of HBwS was studied for a simple scenario, and a computationally efficient algorithm to find the throughput maximizing system parameters was proposed.

Footnote 1: hSNR capacity is a capacity lower bound that is tight in the high SNR regime (see Chapter 2).

While the results show significant improvement in performance with HBwS in comparison to HBaCSI in narrow-band systems, it should be noted that the gains expected in wide-band systems are usually small. This is because in the limit of a large transmission bandwidth, frequency diversity makes all the $L_{tx}$ selection ports equivalent. Hence HBwS, and in general all iCSI based schemes (antenna selection, beam selection and HBiCSI), are more suited for small-to-medium bandwidth systems, where channels are at most moderately frequency selective. While several interesting problems have been tackled in this thesis, there is still significant scope for work on HBwS. A few such interesting research directions are listed below:

1. In Chapter 2, it was assumed that the BS transmit resources, i.e., power, input ports ($L$), switches and RF chains ($K$), are split among different user groups based on average channel statistics, via an aCSI based resource sharing algorithm (see Fig. 2.2b). While several such resource allocation schemes have been proposed for the case of HBaCSI [67,69,139], extending them to HBwS will be a non-trivial and challenging problem, due to the time varying "effective" beams. The complexity stems from the fact that the sum-capacity of HBwS, which is a possible objective function for the resource allocation problem, does not admit a closed form solution. While the capacity approximation derived in Chapter 3 is also not in closed form and is cumbersome to compute in general, I believe that it can yield a closed form approximation under some simplifying assumptions such as $M_{rx} = 1$ or $D_{tx} = I_D$. Thus there is scope for finding near optimal resource allocations at least under some simplifying assumptions.

2. The beamformer design problem handled in Chapter 4 assumed no constraints on the beamformer hardware. It would be an interesting problem to find a beamformer design with constraints imposed, such as the use of unit gain, discrete phase shifters, or per antenna/RF chain power constraints.

3. Note that the dimension of the analog beamformer for HBwS is $M_{tx} \times L_{tx}$, but depending on the switch positions $S$ only an $M_{tx} \times K_{tx}$ sub-matrix of $T$ is actually utilized for transmission at any time. Clearly there is some redundancy in hardware with possible scope for optimization. By possibly integrating the analog hardware in the beamformer and the switches in the switch bank into one complex block, it might be possible to reduce the number of analog hardware components.

4. Throughout Part I, the analog beamformer was assumed to be built using reconfigurable analog hardware. However, such an HBwS scheme can also be used in conjunction with lens based antenna arrays. In particular, note that the two stage beamformer structure in Section 4.5.1 can be built by using an $M_{tx} \times M_{tx}$ Rotman lens [43, 44] for the outer beamformer, a $D \times L$ fixed RD-beamformer and a bank of switches that connects $D$ out of the $M_{tx}$ lens ports to the fixed analog beamformer (as depicted in Fig. 6.1). The performance analysis of such a system could be an interesting extension to the current work.

5. Throughout this part it was assumed that the hSNR capacity maximizing selection matrix $S$ is chosen for each channel realization. However, as discussed in Section 1.1.1, finding the optimal subset of $K_{rx}$ antennas is a non-convex problem. While several greedy/heuristic antenna subset selection algorithms have been proposed [27,31,35,94], most of them are not suitable for the case where the beamformer $T$ has non-orthogonal columns. This is due to the presence of the orthonormalization matrix $G$ in the capacity expressions (2.4)-(2.5) when such an orthonormality constraint is not satisfied. It will be an interesting and challenging problem to study the performance of conventional reduced complexity antenna/beam selection algorithms in such scenarios and, if required, to propose new algorithms that perform better.

6. The trade-off between the capacity and the CE overhead has been analyzed only for a simple case in this thesis. There is significant scope for work on this problem for cases where $M_{tx} > 1$ or where the beamformer $T$ may have non-orthogonal columns.

Figure 6.1: Block diagram of a Rotman lens based two-stage HBwS beamformer, corresponding to one UE group.

Part II
Analog Channel Estimation Techniques for massive MIMO

In this part a new class of reduced complexity receivers that can perform CE in the analog domain shall be introduced. Such receivers tread a fine line between coherent and non-coherent communication, and are in fact inspired by a class of non-coherent RXs for Ultra-Wide Band (UWB) systems called Frequency Shift Reference (FSR) RXs. Therefore, for continuity, some new results on FSR RXs for UWB systems shall first be presented in Chapter 7. Then in Chapter 8 a novel low complexity non-coherent RX for massive MIMO systems, called the 'multi-antenna FSR' RX, is proposed, which is inspired by the UWB FSR RX but avoids some of its pitfalls. In Chapter 9 it shall be shown that by allowing the multi-antenna FSR RX to work coherently, a significant reduction in noise level can be achieved, leading to a new coherent transmission scheme called Reference Tone Aided Transmission (RTAT). The RTAT RX can be interpreted as an RX that performs CE in the analog domain continuously. In Chapter 10 a periodic analog channel estimation based RX is presented, which performs CE periodically/judiciously, leading to better performance than RTAT.
Finally, the summary and some future directions of investigation are discussed in Chapter 11.

P2 V. V. Ratnam and A. F. Molisch, "Analog channel estimation techniques for beamformer design in massive MIMO systems," provisional patent - 62/652,056.
P3 V. V. Ratnam, A. F. Molisch et al., "A Quadrature Amplitude Modulation Capable Multi-Differential Frequency-Shifted Reference UWB Radio," provisional patent - 62/567,686.
J4 V. V. Ratnam and A. F. Molisch, "Periodic Analog Channel Estimation Aided Beamforming for Massive MIMO Systems," in preparation, 2018.
C5 V. V. Ratnam, A. F. Molisch, et al., "Bit and Power Allocation in QAM Capable Multi-Differential Frequency-Shifted Reference UWB Radio," in IEEE Global Communications Conference (Globecom), Singapore, 2017, pp. 1-7.
C6 V. V. Ratnam and A. F. Molisch, "Reference Tone Aided Transmission for Massive MIMO: Analog Beamforming without CSI," in IEEE International Conference on Communications (ICC), Kansas City, 2018.
C7 V. V. Ratnam and A. F. Molisch, "Multi-antenna FSR Receivers: Low Complexity Non-coherent Receivers for Massive MIMO," submitted to IEEE Global Communications Conference (Globecom), Abu Dhabi, 2018.

Chapter 7
QAM capable Multi-differential Frequency Shift Reference UWB Radio

Due to its inherent advantages of offering resilience to narrow-band interference and channel fading, and its precision in localization, UWB communication has received significant attention. Recent results have also shown that impulse-radio, a form of UWB communication, is a good candidate for transmission over millimeter wave frequency bands [210]. For coherent UWB systems, while the transmitter can be designed with fairly low complexity, the RX complexity may be prohibitively high. Issues such as timing synchronization and the requirement of a large number of Rake correlators to tap sufficient channel power, especially in dense multi-path environments, make the RXs both power-hungry and expensive [128].
While several low-complexity alternatives have been proposed in the literature, such as Selective Rake or Partial Rake RXs [127, 131], the required CE overhead for such systems may be large [79]. Therefore, as a compromise between performance and RX complexity, non-coherent transmission schemes such as energy detection and auto-correlation reception have been proposed. While such schemes have inferior performance to coherent reception, they admit a simpler RX architecture. Furthermore, such schemes do not require explicit CE and can therefore be used in rapidly time varying channels [128], such as in vehicular networks. Among the non-coherent approaches, auto-correlation schemes have the added advantage over energy detection that they can utilize the signal phase and can therefore achieve higher data rates. In such schemes, a data signal and a reference signal are simultaneously transmitted, separated either in the time, frequency or code domain. The RX performs a non-linear operation that effectively correlates the received waveform corresponding to the data signal with that of the reference signal, thereby leading to an in-phase accumulation of the multi-path component (MPC) energies. The first such scheme, called transmit-reference, was explored in [130], where the data and reference signals are separated in the time domain. The corresponding RX, however, requires a wide-band analog delay element, which may be hard to implement in an integrated circuit. Consequently, Frequency Shift Reference (FSR) systems [132] and code-shift reference systems [133] were proposed, where the data and reference signals are separated in the frequency and code domains, respectively. In such systems, the delay element is replaced by mixers or analog correlators, which are easier to implement. Several modifications to reduce the noise accumulation have also been considered [211].
To improve the energy efficiency and throughput of auto-correlation RXs, multi-differential schemes, where multiple data signals share the same reference signal, have also been proposed [212, 213]. However, most prior works on Multi-Differential Frequency Shift Reference (MD-FSR) systems only exploit the in-phase component of each data signal, i.e., they can support only binary phase shift keying (BPSK) or amplitude shift keying (ASK). This limits the data-rate such systems can achieve. In this work, a more generic MD-FSR RX is explored that can exploit both the in-phase and quadrature-phase components of the signal and can therefore support higher order modulation formats, such as quadrature amplitude modulation (QAM). The contributions of this work are as follows:

1. An MD-FSR system is proposed, where the data streams can support QAM with a channel adaptive modulation order.
2. The corresponding RX that can exploit both the in-phase and quadrature-phase signal components is designed, and its performance is analyzed in terms of bit error rate and input-output mutual information.
3. It is shown that the system performance depends only on a few channel metrics, which can be estimated very easily at the RX.
4. Using these channel metrics, the optimal bit and power allocation to each data stream is found, both for un-coded and coded systems.

Notation used in this work is as follows: scalars are represented by light-case letters; vectors by bold-case letters; matrices by capitalized bold-case letters; and sets by calligraphic letters. Additionally, $\mathbb{E}\{\cdot\}$ represents the expectation operator and $\mathbb{C}$ represents the set of complex numbers. Also, $j = \sqrt{-1}$, and $\mathrm{Re}\{c\}$, $\mathrm{Im}\{c\}$, $c^*$ refer to the real component, the imaginary component and the complex conjugate of a complex number $c$, respectively.
7.1 General Assumptions and System Model

An impulse radio system implementing MD-FSR transmission [212] is considered, with $K$ parallel data streams sharing a common reference signal. Each symbol is composed of $F$ frames, each of duration $T_f$, i.e., the symbol duration is $T_s = F T_f$. The transmit UWB pulse waveform $p(t)$ is assumed to have a narrow support on $t \in [0, T_p]$, where $T_p \ll T_f$. Further, the pulse is assumed to be normalized such that $\int_{-\infty}^{\infty} p^2(t)\,dt = 1/F$. Without loss of generality, throughout this chapter, a base-band notation is assumed. Let $\mathbf{x} = [x_1, \ldots, x_K]$ be the scaled complex transmit symbol vector¹ for the 0-th symbol, where $E_d^{(k)} = \mathbb{E}\{|x_k|^2\}$ is the average energy per symbol allocated to the $k$-th data stream. Then, the base-band signal for the 0-th symbol is given by:

$$ s(t) = \mathrm{Re}\Big\{ \sqrt{E_r} + \sum_{k=1}^{K} \sqrt{2}\, x_k e^{j 2\pi f_k t} \Big\}\, u(t), \tag{7.1} $$

where $u(t) = \sum_{n=1}^{F} p(t - nT_f)$, $E_r$ is the energy per symbol allocated to the reference signal and $f_k$ represents the frequency offset of the $k$-th data stream from the reference signal. An illustration of the transmit signal for the 0-th symbol is given in Fig. 7.1a.

¹ Here, the in-phase component of the signal is represented as real and the quadrature-phase component as imaginary.

Figure 7.1: An illustration of the (a) transmit and (b) received signals for $K = 1$, $x_1 = e^{j\pi/2}$ and $E_r = 1$.

The total average transmit symbol energy is then given by $E_s = E_r + \sum_{k_1=1}^{K} E_d^{(k_1)}$. To minimize inter-stream interference [212], the frequency offsets are assumed to be chosen as $f_k = (2k - 1)/T_s$. Note that in (7.1), $u(t)$ effectively samples $x_k e^{j 2\pi f_k t}$ every $T_f$ seconds. Therefore, to prevent aliasing, it is further assumed that $K < (F + 2)/4$. In practice, the UWB pulses in $u(t)$ may actually be jittered for improved interference rejection and shaping of the emission spectrum. Such jittering may slightly degrade the performance of the MD-FSR system, especially at the higher frequency data streams.
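The transmit signal in (7.1) can be sketched numerically as follows. This is only an illustrative sketch: the rectangular pulse shape, the sampling rate `fs`, the function name `mdfsr_symbol`, and the use of frame indices $n = 0, \ldots, F-1$ (so that the symbol occupies $[0, T_s)$) are assumptions made here, not choices stated in the text. The sketch samples one symbol and lets one check that its energy equals $E_r + \sum_k |x_k|^2$.

```python
import numpy as np

def mdfsr_symbol(x, E_r, F=20, T_f=160e-9, T_p=2e-9, fs=4e9):
    """Sample one base-band MD-FSR symbol s(t) of (7.1) on a uniform grid."""
    x = np.asarray(x, dtype=complex)
    K = len(x)
    Nf = int(round(T_f * fs))            # samples per frame
    Np = int(round(T_p * fs))            # samples per pulse
    N = F * Nf                           # samples per symbol
    t = np.arange(N) / fs
    T_s = N / fs
    # Frequency offsets f_k = (2k - 1) / T_s, k = 1..K
    f = (2 * np.arange(1, K + 1) - 1) / T_s
    # Assumed rectangular pulse, scaled so that each pulse has energy 1/F
    u = np.where(np.arange(N) % Nf < Np, np.sqrt(fs / (F * Np)), 0.0)
    # Real envelope: Re{ sqrt(E_r) + sum_k sqrt(2) x_k exp(j 2 pi f_k t) }
    envelope = np.sqrt(E_r) + np.sqrt(2) * np.real(x @ np.exp(2j * np.pi * np.outer(f, t)))
    return t, envelope * u

if __name__ == "__main__":
    t, s = mdfsr_symbol([1.0 + 0.0j, 0.5j], E_r=1.0)
    # Riemann-sum estimate of the symbol energy
    print(np.sum(s ** 2) / 4e9)
```

With $K = 2$ and $F = 20$ the anti-aliasing condition $K < (F+2)/4$ holds, which is why the cross terms between streams average out over the frames in this example.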
An analysis of the impact of jittering is beyond the scope of this chapter. For brevity, it is assumed that the RF transmission happens via double side-band modulation [214]. However, it can be verified that the presented results are also applicable to single side-band transmission. Using partial channel information fed back by the RX, the transmitter chooses the appropriate modulation and/or coding scheme for each data stream, which shall be discussed later in Section 7.4. The channel is assumed to have $L$ MPCs, with the base-band channel impulse response given as:

$$ h(t) = \sum_{\ell=1}^{L} \alpha_\ell\, \delta(t - \tau_\ell), $$

where $\alpha_\ell$ and $\tau_\ell$ denote the complex amplitude and delay of the $\ell$-th channel MPC, with $\tau_{\ell+1} > \tau_\ell$. For simplicity of analysis, it is assumed that $|\tau_a - \tau_b| \ge T_p$ for all $a \ne b$. Similarly, to prevent inter-frame interference, the maximum channel delay $\tau_L$ is assumed to be smaller than $T_f$. Furthermore, the channel coherence time is assumed to be at least an order of magnitude larger than the symbol duration.

Figure 7.2: Base-band block diagram of the MD-FSR RX, which exploits both in-phase and quadrature-phase signal components. The squared signal $|r(t)|^2$ is mixed with $\cos(2\pi f_k t)$ and its $\pi/2$-shifted version and integrated over $[0, T_s]$ to produce $Y_{k,I}$ and $Y_{k,Q}$ for $k = 1, \ldots, K$; integrating $|r(t)|^2$ directly produces $Y_0$.

The RX is assumed to have an ideal filter that leaves the desired signal un-distorted but suppresses the out-of-band noise. Assuming perfect symbol timing synchronization, the filtered base-band received signal for the 0-th symbol can be expressed as:

$$ r(t) = \sum_{\ell=1}^{L} \mathrm{Re}\Big\{ \sum_{k=1}^{K} \sqrt{2}\, x_k e^{j 2\pi f_k (t - \tau_\ell)} + \sqrt{E_r} \Big\}\, \alpha_\ell\, u(t - \tau_\ell) + n(t), $$

where $\mathrm{Re}\{r(t)\}$, $\mathrm{Im}\{r(t)\}$ represent the base-band received signals generated by the cosine and sine components of the carrier, respectively, and $n(t)$ is the zero-mean, two-sided, circularly symmetric, complex, additive Gaussian noise process with power spectral density $S_n(f) = N_0$ for $|f| \le W$, $W$ being the system bandwidth.
For illustration, the received signal components for the TX signal in Fig. 7.1a are given in Fig. 7.1b. Without loss of generality, the carrier phase offset and the phase-shift due to the delay of the carrier frequency component are included in the complex MPC amplitude $\alpha_\ell$, for each MPC $\ell$. The base-band signal is then squared and frequency shifted using a bank of mixers, as depicted in Fig. 7.2. Note that since the transmitter may use higher order modulation formats, both the in-phase (I) and quadrature-phase (Q) signal components $Y_{k,I}$, $Y_{k,Q}$ are required. For a given $K$, this doubles the required number of mixers and integrators in comparison to the RX in [212], which exploits only the I-component. However, as shall be shown in Section 7.5, the proposed RX outperforms the design in [212] even if restricted to have the same number of mixers and integrators.

7.2 Analysis of the Integrator Outputs

Throughout this chapter, several cases shall be encountered where a low frequency phasor, say $e^{j 2\pi f t}$, is "sampled" by the UWB pulse waveform. In such cases the following results shall be used:

$$ e^{j 2\pi f t}\, p(t - a) \approx e^{j 2\pi f a}\, p(t - a) \tag{7.2} $$

$$ \sum_{n=1}^{F} \frac{e^{j 2\pi f n T_f}}{F} = \int_0^{T_s} \frac{e^{j 2\pi f_{\min} t}}{T_s}\, dt, \tag{7.3} $$

where $f$ is a multiple of $1/T_s$, $f \ll W$, and $f_{\min} = f - b/T_f$ for an integer $b$ such that $f_{\min} \in \big[-\tfrac{1}{2T_f}, \tfrac{1}{2T_f}\big)$. Note that the squared output in Fig. 7.2 can be expressed as:

$$ |r(t)|^2 = |r_s(t)|^2 + |n(t)|^2 + r_s(t)\, n^*(t) + r_s^*(t)\, n(t), \tag{7.4} $$

where $r_s(t)$ denotes the noise-free component of the received signal $r(t)$.

7.2.1 Signal plus noise power estimation

Taking an average of the 0-th integrator output in Fig.
7.2, over several symbols, the RX can estimate the received signal-plus-noise energy as:

$$ \mathbb{E}\{Y_0 \mid h(t)\} \overset{(1)}{=} \mathbb{E}\Big\{ \int_0^{T_s} \big[ |r_s(t)|^2 + |n(t)|^2 \big]\, dt \Big\} \overset{(2)}{\approx} \sum_{\ell=1}^{L} |\alpha_\ell|^2 \Big[ E_r + \sum_{k=1}^{K} E_d^{(k)} \Big] + \int_0^{T_s} R_n(0)\, dt = \beta_{\mathrm{rss}}^2 E_s + 2 T_s W N_0, \tag{7.5} $$

where (1) follows from (7.4) and the facts that $n(t)$ has zero mean and is independent of $r_s(t)$; (2) follows by using (7.2)-(7.3) and the fact that $f_k$ is a multiple of $1/T_s$; $R_n(\tau)$ is the auto-correlation function of the noise process and $\beta_{\mathrm{rss}}(h) \triangleq \sqrt{\sum_{\ell=1}^{L} |\alpha_\ell|^2}$ is a measure of the instantaneous channel gain. Since the channel coherence time is typically several orders of magnitude larger than the symbol duration, the RX is assumed to have an accurate knowledge of $\mathbb{E}\{Y_0 \mid h(t)\}$. From (7.5), it is evident that our RX accumulates twice the noise power in comparison to [212]. However, it also accumulates twice the channel power ($\beta_{\mathrm{rss}}^2$), as opposed to the $\sum_{\ell=1}^{L} \mathrm{Re}\{\alpha_\ell\}^2$ in [212]. The remaining $2K$ integrator outputs in Fig. 7.2, which are used for demodulating the data streams, are analyzed in the following subsections.

7.2.2 Signal Component Analysis

For ease of representation, $Y_k \triangleq Y_{k,I} + jY_{k,Q}$ is defined as the $k$-th I/Q integrator output in Fig. 7.2. From (7.4), the signal component of $Y_k$ can then be expressed as:

$$ S_k = \int_0^{T_s} |r_s(t)|^2 e^{-j 2\pi f_k t}\, dt \overset{(1)}{\approx} \int_0^{T_s} \sum_{\ell=1}^{L} |\alpha_\ell|^2 \sqrt{2E_r}\, x_k e^{-j 2\pi f_k \tau_\ell}\, u^2(t - \tau_\ell)\, dt = \sum_{\ell=1}^{L} |\alpha_\ell|^2 \sqrt{2E_r}\, x_k e^{-j 2\pi f_k \tau_\ell}, \tag{7.6} $$

where (1) follows by using (7.2)-(7.3) and observing that the other terms vanish since $f_{\hat{k}}$ is an odd multiple of $1/T_s$ for all $\hat{k}$. As is evident from (7.6), even if $\mathrm{Im}\{x_k\} = 0$, i.e., the transmitted signal has only an I-phase component, the corresponding received signal has both I and Q components. This apparent rotation is introduced because the reference signal and the $k$-th data stream pass through slightly different channels, owing to the difference in the modulating frequency.
Such a rotation of the constellation points can easily be tracked at the RX with negligible overhead, as explained in Section 7.2.4.

7.2.3 Noise component analysis

From (7.4), the noise components at the $k$-th I/Q integrator output ($Y_k = Y_{k,I} + jY_{k,Q}$) in Fig. 7.2 can be expressed as:

$$ Z_k = \int_0^{T_s} e^{-j 2\pi f_k t} \big[ r_s(t)\, n^*(t) + r_s^*(t)\, n(t) + |n(t)|^2 \big]\, dt. \tag{7.7} $$

It is well known from the literature [132, 212] that such noise components are approximately Gaussian distributed. Given the transmit data vector $\mathbf{x}$ and the channel impulse response $h(t)$, the conditional mean of the noise components can be computed as:

$$ \mathbb{E}\{Z_k \mid \mathbf{x}, h\} = \int_0^{T_s} R_n(0)\, e^{-j 2\pi f_k t}\, dt = 0. $$

Similarly, the conditional variance and pseudo-covariance of the I/Q noise, respectively, can be expressed as (detailed steps are given in Appendix 7.A):

$$ \sigma^2_{Z_k \mid \mathbf{x}, h} = \mathbb{E}\{Z_k Z_k^*\} = 2 N_0^2 W T_s + 2 N_0 \beta_{\mathrm{rss}}^2 \Big[ \sum_{k=1}^{K} |x_k|^2 + E_r \Big] \tag{7.8} $$

$$ \tilde{\sigma}^2_{Z_k \mid \mathbf{x}, h} = \mathbb{E}\{Z_k Z_k\} = N_0 \sum_{\ell=1}^{L} |\alpha_\ell|^2 e^{-j 4\pi f_k \tau_\ell} \sum_{(k_1, k_2) \in \mathcal{A}_k} \hat{x}_{k_1} \hat{x}_{k_2}, \tag{7.9} $$

where $\mathcal{A}_k \subseteq \{-K, \ldots, 0, \ldots, K\}^2$ such that $(k_1, k_2) \in \mathcal{A}_k$ if $(k_1, k_2)$ satisfies one of:

$$ \hat{f}_{k_1} + \hat{f}_{k_2} = 2 f_k \tag{7.10a} $$

$$ \hat{f}_{k_1} + \hat{f}_{k_2} + \frac{1}{T_f} = 2 f_k, \tag{7.10b} $$

with $\hat{f}_{\hat{k}} \triangleq f_{\hat{k}}$, $\hat{f}_{-\hat{k}} \triangleq -f_{\hat{k}}$, $\hat{x}_{\hat{k}} \triangleq x_{\hat{k}}$, $\hat{x}_{-\hat{k}} \triangleq x^*_{\hat{k}}$ for $\hat{k} \in \{1, \ldots, K\}$, $\hat{f}_0 \triangleq 0$ and $\hat{x}_0 \triangleq \sqrt{2E_r}$. Clearly, from (7.8)-(7.9), one can see that the noise terms are not circularly symmetric and are further dependent on the data vector $\mathbf{x}$. Therefore, for optimal decoding, a joint estimation of the data vector $\mathbf{x}$ should be performed. To reduce the decoding complexity, the dependence of the noise on the data vector $\mathbf{x}$ shall be neglected. Under this assumption, the noise variances, averaged over the data vector $\mathbf{x}$, reduce to:

$$ \sigma^2_{Z_k \mid h} = 2 N_0^2 W T_s + 2 N_0 \beta_{\mathrm{rss}}^2 E_s \tag{7.11a} $$

$$ \tilde{\sigma}^2_{Z_k \mid h} \approx 0, \tag{7.11b} $$

by using the fact that the data streams have zero mean and are mutually independent (see Section 7.1), and by using the approximation $\mathbb{E}\{x_k^2\} \approx 0$.
Note that this approximation does not hold for some modulation formats, e.g., Binary Phase Shift Keying (BPSK). However, as shall be shown in Section 7.5, the mismatch in the results due to approximating (7.8)-(7.9) by (7.11) is small, even in such cases. In fact, in the SNR regime of interest, the noise-noise term $N_0^2 W T_s$ typically dominates $\sigma^2_{Z_k \mid \mathbf{x}, h}$, and therefore any non-circularity and dependence on $\mathbf{x}$, if present, is negligible. In conclusion, the noise for each data stream is approximately circularly symmetric and Gaussian distributed.

7.2.4 Pilot Training

Since the coherence time is several orders of magnitude larger than the symbol duration, it is assumed that several pilot symbols can be transmitted for each channel realization with negligible overhead. Using the integrator outputs $\{Y_1, \ldots, Y_K\}$ for these pilot symbols (refer to (7.6)), the RX can accurately track the phase rotation introduced by the channel as:

$$ \gamma_k = \sum_{\ell=1}^{L} |\alpha_\ell|^2 e^{-j 2\pi f_k \tau_\ell}, \qquad 1 \le k \le K. \tag{7.12} $$

Furthermore, using blanked pilots, i.e., symbols where no signal is transmitted, the RX can use the 0-th integrator output to accurately estimate the noise power $N_0$. Henceforth, the RX shall be assumed to have perfect knowledge of $\{\gamma_1, \ldots, \gamma_K\}$, $N_0$ and $\beta_{\mathrm{rss}}$ (refer to (7.5)), for a given channel realization $h(t)$. It is further assumed that, using this knowledge, the RX picks the appropriate modulation order and/or coding rate for transmission and feeds it back to the TX via an error free feedback channel. Note that since these pilots are used only to detect the constellation rotation and not the actual channel impulse response, the advantage of an FSR system, of not requiring explicit channel estimation, is still applicable.

7.3 Performance Analysis

Using (7.6)-(7.11), the effective base-band channel between the transmit data and the integrator outputs can be expressed as:

$$ Y_k = \sqrt{2 E_r E_d^{(k)}}\, \gamma_k\, \bar{x}_k + Z_k, \qquad 1 \le k \le K, \tag{7.13} $$

where $\bar{x}_k \triangleq x_k / \sqrt{E_d^{(k)}}$ and $Z_k \sim \mathcal{CN}\big(0, \sigma^2_{Z_k \mid h}\big)$.
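The noise-free part of (7.13) can be checked by direct numerical integration. The following is a sketch under stated assumptions: a rectangular pulse, MPC delays on the sampling grid and separated by at least one pulse width, frame indices $n = 0, \ldots, F-1$, an even $F$, and a mixer of the form $e^{-j 2\pi f_k t}$ (the sign convention adopted here). The function names and parameter values are hypothetical. It squares the noise-free received signal, mixes and integrates it, and compares the result with $\sqrt{2E_r}\,\gamma_k x_k$.

```python
import numpy as np

def integrator_outputs(x, E_r, alpha, delay_samples, F=20, Nf=640, Np=8, fs=4e9):
    """Noise-free I/Q integrator outputs Y_k, obtained by squaring, mixing and integrating."""
    x = np.asarray(x, dtype=complex)
    K, N = len(x), F * Nf
    t = np.arange(N) / fs
    T_s = N / fs
    f = (2 * np.arange(1, K + 1) - 1) / T_s              # f_k = (2k - 1) / T_s
    # Assumed rectangular pulse train with unit total energy
    u = np.where(np.arange(N) % Nf < Np, np.sqrt(fs / (F * Np)), 0.0)
    envelope = np.sqrt(E_r) + np.sqrt(2) * np.real(x @ np.exp(2j * np.pi * np.outer(f, t)))
    tx = envelope * u
    # Each MPC delays the transmit waveform and scales it by alpha_l
    r_s = sum(a * np.roll(tx, d) for a, d in zip(alpha, delay_samples))
    mixers = np.exp(-2j * np.pi * np.outer(f, t))
    return (np.abs(r_s) ** 2 * mixers).sum(axis=1) / fs, f

def effective_channel(x, E_r, alpha, delay_samples, f, fs=4e9):
    """Closed form sqrt(2 E_r) * gamma_k * x_k, with gamma_k as in (7.12)."""
    tau = np.asarray(delay_samples) / fs
    power = np.abs(np.asarray(alpha)) ** 2
    gamma = (power[None, :] * np.exp(-2j * np.pi * np.outer(f, tau))).sum(axis=1)
    return np.sqrt(2 * E_r) * gamma * np.asarray(x, dtype=complex)

if __name__ == "__main__":
    x = [1.0 + 1.0j, -0.5j]
    alpha = [1.0, 0.5j, -0.3 + 0.2j]
    delays = [0, 20, 50]                                  # in samples, spaced by more than Np
    Y, f = integrator_outputs(x, 1.5, alpha, delays)
    print(np.allclose(Y, effective_channel(x, 1.5, alpha, delays, f)))
```

The rotation by $\gamma_k$ is visible directly in the output: a purely real $x_k$ produces an integrator output with both I and Q parts whenever the MPC delays are non-zero.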
Note that since both the I and Q components are utilized, the magnitude of the channel fading term $|\gamma_k|$ is larger than the fading term in [212], viz., $\mathrm{Re}\{\gamma_k\}$. This leads to a reasonable boost in performance either when the maximum channel delay $\tau_{\max} \approx T_f$ and $K \approx F/2$, or when the MPC amplitudes ($\alpha_\ell$'s) have a soft onset behaviour [215]. Assuming the $k$-th data stream uses Gray coded $\mu_k$-ary Quadrature Amplitude Modulation (QAM), the corresponding BER can be approximated as [214]:

$$ P^{(k)}_{e \mid h}(\mu_k) \approx \begin{cases} Q\left( \sqrt{ \dfrac{4 E_r E_d^{(k)} |\gamma_k|^2}{\sigma^2_{Z_k \mid h}} } \right) & \text{for } \mu_k = 2 \\[2ex] \dfrac{4(\sqrt{\mu_k} - 1)}{\sqrt{\mu_k} \log_2 \mu_k}\, Q\left( \sqrt{ \dfrac{6 E_r E_d^{(k)} |\gamma_k|^2}{(\mu_k - 1)\, \sigma^2_{Z_k \mid h}} } \right) & \text{for } \mu_k > 2 \end{cases} \tag{7.14} $$

For un-coded systems and systems with a fixed code rate, BER is a good metric for analyzing system performance. Similarly, for systems with an adaptive code-rate, the mutual information between $Y_k$ and $\bar{x}_k$, for a given modulation format, is used as the performance metric. This is an upper bound on the system throughput achievable using practical channel codes. Assuming the $k$-th data stream uses $\mu_k$-QAM modulation, the mutual information for the $k$-th data stream (in bits/symbol) can be expressed as [216]:

$$ I_k(h(t), \mu_k) = \sum_{\bar{x}_k \in \mathcal{X}_k} \left[ \int_{Y_k \in \mathbb{C}} \frac{ \exp\left( -\frac{ \big| Y_k - \sqrt{2 E_r E_d^{(k)}}\, \gamma_k \bar{x}_k \big|^2 }{ \sigma^2_{Z_k \mid h} } \right) }{ \mu_k\, \pi\, \sigma^2_{Z_k \mid h} } \log_2 \left( \frac{ \mu_k \exp\left( -\frac{ \big| Y_k - \sqrt{2 E_r E_d^{(k)}}\, \gamma_k \bar{x}_k \big|^2 }{ \sigma^2_{Z_k \mid h} } \right) }{ \sum_{\hat{x}_k \in \mathcal{X}_k} \exp\left( -\frac{ \big| Y_k - \sqrt{2 E_r E_d^{(k)}}\, \gamma_k \hat{x}_k \big|^2 }{ \sigma^2_{Z_k \mid h} } \right) } \right) dY_k \right], \tag{7.15} $$

where $\mathcal{X}_k$ is the set of constellation points on the I-Q plane for $\mu_k$-QAM.

7.4 Optimal Bit and Power Allocation

Let the allowed set of QAM modulation orders for each data stream be $\mathcal{M} = \{1, 2, 4, \ldots, \mu_{\max}\}$. In this section the optimal bit and power allocation² to each data stream shall be discussed, for both fixed and adaptive code-rate MD-FSR systems. For these results, the following lemma shall be utilized:

Lemma 7.4.1.
For any power allocation $\{E_r, E_d^{(1)}, \ldots, E_d^{(K)}\}$ such that $E_r + \sum_k E_d^{(k)} \le E_s$, we have $E_r E_d^{(k)} \le \tilde{E}_r \tilde{E}_d^{(k)}$ for all $1 \le k \le K$, where:

$$ \tilde{E}_r = E_s/2, \qquad \tilde{E}_d^{(k)} = \frac{E_d^{(k)} E_s}{2 (E_s - E_r)}. $$

Proof. Using the AM-GM inequality, we have:

$$ \frac{E_s^2}{4} \ge E_r (E_s - E_r) \;\Rightarrow\; \frac{E_d^{(k)} E_s^2}{4} \ge E_d^{(k)} E_r (E_s - E_r) \text{ for any } 1 \le k \le K \;\Rightarrow\; \tilde{E}_d^{(k)} \tilde{E}_r \ge E_d^{(k)} E_r. \tag{7.16} $$

This proves the lemma. Note that $\{\tilde{E}_r, \tilde{E}_d^{(1)}, \ldots, \tilde{E}_d^{(K)}\}$ also satisfies the sum power constraint, i.e., $\tilde{E}_r + \sum_k \tilde{E}_d^{(k)} \le E_s$.

² These allocations are optimal given the approximations in (7.11).

7.4.1 Uncoded/Fixed Code-rate Systems

For an un-coded system, the optimal² power allocation $\{E_{r,\mathrm{opt}}, \ldots, E_{d,\mathrm{opt}}^{(K)}\}$ and modulation order assignment $\mu_{1,\mathrm{opt}}, \ldots, \mu_{K,\mathrm{opt}}$ is the solution to:

$$ \operatorname*{argmax}_{E_r, E_d^{(1)}, \ldots, E_d^{(K)},\, \mu_1, \ldots, \mu_K} \Big\{ \sum_k \log_2 \mu_k \Big\} \tag{7.17} $$

$$ \text{subject to: } E_r + \sum_k E_d^{(k)} \le E_s, \quad P^{(k)}_{e \mid h}(\mu_k) \le P_{e,\mathrm{th}}, \quad \mu_k \in \{1, 2, 4, \ldots, \mu_{\max}\} \;\; \forall\, 1 \le k \le K, $$

where $P_{e,\mathrm{th}}$ is a threshold on the bit error rate. It can be noted from (7.14) that the BER for stream $k$ is a decreasing function of $E_r E_d^{(k)}$. Therefore, using Lemma 7.4.1, there exists an optimal power allocation with $E_{r,\mathrm{opt}} = E_s/2$. Using this value of $E_{r,\mathrm{opt}}$, (7.17) reduces to the classical problem of bit and power allocation in multi-carrier systems [217]. This can be solved using the Hughes-Hartogs algorithm or any of its numerous variants [217, 218]. Note that, by replacing $P^{(k)}_{e \mid h}(\mu_k)$ with the BER for a given code-rate, this analysis also extends to systems with fixed code-rates.

7.4.2 Adaptive Code-rate Systems

For systems with adaptive code-rate and with equi-probable symbols, a higher order QAM constellation always provides more mutual information than lower order QAMs [216].
Therefore, without loss of generality, one can set $\mu_1 = \ldots = \mu_K = \mu_{\max}$ and find the optimal² power allocation $\{E_{r,\mathrm{opt}}, \ldots, E_{d,\mathrm{opt}}^{(K)}\}$ as a solution to:

$$ \operatorname*{argmax}_{E_r, E_d^{(1)}, \ldots, E_d^{(K)}} \Big\{ \sum_k I_k(h(t), \mu_{\max}) \Big\} \tag{7.18} $$

$$ \text{subject to: } E_r + \sum_k E_d^{(k)} \le E_s. $$

It can be noted from (7.15) that $I_k(h(t), \mu)$ is a strictly increasing function of $E_r E_d^{(k)}$. Therefore, again using Lemma 7.4.1, there exists an optimal power allocation with $E_{r,\mathrm{opt}} = E_s/2$. Using this value of $E_{r,\mathrm{opt}}$, (7.18) reduces to the problem of power allocation in multi-carrier systems with specific input distributions. Therefore, $\{E_{d,\mathrm{opt}}^{(1)}, \ldots, E_{d,\mathrm{opt}}^{(K)}\}$ can be obtained using the mercury/water-filling algorithm [219].

7.5 Simulation Results

For the simulation results, an MD-FSR impulse radio system operating at a center frequency of 6 GHz, having a pulse width of $T_p = 2$ ns and a frame duration of $T_f = 160$ ns, is considered. For modeling the channel power delay profile (PDP), the industrial non line-of-sight scenario model (DSM bs NLoS-b) from [215] is used. By using this PDP, a base-band sample channel impulse response (that satisfies the assumptions in Section 7.1) is generated as [220]:

$$ h(t) = \sum_{\ell=0}^{(T_f/T_p) - 1} \beta_{\mathrm{rss}} \sqrt{\mathrm{PDP}(\ell T_p)}\; \xi_\ell\, e^{j\theta_\ell}\, \delta(t - \ell T_p), $$

where the PDP is normalized as $\sum_{\ell=0}^{(T_f/T_p) - 1} \mathrm{PDP}(\ell T_p) = 1$ and, for reproducible results, the small scale fading coefficients are set as $\xi_\ell = 1$ and the phase angles as $\theta_\ell = \pi\ell/5$. The average RX SNR is also defined as $\mathrm{SNR} = \beta_{\mathrm{rss}}^2 E_s / N_0$. For this channel model, the simulated bit error probabilities for a sample MD-FSR RX are compared to the analytical results from (7.14), in Fig. 7.3.
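As a rough guide to how the analytic curves follow from (7.11a), (7.12) and (7.14), the evaluation can be sketched as below. Several ingredients are assumptions made only for illustration: the exponential PDP is a placeholder and is not the model of [215], the bandwidth is taken as $W = 1/T_p$, the phase angles are taken as $\pi\ell/5$, and the function names are hypothetical. The numbers it produces are therefore not expected to reproduce the thesis figures.

```python
import math
import numpy as np

def q_func(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def ber_qam(mu, E_r, E_d, gamma_abs2, sigma2):
    """Approximate BER of Gray-coded mu-QAM, following the two cases of (7.14)."""
    if mu == 2:
        return q_func(math.sqrt(4 * E_r * E_d * gamma_abs2 / sigma2))
    root = math.sqrt(mu)
    scale = 4 * (root - 1) / (root * math.log2(mu))
    return scale * q_func(math.sqrt(6 * E_r * E_d * gamma_abs2 / ((mu - 1) * sigma2)))

def sample_channel(T_f=160e-9, T_p=2e-9, beta_rss=1.0, decay=20e-9):
    """Tapped-delay-line channel with one tap every T_p seconds."""
    L = int(round(T_f / T_p))
    ell = np.arange(L)
    pdp = np.exp(-ell * T_p / decay)      # ASSUMED placeholder PDP, not the model of [215]
    pdp /= pdp.sum()                      # normalize the PDP to unit sum
    alpha = beta_rss * np.sqrt(pdp) * np.exp(1j * np.pi * ell / 5)
    return alpha, ell * T_p

def analytic_ber(snr_db, k=1, mu=4, K=5, F=20, T_f=160e-9, T_p=2e-9):
    alpha, tau = sample_channel(T_f, T_p)
    E_s, T_s, W = 1.0, F * T_f, 1.0 / T_p                 # W = 1/T_p is an assumption
    beta2 = float(np.sum(np.abs(alpha) ** 2))
    N0 = beta2 * E_s / 10 ** (snr_db / 10)                # from SNR = beta_rss^2 E_s / N0
    f_k = (2 * k - 1) / T_s
    gamma_k = np.sum(np.abs(alpha) ** 2 * np.exp(-2j * np.pi * f_k * tau))   # (7.12)
    sigma2 = 2 * N0 ** 2 * W * T_s + 2 * N0 * beta2 * E_s                     # (7.11a)
    # Equal split: E_r = E_s / 2 and E_d = E_s / (2K)
    return ber_qam(mu, E_s / 2, E_s / (2 * K), abs(gamma_k) ** 2, sigma2)

if __name__ == "__main__":
    for snr in (20, 25, 30):
        print(snr, analytic_ber(snr, mu=2), analytic_ber(snr, mu=4), analytic_ber(snr, mu=16))
```

With these assumed values $W T_s = 1600$, so the noise-noise term $2N_0^2 W T_s$ is the larger part of $\sigma^2_{Z_k \mid h}$ at moderate SNR, in line with the remark in Section 7.2.3.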
The results show an excellent match between the analytic BER expressions and the Monte-Carlo simulations. This suggests that the circularly symmetric Gaussian approximation for the effective noise (in Section 7.2.3) is reasonable, at least for the SNR regime depicted here.

Figure 7.3: BER for data streams $k = 1, 5$ of an MD-FSR RX with Gray-coded QAM modulation ($K = 5$, $F = 20$, $E_d^{(k)} = \frac{E_s}{2K}$, $\mu_k = \mu$, $1 \le k \le K$). Analytic and Monte-Carlo curves are shown for BPSK ($\mu = 2$), QPSK ($\mu = 4$) and 16-QAM ($\mu = 16$).

Next, the sum-rates of an un-coded and an adaptive code-rate system, with optimal bit and power allocation, are studied as a function of $\mu_{\max}$ in Fig. 7.4. Additionally, the performance of the MD-FSR RX design in [212] is also plotted, with BPSK transmission. This design only requires one integrator and correlator per data stream, as opposed to the two in our design (see Fig. 7.2). Hence, for a fair comparison, double the number of data streams is allowed for the RX in [212]. The results show that the use of higher order modulation formats improves performance, even at low SNR values. Furthermore, even with the same RX hardware complexity, our design with $\mu_{\max} \ge 4$ outperforms the conventional BPSK MD-FSR RX [212] over the whole SNR range studied.

Figure 7.4: Sum-Rate from (7.17) and (7.18), respectively, for an MD-FSR system as a function of the SNR: (a) Un-coded, sum rate in bits/symbol ($P_{e,\mathrm{th}} = 10^{-3}$, $F = 40$); (b) Adaptive code-rate, sum mutual information in bits/symbol ($F = 40$). Optimal bit/power allocation obtained using (a) the Hughes-Hartogs algorithm [218] and (b) the mercury/water-filling algorithm [219].
³ Intuitively, this is because the use of both I/Q components allows a slightly higher SNR ($|\gamma_k|$ versus $\mathrm{Re}\{\gamma_k\}$ in (7.13)) and also allows the same data-rate to be achieved using fewer sub-carriers (with better channel gains).

³ Note that the receiver in [212] can potentially also support $Q$-ary Amplitude Shift Keying (ASK) modulated signals. By picking $\mu_{\max} \ge Q^2$, our design can outperform it.

7.6 Conclusion

In this chapter a novel architecture for an MD-FSR system is proposed that can exploit both the in-phase and quadrature-phase components of the signal. The corresponding received signal and noise components are analyzed, and closed form expressions for the system performance measures are provided. The optimal power allocation to the reference signal is derived, and the bit and data stream power allocation problem is considered, both for fixed and adaptive code-rate systems. Results suggest that the proposed RX outperforms conventional designs, even when restricted to have the same hardware complexity.

7.A Appendix A

From (7.7), the conditional variance of the noise for the $k$-th data stream can be expressed as:

$$ \sigma^2_{Z_k \mid \mathbf{x}, h} = \mathbb{E}\{Z_k Z_k^*\} = \underbrace{ \mathbb{E}\Big\{ \Big| \int_0^{T_s} \big[ r_s(t)\, n^*(t) + r_s^*(t)\, n(t) \big] e^{-j 2\pi f_k t}\, dt \Big|^2 \Big\} }_{(i)} + \underbrace{ \mathbb{E}\Big\{ \Big| \int_0^{T_s} |n(t)|^2 e^{-j 2\pi f_k t}\, dt \Big|^2 \Big\} }_{(ii)}. \tag{7.19} $$

Note that the other terms in (7.19) vanish, since $n(t)$ is independent of $r_s(t)$ and the odd moments of $n(t)$ are zero.
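Now the first term of (7.19) can be computed as:

$$ (i) = \iint_0^{T_s} e^{-j 2\pi f_k (t_1 - t_2)} \Big[ R_n(t_1 - t_2)\, r_s^*(t_1)\, r_s(t_2) + R_n^*(t_1 - t_2)\, r_s(t_1)\, r_s^*(t_2) \Big]\, dt_1\, dt_2. \tag{7.20} $$

The first component in (7.20) can be computed as:

$$ \begin{aligned} (i)_a &= \iint_0^{T_s} R_n(t_1 - t_2)\, e^{-j 2\pi f_k (t_1 - t_2)}\, r_s^*(t_1)\, r_s(t_2)\, dt_1\, dt_2 \\ &\overset{(1)}{\approx} \iint_{-\infty}^{\infty} \sum_{\ell=1}^{L} \sum_{n=1}^{F} R_n(t_1 - t_2)\, e^{-j 2\pi f_k (t_1 - t_2)}\, p(t_1 - nT_f - \tau_\ell)\, p(t_2 - nT_f - \tau_\ell) \\ &\qquad \times \mathrm{Re}\Big\{ \sum_{k_1=1}^{K} \sqrt{2}\, x_{k_1} e^{j 2\pi f_{k_1} n T_f} + \sqrt{E_r} \Big\}\, \mathrm{Re}\Big\{ \sum_{k_2=1}^{K} \sqrt{2}\, x_{k_2} e^{j 2\pi f_{k_2} n T_f} + \sqrt{E_r} \Big\}\, |\alpha_\ell|^2\, dt_1\, dt_2 \\ &\overset{(2)}{=} \iint_{-\infty}^{\infty} R_n(\hat{t}_1 - \hat{t}_2)\, e^{-j 2\pi f_k (\hat{t}_1 - \hat{t}_2)}\, p(\hat{t}_1)\, p(\hat{t}_2)\, d\hat{t}_1\, d\hat{t}_2 \left[ \sum_{\ell=1}^{L} \sum_{n=1}^{F} |\alpha_\ell|^2 \sum_{k_1=-K}^{K} \frac{\hat{x}_{k_1}}{\sqrt{2}} e^{j 2\pi \hat{f}_{k_1} n T_f} \sum_{k_2=-K}^{K} \frac{\hat{x}_{k_2}}{\sqrt{2}} e^{j 2\pi \hat{f}_{k_2} n T_f} \right] \\ &\overset{(3)}{=} \int_{-\infty}^{\infty} \left[ \int_{-\infty}^{\infty} S_n(f)\, P(f_k + f)\, e^{j 2\pi (f + f_k) \hat{t}_2}\, df \right] p(\hat{t}_2)\, d\hat{t}_2 \left[ \sum_{\ell=1}^{L} F |\alpha_\ell|^2 \Big( \sum_{k=1}^{K} |x_k|^2 + E_r \Big) \right] \\ &= \int_{-\infty}^{\infty} S_n(f)\, |P(f + f_k)|^2\, df \left[ \sum_{\ell=1}^{L} F |\alpha_\ell|^2 \Big( \sum_{k=1}^{K} |x_k|^2 + E_r \Big) \right] \\ &\overset{(4)}{=} N_0\, \beta_{\mathrm{rss}}^2 \Big[ \sum_{k=1}^{K} |x_k|^2 + E_r \Big], \end{aligned} \tag{7.21} $$

where (1) follows from (7.2), the fact that $R_n(t) \approx 0$ for $t \gg 1/W$ and by a change of the integration limits using the fact that $p(t) = 0$ for $t \notin [0, T_p]$; (2) follows by the change of variables $\hat{t}_1 = t_1 - nT_f - \tau_\ell$, $\hat{t}_2 = t_2 - nT_f - \tau_\ell$ and by defining $\hat{f}_{\hat{k}} \triangleq f_{\hat{k}}$, $\hat{f}_{-\hat{k}} \triangleq -f_{\hat{k}}$, $\hat{x}_{\hat{k}} \triangleq x_{\hat{k}}$, $\hat{x}_{-\hat{k}} \triangleq x^*_{\hat{k}}$ for $\hat{k} \in \{1, \ldots, K\}$, $\hat{f}_0 \triangleq 0$ and $\hat{x}_0 \triangleq \sqrt{2E_r}$; (3) follows by using $S_n(f) = \int_{-\infty}^{\infty} R_n(t)\, e^{-j 2\pi f t}\, dt$ and $P(f) = \int_{-\infty}^{\infty} p(t)\, e^{-j 2\pi f t}\, dt$, with $S_n(f)$ and $P(f)$ being the noise power spectral density and the Fourier transform of the pulse, respectively, together with the symmetry of $S_n(f)$ and (7.3); and (4) follows from Parseval's theorem.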
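Using a similar sequence of steps, the other term in (7.20) can be computed as:

$$ (i)_b = \iint_0^{T_s} R_n^*(t_1 - t_2)\, e^{-j 2\pi f_k (t_1 - t_2)}\, r_s(t_1)\, r_s^*(t_2)\, dt_1\, dt_2 = N_0\, \beta_{\mathrm{rss}}^2 \Big[ \sum_{k=1}^{K} |x_k|^2 + E_r \Big]. \tag{7.22} $$

Now the second component of (7.19) can be computed as:

$$ \begin{aligned} (ii) &= \mathbb{E}\Big\{ \Big| \int_0^{T_s} |n(t)|^2 e^{-j 2\pi f_k t}\, dt \Big|^2 \Big\} = \iint_0^{T_s} e^{-j 2\pi f_k (t_1 - t_2)}\, \mathbb{E}\big\{ n(t_1)\, n^*(t_1)\, n(t_2)\, n^*(t_2) \big\}\, dt_1\, dt_2 \\ &\overset{(1)}{=} \iint_0^{T_s} e^{-j 2\pi f_k (t_1 - t_2)} \big[ R_n^2(0) + |R_n(t_1 - t_2)|^2 \big]\, dt_1\, dt_2 \\ &\overset{(2)}{\approx} \int_0^{T_s} \int_{-\infty}^{\infty} e^{-j 2\pi f_k (t_1 - t_2)}\, |R_n(t_1 - t_2)|^2\, dt_1\, dt_2 = \int_0^{T_s} \int_{-\infty}^{\infty} S_n(f)\, S_n(f + f_k)\, df\, dt_2 \\ &\overset{(3)}{\approx} 2 N_0^2 W T_s, \end{aligned} \tag{7.23} $$

where (1) follows from results on the expectation of a product of Gaussian random variables [174]; (2) follows by a change of the integration limits using the fact that $R_n(t) \approx 0$ for $t \gg 1/W$, with the $R_n^2(0)$ term integrating to zero since $f_k$ is a multiple of $1/T_s$; and (3) follows from the fact that $f_k \ll W$ and $S_n(f) = N_0$ for $|f| \le W$. Now using (7.21)-(7.23), one arrives at (7.8). Similarly, the noise pseudo-covariance for the $k$-th data stream can be computed as:

$$ \tilde{\sigma}^2_{Z_k \mid \mathbf{x}, h} = \mathbb{E}\{Z_k Z_k\} = \underbrace{ \mathbb{E}\Big\{ \Big( \int_0^{T_s} \big[ r_s(t)\, n^*(t) + r_s^*(t)\, n(t) \big] e^{-j 2\pi f_k t}\, dt \Big)^2 \Big\} }_{(iii)} + \underbrace{ \mathbb{E}\Big\{ \Big( \int_0^{T_s} |n(t)|^2 e^{-j 2\pi f_k t}\, dt \Big)^2 \Big\} }_{(iv)}. \tag{7.24} $$

Now the first term of (7.24) can be computed as:

$$ (iii) = \iint_0^{T_s} e^{-j 2\pi f_k (t_1 + t_2)} \Big[ R_n(t_1 - t_2)\, r_s^*(t_1)\, r_s(t_2) + R_n^*(t_1 - t_2)\, r_s(t_1)\, r_s^*(t_2) \Big]\, dt_1\, dt_2 \tag{7.25} $$

$$ \begin{aligned} &\overset{(1)}{=} \iint_0^{T_s} 2 R_n(t_1 - t_2)\, e^{-j 2\pi f_k (t_1 + t_2)}\, r_s^*(t_1)\, r_s(t_2)\, dt_1\, dt_2 \\ &\overset{(2)}{\approx} \iint_{-\infty}^{\infty} 2 R_n(\hat{t}_1 - \hat{t}_2)\, e^{-j 2\pi f_k (\hat{t}_1 + \hat{t}_2)}\, p(\hat{t}_1)\, p(\hat{t}_2)\, d\hat{t}_1\, d\hat{t}_2 \\ &\qquad \times \sum_{\ell=1}^{L} \sum_{n=1}^{F} |\alpha_\ell|^2 \sum_{k_1=-K}^{K} \frac{\hat{x}_{k_1}}{\sqrt{2}} e^{j 2\pi \hat{f}_{k_1} n T_f} \sum_{k_2=-K}^{K} \frac{\hat{x}_{k_2}}{\sqrt{2}} e^{j 2\pi \hat{f}_{k_2} n T_f}\, e^{-j 4\pi f_k (nT_f + \tau_\ell)} \\ &\overset{(3)}{=} F \int_{-\infty}^{\infty} S_n(f)\, P(f + f_k)\, P^*(f - f_k)\, df\, \sum_{\ell=1}^{L} |\alpha_\ell|^2 e^{-j 4\pi f_k \tau_\ell} \sum_{(k_1, k_2) \in \mathcal{A}_k} \hat{x}_{k_1} \hat{x}_{k_2} \\ &\overset{(4)}{\approx} N_0 \sum_{\ell=1}^{L} |\alpha_\ell|^2 e^{-j 4\pi f_k \tau_\ell} \sum_{(k_1, k_2) \in \mathcal{A}_k} \hat{x}_{k_1} \hat{x}_{k_2}, \end{aligned} \tag{7.26} $$

where (1) follows from the fact that $R_n(t_1 - t_2) = R_n^*(t_2 - t_1)$; (2) follows from steps similar to (7.21); (3) also follows from steps similar to (7.21), with $\mathcal{A}_k$ as defined in (7.10); and (4) follows from the fact that $f_k \ll W$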
and hence, $P(f \pm f_k) \approx P(f)$. Now the second component of (7.24) can be computed as:

$$ \begin{aligned} (iv) &\overset{(1)}{=} \iint_0^{T_s} e^{-j 2\pi f_k (t_1 + t_2)} \big[ R_n^2(0) + |R_n(t_1 - t_2)|^2 \big]\, dt_1\, dt_2 \\ &\overset{(2)}{\approx} \int_0^{T_s} \int_{-\infty}^{\infty} e^{-j 2\pi f_k (t_1 + t_2)}\, |R_n(t_1 - t_2)|^2\, dt_1\, dt_2 \overset{(3)}{=} 0, \end{aligned} \tag{7.27} $$

where (1) follows from results on the expectation of a product of Gaussian random variables [174]; (2) follows by a change of the integration limits using the fact that $R_n(t) \approx 0$ for $t \gg 1/W$; and (3) follows by the change of variable $\hat{t}_1 = t_1 - t_2$. Now using (7.26)-(7.27), one arrives at (7.9).

Chapter 8
Multi-Antenna Frequency Shift Reference Receivers for massive MIMO

In the previous chapter a non-coherent RX for UWB applications, called the Frequency Shift Reference (FSR) RX, was discussed. The FSR RX offers a low-complexity alternative to Rake RXs for low data-rate ultra-wideband applications. In such systems, a data signal and a reference signal are simultaneously transmitted. At the RX, the received signal corresponding to the data is correlated with the received signal corresponding to the reference signal via a squaring operation, thereby accumulating the multi-path channel energies in-phase. While this significantly reduces the cost of the RX, it leads to a significant enhancement in the RX noise levels, as is evident from the $2 N_0^2 W T_s$ term in (7.11). Essentially, due to the squaring operation, noise power from the whole UWB system bandwidth appears at each demodulation output, thus significantly reducing the link margin.¹ Therefore such FSR RXs, and more generally transmit reference RXs, have only limited applications. One way to prevent such noise accumulation is to avoid the bandwidth spreading caused by multiplication with the impulse train $u(t)$ (see (7.1)). Note that such bandwidth spreading is an essential part of spread spectrum UWB that allows resolution of the channel MPCs in the delay domain, thus providing resilience to fading.
However, in multi-antenna systems the channel MPCs can be resolved in the spatial domain, obviating the need for separating them in the delay domain. Therefore FSR receivers can be implemented in massive MIMO systems without the need for bandwidth spreading, thus significantly reducing the noise accumulation in comparison to UWB FSR. Based on this idea, in this chapter a novel reduced complexity, non-coherent RX for massive MIMO systems, called the Multi-Antenna Frequency Shift Reference (MA-FSR) RX [221], is proposed. Reduced complexity non-coherent receivers for massive MIMO have several advantages over coherent RXs such as hybrid beamforming or HBwS (see Section 1.1). For example, the latter require coherent signal demodulation, which is hard to implement at the high phase noise levels encountered at mm-wave frequencies. Furthermore, the acquisition of the channel state information and the design of the beamformer may impose significant CE overheads and computational burdens on the transceivers (see Section 1.1). While several non-coherent RXs have been proposed for massive MIMO [73, 74, 222, 223], most schemes either require an up/down-conversion chain at each antenna element or work well only in rich scattering channels or narrow-band channels, all of which may limit their applicability to low-cost, high data rate applications at mm-wave frequencies. With the advent of new 5G verticals that require low-cost, low-energy solutions, such as the internet of things (IoT), the time is ripe for new transceiver architectures that allow high data-rate communication with low system and hardware costs, while also providing a good beamforming gain to improve the link budget. The proposed MA-FSR receiver in this chapter is ideal for such scenarios.

¹ Squaring in the time domain causes a significant enhancement of the noise power spectral density at low frequencies.
The MA-FSR RX uses only one down-conversion chain, supports high data rates with non-coherent demodulation, and can perform receive beamforming without requiring phase-shifters, explicit CE, or complicated signal processing, thus alleviating the drawbacks of the above mentioned schemes. Inspired by the FSR schemes for Single Input Single Output (SISO) UWB systems [132, 212, 224], in this scheme the TX transmits a reference signal and several data signals on different frequency sub-carriers via OFDM. At each RX antenna, the received waveform corresponding to the data sub-carriers is then correlated with the received waveform corresponding to the reference signal via a simple squaring operation. The outputs are then summed up and fed to a single down-conversion chain for data demodulation. As shall be shown later, this operation emulates Maximal Ratio Combining (MRC) at the RX with imperfect channel estimates.

Since the RX beamforming is enabled without CE, MA-FSR is especially suitable for fast time-varying channels, such as in V2V or V2X networks. Furthermore, due to the non-coherent RX architecture, the phase noise of the transmit signal has negligible influence on the performance. The RX also exploits power from all the channel MPCs and is therefore resilient to blocking of MPCs. Unlike conventional UWB FSR systems, there is no bandwidth spreading of the data signal involved. Therefore, the noise enhancement due to the non-linear RX architecture is significantly smaller, making it practically viable. On the flip side, the proposed scheme only uses 50% of the frequency sub-carriers for data transmission, can only support a single spatial data-stream, cannot suppress interference and can only be used for beamforming in the receive mode of a node.
Therefore, MA-FSR is more suitable for scenarios with abundant spectrum, either in a device-to-device network where beamforming at one link end provides sufficient link margin, or in the end terminals of an infrastructure based network where down-link traffic is dominant. The contributions of this chapter are as follows:

1. An MA-FSR RX architecture for massive MIMO systems is proposed that allows non-coherent transmission, lowers implementation cost and energy consumption at the cost of 50% bandwidth efficiency, and does not require phase-shifters or CE at the RX.
2. The achievable throughput for the proposed MA-FSR system is characterized, both analytically and via simulations, for the SIMO scenario in a frequency-selective fading channel.
3. A class of improved MA-FSR architectures that can further improve performance, albeit with a higher hardware complexity, is also presented.

The system model is discussed in Section 8.1; the signal and noise components of the demodulated outputs are characterized in Section 8.2; the system throughput and power allocation problem are analyzed in Section 8.3; simulation results are provided in Section 8.4; a class of improved MA-FSR architectures is proposed in Section 8.5 and the conclusions are summarized in Section 8.6.

Notation: scalars are represented by light-case letters; vectors by bold-case letters; and sets by calligraphic letters. Additionally, $j = \sqrt{-1}$, $\mathbb{E}\{\cdot\}$ represents the expectation operator, $c^*$ is the complex conjugate of a complex scalar $c$, $\mathbf{c}^\dagger$ is the Hermitian transpose of a complex vector $\mathbf{c}$, $\delta(t)$ represents the Dirac delta function, $\delta_{a,b}$ represents the Kronecker delta function and $\mathrm{Re}\{\cdot\}$/$\mathrm{Im}\{\cdot\}$ refer to the real/imaginary component, respectively. Furthermore, $\mathbf{a}^*$ and $\mathbf{a}^\dagger$ denote the complex conjugate and the conjugate transpose of a vector $\mathbf{a}$, respectively.
8.1 General Assumptions and System model

A SIMO link (which can be part of a larger system) is considered, where the TX has a single antenna ($M_{tx} = 1$) and the RX has $M_{rx} \gg 1$ antennas and one down-conversion chain. Note that this model also covers a MIMO link where the TX transmits a single spatial data stream, since the combination of the TX precoding vector and the propagation channel creates an effective SIMO link. The TX transmits OFDM symbols with $2K$ sub-carriers, indexed as $\{0, \ldots, 2K-1\}$. A reference signal is transmitted on the 0-th sub-carrier and $K-g$ data signals are transmitted on the sub-carrier set $\mathcal{K} \triangleq \{K, K+1, \ldots, 2K-g-1\}$. Here $g$ ensures that the transmit signal lies within the system bandwidth, and is usually small, determined by the TX phase noise. The remaining sub-carriers, i.e., $\{1, \ldots, K-1\} \cup \{2K-g, \ldots, 2K-1\}$, are unused, as illustrated in Fig. 8.1. While it uses only 50% of the sub-carriers for data transmission, this OFDM structure is necessary to prevent inter-stream interference, as shall be shown in Section 8.2.²

[Figure 8.1: Power spectral density of an MA-FSR transmit signal.]

² This allocation ensures that through the RX non-linearity, the products of the reference tone and the data signals mix to the sub-carriers $\{K, \ldots, 2K-g\}$ in base-band, while the data-data inter-modulation products mix to the sub-carriers $\{1, \ldots, K-1\}$, which can be filtered out. This sub-carrier allocation is different from the frequency offsets used in a UWB FSR receiver [212], and leads to a much lower amount of noise enhancement.

Then, the complex equivalent transmit signal for the 0-th symbol (for $-T_p \le t \le T_s$) can be
expressed as:

$$s(t) = \sqrt{\frac{2}{T_s}} \left[ \sqrt{E_r} + \sum_{k \in \mathcal{K}} x_k e^{j2\pi f_k t} \right] e^{j(2\pi f_c t + \theta(t))}, \tag{8.1}$$

where $E_r$ is the energy allocated to the reference signal, $x_k$ is the data signal for the $k$-th OFDM sub-carrier, $f_c$ is the carrier frequency, $f_k = k/T_s$ represents the frequency offset of the $k$-th sub-carrier from the reference signal, $\theta(t)$ represents the phase noise process at the TX and $T_s$, $T_p$ are the symbol duration and the cyclic prefix duration, respectively. Here the complex equivalent signal is defined such that the actual (real) transmit signal is given by $\mathrm{Re}\{s(t)\}$. It is further assumed that the data signals on the sub-carriers $\{x_k \mid k \in \mathcal{K}\}$ are mutually independently distributed with zero means. The total average transmit symbol energy is then given by $E_s = E_r + \sum_{k \in \mathcal{K}} E_d^{(k)}$, where $E_d^{(k)} = \mathbb{E}\{|x_k|^2\}$ is the energy allocated to the $k$-th sub-carrier.³

Different from Part I, a wide-band channel with frequency selective fading is considered here, having $L \ll M_{rx}$ scatterers, with the $M_{rx} \times 1$ channel impulse response vector given as [81]:

$$\mathbf{h}(t) = \sum_{\ell=0}^{L-1} \alpha_\ell\, \mathbf{a}_{rx}(\ell)\, \delta(t - \tau_\ell), \tag{8.2}$$

where $\alpha_\ell$ is the complex gain, $\tau_\ell$ is the delay and $\mathbf{a}_{rx}(\ell)$ is the RX array response vector, respectively, of the $\ell$-th MPC. As an illustration, the array response vector for a $\lambda/2$-spaced uniform linear array is given by $[\mathbf{a}_{rx}(\ell)]_i = e^{j\pi(i-1)\sin(\psi_\ell)}$, where $\lambda$ is the wavelength of the carrier signal and $\psi_\ell$ is the angle of arrival of the $\ell$-th MPC. Note that here the system bandwidth is implicitly assumed to be small enough to ignore beam squinting effects. For ease of analysis, it is assumed that the array response vectors for the scatterers are mutually orthogonal, i.e., $\mathbf{a}_{rx}(\ell)^\dagger \mathbf{a}_{rx}(\bar{\ell}) = M_{rx}\, \delta_{\ell,\bar{\ell}}$. This assumption is reasonable if the scatterers are well separated and $M_{rx} \gg L$ [38]. Later, in Section 8.4, the system performance when the above assumption is relaxed is also studied. To prevent inter symbol interference, the cyclic prefix is assumed to be longer than the maximum channel delay: $T_p > \tau_{L-1}$.
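As a rough numerical illustration of the orthogonality assumption above, the following Python sketch builds the half-wavelength ULA response vectors and evaluates the normalized inner products $|\mathbf{a}_{rx}(\ell)^\dagger \mathbf{a}_{rx}(\bar{\ell})| / M_{rx}$. The array size (64) and the three angles of arrival are illustrative assumptions, and the helper names `ula_response` and `inner_product` are hypothetical; this is only a sketch of why well separated scatterers give nearly orthogonal response vectors when $M_{rx} \gg L$.

```python
import cmath
import math


def ula_response(num_antennas: int, angle_rad: float) -> list[complex]:
    """Response of a half-wavelength-spaced uniform linear array.

    Element i (0-based) is exp(j*pi*i*sin(angle)), matching the
    expression [a_rx]_i = exp(j*pi*(i-1)*sin(psi)) with 1-based i.
    """
    return [cmath.exp(1j * math.pi * i * math.sin(angle_rad))
            for i in range(num_antennas)]


def inner_product(a: list[complex], b: list[complex]) -> complex:
    """Hermitian inner product a^dagger b."""
    return sum(x.conjugate() * y for x, y in zip(a, b))


# Illustrative (assumed) values: 64 antennas, three well separated angles.
M_RX = 64
angles = [-math.pi / 10, 0.0, math.pi / 10]
vectors = [ula_response(M_RX, psi) for psi in angles]

# gram[i][j] is close to 1 on the diagonal and small off the diagonal
# when the orthogonality assumption is a good approximation.
gram = [[abs(inner_product(u, v)) / M_RX for v in vectors] for u in vectors]

if __name__ == "__main__":
    for row in gram:
        print(["%.4f" % value for value in row])
```

The off-diagonal entries are not exactly zero for a finite array; the assumption in the text treats them as negligible, which becomes more accurate as the number of antennas grows or the angular separation increases.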
To model a generic time varying channel, the MPC parameters are assumed to remain constant for at least a coherence time interval $T_{coh}$, and may or may not change afterwards. The RX front-end is assumed to have a low noise amplifier followed by a Band Pass Filter (BPF) at each antenna element, as depicted in Fig. 8.2. The BPF has a cut-off frequency of $f_{2K}$ and leaves the transmitted signal un-distorted but suppresses the out-of-band noise. Furthermore, the RX is assumed to have perfect timing and clock synchronization with the TX. The filtered complex equivalent received waveform for the 0-th symbol for $0 \le t \le T_s$ can then be expressed as:

$$\mathbf{r}(t) = \sum_{\ell=0}^{L-1} \alpha_\ell\, \mathbf{a}_{rx}(\ell)\, s_{BB}(t - \tau_\ell)\, e^{j2\pi f_c (t - \tau_\ell)} + \mathbf{n}(t)\, e^{j2\pi f_c t}, \tag{8.3}$$

where $s_{BB}(t) \triangleq s(t) e^{-j2\pi f_c t}$ is the base-band transmit signal including the carrier phase noise, and $\mathbf{n}(t)$ is the $M_{rx} \times 1$ base-band equivalent, stationary, complex additive Gaussian noise process vector, with individual entries being circularly symmetric, independent and identically distributed (i.i.d.), and having a power spectral density $S_n(f) = 2N_0$ for $0 \le f \le f_{2K}$.

³ For ease of notation, the additional energy per symbol required to transmit the cyclic prefix, viz., $E_s T_p / T_s$, is not considered here.

[Figure 8.2: Block diagram for a multi-antenna FSR receiver. Each antenna signal $r_m$ passes through a BPF and a squaring device $(\cdot)^2$; the squared signals are summed to give $r_{sq}(t)$, which is passed through an LPF and an ADC to an OFDM receiver (serial to parallel, prefix removal, IFFT) with outputs $Y_K, \ldots, Y_{2K-g-1}$.]

The filtered signals at each antenna are then squared and summed up, as depicted in Fig. 8.2. Note that such a squaring operation can be performed using square law devices or multipliers with identical inputs. For the purpose of this chapter, it shall be assumed that the squaring operation, and also the filtering, mixing and adding operations at the RX, can be implemented exactly.
Since it is the actual, real received signal which gets squared, the output for the 0-th symbol, after the squaring and summing, can be expressed as:

$$r_{sq}(t) = \sum_{m=1}^{M_{rx}} \mathrm{Re}\{r_m(t)\}^2 = \sum_{m=1}^{M_{rx}} \left[ \frac{|r_m(t)|^2}{2} + \frac{r_m(t)^2 + r_m^*(t)^2}{4} \right], \tag{8.4}$$

where $r_m(t) = [\mathbf{r}(t)]_m$ is the complex equivalent received signal at the $m$-th antenna. Note that both $r_m(t)^2$ and $r_m^*(t)^2$ are high pass signals with a carrier frequency of $2f_c$. This summed signal $r_{sq}(t)$ is then low-pass filtered (with a cut-off frequency of $2K/T_s$) to get:

$$r_{LPF}(t) = \frac{\|\mathbf{r}(t)\|^2}{2} = \sum_{\ell=0}^{L-1} \frac{M_{rx} |\alpha_\ell|^2}{2} \left| s_{BB}(t - \tau_\ell) \right|^2 + \frac{\|\mathbf{n}(t)\|^2}{2} + \sum_{\ell=0}^{L-1} \mathrm{Re}\left[ \alpha_\ell\, s_{BB}(t - \tau_\ell)\, \mathbf{n}(t)^\dagger \mathbf{a}_{rx}(\ell)\, e^{-j2\pi f_c \tau_\ell} \right], \tag{8.5}$$

where the orthogonality assumption for the array response vectors is used. Finally, $r_{LPF}(t)$ is sampled by an ADC at a sampling rate of $4K/T_s$ samples/sec and conventional OFDM demodulation follows. Note that since $r_{LPF}(t)$ is a real signal with maximum frequency $2K/T_s$, the ADC sampling rate must be at least $4K/T_s$ samples/sec to prevent aliasing. However, it can be shown that the signal of interest, i.e., the product between the reference and data sub-carriers, only lies within the frequency range $K/T_s \le |f| \le (2K-g-1)/T_s$. Thus the same performance can also be obtained by replacing the low-pass filter by a band-pass filter with a pass-band of $K/T_s \le |f| \le 2K/T_s$, and using an ADC sampling rate of $2K/T_s$ samples/sec.

8.2 Analysis of the demodulation outputs

Inspired by a similar analysis for the UWB FSR receiver in Chapter 7, the current section analyzes the OFDM demodulation outputs. The OFDM demodulated output for the $k$-th sub-carrier of the 0-th symbol can be expressed as ($-2K \le k \le 2K-1$):

$$Y_k = \frac{T_s}{4K} \sum_{u=0}^{4K-1} r_{LPF}\!\left( \frac{u T_s}{4K} \right) e^{-j2\pi k u / 4K}. \tag{8.6}$$

Each demodulation output shall be expressed as $Y_k = S_k + Z_k$, where $S_k$, referred to as the signal component, involves the terms in (8.6) not containing the channel noise, and $Z_k$, referred to as the noise component, contains the remaining terms.
It can be verified from (8.5) and the expression for $s_{BB}(t)$ that only the demodulation outputs $\{Y_k \mid K \le |k| \le (2K-g)\}$ involve signal components. Therefore a sub-optimal, albeit simple, demodulation approach shall be considered, where only the outputs $\{Y_k \mid k \in \mathcal{K}\}$ are utilized for demodulating the data, and the noise components are treated as noise.⁴

⁴ Since $r_{LPF}(t)$ is real, $Y_k = Y_{-k}^*$, i.e., $\{Y_k \mid k < 0\}$ do not contain any additional information.

8.2.1 Signal component analysis

From (8.5)-(8.6), the signal components of the OFDM demodulated outputs $Y_k$, for $k \in \mathcal{K}$, can be expressed as:

$$S_k = \sum_{u,\ell} \frac{M_{rx} |\alpha_\ell|^2}{4K} \left| \sqrt{E_r} + \sum_{k_1 \in \mathcal{K}} x_{k_1} e^{j2\pi f_{k_1} \left( \frac{u T_s}{4K} - \tau_\ell \right)} \right|^2 e^{-j2\pi k u / 4K} \overset{(1)}{=} M_{rx} \sqrt{E_r} \left[ \sum_{\ell=0}^{L-1} |\alpha_\ell|^2 e^{-j2\pi f_k \tau_\ell} \right] x_k, \tag{8.7}$$

where $\overset{(1)}{=}$ follows from the sub-carrier allocation, which ensures that, despite the non-linear RX architecture, only the cross-product between the reference signal and the data on the $k$-th sub-carrier contributes to $S_k$, for $k \ge K$. Essentially, the MA-FSR RX utilizes the received vector corresponding to the reference tone as weights to combine the received signal corresponding to the data, thus emulating maximal ratio combining of the contributions from the different antennas with imperfect channel estimates. Since this combining takes place via squaring in the analog domain, the proposed RX enables beamforming without CE or the use of phase-shifters. However, as is evident from (8.7), the signals from the $L$ multi-path components do not add up in-phase at the demodulation outputs. This is due to the fact that the reference signal and the $k$-th data stream pass through slightly different channels owing to the difference in modulating frequency. This leads to some amount of frequency selective fading, as shall be explained later in Section 8.3.
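The role of the sub-carrier allocation in (8.7) can be checked numerically with a small sketch. The Python code below is a simplified, noise-free illustration under assumed conditions that are not part of the original analysis: a single antenna, a single path with unit gain and zero delay, no phase noise, and the common $\sqrt{2/T_s}$ scaling dropped. It squares the magnitude of the sampled base-band signal and applies the $4K$-point demodulation of (8.6); the function name `mafsr_outputs` and the toy parameters are hypothetical.

```python
import cmath
import math


def mafsr_outputs(K: int, g: int, e_r: float,
                  data: dict[int, complex]) -> dict[int, complex]:
    """Noise-free squaring receiver outputs on the data sub-carriers.

    `data` maps each sub-carrier index k in {K, ..., 2K-g-1} to its symbol.
    Assumes one antenna, one path with unit gain and zero delay.
    """
    n = 4 * K  # number of samples per OFDM symbol, as in (8.6)
    sqrt_er = math.sqrt(e_r)
    squared = []
    for u in range(n):
        s = sqrt_er + sum(x * cmath.exp(2j * math.pi * k * u / n)
                          for k, x in data.items())
        squared.append(abs(s) ** 2)  # |s_BB|^2: what survives the LPF
    outputs = {}
    for k in range(K, 2 * K - g):
        outputs[k] = sum(q * cmath.exp(-2j * math.pi * k * u / n)
                         for u, q in enumerate(squared)) / n
    return outputs


if __name__ == "__main__":
    K, g, e_r = 8, 1, 4.0  # toy values chosen for illustration only
    qpsk = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]
    data = {k: qpsk[k % 4] for k in range(K, 2 * K - g)}
    out = mafsr_outputs(K, g, e_r, data)
    for k in sorted(out):
        # Each output should equal sqrt(E_r) * x_k: the data-data products
        # fall on sub-carriers below K and do not leak into the outputs.
        print(k, out[k], math.sqrt(e_r) * data[k])
```

In this toy setting each output equals $\sqrt{E_r}\, x_k$, which is (8.7) with $M_{rx} = 1$, $L = 1$, $\alpha_0 = 1$ and $\tau_0 = 0$. Moving a data symbol to an index below $K$ would make data-data products land on the data sub-carriers, which is the inter-stream interference that the allocation is designed to avoid.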
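8.2.2 Noise component analysis

From (8.6), the noise component of the OFDM demodulation output $Y_k$, for $k \in \mathcal{K}$, can be expressed as:

$$Z_k = \sum_{u=0}^{4K-1} \frac{T_s\, e^{-j2\pi k u / 4K}}{4K} \left[ \frac{1}{2} \left\| \mathbf{n}\!\left( \tfrac{u T_s}{4K} \right) \right\|^2 + \sum_{\ell=0}^{L-1} \mathrm{Re}\left[ \alpha_\ell\, s_{BB}\!\left( \tfrac{u T_s}{4K} - \tau_\ell \right) \mathbf{n}\!\left( \tfrac{u T_s}{4K} \right)^\dagger \mathbf{a}_{rx}(\ell)\, e^{-j2\pi f_c \tau_\ell} \right] \right]. \tag{8.8}$$

Note that the noise consists of both noise-noise cross product and data-noise cross product terms. Given the transmit data vector $\mathbf{x}$ and the channel impulse response $\mathbf{h}(t)$, the conditional mean of the noise components can be computed as $\mathbb{E}\{Z_k \mid \mathbf{x}, \mathbf{h}\} = 0$, where the facts that the noise process $\mathbf{n}(t)$ is stationary, has a zero mean and $\sum_{u=0}^{4K-1} e^{-j2\pi k u / 4K} = 0$ for $k > 0$ are used. Similarly, the conditional second order statistics of the noise components can be computed as (detailed steps are given in Appendix 8.A):

$$K_{a,b}(\mathbf{x}, \mathbf{h}) \triangleq \mathbb{E}\{Z_a Z_b^*\} \approx M_{rx}\, \delta_{a,b} \left[ (2K-b) N_0^2 + N_0 \beta_{0,0} E_r \right] + \sum_{(k_1,k_2) \in \tilde{\mathcal{A}}(a,b)} M_{rx} N_0\, \beta_{a,b}\, x_{k_1} x_{k_2}^*, \tag{8.9a}$$

$$\tilde{K}_{a,b}(\mathbf{x}, \mathbf{h}) \triangleq \mathbb{E}\{Z_a Z_b\} \approx \sum_{(k_2,k_1) \in \tilde{\mathcal{B}}(a,b)} M_{rx}\, \beta_{k_1,k_2}^*\, x_{k_1}^* x_{k_2} N_0, \tag{8.9b}$$

where $a, b \in \mathcal{K}$, $\beta_{a,b}(\mathbf{h}) \triangleq \sum_\ell |\alpha_\ell|^2 e^{-j2\pi (f_a - f_b) \tau_\ell}$, $\tilde{\mathcal{A}}(a,b) \triangleq \{(k_1,k_2) \mid k_1,k_2 \in \mathcal{K},\ k_1 - k_2 = a - b,\ k_2 \ge b\}$ and $\tilde{\mathcal{B}}(a,b) \triangleq \{(k_1,k_2) \mid k_1,k_2 \in \mathcal{K},\ k_1 - k_2 = a + b - 4K,\ k_2 \ge b\}$. Clearly, from (8.9a)-(8.9b) one can see that the noise components at the OFDM outputs are mutually correlated and are further dependent on the data vector $\mathbf{x}$. For reducing computational complexity, a sub-optimal approach is considered where each sub-carrier data is decoded independently. Under this assumption, the noise variances, averaged over the transmit data vector $\mathbf{x}$, reduce to:⁵

$$K_{k,k}(\mathbf{h}) = M_{rx} \left[ (2K-k) N_0^2 + N_0 \beta_{0,0} \left( E_r + \sum_{k_2=k}^{2K-g-1} E_d^{(k_2)} \right) \right], \tag{8.10a}$$

$$\tilde{K}_{k,k}(\mathbf{h}) = 0, \tag{8.10b}$$

where $k \in \mathcal{K}$ and the fact that the data streams have a zero mean and are mutually independent (see Section 8.1) is used.

⁵ While it is imprecise to take an expectation of $K_{k,k}(\mathbf{x}, \mathbf{h})$ with respect to $x_k$, its impact is very small when $E_d^{(k)} \ll E_r$ (see Section 8.3.1).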
Henceforth, the noise components at the OFDM outputs $\{Z_k \mid k \in \mathcal{K}\}$ shall be approximated to be jointly Gaussian distributed. While this allows finding a lower bound to the system capacity, the accuracy of this assumption is also justified via simulations in Section 8.4.

8.3 Performance Analysis

Using (8.7) and (8.10), the effective SISO channel between the transmit data and the $k$-th demodulation output ($k \in \mathcal{K}$) can be expressed as:

$$Y_k = M_{rx} \sqrt{E_r E_d^{(k)}}\, \beta_{k,0}(\mathbf{h})\, \bar{x}_k + Z_k, \tag{8.11}$$

where $\bar{x}_k \triangleq x_k / \sqrt{E_d^{(k)}}$ and $Z_k \sim \mathcal{CN}(0, K_{k,k}(\mathbf{h}))$. It is assumed that at regular intervals the TX transmits pilot symbols using which the RX can estimate the 'fading coefficients' $\{\beta_{k,0} \mid k \in \mathcal{K}\}$. Similarly, using blanked pilots, i.e., symbols where no signal is transmitted, the RX can also estimate $N_0$. Note that since $\beta_{k,0}$, $N_0$ and $\beta_{0,0}$ are average channel parameters, they change slowly and can be tracked accurately and with low overhead. Henceforth, it shall be assumed that the RX has perfect knowledge of these channel parameters for each channel realization $\mathbf{h}(t)$. These values are further assumed to be fed back to the TX, via a feedback channel, for bit and power allocation. Note that since these pilots are used only to estimate the average channel parameters and not the actual SIMO channel, the advantages of simplified CE are still applicable for the MA-FSR RX.⁶

From the Gaussian assumption for $\{Z_k \mid k \in \mathcal{K}\}$ and from (8.10), note that (8.11) represents a parallel Gaussian channel across the sub-carriers. The effective SNR for the $k$-th demodulation output ($k \in \mathcal{K}$) is then given by:

$$\gamma_k(\mathbf{h}(t)) = \frac{M_{rx} E_r E_d^{(k)} |\beta_{k,0}(\mathbf{h})|^2}{(2K-k) N_0^2 + N_0 \beta_{0,0} \left( E_r + \sum_{k_2=k}^{2K-g-1} E_d^{(k_2)} \right)}. \tag{8.12}$$

Note that even though the RX does not explicitly perform CE, a beamforming gain of $M_{rx}$ is still observed in $\gamma_k(\mathbf{h}(t))$. However, since the different MPCs do not add up in phase at the RX and the noise power varies with $k$, the system suffers from frequency selective fading, which causes some loss in performance.
Similarly, the instantaneous Sum Rate (iSR) is defined as:

$$\mathrm{iSR}(\mathbf{h}(t)) \triangleq \sum_{k \in \mathcal{K}} \frac{1}{T_s} \log\left( 1 + \gamma_k(\mathbf{h}(t)) \right), \tag{8.13}$$

where the cyclic prefix overhead is neglected for convenience.

⁶ While not considered here, differential modulation can also be used on the sub-carriers when even such simplified CE is infeasible.

8.3.1 Power Allocation

Since both the signal and noise variances in (8.11) are affected by the transmit powers in a non-linear way, finding the iSR maximizing power allocation to the data and reference tone is difficult. Therefore the following sub-optimal power allocation shall be relied upon.

Lemma 8.3.1. For any feasible power allocation $\{E_r, E_d^{(k)} \mid k \in \mathcal{K}\}$, we have $\gamma_k(\mathbf{h}(t)) \le 2\tilde{\gamma}_k(\mathbf{h}(t))$ for all $k \in \mathcal{K}$, where $\tilde{\gamma}_k(\mathbf{h}(t))$ is the effective SNR with:

$$\tilde{E}_r = \frac{E_s}{2}, \qquad \tilde{E}_d^{(k)} = \frac{E_d^{(k)} (E_s - \tilde{E}_r)}{E_s - E_r}. \tag{8.14}$$

Proof. Case 1: Let $E_r \ge E_s/2$. Then for any power allocation $\{E_r, E_d^{(k)} \mid k \in \mathcal{K}\}$ and any $k \in \mathcal{K}$ we have:

$$\frac{\tilde{E}_r \tilde{E}_d^{(k)}}{E_r} = \frac{E_s^2 E_d^{(k)}}{4 E_r (E_s - E_r)} \overset{(1)}{\ge} E_d^{(k)}, \quad \text{and} \tag{8.15a}$$

$$\tilde{E}_r + \sum_{k_2=k}^{2K-g-1} \tilde{E}_d^{(k_2)} = \tilde{E}_r + \frac{\sum_{k_2=k}^{2K-g-1} E_d^{(k_2)}}{E_s - E_r} (E_s - \tilde{E}_r) \overset{(2)}{\le} E_r + \sum_{k_2=k}^{2K-g-1} E_d^{(k_2)}, \tag{8.15b}$$

where $\overset{(1)}{\ge}$ follows from the AM-GM inequality, and $\overset{(2)}{\le}$ follows by noting that $\sum_{k_2=k}^{2K-g-1} E_d^{(k_2)} / (E_s - E_r) \le 1$, so that the middle expression is a non-decreasing function of $\tilde{E}_r$, together with $\tilde{E}_r \le E_r$. From (8.15), it is clear that $\gamma_k(\mathbf{h}(t)) \le \tilde{\gamma}_k(\mathbf{h}(t))$.

Case 2: If on the other hand $E_r < E_s/2$, then from (8.12), one can write for any $k \in \mathcal{K}$:

$$2\tilde{\gamma}_k(\mathbf{h}(t)) = \frac{M_{rx} \tilde{E}_r (2\tilde{E}_d^{(k)}) |\beta_{k,0}(\mathbf{h})|^2}{(2K-k) N_0^2 + N_0 \beta_{0,0} \left( \tilde{E}_r + \sum_{k_2=k}^{2K-g-1} \tilde{E}_d^{(k_2)} \right)} \overset{(3)}{\ge} \frac{M_{rx} E_r E_d^{(k)} |\beta_{k,0}(\mathbf{h})|^2}{(2K-k) N_0^2 + N_0 \beta_{0,0} \left( E_r + \sum_{k_2=k}^{2K-g-1} E_d^{(k_2)} \right)} = \gamma_k(\mathbf{h}(t)),$$

where $\overset{(3)}{\ge}$ follows from the fact that $\tilde{E}_r > E_r$ and $2\tilde{E}_d^{(k)} > E_d^{(k)} > \tilde{E}_d^{(k)}$ (see (8.14)). Therefore the lemma follows.

As a consequence of Lemma 8.3.1, using $E_r = E_s/2$ can at worst cause a 3 dB loss in the SNR of the data streams.
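The effective SNR (8.12), the iSR (8.13) and the bound of Lemma 8.3.1 can be explored numerically with the short Python sketch below. It is an illustration only: the function names, the parameter values and the random test allocations are assumptions made for this example, and a base-2 logarithm is assumed in the sum rate (the text does not fix the base). The check draws random feasible allocations and confirms $\gamma_k \le 2\tilde{\gamma}_k$ for the reallocation (8.14).

```python
import math
import random


def effective_snr(k, K, g, m_rx, n0, beta_sq, beta_00, e_r, e_d):
    """Effective SNR of sub-carrier k following (8.12).

    beta_sq[k] is |beta_{k,0}|^2 and e_d[k] the data energy on sub-carrier k.
    """
    tail = sum(e_d[k2] for k2 in range(k, 2 * K - g))
    noise = (2 * K - k) * n0 ** 2 + n0 * beta_00 * (e_r + tail)
    return m_rx * e_r * e_d[k] * beta_sq[k] / noise


def instantaneous_sum_rate(K, g, m_rx, n0, beta_sq, beta_00, e_r, e_d, t_s):
    """Sum rate (8.13); log base 2 is an assumption of this sketch."""
    return sum(math.log2(1.0 + effective_snr(k, K, g, m_rx, n0, beta_sq,
                                             beta_00, e_r, e_d))
               for k in range(K, 2 * K - g)) / t_s


def half_reference_allocation(e_s, e_r, e_d):
    """Reallocation (8.14): half the energy on the reference tone."""
    e_r_new = e_s / 2.0
    scale = (e_s - e_r_new) / (e_s - e_r)
    return e_r_new, {k: e * scale for k, e in e_d.items()}


def lemma_holds(trials=200, seed=0):
    """Randomized check of gamma_k <= 2 * gamma_tilde_k (Lemma 8.3.1)."""
    rng = random.Random(seed)
    K, g, m_rx, n0, beta_00, e_s = 16, 2, 64, 1.0, 1.0, 10.0  # assumed values
    carriers = list(range(K, 2 * K - g))
    for _ in range(trials):
        beta_sq = {k: rng.uniform(0.0, beta_00 ** 2) for k in carriers}
        e_r = rng.uniform(0.01, 0.99) * e_s
        weights = [rng.random() + 1e-9 for _ in carriers]
        total = sum(weights)
        e_d = {k: (e_s - e_r) * w / total for k, w in zip(carriers, weights)}
        e_r_t, e_d_t = half_reference_allocation(e_s, e_r, e_d)
        for k in carriers:
            gamma = effective_snr(k, K, g, m_rx, n0, beta_sq, beta_00,
                                  e_r, e_d)
            gamma_t = effective_snr(k, K, g, m_rx, n0, beta_sq, beta_00,
                                    e_r_t, e_d_t)
            if gamma > 2.0 * gamma_t * (1.0 + 1e-9):
                return False
    return True


if __name__ == "__main__":
    print("Lemma 8.3.1 bound holds on random draws:", lemma_holds())
```

A randomized check of this kind is not a proof; it only illustrates that fixing $E_r = E_s/2$ costs at most a factor of two in SNR relative to any other split of the same total energy.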
⁷ Note that the SNR expression in (8.12) can be approximated as:

$$\hat{\gamma}_k(\mathbf{h}(t)) = \frac{M_{rx} E_r E_d^{(k)} |\beta_{k,0}(\mathbf{h})|^2}{(2K-k) N_0^2 + N_0 \beta_{0,0} \left[ E_r + \frac{2K-g-k}{K-g} (E_s - E_r) \right]},$$

which is obtained by replacing $E_d^{(k_2)}$ in the denominator by $(E_s - E_r)/(K-g)$. Now using $\hat{\gamma}_k(\mathbf{h}(t))$ instead of $\gamma_k(\mathbf{h}(t))$ in (8.13) with $E_r = E_s/2$ and $\sum_k E_d^{(k)} = E_s/2$, a sub-optimal iSR maximizing power allocation for $\{E_d^{(k)} \mid k \in \mathcal{K}\}$ can be obtained by the water-filling algorithm. In fact, it can be shown that this allocation is optimal as $\beta_{0,0}(\mathbf{h}) E_s / N_0 \to 0$.

⁷ In practice, the value of $E_r$ may further be limited by spectral mask regulations, a discussion of which is beyond the scope of this chapter.

8.4 Simulation Results

For simulations a SIMO system is considered, where the RX has a half-wavelength spaced uniform linear array ($M_{rx} = 64$) with one down-conversion chain and is equipped with an MA-FSR RX. The TX transmits OFDM symbols with $T_s = 2\,\mu$s, $T_{cp} = 0.2\,\mu$s, $g = 5$ and $f_c = 30$ GHz. The phase noise at the TX is modeled as a Wiener process with $\mathbb{E}\{|\theta(t + T_s) - \theta(t)|^2\} = \sigma_\theta^2$. A sample channel impulse response $\mathbf{h}(t)$ is considered with: $L = 3$, $\tau_\ell = 50|\ell - 1|$ ns, $\psi_\ell = \pi(\ell - 1)/10$, $\alpha_\ell = (-1)^{\ell+1} \exp\{-|\psi_\ell|/\sigma\}$, $\sigma = \pi/10$ and $[\mathbf{a}_{rx}(\ell)]_i = e^{j\pi(i-1)\sin(\psi_\ell)}$. Perfect timing synchronization and perfect knowledge of $\{\beta_{k,0} \mid k \in \mathcal{K}\}$ are also assumed at the RX. For this $\mathbf{h}(t)$, the Symbol Error Rate (SER) for (8.6), obtained by Monte-Carlo simulations, is compared to the analytical SER for the effective channel (8.11) in Fig. 8.3. The perfect match between the analytical and simulation results validates our analysis and the effective channel model in (8.11). Due to the frequency selective signal and noise powers, one also observes that the SER changes with $k$.
[Figure 8.3: SER for data streams $k = 50, 74, 94$ of an MA-FSR RX with QAM modulation (QPSK and 16-QAM), for $K = 50$, $E_r = E_s/2$, $E_d^{(k)} = E_s/(2|\mathcal{K}|)$ and $g = 5$.]

Next the analytical iSR of MA-FSR (8.13) is compared to the iSR of a coherent receiver with analog beamforming that only occupies half the bandwidth, i.e., $|f| \le K/T_s$, in Fig. 8.4. For analog beamforming, the use of statistical beamforming with perfect channel estimates is assumed at the RX, where the beamformer is $\mathbf{a}_{rx}(1)/|\mathbf{a}_{rx}(1)|$. Furthermore, perfect phase noise cancellation is assumed, and the CE overhead is not included in the iSR. The results show that for $\beta_{0,0} E_s / N_0 \ge 10$ dB, the MA-FSR suffers from a 9 dB reduction in beamforming gain in comparison to analog beamforming. However at lower values, this beamforming loss increases significantly, as is also evident from (8.13). Note that $\beta_{0,0} E_s / N_0 = 10$ dB corresponds to a per sub-carrier SNR of around $-10$ dB without the RX beamforming gain, and thus indeed represents a scenario where the RX beamforming gain is essential. Furthermore, one can observe that the performance of equal data power allocation is comparable to water-filling. However these results depend on $L$. Larger values of $L$ intensify the frequency selective fading of MA-FSR, thereby possibly increasing the SNR loss in Fig. 8.4. Therefore, MA-FSR is more suited to sparse channels with very few MPCs.

[Figure 8.4: Comparison of the iSR (without CE overhead) of MA-FSR and analog beamforming; the plot is annotated with 6 dB and 9 dB gaps. (a) For MA-FSR, $E_r = E_s/2$, $K = 128$ and $g = 5$. (b) For analog beamforming, only the sub-carriers $\{0, \ldots, K-g-1\}$ are used.]

8.5 Improved MA-FSR designs

Note that the MA-FSR RX performance degrades significantly below a certain threshold SNR. This is mainly due to the noise enhancement resulting from the squaring operation, which leads to the large noise-noise cross term in (8.8).
Since the transmit signal is mainly restricted to the frequencies $|f - f_c| \le \frac{1}{T_s}$ and $\frac{K}{T_s} \le |f - f_c| \le \frac{2K}{T_s}$ (ignoring phase noise), the impact of this noise enhancement can be significantly reduced by suppressing the noise at lower frequencies by a factor of $\sqrt{\epsilon}$ using a filter, as illustrated in Fig. 8.5a. Using a similar analysis as presented here, it can be shown that the effective SNR on the $k$-th sub-carrier for this design (with no phase noise) is:

$$\gamma_k^A(\mathbf{h}(t)) = \frac{M_{rx} E_r E_d^{(k)} |\beta_{k,0}(\mathbf{h})|^2}{\left[ (2K-k+1)\epsilon + 1 \right] N_0^2 + N_0 \beta_{0,0} E_{sum}^{(k)}}, \tag{8.16}$$

where $E_{sum}^{(k)} = E_r + E_d^{(k)} + \sum_{k_2=k+1}^{2K-g-1} E_d^{(k_2)}$. While this design reduces the noise enhancement of the RX, note that the receiver still only has a 50% bandwidth efficiency. This efficiency can be boosted to 100% by using an alternate design where the squaring circuit is replaced by a multiplier, with one input being the received signal processed via a narrow bandpass filter that isolates only the reference sub-carrier, as illustrated in Fig. 8.5b. This design, called reference tone aided transmission, shall be analyzed in detail in Chapter 9 (see also [225]). Note that while these designs show significant improvements in performance, implementing such sharp filters at the carrier frequency is difficult and may have to rely on carrier recovery techniques, thus increasing RX complexity.

[Figure 8.5: MA-FSR designs with improved performance. (a) With noise suppression. (b) With narrow band-pass filter.]

8.6 Conclusion

In this chapter a novel non-coherent receiver for massive MIMO systems is proposed that only requires a single down-conversion chain, can support high data rates and can achieve beamforming gains without phase shifters or CE. The MA-FSR RX essentially uses the received signal for a reference tone to combine the received signal corresponding to the data, via a squaring operation at each antenna. The carefully designed sub-carrier allocation prevents inter-carrier interference.
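The analysis suggests that the effective channel between the sub-carrier inputs and the demodulated outputs behaves like a parallel Gaussian channel with frequency selective fading, where the frequency selectivity arises due to the modulating frequency mismatch between the reference and data sub-carriers and due to the varying noise levels. These varying noise levels arise due to the noise enhancement caused by the squaring operation at the RX. The simulation results show that, in comparison to analog beamforming, MA-FSR suffers only a small loss in beamforming gain, as long as the mean received power is above a certain threshold. This threshold behavior is due to the noise enhancement caused by the squaring operation, and several improved FSR designs that can reduce its impact are also proposed. One such design shall be discussed in more detail in Chapter 9.

8.A Appendix A

From (8.8), the conditional cross-covariance between the noise components at the $a$-th and $b$-th sub-carriers can further be computed as:

$$K_{a,b}(\mathbf{x}, \mathbf{h}) \overset{(1)}{=} \sum_{u,v=0}^{4K-1} \frac{T_s^2\, e^{-j2\pi(au - bv)/4K}}{16K^2}\, \mathbb{E}\Bigg\{ \frac{1}{4} \left\| \mathbf{n}\!\left( \tfrac{u T_s}{4K} \right) \right\|^2 \left\| \mathbf{n}\!\left( \tfrac{v T_s}{4K} \right) \right\|^2 + \sum_{\ell_1,\ell_2=0}^{L-1} \mathrm{Re}\left[ \alpha_{\ell_1} s_{BB}\!\left( \tfrac{u T_s}{4K} - \tau_{\ell_1} \right) \mathbf{n}\!\left( \tfrac{u T_s}{4K} \right)^\dagger \mathbf{a}_{rx}(\ell_1)\, e^{-j2\pi f_c \tau_{\ell_1}} \right] \mathrm{Re}\left[ \alpha_{\ell_2} s_{BB}\!\left( \tfrac{v T_s}{4K} - \tau_{\ell_2} \right) \mathbf{n}\!\left( \tfrac{v T_s}{4K} \right)^\dagger \mathbf{a}_{rx}(\ell_2)\, e^{-j2\pi f_c \tau_{\ell_2}} \right] \Bigg\}$$

$$\overset{(2)}{=} \sum_{u,v=0}^{4K-1} \frac{T_s^2\, e^{-j2\pi(au - bv)/4K}}{16K^2}\, \mathbb{E}\Bigg\{ \frac{1}{4} \left\| \mathbf{n}\!\left( \tfrac{u T_s}{4K} \right) \right\|^2 \left\| \mathbf{n}\!\left( \tfrac{v T_s}{4K} \right) \right\|^2 + \sum_{\ell_1,\ell_2=0}^{L-1} \frac{1}{2} \mathrm{Re}\left[ \alpha_{\ell_1} \alpha_{\ell_2}^*\, s_{BB}\!\left( \tfrac{u T_s}{4K} - \tau_{\ell_1} \right) s_{BB}^*\!\left( \tfrac{v T_s}{4K} - \tau_{\ell_2} \right) \mathbf{a}_{rx}(\ell_2)^\dagger \mathbf{n}\!\left( \tfrac{v T_s}{4K} \right) \mathbf{n}\!\left( \tfrac{u T_s}{4K} \right)^\dagger \mathbf{a}_{rx}(\ell_1)\, e^{-j2\pi f_c (\tau_{\ell_1} - \tau_{\ell_2})} \right] \Bigg\}$$

$$\overset{(3)}{=} \sum_{u,v=0}^{4K-1} \frac{T_s^2\, e^{-j2\pi(au - bv)/4K}}{16K^2} \Bigg[ \frac{1}{4} \left( M_{rx}^2 R_n[0]^2 + M_{rx} |R_n[v-u]|^2 \right) + \sum_{\ell_1=0}^{L-1} \frac{1}{2} \mathrm{Re}\left[ |\alpha_{\ell_1}|^2\, s_{BB}\!\left( \tfrac{u T_s}{4K} - \tau_{\ell_1} \right) s_{BB}^*\!\left( \tfrac{v T_s}{4K} - \tau_{\ell_1} \right) M_{rx} R_n[v-u] \right] \Bigg],$$

where $\overset{(1)}{=}$ follows from the fact that $\mathbf{n}(t)$ is zero-mean Gaussian and therefore the odd moments of $\mathbf{n}(t)$ are zero; $\overset{(2)}{=}$ follows by using the identity $\mathrm{Re}\{A\}\mathrm{Re}\{B\} = \frac{1}{2}\mathrm{Re}\{AB + AB^*\}$ for any scalars $A$, $B$, and by ignoring the terms involving the pseudo-covariance of the circularly symmetric Gaussian noise; and $\overset{(3)}{=}$ follows by defining $R_n[u] \triangleq \mathbb{E}\left\{ [\mathbf{n}(t)]_i^* \left[ \mathbf{n}\!\left( t + \tfrac{u T_s}{4K} \right) \right]_i \right\}$ for any $1 \le i \le M_{rx}$, using the results on the expectation of a product of four Gaussian random variables [174], and from the orthogonality of the array response vectors. Defining a new variable $w = v - u$ and using a change of variables, one can approximate $K_{a,b}(\mathbf{x}, \mathbf{h})$ as:

$$K_{a,b}(\mathbf{x}, \mathbf{h}) \overset{(4)}{\approx} \sum_{u=0}^{4K-1} \sum_{w=-u}^{4K-u-1} \frac{M_{rx} T_s^2\, e^{-j2\pi((a-b)u - bw)/4K}}{16K^2} \Bigg[ \frac{|R_n[w]|^2}{4} + \sum_{\ell_1=0}^{L-1} \frac{|\alpha_{\ell_1}|^2}{T_s} \mathrm{Re}\Bigg\{ \Bigg( E_r + \sum_{k_1 \in \mathcal{K}} \sqrt{E_r}\, x_{k_1} e^{j2\pi u k_1/4K} e^{-j2\pi k_1 \tau_{\ell_1}/T_s} + \sum_{k_1 \in \mathcal{K}} \sum_{k_2 \in \mathcal{K}} x_{k_1} x_{k_2}^* e^{j2\pi[(k_1-k_2)u - w k_2]/4K} e^{-j2\pi(k_1-k_2)\tau_{\ell_1}/T_s} + \sum_{k_2 \in \mathcal{K}} \sqrt{E_r}\, x_{k_2}^* e^{-j2\pi(w+u)k_2/4K} e^{j2\pi k_2 \tau_{\ell_1}/T_s} \Bigg) R_n[w] \Bigg\} \Bigg]$$

$$\overset{(5)}{\approx} \sum_{u=0}^{4K-1} \sum_{w=-\infty}^{\infty} \frac{M_{rx} T_s^2\, e^{-j2\pi((a-b)u - bw)/4K}}{16K^2} \Bigg[ \frac{1}{4} |R_n[w]|^2 + \frac{1}{T_s} \mathrm{Re}\Bigg\{ \Bigg( \beta_{0,0} E_r + \sum_{k_1 \in \mathcal{K}} \beta_{k_1,0} \sqrt{E_r}\, x_{k_1} e^{j2\pi u k_1/4K} + \sum_{k_1 \in \mathcal{K}} \sum_{k_2 \in \mathcal{K}} x_{k_1} x_{k_2}^* e^{j2\pi[u(k_1-k_2) - w k_2]/4K} \beta_{k_1,k_2} + \sum_{k_2 \in \mathcal{K}} \beta_{0,k_2} \sqrt{E_r}\, x_{k_2}^* e^{-j2\pi(w+u)k_2/4K} \Bigg) R_n[w] \Bigg\} \Bigg],$$

where $\overset{(4)}{\approx}$ follows by assuming that the phase noise $e^{j\theta(t)}$ is constant within the support of the noise auto-correlation function $R_n[w]$, and $\overset{(5)}{\approx}$ follows by changing the summation limits, since $R_n[w]$ has a very narrow support of around $O(1)$, and by defining $\beta_{a,b} \triangleq \sum_{\ell=0}^{L-1} |\alpha_\ell|^2 e^{-j2\pi(f_a - f_b)\tau_\ell}$. Note that (4) is accurate as long as the system bandwidth is much larger than the phase noise bandwidth [226]. Now taking a summation over $u$, one obtains:

$$K_{a,b}(\mathbf{x}, \mathbf{h}) = \sum_{w=-\infty}^{\infty} \frac{M_{rx} T_s\, e^{j2\pi bw/4K}}{8K} \Bigg[ \frac{T_s}{2} \delta_{a,b} |R_n[w]|^2 + \Bigg( \beta_{0,0} \delta_{a,b} E_r + \sum_{(k_1,k_2) \in \mathcal{A}(a,b)} \beta_{k_1,k_2}\, x_{k_1} x_{k_2}^* e^{-j2\pi w k_2/4K} \Bigg) R_n[w] + \Bigg( \delta_{a,b} \beta_{0,0} E_r + \sum_{(k_2,k_1) \in \mathcal{A}(a,b)} \beta_{k_1,k_2}^*\, x_{k_1}^* x_{k_2} e^{j2\pi w k_2/4K} \Bigg) R_n^*[w] \Bigg],$$

where we define $\mathcal{A}(a,b) \triangleq \{(k_1,k_2) \mid k_1,k_2 \in \mathcal{K},\ k_1 - k_2 = a - b\}$ and the remaining terms vanish since $|a - b| \le K - 1 < k_1, k_2 \le 2K - g - 1$. Note that the sampled noise auto-correlation function can be expressed in terms of the power spectral density as $R_n[w] = \int_{-\infty}^{\infty} S_n(f)\, e^{j2\pi f w T_s/4K}\, df$.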
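Thus, substituting $R_n[w]$ we have:

$$K_{a,b}(\mathbf{x}, \mathbf{h}) \overset{(6)}{=} \frac{M_{rx}}{2} \Bigg[ \frac{T_s}{2} \delta_{a,b} \int_{-\infty}^{\infty} S_n(f_1)\, S_n\!\left( f_1 + \tfrac{b}{T_s} \right) df_1 + \delta_{a,b} \beta_{0,0} E_r\, S_n\!\left( -\tfrac{b}{T_s} \right) + \sum_{(k_1,k_2) \in \mathcal{A}(a,b)} \beta_{k_1,k_2}\, x_{k_1} x_{k_2}^*\, S_n\!\left( \tfrac{k_2 - b}{T_s} \right) + \delta_{a,b} \beta_{0,0} E_r\, S_n\!\left( \tfrac{b}{T_s} \right) + \sum_{(k_2,k_1) \in \mathcal{A}(a,b)} \beta_{k_1,k_2}^*\, x_{k_1}^* x_{k_2}\, S_n\!\left( \tfrac{k_2 + b}{T_s} \right) \Bigg]$$

$$= M_{rx} \delta_{a,b} (2K - b) N_0^2 + M_{rx} \delta_{a,b} N_0 \beta_{0,0} E_r + \sum_{(k_1,k_2) \in \tilde{\mathcal{A}}(a,b)} M_{rx} N_0\, \beta_{a,b}\, x_{k_1} x_{k_2}^*, \tag{8.17}$$

where $\overset{(6)}{=}$ follows from the identity $\sum_{w=-\infty}^{\infty} e^{j2\pi f w T} = \frac{1}{T} \sum_{g=-\infty}^{\infty} \delta(f - g/T)$ and using the fact that $S_n(f)$ is non-zero only in the range $0 \le f \le 2K/T_s$, and we define $\tilde{\mathcal{A}}(a,b) \triangleq \{(k_1,k_2) \in \mathcal{A}(a,b) \mid k_2 \ge b\}$. Using a similar sequence of steps the noise pseudo-covariance can be computed as:

$$\tilde{K}_{a,b}(\mathbf{x}, \mathbf{h}) = \sum_{u,v=0}^{4K-1} \frac{T_s^2\, e^{-j2\pi(au + bv)/4K}}{16K^2} \Bigg[ \frac{1}{4} \left( M_{rx}^2 R_n[0]^2 + M_{rx} |R_n[v-u]|^2 \right) + \sum_{\ell_1=0}^{L-1} \frac{1}{2} \mathrm{Re}\left[ |\alpha_{\ell_1}|^2\, s_{BB}\!\left( \tfrac{u T_s}{4K} - \tau_{\ell_1} \right) s_{BB}^*\!\left( \tfrac{v T_s}{4K} - \tau_{\ell_1} \right) M_{rx} R_n[v-u] \right] \Bigg]$$

$$\approx \sum_{u=0}^{4K-1} \sum_{w=-\infty}^{\infty} \frac{M_{rx} T_s^2\, e^{-j2\pi((a+b)u + bw)/4K}}{16K^2} \Bigg[ \frac{1}{4} |R_n[w]|^2 + \frac{1}{T_s} \mathrm{Re}\Bigg\{ \Bigg( \beta_{0,0} E_r + \sum_{k_1=K}^{2K-1} \beta_{k_1,0} \sqrt{E_r}\, x_{k_1} e^{j2\pi u k_1/4K} + \sum_{k_1,k_2=K}^{2K-1} x_{k_1} x_{k_2}^* e^{j2\pi[u(k_1-k_2) - w k_2]/4K} \beta_{k_1,k_2} + \sum_{k_2=K}^{2K-1} \beta_{0,k_2} \sqrt{E_r}\, x_{k_2}^* e^{-j2\pi(w+u)k_2/4K} \Bigg) R_n[w] \Bigg\} \Bigg]$$

$$= \sum_{w=-\infty}^{\infty} \frac{M_{rx} T_s\, e^{-j2\pi bw/4K}}{8K} \Bigg[ \sum_{k_1 \in \mathcal{C}(a,b)} \beta_{k_1,0}^* \sqrt{E_r}\, x_{k_1}^* R_n^*[w] + \sum_{(k_1,k_2) \in \mathcal{B}(a,b)} \beta_{k_1,k_2}\, x_{k_1} x_{k_2}^* e^{-j2\pi w k_2/4K} R_n[w] + \sum_{(k_2,k_1) \in \mathcal{B}(a,b)} \beta_{k_1,k_2}^*\, x_{k_1}^* x_{k_2} e^{j2\pi w k_2/4K} R_n^*[w] + \sum_{k_2 \in \mathcal{C}(a,b)} \beta_{0,k_2} \sqrt{E_r}\, x_{k_2}^* e^{-j2\pi w k_2/4K} R_n[w] \Bigg]$$

$$= \frac{M_{rx}}{2} \Bigg[ \sum_{k_1 \in \mathcal{C}(a,b)} \beta_{k_1,0}^* \sqrt{E_r}\, x_{k_1}^*\, S_n\!\left( -\tfrac{b}{T_s} \right) + \sum_{(k_1,k_2) \in \mathcal{B}(a,b)} \beta_{k_1,k_2}\, x_{k_1} x_{k_2}^*\, S_n\!\left( \tfrac{k_2 + b}{T_s} \right) + \sum_{(k_2,k_1) \in \mathcal{B}(a,b)} \beta_{k_1,k_2}^*\, x_{k_1}^* x_{k_2}\, S_n\!\left( \tfrac{k_2 - b}{T_s} \right) + \sum_{k_2 \in \mathcal{C}(a,b)} \beta_{0,k_2} \sqrt{E_r}\, x_{k_2}^*\, S_n\!\left( \tfrac{k_2 + b}{T_s} \right) \Bigg]$$

$$= \sum_{(k_2,k_1) \in \tilde{\mathcal{B}}(a,b)} M_{rx}\, \beta_{k_1,k_2}^*\, x_{k_1}^* x_{k_2} N_0, \tag{8.18}$$

where we define $\mathcal{B}(a,b) = \{(k_1,k_2) \mid K \le k_1,k_2 \le 2K-1,\ k_1 - k_2 = a + b - 4K\}$, $\mathcal{C}(a,b) = \{k \mid K \le k \le 2K-1,\ a + b + k = 4K\}$ and $\tilde{\mathcal{B}}(a,b) \triangleq \{(k_1,k_2) \mid k_1,k_2 \in \mathcal{K},\ k_1 - k_2 = a + b - 4K,\ k_2 \ge b\}$.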
Chapter 9
Reference Tone Aided Transmission

As discussed in Section 1.1, a major challenge for coherent reduced complexity transceivers is the acquisition of the CSI required for beamforming. While several fast CE approaches have been proposed that either rely on aCSI, the sparse nature of channels, or perform beam-tracking, they suffer from several deficiencies (see Sections 1.1-1.2 for a more detailed discussion). One solution to perform beamforming without the need for CSI is via non-coherent reception, and one such novel non-coherent RX architecture, MA-FSR, was proposed in Chapter 8. While MA-FSR suffers only a 6 to 9 dB reduction in beamforming gain in comparison to analog beamforming¹ with perfect CSI (at high SNR), it only utilizes 50% of the spectral resources for data transmission. This is due to the carefully designed sub-carrier allocation that prevents inter-carrier interference among the different data sub-carriers after the squaring operation at the RX. One way to prevent the inter-carrier interference without such a spectrally inefficient sub-carrier allocation is by replacing the squaring operation at the RX by a multiplication operation, and by ensuring that one of the multiplication inputs does not contain any data sub-carriers. In this chapter such a transmission scheme, called Reference Tone Aided Transmission (RTAT), is proposed and analyzed.

In RTAT, the TX transmits a reference tone at a known frequency, along with the data. At each RX antenna, the received waveform corresponding to the reference signal is recovered and multiplied with the received waveform corresponding to the data signals, in the analog domain. This results in a low-pass signal with the inter-antenna phase shift compensated. The outputs from each antenna are then summed up, low-pass filtered and sampled for data demodulation. This operation emulates MRC with imperfect CSI at the RX.
In the architecture considered here, the RTAT RX uses only one down-conversion chain and can perform beamforming without requiring analog phase shifters - which may suer large insertion losses [227], and without explicit CE, thereby obviating the need for aCSI acquisition at the RX. It also exploits power from all the channel MPCs and is therefore resilient to shadow fading and blocking of MPCs. While not explored here, the received signals from the reference tone can also be used for transmit beamforming on the reverse link. On the ip side, the RTAT transceivers, in their current suggested form, do not support multiple spatial data streams and can only be used for beamforming at one end of a communication link. They are therefore more 1 Analog beamforming is a special case of hybrid beamforming where the transceiver has only one RF chain (Krx = 1). 109 suited for use at the UEs. While no assumptions are made about the requirement for coherent demodulation, carrier recovery may be essential in a practical implementation of RTAT. Prior to the growth of digital hardware and digital processing capabilities, some legacy com- munication systems used a similar approach for inter-antenna phase compensation, referred to as adaptive phased array antennas [228{231]. However such legacy systems were mainly pro- posed for space communication (i.e., for single path channels) with single carrier transmission, and they did not exploit the amplitude of the carrier signal or the properties of massive antenna arrays and terrestrial mm-wave channels. In contrast, RTAT exploits both the signal phase and amplitude information of the reference tone and shows that sucient RX beamforming gain can be achieved even in sparse multi-path channels. The contributions of this chapter are: 1. A novel transmission scheme for low-complexity massive MIMO systems that does not require phase-shifters or explicit CE at the RX, named RTAT, is proposed. 2. 
An RX architecture for the RTAT scheme is proposed, and its achievable throughput in a frequency-selective fading channel is characterized for a single spatial data-stream.
3. A near-optimal power allocation for the data streams is derived and an IA procedure for RTAT is proposed.
4. Simulations compare the performance of the proposed RTAT scheme to hybrid beamforming.

Notation: scalars are represented by light-case letters; vectors by bold-case letters; and sets by calligraphic letters. Additionally, $j = \sqrt{-1}$, $\mathbb{E}\{\cdot\}$ represents the expectation operator, $c^*$ is the complex conjugate of a complex scalar $c$, $\mathbf{c}^\dagger$ is the Hermitian transpose of a complex vector $\mathbf{c}$, and $\mathrm{Re}\{\cdot\}$/$\mathrm{Im}\{\cdot\}$ refer to the real/imaginary component, respectively.

9.1 General Assumptions and System Model

A point-to-point SIMO system is considered, where the TX has a single antenna and the RX has $M_{\rm rx} \gg 1$ antennas. Note that even a point-to-point MIMO system where the TX transmits a single spatial data stream can be modeled as an equivalent SIMO system. The applicability of this model to the downlink of a cellular network is exemplified later in Section 9.4. The TX and the RX are each assumed to have one up/down-conversion chain. The TX transmits OFDM symbols with $2K + 1$ sub-carriers, indexed as $\{-K, -K+1, \ldots, K-1, K\}$. A reference tone, i.e., a pure sinusoidal signal with a pre-determined frequency known both to the TX and RX, is transmitted on the 0-th sub-carrier. The system also transmits data on the $K - g$ lower and higher sub-carriers, represented by the index set $\mathcal{K} = \{-K, \ldots, -g-1, g+1, \ldots, K\}$. The sub-carriers $\{-g, \ldots, g\}$ are blanked, to act as a guard band between the reference and data sub-carriers, as illustrated in Fig. 9.1. The value of $g$ is determined by the carrier recovery process at the RX, as shall be discussed later. Unlike in Chapter 8, the phase noise at the TX is neglected for convenience.
Assuming no phase noise at the TX, the complex equivalent transmit signal for the 0-th symbol can then be expressed as:

$$ s_{\rm tx}(t) = \sqrt{\frac{2}{T_{\rm s}}} \Big[ \sqrt{E_{\rm r}} + \sum_{k \in \mathcal{K}} x_k e^{j 2\pi f_k t} \Big] e^{j 2\pi f_{\rm c} t} \tag{9.1} $$

for $-T_{\rm cp} \le t \le T_{\rm s}$, where $E_{\rm r}$ is the energy allocated to the reference tone, $x_k$ is the data signal at the $k$-th OFDM sub-carrier, $f_{\rm c}$ is the carrier frequency, $f_k = k/T_{\rm s}$ represents the frequency offset of the $k$-th sub-carrier from the reference signal, and $T_{\rm s}$, $T_{\rm cp}$ are the symbol duration and the cyclic prefix duration, respectively. Here the complex equivalent signal is defined such that the actual (real) transmit signal is given by $\mathrm{Re}\{s_{\rm tx}(t)\}$. For convenience, it is also assumed that $f_{\rm c}$ is a multiple of both $1/T_{\rm s}$ and $1/(T_{\rm cp} + T_{\rm s})$, which ensures that the reference tone has no phase transitions between symbols. Furthermore, it is assumed that the data vector $\mathbf{x}$ has independent zero-mean entries. The total average transmit symbol energy is given by $E_{\rm s} = E_{\rm r} + \sum_{k \in \mathcal{K}} E_{\rm d}^{(k)}$, where $E_{\rm d}^{(k)} = \mathbb{E}\{|x_k|^2\}$ is the energy allocated to the $k$-th sub-carrier (footnote 2).

Figure 9.1: Power spectral density of an RTAT transmit signal.

A wide-band channel with frequency-selective fading and $L \ll M_{\rm rx}$ scatterers is assumed, with the $M_{\rm rx} \times 1$ channel impulse response vector given as [81]:

$$ \mathbf{h}(t) = \sum_{\ell=0}^{L-1} \alpha_\ell \, \mathbf{a}_{\rm rx}(\ell) \, \delta(t - \tau_\ell) \tag{9.2} $$

where $\mathbf{a}_{\rm rx}(\ell)$ is the RX array response vector, $\alpha_\ell$ is the complex gain and $\tau_\ell$ is the delay, respectively, of the $\ell$-th scatterer, and $\delta(t)$ is the Dirac delta function. As an illustration, the array response vector for a $\lambda/2$-spaced uniform linear array is given by $[\mathbf{a}_{\rm rx}(\ell)]_i = e^{j\pi(i-1)\sin(\theta_\ell)}$, where $\lambda$ is the wavelength of the carrier signal and $\theta_\ell$ is the angle of arrival of the $\ell$-th MPC. Note that here the system bandwidth is implicitly assumed to be small enough to ignore beam-squinting effects. To prevent inter-symbol interference, the cyclic prefix is assumed to be longer than the maximum channel delay: $T_{\rm cp} > \tau_{L-1}$.
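The array response of the half-wavelength-spaced uniform linear array used as an illustration above can be sketched as follows. The array size and the two angles of arrival are hypothetical, and the helper names `ula_response` and `inner` are introduced here only for illustration.

```python
import cmath
import math

def ula_response(num_antennas, angle_rad):
    """[a]_i = exp(j*pi*(i-1)*sin(angle)), i = 1..M, for half-wavelength spacing."""
    return [cmath.exp(1j * math.pi * i * math.sin(angle_rad))
            for i in range(num_antennas)]

def inner(a, b):
    """Hermitian inner product a^H b."""
    return sum(x.conjugate() * y for x, y in zip(a, b))

M_rx = 32                                    # hypothetical array size
a0 = ula_response(M_rx, math.radians(0.0))   # hypothetical angle of arrival
a1 = ula_response(M_rx, math.radians(20.0))  # hypothetical angle of arrival

self_gain = inner(a0, a0)          # equals M_rx, since every entry has unit modulus
cross_gain = abs(inner(a0, a1))    # small relative to M_rx for well-separated angles
```

For these two angles the cross term is roughly 2, compared with a self term of 32, which gives a feel for how a large array separates distinct multi-path components.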
For ease of analysis, the array response vectors for the scatterers are assumed to be mutually orthogonal, i.e., $\mathbf{a}_{\rm rx}(\ell_1)^\dagger \mathbf{a}_{\rm rx}(\ell_2) = M_{\rm rx}$ if $\ell_1 = \ell_2$ and 0 otherwise. This assumption is reasonable if the scatterers are well separated and $M_{\rm rx} \gg L$ [38]. To model a generic time-varying channel, it is assumed that the MPC parameters $\{\alpha_\ell, \mathbf{a}_{\rm rx}(\ell), \tau_\ell\}$ remain constant for at least a coherence time interval $T_{\rm coh}$, and may or may not change afterwards. Ignoring any RX non-linear effects, the $M_{\rm rx} \times 1$ complex equivalent received waveform for $0 \le t \le T_{\rm s}$ can be expressed as:

$$ \mathbf{s}_{\rm rx}(t) = \sum_{\ell=0}^{L-1} \alpha_\ell \, \mathbf{a}_{\rm rx}(\ell) \, s_{\rm BB}(t - \tau_\ell) \, e^{j 2\pi f_{\rm c}(t - \tau_\ell)} + \mathbf{n}(t), \tag{9.3} $$

where $s_{\rm BB}(t) \triangleq s_{\rm tx}(t) e^{-j 2\pi f_{\rm c} t}$ is the base-band transmit signal, and $\mathbf{n}(t)$ is the $M_{\rm rx} \times 1$ stationary, complex additive Gaussian noise process vector, with individual entries being circularly symmetric, i.i.d., and having a power spectral density $S_{\rm n}(f) = 2N_0$ for all $f \ge 0$.

Footnote 2: For ease of notation, the additional energy per symbol required to transmit the cyclic prefix, viz., $E_{\rm s} T_{\rm cp}/T_{\rm s}$, is neglected.

Figure 9.2: Block diagram for a multi-antenna RTAT receiver and the frequency response of the narrow band-pass filter. (a) Block diagram. (b) nBPF frequency response.

Using a Fourier series expansion, the noise for $0 \le t \le T_{\rm s}$ can be expressed as:

$$ \mathbf{n}(t) = \sum_{k=0}^{\infty} \tilde{\mathbf{N}}[k] e^{j 2\pi f_k t} = \sum_{k=-f_{\rm c} T_{\rm s}}^{\infty} \mathbf{N}[k] e^{j 2\pi f_k t} e^{j 2\pi f_{\rm c} t}, \tag{9.4} $$

where $f_k = k/T_{\rm s}$ as before, $\tilde{\mathbf{N}}[k]$ is an $M_{\rm rx} \times 1$ vector with i.i.d. $\mathcal{CN}(0, 2N_0/T_{\rm s})$ entries and we define $\mathbf{N}[k] = \tilde{\mathbf{N}}[k + f_{\rm c} T_{\rm s}]$. Furthermore, $\tilde{\mathbf{N}}[\dot{k}]$ and $\tilde{\mathbf{N}}[\ddot{k}]$ are mutually independent for $\dot{k} \ne \ddot{k}$. This expansion shall be helpful in the forthcoming analysis.

The received signal $s_{{\rm rx},m}(t)$ at each antenna $m$ is then passed through a Low Noise Amplifier (LNA) and multiplied by a narrow band-pass filtered version of itself, as depicted in Fig. 9.2a. The frequency response of the narrow Band Pass Filter (nBPF) used is as depicted in Fig.
9.2b, and it essentially recovers the reference tone. The $M_{\rm rx} \times 1$ complex equivalent output from the nBPFs at the $M_{\rm rx}$ antennas for $0 \le t \le T_{\rm s}$ can be expressed as:

$$ \hat{\mathbf{s}}_{\rm rx}(t) = \sum_{\ell=0}^{L-1} \sqrt{\frac{2 E_{\rm r}}{T_{\rm s}}} \, \alpha_\ell \, \mathbf{a}_{\rm rx}(\ell) \, e^{j 2\pi f_{\rm c}(t - \tau_\ell)} + \hat{\mathbf{n}}(t), \tag{9.5} $$

where $\hat{\mathbf{n}}(t) = \sum_{k=-g}^{g} \hat{\mathbf{N}}[k] e^{j 2\pi (f_{\rm c} + f_k) t}$ with $\hat{\mathbf{N}}[k] = \sqrt{\epsilon}\, \mathbf{N}[k]$ for $k \ne 0$ and $\hat{\mathbf{N}}[0] = \mathbf{N}[0]$. By using a linear approximation for the output phase noise of a Phase Locked Loop (PLL) [232–234], it can be shown that such a sharp nBPF can be closely approximated by a PLL followed by a variable gain amplifier, as depicted in Fig. 9.3. In such an implementation, the noise in (9.5) represents the PLL phase noise, $f_g$ is the PLL noise bandwidth and $g$ can be as low as 6 [235]. For brevity, the availability of such an nBPF block at each antenna shall be assumed, and implementation details shall be discussed in future work (see Chapter 10 for some related implementations).

The received signal $\mathrm{Re}\{s_{{\rm rx},m}(t)\}$ at each antenna $m$ is then mixed with the recovered nBPF output $\mathrm{Re}\{\hat{\mathbf{s}}_{\rm rx}(t)\}$ and its quadrature counterpart $\mathrm{Im}\{\hat{\mathbf{s}}_{\rm rx}(t)\}$, as depicted in Fig. 9.2a.

Figure 9.3: An example implementation of the nBPF filter, where DC represents a block that estimates the DC component of its input.

The
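mixed outputs are then summed up and low-pass filtered (with cut-off frequency $f_K$) to obtain, for $0 \le t \le T_{\rm s}$:

$$
\begin{aligned}
w_{\rm LPF}(t) &= {\rm LPF}\big\{ \hat{\mathbf{s}}_{\rm rx}^\dagger(t) \, \mathrm{Re}\{\mathbf{s}_{\rm rx}(t)\} \big\} \\
&= \sum_{\ell=0}^{L-1} \Bigg[ \frac{\sqrt{E_{\rm r}}}{T_{\rm s}} M_{\rm rx} |\alpha_\ell|^2 \Big( \sqrt{E_{\rm r}} + \sum_{k \in \mathcal{K}} x_k e^{j 2\pi f_k (t - \tau_\ell)} \Big)
+ \sum_{k=-K}^{K} \sqrt{\frac{E_{\rm r}}{2 T_{\rm s}}} \, \alpha_\ell^* e^{j 2\pi f_{\rm c} \tau_\ell} \mathbf{a}_{\rm rx}(\ell)^\dagger \mathbf{N}[k] e^{j 2\pi f_k t} \\
&\qquad + \sum_{\dot{k}=-g}^{g} \Bigg( \sqrt{\frac{E_{\rm r}}{2 T_{\rm s}}} \, \alpha_\ell e^{-j 2\pi f_{\rm c} \tau_\ell} \hat{\mathbf{N}}[\dot{k}]^\dagger \mathbf{a}_{\rm rx}(\ell) e^{-j 2\pi f_{\dot{k}} t}
+ \sum_{k=\dot{k}-K}^{\dot{k}+K} \frac{\alpha_\ell e^{-j 2\pi (f_{\rm c} + f_k) \tau_\ell}}{\sqrt{2 T_{\rm s}}} x_k e^{j 2\pi (f_k - f_{\dot{k}}) t} \hat{\mathbf{N}}[\dot{k}]^\dagger \mathbf{a}_{\rm rx}(\ell) \Bigg) \Bigg] \\
&\quad + \sum_{\dot{k}=-g}^{g} \sum_{k=\dot{k}-K}^{\dot{k}+K} \frac{e^{j 2\pi (f_k - f_{\dot{k}}) t}}{2} \hat{\mathbf{N}}[\dot{k}]^\dagger \mathbf{N}[k],
\end{aligned}
\tag{9.6}
$$

where the outputs corresponding to the in-phase and quadrature-phase components of $\hat{\mathbf{s}}_{\rm rx}(t)$ are represented as the Re and Im components of $w_{\rm LPF}(t)$, we set $x_k = 0$ for $k \notin \mathcal{K}$, the orthogonality assumption for the array response vectors is used, and the Fourier series expansion of the noise terms is used. Finally, $w_{\rm LPF}(t)$ is sampled by an ADC at a sampling rate of $(2K+1)/T_{\rm s}$ samples/sec and conventional OFDM demodulation follows (footnote 3). As shall be shown in the following sections, $w_{\rm LPF}(t)$ attains a good SNR, and so perfect timing synchronization between the TX and RX shall henceforth be assumed.

Footnote 3: Since the reference tone-reference tone product in (9.6) produces a DC output, using a DC shift can negate its impact on the ADC dynamic range.

9.2 Analysis of the demodulation outputs

The OFDM demodulated output for the $k$-th sub-carrier, for any $k \in \mathcal{K}$, can be expressed as:

$$ Y_k = \frac{T_{\rm s}}{2K+1} \sum_{u=0}^{2K} w_{\rm LPF}\Big( \frac{u T_{\rm s}}{2K+1} \Big) e^{-j \frac{2\pi k u}{2K+1}}. \tag{9.7} $$

For convenience, $Y_k$ shall be split as $Y_k = S_k + Z_k$, where $S_k$, referred to as the signal component, involves the terms in (9.7) not containing the channel noise, and $Z_k$, referred to as the noise component, contains the remaining terms. For simplicity, a sub-optimal demodulation approach is considered where $Z_k$ is treated as noise.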
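9.2.1 Signal component analysis

From (9.6)–(9.7), the signal component of the OFDM demodulated output $Y_k$, for any $k \in \mathcal{K}$, can be expressed as:

$$
\begin{aligned}
S_k &= \sum_{u=0}^{2K} \sum_{\ell=0}^{L-1} \sum_{\dot{k} \in \mathcal{K}} \frac{\sqrt{E_{\rm r}} M_{\rm rx} |\alpha_\ell|^2}{2K+1} x_{\dot{k}} \, e^{j 2\pi f_{\dot{k}} \left( \frac{u T_{\rm s}}{2K+1} - \tau_\ell \right)} e^{-j \frac{2\pi k u}{2K+1}} \\
&= M_{\rm rx} \sqrt{E_{\rm r}} \sum_{\ell=0}^{L-1} |\alpha_\ell|^2 e^{-j 2\pi f_k \tau_\ell} x_k.
\end{aligned}
\tag{9.8}
$$

Essentially, the RX utilizes the received signal vector corresponding to the reference tone as weights to combine the received signal vector corresponding to the data sub-carriers, i.e., it emulates MRC with imperfect CSI at the RX. Since this operation takes place in the RF domain, the proposed RX does not rely on explicit CE or phase-shifters for combining the received signals. However, as is evident from (9.8), the signals from the $L$ multi-path components do not add up in-phase at the demodulation outputs. This is because the reference tone and the $k$-th data stream pass through slightly different channels, owing to the difference in their modulating frequencies. This leads to some amount of frequency-selective fading, the impact of which is quantified in Sections 9.3 and 9.5.

9.2.2 Noise component analysis

From (9.6)–(9.7), the noise component of the OFDM demodulation output $Y_k$, for any $k \in \mathcal{K}$, can be expressed as:

$$
\begin{aligned}
Z_k &\overset{(1)}{=} \sum_{u=0}^{2K} \Bigg[ \sum_{\ell=0}^{L-1} \Bigg( \sqrt{\frac{E_{\rm r}}{2 T_{\rm s}}} \, \alpha_\ell^* e^{j 2\pi f_{\rm c} \tau_\ell} \mathbf{a}_{\rm rx}(\ell)^\dagger \mathbf{N}[k] e^{j \frac{2\pi k u}{2K+1}}
+ \sum_{\dot{k}=-g}^{g} \frac{\alpha_\ell e^{-j 2\pi (f_{\rm c} + f_{\dot{k}} + f_k) \tau_\ell}}{\sqrt{2 T_{\rm s}}} x_{k+\dot{k}} e^{j \frac{2\pi k u}{2K+1}} \hat{\mathbf{N}}[\dot{k}]^\dagger \mathbf{a}_{\rm rx}(\ell) \Bigg) \\
&\qquad\qquad + \sum_{\dot{k}=-g}^{g} \frac{e^{j \frac{2\pi k u}{2K+1}}}{2} \hat{\mathbf{N}}[\dot{k}]^\dagger \mathbf{N}[\dot{k}+k] \Bigg] \frac{T_{\rm s} \, e^{-j \frac{2\pi k u}{2K+1}}}{2K+1} \\
&= \sum_{\ell=0}^{L-1} \Bigg( \sqrt{\frac{E_{\rm r} T_{\rm s}}{2}} \, \alpha_\ell^* e^{j 2\pi f_{\rm c} \tau_\ell} \mathbf{a}_{\rm rx}(\ell)^\dagger \mathbf{N}[k]
+ \sum_{\dot{k}=-g}^{g} \frac{\alpha_\ell \sqrt{T_{\rm s}} \, e^{-j 2\pi (f_{\rm c} + f_{\dot{k}} + f_k) \tau_\ell}}{\sqrt{2}} x_{k+\dot{k}} \hat{\mathbf{N}}[\dot{k}]^\dagger \mathbf{a}_{\rm rx}(\ell) \Bigg)
+ \sum_{\dot{k}=-g}^{g} \frac{T_{\rm s}}{2} \hat{\mathbf{N}}[\dot{k}]^\dagger \mathbf{N}[\dot{k}+k],
\end{aligned}
\tag{9.9}
$$

where $x_k = 0$ for $k \notin \mathcal{K}$, and $\overset{(1)}{=}$ follows by observing that the other terms in (9.6) do not contribute to the $k$-th demodulation output. As is evident, the noise consists of both noise-noise and data-noise cross product terms.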
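The signal-component result (9.8) can be checked numerically with a small sketch that samples the noise-free part of (9.6), applies the discrete sum of (9.7), and compares the outcome with the closed form. All numerical values (sub-carrier count, guard size, path powers and delays, data symbols) are hypothetical toy values, and the function names are introduced here only for illustration.

```python
import cmath
import math

def noise_free_w_lpf(t, Ts, Er, M_rx, paths, data):
    """Signal term of (9.6); paths = [(|alpha|^2, tau)], data = {k: x_k}."""
    total = 0j
    for gain_sq, tau in paths:
        tones = sum(x * cmath.exp(2j * math.pi * (k / Ts) * (t - tau))
                    for k, x in data.items())
        total += math.sqrt(Er) / Ts * M_rx * gain_sq * (math.sqrt(Er) + tones)
    return total

def ofdm_demodulate(samples, k, Ts):
    """Discrete sum of (9.7) over the 2K+1 samples of one symbol."""
    n = len(samples)
    return Ts / n * sum(w * cmath.exp(-2j * math.pi * k * u / n)
                        for u, w in enumerate(samples))

def signal_component(k, Ts, Er, M_rx, paths, data):
    """Closed form (9.8): M_rx * sqrt(E_r) * sum_l |alpha_l|^2 e^{-j2 pi f_k tau_l} x_k."""
    beta_k0 = sum(g2 * cmath.exp(-2j * math.pi * (k / Ts) * tau) for g2, tau in paths)
    return M_rx * math.sqrt(Er) * beta_k0 * data[k]

# Hypothetical toy parameters
K, g, Ts, Er, M_rx = 8, 2, 2e-6, 4.0, 32
paths = [(1.0, 0.0), (0.5, 50e-9), (0.25, 100e-9)]
data = {k: complex(1 if k % 2 else -1, 1 if k > 0 else -1)
        for k in range(-K, K + 1) if abs(k) > g}

samples = [noise_free_w_lpf(u * Ts / (2 * K + 1), Ts, Er, M_rx, paths, data)
           for u in range(2 * K + 1)]
k = 5
Y_k = ofdm_demodulate(samples, k, Ts)          # demodulated output, no noise
S_k = signal_component(k, Ts, Er, M_rx, paths, data)
guard_output = ofdm_demodulate(samples, 1, Ts)  # blanked guard sub-carrier
```

In this noise-free setting the demodulated output matches the closed form, and the blanked guard sub-carrier receives no signal energy, since the reference-reference product only appears at DC.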
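Given the transmit data vector $\mathbf{x}$ and channel impulse response $\mathbf{h}(t)$, the conditional mean of the noise components can be computed as:

$$ \mathbb{E}\{Z_k \,|\, \mathbf{x}, \mathbf{h}\} = 0, \quad k \in \mathcal{K}, $$

which follows from (9.4). Similarly, the conditional second-order statistics of the noise components can be expressed as (detailed steps are given in Appendix 9.A):

$$
\begin{aligned}
K_{a,b}(\mathbf{x}, \mathbf{h}) \triangleq \mathbb{E}\{Z_a Z_b^*\} &= \beta_{0,0} M_{\rm rx} N_0 E_{\rm r} \delta_{a,b}
+ \beta_{a,b} M_{\rm rx} N_0 \Big[ x_a x_b^* (1 - \epsilon) + \epsilon \sum_{\dot{k}=-g}^{g} x_{\dot{k}+a} x_{\dot{k}+b}^* \Big] \\
&\quad + \delta_{a,b} M_{\rm rx} \Big[ (1 - \epsilon) N_0^2 + \epsilon \sum_{\dot{k}=-g}^{g} N_0^2 \Big],
\end{aligned}
\tag{9.10}
$$

$$ \tilde{K}_{a,b}(\mathbf{x}, \mathbf{h}) \triangleq \mathbb{E}\{Z_a Z_b\} = \sum_{(\dot{k}, \ddot{k}) \in \mathcal{A}_a} \epsilon \, \delta_{a,-b} M_{\rm rx} N_0^2, \tag{9.11} $$

where $\beta_{a,b}(\mathbf{h}) \triangleq \sum_\ell |\alpha_\ell|^2 e^{-j 2\pi (f_a - f_b) \tau_\ell}$, $\delta_{a,b}$ is the Kronecker delta function, i.e., $\delta_{a,b} = 1$ if $a = b$ and $\delta_{a,b} = 0$ otherwise, and $\mathcal{A}_a = \{(\dot{k}, \ddot{k}) \,|\, \ddot{k} = \dot{k} + a, \; -g \le \dot{k}, \ddot{k} \le g\}$. Note that $\mathcal{A}_a = \{\}$ for $|a| > 2g$, i.e., the pseudo cross-covariance is 0 for most of the demodulated sub-carriers. From (9.10)–(9.11) one can see that the noise components at the OFDM outputs are mutually correlated and further depend on the data vector $\mathbf{x}$. Therefore, for optimal decoding, a joint estimation of the data vector $\mathbf{x}$ should be performed. Here a simpler approach is considered, where the dependence of the noise terms on $\mathbf{x}$ is not exploited. Under this assumption, the noise variances, averaged over the transmit data vector $\mathbf{x}$, reduce to:

$$ K_{a,b}(\mathbf{h}) = \beta_{0,0} M_{\rm rx} N_0 \delta_{a,b} \Big[ E_{\rm r} + E_{\rm d}^{(a)} + \epsilon \sum_{k \in \mathcal{B}_a} E_{\rm d}^{(k+a)} \Big] + \delta_{a,b} M_{\rm rx} N_0^2 \big[ 1 + 2 g \epsilon \big], \tag{9.12a} $$

$$ \tilde{K}_{a,b}(\mathbf{h}) \approx 0, \tag{9.12b} $$

where the fact that $\mathbf{x}$ has zero-mean, mutually independent entries is used, we define $\mathcal{B}_a = \{k \,|\, |a + k| > g, \; -g \le k \le g, \; k \ne 0\}$, and the pseudo cross-covariance term in (9.11) is neglected. The noise components $\{Z_k \,|\, k \in \mathcal{K}\}$ shall further be approximated to be jointly Gaussian distributed, which enables finding a closed-form lower bound to the system capacity. The validity of this assumption is tested via simulations in Section 9.5.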
9.3 Performance Analysis

Using (9.8), (9.12) and the joint Gaussian assumption for $\{Z_k \,|\, k \in \mathcal{K}\}$, the effective SISO channel between the transmit data and the demodulation outputs can be expressed as:

$$ Y_k = M_{\rm rx} \sqrt{E_{\rm r} E_{\rm d}^{(k)}} \, \beta_{k,0}(\mathbf{h}) \, \bar{x}_k + Z_k, \quad k \in \mathcal{K}, \tag{9.13} $$

where $\bar{x}_k \triangleq x_k / \sqrt{E_{\rm d}^{(k)}}$ and $Z_k \sim \mathcal{CN}(0, K_{k,k}(\mathbf{h}))$. Note that (9.13) represents a parallel Gaussian channel. The SNR for the $k$-th data stream and the system instantaneous Spectral Efficiency (iSE) (footnote 4), respectively, can then be expressed as:

$$ \gamma_k(\mathbf{h}(t)) = \frac{M_{\rm rx} E_{\rm r} E_{\rm d}^{(k)} |\beta_{k,0}(\mathbf{h})|^2}{\beta_{0,0} N_0 \Big[ E_{\rm r} + E_{\rm d}^{(k)} + \epsilon \sum_{\dot{k} \in \mathcal{B}_k} E_{\rm d}^{(\dot{k}+k)} \Big] + N_0^2 \big[ 1 + 2 g \epsilon \big]}, \tag{9.14} $$

$$ {\rm iSE}\big(\mathbf{h}(t)\big) \triangleq \sum_{k \in \mathcal{K}} \frac{1}{2K+1} \log\big( 1 + \gamma_k(\mathbf{h}(t)) \big), \tag{9.15} $$

where the cyclic prefix overhead is neglected for convenience. Note that even though the RX does not explicitly estimate the array response vectors $\mathbf{a}_{\rm rx}(\ell)$ for the MPCs and does not use phase-shifters, one can still observe a beamforming gain of $M_{\rm rx}$ in $\gamma_k(\mathbf{h}(t))$. However, one drawback is that $|\beta_{k,0}(\mathbf{h})| \le \beta_{0,0}(\mathbf{h})$, and is different for each $k$, i.e., some amount of frequency-selective fading is encountered. The performance loss due to this fading behavior is studied via simulations in Section 9.5.

Note that the channel gain and noise variance in (9.13), $\{\beta_{k,0}(\mathbf{h}), K_{k,k}(\mathbf{h}) \,|\, k \in \mathcal{K}\}$, need to be estimated at the RX for data demodulation. This can be achieved by transmitting pilot symbols (and null pilots) on the data sub-carriers. Since the RX has a good beamforming gain (9.14), and $\{\beta_{k,0}(\mathbf{h}), K_{k,k}(\mathbf{h}) \,|\, k \in \mathcal{K}\}$ are average channel parameters which change slowly with time, they can be tracked accurately with very low estimation overhead. If required, these parameters can further be fed back to the TX, via a feedback channel, for adaptive modulation and power allocation. Since the pilots are used only to estimate the average SISO channel parameters and not the actual SIMO channel, the advantages of simplified IA and CE are still applicable for the proposed scheme.
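As a rough illustration of how the per-sub-carrier SNR and the iSE defined above could be evaluated, the sketch below implements the two expressions under stated assumptions: the symbol `eps` stands for the off-center nBPF gain factor in the noise term, the logarithm is taken to base 2 (the base is not stated in the text), and all numerical values and function names are hypothetical.

```python
import cmath
import math

def beta(a, b, paths, Ts):
    """beta_{a,b} = sum_l |alpha_l|^2 exp(-j 2 pi (f_a - f_b) tau_l); paths = [(|alpha|^2, tau)]."""
    return sum(g2 * cmath.exp(-2j * math.pi * ((a - b) / Ts) * tau)
               for g2, tau in paths)

def snr(k, Er, Ed, paths, Ts, M_rx, N0, g, eps):
    """Per-sub-carrier SNR as in (9.14); Ed maps data index -> allocated energy."""
    b00 = beta(0, 0, paths, Ts).real
    # Energy of neighbouring data sub-carriers leaking in through the nBPF skirt.
    leak = sum(Ed.get(k + kd, 0.0) for kd in range(-g, g + 1)
               if kd != 0 and abs(k + kd) > g)
    noise = b00 * N0 * (Er + Ed[k] + eps * leak) + N0 ** 2 * (1 + 2 * g * eps)
    return M_rx * Er * Ed[k] * abs(beta(k, 0, paths, Ts)) ** 2 / noise

def ise(Er, Ed, paths, Ts, M_rx, N0, g, eps, K):
    """Instantaneous spectral efficiency as in (9.15); log base 2 is an assumption."""
    return sum(math.log2(1 + snr(k, Er, Ed, paths, Ts, M_rx, N0, g, eps))
               for k in Ed) / (2 * K + 1)

# Hypothetical toy setting: single-path channel, equal data energies
K, g, Ts = 3, 1, 2e-6
Ed = {k: 1.0 for k in range(-K, K + 1) if abs(k) > g}
flat = [(1.0, 0.0)]
two_path = [(1.0, 0.0), (0.5, 100e-9)]

snr_flat = snr(2, 10.0, Ed, flat, Ts, 4, 1.0, g, 0.0)
ise_flat = ise(10.0, Ed, flat, Ts, 4, 1.0, g, 0.0, K)
```

With a single path the gain is the same on every sub-carrier; adding a second delayed path, as in `two_path`, makes the magnitude of `beta(k, 0, ...)` drop below `beta(0, 0, ...)`, which is the frequency-selective fading discussed above.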
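9.3.1 Power Allocation

The aim of this subsection is to find the iSE-maximizing power allocation to the data sub-carriers and reference tone. Since both the signal and noise variances in (9.13) are affected by the transmit powers in a non-linear way, finding the optimal power allocation is difficult. Therefore the following approximate SNR expression shall be relied upon for power allocation:

$$ \hat{\gamma}_k(\mathbf{h}(t)) = \frac{M_{\rm rx} E_{\rm r} E_{\rm d}^{(k)} |\beta_{k,0}(\mathbf{h})|^2}{\beta_{0,0} N_0 \Big[ E_{\rm r} + (E_{\rm s} - E_{\rm r}) \frac{1 + g \epsilon}{|\mathcal{K}|} \Big] + N_0^2 \big[ 1 + 2 g \epsilon \big]}, \tag{9.16} $$

which is obtained by using $|\mathcal{B}_k| \approx g$ and by replacing $E_{\rm d}^{(k)}$ in the denominator by $(E_{\rm s} - E_{\rm r})/|\mathcal{K}|$. For $\hat{\gamma}_k(\mathbf{h}(t))$ one obtains the following lemma.

Footnote 4: Due to the sub-optimal per sub-carrier demodulation approach considered in Section 9.2.2, iSE is in general a lower bound to the instantaneous capacity.

Lemma 9.3.1. For any feasible power allocation $\{E_{\rm r}, E_{\rm d}^{(k)} \,|\, k \in \mathcal{K}\}$, we have $\hat{\gamma}_k(\mathbf{h}(t)) \le \hat{\gamma}_k^{\circ}(\mathbf{h}(t))$ for all $k \in \mathcal{K}$, where $\hat{\gamma}_k^{\circ}(\mathbf{h}(t))$ is the approximate SNR with:

$$ E_{\rm r}^{\circ} = \frac{-\mu + \sqrt{\mu^2 + \mu \nu E_{\rm s}}}{\nu}, \qquad E_{\rm d}^{(k)\circ} = \frac{E_{\rm d}^{(k)} (E_{\rm s} - E_{\rm r}^{\circ})}{E_{\rm s} - E_{\rm r}}, \tag{9.17} $$

$\mu = N_0^2 \big[ 1 + 2 g \epsilon \big] + \beta_{0,0} N_0 E_{\rm s} \frac{1 + g \epsilon}{|\mathcal{K}|}$ and $\nu = \beta_{0,0} N_0 \big( 1 - \frac{1 + g \epsilon}{|\mathcal{K}|} \big)$.

Proof. For any power allocation $\{E_{\rm r}, E_{\rm d}^{(k)} \,|\, k \in \mathcal{K}\}$, consider the alternate power allocation $\big\{ \tilde{E}_{\rm r}, \frac{E_{\rm d}^{(k)} (E_{\rm s} - \tilde{E}_{\rm r})}{E_{\rm s} - E_{\rm r}} \,\big|\, k \in \mathcal{K} \big\}$ for some $\tilde{E}_{\rm r} \in [0, E_{\rm s}]$. Then from (9.16), for any $k \in \mathcal{K}$ we have:

$$ \hat{\tilde{\gamma}}_k(\mathbf{h}(t)) = \frac{M_{\rm rx} \tilde{E}_{\rm r} E_{\rm d}^{(k)} (E_{\rm s} - \tilde{E}_{\rm r}) |\beta_{k,0}(\mathbf{h})|^2}{(E_{\rm s} - E_{\rm r}) (\nu \tilde{E}_{\rm r} + \mu)}, \tag{9.18} $$

where $\hat{\tilde{\gamma}}_k(\mathbf{h}(t))$ is the approximate SNR with the alternate power allocation, and $\mu$, $\nu$ are as defined in (9.17). By finding the second derivative of (9.18) with respect to $\tilde{E}_{\rm r}$, it can be verified that $\hat{\tilde{\gamma}}_k(\mathbf{h}(t))$ is a concave function of $\tilde{E}_{\rm r}$. Therefore, equating the first derivative of (9.18) with respect to $\tilde{E}_{\rm r}$ to zero, one finds that $\hat{\tilde{\gamma}}_k(\mathbf{h}(t))$ is maximized for:

$$ \tilde{E}_{\rm r} = \frac{-\mu + \sqrt{\mu^2 + \mu \nu E_{\rm s}}}{\nu}, \tag{9.19} $$

where the explicit steps are skipped for brevity. Since $\hat{\tilde{\gamma}}_k(\mathbf{h}(t)) = \hat{\gamma}_k(\mathbf{h}(t))$ for $\tilde{E}_{\rm r} = E_{\rm r}$, we have $\hat{\gamma}_k(\mathbf{h}(t)) \le \hat{\gamma}_k^{\circ}(\mathbf{h}(t))$.

As a consequence of Lemma 9.3.1, the approximate data stream SNRs in (9.16) are maximized by using $E_{\rm r} = E_{\rm r}^{\circ}$.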
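The stationary point used in the proof can be checked numerically. The sketch below assumes the approximate SNR depends on the trial reference energy $E$ only through $E (E_{\rm s} - E)/(\nu E + \mu)$, where $\mu$ collects the terms of the denominator of (9.16) that do not scale with $E$ and $\nu$ is the coefficient of $E$; setting the derivative to zero then gives $E = (-\mu + \sqrt{\mu^2 + \mu \nu E_{\rm s}})/\nu$. The symbol names `mu`, `nu`, `eps`, the function names and all numerical values are illustrative assumptions.

```python
import math

def lemma_constants(Es, beta00, N0, g, eps, num_data):
    """Constant term (mu) and slope (nu) of the denominator of (9.16) in the reference energy."""
    c = (1 + g * eps) / num_data
    mu = N0 ** 2 * (1 + 2 * g * eps) + beta00 * N0 * Es * c
    nu = beta00 * N0 * (1 - c)
    return mu, nu

def reference_energy(Es, mu, nu):
    """Positive root of nu*E^2 + 2*mu*E - mu*Es = 0."""
    return (-mu + math.sqrt(mu * mu + mu * nu * Es)) / nu

def snr_shape(Er, Es, mu, nu):
    """Part of the approximate SNR that depends on the trial reference energy."""
    return Er * (Es - Er) / (nu * Er + mu)

# Hypothetical values for illustration only
Es, beta00, N0, g, eps, num_data = 100.0, 1.0, 1.0, 6, 0.1, 88
mu, nu = lemma_constants(Es, beta00, N0, g, eps, num_data)
Er_opt = reference_energy(Es, mu, nu)

# Brute-force check on a fine grid of candidate reference energies.
grid = [Es * i / 10000 for i in range(1, 10000)]
Er_grid = max(grid, key=lambda e: snr_shape(e, Es, mu, nu))
```

For these values the closed form and the grid search agree to within the grid spacing, and only a modest fraction of the symbol energy goes to the reference tone.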
Now using $\hat{\gamma}_k(\mathbf{h}(t))$ instead of $\gamma_k(\mathbf{h}(t))$ in (9.15) with $\sum_k E_{\rm d}^{(k)} = E_{\rm s} - E_{\rm r}^{\circ}$, a sub-optimal iSE-maximizing power allocation for $\{E_{\rm d}^{(k)} \,|\, k \in \mathcal{K}\}$ can be obtained by the water-filling algorithm. Note that for $|\mathcal{K}| \gg 1$, by ignoring the $N_0^2$ term and assuming $|\beta_{k,0}(\mathbf{h})|^2 \approx |\beta_{0,0}(\mathbf{h})|^2$ in (9.16), $\hat{\gamma}_k(\mathbf{h}(t))$ is the same as the SNR for MRC with perfect CSI at the RX. To show that such an $N_0^2$ term is also encountered in other RXs with noisy CE, a fully digital $M_{\rm rx}$-antenna RX is considered next.

Footnote 5 (on the choice $E_{\rm r} = E_{\rm r}^{\circ}$): In practice, the value of $E_{\rm r}$ may further be limited by spectral mask regulations, a discussion of which is beyond the scope of this chapter.

9.3.2 Digital beamforming with full resolution ADCs

For simplicity, the optimistic assumption is made that the fully digital RX has perfect knowledge of ${\rm span}\{\mathbf{a}_{\rm rx}(\ell) \,|\, 0 \le \ell \le L-1\}$, and therefore uses an $M_{\rm rx} \times L$ unitary digital pre-combiner $\mathbf{U}$ with column-span$\{\mathbf{U}\}$ $=$ ${\rm span}\{\mathbf{a}_{\rm rx}(\ell) \,|\, 0 \le \ell \le L-1\}$. Within each coherence time interval, the TX transmits one pilot symbol and the RX uses the minimum-mean-square-error (MMSE) CE procedure. The use of a sub-optimal non-parametric CE is assumed, where the CE is independent for each coherence time-bandwidth block. Within a coherence bandwidth, however, the channel is assumed to be frequency flat, allowing a joint estimation across the sub-carriers. It is stated without proof that under this setting, and using analysis similar to Section 9.2, the effective SNR for a fully digital RX can be obtained as:

$$ \gamma_k^{\rm FD}(\mathbf{h}(t)) = \frac{M_{\rm rx} E_{\rm s}^2 |\beta_{0,0}(\mathbf{h})|^2}{(2K+1) \beta_{0,0} N_0 E_{\rm s} \big( 1 + \frac{1}{K_{\rm coh}} \big) + \frac{L (2K+1)}{M_{\rm rx} K_{\rm coh}} N_0^2}, \tag{9.20} $$

where $K_{\rm coh}$ represents the number of sub-carriers within a coherence bandwidth. Note the similarity of $\gamma_k^{\rm FD}(\mathbf{h}(t))$ to (9.16). A similar analytical comparison to hybrid beamforming and digital beamforming with low-resolution Analog to Digital Converters (ADCs) is beyond the scope of this chapter, and shall be explored in future work.
9.4 A Sample Initial Access Protocol

To illustrate the IA advantages of RTAT, a sample IA protocol is proposed here. Consider several multi-antenna BSs deployed over a coverage area, using either fully digital or hybrid beamforming. The BSs are assumed to periodically transmit a Primary Synchronization Sequence (PSS), with a common reference tone. An exhaustive beam-scanning procedure [103] is assumed, where this PSS is sequentially transmitted along different azimuth directions by each BS. Since no beam-search is required at a UE with an RTAT RX, the PSS needs to be transmitted only once for each BS azimuth direction. While the use of a common reference tone for the PSS may cause reference tone contamination (similar to pilot contamination), the interference can be avoided by using disjoint data sub-carriers to transmit each BS's PSS control information. The control information in the PSS includes the frequency for a second reference tone, which is exclusive and well separated for each BS. Using this control information, the UE adapts its nBPF to the exclusive reference tone of a chosen BS. It also transmits a return packet to inform the BS of the azimuth direction to use. Further control information and the actual data transmission use this exclusive tone, with the data streams at each BS occupying the same bandwidth (footnote 6). Note that there is no carrier frequency offset, and the symbol synchronization is easy since it is performed with the RX beamforming gain. While a low PSS count per TX azimuth direction can also be achieved by using wider beams at the RX [236], this reduces the RX beamforming gain and the coverage radius. In contrast, RTAT retains the full RX beamforming gain.

Footnote 6: Note that $\mathcal{K}$ does not need to be symmetric about the reference tone.

9.5 Simulation Results

For simulations, a single-user MIMO system in downlink is considered, having a multi-antenna BS and a UE with a half-wavelength spaced uniform linear array ($M_{\rm rx} = 32$). It is also assumed
that the BS has aCSI knowledge (via some IA procedure), and that it beamforms a single spatial data stream towards the UE, which has an RTAT RX architecture and one down-conversion chain. The BS transmits OFDM symbols with $T_{\rm s} = 2\,\mu$s, $T_{\rm cp} = 0.2\,\mu$s and $f_{\rm c} = 30$ GHz. By including the transmit precoding beam into the channel, a sample effective channel impulse response $\mathbf{h}(t)$ between the BS beam and UE is considered with: $L = 3$, $\tau_\ell = 50|\ell - 1|$ ns, ` = (` 1)=10, ` = (1) `+1 expfj ` j= g, = =10 and [a( ` ; ` )] m = expfj sin( ` )g. Perfect timing synchronization, and perfect knowledge of $\{\beta_{k,0} \,|\, k \in \mathcal{K}\}$, are assumed to be available at the UE. For this $\mathbf{h}(t)$, the Symbol Error Rate (SER) for (9.7), obtained by Monte-Carlo simulations, is compared to the analytical SER for the effective channel (9.13) in Fig. 9.4. The close match between the analytical and simulation results validates the analysis and the effective channel model in (9.13). Due to the frequency-selective fading, one also observes that the SER changes with $k$.

Figure 9.4: SER for data streams $k = 1, K/2, K$ of an RTAT receiver with QAM modulation (16-QAM and QPSK curves; $K = 50$, $E_{\rm r}$ from Lemma 9.3.1, $E_{\rm d}^{(k)} = (E_{\rm s} - E_{\rm r})/|\mathcal{K}|$, $g = 6$, $\epsilon = 0.1$).

Next, the iSE of RTAT (9.15) is compared to the iSE of hybrid beamforming (footnote 7) with no phase noise. For CE in hybrid beamforming, it is assumed that in each aCSI coherence time, the UE sequentially scans the received signal at co-prime locations of the antenna array [80]. Using these received signals, the UE uses the nuclear norm minimization (NNM) algorithm [80] to estimate the RX correlation matrix $\sum_\ell |\alpha_\ell|^2 \mathbf{a}_{\rm rx}(\ell) \mathbf{a}_{\rm rx}(\ell)^\dagger$. Its largest eigenvector is then used as the analog beamformer for data reception. Perfect knowledge of $N_0$ and of the exact $\ell_2$-norm of the error is also assumed for NNM. The performance of the MA-FSR RX proposed in Chapter 8 is also included for comparison.
Since there is very little work on the performance of low-resolution ADCs with CE in sparse frequency-selective fading channels [9], it is not included in the comparison here. The results show that for $\beta_{0,0} E_{\rm s}/N_0 \ge 15$ dB, RTAT suffers from a 2.5 dB reduction in beamforming gain in comparison to hybrid beamforming. Note that since $E_{\rm s}$ is the total transmit power and $\beta_{0,0}$ includes the BS beamforming gain, having $\beta_{0,0} E_{\rm s}/N_0 \ge 15$ dB is reasonable. Furthermore, the performance of equal data power allocation is comparable to water-filling. However, these results are specific to the chosen channel and depend on $L$. Larger values of $L$ intensify the frequency-selective fading of RTAT, thereby increasing the loss in beamforming gain. Therefore, RTAT is more suited to sparse channels with very few MPCs between the TX beam and RX. The results also show that since the bandwidth efficiency is approximately 100%, the iSE of RTAT is approximately twice that of MA-FSR.

Footnote 7: Since the UE has one RF chain, this is the same as analog beamforming.

Figure 9.5: Comparison of the iSE (without CE overhead) of RTAT, hybrid beamforming and MA-FSR, with an inset showing the low SNR regime on a logarithmic iSE scale. (a) For RTAT, $E_{\rm r}$ is from Lemma 9.3.1, $K = 512$, $g = 6$ and $\epsilon = 0.5$. (b) For hybrid beamforming, the (5, 6) co-prime array is used and pilot symbols are assumed to use only 1/10-th of the sub-carriers (to reduce noise accumulation). (c) For MA-FSR, $E_{\rm r} = E_{\rm s}/2$, $K = 512$, $g = 5$.

Note that the iSE curves in Fig. 9.5 do not include the CE overhead. As an illustration, assuming that the aCSI coherence time is 10 ms and that the BS has 100 antennas and performs exhaustive beam search, CE consumes 2% and 22% of the time-frequency resources for RTAT and hybrid beamforming, respectively. Therefore RTAT can provide significant savings in estimation overhead at the cost of a small drop in SNR.
On the analog hardware front, the RTAT RX requires $2M_{\rm rx}$ mixers, $M_{\rm rx}$ LNAs and $M_{\rm rx}$ nBPF filters (implemented using PLLs), whereas hybrid beamforming requires $M_{\rm rx}$ analog phase-shifters, and possibly $M_{\rm rx}$ LNAs and $M_{\rm rx}$ mixers [227]. A comparative study of the hardware cost or energy consumption of these designs is beyond the scope of this chapter. Alternate, and possibly more cost-efficient, designs for an RTAT RX may also exist, which shall be explored in future work.

9.6 Conclusion

In this work a novel approach to reduce the hardware cost of massive MIMO systems, namely reference tone aided transmission, is proposed. In RTAT the TX transmits a sinusoidal reference tone along with the data. The RX essentially recovers the reference tone at each antenna and uses it for down-converting the data signals to base-band, thereby compensating for the inter-antenna phase shift of each MPC. Note that this operation can be interpreted as analog channel estimation (ACE) of the reference tone. Thus, effectively, this work shows that estimation of the amplitude and phase of a sinusoidal pilot/reference tone is sufficient to obtain a good RX beamforming gain for a wide-band system with a single spatial data-stream. Further, the analysis suggests that the effective RTAT channel between the TX data and RX can be interpreted as a parallel Gaussian channel with frequency-selective fading, where the frequency selectivity arises due to the frequency mismatch between the reference tone and the data sub-carriers. It is also shown that the SNR expressions for RTAT and for a fully digital RX with non-parametric CE are similar. Finally, the simulation results suggest that, in comparison to hybrid beamforming, RTAT suffers only a small reduction in beamforming gain, while providing significant savings in estimation overhead.
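9.A Appendix A

From (9.9), the conditional cross-covariance between the noise components at the $a$-th and $b$-th demodulation outputs can be computed as:

$$
\begin{aligned}
K_{a,b}(\mathbf{x}, \mathbf{h}) &\overset{(1)}{=} \sum_{\ell_1, \ell_2} \frac{E_{\rm r} T_{\rm s}}{2} \alpha_{\ell_1}^* \alpha_{\ell_2} e^{j 2\pi f_{\rm c} (\tau_{\ell_1} - \tau_{\ell_2})} \mathbf{a}_{\rm rx}(\ell_1)^\dagger \, \mathbb{E}_{\mathbf{N}}\{\mathbf{N}[a] \mathbf{N}[b]^\dagger\} \, \mathbf{a}_{\rm rx}(\ell_2) \\
&\quad + \sum_{\ell_1, \ell_2} \sum_{\dot{k}, \ddot{k}=-g}^{g} \Bigg( \frac{\alpha_{\ell_1} \alpha_{\ell_2}^* T_{\rm s} \, e^{-j 2\pi (f_{\rm c} + f_{\dot{k}} + f_a) \tau_{\ell_1}} e^{j 2\pi (f_{\rm c} + f_{\ddot{k}} + f_b) \tau_{\ell_2}}}{2} x_{\dot{k}+a} x_{\ddot{k}+b}^* \, \mathbf{a}_{\rm rx}(\ell_2)^\dagger \, \mathbb{E}_{\mathbf{N}}\{\hat{\mathbf{N}}[\ddot{k}] \hat{\mathbf{N}}[\dot{k}]^\dagger\} \, \mathbf{a}_{\rm rx}(\ell_1) \Bigg) \\
&\quad + \frac{T_{\rm s}^2}{4} \sum_{\dot{k}=-g}^{g} \sum_{\ddot{k}=-g}^{g} \mathbb{E}_{\mathbf{N}}\Big\{ \hat{\mathbf{N}}[\dot{k}]^\dagger \mathbf{N}[\dot{k}+a] \mathbf{N}[\ddot{k}+b]^\dagger \hat{\mathbf{N}}[\ddot{k}] \Big\} \\
&\overset{(2)}{=} \sum_{\ell=0}^{L-1} |\alpha_\ell|^2 M_{\rm rx} N_0 \Bigg[ E_{\rm r} \delta_{a,b} + e^{j 2\pi (f_b - f_a) \tau_\ell} \Big( x_a x_b^* (1 - \epsilon) + \epsilon \sum_{\dot{k}=-g}^{g} x_{\dot{k}+a} x_{\dot{k}+b}^* \Big) \Bigg]
+ \delta_{a,b} M_{\rm rx} \Big[ (1 - \epsilon) N_0^2 + \epsilon \sum_{\dot{k}=-g}^{g} N_0^2 \Big]
\end{aligned}
\tag{9.21}
$$

where $\overset{(1)}{=}$ follows from the fact that odd moments of $\mathbf{N}[k]$ are 0; and $\overset{(2)}{=}$ follows from (9.4)–(9.5), by defining $\delta_{a,b} = 1$ if $a = b$ and 0 otherwise, and from the fact that the array response vectors $\mathbf{a}_{\rm rx}(\ell)$ are mutually orthogonal. Similarly, the conditional pseudo cross-covariance between the noise components at the $a$-th and $b$-th demodulation outputs can be computed as:

$$
\begin{aligned}
\tilde{K}_{a,b}(\mathbf{x}, \mathbf{h}) &\overset{(3)}{=} \sum_{\ell_1, \ell_2} \sum_{\dot{k}=-g}^{g} \frac{\sqrt{E_{\rm r}} T_{\rm s}}{2} e^{j 2\pi f_{\rm c} (\tau_{\ell_1} - \tau_{\ell_2})} e^{-j 2\pi (f_{\dot{k}} + f_b) \tau_{\ell_2}} \alpha_{\ell_1}^* \alpha_{\ell_2} \, \mathbf{a}_{\rm rx}(\ell_1)^\dagger \, \mathbb{E}_{\mathbf{N}}\{\mathbf{N}[a] \hat{\mathbf{N}}[\dot{k}]^\dagger\} \, \mathbf{a}_{\rm rx}(\ell_2) \\
&\quad + \sum_{\ell_1, \ell_2} \sum_{\dot{k}=-g}^{g} \frac{\sqrt{E_{\rm r}} T_{\rm s}}{2} e^{j 2\pi f_{\rm c} (\tau_{\ell_2} - \tau_{\ell_1})} e^{-j 2\pi (f_{\dot{k}} + f_a) \tau_{\ell_1}} \alpha_{\ell_1} \alpha_{\ell_2}^* \, \mathbf{a}_{\rm rx}(\ell_2)^\dagger \, \mathbb{E}_{\mathbf{N}}\{\mathbf{N}[b] \hat{\mathbf{N}}[\dot{k}]^\dagger\} \, \mathbf{a}_{\rm rx}(\ell_1) \\
&\quad + \sum_{\dot{k}, \ddot{k}=-g}^{g} \frac{T_{\rm s}^2}{4} \mathbb{E}_{\mathbf{N}}\big\{ \hat{\mathbf{N}}[\dot{k}]^\dagger \mathbf{N}[\dot{k}+a] \, \hat{\mathbf{N}}[\ddot{k}]^\dagger \mathbf{N}[\ddot{k}+b] \big\} \\
&\overset{(4)}{=} \sum_{(\dot{k}, \ddot{k}) \in \mathcal{A}_a} \epsilon \, \delta_{a,-b} M_{\rm rx} N_0^2
\end{aligned}
\tag{9.22}
$$

where $\overset{(3)}{=}$ follows from the fact that odd moments of $\mathbf{N}[k]$ are 0; and $\overset{(4)}{=}$ follows from (9.4)–(9.5), by observing that $|a|, |b| > g$, from the fact that the array response vectors $\mathbf{a}_{\rm rx}(\ell)$ are mutually orthogonal, and by defining $\mathcal{A}_a$ as in (9.11).

Chapter 10

Periodic Analog Channel Estimation aided Beamforming

As mentioned in Chapter 9, the main reason for the pilot overhead in conventional reduced-complexity transceivers is that conventional CE approaches require processing in the digital domain, while the RX has only one down-conversion chain.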
Therefore, in Chapter 9, a different transmission scheme, RTAT, was proposed that can perform receive beamforming without explicit CE or the use of phase shifters, thereby leading to a significant reduction in CE overhead and simpler IA. In RTAT, a sinusoidal reference tone is transmitted along with the data signals. At each receive antenna, the reference tone is recovered (including amplitude and phase) and multiplied with the received signals, to obtain a base-band signal whose inter-antenna phase shift has been compensated. The resulting low-pass signals from all the antennas are then added, emulating MRC at the RX with imperfect CSI. Note that the reference recovery essentially amounts to estimating the amplitude and phase of the received reference tone at each RX antenna. Therefore, in essence, Chapter 9 shows that knowledge of the amplitude and phase of a received reference tone (that is transmitted along with the data) is sufficient for designing a good RX analog beamformer. Since estimation of the amplitude and phase of a single sinusoidal tone is significantly simpler than conventional CE, it can be performed at each RX antenna by analog hardware. Thus, by avoiding digital CE, the analog beamformer can be designed without pilot re-transmissions. This amplitude and phase estimation shall henceforth be referred to as analog channel estimation (ACE).

In Chapter 9, ACE is performed continuously, via carrier recovery at each RX antenna. While this reduces the estimation overhead, it requires $M_{\rm rx}$ carrier recovery circuits, which may add to the cost and power consumption of the RX. Furthermore, the continuous tracking of the reference tone is more than is needed, and may waste some transmit power and spectral efficiency. In this chapter, therefore, a different scheme, referred to as Periodic Analog Channel Estimation (PACE), is proposed, which requires a single carrier recovery circuit and $M_{\rm rx}$ phase shifters (see Fig. 10.1), and performs ACE judiciously.
In PACE, the TX transmits a reference tone at a known frequency during each periodic RX beamformer update phase. A single carrier recovery circuit, such as a PLL, is used to recover the reference tone from one or more antennas, as shown in Fig. 10.1. This recovered reference tone, and its quadrature component, are then used to estimate the phase offset and amplitude of the received reference tone at each RX antenna, via a bank of 'filter, sample and hold' circuits (represented as integrators in Fig. 10.1). These phase and amplitude estimates are used to control an array of variable gain phase-shifters, which generate the RX analog beam. During the data transmission phase, the received signals pass through these phase-shifters, and are summed and processed similarly to conventional analog beamforming. As the phase and amplitude estimation is done in the analog domain, $O(1)$ pilots are sufficient to update the RX beamformer. Additionally, the power from several channel MPCs may be accumulated by this approach, thereby increasing the system diversity against MPC blocking. Note that the same variable gain phase-shifts can also be used for transmit beamforming on the reverse link. Furthermore, by providing an option for digitally controlling the inputs to the phase-shifters, the proposed architecture can also support conventional beamforming approaches.

On the flip side, the accumulation of power from multiple MPCs may cause frequency-selective fading in a wide-band scenario, which degrades performance. Furthermore, the proposed approach in its current suggested form does not support reception of multiple spatial data streams and can only be used for beamforming at one end of a communication link. This architecture is therefore more suitable for use at the UEs. The possible extensions to multiple spatial stream reception shall be explored in future work.
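The per-antenna estimation step described above can be sketched in discrete time as follows. This is only a stand-in for the analog 'filter, sample and hold' integrators: the received tone is multiplied with the recovered reference and with its quadrature copy, and both products are averaged over an integer number of periods. The sign convention of the quadrature copy, the window length, and the amplitude and phase values are assumptions made for illustration.

```python
import math

def estimate_amplitude_phase(received, in_phase, quadrature):
    """Average the products with the recovered tone and with its quadrature copy."""
    n = len(received)
    i_avg = sum(r * c for r, c in zip(received, in_phase)) / n
    q_avg = sum(r * s for r, s in zip(received, quadrature)) / n
    # Each average carries half the tone amplitude, hence the factor of 2.
    return 2 * math.hypot(i_avg, q_avg), math.atan2(q_avg, i_avg)

num_samples, cycles = 200, 5            # hypothetical integration window
theta = [2 * math.pi * cycles * u / num_samples for u in range(num_samples)]
in_phase = [math.cos(t) for t in theta]       # recovered reference tone
quadrature = [-math.sin(t) for t in theta]    # its quadrature component (assumed sign)

true_amp, true_phase = 0.6, 1.2         # hypothetical per-antenna values
received = [true_amp * math.cos(t + true_phase) for t in theta]

amp, phase = estimate_amplitude_phase(received, in_phase, quadrature)
```

In this noise-free sketch the two averages return the amplitude and phase offset of the tone at that antenna, which are the quantities that would drive the corresponding variable gain phase-shifter.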
While the proposed architecture is also applicable in frequency-flat fading scenarios, in this chapter we shall focus on the analysis of a wide-band scenario where the repetition interval of PACE and the beamformer update is of the order of the aCSI coherence time.

Figure 10.1: Block diagram of an RX with analog beamforming enabled via analog channel estimation.

The contributions of this chapter are as follows:
1. A novel PACE technique is proposed, that enables RX analog beamforming with low CE overhead.
2. Two novel carrier recovery circuits that can aid PACE are proposed and analyzed, and their phase noise is characterized via linear analysis techniques.
3. The achievable system throughput with PACE aided beamforming in a wide-band channel is characterized analytically.
4. Simulations with practically relevant channel models are used to support the analytical results.

The organization of the chapter is as follows: the system model is presented in Section 10.1; two designs for ACE and their respective noise analyses are presented in Section 10.2; the system performance with PACE aided beamforming is characterized in Section 10.3; the advantages of PACE beamforming during the IA phase are discussed in Section 10.4; simulation results are presented in Section 10.5; and finally conclusions are given in Section 10.6.

Notation: scalars are represented by light-case letters; vectors by bold-case letters; and sets by calligraphic letters. Additionally, $j = \sqrt{-1}$, $a^*$ is the complex conjugate of a complex scalar $a$, $|\mathbf{a}|$ represents the $\ell_2$-norm of a vector $\mathbf{a}$ and $\mathbf{A}^\dagger$ is the conjugate transpose of a complex matrix $\mathbf{A}$.
Finally, $\mathbb{E}\{\cdot\}$ represents the expectation operator, $\mathrm{Re}\{\cdot\}$/$\mathrm{Im}\{\cdot\}$ refer to the real/imaginary component, respectively, $\mathrm{Exp}\{a\}$ represents an exponential distribution with mean $a$ and $\mathrm{Unif}\{a, b\}$ represents a uniform distribution in the range $[a, b]$.

10.1 General Assumptions and System model

The downlink of a single-cell MIMO system having one BS with $M_{\rm tx}$ antennas and several UEs with $M_{\rm rx}$ antennas each is considered. Since the main focus is on the downlink, the abbreviations BS & TX and UE & RX shall be used interchangeably. Each UE is assumed to have one up/down-conversion chain, while no assumptions are made regarding the BS architecture. The BS transmits a single spatial data-stream to each scheduled UE, and all such scheduled users are served simultaneously via spatial multiplexing. Furthermore, the downlink data to the users is assumed to be transmitted via orthogonal precoding beams such that there is no inter-UE interference.¹ Under these assumptions, and given the transmit precoding beams and power allocation, the analysis shall be restricted to a single representative UE without loss of generality. For convenience, the use of noise-less and perfectly linear antennas, filters, amplifiers and mixers shall be assumed at both the BS and UE. An analysis including the non-linear effects of these analog components is beyond the scope of this chapter. The BS transmits OFDM symbols with $K$ sub-carriers, indexed as $\mathcal{K} = \{-K_1, \ldots, K_2 - 1, K_2\}$ with $K_1 + K_2 + 1 = K$, to this representative UE.²

¹ This type of precoding is possible by avoiding transmission to the scatterers common to multiple scheduled users [96].
² While the proposed PACE technique is also applicable to single carrier transmission, a detailed analysis of the same is beyond the scope of this chapter.

The BS transmits two kinds of symbols: reference symbols and data symbols. In a reference symbol, only a reference tone, i.e., a sinusoidal signal with a
pre-determined frequency known both to the BS and UE, is transmitted on the 0-th sub-carrier, and the remaining sub-carriers are all empty. On the other hand, in a data symbol all the $K$ sub-carriers are used for data transmission.³ The purpose of the reference symbols is to aid PACE and beamformer design at the RX, as shall be explained later. Since the BS can afford an accurate oscillator, it shall be assumed that the BS suffers negligible phase noise. The $M_{\rm tx} \times 1$ complex equivalent transmit signal for the 0-th symbol, if it is a reference or data symbol, respectively, can then be expressed as:

$$\tilde{\mathbf{s}}^{(r)}_{\rm tx}(t) = \sqrt{\frac{2}{T_{\rm cs}}}\, \mathbf{t} \sqrt{E^{(r)}}\, e^{j 2\pi f_c t} \tag{10.1a}$$

$$\tilde{\mathbf{s}}^{(d)}_{\rm tx}(t) = \sqrt{\frac{2}{T_{\rm cs}}}\, \mathbf{t} \Big[ \sum_{k \in \mathcal{K}} x^{(d)}_k e^{j 2\pi f_k t} \Big] e^{j 2\pi f_c t}, \tag{10.1b}$$

for $-T_{\rm cp} \le t \le T_{\rm s}$, where $\mathbf{t}$ is the $M_{\rm tx} \times 1$ unit-norm TX beamforming vector for this UE with $|\mathbf{t}| = 1$, $x^{(d)}_k$ is the data signal at the $k$-th OFDM sub-carrier, $j = \sqrt{-1}$, $f_c$ is the carrier/reference frequency, $f_k = k/T_{\rm s}$ represents the frequency offset of the $k$-th sub-carrier, $T_{\rm cs} = T_{\rm cp} + T_{\rm s}$, and $T_{\rm s}$, $T_{\rm cp}$ are the symbol duration and the cyclic prefix duration, respectively. Here the complex equivalent signal is defined such that the actual (real) transmit signal is given by $\mathbf{s}^{(\cdot)}_{\rm tx}(t) = \mathrm{Re}\{\tilde{\mathbf{s}}^{(\cdot)}_{\rm tx}(t)\}$. For the data symbols, the use of Gaussian signaling is assumed with $E^{(d)}_k = \mathbb{E}\{|x^{(d)}_k|^2\}$ for each $k \in \mathcal{K}$. The total average transmit OFDM symbol energy (including cyclic prefix) allocated to the UE is defined as $E_{\rm cs}$, where $E_{\rm cs} \ge E^{(r)}$ and $E_{\rm cs} \ge \sum_{k \in \mathcal{K}} E^{(d)}_k$. Furthermore, for convenience $f_c$ is assumed to be a multiple of $1/T_{\rm cs}$, which ensures that the reference tone has the same initial phase in each symbol.
The channel to the representative UE is assumed to be sparse with $L$ resolvable MPCs, where $L \ll M_{\rm tx}, M_{\rm rx}$, and the corresponding $M_{\rm rx} \times M_{\rm tx}$ channel impulse response matrix is given as [81]:

$$\mathbf{H}(t) = \sum_{\ell=0}^{L-1} \alpha_\ell\, \mathbf{a}_{\rm rx}(\ell)\, \mathbf{a}_{\rm tx}(\ell)^\dagger\, \delta(t - \tau_\ell), \tag{10.2}$$

where $\alpha_\ell$ is the complex amplitude, $\tau_\ell$ is the delay and $\mathbf{a}_{\rm tx}(\ell)$, $\mathbf{a}_{\rm rx}(\ell)$ are the TX and RX array response vectors, respectively, of the $\ell$-th MPC. As an illustration, the $\ell$-th RX array response vector for a uniform planar array with $M_{\rm H}$ horizontal and $M_{\rm V}$ vertical elements ($M_{\rm rx} = M_{\rm H} M_{\rm V}$) is given by $\mathbf{a}_{\rm rx}(\ell) = \bar{\mathbf{a}}_{\rm rx}\big(\phi^{\rm rx}_{\rm azi}(\ell), \phi^{\rm rx}_{\rm ele}(\ell)\big)$, where we define:

$$\bar{\mathbf{a}}_{\rm rx}(\phi^{\rm rx}_{\rm azi}, \phi^{\rm rx}_{\rm ele}) \triangleq \begin{bmatrix} 1 \\ e^{j 2\pi \frac{\Delta_{\rm H}}{\lambda} \sin(\phi^{\rm rx}_{\rm azi}) \sin(\phi^{\rm rx}_{\rm ele})} \\ \vdots \\ e^{j 2\pi \frac{\Delta_{\rm H}}{\lambda} (M_{\rm H} - 1) \sin(\phi^{\rm rx}_{\rm azi}) \sin(\phi^{\rm rx}_{\rm ele})} \end{bmatrix} \otimes \begin{bmatrix} 1 \\ e^{j 2\pi \frac{\Delta_{\rm V}}{\lambda} \cos(\phi^{\rm rx}_{\rm ele})} \\ \vdots \\ e^{j 2\pi \frac{\Delta_{\rm V}}{\lambda} (M_{\rm V} - 1) \cos(\phi^{\rm rx}_{\rm ele})} \end{bmatrix}, \tag{10.3}$$

$\phi^{\rm rx}_{\rm azi}(\ell)$, $\phi^{\rm rx}_{\rm ele}(\ell)$ are the azimuth and elevation angles of arrival for the $\ell$-th MPC, $\Delta_{\rm H}$, $\Delta_{\rm V}$ are the horizontal and vertical antenna spacings and $\lambda$ is the wavelength of the carrier signal. Expressions for $\mathbf{a}_{\rm tx}(\ell)$ can be obtained similarly. Note that in (10.2) the system bandwidth is implicitly assumed to be small enough to assume frequency-flat MPC amplitudes $\{\alpha_0, \ldots, \alpha_{L-1}\}$ and to ignore beam squinting effects [237]. To prevent inter-symbol interference, the cyclic prefix is assumed to be longer than the maximum channel delay: $T_{\rm cp} > \tau_{L-1}$.

³ In an actual implementation the data symbols may also have null and pilot sub-carriers, but these are ignored here for simplicity.

To model a time-varying channel, the MPC parameters $\{\alpha_\ell, \mathbf{a}_{\rm tx}(\ell), \mathbf{a}_{\rm rx}(\ell)\}$ are treated as aCSI, which remains constant within an aCSI coherence time and may change arbitrarily afterwards.⁴ The MPC delays $\{\tau_0, \ldots, \tau_{L-1}\}$ may however change faster, and are modeled as iCSI parameters that only remain constant within a shorter time interval called the iCSI coherence time. Note that this time variation of delays is an equivalent representation of the Doppler spread experienced by the RX.
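The array response in (10.3) and the sparse channel in (10.2) can be sketched numerically as follows. This is only an illustrative sketch: the function names, the default half-wavelength spacings, the ordering of the Kronecker product and the frequency-domain form (the Fourier transform of (10.2) evaluated at a single frequency) are assumptions made for the example, not part of the original text.

```python
import numpy as np

def upa_response(azi, ele, m_h, m_v, d_h=0.5, d_v=0.5):
    """Array response of an m_h x m_v uniform planar array, following (10.3).

    d_h and d_v are the element spacings in wavelengths (assumed 0.5 here).
    """
    horiz = np.exp(1j * 2 * np.pi * d_h * np.arange(m_h) * np.sin(azi) * np.sin(ele))
    vert = np.exp(1j * 2 * np.pi * d_v * np.arange(m_v) * np.cos(ele))
    # Kronecker product of the horizontal and vertical steering vectors.
    return np.kron(horiz, vert)

def channel_matrix(freq, alphas, delays, a_rx, a_tx):
    """Frequency response of (10.2): sum_l alpha_l a_rx(l) a_tx(l)^H e^{-j 2 pi f tau_l}."""
    h = np.zeros((a_rx[0].size, a_tx[0].size), dtype=complex)
    for alpha, tau, ar, at in zip(alphas, delays, a_rx, a_tx):
        h += alpha * np.exp(-2j * np.pi * freq * tau) * np.outer(ar, at.conj())
    return h

# Hypothetical single-path example with a 16 x 4 RX array and an 8-element TX array.
a_rx = upa_response(np.pi / 6, np.pi / 2, 16, 4)
a_tx = np.ones(8) / np.sqrt(8)
H = channel_matrix(30e9, [1.0], [20e-9], [a_rx], [a_tx])
```

Each element of the response has unit modulus, so the squared norm of `a_rx` equals the number of RX antennas, and a single MPC yields a rank-one channel matrix.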
Finally, no distribution prior or side information on $\{\alpha_\ell, \mathbf{a}_{\rm tx}(\ell), \mathbf{a}_{\rm rx}(\ell), \tau_\ell\}$ is assumed.

⁴ While each MPC may contain several unresolved sub-paths, the corresponding set of scatterers are usually co-located. Therefore the relative sub-path delays and the resulting MPC amplitude $\alpha_\ell$ are expected to vary slowly with the TX/RX movement.

The RX front-end is assumed to have an LNA followed by a BPF at each antenna element that leaves the desired signal undistorted but suppresses the out-of-band noise. The $M_{\rm rx} \times 1$ filtered complex equivalent received waveform for the 0-th symbol can then be expressed as:

$$\tilde{\mathbf{s}}^{(\cdot)}_{\rm rx}(t) = \sum_{\ell=0}^{L-1} \alpha_\ell\, \mathbf{a}_{\rm rx}(\ell)\, \mathbf{a}_{\rm tx}(\ell)^\dagger\, \tilde{\mathbf{s}}^{(\cdot)}_{\rm tx}(t - \tau_\ell) + \sqrt{2}\, \tilde{\mathbf{w}}^{(\cdot)}(t)\, e^{j 2\pi f_c t} \tag{10.4}$$

for $0 \le t \le T_{\rm s}$, where $(\cdot)$ stands for $(r)$ or $(d)$, and $\tilde{\mathbf{w}}^{(\cdot)}(t)$ is the $M_{\rm rx} \times 1$ complex equivalent, base-band, stationary, additive, vector Gaussian noise process, with individual entries being circularly symmetric, i.i.d., and having a power spectral density $S_{\rm w}(f) = N_0$ for $f_{-K_1} \le f \le f_{K_2}$. During the data transmission phase, the $M_{\rm rx} \times 1$ received waveform for a data symbol $\tilde{\mathbf{s}}^{(d)}_{\rm rx}(t)$ is phase shifted by a bank of phase-shifters whose outputs are summed and fed to a down-conversion chain for data demodulation, as in conventional analog beamforming. However, unlike conventional analog beamforming, the control signals to the phase-shifters are obtained using the reference symbols $\tilde{\mathbf{s}}^{(r)}_{\rm rx}(t)$ and PACE, as shall be discussed in the next section.

Figure 10.2: An illustrative transmission block structure for the PACE scheme.

Here the communication between a BS and UE is assumed to involve three important phases: IA, where the BS and UE find each other, timing/frequency synchronization is attained and spectral resources are allocated; analog beamformer design, where the BS and UE obtain the required CSI to update the analog precoding/combining beams; and data transmission.⁵ For convenience, the relative time scales of these steps are illustrated in Fig. 10.2.
In this chapter it is assumed that the IA and beamformer design at the BS are already achieved, and the main focus shall be on the beamformer design at the UE and the data transmission phase. Therefore perfect timing and frequency synchronization is assumed between the BS and UE, and the TX beamforming vector $\mathbf{t}$ is assumed to be pre-designed based on aCSI at the BS. Later in Section 10.4, it shall also be briefly discussed how the aCSI can be attained at the BS and how the proposed new RX architecture can aid in the IA phase of communication.

10.2 Analog beamformer design at the receiver

During each beamformer design phase, the BS transmits $D$ consecutive reference symbols to facilitate PACE for designing the RX beamformer. This process involves two steps: locking/synchronizing of a local RX oscillator to the received reference tone⁶ and using the locked oscillator to estimate the amplitude and phase-offsets at each antenna. The first $D_1$ reference symbols are used for the former step and the remaining $D_2$ symbols are used for the latter step ($D = D_1 + D_2$). Therefore $D$ is independent of $M_{\rm rx}$ and is mainly determined by the time required for the lock acquisition (see Remark 10.2.1). The first step, referred to as recovery of the reference tone, is analyzed in Section 10.2.1 and the latter step is discussed in Section 10.2.2. Then in Section 10.2.3, an improved architecture for reference tone recovery shall be explored that provides better performance, albeit with a slightly higher hardware complexity.

For convenience, it shall be assumed that the MPC delays do not change within the beamformer design phase, and are represented as $\{\hat{\tau}_0, \ldots, \hat{\tau}_{L-1}\}$ (see also Remark 10.2.2). However the delays may be different during the data transmission phase, as shall be considered in Section 10.3.
Without loss of generality, assuming the first reference symbol to be the 0-th OFDM symbol, the complex equivalent RX signal for the $D$ reference symbols at antenna $m$ can be expressed as:⁷

$$\tilde{s}^{(r)}_{{\rm rx},m}(t) = \sqrt{2}\, A^{(r)}_m e^{j 2\pi f_c t} + \sqrt{2}\, \tilde{w}^{(r)}_m(t)\, e^{j 2\pi f_c t} \tag{10.5}$$

for $0 \le t \le D T_{\rm cs} - T_{\rm cp}$, where $A^{(r)}_m \triangleq \sum_{\ell=0}^{L-1} \sqrt{\tfrac{1}{T_{\rm cs}}}\, \alpha_\ell\, [\mathbf{a}_{\rm rx}(\ell)]_m\, \mathbf{a}_{\rm tx}(\ell)^\dagger \mathbf{t} \sqrt{E^{(r)}}\, e^{-j 2\pi f_c \hat{\tau}_\ell}$ is the amplitude of the reference tone at antenna $m$.

10.2.1 Recovery of the reference tone - using a single PLL

Here the reference tone recovery is assumed to be performed by a type 2 analog PLL at antenna 1, as illustrated in Fig. 10.3. A more involved structure for recovery shall be discussed later in Section 10.2.3.

⁵ For brevity, estimation of the effective instantaneous channel between the TX and RX beams is treated as part of the data transmission step.
⁶ Note that IA based frequency synchronization may compensate for the TX-RX frequency offset via digital post-processing. Thus frequency synchronization does not imply having an RX oscillator that is locked to the reference tone.
⁷ The component of $\tilde{\mathbf{s}}^{(\cdot)}_{\rm rx}(t)$ for $-T_{\rm cp} \le t \le 0$ suffers inter-symbol interference and hence is not included here.

Figure 10.3: Block diagram of the phase locked loop at antenna 1. (Blocks shown: LNA with input $s_{{\rm rx},1}(t)$, LF, $G$, VCO with output $s_{\rm vco}(t)$, and the loop output $s_{\rm PLL}(t)$.)

Here, LF represents the loop low-pass filter, $G$ is the variable loop gain and VCO represents a voltage controlled oscillator. LF is assumed to be a first-order active low-pass filter with a transfer function ${\rm LF}(s) = 1 + \epsilon/s$ and the loop gain $G$ is assumed to adapt to the amplitude of the input such that $G|A^{(r)}_1| = \text{constant}$.⁸ While the VCO may also have its own internal noise process [238, 239], here its contribution is neglected in comparison to the additive noise at the PLL input. Without loss of generality, let the output of the VCO be expressed as:

$$s_{\rm vco}(t) = \sqrt{2} \cos[2\pi f_c t + \psi + \theta(t)] \tag{10.6}$$

where $\theta(t)$ may be arbitrary and we define $\psi \in (-\pi, \pi]$ such that $A^{(r)}_1 e^{-j\psi} = -j|A^{(r)}_1|$.
The stochastic differential equation governing the VCO output for $0 \le t \le D T_{\rm cs} - T_{\rm cp}$ can then be expressed as [232]:

$$2\pi f_c + \frac{d\theta(t)}{dt} = {\rm LF}\Big\{ \mathrm{Re}\{\tilde{s}_{{\rm rx},1}(t)\} \sqrt{2} \cos\big[2\pi f_c t + \psi + \theta(t)\big] \Big\} G + 2\pi f_{\rm vco}$$
$$= {\rm LF}\Big\{ \mathrm{Re}\big[ A^{(r)}_1 e^{-j[\psi + \theta(t)]} + \tilde{w}^{(r)}_1(t)\, e^{-j[\psi + \theta(t)]} \big] \Big\} G + 2\pi f_{\rm vco} \tag{10.7}$$

where $f_{\rm vco}$ is the free running frequency of the VCO. In this section, one is interested in characterizing the time required for locking, i.e., $D_1 T_{\rm cs}$, and the spectrum of the VCO output $s_{\rm vco}(t)$, or equivalently $\theta(t)$, during the last $D_2$ reference symbols, i.e., in the steady-state when the PLL is locked to the reference tone. The first part is answered by the following remark:

Remark 10.2.1. For the PLL considered, the phase lock acquisition time is $\frac{1}{\epsilon} \Big[ \frac{2\pi (f_c - f_{\rm vco})}{|A^{(r)}_1| G} \Big]^2$ in the no noise scenario [232, 233]. Thus $\epsilon$ and $|A^{(r)}_1| G$ must be of the orders of $1/T_{\rm s}$ and $2\pi |f_c - f_{\rm vco}|$, respectively, to keep $D_1$ small. Numerous techniques [240, 241] have been proposed to further reduce the acquisition time, which are not discussed here for brevity.

Next, the steady-state distribution of $\theta(t)$ (modulo $2\pi$) is characterized. In the steady-state, it can be shown that $\theta(t)$ (modulo $2\pi$) is approximately a zero mean random process [232, 233]. To enable simple closed-form expressions for $\theta(t)$ (and hence $s_{\rm vco}(t)$), (10.7) is linearized using the following widely used approximations:
1. For linearizing the PLL, cycle slips are neglected and the VCO phase deviations are assumed to be small, such that $e^{-j\theta(t)} \approx 1 - j\theta(t)$.
2. The distribution of the base-band noise process $\tilde{w}^{(r)}_1(t)$ is assumed to be invariant to multiplication with $e^{-j[\psi + \theta(t)]}$, i.e., $\hat{w}^{(r)}_1(t) \triangleq \tilde{w}^{(r)}_1(t)\, e^{-j[\psi + \theta(t)]}$ is also a Gaussian noise process with power spectral density $S_{\rm w}(f)$.

⁸ Such a variable gain can possibly be implemented by using an automatic gain control circuit.

Approximation 1 is accurate in the steady-state in the large SNR regime, while Approximation 2 is accurate when the noise bandwidth is much larger than the loop filter bandwidth [226, 232].
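As a rough numerical illustration of Remark 10.2.1, the noise-free acquisition time, read here as $\frac{1}{\epsilon}\big[2\pi(f_c - f_{\rm vco})/(|A^{(r)}_1|G)\big]^2$, can be converted into a number of reference symbols $D_1$. The parameter values below are assumptions chosen to match the orders of magnitude suggested in the remark ($\epsilon \sim 1/T_{\rm s}$, $|A^{(r)}_1|G \sim 2\pi|f_c - f_{\rm vco}|$); the function name is hypothetical.

```python
import math

def pll_acquisition_time(eps, freq_offset_hz, gain_times_amp):
    """Noise-free lock acquisition time of the type 2 PLL, as read from Remark 10.2.1.

    eps            -- loop filter parameter in LF(s) = 1 + eps/s
    freq_offset_hz -- f_c - f_vco in Hz
    gain_times_amp -- the product |A_1| G
    """
    return (1.0 / eps) * (2.0 * math.pi * freq_offset_hz / gain_times_amp) ** 2

# Assumed example values (illustrative only).
t_s = 1e-6                    # OFDM symbol duration
t_cs = 1.1e-6                 # symbol duration including cyclic prefix
eps = 4.0 / t_s               # of the order of 1/T_s
freq_offset = 5e6             # |f_c - f_vco|
gain_times_amp = 2.0 * math.pi * freq_offset

t_acq = pll_acquisition_time(eps, freq_offset, gain_times_amp)
d1_min = math.ceil(t_acq / t_cs)  # smallest whole number of symbols covering t_acq
```

With these assumed values the bracket equals one, so the acquisition time reduces to $1/\epsilon$, i.e., a fraction of one OFDM symbol. In practice $D_1$ would be chosen with some margin, since noise lengthens the acquisition.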
Using these approximations and the definition of $\psi$, (10.7) can be linearized as:

$$2\pi [f_c - f_{\rm vco}] + \frac{d\theta_{\rm L}(t)}{dt} = {\rm LF}\Big\{ -|A^{(r)}_1|\, \theta_{\rm L}(t) + \frac{\hat{w}^{(r)}_1(t) + [\hat{w}^{(r)}_1(t)]^*}{2} \Big\} G \tag{10.8}$$

where $\theta(t)$ is replaced by $\theta_{\rm L}(t)$ to denote use of the linear approximation, and the fact that $f_c$ is much larger than the bandwidth of LF is used. Note that for sufficient SNR, $\theta(t) \overset{d}{\approx} \theta_{\rm L}(t)$ (modulo $2\pi$) during the last $D_2$ reference symbols. Assuming $\theta_{\rm L}(0) = 0$ and that the PLL input is 0 for $t \le 0$, and taking the Laplace transform on both sides of (10.8), we have:

$$\frac{2\pi [f_c - f_{\rm vco}]}{s} + s\, \Theta_{\rm L}(s) = G\, {\rm LF}(s) \Big[ -|A^{(r)}_1|\, \Theta_{\rm L}(s) + \frac{\hat{W}^{(r)}_1(s) + [\hat{W}^{(r)}_1(s^*)]^*}{2} \Big] \tag{10.9}$$

where $\Theta_{\rm L}(s)$ and $\hat{W}^{(r)}_1(s)$ are the Laplace transforms of $\theta_{\rm L}(t)$ and $\hat{w}^{(r)}_1(t)$, respectively. It can be verified using the final value theorem that the contribution of the first term on the left hand side of (10.9) vanishes as $t$ increases. Furthermore, in the steady-state, $\theta_{\rm L}(t)$ is a zero mean, stationary Gaussian process [238]. Therefore, ignoring the first term in (10.9), the power spectral density, auto-correlation function and variance of $\theta_{\rm L}(t)$ in steady-state can be computed, respectively, as:

$$S_{\theta_{\rm L}}(f) = \frac{|G|^2 (4\pi^2 f^2 + \epsilon^2)\, S_{\rm w}(f)}{2\, \big| -4\pi^2 f^2 + G (j 2\pi f + \epsilon) |A^{(r)}_1| \big|^2} \tag{10.10}$$

$$R_{\theta_{\rm L}}(\tau) = \int_{-\infty}^{\infty} S_{\theta_{\rm L}}(f)\, e^{j 2\pi f \tau}\, df \approx \frac{|G|^2 N_0}{4} \Big[ \frac{a^2 - \epsilon^2}{a (a^2 - b^2)} e^{-a|\tau|} + \frac{b^2 - \epsilon^2}{b (b^2 - a^2)} e^{-b|\tau|} \Big] \tag{10.11}$$

$$\mathrm{Var}\{\theta_{\rm L}(t)\} = R_{\theta_{\rm L}}(0) \approx \frac{N_0 \big( |A^{(r)}_1| G + \epsilon \big)}{4 |A^{(r)}_1|^2}, \tag{10.12}$$

where $2a = G|A^{(r)}_1| + \sqrt{G^2 |A^{(r)}_1|^2 - 4 G \epsilon |A^{(r)}_1|}$, $2b = G|A^{(r)}_1| - \sqrt{G^2 |A^{(r)}_1|^2 - 4 G \epsilon |A^{(r)}_1|}$, (10.11)-(10.12) follow from finding the inverse Fourier transform via partial fraction expansion, and the final expressions follow by observing that $S_{\rm w}(f) \approx N_0$ for all $f$.

10.2.2 Phase and amplitude offset estimation

Using the steady-state (approximate) statistics for $\theta(t)$ derived in the last subsection, here the amplitude and phase-offset estimates at each antenna, represented as the integrator outputs in Fig. 10.1, shall be characterized. Note from Fig.
10.1 that the PLL output at antenna 1 is fed to a $\pi/2$ phase shifter to obtain its quadrature component. From (10.6), the in-phase and quadrature-phase components of the PLL output in steady-state, i.e., for $D_1 T_{\rm cs} - T_{\rm cp} \le t \le D T_{\rm cs} - T_{\rm cp}$, can be expressed together as:

$$\tilde{s}_{\rm PLL}(t) = \sqrt{2}\, e^{j[2\pi f_c t + \psi + \theta(t)]}. \tag{10.13}$$

At each antenna, the received signal for the reference symbols is multiplied by the in-phase and quadrature-phase components of the VCO output, and the outputs are fed to a 'filter, sample and hold' circuit. This circuit involves a low-pass filter with a bandwidth of $1/(D_2 T_{\rm cs})$, followed by a sample and hold circuit that samples the filtered output at the end of the $D$ reference symbols. For convenience, in this chapter the 'filter, sample and hold' outputs shall be approximated by an integrate and hold operation, as depicted in Fig. 10.1. Representing the 'filter, sample and hold' outputs corresponding to the in-phase and quadrature-phase components of the PLL output as real and imaginary, respectively, the corresponding $M_{\rm rx} \times 1$ sampled complex output vector can be approximated as:

$$\mathbf{I}_{\rm PACE} \approx \frac{1}{D_2} \int_{T_1}^{T_2} \mathrm{Re}\{\tilde{\mathbf{s}}^{(r)}_{\rm rx}(t)\}\, \tilde{s}^*_{\rm PLL}(t)\, dt = \frac{1}{D_2} \int_{T_1}^{T_2} \Big[ \sqrt{\frac{1}{T_{\rm cs}}}\, \hat{\mathbf{H}}(0)\, \mathbf{t} \sqrt{E^{(r)}}\, e^{-j[\psi + \theta(t)]} + \hat{\mathbf{w}}^{(r)}(t) \Big] dt \tag{10.14}$$

where $\frac{1}{D_2}$ is a scaling factor, $T_1 \triangleq D_1 T_{\rm cs} - T_{\rm cp}$, $T_2 \triangleq D T_{\rm cs} - T_{\rm cp}$, $\hat{\mathbf{H}}(f_k) \triangleq \sum_{\ell=0}^{L-1} \alpha_\ell\, \mathbf{a}_{\rm rx}(\ell)\, \mathbf{a}_{\rm tx}(\ell)^\dagger\, e^{-j 2\pi (f_c + f_k) \hat{\tau}_\ell}$ is the $M_{\rm rx} \times M_{\rm tx}$ frequency-domain channel matrix for the $k$-th sub-carrier during the beamformer design phase, and $\hat{\mathbf{w}}^{(r)}(t) \triangleq \tilde{\mathbf{w}}^{(r)}(t)\, e^{-j[\psi + \theta(t)]}$ is an $M_{\rm rx} \times 1$ i.i.d. Gaussian noise process vector with power spectral density $S_{\rm w}(f)$ (see Approximation 2). From (10.11), note that the auto-correlation function of $\theta_{\rm L}$ decays exponentially with a time constant of $O(1/G|A^{(r)}_1|)$. Therefore, for $G|A^{(r)}_1| \gg 1/(D_2 T_{\rm cs})$, $\mathbf{I}_{\rm PACE}$ experiences enough independent realizations of $\theta(t)$.
Therefore, replacing the integral with an expectation over the VCO phase noise in (10.14), we have:

$$\mathbf{I}_{\rm PACE} \overset{(1)}{\approx} \sqrt{T_{\rm cs}}\, \hat{\mathbf{H}}(0)\, \mathbf{t} \sqrt{E^{(r)}}\, e^{-j\psi}\, \mathbb{E}\{e^{-j\theta_{\rm L}(t)}\} + \int_{T_1}^{T_2} \frac{\hat{\mathbf{w}}^{(r)}(t)}{D_2}\, dt \overset{(2)}{=} \sqrt{T_{\rm cs}}\, \hat{\mathbf{H}}(0)\, \mathbf{t} \sqrt{E^{(r)}}\, e^{-j\psi}\, e^{-\frac{\mathrm{Var}\{\theta_{\rm L}(t)\}}{2}} + \sqrt{T_{\rm cs}}\, \hat{\mathbf{W}}^{(r)} \tag{10.15}$$

where $\overset{(1)}{\approx}$ follows from the fact that $\theta(t) \overset{d}{\approx} \theta_{\rm L}(t)$ (modulo $2\pi$) in steady-state, and $\overset{(2)}{=}$ follows by defining $\hat{\mathbf{W}}^{(r)} \triangleq \frac{1}{D_2 \sqrt{T_{\rm cs}}} \int_{T_1}^{T_2} \hat{\mathbf{w}}^{(r)}(t)\, dt$, by noting that $\theta_{\rm L}(t)$ is a stationary Gaussian process with variance as in (10.12), and by using the expression for its characteristic function. Since $\hat{\mathbf{w}}^{(r)}(t)$ is i.i.d. Gaussian with a power spectral density $S_{\rm w}(f)$, it can be verified that $\hat{\mathbf{W}}^{(r)} \sim \mathcal{CN}[\mathbf{O}_{M_{\rm rx} \times 1}, (N_0/D_2)\, \mathbf{I}_{M_{\rm rx}}]$ when $1/D_2 \ll K_1, K_2$. The outputs of these sample and hold circuits are then used as control signals to the RX phase-shifter array, to generate the RX analog beam to be used during the data transmission phase. From (10.15) and (10.12), note that either $D_2$ or $|A^{(r)}_1|$ can be increased to reduce the impact of the noise $\hat{\mathbf{W}}^{(r)}$ on the analog beam. Since $|A^{(r)}_1|$ is a non-decreasing function of $E^{(r)}$ (see (10.5)), this implies that $E^{(r)}$ should be kept as large as possible while satisfying $E^{(r)} \le E_{\rm cs}$ and meeting the spectral mask regulations.

Note that the results in this section are based on several approximations, including the linear phase noise analysis in Section 10.2.1. To test the accuracy of these results, the numerical values of $\int_{T_1}^{T_2} \frac{e^{-j\theta(t)}}{D_2 T_{\rm cs}}\, dt$, obtained by simulating realizations of $\theta(t)$ from (10.7), are compared to its analytic approximation $e^{-\frac{\mathrm{Var}\{\theta_{\rm L}(t)\}}{2}}$ in Fig. 10.4. Note that this comparison reflects the accuracy of the approximation in (10.15). As is evident from Fig. 10.4, (10.15) is accurate above a certain SNR. Additionally, since $\mathbf{I}_{\rm PACE}$ decays exponentially with $\mathrm{Var}\{\theta_{\rm L}(t)\}$ (see (10.15)), one can observe from Fig. 10.4a that the mean integrator output drops drastically below a certain threshold SNR.
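The closed-form variance (10.12) and the resulting attenuation factor $e^{-\mathrm{Var}\{\theta_{\rm L}(t)\}/2}$ in (10.15) can be sanity-checked by numerically integrating the power spectral density (10.10). The sketch below assumes $S_{\rm w}(f) = N_0$ for all $f$, as in the derivation of (10.12); the parameter values, function names and integration grid are illustrative assumptions and are not taken from the simulations of this chapter.

```python
import numpy as np

def phase_noise_psd(f, n0, gain, eps, amp):
    """PSD of the linearized VCO phase, (10.10), with S_w(f) = N0 for all f."""
    w = 2.0 * np.pi * f
    num = gain**2 * (w**2 + eps**2) * n0
    den = 2.0 * np.abs(-w**2 + gain * amp * (1j * w + eps)) ** 2
    return num / den

def phase_noise_variance(n0, gain, eps, amp):
    """Closed-form steady-state variance, (10.12)."""
    return n0 * (amp * gain + eps) / (4.0 * amp**2)

def integrate_psd(n0, gain, eps, amp, f_max=1e11, points=20001):
    """Trapezoidal integration of the (even) PSD over [-f_max, f_max]."""
    f = np.concatenate(([0.0], np.logspace(0.0, np.log10(f_max), points)))
    s = phase_noise_psd(f, n0, gain, eps, amp)
    one_sided = np.sum(0.5 * (s[1:] + s[:-1]) * np.diff(f))
    return 2.0 * one_sided

# Assumed example values (illustrative only).
t_s = 1e-6
eps = 4.0 / t_s
amp = 1.0                    # |A_1|
gain = np.pi * 5e6 / amp     # G, chosen so that G|A_1| is of the order of |f_c - f_vco|
n0 = 1e-8

var_closed = phase_noise_variance(n0, gain, eps, amp)
var_numeric = integrate_psd(n0, gain, eps, amp)
mean_attenuation = np.exp(-var_closed / 2.0)  # factor multiplying the signal term in (10.15)
```

The numerical integral should agree with (10.12) up to the truncation of the frequency axis. Increasing `n0`, i.e., lowering the SNR, increases the variance and shrinks `mean_attenuation`, which is the mechanism behind the drop in the mean integrator output at low SNR.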
As shall be shown in Section 10.3, such a drop in the mean causes a sharp degradation in the system performance below this threshold SNR. Therefore in the next subsection a better reference recovery circuit is proposed, referred to as weighted carrier arraying, that reduces the SNR threshold.

Figure 10.4: Accuracy of the filter, sample and hold outputs as a function of SNR for the single PLL and weighted arraying architectures. Panel (a), 1 - Mean: $1 - \mathbb{E}\Big\{ \int_{T_1}^{T_2} \frac{e^{-j\theta(t)}}{D_2 T_{\rm cs}}\, dt \Big\}$. Panel (b), Variance: $\mathbb{E}\Big\{ \Big| \int_{T_1}^{T_2} \frac{e^{-j\theta(t)}}{D_2 T_{\rm cs}}\, dt - e^{-\frac{\mathrm{Var}\{\theta_{\rm L}\}}{2}} \Big|^2 \Big\}$. It is assumed that $A^{(r)}_1 = 1$, $A^{(r)}_5 = 0.7 e^{j\pi/3}$, $A^{(r)}_{15} = 0.5 e^{j\pi/3}$ and the remaining simulation parameters are from Table 10.1.

Table 10.1: Single PLL and weighted arraying simulation parameters

| Parameter | Value |
| $f_c$ | 30 GHz |
| $f_c - f_{\rm vco}$ | 5 MHz |
| $T_{\rm s}$ | 1 μs |
| $T_{\rm cp}$ | 0.1 μs |
| $K_1$ | 512 |
| $K_2$ | 511 |
| $G|A^{(r)}_1|$ | $\pi |f_c - f_{\rm vco}|$ |
| $\epsilon$ | $4/T_{\rm s}$ |
| $f_{\rm IF}$ | 1 GHz |
| $f_c - f_{\rm IF} - f^{\rm p}_{\rm vco}$ | 5 MHz |
| $\mathcal{M}$ | $\{1, 5, 15\}$ |
| $\kappa$ | $2/T_{\rm s}$ |
| $G^{\rm p} |A^{(r)}_{\rm rss}|^2$ | $\pi |f_c - f_{\rm IF} - f^{\rm p}_{\rm vco}|$ |
| $\epsilon_{\rm p}$ | $4/T_{\rm s}$ |

Remark 10.2.2. The preceding derivations assumed that the MPC delays are identical for the $D$ reference symbols. However, since the PLL continuously tracks the RX signal and phase/amplitude estimation at each antenna is performed simultaneously, these results are valid even if the delays change slowly within the beamformer design phase.

Remark 10.2.3. Neither the RX phase-shifter array nor the down-conversion chain is utilized during the $D$ reference symbols of the beamformer design phase. Therefore, data reception is also possible during these $D$ reference symbols in parallel, as long as a sufficient guard band between the data sub-carriers and the reference sub-carrier is provided (similar to (10.27)) to reduce the impact on the PLL performance.
Note that in a multi-cellular scenario, use of the same reference tone in adjacent cells can cause reference tone contamination, i.e., $\mathbf{I}_{\rm PACE}$ may contain components corresponding to the channel from a neighboring BS. This is analogous to pilot contamination in conventional CE approaches, and can be avoided by using different, well-separated reference frequencies in adjacent cells, as also suggested by us in [225].

10.2.3 Recovery of the reference tone - using weighted carrier arraying

One way to improve the performance and reduce the PLL SNR threshold is via weighted carrier arraying, as illustrated in Fig. 10.5. In weighted carrier arraying, apart from the main primary PLL, secondary PLLs are used at a subset $\mathcal{M}$ of antennas, which compensate for the inter-antenna phase shift. The resulting phase compensated signals from the $\mathcal{M}$ antennas are weighted, combined and tracked by the primary PLL, which operates at a higher SNR and with a wider loop bandwidth than the secondary PLLs. Note that this architecture can be interpreted as a generalization of the design proposed in [230, 231, 242, 243] to multi-path channels. Next, the linear steady-state noise analysis of this system in a multi-path channel is presented. However, an analysis of the transient behavior and lock acquisition time of this design is beyond the scope of this chapter.

Figure 10.5: Block diagram of weighted carrier arraying for reference tone recovery. (Blocks shown: per-antenna secondary PLLs, each with an LNA, LPFs, gain $G^{\rm s}_m$, a secondary VCO and a weight $1/G^{\rm s}_m$; and the arrayed primary PLL with a summation, LF$_{\rm p}$, gain $G^{\rm p}$, a primary VCO, a mixer with $\cos(2\pi f_{\rm IF} t)$, a BPF and the output $s_{\rm PLL}(t)$.)

In Fig. 10.5, LPF/BPF refer to low-pass and band-pass filters with a wide bandwidth, designed only to remove the unwanted side-band of the mixer outputs.
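Without loss of generality, let the outputs of the primary and secondary VCOs be expressed as:⁹

$$s^{\rm p}_{\rm vco}(t) = \sqrt{2} \cos[2\pi (f_c - f_{\rm IF}) t + \theta(t)]$$
$$s^{\rm s}_{{\rm vco},m}(t) = \sqrt{2} \cos[2\pi f_{\rm IF} t + \psi_m + \theta_m(t)], \quad m \in \mathcal{M}$$

respectively, where $\theta(t)$, $\theta_m(t)$ are arbitrary, $f_{\rm IF}$ is the common free running frequency of the secondary VCOs, and $\psi_m$ are such that $A^{(r)}_m e^{-j\psi_m} = -j|A^{(r)}_m|$. Now, similar to Section 10.2.1, from (10.5) the differential equation governing the secondary PLL at antenna $m \in \mathcal{M}$ can be expressed as:

$$\frac{d\theta_m(t)}{dt} = \mathrm{Re}\Big\{ A^{(r)}_m e^{-j[\psi_m + \theta_m(t) + \theta(t)]} + \tilde{w}^{(r)}_m(t)\, e^{-j[\psi_m + \theta_m(t) + \theta(t)]} \Big\} \frac{G^{\rm s}_m}{\sqrt{2}} = \mathrm{Re}\Big\{ -j|A^{(r)}_m|\, e^{-j[\theta_m(t) + \theta(t)]} + \hat{w}^{(r)}_m(t) \Big\} \frac{G^{\rm s}_m}{\sqrt{2}} \tag{10.16}$$

where we define $\hat{w}^{(r)}_m(t) \triangleq \tilde{w}^{(r)}_m(t)\, e^{-j[\psi_m + \theta_m(t) + \theta(t)]}$ and $G^{\rm s}_m$ is the loop gain of the secondary VCO at antenna $m$. Similarly, for the primary VCO we have:

$$2\pi (f_c - f_{\rm IF}) + \frac{d\theta(t)}{dt} = {\rm LF}_{\rm p}\Big\{ \sum_{m \in \mathcal{M}} \mathrm{Re}\Big[ -j|A^{(r)}_m|\, e^{-j[\theta_m(t) + \theta(t)]} + \hat{w}^{(r)}_m(t) \Big] \frac{1}{G^{\rm s}_m} \Big\} \frac{G^{\rm p}}{\sqrt{2}} + 2\pi f^{\rm p}_{\rm vco} \tag{10.17}$$

where $f^{\rm p}_{\rm vco}$ is the free running frequency of the primary VCO, $G^{\rm p}$ is the loop gain and LF$_{\rm p}$ is an active low-pass filter with transfer function ${\rm LF}_{\rm p}(s) = (1 + \epsilon_{\rm p}/s)$. Similar to Section 10.2.1, the linear analysis shall be relied upon by using 1) $e^{-j[\theta_m(t) + \theta(t)]} \approx 1 - j[\theta_m(t) + \theta(t)]$, which is accurate in the high SNR steady-state when $\theta_m(t) + \theta(t) \ll 1$, and 2) $\hat{w}^{(r)}_m(t) \overset{d}{=} \tilde{w}^{(r)}_m(t)$, which is accurate for a wide noise bandwidth. Using these approximations in (10.16)-(10.17) with zero initial conditions and taking Laplace transforms, one obtains:

$$s\, \Theta_{{\rm L},m}(s) = \Big[ -|A^{(r)}_m| \big[ \Theta_{{\rm L},m}(s) + \Theta_{\rm L}(s) \big] + \frac{\hat{W}^{(r)}_m(s) + [\hat{W}^{(r)}_m(s^*)]^*}{2} \Big] \frac{G^{\rm s}_m}{\sqrt{2}} \tag{10.18a}$$

$$s\, \Theta_{\rm L}(s) = {\rm LF}_{\rm p}(s) \sum_{m \in \mathcal{M}} \Big[ -\frac{|A^{(r)}_m|}{G^{\rm s}_m} \big[ \Theta_{{\rm L},m}(s) + \Theta_{\rm L}(s) \big] + \frac{\hat{W}^{(r)}_m(s) + [\hat{W}^{(r)}_m(s^*)]^*}{2 G^{\rm s}_m} \Big] \frac{G^{\rm p}}{\sqrt{2}} + \frac{2\pi (f_{\rm IF} + f^{\rm p}_{\rm vco} - f_c)}{s} \tag{10.18b}$$

where $\hat{W}^{(r)}_m(s)$, $\Theta_{\rm L}(s)$ and $\Theta_{{\rm L},m}(s)$ are the Laplace transforms of $\hat{w}^{(r)}_m(t)$, the linear approximation $\theta_{\rm L}(t)$ and the linear approximation $\theta_{{\rm L},m}(t)$, respectively.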
It is assumed that the loop gains of the PLLs adapt to the amplitudes of the input such that $|A^{(r)}_m| G^{\rm s}_m = \kappa, \forall m \in \mathcal{M}$ and $\sum_{m \in \mathcal{M}} G^{\rm p} |A^{(r)}_m|^2 = \text{constant}$. Then, solving the system of equations in (10.18), one obtains:

$$\Big[ s + \sum_{m \in \mathcal{M}} \frac{(s + \epsilon_{\rm p}) |A^{(r)}_m|^2 G^{\rm p}}{\kappa (\sqrt{2} s + \kappa)} \Big] \Theta_{\rm L}(s) = \sum_{m \in \mathcal{M}} \frac{(s + \epsilon_{\rm p}) \big( \hat{W}^{(r)}_m(s) + [\hat{W}^{(r)}_m(s^*)]^* \big) |A^{(r)}_m| G^{\rm p}}{2 \kappa (\sqrt{2} s + \kappa)} + \frac{2\pi (f_{\rm IF} + f^{\rm p}_{\rm vco} - f_c)}{s} \tag{10.19}$$

⁹ Another convergence point for $s^{\rm p}_{\rm vco}(t)$ is at a frequency of $(f_c + f_{\rm IF})$. But the final results presented here are also valid for this alternate convergence point.

It can be verified using the final value theorem that the last term in (10.19) only contributes a constant phase shift in steady-state, say $\bar{\theta}_{\rm L}$.¹⁰ Thus the steady-state power spectral density of the time varying part of $\theta_{\rm L}(t)$, i.e., $\theta_{\rm L}(t) - \bar{\theta}_{\rm L}$, can be obtained as:

$$S_{\theta_{\rm L} - \bar{\theta}_{\rm L}}(f) = \frac{N_0 |A^{(r)}_{\rm rss} G^{\rm p}|^2}{2} \left| \frac{(s + \epsilon_{\rm p})}{\kappa s (\sqrt{2} s + \kappa) + (s + \epsilon_{\rm p}) |A^{(r)}_{\rm rss}|^2 G^{\rm p}} \right|^2_{s = j 2\pi f} \tag{10.20}$$

where $[A^{(r)}_{\rm rss}]^2 = \sum_{m \in \mathcal{M}} |A^{(r)}_m|^2$. Finding the inverse Fourier transform via partial fraction expansion, the variance of $\theta(t)$ can be obtained as:

$$\mathrm{Var}\{\theta_{\rm L}(t)\} = \frac{\big( |A^{(r)}_{\rm rss}|^2 [G^{\rm p}/\kappa] + \sqrt{2} \epsilon_{\rm p} \big) [G^{\rm p}/\kappa] N_0}{4 \sqrt{2} \big( \kappa + |A^{(r)}_{\rm rss}|^2 [G^{\rm p}/\kappa] \big)} \approx \frac{\big( |A^{(r)}_{\rm rss}|^2 [G^{\rm p}/\sqrt{2} \kappa] + \epsilon_{\rm p} \big) N_0}{4 |A^{(r)}_{\rm rss}|^2} \tag{10.21}$$

Comparing (10.21) to (10.12), note that the PLL phase noise is essentially reduced by the maximal ratio combining gain corresponding to the $\mathcal{M}$ antennas. As $s_{\rm PLL}(t)$ in Fig. 10.5 suffers the same phase noise as $\theta(t)$, the corresponding 'filter, sample and hold' outputs with weighted carrier arraying can be obtained by using (10.21) in (10.15). The accuracy of the resulting approximation is studied in Fig. 10.4.

10.3 Data transmission

During the data transmission phase, OFDM symbols of type (10.1b) are transmitted and the corresponding received signals are processed via the phase-shifter array with $\mathbf{I}_{\rm PACE}$ as the control signals.
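Without loss of generality, again assuming the 0-th OFDM symbol as a representative data symbol, the combined data signal at the RX for $0 \le t \le T_{\rm s}$ can be expressed as:⁷

$$R(t) = \frac{1}{\sqrt{2}}\, \mathbf{I}^\dagger_{\rm PACE} \Big[ \sum_{\ell=0}^{L-1} \sum_{k \in \mathcal{K}} \sqrt{\frac{2}{T_{\rm cs}}}\, \alpha_\ell\, \mathbf{a}_{\rm rx}(\ell)\, \mathbf{a}_{\rm tx}(\ell)^\dagger\, \mathbf{t}\, x_k\, e^{j 2\pi (f_c + f_k)(t - \tau_\ell)} + \sqrt{2}\, \tilde{\mathbf{w}}^{(d)}(t)\, e^{j 2\pi f_c t} \Big]$$

where the $1/\sqrt{2}$ term in the first step is just a scaling constant for convenience and the MPC delays for this representative data symbol are assumed to be $\{\tau_0, \ldots, \tau_{L-1}\}$. This phase shifted and combined signal $R(t)$ is then converted to base-band by a separate RX oscillator, whose phase offset and noise are assumed to be mitigated via some digital phase noise compensation techniques [239, 244-246]. Therefore, neglecting its phase noise, the resulting base-band signal can be expressed as $R_{\rm BB}(t) = R(t)\, e^{-j 2\pi f_c t}$. This signal is then sampled and OFDM demodulation follows. The OFDM demodulation output for the $k$-th sub-carrier ($k \in \mathcal{K}$) is then given by:

$$Y_k = \frac{1}{K} \sum_{u=0}^{K-1} R_{\rm BB}\Big( \frac{u T_{\rm s}}{K} \Big) e^{-j \frac{2\pi k u}{K}} = \frac{1}{\sqrt{T_{\rm cs}}}\, \mathbf{I}^\dagger_{\rm PACE}\, \mathbf{H}(f_k)\, \mathbf{t}\, x_k + \frac{1}{\sqrt{T_{\rm cs}}}\, \mathbf{I}^\dagger_{\rm PACE}\, \tilde{\mathbf{W}}^{(d)}[k] \tag{10.22}$$

where $\mathbf{H}(f_k) \triangleq \sum_{\ell=0}^{L-1} \alpha_\ell\, \mathbf{a}_{\rm rx}(\ell)\, \mathbf{a}_{\rm tx}(\ell)^\dagger\, e^{-j 2\pi (f_c + f_k) \tau_\ell}$ is the $M_{\rm rx} \times M_{\rm tx}$ frequency domain channel matrix for the $k$-th data sub-carrier and $\tilde{\mathbf{W}}^{(d)}[k] \triangleq \frac{\sqrt{T_{\rm cs}}}{K} \sum_{u=0}^{K-1} \tilde{\mathbf{w}}^{(d)}\big( \frac{u T_{\rm s}}{K} \big) e^{-j \frac{2\pi k u}{K}}$, with $\tilde{\mathbf{W}}^{(d)}[k]$ being independently distributed for each $k \in \mathcal{K}$ as $\tilde{\mathbf{W}}^{(d)}[k] \sim \mathcal{CN}[\mathbf{O}_{M_{\rm rx} \times 1}, (N_0 T_{\rm cs}/T_{\rm s})\, \mathbf{I}_{M_{\rm rx}}]$.

¹⁰ Simulations suggest this steady-state phase shift for the actual non-linear system (10.16)-(10.17) is noise dependent. However such an arbitrary, but constant, phase shift does not impact the resulting beamforming gain if the cycle skipping probability is low.

Note from (10.15) that $\mathbf{I}^\dagger_{\rm PACE}$ is similar (with appropriate scaling), but not identical, to the perfect MRC beamformer for the $k$-th sub-carrier: $\mathbf{t}^\dagger \mathbf{H}(f_k)^\dagger$.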
The mismatch is due to the beamforming noise $\hat{\mathbf{W}}^{(r)}$ and because the reference symbols and the $k$-th sub-carrier data stream pass through slightly different channels, owing to the difference in sub-carrier frequencies and the MPC delays ($\hat{\tau}_\ell \ne \tau_\ell$). Consequently, the beamformer $\mathbf{I}_{\rm PACE}$ only achieves imperfect MRC combining, causing some loss in performance and causing the effective channel coefficients $\mathbf{I}^\dagger_{\rm PACE} \mathbf{H}(f_k) \mathbf{t}$ to vary with the sub-carrier index $k$, i.e., the system experiences frequency-selective fading. Furthermore, since the MPC delays $\{\tau_0, \ldots, \tau_{L-1}\}$ change after every iCSI coherence time, so may these channel coefficients. As depicted in Fig. 10.2, the TX is assumed to transmit pilot symbols within each iCSI coherence time, using which the RX can estimate the coefficients $\{\mathbf{I}^\dagger_{\rm PACE} \mathbf{H}(f_k) \mathbf{t} \mid k \in \mathcal{K}\}$. Since these pilots are used only to estimate the effective single-input-single-output channel and not the actual MIMO channel, the corresponding overhead is small and shall be neglected here. Assuming perfect estimates of these channel coefficients, the SNR for the $k$-th sub-carrier and the iSE can be expressed, respectively, as:

$$\gamma^{\rm PACE}_k(\hat{\mathbf{W}}^{(r)}, \mathbf{H}(t)) \triangleq \frac{|\mathbf{I}^\dagger_{\rm PACE} \mathbf{H}(f_k) \mathbf{t}|^2 E^{(d)}_k}{\|\mathbf{I}_{\rm PACE}\|^2 N_0 T_{\rm cs}/T_{\rm s}} \tag{10.23}$$

$$\mathrm{iSE}^{\rm PACE}\big( \hat{\mathbf{W}}^{(r)}, \mathbf{H}(t) \big) \triangleq \sum_{k \in \mathcal{K}} \frac{1}{K} \log\big( 1 + \gamma^{\rm PACE}_k(\hat{\mathbf{W}}^{(r)}, \mathbf{H}(t)) \big), \tag{10.24}$$

where the cyclic prefix overhead in (10.24) is neglected for convenience. Note that the iSE maximizing data power allocation $\{E^{(d)}_k \mid k \in \mathcal{K}\}$ can be obtained via water-filling across the sub-carriers. While the exact expressions for (10.23)-(10.24) are involved, their expectations with respect to $\mathbf{I}_{\rm PACE}$ can be bounded, as stated by the following theorem.

Theorem 10.3.1.
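If the RX array response vectors for the channel MPCs are mutually orthogonal, i.e., $\mathbf{a}_{\rm rx}(\ell)^\dagger \mathbf{a}_{\rm rx}(i) = 0$ for $\ell \ne i$, the SNR and iSE, averaged over the beamformer noise $\hat{\mathbf{W}}^{(r)}$, can be bounded as:

$$\bar{\gamma}^{\rm PACE}_k(\mathbf{H}(t)) \gtrsim \frac{M_{\rm rx} E^{(r)} e^{-\mathrm{Var}\{\theta_{\rm L}(t)\}} |\beta(f_c, f_k)|^2 E^{(d)}_k}{\beta(0, 0) \frac{N_0}{D_2} E^{(d)}_k + \beta(0, 0) \frac{N_0 T_{\rm cs}}{T_{\rm s}} E^{(r)} e^{-\mathrm{Var}\{\theta_{\rm L}(t)\}} + \frac{[N_0]^2 T_{\rm cs}}{D_2 T_{\rm s}}} \tag{10.25a}$$

$$\overline{\mathrm{iSE}}^{\rm PACE}\big( \mathbf{H}(t) \big) \gtrsim \sum_{k \in \mathcal{K}} \frac{1}{K} \log\left( 1 + \frac{M_{\rm rx} E^{(r)} e^{-\mathrm{Var}\{\theta_{\rm L}(t)\}} |\beta(f_c, f_k)|^2 E^{(d)}_k}{\beta(0, 0) \frac{N_0}{D_2} E^{(d)}_k + \beta(0, 0) \frac{N_0 T_{\rm cs}}{T_{\rm s}} E^{(r)} e^{-\mathrm{Var}\{\theta_{\rm L}(t)\}} + \frac{[N_0]^2 T_{\rm cs}}{D_2 T_{\rm s}}} \right) \tag{10.25b}$$

where $\beta(\dot{f}, \ddot{f}) = \sum_{\ell=0}^{L-1} |\alpha_\ell|^2 |\mathbf{a}_{\rm tx}(\ell)^\dagger \mathbf{t}|^2 e^{j[2\pi \dot{f} (\hat{\tau}_\ell - \tau_\ell) - 2\pi \ddot{f} \tau_\ell]}$ and $\gtrsim$ represents an inequality at a high enough SNR such that the approximations in Section 10.2 are accurate.

Proof. By treating the received signal component corresponding to $\hat{\mathbf{W}}^{(r)}$ in (10.22), i.e., $[\hat{\mathbf{W}}^{(r)}]^\dagger \mathbf{H}(f_k) \mathbf{t} x_k$, as noise, one can obtain a lower bound to the mean SNR as:

$$\bar{\gamma}^{\rm PACE}_k(\mathbf{H}(t)) \gtrsim \mathbb{E}_{\hat{\mathbf{W}}^{(r)}} \left\{ \frac{T_{\rm cs} \big| \mathbf{t}^\dagger \hat{\mathbf{H}}(f_0)^\dagger \mathbf{H}(f_k) \mathbf{t} \big|^2 E^{(d)}_k E^{(r)} e^{-\mathrm{Var}\{\theta_{\rm L}(t)\}}}{\mathbb{E}_{x_k, \tilde{\mathbf{W}}_d[k]} \Big\{ \big| \mathbf{I}^\dagger_{\rm PACE} \tilde{\mathbf{W}}_d[k] + [\hat{\mathbf{W}}^{(r)}]^\dagger \mathbf{H}(f_k) \mathbf{t} x_k \big|^2 \Big\}} \right\}$$
$$\overset{(1)}{\ge} \frac{T_{\rm cs} \big| \mathbf{t}^\dagger \hat{\mathbf{H}}(f_0)^\dagger \mathbf{H}(f_k) \mathbf{t} \big|^2 E^{(d)}_k E^{(r)} e^{-\mathrm{Var}\{\theta_{\rm L}(t)\}}}{\mathbb{E}_{\hat{\mathbf{W}}^{(r)}} \Big\{ \|\mathbf{I}_{\rm PACE}\|^2 N_0 T_{\rm cs}/T_{\rm s} + \big| [\hat{\mathbf{W}}^{(r)}]^\dagger \mathbf{H}(f_k) \mathbf{t} \big|^2 E^{(d)}_k \Big\}}$$
$$= \frac{\big| \mathbf{t}^\dagger \hat{\mathbf{H}}(f_0)^\dagger \mathbf{H}(f_k) \mathbf{t} \big|^2 E^{(d)}_k E^{(r)} e^{-\mathrm{Var}\{\theta_{\rm L}(t)\}}}{\Big[ \mathbf{t}^\dagger \hat{\mathbf{H}}(f_0)^\dagger \hat{\mathbf{H}}(f_0) \mathbf{t} E^{(r)} e^{-\mathrm{Var}\{\theta_{\rm L}(t)\}} + \frac{M_{\rm rx} N_0}{D_2} \Big] \frac{N_0 T_{\rm cs}}{T_{\rm s}} + \frac{N_0}{D_2} \mathbf{t}^\dagger \mathbf{H}(f_k)^\dagger \mathbf{H}(f_k) \mathbf{t} E^{(d)}_k}$$
$$\overset{(2)}{=} \frac{M_{\rm rx} |\beta(f_c, f_k)|^2 E^{(d)}_k E^{(r)} e^{-\mathrm{Var}\{\theta_{\rm L}(t)\}}}{\beta(0, 0) \frac{N_0 T_{\rm cs}}{T_{\rm s}} E^{(r)} e^{-\mathrm{Var}\{\theta_{\rm L}(t)\}} + \frac{N_0^2 T_{\rm cs}}{D_2 T_{\rm s}} + \frac{N_0}{D_2} \beta(0, 0) E^{(d)}_k}, \tag{10.26}$$

where $\overset{(1)}{\ge}$ follows from Jensen's inequality and $\overset{(2)}{=}$ from the orthogonality of the array response vectors.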
Similarly, by treating $[\hat{\mathbf{W}}^{(r)}]^\dagger \mathbf{H}(f_k) \mathbf{t} x_k$ as Gaussian noise independent of $x_k$, a lower bound on the mean iSE can be obtained as:

$$\overline{\mathrm{iSE}}^{\rm PACE}\big( \mathbf{H}(t) \big) \gtrsim \mathbb{E}_{\hat{\mathbf{W}}^{(r)}} \left\{ \sum_{k \in \mathcal{K}} \frac{1}{|\mathcal{K}|} \log\left[ 1 + \frac{T_{\rm cs} \big| \mathbf{t}^\dagger \hat{\mathbf{H}}(f_0)^\dagger \mathbf{H}(f_k) \mathbf{t} \big|^2 E^{(d)}_k E^{(r)} e^{-\mathrm{Var}\{\theta_{\rm L}(t)\}}}{\mathbb{E}_{x_k, \tilde{\mathbf{W}}_d[k]} \Big\{ \big| \mathbf{I}^\dagger_{\rm PACE} \tilde{\mathbf{W}}_d[k] + [\hat{\mathbf{W}}^{(r)}]^\dagger \mathbf{H}(f_k) \mathbf{t} x_k \big|^2 \Big\}} \right] \right\}$$
$$\overset{(2)}{\ge} \sum_{k \in \mathcal{K}} \frac{1}{|\mathcal{K}|} \log\left[ 1 + \frac{M_{\rm rx} |\beta(f_c, f_k)|^2 E^{(d)}_k E^{(r)} e^{-\mathrm{Var}\{\theta_{\rm L}(t)\}}}{\beta(0, 0) \frac{N_0 T_{\rm cs}}{T_{\rm s}} E^{(r)} e^{-\mathrm{Var}\{\theta_{\rm L}(t)\}} + \frac{N_0^2 T_{\rm cs}}{D_2 T_{\rm s}} + \frac{N_0}{D_2} \beta(0, 0) E^{(d)}_k} \right]$$

where $\overset{(2)}{\ge}$ again follows from Jensen's inequality, and by using similar steps to (10.26).

The array response orthogonality condition in Theorem 10.3.1 is satisfied if the scatterers corresponding to different MPCs are well separated and $M_{\rm rx} \gg L$ [38]. Note that even though the RX does not explicitly estimate the array response vectors $\mathbf{a}_{\rm rx}(\ell)$ for the MPCs, one still observes an RX beamforming gain of $M_{\rm rx}$ in (10.25a). The impact of imperfect MRC combining and the resulting frequency-selective fading is quantified by $\beta(f_c, f_k)$, where note that $|\beta(f_c, f_k)| \le |\beta(0, 0)|$. Another drawback of the fading is that it may cause a drastic drop in the performance of the single PLL architecture in Section 10.2.1 if $|A^{(r)}_1|$, the reference signal strength at antenna 1, falls in a fading dip, as is evident from (10.25). Note however that the weighted arraying architecture in Section 10.2.3 enjoys diversity against such fading by using signals from multiple antennas.

10.4 Initial access and aCSI estimation at the BS

In this section, aCSI acquisition at the BS during the TX beamformer design phase is analyzed and a sample IA protocol that can utilize PACE is also proposed. The aCSI for the TX beamformer design phase can be obtained at the BS either via uplink CE, or by downlink CE with CSI feedback from the RX. Uplink CE can be performed using conventional uplink pilot training with any of the digital CE algorithms from Section 1.1.
Note that PACE cannot be used at the BS since the pilots from multiple users need to be separated via digital processing. For downlink CE with feedback, the BS can transmit pilots sequentially along different transmit precoder beams, with $D$ reference symbols for each beam. The RX performs PACE for each TX beam, and provides the BS with uplink feedback about the corresponding link strength.

The suggested IA protocol is somewhat similar to the downlink CE with feedback, where the BS performs angular domain CE by transmitting along different angular directions, possibly with different beam widths. For each TX beam the BS transmits $D$ reference symbols, followed by a PSS and a Secondary Synchronization Sequence (SSS). The RX performs ACE, and provides uplink feedback to the BS upon successfully detecting a PSS. However, due to the lack of prior timing synchronization, the 'filter, sample and hold' circuit in Section 10.2.2 cannot be used directly for ACE. One alternative is to allow continuous transmission of the reference tone even during the PSS and SSS, with the following suggested symbol structure:

$$\tilde{s}_{\rm tx}^{(\rm ia)}(t) = \sqrt{\frac{2}{T_{\rm cs}}}\, t \left[ \sqrt{E^{(r)}} + \sum_{k \in \mathcal{K} \setminus \mathcal{G}} x_k^{(p)} e^{j 2\pi f_k t} \right] e^{j 2\pi f_c t} \tag{10.27}$$

where $\mathcal{G} = \{-g, \ldots, 0, \ldots, g\}$.¹¹ The amplitude and phase estimation can be performed similar to Section 10.2.2, i.e., by multiplying the received signal at each antenna with the PLL output and then filtering with a low pass filter with cut-off frequency $1/(D_2 T_{\rm cs})$. Due to the continuous availability of the reference tone, the filter outputs can be directly used to control the phase shifter at each antenna without the 'sample and hold' operation. Since the filter cut-off frequency is low, its output changes rather slowly and hence the phase-shifted and combined IA data is conjectured to achieve an SNR comparable to (10.23). Since $D = O(1)$, the IA latency does not scale with $M_{\rm rx}$ and yet the PSS/SSS symbols can exploit the RX beamforming gain.
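The symbol structure in (10.27) can be illustrated with a short sketch that builds the data sub-carrier set $\mathcal{K} \setminus \mathcal{G}$ and evaluates the complex baseband envelope (the bracketed term with its $\sqrt{2/T_{\rm cs}}$ scaling) for a single unit-gain precoder entry. The sketch rests on assumptions that are not stated above: a symmetric index set $\mathcal{K} = \{-k_{\max}, \ldots, k_{\max}\}$, sub-carrier frequencies $f_k = k/T_{\rm s}$, and hypothetical function names; the carrier term $e^{j2\pi f_c t}$ and the precoder vector $t$ are left out.

```python
import cmath
import math


def data_subcarriers(k_max, g):
    """Indices in K \\ G, assuming K = {-k_max, ..., k_max} (an assumption)
    and the guard band G = {-g, ..., 0, ..., g} around the reference tone."""
    return [k for k in range(-k_max, k_max + 1) if abs(k) > g]


def ia_envelope(t, pilots, e_ref, t_cs, t_s):
    """Baseband envelope of (10.27) at time t for a unit-gain precoder entry.

    pilots maps a sub-carrier index k to its PSS/SSS symbol x_k^(p).
    The sub-carrier frequency f_k = k / t_s is an assumption of this sketch.
    """
    tone = math.sqrt(e_ref)  # continuously transmitted reference tone
    data = sum(x * cmath.exp(2j * math.pi * (k / t_s) * t)
               for k, x in pilots.items())
    return math.sqrt(2.0 / t_cs) * (tone + data)
```

Calling `data_subcarriers(3, 1)` returns `[-3, -2, 2, 3]`, i.e., the sub-carriers adjacent to the reference tone are left empty so that the data has less impact on the PLL output.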
10.5 Simulation Results

For the simulation results a single cell scenario is considered, with a single representative UE with a $\lambda/2$-spaced $16 \times 4$ ($M_{\rm rx} = 64$) antenna array, having one down-conversion chain and using PACE aided beamforming. The BS transmits a single spatial stream to this UE via a pre-designed transmit precoding beam $t$. For convenience, $t$ shall be included into the channel to consider the $M_{\rm rx} \times 1$ effective channel $H(t)t$ between the precoding beam and the RX. The RX beamformer design phase is assumed to last $D = 6$ symbols with $D_2 = 2$, where the BS transmits reference symbols with power $E^{(r)} = 20 E_{\rm cs}/K$ (to satisfy spectral mask regulations), and the corresponding PLL parameters for the single PLL and weighted arraying at the RX, respectively, are assumed to be as given in Table 10.1. For comparison to existing schemes, the performance of statistical analog beamforming [41] at the RX shall be included, where the beamformer is the largest eigenvector of the RX spatial correlation matrix $R_{\rm rx}(t)$. For this case, it is assumed that there is no phase noise and that a perfect estimate of $R_{\rm rx}(t)$ is available at the RX, given by: $R_{\rm rx}(t) = \frac{1}{K} \sum_{k \in \mathcal{K}} \hat{H}(f_k) t t^{\dagger} \hat{H}(f_k)^{\dagger}$.

¹¹ Here $\mathcal{G}$ defines a guard band around the reference tone, separating it from the data. The purpose of the guard band is to reduce the impact of the data sub-carriers on the PLL output.

First a sparse multi-path effective channel is considered with $L = 3$ MPCs and delays $\hat{\tau}_\ell = \{0, 20, 40\}$ ns, $\tau_\ell - \hat{\tau}_\ell = \{30, 25, 25\}$ ps, angles of arrival $\psi^{\rm rx}_{\rm azi} = \{0, \pi/6, -\pi/6\}$, $\psi^{\rm rx}_{\rm ele} = \{0.45\pi, \pi/2, \pi/2\}$ and effective amplitudes $\alpha_\ell a_{\rm tx}(\ell)^{\dagger} t / \sqrt{\beta(0,0)} = \{\sqrt{0.6}, \sqrt{0.3}, \sqrt{0.1}\}$, respectively, during the beamformer design phase and during one snapshot of the data transmission phase. For this channel, the mean iSE of PACE aided beamforming, obtained using Monte-Carlo simulations with the non-linear PLL equations, is compared to the analytical approximation (10.25b) in Fig. 10.6a.
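The statistical analog beamforming baseline described above, i.e., the dominant eigenvector of $R_{\rm rx}(t) = \frac{1}{K}\sum_{k} \hat{H}(f_k) t t^{\dagger} \hat{H}(f_k)^{\dagger}$, can be sketched as follows. This is an illustrative stand-in rather than the procedure used to produce the results: it uses plain power iteration instead of a full eigendecomposition, the function names are hypothetical, and each effective channel vector $\hat{H}(f_k)t$ is assumed to be given as a list of complex numbers.

```python
import math
import random


def correlation_matrix(channels):
    """R = (1/K) * sum_k h_k h_k^H for a list of K length-M complex vectors,
    where h_k stands for the effective channel H(f_k) t on sub-carrier k."""
    num_sc = len(channels)
    m = len(channels[0])
    r = [[0j] * m for _ in range(m)]
    for h in channels:
        for a in range(m):
            for b in range(m):
                r[a][b] += h[a] * h[b].conjugate() / num_sc
    return r


def dominant_eigenvector(r, iterations=200, seed=0):
    """Unit-norm estimate of the largest eigenvector of a Hermitian PSD
    matrix via power iteration (an assumption; any eigen-solver would do)."""
    rng = random.Random(seed)
    m = len(r)
    # Random start vector, so that it is not orthogonal to the dominant
    # eigenvector except with probability zero.
    v = [complex(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(m)]
    for _ in range(iterations):
        w = [sum(r[a][b] * v[b] for b in range(m)) for a in range(m)]
        norm = math.sqrt(sum(abs(x) ** 2 for x in w))
        if norm == 0.0:
            break  # all-zero matrix: no preferred direction
        v = [x / norm for x in w]
    return v
```

For a rank-one correlation matrix built from a single channel vector, the returned beamformer is aligned with that vector up to a phase rotation, which is the expected behaviour of the statistical beamformer in a single-MPC channel.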
Since $I_{\rm PACE}$ in (10.14) is random, the one sigma interval of iSE is also depicted as a shaded region here. As is evident from the results, PACE aided beamforming suffers only a 2 dB reduction in the beamforming gain compared to statistical beamforming, above a certain SNR in sparse channels. As is expected, this SNR threshold is lower for weighted carrier arraying than for a single PLL. Furthermore, the derived analytical approximations are also accurate in this SNR regime. Note that these results are obtained for an oscillator offset of 5 MHz (see Table 10.1). Better performance can be achieved if the PLL is optimized for more accurate local oscillators.

Figure 10.6: Comparison of iSE for PACE based beamforming and conventional statistical beamforming versus SNR, for (a) a sparse channel and (b) a dense channel. Here $E^{(r)} = 20 E_{\rm cs}/K$, $E_k^{(d)} = E_{\rm cs}/K$ for all $k \in \mathcal{K}$ and the PLL parameters are as in Table 10.1.

Next, to study the impact of the number of MPCs on system performance, the effective channel is modelled as a rich scattering stochastic channel with $L$ resolvable MPCs, each with 5 unresolved sub-paths. The MPC parameters are generated in a similar way to the cluster parameters in the 3GPP channel models [247–249]. To elaborate, the variables $\{\hat{\tau}'_\ell \mid 0 \leq \ell \leq L-1\}$ are generated with $\hat{\tau}'_\ell \sim {\rm Exp}\{40\,{\rm ns}\}$. The MPC delays during the beamformer design phase are then generated as $\hat{\tau}_\ell = \hat{\tau}'_\ell - \min_{\ell_1}\{\hat{\tau}'_{\ell_1}\}$, and the MPC powers are generated as $P_\ell = e^{-2\hat{\tau}_\ell / 40\,{\rm ns}}$. The powers are normalized such that $\sum_\ell P_\ell = 1$, and the angles of arrival at the RX are generated using inverse Laplacian functions: $\psi^{\rm rx}_{\rm azi}(\ell) = (\pi/3)\, \dot{b}_\ell \log(P_\ell / \max\{P_\ell\})$ and $\psi^{\rm rx}_{\rm ele}(\ell) = \pi/2 + (\pi/18)\, \ddot{b}_\ell \log(P_\ell / \max\{P_\ell\})$, where $\dot{b}_\ell, \ddot{b}_\ell \in \{-1, 1\}$ with equal probability.
The 5 unresolved sub-paths of each MPC are assumed to have equal power $P_{\ell,q} = P_\ell/5$, excess delays $[\hat{\tau}_{\ell,q} - \hat{\tau}_\ell] \sim {\rm Exp}\{1\,{\rm ns}\}$, RX azimuth angle offsets $[\psi^{\rm rx}_{\rm azi}(\ell,q) - \psi^{\rm rx}_{\rm azi}(\ell)] \sim {\rm Unif}\{-\pi/50, \pi/50\}$ and RX elevation angle offsets $[\psi^{\rm rx}_{\rm ele}(\ell,q) - \psi^{\rm rx}_{\rm ele}(\ell)] \sim {\rm Unif}\{-\pi/50, \pi/50\}$. Note that the effective channel during the beamformer design phase then reduces to $\hat{H}(t)t = \sum_{\ell=0}^{L-1} \sum_{q=1}^{5} \sqrt{P_{\ell,q}}\, a_{\rm rx}\big(\psi^{\rm rx}_{\rm azi}(\ell,q), \psi^{\rm rx}_{\rm ele}(\ell,q)\big)\, \delta(t - \hat{\tau}_{\ell,q})$. The channel delays for one snapshot of the data transmission phase are generated by assuming the RX moves a distance of $d = 2$ cm in a random azimuth direction $\psi_{\rm azi} \sim {\rm Unif}\{0, 2\pi\}$ without changing its orientation, i.e., $\tau_{\ell,q} = \hat{\tau}_{\ell,q} - \frac{d}{c} \cos\big(\psi^{\rm rx}_{\rm azi}(\ell,q) - \psi_{\rm azi}\big) \sin\big(\psi^{\rm rx}_{\rm ele}(\ell,q)\big)$, where $c$ is the velocity of light. For this stochastic channel model, the mean iSE for PACE aided beamforming is compared to statistical beamforming in Fig. 10.6b. For computational tractability, the non-linear PLL simulation is skipped here and the validity of (10.12), (10.15) and (10.21) is assumed, to generate the Monte-Carlo simulation results. For the analytical approximation (10.25b), we use $|\alpha_\ell| = \left| \sum_{q=1}^{5} \sum_{k \in \mathcal{K}} \sqrt{P_{\ell,q}}\, e^{-j 2\pi (f_c + f_k)(\hat{\tau}_{\ell,q} - \hat{\tau}_\ell)} \right| / K$, i.e., the mean amplitude of the MPC during the beamformer design phase, and $\tau_\ell = \min_q\{\tau_{\ell,q}\}$. As observed from the results, the loss in beamforming gain for PACE aided beamforming is higher in rich scattering, and therefore it is mainly suitable for effective channels (including the transmit beamforming) with few resolvable MPCs. It must be emphasized that such cases may frequently occur at mm-wave frequencies, where the number of resolvable MPCs/clusters with significant energy (within 20 dB of the strongest) is on the order of 3–10 [81,249]. Here the mismatch of the analytical approximation (10.25b) to simulations is due to two reasons: a) frequency selective fading of the MPC amplitudes, which is not modeled by (10.2), and b) non-orthogonality of the array response vectors for large $L$.
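The stochastic channel generation described above (MPC delays, powers and angles, followed by the 5 sub-paths per MPC) can be sketched as below. This is a minimal illustration under stated assumptions, not the simulation code behind Fig. 10.6b: ${\rm Exp}\{40\,{\rm ns}\}$ and ${\rm Exp}\{1\,{\rm ns}\}$ are interpreted as exponential distributions with those means, the logarithm is taken to be natural, the uniform offsets are drawn from the continuous interval $[-\pi/50, \pi/50]$, and the function name and dictionary keys are hypothetical.

```python
import math
import random


def generate_channel(num_mpcs, rng, subpaths=5):
    """Draw MPC and sub-path parameters for the beamformer design phase.

    Returns (mpcs, subs): two lists of dicts with delay [s], power and
    azimuth/elevation angles of arrival [rad].
    """
    # Raw delays with an assumed mean of 40 ns, shifted so the earliest is 0.
    raw = [rng.expovariate(1.0 / 40e-9) for _ in range(num_mpcs)]
    delays = [d - min(raw) for d in raw]
    # Exponentially decaying powers, normalized to unit total power.
    powers = [math.exp(-2.0 * d / 40e-9) for d in delays]
    total = sum(powers)
    powers = [p / total for p in powers]
    p_max = max(powers)

    mpcs, subs = [], []
    for idx, (delay, power) in enumerate(zip(delays, powers)):
        log_ratio = math.log(power / p_max)  # <= 0, equals 0 for strongest MPC
        azi = (math.pi / 3) * rng.choice((-1, 1)) * log_ratio
        ele = math.pi / 2 + (math.pi / 18) * rng.choice((-1, 1)) * log_ratio
        mpcs.append({"delay": delay, "power": power, "azi": azi, "ele": ele})
        for _ in range(subpaths):
            subs.append({
                "mpc": idx,
                "power": power / subpaths,                      # equal split
                "delay": delay + rng.expovariate(1.0 / 1e-9),   # excess delay
                "azi": azi + rng.uniform(-math.pi / 50, math.pi / 50),
                "ele": ele + rng.uniform(-math.pi / 50, math.pi / 50),
            })
    return mpcs, subs
```

A call such as `generate_channel(10, random.Random(1))` yields 10 MPCs and 50 sub-paths; the delays for a data transmission snapshot would then follow by applying the displacement expression for $\tau_{\ell,q}$ to each sub-path.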
Thus further work is required to characterize the system performance in such channels in closed form. Note however that the ACE noise analysis in Section 10.2 is still applicable for these scenarios.

Note that for the iSE results in this section, the CE overhead has not been included. For PACE, only $D = 6$ pilots were used for designing the beamformer. In comparison, sparse array based techniques [80, 107] require at least 16 pilots for designing the beamformer. The corresponding overhead reduction is significant when downlink CE with feedback is used for aCSI at the BS, such as in frequency division duplexing systems.¹² For example, with $M_{\rm tx} = 256$, exhaustive beam scanning [78] at the TX and an aCSI coherence time of 10 ms, the BS aCSI acquisition overhead reduces from 40% for sparse array techniques to 15% for PACE. The overhead reduction is expected to be higher if the time required for updating the analog hardware is also taken into account. Thus, PACE aided beamforming shows potential in solving the CE overhead issue of hybrid massive MIMO systems, with minimal degradation in performance.

¹² Even in time division duplexing systems, downlink CE with feedback may be used during the IA phase, causing a large IA latency.

10.6 Conclusions

This chapter proposes the use of PACE for designing the RX beamformer in massive MIMO systems. This process includes transmission of a reference sinusoidal tone during each beamformer design phase, and estimation of its received amplitude and phase at each RX antenna. A single PLL based carrier recovery circuit is proposed to support PACE, and its analysis suggests that the channel estimates decay exponentially with the inverse of the SNR at the PLL input. To remedy this and also to obtain diversity against fading, a multiple PLL based weighted carrier arraying architecture is also proposed.
The performance analysis suggests that PACE aided beamforming can be interpreted as using the channel estimates on one sub-carrier to perform beamforming on other sub-carriers, with an additional loss factor corresponding to the ACE phase-noise. Simulation results suggest that PACE aided beamforming suffers only a small loss in beamforming gain in comparison to conventional analog beamforming in sparse channels, at sufficiently high SNR. This loss however increases with the number of channel MPCs $L$, and hence PACE is mostly suitable for sparse channels with few MPCs. The CE overhead reduction with PACE is significant when downlink CE with feedback is required. Benefits of PACE aided beamforming during the IA phase are also discussed, although a more detailed analysis will be the subject of future work. Similarly, the performance of PACE at very low SNR and with system mismatches and imperfections also requires more attention.

Chapter 11 Conclusions and Future research directions

In this part a class of reduced complexity multi-antenna RXs was studied that can perform beamforming via analog CE of a reference tone that is transmitted along with the data. Such techniques are inspired by our results on a novel MD-FSR RX for single antenna UWB systems that is capable of demodulating QAM symbols, which are elaborated in Chapter 7. Results show that while its performance is better than conventional BPSK-capable MD-FSR schemes, the proposed RX suffers from a significant noise enhancement due to the bandwidth spreading and the non-linear squaring operation at the RX front end. Thus MD-FSR is mainly suited to support low data rates. Using the observation that such bandwidth spreading can be avoided when the RX is equipped with multiple antennas, a novel reduced complexity non-coherent RX for massive MIMO called Multi-Antenna Frequency Shift Reference (MA-FSR) is proposed in Chapter 8.
Due to the absence of bandwidth spreading, MA-FSR achieves good performance and only suffers a 6 dB reduction in beamforming gain compared to conventional analog beamforming, above a certain SNR threshold in sparse channels. However, the performance degrades significantly below this SNR threshold. This is because, while MA-FSR avoids the noise enhancement due to bandwidth spreading, it still suffers from significant noise levels due to the squaring operation when the system bandwidth is wide. Furthermore, the bandwidth efficiency of MA-FSR is only 50%, thus limiting the throughput. Two techniques for reducing the noise levels of MA-FSR are proposed in Section 8.5 that rely on replacing the squaring operation in MA-FSR with a multiplication operation. One of these techniques, Reference Tone Aided Transmission (RTAT), also improves the bandwidth efficiency to 100% and is analyzed in detail in Chapter 9. Results show that RTAT only suffers a 2.5 dB reduction in beamforming gain in comparison to conventional analog beamforming, above a certain SNR threshold in sparse channels. However, in RTAT the reference tone transmission and ACE are performed continuously, which leads to a wastage of transmit power. It also prevents averaging of the channel estimates from multiple symbol periods. In contrast, a Periodic Analog Channel Estimation (PACE) scheme is proposed and analyzed in Chapter 10, where the reference tone transmission and beamformer update are performed only during dedicated time slots. Additionally, a weighted, arrayed carrier recovery circuit is proposed that can exploit the received signals from multiple RX antennas for reference tone recovery. Simulation results show that PACE also suffers only a 2 dB reduction in beamforming gain when compared to analog beamforming, but the corresponding SNR threshold is much lower than for RTAT under similar system parameters.
Despite several interesting RX architectures being explored in this part, there is significant scope for additional work, a few directions of which are discussed below:

1. The analysis of ACE based RXs in Chapters 8–10 assumes ideal, linear performance of the analog hardware such as LNAs, oscillators, squaring circuits and filters. However, in practice such components often suffer from non-idealities such as 1 dB compression of the LNA, local oscillator leakage and third-order inter-modulation products. The impact of such non-idealities is often important in system analysis, especially when a strong reference tone is transmitted with the data. This is because there exists the possibility of higher order reference-noise products overwhelming the desired data-reference product terms. Therefore the results and inferences presented in Part II have to be treated with caution until an analysis of hardware imperfections and possible hardware implementations is performed.

2. In Chapter 8, a simple demodulation approach was assumed where the data on each sub-carrier is decoded independently. However, since the sub-carrier noise is obtained by a non-linear squaring operation, the noise at each sub-carrier is non-linearly dependent. The large noise enhancement experienced by MA-FSR, and in general by FSR receivers, can possibly be reduced if a joint decoding of the sub-carriers is performed. While joint sub-carrier decoding via exhaustive search suffers from exponential complexity, I believe that approaches such as the Viterbi algorithm can be explored to reduce the decoding complexity. In fact, some such reduced complexity joint decoding approaches have been proposed for differentially encoded non-coherent RXs in UWB systems [74–76]. While not directly extendable to MA-FSR, they may yield useful insight into the problem.

3. In Chapters 9 and 10 the phase noise of the TX oscillator is ignored.
While the use of a PLL for reference recovery at the RX can intuitively mitigate some of the TX phase noise, this has to be backed up by a thorough analysis in future work.

4. Note that in Chapter 9 the nBPF blocks have a non-zero bandwidth and accumulate some amount of channel noise $\hat{n}(t)$. After the non-linear operation at the RX, the product of $\hat{n}(t)$ with the reference tone falls within the sub-carrier range $\{-g, \ldots, g\}$. Since these sub-carriers do not contain any desired data terms, they were ignored in Chapter 9. However, by estimating $\hat{n}(t)$ from these sub-carrier outputs, its impact on the remaining data sub-carriers can be partially compensated. This approach is similar to the phase noise compensation techniques used in [246]. The performance analysis of the RTAT RX with this additional noise cancellation will be an interesting extension to explore.

5. In Chapter 9, the implementation of the nBPF block was not discussed in detail. While it was hinted that in one implementation a PLL can be used to build the nBPF at each RX antenna (see Fig. 9.3), such a design would lead to a large hardware cost. Instead, an alternate architecture similar to the one in Chapter 10 with a single PLL can be used, as illustrated in Fig. 11.1. In this architecture, the reference tone is transmitted continuously along with the data as in RTAT. The RX is equipped with a common PLL that uses signals from one or more RX antennas to keep its output locked to the reference tone. The I and Q components of the common PLL output are then used to convert the received signal at each RX antenna to base-band. These base-band signals are then processed by a bank of base-band phase shifters and fed to a down-conversion chain, as illustrated in Fig. 11.1. The control signals for these phase-shifters are just the low pass filtered versions of the base-band received signals.
Note that unlike in PACE, the amplitude and phase estimation and compensation happen continuously, and thus the 'filter, sample and hold' circuit from Chapter 10 is replaced by a filter circuit (similar to Section 10.4). This architecture shall be referred to as Continuous Analog Channel Estimation (CACE) aided beamforming (as opposed to PACE). While CACE requires continuous transmission of the reference tone, it allows the phase compensation to be performed by base-band phase shifters, which are cheaper and suffer smaller insertion loss than the RF phase shifters used in PACE. The continuous presence of the reference tone and the corresponding guard band also allows phase noise compensation in CACE, as mentioned in the previous comment. Thus both CACE and PACE have their own advantages and it is not obvious which design is superior. While the results on RTAT form a first order approximation, a more detailed analysis of CACE can be performed in future work following steps similar to Chapter 10.

Figure 11.1: Block diagram of an RX with analog beamforming enabled via CACE.

6. In Chapter 10, the impact of ACE on system performance was characterized by linear phase noise analysis techniques. While this gives a good approximation at a high SNR, non-linear techniques have to be leveraged to characterize performance at low SNR. While a significant amount of work has been done on the non-linear phase noise analysis of a single PLL [232,250–252], there is very limited work on arrayed PLL architectures, to the best of our knowledge. Even in the high SNR regime, such a non-linear analysis is useful to provide insight into the impact of system parameters on locking range or locking time.
This would be an interesting area to explore in future work.

7. The ACE based schemes proposed in this thesis depend heavily on the assumption of sparsity of the wireless channels in the large antenna regime. While it is the general consensus in industry and academia that such sparsity is indeed expected, it is yet to be validated by channel measurement campaigns, especially in the mm-wave regime, which are still in their infancy. To reveal the true performance capabilities of these schemes, emulation of these systems on measured channel data is required. This is an interesting direction of further work.

8. One issue with the ACE based schemes suggested in this thesis is that they can only support a single spatial data stream. This is inevitable since a purely ACE based scheme cannot distinguish between different channel MPCs. While this may not be a serious issue at mm-wave, where the large bandwidth may obviate the need for transmission of multiple spatial streams to each user, there may be scenarios where spatial multiplexing is desired. Another related issue is that all the ACE based schemes suffer at least a 2 dB reduction in beamforming gain in comparison to analog beamforming. This loss is also inevitable since purely ACE based analog beamforming schemes try to combine power from all the channel MPCs, which is not possible over a wide bandwidth. In fact, precisely to avoid this, conventional CE based analog beamforming approaches focus on a single good MPC for data stream transmission. One nice way to resolve both these issues is by the use of a novel concept called Hybrid Channel Estimation (HCE), which involves using part analog and part digital components for CE. For example, consider replacing the 'sample and hold' circuits in the PACE RX (see Fig. 10.1) by a very low sampling rate ADC at each antenna, as illustrated in Fig. 11.2.
These ADCs sample the low pass filtered outputs obtained after multiplying the received signal with the I and Q PLL outputs at a sampling rate of $1/(D_2 T_{\rm s})$, and the corresponding sampled output vector can be approximated as:

$$I_{\rm HCE} \approx \sqrt{T_{\rm cs}}\, \hat{H}(0) t \sqrt{E^{(r)}}\, e^{-j\theta} e^{-\frac{{\rm Var}\{\theta_L(t)\}}{2}} + \sqrt{T_{\rm cs}}\, \hat{W}^{(r)}, \tag{11.1}$$

similar to $I_{\rm PACE}$ in (10.15). While this signal is directly used for controlling the phase shifters in PACE, more sophisticated digital signal processing is possible with HCE since $I_{\rm HCE}$ is available in the digital domain. For example, by taking a Discrete Fourier Transform of $I_{\rm HCE}$, one can obtain estimates of the array response vectors of each MPC. Thus by using these estimates to digitally control the phase shifters, spatial multiplexing can be supported at each RX. Even better performance can be achieved by using compressed sensing based techniques on (11.1) to estimate the RX array response vectors. In fact, since only a small number of measurements are required for compressed sensing, it is sufficient to have the phase and amplitude estimation circuits at only a small subset of the antenna elements. This can significantly bring down the cost of the additional hardware. This would be a very interesting direction to explore in future work.

Figure 11.2: Block diagram of an RX with analog beamforming enabled via hybrid channel estimation.

9. Note that in this thesis several ACE based schemes were proposed, with sequentially better performance (although each design has its own advantages). An interesting question to answer would be: 'What are the fundamental limits on channel estimation in the analog domain?' This is in fact a difficult, possibly ill posed, question since it is not clear what the limits on the capabilities of analog hardware are.
Another related issue is a study on the interplay between timing synchronization, frequency synchronization and channel estimation. Indeed, since the three have to be performed jointly during the IA phase, a detailed study of the trade-off between the three is required. Some serious thought and work is required to be able to develop any generic answers to these questions.

Acronyms

ACE: analog channel estimation
aCSI: second order channel statistics
ADC: Analog to Digital Converter
ASK: Amplitude Shift Keying
BER: uncoded Bit Error Rate
BPF: Band Pass Filter
BPSK: Binary Phase Shift Keying
BS: Base Station
CACE: Continuous Analog Channel Estimation
CE: channel estimation
CSI: Channel State Information
DFT: Discrete Fourier Transform
DPC: Dirty Paper Coding
FDD: Frequency Division Duplexing
FSR: Frequency Shift Reference
GSC: Generalized Selection Combining
HBaCSI: Hybrid Beamforming based on aCSI
HBiCSI: Hybrid Beamforming based on iCSI
HBwS: Hybrid Beamforming with Selection
HCE: Hybrid Channel Estimation
hSNR: hSNR
I: in-phase
i.i.d.: independent and identically distributed
i.n.i.d: independent but not identically distributed
IA: Initial Access
iCSI: instantaneous Channel State Information
iSE: instantaneous Spectral Efficiency
iSR: instantaneous Sum Rate
LFP: Limited Feedback Precoding
LNA: Low Noise Amplifier
MA-FSR: Multi-Antenna Frequency Shift Reference
MAC: Multiple Access Channel
MD-FSR: Multi-Differential Frequency Shift Reference
MGF: Moment Generating Function
MIMO: Multiple Input Multiple Output
MISO: Multiple Input Single Output
mm-wave: millimeter wave
MPC: multi-path component
MRC: Maximal Ratio Combining
nBPF: narrow Band Pass Filter
OFDM: Orthogonal Frequency Division Multiplexing
PACE: Periodic Analog Channel Estimation
PAS: power angle spectrum
PDP: power delay profile
PLL: Phase Locked Loop
PSS: Primary Synchronization Sequence
Q: quadrature-phase
QAM: Quadrature Amplitude Modulation
RD: Reduced Dimensional
RF: radio frequency
RF chain: up/down-conversion chain
RTAT: Reference Tone Aided Transmission
RX: receiver
SER: Symbol Error Rate
SIMO: Single Input Multiple Output
SISO: Single Input Single Output
SNR: signal to noise ratio
SSS: Secondary Synchronization Sequence
TAS: Transmit Antenna Selection
TDD: Time Division Duplexing
TX: transmitter
UE: User Equipment
UWB: Ultra-Wide Band
VCO: Voltage Controlled Oscillator
ZF: Zero Forcing

Index

analog beamformer design, 31, 41, 46
analog channel estimation, 10, 78, 120, 123
antenna selection, 3, 21, 30
beam selection, 4, 31
capacity vs training overhead, 55, 63, 67, 72
channel estimation overhead, 8, 10, 50
frequency shift reference receivers, 12, 78, 81, 94
Grassmannian line packing, 48
hSNR sum capacity, 19, 20, 27, 41
hybrid beamforming, 5
hybrid beamforming with selection, 9, 15, 40
initial access, 8, 118, 138
JS-Rake, 12, 72
limited feedback precoding, 8, 20, 42
non-coherent receivers, 7, 12, 80, 94
phase locked loop, 128, 133
power allocation, 87, 102, 116
reference tone aided transmission, 109
restricted pre-coded systems, 20
transmit reference schemes, 12, 81
ultra wide-band communication, 11, 63, 80

Bibliography

[1] ITU, "IMT vision - framework and overall objectives of the future development of IMT for 2020 and beyond," Tech. Rep. ITU-R M.2083-0, International Telecommunication Union, 2015.
[2] J. Winters, "Optimum combining for indoor radio systems with multiple users," IEEE Transactions on Communications, vol. 35, pp. 1222–1230, Nov 1987.
[3] G. Foschini and M. Gans, "On limits of wireless communications in a fading environment when using multiple antennas," Wireless Personal Communications, vol. 6, pp. 311–335, Mar 1998.
[4] A. J. Paulraj, D. A. Gore, R. U. Nabar, and H. Bolcskei, "An overview of MIMO communications - a key to gigabit wireless," Proceedings of the IEEE, vol. 92, pp. 198–218, Feb 2004.
[5] T. L. Marzetta, "Noncooperative cellular wireless with unlimited numbers of base station antennas," IEEE Transactions on Wireless Communications, vol. 9, pp. 3590–3600, Nov 2010.
[6] E. G. Larsson, O. Edfors, F. Tufvesson, and T. L. Marzetta, "Massive MIMO for next generation wireless systems," IEEE Communications Magazine, vol. 52, pp. 186–195, Feb 2014.
[7] F. Boccardi, R. W. Heath, A. Lozano, T. L. Marzetta, and P. Popovski, "Five disruptive technology directions for 5G," IEEE Communications Magazine, vol. 52, pp. 74–80, Feb 2014.
[8] B. Murmann, "ADC performance survey 1997-2018 (ISSCC & VLSI Symposium)." Available at: https://web.stanford.edu/~murmann/adcsurvey.html.
[9] C. Mollen, J. Choi, E. G. Larsson, and R. W. Heath, "Uplink performance of wideband massive MIMO with one-bit ADCs," IEEE Transactions on Wireless Communications, vol. 16, pp. 87–100, Jan 2017.
[10] D. Gore, R. Heath, and A. Paulraj, "Statistical antenna selection for spatial multiplexing systems," in IEEE International Conference on Communications (ICC), IEEE, 2002.
[11] C. Park and K. Lee, "Statistical transmit antenna subset selection for limited feedback MIMO systems," in 2006 Asia-Pacific Conference on Communications, IEEE, Aug 2006.
[12] S. Jin, X. Li, and X. Gao, "Statistical transmit antenna selection for correlated Rayleigh fading MIMO channels," in IEEE Vehicular Technology Conference (VTC), IEEE, 2006.
[13] M. Win and J. Winters, "Analysis of hybrid selection/maximal-ratio combining in Rayleigh fading," IEEE Transactions on Communications, vol. 47, no. 12, pp. 1773–1776, 1999.
[14] M.-S. Alouini and M. Simon, "An MGF-based performance analysis of generalized selection combining over Rayleigh fading channels," IEEE Transactions on Communications, vol. 48, pp. 401–415, Mar 2000.
[15] S. Thoen, L. V. der Perre, B. Gyselinckx, and M. Engels, "Performance analysis of combined transmit-SC/receive-MRC," IEEE Transactions on Communications, vol. 49, no. 1, pp. 5–8, 2001.
[16] S. Sanayei and A. Nosratinia, "Capacity of MIMO channels with antenna selection," IEEE Transactions on Information Theory, vol. 53, pp. 4356–4362, Nov 2007.
[17] A. Gorokhov, D. Gore, and A. Paulraj, "Receive antenna selection for MIMO flat-fading channels: theory and algorithms," IEEE Transactions on Information Theory, vol. 49, pp. 2687–2696, Oct 2003.
[18] A. Molisch, M. Win, Y.-S. Choi, and J. Winters, "Capacity of MIMO systems with antenna selection," IEEE Transactions on Wireless Communications, vol. 4, pp. 1759–1772, Jul 2005.
[19] P. Hesami and J. N. Laneman, "Limiting behavior of receive antennae selection," in Annual Conference on Information Sciences and Systems (CISS), IEEE, Mar 2011.
[20] S. Y. Park and D. J. Love, "Capacity limits of multiple antenna multicasting using antenna subset selection," IEEE Transactions on Signal Processing, vol. 56, pp. 2524–2534, Jun 2008.
[21] D. Bai, P. Mitran, S. S. Ghassemzadeh, R. R. Miller, and V. Tarokh, "Rate of channel hardening of antenna selection diversity schemes and its implication on scheduling," IEEE Transactions on Information Theory, vol. 55, pp. 4353–4365, Oct 2009.
[22] H. Li, L. Song, and M. Debbah, "Energy efficiency of large-scale multiple antenna systems with transmit antenna selection," IEEE Transactions on Communications, vol. 62, pp. 638–647, Feb 2014.
[23] Y. Gao, H. Vinck, and T. Kaiser, "Massive MIMO antenna selection: Switching architectures, capacity bounds, and optimal antenna selection algorithms," IEEE Transactions on Signal Processing, vol. 66, pp. 1346–1360, Mar 2018.
[24] Z. Xu, S. Sfar, and R. Blum, "Analysis of MIMO systems with receive antenna selection in spatially correlated Rayleigh fading channels," IEEE Transactions on Vehicular Technology, vol. 58, pp. 251–262, Jan 2009.
[25] H. Yu, Y. Wang, S. Liu, and J.-W. Chong, "Capacity bounds for joint transmit and receive antenna selection systems with correlated fading channels," in International Conference on Wireless Communications, Networking and Mobile Computing, IEEE, Oct 2008.
[26] H. Shen and A. Ghrayeb, "Analysis of the outage probability for spatially correlated MIMO channels with receive antenna selection," in IEEE Global Telecommunications Conference (GLOBECOM), IEEE, 2005.
[27] S. Sanayei and A. Nosratinia, "Antenna selection in MIMO systems," IEEE Communications Magazine, vol. 42, pp. 68–73, Oct 2004.
[28] M. Gharavi-Alkhansari and A. Gershman, "Fast antenna subset selection in MIMO systems," IEEE Transactions on Signal Processing, vol. 52, pp. 339–347, Feb 2004.
[29] Y.-S. Choi, A. Molisch, M. Win, and J. Winters, "Fast algorithms for antenna selection in MIMO systems," in IEEE Vehicular Technology Conference (VTC), IEEE, 2003.
[30] V. V. Ratnam, "A low complexity antenna selection method for MIMO and its performance studies." Bachelor's thesis, available at: http://gssst.iitkgp.ac.in/uploads/student67/Vishnu_thesis.pdf, 2012.
[31] R. Mendez-Rial, C. Rusu, N. Gonzalez-Prelcic, A. Alkhateeb, and R. W. Heath, "Hybrid MIMO architectures for millimeter wave communications: Phase shifters or switches?," IEEE Access, vol. 4, pp. 247–267, 2016.
[32] A. Dua, K. Medepalli, and A. Paulraj, "Receive antenna selection in MIMO systems using convex optimization," IEEE Transactions on Wireless Communications, vol. 5, pp. 2353–2357, Sep 2006.
[33] K. Phan and C. Tellambura, "Receive antenna selection based on union-bound minimization using convex optimization," IEEE Signal Processing Letters, vol. 14, pp. 609–612, Sep 2007.
[34] X. Gao, O. Edfors, J. Liu, and F. Tufvesson, "Antenna selection in measured massive MIMO channels using convex optimization," in IEEE Global Communications Conference (GLOBECOM), IEEE, Dec 2013.
[35] A. Molisch, "MIMO systems with antenna selection - an overview," in Proceedings of Radio and Wireless Conference (RAWCON '03), pp. 167–170, Aug 2003.
[36] A. Molisch, X. Zhang, S. Kung, and J. Zhang, "DFT-based hybrid antenna selection schemes for spatially correlated MIMO channels," in IEEE Proceedings on Personal, Indoor and Mobile Radio Communications (PIMRC), IEEE, 2003.
[37] A. Molisch and X. Zhang, "FFT-based hybrid antenna selection schemes for spatially correlated MIMO channels," IEEE Communications Letters, vol. 8, pp. 36–38, Jan 2004.
[38] O. E. Ayach, R. W. Heath, S. Abu-Surra, S. Rajagopal, and Z. Pi, "The capacity optimality of beam steering in large millimeter wave MIMO systems," in IEEE International Workshop on Signal Processing Advances in Wireless Communications (SPAWC), IEEE, Jun 2012.
[39] Y.-S. Choi and S. Alamouti, "Performance analysis and comparisons of antenna and beam selection diversity," in IEEE Vehicular Technology Conference (VTC), IEEE, 2004.
[40] D. Bai, S. S. Ghassemzadeh, R. R. Miller, and V. Tarokh, "Beam selection gain versus antenna selection gain," IEEE Transactions on Information Theory, vol. 57, pp. 6603–6618, Oct 2011.
[41] P. Sudarshan, N. Mehta, A. Molisch, and J. Zhang, "Channel statistics-based RF pre-processing with antenna selection," IEEE Transactions on Wireless Communications, vol. 5, pp. 3501–3511, Dec 2006.
[42] P. Sudarshan, N. Mehta, A. Molisch, and J. Zhang, "Antenna selection with RF pre-processing: robustness to RF and selection non-idealities," in IEEE Radio and Wireless Conference (RAWCON), IEEE, 2004.
[43] J. Brady, N. Behdad, and A. M. Sayeed, "Beamspace MIMO for millimeter-wave communications: System architecture, modeling, analysis, and measurements," IEEE Transactions on Antennas and Propagation, vol. 61, pp. 3814–3827, Jul 2013.
[44] Y. Zeng and R. Zhang, "Millimeter wave MIMO with lens antenna array: A new path division multiplexing paradigm," IEEE Transactions on Communications, vol. 64, pp. 1557–1571, Apr 2016.
[45] P. V. Amadori and C. Masouros, "Low RF-complexity millimeter-wave beamspace-MIMO systems by beam selection," IEEE Transactions on Communications, vol. 63, pp. 2212–2223, Jun 2015.
[46] J. Wang and H. Zhu, "Beam allocation and performance evaluation in switched-beam based massive MIMO systems," in 2015 IEEE International Conference on Communications (ICC), IEEE, Jun 2015.
[47] R. Pal, K. V. Srinivas, and A. K. Chaitanya, "A beam selection algorithm for millimeter-wave multi-user MIMO systems," IEEE Communications Letters, vol. 22, pp. 852–855, Apr 2018.
[48] X. Gao, L. Dai, Z. Chen, Z. Wang, and Z. Zhang, "Near-optimal beam selection for beamspace MmWave massive MIMO systems," IEEE Communications Letters, vol. 20, pp. 1054–1057, May 2016.
[49] A. Alkhateeb, Y.-H. Nam, J. Zhang, and R. W. Heath, "Massive MIMO combining with switches," IEEE Wireless Communications Letters, vol. 5, pp. 232–235, Jun 2016.
[50] V. V. Ratnam, O. Y. Bursalioglu, H. C. Papadopoulos, and A. F. Molisch, "Preprocessor design for hybrid preprocessing with selection in massive MISO systems," in IEEE International Conference on Communications (ICC), IEEE, May 2017.
[51] A. F. Molisch, V. V. Ratnam, S. Han, Z. Li, S. L. H. Nguyen, L. Li, and K. Haneda, "Hybrid beamforming for massive MIMO: A survey," IEEE Communications Magazine, vol. 55, no. 9, pp. 134–141, 2017.
[52] R. W. Heath, N. Gonzalez-Prelcic, S. Rangan, W. Roh, and A. M. Sayeed, "An overview of signal processing techniques for millimeter wave MIMO systems," IEEE Journal of Selected Topics in Signal Processing, vol. 10, pp. 436–453, Apr 2016.
[53] S. Kutty and D. Sen, "Beamforming for millimeter wave communications: An inclusive survey," IEEE Communications Surveys & Tutorials, vol. 18, no. 2, pp. 949–973, 2016.
[54] X. Zhang, A. Molisch, and S.-Y. Kung, "Phase-shift-based antenna selection for MIMO channels," in IEEE Global Telecommunications Conference (GLOBECOM), vol. 2, pp. 1089–1093, Dec 2003.
[55] X. Zhang, A. Molisch, and S.-Y. Kung, "Variable-phase-shift-based RF-baseband codesign for MIMO antenna selection," IEEE Transactions on Signal Processing, vol. 53, pp. 4091–4103, Nov 2005.
[56] P. Karamalis, N. Skentos, and A. Kanatas, "Adaptive antenna subarray formation for MIMO systems," IEEE Transactions on Wireless Communications, vol. 5, pp. 2977–2982, Nov 2006.
[57] O. E. Ayach, S. Rajagopal, S. Abu-Surra, Z. Pi, and R. W. Heath, "Spatially sparse precoding in millimeter wave MIMO systems," IEEE Transactions on Wireless Communications, vol. 13, pp. 1499–1513, Mar 2014.
[58] Z. Xu, S. Han, Z. Pan, and C.-L. I, "Alternating beamforming methods for hybrid analog and digital MIMO transmission," in IEEE International Conference on Communications (ICC), IEEE, Jun 2015.
[59] W. Ni, X. Dong, and W.-S. Lu, "Near-optimal hybrid processing for massive MIMO systems via matrix decomposition," IEEE Transactions on Signal Processing, vol. 65, pp. 3922–3933, Aug 2017.
[60] X. Yu, J.-C. Shen, J. Zhang, and K. B. Letaief, "Alternating minimization algorithms for hybrid precoding in millimeter wave MIMO systems," IEEE Journal of Selected Topics in Signal Processing, vol. 10, pp.
485{500, apr 2016. [61] A. Alkhateeb, G. Leus, and R. W. Heath, \Limited feedback hybrid precoding for multi- user millimeter wave systems," IEEE Transactions on Wireless Communications, vol. 14, pp. 6481{6494, nov 2015. 155 [62] F. Sohrabi and W. Yu, \Hybrid digital and analog beamforming design for large-scale antenna arrays," IEEE Journal of Selected Topics in Signal Processing, vol. 10, pp. 501{ 513, apr 2016. [63] D. Ying, F. W. Vook, T. A. Thomas, and D. J. Love, \Hybrid structure in massive MIMO: Achieving large sum rate with fewer RF chains," in IEEE International Conference on Communications (ICC), IEEE, jun 2015. [64] A. Adhikary, J. Nam, J.-Y. Ahn, and G. Caire, \Joint spatial division and multiplex- ing|the large-scale array regime," IEEE Transactions on Information Theory, vol. 59, pp. 6441{6463, oct 2013. [65] V. Venkateswaran and A.-J. van der Veen, \Analog beamforming in MIMO communica- tions with phase shift networks and online channel estimation," IEEE Transactions on Signal Processing, vol. 58, pp. 4131{4143, aug 2010. [66] A. Alkhateeb, O. E. Ayach, G. Leus, and R. W. Heath, \Hybrid precoding for millimeter wave cellular systems with partial channel knowledge," in 2013 Information Theory and Applications Workshop (ITA), IEEE, feb 2013. [67] A. Liu and V. Lau, \Phase only RF precoding for massive MIMO systems with limited RF chains," IEEE Transactions on Signal Processing, vol. 62, pp. 4505{4515, sep 2014. [68] S. Park, J. Park, A. Yazdan, and R. W. Heath, \Exploiting spatial channel covariance for hybrid precoding in massive MIMO systems," IEEE Transactions on Signal Processing, vol. 65, pp. 3818{3832, jul 2017. [69] Z. Li, S. Han, and A. F. Molisch, \Optimizing channel-statistics-based analog beam- forming for millimeter-wave multi-user massive MIMO downlink," IEEE Transactions on Wireless Communications, vol. 16, pp. 4288{4303, jul 2017. [70] Z. Li, S. Han, S. Sangodoyin, R. Wang, and A. F. 
Molisch, \Joint optimization of hybrid beamforming for multi-user massive MIMO downlink," IEEE Transactions on Wireless Communications, pp. 1{1, 2018. [71] Z. Li, S. Han, and A. F. Molisch, \User-centric virtual sectorization for millimeter-wave massive MIMO downlink," IEEE Transactions on Wireless Communications, vol. 17, pp. 445{460, jan 2018. [72] M. Chowdhury, A. Manolakos, and A. J. Goldsmith, \Design and performance of nonco- herent massive SIMO systems," in 2014 48th Annual Conference on Information Sciences and Systems (CISS), IEEE, mar 2014. [73] M. Chowdhury, A. Manolakos, and A. Goldsmith, \Scaling laws for noncoherent energy- based communications in the SIMO MAC," IEEE Transactions on Information Theory, vol. 62, pp. 1980{1992, apr 2016. 156 [74] A. Schenk and R. F. H. Fischer, \Noncoherent detection in massive MIMO systems," in International ITG Workshop on Smart Antennas (WSA), pp. 1{8, March 2013. [75] A. G. Armada and L. Hanzo, \A non-coherent multi-user large scale SIMO system relaying on m-ary DPSK," in IEEE International Conference on Communications (ICC), IEEE, jun 2015. [76] D. Kong, X.-G. Xia, and T. Jiang, \A dierential QAM detection in uplink massive MIMO systems," IEEE Transactions on Wireless Communications, vol. 15, pp. 6371{6383, sep 2016. [77] A. Alkhateeb, O. E. Ayach, G. Leus, and R. W. Heath, \Channel estimation and hybrid precoding for millimeter wave cellular systems," IEEE Journal of Selected Topics in Signal Processing, vol. 8, pp. 831{846, oct 2014. [78] C. Jeong, J. Park, and H. Yu, \Random access in millimeter-wave beamforming cellular networks: issues and approaches," IEEE Communications Magazine, vol. 53, pp. 180{185, jan 2015. [79] V. V. Ratnam, A. F. Molisch, N. Rabeah, F. Alawwad, and H. Behairy, \Diversity versus training overhead trade-o for low complexity switched transceivers," in IEEE Global Communications Conference (GLOBECOM), IEEE, dec 2016. [80] S. Haghighatshoar and G. 
Caire, \Massive MIMO channel subspace estimation from low- dimensional projections," IEEE Transactions on Signal Processing, vol. 65, pp. 303{318, jan 2017. [81] M. R. Akdeniz, Y. Liu, M. K. Samimi, S. Sun, S. Rangan, T. S. Rappaport, and E. Erkip, \Millimeter wave channel modeling and cellular capacity evaluation," IEEE Journal on Selected Areas in Communications, vol. 32, pp. 1164{1179, jun 2014. [82] C. N. Barati, S. A. Hosseini, S. Rangan, P. Liu, T. Korakis, S. S. Panwar, and T. S. Rappa- port, \Directional cell discovery in millimeter wave cellular networks," IEEE Transactions on Wireless Communications, vol. 14, pp. 6664{6678, dec 2015. [83] Y. Li, J. G. Andrews, F. Baccelli, T. D. Novlan, and C. J. Zhang, \Design and analysis of initial access in millimeter wave cellular networks," IEEE Transactions on Wireless Communications, vol. 16, pp. 6409{6425, oct 2017. [84] M. Giordani, M. Mezzavilla, C. N. Barati, S. Rangan, and M. Zorzi, \Comparative anal- ysis of initial access techniques in 5g mmWave cellular networks," in 2016 Annual Con- ference on Information Science and Systems (CISS), IEEE, mar 2016. [85] D. Love, R. Heath, and T. Strohmer, \Grassmannian beamforming for multiple-input multiple-output wireless systems," IEEE Transactions on Information Theory, vol. 49, pp. 2735{2747, oct 2003. 157 [86] D. Love and R. Heath, \Grassmannian beamforming on correlated MIMO channels," in IEEE Global Telecommunications Conference (GLOBECOM), IEEE, 2004. [87] K. K. Mukkavilli, A. Sabharwal, E. Erkip, and B. Aazhang, \On beamforming with nite rate feedback in multiple-antenna systems," IEEE Transactions on Information Theory, vol. 49, pp. 2562{2579, oct 2003. [88] V. Raghavan, R. W. Heath, and A. M. Sayeed, \Systematic codebook designs for quan- tized beamforming in correlated MIMO channels," IEEE Journal on Selected Areas in Communications, vol. 25, pp. 1298{1310, September 2007. [89] C. Au-yeung and D. 
Love, \On the performance of random vector quantization limited feedback beamforming in a MISO system," IEEE Transactions on Wireless Communica- tions, vol. 6, pp. 458{462, feb 2007. [90] V. Raghavan and V. V. Veeravalli, \Ensemble properties of RVQ-based limited-feedback beamforming codebooks," IEEE Transactions on Information Theory, vol. 59, pp. 8224{ 8249, dec 2013. [91] N. Jindal, \MIMO broadcast channels with nite-rate feedback," IEEE Transactions on Information Theory, vol. 52, pp. 5045{5060, nov 2006. [92] S. Y. Park, D. Park, and D. J. Love, \On scheduling for multiple-antenna wireless net- works using contention-based feedback," IEEE Transactions on Communications, vol. 55, pp. 1174{1190, jun 2007. [93] J. Diaz, O. Simeone, and Y. Bar-Ness, \Asymptotic analysis of reduced-feedback strate- gies for MIMO gaussian broadcast channels," IEEE Transactions on Information Theory, vol. 54, pp. 1308{1316, mar 2008. [94] D. Love, R. Heath, V. N. Lau, D. Gesbert, B. Rao, and M. Andrews, \An overview of limited feedback in wireless communication systems," IEEE Journal on Selected Areas in Communications, vol. 26, pp. 1341{1365, oct 2008. [95] J. Choi, D. J. Love, and P. Bidigare, \Downlink training techniques for FDD massive MIMO systems: Open-loop and closed-loop training with memory," IEEE Journal of Selected Topics in Signal Processing, vol. 8, pp. 802{814, oct 2014. [96] A. Adhikary, E. A. Safadi, M. K. Samimi, R. Wang, G. Caire, T. S. Rappaport, and A. F. Molisch, \Joint spatial division and multiplexing for mm-wave channels," IEEE Journal on Selected Areas in Communications, vol. 32, pp. 1239{1255, jun 2014. [97] V. V. Ratnam, A. F. Molisch, O. Y. Bursalioglu, and H. C. Papadopoulos, \Hy- brid beamforming with selection for multi-user massive MIMO systems," CoRR, vol. abs/1708.00083, 2017. [98] Y. Chi, Y. C. Eldar, and R. 
Calderbank, \PETRELS: Parallel subspace estimation and tracking by recursive least squares from partial observations," IEEE Transactions on Signal Processing, vol. 61, pp. 5947{5959, dec 2013. 158 [99] S. Park and R. W. Heath, \Spatial channel covariance estimation for mmWave hybrid MIMO architecture," in Asilomar Conference on Signals, Systems and Computers, IEEE, nov 2016. [100] J. Lee, G.-T. Gil, and Y. H. Lee, \Channel estimation via orthogonal matching pursuit for hybrid MIMO systems in millimeter wave communications," IEEE Transactions on Communications, vol. 64, pp. 2370{2386, jun 2016. [101] J. Kim and A. F. Molisch, \Fast millimeter-wave beam training with receive beamform- ing," Journal of Communications and Networks, vol. 16, pp. 512{522, oct 2014. [102] V. Desai, L. Krzymien, P. Sartori, W. Xiao, A. Soong, and A. Alkhateeb, \Initial beam- forming for mmWave communications," in Asilomar Conference on Signals, Systems and Computers, IEEE, nov 2014. [103] M. Giordani, M. Mezzavilla, and M. Zorzi, \Initial access in 5G mmWave cellular net- works," IEEE Communications Magazine, vol. 54, pp. 40{47, nov 2016. [104] F. Devoti, I. Filippini, and A. Capone, \Facing the millimeter-wave cell discovery chal- lenge in 5G networks with context-awareness," IEEE Access, vol. 4, pp. 8019{8034, 2016. [105] X. Gao, L. Dai, Y. Zhang, T. Xie, X. Dai, and Z. Wang, \Fast channel tracking for tera- hertz beamspace massive MIMO systems," IEEE Transactions on Vehicular Technology, vol. 66, pp. 5689{5696, jul 2017. [106] J. Li, Y. Sun, L. Xiao, S. Zhou, and C. E. Koksal, \Super fast beam tracking in phased antenna arrays," CoRR, vol. abs/1710.07873, 2017. [107] P. Pal and P. P. Vaidyanathan, \Nested arrays: A novel approach to array processing with enhanced degrees of freedom," IEEE Transactions on Signal Processing, vol. 58, pp. 4167{4181, aug 2010. [108] P. P. Vaidyanathan and P. 
Pal, \Sparse sensing with co-prime samplers and arrays," IEEE Transactions on Signal Processing, vol. 59, pp. 573{586, feb 2011. [109] D. Romero and G. Leus, \Compressive covariance sampling," in Information Theory and Applications Workshop (ITA), IEEE, feb 2013. [110] R. Mendez-Rial, N. Gonzalez-Prelcic, and R. W. Heath, \Augmented covariance estima- tion with a cyclic approach in DOA," in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, apr 2015. [111] TR36.873, \Technical specication group radio access network; study on 3D channel model for LTE (release 12)," Tech. Rep. v12.2.0, 3rd Generation Partnership Project, 2015. [112] R. R. Romanofsky, \Array phase shifters: Theory and technology," Tech. Rep. NASA/TM-2007-214906, NASA Glenn Research Center, 2007. 159 [113] Y.-S. Yeh, B. Walker, E. Balboni, and B. Floyd, \A 28-GHz phased-array receiver front end with dual-vector distributed beamforming," IEEE Journal of Solid-State Circuits, vol. 52, pp. 1230{1244, may 2017. [114] E. Adabi Firouzjaei, mm-Wave Phase Shifters and Switches. PhD thesis, EECS Depart- ment, University of California, Berkeley, Dec 2010. available at: http://www2.eecs. berkeley.edu/Pubs/TechRpts/2010/EECS-2010-163.html. [115] R. L. Schmid, P. Song, C. T. Coen, A. C. Ulusoy, and J. D. Cressler, \On the analysis and design of low-loss single-pole double-throw w-band switches utilizing saturated SiGe HBTs," IEEE Transactions on Microwave Theory and Techniques, vol. 62, pp. 2755{2767, nov 2014. [116] A. A. Nasir, S. Durrani, H. Mehrpouyan, S. D. Blostein, and R. A. Kennedy, \Timing and carrier synchronization in wireless communication systems: a survey and classication of research in the last 5 years," EURASIP Journal on Wireless Communications and Networking, vol. 2016, aug 2016. [117] X. Meng, X. Gao, and X.-G. 
Xia, \Omnidirectional precoding and combining based syn- chronization for millimeter wave massive MIMO systems," IEEE Transactions on Com- munications, vol. 66, pp. 1013{1026, mar 2018. [118] K. Venugopal, A. Alkhateeb, N. G. Prelcic, and R. W. Heath, \Channel estimation for hybrid architecture-based wideband millimeter wave systems," IEEE Journal on Selected Areas in Communications, vol. 35, pp. 1996{2009, sep 2017. [119] O. S. Sands, \Beam-switch transient eects in the RF path of the ICAPA receive phased array antenna," Tech. Rep. NASA/TM-2003-212588, NASA Glenn Research Center, 2003. [120] M. Win, D. Dardari, A. Molisch, W. Wiesbeck, and J. Zhang, \History and applications of UWB [scanning the issue]," Proceedings of the IEEE, vol. 97, pp. 198{204, feb 2009. [121] M. Win and R. Scholtz, \Ultra-wide bandwidth time-hopping spread-spectrum impulse radio for wireless multiple-access communications," IEEE Transactions on Communica- tions, vol. 48, pp. 679{689, apr 2000. [122] M. Win, \A unied spectral analysis of generalized time-hopping spread-spectrum signals in the presence of timing jitter," IEEE Journal on Selected Areas in Communications, vol. 20, pp. 1664{1676, dec 2002. [123] A. Molisch, J. Zhang, and M. Miyake, \Time hopping and frequency hopping in ultraw- ideband systems," in IEEE Pacic Rim Conference on Communications Computers and Signal Processing (PACRIM), IEEE, 2003. [124] M. Win, G. Chrisikos, and N. Sollenberger, \Performance of RAKE reception in dense multipath channels: implications of spreading bandwidth and selection diversity order," IEEE Journal on Selected Areas in Communications, vol. 18, pp. 1516{1525, aug 2000. 160 [125] A. F. Molisch, Wideband Wireless Digital Communications. Pearson Education, 2000. [126] R. Maei, U. Manzoli, and M. Merani, \Rake reception with unequal power path signals," IEEE Transactions on Communications, vol. 52, pp. 24{27, jan 2004. [127] D. Cassioli, M. Win, F. Vatalaro, and A. 
Molisch, \Low complexity rake receivers in ultra- wideband channels," IEEE Transactions on Wireless Communications, vol. 6, pp. 1265{ 1275, apr 2007. [128] K. Witrisal, G. Leus, G. Janssen, M. Pausini, F. Troesch, T. Zasowski, and J. Romme, \Noncoherent ultra-wideband systems," IEEE Signal Processing Magazine, vol. 26, pp. 48{66, jul 2009. [129] A. Schenk and R. F. Fischer, \Decision-feedback dierential detection in impulse-radio ultra-wideband systems," IEEE Transactions on Communications, vol. 59, pp. 1604{1611, jun 2011. [130] R. Hoctor and H. Tomlinson, \Delay-hopped transmitted-reference RF communications," in IEEE Conference on Ultra Wideband Systems and Technologies, IEEE, 2002. [131] V. V. Ratnam, A. F. Molisch, N. Rabeah, F. Alawwad, and H. Behairy, \JS-RAKE: Judiciously trained selective RAKE receiver for UWB systems," in IEEE International Conference on Ubiquitous Wireless Broadband (ICUWB), IEEE, oct 2016. [132] D. L. Goeckel and Q. Zhang, \Slightly frequency-shifted reference ultra-wideband (UWB) radio," IEEE Transactions on Communications, vol. 55, pp. 508{519, mar 2007. [133] A. A. D'Amico and U. Mengali, \Code-multiplexed UWB transmitted-reference radio," IEEE Transactions on Communications, vol. 56, pp. 2125{2132, dec 2008. [134] V. V. Ratnam, G. Caire, and A. F. Molisch, \Capacity analysis of interlaced clustering in a distributed antenna system," in IEEE International Conference on Communications (ICC), IEEE, jun 2015. [135] V. V. Ratnam, A. F. Molisch, and G. Caire, \Capacity analysis of interlaced clustering in a distributed transmission system with/without CSIT," IEEE Transactions on Wireless Communications, vol. 15, pp. 2629{2641, apr 2016. [136] D. Gore and A. Paulraj, \MIMO antenna subset selection with space-time coding," IEEE Transactions on Signal Processing, vol. 50, pp. 2580{2588, Oct 2002. [137] A. Molisch and M. Win, \MIMO systems with antenna selection," IEEE Microwave Magazine, vol. 5, pp. 46{56, mar 2004. [138] Y. Zeng, R. 
Zhang, and Z. N. Chen, \Electromagnetic lens-focusing antenna enabled massive MIMO: Performance improvement and cost reduction," IEEE Journal on Selected Areas in Communications, vol. 32, pp. 1194{1206, jun 2014. 161 [139] X. Wang, Z. Zhang, K. Long, and X.-D. Zhang, \Joint group power allocation and pre- beamforming for joint spatial-division multiplexing in multiuser massive MIMO systems," in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, apr 2015. [140] J. Kermoal, L. Schumacher, K. Pedersen, P. Mogensen, and F. Frederiksen, \A stochastic MIMO radio channel model with experimental validation," IEEE Journal on Selected Areas in Communications, vol. 20, pp. 1211{1226, aug 2002. [141] S. Vishwanath, N. Jindal, and A. Goldsmith, \Duality, achievable rates, and sum-rate ca- pacity of gaussian mimo broadcast channels," IEEE Transactions on Information Theory, vol. 49, pp. 2658{2668, oct 2003. [142] M. Costa, \Writing on dirty paper (corresp.)," IEEE Transactions on Information Theory, vol. 29, pp. 439{441, may 1983. [143] H. Weingarten, Y. Steinberg, and S. Shamai, \The capacity region of the gaussian multiple-input multiple-output broadcast channel," IEEE Transactions on Information Theory, vol. 52, pp. 3936{3964, sep 2006. [144] G. Caire and S. Shamai, \On the achievable throughput of a multiantenna gaussian broadcast channel," IEEE Transactions on Information Theory, vol. 49, pp. 1691{1706, jul 2003. [145] J. Lee and N. Jindal, \High SNR analysis for MIMO broadcast channels: Dirty pa- per coding versus linear precoding," IEEE Transactions on Information Theory, vol. 53, pp. 4787{4792, dec 2007. [146] A. G. Akritas, E. K. Akritas, and G. I. Malaschonok, \Various proofs of sylvester's (de- terminant) identity," Mathematics and Computers in Simulation, vol. 42, pp. 585{593, nov 1996. [147] V. V. Ratnam, A. F. Molisch, and H. C. 
Papadopoulos, \MIMO systems with restricted pre/post-coding{capacity analysis based on coupled doubly correlated wishart matrices," IEEE Transactions on Wireless Communications, vol. 15, pp. 8537{8550, dec 2016. [148] B. Mondal and R. W. Heath, \Performance analysis of quantized beamforming MIMO systems," IEEE Transactions on Signal Processing, vol. 54, pp. 4753{4766, dec 2006. [149] D. Love and R. Heath, \Limited feedback unitary precoding for spatial multiplexing systems," IEEE Transactions on Information Theory, vol. 51, pp. 2967{2976, aug 2005. [150] A. M. Tulino and S. Verdu, Random Matrix Theory and Wireless Communications (Foun- dations and Trends in Communications and Information Theory). Now Publishers Inc, 2004. [151] C. Martin and B. Ottersten, \Asymptotic eigenvalue distributions and capacity for MIMO channels under correlated fading," IEEE Transactions on Wireless Communications, vol. 3, pp. 1350{1359, jul 2004. 162 [152] D. Morales-Jimenez, J. F. Paris, J. T. Entrambasaguas, and K.-K. Wong, \On the diago- nal distribution of a complex wishart matrix and its application to the analysis of MIMO systems," IEEE Transactions on Communications, vol. 59, pp. 3475{3484, dec 2011. [153] A. Maaref and S. A>ssa, \Joint and marginal eigenvalue distributions of (non)central com- plex wishart matrices and PDF-based approach for characterizing the capacity statistics of MIMO ricean and rayleigh fading channels," IEEE Transactions on Wireless Commu- nications, vol. 6, pp. 3607{3619, oct 2007. [154] P. J. Smith and L. M. Garth, \Distribution and characteristic functions for correlated complex wishart matrices," Journal of Multivariate Analysis, vol. 98, pp. 661{677, apr 2007. [155] P.-H. Kuo, P. Smith, and L. Garth, \Joint density for eigenvalues of two correlated complex wishart matrices: Characterization of MIMO systems," IEEE Transactions on Wireless Communications, vol. 6, pp. 3902{3906, nov 2007. [156] K. V. 
Mardia, \Measures of multivariate skewness and kurtosis with applications," Biometrika, vol. 57, no. 3, pp. 519{530, 1970. [157] P. Smith and M. Sha, \On a gaussian approximation to the capacity of wireless MIMO systems," in IEEE International Conference on Communications (ICC), IEEE, 2002. [158] P. Smith, S. Roy, and M. Sha, \Capacity of MIMO systems with semicorrelated at fading," IEEE Transactions on Information Theory, vol. 49, pp. 2781{2788, oct 2003. [159] M. K. Simon, Probability Distributions Involving Gaussian Random Variables: A Hand- book for Engineers and Scientists (The Springer International Series in Engineering and Computer Science). Springer, 2002. [160] R. B. Arellano-Valle and M. G. Genton, \On the exact distribution of linear combinations of order statistics from dependent random variables," Journal of Multivariate Analysis, vol. 98, pp. 1876{1894, nov 2007. [161] H. N. Nagaraja, \Order statistics from independent exponential random variables and the sum of the top order statistics," in Statistics for Industry and Technology, pp. 173{185, Birkh auser Boston, 2006. [162] A. M. Ross, \Computing bounds on the expected maximum of correlated normal vari- ables," Methodology and Computing in Applied Probability, vol. 12, pp. 111{138, aug 2008. [163] D. R. Hoover, \Bounds on expectations of order statistics for dependent samples," Statis- tics & Probability Letters, vol. 8, pp. 261{265, aug 1989. [164] B. C. Arnold and R. A. Groeneveld, \Bounds on expectations of linear systematic statis- tics based on dependent samples," The Annals of Statistics, vol. 7, no. 1, pp. 220{223, 1979. 163 [165] Y. L. Tong, \Order statistics of normal variables," in Springer Series in Statistics, pp. 123{149, Springer New York, 1990. [166] C. E. Clark, \The greatest of a nite set of random variables," Oper. Res., vol. 9, pp. 145{ 162, Apr. 1961. [167] D. Sinha, H. Zhou, and N. V. 
Shenoy, \Advances in computation of the maximum of a set of gaussian random variables," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 26, pp. 1522{1533, aug 2007. [168] S. Asmussen, J. L. Jensen, and L. Rojas-Nandayapa, \A literature review on lognormal sums," 2014. available at: http://citeseerx.ist.psu.edu/viewdoc/download?doi= 10.1.1.719.5927&rep=rep1&type=pdf. [169] L. Fenton, \The sum of log-normal probability distributions in scatter transmission sys- tems," IEEE Transactions on Communications, vol. 8, no. 1, pp. 57{67, 1960. [170] A. Abu-Dayya and N. Beaulieu, \Outage probabilities in the presence of correlated lognor- mal interferers," IEEE Transactions on Vehicular Technology, vol. 43, no. 1, pp. 164{173, 1994. [171] N. Mehta, J. Wu, A. Molisch, and J. Zhang, \Approximating a sum of random variables with a lognormal," IEEE Transactions on Wireless Communications, vol. 6, pp. 2690{ 2699, jul 2007. [172] A. Safak, \Statistical analysis of the power sum of multiple correlated log-normal com- ponents," IEEE Transactions on Vehicular Technology, vol. 42, no. 1, pp. 58{61, 1993. [173] S. C. Schwartz and Y. S. Yeh, \On the distribution function and moments of power sums with log-normal components," Bell System Technical Journal, vol. 61, pp. 1441{1462, sep 1982. [174] W. Bar and F. Dittrich, \Useful formula for moment computation of normal random vari- ables with nonzero means," IEEE Transactions on Automatic Control, vol. 16, pp. 263{ 265, jun 1971. [175] A. J. Homan and H. W. Wielandt, \The variation of the spectrum of a normal matrix," Duke Mathematical Journal, vol. 20, pp. 37{39, mar 1953. [176] R. Bhatia, Matrix Analysis. Springer New York, 1997. [177] H. Cram er and H. Wold, \Some theorems on distribution functions," Journal of the London Mathematical Society, vol. s1-11, pp. 290{294, oct 1936. [178] G. Levin and S. 
Loyka, \Comments on "asymptotic eigenvalue distributions and capacity for MIMO channels under correlated fading," IEEE Transactions on Wireless Communi- cations, vol. 7, pp. 475{479, feb 2008. 164 [179] J. Pierce and S. Stein, \Multiple diversity with nonindependent fading," Proceedings of the IRE, vol. 48, pp. 89{104, jan 1960. [180] G. S. Ulf Grenander, Toeplitz Forms and Their Applications (AMS Chelsea Publishing). American Mathematical Society, 2001. [181] E. Bjornson, M. Bengtsson, and B. Ottersten, \Optimal multiuser transmit beamform- ing: A dicult problem with a simple solution structure [lecture notes]," IEEE Signal Processing Magazine, vol. 31, pp. 142{148, jul 2014. [182] R. A. Horn and R. Mathias, \An analog of the cauchy-schwarz inequality for hadamard products and unitarily invariant norms," SIAM Journal on Matrix Analysis and Applica- tions, vol. 11, pp. 481{498, oct 1990. [183] N. R. Goodman, \The distribution of the determinant of a complex wishart distributed matrix," Ann. Math. Statist., vol. 34, pp. 178{180, 03 1963. [184] A. Barg and D. Nogin, \Bounds on packings of spheres in the grassmann manifold," IEEE Transactions on Information Theory, vol. 48, pp. 2450{2454, sep 2002. [185] R. A. Vitale, \Some comparisons for gaussian processes," Proceedings of the American Mathematical Society, vol. 128, pp. 3043{3047, apr 2000. [186] A. Medra and T. N. Davidson, \Flexible codebook design for limited feedback systems via sequential smooth optimization on the grassmannian manifold," IEEE Transactions on Signal Processing, vol. 62, pp. 1305{1318, mar 2014. [187] I. S. Dhillon, R. W. Heath, T. Strohmer, and J. A. Tropp, \Constructing packings in grassmannian manifolds via alternating projection," Experiment. Math., vol. 17, no. 1, pp. 9{35, 2008. [188] M. Deza, P. Erds, and P. Frankl, \Intersection properties of systems of nite sets," Pro- ceedings of the London Mathematical Society, vol. s3-36, pp. 369{384, mar 1978. [189] L. Babai, P. Frankl, and U. 
of Chicago. Department of Computer Science, Linear Al- gebra Methods in Combinatorics with Applications to Geometry and Computer Science. Department of Computer Science, The University of Chicago, 1992. [190] S. Ramanujan, \A proof of bertrand's postulate," Journal of Indian Math. Society, 1919. [191] J. Sondow and E. W. Weisstein, \Bertrand's postulate." From MathWorld{A Wolfram Web Resource. Available at: http://mathworld.wolfram.com/BertrandsPostulate. html. [192] A. Ispas, C. Schneider, G. Ascheid, and R. Thoma, \Analysis of the local quasi- stationarity of measured dual-polarized MIMO channels," IEEE Transactions on Ve- hicular Technology, vol. 64, pp. 3481{3493, aug 2015. 165 [193] R. Wang, C. U. Bas, S. Sangodoyin, S. Hur, J. Park, J. Zhang, and A. F. Molisch, \Sta- tionarity region of mm-wave channel based on outdoor microcellular measurements at 28 GHz," in MILCOM 2017 - 2017 IEEE Military Communications Conference (MILCOM), IEEE, oct 2017. [194] A. Medra and T. N. Davidson, \Flexible codebook design for limited feedback sys- tems [online]." Available at: http://www.ece.mcmaster.ca/davidson/pubs/Flexible_ codebook_design.html, 2015. [195] G. Dimic and N. Sidiropoulos, \On downlink beamforming with greedy user selection: performance analysis and a simple new algorithm," IEEE Transactions on Signal Pro- cessing, vol. 53, pp. 3857{3868, oct 2005. [196] P. Xia and G. Giannakis, \Design and analysis of transmit-beamforming based on limited- rate feedback," IEEE Transactions on Signal Processing, vol. 54, pp. 1853{1863, may 2006. [197] A. Edelman, T. A. Arias, and S. T. Smith, \The geometry of algorithms with orthogo- nality constraints," SIAM J. Matrix Anal. Appl., vol. 20, pp. 303{353, Apr. 1999. [198] D. Cassioli, M. Win, F. Vatalaro, and A. Molisch, \Performance of low-complexity RAKE reception in a realistic UWB channel," in IEEE International Conference on Communi- cations (ICC), IEEE, 2002. [199] W. C. Lan, M.-S. Alouini, and M. 
Simon, "Optimum spreading bandwidth for selective RAKE reception over Rayleigh fading channels," IEEE Journal on Selected Areas in Communications, vol. 19, pp. 1080–1089, Jun. 2001.
[200] T. Eng, N. Kong, and L. Milstein, "Comparison of diversity combining techniques for Rayleigh-fading channels," IEEE Transactions on Communications, vol. 44, no. 9, pp. 1117–1129, 1996.
[201] R. Mallik and M. Win, "Analysis of H-S/MRC in correlated Nakagami fading," in IEEE Global Telecommunications Conference (GLOBECOM), IEEE, 2002.
[202] Y. Ma and S. Pasupathy, "Performance of generalized selection combining on generalized fading channels," in IEEE International Conference on Communications (ICC), IEEE, 2003.
[203] Y. Ma and S. Pasupathy, "Efficient performance evaluation for generalized selection combining on generalized fading channels," IEEE Transactions on Wireless Communications, vol. 3, pp. 29–34, Jan. 2004.
[204] A. Annamalai, G. Deora, and C. Tellambura, "Theoretical diversity improvement in GSC(N,L) receiver with nonidentical fading statistics," IEEE Transactions on Communications, vol. 53, pp. 1027–1035, Jun. 2005.
[205] X. Zhang and N. C. Beaulieu, "Performance analysis of generalized selection combining in generalized correlated Nakagami-m fading," IEEE Transactions on Communications, vol. 54, pp. 2103–2112, Nov. 2006.
[206] F. Yilmaz and M.-S. Alouini, "Novel asymptotic results on the high-order statistics of the channel capacity over generalized fading channels," in IEEE International Workshop on Signal Processing Advances in Wireless Communications (SPAWC), IEEE, Jun. 2012.
[207] F. Yilmaz and M.-S. Alouini, "A unified MGF-based capacity analysis of diversity combiners over generalized fading channels," IEEE Transactions on Communications, vol. 60, pp. 862–875, Mar. 2012.
[208] IEEE802.15.4, "Low-rate wireless personal area networks (LR-WPANs)," tech. rep., IEEE Computer Society, 2011.
[209] TR36.814, "Further advancements for E-UTRA physical layer aspects," tech. rep., 3rd Generation Partnership Project, 2010.
[210] N. Rendevski and D. Cassioli, "Potentials of low-complexity rake receivers for 60 GHz UWB wireless communication systems," in IEEE International Forum on Research and Technologies for Society and Industry Leveraging a better tomorrow (RTSI), IEEE, Sep. 2015.
[211] N. Decarli, A. Giorgetti, D. Dardari, M. Chiani, and M. Z. Win, "Stop-and-go receivers for non-coherent impulse communications," IEEE Transactions on Wireless Communications, vol. 13, pp. 4821–4835, Sep. 2014.
[212] Q. Zhang and D. Goeckel, "Multi-differential slightly frequency-shifted reference ultra-wideband (UWB) radio," in Annual Conference on Information Sciences and Systems (CISS), IEEE, Mar. 2006.
[213] H. Nie and Z. Chen, "Differential code-shifted reference ultra-wideband (UWB) radio," in IEEE Vehicular Technology Conference (VTC), IEEE, Sep. 2008.
[214] S. Haykin and M. Moher, Communication Systems. Wiley, 2009.
[215] J. Karedal, S. Wyne, P. Almers, F. Tufvesson, and A. Molisch, "A measurement-based statistical model for industrial ultra-wideband channels," IEEE Transactions on Wireless Communications, vol. 6, pp. 3028–3037, Aug. 2007.
[216] G. Ungerboeck, "Channel coding with multilevel/phase signals," IEEE Transactions on Information Theory, vol. 28, pp. 55–67, Jan. 1982.
[217] N. Papandreou and T. Antonakopoulos, "Bit and power allocation in constrained multicarrier systems: The single-user case," EURASIP Journal on Advances in Signal Processing, vol. 2008, Jul. 2007.
[218] D. Hughes-Hartogs, "Ensemble modem structure for imperfect transmission media." US patent, 1988.
[219] A. Lozano, A. Tulino, and S. Verdu, "Optimum power allocation for parallel Gaussian channels with arbitrary input distributions," IEEE Transactions on Information Theory, vol. 52, pp. 3033–3051, Jul. 2006.
[220] A. F. Molisch, K. Balakrishnan, D. Cassioli, C.-C. Chong, S. Emami, A. Fort, J. Karedal, J. Kunisch, H. Schantz, U. Schuster, and K. Siwiak, "IEEE 802.15.4a channel model - final report," 2004. Available at: http://www.ieee802.org/15/pub/TG4a.html.
[221] V. V. Ratnam and A. Molisch, "Multi-antenna FSR receivers: Low complexity non-coherent receivers for massive MIMO," in IEEE Global Communications Conference (GLOBECOM), (Abu Dhabi, United Arab Emirates), Dec. 2018.
[222] L. Zheng and D. Tse, "Communication on the Grassmann manifold: a geometric approach to the noncoherent multiple-antenna channel," IEEE Transactions on Information Theory, vol. 48, no. 2, pp. 359–383, 2002.
[223] K. Ghavami and M. Naraghi-Pour, "Noncoherent SIMO detection by expectation propagation," in 2017 IEEE International Conference on Communications (ICC), IEEE, May 2017.
[224] V. V. Ratnam, A. F. Molisch, A. Alasaad, F. Alawwad, and H. Behairy, "Bit and power allocation in QAM capable multi-differential frequency-shifted reference UWB radio," in IEEE Global Communications Conference (GLOBECOM), IEEE, Dec. 2017.
[225] V. V. Ratnam and A. Molisch, "Reference tone aided transmission for massive MIMO: analog beamforming without CSI," in IEEE International Conference on Communications (ICC), (Kansas City, USA), May 2018.
[226] A. Viterbi, "Phase-locked loop dynamics in the presence of noise by Fokker-Planck techniques," Proceedings of the IEEE, vol. 51, no. 12, pp. 1737–1753, 1963.
[227] L. Kong, Energy-Efficient 60GHz Phased-Array Design for Multi-Gb/s Communication Systems. PhD thesis, UC Berkeley, 2012. Available at: http://www2.eecs.berkeley.edu/Pubs/TechRpts/2014/EECS-2014-191.html.
[228] M. Breese, R. Colbert, W. Rubin, and P. Sferrazza, "Phase-locked loops for electronically scanned antenna arrays," IRE Transactions on Space Electronics and Telemetry, vol. SET-7, no. 4, pp. 95–100, 1961.
[229] R. Ghose, "Electronically adaptive antenna systems," IEEE Transactions on Antennas and Propagation, vol. 12, pp. 161–169, Mar. 1964.
[230] P. Thompson, "Adaptation by direct phase-shift adjustment in narrow-band adaptive antenna systems," IEEE Transactions on Antennas and Propagation, vol. 24, pp. 756–760, Sep. 1976.
[231] C. Golliday and R. Huff, "Phase-locked loop coherent combiners for phased array sensor systems," IEEE Transactions on Communications, vol. 30, pp. 2329–2340, Oct. 1982.
[232] A. Viterbi, Principles of Coherent Communication. McGraw-Hill Inc., 1967.
[233] S. Gupta, "Phase-locked loops," Proceedings of the IEEE, vol. 63, no. 2, pp. 291–306, 1975.
[234] L. Piazzo and P. Mandarini, "Analysis of phase noise effects in OFDM modems," IEEE Transactions on Communications, vol. 50, pp. 1696–1705, Oct. 2002.
[235] Z. Liu, J.-Y. Kim, D. S. Wu, D. J. Richardson, and R. Slavik, "Homodyne OFDM with optical injection locking for carrier recovery," Journal of Lightwave Technology, vol. 33, pp. 34–41, Jan. 2015.
[236] C. N. Barati, S. A. Hosseini, M. Mezzavilla, T. Korakis, S. S. Panwar, S. Rangan, and M. Zorzi, "Initial access in millimeter wave cellular systems," IEEE Transactions on Wireless Communications, vol. 15, pp. 7926–7940, Dec. 2016.
[237] S. K. Garakoui, E. A. M. Klumperink, B. Nauta, and F. E. van Vliet, "Phased-array antenna beam squinting related to frequency dependency of delay circuits," in European Microwave Conference, pp. 1304–1307, Oct. 2011.
[238] A. Mehrotra, "Noise analysis of phase-locked loops," IEEE Transactions on Circuits and Systems I: Fundamental Theory and Applications, vol. 49, pp. 1309–1316, Sep. 2002.
[239] D. Petrovic, W. Rave, and G. Fettweis, "Effects of phase noise on OFDM systems with and without PLL: Characterization and compensation," IEEE Transactions on Communications, vol. 55, pp. 1607–1616, Aug. 2007.
[240] D. Messerschmitt, "Frequency detectors for PLL acquisition in timing and carrier recovery," IEEE Transactions on Communications, vol. 27, pp. 1288–1295, Sep. 1979.
[241] Y. Venkataramayya and B. S. Sonde, "Acquisition time improvement of PLLs using some aiding functions," Indian Institute of Science Journal, vol. 63, pp. 73–88, Mar. 1981.
[242] J. H. Schrader, "Receiving system design for the arraying of independently steerable antennas," IRE Transactions on Space Electronics and Telemetry, vol. SET-8, no. 2, pp. 148–153, 1962.
[243] J. Schrader, "A phase-lock receiver for the arraying of independently directed antennas," IEEE Transactions on Antennas and Propagation, vol. 12, pp. 155–161, Mar. 1964.
[244] P. Robertson and S. Kaiser, "Analysis of the effects of phase-noise in orthogonal frequency division multiplex (OFDM) systems," in IEEE International Conference on Communications (ICC), IEEE, 1995.
[245] S. Wu, P. Liu, and Y. Bar-Ness, "Phase noise estimation and mitigation for OFDM systems," IEEE Transactions on Wireless Communications, vol. 5, pp. 3616–3625, Dec. 2006.
[246] S. Randel, S. Adhikari, and S. L. Jansen, "Analysis of RF-pilot-based phase noise compensation for coherent optical OFDM systems," IEEE Photonics Technology Letters, vol. 22, pp. 1288–1290, Sep. 2010.
[247] M.2135-1, "Guidelines for evaluation of radio interface technologies for IMT-Advanced," tech. rep., International Telecommunications Union, Dec. 2009.
[248] TR36.814, "Further advancements for E-UTRA physical layer aspects (release 9)," tech. rep., 3rd Generation Partnership Project, Mar. 2010.
[249] TR38.900, "Study on channel model for frequency spectrum above 6 GHz (release 14)," Tech. Rep. V14.3.1, 3rd Generation Partnership Project, 2017.
[250] D. Abramovitch, "Lyapunov redesign of analog phase-lock loops," IEEE Transactions on Communications, vol. 38, no. 12, pp. 2197–2202, 1990.
[251] W. C. Lindsey and R. C. Tausworthe, "A bibliography of the theory and application of the phase-lock principle," Tech. Rep. 19730017475, Jet Propulsion Lab., California Inst. of Tech., 1973.
[252] W. C. Lindsey, Phase-Locked Loops and Their Application (IEEE Press selected reprint series). IEEE, 1978.
Abstract
Spectral efficiency, measured in bits/s/Hz, was the main performance criterion in the design of wireless communication systems until about a decade ago. However, with the advent of the 'internet of things' and with the increase in system bandwidths and transceiver complexity, cost and energy efficiency have also become important design metrics. Thus, in the interest of making systems practically viable, significant attention has been devoted to the design of 'reduced complexity' transceivers that can reap the benefits of state-of-the-art communication technologies, such as massive Multiple Input Multiple Output (MIMO) and Ultra-Wide Band (UWB) communication, while remaining cost and energy efficient. A common feature of these reduced complexity transceivers (e.g., antenna selection, hybrid beamforming, selective Rake receivers and transmit reference receivers) is the reduction of the signal dimension via analog pre-processing/beamforming, which facilitates the use of a lower dimensional digital processor. Such 'hybrid' signal processing techniques usually require some knowledge of the wireless propagation channel, which is conventionally obtained using digital channel estimation (CE) techniques. However, because the signal is not fully digitized in these transceivers, a significant fraction of the transmission resources may have to be expended to obtain this channel knowledge. Thus, conventional CE techniques incur a large CE overhead.

In the first part of this thesis, a novel class of such reduced complexity transceivers, namely Hybrid Beamforming with Selection (HBwS), is presented, which utilizes switches to aid the analog beamforming. The use of switches enables the analog beamforming to adapt to instantaneous channel variations, providing better user separability, beamforming gain, and/or simpler hardware than some conventional reduced complexity transceivers. In this part, the capacity of a system with an HBwS transmitter is analyzed and good designs for the analog beamforming stage are proposed. The trade-off between the system capacity and the CE overhead is also characterized and incorporated into the proposed transceiver designs.

In the second part, it is shown that in sparse multi-path channels, estimates of the amplitude and phase of a single transmitted sinusoidal tone constitute 'sufficient channel knowledge' for enabling hybrid signal processing at the receiver. Since amplitude and phase estimation of a single tone is significantly simpler than conventional CE, it can be performed in the analog domain. Thus, by avoiding digital CE, the estimation overhead of reduced complexity receivers can be lowered significantly. To illustrate this, three novel receiver architectures that can perform analog CE are presented. These new receivers admit both coherent and non-coherent implementations and are in fact inspired by a class of non-coherent receivers for UWB systems. The part concludes with a discussion of the advantages and limitations of such analog CE techniques and some approaches to address those limitations.
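To make the notion of single-tone amplitude and phase estimation concrete, the following is a minimal numerical sketch and not the thesis's method: the thesis performs this estimation in the analog domain, whereas this example correlates sampled data with a complex exponential in software. The function name `estimate_tone`, the sample rate, the tone frequency, and the test amplitude and phase are all hypothetical values chosen for illustration, and the signal is assumed noiseless and to span a whole number of tone periods.

```python
import cmath
import math


def estimate_tone(samples, freq_hz, sample_rate_hz):
    """Estimate amplitude and phase of A*cos(2*pi*f*t + phi) from real samples.

    Correlates the samples with exp(-j*2*pi*f*t). When the record spans a
    whole number of tone periods, the double-frequency term sums to zero
    and the correlation equals (N * A / 2) * exp(j * phi).
    """
    n = len(samples)
    acc = 0j
    for k, x in enumerate(samples):
        acc += x * cmath.exp(-2j * math.pi * freq_hz * k / sample_rate_hz)
    z = 2 * acc / n  # complex gain A * exp(j * phi)
    return abs(z), cmath.phase(z)


# Hypothetical test signal: 50 Hz tone sampled at 1 kHz for 1000 samples
# (exactly 50 periods), with amplitude 0.8 and phase 0.6 rad.
fs = 1000.0
f0 = 50.0
true_amp, true_phi = 0.8, 0.6
x = [true_amp * math.cos(2 * math.pi * f0 * k / fs + true_phi) for k in range(1000)]

est_amp, est_phi = estimate_tone(x, f0, fs)
print(f"amplitude ~ {est_amp:.6f}, phase ~ {est_phi:.6f} rad")
```

Only two real quantities are recovered here, which is the sense in which single-tone estimation is simpler than estimating a full channel response; how these quantities are obtained and used in hardware is described in the thesis itself, not in this sketch.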
Linked assets
University of Southern California Dissertations and Theses
Conceptually similar
Hybrid beamforming for massive MIMO
Design and analysis of large scale antenna systems
Precoding techniques for efficient ultra-wideband (UWB) communication systems
Novel optimization tools for structured signals recovery: channels estimation and compressible signal recovery
Modeling and analysis of propulsion systems and components for electrified commercial aircraft
Exploiting side information for link setup and maintenance in next generation wireless networks
Elements of next-generation wireless video systems: millimeter-wave and device-to-device algorithms
Efficient inverse analysis with dynamic and stochastic reductions for large-scale models of multi-component systems
Physics-based bistatic radar scattering model for vegetated terrains in support of soil moisture retrieval from signals of opportunity
Asset Metadata
Creator
Ratnam, Vishnu Vardhan (author)
Core Title
Design and analysis of reduced complexity transceivers for massive MIMO and UWB systems
School
Viterbi School of Engineering
Degree
Doctor of Philosophy
Degree Program
Electrical Engineering
Publication Date
02/02/2019
Defense Date
06/08/2018
Publisher
University of Southern California (original), University of Southern California. Libraries (digital)
Tag
analog channel estimation, antenna selection, beam alignment, beamformer design, channel estimation, hybrid beamforming, hybrid beamforming with selection, hybrid channel estimation, limited feedback precoding, massive MIMO, MIMO, non-coherent transmission, OAI-PMH Harvest, restricted precoding, selective rake, transmit reference
Format
application/pdf (imt)
Language
English
Contributor
Electronically uploaded by the author (provenance)
Advisor
Molisch, Andreas F. (committee chair), Alexander, Kenneth (committee member), Chugg, Keith M. (committee member), Lindsey, William C. (committee member)
Creator Email
ratnam@usc.edu, ratnamvishnuvardhan@gmail.com
Permanent Link (DOI)
https://doi.org/10.25549/usctheses-c89-50092
Unique identifier
UC11668661
Identifier
etd-RatnamVish-6528.pdf (filename), usctheses-c89-50092 (legacy record id)
Legacy Identifier
etd-RatnamVish-6528.pdf
Dmrecord
50092
Document Type
Dissertation
Rights
Ratnam, Vishnu Vardhan
Type
texts
Source
University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Access Conditions
The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the a...
Repository Name
University of Southern California Digital Library
Repository Location
USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA