Hybrid Beamforming for Massive MIMO

Zheda Li

A Dissertation Presented to the FACULTY OF THE GRADUATE SCHOOL, UNIVERSITY OF SOUTHERN CALIFORNIA, In Partial Fulfillment of the Requirements for the Degree DOCTOR OF PHILOSOPHY (ELECTRICAL ENGINEERING)

Supervisor: Prof. Andreas F. Molisch

August 2018

Copyright 2018 Zheda Li

Dedication

To my parents: Li, Jian and Kang, Changfang

Acknowledgements

First of all, I would like to thank my adviser, Prof. Andreas Molisch. It has been my great honor to be his student over the past years. I am deeply grateful for his academic advice and funding support, which allowed me to accomplish my PhD work. I sincerely appreciate Prof. Molisch's great vision and insight, which led me onto the right track. Prof. Molisch was very patient in guiding me and giving me suggestions. He went through all of my mathematical derivations, in paper drafts and project reports alike, and pointed out mistakes, even small typos. I still remember how efficient and professional Prof. Molisch was when he sat beside me and helped me revise my first conference paper years ago. I also remember the days we collaborated to write a research proposal. Those moments reveal Prof. Molisch's commitment to his students. The experience of working with Prof. Molisch not only strengthened my knowledge of wireless communications, but also set a fine example for me to learn from, which I believe will be an invaluable treasure for my future career.

I am also extremely thankful for the invaluable advice of Prof. Shengqian Han from Beihang University, who collaborated with me over the past three years on my thesis work. I would like to express my gratitude to Prof. Robert Scholtz for choosing me as his teaching assistant for the past three semesters, which built up my teaching experience and improved my communication skills. Thanks to my internship supervisor, Dr. Ozgun Bursalioglu Yilmaz, and other colleagues, including Dr. Haralabos Papadopoulos and Dr. Chenwei Wang, from DoCoMo Innovations.
It was a pleasant and productive collaboration with them over the past two years. I would like to express my gratitude for the time and significant advice of Prof. Giuseppe Caire, Prof. Salman Avestimehr, Prof. Meisam Razaviyayn, Prof. Keith Chugg, and Prof. Antonio Ortega. Thanks to Prof. Chenyang Yang, head of the WelComLab at Beihang University, for offering me the opportunity to visit her lab. I would like to thank all of my colleagues, including Hao Feng, Junyang Shen, Seun Sangodoyin, Sundar Aditya, Vinod Kristem, Joongheon Kim, Rui Wang, Celalettin Umit Bas, Daoud Burghal, and Vishnu Ratnam. It is always enjoyable to work with them.

I would also like to express my gratitude to the Ming Hsieh Department of Electrical Engineering and the Communication Science Institute, especially Diane Demetras, Gerrielyn Ramos, Corine Wong, and Susan Wiedem. Thanks for their generous assistance throughout my studies.

Finally, my deepest gratitude is owed to my Mom and Dad. Without your love and support, I could not have gone through all the difficulties of the past years. Your complete faith in me and caring encouragement are always my source of energy to travel through this journey. I know you are proud of me, but I would like to say I am always proud of being your son. I love you!

Contents

1 Introduction
1.1 Background and Motivation
1.2 Hybrid Digital/Analog Beamforming Architecture
1.2.1 Hybrid Beamforming Based on Instantaneous CSI
1.2.2 Wideband Hybrid Beamforming
1.2.3 Other Low-Complexity Architectures
1.3 Channel-Statistics-Based Hybrid Beamforming
1.3.1 Development of Hybrid Beamforming on Average CSI
1.3.2 Acquisition of Channel Statistics
1.4 Overview of Contributions
1.5 Organization
1.6 Notations
2 System Model
3 Fully-Decoupled Analog/Digital Beamforming with Virtual Sectorization
3.1 Introduction
3.2 System Model with Virtual Sectorization
3.3 Analog Beamformers and Digital Combiner Optimization
3.3.1 Analog Beamformers
3.3.2 Digital Combiner
3.4 Digital Precoder Optimization
3.4.1 Optimization Algorithm
3.4.2 Initialization Strategy
3.5 Simulation Results
3.5.1 One-ring Cluster Model
3.5.2 Ray Tracing Channels
3.5.3 Measured Channels
3.6 Conclusions
4 Optimizing Channel-Statistics-Based Analog Beamforming
4.1 Introduction
4.2 Problem Formulation
4.2.1 Downlink Problem Formulation
4.2.2 Uplink Dual Problem Formulation
4.2.3 Semi-Decoupled Analog Beamforming Design
4.3 Upper Bound of Achievable Sum Rate
4.3.1 Take Expectation Inside the Determinant
4.3.2 Take Expectation Outside the Determinant
4.4 Dual Uplink Transmission with the Approximate Rate Upper Bound
4.4.1 Optimal Transmission under the One-To-One Mapping Channel Model
4.4.2 Extension to a More Generic Channel Model
4.4.3 Common Scatterer Effect: Toy Example
4.5 Simulation Results
4.5.1 One-ring Channel
4.5.2 Two-Path Channel
4.6 Conclusions
5 Joint Optimization of Analog Beamforming and Training Resource Allocation
5.1 Introduction
5.2 Overview of User-Centric Virtual Sectorization
5.2.1 Recap of JSDM
5.2.2 Basic Idea of User-Centric Virtual Sectorization
5.3 Problem Formulation
5.3.1 Training with STDT Phase
5.3.2 Dedicated Data Transmission
5.3.3 Beamformer Optimization
5.4 Algorithm development
5.4.1 Training Order Optimization
5.4.2 Greedy User-Centric Beam Clustering
5.5 Simulation Results
5.5.1 Geometric Stochastic Channel Model (GSCM)
5.5.2 Results and Discussions
5.6 Conclusions
6 Future Research Directions
6.1 Extension of Thesis
6.2 Other Related Directions
Appendices
.1 Proof of Proposition 1
.2 Proof of Theorem 1
.3 Proof of Remark 1
.4 Proof of Lemma 1
Bibliography

List of Figures

1.1 Block diagrams of hybrid beamforming structures at BS for a downlink transmission, where structures A, B, and C denote the full-complexity, reduced-complexity, and virtual-sectorization structures, respectively.
2.1 Fully-connected HDA structure at both link ends
3.1 Logic structure of hybrid beamforming with virtual sectorization
3.2 Sum rate vs. number of clusters with SNR = 40 dB for block diagonalization (left) and eigen-beamforming (right) analog beamformer. The left-side figures do not incorporate the cost of training and feedback overhead, which is considered in the right-side ones. The number on the bars corresponds to the number of UE RF chains.
3.3 Sum rate vs. SNR with number of clusters G = 4 for block diagonalization (left) and eigen-beamforming (right) analog beamformer. The number in the legends corresponds to the number of UE RF chains.
3.4 Ray-tracing simulation environment: two UE routes spread over orthogonal street canyons are selected for simulations, where one is marked by a black arrow, and the other is marked by a blue one.
3.5 Sum rate vs. number of clusters when transmit power $P_t$ = 50 dBm with the ray-tracing simulation data for block diagonalization (left) and eigen-beamforming (right) analog beamformer. The left-side figures do not incorporate the cost of training and feedback overhead, which is considered in the right-side ones. The number on the bars corresponds to the number of UE RF chains.
3.6 Sum rate vs. transmit power $P_t$ when number of clusters G = 2 with the ray-tracing simulation data for block diagonalization (left) and eigen-beamforming (right) analog beamformer. The number in the legends corresponds to the number of UE RF chains.
3.7 Tx view of the urban macro-cell in Cologne
3.8 Rx placement for the urban macro cell in Cologne
3.9 Top view of the measured urban environment and distributions of terminals
3.10 Scatter plot of MPC in the azimuth domain of DOD for partial measured sites
3.11 Sum rate vs. number of clusters when transmit power $P_t$ = 30 dBm with the Cologne measurements data for block diagonalization (left) and eigen-beamforming (right) analog beamformer. The left-side figures do not incorporate the cost of training and feedback overhead, which is considered in the right-side ones. The number on the bars corresponds to the number of UE RF chains.
3.12 Sum rate vs. transmit power $P_t$ with eigen-beamforming (EB) analog precoder based on the Cologne measurements data for the scenario with a single UE group (left) and the one with two UE groups (right). The number in the legends corresponds to the number of UE RF chains.
3.13 Sum rate vs. number of BS RF chains $l_{BS}$ with eigen-beamforming (EB) analog precoder and two UE groups based on the Cologne measurements data for number of UE RF chains $l_{UE}$ equal to 1 and 2.
4.1 Two UEs communicate with the BS via a LOS component and a common cluster
4.2 Ergodic capacity of analog precoder-combiner-projected channel
4.3 Simulation scenarios
4.4 Ergodic capacity of analog precoder-combiner-projected channel under different settings of SNR and number of UEs per group
4.5 Ergodic capacity of analog precoder-combiner-projected channel vs. number of RF chains
4.6 Ergodic capacity of analog precoder-combiner-projected channel under different settings of SNR and power weight of the common scatterer
5.1 Toy example of 3-UE channel: 1) both UE 1 and UE 2 have LOS propagation to the BS, all three UEs "see" a common cluster that couples them, and normalized average power of MPCs, i.e. [ 2 ip ], is also labeled next to dashed lines; 2) generation of beam pair bipartite graph from beam measure table.
5.2 Generation of beam pair bipartite graph when there are multiple receive eigenmodes. Both UE 1 and UE 2 exhibit two receive eigenmodes, while UE 3 has only one pointing to the common cluster.
5.3 Compare the training phase of JSDM and UCVS, where (a) is an example of reduced beam pair bipartite graph, (b) reflects the training process of the JSDM, while (c) and (d) represent the training periods of the UCVS with different training orders, respectively. is the duration of overall training window.
5.4 Conflict graph of transmit beams for beam pair bipartite graph exhibited in Fig. 5.3, and different colors represent different pilot dimensions allocated to transmit beams.
5.5 Illustration of GSCM with UEs and scatterers in a range of DOD support.
5.6 Net average sum rate vs. d for $T_{cor}$ = 20 and = 0.9
5.7 Net average sum rate vs. $T_{cor}$ for all = 20
5.8 Net average sum rate vs. $T_{cor}$ for all = 40
5.9 Net average sum rate vs. for d = 5 dB
5.10 Net average sum rate vs. s for all = 20
5.11 Net average sum rate vs. e for $T_{cor}$ = 20

List of Tables

3.1 Simulation parameters
3.2 Ray-tracing simulation configurations of USC campus
3.3 Measurement channel sounder configuration
4.1 Simulation parameters
5.1 Simulation parameters

Abstract

Deployed with a large number of antenna elements, massive multiple-input multiple-output (MIMO) systems will be an important component of next-generation wireless systems, because they provide a tremendous increase in spectral efficiency. However, two major challenges hinder the implementation of massive MIMO in a practical system. The first one is the hardware constraint. Building a complete up-and-down conversion chain for every antenna element of a large array is not only cost-prohibitive but also entails significant power consumption, especially for the mixed-frequency signal components at millimeter-wave (mm-wave) frequencies. The other challenge is the acquisition of instantaneous channel state information (CSI) for large antenna arrays. Conventional downlink training, whose cost is proportional to the array size at the base station (BS), cannot be directly extended to a massive MIMO system. With the help of channel reciprocity, the uplink training cost is proportional to the number of antennas at the user equipments (UEs). However, if the ensemble of UE antennas is of approximately the same size as the BS antenna array, the dilemma of instantaneous CSI acquisition remains. The severe Doppler effect at mm-wave bandwidths, which leads to short coherence times, worsens the situation.

One possible solution to the above key challenges is the hybrid digital/analog (HDA) architecture, which decomposes the MIMO precoding/combining matrix into two concatenated beamforming matrices, realized in the analog and digital domains, respectively. This leads to a cost- and energy-efficient way to implement massive MIMO.
To alleviate the burden of instantaneous CSI acquisition, the analog (pre-)beamforming can be adapted to the channel statistics or, as we call it, long-term CSI, while the digital beamforming is adapted to the effective channel projected by the analog beamforming. Then, the dimension of the effective channel that needs to be trained is reduced from the number of antenna elements to the number of radio frequency (RF) chains. Since the stationarity region of channel statistics can be equivalent to tens or hundreds of coherence blocks, the long-term CSI can be acquired and tracked at a much slower rate, which significantly mitigates the instantaneous CSI training effort.

Following the surge of mm-wave communications, it is necessary to build large arrays at both BS and UEs to combat the severe path loss and satisfy the required received signal strength. With both link ends equipped with the HDA architecture, and given the special channel characteristics at mm-wave frequencies, beamformer optimization of multi-user (MU-) MIMO encounters the following new constraints: i) there are four types of beamforming matrices that are coupled together, i.e., analog/digital precoder and analog/digital combiner; ii) analog and digital beamformers adapt on different time scales, i.e., to channel statistics and instantaneous CSI, respectively; iii) since the coherence time is much shorter at mm-wave bandwidths, the impact of training cost needs to be considered jointly with beamformer optimization.

This dissertation develops beamforming strategies that leverage channel characteristics and the HDA architecture to make massive MIMO practical in the real world. Throughout the dissertation, we assume full knowledge of second-order channel statistics, which are far less time-sensitive than the instantaneous CSI and can be tracked efficiently.
Based on the mixed-time-scale design, the proposed solutions adopt semi-decoupling of beamformers in the analog and digital domains to reduce the algorithm complexity. The main contributions of this dissertation are:

• Low-complexity hybrid beamforming with fully decoupled analog and digital design: we first optimize the channel-statistics-based analog beamforming to maximize the average signal to interference plus noise ratio (SINR) by treating digital beamformers as identity matrices. Then, by fixing the analog beamforming, the digital beamformers are optimized to maximize the spectral efficiency based on the knowledge of channel statistics and effective instantaneous CSI, where a partial feedback scheme is also investigated to further reduce the cost of CSI acquisition.

• Optimal analog beamforming with capacity-achieving digital beamforming: rather than fixing the analog beamforming to optimize the digital one as above, we assume capacity-achieving digital beamforming, whose coupling effect is incorporated to optimize the channel-statistics-based analog beamforming. With optimal analog beamforming to maximize the ergodic downlink capacity, conventional digital beamforming strategies, e.g., block diagonalization, can be implemented.

• Joint optimization of analog beamforming and training resource allocation: the overall network throughput also depends on the cost of instantaneous CSI acquisition. For given analog beamformers, parallel beam training is possible if effective beam-to-UE channels are orthogonal across different UEs. Therefore, we extend the above optimization of analog beamforming to incorporate the impact of training cost, and reach a low-complexity solution that can maximize the network throughput.

Optimality analyses and numerical simulations demonstrate the advantages of the proposed schemes over state-of-the-art methods, making them promising technologies for implementing practical massive MIMO.
Chapter 1

Introduction

Multiple-input multiple-output (MIMO) technology, i.e., the use of multiple antennas at transmitter (TX) and receiver (RX), has been recognized as an essential approach to achieving high spectral efficiency [1, 2]. In the multi-user (MU-) MIMO scenario, it improves the spectral efficiency in two forms: (i) a base station (BS) can communicate simultaneously with multiple user equipments (UEs) on the same time-frequency resources, (ii) multiple data streams can be sent between the BS and each UE. The total number of data streams (summed over all UEs in a cell) is upper limited by the smaller of the number of BS antenna elements and the sum of the number of all UE antenna elements.

While MU-MIMO has been studied for more than a decade, [3] introduced the exciting new concept of "massive MIMO", where the number of antenna elements at the BS reaches dozens or hundreds. Not only does this allow the number of data streams in the cell to be increased to very large values, but it also simplifies signal processing, creates "channel hardening" [4] such that small-scale fading is essentially eliminated, and reduces the required transmission energy due to the large beamforming gain; see, e.g., [5] for a review. Massive MIMO is beneficial at centimeter (cm-) wave frequencies, but is essential in the millimeter (mm-) wave bands, since the high free-space pathloss at those frequencies [6] necessitates large array gains to obtain sufficient signal-to-noise ratio (SNR), even at moderate distances of about 100 m.

Yet the large number of antenna elements in massive MIMO also poses major challenges: (i) a large number of radio frequency (RF) chains (one for each antenna element) increases cost and energy consumption; (ii) determining the channel state information (CSI) between each transmit and receive antenna uses a considerable part of spectral resources.
Next, we will elaborate on these key challenges, especially when implementing massive MIMO systems at mm-wave bandwidths. Then, we introduce proposed solutions to address them.

1.1 Background and Motivation

Due to the ever-increasing demand for high throughput and high data rate, wide swaths of new spectrum will have to be used for 5G cellular systems [7]. This implies the use of mm-wave frequencies, since only at these high frequencies can we find a sufficient amount of bandwidth. In particular, the use of 28 GHz (for cellular access and backhaul) as well as 60 GHz (for wireless LAN) has drawn a lot of attention in recent years [8]. Due to the high free-space pathloss at these frequencies, adaptive antenna arrays with high gain, and thus a large number of antenna elements, will have to be used. Typically, several hundred antenna elements will be used at the base station (BS) or access point (AP), leading to the massive MIMO operating regime. Best performance can be achieved if the signal for each antenna element is formed in baseband and then upconverted through an RF chain (and similarly at the receiver); in this case, information-theoretically optimum schemes such as dirty-paper coding [9], or quasi-optimal schemes using digital linear beamforming [10], could be implemented. However, construction of hundreds of RF chains is cost-prohibitive, in particular for mm-wave systems. Moreover, the high power consumption of mixed-frequency components and analog-to-digital converters (ADCs) hinders the implementation of mm-wave systems [11]. Therefore, any practical realization will need to investigate low-complexity transceiver implementations.

To overcome the hardware constraint of building RF chains at mm-wave bandwidths, analog-only beamformers, usually realized by networks of phase shifters, were initially proposed to move all required signal processing to the analog domain [12–14]. A single RF chain is used at the TX and at the RX, respectively.
Meanwhile, without knowledge of CSI, codebook-based beam training is performed to find the best TX/RX beam pair. Although such a system significantly reduces the hardware complexity, it is also limited to serving one data stream on the same frequency-and-time resource, so its spatial multiplexing gain is far below the potential provided by the massive MIMO regime [15]. Consequently, more flexible architectures with reduced complexity are necessary to exploit the combination of massive MIMO and mm-wave communications.

Besides the hardware constraint, the performance advantages of massive MIMO systems have been established under the assumption of perfect full CSI at the BS, which can be retrieved from uplink training according to channel reciprocity [16] in time division duplexing (TDD) systems. Nevertheless, full CSI is difficult to obtain in mm-wave systems. While most of the massive MIMO literature considers a large number of antenna elements at the BS and single-antenna user equipments (UEs) [3, 17–19], at mm-wave bandwidths it is also necessary to build antenna arrays at the UE. Since the pathloss is approximately proportional to the square of the frequency [16], i.e., it scales as $20\log_{10}(f)$ in dB, operating at 60 GHz incurs about 30 dB of additional loss compared with 2 GHz, which must be compensated by array gain at both link ends. Fortunately, the short wavelength at mm-wave frequencies enables a compact size for an array with a large number of antenna elements. For example, with an antenna spacing of half a wavelength, the size of a uniform rectangular array with 100 antenna elements is just $2.5 \times 2.5\ \mathrm{cm}^2$ at 60 GHz, which can fit on a UE of the usual size. With UEs having many antennas, the tremendous overhead required by orthogonal uplink training may hinder the implementation, especially since coherence times decrease with increasing carrier frequency [16].
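The two numerical claims above (roughly 30 dB of extra pathloss at 60 GHz relative to 2 GHz, and a 100-element half-wavelength array of about 2.5 cm per side) can be checked with a short calculation. The following is a minimal sketch under stated assumptions: free-space frequency scaling of $20\log_{10}(f)$ only, a square 10 × 10 array, and the side length counted as elements per side times the spacing. The function names are illustrative and do not come from the dissertation.

```python
import math

C = 299_792_458.0  # speed of light in m/s


def extra_pathloss_db(f_high_hz: float, f_low_hz: float) -> float:
    """Additional free-space pathloss at f_high relative to f_low, in dB."""
    return 20.0 * math.log10(f_high_hz / f_low_hz)


def square_array_side_m(n_elements: int, f_hz: float) -> float:
    """Side length of a square array with half-wavelength element spacing.

    Assumption: the side is counted as (elements per side) x (spacing).
    """
    per_side = math.isqrt(n_elements)
    if per_side * per_side != n_elements:
        raise ValueError("n_elements must be a perfect square")
    spacing = (C / f_hz) / 2.0  # half a wavelength
    return per_side * spacing


loss = extra_pathloss_db(60e9, 2e9)
side = square_array_side_m(100, 60e9)
print(f"extra pathloss at 60 GHz vs 2 GHz: {loss:.1f} dB")
print(f"side of a 100-element array at 60 GHz: {side * 100:.2f} cm")
```

Under these assumptions the extra loss evaluates to about 29.5 dB and the side length to about 2.5 cm, consistent with the rounded figures quoted in the text.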
Furthermore, conventional uplink training is incapable of utilizing array gain, since it does not yet have knowledge of the CSI, which leads to a low signal-to-noise ratio (SNR). Consequently, long training blocks would be needed for channel estimation, but they cannot be realized because of the fast variation of the mm-wave channel.

1.2 Hybrid Digital/Analog Beamforming Architecture

A promising solution that explores the trade-off between hardware complexity reduction and baseband flexibility lies in the concept of hybrid transceivers, which use a combination of analog beamformers in the RF domain, together with digital beamforming in baseband, connected to the RF with a smaller number of up/down conversion chains. This concept was first introduced in [20, 21]. It is motivated by the fact that the number of up/down conversion chains is only lower-limited by the number of data streams that are to be transmitted; in contrast, the beamforming gain and diversity order are given by the number of antenna elements if suitable RF beamforming is done. While formulated originally for MIMO with an arbitrary number of antenna elements (i.e., covering both massive MIMO and small arrays), the approach is of interest in particular to massive MIMO. Interest in hybrid transceivers has therefore accelerated over the past years, and various hybrid digital/analog architectures have been proposed in different papers, e.g., [22–49].

Figure 1.1: Block diagrams of hybrid beamforming structures at BS for a downlink transmission, where structures A, B, and C denote the full-complexity, reduced-complexity, and virtual-sectorization structures, respectively.

Fig. 1.1 shows three typical block diagrams of hybrid beamforming structures at the BS, where we assume a downlink transmission from the BS to the UE, so that the BS is the TX and the UE is the RX. The classification is applicable to both cm- and mm-wave bands.
At the TX, a baseband digital precoder $F_{BB}$ processes $N_S$ data streams to produce $N_{RF}^{BS}$ outputs, which are then upconverted to RF signals and mapped via an analog precoder $F_{RF}$ to $N_{BS}$ antenna elements for transmission. The structure at the RX is similar: an analog beamformer $W_{RF}$ combines the signals from the $N_{UE}$ antenna elements to create $N_{RF}^{UE}$ outputs, which are then downconverted to baseband and further processed with digital beamforming using a combining matrix $W_{BB}$, producing the output signal $y$ that is then detected/decoded.

For a full-complexity structure, i.e., structure A in Fig. 1.1, each analog precoder output can be a linear combination of all RF signals [25–39, 50, 51]. Complexity reduction at the price of a somewhat reduced performance can be achieved when each RF chain is connected only to a subset of antenna elements [39–49, 52], as shown in structure B in Fig. 1.1. Evolving from structure B, where each RF chain is connected to a fixed subset of adjacent antenna elements (the "fixed-subarray" structure), other novel structures have also been investigated in recent years, e.g., "dynamic-subarray" with each RF chain connected to nonadjacent antennas [24, 53, 54] and "overlapped-subarray" with each RF chain connected to an overlapped subset of antennas [41]. Unlike structures A and B, where baseband signals are jointly processed by a digital precoder, structure C employs the analog beamformer to create multiple "virtual sectors", which enables separate baseband processing, downlink training, and uplink feedback in different virtual sectors and therefore reduces both signaling overhead and computational complexity [55].

1.2.1 Hybrid Beamforming Based on Instantaneous CSI

Even assuming full instantaneous CSI at the TX, it is very difficult to find the analog and digital beamforming matrices that optimize, e.g., the net data rates of the UEs [27].
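To make the four matrices of the full-complexity structure concrete, the downlink signal chain described for Fig. 1.1 can be sketched numerically. This is only an illustration of the matrix dimensions, under stated assumptions: the array sizes, the i.i.d. Gaussian placeholder channel, the random (non-optimized) beamformers, and the omission of noise are all choices made here and are not taken from the dissertation.

```python
import numpy as np

rng = np.random.default_rng(0)


def random_phase_matrix(rows, cols, rng):
    """Phase-shifter network: unit-modulus entries scaled by 1/sqrt(rows)."""
    phases = rng.uniform(0.0, 2.0 * np.pi, size=(rows, cols))
    return np.exp(1j * phases) / np.sqrt(rows)


def complex_gaussian(rows, cols, rng):
    """Matrix of i.i.d. circularly symmetric complex Gaussian entries."""
    real = rng.standard_normal((rows, cols))
    imag = rng.standard_normal((rows, cols))
    return (real + 1j * imag) / np.sqrt(2.0)


# Illustrative sizes (assumptions, not values from the text).
N_BS, N_UE = 64, 16        # antenna elements at BS and UE
N_RF_BS, N_RF_UE = 4, 2    # RF chains at BS and UE
N_S = 2                    # data streams

H = complex_gaussian(N_UE, N_BS, rng)            # placeholder channel
F_RF = random_phase_matrix(N_BS, N_RF_BS, rng)   # analog precoder
F_BB = complex_gaussian(N_RF_BS, N_S, rng)       # digital precoder
F_BB /= np.linalg.norm(F_RF @ F_BB, "fro")       # unit total transmit power
W_RF = random_phase_matrix(N_UE, N_RF_UE, rng)   # analog combiner
W_BB = complex_gaussian(N_RF_UE, N_S, rng)       # digital combiner

s = complex_gaussian(N_S, 1, rng)                # data symbols
x = F_RF @ F_BB @ s                              # signal on the N_BS antennas
r = H @ x                                        # received signal, noise omitted
y = W_BB.conj().T @ W_RF.conj().T @ r            # combined output, N_S x 1

# Effective channel seen by the digital stages: RF chains only.
H_eff = W_RF.conj().T @ H @ F_RF
print(x.shape, y.shape, H_eff.shape)
```

The effective channel `H_eff` has only as many rows and columns as there are RF chains, which is the dimension reduction that hybrid designs rely on; the unit-modulus entries of `F_RF` and `W_RF` stand in for the phase-shifter constraint.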
The main difficulties include: i) analog and digital beamformers at each link end, as well as combiners at the different link ends, are coupled, which makes the objective function of the resulting optimization non-convex; ii) typically the analog precoder/combiner is realized as a phase-shifter network, which imposes additional constraints on the elements of $W_{RF}$ and $F_{RF}$. With knowledge of the instantaneous CSI, two main methodologies are explored to alleviate the challenges described above and achieve a feasible near-optimal solution.

• Approximating the optimal beamformer: For single-user (SU-) MIMO, the solution to the optimum beamforming for the fully digital case with $N_{RF}^{BS} = N_{BS}$ and $N_{RF}^{UE} = N_{UE}$ is known, i.e., the dominant left/right singular vectors of the channel transfer matrix obtained from a singular value decomposition (SVD) [16]. Then, [27, 36, 48, 56–58] find an approximately optimum hybrid beamformer by minimizing the Euclidean distance to this fully digital one. The objective function of the approximation problem is still non-convex, but much less complex than the original one. For sparse channels, as occur at mm-wave frequencies, minimizing this distance provides a quasi-optimal solution [27]. In non-sparse channels, as usually occur at cm-wave bands, an alternating optimization of analog and digital beamformer can be used. A closed-form solution for each of the alternating optimization steps can be developed (i) for the reduced-complexity structure [48, 56, 57], while (ii) for the full-complexity structure [36], the non-convex problem can be expanded into a series of convex sub-problems by restricting the phase increment of the analog beamformer to a small vicinity of its preceding iteration. [20] provides formulations (and approximate solutions) for both the diversity and the spatial multiplexing case.
[59, 60] explore the minimum mean square error (MMSE) hybrid receiver design and investigate the impact of analog-to-digital converters (ADCs) with different resolutions, while a sparse approximation method is utilized in [61] to obtain a semi-optimal MMSE-based hybrid precoder/combiner in MIMO interference channels. For the MU-MIMO scenario, [62] optimizes the hybrid precoder/combiner based on the criterion of finding the best approximation to the fully digital weighted-minimum-mean-square-error (WMMSE) precoder/combiner in [63]. Whereas [27, 36, 48, 56, 57] assume Gaussian inputs, [58] incorporates the constraint of finite-alphabet inputs to solve the approximation problem.

• Decoupling the design of the analog and digital beamformers: one of the main challenges in hybrid beamformer design is the coupling among analog and digital beamformers, and between the beamformers at TX and RX. This motivates decoupling the beamformer designs to reduce the problem complexity. By assuming certain transceiver algorithms, the optimization of the beamforming matrices can be solved sequentially. For example, in order to maximize the net rate for SU-MIMO, one can eliminate the impact of the combiner on the precoder by assuming a fully digital MMSE receiver [30]. Further decoupling of the analog and digital precoder is possible by assuming that the digital precoder is unitary. Subsequently, $F_{RF}$ is optimized column-by-column by imposing the phase-only constraint on each antenna. With the known analog precoder, a closed-form expression of the digital precoder can then be obtained. Alternatively, some simple heuristic decoupling beamforming strategies have been explored.
For example, the element-wise normalized conjugate beamformer can be used as the analog precoder [31,32], with which the asymptotic signal-to-interference-plus-noise ratio (SINR) of hybrid beamforming is only reduced by a factor of $\pi/4$ compared to fully digital beamforming when the number of antenna elements and streams, $N_{\mathrm{BS}}$ and $N_S$, go to infinity while $N_{\mathrm{BS}}/N_S$ is kept constant. Extending to the situation where the UE is also equipped with a hybrid structure for MU-MIMO, one can first construct the RF combiner by selecting the strongest receive beams from the Fourier codebook to maximize the Frobenius norm of the combiner-projected channel [28]. Then, the same normalized eigenbeamformer is implemented as the analog precoder on the effective channel [31,32], or the dominant singular vectors of the analog-combiner-projected channel are used to form the analog precoder [51]. In the baseband, the BS performs block diagonalization (BD) [16] over the projected channel to suppress inter-user interference. Since a line-of-sight (LOS) path is often required for the mm-wave channel to obtain sufficient received signal strength, [64] efficiently estimates the dominant directions toward the different UEs to form the analog precoders, and zero-forcing (ZF) digital precoding is then performed over the analog-precoder-projected channel.

1.2.2 Wideband Hybrid Beamforming

The above discussion focused on narrowband (i.e., single-subcarrier) systems. In wideband orthogonal frequency division multiplexing (OFDM) systems, however, analog beamformers cannot have different weights across subcarriers; for strongly frequency-selective channels, such beamformers extending over the whole available band adapt to the average channel state (compare Sec. III).
Frequency-domain scheduling such as the one in [46] was believed unnecessary for fully digital massive MIMO systems because a sufficiently large number of antennas can harden the channels and provide sufficient spatial degrees of freedom for multiplexing UEs [5]. However, under practical constraints on array size (e.g., according to 3GPP LTE Release 13 [65]), frequency-domain scheduling is still necessary for hybrid transceivers [66,67]. With frequency-domain scheduling, UEs are served on different subcarriers, so that existing narrowband hybrid precoders are no longer applicable. Existing works [66,68] have studied the joint optimization of a wideband analog precoder and narrowband digital precoders, aimed at minimizing the BS transmit power or maximizing the sum-rate of the UEs. [69,70] propose compressed-sensing-based solutions for channel estimation in wideband massive MIMO under the HDA architecture, while [70] also develops two types of wideband hybrid beamformers, by approximating the fully digital optimal one and by alternately minimizing the MSE, respectively. [71] shows that the asymptotic performance of wideband SU-MIMO with the full-complexity HDA structure (structure A in Fig. 1.1) approaches that of the fully digital system when the number of antennas goes to infinity, while for a practical array size, it also proposes unified heuristic algorithms for both the full-complexity and the fixed-subarray structure.

1.2.3 Other Low-Complexity Architectures

Most hybrid beamforming architectures use a phase-shifter network to perform analog beamforming, as Fig. 1.1 shows. Other architectures, based on switches or on a combination of switches and phase-shifters, have also been explored either to further reduce the hardware complexity [72,73] or to improve the performance of hybrid beamforming [74] with the same number of RF chains. In [72,73], phase-shifters in structure A and structure B shown in Fig.
1.1 are replaced by switches, which realize the combining of selected antenna ports at each RF chain. The hardware cost and power consumption are significantly reduced at the sacrifice of some system performance. On the other hand, [74] uses switches to select a subset of analog beamformers, whose outputs are then down-converted to baseband. Therefore, with the same number of RF chains, the architecture proposed in [74] benefits from an additional "antenna selection" gain at the cost of more analog beamformers.

To avoid the cost of the phase-shifters, [75–80] implement the pre-beamforming in the analog domain by a lens array, which naturally concentrates energy from different path directions at different antenna ports. Therefore, even without phase-shifters, the resulting antenna selection scheme is similar to the analog beam selection achieved in [74]. Although lens-array-based massive MIMO is of interest, it is outside the scope of this dissertation.

Another popular subject in recent years for making massive MIMO practical at mm-wave bandwidths is to combine low-resolution ADCs [81–84] with the HDA architecture [85–88]. Although the smaller number of RF chains reduces the power consumption, high-resolution ADCs may still be a bottleneck in building an energy-efficient system. Designs based on low-resolution ADCs with only a few quantization bits, however, introduce additional constraints on beamforming optimization and channel estimation under the HDA architecture, which could be a future research direction building on this dissertation.

1.3 Channel-Statistics-Based Hybrid Beamforming

Assuming perfect CSI at the BS, a majority of hybrid beamforming designs aim to close the performance gap to the fully digital system (refer to Section 1.2). However, the acquisition of CSI at the BS comes at the penalty of severe signaling overhead. In time-division duplexing (TDD) systems, this overhead stems from the uplink training that provides the basis of the beamforming.
When the signaling overhead is taken into account, information-theoretic results show that for both TDD and frequency-division duplexing (FDD) systems, the spatial multiplexing gain (SMG) of the massive MIMO downlink with a fully digital structure equals $M(1 - M/T)$, where $M = \min(N_{\mathrm{BS}}, K, T/2)$, $K = N_S$ is the number of single-antenna users, and $T$ is the number of channel uses in a coherence time-frequency block [55]. In FDD systems, the overhead is even larger, since both downlink training and uplink feedback for each antenna are required. Note also that, in addition to the coherence time, the frame structure of systems such as LTE imposes additional constraints on the pilot repetition frequency and thus might impact the training overhead.

It is evident that for any massive MIMO system relying on full CSI from all antenna elements of the BS to the UEs, the maximal achievable SMG is limited by the size of the coherence block of the channels, because $N_{\mathrm{BS}}$ and $K$ are generally large in massive MIMO systems. This necessitates the design of transmission strategies with reduced-dimensional CSI in order to relieve the signaling overhead. Specifically, recent research has considered analog beamforming based on the slowly varying second-order channel statistics at the BS (a two-stage beamformer, with one stage based on average CSI only, can also be implemented in a fully digital fashion). By taking advantage of a small angular spread at the BS, this beamforming significantly reduces the dimension of the effective channel that needs to be acquired for digital beamforming within each coherent fading block. Such structures work robustly even with analog beamformers that usually cannot adapt to the varying channels as quickly as digital beamformers.

1.3.1 Development of Hybrid Beamforming on Average CSI

Hybrid beamformers using average CSI for the analog part were first suggested in [21] for SU-MIMO systems.
[55] proposed a scheme called "Joint Spatial Division Multiplexing" (JSDM), which considers a hybrid structure at the BS and single-antenna UEs. To further alleviate the downlink training and uplink feedback burden, UEs with similar transmit channel covariance are grouped together, and inter-group interference is suppressed by an analog precoder based on the BD method, which creates multiple "virtual sectors". With this virtual sectorization, downlink training can be conducted in different virtual sectors in parallel, and each UE only needs to feed back the intra-group channels, reducing both training and feedback overhead by a factor equal to the number of virtual sectors formed.

In practice, however, to maintain the orthogonality between virtual sectors, JSDM often conservatively groups UEs into only a few groups, because the UEs' transmit channel covariances tend to overlap partially with each other. This limits the reduction of training and feedback overhead. Once grouping UEs into more virtual sectors violates the orthogonality condition, JSDM is not able to combat the inter-group interference. Eliminating overlapping beams of UEs in different groups is a heuristic approach to solving this problem [89]. In [25], JSDM is generalized to support non-orthogonal virtual sectorization, and a modified MMSE algorithm is proposed to optimize the multi-group digital precoders so as to maximize a lower bound of the average sum rate.

Two UE grouping methods have been proposed as extensions to JSDM [90]: K-means clustering [91] and fixed quantization. In the large-antenna limit, the number of downlink streams served by JSDM can be optimized given the angle of departure (AoD) of the multipath components (MPCs) and their spread for each UE group. To reduce the complexity of JSDM, in particular that due to the SVD, an online iterative algorithm can be used to track the analog precoder under time-varying channels [92].
For single-antenna UEs, a Fourier-codebook-based analog precoder, and a ZF digital precoder, the performance of JSDM can be further improved by jointly optimizing the analog precoder and the allocation of RF chains to groups based on second-order channel statistics [93]. This principle can be extended to multicell systems, where an outage constraint on the UEs' SINR can be considered [94]. Fixing the digital precoding to be regularized zero-forcing (RZF), [95] optimizes the analog precoding based on channel statistics by maximizing the average signal-to-leakage-plus-noise ratio (SLNR), while [96] uses the average CSI to combat inter-cell interference in the analog domain. Whereas the above papers focus on channel-statistics-based analog beamforming, [97] reduces the amount of instantaneous CSI feedback by using channel statistics to design the digital beamforming.

1.3.2 Acquisition of Channel Statistics

In this dissertation, we assume knowledge of the second-order channel statistics, whose acquisition cost is intuitively negligible in low-mobility or nomadic scenarios, due to the large stationarity region of the channel statistics. For general mobile communications, the efficient acquisition of channel statistics under the constraints of the HDA structure has been explored by both academia [98,99] and industry standardization [65] in recent years. The major challenge in efficiently extracting the average CSI is that, under the HDA architecture, the baseband estimator can access only a small number of combined training signals. [98] exploits a coprime sampling scheme to extract the channel subspace through low-dimensional projections, while [99] proposes compressive sensing techniques to tackle this problem.

While the 3GPP standard does not prescribe particular transceiver architectures, HDA structures have motivated the design of CSI acquisition protocols in Release 13 of LTE-Advanced Pro in 3GPP [65], especially the non-precoded and beamformed pilots for full-dimensional (FD-) MIMO.
The non-precoded approach is related to the reduced-complexity structure B in Fig. 1.1, where a (possibly static) analog precoder is applied to a subset of an antenna array to reduce the training overhead. The beamformed approach may assume the full-complexity structure A in Fig. 1.1, where analog beamformers are used for the downlink training signals. The BS transmits multiple analog-precoded pilots in different time or frequency resources. Then, user feedback indicates the preferred analog beam; given this, the user can further measure and feed back the instantaneous effective channel in a legacy LTE manner. These approaches can, under some circumstances, reduce the overhead of average CSI acquisition; they generally perform well for SU-MIMO but may suffer large performance degradation for MU-MIMO unless the average CSI of all users is fed back. Recently proposed hybrid CSI acquisition schemes in 3GPP combine the above two approaches. First, the BS sends non-precoded pilots so that the users can estimate the average CSI. Then, based on the analog (non-codebook-based) or digital (codebook-based) feedback of the average CSI from the users, the BS determines the analog beamformer and sends beamformed pilots. These hybrid schemes essentially enable the form of beamforming discussed above in this section, namely adaptation of the analog beamformer based on long-term statistics, followed by a digital beamformer based on the instantaneous effective CSI. Increasing the array size further motivates studies to reduce the training and feedback overhead through, e.g., aperiodic training schemes. The JSDM-based structure C in Fig. 1.1, which separates a cell into multiple "virtual sectors", is one approach to reducing the overhead significantly through simultaneous downlink training and uplink feedback across virtual sectors.
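The beamformed-pilot procedure described above (sweep analog beams, report the preferred beam, then measure the low-dimensional effective channel) can be illustrated with a minimal NumPy sketch. Everything specific here is an assumption made for illustration and not taken from the standard or from this dissertation: a single-antenna user, a DFT (Fourier) codebook, noiseless pilots, and the hypothetical helper names `dft_codebook` and `preferred_beam`.

```python
import numpy as np

def dft_codebook(num_antennas):
    """Hypothetical helper: columns are unit-norm DFT beams (constant-modulus entries)."""
    n = np.arange(num_antennas)
    return np.exp(2j * np.pi * np.outer(n, n) / num_antennas) / np.sqrt(num_antennas)

def preferred_beam(h, codebook):
    """Return the index of the beam with the largest pilot power |h^H f|^2, and all powers."""
    powers = np.abs(h.conj() @ codebook) ** 2
    return int(np.argmax(powers)), powers

M = 16                                   # BS antennas (assumed value)
codebook = dft_codebook(M)

# Toy channel aligned with beam 5 (assumption: a single dominant direction, no noise).
h = np.sqrt(M) * codebook[:, 5]

best, powers = preferred_beam(h, codebook)   # the UE feeds back only the index `best`
h_eff = h.conj() @ codebook[:, best]         # scalar effective channel measured afterwards
```

In this toy setting the user reports one beam index instead of an M-dimensional channel, and the subsequent instantaneous feedback concerns only the scalar `h_eff`.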
1.4 Overview of Contributions

Most covariance-based MU-MIMO hybrid beamforming work addresses the HDA architecture only at the BS side, while the UE is assumed to be equipped with a single antenna [55,89–91,93–97] or with multiple antennas and a purely digital architecture [92]. From a general system design perspective, the scope of this dissertation is to investigate feasible suboptimal beamforming strategies when both link ends are equipped with HDA structures, where the analog beamforming is based on the channel statistics and the digital beamforming is based on the instantaneous effective channel. There are therefore four beamformer variables coupled together on different time scales, namely the analog precoder/combiner and the digital precoder/combiner, which leads to extremely challenging problems.

Low-complexity beamforming strategies are necessary to tackle these problems efficiently. Resorting to the idea of semi-decoupling analog and digital beamforming, we first explore the following two reduced-complexity categories: i) "Fully-decoupled": assuming heuristic analog beamformers, optimize the digital beamforming; ii) "Semi-joint": assuming capacity-achieving digital beamformers, optimize the analog beamforming. In the first category, the analog beamformer is fully decoupled from the digital beamforming design. In the second category, the interplay between the analog beamformers and the capacity-achieving digital beamformers is taken into account, but relaxations are necessary to reduce the problem complexity. In the following, we summarize the proposed schemes for the different categories separately and emphasize their connections and differences. The ultimate goal of jointly designing analog and digital beamforming to maximize the net sum rate incorporates the interplay of these two research branches and lays the foundation for our future research.
"Fully-decoupled": To find the optimal beamformers, we would need to first design the digital beamformer for each snapshot of the channel and then derive the analog beamformer based on their long-term time average, which makes their mathematical treatment difficult. Decoupled designs of the analog and digital beamformers therefore make the optimization problem simpler and practically attractive. Using a JSDM-like virtual sectorization, we let a per-group analog precoder illuminate the angular subspace spanned by the users within the group, while the analog combiner at the user side consists of the strongest receive eigenmodes of the channel covariance. To combat the inter/intra-group interference and reduce the cost of CSI feedback, a weighted-average-minimum-mean-square-error (WAMMSE) algorithm is proposed to obtain suboptimal digital beamformers that maximize the conditional average sum rate based on the partial effective CSI and the channel statistics [25,100].

"Semi-joint": To address aspects neglected in the design of JSDM-like analog beamforming, we investigate the optimum channel-statistics-based analog beamforming when both the BS and the UEs have multiple antenna elements and RF chains, under a generic mm-wave channel model. Assuming capacity-achieving optimal digital beamforming, we formulate the analog beamforming optimization problem to maximize the ergodic capacity. To further reduce the problem complexity, we relax the digital beamformers at the UEs to be functions of the analog beamformers and the channel statistics, based on which an efficient greedy algorithm is proposed to approach semi-optimal analog beamformers at both link ends [101,102].
Based on this idea of capacity-maximizing analog beamformers with a fixed digital beamforming strategy, we introduce another joint tier of analog beam training design to account for the training overhead in [103,104], where the trade-off between training reduction and capacity enhancement is jointly explored to maximize the net throughput. The proposed scheme, user-centric virtual sectorization (UCVS), also generalizes the group-centric JSDM-like scheme [25,100] to a user-centric one, where each UE is served by an individual beam cluster and different beam clusters can overlap with each other. However, in contrast to [25,100–102], only a single RF chain is assumed at each UE in [103,104], and the extension to multiple RF chains at the UE side needs further investigation.

1.5 Organization

The rest of the thesis is organized as follows. Chapter 2 introduces the system model used throughout this dissertation. Chapter 3 concentrates on low-complexity hybrid beamforming with fully decoupled RF and baseband processing under virtual sectorization. Chapter 4 considers the optimization of channel-statistics-based analog beamforming, which incorporates the impact of capacity-achieving digital beamforming. Chapter 5 studies the joint optimization of analog beamforming and training resource allocation, leading to a user-centric virtual sectorization scheme that maximizes the overall throughput. Finally, Chapter 6 provides an outlook on future research directions.

List of Publications

Journal Papers
1. Z. Li, S. Han, A. F. Molisch, "Joint Optimization of Hybrid Beamforming for Multi-User Massive MIMO Downlink", IEEE Transactions on Wireless Communications, vol. 17, no. 6, pp. 3600-3614, 2018.
2. Z. Li, S. Han, A. F. Molisch, "Optimizing Channel-Statistics-Based Analog Beamforming for Millimeter-Wave Multi-User Massive MIMO Downlink", IEEE Transactions on Wireless Communications, vol. 16, no. 7, pp. 4288-4303, 2017.
3. Z. Li, S.
Han, A. F. Molisch, "User-Centric Virtual Sectorization for Millimeter-Wave Massive MIMO Downlink", IEEE Transactions on Wireless Communications, vol. 17, no. 1, pp. 445-460, 2018.
4. A. F. Molisch, V. V. Ratnam, S. Han, Z. Li, S. L. H. Nguyen, L. Li and K. Haneda, "Hybrid Beamforming for Massive MIMO: A Survey", IEEE Communications Magazine, vol. 55, no. 9, pp. 134-141, 2017.

Conference Papers
1. Z. Li, S. Han, A. F. Molisch, "Hybrid beamforming design for millimeter-wave multi-user massive MIMO downlink", Communications (ICC), 2016 IEEE International Conference on. IEEE, 2016.
2. Z. Li, S. Han, A. F. Molisch, "Channel-Statistics-Based Analog Downlink Beamforming for Millimeter-Wave Multi-User Massive MIMO", Communications (ICC), 2017 IEEE International Conference on. IEEE, 2017.
3. Z. Li, S. Han, A. F. Molisch, "User-Centric Virtual Sectorization for Millimeter-Wave Massive MIMO Downlink", 2017 IEEE Global Communications Conference (GLOBECOM). IEEE, 2017.
4. Z. Li, N. Rupasinghe, O. Y. Bursalioglu, C. Wang, H. Papadopoulos, G. Caire, "Directional Training and Fast Sector-based Processing Schemes for mmWave Channels", Communications (ICC), 2017 IEEE International Conference on. IEEE, 2017.
5. O. Y. Bursalioglu, Z. Li, C. Wang, H. Papadopoulos, "Efficient C-RAN Random Access for IoT Devices: Learning Links via Recommendation Systems", Communications Workshops (ICC Workshops), 2018 IEEE International Conference on. IEEE, 2018.
6. Z. Li, A. F. Molisch, "Shadowing in urban environments with microcellular or peer-to-peer links", in Antennas and Propagation (EUCAP), 2012 6th European Conference on, pp. 44-48. IEEE, 2012.

1.6 Notations

$\mathbf{X}$ is a matrix, $\mathbf{x}$ is a column vector, $x$ is a scalar, and $\mathcal{X}$ is a set. $(\cdot)^{\dagger}$, $(\cdot)^{T}$, $(\cdot)^{*}$, and $(\cdot)^{-1}$ stand for the Hermitian transpose, transpose, conjugate, and pseudo-inverse operators, respectively. $\mathbb{E}[\cdot]$ ($\mathbb{E}[\cdot\,|\,\cdot]$) represents the (conditional) expectation.
$\mathcal{X} \cap \mathcal{Y}$, $\mathcal{X} \cup \mathcal{Y}$, and $\bar{\mathcal{X}}$ indicate the intersection and union of the sets $\mathcal{X}$ and $\mathcal{Y}$, and the complement of $\mathcal{X}$, respectively. $\mathcal{X} \setminus \mathcal{Y}$ indicates removing the elements of $\mathcal{Y}$ from $\mathcal{X}$. $|\mathcal{X}|$ denotes the cardinality of $\mathcal{X}$. $\mathrm{tr}(\mathbf{X})$ and $|\mathbf{X}|$ denote the matrix trace and determinant, respectively. $\|\cdot\|_F$ denotes the Frobenius norm. $\mathrm{diag}(x_1, \ldots, x_n)$ represents a diagonal matrix, while $\mathrm{diag}(\mathbf{X}_1, \ldots, \mathbf{X}_n)$ indicates a block-diagonal matrix. $\mathrm{diag}(\mathbf{X})$ indicates a diagonal matrix formed from the diagonal elements of $\mathbf{X}$. $\mathbf{X}^{1/2}$ indicates the Cholesky decomposition of a positive semi-definite (PSD) matrix $\mathbf{X}$. $\mathbf{I}_n$ is the $n$-by-$n$ identity matrix. $\mathcal{CN}(\mathbf{m}, \mathbf{K})$ indicates the circularly symmetric complex Gaussian distribution with mean vector $\mathbf{m}$ and covariance matrix $\mathbf{K}$. $\mathcal{O}(\cdot)$ denotes the complexity order of an algorithm. $\otimes$ denotes the Kronecker product, and $\mathrm{vec}(\cdot)$ vectorizes a matrix by stacking its columns.

Chapter 2
System Model

We consider the downlink transmission of a single-cell massive MIMO system at mm-wave carrier frequency, where the BS is equipped with $M$ antenna elements and $l_{\mathrm{BS}}$ RF chains and serves $K$ UEs, while each UE has $N$ antenna elements and $l_{\mathrm{UE}}$ RF chains. The HDA structure is used at both link ends as shown in Fig. 1.1.

[Figure 2.1: Fully-connected HDA structure at both link ends]

Consequently, the spatial multiplexing gain $s$ of the proposed system is limited by $\min(l_{\mathrm{BS}}, K l_{\mathrm{UE}})$. Not restricting each UE to receive just a single stream, we investigate the scenario where the BS is able to communicate with a UE via multiple streams simultaneously, i.e., $s = \sum_{i=1}^{K} s_i$, with $s_i \geq 0$ indicating the number of data streams dedicated to the $i$-th UE ($\mathrm{UE}_i$). The considered model can be easily generalized to the case where the users have different numbers of antennas and RF chains.
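In the downlink, the BS broadcasts the beamformed data streams to the UEs, where the precoder consists of two concatenated beamforming stages, in the RF and the baseband domain, respectively. Specifically, the BS first applies an $l_{\mathrm{BS}} \times s$ digital precoder $\mathbf{F}_d$, followed by an $M \times l_{\mathrm{BS}}$ analog precoder $\mathbf{F}_a$. Similarly, $\mathrm{UE}_i$, $\forall i$, first projects the received signal through the analog combiner $\mathbf{W}_{ai} \in \mathbb{C}^{N \times l_{\mathrm{UE}}}$ before the frequency down-conversion and then applies the digital combiner $\mathbf{W}_{di} \in \mathbb{C}^{l_{\mathrm{UE}} \times s_i}$. Assuming a block-fading channel, the precoder-combiner-projected received signal is expressed by

$$
\begin{aligned}
\hat{\mathbf{x}}_i &= \mathbf{W}_{di}^{\dagger} \mathbf{W}_{ai}^{\dagger} \mathbf{H}_i \mathbf{F}_a \mathbf{F}_d \mathbf{x} + \mathbf{W}_{di}^{\dagger} \mathbf{W}_{ai}^{\dagger} \mathbf{n}_i \\
&= \underbrace{\mathbf{W}_{di}^{\dagger} \mathbf{W}_{ai}^{\dagger} \mathbf{H}_i \mathbf{F}_a \mathbf{F}_{di} \mathbf{x}_i}_{\text{Desired signal}} + \underbrace{\sum_{j \neq i} \mathbf{W}_{di}^{\dagger} \mathbf{W}_{ai}^{\dagger} \mathbf{H}_i \mathbf{F}_a \mathbf{F}_{dj} \mathbf{x}_j}_{\text{Interference}} + \underbrace{\mathbf{W}_{di}^{\dagger} \mathbf{W}_{ai}^{\dagger} \mathbf{n}_i}_{\text{Noise}},
\end{aligned}
\tag{2.1}
$$

where $\mathbf{x} \in \mathbb{C}^{s \times 1}$ is the sampled symbol vector following the distribution $\mathcal{CN}(\mathbf{0}, \mathbf{I}_s)$, $\mathbf{n}_i \sim \mathcal{CN}(\mathbf{0}, \sigma^2 \mathbf{I}_N)$ denotes the additive white Gaussian noise at $\mathrm{UE}_i$, which is independent of the symbol vector $\mathbf{x}$, and $\mathbf{H}_i \in \mathbb{C}^{N \times M}$ indicates the transfer matrix of the channel between the BS and $\mathrm{UE}_i$. $\mathbf{x}_i \in \mathbb{C}^{s_i \times 1}$ denotes the symbol vector intended for $\mathrm{UE}_i$, $\forall i$, and $\mathbf{x} = [\mathbf{x}_1^T, \ldots, \mathbf{x}_K^T]^T$. $\mathbf{F}_{di} \in \mathbb{C}^{l_{\mathrm{BS}} \times s_i}$ indicates the digital precoder serving $\mathrm{UE}_i$, $\forall i$, and $\mathbf{F}_d = [\mathbf{F}_{d1}, \ldots, \mathbf{F}_{dK}]$.

A general double-directional channel description is considered as follows:

$$
\mathbf{H}_i = \frac{1}{\sqrt{L_i}} \sum_{p=1}^{P_i} g_{ip}\, \mathbf{a}(\boldsymbol{\phi}_{ip})\, \mathbf{b}^{\dagger}(\boldsymbol{\theta}_{ip}) \triangleq \frac{1}{\sqrt{L_i}} \mathbf{A}_i \boldsymbol{\Sigma}_i \mathbf{G}_i \mathbf{B}_i^{\dagger},
\tag{2.2}
$$

where $P_i$ is the number of multipath components (MPCs) connecting the BS and $\mathrm{UE}_i$, $L_i$ denotes the overall large-scale loss, including path loss and shadowing, and $\mathbf{a}(\cdot)$ and $\mathbf{b}(\cdot)$ are the steering vectors of the antenna arrays with respect to the direction of arrival (DOA) $\boldsymbol{\phi} = [\phi_{\mathrm{az}}, \phi_{\mathrm{el}}]$ and the direction of departure (DOD) $\boldsymbol{\theta} = [\theta_{\mathrm{az}}, \theta_{\mathrm{el}}]$, respectively. Note that both DOD and DOA contain components in the azimuth and elevation domains, which are distinguished by the subscripts "az" and "el", respectively.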
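To make the structure of (2.2) concrete, the following NumPy sketch draws one channel realization as a sum of rank-one path contributions. It is only an illustration under assumptions that are not part of the model above: uniform linear arrays with half-wavelength spacing and azimuth-only angles, unit-modulus steering entries, complex Gaussian path gains with variances that sum to one, and the hypothetical helper names `ula_steering` and `sample_channel`.

```python
import numpy as np

def ula_steering(num_elements, angle_rad):
    """Assumed array model: half-wavelength ULA, azimuth only, unit-modulus entries."""
    n = np.arange(num_elements)
    return np.exp(1j * np.pi * n * np.sin(angle_rad))

def sample_channel(N, M, doa, dod, sigma2, large_scale_loss, rng):
    """Draw H = (1/sqrt(L)) * sum_p g_p a(doa_p) b(dod_p)^H with g_p ~ CN(0, sigma2_p)."""
    A = np.stack([ula_steering(N, a) for a in doa], axis=1)   # N x P, UE-side steering
    B = np.stack([ula_steering(M, d) for d in dod], axis=1)   # M x P, BS-side steering
    P = len(doa)
    # Assumption: Gaussian small-scale fading; the model only fixes mean and variance.
    g = np.sqrt(sigma2 / 2) * (rng.standard_normal(P) + 1j * rng.standard_normal(P))
    H = (A * g) @ B.conj().T / np.sqrt(large_scale_loss)      # scales column p of A by g_p
    return H, A, B, g

rng = np.random.default_rng(0)
doa = np.array([0.1, -0.4, 0.9])        # radians, illustrative values
dod = np.array([0.3, 0.0, -0.7])
sigma2 = np.array([0.5, 0.3, 0.2])      # normalized average path powers
H, A, B, g = sample_channel(4, 16, doa, dod, sigma2, 100.0, rng)
```

With three paths the resulting 4-by-16 matrix has rank at most three, which is the sparsity that statistics-based analog beamforming exploits.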
$\mathbf{A}_i$ and $\mathbf{B}_i$ consist of the stacked steering vectors of the DOAs and DODs, respectively, i.e., $\mathbf{A}_i \triangleq [\mathbf{a}(\boldsymbol{\phi}_{i1}), \mathbf{a}(\boldsymbol{\phi}_{i2}), \ldots, \mathbf{a}(\boldsymbol{\phi}_{iP_i})]$ and $\mathbf{B}_i \triangleq [\mathbf{b}(\boldsymbol{\theta}_{i1}), \mathbf{b}(\boldsymbol{\theta}_{i2}), \ldots, \mathbf{b}(\boldsymbol{\theta}_{iP_i})]$, $\forall i$. $g_{ip}$ reflects the small-scale fading of the $p$-th MPC of $\mathrm{UE}_i$, with zero mean and variance $\sigma_{ip}^2$, $\forall i, p$.¹ $\sigma_{ip}^2$ can be viewed as the average power of the $p$-th MPC of $\mathrm{UE}_i$ normalized by the large-scale loss. Assuming that different MPCs exhibit independent fading, we have $\sum_{p=1}^{P_i} \sigma_{ip}^2 = 1$. The diagonal matrices $\boldsymbol{\Sigma}_i$ and $\mathbf{G}_i$ have diagonal entries equal to the standard deviations $[\sigma_{ip}]_{p=1}^{P_i}$ and the normalized small-scale fading variables $[\bar{g}_{ip} = g_{ip}/\sigma_{ip}]_{p=1}^{P_i}$, respectively, i.e., $\boldsymbol{\Sigma}_i \triangleq \mathrm{diag}(\sigma_{i1}, \sigma_{i2}, \ldots, \sigma_{iP_i})$ and $\mathbf{G}_i \triangleq \mathrm{diag}(\bar{g}_{i1}, \bar{g}_{i2}, \ldots, \bar{g}_{iP_i})$. Since the angular power spectrum remains the same within the stationarity region of the channel statistics, which can be equivalent to tens or hundreds of coherence blocks, the normalized cost of acquiring the long-term CSI, including $[\mathbf{A}_i]$, $[\boldsymbol{\Sigma}_i]$, $[\mathbf{B}_i]$, and $[L_i]$, is negligible.

¹ Note that the assumption of zero mean might not be fully compatible with the assumption of infinitely resolving massive MIMO antennas in a mathematical sense. However, in practice multipath components occur in clusters, and if the MIMO array can resolve between the clusters, but not within them, the effective channel could fulfill the conditions described above. Irrespective of these considerations, the model we use is widely used in the literature, e.g., [27,105].

Chapter 3
Fully-Decoupled Analog/Digital Beamforming with Virtual Sectorization

In this chapter, we investigate a virtual sectorization realized by channel-statistics-based user grouping and analog beamforming, where the user equipment (UE) only needs to feed back its intra-group effective channel, and the overall cost of channel state information (CSI) acquisition is significantly reduced.
Under the Kronecker channel model assumption, we first show that the strongest eigenbeams of the receive correlation matrix form the optimal analog combiner. Then, with partial knowledge of the instantaneous CSI, we jointly optimize the digital precoder and combiner by maximizing a lower bound of the conditional average net sum-rate. Simulations over propagation channels obtained from geometry-based stochastic models, ray tracing results, and measured outdoor channels demonstrate that our proposed beamforming strategy outperforms state-of-the-art methods.

3.1 Introduction

To implement massive MIMO, a scheme called "Joint Spatial Division Multiplexing" (JSDM) was proposed in [55], which provides a two-stage precoder that naturally fits the HDA structure. The first-stage (analog) beamforming is based on the slowly varying second-order channel statistics, which significantly reduces the dimension of the effective channel that needs to be trained and fed back within each coherent fading block. To further alleviate the downlink training and uplink feedback burden, UEs with similar transmit channel covariance are grouped together, and inter-group interference is suppressed by a block diagonalization (BD) based analog precoder, which creates multiple "virtual sectors". With this virtual sectorization, downlink training can be conducted in different virtual sectors in parallel, and each UE only needs to feed back the intra-group channels, leading to a reduction of both training and feedback overhead proportional to the number of virtual sectors formed. When this idea is extended to the multicell scenario in [94,96], covariance-based analog precoders not only offer beamforming gains for intra-cell desired signals but also suppress inter-cell interference.

In practice, however, to maintain the orthogonality between virtual sectors, JSDM often conservatively groups UEs into only a few groups, because the UEs' transmit channel covariances tend to overlap partially with each other.
This limits the reduction of training and feedback overhead. Once grouping UEs into more virtual sectors violates the orthogonality condition, JSDM is not able to combat the inter-group interference. To solve this problem, [89] proposed to remove overlapping eigenbeams of UEs in different groups, which, however, sacrifices some beamforming gain. Another limitation of JSDM is that it was designed only for the case where the UE has a single antenna. For scenarios with many antennas at the UE, [106] extends [55] by selecting the strongest eigenbeam of the UE-side channel covariance to form an analog combiner; however, by fixing a single RF chain for each UE, it neglects the potential for each UE to receive multiple symbols simultaneously.

In this chapter, we generalize the JSDM scheme to support non-orthogonal virtual sectorization and HDA structures at both the BS and the UEs. With multiple RF chains at both link ends, spatial multiplexing at each individual UE is exploited to enhance the overall system performance. To reduce the problem complexity, we decouple the designs of the analog and digital beamformers and propose multi-layer beamforming strategies with different time scales. In the first layer, the analog precoders/combiners are designed based on the second-order channel statistics, while in the second layer the digital precoder optimization uses a mix of the intra-group channel and long-term CSI. Since each UE is able to evaluate the instantaneous covariance of its received interference in the data transmission phase, the digital combiners in the third layer are based on both intra- and inter-group instantaneous effective channels.
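As an illustration of the first layer, the sketch below builds a statistics-based analog combiner from the dominant eigenvectors of a receive covariance matrix, which is the structure this chapter shows to be optimal under the Kronecker model. It is a minimal sketch under stated assumptions: the covariance is synthetic, the constant-modulus constraint of a phase-shifter network is ignored, and `dominant_eigenbeams` is a hypothetical helper name.

```python
import numpy as np

def dominant_eigenbeams(R, num_rf):
    """Return the num_rf strongest eigenvectors (and eigenvalues) of a Hermitian PSD matrix."""
    eigvals, eigvecs = np.linalg.eigh(R)          # eigh returns eigenvalues in ascending order
    W = eigvecs[:, ::-1][:, :num_rf]              # strongest eigenbeams first
    return W, eigvals[::-1][:num_rf]

# Synthetic receive covariance with a known spectrum (assumption, for illustration only).
rng = np.random.default_rng(1)
N, l_ue = 8, 2
X = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
Q, _ = np.linalg.qr(X)                            # random unitary eigenbasis
lam = np.array([5.0, 3.0, 1.0, 0.5, 0.2, 0.1, 0.05, 0.01])
R = (Q * lam) @ Q.conj().T

W, top = dominant_eigenbeams(R, l_ue)
captured = np.trace(W.conj().T @ R @ W).real      # average power kept after combining
```

Here two RF chains retain 8 of the 9.86 units of average receive power, which shows why a few eigenbeams can suffice when the angular spread is small.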
Another set of problems that bears some formal resemblance to our task is multi-cell digital beamforming optimization, where each UE needs to feed back either the instantaneous channels from all cooperating BSs, e.g., in coordinated multi-point (CoMP) [107], or the instantaneous channel from its serving BS together with the instantaneous covariance of interference plus noise, e.g., in [63,108]. However, the problem we are solving is distinct, and actually more challenging, since we assume that the BS does not know the instantaneous information of the inter-group interference. The main contributions of this chapter are thus as follows.

• We investigate the design of hybrid beamformers at both link ends for multiuser massive MIMO systems. For the analog beamformer based on second-order statistics, we show that under the Kronecker channel model, the optimization of the analog combiners and precoders that maximize the intra-group signal to inter-group interference plus noise ratio can be decoupled, where the optimal combiner of each UE consists of the dominant eigenvectors of its receive correlation matrix.

• Given the analog beamformers, the digital precoders are optimized by maximizing a lower bound of the net conditional average data rate, where the BS knows only the instantaneous intra-group channels and the second-order statistics. We develop a block descent algorithm to solve the problem by establishing the equivalence between the problem and a weighted average mean square error minimization (WAMMSE) problem.

• We illustrate the performance of the proposed scheme by various simulation results, based on a synthetic one-ring cluster channel model and on ray tracing data of an outdoor campus environment at 28 GHz, which show that orthogonal user grouping is not necessarily optimal when the feedback overhead is taken into account.
• Systematic simulations are also conducted over real channel matrices obtained from a massive MIMO measurement campaign with a center frequency of 2.53 GHz in Cologne [109], which demonstrate the advantages of our proposed scheme at low frequencies as well.

The rest of this chapter is organized as follows. In Section 3.2, the system and spatial channel model are presented. In Section 3.3, we first prove the optimality of decoupling the analog precoder and combiner design under the Kronecker channel model, which significantly reduces the complexity of jointly designing the beamformers at both link ends in the first layer. Then, for the digital combiner design in the third layer, the optimal combiner can be implemented with knowledge of the intra/inter-group instantaneous effective channels. In Section 3.4, the optimization problem for designing the digital precoders is formulated based on the intra-group effective channel and the channel covariance; the proposed WAMMSE algorithm is guaranteed to reach a locally optimal solution. Simulation results based on extensive channel profiles from a typical one-ring channel model, ray tracing simulations, and real measurements are presented in Section 3.5, before the conclusion in Section 3.6.

3.2 System Model with Virtual Sectorization

To reduce the complexity of the digital precoder at the BS and also the channel feedback overhead at the UEs, the concept of virtual sectorization proposed in [55] is employed, as shown in Fig. 3.1.

[Figure 3.1: Logic structure of hybrid beamforming with virtual sectorization]

We partition the RF chains at the BS into $G$ groups, where the $g$-th group serves $k_g$ UEs with its digital precoder $\mathbf{V}_g$.
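Let $d_g$ and $l_g$ denote the number of data streams and RF chains of the BS assigned to the $g$-th group, respectively, where $d_g \le l_g$ and $\sum_{g=1}^{G} l_g = l_{\mathrm{BS}}$. Then, for the $g$-th group, the digital precoder $\mathbf{V}_g$ has dimension $l_g \times d_g$, because the data streams of each group are processed separately. Therefore, given the total number of RF chains at the BS, the dimension of $\mathbf{V}_g$ decreases with $G$, leading to reduced computational complexity. Note that Fig. 3.1 exhibits the logic structure of virtual sectorization realized by the proposed beamforming strategy, which is not in conflict with typical hybrid beamforming structures, e.g., shared-array structures [110].

Assuming flat fading within the coherence time, the received signal at the $i$-th UE in the $g$-th group (denoted by $\mathrm{UE}_{g_i}$) is expressed as

$$
\begin{aligned}
\hat{\mathbf{s}}_{g_i} ={}& \underbrace{\mathbf{F}_{g_i}^{\dagger}\mathbf{W}_{g_i}^{\dagger}\mathbf{H}_{g_i}\mathbf{B}_{g}\mathbf{V}_{g_i}\mathbf{s}_{g_i}}_{\text{Desired signals}}
+ \underbrace{\mathbf{F}_{g_i}^{\dagger}\mathbf{W}_{g_i}^{\dagger}\mathbf{H}_{g_i}\mathbf{B}_{g}\sum_{j=1,\,j\neq i}^{k_g}\mathbf{V}_{g_j}\mathbf{s}_{g_j}}_{\text{Intra-group interference}} \\
&+ \underbrace{\mathbf{F}_{g_i}^{\dagger}\mathbf{W}_{g_i}^{\dagger}\mathbf{H}_{g_i}\sum_{z=1,\,z\neq g}^{G}\sum_{l=1}^{k_z}\mathbf{B}_{z}\mathbf{V}_{z_l}\mathbf{s}_{z_l}}_{\text{Inter-group interference}}
+ \underbrace{\mathbf{F}_{g_i}^{\dagger}\mathbf{W}_{g_i}^{\dagger}\mathbf{n}_{g_i}}_{\text{Noise}},
\end{aligned}
\tag{3.1}
$$

where $\mathbf{V}_g = [\mathbf{V}_{g_1},\ldots,\mathbf{V}_{g_{k_g}}] \in \mathbb{C}^{l_g \times d_g}$ is the digital precoder of the $g$-th group, $\mathbf{V}_{g_i} \in \mathbb{C}^{l_g \times d_{g_i}}$ is the digital precoder for $\mathrm{UE}_{g_i}$, $d_{g_i}$ is the number of data streams assigned to $\mathrm{UE}_{g_i}$ with $\sum_{i=1}^{k_g} d_{g_i} = d_g$, $\mathbf{F}_{g_i} \in \mathbb{C}^{l_{\mathrm{UE}} \times d_{g_i}}$ represents the digital combiner of $\mathrm{UE}_{g_i}$, $\mathbf{B}_g \in \mathbb{C}^{M \times l_g}$ indicates the analog precoder for group $g$, $\mathbf{W}_{g_i} \in \mathbb{C}^{N \times l_{\mathrm{UE}}}$ is the analog combiner at $\mathrm{UE}_{g_i}$, $\mathbf{H}_{g_i} \in \mathbb{C}^{N \times M}$ denotes the channel matrix of $\mathrm{UE}_{g_i}$, and $\mathbf{n}_{g_i} \sim \mathcal{CN}(\mathbf{0}, \sigma_{g_i}^2 \mathbf{I}_N)$ denotes the additive white Gaussian noise vector.

Defining the effective channel from the $z$-th group to $\mathrm{UE}_{g_i}$ as

$$\bar{\mathbf{H}}_{g_i,z} \triangleq \mathbf{W}_{g_i}^{\dagger}\mathbf{H}_{g_i}\mathbf{B}_z \in \mathbb{C}^{l_{\mathrm{UE}} \times l_z},$$

we assume perfect effective channel estimation at the UEs. The effective channels are required for the optimization of the digital precoders at the BS, and can be fed back by the UEs in FDD systems.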
Under virtual sectorization, the feedback overhead of each UE can be reduced by having each UE feed back only its intra-group effective channel, i.e., $\mathrm{UE}_{g_i}$ feeds back $\bar{\mathbf{H}}_{g_i,g}$. Consider that the UEs employ an orthogonal analog feedback scheme [111] to feed back the effective channels. Then, the overhead of $\mathrm{UE}_{g_i}$ to feed back $\bar{\mathbf{H}}_{g_i,g}$ is $\beta l_g$ channel uses, and the total feedback overhead is $\tau_{\mathrm{fb}} = \beta\sum_{g=1}^{G} k_g l_g$ channel uses [111], where $\beta$ indicates the required number of repetition transmissions for robustness. The impact of the number of groups $G$ on the feedback overhead is clearest in the special case where the $l_{\mathrm{BS}}$ RF chains of the BS are evenly assigned to the $G$ groups. In this case, with $K = \sum_{g=1}^{G} k_g$ UEs in total, we have $\tau_{\mathrm{fb}} = \beta K l_{\mathrm{BS}}/G$, which is reduced by a factor of $G$.

Based on (3.1), we can obtain the net data rate of $\mathrm{UE}_{g_i}$ as

$$
R_{g_i} = \left(1 - \frac{\tau_{\mathrm{tr}} + \tau_{\mathrm{fb}}}{T_d}\right)\log_2\left|\mathbf{I}_{d_{g_i}} + \mathbf{F}_{g_i}^{\dagger}\bar{\mathbf{H}}_{g_i,g}\mathbf{V}_{g_i}\mathbf{V}_{g_i}^{\dagger}\bar{\mathbf{H}}_{g_i,g}^{\dagger}\mathbf{F}_{g_i}\tilde{\boldsymbol{\Phi}}_{g_i}^{-1}\right|,
\tag{3.2}
$$

where $\tau_{\mathrm{tr}} = l_{\mathrm{BS}}$ denotes the overhead of orthogonal downlink training in terms of channel uses, $T_d$ is the total number of channel uses in a coherence block assigned to an individual UE, and $\tilde{\boldsymbol{\Phi}}_{g_i}$ can be expressed as

$$
\tilde{\boldsymbol{\Phi}}_{g_i} = \mathbf{F}_{g_i}^{\dagger}\left(\bar{\mathbf{H}}_{g_i,g}\sum_{j\neq i}^{k_g}\mathbf{V}_{g_j}\mathbf{V}_{g_j}^{\dagger}\bar{\mathbf{H}}_{g_i,g}^{\dagger} + \sum_{z\neq g}^{G}\sum_{l=1}^{k_z}\bar{\mathbf{H}}_{g_i,z}\mathbf{V}_{z_l}\mathbf{V}_{z_l}^{\dagger}\bar{\mathbf{H}}_{g_i,z}^{\dagger} + \sigma_{g_i}^2\mathbf{W}_{g_i}^{\dagger}\mathbf{W}_{g_i}\right)\mathbf{F}_{g_i} \triangleq \mathbf{F}_{g_i}^{\dagger}\boldsymbol{\Phi}_{g_i}\mathbf{F}_{g_i},
\tag{3.3}
$$

where $\boldsymbol{\Phi}_{g_i}$ is the instantaneous covariance of interference plus noise at $\mathrm{UE}_{g_i}$. To serve all potential UEs, we consider that the coherence block with size $T$ is split into $\rho$ orthogonal time slots, i.e., $T_d = T/\rho$, where UEs assigned to the same time slot are simultaneously served by MU-MIMO.

3.3 Analog Beamformers and Digital Combiner Optimization

In this section, we optimize the analog beamformers and the digital combiners at the UEs based on the channel covariance and the instantaneous effective channels, respectively. Since different channel information is exploited for the design of analog and digital beamformers, their joint optimization is very challenging.
We resort to decoupled optimization by designing the analog beamformers to mitigate the inter-group interference while designing the digital combiner to maximize the net data rate of each UE.

3.3.1 Analog Beamformers

We begin with the design of the analog combiner of $\mathrm{UE}_{g_i}$ by maximizing the received "intra-group signal to inter-group interference plus noise" ratio, which is defined under the assumption of equal power allocation over groups as

$$
\gamma_{g_i} = \frac{\dfrac{P_t}{G\|\mathbf{B}_g\|^2}\,\mathbb{E}\big[\|\bar{\mathbf{H}}_{g_i,g}\|^2\big]}{\displaystyle\sum_{z=1,\,z\neq g}^{G}\dfrac{P_t}{G\|\mathbf{B}_z\|^2}\,\mathbb{E}\big[\|\bar{\mathbf{H}}_{g_i,z}\|^2\big] + \sigma_{g_i}^2\|\mathbf{W}_{g_i}\|^2},
\tag{3.4}
$$

where $P_t$ is the total transmit power. Let the analog combiner $\mathbf{W}_{g_i}$ satisfy the semi-unitary property, i.e., $\mathbf{W}_{g_i}^{\dagger}\mathbf{W}_{g_i} = \mathbf{I}_{l_{\mathrm{UE}}}$, which is also assumed in [30]. Therefore, given the analog precoders $[\mathbf{B}_g]_{g=1}^{G}$, the optimal analog combiner $\mathbf{W}_{g_i}$ that maximizes $\gamma_{g_i}$ can be readily found by solving a generalized Rayleigh quotient problem [112]; it consists of the $l_{\mathrm{UE}}$ dominant eigenvectors of the matrix

$$
\left(\sum_{z=1,\,z\neq g}^{G}\frac{P_t}{G\|\mathbf{B}_z\|^2}\,\mathbb{E}\big[\mathbf{H}_{g_i}\mathbf{B}_z\mathbf{B}_z^{\dagger}\mathbf{H}_{g_i}^{\dagger}\big] + \sigma_{g_i}^2\mathbf{I}_N\right)^{-1}\frac{P_t}{G\|\mathbf{B}_g\|^2}\,\mathbb{E}\big[\mathbf{H}_{g_i}\mathbf{B}_g\mathbf{B}_g^{\dagger}\mathbf{H}_{g_i}^{\dagger}\big].
$$

Then, given the analog combiner of $\mathrm{UE}_{g_i}$, the analog-combined effective channel $\mathbf{W}_{g_i}^{\dagger}\mathbf{H}_{g_i}$ is distributed as

$$
\mathrm{vec}(\mathbf{W}_{g_i}^{\dagger}\mathbf{H}_{g_i}) \sim \mathcal{CN}\big(\mathbf{0},\,(\mathbf{I}_M \otimes \mathbf{W}_{g_i}^{\dagger})\,\mathbf{K}_{g_i}\,(\mathbf{I}_M \otimes \mathbf{W}_{g_i})\big),
\tag{3.5}
$$

based on which existing analog precoder design methods, e.g., the BD and eigen-beamforming (EB) schemes [55], can be employed.

Since the analog beamformers are coupled with each other, iterative updates of analog precoders and combiners are required in general. Nevertheless, we next show that the iteration between analog precoders and combiners can be avoided if the Kronecker channel model [113] is valid, i.e., the channel covariance $\mathbf{K}_{g_i}$ can be expressed as

$$
\mathbf{K}_{g_i} = \boldsymbol{\Sigma}_{t,g_i} \otimes \boldsymbol{\Sigma}_{r,g_i},
\tag{3.6}
$$

where $\boldsymbol{\Sigma}_{t,g_i}$ and $\boldsymbol{\Sigma}_{r,g_i}$ are the transmit and receive correlation matrices of $\mathrm{UE}_{g_i}$, respectively.
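As a small numerical illustration of (3.5) and (3.6), the following NumPy sketch checks that, when the covariance has Kronecker form, the covariance of the analog-combined channel factorizes into the transmit correlation and the combined receive correlation. The matrix sizes, the random correlation matrices, and all function and variable names are assumptions made for the illustration only; they are not the simulation setup of this chapter.

```python
import numpy as np

rng = np.random.default_rng(0)


def random_correlation(n, rng):
    """Random Hermitian positive-definite matrix, a stand-in for a correlation matrix."""
    a = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return a @ a.conj().T / n + 0.1 * np.eye(n)


M, N, l_ue = 4, 3, 2                      # toy sizes (assumed for illustration)
sigma_t = random_correlation(M, rng)      # transmit correlation, M x M
sigma_r = random_correlation(N, rng)      # receive correlation, N x N
K = np.kron(sigma_t, sigma_r)             # Kronecker covariance of vec(H), as in (3.6)

# Semi-unitary analog combiner built from the l_ue dominant eigenvectors of sigma_r.
eigvals, eigvecs = np.linalg.eigh(sigma_r)        # eigenvalues in ascending order
W = eigvecs[:, ::-1][:, :l_ue]                    # N x l_ue, strongest eigenmodes first

# Covariance of vec(W^H H) according to (3.5): (I_M kron W^H) K (I_M kron W).
lift = np.kron(np.eye(M), W.conj().T)
K_combined = lift @ K @ lift.conj().T

# Under (3.6) the mixed-product rule gives sigma_t kron (W^H sigma_r W).
K_factored = np.kron(sigma_t, W.conj().T @ sigma_r @ W)

max_gap = np.max(np.abs(K_combined - K_factored))
print("largest entry-wise difference:", max_gap)
```

The factorized form only involves an $M \times M$ and an $l_{\mathrm{UE}} \times l_{\mathrm{UE}}$ matrix, which is why the Kronecker structure is convenient computationally.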
Proposition 1: Under the Kronecker channel model (3.6), the optimal semi-unitary analog combiner that maximizes $\gamma_{g_i}$ is independent of the analog precoders $[\mathbf{B}_g]_{g=1}^{G}$ and consists of the $l_{\mathrm{UE}}$ dominant eigenvectors of the receive correlation matrix $\boldsymbol{\Sigma}_{r,g_i}$.

Detailed proof can be found in Appendix.1. The result is also intuitively pleasing, as the Kronecker model implies that the angular spectrum at the BS does not change when the one at the UE is modified, and vice versa. According to Proposition 1, as long as the Kronecker channel model is satisfied, each UE can optimize its own analog combiner individually by selecting its strongest receive eigenmodes, and then existing analog precoder design methods can be applied based on the analog-combined effective channel without requiring iterations. With the validity of Proposition 1, decoupling the designs of analog precoder and combiner significantly reduces the problem complexity. Section 3.5 compares different covariance-based analog precoders, i.e., BD and EB.

3.3.2 Digital Combiner

Given the analog beamformers $[\mathbf{B}_g]_{g=1}^{G}$ and $\mathbf{W}_{g_i}$ as well as the digital precoders $[\mathbf{V}_g]_{g=1}^{G}$, the optimal digital combiner $\mathbf{F}_{g_i}$ can be obtained by maximizing the net data rate $R_{g_i}$ given in (3.2). It is not hard to show that the optimal $\mathbf{F}_{g_i}$ is the linear minimum mean square error (MMSE) combiner, which is

$$
\mathbf{F}_{g_i} = \big(\bar{\mathbf{H}}_{g_i,g}\mathbf{V}_{g_i}\mathbf{V}_{g_i}^{\dagger}\bar{\mathbf{H}}_{g_i,g}^{\dagger} + \boldsymbol{\Phi}_{g_i}\big)^{-1}\bar{\mathbf{H}}_{g_i,g}\mathbf{V}_{g_i}.
\tag{3.7}
$$

3.4 Digital Precoder Optimization

Upon substituting the optimal linear MMSE combiner $\mathbf{F}_{g_i}$ into (3.2), the net data rate of $\mathrm{UE}_{g_i}$ can be rewritten as

$$
R_{g_i} = \frac{T_d - \tau_{\mathrm{tr}} - \tau_{\mathrm{fb}}}{T_d}\log_2\left|\mathbf{I}_{l_{\mathrm{UE}}} + \bar{\mathbf{H}}_{g_i,g}\mathbf{V}_{g_i}\mathbf{V}_{g_i}^{\dagger}\bar{\mathbf{H}}_{g_i,g}^{\dagger}\boldsymbol{\Phi}_{g_i}^{-1}\right|,
\tag{3.8}
$$

where $\boldsymbol{\Phi}_{g_i}$ is defined in (3.3). Since each UE, say $\mathrm{UE}_{g_i}$, only feeds back the intra-group effective channel $\bar{\mathbf{H}}_{g_i,g}$ but not the inter-group effective channels $\bar{\mathbf{H}}_{g_i,z}$ for $z \neq g$, we need to optimize the digital precoders $[\mathbf{V}_g]_{g=1}^{G}$ by maximizing the sum rate of the UEs averaged over the uncertainties.
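The step from (3.2) to (3.8) relies on the fact that the linear MMSE combiner (3.7) does not lose information. The following NumPy sketch checks this numerically for one random draw: the log-determinant term evaluated after MMSE combining matches the one evaluated directly on the effective channel. Sizes, random matrices, and names such as `H_eff` and `Phi` are assumptions for the illustration; the pre-log overhead factor is left out because it is identical in both expressions.

```python
import numpy as np

rng = np.random.default_rng(1)


def crandn(*shape):
    """Circularly symmetric complex Gaussian samples with unit variance."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)


def log2det(a):
    """log2 of |det(a)|, computed stably."""
    _, logabs = np.linalg.slogdet(a)
    return logabs / np.log(2)


l_ue, l_g, d = 4, 6, 2                 # toy sizes (assumed)
H_eff = crandn(l_ue, l_g)              # intra-group effective channel
V = crandn(l_g, d)                     # digital precoder of the UE
X = crandn(l_ue, l_ue)
Phi = X @ X.conj().T + np.eye(l_ue)    # interference-plus-noise covariance (positive definite)

G = H_eff @ V                          # l_ue x d
S = G @ G.conj().T                     # covariance of the desired signal

F = np.linalg.solve(S + Phi, G)        # linear MMSE combiner, as in (3.7)

# Log-det term of (3.2): evaluated after digital combining.
Phi_combined = F.conj().T @ Phi @ F
rate_combined = log2det(np.eye(d) + F.conj().T @ S @ F @ np.linalg.inv(Phi_combined))

# Log-det term of (3.8): evaluated directly on the effective channel.
rate_direct = log2det(np.eye(l_ue) + S @ np.linalg.inv(Phi))

print(rate_combined, rate_direct)
```

Any other full-rank combiner can be substituted for `F` to see that it yields a value no larger than `rate_direct`.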
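By noting the correlation between $\bar{\mathbf{H}}_{g_i,g}$ and $\bar{\mathbf{H}}_{g_i,z}$, both of which are determined by the same propagation channel $\mathbf{H}_{g_i}$, we formulate the digital precoder optimization problem aimed at maximizing the conditional-average net sum rate of the UEs as

$$
\begin{aligned}
\max_{[\mathbf{V}_{g_i}]}\quad & \sum_{g=1}^{G}\sum_{i=1}^{k_g}\mathbb{E}\big[R_{g_i}\,\big|\,\bar{\mathbf{H}}_{g_i,g}\big] && \text{(3.9a)}\\
\text{s.t.}\quad & \sum_{g=1}^{G}\sum_{i=1}^{k_g}\mathrm{tr}\big(\mathbf{B}_g\mathbf{V}_{g_i}\mathbf{V}_{g_i}^{\dagger}\mathbf{B}_g^{\dagger}\big) \le P_t, && \text{(3.9b)}
\end{aligned}
$$

which is difficult to solve because it is hard to find an explicit expression for the conditional average data rate. To tackle this difficulty, we first define the mean square error (MSE) matrix $\mathbf{E}_{g_i}$ of $\mathrm{UE}_{g_i}$ as

$$
\begin{aligned}
\mathbf{E}_{g_i} &\triangleq \mathbb{E}\big[(\hat{\mathbf{s}}_{g_i} - \mathbf{s}_{g_i})(\hat{\mathbf{s}}_{g_i} - \mathbf{s}_{g_i})^{\dagger}\big] \\
&= \tilde{\mathbf{F}}_{g_i}^{\dagger}\big(\bar{\mathbf{H}}_{g_i,g}\mathbf{V}_{g_i}\mathbf{V}_{g_i}^{\dagger}\bar{\mathbf{H}}_{g_i,g}^{\dagger} + \boldsymbol{\Phi}_{g_i}\big)\tilde{\mathbf{F}}_{g_i} + \mathbf{I}_{d_{g_i}} - \mathbf{V}_{g_i}^{\dagger}\bar{\mathbf{H}}_{g_i,g}^{\dagger}\tilde{\mathbf{F}}_{g_i} - \tilde{\mathbf{F}}_{g_i}^{\dagger}\bar{\mathbf{H}}_{g_i,g}\mathbf{V}_{g_i},
\end{aligned}
\tag{3.10}
$$

where $[\tilde{\mathbf{F}}_{g_i}]$ represent auxiliary digital combiners we use at the second layer, while the real digital combiners $[\mathbf{F}_{g_i}]$ will eventually be optimized at the third layer, as described in Section 3.3.2. Then, we develop the following theorem.

Theorem 1: Let $\mathbf{A}_{g_i} \succeq \mathbf{0}$ be a $d_{g_i}$-by-$d_{g_i}$ weight matrix for $\mathrm{UE}_{g_i}$, and $\tilde{\mathbf{F}}_{g_i}$ be an auxiliary variable representing the digital combiner of $\mathrm{UE}_{g_i}$. The net sum rate maximization (3.9) is lower bounded by the following weighted conditional average mean square error minimization (WAMMSE) problem:

$$
\begin{aligned}
\min_{[\mathbf{A}_{g_i}],\,[\tilde{\mathbf{F}}_{g_i}],\,[\mathbf{V}_{g_i}]}\quad & \sum_{g=1}^{G}\sum_{i=1}^{k_g}\mathrm{tr}\big(\mathbf{A}_{g_i}\tilde{\mathbf{E}}_{g_i}\big) - \log_2\big|\mathbf{A}_{g_i}\big| && \text{(3.11a)}\\
\text{s.t.}\quad & \sum_{g=1}^{G}\sum_{i=1}^{k_g}\mathrm{tr}\big(\mathbf{B}_g\mathbf{V}_{g_i}\mathbf{V}_{g_i}^{\dagger}\mathbf{B}_g^{\dagger}\big) \le P_t, && \text{(3.11b)}
\end{aligned}
$$

where $\tilde{\mathbf{E}}_{g_i}$ is the conditional expectation of the MSE matrix $\mathbf{E}_{g_i}$ of $\mathrm{UE}_{g_i}$'s data streams given $\bar{\mathbf{H}}_{g_i,g}$, i.e., $\tilde{\mathbf{E}}_{g_i} = \mathbb{E}[\mathbf{E}_{g_i}\,|\,\bar{\mathbf{H}}_{g_i,g}]$.

Detailed proof can be found in Appendix.2. Although the objective function (3.11a) is not jointly convex with respect to $[\mathbf{A}_{g_i}]$, $[\tilde{\mathbf{F}}_{g_i}]$, and $[\mathbf{V}_{g_i}]$, it is convex in each group of variables when the others are fixed.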
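Based on this fact, we utilize a block descent algorithm [63] to find a suboptimal solution to problem (3.11) in Section 3.4.1, which is equivalent to maximizing the lower bound of problem (3.9).

3.4.1 Optimization Algorithm

Given the weight matrices $[\mathbf{A}_{g_i}]$ and digital precoders $[\mathbf{V}_{g_i}]$, the optimal digital combiners $[\tilde{\mathbf{F}}_{g_i}]$ can be obtained based on the first-order optimality condition as

$$
\tilde{\mathbf{F}}_{g_i} = \tilde{\mathbf{J}}_{g_i}^{-1}\bar{\mathbf{H}}_{g_i,g}\mathbf{V}_{g_i},\quad \forall\, i, g,
\tag{3.12}
$$

where $\tilde{\mathbf{J}}_{g_i}$ defined in (5) is the instantaneous covariance of the received signal conditionally averaged over the inter-group interference channels. Similarly, we can obtain the optimal $[\mathbf{A}_{g_i}]$ and $[\mathbf{V}_{g_i}]$ as

$$
\mathbf{A}_{g_i} = \big(\mathbf{I}_{d_{g_i}} - \mathbf{V}_{g_i}^{\dagger}\bar{\mathbf{H}}_{g_i,g}^{\dagger}\tilde{\mathbf{F}}_{g_i}\big)^{-1},\quad \forall\, i, g,
\tag{3.13}
$$

$$
\mathbf{V}_{g_i} = \left(\sum_{j=1}^{k_g}\bar{\mathbf{H}}_{g_j,g}^{\dagger}\tilde{\mathbf{F}}_{g_j}\mathbf{A}_{g_j}\tilde{\mathbf{F}}_{g_j}^{\dagger}\bar{\mathbf{H}}_{g_j,g} + \sum_{z\neq g}^{G}\sum_{l=1}^{k_z}\mathbb{E}\big[\bar{\mathbf{H}}_{z_l,g}^{\dagger}\tilde{\mathbf{F}}_{z_l}\mathbf{A}_{z_l}\tilde{\mathbf{F}}_{z_l}^{\dagger}\bar{\mathbf{H}}_{z_l,g}\,\big|\,\bar{\mathbf{H}}_{z_l,z}\big] + \mu\,\mathbf{B}_g^{\dagger}\mathbf{B}_g\right)^{-1}\bar{\mathbf{H}}_{g_i,g}^{\dagger}\tilde{\mathbf{F}}_{g_i}\mathbf{A}_{g_i}^{\dagger},\quad \forall\, i, g,
\tag{3.14}
$$

where $\mu$ is a Lagrange multiplier that can be found through bisection search to satisfy the power constraint (3.11b).

To obtain $\tilde{\mathbf{F}}_{g_i}$ and $\mathbf{V}_{g_i}$ from (3.12) and (3.14), the conditional expectation of the form $\mathbb{E}[\bar{\mathbf{H}}_{g_i,z}\mathbf{Q}\bar{\mathbf{H}}_{g_i,z}^{\dagger}\,|\,\bar{\mathbf{H}}_{g_i,g}]$ for $z \neq g$ needs to be computed, which can be done as follows. By vectorizing $\mathbb{E}[\bar{\mathbf{H}}_{g_i,z}\mathbf{Q}\bar{\mathbf{H}}_{g_i,z}^{\dagger}\,|\,\bar{\mathbf{H}}_{g_i,g}]$ as

$$
\mathrm{vec}\big(\mathbb{E}[\bar{\mathbf{H}}_{g_i,z}\mathbf{Q}\bar{\mathbf{H}}_{g_i,z}^{\dagger}\,|\,\bar{\mathbf{H}}_{g_i,g}]\big) = \mathbb{E}\big[\bar{\mathbf{H}}_{g_i,z}^{*} \otimes \bar{\mathbf{H}}_{g_i,z}\,\big|\,\bar{\mathbf{H}}_{g_i,g}\big]\,\mathrm{vec}(\mathbf{Q}),
$$

we know that we only need to compute the conditional expectation $\mathbb{E}[\bar{\mathbf{H}}_{g_i,z}^{*} \otimes \bar{\mathbf{H}}_{g_i,z}\,|\,\bar{\mathbf{H}}_{g_i,g}]$. Further, defining $\mathbf{y} = \mathrm{vec}(\bar{\mathbf{H}}_{g_i,g})$ and $\mathbf{x} = \mathrm{vec}(\bar{\mathbf{H}}_{g_i,z})$, we find that $\mathbb{E}[\bar{\mathbf{H}}_{g_i,z}^{*} \otimes \bar{\mathbf{H}}_{g_i,z}\,|\,\bar{\mathbf{H}}_{g_i,g}]$ is just a reshaped version of $\mathbb{E}[\mathbf{x}\mathbf{x}^{\dagger}\,|\,\mathbf{y}]$.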
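The vectorization step above can be checked on a single realization, before any expectation is taken. The NumPy sketch below verifies the identity between the vectorized quadratic form and the Kronecker product applied to the vectorized middle matrix, and shows one way in which the Kronecker product is an index permutation of the outer product of the vectorized channel with itself. The sizes, the column-stacking convention, and all names are assumptions made for the illustration.

```python
import numpy as np

rng = np.random.default_rng(2)


def crandn(*shape):
    """Circularly symmetric complex Gaussian samples with unit variance."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)


def vec(a):
    """Stack the columns of a matrix into one vector (column-major order)."""
    return a.reshape(-1, order="F")


l_ue, l_z = 2, 3                       # toy sizes (assumed)
H = crandn(l_ue, l_z)                  # one realization of an inter-group effective channel
Q = crandn(l_z, l_z)                   # arbitrary matrix in the middle of the quadratic form

# Identity vec(H Q H^H) = (conj(H) kron H) vec(Q).
lhs = vec(H @ Q @ H.conj().T)
kron_term = np.kron(H.conj(), H)
rhs = kron_term @ vec(Q)
identity_gap = np.max(np.abs(lhs - rhs))

# conj(H) kron H contains exactly the entries of x x^H with x = vec(H),
# arranged differently: expose the four indices and permute them.
x = vec(H)
outer = np.outer(x, x.conj())
kron4 = kron_term.reshape(l_ue, l_ue, l_z, l_z)
outer4 = outer.reshape(l_z, l_ue, l_z, l_ue)
reshape_gap = np.max(np.abs(kron4 - outer4.transpose(3, 1, 2, 0)))

print(identity_gap, reshape_gap)
```

Because the rearrangement is a fixed permutation of entries, it commutes with the conditional expectation, so only the conditional second moment of the vectorized channel has to be computed.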
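Since $\mathbf{x}$ and $\mathbf{y}$ are jointly complex Gaussian vectors, we can find the conditional second moment of $\mathbf{x}$ as

$$
\mathbb{E}[\mathbf{x}\mathbf{x}^{\dagger}\,|\,\mathbf{y}] = \mathbb{E}[\mathbf{x}\,|\,\mathbf{y}]\,\mathbb{E}[\mathbf{x}\,|\,\mathbf{y}]^{\dagger} + \mathbf{K}_{xx|y},
\tag{3.15}
$$

where

$$
\mathbb{E}[\mathbf{x}\,|\,\mathbf{y}] = \mathbf{K}_{xy}\mathbf{K}_{yy}^{-1}\mathbf{y},\qquad \mathbf{K}_{xx|y} = \mathbf{K}_{xx} - \mathbf{K}_{xy}\mathbf{K}_{yy}^{-1}\mathbf{K}_{xy}^{\dagger}.
\tag{3.16}
$$

In (3.16), $\mathbf{K}_{xx}$ and $\mathbf{K}_{yy}$ are the covariance matrices of $\mathbf{x}$ and $\mathbf{y}$, and $\mathbf{K}_{xy}$ is the cross-covariance matrix of $\mathbf{x}$ and $\mathbf{y}$, all of which are functions of the channel covariance $\mathbf{K}_{g_i}$ and the analog beamformers, and can be obtained as

$$
\begin{aligned}
\mathbf{K}_{xx} &= (\mathbf{B}_z^{T} \otimes \mathbf{W}_{g_i}^{\dagger})\,\mathbf{K}_{g_i}\,(\mathbf{B}_z^{*} \otimes \mathbf{W}_{g_i}), && \text{(3.17a)}\\
\mathbf{K}_{yy} &= (\mathbf{B}_g^{T} \otimes \mathbf{W}_{g_i}^{\dagger})\,\mathbf{K}_{g_i}\,(\mathbf{B}_g^{*} \otimes \mathbf{W}_{g_i}), && \text{(3.17b)}\\
\mathbf{K}_{xy} &= (\mathbf{B}_z^{T} \otimes \mathbf{W}_{g_i}^{\dagger})\,\mathbf{K}_{g_i}\,(\mathbf{B}_g^{*} \otimes \mathbf{W}_{g_i}). && \text{(3.17c)}
\end{aligned}
$$

Instead of directly computing (3.17), which involves multiplication by the large-dimensional matrix $\mathbf{K}_{g_i}$, we simplify (3.17a) as

$$
\mathbf{K}_{xx} = \frac{1}{L_i}\sum_{p=1}^{P_i}\sigma_{ip}^2\,\big(\mathbf{B}_z^{T}\mathbf{b}^{*}(\phi_{ip})\mathbf{b}^{T}(\phi_{ip})\mathbf{B}_z^{*}\big) \otimes \big(\mathbf{W}_{g_i}^{\dagger}\mathbf{a}(\theta_{ip})\mathbf{a}^{\dagger}(\theta_{ip})\mathbf{W}_{g_i}\big).
\tag{3.18}
$$

The complexity of (3.18) is significantly lower than that of (3.17a). Similar simplifications can be applied to (3.17b) and (3.17c), which alleviate the overall computational burden. Moreover, these computations occur only once within the stationarity time of the second-order channel statistics, so their complexity is affordable.

The full algorithm is summarized in Algorithm 1.

Algorithm 1 WAMMSE algorithm for digital precoder
1: Initialize $[\mathbf{V}_{g_i}]$ such that constraint (3.11b) is satisfied.
2: repeat
3:   update $[\tilde{\mathbf{F}}_{g_i}]$ according to (3.12) and (5),
4:   update $[\mathbf{A}_{g_i}]$ according to (3.13),
5:   update $[\mathbf{V}_{g_i}]$ according to (3.14),
6: until the required accuracy or the maximum number of iterations is reached.

Since the objective function (3.11a) is convex with respect to an individual variable while fixing the others, each iteration based on the first-order optimality condition generates a new set of variables that reduces (3.11a). Meanwhile, substituting (3.13) into (3.11a), we can observe that the objective function is reduced to $\sum_{g=1}^{G}\sum_{i=1}^{k_g}\log_2|\tilde{\mathbf{E}}_{g_i}|$, which is lower bounded by zero.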
Therefore, the convergence of the proposed WAMMSE algorithm to a locally optimal solution is guaranteed.

3.4.2 Initialization Strategy

The performance and convergence speed of the WAMMSE algorithm largely depend on the selected initial value of $[\mathbf{V}_{g_i}]$. In order to achieve better performance, one may run the WAMMSE algorithm many times with different initializations and then keep the best result, which however leads to very high complexity. In the following we derive an initial $\mathbf{V}_{g_i}$ aimed at maximizing the signal to leakage plus noise ratio (SLNR) for $\mathrm{UE}_{g_i}$, $\forall\, i, g$. Since the BS does not know the inter-group effective interference channels, we define the SLNR in the following way:

$$
\mathrm{SLNR}_{g_i} = \frac{\mathrm{tr}(\mathbf{P}_{S,g_i})}{\mathrm{tr}(\tilde{\mathbf{P}}_{I,g_i})},
\tag{3.19}
$$

where $\mathbf{P}_{S,g_i} = \hat{\mathbf{F}}_{g_i}^{\dagger}\bar{\mathbf{H}}_{g_i,g}\mathbf{V}_{g_i}\mathbf{V}_{g_i}^{\dagger}\bar{\mathbf{H}}_{g_i,g}^{\dagger}\hat{\mathbf{F}}_{g_i}$ is the instantaneous covariance of the received desired signal $\mathbf{s}_{g_i}$, $\hat{\mathbf{F}}_{g_i}$ is a preliminary digital combiner selected as the $d_{g_i}$ dominant left singular vectors of the effective channel $\bar{\mathbf{H}}_{g_i,g}$, and $\tilde{\mathbf{P}}_{I,g_i}$ represents the covariance of the leakage from the signal intended for $\mathrm{UE}_{g_i}$ plus noise, conditionally averaged over the inter-group interference channels, which can be expressed as

$$
\begin{aligned}
\tilde{\mathbf{P}}_{I,g_i} ={}& \sum_{z\neq g}^{G}\sum_{l=1}^{k_z}\hat{\mathbf{F}}_{z_l}^{\dagger}\,\mathbb{E}\big[\bar{\mathbf{H}}_{z_l,g}\mathbf{V}_{g_i}\mathbf{V}_{g_i}^{\dagger}\bar{\mathbf{H}}_{z_l,g}^{\dagger}\,\big|\,\bar{\mathbf{H}}_{z_l,z}\big]\,\hat{\mathbf{F}}_{z_l} \\
&+ \sum_{j\neq i}^{k_g}\hat{\mathbf{F}}_{g_j}^{\dagger}\bar{\mathbf{H}}_{g_j,g}\mathbf{V}_{g_i}\mathbf{V}_{g_i}^{\dagger}\bar{\mathbf{H}}_{g_j,g}^{\dagger}\hat{\mathbf{F}}_{g_j} + \sigma_{g_i}^2\,\hat{\mathbf{F}}_{g_i}^{\dagger}\mathbf{W}_{g_i}^{\dagger}\mathbf{W}_{g_i}\hat{\mathbf{F}}_{g_i}.
\end{aligned}
\tag{3.20}
$$

Since the analog combiner $\mathbf{W}_{g_i}$ consists of eigenvectors as given after (3.4) and the preliminary digital combiner consists of singular vectors, we have

$$
\hat{\mathbf{F}}_{g_i}^{\dagger}\mathbf{W}_{g_i}^{\dagger}\mathbf{W}_{g_i}\hat{\mathbf{F}}_{g_i} = \mathbf{I}_{d_{g_i}}.
\tag{3.21}
$$

Let the initial digital precoders satisfy the semi-unitary property, i.e., $\mathbf{V}_{g_i}^{\dagger}\mathbf{V}_{g_i} = \mathbf{I}_{d_{g_i}}$. As a result, maximizing (3.19) is equivalent to solving a generalized Rayleigh quotient problem [112]. Let $P_{g_i} = \frac{P_t d_{g_i}}{\sum_{g=1}^{G} d_g}$ denote the power allocated to $\mathrm{UE}_{g_i}$ under the assumption of equal power allocation over data streams.
Since the existing BD and EB analog precoders [55] satisfy $\mathbf{B}_g^{\dagger}\mathbf{B}_g = \mathbf{I}_{l_g}$, we have $P_{g_i} = \mathrm{tr}(\mathbf{V}_{g_i}^{\dagger}\mathbf{B}_g^{\dagger}\mathbf{B}_g\mathbf{V}_{g_i}) = \mathrm{tr}(\mathbf{V}_{g_i}^{\dagger}\mathbf{V}_{g_i})$. Then, we can obtain the optimal initial $\mathbf{V}_{g_i} = \sqrt{P_{g_i}/d_{g_i}}\,\bar{\mathbf{V}}_{g_i}$, where $\bar{\mathbf{V}}_{g_i}$ consists of the $d_{g_i}$ dominant eigenvectors of the matrix $\tilde{\mathbf{U}}_{g_i}^{-1}\bar{\mathbf{H}}_{g_i,g}^{\dagger}\hat{\mathbf{F}}_{g_i}\hat{\mathbf{F}}_{g_i}^{\dagger}\bar{\mathbf{H}}_{g_i,g}$, and $\tilde{\mathbf{U}}_{g_i}$ is defined as

$$
\tilde{\mathbf{U}}_{g_i} = \sum_{z\neq g}^{G}\sum_{l=1}^{k_z}\mathbb{E}\big[\bar{\mathbf{H}}_{z_l,g}^{\dagger}\hat{\mathbf{F}}_{z_l}\hat{\mathbf{F}}_{z_l}^{\dagger}\bar{\mathbf{H}}_{z_l,g}\,\big|\,\bar{\mathbf{H}}_{z_l,z}\big] + \sum_{j\neq i}^{k_g}\bar{\mathbf{H}}_{g_j,g}^{\dagger}\hat{\mathbf{F}}_{g_j}\hat{\mathbf{F}}_{g_j}^{\dagger}\bar{\mathbf{H}}_{g_j,g} + \frac{\sigma_{g_i}^2 d_{g_i}}{P_{g_i}}\mathbf{I}_{l_g}.
\tag{3.22}
$$

The conditional expectation in (3.22) can be evaluated similarly via (3.15) to (3.17).

3.5 Simulation Results

In this section we evaluate the proposed hybrid beamforming strategy via Monte Carlo simulations over synthetic data generated by a one-ring channel model, simulation data obtained by ray tracing at 28 GHz, and real measurements at 2.53 GHz, respectively. For "virtual sectorization", we implement the K-means algorithm [91] to group UEs with similar transmit channel covariance. Considering the fact that increasing the number of UE groups (and thus sectors) reduces the amount of feedback overhead but violates the orthogonality between UE groups, we investigate scenarios with different numbers of clusters. For simplicity, we assign the BS's RF chains equally among the UE groups, i.e., $l_g = l_{\mathrm{BS}}/G$.

The optimal analog combiner for the Kronecker channel model given in Section 3.3.1 is used. For analog precoders, both the BD and EB methods are considered [55]. Physically, the BD approach projects the desired signal of a UE onto the complementary space of its inter-group interference covariance, while the EB method concentrates energy on the strongest eigenbeams of the group-averaged transmit covariance. We now assume that each UE is initially assigned a single data stream, and randomly schedule only $l_g$ UEs if the number of UEs in a group is larger than $l_g$; otherwise all $k_g$ UEs within the $g$-th group are scheduled.
To implement JSDM in the scenario where each UE has multiple RF chains, we project the effective channel $\bar{\mathbf{H}}_{g_i,g}$ onto its dominant left singular vector, which turns the multiple RF chains into one effective RF chain. For both JSDM and the proposed method, the optimal combiner (3.7) is applied at the UE side.

3.5.1 One-ring Cluster Model

We consider a single-cluster channel model, where the directions of departure (DODs) and directions of arrival (DOAs) of the multipath components (MPCs) concentrate around a dominant direction with a certain angular spread. In every drop of multiple UEs, we independently generate a single cluster with a particular dominant DOD $\phi$ and DOA $\theta$ for each of them. We use the same angular spreads $\Delta_\phi$ and $\Delta_\theta$ for each UE. We consider uniform linear arrays (ULAs) for both the BS and the UEs and assume the Kronecker model for the propagation channel. Then, the joint spatial correlation function becomes the product of the transmit and receive correlations, which is

$$
\rho(m,a,n,b) = \frac{1}{2\Delta_\phi}\frac{1}{2\Delta_\theta}\int_{\phi-\Delta_\phi}^{\phi+\Delta_\phi} e^{j2\pi D(m-a)\sin(\alpha)}\,d\alpha\int_{\theta-\Delta_\theta}^{\theta+\Delta_\theta} e^{j2\pi D(n-b)\sin(\beta)}\,d\beta \triangleq [\boldsymbol{\Sigma}_{t,g_i}]_{m,a}\,[\boldsymbol{\Sigma}_{r,g_i}]_{n,b},
\tag{3.23}
$$

where $\rho(m,a,n,b)$ indicates the spatial correlation coefficient between the channel from BS antenna $m$ to UE antenna $n$ and the channel from BS antenna $a$ to UE antenna $b$, and $[\cdot]_{m,a}$ represents the $(m,a)$-th entry of a matrix. Therefore, given the parameter set of a cluster, we can compute its corresponding channel covariance $\mathbf{K} \in \mathbb{C}^{MN \times MN}$ through (3.23).
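To make the construction concrete, the following NumPy sketch evaluates one factor of (3.23) by numerical integration for a ULA with angles distributed uniformly around a dominant direction, and then forms the Kronecker covariance. The function name `ula_correlation`, the midpoint-rule integration, the grid size, and the example angles are assumptions for the illustration; the thesis does not specify how the integral is evaluated.

```python
import numpy as np


def ula_correlation(n_ant, center_deg, spread_deg, spacing=0.5, n_grid=2000):
    """Correlation matrix of a ULA for angles uniform in [center - spread, center + spread].

    spacing is the antenna spacing in wavelengths (D in the text).
    """
    center = np.deg2rad(center_deg)
    spread = np.deg2rad(spread_deg)
    # Midpoint grid: the mean of the integrand approximates (1 / (2 * spread)) * integral.
    step = 2.0 * spread / n_grid
    angles = center - spread + step * (np.arange(n_grid) + 0.5)
    idx = np.arange(n_ant)
    lag = idx[:, None] - idx[None, :]                     # antenna index difference (m - a)
    phase = 2j * np.pi * spacing * lag[:, :, None] * np.sin(angles)[None, None, :]
    return np.exp(phase).mean(axis=-1)


# Toy sizes and example cluster directions (assumed, smaller than those of Table 3.1).
sigma_t = ula_correlation(8, center_deg=20.0, spread_deg=15.0)    # BS side
sigma_r = ula_correlation(4, center_deg=-40.0, spread_deg=50.0)   # UE side
K = np.kron(sigma_t, sigma_r)                                     # covariance of vec(H)

print(K.shape, np.linalg.eigvalsh(sigma_t)[::-1][:3].round(3))
```

A narrow angular spread concentrates the energy in a few dominant eigenvalues, which is what makes covariance-based analog beamforming with few RF chains effective.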
Ignoring the impact of large-scale loss (path loss plus shadowing, which could of course be easily included, but would tend to obfuscate the effects of the overlap of angular spectra), we directly simulate the channel realization through its covariance, i.e., $\mathrm{vec}(\mathbf{H}) = \mathbf{K}^{1/2}\mathbf{w}$, where $\mathbf{w} \sim \mathcal{CN}(\mathbf{0}, \mathbf{I}_{MN})$. Meanwhile, with the spatial structure of the Kronecker channel model (3.6), (3.17a) can be simplified to

$$
\mathbf{K}_{xx} = (\mathbf{B}_z^{T}\boldsymbol{\Sigma}_{t,g_i}\mathbf{B}_z^{*}) \otimes (\mathbf{W}_{g_i}^{\dagger}\boldsymbol{\Sigma}_{r,g_i}\mathbf{W}_{g_i}).
\tag{3.24}
$$

The detailed simulation configurations are listed in Table 3.1.

Table 3.1: Simulation parameters
DOD range | $\phi_{\min} = -60^\circ$, $\phi_{\max} = 60^\circ$
DOA range | $\theta_{\min} = -180^\circ$, $\theta_{\max} = 180^\circ$
DOD/DOA spread | $\Delta_\phi = 15^\circ$, $\Delta_\theta = 50^\circ$
Number of UEs | $K = 16$
Number of BS antennas and RF chains | $M = 64$, $l_{\mathrm{BS}} = 16$
Number of UE antennas and RF chains | $N = 16$, $l_{\mathrm{UE}} = 1, 2, 4$
Antenna spacing (in wavelengths) | $D = 1/2$
Number of independent UE drops | 100

We evaluate the downlink training and uplink feedback overhead based on the model given in Section 3.2. Considering a coherence bandwidth of 500 kHz [114] and a coherence time of 2 ms, corresponding approximately to a mobile speed of 1 m/s at 60 GHz, we obtain that the coherence block includes around $T = 994$ channel uses based on long-term evolution (LTE) system configurations. Let $\rho$ be 1; then we have $T_d = T$. Meanwhile, we consider the ideal situation with $\beta = 1$, in which the feedback CSI is transmitted only once. All the results are averaged over 100 independent UE drops.

Figure 3.2 depicts the sum rate achieved by the proposed WAMMSE scheme and JSDM, where Fig. 3.2(a) and (b) use the BD analog precoder, Fig. 3.2(c) and (d) use the EB analog precoder, and training plus feedback overhead is considered in Fig. 3.2(b) and (d). In the legend we use "Gain" to denote the performance gain of WAMMSE over JSDM, and for each number of clusters, different numbers of RF chains at each UE are simulated, as marked on the bars.
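To give a feel for how strongly the overhead term depends on the number of groups, the following Python sketch evaluates the pre-log factor of the net rate (3.2) for the parameters quoted above (994 channel uses per coherence block, 16 UEs, 16 BS RF chains, a single time slot, and a single feedback transmission). The function names and the even split of UEs across groups are assumptions for the illustration; with equal RF chains per group the total feedback overhead does not depend on how the UEs are split.

```python
def feedback_overhead(users_per_group, chains_per_group, repetitions=1):
    """Total feedback overhead in channel uses: repetitions * sum_g k_g * l_g."""
    return repetitions * sum(k * l for k, l in zip(users_per_group, chains_per_group))


def prelog_factor(block_size, l_bs, users_per_group, chains_per_group,
                  repetitions=1, time_slots=1):
    """Fraction of channel uses left for data after training and feedback."""
    t_d = block_size / time_slots          # channel uses available to one UE
    tau_tr = l_bs                          # orthogonal downlink training
    tau_fb = feedback_overhead(users_per_group, chains_per_group, repetitions)
    return (t_d - tau_tr - tau_fb) / t_d


block_size, n_users, l_bs = 994, 16, 16
factors = {}
for n_groups in (1, 2, 4, 8):
    users = [n_users // n_groups] * n_groups      # assumed even split of the UEs
    chains = [l_bs // n_groups] * n_groups        # RF chains shared equally among groups
    factors[n_groups] = prelog_factor(block_size, l_bs, users, chains)
    print(n_groups, round(factors[n_groups], 4))
```

Under these assumptions roughly a quarter of the coherence block is spent on CSI acquisition with a single group, which is the effect behind the difference between the panels with and without overhead.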
We can see that the proposed WAMMSE scheme exhibits a significant performance gain over JSDM in all considered scenarios. When training and feedback overhead is not considered, as shown in Fig. 3.2(a) and (c), grouping UEs in a single cluster achieves the best performance, in which case there is no inter-group interference. However, even conservatively considering a mobile speed of 1 m/s, we can see from Fig. 3.2(b) and (d) that a single cluster is no longer optimal due to the high overhead. By comparing the upper and lower subfigures, we find that the BD analog precoder outperforms the EB analog precoder when the number of UE groups is 1, 2, or 4. However, in the case of 8 UE groups, the BD analog precoder sacrifices much beamforming gain to suppress inter-group interference, leading to a lower data rate than the EB analog precoder, which maximizes the beamforming gain.

Figure 3.2: Sum rate vs. number of clusters with SNR = 40 dB for block diagonalization (left) and eigen-beamforming (right) analog beamformers. The left-side figures do not incorporate the cost of training and feedback overhead, which is considered in the right-side ones. The numbers on the bars correspond to the number of UE RF chains.

Figure 3.3 compares the performance of the WAMMSE scheme and JSDM as a function of the SNR, where the number of clusters is $G = 4$ and the numbers in the legends stand for the number of RF chains at each UE. Under the BD analog precoder, as shown in Fig. 3.3(a), the sum rate of JSDM tends to flatten because of the residual inter-group interference after analog precoding. In contrast, the WAMMSE scheme is able to combat it efficiently. Under the EB analog precoder in Fig. 3.3(b), the performance gap between the two schemes is even larger, especially when each UE has only one RF chain. When more RF chains are available at the UEs, the UEs can partially mitigate the inter-group interference, which reduces the performance gain of WAMMSE over JSDM.

Figure 3.3: Sum rate vs. SNR with number of clusters $G = 4$ for block diagonalization (left) and eigen-beamforming (right) analog beamformers. The numbers in the legends correspond to the number of UE RF chains.

3.5.2 Ray Tracing Channels

We simulate the double-directional channel impulse response with the assistance of a commercial ray-tracing tool, namely Wireless InSite [115]. The analysis in [89] uses the same ray-tracing dataset. Our ray tracer can simulate propagation channel characteristics from 50 MHz to 100 GHz in various environments, which includes the application of mm-wave communication around 28 GHz. Wireless InSite performs ray launching, emitting rays (representing plane waves) from the transmitter (TX) into all directions, and following each ray as it interacts (reflection, diffraction, transmission) with the objects in the environment. This process continues until either the ray reaches the receiver (RX) or the strength of the ray falls below a specified threshold.

The input to the ray tracer is a digital map of the environment, which includes 3D models of the buildings and the electromagnetic characteristics of the building walls. We also include models of foliage in the simulations, because foliage effects are considered to be non-negligible in mm-wave systems. The output is a list of parameters for the MPCs that matches the parameter list of a double-directional channel model [116], where each MPC is associated with the path power $\sigma_p^2$, propagation delay $\tau_p$, and angles of departure $\phi_p$ and arrival $\theta_p$. As for all ray tracers, the accuracy of the program is determined by the accuracy of the environmental database, the density of emitted rays, and the maximum number of interactions taken into account. Simulation results have been compared to measurements in a variety of settings and shown to provide reasonable agreement [115].
We stress that the main point of the simulation is not to reproduce, for a specific location, the exact channel characteristics, but rather to obtain a set of channel characteristics that are reasonably typical for a mm-wave channel.

The simulation has been conducted based on the model of the University Park Campus, University of Southern California (USC), which is shown in Figure 3.4. The green dot is the BS located above the rooftop in the middle of the map, while the simulated UEs lie on the red routes covering all possible streets of the campus. Gray objects represent the buildings, and their walls are configured to use the same material for simplicity. The green 3D polygons denote foliage features. In the mm-wave channel the diffracted MPCs are greatly attenuated; therefore restricting the ray tracer to consider at most one diffraction is a valid simplification and speeds up the simulation. The detailed simulation configurations are listed in Table 3.2.

Figure 3.4: Ray-tracing simulation environment: two UE routes spread over orthogonal street canyons are selected for simulations, where one is marked by a black arrow, and the other is marked by a blue one.

Table 3.2: Ray-tracing simulation configurations of USC campus
Carrier frequency | 28 GHz
Antenna pattern | Isotropic
Antenna polarization | Vertical
TX power | 30 dBm
BS height | 45 m
UE height | 2 m
Maximal diffraction | 1
Maximal reflection | 10

Based on the ray tracing channels, we investigate the sum rate of the proposed scheme over two UE routes, whose separation distances to the BS are within 200 m and whose path losses are below 130 dB. For each UE drop, we randomly select a total of $K = 16$ UEs along the two routes, and generate 20 independent small-scale fading realizations. Other parameter settings are kept the same as in Table 3.1. The noise power is set to −100 dBm, which typically corresponds to a 20 MHz bandwidth.
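The correspondence between the noise power and the bandwidth can be checked with the thermal-noise formula. The sketch below is a rough calculation assuming a reference temperature of 290 K and no receiver noise figure; neither assumption is stated in the text, and the function name is hypothetical.

```python
import math

BOLTZMANN = 1.380649e-23  # Boltzmann constant in J/K


def thermal_noise_dbm(bandwidth_hz, temperature_k=290.0, noise_figure_db=0.0):
    """Thermal noise power k*T*B in dBm, optionally increased by a noise figure."""
    noise_watt = BOLTZMANN * temperature_k * bandwidth_hz
    return 10.0 * math.log10(noise_watt * 1e3) + noise_figure_db


noise_20mhz = thermal_noise_dbm(20e6)
print(round(noise_20mhz, 2), "dBm")
```

Under these assumptions the result is within about 1 dB of the noise power used in the simulations.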
For the evaluation of CSI acquisition, we adjust $\beta$ to be 2, calculate the coherence time for a mobile speed of 1 m/s at 28 GHz, and keep the other parameters, including the coherence bandwidth and the number of orthogonal time slots, the same as in Section 3.5.1. We stress that in the generation of the data, no assumptions about a Kronecker structure of the model are made. On the contrary, mm-wave channels, due to their small number of interactions, are commonly assumed to deviate considerably from the Kronecker structure. Despite this, we see below that our proposed scheme shows significant performance improvement also in this case.

Fig. 3.5 exhibits the sum spectral efficiency varying with the number of clusters for different settings of UE RF chains, i.e., 1, 2, and 4, where we fix the transmit power to be 50 dBm (100 W). For the typical small-cell scenario marked in Fig. 3.4, if we account for the CSI acquisition cost, the optimal hybrid beamforming strategy is the EB analog precoder plus the proposed WAMMSE digital beamforming with two UE clusters, whose net sum rate is approximately 2.7 times that of JSDM, as Fig. 3.5(d) exhibits. This is because the JSDM scheme is not able to combat the severe inter-group interference brought by non-orthogonal UE grouping. In Fig. 3.5(b), with the BD method, the analog precoder reduces the inter-group interference by sacrificing some beamforming gain, which significantly improves the performance of JSDM, and slightly worsens that of the proposed scheme. However, the proposed beamforming strategy still outperforms JSDM by 25%.

Figure 3.5: Sum rate vs. number of clusters when transmit power $P_t = 50$ dBm with the ray-tracing simulation data for block diagonalization (left) and eigen-beamforming (right) analog beamformers. The left-side figures do not incorporate the cost of training and feedback overhead, which is considered in the right-side ones. The numbers on the bars correspond to the number of UE RF chains. Panels: (a) BD, without overhead; (b) BD, with overhead; (c) EB, without overhead; (d) EB, with overhead. Axes: number of clusters (1, 2, 4) and sum rate (bps/Hz); bars show JSDM and the gain of WAMMSE.

Without UE grouping, the BD method reduces to the EB method. The performance gap between the proposed scheme and JSDM is smaller than that in Fig. 3.2, which is due to the fact that the system is in the interference-limited mode in the very high SNR regime, and JSDM with zero-forcing digital beamforming approaches the optimal solution. Meanwhile, since quasi-optical propagation dominates at mm-wave frequencies, and the channels of close-by UEs exhibit (quasi-) line-of-sight propagation, the gain from increasing the number of UE RF chains is very limited.

In Fig. 3.6, we present the sum rate varying with the transmit power $P_t$, ranging from 20 to 50 dBm (0.1 to 100 W), when the number of UE groups is two. With the BD method effectively suppressing inter-group interference, both beamforming strategies continuously improve the system performance as $P_t$ increases in Fig. 3.6(a).

Figure 3.6: Sum rate vs. transmit power $P_t$ when number of clusters $G = 2$ with the ray-tracing simulation data for block diagonalization (left) and eigen-beamforming (right) analog beamformers. The numbers in the legends correspond to the number of UE RF chains. Panels: (a) BD; (b) EB. Axes: $P_t$ (dBm) and sum rate (bps/Hz); curves: JSDM and WAMMSE with 1, 2, and 4 UE RF chains.
However, with the EB method, the sum rate of JSDM soon saturates in the interference-limited mode, while the proposed method, which is capable of combating inter-group interference, performs even better than with the BD method, as Fig. 3.6(b) exhibits.

3.5.3 Measured Channels

We finally investigate the performance in measured propagation channels at cm-wave frequencies. At those frequencies, the Kronecker assumption is generally fulfilled better (though not perfectly). The measurement campaign was conducted in the old-town city center of Cologne (Germany), which is a medium-sized city with narrow and sometimes winding streets. The measured area was mostly made up of buildings with similar heights and multiple floors (ranging from 4 to 8). The TX, acting as the BS, was mounted on the rooftop of a 30 m high-rise building, with the RXs, acting as UEs, placed on the rooftop of a car. The height of the RX was about 2.5 m above ground. The TX and RX placements in the environment are shown in Fig. 3.7 and 3.8, respectively.

Figure 3.7: Tx view of the urban macro-cell in Cologne

Figure 3.8: Rx placement for the urban macro cell in Cologne

We performed the channel measurement with a wideband MIMO MEDAV RUSK channel sounder operating at a center frequency of 2.53 GHz. This channel sounder has been used in a number of previous channel measurement campaigns, with extensive details provided in [117, 118]. The channel sounding setup uses cylindrical antenna array structures at both the TX and RX ends, which guarantees a truly 3D channel measurement and ensures that MPCs from all directions in the urban environment can be easily captured. By using a combination of virtual and switched arrays, a massive MIMO channel with a cylindrical array of dimension 60 (circumference) × 8 (vertical) × 2 (polarization) at the BS, and 8 × 2 × 2 at the UE, was measured. 45 positions of the UE within the cell were measured, thus providing (virtual) multi-user MIMO channel characteristics.
The channel sounder configuration is listed in Table 3.3, and additional details can be found in [109]. For the temporal and spatial analysis of the measurement data, the high-resolution parameter estimation framework RIMAX [13] was used in the channel parameter extraction so as to obtain an antenna-independent characterization of the radio channel. With the knowledge of the extracted MPCs, including path power, propagation time, and angular information, we can build up the double-directional channel description (2.2), and investigate the system performance of the proposed scheme over a realistic propagation channel.

Table 3.3: Measurement channel sounder configuration
Bandwidth | 2.52 to 2.54 GHz
Number of frequency points | 257
Number of channels | 900 × 32
Total time Syn. Aperture | Approx. 10 mins
TX ports, RX ports | 900 ports, 32 ports
Azimuth range | [−180°, 180°]
Elevation range | [−90°, 90°]

Figure 3.9: Top view of the measured urban environment and distributions of terminals

Fig. 3.9 exhibits the top view of the urban environment, where we also mark the TX position and the 12 measured sites whose propagation channels are then used for systematic simulations. Although some UE locations are widely separated, e.g., RX 6 and RX 12, their angular spectra largely overlap, as Fig. 3.10 shows. Given $G = 2$, the K-means UE grouping algorithm splits the UEs into two groups, where one is centered around $\phi_{\mathrm{az}} = 0^\circ$, with RX index set {6, 30, 31, 32, 10, 50}, and the other is centered around $\phi_{\mathrm{az}} = 50^\circ$, including {12, 14, 15, 17, 18, 44}. However, severe inter-group interference needs to be dealt with by the hybrid beamforming design.

Figure 3.10: Scatter plot of MPCs in the azimuth domain of DOD for partial measured sites. Axes: $\phi_{\mathrm{az}}$ [degree] and power [dB]; sites shown: RX 6, RX 30, RX 31, RX 12, RX 14.
For the simulation parameter settings, we let $l_{BS} = 12$ and $l_{UE} = 1, 2, 4$, while other parameters related to the system size are kept the same as in Table 3.1. To account for the cost of CSI acquisition, we consider a more realistic cellular network, which divides the entire coherence block into 10 orthogonal time slots at 2.53 GHz. Meanwhile, the UE mobility speed, coherence bandwidth, and repetition times of feedback transmission are kept the same as those in Section 3.5.1.

Figure 3.11: Sum rate vs. number of clusters when transmit power $P_t = 30$ dBm with the Cologne measurement data for block diagonalization (left) and eigen-beamforming (right) analog beamformer. The left side figures do not incorporate the cost of training and feedback overhead, which is considered in the right side ones. The number on the bars corresponds to the number of UE RF chains. (Panels: (a) BD, without overhead; (b) BD, with overhead; (c) EB, without overhead; (d) EB, with overhead. Horizontal axis: number of clusters; vertical axis: sum rate (bps/Hz); bars: JSDM and gain.)

Similar to Fig. 3.2 and 3.5, Fig. 3.11 depicts the (net) sum rate varying with the number of UE groups and $l_{UE}$ under different hybrid beamforming strategies at $P_t = 30$ dBm. Note that since the pathloss around 2 GHz is much smaller than that at mm-wave frequencies, we investigate a lower total transmit power than that in Fig. 3.5. Concerning the impact of training and feedback overhead, we can observe that, with two UE groups, the EB analog precoder plus the proposed WAMMSE digital beamformers achieve the optimal trade-off between reducing the CSI acquisition cost and suppressing inter-group interference. Splitting UEs into two groups brings strong inter-group interference. In Fig.
3.11(a) and (b), the BD analog precoder, which nulls out overlapped eigendirections, not only sacrifices significant beamforming gains but also loses degrees of freedom of those UEs with dominant MPCs lying in the overlapped angular spectrum region, as Fig. 3.10 exhibits, which creates the so-called "group-edge" problem. In Fig. 3.11(a) and (c), when $G = 1$, increasing the number of UE RF chains does not improve the performance of JSDM with a digital precoder based on Spencer's BD method [119], while the performance of the proposed WAMMSE, based on the joint precoder/combiner design, improves with an increasing number of UE RF chains.

Figure 3.12: Sum rate vs. transmit power $P_t$ with eigen-beamforming (EB) analog precoder based on the Cologne measurement data for the scenario with a single UE group (left) and the one with two UE groups (right). The number in the legends corresponds to the number of UE RF chains. (Panels: (a) without UE grouping; (b) two UE groups. Horizontal axis: $P_t$ (dBm); vertical axis: sum rate (bps/Hz); curves: JSDM and WAMMSE with 1, 2, and 4 UE RF chains.)

Fig. 3.12 presents the sum rate varying with the total transmit power $P_t$ with the EB analog precoder when $G = 1$ and 2. The proposed WAMMSE scheme significantly outperforms JSDM under various settings of transmit power and UE RF chains, especially when there are multiple UE groups. Similar to the results exhibited in Fig. 3.6(b), the severe inter-group interference becomes the bottleneck of JSDM when $P_t$ increases. For the Cologne measurement data, we also investigate the impact of

(Figure 3.13 panels: (a) $l_{UE} = 1$; (b) $l_{UE} = 2$. Horizontal axis: $l_{BS}$; vertical axis: sum rate [bit/s/Hz]; curves: JSDM and WAMMSE at $P_t = 10$ dBm and $P_t = 40$ dBm.)

Figure 3.13: Sum rate vs.
Number of BS RF chains $l_{BS}$ with eigen-beamforming (EB) analog precoder and two UE groups based on the Cologne measurement data, for the number of UE RF chains $l_{UE}$ equal to 1 and 2.

different numbers of BS RF chains as shown in Fig. 3.13, where the sum rate varies with an increasing number of BS RF chains. We keep the parameter set the same as that used to obtain Fig. 3.11 and 3.12, except that the number of BS RF chains $l_{BS}$ ranges from 12 to 24. Two sets of curves are obtained by setting $P_t = 10$ dBm and 40 dBm. Comparing JSDM and the proposed scheme, we can observe that JSDM cannot fully exploit the gains brought by increasing the number of BS RF chains in the high SNR regime due to the inter-group interference, while WAMMSE efficiently combats the inter-group interference and continues to gain from more BS RF chains. In Fig. 3.13(b), introducing an additional UE RF chain enhances the capability of interference suppression, which leads to a better performance of the proposed scheme in the high SNR regime.

3.6 Conclusions

In this chapter, we studied the design of hybrid beamforming for massive MIMO FDD downlinks. The functionality of multiple RF chains at the UE is explored. We first prove the optimality of decoupling the design of the analog precoder and combiner under the Kronecker channel model, which leads to an optimal analog combiner formed by selecting the strongest eigenbeams of the receive covariance matrix in the sense of maximizing the ratio of intra-group signal to inter-group interference plus noise. Then, a WAMMSE algorithm is proposed to obtain the per-group digital precoders merely based on the intra-group effective channel and channel covariance, which maximizes a lower bound of the conditional average net sum rate.
Simulation results over a one-ring channel model, ray-tracing data at 28 GHz, and urban macrocell measurements at 2.53 GHz demonstrate the necessity of (non-orthogonal) "virtual sectorization" plus intelligent hybrid beamforming to achieve the optimal trade-off between the reduction of CSI acquisition cost and the suppression of inter-group interference. By utilizing conditional second-order channel statistics, the proposed scheme, which incorporates the coupling effect of the digital combiner when designing the digital precoder, alleviates the impact of the inter-group interference. Compared with existing schemes, our algorithm provides better performance in supporting the massive MIMO FDD downlink in a variety of scenarios.

Chapter 4
Optimizing Channel-Statistics-Based Analog Beamforming

In this chapter, we consider the design of analog beamformers for the downlink of multi-user massive multiple-input-multiple-output (MIMO) systems. We specifically investigate systems where both link ends are equipped with hybrid digital/analog (HDA) beamforming structures, where the analog beamformers are adapted based on second-order channel statistics, reducing the training overhead as well as the hardware effort. We present a framework for the optimization of such beamformers operating in mm-wave channels, exploiting the directional characteristics and sparse nature of such channels. We develop an approximate upper bound of the ergodic sum capacity, based on which efficient beamforming algorithms are devised. Simulation results show significant performance improvements of the proposed algorithms compared to state-of-the-art algorithms.

4.1 Introduction

To the best of our knowledge, there has been little consideration of the optimality of jointly designing analog beamformers at both ends of an HDA structure based on the channel statistics, and we try to close this gap in this chapter.
Assuming UEs equipped with a single RF chain, [29, 55, 89] do not explore the potential spatial multiplexing gain at the UE level. Concerning HDA structures with many RF chains at both ends, [28] is able to let the UE receive multiple symbols simultaneously, but it heuristically designs the analog combiner from the dominant eigenbeams of the transfer channel. In [30], by simply fixing the digital beamformers to be identity matrices, the transmit and receive analog beamformers are optimized alternately, but the coupling effect between them is neglected. The scheme proposed by [106] utilizes a covariance-based precoder, similar to [55], and optimizes the combiner by maximizing a sum-rate upper bound that is also based on the long-term CSI. However, it assumes the Kronecker channel model, which does not reflect the directional characteristics of the mm-wave channel well; moreover, it considers only analog beamformers at both ends rather than the HDA structure with both analog and digital beamformers.

To develop efficient algorithms that jointly optimize the analog precoder/combiner by exploiting the highly directional property of the mm-wave channel, we need to deal with two major challenges. First, the coupling between the analog and digital beamformers needs to be incorporated in the analog beamformer optimization. The other difficulty lies in the design over different time scales for analog and digital beamformers, where the former is based on the long-term CSI, while the latter depends on the instantaneous effective channel. With beamformers on different time scales, the expectation and the optimizations are interlaced, greatly increasing the mathematical difficulty: an outer optimization over the long-term CSI can be easily performed only if the inner optimization on a per-channel-snapshot basis and the expectation provide a closed-form solution, which is not the case.
The main contributions of this chapter can be summarized as follows:

• We develop an optimization framework for the joint design of transmit and receive analog beamformers based on long-term CSI for the downlink multiuser massive MIMO system, where the digital and analog beamformers are based on CSI on different time scales. We also account for the fact that each UE can receive multiple data streams, where the numbers of streams for different users are not pre-specified but rather require joint optimization with the analog beamformers. The channel model we consider is a non-Kronecker model that is relevant for mm-wave systems.

• We derive an approximate upper bound of the ergodic sum capacity of the considered HDA system, which is used as the objective function for the optimization of transmit and receive analog beamformers. We show that existing upper bounds of the ergodic sum capacity, which are widely used for the design of analog beamformers for single-end HDA systems, are not applicable to systems with HDA structures at both ends. In contrast, our approximate upper bound is able to capture the essential constraint that the HDA structure at both ends imposes on the numbers of data streams assigned to different users.

• We propose a low-complexity algorithm to maximize the approximate upper bound by properly decoupling the design of analog and digital beamformers. We prove the optimality of the algorithm when multipath components (MPCs) exhibit distinct directions of departure (DOD) and directions of arrival (DOA). Simulations demonstrate the advantages of the proposed beamformers over state-of-the-art methods, and show the significance of the common scatterer effect for the beamforming design.

The rest of the chapter is organized as follows. In Section 4.2, we first formulate the original downlink problem in Section 4.2.1 and then visit its uplink dual problem to facilitate the optimization in Section 4.2.2.
A reduced-complexity formulation obtained by semi-decoupling the design of analog and digital beamformers is presented in Section 4.2.3. In Section 4.3, we analyze the shortcomings of existing upper bounds of the ergodic sum capacity for the optimization of both-end analog beamformers, and then develop a new approximate upper bound, based on which an analog beamformer design algorithm is developed in Section 4.4. Simulation results are presented in Section 4.5 before the conclusion in Section 4.6.

4.2 Problem Formulation

The short coherence time creates a fundamental information-theoretic bottleneck for the implementation of massive MIMO in the mm-wave band [55]. Without prior knowledge of the channel's directional characteristics, the system has to treat the block fading channel isotropically at both ends and assign orthogonal resources for training at the antenna level to estimate the $NM$ entries of the transfer channel, which leads to a tremendous reduction of the multiplexing gain.

Nevertheless, considering the channel model (2.2), with the knowledge of the angular power spectrum, i.e., $[\mathbf{A}_i]$, $[\mathbf{B}_i]$ and $[\boldsymbol{\Gamma}_i]$, the system only needs to perform the training at the beam level to estimate the small-scale fading $[\mathbf{G}_i]$. Due to the sparsity of the mm-wave channel, the dimension of $\mathbf{G}_i$, i.e., $P_i$, is much less than $NM$, which significantly reduces the burden of the training phase. For example, if the channel has just a single MPC [29], by forming the transmit and receive beams pointing toward the direction of the MPC at both ends, we only need $K$ channel uses for uplink training of $K$ UEs. The cost of obtaining the channel statistics is negligible as long as the stationarity region of the channel statistics is much larger than a coherence block, which is usually fulfilled: with pedestrian mobility, the coherence time of the small-scale fading is on the order of milliseconds, while channel statistics vary within a few seconds or even longer.
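The reduction in training dimension described above can be made concrete with a small arithmetic sketch. The array sizes and path count below ($N = 16$, $M = 64$, $P_i = 4$) are illustrative assumptions chosen for this example, not values taken from the simulations in this chapter, and the function names are hypothetical.

```python
# Minimal sketch: number of unknowns per UE that training has to resolve.
# All numeric values below are illustrative assumptions.

def antenna_level_dimension(num_ue_antennas, num_bs_antennas):
    """Entries of the N x M transfer channel when no directional knowledge is used."""
    return num_ue_antennas * num_bs_antennas


def beam_level_dimension(num_paths):
    """Small-scale fading coefficients (one per MPC) once the steering matrices are known."""
    return num_paths


N, M, P = 16, 64, 4  # assumed UE antennas, BS antennas, and MPCs of one UE

full_dim = antenna_level_dimension(N, M)
beam_dim = beam_level_dimension(P)
reduction = full_dim // beam_dim

print(f"antenna-level unknowns: {full_dim}")
print(f"beam-level unknowns:    {beam_dim}")
print(f"reduction factor:       {reduction}")
```

Under these assumed sizes, the beam-level description needs orders of magnitude fewer coefficients per UE, which is the effect the sparsity argument relies on.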
Therefore, the implementation of massive MIMO at mm-wave frequencies necessitates the exploitation of the channel statistics to first reduce the dimension of the effective channel. To be specific, within the stationarity region of the channel statistics, the BS physically forms an analog precoder illuminating particular DODs to serve multiple UEs, while the UE forms an analog combiner to strengthen the received signal energy at certain DOAs and null out other directions, which generates the $l_{UE} \times l_{BS}$ effective channel that the system observes at the baseband:

$$\bar{\mathbf{H}}_i \triangleq \mathbf{W}_{ai}^\dagger \mathbf{H}_i \mathbf{F}_a. \tag{4.1}$$

Although the dimension of the effective channel is reduced to the number of RF chains, the large array gain, on the order of the number of antenna elements, is reaped by the analog beamformers. Given the effective channel $[\bar{\mathbf{H}}_i]$, the digital beamforming design becomes a typical MU-MIMO problem, where suitable existing schemes can be applied to the effective channel, e.g., block diagonalization [119] and dirty paper coding [120], which motivates us to find the capacity-optimal analog beamforming. Also from the perspective of the hardware constraint, since the analog beamformer is implemented in the RF domain with less flexibility than the baseband processing, it is reasonable to adapt it to the variation of the channel statistics rather than to the instantaneous CSI.

4.2.1 Downlink Problem Formulation

The general problem of finding the capacity-optimal analog beamforming based on the channel statistics, including the path steering matrices $[\mathbf{A}_i]$, $[\mathbf{B}_i]$ and the average MPC power profile $[\boldsymbol{\Gamma}_i^2]$, has not been explored yet, to the best of our knowledge.
Heuristically utilizing a covariance-based downlink precoder for a single cell with single-antenna UEs, [55,89] form the analog precoder by stacking eigenvectors of $[\mathbf{K}_{BSi}]$, whose optimality holds only under two strict conditions [55]: (1) UEs exhibiting different transmit channel covariances shall lie in spatially orthogonal subspaces, i.e., if $\mathbf{K}_{BSi} \neq \mathbf{K}_{BSj}$ for $i \neq j$, then $\mathbf{K}_{BSi} \mathbf{K}_{BSj}^\dagger = \mathbf{0}$; (2) the number of BS RF chains is no less than the number of eigenmodes of $[\mathbf{K}_{BSi}]$.

Extending to the scenario where each UE is also equipped with multiple antennas, [106] forms an analog combiner towards the strongest eigenmodes of the channel covariance at the UE and investigates its optimality under the Kronecker channel model. Since the Kronecker channel couples all transmit and receive eigenmodes together, it is intuitively optimal to select the strongest receive eigenmodes in the sense of maximizing the SNR. However, in a sparse mm-wave channel model with high dependence between DOD and DOA as given in (2.2), the optimality does not hold. For example, one receive eigenmode with dominant desired signal energy may be heavily contaminated by interference, so that in fact it is the one the UE prefers to null out for interference suppression.

In summary, with spatially correlated UEs and a limited number of BS RF chains, for a generic mm-wave channel, the joint optimization of analog precoder and combiner based on the channel statistics has not been addressed, and we try to close this gap. Given the analog beamforming and the implied effective channel in (4.1), the achievable rate of UE$_{\pi(i)}$ when using the dirty paper coding scheme in the digital baseband is given by [120]

$$C_{\pi(i)} = \log \frac{\left| \sigma^2 \mathbf{W}_{a\pi(i)}^\dagger \mathbf{W}_{a\pi(i)} + \bar{\mathbf{H}}_{\pi(i)} \left( \sum_{j \geq i} \mathbf{S}_{\pi(j)} \right) \bar{\mathbf{H}}_{\pi(i)}^\dagger \right|}{\left| \sigma^2 \mathbf{W}_{a\pi(i)}^\dagger \mathbf{W}_{a\pi(i)} + \bar{\mathbf{H}}_{\pi(i)} \left( \sum_{j > i} \mathbf{S}_{\pi(j)} \right) \bar{\mathbf{H}}_{\pi(i)}^\dagger \right|}, \tag{4.2}$$

where $[\pi(i)]_{i=1}^{K}$ is the ordered index set of UEs, and $\mathbf{S}_{\pi(i)}$ is the input covariance of UE$_{\pi(i)}$.
The dirty paper coding scheme optimizes $\mathbf{S}_{\pi(i)}$ in a sequential manner so that UE$_{\pi(i)}$ is not interfered with by the streams for prior UEs, i.e., $[\mathrm{UE}_{\pi(j)}], j < i$. Therefore, the sum rate of the UEs is $\sum_{i=1}^{K} C_{\pi(i)}$, which can be maximized by solving the problem

$$\max_{[\pi(i)], [\mathbf{S}_i]} \sum_{i=1}^{K} C_{\pi(i)}, \quad \text{s.t.} \quad \sum_{i=1}^{K} \mathrm{tr}(\mathbf{F}_a \mathbf{S}_i \mathbf{F}_a^\dagger) \leq P_t, \tag{4.3}$$

where $P_t$ is the downlink transmit power. The maximal sum rate of the UEs is achieved over the variable space including all possible permutations of the index set and the input covariances, i.e., $[\pi(i)]_{i=1}^{K}$ and $[\mathbf{S}_{\pi(i)}]_{i=1}^{K}$. Note that the power constraint of (4.3) is imposed on the antenna output power, averaged over the symbols within the coherence block.

Aiming to jointly optimize the analog precoder and combiner by using channel statistics, we formulate the following general problem

$$\max_{\mathbf{F}_a, [\mathbf{W}_{ai}]} \mathbb{E}\left[ \max_{[\pi(i)], [\mathbf{S}_i]} \sum_{i=1}^{K} C_{\pi(i)} \right], \quad \text{s.t.} \quad \sum_{i=1}^{K} \mathrm{tr}(\mathbf{F}_a \mathbf{S}_i \mathbf{F}_a^\dagger) \leq P_t, \tag{4.4}$$

where the expectation operator averages out the small-scale fading, i.e., $[\mathbf{G}_i]$ in (2.2). However, problem (4.4) is extremely difficult to solve. The main challenge lies in two aspects: a) with the variables of interest appearing in both the numerator and the denominator, the inner maximization problem (4.3) is generally non-convex; b) different variables optimized over different time scales are coupled together, e.g., $\mathbf{F}_a$ and $[\mathbf{S}_i]$ jointly determine the transmit power, while $[\mathbf{S}_i]$ varies with the block fading channel but $\mathbf{F}_a$ remains the same for a relatively long period of time.

We consider the fully-connected hybrid beamforming structure shown in Fig. 2.1. The development of covariance-based analog beamformers for reduced-complexity HDA structures, e.g., the subarray structure in [121], while interesting for further investigation, is out of the scope of this dissertation.
Aiming to maximize the ergodic downlink capacity, problem (4.4) does not incorporate the constant modulus constraint on the analog beamforming matrix, so that our solution can be viewed as an upper bound on the performance of covariance-based analog beamformers formed by a phase-shifter network. Simulation results in Section 4.5 investigate the impact of the constant modulus constraint numerically.

4.2.2 Uplink Dual Problem Formulation

To represent the original formulation (4.4) in a more tractable manner, we resort to the uplink-downlink duality theory [122] and develop its equivalent problem for the uplink. One important theorem proposed by [122] is that, for a downlink capacity maximization problem under a transmit power constraint with the structure

$$\sum_{i=1}^{K} \mathrm{tr}(\mathbf{S}_i \mathbf{Q}_1) \leq P_t, \tag{4.5}$$

and noise distributed as $\mathcal{CN}(\mathbf{0}, \mathbf{Q}_{2i})$ at UE$_i$, the uplink dual problem is under the power constraint $\sum_{i=1}^{K} \mathrm{tr}(\mathbf{S}'_i \mathbf{Q}_{2i}) \leq P_t$ with noise distributed as $\mathcal{CN}(\mathbf{0}, \mathbf{Q}_1)$ at the BS, where $\mathbf{Q}_1$ and $[\mathbf{Q}_{2i}]$ are arbitrary valid covariance matrices and $\mathbf{S}'_i$ is the uplink input covariance of UE$_i$. Given the effective channel $[\bar{\mathbf{H}}_i]$, we can observe from (4.3) and (4.2) that

$$\mathbf{Q}_1 = \mathbf{F}_a^\dagger \mathbf{F}_a, \quad \mathbf{Q}_{2i} = \sigma^2 \mathbf{W}_{ai}^\dagger \mathbf{W}_{ai}, \quad i = 1, 2, \ldots, K. \tag{4.6}$$

Therefore, the dual problem of (4.3), which maximizes the uplink multiple-access channel (MAC) capacity, can be expressed as

$$\max_{[\mathbf{S}'_i]} \log \left| \left( \sum_{i=1}^{K} \bar{\mathbf{H}}_i^\dagger \mathbf{S}'_i \bar{\mathbf{H}}_i + \mathbf{Q}_1 \right) \mathbf{Q}_1^{-1} \right| \quad \text{s.t.} \quad \sum_{i=1}^{K} \mathrm{tr}(\mathbf{S}'_i \mathbf{Q}_{2i}) \leq P_t. \tag{4.7}$$

When optimizing the analog beamformers, we consider capacity-achieving digital beamformers at both ends instead of the linear beamformers $\mathbf{F}_d$ and $[\mathbf{W}_{di}]$, and thus obtain the expressions of the downlink and dual uplink capacity. Therefore, $\mathbf{W}_{di}$ does not appear in $\mathbf{Q}_{2i}$. For every instantaneous channel realization, problems (4.7) and (4.3) are equivalent, which leads us to replace the inner optimization of (4.4) by its uplink dual.
Substituting (4.6), we develop an equivalent uplink problem with a simpler objective function:

$$\max_{\mathbf{F}_a, [\mathbf{W}_{ai}]} \mathbb{E}\left[ \max_{[\mathbf{S}'_i]} \log \left| \left( \sum_{i=1}^{K} \bar{\mathbf{H}}_i^\dagger \mathbf{S}'_i \bar{\mathbf{H}}_i + \mathbf{F}_a^\dagger \mathbf{F}_a \right) (\mathbf{F}_a^\dagger \mathbf{F}_a)^{-1} \right| \right] \quad \text{s.t.} \quad \sigma^2 \sum_{i=1}^{K} \mathrm{tr}(\mathbf{S}'_i \mathbf{W}_{ai}^\dagger \mathbf{W}_{ai}) \leq P_t. \tag{4.8}$$

4.2.3 Semi-Decoupled Analog Beamforming Design

Both the downlink problem formulation (4.4) and its dual (4.8) introduce a two-tier joint optimization, where the inner and outer maximizations are based on the instantaneous and average CSI, respectively. Although (4.8) exhibits a simplified structure, the coupling of the variables $\mathbf{F}_a$, $[\mathbf{W}_{ai}]$ and $\mathbf{S}'_i$ prevents a closed-form solution from which efficient algorithms could be developed.

Decoupling the impact of the instantaneous $[\mathbf{S}'_i]$ from the long-term $\mathbf{F}_a$ and $[\mathbf{W}_{ai}]$ can significantly reduce the complexity of the problem. For example, [30] directly fixes the input covariance to be a scaled identity matrix, i.e., $\mathbf{S}'_i \propto \mathbf{I}_{l_{UE}}, \forall i$, where the scaling is a constant factor that ensures the power budget constraint is satisfied. However, this assumption implicitly enforces that all UE RF chains are used for spatial multiplexing, i.e., $\forall i, s_i = l_{UE}$, which may be far away from the optimal solution, e.g., full spatial multiplexing does not perform well in the low SNR regime. Ignoring the stream assignment $[s_i]$ involved in the optimization of $[\mathbf{S}'_i]$ may lead to an undesired system design.

Instead of a fully decoupled design, we assume that the input covariance in the dual uplink is also invariant to the block fading channel within the stationarity region of the channel statistics. Therefore, (4.8) can be reduced to

$$\max_{\mathbf{F}_a, [\mathbf{W}_{ai}], [\tilde{\mathbf{S}}_i]} \mathbb{E}\left[ \log \left| \left( \sum_{i=1}^{K} \bar{\mathbf{H}}_i^\dagger \tilde{\mathbf{S}}_i \bar{\mathbf{H}}_i + \mathbf{F}_a^\dagger \mathbf{F}_a \right) (\mathbf{F}_a^\dagger \mathbf{F}_a)^{-1} \right| \right], \tag{4.9}$$

where the auxiliary variables $[\tilde{\mathbf{S}}_i]$ are introduced to represent the time-invariant input covariance to the dual uplink channel.
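Factorizing $\tilde{\mathbf{S}}_i = \tilde{\mathbf{S}}_i^{\frac{1}{2}} (\tilde{\mathbf{S}}_i^{\frac{1}{2}})^\dagger$ and substituting (4.1) into (4.9), the function inside the expectation becomes

$$\begin{aligned}
&\log \left| \mathbf{F}_a^\dagger \sum_{i=1}^{K} \mathbf{H}_i^\dagger \tilde{\mathbf{W}}_i \tilde{\mathbf{W}}_i^\dagger \mathbf{H}_i \mathbf{F}_a (\mathbf{F}_a^\dagger \mathbf{F}_a)^{-1} + \mathbf{I}_{l_{BS}} \right| \\
&\overset{(a)}{=} \log \left| \sum_{i=1}^{K} \mathbf{H}_i^\dagger \tilde{\mathbf{W}}_i \tilde{\mathbf{W}}_i^\dagger \mathbf{H}_i \mathbf{F}_a (\mathbf{F}_a^\dagger \mathbf{F}_a)^{-1} \mathbf{F}_a^\dagger + \mathbf{I}_M \right| \\
&\overset{(b)}{=} \log \left| \sum_{i=1}^{K} \mathbf{H}_i^\dagger \tilde{\mathbf{W}}_i \tilde{\mathbf{W}}_i^\dagger \mathbf{H}_i \tilde{\mathbf{F}}_a \tilde{\mathbf{F}}_a^\dagger + \mathbf{I}_M \right| \\
&\overset{(c)}{=} \log \left| \tilde{\mathbf{F}}_a^\dagger \sum_{i=1}^{K} \mathbf{H}_i^\dagger \tilde{\mathbf{W}}_i \tilde{\mathbf{W}}_i^\dagger \mathbf{H}_i \tilde{\mathbf{F}}_a + \mathbf{I}_{l_{BS}} \right|,
\end{aligned} \tag{4.10}$$

where we define $\forall i, \tilde{\mathbf{W}}_i \triangleq \mathbf{W}_{ai} \tilde{\mathbf{S}}_i^{\frac{1}{2}}$, and equalities (a) and (c) follow from the identity $\log|\mathbf{X}\mathbf{Y} + \mathbf{I}| = \log|\mathbf{Y}\mathbf{X} + \mathbf{I}|$. For (b), the projection matrix $\mathbf{F}_a (\mathbf{F}_a^\dagger \mathbf{F}_a)^{-1} \mathbf{F}_a^\dagger$ is factorized into $\tilde{\mathbf{F}}_a \tilde{\mathbf{F}}_a^\dagger$, where $\tilde{\mathbf{F}}_a$ is semi-unitary.

Based on the last line of (4.10), with the only assumption being the time-invariant input covariance $[\tilde{\mathbf{S}}_i]$, we derive a more compact formulation to optimize the analog beamforming:

$$\begin{aligned}
\max_{\tilde{\mathbf{F}}_a, [\mathbf{W}_{ai}], [\tilde{\mathbf{S}}_i]} \quad & \mathbb{E}\left[ \log \left| \tilde{\mathbf{F}}_a^\dagger \sum_{i=1}^{K} \mathbf{H}_i^\dagger \tilde{\mathbf{W}}_i \tilde{\mathbf{W}}_i^\dagger \mathbf{H}_i \tilde{\mathbf{F}}_a + \mathbf{I}_{l_{BS}} \right| \right] \\
\text{s.t.} \quad & \sigma^2 \sum_{i=1}^{K} \mathrm{tr}(\tilde{\mathbf{W}}_i \tilde{\mathbf{W}}_i^\dagger) \leq P_t, \quad \tilde{\mathbf{F}}_a^\dagger \tilde{\mathbf{F}}_a = \mathbf{I}_{l_{BS}}, \\
& \tilde{\mathbf{W}}_i = \mathbf{W}_{ai} \tilde{\mathbf{S}}_i^{\frac{1}{2}}, \quad \mathbf{W}_{ai} \in \mathbb{C}^{N \times l_{UE}}, \quad \tilde{\mathbf{S}}_i \in \mathbb{C}^{l_{UE} \times l_{UE}}, \quad \forall i.
\end{aligned} \tag{4.11}$$

Problem (4.11) integrates the two tiers of (4.8) into one tier by letting the auxiliary input covariance $[\tilde{\mathbf{S}}_i]$ be invariant to the block fading channel. Note that this is only an auxiliary assumption made for the design of the analog beamformer, while in actual operation the real input covariance is designed based on the instantaneous CSI.

4.3 Upper Bound of Achievable Sum Rate

Despite the simplifications, it is still difficult to obtain a closed-form expression of the achievable ergodic rate in (4.11). Therefore, Monte-Carlo simulations are generally needed to find the maximal achievable rate and optimize the analog beamforming, which is computationally prohibitive. Instead, we aim to derive a closed-form upper bound, based on which an efficient algorithm for the design of analog beamformers will be developed. Following Jensen's inequality, different bounding strategies are investigated.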
In Section 4.3.1, we will show that a traditional bounding technique, which simply takes the expectation inside the determinant, does not reflect the rank constraint imposed by the HDA structure, which leads to an approximation error that grows without bound with the difference between the number of RF chains and the rank of the channel covariance. Therefore, in Section 4.3.2, we derive a tighter bound by taking the expectation outside the determinant, which incorporates this concern and will be utilized to formulate an approximate problem of (4.11) in Section 4.4.

4.3.1 Take Expectation Inside the Determinant

Since the logarithm of the determinant is a concave function, we first apply Jensen's inequality, i.e., $\mathbb{E}[\log|\mathbf{X}|] \leq \log|\mathbb{E}[\mathbf{X}]|$, to obtain the following upper bound:

$$\begin{aligned}
& \mathbb{E}\left[ \log \left| \tilde{\mathbf{F}}_a^\dagger \sum_{i=1}^{K} \mathbf{H}_i^\dagger \tilde{\mathbf{W}}_i \tilde{\mathbf{W}}_i^\dagger \mathbf{H}_i \tilde{\mathbf{F}}_a + \mathbf{I}_{l_{BS}} \right| \right] \\
& \leq \log \left| \tilde{\mathbf{F}}_a^\dagger \sum_{i=1}^{K} \mathbb{E}[\mathbf{H}_i^\dagger \tilde{\mathbf{W}}_i \tilde{\mathbf{W}}_i^\dagger \mathbf{H}_i] \tilde{\mathbf{F}}_a + \mathbf{I}_{l_{BS}} \right| \\
& \overset{(d)}{=} \log \left| \tilde{\mathbf{F}}_a^\dagger \sum_{i=1}^{K} \frac{1}{L_i} \mathbf{B}_i \boldsymbol{\Gamma}_i^\dagger \, \mathrm{diag}(\mathbf{A}_i^\dagger \tilde{\mathbf{W}}_i \tilde{\mathbf{W}}_i^\dagger \mathbf{A}_i) \, \boldsymbol{\Gamma}_i \mathbf{B}_i^\dagger \tilde{\mathbf{F}}_a + \mathbf{I}_{l_{BS}} \right|,
\end{aligned} \tag{4.12}$$

where equality (d) is obtained by substituting the channel model (2.2) and assuming that MPCs exhibit independent small-scale fading, i.e., the diagonal entries of $\mathbf{G}_i, \forall i$, are independent of each other.

However, maximizing the upper bound (4.12) may lead far away from the optimal solution. Comparing the objective functions of (4.11) and (4.12), we can observe that the covariance of the effective channel of UE$_i$ from the view of the BS becomes $\mathbf{Q}_{avg,i} \triangleq \mathbb{E}[\mathbf{H}_i^\dagger \tilde{\mathbf{W}}_i \tilde{\mathbf{W}}_i^\dagger \mathbf{H}_i]$, which may have a different rank from the instantaneous channel $\mathbf{Q}_{inst,i} \triangleq \mathbf{H}_i^\dagger \tilde{\mathbf{W}}_i \tilde{\mathbf{W}}_i^\dagger \mathbf{H}_i$. Specifically, the rank of $\mathbf{Q}_{inst,i}$ is no larger than the number of RF chains at UE$_i$, while the rank of $\mathbf{Q}_{avg,i}$ does not reflect the constraint on the number of UE RF chains. For example, a single-antenna UE can only be served by up to one stream, i.e., $\forall i, \mathrm{rank}(\mathbf{Q}_{inst,i}) \leq 1$. Nevertheless, the rank of $\mathbf{Q}_{avg,i} = \mathbf{K}_{BSi}$ can be up to $P_i$.
Thus, optimizing over the upper bound (4.12) implicitly assumes that UE$_i$ is able to receive $\mathrm{rank}(\mathbf{K}_{BSi}) \leq P_i$ symbols simultaneously, $\forall i$, which is not realistic. For example, consider a system with 10 UEs, each of which has a single RF chain, a BS with 10 RF chains, and $P_i = 10, \forall i$. Each UE is only capable of sending one symbol per channel use in the uplink. However, the optimization over (4.12) may schedule UEs with better channel conditions to send more symbols, which is far from optimal.

Applying the identity $\log|\mathbf{X}\mathbf{Y} + \mathbf{I}| = \log|\mathbf{Y}\mathbf{X} + \mathbf{I}|$, a different upper bound can also be developed in a similar fashion (the detailed derivation is omitted due to lack of space): $\sum_{i=1}^{K} \log \left| \frac{1}{L_i} \tilde{\mathbf{W}}_i^\dagger \mathbb{E}[\mathbf{H}_i \tilde{\mathbf{F}}_a \tilde{\mathbf{F}}_a^\dagger \mathbf{H}_i^\dagger] \tilde{\mathbf{W}}_i + \mathbf{I}_{l_{UE}} \right|$, which ignores the limit on the number of RF chains at the BS. Meanwhile, $\mathbb{E}[\log|\mathbf{X}|] \leq \log|\mathbb{E}[\mathbf{X}]|$ is not a tight enough approximation if $\mathbf{X}$ has a very large dimension [123], and is thus not appropriate for the optimization in massive MIMO systems. Therefore, our problem necessitates the development of a new bound.

4.3.2 Take Expectation Outside the Determinant

Following the concavity of the logarithm function, we can alternatively take the expectation outside the determinant to get a tighter upper bound, i.e., $\mathbb{E}[\log|\mathbf{X}|] \leq \log \mathbb{E}|\mathbf{X}|$. The new upper bound of (4.11) is defined as

$$C_{up} \triangleq \log \mathbb{E}\left[ \left| \tilde{\mathbf{F}}_a^\dagger \sum_{i=1}^{K} \mathbf{H}_i^\dagger \tilde{\mathbf{W}}_i \tilde{\mathbf{W}}_i^\dagger \mathbf{H}_i \tilde{\mathbf{F}}_a + \mathbf{I}_{l_{BS}} \right| \right], \tag{4.13}$$

which maintains the rank of the effective channel from the point of view of both the BS and the UEs.

Let $|\mathbf{X}|_{\hat{\alpha}^i_k \hat{\alpha}^j_k}$ denote the determinant of a sub-matrix of $\mathbf{X}$ obtained by selecting the row and column subsets of $\mathbf{X}$ indexed by $\hat{\alpha}^i_k = [\alpha^i_1, \alpha^i_2, \ldots, \alpha^i_k]$ and $\hat{\alpha}^j_k = [\alpha^j_1, \alpha^j_2, \ldots, \alpha^j_k]$, respectively. For the index sets $\hat{\alpha}^i_k$ and $\hat{\alpha}^j_k$, the subscript $k$ indicates their cardinality, while the superscripts $i, j$ are used to distinguish different sets. The integer $\alpha^i_k$ then indicates the $k$-th element of the set $\hat{\alpha}^i_k$. With this notation, we can state Remark 1.
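Remark 1: Define $\boldsymbol{\Gamma} \triangleq \mathrm{diag}\left( \frac{1}{\sqrt{L_1}} \boldsymbol{\Gamma}_1, \ldots, \frac{1}{\sqrt{L_K}} \boldsymbol{\Gamma}_K \right)$, $\mathbf{B} \triangleq [\mathbf{B}_1, \ldots, \mathbf{B}_K]$, $\tilde{\mathbf{Q}} \triangleq \mathrm{diag}(\tilde{\mathbf{Q}}_1, \ldots, \tilde{\mathbf{Q}}_K)$ and $\tilde{\mathbf{Q}}_i \triangleq \mathbf{A}_i^\dagger \tilde{\mathbf{W}}_i \tilde{\mathbf{W}}_i^\dagger \mathbf{A}_i, \forall i$. Then we have

$$\mathbb{E}\left[ \left| \tilde{\mathbf{F}}_a^\dagger \sum_{i=1}^{K} \mathbf{H}_i^\dagger \tilde{\mathbf{W}}_i \tilde{\mathbf{W}}_i^\dagger \mathbf{H}_i \tilde{\mathbf{F}}_a + \mathbf{I}_{l_{BS}} \right| \right] = \sum_{j=0}^{l_{BS}} \sum_{\hat{\alpha}_j} \sum_{\hat{\alpha}^1_j} \left| \tilde{\mathbf{F}}_a^\dagger \mathbf{B} \boldsymbol{\Gamma}^\dagger \right|_{\hat{\alpha}_j \hat{\alpha}^1_j} \left| \tilde{\mathbf{Q}} \right|_{\hat{\alpha}^1_j \hat{\alpha}^1_j} \left| \boldsymbol{\Gamma} \mathbf{B}^\dagger \tilde{\mathbf{F}}_a \right|_{\hat{\alpha}^1_j \hat{\alpha}_j}, \tag{4.14}$$

where the sum over the index set $\hat{\alpha}_j$ means to consider all possible subsets with cardinality $j$ from $[1, 2, 3, \ldots, k]$.

To prove Remark 1, we use the following sub-determinant expansions [124]:

$$|\mathbf{I} + \mathbf{X}| = \sum_{j=0}^{k} \sum_{\hat{\alpha}_j} |\mathbf{X}|_{\hat{\alpha}_j \hat{\alpha}_j}, \tag{4.15}$$

$$\mathbf{X} = \prod_{i=1}^{n} \mathbf{X}_i \;\Rightarrow\; |\mathbf{X}| = \sum_{\hat{\alpha}^1_k} \sum_{\hat{\alpha}^2_k} \cdots \sum_{\hat{\alpha}^{n-1}_k} |\mathbf{X}_1|_{[1,2,\ldots,k] \hat{\alpha}^1_k} |\mathbf{X}_2|_{\hat{\alpha}^1_k \hat{\alpha}^2_k} \cdots |\mathbf{X}_n|_{\hat{\alpha}^{n-1}_k [1,2,\ldots,k]}. \tag{4.16}$$

Detailed derivations can be found in Appendix .3.

Lemma 1: Without the rank constraint on $\tilde{\mathbf{Q}}$, the globally optimal $\tilde{\mathbf{Q}}$ that maximizes (4.14) is a diagonal matrix if $[\mathbf{A}_i]$ are semi-unitary, i.e., $\forall i, \mathbf{A}_i^\dagger \mathbf{A}_i = \mathbf{I}_{P_i}$.

A detailed proof of Lemma 1 can be found in Appendix .4. Assuming $[\mathbf{A}_i]$ are semi-unitary matrices, Lemma 1 implies that enforcing $\tilde{\mathbf{Q}}$ to be a diagonal matrix will not change the optimal solution, which leads us to further simplify (4.14):

$$\begin{aligned}
& \sum_{j=0}^{l_{BS}} \sum_{\hat{\alpha}_j} \sum_{\hat{\alpha}^1_j} \left| \tilde{\mathbf{F}}_a^\dagger \mathbf{B} \boldsymbol{\Gamma}^\dagger \right|_{\hat{\alpha}_j \hat{\alpha}^1_j} \left| \tilde{\mathbf{Q}} \right|_{\hat{\alpha}^1_j \hat{\alpha}^1_j} \left| \boldsymbol{\Gamma} \mathbf{B}^\dagger \tilde{\mathbf{F}}_a \right|_{\hat{\alpha}^1_j \hat{\alpha}_j} \\
& \overset{(m)}{=} \sum_{j=0}^{l_{BS}} \sum_{\hat{\alpha}_j} \sum_{\hat{\alpha}^1_j} \sum_{\hat{\alpha}^2_j} \left| \tilde{\mathbf{F}}_a^\dagger \mathbf{B} \boldsymbol{\Gamma}^\dagger \right|_{\hat{\alpha}_j \hat{\alpha}^1_j} \left| \tilde{\mathbf{Q}} \right|_{\hat{\alpha}^1_j \hat{\alpha}^2_j} \left| \boldsymbol{\Gamma} \mathbf{B}^\dagger \tilde{\mathbf{F}}_a \right|_{\hat{\alpha}^2_j \hat{\alpha}_j} \\
& \overset{(n)}{=} \sum_{j=0}^{l_{BS}} \sum_{\hat{\alpha}_j} \left| \tilde{\mathbf{F}}_a^\dagger \mathbf{B} \boldsymbol{\Gamma}^\dagger \tilde{\mathbf{Q}} \boldsymbol{\Gamma} \mathbf{B}^\dagger \tilde{\mathbf{F}}_a \right|_{\hat{\alpha}_j \hat{\alpha}_j} \\
& \overset{(o)}{=} \left| \tilde{\mathbf{F}}_a^\dagger \mathbf{B} \boldsymbol{\Gamma}^\dagger \tilde{\mathbf{Q}} \boldsymbol{\Gamma} \mathbf{B}^\dagger \tilde{\mathbf{F}}_a + \mathbf{I}_{l_{BS}} \right|.
\end{aligned} \tag{4.17}$$

Since $\tilde{\mathbf{Q}}$ is diagonal as discussed above, equality (m) follows from $|\tilde{\mathbf{Q}}|_{\hat{\alpha}^1_j \hat{\alpha}^2_j} = 0$ for $\hat{\alpha}^1_j \neq \hat{\alpha}^2_j$, and equalities (n) and (o) follow from applying (4.16) and (4.15) in reverse, respectively.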
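The two expansions (4.15) and (4.16) can be checked numerically on small matrices. The following is a minimal sketch, assuming arbitrary integer test matrices chosen only for illustration (they are not channel quantities from this chapter); (4.16) is checked for the special case of two factors, which is the Cauchy-Binet formula. The helper names are hypothetical.

```python
from itertools import combinations


def det(m):
    """Determinant by Laplace expansion along the first row (tiny matrices only)."""
    n = len(m)
    if n == 0:
        return 1  # empty sub-matrix: the j = 0 term of the expansion
    total = 0
    for c in range(n):
        minor = [row[:c] + row[c + 1:] for row in m[1:]]
        total += (-1) ** c * m[0][c] * det(minor)
    return total


def submatrix(m, rows, cols):
    """Sub-matrix of m restricted to the given row and column index tuples."""
    return [[m[r][c] for c in cols] for r in rows]


def matmul(a, b):
    """Plain matrix product of two list-of-rows matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def identity_plus(m):
    """Return I + m for a square matrix m."""
    return [[m[i][j] + (1 if i == j else 0) for j in range(len(m))]
            for i in range(len(m))]


def sum_principal_minors(m):
    """Right-hand side of (4.15): all principal minors of every size j = 0..k."""
    n = len(m)
    return sum(det(submatrix(m, idx, idx))
               for j in range(n + 1) for idx in combinations(range(n), j))


def cauchy_binet(a, b):
    """Right-hand side of (4.16) for two factors: a is k x p, b is p x k."""
    k, p = len(a), len(b)
    full = tuple(range(k))
    return sum(det(submatrix(a, full, idx)) * det(submatrix(b, idx, full))
               for idx in combinations(range(p), k))


# Arbitrary illustrative test matrices (assumptions, not channel data).
X = [[2, -1, 0], [1, 3, 4], [0, 5, -2]]
A = [[1, 2, 0, -1], [3, 0, 1, 2]]
B = [[2, 1], [0, -1], [1, 4], [3, 0]]

lhs_415, rhs_415 = det(identity_plus(X)), sum_principal_minors(X)
lhs_416, rhs_416 = det(matmul(A, B)), cauchy_binet(A, B)
print(lhs_415 == rhs_415, lhs_416 == rhs_416)
```

Integer entries are used so that both sides can be compared exactly, without floating-point tolerance.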
We substitute the expressions for $\mathbf{B}$, $\boldsymbol{\Gamma}$ and $\tilde{\mathbf{Q}}$ back into (4.17) to derive a compact upper bound that can be evaluated conveniently:

$$\log \left| \tilde{\mathbf{F}}_a^\dagger \mathbf{B} \boldsymbol{\Gamma}^\dagger \tilde{\mathbf{Q}} \boldsymbol{\Gamma} \mathbf{B}^\dagger \tilde{\mathbf{F}}_a + \mathbf{I}_{l_{BS}} \right| = \log \left| \tilde{\mathbf{F}}_a^\dagger \sum_{i=1}^{K} \tilde{\mathbf{H}}_i^\dagger \tilde{\mathbf{W}}_i \tilde{\mathbf{W}}_i^\dagger \tilde{\mathbf{H}}_i \tilde{\mathbf{F}}_a + \mathbf{I}_{l_{BS}} \right|, \tag{4.18}$$

where we define the average effective channel of UE$_i$ as $\tilde{\mathbf{H}}_i \triangleq \frac{1}{\sqrt{L_i}} \mathbf{A}_i \boldsymbol{\Gamma}_i \mathbf{B}_i^\dagger$, which is different from the previously defined instantaneous effective channel $\bar{\mathbf{H}}_i$.

In general, (4.18) is not an exact upper bound, but the approximation can be justified in various scenarios of massive MIMO systems in the mm-wave band. One important assumption we made to justify approximating (4.14) by (4.17) is that $[\mathbf{A}_i]$ is a semi-unitary matrix. This is often a reasonable assumption in light of the following considerations: 1) with a large antenna array also available at the UE, e.g., $N = 16$ or 32, the UE is able to distinguish MPCs distributed in different resolvable angular bins; 2) the local scattering effect is apparent at the UE side, which disperses the DOAs of MPCs with similar DODs viewed by the BS, thus the angular spread is much larger at the UEs than at the BS, e.g., 10 degree DOD spread vs. 50 degree DOA spread [125]; 3) with very few dominant MPCs distributed in a large DOA support range, highly correlated DOAs occur rarely. Consequently, we argue that the DOA steering vectors of the MPCs tend to be approximately orthogonal to each other.

The other assumption we have made in proving Lemma 1 is that $\tilde{\mathbf{Q}}_i \in \mathbb{C}^{P_i \times P_i}$ is a full rank matrix $\forall i$, which requires the number of MPCs to be no larger than the number of UE RF chains, i.e., $P_i \leq l_{UE}$. For high-performance UEs with relatively large arrays and a large number of RF chains, such as those mounted on mobile relay stations and laptops, the assumption is valid. In fact, later in Section 4.5, our simulation results demonstrate that the approximation performs well, even with a small number of UE RF chains.
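To illustrate the difference between the two Jensen bounds (4.12) and (4.13) in the simplest possible case, the following minimal sketch considers a single UE with one antenna and one RF chain, $P$ paths whose BS steering vectors are orthonormal, an analog precoder $\tilde{\mathbf{F}}_a$ that spans all of them, unit path loss, and unit-variance complex Gaussian small-scale fading. All of these are assumptions made for this example only. Under them, the instantaneous determinant collapses to the scalar $1 + q \sum_p \gamma_p^2 |g_p|^2$, the bound (4.13) becomes $\log_2(1 + q \sum_p \gamma_p^2)$, and the bound (4.12) becomes $\sum_p \log_2(1 + q \gamma_p^2)$, which behaves as if $P$ parallel streams were available. The function names and numeric values are hypothetical.

```python
import math
import random


def ergodic_rate(gammas_sq, q=1.0, trials=20000, seed=0):
    """Monte-Carlo estimate of E[log2(1 + q * sum_p gamma_p^2 |g_p|^2)]."""
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(trials):
        snr = 0.0
        for g2 in gammas_sq:
            # |g|^2 of a unit-variance complex Gaussian is exponential with mean 1
            snr += q * g2 * rng.expovariate(1.0)
        acc += math.log2(1.0 + snr)
    return acc / trials


def bound_expectation_outside(gammas_sq, q=1.0):
    """Scalar form of (4.13): log2 of the expected determinant."""
    return math.log2(1.0 + q * sum(gammas_sq))


def bound_expectation_inside(gammas_sq, q=1.0):
    """Scalar form of (4.12): log2 of the determinant of the expected matrix."""
    return sum(math.log2(1.0 + q * g2) for g2 in gammas_sq)


GAMMAS_SQ = [1.0, 1.0, 1.0, 1.0]  # assumed average path powers

erg = ergodic_rate(GAMMAS_SQ)
outside = bound_expectation_outside(GAMMAS_SQ)
inside = bound_expectation_inside(GAMMAS_SQ)
print(f"Monte-Carlo ergodic rate : {erg:.3f} bit/s/Hz")
print(f"expectation outside det  : {outside:.3f} bit/s/Hz")
print(f"expectation inside det   : {inside:.3f} bit/s/Hz")
```

In this toy setting the bound with the expectation outside the determinant stays close to the simulated rate, while the bound with the expectation inside the determinant overestimates it because it ignores the rank-one structure of the instantaneous channel.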
In summary, we reveal the shortcomings of existing bounds and develop a new closed-form upper bound (4.14) that approximately represents the achievable rate in (4.11). Following Jensen's inequality, the bound obtained by taking the expectation outside the determinant is not only tighter but also maintains the rank of the effective channel at both ends. With the assumption of $[\mathbf{A}_i]$ being semi-unitary and $[\tilde{\mathbf{Q}}_i]$ being full rank, we can avoid the computational complexity of (4.14), which requires the determinants of all possible sub-matrices, and obtain the much simpler form (4.18).

4.4 Dual Uplink Transmission with the Approximate Rate Upper Bound

We now optimize the analog beamformer for the dual uplink transmission based on the maximization of the upper bound approximation. Because of the tightness of $\log \mathbb{E}[|\cdot|]$ to $\mathbb{E}[\log|\cdot|]$ and the characteristics of the mm-wave channel, maximizing the approximate upper bound (4.18) approaches the optimization of the original problem (4.11). Based on (4.18), we have

$$\begin{aligned}
\max_{\tilde{\mathbf{F}}_a, [\mathbf{W}_{ai}], [\tilde{\mathbf{W}}_{di}]} \quad & \log \left| \tilde{\mathbf{F}}_a^\dagger \sum_{i=1}^{K} \tilde{\mathbf{H}}_i^\dagger \tilde{\mathbf{W}}_i \tilde{\mathbf{W}}_i^\dagger \tilde{\mathbf{H}}_i \tilde{\mathbf{F}}_a + \mathbf{I}_{l_{BS}} \right| \\
\text{s.t.} \quad & \sigma^2 \sum_{i=1}^{K} \mathrm{tr}(\tilde{\mathbf{W}}_i \tilde{\mathbf{W}}_i^\dagger) \leq P_t, \quad \tilde{\mathbf{F}}_a^\dagger \tilde{\mathbf{F}}_a = \mathbf{I}_{l_{BS}}, \\
& \tilde{\mathbf{W}}_i = \mathbf{W}_{ai} \tilde{\mathbf{W}}_{di}, \quad \mathbf{W}_{ai} \in \mathbb{C}^{N \times l_{UE}}, \quad \tilde{\mathbf{W}}_{di} \in \mathbb{C}^{l_{UE} \times s_i}, \quad \forall i,
\end{aligned} \tag{4.19}$$

where we introduce an auxiliary digital precoder $\tilde{\mathbf{W}}_{di}$ so that $\tilde{\mathbf{S}}_i = \tilde{\mathbf{W}}_{di} \tilde{\mathbf{W}}_{di}^\dagger$, and $s_i$ denotes the number of streams assigned to UE$_i$, which is bounded by $l_{UE}$, $\forall i$. Therefore, the impact of the data stream assignment is also considered in the optimization of the analog beamforming.

Although problem (4.19) seems much simpler, the objective function is still not concave. Moreover, the optimization of $[\tilde{\mathbf{W}}_i]$ inherently contains the stream assignment, i.e., optimizing $[s_i]$ satisfying $\sum_{i=1}^{K} s_i \leq \min(l_{BS}, K l_{UE})$ and $0 \leq s_i \leq l_{UE}$, which results in a mixed integer program. In this section, we mainly focus on the development of efficient algorithms that approximate the optimal solution of problem (4.19).
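The integer part of problem (4.19) can be made concrete by enumerating the feasible stream assignments for a toy system. This is a minimal sketch of the constraint set only, under assumed small values of $K$, $l_{BS}$ and $l_{UE}$; it is a brute-force illustration of why the search space grows quickly, not the algorithm developed in this chapter, and the function name is hypothetical.

```python
from itertools import product


def feasible_stream_assignments(num_ues, l_bs, l_ue):
    """All [s_1, ..., s_K] with 0 <= s_i <= l_UE and sum(s_i) <= min(l_BS, K * l_UE)."""
    budget = min(l_bs, num_ues * l_ue)
    return [s for s in product(range(l_ue + 1), repeat=num_ues)
            if sum(s) <= budget]


# Assumed toy sizes: K = 3 UEs, l_BS = 4 BS RF chains, l_UE = 2 UE RF chains.
small = feasible_stream_assignments(num_ues=3, l_bs=4, l_ue=2)
# A slightly larger assumed system, to show how quickly the set grows.
larger = feasible_stream_assignments(num_ues=6, l_bs=12, l_ue=4)

print(len(small), "feasible assignments for K=3, l_BS=4, l_UE=2")
print(len(larger), "feasible assignments for K=6, l_BS=12, l_UE=4")
```

Each feasible assignment would still require a continuous optimization over the beamformers, which is why exhaustive search over $[s_i]$ is not attractive for realistic system sizes.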
In fact, the following part will show that the globally optimal solution to (4.19) can be obtained efficiently under a so-called one-to-one mapping channel model.

4.4.1 Optimal Transmission under the One-To-One Mapping Channel Model

A link end with an infinite number of antenna elements is able to resolve infinitesimal angular differences. A simplified mm-wave channel model is proposed by [29], where the probability that the DODs/DOAs of different MPCs in the downlink coincide converges to zero almost surely. Based on this assumption, with both link ends equipped with a very large array, steering vectors corresponding to different angles become orthogonal to each other, i.e., $\forall i$, $A_i$ and $B_i$ in (2.2) are semi-unitary.

An interesting observation about channel model (2.2), when $[A_i]$ and $[B_i]$ become semi-unitary, is that the expression is the singular value decomposition (SVD) of the channel, and the steering matrices $[A_i]$ and $[B_i]$ become the eigenmodes of the channel at the UEs and the BS, respectively. We call this type of channel a "one-to-one" mapping (O2O) channel, because each eigenmode of the channel covariance at the BS/UE is coupled with a single eigenmode at the UE/BS. The O2O channel is the opposite of the Kronecker model, where the eigenmodes at both ends are fully coupled to each other. In the framework of the "virtual channel model" [126] or the Weichselberger model [127], it can be interpreted as a channel with a diagonal mode coupling matrix.

With an infinite number of antenna elements at the BS, the eigenmodes of the channel covariance from the view of the BS, i.e., $[B_i]$, tend to be columns of the discrete-Fourier-transform (DFT) matrix [55]. Thus, we can represent $[B_i]$ as different selections from a common $M \times M$ DFT matrix $\Psi$, i.e., $B_i = \Psi T_i$, where $T_i$ is an $M \times P_i$ selection matrix with a single one in each column and at most a single one in each row, while all other entries are zeros, and $\Psi = [b_1, b_2, \ldots, b_M]$ consists of all eigenmodes $[b_m]_{m=1}^{M}$ at the BS side.
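The selection of BS eigenmodes from a common DFT matrix can be illustrated as follows. This is a minimal sketch under stated assumptions: the unitary DFT normalization $1/\sqrt{M}$, the array size, and the column indices (standing in for the DOD bins of one UE) are hypothetical choices for illustration.

```python
import cmath
import math


def dft_column(size, m):
    """m-th column of the unitary size x size DFT matrix."""
    return [
        cmath.exp(-2j * math.pi * m * n / size) / math.sqrt(size)
        for n in range(size)
    ]


def select_columns(size, indices):
    """Columns of the DFT matrix picked by a selection matrix (one index per MPC)."""
    return [dft_column(size, m) for m in indices]


def gram(columns):
    """Gram matrix B^H B of a list of column vectors."""
    return [
        [sum(a.conjugate() * b for a, b in zip(u, v)) for v in columns]
        for u in columns
    ]


def is_semi_unitary(columns, tol=1e-9):
    """True if B^H B equals the identity up to the tolerance."""
    g = gram(columns)
    size = len(g)
    return all(
        abs(g[r][c] - (1.0 if r == c else 0.0)) < tol
        for r in range(size)
        for c in range(size)
    )


# Hypothetical example: M = 32 BS antennas, one UE whose three MPCs fall into DFT bins 3, 7, 20.
B_1 = select_columns(32, [3, 7, 20])
print(is_semi_unitary(B_1))
```

Any selection of distinct DFT columns is semi-unitary, which is the property the O2O model relies on; selecting the same column twice (two MPCs in the same bin) breaks it.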
Meanwhile, according to Lemma 1, we can construct $[\tilde{W}_i]$ so that, $\forall i$, $\tilde{Q}_i = \mathrm{diag}(q_{i1}, q_{i2}, \ldots, q_{iP_i})$ with the diagonal elements denoting the power allocation, e.g., $\tilde{W}_i = A_i \tilde{Q}_i^{1/2}$, $\forall i$. Therefore, problem (4.19) reduces to

$$\max_{\tilde{F}_a, [\tilde{Q}_i]} \; \log\left|\tilde{F}_a^{\dagger} \Psi \sum_{i=1}^{K} T_i \Sigma_i \tilde{Q}_i \Sigma_i^{\dagger} T_i^{\dagger} \Psi^{\dagger} \tilde{F}_a + I_{l_{BS}}\right| \tag{4.20}$$

$$\text{s.t.} \quad \tilde{Q}_i = \mathrm{diag}(q_{i1}, q_{i2}, \ldots, q_{iP_i}), \quad s_i = \mathrm{rank}(\tilde{Q}_i) \le l_{UE}, \; \forall i, \quad \sum_{i=1}^{K} \mathrm{tr}(\tilde{Q}_i) \le \frac{P_t}{\sigma^2}, \quad \tilde{F}_a^{\dagger} \tilde{F}_a = I_{l_{BS}}.$$

Since $T_i$ is a selection matrix $\forall i$, $T_i \Sigma_i \tilde{Q}_i \Sigma_i^{\dagger} T_i^{\dagger}$ is diagonal. Thus, the optimization of $\tilde{F}_a$ amounts to selecting the steering vectors corresponding to the largest diagonal entries of $\sum_{i=1}^{K} T_i \Sigma_i \tilde{Q}_i \Sigma_i^{\dagger} T_i^{\dagger}$. Meanwhile, the rank constraint on $\tilde{Q}_i$ implies that the optimization of $\tilde{W}_i = A_i \tilde{Q}_i^{1/2}$ amounts to selecting steering vectors from $A_i$, $\forall i$. The selection matrix $T_i$ reorders the diagonal entries of the diagonal matrix $\Sigma_i \tilde{Q}_i \Sigma_i^{\dagger}$ to align the power allocation and the average power of each beam pair with the eigenmodes in $\Psi$, with which the determinant in (4.20) can be represented as

$$\left|\tilde{F}_a^{\dagger} \Psi \sum_{i=1}^{K} T_i \Sigma_i \tilde{Q}_i \Sigma_i^{\dagger} T_i^{\dagger} \Psi^{\dagger} \tilde{F}_a + I_{l_{BS}}\right| = \left|\tilde{F}_a^{\dagger} \Psi \, \mathrm{diag}\!\left(\Big[\sum_{(i,j) \in \mathcal{N}(b_m)} \sigma_{ij}^2 q_{ij}\Big]_{m=1}^{M}\right) \Psi^{\dagger} \tilde{F}_a + I_{l_{BS}}\right|, \tag{4.21}$$

where $\mathrm{diag}([x_i]_{i=1}^{n}) = \mathrm{diag}(x_1, \ldots, x_n)$. The set $\mathcal{N}(b_m)$ contains all eigenmodes at the UE side that connect to $b_m$, and $(i,j) \in \mathcal{N}(b_m)$ denotes that the $j$-th eigenmode from $A_i$ of UE$_i$ is connected to $b_m$. Therefore, $\sum_{(i,j) \in \mathcal{N}(b_m)} \sigma_{ij}^2 q_{ij}$ reflects the total weight of the eigenmode $b_m$, combining allocated power and average channel gains.

Proposition 1: Under the O2O channel model in the massive MIMO regime, it is optimal to let a single eigenmode $b_m$, $\forall m$, at the BS side send only a single stream to a UE, i.e., jointly transmitting one stream by using multiple eigenmodes is not necessary.

Proof. Let $[b_{m_1}, \ldots, b_{m_{l_{BS}}}]$ denote the set of optimally selected eigenmodes at the BS side, and let $\tilde{Q}_i^{\star} = \mathrm{diag}(q_{i1}^{\star}, \ldots, q_{iP_i}^{\star})$, $\forall i$, denote the optimal power allocation.
Thus, (4.21) can be represented as

$$\left|\mathrm{diag}\!\left(\Big[\sum_{(i,j) \in \mathcal{N}(b_{m_n})} \sigma_{ij}^2 q_{ij}^{\star}\Big]_{n=1}^{l_{BS}}\right) + I_{l_{BS}}\right|. \tag{4.22}$$

However, for each diagonal entry, we can derive an upper bound as

$$\sum_{(i,j) \in \mathcal{N}(b_{m_n})} \sigma_{ij}^2 q_{ij}^{\star} \le \max_{(i,j) \in \mathcal{N}(b_{m_n})} \sigma_{ij}^2 \sum_{(i,j) \in \mathcal{N}(b_{m_n})} q_{ij}^{\star} = \max_{(i,j) \in \mathcal{N}(b_{m_n})} \sigma_{ij}^2 \, P_{t,m_n}^{\star}, \tag{4.23}$$

where $P_{t,m_n}^{\star}$ indicates the optimal power allocated to the eigenmode $b_{m_n}$. The equality in (4.23) holds only if we assign all allocated power of a selected eigenmode to its strongest transmit-receive pair. This completes the proof.

A simple greedy eigenmode pair selection (GEPS) algorithm is proposed to achieve the optimal eigenmode selection. Define $\mathcal{T}$ and $\mathcal{R}$ as the sets consisting of all eigenmodes from the perspective of the BS and the UEs, respectively. We iterate the following procedure until the RF chains of the BS or the UEs are exhausted: add the strongest eigenmode pair between $\mathcal{T}$ and $\mathcal{R}$, and remove the eigenmodes at both ends corresponding to the selected pair from $\mathcal{T}$ and $\mathcal{R}$, respectively. Then, water-filling power allocation can be performed to determine the power of the different pairs.

We can see the optimality of the GEPS algorithm as follows. Assume that we have found the optimal $l < l_{BS}$ eigenmode pairs. According to Proposition 1, the newly added pair should not have a common eigenmode with those already selected at the BS. Meanwhile, since under the O2O channel each eigenmode at the UE side is only connected to one eigenmode at the BS side, the new pair will contain a new eigenmode from the BS side and from the UE side, respectively. Consequently, it is orthogonal to the selected pairs, and the strongest available one will be optimal.

Given the optimal beam selection, we substitute (4.23) back into (4.22) and obtain the following power allocation problem:

$$\max_{[P_{t,n_m}]} \; \sum_{m=1}^{l_{BS}} \log\left|\max_{(i,j) \in \mathcal{N}(b_{n_m})} \sigma_{ij}^2 \, P_{t,n_m} + 1\right| \tag{4.24}$$

$$\text{s.t.} \quad \sum_{m=1}^{l_{BS}} P_{t,n_m} \le \frac{P_t}{\sigma^2},$$

which can be solved simply by the water-filling algorithm.
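The greedy pair selection and the subsequent water-filling step can be sketched as follows. This is an illustrative sketch, not the thesis implementation: the tuple layout `(weight, bs_mode, ue, ue_mode)`, the function names, the handling of the per-UE RF chain limit, and the example numbers are all assumptions. The weights stand for the average path powers of the beam pairs, and the example mimics two UEs that share a strong BS direction.

```python
def geps(pairs, l_bs, l_ue):
    """Greedy eigenmode pair selection (sketch).

    pairs: list of (weight, bs_mode, ue, ue_mode), one entry per MPC.
    Picks the strongest remaining pair whose BS eigenmode and UE eigenmode
    are both unused, until l_bs pairs are chosen or no pair is left.
    """
    used_bs, used_ue_modes, chains = set(), set(), {}
    selected = []
    for pair in sorted(pairs, key=lambda p: p[0], reverse=True):
        if len(selected) == l_bs:
            break  # BS RF chains exhausted
        _, bs_mode, ue, ue_mode = pair
        if bs_mode in used_bs or (ue, ue_mode) in used_ue_modes:
            continue  # one end of this pair is already occupied
        if chains.get(ue, 0) >= l_ue:
            continue  # this UE has no free RF chain
        selected.append(pair)
        used_bs.add(bs_mode)
        used_ue_modes.add((ue, ue_mode))
        chains[ue] = chains.get(ue, 0) + 1
    return selected


def water_filling(gains, total_power):
    """Maximize sum(log(1 + g * p)) subject to sum(p) <= total_power, p >= 0.

    Assumes all gains are strictly positive.
    """
    order = sorted(range(len(gains)), key=lambda k: gains[k], reverse=True)
    powers = [0.0] * len(gains)
    for active in range(len(order), 0, -1):
        inverse = [1.0 / gains[order[k]] for k in range(active)]
        level = (total_power + sum(inverse)) / active
        if level > inverse[-1]:  # weakest active pair still gets positive power
            for k in range(active):
                powers[order[k]] = level - inverse[k]
            break
    return powers


# Hypothetical example: both UEs see BS eigenmode 1 as their strongest direction.
pairs = [(0.7, 1, 2, 0), (0.6, 1, 1, 0), (0.4, 2, 1, 1), (0.3, 3, 2, 1)]
chosen = geps(pairs, l_bs=2, l_ue=1)
allocation = water_filling([p[0] for p in chosen], total_power=10.0)
print(chosen, allocation)
```

In this example the second UE cannot reuse the occupied BS eigenmode and is steered to its other path instead, so two streams are served rather than one.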
The GEPS algorithm plus the water-filling power allocation (4.24) achieves the global optimum of problem (4.19) under an O2O channel model. After GEPS has selected the optimal beams under the limited number of RF chains, the water-filling algorithm based on the long-term CSI can further discard selected beam pairs with very weak average channel gains, which helps reduce the training overhead.

With its low computational complexity, GEPS is practical for real-world implementation. The cost of the algorithm involves two parts: 1) sorting the MPCs by their average power weights, and 2) the execution of GEPS itself. The complexity of a typical merge sort is $O(P \log P)$, where $P \triangleq \sum_{i=1}^{K} P_i$ is the total number of MPCs from all UEs. During the execution, the worst case is that every added $b_{n_m}$ is connected to all UEs while the path weights of the beam pairs with common end $b_m$ are larger than those of the remaining unscheduled ones. Therefore, we have to remove $K - 1$ pairs after adding a new eigenmode, which leads to a complexity order of $O(K l_{BS})$. Combining these two phases, the worst-case complexity of our proposed algorithm is $O(P \log P + K l_{BS})$, irrespective of $M$ and $N$.

4.4.2 Extension to a More Generic Channel Model

Exploiting the special O2O channel model, we can find the optimal beam selection algorithmically. However, problem (4.19), without assuming unitary $[A_i]$, is generally non-convex and involves mixed integer programming, whose solution may be computationally prohibitive, especially with dense UE deployment. Thus, we intend to develop a greedy algorithm that approaches a suboptimal solution with reasonable complexity.

If we fix $\tilde{F}_a$, the remaining problem involves not only optimizing $[W_{a_i}]$ but also the stream assignment reflected in $[\tilde{W}_{d_i}]$. We propose Algorithm 2, named optPrecoder, to decide the analog precoders at the UEs for a given analog combiner at the BS for the dual uplink transmission.

1.
Initially, the entries of the stream assignment vector $d$ are all zeros. Let $W_{a_i}$ consist of the $l_{UE}$ strongest right singular vectors of $\tilde{F}_a^{\dagger} \tilde{H}_i^{\dagger}$, $\forall i$, where $\tilde{F}_a$ is the left singular matrix of $F_a$ as defined in (4.10). With $\tilde{W}_{d_i}$ being the strongest right singular vector of $\tilde{F}_a^{\dagger} \tilde{H}_i^{\dagger} W_{a_i}$, we find the user with index $i^{\star}$, denoted by UE$_{i^{\star}}$, that has the strongest effective channel gain, i.e., $\|H_e\|_F$ where $H_e = \tilde{F}_a^{\dagger} \tilde{H}_{i^{\star}}^{\dagger} W_{a_{i^{\star}}} \tilde{W}_{d_{i^{\star}}}$, and assign the first stream to it, i.e., $d(i^{\star}) = 1$.

2. Then, we add the next stream that has the strongest effective channel after projection onto the null space of the scheduled streams. To be specific, for a candidate UE$_j$, we update $W_{a_j}$ to consist of the $l_{UE}$ strongest right singular vectors of $\Pi^{\perp}(H_e) \tilde{F}_a^{\dagger} \tilde{H}_j^{\dagger}$, where $\Pi^{\perp}(H_e)$ is the projector onto the null space of $H_e$. Then, we update $\tilde{W}_{d_j}$ with the $d(j) + 1$ strongest right singular vectors of $\Pi^{\perp}(H_e) \tilde{F}_a^{\dagger} \tilde{H}_j^{\dagger} W_{a_j}$. For $j = 1, \ldots, K$, compute the sum rate by substituting $\tilde{W}_j = W_{a_j} \tilde{W}_{d_j}$ into (4.18). Let $j^{\star}$ denote the index of the user leading to the largest data rate. Then, we add the new stream to UE$_{j^{\star}}$.

3. Repeat step 2 until adding a new stream does not enhance the sum rate or $\sum_{i=1}^{K} d(i) = l_{BS}$.

The detailed procedure of the optPrecoder algorithm is given in Algorithm 2.

Integrating optPrecoder as an inner optimization for the analog precoders, we can develop an outer-layer algorithm that greedily adds steering vectors from $[B_i]$ to construct the analog combiner $F_a$ for the dual uplink transmission. The implementation procedure of the overall greedy analog beamforming design (GABD) algorithm is as follows:

1. Initially, $\mathcal{T} = [B_1, \ldots, B_K]$, where $K$ is the number of UEs, and $F_a$ is empty.

2. Let $F'_a = F_a$. For a candidate steering vector $v_i$ in $\mathcal{T}$, let $F'_a = [F'_a, v_i]$. Run optPrecoder to return the updated precoders $[W_{a_j}]_{j=1}^{K}$, $[\tilde{W}_{d_j}]_{j=1}^{K}$ and the "sum-rate capacity", denoted by $\tilde{R}_i$, based on (4.19) for $F'_a$.
Algorithm 2 optPrecoder
  Input $F_a$ and let $\tilde{F}_a$ consist of the left singular vectors of $F_a$.
  Initialize the stream assignment, i.e., $d = [0, \ldots, 0]^T \in \mathbb{Z}_+^{K \times 1}$, where each entry gives the number of streams assigned to a UE.
  Initialize $H_e$ to be empty; it will store the scheduled UEs' effective channels.
  while $\sum_{i=1}^{K} d(i) < l_{BS}$ do
    for $i \leftarrow 1$ to $K$ do
      $d' = d$.
      $H'_e = H_e$.
      if the RF chains of UE$_i$ are not used up then
        Let $d'(i) = d'(i) + 1$.
        Find the null space of the scheduled streams: $\Pi^{\perp}(H_e) = I_{l_{BS}} - H_e (H_e^{\dagger} H_e)^{-1} H_e^{\dagger}$.
        Do SVD for $H_i = \Pi^{\perp}(H_e) \tilde{F}_a^{\dagger} \tilde{H}_i^{\dagger}$. Let $W_{a_i}$ consist of the $l_{UE}$ strongest right singular vectors.
        Do SVD for $H_i W_{a_i}$ and let $\tilde{W}_{d_i}$ consist of the right singular vectors corresponding to the $d'(i)$ largest singular values.
        Add the effective channel of UE$_i$ to $H'_e$, i.e., $H'_e = [H'_e, H_i W_{a_i} \tilde{W}_{d_i}]$.
        Following equal power allocation among streams, calculate the sum rate of the UEs based on (4.19).
      end if
    end for
    Add the new stream to UE$_{i^{\star}}$, i.e., $d(i^{\star}) = d(i^{\star}) + 1$, where UE$_{i^{\star}}$ is the UE that enhances the rate most. If the sum rate is no larger than in the previous iteration, stop adding streams.
  end while

3. Repeat step 2) for $i$ ranging from 1 to $|\mathcal{T}|$ and find the steering vector $v_{i^{\star}}$ that enhances the rate most, i.e., $i^{\star} = \arg\max_i \tilde{R}_i$.

4. Let $F_a = [F_a, v_{i^{\star}}]$ and remove $v_{i^{\star}}$ from $\mathcal{T}$, i.e., $\mathcal{T} \leftarrow \mathcal{T} \setminus \{v_{i^{\star}}\}$.

5. Repeat steps 2) to 4) until the RF chains at the BS are used up.

The GABD algorithm implicitly assumes that the optimal $F_a$ is made up of MPC steering vectors, whose optimality has been demonstrated under the O2O channel model in Section 4.4.1. In fact, GABD adds each new stream on a channel orthogonal to the existing streams, so its outcome is exactly the same as that of GEPS under the O2O channel model. In conclusion, we propose a general GABD algorithm for situations where the steering matrices are not ideally semi-unitary.
For the ideal O2O channel, GABD can be implemented in a more efficient way, i.e., as GEPS.

4.4.3 Common Scatterer Effect: Toy Example

[Figure 4.1: Two UEs communicate with the BS via a LOS component and a common cluster. Labels: BS, 2 UEs, cluster, beams b1, b2, b3.]

Concerning the scenario where every UE is also equipped with many antennas, both [29] and [106] let each UE form beamformers pointing toward its strongest eigenmodes individually. In particular, [29] selects the strongest eigenmodes of the instantaneous channel, while [106] uses the dominant eigenmodes of the channel covariance. However, this strategy may lead to a significant reduction of the multiplexing gain when a common cluster occurs in the propagation environment.

Let us consider the following toy example. Fig. 4.1 shows two UEs with approximately the same angular power spectrum at the BS, including a LOS component and a common cluster. All DODs in the downlink can be resolved by three orthogonal beams, where $b_1$ serves the LOS component, while $b_2$ and $b_3$ serve the common cluster. Assuming $l_{UE} = 1$ and $N \gg 1$, each UE is able to resolve the DOAs of the LOS component and the common cluster. If we force both UEs to steer toward the LOS direction, the degree of freedom (DoF) of the network is reduced from 2 to 1, i.e., only one UE can be served. On the other hand, when using our proposed GEPS/GABD algorithm, once $b_1$ is occupied by either UE$_1$ or UE$_2$, the remaining UE will form a beam steering toward the common cluster, which yields the optimal analog beamformer design for the toy example in Fig. 4.1.

In the following toy example simulation for an ideal O2O channel, both link ends are equipped with ULAs, where $M = 256$, $N = 64$, $l_{BS} = 16$, and $l_{UE} = 1$. Assuming each of $K = 16$ UEs communicates with the BS through $P_i = 16$, $\forall i$, MPCs, we let the channel covariance at the BS be the same for all UEs.
Therefore, all UEs exhibit the same eigendirections from the perspective of the BS, which can be interpreted as all 16 scatterers being common to all UEs. Reflecting the asymptotic behavior of large arrays at both ends, we directly select columns of the DFT matrix to construct $[B_i]$ and $[A_i]$. Meanwhile, to maintain the same large-scale loss for all UEs, we simply let $L_i = 1$, $\forall i$, and set the sum of the normalized average path powers to 1, i.e., $\sum_{p=1}^{P_i} \sigma_{ip}^2 = 1$, $\forall i$. The MU-MIMO capacity of the broadcast channel with the analog precoder-combiner-projected channel can be achieved by sum-power iterative water-filling [128]. Resorting to Monte Carlo simulations, we evaluate the ergodic capacity with different beamforming strategies as shown in Fig. 4.2.

Extending the implementation of JSDM to the scenario with multiple antennas at the UEs, the beam division multiple access (BDMA) scheme [106] directly projects the channel onto the strongest eigenmode of the channel covariance at the UE, denoted by JSDM/BDMA in the legend, which leads to highly correlated MISO channels among UEs, similar to the dilemma described in Fig. 4.1 above. With increasing SNR, a tremendous reduction of the DoF can be observed compared with our proposed GEPS/GABD scheme.

[Figure 4.2: Ergodic capacity of analog precoder-combiner-projected channel. Axes: SNR [dB] vs. capacity [bit/s/Hz]. Curves: GABD, JSDM/BDMA, SEG-inst.]

On the other hand, with knowledge of the full instantaneous CSI, [29] selects the strongest transmit-receive eigenmode pair for all UEs, denoted by SEG-inst. in the legend. Assuming that $\sigma_{i1}^2$ is the largest average path power of UE$_i$, the actual channel gain $\sigma_{i1} g_{i1}$, which includes the small-scale fading, may not be the strongest among all paths of UE$_i$, which leads to the BS serving different UEs via various eigendirections rather than just one direction as in BDMA.
However, there is still a large chance that the projected instantaneous channels of different UEs end up originating from the same eigendirection at the BS, because of which SEG-inst. cannot realize the full spatial multiplexing of the system in the high SNR regime, as shown in Fig. 4.2.

In conclusion, following the uplink-downlink duality theory, the original downlink problem (4.4) is equivalent to a simplified uplink problem (4.8). To further reduce the complexity of treating the coupling effects between optimization layers with different time scales, we introduce a time-invariant input covariance and reduce problem (4.8) to (4.9), which has a single layer purely based on the long-term CSI. With an appropriate bounding technique and the validity of Lemma 1, we obtain a closed-form expression (4.18) that approximates the objective function of (4.9) well, which leads to problem (4.19). Under a particular O2O channel model, we develop the GEPS algorithm to obtain the globally optimal analog beamformers of problem (4.19). For generalized mm-wave channel models, a heuristic GABD scheme is proposed, whose performance will be demonstrated in Section 4.5.

4.5 Simulation Results

In this section, we evaluate the performance of our proposed analog beamforming algorithm, i.e., GABD, via simulations. Similar to Fig. 4.2, all simulation results compare the ergodic capacity with that of the state-of-the-art methods [29, 55, 106]. With the assumptions of a block fading channel and ideal effective CSI acquisition, we calculate the instantaneous downlink capacity according to (4.4) by using the iterative water-filling power allocation method [128], while the ergodic capacity is obtained through the ensemble average of the Monte Carlo realizations.

[Panels: (a) One-ring channel model; (b) Two-path channel model]
Figure 4.3: Simulation scenarios

In Section 4.5.1, we consider a one-ring cluster channel model, where the DODs of the MPCs concentrate around a dominant direction with a certain angular spread. Since UEs moving in the visibility region (VR) [129] of the same cluster tend to share common local scatterers [130], as Fig. 4.3a shows, we build the geometric stochastic propagation environment by fixing the set of randomly generated scatterers corresponding to each cluster. Then, each scatterer provides a single MPC that originates at the BS and ends at the UE.

To further investigate the impact of common scatterers on the performance of the proposed and existing beamforming methods, we generalize the 2-path channel model in Section VI-B, which is used in Fig. 4.6a and Fig. 4.6b. Specifically, besides independently generating a single MPC for each UE [29], we introduce another scatterer which generates a second MPC to all UEs, as shown in Fig. 4.3b.

4.5.1 One-ring Channel

Multiple UEs are dropped in three sectors with disjoint DOD support intervals, i.e., $[-60^\circ, -30^\circ]$, $[-15^\circ, 15^\circ]$, and $[30^\circ, 60^\circ]$, as Fig. 4.3a shows. The dashed ring covers the VR of the cluster, whose radius is kept at 8 m in our simulations. Scatterers are independently generated around the ring and shared by all UEs within the same VR. Subsuming the impact of the large-scale loss, including pathloss plus shadowing, into the average SNR, we generate the path powers $[\sigma_{ip}^2]$ following a uniform distribution and normalize them to satisfy $\sum_{p=1}^{P_i} \sigma_{ip}^2 = 1$. For every UE within the same VR, we assign power weights to the common scatterers in an ordered manner, which reasonably reflects the correlation of the propagation environment across UEs: significant common scatterers probably contribute dominant power weight to all UEs in the same VR. Unless otherwise specified, the detailed parameter settings are the ones given in Table 4.1.
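The path power generation described above can be sketched as follows. This is one possible reading, labeled as an assumption: "ordered manner" is interpreted here as sorting each UE's normalized weights in descending order, so that the same scatterer index receives the largest weight for every UE in the VR. The function name, the seed, and the number of UEs are hypothetical.

```python
import random


def path_powers(num_paths, rng):
    """Uniformly drawn path powers, normalized to sum to 1 and sorted descending.

    Sorting is an assumed interpretation of assigning weights to the common
    scatterers "in an ordered manner": scatterer 0 is the strongest for every UE.
    """
    raw = [rng.random() for _ in range(num_paths)]
    total = sum(raw)
    return sorted((x / total for x in raw), reverse=True)


rng = random.Random(2018)  # fixed seed only for reproducibility of the sketch
# Three hypothetical UEs in the same VR, each with 6 MPCs via the common scatterers.
powers_per_ue = {ue: path_powers(6, rng) for ue in range(3)}
for ue, powers in powers_per_ue.items():
    print(ue, [round(p, 3) for p in powers])
```

With this ordering the individual weights still differ from UE to UE, but the ranking of the scatterers is shared, which mimics the correlation of the propagation environment across UEs.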
Assuming both ends are equipped with ULAs, we generate the synthetic transfer channel according to (2.2). To evaluate the system performance on average, we simulate over 100 independent UE drops, while 100 independent fading realizations are generated per drop.

Table 4.1: Simulation parameters
  Number of UEs per group                      1 to 6
  Number of BS antennas and RF chains          M = 256, l_BS = 1 to 18
  Number of UE antennas and RF chains          N = 64, l_UE = 1 to 8
  Antenna spacing (in wavelengths)             D = 1/2
  Number of MPCs                               P_i = 6, for all i
  Number of UE drops                           100
  Number of fading realizations per UE drop    100

First, we investigate the ergodic capacity as a function of the SNR by fixing the number of UEs per group to 6, $l_{BS} = 18$, and $l_{UE} = 1$ in Fig. 4.4a. The "SU bound" is generated by assuming that all UEs fully collaborate with each other, so that the whole system is reduced to the single-UE MIMO scenario, whose capacity is well known and can be evaluated efficiently [16]. Without the constraint of the HDA structure, the SU bound is achieved by a fully digital system with full CSI at both ends. Comparing JSDM/BDMA [55, 106] and SEG-inst. [29] with the SU bound, we observe that the performance gap grows with increasing SNR, which reflects the reduction of the DoF caused by the biased analog combiner projection at the UEs, i.e., each UE selects its own strongest receive eigenmodes in a distributed manner based on the channel covariance (BDMA) or the instantaneous CSI (SEG-inst.). In contrast, exhibiting a constant performance gap from the SU bound, our proposed GABD scheme is able to fully exploit the DoF of the network, and outperforms not only JSDM/BDMA but also the instantaneous-CSI-based SEG-inst. method. Fig. 4.4a also includes GABD with a constant modulus constraint, namely GABD CM, where we normalize the amplitude of each entry of the proposed beamformers to be unity.
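The constant modulus normalization used for GABD CM can be sketched as below. This is a minimal illustration under stated assumptions: each beamformer entry keeps its phase and has its amplitude set to one, as described above; the function name and the example vector are hypothetical, and any overall power scaling applied afterwards in the thesis simulations is not modeled.

```python
import cmath


def constant_modulus(vector):
    """Keep the phase of every entry and set its amplitude to one."""
    return [cmath.exp(1j * cmath.phase(x)) for x in vector]


# Hypothetical unconstrained beamformer with unequal entry amplitudes.
beam = [0.3 + 0.4j, -2.0 + 0.0j, 0.01j]
beam_cm = constant_modulus(beam)
print([round(abs(x), 6) for x in beam_cm])
```

Such a vector can be realized by a phase-shifter network alone, which is why the comparison between GABD and GABD CM indicates the cost of that hardware constraint.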
With a negligible performance gap between GABD and GABD CM, our proposed scheme is robust to the additional hardware constraint imposed by the phase-shifter network that forms the analog beamformer.

Comparing Fig. 4.2 and Fig. 4.4a, the performance loss of JSDM/BDMA under a more realistic mm-wave channel model is not as significant as the one shown in the toy example. Since we only have a finite number of antenna elements at both ends, the steering matrices $[A_i]$ and $[B_i]$ are not ideally unitary, which results in the coupling of the strongest receive eigenmode with more than one transmit eigenmode. However, with larger antenna arrays at both ends, the steering matrices tend to be closer to unitary, generating even larger performance gaps between GABD and JSDM/BDMA.

[Figure 4.4: Ergodic capacity of analog precoder-combiner-projected channel under different settings of SNR and number of UEs per group. (a) Capacity vs. SNR; curves: SU bound, GABD, GABD_CM, JSDM/BDMA, SEG-inst. (b) Capacity vs. number of UEs per group, for SNR = 10 dB and SNR = -10 dB; curves: SU bound, GABD, JSDM/BDMA, SEG-inst. Vertical axes: capacity [bit/s/Hz].]

On the other hand, SEG-inst. outperforms our scheme in the low SNR regime in both plots, which is reasonable since in this regime the system benefits more from the beamforming gain provided by the knowledge of instantaneous CSI than from the multiplexing gain.

The simulation scenario shown in Fig. 4.3a naturally partitions UEs into different groups according to the DOD support range or, in other words, the eigenspace of the channel covariance. To investigate the impact of the number of UEs per group, we vary it from 1 to 6 while keeping the other simulation settings the same in Fig. 4.4b. Two sets of curves for the different schemes are plotted for SNR = -10 and 10 dB, respectively.
In the low SNR regime, none of the schemes benefits significantly from adding UEs because they act as noise-limited systems. With the knowledge of instantaneous CSI, SEG-inst. provides slightly better beamforming gain than GABD. However, when SNR = 10 dB, thanks to its ability to exploit DoFs, our proposed scheme outperforms both JSDM/BDMA and SEG-inst. as the number of UEs per group increases.

Finally, Fig. 4.5a and Fig. 4.5b depict the impact of the number of RF chains at the BS and the UEs, respectively. For SNR = -10 dB and 10 dB, we obtain two sets of curves for the different schemes.(1) Meanwhile, the SU bound is also shown as a benchmark.

[Figure 4.5: Ergodic capacity of analog precoder-combiner-projected channel vs. number of RF chains. (a) $l_{BS}$: capacity [bit/s/Hz] vs. number of BS RF chains. (b) $l_{UE}$: capacity [bit/s/Hz] vs. number of UE RF chains. Curves in both panels: SU bound, GABD, JSDM/BDMA, for SNR = 10 dB and SNR = -10 dB.]

In Fig. 4.5a, by fixing $l_{UE}$ to 1 and increasing $l_{BS}$ from 1 to 18, we observe that the gain of GABD over JSDM/BDMA is small when SNR = -10 dB. In fact, both of them saturate with increasing $l_{BS}$, since the system is noise limited. When SNR = 10 dB, we observe that once $l_{BS} > 8$, JSDM/BDMA starts to saturate, while our proposed scheme can still benefit from the increase of DoFs.

In Fig. 4.5b, we fix $l_{BS} = 18$ and let $l_{UE}$ range from 1 to 6. Since $\min(l_{BS}, K l_{UE}) = 18$ and the synthetic channel contains 18 independent DODs in total, the maximum DoF of the overall system is still 18. Therefore, the SU bound is independent of the number of UE RF chains. When $l_{UE} \ge \mathrm{rank}(K_{UE_i}) = 6$, $\forall i$, BDMA selecting the $l_{UE}$ strongest receive eigenmodes at the UE maintains the whole eigenspace of the channel covariance, which leads to the globally optimal covariance-based analog beamformers at both ends. In this case, our proposed scheme starts to align with JSDM/BDMA. However, for a practical system with a smaller number of RF chains at the UEs, say $l_{UE} = 1$ [29, 106], the ergodic capacity of GABD is 40% higher than that of JSDM/BDMA at SNR = 10 dB.

(1) SEG-inst. [29] requires the strict constraint that the number of BS RF chains equals the total number of UE RF chains, i.e., $l_{BS} = K l_{UE}$, which does not permit the flexible adjustment of RF chains considered here. Thus, we do not show its curve in Fig. 4.5a and Fig. 4.5b.

4.5.2 Two-Path Channel

To illustrate the impact of the common scatterer, we consider the scenario shown in Fig. 4.3b. Within the interval $[30^\circ, 60^\circ]$, besides an independently generated scatterer for each UE, we also have a shared common scatterer. With 6 BS RF chains, the BS serves 6 single-RF-chain UEs, while the other parameter settings are the same as before. In contrast to the random power weights for the MPCs in Section 4.5.1, we now vary the power weight of the common scatterer, $\sigma_c^2$, between 0 and 1.

[Figure 4.6: Ergodic capacity of analog precoder-combiner-projected channel under different settings of SNR and power weight of the common scatterer. (a) Capacity vs. SNR: $\sigma_c^2 = 0.4, 0.95$. (b) Capacity vs. $\sigma_c^2$: SNR = -10, 10 dB. Curves in both panels: SU bound, GABD, JSDM/BDMA, SEG-inst. Vertical axes: capacity [bit/s/Hz].]

First, Fig. 4.6a compares the performance of the different schemes under $\sigma_c^2 = 0.4$ and $0.95$. With $\sigma_c^2 = 0.4$, the power weight of the common scatterer is smaller than that of the unique scatterer of each UE. Consequently, selecting the strongest eigenmode of the receive covariance under the JSDM/BDMA scheme coincides with the optimal solution, and its curve aligns with that of our proposed scheme and is close to the SU bound. However, SEG-inst.
may let some UEs point toward the common scatterer, since the instantaneous power weight of the common scatterer may be the strongest one. The reduction of the DoF caused by the biased analog combining at the UE leads to performance degradation compared with JSDM/BDMA and our scheme. With $\sigma_c^2 = 0.95$, JSDM/BDMA will always form a beam pointing toward the common scatterer, which exhibits the strongest average power weight. Meanwhile, SEG-inst. also tends to select the common scatterer, since its instantaneous power will be the strongest one with very high probability. In this case, both schemes lead to a low-rank channel [16], which tremendously reduces the system performance because the other available DoFs are ignored. Nevertheless, the GABD scheme is capable of utilizing other MPCs for beamforming, and its performance is still close to the SU bound.

Then, Fig. 4.6b shows the ergodic capacity as a function of $\sigma_c^2$ for SNR = -10 and 10 dB. First consider the two extreme situations with $\sigma_c^2 = 0$ and $1$. When there is no power weight on the common scatterer, our channel reduces to the single-path model considered in [29]. The ergodic capacity of the different schemes coincides with the SU bound, since the channels of the UEs tend to be orthogonal to each other without the common scatterer effect. On the contrary, when $\sigma_c^2 = 1$, the multiuser channel reduces to a low-rank model where there exists only a single DOD from the perspective of the BS. Since the DoF of the system that we can exploit is just 1, all schemes, including the SU bound, end up being the same. However, when $\sigma_c^2$ ranges between 0 and 1, with the GABD scheme the system is able to jointly optimize the analog beamformers at both ends. Its ergodic capacity, close to the SU bound, is much more robust against the variation of the power weight of the common scatterer than that of the other schemes. When $\sigma_c^2 < 0.5$, the performance of JSDM/BDMA is the same as that of our proposed scheme, as explained in Fig.
4.6a for the case with $\sigma_c^2 = 0.4$. Once $\sigma_c^2$ surpasses $0.5$, JSDM/BDMA will let each UE form analog beamformers pointing toward the common scatterer, which significantly reduces the DoF. Meanwhile, SEG-inst. is more robust than JSDM/BDMA when $\sigma_c^2 > 0.5$, since the instantaneously strongest eigenbeam pair is not necessarily the one corresponding to the common scatterer, although the latter contributes the larger average power weight.

4.6 Conclusions

In this chapter, we studied the design of analog beamforming for the massive MIMO downlink of a multi-user mm-wave system with HDA structure at both link ends. Based on the high directionality of the mm-wave channel, an approximate upper bound of the ergodic sum capacity is developed. Then, a GABD algorithm is proposed to maximize this upper bound. Simulation results demonstrate the necessity of joint precoder/combiner design to combat the common scatterer effect. Compared with existing schemes, our algorithm shows better and more robust performance in a variety of scenarios.

Chapter 5

Joint Optimization of Analog Beamforming and Training Resource Allocation

In this chapter, we develop an optimization framework, namely user-centric virtual sectorization (UCVS), to explore the tradeoff among training overhead, beamforming gain, and spatial multiplexing gain. In UCVS, both the channel-statistics-based analog beamforming design and a non-orthogonal downlink training scheme are investigated to reduce the necessary cost of instantaneous channel acquisition. By maximizing an approximate net average throughput, we devise efficient algorithms to realize a suboptimal UCVS. With generic mm-wave channel models, we demonstrate by simulations that our proposed scheme outperforms state-of-the-art methods in various typical scenarios of mm-wave communications.
5.1 Introduction

Many beamformer optimizations for massive MIMO with hybrid digital/analog (HDA) structure assume the full acquisition of, and adaptation to, instantaneous channel state information (CSI). However, it is nontrivial to obtain the full CSI with extremely large arrays, especially for mm-wave channels. The main challenges lie in the following aspects: 1) the coherence time is shorter at high carrier frequencies because of the larger Doppler spread; 2) for a single channel use of training, the number of sample measurements is smaller than that of a conventional fully digital system due to the lack of RF chains. Achieving the same number of measurements as a fully digital system requires extending the training duration, which worsens the dilemma caused by 1).

Even in a system without hardware constraints, i.e., in a fully digital implementation, the short coherence time at mm-wave frequencies constitutes a problem for massive MIMO. Considering a large-array BS serving single-antenna user equipments (UEs), [3] suggests channel-reciprocity-based uplink training in a time-division-duplexing (TDD) mode to avoid the large overhead brought by downlink training in frequency-division-duplexing (FDD) mode. However, the large pathloss at mm-wave frequencies necessitates that both link ends be equipped with multiple antenna elements in order to exploit beamforming gains. If the total number of antenna elements from all UEs is then of the same order as that of the BS, the significant burden of uplink training at the antenna level will also make massive MIMO based on instantaneous CSI infeasible. Therefore, analog beamforming has to be used at both link ends during the training phase to reduce the effective channel dimension without full knowledge of the instantaneous CSI.
Two major research directions dealing with the above challenges have been investigated in the past few years: 1) compressive-sensing-based channel estimation plus analog beamforming optimization [27, 34], and 2) channel-statistics-based analog beamforming design [55]. In this chapter, we focus on the latter approach to design analog beamformers at both link ends based on second-order (covariance) channel statistics. Within the stationarity time of the channel statistics, which can be equivalent to tens or hundreds of coherence times [131], the covariance-based analog beamforming reduces the effective channel dimension to the number of RF chains. Consequently, typical training schemes and digital beamformers, e.g., zero-forcing, for the MU-MIMO system can be easily employed.

Joint spatial division multiplexing (JSDM) [55] designs the analog precoder at the BS as a function of the channel covariance matrices, which bears some formal resemblance to our investigations. However, its sector-specific design, which enforces orthogonality between different groups of UEs, will null out signals from common scatterers, and thus may sacrifice not only significant beamforming gain but also spatial multiplexing gain (see Section 5.2.1 for details). In this chapter, we intend to design channel-statistics-based analog beamformers from the perspective of user-centric beam clustering (UCBC): the BS forms a beam cluster for each individual UE, whereas the beam clusters of different UEs can overlap with each other. The overlapped part of the beam clusters indicates the set of beams pointing toward common scatterers to serve the corresponding UEs. Meanwhile, the allocation of training resources is also part of the optimization in our formulated problem.

The inherent sparsity of mm-wave channels can be exploited by directional beams at both link ends [132].
With appropriately designed analog beamformers, the effective spatial channels of the UEs tend to be semi-orthogonal to each other, which creates the potential for non-orthogonal beam training (NOBT). In [133], the tradeoff between training duration and achievable rate with an HDA structure at the BS side is investigated, but the conventional orthogonal training scheme is retained.

Our proposed user-centric virtual sectorization (UCVS) scheme exploits the UCBC to form exclusive or partially overlapped virtual sectors for different UEs, and the NOBT to save overall training overhead. Moreover, the downlink training periods of different UEs may end at different time slots in UCVS. Therefore, for a particular UE whose effective CSI is obtained by the BS before the completion of the training phase, we may launch the downlink data transmission to it. This simultaneous training-data transmission (STDT) phase is also considered in [134] for the uplink, where orthogonal training among UEs is assumed, and the interference between training signals and payload data is mitigated by successive interference cancellation based on the orthogonality between the independent and identically distributed (i.i.d.) UE channels. The mm-wave channel with highly directional characteristics is generally not i.i.d. [89]. With both NOBT and the potential STDT phase, we utilize the spatial orthogonality to suppress the interference between training signals and payload data from the propagation perspective of the downlink.

To the best of our knowledge, there is little work exploring the joint optimization of training resource allocation and channel-statistics-based analog beamformer design, and we aim to close this gap. The main contributions of this chapter are summarized below:

• We develop an optimization framework for the mm-wave massive MIMO downlink, where channel-statistics-based UCBC, NOBT, and the implied STDT phase are introduced to combat fast channel variation.
A UCVS scheme is realized by exploiting the highly directional and sparse characteristics of mm-wave channels.

• Given an analog beamforming design, we formulate the problem of optimizing the training resource allocation from a graph theory perspective. An algorithmic method is developed for an approximate solution of the training resource allocation.

• We account for the coupling effect of training resource allocation and analog beamformer optimization to jointly maximize the overall net average throughput. We devise efficient algorithms to realize user-centric beamformers. Employing generic mm-wave channel models, simulations demonstrate the advantages of the proposed scheme over the state-of-the-art scheme under various typical parameter settings.

The rest of this chapter is organized as follows. In Section 5.2, we first review the concept of JSDM, then elaborate on the essential idea of UCBC. Section 5.3 presents stepwise procedures of the UCBC scheme and summarizes the development of the problem formulation, based on which the algorithms are developed in Section 5.4. Simulation results are presented in Section 5.5 before drawing the conclusions in Section 5.6.

5.2 Overview of User-Centric Virtual Sectorization

The main objective of this chapter is to provide a user-centric optimization framework that incorporates the concern of training overhead reduction. In this section, we first give a recap of JSDM, which provides a sector-centric analog precoder design based on the channel statistics. Then, comparing JSDM and UCVS by means of toy examples, we elaborate on the usefulness of our proposed idea in typical scenarios of mm-wave communications and also explain its working mechanism conceptually.
5.2.1 Recap of JSDM

The JSDM-based framework can be interpreted as sector-centric beam clustering, where the BS individually forms covariance-based analog precoders to illuminate each "sector", while different UE groups tend to be semi-orthogonal to each other. Specifically, single-antenna UEs with similar channel covariance are grouped together, and inter-group interference is suppressed by an analog precoder based on the approximate block diagonalization method, which creates multiple "virtual sectors".

Figure 5.1: Toy example of a 3-UE channel: 1) both UE$_1$ and UE$_2$ have LOS propagation to the BS, all three UEs "see" a common cluster that couples them, and the normalized average power of the MPCs, i.e., $[\sigma_{ip}^2]$, is labeled next to the dashed lines; 2) generation of the beam pair bipartite graph from the beam measure table. (Beam measure table, rows: transmit beams, columns: UE$_1$, UE$_2$, UE$_3$; $\mathbf{b}_1$: 0.2, 0, 0; $\mathbf{b}_2$: 0, 0.2, 0; $\mathbf{b}_3$: 0.8, 0.8, 1.)

Treating each RF chain at a BS with $M$ antenna elements as an individual "BS", we can view JSDM as a coordinated multi-point (CoMP) transmission scheme [107] under particular constraints: an exclusive set of "BSs" serves its corresponding UE group in joint transmission (JT) mode; meanwhile, it also needs to work in coordinated beamforming (CB) mode with other groups, suppressing the leakage interference. However, the enforced constraint may lead to a solution that is far from net sum rate maximization. For example, Fig. 5.1 exhibits a 2-path channel model of three UEs, where both UE$_1$ and UE$_2$ have line-of-sight (LOS) propagation to the BS. Additionally, all UEs share a common cluster. Assume that three transmit beams illuminate all MPCs of this network: if we place UE$_1$ and UE$_2$ into separate groups, the BS has to null out $\mathbf{b}_3$ following the orthogonality principle of JSDM across different groups.
Although parallel training can be implemented to serve two UEs simultaneously (the channels from $\mathbf{b}_1$ to UE$_1$ and from $\mathbf{b}_2$ to UE$_2$ tend to be quasi-optical and are thus orthogonal to each other), we not only lose significant beamforming gain, since the average power from $\mathbf{b}_3$ to UE$_1$ and UE$_2$ is 0.8, but also lose one degree of freedom (DoF) by generating a poor effective channel condition for UE$_3$, which lies at the sector edge between groups.

5.2.2 Basic Idea of User-Centric Virtual Sectorization

Maximizing the net sum rate of the UEs necessitates the joint consideration of training costs, beamforming gains, and overall spatial multiplexing gains. We generalize the JSDM-like sector-centric beam clustering to a UE-centric one, where the BS forms a cluster of transmit beams for each scheduled UE individually. Unlike the constraint of JSDM that a common set of beams is assigned to UEs within the same group, while UEs in different groups exhibit exclusive beam clusters, we allow partially overlapped beam clusters among UEs.

Define the UE-specific analog precoder as $\mathbf{B}_i \in \mathbb{C}^{M \times l_i}, \forall i$, where $l_i$ is the number of BS RF chains used to serve UE$_i$. In the toy example exhibited in Fig. 5.1, two interesting scenarios of UE grouping can be developed following the principle of JSDM [55]: 1) separate UE$_1$ and UE$_2$ into two groups, so that $\mathbf{B}_1 = \mathbf{b}_1$, $\mathbf{B}_2 = \mathbf{b}_2$, and $\mathbf{B}_3 \approx \mathbf{0}$; 2) group all three UEs together, and let $\mathbf{B}_1 = \mathbf{B}_2 = \mathbf{B}_3 = [\mathbf{b}_1, \mathbf{b}_2, \mathbf{b}_3]$. Note that in scenario 1), since UE$_3$ lies at the sector edge as mentioned before, its analog precoder is approximately zero. Comparing both scenarios, in scenario 1) we can simultaneously serve UE$_1$ and UE$_2$ with one pilot dimension for parallel training at the expense of the beamforming gain from the common cluster, while in scenario 2) all three UEs can be scheduled at the cost of three pilot dimensions for orthogonal beam training.
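However, there is no explicit conclusion in [25, 55, 89-91, 106] as to which scenario, i.e., which UE grouping, is optimal for maximizing the net sum rate. Meanwhile, there is another scenario that is not covered by JSDM, say scenario 3), where we have $\mathbf{B}_1 = [\mathbf{b}_1, \mathbf{b}_3]$, $\mathbf{B}_2 = [\mathbf{b}_2, \mathbf{b}_3]$, and $\mathbf{B}_3 = \mathbf{b}_3$. Although the overall analog precoder $\mathbf{F}_a$ remains the same for both scenarios 2) and 3), orthogonal beam training is not necessary for scenario 3). Since the channels from $\mathbf{b}_1$ to UE$_1$ and from $\mathbf{b}_2$ to UE$_2$ are approximately orthogonal to each other, we can assign the same pilot dimension to $\mathbf{b}_1$ and $\mathbf{b}_2$ without causing pilot contamination [3]. Therefore, we can use only two pilot dimensions to complete the training of three beams by utilizing the spatial orthogonality between effective channels. Before proceeding to the specific problem formulations in Section 5.3, we explain the core concepts that are introduced by our scheme.

Beam pair bipartite graph

With the assumption of a DFT-codebook-based design, the optimization of the analog precoder at the BS becomes a selection problem, which falls into the realm of integer programming. Given the UE-side channel covariance $\mathbf{K}_{\mathrm{UE},i}$, we can write its eigendecomposition as $\mathbf{K}_{\mathrm{UE},i} = \mathbf{E}_{\mathrm{UE},i} \boldsymbol{\Lambda}_{\mathrm{UE},i} \mathbf{E}_{\mathrm{UE},i}^{\dagger}$, where $\mathbf{E}_{\mathrm{UE},i} = [\mathbf{r}_{i1}, \mathbf{r}_{i2}, \ldots, \mathbf{r}_{ir_i}]$ is a semi-unitary matrix with rank $r_i \le \min(N, P_i)$, $\mathbf{r}_{ij}$ denotes the $j$-th receive eigenmode of $\mathbf{E}_{\mathrm{UE},i}$, and $\boldsymbol{\Lambda}_{\mathrm{UE},i}$ carries the eigenvalues of $\mathbf{K}_{\mathrm{UE},i}$ on its diagonal. Therefore, we can build up a measure matrix between DFT beam tones and receive eigenmodes as follows:

$$\mathbf{S}\Big(m,\; j + \sum_{k=1}^{i-1} r_k\Big) = \mathbf{b}_m^{\dagger} \tilde{\mathbf{K}}_{\mathrm{BS},i,j} \mathbf{b}_m, \tag{5.1}$$

$$\tilde{\mathbf{K}}_{\mathrm{BS},i,j} \triangleq \mathbb{E}\big[\mathbf{H}_i^{\dagger} \mathbf{r}_{ij} \mathbf{r}_{ij}^{\dagger} \mathbf{H}_i\big] = \frac{1}{L_i} \mathbf{A}_{\mathrm{BS},i} \boldsymbol{\Gamma}_i \,\mathrm{diag}\big(\mathbf{A}_{\mathrm{UE},i}^{\dagger} \mathbf{r}_{ij} \mathbf{r}_{ij}^{\dagger} \mathbf{A}_{\mathrm{UE},i}\big)\, \boldsymbol{\Gamma}_i \mathbf{A}_{\mathrm{BS},i}^{\dagger},$$

where $\mathbf{b}_m$ denotes the $m$-th column of the $M \times M$ normalized DFT matrix $\boldsymbol{\Omega}_M$, and $\tilde{\mathbf{K}}_{\mathrm{BS},i,j}$ represents the BS-side channel covariance of UE$_i$ projected by $\mathbf{r}_{ij}$. $\mathbf{S} \in \mathbb{R}_{\ge 0}^{M \times \sum_{i=1}^{K} r_i}$ is the measure matrix between the $M$ DFT beams and the receive eigenmodes of all UEs.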
The entry indexed by $(m, j + \sum_{k=1}^{i-1} r_k)$ denotes the average channel gain between the $m$-th DFT beam tone and the $j$-th receive eigenmode of UE$_i$, where $j$ ranges from 1 to $r_i$. For the toy example exhibited in Fig. 5.1, we simply let $L_i = 1, \forall i$, and $N = 1$, while the steering matrices consist of normalized DFT columns with $\mathbf{A}_{\mathrm{BS},1} = [\mathbf{b}_1, \mathbf{b}_3]$, $\mathbf{A}_{\mathrm{BS},2} = [\mathbf{b}_2, \mathbf{b}_3]$, and $\mathbf{A}_{\mathrm{BS},3} = \mathbf{b}_3$. Substituting the above parameter set into (5.1) generates the beam measure table in Fig. 5.1, where we only exhibit the measure table for the effective transmit beams: $\mathbf{b}_1$, $\mathbf{b}_2$, and $\mathbf{b}_3$. Equipped with a single antenna element, the UEs in the toy example receive omnidirectional signals. Therefore, we only have one receive eigenmode for each UE. To build the beam pair bipartite graph, we place the nodes of transmit beams and UEs on the left and right side, respectively. If the entry between a beam and a UE is non-zero, we connect the two nodes by a weighted edge.

If UEs with multiple antenna elements are able to resolve different MPCs, UE$_i$ will have $P_i$ receive eigenmodes, $\forall i$, which leads to the beam pair bipartite graph exhibited in Fig. 5.2. To display a toy example, we simply let $[\mathbf{A}_{\mathrm{UE},i}]$ also consist of normalized DFT columns, which then become the receive eigenmodes. With directional beams at both ends, we can observe that the beam measure table becomes even sparser, based on which non-orthogonal beam training can be utilized to reduce the overhead cost. On the other hand, the schemes in [29, 106] design the analog combiner of each UE by individually selecting its strongest eigenmode, which may be far from maximizing the net sum rate in a mm-wave channel. For example, for the beam pair bipartite graph shown in Fig.
5.2, if we let all three UEs point toward the common cluster, which exhibits the largest weights for all of them, the DoF of the analog-combiner-projected MU-MIMO channel will be only one. Therefore, we will also investigate the joint optimization of analog combiners based on the beam pair bipartite graph.

Figure 5.2: Generation of the beam pair bipartite graph when there are multiple receive eigenmodes. Both UE$_1$ and UE$_2$ exhibit two receive eigenmodes, while UE$_3$ has only one, pointing to the common cluster. (Beam measure table, rows: transmit beams, columns: receive eigenmodes $\mathbf{r}_{11}$, $\mathbf{r}_{12}$, $\mathbf{r}_{21}$, $\mathbf{r}_{22}$, $\mathbf{r}_{31}$; $\mathbf{b}_1$: 0.2, 0, 0, 0, 0; $\mathbf{b}_2$: 0, 0, 0.2, 0, 0; $\mathbf{b}_3$: 0, 0.8, 0, 0.8, 1.)

In reality, there will be no entry with an exact zero value in the beam measure table, which implies a fully connected beam pair bipartite graph. However, after appropriate thresholding, we strike out weak edges with weight below a threshold, say the noise floor, so that we obtain an effective bipartite graph as in Fig. 5.1 and Fig. 5.2. The threshold parameter plays an important role in the beam clustering, which will be elaborated in Section 5.4. On the one hand, striking out weak beam pairs generates a sparser beam measure table, which needs fewer pilot dimensions for training. On the other hand, the effective bipartite graph should maintain the dominant directional characteristics of the multi-user channel, or we will suffer severe pilot contamination and inter-user interference (see below).

Non-orthogonal beam training (NOBT)

Given the analog beamformers at both ends, the beam measure table $\mathbf{S}$ with dimension $M \times \sum_{i=1}^{K} r_i$ is reduced to an effective one projected by the analog beamformers, denoted by $\bar{\mathbf{S}}$, which has dimension $l_{\mathrm{BS}} \times K$. Based on $\bar{\mathbf{S}}$, we can develop the beam cluster of an individual UE, containing all transmit beams connected to it. For example, let us revisit the toy example exhibited in Fig. 5.1.
Considering a system with $l_{\mathrm{BS}} = 3$ and $N = l_{\mathrm{UE}} = 1$, we assume that the optimized analog precoder $\mathbf{F}_a$ is $[\mathbf{b}_1, \mathbf{b}_2, \mathbf{b}_3]$. Therefore, the beam measure table in Fig. 5.1 is equivalent to its reduced version $\bar{\mathbf{S}}$. The analog precoders (beam clusters) of the UEs are $\mathbf{B}_1 = [\mathbf{b}_1, \mathbf{b}_3]$, $\mathbf{B}_2 = [\mathbf{b}_2, \mathbf{b}_3]$, and $\mathbf{B}_3 = \mathbf{b}_3$.

The training overhead cost depends on the minimum number of necessary orthogonal pilot dimensions. Define the set of UEs whose beam cluster contains the $i$-th transmit beam as $\mathcal{K}_i$, i.e., $\mathcal{K}_i = \{k \mid \mathbf{b}_i \in \mathbf{B}_k\}$. If $\mathcal{K}_i \cap \mathcal{K}_j \neq \emptyset$ for $i \neq j$, we cannot schedule $\mathbf{b}_i$ and $\mathbf{b}_j$ for training on the same pilot dimension, since any UE lying in the intersection set will encounter severe pilot contamination. However, consider a set of beams $\mathcal{T}$ such that their served UE sets do not overlap, i.e., $\mathcal{T} = \{i \mid \mathcal{K}_i \cap \mathcal{K}_j = \emptyset, \forall j \in \mathcal{T} \setminus \{i\}\}$: in that case, we can train them simultaneously (see footnote 1). For example, in Fig. 5.1, we have $\mathcal{K}_1 = \{1\}$, $\mathcal{K}_2 = \{2\}$, and $\mathcal{K}_3 = \{1, 2, 3\}$. The BS cannot train $\mathbf{b}_1$ and $\mathbf{b}_3$ (or $\mathbf{b}_2$ and $\mathbf{b}_3$) simultaneously, since $\mathcal{K}_1 \cap \mathcal{K}_3 = \{1\}$ ($\mathcal{K}_2 \cap \mathcal{K}_3 = \{2\}$). On the other hand, $\mathbf{b}_1$ and $\mathbf{b}_2$ can be placed on the same pilot dimension, since $\mathcal{K}_1 \cap \mathcal{K}_2 = \emptyset$. The total number of orthogonal resource elements occupied by training can thus be reduced to 2 for the toy example, while JSDM as suggested by [55, 89] will perform orthogonal training across $[\mathbf{b}_i]_{i=1}^{3}$, treated as intra-group transmit beams serving all three UEs. Detailed developments on the minimization of the training cost can be found in Section 5.4.1.

Simultaneous training-data transmission (STDT)

Conventional cellular systems start the data transmission phase after the completion of the training phase. In this chapter, however, we propose a novel training scheme where the BS can "partially" launch the data transmission during the training window. We illustrate its mechanism conceptually by the toy example exhibited in Fig. 5.3. To better clarify the STDT phase, we define the following sets, which will be used in the remainder of the chapter.
$\mathcal{K}_{cc,t}$ is the set of UEs that have completed beam training at time slot $t$. $\mathcal{K}_{tr,t}$ denotes the set of UEs awaiting the training signal at time slot $t$, and $\mathcal{K}_{dd,t}$ is the set of UEs receiving a data signal at time slot $t$. $\mathcal{B}_t = \{i \mid \mathbf{b}_i \in \cup_{k \in \mathcal{K}_{cc,t}} \mathbf{B}_k,\; \mathbf{b}_i \notin \cup_{k \in \bar{\mathcal{K}}_{cc,t}} \mathbf{B}_k\}$, where $\bar{\mathcal{K}}_{cc,t}$ is the complement of $\mathcal{K}_{cc,t}$, indicates the set of beams that are ready for data transmission at time slot $t$, while $\mathcal{T}_{tr,t}$ is the set of beams trained at time slot $t$.

Footnote 1: For a TDD system, a similar argument can be developed to utilize the directional characteristics of mm-wave channels for uplink training. Then, we need to investigate the set of UEs that can be trained together, whose sets of receive beams at the BS shall be orthogonal to each other.

Figure 5.3: Comparison of the training phases of JSDM and UCVS, where (a) is an example of a reduced beam pair bipartite graph, (b) reflects the training process of JSDM, while (c) and (d) represent the training periods of UCVS with different training orders. $\tau$ is the duration of the overall training window ($\tau = 4$ for (b), $\tau = 3$ for (c) and (d)).

With the reduced beam pair bipartite graph exhibited in Fig. 5.3a, JSDM will place the UEs in the same group with the common analog precoder, i.e., $\mathbf{F}_a = [\mathbf{b}_1, \ldots, \mathbf{b}_4]$, and orthogonal beam training is implemented as Fig. 5.3b shows. However, with the partially overlapped beam clusters in UCVS shown in Fig. 5.3c and Fig. 5.3d, the UE-specific analog precoders are $\mathbf{B}_1 = [\mathbf{b}_1, \mathbf{b}_2]$ and $\mathbf{B}_2 = [\mathbf{b}_2, \mathbf{b}_3, \mathbf{b}_4]$. Since $\mathcal{K}_1 \cap \mathcal{K}_3 = \emptyset$, $\mathbf{b}_1$ and $\mathbf{b}_3$ can be trained simultaneously, and we only need 3 orthogonal time slots to complete the training of 4 beams. For Fig. 5.3c, based on the association between transmit beams and UEs in Fig.
5.3, $\mathcal{K}_{tr,1} = \{1, 2\}$, $\mathcal{K}_{tr,2} = \{2\}$, and $\mathcal{K}_{tr,3} = \{1, 2\}$, while $\mathcal{T}_{tr,1} = \{2\}$, $\mathcal{T}_{tr,2} = \{4\}$, and $\mathcal{T}_{tr,3} = \{1, 3\}$. $\mathcal{K}_{cc,t} = \emptyset, \forall t \le 3$, and $\mathcal{K}_{cc,4} = \{1, 2\}$, indicating that both UEs complete beam training only after the whole training window. Therefore, $\mathcal{B}_t = \emptyset, \forall t \le 3$. However, for Fig. 5.3d, where we swap the order of training $\mathbf{b}_1$, $\mathbf{b}_3$, and $\mathbf{b}_4$, an interesting observation is that $\mathcal{K}_{cc,3} = \{1\}$ and $\mathcal{B}_3 = \{1\}$, which means that $\mathbf{b}_1$ can be used for payload transmission at time slot 3 to serve UE$_1$. Although $\mathbf{b}_2$ and $\mathbf{b}_3$ are also trained before time slot 3, scheduling them for data transmission would leak interference to the training signal of $\mathbf{b}_4$ at UE$_2$. We will optimize the training order of beams in Section 5.4.1.

In summary, the NOBT phase exploits the directional characteristics to reduce the training cost, while the implied STDT phase utilizes additional DoFs in the training phase for data transmission. The individual gains from NOBT and STDT depend on the topology of the beam pair bipartite graph. For example, if the channel subspaces of different UEs are orthogonal to each other and have the same dimension, parallel training can be implemented across different transmit beams. Although there is no STDT phase in this case, the training cost is tremendously reduced by the NOBT phase. In Section 5.5, we investigate the individual contributions from NOBT and STDT through simulations with random topologies of the beam pair bipartite graph.

5.3 Problem Formulation

5.3.1 Training with STDT Phase

Instantaneous channel estimation

To enable STDT, the UEs need to feed back the instantaneous estimated effective channel to the BS at time slot $t$, $\forall t$. Then, the BS can extract the available beams to form $\mathcal{B}_{t+1}$ for data transmission at time slot $t + 1$.
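The bookkeeping behind $\mathcal{K}_{cc,t}$ and $\mathcal{B}_t$ can be sketched in a few lines of Python. This is only an illustrative sketch based on the set definitions of Section 5.2.2; the function name `stdt_sets` and the dictionary/set data layout are assumptions made for this example. It replays the training order of Fig. 5.3c and one order consistent with the description of Fig. 5.3d.

```python
def stdt_sets(clusters, schedule):
    """Return [(K_cc_t, B_t)] for t = 1, ..., len(schedule) + 1.

    clusters: dict mapping UE index k -> set of beam indices in B_k
    schedule: list of sets; schedule[t - 1] holds the beams trained in slot t
    (Both containers are hypothetical helpers, not notation from the chapter.)
    """
    trained = set()
    result = []
    for t in range(len(schedule) + 1):
        # UEs whose whole beam cluster was trained before the current slot
        completed = {k for k, beams in clusters.items() if beams <= trained}
        # Beams belonging to completed UEs ...
        ready = set().union(*(clusters[k] for k in completed))
        # ... minus beams that still belong to a UE awaiting training
        blocked = set().union(
            *(beams for k, beams in clusters.items() if k not in completed)
        )
        result.append((completed, ready - blocked))
        if t < len(schedule):
            trained |= schedule[t]
    return result


# Toy example of Fig. 5.3: B_1 = [b_1, b_2], B_2 = [b_2, b_3, b_4]
clusters = {1: {1, 2}, 2: {2, 3, 4}}
order_c = [{2}, {4}, {1, 3}]   # training order of Fig. 5.3c
order_d = [{2}, {1, 3}, {4}]   # assumed order for Fig. 5.3d
sets_c = stdt_sets(clusters, order_c)
sets_d = stdt_sets(clusters, order_d)
print(sets_c[2])  # (K_cc,3, B_3) for Fig. 5.3c
print(sets_d[2])  # (K_cc,3, B_3) for Fig. 5.3d
```

With the second order, UE$_1$ is completed before slot 3 and only $\mathbf{b}_1$ is released for data, since $\mathbf{b}_2$ is shared with UE$_2$, which is still being trained.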
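The received training signal at UE$_i$ at time slot $t$ can be expressed as

$$\begin{aligned}
\tilde{x}_{tr,i,t} &= \sqrt{\rho_{p,t}}\, \mathbf{w}_{ai}^{\dagger} \mathbf{H}_i \mathbf{F}_a \mathbf{p}_{tr,t} + \mathbf{w}_{ai}^{\dagger} \mathbf{H}_i \mathbf{F}_a \mathbf{x}_{d,t} + \mathbf{w}_{ai}^{\dagger} \mathbf{n}_{i,t} \\
&= \sqrt{\rho_{p,t}}\, \bar{\mathbf{h}}_i^{\dagger} \mathbf{p}_{tr,t} + \bar{\mathbf{h}}_i^{\dagger} \mathbf{x}_{d,t} + \bar{n}_{i,t} \\
&= \underbrace{\sqrt{\rho_{p,t}}\, \bar{h}_{i,G(i,t)}}_{\text{Desired training signal}} + \underbrace{\sum_{j \in \mathcal{T}_{tr,t} \setminus \{G(i,t)\}} \sqrt{\rho_{p,t}}\, \bar{h}_{i,j}}_{\text{Training contamination}} + \underbrace{\bar{\mathbf{h}}_{i,dd,t}^{\dagger} \mathbf{F}_{d,t} \mathbf{x}_t}_{\text{Payload interference}} + \underbrace{\bar{n}_{i,t}}_{\text{Noise}},
\end{aligned} \tag{5.2}$$

where $\rho_{p,t}$ denotes the power used for training each beam in time slot $t$, $\forall t$. $\mathbf{p}_{tr,t}$ is an $l_{\mathrm{BS}} \times 1$ indicator vector that denotes whether a transmit beam is scheduled for training at time slot $t$, e.g., if $\mathbf{p}_{tr,t}(i) = 1$, the $i$-th transmit beam is trained at time slot $t$. $\mathbf{n}_{i,t} \in \mathbb{C}^{N}$ is the i.i.d. complex Gaussian noise vector at UE$_i$, whose entries follow $\mathcal{CN}(0, \sigma^2)$. The second term in the first line of (5.2) denotes the interference caused by data transmission, where $\mathbf{x}_{d,t} \in \mathbb{C}^{l_{\mathrm{BS}} \times 1}$ is the data symbol vector at time slot $t$. Since only part of the beams may be scheduled for data transmission at a given instant, $\mathbf{x}_{d,t}$ only has a few (or no) non-zero entries, which correspond to the beams in $\mathcal{B}_t$, $\forall t$. For UE$_i$ belonging to $\mathcal{K}_{tr,t}$, we define the effective channel from the $j$-th transmit beam as $\bar{h}_{i,j}$, and $G(i,t)$ denotes the index of the training beam associated with UE$_i$ at time slot $t$. The pilot suffers interference from two components: one from the pilot signals of other beams (training contamination) and the other from beams scheduled for data transmission (payload interference). In (5.2), $\bar{\mathbf{h}}_{i,dd,t} \in \mathbb{C}^{|\mathcal{B}_t| \times 1}$ denotes the effective channel from the beams transmitting data symbols at time slot $t$. $\mathbf{F}_{d,t} \in \mathbb{C}^{|\mathcal{B}_t| \times |\mathcal{K}_{dd,t}|}$ denotes the digital precoder at time slot $t$ for payload transmission to the $|\mathcal{K}_{dd,t}|$ UEs. $\mathbf{x}_t \in \mathbb{C}^{|\mathcal{K}_{dd,t}| \times 1}$ denotes the data symbol vector, following $\mathcal{CN}(\mathbf{0}, \mathbf{I}_{|\mathcal{K}_{dd,t}|})$. From (5.2), we can estimate the effective channel $\bar{h}_{i,G(i,t)}$ by using existing channel estimation methods.

Partial data transmission

During the training window, we may launch the partial data transmission as Fig. 5.3d exhibits.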
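Suppose UE$_k$ is able to receive a data symbol at time slot $t$, where $t \le \tau$. The received signal model at UE$_k$ is

$$\begin{aligned}
\hat{x}_{d,k,t} &= \bar{\mathbf{h}}_{k,dd,t}^{\dagger} \mathbf{F}_{d,t} \mathbf{x}_t + \sqrt{\rho_{p,t}} \sum_{j \in \mathcal{T}_{tr,t}} \bar{h}_{k,j} + \bar{n}_{k,t} \\
&= \underbrace{\bar{\mathbf{h}}_{k,dd,t}^{\dagger} \mathbf{f}_{d,t,k} x_{t,k}}_{\text{Desired signal}} + \underbrace{\bar{\mathbf{h}}_{k,dd,t}^{\dagger} \sum_{i \in \mathcal{K}_{dd,t} \setminus k} \mathbf{f}_{d,t,i} x_{t,i}}_{\text{Inter-user interference}} + \underbrace{\sqrt{\rho_{p,t}} \sum_{j \in \mathcal{T}_{tr,t}} \bar{h}_{k,j}}_{\text{Training interference}} + \underbrace{\bar{n}_{k,t}}_{\text{Noise}},
\end{aligned} \tag{5.3}$$

where $\mathbf{F}_{d,t}$ consists of the individual digital precoders serving the UEs belonging to $\mathcal{K}_{dd,t}$, i.e., $\mathbf{F}_{d,t} = [\mathbf{f}_{d,t,k}]_{k \in \mathcal{K}_{dd,t}}$, and $x_{t,i}$ is the data symbol transmitted to UE$_i$ at time slot $t$. Similarly to (5.2), there exist two kinds of interference: the conventional inter-user interference and the interference from the simultaneously transmitted training signals.

5.3.2 Dedicated Data Transmission

After the period of downlink training, the BS can utilize all analog beams for data transmission, and the received signal model at UE$_k$ can be expressed as

$$\hat{x}_{d,k} = \mathbf{w}_{ak}^{\dagger} \mathbf{H}_k \mathbf{F}_a \mathbf{f}_{d,k} x_k + \mathbf{w}_{ak}^{\dagger} \mathbf{H}_k \mathbf{F}_a \sum_{i \neq k} \mathbf{f}_{d,i} x_i + \mathbf{w}_{ak}^{\dagger} \mathbf{n}_k = \bar{\mathbf{h}}_k^{\dagger} \mathbf{f}_{d,k} x_k + \bar{\mathbf{h}}_k^{\dagger} \sum_{i \neq k} \mathbf{f}_{d,i} x_i + \bar{n}_k, \tag{5.4}$$

where we drop the subscript $t$ since the received signal model remains the same after the training window.

5.3.3 Beamformer Optimization

Given the analog beamforming, the achievable rate of UE$_{\pi(i)}$ at time slot $t$ when using the dirty paper coding (DPC) scheme in digital baseband is given by [120]

$$C_{\pi(i),t} = \log \frac{\sigma^2 \mathbf{w}_{a\pi(i)}^{\dagger} \mathbf{w}_{a\pi(i)} + \bar{\mathbf{h}}_{\pi(i),dd,t}^{\dagger} \big(\sum_{j \ge i} \boldsymbol{\Sigma}_{\pi(j),t}\big) \bar{\mathbf{h}}_{\pi(i),dd,t}}{\sigma^2 \mathbf{w}_{a\pi(i)}^{\dagger} \mathbf{w}_{a\pi(i)} + \bar{\mathbf{h}}_{\pi(i),dd,t}^{\dagger} \big(\sum_{j > i} \boldsymbol{\Sigma}_{\pi(j),t}\big) \bar{\mathbf{h}}_{\pi(i),dd,t}}, \tag{5.5}$$

where $\pi(i) \in \mathcal{K}_{dd,t}$, $[\pi(i)]_{i=1}^{|\mathcal{K}_{dd,t}|}$ is the ordered index set of UEs in DPC, and $\boldsymbol{\Sigma}_{\pi(i),t}$ is the input covariance of UE$_{\pi(i)}$ at time slot $t$. Therefore, the net average MU-MIMO downlink capacity within the coherence block is

$$C_{\mathrm{avg,DL}} = \frac{\sum_{t=1}^{T_{\mathrm{cor}}} \sum_{\pi(i) \in \mathcal{K}_{dd,t}} C_{\pi(i),t}}{T_{\mathrm{cor}}}, \tag{5.6}$$

where $T_{\mathrm{cor}}$ is the coherence time in units of channel uses.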
If we do not consider the data transmission during the training window, $C_{\mathrm{avg,DL}}$ becomes $(1 - \frac{\tau}{T_{\mathrm{cor}}}) \sum_{i=1}^{K} C_{\pi(i)}$, where $C_{\pi(i)}$, independent of $t$, remains the same within the data transmission phase of a coherence block. Considering the whole stationarity region of the channel statistics, we intend to jointly optimize the analog beamformers and the pilot assignment matrix $\mathbf{P}_{tr}$, which leads to the maximization of the net average downlink capacity:

$$\max_{[\mathbf{B}_k, \mathbf{w}_{ak}]_{k=1}^{K},\, [\rho_{p,t}]_{t=1}^{\tau},\, \mathbf{P}_{tr}} \; \mathbb{E}\Big[ \max_{[\boldsymbol{\Sigma}_{\pi(i),t},\, \pi(i) \in \mathcal{K}_{dd,t}]_{t=1}^{T_{\mathrm{cor}}}} C_{\mathrm{avg,DL}} \Big] \tag{5.7a}$$

$$\text{s.t.} \quad \mathbf{B}_k \subseteq \boldsymbol{\Omega}_M, \forall k, \quad l_{\mathrm{use}} = \mathrm{rank}\big([\mathbf{B}_k]_{k=1}^{K}\big) \le l_{\mathrm{BS}}; \tag{5.7b}$$

$$\mathbf{P}_{tr} \in \mathbb{N}^{l_{\mathrm{use}} \times \tau}, \quad \mathbf{P}_{tr}(i,j) = 1 \text{ or } 0, \forall i, j, \quad \sum_{j=1}^{\tau} \mathbf{P}_{tr}(i,j) = 1, \forall i; \tag{5.7c}$$

$$\rho_{p,t} |\mathcal{T}_{tr,t}| + \sum_{\pi(i) \in \mathcal{K}_{dd,t}} \mathrm{tr}\big(\mathbf{F}_a \boldsymbol{\Sigma}_{\pi(i),t} \mathbf{F}_a^{\dagger}\big) \le \rho_d, \forall t, \tag{5.7d}$$

where the expectation of $C_{\mathrm{avg,DL}}$ is taken to average out the small-scale fading, i.e., $[\mathbf{G}_i]$ in (2.2), across multiple coherence blocks within the stationarity time of the channel statistics. Note that the CSI feedback can be realized by the dedicated uplink channel right after the training. Since we focus on the performance of the downlink, we assume ideal instantaneous channel acquisition from the uplink feedback channel, and do not incorporate the feedback cost in problem (5.7), an assumption that is widely used in the literature [55, 89, 106]. Constraint (5.7b) indicates that an individual beam cluster consists of columns of the normalized DFT matrix $\boldsymbol{\Omega}_M$ and that the total number of used transmit beams, i.e., $l_{\mathrm{use}}$, shall not surpass $l_{\mathrm{BS}}$. The analog combiners at the UEs, $[\mathbf{w}_{ai}]$, are functions of the UE-side channel covariance matrices. $\mathbf{P}_{tr}$ denotes the pilot assignment matrix, where each row has a single non-zero entry to indicate the pilot assigned to the beam. In (5.7d), the total transmit power is constrained by $\rho_d$, $\mathrm{tr}(\mathbf{F}_a \boldsymbol{\Sigma}_{\pi(i),t} \mathbf{F}_a^{\dagger})$ is the power for data transmission to UE$_{\pi(i)}$ at time slot $t$, and $\rho_{p,t} |\mathcal{T}_{tr,t}|$ is the total power used for downlink training at time slot $t$.
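To make the averaging in (5.6) and its reduction to the pre-log factor $(1 - \tau / T_{\mathrm{cor}})$ concrete, the following minimal Python sketch may help. The function names and the numerical values are purely hypothetical and are not taken from the simulations of this chapter; the per-slot sum rates are assumed to be given.

```python
def net_average_rate(slot_sum_rates):
    """Average per-slot sum rates over one coherence block, as in (5.6).

    slot_sum_rates[t - 1] is the sum rate of all UEs served in slot t
    (zero for slots that carry pilots only).
    """
    t_cor = len(slot_sum_rates)
    return sum(slot_sum_rates) / t_cor


def dedicated_only_rate(sum_rate, tau, t_cor):
    """Net rate when no data is sent during the tau training slots."""
    return (1 - tau / t_cor) * sum_rate


# Hypothetical numbers: T_cor = 10 slots, tau = 3 training slots,
# sum rate 5.0 per slot in the dedicated data phase.
without_stdt = net_average_rate([0.0, 0.0, 0.0] + [5.0] * 7)
# Same block, but one UE already receives data (rate 1.0) in slot 3.
with_stdt = net_average_rate([0.0, 0.0, 1.0] + [5.0] * 7)
print(without_stdt, dedicated_only_rate(5.0, 3, 10), with_stdt)
```

The first two printed values coincide, which is the special case stated above; any rate collected during the training window only adds to the average.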
Problem (5.7) is very challenging to solve, since it incorporates three tiers of optimization that operate on different time scales and are coupled with each other. In the first tier, we need to design the channel-statistics-based analog beamformers, where the codebook-based $\mathbf{F}_a$ is coupled with $[\mathbf{w}_{ai}]$. In the second tier, the pilot matrix $\mathbf{P}_{tr}$ needs to be optimized based on the effective beam pair bipartite graph as Fig. 5.3 shows, which not only needs to minimize the training overhead but also to optimize the training order to achieve additional spatial multiplexing gains in the STDT phase. For the first two tiers, our design is based on the long-term CSI, while in the third tier, the designs of the input covariances $[\boldsymbol{\Sigma}_{\pi(i),t}]$ and the permutation of the index set $[\pi(i)]$ are based on the instantaneous CSI, which eventually determines the performance of the first two tiers of optimization.

Resorting to the uplink-downlink duality theory [122], we can develop an equivalent uplink problem of (5.7):

$$\max_{[\mathbf{B}_k, \mathbf{w}_{ak}]_{k=1}^{K},\, [\rho_{p,t}]_{t=1}^{\tau},\, \mathbf{P}_{tr}} \; \mathbb{E}\Big[ \max_{[\rho'_{i,t},\, i \in \mathcal{K}_{dd,t}]_{t=1}^{T_{\mathrm{cor}}}} C_{\mathrm{avg,UL}} \Big], \tag{5.8a}$$

$$\text{s.t.} \quad (5.7\mathrm{b}),\ (5.7\mathrm{c}), \quad \rho_{p,t} |\mathcal{T}_{tr,t}| + \sum_{i \in \mathcal{K}_{dd,t}} \rho'_{i,t} \mathbf{w}_{ai}^{\dagger} \mathbf{w}_{ai} \sigma^2 \le \rho_d, \forall t, \tag{5.8b}$$

where $C_{\mathrm{avg,UL}} = \frac{1}{T_{\mathrm{cor}}} \sum_{t=1}^{T_{\mathrm{cor}}} C_{t,\mathrm{UL}}$, and $C_{t,\mathrm{UL}} = \log \big| \sum_{i=1}^{K} \bar{\mathbf{h}}_{i,dd,t} \rho'_{i,t} \bar{\mathbf{h}}_{i,dd,t}^{\dagger} + \mathbf{I}_{|\mathcal{B}_t|} \big|$. $C_{t,\mathrm{UL}}$ denotes the instantaneous uplink capacity at time slot $t$, and $\rho'_{i,t}$ indicates the uplink transmit power coefficient of UE$_i$ at time slot $t$, $\forall i, t$. Constraints (5.7b) and (5.7c) remain the same for the uplink dual problem, while the power constraint becomes (5.8b) instead of (5.7d). The detailed development of the uplink-dual problem with an HDA structure at both ends is given in [102] and is briefly summarized as follows.
Based on [122], the downlink channel has the same instantaneous sum rate as its dual uplink, which can be expressed as

$$\max_{[\rho'_{i,t},\, i \in \mathcal{K}_{dd,t}]} C_{t,\mathrm{UL}} = \log \Big| \Big( \sum_{i=1}^{K} \bar{\mathbf{h}}_{i,dd,t} \rho'_{i,t} \bar{\mathbf{h}}_{i,dd,t}^{\dagger} + \mathbf{Q}_{1,t} \Big) \mathbf{Q}_{1,t}^{-1} \Big| \tag{5.9}$$

$$\text{s.t.} \quad \sum_{i \in \mathcal{K}_{dd,t}} \rho'_{i,t} Q_{2i,t} \le \rho_d - \rho_{p,t} |\mathcal{T}_{tr,t}|,$$

where $\mathbf{Q}_{1,t} = \mathbf{F}_{a,t}^{\dagger} \mathbf{F}_{a,t} = \mathbf{I}_{|\mathcal{B}_t|}$, since $\mathbf{F}_{a,t}$ consists of normalized DFT columns, and $Q_{2i,t} = \sigma^2 \mathbf{w}_{ai}^{\dagger} \mathbf{w}_{ai}$, $i \in \mathcal{K}_{dd,t}$, $\forall t$. Based on (5.9), we obtain the optimization for the dual uplink channel as (5.8). Our goal is still the downlink problem, but we resort to its equivalent dual problem for mathematical convenience.

Decoupled optimization with reduced complexity

Although the uplink-dual problem (5.8) exhibits a more tractable objective function than (5.7a), it still incorporates joint multi-tier optimization with different time scales. Decoupling the interaction between the instantaneous $[\rho'_{i,t}]$ and the channel-statistics-based variables can significantly reduce the problem complexity. Therefore, rather than jointly optimizing the power allocations $[\rho'_{i,t}]$, we adopt a simple equal power allocation among training signals and payload data, i.e., $\rho'_{i,t} = \rho_{p,t}$, where $i \in \mathcal{K}_{dd,t}$ and $t$ ranges from 1 to $T_{\mathrm{cor}}$. With unit-norm combiners $[\mathbf{w}_{ai}]$, we have the following power allocation equality:

$$\rho_{p,t} = \rho'_{i,t} = \frac{\rho_d}{|\mathcal{T}_{tr,t}| + \sigma^2 |\mathcal{K}_{dd,t}|}, \forall i \in \mathcal{K}_{dd,t}. \tag{5.10}$$

At time slots dedicated to training, (5.10) reduces to equal power allocation over the trained beams, i.e., $\rho_{p,t} = \frac{\rho_d}{|\mathcal{T}_{tr,t}|}$, while after the training window, (5.10) becomes equal power allocation among UEs, i.e., $\rho_{p,t} = \frac{\rho_d}{\sigma^2 |\mathcal{K}_{dd,t}|}$. By introducing the power allocation equality (5.10), $C_{\mathrm{avg,UL}}$ becomes an achievable net throughput rather than the net uplink capacity.
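As a quick numerical check of (5.10), the sketch below evaluates the equal power level for the three slot types (training only, mixed STDT, data only) and verifies that the power budget (5.8b) is met with equality for unit-norm combiners. The function name `equal_power` and all numerical values are assumptions chosen for illustration.

```python
def equal_power(rho_d, n_trained_beams, n_data_ues, noise_var=1.0):
    """Equal power level per (5.10): rho_d / (|T_tr,t| + sigma^2 |K_dd,t|)."""
    denom = n_trained_beams + noise_var * n_data_ues
    if denom <= 0:
        raise ValueError("need at least one trained beam or one served UE")
    return rho_d / denom


rho_d, sigma2 = 12.0, 0.5                       # hypothetical budget and noise variance
training_only = equal_power(rho_d, 3, 0, sigma2)   # rho_d / |T_tr,t|
mixed_slot = equal_power(rho_d, 1, 2, sigma2)      # STDT slot: 1 beam trained, 2 UEs served
data_only = equal_power(rho_d, 0, 4, sigma2)       # rho_d / (sigma^2 |K_dd,t|)

# Budget (5.8b) with unit-norm combiners: rho * |T| + rho * sigma^2 * |K| = rho_d
budget_used = mixed_slot * 1 + mixed_slot * sigma2 * 2
print(training_only, mixed_slot, data_only, budget_used)
```

The two limiting cases reproduce the expressions stated after (5.10); the mixed slot simply interpolates between them.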
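With this simplification, the original downlink problem over different time scales reduces to an uplink problem purely over the long-term CSI:

$$\max_{[\mathbf{B}_k, \mathbf{w}_{ak}]_{k=1}^{K},\, \mathbf{P}_{tr}} \; \mathbb{E}[C_{\mathrm{avg,UL}}] \tag{5.11a}$$

$$\text{s.t.} \quad (5.7\mathrm{b}),\ (5.7\mathrm{c}), \quad \|\mathbf{w}_{ak}\| = 1, \forall k. \tag{5.11b}$$

Average throughput approximation

To avoid the computational burden of evaluating the expectation in (5.11a), we consider the following upper bound of the average uplink throughput:

$$\mathbb{E}[C_{t,\mathrm{UL}}] \overset{(a)}{\le} C_{\mathrm{UL,upper}} = \log \mathbb{E}\Big[ \Big| \rho_{p,t} \sum_{i=1}^{K} \bar{\mathbf{h}}_{i,dd,t} \bar{\mathbf{h}}_{i,dd,t}^{\dagger} + \mathbf{I}_{l_{\mathrm{BS}}} \Big| \Big],$$

where (a) follows from Jensen's inequality: $\mathbb{E}[\log |\mathbf{I} + \mathbf{X}|] \le \log \mathbb{E}[|\mathbf{I} + \mathbf{X}|]$. Without loss of generality, we drop the time subscript in the following and explore the uplink throughput bound approximation for the dedicated data transmission phase. The result is directly applicable to the STDT phase.

Proposition 1: Assuming a single-path channel model, i.e., $P_i = 1$ in (2.2), $\forall i$, we have the following equivalence:

$$\log \mathbb{E}\Big[ \Big| \rho_p \mathbf{F}_a^{\dagger} \sum_{i=1}^{K} \mathbf{H}_i^{\dagger} \mathbf{w}_{ai} \mathbf{w}_{ai}^{\dagger} \mathbf{H}_i \mathbf{F}_a + \mathbf{I}_{l_{\mathrm{BS}}} \Big| \Big] = \log \Big| \rho_p \mathbf{F}_a^{\dagger} \sum_{i=1}^{K} \tilde{\mathbf{H}}_i^{\dagger} \mathbf{w}_{ai} \mathbf{w}_{ai}^{\dagger} \tilde{\mathbf{H}}_i \mathbf{F}_a + \mathbf{I}_{l_{\mathrm{BS}}} \Big|,$$

where $\tilde{\mathbf{H}}_i \triangleq \frac{1}{L_i} \mathbf{A}_{\mathrm{UE},i} \boldsymbol{\Gamma}_i \mathbf{A}_{\mathrm{BS},i}^{\dagger}$.

Proposition 1 can be easily obtained from the result in [102]. Based on Proposition 1, we obtain a closed-form expression to evaluate the net average uplink throughput under the single-path channel model, and develop the following problem:

$$\max_{[\mathbf{B}_k, \mathbf{w}_{ak}]_{k=1}^{K},\, \mathbf{P}_{tr}} \; \tilde{C}_{\mathrm{avg,UL}} = \frac{1}{T_{\mathrm{cor}}} \sum_{t=1}^{T_{\mathrm{cor}}} \tilde{C}_{t,\mathrm{UL}}, \tag{5.12}$$

$$\text{s.t.} \quad (5.7\mathrm{b}),\ (5.7\mathrm{c}),\ (5.11\mathrm{b}),$$

where $\tilde{C}_{t,\mathrm{UL}} = \log \big| \rho_{p,t} \mathbf{F}_{a,t}^{\dagger} \sum_{i \in \mathcal{K}_{dd,t}} \tilde{\mathbf{H}}_i^{\dagger} \mathbf{w}_{ai} \mathbf{w}_{ai}^{\dagger} \tilde{\mathbf{H}}_i \mathbf{F}_{a,t} + \mathbf{I}_{|\mathcal{B}_t|} \big|$. Without the assumption of $P_i = 1, \forall i$, Proposition 1 does not hold in general, and problem (5.12) becomes an approximation of problem (5.11). Our simulation results in Section 5.5.2 demonstrate that the approximation performs well, even with general settings of $[P_i]$.

5.4 Algorithm Development

Problem (5.12) is still generally non-convex, involving integer programming for designing $\mathbf{P}_{tr}$ and $[\mathbf{B}_k]$. Meanwhile, given a topology of the beam pair bipartite graph as shown in Fig.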
5.3, there is no closed-form expression for the minimum cost to complete the training, not to mention which training order we should apply to increase the opportunity of data transmission during the training window. In this section, we first provide a graph-based algorithm to heuristically optimize the training order. Then, a greedy algorithm is proposed to achieve a suboptimal solution to problem (5.12).

Figure 5.4: Conflict graph of transmit beams for the beam pair bipartite graph exhibited in Fig. 5.3; different colors represent different pilot dimensions allocated to the transmit beams.

5.4.1 Training Order Optimization

Given a beam pair bipartite graph, the minimum training cost can be evaluated by the algorithm proposed in [135], which provides a suboptimal solution that minimizes an upper bound of the training cost: whereas [135] treats the left-side nodes as BSs, we view them as transmit beams. The algorithm is summarized below:

• Build up the conflict graph of transmit beams by treating them as vertices and connecting any pair of them with which a common UE is associated, as Fig. 5.4 illustrates.

• Sort the vertices in descending order of degree, which gives $[\mathbf{b}_2, \mathbf{b}_3, \mathbf{b}_4, \mathbf{b}_1]$ in Fig. 5.4.

• Allocate pilot dimensions to the vertices (beams) in a sequential manner. For every vertex awaiting pilot assignment, if it conflicts with all previous vertices, assign an orthogonal pilot dimension to it. Otherwise, assign the pilot dimension occupied by the largest number of transmit beams that have no conflict with the vertex.

For the toy example in Fig. 5.4, the output of the algorithm will be $[t_1, t_2, t_3, t_2]$, corresponding to $[\mathbf{b}_2, \mathbf{b}_3, \mathbf{b}_4, \mathbf{b}_1]$, where $t_i$ indicates the time slot index of the $i$-th pilot dimension, $\forall i$. However, the order in which the pilot dimensions are scheduled for training is not explored in the above algorithm.
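The three steps above can be sketched as follows. This is an illustrative Python reading of the summarized procedure, not the reference implementation of [135]; the function name `allocate_pilots`, the data layout, and the tie-breaking rules (lower beam index first, lower pilot index first) are assumptions. In particular, a beam is placed on an existing pilot dimension only if it conflicts with none of the beams already on that dimension.

```python
from itertools import combinations


def allocate_pilots(clusters):
    """Greedy pilot allocation on the conflict graph of transmit beams.

    clusters: dict mapping UE index k -> set of beam indices in B_k
    Returns (order, assignment), where order lists the beams by descending
    conflict degree and assignment maps each beam to a 0-based pilot index.
    """
    beams = sorted(set().union(*clusters.values()))
    # Two beams conflict if at least one UE has both in its beam cluster.
    conflicts = {b: set() for b in beams}
    for cluster in clusters.values():
        for a, b in combinations(cluster, 2):
            conflicts[a].add(b)
            conflicts[b].add(a)
    # Stable sort: ties in degree are broken by the lower beam index.
    order = sorted(beams, key=lambda b: -len(conflicts[b]))
    pilots = []          # pilots[p] = set of beams sharing pilot dimension p
    assignment = {}
    for b in order:
        compatible = [p for p, members in enumerate(pilots)
                      if not (members & conflicts[b])]
        if compatible:
            # Reuse the compatible pilot dimension that holds the most beams.
            best = max(compatible, key=lambda p: len(pilots[p]))
            pilots[best].add(b)
        else:
            pilots.append({b})
            best = len(pilots) - 1
        assignment[b] = best
    return order, assignment


# Toy example of Fig. 5.3 / Fig. 5.4: B_1 = [b_1, b_2], B_2 = [b_2, b_3, b_4]
order, assignment = allocate_pilots({1: {1, 2}, 2: {2, 3, 4}})
print(order, [assignment[b] for b in order])
```

For the toy example, the sketch returns the beam order $[\mathbf{b}_2, \mathbf{b}_3, \mathbf{b}_4, \mathbf{b}_1]$ with pilot indices corresponding to $[t_1, t_2, t_3, t_2]$, in agreement with the output stated above.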
Considering that the purpose of optimizing the training order is to increase the transmission opportunity for payload data within the training window, we heuristically choose to maximize the total number of time slots for payload data transmission as the objective function, i.e., $\max \sum_{i=1}^{K}(T_{cor} - T_{tr,i})$, where $T_{tr,k}$ indicates the time instance when the BS completes the training for UE $k$. This is equivalent to minimizing the sum of the training periods of all UEs, i.e., $\min \sum_{i=1}^{K} T_{tr,i}$. The aim of solving this problem is to complete the individual training of as many UEs as possible earlier than $\tau$ by optimizing over all possible sequential orders of $[t_i]_{i=1}^{\tau}$. Minimizing the number of time slots used for training is not the same as making sure that we can send as many data slots as possible: there could be non-training slots for a UE before its training is finished (i.e., empty slots). However, the formulated problem is physically intuitive and tractable. To approximate the optimal solution to this typical integer programming problem, we summarize our proposed algorithm below:

1. Define the degree of time slot $t_i$ as $D(t_i)$, which is the number of transmit beams assigned to time slot $t_i$. Define the set $\mathcal{D} = \{D(t_1), \ldots, D(t_\tau)\}$, which includes all values of time slot degree, and sort its elements in descending order.
2. For the $i$-th element in $\mathcal{D}$, i.e., $\mathcal{D}(i)$, extract the set of time slots $\mathcal{P}_i = \{t_m \mid D(t_m) = \mathcal{D}(i)\}$, and calculate their priority metrics
$$\sum_{j\in\mathcal{T}_{tr,t_m}}\sum_{k=1}^{K}\frac{I_k(b_j)}{L_{tran,k}},\ \forall m\in\mathcal{P}_i,$$
where $I_k(b_j)$ is an indicator denoting whether $b_j$ is associated with UE $k$, $L_{tran,k}$ is the number of transmit beams connected to UE $k$, and $\mathcal{T}_{tr,t_m}$ contains all transmit beams trained on pilot dimension $t_m$.
3. Sort the priority metrics of the time slots belonging to $\mathcal{P}_i$ in descending order and sequentially assign indices to them.
4. Repeat step 2) and step 3) for $i = 1, \ldots, |\mathcal{D}|$.
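Steps 1) to 4) can be illustrated with the following Python sketch. The pilot assignment and the UE-to-beam association are hypothetical toy data, the function names are made up for this example, and it is assumed that the training of a UE is complete in the slot where the last of its connected beams is trained.

```python
from fractions import Fraction


def schedule_pilot_order(pilot_beams, ue_beams):
    """Order pilot dimensions by degree, then by the priority metric.

    pilot_beams: dict pilot dimension -> set of beams trained on it.
    ue_beams:    dict UE -> set of transmit beams connected to that UE.
    """
    def priority(dim):
        # sum over beams j on this dimension and UEs k of I_k(b_j) / L_tran,k
        return sum(Fraction(1, len(beams))
                   for b in pilot_beams[dim]
                   for beams in ue_beams.values() if b in beams)

    # Step 1: distinct degrees, sorted in descending order.
    degrees = sorted({len(v) for v in pilot_beams.values()}, reverse=True)
    order = []
    for deg in degrees:
        # Step 2: pilot dimensions sharing this degree.
        group = [d for d in pilot_beams if len(pilot_beams[d]) == deg]
        # Step 3: higher priority metric is scheduled first.
        order += sorted(group, key=priority, reverse=True)
    return order


def sum_training_time(order, pilot_beams, ue_beams):
    """Sum over UEs of the slot in which their last beam is trained."""
    slot_of = {b: s for s, dim in enumerate(order, start=1)
               for b in pilot_beams[dim]}
    return sum(max(slot_of[b] for b in beams) for beams in ue_beams.values())


# Hypothetical toy data (assumptions for illustration only).
pilot_beams = {"t1": {"b2"}, "t2": {"b1", "b3"}, "t3": {"b4"}}
ue_beams = {"UE1": {"b1", "b2"}, "UE2": {"b2", "b3", "b4"}, "UE3": {"b3"}}

order = schedule_pilot_order(pilot_beams, ue_beams)
print(order, sum_training_time(order, pilot_beams, ue_beams))
```

In this toy case the heuristic schedules the degree-2 dimension first and lowers the sum of training completion times from 7 (for the naive order $t_1, t_2, t_3$) to 6.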
At steps 2) and 3), for pilot dimensions with the same degree, say $\mathcal{D}(i)$, we introduce the metric $\sum_{j\in\mathcal{T}_{tr,t_m}}\sum_{k=1}^{K}\frac{I_k(b_j)}{L_{tran,k}}$ to evaluate the priority order of the $m$-th pilot dimension, $\forall m\in\mathcal{P}_i$. The inner sum $\sum_{k=1}^{K}\frac{I_k(b_j)}{L_{tran,k}}$ can be interpreted as the relative significance of $b_j$. If it is very large, $b_j$ is connected to many UEs that are each associated with only a few transmit beams, so scheduling $b_j$ first increases the chance of finishing the training of many UEs earlier than $\tau$. Combining the relative significance of the trained beams on each pilot dimension, we obtain the priority order, i.e., the relative significance, of the pilot dimensions, and can then schedule them sequentially. Based on the result of training allocation and order scheduling, we can build the pilot assignment matrix $\mathbf{P}_{tr}$.

5.4.2 Greedy User-Centric Beam Clustering

We consider the case that the analog precoder and analog combiner are chosen from the DFT codebook and the eigenmodes of the UE-side channel covariance, respectively. Therefore, the beamformer optimization of (5.12) reduces to selecting the effective beam pairs from the bipartite graph implied by $\mathbf{S}$. Thanks to the training order optimization in Section 5.4.1, we can evaluate the performance of any given topology of reduced $\mathbf{S}$, which lays the foundation of our proposed greedy user-centric beam clustering (GUCBC) algorithm. The detailed implementation procedure can be summarized as follows:

1. Initially, let $\mathbf{w}_{ak} = \mathbf{0}$, $\forall k$, and $\mathbf{F}_a = \emptyset$. Let $\mathbf{W}_a$ be the ensemble of analog combiners, $\mathbf{W}_a \triangleq [\mathbf{w}_{a1}, \ldots, \mathbf{w}_{aK}]$.
2. Extract the $M \times \sum_{i=1}^{K} r_i$ measure matrix $\mathbf{S}$ following (5.1), force small entries to zero as long as a certain portion, i.e., $\gamma$, of the total average energy is maintained, and build the beam pair bipartite graph. Define a beam pair set $\mathcal{E}$ containing all edges, i.e., $(\mathbf{b}, \mathbf{r}) \in \mathcal{E}$ if transmit beam $\mathbf{b}$ and receive eigenbeam $\mathbf{r}$ are connected.
3. Let $\mathbf{W}'_a = \mathbf{W}_a$ and $\mathbf{F}'_a = \mathbf{F}_a$.
For a candidate beam pair $e = (\mathbf{b}, \mathbf{r})$ in $\mathcal{E}$, we let $\mathbf{F}'_a = [\mathbf{F}'_a, \mathbf{b}]$ and assign $\mathbf{r}$ to its corresponding UE, then run evaULthroughput, given in Algorithm 3, to return the net average uplink throughput approximation (NAUTA).
4. Repeat step 3) for every candidate edge and find the optimal one, $e^\star = (\mathbf{b}^\star, \mathbf{r}^\star)$, that enhances the NAUTA most.
5. Update $\mathbf{W}_a$ by assigning $\mathbf{r}^\star$ to its corresponding UE $k^\star$, update $\mathbf{F}_a$ by $\mathbf{F}_a = [\mathbf{F}_a, \mathbf{b}^\star]$, and remove from $\mathcal{E}$ the beam pairs starting with $\mathbf{b}^\star$ and the beam pairs ending at any other receive eigenmode of UE $k^\star$.
6. Repeat step 3) to step 5) until $\mathrm{rank}(\mathbf{F}_a) = l_{BS}$ or the NAUTA no longer increases when additional beams are added.

The essential idea of the algorithm is to greedily add effective beam pairs from the bipartite graph. For every candidate beam pair, we use an inner function, called evaULthroughput, to evaluate the NAUTA of the beamformed effective channel with this additional candidate, and then select the best beam pair to update $\mathbf{F}_a$ and $\mathbf{W}_a$. For example, consider the toy example exhibited in Fig. 5.2, where initially we have 5 edges in total in $\mathcal{E}$. The first step selects the beam pair that provides the maximal NAUTA, which is $(\mathbf{b}_3, \mathbf{r}_{31})$ with the largest weight. Then, edges $(\mathbf{b}_3, \mathbf{r}_{12})$ and $(\mathbf{b}_3, \mathbf{r}_{22})$ have to be removed, since their transmit beam $\mathbf{b}_3$ has already been selected. The remaining beam pair set is $\{(\mathbf{b}_1, \mathbf{r}_{11}), (\mathbf{b}_2, \mathbf{r}_{21})\}$, and we sequentially assign these pairs to $\mathbf{F}_a$ and $\mathbf{W}_a$ if the NAUTA is enhanced. The details of evaULthroughput can be found in Algorithm 3. Under the constraints of both channel rank and number of BS RF chains, the proposed GUCBC algorithm not only designs sub-optimal analog beamformers at both link ends, but also implicitly incorporates UE scheduling, in the sense that the users with $\mathbf{w}_{ak} = \mathbf{0}$ are not scheduled.

Algorithm 3 evaULthroughput
1: Extract the effective beam pair bipartite graph projected by $\mathbf{F}'_a$ and $\mathbf{W}'_a$.
Follow the procedures in Section 5.4.1 to optimize the pilot assignments $\mathbf{P}_{tr}$.
2: for $t = 1$ to $T_{cor}$ do
3: Extract $\mathcal{K}_{cc,t}$ and its complement $\bar{\mathcal{K}}_{cc,t}$.
4: Build $\mathbf{F}'_{a,t}$ from the trained beams, making sure that no beam in $\mathbf{F}'_{a,t}$ is connected to UEs belonging to $\bar{\mathcal{K}}_{cc,t}$, then calculate $\tilde{C}_{t,UL}$ in (5.12).
5: end for
6: Substitute $[\tilde{C}_{t,UL}]$ into (5.12) to obtain the NAUTA $\tilde{C}_{avg,UL}$.

To evaluate the complexity of the proposed algorithm, we compare it with the exhaustive beam search. Given $l_{sel.}$ selected transmit eigenmodes and stream assignments $[y_i]_{i=1}^{K}$, where $y_i$ represents whether or not a stream is assigned to UE $i$, and $\sum_{i=1}^{K} y_i \le l_{sel.}$, there are in total $\binom{r_i}{y_i}$ possible sets of receive eigenmodes that can be used to form the analog combiner at UE $i$, $\forall i$. Therefore, for all $K$ UEs, there are in total $\prod_{i=1}^{K}\binom{r_i}{y_i}$ possibilities for given stream assignments $[y_i]_{i=1}^{K}$ and transmit beams. The total number of combinations that the exhaustive search method needs to investigate is:

$$N_{comb.} = \sum_{l_{sel.}=1}^{l_{BS}} \binom{M}{l_{sel.}} \sum_{\sum_{i=1}^{K} y_i \le l_{sel.}} \prod_{i=1}^{K}\binom{r_i}{y_i}, \qquad (5.13)$$

where the second summation is over all possible realizations of stream assignments, and the first summation is over all possible numbers of transmit beams. The computational burden grows extremely fast with increasing $M$, $r_i$, and $l_{BS}$, which is prohibitive for implementation. For the proposed method, the number of iterations is at most $l_{BS}$. At step 5) of the greedy user-centric beam clustering (GUCBC) algorithm, we remove beam pairs that are not effective beam pair candidates for follow-up iterations. Considering the worst-case scenario, where we only remove the selected beam pair from the beam pair set $\mathcal{E}$ in every iteration, the complexity is upper bounded by

$$\sum_{t=1}^{l_{BS}} (|\mathcal{E}| - t + 1) = l_{BS}|\mathcal{E}| - \frac{l_{BS}(l_{BS}-1)}{2} \le l_{BS}|\mathcal{E}|, \qquad (5.14)$$

where $t$ indicates the index of the iteration. Therefore, the complexity order of the proposed scheme is roughly $O(l_{BS}|\mathcal{E}|)$.
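To get a feeling for the gap between (5.13) and (5.14), the two counts can be evaluated directly. The sketch below is illustrative only: it assumes that the constraint in (5.13) reads $\sum_i y_i \le l_{sel.}$ with binary $y_i$ (the relation sign is lost in the extracted text), and the covariance ranks and the edge count used in the example call are hypothetical values, not taken from the thesis.

```python
from itertools import product
from math import comb, prod


def exhaustive_count(M, l_bs, ranks):
    """Number of combinations in (5.13) under the assumptions stated above.

    M:     number of BS antennas (candidate transmit beams).
    l_bs:  number of BS RF chains.
    ranks: list of r_i, one entry per UE.
    """
    total = 0
    for l_sel in range(1, l_bs + 1):
        inner = 0
        # y_i in {0, 1}: whether a stream is assigned to UE i.
        for y in product((0, 1), repeat=len(ranks)):
            if sum(y) <= l_sel:
                inner += prod(comb(r, yi) for r, yi in zip(ranks, y))
        total += comb(M, l_sel) * inner
    return total


def greedy_bound(l_bs, num_edges):
    """Worst-case number of candidate evaluations in (5.14)."""
    return sum(num_edges - t + 1 for t in range(1, l_bs + 1))


# M, l_BS and K follow Table 5.1; r_i = 3 and |E| = 20 are assumptions.
print(exhaustive_count(M=64, l_bs=8, ranks=[3, 3, 3, 3]))
print(greedy_bound(l_bs=8, num_edges=20))
```

The closed form in (5.14) can be cross-checked against the explicit sum: for $l_{BS} = 8$ and $|\mathcal{E}| = 20$ both give $8 \cdot 20 - 28 = 132$ evaluations.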
If we conservatively remove weak beam pairs whose effective channel gains are below the noise power, the cardinality of $\mathcal{E}$ is approximately the total number of MPCs, i.e., $\sum_{i=1}^{K} P_i$. Therefore, the complexity of the proposed scheme is upper bounded by $O(l_{BS}\sum_{i=1}^{K} P_i)$, which is far smaller than $N_{comb.}$ in (5.13), especially for mm-wave frequencies, where dominant MPCs are much fewer than at low frequencies.

5.5 Simulation Results

In this section, we evaluate the performance of the proposed UCVS scheme via simulations. All simulation results are compared with JSDM/BDMA [55, 106] with respect to net average sum rate. For a fair comparison of the different schemes, we always use least squares (LS) channel estimation during the training phase, and a zero-forcing digital precoder for the payload transmission. Meanwhile, the BS performs a greedy UE scheduling algorithm based on the instantaneous reduced-dimensional CSI to achieve approximately optimal performance with each of the different analog beamformer designs.

Figure 5.5: Illustration of GSCM with UEs and scatterers in a range of DOD support.

5.5.1 Geometric Stochastic Channel Model (GSCM)

Following the dominant characteristics of mm-wave propagation, we mainly focus on the MPCs interacting with a single scatterer. Fig. 5.5 illustrates an example of how to generate the synthetic channel profiles. We place the scatterers and UEs in an angular range (as seen from the BS) that we call the support interval of DOD. Different choices of DOD support range can therefore represent different scenarios. For example, for a crowded cafeteria, we may use a narrow DOD support, while for UEs that are widely separated in the angular domain from the BS perspective, we can use a wide DOD support range.
To activate scatterers for the channels between the BS and the UEs, we utilize the following probabilistic model:

$$P_{active} = P_{UE,LOS}\,P_{BS,LOS}, \qquad (5.15)$$

where $P_{active}$ is the probability that a scatterer is active for a UE, which is the product of the marginal probabilities that both ends can "see" this scatterer. The marginal probability that a terminal has LOS propagation to the scatterer follows

$$P_{UE/BS,LOS} = \min(d_1/d, 1)\,(1 - \exp(-d/d_2)) + \exp(-d/d_2), \qquad (5.16)$$

where $d_1$ and $d_2$ are modeling parameters, and $d$ is the distance from the BS/UE to the scatterer. When $d < d_1$, we have $P_{UE/BS,LOS} = 1$, indicating that the scatterer is deterministically visible to the UE/BS. For $d > d_1$, the probability decreases exponentially with increasing $d$, where the decay rate is determined by $d_2$. The settings of both $d_1$ and $d_2$ are environment-dependent [125] (see footnote 2), e.g., urban or rural, and the terminal heights also make a difference. Note that this model provides an implementation of the "common scatterer" concept used, e.g., in COST 2100 [130]. Unless otherwise specified, the parameter settings for the channel model and system configuration are listed in Table 5.1.

Table 5.1: Simulation parameters
DOD support range: 20° or 40°
LOS from BS to scatterer: $d_{1,BS2S} = 24$ m, $d_{2,BS2S} = 45$ m
LOS from UE to scatterer: $d_{1,UE2S} = 2$ m, $d_{2,UE2S} = 10$ m
Scatterer density: $\lambda_s = 0.01 \sim 0.09$
Energy threshold: $\gamma = 0.7 \sim 1$
Number of UEs: $K = 4$
No. of BS antennas and RF chains: $M = 64$, $l_{BS} = 8$
No. of UE antennas and RF chains: $N = 8$, $l_{UE} = 1$
Antenna spacing (in wavelengths): $D = 1/2$

For the modeling parameters in (5.16), since the BS is usually located higher, $d_{1,BS2S}$ and $d_{2,BS2S}$ are larger than $d_{1,UE2S}$ and $d_{2,UE2S}$, respectively (we use the subscripts "BS2S" and "UE2S" to distinguish the parameter sets for the different terminals). The coherence time in units of channel uses of the standard can be evaluated as $\frac{1}{f_d T_s}$, where $f_d$ is the Doppler spread and $T_s$ is the symbol duration, which is 66.7 µs in the LTE standard.
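As a quick numerical illustration of (5.15), (5.16) and the coherence-time expression, consider the following sketch. The function names are made up for this example, the Doppler spread is approximated by the maximum Doppler shift $f_d = v f_c / c$ (an assumption; the text does not state how $f_d$ is computed), and the default parameters are the Table 5.1 values together with the 60 GHz carrier considered in this section.

```python
import math


def p_los(d, d1, d2):
    """LOS probability (5.16) between a terminal and a scatterer at distance d [m]."""
    return min(d1 / d, 1.0) * (1.0 - math.exp(-d / d2)) + math.exp(-d / d2)


def p_active(d_bs, d_ue, bs=(24.0, 45.0), ue=(2.0, 10.0)):
    """Scatterer activation probability (5.15) with Table 5.1 parameters."""
    return p_los(d_bs, *bs) * p_los(d_ue, *ue)


def coherence_channel_uses(speed, fc=60e9, ts=66.7e-6, c=3e8):
    """Coherence time 1 / (f_d * T_s) in channel uses.

    Assumes f_d = speed * fc / c (maximum Doppler shift).
    """
    f_d = speed * fc / c
    return 1.0 / (f_d * ts)


# Within d1 the scatterer is visible with probability one; beyond it the
# probability decays with distance.
print(p_los(10.0, 24.0, 45.0), p_los(100.0, 24.0, 45.0))
print(p_active(d_bs=50.0, d_ue=5.0))
# Speeds of 1.8 m/s and 5 m/s at 60 GHz.
print(coherence_channel_uses(1.8), coherence_channel_uses(5.0))
```

Under the stated Doppler assumption, speeds between 1.8 m/s and 5 m/s give roughly 42 down to 15 channel uses, in line with the range quoted in this section.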
Substituting mobility speeds ranging from 1.8 m/s to 5 m/s and a carrier frequency of 60 GHz, we obtain coherence times ranging approximately from 40 down to 15 channel uses of the LTE standard. For all simulation sets, we keep the noise power and large-scale loss at unity, i.e., $\sigma^2 = 1$, $L_i = 1$, $\forall i$. Therefore, the transmit power $\rho_d$ is equivalent to the signal-to-noise ratio (SNR) subsuming the impact of large-scale loss. For every drop of UEs, multiple scatterers are independently generated following a Poisson process with parameter $\lambda_s$ in the sector-shaped region, as Fig. 5.5 shows. (Footnote 2: [125] proposes the LOS probability model (5.16) for mm-wave channels between BS and UE, whereas here we use this model to indicate the probability of LOS between a terminal and a scatterer.) Meanwhile, the random locations of the UEs are constrained to the region in which the separation distance to the BS ranges from 50 to 60 m. With the assumption of uncorrelated scattering, we independently generate $[\sigma_{ip}]$ following a uniform distribution within $[0, 1]$, and then normalize them to satisfy $\sum_{p=1}^{P_i}\sigma_{ip}^2 = 1$, $\forall i$. Given the locations of UEs and scatterers, we randomly generate UE-scatterer association graphs following the probabilistic model, based on which we obtain the double-directional channel descriptions (2.2) of all UEs. The net average sum rates shown are all obtained by averaging over 100 UE drops, each of which consists of 20 independent realizations of the UE-scatterer association graph and 50 independent realizations of small-scale fading. We investigate the ensemble average over different realizations of the beam pair bipartite graph to demonstrate the advantages of the proposed method in various propagation environments.
Intuitively, if UE channels are fully spatially orthogonal to each other, or if their transmit eigenmodes are fully coupled, both schemes will achieve approximately the same performance: in the former case each UE forms an independent UE group, while in the latter case all UEs are grouped together. In Section 5.5.2, we investigate system performance under more realistic mm-wave channel models, which lie between these two extreme examples.

5.5.2 Results and Discussions

We first fix the coherence time $T_{cor}$ to 20 and the threshold parameter $\gamma$ to 0.9, then investigate how the net sum rate varies with SNR, as Fig. 5.6 exhibits. Note that since there is no clear conclusion on the optimal UE grouping for JSDM in [55, 91], we make comparisons with the JSDM scheme under different UE groupings, where K-means clustering is applied to group UEs with similar channel covariance [91]. We can observe that for both DOD support intervals, grouping all UEs together is optimal in the high SNR regime, since the channel-covariance-based analog precoder in JSDM cannot fully eliminate the inter-group interference, and forming more user groups makes the system operate in interference-limited mode. However, in the low SNR regime, the system tends to be noise-limited, and using more UE groups introduces additional gains from training cost reduction and thus obtains better performance. With a DOD support range of 40°, we observe that the impact of user grouping for JSDM is smaller, because dropping scatterers and UEs in a wider DOD support range leads more UE channels to be spatially orthogonal.

Figure 5.6: Net average sum rate vs. SNR for $T_{cor} = 20$ and $\gamma = 0.9$. (Panels: (a) DOD support range 20°, (b) DOD support range 40°; axes: SNR [dB] vs. net average sum rate [bit/s/Hz]; curves: UCVS, JSDM_G=1, JSDM_G=2, JSDM_G=4.)
Incorporating non-orthogonal training, the STDT phase, and user-centric beamformer optimization, the proposed UCVS scheme outperforms JSDM with the optimal UE grouping setting in both cases.

Fig. 5.7 and Fig. 5.8 show the net average sum rate as a function of the coherence time under different DOD support ranges and SNRs. For the proposed UCVS scheme, we also investigate different settings of the threshold parameter $\gamma$, i.e., 0.9 or 1. Comparing Fig. 5.7a and 5.7b, or Fig. 5.8a and 5.8b, we notice that appropriate settings of $\gamma$ are scenario-dependent. Specifically, in the low SNR regime, compared with the situation where $\gamma = 1$, the UCVS with a relatively smaller $\gamma = 0.9$ generates a sparser beam measure table and obtains more gains from the reduction of training overhead. In the high SNR regime in Fig. 5.7b and Fig. 5.8b, the interference-limited system is more sensitive to the threshold parameter, since striking out "weak" beam pair edges may generate nontrivial pilot contamination in the training phase and inter-user interference during data transmission; the resulting performance can be even worse than that of optimal JSDM as long as $T_{cor}$ is large enough. However, for typical coherence times below 30, the proposed scheme at $\gamma = 0.9$ still outperforms the state-of-the-art method.

Figure 5.7: Net average sum rate vs. $T_{cor}$ for a DOD support range of 20°. (Panels: (a) SNR = −5 dB, (b) SNR = 5 dB; axes: $T_{cor}$ vs. net sum rate [bit/s/Hz]; curves: UCVS with $\gamma = 1$, UCVS with $\gamma = 0.9$, UCVS without STDT with $\gamma = 0.9$, JSDM_G=1, JSDM_G=2, JSDM_G=4.)

Figure 5.8: Net average sum rate vs. $T_{cor}$ for a DOD support range of 40°. (Panels: (a) SNR = −5 dB, (b) SNR = 5 dB; same axes and curves as Fig. 5.7.)

To evaluate the individual contributions of NOBT and STDT, we also investigate the performance of UCVS without the STDT phase, whose legend is "UCVS no STDT" in Fig. 5.7 and Fig. 5.8. Although the individual gains from STDT are not significant in the simulated scenarios with terminals and scatterers in a narrow DOD support range, they can be dominant in other typical scenarios. For example, consider two UEs that are spatially orthogonal to each other in the angular domain, one of which has a much larger DOD spread than the other. The reduction of training cost by NOBT will be limited by training the UE channel with the large DOD spread, while the system can start data transmission once the training for the UE with the narrower angular spread is completed, making STDT more advantageous.

In conclusion, for the interesting range of parameter settings for mm-wave systems, i.e., operating at SNR below 0 dB and coherence time below 50 channel uses, the proposed UCVS exhibits a significant performance advantage over JSDM, e.g., more than 38% at an SNR of −5 dB and $T_{cor} = 20$, as Fig. 5.7a shows. Meanwhile, for the large coherence time and high SNR regime, which is usually outside the scope of mm-wave systems, the proposed scheme with an appropriate threshold setting still outperforms the state-of-the-art method in Fig. 5.7b and Fig. 5.8b.

To further investigate the impact of the threshold $\gamma$, we fix the SNR to 5 dB and $T_{cor} = 20$, and compare UCVS with JSDM by the net average sum rate as a function of $\gamma$ in Fig. 5.9. The optimal JSDM still groups all UEs together, and its net sum rate does not vary with $\gamma$. For the proposed UCVS, $\gamma$ cannot be made too small. Otherwise, the UCVS based on the reduced beam pair bipartite graph will cause training contamination and inter-user interference, and its performance may be even worse than that of JSDM, e.g., when $\gamma$ is between 0.7 and 0.8 and the DOD support range is 20° in Fig. 5.9a.
With a DOD support range of 40°, the UCVS still performs better when $\gamma = 0.7$, because UE channels under a wider DOD support range tend to be more spatially orthogonal to each other, making UCVS more robust to small values of $\gamma$.

Fig. 5.10 exhibits the net average sum rate as a function of the scatterer density $\lambda_s$. Due to lack of space, we only present results for a DOD support range of 20°; the behavior of the average rate is similar for 40°. With increasing scatterer density, the UE channel is less sparse and the performance gap between UCVS and JSDM becomes smaller. Consider a scenario with dense scatterers in a narrow DOD support interval; in that case many close-by scatterers act as a common scatterer, and they cannot be distinguished from the perspective of either BS or UE. The eigenspaces of the UE channel covariances will probably largely overlap. Therefore, grouping all UEs together and forming the JSDM-like analog beamformer approaches the optimal covariance-based solution, which eventually aligns with our proposed method for large $\lambda_s$. However, for a typical sparse mm-wave channel, e.g., when $\lambda_s = 0.01$, the number of MPCs is typically less than 10 for every UE, and the proposed scheme shows a significant performance advantage for a typical mm-wave system operating at −5 dB: it outperforms JSDM by 35%.

Figure 5.9: Net average sum rate vs. $\gamma$ for an SNR of 5 dB. (Panels: (a) DOD support range 20°, (b) DOD support range 40°; axes: $\gamma$ vs. net average sum rate [bit/s/Hz]; curves: UCVS, JSDM_G=1, JSDM_G=2, JSDM_G=4.)

In summary, with short coherence times at mm-wave frequencies, the UCVS benefits from NOBT and STDT, while with large coherence time in the high SNR regime, orthogonal training is necessary to avoid pilot contamination, and both the UCVS and JSDM schemes achieve almost the same performance, as Fig. 5.7 and Fig. 5.8 show. From Fig.
5.9b, we can observe an optimal trade-off between the reduction of training cost and interference suppression, which motivates future work on the optimal threshold setting as a function of the propagation environment. Regarding the impact of the scatterer density, Fig. 5.10 shows that the sum rates of both schemes monotonically decrease with the channel sparsity until saturation. The performance gap relative to JSDM is much larger at low scatterer density than at high scatterer density, since the proposed scheme is able to exploit the channel sparsity to improve the system performance.

Figure 5.10: Net average sum rate vs. $\lambda_s$ for a DOD support range of 20°. (Panels: (a) SNR = −5 dB, (b) SNR = 5 dB; axes: scatterer density vs. net average sum rate [bit/s/Hz]; curves: UCVS, JSDM_G=1, JSDM_G=2, JSDM_G=4.)

To incorporate the impact of non-ideal knowledge of channel statistics, we consider the following estimation model of the MPC directions:

$$\tilde{\phi}_{ip} = \phi_{ip} + e_{BS,ip}, \qquad \tilde{\theta}_{ip} = \theta_{ip} + e_{UE,ip}, \qquad \forall i, p,$$

where $e_{BS,ip} \sim \mathcal{N}(0, \sigma_e^2)$ and $e_{UE,ip} \sim \mathcal{N}(0, \sigma_e^2)$ represent the estimation errors of DOD and DOA, respectively, and $\tilde{\phi}_{ip}$ and $\tilde{\theta}_{ip}$ are the estimated DODs and DOAs, respectively. For simplicity, we let the estimation error variables follow an i.i.d. Gaussian distribution with mean zero and variance $\sigma_e^2$. Therefore, by adjusting the value of $\sigma_e$, we can investigate the impact of different degrees of direction misalignment.

Fig. 5.11 exhibits the sum rates as a function of $\sigma_e$ for different parameter settings at $T_{cor} = 20$, where the channel model and system configuration are the same as in Section 5.5.1. For JSDM at $T_{cor} = 20$, we only show results corresponding to one UE group, which is optimal as exhibited in Fig. 5.7 and Fig. 5.8.
For the proposed UCVS, we consider different settings of the energy threshold $\gamma$ for different SNRs: at an SNR of 5 dB, we set $\gamma$ to 1 and maintain the full beam pair bipartite graph, while $\gamma$ is adjusted to 0.9 at an SNR of −5 dB. We can observe that the proposed scheme requires more accurate directional characteristics, while JSDM without UE grouping is more robust to imperfect channel statistics. However, with the help of a large array, the angle estimation deviation should be very small, e.g., less than 1 degree, in which case the proposed scheme can outperform the JSDM scheme.

Figure 5.11: Net average sum rate vs. $\sigma_e$ for $T_{cor} = 20$. (Panels: (a) DOD support range 20°, (b) DOD support range 40°; axes: $\sigma_e$ vs. net sum rate [bit/s/Hz]; curves: UCVS with $\gamma = 0.9$ at −5 dB, JSDM at −5 dB, UCVS with $\gamma = 1$ at 5 dB, JSDM at 5 dB.)

5.6 Conclusions

In this chapter, we built an optimization framework based on user-centric virtual sectorization for the implementation of massive MIMO systems in FDD mode, which incorporates three coupled optimization tiers with different time scales: analog beamformer design, training resource allocation, and digital beamformer design. A UE-specific "virtual sectorization" employs the STDT phase and NOBT to fully exploit the mm-wave channel characteristics. Heuristic low-complexity algorithms were devised to obtain a suboptimal solution for the analog beamformer design. Simulations revealed significant gains of the proposed scheme over state-of-the-art methods in typical mm-wave channels. For future work, we will consider optimizing the threshold used in the proposed scheme to strike out weak paths, which depends on the propagation environment. Meanwhile, simulations over real mm-wave channel data will be carried out to check the practical applicability of the proposed scheme.
Chapter 6 Future Research Directions

6.1 Extension of Thesis

First of all, as discussed in Section 1.4, the subsequent work lies in the combination of the two research branches developed in this dissertation, i.e., "Fully-decoupled" and "Semi-joint", aiming to find the optimal solutions that maximize the real sum rate with practical linear baseband processing.

Secondly, other reduced-complexity HDA architectures are also interesting to explore. The current work mainly focuses on the fully-connected structure, where each RF chain has access to all antenna elements. Given constraints on hardware cost and power consumption, the number of phase shifters available to build the analog beamformer could be limited, which necessitates the sub-array structure, where each RF chain is only connected to a subset of the antenna elements [110]. Mathematically, without a connection between a particular RF chain and an antenna element, the corresponding entry of the analog beamforming matrix is zero. Still focusing on the optimization of multi-layer beamforming with different time scales, we would like to develop MU-MIMO beamforming strategies that accommodate the sub-array HDA structure.

6.2 Other Related Directions

This thesis mainly addresses channel-statistics-based hybrid beamforming optimization. In the future, we would also like to investigate the following related aspects.

• Quantized codebooks for hybrid beamforming: the currently proposed schemes evaluate the training cost by assuming ideal analog feedback. To refine the proposed scheme, we need to consider the additional constraints that a finite codebook imposes on the optimization of the hybrid beamforming strategy. In particular, the phase shifters may provide only a limited number of realizations, which further reduces the flexibility of analog beamforming compared with that of a digital one.
Meanwhile, the design of codebooks for analog/digital beamforming with dierent con- straints of quantization-level and time-sensitivity also needs further in- vestigation. • Acquire channel statistics based on machine learning: in this thesis, we assume the prior knowledge of average CSI, e.g., angular power spectrum, which is intuitively satised for the low-mobility ap- plications. However, the extension to general mobile environments re- quires more ecient estimation and tracking methods of channel statis- tics. At the era of big data, machine learning tools could be promis- ing solutions. For example, with GPS-enabled location tracking, we may rst extract channel statistics from sampled measurements, corre- sponding to dierent locations, to build up an oine database. Then, a convolutional neural network trained by the database can be used to perform eciently online-estimation of average CSI with location in- formation. • More practical assumptions and implementation of the pro- posed methods: prior work assumes the ideal RF components, and high resolutions ADCs. However, practical RF circuits and antennas have non-ideal characteristics. Meanwhile, low-resolution ADCs are necessary for the application of massive MIMO at mm-wave bandwidth. In the future, we would like to build up the test bed for validation and improvement of proposed schemes. Beamforming strategies may need back-and-forth modications to incorporate new non-ideal hard- ware constraints. 109 Appendices 110 .1 Proof of Proposition 1 With (3.6), we can obtain the termE[k H g i ;z k 2 ] in (3.4) as E[k H g i ;z k 2 ] = tr(E[vec( H g i ;z )vec( H g i ;z ) y ]) (a) = tr((B T z W y g i )K g i (B z W g i )) (b) = tr(B T z t;g i B z ) tr(W y g i r;g i W g i ); (1) where (a) follows the vectorization of matrix vec(AXB) = (B T A)vec(X), (b) substitutes the Kronecker channel model (3.6) and uses the property of kronecker product that (A B)(C D) = (AC) (BD) and tr(A B) = tr(A) tr(B). 
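The matrix identities invoked in steps (a) and (b) can be checked numerically. The following sketch uses small integer matrices and plain Python lists so that the comparisons are exact; the helper names and test matrices are arbitrary choices for this illustration, and the same identities hold for complex-valued matrices.

```python
def matmul(A, B):
    """Matrix product of two list-of-lists matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def kron(A, B):
    """Kronecker product A (x) B."""
    return [[a * b for a in ra for b in rb] for ra in A for rb in B]


def trace(A):
    return sum(A[i][i] for i in range(len(A)))


def transpose(A):
    return [list(col) for col in zip(*A)]


def vec(X):
    """Column-stacking vectorization."""
    return [x for col in zip(*X) for x in col]


def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]


# Arbitrary test matrices (illustrative values only).
A = [[1, 2], [3, 4]]
X = [[1, 0, 2], [0, 1, 3]]
B = [[1, 2], [0, 1], [4, 0]]
P, Q = [[1, 2], [3, 4]], [[0, 1], [1, 1]]
R, S = [[2, 0], [1, 3]], [[1, 1], [0, 2]]

# vec(A X B) = (B^T (x) A) vec(X)
lhs_vec = vec(matmul(matmul(A, X), B))
rhs_vec = matvec(kron(transpose(B), A), vec(X))

# (P (x) Q)(R (x) S) = (P R) (x) (Q S)
lhs_mixed = matmul(kron(P, Q), kron(R, S))
rhs_mixed = kron(matmul(P, R), matmul(Q, S))

print(lhs_vec == rhs_vec, lhs_mixed == rhs_mixed,
      trace(kron(P, Q)) == trace(P) * trace(Q))
```

Combining the mixed-product rule with the trace rule is what allows the trace of the Kronecker-structured covariance term to factor into a transmit-side trace and a receive-side trace.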
With (1) we can rewrite (3.4) as g i = Pt GkBgk 2 tr(B T g t;g i B g ) P G z=1;z6=g Pt GkBzk 2 tr(B T z t;g i B z ) + 2 g i kW y g i Wg i k 2 tr(W y g i r;g i Wg i ) : (2) Then, to maximize g i is equivalent to maximize the term tr(W y g i r;g i Wg i ) kW y g i Wg i k 2 , which is independent from analog precoders [B g ] G g=1 and has the optimal solution consisting of d g i dominant eigenvectors of r;g i . .2 Proof of Theorem 1 By xing variables [ ~ F g i ] and [V g i ], we can observe that (3.11a) is a convex function with respect to [A g i ]. Based on the rst order optimality condition, we have A ? g i = ~ E 1 g i ;8i;g: (3) Similarly, by xing variables [V g i ] and [A g i ], we can observe that the ex- pression inside the trace operator of (3.11a) is a convex quadratic function with respect to [ ~ F g i ]. Therefore, the optimal ~ F ? g i following the rst order optimality condition is ~ F ? g i = ~ J 1 g i H g i ;g V g i ;8 i;g; (4) 111 where ~ J g i = H g i ;g P kg j=1 V g j V y g j H y g i ;g + P G z6=g P kz l=1 E[ H g i ;z V z l V y z l H y g i ;z j H g i ;g ]+ W y g i W g i 2 g i : (5) Substituting the optimal [A ? g i ] (3) and [ ~ F ? g i ] (4) back into (3.11a), we can reduce the problem (3.11) to max [Vg i ] P G g=1 P kg i=1 log 2 j ~ E 1 g i j;s:t: P G g=1 P kg i=1 tr(B g V g i V y g i B y g )P t ; (6) where ~ E g i now becomes a function only regarding to [V g i ], i.e., ~ E g i =I dg i V y g i H y g i ;g ~ J 1 g i H g i ;g V g i . Therefore, we have log 2 j ~ E 1 g i j = (x) log 2 jI dg i + V y g i H y g i ;g ( ~ J g i H g i ;g V g i V y g i H y g i ;g ) 1 H g i ;g V g i j = (y) log 2 jI l UE + H g i ;g V g i V y g i H y g i ;g ( ~ J g i H g i ;g V g i V y g i H y g i ;g ) 1 j (z) E[log 2 jI l UE + H g i ;g V g i V y g i H y g i ;g 1 g i j H g i ;g ]; (7) where equality (x) follows the Woodbury matrix identity, and equality (y) followsjI + ABj = jI + BAj. At (z), we apply the Jensen's inequality: E[logjI + X 1 j] logjI +E[X] 1 j. 
Meanwhile, from the denitions of [ g i ] and [ ~ J g i ] as exhibited in (3.3) and (5) respectively, we have E[ g i j H g i ;g ] = ~ J g i H g i ;g V g i V y g i H y g i ;g ,8g;i. Then, the proof is completed. .3 Proof of Remark 1 Substituting the channel model (2.2) into the determinant of (4.13), we can obtain j ~ F y a P K i=1 1 L i B i G y i y i A y i ~ W i ~ W y i A i i G i B y i ~ F a + I l BS j =j ~ F y a B y G y ~ QGB y ~ F a + I l BS j; (8) 112 where G = diag(G 1 ;:::; G K ). Based on (4.15) and (4.16), we expand (8) by determinants of its submatrices: j ~ F y a B y G y ~ QGB y ~ F a + I l BS j= (f) P l BS j=0 P ^ j j ~ F y a B y G y ~ QGB y ~ F a j ^ j ^ j = (g) P l BS j=0 P ^ j P ^ 1 j P ^ 2 j P ^ 3 j P ^ 4 j j ~ F y a B y j ^ j ^ 1 j jG y j ^ 1 j ^ 2 j j ~ Qj ^ 2 j ^ 3 j jGj ^ 3 j ^ 4 j jB y ~ F a j ^ 4 j ^ j = (h) P l BS j=0 P ^ j P ^ 1 j P ^ 3 j j ~ F y a B y j ^ j ^ 1 j jG y j ^ 1 j ^ 1 j j ~ Qj ^ 1 j ^ 3 j jGj ^ 3 j ^ 3 j jB y ~ F a j ^ 3 j ^ j ; (9) where equality (f) and (g) are obtained following (4.15) and (4.16), respec- tively. Note that G is a diagonal matrix. Therefore,jG y j ^ 1 j ^ 2 j = 0 if ^ 1 j 6= ^ 2 j andjGj ^ 3 j ^ 4 j = 0 if ^ 3 j 6= ^ 4 j , which leads to the equality (h). As we described in (2.2), only G contains small scale fading variables changing with the instantaneous channel, while other variables, i.e., [A i ], and B, remain constant for a long period of time. Consequently, taking the expectation over (9) develops the following expression P l BS j=0 P ^ j P ^ 1 j P ^ 3 j j ~ F y a B y j ^ j ^ 1 j E[jG y j ^ 1 j ^ 1 j jGj ^ 3 j ^ 3 j ]j ~ Qj ^ 1 j ^ 3 j jB y ~ F a j ^ 3 j ^ j = (i) P l BS j=0 P ^ j P ^ 1 j j ~ F y a B y j ^ j ^ 1 j j ~ Qj ^ 1 j ^ 1 j jB y ~ F a j ^ 1 j ^ j ; (10) where equality (i) follows from dierent MPCs exhibiting independent mean- zero small scale fading. 
To be specific, since $\mathbf{G}$ is diagonal, $|\mathbf{G}^{\dagger}|^{\hat{\alpha}^1_j}_{\hat{\alpha}^1_j}$ and $|\mathbf{G}|^{\hat{\alpha}^3_j}_{\hat{\alpha}^3_j}$ equal the products of the normalized small scale fading variables corresponding to the index sets $\hat{\alpha}^1_j$ and $\hat{\alpha}^3_j$, respectively. Thus, if $\hat{\alpha}^1_j = \hat{\alpha}^3_j$, then $\mathbb{E}\big[|\mathbf{G}^{\dagger}|^{\hat{\alpha}^1_j}_{\hat{\alpha}^1_j}|\mathbf{G}|^{\hat{\alpha}^3_j}_{\hat{\alpha}^3_j}\big] = 1$; otherwise it is 0.

.4 Proof of Lemma 1

We use proof by contradiction. Define $\mathbf{A} = \mathrm{diag}(\mathbf{A}_1,\ldots,\mathbf{A}_K)$, so that $\tilde{\mathbf{Q}} = \mathbf{A}^{\dagger}\tilde{\mathbf{W}}\tilde{\mathbf{W}}^{\dagger}\mathbf{A}$, where $\mathbf{A}$ is still a semi-unitary matrix:

$$\mathbf{A}^{\dagger}\mathbf{A} = \mathrm{diag}(\mathbf{A}_1^{\dagger}\mathbf{A}_1,\ldots,\mathbf{A}_K^{\dagger}\mathbf{A}_K) = \mathrm{diag}(\mathbf{I}_{P_1},\ldots,\mathbf{I}_{P_K}).$$

Assume that the optimal $[\tilde{\mathbf{W}}_i^{\star}]$ maximizing (4.14) under the power constraint makes $\tilde{\mathbf{Q}}^{\star} = \mathbf{A}^{\dagger}\tilde{\mathbf{W}}^{\star}\tilde{\mathbf{W}}^{\star\dagger}\mathbf{A}$ and, $\forall i$, $\tilde{\mathbf{Q}}_i^{\star} = \mathbf{A}_i^{\dagger}\tilde{\mathbf{W}}_i^{\star}\tilde{\mathbf{W}}_i^{\star\dagger}\mathbf{A}_i$ non-diagonal matrices. Then, we can always construct new precoders $\hat{\mathbf{W}}_i = \mathbf{A}_i\,\mathrm{diag}(\tilde{\mathbf{Q}}_i^{\star})^{\frac{1}{2}}$, $\forall i$, with which the transmit power satisfies

$$\begin{aligned}
&\mathrm{tr}(\hat{\mathbf{W}}_i\hat{\mathbf{W}}_i^{\dagger}) \overset{(j)}{=} \mathrm{tr}\big(\mathrm{diag}(\tilde{\mathbf{Q}}_i^{\star})\big) = \mathrm{tr}(\tilde{\mathbf{Q}}_i^{\star}) = \mathrm{tr}(\mathbf{A}_i^{\dagger}\tilde{\mathbf{W}}_i^{\star}\tilde{\mathbf{W}}_i^{\star\dagger}\mathbf{A}_i) \overset{(k)}{\le} \mathrm{tr}(\tilde{\mathbf{W}}_i^{\star}\tilde{\mathbf{W}}_i^{\star\dagger}) \\
&\rightarrow\ \sum_{i=1}^{K}\mathrm{tr}(\hat{\mathbf{W}}_i\hat{\mathbf{W}}_i^{\dagger})\,\sigma^2 \le \sum_{i=1}^{K}\mathrm{tr}(\tilde{\mathbf{W}}_i^{\star}\tilde{\mathbf{W}}_i^{\star\dagger})\,\sigma^2 \le P_t,
\end{aligned} \tag{11}$$

where equality (j) follows from the assumption that $\mathbf{A}_i$ is semi-unitary, and (k) is based on the fact that the trace of a positive semi-definite matrix projected onto a subspace, i.e., $\mathbf{A}_i^{\dagger}\tilde{\mathbf{W}}_i^{\star}\tilde{\mathbf{W}}_i^{\star\dagger}\mathbf{A}_i$, is no larger than the trace of the matrix itself. Hence, $[\hat{\mathbf{W}}_i]$ also satisfies the power constraint. Meanwhile, the expression of $\tilde{\mathbf{Q}}$ in (4.14) for our newly constructed $[\hat{\mathbf{W}}_i]$ becomes

$$\hat{\mathbf{Q}}_i = \mathbf{A}_i^{\dagger}\hat{\mathbf{W}}_i\hat{\mathbf{W}}_i^{\dagger}\mathbf{A}_i = \mathbf{A}_i^{\dagger}\mathbf{A}_i\,\mathrm{diag}(\tilde{\mathbf{Q}}_i^{\star})\,\mathbf{A}_i^{\dagger}\mathbf{A}_i = \mathrm{diag}(\tilde{\mathbf{Q}}_i^{\star})\ \overset{(l)}{\rightarrow}\ \big|\hat{\mathbf{Q}}\big|^{\hat{\alpha}_j}_{\hat{\alpha}_j} \ge \big|\tilde{\mathbf{Q}}^{\star}\big|^{\hat{\alpha}_j}_{\hat{\alpha}_j}, \quad \forall j \text{ and } \hat{\alpha}_j, \tag{12}$$

where step (l) follows from Hadamard's inequality. Since every sub-determinant of the new $\hat{\mathbf{Q}}$ is no smaller than that of $\tilde{\mathbf{Q}}^{\star}$, the weighted sum of them given in (4.14) is at least as large, which conflicts with the assumption that $\tilde{\mathbf{Q}}^{\star}$ is optimal. Therefore, $\tilde{\mathbf{Q}}^{\star}$ has to be a diagonal matrix.
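The appendix proofs rely on three standard matrix facts: the expansion of $|\mathbf{I}+\mathbf{M}|$ as a sum of principal minors (the kind of sub-determinant expansion used for Remark 1), the identity $|\mathbf{I}+\mathbf{A}\mathbf{B}| = |\mathbf{I}+\mathbf{B}\mathbf{A}|$ (used for Theorem 1), and Hadamard's inequality (used for Lemma 1). The following is a minimal numerical sanity check in Python with NumPy, not part of the dissertation. The matrix sizes, the random test data, and all helper names are assumptions chosen only for illustration.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)


def random_complex(rows, cols):
    """Illustrative test data: i.i.d. complex Gaussian entries."""
    return rng.standard_normal((rows, cols)) + 1j * rng.standard_normal((rows, cols))


def principal_minor(M, idx):
    """Determinant of the submatrix of M with rows and columns in idx."""
    return np.linalg.det(M[np.ix_(idx, idx)])


def index_sets(n):
    """All non-empty index subsets of {0, ..., n-1}."""
    for j in range(1, n + 1):
        yield from itertools.combinations(range(n), j)


n = 4
X = random_complex(n, n)
M = X @ X.conj().T  # Hermitian positive semi-definite test matrix

# (a) |I + M| equals 1 (empty index set, j = 0) plus the sum of all principal minors.
det_direct = np.linalg.det(np.eye(n) + M)
det_expanded = 1.0 + sum(principal_minor(M, idx) for idx in index_sets(n))

# (b) |I + AB| = |I + BA| for compatible rectangular A and B.
A = random_complex(4, 2)
B = random_complex(2, 4)
det_ab = np.linalg.det(np.eye(4) + A @ B)
det_ba = np.linalg.det(np.eye(2) + B @ A)

# (c) Hadamard: every principal minor of a PSD matrix is bounded by the
#     same minor of its diagonal part.
D = np.diag(np.diag(M))
hadamard_holds = all(
    principal_minor(M, idx).real <= principal_minor(D, idx).real * (1 + 1e-9)
    for idx in index_sets(n)
)

print(np.isclose(det_direct, det_expanded), np.isclose(det_ab, det_ba), hadamard_holds)
```

Check (c) mirrors the last step of the Lemma 1 argument: replacing a positive semi-definite matrix by its diagonal part cannot decrease any principal minor.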
Abstract
Deployed with a large number of antenna elements, massive multiple-input multiple-output (MIMO) systems will be an important component of next-generation wireless systems, because they provide a tremendous increase in spectral efficiency. However, two major challenges hinder the implementation of massive MIMO in a practical system. The first is the hardware constraint. Building a complete up- and down-conversion chain for every antenna element of a large array is not only cost-prohibitive but also entails significant power consumption, especially for those mixed-frequency signal components at millimeter-wave (mm-wave) frequencies. The second challenge is the acquisition of instantaneous channel state information (CSI) for large antenna arrays. Conventional downlink training, whose cost is proportional to the array size at the base station (BS), cannot be directly extended to a massive MIMO system. With the help of channel reciprocity, the uplink training cost is proportional to the number of antennas at the user equipments (UEs). However, if the total number of UE antennas is approximately the same as the size of the BS antenna array, the dilemma of instantaneous CSI acquisition remains. The severe Doppler effect at mm-wave frequencies, which leads to short coherence times, worsens the situation.

One possible solution to the above key challenges is the hybrid digital/analog (HDA) architecture, which decomposes the MIMO precoding/combining matrix into two concatenated beamforming matrices, realized in the analog and digital domains, respectively. This leads to a cost- and energy-efficient way to implement massive MIMO. To alleviate the burden of instantaneous CSI acquisition, the analog (pre-)beamforming can adapt to the channel statistics or, as we call it, long-term CSI, while the digital beamforming adapts to the effective channel projected by the analog beamforming.
Then, the dimension of the effective channel that needs to be trained is reduced from the number of antenna elements to the number of radio frequency (RF) chains. Since the stationarity region of the channel statistics can span tens or hundreds of coherence blocks, the long-term CSI can be acquired and tracked at a much slower rate, which significantly reduces the instantaneous CSI training effort.

Following the surge of mm-wave communications, it is necessary to build large arrays at both the BS and the UEs to combat the severe path loss and satisfy the required received signal strength. With both link ends equipped with the HDA architecture, and given the special channel characteristics at mm-wave frequencies, beamformer optimization for multi-user (MU-) MIMO encounters the following new constraints: i) four types of beamforming matrices are coupled together, i.e., the analog/digital precoders and the analog/digital combiners; ii) the analog and digital beamformers adapt on different time scales, i.e., to channel statistics and instantaneous CSI, respectively; iii) since the coherence time is much shorter at mm-wave frequencies, the impact of training cost needs to be considered jointly with beamformer optimization.

This dissertation develops beamforming strategies that leverage channel characteristics and the HDA architecture to make massive MIMO practical in the real world. Throughout the dissertation, we assume full knowledge of the second-order channel statistics, which are far less time-sensitive than the instantaneous CSI and can be tracked efficiently. Based on the mixed-time-scale design, the proposed solutions adopt semi-decoupling of the beamformers in the analog and digital domains to reduce the algorithm complexity.
The main contributions of this dissertation are: ❧ i) Low-complexity hybrid beamforming with fully decoupled analog and digital design: we first optimize the channel-statistics-based analog beamforming to maximize the average signal-to-interference-plus-noise ratio (SINR) by treating the digital beamformers as identity matrices. Then, with the analog beamforming fixed, the digital beamformers are optimized to maximize the spectral efficiency based on knowledge of the channel statistics and the effective instantaneous CSI; a partial feedback scheme is also investigated to further reduce the cost of CSI acquisition. ❧ ii) Optimal analog beamforming with capacity-achieving digital beamforming: rather than fixing the analog beamforming and then optimizing the digital one as above, we assume capacity-achieving digital beamforming and incorporate its coupling effect into the optimization of the channel-statistics-based analog beamforming. Once the analog beamforming has been optimized to maximize the ergodic downlink capacity, conventional digital beamforming strategies, e.g., block diagonalization, can be implemented. ❧ iii) Joint optimization of analog beamforming and training resource allocation: the overall network throughput also depends on the cost of instantaneous CSI acquisition. For given analog beamformers, parallel beam training is possible if the effective beam-to-UE channels are orthogonal across different UEs. Therefore, we extend the above optimization of analog beamforming to incorporate the impact of training cost, and reach a low-complexity solution that maximizes the network throughput. ❧ Optimality analyses and numerical simulations demonstrate the advantages of the proposed schemes over state-of-the-art methods, which makes them promising candidates for implementing practical massive MIMO.
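The two-stage structure described in the abstract can be illustrated with a toy example. The following is a minimal sketch only, not the dissertation's algorithm: it assumes a half-wavelength uniform linear array, two single-antenna UEs with one dominant path each, and uses the mean path angles as a stand-in for long-term CSI. The array size, angles, helper names (`steering`, `draw_channel`, `inverse_2x2`), and the choice of zero-forcing for the digital stage are all assumptions made for illustration.

```python
import cmath
import math
import random

N_ANT, N_RF = 64, 2          # BS antennas and RF chains (illustrative values)
MEAN_ANGLES = [-0.4, 0.5]    # assumed long-term CSI: mean path angle per UE (radians)

def steering(angle):
    """Unit-norm response of a half-wavelength ULA; each entry has modulus 1/sqrt(N_ANT)."""
    return [cmath.exp(1j * math.pi * n * math.sin(angle)) / math.sqrt(N_ANT)
            for n in range(N_ANT)]

def matmul(a, b):
    """Plain matrix product for matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def inverse_2x2(m):
    """Closed-form inverse of a 2 x 2 complex matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def draw_channel(rng):
    """One coherence block: row k is UE k's 1 x N_ANT channel with a random fading gain."""
    rows = []
    for angle in MEAN_ANGLES:
        gain = complex(rng.gauss(0, 1), rng.gauss(0, 1)) / math.sqrt(2)
        rows.append([gain * math.sqrt(N_ANT) * x.conjugate() for x in steering(angle)])
    return rows

# Analog stage: phase-only beams built from the mean angles alone, so they
# stay fixed over many coherence blocks.
beams = [steering(angle) for angle in MEAN_ANGLES]
F_RF = [list(col) for col in zip(*beams)]   # N_ANT x N_RF

# Digital stage: only the small effective channel is needed per coherence block.
rng = random.Random(0)
H = draw_channel(rng)                       # 2 x N_ANT instantaneous channel
H_eff = matmul(H, F_RF)                     # 2 x N_RF effective channel
F_BB = inverse_2x2(H_eff)                   # zero-forcing on the effective channel
end_to_end = matmul(H_eff, F_BB)            # close to the identity matrix

print("trained dimension per UE:", len(H[0]), "->", len(H_eff[0]))
```

In this sketch the per-block training only has to resolve the 2 x 2 matrix `H_eff` rather than the 2 x 64 matrix `H`, which is the dimension reduction from antenna elements to RF chains that the abstract refers to. Transmit power normalization is omitted for brevity.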
Asset Metadata
Creator
Li, Zheda (author)
Core Title
Hybrid beamforming for massive MIMO
School
Viterbi School of Engineering
Degree
Doctor of Philosophy
Degree Program
Electrical Engineering
Publication Date
08/09/2018
Defense Date
06/11/2018
Publisher
University of Southern California (original), University of Southern California. Libraries (digital)
Tag
5G, hybrid beamforming, massive MIMO, millimeter-wave, OAI-PMH Harvest
Format
application/pdf (imt)
Language
English
Contributor
Electronically uploaded by the author (provenance)
Advisor
Molisch, Andreas (committee chair), Avestimehr, Salman (committee member), Razaviyayn, Meisam (committee member)
Creator Email
zhedali@usc.edu
Permanent Link (DOI)
https://doi.org/10.25549/usctheses-c89-64058
Unique identifier
UC11670757
Identifier
etd-LiZheda-6709.pdf (filename), usctheses-c89-64058 (legacy record id)
Legacy Identifier
etd-LiZheda-6709.pdf
Dmrecord
64058
Document Type
Dissertation
Rights
Li, Zheda
Type
texts
Source
University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Access Conditions
The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the a...
Repository Name
University of Southern California Digital Library
Repository Location
USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA