Large system analysis of multi-cell MIMO downlink: fairness scheduling and inter-cell cooperation
(USC Thesis Other)
LARGE SYSTEM ANALYSIS OF MULTI-CELL MIMO DOWNLINK: FAIRNESS SCHEDULING AND INTER-CELL COOPERATION

by Hoon Huh

A Dissertation Presented to the FACULTY OF THE USC GRADUATE SCHOOL, UNIVERSITY OF SOUTHERN CALIFORNIA, In Partial Fulfillment of the Requirements for the Degree DOCTOR OF PHILOSOPHY (ELECTRICAL ENGINEERING)

August 2011

Copyright 2011 Hoon Huh

Dedication

This dissertation is dedicated to my dear wife, Eunho and two lovely daughters, Jane and Joan, with great appreciation and happiness.

Acknowledgements

First and foremost, I would like to express my sincere gratitude to my advisor, Prof. Giuseppe Caire for his guidance and support throughout the past five years. He showed me an example of a true scholar and teacher and I am really honored to be his student. This research work would not be possible without his great encouragement and inspiration.

I extend my sincere thanks to Prof. Urbashi Mitra and Prof. Leana Golubchik, who reviewed this dissertation. Their insightful comments and invaluable advice made this work technically sound and meaningful. I am deeply indebted to Dr. Haralabos Papadopoulos, Dr. Antonia Tulino, and Prof. Inkyu Lee, who are the coauthors of papers which led to the results of this dissertation. I was so fortunate to have opportunities to work together with each of them and have learned a lot from their integrity and erudition. I am grateful to my colleagues, Ozgun Bursalioglu and Song Nam Hong for their friendly help during my USC years and thank Samsung Electronics Co., Ltd. for giving me a valuable opportunity for the Ph.D study and supporting me for all that time.

Finally, I want to say thank you to my parents and family for their endless support, sacrifice, and love. First of all, I sincerely hope my father who is struggling against cancer will overcome it successfully and I will be able to repay my father and mother for their devotion to me for a long time. Without my dearest wife, Eunho, I could not have done anything.
Her full faith and love have been a source of power for me and I cannot express in words how grateful I am to her and also to her parents. My beloved daughters, Jane and Joan, I wish they will know someday that they are joy and hope every single day of my life.

Table of Contents

Dedication
Acknowledgements
List of Figures
Abstract
Chapter 1 Introduction
  1.1 Large System Analysis
  1.2 Multi-cell Coverage and Fairness Criteria
  1.3 Inter-cell Cooperation and Channel Training
  1.4 Massive MIMO
  1.5 Dissertation Outline
Chapter 2 Ergodic Capacity under Fairness Criteria
  2.1 Multi-cell Model and Capacity Region
  2.2 Weighted Ergodic Sum Rate Maximization
    2.2.1 Solution for Finite N
    2.2.2 Limit for N → ∞
    2.2.3 Computation of the Asymptotic Rates
    2.2.4 System Symmetry
  2.3 Fairness Scheduling
    2.3.1 Lagrangian Optimization
    2.3.2 Proportional Fairness Scheduling
    2.3.3 Hard Fairness Scheduling
  2.4 Numerical Results
  2.5 Proofs
    2.5.1 Proof of Lemma 2.1
    2.5.2 Proof of Lemma 2.2
    2.5.3 Proof of Lemma 2.3
    2.5.4 Proof of Theorem 2.2
Chapter 3 Inter-cell Cooperation and Impact of Channel Training
  3.1 Linear Zero-forcing Beamforming
    3.1.1 System Setup
    3.1.2 Downlink Scheduling Optimization Problem
    3.1.3 Power Allocation under Sum-power or Per-BS Power Constraints
  3.2 Large System Limit
    3.2.1 Asymptotic Analysis
    3.2.2 Weighted Sum-rate Maximization
    3.2.3 Optimization of the User Fractions and Powers
    3.2.4 Network Utility Function Maximization
  3.3 Channel Estimation and Non-Perfect CSIT
  3.4 Numerical Results and Simplified Scheduling
  3.5 Proofs
    3.5.1 Proof of Theorem 3.1
    3.5.2 Proof of Theorem 3.2
    3.5.3 Proof of Theorem 3.3
Chapter 4 Massive MIMO and Location-based Scheduling
  4.1 TDD Cellular System Model
    4.1.1 Cellular Layout and Frequency Reuse
    4.1.2 Channel Statistics and Received Signal Model
  4.2 Uplink Training and Channel Estimation
    4.2.1 Pilot Reuse Scheme
    4.2.2 MMSE Channel Estimation and Pilot Contamination
  4.3 MU-MIMO Precoders and Achievable Rates
    4.3.1 Beamforming
    4.3.2 Achievable Group Spectral Efficiency
  4.4 Scheduling and Fairness
  4.5 Numerical Results and Discussion
  4.6 Proofs
    4.6.1 Proof of Theorem 4.1
    4.6.2 Proof of Theorems 4.2 and 4.3
Chapter 5 Conclusions
References

List of Figures

1.1 Multi-cell MU-MIMO system with spatial user distribution.
1.2 Network-MIMO inter-cell cooperation.
1.3 Eigenvalue distribution of $\mathbf{H}\mathbf{H}^{\mathsf{H}}$ for $\beta = 2$. The solid curve denotes the Marčenko-Pastur density function and the bars denote the empirical distributions.
1.4 Wyner model with a one-dimensional linear cellular array.
1.5 Downlink beamforming in the FDD system.
2.1 Linear one-sided 2-cell model with B = 2 BSs and A = 8 user groups.
2.2 Utility function and individual group rates under PFS in the 2-cell model. Antenna ratio of 4 and K = 8 user groups.
2.3 Utility function and individual group rates under HFS in the 2-cell model. Antenna ratio of 4 and K = 8 user groups.
2.4 Two-dimensional three-sectored hexagonal 7-cell model.
2.5 Cluster forms for three levels of cooperation, with different colors denoting different clusters.
2.6 Ergodic user rate distribution under PFS in the 7-cell model. Antenna ratio of 4 and K = 84 user groups.
3.1 Cluster (B = 2) sum rate of the proposed optimization algorithm as 0 increases from 0 to = 4.
3.2 Cell m and m + 1 in a linear two-sided cellular array.
3.3 Cluster forms for B = 1, 2, and 8 cell cooperation.
3.4 User rate under perfect CSIT, obtained from large system analysis for B = 1, 2, and 8 and from finite dimension simulation with greedy user selection for B = 2 and N = 1, 2, 4, and 8. M = 8 cells and K = 64 user groups.
3.5 User rate under perfect CSIT in finite dimension (N = 2, 4, and 8) with random user selection and power allocation aided by the asymptotic results for B = 1, 2, and 8. M = 8 cells and K = 64 user groups.
3.6 Cell sum rate versus antenna ratio for B = 1, 2, and 8. M = 8 cells and K = 192 user groups.
3.7 Cell sum rate versus cluster size B for = 1, 2, 4, and 8 under non-perfect CSIT and explicit downlink training with p = . M = 24 cells and K = 192 user groups.
4.1 Two-dimensional hexagonal cell layout with B = 19. The triangle marks indicate the BS positions (points of bs) and the red triangle marks indicate the points of . The insides of the large thick-lined hexagon and the small hexagons denote V and V_b for b ∈ bs, respectively.
4.2 Cluster pattern geometry and user bins in one-dimensional and two-dimensional layouts.
4.3 One-dimensional layout with C = 2 and F = 2.
4.4 Pilot reuse and contamination for C = 2, F = 1, and Q = 2. The dashed lines show the contamination from a user sharing the same pilot signal, in another cluster.
4.5 Two cases of a precoding scheme for C = 2, F = 1, and Q = 2, with (a) J = Q and (b) J = C(Q − 1) + 1. The dashed lines indicate the channel vectors to out-of-cluster users for which a ZF constraint is imposed. In Figure (b), the light-shaded dashed lines indicate the channel vectors assumed zero in the beamforming calculation.
4.6 Bin spectral efficiency vs. location within a cell obtained from the large system analysis (solid) and the finite dimension (N = 1) simulation (dotted) for various (F, C, J). M = 30 and L = 40.
4.7 Optimal scheme at each user location. M = 20 and 100, K = 16, and L = 84.
4.8 Bin-optimized spectral efficiencies normalized by the (1, 1, 0) spectral efficiencies. M = 50, K = 48, and L = 84.
4.9 Cluster sum throughput vs. M for various (F, C, J) and for a bin-optimized architecture under PF scheduling. K = 48 and L = 84. The arrow indicates that the proposed architecture achieves the same spectral efficiency as the fixed scheme (1, 1, 0) of [Mar10], with a 10-fold reduction of the number of BS antennas.

Abstract

For the evolution of multiuser MIMO (MU-MIMO) technologies in the multi-cell environment, many studies are considering inter-cell cooperation schemes, which provide the benefits of inter-cell interference mitigation and/or higher antenna array gain from a number of cooperating transmit antennas.
In order to appreciate the full potential of such schemes for the MU-MIMO system, the following aspects need to be considered: multi-cell coverage with spatial user distribution and a realistic pathloss model; multiuser scheduling that takes fairness issues into account; the type of downlink precoding and the signal processing employed; the availability of channel state information at the transmitter; and the type of inter-cell cooperation and the corresponding system overhead. However, taking all these aspects into account in a single framework is very complicated, and its analytical characterization is generally difficult. So far, the system performance has been evaluated through computationally very intensive Monte Carlo simulation.

In this dissertation, an analytic tool is developed based on large random matrix theory. In the large system limit, where both the number of antennas per base station (BS) and the number of users per cell grow to infinity with a constant ratio, the randomness in the channels from BSs to users disappears and the MU-MIMO system becomes a deterministic network, which allows analytic and computationally efficient performance evaluation and optimization. Using this large system analysis, some relevant issues in the design of the MU-MIMO system in the multi-cell downlink are considered: 1) the ergodic capacity of the MU-MIMO system with inter-cell cooperation subject to general fairness criteria; 2) the trade-off between the benefit of inter-cell cooperation and the cost of estimating the cooperating cells' channels; and 3) the efficient achievement of the performance of the so-called "massive MIMO" system through user location-based scheduling and transmission strategies.

Chapter 1 Introduction

The next generation of wireless cellular systems, e.g., 802.16m [IEE10] and LTE-Advanced [3GP10a], is expected to capitalize on the large spectral efficiency gains promised by Multiuser Multiple-Input Multiple-Output (MU-MIMO) technologies.
A considerable research effort has been dedicated to the performance evaluation of MU-MIMO technologies in cellular environments [PDF+08, LSO09, FCD+09].

The downlink capacity in a single-cell setting has been thoroughly studied and characterized, with a full information-theoretic understanding of the underlying MIMO Gaussian broadcast channel (BC): the achievable rate region of the MIMO-BC was investigated by applying dirty paper coding (DPC) [CS03], and its sum capacity was determined through the uplink, using the duality between the BC and the multiple-access channel (MAC) [VT03, VJG03, JVG04, YC04]. The entire capacity region was characterized and shown to coincide with the DPC region [WSS06].

Figure 1.1: Multi-cell MU-MIMO system with spatial user distribution.

However, in a multi-cell scenario such as the one shown in Figure 1.1, we face a MIMO broadcast and interference channel that is not yet fully understood in an information-theoretic sense. A simple and practical approach consists of treating inter-cell interference (ICI) as noise; in this case, the system capacity may be significantly limited by the ICI. A variety of inter-cell cooperation schemes have been proposed to mitigate ICI, ranging from fully cooperative network MIMO [FKV06, JTS+07, BH07, CRP+08] to partially coordinated beamforming [DY10, HPC10, BPG+09]. For example, with the network MIMO scheme shown in Figure 1.2, a cluster of base stations (BSs) is connected to a cluster controller, and their transmit signals are jointly designed at the controller such that the BSs operate as a distributed antenna array of a single cell.
These cooperation schemes provide the benefits of ICI mitigation and/or larger antenna array gain. In order to appreciate the full potential and limitations of such schemes in a realistic wireless cellular setting, the following important aspects need to be considered: multi-cell coverage with spatial user distribution and a realistic pathloss model; multiuser scheduling that takes fairness issues into account; the type of downlink precoding and the signal processing employed; the availability of channel state information at the transmitter (CSIT); and the type of inter-cell cooperation and the corresponding system overheads.

Taking all these aspects into account in a single framework is very complicated, and its analytical characterization is generally difficult. So far, the system performance has been evaluated through computationally very intensive Monte Carlo simulation [HV05, BH07, ZMM+08, CRP+08, MF08, ZCA+09, HTH+09, RC09, PDF+08, LSO09, FCD+09].

Figure 1.2: Network-MIMO inter-cell cooperation.

In this dissertation, an alternative approach is proposed, based on "large system limit" analysis. We leverage results from large random matrix theory [TV04, Gir90, TLV05, TLV06, ABEH06] and asymptotically characterize the achievable rate region of the system in the limit where both the number of antennas per BS and the number of users per cell grow to infinity with a fixed ratio. In the large-system limit, the channel randomness disappears and the MU-MIMO system becomes a deterministic network. It follows that complicated system optimization and performance evaluation can be formulated as "static" problems that admit almost closed-form solutions. The solutions are obtained by solving a system of fixed-point equations and are particularly simple under certain system symmetry conditions that will be discussed later on.
The proposed approach is much more efficient than Monte Carlo simulation and provides a very accurate prediction of the performance of finite-dimensional systems, even for small dimensions.

1.1 Large System Analysis

In the large system analysis, the main tool is the rich and powerful theory of large random matrices [TV04, Gir90]: the system ergodic capacity or achievable rate in many cases of interest can be conveniently calculated by scaling the system size to infinity and making use of the limiting distributions of large random matrices. This approach, pioneered in [VS99, TV00] for the analysis of random-spreading code-division multiple access (CDMA), has been successfully applied to single-user MIMO channels [CTKV02, DP03, TLV05] and to network-MIMO cellular systems [ABEH06, HKD10, ZH10]. In addition, it was observed experimentally and proved mathematically (e.g., see [MSS03, KCM09]) that in most cases of interest the convergence of actual finite-dimensional systems to their large-system limit is very fast, so that the results predicted by the large-system analysis are very accurate approximations of the actual system performance for system sizes of practical interest.

To illustrate the usefulness of the large system analysis, we review the classical example of the single-user MIMO channel [TV04]. One channel use of the single-user MIMO channel is described by

\[
\mathbf{y} = \mathbf{H}\mathbf{x} + \mathbf{z}, \tag{1.1}
\]

where $\mathbf{x} \in \mathbb{C}^{M}$, $\mathbf{y} \in \mathbb{C}^{N}$, and $\mathbf{z} \sim \mathcal{CN}(\mathbf{0}, \mathbf{I}_N)$ denote the transmit signal, the received signal, and the i.i.d. complex Gaussian noise vector, respectively, and $\mathbf{H} \in \mathbb{C}^{N \times M}$ denotes the channel matrix with channel fading coefficients $[\mathbf{H}]_{n,m} \sim \mathcal{CN}(0, 1/N)$.
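The claim that the channel randomness averages out can be checked numerically for the model (1.1). The following Python sketch is an illustration added here, not code from the dissertation. It draws $\mathbf{H}$ with i.i.d. $\mathcal{CN}(0, 1/N)$ entries, averages the normalized log-determinant $\frac{1}{N}\log\det(\mathbf{I}_N + \mathsf{SNR}\,\mathbf{H}\mathbf{H}^{\mathsf{H}})$ over random realizations, and compares the result with the closed-form large-system limit stated as (1.4) later in this section. The function names, the use of natural logarithms (nats), the name `beta` for the ratio M/N, and the example values (SNR = 10, M/N = 2) are assumptions made for the example.

```python
import numpy as np


def closed_form_rate(snr, beta):
    """Large-system limit (1.4), in nats per receive antenna."""
    f = (np.sqrt(snr * (1 + np.sqrt(beta)) ** 2 + 1)
         - np.sqrt(snr * (1 - np.sqrt(beta)) ** 2 + 1)) ** 2
    return (beta * np.log(1 + snr - f / 4)
            + np.log(1 + snr * beta - f / 4)
            - f / (4 * snr))


def monte_carlo_rate(snr, n_rx, n_tx, trials, rng):
    """Average of (1/N) log det(I_N + SNR H H^H) over random H."""
    total = 0.0
    for _ in range(trials):
        # i.i.d. CN(0, 1/N) entries: real and imaginary parts
        # each have variance 1/(2N)
        h = (rng.standard_normal((n_rx, n_tx))
             + 1j * rng.standard_normal((n_rx, n_tx))) / np.sqrt(2 * n_rx)
        eig = np.linalg.eigvalsh(h @ h.conj().T)  # real eigenvalues of H H^H
        total += np.mean(np.log1p(snr * np.clip(eig, 0.0, None)))
    return total / trials


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    snr, beta = 10.0, 2.0  # assumed example values
    print("closed form:", closed_form_rate(snr, beta))
    for n in (4, 16, 64):
        print("N =", n, monte_carlo_rate(snr, n, int(beta * n), 200, rng))
```

Running the script prints the closed-form value next to Monte Carlo averages for increasing N, so the speed of convergence can be inspected directly. The closed form costs a handful of arithmetic operations, whereas each Monte Carlo estimate needs one eigenvalue decomposition per trial.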
Assuming that the channel matrix $\mathbf{H}$ is perfectly known at the receiver but unknown at the transmitter, the normalized mutual information of this open-loop single-user MIMO channel given $\mathbf{H}$ is given by

\[
\frac{1}{N} I(\mathbf{x}; \mathbf{y} \mid \mathbf{H})
= \frac{1}{N} \log\det\left(\mathbf{I}_N + \mathsf{SNR}\,\mathbf{H}\mathbf{H}^{\mathsf{H}}\right)
= \frac{1}{N} \sum_{i=1}^{N} \log\left(1 + \mathsf{SNR}\,\lambda_i(\mathbf{H}\mathbf{H}^{\mathsf{H}})\right)
= \int_0^{\infty} \log(1 + \mathsf{SNR}\,x)\, dF^{N}_{\mathbf{H}\mathbf{H}^{\mathsf{H}}}(x), \tag{1.2}
\]

where the transmit signal-to-noise ratio (SNR) is given by

\[
\mathsf{SNR} = \frac{N\,\mathbb{E}[\|\mathbf{x}\|^2]}{M\,\mathbb{E}[\|\mathbf{z}\|^2]}
\]

and $F^{N}_{\mathbf{H}\mathbf{H}^{\mathsf{H}}}$ denotes the empirical cumulative distribution function of the eigenvalues $\{\lambda_1(\mathbf{H}\mathbf{H}^{\mathsf{H}}), \ldots, \lambda_N(\mathbf{H}\mathbf{H}^{\mathsf{H}})\}$ of the matrix $\mathbf{H}\mathbf{H}^{\mathsf{H}}$, defined as

\[
F^{N}_{\mathbf{H}\mathbf{H}^{\mathsf{H}}}(x) = \frac{1}{N} \sum_{n=1}^{N} 1\{\lambda_n(\mathbf{H}\mathbf{H}^{\mathsf{H}}) \le x\},
\]

with $1\{\cdot\}$ denoting the indicator function. Taking the expectation of (1.2) with respect to the distribution of $\mathbf{H}$, we obtain the ergodic channel capacity.

For $\mathbf{H}$ with i.i.d. zero-mean elements of variance $1/N$, the limiting empirical distribution of the eigenvalues of $\mathbf{H}\mathbf{H}^{\mathsf{H}}$ is given by the Marčenko-Pastur law [MP67]. Specifically, as $M, N \to \infty$ with $M/N \to \beta$, the empirical distribution of the eigenvalues of $\mathbf{H}\mathbf{H}^{\mathsf{H}}$ converges almost surely to the Marčenko-Pastur probability density function given by

\[
f_{\beta}(x) = [1 - \beta]^{+}\,\delta(x) + \frac{\sqrt{[x - a]^{+}\,[b - x]^{+}}}{2\pi x}, \tag{1.3}
\]

where $[\cdot]^{+} = \max(\cdot, 0)$, $a = (1 - \sqrt{\beta})^2$, and $b = (1 + \sqrt{\beta})^2$. Figure 1.3 shows the Marčenko-Pastur density for $\beta = 2$ together with examples of the empirical distribution for N = 16, 64, and 256, each for a single channel realization. As N grows, the randomness becomes smaller and smaller and the empirical distributions converge to the Marčenko-Pastur density function, which enables an analytical evaluation that is independent of the instantaneous channel realization in the large system limit. In this regime of $N \to \infty$, we have the closed-form expression for (1.2) [VS99]

\[
\frac{1}{N} \log\det\left(\mathbf{I}_N + \mathsf{SNR}\,\mathbf{H}\mathbf{H}^{\mathsf{H}}\right)
\to \int_a^b \log(1 + \mathsf{SNR}\,x)\, f_{\beta}(x)\, dx
= \beta \log\left(1 + \mathsf{SNR} - \tfrac{1}{4}\mathcal{F}(\mathsf{SNR}, \beta)\right)
+ \log\left(1 + \mathsf{SNR}\,\beta - \tfrac{1}{4}\mathcal{F}(\mathsf{SNR}, \beta)\right)
- \frac{\log e}{4\,\mathsf{SNR}}\,\mathcal{F}(\mathsf{SNR}, \beta), \tag{1.4}
\]

where

\[
\mathcal{F}(x, z) = \left(\sqrt{x(1 + \sqrt{z})^2 + 1} - \sqrt{x(1 - \sqrt{z})^2 + 1}\right)^2 .
\]

This simple expression is computationally much more efficient than Monte Carlo simulation, which typically evaluates the ergodic capacity by averaging (1.2) over a great number of realizations of $\mathbf{H}$.

Figure 1.3: Eigenvalue distribution of $\mathbf{H}\mathbf{H}^{\mathsf{H}}$ for $\beta = 2$, with panels (a) N = 16, (b) N = 64, and (c) N = 256 (horizontal axis: x; vertical axis: probability density function). The solid curve denotes the Marčenko-Pastur density function and the bars denote the empirical distributions.

In this dissertation, we consider cellular downlink environments with multiple users and distance-dependent pathloss, where $\mathbf{H}$ has independent but non-identically distributed elements and inter-cell interference exists. Using the large system analysis under such system settings, we treat various evaluation and optimization problems with inter-cell cooperation, fairness scheduling, capacity-achieving or practical precoding, and explicit channel training according to the system duplex method. In particular, we focus on the issues described in the following sections.

1.2 Multi-cell Coverage and Fairness Criteria

The first simple and analytically tractable model for a multi-cell system with inter-cell cooperation was introduced by Wyner [Wyn94] for the uplink, and was extended to the downlink in [SZS07] by means of uplink-downlink duality [JVG04].
The Wyner model considers either a one-dimensional linear cellular array, as shown in Figure 1.4, or a two-dimensional hexagonal cellular pattern, where only interference from adjacent cells is accounted for and where the channel path coefficients are either 1 (between a BS and the users inside its cell) or $\alpha \in [0, 1]$ (between a BS and the users in the nearest adjacent cells). In Wyner's model, the system capacity takes on an elegant single-letter closed-form expression in the case of full joint processing of all cells and no fading. Wyner's setting was extended in several works [SW97, SS00, SZS07, SSS+08, SSPS09]. Single-cell processing and joint two-cell processing were investigated by treating the inter-cell interference as Gaussian noise in [SW97], and the flat-fading channel case with full joint cell processing was treated in [SS00]. This model was modified and extended to take into account various issues such as soft hand-off and limited inter-cell cooperation due to constrained backhaul capacity (see [SZS07, SSS+08, SSPS09] and references therein).

Although the Wyner model captures some fundamental aspects of the multi-cell problem, its rather unrealistic pathloss assumption makes the system essentially symmetric with respect to any user. More realistically, as shown in Figure 1.1, users in different locations of the cellular coverage region are subject to distance-dependent pathloss that may have more than 30 dB of dynamic range [WiM06], and therefore they are in fundamentally asymmetric conditions. It follows that characterizing the sum-capacity (or the achievable sum-throughput under some suboptimal scheme) is rather meaningless from a system performance viewpoint, unless some appropriate notion of fairness is taken into account.
In fact, if the sum-throughput is the only objective, the resulting rate and power optimization under distance-dependent pathloss leads to a solution that serves only the users close to their BS, while leaving the users at the cell edge to starve.

Figure 1.4: Wyner model with a one-dimensional linear cellular array.

As a matter of fact, "fairness" is a fundamental aspect of cellular networks. The problem of downlink scheduling subject to some fairness criterion has been widely studied (see, for example, [VTL02, GNT06, SMCN10] and references therein). The goal of fairness scheduling is to make the system operate at a point of its ergodic achievable rate region such that a suitable concave and increasing network utility function is maximized [MW00]. By choosing the shape of the network utility function, a desired fairness criterion can be enforced. The framework of stochastic network optimization [GNT06] can be leveraged to systematically devise scheduling algorithms that perform arbitrarily close to the optimal achievable fairness point, even when the explicit computation of the achievable ergodic rate region is hopelessly complicated. The fairness operating point is given as the time-averaged rate obtained by applying a dynamic scheduling algorithm on a slot-by-slot basis. Hence, its analytical characterization is generally very difficult, and the system performance is typically evaluated by letting the scheduling algorithm evolve in time and computing the time-averaged rates by Monte Carlo simulation [HV05, BH07, ZMM+08, CRP+08, MF08, ZCA+09, HTH+09, RC09, PDF+08, LSO09, FCD+09].

However, using the large system analysis, the performance of dynamic fairness scheduling can be computed by solving a "static" convex optimization problem.
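The effect of the utility function on the operating point can be seen in a deliberately small example. The sketch below is an illustration added here under strong simplifying assumptions, not the dissertation's model: two users share one link by time division, each user's peak rate is log2(1 + SNR), and the two SNRs differ by 30 dB to mimic the pathloss dynamic range mentioned above. The function names, the SNR values, and the grid search are all choices made for the example. It compares the time split that maximizes the sum rate with the split that maximizes the sum of log-rates (proportional fairness).

```python
import numpy as np


def sum_rate(rates):
    """Utility that only rewards total throughput."""
    return float(np.sum(rates))


def proportional_fair(rates):
    """Concave, increasing utility: sum of log-rates."""
    return float(np.sum(np.log(rates)))


def best_time_share(snr_db, utility):
    """Grid-search the time fraction t given to user 0 (user 1 gets 1 - t)."""
    peak = np.log2(1 + 10 ** (np.asarray(snr_db, dtype=float) / 10))
    best_t, best_rates, best_score = None, None, -np.inf
    for t in np.linspace(0.001, 0.999, 999):
        rates = np.array([t * peak[0], (1 - t) * peak[1]])
        score = utility(rates)
        if score > best_score:
            best_t, best_rates, best_score = t, rates, score
    return best_t, best_rates


if __name__ == "__main__":
    snr_db = [20.0, -10.0]  # assumed near-user and cell-edge SNRs
    for name, utility in (("sum rate", sum_rate),
                          ("proportional fair", proportional_fair)):
        t, rates = best_time_share(snr_db, utility)
        print(name, "time share of near user:", t, "rates:", rates)
```

With the sum-rate utility the search pushes the time share of the near user to the edge of the grid, which is the starvation effect described above. With the log utility the two users receive equal time, since for a time-division region the sum of log-rates depends on t only through log t + log(1 - t).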
By incorporating large random matrix theory into the convex optimization solution, we obtain the system ergodic capacity subject to any desired concave and componentwise non-decreasing network utility function of the users' ergodic rates. The solution is obtained by iteratively solving a system of fixed-point equations, which becomes particularly simple when each cooperation cluster satisfies certain symmetry conditions. It also matches very closely the simulation results for finite-dimensional systems with dirty paper coding and the dynamic scheduling policy. The considered model encompasses arbitrary user locations and distance-dependent pathloss, and it allows arbitrary inter-cell cooperation clusters, where the BSs in the same cluster operate as a distributed antenna array (full cooperation) and inter-cluster interference is treated as Gaussian noise (no inter-cluster cooperation). As special cases, we recover conventional cellular systems (no inter-cell cooperation) and the case of full cooperation.

1.3 Inter-cell Cooperation and Channel Training

It is well appreciated that increasing the effective number of transmit antennas through inter-cell cooperation improves the spectral efficiency of the MU-MIMO downlink system, but it can also encounter a "dimensionality bottleneck" due to the requirements for channel state information at the transmitter (CSIT) for the increased number of cooperating transmit antennas [RC09, RCP09, HTC10]. In the frequency-division duplex (FDD) system in particular, CSIT acquisition is more challenging: CSIT is obtained via the closed-loop operation of channel training and feedback, which requires considerable system overhead in both the downlink and the uplink. In general, the downlink beamforming of the FDD system takes the following five phases [CJKR10]: common training, channel state information (CSI) feedback, beamforming, dedicated training, and data transmission, as illustrated in Figure 1.5.
Figure 1.5: Downlink beamforming in the FDD system.

When we consider inter-cell cooperation, the training overhead (pilot symbols) required to collect CSIT grows at least as fast as the number of cooperating transmit antennas, and therefore there is a non-trivial tradeoff between the ICI reduction owing to inter-cell cooperation and the cost of estimating higher and higher dimensional channels. We observe that there exists an optimal cooperation cluster size that depends on the channel coherence time and bandwidth, beyond which inter-cell cooperation is no longer advantageous, consistent with the finite-dimensional simulation findings of [RC09, CRP10]. For this purpose, we focus on Linear Zero-Forcing BeamForming (LZFBF), apply the aforementioned large system analysis to LZFBF, and extend the analysis to the case where the CSI is obtained through explicit downlink training and MMSE estimation. Under these conditions, we obtain a lower bound on the achievable ergodic rates that takes into account the overhead due to training-based channel estimation. This analysis also allows precise performance evaluation of systems for which brute-force Monte Carlo simulation would be very demanding. We note that the performance of LZFBF MU-MIMO with non-ideal CSIT was extensively studied in the finite-dimensional case (e.g., [DLZ07, JAWV08, CJKR10]) and in the large-system limit (e.g., [WCDS09, HKD10, ZH10]). Unlike these concurrent works, this dissertation considers system optimization under fairness criteria in the multi-cell downlink with inter-cell cooperation.
Taking one step further, we make use of this asymptotic analysis to design a random scheduling algorithm that pre-selects the users with assigned probabilities obtained from the large-system results, and that therefore requires much less CSIT feedback than the standard opportunistic scheduling scheme. This shows that the asymptotic results can be used not only for performance analysis but also for MU-MIMO system design. The standard approach to scheduling for downlink beamforming consists of having a large number of users feed back their CSIT and selecting a subset of users, with cardinality not larger than the number of jointly coordinated transmit antennas, such that the channel vectors of the selected users have large norm and are approximately mutually orthogonal [DS05, YG06]. This user selection, combined with LZFBF precoding, is shown to attain the same performance as Gaussian DPC in the limit of a large number of users and a fixed number of transmit antennas. However, in this limit, the throughput per user vanishes as $O\left(\frac{\log\log n}{n}\right)$, where n is the number of users. Therefore, a more meaningful regime is one in which the number of users is proportional to the number of antennas, yielding constant throughput per user.

Comparing the results of the asymptotic analysis with Monte Carlo simulation of finite-dimensional systems, including user selection as described above, we notice that multiuser diversity yields larger throughput per user for low-dimensional systems, but this gain shrinks as the system dimension grows. This is a manifestation of the "channel hardening effect" noticed in [HMT04], and it agrees with the theoretical findings in [TCFB09], which show that the probability of finding a subset of approximately orthogonal users vanishes as the system dimension increases. It follows that for large systems there is diminishing return in selecting users from a large set.
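The shrinking selection gain can be illustrated with a short simulation. The Python sketch below is an illustration added here, not the dissertation's experiment, and it makes several assumptions that should be kept in mind: channels are i.i.d. Rayleigh with unit variance and no pathloss, the "best" user is chosen by channel norm only (the orthogonality criterion is ignored), and the function name and parameter values are arbitrary. It estimates the ratio between the best user's normalized channel gain and the average gain as the number of antennas grows.

```python
import numpy as np


def selection_gain(n_ant, n_users, trials, rng):
    """Best user's normalized channel gain relative to the average gain."""
    shape = (trials, n_users, n_ant)
    # i.i.d. CN(0, 1) channel coefficients
    h = (rng.standard_normal(shape)
         + 1j * rng.standard_normal(shape)) / np.sqrt(2)
    gains = np.sum(np.abs(h) ** 2, axis=2) / n_ant  # ||h_k||^2 / N per user
    return float(gains.max(axis=1).mean() / gains.mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for n_ant in (1, 4, 16, 64):
        print("N =", n_ant, "gain ratio:", selection_gain(n_ant, 16, 500, rng))
```

Because the normalized gain of each user concentrates around its mean as the number of antennas increases, the ratio printed by the script moves toward 1, which is the channel hardening behavior referred to above.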
In contrast, the cost of CSIT feedback grows at least linearly with the number of users feeding back their estimated CSIT. Therefore, we consider a probabilistic scheduling algorithm in which users are pre-selected at random using the probabilities derived from the large-system analysis, and only the selected users are required to feed back their CSIT. The performance of this scheme is shown to be close to that of the much more costly full user selection scheme, and the gap narrows as the system dimension increases, owing to the large-system limit and the channel hardening effect.

1.4 Massive MIMO

As mentioned in the previous section, it is well known that the performance improvement of MU-MIMO systems by inter-cell cooperation schemes such as "network-MIMO" (e.g., [BPG+09, DY10, HPC10, FKV06, JTS+07, BH07, CRP+08]) is limited by the "dimensionality" bottleneck [RC09, RCP09, HTC10]. In particular, the high-SNR capacity of a single-user MIMO system with $N_t$ transmit antennas, $N_r$ receiving antennas, and a fading coherence block length of $T$ complex dimensions¹ scales as $C(\mathsf{SNR}) = \min\{N_t, N_r, T/2\} \log \mathsf{SNR} + O(1)$ [MH99, ZT02]. Therefore, even by pooling all base stations into a single distributed macro-transmitter with $N_t \gg 1$ antennas and all user terminals into a single distributed macro-receiver with $N_r \gg 1$ antennas, the system degrees of freedom² are eventually limited by the fading coherence block length $T$.

While this dimensionality bottleneck is an inherent fact, emerging from the high-SNR behavior of the capacity of MIMO block-fading channels [ZT02], [HH03] noticed that the same behavior also characterizes the capacity scaling of systems based on explicit training for channel estimation. Therefore, it can be interpreted as an effect of the "overhead" incurred by pilot signals. For the FDD systems described in the previous section, the training overhead required to collect CSIT grows linearly with the number of cooperating transmit antennas [CJKR10].
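A back-of-the-envelope accounting makes the role of the pilot overhead concrete. The model used below is an assumption of this example, not a formula taken from the dissertation: if m active transmit antennas each need one pilot dimension per coherence block of T dimensions, a fraction m/T of the block is spent on training, and the pre-log factor left for data is m(1 - m/T). The Python sketch (function names are hypothetical) searches for the number of active antennas that maximizes this quantity.

```python
def net_prelog(n_active, block_len):
    """Pre-log left for data when n_active dimensions per block carry pilots."""
    return n_active * (1 - n_active / block_len)


def best_antenna_count(n_tx, n_rx, block_len):
    """Number of active antennas that maximizes the net pre-log factor."""
    candidates = range(1, min(n_tx, n_rx, block_len) + 1)
    return max(candidates, key=lambda m: net_prelog(m, block_len))


if __name__ == "__main__":
    # Assumed example: many antennas, coherence block of 100 dimensions
    print(best_antenna_count(200, 200, 100))
    # Assumed example: few antennas, so the antenna count is the limit
    print(best_antenna_count(8, 8, 100))
```

Under this simplified accounting, activating more than T/2 antennas reduces the net pre-log because the additional training cost outweighs the additional spatial dimensions, which is consistent with the T/2 term inside the minimum above.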
Such overhead restricts the MU-MIMO benefits that can be harvested with a large number of transmit antennas [RCP09, HTC10]. For time division duplexing (TDD) systems, the CSIT can be obtained from uplink training by exploiting channel reciprocity [Mar06, Mar10], which states that the uplink and downlink channels between a pair of antennas are strongly correlated within the same channel coherence block. In this case, the pilot signal overhead scales linearly with the number of active users per cell, but it is independent of the number of cooperating antennas at the BSs. As a result, for a fixed number of users scheduled for transmission, the TDD system performance can be significantly improved by increasing the number of BS antennas.

Footnote 1: The fading coherence block length $T$, measured in signal complex "dimensions" in the time-frequency domain, is proportional to the product $W_c T_c$, where $T_c$ (s) denotes the channel coherence interval and $W_c$ (Hz) denotes the channel coherence bandwidth [Pro00].

Footnote 2: The system Degrees of Freedom (DoFs) are defined as the pre-log factor of the system capacity $C(\mathrm{SNR})$, i.e., $\mathrm{DoFs} = \lim_{\mathrm{SNR}\to\infty} \frac{C(\mathrm{SNR})}{\log \mathrm{SNR}}$, and quantify the number of "equivalent" parallel single-user Gaussian channels, in a first-order approximation with respect to $\log \mathrm{SNR}$.

Following this idea, Marzetta [Mar10] has shown that simple Linear Single-User BeamForming (LSUBF) and random user scheduling, without any inter-cell cooperation, yield unprecedented spectral efficiency in TDD cellular systems when a sufficiently large number of transmit antennas per active user is employed at each BS. This scheme, called "Massive MIMO", was analyzed in the limit of an infinite number of BS antennas per user per cell.
In this regime, the effects of Gaussian noise and uncorrelated inter-cell interference disappear, and the only remaining impairment is the inter-cell interference due to pilot contamination [JAMV11], i.e., the correlated interference from other cells due to users re-using the same pilot signal.

In this dissertation, we also focus on TDD systems, exploit reciprocity, and propose a novel network-MIMO architecture that achieves spectral efficiencies comparable with "Massive MIMO" with a more practical number of BS antennas per active user (one order of magnitude fewer antennas to achieve approximately the same spectral efficiency). As in [Mar10], we analyze the proposed system in the limit of a large number of antennas. However, a different system scaling is considered, where the number of antennas per active user per cell is finite. This is obtained by letting the number of users per cell, the number of antennas per BS, and the channel coherence block length go to infinity with fixed ratios [HTC10]. We find that in this regime the LSUBF scheme advocated in [Mar10] performs very poorly. In contrast, we consider a family of network-MIMO schemes based on small clusters of cooperating base stations, Linear Zero-Forcing BeamForming (LZFBF) with suitable inter-cluster interference constraints, uplink pilot signal reuse across cells, and frequency reuse. The key idea consists of partitioning the user population into geographically determined "bins" containing statistically equivalent users, and optimizing the network-MIMO scheme for each individual bin. Then, users in different bins are scheduled over the time-frequency slots in order to maximize some desired "fairness" network utility function.
The geographic nature of the proposed scheme yields very simple system operations: each time a given bin is scheduled, a subset of active users in the selected bin is chosen at random or in a round-robin fashion, without performing any CSIT-based user selection. This allows a fast turn-around between feedback and transmission, which can take place in the same channel coherence block. The resulting architecture is a mixed-mode network-MIMO, where different schemes, each optimized for the served user bin, are multiplexed in time-frequency.

Using results and tools from the large-system analysis, we obtain the asymptotic achievable rate for each scheme. The performance predicted by the large-system analysis matches finite-dimensional simulations very well, and the large-system analysis provides an accurate and rapid way to compare different network-MIMO schemes and select the best in the family for each scheduled location, without resorting to cumbersome and time-consuming Monte Carlo simulation. In fact, the system parameters in the considered family are strongly mutually dependent, and the system optimization would simply be infeasible without our analytical tools.

1.5 Dissertation Outline

The remainder of this dissertation is organized as follows.

In Chapter 2, the ergodic capacity of the multi-cell MU-MIMO downlink system is considered under general fairness criteria. We present the MU-MIMO downlink system model with inter-cell cooperation, formulate the fairness scheduling problem, and develop the numerical solution for the input covariance maximizing the weighted ergodic sum rate in the large-system limit. Then, we use these results to obtain a semi-analytic method to calculate the optimal ergodic fairness rate point in the asymptotic regime.
Examples of these asymptotic rates are shown in 2-cell linear and 7-cell planar models and are compared with finite-dimensional simulation results obtained by the combination of DPC and the actual dynamic scheduling scheme based on stochastic optimization.

In Chapter 3, we focus on the large-system analysis of LZFBF under non-perfect CSIT and investigate the trade-off between the benefits of inter-cell cooperation and the accompanying costs of channel training. We take the large-system limit and present the large-system regime of the LZFBF precoder and the optimization algorithm for user selection and power allocation. The opportunistic fairness scheduling scheme is described again, and the impact of non-perfect CSIT and training overhead is analyzed. Numerical results and the low-complexity randomized scheduling scheme are also presented.

In Chapter 4, the location-adaptive scheduling and transmission strategy is proposed for the TDD-based MU-MIMO system. We describe the family of proposed network-MIMO schemes and discuss the uplink training, MMSE channel estimation, and pilot contamination effect for TDD-based systems. We also analyze the network-MIMO architectures under consideration and provide expressions for their achievable rates in the large-system limit. The spectral efficiency under specific fairness criteria and some numerical results, including a comparison with finite-dimensional simulation results, are presented.

The dissertation is concluded in Chapter 5 with final remarks on its contributions.

Chapter 2
Ergodic Capacity under Fairness Criteria

In this chapter, we evaluate the ergodic capacity of the multi-cell MU-MIMO downlink system with arbitrary cluster arrangements of cooperating cells, subject to general fairness criteria. First, we present the finite-dimensional system model with inter-cell cooperation clusters and formulate the fairness scheduling problem.
We take the large-system limit and develop the numerical solution for the input covariance maximizing the weighted ergodic sum rate. Then, we combine these results with Lagrangian optimization and obtain a semi-analytic method to compute the optimal ergodic fairness rate point in the asymptotic regime. Numerical examples are presented for 2-cell linear and 7-cell planar models and are compared with finite-dimensional simulation results obtained by the combination of DPC and the actual dynamic scheduling scheme based on stochastic optimization.

2.1 Multi-cell Model and Capacity Region

We consider a cellular system formed by $M$ base stations (BSs), each equipped with multiple antennas. Each cell contains single-antenna user terminals. For the sake of the large-system analysis developed in this work, we shall scale the number of antennas per BS and the number of users per cell to infinity, such that their ratio remains constant. Furthermore, for analytical simplicity, we consider a spatial discretization of the user positions, such that the whole user population is partitioned into $K$ co-located groups of $N$ users each. When we scale the system to its large-system limit, we let the parameter $N$ (the number of users in each user group) go to infinity. We let $\gamma > 0$ indicate the ratio between the number of BS antennas and the number of users in each group; therefore, the number of BS antennas is equal to $\gamma N$.

Users in the same group are statistically equivalent: since they are co-located, they experience the same set of pathloss coefficients with respect to the BSs, and their small-scale fading channel coefficients are mutually independent and identically distributed (i.i.d.). In practice, it is reasonable to assume that co-located users are separated by a sufficient number of wavelengths such that they undergo i.i.d.
small-scale fading (due to local scattering effects and small movements of the users around their nominal position), but the wavelength is sufficiently small that they all have essentially the same distance-dependent pathloss from any given BS. Users in different groups generally observe different sets of pathlosses, depending on their relative positions with respect to the BSs.

We remark here that the assumption of $K$ co-located user groups is inessential for this analysis to hold, but it significantly simplifies the fixed-point equations that we have to solve iteratively to obtain the desired results (see Sections 2.2.2 and 2.2.3). It is possible to extend these results to continuous user spatial distributions (e.g., uniform over each cell). However, in order to solve the resulting fixed-point equations such as (2.22), we have to discretize the domain of certain integrals, and therefore in practice we go back to the discretized case. Here, we avoid this complication by assuming a discrete user distribution from the very beginning.

We assume a block-fading model where the channel coefficients are constant over time-frequency "slots" determined by the channel coherence time and bandwidth, and change according to some well-defined ergodic process from slot to slot. In contrast, the distance-dependent pathloss coefficients are constant in time. This is representative of a typical situation where the distance between BSs and users changes significantly over a time scale of the order of tens of seconds, while the small-scale fading decorrelates completely within a few milliseconds [TV05]. The slot index shall be denoted by $t$, but we will omit $t$ for notational simplicity whenever possible. We shall make explicit reference to the time slot when discussing the dynamic fairness scheduling policy in Section 2.3.
One channel use of the multi-cell MU-MIMO downlink is described by

$$\mathbf{y}_k = \sum_{m=1}^{M} \alpha_{m,k}\, \mathbf{H}_{m,k}^{\mathsf H}\, \mathbf{x}_m + \mathbf{n}_k, \qquad k = 1,\dots,K \tag{2.1}$$

where $\mathbf{y}_k = [y_{k,1} \cdots y_{k,N}]^{\mathsf T} \in \mathbb{C}^{N}$ denotes the received signal vector for the $k$-th user group, $\alpha_{m,k}$ and $\mathbf{H}_{m,k}$ denote the distance-dependent pathloss and a $\gamma N \times N$ channel matrix collecting the small-scale channel fading coefficients from the $m$-th BS to the $k$-th user group, respectively, $\mathbf{x}_m = [x_{m,1} \cdots x_{m,\gamma N}]^{\mathsf T} \in \mathbb{C}^{\gamma N}$ is the signal vector transmitted by the $m$-th BS, and $\mathbf{n}_k = [n_{k,1} \cdots n_{k,N}]^{\mathsf T} \in \mathbb{C}^{N}$ denotes the AWGN at the user receivers in the $k$-th user group. The elements of $\mathbf{n}_k$ and of $\mathbf{H}_{m,k}$ are i.i.d. $\sim \mathcal{CN}(0,1)$. We assume a per-BS average power constraint expressed by $\operatorname{tr}(\operatorname{Cov}(\mathbf{x}_m)) \le P_m$, where $P_m > 0$ denotes the total transmit power of the $m$-th BS.

We consider a multi-cell cooperative system where cells are partitioned into cooperation clusters. Each cluster acts effectively as a distributed MU-MIMO system, by jointly processing all the antennas of all BSs in the cluster and by serving the users in the groups located in the cells forming the cluster. Since the clusters form a partition of the set of cells, no user group can be served by more than one cluster. We assume that each cluster has perfect channel state information for all the users associated with the cluster, and has statistical information (i.e., known distributions but not the instantaneous values) relative to signals from other clusters. Within these channel state information assumptions, we consider ideal joint processing of all BSs in the same cluster. Inter-Cluster Interference (ICI) is treated as additional Gaussian noise.

Let $L$ denote the number of cooperation clusters. We define the partition $\{\mathcal{M}_1,\dots,\mathcal{M}_L\}$ of the BS set $\{1,\dots,M\}$, and the corresponding partition $\{\mathcal{K}_1,\dots,\mathcal{K}_L\}$ of the user group set $\{1,\dots,K\}$, where $\mathcal{M}_\ell$ and $\mathcal{K}_\ell$ denote the sets of BSs and user groups forming the $\ell$-th cooperation cluster.
In the given cellular networks, the cluster size, denoted by $(|\mathcal{M}_\ell|, |\mathcal{K}_\ell|)$, represents the level of inter-cell cooperation and grows as more BSs cooperate. For example, in the three-sectored hexagonal 7-cell network serving $K = 84$ user groups (4 groups per sector), which will be described in Section 2.4, there are one BS and 4 user groups in each of the 21 sectors, and we consider three levels of cooperation: (a) no cooperation, (b) three-sector cooperation within each cell, and (c) full cooperation over the 21 sectors, as illustrated in Figure 2.5, where different colors denote different clusters. We consider same-sized clusters and have (a) $L = 21$ clusters of size (1,4), (b) $L = 7$ clusters of size (3,12), and (c) $L = 1$ cluster of size (21,84), respectively.

We assume that clusters are selfish and use all available transmit power, with no consideration for the ICI that they may cause to other clusters. Hence, the ICI plus noise variance at any user terminal in group $k \in \mathcal{K}_\ell$ is given by

$$\sigma_k^2 = \mathbb{E}\left[\frac{1}{N}\left\| \sum_{m \notin \mathcal{M}_\ell} \alpha_{m,k}\, \mathbf{H}_{m,k}^{\mathsf H}\, \mathbf{x}_m + \mathbf{n}_k \right\|^2\right] = 1 + \sum_{m \notin \mathcal{M}_\ell} \alpha_{m,k}^2 P_m. \tag{2.2}$$

From well-known worst-case noise results [HH03], since users in each cluster have only statistical information about the ICI, treating the noise plus ICI as a white complex circularly symmetric Gaussian noise with variance $\sigma_k^2$ yields an inner bound to the achievable rate region of the clusters. Therefore, from the viewpoint of cluster $\mathcal{M}_\ell$, the system is equivalent to a single-cell MU-MIMO downlink with a per-group-of-antennas power constraint, where each antenna group corresponds to one BS's antennas, and with AWGN power at the user receivers given by (2.2). Hence, from now on, we shall focus on a reference cluster (say, $\ell$) and simplify the notation. We let $B = |\mathcal{M}_\ell|$ and $A = |\mathcal{K}_\ell|$ denote the number of BSs and user groups in the cluster, and enumerate the BSs and the user groups forming the cluster as $m = 1,\dots,B$ and $k = 1,\dots,A$, respectively.
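As a minimal illustration of (2.2), the following sketch computes the ICI plus noise variance for each user group of a reference cluster. The function name, the list-of-lists layout of the pathloss coefficients, and the numerical values are assumptions made here for illustration only; they do not come from the dissertation.

```python
def ici_plus_noise_variance(alpha, power, cluster_bs):
    """Evaluate sigma_k^2 = 1 + sum_{m not in cluster} alpha[m][k]^2 * P_m, as in (2.2).

    alpha[m][k]: pathloss coefficient from BS m to user group k (hypothetical values)
    power[m]:    transmit power P_m of BS m (all BSs transmit at full power)
    cluster_bs:  set of BS indices forming the reference cluster
    """
    num_bs = len(alpha)
    num_groups = len(alpha[0])
    sigma2 = []
    for k in range(num_groups):
        # Only BSs outside the reference cluster contribute interference.
        ici = sum(alpha[m][k] ** 2 * power[m]
                  for m in range(num_bs) if m not in cluster_bs)
        sigma2.append(1.0 + ici)  # unit-variance AWGN plus ICI power
    return sigma2


# Toy example: BSs 0 and 1 cooperate, BS 2 is an outside interferer.
alpha = [[1.0, 0.5], [0.5, 1.0], [0.1, 0.2]]
power = [10.0, 10.0, 10.0]
sigma2 = ici_plus_noise_variance(alpha, power, cluster_bs={0, 1})
```

In this toy setting only BS 2 contributes to the sum, so the two groups see different effective noise levels even though the thermal noise is the same.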
Also, we define the modified path coefficients $\beta_{m,k} = \frac{\alpha_{m,k}}{\sigma_k}$ and the cluster channel matrix

$$\widetilde{\mathbf{H}} = \begin{bmatrix} \beta_{1,1}\mathbf{H}_{1,1} & \cdots & \beta_{1,A}\mathbf{H}_{1,A} \\ \vdots & \ddots & \vdots \\ \beta_{B,1}\mathbf{H}_{B,1} & \cdots & \beta_{B,A}\mathbf{H}_{B,A} \end{bmatrix}. \tag{2.3}$$

Hence, one channel use of the reference cluster downlink is given by

$$\mathbf{y} = \widetilde{\mathbf{H}}^{\mathsf H} \mathbf{x} + \mathbf{v} \tag{2.4}$$

where $\mathbf{y} \in \mathbb{C}^{AN}$, $\mathbf{x} \in \mathbb{C}^{\gamma BN}$, and $\mathbf{v} \sim \mathcal{CN}(\mathbf{0}, \mathbf{I})$ (we drop the subscript $\ell$ for notational simplicity).

It is well known that the boundary of the capacity region of the MIMO BC (2.4), for a fixed channel matrix $\widetilde{\mathbf{H}}$ and given per-group-of-antennas power constraints $\{P_1,\dots,P_B\}$, can be characterized by the solution of a min-max weighted sum-rate problem [Yu06, ZZL+09, HPC10]. For reasons that will be clear when discussing the scheduling policy in Section 2.3, we restrict ourselves to the case of identical weights for all statistically equivalent users, i.e., the case where users in the same group have the same weight for their individual rates. We let $W_k$ and $R_k(\widetilde{\mathbf{H}}) = \frac{1}{N}\sum_{i=1}^{N} R_{k,i}(\widetilde{\mathbf{H}})$ denote the weight for user group $k$ and the corresponding instantaneous per-user rate, respectively. In this dissertation, we refer to as "instantaneous" the quantities that depend on the realization of the channel matrix $\widetilde{\mathbf{H}}$. Since this changes from slot to slot, instantaneous quantities also change accordingly.

We let $\pi$ denote the permutation that sorts the weights in increasing order, $W_{\pi_1} \le \cdots \le W_{\pi_A}$, and use the subscript $[\pi_k : \pi_A]$ to indicate quantities involving user groups from $\pi_k$ to $\pi_A$ (see footnote 1). Then, we let $\widetilde{\mathbf{H}}_k$ denote the $k$-th $\gamma BN \times N$ slice of $\widetilde{\mathbf{H}}$ in (2.3), and $\mathbf{Q}_k = \operatorname{diag}(q_{k,1},\dots,q_{k,N})$ denote an $N \times N$ non-negative definite diagonal matrix of the dual uplink users' transmit powers (see below). Furthermore, we define $\mathbf{Q} = \operatorname{diag}(\mathbf{Q}_1,\dots,\mathbf{Q}_A)$ and, for a given permutation $\pi$, we let $\widetilde{\mathbf{H}}_{\pi_k:\pi_A} = \left[\widetilde{\mathbf{H}}_{\pi_k} \cdots \widetilde{\mathbf{H}}_{\pi_A}\right]$ and $\mathbf{Q}_{\pi_k:\pi_A} = \operatorname{diag}(\mathbf{Q}_{\pi_k},\dots,\mathbf{Q}_{\pi_A})$.

Footnote 1: It is understood that matrices with subscripts $k_1 : k_2$ with $k_1 > k_2$ are empty (equal to all-zero matrices of appropriate dimension).
The rate point $\{R_k(\widetilde{\mathbf{H}}; W_1,\dots,W_A)\}$ on the boundary of the instantaneous capacity region corresponding to weights $\{W_1,\dots,W_A\}$ is obtained as the solution of the min-max problem [Yu06,ZZL+09,HPC10]

$$\min_{\boldsymbol{\lambda} \ge 0}\; \max_{\mathbf{Q} \ge 0}\; \sum_{k=1}^{A} W_{\pi_k} R_{\pi_k}(\widetilde{\mathbf{H}}) \tag{2.5}$$

where the instantaneous per-user rate of each group takes on the expression

$$R_{\pi_k}(\widetilde{\mathbf{H}}) = \frac{1}{N} \log \frac{\left|\boldsymbol{\Sigma}(\boldsymbol{\lambda}) + \widetilde{\mathbf{H}}_{\pi_k:\pi_A} \mathbf{Q}_{\pi_k:\pi_A} \widetilde{\mathbf{H}}_{\pi_k:\pi_A}^{\mathsf H}\right|}{\left|\boldsymbol{\Sigma}(\boldsymbol{\lambda}) + \widetilde{\mathbf{H}}_{\pi_{k+1}:\pi_A} \mathbf{Q}_{\pi_{k+1}:\pi_A} \widetilde{\mathbf{H}}_{\pi_{k+1}:\pi_A}^{\mathsf H}\right|}, \tag{2.6}$$

where $\boldsymbol{\Sigma}(\boldsymbol{\lambda})$ is a $\gamma BN \times \gamma BN$ block-diagonal matrix with $\gamma N \times \gamma N$ constant diagonal blocks $\lambda_m \mathbf{I}_{\gamma N}$, for $m = 1,\dots,B$, and the maximization with respect to $\mathbf{Q}$ is subject to the trace constraint

$$\operatorname{tr}(\mathbf{Q}) \le \sum_{m=1}^{B} \lambda_m P_m. \tag{2.7}$$

The variables $\boldsymbol{\lambda} = \{\lambda_m\}$ are the Lagrange multipliers corresponding to the per-group-of-antennas power constraints.

The rate $R_{\pi_k}(\widetilde{\mathbf{H}})$ in (2.6) can be interpreted as the instantaneous per-user rate of user group $\pi_k$ in the dual vector Multiple Access Channel (MAC) with worst-case noise defined by

$$\mathbf{r} = \sum_{k=1}^{A} \widetilde{\mathbf{H}}_k \mathbf{s}_k + \mathbf{z} \tag{2.8}$$

where $\operatorname{Cov}(\mathbf{s}_k) = \mathbf{Q}_k$ and $\mathbf{z} \sim \mathcal{CN}(\mathbf{0}, \boldsymbol{\Sigma}(\boldsymbol{\lambda}))$. In this "dual uplink" interpretation, the group sum-rate expression (2.6) corresponds to group-wise successive interference cancellation, where user groups are decoded successively in the order $\pi_1, \pi_2, \dots, \pi_A$, and users in each group are jointly decoded. Also, notice that users in group $\pi_k$ in general do not individually achieve the rate $R_{\pi_k}(\widetilde{\mathbf{H}})$ on every slot. Rather, this rate is the aggregate sum rate of all users in group $\pi_k$, normalized by $N$, i.e., the mean user rate of group $\pi_k$ for given $\widetilde{\mathbf{H}}$. Efficient interior-point methods to solve (2.5) are given, for example, in [Yu06,ZZL+09,HPC10]. Yet, the solution of this problem is numerically fairly involved, especially for large dimensions.

Consistently with the assumption of fixed coefficients $\{\beta_{m,k}\}$ and ergodic block-fading for the small-scale fading coefficients $\{\mathbf{H}_{m,k}\}$, the ergodic capacity region of the MU-MIMO downlink channel (2.4) is given by the set of all achievable average rates, where averaging is with respect to the small-scale fading coefficients.
In particular, an inner bound to the ergodic capacity region is given by

$$\mathcal{C}(P_1,\dots,P_B) = \operatorname{coh} \bigcup_{W_1,\dots,W_A \ge 0} \Big\{ \mathbf{R} : 0 \le R_{k,i} \le \mathbb{E}\big[R_k(\widetilde{\mathbf{H}}; W_1,\dots,W_A)\big],\; \forall k = 1,\dots,A,\; \forall i = 1,\dots,N \Big\} \tag{2.9}$$

where "coh" indicates the closure of the convex hull. The achievability of the above region is clear: all users $i$ in group $k$ are statistically equivalent and therefore they can achieve the same ergodic rate. Notice that $\mathcal{C}(P_1,\dots,P_B)$ is generally an inner bound because of the restriction of the weights in (2.5) to be identical for all users in the same group. We will see later that, for fairness scheduling in the limit of $N \to \infty$, this limitation becomes immaterial.

At this point we can formulate the fairness scheduling problem. Let $g(\mathbf{R})$ denote a strictly increasing and concave network utility of the ergodic user rates. While the channel fading coefficients change from time slot to time slot according to some ergodic process, the optimal scheduling policy dynamically allocates the transmit powers and the DPC precoding order so that the system operates at the ergodic rate point that solves:

$$\begin{aligned} \text{maximize} \quad & g(\mathbf{R}) \\ \text{subject to} \quad & \mathbf{R} \in \mathcal{C}(P_1,\dots,P_B) \end{aligned} \tag{2.10}$$

Different fairness criteria can be enforced by choosing the function $g(\cdot)$ appropriately [MW00]. For example, proportional fairness [VTL02, BBG+00, PELT06] is obtained by letting $g(\mathbf{R}) = \sum_{k,i} \log R_{k,i}$, and max-min fairness is obtained by letting $g(\mathbf{R}) = \min_{k,i} R_{k,i}$.

We notice here that an analytical computation of the ergodic rate point $\mathbf{R}^\star$ achieving the optimum in (2.10) is difficult in general, and has escaped clean closed-form solutions even in much simpler cases. However, by applying the general stochastic optimization framework of [GNT06] (see also [SMCN10], more specifically targeted to the MU-MIMO downlink), explicit scheduling policies can be designed such that the limit of the time-averaged user rates converges to $\mathbf{R}^\star$. The rest of this chapter is dedicated to finding an efficient method to directly compute $\mathbf{R}^\star$
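, by exploiting large random matrix theory and convex optimization.

2.2 Weighted Ergodic Sum Rate Maximization

In this section, we consider the solution of the following preliminary problem. For fixed $\{\lambda_m\}$, we consider the weighted average sum-rate maximization

$$\begin{aligned} \text{maximize} \quad & \sum_{k=1}^{A} W_{\pi_k} \frac{1}{N}\, \mathbb{E}\left[ \log \frac{\left|\boldsymbol{\Sigma}(\boldsymbol{\lambda}) + \widetilde{\mathbf{H}}_{\pi_k:\pi_A} \mathbf{Q}_{\pi_k:\pi_A} \widetilde{\mathbf{H}}_{\pi_k:\pi_A}^{\mathsf H}\right|}{\left|\boldsymbol{\Sigma}(\boldsymbol{\lambda}) + \widetilde{\mathbf{H}}_{\pi_{k+1}:\pi_A} \mathbf{Q}_{\pi_{k+1}:\pi_A} \widetilde{\mathbf{H}}_{\pi_{k+1}:\pi_A}^{\mathsf H}\right|} \right] \\ \text{subject to} \quad & \operatorname{tr}(\mathbf{Q}) \le Q \end{aligned} \tag{2.11}$$

where we define $Q = \sum_{m=1}^{B} \lambda_m P_m$. The maximization in (2.11) is with respect to the diagonal elements $\{q_{k,i}\}$ of $\mathbf{Q}$, which can be functions of the channel statistics, but not of their instantaneous values (that is, $\mathbf{Q}$ does not depend on $\widetilde{\mathbf{H}}$, but only on its statistics). Letting $\Delta_k = W_{\pi_k} - W_{\pi_{k-1}}$ with $W_{\pi_0} = 0$, the objective function in (2.11) can be written as

$$F_{W,\boldsymbol{\lambda}}(\mathbf{Q}) = \sum_{k=1}^{A} \Delta_k \frac{1}{N}\, \mathbb{E}\left[\log \left|\boldsymbol{\Sigma}(\boldsymbol{\lambda}) + \widetilde{\mathbf{H}}_{\pi_k:\pi_A} \mathbf{Q}_{\pi_k:\pi_A} \widetilde{\mathbf{H}}_{\pi_k:\pi_A}^{\mathsf H}\right|\right] - W_{\pi_A} \frac{1}{N} \log\left|\boldsymbol{\Sigma}(\boldsymbol{\lambda})\right| \tag{2.12}$$

In this form, problem (2.11) is clearly convex, since $F_{W,\boldsymbol{\lambda}}(\mathbf{Q})$ in (2.12) is a concave function of $\mathbf{Q}$. In addition, we can prove the following symmetry result:

Lemma 2.1 The optimal $\mathbf{Q}$ in (2.11) allocates equal power to the users in the same group.

Proof See Section 2.5.1.

Using Lemma 2.1, we restrict the optimization in (2.12) to block-diagonal matrices $\mathbf{Q}$ with constant diagonal blocks $\mathbf{Q}_k = \frac{Q_k}{N}\mathbf{I}$. Furthermore, we have:

Lemma 2.2 The optimal $\boldsymbol{\lambda}^\star$ for the min-max problem (2.5) are strictly positive, i.e., $\boldsymbol{\lambda}^\star > 0$.

Proof See Section 2.5.2.

Using Lemma 2.2, we define $\mathbf{H}_k = \frac{1}{\sqrt{N}} \boldsymbol{\Sigma}^{-1/2}(\boldsymbol{\lambda})\, \widetilde{\mathbf{H}}_k$ and rewrite the objective function, with some abuse of notation, as

$$F_{W,\boldsymbol{\lambda}}(Q_1,\dots,Q_A) = \sum_{k=1}^{A} \Delta_k \frac{1}{N}\, \mathbb{E}\left[\log \left|\mathbf{I} + \sum_{\ell=k}^{A} \mathbf{H}_{\pi_\ell} \mathbf{H}_{\pi_\ell}^{\mathsf H} Q_{\pi_\ell}\right|\right] \tag{2.13}$$

where the trace constraint in (2.11) becomes $\sum_{k=1}^{A} Q_k \le Q$.

2.2.1 Solution for Finite $N$

The Lagrangian function of problem (2.11) is given by

$$\mathcal{L}(Q_1,\dots,Q_A; \xi) = F_{W,\boldsymbol{\lambda}}(Q_1,\dots,Q_A) - \xi\left(\sum_{k=1}^{A} Q_k - Q\right)$$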
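(2.14)

Using the differentiation rule $\partial \log|\mathbf{X}| = \operatorname{tr}(\mathbf{X}^{-1}\partial\mathbf{X})$, we write the KKT conditions as

$$\frac{\partial \mathcal{L}}{\partial Q_{\pi_j}} = \sum_{k=1}^{j} \frac{\Delta_k}{N}\, \mathbb{E}\left[\operatorname{tr}\left(\mathbf{H}_{\pi_j}^{\mathsf H} \left[\mathbf{I} + \sum_{\ell=k}^{A} \mathbf{H}_{\pi_\ell}\mathbf{H}_{\pi_\ell}^{\mathsf H} Q_{\pi_\ell}\right]^{-1} \mathbf{H}_{\pi_j}\right)\right] - \xi \le 0 \tag{2.15}$$

for $j = 1,\dots,A$, where equality must hold at the optimal point for all $j$ such that $Q_{\pi_j} > 0$. Here $\xi$ denotes the Lagrange multiplier of the trace constraint. After some algebra and the application of the Sherman-Morrison-Woodbury matrix inversion lemma [HJ85], the trace in (2.15) can be rewritten in a more convenient form:

$$\frac{1}{N}\operatorname{tr}\left(\mathbf{H}_{\pi_j}^{\mathsf H} \left[\mathbf{I} + \sum_{\ell=k}^{A} \mathbf{H}_{\pi_\ell}\mathbf{H}_{\pi_\ell}^{\mathsf H} Q_{\pi_\ell}\right]^{-1} \mathbf{H}_{\pi_j}\right) = \frac{1}{N}\operatorname{tr}\left(\mathbf{H}_{\pi_j}^{\mathsf H} \boldsymbol{\Theta}_{k:A\setminus j}^{-1} \mathbf{H}_{\pi_j} \left[\mathbf{I} + Q_{\pi_j} \mathbf{H}_{\pi_j}^{\mathsf H} \boldsymbol{\Theta}_{k:A\setminus j}^{-1} \mathbf{H}_{\pi_j}\right]^{-1}\right) = \frac{1 - \mathsf{mmse}^{(j)}_{k:A}}{Q_{\pi_j}} \tag{2.16}$$

where we let $\boldsymbol{\Theta}_{k:A\setminus j} = \mathbf{I} + \sum_{\ell=k,\,\ell\ne j}^{A} \mathbf{H}_{\pi_\ell}\mathbf{H}_{\pi_\ell}^{\mathsf H} Q_{\pi_\ell}$ and where we define $\mathsf{mmse}^{(j)}_{k:A}$ as follows: consider the observation model

$$\mathbf{r}_{[k:A]} = \sum_{\ell=k}^{A} \sqrt{Q_{\pi_\ell}}\, \mathbf{H}_{\pi_\ell} \mathbf{s}_{\pi_\ell} + \mathbf{z} \tag{2.17}$$

where $\mathbf{s}_{\pi_k}, \mathbf{s}_{\pi_{k+1}}, \dots, \mathbf{s}_{\pi_A}$ and $\mathbf{z}$ are independent Gaussian vectors with i.i.d. components $\sim \mathcal{CN}(0,1)$. Then, $\mathsf{mmse}^{(j)}_{k:A}$ denotes the per-component MMSE for the estimation of $\mathbf{s}_{\pi_j}$ from $\mathbf{r}_{[k:A]}$, for fixed (known) matrices $\mathbf{H}_{\pi_k},\dots,\mathbf{H}_{\pi_A}$. Explicitly, we have

$$\mathsf{mmse}^{(j)}_{k:A} = \frac{1}{N}\operatorname{tr}\left(\mathbf{I} - Q_{\pi_j} \mathbf{H}_{\pi_j}^{\mathsf H}\left[\mathbf{H}_{\pi_j}\mathbf{H}_{\pi_j}^{\mathsf H} Q_{\pi_j} + \boldsymbol{\Theta}_{k:A\setminus j}\right]^{-1} \mathbf{H}_{\pi_j}\right) = \frac{1}{N}\operatorname{tr}\left(\left[\mathbf{I} + Q_{\pi_j} \mathbf{H}_{\pi_j}^{\mathsf H} \boldsymbol{\Theta}_{k:A\setminus j}^{-1} \mathbf{H}_{\pi_j}\right]^{-1}\right) \tag{2.18}$$

Using (2.16) in (2.15) and solving for the Lagrange multiplier, we find

$$\xi = \frac{1}{Q} \sum_{\ell=1}^{A} \sum_{k=1}^{\ell} \Delta_k \left(1 - \mathbb{E}\big[\mathsf{mmse}^{(\ell)}_{k:A}\big]\right) \tag{2.19}$$

Finally, we arrive at the conditions

$$Q_{\pi_j} = Q\, \frac{\sum_{k=1}^{j} \Delta_k \left(1 - \mathbb{E}\big[\mathsf{mmse}^{(j)}_{k:A}\big]\right)}{\sum_{\ell=1}^{A}\sum_{k=1}^{\ell} \Delta_k \left(1 - \mathbb{E}\big[\mathsf{mmse}^{(\ell)}_{k:A}\big]\right)} \tag{2.20}$$

for all $j$ such that $Q_{\pi_j} > 0$, and, using the KKT conditions and (2.19), we find that the inequality

$$Q \sum_{k=1}^{j} \frac{\Delta_k}{N}\, \mathbb{E}\left[\operatorname{tr}\left(\mathbf{H}_{\pi_j}^{\mathsf H} \boldsymbol{\Theta}_{k:A\setminus j}^{-1} \mathbf{H}_{\pi_j}\right)\right] \le \sum_{\ell=1}^{A}\sum_{k=1}^{\ell} \Delta_k \left(1 - \mathbb{E}\big[\mathsf{mmse}^{(\ell)}_{k:A}\big]\right) \tag{2.21}$$

must hold for all $j$ such that $Q_{\pi_j} = 0$. We have thus proved the following result:

Theorem 2.1 The solution $Q_1^\star,\dots,Q_A^\star$ of problem (2.11) is given as follows. For all $j$ for which (2.21) is satisfied, $Q_{\pi_j}^\star = 0$. Otherwise, the positive $Q_{\pi_j}^\star$ satisfy (2.20).

In finite dimension, an iterative algorithm that provably converges to the solution can be obtained as a simple modification of [TLV06, Algorithm 1].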
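The "some algebra" behind (2.16) can be spelled out in a few lines. The following is only a sketch: the shorthand $\mathbf{X}$ and the plain scalar $Q$ are introduced here for illustration, and $\boldsymbol{\Theta}_{k:A\setminus j}$ stands for the matrix obtained by removing the contribution of group $\pi_j$ from the sum inside the inverse in (2.15), with $j \ge k$.

```latex
% Shorthand used only in this sketch (not part of the original notation):
%   X = H_{\pi_j}^H \Theta_{k:A\setminus j}^{-1} H_{\pi_j}  (an N x N matrix),   Q = Q_{\pi_j} > 0.
\begin{align*}
\mathbf{I} + \sum_{\ell=k}^{A} \mathbf{H}_{\pi_\ell}\mathbf{H}_{\pi_\ell}^{\mathsf H} Q_{\pi_\ell}
  &= \boldsymbol{\Theta}_{k:A\setminus j} + Q\,\mathbf{H}_{\pi_j}\mathbf{H}_{\pi_j}^{\mathsf H}
  && \text{(isolate group } \pi_j\text{)} \\
\mathbf{H}_{\pi_j}^{\mathsf H}\bigl(\boldsymbol{\Theta}_{k:A\setminus j}
  + Q\,\mathbf{H}_{\pi_j}\mathbf{H}_{\pi_j}^{\mathsf H}\bigr)^{-1}\mathbf{H}_{\pi_j}
  &= \mathbf{X}\,(\mathbf{I} + Q\mathbf{X})^{-1}
  && \text{(matrix inversion lemma)} \\
Q\,\mathbf{X}\,(\mathbf{I} + Q\mathbf{X})^{-1}
  &= \mathbf{I} - (\mathbf{I} + Q\mathbf{X})^{-1} \\
\frac{1}{N}\operatorname{tr}\bigl(\mathbf{X}(\mathbf{I}+Q\mathbf{X})^{-1}\bigr)
  &= \frac{1 - \frac{1}{N}\operatorname{tr}\bigl((\mathbf{I}+Q\mathbf{X})^{-1}\bigr)}{Q}
   = \frac{1 - \mathsf{mmse}^{(j)}_{k:A}}{Q}
  && \text{(by (2.18))}
\end{align*}
```

The last line uses $\frac{1}{N}\operatorname{tr}(\mathbf{I}) = 1$, which holds because $\mathbf{X}$ is $N \times N$.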
The computational cost is very high because the average MMSE terms must be computed by Monte Carlo simulation. Since our emphasis is on the solution in the limit for $N \to \infty$, we omit these details and focus on the infinite-dimensional case in Section 2.2.2. In addition, we have not yet addressed the outer minimization with respect to the Lagrange multipliers $\{\lambda_m\}$. We postpone this issue to Section 2.2.4, where we discuss system symmetry conditions under which the solution under the per-BS power constraint coincides with that under the laxer per-cluster sum-power constraint. In this case, we can let $\lambda_m = 1$ for all $m$, and no minimization with respect to $\boldsymbol{\lambda}$ is needed.

2.2.2 Limit for $N \to \infty$

In this section, we consider problem (2.11) in the limit for $N \to \infty$, making use of the asymptotic random matrix results of [TLV05]. In this regime, the instantaneous per-user rates in (2.5) converge to their expected values by well-known convergence results on the empirical distribution of the log-determinants in the rate expression (2.6) [TV04,Gir90]. Hence, in the large-system regime, the solution of (2.11) coincides with that of (2.5), for fixed channel pathloss coefficients $\{\beta_{m,k}\}$, transmit power constraints $\{P_m\}$, weights $\{W_k\}$, and Lagrange multipliers $\{\lambda_m\}$. We will use this fact in Section 2.3, where we will examine a general dynamic fairness scheduling policy for the actual (finite-dimensional) system and study its performance in the large-system regime.

We introduce the normalized row and column indices $r$ and $t$, taking values in $[0,1)$, and the aspect ratio of the matrix $\mathbf{H}$, i.e., the ratio of the number of columns to the number of rows, given by $\nu = \frac{A}{\gamma B}$. Then, we define the following piecewise constant functions:

$Q(t)$: (dual uplink) transmit power profile such that $Q(t) = Q_{\pi_k}$ for $\frac{k-1}{A} \le t < \frac{k}{A}$.

$G(r,t)$: channel gain profile of the matrix $\mathbf{H}$ such that $G(r,t) = \beta_{m,\pi_k}^2/\lambda_m$ for $\frac{m-1}{B} \le r < \frac{m}{B}$ and $\frac{k-1}{A} \le t < \frac{k}{A}$.
$\Upsilon_{k:A}(t)$: average per-component MMSE profile of the observation model (2.17), defined for $\frac{k-1}{A} \le t < 1$, such that $\Upsilon_{k:A}(t) = \mathsf{mmse}^{(j)}_{k:A}$ for $t$ in the interval associated with group $\pi_j$.

$\Gamma_{k:A}(t)$: signal-to-interference-plus-noise ratio (SINR) profile corresponding to $\Upsilon_{k:A}(t)$, such that $\Gamma_{k:A}(t) = 1/\Upsilon_{k:A}(t) - 1$.

In the limit of large $N$, these functions satisfy the equations given by the following lemma:

Lemma 2.3 As $N \to \infty$, for each $k = 1,\dots,A$, the SINR functions $\Gamma_{k:A}(t)$ satisfy the fixed-point equation

$$\Gamma_{k:A}(t) = \int_0^1 \frac{\gamma B\, G(r,t)\, Q(t)\, dr}{1 + \nu \int_{(k-1)/A}^{1} \frac{\gamma B\, G(r,\tau)\, Q(\tau)\, d\tau}{1 + \Gamma_{k:A}(\tau)}} \tag{2.22}$$

Also, the asymptotic $\Upsilon_{k:A}(t)$ is given in terms of the asymptotic $\Gamma_{k:A}(t)$ as $\Upsilon_{k:A}(t) = 1/(1 + \Gamma_{k:A}(t))$.

Proof See Section 2.5.3.

Since all functions involved in Lemma 2.3 are piecewise constant (although the lemma applies in more generality), we can give a more explicit expression directly in terms of the discrete set of values of these functions. Replacing $\Gamma_{k:A}(t) = \Gamma^{(j)}_{k:A}$ for all $\frac{j-1}{A} \le t < \frac{j}{A}$ with $j \ge k$ in (2.22) and solving the integrals of piecewise constant functions, we obtain

$$\Gamma^{(j)}_{k:A} = \sum_{m=1}^{B} \int_{\frac{m-1}{B}}^{\frac{m}{B}} \frac{\gamma B\, G(r,t)\, Q(t)\, dr}{1 + \nu \sum_{\ell=k}^{A} \int_{\frac{\ell-1}{A}}^{\frac{\ell}{A}} \frac{\gamma B\, G(r,\tau)\, Q(\tau)\, d\tau}{1 + \Gamma^{(\ell)}_{k:A}}} = \sum_{m=1}^{B} \frac{\gamma\, (\beta_{m,\pi_j}^2/\lambda_m)\, Q_j}{1 + \sum_{\ell=k}^{A} \frac{(\beta_{m,\pi_\ell}^2/\lambda_m)\, Q_\ell}{1 + \Gamma^{(\ell)}_{k:A}}}. \tag{2.23}$$

Combining (2.23) with the already mentioned modification of the iterative algorithm of [TLV06, Algorithm 1], we obtain a procedure to compute the maximum weighted average sum rate of problem (2.11), for fixed weights $\{W_k\}$ and Lagrange multipliers $\{\lambda_m\}$. This is summarized by Algorithm 2.1 below (for notational simplicity, the algorithm is written assuming $\pi_k = k$ for all $k = 1,\dots,A$; we can always reduce to this case after a simple reordering of the weights).

Algorithm 2.1 Input power optimization for weighted average sum rate maximization.

1. Initialize $Q_k(0) = Q/A$ for all $k = 1,\dots,A$.

2.
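For $i = 0, 1, 2, \dots$, iterate until the following solution settles:

$$Q_j(i+1) = Q\, \frac{\sum_{k=1}^{j} \Delta_k \left(1 - \Upsilon^{(j)}_{k:A}(i)\right)}{\sum_{\ell=1}^{A}\sum_{k=1}^{\ell} \Delta_k \left(1 - \Upsilon^{(\ell)}_{k:A}(i)\right)}, \tag{2.24}$$

for $j = 1,\dots,A$, where $\Upsilon^{(j)}_{k:A}(i) = 1/(1 + \Gamma^{(j)}_{k:A}(i))$, and $\Gamma^{(j)}_{k:A}(i)$ is obtained as the solution of the system of fixed-point equations (2.23), also obtained by iteration, for powers $Q_k = Q_k(i)$, $\forall k$.

3. Denote by $\Gamma^{(j)}_{k:A}(\infty)$, $\Upsilon^{(j)}_{k:A}(\infty)$, and $Q_j(\infty)$ the fixed points reached by the iteration at step 2). If the condition

$$Q \sum_{k=1}^{j} \Delta_k\, \Gamma^{(j)}_{k:A}(\infty) \le \sum_{\ell=1}^{A}\sum_{k=1}^{\ell} \Delta_k \left(1 - \Upsilon^{(\ell)}_{k:A}(\infty)\right) \tag{2.25}$$

is satisfied for all $j$ such that $Q_j(\infty) = 0$, then stop. Otherwise, go back to the initialization step, set $Q_j(0) = 0$ for the $j$ corresponding to the lowest value of $\sum_{k=1}^{j} \Delta_k\, \Gamma^{(j)}_{k:A}(\infty)$, and repeat steps 2) and 3) starting from the new initial condition.

2.2.3 Computation of the Asymptotic Rates

After the powers $Q_k^\star = Q_k(\infty)$ have been obtained from Algorithm 2.1, it remains to compute the corresponding average per-user rates. The average rate of users in group $\pi_k$ is given by

$$R_{\pi_k} = \frac{1}{N}\,\mathbb{E}\left[\log\left|\mathbf{I} + \sum_{\ell=k}^{A} \mathbf{H}_{\pi_\ell}\mathbf{H}_{\pi_\ell}^{\mathsf H} Q_{\pi_\ell}\right|\right] - \frac{1}{N}\,\mathbb{E}\left[\log\left|\mathbf{I} + \sum_{\ell=k+1}^{A} \mathbf{H}_{\pi_\ell}\mathbf{H}_{\pi_\ell}^{\mathsf H} Q_{\pi_\ell}\right|\right] \tag{2.26}$$

In the limit for $N \to \infty$, we can use the asymptotic analytical expression for the mutual information given in [ABEH06]. Adapting [ABEH06, Result 1] to our case, we obtain

$$\begin{aligned} \lim_{N\to\infty} \frac{1}{N}\,\mathbb{E}\left[\log\left|\mathbf{I} + \sum_{\ell=k}^{A} \mathbf{H}_{\pi_\ell}\mathbf{H}_{\pi_\ell}^{\mathsf H} Q_{\pi_\ell}\right|\right] ={} & \sum_{\ell=k}^{A} \log\left(1 + \gamma\, Q_{\pi_\ell}^\star \sum_{m=1}^{B} (\beta_{m,\pi_\ell}^2/\lambda_m)\, u_m\right) \\ & + \gamma \sum_{m=1}^{B} \log\left(1 + \sum_{\ell=k}^{A} (\beta_{m,\pi_\ell}^2/\lambda_m)\, Q_{\pi_\ell}^\star\, v_\ell\right) \\ & - \gamma \sum_{\ell=k}^{A}\sum_{m=1}^{B} (\beta_{m,\pi_\ell}^2/\lambda_m)\, Q_{\pi_\ell}^\star\, u_m v_\ell \end{aligned} \tag{2.27}$$

where, for each $k = 1,\dots,A$, $\{u_m : m = 1,\dots,B\}$ and $\{v_\ell : \ell = k,\dots,A\}$ are the unique solutions to the system of fixed-point equations

$$\begin{aligned} u_m &= \left(1 + \sum_{\ell=k}^{A} Q_{\pi_\ell}^\star\, (\beta_{m,\pi_\ell}^2/\lambda_m)\, v_\ell\right)^{-1}, \quad m = 1,\dots,B, \\ v_\ell &= \left(1 + \gamma \sum_{m=1}^{B} Q_{\pi_\ell}^\star\, (\beta_{m,\pi_\ell}^2/\lambda_m)\, u_m\right)^{-1}, \quad \ell = k,\dots,A. \end{aligned} \tag{2.28}$$

The proof follows from [ABEH06], based on Girko's theorem [Gir90] (see also [TV04]). Although (2.27) is not in closed form, $\{u_m\}$ and $\{v_\ell\}$ in (2.28) can be computed by fixed-point iterations in $A + B$ variables.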
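The fixed-point equations (2.23) and (2.28), the power update (2.24), and the rate difference (2.26) can be sketched in a few dozen lines. This is an illustrative sketch under stated assumptions, not the author's implementation: the function names are hypothetical, the permutation is taken to be the identity (weights already sorted in increasing order), `g[m][k]` stands for the ratio of the squared modified path coefficient to the Lagrange multiplier of BS m, the zero-power check of step 3 is omitted, and a fixed iteration count replaces a convergence test on the powers. Rates are in nats.

```python
import math


def sinr_fixed_point(g, Q, gamma, k, tol=1e-12, max_iter=10000):
    """Iterate (2.23) for the SINRs of groups k..A-1 (0-based indices)."""
    B, A = len(g), len(Q)
    Gam = [0.0] * A
    for _ in range(max_iter):
        # Denominator of (2.23), one value per BS m.
        denom = [1.0 + sum(g[m][l] * Q[l] / (1.0 + Gam[l]) for l in range(k, A))
                 for m in range(B)]
        new = [0.0] * A
        for j in range(k, A):
            new[j] = gamma * Q[j] * sum(g[m][j] / denom[m] for m in range(B))
        if max(abs(x - y) for x, y in zip(new, Gam)) < tol:
            return new
        Gam = new
    return Gam


def optimize_powers(g, W, gamma, Q_total, iters=200):
    """Step 2 of Algorithm 2.1: repeat the update (2.24) from equal powers."""
    A = len(W)
    delta = [W[0]] + [W[k] - W[k - 1] for k in range(1, A)]
    Q = [Q_total / A] * A
    for _ in range(iters):
        Gam = [sinr_fixed_point(g, Q, gamma, k) for k in range(A)]
        # 1 - Upsilon = 1 - 1/(1 + Gamma), weighted by the increments delta_k.
        num = [sum(delta[k] * (1.0 - 1.0 / (1.0 + Gam[k][j])) for k in range(j + 1))
               for j in range(A)]
        total = sum(num)
        Q = [Q_total * x / total for x in num]
    return Q


def asymptotic_logdet(g, Q, gamma, k, iters=500):
    """Right-hand side of (2.27), with u and v obtained by iterating (2.28)."""
    B, A = len(g), len(Q)
    u, v = [1.0] * B, [1.0] * A
    for _ in range(iters):
        u = [1.0 / (1.0 + sum(Q[l] * g[m][l] * v[l] for l in range(k, A)))
             for m in range(B)]
        v = [1.0 / (1.0 + gamma * Q[l] * sum(g[m][l] * u[m] for m in range(B)))
             for l in range(A)]
    t1 = sum(math.log(1.0 + gamma * Q[l] * sum(g[m][l] * u[m] for m in range(B)))
             for l in range(k, A))
    t2 = gamma * sum(math.log(1.0 + sum(g[m][l] * Q[l] * v[l] for l in range(k, A)))
                     for m in range(B))
    t3 = gamma * sum(g[m][l] * Q[l] * u[m] * v[l]
                     for l in range(k, A) for m in range(B))
    return t1 + t2 - t3


def group_rates(g, Q, gamma):
    """Per-user rate of each group as the difference of two log-det limits, as in (2.26)."""
    return [asymptotic_logdet(g, Q, gamma, k) - asymptotic_logdet(g, Q, gamma, k + 1)
            for k in range(len(Q))]
```

A possible use is to call `optimize_powers` for a given weight vector and then pass the resulting powers to `group_rates`. For a single BS and a single group with unit gain, unit power, and a square system, the SINR fixed point reduces to a quadratic equation, which gives a convenient sanity check of the iteration.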
These converge very quickly to the solution to any desired degree of numerical accuracy.

2.2.4 System Symmetry

So far we have considered the solution of the maximization in (2.11) for fixed $\{\lambda_m\}$. However, we are interested in the solution of (2.5) including the per-BS power constraint, which requires minimization with respect to $\{\lambda_m\}$. In finite dimension and for a fixed channel matrix, the min-max problem can be solved by the subgradient-based iterative method of [ZZL+09] or the infeasible-start Newton method of [YL07,HPC10]. A direct application of these algorithms to the large-system limit requires asymptotic expressions for the subgradient with respect to $\{\lambda_m\}$ or for the KKT matrix, respectively. These quantities contain the second-order derivatives of the Lagrangian function with respect to $\{Q_k\}$ and $\{\lambda_m\}$, which do not appear to be amenable to easily computable asymptotic limits.

A general method for the minimization with respect to $\{\lambda_m\}$ can be obtained as follows. Let $G_W(\boldsymbol{\lambda})$ denote the solution of (2.11). This is a convex function of $\boldsymbol{\lambda}$, and the minimizing $\boldsymbol{\lambda}^\star$ must have strictly positive components by Lemma 2.2. Therefore, at the optimal point we have $\left.\frac{\partial G_W}{\partial \lambda_m}\right|_{\boldsymbol{\lambda} = \boldsymbol{\lambda}^\star} = 0$ for all $m = 1,\dots,B$. It follows that the solution can be approached by gradient descent iterations, where the gradient can be estimated by numerical differentiation [CK04]. Let $\boldsymbol{\delta}_m$ be a $\gamma BN$-length vector for which the elements $(m-1)\gamma N + 1, \dots, m\gamma N$ are equal to $\delta$ for some $\delta > 0$ and the other elements are zero. Then the approximation for the partial derivative of $G_W(\boldsymbol{\lambda})$ with respect to $\lambda_m$ is given by $\frac{G_W(\boldsymbol{\lambda} + \boldsymbol{\delta}_m) - G_W(\boldsymbol{\lambda} - \boldsymbol{\delta}_m)}{2\delta}$, with an $O(\delta^2)$ error term [CK04]. Both $G_W(\boldsymbol{\lambda} + \boldsymbol{\delta}_m)$ and $G_W(\boldsymbol{\lambda} - \boldsymbol{\delta}_m)$ are computed by Algorithm 2.1.

From the above argument it follows that the general case, where minimization with respect to $\{\lambda_m\}$ is required, does not present any conceptual difficulty beyond the fact that it may be numerically cumbersome.
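The central-difference estimate described above can be sketched as follows. The objective used here is a toy convex function chosen purely for illustration; in the setting of the text, each evaluation would instead run Algorithm 2.1 for the perturbed multipliers. The function names and the step size are assumptions.

```python
def central_difference_gradient(G, lam, delta=1e-4):
    """Estimate each partial derivative of G at lam by a symmetric difference.

    The error of each estimate is of order delta squared for smooth G.
    """
    grad = []
    for m in range(len(lam)):
        up = list(lam)
        dn = list(lam)
        up[m] += delta  # perturb only the multiplier of BS m
        dn[m] -= delta
        grad.append((G(up) - G(dn)) / (2.0 * delta))
    return grad


def toy_objective(lam):
    """Stand-in convex function (NOT the weighted sum-rate value of the text)."""
    return sum((x - 1.0) ** 2 for x in lam)


grad = central_difference_gradient(toy_objective, [2.0, 0.5])
```

Each gradient estimate costs two objective evaluations per BS, which is why the text describes the general case as numerically cumbersome rather than conceptually difficult.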
Of course, a simple upper bound consists of relaxing the per-BS power constraint to a sum-power constraint in the reference cluster. Notice that the solution and the value of the objective function are invariant to a common scaling of the Lagrange multipliers. Therefore, we can assume $\frac{1}{B}\sum_{m=1}^{B} \lambda_m = 1$ without loss of generality. Letting $\lambda_m = 1$ for all $m$ yields the laxer sum-power constraint $\sum_k Q_k \le \sum_m P_m = P_{\mathrm{tot}}$, where $P_{\mathrm{tot}}$ denotes the total transmit power of the cluster. This choice yields an upper bound to the capacity region of the cluster (under the constraint of treating ICI as noise) and therefore also provides an upper bound to the whole system achievable region under the assumption that all BSs transmit at their maximum power.

Next, we present a system symmetry condition under which the sum-power and the per-BS power solutions coincide. Assume the same BS power constraint $P_m = P$ for all $m = 1,\dots,B$. Then, let $A' = A/B$, assuming that $B$ divides $A$. In particular, this is true when we have the same number of user groups in each cell of the cluster. Finally, assume that the $B \times A$ matrix of the coefficients $\boldsymbol{\beta} = \{\beta_{m,k}\}$ can be partitioned into $A'$ submatrices of size $B \times B$ such that each submatrix has the property that all rows are permutations of the first row, and all columns are permutations of the first column. Since this requirement is reminiscent of the condition for strongly symmetric discrete memoryless channels, we shall refer to these submatrices as "strongly symmetric blocks".

To fix ideas, consider Figure 2.1, showing a linear cellular layout with $B = 2$ BSs and $A = 8$ user groups. Assume distance-dependent pathloss coefficients yielding the matrix

$$\boldsymbol{\beta} = \begin{bmatrix} a & b & c & c & f & e & e & d \\ f & e & e & d & a & b & c & c \end{bmatrix}$$

for some positive numbers $a, b, c, d, e, f$. We notice that this matrix can be decomposed into the $A' = 4$ strongly symmetric blocks

$$\begin{bmatrix} a & f \\ f & a \end{bmatrix}, \quad \begin{bmatrix} b & e \\ e & b \end{bmatrix}, \quad \begin{bmatrix} c & e \\ e & c \end{bmatrix}, \quad \begin{bmatrix} c & d \\ d & c \end{bmatrix}$$

satisfying the above assumption.
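The strong symmetry property of the example can be checked mechanically. The sketch below uses arbitrary placeholder values for the six coefficients and a hypothetical helper name; the column pairing follows the block decomposition given above.

```python
def is_strongly_symmetric(block):
    """True if every row is a permutation of the first row and every
    column is a permutation of the first column."""
    first_row = sorted(block[0])
    first_col = sorted(row[0] for row in block)
    rows_ok = all(sorted(row) == first_row for row in block)
    cols_ok = all(sorted(row[c] for row in block) == first_col
                  for c in range(len(block[0])))
    return rows_ok and cols_ok


# Placeholder values for the six positive coefficients (illustrative only).
a, b, c, d, e, f = 6.0, 5.0, 4.0, 1.0, 2.0, 3.0
beta = [[a, b, c, c, f, e, e, d],
        [f, e, e, d, a, b, c, c]]

# Column pairs (0-based) forming the four 2 x 2 blocks of the example.
pairs = [(0, 4), (1, 5), (2, 6), (3, 7)]
blocks = [[[row[i], row[j]] for row in beta] for i, j in pairs]
all_symmetric = all(is_strongly_symmetric(blk) for blk in blocks)
```

Comparing sorted rows and columns is a simple way to test the permutation property; it would reject, for instance, a block whose second row contains a value that does not appear in the first row.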
When these symmetry conditions hold, the user groups corresponding to the same strongly symmetric block (e.g., the user group pairs (1, 5), (2, 6), (3, 7) and (4, 8) in the example) are statistically equivalent, in the sense that they see exactly the same landscape of channel coefficients from all the BSs forming the cluster. In this case, as will be clear in Section 2.3, we can restrict the weighted sum-rate maximization in (2.5), (2.11) to the case where the weights $W_k$ are identical for all user groups in the same strongly symmetric block.

[Figure 2.1: Linear one-sided 2-cell model with $B = 2$ BSs and $A = 8$ user groups.]

Without loss of generality, let us enumerate the user groups such that the $b$-th symmetric block contains the user groups with indices $k = (b-1)B + 1, \ldots, bB$, with corresponding constant weights $W_k = W'_b$. Then, the objective function (2.13) takes on the form
\[ F_{\mathbf{W},\boldsymbol{\lambda}}(Q_1, \ldots, Q_A) = \sum_{b=1}^{A'} \frac{\Delta'_b}{N}\, \mathbb{E}\left[ \log \left| \mathbf{I} + \boldsymbol{\Sigma}^{-1}(\boldsymbol{\lambda})\, \tilde{\mathbf{H}}_{(b-1)B+1:A}\, \mathbf{Q}_{(b-1)B+1:A}\, \tilde{\mathbf{H}}^{\sf H}_{(b-1)B+1:A} \right| \right] \tag{2.29} \]
where $\Delta'_b = W'_b - W'_{b-1}$ for $b = 1, \ldots, A'$, with $W'_0 = 0$, and where
\[ \mathbf{Q} = \frac{1}{N}\, \mathrm{diag}\big( \underbrace{Q_1, \ldots, Q_1}_{N}, \underbrace{Q_2, \ldots, Q_2}_{N}, \ldots, \underbrace{Q_A, \ldots, Q_A}_{N} \big), \]
with trace constraint $\sum_{k=1}^{A} Q_k \le BP = P_{\rm tot}$. We have the following result:

Theorem 2.2 Under the above system symmetry conditions, the minimization in the min-max problem (2.5) in the limit of $N \to \infty$ is achieved for $\lambda_m = 1$ for all $m = 1, \ldots, B$.

Proof See Section 2.5.4.

As a corollary, under the symmetry conditions of Theorem 2.2 the per-BS power constraint and the total power constraint solutions coincide.

2.3 Fairness Scheduling

Downlink opportunistic scheduling is currently used by "high data rate" third-generation cellular systems such as EV-DO [BBG+00] and HSDPA [PELT06].
It is expected that in the next generation of systems based on MIMO-OFDM, such as IEEE 802.16m [IEE10] and LTE-Advanced [3GP10a], such strategies will be integrated with the MU-MIMO physical layer. In such systems, each cooperation cluster runs a downlink scheduler that computes a set of rate weight coefficients and, at each scheduling time slot $t$, solves the maximization of the instantaneous weighted sum-rate subject to the per-BS power constraint, as in (2.5). The result of this maximization provides the power and rate allocation and the corresponding downlink precoder parameters (i.e., the beamforming vectors and the DPC encoding order) to be used in the current slot. The scheduler weights are recursively computed such that the time-averaged user rates converge to the desired ergodic rate point $\mathbf{R}^\star$, the solution of (2.10).

The scheduling policy can be systematically designed by using the stochastic optimization approach of [GNT06, SMCN10], based on the idea of "virtual queues". Notice that we do not consider exogenous arrivals: consistently with the classical information theoretic setting, we assume that an arbitrarily large number of information bits are to be transmitted to the users in each cluster (infinitely backlogged system). The virtual queues defined here are only a tool to recursively compute the weights of the scheduling policy. For the sake of clarity, we provide here a short review of the scheduling algorithm for the general finite-dimensional system. We denote instantaneous quantities as dependent on the slot index $t$. The scheduling policy ensures that the virtual queue of each user $(k, i)$ (i.e., user $i$ in group $k$) is strongly stable (see [GNT06, Definition 3.1]). This implies that the arrival rate of the virtual queue of user $(k, i)$ is strictly less than its average service rate $\bar{R}_{k,i} = \mathbb{E}[R_{k,i}(\tilde{\mathbf{H}}(t))]$. Then, the desired ergodic rate point $\mathbf{R}^\star$
can be approached if the virtual queues are fed by virtual arrival processes $A_{k,i}(t)$ with rates $\mathbb{E}[A_{k,i}(t)]$ sufficiently close to the desired values $R^\star_{k,i}$. The interesting feature of this approach is that it is possible to generate such virtual arrival processes greedily and adaptively, even if the values $R^\star_{k,i}$ are unknown a priori and may be very difficult to calculate directly.

Let $U_{k,i}(t)$ denote the virtual queue backlog for user $i$ in group $k$ at time slot $t$, evolving according to the stochastic difference equation
\[ U_{k,i}(t+1) = \left[ U_{k,i}(t) - R_{k,i}(\tilde{\mathbf{H}}(t)) \right]_+ + A_{k,i}(t) \tag{2.30} \]
where $[\cdot]_+ = \max(\cdot, 0)$. We consider the scheduling policy given as follows. At each time slot $t$, solve the weighted sum-rate maximization problem
\[ \begin{array}{ll} \text{maximize} & \displaystyle \sum_{k=1}^{A} \sum_{i=1}^{N} U_{k,i}(t)\, R_{k,i}(\tilde{\mathbf{H}}(t)) \\ \text{subject to} & \mathrm{tr}\left(\mathrm{Cov}(\mathbf{x}_m)\right) \le P_m \ \text{ for all } m. \end{array} \tag{2.31} \]
The virtual queues are updated according to (2.30), where the arrival processes are given by $A_{k,i}(t) = a^\star_{k,i}$, and the vector $\mathbf{a}^\star$ is the solution of the maximization problem
\[ \begin{array}{ll} \text{maximize} & \displaystyle V g(\mathbf{a}) - \sum_{k=1}^{A} \sum_{i=1}^{N} a_{k,i}\, U_{k,i}(t) \\ \text{subject to} & 0 \le a_{k,i} \le A_{\max} \end{array} \tag{2.32} \]
for suitable $V > 0$ and $A_{\max} > 0$. The parameters $V$ and $A_{\max}$ determine the accuracy and the convergence speed of the time-average rates to their expected values. It can be shown [GNT06, SMCN10] that, for a fixed and sufficiently large parameter $A_{\max}$, the gap between the long-time average rates $\lim_{t \to \infty} \frac{1}{t} \sum_{\tau=0}^{t-1} R_{k,i}(\tilde{\mathbf{H}}(\tau))$ and the optimal ergodic rates $R^\star_{k,i}$ decreases as $O(1/V)$, while the expected backlog of the virtual queues increases as $O(V)$.

After reviewing the above background on scheduling and stochastic optimization, we are ready to make some observations that are instrumental for the performance computation in the large-system limit. Due to the statistical equivalence of users in the same group, the ergodic rate points with $R_{k,i} = R_k$ (independent of $i$) are achievable.
In particular, the boundary of the system ergodic capacity region and of the region $\mathcal{C}(P_1, \ldots, P_B)$ in (2.9) coincide for all rate points meeting this condition. It is meaningful to assume that the network utility function $g(\mathbf{R})$ is invariant with respect to permutations of the rates of statistically equivalent users. In fact, all statistically equivalent users should be treated equally in the long-term average sense (see footnote 2). For example, the $\alpha$-fairness utility function proposed in [MW00] satisfies this condition. In this case, it is immediate to show that the function $g(\mathbf{R})$ is maximized by some rate point with equal rates over each user group or, if the symmetry conditions of Theorem 2.2 hold, over all groups in the same strongly symmetric block. Hence, in the large-system limit the point $\mathbf{R}^\star = \{R^\star_{k,i}\}$, solution of (2.10), must satisfy, for all $i$,
\[ R^\star_{k,i} = \lim_{N \to \infty} \frac{1}{N}\, \mathbb{E}\left[ \log \frac{\left| \mathbf{I} + \sum_{\ell=k}^{A} \mathbf{H}_\ell \mathbf{H}_\ell^{\sf H} Q_\ell \right|}{\left| \mathbf{I} + \sum_{\ell=k+1}^{A} \mathbf{H}_\ell \mathbf{H}_\ell^{\sf H} Q_\ell \right|} \right] \]
where the term on the right-hand side is the average per-user rate given by the solution of (2.11) for some choice of the weights $\{W_k\}$ and Lagrange multipliers $\{\lambda_m\}$.

Footnote 2: Here we assume that all users have equal priority. For example, they are all delay-tolerant data users with no particular individual priority: users differ only by their location in the cluster, which determines their channel coefficients $\{\beta_{k,m}\}$.

It is well known that, for a deterministic network, the dynamic scheduling policy described before coincides with the Lagrangian dual optimization with outer subgradient iteration, where the Lagrangian dual variables play the role of the virtual queue backlogs in the dynamic setting. In the large-system limit, the channel uncertainty disappears and the MU-MIMO system "freezes" to a deterministic limit. Using the large-system limit solution of (2.11) presented in Section 2.2, the solution of the fairness scheduling problem (2.10) can be addressed directly, using Lagrangian duality.
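Before moving to the static Lagrangian form, the virtual-queue recursion (2.30) and the arrival selection (2.32) reviewed in this section can be sketched as below. Two assumptions are made for the sake of the example and are not prescribed by the text at this point: the utility is taken to be $g(\mathbf{a}) = \sum_{k,i} \log a_{k,i}$, for which (2.32) separates per user and has the closed form $a_{k,i} = \min(V / U_{k,i}, A_{\max})$, and the service rates are passed in as plain numbers rather than obtained by solving (2.31). The function names are hypothetical.

```python
def virtual_arrivals_log_utility(U, V, A_max):
    """Solve (2.32) per user for the assumed utility g(a) = sum(log a).

    V*log(a) - a*U is concave in a with stationary point a = V/U,
    clipped to [0, A_max]; an empty queue admits the maximum A_max.
    """
    return [A_max if u <= 0.0 else min(V / u, A_max) for u in U]


def update_virtual_queues(U, R, A):
    """One step of (2.30): U(t+1) = max(U(t) - R(t), 0) + A(t)."""
    return [max(u - r, 0.0) + a for u, r, a in zip(U, R, A)]


# One slot for three users (flattened index over groups and users).
U_now = [2.0, 0.0, 0.5]          # current backlogs, also the weights in (2.31)
R_now = [1.5, 1.0, 1.0]          # stand-in for the rates returned by (2.31)
A_now = virtual_arrivals_log_utility(U_now, V=1.0, A_max=3.0)
U_next = update_virtual_queues(U_now, R_now, A_now)
```

Users with a small backlog receive larger virtual arrivals, which in turn raises their weight in the next weighted sum-rate maximization.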
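2.3.1 Lagrangian Optimization

We rewrite (2.10) using the auxiliary variables $\mathbf{r} = [r_1, \ldots, r_A]$ and using the definition of the ergodic rate region (2.9) as
\[ \begin{array}{ll} \displaystyle \min_{\boldsymbol{\lambda}} \max_{\mathbf{r}, \mathbf{Q}, \pi} & g(\mathbf{r}) \\ \text{subject to} & \displaystyle r_k \le \frac{1}{N}\, \mathbb{E}\left[ \log \frac{\left| \mathbf{I} + \sum_{\ell=k}^{A} \mathbf{H}_\ell \mathbf{H}_\ell^{\sf H} Q_\ell \right|}{\left| \mathbf{I} + \sum_{\ell=k+1}^{A} \mathbf{H}_\ell \mathbf{H}_\ell^{\sf H} Q_\ell \right|} \right], \quad \mathrm{tr}(\mathbf{Q}) \le Q, \quad \boldsymbol{\lambda} \ge 0. \end{array} \tag{2.33} \]
The Lagrange function for (2.33) is given by
\[ \begin{aligned} L(\boldsymbol{\lambda}, \mathbf{r}, \mathbf{Q}, \pi, \mathbf{W}) &= g(\mathbf{r}) - \sum_{k=1}^{A} W_k \left( r_k - \frac{1}{N}\, \mathbb{E}\left[ \log \frac{\left| \mathbf{I} + \sum_{\ell=k}^{A} \mathbf{H}_\ell \mathbf{H}_\ell^{\sf H} Q_\ell \right|}{\left| \mathbf{I} + \sum_{\ell=k+1}^{A} \mathbf{H}_\ell \mathbf{H}_\ell^{\sf H} Q_\ell \right|} \right] \right) \\ &= \underbrace{g(\mathbf{r}) - \sum_{k=1}^{A} W_k r_k}_{f_{\mathbf{W}}(\mathbf{r})} + \underbrace{\sum_{k=1}^{A} W_k \frac{1}{N}\, \mathbb{E}\left[ \log \frac{\left| \mathbf{I} + \sum_{\ell=k}^{A} \mathbf{H}_\ell \mathbf{H}_\ell^{\sf H} Q_\ell \right|}{\left| \mathbf{I} + \sum_{\ell=k+1}^{A} \mathbf{H}_\ell \mathbf{H}_\ell^{\sf H} Q_\ell \right|} \right]}_{h_{\mathbf{W}}(\boldsymbol{\lambda}, \mathbf{Q}, \pi)} \end{aligned} \tag{2.34} \]
where $\mathbf{W}$ is the vector of dual variables corresponding to the auxiliary variable constraints (rate constraints). The Lagrange function can be decomposed into the sum of a function of $\mathbf{r}$ only, denoted by $f_{\mathbf{W}}(\mathbf{r})$, and a function of $\boldsymbol{\lambda}$, $\mathbf{Q}$ and $\pi$ only, denoted by $h_{\mathbf{W}}(\boldsymbol{\lambda}, \mathbf{Q}, \pi)$. The Lagrange dual function for the problem (2.33) is given by
\[ G(\mathbf{W}) = \min_{\boldsymbol{\lambda}} \max_{\mathbf{r}, \mathbf{Q}, \pi} L(\boldsymbol{\lambda}, \mathbf{r}, \mathbf{Q}, \pi, \mathbf{W}) = \underbrace{\max_{\mathbf{r}} f_{\mathbf{W}}(\mathbf{r})}_{(a)} + \underbrace{\min_{\boldsymbol{\lambda}} \max_{\mathbf{Q}, \pi} h_{\mathbf{W}}(\boldsymbol{\lambda}, \mathbf{Q}, \pi)}_{(b)} \tag{2.35} \]
and it is obtained by the decoupled maximization in (a) (with respect to $\mathbf{r}$) and the min-max in (b) (with respect to $\boldsymbol{\lambda}$, $\mathbf{Q}$, $\pi$). Notice that problems (a) and (b) correspond to the static forms of (2.32) and (2.31), respectively, after identifying $\mathbf{r}$ with the virtual arrival rates $\mathbf{A}(t)$ and $\mathbf{W}$ with the virtual queue backlogs $\mathbf{U}(t)$. Finally, we can solve the dual problem defined as
\[ \min_{\mathbf{W} \ge 0} G(\mathbf{W}). \tag{2.36} \]
The solution of (2.36) can be found via inner-outer iterations as follows:

Inner Problem: For given $\mathbf{W}$, we solve (2.35) with respect to $\boldsymbol{\lambda}$, $\mathbf{r}$, $\mathbf{Q}$ and $\pi$. This can be further decomposed into:

Subproblem (a): Since $f_{\mathbf{W}}(\mathbf{r})$ is concave in $\mathbf{r} \ge 0$, the optimal $\mathbf{r}$ is readily obtained by imposing the KKT conditions.

Subproblem (b): Taking the limit of $N \to \infty$, this problem is solved by Algorithm 2.1 for fixed $\boldsymbol{\lambda} > 0$. If the system satisfies the symmetry conditions of Theorem 2.2, then we let $\lambda_m = 1$ and no minimization with respect to $\lambda_m$ is needed.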
If these conditions do not hold, the outer minimization can be solved by the gradient descent method with the approximated gradient. Otherwise, letting $\lambda_m = 1$ yields an upper bound on the achievable network utility function, corresponding to the relaxation of the per-BS power constraints to the sum-power constraint.

Outer Problem: The minimization of $G(\mathbf{W})$ with respect to $\mathbf{W} \ge 0$ can be obtained by subgradient adaptation. Let $\bar{\boldsymbol{\lambda}}$, $\bar{\pi}$, $\bar{\mathbf{Q}}$ and $\bar{\mathbf{r}}(\mathbf{W})$ (see footnote 3) denote the solution of the inner problem for fixed $\mathbf{W}$. For any $\mathbf{W}'$, we have
\[ \begin{aligned} G(\mathbf{W}') &= \max_{\mathbf{r}} f_{\mathbf{W}'}(\mathbf{r}) + \max_{\mathbf{Q}} h_{\mathbf{W}'}(\bar{\boldsymbol{\lambda}}, \mathbf{Q}, \bar{\pi}) \\ &\ge f_{\mathbf{W}'}(\bar{\mathbf{r}}(\mathbf{W})) + h_{\mathbf{W}'}(\bar{\boldsymbol{\lambda}}, \bar{\mathbf{Q}}, \bar{\pi}) \\ &= G(\mathbf{W}) + \sum_{k=1}^{A} \left( W'_k - W_k \right) \left( R_k(\mathbf{W}) - \bar{r}_k(\mathbf{W}) \right) \end{aligned} \tag{2.37} \]
where $R_k(\mathbf{W})$ denotes the $k$-th group rate resulting from the solution of the inner problem with weights $\mathbf{W}$, which is efficiently calculated by Algorithm 2.1 in the large-system regime. A subgradient for $G(\mathbf{W})$ is therefore given by the vector with components $R_k(\mathbf{W}) - \bar{r}_k(\mathbf{W})$. The dual variables $\mathbf{W}$ can then be updated at the $n$-th outer iteration according to
\[ W_k(n+1) = W_k(n) - \mu(n) \left( R_k(\mathbf{W}(n)) - \bar{r}_k(\mathbf{W}(n)) \right), \quad \forall k \tag{2.38} \]
for some step size $\mu(n) > 0$, which can be determined by an efficient algorithm such as the back-tracking line search method [BV04] or the Ellipsoid method [BT97]. In the numerical example of Section 2.4, we use the back-tracking line search method. It should be noticed that by setting $\mu(n) = 1$ this subgradient update plays the role of the virtual queue update in the dynamic scheduling policy of (2.30). In this optimization, however, the objective function converges to a single optimal point through the iterations and, by adjusting the step size $\mu(n)$ with the above methods, convergence can be attained very fast.

As an application example of the above general optimization, we focus on the two special cases of proportional fairness scheduling (PFS) and hard-fairness scheduling (HFS), also known as max-min fairness scheduling.
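The outer update (2.38) can be sketched as follows. This is a minimal illustration under stated assumptions: `inner_solver` is a hypothetical callback standing in for the inner problem (in the text, Algorithm 2.1 plus the KKT solution of subproblem (a)), the step size is held constant instead of being chosen by line search, and the toy solver models a single group with a logarithmic utility and a fixed rate, so that the dual optimum is known in closed form.

```python
def subgradient_step(W, R, r, step):
    """One outer iteration of (2.38): W_k <- W_k - step * (R_k - r_k)."""
    return [w - step * (R_k - r_k) for w, R_k, r_k in zip(W, R, r)]


def run_outer_iterations(inner_solver, W0, step, iters):
    """Alternate inner solutions and dual updates for a fixed step size."""
    W = list(W0)
    for _ in range(iters):
        R, r = inner_solver(W)   # group rates and auxiliary rates for this W
        W = subgradient_step(W, R, r, step)
    return W


def toy_inner_solver(W, capacity=2.0):
    """Hypothetical inner problem: every group rate equals `capacity`
    regardless of W, and the utility log(r_k) gives r_k = 1 / W_k."""
    R = [capacity for _ in W]
    r = [1.0 / w for w in W]
    return R, r


one_step = subgradient_step([1.0, 2.0], [0.5, 1.0], [1.0, 0.5], 0.1)
W_final = run_outer_iterations(toy_inner_solver, [1.0], step=0.1, iters=200)
```

In the toy case the fixed point satisfies $R = \bar{r}$, that is $W = 1/2$ for a rate of 2, which matches the intuition that the update stops when the auxiliary rates meet the achieved group rates.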
Footnote 3: It is useful to explicitly point out the dependence on $\mathbf{W}$ only for $\bar{\mathbf{r}}(\mathbf{W})$, since this appears in the subgradient expression, although it is clear that $\bar{\boldsymbol{\lambda}}$, $\bar{\pi}$ and $\bar{\mathbf{Q}}$ also depend on $\mathbf{W}$ in general.

2.3.2 Proportional Fairness Scheduling

The network utility function for PFS is given as
\[ g(\mathbf{r}) = \sum_{k=1}^{A} \log(r_k). \tag{2.39} \]
In this case, the KKT conditions for the inner subproblem (a) yield the solution
\[ \bar{r}_k(\mathbf{W}) = 1 / W_k, \quad \forall k \tag{2.40} \]
(notice that $r_k$ must be positive for all $k$, otherwise the objective function is $-\infty$). As mentioned before, the dual variables play the role of the virtual queue backlogs in the dynamic scheduling policy, while the auxiliary variables $\mathbf{r}$ correspond to the virtual arrival rates. From (2.40), we see that at the $n$-th outer iteration these variables are related by $W_k(n) = 1 / \bar{r}_k(\mathbf{W}(n))$. As observed at the beginning of Section 2.3, the virtual arrival rates of the dynamic scheduling policy are designed to be close to the ergodic rates $\mathbf{R}^\star$ at the optimal fairness point. It follows that the usual intuition of PFS, according to which the scheduler weights are inversely proportional to the long-term average user rates, is recovered.

2.3.3 Hard Fairness Scheduling

In the case of HFS, the scheduler maximizes the minimum ergodic user rate. The network utility function is given by
\[ g(\mathbf{r}) = \min_{k=1,\ldots,A} r_k. \tag{2.41} \]
This objective function is neither strictly concave nor differentiable everywhere. Therefore, it is convenient to rewrite subproblem (a) by introducing an auxiliary variable $\rho$, as follows:
\[ \begin{array}{ll} \displaystyle \max_{\rho,\, \mathbf{r} \ge 0} & \displaystyle \rho - \sum_{k=1}^{A} W_k r_k \\ \text{subject to} & r_k \ge \rho, \quad \forall k. \end{array} \tag{2.42} \]
The solution must satisfy $r_k = \rho$ for all $k$, leading to
\[ \max_{\rho > 0}\ \rho \left( 1 - \sum_{k=1}^{A} W_k \right). \tag{2.43} \]
Since the original maximization in (2.33) is bounded while (2.43) may be unbounded, we must have $\sum_{k=1}^{A} W_k = 1$, and $\rho$ must take on some appropriate value that enforces this condition.
The subgradient iteration for the weights $\mathbf{W}$, using $\bar{r}_k(\mathbf{W}(n)) = \rho(\mathbf{W}(n))$, becomes
\[ W_k(n+1) = W_k(n) - \mu \left( R_k(\mathbf{W}(n)) - \rho(\mathbf{W}(n)) \right), \quad \forall k. \tag{2.44} \]
Summing up the update equations over $k = 1, \ldots, A$ and using the condition that $\sum_{k=1}^{A} W_k(n) = 1$ for all $n$, we obtain
\[ \bar{r}_k(\mathbf{W}(n)) = \rho(\mathbf{W}(n)) = \frac{1}{A} \sum_{j=1}^{A} R_j(\mathbf{W}(n)), \quad \forall k. \tag{2.45} \]

2.4 Numerical Results

In this section we present some examples of the multi-cell model considered in this chapter and (when possible) compare the numerical results obtained with the proposed large-system analysis with the results of Monte Carlo simulation applied to an actual finite-dimensional system subject to the dynamic fairness scheduling policy outlined at the beginning of Section 2.3.

[Figure 2.2: Utility function and individual group rates under PFS in the 2-cell model. Antenna ratio $\gamma = 4$ and $K = 8$ user groups. Panel (a): convergence of the utility function (objective $\sum_k \log(r_k)$ versus outer iteration index $n$), for full cooperation and no cooperation. Panel (b): individual group rates (group rate in bps/Hz versus group location in km) for finite-dimensional DPC with $N = 1, 2, 4$ and for the large-system limit, with no cooperation and full cooperation.]

The examples involve a one-dimensional 2-cell model ($M = 2$) and a two-dimensional three-sectored 7-cell model ($M = 21$). In both cases, the system parameters and pathloss model are based on the mobile WiMAX system evaluation specification [WiM06], with cell radius 1.0 km and no shadowing. The 2-cell model, shown in Figure 2.1, considers two one-sided BSs with $\gamma = 4$, located at $\pm 1$ km, and $K = 8$ user groups equally spaced between the BSs. We consider the case of full BS cooperation (i.e., one cluster comprising the two BSs) and no cooperation (i.e., two clusters of one BS each) with a symmetric partition of users, yielding $L = 2$ clusters with $\mathcal{K}_1 = \{1, 2, 3, 4\}$ and $\mathcal{K}_2 = \{5, 6, 7, 8\}$.
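As a side note on how the HFS iteration of Section 2.3.3 can be implemented, the sketch below applies (2.44) with the common rate of (2.45). It is an illustration only: the group rates are passed in as plain numbers (in the text they come from Algorithm 2.1), the step size is an assumed constant, and the function name is hypothetical. Because the corrections $R_k - \rho$ sum to zero, the weights keep summing to one after every step.

```python
def hfs_weight_update(W, R, step):
    """One HFS iteration: rho is the average group rate as in (2.45),
    and each weight moves against (R_k - rho) as in (2.44)."""
    rho = sum(R) / len(R)
    W_new = [w - step * (R_k - rho) for w, R_k in zip(W, R)]
    return W_new, rho


# Four groups with equal initial weights summing to one.
W_next, rho_now = hfs_weight_update([0.25, 0.25, 0.25, 0.25],
                                    [1.0, 2.0, 3.0, 2.0], step=0.05)
```

The group with the lowest rate sees its weight increase and the group with the highest rate sees it decrease, which pushes the rates toward a common value.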
[Figure 2.3: Utility function and individual group rates under HFS in the 2-cell model. Antenna ratio $\gamma = 4$ and $K = 8$ user groups. Panel (a): convergence of the utility function (objective $\min_k r_k$ versus outer iteration index $n$), for full cooperation and no cooperation. Panel (b): individual group rates (group rate in bps/Hz versus group location in km) for finite-dimensional DPC with $N = 1, 2, 4$ and for the large-system limit, with no cooperation and full cooperation.]

Figure 2.2 illustrates the convergence of the utility function and the individual group rates under PFS. In Figure 2.2(a), the PFS objective functions in the no cooperation and full cooperation cases are shown to converge to the respective optimal PFS points. Not surprisingly, the fully cooperative system achieves a significantly higher value of the objective function (sum of the log-rates). In Figure 2.2(b), we compare the asymptotic rates in the large-system limit with the achievable rates obtained by Monte Carlo simulation in finite dimension. In finite dimension we considered $N = 1$, 2, or 4 and the same parameters as in the infinite-dimensional case. The channel vectors are randomly generated and change at every slot $t$ in an i.i.d. fashion, and the instantaneous rates are allocated by using DPC with the water-filling algorithm [Yu06] combined with the dynamic scheduling policy [SMCN10] outlined in Section 2.3. Remarkably, the finite-dimensional simulation produced nearly the same rates for the considered values of $N$, and these rates also almost overlap with the large-system asymptotic results, even for very small $N$. Notice that the dynamic scheduling policy should provide multi-user diversity gain and in general should achieve higher rates than the large-system limit, which is not able to exploit the dynamic fluctuations of the small-scale fading due to "channel hardening".
However, it appears that in the regime where the pathloss is dominant over the randomness of the multi-antenna channels and the number of users is not much larger than the number of BS antennas, the multi-user diversity gain is negligible and the asymptotic analysis generates results very close to the simulations with dynamic scheduling and DPC.

Figure 2.3 shows the convergence behavior of the utility function and the individual group rates under HFS. In the HFS case, all the users achieve the same individual rate, which is slightly higher than the smallest rate of the PFS case. Also, the agreement of the individual user rates with the finite-dimensional simulation is noticeable.

Using the proposed asymptotic analysis, validated in the simple 2-cell model, we can obtain ergodic rate distributions for much larger systems, for which a full-scale simulation would be very demanding. We considered a two-dimensional cell layout where 7 hexagonal cells form a network and each cell consists of three 120-degree sectors.

[Figure 2.4: Two-dimensional three-sectored hexagonal 7-cell model. Panel (a): 3-sectored cell configuration. Panel (b): wrap-around torus topology.]

As depicted in Figure 2.4(a), three BSs are co-located at the center of each cell such that each BS handles one sector in the no cooperation case. Each sector is split into 4 diamond-shaped equal-area grids and one user group is placed at the center of each grid. Therefore, there are in total $M = 21$ BSs and $K = 84$ user groups in the network. In addition, we assume a wrap-around torus topology as shown in Figure 2.4(b), such that each cell is virtually surrounded by the other 6 cells and all the cells have the same symmetric ICI distribution.
The antenna orientation and pattern follow [IEE09], and the non-ideal spatial antenna gain pattern (overlapping between sectors in the same cell) generates interference even between sectors of the same cell.

Figure 2.6 shows the user rate distribution with PFS under three levels of cooperation: (a) no cooperation; (b) three-sector cooperation within each cell; and (c) full cooperation over the 7-cell network, as described in Section 2.1 and illustrated in Figure 2.5. The asymptotic rate results show that in case (b) the cooperation gain over case (a) is primarily obtained for the users around the cell centers, while in case (c) the cooperation gain is attained over the whole cellular coverage area.

[Figure 2.5: Cluster forms for three levels of cooperation, with different colors denoting different clusters. Panel (a): no cooperation. Panel (b): three-sector cooperation within each cell. Panel (c): full cooperation over 7 cells.]

[Figure 2.6: Ergodic user rate distribution under PFS in the 7-cell model. Antenna ratio $\gamma = 4$ and $K = 84$ user groups. Each panel shows the asymptotic group rate (bps/Hz) versus the x-coordinate and y-coordinate (km). Panel (a): no cooperation. Panel (b): three-sector cooperation within each cell. Panel (c): full cooperation over 7 cells.]

2.5 Proofs

2.5.1 Proof of Lemma 2.1.

Denote the utility function in (2.12), as a function of the diagonal entries of $\mathbf{Q}$, by $f(\{q_{k,i} : 1 \le k \le A,\ 1 \le i \le N\})$. Since users in the same group are statistically equivalent and the function $f(\cdot)$ is defined through an expectation with respect to the channel fading coefficients, it follows that $f(\cdot)$ must be invariant with respect to permutations of the arguments $q_{k,1}, \ldots, q_{k,N}$. That is, for any $k = 1, \ldots, A$ and $1 \le i < j \le N$, the value of the function is invariant if the arguments $q_{k,i}$ and $q_{k,j}$ are exchanged. Suppose that $\{q_{k,i}\}$ is the optimal input power allocation, solution of (2.11). Then, we have
\[ f(\ldots, q_{k,i}, \ldots, q_{k,j}, \ldots) = f(\ldots, q_{k,j}, \ldots, q_{k,i}, \ldots) \le f\left(\ldots, \frac{q_{k,i} + q_{k,j}}{2}, \ldots, \frac{q_{k,i} + q_{k,j}}{2}, \ldots\right), \]
where the inequality follows from the concavity of $f(\cdot)$ and Jensen's inequality. Under the optimality assumption, equality must hold, and this implies that $\frac{q_{k,i} + q_{k,j}}{2}$ is the optimal input power for both users $i$ and $j$ in group $k$. Extending this argument by induction, it follows that the optimal input power must be of the form $q^\star_{k,i} = Q_k / N$ for all $i = 1, \ldots, N$, for some values $Q_1, \ldots, Q_A$.

2.5.2 Proof of Lemma 2.2.

The dual variable $\lambda_m$ plays the role of the noise power at the antennas of the $m$-th BS in the dual MAC. Let $G_{\mathbf{W}}(\boldsymbol{\lambda}) = \max_{\mathbf{Q} :\, \mathrm{tr}(\mathbf{Q}) \le Q} F_{\mathbf{W},\boldsymbol{\lambda}}(\mathbf{Q})$ and suppose that $\lambda^\star_m = 0$ for some $m$. Then, $|\boldsymbol{\Sigma}(\boldsymbol{\lambda}^\star)| = 0$ in (2.12) and $G_{\mathbf{W}}(\boldsymbol{\lambda}^\star)$ goes to positive infinity, which clearly cannot be the solution of the minimization with respect to $\boldsymbol{\lambda}$ in (2.5). Therefore, the optimal $\lambda^\star_m$ is strictly positive for all $m = 1, \ldots, B$.

2.5.3 Proof of Lemma 2.3.

We apply [TLV05, Lemma 1] to the matrix $\mathbf{I} + \sum_{\ell=k}^{A} \mathbf{H}_\ell \mathbf{H}_\ell^{\sf H} Q_\ell$, where $\mathbf{H}_{k:A} = [\mathbf{H}_k, \ldots, \mathbf{H}_A]$ has independent non-identically distributed components. The variance profile in [TLV05, Lemma 1] is defined as the limit of the variance of the elements of the matrix $\mathbf{H}_{k:A}$, multiplied by the number of rows, $\gamma B N$. With the normalization, the elements of each $(m, \ell)$-th block of $\mathbf{H}_{k:A}$, of size $\gamma N \times N$, have variance $\frac{\beta^2_{m,\ell}}{\lambda_m N}$. Therefore, the variance profile for the application of [TLV05, Lemma 1] is given by $\gamma B\, G(r, t)$, where $G(r, t)$ is the piecewise constant function defined above. Eventually, we arrive at (2.22).

2.5.4 Proof of Theorem 2.2.

Let $P_{i,j} = \mathbb{E}\left[ \left| [\tilde{\mathbf{H}}]_{i,j} \right|^2 \right]$ denote the variance of the $(i, j)$-th element of the channel matrix.
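For any $b = 1, \ldots, A'$, the matrix $\tilde{\mathbf{H}}_{(b-1)B+1:A}$ has the property that the empirical distributions of the element variances of its rows, i.e., the cumulative distribution functions
\[ F^{(N)}_{i,b}(z) = \frac{1}{(A - (b-1)B)N} \sum_{j=(b-1)BN+1}^{AN} 1\{P_{i,j} \le z\}, \]
are the same for all row indices $i = 1, \ldots, \gamma B N$. This means that the matrix of the element variances $\{P_{i,j}\}$ corresponding to $\tilde{\mathbf{H}}_{(b-1)B+1:A}$ is row-regular (see [TLV05, Definition 5]). Under the row-regularity condition, it follows that $\tilde{\mathbf{H}}_{(b-1)B+1:A}$ in the limit of $N \to \infty$ is statistically equivalent to a matrix $\mathbf{H}_{(b-1)B+1:A} = \mathbf{G}_{(b-1)B+1:A} \mathbf{T}^{1/2}_{(b-1)B+1:A}$, where $\mathbf{G}_{(b-1)B+1:A}$ is an i.i.d. matrix with zero-mean and unit-variance elements and $\mathbf{T}_{(b-1)B+1:A}$ is a non-negative diagonal matrix with asymptotic empirical spectral distribution given by $\lim_{N \to \infty} F^{(N)}_{i,b}(z)$. In particular, the distribution of $\mathbf{H}_{(b-1)B+1:A}$ is asymptotically unitary left-invariant, that is, for any unitary matrix $\mathbf{U}$ independent of $\mathbf{H}_{(b-1)B+1:A}$, the matrices $\mathbf{U}\mathbf{H}_{(b-1)B+1:A}$ and $\mathbf{H}_{(b-1)B+1:A}$ are asymptotically identically distributed.

Let $\boldsymbol{\Pi}$ denote a $\gamma B N \times \gamma B N$ block-permutation matrix that permutes the $B$ blocks of consecutive positions of length $\gamma N$ in the index vector $\{1, \ldots, \gamma B N\}$. Using the above asymptotic statistical equivalence, in the limit of large $N$ we can write, for any $\{\lambda_m\}$ and $\{Q_k\}$,
\[ \begin{aligned} F_{\mathbf{W},\boldsymbol{\lambda}}(Q_1, \ldots, Q_A) &\overset{(a)}{=} \sum_{b=1}^{A'} \frac{\Delta'_b}{N}\, \mathbb{E}\left[ \frac{1}{B!} \sum_{\boldsymbol{\Pi}} \log \left| \mathbf{I} + \boldsymbol{\Sigma}^{-1}(\boldsymbol{\lambda})\, \boldsymbol{\Pi}\, \mathbf{H}_{(b-1)B+1:A}\, \mathbf{Q}_{(b-1)B+1:A}\, \mathbf{H}^{\sf H}_{(b-1)B+1:A}\, \boldsymbol{\Pi}^{\sf T} \right| \right] \\ &= \sum_{b=1}^{A'} \frac{\Delta'_b}{N}\, \mathbb{E}\left[ \frac{1}{B!} \sum_{\boldsymbol{\Pi}} \log \left| \mathbf{I} + \boldsymbol{\Pi}^{\sf T} \boldsymbol{\Sigma}^{-1}(\boldsymbol{\lambda})\, \boldsymbol{\Pi}\, \mathbf{H}_{(b-1)B+1:A}\, \mathbf{Q}_{(b-1)B+1:A}\, \mathbf{H}^{\sf H}_{(b-1)B+1:A} \right| \right] \\ &= \sum_{b=1}^{A'} \frac{\Delta'_b}{N}\, \mathbb{E}\left[ \frac{1}{B!} \sum_{\boldsymbol{\Pi}} \log \left| \mathbf{I} + \left( \boldsymbol{\Pi}^{\sf T} \boldsymbol{\Sigma}(\boldsymbol{\lambda})\, \boldsymbol{\Pi} \right)^{-1} \mathbf{H}_{(b-1)B+1:A}\, \mathbf{Q}_{(b-1)B+1:A}\, \mathbf{H}^{\sf H}_{(b-1)B+1:A} \right| \right] \\ &\overset{(b)}{\ge} \sum_{b=1}^{A'} \frac{\Delta'_b}{N}\, \mathbb{E}\left[ \log \left| \mathbf{I} + \left( \frac{1}{B!} \sum_{\boldsymbol{\Pi}} \boldsymbol{\Pi}^{\sf T} \boldsymbol{\Sigma}(\boldsymbol{\lambda})\, \boldsymbol{\Pi} \right)^{-1} \mathbf{H}_{(b-1)B+1:A}\, \mathbf{Q}_{(b-1)B+1:A}\, \mathbf{H}^{\sf H}_{(b-1)B+1:A} \right| \right] \\ &\overset{(c)}{=} \sum_{b=1}^{A'} \frac{\Delta'_b}{N}\, \mathbb{E}\left[ \log \left| \mathbf{I} + \mathbf{H}_{(b-1)B+1:A}\, \mathbf{Q}_{(b-1)B+1:A}\, \mathbf{H}^{\sf H}_{(b-1)B+1:A} \right| \right] \end{aligned} \tag{2.46} \]
where (a) follows from the left-unitary invariance, (b) follows from Jensen's inequality, and (c) follows from the fact that, without loss of generality, we let $\frac{1}{B} \sum_{m=1}^{B} \lambda_m = 1$.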
This shows that, for asymptotically large $N$ and under the given symmetry conditions on the channel coefficients and rate weights, the worst-case Lagrange multipliers for the weighted maximization of the average rates in (2.11) are $\lambda_m = 1$. Since for $N \to \infty$ the instantaneous rates in (2.5) converge to the average rates in (2.11), the theorem is proved.

Chapter 3

Inter-cell Cooperation and Impact of Channel Training

In this chapter, we analyze the asymptotic achievable rate of linear zero-forcing beamforming (LZFBF) in the large-system limit under non-perfect channel state information at the transmitter (CSIT) and investigate the trade-off between the benefits of inter-cell cooperation and the accompanying costs of channel training. We begin with the finite-dimensional system and consider its user selection and power optimization problem under the sum-power and per-base station power constraints. Then we take the large-system limit and present the limiting regime of the LZFBF precoder and the optimization algorithm for user selection and power allocation. We also analyze the impact of non-perfect CSIT and training overhead, present numerical results including a comparison with finite-dimensional simulation results, and propose a low-complexity randomized scheduling scheme.

3.1 Linear Zero-forcing Beamforming

3.1.1 System Setup

Consider a cellular system formed by $M$ BSs, with $\gamma N$ antennas each, and $KN$ single-antenna user terminals spatially distributed in the coverage area. We assume that the users are divided into $K$ co-located "user groups" with $N$ users each. Users in the same group are statistically equivalent: they see the same pathloss coefficients from all BSs and their small-scale fading channel coefficients are i.i.d.
The received signal vector $\mathbf{y}_k = [y_{k,1} \cdots y_{k,N}]^{\sf T} \in \mathbb{C}^N$ for the $k$-th user group is given by
\[ \mathbf{y}_k = \sum_{m=1}^{M} \alpha_{m,k} \mathbf{H}^{\sf H}_{m,k} \mathbf{x}_m + \mathbf{n}_k \tag{3.1} \]
where $\alpha_{m,k}$ and $\mathbf{H}_{m,k}$ denote the distance-dependent pathloss coefficient and the $\gamma N \times N$ small-scale channel fading matrix from the $m$-th BS to the $k$-th user group, respectively, $\mathbf{x}_m = [x_{m,1} \cdots x_{m,\gamma N}]^{\sf T} \in \mathbb{C}^{\gamma N}$ is the transmitted signal vector of the $m$-th BS, subject to the power constraint $\mathrm{tr}(\mathrm{Cov}(\mathbf{x}_m)) \le P_m$, and $\mathbf{n}_k = [n_{k,1} \cdots n_{k,N}]^{\sf T} \in \mathbb{C}^N$ denotes the additive white Gaussian noise (AWGN) at the user receivers. The elements of $\mathbf{n}_k$ and of $\mathbf{H}_{m,k}$ are i.i.d. $\sim \mathcal{CN}(0, 1)$.

We recall the cluster formation described in Section 2.1: a cooperative cell arrangement with $L$ cooperation clusters is defined by the BS partition $\{\mathcal{M}_1, \ldots, \mathcal{M}_L\}$ of the BS set $\{1, \ldots, M\}$ and the corresponding user group partition $\{\mathcal{K}_1, \ldots, \mathcal{K}_L\}$ of the user group set $\{1, \ldots, K\}$. We assume that the BSs in each cluster $\mathcal{M}_\ell$ act as a single distributed multi-antenna transmitter with $|\mathcal{M}_\ell| \gamma N$ antennas, perfectly coordinated by a central cluster controller, and serve the users in groups $k \in \mathcal{K}_\ell$. The clusters do not cooperate and treat ICI from other clusters as noise. Assuming that each BS operates at its maximum individual transmit power, the ICI plus noise power at any user terminal in group $k \in \mathcal{K}_\ell$ is given by
\[ \sigma_k^2 = 1 + \sum_{m \notin \mathcal{M}_\ell} \alpha^2_{m,k} P_m. \tag{3.2} \]
Each cluster seeks to maximize its own objective function defined by the fairness scheduling. It is easy to show that, under the above system assumptions, the selfish optimal strategy that operates at maximum per-BS power is a Nash equilibrium of the system. At this Nash equilibrium, the clusters are effectively decoupled, since the effect that other clusters have on each cluster $\ell$ is captured by the ICI terms in (3.2), which do not depend on the actual BS transmit covariances $\mathrm{Cov}(\mathbf{x}_m)$.

From the viewpoint of cluster $\ell$, the system is equivalent to a single-cell MIMO downlink channel with a modified channel matrix and noise levels and a per-BS power constraint.
Therefore, from now on we focus on a given reference cluster $\ell = 1$ and, without loss of generality, we indicate the user groups in the reference cluster as $k = 1, \ldots, A$, with $A = |\mathcal{K}_1|$, and the BSs in $\mathcal{M}_1$ as $m = 1, \ldots, B$, with $B = |\mathcal{M}_1|$. After a convenient re-normalization of the channel coefficients, we arrive at the equivalent channel model for the reference cluster given by
\[ \mathbf{y} = \mathbf{H}^{\sf H} \mathbf{x} + \mathbf{z} \tag{3.3} \]
with $\mathbf{y} \in \mathbb{C}^{AN}$, $\mathbf{x} \in \mathbb{C}^{\gamma B N}$, $\mathbf{z} \sim \mathcal{CN}(\mathbf{0}, \mathbf{I}_{AN})$, and the channel matrix $\mathbf{H} \in \mathbb{C}^{\gamma B N \times AN}$ given by
\[ \mathbf{H} = \begin{bmatrix} \beta_{1,1} \mathbf{H}_{1,1} & \cdots & \beta_{1,A} \mathbf{H}_{1,A} \\ \vdots & & \vdots \\ \beta_{B,1} \mathbf{H}_{B,1} & \cdots & \beta_{B,A} \mathbf{H}_{B,A} \end{bmatrix}, \tag{3.4} \]
where we define $\beta_{m,k} = \alpha_{m,k} / \sigma_k$. The pathloss coefficients are fixed constants that depend only on the geometry of the system, and the small-scale fading coefficients are assumed to change independently from time slot to time slot according to a classical block-fading model. This is representative of a typical situation where the distance between BSs and users changes significantly over a time scale of the order of tens of seconds, while the small-scale fading decorrelates completely within a few milliseconds [TV05]. Here, a "time slot" indicates the number of channel uses over which the small-scale coefficients can be considered constant. This is approximately equal to the product of the channel coherence time and the channel coherence bandwidth [TV05].

3.1.2 Downlink Scheduling Optimization Problem

Each cluster controller operates according to a downlink scheduling scheme that allocates instantaneously the transmission resource (signal dimensions and transmit power) to the users. Following [MW00], the scheduling problem is formulated as the maximization of a suitable strictly increasing and concave network utility function $g(\cdot)$ over the region of achievable ergodic rates (throughput region). For users in group $k$, we denote the normalized sum of individual user throughputs by $R_k = \frac{1}{N} \sum_{i=1}^{N} R_k^{(i)}$.
By symmetry, the users in the same group achieve the same throughput; therefore, the throughput vector with $R_k^{(i)} = R_k$ for all users $i$ in group $k$ is achievable. We assume that $g(\cdot)$ is symmetric in the throughputs of users belonging to the same group. Therefore, letting $\mathbf{R} = (R_1, \ldots, R_A)$, the fairness scheduling problem is formulated as
\[ \begin{array}{ll} \text{maximize} & g(\mathbf{R}) \\ \text{subject to} & \mathbf{R} \in \mathcal{R}. \end{array} \tag{3.5} \]
In this work, we consider LZFBF downlink precoding, such that $\mathcal{R}$ indicates the throughput region achievable by LZFBF for the channel model (3.3), under the assumption of operating at the Nash equilibrium described above. Notice that $\mathcal{R}$ includes all rates obtained by long-term (ergodic) average over the time-varying fading channels and by time-sharing of all possible LZFBF policies (including user selection and power allocation). A scheduling policy achieving the optimum throughput point $\mathbf{R}^\star$, solution of (3.5), consists of a rule that, at each scheduling time slot, maps the available channel information $\mathbf{H}$ into a set of users and transmit powers, such that the resulting rates, averaged over time, converge to $\mathbf{R}^\star$.

As a first step towards the solution of (3.5), we focus on the weighted instantaneous sum-rate maximization problem
\[ \begin{array}{ll} \text{maximize} & \displaystyle \sum_{k=1}^{A} \sum_{i=1}^{N} W_k^{(i)} R_k^{(i)} \\ \text{subject to} & \mathbf{R} \in \mathcal{R}^{\rm lzfbf}(\mathbf{H}) \end{array} \tag{3.6} \]
where $W_k^{(i)}$ denotes the rate weight for user $i$ in group $k$, and $\mathcal{R}^{\rm lzfbf}(\mathbf{H})$ is the achievable instantaneous rate region of LZFBF for given channel matrix $\mathbf{H}$. By "instantaneous", we mean that this rate region depends on the given channel realization $\mathbf{H}$, in contrast with the throughput region $\mathcal{R}$, which depends on the statistics of $\mathbf{H}$. Realistically, we assume that $A \ge \gamma B$ (i.e., the number of users in the cluster is larger than or equal to the total number of BS antennas in the cluster) and that all coefficients $\beta_{m,k}$ are strictly positive. Therefore, $\mathrm{rank}(\mathbf{H}) = \gamma B N$ is almost surely satisfied.
In this case, LZFBF cannot serve all users in the cluster simultaneously, and the scheduler must select a subset of users of size not larger than $\gamma B N$ to be served at each time slot. The solution of (3.6) is generally difficult, since it requires a search over all user subsets of cardinality less than or equal to $\gamma B N$. Well-known approaches (see [DS05, YG06]) consider the selection of a user subset in some greedy fashion, by adding users to the active user set one by one, until the objective function in (3.6) cannot be improved further. Finally, even for a fixed set of active users, the problem of optimal LZFBF precoding subject to a per-BS power constraint is non-trivial and has been recently addressed in [WES08, HPC09, Zha10] through fairly involved numerical algorithms. Because of these difficulties, problem (3.6) has so far escaped a clean analytical solution, and most studies resort to extensive and costly Monte Carlo simulation.

In order to overcome the above difficulties, we make the following simplifying assumptions: 1) The scheduler picks a fraction $\mu_k$ of the users in group $k$ by random selection inside the group. User selection is random and independent at each scheduling time slot; 2) The LZFBF precoder is obtained by normalizing the columns of the Moore-Penrose pseudo-inverse of the channel matrix, although this choice is not necessarily optimal under the per-BS power constraint [WES08].

Under these assumptions, let $\boldsymbol{\mu} = (\mu_1, \ldots, \mu_A)$ denote the fractions of active users in groups $1, \ldots, A$, respectively. For given $\boldsymbol{\mu}$, the corresponding effective channel matrix is given by
\[ \mathbf{H}_{\boldsymbol{\mu}} = \begin{bmatrix} \beta_{1,1} \mathbf{H}_{1,1}(\mu_1) & \cdots & \beta_{1,A} \mathbf{H}_{1,A}(\mu_A) \\ \vdots & & \vdots \\ \beta_{B,1} \mathbf{H}_{B,1}(\mu_1) & \cdots & \beta_{B,A} \mathbf{H}_{B,A}(\mu_A) \end{bmatrix}, \tag{3.7} \]
where each block $\mathbf{H}_{m,k}(\mu_k)$ is a $\gamma N \times \mu_k N$ submatrix of $\mathbf{H}_{m,k}$. The user fractions must satisfy $\mu_k \in [0, 1]$ for each $k = 1, \ldots, A$ and $\mu = \mu_{1:A} \le \gamma B$, where we introduce the notation
\[ \mu_{1:k} = \sum_{j=1}^{k} \mu_j. \tag{3.8} \]
Hence, $\mathrm{rank}(\mathbf{H}_{\boldsymbol{\mu}}) = \mu N$ is almost surely satisfied.
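The random selection of assumption 1) and the feasibility condition on the user fractions can be sketched as follows. This is a minimal illustration with hypothetical helper names; it assumes that each $\mu_k N$ is an integer (the code rounds to the nearest integer otherwise) and uses a seeded generator only to make the example repeatable.

```python
import random


def fractions_are_feasible(mu, gamma, B):
    """Check mu_k in [0, 1] for every group and mu_{1:A} <= gamma * B."""
    return all(0.0 <= m <= 1.0 for m in mu) and sum(mu) <= gamma * B


def select_active_users(mu, N, rng):
    """For each group k, draw mu_k * N of its N users uniformly at random
    (without replacement) and return the sorted user indices per group."""
    active = []
    for mu_k in mu:
        n_active = int(round(mu_k * N))
        active.append(sorted(rng.sample(range(N), n_active)))
    return active


mu = [0.5, 0.25, 1.0, 0.0]     # hypothetical fractions for A = 4 groups
rng = random.Random(0)         # seeded only for repeatability
active_sets = select_active_users(mu, N=8, rng=rng)
feasible = fractions_are_feasible(mu, gamma=1, B=2)
too_many = fractions_are_feasible([1.0, 1.0, 1.0], gamma=1, B=2)
```

A fresh draw would be made at every scheduling slot, which is what makes the selection independent across slots.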
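The LZFBF precoding scheme yields the transmitted signal for the active users, $\mathbf{x}_{\boldsymbol{\mu}}$, in the form

$$\mathbf{x}_{\boldsymbol{\mu}} = \mathbf{V}_{\boldsymbol{\mu}} \mathbf{Q}^{1/2} \mathbf{u} \tag{3.9}$$

where $\mathbf{u}$ is the independently coded unit-power user symbol vector of length $\mu N$, $\mathbf{V}_{\boldsymbol{\mu}}$ is the precoding matrix with unit-norm columns, and $\mathbf{Q}$ is the diagonal matrix that contains the user powers on the diagonal. In particular, here we assume that $\mathbf{V}_{\boldsymbol{\mu}}$ is obtained from the Moore-Penrose pseudo-inverse as follows: define the pseudo-inverse of $\mathbf{H}_{\boldsymbol{\mu}}^{\sf H}$ as

$$\mathbf{H}_{\boldsymbol{\mu}}^{+} = \mathbf{H}_{\boldsymbol{\mu}} \left( \mathbf{H}_{\boldsymbol{\mu}}^{\sf H} \mathbf{H}_{\boldsymbol{\mu}} \right)^{-1}, \tag{3.10}$$

and then let $\mathbf{V}_{\boldsymbol{\mu}} = \mathbf{H}_{\boldsymbol{\mu}}^{+} \boldsymbol{\Lambda}_{\boldsymbol{\mu}}^{1/2}$, where the column-normalizing diagonal matrix $\boldsymbol{\Lambda}_{\boldsymbol{\mu}}$ contains the reciprocals of the squared norms of the columns of $\mathbf{H}_{\boldsymbol{\mu}}^{+}$ on the diagonal. Letting $\Lambda_k^{(i)}(\boldsymbol{\mu})$ denote the diagonal element of $\boldsymbol{\Lambda}_{\boldsymbol{\mu}}$ in position $\mu_{1:k-1} N + i$, for $i = 1, \ldots, \mu_k N$, we have

$$\Lambda_k^{(i)}(\boldsymbol{\mu}) = \frac{1}{\left[ \left( \mathbf{H}_{\boldsymbol{\mu}}^{\sf H} \mathbf{H}_{\boldsymbol{\mu}} \right)^{-1} \right]_k^{(i)}} \tag{3.11}$$

where $\left[ \left( \mathbf{H}_{\boldsymbol{\mu}}^{\sf H} \mathbf{H}_{\boldsymbol{\mu}} \right)^{-1} \right]_k^{(i)}$ denotes the element in the corresponding position $\mu_{1:k-1} N + i$ of the main diagonal of the matrix $\left( \mathbf{H}_{\boldsymbol{\mu}}^{\sf H} \mathbf{H}_{\boldsymbol{\mu}} \right)^{-1}$. Rewriting (3.3) with (3.7) and (3.9) and noticing that $\mathbf{H}_{\boldsymbol{\mu}}^{\sf H} \mathbf{V}_{\boldsymbol{\mu}} = \boldsymbol{\Lambda}_{\boldsymbol{\mu}}^{1/2}$, we arrive at the "parallel" channel model for all users selected simultaneously by the LZFBF precoder in the form

$$\mathbf{y}_{\boldsymbol{\mu}} = \boldsymbol{\Lambda}_{\boldsymbol{\mu}}^{1/2} \mathbf{Q}^{1/2} \mathbf{u} + \mathbf{z}_{\boldsymbol{\mu}}. \tag{3.12}$$

The optimization of (3.6) for the channel model (3.12) is still involved, since the channel coefficients $\Lambda_k^{(i)}(\boldsymbol{\mu})$ depend on the active user fractions $\boldsymbol{\mu}$ in a complicated and non-convex way. Nevertheless, as an intermediate step, we consider here the solution of (3.6) for fixed user fractions $\boldsymbol{\mu}$.

3.1.3 Power Allocation under Sum-power or Per-BS Power Constraints

We divide all channel matrix coefficients by $\sqrt{N}$ and multiply the BS input power constraints $P_m$ by $N$, thus obtaining an equivalent system where the channel coefficients have variance that scales as $1/N$. This is useful when we consider the large-system limit for $N \to \infty$ in Section 3.2. Let $q_k^{(i)}$ denote the diagonal element in position $\mu_{1:k-1} N + i$ of $\mathbf{Q}$, corresponding to the power allocated to the $i$-th user of group $k$. The sum-power constraint is given by

$$\frac{1}{N} \mathrm{tr}(\mathbf{Q}) = \frac{1}{N} \sum_{k=1}^{A} \sum_{i=1}^{\mu_k N} q_k^{(i)} \le P_{\rm sum} \tag{3.13}$$

where $P_{\rm sum} = \sum_{m=1}^{B} P_m$.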
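In order to express the per-BS power constraint, let $\boldsymbol{\Phi}_m$ denote a diagonal matrix with all zeros, except for $\gamma N$ consecutive ones in positions $(m-1)\gamma N + 1$ to $m \gamma N$ on the main diagonal. Then, the per-BS power constraint is expressed in terms of the partial trace of the transmitted signal covariance matrix as

$$\frac{1}{N} \mathrm{tr}\left( \boldsymbol{\Phi}_m \mathbf{V}_{\boldsymbol{\mu}} \mathbf{Q} \mathbf{V}_{\boldsymbol{\mu}}^{\sf H} \right) \le P_m, \quad m = 1, \ldots, B. \tag{3.14}$$

More explicitly, (3.14) can be written in terms of the powers $q_k^{(i)}$ as

$$\sum_{k=1}^{A} \sum_{i=1}^{\mu_k N} q_k^{(i)} \theta_{m,k}^{(i)}(\boldsymbol{\mu}) \le P_m, \quad m = 1, \ldots, B \tag{3.15}$$

where we define the coefficients

$$\theta_{m,k}^{(i)}(\boldsymbol{\mu}) = \frac{1}{N} \sum_{\ell=(m-1)\gamma N+1}^{m \gamma N} \left| \left[ \mathbf{V}_{\boldsymbol{\mu}} \right]_{\ell,k}^{(i)} \right|^2 \tag{3.16}$$

and where $\left[ \mathbf{V}_{\boldsymbol{\mu}} \right]_{\ell,k}^{(i)}$ denotes the element of $\mathbf{V}_{\boldsymbol{\mu}}$ in the $\ell$-th row and the $(\mu_{1:k-1} N + i)$-th column. Since $\mathbf{V}_{\boldsymbol{\mu}}$ has unit-norm columns, $\sum_{m=1}^{B} \theta_{m,k}^{(i)} = 1/N$ for all $k, i$.

The power optimization problem that solves (3.6) for fixed user fractions $\boldsymbol{\mu}$ is given by

$$\text{maximize } \sum_{k=1}^{A} \sum_{i=1}^{\mu_k N} W_k^{(i)} \log\left( 1 + \Lambda_k^{(i)}(\boldsymbol{\mu}) q_k^{(i)} \right) \tag{3.17}$$

subject to (3.13) in the case of the sum-power constraint, or to (3.15) in the case of the per-BS power constraint. The solution of (3.17) subject to the sum-power constraint is immediately given by the water-filling formula

$$q_k^{(i)} = \left[ \frac{W_k^{(i)}}{\lambda} - \frac{1}{\Lambda_k^{(i)}(\boldsymbol{\mu})} \right]_+ \tag{3.18}$$

where $\lambda \ge 0$ is the Lagrange multiplier corresponding to the sum-power constraint.

In the case of the per-BS power constraint, we can use Lagrange duality and the subgradient iteration method, as described in the following. The Lagrangian for (3.17) is given by (the dependency on $\boldsymbol{\mu}$ is dropped for notational simplicity)

$$\mathcal{L}(\mathbf{q}, \boldsymbol{\lambda}) = \sum_{k=1}^{A} \sum_{i=1}^{\mu_k N} W_k^{(i)} \log\left( 1 + \Lambda_k^{(i)} q_k^{(i)} \right) - \boldsymbol{\lambda}^{\sf T} \left[ \boldsymbol{\Theta} \mathbf{q} - \mathbf{P} \right] \tag{3.19}$$

where $\boldsymbol{\lambda} \ge \mathbf{0}$ is a vector of dual variables corresponding to the $B$ BS power constraints, $\boldsymbol{\Theta}$ is the $B \times \mu N$ matrix containing the coefficients $\theta_{m,k}^{(i)}$, and $\mathbf{P} = (P_1, \ldots, P_B)^{\sf T}$. The KKT conditions are given by

$$\frac{\partial \mathcal{L}}{\partial q_k^{(i)}} = \frac{W_k^{(i)} \Lambda_k^{(i)}}{1 + \Lambda_k^{(i)} q_k^{(i)}} - \boldsymbol{\lambda}^{\sf T} \boldsymbol{\theta}_k^{(i)} \le 0 \tag{3.20}$$

where $\boldsymbol{\theta}_k^{(i)}$ is the column of $\boldsymbol{\Theta}$ containing the coefficients $\theta_{m,k}^{(i)}$ for $m = 1, \ldots, B$.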
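Solving for $q_k^{(i)}$, we find

$$q_k^{(i)}(\boldsymbol{\lambda}) = \left[ \frac{W_k^{(i)}}{\boldsymbol{\lambda}^{\sf T} \boldsymbol{\theta}_k^{(i)}} - \frac{1}{\Lambda_k^{(i)}} \right]_+ \tag{3.21}$$

Then, substituting this solution into $\mathcal{L}(\mathbf{q}, \boldsymbol{\lambda})$, we solve the dual problem by minimizing $\mathcal{L}(\mathbf{q}(\boldsymbol{\lambda}), \boldsymbol{\lambda})$ with respect to $\boldsymbol{\lambda} \ge \mathbf{0}$. It is immediate to check that for any $\boldsymbol{\lambda}' \ge \mathbf{0}$,

$$\mathcal{L}(\mathbf{q}(\boldsymbol{\lambda}'), \boldsymbol{\lambda}') \ge \mathcal{L}(\mathbf{q}(\boldsymbol{\lambda}), \boldsymbol{\lambda}') = (\boldsymbol{\lambda}' - \boldsymbol{\lambda})^{\sf T} \left( \mathbf{P} - \boldsymbol{\Theta} \mathbf{q}(\boldsymbol{\lambda}) \right) + \mathcal{L}(\mathbf{q}(\boldsymbol{\lambda}), \boldsymbol{\lambda}) \tag{3.22}$$

Therefore, $\left( \mathbf{P} - \boldsymbol{\Theta} \mathbf{q}(\boldsymbol{\lambda}) \right)$ is a subgradient for $\mathcal{L}(\mathbf{q}(\boldsymbol{\lambda}), \boldsymbol{\lambda})$. It follows that the dual problem can be solved by a simple $B$-dimensional subgradient iteration over the vector of dual variables $\boldsymbol{\lambda}$.

3.2 Large System Limit

In this section, we consider the limit of the above instantaneous rate maximization problems as $N \to \infty$, when $\gamma$, $A$, $B$, and $\boldsymbol{\mu}$ are fixed. While the coefficients of the limiting optimization problem are obtained in general through the solution of fixed-point equations, under certain system symmetries a closed-form explicit expression can be found. Then, we consider the simultaneous optimization of the user fractions and powers in the large system limit. Finally, we consider the solution of the downlink scheduling optimization problem (3.5) in the large system regime.

3.2.1 Asymptotic Analysis

We start by finding the large system limit expression for the coefficients $\Lambda_k^{(i)}(\boldsymbol{\mu})$. This is provided by:

Theorem 3.1 For all $i = 1, \ldots, \mu_k N$, the following limit holds almost surely:

$$\lim_{N \to \infty} \Lambda_k^{(i)}(\boldsymbol{\mu}) = \Lambda_k(\boldsymbol{\mu}) = \gamma \sum_{m=1}^{B} \beta_{m,k}^2 \eta_m(\boldsymbol{\mu}) \tag{3.23}$$

where $(\eta_1(\boldsymbol{\mu}), \ldots, \eta_B(\boldsymbol{\mu}))$ is the unique solution in $[0, 1]^B$ of the fixed point equations

$$\eta_m = 1 - \sum_{q=1}^{A} \mu_q \frac{\eta_m \beta_{m,q}^2}{\gamma \sum_{\ell=1}^{B} \eta_\ell \beta_{\ell,q}^2}, \quad m = 1, \ldots, B \tag{3.24}$$

with respect to the variables $\boldsymbol{\eta} = \{\eta_m\}$.

Proof: See Section 3.5.1.

Notice that the limit (3.23) depends only on $k$ (user group index) and not on $i$ (user index within the group), consistently with the fact that, in this model, users in the same co-located group are statistically equivalent. Under some special symmetry conditions, the general problem simplifies significantly.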
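As a numerical illustration of Theorem 3.1, the fixed point (3.24) can be approximated by successive substitution. The following is a minimal sketch and not the procedure used in this work: the function names `solve_eta` and `asymptotic_gains`, the all-ones starting point, and the stopping rule are assumptions made for illustration, and convergence of plain substitution is assumed rather than proved. The load is assumed to satisfy $\mu < \gamma B$ so that no denominator vanishes.

```python
def solve_eta(beta2, mu, gamma, iters=500, tol=1e-12):
    """Approximate the solution eta of the fixed-point equations (3.24).

    beta2[m][k] : squared path gain beta_{m,k}^2 from BS m to user group k
    mu[k]       : fraction of active users in group k
    gamma       : ratio of antennas per BS to users per group
    Hypothetical helper: plain successive substitution from eta = 1.
    """
    B, A = len(beta2), len(mu)
    eta = [1.0] * B
    for _ in range(iters):
        # Denominator gamma * sum_l eta_l beta_{l,q}^2 for every group q.
        denom = [gamma * sum(eta[l] * beta2[l][q] for l in range(B))
                 for q in range(A)]
        new = [1.0 - sum(mu[q] * eta[m] * beta2[m][q] / denom[q]
                         for q in range(A))
               for m in range(B)]
        if max(abs(a - b) for a, b in zip(new, eta)) < tol:
            return new
        eta = new
    return eta


def asymptotic_gains(beta2, mu, gamma):
    """Limit coefficients Lambda_k(mu) of (3.23) for every user group k."""
    eta = solve_eta(beta2, mu, gamma)
    B = len(beta2)
    return [gamma * sum(beta2[m][k] * eta[m] for m in range(B))
            for k in range(len(mu))]
```

For a single BS ($B = 1$) the map reaches $\eta_1 = 1 - \mu/\gamma$ after one substitution, so that $\Lambda_k = (\gamma - \mu)\beta_{1,k}^2$.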
We recall the system symmetry in Section 2.2.4 and assume that $B$ divides $A$, let $A' = A/B$, and assume that the $B \times A$ matrix of channel gains $\boldsymbol{\beta} = [\beta_{m,k}^2]$ can be partitioned into $A'$ submatrices of size $B \times B$ such that each submatrix has the property that all rows are permutations of the first row, and all columns are permutations of the first column. In analogy with "strongly symmetric" discrete memoryless channels, we shall refer to these submatrices as "strongly symmetric blocks." Also, we define a set of user groups whose corresponding columns in the matrix $\boldsymbol{\beta}$ form a strongly symmetric block as a user equivalence class. Then, we have $A'$ user equivalence classes, where each class corresponds to the user groups whose columns in the gain matrix belong to the same strongly symmetric block. We re-index the user groups such that groups $\{(j-1)A' + i : j = 1, \ldots, B\}$ form the $i$-th equivalence class, for $i = 1, \ldots, A'$. To fix ideas, consider Figure 2.1, where a linear cellular layout with 2 cells and 8 user groups is considered, and assume that the gain coefficients depend on distance and therefore are given as

$$\boldsymbol{\beta} = \begin{bmatrix} a & b & c & c & f & e & e & d \\ f & e & e & d & a & b & c & c \end{bmatrix} \tag{3.25}$$

for some positive numbers $a, b, c, d, e, f$. We notice that this matrix can be decomposed into the $A' = 4$ strongly symmetric blocks

$$\begin{bmatrix} a & f \\ f & a \end{bmatrix}, \quad \begin{bmatrix} b & e \\ e & b \end{bmatrix}, \quad \begin{bmatrix} c & e \\ e & c \end{bmatrix}, \quad \begin{bmatrix} c & d \\ d & c \end{bmatrix}$$

with the required property.

When this symmetry condition holds, under mild conditions on the concave network utility function, the scheduler must set the fractions of all user groups in the same equivalence class to be equal. This is because such groups (e.g., user group pairs $(1, 5)$, $(2, 6)$, $(3, 7)$, and $(4, 8)$ in the example) are completely equivalent in terms of the set of channel gains seen from all BSs in the cluster. We denote the corresponding common fraction values by $\mu'_i$ for $i = 1, \ldots, A'$, such that $\mu_{(j-1)A'+i} = \mu'_i$ for all $j = 1, \ldots, B$.
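The block decomposition of the example (3.25) can be checked mechanically. The sketch below is only an illustration: the helper names `equivalence_blocks` and `is_strongly_symmetric` are hypothetical, indices are 0-based (so group $(j-1)A'+i$ becomes column `j * A_prime + i`), and the numerical gains are the values quoted later for the Figure 3.1 experiment.

```python
def equivalence_blocks(gains, B):
    """Split a B x A gain matrix into A' = A/B blocks of size B x B.

    Block i collects the columns j*A' + i for j = 0..B-1, the 0-based
    version of the re-indexing {(j-1)A' + i : j = 1..B} used in the text.
    """
    A = len(gains[0])
    if A % B != 0:
        raise ValueError("B must divide A")
    A_prime = A // B
    blocks = []
    for i in range(A_prime):
        cols = [j * A_prime + i for j in range(B)]
        blocks.append([[gains[m][c] for c in cols] for m in range(B)])
    return blocks


def is_strongly_symmetric(block):
    """True if all rows are permutations of the first row and all
    columns are permutations of the first column."""
    first_row = sorted(block[0])
    first_col = sorted(row[0] for row in block)
    rows_ok = all(sorted(row) == first_row for row in block)
    cols_ok = all(sorted(row[j] for row in block) == first_col
                  for j in range(len(block[0])))
    return rows_ok and cols_ok


# Example (3.25) with [a b c d e f] = [1.5 1.3 1.0 0.5 0.3 0.2].
a, b, c, d, e, f = 1.5, 1.3, 1.0, 0.5, 0.3, 0.2
example = [[a, b, c, c, f, e, e, d],
           [f, e, e, d, a, b, c, c]]
example_blocks = equivalence_blocks(example, B=2)
```

Each of the four entries of `example_blocks` passes `is_strongly_symmetric`, matching the pairs $(1,5)$, $(2,6)$, $(3,7)$, and $(4,8)$ listed in the text.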
In this case, for any $m$, we have

$$\sum_{q=1}^{A} \mu_q \frac{\beta_{m,q}^2}{\gamma \sum_{\ell=1}^{B} \beta_{\ell,q}^2} = \frac{1}{\gamma} \sum_{i=1}^{A'} \mu'_i \sum_{j=1}^{B} \frac{\beta_{m,(j-1)A'+i}^2}{\sum_{\ell=1}^{B} \beta_{\ell,(j-1)A'+i}^2} = \frac{1}{\gamma} \sum_{i=1}^{A'} \mu'_i = \frac{\mu}{\gamma B}$$

where we have used the fact that, by the symmetry condition, $\sum_{j=1}^{B} \frac{\beta_{m,(j-1)A'+i}^2}{\sum_{\ell=1}^{B} \beta_{\ell,(j-1)A'+i}^2} = 1$ and $\sum_{i=1}^{A'} \mu'_i = \frac{1}{B} \sum_{q=1}^{A} \mu_q = \frac{\mu}{B}$. It follows that the solution of the fixed point equation (3.24) is given explicitly by

$$\eta_m(\boldsymbol{\mu}) = 1 - \frac{\mu}{\gamma B} \tag{3.26}$$

which is independent of $m$, and (3.23) yields

$$\Lambda_k(\boldsymbol{\mu}) = \gamma \left( 1 - \frac{\mu}{\gamma B} \right) \sum_{m=1}^{B} \beta_{m,k}^2. \tag{3.27}$$

For all $k = (j-1)A' + i$, $\forall j = 1, \ldots, B$ (indicating user groups in the same equivalence class), the sum $\sum_{m=1}^{B} \beta_{m,k}^2$ is a constant independent of $k$. Then, with some abuse of notation, we introduce the notation $\beta_i^2 = \sum_{m=1}^{B} \beta_{m,k}^2$ for all groups $k$ in the $i$-th equivalence class.

3.2.2 Weighted Sum-rate Maximization

Using the asymptotic results obtained above, we consider the weighted sum-rate maximization problem in (3.17). First, we focus on the sum-power constraint (3.13). In the large system limit, we consider maximizing the weighted aggregate user group rates, normalized by $N$, in the case where the weights $W_k^{(i)}$ for all users in group $k$ are identical (i.e., independent of $i$) and denoted by $W_k$. We shall see later that the weights are calculated by the scheduler that solves the general network utility maximization problem (3.5) and, under our assumptions on the form of $g(\cdot)$, these weights must be identical for statistically equivalent users. Therefore, this assumption does not involve any loss of generality when the weighted sum-rate maximization problem is used as an intermediate step for the scheduling rule that addresses the throughput maximization problem (3.5). Furthermore, by the symmetry of the water-filling power allocation, it also follows that the power $q_k^{(i)}$ allocated to all active users in group $k$ must be identical (independent of $i$), and can be denoted by $q_k$. Under these conditions, from (3.23) we have that the instantaneous group rate converges to

$$\frac{1}{N} \sum_{i=1}^{N} R_k^{(i)} \to \mu_k R_k \quad \text{with} \quad R_k = \log\left( 1 + \Lambda_k(\boldsymbol{\mu}) q_k \right). \tag{3.28}$$

Notice that $R_k$ is the limit instantaneous rate for any user in group $k$ that is effectively scheduled (the non-scheduled users have zero instantaneous rate). Because of the large-system limit, this rate is a deterministic quantity.

Using (3.24), we can write the large-system limit weighted sum-rate maximization problem subject to the sum-power constraint in the form:

$$\text{maximize } \sum_{k=1}^{A} W_k \mu_k \log\left( 1 + \gamma \left( \sum_{m=1}^{B} \beta_{m,k}^2 \eta_m \right) q_k \right) \tag{3.29a}$$

$$\text{subject to } \sum_{k=1}^{A} \mu_k q_k \le P_{\rm sum}, \quad \sum_{k=1}^{A} \mu_k \le \gamma B, \tag{3.29b}$$

$$\eta_m = 1 - \sum_{k=1}^{A} \mu_k \frac{\eta_m \beta_{m,k}^2}{\gamma \sum_{\ell=1}^{B} \eta_\ell \beta_{\ell,k}^2}, \quad m = 1, \ldots, B \tag{3.29c}$$

$$0 \le \eta_m \le 1, \quad m = 1, \ldots, B \tag{3.29d}$$

$$q_k \ge 0, \quad 0 \le \mu_k \le 1, \quad k = 1, \ldots, A \tag{3.29e}$$

This problem is generally non-convex in $\mathbf{q}$, $\boldsymbol{\mu}$, and $\boldsymbol{\eta}$. However, for fixed $\boldsymbol{\mu}$ and $\boldsymbol{\eta}$, it is convex in $\mathbf{q}$, and the solution is given by water-filling (see also (3.18)):

$$q_k = \left[ \frac{W_k}{\lambda} - \frac{1}{\gamma \sum_{m=1}^{B} \beta_{m,k}^2 \eta_m} \right]_+ \tag{3.30}$$

For fixed $\boldsymbol{\eta}$ and $\mathbf{q}$, we have a linear program with respect to $\boldsymbol{\mu}$. Finally, for fixed $\boldsymbol{\mu}$ and $\mathbf{q}$, the problem is degenerate with respect to $\boldsymbol{\eta}$, since the equality constraint (3.29c), which corresponds to the fixed-point equation (3.24), has a unique solution $\boldsymbol{\eta} \in [0, 1]^B$ for all feasible $\boldsymbol{\mu}$.

In the symmetric system case with the conditions given in the previous section, user groups in the same equivalence class are completely symmetric, since the limits $\Lambda_k(\boldsymbol{\mu})$ depend only on the equivalence class and not on the specific user group in the class. Assuming that the user groups in the same equivalence class also have the same rate weights, the optimization problem in the symmetric case reduces to optimizing the powers $q'_i$ and the fractions $\mu'_i$ for the equivalence classes. Letting $\mu' = \sum_{i=1}^{A'} \mu'_i = \mu/B$, we can state the optimization problem in the symmetric case as:

$$\text{maximize } B \sum_{i=1}^{A'} W_i \mu'_i \log\left( 1 + \gamma \left( 1 - \frac{\mu'}{\gamma} \right) \beta_i^2 q'_i \right) \tag{3.31a}$$

$$\text{subject to } B \sum_{i=1}^{A'} \mu'_i q'_i \le P_{\rm sum}, \tag{3.31b}$$

$$\sum_{i=1}^{A'} \mu'_i \le \gamma, \tag{3.31c}$$

$$q'_i \ge 0, \quad 0 \le \mu'_i \le 1, \quad i = 1, \ldots, A' \tag{3.31d}$$

The net effect of the symmetry is a sort of "resource pooling": the system with a cluster of $B$ cooperating BSs reduces to an equivalent single-BS system with total transmit power $P_{\rm sum}/B$, load $\mu' = \mu/B$, $A' = A/B$ user classes, and equivalent channel path gains $\beta_i^2 = \sum_{m=1}^{B} \beta_{m,k}^2$ given by the combination of the path gains from all BSs in the cluster to the user groups in the $i$-th equivalence class.

Next, we consider the per-BS power constraint given in (3.15). Since the users in group $k$ have identical $\Lambda_k(\boldsymbol{\mu})$, independent of $i$, and since we can assume without loss of generality that they are all given positive powers $q_k^{(i)} > 0$ (notice: if a user were given zero power, we could decrease the corresponding fraction $\mu_k$ and take it out of the active user set), also in this case we arrive at the conclusion that in the large system limit the powers allocated to users in group $k$ must be identical, i.e., $q_k^{(i)} = q_k$, independent of $i$. Using this in the constraint (3.15), we obtain

$$\sum_{k=1}^{A} q_k \theta_{m,k}(\boldsymbol{\mu}) \le P_m, \quad m = 1, \ldots, B, \tag{3.32}$$

where

$$\theta_{m,k}(\boldsymbol{\mu}) = \sum_{i=1}^{\mu_k N} \theta_{m,k}^{(i)}(\boldsymbol{\mu}) = \frac{1}{N} \sum_{i=1}^{\mu_k N} \sum_{\ell=1+(m-1)\gamma N}^{m \gamma N} \left| \left[ \mathbf{V}_{\boldsymbol{\mu}} \right]_{\ell,k}^{(i)} \right|^2. \tag{3.33}$$

It is interesting to notice that $\theta_{m,k}(\boldsymbol{\mu})$ is the squared Frobenius norm (normalized by $N$) of the submatrix of $\mathbf{V}_{\boldsymbol{\mu}}$ corresponding to the users in group $k$ (columns from $\mu_{1:k-1} N + 1$ to $\mu_{1:k} N$) and the antennas of BS $m$ (rows from $(m-1)\gamma N + 1$ to $m \gamma N$). Under the system symmetry conditions, we have the following result:

Theorem 3.2 For symmetric systems, if all the BSs in the cluster have the same power constraint, i.e., $P_1 = \cdots = P_B = P$, then the per-BS power constraint coincides with the sum-power constraint with $P_{\rm sum} = BP$.

Proof: See Section 3.5.2.
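For fixed class fractions, the symmetric problem (3.31) reduces to water-filling over the equivalence classes, with gains $(\gamma - \mu')\beta_i^2$ and per-BS budget $P_{\rm sum}/B$. The sketch below illustrates this under stated assumptions: the names `waterfill` and `symmetric_wsr` are hypothetical, the multiplier is found by bisection (one of several possible methods), weights are assumed strictly positive, and rates are in nats because the natural logarithm is used.

```python
import math


def waterfill(weights, gains, fracs, power, iters=200):
    """Water-filling q_i = [w_i / lam - 1 / g_i]_+ subject to
    sum_i fracs[i] * q_i <= power, with lam found by bisection."""
    def used(lam):
        return sum(f * max(w / lam - 1.0 / g, 0.0)
                   for w, g, f in zip(weights, gains, fracs))

    # At lam = max(w * g) every power is zero, so the budget is met.
    lo, hi = 1e-12, max(w * g for w, g in zip(weights, gains))
    for _ in range(iters):
        lam = 0.5 * (lo + hi)
        if used(lam) > power:
            lo = lam      # too much power used: raise the multiplier
        else:
            hi = lam      # feasible: try a smaller multiplier
    return [max(w / hi - 1.0 / g, 0.0) for w, g in zip(weights, gains)]


def symmetric_wsr(weights, beta2, fracs, gamma, B, p_sum):
    """Objective (3.31a) and powers for fixed class fractions.

    beta2[i] is the pooled gain of class i (sum over the B base stations),
    fracs[i] is the class fraction, and sum(fracs) is the load mu'.
    """
    load = sum(fracs)
    if not 0.0 < load < gamma:
        raise ValueError("this sketch needs 0 < mu' < gamma")
    gains = [(gamma - load) * b for b in beta2]
    q = waterfill(weights, gains, fracs, p_sum / B)
    rate = B * sum(w * f * math.log(1.0 + g * qi)
                   for w, f, g, qi in zip(weights, fracs, gains, q))
    return rate, q
```

When the budget is small, classes whose gain falls below the water level receive zero power, which is the mechanism that leaves some classes unserved in the instantaneous problem.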
It follows that under the conditions of Theorem 3.2 the problem (3.31) completely characterizes the optimization problem also under the per-BS power constraints.

3.2.3 Optimization of the User Fractions and Powers

While (3.29) is still a non-convex problem in $(\mathbf{q}, \boldsymbol{\mu})$, we can find near-optimal solutions by borrowing from the greedy user selection heuristic used in the finite-dimensional case (see [DS05,YG06]). In particular, we consider the approach of incrementing user fractions sequentially in very small steps, $\Delta \ll 1$, until the objective function value cannot be increased any longer. Letting $\Delta$ become infinitesimal, this is equivalent to greedy user selection in the large system limit, where $\Delta$, denoting the fraction of one user relative to the total number of users, goes to zero. We start from $\boldsymbol{\mu} = \mathbf{0}$ and at each step we find $k$ such that incrementing $\mu_k$ by $\Delta$ yields the largest improvement and the resulting new $\boldsymbol{\mu}$ is feasible. For the tentative configuration of the fractions $\boldsymbol{\mu}$, the corresponding power allocation is obtained from the water-filling solution. We stop when no further increment can improve the objective function value. The detailed description is given in the following:

1. Initialize variables such that $n = 0$, $R_{\rm wsr}(0) = 0$, $\boldsymbol{\mu} = \mathbf{0}$, and $\mu = 0$.
2. Set $n \leftarrow n + 1$. For $\Delta \ll 1$, set $\boldsymbol{\mu}^{(k)} = \boldsymbol{\mu} + \Delta \mathbf{e}_k$ (note: $\mathbf{e}_k$ denotes a vector of length $A$ of all zeros with a single 1 in position $k$), for $k \in \mathcal{S} = \{j : \mu_j + \Delta \le 1\}$. If $\mathcal{S}$ is empty or $\mu + \Delta > \gamma B$, then exit and keep the current $\boldsymbol{\mu}$ and the corresponding rates as the final values of the algorithm. Otherwise, compute the tentative weighted sum rate value $R_{\rm wsr}^{(k)}$ for each $k$, by solving the optimization problem in (3.29) for fixed $\boldsymbol{\mu}^{(k)}$ with the water-filling power allocation.
3. Let $\hat{k} = \arg\max_{k \in \mathcal{S}} R_{\rm wsr}^{(k)}$ and set $R_{\rm wsr}(n) = R_{\rm wsr}^{(\hat{k})}$.
4. If $R_{\rm wsr}(n) > R_{\rm wsr}(n-1)$, then set $\boldsymbol{\mu} \leftarrow \boldsymbol{\mu}^{(\hat{k})}$, $\mu \leftarrow \mu + \Delta$, and go back to step 2.
5. Otherwise, if $R_{\rm wsr}(n) \le R_{\rm wsr}(n-1)$, exit and take the current $\boldsymbol{\mu}$ and the corresponding rates as the final values of the algorithm.

[Figure 3.1: Cluster ($B = 2$) sum rate of the proposed optimization algorithm as $\mu'$ increases from 0 to $\gamma = 4$. Horizontal axis: $\mu'$; vertical axis: cluster sum rate (bps/Hz); the optimal value found by the exhaustive search is marked.]

Figure 3.1 shows the sum rate versus $\mu' = \mu/B$ in a symmetric setting, when the above user fraction and power allocation algorithm is applied, and compares it with the globally optimal value obtained from the exhaustive search algorithm. Under the two-cell model described in Section 3.2.1, we assume the two BSs are cooperating ($B = 2$), the channel gain coefficients are given as in (3.25) with $[a\ b\ c\ d\ e\ f] = [1.5\ 1.3\ 1.0\ 0.5\ 0.3\ 0.2]$, and the antenna ratio, transmit power, and user group weights are $\gamma = 4$, $P = 15$ dB, and $W_i = 1, \forall i$, respectively. The exhaustive algorithm searches for the optimal weighted sum rate in the $A'$-dimensional space of $\boldsymbol{\mu}'$, where each $\mu'_i$ ranges from 0 to 1 with $\sum_{i=1}^{A'} \mu'_i < \gamma$. If we discretize this domain with step size $\Delta$ in each dimension, the computational complexity of the exhaustive algorithm is roughly $O((1/\Delta)^{A'})$, whereas the greedy algorithm has $O(A'/\Delta)$ complexity. For the greedy algorithm curve, we removed the comparison between $R_{\rm wsr}(n)$ and $R_{\rm wsr}(n-1)$ in step 4 in order to see the objective function over the whole range from 0 to $\gamma$. When $\Delta = 0.01$, the greedy algorithm achieves the same optimal value as the exhaustive algorithm at $\mu' = 2.76$.

3.2.4 Network Utility Function Maximization

In general, the solution of (3.6) (or (3.29) in the large system limit) for the case $A > \gamma B$ (more users than antennas) yields an unbalanced distribution of instantaneous rates, where some user classes are not served at all (we have $\mu_k = 0$ for some $k$). This shows that, for a general strictly concave network utility function $g(\cdot)$, the ergodic rate region $\mathcal{R}$ requires time-sharing even in the asymptotic large-system case. Finding the solution of (3.5) is therefore extremely hard.
Nevertheless, this solution can be computed to any level of accuracy by using a method inspired by the stochastic optimization approach of [GNT06]. Interestingly, the algorithm can be used both for the computation of the optimum throughput point in the large system limit and as an actual downlink scheduling algorithm that can be applied almost verbatim to the actual finite-dimensional system. In the former case, the algorithm is equivalent to a Lagrangian iteration where the "virtual queues" (to be defined in the following) play the role of Lagrange multipliers. In the latter case, when applied to the finite-dimensional system, the algorithm performs a stochastic "Lyapunov drift" optimization (see [GNT06]).

For each user group $k = 1, \ldots, A$, define a virtual queue that evolves according to

$$Q_k(t+1) = \left[ Q_k(t) - r_k(t) \right]_+ + a_k(t) \tag{3.34}$$

where $r_k(t)$ denotes the virtual service rate and $a_k(t)$ the virtual arrival process. The queues are initialized by $Q_k(0) = r_k(0) = 0$. Then, at each iteration $t = 1, 2, \ldots$, the virtual arrival process is given by $a_k(t) = a_k^\star$, where $\mathbf{a}^\star$ is the solution of the convex problem:

$$\text{maximize } V g(\mathbf{a}) - \sum_{k=1}^{A} Q_k(t) a_k \quad \text{subject to } 0 \le a_k \le a_{\max}, \ \forall k \tag{3.35}$$

and where $V, a_{\max} > 0$ are suitably chosen constants that determine the convergence properties of the iterative algorithm. The service rates are given by

$$r_k(t) = \mu_k(t) \log\left( 1 + \gamma \left( \sum_{m=1}^{B} \beta_{m,k}^2 \eta_m(t) \right) q_k(t) \right)$$

where $(\boldsymbol{\mu}(t), \mathbf{q}(t), \boldsymbol{\eta}(t))$ is the solution of (3.29) for weights $W_k = Q_k(t)$. Then, the virtual queues are updated according to (3.34).

The theory developed in [GNT06] (see also [SMCN10]) ensures the following result. Let $\mathbf{r}(t)$ denote the vector of service rates generated by the above iterative algorithm. Then,

$$\liminf_{t \to \infty} g\left( \frac{1}{t} \sum_{\tau=0}^{t-1} B \mathbf{r}(\tau) \right) \ge g(\mathbf{R}^\star) - \frac{K}{V} \tag{3.36}$$

where $\mathbf{R}^\star$ is the solution of (3.5) and $K$ is a constant that depends on the system parameters and on $a_{\max}$.
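The recursion (3.34) and (3.35) can be sketched in a few lines once a utility is fixed. The example below assumes the proportional fairness utility $g(\mathbf{a}) = \sum_k \log a_k$, for which (3.35) separates per group and gives $a_k^\star = \min(V/Q_k, a_{\max})$. The names `pf_arrivals` and `run_virtual_queues` are hypothetical, and `service_rates` is a placeholder callable standing in for the solution of (3.29) with weights $W_k = Q_k(t)$.

```python
def pf_arrivals(queues, V, a_max):
    """Solve (3.35) for g(a) = sum_k log(a_k): a_k = min(V / Q_k, a_max)."""
    return [a_max if q <= 0.0 else min(V / q, a_max) for q in queues]


def run_virtual_queues(service_rates, num_groups, V, a_max, steps):
    """Iterate the virtual queues (3.34).

    service_rates(weights) must return the list of rates r_k(t) obtained
    with weights W_k = Q_k(t); here it is an assumed, user-supplied callable.
    Returns the time-averaged service rates and the final queue lengths.
    """
    Q = [0.0] * num_groups
    total = [0.0] * num_groups
    for _ in range(steps):
        r = service_rates(Q)                 # weighted sum-rate step
        a = pf_arrivals(Q, V, a_max)         # arrival step (3.35)
        Q = [max(q - rk, 0.0) + ak           # queue update (3.34)
             for q, rk, ak in zip(Q, r, a)]
        total = [s + rk for s, rk in zip(total, r)]
    return [s / steps for s in total], Q
```

With a constant toy service vector the queues settle where arrivals match service, $Q_k = V/r_k$, which illustrates how the queue lengths act as Lagrange multipliers.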
In particular, using the results in [SMCN10] we can prove the bound

$$K \le \frac{A}{2} \left[ a_{\max}^2 + \log^2\left( 1 + \max\{\beta_{m,k}^2 : \forall m, k\} P_{\rm sum} \right) \right]$$

By choosing $V$ and $a_{\max}$ appropriately, we can ensure a desired tradeoff between the accuracy of the approximation of the optimum point and the convergence speed of the iterative algorithm.

It should be noticed that if we use the greedy optimization of the user fractions described in Section 3.2.3 instead of the exact solution of (3.31), then the performance guarantee (3.36) is no longer valid. However, the algorithm ensures that the throughput point that maximizes $g(\cdot)$ over the ergodic rate region achievable with the (suboptimal) greedy optimization of the user fractions can be approached arbitrarily closely.

3.3 Channel Estimation and Non-Perfect CSIT

So far, we have assumed that the transmitter (cluster controller) has perfect CSIT. In this section we consider the case where the BSs in the cluster broadcast a set of downlink pilot signals in order to enable the users to measure their downlink channels and feed back channel state information in some form, in order to provide CSIT and enable the computation of the LZFBF precoding matrix, user selection, and power allocation. We seek the characterization of the non-trivial tradeoff between the advantage of having a large number of transmit antennas or a large cluster size (large $\gamma$ and/or large $B$) and the overhead required for estimating the channels.

We assume that the channels are constant over time-frequency blocks of size $WT$ complex dimensions, where $W$ denotes the system coherence bandwidth (in Hz) and $T$ denotes the system coherence time (in sec.). For each such block, $\gamma_p B N$ dimensions are dedicated to downlink training, in order to allow all users in the cluster to estimate the composite channel (i.e., the corresponding column of $\mathbf{H}$ in (3.4)) formed by $\gamma B N$ coefficients. Since the channel vectors are Gaussian, linear MMSE estimation is optimal with respect to the MSE criterion.
A simple dimensionality argument shows that the MSE can be made arbitrarily small as $\sigma_k^2 \to 0$ (vanishing noise plus ICI) if and only if $\gamma_p \ge \gamma$. The ratio $\gamma_p / \gamma$ denotes the "pilot dimensionality overhead", relative to the minimum number of pilot dimensions that allow MMSE estimation, under the condition that the MMSE vanishes as $\sigma_k^2 \to 0$. In the following, we assume that this condition holds.

Focusing on the estimation of a generic column of $\mathbf{H}$ in (3.4) corresponding to some user $j$ in group $k$ of the reference cluster, the channel model of downlink channel estimation based on the common pilots is given by

$$\mathbf{y}_k^{(j)} = \mathbf{T} \mathbf{h}_k^{(j)} + \mathbf{z}_k^{(j)} \tag{3.37}$$

where $\mathbf{T}$ is a $\gamma_p B N \times \gamma B N$ training matrix with equal-energy orthogonal columns, corresponding to the training sequences sent in parallel from the $\gamma B N$ antennas of the cluster joint transmitter (notice that the vertical dimension corresponds to channel uses, or "time", and the horizontal dimension corresponds to the antennas), the vector $\mathbf{h}_k^{(j)}$ is the corresponding channel vector, obtained by stacking the channels (including their path coefficients) from the different base stations forming the cluster, and $\mathbf{z}_k^{(j)}$ is a vector of i.i.d. $\mathcal{CN}(0, 1)$ normalized noise plus interference samples. For simplicity, we re-index the base stations forming the reference cluster as $m = 1, \ldots, B$ and the user groups as $k = 1, \ldots, A$. With this notation, from (3.4) we have

$$\mathbf{h}_k^{(j)} = \begin{bmatrix} \beta_{1,k} \mathbf{h}_{1,k}^{(j)} \\ \vdots \\ \beta_{B,k} \mathbf{h}_{B,k}^{(j)} \end{bmatrix} \tag{3.38}$$

where $\mathbf{h}_{i,k}^{(j)}$ denotes the $j$-th column of the block $\mathbf{H}_{i,k}$, with i.i.d. $\mathcal{CN}(0, 1)$ elements. The equal-energy and orthogonality condition on the columns of the training matrix $\mathbf{T}$ implies that the total transmit power (energy per channel use) in the training phase is given by

$$\frac{1}{\gamma_p B N} \mathrm{tr}\left( \mathbf{T}^{\sf H} \mathbf{T} \right) = \frac{\gamma}{\gamma_p} p \tag{3.39}$$

where we let $\mathbf{T}^{\sf H} \mathbf{T} = p \mathbf{I}$, and $p$ denotes the energy of the training sequences.
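Letting the total training power equal the total cluster transmit power, we obtain

$$p = \frac{\gamma_p}{\gamma} \sum_{m=1}^{B} P_m$$

Noticing that $\mathrm{Cov}(\mathbf{h}_k^{(j)}) = \mathrm{diag}(\beta_{1,k}^2 \mathbf{I}, \ldots, \beta_{B,k}^2 \mathbf{I}) = \mathbf{D}_k$ has a block-diagonal structure with diagonal blocks given by scaled $\gamma N \times \gamma N$ identity matrices, we immediately obtain the MMSE estimator of $\mathbf{h}_k^{(j)}$ in the form

$$\hat{\mathbf{h}}_k^{(j)} = \mathbf{D}_k \left( \mathbf{I} + p \mathbf{D}_k \right)^{-1} \mathbf{T}^{\sf H} \mathbf{y}_k^{(j)} \tag{3.40}$$

with estimation error covariance given by

$$\boldsymbol{\Sigma}_k = \mathbf{D}_k - p \mathbf{D}_k \left( \mathbf{I} + p \mathbf{D}_k \right)^{-1} \mathbf{D}_k = \mathbf{D}_k \left( \mathbf{I} + p \mathbf{D}_k \right)^{-1} \tag{3.41}$$

The MMSE covariance matrix is also block diagonal, with scaled-identity diagonal blocks, and it depends only on the user group index $k$ and not on the individual user in the group (this is obvious, since the users in the same group are statistically equivalent). From the well-known orthogonality condition of MMSE estimation and from joint Gaussianity, we have the canonical decomposition

$$\mathbf{h}_k^{(j)} = \hat{\mathbf{h}}_k^{(j)} + \mathbf{e}_k^{(j)} \tag{3.42}$$

where the estimator $\hat{\mathbf{h}}_k^{(j)}$ and the error $\mathbf{e}_k^{(j)}$ are independent, and such that

$$\mathrm{Cov}(\hat{\mathbf{h}}_k^{(j)}) = \mathbf{D}_k - \boldsymbol{\Sigma}_k = p \mathbf{D}_k \left( \mathbf{I} + p \mathbf{D}_k \right)^{-1} \mathbf{D}_k \tag{3.43}$$

Putting everything together, we can write the channel matrix $\mathbf{H}$ in (3.4) in the form $\mathbf{H} = \hat{\mathbf{H}} + \mathbf{E}$, where

$$\hat{\mathbf{H}} = \begin{bmatrix} \hat{\beta}_{1,1} \mathbf{H}_{1,1} & \cdots & \hat{\beta}_{1,A} \mathbf{H}_{1,A} \\ \vdots & & \vdots \\ \hat{\beta}_{B,1} \mathbf{H}_{B,1} & \cdots & \hat{\beta}_{B,A} \mathbf{H}_{B,A} \end{bmatrix}, \tag{3.44}$$

with

$$\hat{\beta}_{m,k} = \frac{\beta_{m,k}^2}{\sqrt{1/p + \beta_{m,k}^2}}, \tag{3.45}$$

and the blocks $\mathbf{H}_{m,k}$ are independent with i.i.d. $\mathcal{CN}(0, 1)$ elements, and where $\mathbf{E}$ is independent of $\hat{\mathbf{H}}$ and is given in the form

$$\mathbf{E} = \begin{bmatrix} \bar{\beta}_{1,1} \mathbf{E}_{1,1} & \cdots & \bar{\beta}_{1,A} \mathbf{E}_{1,A} \\ \vdots & & \vdots \\ \bar{\beta}_{B,1} \mathbf{E}_{B,1} & \cdots & \bar{\beta}_{B,A} \mathbf{E}_{B,A} \end{bmatrix}, \tag{3.46}$$

with

$$\bar{\beta}_{m,k} = \sqrt{\beta_{m,k}^2 - \hat{\beta}_{m,k}^2} = \frac{\beta_{m,k}}{\sqrt{1 + p \beta_{m,k}^2}}, \tag{3.47}$$

and the blocks $\mathbf{E}_{m,k}$ are independent with i.i.d. $\mathcal{CN}(0, 1)$ elements.

In a practical FDD system, the users should feed back their estimated channel on each time-frequency block, i.e., for each new observation. Several schemes have been proposed for closed-loop CSIT feedback, including codebook-based vector quantization, scalar quantization of the channel coefficients, and unquantized "analog" feedback.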
Furthermore, the feedback takes place on the uplink, and can be performed by accessing the uplink channel in FDMA/TDMA, or by exploiting the MIMO-MAC nature of the uplink in order to allow a number of users proportional to the number of receiving antennas to send their feedback signals simultaneously [CJKR10, SMC09, KC09]. Analyzing the system in the presence of a specific feedback scheme is possible, although even more cumbersome [CJKR10]. However, from the results in the above mentioned papers we know that a well-designed digital feedback scheme can achieve a quantization error that is negligible with respect to the downlink training estimation error. Furthermore, this can be done with a moderate use of the total uplink feedback capacity, provided that the number of users feeding back their CSIT is not too large (see for example the optimization tradeoff in [KJC11]). For the sake of simplicity, here we assume an ideal genie-aided CSIT feedback that provides $\hat{\mathbf{H}}$ directly to the centralized cluster controller at no additional cost, either in terms of rate or in terms of CSIT distortion. This provides a "best case" for any CSIT scheme based on explicit downlink training and any form of feedback. Then, since in an actual system implementation CSIT feedback has a cost that impacts the uplink capacity, in the next section we shall propose a randomized scheduling scheme that pre-selects a subset of users and therefore limits the number of users actually feeding back their CSIT.

Under these assumptions, the cluster transmitter computes a mismatched LZFBF precoding matrix from the estimated channel matrix $\hat{\mathbf{H}}$ instead of $\mathbf{H}$. The following theorem yields an achievability lower bound on the large-system performance of the mismatched LZFBF:

Theorem 3.3 Under the downlink training scheme described above and assuming genie-aided CSIT feedback, the achievable rate of users in group $k$ is lower bounded by

$$R_k \ge \log\left( 1 + \frac{\hat{\Lambda}_k(\boldsymbol{\mu}) q_k}{1 + \sum_{m=1}^{B} \bar{\beta}_{m,k}^2 P_m} \right) \tag{3.48}$$

where

$$\hat{\Lambda}_k(\boldsymbol{\mu}) = \gamma \sum_{m=1}^{B} \hat{\beta}_{m,k}^2 \eta_m(\boldsymbol{\mu}) \tag{3.49}$$

where $(\eta_1(\boldsymbol{\mu}), \ldots, \eta_B(\boldsymbol{\mu}))$ is the unique solution with components in $[0, 1]$ of the fixed point equation

$$\eta_m = 1 - \sum_{q=1}^{A} \mu_q \frac{\eta_m \hat{\beta}_{m,q}^2}{\gamma \sum_{\ell=1}^{B} \eta_\ell \hat{\beta}_{\ell,q}^2}, \quad m = 1, \ldots, B \tag{3.50}$$

with respect to the variables $\boldsymbol{\eta} = \{\eta_m\}$.

Proof: See Section 3.5.3.

The conclusion of this section is that all the derivations and the optimization made before for the case of perfect CSIT, including the system symmetry conditions, can be applied straightforwardly to the case of non-ideal CSIT, provided that the per-user rates are replaced with the corresponding terms in (3.48). Finally, the system spectral efficiency must be scaled by the factor $\left[ 1 - \frac{\gamma_p N B}{WT} \right]_+$, which takes into account the downlink training overhead, i.e., the fraction of dimensions per block dedicated to training. In particular, letting $\tau = \frac{N}{WT}$ denote the ratio between the number of users per group, $N$, and the dimensions in a time-frequency slot, we can investigate the system spectral efficiency for fixed $\tau$, in the limit of $N \to \infty$. The ratio $\tau$ captures the "dimensional crowding" of the system. It is clear that a highly underspread system ($WT \gg 1$) can accommodate more users, and more jointly coordinated antennas at the transmitter. Vice versa, if $WT$ is not much larger than $N$, then the number of jointly coordinated transmit antennas (captured by the product $\gamma B$) is intrinsically limited by the channel time-frequency coherence.

3.4 Numerical Results and Simplified Scheduling

In this section, we first provide a comparison between the large-system limit results and the Monte Carlo simulation of the corresponding finite-dimensional systems with greedy user selection. Then, driven by the behavior of the finite-dimensional system, we propose a simplified scheduling algorithm that randomly pre-selects users according to the probabilities obtained from the asymptotic analysis.
As explained in Section 1.3, this algorithm has the advantage that the feedback required for CSIT is greatly reduced, since only the users that are effectively served feed back their CSIT. However, quantifying the feedback resource requirement in a precise manner is beyond the scope of this work. Finally, we consider the case of non-perfect CSIT and investigate the tradeoff between increasing the number of jointly coordinated antennas and the dimensionality cost incurred by downlink training for channel estimation.

[Figure 3.2: Cell $m$ and $m + 1$ in a linear two-sided cellular array, showing BS $m$, BS $m+1$ and user groups $k$, $k+1$.]

In the multi-cell system model, we consider the linear cellular arrangement shown in Figure 3.2, where $M$ base stations are equally spaced on the segment $[-M, M]$ km, in positions $2m - M - 1$ for $m = 1, \ldots, M$, and $K$ user groups are also equally spaced on the same segment, with $K/M$ user groups uniformly spaced in each cell. The distance $d_{m,k}$ between BS $m$ and user group $k$ is defined modulo $[-M, M]$, i.e., we assume a wrap-around topology in order to eliminate boundary effects. We use a distance-dependent pathloss model given by $\alpha_{m,k}^2 = G_0 / (1 + (d_{m,k}/\delta)^{\nu})$, and the pathloss parameters $G_0$, $\delta$, and $\nu$ follow the mobile WiMAX system evaluation specifications [WiM06], such that the 3 dB break point is $\delta = 36$ m (i.e., 3.6% of the 1 km cell radius), the pathloss exponent is $\nu = 3.504$, the reference pathloss at $d_{m,k} = \delta$ is $G_0 = -91.64$ dB, and the per-BS transmit power normalized by the noise power at the user terminals is $P = 154$ dB.

Comparison with finite-dimensional systems

Figure 3.4 shows the average user throughputs (bit/s/Hz) versus user locations for the first two cells near the origin (given the symmetry, this pattern repeats periodically), for the case of $M = 8$ cells, $\gamma = 4$, $K = 64$ user groups, and cluster size $B = 1$, 2, and 8, where the clusters are formed as shown in Figure 3.3.
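The linear wrap-around layout and the pathloss model described above translate directly into code. The sketch below is illustrative only: the function names are hypothetical, the parameter defaults ($G_0 = -91.64$ dB, $\delta = 36$ m, $\nu = 3.504$) are read from the text, and the user group positions are left to the caller because the text only states that they are equally spaced.

```python
def bs_position(m, M):
    """Position (km) of BS m = 1..M on the segment [-M, M]: 2m - M - 1."""
    return 2 * m - M - 1


def wrapped_distance(x, y, M):
    """Distance (km) between two points on [-M, M] with wrap-around,
    i.e. measured on a ring of circumference 2M."""
    d = abs(x - y) % (2 * M)
    return min(d, 2 * M - d)


def path_gain(d_km, g0_db=-91.64, delta_km=0.036, nu=3.504):
    """Pathloss model G0 / (1 + (d / delta)^nu), returned in linear scale.

    Default values are taken from the text; treat them as assumptions
    if a different propagation scenario is of interest.
    """
    g0 = 10.0 ** (g0_db / 10.0)
    return g0 / (1.0 + (d_km / delta_km) ** nu)
```

For $M = 8$ the base stations sit at $-7, -5, \ldots, 7$ km, and the first and last BS are 2 km apart once the wrap-around is applied. At $d = \delta$ the gain is half of its value at $d = 0$, which is the 3 dB break point.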
Notice that with 8 user groups per cell and $\gamma = 4$, we have twice as many users as antennas in each cell. The case $B = 8$ corresponds to network-wide full cooperation.

[Figure 3.3: Cluster forms for $B = 1$, 2, and 8 cell cooperation over BS 1 to BS 8. Panels: (a) $B = 1$ cell cooperation; (b) $B = 2$ cell cooperation; (c) $B = 8$ cell cooperation.]

For the finite-dimensional Monte Carlo simulation, we applied the same stochastic optimization algorithm described in Section 3.2.4, where now $t$ denotes a scheduling time slot index, and for each $t$ a new set of i.i.d. channel vectors is generated. In this case, the instantaneous weighted sum-rate is obtained via the user selection algorithm of [DS05], assuming that the CSIT for all users in the system is available at the cluster controllers. As far as the network utility function $g(\cdot)$ is concerned, we consider the Proportional Fairness (PF) criterion, corresponding to $g(\mathbf{R}) = \sum_k \log R_k$. This PF criterion is applied to all the numerical results in this section.

From Figure 3.4(a), we notice that the advantage of full cooperation is significant, whereas the cluster of size $B = 2$ yields a significant improvement for the users in the center of the cluster, with respect to the basic cellular system with no cluster cooperation ($B = 1$). In Figure 3.4(b), we compare the asymptotic results with the finite-dimensional simulation results in the case of $B = 2$. The finite-dimensional system yields better per-user throughput than the large-system limit, thanks to the ability of the user selection to exploit the randomness in the instantaneous channel realizations (multiuser diversity).
However, as the number of users at each location, $N$, grows, the diversity gain continues to decline. For example, in this figure, the relative gain of the finite-dimensional rate over the asymptotic rate is about 55% for $N = 1$ but 25% for $N = 8$. In the cases of $B = 1$ and $B = 8$, which are not shown here, the same trends are observed even though the diversity gain is slightly larger ($B = 1$) or smaller ($B = 8$). It is well known that for large systems (large $N$), this multiuser diversity effect disappears because of "channel hardening" [HMT04, TCFB09].

Figure 3.4: User rate under perfect CSIT, obtained from large system analysis for B = 1, 2, and 8 and from finite dimension simulation with greedy user selection for B = 2 and N = 1, 2, 4, and 8. M = 8 cells and K = 64 user groups. Panels: (a) Large system analysis (asymptotic group rate in bps/Hz versus group location in km); (b) Finite dimension simulation for B = 2 (group rate in bps/Hz versus group location in km).

Random user selection scheme for reduced CSIT feedback. User selection requires a large amount of CSIT feedback since it needs CSIT from many users in order to select a good subset at each scheduling slot, even though no more users than the number of antennas can be served at a time. For systems with finite but large size, it is not wise to have many more users than transmit antennas feed back their CSIT, since the multiuser diversity effect becomes marginal whereas the feedback resource grows at least linearly with the number of users feeding back their CSIT at each slot. In this regime, a better option consists of pre-selecting the users to be served in each slot, such that only these users feed back their CSIT. In this case, we have to design a user pre-selection scheme that approximately maximizes the desired network utility function.
For example, a simple round-robin scheme may perform far from the desired PF optimal point.

For this purpose, we consider a randomized scheduling scheme based on the asymptotic analysis that effectively provides such user pre-selection. We stress the fact that this may be useful for the sake of limiting the total CSIT feedback requirement, while still approximating the optimal proportional fairness throughput point. In the proposed scheme, the users from which CSIT feedback is requested are randomly selected in each slot $t$ as follows. Let $\{\mu_k\}$ be the user fractions per group of (approximately) co-located users, obtained from the asymptotic analysis. The cluster controller has a maximum of $\gamma B N$ independent data streams to transmit using LZFBF (equal to the number of jointly coordinated transmit antennas). At each time $t$, the scheduler generates $\gamma B N$ i.i.d. random variables $S_1(t), \ldots, S_{\gamma B N}(t)$, taking values on the integers $\{0, 1, \ldots, A\}$ with probability $\mathbb{P}(S_i(t) = k) = \frac{\mu_k}{\gamma B}$ for $k \neq 0$ and $\mathbb{P}(S_i(t) = 0) = 1 - \sum_{k=1}^{A} \frac{\mu_k}{\gamma B}$. Then, user group $k$ is served by stream $i$ at time slot $t$ if $S_i(t) = k$. Notice that the streams $i$ for which $S_i(t) = 0$ are not used and that multiple streams may be associated with the same user group. Finally, for each stream a user in the associated group is selected at random, making sure that streams serve distinct users.

Figure 3.5: User rate under perfect CSIT in finite dimension (N = 2, 4, and 8) with random user selection and power allocation aided by the asymptotic results for B = 1, 2, and 8. M = 8 cells and K = 64 user groups. Axes: group rate (bps/Hz) versus group location (km).
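The random pre-selection step described above can be sketched as follows. This is a minimal illustration, not the simulation code used for the figures: the function names are hypothetical, and capping the number of streams per group at N (dropping any excess draws) is an assumption made here so that streams always serve distinct users.

```python
import random
from collections import Counter


def sample_stream_groups(mu, gamma, B, N, rng):
    """Draw S_1(t), ..., S_{gamma*B*N}(t) for one slot.

    A value k in 1..A assigns the stream to user group k; 0 leaves it unused.
    """
    scale = gamma * B
    probs = [m / scale for m in mu]      # P(S_i = k) = mu_k / (gamma * B)
    p_unused = 1.0 - sum(probs)          # P(S_i = 0)
    if p_unused < -1e-12:
        raise ValueError("user fractions must satisfy sum(mu) <= gamma * B")
    weights = [max(p_unused, 0.0)] + probs
    n_streams = int(round(gamma * B * N))
    return rng.choices(range(len(mu) + 1), weights=weights, k=n_streams)


def pick_distinct_users(stream_groups, N, rng):
    """For each group, pick distinct user indices, one per stream (capped at N)."""
    demand = Counter(g for g in stream_groups if g != 0)
    return {g: rng.sample(range(N), min(n, N)) for g, n in demand.items()}


rng = random.Random(0)
streams = sample_stream_groups(mu=[1.0, 2.0], gamma=4, B=1, N=8, rng=rng)
users = pick_distinct_users(streams, N=8, rng=rng)
```

Since each of the γBN streams picks group k with probability μ_k/(γB), the expected number of streams serving group k is μ_k N, so the time-averaged fraction of active users per group matches the asymptotic user fractions.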
Once the allocation of streams to users is determined, the selected users are requested to feed back their CSIT and the scheduler optimizes the transmit powers by solving the weighted sum rate maximization problem with weights $W_k = \partial g(\mathbf{R}) / \partial R_k$, corresponding to the optimal asymptotic throughput point. In the special case of PF scheduling, this is given by $W_k = 1/R_k$ [VTL02].

The finite-dimensional simulation results under this random user selection scheme are compared with the asymptotic results in Figure 3.5, under the same system setting as in Figure 3.4. As $N$ increases, the finite-dimensional results converge to the infinite-dimensional limit and the curves almost overlap, especially when $B = 1$ or 2. Hence, the proposed scheme is effective for systems of finite but moderately large size.

Non-perfect CSIT and coordination vs. estimation tradeoff. Figure 3.6 shows the cell sum rate (cluster sum rate normalized by the number of cooperating cells in the reference cluster) versus values of $\gamma$ in the cases of (a) perfect CSIT and no consideration of training overhead, and (b) non-perfect CSIT and explicit downlink training with $\gamma_p = \gamma$. We consider a larger number of user groups, $K = 192$, in the $M = 8$ cells. As shown in Figure 3.6(a), under the assumption of perfect CSIT given at no cost, the cell sum rate grows almost linearly as $\gamma$ (the ratio of BS antennas over the users per group) increases, and also grows as $B$ (the cluster size) increases, which shows the gain from inter-cell cooperation and from a larger number of transmit antennas. However, when the CSIT estimation error and downlink training overhead are taken into account, there is a non-trivial tradeoff between the improvement owing to more and more jointly coordinated transmit antennas and the cost of estimating higher and higher dimensional channels.
Notice that this tradeoff is "fundamental", in the following sense: a trivial upper bound on the achievable sum capacity of the reference cluster is obtained by letting all users perfectly cooperate as a single multi-antenna receiver. The capacity of the resulting block-fading single-user MIMO channel with $\gamma B N$ transmit antennas and $AN$ receiving antennas and fading coherence block $WT = N/\tau$ was characterized in the high-SNR regime in [MH99, ZT02]. Using this result, in the case $\frac{1}{2\tau} \geq A \geq \gamma B$, the dimensionality "pre-log" loss factor with respect to the case of ideal CSIT is given by $1 - \frac{\gamma B N}{WT}$, which coincides with what is said at the end of Section 3.3 with the choice $\gamma_p = \gamma$. In fact, the "pre-log" optimality of explicit training for single-user MIMO channels with block fading in the high-SNR regime is well known [ZT02, HH03]. Also, the same result shows that if $\frac{1}{2\tau} < \min\{A, \gamma B\}$, then there is no point in using more than $WT/2$ jointly coordinated antennas.

Finally, notice that the recently proposed schemes for "blind" interference alignment [WGJ11], exploiting reconfigurable antennas at the user terminals, still require channel state information at the receiver (CSIR) for coherent detection at each user terminal. Since the resulting channel is MIMO point-to-point, the same downlink training cost discussed above applies. In other words, these "blind" interference alignment schemes avoid CSIT feedback, but still require downlink training in the same amount considered in this work.

Figure 3.6: Cell sum rate versus antenna ratio $\gamma$ for B = 1, 2, and 8. M = 8 cells and K = 192 user groups. Panels: (a) Perfect CSIT and no training overhead; (b) Non-perfect CSIT and downlink training with $\gamma_p = \gamma$, for $\tau = 1/64$ and $\tau = 1/32$. Axes: cell sum rate (bps/Hz) versus $\gamma$.
In conclusion, while we have analyzed a specific downlink training scheme, we have that, for a cluster in isolation, the sum capacity scaling in the high-rate regime (high-SNR) is indeed the correct one.

Figure 3.6(b) shows the cell sum rate with consideration of training overhead and estimation error for $\gamma_p = \gamma$. Inspired by practical system values, we chose $\tau = 1/64$ and $1/32$. In the finite-dimensional case, this corresponds to $WT = 640$ or 320 signal dimensions, respectively, with $N = 10$ users per user group (a total of $KN/M = 240$ users per cell). We notice that as $\gamma$ increases, the sum rates in most cases grow at first, reach a maximum, and then decrease, due to the tradeoff between the benefit from a large number of antennas and the training overhead cost. For given $B$ and $\tau$, the maximum sum rate is achieved at $\gamma B = \frac{1}{2\tau}$, which is in line with the result of the high-SNR regime when $\gamma B \leq A$. For example, for $B = 2$ and $\tau = 1/64$, the sum rate is maximum at $\gamma = 16$, where $2\gamma = \frac{1}{2(1/64)}$. For $B = 1$ and $\tau = 1/64$, the optimal $\gamma$ is beyond the number of user groups per cell. We can also see that, when the number of antennas is large, the no-cooperation case ($B = 1$) achieves the highest sum rate for both $\tau = 1/64$ and $1/32$, which suggests that no cooperation gain can be expected, because the improvement in multi-antenna gain does not compensate for the dimensional decrease (pre-log factor) due to the training overhead.

In order to find the best cluster size with downlink training and estimation, we consider a system with a large number of cells, $M = 24$. Figure 3.7 illustrates the cell sum rate versus the cluster size $B$ for $\gamma = 1, 2, 4,$ and 8 and $\tau = 1/64$ and $1/32$ with $\gamma_p = \gamma$. In a linear cellular arrangement with $M = 24$, the clusters, except for $B = 1, 2,$ or 24, do not have the symmetric structure described in Section 3.2.1. For those clusters, we notice that the solution of problem (3.29) under the cluster sum-power constraint produces an upper bound on the optimal value under the per-BS power constraint.
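The location of the maximum at γB = 1/(2τ) can be checked with a short calculation on the high-SNR pre-log alone. The sketch below is only that: it models the cluster sum rate by the number of coordinated antennas (per unit N) times the training loss factor 1 − τγB, and ignores the SNR-dependent terms that the full analysis accounts for. The function names are hypothetical.

```python
def prelog_per_cluster(gamma, B, tau):
    """High-SNR pre-log of the cluster sum rate per unit N.

    gamma * B coordinated antennas (per unit N), each discounted by the
    training loss factor max(1 - tau * gamma * B, 0).
    """
    x = gamma * B
    return x * max(1.0 - tau * x, 0.0)


def best_gamma(B, tau, candidates):
    """Antenna ratio among the candidates that maximizes the pre-log."""
    return max(candidates, key=lambda g: prelog_per_cluster(g, B, tau))


# B = 2, tau = 1/64: the maximum is at gamma * B = 32, i.e. gamma = 16.
gamma_b2 = best_gamma(2, 1 / 64, range(1, 65))
# B = 1, tau = 1/64: the maximum moves out to gamma = 32.
gamma_b1 = best_gamma(1, 1 / 64, range(1, 65))
```

Under this simplified model, the maximizer for B = 1 and τ = 1/64 is γ = 32, which exceeds the 24 user groups per cell of the K = 192, M = 8 setting, consistent with the remark above that the optimum lies beyond the number of user groups per cell.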
Even though not explicitly shown in the gure, we can conrm that the cluster sum rate (B times the cell sum rate) is maximized when B = 1 2 but as far as the cell sum rate is 81 0 5 10 15 20 25 0 5 10 15 20 25 B Cell sum rate (bps/Hz) γ=1, τ=1/64 γ=2, τ=1/64 γ=4, τ=1/64 γ=8, τ=1/64 (a) = 1=64 0 5 10 15 20 25 0 2 4 6 8 10 12 14 16 18 B Cell sum rate (bps/Hz) γ=1, τ=1/32 γ=2, τ=1/32 γ=4, τ=1/32 γ=8, τ=1/32 (b) = 1=32 Figure 3.7: Cell sum rate versus cluster size B for =1, 2, 4, and 8 under non-perfect CSIT and explicit downlink training with p = . M = 24 cells andK = 192 user groups. 82 concerned, the optimal for given B and is smaller than the optimal one in terms of the cluster sum rate, i.e., 1 2 . For example, in Figure 3.7(a), when = 4 and = 1=64, the maximum cluster sum rate is achieved at B = 8, but the cell sum rate given as the cluster sum rate divided by B is maximum at B = 3. When the channel is more time or frequency selective ( = 1=32), the optimum cluster size gets even smaller, as shown in Figure 3.7(b). Furthermore, the cell sum rate is more sensitive to the cluster size, when the number of antennas is larger. 3.5 Proofs 3.5.1 Proof of Theorem 3.1 For the sake of clarity, we recall some denitions and facts about random matrices with independent non-identically distributed elements (see [TV04,TLV05]) of the type dened in (3.4), (3.7), that will be essential in the proof of Theorem 3.1. Denition 3.1 Consider an N r N c random matrix H = [H i;j ], whose entries have variance Var[H i;j ] = P i;j N r (3.51) such that P = [P i;j ] is an N r N c deterministic matrix with uniformly bounded entries. For given N r , we dene the variance prole of H as the function v Nr : [0; 1) [0; 1)!R such that v Nr (x;y) = P i;j ; i1 Nr x< i Nr ; j1 Nc y< j Nc (3.52) When we consider the limit for N r !1 with xed ratio Nc Nr ! , we assume that v Nr (x;y) converges uniformly to a bounded measurable functionv(x;y), referred to as the asymptotic variance prole of H. 
For random matrices distributed according to Denition 3.1, we have the following results. 83 Theorem 3.4 ( [TV04, Theorem 2.52]) Let H be an N r N c random matrix whose entries are independent zero-mean complex circularly symmetric random variables satis- fying the Lindeberg condition 1 N c X i;j E jH i;j j 2 1fjH i;j jg ! 0 (3.53) as N r ;N c !1 with Nc Nr !, for all > 0. Assume that the variances of the elements of H are given by Denition 3.1 and dene the function z (Nr ) (y;s) = h H j 0 @ I +s X `6=j h ` h H ` 1 A 1 h j ; j1 Nc y< j Nc : (3.54) As N r !1 with Nc Nr !,z (Nr ) (y;s) converges almost surely to the limitz(y;s), given by the solution of the xed-point equation z(y;s) =E 2 4 v(X;y) 1 +sE h v(X;Y) 1+sz(Y;s) jX i 3 5 ; y2 [0; 1]: (3.55) where X and Y are i.i.d. random variables uniformly distributed on [0; 1]. Dening the eective dimension ratio as 0 = P (E[v(X;Y)jY]6= 0) P (E[v(X;Y)jX]6= 0) ; the following high-SNR limit can be proved. Corollary 3.1 (see [TV04, Theorem 3.1]) As s goes to innity, we have lim s!1 z(y;s) = 8 > < > : 1 (y) if 0 < 1 0 if 0 1 (3.56) 84 where, for 0 < 1, 1 (y) is the positive solution to 1 (y) =E 2 4 v(X;y) 1 +E h v(X;Y) 1(Y) X i 3 5 (3.57) We now enter specically the proof of Theorem 3.1. 
From the well-known formula for the inverse of a 2 2 block matrix, we can write the (j;j) diagonal element of the matrix (I +sH H H) 1 as I +sH H H 1 j;j = 1 1 +sh H j 0 @ I +s X `6=j h ` h H ` 1 A 1 h j (3.58) Furthermore, assuming that H has full rank, then H H H 1 j;j = lim s!1 s I +sH H H 1 j;j = lim s!1 s 1 +sh H j 0 @ I +s X `6=j h ` h H ` 1 A 1 h j = 1 lim s!1 h H j 0 @ I +s X `6=j h ` h H ` 1 A 1 h j (3.59) Comparing the denition of (i) k () in (3.11) with (3.59) and using Theorem 3.4 and Corollary 3.1, we have that the desired limiting value of (i) k () is given by 1 (y), evaluated at the corresponding value of y such that j1 Nc y< j Nc forj = P k1 `=1 ` N +i, after replacing the general matrix H in Theorem 3.4 with H given by our problem. In this case, the number of rows in the matrix is given byN r = BN and the number of columns is given by N c = N. With the normalization by 1= p N of all the channel coecients, the matrix H dened in (3.7) is formed by independent blocks H m;k ( k ) 85 of dimension N k N, such that each block has i.i.d. CN (0; 2 m;k =N) elements. As N !1, we have that N c ;N r !1 with ratio = B . By imposing the appropriate normalization, the asymptotic variance prole of H is given by the piece-wise constant function v(x;y) = B 2 m;k for (x;y)2 m 1 B ; m B " P k1 j=1 j ; P k j=1 j ! (3.60) with m = 1;:::;B and k = 1;:::;A. Also, we nd explicitly 0 = P A k=1 k 1 n 1 B P B m=1 m;k 6= 0 o 1 B P B m=1 1 n 1 P A k=1 k m;k 6= 0 o (3.61) and notice that the case 0 < 1 in (3.56) always holds since, by construction, rank(H ) = N. Hence, the limit for (i) k () is obtained as the solution of the xed point equation (3.57), for any y2 P k1 j=1 j ; P k j=1 j . In fact, the piece-wise constant form of v(x;y) yields that (i) k () converges to a limit that depends only on k (the user group) and not on i (the specic user in the group). With some abuse of notation, we let k () = 1 (y) for all y2 P k1 j=1 j ; P k j=1 j , in order to denote this limit. 
Particularizing (3.57) to this case, we obtain k () = B X m=1 2 m;k 1 + A X q=1 q 2 m;q q () ; k = 1;:::;A (3.62) It follows that the asymptotic limit of is block-diagonal, with scaled-identity diagonal blocks, where the k-th block is given by k ()I k N . In order to obtain the more convenient expression (3.23), we introduce the variables m 2 [0; 1], for m = 1;:::;B, and replace k () = P B m=1 2 m;k m into (3.62). Since m takes values in [0; 1], we can write m = 1=(1 +z m ) for z m 0, and solving for z m , we 86 obtain z m = P A q=1 q 2 m;q q () . Eliminating the variables z m from the latter equation, we arrive at the desired xed point equation (3.24), as given in Theorem 3.1. As a nal remark, notice that (3.24) has some signicant advantages with respect to (3.62). In particular, the variables m take values in [0; 1] (by construction), and typically we have B <A (less BSs in a cluster than user groups). Therefore, (3.24) can be always initialized by letting m = 1, and the xed point equation iterative solution involves only B, rather than A, variables. Also, it is immediately evident by inspection that the solution of (3.24) for m 2 [0; 1] always exists and it is unique. 3.5.2 Proof of Theorem 3.2 We extend the min-max duality results of [YL07] to the LZFBF case and rst consider the power minimization problem in the nite dimensional system. The beamforming vectors are given as the Moore-Penrose pseudo-inverse of the channel matrix and we allocate user powersfq (i) k g to minimize the per-BS powers under a set of SINR constraint on each user. For a scalar variable 0 and per-BS powersfP 1 ;:::;P B g, the problem is formulated as minimize B X m=1 P m subject to 1 N tr m V QV H P m ; m = 1; ;B; (i) k ()q (i) k (i) k ; i = 1; ; k ; k = 1; ;A (3.63) where (i) k denotes the SINR constraint for user i in group k and V , Q, , and m are dened as in Section 3.1. 
Then the Lagrangian function for (3.63) is given as L(; Q;;) = B X m=1 P m + B X m=1 m 1 N tr m V QV H P m A X k=1 k N X i=1 (i) k (i) k ()q (i) k k 1 ! ; (3.64) 87 where m 's and (i) k 's are dual variables for the per-BS power constraints and SINR constraints, respectively. By dening = 1 N P B m=1 m m and rearranging the terms of (3.64), the Lagrangian function can be rewritten as L(; Q;;) = A X k=1 k N X i=1 (i) k B X m=1 ( m P m P m ) A X k=1 k N X i=1 (i) k () (i) k k h V H V i (i) k ! q (i) k ; (3.65) where [] (i) k denotes the ( 1:k1 N +i)-th main diagonal element of a matrix. The dual function corresponding to (3.65) is given as G(;) = min ;Q L(; Q;;); (3.66) and the optimal solution to (3.63) can be attained by solving the dual problem max ; G(;). It is clear that P B m=1 m P m P B m=1 P m and (i) k ()q (i) k k h V H V i such that the dual function in (3.66) should be nite for 0 and q (i) k 0. Thus the dual problem has the following form: max 0 min 0 A X k=1 k N X i=1 (i) k subject to B X m=1 m P m B X m=1 P m ; (i) k () (i) k h V H V i (i) k k ; (3.67) where we notice that the maximization with respect to is switched with the min- imization in compliance with the ipping of the direction of inequality for . Since h V H V i (i) k = (i) k (), problem (3.67) can be interpreted as the uplink power minimiza- tion for LZFBF subject to the sum power constraint under the worst case noise condition where the minimum power is maximized with respect to the noise covariance given by . By the results of [YL07], this power allocation duality can be converted into the 88 achievable rate duality and the LZFBF achievable rate in the downlink subject to the per- BS power constraint is the same as the rate in the dual uplink with the noise covariance subject to P B m=1 m P m P B m=1 P m . 
We consider the uplink channel model: r = H s + w (3.68) where r2C BN , s2C N , and wCN (0; ) denote the received signal, transmitted signal, and AWGN vector, respectively and the transmit power of user i in group k is s (i) k 2 = (i) k . By normalizing the channel matrix by 1=2 , we have the equivalent channel e r = 1=2 H s +e w (3.69) where e wCN (0; I). Letting 1=2 H = e H and applying the LZFBF receiver lter e H + , we obtain the eective channel gain e (i) k (;) = 1 ( e H + ) H ( e H + ) 1 (i) k = 1 h (H H 1 H ) 1 i (i) k : (3.70) and denote its large system limit by e k (;) = lim N!1 e (i) k (;). Then e k (;) can be obtained from Theorem 3.1 by replacing 2 m;k with 2 m;k = m . In this large system limit, we have (1) k =::: = ( k N) k = k and the dual uplink problem of the downlink weighted sum rate maximization under the per-BS power constraint is given by the following min- max problem: min 0 max 0 A X k=1 W k k log(1 + e k (;) k ) subject to B X m=1 m P m B X m=1 P m ; A X k=1 k k B X m=1 P m : (3.71) We denote the objective function of problem (3.71) byf(;) and letg() = maxf(;) subject to the given constraints. If the symmetry condition described in Section 3.2.1 89 holds and BS power constraints are equal, P 1 =::: =P B =P , then the constraints are rewritten as P B m=1 m B and P A k=1 k k BP and furthermore g() achieves the same value for any permutation matrix . Since g() is convex in , we have g() = 1 B! X g()g 0 @ 1 B! X 1 A ; where the equality holds if all m 's are equal. Then the minimum ofg() is achieved with 1 =::: = B = 1. When 1 =::: = B = 1, problem (3.71) reduces to problem (3.31) under the sum power constraint. This concludes the proof. 3.5.3 Proof of Theorem 3.3 Let b V denote the beamforming matrix for given user fractions , dened as in Section 3.1.2 after replacing H with b H , dened as in (3.7) with the change m;k ! b m;k . Let's focus on a generic user j in groupk. 
From (3.3) and (3.9) the received signal is given by y (j) k = h (j) k H b V Q 1=2 u +z (j) k = b h (j) k H b v (j) k q q (j) k u (j) k + e (j) k H b V Q 1=2 u +z (j) k (3.72) where we used the fact thatb v (j) k is orthogonal to all measured channel vectors b h (i) ` , for all other scheduled users, and we used the decomposition (3.42). The useful signal coecient b h (j) k H b v (j) k is, by construction, equal to the diagonal element corresponding to user j in groupk of the matrix b 1=2 , calculated from b H as in (3.11). The additional interference term e (j) k H b V u is the intra-cluster multiuser interference due to the fact that CSIT is not perfect. 90 A standard technique to lower bound the mutual information I(u (j) k ;y (j) k j b H) is as follows: I(u (j) k ;y (j) k j b H) = h(u (j) k )h(u (j) k jy (j) k ; b H) = logeq k h(u (j) k ay (j) k jy (j) k ; b H) logeq k h(u (j) k ay (j) k j b H) logeq k E h logeVar(u (j) k ay (j) k j b H) i (3.73) where we assumed thatu (j) k is Gaussian with variance 1 (denoting, as before, the transmit power to users in groupk). The bound holds for any coecienta. In particular, we wish to use the coecient that minimizes the conditional variance Var(u (j) k ay (j) k j b H), given by the linear MMSE estimation ofu (j) k fromy (j) k for given b H. After standard algebra, omitted here for the sake of brevity, we obtain the variance (conditional MMSE estimation error) Var(u (j) k ay (j) k j b H) = q k E e (j) k H b V Q b V H e (j) k b H + 1 b h (j) k H v (j) k 2 q k +E e (j) k H b V Q b V H e (j) k b H + 1 (3.74) Replacing this into (3.73), we obtain the desired lower bound in the form I(u (j) k ;y (j) k j b H)E 2 6 6 6 4 log 0 B B B @ 1 + b h (j) k H v (j) k 2 q k 1 +E e (j) k H b V Q b V H e (j) k b H 1 C C C A 3 7 7 7 5 (3.75) Let's examine the terms in (3.75) separately. As already said before, the coecient in the numerator of the SINR term inside the logarithm, in the large system limit, is given by b h (j) k H v (j) k 2 ! 
$\hat{\Lambda}_k(\boldsymbol{\mu})$, where the latter is obtained via Theorem 3.1 by replacing the coefficients $\alpha_{m,k}$ with the new coefficients $\hat{\alpha}_{m,k}$ defined in (3.45), thus obtaining (3.49) and (3.50).

The intra-cluster interference term in the denominator can be evaluated as follows. First, notice that because of the properties of the MMSE estimator, the channel error vector is independent of the estimate $\hat{\mathbf{H}}$. Therefore, the conditioning with respect to $\hat{\mathbf{H}}$ makes $\hat{\mathbf{V}}$ and the diagonal matrix of transmitted powers $\mathbf{Q}$ act as constant matrices with respect to the conditional expectation, since they are both functions of the CSIT $\hat{\mathbf{H}}$. We have

\[
\mathbb{E}\left[ \mathbf{e}_k^{(j)\mathsf{H}} \hat{\mathbf{V}} \mathbf{Q} \hat{\mathbf{V}}^{\mathsf{H}} \mathbf{e}_k^{(j)} \,\middle|\, \hat{\mathbf{H}} \right]
= \mathrm{tr}\left( \mathbf{Q} \hat{\mathbf{V}}^{\mathsf{H}} \mathrm{Cov}(\mathbf{e}_k^{(j)}) \hat{\mathbf{V}} \right)
= \mathrm{tr}\left( \mathbf{Q} \hat{\mathbf{V}}^{\mathsf{H}} \boldsymbol{\Sigma}_k \hat{\mathbf{V}} \right)
= \mathrm{tr}\left( \boldsymbol{\Sigma}_k \hat{\mathbf{V}} \mathbf{Q} \hat{\mathbf{V}}^{\mathsf{H}} \right)
= \sum_{m=1}^{B} \sigma_{m,k}^2 P_m
\tag{3.76}
\]

where the last equality follows from the definition of $\boldsymbol{\Sigma}_k$ in (3.41), which is block-diagonal with $B$ diagonal blocks of dimension $\gamma N \times \gamma N$, where the $m$-th diagonal block is given by $\sigma_{m,k}^2 \mathbf{I}$ with $\sigma_{m,k}$ defined in (3.47), and by noticing that $\hat{\mathbf{V}} \mathbf{Q} \hat{\mathbf{V}}^{\mathsf{H}}$ is the covariance matrix of the signal transmitted from all the base stations forming the cluster. Under a per-BS power constraint, the partial trace of this matrix on any diagonal segment corresponding to base station $m$ (diagonal segments of length $\gamma N$) is equal to $P_m$. Therefore, the simple form of (3.76) follows. This concludes the proof.

Chapter 4

Massive MIMO and Location-based Scheduling

In this chapter, we propose a location-adaptive scheduling and transmission strategy for the TDD-based MU-MIMO system. First, we present the TDD-based system model with examples under one-dimensional linear and two-dimensional hexagonal cellular layouts. Then we describe a family of network-MIMO schemes defined by small clusters of cooperating BSs, zero-forcing multiuser MIMO precoding with suitable inter-cluster interference constraints, uplink pilot signal reuse across cells, and frequency reuse, including the one considered in the massive MIMO system.
For each of these schemes, we asymptotically analyze the achievable spectral efficiency in the large system limit and choose the optimal architecture, which yields the largest spectral efficiency at each user location. The scheduling algorithm that satisfies a specific fairness criterion is described, and numerical results, including a comparison with finite-dimensional simulation results, are presented. These show the benefits of the proposed strategy for the spectral efficiency at each location and for system-wide throughput.

4.1 TDD Cellular System Model

The TDD cellular architecture for high-data-rate downlink proposed in this work is based on the following elements:

- A family of network-MIMO schemes, defined in terms of the size and shape of clusters of cooperating BSs, pilot reuse across clusters, frequency reuse factor, and downlink linear precoding scheme;
- A partitioning of the user population into bins, according to their position in the cellular coverage area;
- The determination of the optimal network-MIMO scheme in the family for each user bin, creating an association between user bins and network-MIMO schemes;
- Scheduling of the user bins in time-frequency in order to maximize a suitable concave and componentwise non-decreasing network utility function of the ergodic user rates.

The network utility function is chosen in order to reflect some desired notion of fairness (e.g., proportional fairness [MW00, VTL02, HMK+10, BBG+00, PELT06]). When a given bin is scheduled, the associated optimized network-MIMO architecture is used. Therefore, the overall scheme consists of a mixed-mode network-MIMO architecture.
Invoking well-known convergence results [MSS03, KCM09], we use the "large-system analysis" approach for multi-antenna cellular systems pioneered in [ABEH06, HKD10, ZH10, HMK+10], and especially the results of Chapter 3 of this dissertation (see also its journal version [HTC10]), which are almost directly applicable to this system model. We analyze the performance of the network-MIMO schemes in the considered family while scaling the number of users in each bin, the number of antennas per BS, and the small-scale fading coherence block length to infinity, with fixed ratios. We define a system size parameter $N$ and let all the above quantities scale linearly with $N \to \infty$. In particular, we let $MN$ denote the number of BS antennas, $LN$ denote the channel coherence block length, and $UN$ denote the number of users per location (a bin is defined as a set of discrete locations in the cellular coverage, see Section 4.1.1), for given constants $M$, $L$ and $U$. This allows the efficient determination of the optimal association between bins and network-MIMO architectures in the proposed mixed-mode network-MIMO architecture.

In this work we do not consider shadowing. Although the large-system analysis is easily applied to the case of shadowing, the extension of the geographically determined user bins to the case of shadowing is not obvious and is left for future work.

Figure 4.1: Two dimensional hexagonal cell layout with B = 19.
The triangle marks indicate the BS positions (points of bs ) and the red triangle marks indicate the points of . The insides of the large thick-lined hexagon and the small hexagons denoteV and V b for b2 bs , respectively. 4.1.1 Cellular Layout and Frequency Reuse Base stations, cells and clusters: The system geometry is concisely described by us- ing lattices inR (for 1-dimensional layouts [CRP + 08,RCP09] ) or inR 2 (for 2-dimensional layouts [Mar10]). Consider nested lattices bs u inR (resp.,R 2 ). We consider a coverage area given by The Voronoi cellV of centered at the origin, 1 dened the system coverage region. BSs are located at pointsb2 bs \V. The ner lattice u denes a grid of discrete user locations, as explained later in this section. We let B =j bs \Vj denote the number of BSs in the system. Example 1 Consider the 1-dimensional layout dened by = BZ and bs =Z. The coverage region isV = [B=2;B=2) and the BS locations b are given by all integer- coordinate points in the interval [B=2;B=2). Example 2 In system studies reported in the standardization of 4th generation cellular systems [3GP10b, IEE09] it is customary to consider a 2-dimensional hexagonal layout 1 The Voronoi cell of a lattice point x2 is the set of points y closer to x than to any other lattice point. 95 formed by 19 cells, as shown in Figure 4.1. In this case, bs = AZ 2 and = ABZ 2 , with A = p 3r 2 2 6 4 p 3 0 1 2 3 7 5 and B = 2 6 4 4 p 3 p 3 4 3 7 5; wherer denotes the distance between the center of a small hexagon and one of its vertices. We haveB = det(B) = 19, and the distance between the closest two points in bs is p 3r. For the sake of symmetry, in order to avoid \border eects" at the edges of the coverage region, we dene all distances and all spatial coordinates to be dened modulo . In particular, the modulo distance between two pointsu;v inR (resp.,R 2 ) is dened as d (u;v) =juv mod j; (4.1) wherex mod =xarg min 2 jxj. 
Cell $\mathcal{V}_b$ is defined as the Voronoi region of BS $b \in \Lambda_{\rm bs} \cap \mathcal{V}$ with respect to the modulo-$\Lambda$ distance, i.e., $\mathcal{V}_b = \{x \in \mathbb{R} : d_\Lambda(x, b) \leq d_\Lambda(x, b'), \ \forall\, b' \in \Lambda_{\rm bs} \cap \mathcal{V}\}$ (replace $\mathbb{R}$ with $\mathbb{R}^2$ for the 2-dimensional case). The collection of cells $\{\mathcal{V}_b\}$ forms a partition of the coverage region $\mathcal{V}$ into congruent regions.

A "clustering pattern" $u(\mathcal{C})$, defined by the set of BS locations $\mathcal{C} = \{b_0, \ldots, b_{C-1}\}$ with $b_j \in \Lambda_{\rm bs} \cap \mathcal{V}$ and rooted at $b_0 = 0$, is the collection of BS location sets (referred to as "clusters" in the following)

\[
u(\mathcal{C}) = \{\{\mathcal{C} + c\} : c \in \Lambda_{\rm bs} \cap \mathcal{V}\}.
\tag{4.2}
\]

We focus on systems based on single-cell processing ($C = 1$), or with joint processing over clusters of small size: $C = 2$ in the 1-dimensional case, and $C = 3$ in the 2-dimensional case, as shown in Figure 4.2. It turns out that larger clusters do not achieve better performance due to the large training overhead incurred, while requiring higher complexity. Therefore, our results are not restrictive since they capture the best system parameter configurations.

Figure 4.2: Cluster pattern geometry and user bins in one-dimensional and two-dimensional layouts. Panels: 1-dimensional layout (2-BS cluster b, with SN users split between {b − x, b + x} or between {b + x, b + 1 − x}); 2-dimensional layout (3-BS cluster 0, patterns 1 and 2).

Example 3 In the 1-dimensional case of Example 1, with $C = 1$ and $C = 2$, we have

\[
u(\{0\}) = \{\{0\}, \{1\}, \ldots, \{B-1\}\}
\]

and

\[
u(\{0, 1\}) = \{\{0, 1\}, \{1, 2\}, \ldots, \{B-1, 0\}\},
\]

respectively.

User location bins: We discretize the users' uniform spatial distribution into a regular grid of user locations. In particular, we assume that users are placed at the points of the lattice translate $\tilde{\Lambda}_u = \Lambda_u + u_0$, where $u_0 \neq 0$ is chosen such that $\tilde{\Lambda}_u$ is symmetric with respect to the origin and no points of $\tilde{\Lambda}_u$ fall on the cell boundaries.
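For the 1-dimensional layout of Example 1, the modulo distance and the clustering patterns of Example 3 can be sketched in a few lines. This is an illustrative sketch under stated assumptions: the coverage lattice is taken to be B·Z, BSs are labelled 0, ..., B − 1 with indices wrapping modulo B, and the function names are hypothetical.

```python
def mod_lattice(x, B):
    """x mod Lambda for Lambda = B*Z: remove the nearest lattice point.

    The result lies in the Voronoi cell [-B/2, B/2) of the origin.
    """
    return (x + B / 2.0) % B - B / 2.0


def mod_distance(u, v, B):
    """Modulo-Lambda distance |u - v mod Lambda| on the 1-dimensional layout."""
    return abs(mod_lattice(u - v, B))


def clustering_pattern(cluster, B):
    """All translates of the root cluster by BS positions, wrapped modulo B."""
    return [tuple((b + c) % B for b in cluster) for c in range(B)]


single_cell = clustering_pattern((0,), 4)
two_bs = clustering_pattern((0, 1), 4)
# Points 0.25 and 3.75 are only 0.5 apart once the layout wraps around.
wrapped = mod_distance(0.25, 3.75, 4)
```

With B = 4, `two_bs` evaluates to the clusters (0, 1), (1, 2), (2, 3), (3, 0), which is the pattern of Example 3 with the last cluster wrapping around to BS 0.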
Example 4 In the 1-dimensional case of Example 1 we can choose $\Lambda_u = \frac{1}{K}\mathbb{Z}$, for some even integer $K$, and let $u_0 = \frac{1}{2K}$. Then, the points of $\tilde{\Lambda}_u \cap \mathcal{V}_b$ are symmetrically located with respect to each BS coordinate $b = 0, 1, \ldots, B-1$.

A "user bin" $v(\mathcal{X})$, defined by the set of user locations $\mathcal{X} = \{x_0, x_1, \ldots, x_{m-1}\}$ with $x_i \in \tilde{\Lambda}_u$, is the collection of user location sets (referred to as "groups" in the following)

\[
v(\mathcal{X}) = \{\{\mathcal{X} + c\} : c \in \Lambda_{\rm bs} \cap \mathcal{V}\}.
\tag{4.3}
\]

In particular, we restrict $\mathcal{X}$ to be a symmetric set of points with respect to the BS positions.

Example 5 In the 1-dimensional case of Example 1, we are interested in the cases $\mathcal{X} = \{x, -x\}$ and $\mathcal{X} = \{x, 1-x\}$, for some $x \in \tilde{\Lambda}_u \cap [0, 1/2]$, as shown in Figure 4.2. This yields the bins

\[
v(\{x, -x\}) = \{\{-x, x\}, \{1-x, 1+x\}, \ldots, \{B-1-x, B-1+x\}\}
\]

and

\[
v(\{x, 1-x\}) = \{\{x, 1-x\}, \{1+x, 2-x\}, \ldots, \{B-1+x, B-x\}\},
\]

respectively.

Cluster/group association and user group rate: The BSs forming a cluster are jointly coordinated by a "cluster controller" that collects all relevant channel state information and computes the beamforming coefficients for the desired MU-MIMO precoding scheme. For any pair $\{\mathcal{X}, \mathcal{C}\}$, the users in group $\mathcal{X} + c$ are served by the cluster $\mathcal{C} + c$, for all $c \in \Lambda_{\rm bs} \cap \mathcal{V}$ (see Figure 4.2). By construction, each BS belongs to $C$ clusters and transmits signals from all these $C$ clusters. These signals may share the same frequency band, or be defined on orthogonal subbands, depending on the system frequency reuse factor defined later in this section. There are $mUN$ users in each group $\mathcal{X} + c$, and $CMN$ jointly coordinated antennas in each cluster $\mathcal{C} + c$. We assume $mU \geq CM$, such that the downlink DoFs are always limited by the number of antennas.²

The number of users effectively scheduled and served on each given slot is denoted by $SN$. We refer to these users as the "active users", and to the coefficient $S \in [0, CM]$ as the "loading factor". Depending on the geometry of $\mathcal{X}$ and $\mathcal{C}$ and on the type of beamforming used (see Section 4.3), $S$ can be optimized for each pair $\{\mathcal{X}, \mathcal{C}\}$.
We restrict our attention to schemes that serve an equal number $SN/m$ of active users per location $x \in \mathcal{X} + c$. By symmetry, co-located users are statistically equivalent. Therefore, without loss of generality, we may assume that round-robin scheduling picks all subsets of size $SN$ out of the $mUN$ users in each group for the same fraction of time. In this way, the aggregate spectral efficiency of the group (referred to in the following as the "group spectral efficiency") is shared evenly among all the users in the group.

Frequency reuse: We denote by $F$ the frequency reuse factor of the scheme, which can also be optimized for each pair $\{\mathcal{X}, \mathcal{C}\}$. The system bandwidth is partitioned into $F$ subbands of equal width. For $F = 1$, all clusters in $u(\mathcal{C})$ transmit on the whole system bandwidth. For $F > 1$, clusters are assigned different subbands according to a regular reuse pattern. For the 1-dimensional layout, any integer $F$ is possible. For the 2-dimensional layout, we consider reuse factors given by $F = i^2 + ij + j^2$ for non-negative integers $i$ and $j$ [Rap02]. For later use, we define $\mathcal{D}(f)$ as the set of clusters active on subband $f \in \{0, \ldots, F-1\}$.

Example 6 Figure 4.3 shows a 1-dimensional system with frequency reuse $F = 2$ for the clustering pattern of size $C = 2$ defined by $\mathcal{C} = \{0, 1\}$ and the user bin defined by $\mathcal{X} = \{x, 1-x\}$. Even-numbered clusters operate on subband 0 and odd-numbered clusters operate on subband 1. An example for the 2-dimensional hexagonal layout with $F = 3$ and $C = 1$ is shown in Figure 4.1, where cells with the same color operate on the same subband.

[Figure 4.3: 1-dimensional layout with C = 2 and F = 2. Clusters 0 and 2 are active on subband f = 0; clusters 1 and 3 are active on subband f = 1.]

² A system with $mU < CM$ is not fully loaded, in the sense that the infrastructure could potentially support a larger number of users.
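The admissible hexagonal reuse factors and the sets of co-channel clusters can be enumerated in a few lines. This is only a sketch: the rule that cluster c uses subband c mod F is an assumption that matches Example 6 for F = 2, and both function names are hypothetical.

```python
def hex_reuse_factors(max_index):
    """Reuse factors F = i^2 + i*j + j^2 for 0 <= i, j <= max_index, excluding F = 0."""
    values = {i * i + i * j + j * j
              for i in range(max_index + 1)
              for j in range(max_index + 1)}
    return sorted(values - {0})

def active_clusters_1d(subband, reuse, num_clusters):
    """D(f) in a 1-dimensional layout, assuming cluster c is assigned subband c mod F."""
    return [c for c in range(num_clusters) if c % reuse == subband]

print(hex_reuse_factors(2))            # smallest admissible 2-dimensional reuse factors
print(active_clusters_1d(0, 2, 6))     # even-numbered clusters share subband 0
print(active_clusters_1d(1, 2, 6))     # odd-numbered clusters share subband 1
```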
4.1.2 Channel Statistics and Received Signal Model

The average received signal power for a user located at $x \in \mathcal{V}$ from a BS antenna located at $b \in \mathcal{V}$ is denoted by $g(x,b)$, a polynomially decreasing function of the distance $d_\Lambda(x,b)$. The AWGN noise power spectral density is normalized to 1. For fixed $\{\mathcal{X}, \mathcal{C}\}$, the fading channel coefficients from the $CMN$ antennas of BS cluster $\mathcal{C} + c$ to an active user $k \in \{1, \ldots, SN/m\}$ at location $x + c' : x \in \mathcal{X}$, on frequency subband $f$, form a random vector $\mathbf{h}_{k,c',c}(f,x) \in \mathbb{C}^{CMN \times 1}$, with circularly symmetric complex Gaussian entries, i.i.d. across the BS antennas, the subbands and the users (independent small-scale Rayleigh fading).

In the considered network-MIMO schemes, active users are served with equal transmit power, given by $1/S$. Hence, with $SN$ active users per cluster, the total transmit power per cluster is equal to $N$. Since each BS simultaneously participates in $C$ clusters, the total transmit power per BS is also equal to $N$. Consequently, the channel coefficients are normalized to have variance $1/N$, such that the received signal power is independent of $N$. This provides the correct scaling when we let $N \to \infty$ in the large-system limit. We let the channel vector covariance matrix be given by $\mathbb{E}\big[\mathbf{h}_{k,c',c}(f,x)\,\mathbf{h}^{\sf H}_{k,c',c}(f,x)\big] = \frac{1}{N}\mathbf{G}_{c',c}(x)$, where $\mathbf{G}_{c',c}(x) = {\rm diag}\big(g(x + c', b + c)\,\mathbf{I}_{MN} : b \in \mathcal{C}\big)$.³ Notice that $\mathbf{G}_{c',c}(x)$ is independent of the user index $k$ and of the subband index $f$, since the channels are identically distributed across subbands and co-located users.

Under the standard block-fading assumption [Pro00, MH99, ZT02, Mar10, HTC10], the channel vectors are constant on each subband for blocks of length $LN$ signal dimensions. Without loss of generality, we assume that these coherence blocks also correspond to the scheduling slot. Each slot is partitioned into an uplink training phase of length $L_P N$ and a downlink data phase of length $L_D N$. In this section we deal with the data phase, while the training phase is addressed in Section 4.2.
For the sake of notational simplicity, the slot "time" index is omitted: since we care about ergodic (average) rates, only the per-block marginal channel statistics matter. The data-bearing signal transmitted by cluster $\mathcal{C} + c$ on subband $f$ is denoted by

$$\mathbf{X}_c(f) = \mathbf{U}_c(f)\,\mathbf{V}^{\sf H}_c(f) \tag{4.4}$$

where the matrix $\mathbf{U}_c(f) \in \mathbb{C}^{L_D N \times SN}$ contains the codeword (information-bearing) symbols arranged by columns. We assume that the users' codebooks are drawn from an i.i.d. Gaussian random coding ensemble with symbols $\sim \mathcal{CN}(0, 1/S)$. Achievable rates shall be obtained via the familiar random coding argument [CT05] with respect to this input distribution. The matrix $\mathbf{V}_c(f) \in \mathbb{C}^{CMN \times SN}$ contains the beamforming vectors arranged by columns, normalized to have unit norm. It is immediate to see that the transmit power of any cluster $\mathcal{C} + c$, active on frequency $f$, is given by $\frac{1}{L_D N}\,{\rm tr}\big(\mathbb{E}\big[\mathbf{X}^{\sf H}_c(f)\,\mathbf{X}_c(f)\big]\big) = N$, as anticipated before.

Recalling the definition of $\mathcal{D}(f)$, the received signal for user $k$ at location $x + c : x \in \mathcal{X}$ is given by

$$\mathbf{y}_{k,c}(f,x) = \sum_{c' \in \mathcal{D}(f)} \mathbf{U}_{c'}(f)\,\mathbf{V}^{\sf H}_{c'}(f)\,\mathbf{h}_{k,c,c'}(f,x) + \mathbf{z}_{k,c}(f,x) \tag{4.5}$$

where $\mathbf{z}_{k,c}(f,x) \sim \mathcal{CN}\big(\mathbf{0}, \frac{1}{F}\mathbf{I}_{L_D N}\big)$. Notice that a scheme using frequency reuse $F > 1$ transmits with total cluster power $N$ over a fraction $1/F$ of the whole system bandwidth. This is taken into account by letting the noise variance per component be equal to $1/F$ in the signal model (4.5). By construction, the encoded data symbols for user $k$ at location $x + c : x \in \mathcal{X}$ are the entries of the $k$-th column of $\mathbf{U}_c(f)$. The columns $k' \neq k$ of $\mathbf{U}_c(f)$ form the intra-cluster (multiuser) interference for user $k$. All other signals $\mathbf{U}_{c'}(f)$, with $c' \in \mathcal{D}(f)$, $c' \neq c$, form the Inter-Cluster Interference (ICI). As seen in Section 4.3, intra-cluster interference and ICI are handled by a combination of beamforming and frequency reuse.

³ We use ${\rm diag}(\mathbf{M}_a : a \in \mathcal{A})$ to indicate a block-diagonal matrix with diagonal blocks $\mathbf{M}_a$, for some index $a$ taking values in a set $\mathcal{A}$, and $\mathbf{I}_n$ to indicate the $n \times n$ identity matrix.
4.2 Uplink Training and Channel Estimation The CSIT is obtained on a per-slot basis, by letting all the scheduled (i.e., active) users in the slot sent pilot signals over the L P N dimensions dedicated to uplink training. 4 We xfX;Cg and focus on the SN active users in the groupsX +c :c2D(f). These users must sendSN orthogonal pilot signals to allow channel estimation at their corresponding serving clustersC +c :c2D(f). 4.2.1 Pilot Reuse Scheme LetL P =QS, whereQ 1 is an integer pilot reuse factor that can be optimized for each fX;Cg. Let 2C QSNQSN be a scaled unitary matrix, such that H = ul QSNI QSN , where ul denotes the uplink transmit power per user during the training phase. The columns of are partitioned intoQ disjoint blocks of sizeSN columns each, denoted by 0 ;:::; Q1 and referred to as training codebooks. These are assigned to the groups in a periodic fashion, such that the same training codebook q is reused everyQ-th groups 4 As done in [Mar10], also our analysis is slightly optimistic since it only accounts for the overhead and degradation due to uplink noisy channel estimation, while it assumes genie-aided overhead-free \dedicated training" to support coherent detection during data-transmission. As shown in [CJKR10], the eect of noisy dedicated training is minor relatively to the CSIT estimation error. 102 x ... ... q = 0 q = 0 q = 1 Q = 2 cluster 0 cluster 1 cluster 2 x x Figure 4.4: Pilot reuse and contamination forC = 2,F = 1, andQ = 2. The dashed lines show the contamination from a user sharing the same pilot signal, in another cluster. X +c : c2D(f). For later use, we let q(c)2f0;:::;Q 1g denote the index of the training codebook allocated to groupX +c, and we letP(q;f) =fc2D(f) :q(c) =qg. Pilot reuse is akin frequency reuse, but in general Q and F may be dierent in order to allow for additional exibility in the system optimization. 
Example 7 In the 1-dimensional layout with C = 2,C =f0; 1g andX =fx; 1xg we may have F = 1 (i.e., each cluster is active on the whole system bandwidth) and Q = 2 (i.e., two mutually orthogonal training codebooks are used alternately, such that the same set of uplink pilot signals is reused in every other cluster, as shown in Figure 4.4). 4.2.2 MMSE Channel Estimation and Pilot Contamination The uplink signal received by the CMN antennas of clusterC +c :c2D(f), during the training phase, is given by Y c (f) = X c 0 2D(f) q(c 0 ) H H c 0 ;c (f;X ) + Z c (f): (4.6) Because of TDD reciprocity, the uplink channel matrix H c 0 ;c (f;X )2 C CMNSN con- tains the downlink channels h k;c 0 ;c (f;x) arranged by columns, for all active users k = 1;:::;SN=m at all locations x +c 0 : x2X . In (4.6), Z c (f)2C L P CMN denotes the uplink AWGN with componentsCN (0; 1). The goal of the uplink training phase is to provide to each clusterC +c an estimate of the channel vectors h k;c;c (f;x) for all the active users in the corresponding served groupX +c. 103 By projecting Y c (f) onto the column of q(c) associated to user k at location x +c : x2X and dividing by ul QSN, the relevant observation for estimating the h k;c;c (f;x) is given by r k;c (f;x) = X c 0 2P(q(c);f) h k;c 0 ;c (f;x) + n k;c (f) (4.7) where n k;c (f)CN (0; ( ul QSN) 1 I CMN ). For anyc 0 2P(q(c);f), the MMSE estimate of h k;c 0 ;c (f;x) from r k;c (f;x) is obtained as b h k;c 0 ;c (f;x) = G c 0 ;c (x) 2 4 ( ul QS) 1 I CMN + X c 00 2P(q(c);f) G c 00 ;c (x) 3 5 1 r k;c (f;x) (4.8) Invoking the well-known MMSE decomposition, we can write h k;c 0 ;c (f;x) = b h k;c 0 ;c (f;x) + e k;c 0 ;c (f;x); (4.9) where the channel estimate b h k;c 0 ;c (f;x) and the error vector e k;c 0 ;c (f;x) are zero-mean un- correlated jointly complex circularly symmetric Gaussian vectors (and therefore statisti- cally independent due to joint Gaussianity). 
After some straightforward algebra (omitted for brevity), we obtain the covariance matricesE[ b h k;c 0 ;c (f;x) b h H k;c 0 ;c (f;x)] = 1 N c 0 ;c (x) and E[e k;c 0 ;c (f;x)e H k;c 0 ;c (f;x)] = 1 N c 0 ;c (x), where c 0 ;c (x) = diag c 0 ;c;b (f;x)I MN :b2C and c 0 ;c (x) = diag c 0 ;c;b (f;x)I MN :b2C , and where we dene c 0 ;c;b (f;x) = g(x +c 0 ;b +c) 1 + c 0 ;c;b (f;x) (4.10) c 0 ;c;b (f;x) = g(x +c 0 ;b +c) c 0 ;c;b (f;x) = g(x +c 0 ;b +c) 1 + c 0 ;c;b (f;x) 1 (4.11) with c 0 ;c;b (f;x) = g(x +c 0 ;b +c) ( ul QS) 1 + P c 00 2P(q(c);f)nc 0g(x +c 00 ;b +c) (4.12) The desired channel estimate at clusterC +c is given by b h k;c;c (f;x), obtained by letting c 0 = c in (4.8) { (4.12). Notice that the training phase observation r k;c (f;x) in (4.7) 104 contains the superposition of all the channel vectors h k;c 0 ;c (f;x) of the usersk at location x +c 0 : x2X , for all c 0 2P(q(c);f), i.e., sharing the same pilot signal. This is the so-called pilot contamination eect, which is a major limiting factor in the performance of TDD systems [JAMV11,Mar10]. Because of pilot contamination, the MMSE estimate b h k;c;c (f;x) is correlated with the channels h k;c 0 ;c (f;x), for all c 0 2P(q(c);f). Next, we express h k;c 0 ;c (f;x) for c 0 2P(q(c);f) in terms of b h k;c;c (f;x) and a com- ponent independent of b h k;c;c (f;x). This decomposition will be used in the proofs of the main analysis results in Sections 4.6.1 and 4.6.2 and it is the key to understand the pilot contamination eect. 
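The per-BS coefficients in (4.10) to (4.12) come from a scalar linear MMSE problem: one channel coefficient is observed in the sum of all coefficients that share its pilot, plus noise. The sketch below works with variances already multiplied by N, takes the normalized noise variance as the inverse of the product of uplink power, Q and S that appears in (4.8), and uses hypothetical names (`mmse_split`, `pilot_snr`); it illustrates the variance split, not the thesis's own parametrization.

```python
def mmse_split(gains, target, pilot_snr):
    """Split the variance of one channel coefficient into estimate and error parts.

    gains     : pathloss coefficients g of all users sharing the pilot, as seen
                at one BS of the estimating cluster (includes the target user)
    target    : index in `gains` of the user whose channel is being estimated
    pilot_snr : uplink power times Q times S (inverse of the normalized noise variance)
    """
    observation_var = 1.0 / pilot_snr + sum(gains)   # variance of the pilot observation
    est_var = gains[target] ** 2 / observation_var   # variance of the MMSE estimate
    err_var = gains[target] - est_var                # variance of the estimation error
    return est_var, err_var

# One desired user (g = 1) and one contaminating user (g = 0.25).
est, err = mmse_split([1.0, 0.25], target=0, pilot_snr=4.0)
print(est, err)
```

Raising `pilot_snr` shrinks the noise term but leaves the contaminating gains in the observation variance, which is why pilot contamination does not vanish with more training power.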
Using (4.8) and the fact that c;c (x) is invertible, we have that b h k;c 0 ;c (f;x) = E h k;c 0 ;c (f;x)jr k;c (f;x) = G c 0 ;c (x)G 1 c;c (x) b h k;c;c (f;x) (4.13) Using (4.9), the channel vector h k;c 0 ;c (f;x) from the antennas of clusterC +c to the unintended user k at location x +c 0 :x2X , can be written as h k;c 0 ;c (f;x) = G c 0 ;c (x)G 1 c;c (x) b h k;c;c (f;x) + e k;c 0 ;c (f;x): (4.14) The mutual orthogonality of b h k;c 0 ;c (f;x) and e k;c 0 ;c (f;x), together with (4.13) and the fact that due to the block-diagonal structure of the covariance matrices the MMSE estimator (4.8) decouples into componentwise individual scalar estimators, yield that b h k;c;c (f;x) and e k;c 0 ;c (f;x) are mutually independent. 4.3 MU-MIMO Precoders and Achievable Rates In the family of network-MIMO schemes considered in this work, the linear beamforming matrix V c (f) is calculated as a function of the estimated channel matrix b H c;c (f;X ). The schemes dier by the type of beamforming employed. In particular, we consider LZFBF 105 where any active user k at location x +c : x2X , imposes ZF constraints on J 0 clusters. A ZF constraint consists of the set of linear equations v H j;c 0(f;x 0 ) b h k;c;c 0(f;x) = 0; 8 (j;x 0 )6= (k;x) (4.15) where v j;c 0(f;x 0 ) denotes the column of V c 0(f) corresponding to userj at locationx 0 +c 0 : x 0 2X . 4.3.1 Beamforming Case J = 0: In this case no ZF constraints are imposed. Hence, we have V c (f) = UNorm n b H c;c (f;X ) o (4.16) where the operation UNormfg indicates the rescaling of the columns of the matrix argu- ment such that they have unit norm. This is the well-known Linear Single-User Beam- forming (LSUBF) considered in [Mar10], where the beamforming vector v k;c (f;x) is a scaled version of the corresponding estimated channel vector b h k;c;c (f;x). Case J = 1: In this case any active user imposes ZF constraints on its own serving cluster only. 
This corresponds to the classical single-cell LZFBF, for which V c (f) = UNorm n b H + c;c (f;X ) o ; (4.17) where we dene the Moore-Penrose pseudo-inverse M + = M h M H M i 1 (4.18) for a full column-rank matrix M. It follows that v k;c (f;x) is orthogonal to the esti- mated channels b h j;c;c (f;x 0 ) for all other active users (j;x 0 )6= (k;x) in the same cluster, i.e., ZF is used to tackle intra-cel interference, but nothing is done with respect to ICI. 106 The large-system asymptotic analysis of LZFBF with joint transmission from clusters of BSs, distance dependent pathloss and channel estimation errors has been provided in Chapter 3. CaseJ > 1: In this case, beyond the ZF constraints imposed to the serving cluster, each user imposes additional ZF constraints to J 1 neighboring clusters in order to mitigate the ICI. Mitigating ICI through the beamforming design is an alternative approach to frequency reuse and, in general, might be used jointly with frequency reuse. Let's focus on clusterC +c. This is subject to ZF constraints imposed by its own users, as well as by some users at some locations of the neighboring clusters. In order to enable such constraints, the c-th cluster controller must be able to estimate the channels of these out-of-cluster users. This can be done if these users employ training codebooks with indices q6= q(c). In some cases, only the channel subvectors to the nearest BS in the cluster can be eectively estimated, since there are other users sharing the same pilot signal that are received with a stronger path coecient. Then, the channel subvectors that cannot be estimated are treated as zero. Since these schemes are complicated to explain in full generality, we shall illustrate two specic examples, the generalization of which is cumbersome but conceptually straightforward, and can be worked out by the reader if interested in other specic cases. 
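For the two simplest cases, J = 0 in (4.16) and J = 1 in (4.17) and (4.18), the beamforming matrices can be computed directly from an estimated channel matrix. The following NumPy sketch uses a small random matrix as a stand-in for the estimated channels; the dimensions, the random seed and the function names are illustrative assumptions.

```python
import numpy as np

def unorm(matrix):
    """UNorm{.}: rescale every column to unit Euclidean norm."""
    return matrix / np.linalg.norm(matrix, axis=0, keepdims=True)

def lsubf(h_est):
    """Case J = 0: each beamforming vector is the normalized estimated channel."""
    return unorm(h_est)

def lzfbf(h_est):
    """Case J = 1: normalized columns of M (M^H M)^{-1}; M must have full column rank."""
    gram = h_est.conj().T @ h_est
    return unorm(h_est @ np.linalg.inv(gram))

rng = np.random.default_rng(0)
antennas, users = 8, 3      # stand-ins for the CMN antennas and SN active users
h_est = (rng.standard_normal((antennas, users))
         + 1j * rng.standard_normal((antennas, users))) / np.sqrt(2)

v_zf = lzfbf(h_est)
leakage = v_zf.conj().T @ h_est          # entry (j, k) is v_j^H h_k
print(np.round(np.abs(leakage), 6))      # off-diagonal entries vanish under ZF
```

The cases J > 1 follow the same pattern: the extra estimated channels are appended as columns before the pseudo-inverse, and only the columns of the served users are kept, as in (4.20).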
Example 8 Consider Figure 4.5(a), illustrating a 1-dimensional system withC =f0; 1g, X =fx; 1xg, F = 1 and Q = 2. The beamforming matrix of each cluster c satises ZF constraints for its own served users and for the users in the m = 2 locations at minimum distance in the nearest neighbor clusters, c 1 and c + 1. These locations collectively use distinct columns of the training codebooks q 6= q(c). In the specic example, the reference cluster c = 0 uses training codebook 0 , and the nearest locations on the left and on the right of cluster 0 use the rst SN=2 columns and the second SN=2 columns of the other training codebook 1 , respectively. Hence, cluster 0 controller can estimate all the channels of its own active users, at locations x; 1x, and of the users in adjacent 107 BS 0 x -x BS 1 BS 2 x -x cluster B-1 cluster 1 x -x cluster 0 ... (a) J =Q BS 0 x -x BS 1 BS 2 x -x cluster B-1 cluster 1 x -x cluster 0 ... (b) J =C(Q 1) + 1 Figure 4.5: Two cases of a precoding scheme for C = 2, F = 1, and Q = 2, with (a) J = Q and (b) J = C(Q 1) + 1. The dashed lines indicate the channel vectors to out-of-cluster users for which a ZF constraint is imposed. In Figure (b), the light-shaded dashed lines indicate the channel vectors assumed zero in the beamforming calculation. locationsx and 1 +x, as shown in the gure. The beamforming matrix in the case of Figure 4.5(a) is obtained as follows. Dene M c (f;X ) = h b H c;c (f;X ) | {z } 2MNSN b H c1;c (f;f1xg) | {z } 2MNSN=2 b H c+1;c (f;fxg) | {z } 2MNSN=2 i (4.19) be the matrix of dimension 2MN 2SN of all estimated channels at cluster controller c, where the rst block corresponds to the desired active users and the remaining blocks correspond to users in the adjacent clusters for which a ZF constraint is imposed. Then, V c (f) = UNorm M + c (f;X ) SN k=1 (4.20) where [] m n extracts the columns from n to m of the matrix argument. 
This scheme can be generalized to J =Q, where each cluster c satises ZF constraints for the desired SN active users in its own cluster and for a total of (J1)SN additional users in the nearest location of neighboring clusters. 108 Example 9 Consider Figure 4.5(b), illustrating the same 1-dimensional system as in Example 8 with a dierent beamforming design. In this case, the beamforming matrix of each cluster c satises ZF constraints for its own served users and all the users in the nearest neighbor clusters. However, some of these users share the same columns of the training codebooks q 6= q(c). In the specic example, the reference cluster c = 0 uses training codebook 0 , and the clusters to the left and to the right the other training codebook 1 . Users at location1+x use the same pilot signals of users at location 1+x, and users at locationx use the same pilot signals of users at location 2x. Then the 0- th cluster controller assumes that the least signicant channel coecients, corresponding to BSs at larger distance, are zero. In the example, for locations1 +x andx, only the subvector of dimension MN corresponding to the antennas of BS 0 is estimated, while the remaining subvector to BS 1 is treated as zero. Similarly, for locations 1+x and 2x only the subvector to BS 1 is estimated, while the remaining subvector to BS 0 is treated as zero. The beamforming matrix in the case of Figure 4.5(b) is obtained as follows. Dene M c (f;X ) = h b H c;c (f;X ) | {z } 2MNSN b H c1;c (f;X ) | {z } 2MNSN L b H c+1;c (f;X ) | {z } 2MNSN=2 R i (4.21) be the matrix of dimension 2MN 3SN of all estimated channels at cluster controller c, where indicates elementwise product and where L = 2 6 4 1 MNSN 0 MNSN 3 7 5; R = 2 6 4 0 MNSN 1 MNSN 3 7 5 are masking matrices that null out the subvectors of the channels that are treated as zero in the beamforming design. Then, V c (f) is again given by (4.20). 
This scheme can be generalized to J = C(Q 1) + 1, where each cluster c satises ZF constraints for the desired SN active users in its own cluster and for a total of (J 1)SN additional users in the neighboring clusters, with some channel sub-vectors set to zero. 109 4.3.2 Achievable Group Spectral Eciency Letting R (N) k;c (f;x) denote the spectral eciency (in bit/s/Hz) of user k at location x + c : x2X , served by cluster c according to a scheme as dened above, we dene the normalized group spectral eciency of bin v(X ) as R X;C (F;C;J) = 1 FBN F1 X f=0 X c2 bs \V X x2X SN=m X k=1 R (N) k;c (f;x) (4.22) In Sections 4.6.1 and 4.6.2, we prove the following results. Theorem 4.1 For givenX;C, frequency reuse factor F , downlink loading factor S and pilot reuse factor Q, in the limit of N !1, the following normalized group spectral eciency of bin v(X ) is achievable with LSUBF MU-MIMO precoding (J = 0): R X;C (F;C;J = 0) = S mF X x2X log 1 + CM S 0;0 (x) 1 F +(x) + CM S (x) ! ; (4.23) where 5 (x) = 1 mC X x 0 2X X b2C X c2D(0) c;c;b (x 0 )g(x;c +b) c;c (x 0 ) (4.24) and (x) = X c2P(0;0)n0 1 c;c (x) 1 C X b2C g(x;c +b) g(x;b) c;c;b (x) ! 2 ; (4.25) with c;c (x) = 1 C X b2C c;c;b (x); (4.26) are coecients that depend uniquely on the system geometry, frequency and pilot reuse, but are independent of the loading factor S and of the BS antenna factor M. As a corollary of Theorem 4.1, we can recover the result of [Mar10]. It is sucient to letM!1 in (4.23) and obtain the regime of innite number of BS antennas per active 5 In (4.24) and (4.25) it is assumed, without loss of generality, that cluster c = 0 uses subband f = 0 and training codebook q = 0. 110 user. Particularizing this for xed S, C = 1, and Q = 1, as in [Mar10], the normalized group spectral eciency becomes lim M!1 R X;f0g (F; 1; 0) = S mF X x2X log 1 + g(x; 0) 2 P c2P(0;0)n0 g(x;c) 2 ! (4.27) As observed in [Mar10], in this regime the system spectral eciency is uniquely limited by the ICI due to pilot contamination. 
The next result yields the achievable group spectral eciency of LZFBF (i.e., J 1), in the case of single-cell processing (i.e., C = 1). DeneE(x) as the set of J 1 clusters c6= 0 with centers closest to x2X (if J = 1 thenE(x) =;). Then, we have: Theorem 4.2 For givenX , C = 1 (i.e.,C =f0g), frequency reuse factor F , downlink loading factorS and pilot reuse factorQ, in the limit ofN!1, the following normalized group spectral eciency of bin v(X ) is achievable with LZFBF MU-MIMO precoding (J 1): R X;f0g (F; 1;J 1) = S mF X x2X log 1 + MJS S 0;0;0 (x) 1 F +(x) + MJS S (x) ! (4.28) where (x) = X c2P(0;0)[E(x) 0;c;0 (x) + X c2D(0)P(0;0)E(x) g(x;c) (4.29) and (x) = X c2P(0;0)n0 g(x;c) g(x; 0) 2 c;c;0 (x) (4.30) are coecients that depend uniquely on the system geometry, frequency and pilot reuse, but are independent of the loading factor S and of the BS antenna factor M. In passing, we notice that the limit of (4.28) for M !1, coincides with (4.27). Therefore, as observed in [Mar10], in the \Massive MIMO" regime LZFBF yields no advantage over the simpler LSUBF. 111 The case of LZFBF with multicell processing (C > 1) needs some more notation. First, as illustrated in the previous section through Examples 8 and 9, we consider the casesJ = 1,J =Q andJ =C(Q1)+1, referred to as cases (a), (b) and (c), respectively, in the following. In case (c) it is useful to dene b(x;c) = arg minfd (x;c +b) :b2Cg, i.e., the closest BS to locationx2X in clusterc2E(x). ForC > 1, an exact asymptotic ICI power expression cannot be found due to the complicated statistical dependence of beamforming vectors and channel vectors due to pilot contamination (see details in Section 4.6.2). 
However, the following result yields an achievable rate based on an upper bound on the ICI power: Theorem 4.3 For givenX andC withC > 1, frequency reuse factorF , downlink loading factor S and pilot reuse factor Q, in the limit of N!1, the following normalized group spectral eciency of bin v(X ) is achievable with LZFBF MU-MIMO precoding (J 1): R X;C (F;C;J 1) = S mF X x2X log 1 + CMJS S 0;0 (x) 1 F +(x) + CM S (x) ! (4.31) where (x) = 8 > > > > > > > > > > > > > < > > > > > > > > > > > > > : X c2E(x)[f0g 0;c (x) + X c2D(0)n0E(x) g 0;c (x); in cases (a) and (b); 0;0 (x) + X c2D(0)n0E(x) g 0;c (x)+ + 1 C X c2E(x) 0 @ 0;c;b(x;c) (x) + X b2Cnb(x;c) g(x;c +b) 1 A in case (c), (4.32) and (x) = X c2P(0;0)n0 1 C X b2C g(x;c +b) g(x;b) 2 c;c;b (x) (4.33) with g 0;c (x) = 1 C X b2C g(x;c +b) (4.34) 112 and with 0;c (x) = 1 C X b2C 0;c;b (x); (4.35) are coecients that depend uniquely on the system geometry, frequency and pilot reuse, but are independent of the loading factor S and of the BS antenna factor M. 4.4 Scheduling and Fairness Consider a system with K bins,fv(X 0 );:::;v(X K1 ), dened by setsX k of symmetric locations chosen to uniformly discretize the cellular coverage region V. The net bin spectral eciency in bit/s/Hz, for each bin v(X k ), is obtained by maximizing over all possible schemes in the family, i.e., over all possible clusters C of size C = 1; 2;:::, frequency reuse factorF , loading factorS, pilot reuse factorQ, and beamforming scheme indicated by J, the product maxf1QS=L; 0gR X k ;C (F;C;J) (4.36) where the rst term denotes the ratio between data-phase and total slot channel uses, and takes into account the pilot dimensionality overhead, while the second term is the spectral eciency for a given network-MIMO scheme in the family, given by Theorems 4.1, 4.2 and 4.3, depending on the case. The maximization of (4.36) is subject to the constraint JS CM, which becomes relevant for J > 0 (i.e., for LZFBF precoding). 
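The product in (4.36) trades pilot overhead against the group spectral efficiency, and the loading factor S is the only continuous parameter. A minimal grid-search sketch is given below; `toy_group_rate` is a placeholder chosen only to have a plausible shape and is not one of the expressions of Theorems 4.1 to 4.3, and all names are hypothetical.

```python
import math

def overhead_factor(pilot_reuse, loading, coherence):
    """First term of (4.36): fraction of the slot left for data, max{1 - Q*S/L, 0}."""
    return max(1.0 - pilot_reuse * loading / coherence, 0.0)

def toy_group_rate(loading, antenna_factor=30.0):
    """Placeholder rate with a zero-forcing-like shape (an assumption, for illustration)."""
    return loading * math.log2(1.0 + (antenna_factor - loading) / loading)

def best_loading(rate_fn, pilot_reuse, coherence, max_loading, steps=200):
    """Grid search over S in (0, max_loading] for the largest net efficiency."""
    best_s, best_value = 0.0, 0.0
    for i in range(1, steps + 1):
        s = max_loading * i / steps
        value = overhead_factor(pilot_reuse, s, coherence) * rate_fn(s)
        if value > best_value:
            best_s, best_value = s, value
    return best_s, best_value

print(best_loading(toy_group_rate, pilot_reuse=1, coherence=40.0, max_loading=30.0))
```

In a full search the same loop would be repeated over the discrete parameters (cluster, F, Q, J), with `rate_fn` replaced by the closed-form expression that applies to each case.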
Maximizing (4.36) requires searching over a discrete parameter space (apart from $S$, which is continuous). The simple closed-form expressions given in Theorems 4.1, 4.2 and 4.3 allow for an efficient and accurate search, avoiding lengthy Monte Carlo simulations. Suppose that, for each bin $v(\mathcal{X}_k)$, the best scheme in the network-MIMO family has been found, and let $R^\star(\mathcal{X}_k)$ denote the corresponding maximum of (4.36). Then, a scheduler allocates the different bins on the time-frequency plane in order to maximize some desired network utility function of the user rates. With randomized or round-robin selection of the active users in each bin, each user in bin $v(\mathcal{X}_k)$ shares an equal fraction of the product $\rho_k R^\star(\mathcal{X}_k)$, where $\rho_k$ is the fraction of time-frequency dimensions allocated to bin $v(\mathcal{X}_k)$. Under the assumption that users in the same bin should be treated with equal priority, we can focus on the maximization of a componentwise non-decreasing concave network utility function of the bin spectral efficiencies, denoted by $\mathcal{G}(R_0, \ldots, R_{K-1})$. The scheduler determines the fractions with which the bins are scheduled by solving:

$$\begin{array}{ll} \text{maximize} & \mathcal{G}(R_0, \ldots, R_{K-1}) \\ \text{subject to} & R_k \le \rho_k R^\star(\mathcal{X}_k), \quad \displaystyle\sum_{k=0}^{K-1} \rho_k \le 1, \quad \rho_k \ge 0. \end{array} \tag{4.37}$$

For example, if Proportional Fairness (PF) [VTL02] is desired, we have

$$\mathcal{G}(R_0, \ldots, R_{K-1}) = \sum_{k=0}^{K-1} \log R_k \tag{4.38}$$

resulting in the bin time-sharing fractions $\rho_k = 1/K$ (each bin is scheduled for an equal amount of time). In contrast, if the minimum user rate is relevant, we can impose max-min fairness by considering the function

$$\mathcal{G}(R_0, \ldots, R_{K-1}) = \min_{k = 0, \ldots, K-1} R_k. \tag{4.39}$$

This results in the bin time-sharing fractions $\rho_k = \dfrac{1/R^\star(\mathcal{X}_k)}{\sum_{j=0}^{K-1} 1/R^\star(\mathcal{X}_j)}$. A general family of network utility functions, of which (4.38) and (4.39) are special cases, is the so-called $\alpha$-fairness criterion defined in [MW00].
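The two closed-form solutions of (4.37) quoted above are easy to check numerically. This sketch assumes a list of already-optimized bin spectral efficiencies; the numbers and function names are illustrative only.

```python
def pf_fractions(best_rates):
    """Proportional fairness (4.38): every bin gets the same time-sharing fraction 1/K."""
    num_bins = len(best_rates)
    return [1.0 / num_bins] * num_bins

def maxmin_fractions(best_rates):
    """Max-min fairness (4.39): fractions proportional to the inverse of the best bin rate."""
    inverse = [1.0 / rate for rate in best_rates]
    total = sum(inverse)
    return [value / total for value in inverse]

best_rates = [1.0, 2.0, 4.0]     # illustrative bin spectral efficiencies in bit/s/Hz
for fractions in (pf_fractions(best_rates), maxmin_fractions(best_rates)):
    served = [f * r for f, r in zip(fractions, best_rates)]
    print(fractions, served)
```

Under max-min fairness all bins end up with the same served rate, at the cost of giving most of the time-frequency resource to the weakest bin.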
4.5 Numerical Results and Discussion

In this section, we present illustrative examples of the achievable spectral efficiency of the proposed architecture for 1-dimensional and 2-dimensional layouts. We investigate the merits and demerits of each scheme with respect to the user locations within a cell and study the advantages of the location-based architecture over the simple scheme in [Mar10], for the practically relevant case of a finite number of BS antennas per active user.

[Figure 4.6: Bin spectral efficiency vs. location within a cell obtained from the large system analysis (solid) and the finite dimension (N = 1) simulation (dotted) for various (F, C, J). M = 30 and L = 40. Horizontal axis: bin location x (km), from -0.5 to 0.5. Vertical axis: bin spectral efficiency (bits/s/Hz), from 0 to 140. Curves: (1,1,0), Q=1; (1,1,1), Q=1; (1,1,2), Q=2; (1,2,3), Q=2.]

Figure 4.6 shows the group spectral efficiency in (4.36) as a function of the bin locations within a cell for different schemes identified by the parameters $(F, C, J)$ and $Q$. These results are obtained by Monte Carlo simulation (dotted) and closed-form large-system analysis (solid). We considered the 1-dimensional cell layout in Figure 4.2 with $B = 24$ BSs, antenna factor $M = 30$ per BS, coherence block dimension factor $L = 40$, and $K = 10$ bins in each cluster, where clusters and location bins are as described in Examples 3 and 5, with $x$ uniformly distributed in $[0, 1/2]$. The pathloss model is the same as in [PCR11], where $g(x,b) = G_0 / \big(1 + (d_\Lambda(x,b)/\delta)^{\alpha}\big)$, with $G_0 = 10^6$, $\alpha = 3.76$, and $\delta = 0.05$, and reflects (after suitable normalizations) a typical cellular scenario with 1 km diameter cells in a suburban environment. The (1,1,1) scheme with $Q = 1$ is the best for bins at the center of each cell. However, at the cell edges, $C = 2$, $J = 2$, or $F = 2$ (not included in the figure) attains much better performance.
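The pathloss model used in the 1-dimensional experiment can be evaluated as follows. This is a sketch under stated assumptions: the layout is treated as a ring of B = 24 cells of unit width, so the modulo distance wraps with period 24, and the parameter names `g0`, `delta` and `alpha` are labels chosen here for the three constants quoted in the text.

```python
def modulo_distance(x, b, period):
    """Distance between x and b on a ring of the given period (1-dimensional wrap-around)."""
    d = abs(x - b) % period
    return min(d, period - d)

def pathloss(x, b, period, g0=1e6, delta=0.05, alpha=3.76):
    """g(x, b) = G0 / (1 + (d(x, b) / delta)^alpha) with the 1-dimensional parameters."""
    return g0 / (1.0 + (modulo_distance(x, b, period) / delta) ** alpha)

B = 24                                   # number of BSs, located at 0, 1, ..., B-1
print(pathloss(0.0, 0, B))               # user co-located with BS 0
print(pathloss(0.5, 0, B))               # user on the cell edge between BS 0 and BS 1
print(pathloss(0.5, 1, B))               # same user seen from BS 1 (equal by symmetry)
print(pathloss(23.5, 0, B))              # wrap-around: distance 0.5 from BS 0
```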
Notice also that the large system analysis yields a very accurate approximation of the finite-dimensional simulation, even for a small scaling parameter $N$ (we used the minimum possible value $N = 1$ in this case). Given this remarkable accuracy, in the following we present only the results for the large-system limit, since they can all be obtained in closed form using Theorems 4.1, 4.2 and 4.3.

In the two-dimensional case, we consider the coverage region of $B = 19$ hexagonal cells as shown in Figure 4.1. For comparison, we assume the same system model as in [Mar10]. In particular, the channel coherence block dimension is $L = 84$, the cell radius is 1.6 km, and the pathloss model $g(x,b)$ has the same form as before, with parameters $G_0 = 10^6$, $\delta = 0.1$ km, and $\alpha = 3.8$. Log-normal shadowing, considered in [Mar10], is not considered here. We assume $C = 1$ and $C = 3$ cluster patterns, and $K = 16$ bins with 48 user locations, where the cluster and bin layouts are as described in Figure 4.2. The frequency reuse factor $F$ and the pilot reuse factor $Q$ are selected from the set $\{1, 3\}$ and, when $F = 3$ or $Q = 3$, the frequency subbands or pilot sequence sets are allocated to clusters as shown in Figure 4.1, where different colors denote different subbands or sequence sets.

Figure 4.7 illustrates the optimal scheme over the family of network-MIMO schemes for (a) $M = 20$ and (b) $M = 100$. In both cases, (1,1,1) is optimal inside a cell but (3,3,1) or (3,1,1) is better near cell boundaries, with the cluster pattern of (3,3,1) depending on the user locations. However, the area where the (1,1,1) scheme is the best is much wider in the case of $M = 100$. This suggests that, when the number of BS antennas per active user is very large (towards the Massive MIMO limit), single-cell processing tends to outperform multi-cell joint processing, with the possible exception of a region near the cell edges, which shrinks as the number of antennas per active user increases.
In contrast, for a not-so-large number of antennas per active user, multi-cell joint processing can achieve significant gains.

Next we compare the performance of the proposed architecture with the one advocated in [Mar10]. Figure 4.8 shows the bin-optimized spectral efficiency normalized by the spectral efficiency of the (1,1,0), $Q = 1$ scheme under a two-dimensional layout with $M = 50$. This figure illustrates the advantage of the proposed architecture with respect to the basic scheme of [Mar10]. The gain ranges from 40% to 580% depending on the location, and it is especially high at the cell centers and cell boundaries.

[Figure 4.7: Optimal scheme at each user location. M = 20 and 100, K = 16, and L = 84. Panels: (a) M = 20, (b) M = 100. Axes: x-coordinate (km), y-coordinate (km). Legend: (1,1,1), Q=1; (3,1,1), Q=1 (panel (b) only); (3,3,1), Q=1, cluster pattern 1; (3,3,1), Q=1, cluster pattern 2.]

[Figure 4.8: Bin-optimized spectral efficiencies normalized by the (1,1,0) spectral efficiencies. M = 50, K = 48, and L = 84.]

In Figure 4.9, we consider the system-wide performance: the cluster sum throughputs are plotted as a function of $M$ in the two-dimensional layout, where the individual user group throughputs are obtained from (4.37) under the PF scheduling of (4.38). The cluster scheme includes two cases: the cluster pattern is either fixed as one of the two shown in Figure 4.2, or can be switched to the closest one depending on the user locations. For comparison, we assume 20 MHz bandwidth and the coherence block size $L = 84$ as in [Mar10] (considering the parameters of the 3GPP LTE TDD system). As the figure reveals, the (3,3,1) schemes perform very well for small $M < 20$, but, as $M$ increases, the (1,1,1) scheme outperforms them. The bin-optimized architecture improves the throughput further at any value of $M$.
The dotted horizontal line in Figure 4.9 denotes the cell throughput claimed in [Mar10] for the case of an infinite number of transmit antennas per user with the $(1, 1, 0)$ architecture and random scheduling. We notice that this limit is approached very slowly, and more than 10000 antennas per BS are required (clearly impractical). Overall, the system spectral efficiency obtained by the proposed architecture is similar to that achieved by the fixed scheme $(1, 1, 0)$ in [Mar10], with a 10-fold reduction in the number of antennas at the base stations (roughly, from 500 to 50 antennas, as indicated by the arrow).

[Figure 4.9: Cluster sum throughput vs. $M$ for various $(F, C, J)$ and for a bin-optimized architecture under PF scheduling. $K = 48$ and $L = 84$. The arrow indicates that the proposed architecture achieves the same spectral efficiency as the fixed scheme $(1, 1, 0)$ of [Mar10], with a 10-fold reduction of the number of BS antennas. Axes: factor of the number of BS antennas, $M$ (logarithmic, $10^1$ to $10^4$), and cell/cluster throughput (Mbps). Curves: (1,1,0), Q=1; (1,1,1), Q=1; (1,1,3), Q=3; (3,3,1), Q=1, w/o cluster switching; (3,3,1), Q=1, w/ cluster switching; bin optimized.]

4.6 Proofs

4.6.1 Proof of Theorem 4.1

We focus on the reference cluster $\mathcal{C}$ (i.e., $c = 0$), with corresponding served group of locations $\mathcal{X} = \{x_0, \ldots, x_{m-1}\}$. For the sake of notational simplicity, we omit the subchannel index $f$, and let $\mathcal{D}$ denote the set of clusters active on the same subchannel as cluster 0, and $\mathcal{P}$ denote the set of clusters that share the same pilot block as cluster 0.
From (4.5), the (scalar) signal received at some symbol interval of the data phase, at the $k$-th active user receiver at location $x \in \mathcal{X}$, is given by

\begin{align*}
y_{k,0}(x) &= u_{k,0}(x)\, \mathbf{v}_{k,0}^{H}(x)\, \mathbf{h}_{k,0,0}(x) \tag{4.40a} \\
&\quad + \sum_{j \neq k} u_{j,0}(x)\, \mathbf{v}_{j,0}^{H}(x)\, \mathbf{h}_{k,0,0}(x) + \sum_{x' \in \mathcal{X} \setminus x} \sum_{j} u_{j,0}(x')\, \mathbf{v}_{j,0}^{H}(x')\, \mathbf{h}_{k,0,0}(x) \tag{4.40b} \\
&\quad + \sum_{c' \in \mathcal{D} \setminus 0} \sum_{x' \in \mathcal{X}} \sum_{j} u_{j,c'}(x')\, \mathbf{v}_{j,c'}^{H}(x')\, \mathbf{h}_{k,0,c'}(x) + z_{k,0}(x), \tag{4.40c}
\end{align*}

where $u_{j,c'}(x')$ denotes the code symbol transmitted by cluster $c'$ to user $j$ at location $x' + c'$, $x' \in \mathcal{X}$. With LSUBF downlink precoding, we have

\[
\mathbf{v}_{j,c'}(x') = \left\| \hat{\mathbf{h}}_{j,c',c'}(x') \right\|^{-1} \hat{\mathbf{h}}_{j,c',c'}(x'). \tag{4.41}
\]

Using the MMSE decomposition (4.9), we isolate the useful signal term from (4.40a), given by

\[
u_{k,0}(x)\, \mathbf{v}_{k,0}^{H}(x)\, \hat{\mathbf{h}}_{k,0,0}(x). \tag{4.42}
\]

The sum of the residual self-interference term due to the channel estimation error and the signals in (4.40b) transmitted by cluster 0 to the other users results in the intra-cluster interference term

\[
u_{k,0}(x)\, \mathbf{v}_{k,0}^{H}(x)\, \mathbf{e}_{k,0,0}(x) + \sum_{j \neq k} u_{j,0}(x)\, \mathbf{v}_{j,0}^{H}(x)\, \mathbf{h}_{k,0,0}(x) + \sum_{x' \in \mathcal{X} \setminus x} \sum_{j} u_{j,0}(x')\, \mathbf{v}_{j,0}^{H}(x')\, \mathbf{h}_{k,0,0}(x). \tag{4.43}
\]

Finally, the ICI term and background noise are given in (4.40c). A standard achievability bound based on the worst-case uncorrelated additive noise [HH03] yields the achievable rate

\[
R^{(N)}_{k,0}(x) = \mathbb{E}\left[ \log\left( 1 + \frac{\mathbb{E}\left[ |\text{useful signal term}|^2 \,\middle|\, \hat{\mathbf{h}}_{k,0,0}(x) \right]}{\mathbb{E}\left[ |\text{noise plus interference term}|^2 \,\middle|\, \hat{\mathbf{h}}_{k,0,0}(x) \right]} \right) \right]. \tag{4.44}
\]

Both the numerator and the denominator of the Signal-to-Interference plus Noise Ratio (SINR) appearing inside the log in (4.44) converge to deterministic limits as $N \to \infty$. We will use extensively the representation of the channel MMSE estimates as

\[
\hat{\mathbf{h}}_{j,c,c'}(x') = \frac{1}{\sqrt{N}}\, \boldsymbol{\Xi}^{1/2}_{c,c'}(x')\, \mathbf{a}_{j,c,c'}(x'), \tag{4.45}
\]

where the vectors $\mathbf{a}_{j,c,c'}(x')$ are i.i.d. $\mathcal{CN}(\mathbf{0}, \mathbf{I}_{CMN})$, with generic components denoted by $\{a_{n,b} : n = 1, \ldots, MN\}$ for all $b \in \mathcal{C}$.
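As an informal numerical companion to (4.41) and (4.45), the sketch below draws one estimated channel whose block for BS $b$ has i.i.d. complex Gaussian entries with variance equal to an assumed per-BS estimate variance divided by $N$, builds the unit-norm LSUBF vector, and evaluates the useful signal gain. The function names and the variance values passed in are illustrative assumptions, not quantities taken from this chapter. For large $N$ the squared norm of the estimate concentrates around $M$ times the sum of the per-BS variances.

```python
import math
import random


def estimated_channel(variances, M, N, rng):
    """Draw one estimated channel in the spirit of (4.45).

    `variances` holds one assumed estimate variance per BS of the cluster.
    The block of BS b has M*N i.i.d. CN(0, variances[b] / N) entries.
    """
    h = []
    for var_b in variances:
        scale = math.sqrt(var_b / (2.0 * N))  # std per real dimension
        for _ in range(M * N):
            h.append(complex(rng.gauss(0.0, scale), rng.gauss(0.0, scale)))
    return h


def lsubf_vector(h):
    """Unit-norm single-user beamforming vector h / ||h||, as in (4.41)."""
    norm = math.sqrt(sum(abs(x) ** 2 for x in h))
    return [x / norm for x in h]


def useful_signal_gain(v, h):
    """|v^H h|^2, which equals ||h||^2 when v is the LSUBF vector of h."""
    inner = sum(vi.conjugate() * hi for vi, hi in zip(v, h))
    return abs(inner) ** 2


if __name__ == "__main__":
    rng = random.Random(0)
    variances = [1.0, 0.3]  # illustrative values for a cluster of C = 2 BSs
    M = 4
    for N in (5, 50, 500):
        h = estimated_channel(variances, M, N, rng)
        gain = useful_signal_gain(lsubf_vector(h), h)
        print(N, gain, M * sum(variances))
```

The printed gain fluctuates for small $N$ and approaches the deterministic value in the last column as $N$ grows, which is the behavior exploited throughout the proof.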
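We will also make use of the following limit, which follows as a direct application of the strong law of large numbers:

\[
\left\| \hat{\mathbf{h}}_{j,c,c'}(x') \right\|^2 = \sum_{b \in \mathcal{C}} \xi_{c,c',b}(x')\, \frac{1}{N} \sum_{n=1}^{MN} |a_{n,b}|^2 \xrightarrow{\text{a.s.}} CM\, \xi_{c,c'}(x'), \tag{4.46}
\]

where $\xi_{c,c'}(x')$ is defined in (4.26). Using (4.41), the SINR numerator is given by

\begin{align*}
\mathbb{E}\left[ \left| u_{k,0}(x)\, \mathbf{v}_{k,0}^{H}(x)\, \hat{\mathbf{h}}_{k,0,0}(x) \right|^2 \,\middle|\, \hat{\mathbf{h}}_{k,0,0}(x) \right] &= \mathbb{E}\left[ |u_{k,0}(x)|^2 \left\| \hat{\mathbf{h}}_{k,0,0}(x) \right\|^2 \,\middle|\, \hat{\mathbf{h}}_{k,0,0}(x) \right] \tag{4.47a} \\
&= \frac{1}{S} \left\| \hat{\mathbf{h}}_{k,0,0}(x) \right\|^2 \tag{4.47b} \\
&\xrightarrow{\text{a.s.}} \frac{CM}{S}\, \xi_{0,0}(x), \tag{4.47c}
\end{align*}

where in (4.47a) we used the LSUBF definition (4.41).

Next, we notice that all the terms forming interference and noise are uncorrelated. Hence, the conditional average interference power can be calculated as a sum of individual terms. The self-interference due to non-ideal CSIT is given by

\begin{align*}
\mathbb{E}\left[ \left| u_{k,0}(x)\, \mathbf{v}_{k,0}^{H}(x)\, \mathbf{e}_{k,0,0}(x) \right|^2 \,\middle|\, \hat{\mathbf{h}}_{k,0,0}(x) \right] &= \frac{1}{SN} \left\| \hat{\mathbf{h}}_{k,0,0}(x) \right\|^{-2} \hat{\mathbf{h}}_{k,0,0}^{H}(x)\, \boldsymbol{\Sigma}_{0,0}(x)\, \hat{\mathbf{h}}_{k,0,0}(x) \\
&= \frac{1}{SN}\, \frac{\sum_{b \in \mathcal{C}} \xi_{0,0,b}(x)\, \sigma_{0,0,b}(x)\, \frac{1}{N} \sum_{n=1}^{MN} |a_{n,b}|^2}{CM\, \xi_{0,0}(x)} \xrightarrow{\text{a.s.}} 0, \tag{4.48}
\end{align*}

where (4.48) follows from noticing that

\[
\sum_{b \in \mathcal{C}} \xi_{0,0,b}(x)\, \sigma_{0,0,b}(x)\, \frac{1}{N} \sum_{n=1}^{MN} |a_{n,b}|^2 \xrightarrow{\text{a.s.}} M \sum_{b \in \mathcal{C}} \xi_{0,0,b}(x)\, \sigma_{0,0,b}(x),
\]

which is a finite constant. Following very similar calculations (omitted for brevity) and recalling that $g(x, b) = \xi_{0,0,b}(x) + \sigma_{0,0,b}(x)$ (see (4.11)) and that $SN/m$ users per location $x' \in \mathcal{X}$ are active, we obtain the intra-cluster interference power term as

\[
\frac{1}{mC} \sum_{x' \in \mathcal{X}} \frac{\sum_{b \in \mathcal{C}} \xi_{0,0,b}(x')\, g(x, b)}{\xi_{0,0}(x')}. \tag{4.49}
\]

Next, we consider the ICI power term. In doing so, we must pay attention to the pilot contamination effect. In particular, we have to separate all contributions in (4.40c) coming from the $k$-th beam of clusters $c' \in \mathcal{P}$ (i.e., for users sharing the same pilot signal as the reference user $k$ at $x \in \mathcal{X}$) from the rest.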
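The two contributions to the ICI are

\[
I_{\text{same pilot}} = \sum_{c' \in \mathcal{P} \setminus 0} u_{k,c'}(x)\, \mathbf{v}_{k,c'}^{H}(x)\, \mathbf{h}_{k,0,c'}(x) \tag{4.50}
\]

and

\begin{align*}
I_{\text{no same pilot}} &= \sum_{c' \in \mathcal{P} \setminus 0} \sum_{j \neq k} u_{j,c'}(x)\, \mathbf{v}_{j,c'}^{H}(x)\, \mathbf{h}_{k,0,c'}(x) + \sum_{c' \in \mathcal{P} \setminus 0} \sum_{x' \in \mathcal{X} \setminus x} \sum_{j} u_{j,c'}(x')\, \mathbf{v}_{j,c'}^{H}(x')\, \mathbf{h}_{k,0,c'}(x) \\
&\quad + \sum_{c' \in \mathcal{D} - \mathcal{P}} \sum_{x' \in \mathcal{X}} \sum_{j} u_{j,c'}(x')\, \mathbf{v}_{j,c'}^{H}(x')\, \mathbf{h}_{k,0,c'}(x). \tag{4.51}
\end{align*}

Both $I_{\text{same pilot}}$ and $I_{\text{no same pilot}}$ are independent of $\hat{\mathbf{h}}_{k,0,0}(x)$. Therefore, the conditioning in the expectation can be omitted. Each individual term appearing in the sum (4.51) yields

\[
N\, \mathbb{E}\left[ \left| u_{j,c'}(x')\, \mathbf{v}_{j,c'}^{H}(x')\, \mathbf{h}_{k,0,c'}(x) \right|^2 \right] \to \frac{1}{S}\, \frac{\frac{1}{N} \operatorname{tr}\left( \boldsymbol{\Xi}_{c',c'}(x')\, \mathbf{G}_{0,c'}(x) \right)}{CM\, \xi_{c',c'}(x')} = \frac{1}{SC}\, \frac{\sum_{b \in \mathcal{C}} \xi_{c',c',b}(x')\, g(x, c' + b)}{\xi_{c',c'}(x')}.
\]

Summing over all terms, we have

\[
\mathbb{E}\left[ |I_{\text{no same pilot}}|^2 \right] = \sum_{c' \in \mathcal{D} \setminus 0} \frac{1}{mC} \sum_{x' \in \mathcal{X}} \frac{\sum_{b \in \mathcal{C}} \xi_{c',c',b}(x')\, g(x, c' + b)}{\xi_{c',c'}(x')}. \tag{4.52}
\]

In order to evaluate $\mathbb{E}\left[ |I_{\text{same pilot}}|^2 \right]$, we use the decomposition (4.14) applied to $\mathbf{h}_{k,0,c'}(x)$, namely,

\[
\mathbf{h}_{k,0,c'}(x) = \mathbf{G}_{0,c'}(x)\, \mathbf{G}^{-1}_{c',c'}(x)\, \hat{\mathbf{h}}_{k,c',c'}(x) + \mathbf{e}_{k,0,c'}(x). \tag{4.53}
\]

The general term in (4.50) yields

\begin{align*}
\mathbb{E}\left[ \left| u_{k,c'}(x)\, \mathbf{v}_{k,c'}^{H}(x)\, \mathbf{h}_{k,0,c'}(x) \right|^2 \right] &= \frac{1}{S}\, \mathbb{E}\left[ \left\| \hat{\mathbf{h}}_{k,c',c'}(x) \right\|^{-2} \left( \left| \hat{\mathbf{h}}_{k,c',c'}^{H}(x)\, \mathbf{G}_{0,c'}(x)\, \mathbf{G}^{-1}_{c',c'}(x)\, \hat{\mathbf{h}}_{k,c',c'}(x) \right|^2 + \hat{\mathbf{h}}_{k,c',c'}^{H}(x)\, \mathbf{e}_{k,0,c'}(x)\, \mathbf{e}_{k,0,c'}^{H}(x)\, \hat{\mathbf{h}}_{k,c',c'}(x) \right) \right] \\
&\to \frac{M}{SC\, \xi_{c',c'}(x)} \left( \sum_{b \in \mathcal{C}} \frac{g(x, c' + b)}{g(x, b)}\, \xi_{c',c',b}(x) \right)^2, \tag{4.54}
\end{align*}

where, inside the expectation, we used the a.s. limits (4.46),

\begin{align*}
\hat{\mathbf{h}}_{k,c',c'}^{H}(x)\, \mathbf{G}_{0,c'}(x)\, \mathbf{G}^{-1}_{c',c'}(x)\, \hat{\mathbf{h}}_{k,c',c'}(x) &= \sum_{b \in \mathcal{C}} \frac{g(x, c' + b)}{g(c' + x, c' + b)}\, \xi_{c',c',b}(x)\, \frac{1}{N} \sum_{n=1}^{MN} |a_{n,b}|^2 \\
&\xrightarrow{\text{a.s.}} M \sum_{b \in \mathcal{C}} \frac{g(x, c' + b)}{g(x, b)}\, \xi_{c',c',b}(x),
\end{align*}

with $g(c' + x, c' + b) = g(x, b)$, and $\hat{\mathbf{h}}_{k,c',c'}^{H}(x)\, \mathbf{e}_{k,0,c'}(x) \xrightarrow{\text{a.s.}} 0$, and the limit

\begin{align*}
\mathbb{E}\left[ \hat{\mathbf{h}}_{k,c',c'}^{H}(x)\, \mathbf{e}_{k,0,c'}(x)\, \mathbf{e}_{k,0,c'}^{H}(x)\, \hat{\mathbf{h}}_{k,c',c'}(x) \right] &= \frac{1}{N}\, \mathbb{E}\left[ \hat{\mathbf{h}}_{k,c',c'}^{H}(x)\, \boldsymbol{\Sigma}_{0,c'}(x)\, \hat{\mathbf{h}}_{k,c',c'}(x) \right] \\
&= \frac{1}{N^2} \operatorname{tr}\left( \boldsymbol{\Xi}_{c',c'}(x)\, \boldsymbol{\Sigma}_{0,c'}(x) \right) \to 0. \tag{4.55}
\end{align*}

Summing over all such terms, we obtain

\[
\mathbb{E}\left[ |I_{\text{same pilot}}|^2 \right] = \frac{CM}{S} \sum_{c' \in \mathcal{P} \setminus 0} \frac{1}{\xi_{c',c'}(x)} \left( \frac{1}{C} \sum_{b \in \mathcal{C}} \frac{g(x, c' + b)}{g(x, b)}\, \xi_{c',c',b}(x) \right)^2. \tag{4.56}
\]

Using (4.47c), (4.49), (4.52) and (4.56) in (4.44), recalling that the noise variance is equal to $1/F$, summing over all users in the reference group $\mathcal{X}$ and observing that the system is symmetric (by construction) with respect to any cluster and any subband, we find the normalized group spectral efficiency of bin $v(\mathcal{X})$ in the form (4.23).

4.6.2 Proof of Theorems 4.2 and 4.3

With reference to Section 4.3.1, we consider LZFBF with or without inter-cluster interference (ICI) constraints, depending on the value of $J \geq 1$. In particular, each cluster creates $JSN$ beamforming vectors for $SN$ users in the same cluster and $(J - 1)SN$ users in the neighbor clusters. Any active user in the system, at any given scheduling slot, imposes ZF constraints (see (4.15)) on $J$ clusters. We restrict our attention to the three cases treated in Section 4.3.1, referred to as: Case (a) $J = 1$; Case (b) $J = Q$; Case (c) $J = (Q - 1)C + 1$. Again, we focus on the reference cluster $\mathcal{C}$ ($c = 0$) with served group $\mathcal{X}$. In all cases, the beamforming matrix $\mathbf{V}_0$ is given by the column-normalized Moore-Penrose pseudo-inverse of the estimated channel matrix (see (4.17), (4.19) and (4.21)), of size $CMN \times JSN$, and formed by blocks of size $N \times SN/m$, as follows:

\[
\mathbf{M}_0 = \begin{bmatrix}
\mathbf{M}_{0,0} & \mathbf{M}_{0,1} & \cdots & \mathbf{M}_{0,Jm-1} \\
\mathbf{M}_{1,0} & \mathbf{M}_{1,1} & \cdots & \mathbf{M}_{1,Jm-1} \\
\vdots & \vdots & & \vdots \\
\mathbf{M}_{CM-1,0} & \mathbf{M}_{CM-1,1} & \cdots & \mathbf{M}_{CM-1,Jm-1}
\end{bmatrix}, \tag{4.57}
\]

where each block $\mathbf{M}_{i,j}$ corresponds to the $MN$ antennas of BS $b \in \mathcal{C}$ and to the $SN/m$ active users in some location $x'$ with respect to which ZF constraints are imposed. For the purpose of analysis, the important fact to be noticed is that the blocks $\mathbf{M}_{i,j}$ are mutually independent, and are formed by i.i.d. elements with mean zero and constant (on each block) variance. For example, if block $\mathbf{M}_{i,j}$ corresponds to a user location $c' + x'$, $x' \in \mathcal{X}$, and a BS $b \in \mathcal{C}$ such that the corresponding channel vectors are estimated from the uplink training phase, the elements of $\mathbf{M}_{i,j}$ are $\mathcal{CN}(0, \xi_{c',0,b}(x)/N)$ (see (4.11) in Section 4.2.2).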
Instead, if block $\mathbf{M}_{i,j}$ corresponds to a user location and a BS such that the corresponding channel vectors are treated as zero (see Section 4.3.1, Example 9), then $\mathbf{M}_{i,j} = \mathbf{0}$ (all-zero block).

The signal received by user $k$ at location $x \in \mathcal{X}$ takes on the form (4.40). From Theorem 3.3 the rate

\[
R^{(N)}_{k,0}(x) = \mathbb{E}\left[ \log\left( 1 + \frac{\mathbb{E}\left[ |\text{useful signal term}|^2 \,\middle|\, \mathbf{v}_{k,0}(x), \hat{\mathbf{h}}_{k,0,0}(x) \right]}{\mathbb{E}\left[ |\text{noise plus interference term}|^2 \,\middle|\, \mathbf{v}_{k,0}(x), \hat{\mathbf{h}}_{k,0,0}(x) \right]} \right) \right] \tag{4.58}
\]

is achievable, assuming that the receiver has perfect knowledge of its own estimated channel and beamforming vector. The large system limit of the LZFBF useful signal coefficient $\mathbf{v}_{k,0}^{H}(x)\, \hat{\mathbf{h}}_{k,0,0}(x)$ for channel matrices in the form (4.57) is obtained in Theorem 3.1 (details are omitted for the sake of brevity). While in general this limit is obtained as the solution of a fixed-point equation that must be solved numerically, the user locations and the BS positions considered in this work satisfy the symmetry conditions given in Section 3.2.1, and the asymptotic useful signal term admits a simple closed form given in (3.27). Applying this result we obtain

\[
\left| \mathbf{v}_{j,c'}^{H}(x')\, \hat{\mathbf{h}}_{j,c',c'}(x') \right|^2 \xrightarrow{\text{a.s.}} (CM - JS)\, \xi_{c',c'}(x') \tag{4.59}
\]

for any $j$, $c'$ and $x' \in \mathcal{X}$. By construction, it is assumed that $JS < CM$. Notice the well-known dimensionality limit of ZF beamforming: when the number of ZF constraints per degree of freedom (antenna), $JS/(CM)$, tends to 1, the effective useful signal term vanishes. Using (4.59) and recalling that $\mathbb{E}[|u_{k,0}(x)|^2] = 1/S$, we obtain the SINR numerator in (4.58) as

\[
\mathbb{E}\left[ \left| u_{k,0}(x)\, \mathbf{v}_{k,0}^{H}(x)\, \hat{\mathbf{h}}_{k,0,0}(x) \right|^2 \,\middle|\, \mathbf{v}_{k,0}(x), \hat{\mathbf{h}}_{k,0,0}(x) \right] \xrightarrow{\text{a.s.}} \frac{CM - JS}{S}\, \xi_{0,0}(x). \tag{4.60}
\]

As done in Section 4.6.1, we consider the intra-cluster, ICI and noise terms in the SINR denominator of (4.58) separately. The ZF constraints imply that $\mathbf{v}_{j,0}^{H}(x')\, \hat{\mathbf{h}}_{k,0,0}(x) = 0$ for all $(j, x') \neq (k, x)$, $x' \in \mathcal{X}$.
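The ZF constraints and the dimensionality limit discussed above can be illustrated numerically in the simplest setting, $C = 1$ and $J = 1$ with a common estimate variance. The sketch below is an illustration under these assumptions and is not the procedure used in this chapter. It obtains the unit-norm ZF vector of a user by removing from its channel the component lying in the span of the other users' channels (modified Gram-Schmidt), which gives a vector proportional to the corresponding column of the Moore-Penrose pseudo-inverse. All function names are hypothetical.

```python
import math
import random


def inner(a, b):
    """Hermitian inner product a^H b."""
    return sum(x.conjugate() * y for x, y in zip(a, b))


def random_channels(M, S, N, variance, rng):
    """S*N users, each with M*N i.i.d. CN(0, variance / N) coefficients."""
    scale = math.sqrt(variance / (2.0 * N))
    return [[complex(rng.gauss(0.0, scale), rng.gauss(0.0, scale))
             for _ in range(M * N)] for _ in range(S * N)]


def zf_vector(channels, k):
    """Unit-norm vector orthogonal to every channel except channels[k]."""
    basis = []
    for j, h in enumerate(channels):
        if j == k:
            continue
        w = list(h)
        for q in basis:  # modified Gram-Schmidt on the interfering channels
            c = inner(q, w)
            w = [wi - c * qi for wi, qi in zip(w, q)]
        norm = math.sqrt(inner(w, w).real)
        basis.append([wi / norm for wi in w])
    v = list(channels[k])
    for q in basis:  # project out the span of the other users
        c = inner(q, v)
        v = [vi - c * qi for vi, qi in zip(v, q)]
    norm = math.sqrt(inner(v, v).real)
    return [vi / norm for vi in v]


if __name__ == "__main__":
    rng = random.Random(0)
    M, S, N = 4, 2, 10
    H = random_channels(M, S, N, 1.0, rng)
    v0 = zf_vector(H, 0)
    print("useful gain:", abs(inner(v0, H[0])) ** 2, "limit:", M - S)
    print("leakage:", max(abs(inner(v0, H[j])) for j in range(1, len(H))))
```

For this toy case the useful gain fluctuates around $M - S$, consistent with (4.59) for $C = J = 1$ and unit variance, while the leakage towards the other users is zero up to rounding.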
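Therefore, the intra-cluster interference term, given in general by (4.43), reduces to

\[
\sum_{x' \in \mathcal{X}} \sum_{j} u_{j,0}(x')\, \mathbf{v}_{j,0}^{H}(x')\, \mathbf{e}_{k,0,0}(x),
\]

and its conditional second moment is given by

\begin{align*}
\mathbb{E}\left[ \left| \sum_{x' \in \mathcal{X}} \sum_{j} u_{j,0}(x')\, \mathbf{v}_{j,0}^{H}(x')\, \mathbf{e}_{k,0,0}(x) \right|^2 \,\middle|\, \mathbf{v}_{k,0}(x), \hat{\mathbf{h}}_{k,0,0}(x) \right] &= \frac{1}{S}\, \mathbb{E}\left[ \operatorname{tr}\left( \mathbf{V}_0^{H}\, \mathbb{E}\left[ \mathbf{e}_{k,0,0}(x)\, \mathbf{e}_{k,0,0}^{H}(x) \right] \mathbf{V}_0 \right) \,\middle|\, \mathbf{v}_{k,0}(x) \right] \\
&= \frac{1}{SN}\, \mathbb{E}\left[ \operatorname{tr}\left( \mathbf{V}_0 \mathbf{V}_0^{H}\, \boldsymbol{\Sigma}_{0,0}(x) \right) \,\middle|\, \mathbf{v}_{k,0}(x) \right] \\
&\xrightarrow{\text{a.s.}} \frac{1}{C} \sum_{b \in \mathcal{C}} \sigma_{0,0,b}(x) = \sigma_{0,0}(x), \tag{4.61}
\end{align*}

where the last line follows by noticing that, from Theorem 3.2, the matrix $\mathbf{V}_0 \mathbf{V}_0^{H}$ in the large-system limit satisfies the following "constant partial trace" property: the sum of the $MN$ diagonal elements of $\mathbf{V}_0 \mathbf{V}_0^{H}$ corresponding to the antennas of any BS $b \in \mathcal{C}$, divided by $N$, tends to the constant limit $S/C$, independent of the BS index $b$. This is again a consequence of the symmetry conditions in Section 3.2.1 (see footnote 6).

Next, we consider the ICI term and we separate it into $I_{\text{same pilot}}$ and $I_{\text{no same pilot}}$. The conditioning with respect to $\mathbf{v}_{k,0}(x), \hat{\mathbf{h}}_{k,0,0}(x)$ is irrelevant for the ICI terms and therefore it can be omitted. First, we evaluate the pilot contamination effect for the case $C = 1$. Using (4.53), (4.60) and (4.50), we obtain

\begin{align*}
\mathbb{E}\left[ |I_{\text{same pilot}}|^2 \right] &= \mathbb{E}\left[ \left| \sum_{c' \in \mathcal{P} \setminus 0} u_{k,c'}(x)\, \mathbf{v}_{k,c'}^{H}(x)\, \mathbf{h}_{k,0,c'}(x) \right|^2 \right] \\
&= \frac{1}{S} \sum_{c' \in \mathcal{P} \setminus 0} \mathbb{E}\left[ \left| \mathbf{v}_{k,c'}^{H}(x) \left( \mathbf{G}_{0,c'}(x)\, \mathbf{G}^{-1}_{c',c'}(x)\, \hat{\mathbf{h}}_{k,c',c'}(x) + \mathbf{e}_{k,0,c'}(x) \right) \right|^2 \right] \\
&= \frac{1}{S} \sum_{c' \in \mathcal{P} \setminus 0} \left\{ \left( \frac{g(x, c')}{g(x, 0)} \right)^2 \mathbb{E}\left[ \left| \mathbf{v}_{k,c'}^{H}(x)\, \hat{\mathbf{h}}_{k,c',c'}(x) \right|^2 \right] + \mathbb{E}\left[ \left| \mathbf{v}_{k,c'}^{H}(x)\, \mathbf{e}_{k,0,c'}(x) \right|^2 \right] \right\} \\
&\to \frac{M - JS}{S} \sum_{c' \in \mathcal{P} \setminus 0} \left( \frac{g(x, c')}{g(x, 0)} \right)^2 \xi_{c',c'}(x), \tag{4.62}
\end{align*}

where we used (4.59) and

\begin{align*}
\mathbb{E}\left[ \left| \mathbf{v}_{k,c'}^{H}(x)\, \mathbf{e}_{k,0,c'}(x) \right|^2 \right] &= \frac{1}{N}\, \mathbb{E}\left[ \mathbf{v}_{k,c'}^{H}(x)\, \boldsymbol{\Sigma}_{0,c'}(x)\, \mathbf{v}_{k,c'}(x) \right] \\
&\leq \frac{1}{N}\, \mathbb{E}\left[ \| \mathbf{v}_{k,c'}(x) \|^2 \right] \max_{b \in \mathcal{C}} \{ \sigma_{0,c',b}(x) \} = \frac{1}{N} \max_{b \in \mathcal{C}} \{ \sigma_{0,c',b}(x) \} \to 0. \tag{4.63}
\end{align*}

For $C > 1$, we have $\mathbf{G}_{0,c'}(x)\, \mathbf{G}^{-1}_{c',c'}(x) = \operatorname{diag}\left( \frac{g(x, c' + b)}{g(x, b)}\, \mathbf{I}_{MN} : b \in \mathcal{C} \right)$.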
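The constant partial trace property invoked for (4.61) can be checked numerically in a toy symmetric setting. The sketch below is illustrative only: it assumes $J = 1$ and i.i.d. channel coefficients with the same variance towards every BS of the cluster (the simplest case satisfying the symmetry conditions), and all function names are hypothetical. It builds all unit-norm ZF vectors and sums the diagonal entries of $\mathbf{V}_0 \mathbf{V}_0^{H}$ over the antennas of each BS, divided by $N$.

```python
import math
import random


def inner(a, b):
    """Hermitian inner product a^H b."""
    return sum(x.conjugate() * y for x, y in zip(a, b))


def project_out(v, basis):
    """Remove from v its components along an orthonormal basis."""
    for q in basis:
        c = inner(q, v)
        v = [vi - c * qi for vi, qi in zip(v, q)]
    return v


def zf_vectors(channels):
    """Unit-norm ZF vector of every user (columns of V_0 for J = 1)."""
    vectors = []
    for k in range(len(channels)):
        basis = []
        for j, h in enumerate(channels):
            if j != k:
                w = project_out(list(h), basis)
                norm = math.sqrt(inner(w, w).real)
                basis.append([wi / norm for wi in w])
        v = project_out(list(channels[k]), basis)
        norm = math.sqrt(inner(v, v).real)
        vectors.append([vi / norm for vi in v])
    return vectors


def partial_traces(V, C, M, N):
    """(1/N) * sum of diagonal entries of V V^H over the antennas of each BS."""
    traces = []
    for b in range(C):
        rows = range(b * M * N, (b + 1) * M * N)
        traces.append(sum(abs(v[n]) ** 2 for v in V for n in rows) / N)
    return traces


def toy_partial_traces(C, M, S, N, rng):
    """Draw i.i.d. CN(0, 1/N) channels and return the per-BS partial traces."""
    scale = math.sqrt(1.0 / (2.0 * N))
    channels = [[complex(rng.gauss(0.0, scale), rng.gauss(0.0, scale))
                 for _ in range(C * M * N)] for _ in range(S * N)]
    return partial_traces(zf_vectors(channels), C, M, N)


if __name__ == "__main__":
    print(toy_partial_traces(C=2, M=2, S=1, N=8, rng=random.Random(0)))
```

Because the columns are unit-norm, the per-BS values always add up to $S$ exactly, while each individual value is only approximately $S/C$ for finite $N$. This is the distinction drawn in footnote 6.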
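While $\mathbf{v}_{k,c'}^{H}(x)$ and $\hat{\mathbf{h}}_{k,c',c'}(x)$ are orthogonal by design, the term $\mathbb{E}\left[ \left| \mathbf{v}_{k,c'}^{H}(x)\, \mathbf{G}_{0,c'}(x)\, \mathbf{G}^{-1}_{c',c'}(x)\, \hat{\mathbf{h}}_{k,c',c'}(x) \right|^2 \right]$ is generally non-zero and does not admit a simple closed form, since $\mathbf{v}_{k,c'}(x)$ and $\hat{\mathbf{h}}_{k,c',c'}(x)$ are statistically dependent. In order to overcome this problem, we consider the following upper bound obtained by applying the Cauchy-Schwarz inequality:

\[
\left| \mathbf{v}_{k,c'}^{H}(x)\, \mathbf{G}_{0,c'}(x)\, \mathbf{G}^{-1}_{c',c'}(x)\, \hat{\mathbf{h}}_{k,c',c'}(x) \right|^2 \leq \left\| \mathbf{v}_{k,c'}(x) \right\|^2 \left\| \mathbf{G}_{0,c'}(x)\, \mathbf{G}^{-1}_{c',c'}(x)\, \hat{\mathbf{h}}_{k,c',c'}(x) \right\|^2. \tag{4.64}
\]

Recalling that $\mathbf{v}_{k,c'}(x)$ has unit norm and that $\mathbb{E}[\hat{\mathbf{h}}_{k,c',c'}(x)\, \hat{\mathbf{h}}_{k,c',c'}^{H}(x)] = \frac{1}{N} \boldsymbol{\Xi}_{c',c'}(x)$, we obtain

\[
\mathbb{E}\left[ |I_{\text{same pilot}}|^2 \right] \leq \frac{CM}{S} \sum_{c' \in \mathcal{P} \setminus 0} \frac{1}{C} \sum_{b \in \mathcal{C}} \left( \frac{g(x, c' + b)}{g(x, b)} \right)^2 \xi_{c',c',b}(x). \tag{4.65}
\]

Footnote 6: Notice that since the columns of $\mathbf{V}_0$ have unit norm we have $\frac{1}{N} \operatorname{tr}(\mathbf{V}_0 \mathbf{V}_0^{H}) = \frac{1}{N} \operatorname{tr}(\mathbf{V}_0^{H} \mathbf{V}_0) = S$. However, since $\boldsymbol{\Sigma}_{0,0}(x)$ is block-diagonal with constant diagonal blocks $\sigma_{0,0,b}(x)\, \mathbf{I}_{MN}$, the constant partial trace property is needed in order to obtain (4.61).

Next, we examine the ICI power caused by the term $I_{\text{no same pilot}}$. In the case of $J \geq 2$, this can be further decomposed into a term $I_{\text{ICI-ZF}}$, taking into account the clusters which have a ZF constraint with respect to user $k$ at location $x \in \mathcal{X}$, and $I_{\text{no-ICI-ZF}}$, taking into account all other clusters. In order to proceed, we define $\mathcal{E}(x)$ as the set of $J - 1$ clusters $c \neq 0$ with centers closest to $x \in \mathcal{X}$. With these definitions, we have

\[
I_{\text{ICI-ZF}} = \sum_{c' \in \mathcal{E}(x)} \sum_{x' \in \mathcal{X}} \sum_{j} u_{j,c'}(x')\, \mathbf{v}_{j,c'}^{H}(x')\, \mathbf{h}_{k,0,c'}(x) \tag{4.66}
\]

and

\begin{align*}
I_{\text{no-ICI-ZF}} &= \sum_{c' \in \mathcal{P} \setminus 0} \sum_{j \neq k} u_{j,c'}(x)\, \mathbf{v}_{j,c'}^{H}(x)\, \mathbf{h}_{k,0,c'}(x) + \sum_{c' \in \mathcal{P} \setminus 0} \sum_{x' \in \mathcal{X} \setminus x} \sum_{j} u_{j,c'}(x')\, \mathbf{v}_{j,c'}^{H}(x')\, \mathbf{h}_{k,0,c'}(x) \\
&\quad + \sum_{c' \in \mathcal{D} - \mathcal{P} - \mathcal{E}(x)} \sum_{x' \in \mathcal{X}} \sum_{j} u_{j,c'}(x')\, \mathbf{v}_{j,c'}^{H}(x')\, \mathbf{h}_{k,0,c'}(x). \tag{4.67}
\end{align*}

We start with the terms in (4.67). For $c' \in \mathcal{P} \setminus 0$, by definition of LZFBF we have that $\mathbf{v}_{j,c'}^{H}(x')\, \hat{\mathbf{h}}_{k,c',c'}(x) = 0$ for all $(j, x') \neq (k, x)$.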
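For $C = 1$, since $\mathbf{G}_{0,c'}(x)\, \mathbf{G}^{-1}_{c',c'}(x)$ is a scaled identity matrix, using (4.53) we have that

\[
\mathbf{v}_{j,c'}^{H}(x')\, \mathbf{h}_{k,0,c'}(x) = \mathbf{v}_{j,c'}^{H}(x')\, \mathbf{e}_{k,0,c'}(x). \tag{4.68}
\]

For $c' \in \mathcal{D} - \mathcal{P} - \mathcal{E}(x)$, the vectors $\mathbf{v}_{j,c'}^{H}(x')$ and $\mathbf{h}_{k,0,c'}(x)$ are statistically independent. Hence, for $C = 1$ we have

\begin{align*}
\lim_{N \to \infty} \mathbb{E}\left[ |I_{\text{no-ICI-ZF}}|^2 \right] &= \lim_{N \to \infty} \left\{ \sum_{c' \in \mathcal{P} \setminus 0} \frac{1}{SN}\, \mathbb{E}\left[ \operatorname{tr}\left( \mathbf{V}_{c'} \mathbf{V}_{c'}^{H}\, \boldsymbol{\Sigma}_{0,c'}(x) \right) \right] + \sum_{c' \in \mathcal{D} - \mathcal{P} - \mathcal{E}(x)} \frac{1}{SN}\, \mathbb{E}\left[ \operatorname{tr}\left( \mathbf{V}_{c'} \mathbf{V}_{c'}^{H}\, \mathbf{G}_{0,c'}(x) \right) \right] \right\} \\
&= \sum_{c' \in \mathcal{P} \setminus 0} \sigma_{0,c',0}(x) + \sum_{c' \in \mathcal{D} - \mathcal{P} - \mathcal{E}(x)} g(x, c'), \tag{4.69}
\end{align*}

where in (4.69) we used the asymptotic constant partial trace property of the matrix $\mathbf{V}_{c'} \mathbf{V}_{c'}^{H}$. For $C > 1$, because of the block-diagonal form of the matrix $\mathbf{G}_{0,c'}(x)\, \mathbf{G}^{-1}_{c',c'}(x)$ already mentioned, (4.68) does not hold in general. An upper bound to the interference power in this case can be obtained by assuming that the MMSE estimate $\hat{\mathbf{h}}_{k,0,c'}(x)$ of the channel between user $k$ at location $x \in \mathcal{X}$ and the antennas of cluster $c'$ is so noisy that it can be considered equal to zero. Therefore, the estimation error $\mathbf{e}_{k,0,c'}(x)$ has covariance $\frac{1}{N} \mathbf{G}_{0,c'}(x)$, and we obtain

\[
\lim_{N \to \infty} \mathbb{E}\left[ |I_{\text{no-ICI-ZF}}|^2 \right] \leq \sum_{c' \in \mathcal{D} \setminus 0 - \mathcal{E}(x)} g_{0,c'}(x), \tag{4.70}
\]

where $g_{0,c'}(x)$ is defined in (4.34).

Finally, we consider $I_{\text{ICI-ZF}}$ in (4.66). We distinguish different cases depending on the value of $J$. In case (a), this term does not exist. In case (b), we have $\mathbf{v}_{j,c'}^{H}(x')\, \hat{\mathbf{h}}_{k,0,c'}(x) = 0$ for all $j$ and all $c' \in \mathcal{E}(x)$. Hence, similarly to (4.61), we obtain

\[
\mathbb{E}\left[ |I_{\text{ICI-ZF}}|^2 \right] \to \sum_{c' \in \mathcal{E}(x)} \sigma_{0,c'}(x). \tag{4.71}
\]

In case (c), the ZF vectors $\mathbf{v}_{j,c'}(x')$ of cluster $c' \in \mathcal{E}(x)$ are calculated by imposing orthogonality conditions with the segment of the estimated channel vector $\hat{\mathbf{h}}_{k,0,c'}(x)$ corresponding to the $MN$ antennas of the closest BS. In order to proceed further, we define the index of the closest BS to location $x$ in cluster $c' \in \mathcal{E}(x)$ as $b(x, c) = \arg\min \{ d_{\Lambda}(x, c + b) : b \in \mathcal{C} \}$.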
Then, the eective channel used for ZF beamforming calculation is given by e h k;0;c 0(x) = b(x;c 0 ) b h k;0;c 0(x) where b(x;c 0 ) is a selection matrix, with all elements equal to zero but for a block of diagonal elements corresponding to the positions of the MN antennas of BS b(x;c 0 ). By construction, and using the MMSE decomposition, we have v H j;c 0(x 0 )h k;0;c 0(x) = v H j;c 0(x 0 ) b(x;c 0 ) b h k;0;c 0(x) + (I CMN b(x;c 0 ) ) b h k;0;c 0(x) + e k;0;c 0(x) = v H j;c 0(x 0 ) (I CMN b(x;c 0 ) )h k;0;c 0(x) + b(x;c 0 ) e k;0;c 0(x) = v H j;c 0(x 0 )e e k;0;c 0(x) (4.72) wheree e k;0;c 0(x) is independent of all beamfomrming vectors V c 0 of clusterc 0 2E(x), and has covariance matrix 1 N (I CMN b(x;c 0 ) )G 0;c 0(x)(I CMN b(x;c 0 ) ) + b(x;c 0 ) 0;c 0(x) b(x;c 0 ) Using these facts and operating similarly as in (4.71), we obtain E h I ICI-ZF 2 i ! X c 0 2E(x) 1 C 0 @ X b2Cnb(x;c 0 ) g(x;b +c 0 ) + 0;c 0 ;b(x;c 0 ) (x) 1 A (4.73) From (4.60), (4.61), (4.62), (4.69), (4.71), and (4.73), the normalized group spectral eciency for C = 1 and J 1 is obtained in the form (4.28) For the cluster case C > 1, using bounds (4.65), and (4.70), we obtain the achievable normalized group spectral eciency given by (4.31). 7 7 A lower bound to an achievable rate is also achievable. 130 Chapter 5 Conclusions In this dissertation, we developed analytic performance evaluation and optimization tools, based on the large system analysis for the MU-MIMO cellular system. We considered the distance-dependent pathloss and inter-cell interference make users' channel statistics unequal. In this case, it is important to evaluate the system performance subject to some form of fairness, formulated as the maximization of a network utility function over the ergodic achievable rates of users. 
Downlink scheduling and resource allocation in order to maximize a desired network utility function is a widely studied issue in the literature that has found several important practical applications [VTL02, BBG+00, PELT06]. Although dynamic scheduling policies are well known from the theory of stochastic network optimization, the resulting ergodic throughput for a multi-cell MU-MIMO downlink of practically relevant size, including tens of cells, hundreds of users per cell, and clusters of jointly processed cooperating base stations, has so far been evaluated only through very demanding Monte Carlo system simulation. Using the developed tools, which are computationally much more efficient than simulation, we considered fundamental issues in the system design.

First, we presented a semi-analytic method for the computation of the optimal fairness rate point in the ergodic sense, based on a combination of large random matrix theory and Lagrangian optimization. This analysis yields an efficient method for computing the system throughput subject to general fairness criteria, through the iterative solution of a system of fixed-point equations. Numerical results show that the rates predicted by the large-system analysis are indeed remarkably close to the rates of the corresponding finite-dimensional systems obtained by Monte Carlo simulation.

Second, we investigated the trade-off between the benefits of inter-cell cooperation and the cost of the increased channel state information requirements. This trade-off yields the optimal "cooperation cluster size" that maximizes the system throughput subject to fairness, when the cost of channel estimation is also taken into account. Due to this training overhead, an increase in the cooperation cluster size does not necessarily correspond to a system throughput increase.
For this analysis, with a focus on linear zero-forcing beamforming (LZFBF) combined with user selection, we derived the asymptotic expression in the large system limit under non-perfect channel state information at the transmitter (CSIT). We showed that under certain system symmetries, the analysis becomes much simpler and allows for a closed-form solution of the fixed-point equation that characterizes the LZFBF performance. We also proposed a random user selection scheme that associates users with downlink data streams according to probabilities obtained from the asymptotic analysis, and provides a good approximation of the optimal throughput point while requiring far fewer CSIT feedback resources.

Lastly, we proposed a novel network-MIMO TDD architecture that achieves spectral efficiencies comparable with the recently proposed "Massive MIMO" scheme, with one order of magnitude fewer antennas per active user per cell. The proposed strategy operates by partitioning the user population into geographically determined "bins", and splitting time-frequency resources to form independent MU-MIMO transmissions optimized for each of the bins. This strategy allows the uplink training reuse factor, the frequency reuse factor, the active user loading factor, the BS cooperative cluster size and the type of MU-MIMO linear beamforming to be finely tuned to the particular user bin. We considered the optimization of such network-MIMO schemes, based on a family of possible schemes that range from single-cell processing to joint processing over clusters of coordinated BSs, with linear precoders ranging from conventional linear single-user beamforming to zero-forcing beamforming with additional zero-forcing constraints for neighboring cells. In order to carry out the system optimization, we developed closed-form formulas for the achievable spectral efficiency in the large-system limit, where all system dimensions scale to infinity with fixed ratios.
We demonstrated that different schemes in the considered family achieve the best spectral efficiency at different user locations. This suggests the need for a location-adaptive architecture selection to serve the whole user population efficiently. The resulting overall system is therefore a "mixed-mode" network-MIMO architecture, where different schemes, each of which is optimized for the corresponding served user bin, are multiplexed in the time-frequency plane.

Overall, the results of this dissertation serve many useful purposes in evaluating and optimizing multi-cell MU-MIMO systems, especially when the system dimension and network size are large. For example, we can quickly quantify and compare the performance of systems with different cooperative clustering arrangements, different precoding, and channel training under different duplex methods (FDD or TDD) with the corresponding system overheads, subject to different fairness criteria.

References

[3GP10a] 3GPP technical specification group radio access network. Further advancements for E-UTRA: LTE-Advanced feasibility studies in RAN WG4. Technical report, 3GPP TR 36.815, Mar. 2010.
[3GP10b] 3GPP technical specification group radio access network. Further advancements for E-UTRA: physical layer aspects. Technical report, 3GPP TR 36.814, Mar. 2010.
[ABEH06] Defne Aktas, M. Naeem Bacha, Jamie S. Evans, and Stephen V. Hanly. Scaling results on the sum capacity of cellular networks with MIMO links. IEEE Trans. on Inform. Theory, 52(7):3264–3274, July 2006.
[BBG+00] Paul Bender, Peter Black, Matthew Grob, Roberto Padovani, Nagabhushana Sindhushayana, and Andrew Viterbi. CDMA/HDR: A bandwidth-efficient high-speed wireless data service for nomadic users. IEEE Commun. Mag., 38(7):70–77, July 2000.
[BH07] Federico Boccardi and Howard Huang. Limited downlink network coordination in cellular networks. In Proc. IEEE Int. Symp. on Personal, Indoor, and Mobile Radio Commun. (PIMRC), Athens, Greece, Sept. 2007.
[BPG+09] Gary Boudreau, John Panicker, Ning Guo, Rui Chang, Neng Wang, and Sophie Vrzic. Interference coordination and cancellation for 4G networks. IEEE Commun. Mag., 47(4):74–81, Apr. 2009.
[BT97] Dimitris Bertsimas and John N. Tsitsiklis. Introduction to Linear Optimization. Athena Scientific, 1997.
[BV04] Stephen Boyd and Lieven Vandenberghe. Convex Optimization. Cambridge University Press, 2004.
[CJKR10] Giuseppe Caire, Nihar Jindal, Mari Kobayashi, and Niranjay Ravindran. Multiuser MIMO achievable rates with downlink training and channel state feedback. IEEE Trans. on Inform. Theory, 56(6):2845–2866, June 2010.
[CK04] Ward Cheney and David Kincaid. Numerical Mathematics and Computing. Thomson Brooks/Cole, 2004.
[CRP+08] Giuseppe Caire, Sean A. Ramprashad, Haralabos C. Papadopoulos, Christine Pepin, and Carl-Erik W. Sundberg. Multiuser MIMO downlink with limited inter-cell cooperation: approximate interference alignment in time, frequency and space. In Proc. Allerton Conf. on Commun., Control, and Computing, Urbana-Champaign, IL, Sept. 2008.
[CRP10] Giuseppe Caire, Sean A. Ramprashad, and Haralabos C. Papadopoulos. Rethinking network MIMO: cost of CSIT, performance analysis, and architecture comparisons. In Proc. Inform. Theory and Appl. Workshop (ITA), San Diego, CA, Feb. 2010.
[CS03] Giuseppe Caire and Shlomo Shamai (Shitz). On the achievable throughput of a multiantenna Gaussian broadcast channel. IEEE Trans. on Inform. Theory, 49(7):1691–1706, July 2003.
[CT05] Thomas M. Cover and Joy A. Thomas. Elements of Information Theory. Wiley, 2005.
[CTKV02] Chen-Nee Chuah, David N. C. Tse, Joseph M. Kahn, and Reinaldo A. Valenzuela. Capacity scaling in MIMO wireless systems under correlated fading. IEEE Trans. on Inform. Theory, 48(3):637–650, Mar. 2002.
[DLZ07] Peilu Ding, David J. Love, and Michael D. Zoltowski. Multiple antenna broadcast channels with shape feedback and limited feedback. IEEE Trans. on Sig.
Proc., 55(7):3417–3428, July 2007.
[DP03] Huaiyu Dai and H. Vincent Poor. Asymptotic spectral efficiency of multi-cell MIMO systems with frequency-flat fading. IEEE Trans. on Sig. Proc., 51(11):2976–2988, Nov. 2003.
[DS05] Goran Dimic and Nicholas D. Sidiropoulos. On downlink beamforming with greedy user selection: performance analysis and simple new algorithm. IEEE Trans. on Sig. Proc., 53(10):3857–3868, Oct. 2005.
[DY10] Hayssam Dahrouj and Wei Yu. Coordinated beamforming for the multicell multi-antenna wireless system. IEEE Trans. on Wireless Commun., 9(5):1748–1759, May 2010.
[FCD+09] A. Farajidana, Wanshi Chen, A. Damnjanovic, Taesang Yoo, D. Malladi, and C. Lott. 3GPP LTE downlink system performance. In Proc. IEEE Global Commun. Conf. (GLOBECOM), Honolulu, HI, Nov. 2009.
[FKV06] Gerard J. Foschini, Kemal Karakayali, and Reinaldo A. Valenzuela. Coordinating multiple antenna cellular networks to achieve enormous spectral efficiency. IEE Proc. Commun., 153(4):548–555, Aug. 2006.
[Gir90] Vyacheslav L. Girko. Theory of Random Determinants. Kluwer Academic Publishers, Dordrecht and Boston, 1990.
[GNT06] Leonidas Georgiadis, Michael J. Neely, and Leandros Tassiulas. Resource Allocation and Cross-Layer Control in Wireless Networks, volume 1. Foundations and Trends in Networking, 2006.
[HH03] Babak Hassibi and Bertrand M. Hochwald. How much training is needed in multiple-antenna wireless links? IEEE Trans. on Inform. Theory, 49(4):951–963, Apr. 2003.
[HJ85] Roger A. Horn and Charles R. Johnson. Matrix Analysis. Cambridge University Press, 1985.
[HKD10] Jakob Hoydis, Mari Kobayashi, and Merouane Debbah. On the optimal number of cooperative base stations in network MIMO systems. Submitted to IEEE Trans. on Sig. Proc., Mar. 2010.
[HMK+10] Hoon Huh, Sung-Hyun Moon, Young-Tae Kim, Inkyu Lee, and Giuseppe Caire. Multi-cell MIMO downlink with cell cooperation and fair scheduling: a large-system limit analysis. Submitted to IEEE Trans. on Inform. Theory, June 2010.
[HMT04] Bertrand M. Hochwald, Thomas L. Marzetta, and V. Tarokh. Multiple-antenna channel hardening and its implications for rate feedback and scheduling. IEEE Trans. on Inform. Theory, 50(9):1893–1909, Sept. 2004.
[HPC09] Hoon Huh, Haralabos C. Papadopoulos, and Giuseppe Caire. MIMO broadcast channel optimization under general linear constraints. In Proc. IEEE Int. Symp. on Inform. Theory (ISIT), Seoul, Korea, June 2009.
[HPC10] Hoon Huh, Haralabos C. Papadopoulos, and Giuseppe Caire. Multiuser MISO transmitter optimization for intercell interference mitigation. IEEE Trans. on Sig. Proc., 58(8):4272–4285, Aug. 2010.
[HTC10] Hoon Huh, Antonia M. Tulino, and Giuseppe Caire. Network MIMO with linear zero-forcing beamforming: large system analysis, impact of channel estimation and reduced-complexity scheduling. Submitted to IEEE Trans. on Inform. Theory, Dec. 2010.
[HTH+09] Howard Huang, Matteo Trivellato, Ari Hottinen, Mansoor Shafi, Peter J. Smith, and Reinaldo A. Valenzuela. Increasing downlink cellular throughput with limited network MIMO coordination. IEEE Trans. on Wireless Commun., 8(6):2983–2989, June 2009.
[HV05] Howard Huang and Reinaldo A. Valenzuela. Fundamental simulated performance of downlink fixed wireless cellular networks with multiple antennas. In Proc. IEEE Int. Symp. on Personal, Indoor, and Mobile Radio Commun. (PIMRC), Berlin, Germany, Sept. 2005.
[IEE09] IEEE 802.16 broadband wireless access working group. IEEE 802.16m evaluation methodology document (EMD). Technical report, IEEE 802.16m-08/004, Jan. 2009.
[IEE10] IEEE 802.16 broadband wireless access working group. IEEE 802.16m system requirements. Technical report, IEEE 802.16m-07/002, Jan. 2010.
[JAMV11] Jubin Jose, Alexei Ashikhmin, Thomas L. Marzetta, and Sriram Vishwanath. Pilot contamination and precoding in multi-cell TDD systems. Accepted for publication in IEEE Trans. on Wireless Commun., 2011.
[JAWV08] Jubin Jose, Alexei Ashikhmin, Phil Whiting, and Sriram Vishwanath. Precoding methods for multi-user TDD MIMO systems. In Proc. Allerton Conf. on Commun., Control, and Computing, Urbana-Champaign, IL, Sept. 2008.
[JTS+07] Sheng Jing, David N. C. Tse, Joseph B. Soriaga, Jilei Hou, John E. Smee, and Roberto Padovani. Downlink macro-diversity in cellular networks. In Proc. IEEE Int. Symp. on Inform. Theory (ISIT), Nice, France, June 2007.
[JVG04] Nihar Jindal, Sriram Vishwanath, and Andrea Goldsmith. On the duality of Gaussian multiple-access and broadcast channels. IEEE Trans. on Inform. Theory, 50(5):768–783, May 2004.
[KC09] K. Raj Kumar and Giuseppe Caire. Channel state feedback over the MIMO-MAC. In Proc. IEEE Int. Symp. on Inform. Theory (ISIT), Seoul, Korea, June 2009.
[KCM09] K. Raj Kumar, Giuseppe Caire, and A. L. Moustakas. Asymptotic performance of linear receivers in MIMO fading channels. IEEE Trans. on Inform. Theory, 55(10):4398–4418, Oct. 2009.
[KJC11] Mari Kobayashi, Nihar Jindal, and Giuseppe Caire. Training and feedback optimization for multiuser MIMO downlink. To appear in IEEE Trans. on Commun., 2011.
[LSO09] Jean-Baptiste Landre, Ahmed Saadani, and Francois Ortolan. Realistic performance of HSDPA MIMO in macro-cell environment. In Proc. IEEE Int. Symp. on Personal, Indoor, and Mobile Radio Commun. (PIMRC), Tokyo, Japan, Sept. 2009.
[Mar06] Thomas L. Marzetta. How much training is required for multiuser MIMO? In Proc. IEEE Asilomar Conf. on Signals, Systems, and Computers (ACSSC), Pacific Grove, CA, Oct. 2006.
[Mar10] Thomas L. Marzetta. Noncooperative cellular wireless with unlimited numbers of base station antennas. IEEE Trans. on Wireless Commun., 9(11):3590–3600, Nov. 2010.
[MF08] Patrick Marsch and Gerhard Fettweis. On base station cooperation schemes for downlink network MIMO under a constrained backhaul. In Proc. IEEE Global Commun. Conf. (GLOBECOM), New Orleans, LA, Nov. 2008.
[MH99] Thomas L. Marzetta and Bertrand M. Hochwald.
Capacity of a mobile multiple-antenna communication link in Rayleigh flat fading. IEEE Trans. on Inform. Theory, 45(1):139–157, Jan. 1999.
[MP67] Vladimir A. Marcenko and Leonid A. Pastur. Distributions of eigenvalues for some sets of random matrices. Math. USSR-Sbornik, 1(4):457–483, 1967.
[MSS03] Aris L. Moustakas, Steven H. Simon, and Anirvan M. Sengupta. MIMO capacity through correlated channels in the presence of correlated interferers and noise: a (not so) large N analysis. IEEE Trans. on Inform. Theory, 49(10):2545–2561, Oct. 2003.
[MW00] Jeonghoon Mo and Jean Walrand. Fair end-to-end window-based congestion control. IEEE/ACM Trans. on Networking, 8:556–567, Oct. 2000.
[PCR11] Haralabos C. Papadopoulos, Giuseppe Caire, and Sean A. Ramprashad. Achieving large spectral efficiencies from MU-MIMO with tens of antennas: location-adaptive TDD MU-MIMO design and user scheduling. Submitted to Asilomar Conf. on Signals, Systems, and Computers (ACSSC), 2011.
[PDF+08] Stefan Parkvall, Erik Dahlman, Anders Furuskar, Ylva Jading, Magnus Olsson, Stefan Wanstedt, and Kambiz Zangi. LTE-Advanced – Evolving LTE towards IMT-Advanced. In Proc. IEEE Vehic. Tech. Conf. (VTC), Calgary, Alberta, Sept. 2008.
[PELT06] Stefan Parkvall, Eva Englund, Magnus Lundevall, and Johan Torsner. Evolving 3G mobile systems: Broadband and broadcast services in WCDMA. IEEE Commun. Mag., 44(2):30–36, Feb. 2006.
[Pro00] John G. Proakis. Digital Communications. McGraw-Hill, 2000.
[Rap02] Theodore S. Rappaport. Wireless Communications: Principles & Practice. Prentice Hall, 2002.
[RC09] Sean A. Ramprashad and Giuseppe Caire. Cellular vs. network MIMO: a comparison including the channel state information overhead. In Proc. IEEE Int. Symp. on Personal, Indoor, and Mobile Radio Commun. (PIMRC), Tokyo, Japan, Sept. 2009.
[RCP09] Sean A. Ramprashad, Giuseppe Caire, and Haralabos C. Papadopoulos. Cellular and network MIMO architectures: MU-MIMO spectral efficiency and costs of channel state information.
In Proc. IEEE Asilomar Conf. on Signals, Systems, and Computers (ACSSC), Pacic Grove, CA, Nov. 2009. 138 [SMC09] Hooman Shirani-Mehr and Giuseppe Caire. Channel state feedback schemes for multiuser MIMO-OFDM downlink. IEEE Trans. on Commun., 57(9):2713{ 2723, Sept. 2009. [SMCN10] Hooman Shirani-Mehr, Giuseppe Caire, and Michael J. Neely. MIMO down- link scheduling with non-perfect channel state knowledge. IEEE Trans. on Commun., 58(7):2055{2066, July 2010. [SS00] Oren Somekh and Shlomo Shamai (Shitz). Shannon-theoretic approach to a Gaussian cellular multiple-access channel with fading. IEEE Trans. on Inform. Theory, 46(4):1401{1425, July 2000. [SSPS09] Amichai Sanderovich, Oren Somekh, H. Vincent Poor, and Shlomo Shamai (Shitz). Uplink macro diversity of limited backhaul cellular network. IEEE Trans. on Inform. Theory, 55(8):3457{3478, Aug. 2009. [SSS + 08] Shlomo Shamai (Shitz), Osvaldo Simeone, Oren Somekh, Amichai Sanderovich, Benjamin M. Zaidel, and H. Vincent Poor. Information-theoretic implications of constrained cooperation in simple cellular models. In Proc. IEEE Int. Symp. on Personal, Indoor, and Mobile Radio Commun. (PIMRC), Cannes, France, Sept. 2008. [SW97] Shlomo Shamai (Shitz) and Aaron D. Wyner. Information-theoretic consider- ations for symmetric, cellular, multiple-access fading channels { Part I & II. IEEE Trans. on Inform. Theory, 43(6):1877{1894, Nov. 1997. [SZS07] Oren Somekh, Benjamin M. Zaidel, and Shlomo Shamai (Shitz). Sum rate characterization of joint multiple cell-site processing. IEEE Trans. on Inform. Theory, 53(12):4473{4497, Dec. 2007. [TCFB09] Alessandro Tomasoni, Giuseppe Caire, Marco Ferrari, and Sandro Bellini. On the selection of semi-orthogonal users for zero-forcing beamforming. In Proc. IEEE Int. Symp. on Inform. Theory (ISIT), Seoul, Korea, June 2009. [TLV05] Antonia M. Tulino, Angel Lozano, and Sergio Verdu. Impact of antenna cor- relation on the capacity of multiantenna channels. IEEE Trans. on Inform. 
Theory, 51(7):2491{2509, July 2005. [TLV06] Antonia M. Tulino, Angel Lozano, and Sergio Verdu. Capacity-achieving input covariance for single-user multi-antenna channels. IEEE Trans. on Wireless Commun., 5(3):662{671, Mar. 2006. [TV00] David N. C. Tse and Sergio Verdu. Optimum asymptotic multiuser eciency of randomly spread CDMA. IEEE Trans. on Inform. Theory, 46(6):2718{2723, Nov. 2000. [TV04] Antonia M. Tulino and Sergio Verdu. Random Matrix Theory and Wireless Communications, volume 1. Foundations and Trends in Communications and Information Theory, 2004. 139 [TV05] David N. C. Tse and Pramod Viswanath. Fundamentals of Wireless Commu- nication. Cambridge University Press, 2005. [VJG03] Sriram Vishwanath, Nihar Jindal, and Andrea Goldsmith. Duality, achievable rates, and sum-rate capacity of Gaussian MIMO broadcast channels. IEEE Trans. on Inform. Theory, 49(10):2658{2668, Oct. 2003. [VS99] Sergio Verdu and Shlomo Shamai (Shitz). Spectral eciency of CDMA with random spreading. IEEE Trans. on Inform. Theory, 45(2):622{640, Mar. 1999. [VT03] Pramod Viswanath and David N. C. Tse. Sum capacity of the vector Gaus- sian broadcast channel and uplink-downlink duality. IEEE Trans. on Inform. Theory, 49(8):1912{1921, Aug. 2003. [VTL02] Pramod Viswanath, David N. C. Tse, and Rajiv Laroia. Opportunistic beam- forming using dumb antennas. IEEE Trans. on Inform. Theory, 48(6):1277{ 1294, June 2002. [WCDS09] Sebastian Wagner, Romain Couillet, Merouane Debbah, and Dirk T. M. Slock. Large system analysis of linear precoding in MISO broadcast channels with limited feedback. submitted to IEEE Trans. on Inform. Theory, June 2009. [WES08] Ami Wiesel, Yonina C. Eldar, and Shlomo Shamai (Shitz). Zero-forcing pre- coding and generalized inverses. IEEE Trans. on Sig. Proc., 56(9):4409{4418, Sept. 2008. [WGJ11] Chenwei Wang, Tiangao Gou, and Syed Ali Jafar. Aiming perfectly in the dark{blind interference alignment through staggered antenna switching. IEEE Trans. on Sig. 
Proc., 59(6):2734{2744, June 2011. [WiM06] WiMAX Forum. Mobile WiMAX { Part I: A technical overview and perfor- mance evaluation. Technical report, Aug. 2006. [WSS06] Hanan Weingarten, Yossef Steinberg, and Shlomo Shamai (Shitz). The capac- ity region of the Gaussian multiple-input multiple-output broadcast channel. IEEE Trans. on Inform. Theory, 52(9):3936{3964, Sept. 2006. [Wyn94] Aaron D. Wyner. Shannon-theoretic approach to a Gaussian cellular multiple access channel. IEEE Trans. on Inform. Theory, 40(6):1713{1727, Nov. 1994. [YC04] Wei Yu and John M. Cio. Sum capacity of Gaussian vector broadcast chan- nels. IEEE Trans. on Inform. Theory, 50(9):1875{1892, Sept. 2004. [YG06] Taesang Yoo and Andrea Goldsmith. On the optimality of multiantenna broad- cast scheduling using zero-forcing beamforming. IEEE J. Select. Areas Com- mun., 24(3):528{541, Mar. 2006. 140 [YL07] Wei Yu and Tian Lan. Transmitter optimization for the multi-antenna downlink with per-antenna power constraints. IEEE Trans. on Sig. Proc., 55(6):2646{2660, June 2007. [Yu06] Wei Yu. Sum-capacity computation for the Gaussian vector broadcast channel via dual decomposition. IEEE Trans. on Inform. Theory, 52(2):754{759, Feb. 2006. [ZCA + 09] Jun Zhang, Runhua Chen, Jerey G. Andrews, Arunabha Ghosh, and Robert W. Heath. Networked MIMO with clustered linear precoding. IEEE Trans. on Wireless Commun., 8(4):1910{1921, Apr. 2009. [ZH10] Randa Zakhour and Stephen V. Hanly. Base station cooperation on the down- link: large system analysis. submitted to IEEE J. Select. Areas Commun., June 2010. [Zha10] Rui Zhang. Cooperative multi-cell block diagonalization with per-base-station power constraints. IEEE J. Select. Areas Commun., 28(9):1435{1445, Dec. 2010. [ZMM + 08] Hongyuan Zhang, Neelesh B. Mehta, Andreas F. Molisch, Jin Zhang, and Huaiyu Dai. Asynchronous interference mitigation in cooperative base station systems. IEEE Trans. on Wireless Commun., 7(1):155{165, Jan. 2008. [ZT02] Lizhong Zheng and David N. C. 
Tse. Communication on the Grassmann mani- fold: a geometric approach to the noncoherent multiple-antenna channel. IEEE Trans. on Inform. Theory, 48(2):359{383, Feb. 2002. [ZZL + 09] Lan Zhang, Rui Zhang, Ying-Chang Liang, Yan Xin, and H. Vincent Poor. On gaussian MIMO BC-MAC duality with multiple transmit covariance con- straints. In Proc. IEEE Int. Symp. on Inform. Theory (ISIT), Seoul, Korea, June 2009. 141
Abstract
As multiuser MIMO (MU-MIMO) technologies evolve in the multi-cell environment, many studies consider inter-cell cooperation schemes, which mitigate inter-cell interference and/or provide higher antenna array gain from a larger number of cooperating transmit antennas. To appreciate the full potential of such schemes for the MU-MIMO system, the following aspects need to be considered: multi-cell coverage with a spatial user distribution and a realistic pathloss model; multiuser scheduling that takes fairness into account; the type of downlink precoding and signal processing employed; the availability of channel state information at the transmitter; and the type of inter-cell cooperation and the corresponding system overhead. However, accounting for all these aspects in a single framework is very complicated, and its analytical characterization is generally difficult. So far, system performance has been evaluated through computationally intensive Monte Carlo simulation.

In this dissertation, an analytic tool is developed based on large random matrix theory. In the large system limit, where both the number of antennas per base station (BS) and the number of users per cell grow to infinity with a constant ratio, the randomness in the channels from BSs to users disappears and the MU-MIMO system becomes a deterministic network, which allows analytic and computationally efficient performance evaluation and optimization. Using this large system analysis, several relevant issues in the design of the MU-MIMO system in the multi-cell downlink are considered: 1) the ergodic capacity of the MU-MIMO system with inter-cell cooperation subject to general fairness criteria; 2) the trade-off between the benefit of inter-cell cooperation and the cost of estimating the cooperating cells' channels; and 3) efficiently achieving the performance of the so-called "massive MIMO" system through user location-based scheduling and transmission strategies.
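The claim that channel randomness disappears in the large system limit can be illustrated with a toy sketch. This is not the dissertation's model or method: it assumes a single user, i.i.d. Rayleigh fading with unit variance per antenna, and no pathloss, and the function names are hypothetical. It only shows that the normalized channel gain ||h||^2 / M concentrates around 1 as the number of antennas M grows, which is the kind of behavior that lets a random system be approximated by a deterministic one.

```python
import random
import statistics


def normalized_channel_gain(num_antennas, rng):
    """Return ||h||^2 / M for one draw of h ~ CN(0, I_M) (assumed fading model)."""
    total = 0.0
    for _ in range(num_antennas):
        # Real and imaginary parts each have variance 1/2, so E|h_i|^2 = 1.
        re = rng.gauss(0.0, 0.5 ** 0.5)
        im = rng.gauss(0.0, 0.5 ** 0.5)
        total += re * re + im * im
    return total / num_antennas


def gain_spread(num_antennas, trials=200, seed=0):
    """Empirical mean and standard deviation of the normalized gain over many draws."""
    rng = random.Random(seed)
    samples = [normalized_channel_gain(num_antennas, rng) for _ in range(trials)]
    return statistics.mean(samples), statistics.stdev(samples)


if __name__ == "__main__":
    # The spread should shrink roughly like 1/sqrt(M) while the mean stays near 1.
    for m in (4, 16, 64, 256, 1024):
        mean, std = gain_spread(m)
        print(f"M={m:5d}  mean={mean:.3f}  std={std:.3f}")
```

Running the script prints one line per antenna count; under the stated assumptions the standard deviation should fall by about a factor of two each time M is quadrupled.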
Linked assets
University of Southern California Dissertations and Theses
Conceptually similar
Channel state information feedback, prediction and scheduling for the downlink of MIMO-OFDM wireless systems
Enabling massive distributed MIMO for small cell networks
Design and analysis of large scale antenna systems
Space-time codes and protocols for point-to-point and multi-hop wireless communications
Communicating over outage-limited multiple-antenna and cooperative wireless channels
Optimal resource allocation and cross-layer control in cognitive and cooperative wireless networks
Achieving efficient MU-MIMO and indoor localization via switched-beam antennas
Optimal distributed algorithms for scheduling and load balancing in wireless networks
Distributed interference management in large wireless networks
Hybrid beamforming for massive MIMO
Improving spectrum efficiency of 802.11ax networks
Communication and cooperation in underwater acoustic networks
Efficient inverse analysis with dynamic and stochastic reductions for large-scale models of multi-component systems
Propagation channel characterization and interference mitigation strategies for ultrawideband systems
Multidimensional characterization of propagation channels for next-generation wireless and localization systems
Design and analysis of reduced complexity transceivers for massive MIMO and UWB systems
Design and analysis of high-performance cooperative relay networks
Satisfying QoS requirements through user-system interaction analysis
Signal processing for channel sounding: parameter estimation and calibration
RGBD camera based wearable indoor navigation system for the visually impaired
Asset Metadata
Creator
Huh, Hoon (author)
Core Title
Large system analysis of multi-cell MIMO downlink: fairness scheduling and inter-cell cooperation
School
Viterbi School of Engineering
Degree
Doctor of Philosophy
Degree Program
Electrical Engineering
Degree Conferral Date
2011-08
Publication Date
01/27/2012
Defense Date
05/12/2011
Publisher
Los Angeles, California (original), University of Southern California (original), University of Southern California. Libraries (digital)
Tag
fairness scheduling, inter-cell cooperation, large system analysis, multi-user MIMO, OAI-PMH Harvest
Format
theses (aat)
Language
English
Contributor
Electronically uploaded by the author (provenance)
Advisor
Caire, Giuseppe (committee chair), Golubchik, Leana (committee member), Mitra, Urbashi (committee member)
Creator Email
hhuh@usc.edu,hoon.huh@gmail.com
Permanent Link (DOI)
https://doi.org/10.25549/usctheses-oUC1387560
Unique identifier
UC1387560
Identifier
etd-HuhHoon-187.pdf (filename)
Legacy Identifier
etd-HuhHoon-187
Dmrecord
639208
Document Type
Dissertation
Rights
Huh, Hoon
Internet Media Type
application/pdf
Type
texts
Source
University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Access Conditions
The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the author, as the original true and official version of the work, but does not grant the reader permission to use the work if the desired use is covered by copyright. It is the author, as rights holder, who must provide use permission if such use is covered by copyright.
Repository Name
University of Southern California Digital Library
Repository Location
USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA
Repository Email
cisadmin@lib.usc.edu