Enabling Massive Distributed MIMO for Small Cell Networks

Ryan Rogalin

A Dissertation Presented to the FACULTY OF THE USC GRADUATE SCHOOL, UNIVERSITY OF SOUTHERN CALIFORNIA, In Partial Fulfillment of the Requirements for the Degree DOCTOR OF PHILOSOPHY, ELECTRICAL ENGINEERING

VITERBI SCHOOL OF ENGINEERING

May 15, 2015

Abstract

According to the Cisco Visual Networking Index, the demand for wireless data traffic is expected to increase at an exponential rate for the foreseeable future. This demand is largely driven by the success of smartphones and their de facto killer-app, mobile video. However, current generation mobile networks are ill-equipped to meet this demand in very dense user environments such as airports, campuses, conference halls, and stadiums. This problem is fundamentally limited by the quantity of available wireless spectrum, and the efficiency with which this spectrum is used. A novel network architecture known as large-scale distributed Multiuser MIMO has the potential to solve this impending spectrum crunch.

Large-scale distributed MU-MIMO unifies two recent trends in wireless research: "massive MIMO" and "small cells." It consists of several Access Points (APs) connected to a central server via a wired backhaul network and which act as a large distributed antenna system. This dissertation presents scalable solutions to the two primary implementation challenges of Distributed MIMO: AP synchronization and uplink/downlink reciprocity calibration. AP synchronization refers to the act of forcing each AP's inexpensive crystal oscillator to operate on the same frequency, as well as the act of forcing each AP to transmit at the same time. Reciprocity calibration refers to the ability to infer downlink (AP to user) channel conditions based on uplink (user to AP) transmissions. The focus of this work is on the downlink, which is both more demanding in terms of traffic and more challenging in terms of implementation than the uplink.
All of the proposed synchronization and calibration protocols utilize over-the-air signaling, which allows for reduced hardware requirements and for algorithms which scale well with the size of the network. The proposed schemes can be applied to networks formed by a large number of APs, each of which is driven by an inexpensive 802.11-grade clock and has a standard RF front-end, not explicitly designed to be reciprocal. In addition to novel synchronization and calibration algorithms, this work presents a number of system-level optimizations which are needed in practical networks. Experimental and simulation-based results indicate that a realistic distributed MIMO system is capable of delivering the data rates required by next generation networks.

To my family

Acknowledgements

While there is ostensibly a single name on this dissertation, it would not have been possible without a large ensemble of collaborators, friends, and family. First and foremost, I owe a great debt of gratitude to my advisor, Giuseppe Caire, for his guidance. Giuseppe not only gave me instruction on technical issues, but also guided my development as a researcher by knowing when to direct and when to let me struggle. He taught me the value of practice informed by theory and aided my professional development. I will always be grateful for his mentorship.

Additionally, I have had the incredible privilege to work with many professors, practitioners, and researchers throughout my Ph.D. who have each contributed something unique to my education. I owe many thanks to my qualifying and defense committee members, Professors Kostas Psounis, Ramesh Govindan, Andy Molisch, and Ubli Mitra for their challenging critiques. I am deeply indebted to my managers, mentors, and collaborators from various internships and research projects, including Nihar Jindal, Ling Su, Babis Papadopoulos, Ozgun Bursalioglu, Joseph Han, Robert Liang, Joseph Kim, and Bill Horne.
In addition to working as a research assistant at USC, I have had the great honor of serving as a teaching assistant. I had the pleasure of working with several skilled educators, but I am particularly grateful to Professor Ed Maby for his mentorship and for nurturing a passion for education.

I owe all of my love and gratitude to my wife, Erin. Her organization and work ethic challenged me to be a better researcher, and her love and kindness made me aspire to be a better person. I would not be who I am without her. My parents, sisters, and extended family have given me incredible support through this journey, and I thank them for their continual encouragement.

There have been several times throughout my Ph.D. when I was faced with seemingly insurmountable obstructions, and nearly every one of them has been resolved by my student services advisor, Diane Demetras. Diane puts out so many fires on a daily basis that the department would turn to ashes without her. She is always bright, cheerful, and helpful, and I am immensely appreciative for her help.

Last, but certainly not least, my time at USC would not have been nearly as enjoyable without good friends. Many thanks to Antonios, Seun, Arash, Hassan, Vlad, Ansuman, Yang-Ho, Abe, Kristen, and Eric for the many delightful diversions from work.

Contents

Abstract
Acknowledgements
1 Introduction
  1.1 Background
  1.2 System Architecture
  1.3 Related Work
    1.3.1 Theoretical Foundations
    1.3.2 Complementary and Experimental Work
    1.3.3 Synchronization
    1.3.4 Calibration
    1.3.5 Our Contributions
  1.4 Organization
2 Synchronization
  2.1 Introduction
  2.2 A Model of OFDM with Synchronization Errors
    2.2.1 Point-to-Point OFDM Synchronization Errors
    2.2.2 Impact of Synchronization Errors on Distributed MU-MIMO
  2.3 Point-to-Point Synchronization
    2.3.1 Maximum Likelihood Synchronization
    2.3.2 Timing and Frequency Cramer-Rao Bound
    2.3.3 OFDM Synchronization
    2.3.4 Timing Synchronization by Sample Insertion/Deletion
  2.4 AirSync: A Master/Secondary Synchronization Protocol
  2.5 Consensus-Based Synchronization
  2.6 Least Squares Synchronization
  2.7 Recursive Least Squares Synchronization
  2.8 Kalman Filter Synchronization
  2.9 Conclusion
3 Calibration
  3.1 Introduction
  3.2 Calibration Model and Requirements
  3.3 A Master/Secondary Calibration Protocol
  3.4 Reciprocity Using Least Squares Calibration
  3.5 Simulated Performance Analysis
  3.6 Hierarchical calibration
  3.7 Conclusion
4 System-Level Considerations
  4.1 Introduction
  4.2 Simulations
  4.3 Frequency Offsets and Phase Noise
    4.3.1 Achievable-Rate Optimal Frame Length
    4.3.2 Outage-Throughput Optimal Frame Length
  4.4 Truncated Conjugate Beamforming
  4.5 Blind Interference Alignment
    4.5.1 Blind Interference Alignment Under Discrete Alphabets
    4.5.2 Blind Interference Alignment in Non-ideal Channel Conditions
  4.6 Conclusion
5 Experimental Evaluation
  5.1 Introduction
  5.2 Synchronization Results
  5.3 Calibration Results
  5.4 Blind Interference Alignment
  5.5 Concluding Remarks
Bibliography

Chapter 1 Introduction

1.1 Background

Modern enterprise wireless local area networks (WLANs) are ill-equipped to combat interference. In contrast to macro-cellular networks, where a great deal of planning goes into the reduction of basestation-to-basestation interference, enterprise WLAN access points (APs) are often deployed in an ad-hoc manner. Typically, interference management is accomplished via frequency orthogonalization which involves assigning adjacent APs to different frequencies. This issue has spawned an entire industry, where the objective is to choose the best frequency for each AP to use such that interference between APs (and their associated users) is minimized. Aside from the fact that this is a computationally difficult problem for large networks, it is an inefficient use of limited spectrum.
With the forecast for mobile traffic growing at an exponential rate for the foreseeable future [1] (see Figure 1.1), novel schemes to increase spectral efficiency are needed.

Two recent research trends in wireless networks have attracted significant attention as potential solutions: massive MIMO (multiple input multiple output, referring to the use of multiple transmitter antennas and multiple receiver antennas) and very dense spatial reuse. The former refers to serving multiple users on the same time-frequency channel resource via spatial precoding, thus eliminating multiuser interference with a very large number of antennas at each basestation site [2,3]. The latter pertains to the use of small cells [4], such that multiple short-range low-power links can co-exist on the same time-frequency channel resource thanks to sufficient spatial separation. Distributed Multiuser MIMO (Distributed MU-MIMO, or simply Distributed MIMO) unifies these two approaches, simultaneously obtaining both multiuser interference suppression through spatial precoding and dense coverage by reducing the average distance between transmitters and receivers. This is achieved by coordinating a large number of Access Points (APs) distributed over a certain coverage region through a wired backhaul network connected to a Central Server (CS) in order to form a distributed antenna system.

[Figure 1.1: The Cisco Visual Networking Index [1] predicts an exponential growth in wireless data traffic, largely driven by the demand for mobile video. Stacked bar chart of exabytes per month (0 to 18) for 2013 through 2018, 61% CAGR 2013-2018. Traffic shares in 2018: Mobile Video 69.1%, Mobile Web/Data 11.7%, Mobile Audio 10.6%, Mobile M2M 5.7%, Mobile File Sharing 2.9%. Source: Cisco VNI Mobile, 2014.]

For a single, multiple-antenna AP transmitting to several users, techniques from multi-user MIMO [5] have been tremendously successful at interference mitigation.
These techniques are now being adopted by standardization bodies [6-9], enabling the transmission of independent data streams to different users while reducing or eliminating interference between the streams. Additionally, theoretical analyses have shown that basestations with a massive number of antennas (taken as a limit to infinity) can eliminate the effects of noise and fast fading, with spectral efficiency becoming independent of bandwidth [2]. Part of the requirement to achieve these gains is the ability to determine the downlink (AP-to-user) channel from uplink (user-to-AP) channel measurements. While the practical limitations of this scheme (namely the effects of pilot contamination and the assumption of independent identically distributed (i.i.d.) Rayleigh fading) temper the expectations of its performance, experimental results remain promising [10]. Large scale antenna systems (LSAS, another name for massive MIMO) are expected to play a significant role in the standardization of 5G cellular systems [11].

In order to achieve large spectral efficiencies through MU-MIMO downlink spatial precoding, channel state information at the transmitter (CSIT) is needed. Following the massive MIMO approach [2, 3], CSIT can be obtained from the users' uplink pilots by operating the system in Time-Division Duplexing (TDD) and exploiting the reciprocity of the radio channels. However, while the propagation channel from antenna to antenna is reciprocal (footnote 1), the transmitter and receiver hardware are not, and introduce an unknown amplitude scaling and phase shift between the uplink (UL) and downlink (DL) channels [13].
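The hardware non-reciprocity described above can be illustrated with a small numerical sketch. It assumes a simple per-node model that is not spelled out in this section: each node multiplies the reciprocal propagation coefficient by one complex gain when transmitting and another when receiving. The names `t_ap`, `r_ap`, `t_ut`, `r_ut`, and the per-AP coefficient `t_ap / r_ap` are hypothetical, and the true gains are used directly here, whereas a real protocol would have to estimate them over the air.

```python
import cmath
import random


def random_gain(rng):
    """Illustrative complex hardware gain: amplitude near 1, arbitrary phase."""
    return rng.uniform(0.8, 1.2) * cmath.exp(1j * rng.uniform(-cmath.pi, cmath.pi))


def simulate(num_aps=4, seed=0):
    """Return UL channels, DL channels, and per-AP calibration coefficients."""
    rng = random.Random(seed)
    t_ap = [random_gain(rng) for _ in range(num_aps)]  # AP transmit chains
    r_ap = [random_gain(rng) for _ in range(num_aps)]  # AP receive chains
    t_ut, r_ut = random_gain(rng), random_gain(rng)    # single user terminal
    # Reciprocal propagation coefficient between each AP and the user.
    g = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(num_aps)]
    h_ul = [r_ap[a] * g[a] * t_ut for a in range(num_aps)]
    h_dl = [r_ut * g[a] * t_ap[a] for a in range(num_aps)]
    cal = [t_ap[a] / r_ap[a] for a in range(num_aps)]  # assumed known here
    return h_ul, h_dl, cal


def residual_spread(h_est, h_dl):
    """Largest deviation of the per-AP ratio h_dl / h_est from that of AP 0."""
    ratios = [d / e for d, e in zip(h_dl, h_est)]
    return max(abs(r - ratios[0]) for r in ratios)


h_ul, h_dl, cal = simulate()
uncalibrated = residual_spread(h_ul, h_dl)
calibrated = residual_spread([c * u for c, u in zip(cal, h_ul)], h_dl)
print(f"ratio spread without calibration: {uncalibrated:.3f}")
print(f"ratio spread with calibration:    {calibrated:.2e}")
```

Under this model, the calibrated UL estimates match the DL channels up to one scalar common to all APs (the user's own receive-to-transmit ratio), which leaves the relative phases between APs intact.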
Footnote 1: This statement holds if the UL and DL transmissions take place within the coherence time and coherence bandwidth of the underlying doubly-selective channel. Using OFDM and TDD, the propagation channel is effectively reciprocal on each OFDM subcarrier over many consecutive OFDM symbols, since the coherence time for typical wireless local area networks with nomadic users may span 0.1 s to 1 s or more. In fact, 800 ms is commonly used as the simulation assumption for the IEEE 802.11ac standard for WLAN [12]. In WARP software-radio based experiments, we observed the coherence time to be on the order of seconds. In contrast, Frequency Division Duplexing schemes have the UL and the DL separated by tens of MHz, well beyond the coherence bandwidth. Therefore, reciprocity for such systems typically does not hold.

While Distributed MU-MIMO may be applied in a variety of different scenarios, in this work we focus primarily on cost-effective consumer grade equipment, and particularly on the downlink, which is both more demanding in terms of traffic and more challenging in terms of implementation than the uplink. We assume inexpensive APs connected to the CS via a conventional wired digital backbone (e.g., Ethernet), capable of transporting digital packetized data at sufficient speed from/to the CS. However, this backhaul is not sufficient to distribute a common clock (i.e., timing and phase information) to the APs due to the limitations of standard network synchronization protocols such as IEEE 1588 [14]. Since each AP is driven by its own individual clock, and the digital backbone network is not capable of providing a sufficiently accurate common timing and frequency reference, such synchronization must occur via over-the-air signaling.
Also, since typical commercial grade radios are not equipped with a built-in self-calibration capability, the non-reciprocal effects of the receiver and transmitter hardware must be compensated explicitly via a TDD reciprocity calibration protocol. Although in our treatment we focus on 802.11 grade hardware, note that the methods we propose are platform agnostic. Indeed, the same principles can be applied to any future cost-effective flexible network deployments, where practical hardware mismatches need to be eliminated in order to enable coherent reciprocity-based Distributed MU-MIMO transmission.

The goal of this work is to cost-effectively enable many independent APs to act as a single AP with a massive number of antennas. Access point coordination has been explored extensively in recent years, notably through the lens of LTE's Coordinated Multi-Point [15]. The typical assumption is that all basestations/APs are synchronized over a fiber optic network, in which the synchronization is essentially perfect. In fact, perfect AP synchronization is typically a given in theoretical works [16,17], but in the considered enterprise Wi-Fi scenario, fiber deployment becomes prohibitively costly. As such, our work explores techniques to achieve this fine level of synchronization through over-the-air synchronization and subsequently introduces novel methods of uplink/downlink reciprocity calibration to enable massive MIMO.

1.2 System Architecture

Figure 1.2 provides a high-level block diagram that illustrates the operation of channel-reciprocity based MU-MIMO on the hardware, including UL channel training, DL MU-MIMO transmission, and the compensation mechanisms (synchronization and calibration) needed to enable such coherent DL MU-MIMO transmission. Orthogonal Frequency Division Multiplexing (OFDM) is assumed throughout this work, since it is widely used in current WLAN (802.11a/g/n/ac) and cellular (LTE/802.16m) standards [18].
An UL-pilot/DL-precoded transmission cycle works as follows. At first, UL pilots are transmitted by the User Terminals (UTs). After receiving the UL pilots, each AP individually processes its observations through an OFDM front-end coupled with a standard synchronization block, as used in any modern WLAN AP implementation. Estimates of the UL channels (between each AP and the pilot-transmitting UTs) are then formed at each AP, sent to the CS through the digital backbone network, and are jointly used at the CS to calculate the MU-MIMO precoding matrix [2, 3]. In the DL transmission phase, the packets of OFDM encoded symbols destined to the UTs are first precoded using the DL precoding matrix. Then, the precoded signal at each AP is calibrated through the "CAL" block, which compensates for the non-reciprocal amplitude scaling and phase rotations introduced by the transmitter and receiver AP hardware (footnote 2). Finally, all the APs forming the Distributed MU-MIMO network simultaneously send the precoded and calibrated data packets on the same time-frequency slot.

Footnote 2: There are several ways to apply calibration, some of which are not captured by the diagram in Figure 1.2. For example, calibration can be applied to the UL channel estimates before the precoding matrix calculation [19]. While specific implementations may depend on the specific hardware at hand, all such solutions are conceptually equivalent to the scheme in Figure 1.2.

[Figure 1.2: Uplink training and downlink transmission for distributed channel-reciprocity based MU-MIMO. Block diagram: UT processing (UL pilots, "sync", OFDM, DEC, decoded symbols); uplink and downlink propagation channels; AP/CS processing ("SYNC", OFDM, channel estimation, precoding of data symbols, CAL); precoder design at the CS from the UL channel estimates, yielding the DL precoder; calibration and synchronization information feeding the CAL and SYNC blocks.]

The synchronization blocks, indicated by "sync" and "SYNC" in Figure 1.2 at the UT and AP side, respectively, contain the mechanisms by which the network maintains synchronism. Synchronization at the UT side (both transmitter and receiver) and at the AP receiver takes care of frame and carrier frequency synchronization in order to transmit and receive on the assigned time slots and demodulate the OFDM signals with negligible inter-carrier interference (ICI). This can be implemented by completely standard schemes used in current WLAN technology and forms a well-known and thoroughly studied subject [20-24]. These techniques are covered in the context of AP synchronization in Chapter 2, though the techniques are directly applicable to the UT hardware as well. In contrast, synchronization at the AP transmitter side plays a critical role in the Distributed MU-MIMO architecture, since it needs to compensate for timing misalignment and the relative phase rotation of the DL data blocks, transmitted simultaneously by the jointly precoded APs. Such compensation is non-standard in current network implementations and it is not widely treated in the literature. In fact, as previously mentioned, most theoretical works on Distributed MU-MIMO (see for example [3,16,17,25,26]) assume perfect transmitter synchronization without specifying how this is achieved.

As shown in Section 2.2.2, uncompensated synchronization errors at the AP transmitter side cannot be undone at the UT receivers, since the jointly precoded signals are mixed up and cannot be individually resampled (for timing) and phase-rotated (for carrier frequency offset). For correct operation of the Distributed MU-MIMO architecture it is therefore essential that: 1) the channel amplitude and phase rotations incurred by UL channel estimation can be compensated at the AP side (TDD calibration), and 2) the jointly transmitting APs send their precoded data slots with sufficiently accurate timing and carrier frequency coherence.
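To give a feel for why carrier frequency coherence between jointly transmitting APs matters, the sketch below computes how quickly a relative frequency offset turns into a relative phase rotation, using the standard relation phase = 2π × offset × time. All numerical values (a 2.4 GHz carrier, a 20 ppm oscillator mismatch, a 10 Hz residual offset after synchronization, and a 0.1 rad phase budget) are illustrative assumptions, not figures taken from this work.

```python
import math


def phase_drift_rad(freq_offset_hz, elapsed_s):
    """Relative phase accumulated between two carriers offset by freq_offset_hz."""
    return 2.0 * math.pi * freq_offset_hz * elapsed_s


def time_to_phase_error_s(freq_offset_hz, max_phase_rad):
    """Time after which the relative phase reaches max_phase_rad."""
    return max_phase_rad / (2.0 * math.pi * abs(freq_offset_hz))


# Illustrative assumptions only (not values from the dissertation).
CARRIER_HZ = 2.4e9       # assumed carrier frequency
UNCORRECTED_PPM = 20.0   # hypothetical oscillator mismatch between two APs
RESIDUAL_HZ = 10.0       # hypothetical residual offset after synchronization
MAX_PHASE_RAD = 0.1      # hypothetical tolerable relative phase error

uncorrected_hz = CARRIER_HZ * UNCORRECTED_PPM * 1e-6
for label, offset in (("uncorrected", uncorrected_hz), ("residual", RESIDUAL_HZ)):
    t = time_to_phase_error_s(offset, MAX_PHASE_RAD)
    print(f"{label}: offset {offset:.0f} Hz reaches {MAX_PHASE_RAD} rad after {t * 1e6:.1f} us")
```

Under these assumed numbers, the uncorrected offset exhausts the phase budget thousands of times faster than the residual offset, which is the qualitative reason the AP transmitters must be synchronized before joint precoding.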
Chapters 2 and 3 focus on the design of signaling schemes and associated signal processing techniques for the CAL and SYNC blocks at the AP side of Figure 1.2. These techniques accomplish the above goals with sufficient accuracy in order to materialize a large fraction of the theoretical capacity gain promised by Distributed MU-MIMO.

At the network setup phase, each AP discovers its neighbor APs and measures the associated propagation delays. Since the APs are placed in fixed positions, the network structure (neighboring APs) and path delays change very slowly in time and it is sufficient to repeat this procedure at a very slow time scale (e.g., on the order of tens of minutes) to safely assume that these parameters are known.

Calibration for TDD reciprocity (Chapter 3) depends on the complex scaling factors introduced by the modulation/demodulation hardware (amplifiers, filters, ADC/DACs), that change slowly in time. Experimental software defined radio (SDR) implementation shows that calibration must be repeated on a time-scale on the order of 10 minutes [10] for a system where all of the APs share a common clock. In contrast, synchronization must track the variations of the clock frequencies. Hence synchronization must be performed frequently enough such that the errors do not significantly degrade the system throughput.

In a distributed system where APs do not necessarily share a common clock, even after synchronization, the very small residual errors in frequency cause a slow relative drift among the different AP phasors. As these errors vary and accumulate over time, and since they affect the extent of end-to-end reciprocity in the channel, calibration needs to be performed at the same rate as synchronization. The proposed protocol makes use of a frame structure composed of synchronization, calibration and DL data slots, as shown in Figure 1.3, where synchronization and calibration take place back to back.

[Figure 1.3: The frame structure for the proposed Distributed MU-MIMO architecture. The synchronization slot contains several pilot signals, orthogonal in time, and separated by guard intervals, which are assigned to the anchor APs according to a graph coloring scheme. The colors of the pilots correspond to the graph coloring of Figure 1.4. The calibration slot comprises pilots over OFDM. During this slot, each AP transmits pilots over one or more time-frequency slots, allowing it to calibrate itself with other APs in its vicinity. Diagram labels: frame; sync slot (pilot1, pilot2, pilot3, pilot4); calibration (OFDM pilot symbols); data (APs poll users, users send UL pilots, APs send DL data).]

Let $\mathcal{T} = \{i : i = 1, \ldots, N_a\}$ denote the set of APs. We define a directed network graph with vertices $\mathcal{T}$ and edges $\mathcal{E} = \{(i,j)\}$ for all ordered pairs of APs $i, j$ such that the average SNR of the transmission $i \to j$ is larger than some suitably defined threshold. Since the distance-dependent pathloss is symmetric with respect to the AP indices, if $(i,j) \in \mathcal{E}$ then also $(j,i) \in \mathcal{E}$. Without loss of generality, we assume that the graph $(\mathcal{T}, \mathcal{E})$ is connected (footnote 3). For the synchronization protocol, we define $\mathcal{A} \subseteq \mathcal{T}$ to form a connected cover [27] (i.e., the subgraph formed by the APs $i \in \mathcal{A}$ and their associated edges $\mathcal{E}(\mathcal{A}) = \{(i,j) : i, j \in \mathcal{A}\}$ is connected and any other AP $i' \notin \mathcal{A}$ has at least one neighbor in $\mathcal{A}$). The APs in $\mathcal{A}$ are referred to as "anchor" nodes. Figure 1.4 shows a network with 4 anchor nodes. In a practical system deployment, we may imagine that each cluster corresponds to a geographically isolated area (e.g., a building) and the anchor nodes are placed in special positions (e.g., roof-top), such that they can exploit stronger propagation conditions (e.g., line of sight) to each other. During the network deployment phase, the network designer (or an algorithm) should find a good set of anchor APs.

Footnote 3: Otherwise, we apply the same algorithms independently on each connected component, but these independent components cannot be used to jointly beamform.

[Figure 1.4: Network graph with 4 anchor nodes (triangles) making use of different pilot bursts (identified by different colors, as in Figure 1.3). The non-anchor APs do not transmit sync pilots, and listen to their neighbor anchor node(s) in order to correct their timing and frequency references.]

The synchronization slots are formed by a number of pilot bursts (sequences of time-domain chips) separated by guard intervals. We assume that a rough timing synchronization is provided through the wired backhaul network using a coarse frame synchronization protocol such as IEEE 1588 [14]. As a result, the APs know where to look for the pilot bursts up to an accuracy of a few OFDM symbols, corresponding to the guard intervals inserted between the pilot bursts. We require that each anchor node is able to receive the pilot bursts from its neighboring anchor nodes without collisions. This is possible by finding an $L(1,1)$-labeling (see [28] and references therein) of the subgraph $(\mathcal{A}, \mathcal{E}(\mathcal{A}))$, and associating pilot bursts to the labeling colors. An $L(1,1)$-labeling of a graph is a coloring scheme for which all neighbors of any node $i \in \mathcal{A}$ have distinct colors. In order for this to be possible, we assume that the number of pilot bursts in the synchronization slot is greater than or equal to the number of colors of the $L(1,1)$-labeling. For example, when $(\mathcal{A}, \mathcal{E}(\mathcal{A}))$ is a cycle, if $|\mathcal{A}|$ is a multiple of 3, then 3 colors are sufficient, while if $|\mathcal{A}|$ is not a multiple of 3 then 4 colors are needed. If $(\mathcal{A}, \mathcal{E}(\mathcal{A}))$ is a clique, then $|\mathcal{A}|$ is the necessary and sufficient number of colors for an $L(1,1)$-labeling, and if $(\mathcal{A}, \mathcal{E}(\mathcal{A}))$ is a tree, then it is known that $d_{\max} + 1$ colors are necessary and sufficient, where $d_{\max}$ is the maximum node degree. In particular, for a star topology with $|\mathcal{A}| - 1$ nodes connected to the star center we need $|\mathcal{A}|$ colors. The non-anchor APs in $\mathcal{T} \setminus \mathcal{A}$ do not transmit pilot bursts, and just listen to the anchor nodes.
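The assignment of pilot bursts to anchor nodes can be sketched with a simple greedy heuristic. This is only an illustration under stated assumptions: it uses the standard reading of an $L(1,1)$-labeling (any two nodes within distance two of each other receive different colors), the function names `l11_labeling` and `is_valid` are hypothetical, and the greedy ordering is not guaranteed to use the minimum number of colors on arbitrary graphs.

```python
from itertools import combinations


def l11_labeling(adjacency):
    """Greedy L(1,1)-labeling: nodes within distance 2 get different colors.

    adjacency: dict mapping node -> set of neighbor nodes (undirected graph).
    Returns a dict mapping node -> color index (0, 1, 2, ...).
    """
    # Nodes in conflict with v: its neighbors and its neighbors' neighbors.
    conflicts = {v: set(nbrs) for v, nbrs in adjacency.items()}
    for nbrs in adjacency.values():
        for a, b in combinations(nbrs, 2):
            conflicts[a].add(b)
            conflicts[b].add(a)
    colors = {}
    # Heuristic order: most constrained nodes first (not optimal in general).
    for v in sorted(adjacency, key=lambda n: -len(conflicts[n])):
        used = {colors[u] for u in conflicts[v] if u in colors}
        colors[v] = next(c for c in range(len(adjacency)) if c not in used)
    return colors


def is_valid(adjacency, colors):
    """Each node and all of its neighbors must carry pairwise distinct colors."""
    for v, nbrs in adjacency.items():
        seen = [colors[u] for u in nbrs] + [colors[v]]
        if len(seen) != len(set(seen)):
            return False
    return True


# Star with 4 leaves (5 anchor nodes) and a 4-node path (a tree with d_max = 2).
star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
star_colors = l11_labeling(star)
path_colors = l11_labeling(path)
print("star pilot bursts:", len(set(star_colors.values())))
print("path pilot bursts:", len(set(path_colors.values())))
```

For these two small examples the greedy result agrees with the counts stated in the text: the star needs one pilot burst per anchor node, and the path needs $d_{\max} + 1 = 3$.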
Eventually, at the end of a synchronization slot, each anchor node has received a distinct and non-overlapping pilot burst from each of its neighbors in $(\mathcal{A}, \mathcal{E}(\mathcal{A}))$, and all non-anchor nodes have received one or more pilots from their neighboring anchor nodes. The estimates collected by the nodes are sent through the wired backhaul network to the CS, which computes correction factors as explained in Chapter 2.

The calibration slot is organized in a similar way, where the only difference is that in this case all nodes have to send and receive a pilot signal, as explained in Chapter 3. If the network is formed by sufficiently isolated clusters, calibration can be organized in a hierarchical way such that all clusters calibrate in parallel, and then a second round of calibration is run between the anchor nodes (one per cluster) in order to calibrate the whole network (see [19]). For example, such hierarchical schemes prove to be very useful in the presence of multiple antenna APs, where the groups of antennas of each AP, driven by a common clock, can be calibrated "locally" and in parallel using the Argos scheme [10], and a second level of calibration is used to set the relative calibration coefficients between the APs. This again makes the proposed scheme work in even very large networks without the requirement of a "master" node or calibration antenna connected with high SNR to all other nodes or antennas. Note that spatial pilot reuse can be exploited to reduce the signaling overhead and enable calibration of (geographically) larger networks.

The relative sizes of the sync and calibration slots with respect to the data slots in Figure 1.3 do not reflect the actual durations in a real implementation. As a matter of fact, the protocol overhead is very small for typical channel coherence times in a WLAN nomadic users scenario.
Thanks to the periodically repeated computation of the timing and frequency correction factors and calibration correction factors, the APs are time/frequency synchronous and calibrated and can operate (up to residual errors) as a large distributed and coherent multi-antenna system. Both synchronization and calibration must remain sufficiently accurate over the duration of the sequence of data transmission slots in the frame (until the next synchronization and calibration slot).

The data slots can be arranged in the classical way widely proposed in TDD-based MU-MIMO literature [2, 3, 10, 29-31]: the APs send a request for UL pilots. The polled users respond with their mutually orthogonal UL pilots. Then, the APs send the received UL pilot burst to the CS which estimates the UL channel, computes the corresponding DL MU-MIMO precoding matrix and sends the precoding coefficients (columns of the precoding matrix) along with the encoded data packets back to the APs for transmission. Finally, the APs precode the user data packets according to the desired MU-MIMO scheme, and compensate for the TDD calibration factors (see Section 3.4) and phase rotation factors due to timing and frequency offsets before joint DL transmission.

1.3 Related Work

1.3.1 Theoretical Foundations

The most critical elements of this work are formed on the backbone of multiple antenna theory. While it has been known for some time that multiple antennas can be used to increase the reliability of a wireless channel, the modern and full understanding of MIMO was a result of the pioneering work of Foschini, Gans, and Telatar [32, 33]. They revealed the fundamental spatial multiplexing gain of $\min(N_t, N_r)$ in a point-to-point communication system operating at high SNR with $N_t$ transmit antennas and $N_r$ receive antennas.
The core issues of MIMO have been covered extensively, and may now be considered so well-known that a number of suitable textbooks derive all the useful results [34,35].

MU-MIMO, while less prevalent than point-to-point MIMO in modern communications systems, has also reached the point of wide-spread knowledge. Beginning with the groundbreaking result of Caire and Shamai [36], a number of other groups released capacity results on this channel [37, 38]. The capacity of this channel was later fully characterized [39]. In essence, the results of MU-MIMO extend the spatial multiplexing gain result of the previous MIMO works. Since mobile devices are limited in size and the use of full MIMO capacity typically requires the antennas to be spaced by at least a quarter wavelength, the practical limits of MIMO tend to be a 2-4× increase in capacity. However, MU-MIMO achieves a $\min(N_t, K N_r)$ multiplexing gain, where $K$ is the number of users and $N_r$ is the number of receiver antennas on each user. Thus $N_t$ can be made very large to accommodate the service of many users simultaneously, effectively scaling the capacity of the channel with the number of users. The best known techniques of MU-MIMO, however, require CSIT and some sort of precoding transformation applied to the intended user symbols. The collection of CSIT, even when considering downlink pilots and feedback, is not excessively onerous to the overall sum rate [31]. Additionally, a variety of precoding methods may be utilized which have varying levels of complexity. One of the simplest precoding methods, known as Zero-Forcing Beamforming (ZFBF), uses the pseudo-inverse of the channel matrix and is structurally equivalent to channel inversion in a single-antenna channel. Like channel inversion, ZFBF is suboptimal, but it does achieve the maximum multiplexing gain at high SNR when a number of sufficiently orthogonal users can be scheduled [40,41].
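As a minimal sketch of the pseudo-inverse idea behind ZFBF, the example below builds the precoder $W = H^H (H H^H)^{-1}$ for two single-antenna users and three transmit antennas, and checks that the effective channel $HW$ is the identity, i.e., each user sees only its own stream. The channel values are arbitrary, the helper names are hypothetical, the closed-form $2 \times 2$ inverse restricts the sketch to two users, and transmit power normalization is deliberately left out.

```python
def conj_transpose(m):
    """Hermitian transpose of a matrix stored as a list of rows."""
    return [[m[r][c].conjugate() for r in range(len(m))] for c in range(len(m[0]))]


def matmul(a, b):
    """Plain matrix product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def inverse_2x2(m):
    """Closed-form inverse of a 2 x 2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def zfbf_precoder(h):
    """Right pseudo-inverse W = H^H (H H^H)^{-1} for a 2-user channel H (2 x N_t)."""
    h_herm = conj_transpose(h)
    gram = matmul(h, h_herm)  # 2 x 2 Gram matrix of the user channels
    return matmul(h_herm, inverse_2x2(gram))


# Arbitrary example channel: rows are users, columns are transmit antennas.
h = [[1 + 1j, 0.5 - 0.2j, -0.3 + 0.8j],
     [0.2 - 0.7j, 1.1 + 0.4j, 0.6 + 0.1j]]
w = zfbf_precoder(h)
effective = matmul(h, w)  # should be (numerically) the 2 x 2 identity
for row in effective:
    print([f"{abs(x):.3f}" for x in row])
```

A practical precoder would also scale the columns of $W$ to meet a power constraint; that step is omitted here to keep the interference-nulling property easy to see.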
Other common techniques include regularized zero forcing [42] and conjugate beamforming [2], which will be discussed in more detail in Chapter 2. With many of the important theoretical questions answered, MU-MIMO has now been adopted into the IEEE 802.11ac standard [6].

With the success of multiple antenna technologies, a number of works were proposed on access point coordination which make use of MU-MIMO techniques. The exact origin of basestation cooperation using MU-MIMO is not clear, though a number of early works provide detailed overviews of the most fundamental issues [43–45]. Much of the subsequent work focused on the backhaul considerations [26]. Various techniques were developed, such as utilizing the considered schemes in smaller clusters to reduce the backhaul requirements [25]. With the conclusion that standard rubidium clocks used in cellular basestations are sufficient for joint beamforming [46], the use of cooperative transmission was built into the LTE standard [7] (footnote 4). In LTE this is referred to as Coordinated Multipoint (CoMP), and the expectations for CoMP were diminished by disappointing experimental trials [15] (footnote 5). This was not a surprising result to some, who had shown that traditional cellular networks may have greater performance than cooperative networks when accounting for CSIT and backhaul constraints [17]. Subsequent works focused on the asymptotic behavior (as in [2]) of systems with infinite transmit antennas [16, 47].

Footnote 4: This conclusion does not hold for the scenario in this dissertation, as rubidium clocks are far in excess of the limits of cost-effective Wi-Fi grade hardware. Instead, the techniques developed herein operate on much less expensive hardware.

Footnote 5: Again, these experiments were performed in a cellular environment, wherein interference is deliberately avoided. In contrast, the scenarios considered in this work experience heavy interference, which may be exploited to significantly increase data rates.

Finally, it is worth noting that the existing knowledge of basestation cooperation indicates that it is not a panacea for all future networks. Massive MIMO, which considers the asymptotics of transmit antennas in a traditional cellular environment, is fundamentally limited by co-channel interference. It is tempting, then, to think that by using cooperation to form a single, massive cell consisting of every available basestation antenna, one may eliminate the effects of noise and achieve unbounded spectral efficiency (much to the detriment of physical intuition). Recent work has shown, however, that there are indeed limits to the spectral efficiency achievable through cooperation [48], though the overall gains are often still worthwhile.

1.3.2 Complementary and Experimental Work

A number of works born from the same theoretical underpinnings have made important contributions. One of these contributions is transmit beamforming, which utilizes simultaneous transmissions of the same information from multiple transmitters to increase the effective SNR at the receiver. In order for the separate signals to add constructively, the carrier phase of the transmitters must align at the receiver. The synchronization techniques for multiple transmitters needed for this form of cooperative beamforming are considered in [49, 50]. Subsequent software-defined radio implementations of these ideas are presented in [51–53]. This work may be seen as a precursor to Distributed MIMO in which there is only a single antenna output (MISO). This prompts an important distinction with respect to Distributed MIMO: the increase in spectral efficiency for MISO beamforming is much less significant. Since the addition of more transmitters leads to (ideally) a linear increase in SNR, a linear increase in capacity necessitates an exponential increase in the number of transmitters.
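The scaling argument for MISO beamforming can be checked numerically. The sketch below assumes the idealized case stated in the text, in which $n$ coherent transmitters multiply the single-transmitter SNR by $n$, and uses the AWGN capacity formula in bits ($\log_2$). The function names and the unit SNR are assumptions made for the example.

```python
import numpy as np

def miso_capacity(n_tx, snr_single):
    """AWGN capacity in bit/s/Hz, assuming SNR grows linearly with n_tx (idealized)."""
    return np.log2(1.0 + n_tx * snr_single)

def transmitters_needed(target_capacity, snr_single):
    """Number of transmitters required to reach a target capacity under the same assumption."""
    return (2.0 ** target_capacity - 1.0) / snr_single

snr = 1.0  # assumed single-transmitter SNR (0 dB)
capacities = [miso_capacity(n, snr) for n in (1, 2, 4, 8)]
needed = [transmitters_needed(c, snr) for c in (1, 2, 3, 4)]
```

Under these assumptions each additional bit/s/Hz roughly doubles the number of transmitters required (`needed` grows as $2^C - 1$), whereas a multiplexing gain would add capacity in proportion to the number of transmitters.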
Since there is no multi-user component to transmit beamforming, there is no degree of freedom gain by introducing more transmitters. However, the synchronization requirements for transmit beamforming are much less rigorous, as can be seen in Figure 2.1.

Footnote 6 (on the exponential number of transmitters needed for a linear capacity increase): This can be observed through the Shannon formula for AWGN capacity, $C = \log(1 + \mathrm{SNR})$.

While not specifically focused on transmit beamforming, a number of relevant experimental works that rely on OFDM synchronization are also closely related to the work presented in Chapter 2. A number of works utilizing coarse OFDM synchronization were spiritual predecessors to Distributed MIMO, such as ZigZag decoding, which used a successive interference cancellation technique to combat hidden terminals in Wi-Fi [54]. Though it is actually a form of transmit beamforming, the techniques used in SourceSync [55] are similar in nature to Distributed MIMO (SourceSync achieves a diversity gain, for example through Alamouti space-time coding [56], rather than a multiplexing gain). More in line with the techniques of Distributed MIMO, Interference Alignment and Cancellation [57] achieves a partial multiplexing gain, but not the full degree of freedom gain. These works were followed by our experimental contributions [29, 30], discussed in more detail below. Complementary results were published in other venues as well [58–60].

1.3.3 Synchronization

The synchronization of multiple transmitters has also been treated to some extent in the existing literature. Fundamentally, most modern point-to-point synchronization techniques incorporate (as a primitive) the estimation of frequency and timing from a single complex sinusoid [61, 62]. Many such techniques behave like the maximum likelihood estimator above a certain threshold SNR, that is, their mean square errors approximate the Cramér-Rao bound very closely [63].
This is a well studied and widely known phenomenon, and is used in most practical point-to-point synchronization schemes [64].

The application of these ideas to the synchronization of multiple transmitters has recently been demonstrated through SDR implementations, which show the feasibility of over-the-air synchronization with enough accuracy to guarantee near-ideal multiuser interference suppression. These works primarily approach synchronization using a master-secondary protocol, requiring that all APs receive from a single master station broadcasting a beacon signal with sufficiently large power, thus limiting the size and topology of the network [29, 30, 58]. At least one group has also used a distributed antenna approach, which does not require synchronization between the distant antennas. However, it is also severely limited in network size due to losses in the distribution lines [59].

Though not necessarily utilized with Distributed MIMO in mind, consensus algorithms have also been considered for the synchronization of transmitters [65–67]. Consensus has the advantage of working with networks of arbitrary size and topology, and we consider our own take on consensus for Distributed MIMO in Section 2.5. However, for reasons revealed in Section 2.6, consensus is likely impractical for real-world deployments.

1.3.4 Calibration

Though physical channel reciprocity has long been known, it has recently been rediscovered due to its importance in TDD systems [68]. The concept of TDD reciprocity calibration also has earlier origins, though renewed interest in it began to grow with the need for CSIT in MIMO schemes [69]. This work explored hardware architectures for ensuring reciprocity, but the modern formulation stems from massive MU-MIMO analysis [2]. It was soon realized that building hardware with reciprocity introduced additional cost, and thus signal processing methods were considered instead.
Such a method of reciprocity calibration was considered in [13], wherein the calibration technique was based on exchanging pilot signals between a transmitter and a receiver. This method has been extended to also work in the presence of frequency offsets [70] for practical purposes. In practice, it is much more desirable to design calibration protocols that do not assume the presence or collaboration of the UTs, which may be legacy devices not capable of implementing a newer protocol. A novel TDD reciprocity calibration method referred to as "Argos" was presented in [10] as part of an SDR implementation of a TDD reciprocity-based massive MIMO basestation. This method enables calibration by exploiting two-way signaling that involves only basestation antennas, exchanging calibration pilot signals with a (possibly additional) reference antenna. Argos is very sensitive to the placement of the reference antenna relative to the other antennas. As a result, this scheme is not readily scalable and is not sufficiently accurate to enable large-scale MIMO in a distributed system.

1.3.5 Our Contributions

This dissertation is based on a number of our published works pertaining to Distributed MIMO. Our preliminary work and proof-of-concept was successful at synchronizing four independent APs and then jointly beamforming to four users [30] using WARP radios [71]. This work focused on the mechanics of a timing and frequency synchronization protocol, whereby a master AP would transmit an RF pilot tone, all other APs would lock to that tone, and then they would begin joint transmission. When using Zero-Forcing Beamforming (in which the APs pre-cancel the interference to the users based on knowledge of the downlink channel), the median signal-to-interference power ratio was 25 dB. This residual interference power came from imperfections in the synchronization and provides a preliminary motivation for the improvement of the system.
In a concurrent work, we analyzed the performance of the AirSync system [29]. Adopting a number of schemes from multi-user MIMO (namely, Zero-Forcing Beamforming, Tomlinson-Harashima precoding, and blind interference alignment), we were able to derive performance curves (sum rate and bit error rate, among others) for a 2 AP, 2 user topology. In contrast to coordinated multipoint, we found that full degree of freedom/spatial multiplexing gains were possible in a WLAN scenario, and that the backhaul was not yet a limiting factor. This is explained by the difference between WLAN and cellular networks: the latter are already designed to mitigate interference by frequency orthogonalization, and thus do not have as great a potential for joint beamforming.

Our subsequent work improved the synchronization between APs, enabling a scaling up to many more APs. Because a massive number of AP antennas makes it impractical to gather channel state information at the transmitter from downlink training sequences, it becomes necessary to send uplink pilots from the users in order to infer the downlink channel. In contrast to Argos [10], we established a method which eliminated the need for an explicit reference antenna. In doing so, each AP observed the pilots of all of its neighbors, rather than the single pilot from the reference antenna. By collecting all of the measurements at the central processor, we were able to formulate an estimate of the calibration ratios as the solution to a least squares problem. Compared to the Argos scheme, we saw significant improvements in the calibration accuracy for distributed topologies. Additionally, since the need for a reference antenna with a high SNR channel to each of the APs was eliminated, our scheme was much better able to scale to large, distributed topologies. A follow-up work extended this scalability to the full system [72]. We again demonstrated significant improvements over Argos for calibration.
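The idea of estimating calibration ratios by least squares from pilots exchanged among all APs can be sketched as follows. This is only an illustration under an assumed model, not the formulation developed in Chapter 3: the observation at AP $j$ of a pilot from AP $i$ is taken to be $y_{i \to j} = t_i h_{ij} r_j$ with a reciprocal physical channel $h_{ij} = h_{ji}$, and the calibration ratio is defined as $c_i = t_i / r_i$, so that $y_{i \to j} c_j = y_{j \to i} c_i$ for every pair. The function name `ls_calibration`, the gauge choice $c_0 = 1$, and the noiseless test data are all assumptions.

```python
import numpy as np

def ls_calibration(Y):
    """Least-squares estimate of calibration ratios c, with c[0] fixed to 1.

    Y[i, j] is the (hypothetical) observation at AP j of the pilot sent by AP i.
    Each pair (i, j) contributes one equation  Y[i, j] * c[j] - Y[j, i] * c[i] = 0.
    """
    n = Y.shape[0]
    rows = []
    for i in range(n):
        for j in range(i + 1, n):
            row = np.zeros(n, dtype=complex)
            row[j] = Y[i, j]
            row[i] = -Y[j, i]
            rows.append(row)
    A = np.array(rows)
    # Move the known c[0] = 1 column to the right-hand side and solve for the rest.
    c_rest, *_ = np.linalg.lstsq(A[:, 1:], -A[:, 0], rcond=None)
    return np.concatenate(([1.0 + 0.0j], c_rest))

rng = np.random.default_rng(1)
n = 5
# Assumed transmit/receive hardware gains, kept away from zero.
t = (1 + 0.2 * rng.random(n)) * np.exp(2j * np.pi * rng.random(n))
r = (1 + 0.2 * rng.random(n)) * np.exp(2j * np.pi * rng.random(n))
h = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
h = (h + h.T) / 2                     # reciprocal physical channel
Y = t[:, None] * h * r[None, :]       # noiseless pairwise pilot observations

c_true = t / r
c_est = ls_calibration(Y)
```

Because every AP pair contributes an equation, no single reference antenna needs a strong channel to all others; in this noiseless example the estimate matches `c_true` up to the common scaling fixed by `c[0] = 1`.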
In addition to the use of least squares as a means of solving the distributed calibration problem, we also saw value in using it as a method of distributed synchronization. In our model, each AP exchanges noisy estimates of the frequency differences between it and its neighbors. The frequency difference estimates may come from any off-the-shelf OFDM estimator [20–23]. These estimates are fed back to the central processor, which generates correction factors for each AP to apply. Based on a derivation of the effect of frequency errors on our baseband signal model, we showed it was possible to correct for these frequency offsets by a rotation of the baseband symbols. We characterized the mean-squared error of the joint maximum-likelihood timing and frequency estimator, and based on this performance, we were able to show by simulation that coherent beamforming could be achieved over a time scale on the order of several hundred OFDM symbols.

1.4 Organization

The remaining chapters of this work are outlined as follows.

Chapter 2 covers the elements of synchronization needed to enable Distributed MIMO. Starting with a model for OFDM synchronization, we cover the requirements on synchronization accuracy, and present the basic mechanics shared by all the subsequent schemes. We then develop a variety of schemes, starting with the master-secondary protocol from AirSync, and exploring consensus, least squares, recursive least squares, and finally a Kalman filter based approach.

In Chapter 3 we discuss reciprocity calibration. We begin with a problem statement and model, then revisit the Argos scheme. We then discuss robust least squares calibration and provide numerical analysis of its performance with respect to Argos. Finally, we detail some results on hierarchical calibration for making the process even more scalable.

Chapter 4 discusses results on system-level optimization.
We provide a model for the performance of Distributed MIMO in the presence of frequency offsets and phase noise. As revealed in Chapter 2, the progressive loss of synchronism degrades the effective SINR of the users over time. We then use this model to provide insight into the optimal frame length of a Distributed MIMO system. Additionally, we discuss some results on precoding schemes which are compatible with the limited backhaul capacity observed in practice.

Chapter 5 covers a variety of experimental results. We provide results from synchronization and calibration on a real system, validating the models of Chapters 2 and 3. Finally, we present an experimental performance analysis of Blind Interference Alignment in a Distributed MIMO testbed.

Chapter 2 Synchronization

2.1 Introduction

While synchronization is one of the most fundamental problems in practical wireless systems, the aspects of synchronization needed for Distributed MU-MIMO are relatively modern. In this context, we use the term synchronization to refer to the act of driving each discrete transmitter to operate on the same carrier frequency, the same sampling frequency, and with the same timing reference. As will be seen in Section 2.2, the effect of the carrier frequency offset is distinct from that of the sampling frequency offset, but due to the common practice of running both clocks from a single physical oscillator, their effects are linked. In this chapter we will often refer to the effects of the carrier frequency offset, where the effects of the associated sampling frequency offset are implicitly assumed.

The distinguishing feature of Distributed MIMO synchronization is that it is foremost performed between the transmitters, in contrast to traditional cellular and Wi-Fi systems which utilize synchronization only at the receivers (i.e., at the user end in the downlink and at the basestation end in the uplink).
Due to the use of OFDM in modern wireless systems, traditional synchronization at the receivers is not a particularly challenging issue. The receivers must achieve timing synchronization to within the length of the cyclic prefix (e.g., 400-800 ns in Wi-Fi [18]), and the maximum carrier frequency offset (CFO) must be less than half of a subcarrier bandwidth (e.g., less than 156.25 kHz in Wi-Fi). When both of these conditions are met, synchronization may be achieved outside of real-time via baseband signal processing. The requirement on the CFO is typically assured via standardization (as in 802.11g and newer, which set strict guidelines for oscillator accuracy), and the requirement on timing is met by a sufficiently sensitive correlator receiver.

In contrast, wireless synchronization between transmitters is not so straightforward. In order to use the same set of techniques as a point-to-point system, the APs would need to be capable of very accurate full-duplex communication [73–75], which is still considered to be in the experimental stages of research. Rather, a different set of tools is needed to achieve the fine level of synchronization required for Distributed MIMO. This work primarily takes two approaches: either the transmitters use an auxiliary wireless spectrum outside of the data bands to synchronize, or they periodically exchange pilots within the data band. The pros and cons of both techniques are readily apparent, and we treat both approaches in this chapter.

Additionally, the required accuracy of synchronization is much more demanding in the case of Distributed MIMO than in the point-to-point case. As in the point-to-point case, the timing offset between APs needs to be less than the length of the cyclic prefix. However, this timing offset must remain constant from the point of channel state information acquisition until its use in the downlink transmission.
The timing reference between APs is allowed to change from frame to frame, but each time the relative timing offset between APs changes, new channel state information must be acquired. The requirement on the frequency offsets is even more exacting. In order to achieve useful downlink transmission frame lengths, we will require synchronization on the order of parts per billion. For the 2.45 GHz carrier frequency of Wi-Fi, this translates to synchronization of approximately 10 Hz or less.

The fulfillment of such stringent requirements gives rise to the techniques presented in this chapter. We will begin by developing a model to describe the effects of asynchronism in a point-to-point case, which we will then use to explain the requirements in the case of Distributed MIMO. Next we will review and derive the techniques and fundamental limits used in point-to-point synchronization, as these methods will provide a useful component of the methods for Distributed MIMO. We will then discuss an experimental implementation of a simple Distributed MIMO synchronization method, which will prompt the development of various, more robust techniques in the latter sections of the chapter.

2.2 A Model of OFDM with Synchronization Errors

2.2.1 Point-to-Point OFDM Synchronization Errors

Let $f_0$ and $f_s$ be the nominal carrier and sampling frequencies of all nodes of a wireless network, respectively. The actual carrier and sampling frequencies of node $i$, indicated by $f_{0,i}$ and $f_{s,i}$, respectively, differ from their nominal values by some deterministic unknown carrier frequency offset and sampling frequency offset, respectively. Also, each node $i$ operates according to its local time axis with timing offset (TO) $\tau_i$ with respect to a common nominal time axis.
Motivated by the 802.11 standard [18], we assume that the sampling clock and the RF clock on each node hardware are derived from the same (local) oscillator, such that $f_{0,i}/f_{s,i} = \kappa \gg 1$, where $\kappa$ is a constant factor that depends only on the hardware design but is independent of $i$. Then we let $f_{s,i} = f_s + \epsilon_i$, where $\epsilon_i \ll f_s$ is the SFO at node $i$. Correspondingly, the CFO is given by $\kappa \epsilon_i$. Consistent with the typical accuracy of 802.11 commercial grade hardware, we assume that the CFO is much smaller than the signal bandwidth.

Consider the transmission of a sequence of OFDM symbols from node $i$ to node $j$. For the sake of clarity of exposition, we assume that each node has a single antenna, although the synchronization and calibration schemes proposed in this work can be immediately applied to multi-antenna APs. We let $\mathbf{X}_i[n] = (X_i[n,0], \ldots, X_i[n,N-1])$ denote the $n$-th block of frequency-domain symbols (i.e., the $n$-th OFDM symbol), where $N$ is the number of OFDM subcarriers and $X_i[n,\nu]$ is the complex baseband symbol sent on subcarrier $\nu \in \{0, \ldots, N-1\}$ at OFDM symbol time $n$. The OFDM modulator performs an IFFT, producing the block of $N$ time-domain "chips" $\mathbf{x}_i[n] = \mathrm{IFFT}(\mathbf{X}_i[n]) = (x_i[n,0], \ldots, x_i[n,N-1])$, where $x_i[n,k]$ denotes the $k$-th chip of the $n$-th time-domain block. Each block $\mathbf{x}_i[n]$ is cyclic-prefixed, forming $\tilde{\mathbf{x}}_i[n] = (\tilde{x}_i[n,-L], \ldots, \tilde{x}_i[n,N-1])$, where $L$ is the length of the cyclic prefix (CP). The sequence of cyclic-prefixed blocks is sent to the digital-to-analog converter (DAC), forming the continuous-time complex baseband signal

$$\tilde{x}_i(t) = \sum_{n} \sum_{k=-L}^{N-1} \tilde{x}_i[n,k]\, \Pi\big((t - \tau_i)/T_{s,i} - n(N+L) - k\big), \tag{2.1}$$

where $\Pi(t)$ is the elementary interpolator DAC waveform, assumed for simplicity to be the rectangular pulse

$$\Pi(t) = \begin{cases} 1 & t \in [-1/2, 1/2] \\ 0 & \text{elsewhere} \end{cases} \tag{2.2}$$

and where $T_{s,i} = 1/f_{s,i}$ is the sampling interval of transmitter $i$.
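The modulator steps described above (IFFT followed by cyclic-prefix insertion) can be sketched in a few lines of NumPy. The values $N = 64$ and $L = 16$ match the simulation parameters quoted later in this chapter; the QPSK symbol mapping, the function name `ofdm_modulate`, and the use of NumPy's IFFT normalization are assumptions made for the example.

```python
import numpy as np

def ofdm_modulate(X, cp_len):
    """Return one cyclic-prefixed OFDM block (length N + cp_len) from N frequency-domain symbols."""
    x = np.fft.ifft(X)                       # N time-domain chips
    return np.concatenate((x[-cp_len:], x))  # prefix = last cp_len chips of the block

N, L = 64, 16
rng = np.random.default_rng(2)
# Assumed QPSK symbols on every subcarrier.
X = (rng.choice([-1, 1], N) + 1j * rng.choice([-1, 1], N)) / np.sqrt(2)

x_tilde = ofdm_modulate(X, L)
# An ideally synchronized receiver drops the prefix and applies the FFT.
X_rec = np.fft.fft(x_tilde[L:])
```

The prefix copies the tail of the block, so the chips at indices $-L, \ldots, -1$ equal $x_i[n, k \bmod N]$; this is what turns the channel's linear convolution into a cyclic one in the derivation that follows.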
The complex baseband signal (2.1) is carrier-modulated in order to produce the transmitted signal $s_i(t) = \mathrm{Re}\{\tilde{x}_i(t) \exp(j 2\pi f_{0,i} t)\}$.

Now consider the corresponding receiver operations at node $j$. Let $h_{i \to j}(t)$ denote the complex baseband equivalent impulse response of the channel between node $i$ and node $j$, including the receiver hardware (low noise amplifier, analog-to-digital converter (ADC), anti-aliasing filter, etc.). Since the CFO between node $i$ and node $j$ is small with respect to the signal bandwidth, the complex baseband signal after demodulation can be written as (see also [76])

$$\tilde{y}_{i \to j}(t) = \left( \sum_{n} \sum_{k=-L}^{N-1} \tilde{x}_i[n,k]\, g_{i \to j}\big(t - (\tau_i - \tau_j) - (n(N+L) + k) T_{s,i}\big) \right) \exp\big(j 2\pi (f_{0,i} - f_{0,j}) t\big), \tag{2.3}$$

where we define (footnote 1) $g_{i \to j}(t) = (\Pi_i \otimes h_{i \to j})(t)$, with $\Pi_i(t) = \Pi(t/T_{s,i})$. The ADC at receiver $j$ produces samples at rate $p/T_{s,j}$, where $p \geq 1$ is a suitable integer oversampling factor. The resulting discrete-time complex baseband signal is given by $\tilde{y}_{i \to j}[\ell] = \tilde{y}_{i \to j}(\ell T_{s,j}/p)$. Since $T_{s,i}$ and $T_{s,j}$ are both assumed to be very close to the nominal chip interval $T_s$, over a single chip the scale of the time axis at the two nodes is essentially identical. However, by accumulating this small time difference over many OFDM symbols, the SFO manifests itself as a misalignment of the time axes that grows linearly with the OFDM symbol index. Letting $\ell = m(N+L)p + q$ with $q \in \{-Lp, \ldots, Np-1\}$, extracting blocks of $p(N+L)$ output samples, and assuming $|\tau_i - \tau_j| \ll T_s L$, i.e., that the TO is significantly less than the duration of the CP (footnote 2), we arrive at the following channel model without inter-block interference between consecutive OFDM symbols: for $q \in \{0, \ldots, pN-1\}$,

$$\tilde{y}_{i \to j}[m,q] = \left( \sum_{k=-L}^{N-1} \tilde{x}_i[m,k]\, g_{i \to j}\big((q - kp) T_s/p - \Delta\tau_{i,j} - m(N+L) \Delta T_{s,i,j}\big) \right) \exp\big(j 2\pi \Delta f_{0,i,j} T_s (m(N+L) + q/p)\big), \tag{2.4}$$

where $\Delta\tau_{i,j} = \tau_i - \tau_j$, $\Delta T_{s,i,j} = T_{s,i} - T_{s,j}$ and $\Delta f_{0,i,j} = f_{0,i} - f_{0,j}$.

Footnote 1: $(a \otimes b)(t)$ denotes the continuous-time convolution of signals $a(t)$ and $b(t)$.

Footnote 2: As discussed in Section 1.2, the synchronization protocol that we propose in this work makes sure that this assumption is satisfied.

Since $|f_{0,i} - f_{0,j}| T_s \ll 1$, the phase rotation across the subcarriers of a single OFDM symbol is negligible. Hence, we can drop the dependence on $q$ from the argument of the exponential in (2.4). This is equivalent to approximating the instantaneous phase of the carrier term in (2.4) as piecewise constant on each OFDM symbol of duration $(N+L)T_s$, and only taking the phase increment into account in discrete steps from one OFDM symbol to the next. This approximation is the time-domain equivalent of neglecting ICI, which is indeed a valid approximation when the CFO is much smaller than the OFDM subcarrier separation (footnote 3).

Footnote 3: This assumption is validated by the oscillator requirements of the 802.11-2012 standard, which allows 20 ppm frequency error. For an $f_0 = 2.45$ GHz system, this corresponds to a maximum frequency difference of 98 kHz, which is much less than the subcarrier spacing in the 802.11 standard, 312.5 kHz.

The received discrete-time $m$-th signal block after CP removal can thus be expressed by

$$y_{i \to j}[m,q] \approx \left( \sum_{k=-L}^{N-1} x_i[m, k \bmod N]\, g_{i \to j}\big((q - kp) T_s/p - \Delta\tau_{i,j} - m(N+L) \Delta T_{s,i,j}\big) \right) \exp\big(j 2\pi \Delta f_{0,i,j} T_s (N+L) m\big), \quad q = 0, \ldots, pN-1. \tag{2.5}$$

The first term in the above product is easily recognized to be the cyclic convolution of the up-sampled sequence $\{x_i[m,k] : k = 0, \ldots, N-1\}$ (with insertion of $p-1$ zeros in between each sample), with the discrete-time impulse response

$$g_{i \to j}[m,q] = g_{i \to j}\big(q T_s/p - (\tau_i - \tau_j) - m(N+L)(T_{s,i} - T_{s,j})\big), \quad \text{for } q = 0, \ldots, pN-1. \tag{2.6}$$

We wish to obtain a frequency-domain representation of this cyclic convolution. To this purpose, notice that $g_{i \to j}[m,q]$ is obtained by sampling the impulse response $g_{i \to j}(t)$ at rate $p/T_s$, delayed by $(\tau_i - \tau_j) + m(N+L)(T_{s,i} - T_{s,j})$. Using the well-known spectral folding relationship between the continuous-time Fourier transform and the discrete-time Fourier transform of the corresponding sampled signal [77], and after some algebra, we find the discrete-time Fourier transform of $g_{i \to j}[m,q]$ in the form

$$G_{i \to j}(m,\xi) = \sum_{q=0}^{pN-1} g_{i \to j}[m,q]\, e^{-j 2\pi \xi q} = \frac{p}{T_s} \sum_{\ell} G_{i \to j}\!\left( \frac{\xi - \ell}{T_s/p} \right) \exp\!\left( -j 2\pi (\xi - \ell) \frac{\Delta\tau_{i,j} + m(N+L) \Delta T_{s,i,j}}{T_s/p} \right), \tag{2.7}$$

where $G_{i \to j}(f) = \int g_{i \to j}(t) e^{-j 2\pi f t}\, dt$ is the continuous-time Fourier transform of $g_{i \to j}(t)$, and where $\xi \in [-1/2, 1/2]$ and $f \in \mathbb{R}$ denote the frequency variables for discrete-time and continuous-time signals, respectively. Usually systems are designed such that $N \gg L \gg 1$ and the receiver sampling frequency $p/T_s$ is large enough in order to avoid significant spectral folding. This means that only the term for $\ell = 0$ is significant in the (discrete-time) frequency interval $\xi \in [-1/(2p), 1/(2p)]$, which is the interval of interest containing the $N$ OFDM subcarriers of the transmitted signal. Hence, after applying the DFT and taking the samples corresponding to the set of discrete frequencies $\{\nu/(pN) : \nu = -N/2, \ldots, N/2-1\}$, straightforward algebra (omitted for the sake of brevity) yields the OFDM frequency-domain demodulated signal in the form

$$Y_{i \to j}[m,\nu] = H_{i \to j}[\nu]\, X_i[m,\nu]\, E_{i,j}[m,\nu], \tag{2.8}$$

where $H_{i \to j}[\nu] = \frac{p}{T_s} G_{i \to j}\!\left( \frac{\nu}{N T_s} \right)$, and where

$$E_{i,j}[m,\nu] = \exp\!\left( -j 2\pi \left[ \frac{\nu}{N} \right] (\delta_i - \delta_j) \right) \exp\!\left( j 2\pi \left[ \frac{\nu}{N} \right] (\phi_i - \phi_j) m \right) \exp\big( j 2\pi (\theta_i - \theta_j) m \big),$$

with $\nu \in \{-N/2, \ldots, N/2-1\}$ and where we define the normalized TO, SFO and CFO as

$$\delta_i = f_s \tau_i, \qquad \phi_i = (N+L) T_s \epsilon_i, \qquad \theta_i = \kappa \phi_i, \tag{2.9}$$

respectively. In (2.8) the term $H_{i \to j}[\nu] X_i[m,\nu]$ represents what we would expect from an ideal system without synchronization errors, while the combined effects of the TO, CFO and SFO are captured by the multiplicative phase rotation term $E_{i,j}[m,\nu]$.
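The multiplicative phase rotation term can be evaluated numerically from the normalized offsets just defined: normalized TO equal to $f_s$ times the timing offset, normalized SFO equal to $(N+L)T_s$ times the sampling frequency offset, and normalized CFO equal to $\kappa$ times the normalized SFO. The sketch below is a hedged illustration. The signs of the three exponents follow from approximating $T_{s,i} \approx T_s(1 - \epsilon_i T_s)$ and are an assumption of this example, as are the function name `sync_error_term` and the numerical values, which are loosely based on the 802.11 parameters quoted in this chapter.

```python
import numpy as np

def sync_error_term(m, nu, N, L, fs, kappa, tau_i, tau_j, eps_i, eps_j):
    """Phase rotation between nodes i and j at OFDM symbol m and subcarrier nu.

    Sign conventions are assumptions of this sketch (see lead-in).
    """
    Ts = 1.0 / fs
    d_delta = fs * (tau_i - tau_j)            # normalized TO difference
    d_phi = (N + L) * Ts * (eps_i - eps_j)    # normalized SFO difference
    d_theta = kappa * d_phi                   # normalized CFO difference
    return (np.exp(-2j * np.pi * (nu / N) * d_delta)
            * np.exp(2j * np.pi * (nu / N) * d_phi * m)
            * np.exp(2j * np.pi * d_theta * m))

N, L, fs = 64, 16, 40e6
kappa = 2.45e9 / fs  # carrier-to-sampling frequency ratio for a 2.45 GHz carrier

# No offsets: the term reduces to 1 and the ideal model is recovered.
E0 = sync_error_term(10, 5, N, L, fs, kappa, 0.0, 0.0, 0.0, 0.0)
# 50 ns timing offset and an 800 Hz relative SFO (assumed example values).
E = sync_error_term(10, 5, N, L, fs, kappa, 50e-9, 0.0, 400.0, -400.0)
```

The term has unit magnitude, so it never changes the received power on a subcarrier; the damage comes from the fact that the SFO and CFO parts of the phase grow linearly with the symbol index `m`.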
Footnote 4 (on the model (2.8)): This model has been validated experimentally by the authors using the WARP software radio platform, as documented in [29, 30], and it is found to be accurate within the typical errors of 802.11 legacy APs.

2.2.2 Impact of Synchronization Errors on Distributed MU-MIMO

We consider a network formed by UTs $k = 1, \ldots, N_u$ served in the DL by APs $i = 1, \ldots, N_a$, using Distributed MU-MIMO. From the analysis developed above (footnote 5), it directly follows that the $N_u \times 1$ vector of received signals at the UT receivers, at OFDM symbol $m$ and subcarrier $\nu$, is given by

$$\mathbf{y}[m,\nu] = \boldsymbol{\Psi}[m,\nu]\, \mathbf{H}[m,\nu]\, \boldsymbol{\Phi}[m,\nu]\, \mathbf{x}[m,\nu] + \mathbf{z}[m,\nu], \tag{2.10}$$

where $\mathbf{H}[m,\nu]$ is the $N_u \times N_a$ channel matrix with $(i,k)$-th element $H_{i,k}[m,\nu]$, $\mathbf{x}[m,\nu]$ is the $N_a \times 1$ vector of frequency domain symbols transmitted by the $N_a$ APs simultaneously, and $\mathbf{z}[m,\nu]$ is the corresponding $N_u \times 1$ vector of Gaussian noise samples, i.i.d. $\sim \mathcal{CN}(0, N_0)$. In general, the MU-MIMO precoded signal vector is given by

$$\mathbf{x}[m,\nu] = \mathbf{G}[m,\nu]\, \mathbf{u}[m,\nu], \tag{2.11}$$

where $\mathbf{u}[m,\nu]$ is the $N_u \times 1$ vector of time-frequency encoded data symbols to be sent to the $N_u$ UTs, and $\mathbf{G}[m,\nu]$ is the $N_a \times N_u$ MU-MIMO precoding matrix for OFDM symbol $m$ and subcarrier $\nu$. In this work we consider linear Zero-Forcing Beamforming (calculated as the column-normalized right pseudo-inverse of the channel matrix) [3, 31] and conjugate beamforming [2, 3, 10], which is the Hermitian transpose of the channel matrix, scaled to meet the transmit power constraint.

Footnote 5: The equations derived above apply both when the nodes are APs and UTs. To distinguish APs from UTs we use tildes over the UT variables.
The two matrices $\boldsymbol{\Psi}[m,\nu]$ and $\boldsymbol{\Phi}[m,\nu]$ are diagonal of dimension $N_u \times N_u$ and $N_a \times N_a$, respectively, with diagonal elements given by

$$\Psi_{k,k}[m,\nu] = \exp\!\left( j 2\pi \left[ \frac{\nu}{N} \right] \tilde{\delta}_k \right) \exp\!\left( -j 2\pi \left[ \frac{\nu}{N} \right] \tilde{\phi}_k m \right) \exp\!\left( -j 2\pi \tilde{\theta}_k m \right), \tag{2.12}$$

and

$$\Phi_{i,i}[m,\nu] = \exp\!\left( -j 2\pi \left[ \frac{\nu}{N} \right] \delta_i \right) \exp\!\left( j 2\pi \left[ \frac{\nu}{N} \right] \phi_i m \right) \exp\!\left( j 2\pi \theta_i m \right), \tag{2.13}$$

where $\delta$, $\phi$ and $\theta$ denote the normalized TO, SFO and CFO of (2.9), and tildes mark UT quantities. The effect of $\boldsymbol{\Psi}[m,\nu]$ can be compensated at each UT (see sync blocks at UTs in Figure 1.2) by standard pilot-aided or data-aided timing and frequency synchronization techniques suited to OFDM (see [20–24]). On the other hand, the presence of $\boldsymbol{\Phi}[m,\nu]$ in (2.10) between the precoded transmit vector $\mathbf{x}[m,\nu]$ and the channel matrix $\mathbf{H}[m,\nu]$ yields a degradation of the MU-MIMO precoding performance that cannot be undone by UT processing. As shown in Figure 1.2, our approach includes the use of a central server module and SYNC blocks, responsible for carrying out the synchronization at the transmitter side.

The MU-MIMO precoding matrix $\mathbf{G}[m,\nu]$ in (2.11) is constant over time throughout the transmission block (i.e., it is independent of $m$), and it is calculated from the noisy estimate of the "nominal" channel matrix $\mathbf{H}[0,\nu]$ obtained by the UL pilots at the beginning of the precoding block (here indicated as OFDM symbol $m = 0$). Hence, the precoder is ignorant of the matrix $\boldsymbol{\Phi}[m,\nu]$ unless TO, SFO and CFO are explicitly taken into account. In the case of ZFBF, we let $\mathbf{G}[m,\nu] = \mathbf{H}^{\mathsf{H}}[0,\nu] \big( \mathbf{H}[0,\nu] \mathbf{H}^{\mathsf{H}}[0,\nu] \big)^{-1} \boldsymbol{\Lambda}[0,\nu]$ for all $m$, and in the case of conjugate beamforming we let $\mathbf{G}[m,\nu] = \mathbf{H}^{\mathsf{H}}[0,\nu]\, \boldsymbol{\Gamma}[0,\nu]$, where $\boldsymbol{\Lambda}[0,\nu]$ and $\boldsymbol{\Gamma}[0,\nu]$ are diagonal matrices that ensure that the columns of $\mathbf{G}[m,\nu]$ have unit norm. Notice also that the ZFBF precoder requires $N_u \leq N_a$.

In order to motivate the need for the synchronization protocol proposed in this work, we examine the performance degradation due to typical uncompensated SFO and CFO between the APs. We use lower bounds on the achievable sum-rates as a performance metric, obtained assuming an i.i.d. equal-power Gaussian coding ensemble, where the UTs treat the residual interference due to imperfect zero-forcing as noise (footnote 6). We assume DL blocks of $M = 60$ OFDM symbols, typical of 802.11 [18]. In order to focus on the impact of synchronization errors only, $\mathbf{H}[m,\nu]$ is assumed constant with respect to the time index $m = 0, \ldots, M-1$ over each block and, optimistically, we assume ideal TDD reciprocity and noiseless channel estimation. Provided that the relative TO between APs is within the length of the CP, the terms $\exp\!\left( -j 2\pi \left[ \frac{\nu}{N} \right] \delta_i \right)$ are automatically included as part of the channel estimated from the UL pilot symbols (footnote 7). Under these assumptions, the CS computes its MU-MIMO precoding matrix based on the channel matrix $\mathbf{H}[0,\nu] \boldsymbol{\Phi}[0,\nu]$ and uses it throughout the whole DL block comprising $M$ symbols. Therefore, because of the time-varying matrix $\boldsymbol{\Phi}[m,\nu]$, the precoder is more and more mismatched as $m$ increases. Figure 2.1 shows the achievable rates obtained by Monte Carlo simulation assuming $\mathbf{H}[0,\nu]$ with i.i.d. elements $\sim \mathcal{CN}(0,1)$ (normalized independent Rayleigh fading) of a network with $N_a = N_u = 4$, SFO $\epsilon_i$ i.i.d. across the users and the DL blocks, uniformly distributed over $[-\epsilon_{\max}, \epsilon_{\max}]$, with $\epsilon_{\max} = 800$ Hz (20 ppm frequency error), and $f_s = 40$ MHz from the 802.11 specifications [18]. The OFDM modulation has parameters $N = 64$ and $L = 16$.

Footnote 6: Quantifying the system performance in terms of achievable rates is now common practice in modern applied communication works (e.g., see [10, 29, 30, 58]) and it is much more significant than evaluating traditional performance metrics such as self-interference variance or estimation MSE, for which we are typically unable to quantify the impact on the ultimate system performance.

Footnote 7: It can be readily shown that it takes about 1 second for two completely uncompensated clocks to fall out of sync by more than the CP, if they are off by worst case frequency offsets dictated by 802.11. Hence, synchronization performed at intervals 1 second apart or faster suffices.

[Figure 2.1 plot: sum rate (bit/s/Hz) versus SNR (dB); curves: Ideal ZFBF, Degraded ZFBF, Ideal Conj BF, Degraded Conj BF.]

Figure 2.1: Achievable average rates for a 4 × 4 Distributed MIMO system in ideal synchronization conditions and degraded by synchronization impairments (uncompensated free running oscillators).

The achievable rate shown in Figure 2.1 is the average rate across a frame of 60 OFDM symbols. Since the system loses synchronism progressively across each block, the achievable rate rapidly degrades as the OFDM symbol index $m$ increases, such that the average performance is severely degraded. As a comparison, we also show the performance of an ideally synchronized system (zero CFO/SFO) and the performance of a MISO cooperative beamforming scheme employing conjugate beamforming (denoted by Conj BF in the figure) to a single user [10, 50] with and without frequency offsets. Notice that MISO cooperative beamforming suffers much less from the lack of synchronization, and significantly outperforms the mismatched MU-MIMO ZFBF at high SNR. Thus the motivation of this chapter is to provide the APs with sufficiently accurate estimation of their SFO/CFO such that the large spectral efficiency promised by Distributed MU-MIMO is effectively realized in practice.

2.3 Point-to-Point Synchronization

In order to overcome the impairments discussed in Section 2.2, we observe that the effects of the error term (2.13) can be undone by each AP $i$ transmitter by multiplying the transmitted frequency domain symbols $X_i[m,\nu]$ by the time and frequency dependent phase rotation factor

$$P_i[m,\nu] = \exp\!\left( -j 2\pi \left( 1 + \frac{1}{\kappa} \left[ \frac{\nu}{N} \right] \right) \theta_i^{\mathrm{corr}} m \right), \tag{2.14}$$

and by adjusting the timing reference (origin of the transmitter time axis) by $\delta_i^{\mathrm{corr}}$.
The timing and frequency corrections $\tau_i^{\mathrm{corr}}, \delta_i^{\mathrm{corr}}$ are provided by the CS through the low-complexity centralized computation illustrated in the rest of this section. We should also note that a symbol phase rotation by $P_i^*[m,\nu]$, i.e., the complex conjugate of the term in (2.14), must be applied to signals received at the APs during the UL training from the users in the data slots. In this way, all APs use the same time and frequency reference for the calculation of the DL channel matrix for MU-MIMO precoding. Performing such corrections during UL training and DL transmission is consistent with the diagram in Figure 1.2, which shows SYNC block modules in both UL and DL AP/CS processing.

In addition to the baseband symbol rotation as modeled in Section 2.2, the SFO produces a contraction or dilation of the time axis of each AP. While this is a negligible effect over a few tens of OFDM symbols (the typical duration of a data slot), it accumulates over the slots such that at some point the OFDM symbol misalignment between different APs becomes larger than the OFDM CP, thus producing inter-block interference. We will show in Section 2.3.4 that, for typical values of SFO consistent with the 802.11 standard specifications, the time it takes for this symbol misalignment to become significant with respect to the OFDM CP length is much larger than the frame length at which the synchronization protocol must be repeated. Therefore, thanks to the TO correction, the relative shift of the timing axis is reset at each frame and the SFO can be effectively compensated by the proposed baseband symbol rotation scheme.

2.3.1 Maximum Likelihood Synchronization

First we consider the optimal, yet computationally difficult, estimator for the joint TO and CFO. We consider pilot-aided estimation for a transmitter-receiver pair where the channel is multipath and time-invariant, with known path delays and unknown path coefficients, along with additive white Gaussian noise.
We denote by $\Delta\tau$ and by $\Delta f$ the timing and carrier frequency differences between the transmitter and the receiver. The received time-domain baseband signal corresponding to a pilot burst transmission is given by

$$y(t) = \left(\sum_{p=0}^{P-1} h_p\, s(t - \Delta\tau - \tau_p)\right) e^{j2\pi \Delta f t} + z(t), \tag{2.15}$$

where the multipath channel impulse response is $h(t) = \sum_{p=0}^{P-1} h_p\, \delta(t - \tau_p)$, and where the pilot signal is given by

$$s(t) = \frac{1}{\sqrt{T_s}} \sum_{n=0}^{N_c-1} s_n\, \Pi(t/T_s - n), \tag{2.16}$$

for the sequence of time-domain chip symbols $\mathbf{s} = (s_0, \ldots, s_{N_c-1})$. In (2.15), $z(t)$ denotes the complex circularly-symmetric Gaussian white noise with autocorrelation function $E[z(t) z^*(t-u)] = N_0\, \delta(u)$. Letting $r(t) = (\Pi * \Pi)(t)$ and assuming that the receiver performs chip matched filtering and sampling with one sample per chip, we obtain the received discrete-time observation

$$y[m] = \left(\sum_{p=0}^{P-1} h_p \sum_{n=0}^{N_c-1} s_n\, r\!\left(m - n - \frac{\Delta\tau + \tau_p}{T_s}\right)\right) e^{j2\pi \Delta f T_s m} + z[m], \tag{2.17}$$

where $z[m] \sim \mathcal{CN}(0, N_0)$ is a discrete-time complex circularly symmetric i.i.d. Gaussian process.

As explained in Section 1.2, the receiver has coarse knowledge of the frame timing, such that it expects to find the pilot burst in a given interval including the guard times. The receiver collects $M$ samples such that the sampled interval of duration $M T_s$ contains the pilot burst. We collect the $M$ received samples into an $M \times 1$ vector $\mathbf{y} = (y[0], \ldots, y[M-1])^T$. We define $\delta = \Delta f\, T_s$ (footnote 8) and $\tau = \Delta\tau / T_s$, consistently with the normalized TO defined in (2.9). We define the diagonal matrix

$$\Omega(\delta) = \mathrm{diag}\left(1, e^{j2\pi\delta}, e^{j4\pi\delta}, \ldots, e^{j2\pi(M-1)\delta}\right),$$

the $M \times (M - N_c + 1)$ convolution (tall Toeplitz) matrix

$$\mathbf{S} = \begin{bmatrix} s_0 & 0 & \cdots & 0 \\ s_1 & s_0 & \ddots & \vdots \\ \vdots & s_1 & \ddots & 0 \\ s_{N_c-1} & \vdots & \ddots & s_0 \\ 0 & s_{N_c-1} & & s_1 \\ \vdots & \ddots & \ddots & \vdots \\ 0 & \cdots & 0 & s_{N_c-1} \end{bmatrix},$$

and the matrix $\mathbf{r}(\tau)$ of dimension $(M - N_c + 1) \times P$ whose $p$-th column (for $p = 0, \ldots, P-1$) has entries

$$r(\ell - \tau - \tau_p/T_s), \quad \text{for } \ell = 0, \ldots, M - N_c.$$

Footnote 8: This is related to $\delta_i, \delta_j$ as defined in (2.9) by $\delta = \frac{\delta_i - \delta_j}{N+L}$, when $i$ is the transmitter and $j$ the receiver.

In this way, the discrete-time sampled version of the convolution $(r * h)(t)$ is given by the product $\mathbf{r}(\tau)\mathbf{h}$, where $\mathbf{h} = (h_0, \ldots, h_{P-1})^T$ is the vector of path coefficients. Eventually we arrive at the vector observation model

$$\mathbf{y} = \Psi(\delta, \tau)\, \mathbf{h} + \mathbf{z}, \tag{2.18}$$

where $\Psi(\delta, \tau) = \Omega(\delta)\, \mathbf{S}\, \mathbf{r}(\tau)$ and $\mathbf{z} = (z[0], \ldots, z[M-1])^T$. The joint maximum likelihood estimator (MLE) for $\delta$, $\tau$ and $\mathbf{h}$, assuming $N_0$ and $\{\tau_p\}$ known, is obtained by maximizing the likelihood function

$$\Lambda(\delta, \tau, \mathbf{h}) = \frac{1}{(\pi N_0)^M} \exp\left(-\frac{1}{N_0} \left\| \mathbf{y} - \Psi(\delta, \tau)\mathbf{h} \right\|^2\right), \tag{2.19}$$

or equivalently by minimizing the squared distance $\|\mathbf{y} - \Psi(\delta, \tau)\mathbf{h}\|^2$. For given $\delta$ and $\tau$, the minimization with respect to $\mathbf{h}$ yields the classical LS solution

$$\hat{\mathbf{h}} = \left(\Psi^H(\delta, \tau) \Psi(\delta, \tau)\right)^{-1} \Psi^H(\delta, \tau)\, \mathbf{y}.$$

Replacing this in the objective function, after some simple algebra we find that the joint ML estimator is obtained by maximizing with respect to $\delta$ and $\tau$ the quadratic form

$$F(\delta, \tau) = \mathbf{y}^H \Psi(\delta, \tau) \left(\Psi^H(\delta, \tau) \Psi(\delta, \tau)\right)^{-1} \Psi^H(\delta, \tau)\, \mathbf{y}. \tag{2.20}$$

This can be interpreted as maximizing the energy of the projection of $\mathbf{y}$ onto the column space of the signal matrix $\Psi(\delta, \tau)$. In practice, the joint ML estimate is obtained by searching over a sufficiently fine grid of two-dimensional points with respect to the variables $\delta$ and $\tau$.

2.3.2 Timing and Frequency Cramér-Rao Bound

Next, we derive the Cramér-Rao Bound (CRB) for this estimation problem, which provides a lower bound on the error variance of any unbiased estimator. The maximum likelihood estimator exhibits a threshold behavior: above a certain SNR, the estimator essentially achieves the CRB [63]. Thus the CRB is a good approximation for the performance of the MLE at reasonably high SNR values.
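Returning to the grid search of Section 2.3.1, the following is a minimal sketch of how (2.20) might be evaluated, under simplifying assumptions that are not part of the original derivation: a single-tap channel ($P = 1$) and an integer normalized timing offset, so that $\Psi(\delta, \tau)$ collapses to a single column and $F(\delta, \tau) = |\psi^H \mathbf{y}|^2 / \|\psi\|^2$. The function names, pilot length, noise level, and grid spacing are all hypothetical choices made for illustration.

```python
import cmath
import math
import random


def shifted_pilot(pilot, tau, m_len):
    """Place the pilot tau samples into an observation window of m_len samples."""
    out = [0j] * m_len
    for n, chip in enumerate(pilot):
        out[tau + n] = chip
    return out


def projection_energy(y, pilot, delta, tau):
    """F(delta, tau) of (2.20) for one tap: |psi^H y|^2 / ||psi||^2."""
    psi = [c * cmath.exp(2j * math.pi * delta * m)
           for m, c in enumerate(shifted_pilot(pilot, tau, len(y)))]
    corr = sum(p.conjugate() * v for p, v in zip(psi, y))
    energy = sum(abs(p) ** 2 for p in psi)
    return abs(corr) ** 2 / energy


def ml_grid_search(y, pilot, delta_grid, tau_grid):
    """Return the (delta, tau) grid point that maximizes the projection energy."""
    return max(((d, t) for d in delta_grid for t in tau_grid),
               key=lambda dt: projection_energy(y, pilot, dt[0], dt[1]))


# Hypothetical test signal: 32 QPSK chips, unknown complex gain, small noise.
rng = random.Random(0)
qpsk = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]
pilot = [rng.choice(qpsk) / math.sqrt(2) for _ in range(32)]
true_delta, true_tau, m_len = 0.004, 7, 48
h0 = 0.8 * cmath.exp(0.3j)
y = [h0 * c * cmath.exp(2j * math.pi * true_delta * m)
     + complex(rng.gauss(0, 0.01), rng.gauss(0, 0.01))
     for m, c in enumerate(shifted_pilot(pilot, true_tau, m_len))]

delta_grid = [k * 0.002 for k in range(-5, 6)]      # normalized CFO candidates
tau_grid = range(0, m_len - len(pilot) + 1)         # integer TO candidates
delta_hat, tau_hat = ml_grid_search(y, pilot, delta_grid, tau_grid)
```

The cost grows with the product of the two grid sizes, which is one way to see why the text later calls the ML estimator computationally demanding for a real-time system. A multipath version would replace the single-column correlation with the full least-squares projection.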
It is well known that when the likelihood function is a multivariate Gaussian pdf $\mathcal{CN}(\mathbf{m}(\theta), \Sigma)$ with the dependence on the parameters $\theta$ in the mean, then the Fisher information matrix $\mathbf{J}(\theta)$ has $(a,b)$ elements [64]

$$J_{a,b}(\theta) = 2\,\mathrm{Re}\left\{ \left(\frac{\partial \mathbf{m}(\theta)}{\partial \theta_a}\right)^H \Sigma^{-1}\, \frac{\partial \mathbf{m}(\theta)}{\partial \theta_b} \right\}.$$

In our case, we have $\Sigma = N_0 \mathbf{I}$, $\mathbf{m}(\theta) = \Omega(\delta)\, \mathbf{S}\, \mathbf{r}(\tau)\, \mathbf{h}$, and the vector of parameters $\theta$ has elements

$$\theta_1 = \delta, \quad \theta_2 = \tau, \quad \theta_{2p+3} = \mathrm{Re}\{h_p\}, \quad \theta_{2p+4} = \mathrm{Im}\{h_p\}, \quad p = 0, \ldots, P-1.$$

Define $\mathbf{D} = \mathrm{diag}(0, 1, \ldots, M-1)$ and let $\mathbf{e}_p$ denote the $P \times 1$ vector with all zeros and a single 1 in position $p$, for $p = 1, \ldots, P$. Also let $\mathbf{r}'(\tau)$ denote the derivative of $\mathbf{r}(\tau)$ with respect to $\tau$, whose $p$-th column has only two non-zero elements (footnote 9), equal to $-1$ at component $\ell = \lfloor \tau + \tau_p/T_s \rfloor$ and equal to $+1$ at component $\ell = \lfloor \tau + \tau_p/T_s \rfloor + 1$. Then, we can write

$$\frac{\partial \mathbf{m}(\theta)}{\partial \delta} = j2\pi\, \mathbf{D}\, \Omega(\delta)\, \mathbf{S}\, \mathbf{r}(\tau)\, \mathbf{h}, \tag{2.21}$$

$$\frac{\partial \mathbf{m}(\theta)}{\partial \tau} = \Omega(\delta)\, \mathbf{S}\, \mathbf{r}'(\tau)\, \mathbf{h}, \tag{2.22}$$

$$\frac{\partial \mathbf{m}(\theta)}{\partial\, \mathrm{Re}\{h_p\}} = \Omega(\delta)\, \mathbf{S}\, \mathbf{r}(\tau)\, \mathbf{e}_p, \tag{2.23}$$

$$\frac{\partial \mathbf{m}(\theta)}{\partial\, \mathrm{Im}\{h_p\}} = j\, \Omega(\delta)\, \mathbf{S}\, \mathbf{r}(\tau)\, \mathbf{e}_p. \tag{2.24}$$

Footnote 9: This is the only step where we make explicit use of the fact that $r(t)$ is a triangular waveform.

Defining the $M \times 2(P+1)$ matrix $\mathbf{K} = \left[ \frac{\partial \mathbf{m}(\theta)}{\partial \theta_1}, \frac{\partial \mathbf{m}(\theta)}{\partial \theta_2}, \ldots, \frac{\partial \mathbf{m}(\theta)}{\partial \theta_{2(P+1)}} \right]$, the Fisher information matrix is obtained as

$$\mathbf{J}(\theta) = \frac{2}{N_0}\, \mathrm{Re}\left\{ \mathbf{K}^H \mathbf{K} \right\}. \tag{2.25}$$

Finally, the CRBs for $\delta$ and $\tau$ are given by the first two diagonal elements of $\mathbf{J}^{-1}(\theta)$. It is easy to check that the CRB decreases as $O(1/\mathrm{SNR})$, where SNR denotes the signal-to-noise ratio at the receiver [64].

Figure 2.2a shows the CRB for $\delta = 0$ and $\tau = 0$,

$$\mathbf{h} = (1, 0.5, 0.2, 0.1, 0.05, 0.005)^T,$$

with path delays (normalized with respect to $T_s$) equal to

$$(0, 1.75, 3.56, 7.90, 10.72, 15.30),$$

respectively. The pilot sequence $\mathbf{s}$ is formed by a block of 64 pseudo-random chips with values in the QPSK constellation, repeated twice and separated by 128 chips equal to 0, for a total pilot sequence length of 256 chips. This corresponds to less than four 64-carrier OFDM symbols (in fact, the OFDM symbols also include the cyclic prefix, which is not necessary for this time-domain pilot burst). The simulated MSE of the joint ML estimator derived before is also shown for comparison in the figure. As the figure reveals, the simulated estimation MSE is very close to the corresponding CRB even for very low SNR. Figure 2.2b shows the associated MSEs of the joint ML estimator assuming a single-path channel between the two APs. The behavior of the MSE is very similar to the case of Figure 2.2a, showing that the joint ML estimator is remarkably insensitive to the actual channel delay-intensity profile.

[Figure 2.2: CRB (solid line) and simulated estimation MSE (asterisks) of the joint ML estimation for (a) a multi-path channel and (b) a single-tap channel. Axes: receiver SNR (dB) vs. mean-square error. Curves: frequency, timing.]

2.3.3 OFDM Synchronization

OFDM synchronization refers to the acquisition of estimates of the timing and frequency offsets between an OFDM transmitter/receiver pair. While maximum likelihood estimation of these quantities is generally preferred, the maximum likelihood estimator has unreasonable computational requirements for a real-time system. As such, we seek computationally simpler methods which achieve similar performance (i.e., which come close to the CRB at moderate to high SNR). As shown in Section 2.2, a small frequency offset of the carrier clock (i.e., one that does not induce ICI) can be observed as a complex rotation of the baseband symbols that increases linearly (mod $2\pi$) with time. This fact is used by most methods of estimating the carrier frequency offset, which are essentially derived from the classical setting of estimating a complex sinusoid in additive noise [61, 62].
The primary distinction among these methods is the type of pilot structure they use, which may either be transmitted prior to data transmission on all subcarriers or during data transmission on selected subcarriers.

Timing offset estimation is typically accomplished with a traditional matched filter which correlates the received signal either with a known preamble or with itself (which requires that the preamble be periodic). The former approach is used in the 802.11 standard, wherein the start of a PHY layer frame contains a known sequence which is matched filtered using DSP multiply-accumulate blocks. The mathematics of these methods is essentially the same and is straightforward. Assume that the length of the correlating sequence is $M$, and that the received time-domain OFDM samples are given by $\{y[n]\}_{n=0}^{Q-1}$. Let $\{y_s[n]\}$ denote the sequence against which the received symbols are being correlated. In the case of a known preamble, $\{y_s[n]\}$ is the sequence of known symbols; in the case of a signal being correlated with itself, $\{y_s[n]\}$ is the second period of the signal. To determine the TO $\tau$, the receiver calculates

$$\hat{\tau} = \arg\max_{\tau} \left| \sum_{i=0}^{M-1} y[i + \tau]\, y_s^*[i] \right|. \tag{2.26}$$

This is directly applicable for the case of a known preamble, but for sequences which utilize autocorrelation, some further specificity is needed. In particular, the Schmidl-Cox method [20] utilizes an OFDM symbol that is repeated in the time domain with a period of one half the OFDM symbol duration. Thus in this case $M = N/2$, and $y_s[i] = y[i + \tau + N/2]$. This approach has the advantage of not requiring the receivers to know the $\{y_s[n]\}$ sequence a priori, but at the expense of a loss in receiver sensitivity due to the shortened sequence.
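As a rough illustration of the autocorrelation form of (2.26), the sketch below slides a window over the received samples and correlates each half-symbol with the half-symbol that follows it, as in the Schmidl-Cox setting with $M = N/2$. The function name, the test signal, and the noise level are hypothetical, and the magnitude is taken after the sum (the usual matched-filter form), which is an assumption about how (2.26) is meant to be read.

```python
import cmath
import math
import random


def schmidl_cox_timing(y, n_fft):
    """Return tau maximizing |sum_i y[i+tau] * conj(y[i+tau+N/2])| with M = N/2."""
    half = n_fft // 2
    best_tau, best_metric = 0, -1.0
    for tau in range(len(y) - n_fft + 1):
        acc = sum(y[i + tau] * y[i + tau + half].conjugate() for i in range(half))
        if abs(acc) > best_metric:
            best_tau, best_metric = tau, abs(acc)
    return best_tau


# Hypothetical received burst: noise, then one OFDM symbol made of two
# identical halves (unit-modulus QPSK chips), then noise again.
rng = random.Random(1)
n_fft, true_tau = 64, 20


def noise():
    return complex(rng.gauss(0, 0.02), rng.gauss(0, 0.02))


half_block = [cmath.exp(1j * math.pi / 2 * rng.randrange(4)) for _ in range(n_fft // 2)]
y = [noise() for _ in range(true_tau)]
y += [c + noise() for c in half_block + half_block]
y += [noise() for _ in range(20)]

tau_hat = schmidl_cox_timing(y, n_fft)
```

Because the receiver only compares the signal with a delayed copy of itself, nothing in the function depends on the chip values, which mirrors the remark that the sequence need not be known a priori.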
In the classical case of a sinusoid in complex noise, the CRB varies inversely with a cubic polynomial of the sequence length, so the MSE of the Schmidl-Cox method is roughly 9 dB worse than that of the known-preamble method using one full OFDM symbol (halving the sequence length costs a factor of about $2^3 = 8$, and $10 \log_{10} 8 \approx 9$ dB).

The Van de Beek method [21] is similar to Schmidl-Cox, but instead of using additional pilot sequences, it exploits the fact that OFDM symbols already contain repeated data in the cyclic prefix. It again autocorrelates the signal to find the timing peak using (2.26), but in this case $M = L$, the length of the cyclic prefix, and $y_s[i] = y[i + \tau + N]$. The Van de Beek method has the advantage of not requiring an explicit preamble for synchronization. Additionally, it may use the cyclic prefixes of multiple OFDM symbols in its estimation, so longer packets provide greater fidelity to the estimator. In practice, however, the ability to utilize longer sequences is limited in hardware by the size of the DSP's multiply-accumulate blocks, and so these longer sequences may not be of practical use. Also, since the cyclic prefix is used to guard against inter-symbol interference, the samples from the CP can be corrupted by multipath interference.

Once the timing offset estimate $\hat{\tau}$ is acquired, training symbols are again used to estimate the carrier frequency offset. The carrier frequency offset is calculated by observing the phase rotation across the two repeated sequences:

$$\widehat{\Delta f} = \frac{1}{2\pi M} \arg\left\{ \sum_{i=0}^{M-1} y^*[i + \hat{\tau}]\, y_s[i] \right\}, \tag{2.27}$$

where the definitions of the sequences $\{y\}$ and $\{y_s\}$ are the same as before. This approach is common to the preamble, Schmidl-Cox, and Van de Beek methods.

A similar method is used for the continuous pilot tones transmitted by 802.11. In this case, a number of subcarriers are designated as pilot subcarriers, and a signal with constant phase is transmitted on these tones.
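The phase-based estimate in (2.27) can be sketched as follows for the case where the second sequence is the received signal delayed by $M$ samples, i.e. $y_s[i] = y[i + \hat{\tau} + M]$. The result is a normalized offset in cycles per sample. The function name and the test signal are hypothetical, and the estimate is only unambiguous when the true normalized offset has magnitude below $1/(2M)$, since the measured angle wraps at $\pm\pi$.

```python
import cmath
import math
import random


def estimate_cfo(y, tau_hat, m_len):
    """Normalized CFO (cycles/sample) from two copies spaced m_len samples apart."""
    acc = sum(y[i + tau_hat].conjugate() * y[i + tau_hat + m_len]
              for i in range(m_len))
    return cmath.phase(acc) / (2 * math.pi * m_len)


# Hypothetical burst: a 32-chip block sent twice, rotated by a CFO of
# 0.003 cycles per sample, with a small amount of additive noise.
rng = random.Random(2)
m_len, true_cfo, tau_hat = 32, 0.003, 5
block = [cmath.exp(1j * math.pi / 2 * rng.randrange(4)) for _ in range(m_len)]
tx = [0j] * tau_hat + block + block
y = [x * cmath.exp(2j * math.pi * true_cfo * n)
     + complex(rng.gauss(0, 0.01), rng.gauss(0, 0.01))
     for n, x in enumerate(tx)]

cfo_hat = estimate_cfo(y, tau_hat, m_len)
```

Multiplying the normalized estimate by the sampling frequency would convert it to Hz; that conversion is left out here because it depends on the system parameters in use.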
Using the same equation (2.27), the receiver observes the phase rotation across the block to form its estimate of the CFO. This method is not desirable for exchanging frequency estimates among APs in a Distributed MIMO setting, as the overhead of transmitting full packets is too large. If the APs were capable of full duplex communication, these longer pilots might be feasible, with each AP transmitting on a set of tones.

An additional pilot structure introduced in AirSync [30] utilizes a combination of a few of the above methods, with a simple extension to improve performance. For the TO, AirSync uses the 802.11 short training sequence as the time domain preamble, and uses a correlator to detect the start of the packet. The pilots used for the frequency offset estimation are a series of four OFDM symbols, each comprised of a Zadoff-Chu sequence (a constant amplitude, zero autocorrelation sequence). Two of these pilot bursts are separated by a set number of OFDM symbols (this number is variable; for the simulations we used 60) to improve the CFO estimation. The CFO estimation consists of two steps: first, the same method as Schmidl-Cox is used over the first four OFDM symbols to determine a coarse estimate of the CFO. Second, the estimate is refined by observing the angle from the first group of OFDM symbols to the second group. The symbols in between the two pilot bursts can be utilized by additional pilot transmissions.

[Figure 2.3: A comparison of several OFDM frequency synchronization techniques. Note that the varying quality of the estimators is almost entirely a function of the number of samples used by the technique, rather than fundamental differences in the estimation procedures. Axes: SNR (dB) vs. frequency RMSE (Hz). Curves: Two-Burst AirSync, Two-Burst CRB, Single-Burst AirSync, Van de Beek, Single-Burst CRB, Maximum-Likelihood.]
In this way, a large number of pilots may be exchanged in a Distributed MIMO network while simultaneously increasing the accuracy of the estimator and lowering the overhead.

A summary of the performance of the above methods is illustrated in Figure 2.3. Note that it is difficult to draw a direct performance comparison between the above methods, as they all use different length pilot sequences even under the same system parameters. Of note, however, is that the AirSync method significantly outperforms both the Schmidl-Cox and maximum likelihood approaches using the same number of transmitted samples. The AirSync method seemingly violates the CRB because it effectively uses 68 OFDM symbols, even though nothing is transmitted on 60 of them. It instead meets the CRB of a sequence with this length.

2.3.4 Timing Synchronization by Sample Insertion/Deletion

In order to keep APs time-synchronous within a CP between two rounds of synchronization, we must eliminate the cumulative effect of the SFO by inserting or deleting samples. For example, suppose that an AP clock is 1% faster than the common sampling frequency dictated by the CS; then this AP must repeat a sample every 100 samples. In order to implement this sample insertion/deletion scheme, each AP needs to know the multiplicative SFO (e.g., that one clock is 1% faster) with respect to a common sampling frequency. We notice that if $f_{0,1}$ coincides with the nominal carrier frequency $f_0$, then the corrective phase rotation applied each symbol is $\delta_i^{\mathrm{corr}} = \kappa\, \epsilon_i \frac{(N+L)}{f_s}$. Hence, it is sufficient to include in the network one reference anchor AP with a very precise oscillator (e.g., an expensive oven-controlled oscillator), in order to "transfer" its precision to all other APs in the network. If the number of APs in the network is very large, the marginal cost of one expensive oscillator is negligible.
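The 1% example above can be made concrete with a small sketch. It assumes the relative offset is already known and constant, repeats every `period`-th sample for a fast clock as the text describes, and drops every `period`-th sample for a slow clock. The slow-clock branch and the function name are assumptions added for symmetry; they are not spelled out in the text.

```python
def stuff_samples(samples, rel_offset):
    """Insert or delete one sample every round(1/|rel_offset|) samples.

    rel_offset > 0: local clock is fast, so a sample is repeated.
    rel_offset < 0: local clock is slow, so a sample is dropped (assumed).
    """
    if rel_offset == 0:
        return list(samples)
    period = round(1 / abs(rel_offset))
    out = []
    for n, x in enumerate(samples, start=1):
        if rel_offset < 0 and n % period == 0:
            continue                  # slow clock: skip every period-th sample
        out.append(x)
        if rel_offset > 0 and n % period == 0:
            out.append(x)             # fast clock: repeat every period-th sample
    return out


fast = stuff_samples(range(1000), 0.01)    # clock 1% fast: one repeat per 100
slow = stuff_samples(range(1000), -0.01)   # clock 1% slow: one drop per 100
```

For a 20 ppm oscillator the same rule gives one inserted or deleted sample roughly every 50,000 samples, which is why the correction is described as occasional.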
Under this assumption, we have that

$$f_{s,i} = f_s + \epsilon_i \approx f_s \left(1 + \frac{\delta_i^{\mathrm{corr}}}{\kappa (N+L)}\right).$$

Hence, with good approximation, the multiplicative SFO factor is given by $1 + \frac{\delta_i^{\mathrm{corr}}}{\kappa (N+L)}$.

Applying this approximation to the system in the previous example, we investigate the time it takes until the approximation causes a timing error (i.e., until two APs become misaligned by more than a CP). We explore this scenario assuming 802.11 compliant (20 ppm) oscillators. We use an AP-to-AP SNR of 20 dB, and assume the same criteria as discussed in Section 2.3.2 (the channel impulse response is line-of-sight, we use the MSE corresponding to the case of $N_c = 256$, and we use the same pilot burst format). Figure 2.4 shows that even after $10^6$ OFDM symbols, the probability of misalignment by more than a CP interval is still less than $10^{-3}$. As a result, we conclude that the proposed approximation and sample insertion/deletion method is very accurate for any practical purpose.

[Figure 2.4: CDF of the time-to-asynchronism (time until two APs are more than a CP asynchronous) of the imperfect sample insertion/deletion mechanism. Axes: elapsed time (OFDM symbols, ×10^6) vs. Pr{Async}.]

2.4 AirSync: A Master/Secondary Synchronization Protocol

The validity of the model laid out in Section 2.2 has been verified experimentally using the WARP software defined radio platform [71]. We let a transmitter send several tone signals, i.e., simple unmodulated sine waves corresponding to different subcarriers of the OFDM modulation, and use a receiver to sample, demodulate and extract the instantaneous phase trajectory of the received tones. In the absence of phase offset, these signals would exhibit a constant phase when measured over a sequence of several OFDM symbols.
Instead, the measured instantaneous phases are time-varying and closely approximate parallel straight lines, as shown in Figure 2.5. The common slope of these straight lines is given by the carrier frequency offset $\Delta f$ between transmitter and receiver. The spacing between the lines is given by constant phase terms $\frac{2\pi\nu}{N T_s} \tau_i$ for different subcarrier index $\nu$, and depends on the time misalignment $\tau_i$ between the AP and the nominal slot initial time. The small fluctuations around the linear behavior of the instantaneous phase are due to noise, which is quite small for the WARP hardware used in our system, as can be observed qualitatively from plots such as Figure 2.5.

[Figure 2.5: Pilot phases measured over 35 OFDM symbols and across 4 subcarriers of a 64 subcarrier system. Axes: time (OFDM symbols) vs. phase (degrees).]

It follows that by estimating the spacing between the phase trajectories (intercepts with the horizontal axis) and their common slope, we can track and predict the phase de-rotation coefficients to be applied at each AP in order to "undo" the effect of the matrix $\Phi[m,\nu]$. Notice that the de-rotation factor must be predicted a few OFDM symbols ahead, in order to account for the delay of the hardware implementation between when an OFDM symbol is produced by the baseband processor (FPGA) and when it is actually transmitted.

The fact that the common phase drift of all subcarriers can be predicted by observing only a few of them prompts the following approach to achieving phase synchronization between access points: a main access point (master) is chosen to transmit a reference signal consisting of several pilot tones (footnote 10) placed outside the data transmission band, in a reserved portion of the available bandwidth. An initial channel probing header, transmitted by the master access point, is used by the other transmitters in order to get an initial phase estimate for each carrier. After this initial estimate is obtained, the phase estimates are updated using the phase drift measured by tracking the pilot signals. The estimate is used to calculate the difference between the carrier phase of each secondary transmitter and the phase of the master transmitter. This difference depends on the timing offset between the starting points of their frames and the frequency offset between the carrier frequency of the master AP (denoted by $f_{c,1}$) and the carrier frequency of each secondary AP (denoted by $f_{c,i}$, for $i > 1$). After obtaining the channel estimate, the secondary transmitters are able to undo the effect of the instantaneous phase difference by derotating the transmitted frequency-domain symbols by the phase difference term along the whole transmission slot. Thus we eliminate the presence of the time-varying diagonal matrix $\Phi[m,\nu]$ in front of the estimated channel matrix and therefore achieve the desired MU-MIMO precoding along the whole transmission slot.

Footnote 10: The use of multiple pilot tones ensures frequency diversity and spreads the pilot signal power over multiple frequency bins.

More specifically, at time $m = 0$, the $\nu$-th subcarrier signal generated by the master AP has the phase $\frac{2\pi\nu}{N T_s} \tau_1 + \phi_1[0]$, while the carrier generated by AP $i$ has the phase $\frac{2\pi\nu}{N T_s} \tau_i + \phi_i[0]$. The instantaneous phase difference obtained from the master pilot tones is, ignoring the phase noise terms, $\frac{2\pi\nu}{N T_s} (\tau_1 - \tau_i) + \phi_1[0] - \phi_i[0] + \angle H_i[\nu]$, where $\angle H_i[\nu]$ is the phase of the channel coefficient between the master AP and AP $i$. If this phase estimate is added to the phase of the generated $\nu$-th subcarrier at AP $i$, the resulting phase becomes $\frac{2\pi\nu}{N T_s} \tau_1 + \phi_1[0] + \angle H_i[\nu]$, i.e., the phase of AP $i$ is the phase of the master AP plus an offset $\angle H_i[\nu]$.
To keep this offset constant over the duration of a transmission slot, the estimate must be adjusted by adding the linear relative phase drift term $2\pi(\delta_1 - \delta_i)m$ for all $m$ ranging over the transmission slot. In this way, after the phase compensation all APs transmit at the frequency $f_{c,1}$ of the master AP. The drift $2\pi(\delta_1 - \delta_i)m$ is estimated based on the out-of-band pilots, using a sliding window smoothing filter over four samples to compute an updated value of the "slope" $\delta_1 - \delta_i$. The secondary AP predicts, based on the current estimate, the instantaneous phase with a few OFDM symbols of look-ahead. The need for look-ahead prediction arises from the fact that the AP must align its phase to the phase of the reference at the moment of the actual transmission, not at the moment that the estimate has been recorded. Thus the look-ahead time of $d$ OFDM symbols corresponds to the synchronization circuit delay. The prediction is obtained by simple linear extrapolation, by letting the correction term at time $m + d$ be given by $2\pi(\hat{\delta}_1 - \hat{\delta}_i)(m + d)$, where $\hat{\delta}_1 - \hat{\delta}_i$ is the estimated slope at time $m$. The constant offset $\angle H_i[\nu]$ becomes a part of the downlink channel estimates and poses no further problems with regard to synchronization, both when using downlink and uplink channel estimation schemes.

In the implementation in [30], for simplicity, we obtain an individual phase estimate of the form $\frac{2\pi\nu}{N T_s} (\tau_1 - \tau_i) + \phi_1[0] - \phi_i[0] + \angle H_i[\nu]$ for every subcarrier and use it independently of the estimates for other subcarriers in correcting the subcarrier phase. The form of the phase estimate suggests that it is possible to obtain a better estimate by breaking the estimation process into two distinct parts: obtaining an initial, high quality estimate of the constant $\angle H_i[\nu]$ during a system calibration step, and then estimating just the two factors $\tau_1 - \tau_i$ and $\phi_1[0] - \phi_i[0]$ in subsequent packet transmissions.
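The look-ahead prediction described above can be sketched as follows. The text specifies a sliding window over four samples and linear extrapolation by $d$ OFDM symbols; the particular smoothing used here (a plain average of successive wrapped phase differences) and the function names are assumptions for illustration, not the filter used in [30].

```python
import math


def wrap(angle):
    """Wrap an angle in radians to the interval [-pi, pi]."""
    return math.atan2(math.sin(angle), math.cos(angle))


def predict_phase(phases, d, window=4):
    """Extrapolate the pilot phase d OFDM symbols past the latest measurement.

    The slope is the mean of the wrapped phase steps inside the last
    `window` measurements, i.e. an estimate of the per-symbol drift.
    """
    recent = phases[-window:]
    steps = [wrap(b - a) for a, b in zip(recent, recent[1:])]
    slope = sum(steps) / len(steps)
    return wrap(recent[-1] + slope * d)


# Hypothetical pilot phase track: 0.25 rad of drift per OFDM symbol.
phases = [wrap(0.3 + 0.25 * m) for m in range(20)]
predicted = predict_phase(phases, d=3)     # phase expected at symbol 22
```

Working with wrapped differences avoids a separate unwrapping pass, at the cost of requiring the per-symbol drift to stay below $\pi$ in magnitude.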
The constant estimate in this case is needed since undoing the angle $\angle H_i[\nu]$ amounts to equalizing the channel between the master AP and the $i$-th AP. After equalizing the channel, the resulting phases can be unwrapped along the carrier index. It follows that, after compensating for the angle $\angle H_i$, the phase of the estimate is $\frac{2\pi\nu}{N T_s} (\tau_1 - \tau_i) + \phi_1[0] - \phi_i[0]$, which is linear in the carrier index $\nu$ plus a constant term. A linear MMSE fit can be applied in order to find the two factors mentioned, which are in fact the slope of the line (the carrier phase with regard to the subcarrier index) and its intercept.

2.5 Consensus-Based Synchronization

Consensus algorithms arise from the need for coordination of large systems of autonomous agents. They elegantly describe many natural phenomena (such as bird flocking and firefly synchronization), and have applications in autonomous control, wireless sensor networks, and economics, among many others. Several types of consensus algorithms exist, though they all have in common that the agents, acting only on local information, cause their local state parameter to converge to a common global value. One common approach is the so-called "consensus-averaging" algorithm, wherein each agent updates its state according to an average of its neighbors' states. Such algorithms are useful for understanding the evolution of consensus problems, though in the presence of noisy measurements, they exhibit a random-walk-like behavior and eventually diverge [78]. Instead, we will utilize a consensus algorithm with a decreasing step size, such that later measurements have less impact on the state update, which is provably convergent [79].

Assume we have $N_a$ agents, which in this case are the APs in our network. We will use the terms agents, nodes, and APs interchangeably. Let $s_i[t]$ represent some internal state or parameter of node $i$ at time $t \in \mathbb{Z}_+$, and let $\mathcal{N}_i$ represent the neighbors of node $i$.
Each time step, node $i$ acquires a noisy measurement of the states of its neighbors:

$$r_{k \to i}[t] = s_k[t] + z_{k \to i}[t] \quad \forall k \in \mathcal{N}_i, \tag{2.28}$$

where $z_{k \to i}[t]$ are noise terms that are independent across $i$, $k$, and $t$, though not necessarily identically distributed. From [79], we have the following state update equation:

$$s_i[t+1] = s_i[t] + a[t] \left( \frac{1}{|\mathcal{N}_i|} \sum_{k \in \mathcal{N}_i} r_{k \to i}[t] - s_i[t] \right), \tag{2.29}$$

where $a[t]$ is a decreasing step size that ensures convergence. The choice of decreasing step sizes for $a[t]$ guarantees mean square convergence [79]. For a stronger form of convergence, the authors in [80] assume $z_{k \to i}[t]$ to be i.i.d. and prove that there exists a random variable $s^*$ such that, with probability one, $\lim_{t \to \infty} s_i[t] = s^*$ for all nodes $i$. It is worthwhile to notice that the decrease of the step size $a[t]$ on one hand reduces the effect of the noise, leading to convergence, and on the other hand reduces the ability to drive the nodes' states towards each other. In general, the step size must be such that $\sum_t a[t] = \infty$ while $\sum_t a[t]^2 < \infty$. Therefore a sequence which decreases faster than $1/\sqrt{t}$ and no faster than $1/t$ suffices. A step size of $a[t] = (t+5)^{-0.65}$ is selected as a good trade-off between reaching convergence and the rate of convergence.

In order to map the carrier frequency consensus to the general consensus problem, a natural choice would be to define the state of node $i$ as $f_{0,i}[t]$ (the carrier frequency at iteration $t$). Note, however, that we do not have access to the carrier frequency (which is assumed constant), and therefore a direct application of the update equation (2.29) is not possible. Further, because each node has a carrier frequency that differs from the nominal value by some unknown, deterministic quantity, we do not know which node, if any, has the "correct" frequency. This implies that we cannot even measure the states as in equation (2.28).
However, we can measure the difference between two carrier frequencies, and we can change the effective carrier frequency by introducing a phase rotation across OFDM symbols (see Section 2.3). Let this effective frequency be denoted as

$$f_i[t] = f_i[t-1] + f_i^{\mathrm{corr}}[t],$$

where $f_i^{\mathrm{corr}}[t]$ is a correction factor and $f_i[0] = f_{0,i}$. As such, we have the following measurement equation:

$$\Delta_{k \to i}[t] = (f_k[t] - f_i[t]) + z_{k \to i}[t] \quad \forall k \in \mathcal{N}_i. \tag{2.30}$$

If we could have noisy observations of the other states (calling them $r_{k \to i}[t]$ as before), from (2.29) we would have

$$f_i[t+1] - f_i[t] = a[t] \left( \frac{1}{|\mathcal{N}_i|} \sum_{k} r_{k \to i}[t] - f_i[t] \right) \tag{2.31}$$

$$= a[t] \left( \frac{1}{|\mathcal{N}_i|} \sum_{k} \left( f_k[t] + z_{k \to i}[t] - f_i[t] \right) \right)$$

$$= a[t] \left( \frac{1}{|\mathcal{N}_i|} \sum_{k} \Delta_{k \to i}[t] \right), \tag{2.32}$$

but $\Delta_{k \to i}[t]$ is precisely what we can obtain from a frequency estimator at AP $i$ estimating a pilot tone transmitted by AP $k$. Hence, we can directly calculate the update term

$$f_i^{\mathrm{corr}}[t+1] = f_i[t+1] - f_i[t] \tag{2.33}$$

$$= a[t] \left( \frac{1}{|\mathcal{N}_i|} \sum_{k} \Delta_{k \to i}[t] \right). \tag{2.34}$$

Applying such a frequency shift is simply a matter of rotating the frequency-domain symbols by a phase that increases linearly with the OFDM symbol index, as pointed out in Section 2.3:

$$\dot{X}_i[m,\nu] = X_i[m,\nu] \exp\left( j2\pi \left(1 + \frac{1}{\kappa} \left\langle \frac{\nu}{N} \right\rangle \right) f_i^{\mathrm{corr}}[m]\, T_s (N+L)\, m \right). \tag{2.35}$$

For the timing consensus, we can use the same update equations, but with a slight modification. Timing measurements are inherently biased by the propagation and hardware delays, but as discussed in Section 1.2, the setup phase of our network involves measuring these parameters. Since the APs are static and hardware delays are largely deterministic, we only need to measure these quantities once. Further, we know from Section 2.2 that the sampling clock periods are approximately equal, and that timing is only needed to within a cyclic prefix time, $L T_s$. As such, if we can measure timing synchronization to within one sample period, then we can correct the occasional extra/lost sample induced by the small SFO by simply advancing or decrementing the clock by one sample (see Section 2.3.4).
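To make the frequency update (2.30) to (2.34) concrete, here is a minimal simulation sketch. Each node only ever sees noisy pairwise differences, never an absolute frequency, and applies the step size $a[t] = (t+5)^{-0.65}$ quoted in the text. The fully connected six-node topology, the initial offsets, and the 5 Hz measurement noise are hypothetical values chosen for illustration and do not correspond to the topology used for Figure 2.6.

```python
import random


def frequency_consensus(f0, neighbors, iterations, noise_std, seed=0):
    """Iterate the difference-based update (2.34) and return final frequencies.

    f0:        initial effective carrier frequency offsets (Hz), one per node
    neighbors: neighbors[i] lists the nodes whose pilots node i can measure
    """
    rng = random.Random(seed)
    f = list(f0)
    for t in range(iterations):
        a = (t + 5) ** -0.65                       # decreasing step size a[t]
        corrections = []
        for i, nbrs in enumerate(neighbors):
            # Noisy pairwise measurements as in (2.30): f_k - f_i + z
            meas = [f[k] - f[i] + rng.gauss(0.0, noise_std) for k in nbrs]
            corrections.append(a * sum(meas) / len(meas))
        # All nodes apply their corrections simultaneously.
        f = [fi + ci for fi, ci in zip(f, corrections)]
    return f


f0 = [-15e3, 8e3, 20e3, -3e3, 12e3, -9e3]          # offsets from nominal, in Hz
full_mesh = [[k for k in range(len(f0)) if k != i] for i in range(len(f0))]
f_final = frequency_consensus(f0, full_mesh, iterations=100, noise_std=5.0)
spread = max(f_final) - min(f_final)               # residual disagreement, Hz
```

A sparser graph, such as a ring, would follow the same code with a different `neighbors` list but would be expected to need more iterations to reach the same spread, since information then travels only one hop per update.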
This ensures that we are no more than a few samples off after reaching consensus, which is within the cyclic prefix.

In this instance, assume we transmit from each AP at some known period $T_{\mathrm{probe}}$, e.g., every 1 s. Since each node knows the propagation delay from every other node in its neighborhood, we can determine the expected arrival time of a signal. Thus we want to measure the timing difference between each pair of nodes:
\[ \Delta\tau_{k\to i}[t] = (\tau_k[t] - \tau_i[t]) + z_{k\to i}[t] \quad \forall k \in \mathcal{N}_i \tag{2.36} \]
and we will update the time of our next transmission as follows:
\[ \tau_i[t+1] = \tau_i[t] + a[t]\left(\frac{1}{|\mathcal{N}_i|}\sum_{k\in\mathcal{N}_i} \Delta\tau_{k\to i}[t]\right). \tag{2.37} \]
The behavior of the timing and frequency consensus algorithms is illustrated in Figures 2.6a and 2.6b. The carrier consensus converges to within 10 Hz, which is well within the guidelines specified in Section 2.2. The timing consensus converges to within 5 samples, which again is well within the requirement of being within the cyclic prefix length. Both cases use the topology later described in Figure 3.1.

[Figure 2.6: Timing and carrier consensus time evolution. (a) Carrier frequency consensus evolution, offset from nominal frequency (kHz) versus time (iterations); the carrier consensus converges to a maximum CFO of 10 Hz. (b) Timing consensus evolution, timing offset (samples) versus time (iterations); the timing consensus converges to within 5 samples.]

2.6 Least Squares Synchronization

The long convergence time of the consensus approach does not lend itself to practical deployments. In reality, by the time the consensus approach has converged, the oscillators will have drifted to new frequencies.$^{11}$ Further, the assumption that the APs act as autonomous agents is overly restrictive in our case, since the central controller may allow for a more efficient form of synchronization.
In this section we introduce the least-squares synchronization approach, which utilizes the central controller to compute accurate frequency correction factors in one shot. As with consensus, this method neglects the effects of oscillator dynamics, but because it does not require multiple iterations to achieve frequency correction, it is more robust than consensus. We will begin to consider oscillator dynamics in Section 2.7.

Consider the architecture discussed in Section 1.2. Each anchor node collects pilot bursts from all its neighboring anchor nodes and produces estimates of the TO and CFO with respect to its neighbors. Any suitable estimation scheme can be used for this purpose, such as the joint ML timing and frequency estimator derived in Section 2.3.1. This estimator and the associated CRB are relevant to our scenario, where the pilot signal is observed through a multipath channel with known path delays and unknown path coefficients. We focus on the centralized estimation of the correction factors $\delta^{\mathrm{corr}}_i$ and $\tau^{\mathrm{corr}}_i$ from the noisy CFO and TO estimates
\[ \widehat{\delta}_{j\to i} = \delta_j - \delta_i + z_{j\to i}, \tag{2.38} \]
and
\[ \widehat{\tau}_{j\to i} = \tau_j - \tau_i + w_{j\to i}, \tag{2.39} \]
for all $i \in \mathcal{A}$ and for all $j \in \mathcal{A}(i)$,$^{12}$ where $z_{j\to i}$ and $w_{j\to i}$ are estimation errors with variance equal to the corresponding estimation MSE. As already noted, if $j \in \mathcal{A}(i)$ then $i \in \mathcal{A}(j)$. Therefore, these measurements always come in pairs. The CS collects all these estimates and computes the correction terms $\delta^{\mathrm{corr}}_i$, $\tau^{\mathrm{corr}}_i$. In the absence of estimation errors we would have $\widehat{\delta}_{j\to i} = \delta_j - \delta_i$.

Footnote 11: For example, [65] reports a convergence time of 100 seconds, while the oscillator dynamics vary on the order of seconds. Though this section does not consider oscillator dynamics either, it has the advantage of synchronizing on a time scale much smaller than the oscillators' temperature-induced fluctuations.
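Hence, a reasonable approach consists of minimizing the weighted LS cost function
\[ J(\delta_1,\ldots,\delta_{|\mathcal{A}|}) = \sum_{(i,j)\in\mathcal{E}(\mathcal{A})} \omega_{j,i}\,\big|\widehat{\delta}_{j\to i} - (\delta_j - \delta_i)\big|^2, \tag{2.40} \]
where $\omega_{j,i} = 1/\mathbb{E}[|z_{j\to i}|^2]$ is the inverse estimation MSE for the link $j \to i$. Notice that if the estimation errors $z_{j\to i}$ are Gaussian i.i.d., then the minimizer of (2.40) coincides with the joint ML estimator based on the observations (2.38). Taking the partial derivative with respect to $\delta_i$, and setting it equal to zero, we obtain the set of linear equations
\[ \sum_{j\in\mathcal{A}(i)} (\omega_{i,j} + \omega_{j,i})(\delta_i - \delta_j) = \sum_{j\in\mathcal{A}(i)} \big(\omega_{i,j}\,\widehat{\delta}_{i\to j} - \omega_{j,i}\,\widehat{\delta}_{j\to i}\big), \tag{2.41} \]
that can be written in matrix form as
\[ \mathbf{L}\boldsymbol{\delta} = \mathbf{u}, \tag{2.42} \]
where $\boldsymbol{\delta} = (\delta_1,\ldots,\delta_{|\mathcal{A}|})^{\mathsf T}$, $\mathbf{u}$ is the $|\mathcal{A}| \times 1$ vector with $i$-th element
\[ \sum_{j\in\mathcal{A}(i)} \big(\omega_{i,j}\,\widehat{\delta}_{i\to j} - \omega_{j,i}\,\widehat{\delta}_{j\to i}\big), \tag{2.43} \]
and $\mathbf{L}$ is the $|\mathcal{A}| \times |\mathcal{A}|$ matrix with $(i,j)$-th elements
\[ [\mathbf{L}]_{i,j} = \begin{cases} \sum_{k\in\mathcal{A}(i)} (\omega_{i,k} + \omega_{k,i}) & \text{for } j = i \\ -(\omega_{i,j} + \omega_{j,i}) & \text{for } j \neq i,\ j \in \mathcal{A}(i) \\ 0 & \text{otherwise.} \end{cases} \tag{2.44} \]
In general, the actual estimation MSE is not known and is difficult to estimate in an on-line implementation. Hence, the CRB developed in Section 2.3.1 can be used to approximate the coefficients $\omega_{i,j}$, motivated by the fact that in the channel SNR range of interest for WLAN applications, good estimators yield MSE essentially identical to the CRB (see results in Section 2.3.2). A simple alternative consists of using $\omega_{i,j} = \text{const.}$ for all $i,j$, that is, using LS instead of weighted LS. In particular, by letting $\omega_{i,j} = 1/2$ for all $(i,j) \in \mathcal{E}(\mathcal{A})$ we have that $\mathbf{L}$ is the Laplacian matrix of the connectivity graph of $(\mathcal{A},\mathcal{E}(\mathcal{A}))$.$^{13}$ It follows immediately that $\mathbf{L}$ has rank $|\mathcal{A}| - 1$ and the system of equations (2.41) is under-determined. In fact, $\mathbf{L}\mathbf{1} = \mathbf{0}$ ($\mathbf{1}$ being the all-one vector). This is to be expected, since only frequency differences can be observed. In order to find a solution, we choose a reference anchor AP, without loss of generality indexed by 1, and define the frequency difference vector $\boldsymbol{\Delta}$ with elements $\Delta_i = \delta_i - \delta_1$ for all $i \in \mathcal{A}$ with $i \neq 1$.

Footnote 12: In the following, $\mathcal{A}(i)$ denotes the neighborhood of AP $i$ in the subgraph $(\mathcal{A},\mathcal{E}(\mathcal{A}))$.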
With this substitution, we obtain the system of equations $\mathbf{L}_1\boldsymbol{\Delta} = \mathbf{u}$, where $\mathbf{L}_1$ is obtained from $\mathbf{L}$ by eliminating the first column, yielding the canonical LS solution $\boldsymbol{\Delta} = (\mathbf{L}_1^{\mathsf T}\mathbf{L}_1)^{-1}\mathbf{L}_1^{\mathsf T}\mathbf{u}$. The resulting correction factor $\delta^{\mathrm{corr}}_i$ is equal to 0 for $i = 1$ (reference anchor AP) and equal to $\Delta_i$ for $i \neq 1$.

The TO estimation problem is completely analogous, by defining the corresponding weighted LS cost function $\sum_{(j,i)\in\mathcal{E}(\mathcal{A})} \mu_{j,i}\,|\widehat{\tau}_{j\to i} - (\tau_j - \tau_i)|^2$, with $\mu_{j,i} = 1/\mathbb{E}[|w_{j\to i}|^2]$. The details are omitted for brevity.

We note that the complexity of solving the LS problem scales polynomially with the number of anchor nodes, and that this is a limitation of centralized processing. For modern servers with many CPU/GPU processing elements, parallel algorithms may alleviate this constraint.

Footnote 13: The connectivity graph of the directed graph $(\mathcal{A},\mathcal{E}(\mathcal{A}))$ is the undirected graph with nodes $\mathcal{A}$ and edges connecting $i,j$ whenever $(i,j),(j,i) \in \mathcal{E}(\mathcal{A})$. The Laplacian matrix is the $|\mathcal{A}| \times |\mathcal{A}|$ matrix with elements $-1$ in positions corresponding to $i,j$ whenever these nodes are connected, element $|\mathcal{A}(i)|$ (the degree of $i$) in the diagonal position corresponding to node $i$, and zero elsewhere.

In order to evaluate the impact of TO, CFO and SFO compensation on the performance of Distributed MU-MIMO, we revisit the example of Section 2.2.2 under the proposed synchronization/compensation scheme. In this example there are four single-antenna APs serving four UTs. For this small network, we assume that all four APs are anchor nodes, in line of sight of each other (the channel is formed by a single path with an unknown but constant channel coefficient), with the same AP-to-AP SNR. We perform joint ML timing and frequency estimation, as derived in Section 2.3.1, to estimate the offsets for each pair of APs, and then use LS-based synchronization on the fully connected anchor graph.
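The uniform-weight LS step on a fully connected four-AP graph can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the offset values, the 5 Hz estimation noise, and the names `ls_sync`, `est`, and `adjacency` are hypothetical, and the reference AP is index 0 in the code (AP 1 in the text).

```python
import numpy as np

def ls_sync(est, adjacency):
    """Uniform-weight LS frequency synchronization, cf. (2.40)-(2.44).

    est[j, i] holds the pairwise estimate of delta_j - delta_i.
    Returns the offsets relative to the reference AP (index 0).
    """
    w = 0.5 * adjacency                  # weights of 1/2 on every directed edge
    W = w + w.T                          # W[i, j] = w_ij + w_ji
    L = np.diag(W.sum(axis=1)) - W       # graph Laplacian, cf. (2.44)
    # u_i = sum_j ( w_ij * est[i->j] - w_ji * est[j->i] ), cf. (2.43)
    u = (w * est).sum(axis=1) - (w * est).sum(axis=0)
    L1 = L[:, 1:]                        # drop the reference AP's column
    delta_rel, *_ = np.linalg.lstsq(L1, u, rcond=None)
    return np.concatenate(([0.0], delta_rel))

rng = np.random.default_rng(0)
true_offsets = np.array([1200.0, -800.0, 300.0, 2500.0])   # Hz, made-up values
adjacency = np.ones((4, 4)) - np.eye(4)                    # fully connected graph

clean = true_offsets[:, None] - true_offsets[None, :]      # clean[j, i] = d_j - d_i
noisy = clean + 5.0 * rng.standard_normal((4, 4))          # assumed 5 Hz error std

estimate = ls_sync(noisy, adjacency)
print(np.round(estimate, 1))                 # LS estimate of delta_i - delta_1
print(true_offsets - true_offsets[0])        # ground truth for comparison
```

`np.linalg.lstsq` is used here instead of forming $(\mathbf{L}_1^{\mathsf T}\mathbf{L}_1)^{-1}$ explicitly; both give the same solution when $\mathbf{L}_1$ has full column rank, which holds for a connected graph.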
For simplicity, in all the numerical results we have used LS with uniform weights, assuming (conservatively) that the CS does not know the actual estimation MSEs. The performance of the TO/CFO estimators between any pair of APs in this example is given in Figure 2.2b in Section 2.3.1, showing the MSE vs. SNR relationship of the proposed joint timing and frequency ML estimator along with the associated CRB. We notice that even for fairly low SNR values this estimator performs very close to the CRB, which is inversely proportional to the receiver SNR.

Figure 2.7 shows the achievable rate versus the index of the OFDM symbol for the case where the channel matrix is perfectly known at time $m = 0$ and, as explained before, the ZFBF precoder is calculated at time 0 and kept constant throughout the block.$^{14}$ While the results of Figure 2.1 are obtained for free-running oscillators, here we assume that at time $m = 0$ the CFO is estimated according to the proposed LS scheme and the phase rotation factor (2.14) is applied throughout the sequence of OFDM symbols. We also assume that the SNR for the AP-to-UT channels is constant at 30 dB.

Footnote 14: Achievable rates are calculated assuming perfect calibration to focus solely on the synchronization effect.

[Figure 2.7: The achievable rates versus time of a $4 \times 4$ system after synchronization and compensation as a function of the OFDM index in the data slot; sum rate (bit/s/Hz) versus time (OFDM symbols). (a) Varying the AP-to-AP SNR with $N_c = 256$; curves: ideal sync, 40 dB, 30 dB, and 20 dB AP-to-AP SNR, and no sync. (b) Varying the sequence length while the AP-to-AP SNR is 30 dB; curves: $N_c = 1024$, $N_c = 256$, $N_c = 64$.]
For the pilot bursts, we used a structure formed by two repetitions of the same sequence of $N_c/4$ random QPSK time-domain chips, separated by $N_c/2$ zeros, for a total sequence length of $N_c$ pilot chips (see also the example in Section 2.3.1). In Figure 2.7a, we held the sequence length constant at $N_c = 256$ and varied the AP-to-AP SNR. Consistent with Figure 2.2, the higher the SNR, the lower the estimation MSE and, as a consequence, the residual CFO is smaller and produces a less evident mismatch of the MU-MIMO precoder throughout the data block in Figure 2.7a. In Figure 2.7b, we held the AP-to-AP SNR constant at 30 dB while varying $N_c$. We observe that with $N_c = 1024$ (roughly corresponding to 16 OFDM symbols) we can afford to send about 1000 OFDM symbols with small degradation. Considering data slots of 100 OFDM symbols, we can afford 10 DL MU-MIMO precoded data slots in between each synchronization slot.

Performance can be improved further by implementing a smoothing filter over time (e.g., Kalman filtering) in order to track the timing and frequency correction factors through the sequence of frames instead of re-estimating anew at each frame. This requires modeling the variations of the clock frequency errors $\delta_i$ over time (e.g., a Gauss-Markov model) and will be covered in Section 2.8.

2.7 Recursive Least Squares Synchronization

Each round of the synchronization process presented in Section 2.6 produces a new set of estimates for the correction factors $\delta^{\mathrm{corr}}_i$. These factors are imperfect due to pilot and estimation noise, so we wish to devise a scheme which improves on the estimates by reducing their variance. In each round of least squares synchronization, we can consider the derived correction factor to be as follows:
\[ \delta^{\mathrm{corr}}_i[q] = \delta_i[q] - \delta_1[q] + \eta_i[q], \tag{2.45} \]
where we have assumed without loss of generality that AP 1 is the reference AP, and $\eta_i[q]$ is estimation noise with variance $\sigma_i^2$.
Rather than developing estimates with random error quantities each round, we can produce a more accurate estimator by not throwing away the previous estimate each round. Assuming these errors are i.i.d., a simple averaging of the correction factors can achieve this goal. This is a natural extension to the least squares approach known as recursive least squares, wherein we introduce an averaging window of length $T_c$ to reduce the estimator variance:
\[ \bar{\delta}^{\mathrm{corr}}_i[q] = \frac{1}{T_c}\sum_{\ell=0}^{T_c-1} \delta^{\mathrm{corr}}_i[q-\ell]. \tag{2.46} \]
When the effective oscillator frequencies are treated as unknown constants, then $\delta_i[q-\ell] = \delta_i[q]$ for $\ell = 1,\ldots,T_c-1$, and thus
\[ \bar{\delta}^{\mathrm{corr}}_i[q] = \delta_i[q] - \delta_1[q] + \frac{1}{T_c}\sum_{\ell=0}^{T_c-1} \eta_i[q-\ell]. \tag{2.47} \]
The resulting overall estimation variance is $\sigma_i^2/T_c$, which is indeed lower than the one-shot variance by a factor of $T_c$.

However, when the oscillators drift in frequency (as consumer-grade crystal oscillators tend to do), the oscillator is treated as a random process which varies with time due to physical phenomena. In this case, the recursive least squares approach is not the best linear unbiased estimator, as was the case for one-shot least squares. Consider the model of the oscillator frequency $f_{s,i}[q]$ as an AR-1 process, i.e., $f_{s,i}[q] = a_i f_{s,i}[q-1] + \zeta_i[q-1]$, where $\zeta_i \sim \mathcal{N}(0,\sigma_{\zeta,i}^2)$ is process noise and $a_i$ is the AR-1 parameter. This model is well motivated in the literature [81, 82] and may be mathematically validated through Burg's maximum entropy theorem [83]. From (2.9), we can derive a relationship with $\delta_i[q]$ such that:
\[ \delta_i[q] = (N+L)T_s\,\big(f_{s,i}[q] - f_s\big) \tag{2.48} \]
\[ = a_i\,\delta_i[q-1] + \tilde{\zeta}_i[q-1] + \tilde{f}_s, \tag{2.49} \]
where $\tilde{\zeta}_i \sim \mathcal{N}\big(0, ((N+L)T_s)^2\sigma_{\zeta,i}^2\big)$ and $\tilde{f}_s = (N+L)T_s(a-1)f_s$ is a constant. By applying the recursion in (2.49) $(T_c-1)$ times to (2.45), the effect of the oscillator dynamics becomes apparent.
The resulting effect on the variance of $\bar{\delta}^{\mathrm{corr}}_i[q]$ is, predictably, less advantageous than when the oscillator was treated as a constant:
\[ \bar{\delta}^{\mathrm{corr}}_i[q] = \frac{1}{T_c}\sum_{\ell=0}^{T_c-1} \big(\delta_i[q-\ell] - \delta_1[q-\ell] + \eta_i[q-\ell]\big) \tag{2.50} \]
\[ = \frac{1}{T_c}\left[\left(\sum_{j=0}^{T_c-1} a_i^j\right)\delta_i[q-T_c+1] + \sum_{k=q-T_c+1}^{q-1}\left(\sum_{m=0}^{q-k-1} a_i^m\right)\tilde{\zeta}_i[k]\right] - \frac{1}{T_c}\left[\left(\sum_{j=0}^{T_c-1} a_1^j\right)\delta_1[q-T_c+1] + \sum_{k=q-T_c+1}^{q-1}\left(\sum_{m=0}^{q-k-1} a_1^m\right)\tilde{\zeta}_1[k]\right] + \frac{1}{T_c}\sum_{\ell=0}^{T_c-1} \eta_i[q-\ell]. \tag{2.51} \]
While the effect of the $\tilde{f}_s$ factor cancels out, the overall variance of the estimator increases due to the addition of the process noise over the window $T_c$. The variance due to the reference oscillator may be reduced by utilizing a single high quality oscillator for one reference node, as suggested previously. The one-shot estimation noise is again reduced by a factor of $T_c$, but whether this advantage can overcome the addition of the oscillator process noise depends on the noise statistics. As will be seen in the following section, a more appropriate scheme which utilizes the noise statistics to achieve the minimum mean square error of the estimator can be derived.

2.8 Kalman Filter Synchronization

As mentioned, there exists an optimal filtering strategy over synchronization frames which minimizes the mean square error of the estimator in steady state. To see this, let us consider the vector process $\mathbf{f}_s$ which models the oscillator processes for all the APs. We will refer to this as the "effective" oscillator frequency, which includes the hardware oscillator frequency as well as the corrections induced via baseband symbol rotations. We assume that $\mathbf{f}_s$ evolves in time according to an AR-1 vector process: $\mathbf{f}_s[q] = \mathbf{A}\mathbf{f}_s[q-1] + \boldsymbol{\zeta}[q]$, where $\boldsymbol{\zeta}[q] \sim \mathcal{N}(\mathbf{0},\boldsymbol{\Sigma}_\zeta)$ is the evolution noise. Note that in general, the noise processes $\zeta_i[q]$ and $\zeta_j[q]$ for $i \neq j$ may be correlated, e.g., through temperature.
Since $\mathbf{f}_s$ is also affected by corrections induced by the baseband frequency domain rotations, we include this input via the vector $\mathbf{u}[q]$ such that the update equation becomes $\mathbf{f}_s[q] = \mathbf{A}\mathbf{f}_s[q-1] + \mathbf{u}[q] + \boldsymbol{\zeta}[q]$.

The state $\mathbf{f}_s[q]$ is observed through a noisy linear transformation, $\mathbf{y}[q] = \mathbf{M}\mathbf{f}_s[q] + \mathbf{z}[q]$, where $\mathbf{z} \sim \mathcal{N}(\mathbf{0},\boldsymbol{\Sigma}_z)$ is the estimation noise. This also fits within the framework of noisy frequency difference estimates seen previously. Each row of $\mathbf{M}$ corresponds to a frequency difference observation, where the element in column $i$ is 1 and the element in column $j$ is $-1$ (with the remaining elements being 0) for an estimate from $i$ to $j$. The matrix $\boldsymbol{\Sigma}_z$ is diagonal, where the $(k,k)$-th element contains the MSE of the estimator corresponding to the estimate in row $k$ of $\mathbf{M}$.

Assume the desired estimator is unbiased ($\mathbb{E}[\hat{\mathbf{f}}_s[q]] = \mathbb{E}[\mathbf{f}_s[q]]$); then we wish to show that the best linear unbiased estimate of $\mathbf{f}_s[q]$ is
\[ \hat{\mathbf{f}}_s[q] = \hat{\mathbf{f}}_s^-[q] + \mathbf{K}[q]\big(\mathbf{y}[q] - \mathbf{M}\hat{\mathbf{f}}_s^-[q]\big), \tag{2.52} \]
where $\hat{\mathbf{f}}_s^-[q] = \mathbf{A}\hat{\mathbf{f}}_s[q-1] + \mathbf{u}[q]$. We start by assuming $\hat{\mathbf{f}}_s[q]$ is a linear function of $\hat{\mathbf{f}}_s^-[q]$ and $\mathbf{y}[q]$, and we wish to find the proper value of $\boldsymbol{\Gamma}$:
\[ \hat{\mathbf{f}}_s[q] = \boldsymbol{\Gamma}\hat{\mathbf{f}}_s^-[q] + \mathbf{K}[q]\mathbf{y}[q]. \tag{2.53} \]
Assume $\mathbf{K}[q]$ is a known constant. Taking the expectation of the estimate:
\[ \mathbb{E}[\hat{\mathbf{f}}_s[q]] = \mathbb{E}\big[\boldsymbol{\Gamma}\hat{\mathbf{f}}_s^-[q] + \mathbf{K}[q]\mathbf{y}[q]\big] \tag{2.54} \]
\[ = \boldsymbol{\Gamma}\,\mathbb{E}\big[\mathbf{A}\hat{\mathbf{f}}_s[q-1] + \mathbf{u}[q]\big] + \mathbf{K}[q]\mathbf{M}\,\mathbb{E}[\mathbf{f}_s[q]] \tag{2.55} \]
\[ = \boldsymbol{\Gamma}\big(\mathbf{A}\,\mathbb{E}[\hat{\mathbf{f}}_s[q-1]] + \mathbb{E}[\mathbf{u}[q]] + \mathbb{E}[\boldsymbol{\zeta}[q]]\big) + \mathbf{K}[q]\mathbf{M}\,\mathbb{E}[\mathbf{f}_s[q]] \tag{2.56} \]
\[ \mathbb{E}[\mathbf{f}_s[q]] = \boldsymbol{\Gamma}\,\mathbb{E}[\mathbf{f}_s[q]] + \mathbf{K}[q]\mathbf{M}\,\mathbb{E}[\mathbf{f}_s[q]] \tag{2.57} \]
\[ \boldsymbol{\Gamma} = \mathbf{I} - \mathbf{K}[q]\mathbf{M}. \tag{2.58} \]
Thus
\[ \hat{\mathbf{f}}_s[q] = \hat{\mathbf{f}}_s^-[q] + \mathbf{K}[q]\big(\mathbf{y}[q] - \mathbf{M}\hat{\mathbf{f}}_s^-[q]\big). \tag{2.59} \]
We then choose the quantity $\mathbf{K}[q]$ (known as the Kalman gain) in order to minimize the trace of the error covariance $\mathbf{P}[q] = \mathbb{E}\big[(\mathbf{f}_s[q] - \hat{\mathbf{f}}_s[q])(\mathbf{f}_s[q] - \hat{\mathbf{f}}_s[q])^{\mathsf T}\big]$. The result is the well-known solution:
\[ \mathbf{K}[q] = \mathbf{P}^-[q]\mathbf{M}^{\mathsf T}\big(\mathbf{M}\mathbf{P}^-[q]\mathbf{M}^{\mathsf T} + \boldsymbol{\Sigma}_z\big)^{-1}, \tag{2.60} \]
where $\mathbf{P}^-[q] = \mathbf{A}\mathbf{P}[q-1]\mathbf{A}^{\mathsf T} + \boldsymbol{\Sigma}_\zeta$. This gives the optimal estimator of the oscillator states $\mathbf{f}_s$:
\[ \hat{\mathbf{f}}_s[q] = \hat{\mathbf{f}}_s^-[q] + \mathbf{P}^-[q]\mathbf{M}^{\mathsf T}\big(\mathbf{M}\mathbf{P}^-[q]\mathbf{M}^{\mathsf T} + \boldsymbol{\Sigma}_z\big)^{-1}\big(\mathbf{y}[q] - \mathbf{M}\hat{\mathbf{f}}_s^-[q]\big). \tag{2.61} \]
Given the estimate of the state, the remaining question is how to correct the state to the desired quantity. One approach is to utilize the control theory dual of the Kalman filter, known as Linear Quadratic Gaussian Control. Such an approach requires choosing $\mathbf{u}[q]$ by minimizing a quadratic control cost function at every time $q$:
\[ J_{\mathrm{LQG}} = \mathbb{E}\left[\sum_{\ell=0}^{q} \mathbf{f}_s[\ell]^{\mathsf T}\mathbf{Q}[\ell]\mathbf{f}_s[\ell] + \mathbf{u}[\ell]^{\mathsf T}\mathbf{R}[\ell]\mathbf{u}[\ell]\right]. \tag{2.62} \]
This allows one to trade off the speed of convergence and the accuracy of the steady state error by choosing the matrices $\mathbf{Q}[\ell]$ and $\mathbf{R}[\ell]$. This problem can be solved numerically using Algebraic Riccati Equation solvers, available in most standard numerical software packages. However, seeing as there is no real "cost" of correcting the frequency offsets (i.e., large frequency offsets are just as easy to correct as small frequency offsets, given they are both correctable by OFDM phase rotations and have the same estimation variance), one can form a more direct choice of $\mathbf{u}[q]$. We instead minimize the mean square error of the state vector with respect to the ideal nominal frequency $f_{s,0}$:
\[ J_{\mathrm{MSE}} = \mathbb{E}\big[\|\mathbf{f}_s[q] - f_{s,0}\mathbf{1}\|^2\big], \tag{2.63} \]
where $\mathbf{1}$ is the vector of all ones and $f_{s,0}$ is the nominal frequency. This cost function can be minimized directly:
\[ \frac{\partial}{\partial \mathbf{u}[q]} J_{\mathrm{MSE}} = 2\,\mathbb{E}\big[\mathbf{A}\mathbf{f}_s[q-1] + \mathbf{u}[q] + \boldsymbol{\zeta}[q] - f_{s,0}\mathbf{1}\big] = \mathbf{0} \tag{2.64} \]
\[ \mathbf{u}[q] = f_{s,0}\mathbf{1} - \mathbf{A}\hat{\mathbf{f}}_s[q-1], \tag{2.65} \]
since $\hat{\mathbf{f}}_s[q-1]$ is unbiased. In this manner, the effective frequencies of all oscillators are driven to the nominal frequency. The algorithm is summarized below:

[Figure 2.8: A diagram of the oscillator process.]

1. Learning: estimate (or otherwise utilize existing knowledge to attain) the matrices $\mathbf{A}$, $\boldsymbol{\Sigma}_\zeta$, and $\boldsymbol{\Sigma}_z$.
2. Initialization: let $\hat{f}_{s,i}[0] = f_{s,0}$ (nominal frequency) for all $i$, and let the error covariance $\mathbf{P}[0] = \mathbf{I}$.
3. Observation: collect measurements $\mathbf{y}[q]$.
4. Form the a priori state and error covariance estimates: $\hat{\mathbf{f}}_s^-[q] = \mathbf{A}\hat{\mathbf{f}}_s[q-1] + \mathbf{u}[q]$ and $\mathbf{P}^-[q] = \mathbf{A}\mathbf{P}[q-1]\mathbf{A}^{\mathsf T} + \boldsymbol{\Sigma}_\zeta$.
5. Calculate the Kalman gain: $\mathbf{K}[q] = \mathbf{P}^-[q]\mathbf{M}^{\mathsf T}(\mathbf{M}\mathbf{P}^-[q]\mathbf{M}^{\mathsf T} + \boldsymbol{\Sigma}_z)^{-1}$.
6. Calculate the a posteriori state and error covariance estimates: $\hat{\mathbf{f}}_s[q] = \hat{\mathbf{f}}_s^-[q] + \mathbf{K}[q](\mathbf{y}[q] - \mathbf{M}\hat{\mathbf{f}}_s^-[q])$ and $\mathbf{P}[q] = (\mathbf{I} - \mathbf{K}[q]\mathbf{M})\mathbf{P}^-[q]$.
7. Let $\mathbf{u}[q+1] = f_{s,0}\mathbf{1} - \mathbf{A}\hat{\mathbf{f}}_s[q]$.
8. Repeat from step 3.

[Figure 2.9: The absolute error (Hz, logarithmic scale) of the carrier frequency of three nodes (AP 2, AP 3, AP 4) from that of a "primary" AP (AP 1) as a function of time (synchronization rounds).]

Given this process, we can simulate the accuracy of the vector Kalman filter technique. We construct a network of four APs on the vertices of a square. The AP-to-AP SNR is determined by the distance between each AP pair, and in this case is 30-33 dB for each AP pair. A synchronization round is performed every 100 ms, in which each AP exchanges pilots with every other AP and estimates the frequency differences using the AirSync estimator. We assume perfect knowledge of the matrices $\mathbf{A}$, $\boldsymbol{\Sigma}_\zeta$, and $\boldsymbol{\Sigma}_z$. $\mathbf{A}$ and $\boldsymbol{\Sigma}_\zeta$ are chosen according to the hardware measurements described in Section 5.2, and the processes are (perhaps inaccurately) assumed uncorrelated. The diagonal elements of $\boldsymbol{\Sigma}_z$ are chosen according to the mean squared error of the AirSync estimator at the given AP-to-AP SNR of the AP pair, while the off-diagonal elements are (again, perhaps inaccurately) assumed to be zero. AP 1 is arbitrarily assumed to be the reference by which we measure the other APs. The error from this reference is shown in Figure 2.9. Note that the network achieves sub-10 Hz synchronization, which is sufficient for beamforming over packets relevant to the 802.11 protocol.

2.9 Conclusion

In this chapter we presented scalable solutions for one of the main implementation hurdles of the Distributed MU-MIMO downlink.
First, we motivated the need for timing and frequency synchronization, and detailed the performance degradation in the case of uncompensated frequency offsets between the jointly precoded APs. We then discussed the typical techniques used in synchronizing OFDM transmitter/receiver pairs, and subsequently used this approach for a Master/Secondary synchronization protocol. Next, we outlined a network-wide consensus synchronization procedure based on a pilot burst exchange protocol with local timing and frequency offset estimation. This method was made more practical through a centralized constrained LS solution for the overall timing and frequency correction factors. Finally, we further improved upon the LS approach by considering filtering over frames with Recursive Least Squares, which turned out to be of limited use when considering oscillator dynamics. Subsequently, we were able to derive an optimal estimation and control scheme for dynamic oscillators through Kalman filtering.

Notice that synchronization involves only the APs and does not assume any UT collaboration. Therefore, the proposed schemes are suited to work with legacy user equipment, not explicitly designed for this purpose. Also, all the proposed schemes are immediately applicable to APs equipped with multiple antennas, where the groups of antennas belonging to the same AP are driven by a common clock and therefore need not be mutually synchronized.

It is also interesting to notice that the proposed synchronization schemes calculate the timing and frequency correction factors relative to some particular anchor AP (denoted by AP 1 in Section 2.6). Hence, a previously mentioned side benefit of our scheme is that network-wide timing and frequency stability can be achieved by having a single node (denoted as AP 1) equipped with a very stable oscillator, without the need to have a high-SNR connection from each node to this particular AP. Finally, based on extensive simulation results and experimental evidence provided by a software radio implementation [29,30], we conclude that the proposed schemes effectively enable the implementation of Distributed MU-MIMO network architectures, and can turn a cluster of small cell APs into a large distributed cooperative antenna system.

Chapter 3 Calibration

3.1 Introduction

To achieve large spectral efficiencies over distributed large-scale antenna deployments, the use of high-performing multiuser MIMO is needed. To enable MU-MIMO, timely channel state information is needed at the transmitter. The most promising method of acquiring this CSIT relies on channel reciprocity and Time-Division Duplex transmission. Channel reciprocity refers to the ability to infer downlink channel coefficients from the measurement of uplink pilots, and TDD uses the same frequency band for both uplink and downlink transmissions. Without channel reciprocity, orthogonal pilots must be transmitted from each AP antenna, measured by each user, and then fed back to the APs, for an overall overhead proportional to $N_a + N_u$. With TDD channel reciprocity, orthogonal pilots are transmitted from each user, and the APs infer the downlink channels based on these $N_u$ pilots. Since in the case of massive MIMO $N_a$ is much larger than $N_u$, the use of uplink pilots accounts for a significant reduction in CSIT overhead.

While Distributed MIMO may be applied in many different scenarios, in this work we focus primarily on the cost-effective topology discussed in Section 1.2. We assume a network of APs, such as would be found on a corporate or academic campus, conference center, or airport, connected via a wired backbone (i.e., Ethernet) to a central server that processes data from (and to) all APs and acts as a gateway to other networks.
Once the downlink estimates are formed, they are sent to the CS and used to calculate the precoder for the downlink MU-MIMO scheme within the coherence time and coherence bandwidth of the channel.

Because consumer-grade equipment is being utilized, the assumption of reciprocal RF chains at the AP side is not necessarily true and channel reciprocity may be violated by the APs' transmit and receive filters. In fact, while the uplink and downlink channels from antenna to antenna have identical impulse responses in the same coherence interval [68], the baseband-to-RF and RF-to-baseband conversion chains are not reciprocal unless some specific self-calibrating design is used. As a result, the effective downlink baseband channel is not equal to the effective uplink baseband channel. Unless this mismatch is explicitly compensated for, learning the uplink channels is not sufficient to guarantee multiuser signal separation by joint precoding in the downlink. Since typical low-cost hardware designs, relying on off-the-shelf radios, have not been designed with reciprocity in mind, devising efficient and scalable signal processing schemes to achieve TDD reciprocity in a large-scale Distributed MU-MIMO network is highly desirable.

Under some reasonable simplifying assumptions (discussed in Section 3.2), the non-reciprocal elements of the RF chains can be treated as random (yet stable in time) complex coefficients which multiply the baseband equivalent impulse response. Transmission in the downlink is multiplied by the AP's transmit coefficient and the user's receive coefficient. Likewise, transmission in the uplink is multiplied by the user's transmit coefficient and the AP's receive coefficient. The compensation of these coefficients forms the crux of reciprocity calibration.

The treatment of reciprocity calibration in the existing literature can be classified into a few rough categories. The most basic form of calibration, which we will refer to as "absolute calibration," explicitly compensates for the coefficients of each transmit and receive chain on each device. This is the form of calibration which may be utilized in high-performance, high-stakes hardware such as satellite communication. The additional cost and manufacturing complexity of this approach makes it impractical for consumer-grade equipment. Alternatively, a technique known as "relative calibration" in [13] can be utilized which does not require explicit knowledge of each transmit and receive filter. Instead, some coefficients are selectively compensated for, which results in a reciprocal channel (though not necessarily the "true" physical channel). The resulting equivalent reciprocal channel functions as desired. However, this technique fundamentally relies on pilot signaling among the APs and the user terminals (UTs), and requires feedback from each user terminal to the APs. The requirement that UTs be involved in the calibration signaling, and the inherent feedback overhead of this method, render it unsuitable for large-scale Distributed MU-MIMO.

In [10] a proof-of-concept reciprocity-based Massive MIMO implementation, referred to as Argos, was presented along with a new TDD calibration method. One attractive feature of the Argos calibration scheme is that it only requires the APs to be involved in the calibration, i.e., it does not involve the UTs in the process. This approach is, somewhat confusingly, also referred to as "relative calibration," and the remainder of this manuscript will use this definition of relative calibration. One important drawback of Argos calibration, however, is that it is very sensitive to the relative placement of the reference antenna used for calibration [10]. As a result, this scheme is not readily scalable and is not suited for enabling large-scale MIMO in distributed antenna deployments.
In this chapter we consider a new class of techniques for TDD reciprocity calibration that can enable robust and efficient large-scale MU-MIMO operation. The techniques presented can be regarded as an extension of the Argos calibration method [10], recovering it as a special case. As we demonstrate in this chapter, the proposed scheme significantly outperforms Argos even in a co-located deployment. More importantly, unlike Argos, it enables effective spatial multiplexing gains and high performance in large-scale reciprocity-based Distributed MU-MIMO systems.

3.2 Calibration Model and Requirements

We consider a network formed by user nodes $k = 1,\ldots,N_u$ served in the downlink by APs $i = 1,\ldots,N_a$, using Distributed MU-MIMO and OFDM. We consider linear precoding methods, such as linear zero-forcing beamforming [31] and conjugate beamforming [2].

We focus our attention on the calibration problem, and assume that synchronization is perfectly achieved (see Chapter 2 for further discussion on synchronization). The downlink signal at the UT receivers, at OFDM symbol $m$ and subcarrier $\nu$, is given by the $N_u \times 1$ vector
\[ \mathbf{y}[m,\nu] = \mathbf{H}[m,\nu]\mathbf{x}[m,\nu] + \mathbf{z}[m,\nu], \tag{3.1} \]
where $\mathbf{H}[m,\nu]$ is the $N_u \times N_a$ channel matrix with $(i,k)$-th element $H_{i,k}[m,\nu]$, $\mathbf{x}[m,\nu]$ is the $N_a \times 1$ vector of frequency domain symbols transmitted by the $N_a$ APs, and $\mathbf{z}[m,\nu]$ is the corresponding $N_u \times 1$ vector of i.i.d. $\mathcal{CN}(0,N_0)$ noise samples. The downlink channel matrix $\mathbf{H}[m,\nu]$ is given by
\[ \mathbf{H}[m,\nu] = \tilde{\mathbf{R}}[m,\nu]\mathbf{B}[m,\nu]\mathbf{T}[m,\nu], \tag{3.2} \]
where
\[ \tilde{\mathbf{R}}[m,\nu] = \mathrm{diag}\big(\tilde{R}_1[m,\nu],\ldots,\tilde{R}_{N_u}[m,\nu]\big), \qquad \mathbf{T}[m,\nu] = \mathrm{diag}\big(T_1[m,\nu],\ldots,T_{N_a}[m,\nu]\big) \]
are diagonal matrices of complex coefficients, introduced by the users' receiver chains and by the APs' transmission chains, respectively. The matrix $\mathbf{B}[m,\nu]$ represents the discrete-time frequency domain physical channel at subcarrier $\nu$ and OFDM symbol $m$, containing the channel coefficients due solely to the antenna-to-antenna propagation.
In reciprocity-based MU-MIMO the downlink channel matrix is estimated at the APs based on uplink pilot signals transmitted by the user terminals. The relevant uplink channel at OFDM symbol $m$ and subcarrier $\nu$ is given by
\[ \mathbf{Y}^{\mathrm{up}}[m,\nu] = \mathbf{H}^{\mathrm{up}}[m,\nu]\tilde{\mathbf{X}}[m,\nu] + \mathbf{Z}^{\mathrm{up}}[m,\nu], \tag{3.3} \]
where $\mathbf{Z}^{\mathrm{up}}[m,\nu]$ is the uplink Gaussian noise and $\tilde{\mathbf{X}}[m,\nu]$ is an $N_u \times N_u$ unitary matrix of frequency domain uplink pilot symbols. The $N_a \times N_u$ uplink channel matrix $\mathbf{H}^{\mathrm{up}}[m,\nu]$ satisfies
\[ \mathbf{H}^{\mathrm{up}}[m,\nu] = \mathbf{R}[m,\nu]\mathbf{B}[m,\nu]^{\mathsf T}\tilde{\mathbf{T}}[m,\nu], \tag{3.4} \]
with $\tilde{\mathbf{T}}[m,\nu] = \mathrm{diag}(\tilde{T}_1[m,\nu],\ldots,\tilde{T}_{N_u}[m,\nu])$ denoting the matrix of user transmitter coefficients, and $\mathbf{R}[m,\nu] = \mathrm{diag}(R_1[m,\nu],\ldots,R_{N_a}[m,\nu])$ denoting the matrix of AP receiver coefficients. The key property exploited by reciprocity-based MU-MIMO is that the physical channel matrix $\mathbf{B}[m,\nu]$ is the same in both uplink and downlink due to TDD (uplink and downlink are at the same carrier frequency) and the reciprocity of the physical propagation channel.$^{1}$ In the absence of RF impairments, i.e., in the case that
\[ \mathbf{R}[m,\nu] = \mathbf{T}[m,\nu] = \mathbf{I}_{N_a} \quad \text{and} \quad \tilde{\mathbf{R}}[m,\nu] = \tilde{\mathbf{T}}[m,\nu] = \mathbf{I}_{N_u}, \tag{3.5} \]
we have
\[ \mathbf{H}^{\mathrm{up}}[m,\nu] = \mathbf{H}[m,\nu]^{\mathsf T} \tag{3.6} \]
and hence, estimates of the uplink channel $\mathbf{H}^{\mathrm{up}}[m,\nu]$ directly provide estimates of the downlink channel $\mathbf{H}[m,\nu]$. These channel estimates can then be directly used at the APs to calculate the MU-MIMO precoder for the downlink. In practice, however, $\mathbf{R}[m,\nu]$, $\mathbf{T}[m,\nu]$, $\tilde{\mathbf{R}}[m,\nu]$, $\tilde{\mathbf{T}}[m,\nu]$ are non-identity unknown diagonal matrices that vary slowly in time ($m$) and frequency ($\nu$). These impairments must be compensated for in order to enable reciprocity-based MU-MIMO transmission.

Footnote 1: As long as the interval between UL and DL is much smaller than a channel coherence time, which is typically between 1 ms and 1000 ms.

We now investigate the requirements on reciprocity, and illustrate why relative calibration is sufficient.
First note that since $R[m,\nu]$, $T[m,\nu]$, $\tilde{R}[m,\nu]$, and $\tilde{T}[m,\nu]$ vary very slowly in $m$ (on the order of several minutes), because of the temperature drift of the front-end electronic components, the calibration protocol can operate at a much slower time scale than the MU-MIMO uplink channel estimation for centralized deployments such as Argos. Hence, for the sake of estimation, these matrices can be treated as unknown constants.

Without loss of generality we focus on a particular tone and drop the dependence of all variables on the OFDM symbol index $m$ and subcarrier $\nu$ for notational simplicity. The uplink pilot burst $Y^{\mathrm{up}}$ is sent to the CS, which estimates the uplink channel as (footnote 2)

$$\hat{H}^{\mathrm{up}} = Y^{\mathrm{up}} \tilde{X}^{H} = H^{\mathrm{up}} + \tilde{Z}^{\mathrm{up}},$$

with $\tilde{Z}^{\mathrm{up}} = Z^{\mathrm{up}} \tilde{X}^{H}$. Neglecting for the time being the estimation error $\tilde{Z}^{\mathrm{up}}$, we have that if the CS computes the downlink multiuser MIMO precoder from $\hat{H}^{\mathrm{up}}$, this will be mismatched with respect to the downlink channel $H$ because of the presence of the diagonal matrices $T, \tilde{R}$ in lieu of $R, \tilde{T}$.

Footnote 2: Note that, as explained in Section 3.4, in our simulations we used improved MMSE-type estimates in place of $\hat{H}^{\mathrm{up}}$.

The key observation made in [10] is that the downlink channel matrix $\tilde{R} B T$ is not entirely needed to perform beamforming. In fact, only the row space of this matrix is needed, that is, any matrix of the form

$$H_{\mathrm{alt}} = D B T, \tag{3.7}$$

with $D$ some arbitrary invertible constant diagonal matrix, can be used as an alternative for any kind of beamforming. For example, consider Zero-Forcing Beamforming (ZFBF). We can calculate the ZFBF precoding matrix as

$$V = H_{\mathrm{alt}}^{H} \left( H_{\mathrm{alt}} H_{\mathrm{alt}}^{H} \right)^{-1} \Lambda^{1/2}, \tag{3.8}$$

where $\Lambda$ is a diagonal matrix that imposes on each row of the matrix $V$ the row normalization $\|v_i\|^2 = 1$, for all $i$.
Hence, the received version of the ZFBF precoded signal $u$ in the downlink becomes

$$y = \tilde{R} B T V u + z \tag{3.9}$$
$$= \tilde{R}\, B T T^{H} B^{H} D^{H} \left( D B T T^{H} B^{H} D^{H} \right)^{-1} \Lambda^{1/2} u + z \tag{3.10}$$
$$= \tilde{R}\, B T T^{H} B^{H} D^{H}\, D^{-H} \left( B T T^{H} B^{H} \right)^{-1} D^{-1} \Lambda^{1/2} u + z \tag{3.11}$$
$$= \tilde{R} D^{-1} \Lambda^{1/2} u + z. \tag{3.12}$$

We notice that the resulting channel matrix is diagonal, provided that $N_u \le N_a$. As a result, the problem reduces to estimating $BT$ up to left multiplication by some invertible diagonal matrix $D$, from the uplink training observation $\tilde{T} B R$. In particular, as shown in [10], an estimate of the diagonal relative calibration matrix $\alpha R T^{-1}$, for some non-zero scalar $\alpha$, suffices for enabling spatial multiplexing with reciprocity-based MU-MIMO. Assuming that the uplink channel $H^{\mathrm{up}} = \tilde{T} B^{T} R$ is provided by uplink channel estimation (ignoring estimation noise), and $\alpha R T^{-1}$ is available for some arbitrary (and unknown) $\alpha \neq 0$, multiplying $H^{\mathrm{up}}$ from the right by the inverse of $\alpha R T^{-1}$ provides the CS with a matrix $H_{\mathrm{alt}}$ of the form (3.7) with $D = \alpha^{-1} \tilde{T}$. As a result,

$$y = \alpha \tilde{R} \tilde{T}^{-1} \Lambda^{1/2} u + z, \tag{3.13}$$

i.e., the effective downlink precoded channel matrix is diagonal.

An alternative way of arriving at this result involves pre-compensating for the effects of the RF impairments at each AP. In particular, if each AP $i$ premultiplies its own transmit signal by the corresponding element $\alpha R_i / T_i$, the downlink channel can be turned into a "calibrated" downlink channel with matrix $\alpha \tilde{R} B R$. Suppose for example that the multiuser downlink precoding matrix is the uplink channel pseudo-inverse $V = (H^{\mathrm{up}})^{H} \left( H^{\mathrm{up}} (H^{\mathrm{up}})^{H} \right)^{-1} \Lambda^{1/2}$. Then, this matrix applied to the calibrated downlink channel yields $y$ in the form (3.13), resulting again in a diagonal matrix of the precoded downlink channel.

In summary, spatial multiplexing with perfect user signal separation is possible provided the calibration protocol allows for sufficiently accurate estimation of the matrix $\alpha R T^{-1}$, defined up to some arbitrary non-zero factor $\alpha$.
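The diagonalization argument of this section can be checked numerically. The following is a minimal NumPy sketch, not the simulation code used in this work: the array sizes, the gain distribution in `rand_gain`, the value of `alpha`, and all function names are illustrative assumptions, and the normalization standing in for $\Lambda^{1/2}$ is applied per user stream (per column of $V$). It builds $H_{\mathrm{alt}}$ from a noiseless uplink observation written as $\tilde{T} B R$ and compares the off-diagonal leakage of the effective downlink channel with and without relative calibration.

```python
import numpy as np

rng = np.random.default_rng(0)
Nu, Na = 4, 8  # users and APs (illustrative sizes, Nu <= Na)

def crandn(*shape):
    """i.i.d. CN(0, 1) samples."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

def rand_gain(n):
    """Hypothetical RF-chain coefficients: magnitude near 1, uniform phase."""
    return rng.uniform(0.8, 1.2, n) * np.exp(1j * rng.uniform(-np.pi, np.pi, n))

B = crandn(Nu, Na)                     # physical channel (users x APs)
T, R = rand_gain(Na), rand_gain(Na)    # AP transmit / receive coefficients
Tt, Rt = rand_gain(Nu), rand_gain(Nu)  # user transmit / receive coefficients

H_dl = np.diag(Rt) @ B @ np.diag(T)    # downlink channel, as in (3.2)
H_up = np.diag(Tt) @ B @ np.diag(R)    # noiseless uplink observation T~ B R

def zfbf(H):
    """ZFBF precoder as in (3.8); each stream (column) is scaled to unit norm."""
    V = H.conj().T @ np.linalg.inv(H @ H.conj().T)
    return V / np.linalg.norm(V, axis=0, keepdims=True)

alpha = 0.7 * np.exp(0.3j)             # arbitrary unknown non-zero scalar
C = alpha * R / T                      # diagonal of alpha R T^{-1}
H_alt = H_up @ np.diag(1.0 / C)        # equals alpha^{-1} T~ B T, of the form (3.7)

def offdiag_leakage(G):
    """Ratio of off-diagonal to diagonal energy of the effective channel."""
    off = G - np.diag(np.diag(G))
    return np.linalg.norm(off) / np.linalg.norm(np.diag(G))

G_cal = H_dl @ zfbf(H_alt)             # precoder built from the calibrated matrix
G_raw = H_dl @ zfbf(H_up)              # precoder built from the raw uplink channel

print("leakage with relative calibration:", offdiag_leakage(G_cal))
print("leakage without calibration:      ", offdiag_leakage(G_raw))
```

In this noiseless setting the calibrated precoder leaves only numerical round-off off the diagonal, whatever the value of `alpha`, while the uncalibrated precoder leaves substantial inter-user interference.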
3.3 A Master/Secondary Calibration Protocol

In this section we revisit the relative calibration method of Argos [10], and reformulate it mathematically. Recall that the goal of relative calibration consists of estimating $\alpha R_i/T_i$ for each AP $i$ (footnote 3), i.e., estimating $R_i/T_i$ up to a common (for all $i$) multiplicative constant. Letting $c_i = R_i/T_i$ and setting this constant equal to one of the $c_i$'s (e.g., $\alpha = c_1$), the task of calibration reduces to estimating all $c_i$'s relative to $c_1$.

Footnote 3: In the centralized case, each AP is essentially an array element.

In Argos, each AP is calibrated independently of all other APs with respect to a reference AP. In particular, the Argos calibration procedure is as follows:

(A1) Sequentially transmit calibration pilots, one pilot from each AP. The observation collected by AP $j$ when AP $i$ transmits its pilot can be expressed as

$$Y_{i\to j} = T_i B_{i\to j} R_j + Z_{i\to j}, \tag{3.14}$$

where $B_{i\to j}$ is the channel response from antenna $i$ to antenna $j$ that is solely due to the propagation environment, and $Z_{i\to j}$ represents noise.

(A2) Calculate calibration variables $a_{i\to 1} = \dfrac{Y_{i\to 1}}{Y_{1\to i}}$.

(A3) Receive the uplink channel estimate $\hat{H}^{\mathrm{up}}$.

(A4) Calculate the downlink channel estimate $\hat{H} = \hat{H}^{\mathrm{up}} A$, where $A = \mathrm{diag}(a_{1\to 1},\ldots,a_{N_a\to 1})$.

The Argos calibration mechanism (A2) for AP $j$ relies only on the observations collected by the pair of APs 1 and $j$ during the associated pair of pilot transmissions. In particular, since $B_{j\to 1} = B_{1\to j}$, the ratio $Y_{1\to j}/Y_{j\to 1}$ provides an estimate of the ratio $(R_j/T_j)/(R_1/T_1)$, i.e., it provides an estimate of the desired relative calibration parameter.

To frame calibration in a more general mathematical setting, consider the following. Assume that the APs form a connected directed network graph $(\mathcal{T}, \mathcal{E})$, where $\mathcal{T} = \{1,\ldots,N_a\}$ and $(i,j) \in \mathcal{E}$ if the channel between APs $i$ and $j$ has sufficiently large SNR.
During the calibration slots, pilot bursts are transmitted and received by the APs over a connected spanning subgraph $(\mathcal{T}, \mathcal{F})$ including all the APs and a subset of links $\mathcal{F} \subseteq \mathcal{E}$. Specifically, we have $(i,j) \in \mathcal{F}$ if there is a pair of observations $\{Y_{i\to j}, Y_{j\to i}\}$ of the form (3.16), due to calibration pilots transmitted by APs $i$ and $j$ on distinct OFDM symbols but within the same coherence time of the channel. Hence, the subset of links is such that if $(i,j) \in \mathcal{F}$ then also $(j,i) \in \mathcal{F}$. For example, $(\mathcal{T}, \mathcal{F})$ could be obtained as a spanning tree of the underlying undirected network graph, where each edge of the spanning tree corresponds to two directed edges in $\mathcal{F}$.

Let $(i,j) \in \mathcal{F}$. As before, after a calibration slot AP $i$ gathers the observation

$$Y_{j\to i} = T_j B_{j\to i} R_i + Z_{j\to i}, \tag{3.15}$$

and AP $j$ gathers the observation

$$Y_{i\to j} = T_i B_{i\to j} R_j + Z_{i\to j}. \tag{3.16}$$

The Argos procedure actually dictates the estimation of ratios of the $c_i$'s, as $a_{i\to 1} = \dfrac{c_1}{c_i}$. Then, in computing $\hat{H} = \hat{H}^{\mathrm{up}} A$, it is actually forming channel estimates of the form $\hat{H} = \dfrac{R_1}{T_1} \tilde{T} B T$, which satisfies the calibration requirements laid out in the previous section, where $\alpha = c_1$.

3.4 Reciprocity Using Least Squares Calibration

In this section we present a novel TDD relative calibration technique that generalizes the approach of [10] to the case of an arbitrary distributed network topology. We re-consider the channel model (2.10) assuming that, for clarity of exposition, synchronization is perfect, i.e., the residual TO, CFO and SFO are equal to zero. We then focus on the effects of transmit/receive hardware mismatches on the end-to-end channel reciprocity and devise a scheme able to compensate for such mismatches.
We remark here that a centrally clocked MU-MIMO architecture needs to perform calibration only at a very low rate (e.g., one calibration round every 10 min), due to the inherent stability of the complex scaling introduced by the transmit/receive hardware at each AP. In the case of a Distributed MU-MIMO architecture as considered in this work, in contrast, the residual CFO after non-ideal compensation yields a phase error that accumulates over the OFDM symbols across the data blocks. Hence, calibration must be repeated together with synchronization, as shown by the frame structure in Figure 1.3.

The calibration slot is formed by pilot symbols arranged on the time-frequency plane induced by OFDM, as commonly implemented for channel estimation purposes [9, 18, 84]. Since the non-reciprocal elements of the channel (due to the modulation/demodulation hardware) are smooth over the signal bandwidth both in amplitude and in phase (see the experimental results of Section 5.3), the calibration pilots can be arranged efficiently on the time-frequency plane such that a large number of mutually orthogonal pilots can be exchanged in a short slot.

Calibration slots are formed by pilot bursts designed to have a flat transmit power spectral density. For example, this can be obtained by sending some OFDM symbols formed by known frequency domain symbols. Calibration might be done independently at each subcarrier or, by exploiting the fact that the non-reciprocal elements of the channel (due to the transmit and receive chains) are typically smooth over the signal bandwidth both in amplitude and phase, it can be performed after some smoothing in the frequency domain, in order to gain noise margin. Here, we focus on a single subcarrier and neglect the possible improvement by frequency smoothing. Recall the observations in (3.15) and (3.16).
Grouping such measurements in pairs, we have

$$\begin{bmatrix} Y_{j\to i} \\ Y_{i\to j} \end{bmatrix} = \begin{bmatrix} T_j R_i \\ T_i R_j \end{bmatrix} B_{i\to j} + \begin{bmatrix} Z_{j\to i} \\ Z_{i\to j} \end{bmatrix} = \begin{bmatrix} c_i \\ c_j \end{bmatrix} \beta_{ij} + \begin{bmatrix} Z_{j\to i} \\ Z_{i\to j} \end{bmatrix}, \tag{3.17}$$

owing to the fact that, by the physical channel reciprocity, $B_{i\to j} = B_{j\to i}$, and defining $\beta_{ij} = \beta_{ji} = T_i T_j B_{i\to j}$. Our goal is to estimate the relative calibration coefficients $c_i$ for $i = 1,\ldots,N_a$, up to a common multiplicative non-zero constant. Without loss of generality we assume that $\{c_i\}$ is a set of non-zero bounded complex scalars (if $c_i = 0$ or $1/c_i = 0$, the $i$-th node can be omitted as it is a "non-communicating" node).

Inspection of (3.17) reveals that if the observations $Y_{j\to i}, Y_{i\to j}$ were noiseless, we would have $c_j Y_{j\to i} = c_i Y_{i\to j}$ for all $(i,j) \in \mathcal{F}_u$, where $\mathcal{F}_u$ is the set of undirected edges corresponding to $\mathcal{F}$, i.e., $\mathcal{F}_u = \{(i,j) : (i,j) \in \mathcal{F} \text{ and } (j,i) \in \mathcal{F}\}$. Hence, a natural approach in the presence of observation noise is to define the following LS cost function

$$J_{\mathrm{cal}}(c_1, c_2, \ldots, c_{N_a}) = \sum_{(i,j) \in \mathcal{F}_u} \left| c_j Y_{j\to i} - c_i Y_{i\to j} \right|^2, \tag{3.18}$$

and find the solution $c = (c_1, c_2, \ldots, c_{N_a})$ that minimizes (3.18).

At this point, some observations are in order. First, observe that in order to exclude the trivial all-zero solution we need to impose a fixed value for $c_1 \neq 0$, e.g., $c_1 = 1$ (without loss of generality we chose AP 1 as the reference AP). Second, notice that the constrained non-trivial solution is defined up to an arbitrary multiplicative constant of magnitude 1. Hence, there is no loss of generality in solving for $c_2, \ldots, c_{N_a}$ as a function of $c_1$ and replacing $c_1$ with some suitable value of non-zero magnitude.
Let $F_{\mathrm{cal}} = J_{\mathrm{cal}} - \lambda(\|c\|^2 - 1)$ denote the cost function augmented with a Lagrange multiplier $\lambda$ for a unit-norm constraint. Differentiating $F_{\mathrm{cal}}$ with respect to $c_i^*$, treating $c_i$ and $c_i^*$ as if they were independent variables [85], we obtain

$$\frac{\partial}{\partial c_i^*} F_{\mathrm{cal}}(c_1, c_2, \ldots, c_{N_a}, \lambda) = \sum_{j:(i,j) \in \mathcal{F}_u} \left( c_i |Y_{i\to j}|^2 - c_j Y_{i\to j}^* Y_{j\to i} \right) - \lambda c_i. \tag{3.19}$$

Setting the partial derivatives to zero, we obtain in matrix form $A c = \lambda c$, where $A$ is the $N_a \times N_a$ matrix with $(i,j)$-th elements

$$[A]_{i,j} = \begin{cases} \sum_{j':(i,j') \in \mathcal{F}_u} |Y_{i\to j'}|^2 & \text{for } j = i \\ -Y_{i\to j}^* Y_{j\to i} & \text{for } j \neq i,\ (i,j) \in \mathcal{F}_u \\ 0 & \text{for } j \neq i,\ (i,j) \notin \mathcal{F}_u. \end{cases} \tag{3.20}$$

We can solve for the variables $\tilde{c} = (c_2, \ldots, c_{N_a})^T$ by letting $A = [a_1 \,|\, A_1]$, where $a_1$ is the first column of $A$, such that

$$\tilde{c} = -(A_1^H A_1)^{-1} A_1^H a_1 c_1. \tag{3.21}$$

Finally, we notice that the sought constrained LS solution $c$ is a unit-norm eigenvector associated to the eigenvalue of $A$ with the smallest magnitude. In fact, this yields the direction in the domain $\mathbb{C}^{N_a}$ of $J_{\mathrm{cal}}(c)$ with slowest growth, such that the value of $J_{\mathrm{cal}}(c)$ is minimized at the intersection of the unit circle $\|c\|^2 = 1$ and the eigenspace of $A$.

We next note that the Argos relative calibration [10] (reviewed in Section 3.3) coincides with the solution to (3.18) subject to the graph $(\mathcal{T}, \mathcal{F})$ being a star with AP 1 at the center. In fact, in this case the objective function is given by

$$J_{\mathrm{cal}}(c_1, c_2, \ldots, c_{N_a}) = \sum_{j \neq 1} \left| c_j Y_{j\to 1} - c_1 Y_{1\to j} \right|^2, \tag{3.22}$$

where the constrained minimum is obviously achieved by letting $c_j = \dfrac{Y_{1\to j}}{Y_{j\to 1}} c_1$ for some $c_1$, as proposed in [10]. In general, however, we can obtain significantly better performance than [10] by considering topologies different from the star topology. As we demonstrate in Section 3.5 via simulations, this is especially true in the case of APs distributed over a relatively large area, resulting in AP-to-AP channel SNRs that can vary significantly between different AP pairs.

We next consider the MU-MIMO training and signaling operation based on a given set of estimates $\{\hat{c}_k\}$ of the relative calibration parameters.
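As a numerical illustration of the LS estimator (3.20)-(3.21) and of its star-topology special case, the following NumPy sketch builds $A$ from pairwise pilot observations and solves for the calibration coefficients with $c_1 = 1$. It is a sketch under stated assumptions rather than the code used for the results of this chapter: the function names, the gain model in `rand_gain`, the noise level, and the use of `numpy.linalg.lstsq` to evaluate (3.21) are all choices made for this example, and large-scale path gains are omitted.

```python
import numpy as np

rng = np.random.default_rng(1)
Na = 6  # number of APs (illustrative)

def crandn(*shape):
    """i.i.d. CN(0, 1) samples."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

def rand_gain(n):
    """Hypothetical RF-chain coefficients: magnitude near 1, uniform phase."""
    return rng.uniform(0.8, 1.2, n) * np.exp(1j * rng.uniform(-np.pi, np.pi, n))

def simulate_pilots(T, R, noise_std):
    """Y[i, j] holds Y_{i->j} = T_i B_{i->j} R_j + Z_{i->j}, cf. (3.15)-(3.16)."""
    n = len(T)
    B = np.triu(crandn(n, n), 1)
    B = B + B.T                       # reciprocity: B_{i->j} = B_{j->i}
    Y = T[:, None] * B * R[None, :] + noise_std * crandn(n, n)
    np.fill_diagonal(Y, 0.0)          # no self-measurements
    return Y

def ls_calibration(Y, edges):
    """Build A as in (3.20) over undirected edges, then solve (3.21) with c_1 = 1."""
    n = Y.shape[0]
    A = np.zeros((n, n), dtype=complex)
    for i, j in edges:
        for a, b in ((i, j), (j, i)):
            A[a, a] += abs(Y[a, b]) ** 2
            A[a, b] -= np.conj(Y[a, b]) * Y[b, a]
    a1, A1 = A[:, 0], A[:, 1:]
    c_tilde, *_ = np.linalg.lstsq(A1, -a1, rcond=None)  # LS solution of A1 c~ = -a1
    return np.concatenate(([1.0 + 0j], c_tilde))

def argos_calibration(Y):
    """Star around AP 1 (index 0): c_j = Y_{1->j} / Y_{j->1}, with c_1 = 1."""
    c = np.ones(Y.shape[0], dtype=complex)
    c[1:] = Y[0, 1:] / Y[1:, 0]
    return c

T, R = rand_gain(Na), rand_gain(Na)
c_true = (R / T) / (R[0] / T[0])      # coefficients relative to AP 1
edges_full = [(i, j) for i in range(Na) for j in range(i + 1, Na)]
edges_star = [(0, j) for j in range(1, Na)]

Y0 = simulate_pilots(T, R, noise_std=0.0)   # noiseless sanity check
c_ls, c_argos = ls_calibration(Y0, edges_full), argos_calibration(Y0)

Yn = simulate_pilots(T, R, noise_std=0.05)  # noisy observations
rel_err = lambda c: np.linalg.norm(c - c_true) / np.linalg.norm(c_true)
print("LS (full graph) error:", rel_err(ls_calibration(Yn, edges_full)))
print("Argos (star) error:   ", rel_err(argos_calibration(Yn)))
```

Running `ls_calibration` on `edges_star` reproduces the Argos ratios, which mirrors the statement that Argos is the star-graph instance of the LS formulation. The relative accuracy of the two estimators on noisy data depends on the drawn channels and on the link SNRs, so no particular outcome is claimed for this toy setup.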
First, observations of the form (3.3) are collected based on uplink pilots. These observations are then used at the CS to obtain an MMSE estimate of $H^{\mathrm{up}}$, namely $\hat{H}^{\mathrm{up}}$ (footnote 4). Assuming the set $\{\hat{c}_k\}_{k=2}^{N_a}$ is computed via an equation of the form (3.21) with $c_1 = 1$, the matrix $H_{\mathrm{alt}}$ is then constructed as follows:

$$H_{\mathrm{alt}} = (\hat{H}^{\mathrm{up}})^{T}\, \mathrm{diag}\big(1, \hat{c}_2^{-1}, \ldots, \hat{c}_{N_a}^{-1}\big). \tag{3.23}$$

Footnote 4: In principle, determining the gain of the MMSE filter requires knowledge of the magnitudes of the RF impairments at the AP receivers and the user terminal transmitters, i.e., quantities that are unknown. In our simulations we simply used the large-scale gains in the associated point-to-point channels as indicators of the large-scale SNR in order to determine the MMSE filter gains.

We remark that if we replace $\hat{c}_i$ with $c_i/c_1$, and $\hat{H}^{\mathrm{up}}$ with $H^{\mathrm{up}}$, the matrix $H_{\mathrm{alt}}$ takes the desired form (3.7) with diagonal $D$. Consequently, given $H_{\mathrm{alt}}$ from (3.23), and any given precoder function $V = V(H_{\mathrm{alt}})$, such as ZFBF in (3.8), the effective downlink channel is given by (3.9), with effective $N_u \times N_u$ channel matrix given by $\tilde{R} B T V$. We then use the instantaneous rate of the $i$-th user, $\log_2(1 + \mathrm{SINR}_i)$, as our performance metric, where $\mathrm{SINR}_i$ is computed in the usual manner.

The calibration performance of different calibration methods is evaluated via comparison of the associated user instantaneous rates (subject to a common MU-MIMO precoder method). As an upper bound we consider the performance with genie-aided calibration. Genie-aided calibration uses the same MU-MIMO training and signaling operation, with $H_{\mathrm{alt}}$ in (3.23) replaced with

$$H_{\mathrm{alt}}^{\mathrm{genie}} = (\hat{H}^{\mathrm{up}})^{T}\, \mathrm{diag}\big(c_1^{-1}, c_2^{-1}, \ldots, c_{N_a}^{-1}\big). \tag{3.24}$$

3.5 Simulated Performance Analysis

We provide a simulation-based comparison between the calibration scheme of [10] and our scheme. For convenience, we refer to the former as "Argos-Calibration" and to the latter as "LS-Calibration". First we consider APs to be co-located, forming a centralized MIMO array. In the co-located case, we assume all of the antennas are in the center of the square depicted in Figure 3.1. Second, we consider a scenario comprising 64 single antenna APs distributed over a regularly spaced $8 \times 8$ square grid, as shown in Figure 3.1. The system serves 16 UTs simultaneously, using Distributed MU-MIMO. The UTs are independently and randomly located with uniform probability over the square. The users' achievable rates are calculated by Monte Carlo simulation, randomizing over several realizations of the user locations. Figure 3.1 illustrates one such realization. The distance between the two most distant nodes is 100 m. Hence, the minimum distance in the regular grid arrangement between APs is equal to $100/(7\sqrt{2}) \approx 10.1$ m.

Figure 3.1: Sample topology involving APs arranged on an $8 \times 8$ grid, and 16 UTs (depicted by "+"). The reference antenna used for Argos calibration is marked with its own symbol. Axes: X coordinate, Y coordinate.

In order to isolate the impact of the TDD reciprocity calibration on the performance of MU-MIMO precoding, we assume perfect synchronization. The UL and DL channels are given by

$$Y_i^{\mathrm{up}}[m,\nu] = R_i \sum_{j=1}^{N_u} \gamma_{i,j} B_{i,j}[m,\nu]\, \tilde{T}_j X_j^{\mathrm{up}}[m,\nu] + Z_i^{\mathrm{up}}[m,\nu] \tag{3.25}$$

$$Y_j[m,\nu] = \tilde{R}_j \sum_{i=1}^{N_a} \gamma_{i,j} B_{i,j}[m,\nu]\, T_i X_i[m,\nu] + Z_j[m,\nu], \tag{3.26}$$

where $Z_j[m,\nu]$ and $Z_j^{\mathrm{up}}[m,\nu]$ are i.i.d. $\mathcal{CN}(0, N_0)$ additive Gaussian noise samples, the real non-negative scalar $\gamma_{i,j}$ denotes the large-scale path gain between AP $i$ and UT $j$, and we assume i.i.d. small-scale fading $B_{i,j}[m,\nu] \sim \mathcal{CN}(0, 1)$.
The large-scale path gains between AP-to-AP or AP-to-UT are both based on the WINNER model [86], where the pathloss (PL) is given in dB as a function of distance ($d$, in meters), carrier frequency ($f_0$, in GHz), and log-normal shadowing ($\chi_{\mathrm{dB}}$ with variance $\sigma^2_{\mathrm{dB}}$) as:

$$\mathrm{PL}(d) = A \log_{10}(d) + B + C \log_{10}(f_0/5) + \chi_{\mathrm{dB}}, \qquad 3 \le d \le 100, \tag{3.27}$$

where the parameters $A$, $B$, $C$, and $\sigma^2_{\mathrm{dB}}$ are scenario-dependent. We consider an indoor office scenario (footnote 5) where $A = 18.7$, $B = 46.8$, $C = 20$, $\sigma^2_{\mathrm{dB}} = 9$ when in line of sight, and $A = 36.8$, $B = 43.8$, $C = 20$, $\sigma^2_{\mathrm{dB}} = 16$ when not in line of sight. For distances $d < 3$ m, we conservatively extend the model by setting $\mathrm{PL}(d) = \mathrm{PL}(3)$. This is justified by the fact that the extremely high receive powers associated with shorter distances do not lead to higher link capacities, due to practical constraints on the modulation order as well as the receiver hardware (gain control and ADC range). The line of sight probability is given by a Bernoulli distribution with parameter $p_{\mathrm{los}}$, which depends on distance as follows:

$$p_{\mathrm{los}} = \begin{cases} 1 & d \le 2.5\ \mathrm{m} \\ 1 - 0.9\left(1 - (1.24 - 0.6 \log_{10}(d))^3\right)^{1/3} & \text{else.} \end{cases}$$

Footnote 5: These values correspond to the so-called A1 Indoor Office scenario with single, light walls in every path and where all of the UTs and APs are located on the same floor.

The large-scale path gain between AP $i$ and UT $j$ at distance $d_{i,j}$ is given by $\gamma_{i,j} = 10^{-\mathrm{PL}(d_{i,j})/20}$. For the distributed case, the large-scale gain between any AP and user is assumed to be uncorrelated. In the case of co-located antennas, we can model the large-scale fading $\gamma_{i,j}$ between a user $j$ and any antenna $i$ to be the same for all $i$, assuming a small antenna array size with omni-directional antennas.

In the TDD calibration protocols, pilot signals are transmitted between APs.
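The large-scale model above translates directly into a few lines of code. The sketch below is an illustration under stated assumptions, not the simulator used for the reported results: the function names and the `shadowing` switch are hypothetical, the shadowing term is drawn as a zero-mean Gaussian in dB with the stated variances, and the line-of-sight probability is clipped to $[0, 1]$ because, with the rounded coefficient 0.6, the closed-form expression slightly exceeds 1 just above $d = 2.5$ m.

```python
import numpy as np

rng = np.random.default_rng(3)

def los_probability(d):
    """Line-of-sight probability as a function of distance d in meters."""
    d = np.asarray(d, dtype=float)
    base = 1.24 - 0.6 * np.log10(np.maximum(d, 2.5))
    p = 1.0 - 0.9 * np.cbrt(1.0 - base ** 3)      # cbrt handles negative arguments
    return np.where(d <= 2.5, 1.0, np.clip(p, 0.0, 1.0))

def path_loss_db(d, f0_ghz, los, rng, shadowing=True):
    """Pathloss (3.27) in dB; PL(d) = PL(3) is used for d < 3 m."""
    d_eff = np.maximum(np.asarray(d, dtype=float), 3.0)
    A = np.where(los, 18.7, 36.8)
    B = np.where(los, 46.8, 43.8)
    C = 20.0
    sigma_db = np.where(los, 3.0, 4.0)            # square roots of the variances 9 and 16
    chi = sigma_db * rng.standard_normal(np.shape(d_eff)) if shadowing else 0.0
    return A * np.log10(d_eff) + B + C * np.log10(f0_ghz / 5.0) + chi

def large_scale_gain(d, f0_ghz, rng):
    """Amplitude path gain 10^(-PL/20), with a random LOS/NLOS state per link."""
    d = np.asarray(d, dtype=float)
    los = rng.random(d.shape) < los_probability(d)
    return 10.0 ** (-path_loss_db(d, f0_ghz, los, rng) / 20.0)

# Example: gains for a few AP-to-UT distances at an assumed 2.45 GHz carrier.
distances = np.array([1.0, 10.0, 50.0, 100.0])
print(large_scale_gain(distances, 2.45, rng))
```

Because the gains are random (LOS state and shadowing), the printed values change with the seed; only the deterministic part of the model can be checked against (3.27) by calling `path_loss_db` with `shadowing=False`.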
In particular, when AP $j$ transmits a pilot symbol $X_j[m,\nu]$, AP $i$ receives

$$Y_{j\to i}[m,\nu] = R_i\, \gamma_{j\to i} B_{j\to i}[m,\nu]\, T_j X_j[m,\nu] + Z_{j\to i}[m,\nu]. \tag{3.28}$$

Here $\gamma_{j\to i} = \gamma_{i\to j}$ denotes the large-scale path gain between APs $i$ and $j$ and follows the same model used for $\gamma_{i,j}$. The small-scale fading coefficients $B_{i\to j}[m,\nu]$ have the same statistics as $B_{i,j}[m,\nu]$, and we have $B_{i\to j}[m,\nu] = B_{j\to i}[m,\nu]$ due to the TDD reciprocity. For the co-located AP scenario, we assume the $B_{j\to i}$ are i.i.d. Rician random variables with parameter $\kappa$. This model allows us to study the effect of SNR variations on the performance of various calibration schemes.

The hardware-induced non-reciprocal coefficients $\{R_i, T_i\}$ and $\{\tilde{R}_j, \tilde{T}_j\}$ are modeled as i.i.d. complex random variables with uniformly distributed phase over $[-\pi, \pi]$ and uniformly distributed magnitude in $[1 - \epsilon, 1 + \epsilon]$, where $\epsilon$ is chosen such that the standard deviation of the squared magnitudes is equal to 0.1.

Figure 3.2 illustrates the net effect of calibration in the centralized scenario on the user achievable rate for a sample user. Specifically, the figure shows the rate CDFs of user 2 (as shown in Figure 3.1) for Argos-Calibration, LS-Calibration, and genie-aided calibration for $\kappa = 0$ (Rayleigh) and $\kappa = 1000$ when ZFBF is used as the multi-user precoding. The reference antenna location used in Argos-Calibration is also indicated in Figure 3.1. LS-Calibration is run using a fully connected graph.

Figure 3.2: CDF of rates of the user depicted "2" in Figure 3.1 when calibration is done by Argos, LS and genie-aided calibration for various $\kappa$ values. The results are obtained for $P_C = 10^{3}$ and $P_T = 5(10^{10})$. From this perspective, the two full-LS curves essentially lie on top of the genie-aided scheme. Axes: Rate, Scaled CDF. Curves: ZF Genie-aided Cal.; ZF Full-LS Cal. ($\kappa = 0$ and $\kappa = 1000$); ZF Argos Cal. ($\kappa = 0$ and $\kappa = 1000$).

In Figure 3.3, we present the rates of three particular users (labeled 1, 2 and 3 in Figure 3.1) for the particular realization of the user locations shown in Figure 3.1. Figure 3.4 shows the performance in terms of the achievable rate cumulative distribution function (CDF), obtained by generating independent realizations of the channels, of the calibration estimation, and of the UT positions. In these results we have assumed (for simplicity) that all UTs are served with equal power per user, and we have scaled the transmit signals by a common scaling factor such that the transmit power at each antenna does not exceed the per-antenna power constraint of 90 dB. Backing off the transmit power in order to meet the per-AP power constraint is suboptimal. The optimal MU-MIMO zero-forcing precoding with per-antenna power constraint is discussed in detail in [87]; its optimization requires non-uniform power allocation per DL data stream and can be obtained using the theory of generalized inverses and convex relaxation. Here, we chose to use a more practical suboptimal approach for the sake of simplicity. LS-Calibration is run using a fully connected graph. Genie-aided calibration uses the true values of the $T_i$'s and $R_i$'s to calculate the calibration coefficients. It should be noted that the pilot overhead for the LS and Argos schemes is identical.

Figure 3.3: Rate CDFs using ZFBF (a) and conjugate beamforming (b), with Argos-Calibration (dashed), LS-Calibration (solid) and genie-aided calibration (dash-dot), for the UTs indicated by "1", "2" and "3" in Figure 3.1. Axes: Client Rate (bits/channel use), CDF.

Figure 3.4: The CDFs of the user rates, across the user locations and the calibration estimates. Figure (a) displays the results for ZFBF, and Figure (b) the results for conjugate beamforming. Axes: Client Rate (bits/s/Hz), CDF. Curves: Argos Calibration, LS Calibration, Genie-Aided Cal.

The performance of the different calibration schemes is affected by the SNR value between antennas. As reported in [10], Argos-Calibration requires a careful placement of the reference antenna such that it has high enough SNR to all the other antennas. Argos, which was designed for co-located antennas, does not perform well in the distributed case due to the large pathlosses between the reference antenna and the more distant antennas. Since LS-Calibration does not depend on a single reference antenna, its performance is much less sensitive to the quality of a single AP-to-AP channel, and it achieves essentially genie-aided performance.

3.6 Hierarchical calibration

In a large distributed network of nodes it may be necessary to provide relative calibration between many nodes that are dispersed over wide areas. The calibration methods of Section 3.4 require the inversion of a matrix with dimensions equal to the size of the network. Clearly, with increasing network sizes, these methods may become prohibitively computationally expensive.

In this section we consider an alternative approach, hierarchical calibration, that becomes attractive for calibration in large-scale networks. In its simplest two-layer case, this involves first splitting the network into sufficiently small clusters, and using the techniques of Section 3.4 to calibrate all nodes within each cluster.
Subsequently, a second inter-cluster calibration step is performed, which accomplishes relative calibration across clusters.

For convenience we re-index cluster nodes within each cluster, and denote by $(i,m)$ the $m$-th node in cluster $i$. We also let $c_{i,m} = R_{i,m}/T_{i,m}$ denote the unknown parameter of interest. We will assume that for each $i$, sufficiently accurate intra-cluster calibration has been performed using an algorithm of the form (3.21), so that each node has been calibrated relative to a reference node in cluster $i$. In particular, given that the algorithm (3.21) applied to cluster $i$ has returned intra-cluster calibration estimates $\{\hat{c}_{i,m}\}_m$, we have

$$c_{i,m} \approx \hat{c}_{i,m}\, c_i, \tag{3.29}$$

for some unknown parameter $c_i$. We also let $Y_{(i,m)\to(j,n)}$ denote the observation at node $(j,n)$ based on a pilot transmitted by node $(i,m)$, i.e., an observation of the form (3.16), with $i$ and $j$ replaced by $(i,m)$ and $(j,n)$, respectively.

In order to study the inter-cluster calibration problem, we next consider clusters as nodes on a graph. Assume a connected cluster-network graph $(\mathcal{T}_{\mathrm{cl}}, \mathcal{E}_{\mathrm{cl}})$ where pilot bursts are transmitted and received by APs across clusters. The inter-cluster-communicating APs form a connected spanning subgraph $(\mathcal{T}_{\mathrm{cl}}, \mathcal{F}_{\mathrm{cl}})$ including all the clusters and a subset of links $\mathcal{F}_{\mathrm{cl}} \subseteq \mathcal{E}_{\mathrm{cl}}$. A pair $(i,j) \in \mathcal{F}_{\mathrm{cl}}$ if there is at least one pair of observations $\{Y_{(i,m)\to(j,n)}, Y_{(j,n)\to(i,m)}\}$ of the form (3.16) that are to be used for calibration. The pair is due to a pair of calibration pilots transmitted by APs $(i,m)$ and $(j,n)$ on distinct OFDM symbols but within the coherence time of the channel. We also let $\mathcal{G}_{ij}$ denote the set of all $(m,n)$ index pairs for which such bi-directional pairs of observations are available between APs $(i,m)$ and $(j,n)$ in clusters $i$ and $j$ respectively. Thus, $(i,j) \in \mathcal{F}_{\mathrm{cl}}$ if and only if the set $\mathcal{G}_{ij}$ is non-empty.

A visual interpretation of the hierarchical calibration problem is shown in Figures 3.5 and 3.6.
Figure 3.5 depicts a $9 \times 9$ network of nodes. As shown in the figure, there are nine $3 \times 3$ clusters of APs. Assuming intra-cluster calibration has already been performed within each $3 \times 3$ cluster, the bidirectional arrows (where each arrow represents two-way measurements between APs in different clusters) represent a set of measurements that can be used for inter-cluster calibration.

Figure 3.6 shows the corresponding connected spanning subgraph $(\mathcal{T}_{\mathrm{cl}}, \mathcal{F}_{\mathrm{cl}})$ associated with the two-way measurements shown in Figure 3.5, on which the inter-cluster calibration is to be performed. The inter-cluster calibration problem can be tackled with a straightforward extension of the baseline methods of Section 3.4.

Figure 3.5: Inter-cluster calibration among nine $3 \times 3$ clusters of APs, based on two-way measurements (double arrows) between APs in different clusters.

Figure 3.6: Subgraph on which inter-cluster calibration is performed on the network of Figure 3.5.

The associated objective function for inter-cluster calibration can then be readily expressed as follows:

$$J_h = \sum_{(i,j) \in \mathcal{F}_{\mathrm{cl}}} \sum_{(m,n) \in \mathcal{G}_{ij}} \left| c_{i,m} Y_{(i,m)\to(j,n)} - c_{j,n} Y_{(j,n)\to(i,m)} \right|^2. \tag{3.30}$$

The solution can be readily derived by following the same steps as for the baseline LS-Calibration. Letting

$$\tilde{Y}_{(i,m)\to(j,n)} = \hat{c}_{i,m} Y_{(i,m)\to(j,n)} \tag{3.31}$$

and using (3.29), we can re-express $J_h$ in (3.30) as a function of the $c_i$'s as follows:

$$J_h = \sum_{(i,j) \in \mathcal{F}_{\mathrm{cl}}} \sum_{(m,n) \in \mathcal{G}_{ij}} \left| c_i \tilde{Y}_{(i,m)\to(j,n)} - c_j \tilde{Y}_{(j,n)\to(i,m)} \right|^2. \tag{3.32}$$

Letting $\tilde{c} = (c_2, \ldots, c_{N_C})^T$, with $N_C = |\mathcal{T}_{\mathrm{cl}}|$, the vector $\tilde{c}$ that minimizes (3.32) as a function of $c_1$ is given by (3.21), where $A$ is the $N_C \times N_C$ matrix with element in its $i$-th row and $j$-th column given by

$$A_{i,j} = \begin{cases} \displaystyle\sum_{j':(i,j') \in \mathcal{F}_{\mathrm{cl}}} \sum_{(m,n) \in \mathcal{G}_{ij'}} \big| \tilde{Y}_{(i,m)\to(j',n)} \big|^2 & \text{for } j = i \\ \displaystyle -\sum_{(m,n) \in \mathcal{G}_{ij}} \tilde{Y}_{(i,m)\to(j,n)}^{*}\, \tilde{Y}_{(j,n)\to(i,m)} & \text{for } j \neq i. \end{cases}$$

Notice that for some $i \neq j$, the coefficient $A_{i,j}$ may be zero if $\mathcal{G}_{ij}$ is empty, i.e., if $(i,j) \notin \mathcal{F}_{\mathrm{cl}}$.
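The two-layer procedure can be sketched in a few lines of NumPy. The example below is a noiseless toy under stated assumptions, not the implementation evaluated in this work: it uses three clusters of three nodes, two hypothetical inter-cluster links, illustrative function names, and `numpy.linalg.lstsq` to evaluate the solution of the form (3.21). It first calibrates each cluster relative to its own reference node, then rescales the inter-cluster observations as in (3.31), solves for one coefficient per cluster from (3.32), and finally combines the two levels as in (3.29).

```python
import numpy as np

rng = np.random.default_rng(2)

def crandn(*shape):
    """i.i.d. CN(0, 1) samples."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

def rand_gain(n):
    """Hypothetical RF-chain coefficients: magnitude near 1, uniform phase."""
    return rng.uniform(0.8, 1.2, n) * np.exp(1j * rng.uniform(-np.pi, np.pi, n))

def calibration_matrix(n, pairs):
    """Accumulate A from tuples (a, b, Y_ab, Y_ba), Y_ab being sent by a and received by b."""
    A = np.zeros((n, n), dtype=complex)
    for a, b, y_ab, y_ba in pairs:
        A[a, a] += abs(y_ab) ** 2
        A[b, b] += abs(y_ba) ** 2
        A[a, b] -= np.conj(y_ab) * y_ba
        A[b, a] -= np.conj(y_ba) * y_ab
    return A

def solve_with_reference(A):
    """LS solution of A c = 0 with c[0] fixed to 1, in the spirit of (3.21)."""
    x, *_ = np.linalg.lstsq(A[:, 1:], -A[:, 0], rcond=None)
    return np.concatenate(([1.0 + 0j], x))

N = 9
clusters = [np.arange(0, 3), np.arange(3, 6), np.arange(6, 9)]
inter_links = [(2, 3), (5, 6)]        # hypothetical two-way links between clusters

T, R = rand_gain(N), rand_gain(N)
B = np.triu(crandn(N, N), 1)
B = B + B.T                           # reciprocal propagation channel
Y = T[:, None] * B * R[None, :]       # noiseless Y[u, v] = Y_{u->v}

# Step 1: intra-cluster calibration, each cluster relative to its first node.
cluster_of = np.empty(N, dtype=int)
c_hat = np.empty(N, dtype=complex)
for k, members in enumerate(clusters):
    cluster_of[members] = k
    pairs = [(a, b, Y[members[a], members[b]], Y[members[b], members[a]])
             for a in range(len(members)) for b in range(a + 1, len(members))]
    c_hat[members] = solve_with_reference(calibration_matrix(len(members), pairs))

# Step 2: inter-cluster calibration on rescaled observations, cf. (3.31)-(3.32).
Y_tilde = c_hat[:, None] * Y          # scale by the transmitter's intra-cluster estimate
pairs = [(cluster_of[u], cluster_of[v], Y_tilde[u, v], Y_tilde[v, u])
         for u, v in inter_links]
c_cluster = solve_with_reference(calibration_matrix(len(clusters), pairs))

# Step 3: combine both levels as in (3.29).
c_est = c_hat * c_cluster[cluster_of]
c_true = (R / T) / (R[0] / T[0])
print("max abs error:", np.max(np.abs(c_est - c_true)))
```

Each LS problem here has at most three unknowns, whereas a flat LS calibration of the same network would involve a $9 \times 9$ matrix; this is the computational saving that motivates the hierarchy. With noisy observations the intra-cluster errors propagate to the inter-cluster step, which this noiseless sketch does not attempt to quantify.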
3.7 Conclusion

In this chapter we have developed hardware-impairment compensation techniques that can enable spatial multiplexing gains with reciprocity-based multiuser MIMO in the downlink of large-scale Distributed MIMO deployments with inexpensive radios. Reciprocity-based MU-MIMO critically promises large aggregate spectral efficiencies in such networks with manageable CSI acquisition overheads by exploiting uplink-downlink propagation-channel reciprocity (within the channel coherence time and bandwidth). However, since radios typically have randomly distributed non-reciprocal impairments in their baseband-to-RF and RF-to-baseband chains, there is no end-to-end uplink-downlink reciprocity. In this chapter we presented calibration algorithms for these impairments which enable the desired multiplexing gains of large-scale Distributed MIMO. The RF calibration methods we present are robust extensions of the Argos calibration methods [10] (originally developed for co-located Massive MIMO deployments), and can enable spatial multiplexing gains without requiring knowledge of the user-terminal impairments or user-terminal involvement in the calibration process.

A final remark about scalability is in order. For general network graphs, the computational complexity of LS for both synchronization and TDD calibration is polynomial in the number of APs. For certain network graphs this complexity can be linear (e.g., in the star topology of the Argos scheme [10]). In any case, the computation involved in solving the LS estimation problems is easily affordable for networks of practical size (up to 100 or more APs). It is also interesting to remark that the data exchange over the wired backhaul network required by the proposed protocol is also easily affordable. For example, suppose that the timing and frequency measurements (real numbers) and the calibration pilot symbol (complex numbers) are represented with 16 bits per real coefficient.
This requires 64 bits per frame per AP (roughly speaking). For frames of 10 ms, this is equivalent to 6.4 kbit/s per AP of protocol data overhead. For a 1 Gb/s Ethernet backbone, as would be meaningful for a Distributed MU-MIMO system, this overhead is more than 5 orders of magnitude below the backhaul capacity. Therefore, a system with 100 APs would consume less than 0.1% of the backhaul capacity. However, in the interest of scalability, we will revisit this topic in Section 4.4.

Chapter 4 System-Level Considerations

4.1 Introduction

The development of practical synchronization and calibration schemes in the previous chapters has enabled the primary functional building blocks of scalable Distributed multi-user MIMO. However, a number of important questions remain in making this network topology realizable. How often is synchronization needed? How does the system perform under hardware impairments? Which precoding schemes are practical under realistic backhaul and processing constraints? This chapter seeks to answer some of these questions through simulation and analysis, such that a realizable Distributed MIMO network can be designed.

It is worth noting that, like many problems in networks, the solutions to these problems are highly inter-dependent and very topology-dependent. Merely selecting the best (user rate-optimal) set of anchor nodes is combinatorial in nature, and finding global optima is likely computationally intractable. However, as will be seen in this chapter, reasonable heuristics can be developed which enable practical solutions to these issues.

The focus of this chapter is primarily on PHY layer issues. These issues remain fundamental to the efficient operation of the system, but do not represent the entirety of the challenge of building a realizable Distributed MIMO system.
Many questions remain in the MAC layer optimization, as hinted at in some of the original Distributed MIMO implementation works [30, 88]. Recent progress has been made in this area as well [89], though the interaction of the system with higher layer protocols remains an open question.

4.2 Simulations

In the previous chapters, synchronization and calibration have been considered in isolation. When evaluating synchronization, calibration has been assumed to be ideal, and when evaluating calibration, ideal synchronization has likewise been assumed. This section employs simulations to examine the overall achievable rates possible when these schemes are jointly utilized.

The results in Figure 4.1 are derived from a network of 64 single antenna APs serving a set of 16 single antenna, uniformly randomly distributed users (this scenario is derived from the simulations described in Section 3.5). The propagation model is again adopted from the WINNER-II [86] indoor office scenario A1. No anchor nodes are designated for the purposes of synchronization, since the choice of anchor nodes impacts the system performance. Rather, the SNR thresholding technique used to define the graph connectivity in Section 2.6 is used. For this purpose, a threshold of 5 dB was chosen, corresponding to the threshold behavior of the AirSync estimator shown in Figure 2.3.

Each transmission frame (as defined in Figure 1.3) is composed of 3200 OFDM symbols (derived from the maximum payload size of 802.11 [18]) of downlink data transmission, and pilot overhead proportional to the number of APs, as described in the following paragraphs. In accordance with the 20 MHz bandwidth 802.11 variants, the bandwidth is divided into 64 subcarriers, and a cyclic prefix of length 16 chips is adopted. The simulations adopt the synchronization technique of Sections 2.3.3 and 2.8.
The simulated carrier frequency is in the ISM band at 2.45 GHz, with an operating bandwidth of 20 MHz, again akin to the usage of the 2.45 GHz band in 802.11. This carrier frequency is derived from a 40 MHz nominal sampling frequency, with a frequency multiplication factor of 61.25. The oscillator dynamics are modeled, as in Section 2.8, using an AR-1 process. This model is validated against hardware measurements in Section 5.2, and these measured parameters are used to drive the AR-1 process in these simulations. The uncompensated frequency offsets are uniformly distributed in the range of 2.45 GHz ± 49 kHz, the limits of which are dictated by the 802.11 standard's requirement of 20 ppm oscillators [18]. The assumption that these uncompensated frequency offsets are uniformly distributed is not physically motivated, but the uniform distribution is the maximum entropy distribution for a closed interval [90], and the initial conditions do not affect the Kalman filter's steady-state performance [64].

The synchronization pilots adopt the AirSync format described in Section 2.3.3, wherein two bursts of pilots are sent with an interval between them, and other APs transmit their bursts throughout these intervals. In this way, it is possible to interleave a large number of pilots on the same time-frequency resource, while reducing the mean square error of the one-shot estimators. When perfectly interleaved, this technique introduces no additional overhead. In these simulations, each pilot burst is 4 OFDM symbols in length for a total of 8 OFDM symbols (as in Figure 2.3).

The calibration pilots are 4 OFDM symbols in length, and calibration is performed utilizing the technique of Section 3.4. Calibration is performed with the same frequency as synchronization. When the transmitters re-synchronize each frame, their relative phase offsets are reset according to the estimation of the new timing and frequency offsets.
Rather than explicitly compensate for the phase offsets as is done in [30], the Kalman approach integrates these phase offsets into the effective channels of the APs and UTs. As such, calibration needs to be performed after each synchronization round in order to take these new relative phase offsets into consideration. After calibration, uplink channel estimation is performed by the transmission of 4 OFDM symbols of Zadoff-Chu sequences [91] from each user.

Figure 4.1 illustrates the behavior of the sum rate over 30 frames. The small-scale fading is modeled using block fading, in which the AP-to-AP and AP-to-user channels are assumed constant over the course of a frame, and vary as i.i.d. Rayleigh fading from frame to frame. In this way, slight variation in the "ideal" curve can be observed. The "actual" curve illustrates the sum rate when degraded by imperfect synchronization and calibration. The sawtooth nature of the curves reflects the periodic resynchronization, with the zero-forcing precoder becoming progressively more mismatched over the course of a frame. The initial degradation (i.e. the gap between "actual" and "ideal" right after synchronization and calibration) comes partially from the mismatch in the precoder, partially from imperfect calibration, and partially from imperfect estimation of the uplink pilots.

The results in Figure 4.1 raise a fundamental question for a Distributed MIMO system designer: how long should the pilots in the synchronization and calibration slots be to maximize the total throughput? As revealed in Section 2.6, longer synchronization pilots lead to more accurately synchronized transmitters. However, utilizing more of the channel resources for synchronization lowers the overall throughput, as longer gaps between data frames reduce the air time for downlink data. For a fixed frame length, what is the optimal length of the pilot sequences? Figure 4.2 further illustrates the tradeoff at hand.
In this case, a cluster of 5 APs with 4 antennas each is being synchronized for transmission to 8 single antenna users. Increasing the length of the pilot sequences helps until a certain point (6 OFDM symbols), and any additional gain in synchronization fidelity after that does not outweigh the channel resources being used for synchronization. Of course, these results are entirely simulation-based and are highly topology dependent. In the following section, we develop a model which allows for a more general analysis of this phenomenon.

Figure 4.1: System performance using the synchronization and calibration schemes of Chapters 2 and 3 together. The sum rate is displayed over 30 frames, where each frame is defined as in Figure 1.3. (Plot of Sum Rate (bit/s/Hz) versus Time (OFDM Symbols); curves: Ideal, Actual.)

Figure 4.2: Sum goodput (sum rate times the proportion of each frame used for data transmission) versus the synchronization pilot length. The benefits of longer pilots have diminishing returns, so there is an upper limit on the usefulness of increasing the pilot length. (Plot of Sum Goodput (bit/s/Hz) versus Pilot Length (OFDM Symbols).)

4.3 Frequency Offsets and Phase Noise

Frequency offsets between transmitters of the same basestation are not typically analyzed since multiple RF chains are often driven by the same clock source. However, as revealed in Figure 2.1, in a Distributed MIMO system frequency offsets are very relevant to the proper operation of downlink beamforming. For the cost-efficient over the air synchronization schemes detailed in Chapter 2, the synchronization step relies on estimation algorithms which are inherently imperfect due to the estimation noise. Even if utilizing expensive fiber clock distribution, flaws in the transmission medium can lead to imperfect synchronization.
In both cases, residual frequency offsets, though small, still exist after the synchronization protocol.

Additionally, phase noise is present in all systems, whether they operate on the same oscillator or on synchronized oscillators. For instance, Wiener phase noise is a well-accepted model of free-running oscillators characterized by independent identically distributed Gaussian phase increments. Additional phase noise models exist which tend to be more appropriate for RF communications, namely the Tikhonov phase noise utilized in some information theoretic works [92].

While these models of phase noise are well understood, they are not typically thought to have a significant impact on achievable user rates, though they have been studied for their impact on channel state information acquisition [93, 94]. Such analyses observe that the impact of phase noise is largely experienced at high SNR and can be managed through appropriate training sequence design, but the effects of phase noise during data transmission are omitted. A similar phase noise analysis has been performed in the time domain for the uplink of a massive MIMO system, but this analysis does not consider frequency offsets or the impact on downlink achievable rates [95].

Figure 4.3: The three scenarios considered in the following analysis. S1) Centralized Massive MIMO, S2) CoMP, where the dashed lines between the independent oscillators represent perfect frequency synchronization, and S3) Distributed MIMO. (Panels: (a) S1, (b) S2, (c) S3.)

Additionally, frequency offsets have been analyzed in a number of scenarios in the existing literature. A simple two transmitter, two receiver example was analyzed in [96], but no generalizable results were found. The following analysis utilizes the approach of deriving an effective SNR by which to optimize the system.
Complementary approaches have been developed which derive the effective channel mean square error [97], which could be used in a similar manner. Additionally, the same tactic could be used with the signal to interference ratio, which has been derived exactly [98]. We utilize the effective SNR in the following analysis due to its direct connection to achievable rates. We show through the calculation of achievable rates that the effects of frequency offsets and phase noise can be significant, and should influence the design of large scale antenna systems.

Consider a system of $M$ basestation antennas serving $K$ single-antenna users. These basestation antennas may take on a variety of interesting topologies, and we title them as follows:

Scenario 1 (S1): A single basestation with a large number of antennas
Scenario 2 (S2): A network of basestations with high accuracy independent clock sources connected over a high speed backhaul (such as LTE CoMP)
Scenario 3 (S3): A network of basestations with independent oscillators, where the oscillators may lack perfect synchronization (i.e. Distributed MIMO)

These scenarios are illustrated in Figure 4.3.

The instantaneous sampled carrier phase of each BS antenna $m$ at time $n$ is represented by a complex phasor $\phi_m[n] = \exp(j\theta_m[n])$. The phase process $\theta_m[n]$ is treated with some generality, though we will highlight a few important cases. The first phase noise model we utilize is Wiener phase noise, which is appropriate for free running and laser oscillators. In this case, the phase process evolves as $\theta_m[n] = \theta_m[n-1] + w_m[n]$ where $w_m[n] \sim \mathcal{N}(0, \sigma_m^2)\ \forall n$. In addition to Wiener phase noise, we also consider a simple memoryless phase noise process known as Tikhonov phase noise, which is seen in coherent systems as the residual phase error after a first order PLL has locked [99]: $\theta_m[n] = w_m[n]$ where $w_m[n] \sim \mathrm{Tikhonov}(0, \kappa_m)\ \forall n$.
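The two phase noise models just described can be simulated directly. The following is a minimal NumPy sketch and is not taken from the original work: the function names, the values of `sigma` and `kappa`, and the trial count are illustrative assumptions. It draws Wiener and Tikhonov (von Mises) phase samples and compares the empirical mean phasor against the standard closed forms $\exp(-n\sigma^2/2)$ and $I_1(\kappa)/I_0(\kappa)$.

```python
import numpy as np

def wiener_phase(n_steps, sigma, rng):
    """theta[n] = theta[n-1] + w[n], w[n] ~ N(0, sigma^2), with theta[0] = 0."""
    increments = rng.normal(0.0, sigma, size=n_steps)
    return np.concatenate(([0.0], np.cumsum(increments)))

def tikhonov_phase(n_steps, kappa, rng):
    """Memoryless model: theta[n] = w[n], w[n] ~ Tikhonov(0, kappa) (von Mises)."""
    return rng.vonmises(0.0, kappa, size=n_steps + 1)

def bessel_ratio(kappa, n_grid=200_000):
    """I_1(kappa) / I_0(kappa) from the integral form
    I_v(kappa) = (1/pi) * int_0^pi exp(kappa*cos t) * cos(v*t) dt,
    evaluated with a midpoint rule. The common factor exp(-kappa)
    cancels in the ratio and avoids overflow for large kappa."""
    t = (np.arange(n_grid) + 0.5) * np.pi / n_grid
    weight = np.exp(kappa * (np.cos(t) - 1.0))
    return np.sum(weight * np.cos(t)) / np.sum(weight)

rng = np.random.default_rng(0)
sigma, kappa, n, trials = 0.05, 50.0, 200, 20_000  # illustrative values only

# One sample path of each model, e.g. for plotting.
wiener_path = wiener_phase(n, sigma, rng)
tikhonov_path = tikhonov_phase(n, kappa, rng)

# Wiener: the phase after n steps is a sum of n Gaussian increments.
theta_n = rng.normal(0.0, sigma, size=(trials, n)).sum(axis=1)
wiener_empirical = np.mean(np.exp(1j * theta_n)).real
wiener_theory = np.exp(-n * sigma**2 / 2)

# Tikhonov: the mean phasor does not depend on the time index.
tikhonov_samples = rng.vonmises(0.0, kappa, size=trials)
tikhonov_empirical = np.mean(np.exp(1j * tikhonov_samples)).real
tikhonov_theory = bessel_ratio(kappa)

print(wiener_empirical, wiener_theory)
print(tikhonov_empirical, tikhonov_theory)
```

The Bessel ratio is computed by numerical integration only to keep the sketch free of dependencies beyond NumPy; a special-function library would serve equally well.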
On top of phase noise, we also assume the phase process can experience a constant drift under the influence of frequency offsets. In isolation from phase noise, the phase process would then evolve as $\theta_m[n] = \theta_m[n-1] + c_m$, where $c_m$ is the normalized frequency offset relative to some nominal frequency. Of course, these effects may be combined, for example, such that the phase process evolves as $\theta_m[n] = \theta_m[n-1] + c_m + w_m[n]$.

Similarly, the instantaneous carrier phase of each user $k$ at time $n$ is represented by $\psi_k[n] = \exp(j\vartheta_k[n])$, which may evolve, for example, as $\vartheta_k[n] = \vartheta_k[n-1] + d_k + v_k[n]$ where $v_k[n] \sim \mathcal{N}(0, \sigma_k^2)\ \forall n$ and where $d_k$ is the normalized frequency offset of user $k$ relative to the nominal frequency.

Let the baseband channel coefficient $g_{k,m}$ between BS antenna $m$ and user $k$ be determined by a pathloss parameter $\beta_{k,m}$ and a small-scale fading parameter $b_{k,m}$ such that $g_{k,m} = \sqrt{\beta_{k,m}}\, b_{k,m}$. We assume the channel coefficient is flat-fading, which is either the result of a sufficiently narrowband channel or the use of frequency division with sufficiently narrow subcarrier spacing. Assuming a sufficiently rich scattering environment, the small-scale fading has a standard Rayleigh distribution, or $b_{k,m} \sim \mathcal{CN}(0, 1)$, though this fact is not implicitly utilized in the following results. We assume ideal, absolute calibration, so the $t_i$ and $r_i$ parameters are unity for all basestation and user RF front ends. The effective downlink channel fading between BS antenna $m$ and user $k$ is then represented as

$$h^{DL}_{k,m}[n] = \sqrt{\beta_{k,m}}\, b_{k,m}[n]\, e^{-j(\theta_m[n] - \vartheta_k[n])}. \tag{4.1}$$

Let the uplink channel coefficients be determined by the same parameters as the downlink channel, but with the variables $k$ and $m$ transposed. As a result of channel reciprocity, $g_{k,m} = g_{m,k}$ [68].
The resulting uplink channel fading between user $k$ and BS antenna $m$ is given by

$$h^{UL}_{m,k}[n] = \sqrt{\beta_{k,m}}\, b_{k,m}[n]\, e^{-j(\vartheta_k[n] - \theta_m[n])}. \tag{4.2}$$

In aggregating these channels, we develop the following $K \times M$ downlink and $M \times K$ uplink channel matrices:

$$H_{DL}[n] = \Psi[n]\, G[n]\, \Phi[n]^H \tag{4.3}$$
$$H_{UL}[n] = \Phi[n]\, G[n]^T\, \Psi[n]^H \tag{4.4}$$

where

$$\Phi[n] = \mathrm{diag}(\phi_1[n], \ldots, \phi_M[n]), \quad \Psi[n] = \mathrm{diag}(\psi_1[n], \ldots, \psi_K[n]), \quad [G[n]]_{i,j} = g_{i,j}[n].$$

For notational simplicity, when the dependence on $n$ is obvious, the parameter is dropped. Note here that the effective uplink and downlink channels are not related by the hermitian transpose, as is commonly assumed in the information theory literature [38].

We ultimately wish to analyze the achievable rates of the users, for which the received symbol vector is

$$y = H_{DL} V x + z \tag{4.5}$$

where $V$ is the transmit beamforming matrix, $x$ is the vector of transmitted symbols, and $z_i \sim \mathcal{CN}(0, 1)\ \forall i$ is complex circularly symmetric additive white Gaussian noise. Our analysis will focus on the case in which the beamforming matrix is calculated as the conjugate transpose of the estimate of the downlink channel, i.e. $V = \hat{H}_{DL}^H$ [2].

User Rate Analysis

We consider uplink TDD channel estimation, such that the uplink channel matrix $H_{UL}$ is estimated column-by-column through pilots transmitted by each user. For the remainder of this section, we assume a block fading model in which the channel coefficients are constant within the period of an $N + L$ channel symbol frame (i.e. an uplink channel estimation round of length $L$ followed by a downlink beamforming transmission of length $N$) and thus we drop the dependence on the time variable. Further, in order to isolate the effects of phase noise and frequency offsets, we assume the noiseless reception of uplink pilot sequences.

We assume that the training sequence is relatively short compared to the data portion of the frame, and that the effect of residual frequency offsets and phase noise is small over short time periods, so we utilize the approximation that $\psi_k[n] \approx \psi_k[0]\ \forall n \in \{0, \ldots, L-1\}, \forall k$ and $\phi_m[n] \approx \phi_m[0]\ \forall n \in \{0, \ldots, L-1\}, \forall m$. Then when user $k \in \{1, \ldots, K\}$ transmits its pilots, the training data is received as

$$y^{tr}_k = \psi_k^*[0]\, \Phi[0]\, g_k^T \tag{4.6}$$

where $g_k$ is the $k$th row of the matrix $G$. In other words, the basestation receives the $k$th column of the matrix $H_{UL}[0]$. At the end of the pilot transmissions (i.e., after time $L-1$), the BS aggregates the channel measurements to form the uplink estimation matrix

$$\hat{H}_{UL} = \Phi[0]\, G^T\, \Psi[0]^H. \tag{4.7}$$

Given the uplink channel estimate, the BS forms the downlink channel estimate by taking the transpose, $\hat{H}_{DL} = \hat{H}_{UL}^T = \Psi[0]^H G\, \Phi[0]$. The beamforming matrix is then given by $V = \hat{H}_{DL}^H \Lambda = \Phi[0]^H G^H \Psi[0] \Lambda$, where $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_K)$ is a power-scaling matrix. In order to meet a total power constraint of $P_{sum}$, we choose the $\lambda_k$'s such that

$$E[\mathrm{Tr}(V x x^H V^H)] \le P_{sum} \tag{4.8}$$

or, assuming the covariance matrix of $x$ is $\Sigma_x = \mathrm{diag}(P_1, P_2, \ldots, P_K)$,

$$\sum_{k=1}^{K} E[\|v_k\|^2]\, P_k \le P_{sum} \tag{4.9}$$

where $v_k$ is the $k$th column of the matrix $V$.

The downlink channel in the data phase is given by $H_{DL}[n] = \Psi[n]\, G\, \Phi[n]^H$ for $n \in \{L, \ldots, N+L-1\}$, and the resulting received downlink data vector is given by

$$y_d[n] = H_{DL}[n] V x[n] + z[n] \tag{4.10}$$
$$\phantom{y_d[n]} = \Psi[n]\, G\, \Phi[n]^H \Phi[0]^H G^H \Psi[0] \Lambda\, x[n] + z[n]. \tag{4.11}$$

Observing the received signal of user $k$ at time $n$, we see the desired symbol $x_k[n]$ along with the interference and noise:

$$y_k[n] = \psi_k[n]\psi_k[0] \sum_{m=1}^{M} g_{k,m} g_{k,m}^* \phi_m^*[n]\phi_m^*[0]\, \lambda_k x_k[n] + \sum_{i \neq k} \psi_k[n]\psi_i[0] \sum_{m=1}^{M} g_{k,m} g_{i,m}^* \phi_m^*[n]\phi_m^*[0]\, \lambda_i x_i[n] + z_k[n]. \tag{4.12}$$

In order to analyze the achievable rates of these users, we must characterize the signal power, interference power, and noise. Since the signal and interference powers are correlated, standard approaches from information theory do not apply.
However, utilizing a technique from [100], we can transform this channel into an effective SISO channel where the signal and noise are uncorrelated. Let

$$S_k[n] = E\left[\psi_k[n]\psi_k[0] \sum_{m=1}^{M} |g_{k,m}|^2 \phi_m^*[n]\phi_m^*[0]\, \lambda_k\right],$$

then add and subtract $S_k[n] x_k[n]$ to (4.12). As a result, the received signal becomes

$$y_k[n] = S_k[n] x_k[n] + N_k[n] \tag{4.13}$$

where the aggregate noise term

$$N_k[n] = \sum_{i \neq k} \psi_k[n]\psi_i[0] \sum_{m=1}^{M} g_{k,m} g_{i,m}^* \phi_m^*[n]\phi_m^*[0]\, \lambda_i x_i[n] + \left(\psi_k[n]\psi_k[0] \sum_{m=1}^{M} |g_{k,m}|^2 \phi_m^*[n]\phi_m^*[0]\, \lambda_k - S_k[n]\right) x_k[n] + z_k[n] \tag{4.14}$$

is uncorrelated with the desired signal $S_k[n] x_k[n]$. Equation (4.13) then becomes a SISO channel with known signal and noise variances. The signal power is expressed as

$$|S_k[n]|^2 = \left(\sum_{m_1=1}^{M} \sum_{m_2=1}^{M} E\left[|g_{k,m_1}|^2\right] E\left[|g_{k,m_2}|^2\right] E\left[\phi_{m_1}^*[n]\phi_{m_1}^*[0]\right] E\left[\phi_{m_2}[n]\phi_{m_2}[0]\right]\right) \left|E\left[\psi_k[n]\psi_k[0]\right]\right|^2 \lambda_k^2. \tag{4.15}$$

The noise variance can be written as:

$$E[|N_k[n]|^2] = E[|x_k[n]|^2] \left(\sum_{m=1}^{M} E\left[|g_{k,m}|^4\right] \lambda_k^2 - |S_k[n]|^2 + \sum_{m_1=1}^{M} \sum_{m_2 \neq m_1} E\left[|g_{k,m_1}|^2 |g_{k,m_2}|^2\right] \lambda_k^2\, E\left[\phi_{m_1}^*[n]\phi_{m_1}^*[0]\phi_{m_2}[n]\phi_{m_2}[0]\right]\right) + \sum_{i \neq k} E\left[|x_i[n]|^2\right] \left(\sum_{m=1}^{M} E\left[|g_{k,m}|^2 |g_{i,m}|^2\right] \lambda_i^2\right) + E[|z_k[n]|^2]. \tag{4.16}$$

For the sake of analytic tractability and to provide a baseline comparison case, we assume the pathloss variables to be equal (and without loss of generality may be assumed to be 1). This corresponds to a number of hypothetical topologies, such as the case in which a ring of distributed basestations serves a number of users at the center of the ring. Under the assumption of i.i.d. Rayleigh fading, the second moments of the channels are equal to 1, and the fourth moments are 2 (generally speaking, Rayleigh fading is not necessary to meet these second and fourth moment parameter assumptions, though the fading coefficients are assumed to be independent of one another regardless of the distribution).
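The second and fourth moments quoted for unit-variance Rayleigh fading can be checked numerically. The short Monte Carlo sketch below is illustrative only; the sample size and seed are arbitrary assumptions, and the variable names are not from the original text.

```python
import numpy as np

rng = np.random.default_rng(1)
n_samples = 1_000_000  # illustrative sample size

# b ~ CN(0, 1): independent real and imaginary parts, each with variance 1/2.
b = (rng.standard_normal(n_samples)
     + 1j * rng.standard_normal(n_samples)) / np.sqrt(2)

second_moment = np.mean(np.abs(b) ** 2)  # E|b|^2, expected to be close to 1
fourth_moment = np.mean(np.abs(b) ** 4)  # E|b|^4, expected to be close to 2

print(second_moment, fourth_moment)
```

Since $|b|^2$ is exponentially distributed with unit mean in this model, the two estimates should approach 1 and 2 as the sample size grows.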
The noise power is assumed to be 1, and the user power allocation is assumed to be uniform (i.e. $P_i = P_k = P_{sum}/K\ \forall i, k$), which implies that $\lambda_k = 1/\sqrt{M}$ for all $k$ is sufficient to meet the power constraint. We then parameterize the remaining quantities with the following variables:

$$T_k[n] \triangleq E\left[\psi_k[n]\psi_k[0]\right], \quad R_m[n] \triangleq E\left[\phi_m^*[n]\phi_m^*[0]\right], \quad V_{m_1,m_2}[n] \triangleq E\left[\phi_{m_1}^*[n]\phi_{m_1}^*[0]\phi_{m_2}[n]\phi_{m_2}[0]\right],$$

which vary depending on the phase noise and frequency offset models. The $k$th user's signal power is deterministic in this model, and described by the above parameters:

$$|S_k[n]|^2 = |T_k[n]|^2 \left(\frac{1}{M} \sum_{m_1=1}^{M} \sum_{m_2=1}^{M} R_{m_1}[n] R_{m_2}[n]\right) \frac{P_{sum}}{K}. \tag{4.17}$$

Utilizing the above notation, the resulting SNR becomes

$$\mathrm{SNR}_k[n] = \frac{|T_k[n]|^2 \left(\frac{1}{M} \sum_{m_1=1}^{M} \sum_{m_2=1}^{M} R_{m_1}[n] R_{m_2}[n]\right) \frac{P_{sum}}{K}}{1 + \left(1 + K - |S_k[n]|^2 + \frac{1}{M} \sum_{m_1=1}^{M} \sum_{m_2 \neq m_1} V_{m_1,m_2}[n]\right) \frac{P_{sum}}{K}}, \tag{4.18}$$

where $|S_k[n]|^2$ in the denominator is taken per unit user power, as in (4.15), i.e. without the factor $P_{sum}/K$ of (4.17). In the case of ideal synchronization where $T_k[n]$, $R_m[n]$, and $V_{m_1,m_2}[n]$ are all one, the useful signal power grows with the number of basestation antennas, while the effective noise power only grows with the number of users.

Scenario S1: Centralized Massive MIMO

The first scenario we will consider is that of centralized Massive MIMO [2]. In this case, no frequency offsets between the transmitters exist, but we consider the impact of phase noise on the system performance. In the case of Wiener phase noise, $T_k[n]$ may be written as

$$T_k[n] = E\left[\exp\left(j\left(2\vartheta_k[0] + \sum_{\ell=1}^{n} v_k[\ell]\right)\right)\right], \tag{4.19}$$

$R_m[n]$ may be written

$$R_m[n] = E\left[\exp\left(-j\left(2\theta_m[0] + \sum_{\ell=1}^{n} w_m[\ell]\right)\right)\right], \tag{4.20}$$

and $V_{m_1,m_2}[n]$ may be rewritten

$$V_{m_1,m_2}[n] = E\left[\exp\left(j\left(2\theta_{m_2}[0] - 2\theta_{m_1}[0] + \sum_{\ell=1}^{n} \left(w_{m_2}[\ell] - w_{m_1}[\ell]\right)\right)\right)\right]. \tag{4.21}$$

In the above equations and the remainder of the section, we utilize the variables subscripted with $k$ and $m$ as generic to all users and basestations, respectively. That is, unless noted otherwise, the statistical parameters of $R_m[n]$ are the same for all $m$.
In a transmitter driven by a single clock source, $\theta_{m_1}[0] = \theta_{m_2}[0]$ and $w_{m_1}[\ell] = w_{m_2}[\ell]$ for all $m_1$, $m_2$, and $\ell$, so $V_{m_1,m_2}[n] = 1$. We further note that $E[\exp(j(\sum_{\ell=1}^{n} w_m[\ell]))] = \exp(-n\sigma_m^2/2)$ since the increments are independent over time. In a truly free-running oscillator, the initial phase of both transmitters and receivers at the start of each frame is uniformly distributed over the interval $[0, 2\pi)$, and thus $T_k[n] = 0$ and $R_m[n] = 0$. In this case, we have completely non-coherent communication, and the effective SNR is zero. However, it may be assumed that the transmitters and receivers achieve the initial phase synchronization necessary for coherent communications, as would be done in a real system. Without loss of generality, we assume that the initial phases are zero. Then the $k$th user's SNR under centralized Wiener phase noise is given as:

$$\mathrm{SNR}^{S1,W}_k[n] = \frac{M \exp(-n(\sigma_k^2 + \sigma_m^2)) \frac{P_{sum}}{K}}{1 + \left(K + M - M \exp(-n(\sigma_k^2 + \sigma_m^2))\right) \frac{P_{sum}}{K}}. \tag{4.22}$$

Massive MIMO analyses often utilize the fact that the inner product between one row of the downlink channel matrix and itself grows proportional to $M$, while the inner product between one row and another grows proportional to $\sqrt{M}$, to show that the SINR grows unbounded as $M \to \infty$. Similarly, the bounding scheme used above to convert the received signal into a SISO channel sees an effective signal power that grows proportional to the number of transmit antennas, but an effective noise power that grows proportional to the number of users. In the case of $\mathrm{SNR}^{S1,W}_k[n]$, however, for $n > 0$ the signal to noise ratio grows sub-linearly. This can be observed through the following limit:

$$\lim_{M \to \infty} \mathrm{SNR}^{S1,W}_k[n] = \frac{1}{\exp(n(\sigma_k^2 + \sigma_m^2)) - 1}. \tag{4.23}$$

However, free-running oscillators are not normally utilized in modern communication systems, as the oscillators are at the very least disciplined by a phase-locked loop.
As such, this Wiener phase noise approach (also utilized in a similar analysis on the uplink [95]) may be considered a worst case scenario. A more realistic model is the Tikhonov phase noise model, which accounts for the residual phase error after the PLL [92]. As previously mentioned, in this case the phase process at basestation $m$ is given by $\phi_m[n] = \exp(j w_m[n])$ where $w_m[n] \sim \mathrm{Tikhonov}(0, \kappa_m)\ \forall n$, and the phase process at user $k$ is given by $\psi_k[n] = \exp(j v_k[n])$ where $v_k[n] \sim \mathrm{Tikhonov}(0, \kappa_k)\ \forall n$. The Tikhonov, or Von Mises, distribution is conceptually similar to the Normal distribution modulo $2\pi$, though it is more general such that it also encompasses the case of uniform phase distribution. For a random variable $X \sim \mathrm{Tikhonov}(\mu, \kappa)$, the pdf is given as

$$f_X(x) = \frac{\exp(\kappa \cos(x - \mu))}{2\pi I_0(\kappa)} \tag{4.24}$$

where $I_0(\kappa)$ is the zeroth order modified Bessel function of the first kind with parameter $\kappa$, and the support of $f_X(x)$ is over $[\mu - \pi, \mu + \pi]$. We can then express the desired quantities as follows:

$$T_k[n] = \left(\frac{I_1(\kappa_k)}{I_0(\kappa_k)}\right)^2 \tag{4.25}$$
$$R_m[n] = \left(\frac{I_1(\kappa_m)}{I_0(\kappa_m)}\right)^2 \tag{4.26}$$
$$V_{m_1,m_2}[n] = E[\exp(-j\theta_m[n] - j\theta_m[0] + j\theta_m[n] + j\theta_m[0])] = 1. \tag{4.27}$$

As a result, we express the $k$th user's SNR under centralized operation and Tikhonov phase noise as:

$$\mathrm{SNR}^{S1,T}_k[n] = \frac{M \left(\frac{I_1(\kappa_k)}{I_0(\kappa_k)}\right)^4 \left(\frac{I_1(\kappa_m)}{I_0(\kappa_m)}\right)^4 \frac{P_{sum}}{K}}{1 + \left(K + M - M \left(\frac{I_1(\kappa_k)}{I_0(\kappa_k)}\right)^4 \left(\frac{I_1(\kappa_m)}{I_0(\kappa_m)}\right)^4\right) \frac{P_{sum}}{K}}. \tag{4.28}$$

In this case, $\mathrm{SNR}^{S1,T}_k[n]$ is independent of the time index, so it only suffers a constant degradation factor dependent on the quality of the post-PLL errors. However, it does exhibit diminishing returns for increasing $M$, as seen in the following limit:

$$\lim_{M \to \infty} \mathrm{SNR}^{S1,T}_k[n] = \frac{\left(\frac{I_1(\kappa_k)}{I_0(\kappa_k)}\right)^4 \left(\frac{I_1(\kappa_m)}{I_0(\kappa_m)}\right)^4}{1 - \left(\frac{I_1(\kappa_k)}{I_0(\kappa_k)}\right)^4 \left(\frac{I_1(\kappa_m)}{I_0(\kappa_m)}\right)^4}. \tag{4.29}$$

Scenario S2: Coordinated Multipoint

This scenario arises when a network of basestations cooperatively beamforms to its users.
In this case, each basestation has an independent oscillator, but we assume it is of sufficient quality to ignore the effects of frequency offsets. It is not unreasonable, for instance, for an LTE basestation to have an expensive oven-controlled oscillator or rubidium atomic oscillator which has very stable frequency dynamics. In this way, it is possible for all of the basestations in the network to operate on the same frequency, yet have independent phase noise processes. In this context, the Wiener phase noise model does not make sense, as the oscillators have very stable long term frequency dynamics. As such, we consider only the Tikhonov phase noise model.

We begin by noting that the parameter $T_k[n]$ remains the same as in S1, as the users are modeled using the same phase noise process in S2. We also observe that $R_m[n]$ does not exhibit any dependence between basestations, so assuming that the phase noise has the same dynamics at each oscillator (i.e., the same type of oscillator is used at each basestation), then

$$R_m[n] = \left(\frac{I_1(\kappa_m)}{I_0(\kappa_m)}\right)^2. \tag{4.30}$$

However, $V_{m_1,m_2}[n]$ now consists of four independent terms, as the phase noise at a given basestation is independent over time, and the phase noise between basestations is also independent. In this case,

$$V_{m_1,m_2}[n] = \left(\frac{I_1(\kappa_{m_1})}{I_0(\kappa_{m_1})}\right)^2 \left(\frac{I_1(\kappa_{m_2})}{I_0(\kappa_{m_2})}\right)^2. \tag{4.31}$$

As assumed in the case of $R_m[n]$, if the phase noise processes at each basestation follow the same statistics, then $V_{m_1,m_2}[n] = \left(\frac{I_1(\kappa_m)}{I_0(\kappa_m)}\right)^4$.

Combining these results, we have that

$$\mathrm{SNR}^{S2}_k[n] = \frac{M \left(\frac{I_1(\kappa_k)}{I_0(\kappa_k)}\right)^4 \left(\frac{I_1(\kappa_m)}{I_0(\kappa_m)}\right)^4 \frac{P_{sum}}{K}}{1 + \left(1 - \left(\frac{I_1(\kappa_m)}{I_0(\kappa_m)}\right)^4 + K + M \left(\frac{I_1(\kappa_m)}{I_0(\kappa_m)}\right)^4 \left(1 - \left(\frac{I_1(\kappa_k)}{I_0(\kappa_k)}\right)^4\right)\right) \frac{P_{sum}}{K}}. \tag{4.32}$$

We again note that the asymptotics of $M$ suggest the infinite SNR achievable in Massive MIMO is only attainable in this model as $\kappa_k \to \infty$, i.e. $\frac{I_1(\kappa_k)}{I_0(\kappa_k)} \to 1$, or as phase noise tends to zero.
In general though, the limit is as follows:

$$\lim_{M \to \infty} \mathrm{SNR}^{S2}_k[n] = \frac{\left(\frac{I_1(\kappa_k)}{I_0(\kappa_k)}\right)^4}{1 - \left(\frac{I_1(\kappa_k)}{I_0(\kappa_k)}\right)^4}. \tag{4.33}$$

Scenario S3: Distributed MIMO

This scenario arises when each RF front end is clocked to an independent oscillator, and a secondary algorithm is run to synchronize the oscillators, such as those in Chapter 2. This is an idealized model of the system considered in Chapter 2, where effects of imperfect calibration are neglected. We assume, however, that the synchronization algorithm can never exactly estimate the relative frequency offsets between multiple access points, and thus there exist residual frequency offsets $c_m$ between the basestations.

We begin by assuming the residual frequency offset at basestation $m$ is uniformly distributed over the interval $[-\delta_m, \delta_m]$. This assumption is not physically motivated, but rather arises from the fact that the uniform distribution is the maximum entropy distribution for a random variable over a known interval. The interval in question is dictated, for example, by wireless standards. In the case of 802.11, the IEEE working group requires oscillators of 20 ppm or better, which translates to a maximum frequency offset of ±49 kHz at 2.45 GHz carrier frequency [18]. The frequency differences $c_m$ then take on a triangular distribution $\mathcal{T}(-2\delta_m, 2\delta_m)$. Since the Tikhonov phase noise is a memoryless process, we describe the phase evolution without recursion, yet with a constant phase drift due to the frequency offset. The phase process is then $\phi_m[n] = \exp(j(n c_m + w_m[n]))$ where $w_m[n] \sim \mathrm{Tikhonov}(0, \kappa_m)$. We assume the users are able to estimate and remove the effects of the carrier offsets from the transmitters, so $T_k[n]$ is only affected by the Tikhonov phase noise, as in scenarios S1 and S2.
The transmitter parameters are then described as:

$$R^k_m[n] = E[\exp(-j(n c_m + w_m[n] + w_m[0]))] = \mathrm{sinc}^2(n\delta_m) \left(\frac{I_1(\kappa_m)}{I_0(\kappa_m)}\right)^2, \tag{4.34}$$

and

$$V^k_{m_1,m_2}[n] = E[\exp(j(n + k - 1)(c_{m_2} - c_{m_1}))] = \mathrm{sinc}^4(n\delta_m) \left(\frac{I_1(\kappa_m)}{I_0(\kappa_m)}\right)^4 \tag{4.35}$$

where $\mathrm{sinc}(x) = \frac{\sin(x)}{x}$ is the unnormalized sinc function. The SNR in Distributed MIMO with uniformly distributed frequency offsets and Tikhonov phase noise at both the transmitters and receivers is then

$$\mathrm{SNR}^{S3,U,T}_k[n] = \frac{M\, \mathrm{sinc}^4(n\delta_m) \left(\frac{I_1(\kappa_k)}{I_0(\kappa_k)}\right)^4 \left(\frac{I_1(\kappa_m)}{I_0(\kappa_m)}\right)^4 \frac{P_{sum}}{K}}{1 + \left(1 + K + \left(M \left(1 - \left(\frac{I_1(\kappa_k)}{I_0(\kappa_k)}\right)^4\right) - 1\right) \mathrm{sinc}^4(n\delta_m) \left(\frac{I_1(\kappa_m)}{I_0(\kappa_m)}\right)^4\right) \frac{P_{sum}}{K}}. \tag{4.36}$$

The limiting SNR is the same as in scenario 2:

$$\lim_{M \to \infty} \mathrm{SNR}^{S3,U,T}_k[n] = \frac{\left(\frac{I_1(\kappa_k)}{I_0(\kappa_k)}\right)^4}{1 - \left(\frac{I_1(\kappa_k)}{I_0(\kappa_k)}\right)^4}. \tag{4.37}$$

For finite $M$ though, the SNR now decays with time as the oscillators lose coherence.

Figure 4.4: Sum rate (averaged with respect to time) versus the number of basestation antennas. The impact of phase noise may be seen as a saturation in achievable rates for large values of $M$, while frequency offsets have a greater impact on average rates for smaller values of $M$. (Plot of Average Sum Rate (bit/s/Hz) versus Basestation antennas, logarithmic horizontal axis; curves: Ideal, S1, S2, S3 with N=1000, S3 with N=5000.)

We also consider the effect of normally distributed residual frequency offsets. We let the residual frequency offsets be distributed as i.i.d. $\mathcal{N}(0, \sigma_f^2)$, so the frequency differences are distributed as $\mathcal{N}(0, 2\sigma_f^2)$. In this case, $R^k_m[n] = \exp(-n^2\sigma_f^2) \left(\frac{I_1(\kappa_m)}{I_0(\kappa_m)}\right)^2$ and $V^k_{m_1,m_2}[n] = \exp(-2n^2\sigma_f^2) \left(\frac{I_1(\kappa_m)}{I_0(\kappa_m)}\right)^4$.
Then

$$\mathrm{SNR}^{S3,N,T}_k[n] = \frac{M e^{-2n^2\sigma_f^2} \left(\frac{I_1(\kappa_k)}{I_0(\kappa_k)}\right)^4 \left(\frac{I_1(\kappa_m)}{I_0(\kappa_m)}\right)^4 \frac{P_{sum}}{K}}{1 + \left(1 + K + \left(M \left(1 - \left(\frac{I_1(\kappa_k)}{I_0(\kappa_k)}\right)^4\right) - 1\right) e^{-2n^2\sigma_f^2} \left(\frac{I_1(\kappa_m)}{I_0(\kappa_m)}\right)^4\right) \frac{P_{sum}}{K}}, \tag{4.38}$$

and perhaps unsurprisingly,

$$\lim_{M \to \infty} \mathrm{SNR}^{S3,N,T}_k[n] = \frac{\left(\frac{I_1(\kappa_k)}{I_0(\kappa_k)}\right)^4}{1 - \left(\frac{I_1(\kappa_k)}{I_0(\kappa_k)}\right)^4}. \tag{4.39}$$

Numerical Results

In order to illustrate the effects of the oscillator impairments, we present numerical results in this section. The phase noise statistics were taken from the measurement of a CVHD-950 voltage controlled oscillator, where the symbol period was assumed to be 2 µs. The parameter of the Tikhonov phase noise was calculated using an iterative Newton method [101], and found to be $\kappa = 3768.06$. The frequency offset statistics were taken from the same measurements, but where the residual frequency offsets were generated by simulating the synchronization algorithm described in Section 2.6. Note that for short bursts of time domain measurements, an exact measurement of the residual frequency offset is not possible, otherwise such a method would be used to perfectly synchronize the oscillators in the first place. Hence, the residual frequency offset statistics must be obtained via simulation.

We compare the three scenarios using these parameters, where Scenarios 1, 2, and 3 utilize the Tikhonov phase noise model, and Scenario 3 utilizes the Normally distributed frequency offset model. We utilize (2.9) to properly scale the frequency offsets from physical quantities to the $c_m$ terms. Here we set the number of subcarriers to 64, the length of the cyclic prefix to 16, the sampling period to 25 ns, the frequency multiplication factor to 61.25, and the standard deviation of the sampling clock frequency offsets to 5 Hz.

Figures 4.4 and 4.5 illustrate the achievable sum rates of a system of $K = 50$ users with a variable number of basestation antennas and data block lengths.
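As a rough illustration of how such numbers can be produced, the sketch below evaluates the closed-form SNR expressions (4.28), (4.32), and (4.38) in NumPy. Only `kappa = 3768.06` and `K = 50` come from the text. The values of `M`, `p_sum`, and the normalized offset deviation `sigma_f` are hypothetical placeholders (the scaling of (2.9) is not reproduced here), so the output is not expected to match Figures 4.4 and 4.5.

```python
import numpy as np

def bessel_ratio(kappa, n_grid=200_000):
    """I_1(kappa) / I_0(kappa) via a midpoint rule on the integral form
    I_v(kappa) = (1/pi) * int_0^pi exp(kappa*cos t) * cos(v*t) dt."""
    t = (np.arange(n_grid) + 0.5) * np.pi / n_grid
    weight = np.exp(kappa * (np.cos(t) - 1.0))  # scaled to avoid overflow
    return np.sum(weight * np.cos(t)) / np.sum(weight)

def snr_s1_tikhonov(M, K, p_sum, rho_k, rho_m):
    """Eq. (4.28): one shared oscillator with Tikhonov phase noise."""
    p = p_sum / K
    a = rho_k**4 * rho_m**4
    return M * a * p / (1.0 + (K + M - M * a) * p)

def snr_s2(M, K, p_sum, rho_k, rho_m):
    """Eq. (4.32): independent oscillators, no frequency offsets."""
    p = p_sum / K
    num = M * rho_k**4 * rho_m**4 * p
    den = 1.0 + (1.0 - rho_m**4 + K + M * rho_m**4 * (1.0 - rho_k**4)) * p
    return num / den

def snr_s3_normal(n, M, K, p_sum, rho_k, rho_m, sigma_f):
    """Eq. (4.38): independent oscillators with normally distributed offsets."""
    p = p_sum / K
    decay = np.exp(-2.0 * n**2 * sigma_f**2)
    num = M * decay * rho_k**4 * rho_m**4 * p
    den = 1.0 + (1.0 + K + (M * (1.0 - rho_k**4) - 1.0) * decay * rho_m**4) * p
    return num / den

kappa = 3768.06   # Tikhonov parameter quoted in the text
K = 50            # number of users quoted in the text
M = 1000          # hypothetical antenna count
p_sum = 50.0      # hypothetical total power (unit noise power)
sigma_f = 1e-4    # hypothetical normalized frequency offset deviation

rho = bessel_ratio(kappa)  # same statistics assumed at users and basestations
s1 = snr_s1_tikhonov(M, K, p_sum, rho, rho)
s2 = snr_s2(M, K, p_sum, rho, rho)
s3_start = snr_s3_normal(0, M, K, p_sum, rho, rho, sigma_f)
s3_later = snr_s3_normal(5000, M, K, p_sum, rho, rho, sigma_f)

for name, snr in [("S1", s1), ("S2", s2), ("S3, n=0", s3_start), ("S3, n=5000", s3_later)]:
    print(name, K * np.log2(1.0 + snr))  # sum rate in bit/s/Hz
```

Two sanity checks follow directly from the formulas: at $n = 0$ the S3 expression reduces to the S2 expression, and the S2 SNR is never below the S1 SNR for the same phase noise statistics.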
Figure 4.4 displays the sum rate averaged with respect to time for a few scenarios: oscillators with no phase noise or frequency offsets, a single oscillator subject to phase noise (S1), multiple oscillators subject to independent phase noise (S2), and multiple oscillators subject to phase noise and frequency offsets (S3). These results exhibit some interesting phenomena. First, even under the relatively loose phase dynamics of a VCXO, an inordinately large number of antennas is required to see the impact of phase noise on massive MIMO in this topology. Second, consistent with previous results [95], distributed oscillators without frequency offsets (S2) exhibit better performance than a single centralized oscillator (S1) under the same phase noise statistics. Third, longer block sizes for oscillators with frequency offsets (S3, N = 1000 vs. S3, N = 5000) lead to lower average rates. Finally, it may be observed that all schemes subject to Tikhonov phase noise (i.e. all but the ideal case in Figure 4.4) begin to saturate with very large values of $M$, but the rate at which they saturate (with respect to $M$) varies.

Figure 4.5: Sum rate versus time for an N = 5000 length block. The achievable rates rapidly decline, motivating the existence of an average-sum-rate-optimal block length. (Plot of Instantaneous Sum Rate (bit/s/Hz) versus Time (slots); curves: Ideal with M=100, S3 with M=100, Ideal with M=1000, S3 with M=1000.)

Figure 4.5 illustrates the time dynamics of S3 versus ideal oscillators for $M = 100$ and $M = 1000$. While the ideal oscillators are not affected by the time index, the impaired cases degrade such that 90%+ of the achievable sum rates are lost by the 1700th block for $M = 100$, and by the 2500th block for $M = 1000$. Again, the effect of choosing an excessively long block length is on display.
Though the M = 1000 case has a higher initial sum rate, its average sum rate over 5000 blocks is lower than that of the $M = 10^3$ case averaged over the first 1000 blocks. The performance of S2 is not displayed in this case, as it is only slightly worse than ideal for these values of M, and it does not vary with time as is the case in S3.

4.3.1 Achievable-Rate Optimal Frame Length

In the absence of oscillator degradations or channel variations and assuming a fixed synchronization overhead (i.e. fixed L), it is apparent that the average throughput is a strictly increasing function of N, since longer block lengths lead to greater efficiency as $\frac{N}{N+L} \to 1$. Given the SNR of the effective channel for S3, however, it is apparent that larger values of N do not necessarily increase the average sum rate of a Distributed MIMO system. In fact, it is known that the mean signal-to-interference ratio (SIR) degrades quadratically with time in the case of zero-forcing beamforming [98]. Thus our objective is to choose an optimal N for given synchronization overhead, phase noise statistics, frequency offset statistics, and user/AP placement. We ultimately wish to solve:
$$\text{maximize} \quad f[N] = \frac{N}{N+L} \sum_{n=L}^{N+L-1} \frac{1}{N} \log_2\left(1 + \mathrm{SNR}_k[n]\right) \qquad \text{subject to} \quad N \le N_{\max},$$
where $N_{\max}$ is the maximum frame length allowable by the coherence time of the channel. Because the optimization variable appears in the upper limit of the sum, the standard mathematical tools of optimization are not very useful. However, two observations can be made which make the problem numerically tractable. First, the objective function tends to be unimodal, that is, it satisfies the property that $f[1] \le \dots \le f[N^*-1] \le f[N^*] \ge f[N^*+1] \ge \dots \ge f[N_{\max}]$ for some $N^*$. As such, a simple approach to calculating this value numerically involves sweeping over N until f[N] decreases, at which point the optimum has been found.
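The sweep described above can be sketched in a few lines of Python. This is a minimal illustration rather than the procedure used to produce the results in this chapter: the function names and the toy SNR profile (a hypothetical Gaussian-shaped decay in time, standing in for the S3 effective SNR) are assumptions introduced only for the example.

```python
import math


def average_rate(snr, N, L):
    """f[N] = (N / (N + L)) * (1 / N) * sum_{n=L}^{N+L-1} log2(1 + SNR[n])."""
    total = sum(math.log2(1.0 + snr(n)) for n in range(L, N + L))
    return total / (N + L)


def sweep_optimal_block_length(snr, L, N_max):
    """Increase N until f[N] first decreases.

    Stopping at the first decrease is only valid if f is unimodal,
    which is the assumption stated in the text.
    """
    best_N, best_f = 1, average_rate(snr, 1, L)
    for N in range(2, N_max + 1):
        f = average_rate(snr, N, L)
        if f < best_f:
            break  # first decrease: the previous N was the maximizer
        best_N, best_f = N, f
    return best_N, best_f


def toy_snr(n, snr0=100.0, decay=1e-4):
    """Hypothetical SNR profile that decays with the slot index n."""
    return snr0 * math.exp(-decay * n * n)


if __name__ == "__main__":
    N_opt, f_opt = sweep_optimal_block_length(toy_snr, L=20, N_max=500)
    print(N_opt, f_opt)
```

Each evaluation of f[N] recomputes the sum from scratch for clarity; a running sum would reduce the sweep to linear time in N.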
Secondly, we can utilize the Euler-Maclaurin formula to approximate f[N] as
$$f[N] \approx \frac{1}{N+L} \int_{L}^{N+L-1} \log_2\left(1 + \mathrm{SNR}_k[n]\right) dn. \tag{4.40}$$
This approximation does not yield a closed form solution either, since the first order conditions of the above function yield polylogarithm functions. The polylogarithm yields a closed form solution for only certain arguments. However, the polylogarithm can be evaluated numerically, leading to a reasonable computational method.

4.3.2 Outage-Throughput Optimal Frame Length

The analysis performed in the previous subsection points to an average-rate optimal block length, where the degraded channel effectively forms a set of dependent parallel channels in time. The implication of this approach is that coding is performed over the parallel channels in time, i.e. the codewords sent in the first time slot of each frame are jointly used to decode the message. As a result, a number of frames must be accumulated in order to decode the message, perhaps leading to significant latency. This situation is analogous to fast-fading channels, where the ergodic rate is the typical metric of interest. However, we may also consider the alternative formulation of capacity typically observed in slow-fading channels, the outage capacity. In particular, we wish to maximize the outage-throughput by choosing the optimal coding rate R and frame length N:
$$C_k(R,N) = P\left( R < \frac{1}{N} \sum_{\ell=L}^{N+L-1} \log_2\left(1 + \mathrm{SINR}_k[\ell]\right) \right) \frac{RN}{N+L}. \tag{4.41}$$
One property to note in this equation is that we utilize the SINR rather than the "effective SISO" SNR developed earlier in this section. Thus this method can be utilized with any form of precoding transmission (as long as the first and second moments of the rate used below are known). Developing an exact solution to $C_k(R,N)$ requires knowledge of the full distribution of the average sum rate of user k.
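To make (4.41) concrete, the following Python sketch estimates the outage-throughput from samples of the rate sum and compares it against the two second-moment bounds derived in the remainder of this subsection, (4.43) and (4.45). It is an illustration under stated assumptions: the sample generator draws i.i.d. exponentially distributed SINR values, which is a hypothetical stand-in for the SINR statistics of an actual precoder, and all function names are introduced here for the example.

```python
import math
import random
import statistics


def synthetic_rate_sums(N, trials, mean_sinr=10.0, seed=0):
    """Hypothetical samples of S_N = sum_l log2(1 + SINR[l]) (i.i.d. exponential SINR)."""
    rng = random.Random(seed)
    return [
        sum(math.log2(1.0 + rng.expovariate(1.0 / mean_sinr)) for _ in range(N))
        for _ in range(trials)
    ]


def outage_throughput(samples, R, N, L):
    """Empirical version of (4.41): P(R < S_N / N) * R * N / (N + L)."""
    p_success = sum(1 for s in samples if s > R * N) / len(samples)
    return p_success * R * N / (N + L)


def paley_zygmund_lower(mean, var, R, N, L):
    """Lower bound (4.43); it is zero whenever R * N >= E[S_N]."""
    if R * N >= mean:
        return 0.0
    gap2 = (mean - R * N) ** 2  # equals (1 - RN / E[S_N])^2 * E[S_N]^2
    return gap2 / (var + gap2) * R * N / (N + L)


def cantelli_upper(mean, var, R, N, L):
    """Upper bound (4.45); the trivial bound 1 is used when R * N <= E[S_N]."""
    if R * N <= mean:
        p = 1.0
    else:
        p = var / (var + (R * N - mean) ** 2)
    return p * R * N / (N + L)


if __name__ == "__main__":
    N, L = 50, 10
    samples = synthetic_rate_sums(N, trials=2000)
    mean = statistics.fmean(samples)
    var = statistics.pvariance(samples, mean)
    for R in (1.0, 2.0, 3.0, 4.0):
        print(R,
              paley_zygmund_lower(mean, var, R, N, L),
              outage_throughput(samples, R, N, L),
              cantelli_upper(mean, var, R, N, L))
```

Only the mean and variance of the samples enter the two bounds, which is the reduction in required information that motivates them.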
We can reduce this information requirement by observing some simple concentration of measure bounds of the desired quantity. For the sake of convenience, we adopt the notation $S_N = \sum_{\ell=L}^{N+L-1} \log_2\left(1 + \mathrm{SINR}[\ell]\right)$. We begin with the observation of a second-moment lower bound known as the Paley-Zygmund bound:
$$P\left(S_N \ge \theta E[S_N]\right) \ge \frac{(1-\theta)^2 E[S_N]^2}{\mathrm{var}[S_N] + (1-\theta)^2 E[S_N]^2} \tag{4.42}$$
for $0 < \theta < 1$. Letting $\theta = \frac{RN}{E[S_N]}$, we develop a lower bound on $C_k(R,N)$:
$$C_k(R,N) \ge \left\{ \frac{\left(1 - \frac{RN}{E[S_N]}\right)^2 E[S_N]^2}{\mathrm{var}[S_N] + \left(1 - \frac{RN}{E[S_N]}\right)^2 E[S_N]^2} \, \mathbb{1}_{\{RN < E[S_N]\}} \right\} \frac{RN}{N+L}. \tag{4.43}$$
Similarly, we can utilize a second moment bound to upper bound this quantity as well. In this case, we utilize Cantelli's inequality, a single-tailed variant of Chebyshev's inequality:
$$P\left(S_N \ge \lambda + E[S_N]\right) \le \frac{\mathrm{var}[S_N]}{\mathrm{var}[S_N] + \lambda^2} \tag{4.44}$$
for $\lambda > 0$. In this instance, we let $\lambda = RN - E[S_N]$, and when $RN \le E[S_N]$, we utilize the trivial upper bound $P(S_N \ge \lambda + E[S_N]) \le 1$. As a result, we find an upper bound on the outage-throughput:
$$C_k(R,N) \le \left\{ \frac{\mathrm{var}[S_N]}{\mathrm{var}[S_N] + (RN - E[S_N])^2} \, \mathbb{1}_{\{RN > E[S_N]\}} + \mathbb{1}_{\{RN \le E[S_N]\}} \right\} \frac{RN}{N+L}. \tag{4.45}$$
Using these upper and lower bounds significantly reduces the computational complexity of calculating the outage-throughput-optimal parameters. An example of the true objective function which we are attempting to maximize is shown as the blue mesh in Figure 4.6. This objective function is directly observable because it corresponds to a small 4 AP, 4 UT network and is derived from Monte Carlo simulations. For networks of practical size, however, the computational burden of calculating the full distribution is impractical. From this simple topology, though, we can see that the upper and lower bounds of the outage-throughput provide close approximations to the true optima. Two-dimensional cuts along the optimal X- and Y-axes are shown in Figure 4.7. While it is not proven here, the phenomenon that the true optimum is sandwiched between the upper and lower optima appears to be true for these bounding functions.

Figure 4.6: The outage-throughput as a function of the coding rate and frame length is represented by the blue mesh. The red and green meshes represent the Cantelli upper bound and Paley-Zygmund lower bound, respectively. The black dots represent the optimal code rate/frame length combination for each of the three curves.

4.4 Truncated Conjugate Beamforming

One of the many considerations in making a realizable Distributed MIMO system comes from the concern of backhaul limitations [15, 26]. For deployments of very large numbers of transmit antennas, which can easily number in the hundreds when using the maximum of 8 antennas dictated by 802.11, the exchange of channel state information to each AP can pose a serious loading issue to modern Ethernet links. Prior work has sought to counter this problem by coordinating transmissions at the MAC layer, forming clusters of cooperating APs [25]. This approach is functional, yet for certain classes of precoders it adds an unnecessary level of complexity. In this section we propose a computationally simple precoding scheme which not only reduces the backhaul load, but can also increase user data rates.

There are two general approaches the system designer may consider for data transmission over the backhaul. First, one may design the central server to calculate the precoding matrix, apply it to the users' data symbols, then send the precoded symbols over Ethernet to the APs. Considering the number of subcarriers and OFDM symbols per transmission, this represents a significant computational load for a centralized encoder. Further, if utilizing the zero-forcing precoder, the calculation of this quantity scales cubically with the number of transmit antennas.
Alternatively, the central server could send the users' uncoded data symbols along with the quantized channel matrices, with which the distributed APs could calculate the precoded symbols. In this way, the computational complexity scales with the number of APs, but the backhaul utilization increases linearly with the number of APs. In the proposed approach, we utilize the latter method, yet subvert the traditional precoder design to reduce the backhaul utilization.

Figure 4.7: Two-dimensional cuts along the X- and Y-axes of the three-dimensional plot shown in Figure 4.6. (Panels: Outage Thruput for Optimal Frame Length (bit/s/Hz) versus Coding Rate (bit/s/Hz), and Outage Thruput for Optimal Code Rate (bit/s/Hz) versus Frame Length (Symbols); curves: Actual Thruput, Paley LB, Cantelli UB.)

Figure 4.8: The general topology used for the simulations in Figures 4.9 and 4.10. 16 APs with 4 antennas each jointly beamform to 32 single-antenna users. Λ is a variable defining the closest distance between two APs. (Axes: X-Coordinate (m) versus Y-Coordinate (m), each spanning −3Λ/2 to 3Λ/2.)

We begin by observing a simulated topology in Figure 4.8. In this topology, it is heuristically clear that transmissions from the AP in the top left corner will have less of an impact on a user near the bottom right than an AP also near the bottom right. With the channel coefficients between distant transmitter/receiver pairs being very small, the impact of these coefficients on the received signal is relatively inconsequential. As a result, we can eliminate the small elements from the precoder with only a slight increase in the interference power at the receiver. Consider the conjugate beamforming (matched filter) precoding scheme.
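As a hedged illustration of eliminating the small elements of a conjugate beamforming precoder (the rule is formalized in (4.46)), the Python sketch below zeroes every entry whose per-link received SNR falls at or below a threshold. The function name, the argument layout `H[j][i]` (channel from transmit antenna i to user j), and the use of the fraction of retained entries as a proxy for backhaul load are assumptions made for this example, not the simulation code behind Figures 4.9 and 4.10.

```python
def truncated_cbf(H, p_over_n0, threshold_db):
    """Conjugate beamforming precoder with weak entries zeroed.

    H[j][i] is the complex channel from transmit antenna i to user j.
    Returns the precoder V, indexed V[i][j], and the fraction of entries kept.
    """
    threshold = 10.0 ** (threshold_db / 10.0)  # dB to linear scale
    n_users, n_ant = len(H), len(H[0])
    V = [[0j] * n_users for _ in range(n_ant)]
    kept = 0
    for i in range(n_ant):
        for j in range(n_users):
            link_snr = abs(H[j][i]) ** 2 * p_over_n0  # R_{j,i} in the text
            if link_snr > threshold:
                V[i][j] = H[j][i].conjugate()
                kept += 1
    return V, kept / (n_ant * n_users)


if __name__ == "__main__":
    # Two users, two antennas: strong direct links, weak cross links.
    H = [[1 + 1j, 0.01], [0.02j, 2.0]]
    V, fraction = truncated_cbf(H, p_over_n0=100.0, threshold_db=10.0)
    print(V, fraction)
```

An AP whose entries for a given user are all zero needs neither that user's channel coefficients nor that user's data symbols, which is where the backhaul saving comes from.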
We define a received SNR threshold $\tau$ by which we choose the precoding matrix as
$$[V]_{i,j} = h^{*}_{j,i} \, \mathbb{1}_{\{R_{j,i} > \tau\}}, \tag{4.46}$$
where $\mathbb{1}_A$ is the indicator function of the set A, $R_{j,i} = |h_{j,i}|^2 P/N_0$ is the received signal power, and $P/N_0$ is the transmitted power to noise power ratio. We will refer to this precoding scheme as "truncated conjugate beamforming."

We test this scheme in the topology outlined in Figure 4.8. This is a network of 16 APs, each with 4 antennas, serving 32 single-antenna users. In Figures 4.9 and 4.10 we vary the SNR threshold to observe the impact on the achievable user rates as well as the backhaul load. Figure 4.9 utilizes a 4 m AP spacing (Λ = 4), whereas Figure 4.10 utilizes a 10 m grid. As can be seen in Figure 4.9, in this dense topology truncated conjugate beamforming can achieve the same sum rate performance as conjugate beamforming while utilizing one fifth of the backhaul resources by setting $\tau$ to 70 dB. The results for this network when it is less dense are even more impressive, as truncated conjugate beamforming actually outperforms regular conjugate beamforming when $\tau$ is 80 dB. In this case, because interference from distant APs is already minimal, the truncated conjugate beamforming scheme puts more power into the useful signal of nearby users. The result is a significant improvement in both backhaul utilization and user rate performance. Note that the reduction in backhaul utilization comes not only from not needing all the channel coefficients of the channel matrix, but also from not needing the user data symbols for the corresponding zeroed precoding matrix elements.

Figure 4.9: For a fixed topology (Λ = 4), increasing the SNR threshold can dramatically reduce the overhead requirements (red curve, corresponding to the right y-axis scale) while leaving the user sum rate (green curve) unaffected. (Axes: Coefficient-Zeroing Threshold (dB) versus User Sum Rate and Backhaul Utilization (Relative to Full CBF); curves: CBF, Truncated CBF, Backhaul Load.)

Figure 4.10: In less dense networks (Λ = 10), the truncated conjugate precoder can actually significantly outperform regular conjugate beamforming while utilizing 95% less backhaul traffic. (Axes and curves as in Figure 4.9.)

By transmitting the precoder as a sparse matrix, significant reductions in backhaul data requirements are possible. By eliminating the less relevant channel coefficients from the channel matrix, we allow the backhaul load to scale with the user density rather than the number of APs, which enables massive numbers of transmit antennas. While this technique is not proven to be optimal, it does provide an easily implementable solution to system designers.

4.5 Blind Interference Alignment

In the previous section, we explored a technique which limited the backhaul utilization by reducing the effective CSIT required at each AP. In general, full CSIT is assumed to be needed for MU-MIMO techniques. However, it was shown that a partial degree of freedom gain is still possible with no CSIT by inducing particular channel fluctuations at the receiver through antenna switching [102-104]. This technique is known as Blind Interference Alignment (BIA) and is of particular interest to the scenario of Distributed MIMO since it does not require CSIT.

The fundamental idea of BIA is to differentiate the users by inducing spatial signatures in their channel temporal variations. This is obtained by allocating to each user an antenna switching sequence, according to which they demodulate the signal from one of their antennas.
Only one antenna in every given slot is used, so that a single RF front-end and demodulation chain are needed.

Assume a downlink transmission scenario in which there are $N_t$ transmit antennas and K users. Further, let each receive antenna have $M = N_t$ selectable antenna "modes," i.e. each antenna can "see" M linearly independent channels. The transmitter tries to send some complex-valued symbols $x^{[k]}_i$, where the indices refer to user k's ith symbol. In general, we will let the vector $x^{[k]}$ refer to the collection of all of user k's symbols. Assume transmission occurs over a block fading channel over which the coherence time is greater than $M + K - 1$ symbol durations. That is, all channel coefficients $h^{[k]}_n(m)$ from transmit antenna n to user k's mth antenna mode stay constant over $M + K - 1$ symbol durations, after which they change to a new, random value. Let $h^{[k]}(m)$ be the vector of such coefficients from all transmit antennas.

We reproduce here the method described in [102], which achieves a Degree of Freedom gain of $\frac{MK}{M+K-1}$ by using a predefined antenna switching pattern. For example, consider the case of $N_t = M = K = 2$. Let $y_k(t)$ represent user k's received symbol at time t. On the first slot, a superposition of the two users' vectors, $x^{[1]} + x^{[2]}$, is sent from the two transmit antennas (i.e. $x^{[1]}_1 + x^{[2]}_1$ is sent from the first antenna, and $x^{[1]}_2 + x^{[2]}_2$ from the second). In the subsequent two time slots, $x^{[1]}$ and $x^{[2]}$ are sent, respectively. Let user 1 apply the antenna switching pattern A, B, A over the three slots, and let user 2 apply the pattern A, A, B. Then user 1's received symbols are given by:
$$y_1(t_1) = h_{1A}^H \left(x^{[1]} + x^{[2]}\right) + z_1(t_1) \tag{4.47}$$
$$y_1(t_2) = h_{1B}^H x^{[1]} + z_1(t_2) \tag{4.48}$$
$$y_1(t_3) = h_{1A}^H x^{[2]} + z_1(t_3) \tag{4.49}$$
where $z_k(t) \sim \mathcal{CN}(0, 1)$ is i.i.d., circularly symmetric, complex random noise.
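The three-slot pattern for user 1 can be checked numerically. The Python sketch below builds the received symbols (4.47) to (4.49), subtracts the third slot from the first to cancel user 2's contribution, and inverts the resulting 2 × 2 channel. It is a minimal noiseless sketch by default; the helper names and the random channel draw are assumptions introduced for the example.

```python
import random


def rand_c(rng):
    """Draw one CN(0, 1) sample (variance 1/2 per real dimension)."""
    s = 0.5 ** 0.5
    return complex(rng.gauss(0.0, s), rng.gauss(0.0, s))


def inner(h, x):
    """Compute h^H x for two equal-length complex vectors."""
    return sum(hn.conjugate() * xn for hn, xn in zip(h, x))


def solve_2x2(A, y):
    """Solve A x = y for a 2 x 2 complex matrix A by Cramer's rule."""
    (a, b), (c, d) = A
    det = a * d - b * c
    return [(d * y[0] - b * y[1]) / det, (-c * y[0] + a * y[1]) / det]


def bia_2x2_user1(h1A, h1B, x1, x2, noise=(0j, 0j, 0j)):
    """User 1 decoding with antenna pattern A, B, A over slots t1, t2, t3."""
    superposition = [x1[0] + x2[0], x1[1] + x2[1]]
    y_t1 = inner(h1A, superposition) + noise[0]  # (4.47)
    y_t2 = inner(h1B, x1) + noise[1]             # (4.48)
    y_t3 = inner(h1A, x2) + noise[2]             # (4.49)
    y_tilde = [y_t1 - y_t3, y_t2]                # user 2's symbols cancel
    H1 = [[h.conjugate() for h in h1A], [h.conjugate() for h in h1B]]
    return solve_2x2(H1, y_tilde)


if __name__ == "__main__":
    rng = random.Random(0)
    h1A = [rand_c(rng), rand_c(rng)]
    h1B = [rand_c(rng), rand_c(rng)]
    x1 = [1 + 1j, -1 + 1j]   # user 1's QPSK symbols
    x2 = [1 - 1j, -1 - 1j]   # user 2's QPSK symbols (interference)
    print(bia_2x2_user1(h1A, h1B, x1, x2))
```

The subtraction adds the noise of two slots, so the first branch carries twice the noise power of the second when noise is included.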
After the symbols are received, the following zero-forcing operation is performed:
$$\tilde{y}_1 = \begin{bmatrix} y_1(t_1) - y_1(t_3) \\ y_1(t_2) \end{bmatrix} \tag{4.50}$$
$$= \begin{bmatrix} h_{1A}^H \\ h_{1B}^H \end{bmatrix} \begin{bmatrix} x^{[1]}_1 \\ x^{[1]}_2 \end{bmatrix} + \begin{bmatrix} \tilde{z}_1 \\ z_1(t_2) \end{bmatrix} \tag{4.51}$$
where $\tilde{z}_1 \sim \mathcal{CN}(0, 2)$ is the additive noise from symbols one and three. By representing the concatenated channel coefficients in matrix form,
$$H^{[1]} = \begin{bmatrix} h_{1A}^H \\ h_{1B}^H \end{bmatrix} \tag{4.52}$$
it is clear that
$$\tilde{y}^{[1]} = H^{[1]} x^{[1]} + z^{[1]} \tag{4.53}$$
is equivalent to a 2 × 2 MIMO channel with twice the noise power in the first receiver. This is also equivalent to
$$\tilde{y}^{[1]} = \tilde{H}^{[1]} x^{[1]} + z^{[1]} \tag{4.54}$$
where both noise terms are now distributed as $\mathcal{CN}(0, 1)$ and
$$\tilde{H}^{[k]} = \mathrm{diag}\left(\frac{1}{\sqrt{K}}, \dots, \frac{1}{\sqrt{K}}, 1\right) H^{[k]}. \tag{4.55}$$
This process is summarized in Table 4.1. Note that in order to satisfy the sum power constraint $E[\|x\|^2] \le P$, we transmit each symbol with power $\frac{M+K-1}{M^2 K} P$. In other words, expending $\rho$ units of power per symbol, MK symbols sent over $M + K - 1$ slots implies a total power usage of $\frac{MK}{M+K-1}\rho$. Dividing this uniformly over transmit antennas, we find $\rho = \frac{M+K-1}{M^2 K} P$. For the case of an arbitrary number of users and transmitters, we refer to the full description of the scheme in [102]. Finally, we note that the single user achievable rate of such a system is given for Gaussian coding as follows:
$$R^{[k]} = \frac{1}{M+K-1} E\left[ \log\det\left( I + \frac{(M+K-1)P}{M^2 K} \tilde{H}^{[k]} \tilde{H}^{[k]H} \right) \right]. \tag{4.56}$$

Table 4.1: Blind Interference Alignment for the 2 × 2 scenario

| Slot | $t = t_1$ | $t = t_2$ | $t = t_3$ |
| Tx1 / Tx2 send | $x^{[1]} + x^{[2]} = [u^{[1]}_1 + u^{[2]}_1,\; u^{[1]}_2 + u^{[2]}_2]^T$ | $x^{[1]} = [u^{[1]}_1,\; u^{[1]}_2]^T$ | $x^{[2]} = [u^{[2]}_1,\; u^{[2]}_2]^T$ |
| User 1 antenna | A | B | A |
| User 2 antenna | A | A | B |
| User 1 receives | $y_1(t_1) = h_{1A}^H(x^{[1]} + x^{[2]}) + z_1(t_1)$ | $y_1(t_2) = h_{1B}^H x^{[1]} + z_1(t_2)$ | $y_1(t_3) = h_{1A}^H x^{[2]} + z_1(t_3)$ |
| User 2 receives | $y_2(t_1) = h_{2A}^H(x^{[1]} + x^{[2]}) + z_2(t_1)$ | $y_2(t_2) = h_{2A}^H x^{[1]} + z_2(t_2)$ | $y_2(t_3) = h_{2B}^H x^{[2]} + z_2(t_3)$ |

User 1 decodes: $\tilde{y}_1(1) = y_1(t_1) - y_1(t_3) = h_{1A}^H x^{[1]} + z_1(t_1) - z_1(t_3)$ and $\tilde{y}_1(2) = y_1(t_2) = h_{1B}^H x^{[1]} + z_1(t_2)$, so that $\hat{x}^{[1]} = \begin{bmatrix} h_{1A}^H \\ h_{1B}^H \end{bmatrix}^{-1} \begin{bmatrix} \tilde{y}_1(1) \\ \tilde{y}_1(2) \end{bmatrix}$.

User 2 decodes: $\tilde{y}_2(1) = y_2(t_1) - y_2(t_2) = h_{2A}^H x^{[2]} + z_2(t_1) - z_2(t_2)$ and $\tilde{y}_2(2) = y_2(t_3) = h_{2B}^H x^{[2]} + z_2(t_3)$, so that $\hat{x}^{[2]} = \begin{bmatrix} h_{2A}^H \\ h_{2B}^H \end{bmatrix}^{-1} \begin{bmatrix} \tilde{y}_2(1) \\ \tilde{y}_2(2) \end{bmatrix}$.

4.5.1 Blind Interference Alignment Under Discrete Alphabets

Unless otherwise noted, now assume that all $x^{[k]}_i$ are elements taken from some discrete alphabet $\mathcal{X}$, such as a QAM or PSK constellation of size $|\mathcal{X}| = S$. Assuming the matrix $H^{[k]}$ is known at each receiver (through pilot tones, for example), the per-user achievable rate can be written:
$$R^{[k]} = I\left(x^{[k]}; (\tilde{y}^{[k]}, H^{[k]})\right) \tag{4.57}$$
$$= E_H\left[ I\left(x^{[k]}; \tilde{y}^{[k]} \mid H^{[k]} = H\right) \right] \tag{4.58}$$
$$= E_H\left[ \sum_{x_i \in \mathcal{X}} P(x_i)\, I\left(x_i; \tilde{y}^{[k]} \mid H^{[k]} = H\right) \right] \tag{4.59}$$
$$= E_H\left[ \sum_{x_i \in \mathcal{X}} P(x_i) \int_{y \in \mathbb{C}^M} f(y \mid Hx_i) \log_2 \frac{f(y \mid Hx_i)}{f(y \mid H)} \, dy \right] \tag{4.60}$$
where $P(a_i)$ and $f(b_i)$ are the probability mass and probability density functions, respectively, of random vectors a and b. In this case, let x be uniformly distributed such that $P(x_i) = \frac{1}{S^M} \; \forall i$. The remaining distributions are given as:
$$f(y \mid Hx_i) = \frac{1}{K^{M-1}\pi^2} \exp\left( -\left\| C^{-1}(y - Hx_i) \right\|^2 \right) \tag{4.61}$$
$$f(y \mid H) = \sum_{x_i \in \mathcal{X}} \frac{1}{S^M} \frac{1}{K^{M-1}\pi^2} \exp\left( -\left\| C^{-1}(y - Hx_i) \right\|^2 \right) \tag{4.62}$$
where
$$C = \mathrm{diag}(K, \dots, K, 1) \tag{4.63}$$
accounts for the additional noise variance in the zero-forced branches.

This expression can be simplified to a form that can be easily calculated via Monte-Carlo integration. Due to the symmetry of the problem, the following expression for the first user's rate is the same for the general case.
We present here the specific solution for M = K = 2:
$$R^{[1]} = E_H\left[ E_x\left[ E_{y \mid Hx}\left[ \log_2\left(f(y \mid Hx_i)\right) - \log_2\left( \frac{1}{S^2} \sum_{x' \in \mathcal{X}^2} f(y \mid Hx') \right) \right] \right] \right]. \tag{4.64}$$
Thus to evaluate the expectation, one must merely generate the random variables H, x, and $y \mid Hx$ and find the average value of the above expression. The only significant computational complexity comes from the summation over the discrete alphabet, which for most modern modulation formats ($|\mathcal{X}| \le 256$) is not a major hindrance. The resulting rate curves are shown in Figure 4.11.

Figure 4.11: BIA achievable rates under various input alphabets. (Axes: SNR (dB) versus Per-User Rate (bit/s/Hz); curves: BPSK, QPSK, 16-QAM, 64-QAM, Gaussian.)

4.5.2 Blind Interference Alignment in Non-ideal Channel Conditions

Now assume that rather than the perfect zero-forcing described in previous sections, there is some unknown channel variation over the course of a super-symbol. For example, in the 2 × 2 case, we now have the following received symbols:
$$y_1(t_1) = h_{1A}^H \left(x^{[1]} + x^{[2]}\right) + z_1(t_1) \tag{4.65}$$
$$y_1(t_2) = h_{1B}^H x^{[1]} + z_1(t_2) \tag{4.66}$$
$$y_1(t_3) = \tilde{h}_{1A}^H x^{[2]} + z_1(t_3) \tag{4.67}$$
where $\tilde{h}_{1A}$ has changed by some random value from $h_{1A}$. We model this behavior using an additive Gaussian random variable such that $\tilde{h}_{1A} = h_{1A} + E$ with $E \sim \mathcal{CN}(0, \sigma_i^2)$, so that $\tilde{h}_{1A} \sim \mathcal{CN}(h_{1A}, \sigma_i^2)$. Following the zero-forcing step, the symbols of interest to user 1 are:
$$\tilde{y}^{[1]} = \begin{bmatrix} y^{[1]}_1 - y^{[1]}_3 \\ y^{[1]}_2 \end{bmatrix} \tag{4.68}$$
$$= \begin{bmatrix} h_{1A}^H \\ h_{1B}^H \end{bmatrix} \begin{bmatrix} x^{[1]}_1 \\ x^{[1]}_2 \end{bmatrix} + \begin{bmatrix} h_{1A}^H - \tilde{h}_{1A}^H \\ 0 \end{bmatrix} \begin{bmatrix} x^{[2]}_1 \\ x^{[2]}_2 \end{bmatrix} + \begin{bmatrix} \tilde{z}_1 \\ z_1(t_2) \end{bmatrix} \tag{4.69}$$
$$= \begin{bmatrix} h_{1A}^H \\ h_{1B}^H \end{bmatrix} \begin{bmatrix} x^{[1]}_1 \\ x^{[1]}_2 \end{bmatrix} + \begin{bmatrix} g_1^H \\ 0 \end{bmatrix} \begin{bmatrix} x^{[2]}_1 \\ x^{[2]}_2 \end{bmatrix} + \begin{bmatrix} \tilde{z}_1 \\ z_1(t_2) \end{bmatrix} \tag{4.70}$$
where $g^{[k]}(j) \sim \mathcal{CN}(0, \sigma_i^2)$ is user k's jth interference channel. The distribution of the interference terms (e.g. $I = g^{[1]}_1 x^{[2]}_1$) can be found via the product distribution rule:
$$f_I(i) = \int_{-\infty}^{\infty} \frac{1}{|x|} f_X(x) f_G\left(\frac{i}{x}\right) dx \tag{4.71}$$
$$= \frac{1}{S} \sum_{x' \in \mathcal{X}} \frac{1}{\pi \sigma_i^2 |x'|^2} \exp\left( -\frac{|i|^2}{\sigma_i^2 |x'|^2} \right). \tag{4.72}$$
This is a Gaussian mixture distribution, which is a well-defined probability distribution function. Inserting these terms into the distribution of the received symbols,
$$f\left(y^{[1]}_1 \mid h^{[1]}_1, h^{[1]}_2, x^{[1]}_1, x^{[1]}_2\right) = \frac{1}{S^2} \sum_{x^{[2]}_1, x^{[2]}_2} \frac{1}{\pi \sigma_i^2 \left(|x^{[2]}_1|^2 + |x^{[2]}_2|^2\right)} \exp\left( -\frac{\left| y^{[1]}_1 - h^{[1]}_1 x^{[1]}_1 - h^{[1]}_2 x^{[1]}_2 \right|^2}{2 \sigma_i^2 \left(|x^{[2]}_1|^2 + |x^{[2]}_2|^2\right)} \right) \tag{4.73}$$
and
$$f\left(y^{[1]}_1 \mid h^{[1]}_1, h^{[1]}_2\right) = \frac{1}{S^4} \sum_{x^{[1]}_1, x^{[1]}_2, x^{[2]}_1, x^{[2]}_2} f\left(y^{[1]}_1 \mid h^{[1]}_1, h^{[1]}_2, x^{[1]}_1, x^{[1]}_2\right). \tag{4.74}$$
Using these functions in the rate expression (4.64), the 2 × 2 single-user achievable rates are given in Figure 4.12.

Figure 4.12: BIA achievable rates under imperfect zero-forcing (green dashed lines). (Axes: SNR (dB) versus Per-User Rate (bit/s/Hz); curves: BPSK, QPSK, 16-QAM, 64-QAM, Gaussian.)

4.6 Conclusion

In this chapter we have explored a number of system-level considerations for practical Distributed MIMO networks. Through full scale system simulations, we have identified one of the primary impairments, frequency offsets, and considered their impact in a massive MIMO scenario. This analysis prompted the designation of an optimal frame length, for which we explored two potential solutions, analogous to the fast-fading/slow-fading approaches to wireless channel capacity. We then discussed a heuristic-based precoding scheme which was shown to both reduce the requirements on the backhaul capacity and potentially increase the system sum rate. Finally, we analyzed a precoding scheme which does not require channel state information at the transmitter, known as Blind Interference Alignment. We investigated the performance of BIA under discrete alphabets, and also considered its performance under non-ideal (time varying) channel conditions.

This chapter has merely scratched the surface of a few of the more important system-level considerations; a number of important issues remain in making Distributed MIMO realizable. For one, coding schemes which are more robust to the progressive loss of synchronism would be a key technology for enabling high performance Distributed MIMO. Second, the optimal selection of anchor nodes is, in general, a difficult combinatorial problem for which useful heuristics are needed. Finally, the intuitive approach behind truncated conjugate beamforming proved useful in developing a low-backhaul-requirement precoding scheme; however, nothing is proven regarding its optimality, and there are likely more advanced schemes which accomplish greater sum-rates at lower backhaul capacities. Though implementation hurdles remain for Distributed MIMO, it also presents a fruitful landscape for communication researchers.

Chapter 5 Experimental Evaluation

5.1 Introduction

Our original work on Distributed MIMO was based on experimentation [29], which we subsequently refined through the analysis presented in the previous chapters. In this chapter, we return to the experimental origins of this system to verify and test a number of our theoretical assumptions. A large-scale implementation of Distributed MIMO is not our objective, as the resources required for such quantities of hardware would be significant. Rather, we rely on the theoretical underpinnings established thus far, and utilize these measurements to shore up the underlying assumptions of our models.

We begin with a verification of the oscillator dynamics discussed in Section 2.8. As noted previously, the AR-1 model utilized in that section is not motivated through the physics of such devices, but rather out of mathematical convenience.
We reveal in this chapter that the model is indeed appropriate based on measurements. Next, we provide measurements of the calibration coefficients discussed in Chapter 3, and explore the frequency correlation of these random variables. Finally, we analyze the system-level performance of a Distributed MIMO system utilizing the Blind Interference Alignment method discussed in Section 4.5.

5.2 Synchronization Results

In this section we present experimental justification for the model presented in Section 2.8. The use of the Kalman filter is predicated on the underlying process being a single-tap autoregressive process. Of course, a number of generalizations of the classical Kalman filter are possible, such as higher order Kalman filters or the extended Kalman filter, though in this case we first seek to verify this much more computationally simple approach. We verify through measurements that the phase noise and frequency drift statistics of the oscillator can be characterized as an AR-1 process.

The oscillator under consideration is a Crystek CVHD-950 100 MHz voltage controlled crystal oscillator. This oscillator drives an AD-9523 clock distribution board and PLL to output a 40 MHz signal, which is then multiplied by the RF hardware to form the 2.45 GHz carrier signal. This oscillator and signal path is typical of an 802.11 grade device, with 20 ppm frequency accuracy. The frequency drift behavior of the device is measured by observing an unmodulated sine at very high SNR (> 40 dB) with a symbol period of 0.2 s. Though we wish to measure the characteristics of a single oscillator, it is only possible to measure the frequency and phase dynamics between two oscillators: the measured and the measuring oscillators. There is no way to distinguish between the phase noise/frequency offset characteristics of the measured oscillator and those of the measuring oscillator. As such, it is important to utilize a highly accurate measurement source.
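To make the single-tap autoregressive hypothesis concrete, the Python sketch below simulates an AR-1 sample path and re-estimates its coefficient and innovation standard deviation from the path alone. It is a hedged illustration: it uses the lag-1 Yule-Walker relation rather than the estimator applied to the measurements in this section, the innovation standard deviation of 1.0 is an arbitrary value chosen for the example, and the function names are introduced here.

```python
import random


def simulate_ar1(a, sigma, n, rng):
    """Generate x[t] = a * x[t-1] + w[t] with w[t] ~ N(0, sigma^2)."""
    x, path = 0.0, []
    for _ in range(n):
        x = a * x + rng.gauss(0.0, sigma)
        path.append(x)
    return path


def fit_ar1(path):
    """Lag-1 Yule-Walker estimate of (a, sigma) from one sample path."""
    mean = sum(path) / len(path)
    c = [v - mean for v in path]
    r0 = sum(v * v for v in c)                              # lag-0 sum
    r1 = sum(c[t] * c[t - 1] for t in range(1, len(c)))     # lag-1 sum
    a_hat = r1 / r0
    resid = [c[t] - a_hat * c[t - 1] for t in range(1, len(c))]
    sigma_hat = (sum(e * e for e in resid) / len(resid)) ** 0.5
    return a_hat, sigma_hat


if __name__ == "__main__":
    rng = random.Random(0)
    path = simulate_ar1(a=0.4731, sigma=1.0, n=20000, rng=rng)
    print(fit_ar1(path))
```

With 20000 samples the coefficient estimate has a standard error of well under 0.01, so a short measured path would be expected to give a noticeably looser estimate.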
We use an HP sampling oscilloscope equipped with a low phase noise oscillator to capture the 20 ms sample paths. One hundred such sample paths are used for calculating the autoregressive process parameters.

Figure 5.1: Sample paths of a consumer-grade voltage controlled crystal oscillator, and the corresponding AR-1 process used to simulate it. (Axes: Time (s) versus Frequency Offset (Hz); curves: Simulated Oscillator, Measured Oscillator.)

The typical method of estimating the autoregressive parameters is through the Yule-Walker equations, which can be solved efficiently using the Levinson-Durbin algorithm [105]. However, Burg's maximum entropy spectral estimation provides a more robust method when the autocorrelation matrix is poorly conditioned [106]. As such, we choose the latter method. The resulting measured parameters are a = 0.4731 and = 6.19 10 3 . The measured sample path is illustrated in Figure 5.1 along with a sample path generated from the autoregressive model. As can be seen, the frequency fluctuations of the simulation are on the same order of magnitude and time scale as the measured oscillator.

5.3 Calibration Results

In this section we explore the behavior of the calibration coefficients introduced in Section 3.2 on actual hardware. To this end, we utilize the WARP SDR platform [71] to exchange the pilot signals described in Equations (3.15) and (3.16). Utilizing a single WARP v2 FPGA with 4 RF front ends, we transmit and receive the pilots from each RF front end over a 20 MHz, 802.11g-inspired modulation format. As with the 20 MHz 802.11n and 802.11ac standards, we utilize a subcarrier spacing of 312.5 kHz across 64 subcarriers. As the standard prescribes, the data subcarriers are distributed over the subcarrier indices -28 to +28, covering a total effective bandwidth of 17.8125 MHz. The subcarrier indices in [-32, -29] and [29, 31] are null carriers, i.e. nothing is transmitted on them, for the sake of easing the requirements on the transmit and receive filters.

We perform the relative calibration protocol described in Section 3.4, and observe the resulting calibration coefficients $c_i$ for i = 2, 3, 4 on each data subcarrier in Figure 5.2. Recall that in relative calibration, the reference antenna (chosen arbitrarily as antenna 1 here) can be assumed to have a unit-magnitude complex scalar calibration coefficient. In our case, we assume $c_1 = 1$ for all subcarriers. The primary observation is that for all three relatively-calibrated RF front ends, the calibration coefficients are very highly correlated across the subcarrier indices.

Figure 5.2: The complex (magnitude and phase) response of measured calibration coefficients on a 4 transmitter system on 48 subcarriers across 17.8125 MHz of bandwidth. Note that both the magnitude and phase responses of the coefficients are highly correlated in frequency. (Axes: Real Component versus Imaginary Component; series: $c_2$, $c_3$, $c_4$.)

The fact that the frequency and phase responses of the calibration coefficients are so flat in frequency can be used to inform our system design. Instead of utilizing the entire frequency spectrum to send pilots from one AP, we can estimate all of the calibration coefficients across the spectrum by transmitting on a relatively small subset of the total subcarriers. In this way, multiple APs may transmit simultaneously on separate subcarriers, reducing the total overhead due to calibration. This concept is further explored in recent work [107].

Also noteworthy is the relatively small variation that each AP's calibration coefficients exhibit from unit magnitude. In this particular instance, because the transmit and receive gains are set to the same values, it is not surprising that the calibration coefficients should have roughly the same magnitudes. However, in general the magnitudes may vary from AP to AP due to device settings as well as manufacturing variations. In this particular instance, the calibration coefficients are mostly encoded in their relative phase responses.

5.4 Blind Interference Alignment

In this section we investigate the performance of Blind Interference Alignment in a practical Distributed MIMO system. The results here are for a system of two APs and two users. The network topology used in these tests is from the original AirSync protocol (see Section 2.4), in which a primary AP transmits an out of band pilot tone to the secondary AP. Since channel state information is not required for Blind Interference Alignment, no reciprocity calibration is necessary.

Figure 5.3: BIA Testbed. When using Blind Interference Alignment each receiver switches between two antenna modes.

The testbed topology is illustrated in Figure 5.3, where the APs and users are situated in a close range office environment. One of the resulting scatter plots is shown in Figure 5.4. Since BIA does not provide the transmitter with channel state information, it is important to find out by how much the received symbol quality is affected by small variations in the positioning of the antennas. We have conducted an experiment in which we have varied the transmitter antenna positions within one wavelength of their initial position and measured the channel SINR for the two user symbols. The CDFs of the resulting SINR distributions are shown in Figure 5.5. The high variance of the distribution has profound implications on the design of a coding and medium access scheme for BIA. The fact that User 1 has a higher median SNR can be easily explained by the fact that the two symbols are transmitted by antennas placed on different transmitters. The placement of the users relative to the corresponding transmitter determines each symbol's average power.
In order to quantify the performance of BIA, we compare it to a typical TDMA system. In TDMA, instead of multiuser precoding we transmit to one user at a time from the closest access point. In this scheme, transmissions to different users happen in a time-shared manner, just like in 802.11. As opposed to 802.11, we assume that different access points do not collide when attaining channel access, i.e. they perform perfect downlink scheduling. We investigate the sum rates achievable during downlink transmission.

Figure 5.4: The scattering diagram under Blind Interference Alignment. (Panels: User 1 Symbol Stream 1, User 1 Symbol Stream 2, User 2 Symbol Stream 1, User 2 Symbol Stream 2.)

We measure the spectral efficiency in bits per second per Hertz (bps/Hz) transferred by each scheme, where the comparison was done looking only at the portion of the bandwidth used for data transmission (i.e. we considered only the data carriers and ignored the overhead of null carriers, pilots and cyclic prefix). Since the OFDM framing for all three schemes is identical and similar to that of 802.11, we obtain a fair comparison of their throughputs.

We have varied the transmitters' signal powers in a proportional way, trying to obtain a typical range of SNRs at the receivers. The receive-side SNR values span the typical high range encountered in WiFi signal transmission, from 15 dB to 30 dB. The received SNR values in our figures were estimated using non-precoded and non-synchronized isotropic broadcasts, measuring the raw received power and comparing it to the receiver noise. The same levels of total transmit power were used in the precoded synchronous transmissions.

We evaluate the SINR (Signal to Interference plus Noise Ratio) values of the different symbol streams decoded by the receivers. Determining the symbol SINR values requires more effort in our scenario than in classic point-to-point transmission.
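One way to estimate a per-stream SINR from received constellation points, in the spirit of the measurement procedure this section describes, is a decision-directed estimate: map each soft symbol to its nearest constellation point, treat the mean squared residual as noise plus interference, and treat the mean power of the decided points as signal power. The minimal Python sketch below assumes the soft symbols have already been equalized to a unit-average-power 16-QAM grid. The names `QAM16`, `nearest_point` and `estimate_sinr_db` are hypothetical, and this is not the testbed's actual processing chain.

```python
import math

# 16-QAM alphabet with unit average power (levels -3, -1, 1, 3 scaled by 1/sqrt(10)).
LEVELS = (-3, -1, 1, 3)
QAM16 = [complex(i, q) / math.sqrt(10) for i in LEVELS for q in LEVELS]


def nearest_point(y):
    """Hard decision: the constellation point closest to soft symbol y."""
    return min(QAM16, key=lambda s: abs(y - s))


def estimate_sinr_db(received):
    """Decision-directed SINR estimate (dB) for equalized 16-QAM soft symbols.

    Assumes the SINR is high enough that nearest-point decisions are correct.
    """
    decided = [nearest_point(y) for y in received]
    signal = sum(abs(s) ** 2 for s in decided) / len(decided)
    distortion = sum(abs(y - s) ** 2 for y, s in zip(received, decided)) / len(received)
    if distortion == 0:
        return math.inf  # no residual observed in this batch
    return 10 * math.log10(signal / distortion)


# Illustrative input only: every constellation point with a small fixed offset.
received = [s + 0.05 * (1 + 1j) for s in QAM16]
sinr_db = estimate_sinr_db(received)
```

The estimate is only meaningful when the clusters are well separated. At lower SINR, wrong decisions shrink the measured residual and bias the estimate upward, which is why a sparse constellation and high SNR matter for this kind of measurement.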
Since our system is susceptible to power leakage from one stream to another, we would like to continuously transmit over all channels in order to assess the impact of interference.

Figure 5.5: The cumulative distribution function of received SNRs under the Blind Interference Alignment Scheme. (Horizontal axis: received SNR in dB; vertical axis: portion of measurements less than or equal; curves for User 1 and User 2.)

To this end we sampled each symbol stream using symbols chosen from a relatively sparse QAM-16 constellation. We measured the variance of the constellation points on the receiver side in order to determine the sum of the noise and interference powers. The amplitude of the constellation reflects the received signal power. At the high SNR values present in our system, the clusters of constellation points are spaced sufficiently far apart to allow for an accurate mapping of the received symbols to constellation points. In order to assess the effects of interference produced by streams that follow other encodings, we have, in some experiments, fixed a QAM-16 constellation on one symbol stream while employing symbols chosen according to a Gaussian or uniform distribution on the other stream. Our results have shown that at the low interference levels measured, none of the statistics collected show considerable variance depending on the type of interference.

Figure 5.6a illustrates the inferred symbol error rates for the QAM-16 constellation transmitted. It can be easily seen that the BIA curve closely follows the TDMA curve, with only a few dB difference. We would like to know how the quality of the resulting symbol streams affects the achievable rates. To this end we have estimated the rates achievable when using capacity-achieving codes instead of the QAM-16 modulation. Figure 5.6b presents the resulting sum rates and Figure 5.6c presents the multiplexing gains. The average gain for BIA is 22%. While this is shy of the theoretical achievable multiplexing gain of 4/3, we must remember that BIA allocates power among two degrees of freedom, whereas TDMA allocates its whole transmitted power to a single transmitter. Additionally, as mentioned in Section 4.5, BIA suffers from noise enhancement in one of its symbol streams, which can be seen in Figure 5.4.

Figure 5.6: Experimental Results of Blind Interference Alignment. (a) Bit Error Rate; (b) Sum rate (Gaussian codes), in bits/s/Hz; (c) Multiplexing gain over TDMA (Gaussian codes). All panels are plotted against receiver signal to noise ratio (dB); panels (a) and (b) show BIA and TDMA curves, panel (c) shows BIA.

In the case of a Distributed MIMO system, we would expect that phase synchronization error could lead to random rotations of the received soft symbols. We investigated this effect by comparing the variance of soft symbols corresponding to constellation points of different amplitudes. We would expect that due to random rotations, the variance of the outer constellation points would be higher. However, our measurements could not identify such an effect for any of the transmission schemes.

5.5 Concluding Remarks

Based on the above experimental measurements, we conclude that the model assumptions of the previous chapters are indeed valid. Further, we demonstrate that Blind Interference Alignment provides a fractional degree of freedom gain in Distributed MIMO when explicit channel state information is unavailable.
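To make the fractional degree of freedom gain concrete, the following minimal Python sketch compares Gaussian-code sum rates for BIA and TDMA. It rests on stated assumptions rather than on the testbed's actual post-processing: the two-user BIA scheme is taken to deliver four symbol streams over three channel uses (which gives the 4/3 multiplexing gain), TDMA is taken to share time equally between the users, and each stream is credited with log2(1 + SINR). The function names and the SINR values in the usage note are hypothetical.

```python
import math


def db_to_linear(x_db):
    """Convert a power ratio from dB to linear scale."""
    return 10 ** (x_db / 10)


def bia_sum_rate(stream_sinrs_db, slots=3):
    """Sum rate (bits/s/Hz) of streams that together occupy `slots` channel uses."""
    return sum(math.log2(1 + db_to_linear(s)) for s in stream_sinrs_db) / slots


def tdma_sum_rate(user_snrs_db):
    """Sum rate (bits/s/Hz) when each of K users is served 1/K of the time."""
    return sum(math.log2(1 + db_to_linear(s)) for s in user_snrs_db) / len(user_snrs_db)


def gain_over_tdma(stream_sinrs_db, user_snrs_db):
    """Ratio of the BIA sum rate to the TDMA sum rate."""
    return bia_sum_rate(stream_sinrs_db) / tdma_sum_rate(user_snrs_db)
```

If all four BIA streams saw the same SINR as the TDMA links, `gain_over_tdma([20] * 4, [20] * 2)` would return 4/3. With lower per-stream SINRs, for example the hypothetical `gain_over_tdma([17, 17, 14, 14], [20, 20])`, the ratio stays above 1 but falls below 4/3, which is the qualitative effect of splitting power across streams and of noise enhancement.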
The primary focus of this dissertation has been on the PHY layer primitives necessary to enable scalable massive Distributed MIMO. We have demonstrated scalable algorithms to achieve both synchronization and reciprocity calibration for large networks of access points, as would be found in high density small cell scenarios. Further, we have solved a number of the implementation challenges of such systems, and provided heuristic approaches to others. A number of important questions about the MAC layer functionality of this network remain, although it is known that it can outperform traditional Wi-Fi networks by orders of magnitude [89]. The primary impediment to adoption in the long term lies with the standardization committees, though much of the work in this dissertation can work around existing 802.11 standards.

The impending spectrum crunch necessitates a radical shift in the way system designers perceive wireless networks. We have reached the upper limits of spectrum availability for most consumer devices, and the wireless community must now aim for dramatic increases in spectral efficiency. Distributed MIMO provides such an opportunity, and may prove a key technology in meeting future wireless data demands.

Bibliography

[1] Cisco Visual Networking Index: Global Mobile Data Traffic Forecast Update, 2011-2016, Feb. 2012.
[2] T. Marzetta, "Noncooperative cellular wireless with unlimited numbers of base station antennas," IEEE Transactions on Wireless Communications, vol. 9, no. 11, pp. 3590-3600, Nov. 2010.
[3] H. Huh, G. Caire, H. Papadopoulos, and S. Ramprashad, "Achieving 'Massive MIMO' Spectral Efficiency with a Not-so-Large Number of Antennas," IEEE Trans. on Wireless Commun., vol. 11, no. 9, pp. 3226-3239, Sep. 2012.
[4] V. Chandrasekhar, J. Andrews, and A. Gatherer, "Femtocell networks: a survey," IEEE Communications Magazine, vol. 46, no. 9, pp. 59-67, Sep. 2008.
[5] G. Caire and S.
Shamai, "On the achievable throughput of a multiantenna Gaussian broadcast channel," IEEE Trans. on Inform. Theory, vol. 49, no. 7, pp. 1691-1706, Jul. 2003.
[6] IEEE Draft Standard for IT - Telecommunications and Information Exchange Between Systems - LAN/MAN - Specific Requirements - Part 11: Wireless LAN Medium Access Control and Physical Layer Specifications - Amd 4: Enhancements for Very High Throughput for operation in bands below 6GHz, Nov. 2012.
[7] S. Sesia, I. Toufik, and M. Baker, LTE - The UMTS Long Term Evolution. Wiley Online Library, 2009, vol. 66.
[8] J. Lee, J.-K. Han, and J. Zhang, "MIMO technologies in 3GPP LTE and LTE-Advanced," EURASIP J. Wirel. Commun. Netw., vol. 2009, pp. 3:1-3:10, Mar. 2009. [Online]. Available: http://dx.doi.org/10.1155/2009/302092
[9] S. A. Ramprashad and A. Benjebbour, "MU-MIMO in LTE: Performance and the challenges for future enhancement," ICC'2012, Ottawa, Canada, Jun. 2012.
[10] C. Shepard, H. Yu, N. Anand, E. Li, T. Marzetta, R. Yang, and L. Zhong, "Argos: practical many-antenna base stations," in Proceedings of the 18th annual international conference on Mobile computing and networking, ser. Mobicom '12. New York, NY, USA: ACM, 2012, pp. 53-64. [Online]. Available: http://doi.acm.org/10.1145/2348543.2348553
[11] E. Larsson, O. Edfors, F. Tufvesson, and T. Marzetta, "Massive MIMO for next generation wireless systems," IEEE Communications Magazine, vol. 52, no. 2, pp. 186-195, Feb. 2014.
[12] IEEE 802.11-09/0308r12, TGac Channel Model Addendum, Mar. 2010.
[13] F. Kaltenberger, H. Jiang, M. Guillaud, and R. Knopp, "Relative channel reciprocity calibration in MIMO/TDD systems," in Proc. Future Network and Mobile Summit, 2010, Jun. 2010, pp. 1-10.
[14] IEEE Standard for a Precision Clock Synchronization Protocol for Networked Measurement and Control Systems, 2008.
[15] R. Irmer, H. Droste, P. Marsch, M. Grieger, G. Fettweis, S. Brueck, H.-P. Mayer, L. Thiele, and V. Jungnickel, "Coordinated multipoint: Concepts, performance, and field trial results," IEEE Communications Magazine, vol. 49, no. 2, pp. 102-111, Feb. 2011.
[16] H. Huh, A. M. Tulino, and G. Caire, "Network MIMO With Linear Zero-Forcing Beamforming: Large System Analysis, Impact of Channel Estimation, and Reduced-Complexity Scheduling," IEEE Trans. on Inform. Theory, vol. 58, no. 5, pp. 2911-2934, 2012.
[17] S. A. Ramprashad and G. Caire, "Cellular vs. network MIMO: A comparison including the channel state information overhead," in IEEE 20th International Symposium on Personal, Indoor and Mobile Radio Communications. IEEE, 2009, pp. 878-884.
[18] IEEE Standard for Information technology - Telecommunications and information exchange between systems - Local and metropolitan area networks - Specific requirements - Part 11: Wireless LAN Medium Access Control (MAC) and Physical Layer (PHY) Specifications, 2012.
[19] R. Rogalin, O. Y. Bursalioglu, H. C. Papadopoulos, G. Caire, and A. F. Molisch, "Hardware-impairment compensation for enabling distributed large-scale MIMO," in Information Theory and Applications Workshop (ITA). IEEE, 2013, pp. 1-10.
[20] T. M. Schmidl and D. C. Cox, "Robust frequency and timing synchronization for OFDM," IEEE Transactions on Communications, vol. 45, no. 12, pp. 1613-1621, Dec. 1997. [Online]. Available: http://dx.doi.org/10.1109/26.650240
[21] J.-J. van de Beek, M. Sandell, M. Isaksson, and P. Ola Borjesson, "Low-complex frame synchronization in OFDM systems," in Universal Personal Communications. 1995. Record., 1995 Fourth IEEE International Conference on, Nov. 1995, pp. 982-986.
[22] F. Tufvesson, O. Edfors, and M. Faulkner, "Time and frequency synchronization for OFDM using PN-sequence preambles," in Vehicular Technology Conference, 1999. VTC 1999 - Fall. IEEE VTS 50th, vol. 4, 1999, pp. 2203-2207.
[23] P. Moose, "A technique for orthogonal frequency division multiplexing frequency offset correction," IEEE Transactions on Communications, vol. 42, no. 10, pp. 2908-2914, Oct. 1994.
[24] M.-O. Pun, M. Morelli, and C.-C. Kuo, "Maximum-likelihood synchronization and channel estimation for OFDMA uplink transmissions," IEEE Transactions on Communications, vol. 54, no. 4, pp. 726-736, Apr. 2006.
[25] F. Boccardi, H. Huang, and A. Alexiou, "Network MIMO with reduced backhaul requirements by MAC coordination," in 42nd Asilomar Conference on Signals, Systems and Computers. IEEE, 2008, pp. 1125-1129.
[26] P. Marsch and G. Fettweis, "On base station cooperation schemes for downlink network MIMO under a constrained backhaul," in IEEE Globecom. IEEE, 2008, pp. 1-6.
[27] Z. Zhang, X. Gao, and W. Wu, "Algorithms for connected set cover problem and fault-tolerant connected set cover problem," Theoretical Computer Science, vol. 410, no. 8-10, pp. 812-817, 2009. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0304397508008554
[28] T. Calamoneri, "The L(h, k)-Labelling Problem: A Survey and Annotated Bibliography," The Computer Journal, vol. 49, no. 5, pp. 585-608, 2006.
[29] H. V. Balan, R. Rogalin, A. Michaloliakos, K. Psounis, and G. Caire, "Achieving high data rates in a distributed MIMO system," in Proceedings of the 18th annual international conference on Mobile computing and networking, ser. Mobicom '12. New York, NY, USA: ACM, 2012, pp. 41-52. [Online]. Available: http://doi.acm.org/10.1145/2348543.2348552
[30] H. V. Balan, R. Rogalin, A. Michaloliakos, K. Psounis, and G. Caire, "AirSync: Enabling Distributed Multiuser MIMO With Full Spatial Multiplexing," IEEE/ACM Trans. on Networking, vol. PP, no. 99, 2012.
[31] G. Caire, N. Jindal, M. Kobayashi, and N. Ravindran, "Multiuser MIMO achievable rates with downlink training and channel state feedback," IEEE Trans. on Inform. Theory, vol. 56, no. 6, pp. 2845-2866, 2010.
[32] G. Foschini and M. Gans, "On limits of wireless communications in a fading environment when using multiple antennas," Wireless Personal Communications, vol. 6, pp. 311-335, 1998. [Online]. Available: http://dx.doi.org/10.1023/A:1008889222784
[33] E. Telatar, "Capacity of multi-antenna Gaussian channels," European Trans. on Telecommunications, vol. 10, no. 6, pp. 585-595, 1999.
[34] J. Proakis and M. Salehi, Digital communications. New York, NY: McGraw-Hill, 2007.
[35] D. Tse and P. Viswanath, Fundamentals of Wireless Communication. New York, NY: Cambridge University Press, 2005.
[36] G. Caire and S. Shamai, "On the achievable throughput of a multiantenna Gaussian broadcast channel," IEEE Transactions on Information Theory, vol. 49, no. 7, pp. 1691-1706, Jul. 2003.
[37] S. Vishwanath, N. Jindal, and A. Goldsmith, "Duality, achievable rates, and sum-rate capacity of Gaussian MIMO broadcast channels," IEEE Transactions on Information Theory, vol. 49, no. 10, pp. 2658-2668, Oct. 2003.
[38] P. Viswanath and D. N. Tse, "Sum capacity of the vector Gaussian broadcast channel and uplink-downlink duality," IEEE Trans. Inf. Theor., vol. 49, no. 8, pp. 1912-1921, Sep. 2006. [Online]. Available: http://dx.doi.org/10.1109/TIT.2003.814483
[39] H. Weingarten, Y. Steinberg, and S. Shamai, "The capacity region of the Gaussian multiple-input multiple-output broadcast channel," IEEE Trans. on Inform. Theory, vol. 52, no. 9, pp. 3936-3964, Sep. 2006.
[40] T. Yoo and A. Goldsmith, "On the optimality of multiantenna broadcast scheduling using zero-forcing beamforming," IEEE J. Select. Areas Commun., vol. 24, no. 3, pp. 528-541, Mar. 2006.
[41] T. Yoo and A. Goldsmith, "Capacity of fading MIMO channels with channel estimation error," in Communications, 2004 IEEE International Conference on, vol. 2, Jun. 2004, pp. 808-813.
[42] C. Peel, B. Hochwald, and A. Swindlehurst, "A vector-perturbation technique for near-capacity multiantenna multiuser communication-part I: channel inversion and regularization," IEEE Trans. on Commun., vol. 53, no. 1, pp. 195-202, Jan. 2005.
[43] M. Karakayali, G. Foschini, and R. Valenzuela, "Network coordination for spectrally efficient communications in cellular systems," IEEE Wireless Communications, vol. 13, no. 4, pp. 56-61, Aug. 2006.
[44] G. Kramer, I. Marić, and R. D. Yates, "Cooperative communications," Foundations and Trends in Networking, vol. 1, no. 3-4, pp. 271-425, 2006. [Online]. Available: http://dx.doi.org/10.1561/1300000004
[45] D. Gesbert, S. Hanly, H. Huang, S. Shamai Shitz, O. Simeone, and W. Yu, "Multi-cell MIMO cooperative networks: A new look at interference," IEEE Journal on Selected Areas in Communications, vol. 28, no. 9, pp. 1380-1408, Dec. 2010.
[46] V. Jungnickel, T. Wirth, M. Schellmann, T. Haustein, and W. Zirwas, "Synchronization of cooperative base stations," in Wireless Communication Systems. 2008. ISWCS '08. IEEE International Symposium on, Oct. 2008, pp. 329-334.
[47] R. Zakhour and S. Hanly, "Base station cooperation on the downlink: Large system analysis," IEEE Transactions on Information Theory, vol. 58, no. 4, pp. 2079-2106, Apr. 2012.
[48] A. Lozano, R. Heath, and J. Andrews, "Fundamental limits of cooperation," IEEE Transactions on Information Theory, vol. 59, no. 9, pp. 5213-5226, Sep. 2013.
[49] Y.-S. Tu and G. Pottie, "Coherent cooperative transmission from multiple adjacent antennas to a distant stationary antenna through AWGN channels," in Vehicular Technology Conference, 2002. VTC Spring 2002. IEEE 55th, vol. 1, 2002, pp. 130-134.
[50] R. Mudumbai, D. Brown, U. Madhow, and H. Poor, "Distributed transmit beamforming: challenges and recent progress," IEEE Communications Magazine, vol. 47, no. 2, pp. 102-110, Feb. 2009.
[51] M. M. Rahman, H. E. Baidoo-Williams, R. Mudumbai, and S. Dasgupta, "Fully wireless implementation of distributed beamforming on a software-defined radio platform," in Proceedings of the 11th international conference on Information Processing in Sensor Networks, ser. IPSN '12. New York, NY, USA: ACM, 2012, pp. 305-316. [Online]. Available: http://doi.acm.org/10.1145/2185677.2185745
[52] F. Quitin, U. Madhow, M. Rahman, and R. Mudumbai, "Demonstrating distributed transmit beamforming with software-defined radios," in Proc. IEEE International Symposium on a World of Wireless, Mobile and Multimedia Networks (WoWMoM), Jun. 2012, pp. 1-3.
[53] F. Quitin, M. Mahboob, R. Mudumbai, and U. Madhow, "Distributed beamforming with software-defined radios: frequency synchronization and digital feedback," in Proc. IEEE GLOBECOM, Hsinchu, Taiwan, Dec. 2012.
[54] S. Gollakota and D. Katabi, "Zigzag decoding: combating hidden terminals in wireless networks," in ACM SIGCOMM, Seattle, WA, 2008. [Online]. Available: http://doi.acm.org/10.1145/1402958.1402977
[55] H. Rahul, H. Hassanieh, and D. Katabi, "SourceSync: a distributed wireless architecture for exploiting sender diversity," in ACM SIGCOMM, New Delhi, India, 2010. [Online]. Available: http://doi.acm.org/10.1145/1851182.1851204
[56] S. Alamouti, "A simple transmit diversity technique for wireless communications," IEEE J. Select. Areas Commun., vol. 16, no. 8, pp. 1451-1458, 1998.
[57] S. Gollakota, S. D. Perli, and D. Katabi, "Interference alignment and cancellation," in ACM SIGCOMM, Barcelona, Spain, 2009. [Online]. Available: http://doi.acm.org/10.1145/1592568.1592588
[58] H. S. Rahul, S. Kumar, and D. Katabi, "JMB: scaling wireless capacity with user demands," in Proceedings of the ACM SIGCOMM 2012 conference on Applications, technologies, architectures, and protocols for computer communication, ser. SIGCOMM '12. New York, NY, USA: ACM, 2012, pp. 235-246. [Online]. Available: http://doi.acm.org/10.1145/2342356.2342401
[59] X. Zhang, K. Sundaresan, M. A. A. Khojastepour, S. Rangarajan, and K. G. Shin, "NEMOx: Scalable network MIMO for wireless networks," in Proceedings of the 19th Annual International Conference on Mobile Computing & Networking, ser. MobiCom '13. New York, NY, USA: ACM, 2013, pp. 453-464. [Online]. Available: http://doi.acm.org/10.1145/2500423.2500445
[60] K. Nikitopoulos and K. Jamieson, "Faster: Fine and accurate synchronization for large distributed MIMO wireless networks," RN, vol. 13, p. 19, 2013.
[61] S. Kay, "A fast and accurate single frequency estimator," IEEE Trans. on Acoustics, Speech and Signal Process., vol. 37, no. 12, pp. 1987-1990, Dec. 1989.
[62] D. Rife and R. Boorstyn, "Single tone parameter estimation from discrete-time observations," IEEE Transactions on Information Theory, vol. 20, no. 5, pp. 591-598, Sep. 1974.
[63] B. Quinn and P. Kootsookos, "Threshold behavior of the maximum likelihood estimator of frequency," IEEE Trans. on Signal Process., vol. 42, no. 11, pp. 3291-3294, Nov. 1994.
[64] S. M. Kay, Fundamentals of Statistical Signal Processing: Estimation Theory. Englewood Cliffs, NJ: Prentice Hall, 1993.
[65] M. Rahman, S. Dasgupta, and R. Mudumbai, "A distributed consensus approach to synchronization of RF signals," in Proc. IEEE Statistical Signal Processing Workshop (SSP), Aug. 2012, pp. 281-284.
[66] S. Niranjayan and A. Molisch, "Ultra-wide bandwidth timing networks," in Ultra-Wideband (ICUWB), 2012 IEEE International Conference on, 2012, pp. 51-56.
[67] I. Schizas, A. Ribeiro, and G. Giannakis, "Consensus in ad hoc WSNs with noisy links - part I: Distributed estimation of deterministic signals," IEEE Transactions on Signal Processing, vol. 56, no. 1, pp. 350-364, Jan. 2008.
[68] G. Smith, "A direct derivation of a single-antenna reciprocity relation for the time domain," IEEE Transactions on Antennas and Propagation, vol. 52, no. 6, pp. 1568-1577, Jun. 2004.
[69] V. Jungnickel, U. Kruger, G. Istoc, T. Haustein, and C. von Helmolt, "A MIMO system with reciprocal transceivers for the time-division duplex mode," in Antennas and Propagation Society International Symposium, 2004. IEEE, vol. 2, Jun. 2004, pp. 1267-1270.
[70] M. Guillaud and F. Kaltenberger, "Towards practical channel reciprocity exploitation: Relative calibration in the presence of frequency offset," in Wireless Communications and Networking Conference (WCNC), 2013 IEEE, Apr. 2013, pp. 2525-2530.
[71] Rice University. Rice University WARP project. [Online]. Available: http://warp.rice.edu
[72] R. Rogalin, O. Bursalioglu, H. Papadopoulos, G. Caire, A. Molisch, A. Michaloliakos, V. Balan, and K. Psounis, "Scalable synchronization and reciprocity calibration for distributed multiuser MIMO," 2013.
[73] M. Duarte and A. Sabharwal, "Full-duplex wireless communications using off-the-shelf radios: Feasibility and first results," in Signals, Systems and Computers (ASILOMAR), 2010 Conference Record of the Forty Fourth Asilomar Conference on. IEEE, 2010, pp. 1558-1562.
[74] M. Jain, J. I. Choi, T. Kim, D. Bharadia, S. Seth, K. Srinivasan, P. Levis, S. Katti, and P. Sinha, "Practical, real-time, full duplex wireless," in Proceedings of the 17th annual international conference on Mobile computing and networking. ACM, 2011, pp. 301-312.
[75] M. Duarte, C. Dick, and A. Sabharwal, "Experiment-driven characterization of full-duplex wireless systems," IEEE Transactions on Wireless Communications, vol. 11, no. 12, pp. 4296-4307, 2012.
[76] M. García and C. Oberli, "Intercarrier Interference in OFDM: A General Model for Transmissions in Mobile Environments with Imperfect Synchronization," EURASIP J. Wireless Comm. and Networking, 2009.
[77] A. V. Oppenheim, A. S. Willsky, and S. H. Nawab, Signals & systems (2nd ed.). Upper Saddle River, NJ, USA: Prentice-Hall, Inc., 1996.
[78] L. Xiao and S. Boyd, "Fast linear iterations for distributed averaging," Systems & Control Letters, vol. 53, no. 1, pp. 65-78, 2004.
[79] M. Huang and J. H. Manton, "Coordination and consensus of networked agents with noisy measurements: Stochastic algorithms and asymptotic behavior," SIAM J. Control Optim., vol. 48, no. 1, pp. 134-161, Feb. 2009. [Online]. Available: http://dx.doi.org/10.1137/06067359X
[80] M. Huang and J. Manton, "Stochastic consensus seeking with noisy and directed inter-agent communication: Fixed and randomly varying topologies," IEEE Trans. on Automatic Control, vol. 55, no. 1, pp. 235-241, Jan. 2010.
[81] A. N. Morabito, J. D. Sahr, Z. M. Berkowitz, and L. E. Vertatschitsch, "Stochastic modeling of phase noise in distributed coherent passive radar systems."
[82] A. B. Florent, F. Munier, T. Svensson, M. Flament, T. Eriksson, A. Svensson, and H. Zirath, "System implications in designing a 60 GHz WLAN RF front-end."
[83] T. M. Cover and J. A. Thomas, Elements of information theory. New York, NY, USA: Wiley-Interscience, 1991.
[84] A. F. Molisch, Wireless communications. Wiley.com, 2010, vol. 15.
[85] A. Hjorungnes and D. Gesbert, "Complex-valued matrix differentiation: Techniques and key results," IEEE Transactions on Signal Processing, vol. 55, no. 6, pp. 2740-2746, Jun. 2007.
[86] P. Kyösti, J. Meinilä, L. Hentilä, X. Zhao, T. Jämsä, C. Schneider, M. Narandžić, M. Milojević, A. Hong, J. Ylitalo, V.-M. Holappa, M. Alatossava, R. Bultitude, Y. de Jong, and T. Rautiainen, "WINNER II channel models," EC FP6, Tech. Rep., Sep. 2007. [Online]. Available: http://www.ist-winner.org/deliverables.html
[87] H. Huh, H. Papadopoulos, and G. Caire, "Multiuser MISO transmitter optimization for intercell interference mitigation," IEEE Transactions on Signal Processing, vol. 58, no. 8, pp. 4272-4285, Aug. 2012.
[88] H. Balan, A. Michaloliakos, R. Rogalin, G. Caire, and K. Psounis, "Efficient MAC for distributed multiuser MIMO systems," University of Southern California, Tech. Rep. CENG-2012-7, Feb. 2012.
[89] A. Michaloliakos, R. Rogalin, Y. Zhang, K. Psounis, and G. Caire, "Performance modeling of next-generation wireless networks," CoRR, vol. abs/1405.0089, 2014. [Online]. Available: http://arxiv.org/abs/1405.0089
[90] R. W. Yeung, Information theory and network coding. Springer Science & Business Media, 2008.
[91] D. Chu, "Polyphase codes with good periodic correlation properties (corresp.)," IEEE Transactions on Information Theory, vol. 18, no. 4, pp. 531-532, Jul. 1972.
[92] M. Katz and S. Shamai, "On the capacity-achieving distribution of the discrete-time noncoherent and partially coherent AWGN channels," IEEE Transactions on Information Theory, vol. 50, no. 10, pp. 2257-2270, Oct. 2004.
[93] H. Minn, N. Al-Dhahir, and Y. Li, "Optimal training signals for MIMO OFDM channel estimation in the presence of frequency offset and phase noise," IEEE Transactions on Communications, vol. 54, no. 10, pp. 1754-1759, Oct. 2006.
[94] Y. Chi, A. Gomaa, N. Al-Dhahir, and A. Calderbank, "Training signal design and tradeoffs for spectrally-efficient multi-user MIMO-OFDM systems," IEEE Transactions on Wireless Communications, vol. 10, no. 7, pp. 2234-2245, Jul. 2011.
[95] A. Pitarokoilis, S. K. Mohammed, and E. G. Larsson, "Uplink performance of time-reversal MRC in massive MIMO systems subject to phase noise," CoRR, vol. abs/1306.4495, 2013.
[96] P. Marsch and G. Fettweis, Coordinated Multi-Point in Mobile Communications: From Theory to Practice, ser. ITPro collection. Cambridge University Press, 2011. [Online]. Available: http://books.google.com/books?id=hDeUBsdroDQC
[97] K. Manolakis, C. Oberli, L. Herrera, and V. Jungnickel, "Analytical models for channel aging and synchronization errors for base station cooperation," in Signal Processing Conference (EUSIPCO), 2013 Proceedings of the 21st European, Sep. 2013, pp. 1-5.
[98] K. Manolakis, C. Oberli, V. Jungnickel, and F. Rosas, "Analysis of synchronization impairments for cooperative base stations using OFDM," International Journal of Antennas and Propagation, 2015.
[99] A. Viterbi, "Phase-locked loop dynamics in the presence of noise by Fokker-Planck techniques," Proceedings of the IEEE, vol. 51, no. 12, pp. 1737-1753, Dec. 1963.
[100] H. Yang and T. Marzetta, "Performance of conjugate and zero-forcing beamforming in large-scale antenna systems," IEEE Journal on Selected Areas in Communications, vol. 31, no. 2, pp. 172-179, Feb. 2013.
[101] S. Sra, "A short note on parameter approximation for von Mises-Fisher distributions: and a fast implementation of I_s(x)," Computational Statistics, vol. 27, no. 1, pp. 177-190, 2012. [Online]. Available: http://dx.doi.org/10.1007/s00180-011-0232-x
[102] T. Gou, C. Wang, and S. Jafar, "Aiming perfectly in the dark - blind interference alignment through staggered antenna switching," in IEEE GLOBECOM, Dec. 2010.
[103] S. Jafar, "Blind interference alignment," IEEE Journal of Selected Topics in Signal Processing, vol. 6, no. 3, pp. 216-227, Jun. 2012.
[104] S. Jafar, "Exploiting channel correlations - simple interference alignment schemes with no CSIT," in IEEE GLOBECOM, Dec. 2010, pp. 1-5.
[105] S. Kay, "Modern spectral estimation: theory and application," Prentice-Hall Signal Processing Series, 1988.
[106] M. De Hoon, T. Van der Hagen, H. Schoonewelle, and H. Van Dam, "Why Yule-Walker should not be used for autoregressive modelling," Annals of Nuclear Energy, vol. 23, no. 15, pp. 1219-1228, 1996.
[107] H. Papadopoulos, "Towards Realizing Scalable Large-Scale MIMO with Non-Colocated Arrays," in IEEE Communication Theory Workshop, Phuket, Thailand, June 23-26, 2013.
Asset Metadata
Creator: Rogalin, Ryan (author)
Core Title: Enabling massive distributed MIMO for small cell networks
School: Viterbi School of Engineering
Degree: Doctor of Philosophy
Degree Program: Electrical Engineering
Publication Date: 04/28/2015
Defense Date: 03/11/2015
Publisher: University of Southern California (original), University of Southern California. Libraries (digital)
Tag: basestation cooperation, massive MIMO, multi-user MIMO, OAI-PMH Harvest, small cells
Format: application/pdf (imt)
Language: English
Contributor: Electronically uploaded by the author (provenance)
Advisor: Caire, Giuseppe (committee chair), Govindan, Ramesh (committee member), Psounis, Konstantinos (committee member)
Creator Email: rogalin@usc.edu, ryan.rogalin@gmail.com
Permanent Link (DOI): https://doi.org/10.25549/usctheses-c3-561566
Unique identifier: UC11298675
Identifier: etd-RogalinRya-3402.pdf (filename), usctheses-c3-561566 (legacy record id)
Legacy Identifier: etd-RogalinRya-3402.pdf
Dmrecord: 561566
Document Type: Dissertation
Rights: Rogalin, Ryan
Type: texts
Source: University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Access Conditions: The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the a...
Repository Name: University of Southern California Digital Library
Repository Location: USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA