Exploiting side information for link setup and maintenance in next generation wireless networks
(USC Thesis Other)
Exploiting Side Information for Link Setup and Maintenance in Next Generation Wireless Networks

Daoud A. Burghal
University of Southern California
Electrical Engineering
Los Angeles, California

A Dissertation Submitted to the Faculty of the USC Graduate School in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

August 2019

Committee Members:
Prof. Andreas F. Molisch (Chair)
Prof. Ashutosh Nayyar
Prof. Gary Rosen

Abstract

Establishing and maintaining a wireless communication link consumes considerable resources, as it requires transceivers on both sides to search for one another and to perform frequent channel estimation to maintain the desired quality of service levels. Next generation wireless networks are expected to provide connectivity for a wide range of applications with various requirements and heterogeneous constraints; thus, several new technologies have been introduced, including new communication modes and frequency bands. This dissertation addresses emerging aspects of link setup and maintenance in two new architectures: device-to-device (D2D) communication and dual connectivity over two frequency bands. In particular, it discusses solutions for three problems: (i) neighbor discovery and (ii) Channel State Information (CSI) acquisition in D2D networks, and (iii) band assignment in dual-band systems. These problems become increasingly complex when the D2D network is dense or when the wireless devices are mobile. Interestingly, the problems can be simplified and/or solved more efficiently when additional information, e.g., about the wireless devices or the environment, is given. In this dissertation, the goal is to design efficient schemes that exploit the observable side information.

In the area of CSI acquisition, we consider the problem of acquiring the CSI in base-station (BS) controlled D2D networks. Obtaining high-quality CSI requires a trade-off between interference, outdatedness of CSI, and noise.
Thus, the goal is to find an efficient pilot scheduling scheme that minimizes both the errors in the estimates and the signaling overhead between the BS and the devices. In this report, we present the Location Aware Training Scheme (LATS) as a simple yet efficient training technique. Using location as side information, and assuming that devices are aware of such information, LATS groups the devices into geographical segments and assigns a frequency reuse pattern to them.

Next, as part of our contribution to the area of neighbor discovery, we consider the problem of randomized directional neighbor discovery with prior information. In general, minimizing the discovery time is the goal of neighbor discovery schemes. We study the average discovery time for directional random neighbor discovery when devices have side information about their set of possible neighbors, which also helps in identifying the performance limits of random neighbor discovery schemes. Typically, discovery time analysis is done under assumptions that simplify the network structure, such as uniform neighbor relations for all devices. However, with prior information, the directional transmission probabilities depend on the node and the direction. This complicates the analysis of the expected discovery time, though it also improves performance when used correctly. We first provide a closed-form expression for the expected discovery time based on the non-uniform coupon collector problem. Next, we identify the directional transmission probabilities of each device that achieve a small discovery time. Due to the mathematical complexity, we provide lower and upper bounds on the expected discovery time, which allow writing the problem as a convex optimization problem. Through simulations, we demonstrate the performance gain due to prior knowledge with the proposed methods as compared to when no prior information is available, as well as the impact of uncertainty in the prior knowledge.
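To make the reference to the non-uniform coupon collector problem concrete, the following is a minimal sketch of the textbook inclusion-exclusion formula for the expected number of trials needed to collect every coupon when coupon i appears with its own probability per trial. It is an illustration under simplifying assumptions and not the dissertation's own expression: here a "coupon" stands loosely for one neighbor, each trial is assumed independent, and the function name `expected_collection_time` and the example probabilities are hypothetical.

```python
from itertools import combinations


def expected_collection_time(probs):
    """Expected number of independent trials until every coupon is seen.

    probs[i] is the per-trial probability of drawing coupon i; the
    probabilities may sum to less than 1 (a trial can yield nothing).
    Uses E[T] = sum over non-empty subsets S of (-1)^(|S|+1) / sum_{i in S} p_i.
    The cost is exponential in len(probs), so this is for small examples only.
    """
    total = 0.0
    for size in range(1, len(probs) + 1):
        sign = 1.0 if size % 2 == 1 else -1.0
        for subset in combinations(probs, size):
            total += sign / sum(subset)
    return total


# Hypothetical example values, chosen only for illustration.
uniform = expected_collection_time([0.25, 0.25, 0.25, 0.25])
skewed = expected_collection_time([0.4, 0.3, 0.2, 0.1])

# The uniform case reduces to n * H_n = 4 * (1 + 1/2 + 1/3 + 1/4) = 25/3.
print(f"uniform: {uniform:.4f}, skewed: {skewed:.4f}")
```

In this toy setting the skewed probabilities give a longer expected time than the uniform ones, which hints at why the choice of per-direction transmission probabilities matters for the discovery time.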
In the third problem, we consider band assignment (BA) in dual-band systems, where the BS chooses one of the two available frequency bands (centimeter-wave and millimeter-wave bands) to communicate with the user equipment (UE). While the millimeter-wave band might offer a higher data rate, there is a significant probability of outage. To maintain the link during an outage, the communication should be carried on the (more reliable) centimeter-wave band. We consider two variations of the BA problem: one-shot and sequential BA. For the former, the BS uses only the currently observed information to decide whether to switch to the other frequency band; for sequential BA, the BS uses a window of previously observed information to predict the best band for a future time step. We provide two approaches to solve the BA problem: (i) a deep learning approach that is based on Long Short Term Memory and/or multi-layer Neural Networks, and (ii) a Gaussian Process-based approach, which relies on the assumption that the channel states are jointly Gaussian. We compare the achieved performances to several benchmarks in two environments: (i) a stochastic environment, and (ii) microcellular outdoor channels obtained by ray-tracing. In general, the deep learning solution shows superior performance in both environments.

Acknowledgments

As I am approaching the conclusion of my doctoral studies, it is necessary to pause and reflect on the journey. I was frankly fortunate, among many things, to receive much guidance and support, without which this work would not have been possible. I start by recalling an Arabic proverb which means that a person is in eternal debt to his teacher. With that, I would like to convey my gratitude to my Ph.D. advisor, Professor Andreas Molisch, for his patience, guidance, and support throughout the last few years. Prof. Molisch represents a true example of dedication and passion for scientific research, with scholarly knowledge and humility.
I am thankful for the unique research flexibility that he offered me, with which I was able to branch into different directions, allowing me to taste research versatility and come to appreciate different fields.

I would also like to thank the members of my defense and qualifying exam committees, Prof. Ashutosh Nayyar, Prof. Gary Rosen, Prof. Bhaskar Krishnamachari, and Prof. Urbashi Mitra, for their valuable feedback and insightful comments. I have also benefited from and enjoyed the graduate-level courses of theirs that I attended. Indeed, I am grateful to all my teachers at USC for providing the finest quality of teaching through a wide variety of courses.

Furthermore, I am thankful to USC and its staff for facilitating all the administrative processes during my studies. I am especially thankful to Diane Demetris, Susan Wiedem, and Gerrielyn Ramos, as I used to seek their assistance in many administrative issues. I am also thankful to Fulbright, the NSF, and Intel Corporation for the funding that I received for parts of my Ph.D. studies.

Throughout my study years, I enjoyed being part of the electrical engineering department, where I met many brilliant individuals. I am especially glad that I was part of the WiDeS research group and thankful to all the group members: Umit Bas, Hao Feng, Dr. Shengqian Han, Vinod Kristem, Ming-Chun Lee, Zheda Li, Jorge Gomez Ponce, Vishnu Ratnam, Seun Sangodoyin, Aditya Sundar, and Rui Wang. I always benefited from the discussions that we had during our meetings.

Living far away from home and family is not easy. I was blessed with many friends in Los Angeles; spending time with them provided the needed stress relief, and they were also there when I needed them the most. To all of them I say thank you. I will always remember our times here: the dinners, the soccer games, the road trips, the hikes, and the deep discussions.
I would like to express my sincere gratitude and appreciation to my parents, my sisters, and my brothers for all the unconditional support and the constant encouragement that I received throughout the years. I would especially like to thank my parents: besides the upbringing and the selfless care that they provided, they instilled two precious attitudes that proved to be valuable in my Ph.D. journey, namely scientific curiosity and the appreciation of hard work.

And last but not least, I am thankful to God, the All-Knowing, the All-Wise, the Exalter, for all of the above, for all the opportunities that I had, the experiences that I went through, and all the amazing people that I had the chance to know.

Contents

List of Figures
List of Tables
1 Introduction
1.1 Future Wireless Networks
1.1.1 Key Requirements
1.1.2 Key Technologies and Solutions
1.2 Emerging Challenges in Link Setup and Maintenance
1.2.1 An Overview of Link Setup and Maintenance - LTE Cellular System
1.2.1.1 Basic LTE Network and Channel Structures
1.2.1.2 Communication Link Setup
1.2.1.3 Channel Estimation
1.2.1.4 Handover
1.2.2 Link Setup in D2D Networks
1.2.3 Link Maintenance for MmWave Band Links
1.3 Side Information
1.4 Research Summary and Thesis Structure
1.4.1 CSI Acquisition in D2D Networks
1.4.1.1 Major Contributions
1.4.1.2 Relevant Chapters and Peer Reviewed Scientific Papers
1.4.2 Neighbor Discovery with Prior Information
1.4.2.1 Major Contributions
1.4.2.2 Relevant Chapters and Peer Reviewed Scientific Papers
1.4.3 Band Assignment in Dual Band Systems
1.4.3.1 Major Contributions
1.4.3.2 Relevant Chapters and Peer Reviewed Scientific Papers
2 CSI Acquisition
2.1 Introduction
2.2 Location Aware Training Scheme
2.2.1 Network and Channel Models
2.2.2 Scheme Description
2.3 Analysis
2.3.1 Performance Metric
2.3.2 Impact of Location Information: Overhead and Performance
2.3.2.1 Frequency of Location Update
2.3.2.2 Location Uncertainty
2.3.3 LATS Optimization Problem and Algorithm
2.4 Simulation and Discussion
2.5 Appendix
2.5.0.1 Derivation
2.5.0.2 The Symmetry
2.5.0.3 Number of Segments Per Tier
2.5.0.4 Diagonal Segments
2.5.0.5 Average Number of Segments Per Tier
3 Expected Discovery Time With Prior Information
3.1 Introduction
3.1.1 Related Work
3.1.2 Chapter's Objectives
3.1.3 Chapter Organization
3.2 System Model and Randomized Neighbor Discovery
3.2.1 System Model
3.2.1.1 The Basic Model
3.2.1.2 Implementation Examples
3.2.1.3 Notation
3.2.2 Probability of the Successful Discovery
3.2.3 Objective Function
3.3 Expected Discovery Time and Bounds
3.3.1 Expected Discovery Time
3.3.2 The Upper Bound
3.3.3 The Lower Bound
3.4 Optimization
3.4.1 Upper Bound Based Metric
3.4.2 Approximation Based Metric
3.5 Model with Uncertainty: a Generalization
3.5.1 Probability of Success
3.5.2 Optimization Metric
3.6 Simulation and Discussion
3.6.1 Performance of the Schemes
3.6.2 N i; and N i; Relation
3.7 Appendix
3.7.1 Proof of Theorem 3.3.1
3.7.2 Relation Between Bounds and Distribution
3.7.3 Proof of Theorem 3.3.3
3.7.3.1 Proof of Theorem 3.7.1
3.7.3.2 Proof of Theorem 3.7.2
3.7.4 Extensions
4 Band Assignment in Dual Band Systems
4.1 Introduction
4.1.1 Prior Work
4.1.2 Objective
4.2 Problem and Solutions Overview
4.2.1 Basic System Model
4.2.2 Problem Description
4.2.2.1 One-Shot Band Assignment
4.2.2.2 Sequential Band Assignment
4.2.3 Solutions Overview
4.2.3.1 Gaussian Process Based Solutions
4.2.3.2 Learning Based Solutions
4.2.4 Performance Metrics
4.3 Gaussian Process Based Band Assignment
4.3.1 One-Shot Band Assignment
4.3.2 Sequential Band Assignment
4.3.2.1 Exact Solution
4.3.2.2 Approximation
4.4 Learning Based Band Assignment
4.4.1 Preliminaries
4.4.1.1 Features
4.4.1.2 Learning Techniques Overview
4.4.1.3 Training and Testing
4.4.2 One-Shot Band Assignment
4.4.3 Sequential Band Assignment
4.5 Experiment I: Stochastic Environment
4.5.1 The Environment
4.5.2 One-Shot Scenario
4.5.2.1 Data Points
4.5.2.2 Performance
4.5.3 Sequential Scenario
4.5.3.1 UE Trajectories
4.5.3.2 Data Generation
4.5.3.3 Performance ( = 1)
4.5.3.4 Performance ( = 1.9)
4.6 Experiment II: Simulated Campus Environment
4.6.1 The Environment
4.6.2 One-Shot Scenario
4.6.2.1 Data Points
4.6.2.2 Performance
4.6.3 Sequential Scenario
4.6.3.1 Data Points
4.6.3.2 Performance
5 Concluding Remarks and Future Outlook
5.1 CSI Acquisition and Neighbor Discovery In D2D networks
5.1.1 Future Outlook
5.2 Band Assignment in Dual Band Systems
5.2.1 Future Outlook
Bibliography

List of Figures

1.1 The LTE network structure: the figure shows two networks, Network-B and Network-R. Network-B has two eNBs and it is the home network for the UE.
1.2 A simplified downlink channel structure in LTE for an FDD system [1].
1.3 A simplified link setup procedure in LTE; the UE tries to "attach" (connect) to Network-B.
1.4 A simplified handover procedure from eNB A to eNB B in Network-B.
1.5 Different communication links and direct communication coverage scenarios [2].
1.6 Distribution of the shadowing values at 28 GHz from a point cloud simulation of Narinkkatori square, Kamppi district, central Helsinki, Finland. The point cloud data were configured by real measurements. (a) The satellite view of Narinkkatori square, (b) split of the square into six path-loss regions, (c) the empirical CDF against the Gaussian distribution of shadowing and the histogram with non line of sight, for values extracted using a linear fit path-loss model, (d) the empirical CDF against the Gaussian distribution of shadowing and the histogram with non line of sight, with the values extracted using a multi fit model based on the regions in (b). More discussion can be found in [3].
2.1 Structure of LATS: a simplified segmented cell and LATS cell components. In this figure $S_n = 1$ and $G_n = 2$. The virtual transmitters of training set number 13 are scheduled.
2.2 Diagonal and vertical/horizontal segments in the first tier, $l = 1$, and the two segments that represent them. The average interference distance depends on the segment side length $L_s$, i.e., $S_n$, and the guard distance $L_g$, i.e., $G_n$. In this figure $S_n = 1$ and $G_n = 2$.
2.3 Theoretical evaluation and simulation of NMSE as a function of the segmentation parameter $S_n$, for $L_N = 600$ m, $L_c = 25$ m, and a device density of 0.008 dev./m$^2$.
2.4 All-tiers NMSE as a function of the segmentation parameter $S_n$, for $L_N = 1.4$ km, $L_c = 25$ m, and a device density of 0.008 dev./m$^2$.
2.5 Fraction of the coherence time used by LATS, for a coherence time of 0.025 sec, $L_N = 1.4$ km, $L_c = 25$ m, and a device density of 0.008 dev./m$^2$.
2.6 NMSE vs. $L_N$ for several device velocities for a fixed device density of 0.008 dev./m$^2$.
2.7 NMSE for LATS, TDMA and RA schemes.
2.8 LATS vs. optimal CSI acquisition and other schemes. Simple example: $L_N = 150$ m, $L_c = 25$ m, and 8 devices.
2.9 $F_{LU}$: number of location updates per device per second for SMS [4] and a simple motion model.
2.10 $\frac{\mathrm{NMSE}_{e_0} - \mathrm{NMSE}_{e_0 = 0}}{\mathrm{NMSE}_{e_0 = 0}}$ vs. distance error $e_0$.
2.11 Subset of segments and the segments that have the same value of $Z$.
2.12 Tiers for a segment at $(2, 2)$, i.e., $i = 2$ and $j = 0$. ($N_b = 10$, $l_0 = 9$)
2.13 Tiers for a segment at $(1, 4)$, i.e., $i = 1$ and $j = 3$. ($N_b = 10$, $l_0 = 9$)
2.14 Tiers for a segment at $(3, 5)$, i.e., $i = 3$ and $j = 2$. ($N_b = 10$, $l_0 = 9$)
3.1 Neighbors of node $i$ in the cmWave and mmWave frequency bands. In this example, in a given direction node $i$ has three neighbors in the lower frequency band and one neighbor in the upper frequency band, i.e., the corresponding neighbor sets have cardinalities 3 and 1.
3.2 An uncertainty example in a dual band system due to the different number of antennas in the different bands, with 2 beam directions in the lower band and 4 in the upper band. Node $j$, which is a neighbor of $i$ in direction $m1$ in the cmWave band, could be a neighbor in directions $\{c1, c2\}$ in the mmWave band, i.e., $A_{ij} = \{c1, c2\}$. With this knowledge, it is reasonable to assume a probability of $\frac{1}{2}$ for each direction in $A_{ij}$. Note that $j \in N_{i,c1}$ and $j \in N_{i,c2}$, while in reality $j \in N_{i,c2}$ only.
3.3 $T$ versus average number of neighbors. The lower bound (LB) used is $\frac{1}{N} \sum_{i}^{N} T_{\mathrm{LB}_i}$.
3.4 $T_{\max}$ versus average number of neighbors. The lower bound (LB) used is $\max_i T_{\mathrm{LB}_i}$.
3.5 Monte Carlo evaluation of $E\{\max_i T_i\}$.
3.6 $T$ versus number of beam directions.
3.7 $T$ versus.
3.8 $T$ with two uncertainty models.
3.9 $T$ in a dual band system, with two values, $\{2, 5\}$, of the number of beam directions in the lower band, and a varying number of beam directions in the upper band.
4.1 NN and LSTM diagrams.
4.2 $E_S$ vs. $U$ for two feature combinations, "loc+cm+mm" and "cm+mm" ((c-6) and (c-5), respectively, in Table 4.5).
4.3 $E_S$ vs. the decision threshold $T$ for = 1 for two feature combinations, "loc+cm+mm" and "cm+mm", respectively, (c-5) and (c-6) in Table 4.5.
4.4 The impact of the size of the observation window $Q$ in the stochastic environment with a circular trajectory. For this data set the percentage of "1"s is 75%.
4.5 (a) Ray-tracing simulation environment. The green dot is the BS located above the rooftop, while simulated UEs are red routes. Gray objects represent the buildings. The green 3D polygons denote foliage with different densities. (b) Using 70% of the data for testing, from left to right $A_S$ and $A_T$.
4.6 $E_S$ vs. $U$ for three feature combinations, "loc+cm+mm", "cm+mm" and "Delay+AoD+cm+mm", respectively, (c-3), (c-4) and (c-9) in Table 4.8.

List of Tables

2.1 Table of key mathematical symbols
3.1 Table of key mathematical symbols
4.1 DL structures. Note that all structures are followed by (FC:2)+(SOFT)+(CLASS), where FC is a fully connected linear transformation layer (parallel neurons), ReLU is a nonlinear ReLU layer [5], SOFT is a softmax layer, CLASS is a classification layer, and $F$ is the feature size. The hidden size of the LSTM layer is shown after "LSTM".
4.2 Stochastic channel simulation configurations
4.3 Ray-tracing simulation configurations
4.4 Performance of the learning over the stochastic data under different feature availability. Note that on average 49.3% of the labels are "1".
4.5 Sequential BA = 1: Performance of the learning over the stochastic data under different feature availability. Note that on average 47.9% of the labels are "1". The first number in each entry in rows 6-17 denotes the prediction error with $U = 4$, and the second with $U = 8$.
4.6 Sequential BA = 1.9: Performance of the learning over the stochastic data under different feature availability. Note that on average 48.4% of the labels are "1".
4.7 Performance of the learning techniques on ray-tracing data under different feature availability. Note that the percentage of points with labels equal to "1" is approximately 30%.
4.8 Performance of the solutions on ray-tracing data under different feature availability. The percentage of points with labels equal to "1" is approximately 30%. Results in rows 8-19 correspond to $U = 4$ / $U = 8$.

Chapter 1

Introduction

1.1 Future Wireless Networks

Today, over two-thirds of the world population use mobile phones. Since their advent in the early 1980s, mobile communication standards have been set mainly to facilitate human wireless communication, which evolved from basic voice calls in the first generation to video calls and various other forms of data communication in the latest generation.
However, this is set to change, as the next generation wireless networks are envisioned to address a wider range of communication avenues that have emerged with the new machine-oriented communication, changing mobile use cases, and user trends, which challenge the conventional cellular communication structure and tighten the required quality of service (QoS) constraints.

1.1.1 Key Requirements

The vision of smart cities with smart homes, safer roads, etc. relies on enabling machine (to possibly machine) communication. This new communication paradigm is essential for many new applications such as the internet of things (IoT), autonomous vehicle communication, and several others that use machine-to-machine (M2M) communication. Supporting such communication is one of the unique requirements for the next generation of wireless networks. This introduces several challenges; for instance, by 2022 it is expected that M2M connections will account for over half of the global connections [6]. This is challenging not only because of the massive number of new connections, but also because of the new constraints that need to be met to satisfy some of the emerging applications. For instance, while Vehicle-to-Anything (V2X) applications might (initially) use low data rates, they have stringent delay requirements. These requirements push the target delay to as low as 1 ms [7, 8], which is an order of magnitude less than in the fourth generation wireless standards (4G). This delay requirement is also needed by many new applications such as augmented reality and mission-critical applications. Researchers and standardization committees have recognized the machine-type and the delay-sensitive applications under two categories: massive Machine Type Communication (mMTC) and Ultra-Reliable Low Latency Communication (URLLC) [9–11], respectively.

The next generation wireless networks are also expected to satisfy the unprecedented demand for wireless data.
This demand is attributed mainly to the proliferation of smart mobile devices, such as mobile phones and tablets, accompanied by the increasing dependency on such devices. In fact, it is predicted that over 70 percent of the total IP traffic will be from wireless devices by 2022 [6], according to the Cisco Visual Networking Index (CVNI), an annual report that provides forecasts of global IP and mobile traffic and of internet trends. It is not only the increasing dependency on mobile devices, but also the type of transmitted traffic that fuels such enormous demand; e.g., data-hungry applications such as gaming and video streaming are prevalent, and the latter is predicted to account for over 82 percent of the total traffic by 2022 [6]. Thus, for the next generation wireless networks, the targeted throughput is 100 Mbps for the edge user to achieve high definition video streaming [8, 11], which is a 100-fold increase in data rate compared to 4G. This application category has been recognized under the Enhanced Mobile Broadband (eMBB) category [11].

Several of the foreseen new use cases of mobile devices challenge the traditional structure of cellular communication. For instance, future wireless networks are also anticipated to play a major role in public safety [12]. Such a use case requires highly flexible communication models that maintain communication even in disastrous situations [13, 14]. In such scenarios, infrastructure-based communication is deemed unreliable, motivating ad-hoc-like communication.

The success of wireless communication is typically conditioned on its reliability. With the emerging applications and use cases, the future wireless networks must be highly reliable, with minimal outage probability, and must support high mobility for a seamless user experience.
Although the current wireless standards have successfully achieved high data rates for drop-based models, where in any given location the users can experience an acceptable performance level, the users experience a deterioration in performance as they move [15, 16]. Supporting highly mobile applications is an essential requirement of the wide range of applications in the fifth generation networks (5G), in the eMBB and mMTC categories. As a result, the network must show resilience to different channels and under different conditions.

1.1.2 Key Technologies and Solutions

Realizing the future wireless networks is contingent upon using ingenious solutions and technologies; here we highlight some of the proposed techniques.

• With the increasing number of wireless devices that must share the limited resources, several network densification methods were suggested to enable larger resource reuse. In one solution, smaller cells are used, such as femtocells and picocells [8]. In another solution, traffic offloading is used, e.g., by utilizing existing technologies such as WiFi networks, which could be realized by integrating the WiFi standards into future network standards. Enabling wireless devices to communicate directly with nearby devices, without the detour through the infrastructure, is also a considered solution. This reduces the delays and allows for larger data rates and low energy consumption due to the proximity of the devices to one another. This modification to the network structure increases the flexibility of the network and enables several proximity services [13, 14]. Depending on the application, this solution has been referred to as device-to-device (D2D) communication or Vehicle-to-Vehicle (V2V) communication.

• Adding new spectrum to the currently congested centimeter-wave frequency band (cmWave) is at the forefront of the considered solutions.
The large free bandwidth in the millimeter-wave frequency band (mmWave) makes it suitable for high data rate applications. In addition, the propagation characteristics at the mmWave band [8, 17],¹ e.g., the high attenuation and the poor through-wall penetration, are especially useful for the network densification efforts, as they naturally reduce the interference [8].

• It is well established that using several antennas at both ends of the communication link, i.e., multi-input multi-output (MIMO), has a number of benefits, such as attaining diversity, beamforming, and/or multiplexing gains, which offer substantial capacity improvements by combating a number of channel impairments and/or by simultaneously transmitting parallel data streams [18]. Recent studies have considered the asymptotic performance as the number of antennas grows to infinity; it was shown that the impact of small-scale fading vanishes with a large number of antennas [19, 20]. In addition, the narrower beam widths (enabled by a large number of antennas) result in quasi-orthogonal links, which significantly limits the interference and can be used to increase the number of parallel links, allowing for massive multi-user MIMO [19], which is especially useful for dense networks. However, due to practical constraints, the number of antennas is restricted. Interestingly, a large number of antennas can be realized in the mmWave frequency band with a small form factor due to the small wavelength associated with high frequencies [18].

• While the above solutions enable high data rates, allow for a large number of connected devices, and enable new use cases, a number of other solutions were proposed to further improve latency and reliability. The suggested solutions include efficient caching procedures, edge computing, and distributed management, along with improvements to the physical layer procedures, waveforms, and frame structures [8, 11].
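The form-factor remark in the MIMO item above can be illustrated with a short calculation. The sketch below is only an illustration under stated assumptions: half-wavelength element spacing, a hypothetical 5 cm square aperture, and example carrier frequencies of 2 GHz and 28 GHz. None of these values are taken from the text, and the function names are hypothetical.

```python
# Illustrative sketch: how many half-wavelength-spaced antenna elements fit
# along one side of a fixed aperture at different carrier frequencies.
# Aperture size and carrier frequencies are assumptions, not values from the text.

SPEED_OF_LIGHT_M_S = 299_792_458.0


def wavelength_m(freq_hz):
    """Free-space wavelength in meters."""
    return SPEED_OF_LIGHT_M_S / freq_hz


def elements_per_side(aperture_m, freq_hz):
    """Elements along one side of the aperture with half-wavelength spacing."""
    spacing_m = wavelength_m(freq_hz) / 2.0
    return int(aperture_m // spacing_m) + 1


def square_array_size(aperture_m, freq_hz):
    """Total elements of a square array filling the aperture."""
    return elements_per_side(aperture_m, freq_hz) ** 2


if __name__ == "__main__":
    for freq_hz in (2e9, 28e9):
        print(freq_hz, wavelength_m(freq_hz), square_array_size(0.05, freq_hz))
```

Under these assumptions the same aperture holds a single element at 2 GHz but a 10 by 10 array at 28 GHz, which is the effect the text attributes to the small wavelength.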
Implementing these solutions is not straightforward; it results in a plethora of challenging research problems. In the sequel, we focus on D2D communications and on systems that use the mmWave band. In particular, we tackle issues related to link setup and maintenance in these systems.

¹ In a slight abuse of notation, we here call the sub-6 GHz band the cmWave band, and 24-100 GHz the mmWave band. This is inspired by the current 3GPP and WiFi frequency ranges.

1.2 Emerging Challenges in Link Setup and Maintenance

Before introducing some of the challenges, we provide an overview of link setup and maintenance procedures in a cellular system, as it gives a better understanding of the basic procedures in "conventional" mobile communication and helps to establish the necessity of such procedures in communication systems in general. For simplicity, we here use the Long Term Evolution (LTE) network as an example of cellular systems.

1.2.1 An Overview of Link Setup and Maintenance - LTE Cellular System

1.2.1.1 Basic LTE Network and Channel Structures

Fig. 1.1 shows a basic diagram of the Evolved Packet System (EPS) architecture, also sometimes referred to as the LTE network. It consists of user equipments (UEs), BSs, and a core network that can be connected to another Packet Data Network (PDN), such as the internet. In EPS, the BS and the core network are referred to as evolved Node B (eNB) and Evolved Packet Core (EPC), respectively.² The eNBs provide wireless connectivity to the UEs and handle radio-related tasks; several eNBs can be interconnected by the X2 interface and are connected to the EPC network by S1 links. The EPC consists of several entities that manage the UEs, e.g., authentication, billing, and IP packet forwarding. In LTE downlink communication (from the eNB to the UE), Orthogonal Frequency Division Multiplexing (OFDM) is used for modulation, based on orthogonal sub-carriers with 15 kHz separation.
One Resource Block (RB) consists of 12 sub-carriers and seven OFDM symbols spanning one slot of 0.5 ms duration; one sub-carrier over the duration of one OFDM symbol is defined as a Resource Element (RE). The total number of sub-carriers used in the network depends on the available bandwidth, with a minimum of 72 sub-carriers in a 1.4 MHz bandwidth. The standard uses a frame structure with a 10 ms period. The structure contains channels and signals, where a channel here refers to the resource block structure used to convey certain types of control signals or user data. The structure of the LTE downlink channel is shown in Fig. 1.2, with some channels and signals.

² Note that in this dissertation we use the term BS as a generic expression for the eNB in LTE, the "gNB" in 3GPP New Radio, or an Access Point with both 802.11ac/ax and 802.11ad/ay in WiFi.

Figure 1.1: The LTE network structure: the figure shows two networks, Network-B and Network-R. Network-B has two eNBs and is the home network for the UE.

1.2.1.2 Communication Link Setup

When the UE is first powered on, it does not have any connection with the network; thus it starts scanning the radio frequencies for the wireless service provider. The UE listens to the signals broadcast by each eNB to identify the network; this is done by listening to the two synchronization signals (PSS and SSS in Fig. 1.2) to align its timing and frequency to the eNB. These signals also indicate the "physical name" of the cell. The UE uses this information to detect the Cell Reference Signal (CRS), from which it estimates the CSI needed to demodulate the signals carried in the PBCH, such as the Master Information Block (MIB).
The MIB contains system information such as the system bandwidth and other channel configurations, which are needed to decode further system information in other channels, e.g., the name of the service provider (PLMN). If the channel quality and the service provider are acceptable, the UE starts a process to access the network. Fig. 1.3 shows a simplified version of the link setup procedure. Note that the UE may have coverage from several service providers (e.g., the Black and the Red networks in Fig. 1.1); if the UE starts by scanning the "wrong" network, it has to restart the process at a different carrier frequency.

Figure 1.2: A simplified downlink channel structure in LTE for an FDD system [1]. Figure labels: slot (0.5 ms, 7 OFDM symbols); Resource Element (RE), one sub-carrier (15 kHz bandwidth); Resource Block (RB); RB pair, 12 sub-carriers over 1 ms (14 OFDM symbols); 72 sub-carriers; frame (10 ms); PSS (Primary Synchronization Signal); SSS (Secondary Synchronization Signal); PBCH (Physical Broadcast Channel); CRS (Cell Reference Signal); other signals.

1.2.1.3 Channel Estimation

The CSI is a critical component in many wireless communication techniques; besides coherent demodulation, it is assumed in some cooperative and scheduling techniques. In LTE, as discussed above, the UE uses the CRS to estimate the channel state in order to demodulate the data in other channels, and to calculate the received signal power and quality, which are needed for several communication procedures such as the handover process discussed below.
For the uplink (communication from the UE to the eNB), the UE is requested to transmit two types of reference signals: the Demodulation Reference Signal (DMS) and the Sounding Reference Signal (SRS). The former is used to demodulate the received data at the eNB side, while the latter is used to assess the channel quality at certain RBs. The structures of these signals are defined in the standard, while the exact configurations are conveyed to the UE during the link setup. Several other types of RSs are defined in the LTE standard, but they are not discussed here for brevity. On a side note, there have been major modifications to the sounding signals since Release 10 [2]. One goal is to reduce the "always on" signalling for the 5G New Radio (NR) [11]. The modifications include introducing new sounding signals and modifying the structure and properties of the existing ones; for instance, the CRS was replaced with pilot signals that follow flexible transmission patterns. This obviously impacts the link setup and maintenance.

Figure 1.3: A simplified link setup procedure in LTE; the UE tries to "attach" (connect) to Network-B. Figure labels: cell search, synchronization and ID; channel estimation; reading network parameters; correct cell/network?; random access; 3rd layer (RRC) connection; authentication; bearer setup; UE network attachment.

1.2.1.4 Handover

The channel conditions might vary over time, e.g., due to UE mobility; it is then possible, for instance, that a neighbor eNB or a different radio access technology (RAT) offers better coverage. In that case, if the UE is connected to the network, the host eNB might initiate a handover process.
To make such a decision, the eNB must assess the channel quality the UE observes in the currently available resources compared to the other ones (e.g., a different eNB, frequency band, or RAT). To do that, the UE must send measurement reports containing the observed channel qualities to the host eNB. The measurement and reporting frequencies, periodicities, and the type of measured information are configured by the eNB when the communication link is initially established. In case a handover to a neighbor eNB is required (e.g., from eNB A to eNB B in Fig. 1.1), the serving eNB sends a handover request to the neighbor eNB (eNB B in Fig. 1.1), which needs to check the resources needed by the UE against the currently available ones; the process continues as depicted in Fig. 1.4. The measurement process is slightly complicated; for instance, the neighbor eNB could use a different carrier frequency than the serving eNB. In that case, the UE must switch to the other frequency to measure the channel state and then switch back to the carrier frequency used by its serving eNB, which may result in data loss. To mitigate this problem, the standards define a measurement gap of 6 ms during which no dedicated communication to the UE is permitted, so that the UE can safely switch to the other carrier frequencies [21]. The configuration of the measurement gap, such as its periodicity, is also conveyed to the UE during the link setup.

Figure 1.4: A simplified handover procedure from eNB A to eNB B in Network-B.
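The LTE figures quoted in this overview (15 kHz sub-carriers, 12 sub-carriers and seven symbols per RB, 0.5 ms slots, 10 ms frames, 72 sub-carriers at 1.4 MHz, and the 6 ms measurement gap) can be cross-checked with a few lines of arithmetic. The sketch below is only a sanity check; the 40 ms gap repetition period is an assumed example value, since the text states only that the periodicity is configured during link setup, and all names are hypothetical.

```python
# Sanity check of the LTE downlink numbers quoted in the overview above.
# ASSUMED_GAP_PERIOD_MS is an assumption for illustration; the text does not
# give the measurement-gap periodicity.

SUBCARRIER_SPACING_KHZ = 15
SUBCARRIERS_PER_RB = 12
SYMBOLS_PER_SLOT = 7
SLOT_MS = 0.5
FRAME_MS = 10
MEASUREMENT_GAP_MS = 6
ASSUMED_GAP_PERIOD_MS = 40  # assumption, not from the text


def resource_elements_per_rb():
    """One RE is one sub-carrier over one OFDM symbol."""
    return SUBCARRIERS_PER_RB * SYMBOLS_PER_SLOT


def rb_bandwidth_khz():
    """Bandwidth occupied by one resource block."""
    return SUBCARRIERS_PER_RB * SUBCARRIER_SPACING_KHZ


def resource_blocks(num_subcarriers):
    """Number of RBs in the frequency domain for a given sub-carrier count."""
    return num_subcarriers // SUBCARRIERS_PER_RB


def slots_per_frame():
    return int(FRAME_MS / SLOT_MS)


def gap_time_fraction(gap_period_ms=ASSUMED_GAP_PERIOD_MS):
    """Fraction of time in which no dedicated transmission to the UE is allowed."""
    return MEASUREMENT_GAP_MS / gap_period_ms


if __name__ == "__main__":
    print("REs per RB:", resource_elements_per_rb())
    print("RB bandwidth (kHz):", rb_bandwidth_khz())
    print("RBs for 72 sub-carriers:", resource_blocks(72))
    print("Slots per frame:", slots_per_frame())
    print("Gap time fraction:", gap_time_fraction())
```

With the assumed 40 ms period, the measurement gap removes 15 percent of the time available for dedicated transmission, which illustrates why the gap configuration matters for link maintenance.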
1.2.2 Link Setup in D2D Networks

As discussed above, enabling D2D communication is a promising answer to many of the requirements of future networks: by exploiting direct communication between devices, higher data rates, lower energy consumption, and better spatial reuse can be achieved [22–24]. The 3rd Generation Partnership Project (3GPP) has introduced D2D communication in recent 4G releases for limited use cases; it is usually referred to as Proximity Services (ProSe) [2]. In the standards, the D2D link is typically referred to as the "sidelink", to distinguish it from the uplink and downlink, see Fig. 1.5. In general, setting up a direct link between devices requires that such a link has sufficient channel quality (low attenuation); devices that fulfill this condition and thus can communicate with each other are called "neighbors". Interestingly, the standards distinguish between two types of direct connectivity for ProSe [2, 25]:

• ProSe discovery, where the UE can discover nearby services, e.g., for commercial purposes.

• Direct communication between two UEs, to exchange data between the UEs.

The distinction between the two is due to the fact that some forms of direct communication, such as broadcasting, do not require the neighborhood information, while, as highlighted above, the discovery process can be useful for many applications other than direct communication. In this dissertation, with a slight deviation from the terminology used in the standards, we frequently use D2D communication to refer to direct communication that requires the neighbor discovery process.

The link setup and maintenance depend on the availability of the BS. For instance, the BS can participate in the resource allocation, e.g., by assigning the RBs needed for ProSe; it could also provide the coordination needed to mitigate interference. 3GPP has recognized three scenarios [25], see Fig. 1.5:

• In coverage, where the link can be established within the coverage of the BS.
• Partial coverage, where one of the devices falls outside the coverage area of the BS.

• Out of coverage, where both devices are outside the network coverage.

Figure 1.5: Different communication links and direct communication coverage scenarios [2].

When in coverage, the BS can assign the resources on a per-UE basis [25]; alternatively, the network can dedicate an RB pool for ProSe, from which the UE can select the needed RBs autonomously. Note that the RB pool can also be used when the UEs are out of coverage or partially in coverage, as it can be, for example, hard-coded into the UEs. As a result of the structure of the D2D network, the procedures for setting up a D2D link are fundamentally different from the ones used in cellular systems. The following two fall within the scope of this thesis:

• The UEs have to perform a neighbor search before establishing a communication link with their neighbors, i.e., neighbor discovery is needed [26].

• In contrast to cellular networks, where only the CSI between the UEs and the BS is required, in D2D networks the CSI between the devices is needed. Thus new methods are required for CSI acquisition.

We here point out that the procedure for CSI acquisition has been addressed in the recent standards release, using the SCI signal [25]; however, this is meant for one link, and, as we discuss below and in Chapter 2, the CSI between several nearby devices could be needed. We note that in both cases, CSI acquisition and neighbor discovery, the devices have to transmit pilot signals to verify neighbor relations and to estimate the CSI, i.e., they must participate in the transmission of the pilot signals as well. This distinguishes channel estimation and neighbor discovery in D2D networks from cell search and channel estimation in a cellular system.
Furthermore, although D2D communication is a type of peer-to-peer communication and similar problems were addressed in the peer-to-peer literature, the possible presence of a central controller raises different performance expectations and constraints. In this dissertation, we utilize the available side information to address these problems; we elaborate on side information and the considered problems in Sections 1.3 and 1.4, respectively.

1.2.3 Link Maintenance for MmWave Band Links

As mentioned earlier, the usage of the mmWave frequency band will help to satisfy the continuously increasing demand for wireless data. However, the harsh propagation environment in the mmWave band degrades the reliability and the QoS in a number of scenarios, due to the increased intermittent connectivity of the UE [15], which significantly increases link maintenance efforts, e.g., through frequent handover procedures. Note that the propagation conditions and the large path loss could complicate the link setup as well.

The susceptibility of mmWave links to blockage and other severe propagation conditions motivates the utilization of multiple frequency bands in communication systems, for example, the joint utilization of the cmWave and mmWave bands, i.e., dual-band communication. Depending on the exact setup, setting up and maintaining a link in dual-band systems could be different from conventional single-band links. Nevertheless, the problems are still challenging, as the UE can use only one of the two bands at a time due to several practical limitations. Note that the usage of multiple frequency bands is different from the concept of carrier aggregation [27], where the UE can be served by several BSs and/or over different carrier frequencies (relatively closely separated). In
this dissertation, we utilize side information to address the problem of link setup and maintenance in systems that use mmWave frequencies for communication, with a focus on dual-band systems. We discuss the motivation for side information and the considered problems in the following sections.

1.3 Side Information

The term "side information" has been used in different contexts in the communication literature. One of the most dominant uses is in coding theory, where it usually refers to a set of variables or observations that are correlated with the received signals and could be used at the encoder and/or the decoder side [28]. It could refer to the state of the fading channel as in [29], data bits correlated with the transmitted message for secure communication [30], channel state and interference messages in dirty paper coding, or the residual source coding error in speech coding [18]. In the design of schemes and systems, it is usually used to indicate added information that is not conventionally utilized in that context or that needs extra signaling to be acquired. For instance, in designing a power control scheme for cognitive radio [31], the authors refer to the state of the channel, busy or free, as side information. With this definition in mind, side information has been referred to using different terms, such as "out-of-band" information for mmWave band communication [32] and "contextual information" for user scheduling and in bandit learning theory in general [33, 34].

Based on the above discussion, we refer to information that is correlated with the received signal(s) and/or the system performance, and that may require additional effort to acquire, as side information. We utilize side information to tackle the challenges in Section 1.2.2 and Section 1.2.3. Examples of side information are the location information of the UEs, or the CSI in a different frequency band.
Figure 1.6: Distribution of the shadowing values at 28 GHz from a point cloud simulation of Narinkkatori square, Kamppi district, central Helsinki, Finland. The point cloud data were configured by real measurements. (a) The satellite view of Narinkkatori square, (b) split of the square into six path-loss regions, (c) the empirical CDF against the Gaussian distribution of shadowing and the histogram with non-line-of-sight, for values extracted using a linear-fit path-loss model, (d) the empirical CDF against the Gaussian distribution of shadowing and the histogram with non-line-of-sight, for values extracted using a multi-fit model using the regions in (b). More discussion can be found in [3].

In general, we predict that the utilization of side information will attract more attention in the near future, as

• In many scenarios, side information can be acquired relatively easily. For instance, approximate location information, which varies slowly over time (compared to the coherence time), does not incur significant overhead for certain users with data-heavy applications, e.g., video streaming. Furthermore, with advances in signal processing techniques and hardware, several types of side information can be extracted from the pilot signals, such as the angle of arrival of the signal. In addition, nowadays many wireless devices are equipped with various sensors that can be utilized to acquire side information to aid the communication process.

• The stringent performance constraints, accompanied by the fact that many communication techniques already operate close to their theoretical limits, call for exploitation of the structures in the environment [35], where such structures can be revealed using the side information. In addition, different schemes are based on theoretical models that may not hold in practice. For example, Fig. 1.6 shows the shadowing distribution in a given environment.
There is a clear deviation of the shadowing distribution (on a logarithmic scale) from the Gaussian distribution (a distribution that is usually assumed in theory).

• The heterogeneous nature of future communication systems and user requirements dictates the use of additional information to cluster the users to match their QoS with various resources.

• Recently, there has been increasing interest in applying machine learning to wireless communication [36]. Different from analytical solutions, machine learning can use complex input data with minimal assumptions about the environment.

However, the usage of side information is not a cure-all, as it comes with additional expenses. One needs to decide which information is relevant, use it in the proper format, and pass it through appropriate pre-processing techniques. Side information might also be subject to upper layers' conditions such as personal privacy. Thus, for side information to be worth using, its advantages should outweigh its disadvantages. In the next section, we summarize the three problems that we consider in this thesis and the side information used.

1.4 Research Summary and Thesis Structure

This thesis tackles three main problems; we dedicate a chapter to each of them, and the last chapter (Chapter 5) provides concluding remarks and possible future directions. Each of the following subsections is dedicated to one of the problems; in each, we summarize the problem, describe the approaches, and highlight the side information used. At the end of each subsection, we summarize our contributions and point out the related publications.

1.4.1 CSI Acquisition in D2D Networks

We consider the problem of acquiring the CSI in BS-controlled D2D networks. As the BS is oblivious to the channel states between devices, the devices have to exchange short pilot signals to estimate the CSI.
Obtaining high-quality CSI requires a trade-off between interference, outdatedness of CSI, and noise. Thus, the goal is to find an efficient pilot scheduling scheme that minimizes errors in the channel estimates. The CSI acquisition can be viewed as an NP-hard scheduling problem. To solve the problem, we propose the Location Aware Training Scheme (LATS), an efficient heuristic scheme that acquires the CSI between devices that are within the vicinity of one another. Note that, as highlighted earlier, this is different from the purpose of CSI acquisition for direct communication in the current LTE-Advanced standards, where the goal is mainly to acquire the CSI between communicating devices [25]. We use the location of the devices as local side information, where we assume that each device is aware of its location. LATS is at its core a spatial reuse scheme: it groups the devices into geographical segments and assigns frequency reuse patterns to them. LATS tunes the size of the segments and the separation distance based on a statistical metric that captures the average quality of the acquired CSI. In particular, to identify the parameters of the scheme (the segmentation and guard parameters, $S_n$ and $G_n$, respectively), we use the average Normalized Mean Square Error (NMSE) as a metric, which combines the effects of outdatedness, noise, and interference. We derive an approximation of the average NMSE based on statistics of the devices and the LATS structure. We show, through simulations, that LATS outperforms TDMA- and CSMA-based schemes. The proposed scheme has a number of practical advantages: it has relatively good scaling behavior over device density and cell radius, its complexity grows only linearly with the number of devices, and it requires only approximate locations of the devices.
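To give a feel for what a location-based spatial reuse pattern looks like, the following sketch assigns devices to square segments and lets segments that are far enough apart share a pilot resource. It is a generic illustration and not the LATS algorithm of Chapter 2: the square grid, the names `seg_size` and `guard`, and the modulo-based reuse rule are all assumptions made here, and they are only loosely analogous to the segmentation and guard parameters mentioned above.

```python
# Generic grid-based spatial reuse sketch (NOT the LATS algorithm itself).
# seg_size and guard are hypothetical parameters chosen for illustration.
import math


def segment_index(x, y, seg_size):
    """Map a device position to the index of its square segment."""
    return (math.floor(x / seg_size), math.floor(y / seg_size))


def reuse_group(segment, guard):
    """Segments whose indices agree modulo (guard + 1) share a pilot resource.

    Two distinct segments in the same group are separated by at least
    `guard` segments in one dimension.
    """
    period = guard + 1
    ix, iy = segment
    return (ix % period) * period + (iy % period)


def schedule(devices, seg_size, guard):
    """Group device names by pilot resource in a single pass over the devices.

    devices: dict mapping device name -> (x, y) position in meters.
    Returns a dict mapping reuse group -> sorted list of device names.
    """
    groups = {}
    for name, (x, y) in devices.items():
        group = reuse_group(segment_index(x, y, seg_size), guard)
        groups.setdefault(group, []).append(name)
    return {group: sorted(names) for group, names in groups.items()}


if __name__ == "__main__":
    example = {"a": (10.0, 10.0), "b": (60.0, 10.0), "c": (110.0, 10.0)}
    print(schedule(example, seg_size=50.0, guard=1))
```

In this toy example, devices "a" and "c" lie two segments apart and therefore reuse the same resource, while "b" in the segment between them is assigned a different one. The single loop over the devices mirrors the linear growth in complexity noted above, although the actual scheme also has to choose the segment size and guard distance, which this sketch takes as given.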
Finally, interestingly, the proposed structure can be utilized to solve several other problems in D2D communication, and ProSe in general, where the devices are required to frequently broadcast pilot messages; nevertheless, we here stick to CSI acquisition for clarity.

1.4.1.1 Major Contributions

• Design an efficient CSI acquisition scheme with a complexity that is linear in the number of devices.

• Derive an error metric that captures the quality of the acquired CSI.

• Analyze several practical issues such as overhead reduction techniques and the impact of location errors.

1.4.1.2 Relevant Chapters and Peer Reviewed Scientific Papers

Chapter 2 of this document is dedicated to the CSI acquisition problem; the related publications are:

C1 Burghal, Daoud, and Andreas F. Molisch. "Location aware training scheme for D2D networks." In Signals, Systems and Computers, 2013 Asilomar Conference on, pp. 1705-1708. IEEE, 2013.

J1 Burghal, Daoud, and Andreas F. Molisch. "Efficient channel state information acquisition for device-to-device networks." IEEE Transactions on Wireless Communications 15, no. 2 (2016): 965-979.

1.4.2 Neighbor Discovery with Prior Information

Although neighbor discovery has been addressed extensively in the peer-to-peer communication literature, the envisioned structures and applications of D2D communication have generated several interesting problems. In this problem, we address neighbor discovery when the devices have prior information about their set of possible neighbors. This problem arises in many scenarios; for example, in BS-controlled D2D networks the BS can estimate the superset of neighbors for each device based on the approximate locations of the devices.
In another example, D2D communication could be implemented in dual (or multi) band systems; due to the difference in propagation environments in the two bands, the neighbors in one band are not necessarily identical to the ones in the other band. However, performing the full neighbor discovery process twice, once in each band, is a waste of resources. Thus, the devices can utilize the neighborhood information in one of the bands to optimize the neighbor discovery process in the other band. The complexity of the problem increases with multi-antenna systems, in which case a directional neighbor discovery process is required. In this problem, we focus on randomized directional neighbor discovery, where the devices randomly choose their beam directions and their transmission status.

Minimizing the discovery time is the objective of many neighbor discovery schemes [26]. In this problem, we start by modeling the discovery time as a coupon collector problem. We then provide the expected discovery time; however, due to its mathematical complexity, we derive convex lower and upper bounds that we use to optimize the random discovery parameters (the probability of choosing a beam direction and the probability of transmission in that direction). We assess the impact of uncertainty in the prior information on the performance of the scheme. Once the discovery time and its main components are identified, one can derive several interesting algorithms for different scenarios with different complexity and/or information levels. We compare the performance against a random directional neighbor discovery scheme that ignores the prior information. Not surprisingly, the results show the significance of using such information. In the above, we assume that the side information is the set of possible neighbors; however, the methods to acquire such information differ between scenarios.
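For readers unfamiliar with the coupon collector model mentioned above, the sketch below evaluates the textbook expected collection time when coupon i appears in a slot with probability p_i and at most one coupon appears per slot. It is a generic illustration of the non-uniform coupon collector problem, assuming independent and identically distributed slots; it is not the discovery-time expression derived in Chapter 3, and the function name is hypothetical.

```python
# Textbook non-uniform coupon collector: expected number of slots until every
# coupon has appeared at least once. Assumes i.i.d. slots with at most one
# coupon per slot, so sum(probs) <= 1 (the remainder is an "empty" slot).
# This is a generic illustration, not the thesis's discovery-time formula.
from itertools import combinations


def expected_collection_time(probs):
    """Inclusion-exclusion over non-empty subsets S of coupons:

    E[T] = sum_S (-1)^(|S|+1) / sum_{i in S} p_i

    because the first slot in which any coupon of S appears is geometric
    with success probability sum_{i in S} p_i.
    """
    total = 0.0
    for size in range(1, len(probs) + 1):
        sign = 1.0 if size % 2 == 1 else -1.0
        for subset in combinations(probs, size):
            total += sign / sum(subset)
    return total


if __name__ == "__main__":
    print(expected_collection_time([1 / 3, 1 / 3, 1 / 3]))  # uniform case
    print(expected_collection_time([0.5, 0.25]))             # skewed case
```

The sum runs over all non-empty subsets, so the number of terms grows exponentially with the number of coupons. For a fixed total probability, the skewed case takes longer on average than the balanced one, which gives some intuition for why the choice of the random discovery parameters matters.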
In Chapter 3, we discuss when the proposed solution can be used and when the prior information may be available. Based on the current standards, the randomized solution resembles "discovery type 1" [2], where the devices announce their presence to their possible neighbors. Note that the proposed solution can be applied when the resource pool is provided, see Sec. 1.2.2; however, we restrict the analysis to resource pools that consist of time slots only.

1.4.2.1 Major Contributions

• We formulate the expected discovery time with prior neighbor information as a non-uniform coupon collector problem and derive upper and lower bounds.

• Based on the lower and upper bounds, we develop two convex optimization metrics to minimize the expected discovery time in the network.

• We extend the analysis to utilize the prior information with probabilistic uncertainty.

1.4.2.2 Relevant Chapters and Peer Reviewed Scientific Papers

The analysis of discovery time and the impact of uncertainty on the schemes are discussed in Chapter 3.

J2 Burghal, Daoud, Arash Saber Tehrani, and Andreas F. Molisch. "On expected neighbor discovery time with prior information: Modeling, bounds and optimization." IEEE Transactions on Wireless Communications 17, no. 1 (2018): 339-351.

We extended our work to cover a distributed scenario and a low-complexity BS-controlled scenario. We provide a brief introduction to the two in the appendix of Chapter 3; however, the relevant material and details can be found in the following peer-reviewed publications:

C2 Burghal, Daoud, Arash Saber Tehrani, and Andreas F. Molisch. "Directional neighbor discovery in dual-band systems." In Signals, Systems and Computers, 2015 49th Asilomar Conference on, pp. 1021-1025. IEEE, 2015.

C3 Burghal, Daoud, Arash Saber Tehrani, and Andreas F. Molisch.
"Base station assisted neighbor discovery in device to device systems." In Personal, Indoor, and Mobile Radio Communications (PIMRC), 2017 IEEE 28th Annual International Symposium on, pp. 1-7. IEEE, 2017.

1.4.3 Band Assignment in Dual Band Systems

Although the large bandwidth at the mmWave band provides an opportunity for high data rates, its harsh propagation environment degrades the reliability and QoS in a number of scenarios. Thus, next generation wireless networks will have the ability to communicate in either the cmWave band or the mmWave band, though not necessarily in both bands simultaneously. In this problem, we consider some aspects of link setup and maintenance in dual-band systems, where the BS chooses one of the two available frequency bands to communicate data to the UE. We consider two band assignment scenarios. The first is a link setup scenario, where the BS uses currently observable information about the channel to make the band assignment decision; we refer to this problem as one-shot band assignment. In the second, the BS uses the current and previous channel observations to make a future band assignment decision, i.e., for link maintenance; we refer to this problem as sequential band assignment. We propose two approaches for each of the two scenarios.

The first approach is an analytical solution to the problem; it is based on the premise that shadowing is log-normally distributed in space. We further extend this assumption to time (space) and frequency by assuming that shadowing follows a Gaussian Process (GP) in both dimensions on a logarithmic scale. In this approach, the BS uses observable shadowing values to make the band assignment decision. We refer to this approach as the GP-based approach. For the second approach, we employ machine learning techniques, where the BS uses trained models to make the band assignment decisions based on the observable features.
The features may include the received power in one of the two bands, the polar location of the UE, and the angle of departure or the delay of the prominent multipath component. We consider several machine learning models of different complexity, including Neural Network (NN) based models and deep learning solutions based on Long Short Term Memory recurrent NNs.
We study the performance of the proposed solutions in two environments: a stochastic environment and a ray-tracing environment. The results reflect the power of machine learning, as it provides competitive performance in the stochastic environment (where the GP-based solutions are statistically optimal) and considerably better performance in the ray-tracing environment.

1.4.3.1 Major Contributions
• We propose exact and approximate solutions to the band assignment (BA) problem based on the GP assumption, where we assume that the shadowing follows a GP in space and frequency; interestingly, the approximate solution shows better results in channels where the GP assumption does not hold.
• Viewing the BA as a classification problem, we use several machine learning (ML) approaches to solve it, including linear regression (LR), Logistic Regression (GR), NNs, and deep learning (DL) based on Long Short Term Memory (LSTM) architectures. We use cross-validation techniques to optimize their parameters. The use of several ML techniques is necessary to provide a realistic assessment of the power of complex ML approaches.
• We study the performance of the proposed solutions under several feature combinations in stochastic and ray-tracing environments. This is done because some of the features are available with little overhead, while others require significant acquisition effort.

1.4.3.2 Relevant Chapters and Peer Reviewed Scientific Papers
Chapter 4 is dedicated to this problem; the material in this chapter was also part of the following peer-reviewed publications:
C4 Burghal, Daoud, Rui Wang, and Andreas F. Molisch.
"Band Assignment in Dual Band Systems: A Learning-based Approach." In MILCOM 2018-2018 IEEE Military Communications Conference (MILCOM), pp. 7-13. IEEE, 2018.
J3 Burghal, Daoud, Rui Wang, and Andreas F. Molisch. "Deep Learning and Gaussian Process based Band Assignment in Dual Band Systems." IEEE Transactions on Wireless Communications (Submitted Jan 2018).
As an alternative GP-based solution, we have considered the case where the BS performs linear rate prediction; this solution enables several interesting theoretical results. The material is not part of the dissertation and appears in the following publication:
J4 Burghal, Daoud, and Andreas F. Molisch. "Rate and Outage Probability In Dual Band Systems With Prediction-Based Band Switching." IEEE Wireless Communications Letters (2018).

Chapter 2
CSI Acquisition

2.1 Introduction
Unlike traditional peer-to-peer networks, D2D networks use the Base Station (BS) as a central controller. It is well established that in any network, such central control can greatly help to achieve higher throughput and Quality of Service (QoS). Typically, the network controller has to solve many traditional communication problems: setting up sessions, link scheduling, and perhaps employing Interference Alignment (IA) or cooperative communication techniques.
To exploit the aforementioned techniques, knowledge of the Channel State Information (CSI) is usually assumed. For instance, many link scheduling techniques require the knowledge of CSI [37–40], so that its measurement impacts the efficiency of the communication and the achievable throughput [41]. Similarly, in routing, prior knowledge of CSI can improve the performance, e.g., [42] and [43], and reduce additional overhead [44]. To implement interference cancellation, the knowledge of CSI is indispensable [45–47]. Since acquisition of perfect CSI is not possible in practice, the quality of CSI is of prime importance, as confirmed by numerous studies.
For example, the authors of [48] and [49] show the effect of imperfect CSI on the subsequent communication. Thus, the goal of this work is to design a scheme that acquires the CSI efficiently and accurately.
In contrast to conventional cellular networks, where only the CSI between devices and BS is required, in D2D networks the CSI between devices is needed. This poses new challenges, as the BS cannot infer the state of the links between devices from direct observations. For example, the link between two devices could be poor due to fading, while their links to the BS have low attenuation. Thus, it is necessary to perform estimation of the channels between devices. Yet the basic question is: at what times should the pilot transmissions from different devices occur? The temporal variability of the channel and the broadcast effect make "straightforward" solutions inefficient. For instance, Time Division Multiple Access (TDMA), where each device in the cell would get a separate time slot for transmission of its pilot sequence, reduces the interference between devices but increases CSI outdatedness due to the long times between pilot sequences of each particular device. On the other hand, contention-based techniques (e.g., CSMA/CA) might reduce the delay of CSI acquisition, but the quality of the estimated channels might be poor due to interference.
For these reasons the acquisition of CSI between all devices is usually referred to as cumbersome or impractical, see, e.g., [50]. Consequently, some of the recent work restricts the channel knowledge to the average gain, which can be estimated from the path loss as a function of distance, see, e.g., [51], or from an outdated set of previous measurements, e.g., acquired during the neighbor discovery process [52]; other work estimates only the channels between a subset of the available devices [53]. Other papers constrain the CSI estimate to the currently active links [54].
While the acquisition of CSI can be viewed as a scheduling problem, where every device has to transmit a pilot signal to its neighbors, there are many factors that make CSI acquisition distinct from the traditional scheduling of information bits. The traditional scheduling problem has been studied thoroughly in mobile ad hoc networks, for the purpose of maximizing the throughput of data communication. However, CSI acquisition differs in a number of aspects: the quality of the CSI impacts the Signal to Noise Ratio (SNR) of the subsequent communication; in fact, its key characteristic is the mean-squared error, as opposed to the sum capacity that normally characterizes communications. While those two quantities are related [55], they are different and lead to different optimization problems. In addition, the majority of scheduling schemes deal with link scheduling, which is different from the fundamental problem at hand. Also, many scheduling schemes discussed in the literature assume knowledge of the interference levels between devices, and thus of the CSI [37, 56], and are therefore not applicable to CSI acquisition. While other algorithms define a minimum acceptable Signal to Interference Plus Noise Ratio (SINR) and schedule the links based on the geometric setup [57], the quality of the CSI is not only a function of the SINR, but is also impacted by the outdatedness of the channel state in time-variant channels. As a result, CSI acquisition and link scheduling are considered separate problems. In fact, this has been clearly addressed in two recent D2D scheduling schemes, namely FlashLinQ [53] and ITLinQ [58]; in those two schemes the scheduling problem is solved in two steps: CSI acquisition and then link scheduling.
In general, the existing schemes for the CSI acquisition problem fail to scale in large networks, e.g., [52, 53, 59, 60], or use fundamentally different assumptions, e.g., [61].
The schemes in [52, 53, 59, 60] use orthogonal sequences to acquire the CSI. Orthogonality in frequency, as in [53], is similar in its fundamental principle to the TDMA approach mentioned above (footnote 1), while code-based orthogonality uses longer training sequences, i.e., larger outdatedness, and is sensitive to propagation delay [18]. The work in [61] is similar to our approach in that it uses geographical partitioning of devices for the acquisition of CSI. Ref. [61] solves the problem of partitioning the devices such that, when employing IA for each partition, the overall sum rate is maximized, owing to the reduction of the CSI overhead. Specifically, a link is added to a partition only if it satisfies certain conditions, e.g., maximum interference, such that employing IA is beneficial. Note that while this could be a suitable scheme for communication in non-connected networks, in our work the set of links for CSI acquisition is dictated by neighbor relations of uniformly distributed devices. Furthermore, the authors ignored the outdatedness of the CSI and considered the sum rate of pre-specified links. In contrast, in this work we aim to quantify the impact of delay, noise and interference on the acquired CSI for all neighbor devices. Consequently, both the fundamental assumptions and the evaluation criteria of [61] are significantly different from the present work.
Thus, to the best of our knowledge, no particular scheme exists that tackles the CSI acquisition problem in D2D networks under the general assumptions outlined above. In this work, we present the Location Aware Training Scheme (LATS) as a flexible scheme that gathers the CSI between devices that are within the vicinity of one another and provides an average error measure for the acquired CSI.
Footnote 1: It furthermore faces challenges in frequency-selective channels, while our proposed scheme can be easily generalized to the frequency-selective case.
In fact, LATS is inspired by traditional cellular "spatial reuse" [18]: the cell is divided into segments, and segments using the same time-frequency resources are separated by a guard distance. The scheme provides the ability to trade off the effects of interference and outdatedness.
The chapter is organized as follows. In Section 2.2 we summarize the network and channel models and describe the scheme. In Section 2.3 we analyze the scheme and derive an approximation to the average Normalized Mean Square Error (NMSE) of the acquired CSI; we then study the effect of location information on the overhead and the performance, and formulate the LATS optimization problem. Problem (P-1) at the end of that section provides a compact form of the optimization problem and a summary of the relevant equations. In Section 2.4 we discuss the simulation results. The appendix of the chapter is given in Section 2.5.

2.2 Location Aware Training Scheme

2.2.1 Network and Channel Models
We consider a wireless network with mobile devices that are spatially distributed according to a homogeneous Poisson Point Process with density $\lambda$ devices/m$^2$. The area over which the devices are distributed is a square with side length $L_N$. In addition, we assume that the network has a central controller, i.e., a Base Station (BS), that is charged with the coordination tasks described in the previous section. Communication between devices (both for pilot transmission and actual communications) is permitted only under control of the BS.
Devices have a maximum communication range, $L_c$, due to channel attenuation combined with constraints on the transmit power. The latter can be due to regulatory constraints, practical limitations on power amplifiers, or conscious limitation for improving system performance (footnote 2). Note that in this work we do not limit the CSI acquisition to within a fixed geographical area; rather, the goal is to gather the CSI within a distance $L_c$ of every device.
This communication model is widely used in the literature; it is referred to as the protocol model and is represented by a disk graph, see, e.g., [56], [57].
Footnote 2: E.g., Ref. [62] describes optimization of the communication range for maximizing throughput in video caching networks.
We assume a time-slotted system; the devices have (rough) timing synchronization with the BS (e.g., through a beacon) and use half-duplex communication. Devices are assumed to be aware of their individual locations, which is commonly fulfilled for modern devices.
We assume that BS-device channels ("signaling channels") are orthogonal to D2D channels. Furthermore, the D2D pilots are transmitted on the same frequency throughout the cell, while possible orthogonalization occurs by transmission at different times (footnote 3). Thus, pilot transmission suffers from interference. Due to the mobility of devices and/or the environment, we consider the channels to be time-varying; we furthermore assume that they are frequency-flat (footnote 4). The received pilots are assumed to be corrupted by noise, which is modeled as white Gaussian noise with one-sided power spectral density $N_0$.
Consequently, we can write the received pilot signal for a transmit-receive pair $(\tau, r)$ as
\[ y_{\tau,r}(t) = h_{\tau,r}(t)\, S_{P\tau}(t) + n_r(t) + \sum_{i \in \mathcal{I}_\tau} h_{i,r}(t)\, S_{Pi}(t) \tag{2.1} \]
where $S_{P\{\cdot\}}(t)$ is the pilot signal transmitted from a device to its neighbors at time $t$, $n_r(t)$ is the noise at device $r$, $\mathcal{I}_\tau$ is the set of all devices simultaneously scheduled for transmission with $\tau$, and $h_{\{\cdot,r\}}$ represents the complex (amplitude) channel gain between a transmitter and device $r$. For given device locations it is modeled as a circularly symmetric complex Gaussian random variable with variance equal to the path gain $\sigma^2$, which is a function of distance and is given by
\[ \sigma^2(d_{\tau,r}) = \left( \frac{\lambda_w}{4\pi d_{\mathrm{ref}}} \right)^{2} \left( \frac{d_{\mathrm{ref}}}{d_{\tau,r}} \right)^{\alpha} \tag{2.2} \]
where $d_{\tau,r}$ is the distance between a transmitting device $\tau$ and a receiver $r$, $\tau \neq r$, $\lambda_w$ is the wavelength, and $\alpha$ is the propagation exponent, usually $\alpha \in [1.5, 6]$ depending on the environment [18]. $d_{\mathrm{ref}}$ is the breakpoint distance which, in the simulations in Sec. 2.4, is assumed to be $d_{\mathrm{ref}} \approx 4h_{\mathrm{Dev}}^2/\lambda_w$, where $h_{\mathrm{Dev}}$ is the height of the mobile device antenna [18].
Footnote 3: Note that this is no restriction of generality; if a set of frequencies is available, orthogonalization in the time/frequency plane can be done completely analogously, and the equations derived below require only minor modifications.
Footnote 4: Again, consideration of frequency-selective channels can be done in an analogous way, since training has to be done (in an OFDM system) for one subcarrier per coherence bandwidth [18].
To capture the temporal channel variation, we use a first-order autoregressive model [63]; thus the true channel after $\delta$ seconds is
\[ h_{\tau,r}(t+\delta) = \rho(\delta)\, h_{\tau,r}(t) + \sqrt{1 - |\rho(\delta)|^2}\; h'_{\tau,r}(t) \tag{2.3} \]
where $h'_{\tau,r}$ is a process that is independent of $h_{\tau,r}$ but identically distributed, and $\rho(\delta)$ is the channel correlation coefficient as a function of the time shift $\delta$. While $\delta$ can in principle refer to any delay, in our application it is the lag between the time the channel is estimated, $t$, and the time it is used for communications, $t+\delta$. In the next section we will elaborate further on $\delta$.
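The interplay between the path gain (2.2) and the autoregressive model (2.3) can be checked numerically. The following is a minimal sketch, not the simulation code of this work: the function names, the fixed seed, and the sample count are illustrative assumptions, and only the Python standard library is used. It draws a channel with a given variance, ages it by one step of (2.3), and averages the squared difference, which for a real correlation coefficient should approach twice the variance times one minus the correlation coefficient.

```python
import math
import random


def path_gain(d, wavelength, d_ref, exponent):
    """Path gain in the form of (2.2): free-space gain at d_ref, then power-law decay."""
    return (wavelength / (4 * math.pi * d_ref)) ** 2 * (d_ref / d) ** exponent


def complex_gaussian(variance, rng):
    """Draw a circularly symmetric complex Gaussian sample with the given variance."""
    sigma = math.sqrt(variance / 2)
    return complex(rng.gauss(0, sigma), rng.gauss(0, sigma))


def aged_channel(h, rho, variance, rng):
    """Apply one step of the first-order autoregressive model (2.3)."""
    innovation = complex_gaussian(variance, rng)
    return rho * h + math.sqrt(1 - abs(rho) ** 2) * innovation


def outdatedness_error(rho, variance, samples=20000, seed=1):
    """Empirical mean of |h(t) - h(t + delta)|^2 over independent channel draws.

    The sample count and seed are illustrative choices, not values from the text.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        h = complex_gaussian(variance, rng)
        total += abs(h - aged_channel(h, rho, variance, rng)) ** 2
    return total / samples
```

For example, `outdatedness_error(0.9, 1.0)` should lie close to 0.2, which is the outdatedness penalty that competes with interference in the trade-off discussed next.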
Equations (2.1) and (2.3) reflect the intrinsic trade-off of the CSI acquisition problem: simultaneous transmission of pilots reduces the delay in (2.3) but raises the interference level in (2.1), and vice versa. Thus, to reduce the estimation errors the training scheme must be designed carefully to mitigate the effects of interference, noise, and outdatedness. In the following subsection we describe LATS, which tackles these problems.

2.2.2 Scheme Description
LATS uses the well-known concept of spatial reuse to jointly minimize the interference and delay. The effect of distance is evident: (2.2) shows that the interference power received by $r$ decays as a function of the distance between a transmitter $\tau$ and the receiver $r$. One possible approach for scheduling the pilot tones would be to determine the optimum sets of devices that can transmit simultaneously without exceeding an interference threshold; however, such scheduling is known to be extremely complex (usually NP-hard, see, e.g., [56], [64]) (footnote 5).
Footnote 5: Yet even this is not optimal, because (i) the choice of the threshold is usually heuristic and (ii) the sets are not independent; see [64] and [61] for relevant discussions.
To simplify the problem we group devices, based on their locations, into uniform square-shaped segments. Viewing the center of each segment as a virtual device $\tau_v$, we schedule the virtual devices in a way that minimizes the estimation errors. This can also be interpreted as discretizing the device locations to points on a grid. As we discuss later, this greatly simplifies the scheduling problem. Therefore, the cell is divided into small segments that have a side length of $L_s$ meters. To maintain the uniform structure we assume that the diameter of the communication "disk" described above spans an integer number of segments, more precisely $2L_c = (2S_n + 1)L_s$, where the segmentation parameter $S_n$ is a non-negative integer.
To reduce interference, LATS defines the minimum separation distance between virtual devices that are scheduled simultaneously to be $2L_c + 2L_g$, where $L_g$ is the guard distance in meters. Again, to maintain the uniform structure and ease of scheduling, we define $L_g = \frac{G_n}{2} L_s$, where the guard parameter $G_n$ is a non-negative integer. At this point, we recognize three areas around a virtual transmitter $\tau_v$: the Training Area, the square area that contains the communication disk of $\tau_v$; the Training Block, the square area centered at $\tau_v$ with side length $L_B = 2L_c + 2L_g$; and the Guard Area, the area outside the training area but inside the training block. Fig. 2.1 shows a segmented cell and the three types of areas.
This uniform setup greatly simplifies the scheduling. Given the separation distance $L_B$, we can define the Training Set as the collection of segments such that every two neighboring segments are horizontally or vertically $L_B$ apart; we denote the $s$-th such set as $\Psi_s$. By construction of $L_B$, the virtual devices that belong to the same training set $\Psi_s$ can transmit simultaneously. As a result, given the structure above it is easy to see that any training block $b$ has up to $s_b = \left( \frac{2L_c + 2L_g}{L_s} \right)^2$ segments. Note that $s_b$ is equal to the number of training sets. Using the definitions of $L_s$ and $L_g$, $s_b$ can be shown to be
\[ s_b = (2S_n + G_n + 1)^2 \tag{2.4} \]
In the example shown in Fig. 2.1 the cell has 25 training sets, i.e., $s_b = 25$, and each set contains 4 segments, i.e., $|\Psi_s| = 4\ \forall s \in \{1, \dots, 25\}$, where $|\cdot|$ denotes the cardinality of a set.
We note the sub-optimality of the scheme described above: firstly, we use squares instead of hexagons as training areas; furthermore, we have only considered scheduling of segments separated by the guard distance along the vertical/horizontal axes. We have also neglected the impact of fading on the scheduling. This has been the price of arriving at a simplified model.
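The segmentation relations above are simple enough to evaluate directly. The sketch below is an illustration under stated assumptions: the function names are hypothetical, and the row-major numbering of training sets in `training_set_index` is an assumed convention that need not match the numbering used in Fig. 2.1. It computes the segment side, guard distance, block side, and number of training sets from the communication range and the two integer parameters, and maps a segment's grid position to a training set so that segments sharing an index are one block side apart.

```python
def lats_geometry(L_c, S_n, G_n):
    """Return (L_s, L_g, L_B, s_b) from 2*L_c = (2*S_n + 1)*L_s and L_g = (G_n/2)*L_s."""
    L_s = 2 * L_c / (2 * S_n + 1)       # segment side length (m)
    L_g = G_n * L_s / 2                 # guard distance (m)
    L_B = 2 * L_c + 2 * L_g             # training block side length (m)
    s_b = (2 * S_n + G_n + 1) ** 2      # number of training sets, eq. (2.4)
    return L_s, L_g, L_B, s_b


def training_set_index(col, row, S_n, G_n):
    """Map a segment's grid position to a training set index in 1..s_b.

    Assumed convention (not taken from the text): sets are numbered row by row
    inside a block, and the pattern repeats every (2*S_n + G_n + 1) segments.
    """
    n = 2 * S_n + G_n + 1               # segments per block side, L_B / L_s
    return (row % n) * n + (col % n) + 1
```

With a communication range of 30 m and the parameters of Fig. 2.1 (segmentation parameter 1, guard parameter 2), `lats_geometry(30, 1, 2)` gives 20 m segments, a 20 m guard distance, 100 m blocks, and 25 training sets.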
Further issues can arise when: (i) devices are not at the center of their segments, making it possible to have shorter distances between transmitters, and (ii) there is more than one device per segment. LATS resolves these issues as follows: for (i) we adjust $S_n$ and $G_n$ to correct for the devices' "deviation" such that the average effect of interference is tolerable, while for (ii) LATS suggests training the set as many times as the maximum number of devices per segment, i.e., using TDMA for devices in the same segment with random time slot selection.

[Figure 2.1: Structure of LATS: a simplified segmented cell and LATS cell components (Training Area, Guard Area, Training Block, transmitting segment, $L_s$, $L_c$, $L_g$). In this figure $S_n = 1$ and $G_n = 2$. The virtual transmitters of training set number 13 are scheduled.]

To elaborate further on the LATS scheduling mechanism, assume a segment $g$ in training set $s$, i.e., $g \in \Psi_s$. Let the random variable $k_g$ be the number of devices in segment $g$ and $w_s$ the training period for training set $s$; we can write $w_s$ as
\[ w_s \triangleq \max_{g \in \Psi_s} k_g \tag{2.5} \]
For example, in Fig. 2.1 we have $w_1 = 1$, $w_{13} = 2$, and $w_{15} = 0$. To schedule the $k_g$ devices in $g$, the controller randomly chooses $k_g$ time slots from the available $w_s$ slots. In other words, denoting the time slot of device $\tau$ as $z_\tau$, we have $z_\tau \in \{1, \dots, w_s\}$. Note that all segments in $\Psi_s$ are scheduled within the same period, and the training for set $s$ starts after $T_s$ seconds, which is given by
\[ T_s = \sum_{i=1}^{s-1} w_i\, \Delta_t \tag{2.6} \]
where $\Delta_t$ is the duration of a training symbol (footnote 6). The outdatedness of a CSI estimate depends on the difference between the time the CSI is estimated and the time at the end of network training. Hence, for $s_b$ training sets, the delay encountered by the estimated CSI of the $(\tau, r)$ link is
\[ \delta(s, z_\tau) = \left( \sum_{j=s}^{s_b} w_j - z_\tau \right) \Delta_t \tag{2.7} \]
which is independent of device $r$.
Footnote 6: $\Delta_t$ may include maximum propagation delay, processing time, etc.

Table 2.1: Table of key mathematical symbols
$L_s$    Segment side length (m)
$L_g$    Guard distance (m)
$L_c$    Maximum communication range (m)
$S_n$    Segmentation parameter
$G_n$    Guard parameter
$\lambda$    Device density (dev./m$^2$)
$s_b$    Maximum number of segments in a training block
$k_g$    Number of devices in segment $g$
$\Psi_s$    Set of all segments in training set $s$
$w_s$    Training period: number of time slots for training set $s$
$z_\tau$    Transmission time for device $\tau$ (in training set $s$)
$\delta(s, z_\tau)$    Time until the acquired CSI is used, for transmitter $\tau$ in training set $s$
$\Delta_t$    Single symbol training duration (sec.)
$N_b$    Number of training blocks in the cell
$\bar{d}_{\mathrm{Intf}}$    Average transmitter-receiver, interferer-receiver distances ratio

We conclude this section by pointing out that in the proposed scheme the values of $S_n$ and $G_n$ trade off delay against interference. For instance, increasing the number of segments per training area, i.e., a larger $S_n$, clearly reduces the interference; however, since $L_s$ decreases, it becomes less likely that all segments are occupied, which reduces the reuse efficiency and increases the delay. Similar reasoning can be used to verify the effect of the guard parameter: as $G_n$ increases the interference decreases but the delay increases, simply because of the addition of new non-active devices in the guard area. For clarity of presentation, Table 2.1 summarizes the notation for the chapter.
2.3 Analysis
In this section we derive the performance metric of LATS, namely the average Normalized Mean Square Error (NMSE), which measures the effects of interference, outdatedness, and noise. Next, we analyze the impact of location knowledge on the overhead and system performance. We then formulate the LATS optimization problem, which we use to find the optimal values of $S_n$ and $G_n$.

2.3.1 Performance Metric
We use the NMSE to capture the performance of the scheme; it reflects the average quality of all acquired CSI in the cell. The derivation of the NMSE proceeds as follows: we start with the Square Error (SE) of the CSI estimate of a link $(\tau, r)$ conditioned on the time, locations, and device schedules. The Mean SE (MSE) is derived by averaging the SE over different realizations of the small-scale fading and noise. The MSE reflects the quality of the CSI for $(\tau, r)$; to consider other links over the cell we first normalize the MSE. Next, we average the NMSE over all devices in the segment, all segments in the training set, and all training sets in the cell. In the derivation we make use of the laws of total expectation and conditional probability along with some plausible assumptions.
Let us first consider a link $(\tau, r)$, where the transmitter $\tau$ is scheduled at time slot $z_\tau$ of segment $g \in \Psi_s$. For clarity, we will use $g'$ to represent the segment that contains $\tau$. We can then write the SE in the estimation of the channel $h_{\tau,r}$ as
\[ \mathrm{SE}_{\tau,r}(t, s, g', \mathcal{D}, \mathcal{W}, \mathcal{K}_s, \mathcal{I}_{s,z_\tau}) = \left| \hat{h}_{\tau,r}(t) - h_{\tau,r}(t + \delta(s, z_\tau)) \right|^2 \tag{2.8} \]
where we denote the estimate of the channel as $\hat{h}_{\tau,r}$.
The SE is conditioned on the absolute time $t$, the index $s$ of the training set, the segment $g'$, the set $\mathcal{K}_s$ of the numbers of devices in every segment that belongs to set $s$, i.e., $\mathcal{K}_s = \{k_g : \forall g \in \Psi_s\}$, the set $\mathcal{D}$ of the locations of all devices, the set of all training periods $\mathcal{W} = \{w_s : s \in \{1, \dots, s_b\}\}$, and the set of interfering devices $\mathcal{I}_{s,z_\tau}$, i.e., the devices that are in training set $s$ and scheduled at time slot $z_\tau$. We have dropped the explicit dependency on these parameters from the r.h.s. of (2.8) for brevity. To find the estimate $\hat{h}_{\tau,r}(t)$ we apply Least Squares estimation to (2.1):
\[ \hat{h}_{\tau,r}(t) = h_{\tau,r}(t) + \frac{S^{*}_{P\tau}(t)}{|S_{P\tau}(t)|^2}\, n_r(t) + \frac{S^{*}_{P\tau}(t)}{|S_{P\tau}(t)|^2} \sum_{i \in \mathcal{I}_{s,z_\tau}} h_{i,r}(t)\, S_{Pi}(t) \tag{2.9} \]
Next, we take the expectation over different realizations for the same scheduling setup, i.e., the expectation over small-scale fading and noise, which equals the expectation over time $t$ if ergodicity holds. Substituting (2.3) and (2.9) in (2.8), and with some straightforward algebra, it is possible to write the conditional Mean Square Error (MSE) as
\[ \mathrm{MSE}_{\tau,r}(s, g', \mathcal{D}, \mathcal{W}, \mathcal{K}_s, \mathcal{I}_{s,z_\tau}) = 2\big(1 - \rho(\delta(s, z_\tau))\big)\, \sigma^2(d_{\tau,r}) + \frac{N_0 + \sum_{i \in \mathcal{I}_{s,z_\tau}} \sigma^2(d_{i,r})\, E_i}{E_\tau} \tag{2.10} \]
where we have used the fact that $\mathbb{E}\{|h_{\{\cdot,r\}}|^2 \mid \mathcal{D}\} = \sigma^2(d_{\{\cdot,r\}})$, with $\sigma^2$ given in (2.2). $E_\tau$ and $E_i$ are the transmitted powers of the transmitter $\tau$ and the interferer $i$, respectively.
The impact of the channel estimation error on the performance of a specific link is mostly determined by the relative error (footnote 7). The normalized MSE can be written as
\[ \mathrm{NMSE}_{\tau,r}(s, g', \mathcal{D}, \mathcal{W}, \mathcal{K}_s, \mathcal{I}_{s,z_\tau}) = \frac{\mathrm{MSE}_{\tau,r}(s, g', \mathcal{D}, \mathcal{W}, \mathcal{K}_s, \mathcal{I}_{s,z_\tau})}{\mathbb{E}\{|h_{\tau,r}(t + \delta(s, z_\tau))|^2\}} \tag{2.11} \]
Footnote 7: This is different from [65], where we minimized a scaled version of the mean square error.
However, since $h$ is a stationary process over the training time, we have $\mathbb{E}\{|h_{\tau,r}(t + \delta(s, z_\tau))|^2 \mid \mathcal{D}\} = \sigma^2(d_{\tau,r})$, so that
\[ \mathrm{NMSE}_{\tau,r}(s, g', \mathcal{D}, \mathcal{W}, \mathcal{K}_s, \mathcal{I}_{s,z_\tau}) = 2\big(1 - \rho(\delta(s, z_\tau))\big) + \frac{N_0 + \sum_{i \in \mathcal{I}_{s,z_\tau}} \sigma^2(d_{i,r})\, E_i}{E_\tau\, \sigma^2(d_{\tau,r})} \tag{2.12} \]
We can intuitively interpret the above as a function of the delay and the inverse of the SINR. In the subsequent discussion we consider $E_i = E_\tau = E_p$, which is consistent with the assumption that all devices have the same maximum communication range $L_c$. Using (2.2), eq. (2.12) can be written as
\[ \mathrm{NMSE}_{\tau,r}(s, g', \mathcal{D}, \mathcal{W}, \mathcal{K}_s, \mathcal{I}_{s,z_\tau}) = 2\big(1 - \rho(\delta(s, z_\tau))\big) + \frac{N_0\, d_{\tau,r}^{\alpha_\tau}}{E_p\, d_{\mathrm{ref}}^{\alpha_\tau - 2}} \left( \frac{4\pi}{\lambda_w} \right)^{2} + \sum_{i \in \mathcal{I}_{s,z_\tau}} d_{\mathrm{ref}}^{\alpha_i - \alpha_\tau}\, \frac{d_{\tau,r}^{\alpha_\tau}}{d_{i,r}^{\alpha_i}} \tag{2.13} \]
where $\alpha_\tau$ and $\alpha_i$ are the propagation exponents of the $(\tau, r)$ and $(i, r)$ links, respectively.
The above equation is a function of the distances between devices, more specifically the distances of the $(\tau, r)$ link and the $(i, r)$ links, $\forall i \in \mathcal{I}_{s,z_\tau}$. Using the LATS structure, we can recognize interference tiers with respect to the transmitting device; for example, the first tier is the group of all active segments in the neighboring training blocks. Depending on the cell size and the location of the segment $g'$ in the cell, the first tier may contain up to 8 segments. Similarly, for the $l$-th tier, we have up to $8l$ blocks. We can define $\mathcal{I}^{l}_{s,z_\tau}$ as the set that contains all active interferers at $z_\tau$ in the $l$-th tier for set $s$; in other words, $\mathcal{I}_{s,z_\tau} = \{\mathcal{I}^{l}_{s,z_\tau},\ l \in \{1, 2, 3, \dots, l'\}\}$, where $l'$ is the maximum number of tiers. In a square cell with $N_b$ blocks, $l' = \lceil \sqrt{N_b} - 1 \rceil$, where $\lceil \cdot \rceil$ is the smallest following integer operator. Taking the expectation over the ensemble of all possible locations, (2.13) can be written as
\[ \mathrm{NMSE}_{\tau,r}(s, g', \mathcal{W}, \mathcal{K}_s, \mathcal{I}_{s,z_\tau}) = 2\big(1 - \rho(\delta(s, z_\tau))\big) + c_0\, \mathbb{E}\{d_{\tau,r}^{\alpha_\tau}\} + \sum_{l=1}^{l'} \sum_{i \in \mathcal{I}^{l}_{s,z_\tau}} c_1\, \mathbb{E}\left\{ \frac{d_{\tau,r}^{\alpha_\tau}}{d_{i,r}^{\alpha_i}} \right\} \tag{2.14} \]
where $c_0 = \frac{N_0}{E_p\, d_{\mathrm{ref}}^{\alpha_\tau - 2}} \left( \frac{4\pi}{\lambda_w} \right)^2$ and $c_1 = d_{\mathrm{ref}}^{\alpha_i - \alpha_\tau}$.
Note that the expected value of the distance ratio depends on the orientation of the interfering segment with respect to segment $g'$ and on the distance between the centers of the two segments. Due to the uniform structure of the training blocks, there are symmetries in every tier. Consequently, for the set of all segments in tier $l$ we can group the interfering segments into subsets, with index $m$, and evaluate the expectation of the distance ratio once for each subset. It is easy to verify that for tier $l$ we have $l + 1$ different subsets, i.e., $m \in \{0, \dots, l\}$. If we connect the centers of the segments in each tier with the center of $g'$, then we can choose the segment that makes the angle $\theta_m = \frac{m\pi}{4l}$ with the x-axis to represent the $m$-th subset. For instance, Fig. 2.2 shows the two subsets in the first tier. Therefore, for $l = 1$ it is enough to evaluate the expectation twice: once for the segments that are diagonally oriented with respect to $g'$ and once for the segments that are vertically/horizontally oriented with respect to $g'$.

[Figure 2.2: Diagonal and vertical/horizontal segments in the first tier, $l = 1$, and the two segments that represent them. The average interference distance depends on the segment side length $L_s$, i.e., $S_n$, and the guard distance $L_g$, i.e., $G_n$. In this figure $S_n = 1$ and $G_n = 2$.]

Consequently, we can write $\mathcal{I}^{l}_{s,z_\tau} = \{\mathcal{I}^{l,m}_{s,z_\tau},\ m \in \{0, \dots, l\}\}$, where $\mathcal{I}^{l,m}_{s,z_\tau}$ is the subset of interferers in tier $l$ that are in training set $s$ and belong to subset $m$.
For simplicity of notation we define the average distance ratio $\bar{d}^{\,l}_{m} = \mathbb{E}\left\{ \frac{d_{\tau,r}^{\alpha_\tau}}{d_{i,r}^{\alpha_i}} \right\}$, $i \in \mathcal{I}^{l,m}_{s,z_\tau}$, and the average distance $\bar{d}_\tau = \mathbb{E}\{d_{\tau,r}^{\alpha_\tau}\}$. Using the symmetry we can rewrite (2.14) as
\[ \mathrm{NMSE}_{\tau}(s, g', \mathcal{W}, \mathcal{K}_s, \mathcal{I}_{s,z_\tau}) = 2\big(1 - \rho(\delta(s, z_\tau))\big) + c_0\, \bar{d}_\tau + \sum_{l=1}^{l'} \sum_{m=0}^{l} \sum_{i \in \mathcal{I}^{l,m}_{s,z_\tau}} c_1\, \bar{d}^{\,l}_{m} \tag{2.15} \]
which is independent of $r$.
To find $\bar{d}^{\,l}_{m}$ and $\bar{d}_\tau$, the distribution of the devices is required. Without loss of generality and using Cartesian coordinates, consider that two opposite vertices of segment $g'$ are at $(0, 0)$ and $(L_s, L_s)$.
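Since the transmitters are uniformly distributed in segment $g'$, the probability density function (pdf) of the transmitter coordinates is
\[ f_\tau(x_\tau, y_\tau) = \frac{1}{L_s^2}, \quad x_\tau, y_\tau \in [0, L_s] \tag{2.16} \]
On the other hand, an interfering segment in tier $l$ has two opposite vertices at $(X_{l,m}, Y_{l,m})$ and $(X_{l,m} + L_s, Y_{l,m} + L_s)$, such that
\[ X_{l,m} = l\,(2L_c + 2L_g) \cos\!\left( \frac{m\pi}{4l} \right), \qquad Y_{l,m} = l\,(2L_c + 2L_g) \sin\!\left( \frac{m\pi}{4l} \right), \qquad m \in \{0, \dots, l\} \tag{2.17} \]
Similarly, the pdf of the location of an interfering device $i$, in tier $l$ and of distance type $m$, is
\[ f_i(x_i, y_i) = \frac{1}{L_s^2}, \quad \text{s.t. } x_i \in [X_{l,m},\, X_{l,m} + L_s] \text{ and } y_i \in [Y_{l,m},\, Y_{l,m} + L_s] \tag{2.18} \]
The receivers are distributed within a disk centered on the transmitter, so it is more convenient to condition the location of the receiver $r$ on the location of the transmitter. Thus, we have
\[ f_{r|\tau}(x_r, y_r \mid x_\tau, y_\tau) = \frac{1}{\pi L_c^2}, \quad x_r, y_r \ \text{s.t. } (x_\tau - x_r)^2 + (y_\tau - y_r)^2 \le L_c^2 \tag{2.19} \]
Using the conditional pdfs, the expected value $\bar{d}^{\,l}_{m}$ can be evaluated by
\[ \bar{d}^{\,l}_{m} = \int_{0}^{L_s} \!\! \int_{0}^{L_s} \!\! \int_{Y_{l,m}}^{Y_{l,m}+L_s} \!\! \int_{X_{l,m}}^{X_{l,m}+L_s} \!\! \int_{y_\tau - L_c}^{y_\tau + L_c} \!\! \int_{x_\tau - \sqrt{L_c^2 - y_r^2}}^{x_\tau + \sqrt{L_c^2 - y_r^2}} \frac{\big( (x_\tau - x_r)^2 + (y_\tau - y_r)^2 \big)^{\alpha_\tau/2}}{\big( (x_i - x_r)^2 + (y_i - y_r)^2 \big)^{\alpha_i/2}}\, f_{r|\tau}(x_r, y_r \mid x_\tau, y_\tau)\, f_i(x_i, y_i)\, f_\tau(x_\tau, y_\tau)\; dx_r\, dy_r\, dx_i\, dy_i\, dx_\tau\, dy_\tau \tag{2.20} \]
Note that $\bar{d}^{\,l}_{m}$ is a function of $S_n$ and $G_n$; therefore we may write $\bar{d}^{\,l}_{m}(S_n, G_n)$. The average distances are not always easy to calculate in closed form. We use numerical methods to calculate $\bar{d}^{\,l}_{m}$ in (2.20); more details are given in Sec. 2.3.3. On the other hand, $\bar{d}_\tau$ can be evaluated in closed form using polar coordinates:
\[ \bar{d}_\tau = \int_{0}^{2\pi} \!\! \int_{0}^{d_{\mathrm{ref}}} \frac{r^{\alpha_{\tau,1}+1}}{\pi L_c^2}\, dr\, d\theta + \int_{0}^{2\pi} \!\! \int_{d_{\mathrm{ref}}}^{L_c} \frac{r^{\alpha_{\tau,2}+1}}{\pi L_c^2}\, dr\, d\theta = \frac{2}{L_c^2} \left( \frac{d_{\mathrm{ref}}^{\alpha_{\tau,1}+2}}{\alpha_{\tau,1} + 2} + \frac{L_c^{\alpha_{\tau,2}+2} - d_{\mathrm{ref}}^{\alpha_{\tau,2}+2}}{\alpha_{\tau,2} + 2} \right) \tag{2.21} \]
where we have used $\alpha_\tau = \alpha_{\tau,1}$ when $d_{\tau,r} < d_{\mathrm{ref}}$ and $\alpha_\tau = \alpha_{\tau,2}$ otherwise.
The interfering segments at any tier are considered active only if they have an active transmitter.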
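The closed form (2.21) lends itself to a quick numerical cross-check. The sketch below is illustrative only: the function names, the seed, and the sample count are assumptions, and the Monte Carlo routine is a generic estimator rather than the numerical method referred to in Sec. 2.3.3. It evaluates (2.21) directly and compares it with an average over receivers drawn uniformly in the communication disk, using one propagation exponent below the breakpoint distance and another above it.

```python
import math
import random


def mean_distance_power(L_c, d_ref, alpha_1, alpha_2):
    """Closed-form average of d^alpha over a disk of radius L_c, as in (2.21)."""
    near = d_ref ** (alpha_1 + 2) / (alpha_1 + 2)
    far = (L_c ** (alpha_2 + 2) - d_ref ** (alpha_2 + 2)) / (alpha_2 + 2)
    return 2 / L_c ** 2 * (near + far)


def mean_distance_power_mc(L_c, d_ref, alpha_1, alpha_2, samples=100000, seed=0):
    """Monte Carlo estimate of the same average (sample count and seed are arbitrary)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        # The radius of a point drawn uniformly in a disk is L_c * sqrt(U), U ~ Uniform(0, 1).
        r = L_c * math.sqrt(rng.random())
        total += r ** (alpha_1 if r < d_ref else alpha_2)
    return total / samples
```

When both exponents equal 2, the closed form reduces to half the squared communication range, which gives an easy hand check before trying unequal exponents.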
And since in any scheduled segment there is up to one active transmitter in a time slot, there is a one-to-one correspondence between an interfereri and segmentg. Consequently, for givens and time slotz we can replace the setI l;m s;z by an indicator function1 g (s;z ;l;m), that can take the values: 1 g (s;z ;l;m) = 8 > < > : 1 :g2I l;m s;z 0 :g = 2I l;m s;z In other words, segment g causes interference if it contains an active interferer at z . Then (2.15) can be written as NMSE (s;g 0 ;W;K s ;I s;z ) = 2(1((s;z ))) +c 0 d + l 0 X l=1 l X m=0 X g2I l;m s;z c 1 1 g (s;z ;l;m) d l m (2.22) 36 Performance Metric Next, we take the expectation over the existence of an interferer in segmentg at time slotz . Given the random scheduling technique we can show Ef1 g (s;z ;l;m)g =P(1 g (s;z ;l;m) = 1) = k g w s (2.23) Using this result we have NMSE (s;g 0 ;W;K s ) = 2(1((s;z ))) +c 0 d + l 0 X l=1 l X m=0 X g2I l;m s;z c 1 k g w s d l m (2.24) Thek g , which are given in setK s , are i.i.d. Poisson distributed random variables, with proba- bility mass function (pmf) P k . Taking the expectation overk g we have NMSE (s;g 0 ;W;K s ) =2(1((s;z ))) +c 0 d +c 1 l 0 X l=1 l X m=0 X g2I l;m s;z ws X u i =0 P k (k g =u i jk g w s ) u i w s d l m (2.25) whereP k (:) is the probability of the enclosed event using pmf P k . Note that for givenw s we know thatk g w s . Using the definition of conditional expectation we have P k (k g =u i jk g w s ) = P k (k g =u i ) P k (k g w s ) P k (k g =u i ) (2.26) For large cells, i.e., large number of segments per training set, the approximation at the r.h.s. is reasonable. Using the same argument and since number of devices in a segment is Poisson random variable, we can show that ws X u i =0 u i P k (k g =u i )Efk g g =A (2.27) where A = L 2 s = 2Lc 2Sn+1 2 , i.e., the area of a segment. The r.h.s. is the average of number of 37 Performance Metric devices in a segment [66]. 
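Substituting (2.26) and (2.27) in (2.25) we have

\[ \mathrm{NMSE}_\nu(s,g_0,W,k_{g_0}) = 2\big(1-\rho(\tau(s,z_\nu))\big) + c_0\bar{d}_\nu + c_1\sum_{l=1}^{l_0}\sum_{m=0}^{l}\sum_{g\in\mathcal{I}_{s,z_\nu}^{l,m}} \frac{\lambda A\, \bar{d}_l^m}{w_s}, \tag{2.28} \]

where we have replaced $K_s$ with $k_{g_0}$, since we averaged over $k_g$ for all $g \ne g_0$. The conditional NMSE above is derived for a single link. Since there are $k_{g_0}$ devices in segment $g_0$, the average conditional NMSE is $\overline{\mathrm{NMSE}}(s,g_0,W,k_{g_0}) = \frac{1}{k_{g_0}}\sum_{\nu=1}^{k_{g_0}} \mathrm{NMSE}_\nu(s,g_0,W,k_{g_0})$; in other words we have

\[ \overline{\mathrm{NMSE}}(s,g_0,W,k_{g_0}) = \frac{1}{k_{g_0}}\sum_{\nu=1}^{k_{g_0}} 2\big(1-\rho(\tau(s,z_\nu))\big) + c_0\bar{d}_\nu + \frac{\lambda A c_1}{w_s}\sum_{l=1}^{l_0}\sum_{m=0}^{l}\sum_{g\in\mathcal{I}_{s,z_\nu}^{l,m}} \bar{d}_l^m. \tag{2.29} \]

The only term that depends on $\nu$ is the correlation term. For short training intervals the correlation function can be approximated by

\[ \rho(\tau(s,z_\nu)) \approx a_1 \tau^2(s,z_\nu) + a_0, \tag{2.30} \]

where $a_0$ and $a_1$ are constants. Substituting (2.7) in (2.30) and then the result in (2.29) we have

\[ \overline{\mathrm{NMSE}}(s,g_0,W,k_{g_0}) = 2 - 2a_1\Delta t^2\left( \sum_{j=s}^{s_b}\sum_{i=s}^{s_b} w_j w_i - \frac{2}{k_{g_0}}\sum_{\nu=1}^{k_{g_0}} z_\nu \sum_{j=s}^{s_b} w_j + \frac{1}{k_{g_0}}\sum_{\nu=1}^{k_{g_0}} z_\nu^2 \right) - 2a_0 + c_0\bar{d}_\nu + \frac{\lambda A c_1}{w_s}\sum_{l=1}^{l_0}\sum_{m=0}^{l}\sum_{g\in\mathcal{I}_{s,z_\nu}^{l,m}} \bar{d}_l^m. \tag{2.31} \]

Taking the expected value over the scheduling time of devices $z_\nu$ and noting that $z_\nu \in \{1,\dots,w_s\}$, for given values of $w_s$ and $k_{g_0}$ it can be shown [67] that

\[ E\left\{\frac{1}{k_{g_0}}\sum_{\nu=1}^{k_{g_0}} z_\nu\right\} = \frac{w_s+1}{2} \quad\text{and}\quad E\left\{\frac{1}{k_{g_0}}\sum_{\nu=1}^{k_{g_0}} z_\nu^2\right\} = \frac{w_s^2-1}{12} + \left(\frac{w_s+1}{2}\right)^2, \tag{2.32} \]

which are independent of $k_{g_0}$. Substituting this in (2.31), we have

\[ \overline{\mathrm{NMSE}}(s,g_0,W) = 2 - 2a_1\Delta t^2\left( \sum_{j=s}^{s_b}\sum_{i=s}^{s_b} w_j w_i - (w_s+1)\sum_{j=s}^{s_b} w_j + \frac{4w_s^2+6w_s+2}{12} \right) - 2a_0 + c_0\bar{d}_\nu + \frac{\lambda A c_1}{w_s}\sum_{l=1}^{l_0}\sum_{m=0}^{l}\sum_{g\in\mathcal{I}_{s,z_\nu}^{l,m}} \bar{d}_l^m. \tag{2.33} \]

The NMSE above is conditioned on all values of $w_j$ for $j \in \{s,\dots,s_b\}$. Taking the expectation over the random training periods $w_j$, and since the $w_j$ are i.i.d. $\forall j$,

\[ \overline{\mathrm{NMSE}}(s,g_0) = 2 - 2a_1\Delta t^2\left( \left(s_b - s + \tfrac{1}{3}\right)E\{w^2\} + \left(s_b^2 - 2ss_b + s^2\right)E^2\{w\} - \left(s_b - s + \tfrac{1}{2}\right)E\{w\} + \tfrac{1}{6} \right) - 2a_0 + c_0\bar{d}_\nu + E\left\{\frac{\lambda A c_1}{w}\sum_{l=1}^{l_0}\sum_{m=0}^{l}\sum_{g\in\mathcal{I}_{s,z_\nu}^{l,m}} \bar{d}_l^m\right\}, \tag{2.34} \]

where we have dropped the subscript of the training periods due to the fact that the number of devices is i.i.d. over segments.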
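The two moments in (2.32), and the simplification that leads to the last term inside the parentheses of (2.33), can be checked with exact arithmetic. The sketch below assumes that the slot index of a device is drawn uniformly from $\{1,\dots,w_s\}$, which is how we read the random scheduling step; the function names are illustrative only.

```python
from fractions import Fraction


def slot_moments(w_s):
    """Exact E{z} and E{z^2} for z uniform on {1, ..., w_s} (assumed model)."""
    slots = range(1, w_s + 1)
    mean = Fraction(sum(slots), w_s)
    second = Fraction(sum(z * z for z in slots), w_s)
    return mean, second


def closed_form_moments(w_s):
    """Right-hand sides of (2.32)."""
    mean = Fraction(w_s + 1, 2)
    second = Fraction(w_s * w_s - 1, 12) + mean ** 2
    return mean, second


def combined_term(w_s):
    """The term (4 w_s^2 + 6 w_s + 2) / 12 that appears in (2.33)."""
    return Fraction(4 * w_s * w_s + 6 * w_s + 2, 12)


for w_s in (1, 2, 5, 10):
    # Both tuples should coincide for every w_s.
    print(w_s, slot_moments(w_s), closed_form_moments(w_s))
```

Because `Fraction` is exact, agreement here is an algebraic check for the tested values of $w_s$ rather than a floating-point coincidence.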
Next we denote $E\{w\}$ by $\bar{w}$, $E\{w^2\}$ by $\overline{w^2}$ and $E\{\frac{1}{w}\}$ by $\bar{w}_{inv}$. Note that the average total delay follows directly from $\bar{w}$ and (2.6); in other words we have $T_{s_b+1} = \Delta t\, s_b\, \bar{w}$. To evaluate $\bar{w}$, $\overline{w^2}$ and $\bar{w}_{inv}$ we first find $P_w$, the pmf of the random variable $w$. Based on (2.5), for $w = q$ there must be at least one segment that has $q$ devices, and the rest must have strictly fewer than $q$ devices. In other words, if there are $N_b$ segments in training set $s$, then for $w = q$ there shall be $n$ segments that have $q$ devices, $N_b - n$ that have $< q$ devices, and zero segments that have $> q$ devices:

\[ P_w(w = q) = \sum_{n=1}^{N_b} \frac{N_b!}{n!\,(N_b-n)!}\, P_k(k = q)^n\, P_k(k < q)^{N_b-n}, \tag{2.35} \]

which is a function of $S_n$ and $G_n$. Specifically, for a network that has an integer number of blocks, every training set has $N_b$ segments, so that for a cell with side length $L_N$,

\[ N_b = \left\lceil \frac{L_N}{2L_c\left(1 + \frac{G_n}{2S_n+1}\right)} \right\rceil^2. \tag{2.36} \]

Furthermore, note that the number of devices in a segment follows a two-dimensional Poisson process, given by

\[ P_k(k = q) = \frac{(\lambda A)^q}{q!}\, e^{-\lambda A}. \tag{2.37} \]

Applying the definition of expectation we can find $\bar{w}$ and $\bar{w}_{inv}$ by

\[ \bar{w} = \sum_{q=1}^{\infty} q\, P_w(w = q), \qquad \bar{w}_{inv} = \sum_{q=1}^{\infty} \frac{1}{q}\, P_w(w = q). \tag{2.38} \]

Next, we average the conditional NMSE in (2.34) over $g_0$. However, by definition of $\mathcal{I}_{s,z_\nu}$ (and consequently the subset $\mathcal{I}_{s,z_\nu}^{l,m}$), only the last term in (2.34) depends on $g_0$; in fact it depends on the position of segment $g_0$ in the cell. Thus, averaging over all possible locations of $g_0$, we define the average transmitter-interferer distance ratio $\bar{d}_{Intf}$ as

\[ \bar{d}_{Intf} = \frac{1}{N_b}\sum_{g_0}\sum_{l=1}^{l_0}\sum_{m=0}^{l}\sum_{g\in\mathcal{I}_{s,z_\nu}^{l,m}} \bar{d}_l^m, \tag{2.39} \]

which is a deterministic function for a given network. For the special case when the first tier is the dominant tier, i.e., $l = l_0 = 1$, and ignoring the effect of the cell edge, we have $\bar{d}_{Intf} = 4(\bar{d}_1^0 + \bar{d}_1^1)$. In the Appendix, we find a more accurate approximation of $\bar{d}_{Intf}$ where we consider the effect of all far interferers, i.e., $l > 1$.
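The training-period statistics of (2.35)-(2.38) are straightforward to evaluate numerically. The following is a minimal sketch rather than the implementation used for the results in this chapter: function names are illustrative, the infinite sums are truncated at an assumed `q_max`, and the example values ($L_N = 600$ m, $L_c = 25$ m, $\lambda = 0.008$ devices/m$^2$, $S_n = 1$, $G_n = 0$) are chosen only for illustration.

```python
import math


def poisson_pmf(q, mean):
    """Poisson pmf (2.37) with mean = lambda * A, evaluated in the log domain."""
    return math.exp(q * math.log(mean) - mean - math.lgamma(q + 1))


def segments_per_training_set(L_N, L_c, S_n, G_n):
    """N_b from (2.36)."""
    return math.ceil(L_N / (2 * L_c * (1 + G_n / (2 * S_n + 1)))) ** 2


def training_period_pmf(N_b, mean, q_max):
    """P_w(w = q) for q = 0..q_max via the binomial sum (2.35).

    Suitable for moderate N_b only: math.comb values must fit in a float.
    """
    pmf, cdf_below = [], 0.0
    for q in range(q_max + 1):
        p_q = poisson_pmf(q, mean)
        pmf.append(sum(math.comb(N_b, n) * p_q ** n * cdf_below ** (N_b - n)
                       for n in range(1, N_b + 1)))
        cdf_below += p_q  # becomes P_k(k < q + 1) for the next iteration
    return pmf


def training_period_moments(pmf):
    """Truncated versions of the sums in (2.38), plus the second moment."""
    w_bar = sum(q * p for q, p in enumerate(pmf) if q >= 1)
    w2_bar = sum(q * q * p for q, p in enumerate(pmf) if q >= 1)
    w_inv = sum(p / q for q, p in enumerate(pmf) if q >= 1)
    return w_bar, w2_bar, w_inv


# Illustrative parameters (assumptions for this example).
L_N, L_c, density, S_n, G_n = 600.0, 25.0, 0.008, 1, 0
N_b = segments_per_training_set(L_N, L_c, S_n, G_n)
mean = density * (2 * L_c / (2 * S_n + 1)) ** 2  # lambda * A
pmf = training_period_pmf(N_b, mean, q_max=40)
print(N_b, training_period_moments(pmf))
```

By the binomial theorem, the sum in (2.35) equals $P_k(k \le q)^{N_b} - P_k(k < q)^{N_b}$, i.e., the pmf of the maximum of $N_b$ i.i.d. Poisson variables; this form is cheaper to evaluate and is a convenient cross-check.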
Accordingly we have

\[ \overline{\mathrm{NMSE}}(s) = 2 - 2a_1\Delta t^2\left( \left(s_b - s + \tfrac{1}{3}\right)\overline{w^2} + \left(s_b^2 - 2ss_b + s^2\right)\bar{w}^2 - \left(s_b - s + \tfrac{1}{2}\right)\bar{w} + \tfrac{1}{6} \right) - 2a_0 + c_0\bar{d}_\nu + \bar{w}_{inv}\,\lambda A c_1\, \bar{d}_{Intf}. \tag{2.40} \]

The average NMSE that we have so far is a function of $s$. However, since the randomness in the training sets, i.e., the number of devices and their locations within their segments, is i.i.d., and given the deterministic relation between the sets in terms of delay, we can average over $s$ using $\overline{\mathrm{NMSE}} = \frac{1}{s_b}\sum_{s=1}^{s_b} \overline{\mathrm{NMSE}}(s)$. We then have

\[ \overline{\mathrm{NMSE}} = 2 - 2a_0 - a_1\Delta t^2\left( \tfrac{1}{3}\left(2s_b^2 - 3s_b + 1\right)\bar{w}^2 - s_b\bar{w} + \left(s_b - \tfrac{1}{3}\right)\overline{w^2} + \tfrac{1}{3} \right) + c_0\bar{d}_\nu + c_1\lambda A\, \bar{w}_{inv}\, \bar{d}_{Intf}. \tag{2.41} \]

In the next subsection we study the practical impact of location knowledge on the overhead and the performance.

2.3.2 Impact of Location Information: Overhead and Performance

In location-based algorithms, tracking the location in real time poses practical difficulties. In this subsection we quantify the overhead requirement for location updates in a centralized implementation of LATS, and then we address the effect of location accuracy on the value of the NMSE.

2.3.2.1 Frequency of Location Update

In a centralized LATS, the BS needs to know the location of the devices on the grid. Consequently, at the beginning of every training round, devices have to send their new locations if they moved to new segments. Given the structure of LATS, the required "location" is merely the device segment and the training block. Thus, no real-time tracking of the accurate location is required. Note that for the devices to know the structure of the grid, they need to know two parameters: $S_n$ and a reference point, e.g., the BS location. Nevertheless, the amount of overhead and the number of location update packets could be a problem in some setups. In this part we quantify the frequency at which devices update their locations. Since the structure of the grid is defined by $S_n$ and $G_n$, they impact the amount of overhead.
We assume a crude mobility model to get a closed-form analytical expression, and then in Sec. 2.4 we compare it, through simulation, with a more realistic mobility model [4]. Initially, let every device move in a straight line, with a random speed $V \in [0, V_{max}]$ and a random angle $\theta \in [0, 2\pi]$, where $V$ and $\theta$ are uniformly distributed within their respective ranges. Further, we assume that devices do not change their direction or speed during the interval of simulation. We then have $V_X = V\cos(\theta)$ and $V_Y = V\sin(\theta)$, where $V_X$ and $V_Y$ are the velocity components of the device along the x-axis and y-axis, respectively. Ideally, a device needs to update its location once it moves to another segment; this happens if the device crosses any horizontal or vertical line of any segment. Thus, the Frequency of Location Update ($F_{LU}$) is

\[ F_{LU} = \frac{|V_X|}{L_s} + \frac{|V_Y|}{L_s}. \tag{2.42} \]

Assuming that $L_N \gg L_s$, we can ignore the effect of the initial location of the device. Next we substitute the values of $V_X$ and $V_Y$ into (2.42) and take the expected value of $F_{LU}$ over the speed and the angle of motion. Denoting $E\{F_{LU}\}$ by $\bar{F}_{LU}$ we have

\[ \bar{F}_{LU} = \frac{V_0}{2\pi L_s}\int_0^{2\pi} \big(|\cos(\theta)| + |\sin(\theta)|\big)\, d\theta = \frac{4V_0}{\pi L_s} = \frac{2V_0(2S_n+1)}{\pi L_c}, \tag{2.43} \]

where we have defined $V_0 = E\{V\}$. In the r.h.s. we wrote $L_s$ in terms of $S_n$. As shown in (2.43), the average number of location update packets per second is a linear function of $S_n$ only. Given $\bar{F}_{LU}$, $V_0$ and $L_c$, eq. (2.43) can be used as an upper bound for the values of $S_n$ that can be meaningfully chosen. The effect of location updates in the downlink, where the BS has to update the schedule of devices, is based on (2.43), but is more involved since it is related to the training periods $w_s$ and $w_{s'}$ of the original and the new training sets of the device, respectively. Thus, we will consider it in our future work.
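As a quick numerical illustration of (2.43), the sketch below evaluates the closed form, cross-checks the angular average $\frac{1}{2\pi}\int_0^{2\pi}(|\cos\theta| + |\sin\theta|)\,d\theta = 4/\pi$ with a midpoint rule, and inverts the relation to obtain the largest admissible $S_n$ for a given update budget. Function names and the example numbers (5 km/h, $L_c = 25$ m, a budget of 0.2 updates per device per second) are assumptions made for this example.

```python
import math


def mean_update_rate(V0, L_c, S_n):
    """Closed form (2.43): average location updates per device per second."""
    return 2 * V0 * (2 * S_n + 1) / (math.pi * L_c)


def mean_update_rate_numeric(V0, L_c, S_n, steps=100_000):
    """Same quantity, with the angular average computed by a midpoint rule."""
    L_s = 2 * L_c / (2 * S_n + 1)
    dtheta = 2 * math.pi / steps
    avg = sum(abs(math.cos((k + 0.5) * dtheta)) + abs(math.sin((k + 0.5) * dtheta))
              for k in range(steps)) / steps
    return V0 * avg / L_s


def max_segmentation(F_LU_max, V0, L_c):
    """Largest integer S_n whose average update rate does not exceed F_LU_max."""
    return math.floor(math.pi * L_c * F_LU_max / (4 * V0) - 0.5)


V0 = 5 / 3.6  # 5 km/h expressed in m/s
print(mean_update_rate(V0, 25.0, 1), mean_update_rate_numeric(V0, 25.0, 1))
print(max_segmentation(0.2, V0, 25.0))
```

The inversion in `max_segmentation` simply solves $\bar{F}_{LU} \le F_{LU}^{max}$ for $S_n$; a negative result means that even $S_n = 0$ exceeds the budget.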
2.3.2.2 Location Uncertainty

There are several factors that affect the accuracy of location information; for example, overhead reduction techniques or intrinsic location measurement impairments may lead to erroneous information about the locations. In this part we investigate the effect of location accuracy on the analysis derived in Sec. 2.3.1. First, assume that the true location of device $\nu$ is $(x_0, y_0)$, which belongs to a segment $g$, and that the noisy location is $(x_n, y_n)$. Consider the following linear model:

\[ x_n = x_0 + e_x \quad\text{and}\quad y_n = y_0 + e_y. \tag{2.44} \]

Assume that $e_x$ and $e_y$ are i.i.d. and have the pdf $f_e(e)$, which we assume to be symmetric. Since the original location of $\nu$ is uniformly distributed along the x-axis and the y-axis, as in (2.18), independently of the values of $e_x$ and $e_y$, the marginal pdf of the noisy device location along the x-axis can be evaluated by the convolution of $f_{e_x}(e_x)$ and $f_{x_0}(x_0)$; in other words we have $f_{x_n}(x_n) = f_{e_x}(e_x) * f_{x_0}(x_0)$, and similarly $f_{y_n}(y_n) = f_{e_y}(e_y) * f_{y_0}(y_0)$. For a simple uncertainty/noise model (footnote 8), assume that $f_e(e) = \frac{1}{2e_0}$ for $e \in [-e_0, e_0]$ for some distance $e_0$. Without loss of generality, assume that two opposite vertices of segment $g$ are $(0,0)$ and $(L_s, L_s)$; then we can evaluate the pdf of the new coordinates as indicated above. Consequently, the distribution along the x-axis, $f_{x_n}(x_n)$, is (when $e_0 < \frac{L_s}{2}$):

\[ f_{x_n}(x_n) = \begin{cases} \dfrac{e_0 + x_n}{2e_0 L_s}, & |x_n| \le e_0 \\[2mm] \dfrac{1}{L_s}, & e_0 < x_n \le L_s - e_0 \\[2mm] \dfrac{L_s + e_0 - x_n}{2e_0 L_s}, & L_s - e_0 < x_n \le L_s + e_0 \\[2mm] 0, & \text{otherwise.} \end{cases} \tag{2.45} \]

A similar pdf can be derived for $f_{y_n}(y_n)$ and for the case when $e_0 > \frac{L_s}{2}$. For $e_0 = 0$, it reduces to the marginal pdfs of (2.16) and (2.18). Interestingly, the additive location error pushes the edge of the segment by $e_0$.
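The trapezoidal density in (2.45) can be written down directly and sanity-checked by verifying that it integrates to one. This is a small sketch for the case $0 < e_0 < L_s/2$ only; the function names and the example values of $L_s$ and $e_0$ are assumptions for illustration.

```python
def noisy_coordinate_pdf(x_n, L_s, e_0):
    """Marginal pdf (2.45) of a reported coordinate, for 0 < e_0 < L_s / 2."""
    if not 0 < e_0 < L_s / 2:
        raise ValueError("this sketch covers only 0 < e_0 < L_s / 2")
    if -e_0 <= x_n <= e_0:
        return (e_0 + x_n) / (2 * e_0 * L_s)        # rising edge
    if e_0 < x_n <= L_s - e_0:
        return 1 / L_s                              # flat interior
    if L_s - e_0 < x_n <= L_s + e_0:
        return (L_s + e_0 - x_n) / (2 * e_0 * L_s)  # falling edge
    return 0.0


def integrate_pdf(L_s, e_0, steps=100_000):
    """Midpoint-rule integral of the pdf over its support [-e_0, L_s + e_0]."""
    width = (L_s + 2 * e_0) / steps
    return sum(noisy_coordinate_pdf(-e_0 + (k + 0.5) * width, L_s, e_0)
               for k in range(steps)) * width


L_s, e_0 = 50 / 3, 2.0  # example values (assumptions)
print(integrate_pdf(L_s, e_0))
```

The support widens from $[0, L_s]$ to $[-e_0, L_s + e_0]$, while the density at the nominal segment edges $x_n = 0$ and $x_n = L_s$ drops to half of its interior value.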
The distribution of devices in the whole cell will be uniform except at the cell edge; this can be ignored when the neighbor cells use orthogonal channels and for small values of $e_0$. However, the density of devices that belong to the same segment decreases as we get closer to its new boundaries. As a result, the induced interference due to location errors can be tolerated with a proper choice of the guarding and segmentation parameters. On the other hand, the scheduling process is not affected by the new location model, because $\nu$ is still listed with segment $g$. Therefore, the effect of the location error on the value of the NMSE in (2.41) is solely related to $\bar{d}_{Intf}$; more specifically, we have to replace (2.16) and (2.18) by modified pdfs, as in (2.45), and thus have $\bar{d}_l^m(S_n, G_n, e_0)$. In Sec. 2.4 we simulate the impact of location error on the value of the NMSE.

Footnote 8: This can also be considered as a simple overhead reduction method that allows a device to move up to a distance $e_0$, along the x-axis and/or y-axis, outside its original segment before it updates its location.

2.3.3 LATS Optimization Problem and Algorithm

We write the final optimization problem of LATS in terms of the parameters $S_n$ and $G_n$. Since we seek to minimize the NMSE over $S_n$ and $G_n$, to write the objective function (OBJ) we drop all terms in (2.41) that are independent of $S_n$ and $G_n$. Then, OBJ can be written as

\[ \mathrm{OBJ} = -a_1\Delta t^2\left( \tfrac{1}{3}\left(2s_b^2 - 3s_b + 1\right)\bar{w}^2 - s_b\bar{w} + \left(s_b - \tfrac{1}{3}\right)\overline{w^2} \right) + c_1\lambda A\, \bar{w}_{inv}\, \bar{d}_{Intf} \approx -a_1\Delta t^2\left( \tfrac{2}{3}s_b^2\bar{w}^2 - s_b\bar{w} + s_b\overline{w^2} \right) + c_1\lambda A\, \bar{w}_{inv}\, \bar{d}_{Intf}, \tag{2.46} \]

where the last approximation is good for $S_n > 0$. We can replace $\Delta t\, s_b\, \bar{w}$ by the total delay $T_{s_b+1}$. Depending on the application requirement, $T_{s_b+1}$ can be upper limited by the available time for training. In this subsection, for clarity, we write $T_{s_b+1}$, $\bar{w}_{inv}$, $\overline{w^2}$ and $\bar{d}_{Intf}$ as explicit functions of $S_n$ and $G_n$, in addition to the uncertainty distance $e_0$ for $\bar{d}_{Intf}$.
Finally, we can write the optimization problem as in (P-1):

\[ \begin{aligned} \arg\min_{S_n, G_n}\quad & -a_1\left( \tfrac{2}{3}T_{s_b+1}^2(S_n,G_n) - \Delta t\, T_{s_b+1}(S_n,G_n) + \Delta t^2 (2S_n+G_n+1)^2\, \overline{w^2}(S_n,G_n) \right) \\ & + \frac{4c_1\lambda L_c^2}{(2S_n+1)^2}\, \bar{w}_{inv}(S_n,G_n)\, \bar{d}_{Intf}(S_n,G_n,e_0) \\ \text{subject to:}\quad & \bar{w}_{inv}(S_n,G_n) = \sum_{q=1}^{\lceil \lambda L_N^2 \rceil} \frac{1}{q}\, P_w(w = q \mid S_n, G_n), \qquad \overline{w^2}(S_n,G_n) = \sum_{q=1}^{\lceil \lambda L_N^2 \rceil} q^2\, P_w(w = q \mid S_n, G_n), \\ & T_{s_b+1}(S_n,G_n) = \Delta t\, (2S_n+G_n+1)^2 \sum_{q=1}^{\lceil \lambda L_N^2 \rceil} q\, P_w(w = q \mid S_n, G_n), \qquad T_{s_b+1}(S_n,G_n) < \text{(available training time)}, \\ & G_n \in \{0, 1, 2, \dots\}, \qquad S_n \in \left\{0, 1, 2, \dots, \left\lfloor \frac{\pi L_c \bar{F}_{LU}}{4V_0} - \frac{1}{2} \right\rfloor\right\}, \\ & \bar{d}_{Intf}(S_n,G_n,e_0) \text{ is given by (2.20), (2.45) and (2.48, in the Appendix)}, \qquad P_w(w = q \mid S_n, G_n) \text{ is given by (2.35).} \end{aligned} \tag{P-1} \]

In (P-1) we simplified the OBJ function further using (2.4) and the definitions of $L_s$ and $A$, and replaced the infinite sums in (2.38) with ones that are upper limited by the average number of devices; this is reasonable given the homogeneous distribution assumption. This optimization problem can be solved numerically for the two parameters $S_n$ and $G_n$ based on known constants: $L_N$, $\lambda$, $L_c$, $e_0$, $c_1$, $V_0$, the acceptable $\bar{F}_{LU}$ and the available portion of the coherence time. Although the integration of (2.20) can be cumbersome, it is enough to evaluate it once, tabulate the values for the range of interest, and use a look-up table throughout the run of the scheme. In a centralized implementation of LATS, the BS has to know the device locations on the grid and then send back the pilot training schedule to the devices; consequently, when the devices move to new segments the schedule might need to be updated. However, since the environment and network structure change slowly over time, the optimization problem (P-1) need not be solved for every pilot scheduling.
2.4 Simulation and Discussion

In this section we investigate the behavior of LATS as a function of $S_n$ and $G_n$, study its scaling behavior as a function of cell area and coherence time, and compare its performance with other schemes as a function of device density. To further understand the difference between LATS and the optimal CSI acquisition scheme, we consider a simple example and compare the NMSE values of both as a function of coherence time. Next, we compare the derived average frequency of location updates $\bar{F}_{LU}$ with the model of [4]. Finally, we show the effect of location error on the NMSE.

We start by comparing a Monte Carlo evaluation of the NMSE with the derived approximation of (2.41) for several values of $S_n$ and $G_n$. The simulation parameters used throughout this section, unless mentioned otherwise, are: a cell side length $L_N = 600$ m, maximum communication range $L_c = 25$ m, training duration $\Delta t = 20\ \mu$s, a wavelength of 0.125 m, average device speed 5 km/h, and device density $\lambda = 0.008$ devices/m$^2$. The path loss coefficient is $\alpha = 2$ for distances smaller than $d_{ref}$ and $\alpha = 4$ for distances greater than $d_{ref}$, and the antenna height is $h_{Dev} = 1.5$ m. In addition, the channels are Rayleigh-fading; we assume a Jakes Doppler spectrum and thus an autocorrelation function that follows a Bessel function of zero order of the first kind, i.e., $\rho(\tau) = J_0(2\pi v_{max}\tau)$, where $v_{max}$ is the maximum Doppler shift [18]. Consequently, for the approximation (2.30) we take the first two coefficients of the Taylor expansion of the autocorrelation function; thus we have $a_0 = 1$ and $a_1 = -(\pi v_{max})^2$. The channel realizations are generated according to the method of [68]. The transmit power is chosen such that the average received SNR at distance $L_c$ is 10 dB.

Fig. 2.3 shows a comparison between Monte Carlo simulations and the theoretical values of the NMSE as a function of $S_n$ for fixed values of $G_n$.
In this figure we show two versions of the theoretical NMSE. In one version we use the $\bar{d}_{Intf}$ of (2.48); specifically, for every tier $l$ we use the approximation $\bar{d}_l^m = \frac{1}{2}(\bar{d}_l^0 + \bar{d}_l^l)$, so that the effect of all the tiers is considered. For the other version we assume that the first tier is the dominant tier and that every segment has exactly 8 neighbor segments. Note that both methods yield comparable results for the chosen simulation parameters, namely $\alpha$ and $L_c$, which reduce the effect of the far tiers. In general, both methods show good agreement with the simulation. The NMSE including all tiers provides, not surprisingly, better agreement with the numerical simulation results than the NMSE with the first tier only. However, both versions exceed the simulated NMSE when the delay is large, due to the approximation in (2.30). In addition, the first-tier NMSE exceeds the all-tiers NMSE when the guard distance is large; this is due to the fact that our first-tier approximation assumes that all first-tier interferers are possible for each segment in the cell (even for the segments at the border of the cell). The more accurate all-tiers model includes only interferers from those segments that are actually in the cell. Having thus established the validity of our analytical approximations, most of the subsequent figures will only show the analytical results and not the (computationally extremely expensive) Monte Carlo simulations.

Figure 2.3: Theoretical evaluation and simulation of the NMSE as a function of the segmentation parameter $S_n$, for $L_N = 600$ m, $L_c = 25$ m and $\lambda = 0.008$ dev./m$^2$. (Axes: $S_n$ vs. NMSE; curves: Sim., Th. All Tiers, Th. First Tier, for $G_n = 0$ and $G_n = 2$.)

Fig. 2.4 shows the performance trade-off as a function of the LATS parameters, namely the segmentation parameter $S_n$ (which impacts $L_s$ and $L_g$) and the guarding parameter $G_n$ (which controls $L_g$).
We see that when $G_n = 0$ the NMSE first decreases as $S_n$ increases. This is due to the fact that increasing the number of segments: (i) decreases the variance of the actual transmitter locations around the segment center and (ii) reduces the probability that nodes in neighboring blocks are on the air simultaneously. The NMSE then increases when $S_n$ exceeds a certain value; at this point outdatedness becomes the dominant effect. On the other hand, when $S_n = 0$ (and thus $L_s$ takes its maximum value), the NMSE is large when $G_n = 0$. When $G_n$ increases, the NMSE reduces significantly, due to the decrease of the interference. However, for large values of $G_n$, e.g., $G_n = 8$ in Fig. 2.4, the NMSE increases again due to the outdatedness that is caused by the additional non-active area, i.e., the guard space. The joint effect of $S_n$ and $G_n$, when they are both greater than zero, is somewhat subtle, because $S_n$ impacts both the segment size and the guard spacing. For example, when $G_n = 2$, the NMSE increases with $S_n$, since in this case a larger $S_n$ means a smaller guard distance. In general, large $S_n$ values induce large outdatedness; this also occurs when $S_n$ is small and $G_n$ is large. Note that the optimal values of $S_n$ and $G_n$ depend on several factors such as the propagation conditions in the cell, i.e., $\alpha$ and $d_{ref}$, and the coherence time.

Figure 2.4: All-tiers NMSE as a function of the segmentation parameter $S_n$, for $L_N = 1.4$ km, $L_c = 25$ m and $\lambda = 0.008$ dev./m$^2$. (Axes: $S_n$ vs. NMSE; curves: Th. All Tiers for $G_n = 0$, $G_n = 2$ and $G_n = 8$.)

The effect of $S_n$ and $G_n$ on the delay is further explained by Fig. 2.5, which shows the portion of time that is used for CSI acquisition by LATS, normalized by the channel coherence time, as a function of $S_n$ for several values of $G_n$. It is clear that large values of $S_n$ and $G_n$ are in fact expensive in terms of resource utilization.
As a result, the upper bound on $T_{s_b+1}$ in (P-1) can be chosen to restrict the range of $S_n$ and $G_n$ such that the system utilization requirement is met.

Figure 2.5: Fraction of the coherence time used by LATS, for coherence time 0.025 sec, $L_N = 1.4$ km, $L_c = 25$ m and $\lambda = 0.008$ dev./m$^2$. (Axes: $S_n$ vs. training time / coherence time; curves: $G_n = 0$, $G_n = 2$, $G_n = 8$.)

To understand the scaling behavior of LATS, we compare its performance as the cell side $L_N$ increases, for several values of device velocity, i.e., coherence times. Fig. 2.6 shows the NMSE against $L_N$ for several device velocities. We first highlight that as the area of the cell increases, the average training period $\bar{w}$ increases, as well as the number of far interferers. However, it is possible to show that $\bar{w}$ increases slowly, such that we can bound it with high probability for reasonable cell sizes; furthermore, the effect of far interferers vanishes for very large distances [18]. Consequently, for small device velocities the increase in $L_N$ has negligible impact; however, as the coherence time decreases, it is clear that the NMSE deteriorates with larger $L_N$. We also notice that for large device speeds and high device densities, CSI acquisition becomes unreliable.

Figure 2.6: NMSE vs. $L_N$ for several device velocities, for a fixed device density of 0.008 dev./m$^2$. (Axes: cell side length $L_N$ vs. NMSE; curves: device speeds of 1, 5, 10, 50 and 100 km/h.)

Next, we compare LATS with other schemes. Due to the scarce literature that tackles CSI acquisition in D2D, we restrict the comparison to a pure TDMA and a Random Access (RA) scheme. In TDMA, every device in the cell is given a time slot for training, thus no interference exists.
For random access, we use Carrier Sense Multiple Access with Collision Avoidance (CSMA/CA). Since all devices need to transmit their pilots, every device $\nu$ transmits a pilot with a given probability $p$. This technique is usually referred to as "p-persistent CSMA/CA"; it is a contention-based scheme that is used in wireless communication standards [18], e.g., IEEE 802.11 and IEEE 802.15.4. However, no back-off time is used here. Since the transmission probability $p$ affects the delay and the amount of interference, we run the simulation over a range of values of $p$. Ideally, if the transmitters are $2L_c$ apart from one another, the percentage of active transmitters is $p_0 = \frac{L_N^2}{\pi L_c^2 N}$, where $N$ is the total number of devices in the cell, i.e., $p_0$ represents the ratio of the maximum number of non-overlapping communication disks to the total number of devices. This is actually an optimistic assumption, since transmitting nodes can generally create interference in a disc whose radius exceeds $L_c$ by a factor that depends on the required SIR threshold and on propagation conditions. We assume that device $\nu$ and device $\nu'$ re-transmit their pilot tones if their corresponding communication disks overlap. However, we do not account for the time required to send feedback (ACK/NACK) messages (footnote 9). Consequently, the true values of the NMSE for p-persistent CSMA/CA are larger than what is displayed here.

The comparison is done over several device densities $\lambda \in \{2, 4, 6, 8, 10\} \times 10^{-3}$ devices/m$^2$, in a cell with $L_N = 600$ m. For LATS, we restrict the values to $S_n \in \{0, 1, 2, 3, 4\}$ and $G_n \in \{0, 1, 2\}$, while ignoring the overhead and time utilization requirements, i.e., $\bar{F}_{LU} = \infty$ and an unbounded training time. Then we perform a Monte Carlo simulation for each of the combinations of $S_n$ and $G_n$, and the minimum NMSE is displayed in Fig. 2.7 (as "Sim. min."). Also, we numerically solved the optimization problem in (P-1) for $S_n$ and $G_n$, then evaluated the NMSE (2.41) for these values (displayed in Fig. 2.7 as "Th. min.").
We further ran the simulation using the resulting $S_n$ and $G_n$ (displayed in Fig. 2.7 as "Th. Sol. min"). For the RA simulation we use $p \in \{0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1\} \times p_0$, and choose the one that results in the minimum NMSE for each device density; the carrier sensing time slot is chosen to be 1 $\mu$s. The optimal values of $G_n$ and $S_n$ obtained through simulation are identical to the solution of (P-1), and (as can be seen from Fig. 2.7) the theoretical value of the NMSE is within 2% of the simulation. We observe that LATS significantly outperforms both TDMA and RA. This large difference in performance is due to the fact that LATS takes into account the trade-off while TDMA and RA do not.

Footnote 9: This is clearly an idealization, since feedback/acknowledgment is a must in such networks.

Figure 2.7: NMSE for LATS, TDMA and RA schemes. (Axes: device density in dev./m$^2$ vs. NMSE, logarithmic; curves: Sim min, Th. min, Th. Sol. min, TDMA, RA.)

As we indicated earlier, finding the optimal solution to the CSI acquisition problem is NP-hard. However, to have a better understanding of the performance of the optimal scheme compared to LATS (footnote 10), we consider a simple example of 8 devices distributed randomly on a line of length $L_N = 150$ m; the devices have communication range $L_c = 25$ m. As a result, the segments, training blocks, etc. are reduced to lines. We find the NMSE value for the optimal scheme through brute-force simulation; note that for 8 devices there are around $10^5$ different scheduling instances. We also compare the result with TDMA and RA. Fig. 2.8 shows the results.

Footnote 10: Note that we assume that the optimal solution does not a priori know the instantaneous channel realizations; otherwise it negates the goal of the scheme, which is to learn exactly the CSI.

We first note that for a relatively small cell, with respect to the communication range, LATS has similar behavior as TDMA, since
it relies on random TDM scheduling. For large coherence times a TDMA scheme is close to the optimal. However, as the coherence time decreases, the optimal scheme groups the devices such that: (i) the largest number of simultaneously transmitting devices are scheduled last, and (ii) the devices that have the weakest links, e.g., the largest distances, are scheduled during the last time slots and with the minimum number of simultaneous devices. LATS, on the other hand, is based on the statistical quality of the CSI and the uniform distribution of the devices; it considers a uniform "weight" for the devices. This relaxation helped in designing a simple scheme that requires only the approximate location of the devices and a low schedule update rate, with good scaling properties. Finally, in contrast to Fig. 2.7, due to the size of the cell, RA does not show any advantage over TDMA for the simulated values of $p$.

Figure 2.8: LATS vs. optimal CSI acquisition and other schemes. Simple example: $L_N = 150$ m, $L_c = 25$ m and 8 devices. (Axes: coherence time vs. NMSE; curves: Bruteforce (Optimal), Segmentation (LATS), TDM, RA.)

Next, we consider the impact of mobility and the effect of errors in the location. We use the Semi-Markov Smooth (SMS) mobility model introduced in [4]. In this model every device has four different states. In the acceleration state, the "$\alpha$-phase", a device accelerates from zero up to some random velocity in a straight line with a random initial angle. After that, the device enters the "$\beta$-phase", where the device motion follows a Gauss-Markov model. Next, the device enters a stopping interval, the "$\gamma$-phase", where it decelerates back to zero. After the device stops it enters the fourth state, where it does not move for some time before it goes back to the first state, and so on. Every phase lasts for a random duration that depends on the phase.
The model guarantees a uniform spatial distribution of devices over time, smooth device turnover and realistic speed variation. For a detailed description of the model refer to [4]. To compare the simple motion model of Sec. 2.3.2 and the SMS model, we tune the parameters in SMS such that the average device speed is the same; namely, we assume that the average speed is 5 km/h. Fig. 2.9 shows the value of the frequency of location updates. It is clear that $\bar{F}_{LU}$ is close to the simulation results of the SMS model. Remaining differences in the results can be attributed to the variation in motion direction plus the difference in the values of the peak velocities. To reduce the value of $\bar{F}_{LU}$ we need to increase the size of the segments. This can be achieved by choosing a smaller $S_n$. Alternatively, as discussed in Sec. 2.3.2.2, we can allow LATS to work with outdated location information, i.e., use controlled location errors as an overhead reduction method. In that case, the effective segment size will no longer be $L_s$. To evaluate $\bar{F}_{LU}$ using (2.43), we can define the new segment side length as $\tilde{L}_s = L_s + 2e_0$. Fig. 2.9 compares the evaluation of $\bar{F}_{LU}$ based on (2.43) with the simulation of the SMS model for $e_0 = \frac{1}{3}L_s$. Firstly, notice that the amount of overhead reduces as expected. Secondly, the gap between the theoretical value of $\bar{F}_{LU}$ and the simulation of the SMS model increases when $e_0 > 0$.

Finally, we simulate the effect of location error on the value of the NMSE. Specifically, we consider the normalized change of the NMSE as a function of $e_0$. In Fig. 2.10 we consider several values of $S_n$ and $G_n$ and plot the normalized change of the NMSE, compared to the case when $e_0 = 0$, as a function of the distance error (in meters). The all-tiers approximation (2.48) is used to evaluate the theoretical value of the NMSE, where we used the modified pdfs, as in (2.45), to calculate the integrals of the expected distance ratio (2.20).
The figure shows that increasing the guard distance for a fixed segment size reduces the sensitivity to distance error. On the other hand, the segment size for a fixed guard distance, i.e., $G_n = 0$ in Fig. 2.10, has less impact on the NMSE values against distance error. In any case, LATS shows low sensitivity to small distance errors. Interestingly, we may think of $e_0$ as a third optimization parameter that can be used to modify the system behavior, which impacts the overhead, as shown in Fig. 2.9, as well as the performance.

Figure 2.9: $\bar{F}_{LU}$: number of location updates per device per second for SMS [4] and the simple motion model. (Axes: $S_n$ vs. location updates per device per second; curves: SMS, Th. Simple, Sim. Simple, for $e_0 = 0$ and $e_0 = L_s/3$.)

2.5 Appendix

Due to the construction of the grid, the average distance ratio $\bar{d}_l^m$ is maximum for $m = 0$ and minimum for $m = l$. To get a relatively easy form for $\bar{d}_{Intf}$, let us assume that $\bar{d}_l^m = \bar{d}_l = \frac{1}{2}(\bar{d}_l^0 + \bar{d}_l^l)$. Thus, the value of the interference that is received from tier $l$ is simply $\bar{d}_l$ scaled by the number of segments in that tier. Since we calculate the average over the segment locations, our goal is then to calculate the average number of segments per tier. Notice the symmetry of the number of segments per tier in a uniform cell; for example, any segment at the corner of the cell has 3 segments for $l = 1$, 5 segments for $l = 2$, etc. We can write

\[ \bar{d}_{Intf} = \frac{1}{N_b}\sum_{i=1}^{\lceil (l_0+1)/2 \rceil}\ \sum_{j=0}^{\lceil (l_0+1)/2 \rceil - i}\ \sum_{l=1}^{l_0} Z(l,j,i)\, \bar{d}_l\, \Big( 4\cdot\mathbb{1}(j=0) + 8\cdot\mathbb{1}(j>0) - \big(3\cdot\mathbb{1}(j=0) + 4\cdot\mathbb{1}(j>0)\big)\, \mathbb{1}\big(2(i+j) = l_0+2\big) \Big). \tag{2.47} \]

Figure 2.10: $\frac{\mathrm{NMSE}_{e_0} - \mathrm{NMSE}_{e_0=0}}{\mathrm{NMSE}_{e_0=0}}$ vs.
distance error $e_0$.

Here we consider the segments in one quadrant of the cell. Due to the symmetry, $Z$ is the number of segments in tier $l$ with respect to the segment $g$ that is in column $i$ and row $j$ above the diagonal segments, where $i \in \{1,\dots,\lceil \frac{l_0+1}{2} \rceil\}$ and $j \in \{0,\dots,\lceil \frac{l_0+1}{2} \rceil - i\}$, which can be written as

\[ Z(l,j,i) = \begin{cases} 8l, & l < i \\ 4l + 2i - 1, & i \le l < i+j \\ 2l + 2i + j - 1, & i+j \le l < l_0 - i - j + 2 \\ l_0 + 1, & l_0 - i - j + 2 \le l \le l_0 - i + 1 \\ 0, & \text{otherwise.} \end{cases} \]

The derivation is found below. Further, note that this result can be verified easily.

2.5.0.1 Derivation

The derivation goes as follows. We first argue the existence of symmetry in the number of segments in tier $l$ for segments in the cell. Then, we use the symmetry to derive the number of segments in tier $l$ for a subset of the segments in the cell. Finally, in the given subset we will notice a deterministic relation between the position of the segment in the subset and the number of segments in the tier; this will be the base step for writing the expression.

2.5.0.2 The Symmetry

The simple structure of the square cell and the definition of the interfering tiers in LATS create a symmetric relation between the segments. For instance, it is clear that any segment at the corner of the cell will have only three "possible" interfering segments in the first tier ($Z_g(1) = 3$), and five in the second ($Z_g(2) = 5$). Similarly, any segment that is at the edge of the cell, except the ones at the corners, will have five segments in the first tier ($Z_g(1) = 5$). In this part we utilize such symmetry to find the number of segments per tier, $Z_g(l)$, for a given segment $g$. Utilizing such symmetry we focus on a subset of all segments. For $\sqrt{N_b}$ segments per side, it is easy to see that for the segments at the diagonal and cross diagonal of the cell we can focus on a subset of them, e.g.,
the segments on $(i,i)$ for all $i \in \{1, \dots, \lceil \sqrt{N_b}/2 \rceil\}$, and notice that for each of the $(i,i)$ there are 3 other segments that have the same number $Z$, except that when $\sqrt{N_b}$ is odd the segment $(\lceil \sqrt{N_b}/2 \rceil, \lceil \sqrt{N_b}/2 \rceil)$ is unique. Similarly, for the segments off the diagonal we have symmetry, and can focus on the segments $(i, i+j)$ for all $i \in \{1, \dots, \lceil \sqrt{N_b}/2 \rceil\}$ and all $j \in \{0, \dots, \lceil \sqrt{N_b}/2 \rceil - i\}$; for each of these segments there are 7 other segments in the cell that have the same $Z$, except that when $\sqrt{N_b}$ is odd, the segments at $(i, \lceil \sqrt{N_b}/2 \rceil)$ for all $i \in \{1, \dots, \lceil \sqrt{N_b}/2 \rceil - 1\}$ have 3 other segments in the cell that have the same $Z$. Figure 2.11 depicts such a subset and the segments that have the same values of $Z$. As a result, in the next part we will focus on the subset of segments that belong to $(i, i+j)$ for all $i \in \{1, \dots, \lceil \sqrt{N_b}/2 \rceil\}$ and all $j \in \{0, \dots, \lceil \sqrt{N_b}/2 \rceil - i\}$, and derive the value of $Z(l,j,i)$, which can be easily mapped to $Z_g$ above.

[Figure 2.11: Subset of segments and the segments that have the same value of $Z$.]

2.5.0.3 Number of Segments Per Tier

To derive the value of $Z$ we first find it for the diagonal segments, i.e., $(i,i)$ for all $i \in \{1, \dots, \lceil \sqrt{N_b}/2 \rceil\}$, then for the ones at the edge, i.e., for segments at $(1, 1+j)$ for all $j \in \{0, \dots, \lceil \sqrt{N_b}/2 \rceil - 1\}$; finally we generalize that for the rest of the segments in the subset.

2.5.0.4 Diagonal Segments

First notice that the maximum number of segments in any tier is $8l$. By construction of the segments in the subset, tier $l$ has the full number of segments, i.e., $8l$, if $l < i$. Once $l \ge i$, it is impossible to have a full tier. For instance, for $(1,1)$, we have $Z(1,0,1) = 3$, $Z(2,0,1) = 5$, $Z(3,0,1) = 7$, etc. In other words, we have $Z(l,0,1) = 2l + 1$, where $1 \le l < l_0 + 1$.
For the segment $(2,2)$, we have $Z(2,0,2) = 7$, $Z(3,0,2) = 9$, $Z(4,0,2) = 11$, which can be generalized as $Z(l,0,2) = 2l + 3$, where $2 \le l < l_0$; Fig. 2.12 displays the tiers for $(2,2)$. We can generalize this to $Z(l,0,i) = 2l + 2i - 1$ where $i \le l < l_0 - i + 2$; note that $i + l \le l_0 + 1$, otherwise the tier will be outside the cell. To summarize, we can write the value of $Z$ as

$$Z(l,0,i) = \begin{cases} 8l, & l < i \\ 2l + 2i - 1, & i \le l < l_0 - i + 2 \\ 0, & \text{otherwise} \end{cases}$$

[Figure 2.12: Tiers for a segment at $(2,2)$, i.e., $i = 2$, $j = 0$. ($N_b = 10$, $l_0 = 9$). Marked: segment (2,2) and its tiers 1 and 8.]

Edge Segments

Edge segments are on $(1, 1+j)$ for all $j \in \{1, \dots, \lceil \sqrt{N_b}/2 \rceil - i\}$ with $i = 1$, where we have excluded the segments in the corners since they were included in the previous case. As shown in Fig. 2.13, in the first tier, $l = 1$, for segment $(1,4)$ we have $Z(1,3,1) = 5$. For $l = 2$ we notice $Z(2,3,1) = 9$, and $Z(3,3,1) = 13$; this can be written as $Z(l,j,1) = 4l + 1$ ($j > 0$), for $1 \le l < j + 1$. Once $l \ge j + 1$ we have a different series; it is the same as the diagonal with an additional number of segments that depends on $j$: $Z(4,3,1) = 12$, $Z(5,3,1) = 14$, $Z(6,3,1) = 16$. We have $Z(l,j,1) = 2l + j + 1$. However, when $j + 1 + l > l_0 + 1$ we have $Z(l,j,1) = l_0 + 1$, where $l \le l_0$. We can summarize that by:

$$Z(l,j,1) = \begin{cases} 4l + 1, & 1 \le l < j + 1 \\ 2l + j + 1, & j + 1 \le l < l_0 - j + 1 \\ l_0 + 1, & l_0 - j + 1 \le l \le l_0 \\ 0, & \text{otherwise} \end{cases}$$

General Z

Here we consider $i > 1$ and $j > 0$. We first notice that $Z(l,j,i) = 8l$ when $l < i$. Once $l \ge i$ and when $l < j + i$ we have a case that is slightly similar to the "edge segments"; however, we have additional segments in the tier depending on $i$, thus in this case we have $Z(l,j,i) = 4l + 1 + 2(i - 1) = 4l + 2i - 1$. For instance, for the segment $(3,5)$ we have $Z(3,2,3) = 17$, $Z(4,2,3) = 21$.
Once $l \ge i + j$ and $l + i + j \le l_0 + 1$ (i.e., $i + j \le l < l_0 - i - j + 2$), for $(i, i+j)$ we have a case slightly similar to the diagonal elements on $(i,i)$; however, we have to make a modification depending on the value of $j$, and we have $Z(l,j,i) = 2l + 2i + j - 1$. If $l + i + j > l_0 + 1$ but $l + i \le l_0 + 1$ (i.e., $l_0 - i - j + 2 \le l \le l_0 - i + 1$), we have $Z(l,j,i) = l_0 + 1$. Figure 2.14 shows the tiers for segment $(3,5)$ (i.e., $i = 3$, $j = 2$). For $l < 3$, we have a full tier, e.g., $Z(2,2,3) = 8 \cdot 2 = 16$; when $3 \le l < 5$ we have the second case, $Z(4,2,3) = 4 \cdot 4 + 2 \cdot 3 - 1 = 21$. The third case occurs when $5 \le l < 6$, where we have $Z(5,2,3) = 17$. When $6 \le l < 8$ we have $Z(6,2,3) = 10$. For larger values of $l$, we have $Z(l,j,i) = 0$. We can write the general case as

$$Z(l,j,i) = \begin{cases} 8l, & l < i \\ 4l + 2i - 1, & i \le l < i + j \\ 2l + 2i + j - 1, & i + j \le l < l_0 - i - j + 2 \\ l_0 + 1, & l_0 - i - j + 2 \le l \le l_0 - i + 1 \\ 0, & \text{otherwise} \end{cases}$$

As mentioned, the four cases are a generalization of the diagonal and edge segments.

[Figure 2.13: Tiers for a segment at $(1,4)$, i.e., $i = 1$ and $j = 3$. ($N_b = 10$, $l_0 = 9$). Marked: segment (1,4) and its tiers 3, 6, and 9.]

[Figure 2.14: Tiers for a segment at $(3,5)$, i.e., $i = 3$ and $j = 2$. ($N_b = 10$, $l_0 = 9$). Marked: segment (3,5) and its tiers 4, 5, and 7.]

2.5.0.5 Average Number of Segments Per Tier

We can now write the average number of segments per tier for a square cell with $N_b$ segments.
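The four-case expression for $Z(l,j,i)$ and the symmetry weights used in the averaging can be checked numerically. The following is a minimal sketch and not part of the original derivation. It assumes that tier $l$ of a segment is the set of segments at Chebyshev distance $l$ inside the cell (consistent with the counts $8l$, 3, and 5 quoted above) and that the cell has $l_0 + 1$ segments per side, so that $N_b = (l_0 + 1)^2$ (matching the figures, where 10 segments per side go with $l_0 = 9$). All function names are illustrative.

```python
from math import ceil


def z_closed_form(l, j, i, l0):
    """Four-case closed form for Z(l, j, i) with l0 + 1 segments per side."""
    if l < i:
        return 8 * l                      # full tier
    if l < i + j:
        return 4 * l + 2 * i - 1          # tier cut by one side of the cell
    if l < l0 - i - j + 2:
        return 2 * l + 2 * i + j - 1      # tier cut by two sides of the cell
    if l <= l0 - i + 1:
        return l0 + 1                     # only one full row of the tier remains
    return 0


def z_brute_force(l, row, col, l0):
    """Count segments at Chebyshev distance l from (row, col) (assumed tier definition)."""
    side = l0 + 1
    return sum(
        1
        for r in range(1, side + 1)
        for c in range(1, side + 1)
        if max(abs(r - row), abs(c - col)) == l
    )


def average_closed_form(l, l0):
    """Average number of segments in tier l, using the symmetry weights."""
    half = ceil((l0 + 1) / 2)
    total = 0
    for i in range(1, half + 1):
        for j in range(0, half - i + 1):
            weight = 4 if j == 0 else 8
            if 2 * (i + j) == l0 + 2:     # odd side length: middle row/column
                weight -= 3 if j == 0 else 4
            total += weight * z_closed_form(l, j, i, l0)
    return total / (l0 + 1) ** 2


def average_brute_force(l, l0):
    """Average of the brute-force count over every segment of the cell."""
    side = l0 + 1
    total = sum(
        z_brute_force(l, r, c, l0)
        for r in range(1, side + 1)
        for c in range(1, side + 1)
    )
    return total / side ** 2


def closed_form_matches(l0):
    """True if the closed forms agree with brute force for every tier."""
    half = ceil((l0 + 1) / 2)
    for l in range(1, l0 + 1):
        for i in range(1, half + 1):
            for j in range(0, half - i + 1):
                if z_closed_form(l, j, i, l0) != z_brute_force(l, i, i + j, l0):
                    return False
        if abs(average_closed_form(l, l0) - average_brute_force(l, l0)) > 1e-9:
            return False
    return True
```

Calling `closed_form_matches(9)` covers the even case used in the figures (10 segments per side), and `closed_form_matches(4)` covers an odd side length, where the middle row and column receive the reduced weights.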
We use the arguments in subsection 2.5.0.2 to consider the odd and even cases.

$$Z(l) = \frac{1}{N_b} \sum_{i=1}^{\lceil \frac{l_0+1}{2} \rceil} \sum_{j=0}^{\lceil \frac{l_0+1}{2} \rceil - i} Z(l,j,i) \Big( 4 \cdot \mathbb{1}(j=0) + 8 \cdot \mathbb{1}(j>0) - \big( 3 \cdot \mathbb{1}(j=0) + 4 \cdot \mathbb{1}(j>0) \big)\, \mathbb{1}\big( 2(i+j) = l_0 + 2 \big) \Big), \quad \forall l \in \{1, \dots, l_0\}. \qquad (2.48)$$

Chapter 3

Expected Discovery Time With Prior Information

3.1 Introduction

In peer-to-peer (P2P) communication, wireless nodes transmit the data packets directly between nodes, without detour via the infrastructure. This provides flexibility and low energy consumption. It further can provide high spectrum efficiency, which improves the sum rate capacity in the network. For these reasons, P2P has been studied extensively for many years, and is used whenever dedicated infrastructure is either too expensive (often in sensor networks), impractical to deploy (e.g., military communications), or not spectrally efficient (ad hoc communications in wireless LANs). Furthermore, recent system designs have found the combination of infrastructure-based transmission of control information with P2P based payload communications to be both robust and very spectrally efficient. Consequently, the Third-Generation Partnership Project (3GPP) added this communication model, under the name of device-to-device (D2D) [14, 23], as one feature in LTE-Advanced [2], and also will make it a part of the next generation mobile networks (5G) [8].

However, setting up and scheduling a direct communication link between two nodes requires that such a link has sufficient channel quality (low attenuation); nodes that fulfill this condition and thus can effectively communicate with each other are called "neighbors". To identify these high quality links, nodes have to perform neighbor discovery. In case nodes are equipped with adaptive directional antennas, they furthermore need to determine the required beam direction to communicate with a neighbor.
In a typical neighbor discovery process, at every time instant, an active node acts as a transmitter or a receiver. When a node is in transmission mode, it transmits its unique identifier (ID) over the wireless channel. Nodes that successfully receive the ID are considered neighbor nodes associated with the received ID. The neighbor discovery process needs to be repeated over several time slots, due to the half-duplex constraint of the transceivers, the presence of a sleeping mode of the nodes, and especially the possible interference from other nodes. An important class of discovery algorithms are random algorithms, where each node randomly selects the time and direction of transmission (see Sec. 3.2 for more details); this category is at the center of our work.

Typically, the goal for neighbor discovery schemes is to minimize the time required for nodes to discover their neighbors. Modeling the required time for neighbor discovery is thus of prime importance, as insight into its dependence on system parameters makes it possible to find schemes that accelerate network initialization and reduce energy consumption. Furthermore, the theoretical formulation of the discovery time, possibly for simplified models, provides a benchmark that may facilitate development of complex schemes.

In addition, in several cases, nodes might have prior information about their neighbors. This information can be used to improve the neighbor discovery process. The prior information could be available through assistance of a base-station (BS) [69] [70], cooperative discovery schemes, prior neighbor discoveries and several other means. Dual band systems [71] are an interesting example where prior information could be available; the most practically important case is where nodes use both the cmWave and millimeter-wave (mmWave) bands.¹ This raises challenges in neighbor discovery.
Since the propagation characteristics in the two bands are correlated but not identical, neighbor relations in the two bands could be different. An efficient resource utilization needs to exploit the finite correlation between the bands.

¹ In fact, as highlighted in Chapter 1, it is expected that service providers, in the next generation wireless networks, will add mmWave links to the current (licensed or unlicensed) cmWave bands [8]. We use "cmWave" as shorthand for "sub-6 GHz" and "mmWave" for "above 6 GHz".

The neighbor discovery problem has been addressed extensively in the literature. The neighbor discovery schemes can be classified depending on how nodes choose their transmission decisions: (i) deterministic, (ii) random or (iii) partially random. In deterministic schemes nodes follow predefined sequences, such as specific on-off patterns in single antenna systems [72], or given sequences to scan the different directions in multi antenna systems [73, 74]. In random schemes, nodes choose the transmission states randomly; a famous example is the birthday protocol [75]. Furthermore, in multi-antenna (adaptive beamforming) systems, nodes randomly choose the beam directions and transmission states with predefined probabilities [76]. In partially random schemes, a combination of random and deterministic techniques is used; for instance, nodes may sequentially scan all directions and choose their transmission state at random [77]. Although deterministic and partially random schemes may outperform fully random schemes in a number of scenarios, in this work we consider random schemes, since random schemes are a class of algorithms widely investigated in the literature, and in many cases they offer relative simplicity and have good scaling capabilities.
Random neighbor discovery schemes can be further sub-categorized depending on the handshaking procedure: (i) one-way neighbor discovery, where nodes assume no handshaking procedure is used, (ii) two-way handshaking, where a feedback is transmitted in response to a successful discovery, and (iii) three-way (or more) handshaking procedure, where the feedback and confirmation are exchanged between the transmitter and the receiver, see [73]. Although the schemes (ii) and (iii) incur longer time-slots, they can reduce the discovery time due to the adaptive nature of the schemes. The scheme (i) could be viewed as the basic and the "natural" random neighbor discovery scheme, where system requirement and algorithm design are relatively simple.

In this chapter, we study the expected discovery time for randomized directional one-way neighbor discovery when prior information is available. With prior information, nodes can have different transmission and beam-steering probabilities, which results in a generalized and mathematically challenging multivariable model for expected discovery time. We assume the knowledge of supersets of each node's neighbors. Although this assumption seems an idealization of some practical scenarios, it could be reasonable for the cases mentioned earlier, such as BS assisted and dual band systems. The superset assumption is reasonable in the latter case due to the channel attenuation characteristics of the two bands [18]. Nevertheless, the proposed model and bounds in this chapter hold true for any one-way neighbor discovery scheme. Note that when no prior information is available, the analysis provides the ultimate performance limits of such randomized schemes. Furthermore, we will consider the impact of uncertainty in the prior knowledge (relaxing the superset assumption) in Sec. 3.5.
3.1.1 Related Work

Several other works have analyzed the expected discovery time [73, 78–83]; however, network models, neighbor discovery techniques and other assumptions are significantly different from our work. Refs. [78–81, 83] use the clique assumption to derive the expected discovery time, i.e., that every node and its neighbors constitute a clique such that each node is a neighbor to all other nodes in the clique. As a result, all nodes transmit with equal probability. Other works, e.g., [73, 82], use an equivalent assumption, namely that all nodes use equal transmission probability. This is reasonable for a uniform neighbor relation for all nodes, i.e., nodes have an equal number of neighbors in all directions.² Refs. [78, 79, 81–83] consider single antenna systems. Ref. [73] considers beam-steering capabilities of nodes, but assumes uniform probabilities for steering the beams into different directions. In reality, with the prior information assumption, the transmission probability and beam-steering probabilities need not be the same. In addition, the neighbor discovery schemes considered in [73, 79, 83] consider handshaking and collision detection procedures, while the work in this chapter is based on a one-way neighbor discovery model. Ref. [80] offers a similar approach to our work, as it derives the expected discovery time for multi-antenna one-way (in addition to two-way) schemes; however, it uses the simplifying assumption of uniform transmission probabilities and beam-steering probabilities for one-way schemes, and as a result the analysis is not applicable to the problem at hand.

² Thus, the term uniform neighbor relation here and throughout the chapter also indicates a uniform or isotropic structure of the network.

Other papers minimize the discovery time by maximizing the probability of success [76], or the number of discovered links in a time slot [75]. Thus no explicit expression of discovery time is required.
These methods work in a network with equal and uniform probability of success, which is not the case in our setting. The dual band communication scenario can also be viewed as an example of multi-channel communication. Several schemes addressed the multi-channel neighbor discovery, however, the basic assumptions are substantially different. For instance, [84] and [85] provide discovery time analysis based on the clique assumption. Furthermore, some suggested schemes assume discovery over phases and dynamic transmission probability. In addition, they consider perfect neighbor relations over the frequencies. [86] considers the problem where every node exists in a subset of the frequencies and a node can transmit and receive in a single frequency at a time. The transmission probabilities in [86] are designed based on the degrees of the nodes, which could be different for different nodes. Yet the time analysis is based on the pre-specified probabilities of transmission and using bounds on probability of success. Additionally, the schemes aim to discover the nodes in all frequencies, where nodes switch between the frequencies uniformly at random. However, in the envisioned neighbor discovery scenario in the dual band example, nodes are aware of neighbors in the cmWave band and need to discover their neighbors in the mmWave band, thus no switching is necessary. The network assisted neighbor discovery problem in D2D was mainly addressed in the context of current and future technologies. For instance [52] and [87] discuss how to efficiently integrate the D2D neighbor discovery with the LTE-Advanced systems. Thus the neighbor discovery models and the provided analysis are different from the current work. Finally, our work in [71] considers the neighbor discovery in dual band systems, where we developed a distributed scheme that depends on the local expected discovery time. However, no analysis of the expected discovery time or impact of prior neighbor information is given. 
3.1.2 Chapter's Objectives

The assumption of a uniform neighbor relation for all nodes becomes invalid under the prior neighbor information assumption. Intuitively, with prior information, the transmission and beam-steering probabilities could be different for different nodes. In this chapter we tackle three main questions related to the expected time for one-way directional neighbor discovery schemes: (i) how to model it, (ii) how to optimize it, and (iii) how the first two are impacted by a probabilistic uncertainty on the prior knowledge. The answers to these questions are also related to any one-way neighbor discovery scheme.

3.1.3 Chapter Organization

The remainder of the chapter is organized as follows: Section 3.2 introduces the system model with some possible implementation examples and the key parameters in the randomized schemes, namely the probability of successful discovery; we also discuss different objective functions related to discovery time. Section 3.3 formulates the expected discovery time as a non-uniform coupon collector problem, and provides upper and lower bounds for the expected discovery time. Section 3.4, based on the bounds, proposes two metrics for expected discovery time minimization. Section 3.5 extends the model of Section 3.2 by introducing uncertainty to the prior information. Section 3.6 provides extensive simulation results to illustrate the performance of the proposed optimization techniques, the one-way neighbor discovery [76], as well as the lower bound. Finally, the appendix, Section 3.7, provides additional discussions and derivations for some of the chapter's theorems. The appendix is concluded with a brief discussion related to two low complexity approaches to the neighbor discovery with prior information.

3.2 System Model and Randomized Neighbor Discovery

3.2.1 System Model

3.2.1.1 The Basic Model

We consider a synchronized time-slotted wireless network with $N$ static nodes.
Nodes use switched beam antenna arrays with beam width $\omega$. To be able to cover its surroundings, each node requires $\Theta = \lceil 2\pi/\omega \rceil$ directions. Let $\theta \in \{1, \dots, \Theta\}$ denote a direction. Further, due to cost and hardware complexity we assume that nodes have single RF chains, and thus nodes can transmit or receive in a single direction at a time. We also assume that node $i$ can only communicate with another neighboring node $j$ over a single direction (e.g., line of sight), denoted by $\theta_{i,j}$.³ Let $\bar{\mathcal{N}}_{i,\theta}$ be the superset of nodes that $i$ can communicate to in direction $\theta$, and let $\bar{\mathcal{N}}_i = \cup_\theta \bar{\mathcal{N}}_{i,\theta}$. Further, let $\mathcal{N}_{i,\theta}$ denote the set of neighbors that $i$ can communicate with effectively in direction $\theta$, i.e., the "true" set of neighbors in direction $\theta$, and let $\mathcal{N}_i = \cup_\theta \mathcal{N}_{i,\theta}$. Each node is assigned a unique ID and other nodes will recognize it as a neighbor when they receive this ID. We consider the collision model, where a node can receive the transmitted ID in a certain direction if no other neighbor of the receiving node transmits to it in the same direction, i.e., no other node's ID collides with the transmitted ID.

The goal of the directional neighbor discovery problem is for each node $i$ to identify all its neighbors $\mathcal{N}_{i,\theta}$ for all $\theta$ with the prior knowledge of $\bar{\mathcal{N}}_{i,\theta}$ for all $\theta$. In Sec. 3.5 we generalize the model by allowing for uncertainty in $\bar{\mathcal{N}}_{i,\theta}$, due for example to limited motion or mismatch in node capabilities over different bands in dual band systems. Finally, note that the determination of the tuning parameters of the randomized scheme requires a central controller or a cooperative/distributed scheme (see Appendix 3.7.4); the latter is out of the scope of this manuscript.

3.2.1.2 Implementation Examples

The model above could be realized in different applications. Here we highlight three possible setups.

• Dual band system. In this chapter we will frequently refer to it, as it provides a simple illustrative example.
Given the channel characteristics of the cmWave and mmWave bands [18], we can view $\bar{\mathcal{N}}_{i,\theta}$ and $\mathcal{N}_{i,\theta}$ as neighbor nodes in direction $\theta$ in the cmWave band and mmWave band, respectively; Fig. 3.1 shows an illustration of neighbor relations in a dual band system. Nodes could perform neighbor discovery in one band, e.g., the lower band, to get $\bar{\mathcal{N}}_{i,\theta}$.
• BS-assisted D2D. When the BS has access to device locations, for each node $i$ it could estimate $\bar{\mathcal{N}}_{i,\theta}$. It is important to keep in mind that due to shadowing/blocking in wireless communication, location proximity does not necessarily translate to a neighborhood relation.
• Networks that require frequent hello-like message exchange for neighbor discovery and topology maintenance. This could possibly include some ad hoc networks; the set $\bar{\mathcal{N}}_{i,\theta}$ in this case could be the set of previously discovered nodes in direction $\theta$.

[Figure 3.1: Neighbors of node $i$ in the cmWave and mmWave frequency bands. In this example, in direction $\theta_i$ node $i$ has three neighbors in the lower frequency band and one neighbor in the upper frequency band, i.e., $|\bar{\mathcal{N}}_{i,\theta_i}| = 3$ and $|\mathcal{N}_{i,\theta_i}| = 1$.]

³ Note that with some straightforward modification to the probability of success, the analysis and the bounds still hold if multi-directional communication is assumed; however, the convexity of the resulting optimization problem needs further investigation.

3.2.1.3 Notation

We use calligraphic uppercase letters to denote random variables (r.v.), events, or sets, e.g., $\mathcal{N}$; normal uppercase letters to denote constants or cardinality of sets, e.g., $N_x = |\mathcal{N}_x|$, where $|\cdot|$ is the cardinality operator. We use lowercase letters to denote variables or realizations of random variables.
Further, we use several indexing methods when we refer to beam directions: (i) double index, e.g., $\theta_{i,j}$ for the beam direction that $i$ uses to communicate with $j$, which is different from $\theta_{j,i}$; (ii) single index, e.g., $\theta_i$ to represent the beam direction that $i$ uses at a certain time slot; and (iii) non-indexed, $\theta$, for a generic beam direction. Table 3.1 provides the key mathematical symbols used in this chapter.

Table 3.1: Table of key mathematical symbols
$\theta_i$ : Beam direction of node $i$ in a time slot (random variable).
$q_{i,\theta}$ : Probability that device $i$ steers its beam to direction $\theta$.
$p_{i,\theta}$ : Given $\theta_i = \theta$, probability that $i$ transmits.
$\theta_{i,j}$ : Direction that $i$ uses to communicate with $j$.
$\mathcal{N}_{i,\theta}$ : Set of true neighbors of $i$ in direction $\theta$.
$\mathcal{T}_{j,i}$ : Time for $i$ to discover $j$.
$\mathcal{T}_i$ : Time for $i$ to discover all its neighbors.
$\bar{T}_i$ : Expected value of $\mathcal{T}_i$, i.e., $E\{\mathcal{T}_i\}$, see Sec. 3.2.3.
$\bar{T}$ : Average expected discovery time in the network, i.e., $\frac{1}{N} \sum_i^N E\{\mathcal{T}_i\}$.
$\bar{T}_{\max}$ : Maximum expected discovery time in the network, i.e., $\max_i E\{\mathcal{T}_i\}$.
$\mathcal{Q}$ : Set of all beam-steering probabilities.
$\mathcal{P}$ : Set of all transmission probabilities.
$P_{j,i}$ : Probability that $i$ discovers $j$ in a given time slot.
$P_i^{(\min)}$ : The minimum discovery probability for node $i$, i.e., $\min_j P_{j,i}$, $j \in \mathcal{N}_i$.
$P_i^{(H)}$ : The harmonic mean of $P_{j,i}$, $j \in \mathcal{N}_i$.
${}_{ij}(\theta_i)$ : Probability that $i$ has $j$ as a neighbor in direction $\theta_i$.

3.2.2 Probability of the Successful Discovery

In this work we analyze the expected discovery time for a randomized scheme where each node either transmits or listens with a certain probability. That is, node $i$ randomly selects a direction $\theta$ with some probability $q_{i,\theta} \in [0,1]$, and then chooses to transmit its signature in that direction with some probability $p_{i,\theta} \in [0,1]$ or listens with probability $(1 - p_{i,\theta})$. Specifically,

$$P(\theta_i = \theta) = q_{i,\theta}, \qquad (3.1)$$
$$\sum_{\theta \in \{1, \dots, \Theta\}} q_{i,\theta} = 1, \qquad (3.2)$$
$$P(\mathcal{S}_i = 1 \mid \theta_i = \theta) = p_{i,\theta}, \qquad (3.3)$$

where the r.v. $\mathcal{S}_i$ denotes the state of the node, which is one if it is transmitting and zero otherwise. The r.v.
$\theta_i$ denotes the beam direction of node $i$ in a given time slot. Let $\mathcal{D}_{j,i}$ denote the event of node $i$ detecting node $j$, i.e., $i$ receives node $j$'s ID successfully. For node $i$ to successfully discover node $j$, three conditions must be satisfied [76, 88]: (1) node $j$ needs to transmit in direction $\theta_j = \theta_{j,i}$, which occurs with probability $p_{j,\theta_{j,i}} q_{j,\theta_{j,i}}$; (2) node $i$ has to listen in direction $\theta_i = \theta_{i,j}$, which happens with probability $(1 - p_{i,\theta_{i,j}}) q_{i,\theta_{i,j}}$; (3) no other neighbor $k \in \mathcal{N}_{i,\theta_{i,j}}$ transmits in the direction that causes interference to $i$ discovering $j$; we denote this event by $k \not\to \mathcal{D}_{j,i}$. Apparently, $P(k \not\to \mathcal{D}_{j,i}) = (1 - p_{k,\theta_{k,i}} q_{k,\theta_{k,i}})$. Thus, the probability of successful discovery is given by

$$P(\mathcal{D}_{j,i}) = p_{j,\theta_{j,i}} q_{j,\theta_{j,i}} (1 - p_{i,\theta_{i,j}}) q_{i,\theta_{i,j}} \prod_{k \in \mathcal{N}_{i,\theta_{i,j}}} (1 - p_{k,\theta_{k,i}} q_{k,\theta_{k,i}}). \qquad (3.4)$$

As we will discuss later, the expected discovery time is a function of the probability of successful discovery, which clearly is a function of $p_{i,\theta_i}$ and $q_{i,\theta_i}$ for all directions $\theta_i$ and for all nodes $i$. Thus, the goal for any randomized discovery scheme is to identify the optimum transmission and beam-steering probabilities. For clarity of presentation we use $P_{j,i} = P(\mathcal{D}_{j,i})$.

The value of $P_{j,i}$ depends on the nodes in $\mathcal{N}_{i,\theta}$, which is the set of nodes that $i$ has yet to discover! To circumvent this we initially assume that $\mathcal{N}_{i,\theta} = \bar{\mathcal{N}}_{i,\theta}$; this turns the problem into verifying the existence of the neighbors.⁴ Due to this assumption, for given values of $p_{i,\theta}$ and $q_{i,\theta}$ $\forall i$ $\forall \theta$, the true probability of success is lower bounded by the value in (3.4). Alternatively, in Sec. 3.5 we adopt a probabilistic model to capture the correlation between $\mathcal{N}_{i,\theta}$ and $\bar{\mathcal{N}}_{i,\theta}$. Additionally, in the simulation section, we study the impact of the $\mathcal{N}_{i,\theta} = \bar{\mathcal{N}}_{i,\theta}$ assumption on the performance when $\mathcal{N}_{i,\theta} \subset \bar{\mathcal{N}}_{i,\theta}$.

3.2.3 Objective Function

As discussed earlier, discovery time is a key property for any neighbor discovery scheme.
For instance, allocation of resources (time slots) for neighbor discovery in BS-controlled D2D requires knowledge of the duration of the discovery process [2, 89]. Let the random variable $\mathcal{T}_i$ (in units of time-slots) be the duration that node $i$ requires to discover its neighbors. The probability distribution of $\mathcal{T}_i$ can be found using the inclusion exclusion principle [90] to calculate the probability that $\mathcal{T}_i$ is larger than $t$. Alternatively, using the Poisson approximation argument [90, 91], a simpler form can be given by

$$P(\mathcal{T}_i \le t) = \prod_{j \in \mathcal{N}_i} \big( 1 - \exp(-P_{j,i}\, t) \big). \qquad (3.5)$$

⁴ This is also relevant to other applications, for instance channel acquisition between devices (nodes) in a D2D network as in Chapter 2, where the sets of neighbors are known, and we need to exchange "signature" messages frequently.

There are numerous ways to represent the discovery time in a network with $N$ nodes. In this work we focus our attention on two objective functions: (i) the average discovery time $\frac{1}{N} \sum_i^N \mathcal{T}_i$, and (ii) the time for all nodes to discover all their neighbors, i.e., $\max_i \mathcal{T}_i$. Obviously, the two objectives are distinct. In general, for two nodes $i \ne j$, $\mathcal{T}_i$ and $\mathcal{T}_j$ are neither independent nor identically distributed. Thus analyzing the joint distribution or high order moments is very challenging and network dependent, even when (3.5) is used. Instead, to conveniently analyze the mentioned objectives, we work with their expected values, where the expectation is over the random decisions of nodes, namely the transmission states, $\mathcal{S}_i, \forall i$, and the directions of the beams, $\theta_i, \forall i$. Nevertheless, in Appendix 3.7.3 we discuss the relation between the expectation and its bounds with the distribution (3.5) for one node.

Taking the expectation for the two objectives, we have: (i) $\bar{T} \triangleq \frac{1}{N} \sum_i^N E\{\mathcal{T}_i\} = \frac{1}{N} \sum_i^N \bar{T}_i$, where $\bar{T}_i \triangleq E\{\mathcal{T}_i\}$, and (ii) $E\{\max_i \mathcal{T}_i\}$. As discussed above, deriving the joint distribution to average $\max_i \mathcal{T}_i$ is very complicated.
We instead use $\bar{T}_{\max} \triangleq \max_i \bar{T}_i$, i.e., we aim to minimize the maximum expectation rather than the expectation of the maximum. The relation between the two follows Jensen's inequality,

$$\bar{T}_{\max} \le E\Big\{ \max_i \mathcal{T}_i \Big\}. \qquad (3.6)$$

Finally, it is worth mentioning that in a network with uniform structure or when using the simplified uniform neighbor relations assumption we have $\bar{T} = \bar{T}_{\max} = \bar{T}_i$.

3.3 Expected Discovery Time and Bounds

The objective functions in 3.2.3 depend on $E\{\mathcal{T}_i\}$. In this section, we first model $\mathcal{T}_i$ as the collection time of a non-uniform coupon collector problem and then derive lower and upper bounds.

3.3.1 Expected Discovery Time

Let $\mathcal{T}_{j,i}$ denote the time that node $i$ takes to discover node $j \in \mathcal{N}_i$.⁵ Then

$$\mathcal{T}_i = \max_j \mathcal{T}_{j,i}. \qquad (3.7)$$

Since the nodes in the network make their decisions independently over every time slot, $\mathcal{T}_{j,i}$ is a geometric random variable. Further, note that for two nodes $j, k \in \mathcal{N}_i$, $j \ne k$, the events $\mathcal{D}_{j,i}$ and $\mathcal{D}_{k,i}$ are disjoint; this is true since we use the collision model, i.e., node $i$ may discover node $j$ or $k$ in any given time slot but not both.

Using these observations, the problem can be viewed as a non-uniform coupon collector problem [92]. Specifically, for node $i$, consider the neighbor nodes as $N_i \triangleq |\mathcal{N}_i|$ distinct coupons. Also, let $P_{j,i}$ be the probability of collecting coupon $j$. Then, $\mathcal{T}_{j,i}$ is the time to collect coupon type $j$ and $\mathcal{T}_i$ is the time required to collect all the $N_i$ coupons. As shown in [92],

$$\bar{T}_i = \sum_{j \in \mathcal{N}_i} \frac{1}{P_{j,i}} - \sum_{k,j \in \mathcal{N}_i,\, k \ne j} \frac{1}{P_{j,i} + P_{k,i}} + \dots + (-1)^{N_i + 1} \frac{1}{\sum_{j \in \mathcal{N}_i} P_{j,i}}. \qquad (3.8)$$

As indicated in [90], the proof of this relation relies on two aspects of the problem. Firstly, the minimum time to collect any two coupons (to discover any two neighbors), e.g., $\min(\mathcal{T}_{j,i}, \mathcal{T}_{k,i})$, is also a geometric random variable with parameter equal to the sum of their individual probabilities, e.g., $P_{j,i} + P_{k,i}$. Secondly, the "max" can be rewritten using the inclusion exclusion principle [90].
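As a numerical illustration of (3.8), the sketch below evaluates the inclusion-exclusion sum directly for a list of per-slot success probabilities $P_{j,i}$, counting each unordered subset of neighbors once. It is an illustrative sketch only: the function name and the example probabilities are assumptions, and the cost grows exponentially with the number of neighbors.

```python
from itertools import combinations


def expected_discovery_time(success_probs):
    """Evaluate (3.8): alternating sum over all non-empty neighbor subsets."""
    total = 0.0
    for size in range(1, len(success_probs) + 1):
        sign = 1.0 if size % 2 == 1 else -1.0
        for subset in combinations(success_probs, size):
            # The minimum of the geometric times in `subset` is geometric
            # with parameter equal to the sum of the individual probabilities.
            total += sign / sum(subset)
    return total


# Hypothetical example: node i has two neighbors with different success probabilities.
two_neighbors = expected_discovery_time([0.1, 0.05])
```

For two neighbors the sum reduces to $1/P_{j,i} + 1/P_{k,i} - 1/(P_{j,i} + P_{k,i})$, which is a convenient sanity check for the implementation.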
⁵ Not to confuse $\mathcal{D}_{j,i}$ with $\mathcal{T}_{j,i}$: the former corresponds to a Bernoulli random variable, while the latter is related to a geometric random variable.

Specifically, using the definition of $\mathcal{T}_i$, (3.7),

$$E\Big\{ \max\{\mathcal{T}_{j,i}, \mathcal{T}_{k,i}, \dots\} \Big\} = \sum_{j \in \mathcal{N}_i} E\{\mathcal{T}_{j,i}\} - \sum_{j,k \in \mathcal{N}_i,\, j \ne k} E\Big\{ \min\{\mathcal{T}_{j,i}, \mathcal{T}_{k,i}\} \Big\} + \sum_{j,k,r \in \mathcal{N}_i,\, j \ne k \ne r} E\Big\{ \min\{\mathcal{T}_{j,i}, \mathcal{T}_{k,i}, \mathcal{T}_{r,i}\} \Big\} - \dots$$

As a special case, when $P_{j,i} = P_{k,i} = \dots = P_i$, i.e., $i$ discovers all its neighbors with equal probability, it is easy to show, as in [90], that

$$\bar{T}_i = \frac{H_{N_i}}{P_i}, \qquad (3.9)$$

where $H_X$ is the $X$th harmonic number, i.e., $H_X = \sum_{x=1}^{X} 1/x \approx \log(X) + 0.577$.

Remark: This special case corresponds to a typical coupon collector problem, as discussed in 3.1.1, which has been extensively used to model the discovery time when no prior information is available, and thus the clique and uniform neighbor relations assumptions are justified.⁶

⁶ For instance, with abuse of notation, a clique with $N_i + 1$ nodes with probability of success $P_i = \frac{1}{(N_i + 1)e}$, as in [78], will have $\bar{T}_i = H_{N_i + 1} (N_i + 1) e$.

3.3.2 The Upper Bound

In this subsection, we find an upper bound for the expected discovery time, $\bar{T}_i$. The result is summarized by the following theorem.

Theorem 3.3.1. The expected discovery time is upper bounded as follows,

$$\bar{T}_i \le \frac{H_{N_i}}{P_i^{(\min)}},$$

where $P_i^{(\min)} \triangleq \min_{j \in \mathcal{N}_i} P_{j,i}$.

Intuitively, the bound is a result of reducing all the probabilities in (3.8) to $P_i^{(\min)}$, which increases the expected discovery time; in Appendix 3.7.1 we provide a formal proof.

3.3.3 The Lower Bound

Here, we derive a lower bound to the expected discovery time.

Theorem 3.3.2. The expected discovery time for node $i$ is lower bounded by the maximum expected time needed to discover any neighbor $j \in \mathcal{N}_i$, i.e.,

$$\bar{T}_i \ge \max_{j \in \mathcal{N}_i} E\{\mathcal{T}_{j,i}\}.$$

Proof. Noting that the max function is convex, Theorem 3.3.2 can be proven by Jensen's Inequality. Remember that $\bar{T}_i = E\{\mathcal{T}_i\}$.
Using (3.7), we have
\[ E\{\max_j T_{j,i}\} \ge \max_{j\in\mathcal{N}_i} E\{T_{j,i}\} = \max_{j\in\mathcal{N}_i} \frac{1}{P_{j,i}}. \tag{3.10} \]

In general, we expect (3.10) to be tight when one of the success probabilities is significantly smaller than the others, especially when the number of neighbor nodes is small. The bound presented in the following theorem addresses the case when the variation of the success probabilities is relatively small.

Theorem 3.3.3. The expected time $\bar{T}_i$ that node $i$ takes to discover all its $N_i$ neighbors in an arbitrary network is lower bounded as follows:
\[ \bar{T}_i \ge \frac{H_{N_i}}{\bar{P}_i}, \]
where $\bar{P}_i \triangleq \frac{1}{N_i} \sum_{j\in\mathcal{N}_i} P_{j,i}$.

Proof. The proof can be found in Appendix 3.7.3, or can be concluded from an equivalent result in [93].

Remark 1: Theorem 3.3.3 indicates that the discovery time is minimized when all probabilities of success are equal. This can be visualized for a uniform network, where each node has an equal number of neighbors $N_i$. If the neighbors are located equally and uniformly in all directions, then the nodes (and the network) have a smaller expected discovery time than in a random network.

Remark 2: The bounds in Theorem 3.3.2 and Theorem 3.3.3 emphasize two extreme scenarios. The former covers the case when discovering one of the neighbors is very hard relative to the others, while the latter covers the case when all of the neighbors are discovered with comparable difficulty, and thus emphasizes the number of neighbors. To cover both cases, we propose the following lower bound:
\[ \bar{T}_i \ge \max\left\{ \frac{H_{N_i}}{\bar{P}_i},\ \frac{1}{P_i^{(\min)}} \right\} = \bar{T}_{\mathrm{LB}_i}. \tag{3.11} \]

3.4 Optimization

Thus far we have discussed the discovery time and derived a lower and an upper bound. In this section we provide methods to find the "tuning parameters" of the randomized discovery scheme, i.e., $p_{i,\sigma}$ and $q_{i,\sigma}$, $\forall i$ and $\forall \sigma$. Since the goal is to minimize the expected discovery time, we can minimize either of the objective functions in 3.2.3.
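The bounds of Section 3.3 can be checked numerically for a small example. The sketch below is only an illustration under assumed values: it evaluates the exact expression (3.8) by subset enumeration, the upper bound of Theorem 3.3.1, and the combined lower bound (3.11). The helper names (`exact_time`, `upper_bound`, `lower_bound`) and the probability vector are hypothetical.

```python
from itertools import combinations


def harmonic(n):
    """n-th harmonic number."""
    return sum(1.0 / x for x in range(1, n + 1))


def exact_time(probs):
    """Exact expected discovery time via the inclusion-exclusion sum (3.8)."""
    total = 0.0
    for size in range(1, len(probs) + 1):
        sign = 1.0 if size % 2 == 1 else -1.0
        total += sign * sum(1.0 / sum(s) for s in combinations(probs, size))
    return total


def upper_bound(probs):
    """Theorem 3.3.1: H_N divided by the smallest success probability."""
    return harmonic(len(probs)) / min(probs)


def lower_bound(probs):
    """Combined lower bound (3.11): max of H_N / mean(P) and 1 / min(P)."""
    mean_p = sum(probs) / len(probs)
    return max(harmonic(len(probs)) / mean_p, 1.0 / min(probs))


if __name__ == "__main__":
    probs = [0.05, 0.1, 0.2]  # illustrative success probabilities
    print(lower_bound(probs), exact_time(probs), upper_bound(probs))
```

For this example the exact value lies between the two bounds, and the lower bound is dominated by the $1/P_i^{(\min)}$ term because one probability is much smaller than the others.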
However, both objective functions in 3.2.3 rely on $\bar{T}_i$, which is complicated and non-convex. For instance, if node $i$ has $X$ neighbors, $\bar{T}_i$ has $2^X - 1$ terms, which makes it hard to optimize (see footnote 7). Here, based on Section 3.3, we provide two relatively simpler optimization metrics and prove that they can be written in convex form.

Footnote 7: Note that [92] provides an alternative formula to calculate $\bar{T}_i$ in (3.8) through integration of the complementary CDF (CCDF) (3.5). Although this method uses less memory to calculate $\bar{T}_i$, it requires more computational effort due to the need to perform numerical integration. In the simulation section we use whichever is suitable for the setup, as discussed in Sec. 3.6.

3.4.1 Upper Bound Based Metric

We use the bound in Theorem 3.3.1 as a metric. Thus, for $\bar{T}$ we have
\[ \min_{\mathcal{P},\mathcal{Q}} \ \frac{1}{N} \sum_{i}^{N} \max_{j\in\mathcal{N}_i} \frac{H_{N_i}}{P_{j,i}}, \tag{3.12} \]
where we used the definition of $P_i^{(\min)}$ in Theorem 3.3.1 and the fact that $\frac{1}{x}$ is a strictly decreasing function for $x > 0$. Similarly, for $\bar{T}_{\max}$ we have
\[ \min_{\mathcal{P},\mathcal{Q}} \ \max_i \max_{j\in\mathcal{N}_i} \frac{H_{N_i}}{P_{j,i}}, \tag{3.13} \]
where $\mathcal{P}$ and $\mathcal{Q}$ are the sets of all transmission and beam-steering probabilities in the network, respectively. Note that the objective functions in (3.12) and (3.13) contain reciprocals of posynomials, which are not convex in general [94, Ch. 4]. In the following we show that both optimization problems can be rewritten as geometric programs and then transformed into convex form. We start by writing the full optimization problem related to (3.12):

OPT-AvgWPs0
\[ \min_{\mathcal{P},\mathcal{Q}} \ \sum_{i=1}^{N} \max_{j\in\mathcal{N}_i} \left( \frac{H_{N_i}}{p_{j,\sigma_{j,i}}\, q_{j,\sigma_{j,i}}\, (1-p_{i,\sigma_{i,j}})\, q_{i,\sigma_{i,j}} \prod_{k\in\mathcal{N}_{i,\sigma_{i,j}}} (1 - p_{k,\sigma_{k,i}}\, q_{k,\sigma_{k,i}})} \right) \]
subject to:
\[ \sum_{\sigma} q_{i,\sigma} = 1, \quad \forall i \]
\[ q_{i,\sigma},\, p_{i,\sigma} \in [0,1], \quad \forall \sigma,\ \forall i \]
where we used (3.4) to substitute for $P_{j,i}$.
Next, we define the dummy variables $y_{i,\sigma_{i,j}}$ and $z_{k,\sigma_{k,i}}$, and introduce new constraints as follows:

OPT-AvgWPs
\[ \min_{\mathcal{P},\mathcal{Q}} \ \sum_{i=1}^{N} \max_{j\in\mathcal{N}_i} \left( \frac{H_{N_i}}{p_{j,\sigma_{j,i}}\, q_{j,\sigma_{j,i}}\, y_{i,\sigma_{i,j}}\, q_{i,\sigma_{i,j}} \prod_{k\in\mathcal{N}_{i,\sigma_{i,j}}} z_{k,\sigma_{k,i}}} \right) \]
subject to:
\[ y_{i,\sigma_{i,j}} \le 1 - p_{i,\sigma_{i,j}}, \quad j\in\mathcal{N}_i,\ \forall i \]
\[ z_{k,\sigma_{k,i}} \le 1 - p_{k,\sigma_{k,i}}\, q_{k,\sigma_{k,i}}, \quad k\in\mathcal{N}_i,\ \forall i \]
\[ \sum_{\sigma} q_{i,\sigma} \le 1, \quad \forall i; \qquad q_{i,\sigma},\, p_{i,\sigma} \in [0,1], \quad \forall \sigma,\ \forall i \]

This optimization problem can be converted to convex form [94, Ch. 4], and thus solved using standard convex optimization toolboxes. In the following we prove that the solutions of OPT-AvgWPs0 and OPT-AvgWPs are the same.

Lemma 3.4.1. There exists an optimal solution of OPT-AvgWPs where all the inequalities hold with equality.

Proof. Let $\hat{\mathcal{P}}$ and $\hat{\mathcal{Q}}$ denote the optimal solution of OPT-AvgWPs. Assume that for some node $i$ there is a set $\mathcal{J}_i \subseteq \mathcal{N}_i$ such that $y_{i,\sigma_{i,j}} < 1 - p_{i,\sigma_{i,j}}$ and/or $z_{k,\sigma_{k,i}} < 1 - p_{k,\sigma_{k,i}} q_{k,\sigma_{k,i}}$ for $j,k \in \mathcal{J}_i$. Then increasing the values of $y_{i,\sigma_{i,j}}$ and/or $z_{k,\sigma_{k,i}}$ for $j,k\in\mathcal{J}_i$ will either decrease the objective function or leave it unchanged. Thus, all inequalities of the first and second type in OPT-AvgWPs can be made active without increasing the objective.

Further, assume that for a node $i$, $\sum_{\sigma} \hat{q}_{i,\sigma} < 1$. Take a direction $\sigma_i \in \{1,\dots,\Sigma\}$ and define
\[ q'_{i,\sigma_i} = \hat{q}_{i,\sigma_i} + \delta_i, \qquad p'_{i,\sigma_i} = \alpha_i\, \hat{p}_{i,\sigma_i}, \]
where $\delta_i = 1 - \sum_{\sigma} \hat{q}_{i,\sigma}$ and $\alpha_i = \frac{\hat{q}_{i,\sigma_i}}{\hat{q}_{i,\sigma_i} + \delta_i}$. Note that $p'_{i,\sigma_i} q'_{i,\sigma_i} = \hat{p}_{i,\sigma_i} \hat{q}_{i,\sigma_i}$ and $y'_{i,\sigma_i} q'_{i,\sigma_i} = (1 - p'_{i,\sigma_i})\, q'_{i,\sigma_i} \ge (1 - \hat{p}_{i,\sigma_i})\, \hat{q}_{i,\sigma_i}$. Hence the optimal cost is either reduced or stays unchanged. The former case contradicts the optimality assumption, so there is an optimal solution in which the inequalities of the third type are also active. Thus we can solve the convex optimization problem OPT-AvgWPs.

Similar steps can be used to arrive at a convex optimization problem for (3.13). The only difference is that the sum is replaced with a max operator, which can still be converted to convex form [94]. In the simulation section we refer to the scheme that solves the convex form of OPT-AvgWPs as AvgWPs.
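The rescaling step used in the proof of Lemma 3.4.1 can be illustrated with a short Python sketch. It assumes a single node whose beam-steering probabilities sum to less than one, assigns the unused probability mass to one direction, and scales the transmission probability in that direction so that the product of the two stays fixed. The function name `rescale_direction`, the variable names, and the numerical values are assumptions made for this illustration only.

```python
def rescale_direction(p, q, s):
    """Apply the rescaling from the proof of Lemma 3.4.1 to one node.

    p: transmission probabilities per direction (hypothetical values)
    q: beam-steering probabilities per direction, with sum(q) <= 1
    s: index of the direction that receives the unused steering mass
    Returns new lists (p_new, q_new) with sum(q_new) == 1 up to rounding.
    """
    delta = 1.0 - sum(q)              # unused steering probability
    if delta < 0 or q[s] <= 0:
        raise ValueError("need sum(q) <= 1 and q[s] > 0")
    alpha = q[s] / (q[s] + delta)     # scaling that keeps p*q unchanged
    p_new, q_new = list(p), list(q)
    q_new[s] = q[s] + delta
    p_new[s] = alpha * p[s]
    return p_new, q_new


if __name__ == "__main__":
    p, q = [0.3, 0.5], [0.4, 0.4]
    p_new, q_new = rescale_direction(p, q, 0)
    # Transmit-and-steer product is preserved in the chosen direction ...
    print(p[0] * q[0], p_new[0] * q_new[0])
    # ... while the listen-and-steer product does not decrease.
    print((1 - p[0]) * q[0], (1 - p_new[0]) * q_new[0])
```

In this example the product $pq$ is unchanged while $(1-p)q$ grows, which is why the objective cannot increase after the rescaling.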
The scheme that solves the convex form related to (3.13) is referred to as MaxWPs. The upper bound is based on the minimum probability of successful discovery; as discussed in 3.3.3, this quantity does not always capture the behavior of the expected discovery time. This is also clear from the proof of Lemma 3.4.1: we are able to increase the probability of success with other neighbor nodes without impacting the value of the objective function. As will be discussed in the simulation section, this observation indicates when this metric is most useful.

3.4.2 Approximation Based Metric

The alternative metric that we provide in this part is based on the harmonic mean of the probabilities of success. That is, $\bar{T}_i$ is approximated as follows:
\[ \bar{T}_i \approx \bar{T}'_i \triangleq \max\left\{ \frac{H_{N_i}}{P_i^{(H)}},\ \max_{j\in\mathcal{N}_i} \frac{1}{P_{j,i}} \right\}, \tag{3.14} \]
where $P_i^{(H)}$ is the harmonic mean of $P_{j,i}$, $j\in\mathcal{N}_i$, i.e., $P_i^{(H)} = \frac{N_i}{\sum_{j\in\mathcal{N}_i} \frac{1}{P_{j,i}}}$. In fact, simulations show that $\bar{T}_i \simeq \bar{T}'_i$. This can be roughly explained through two observations: (i) $\bar{T}_i$ is mostly dominated by the probabilities of success with small values, and the harmonic mean approximates that well; (ii) similar to the bound in Theorem 3.3.3 and as indicated in [95], replacing diverse values of probabilities with a single condensed value could possibly result in a lower bound. A rigorous analytical explanation seems to be quite involved. Thus, we do not provide any guarantee on how small $|\bar{T}_i - \bar{T}'_i|$ is, except for the following theorem.

Theorem 3.4.2. The approximation $\bar{T}'_i$ is bounded as follows:
\[ \max\left\{ \frac{H_{N_i}}{\bar{P}_i},\ \max_{j} \frac{1}{P_{j,i}} \right\} \le \bar{T}'_i \le \frac{H_{N_i}}{P_i^{(\min)}}. \]

Proof. The upper bound follows directly, since $P_i^{(H)} \ge P_i^{(\min)}$ and $H_{N_i} \ge 1$.
For the lower bound it is enough to show that
\[ \frac{H_{N_i}}{\bar{P}_i} \le \frac{H_{N_i}}{N_i} \sum_{j\in\mathcal{N}_i} \frac{1}{P_{j,i}}. \]
This can be proven using the relation between the harmonic and arithmetic means, that is,
\[ \frac{1}{N_i} \sum_{j\in\mathcal{N}_i} P_{j,i} \ \ge\ \frac{N_i}{\sum_{j} \frac{1}{P_{j,i}}}. \]
Taking the reciprocal of both sides and flipping the direction of the inequality completes the proof of the theorem.

The optimization for this metric is done similarly to 3.4.1: we minimize the objective functions in 3.2.3. For $\bar{T}$, we have
\[ \min_{\mathcal{P},\mathcal{Q}} \ \frac{1}{N} \sum_{i}^{N} \max\left\{ \frac{H_{N_i}}{N_i} \sum_{j\in\mathcal{N}_i} \frac{1}{P_{j,i}},\ \max_{j\in\mathcal{N}_i} \frac{1}{P_{j,i}} \right\}. \tag{3.15} \]
Similarly, for $\bar{T}_{\max}$ we have
\[ \min_{\mathcal{P},\mathcal{Q}} \ \max_i \max\left\{ \frac{H_{N_i}}{N_i} \sum_{j\in\mathcal{N}_i} \frac{1}{P_{j,i}},\ \max_{j\in\mathcal{N}_i} \frac{1}{P_{j,i}} \right\}. \tag{3.16} \]
Using similar procedures as in 3.4.1, we can rewrite the optimization problems based on (3.15) and (3.16) as geometric programs, which can be transformed into convex form. For instance, for $\bar{T}$, based on (3.15), we have

OPT-ApproxAvg
\[ \min_{\mathcal{P},\mathcal{Q}} \ \sum_{i=1}^{N} \max\left\{ \frac{H_{N_i}}{N_i} \sum_{j\in\mathcal{N}_i} \frac{1}{p_{j,\sigma_{j,i}}\, q_{j,\sigma_{j,i}}\, y_{i,\sigma_{i,j}}\, q_{i,\sigma_{i,j}} \prod_{k\in\mathcal{N}_{i,\sigma_{i,j}}} z_{k,\sigma_{k,i}}},\ \max_{j\in\mathcal{N}_i} \frac{1}{p_{j,\sigma_{j,i}}\, q_{j,\sigma_{j,i}}\, y_{i,\sigma_{i,j}}\, q_{i,\sigma_{i,j}} \prod_{k\in\mathcal{N}_{i,\sigma_{i,j}}} z_{k,\sigma_{k,i}}} \right\} \]
subject to:
\[ y_{i,\sigma_{i,j}} \le 1 - p_{i,\sigma_{i,j}}, \quad j\in\mathcal{N}_i,\ \forall i \]
\[ z_{k,\sigma_{k,i}} \le 1 - p_{k,\sigma_{k,i}}\, q_{k,\sigma_{k,i}}, \quad k\in\mathcal{N}_i,\ \forall i \]
\[ \sum_{\sigma} q_{i,\sigma} \le 1 \quad \text{and} \quad q_{i,\sigma},\, p_{i,\sigma} \in [0,1], \quad \forall \sigma,\ \forall i \]

This is a geometric program that can be transformed into a convex form [94]. With a slight modification of the proof in 3.4.1, we can show that this program finds the optimal value for the metric (3.15). In the simulation section we refer to the scheme that solves the convex form of OPT-ApproxAvg as ApproxAvg, and to the scheme that solves the convex form of (3.16) as ApproxMax.

3.5 Model with Uncertainty: a Generalization

The assumed prior knowledge in 3.2.1 may not match the system setup in a number of applications, and thus the relation between $\mathcal{N}$ and $\tilde{\mathcal{N}}$ could be more complicated.
For instance, the set of neighbor nodes in direction $\sigma$ could be uncertain due to randomness in the propagation environment, outdatedness of the network topology, etc. In the dual band system example, the number of beam directions in each of the bands need not be equal, or the antenna pattern could differ between the bands. In this section we aim to incorporate such uncertainty into the model and study its impact on the probability of success and on the optimization metrics for the expected discovery time.

One way to introduce the uncertainty into the model is by using a confidence probability distribution on node relations. In particular, let $\eta_{i,j}(\sigma_i, \sigma_j)$ be the probability that a "link" between $i$ and $j$ exists such that $i$ has $j$ as a neighbor in direction $\sigma_i$, and $j$ has $i$ as a neighbor in direction $\sigma_j$. In other words, we have
\[ \eta_{i,j}(\sigma_i, \sigma_j) = P(j\in\mathcal{N}_{i,\sigma_i},\ i\in\mathcal{N}_{j,\sigma_j}). \tag{3.17} \]
The joint distribution $\eta_{i,j}(\sigma_i,\sigma_j)$ is generic and depends on the source of uncertainty in the system. For simplicity and ease of presentation, we assume that the probability is independent at the two ends of the link between $i$ and $j$, i.e., we use $\eta_{i,j}(\sigma_i,\sigma_j) = \eta_{ij}(\sigma_i)\,\eta_{ji}(\sigma_j)$, where $\eta_{ij}(\sigma_i) = P(j\in\mathcal{N}_{i,\sigma_i})$. This simplification is meaningful for the uncertainty examples covered in this section. The probability form in (3.17) can be used with more general examples and more complicated uncertainty models. Note that the analysis and conclusions in this section hold true for (3.17) with minor and straightforward modifications.

Let $\mathcal{A}_{ij}$ be the set of directions for node $i$ where node $j$ could exist, i.e., $\mathcal{A}_{ij} = \{\sigma : \eta_{ij}(\sigma) > 0\}$. One source of uncertainty could be limited motion or location-link mismatch. Assuming that $i$ should "ideally" communicate with $j$ in the direction $\sigma^0_{ij}$, we have
\[ \eta_{ij}(\sigma_i) = P(j\in\mathcal{N}_{i,\sigma_i} \mid \sigma^0_{ij}). \]
In this case $\eta_{ij}(\sigma_i)$ captures the "diffusion" around/from $\sigma^0_{ij}$.
Another example is a dual band system with an unequal number of beam directions in the two bands. Let $\Sigma_U$ and $\Sigma_L$ denote the number of beam directions in the upper and in the lower band, respectively. Assume that $\Sigma_U \ge \Sigma_L$. Then $\mathcal{A}_{ij}$ contains all beam directions in the upper band that overlap with the direction $\sigma^0_{ij}$ in the lower band at which $j\in\mathcal{N}_{i,\sigma^0_{ij}}$ (in the lower band). Due to this generalization, a neighbor node $j$ may be considered to be in one or more directions, i.e., $j\in\mathcal{N}_{i,\sigma}$, $\forall \sigma\in\mathcal{A}_{ij}$. Fig. 3.2 shows a simple illustrative example.

In Sec. 3.6, we provide additional examples for values of $\eta_{ij}(\sigma_i)$. However, accurate modeling of motion, environment, or dual band systems is beyond the scope of the thesis. For instance, values of $\eta_{ij}$ in dual band systems depend on the correlation between the bands, whose determination is an active research topic.

[Figure 3.2: An uncertainty example in a dual band system due to a different number of antennas in different bands, where $\Sigma_L = 2$ and $\Sigma_U = 4$. Node $j$, which is a neighbor of $i$ in direction $\sigma^0_{ij} = m1$ in the cmWave band, could be a neighbor in directions $\sigma_{ij}\in\{c1, c2\}$ in the mmWave band, i.e., $\mathcal{A}_{ij} = \{c1, c2\}$. With this knowledge, it is reasonable to assume $\eta_{ij}(\sigma_i) = \frac{1}{2}$, $\sigma_i\in\mathcal{A}_{ij}$. Note that $j\in\mathcal{N}_{i,c1}$ and $j\in\mathcal{N}_{i,c2}$, while in reality $j\in\tilde{\mathcal{N}}_{i,c2}$ only.]

3.5.1 Probability of Success

Introducing the concept of uncertainty changes the probability of successful discovery, since node $i$ may communicate with $j$ using one of possibly several directions in $\mathcal{A}_{ij}$, and similarly $j$ could communicate with $i$ using one direction in $\mathcal{A}_{ji}$. The average probability of success becomes
\[ P(D_{j,i}) = \sum_{\sigma_i\in\mathcal{A}_{ij}} (1-p_{i,\sigma_i})\, q_{i,\sigma_i}\, \eta_{ij}(\sigma_i) \left[ \sum_{\sigma_j\in\mathcal{A}_{ji}} p_{j,\sigma_j}\, q_{j,\sigma_j}\, \eta_{ji}(\sigma_j) \right] \prod_{k\in\mathcal{N}_{i,\sigma_i}} \left( 1 - \eta_{ik}(\sigma_i) \sum_{\sigma_k\in\mathcal{A}_{ki}} p_{k,\sigma_k}\, q_{k,\sigma_k}\, \eta_{ki}(\sigma_k) \right). \tag{3.18} \]
The rationale behind (3.18) is easy to see when expanding the summations. Note that two nodes are neighbors over a pair of directions, $\sigma_i$ and $\sigma_j$, with probability $\eta_{ij}(\sigma_i)\,\eta_{ji}(\sigma_j)$.
Additionally, node $i$ would receive interference from node $k$ if $i$ and $k$ are neighbors over the directions $\sigma_i$ and $\sigma_k$, respectively.

3.5.2 Optimization Metric

Plugging (3.18), the average probability of success, into the expected discovery time (3.8) or into the metrics developed in Sec. 3.4 results in very poor performance and breaks the convexity. For the developed metrics, it results in a loose lower bound. Alternatively, noting that we use the reciprocal of the probability of success in the proposed metrics of Sec. 3.4, we propose in the following a heuristic method that proved to be effective in Sec. 3.6. Instead of averaging the probability of success, we partially average the reciprocal of the probabilities of success over the direction pairs $(\sigma_i, \sigma_j)$. That is, we replace $\frac{1}{P(D_{j,i})}$ by
\[ \sum_{\sigma_i\in\mathcal{A}_{ij},\ \sigma_j\in\mathcal{A}_{ji}} \frac{\eta_{ij}(\sigma_i)\,\eta_{ji}(\sigma_j)}{P(\tilde{D}_{j,i}(\sigma_i,\sigma_j))}, \tag{3.19} \]
where
\[ P(\tilde{D}_{j,i}(\sigma_i,\sigma_j)) = (1-p_{i,\sigma_i})\, q_{i,\sigma_i}\, p_{j,\sigma_j}\, q_{j,\sigma_j} \prod_{k\in\mathcal{N}_{i,\sigma_i}} \left( 1 - \eta_{ik}(\sigma_i) \sum_{\sigma_k\in\mathcal{A}_{ki}} p_{k,\sigma_k}\, q_{k,\sigma_k}\, \eta_{ki}(\sigma_k) \right). \]
This method is based on averaging the metrics of Sec. 3.4 directly. Note that it is very challenging to incorporate such averaging into the expected discovery time (3.8), due to the coupling of all probabilities of success, which emphasizes the importance of the simplified metrics in Sec. 3.4. Finally, using arguments similar to 3.4.1 and 3.4.2, we can show that the modified metrics can be transformed into a convex form.

3.6 Simulation and Discussion

In this section we study the expected discovery time with prior information. We compare the performance of the proposed metrics with the direct neighbor discovery scheme [76], i.e., the optimum solution with no prior information. (Footnote 8: We omit the comparison with the scheme in [71] to reduce redundancy, since the performance can be deduced directly from the graphs in [71].) We further compare them all with the lower bound and the
approximation, and we discuss the impact of the average number of neighbors and the beam width on the performance. Next, we study the impact of the correlation between the two sets $\mathcal{N}_i$ and $\tilde{\mathcal{N}}_i$ on the performance of the schemes, the uncertainty for a given distribution, and the uncertainty due to a difference in the number of beams in dual band systems.

We perform the simulation over a network with 40 nodes. The nodes are placed uniformly at random in a square area with a side length of 200 meters. The communication range of all nodes is equal to $R$ meters, and we choose $R$ depending on the desired average number of neighbors in the network. Although the metrics in Sec. 3.4 are simpler to optimize than the exact average discovery time, they are still computationally heavy for a large number of nodes. In this section we perform the simulation over 100 realizations of node locations.

3.6.1 Performance of the Schemes

In this subsection we study the performance of the schemes using three metrics: the average of the expected discovery time $\bar{T}$, the maximum expected discovery time $\bar{T}_{\max}$, and the expectation of the maximum discovery time $E\{\max_i T_i\}$. The simulation is done for nodes with 5 beam directions and for the verification problem, i.e., $\mathcal{N}_i = \tilde{\mathcal{N}}_i$. In the simulations we refer to the scheme that minimizes the average expected time, i.e., $\bar{T}$, as AvgET, and to the scheme that minimizes the maximum expected time, i.e., $\bar{T}_{\max}$, as MaxET.

Fig. 3.3 shows the average expected discovery time, $\bar{T} = \frac{1}{N}\sum_{i=1}^{40} \bar{T}_i$, for several schemes versus the average number of neighbors per node. As expected, AvgET achieves the minimum value. For a small number of neighbors we use (3.8), while for a larger number we use the method in [92]. Interestingly, both AvgWPs and ApproxAvg perform well, with a slight advantage for ApproxAvg, which uses the probabilities of discovery of all neighbors rather than only the one that results in the minimum probability of success.
We note that, as could be anticipated, MaxWPs and ApproxMax do not perform well under this metric. Furthermore, all the schemes beat the optimum scheme with no prior information. The figure also shows the approximate value for $\bar{T}_i$ from (3.14); as discussed in 3.4.2, it offers a tight approximation/lower bound to AvgET, which explains the good performance of ApproxAvg. The plotted lower bound is $\frac{1}{N}\sum_{i}^{N} \bar{T}_{\mathrm{LB}_i}$, and similarly the plotted approximation is $\frac{1}{N}\sum_{i}^{N} \bar{T}'_i$. We note that the lower bound scales well compared to the other schemes; however, the gap increases with the average number of neighbors. Although it is not shown in the figure, it is worth mentioning that in a very dense network, where all nodes have approximately an equal number of neighbors in all directions, we expect the bound to become tight, since the network topology approaches the uniform network in which all probabilities of success are approximately equal.

[Figure 3.3: $\bar{T}$ versus average number of neighbors. The lower bound (LB) used is $\frac{1}{N}\sum_{i}^{N} \bar{T}_{\mathrm{LB}_i}$. Horizontal axis: average number of neighbors; vertical axis: average expected discovery time (time slots). Curves: No prior info., AvgWPs, MaxWPs, AvgET, ApproxAvg, ApproxMax, Approx. Value, Uniform LB.]

Fig. 3.4 shows the maximum expected discovery time, $\bar{T}_{\max} = \max_{i\in\{1,\dots,40\}} \bar{T}_i$, for several schemes versus the average number of neighbors per node. As expected, MaxET achieves the minimum value. In this case MaxWPs and ApproxMax outperform AvgWPs and ApproxAvg. Notice that MaxWPs performs relatively better than the other schemes. This is because the metric considers the maximum value and MaxWPs minimizes an upper bound. Minimizing the upper bound reduces the extreme values, which results in a better solution than minimizing an approximation for this metric. We further note that all the schemes outperform the optimum scheme without prior information.
We also plot the lower bound and the approximation; here we use $\max_i \bar{T}_{\mathrm{LB}_i}$ as the lower bound and $\max_i \bar{T}'_i$ as the approximation. Compared to the previous objective, the gap between the schemes and the lower bound has increased, while the approximation shows reasonable performance.

[Figure 3.4: $\bar{T}_{\max}$ versus average number of neighbors. The lower bound (LB) used is $\max_i \bar{T}_{\mathrm{LB}_i}$. Horizontal axis: average number of neighbors; vertical axis: maximum expected discovery time (time slots). Curves: No prior info., AvgWPs, MaxWPs, MaxET, ApproxAvg, ApproxMax, Approx. Value, Uniform LB.]

As discussed earlier, $E\{\max_i T_i\}$ has no closed form, so we use Monte Carlo simulations to obtain the expected maximum discovery time; specifically, we simulate the solutions of the schemes with respect to this metric. Fig. 3.5 shows the results. The simulation is done for 1000 iterations. Surprisingly, the schemes that minimize the average values (AvgET, AvgWPs, and ApproxAvg) perform better than the other schemes. This could suggest that these schemes not only minimize the average discovery time but also reduce the chances of large values of $T_i$, $\forall i$. This might seem counterintuitive, since one might expect that minimizing $\max_i \bar{T}_i$ is the suitable method. However, $\max_i \bar{T}_i$ is a lower bound on $E\{\max_i T_i\}$, and minimizing it may not achieve satisfactory results. This is more pronounced for larger values of the average number of neighbors, where the discrepancy between the minimized metric and the simulated one is clear. Despite these results, the maximum expected discovery time is still a useful metric with an obvious interpretation. In addition, we notice that the slope of the scheme without prior information is smaller than in the previous scenarios.
This could indicate that steering the beams uniformly at random alleviates the extreme values of $\max_i T_i$ that arise from a low probability of success due to collisions. In the remainder of this section, we focus on the average expected time metric, since it is a proxy for both the expected average and the expected maximum discovery time. Finally, note that MaxWPs outperforms MaxET for large numbers of average neighbors and ApproxAvg slightly outperforms AvgET for low numbers of average neighbors. This is possible since the objective in this figure is different from the objectives minimized by MaxET and AvgET, respectively.

[Figure 3.5: Monte Carlo evaluation of $E\{\max_i T_i\}$. Horizontal axis: average number of neighbors; vertical axis: average maximum discovery time (time slots). Curves: No prior info., AvgWPs, MaxWPs, AvgET, MaxET, ApproxAvg, ApproxMax.]

We next study the impact of the number of beam directions on the average expected discovery time. In Fig. 3.6, we assume that nodes have on average eight neighbors. It is evident that the use of prior information reduces the discovery time. Further, the relation between the schemes observed in Fig. 3.3 is maintained in this figure as well. For omni-directional neighbor discovery, and under the current simulation setup, the benefit of prior information is marginal. This suggests that the gain from prior information comes mainly from choosing the appropriate beam-steering probabilities.

3.6.2 $\mathcal{N}_{i,\sigma}$ and $\tilde{\mathcal{N}}_{i,\sigma}$ Relation

To capture the impact of the correlation between $\mathcal{N}_{i,\sigma}$ and $\tilde{\mathcal{N}}_{i,\sigma}$ for the system model introduced in Sec. 3.2, we assume a simple correlation between the two sets. Let the sets $\mathcal{N}_{i,\sigma}$ and $\tilde{\mathcal{N}}_{i,\sigma}$ contain the nodes up to distance $R$ and $\rho R$ in direction $\sigma$, respectively, where $0 \le \rho \le 1$, and thus $\tilde{\mathcal{N}}_{i,\sigma} \subseteq \mathcal{N}_{i,\sigma}$. This model corresponds to disk coverage areas in Fig. 3.1 with $R$ and $\rho R$ as the radii of the coverage in the cmWave band and the mmWave band, respectively.
We emphasize that this naive model does not necessarily correspond to the physical reality of the channel, and that we use it to illustrate the performance when deviating from the assumption $\mathcal{N}_{i,\sigma} = \tilde{\mathcal{N}}_{i,\sigma}$ that was made initially.

[Figure 3.6: $\bar{T}$ versus number of beam directions ($\Sigma$). Horizontal axis: number of beam directions $\Sigma$; vertical axis: average expected discovery time (time slots). Curves: No Prior Info., AvgWPs, MaxWPs, AvgET, ApproxAvg, ApproxMax.]

The impact of $\rho$ is shown in Fig. 3.7. As we can see, although the analysis and the optimization are done under the assumption of perfect correlation, i.e., $\rho = 1$, the benefit of the schemes when $\rho < 1$ is evident. However, as the correlation between the two sets decreases, i.e., for small $\rho$, the schemes have comparable performance. One impact of the lower correlation is that even when there is only a small chance of having neighbors in a given direction, node $i$ has to steer its beam into that direction; otherwise node $i$ could miss one of the neighbors. This becomes a major limitation when the correlation is very small and a large number of nodes in the network have this problem in several directions.

This simple model can be captured by the distribution $\eta_{i,j}(\sigma_i,\sigma_j)$ in Sec. 3.5. Specifically, we can simply use $\eta_{i,j}(\sigma_i,\sigma_j) = \rho^2$ and solve the metrics as highlighted in Sec. 3.5. In this case, since we know precisely the direction in which a neighbor might exist, the average probability of success in (3.18) reduces to a scaled version of (3.4) with a smaller impact of interference. In Fig. 3.7 we consider this for AvgWPs and ApproxAvg, which are shown as AvgWPs2 and ApproxAvg2, respectively. As expected, in this simple model there is a small improvement over the model that ignores such information.

Next, we consider a slightly more complicated uncertainty model. We assume that a neighbor node $j$ is in direction $\sigma^0_i$ with probability $P_0$, which could represent prior knowledge. Additionally,
Additionally, 88 N i; andN i; Relation 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 0 50 100 150 200 250 300 ρ Average expected discovery time (timeslots) No Prior Info. AvgWPs MaxWPs AvgET ApproxAvg ApproxMax AvgWPs2 ApproxAvg2 Figure 3.7: T versus. j might be in surrounding directions i 6= 0 i with probability a( i j 0 i )(1P 0 ) b , wherea( i j 0 i ) andb are constants. We consider two examples, (i) Distribution 1, we assume that neighbor node could be in one of three directions, 0 i and one direction to the left and right of 0 i , witha = 1 andb = 2. For the second example, (ii) Distribution 2, we assume the neighbor node could be in one of five directions, with a = 1 and b = 4. For example, for Distribution 2, given that 0 i = 3, then ij ( i ) =P(j2N i; i j3) = 1P 0 4 ,8 i 2f1; 2; 4; 5g and zero otherwise. In Fig. 3.8, the simulation is done for a similar setup as before, however, each node is capable of directing its beam into eight directions. Clearly, the performance improves as the certainty increases, i.e., P 0 increases. For the two distributions, the performance converges whenP 0 is small, since the node will be equally probable to be in the surrounding directions. Next, we consider a dual band systems with different number of beams in the two bands. We fix the number of beams in the lower band, L 2f2; 5g, and vary the number of beams in the upper band U . For simplicity, we ignore the impact of attenuation and shadowing at the two bands. Note that with this level of information, when given prior knowledge about the direction which neighborj exists in the lower band, say 0 i , thenj exits in directions i in the upper band with probability, ij ( i ) proportional to the shared area between the two directions i and 0 i . For instance, similar to the example in Fig. 3.2, when L = 2 and U = 4, for j in 0 i = 1, then ij (1) = ij (2) = 1 2 and zero otherwise. Fig. 3.9 shows advantage of increased information about 89 3.7. 
neighbor directions; the discovery time is smaller when $\Sigma_L = 5$ than when $\Sigma_L = 2$. Additionally, as a generalization of the discussion for Fig. 3.7, for this $\eta_{ij}$ model and when $\Sigma_U = n\Sigma_L$ (see footnote 9), the average probability of success in (3.18) reduces to a scaled version of (3.4) with a smaller impact of interference, which explains the sudden decay of the discovery time for the case with $\Sigma_L = 5$ when $\Sigma_U = 5$ or $\Sigma_U = 10$. However, this is less obvious for $\Sigma_L = 2$ due to the small value of $\Sigma_L$.

Footnote 9: This is relevant to the typical values, since in practice the number of beam directions is usually a power of two.

[Figure 3.8: $\bar{T}$ with two uncertainty models. Horizontal axis: $P_0$; vertical axis: average expected discovery time (time slots). Curves: No Prior Info., ApproxAvg, AvgWPs, each for Distribution 1 and Distribution 2.]

[Figure 3.9: $\bar{T}$ in a dual band system, with two values of the number of beam directions in the lower band, $\Sigma_L \in \{2, 5\}$, and a varying number of beam directions in the upper band. Horizontal axis: number of beams in the upper band $\Sigma_U$; vertical axis: average expected discovery time (time slots). Curves: No Prior Info., ApproxAvg, AvgWPs, each for $\Sigma_L = 2$ and $\Sigma_L = 5$.]

3.7 Appendix

3.7.1 Proof of Theorem 3.3.1

Proof. As defined earlier, $T_{j,i}$ is a geometric random variable with parameter $P_{j,i} \ge P_i^{(\min)}$. Let $P_i^{(\min)} = P_{j,i} - \delta_{ji}$, where $\delta_{ji}$ is a non-negative real number. Further, define two (possibly hypothetical) disjoint events such that $D_{ji} = D^{(\min)}_{ji} \cup D^{(\delta)}_{ji}$, with probabilities $P(D^{(\min)}_{ji}) = P_i^{(\min)}$ and $P(D^{(\delta)}_{ji}) = \delta_{ji}$. Next, define the corresponding geometric random variables $T^{(\min)}_{ji}$ and $T^{(\delta)}_{ji}$. For instance, $T^{(\min)}_{ji}$ is the number of trials until the first occurrence of $D^{(\min)}_{ji}$. Then, as discussed
earlier and in [90], we have
\[ T_{ji} = \min\{T^{(\min)}_{ji},\ T^{(\delta)}_{ji}\}. \]
Next, using the definition of $T_i$ and the value of $T_{ji}$, we have
\[ \max\left\{ \min\{T^{(\min)}_{ji}, T^{(\delta)}_{ji}\},\ T_{k,i},\ \dots \right\} \le \max\left\{ T^{(\min)}_{ji},\ T_{k,i},\ \dots \right\}. \]
This is true for all $j$, i.e., we have
\[ T_i \le \max\left\{ T^{(\min)}_{ji},\ T^{(\min)}_{ki},\ \dots \right\}. \]
Noting that $T^{(\min)}_{ji}$ is a geometric random variable with parameter $P_i^{(\min)}$ independent of $j$, the result follows by taking the expectation of both sides and following the derivation of (3.8).

3.7.2 Relation Between Bounds and Distribution

The cumulative distribution function (CDF) in eq. (3.5) is the probability that the last neighbor of a particular node $i$ is discovered before time $t$. Note that it is a function of the values of $P_{j,i}$, $j\in\mathcal{N}_i$, which are determined by solving the appropriate optimization problem, as in Sec. 3.4 or 3.5.2. Thus, $P_{j,i}$ has no immediate structure that depends on the number of neighbors and/or directions. This makes providing concentration of measure around the means or the bounds, as in [78], a challenging task. In the following we simplify the CDF and provide a simple relation to compute the time $t_\epsilon$ needed for node $i$ to discover the last neighbor with probability $\epsilon$. We can first approximate the CDF as follows:
\[ P(T_i \le t_\epsilon) \approx \exp\left( -\sum_{j\in\mathcal{N}_i} \exp(-P_{j,i}\, t_\epsilon) \right) = \epsilon. \]
Taking the logarithm of both sides twice (with a sign change in between), we have
\[ \log\left( \sum_{j\in\mathcal{N}_i} \exp(-P_{j,i}\, t_\epsilon) \right) = \log(-\log(\epsilon)). \]
The left-hand side is usually referred to as "log-sum-exp" [94, Ch. 3], or soft max, which bounds the maximum value of the arguments of the exponential functions as follows:
\[ -P_i^{(\min)} t_\epsilon + \log(N_i) \ \ge\ \log(-\log(\epsilon)) \ \ge\ -P_i^{(\min)} t_\epsilon. \tag{3.20} \]
This suggests that
\[ \frac{c}{P_i^{(\min)}} \ \le\ t_\epsilon \ \le\ \frac{\log(N_i) + c}{P_i^{(\min)}}, \tag{3.21} \]
with $c = -\log(-\log(\epsilon))$. Thus, $t_\epsilon$ is slightly larger than $\frac{c}{P_i^{(\min)}}$ for a small number of neighbors, e.g., $c = 2.25$ for $\epsilon = 0.9$. Note the resemblance between the inequalities in (3.21) and the findings in Theorem 3.3.1 and Theorem 3.3.2.
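The relation between the approximate CDF and the interval in (3.21) can be explored numerically. The following Python sketch is an illustration under assumed inputs: it solves the approximate CDF for the time at which a target probability level (called `eps` in the code) is reached, using bisection inside the interval given by (3.21). The function names and the probability values are hypothetical and chosen only for this example.

```python
import math


def approx_cdf(t, probs):
    """Approximate CDF of the discovery time: exp(-sum_j exp(-P_j * t))."""
    return math.exp(-sum(math.exp(-p * t) for p in probs))


def discovery_quantile(eps, probs, tol=1e-9):
    """Return (t, lower, upper) with approx_cdf(t) ~= eps.

    lower and upper are the two sides of (3.21):
    c / P_min <= t <= (log(N) + c) / P_min, with c = -log(-log(eps)).
    eps must lie strictly between 0 and 1.
    """
    c = -math.log(-math.log(eps))
    p_min = min(probs)
    lower = c / p_min
    upper = (math.log(len(probs)) + c) / p_min
    lo, hi = lower, upper
    # approx_cdf is increasing in t, so plain bisection is sufficient.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if approx_cdf(mid, probs) < eps:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi), lower, upper


if __name__ == "__main__":
    probs = [0.05, 0.1, 0.2]  # illustrative success probabilities
    t, lower, upper = discovery_quantile(0.9, probs)
    print(lower, t, upper)
```

In this example the solution sits close to the lower end of the interval, in line with the observation that the time is only slightly larger than $c/P_i^{(\min)}$ when the number of neighbors is small.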
For $\epsilon \approx 0.7$, the lower bound in (3.21) is similar to the one in Theorem 3.3.2, and for $\epsilon \approx 0.57$ the upper bound in (3.21) is similar to the one in Theorem 3.3.1.

Additionally, noting that $P_i^{(H)} \ge P_i^{(\min)}$, if $\log(-\log(\epsilon)) \approx -P_i^{(\min)} t + \log(N_i)$, it is easy to show that replacing the $P_{j,i}$ by $P_i^{(H)}$, similar to the approximation in (3.14), would result in a lower bound for the expectation, by reverse engineering some of the steps above and using the relation between the complementary CDF (CCDF) and the expectation. However, if $\log(-\log(\epsilon))$ is close to the right-hand side of (3.20), then we need to show that the values of $t$ for which $\log(N_i) \le (P^{(H)} - P^{(\min)})\,t$ contribute more to the evaluation of the expectation than the values for which $\log(N_i) > (P^{(H)} - P^{(\min)})\,t$. Although this seems true in simulations (not shown in the chapter), it is analytically complicated, and thus, as in Sec. 3.4.2, we leave it as a conjecture.

Finally, note that we can always compute the probability for a given value of $t$. For instance, simulations (not shown in this chapter) show that we need $t$ to be twice the exact value of the expectation for $\epsilon \approx 0.8$.

3.7.3 Proof of Theorem 3.3.3

Proof. The proof has two parts. In the first part we show that replacing any two probabilities in (3.8) with their average results in a lower bound on the original expected discovery time. In the second part, we show that iteratively substituting two probabilities with their average converges to the right-hand side of Theorem 3.3.3.

Let $T^{(j,k)}_i$ denote the discovery time of node $i$ when two probabilities of success, say $P_{j,i}$ and $P_{k,i}$, are substituted with their arithmetic average, i.e.,
\[ P_{j,i} \leftarrow \tfrac{1}{2}(P_{j,i} + P_{k,i}) \quad \text{and} \quad P_{k,i} \leftarrow \tfrac{1}{2}(P_{j,i} + P_{k,i}). \tag{3.22} \]
The following theorem shows the impact of such a modification.

Theorem 3.7.1.
Replacing any two probabilities of success $P_{j,i}$ and $P_{k,i}$ with their average results in a lower bound of the original expected discovery time, i.e., we have:
$$E\{T_i\} \ge E\big\{T^{(j,k)}_i\big\}.$$

For clarity of presentation we provide the proof in subsection 3.7.3.1. As indicated above, the formula for $E\{T^{(j,k)}_i\}$ is similar to (3.8) with the updated probabilities. Consequently, performing the averaging iteratively results in a lower bound at each step, i.e., when $j, k, s, r, v, u \in \mathcal{N}_i$ we have
$$E\big\{T^{(j,k)}_i\big\} \ge E\big\{T^{(j,k)(s,r)}_i\big\} \ge E\big\{T^{(j,k)(s,r)(v,u)}_i\big\} \ge \ldots \tag{3.23}$$
where $E\{T^{(j,k)(s,r)}_i\}$ and $E\{T^{(j,k)(s,r)(v,u)}_i\}$ are the resultant discovery times after applying (3.22) to replace the new $P_{s,i}$ and $P_{r,i}$ with their average, and then substituting the resulting $P_{v,i}$ and $P_{u,i}$ with their average, respectively. Next, we show that this process ultimately converges when we apply (3.22) iteratively to different, possibly randomly chosen, probability pairs. Specifically, we have the following theorem:

Theorem 3.7.2. Let $\mathbf{p}$ be a vector of real numbers, i.e., $\mathbf{p} = [p_1, \ldots, p_n]^T$, and let $\bar{p} = \frac{1}{n}\sum_{j=1}^{n} p_j$. There exists an algorithm that iteratively replaces two elements at a time with their average until $\mathbf{p}$ converges to a vector $\bar{\mathbf{p}}$ with all elements equal to $\bar{p}$.

In subsection 3.7.3.2 we use the properties of doubly stochastic and Markov matrices to show that, for large $k$, $\mathbf{W}^k \mathbf{p} \to \bar{\mathbf{p}}$ is one such algorithm, where $\mathbf{W}$ is a matrix that represents a sequence of double averaging and $k$ is the number of repetitions of this procedure. Theorem 3.3.3 is a direct consequence of Theorem 3.7.1 and Theorem 3.7.2.

3.7.3.1 Proof of Theorem 3.7.1

Proof. We start by defining the following terms and functions. Let $I^i_{j,k}$ be the constant equal to the sum of the terms in (3.8) that are independent of $P_{j,i}$ and $P_{k,i}$; an example of such a term is $\frac{1}{P_{s,i} + P_{r,i}}$, where $s, r, j, k \in \mathcal{N}_i$.
Let $C^i_{k,j}$ be the constant equal to the sum of the terms that include both $P_{k,i}$ and $P_{j,i}$; an example of such a term is $\frac{1}{P_{j,i} + P_{k,i}}$. Finally, let $F^i_{j|k}(x)$ be
$$F^i_{j|k}(x) = \frac{1}{x} - \sum_{r \in \mathcal{N}_i \setminus \{j,k\}} \frac{1}{x + P_{r,i}} + \ldots + (-1)^{N_i}\, \frac{1}{x + \sum_{r \in \mathcal{N}_i \setminus \{j,k\}} P_{r,i}}. \tag{3.24}$$
The function $F^i_{j|k}(x)$ contains all the terms of (3.8) that include $P_{j,i}$ and not $P_{k,i}$, with $P_{j,i}$ represented by $x$. From symmetry considerations, it is easy to verify that $F^i_{j|k}(x) = F^i_{k|j}(x)$. Thus, we can rewrite (3.8) as:
$$T_i = F^i_{j|k}(P_{j,i}) + F^i_{k|j}(P_{k,i}) + C^i_{k,j} + I^i_{j,k}. \tag{3.25}$$
Next, we show that the function $F^i_{j|k}(x)$ is convex in $x$.

Lemma 3.7.3. For a given finite set $\mathcal{N}_i$ and values $P_{j,i} \in (0, 1]$ $\forall j \in \mathcal{N}_i$, the function $F^i_{j|k}(x)$ is convex.

Proof. We use the second derivative test. Thus, we need to show that the second derivative of $F^i_{j|k}(x)$ is non-negative, i.e., $F^{i\,\prime\prime}_{j|k}(x) \ge 0$, where
$$F^{i\,\prime\prime}_{j|k}(x) = \frac{2}{x^3} - \sum_{r \in \mathcal{N}_i \setminus \{k,j\}} \frac{2}{(x + P_{r,i})^3} + \ldots + (-1)^{N_i}\, \frac{2}{\big(x + \sum_{r \in \mathcal{N}_i \setminus \{j,k\}} P_{r,i}\big)^3}. \tag{3.26}$$
To show that this is non-negative over the range $x \in [0, 1]$, we utilize the similarity between each term in (3.26) and some properties of the Unilateral Laplace Transform [96]. Let the Laplace transform of a function $g(t)$ be given by
$$G(s) = \int_0^\infty g(t)\, e^{-ts}\, dt.$$
As is well known, for $a \in [0, 1]$ and region of convergence $s > -a$, $\mathcal{L}\{t^2 e^{-at}\} = \frac{2}{(s+a)^3}$. Given the linearity of the Laplace transform, by replacing $s$ with $x$, and $a$ with the appropriate value in each term of (3.26), we find
$$\mathcal{L}^{-1}\{F^{i\,\prime\prime}_{j|k}(x)\} = t^2 e^{-xt} Z_{jk},$$
where
$$Z_{jk} = 1 - \sum_{r \in \mathcal{N}_i \setminus \{k,j\}} e^{-P_{r,i} t} + \ldots + (-1)^{N_i}\, e^{-\left(\sum_{r \in \mathcal{N}_i \setminus \{k,j\}} P_{r,i}\right) t}. \tag{3.27}$$
Clearly, to show that $F^{i\,\prime\prime}_{j|k}(x) \ge 0$, it is sufficient to show that $Z_{jk}$ is non-negative.

Claim: $Z_{jk} \ge 0$.

Proof. Let us substitute $Z_{jk} = 1 - V_{jk}$, where
$$V_{jk} = \sum_{r \in \mathcal{N}_i \setminus \{k,j\}} v_r - \sum_{r \ne s;\; r,s \in \mathcal{N}_i \setminus \{k,j\}} v_r v_s + \ldots + (-1)^{N_i - 1} \prod_{r \in \mathcal{N}_i \setminus \{k,j\}} v_r, \tag{3.28}$$
where $v_r = e^{-P_{r,i} t} \in [0, 1]$.
The reader might notice the similarity between (3.28) and the inclusion-exclusion principle for $n = N_i - 2$ independent events. To see this clearly, define the set of independent events $A_r$, $r \in \{1, \ldots, n\}$, each occurring with probability $v_r$. Then the probability of the union of these events is
$$P(A_1 \cup A_2 \cup \ldots \cup A_n) = \sum_r P(A_r) - \sum_{r \ne s} P(A_r \cap A_s) + \ldots + (-1)^{n+1} P(A_1 \cap A_2 \cap \ldots \cap A_n) = \sum_r v_r - \sum_{r \ne s} v_r v_s + \ldots + (-1)^{n+1} \prod_r v_r.$$
By the axioms of probability, the left-hand side is at most 1, i.e., $V_{jk} \le 1$, and thus $Z_{jk} \ge 0$. This concludes the proof of Lemma 3.7.3.

Since $F^i_{j|k}(x)$ is convex, we can write [94]
$$F^i_{k|j}\Big(\frac{P_{j,i} + P_{k,i}}{2}\Big) + F^i_{j|k}\Big(\frac{P_{j,i} + P_{k,i}}{2}\Big) = 2 F^i_{k|j}\Big(\frac{P_{j,i} + P_{k,i}}{2}\Big) \tag{3.29}$$
$$\le F^i_{j|k}(P_{j,i}) + F^i_{k|j}(P_{k,i}). \tag{3.30}$$
Then adding the terms $I^i_{j,k} + C^i_{k,j}$ to both sides of the inequality completes the proof.

3.7.3.2 Proof of Theorem 3.7.2

Proof. Let $\mathbf{p} = [p_1, p_2, \ldots, p_n]^T$; we start with the case where $n$ is an odd number. Define $\boldsymbol{\omega}_u$ as an $n \times n$ matrix whose diagonal elements are all one, except for a $2 \times 2$ block that starts at $(2u-1, 2u-1)$ and equals $\frac{1}{2} U_{2 \times 2}$, where $U_{u \times v}$ is the all-one matrix of size $u \times v$. We also define $O_{u \times v}$ as the all-zero $u \times v$ matrix and $I_{u \times u}$ as the identity matrix of size $u \times u$. For instance, we have
$$\boldsymbol{\omega}_2 = \begin{bmatrix} \begin{matrix} I_{2 \times 2} & O_{2 \times 2} \\ O_{2 \times 2} & \tfrac{1}{2} U_{2 \times 2} \end{matrix} & O_{4 \times (n-4)} \\ O_{(n-4) \times 4} & I_{(n-4) \times (n-4)} \end{bmatrix}.$$
We also define $\tilde{\boldsymbol{\omega}}_u$, which is a matrix with all-one diagonal except for the $2 \times 2$ block $\frac{1}{2} U_{2 \times 2}$ that starts instead at element $(2u, 2u)$. Note that the process of replacing the first two elements of $\mathbf{p}$ with their average is equivalent to $\boldsymbol{\omega}_1 \mathbf{p}$. Next define $\boldsymbol{\Omega} = \prod_{i=1}^{\frac{n-1}{2}} \boldsymbol{\omega}_i$, and similarly $\tilde{\boldsymbol{\Omega}} = \prod_{i=1}^{\frac{n-1}{2}} \tilde{\boldsymbol{\omega}}_i$. It is easy to see that $\boldsymbol{\Omega} \mathbf{p}$ replaces the pairs of elements $(p_i, p_{i+1})$, $i \in \{1, 3, \ldots, n-2\}$, of $\mathbf{p}$ with their averages; similarly, $\tilde{\boldsymbol{\Omega}} \mathbf{p}$ does so for $(p_i, p_{i+1})$, $i \in \{2, 4, \ldots, n-1\}$. Next, define
$$\mathbf{p}_u = (\tilde{\boldsymbol{\Omega}} \boldsymbol{\Omega})^u\, \mathbf{p}.$$
Then an algorithm described in Theorem 3.7.2 can be the one in the following lemma:
$$\lim_{u \to \infty} \mathbf{p}_u = \bar{\mathbf{p}}.$$

Proof.
To prove the lemma, we first need to study the structure of the matrices $\boldsymbol{\Omega}$ and $\tilde{\boldsymbol{\Omega}}$. By multiplying all $\boldsymbol{\omega}_i$ we can see that $\boldsymbol{\Omega}$ is block diagonal with each block equal to $\frac{1}{2} U_{2 \times 2}$, except that the last $1 \times 1$ block is 1, i.e., we have
$$\boldsymbol{\Omega} = \begin{bmatrix} \tfrac{1}{2} U_{2 \times 2} & O_{2 \times 2} & \cdots & \\ O_{2 \times 2} & \tfrac{1}{2} U_{2 \times 2} & \cdots & O_{(n-1) \times 1} \\ \vdots & \vdots & \ddots & \\ & O_{1 \times (n-1)} & & 1 \end{bmatrix}.$$
Similarly, $\tilde{\boldsymbol{\Omega}}$ is block diagonal, but the first $1 \times 1$ block is equal to one, and the rest are $\frac{1}{2} U_{2 \times 2}$. Define $\mathbf{W} \triangleq \tilde{\boldsymbol{\Omega}} \boldsymbol{\Omega}$. Then, for $n \ge 5$, let $\mathbf{M}_u$ be an $n \times 2$ matrix that is all zero except for an all-one $4 \times 2$ submatrix whose first element is at $(u+1, 1)$; for instance,
$$\mathbf{M}_1 = \begin{bmatrix} O_{1 \times 2} \\ U_{4 \times 2} \\ O_{(n-5) \times 2} \end{bmatrix}.$$
Then, it is easy to verify that $\mathbf{W}$ has the following form:
$$\mathbf{W} = \frac{1}{4} \begin{bmatrix} \begin{matrix} 2 U_{1 \times 2} \\ U_{2 \times 2} \\ O_{(n-3) \times 2} \end{matrix} & \mathbf{M}_1 & \mathbf{M}_3 & \cdots & \mathbf{M}_{n-5} & \begin{matrix} O_{(n-2) \times 1} \\ 2 U_{2 \times 1} \end{matrix} \end{bmatrix}.$$
Note that every row and every column of $\mathbf{W}$ adds to 1; matrices of this type are called doubly stochastic. To prove the lemma we utilize properties of such matrices. Recall that we need to show that
$$\mathbf{W}^\infty = \lim_{u \to \infty} \mathbf{W}^u = \frac{1}{n} U_{n \times n}.$$
Since $\mathbf{W}$ is a square matrix with non-negative entries and every row adds to one, we can view it as the state transition matrix of a Markov Chain (MC) with $n$ states. The resultant MC is irreducible, since any state $u$ is accessible from any other state $v$. It is also aperiodic: since aperiodicity is a class property, it is enough to note that for this finite irreducible MC any state has a self transition, and thus the period is $d = 1$. As in [66], the aforementioned properties indicate that there is a unique stationary distribution of the MC such that
$$\mathbf{W}^\infty = \begin{bmatrix} w_1 & w_2 & \cdots & w_n \\ w_1 & w_2 & \cdots & w_n \\ \vdots & \vdots & & \vdots \\ w_1 & w_2 & \cdots & w_n \end{bmatrix}.$$
Doubly stochastic matrices are closed under multiplication [97], i.e., $\mathbf{W}^u$ is a doubly stochastic matrix for any value of $u$. Consequently, each column of $\mathbf{W}^\infty$ also sums to one, so $n w_i = 1 \implies w_i = \frac{1}{n}$, and as a result we have
$$\mathbf{W}^\infty = \frac{1}{n} \begin{bmatrix} 1 & 1 & \cdots & 1 \\ \vdots & \vdots & & \vdots \\ 1 & 1 & \cdots & 1 \end{bmatrix} = \frac{1}{n} U_{n \times n}.$$
Q.E.D.

This also shows that averaging two elements at a time in the sequence indicated by Lemma 3.7.3.2 is one possible algorithm in Theorem 3.7.2. When $n$ is an even number, we need to slightly modify $\boldsymbol{\omega}_u$ and $\tilde{\boldsymbol{\omega}}_u$ such that we omit the last row and column in both matrices. The proof follows a similar methodology and is omitted for brevity.

3.7.4 Extensions

Optimizing the expected discovery time for the whole network might be complex, especially for a large network and/or when the nodes have a large number of antennas. Alternatively, we can use local information to minimize the discovery time. In particular, and based on our earlier observation, $P^{\min}_i$ plays a major role in the bounds. In [71] and [98], we provide two scenarios to minimize the discovery time. Specifically,

• In [98], we again maximize the minimum probability of success:
$$\max_{\mathbf{p}, \mathbf{q}}\; \min_{i,j}\; P(D_{j,i}).$$
To solve this problem, the BS can use a simple iterative solution (similar to coordinated gradient descent) to maximize the minimum probability of success. Alternatively, this can be viewed as a distributed solution where the neighbor nodes exchange local information, such as the values $p_{j,\cdot}$, $q_{j,\cdot}$. This results in a low complexity iterative scheme with reasonable performance.

• In [71], for a distributed system, each node aims to minimize its local expected discovery time. Similar to the above, the node can maximize the local minimum probability of success; for node $i$ we have
$$\max_{p_{i,\cdot},\, q_{i,\cdot}}\; \min_j\; P_{j,i \mid I},$$
where the maximization is over the values $p_{i,\cdot}$, $q_{i,\cdot}$ for all directions of node $i$, and $I$ represents the global side information for all neighbor nodes and their directional transmission probabilities. However, since this information is not available to the nodes, this degenerates to $I_i$, which is the local side information. To optimize the above quantity, node $i$ can make assumptions about the $p_{j,\cdot}$ and $q_{j,\cdot}$ and the estimated number of neighbors that other nodes see in the direction of $i$, i.e., their prior information.
In this problem, certain constraints should be enforced to maintain fairness in the discovery process.

Chapter 4

Band Assignment in Dual Band Systems

4.1 Introduction

The large available bandwidths in the millimeter-wave (mmWave) frequency band can support the high data rates required for many emerging applications in next generation wireless networks (5G and beyond). However, the hostile propagation conditions at high frequencies restrict its utilization. Compared to the centimeter-wave (cmWave) band, signals in the mmWave band suffer from higher attenuation and higher diffraction loss, and are more susceptible to blockage, which reduces the reliability of the communication system [18, 99], a required criterion for a seamless user experience. Thus, due to the characteristics of the two bands, both are indispensable components of future wireless networks [8, 32]. The joint utilization of the two bands enhances the coverage, system reliability and achievable data rates.

Recently, different dual band architectures were proposed [32, 100, 101]. For instance, the cmWave band can be used for the control plane while the mmWave band is used for the data plane. Alternatively, both bands can be used for both planes. In some wireless networks, the simultaneous usage of the two bands might not be practical due to a number of limitations at the User Equipment (UE) side, such as limited processing capabilities, constraints on transmission power, etc. Thus, depending on the underlying band assignment (BA) scenario, the base station (BS) has to either assign the UE to one of the two bands based on the instantaneously observed channels, such as in the initial channel access scenario, or sequentially switch the communication between the two bands as the UE moves, i.e., switch to the mmWave band whenever it is available, or to the cmWave band when the mmWave band suffers a blockage or other bad propagation conditions.
We refer to the first problem as one-shot BA, and the second as sequential BA.

The BA problem is challenging, since simultaneous observations of the two bands are not usually available to the BS. Furthermore, using a frequent "measurement gap" to send training signals over the two bands and synchronizing such (possibly) unnecessary switching between the bands reduces the overall throughput of the system. In an alternative solution that relies on the correlation and the joint characteristics of the two bands, the BS can utilize partial information, such as the channel state in one band or the UE's location, together with some "prior knowledge" to solve the BA problem. Both the accurate joint characterization of the two bands and the proper utilization of such (possibly) non-homogeneous information are also challenging. In this chapter, we concentrate on the latter problem; for the former, see, e.g., [102] and references therein.

Machine learning (ML) is a powerful technique that can capture complex relations between the input data (features) and the output values (labels). Motivated by the remarkable success of Deep ML (DL) in various fields, the wireless communication community has recently started to explore DL in problems such as channel coding, estimation, channel modeling and many others; see, e.g., [5, 36, 103] and references therein. The reported initial results are promising; ML based solutions are able to provide competitive performance for problems where optimal solutions are known, e.g., using multi-layer Neural Networks (NNs) for decoding in the AWGN channel [104], indicating that ML may also be applied to problems where traditional methods have failed or where the environment is too complex. For instance, Ref. [105] demonstrated the efficiency of DL based detection in a molecular system where the channel is difficult to model.

In this work we consider two different approaches to solving the BA scenarios.
In the first approach, we use standard assumptions about the channel model to derive analytical solutions to the problem. In particular, we assume that the shadowing (on a logarithmic scale) in the two bands follows a joint Gaussian distribution over frequency and space, i.e., it represents a Gaussian Process (GP); this assumption extends the widely used model of lognormal shadowing in a single frequency band [18]. The second approach is motivated by the above discussion on the complexity of the BA and the promising performance of the ML solutions. Thus, we explore different ML models for several feature combinations that may include some information about the channel properties and/or the location of the UE. We study the approaches in two different environments. The first is a stochastic environment, where the channel realizations are generated in accordance with the GP assumption. The second is a more realistic environment where the channel states are obtained via ray-tracing. Both environments are needed for fair and informative comparisons: in the first, the GP-based solution is optimal for a statistical channel model, while the second represents a realization of a realistic environment.

4.1.1 Prior Work

In addition to the papers that discussed possible advantages and architectures of dual band systems, there have been recent studies that considered the interplay between the cmWave and mmWave bands. Refs. [99, 106, 107] utilize the angular correlation in the two bands to provide a coarse estimate of the Angle of Arrival (AoA) at mmWaves based on the AoAs in the cmWave band, which can be used to reduce the beam-forming complexity at the mmWave band. Ref. [108] studied the covariance matrix translation between the two bands. For joint communication in the two bands, [107] proposes a two-queue model to assign data to each band such that delay is minimized and throughput is maximized. Ref.
[33] considers the downlink resource allocation in a network with a small cell BS, where the BS aims to assign the UE or services to the resources in the two bands.

The BA process can be viewed as a handover process between two co-located BSs with different frequencies. Refs. [109–111] used ML approaches to address the handover and switching between BSs that may use different frequency bands. In [110], the authors use ML to improve the success rate in the handover between two co-located cells in different bands; their ML classifier uses the prior channel measurements and handover decisions within a temporal window to predict the success of the handover. Ref. [109] introduces an uplink (ULink)/downlink (DLink) decoupling concept where the central BS gathers measurements of the Rician K-factor and the DLink reference signal receive power for both bands, and trains a non-linear ML algorithm that is then applied to the cmWave band data to predict the target frequencies and BS that can be used for the ULink and DLink. Ref. [111] uses a gated recurrent NN to predict the handover status at the next time slot given the beam sequence, where the BS uses the sequence of previously used beam-forming vectors as input to the ML scheme. In contrast to these works, we use different sets of features and several ML algorithms (including a DL solution based on a recurrent NN) for two problem setups in two different environments, enabling us to optimize a solution approach rather than just check the performance of one particular algorithm. In addition, we also consider an analytical solution based on GP for the BA problem, which is not only of value in itself but also allows benchmarking of the ML solution.

Channel state prediction using GP or ML was considered in several works such as [112–115], where Refs. [112, 115] use GP to predict the shadowing values in the network based on collected drive tests, while Refs.
[112–114] use regression ML techniques to predict the channel state. Using ML to predict unobserved channel features was also considered in [116, 117]; in [117] the authors use NNs to predict the AoA, and in [116] the authors utilize the observed channel state information in a central BS to predict the optimal beam direction in local BSs. However, these works focus on a single band and mainly use a regression framework, while here we solve the BA in two bands as a classification problem using several related feature combinations. In our recent work [118] we derive the achievable rates and the outage probability of the BA based on linear prediction. In the current work, although we use similar basic assumptions as [118] for the GP approach, the BA decisions are based on the probability of success.

4.1.2 Objective

We consider GP-based and ML-based approaches to obtain solutions to two BA scenarios: one-shot and sequential BA. The one-shot scenario relies on the current observations for the BA, while in the sequential BA the BS uses the current and previous observations to predict the best BA at a future time instant. The GP-based solution uses the GP assumption along with approximations, which may not always hold in practice, to derive analytically tractable solutions. In the ML based solution we use DL and other ML techniques to propose efficient solutions to the BA problems. The observations used depend on the scenario and the approach, and may include: the location of the UE, the received power (or data rate) in one band,¹ the delay, and the Angle of Departure (AoD) of the main multi-path component (MPC) [18]. Utilizing such information, the BS can reduce the required signaling, which improves the spectrum efficiency and reduces the latency in the system.
We study the performance of the proposed solutions in a ray-tracing and a stochastic environment, where the latter relies on the GP assumption to generate the shadowing.

The remainder of the chapter is organized as follows. Sec. 4.2 describes the basic system model, summarizes the two problems, and introduces the two approaches along with their main assumptions. In Sec. 4.3, we derive the BA rules under the GP assumption, and also provide an approximate solution. Sec. 4.4 provides the details of the DL approach. Secs. 4.5 and 4.6 provide two experiments to evaluate the performance of the schemes, using a stochastic environment and a ray-tracing environment, respectively.

4.2 Problem and Solutions Overview

4.2.1 Basic System Model

We consider a dual band cellular system, where the BS and the UE can operate in two frequency bands with center frequency $f_b$ and bandwidth $\omega_b$ in band $b \in \{c, m\}$, where $c$ and $m$ refer to the cmWave and the mmWave bands, respectively. Due to a number of practical limitations of the UE, we assume that data transmission occurs in a single frequency band at a time. The BS controls the band selection process, using some observations about the channel and prior knowledge to choose the band that results in the highest data rate. To focus on the basic problem, we consider a single user case, i.e., no scheduling or interference is considered; the multi-user case is left for future work.

¹ We use the signal-to-noise ratio, signal strength and rate interchangeably when we refer to one of them as a feature, since we assume that we can use one of them to calculate the others, even though that might not be correct under some circumstances.

It is well established that the small-scale fading in the two bands is independent due to the large frequency separation; furthermore, modern diversity techniques mostly eliminate its impact [18].
In contrast, large-scale parameters vary relatively slowly over time and maintain time and frequency correlation, making it possible to utilize information over frequency and time (space) and thus make switching decisions. Note also that large-scale parameters are reciprocal in ULink and DLink for both time-domain and frequency-domain duplexing systems as long as the duplexing distance is smaller than the stationarity time or bandwidth, respectively; this condition is fulfilled for almost all practical systems. For this reason, the subsequent discussion is valid for both link directions, and we assume that the BS can acquire the channel state information about the large-scale parameters without additional overhead.

Similar to [118], we define a time-frame as a sequence of $T$ time slots (data units); each time-frame is indexed with $t$. The SNR in band $b$ on a logarithmic scale (dB) during time-frame $t$ can be described by [18]
$$\mathrm{SNR}^b(t) = P^b_{tx} + \chi^b(t) - N^b_0, \tag{4.1}$$
where $P^b_{tx}$ is the transmitted power (including both transmit and receive antenna gain), $N^b_0$ is the noise level, and $\chi^b(t)$ captures the large-scale variation in band $b$ that varies as the UE moves. Then, using capacity-achieving transmission, we can write the rate in band $b$ as
$$R^b(t) = \omega_b \log\left(1 + 10^{\kappa\, \mathrm{SNR}^b(t)}\right), \tag{4.2}$$
where $\kappa = 0.1$.

In this work, the BA procedure and the detailed description of the observations and prior knowledge depend on the scheme and the problem setup. In general, the BS uses the observations to produce the soft decision $\tilde{D} \in [0, 1]$, which it then uses to make the BA decision $D \in \{0, 1\}$, where we use "0" and "1" to refer to data transmission in the cmWave and mmWave band, respectively.

4.2.2 Problem Description

We study two scenarios. Other scenarios can also be considered; however, they generally lie between these two and/or can be derived from the provided analysis, so for brevity we do not discuss them here.
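As a minimal numerical sketch of the model in Sec. 4.2.1, the following Python snippet evaluates the SNR in (4.1), the rate in (4.2), and the label that an ideal BA decision would take. The function names, the use of the natural logarithm, and all numeric values are illustrative assumptions and are not part of the system description.

```python
import math

KAPPA = 0.1  # dB-to-linear exponent factor used in (4.2)


def snr_db(p_tx_db, chi_db, noise_db):
    """SNR in dB as in (4.1): transmit power plus large-scale gain minus noise level."""
    return p_tx_db + chi_db - noise_db


def rate(bandwidth, snr_db_value):
    """Rate as in (4.2); the natural logarithm is an assumption of this sketch."""
    return bandwidth * math.log(1.0 + 10.0 ** (KAPPA * snr_db_value))


def ideal_decision(rate_cm, rate_mm):
    """Return 1 (mmWave) if the mmWave rate is larger, otherwise 0 (cmWave)."""
    return 1 if rate_mm > rate_cm else 0


# Hypothetical link budgets (dB) and normalized bandwidths for the two bands.
snr_c = snr_db(p_tx_db=30.0, chi_db=-110.0, noise_db=-95.0)   # 15 dB
snr_m = snr_db(p_tx_db=40.0, chi_db=-130.0, noise_db=-85.0)   # -5 dB
r_c = rate(bandwidth=1.0, snr_db_value=snr_c)
r_m = rate(bandwidth=10.0, snr_db_value=snr_m)
decision = ideal_decision(r_c, r_m)
```

In this hypothetical example the mmWave band has a much lower SNR, so whether its larger bandwidth compensates is exactly the comparison that the BA decision has to resolve.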
4.2.2.1 One-Shot Band Assignment

In this problem, for time-frame $t$, the BS uses the current observations to choose the frequency band for data communication, i.e., the observations, the BA decision and the data transmission are all in the same time frame $t$. The observations used depend on the approach, but may include the observed power (or rate) in one of the two bands, the UE location, and the delay and AoD of the dominant MPC at frame $t$. This problem is relevant, for instance, in initial network access, where the BS can use information from the control signal (e.g., over the cmWave band) to assign a suitable frequency band for data communication. It is furthermore relevant in nomadic scenarios, where the channel state does not change (or does not change significantly) over time, and thus even a sequential BA degenerates to the one-shot approach.

4.2.2.2 Sequential Band Assignment

In this problem, we assume that as the UE moves, the BS uses the current and the previous observations to predict the best band after $U$ time frames. As above, the observations used depend on the approach. This problem is useful for resource allocation, as the BS can plan in advance which resources to use. Note that although this problem (with small modifications) can be viewed as a generalization of the previous one, we keep the two distinct for clarity.

4.2.3 Solutions Overview

4.2.3.1 Gaussian Process Based Solutions

In this approach, we assume that $\chi^b(t)$ in (4.1) consists of the path loss $P^b_L(t)$ and the large-scale fading (shadowing) $S^b(t)$, i.e.,
$$\chi^b(t) = -P^b_L(t) + S^b(t). \tag{4.3}$$
We assume that the BS knows the channel model and statistics, and that it can estimate $P^b_L(t)$, either using empirical models or prior knowledge of the environment.
We also assume that the shadowing is a stationary Gaussian process in space (time) and frequency with mean $\mu_b = 0$ and standard deviation $\sigma_b$ in band $b$, i.e., $S^b(t) \sim \mathcal{N}(0, \sigma_b^2)$, and that $S^b(t)$ and $S^{b'}(t')$ are jointly normal with correlation function $\mathrm{Cov}(S^b(t), S^{b'}(t'))$. An example of the correlation model and further discussion is provided in Sec. 4.3 and in [118]. Note that assuming $S^b(t)$ is Gaussian on a dB scale matches many measurement campaigns [18], but it may not always hold in practice. Still, we rely on it, along with the joint Gaussian assumption over frequency, for simplicity and mathematical tractability. To emphasize the fact that the rate is a random quantity, we use (4.3) to rewrite (4.2):
$$R^b(t) = \omega_b \log\left(1 + \gamma'_b\, 10^{\kappa S^b(t)}\right), \tag{4.4}$$
where $\kappa = 0.1$ and $\gamma'_b = 10^{0.1\,(P^b_{tx} - P^b_L - N^b_0)}$. To simplify the notation, we sometimes omit the time index for $S^b$ and $R^b$ when the context is clear. Then, for the two problems we have:

• For the one-shot problem, the BS observes $\mathrm{SNR}^b(t)$ and uses an estimate of the path loss $P^b_L(t)$ to extract $S^b(t)$. It then uses $S^b(t)$ to make a BA decision based on the probability that $R^b(t) > R^{b'}(t)$, i.e., whether to communicate over $b$ or $b'$.

• For the sequential decision problem, in time frame $t$ the BS uses the SNR observations (along with the path loss estimates) of the current and the last $Q$ time frames to predict the BA decision in time frame $t + U$, based on the probability that $R^b(t+U) > R^{b'}(t+U)$.

For the GP-based solutions we refer to the set of observations at time $t$ as the set $\mathcal{H}_t$. In general, this approach uses power (rate) and distance information; it also uses training data to acquire the statistics of the environment. However, it is difficult to directly incorporate other features. Thus we consider two different environments, one of which matches the GP channel model.
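The first step of the GP-based approach, removing the estimated path loss from an observed SNR to obtain the shadowing value, and then expressing the rate through that shadowing value as in (4.4), can be sketched as follows. The sketch assumes that the path loss is given as a positive loss in dB and that the logarithm in the rate expression is natural; the function names and the numbers are hypothetical.

```python
import math

KAPPA = 0.1  # dB-to-linear exponent factor


def mean_snr_db(p_tx_db, path_loss_db, noise_db):
    """SNR in dB without shadowing; path loss is a positive loss (assumed convention)."""
    return p_tx_db - path_loss_db - noise_db


def extract_shadowing(snr_obs_db, p_tx_db, path_loss_db, noise_db):
    """Shadowing in dB: observed SNR minus the shadowing-free SNR."""
    return snr_obs_db - mean_snr_db(p_tx_db, path_loss_db, noise_db)


def rate_from_shadowing(bandwidth, p_tx_db, path_loss_db, noise_db, s_db):
    """Rate written as in (4.4), with the constant gamma' computed from the link budget."""
    gamma_prime = 10.0 ** (KAPPA * mean_snr_db(p_tx_db, path_loss_db, noise_db))
    return bandwidth * math.log(1.0 + gamma_prime * 10.0 ** (KAPPA * s_db))


# Hypothetical cmWave observation: 12 dB SNR with a 15 dB shadowing-free SNR.
s_c = extract_shadowing(snr_obs_db=12.0, p_tx_db=30.0, path_loss_db=110.0, noise_db=-95.0)
r_c = rate_from_shadowing(1.0, 30.0, 110.0, -95.0, s_c)
```

Under these assumptions, evaluating the rate from the extracted shadowing gives the same value as evaluating (4.2) directly from the observed SNR, which is a convenient sanity check on the sign conventions.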
4.2.3.2 Learning Based Solutions

In any given frame the BS has to take one of two decisions: use the cmWave band ($D = 0$) or the mmWave band ($D = 1$), which can be viewed as a binary classification problem. Several ML models can be used to solve such problems. Our focus here is on NN based solutions, but we also consider other models to provide a fair assessment of the ML based solutions. The following is a summary of the approaches used:

• For the one-shot problem, we use multi-layer NNs, in addition to LR and GR. The regression models are simple linear and nonlinear ML solutions to the problem. Depending on the environment, we could use the received power in one band, the polar coordinates of the UE (distance and phase), and the delay and AoD of the dominant MPC as features.

• For the sequential problem, we use a deep network with NN and LSTM architecture. We also use a multi-layer NN and GR with historical data for comparison, the history being a window of the last $Q$ observations. The observed features depend on the environment, but in this case they may include the observed power from both frequency bands.²

For the learning based solutions, we denote the set of observed features in time frame $t$ by $\mathcal{F}_t$. Details of the chosen approaches, training, etc. are presented in Sec. 4.4.

² This is just one of many possible features; in practice this can be realized using passive power sensing (RSSI sensors) [18].

4.2.4 Performance Metrics

We evaluate the performance of the solutions using the probability of BA error. For $N_X$ instances it can be numerically equivalent to the average number of BA errors³
$$E_X = \frac{1}{N_X} \sum_{i}^{N_X} |D_i - L_i|, \tag{4.5}$$
where $D_i$ and $L_i \in \{0, 1\}$ are the decision of the scheme of interest and the correct decision, respectively, for instant $i$. We also refer to $L_i$ as the true label, which takes the value "1" when the data rate in the mmWave band is larger than the data rate in the cmWave band.
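The error measure in (4.5) can be computed directly from the decisions and the true labels; a minimal sketch follows. The function names and the sample data are hypothetical, and ties between the two rates are labeled "0" here because the label is "1" only when the mmWave rate is strictly larger.

```python
def true_labels(rates_mm, rates_cm):
    """Label 1 when the mmWave rate is strictly larger than the cmWave rate, else 0."""
    return [1 if r_m > r_c else 0 for r_m, r_c in zip(rates_mm, rates_cm)]


def ba_error(decisions, labels):
    """Average number of BA errors as in (4.5)."""
    if not decisions or len(decisions) != len(labels):
        raise ValueError("decisions and labels must be non-empty and of equal length")
    return sum(abs(d - l) for d, l in zip(decisions, labels)) / len(decisions)


# Hypothetical rates for three instances and the decisions of some scheme.
labels = true_labels(rates_mm=[5.0, 1.0, 3.0], rates_cm=[2.0, 4.0, 3.0])
error = ba_error(decisions=[1, 1, 0], labels=labels)
```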
Note that here we use instant $i$ rather than time $t$, as the data points over which we evaluate the performance may not belong to the same sequential data. We use the subscript $X$ to distinguish the data sets for which we evaluate the BA error, as we will discuss later.

For interpretability of the results, we also show the normalized rate loss of the BA procedures. For a given scheme, we can define the achievable rate as $R_i : \{R^b_i : b = m \text{ if } D_i = 1, \text{ or } b = c \text{ if } D_i = 0\}$. We also define the maximum achievable rate for that instant as $R_{\max,i} = \max_{b \in \{c, m\}} R^b_i$; then the normalized rate loss is given by
$$R_X = \frac{1}{N_X} \sum_{i=1}^{N_X} \frac{R_i - R_{\max,i}}{R_{\max,i}}. \tag{4.6}$$
To calculate the rate loss values, we bound the achievable rates by using a fixed modulation format for both bands; in particular, we use 256 QAM [18]. Although we will show $R_X$ alongside $E_X$ throughout this chapter, we will only discuss the latter in the interpretation of the results.

³ Since some of the data points are correlated, we rely on an argument similar to the one we gave in [118].

4.3 Gaussian Process Based Band Assignment

Based on the model introduced in Sec. 4.2.3.1, we denote the following two events:
$$W_b(t) : R^b(t) > R^{b'}(t), \quad b \ne b', \; b, b' \in \{c, m\}.$$
Then, given the set of observations $\mathcal{H}_t$ at time frame $t$, a reasonable band assignment rule is
$$b = \arg\max_{b \in \{c, m\}} P(W_b(t) \mid \mathcal{H}_t). \tag{4.7}$$
This rule is a Maximum A posteriori Probability (MAP) decision rule [119]. Next we have the following theorem.

Theorem 4.3.1. The minimum BA error probability can be achieved when the BS chooses the band $b$ that satisfies
$$b : P(W_b(t) \mid \mathcal{H}_t) \ge 0.5.$$

Proof. The proof is simple. We start by showing that the rule is equivalent to (4.7). Note that we have two cases, $W_b(t)$ or $W_{b'}(t)$; thus from (4.7) we choose $b$ when
$$P(W_b(t) \mid \mathcal{H}_t) \ge P(W_{b'}(t) \mid \mathcal{H}_t) = 1 - P(W_b(t) \mid \mathcal{H}_t);$$
reordering the terms, we have $P(W_b(t) \mid \mathcal{H}_t) \ge 0.5$.
Next, by the definition of the MAP rule in (4.7), the BS minimizes the probability of error at each time frame, which as a result minimizes the overall probability of BA error.

The theorem indicates that the natural choice for $T$ is optimal under the conditions above.

4.3.1 One-Shot Band Assignment

In this scenario the observation is the rate (power) in one band. Here we focus on the case where we need to decide whether to assign the UE to the mmWave band, as the other direction is safer in general and can be derived similarly. Thus we have $\mathcal{H}_t = \{S^c(t)\}$, and the rule is
$$P(R^m \ge R^c \mid S^c = s_c) \ge 0.5. \tag{4.8}$$
We can rewrite the probability in (4.8) as:
$$P\left(\omega_m \log(1 + \gamma'_m 10^{\kappa S^m}) \ge \omega_c \log(1 + \gamma'_c 10^{\kappa S^c}) \,\middle|\, S^c = s_c\right) = P\left(S^m \ge v_1 \,\middle|\, S^c = s_c\right), \tag{4.9}$$
where $v_1 = \frac{1}{\kappa} \log_{10}\left(\frac{1}{\gamma'_m}\left(\exp(r_c / \omega_m) - 1\right)\right)$ and $r_c = \omega_c \log(1 + \gamma'_c 10^{\kappa s_c})$. With the assumption that $S^m$ and $S^c$ are jointly normal, it is enough to determine the conditional mean $\mu_{m|c}$ and variance $\sigma^2_{m|c}$ to calculate the probability in (4.9), which can be shown to be
$$\mu_{m|c} = \rho_{m,c} \frac{\sigma_m}{\sigma_c} s_c \quad\text{and}\quad \sigma^2_{m|c} = (1 - \rho^2_{m,c})\, \sigma^2_m,$$
where $\rho_{m,c}$ is the correlation coefficient of $S^m$ and $S^c$. Thus we have
$$P(S^m \ge v_1 \mid S^c = s_c) = Q\left(\frac{v_1 - \mu_{m|c}}{\sigma_{m|c}}\right) \ge 0.5,$$
where $Q(\cdot)$ is the Q-function [120]. Taking the inverse of the Q-function and rearranging the terms, we have
$$v_1 \le Q^{-1}(0.5)\, \sigma_{m|c} + \mu_{m|c} = \rho_{m,c} \frac{\sigma_m}{\sigma_c} s_c,$$
where we have used the fact that $Q^{-1}(0.5) = 0$. Solving for $s_c$, the BS should assign the UE to the mmWave band if the following condition is satisfied:
$$S^c \ge \frac{\sigma_c}{\rho_{m,c}\, \sigma_m} v_1. \tag{4.10}$$
For consistency with [121], we refer to this scheme as Threshold Based BA (TBBA).

4.3.2 Sequential Band Assignment

4.3.2.1 Exact Solution

The sequential BA follows similar steps; however, here we have $\mathcal{H}_t = \{S^c(t-Q), S^c(t-Q+1), \ldots, S^c(t), S^m(t-Q), S^m(t-Q+1), \ldots, S^m(t)\}$. The goal is to choose either $W_c(t+U)$ or $W_m(t+U)$. Thus we choose $b = m$ if
$$P(W_m(t+U) \mid \mathcal{H}_t) \ge 0.5 \tag{4.11}$$
and $b = c$ otherwise.
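Before the sequential case is developed further, the one-shot rule of Sec. 4.3.1 can be checked numerically: the 0.5 threshold on the conditional probability in (4.8) and the closed-form condition (4.10) should give the same decision. The sketch below assumes a positive correlation coefficient smaller than one, natural logarithms in the rate expression, and hypothetical parameter values; the function names are not taken from the text.

```python
import math

KAPPA = 0.1  # dB-to-linear exponent factor


def q_function(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def v1_threshold(s_c, w_c, w_m, gamma_c, gamma_m):
    """Threshold v_1 on the mmWave shadowing, given the cmWave shadowing s_c (dB)."""
    r_c = w_c * math.log(1.0 + gamma_c * 10.0 ** (KAPPA * s_c))
    return (1.0 / KAPPA) * math.log10((math.exp(r_c / w_m) - 1.0) / gamma_m)


def prob_mm_better(s_c, w_c, w_m, gamma_c, gamma_m, sigma_c, sigma_m, rho):
    """Conditional probability that the mmWave rate is at least the cmWave rate."""
    v1 = v1_threshold(s_c, w_c, w_m, gamma_c, gamma_m)
    mu = rho * sigma_m / sigma_c * s_c              # conditional mean
    sd = math.sqrt(1.0 - rho ** 2) * sigma_m        # conditional standard deviation
    return q_function((v1 - mu) / sd)


def tbba_decision(s_c, w_c, w_m, gamma_c, gamma_m, sigma_c, sigma_m, rho):
    """Closed-form threshold rule (4.10); returns 1 for mmWave and 0 for cmWave."""
    v1 = v1_threshold(s_c, w_c, w_m, gamma_c, gamma_m)
    return 1 if s_c >= sigma_c / (rho * sigma_m) * v1 else 0


# Hypothetical parameters: normalized bandwidths, linear gamma' values, dB deviations.
params = dict(w_c=1.0, w_m=2.0, gamma_c=100.0, gamma_m=0.001,
              sigma_c=6.0, sigma_m=8.0, rho=0.7)
decision = tbba_decision(0.0, **params)
probability = prob_mm_better(0.0, **params)
```

With these numbers the mmWave link budget is poor, so the rule keeps the UE on the cmWave band at zero shadowing. Raising the hypothetical `gamma_m` or `w_m` moves the decision towards mmWave.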
Since the set $\mathcal{H}_t$ has more than a single value, we need to use the joint Gaussian assumption over time (space). Then (4.11) becomes

$$P(W_m(t+U) \mid \mathcal{H}_t) = P(R_m(t+U) \ge R_c(t+U) \mid \mathcal{H}_t) = \int_{-\infty}^{\infty} P(R_m(t+U) \ge R_c(t+U) \mid \mathcal{H}_t, S_c(t+U) = s)\, f_{S_c|\mathcal{H}_t}(s)\, ds, \qquad (4.12)$$

where $f_{S_c|\mathcal{H}_t}(s)$ is the probability density function (PDF) of $S_c(t+U)$ given $\mathcal{H}_t$, which follows a normal distribution with mean $\mu_{c|\mathcal{H}}$ and variance $\sigma^2_{c|\mathcal{H}}$. Note that we conditioned on $S_c(t+U)$ and used the integration to circumvent the fact that the probability of $W_m: \{R_m > R_c\} = \{R_m - R_c > 0\}$ cannot be calculated using a simple probability distribution without some crude approximations (as we discuss in the next subsection). Next, using (4.4) we rewrite (4.12) as

$$\int_{-\infty}^{\infty} P(S_m(t+U) \ge V_2(s) \mid \mathcal{H}_t, S_c(t+U) = s)\, f_{S_c|\mathcal{H}_t}(s)\, ds, \qquad (4.13)$$

where

$$V_2(s) = 10 \log_{10}\left(\frac{1}{\gamma'_m}\left(\exp\left(\frac{\omega_c}{\omega_m} \log(1 + \gamma'_c 10^{s/10})\right) - 1\right)\right).$$

In order to evaluate (4.13), we first point out that $S_m(t+U)$, $S_c(t+U)$ and the observations in $\mathcal{H}_t$ are jointly normal; thus it is enough to calculate the conditional mean and variance of $S_m(t+U)$ given $\mathcal{H}_t$ and $S_c(t+U)$, which we denote by $\mu_{m|\mathcal{H}^+}$ and $\sigma^2_{m|\mathcal{H}^+}$, respectively. Note that we refer to the set $\{S_c(t+U)\} \cup \mathcal{H}_t$ by $\mathcal{H}^+_t$.

To calculate these quantities we need to define a few vectors and matrices. We use the convention that $\Sigma_{\mathcal{X}}$ denotes the covariance matrix between the elements of a set of random variables $\mathcal{X}$. We also denote by $\Sigma_{\mathcal{X},y}$ the cross covariance vector between $\mathcal{X}$ and a random variable $Y$, and by $\sigma^2_{y|\mathcal{X}}$ the variance of $Y$ given a realization of the elements of $\mathcal{X}$. For instance, $\Sigma_{\mathcal{H}}$ is a $(Q+1) \times (Q+1)$ covariance matrix of the shadowing observations, and $\Sigma_{\mathcal{H},b}$ is a $(Q+1) \times 1$ vector that represents the covariance between the shadowing in band $b$ at time $t+U$ and the data in the set $\mathcal{H}_t$. We use a similar subscript convention for the means, where $m_{\mathcal{X}}$ refers to the vector of individual mean values of $\mathcal{X}$, and $\mu_{y|\mathcal{X}}$ refers to the mean of $Y$ given $\mathcal{X}$.
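Then it is easy to verify that [120]:

$$\mu_{m|\mathcal{H}^+} = \mu_m + \Sigma_{m,\mathcal{H}^+} \Sigma^{-1}_{\mathcal{H}^+} (k - m_{\mathcal{H}^+}) = \Sigma_{m,\mathcal{H}^+} \Sigma^{-1}_{\mathcal{H}^+} k, \qquad (4.14)$$

where $k = [s, \mathrm{vec}(\mathcal{H}_t)^\top]^\top$, and $\mathrm{vec}(\mathcal{X})$ converts the set $\mathcal{X}$ to a vector; the second equality holds for zero-mean shadowing. Also we have

$$\sigma^2_{m|\mathcal{H}^+} = \sigma^2_m - \Sigma_{m,\mathcal{H}^+} \Sigma^{-1}_{\mathcal{H}^+} \Sigma^\top_{m,\mathcal{H}^+}. \qquad (4.15)$$

Similarly, for $S_c(t+U)$ given $\mathcal{H}_t$, we have:

$$\mu_{c|\mathcal{H}} = \mu_c + \Sigma_{c,\mathcal{H}} \Sigma^{-1}_{\mathcal{H}} (h - m_{\mathcal{H}}) = \Sigma_{c,\mathcal{H}} \Sigma^{-1}_{\mathcal{H}} h,$$

$$\sigma^2_{c|\mathcal{H}} = \sigma^2_c - \Sigma_{c,\mathcal{H}} \Sigma^{-1}_{\mathcal{H}} \Sigma^\top_{c,\mathcal{H}},$$

where $h = \mathrm{vec}(\mathcal{H}_t)$. Note that, as indicated earlier, to calculate these quantities we have to know the correlation model as well as the path-loss values.

4.3.2.2 Approximation

We can provide a simpler rule that does not rely on integration by assuming that $\omega_b \log(1 + 10^{\mathrm{SNR}_b(t)/10}) \approx \omega_b \log(10^{\mathrm{SNR}_b(t)/10})$, which is usually referred to as the "high SNR assumption". Then we have

$$\tilde{R}_b = \omega_b \log(\gamma_b) + \frac{\log(10)}{10} \omega_b S_b,$$

which follows a normal distribution with mean and variance, respectively:

$$\tilde{\mu}_b = \omega_b \log(\gamma_b) \quad \text{and} \quad \tilde{\sigma}^2_b = \left(\frac{\log(10)}{10}\right)^2 \omega^2_b \sigma^2_b.$$

Then we can define the event $\tilde{W}_b(t+U): \tilde{R}_b(t+U) \ge \tilde{R}_{b'}(t+U)$, and choose the $b$ that satisfies $P(\tilde{W}_b \mid \mathcal{H}_t) > 0.5$. To calculate this probability, taking $b = m$ and $b' = c$, we have

$$P(\tilde{W}_m \mid \mathcal{H}_t) = P(\tilde{R}_m(t+U) - \tilde{R}_c(t+U) \ge 0 \mid \mathcal{H}_t) = P(\tilde{R}_D \ge 0 \mid \mathcal{H}_t),$$

where $\tilde{R}_D = \tilde{R}_m(t+U) - \tilde{R}_c(t+U)$. Since $S_m$ and $S_c$ are jointly normal, so are $\tilde{R}_m$ and $\tilde{R}_c$, and thus $\tilde{R}_D$ is normally distributed with mean and variance, respectively:

$$\mu_D = \tilde{\mu}_m - \tilde{\mu}_c \quad \text{and} \quad \sigma^2_D = \tilde{\sigma}^2_m + \tilde{\sigma}^2_c - 2 \rho_{m,c} \tilde{\sigma}_m \tilde{\sigma}_c.$$

Furthermore, note that $\tilde{R}_D$ given $\mathcal{H}_t$ is normally distributed, with mean and variance, respectively:

$$\mu_{D|\mathcal{H}} = \mu_D + \Sigma_{D,\mathcal{H}} \Sigma^{-1}_{\mathcal{H}} h \quad \text{and} \quad \sigma^2_{D|\mathcal{H}} = \sigma^2_D - \Sigma_{D,\mathcal{H}} \Sigma^{-1}_{\mathcal{H}} \Sigma^\top_{D,\mathcal{H}}.$$

Finally, the decision rule becomes $Q\left(-\mu_{D|\mathcal{H}} / \sigma_{D|\mathcal{H}}\right) \ge 0.5$, which can be shown to be equivalent to

$$\mu_{D|\mathcal{H}} \ge 0 \implies \tilde{\mu}_c - \tilde{\mu}_m \le \Sigma_{D,\mathcal{H}} \Sigma^{-1}_{\mathcal{H}} h. \qquad (4.16)$$

Due to the simplicity of this rule, one can easily derive a number of interesting quantities, such as the probability of error. We hereafter refer to the exact and approximate GP based solutions, respectively, as GP and GP_App. In Sec.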
4.5 we study the impact of the decision threshold $T$ and the observation window $Q$ on both solutions.

4.4 Learning Based Band Assignment

4.4.1 Preliminaries

As introduced in Sec. 4.2.3.2, the BS feeds the input features $\mathcal{F}$ to the ML scheme to produce $\tilde{D}$, and then uses $\tilde{D}$ to make the BA decision $D$. The BS can use a threshold $T \in [0, 1]$ to map $\tilde{D}$ to $D$, where we assume that $D = 1$ when $\tilde{D} > T$. We can choose the $T$ that results in the best performance; however, here we do that only for the one-shot problem, and use $T = 0.5$ for the sequential problem.

4.4.1.1 Features

We consider six features, i.e., side information used as input to the learning solutions: ($f_1$) the distance $d$ from the BS to the UE in meters, ($f_2$) the angular position of the UE in rad, ($f_3$) the received signal strength (or the SNR) in the cmWave band in dBm (or dB), ($f_4$) the same quantity in the mmWave band, ($f_5$) the delay of the dominant MPC in seconds, and ($f_6$) the AoD of the dominant MPC, where the dominant MPC is the one with the highest power. The availability of the features depends on the system implementation. For instance, ($f_1$) and ($f_2$), which represent the polar coordinates of the UE with respect to the BS, may be estimated using signal processing techniques or acquired by explicit feedback of the GPS data. Extracting ($f_5$) might require a large bandwidth, and ($f_6$) requires the use of antenna arrays. Using both ($f_3$) and ($f_4$) is only reasonable for the sequential BA problem, but it may require additional effort or equipment at the UE side. We consider several combinations of the above features and discuss their effectiveness for BA.

As typically done in ML, we perform pre-processing of the features; in particular, we standardize the input features such that their mean is zero and their standard deviation is equal to one.
In addition, we utilize prior knowledge about wireless propagation; for instance, we use a logarithmic scale for distances and power, as this may linearize their relation with one another.

4.4.1.2 Learning Techniques Overview

LR (linear regression) is a simple ML approach, where the output is assumed to be a linear combination of the input features. Although linear models are relatively simple, they have been widely used in wireless communication [18]. Since BA can be viewed as a binary classification, GR (logistic regression) fits our problem better: the linear combination of the features is mapped by a logistic function to the range [0, 1], thus representing the probability of the output being in one of the classes. NNs (artificial neural networks) have been successfully applied to many complex practical problems. An NN consists of one or more layers, each of which has a number of parallel neurons (nodes), see Fig. 4.1. A neuron performs a weighted combination of the input features and then passes it through a possibly non-linear transformation, also known as an activation function, e.g., a sigmoid function (generalization of the logistic function). Note that GR can be viewed as a simple NN that consists of one neuron.

Figure 4.1: NN and LSTM diagrams.

LSTM (Long Short-Term Memory) is a popular recurrent NN architecture. The inputs of the LSTM are the current features, the previous output, and the previous cell state. The cell state is a memory that is controlled by three gates, which control when to read, to write, and to erase the value of the cell. The decisions of the gates are controlled by NNs that provide nonlinear transformations of the input values; Fig. 4.1 shows a diagram of an LSTM layer.
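The pre-processing step and the remark that GR can be viewed as a single neuron can be made concrete with a short sketch. The code below is only an illustration under simple assumptions: features are given as plain Python lists, each feature column is non-constant, and the weights are hypothetical values rather than trained ones.

```python
import math


def standardize(column):
    """Shift and scale one feature column to zero mean and unit standard deviation.

    Assumes the column is not constant (otherwise the standard deviation is zero).
    """
    n = len(column)
    mean = sum(column) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in column) / n)
    return [(x - mean) / std for x in column]


def logistic_neuron(features, weights, bias):
    """One neuron with a logistic activation: the soft output lies in (0, 1)."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


# Distances are put on a logarithmic scale before standardization.
distances_m = [10.0, 100.0, 1000.0]
log_distances = standardize([math.log10(d) for d in distances_m])

# Hypothetical weight and bias; a trained GR model would supply these.
soft_outputs = [logistic_neuron([x], [-1.0], 0.0) for x in log_distances]
```

With the negative weight used here, the soft output decreases with distance, so the closest point gets the largest value and the middle point, which maps to zero after standardization, gets exactly 0.5.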
The weights of the above solutions are determined during the training phase over a training data set $\mathcal{A}^T$, where the goal is to minimize the prediction error of the label values at the output over the observed data points. Popular training techniques use gradient descent, such as backpropagation for NNs and backpropagation through time for LSTM.

4.4.1.3 Training and Testing

To train the learning approaches, we assume that the BS uses a data set $\mathcal{A}^T = \{P^T_1, \ldots, P^T_{N_T}\}$, where the superscript $T$ denotes training and $N_T$ is the number of training examples. Each example point $P^T_i$ is a features-label pair $(\mathcal{F}_i, L_i)$, where $\mathcal{F}_i$ is the set of features of the $i$th example, of size $F = |\mathcal{F}_i|$, with $|\cdot|$ denoting the cardinality operator. As in Sec. 4.2.4, $L_i \in \{0, 1\}$ is the true label of that example. We assume that $\mathcal{A}^T$ is available to the BS, for instance through previous decisions or an initial network training phase. However, the procedure to acquire $\mathcal{A}^T$ is out of the scope of the thesis.

In this work, let the set $\mathcal{A}$ denote the entire data set we use in each environment, where each point $P_i \in \mathcal{A}$ represents the features-label pair $(\mathcal{F}_i, L_i)$, with $i \in \{1, \ldots, N\}$ and $N = |\mathcal{A}|$. In the simulation we randomly split $\mathcal{A}$ into a training set $\mathcal{A}^T$ and a testing set $\mathcal{A}^S$, where $\mathcal{A} = \mathcal{A}^T \cup \mathcal{A}^S$ and $\mathcal{A}^T \cap \mathcal{A}^S = \emptyset$. We may further split $\mathcal{A}^T$ into a training subset $\mathcal{A}^T_t$ and a validation subset $\mathcal{A}^T_v$, with $\mathcal{A}^T_t \cap \mathcal{A}^T_v = \emptyset$. The sizes of $\mathcal{A}^T_t$, $\mathcal{A}^T_v$ and $\mathcal{A}^S$ are $N_T$, $N_V$, and $N_S$, respectively. More details are provided in the next sections.

During the training phase, as is common in binary classification problems, the performance of the learning approaches is evaluated using a Cross Entropy (CE) cost function, i.e.,

$$E_{CE,X} = \frac{1}{N_X} \sum_{i=1}^{N_X} \left(-L_i \log(\tilde{D}_i) - (1 - L_i) \log(1 - \tilde{D}_i)\right).$$

As in Sec. 4.2.4, we use the subscript $X$ to distinguish the data set used to calculate these values. For the introduced data sets we have $X \in \{T, V, S\}$.
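The CE cost above, together with the hard-decision mapping of Sec. 4.4.1, can be sketched as follows. This is a minimal illustration, not the implementation used for the reported results; in particular, the BA error is assumed here to be the fraction of points whose hard decision differs from the label, and the clipping constant `eps` is an added safeguard against taking the logarithm of zero.

```python
import math


def cross_entropy(labels, soft_outputs, eps=1e-12):
    """Average cross entropy between 0/1 labels and soft outputs in [0, 1]."""
    total = 0.0
    for label, d in zip(labels, soft_outputs):
        d = min(max(d, eps), 1.0 - eps)  # clip to keep the logarithms finite
        total += -label * math.log(d) - (1 - label) * math.log(1.0 - d)
    return total / len(labels)


def hard_decisions(soft_outputs, threshold=0.5):
    """Map soft outputs to decisions: D = 1 when the soft output exceeds T."""
    return [1 if d > threshold else 0 for d in soft_outputs]


def ba_error(labels, soft_outputs, threshold=0.5):
    """Fraction of points whose hard decision differs from the true label."""
    decisions = hard_decisions(soft_outputs, threshold)
    wrong = sum(1 for label, d in zip(labels, decisions) if label != d)
    return wrong / len(labels)
```

Sweeping `threshold` over a grid in [0, 1] and keeping the value with the smallest validation error is one simple way to pick the decision threshold in the one-shot problem.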
4.4.2 One-Shot Band Assignment

In this problem we use the LR, GR and NN learning approaches. For the NN we use up to four hidden layers and up to 100 neurons in total. We use L2 regularization, with a regularization parameter, to reduce the impact of over-fitting. For each time instant, the input feature size for each of the approaches is equal to the number of used features. For this problem we use Monte-Carlo cross validation to improve our estimate of the validation error, in which we repeat the random split of $\mathcal{A}^T$ into $\mathcal{A}^T_t$ and $\mathcal{A}^T_v$, and rerun the training and the validation. Then we choose the network structure (number of layers and neurons in the NN) and the regularization coefficient that achieve the smallest average $E_{CE,V}$. To choose the hard decision threshold $T$, we use the value in [0, 1] that results in the smallest average $E_V$. Finally, note that we use similar training, cross validation, and method to obtain the hard decisions for LR and GR as well.

4.4.3 Sequential Band Assignment

In this problem we use previously observed data points to predict the best band after $U$ future time frames. We use a DL approach based on LSTM and also use GR-based (denoted by GR_H) and NN-based (denoted by NN_H) approaches for comparison. We consider several DL structures, which are summarized in Table 4.1. For each feature scenario we choose the network that results in the lowest cross validation error; we refer to this approach as LSTM_opd. In Secs. 4.5 and 4.6 we show the performance of LSTM_opd and NW4; we refer to the latter as LSTM_std. We use the Adam algorithm for training [122]. The number of used epochs depends on whether we shuffle the data set at the beginning of each iteration: we use up to 600 epochs when we shuffle the data set, and up to 120 when we do not. For the former case the mini-batch size is three sequences (to be explained later), while it is four for the latter case.
We use an initial learning rate of 0.01 and a drop factor of 0.1 and 0.009 when we shuffle and do not shuffle, respectively; the learning rate drops after 120 and 50 epochs for the two cases, respectively. The choice between the two is made based on the cross validation. For NN_H we use two hidden layers with 70 and 40 neurons, respectively.

Table 4.1: DL structures. Note that all structures are followed by (FC: 2)+(SOFT)+(CLASS), where FC is a fully connected linear transformation layer (parallel neurons), ReLU is a nonlinear ReLU layer [5], SOFT is a softmax layer, CLASS is a classification layer, and F is the feature size. The hidden size of the LSTM layer is shown after "LSTM".

Network   Structure (Layer: Size)
NW0       (FC: 5)+(LSTM: 5)+(FC: 5)
NW1       (FC: 15)+(LSTM: 10)+(FC: 10)
NW2       (FC: 50)+(LSTM: 10)+(FC: 10)
NW3       (FC: 30)+(LSTM: 20)+(FC: 15)
NW4       (FC: 20)+(LSTM: 40)+(FC: 20)
NW5       (FC: 15)+(LSTM: 10)+(FC: 10)+(ReLU)+(FC: 7)
NW6       (FC: 10)+(LSTM: 50)+(FC: 7)
NW7       (FC: 2F)+(LSTM: 2F)+(FC: 2F)
NW8       (FC: 3F)+(LSTM: 3F)+(FC: 2F)
NW9       (FC: 10F)+(LSTM: 9F)+(FC: 5F)
NW10      (FC: 10F)+(LSTM: 5F)
NW11      (FC: 5F)+(LSTM: 10F)
NW12      (LSTM: 10F)
NW13      (FC: 3F)+(LSTM: 15F)+(FC: 4F)
NW14      (FC: 20)+(LSTM: 40)+(ReLU)
NW15      (FC: 3F)+(LSTM: 15F)+(FC: 4F)+(ReLU)

Since the decision depends on the previous data points, we use a modified data set $\mathcal{A}^{T'}$. In particular, the label at point (time) $i$ is $L'_i = L_{i+U}$. In addition, to utilize the $Q$ previous observed points in GR_H and NN_H, their data sets should have an input feature size $|\mathcal{F}'| = F(Q+1)$, i.e., the size is equal to the number of used features in the current and the $Q$ previous points. For the DL approach, no modification to the feature size is needed, because the LSTM layer has a memory and can select what to remember and what to forget. Note that in this manuscript we focus on $|L'_i| = 1$ for simplicity.
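The construction of the modified data set for GR_H and NN_H can be sketched as below. The function name and the list-based data layout are hypothetical; the sketch only illustrates the two modifications described above, namely the label shift by $U$ and the stacking of the current and the $Q$ previous feature vectors into one input of size $F(Q+1)$.

```python
def build_windowed_dataset(features, labels, Q, U):
    """Build inputs of size F*(Q+1) and labels shifted by U from one sequence.

    features: list of per-time-step feature lists (each of length F)
    labels:   list of 0/1 labels, one per time step
    """
    inputs, targets = [], []
    # Point i needs Q earlier points and a label U steps ahead.
    for i in range(Q, len(features) - U):
        window = []
        for j in range(i - Q, i + 1):
            window.extend(features[j])   # oldest observation first
        inputs.append(window)
        targets.append(labels[i + U])    # L'_i = L_{i+U}
    return inputs, targets
```

For a toy sequence with one feature per step, `build_windowed_dataset([[1], [2], [3], [4], [5]], [0, 1, 0, 1, 1], Q=1, U=2)` yields the inputs `[[1, 2], [2, 3]]` with the labels `[1, 1]`. An LSTM-based scheme would instead consume the unstacked sequence directly.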
We consider the other case in our future work, where we can use sequence-to-sequence learning based on an LSTM encoder/decoder [123]. Due to the size of the problem, we use fixed values of $T$ and of the regularization coefficient. In particular, we use the "natural" choice $T = 0.5$, as we will discuss in the next section.

4.5 Experiment I: Stochastic Environment

Here we study the performance of the solutions using stochastically generated channels. This will provide a comparison between the learning based and the GP based solutions in a synthetic environment that matches the GP assumptions. We first describe the general data generation model and then address the data set details and the performance for each of the two problems. In this section and the one that follows, we only discuss the main results; however, the tables and figures contain more information and are left for the reader's reference. Note that for the learning schemes, the displayed performance values are by no means the optimal values, as we have considered a limited number of structures and parameters and performed a grid search over them.

4.5.1 The Environment

In order to generate the channel realizations in the two bands, we use a modified version of the correlation model suggested in [118]. The covariance between shadowing values at time instants $t$ and $t'$ and in frequency bands $b$ and $b'$ is

$$\mathrm{Cov}\left(S_b(t), S_{b'}(t')\right) = \rho_{b,b'} \sqrt{C\left(S_b(t), S_b(t')\right) C\left(S_{b'}(t), S_{b'}(t')\right)}, \qquad (4.17)$$

where $\rho_{b,b'}$ is the correlation coefficient, and

$$C\left(S_b(t), S_b(t')\right) = \exp\left(-\left(\frac{\Delta(t,t')}{d^b_{dcor}}\right)^{\alpha}\right) \sigma^2_b, \qquad (4.18)$$

where $\Delta(t,t')$ is the displacement (in meters) between the locations of the UE at times $t$ and $t'$, $d^b_{dcor}$ is the shadowing decorrelation distance in band $b$ (in meters), and the real coefficient $\alpha > 0$ is a decay exponent [124]; values of $\alpha$ in (0, 2] have been previously used [124].
Note that with $\alpha = 1$, (4.18) is equivalent to the popular Gudmundson correlation model [125]; with this value, we observed that the schemes show a small dependency on prior observations, which may not reflect practical environments, thus we consider two values of $\alpha$ in the sequential problem. We assume that the path loss follows a break-point path-loss model [18], with a break distance $d_{break}$, a propagation exponent of 2 for $d \le d_{break}$, and a second exponent, listed in Table 4.2, for $d > d_{break}$. Table 4.2 summarizes the values used for generating the data sets.

Table 4.2: Stochastic channel simulation configurations.

Variable                                   Band c/m
$f_b$                                      2.5/28 GHz
Bandwidth $\omega_b$                       10/100 MHz
$P^b_{tx}$                                 15/22 dBm
Propagation exponent for $d > d_{break}$   4
$d_{break}$                                50 m
$d_{dcor}$                                 25/24 m
$\sigma_b$                                 5/7 dB
$\rho_{m,c}$                               0.75
$\alpha$                                   {1, 1.9}
Noise spectral density                     -174 dBm/Hz

Table 4.3: Ray-tracing simulation configurations.

Variable             Band c/m
$f_b$                2.5/28 GHz
Ant. pattern         Isotropic
Ant. polarization    Vertical
$P^b_{tx}$           15/30 dBm
BS height            45 m
MS height            2 m
Max. diffraction     2/1
Max. reflection      10

4.5.2 One-Shot Scenario

4.5.2.1 Data Points

We assume that the BS is located at the center of a square cell with a side length of 500 m. The data set consists of 2000 data points, which correspond to 2000 uniformly distributed UEs around the BS. We choose $\alpha = 1.9$ (see footnote 4). In this data set, to simplify the simulation environment, we focus on three features: the location of the UE (distance and angle) and the power in the cmWave band. Here we use 65% of the data set for training. For the Monte-Carlo cross validation, we take around 20% of $\mathcal{A}^T$ for the validation subset $\mathcal{A}^T_v$. We generate 1000 independent cell realizations to assess the performance of the learning based BA and the TBBA in the stochastic environment.
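The correlation model in (4.17) and (4.18) can be sketched in a few lines. The sketch below is an illustration only: it reads the shadowing standard deviations, decorrelation distances and inter-band correlation from Table 4.2 as understood here (5/7 dB, 25/24 m, 0.75), treats the decay exponent as the argument `alpha`, and uses hypothetical function and constant names.

```python
import math

# Values read from Table 4.2 (cmWave "c" / mmWave "m"); treated as assumptions.
SIGMA_DB = {"c": 5.0, "m": 7.0}      # shadowing standard deviation in dB
D_DCOR_M = {"c": 25.0, "m": 24.0}    # decorrelation distance in meters
RHO_MC = 0.75                        # inter-band correlation coefficient


def same_band_cov(band, displacement_m, alpha):
    """Covariance within one band for a given UE displacement, as in (4.18)."""
    decay = math.exp(-((displacement_m / D_DCOR_M[band]) ** alpha))
    return decay * SIGMA_DB[band] ** 2


def shadowing_cov(band_1, band_2, displacement_m, alpha):
    """Covariance between shadowing values in two bands, as in (4.17)."""
    rho = 1.0 if band_1 == band_2 else RHO_MC
    return rho * math.sqrt(
        same_band_cov(band_1, displacement_m, alpha)
        * same_band_cov(band_2, displacement_m, alpha)
    )
```

With `alpha = 1` the same-band covariance reduces to the exponential decay of the Gudmundson model, e.g. `shadowing_cov("c", "c", 25.0, 1.0)` equals $25\,e^{-1}$. Evaluating `shadowing_cov` for all pairs of points and bands gives the covariance matrix from which correlated shadowing realizations can be drawn.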
4.5.2.2 Performance

Table 4.4: Performance of the learning over the stochastic data under different feature availability. Note that on average 49.3% of the labels are "1".

Feature / Combination   c-1    c-2    c-3    c-4    c-5    c-6    c-7
d                       X      X             X      X
angle                   X      X      X                           X
cmWave Power                   X      X      X             X
NN E_S                  .24    .186   .189   .194   .267   .195   .413
NN R_S                  .099   .067   .068   .071   .115   .071   .223
GR E_S                  .265   .191   .192   .192   .266   .193   .469
GR R_S                  .115   .07    .07    .07    .115   .071   .261
LR E_S                  .264   .195   .193   .195   .265   .194   .469
LR R_S                  .113   .072   .07    .072   .114   .071   .261
TBBA E_S                .193
TBBA R_S                .07

We can view each of the 1000 cell realizations as a different cell. Then for every realization we repeat the training, validation and testing. Table 4.4 summarizes the results for seven feature combinations. The last two rows show the performance of the TBBA (GP based). In the generated data set, about 50.7% of the labels are "0". Thus, assigning the cmWave band to all points, i.e., cmWave-only BA, would result in an error equal to 0.493.

In general, we notice that the learning techniques provide significant improvements over the cmWave-only BA, with an advantage to the NN over the other schemes, as the NN is able to learn the non-linearity in the feature(s)/BA mapping. This can be observed in the performance for the first feature combination (c-1), i.e., the location of the UE. Next, adding the received power in the cmWave band, (c-2), provides an evident performance gain for all learning approaches. In fact, it seems that any other combination with the power information would provide comparable performance, especially when we use the angle information as in (c-3). Comparing power-only (c-6) to distance-only (c-5), we observe that the power in the cmWave band seems to reveal more information about the BA than the distance.

[Footnote 4] Note that this is different from [121]; however, the results that we discuss below are consistent with [121] due to the fact that the value of $\alpha$ has a small impact on the one-shot problem.
In fact, we notice that the performance in (c-4) is close to (c-6). This should not be surprising, as the shadowing is better captured by the received power in the cmWave band than by the distance. However, we notice an improvement in (c-3) compared to (c-4), as the angle provides additional information that helps to identify clusters of similar BA decisions.

For the TBBA, from the table we notice that the learning schemes can be at least as good as the TBBA for several feature combinations. These results have been achieved without providing the structure and the statistics of the channels. In fact, the NN is able to outperform the TBBA in (c-2) and (c-3). Note that with the power feature only, the learning based solutions are roughly as good as the TBBA. Further discussion can be found in [121].

4.5.3 Sequential Scenario

4.5.3.1 UE Trajectories

In this problem the data set consists of sequences of features and labels that represent different UE trajectories and the optimal BA decisions; each sequence can be viewed as an ordered subset of the available data points. Generation of such a data set is challenging, as we have to generate correlated data points and reasonable trajectories. Note that the points on different trajectories may still be correlated, as they belong to the same realization of the environment. As a result, we restrict our environment to one realization with several trajectories; we further generate the trajectories over a grid that represents the cell.

We use a Semi-Markov Smooth (SMS) mobility model to generate the motion trajectories [4]. In an SMS model, the UE motion goes through cycles of four states until the end of the simulation time: it starts with the acceleration state, with a random direction and a maximum ultimate speed, followed by a steady motion state; next it decelerates to zero before it stops in the last state. The UE can then go again to the first state.
The duration of each state is a design parameter; we assume that the duration of the second state is a random value whose minimum is equal to half of the simulation time, and during this state the UE maintains its speed and direction with a high probability. In our simulation, we omit the repeated data points (the consecutive points on the trajectories that correspond to the same location), and limit the number of repeated crossings over the same grid point. This model captures two important aspects of realistic pedestrian mobility: the smooth speed and direction adaptation (during the second state), and the possibility of changing the direction and speed along the route (at the beginning of the first state).

4.5.3.2 Data Generation

We use the same network structure (cell dimension, path-loss model, frequencies and bandwidths) as above, with a 5 m separation distance between the points on the grid. We generated 1000 sequences, assuming that the duration of each sequence is 900 s, with a speed of up to 1.5 m/s and a 4 s sampling period. To generate the shadowing values we use the correlation model in (4.17) for two different values of the decay exponent, $\alpha = 1$ and $\alpha = 1.9$. We assume that the observation window of the GP-based and learning-based solutions (other than the LSTM-based ones) is five, i.e., $Q = 5$. We use 70% of the sequences for training and 30% for testing. For the LSTM-based solution we use 50 sequences for cross validation. To assess the power of the solutions we consider the prediction over two different future values, $U = 4$ and $U = 8$.

4.5.3.3 Performance ($\alpha = 1$)

The results are presented in Table 4.5. As before, the top rows show the feature combinations; then for each solution we show the BA error $E_S$ in the first row and the normalized rate loss $R_S$ in the second row for the two $U$ values (separated by "/"). Starting with $U = 4$, we notice that GP_App shows around 9% degradation in the performance compared to GP. For learning
For learning 124 Sequential Scenario Feature / Combination c-1 c-2 c-3 c-4 c-5 c-6 c-7 d X X X X X X X X cmWave Power X X X X X X mmWave Power X X LSTM opd E S .178/.198 .195/.214 .183/.238 .164/.196 .159/.193 .192/.244 .206/.248 LSTM opd R S .068/.077 .081/.088 .073/.1 .062/.078 .062/.078 .077/.102 .083/.103 LSTM std E S .178/.198 .193/.214 .183/.233 .165/.192 .163/.192 .191/.243 .206/.252 LSTM std R S .068/.077 .078/.088 .073/.099 .064/.076 .065/.077 .079/.107 .083/.104 NN H E S .211/.216 .216/.227 .197/.244 .192/.214 .189/.207 .209/.251 .22/.257 NN H R S .08/.082 .083/.087 .074/.101 .072/.082 .071/.08 .081/.104 .085/.106 GR H E S .22/.225 .22/.229 .214/.25 .209/.225 .198/.227 .211/.257 .221/.256 GR H R S .085/.088 .083/.089 .082/.103 .079/.088 .074/.088 .081/.107 .086/.106 GP E S .201/.226 GP R S .075/.088 GP App E S .221/.242 GP App R S .082/.095 Table 4.5: Sequential BA = 1: Performance of the learning over the stochastic data under different feature availability. Note that on average 47:9% of the labels are "1". The first number in each entry in rows 6 17 denotes the prediction error withU = 4, and the second withU = 8. schemes we notice that all-features combination (c-5) provides the best performance followed by the combination of cmWave power and location (c-4). One reason for (c-5)’s good performance seems to be the location information. This conjecture is supported by the performance of (c-1) compared to having cmWave and mmWave powers (c-6). The importance of location information is intuitive, as it relates to trajectory prediction which in turn impacts the channel conditions. The performance difference between the LSTM-based solutions and NN H may be attributed to the inherent ability of LSTM layer for sequential learning. Note that the cmWave and mmWave power combination (c-6) still provides valuable information, and with it as the basis, most of the learning schemes outperform GP, and all of them outperform GP App . 
This is important as both approaches, the GP-based and the ML-based, may use the cmWave and mmWave powers as input observations. Interestingly, we also observe that a learning scheme using the cmWave power only (c-7) can outperform GP_App.

For $U = 8$, we first notice that the performance degradation of GP_App compared to GP reduces to about 7%. We also observe that the learning schemes can still outperform the GP-based solution, especially using (c-5) and (c-4); however, the number of combinations and schemes that outperform the GP-based solution reduces, and the gain that the best learning solution provides (LSTM_opd using (c-5)) reduces from 21% to 15%. Using power(s) only to solve the problem becomes less efficient, as is clear in the case of (c-6), where only the LSTM-based schemes can compete with GP_App. This is expected, as learning for large $U$ is harder; Fig. 4.2 emphasizes this trend for combinations (c-5) and (c-6) as a function of $U$. The figure also shows that LSTM_std dominates the other schemes for small $U$, but the probability of BA error increases logarithmically as $U$ increases. From the table, for most of the combinations, we observe that LSTM_std shows relatively good performance; this might be attributed to its medium size, as smaller and larger networks are more susceptible to fitting problems. Note that for some cases, the table shows that LSTM_std has slightly better performance than LSTM_opd, despite the fact that the schemes included in LSTM_opd cover LSTM_std; this is attributed to the small cross validation set.

Figure 4.2: $E_S$ vs. $U$ for two combinations, "loc+cm+mm" and "cm+mm" ((c-5) and (c-6), respectively, in Table 4.5).

The use of $T = 0.5$ for GP was justified in Sec. 4.3; in Fig. 4.3 we present the impact of $T$ on the performance for combinations (c-5) and (c-6), and show the performance of LSTM_std (as listed in the table) for comparison.
We notice that $T \approx 0.5$ is good for most of the schemes except GP_App, indicating that GP_App can be improved with a judicious choice of $T$.

Figure 4.3: $E_S$ vs. the decision threshold $T$ for $\alpha = 1$ and two feature combinations, "loc+cm+mm" and "cm+mm", respectively (c-5) and (c-6) in Table 4.5.

4.5.3.4 Performance ($\alpha = 1.9$)

Table 4.6: Sequential BA, $\alpha = 1.9$: Performance of the learning over the stochastic data under different feature availability. Note that on average 48.4% of the labels are "1".

Feature-availability marks (column positions of the marks are not preserved in this transcript): d and angle: X X X X X X X X; cmWave Power: X X X X X X; mmWave Power: X X.

Feature / Combination   c-1         c-2         c-3         c-4         c-5         c-6         c-7
LSTM_opd E_S            .166/.209   .158/.204   .139/.203   .124/.184   .103/.178   .119/.215   .166/.23
LSTM_opd R_S            .068/.089   .06/.084    .052/.087   .046/.075   .034/.074   .042/.096   .063/.097
LSTM_std E_S            .166/.204   .158/.206   .142/.203   .123/.185   .107/.175   .119/.215   .166/.229
LSTM_std R_S            .068/.085   .062/.085   .053/.087   .045/.077   .037/.074   .042/.096   .063/.099
NN_H E_S                .196/.209   .18/.211    .155/.217   .147/.201   .118/.195   .136/.231   .176/.233
NN_H R_S                .077/.085   .067/.084   .055/.091   .051/.081   .036/.077   .042/.098   .066/.099
GR_H E_S                .21/.215    .18/.211    .169/.228   .172/.207   .127/.206   .133/.229   .179/.232
GR_H R_S                .085/.087   .067/.084   .063/.097   .064/.082   .038/.081   .04/.096    .067/.099
GP E_S                  .126/.204
GP R_S                  .037/.08
GP_App E_S              .159/.225
GP_App R_S              .052/.089

With $\alpha = 1.9$, the correlation function decays faster than above; however, we noticed that the impact of prior observations is more pronounced. The results for several feature combinations are presented in Table 4.6, with a structure similar to the one above. For $U = 4$, we first observe that GP outperforms GP_App by about 20%, and several learning schemes outperform the GP with several feature combinations. In addition, the all-features combination (c-5) still has the lowest $E_S$, but, differently from above, the performance gain is attributed to the power in the two bands (c-6).
Interestingly, the LSTM-based solutions using the cmWave power feature (c-7) are as good as those using the location (comparable to GP_App) in this environment, which indicates that (c-7) is a good BA predictor. The significance of (c-7) is also evident for other learning schemes that use an observation window.

For $U = 8$, we notice a similar trend as for $\alpha = 1$, namely that the values of $E_S$ increase, the gain of GP over GP_App reduces to about 9%, and the gain of the best learning combination/scheme, (c-5)/LSTM_opd, reduces from 18.3% to 12.8%. However, these gains are larger than for $\alpha = 1$, due to the utilization of the observations. Compared to $U = 4$, the efficacy of the observed powers reduces; for instance, (c-1) outperforms (c-6) and (c-7). This is due to the fact that the shadowing decorrelates with large separation distances. Nevertheless, the LSTM-based schemes are able to utilize the cmWave power when accompanied by other features, in (c-2), (c-3) and (c-4), and show at least comparable performance to GP.

Based on the used pedestrian speed, grid point separation, and sampling interval, we anticipate that observations outside the used observation window, of size $Q = 5$, have a small influence on the BA at time frame $t + U$. However, considering the adopted motion model, this may not be accurate, as an old observation might be highly correlated with (closely located to) a future value. This complicates the analysis of the impact of $Q$. Instead, here we restrict our attention to a simpler motion model, namely a circular motion around the BS, where we consider 5000 sequences, each corresponding to one circle around the BS and having an independent shadowing realization; here we relax some of the correlation assumptions, since we consider only the cmWave and mmWave power information.

The results are provided in Fig. 4.4. The learning schemes achieve the same performance as the optimal solution (GP in this case). Starting with GP vs.
GP_App, it is clear that the approximation introduces an error floor for GP_App. GP_App shows a noticeable decrease in $E_S$ until $Q = 4$, as the added observations may reduce the uncertainty; however, beyond that the error increases again due to the model mismatch. While an increase of $Q$ improves the performance for GP and the learning schemes, we notice a slower improvement for large $Q$, due to the decrease of added information in older observations in such a uniform motion. A more comprehensive study is needed for this problem, but it highly depends on the motion model.

Figure 4.4: The impact of the size of the observation window $Q$ in the stochastic environment with a circular trajectory. For this data set the percentage of "1"s is 75%.

4.6 Experiment II: Simulated Campus Environment

4.6.1 The Environment

To assess the performance in a more realistic setting, we simulate the propagation channel in a campus environment by means of a commercial ray-tracing tool, Wireless InSite [126]. The input to the ray-tracer includes the 3D models of the buildings, the characteristics of the building materials, and models of foliage. The output is a list of parameter vectors that contain the power, propagation delay, AoD and AoA for each MPC. Simulation results have been compared to measurements in a variety of settings and shown to provide good agreement [126]. This simulation has been conducted based on the model of the University Park Campus, University of Southern California, which is shown in Fig. 4.5-(a). The detailed simulation configurations are listed in Table 4.3. The simulation environment was also used in prior works, see references in [121].

The data set has about 1150 points, i.e., $|\mathcal{A}| = 1150$, and each point contains all six features. The label associated with each point indicates whether the rate in the mmWave band is larger than the one in the cmWave band.
To calculate the rate we use the Shannon capacity with the bandwidth and noise spectral density that are shown in Table 4.2.

Figure 4.5: (a) Ray-tracing simulation environment. The green dot is the BS located above the rooftop, while the simulated UEs are on the red routes. Gray objects represent the buildings. The green 3D polygons denote foliage with different densities. (b) Using 70% of the data for testing; from left to right, A S and A T.

4.6.2 One-Shot Scenario

4.6.2.1 Data Points

Since acquiring a large number of data points may not be practical for the BS, using a large portion of the data set for training may produce misleading results. Here we use only 30% of A for training. To apply the Monte-Carlo cross-validation method, we randomly choose 80% of A T for training and 20% for validation. The network is then tested on A S, i.e., the remaining 70% of the data set. Fig. 4.5-(b) shows an example of the sets A S and A T.

4.6.2.2 Performance

Feature/Combination  c-1        c-2        c-3        c-4        c-5        c-6        c-7        c-8
d                    X X X X X X
cmWave Power         X X X X X
Delay                X X X
AoD                  X X
Numb Layers// T      2/.15/.45  4/.15/.55  1/.05/.5   1/.1/.6    2/.1/.35   3/.5/.45   4/.3/.6    3/.5/.55
NN E S               .078       .061       .072       .074       .085       .182       .067       .093
NN R S               .05        .032       .041       .042       .054       .133       .037       .062
GR E S               .178       .062       .082       .081       .083       .183       .082       .182
GR R S               .129       .032       .051       .049       .051       .135       .048       .13
LR E S               .176       .078       .088       .078       .081       .178       .072       .188
LR R S               .126       .047       .054       .047       .049       .128       .04        .136

Table 4.7: Performance of the learning techniques on ray-tracing data, under different feature availability. Note that the percentage of points with labels equal to "1" is approximately 30%.

We first point out that in this environment using cmWave-only BA would result in an error equal to 0.3, i.e., the percentage of "1" in A is 30%. Table 4.7 summarizes the results of the solutions. The used structures of the NNs are shown in the 7th row.
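For reference, the labeling rule of Sec. 4.6.1 and the data split of Sec. 4.6.2.1 can be sketched in a few lines of Python. This is only an illustrative sketch: the function names, the bandwidths and the noise spectral density below are hypothetical placeholders (they are not the values of Table 4.2), and the rounding of the subset sizes is an assumption.

```python
import math
import random

# Hypothetical band parameters, chosen for illustration only
# (they are NOT the values listed in Table 4.2).
CM_BW_HZ = 20e6
MM_BW_HZ = 400e6
NOISE_PSD_DBM_HZ = -174.0


def shannon_rate(rx_power_dbm, bandwidth_hz, noise_psd_dbm_hz):
    """Shannon capacity in bit/s for a received power given in dBm."""
    noise_dbm = noise_psd_dbm_hz + 10 * math.log10(bandwidth_hz)
    snr = 10 ** ((rx_power_dbm - noise_dbm) / 10)
    return bandwidth_hz * math.log2(1 + snr)


def ba_label(p_cm_dbm, p_mm_dbm):
    """Label is 1 if the mmWave rate exceeds the cmWave rate, else 0."""
    r_cm = shannon_rate(p_cm_dbm, CM_BW_HZ, NOISE_PSD_DBM_HZ)
    r_mm = shannon_rate(p_mm_dbm, MM_BW_HZ, NOISE_PSD_DBM_HZ)
    return int(r_mm > r_cm)


def split_indices(num_points, seed=0):
    """One Monte-Carlo split: 30% of A forms A_T (80/20 train/validation),
    the remaining 70% forms the test set A_S."""
    rng = random.Random(seed)
    idx = list(range(num_points))
    rng.shuffle(idx)
    n_pool = round(0.3 * num_points)      # size of A_T
    pool, test = idx[:n_pool], idx[n_pool:]
    n_train = round(0.8 * len(pool))      # training part of A_T
    return pool[:n_train], pool[n_train:], test


train, val, test = split_indices(1150)    # |A| = 1150 as in Sec. 4.6.1
```

Repeating `split_indices` with different seeds gives the repeated random train/validation draws that Monte-Carlo cross-validation relies on.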
Combinations (c-1) and (c-8) show the cases when we use the location or the delay and AoD of the strongest MPC; these two are usually related, as several localization techniques use the delay and AoD to determine the location. The performance in the two cases is comparable, even though we may not have Line of Sight (LOS) in all the cases. We also note that NN significantly outperforms the regression-based approaches. Adding the power to the two combinations above, as in (c-2) and (c-7), improves the performance for this environment as well, especially for the regression-based BA. The performance gain in (c-2) and (c-7) can be partially explained by the good results in (c-5), which uses the cmWave power only. As in the stochastic environment, a scheme that only exploits the distance feature (c-6) shows relatively poor performance for all approaches. Similar comparisons can be made with a delay-only scheme (not shown in the table), which provides an improvement compared to distance only [121]. This performance could be expected, as the delay may reflect a more realistic "effective" distance; note that non-LOS links will show a longer delay even if they have a similar geographic distance as their LOS counterparts. A combination of delay and distance with power, in (c-3) and (c-4), shows a small improvement over power only; however, they show significant improvement over the distance-only and delay-only cases. In general, we notice that in this environment, the performance gaps between NN and the other learning-based solutions are larger than for the stochastic environment, which suggests that in a more realistic environment, the NN is especially useful.

4.6.3 Sequential Scenario

4.6.3.1 Data Points

To generate the sequences we use the motion model discussed in Sec. 4.5.3.1 over the ray-tracing grid. This includes the pedestrian speed, simulation time, and sampling interval.
We generate 1000 sequences, and use 350 of them for training, out of which 70 are chosen randomly for cross validation for the LSTM-based solution. In this problem we apply the GP-based solutions; to do so we extract the channel parameters to fit the path-loss (using a double linear fit) and then compute the parameters for the correlation model (4.17) and (4.18).

4.6.3.2 Performance

Feat./Comb.     c-1        c-2        c-3        c-4        c-5        c-6        c-7        c-8        c-9        c-10       c-11
d               X X X X X X X
cmWave          X X X X X X X X
mmWave          X X X X
Delay           X X X X
AoD             X X X
LSTM opd E S    .078/.117  .062/.103  .062/.1    .07/.12    .094/.14   .08/.132   .098/.137  .071/.114  .061/.11   .071/.12   .081/.128
LSTM opd R S    .048/.08   .044/.074  .041/.071  .048/.085  .061/.1    .062/.092  .065/.098  .048/.08   .046/.078  .048/.086  .054/.089
LSTM std E S    .077/.117  .064/.108  .062/.1    .07/.123   .094/.14   .08/.132   .096/.134  .073/.118  .065/.111  .071/.12   .079/.126
LSTM std R S    .048/.081  .044/.075  .042/.071  .048/.089  .062/.1    .053/.094  .067/.096  .049/.086  .044/.08   .047/.086  .053/.09
NN H E S        .086/.11   .083/.119  .076/.116  .107/.139  .134/.162  .12/.15    .113/.147  .098/.117  .085/.117  .099/.144  .107/.152
NN H R S        .053/.073  .052/.081  .047/.078  .068/.096  .088/.114  .078/.103  .074/.105  .063/.082  .054/.081  .063/.103  .068/.106
GR H E S        .208/.211  .134/.156  .123/.15   .125/.156  .142/.17   .122/.153  .224/.231  .138/.166  .119/.155  .137/.164  .14/.168
GR H R S        .149/.151  .089/.11   .08/.104   .081/.108  .096/.121  .079/.106  .16/.166   .092/.117  .078/.108  .091/.116  .093/.106
GP E S          .183/.177
GP R S          .127/.123
GP App E S      .159/.174
GP App R S      .107/.119

Table 4.8: Performance of the solutions on ray-tracing data under different feature availability. The percentage of points with labels equal to "1" is approximately 30%. Results in rows 8-19 correspond to U = 4/U = 8.

Table 4.8 shows the performance of the schemes in this environment with a structure similar to the ones in Sec. 4.5.3. We start by observing that both GP-based solutions are not performing well. This can be explained by our investigation of the shadowing distribution in this environment, where we observed that it is far from satisfying the GP assumption even for a single band. We also note that the GP has a larger E S compared to GP App (14% worse for U = 4); this surprising result might be related to the fact that GP is only exact if the Gaussian model is fulfilled, so that an approximate algorithm might suffer less in the presence of model mismatch. For the learning schemes we focus on the performance of the LSTM-based schemes and NN H. For U = 4, we notice that the location plus the powers in both bands (c-3) still achieves a low E S; however, note that for LSTM-based solutions this is the case for other combinations as well, such as the powers in both bands plus AoD and Delay (c-9). Comparing E S for the power in both bands (c-4), location (c-1), and AoD and Delay (c-7), we notice that (c-4) plays a major role in the performance gain that we observed. The value that (c-4) achieves is also possible using other combinations, namely (c-8) and (c-10), which require the cmWave power plus other features, indicating the practicality of the solutions when only the cmWave power (e.g., through a control signal) is periodically observed.
Note that, for the LSTM-based schemes, E S for (c-1) is not much worse than for (c-4), which explains why the combination of location and cmWave power (c-2) would be as good as (c-3). For NN H things are slightly different, as the observed performance gain is mainly attributed to the location information (c-1), which alone provides a performance comparable to (c-9). For U = 8, we observe that (c-3) is the best feature combination for LSTM opd, while location only (c-1) is the best for NN H; the performance gain of LSTM opd over NN H (using their best feature combinations) reduces from 19.7% to 9.1%. This could be explained by the observed degradation of the efficacy of (c-4) compared to (c-1), as the correlation of the shadowing reduces, and by the fact that NN H can utilize the location information well. Combining location or Delay and AoD with other features provides just a slight advantage for the LSTM opd; however, for other combinations the LSTM opd outperforms NN H significantly, possibly due to the fact that these feature combinations are less relevant to location information. Note that the observed behaviour of the LSTM-based solutions and NN H with the location information and the power of both bands was also observed in the stochastic environment with = 1.9. The impact of U is further shown in Fig. 4.6. The probabilities of error E S using (c-3) and (c-9) are comparable over different U, which is interesting as this could eliminate the need for explicit feedback of the location information. For the GP-based schemes, as discussed above, GP App is better than GP. One reason for the relatively better performance can be attributed to the intuitive GP App structure, eq. (4.16), which is a threshold rule that employs the gap between the average received powers in the two bands, and which may rely less on the impact of the GP assumption on the rates.
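The gain figures quoted above (19.7% and 9.1%) can be reproduced from the E S entries of Table 4.8 if the gain is taken as the relative reduction in E S with respect to NN H. That definition is an assumption made here for illustration (it matches the quoted numbers), and the helper name `relative_gain` is hypothetical.

```python
def relative_gain(e_baseline, e_scheme):
    """Relative reduction of the error probability, in percent,
    of a scheme with respect to a baseline."""
    return 100.0 * (e_baseline - e_scheme) / e_baseline


# E_S values read from Table 4.8, best feature combination per scheme.
# U = 4: NN H with (c-3) gives .076, LSTM opd with (c-9) gives .061.
gain_u4 = relative_gain(0.076, 0.061)
# U = 8: NN H with (c-1) gives .11, LSTM opd with (c-3) gives .1.
gain_u8 = relative_gain(0.110, 0.100)

print(f"U = 4: {gain_u4:.1f}%   U = 8: {gain_u8:.1f}%")
```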
Note that the general behaviour of the GP-based solution can be explained by the fact that the environment does not follow the GP assumption anymore, but rigorous explanations are difficult since we here have a single environment realization.

Figure 4.6: E S vs. U for three feature combinations "loc+cm+mm", "cm+mm" and "Delay+AoD+cm+mm", respectively (c-3), (c-4) and (c-9) in Table 4.8.

Chapter 5 Concluding Remarks and Future Outlook

Link setup and maintenance are essential components of wireless communications; their procedures will always be part of future networks. As networks evolve, it is possible that new procedures, or modifications to the existing ones, are needed. In this dissertation, we addressed three emerging problems in link setup and maintenance related to the proposed network architectures for the next generation wireless networks. In particular, we considered: (i) Channel State Information (CSI) acquisition in D2D networks, (ii) Directional Neighbor Discovery with prior information, and (iii) Band Assignment (BA) in dual-band systems. We utilized the side information to propose efficient schemes. Side information could represent highly correlated prior information and/or reveal the structure of the network or the environment. Such information has proved to be invaluable for designing efficient schemes that address the challenging link setup and maintenance for a number of future network architectures and technologies. This chapter provides concluding remarks and possible future directions.

5.1 CSI Acquisition and Neighbor Discovery in D2D Networks

The CSI is needed for several communication procedures. In D2D networks, the CSI between the devices and their set of nearby devices could be used for coherent communication, scheduling, interference cancellation, etc. In Chapt.
2, we proposed a novel channel training scheme, entitled LATS, for D2D networks, where every device acquires the CSI of the channels to all neighbor devices. We used the NMSE as the CSI performance metric, which quantifies the average squared error in the acquired CSI. We then formulated an optimization problem to solve for the LATS parameters. Also, we studied location error and location update as two relevant practical issues and analyzed their impact on the NMSE. Simulations showed that the scheme outperforms other schemes for most device densities. In Chapt. 3, we considered directional neighbor discovery when side information about the set of possible neighbors is given. We modeled the expected discovery time for one-way randomized schemes when prior information is available. As a result of prior information, every node may use different transmission and beam-steering probabilities. We used the non-uniform coupon collector to model the expected discovery time and provided upper and lower bounds that we used to derive convex optimization programs to minimize the expected discovery time by tuning the directional transmission probabilities. Further, the derived lower bound reveals the sensitivity of the expected discovery time to the variation in the probability of success. Specifically, for a given average probability of success, equal probabilities of success result in the minimum expected discovery time. It also indicates the relation of the discovery time in uniform and random networks. In the first two problems, the devices have to exchange pilot signals for the CSI acquisition and neighbor discovery. Thus, the CSI acquisition could be viewed as a neighbor discovery problem, and the neighbor discovery does provide estimates of the CSI during the neighbor discovery phase. However, we here draw important distinctions between the two. LATS is intended for timely channel estimation between devices with a given communication radius and using omnidirectional antennas.
However, for the neighbor discovery, the average CSI is enough to establish neighborhood relations. Furthermore, in Chapt. 3, among several other differences, we consider that the devices are equipped with directional antennas.

5.1.1 Future Outlook

In the first problem, the devices acquire the CSI information by the end of every CSI acquisition round. Although such information is enough for some scenarios, one might be interested in conveying the CSI to the BS. Thus it is interesting to study possible techniques to transmit such information to the BS side with minimal overhead, e.g., the CSI data could be restricted to the set of neighbors and/or quantized using various types of codebooks, as is typically done in CSI reporting [18]. For the second problem, two drawbacks of the proposed solutions are the complexity of the optimization problem, which might require a central controller, and the fact that we used one-way randomized neighbor discovery. Our work [71,98] shows possible extensions to the problem; however, additional investigations are needed to analyze the performance guarantees of the proposed distributed solutions. For the second problem, a similar analysis of iterative and sequential neighbor discovery schemes would be interesting. Incorporating the directional neighbor discovery with CSI acquisition is an obvious future interest, where the CSI can be requested only from neighbor devices. In a dynamic network, the devices could perform continuous "neighbor discovery" and CSI acquisition through beam tracking. Although there has been considerable work on single device tracking, multiple neighbor tracking in dense networks has received little attention. In particular, the temporal evolution of various devices could be correlated, and the incorporation of this correlation into the beam tracking could give rise to further interesting problems.
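The statement in Sec. 5.1 that, for a given average probability of success, equal probabilities minimize the expected discovery time can be illustrated numerically with the classical non-uniform coupon collector. The sketch below is not the model of Chapt. 3: it assumes, for simplicity, that at most one neighbor is discovered per slot and that neighbor i is discovered in a slot with a fixed probability p_i, so that the expected time follows from inclusion-exclusion over subsets of neighbors. The function name is hypothetical.

```python
from itertools import combinations


def expected_discovery_time(success_probs):
    """Expected number of slots until every neighbor has been discovered.

    Assumption: in each slot at most one neighbor is discovered, neighbor i
    with probability success_probs[i] (the probabilities sum to at most 1).
    The first slot in which any member of a subset S is discovered is then
    geometric with parameter sum(S), and inclusion-exclusion gives
    E[T] = sum over non-empty S of (-1)^(|S|+1) / sum(S).
    """
    n = len(success_probs)
    total = 0.0
    for k in range(1, n + 1):
        sign = 1.0 if k % 2 == 1 else -1.0
        for subset in combinations(success_probs, k):
            total += sign / sum(subset)
    return total


# Same average success probability (0.1), different spread.
uniform = expected_discovery_time([0.10, 0.10, 0.10])
skewed = expected_discovery_time([0.05, 0.10, 0.15])
print(uniform, skewed)
```

With the same average probability, the skewed case takes longer in expectation than the uniform one, in line with the conclusion drawn from the lower bound. The enumeration is exponential in the number of neighbors, so it is only meant for small examples.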
5.2 Band Assignment in Dual Band Systems

In Chapt. 4, we explored learning-based and GP-based approaches to provide solutions to the BA problem in two scenarios: (i) one-shot BA and (ii) sequential BA. For dual-band systems, the first scenario could be viewed as part of the link setup procedures, while the second is needed for link maintenance. We considered two environments to assess the performance of the proposed techniques and gain insights about the impact of different features, using stochastic and ray-tracing simulations. We also discussed the impact of the prediction horizon and the observation window. The performance depends on the problem and the used features. For the one-shot problem, the learning-based approaches showed competitive performance to the GP-based solution, especially when the SNR in one band is known. For the sequential BA, the DL scheme (LSTM-based) showed superior performance due to its inherent ability to deal with sequential data. NN-based and LSTM-based solutions using location and power information in the two bands have consistently been shown to be the best BA decision predictors; however, LSTM-based solutions using other information, including delay and AoD, also showed competitive performance. Interestingly, in realistic environments, the power information has proven to be especially beneficial for short prediction horizons. We also observed that the GP-based solutions have failed in the ray-tracing environment. In general, the LSTM-based and NN-based solutions show good performance using features that are relatively easy to acquire, indicating the practicality of the learning solutions.

5.2.1 Future Outlook

Future generations of wireless networks will rely on machine learning solutions to provide the needed flexibility of the heterogeneous networks over highly dynamic channels and complex system demands.
In fact, the proposed solution of BA using machine learning comes as ML is capturing increasing attention from the wireless communication community. Thus, there are numerous opportunities in that avenue. We here highlight a few interesting directions in link setup and maintenance. It is worth exploring a slight variation of the BA problem. For instance, initial network access (link setup) with directional antennas is expected to be a major challenge. For this case, the prediction of the best beam direction, rather than the received power, is needed. Similar to Chapter 4, this can be generalized to a sequential beam steering problem. This motivates the use of a new set of feature combinations, such as previously observed or used beam directions. In this dissertation we confined our discussion to two frequency bands. Although that captures the essence of the problem, future wireless networks are likely to utilize multiple frequency bands, such as 2 GHz, 5 GHz, 28 GHz, 60 GHz, and possibly the Tera-Hertz (THz) band. Thus, it is interesting to investigate the impact of different feature combinations on the BA process. In a related problem, the different frequency bands could be utilized by different radio access technologies (RATs). ML solutions can take the key differences between the RATs into account. This can also occur in network structures different from the one we considered; for instance, the network could consist of several BSs with overlay communications in multi-tier networks. Furthermore, although we have discussed supervised learning, there are other interesting ML models that can be used and that suit future networks. One prominent method is reinforcement learning, which can be used for link maintenance in dynamic channels; this reduces the burden of data acquisition and provides the needed adaptability to channel conditions.
Distributed learning is a promising method that also reduces the need for data collection, as the training can be pushed to the devices; this could also address the privacy concerns about massive data collection. Interestingly, this is an active research direction for many big technology companies, e.g., see [127].

Bibliography

[1] C. W. Johnson, Long term evolution in bullets, 2010.
[2] E. Dahlman, S. Parkvall, and J. Skold, 4G, LTE-advanced Pro and the Road to 5G. Academic Press, 2016.
[3] D. Burghal, S. L. H. Nguyen, A. F. Molisch, and K. Haneda, “Dual frequency bands shadowing correlation model in a micro-cellular environment,” in GLOBECOM 2019 IEEE Global Communications Conference, 2019, p. To Be Submitted.
[4] M. Zhao and W. Wang, “WSN03-4: A Novel Semi-Markov Smooth Mobility Model for Mobile Ad Hoc Networks,” in Global Telecommunications Conference, 2006. GLOBECOM’06. IEEE. IEEE, 2006, pp. 1–5.
[5] T. O’Shea and J. Hoydis, “An introduction to deep learning for the physical layer,” IEEE Transactions on Cognitive Communications and Networking, vol. 3, no. 4, pp. 563–575, 2017.
[6] Cisco Systems, “Cisco, Visual Networking Index - White paper,” https://www.cisco.com/c/en/us/solutions/collateral/service-provider/visual-networking-index-vni/white-paper-c11-741490.html, Feb. 2018, [Online].
[7] ETSI, “Service requirements for next generation new services and markets (3GPP TS 22.261 version 15.5.0 Release 15),” https://www.etsi.org/deliver/etsi_ts/122200_122299/122261/15.05.00_60/ts_122261v150500p.pdf, 2018, [Online].
[8] J. G. Andrews, S. Buzzi, W. Choi, S. V. Hanly, A. Lozano, A. C. Soong, and J. C. Zhang, “What will 5G be?” IEEE Journal on selected areas in communications, vol. 32, no. 6, pp. 1065–1082, 2014.
[9] E. Dutkiewicz, X. Costa-Perez, I. Z. Kovacs, and M. Mueck, “Massive machine-type communications,” IEEE Network, vol. 31, no. 6, pp. 6–7, 2017.
[10] H. Ji, S. Park, J. Yeo, Y. Kim, J. Lee, and B.
Shim, “Ultra-reliable and low-latency communications in 5G downlink: Physical layer aspects,” IEEE Wireless Communications, vol. 25, no. 3, pp. 124–130, 2018.
[11] E. Dahlman, S. Parkvall, and J. Skold, 5G NR: The next generation wireless access technology. Academic Press, 2018.
[12] T. Doumi, M. F. Dolan, S. Tatesh, A. Casati, G. Tsirtsis, K. Anchan, and D. Flore, “LTE for public safety networks,” IEEE Communications Magazine, vol. 51, no. 2, pp. 106–112, 2013.
[13] X. Lin, J. G. Andrews, A. Ghosh, and R. Ratasuk, “An overview of 3GPP device-to-device proximity services,” IEEE Communications Magazine, vol. 52, no. 4, pp. 40–48, 2014.
[14] A. F. Molisch, M. Ji, J. Kim, D. Burghal, and A. Tehrani, “Device-to-device communications,” in Towards 5G: Applications, Requirements and Candidate Technologies, S. Talwar and R. Vannithamby, Eds. John Wiley & Sons, 2016.
[15] S. Hur, H. Yu, J. Park, W. Roh, C. U. Bas, R. Wang, and A. F. Molisch, “Feasibility of mobility for millimeter-wave systems based on channel measurements,” IEEE Communications Magazine, vol. 56, no. 7, pp. 56–63, 2018.
[16] J. Liu, K. Au, A. Maaref, J. Luo, H. Baligh, H. Tong, A. Chassaigne, and J. Lorca, “Initial access, mobility, and user-centric multi-beam operation in 5G new radio,” IEEE Communications Magazine, vol. 56, no. 3, pp. 35–41, 2018.
[17] Z. Pi and F. Khan, “An introduction to millimeter-wave mobile broadband systems,” IEEE communications magazine, vol. 49, no. 6, pp. 101–107, 2011.
[18] A. F. Molisch, Wireless Communications. Wiley; 2 edition, 2010.
[19] L. Lu, G. Y. Li, A. L. Swindlehurst, A. Ashikhmin, and R. Zhang, “An overview of massive MIMO: Benefits and challenges,” IEEE journal of selected topics in signal processing, vol. 8, no. 5, pp. 742–758, 2014.
[20] E. Björnson, E. G. Larsson, and T. L. Marzetta, “Massive MIMO: Ten myths and one critical question,” arXiv preprint arXiv:1503.06854, 2015.
[21] ETSI, “LTE; Evolved Universal Terrestrial Radio Access (E-UTRA); Physical channels and modulation (3GPP TS 36.211 version 10.0.0 Release 10),” https://www.etsi.org/deliver/etsi_ts/136100_136199/136133/10.01.00_60/ts_136133v100100p.pdf, 2011, [Online].
[22] H.-Y. Hsieh and R. Sivakumar, “On using peer-to-peer communication in cellular wireless data networks,” Mobile Computing, IEEE Transactions on, vol. 3, no. 1, pp. 57–72, 2004.
[23] K. Doppler, M. Rinne, C. Wijting, C. Ribeiro, and K. Hugl, “Device-to-device communication as an underlay to LTE-advanced networks,” Communication Magazine, IEEE, vol. 47, no. 12, pp. 42–49, 2009.
[24] D. Feng, L. Lu, Y. Yuan-Wu, G. Y. Li, S. Li, and G. Feng, “Device-to-device communications in cellular networks,” IEEE Communications Magazine, vol. 52, no. 4, pp. 49–55, 2014.
[25] J. Roessler, “Device to Device Communication in LTE Whitepaper,” 2015. [Online]. Available: https://cdn.rohde-schwarz.com/pws/dl_downloads/dl_application/application_notes/1ma264/1MA264_0e_D2DComm.pdf
[26] D. Tsolkas, N. Passas, and L. Merakos, “Device discovery in lte networks: A radio access perspective,” Computer Networks, vol. 106, pp. 245–259, 2016.
[27] 3gpp, “Carrier Aggregation explained,” http://www.3gpp.org/technologies/keywords-acronyms/101-carrier-aggregation-explained, [Online].
[28] C. E. Shannon, “Channels with side information at the transmitter,” IBM journal of Research and Development, vol. 2, no. 4, pp. 289–293, 1958.
[29] A. Narula, M. J. Lopez, M. D. Trott, and G. W. Wornell, “Efficient use of side information in multiple-antenna data transmission over fading channels,” IEEE Journal on selected areas in communications, vol. 16, no. 8, pp. 1423–1436, 1998.
[30] D. Gunduz, E. Erkip, and H. V. Poor, “Secure lossless compression with side information,” in Information Theory Workshop, 2008. ITW’08. IEEE. IEEE, 2008, pp. 169–173.
[31] K. Hamdi, W. Zhang, and K. B.
Letaief, “Power control in cognitive radio systems based on spectrum sensing side information,” in Communications, 2007. ICC’07. IEEE International Conference on. IEEE, 2007, pp. 5161–5165.
[32] N. González-Prelcic, A. Ali, V. Va, and R. W. Heath, “Millimeter-wave communication with out-of-band information,” IEEE Communications Magazine, vol. 55, no. 12, pp. 140–146, 2017.
[33] O. Semiari, W. Saad, and M. Bennis, “Joint millimeter wave and microwave resources allocation in cellular networks with dual-mode base stations,” IEEE Transactions on Wireless Communications, vol. 16, no. 7, pp. 4802–4816, 2017.
[34] W. Chu, L. Li, L. Reyzin, and R. Schapire, “Contextual bandits with linear payoff functions,” in Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, 2011, pp. 208–214.
[35] M. Dohler, R. W. Heath Jr, A. Lozano Solsona, C. B. Papadias, and R. A. Valenzuela, “Is the PHY layer dead?” 2011.
[36] M. Chen et al., “Machine learning for wireless networks with artificial intelligence: A tutorial on neural networks,” arXiv preprint arXiv:1710.02913, 2017.
[37] L. Georgiadis, M. J. Neely, and L. Tassiulas, “Resource allocation and cross-layer control in wireless networks,” Foundations and Trends in Networking, vol. 1, no. 1, pp. 1–144, 2006.
[38] S. Adireddy and L. Tong, “Exploiting decentralized channel state information for random access,” Information Theory, IEEE Transactions on, vol. 51, no. 2, pp. 537–561, Feb 2005.
[39] D. Bethanabhotla, G. Caire, and M. Neely, “Joint transmission scheduling and congestion control for adaptive streaming in wireless device-to-device networks,” in Signals, Systems and Computer, Asilomar Conference on, Nov 2012, pp. 1179–1183.
[40] J. Kim, A. F. Molisch, and G. Caire, “Max-weight scheduling and quality-aware streaming for device-to-device video delivery,” CoRR, vol. abs/1406.4917, 2014. [Online]. Available: http://arxiv.org/abs/1406.4917
[41] K. Kar, X. Luo, and S.
Sarkar, “Throughput-optimal scheduling in multichannel access point networks under infrequent channel measurements,” Wireless Communication, IEEE Transactions on, vol. 7, no. 7, pp. 2619–2629, 2008.
[42] B. Kaufman, J. Lilleberg, and B. Aazhang, “Spectrum sharing scheme between cellular users and ad-hoc device-to-device users,” Wireless Communications, IEEE Transactions on, vol. 12, no. 3, pp. 1038–1049, March 2013.
[43] M. Neely, E. Modiano, and C. Rohrs, “Dynamic power allocation and routing for time varying wireless networks,” in INFOCOM 2003. 22nd Annual Joint Conference of the IEEE Computer and Communications IEEE Society, vol. 1, 2003, pp. 745–755 vol.1.
[44] S. Corson and J. Macker, “Mobile ad hoc networking (manet): Routing protocol performance issues and evaluation considerations,” United States, 1999.
[45] J. Andrews, “Interference cancellation for cellular systems: a contemporary overview,” IEEE Wireless Communications, vol. 12, no. 2, pp. 19–29, April 2005.
[46] S. Shalmashi, G. Miao, and S. Ben Slimane, “Interference management for multiple device-to-device communications underlaying cellular networks,” in Personal, Indoor and Mobile Radio Communications (PIMRC), 2013 IEEE 24th International Symposium on, Sept 2013, pp. 223–227.
[47] K. Huang, J. Andrews, R. Heath, D. Guo, and R. Berry, “Spatial interference cancellation for mobile ad hoc networks: Perfect csi,” in Global Telecomm. Conference, 2008. IEEE GLOBECOM 2008. IEEE, Nov 2008, pp. 1–5.
[48] T. Yoo and A. Goldsmith, “Capacity of fading mimo channels with channel estimation error,” in Communications, 2004 IEEE international conference on, vol. 2, June 2004, pp. 808–813 Vol.2.
[49] M. Medard, “The effect upon channel capacity in wireless communications of perfect and imperfect knowledge of the channel,” Information Theory, IEEE Transactions on, vol. 46, no. 3, pp. 933–946, May 2000.
[50] Mobile and wireless communications Enablers for the Twenty-twenty Information Society, “Proposed solutions for new radio access,” https://www.metis2020.com/wp-content/uploads/deliverables/METIS_D2.4_v1.pdf, [Online].
[51] G. Fodor and N. Reider, “A distributed power control scheme for cellular network assisted D2D communications,” in Global Telecommunications Conference (GLOBECOM 2011), 2011 IEEE, Dec 2011, pp. 1–6.
[52] H. Tang, Z. Ding, and B. Levy, “Enabling D2D communications through neighbor discovery in lte cellular networks,” Signal Processing, IEEE Transactions on, vol. 62, no. 19, pp. 5157–5170, Oct 2014.
[53] X. Wu, S. Tavildar, S. Shakkottai, T. Richardson, J. Li, R. Laroia, and A. Jovicic, “FlashLinQ: a synchronous distributed scheduler for peer-to-peer ad hoc networks,” in Communications, Control, and Computing (Allerton), 2010 48th Annual Allerton Conference on, Sept 2010, pp. 514–521.
[54] A. Pradini, G. Fodor, G. Miao, and M. Belleschi, “Near-optimal practical power control schemes for D2D communications in cellular networks,” in Networks and Communications (EuCNC), 2014 European Conference on, June 2014, pp. 1–5.
[55] D. Guo, S. Shamai, and S. Verdu, “Mutual information and minimum mean-square error in gaussian channels,” Information Theory, IEEE Transactions on, vol. 51, no. 4, pp. 1261–1282, April 2005.
[56] G. Sharma, R. R. Mazumdar, and N. B. Shroff, “On the complexity of scheduling in wireless networks,” in Proceeding of the International Conference on Mobile Computing and Networking, ser. MobiCom ’06. New York, NY, USA: ACM, 2006, pp. 227–238. [Online]. Available: http://doi.acm.org/10.1145/1161089.1161116
[57] H. Balakrishnan, C. Barrett, V. Kumar, M. Marathe, and S. Thite, “The distance-2 matching problem and its relationship to the MAC-layer capacity of ad hoc wireless networks,” Selected Areas in Communications, IEEE Journal on, vol. 22, pp. 1069–1079, 2004.
[58] N. Naderializadeh and A. S.
Avestimehr, “ITLinQ: a new approach for spectrum sharing in device-to-device communication systems,” arXiv preprint arXiv:1311.5527, 2013.
[59] H. Minn and D. Munoz, “Channel knowledge acquisition in relay and multi-point to multi-point transmission systems,” Veh. Tech., IEEE Transactions on, vol. PP, no. 99, 2014.
[60] O. El Ayach, A. Lozano, and R. Heath, “On the overhead of interference alignment: Training, feedback, and cooperation,” Wireless Communications, IEEE Transactions on, vol. 11, no. 11, pp. 4192–4203, November 2012.
[61] S. Peters and R. Heath, “User partitioning for less overhead in MIMO interference channels,” Wireless Communications, IEEE Transactions on, vol. 11, no. 2, pp. 592–603, February 2012.
[62] N. Golrezaei, P. Mansourifard, A. F. Molisch, and A. G. Dimakis, “Base-station assisted device-to-device communications for high-throughput wireless video networks,” IEEE Transactions on Wireless Communications, vol. 13, no. 7, pp. 3665–3676, 2014.
[63] D. Marathe and S. Bhashyam, “Power control for multi-antenna gaussian channels with delayed feedback,” in Signals, Systems and Computer, Asilomar Conference on, October 2005, pp. 1598–1602.
[64] O. Goussevskaia, Y. A. Oswald, and R. Wattenhofer, “Complexity in geometric SINR,” in Proceedings of the 8th ACM international symposium on Mobile ad hoc networking and computing. ACM, 2007, pp. 100–109.
[65] D. Burghal and A. F. Molisch, “Location aware training scheme for D2D networks,” in Signals, Systems and Computer, Asilomar Conference on, Nov 2013, pp. 1705–1708.
[66] S. M. Ross, Stochastic Processes. John Wiley and Sons, 1995.
[67] J. Rice, Mathematical statistics and data analysis. Cengage Learning, 2006.
[68] P. Dent, G. Bottomley, and T. Croft, “Jakes fading model revisited,” Electronics letters, vol. 29, no. 13, pp. 1162–1163, 1993.
[69] A. Prasad, A. Kunz, G. Velev, K. Samdanis, and J.
Song, “Energy-efficient D2D discovery for proximity services in 3GPP LTE-advanced networks: ProSe discovery mechanisms,” IEEE vehicular technology magazine, vol. 9, no. 4, pp. 40–50, 2014. [70] H. Park, Y . Kim, T. Song, and S. Pack, “Multiband directional neighbor discovery in self- organized mmwave ad hoc networks,” IEEE Transactions on Vehicular Technology, vol. 64, no. 3, pp. 1143–1155, March 2015. [71] D. Burghal, A. S. Tehrani, and A. F. Molisch, “Directional neighbor discovery in dual-band systems,” in 2015 49th Asilomar Conference on Signals, Systems and Computers, Nov 2015, pp. 1021–1025. 148 BIBLIOGRAPHY [72] J. Luo and D. Guo, “Neighbor discovery in wireless ad hoc networks based on group test- ing,” in Communication, Control, and Computing, 2008 46th Annual Allerton Conference on, Sept 2008, pp. 791–797. [73] Z. Zhang and B. Li, “Neighbor discovery in mobile ad hoc self-configuring networks with directional antennas: algorithms and comparisons,” IEEE Transactions on Wireless Com- munications, vol. 7, no. 5, pp. 1540–1549, May 2008. [74] A. S. Tehrani, A. F. Molisch, and G. Caire, “Directional zigzag: Neighbor discovery with directional antennas,” in 2015 IEEE Global Communications Conference (GLOBECOM). IEEE, 2015, pp. 1–6. [75] M. J. McGlynn and S. A. Borbash, “Birthday protocols for low energy deployment and flexible neighbor discovery in ad hoc wireless networks,” in Proceedings of the 2nd ACM international symposium on Mobile ad hoc networking & computing. ACM, 2001, pp. 137–145. [76] S. Vasudevan, J. Kurose, and D. Towsley, “On neighbor discovery in wireless networks with directional antennas,” in INFOCOM 2005. 24th Annual Joint Conference of the IEEE Computer and Communications Societies. Proceedings IEEE, vol. 4. IEEE, 2005, pp. 2502–2512. [77] E. Felemban, R. Murawski, E. Ekici, S. Park, K. Lee, J. Park, and Z. 
Hameed, “Sand: Sectored-antenna neighbor discovery protocol for wireless networks,” in 2010 7th Annual IEEE Communications Society Conference on Sensor, Mesh and Ad Hoc Communications and Networks (SECON), June 2010, pp. 1–9. [78] S. Vasudevan, D. Towsley, D. Goeckel, and R. Khalili, “Neighbor discovery in wireless net- works and the coupon collector’s problem,” in Proceedings of the 15th annual international conference on Mobile computing and networking. ACM, 2009, pp. 181–192. 149 BIBLIOGRAPHY [79] G. Sun, F. Wu, X. Gao, G. Chen, and W. Wang, “Time-efficient protocols for neighbor discovery in wireless ad hoc networks,” IEEE transactions on vehicular technology, vol. 62, no. 6, pp. 2780–2791, 2013. [80] H. Cai and T. Wolf, “On 2-way neighbor discovery in wireless networks with directional antennas,” in 2015 IEEE Conference on Computer Communications (INFOCOM). IEEE, 2015, pp. 702–710. [81] A. Russell, S. Vasudevan, B. Wang, W. Zeng, X. Chen, and W. Wei, “Neighbor discov- ery in wireless networks with multipacket reception,” IEEE Transactions on Parallel and Distributed Systems, vol. 26, no. 7, pp. 1984–1998, 2015. [82] L. You, Z. Yuan, P. Yang, and G. Chen, “Aloha-like neighbor discovery in low-duty-cycle wireless sensor networks,” in 2011 IEEE Wireless Communications and Networking Con- ference, March 2011, pp. 749–754. [83] Z. Yuan, Y . Lizhao, W. Li, B. Chen, and Z. Xu, “History-aware adaptive backoff for neigh- bor discovery in wireless networks,” in Mobile Ad-hoc and Sensor Networks (MSN), 2011 Seventh International Conference on. IEEE, 2011, pp. 174–181. [84] A. Gonga, T. Charalambous, and M. Johansson, “Neighbor discovery in multichannel wire- less clique networks: An epidemic approach,” in 2013 IEEE 10th International Conference on Mobile Ad-Hoc and Sensor Systems, Oct 2013, pp. 131–135. [85] L. You, X. Zhu, and G. 
Chen, “Neighbor discovery in peer-to-peer wireless networks with multi-channel mpr capability,” in 2012 IEEE International Conference on Communications (ICC), June 2012, pp. 4975–4979. [86] Y . Zeng, K. A. Mills, S. Gokhale, N. Mittal, S. Venkatesan, and R. Chandrasekaran, “Ro- bust neighbor discovery in multi-hop multi-channel heterogeneous wireless networks,” J. Parallel Distrib. Comput., vol. 92, no. C, pp. 15–34, May 2016. 150 BIBLIOGRAPHY [87] K. W. Choi and Z. Han, “Device-to-device discovery for proximity-based service in lte- advanced system,” Selected Areas in Communications, IEEE Journal on, vol. 33, no. 1, pp. 55–66, 2015. [88] G. Jakllari, W. Luo, and S. V . Krishnamurthy, “An integrated neighbor discovery and mac protocol for ad hoc networks using directional antennas,” Wireless Communications, IEEE Transactions on, vol. 6, no. 3, pp. 1114–1024, 2007. [89] 3gpp, “3GPP TS 23.303 V12.4.0,Proximity-based services (ProSe); Stage 2,” http://www. 3gpp.org/DynaReport/23303.htm, [Online]. [90] R. Sheldon et al., A first course in probability. Pearson Education India, 2002. [91] M. Brown, E. A. Pek¨ oz, and S. M. Ross, “Coupon collecting,” Probability in the Engineer- ing and Informational Sciences, vol. 22, no. 02, pp. 221–229, 2008. [92] P. Flajolet, D. Gardy, and L. Thimonier, “Birthday paradox, coupon collectors, caching algorithms and self-organizing search,” Discrete Applied Mathematics, vol. 39, no. 3, pp. 207–229, 1992. [93] E. Anceaume, Y . Busnel, and B. Sericola, “New results on a generalized coupon collector problem using markov chains,” Journal of Applied Probability, vol. 52, no. 02, pp. 405–418, 2015. [94] S. Boyd and L. Vandenberghe, Convex optimization. Cambridge university press, 2004. [95] M. Shaked, “Majorization and Schur Convexity — I,” Encyclopedia of Statistical Sciences, 2006. [96] A. V . Oppenheim, A. S. Willsky, and S. H. Nawab, Signals &Amp; Systems (2Nd Ed.). Upper Saddle River, NJ, USA: Prentice-Hall, Inc., 1996. [97] R. 
Bhatia, Matrix analysis. Springer Science & Business Media, 2013, vol. 169. 151 BIBLIOGRAPHY [98] D. Burghal, A. S. Tehrani, and A. F. Molisch, “Base station assisted neighbor discov- ery in device to device systems,” in Personal, Indoor, and Mobile Radio Communications (PIMRC), 2017 IEEE 28th Annual International Symposium on. IEEE, 2017, pp. 1–7. [99] A. Ali, N. Gonz´ alez-Prelcic, and R. W. Heath, “Estimating millimeter wave channels us- ing out-of-band measurements,” in Information Theory and Applications Workshop, (ITA), 2016. IEEE, 2016, pp. 1–6. [100] K. Chandra, R. V . Prasad, B. Quang, and I. Niemegeers, “CogCell: cognitive interplay between 60 GHz picocells and 2.4/5 GHz hotspots in the 5G era,” IEEE Communications Magazine, vol. 53, no. 7, pp. 118–125, 2015. [101] S.-Y . Lien et al., “5G new radio: Waveform, frame structure, multiple access, and initial access,” IEEE communications magazine, vol. 55, no. 6, pp. 64–71, 2017. [102] S. Sangodoyin, U. Virk, D. Burghal, A. Molisch, and K. Haneda, “Joint characterization of mm-wave and cm-wave device-to-device MIMO channels,” in Military Communications Conference, MILCOM 2018-2018 IEEE, 10 2018. [103] T. Wang et al., “Deep learning for wireless physical layer: Opportunities and challenges,” China Communications, vol. 14, no. 11, pp. 92–111, 2017. [104] T. J. a. O’Shea et al., “Deep learning based MIMO communications,” arXiv preprint arXiv:1707.07980, 2017. [105] N. Farsad and A. Goldsmith, “Neural network detection of data sequences in communication systems,” arXiv preprint arXiv:1802.02046, 2018. [106] T. Nitsche et al., “Steering with eyes closed: Mm-wave beam steering without in-band measurement,” in 2015 IEEE Conference on Computer Communications(INFOCOM), April 2015, pp. 2416–2424. 152 BIBLIOGRAPHY [107] M. Hashemi et al., “Out-of-band millimeter wave beamforming and communications to achieve low latency and high energy efficiency in 5G systems,” IEEE Transactions on Com- munications, 2017. [108] A. 
Ali, N. Gonz´ alez-Prelcic, and R. W. Heath Jr, “Spatial covariance estimation for millime- ter wave hybrid systems using out-of-band information,” arXiv preprint arXiv:1804.11204, 2018. [109] H. Chergui, K. Tourki, R. Lguensat, M. Benjillali, C. Verikoukis, and M. Debbah, “Classi- fication algorithms for semi-blind uplink/downlink decoupling in sub-6 GHz/mmwave 5G networks,” arXiv preprint arXiv:1809.01583, 2018. [110] F. B. Mismar and B. L. Evans, “Partially blind handovers for mmwave new radio aided by sub-6 GHz LTE signaling,” in Proceedings IEEE International Conference on Communica- tions Work. Evolutional Tech. & Ecosystems for 5G Phase II, 2018. [111] A. Alkhateeb and I. Beltagy, “Machine learning for reliable mmwave systems: Blockage prediction and proactive handoff,” arXiv preprint arXiv:1807.02723, 2018. [112] J. Riihijarvi and P. Mahonen, “Machine learning for performance prediction in mobile cellu- lar networks,” IEEE Computational Intelligence Magazine, vol. 13, no. 1, pp. 51–60, 2018. [113] Y . Wang, M. Martonosi, and L.-S. Peh, “Predicting link quality using supervised learning in wireless sensor networks,” ACM SIGMOBILE Mobile Computing and Communications Review, vol. 11, no. 3, pp. 71–83, 2007. [114] J. Wang et al., “Spatiotemporal modeling and prediction in cellular networks: A big data enabled deep learning approach,” in INFOCOM 2017-IEEE Conference on Computer Com- munications, IEEE. IEEE, 2017, pp. 1–9. [115] L. S. Muppirisetty, T. Svensson, and H. Wymeersch, “Spatial wireless channel prediction under location uncertainty,” IEEE Transactions on Wireless Communications, vol. 15, no. 2, pp. 1031–1044, 2016. 153 BIBLIOGRAPHY [116] S. Chen, Z. Jiang, J. Liu, R. Vannithamby, S. Zhou, Z. Niu, and Y . Wu, “Remote channel inference for beamforming in ultra-dense hyper-cellular network,” in GLOBECOM 2017- 2017 IEEE Global Communications conference. IEEE, 2017, pp. 1–6. [117] S. Navabi, C. Wang, O. Y . Bursalioglu, and H. 
Papadopoulos, “Predicting wireless chan- nel features using neural networks,” in Communications (ICC), 2016 IEEE international Conference on. IEEE, 2018, pp. 1–6. [118] D. Burghal and A. F. Molisch, “Rate and outage probability in dual band systems with prediction-based band switching,” IEEE Wireless Communications Letters, pp. 1–1, 2018. [119] S. M. Kay, “Fundamentals of statistical signal processing, vol. ii: Detection theory,” Signal Processing. Upper Saddle River, NJ: Prentice Hall, 1998. [120] A. Leon Garcia, Probability, statistics, and random processes for electrical engineering. Pearson Education; 3rd ed., 2008. [121] D. Burghal, R. Wang, and A. F. Molisch, “Band assignment in dual band systems: A learning-based approach,” arXiv preprint arXiv:1810.01534 [eess.SP], 2018. [122] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014. [123] I. Sutskever, O. Vinyals, and Q. V . Le, “Sequence to sequence learning with neural net- works,” in Advances in neural information processing systems, 2014, pp. 3104–3112. [124] S. S. Szyszkowicz et al., “On the feasibility of wireless shadowing correlation models,” IEEE Transaction on Vehicular Technology, vol. 59, no. 9, pp. 4222–4236, 2010. [125] M. Gudmundson, “Correlation model for shadow fading in mobile radio systems,” Electron- ics letters, vol. 27, no. 23, pp. 2145–2146, 1991. [126] W. I. Remcom, “https://www.remcom.com/wireless-insite-em-propagation-softwareinsite- wireless-em-propagation-more-info,” online, accessed: March 2017. 154 BIBLIOGRAPHY [127] J. Koneˇ cn` y, H. B. McMahan, F. X. Yu, P. Richt´ arik, A. T. Suresh, and D. Bacon, “Federated learning: Strategies for improving communication efficiency,” arXiv preprint arXiv:1610.05492, 2016. 155
Abstract
Establishing and maintaining a wireless communication link consumes considerable resources, because the transceivers on both sides must search for one another and perform frequent channel estimation to maintain the desired quality-of-service levels. Next generation wireless networks are expected to provide connectivity for a wide range of applications with various requirements and heterogeneous constraints; to that end, several new technologies have been introduced, including new communication modes and frequency bands. This dissertation addresses emerging aspects of link setup and maintenance in two new architectures: device-to-device (D2D) communication and dual connectivity over two frequency bands. In particular, it discusses solutions for three problems: (i) neighbor discovery and (ii) Channel State Information (CSI) acquisition in D2D networks, and (iii) band assignment in dual-band systems. These problems become increasingly complex when the D2D network is dense or when the wireless devices are mobile. However, they can be simplified or solved more efficiently when additional information, e.g., about the wireless devices or the environment, is available. The goal of this dissertation is to design efficient schemes that exploit such observable side information.
In the area of CSI acquisition, we consider the problem of acquiring CSI in base-station (BS) controlled D2D networks. Obtaining high-quality CSI requires a trade-off between interference, outdatedness of CSI, and noise. Thus, the goal is to find an efficient pilot scheduling scheme that minimizes both the errors in the estimates and the signaling overhead between the BS and the devices. In this report, we present the Location Aware Training Scheme (LATS) as a simple yet efficient training technique. Using location as side information, and assuming that the devices are aware of it, LATS groups the devices into geographical segments and assigns a frequency reuse pattern to them.
Next, as part of our contribution to the area of neighbor discovery, we consider the problem of randomized directional neighbor discovery with prior information. In general, the goal of a neighbor discovery scheme is to minimize the discovery time. We study the average discovery time of directional random neighbor discovery when devices have side information about their set of possible neighbors; this also helps identify the performance limits of random neighbor discovery schemes. Typically, discovery time is analyzed under assumptions that simplify the network structure, such as uniform neighbor relations for all devices. With prior information, however, the directional transmission probabilities depend on the node and the direction. This complicates the analysis of the expected discovery time, though it also improves performance when used correctly.
We first provide a closed-form expression for the expected discovery time based on the non-uniform coupon collector problem. Next, we identify the directional transmission probabilities of each device that achieve a small discovery time. Because the exact expression is mathematically complex, we provide lower and upper bounds on the expected discovery time, which allow the problem to be written as a convex optimization problem. Through simulations, we demonstrate the performance gain that the proposed methods obtain from prior knowledge, compared with the case in which no prior information is available, as well as the impact of uncertainty in the prior knowledge.
In the third problem, we consider band assignment (BA) in dual-band systems, where the BS chooses one of the two available frequency bands (centimeter-wave and millimeter-wave) to communicate with the user equipment (UE). While the millimeter-wave band might offer a higher data rate, it has a significant probability of outage. To maintain the link during an outage, communication should be carried on the (more reliable) centimeter-wave band. We consider two variations of the BA problem: one-shot and sequential BA. In one-shot BA, the BS uses only the currently observed information to decide whether to switch to the other frequency band; in sequential BA, the BS uses a window of previously observed information to predict the best band for a future time step. We provide two approaches to solve the BA problem: (i) a deep learning approach based on Long Short-Term Memory (LSTM) and/or multi-layer neural networks, and (ii) a Gaussian Process-based approach, which relies on the assumption that the channel states are jointly Gaussian. We compare the achieved performance with several benchmarks in two environments: (i) a stochastic environment, and (ii) microcellular outdoor channels obtained by ray tracing. In general, the deep learning solution shows superior performance in both environments.
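As an illustration of the non-uniform coupon collector problem that the abstract refers to, the following is a minimal sketch of the standard inclusion-exclusion formula for the expected number of slots until every outcome has been seen at least once. It is not taken from the dissertation, whose own closed-form expression may be parameterized differently; the function name `expected_collection_time` and the assumption that at most one outcome (for example, one neighbor) is observed per slot are ours.

```python
from itertools import combinations


def expected_collection_time(p):
    """Expected number of slots until every outcome has occurred once.

    p[i] is the per-slot probability of observing outcome i. At most one
    outcome occurs per slot, so sum(p) <= 1 (idle slots are allowed).
    Uses E[T] = sum over non-empty subsets S of (-1)^(|S|+1) / sum_{i in S} p[i].
    """
    if not p:
        return 0.0
    if any(pi <= 0 for pi in p) or sum(p) > 1 + 1e-12:
        raise ValueError("probabilities must be positive and sum to at most 1")
    total = 0.0
    for k in range(1, len(p) + 1):
        sign = 1.0 if k % 2 == 1 else -1.0
        # Each subset S contributes the mean of a geometric variable with
        # success probability sum(S): the wait until any member of S is seen.
        for subset in combinations(p, k):
            total += sign / sum(subset)
    return total


if __name__ == "__main__":
    # Uniform case with three outcomes: 3 * (1 + 1/2 + 1/3) slots.
    print(expected_collection_time([1 / 3, 1 / 3, 1 / 3]))
    # Skewed probabilities lengthen the expected time for the same support.
    print(expected_collection_time([0.6, 0.3, 0.1]))
```

The sum runs over all non-empty subsets, so its cost grows exponentially with the number of outcomes; this sketch is meant only for small examples, which is consistent with the abstract's motivation for working with bounds instead of the exact expression.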
Linked assets
University of Southern California Dissertations and Theses
Conceptually similar
Neighbor discovery in device-to-device communication
Fundamental limits of caching networks: turning memory into bandwidth
Fundamentals of two user-centric architectures for 5G: device-to-device communication and cache-aided interference management
Coexistence mechanisms for legacy and next generation wireless networks protocols
Structured codes in network information theory
Design, modeling, and analysis for cache-aided wireless device-to-device communications
Double-directional channel sounding for next generation wireless communications
Elements of next-generation wireless video systems: millimeter-wave and device-to-device algorithms
Algorithmic aspects of energy efficient transmission in multihop cooperative wireless networks
Optimal distributed algorithms for scheduling and load balancing in wireless networks
Multichannel data collection for throughput maximization in wireless sensor networks
A protocol framework for attacker traceback in wireless multi-hop networks
Enabling virtual and augmented reality over dense wireless networks
Joint routing, scheduling, and resource allocation in multi-hop networks: from wireless ad-hoc networks to distributed computing networks
Channel sounding for next-generation wireless communication systems
Magnetic induction-based wireless body area network and its application toward human motion tracking
Aging analysis in large-scale wireless sensor networks
Utilizing context and structure of reward functions to improve online learning in wireless networks
Learning, adaptation and control to enhance wireless network performance
Elements of robustness and optimal control for infrastructure networks
Asset Metadata
Creator
Burghal, Daoud A. (author)
Core Title
Exploiting side information for link setup and maintenance in next generation wireless networks
School
Viterbi School of Engineering
Degree
Doctor of Philosophy
Degree Program
Electrical Engineering
Publication Date
08/05/2019
Defense Date
03/04/2019
Publisher
University of Southern California (original), University of Southern California. Libraries (digital)
Tag
channel state information, CSI, D2D, device to device, link setup, machine learning, neighbor discovery, next generation wireless networks, OAI-PMH Harvest, side information, wireless communication
Format
application/pdf (imt)
Language
English
Contributor
Electronically uploaded by the author (provenance)
Advisor
Molisch, Andreas F. (committee chair), Nayyar, Ashutosh (committee member), Rosen, Gary (committee member)
Creator Email
burghal@usc.edu, daoud_elec@yahoo.com
Permanent Link (DOI)
https://doi.org/10.25549/usctheses-c89-207081
Unique identifier
UC11662642
Identifier
etd-BurghalDao-7722.pdf (filename), usctheses-c89-207081 (legacy record id)
Legacy Identifier
etd-BurghalDao-7722.pdf
Dmrecord
207081
Document Type
Dissertation
Rights
Burghal, Daoud A.
Type
texts
Source
University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Access Conditions
The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the a...
Repository Name
University of Southern California Digital Library
Repository Location
USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA