Design, Modeling, and Analysis for Cache-Aided Wireless Device-to-Device Communications

by

Ming-Chun Lee

A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(Electrical and Computer Engineering)

August 2020

Copyright 2020 Ming-Chun Lee

Acknowledgments

I would like to thank Professor Mingyue Ji, Professor Nishanth Sastry, and Dr. Hao Feng for their helpful and insightful discussions and contributions to this PhD dissertation. I would also like to thank the numerous faculty members, staff, and colleagues at USC who helped and supported my PhD study.

I would like to express my deep appreciation to my highly respected advisor, Professor Andreas F. Molisch, who provided excellent suggestions and guidance during my PhD study. He not only helped me improve and refine my technical skills, but also guided me toward an impactful research direction and taught me how to conduct high-quality research using appropriate research principles.

Last but not least, I would like to thank my family for providing endless support throughout my PhD study and my whole life.

Contents

Acknowledgments
List of Tables
List of Figures
Abstract

1 Introduction to Cache-Aided Networks for Video Distribution
  1.1 Caching Technologies in Computer and Communication Networks
  1.2 Wireless Edge Caching in Cellular Networks
  1.3 Distinctions between Cache-Aided Wireless Device-to-Device Networks and Cache-Aided Wired Networks
  1.4 Distinctions between Cache-Aided Wireless Device-to-Device Networks and Cache-Aided Ad-Hoc Networks

2 Overview of Cache-Aided Wireless D2D Networks
  2.1 Taxonomy
    2.1.1 Request Probability of Files
    2.1.2 Types of Content Caching and Delivery Strategies
      2.1.2.1 Centralized Control
      2.1.2.2 Decentralized Control
      2.1.2.3 Caching Granularity
    2.1.3 Performance Metrics
      2.1.3.1 Network Throughput
      2.1.3.2 Network Cost
      2.1.3.3 Energy Efficiency
      2.1.3.4 Cache Hit-Rate
      2.1.3.5 Latency
      2.1.3.6 Outage Probability
    2.1.4 Information Theory Limits
    2.1.5 Content Caching Strategy
    2.1.6 Content Delivery Strategy
    2.1.7 Joint Content Caching and Delivery Strategy
    2.1.8 Dynamic Content Caching and Replacement
    2.1.9 Coded Caching in D2D Networks
    2.1.10 Relevant Cross-Layer Topics
  2.2 Contributions of This Dissertation
    2.2.1 Throughput–Outage Analysis and Evaluation of Cache-Aided D2D Networks with Measured Popularity Distributions
      2.2.1.1 Abstract
      2.2.1.2 Contributions
    2.2.2 Optimal Throughput–Outage Analysis of Cache-Aided Wireless Multi-Hop D2D Networks
      2.2.2.1 Abstract
      2.2.2.2 Contributions
    2.2.3 Caching Policy and Cooperation Distance Design for Base Station Assisted Wireless D2D Caching Networks: Throughput and Energy Efficiency Optimization and Trade-Off
      2.2.3.1 Abstract
      2.2.3.2 Contributions
    2.2.4 Individual Preference Probability Modeling and Parameterization for Video Content in Wireless Caching Networks
      2.2.4.1 Abstract
      2.2.4.2 Contributions
    2.2.5 Individual Preference Aware Caching Policy Design in Wireless D2D Networks
      2.2.5.1 Abstract
      2.2.5.2 Contributions
    2.2.6 Dynamic Caching Content Replacement in Base Station Assisted Wireless D2D Caching Networks
      2.2.6.1 Abstract
      2.2.6.2 Contributions
  2.3 Organization

3 Throughput–Outage Analysis and Evaluation of Cache-Aided D2D Networks with Measured Popularity Distributions
  3.1 Introduction
    3.1.1 Contributions
  3.2 Measured Data and Popularity Distribution Modeling
  3.3 Achievable Throughput–Outage Tradeoff
    3.3.1 Network Setup
    3.3.2 Prerequisites for the Analysis of the Throughput–Outage Tradeoff
    3.3.3 Throughput–Outage Tradeoff for MZipf Distributions with γ < 1
    3.3.4 Throughput–Outage Tradeoff for MZipf Distributions with γ > 1
    3.3.5 Finite-Dimensional Simulations
  3.4 Evaluations of Cache-Aided D2D Networks
  3.5 Conclusions

4 Optimal Throughput–Outage Analysis of Cache-Aided Wireless Multi-Hop D2D Networks
  4.1 Introduction
    4.1.1 Related Literature
    4.1.2 Contributions
  4.2 Network Setup
  4.3 Achievable Throughput–Outage Performance
    4.3.1 Achievable Caching and File Delivery Scheme
    4.3.2 Throughput–Outage Performance for γ < 1
    4.3.3 Throughput–Outage Performance for γ > 1
    4.3.4 Finite-Dimensional Simulations
  4.4 Outer Bound of the Throughput–Outage Performance
  4.5 Main Results for q = Θ(1) Distributions
  4.6 Conclusions

5 Caching Policy and Cooperation Distance Design for Base Station Assisted Wireless D2D Caching Networks: Throughput and Energy Efficiency Optimization and Trade-Off
  5.1 Introduction
    5.1.1 Contributions
  5.2 Content Caching and System Modeling of Base Station Assisted Wireless D2D Caching Networks
    5.2.1 Network and System Models
    5.2.2 Elementary Access Probability Analysis
  5.3 Caching Policy and Cooperation Distance Design for Throughput Optimization
    5.3.1 Throughput Analysis for Random-Push Networks
    5.3.2 Throughput Analysis for Prioritized-Push Networks
    5.3.3 Throughput-Based Caching Policy and Cooperation Distance Design
  5.4 Caching Policy and Cooperation Distance Design for Energy Efficiency Optimization
    5.4.1 Energy Efficiency Analysis for Random-Push Networks
    5.4.2 Energy Efficiency Analysis for Prioritized-Push Networks
    5.4.3 EE-Based Caching Policy and Cooperation Distance Design
  5.5 Throughput–Energy Efficiency Trade-Off Analysis and Design
  5.6 Numerical Results
  5.7 Conclusions

6 Individual Preference Probability Modeling and Parameterization for Video Content in Wireless Caching Networks
  6.1 Introduction
    6.1.1 Contributions
  6.2 Individual Preference Probability Modeling and Dataset Preparations
    6.2.1 Modeling of Individual Preference Probability
    6.2.2 Dataset Descriptions and Preprocessing
    6.2.3 Kullback–Leibler Distance Based Parameter Estimation
    6.2.4 Genre-Based Structure and Modeling
  6.3 Proposed Modeling of Individual Popularity Distributions
    6.3.1 Size Distribution
    6.3.2 Individual Genre Popularity Distribution
    6.3.3 Genre-Based Conditional Popularity Distribution
  6.4 Proposed Modeling of Individual Ranking Orders
    6.4.1 Genre Appearance Probability
    6.4.2 Genre Ranking Distribution
  6.5 Statistical Parameterization of the Proposed Modeling Framework
    6.5.1 Statistical Modeling for Parameters of Individual Genre Popularity Distributions
    6.5.2 Statistical Modeling for Parameters of Genre-Based Conditional Popularity Distribution
    6.5.3 Statistical Modeling for Parameters of Genre Ranking Distribution
    6.5.4 Statistical Modeling for User Loading
  6.6 Correlation Analysis for Parameters of the Proposed Modeling Framework
  6.7 Proposed Individual Preference Probability Generation
    6.7.1 Parameter Generation
    6.7.2 Procedure of the Proposed Individual Preference Probability Generation Approach
    6.7.3 Numerical Validations
  6.8 Summary of Insights and Applications
  6.9 Conclusions

7 Individual Preference Aware Caching Policy Design in Wireless D2D Networks
  7.1 Introduction
    7.1.1 Contributions
  7.2 Network and Individual Preference Models
  7.3 Caching Policy Design Problem
    7.3.1 Fundamental Access Probability
    7.3.2 Utility Maximization Problem Formulation
    7.3.3 Interpretations of the Utility Maximization Problem and Its Relationship to Practice
      7.3.3.1 Throughput Maximization Problem
      7.3.3.2 Cost/Power Minimization Problem
      7.3.3.3 Hit-Rate Maximization Problem
      7.3.3.4 Throughput–Cost Weighted Sum Problem
      7.3.3.5 Efficiency Problem
    7.3.4 Effects of the Statistics of Wireless Channels and User Distributions
      7.3.4.1 Case 1: Systems with effective link quality control
      7.3.4.2 Case 2: Systems with deterministic path-loss and shadow fading
      7.3.4.3 Case 3: K users uniformly distributed in a square with side length D and with shadowing and small-scale fading
  7.4 Proposed Caching Policy Design
    7.4.1 Complexity Analysis of the Proposed Caching Policy Design
  7.5 Numerical Results
    7.5.1 Simulation Setup
    7.5.2 Effects of the Individual Preferences
    7.5.3 Tradeoff Behaviors between Different Performance Metrics
    7.5.4 Performance Evaluations with Respect to Cluster Size
    7.5.5 Evaluations with Different Schedulers
    7.5.6 Summary of the Insights
  7.6 Conclusions

8 Dynamic Caching Content Replacement in Base Station Assisted Wireless D2D Caching Networks
  8.1 Introduction
    8.1.1 Literature Review
    8.1.2 Contributions
  8.2 System Model
  8.3 Dynamic Caching Content Replacement
  8.4 Drift-Plus-Penalty Aided Minimization Methodology
  8.5 Myopic Drift-Plus-Penalty Aided Minimization Replacement
  8.6 Drift-Plus-Penalty Aided Minimization Replacement Exploiting Sampling
    8.6.1 Proposed Replacement Exploiting Sampling and Rolling Horizon
    8.6.2 Complexity Reduction Approach
      8.6.2.1 Initial Candidate Number Reduction
      8.6.2.2 Sampling with Candidate Pruning
  8.7 Extension to Caching Multiple Files
  8.8 Performance Evaluations and Discussions
    8.8.1 Simulation Environment
    8.8.2 Simulation Results
  8.9 Conclusions

9 Concluding Remarks and Prospective Directions

Bibliography

A Appendices of Chapter 3
  A.1 Proof of Theorem 1
  A.2 Proof of Corollary 1 and Corollary 2
    A.2.1 Proof of Corollary 1
    A.2.2 Proof of Corollary 2
  A.3 Proof of Theorem 2
    A.3.1 Proof of Regime 1
    A.3.2 Proof of Regime 2
    A.3.3 Proof of Regime 3
  A.4 Proof of Theorem 3
  A.5 Proof of Theorem 4

B Appendices of Chapter 4
  B.1 Proof of Lemma 1
  B.2 Proof of Theorem 1
  B.3 Proof of Proposition 1
  B.4 Proof of Theorem 2
  B.5 Proof of Lemma 2
  B.6 Proof of Proposition 2
  B.7 Proof of Proposition 3
  B.8 Proof of Corollary 3
  B.9 Proof of Theorem 3
  B.10 Proof of Theorem 4
  B.11 Proof of Theorem 5
  B.12 Proof of Theorem 8
  B.13 Proof of Theorem 10
  B.14 Proof of Lemma 5
  B.15 Proof of the Uniformly Random Matching

C Appendices of Chapter 5
  C.1 Proof of Lemma 1.2

D Appendices of Chapter 6
  D.1 Challenges, Limitations, and Drawbacks of Kullback–Leibler Distance Based Parameter Estimation
  D.2 Details of the Correlation Analysis Results
  D.3 Generation of Correlated Random Samples with Arbitrary Distributions Using Rank Correlation
  D.4 Details of the Parameterization Results and Individual Preference Probability Generation Approach
  D.5 Empirical Justifications for the Proposed Modeling
  D.6 Individual Preference Modeling of the Facebook Dataset

E Appendices of Chapter 7
  E.1 Derivations of the Expected Utility
  E.2 Proof of Theorem 1

F Appendices of Chapter 8
  F.1 Proof of Lemma 1
  F.2 Proof of Theorem 1
  F.3 Proof of Theorem 2
  F.4 Proof of Theorem 3

List of Tables

1.1 Features of Different Cache-Aided Networks
3.1 Parameterization of the Popularity Distribution using the MZipf Model
5.1 Summary of Notations
5.2 Summary of Parameters
7.1 Summary of Frequently Used Notations
D.1 Linear Correlation Results of June
D.2 Linear Correlation Results of July
D.3 Rank Correlation Results of June
D.4 Rank Correlation Results of July
D.5 K-L Distance Results of Proposed Models in Ch. 6.3 and Ch. 6.4, and the Loading Distribution
D.6 K-L Distance Results of the Specifically Designed Distributions in Ch. 6.5
D.7 K-S Test Results of the Well-Known Distributions in Ch. 6.5
D.8 K-L Distance Results of the Quantized Well-Known Distributions in Ch. 6.5
D.9 Parameterization Results
D.10 Parameterization Results
D.11 Numerical Results of γ_g^in and q_g^in
D.12 Numerical Results of M_g of June
D.13 Numerical Results of M_g of July

List of Figures

2.1 A typical cache-aided wireless D2D system. Users can be served by accessing files in their own cache (self-loop blue arrows), files in their neighbors' caches through D2D communications (yellow arrows), and files in the inventory through BS links (blue arrows from BSs). Users can be either active (yellow nodes) or inactive (green nodes). Active users are users that have requests to be satisfied; inactive users are users that do not.
3.1 Measured ordered popularity distribution of video files of the BBC iPlayer requested via the cellular operator in July 2014. γ = 0.86 and γ = 0.83 for the Zipf distributions in Metro regions 1 and 2, respectively. γ and q for the MZipf distributions are shown in Table 3.1.
3.2 Relation between q, M, and N using data from Metro region 1 of June 2014.
3.3 Comparison between the normalized theoretical results (solid lines) and normalized simulated results (dashed lines) in networks adopting K = 4, S = 1, M = 1000, and N = 10000.
3.4 Throughput–outage tradeoff in networks assuming the mixed-office scenario for the propagation channel; varying local storage size.
3.5 Throughput–outage tradeoff comparisons between different models for Metro region 2 of July in networks assuming the mixed-office scenario for the propagation channel; varying local storage size.
4.1 Outage probability with respect to q/g_c(M) in Remark 5, considering different γ.
4.2 Comparison between the normalized theoretical results (solid lines) and normalized simulated results (dashed lines) in networks adopting S = 1 and M = 1000 under different g_c(M).
5.1 Evaluation of the proposed analyses in the random-push networks with γ = 0.6 and q = 0.
5.2 Evaluation of the proposed analyses in the prioritized-push network with γ = 0.6, q = 0, λ_a = 0.0008 m^-2, and λ_i = 0.0042 m^-2.
5.3 Throughput comparisons between the approximations in prioritized-push networks with γ = 0.6 and q = 0.
5.4 Throughput comparisons between the random-push and prioritized-push networks with γ = 0.6 and q = 0.
5.5 Throughput comparisons between the networks with and without resource constraints in the prioritized-push network with γ = 0.6 and q = 0.
5.6 Throughput comparisons between different caching policies in the prioritized-push network with γ = 0.6 and q = 0.
5.7 EE comparisons between different caching policies in the prioritized-push network with γ = 0.6 and q = 0.
5.8 Performance comparisons between different caching policies in the prioritized-push network with γ = 1.28, q = 34, λ_a = 0.0022 m^-2, and λ_i = 0.0028 m^-2.
5.9 Throughput comparisons between different caching policies and densities in the prioritized-push network with γ = 1.28 and q = 34.
6.1 Sketch of the individual preference probability generation of a user.
6.2 Comparisons between the model and real data of size distributions.
6.3 Exemplary comparisons between the model and real data of individual genre popularity distributions in June.
6.4 Exemplary comparisons between the model and real data of individual genre popularity distributions in July.
6.5 Exemplary comparisons between the model and real data of genre-based conditional popularity distributions.
6.6 Comparisons between the model and real data of genre appearance probabilities.
6.7 Exemplary comparisons between the model and real data of ranking distributions.
6.8 Comparisons between the model and real data for the distribution of γ_k^out with NNT.
6.9 Comparisons between the model and real data for the distribution of q_k^out with NNT.
6.10 Comparisons between the model and real data for the distribution of γ_k^out with NT.
6.11 Comparisons between the model and real data for the distribution of q_k^out with NT.
6.12 Comparisons between the model and real data for the distribution of γ_k^in with NNT.
6.13 Comparisons between the model and real data for the distribution of q_k^in with NNT.
6.14 Comparisons between the model and real data for the distribution of γ_k^in with NT.
6.15 Comparisons between the model and real data for the distribution of q_k^in with NT.
6.16 Comparisons between the model and real data for the distribution of a_g^rk.
6.17 Comparisons between the model and real data for the distribution of b_g^rk.
6.18 Comparison between the model and real data of the loading distribution.
6.19 Comparison between global popularity distributions from the proposed generation approach and real data in June.
6.20 Comparison between global popularity distributions from the proposed generation approach and real data in July.
7.1 An example of the network model. In the left-middle cluster, we have K_A = 2, K_I = 1, S = 2, and M = 10.
7.2 Comparisons between analytical and simulated results in terms of throughput and EE.
7.3 Comparisons between networks with different numbers of inactive users in terms of throughput.
7.4 Comparisons between different designs in terms of throughput, EE, and hit-rate.
7.5 Comparisons between different designs in terms of throughput, EE, and hit-rate.
7.6 Comparisons between different designs in terms of throughput, EE, and hit-rate with respect to cluster size with λ_A = 0.01 and λ_I = 0.
7.7 Comparisons between different designs in terms of throughput, EE, and hit-rate with respect to cluster size with λ_A = 0.005 and λ_I = 0.005.
7.8 Comparisons between different designs in terms of throughput and EE with respect to cluster size with λ_A = 0.01 and λ_I = 0.
8.1 Throughput as a function of cluster size for MyDPP replacement with different step-sizes.
8.2 Throughput as a function of cluster size for different replacement schemes.
8.3 Throughput as a function of cluster size for different replacement schemes in networks including mobility outage.
8.4 Throughput as a function of cluster size for different replacement schemes in networks including mobility outage and with caching of multiple files per user.
8.5 Throughput as a function of cluster size for MyDPP replacement with different average network velocities.
D.1 Exemplary comparisons between the model and real data of individual genre popularity distributions.
D.2 Exemplary comparisons between the model and real data of individual genre popularity distributions.
D.3 Exemplary comparisons between the model and real data of genre-based conditional popularity distributions.
D.4 Exemplary comparisons between the model and real data of genre-based conditional popularity distributions.
D.5 Exemplary comparisons between the model and real data of ranking distributions.
D.6 Exemplary comparisons between the model and real data of ranking distributions.
D.7 Comparisons between the model and real data of genre appearance probabilities.
D.8 Comparisons between the model and real data of the size distribution.
D.9 Comparisons between the model and real data of the loading distribution.
D.10 Comparison between global popularity distributions of files from the proposed generation approach and real data.
D.11 Comparisons between the model and real data.
D.12 Comparisons between the model and real data.
D.13 Comparisons between the model and real data.
D.14 Comparisons between the model and real data.
D.15 Comparisons between the model and real data.
D.16 Complete flow diagram of the individual preference probability and global popularity distribution generation.

Abstract

Based on the concentrated popularity distribution of video files, caching popular files on devices and distributing them via device-to-device (D2D) communications allow a significant improvement of wireless video networks. This dissertation discusses various aspects of cache-aided wireless D2D networks, aiming to provide further understanding and improvement. Both practical and theoretical perspectives are considered. Starting from the conventional homogeneous user preference model, we analyze the throughput-outage performance of single-hop and multi-hop cache-aided D2D networks from the information-theoretic point of view. The analysis assumes a model based on the measured popularity distribution of mobile users; thus, it provides a connection between theory and practice. We next investigate the optimum caching policies for throughput and energy efficiency under practical network models and discuss their tradeoffs.
In addition, observing that the conventional homogeneous user preference modeling can be restrictive because different users can have different preferences, we investigate how individual user preferences can be modeled and used to benefit cache-aided D2D networks. Based on an extensive real-world dataset, a modeling framework and parameterizations are proposed for user preferences, along with an implementation recipe to generate user preferences following this model. Considering these user preferences, caching policies that optimize throughput, energy efficiency, hit-rate, and their tradeoffs are proposed and evaluated. Results show that significant benefits can be obtained by exploiting the information of individual user preferences. All of the above investigations are based on static caching policies, i.e., caching policies that are invariant with respect to time and space. To accommodate the possible dynamics of networks, proactive cache content replacement is investigated. We propose a D2D network architecture that can realize the content replacement, along with two replacement designs exploiting Lyapunov optimization and Monte-Carlo sampling. Results show that the proposed cache content replacement can significantly improve cache-aided D2D networks under various environmental dynamics.

Chapter 1
Introduction to Cache-Aided Networks for Video Distribution

Demand for wireless data traffic has increased significantly. According to reports from Cisco, global mobile data traffic grew 74 percent in 2015 [1], 63 percent in 2016 [2], and 58 percent in 2017 [3]. Such growth is expected to continue, and mobile traffic is predicted to reach 77 exabytes per month by 2022 [3]. Among this traffic, mobile video accounted for 55 percent, 60 percent, and 59 percent of the total mobile traffic in 2015 [1], 2016 [2], and 2017 [3], respectively.
It is expected that mobile video traffic will account for 79 percent of the total mobile traffic in 2022 [3]. As a result, finding new methods to provide highly efficient and low-cost video services is of paramount importance for wireless systems [4]. Clearly, the throughput-handling capacity of wireless networks needs to be improved. However, the conventional network-improvement approaches [5], e.g., use of additional spectrum, massive MIMO systems, and network densification, all require obtaining more physical resources and/or deploying more infrastructure, both of which are costly.

Depending on the application, we can distinguish between several main categories of video traffic: interactive video [6], live video streaming [7, 8], on-demand video streaming [8], and on-demand video downloading [9, 10]. On-demand video traffic has two unique properties: asynchronous content reuse and concentrated popularity distribution. Asynchronous content reuse means that content requests are not generated at the same time. This property distinguishes on-demand video services such as Netflix, Amazon Prime, Hulu, and YouTube from traditional broadcast TV, which achieved high spectral efficiency by forcing viewers to watch particular videos at prescribed times; thus, the multicasting/broadcasting approach used in TV services is no longer beneficial. The concentrated popularity indicates that requests are concentrated on a few popular files. Consequently, if we can "effectively" reuse content for popular files, a tremendous amount of on-demand video traffic can be satisfied with low cost and high efficiency.

Over the last years, the progress of semiconductor technology has made memory the cheapest hardware resource. Based on this, the idea of trading cheap memory for expensive bandwidth by caching files in off-peak hours and then providing them in peak hours becomes feasible and attractive for wireless networks.
This principle, combined with the properties of asynchronous requests and concentrated popularity distribution, renders caching at the wireless edge a very effective and promising alternative solution for improving on-demand video distribution [4, 10].

1.1 Caching Technologies in Computer and Communication Networks

Caching technologies have been discussed in different scenarios and contexts. To improve access to the World Wide Web (WWW), web-caching schemes were intensively studied [11] to alleviate Web service bottlenecks. The fundamental idea of web-caching is to bring popular web content close to the clients by caching it in local proxies. With the evolution of the Internet, many new applications emerged, and the main usage of the Internet was then no longer connecting to specific websites, but rather accessing and sharing content. In this context, the concept of the content delivery network (CDN) [12] emerged and was widely discussed. The main goal of a CDN is to distribute content in the network such that users and their desired contents are spatially correlated, thus improving network performance and reliability. In contrast to simple web-caching, which caches content without regard to where the content is generated and placed in the network, a CDN replicates content by carefully considering the properties of the content and the network (e.g., expected popularity, link bandwidth, and processing ability), and thus makes content access efficient. Although CDNs have started to move from conventional application-based caching to content-oriented caching, they are still designed on top of the conventional Internet, which is constructed along the lines of the host-client model. In recent years, a tremendous number of new applications have emerged; applying one universal requirement to all applications is no longer suitable. In addition, empirical results have also shown that CDNs cannot deal with network dynamics effectively [13].
Thus, a new Internet architecture is needed. Information centric networks (ICN) [14–16] emerged as a promising candidate for the future Internet. Compared to web-caching and CDNs, ICN allows ubiquitous in-network caching, enabling a flexible caching structure (as opposed to the rigid caching structures of web-caching and CDNs [15]) for serving various types of applications. Accordingly, ICN has been widely discussed in recent years [14–16].

In addition to the above caching schemes, which mainly target wired networks, caching technology has also been used in other networks and systems. Cache-aided peer-to-peer systems became popular in the early 2000s [17–19]. Nodes in a peer-to-peer system are assumed to be equivalent in terms of capability, functionality, and tasks. Since it aims to construct a network without the need for (strong) centralized control, a peer-to-peer system operates mostly with distributed functions. Cache-aided peer-to-peer systems can operate on wired [20] and wireless networks [21]. For example, a peer-to-peer system called "Squirrel" was built in [20] to improve web-caching; another peer-to-peer system, proposed in [21], was used for providing wireless home-to-home services. Interestingly, the definition of a peer-to-peer system has become broad and vague as it evolved [12, 19]. In some literature, different networks are deemed realizations of the peer-to-peer system, e.g., ad-hoc networks [22].

Ad-hoc networks drew significant attention due to potential applications in places where infrastructure-based wireless networks, such as cellular networks, do not exist. An ad-hoc network assumes that only wireless communications can happen between nodes and that each node in the network needs to forward content to other nodes.
Since most ad-hoc networks operate in a decentralized manner with each node having the same capability and functionality, some papers consider the ad-hoc network a realization of the peer-to-peer system [22]. To improve the performance of ad-hoc networks, caching technology was intensively studied in this context [22–26]. In cache-aided ad-hoc networks, cooperative caching is the central topic of investigation. Cooperative caching assumes that users can share and coordinate their cached contents. Since users share their contents, when a user wants to access a content that is not in its own cache, multi-hop communications are used for searching and forwarding the desired contents. Cooperative caching can be either static [25] or dynamic [24]. Dynamic cooperative caching needs to update the cached contents when the cache is full according to some rule, called the cache replacement policy, which can be designed with [24, 26] or without [27] cooperation. Several projects have aimed at realizing cache-aided ad-hoc networks for improving content distribution. For example, wireless-home projects aimed to connect home devices for multimedia streaming [28–31]. In [22], a cache-aided ad-hoc network is configured with the 802.11 protocol. When the users are high-speed vehicles, the ad-hoc network is extended to the vehicular ad-hoc network [32, 33]. Similar to ad-hoc networks, caching technology can be used for vehicular ad-hoc networks [34].

1.2 Wireless Edge Caching in Cellular Networks

Recently, aiming to resolve the challenges of on-demand video distribution for mobile users, caching technology has been introduced in cellular networks; this is by and large called caching at the wireless edge.
The main realizations of caching at the wireless edge are twofold [35, 36]: (i) caching at the BSs with limited backhaul, e.g., femtocaching [4] and hetnet caching [37], and (ii) caching at the devices/user equipments (UEs), e.g., cache-aided wireless D2D communications [4, 38].

In caching at the BSs, the BSs are assumed to have either limited backhaul or high-cost backhaul. Thus, serving users by accessing the inventory located in the core network all the time is prohibitive. In this context, caching files at the BSs is introduced to mitigate the limitation due to the backhaul and/or to reduce the cost of using the high-cost backhaul. Since the popularity distribution of on-demand video files is concentrated, caching popular files at the BSs allows a significant percentage of users to access their files without using the backhaul. Interestingly, this approach not only relaxes the utilization of the backhaul but also reduces the potentially high delay caused by obtaining the desired files via the core network; thus, the latency of distributing video files to users can be improved as well.

Caching at the UEs aims to serve video file requests using files cached in the storage of the UEs. Thanks to the maturity of wireless D2D communications [39], high-speed communications between UEs are feasible. This facilitates the development of cache-aided wireless D2D networks, where users can obtain their desired files via D2D communications when those files can be found in neighboring UEs. By directly satisfying video demand using D2D communications, users can obtain their files without using the resources of the BSs. This offloads a heavy burden from the BSs and improves network performance.

Caching can be combined with network coding for both caching at the BSs and cache-aided wireless D2D. This is known as coded caching [40, 41] in the literature.
The idea of coded caching is to let users pre-cache parts of files according to some coding scheme. The transmitter (BS or UE) then aims to satisfy all user requests within the minimum number of multicast transmissions by multicasting coded files. Since users have already pre-cached parts of files according to a well-designed scheme, they can successfully decode the transmitted coded files and obtain their desired portions of files, such that the complete desired files can be obtained by all users. Obviously, the success of decoding and the efficiency of the transmissions rely on the proper design of the coding and caching scheme.

Over the past years, investigations have demonstrated significant benefits of using cache-aided wireless D2D networks for video distribution [35, 36, 42–44]. Compared to caching at BSs, cache-aided wireless D2D allows improving video distribution in various respects without installing additional infrastructure. Moreover, theoretical studies have shown that caching methods can significantly improve the scaling laws of throughput [45–48]. Consequently, we focus on studying cache-aided wireless D2D networks in this dissertation.

Cache-aided wireless D2D for video distribution is different from cache-aided wired networks. The most significant difference is that the features of wireless communications differ from those of wired networks, as will be discussed in Sec. 1.3. Simple on-device caching (or self-caching), where a user caches files in its own cache to satisfy its own demand, is also different from cache-aided wireless D2D networks: since users in self-caching do not cooperate with one another, self-caching cannot gain benefits from cooperation. On the other hand, the cache-aided wireless D2D network and the cache-aided ad-hoc network are very similar, and both aim to gain benefits from the cooperation of users (nodes).
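Returning to the coded-caching idea described above: it can be illustrated with the classic two-user, two-file toy example (a hedged sketch in the style of the well-known Maddah-Ali–Niesen scheme; the file contents and variable names here are illustrative, not taken from this dissertation). Each user pre-caches one half of every file, and a single XOR multicast then serves both users' requests at once.

```python
# Toy illustration of coded caching: 2 users, 2 files, cache size = 1 file.
# XOR over equal-length byte strings plays the role of the coded multicast.

def xor(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

# Library: two files, each split into two equal halves.
A = b"AAAA" + b"aaaa"
B = b"BBBB" + b"bbbb"
A1, A2 = A[:4], A[4:]
B1, B2 = B[:4], B[4:]

# Placement phase (off-peak): each user pre-caches one half of every file.
cache_u1 = {"A1": A1, "B1": B1}   # user 1
cache_u2 = {"A2": A2, "B2": B2}   # user 2

# Delivery phase: user 1 requests file A, user 2 requests file B.
# One multicast of A2 XOR B1 is useful to both users simultaneously.
multicast = xor(A2, B1)

# Each user XORs the multicast with its own cached half to decode.
A2_decoded = xor(multicast, cache_u1["B1"])   # user 1 recovers A2
B1_decoded = xor(multicast, cache_u2["A2"])   # user 2 recovers B1

assert cache_u1["A1"] + A2_decoded == A   # user 1 has the complete file A
assert B1_decoded + cache_u2["B2"] == B   # user 2 has the complete file B
```

Without coding, serving the two missing halves would take two unicast transmissions; the coded multicast does it in one, which is the source of the coded-caching gain.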
However, the inclusion of BSs in D2D network scenarios differentiates the cache-aided wireless D2D network from the cache-aided ad-hoc network. This will be discussed in detail in Sec. 1.4. It should be noted that since the features of a peer-to-peer system in wireless scenarios are very similar to those of the cache-aided ad-hoc network, the comparisons in Sec. 1.4 also apply to the cache-aided peer-to-peer network.

1.3 Distinctions between Cache-Aided Wireless Device-to-Device Networks and Cache-Aided Wired Networks

In cache-aided wired networks (web-caching, CDN, and ICN), content caching and delivery mainly happen between the nodes of the core network, such as servers, gateways, and routers. Specifically, when a client (user) sends a request to the network through its nearby routers, the request can be satisfied directly if those routers cache the desired content. If not, the routers route the request to the gateways, and possibly through the core network, to the server containing the desired content. Consequently, by caching at routers and gateways closer to the requesters, the latency of content delivery can be reduced. However, caching technology in wired networks usually cannot be applied to caching at the wireless edge, mainly due to the unique features of cellular networks, e.g., network structure, the need for resource and link management, and mobility [23, 24, 35]. Specifically, in caching at the wireless edge, caching at the BSs and/or UEs is adopted to improve content distribution for mobile users. Wireless communications are thus used for distributing contents to users most of the time. In such a network, when a user sends a request, it first tries to access the desired file through the caches of BSs and UEs. If the desired file cannot be found there, the core network is used as the fallback solution for satisfying the request.
Clearly, such a network structure is very different from cache-aided wired networks, as the content delivery hits the core network only if the content cannot be found at the wireless edge.

In addition, since the connections between nodes in wired networks are stable links without a broadcasting nature, the link limitation of wired networks is the maximum throughput per physical link. In contrast, cache-aided wireless D2D networks operate over wireless links, where the broadcasting nature, limited resources, and dynamic channels of wireless communications render link management much more complicated and challenging. As a result, the limitation mainly lies in the maximum number of qualified links per area, and the caching and delivery strategies can be very different. More importantly, cache-aided wireless networks consider caching on mobile devices, leading to a dynamic caching space in which cached contents can enter and leave the network. This significantly affects the applicability of caching strategies developed for cache-aided wired networks, where the topology and caching space are mostly fixed. Overall, although caching technologies are used in both wired and wireless D2D networks to improve some common performance metrics, e.g., latency and user experience, the content caching and delivery strategies developed for wired networks are not directly applicable to wireless D2D networks.

1.4 Distinctions between Cache-Aided Wireless Device-to-Device Networks and Cache-Aided Ad-Hoc Networks

Cache-aided D2D networks share many common attributes with cache-aided ad-hoc networks. However, there are still differences in many aspects. The main difference comes from the inclusion of the BSs in D2D networks [49]. In D2D networks, users are commonly covered by BSs and cellular communications. Thus, caching and delivery in cache-aided D2D networks need to consider the influence of the BSs and other cellular users.
This has both pros and cons. For example, due to the existence of cellular links, interference between D2D links and cellular links needs to be considered, leading to a more complicated interference management problem. On the other hand, due to the coverage of BSs, when there is no D2D link over which to obtain the desired content, the user can easily turn to the BSs for possible (low-rate) service. Such an architecture inherently leads to an important design aspect in cache-aided D2D networks: mode selection between cellular and D2D communications. Clearly, this design aspect is not present in cache-aided ad-hoc networks. In addition, since the BSs can usually cover most users, files can be distributed directly to any node without multiple hops. As a result, file distribution for caching becomes more flexible. Finally, due to the intertwined coverage of cellular and D2D communications, their joint consideration for caching and delivery is important. This gives rise to a new area: the cache-aided heterogeneous network. To the best of our knowledge, the design and analysis of such networks have not appeared in papers considering cache-aided ad-hoc networks.

In addition to the data layer, the control layer of cache-aided wireless D2D networks is also affected by the presence of BSs. In cellular networks, resource management is usually handled by the BSs. Thus, the delivery is non-transparent to the BS and/or network provider. Accordingly, the caching approach in cache-aided wireless D2D networks is significantly influenced by the BSs, as they can provide control signaling and assist in network optimization. More importantly, such an architecture results in more centralized and flexible control for cache-aided D2D networks, as compared to cache-aided ad-hoc networks. Note that there exist centralized designs for cache-aided ad-hoc networks, e.g., [50].
However, with the aid of BSs, cache-aided wireless D2D networks can be more flexible in terms of different control-layer activities, e.g., information acquisition and exchange and control signaling. Although we can observe many differences between the two networks due to the presence of BSs, it should be noted that there are D2D networks implemented without the coverage of BSs, e.g., D2D communications in disaster and emergency situations [51–53]. In these cases, D2D networks are fundamentally identical to ad-hoc networks, and so are the cache-aided D2D and cache-aided ad-hoc networks.

Finally, we can observe some differences in research tendency and flavor between cache-aided D2D and cache-aided ad-hoc networks. For example, current papers on cache-aided D2D networks mainly focus on one-hop or few-hop delivery. The reason is that current technologies in smart devices, e.g., Bluetooth, WiFi Direct, and LTE Direct, are designed for single-hop or two-hop communications [49, 54]. On the other hand, cache-aided ad-hoc networks focus on multi-hop delivery. However, this does not fundamentally differentiate one from the other, as there exist multi-hop papers for cache-aided D2D [55] and vice versa [56]. Due to more diverse and strict requirements and considerations, recent research considers more diverse aspects of networks, e.g., traffic offloading, energy consumption, and tradeoffs [35]. Moreover, more elements are included in cache-aided networks, e.g., context awareness, individual preferences, incentive mechanisms, and social behaviors [36]. These do not fundamentally differentiate a cache-aided D2D network from a cache-aided ad-hoc network, but they somewhat differentiate recent research from research in the past. Table 1.1 summarizes the features of different cache-aided networks.
[Footnote 1: It should be noted that since caching technologies have been widely considered in different networks, there exist overlaps and similarities between the research topics of different cache-aided networks. Thus, this table provides a summary from a general point of view, and exceptions can be observed from time to time.]

Table 1.1: Features of Different Cache-Aided Networks

| Networks                     | Link Type         | Caching Entity                          | Control Type              | Homogeneity | Ubiquity |
| Web-Caching Networks         | Wired             | Servers and Proxies                     | Centralized/Decentralized | No          | No       |
| Content Delivery Networks    | Wired             | Servers, Proxies, Gateways, and Routers | Centralized/Decentralized | No          | No       |
| Information Centric Networks | Wired             | Servers, Proxies, Gateways, and Routers | Centralized/Decentralized | No          | Yes      |
| Peer-to-Peer Networks        | Wired/Wireless    | Peers                                   | Decentralized             | Yes         | Yes      |
| Ad-Hoc Networks              | Wireless          | Nodes                                   | Decentralized             | Yes         | Yes      |
| On-Device Caching            | None (no sharing) | Devices                                 | Decentralized             | No          | Yes      |
| Caching at the BSs           | Wireless          | BSs                                     | Centralized/Decentralized | No          | Yes      |
| Cache-Aided D2D              | Wireless          | Devices                                 | Centralized/Decentralized | No          | Yes      |
| Caching in Cellular Networks | Wired/Wireless    | BSs and Devices                         | Centralized/Decentralized | No          | Yes      |

Chapter 2
Overview of Cache-Aided Wireless D2D Networks

In this dissertation, we focus on various topics in cache-aided wireless D2D networks for video distribution. The typical model of the considered cache-aided wireless D2D network is shown in Fig. 2.1. Such a network consists of devices equipped with storage/memory for caching files. Users may be covered by BSs; thus, cellular links can be used for providing file delivery and/or control functionality in certain situations. On the contrary, when D2D users are not covered by BSs, the ad-hoc mode of D2D users is considered [49], and the cache-aided D2D network is fundamentally identical to the cache-aided ad-hoc network. D2D communications can be classified as either in-band D2D or out-band D2D [57].
For in-band D2D, D2D communications occur in the same licensed spectrum as the BS-to-device communications; for out-band D2D, unlicensed spectrum is utilized for the D2D communications. We can assume either that the BSs in the network are equipped with caches and limited backhaul or that they are conventional BSs without caching but with unlimited backhaul. In the former case, content caching needs to be considered for both BSs and user devices, which complicates the problem.

Figure 2.1: A typical cache-aided wireless D2D system. Users can be served via accessing files in their own cache (self-loop blue arrows), files in their neighbors' caches through D2D communications (yellow arrows), and files in the inventory through BS links (blue arrows from BSs). Users can be either active (yellow nodes) or inactive (green nodes). Active users are users that have requests to be satisfied; inactive users are users that do not have requests to be satisfied.

In cache-aided D2D networks, there are three main topics to deal with: (i) device discovery and synchronization; (ii) resource management and content delivery; and (iii) cache management and content placement. The first topic mainly deals with how the controller of the network can find devices in the network and synchronize the users. In addition to the user discovery and synchronization that typically happen in a D2D network, a cache-aided network additionally needs to determine which files are cached by whom, namely the content discovery/search problem [58, 59]. The second topic deals with how the resources of the network can be managed such that content can be efficiently delivered to the requesters in the cache-aided network [35, 36]. The final topic deals with how content should be distributed in the network [35, 36].
In other words, this determines what content is cached by whom and how the network manages the caches in the network. Note that only if the caches of devices and their cached contents are appropriately managed can the network leverage the benefits brought by caching. Due to the unique feature that the desired contents can only be obtained from devices and/or BSs that cache them, content caching and delivery in cache-aided networks are mutually coupled. On the one hand, content delivery needs to consider how the contents are distributed within the network. On the other hand, how the content is delivered has an impact on the efficiency of a given content distribution method. Therefore, the optimization of the network needs to jointly consider the content caching and delivery strategies. However, this joint optimization is very challenging. Consequently, to simplify the problem, many papers consider optimizing only one of them while fixing the other. This will be discussed in more detail below (see Secs. 2.1.5, 2.1.6, and 2.1.7). In this dissertation, we focus only on the content caching and delivery strategies, i.e., the second and third topics, and generally assume that the discovery and synchronization are perfect.

2.1 Taxonomy

This section presents a taxonomy of topics in cache-aided wireless D2D networks. This taxonomy will serve as the framework for discussing the contributions and understanding the significance of the results in this dissertation. It should be noted that the discussion below is based on our own perspective; there exist high-quality survey papers that discuss the taxonomy from different perspectives and may complement our discussion [35, 36, 43, 44].

2.1.1 Request Probability of Files

The performance of cache-aided wireless D2D networks is highly dependent on the requesting behaviors of users.
Thus, the modeling of content requests is of paramount importance. In this dissertation, a probabilistic request model is considered. In this type of model, we assume that there are M files in the library and that users independently request files in the library according to some probability distribution. [Footnote 1: Note that there are papers with other models, such as dynamic file requesting without specifying the request behaviors [24, 25, 60] and temporal-spatial models which incorporate the temporal-spatial relationships of files into the model [61–64]. These do not fit the requesting model discussed here and are not used in this dissertation, except that the dynamic model will be used in Chapter 8.] If users request files in the library according to the same probability distribution, i.e., all users request file f with the same probability 0 ≤ P_r(f) ≤ 1, where Σ_{f=1}^{M} P_r(f) = 1, this is known as the homogeneous file-requesting model [4, 41, 42, 46, 65–74]. In this case, it is common to assume that the probability distribution {P_r(f)}_{f=1}^{M} follows the statistics, i.e., the popularity distribution, of the file requests of users in the network; frequently adopted popularity distributions are the Zipf distribution [4, 75] and the Mandelbrot-Zipf (MZipf) distribution [48, 76, 77].

Under a homogeneous preference model, each user requests files independently and randomly according to the same popularity distribution. However, this model is at best an approximation, because different users indeed have different tastes and preferences. Such heterogeneity in user preferences has been observed in [78] and has also been modeled in recent papers [79–81]. Furthermore, based on real-world data, results in [78] have shown that leveraging individual (user) preferences can improve network performance. These observations clarify that, instead of using the homogeneous model, cache-aided D2D networks can be further improved by using a heterogeneous model when designing the network.
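The two popularity models named above can be made concrete with a short sketch. The following computes Zipf and MZipf request probabilities over a library of M files; the parameter values (M, gamma, q) are illustrative assumptions, not measurements from this dissertation.

```python
import numpy as np

def zipf_popularity(M: int, gamma: float) -> np.ndarray:
    """Zipf popularity over a library of M files: P_r(f) proportional to f^(-gamma)."""
    w = np.arange(1, M + 1, dtype=float) ** (-gamma)
    return w / w.sum()

def mzipf_popularity(M: int, gamma: float, q: float) -> np.ndarray:
    """Mandelbrot-Zipf: P_r(f) proportional to (f + q)^(-gamma); the plateau
    factor q flattens the head of the distribution relative to plain Zipf."""
    w = (np.arange(1, M + 1, dtype=float) + q) ** (-gamma)
    return w / w.sum()

# Illustrative parameters (assumptions for this sketch only).
M = 1000
p_zipf = zipf_popularity(M, gamma=0.8)
p_mzipf = mzipf_popularity(M, gamma=0.8, q=20.0)

# Concentration of requests on popular files: mass of the top 5% of the library.
top = M // 20
print(f"top-5% request mass, Zipf:  {p_zipf[:top].sum():.3f}")
print(f"top-5% request mass, MZipf: {p_mzipf[:top].sum():.3f}")
```

Drawing i.i.d. requests from such a vector (e.g., with `numpy.random.Generator.choice`) instantiates the homogeneous file-requesting model; giving each user u its own vector {P_r^u(f)} instantiates the heterogeneous one.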
Researchers have recently begun to consider individual preferences in their investigations, and studies have accordingly shown that it is possible to use this information to improve the performance of wireless caching networks [79, 82–87]. In these papers, user requests are modeled by using individual/user-specific request probability distributions. Mathematically, denoting 0 ≤ P_r^u(f) ≤ 1 as the request probability of user u for file f, user u requests file f with probability P_r^u(f), where ∑_{f=1}^{M} P_r^u(f) = 1. Since different users can have different request probability distributions {P_r^u(f)}_{f=1}^{M}, ∀u, this model describes the heterogeneity of user preferences. It should be noted that a design considering the probabilistic requesting model needs an accurate estimation/prediction of the popularity distribution and/or individual user preferences. Therefore, some papers consider this aspect jointly with the network design aspect (see Ch. 2.10). Nevertheless, in this dissertation, we assume that the estimation/prediction is perfect, and thus focus on the network design aspect.

2.1.2 Types of Content Caching and Delivery Strategies

Depending on how the content caching/delivery strategies/policies are controlled, we can classify them as either centralized (deterministic) or decentralized (random) caching/delivery strategies. It should be noted that it is not necessary that a centralized (decentralized) content caching strategy is accompanied by a centralized (decentralized) content delivery strategy; a centralized caching strategy can be paired with a decentralized delivery strategy, and vice versa. For example, it is common to have a decentralized caching strategy determine what files are to be cached in off-peak hours, e.g., midnight.
On the other hand, it is common to have a centralized content delivery strategy because the BS can fully control the amount of resources allocated to each D2D link.

2.1.2.1 Centralized Control:

In centralized strategies, a central controller is assumed to know perfectly the locations of users and is able to centrally control what content should be cached and delivered by whom. The centralized controller collects information from the environment and makes a joint decision for users [65, 88, 89]. The major benefit of a centralized control strategy is that different portions of the network can fully coordinate with one another such that the network is jointly optimized. However, the downside is that the overhead of a centralized strategy can be significant, as the centralized controller needs to collect information and distribute the decisions throughout the network. In addition, when the network scales to a very large size, scalability issues commonly arise.

2.1.2.2 Decentralized Control:

In a decentralized strategy, users make decisions in a distributed manner [38, 72]. Since there is no centralized controller, users collect information on their own and exchange information with one another. The major benefit of the decentralized strategy is that the overhead is low and scalability can usually be guaranteed. However, global optimality might not be guaranteed with the decentralized strategy, and the set of decentralized strategies is only a subset of all possible strategies that the centralized controller can adopt. This leads to the fact that the optimal performance of a network with the decentralized setup can only be a lower bound of that of the network with the centralized setup, though their difference might be minor in terms of the scaling law [47, 88, 90].
It should be noted that scalability is very important for cache-aided wireless D2D networks, and mobility can destroy the benefit of a centralized caching strategy [91]. More importantly, a well-designed randomized caching policy can provide performance very close to that of deterministic caching policies [71, 90], while enjoying easy implementation and high scalability. As a result, the decentralized caching strategy is more popular for cache-aided wireless D2D networks [35]. In terms of delivery strategies, since service providers of cellular networks prefer full control of D2D communications, the centralized strategy is more preferable in this context [92, 93]. However, when BSs do not exist, e.g., in cache-aided ad-hoc networks, decentralized delivery is the dominant approach [19, 24].

2.1.2.3 Caching Granularity:

Caching can be conducted with different granularities: the complete file and a chunk of a file. In the former case, caching of a complete file without splitting it into small chunks is considered. In contrast, one can choose to split a file into several chunks, and then cache only some of them. The latter not only provides finer granularity, it also changes the requesting behavior of users. This is because when a user requests a chunk of a file, it is likely that this user would request other chunks of the same file. Moreover, some chunks might be more frequently requested than other chunks. These unique features need to be considered when caching is used at the granularity of chunks [27, 30, 94–96]. Notably, the reported Zipf and MZipf models for describing the popularity distribution of requests are usually at the granularity of files. While some efforts have been made toward understanding the popularity distribution model at the granularity of chunks [97], to the best of our knowledge, it is yet to be clearly understood.
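The chunk-level request behavior described above (earlier chunks of a video tend to be requested more often, since viewers may stop watching partway through) can be sketched as follows. The geometric decay across chunk indices is purely an illustrative assumption, since, as noted, a validated chunk-level popularity model is still lacking; all function names and parameter values here are invented for the example.

```python
def split_into_chunks(file_size_bits, chunk_bits):
    """Number of chunks when a file is cached at chunk granularity
    (ceiling division)."""
    return -(-file_size_bits // chunk_bits)

def chunk_request_prob(file_prob, n_chunks, decay=0.9):
    """Hypothetical within-file chunk popularity: the file's request
    probability is spread over its chunks with a geometric decay, so
    earlier chunks carry more request probability than later ones."""
    weights = [decay ** k for k in range(n_chunks)]
    total = sum(weights)
    return [file_prob * w / total for w in weights]

# A file with overall request probability 0.05, split into 4 chunks.
probs = chunk_request_prob(file_prob=0.05, n_chunks=4)
```

Under this sketch, a caching policy could then decide per chunk rather than per file, using `probs` in place of a file-level probability.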
2.1.3 Performance Metrics

In cache-aided wireless D2D networks, different performance metrics are used for evaluating the performance of caching and delivery strategies. The commonly used metrics are [35]: network throughput, network cost, energy efficiency (EE), cache hit-rate, outage probability, and latency. It should be noted that when the above performance metrics are considered in the sense of the first-order statistics of the network (i.e., they are expected values over the network), a large value does not necessarily indicate that every user benefits. For example, it is possible that some users significantly benefit from optimizing the network throughput, while others are sacrificed. This fairness issue also needs to be kept in mind when designing and evaluating the strategies, e.g., [98]. Different performance metrics can sometimes conflict with one another, and the best operating point of a network might not be obtained by optimizing a specific performance metric. Consequently, designs that can obtain an effective tradeoff among different metrics are desirable [99, 100] in some circumstances. A discussion of each performance metric is provided below.

2.1.3.1 Network Throughput:

To evaluate the performance of a network, the throughput is widely used. The throughput of a network is defined as the number of bits (or files) that the network can deliver within a given time period, e.g., the number of bits per second. When a network has a large throughput, it generally indicates that, on average, more files can be delivered within a given period, and thus more demand is satisfied within the period.

2.1.3.2 Network Cost:

The cost of the network when delivering files is critical in some situations. In general, we are referring to the power consumption of the network when considering network cost. However, other costs might also be used, e.g., the price of operating and/or maintaining the network.
Commonly, high throughput induces high cost, and the network cost needs to be affordable for the operator. Thus, the network cost needs to be considered when designing the caching and delivery strategies.

2.1.3.3 Energy Efficiency:

A tradeoff commonly exists between the throughput and power consumption of the network – larger throughput induces larger power consumption and vice versa. Accordingly, how to find a balanced operating point between them becomes critical. Energy efficiency (EE), defined as the number of bits (or files) delivered per unit of energy, serves as an effective metric for providing a balanced view of them.

2.1.3.4 Cache Hit-Rate:

Cache hit-rate is defined as the proportion of users that can find their desired files among the cached files in the network. This metric indicates the percentage of requests that can be satisfied by the cached contents in the network, thus serving as an indicator of the efficiency of cache usage. Interestingly, this performance metric is usually more tractable than other metrics (e.g., throughput and EE), while optimizing the cache hit-rate might not directly imply the optimization of other metrics [74].

2.1.3.5 Latency:

Latency represents the time duration between the request of a user for a file and the actual reception of that file by the user. Compared to other performance metrics, latency has a significant impact on the “quality of experience” of a user.

2.1.3.6 Outage Probability:

Outage probability is defined as the probability of the occurrence of an outage event. This outage event might have different definitions in different papers. For example, one common definition of an outage is that the desired file cannot be found in the caches of the network. In this case, the outage probability is equivalent to the complement of the cache hit-rate. Another common definition treats an outage as the failure to satisfy the requirements for the file delivery.
The complement of the outage probability is called the successful transmission rate in some papers.

2.1.4 Information Theory Limits

To understand the properties and limits of a multi-user network, scaling law analysis is commonly used. Scaling law analysis characterizes how the network/user performance (e.g., throughput, outage, delay) scales as the number of users N tends to infinity. Such scaling law analysis can provide us with the performance trend as well as a means of comparison between fundamentally different communication frameworks. We note that, in order to concentrate on the fundamental benefit of D2D communications, the scaling law analysis for wireless D2D networks commonly assumes that the BSs do not contribute to file delivery. As a result, the scaling law analysis for wireless D2D networks in this case applies also to wireless ad-hoc networks and vice versa. The throughput scaling law analysis for wireless D2D networks without BSs (i.e., wireless ad-hoc networks) has been the subject of many investigations since the seminal work of Gupta and Kumar [101]. In [101], the transport capacity was investigated under both a protocol model and a physical model, with multi-hop used for communications; the N users are either placed arbitrarily or placed randomly. Both a lower (achievable) bound on the throughput per user of order Θ(1/√(N log N)) and an upper bound (under some conditions) of O(1/√N) were derived. In [102], a similar analysis was conducted with a more generalized physical model, and the upper bound O(1/√N) was validated under general conditions. The Θ(√(log N)) gap between the achievable throughput and the upper bound was closed in [103], however, with a slightly different model where the user distribution is described by a Poisson point process (PPP). A number of other schemes and channel models were investigated in other papers as well. For example, analysis involving fading effects was provided in [104].
Also, the multicasting capacity was studied in [105] and [106]. In parallel to the investigations of the throughput scaling laws, the scaling laws for the throughput-delay tradeoff were provided in [107–109]. Since all the previous investigations were based on multi-hop communications, a natural question is whether one can go beyond the scaling law bounds of multi-hop D2D communications by using more sophisticated physical layer processing. This was indeed shown to be the case in [110], which introduced a hierarchical cooperation scheme, where the cooperation between users is used to form a distributed multiple-input multiple-output (MIMO) system among the users and gain benefits. The resulting scaling of the throughput per user is Θ(1), at the price of very complicated cooperation among user nodes. Cache-aided ad-hoc networks have been substantially studied by the Computer Science community, mostly with multi-hop communications, e.g., [21, 23, 25, 29, 111]. However, the fundamental scaling laws and optimality considerations did not draw much attention, except that [17] proposed a caching policy (the square-root replication policy) that provides the optimum design in terms of the expected number of nodes to visit until finding the desired content. Only recently did the fundamental properties of cache-aided D2D/ad-hoc networks start to draw more attention, and several papers have characterized the scaling laws of cache-aided D2D/ad-hoc networks without network coding. In [45], the scaling law of the throughput was characterized for single-hop cache-aided D2D networks considering a Zipf popularity distribution and a protocol model for transmission between nodes. Ref. [46] investigated the scaling law of the throughput-outage performance for single-hop cache-aided D2D networks. It showed that the throughput per node can scale as Θ(S/M) with negligibly small outage when a heavy-tailed Zipf popularity distribution is considered, where S is the cache size of a device.
This result was later generalized in [48] to the MZipf distribution. In [88], the scaling law of the average throughput per node for cache-aided D2D with multi-hop communications was characterized under the assumptions of a deterministic caching policy and user locations on a grid. Ref. [47] provided an achievable throughput scaling law under the condition of vanishing outage for cache-aided wireless D2D networks with multi-hop communications. In [112], an upper bound for the throughput scaling law was proposed, which complemented the results in [88] and [47]. The optimal scaling law for the throughput-outage performance was obtained in [113]. It suggests that the optimal throughput scales as Θ(√(S/M)) when the outage is negligibly small. There exist papers investigating scaling laws using more complicated delivery approaches. The scaling laws of coded cache-aided D2D/ad-hoc networks have been studied in different contexts, e.g., [41, 114–117]. Besides, to improve cache-aided multi-hop D2D, schemes involving hierarchical cooperation were introduced in [118–120], and their scaling laws were characterized. In contrast to the above papers, which studied the scaling behavior of the throughput and outage performance, the scaling behavior of the throughput-delay tradeoff was studied in [121]. The scaling law analysis shows that when caching technology is used in wireless D2D networks, the performance of the network can be substantially improved. This provides the fundamental support for considering cache-aided wireless D2D networks for video distribution.

2.1.5 Content Caching Strategy

Content caching strategy design is one of the main topics for cache-aided wireless D2D networks. Since the performance of a caching strategy is affected by the content delivery strategy, the best approach is to jointly design the content caching and delivery strategies. However, it is sometimes too challenging to have a joint design.
Therefore, some papers design caching strategies with the assumption that the delivery strategy is fixed or follows some given rule. This subsection discusses papers following this principle. In addition, here we consider static caching strategies, which are designed based on network statistics without considering network dynamics and are used over time and space. To deal with network dynamics, the network mainly relies on cache replacement/refreshment strategies, which will be discussed later in Ch. 2.1.8. Before the emergence of cache-aided wireless D2D networks, static caching policies for wireless systems had been studied in cache-aided peer-to-peer systems and cache-aided ad-hoc networks. For example, Cohen and Shenker in [17] discussed two heuristic caching policies, and then proposed a caching policy (the square-root replication policy) that provides the optimum design in terms of the expected number of nodes that one needs to visit until finding the desired content. Meanwhile, with a similar group of authors as [17], several caching strategies were studied via simulations under different searching algorithms and network topologies in [18]. The results validate the optimality of the square-root replication policy in terms of the expected number of nodes to search. Ref. [122] proposed a low-complexity greedy algorithm that can maximize the cache hit-rate in peer-to-peer systems. In [25], three different static methods for data replication were proposed for maximizing the data accessibility (similar to cache hit-rate). Since the designs for cache-aided ad-hoc networks and peer-to-peer systems might not be directly applicable to cache-aided wireless D2D networks, new designs have appeared in recent years for cache-aided wireless D2D networks, as discussed below. There are papers investigating deterministic caching strategies for cache-aided wireless D2D networks. In [123], a deterministic caching policy was investigated in a clustering network.
It showed that a policy in which the set of all users in a cluster caches the most popular files without replication can maximize the expected throughput of the network. In [70], considering opportunistic D2D transmission and a clustering network, a caching policy maximizing the cache hit-rate was proposed. Ref. [124] proposed an efficient caching algorithm to minimize the average latency of the content delivery. In [125], by taking cooperation between BSs and users into consideration, a cooperative caching policy was proposed to minimize the latency. In [126], a new architecture of inter-cluster cooperation was considered, and a deterministic caching policy was proposed to minimize the latency. It showed that inter-cluster cooperation along with the proposed caching policy can improve the network performance by 45% to 80% as compared to networks without inter-cluster cooperation. There exist many papers investigating randomized caching policies. In randomized caching policies, users determine which file to cache according to some probabilities. Specifically, denoting P_c^u(f) as the probability of caching file f at user u, a randomized caching policy for user u is described as {P_c^u(f)}_{f=1}^{M} and can be implemented by the approach in [127]. Randomized caching policies have been designed in pursuit of different objectives: cache hit-rate [74], successful transmission rate [66, 73, 74], throughput [71, 74, 128], latency [67, 129], and EE [68]. In [66], the content caching policy was optimized to pursue a high successful transmission rate under the PPP user distribution. In [73], the successful transmission rate was optimized considering both caching at BSs and D2D caching. In [74], the optimizations of throughput and hit-rate were considered independently. It also showed that optimizing the cache hit-rate does not necessarily optimize the throughput and vice versa. In [71], throughput optimization was considered in a clustering network.
It considered both deterministic and randomized caching policies and showed that a well-designed randomized caching policy can be close to the deterministic policy in terms of throughput. In [128], a caching policy that can maximize the probability of a D2D link satisfying the capacity requirement was proposed. Ref. [67] proposed a caching policy that can maximize the probability that a user satisfies the latency requirement. Similarly, Ref. [129] investigated the caching policy that can optimize the offloading probability while satisfying the latency requirement. In [68], by taking battery life into consideration, a caching policy that can optimize the offloading gain while maintaining low energy cost was studied. The above papers generally consider the homogeneous requesting model, i.e., they assume that users request video files following the same popularity distribution. In contrast to this, there exist papers that consider the heterogeneous requesting model. Ref. [86] designed a caching policy by assuming that users in different groups have different file preferences; the goal then is to maximize the successful file discovery probability. In [87], a content push strategy was designed to maximize the D2D offloading gain for a particular realization of requests by jointly considering the influences of user preference and sharing willingness. In [79], user preferences were used to maximize the offloading probability without accounting for the details of the physical layer. Using individual preferences and user similarity, [84] proposed a caching content allocation approach to maximize a specifically defined utility. While [85] focused on estimating individual preferences using a learning-based algorithm, the study provided a caching policy that exploits the estimated preferences in order to minimize the average delay of D2D networks.
Lastly, [130] proposed a game-theoretical cooperative caching design by assuming that users know exactly what files they want to request. It should be noted that there exist papers, e.g., [78, 131], that use machine learning to learn users’ preferences and accordingly decide which video should be preloaded onto a local device cache based on the preference of that particular user. While this kind of approach (also known as the “Netflix challenge”) is very important for recommendation systems and preloading on individual devices, it is not the main focus for designs in cache-aided D2D networks because the cooperation among users is not considered. Some papers study the caching strategy under special considerations and setups. Ref. [132] analyzed networks with full-duplex capability and proposed caching policy designs. Considering MIMO systems, the throughput was analyzed in [133] and [134]. The effect of user mobility on the cache hit-rate was discussed in [135]. In [136] and [137], mobility was leveraged to enhance network performance. In [138], a randomized caching policy with a special structure for facilitating content exchange through D2D links was proposed and analyzed in systems that consider millimeter-wave communications for D2D links. In [139], a caching policy that maximizes the offloading probability was proposed under the assumption that coordinated multipoint transmission (CoMP) is allowed. Social behaviors of users are relevant to cache-aided wireless D2D networks. Designs with social awareness can thus improve the networks. In [140], a deterministic caching policy that considers social behaviors, user preferences, and heterogeneous cache sizes of users was proposed based on a hierarchically structured framework. The results showed that this framework can maximize the traffic offloaded from the BSs to D2D communications.
Similarly, by taking social behaviors into consideration, [141] proposed a caching policy that maximizes the probability that D2D connections are agreed to by the users, showing the importance of considering social behaviors in caching policy design.

2.1.6 Content Delivery Strategy

As opposed to Ch. 2.1.5, this subsection discusses papers that focus on content delivery in cache-aided wireless D2D networks. The principle of the papers discussed here is to optimize the content delivery in the situation that the caching strategy is either simple or fixed without being optimized. Papers considering joint content caching and delivery will be discussed later in Ch. 2.1.7. In [142], given that users already cache some files in their storage, a link scheduling scheme along with dynamic quality-aware streaming was proposed for minimizing the delivery latency. It showed a significant improvement as compared to conventional D2D scheduling schemes. In [143], a greedy source-destination pairing approach for cache-aided wireless D2D networks, called ITLinQ, was proposed along with the use of link scheduling. It showed that the spectral efficiency can be fundamentally improved. Assuming that the caching policy follows the heuristic Zipf-based structure in [71], link scheduling and power allocation for D2D links were proposed in [69] via exploiting the edge-coloring technique. Considering both cooperative and noncooperative D2D transmissions, [144] proposed a joint link scheduling and power allocation scheme under the assumption that users randomly cache files in the library. Ref. [145] considered improving the content delivery by making dynamic association decisions and conducting quality-aware streaming with long and short decision timeframes, respectively. It showed that the proposed decision-making scheme improves the delivery by effectively controlling the tradeoff between the streaming quality and playback latency.
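As a rough illustration of the link-scheduling ideas discussed above, the following sketch greedily activates D2D links that do not conflict under a protocol-model interference rule. This is a toy stand-in, not the scheme of [69] or the ITLinQ criterion of [143]; the guard distance, link positions, and function names are all invented for the example.

```python
from math import dist

def conflict(link_a, link_b, guard):
    """Protocol-model conflict: two D2D links interfere if any endpoint
    of one lies within the guard distance of an endpoint of the other."""
    return any(dist(p, q) < guard for p in link_a for q in link_b)

def greedy_schedule(links, guard):
    """Greedily activate links that do not conflict with those already
    scheduled, i.e., build a maximal independent set in the conflict
    graph of the links."""
    active = []
    for link in links:
        if all(not conflict(link, other, guard) for other in active):
            active.append(link)
    return active

# Each link is a (tx_position, rx_position) pair in the plane.
links = [((0, 0), (1, 0)), ((0.5, 0.5), (1.5, 0.5)), ((10, 10), (11, 10))]
active = greedy_schedule(links, guard=3.0)
```

Here the first two links are too close to coexist, so only the first and the distant third link are activated together; edge-coloring-based schedulers would instead partition all links into such non-conflicting sets across time slots.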
To allow distributed implementation of content delivery, [146] proposed a dynamic link scheduling and power allocation scheme using the belief propagation (BP) methodology, and showed that it performs almost the same as the centralized scheme in the literature in terms of delay and power efficiency. In addition to the conventional cache-aided wireless D2D network structure, the content delivery problem of cache-aided D2D has also been considered with more complicated network structures. Ref. [147] considered a three-tier heterogeneous network, in which caching is possible at small cell BSs and users, and the simple most-popular-content caching is adopted. It then analyzed the cache hit-rate of the network under different transmission modes (i.e., cellular and D2D modes), and proposed a mode selection and link scheduling algorithm that can optimize the network throughput. In [148], a distributed content delivery scheme was proposed via the BP algorithm under the consideration of BSs, single-hop D2D delivery, and multi-hop D2D delivery. The results showed that the energy consumption can be improved. Considering mmWave communications along with multi-hop D2D delivery, [149] proposed a mobility-aware scheduling approach that can help improve fog computing in cache-aided wireless D2D networks.

2.1.7 Joint Content Caching and Delivery Strategy

Although the joint optimization of the content caching and delivery strategy is very challenging, many papers have been devoted to improving cache-aided wireless D2D networks along this line. In [65], to minimize the file delivery cost of the network while satisfying the user demand within a timeframe, a joint caching and transmission strategy was proposed under the assumption that the requests of users are known at the beginning of the timeframe. The policy in [65] treated different timeframes independently, which ignores the possible correlations between timeframes.
To improve this, [150] assumed that the requests for a file follow a Poisson point process with a given lifetime, and then formulated the joint caching and scheduling problem as a Markov decision problem (MDP) which aims to minimize the cost of the transmissions. In contrast to [65] and [150], which designed joint optimization with some adaptations over time, [151] considered a static randomized caching policy, and jointly optimized the caching policy and link scheduling such that the successful offloading probability is maximized. It should be noted that the difference between the static policy in [151] and the adaptive placement in [65] and [150] is that the adaptive placement might need to place files into the caches of users multiple times during the file delivery phase, while the static policy only needs a one-time placement of files at the beginning (e.g., midnight). The scheduling in [151] was implemented by restricting the number of users to be activated at the same time. However, it did not explicitly provide an interference avoidance scheme. In contrast, [152] assumed a clustering network, and optimized the caching policy, link scheduling, and power control jointly. It showed that the throughput of the network can be improved by the proposed approach. Since a user can choose which source to associate with when its desired file is cached by more than one user, an association problem arises naturally. In [153], with quality-awareness, the caching and node association problems were considered for improving the time-averaged video quality of users. The results showed that the average quality of video delivery can be improved. Furthermore, it indicated that a tradeoff exists between video quality and video diversity. By considering social behaviors, [154] proposed a joint content caching and delivery policy based on the social interactions of users.
It showed that the social behaviors of users can influence the network performance.

2.1.8 Dynamic Content Caching and Replacement

In Ch. 2.1.5, 2.1.6, and 2.1.7, the caching strategy was generally assumed to be static, i.e., the same caching policy is used throughout the whole network and over the whole time horizon without considering specific dynamics, except that several special joint content caching and delivery designs have adaptations due to the consideration of short-term video streaming dynamics. In a practical network, environmental dynamics usually exist, and parameters and statistics can change over time and space. These dynamics generally come from (including but not limited to) the following phenomena: (i) the popularity distribution can change with time (e.g., the emergence of a new viral video) and space (e.g., recordings of different sports teams are popular in different cities); (ii) the caching realizations of the network can be inappropriate (e.g., users do not cache files according to the designated policy); and (iii) user mobility can change the locally available user density and cached files. Such network dynamics can degrade the performance of a network that uses a statically designed caching policy, whereas adaptations (cache replacement) can automatically compensate. Accordingly, adopting a dynamic design can be beneficial. A dynamic content caching policy is generally designed such that cached files are replaced from time to time, and is thus known as a cache replacement/refreshment design or online caching algorithm in the literature. Since caching technologies have been used in different contexts, replacement designs have also been widely discussed in different contexts, e.g., web-caching [155], cache-aided peer-to-peer systems, information-centric networks, etc. However, as we have clarified in chapter 1, most of the replacement designs for wired networks might not be appropriate for cache-aided wireless D2D networks.
Simple on-device cache replacement without cooperation has also been studied in the literature [27, 60, 156–160], e.g., LRU [158], LFU [156], GreedyDual [157], and CAMP [60]. However, these might not fully leverage the benefits of D2D communications, as the possible cooperation among users is ignored. Cooperative caching and replacement that takes the cooperation of users into consideration was studied in cache-aided ad-hoc networks and peer-to-peer systems. Ref. [161] considered a peer-to-peer system, and adaptively refreshed the replications of servers in the system to balance the loading on servers. The main idea in [161] is to prompt the creation of new replications and distribute them when the loading on a server exceeds a threshold. In [162], an optimal caching policy that minimizes the average cost is first studied by assuming perfect knowledge of the ad-hoc network. Then, it proposed a localized replacement algorithm by extending the idea of maintaining the priority values locally in the GreedyDual approach [157] to the network level. The results showed that this algorithm can effectively approximate the optimal design. In [24], a cache replacement design was proposed using a value that is the product of the file size and access interest. Its idea is to first replace the file that has a large size but attracts less interest. Considering both cache admission and cache replacement control, a group-based peer-to-peer cooperative caching framework (GroCoca) was proposed in [26] to improve the data accessibility of mobile peer-to-peer networks. The essence of GroCoca is to use common mobility patterns and similarity to form groups for cooperative caching, and then use the longest time-to-live (TTL) to make caching and replacement decisions within each group. Ref. [29] studied the cache replacement in wireless home networks.
It implemented the cache replacement policy by first synchronizing the caches in the network to form one large cache, and then treating this large cache as a single unit to which a simple on-device replacement design, e.g., GreedyDual [157], is applied. Using a similar idea, Domical cooperative caching was proposed for multimedia streaming in [31] and [30]. The specific objective of Domical is to improve the average startup latency in wireless home networks by adapting the levels of cooperativeness and granularity.

In the context of cache-aided cellular networks, several papers have investigated dynamic cache replacement for caching at the BSs [163–169]. In [164], the authors proposed a distributed caching replacement approach via Q-learning. In [165], the caches of BSs were refreshed periodically to stabilize the queues of two request types and to satisfy the quality-of-service requirements. In [166], the authors aimed to offload traffic to infostations by using a multi-armed bandit optimization to refresh the caches of BSs. Ref. [167] proposed an algorithm exploiting multi-armed bandit optimization to learn the popularity of the cached content and update the cache to increase the cache-hit rate. In [168], a reinforcement learning framework that incorporates popularity dynamics into the analysis was proposed to refresh the caches of BSs and minimize the delivery cost. In [169], the loss due to an outdated caching policy was analyzed for a small-cell BS, and an updating algorithm minimizing the offloading loss was proposed. Based on real-data observations, [163] established a workload model and then developed simple caching content replacement policies for edge-caching networks. However, these replacement policies for caching at BSs do not carry over to D2D caching networks.
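The bandit-based refresh idea in [166, 167] can be sketched generically: treat each candidate file as an arm whose reward is a cache hit, and learn online which files are worth caching. The following UCB1 sketch is a simplified illustration under assumed Bernoulli hit rewards, not the exact algorithms of those papers:

```python
import math

def ucb1_cache_refresh(n_files, cache_size, get_reward, rounds=2000):
    """Generic UCB1: each file is an arm; get_reward(i) returns 1 if
    caching file i would have produced a hit in this round, else 0.
    Returns the top-`cache_size` files by estimated hit rate."""
    counts = [0] * n_files    # times each file has been tried
    values = [0.0] * n_files  # empirical mean reward per file
    for t in range(1, rounds + 1):
        def ucb(i):
            if counts[i] == 0:
                return float('inf')  # try every file at least once
            # empirical mean plus exploration bonus
            return values[i] + math.sqrt(2 * math.log(t) / counts[i])
        arm = max(range(n_files), key=ucb)
        reward = get_reward(arm)
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]  # running mean
    ranked = sorted(range(n_files), key=lambda i: -values[i])
    return ranked[:cache_size]
```

Over time the exploration bonus shrinks for frequently tried files, so the cache converges toward the files with the highest observed hit rates while still occasionally probing the others.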
To the best of my knowledge, there is a lack of caching content replacement designs tailored specifically to cache-aided wireless D2D networks, except that in [170], the problem of how users in a cache-aided wireless D2D network can “reactively” update their caching content was investigated. However, to better leverage the cache space, a “proactive” update of the caches, i.e., proactively pushing files to user caches even if users do not request them, might provide better performance. Designs following this principle will be discussed later in Ch. 8, as part of the contributions of this dissertation.

2.1.9 Coded Caching in D2D Networks

Combining network coding and caching, coded caching was first proposed in [40]. The idea of coded caching is to first let receivers locally cache parts of the files. The transmitters then deliver coded files such that the requests of multiple receivers can be satisfied simultaneously. Since files are partly cached on local devices, the receivers can decode the coded files transmitted to them at the same time, thus reducing the number of transmissions needed to satisfy the requests of local devices. Since the seminal work in [40], many papers have exploited the idea of coded caching in different scenarios, e.g., decentralized coded caching [171], coded caching with nonuniform demands [172], and online coded caching [173]. Coded caching has also been combined with D2D communications. Ref. [41] was the first paper to combine coded caching with D2D communications. In [41], aiming to gain benefits from both coded caching and the frequency reuse scheme, coded caching was adopted along with D2D clustering. The result showed that the joint benefits might not be significant in terms of network throughput.
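The gain from a coded transmission in [40] can be illustrated with the classic two-user, two-file example: each user caches one subfile of each file, and a single XOR transmission then serves both users' requests at once (a toy sketch of the idea, not the general scheme):

```python
def xor(a, b):
    """Bytewise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

# Two files, each split into two equal subfiles.
A1, A2 = b'AAAA', b'aaaa'   # file A
B1, B2 = b'BBBB', b'bbbb'   # file B

# Placement: user 1 caches (A1, B1); user 2 caches (A2, B2).
# Delivery: user 1 requests A (missing A2); user 2 requests B (missing B1).
coded = xor(A2, B1)          # one coded transmission serves both users

# Each user decodes its missing subfile using its cached side information.
user1_A2 = xor(coded, B1)    # user 1 XORs out its cached B1
user2_B1 = xor(coded, A2)    # user 2 XORs out its cached A2
assert user1_A2 == A2 and user2_B1 == B1
```

Without coding, satisfying both requests would take two uncoded subfile transmissions; the coded transmission cuts this to one by exploiting the cached side information at both receivers.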
The main reason is that coded caching requires more users in a cluster to increase the cooperation gain, whereas the frequency reuse scheme requires fewer users in a cluster to increase the reuse gain; the coded caching gain and the frequency reuse gain thus conflict with each other. Although [46] showed a slightly discouraging result, coded D2D caching has been shown to be effective in different contexts. In [174], coded caching was found to work well along with clustering for minimizing the transmission cost of the network. Considering users having different cache sizes, [116] proposed a joint design for coded caching and D2D delivery that can minimize the D2D delivery load. Ref. [175] considered joint computation and communication offloading using cache-aided wireless D2D, and aimed to minimize energy consumption; coded caching with D2D communications was used as a part of the performance-improving approach discussed in the paper. In [117], the caching phase of the coded D2D caching in [46] was simplified to improve feasibility in practice. It showed that the simplified coded D2D caching can effectively improve the average D2D delivery load and the worst-case delivery load.

2.1.10 Relevant Cross-Layer Topics

There are some practical concerns that can have a huge impact on the performance of cache-aided wireless D2D networks. Among them, the most important are: (i) the incentive for sharing files among users; (ii) security and privacy; and (iii) the accuracy of popularity distribution estimation/prediction. Here, we discuss papers that design cache-aided wireless D2D networks taking these concerns into consideration. Whether users are willing to share their contents is of paramount importance for D2D networks. Thus, a proper incentive mechanism for cache-aided wireless D2D networks is critical. Ref. [176] aimed to incentivize users to participate in file-sharing by paying them.
It used a Stackelberg game to analyze the relationship between the utilities of the operator and the users, and obtained a suitable price for encouraging file-sharing. The results showed that the obtained price can effectively improve the utilities of both the operator and the users. Similarly, [177] also considered rewarding users who share their files via D2D communications, with more practical details. It showed that a Stackelberg game can again be used for obtaining an effective price, and thus the network performance can be improved. Ref. [178], taking special consideration of mobility, designed an incentive mechanism by analyzing a Stackelberg game; the cost of the operator can thus be minimized. Ref. [179] considered the joint optimization of the caching policy and the incentive mechanism, where the BS rewards the users for file-sharing. The results showed that the proposed design can improve the users' willingness to share. In [180], the power allocation, link scheduling, and rewards for users are jointly designed for the network. Using both theoretical and simulation results, it showed that the proposed design can improve the network in terms of social welfare, network capacity, and utility of the BS.

Although security and privacy issues have been widely studied in simple wireless D2D networks [181], as indicated in [43] and [36], they are yet to be intensively studied in cache-aided wireless D2D networks. In [114], considering the existence of an eavesdropper, the fundamental limit of coded D2D caching was studied. The results showed that when the numbers of files and users are large, the performance loss due to imposing secure delivery constraints is not significant. In [182], the fundamental limits of coded D2D caching were studied under the requirement that both the requests and the confidentiality constraints are satisfied. The results showed that the gap between the proposed design in [182] and the upper bound can vanish as the cache size increases. Ref.
[183] jointly considered mobile computing, caching, and D2D communications, and aimed to achieve both security and efficiency for the network. By exploiting the social trust among users to make resource allocation decisions for computing and caching, it showed that the utility of the network is improved.

Since most static caching policies are designed based on a given popularity distribution, and proactive replacement also needs information about future popularity distributions, the accuracy of the estimation and prediction of popularity distributions is influential. Accordingly, papers relating popularity estimation/prediction to caching have been published in the past years. Their main idea is to exploit the relation between popularity prediction and caching so that the network performance can be improved. The survey paper [96] provides a comprehensive discussion of this sub-area of wireless caching networks.

2.2 Contributions of This Dissertation

This dissertation consists of six chapters/works (from Ch. 3 to Ch. 8) which present the main results of our investigations. In this subsection, an overview of the results is first provided based on the taxonomy in Ch. 2.1. Then, the abstract and main contributions are provided for each work.

Ch. 3 and Ch. 4 provide the information-theoretic analysis for cache-aided wireless D2D networks based on the measured popularity distribution. The scaling laws of the throughput-outage performance of the networks are derived considering single-hop and multi-hop D2D communications in Ch. 3 and Ch. 4, respectively. The main goal of these two chapters is to obtain the scaling laws based on the popularity distributions measured from mobile users in wireless networks, as opposed to the conventional popularity distributions that are based on datasets from wired networks. Ch.
5 discusses the caching policy and cooperation distance design that takes many practical aspects of the network into consideration. The proposed designs in Ch. 5 consider maximizing either the throughput or the EE, or their tradeoff. Accordingly, the difference between the maximum-throughput design and the maximum-EE design is discussed along with their tradeoff. The main goal of Ch. 5 is to contribute to the throughput-EE investigation, which is lacking in the literature.

The information of individual preferences can be critical for caching policy design considering a heterogeneous preference model. While individual preference aware design has been discussed in a few existing papers, there is a lack of statistical modeling and generation of the individual preferences of users. Ch. 6 aims to fill this gap by proposing a statistical model for individual preferences based on an extensive real-world dataset from the BBC iPlayer. The parameterizations and a generation recipe for individual preferences are also provided. The generation recipe can help researchers generate individual preferences that are close to practice. Accordingly, researchers can validate their designs using numerical simulations with a fairly realistic setup without real-world data at hand.

Ch. 7 investigates the individual preference aware caching policy design. The first goal of this work is to understand whether and how integrating individual preferences into the caching policy design can significantly improve the network performance. Since the interplay among different performance metrics is unclear, the second goal of Ch. 7 is to address designs for different performance metrics, e.g., throughput, EE, and cache hit-rate, and their trade-offs. Finally, we observe that existing papers do not provide sufficient evaluations based on real-world data. Therefore, the final goal of Ch. 7 is to address this issue by using the results in Ch. 6.
The above investigations all assume static policies, under which network dynamics might not be handled effectively. To deal with this issue, in Ch. 8, we investigate a dynamic cache replacement framework and corresponding replacement designs for cache-aided wireless D2D networks with BSs serving as the central control unit. In contrast to reactive cache replacement, which focuses on deciding which file to evict from the cache due to the need to cache a new file, Ch. 8 discusses the “proactive” cache replacement of the network, where files are proactively pushed into the caches of devices before the devices request them. In the following, the abstracts of the chapters are provided along with their contributions.

2.2.1 Throughput–Outage Analysis and Evaluation of Cache-Aided D2D Networks with Measured Popularity Distributions

2.2.1.1 Abstract

Caching of video files on user devices, combined with file exchange through device-to-device (D2D) communications, is a promising method for increasing the throughput of wireless networks. Previous theoretical investigations showed that throughput can be increased by orders of magnitude, but assumed a Zipf distribution for modeling the popularity distribution, which was based on observations in wired networks. Thus, the question whether cache-aided D2D video distribution can provide in practice the benefits promised by the existing theoretical literature remains open. To answer this question, Ch. 3 provides new results specifically for popularity distributions of video requests of mobile users. Based on an extensive real-world dataset, we adopt a generalized distribution, known as the MZipf distribution. We first show that this popularity distribution can fit the practical data well.
Using this distribution, we analyze the throughput–outage tradeoff of the cache-aided D2D network and show that the scaling law is identical to the case of the Zipf popularity distribution when the MZipf distribution is sufficiently skewed, implying that the benefits previously promised in the literature could indeed be realized in practice. To support the theory, practical evaluations using numerical experiments are provided, and show that cache-aided D2D can outperform conventional unicasting from base stations.

2.2.1.2 Contributions

The main contributions of the results in Ch. 3 are summarized below:

Based on an extensive BBC iPlayer dataset, we extract the popularity distribution for the videos watched by mobile users. This distribution is then modeled and parameterized by the MZipf distribution, which is a generalized version of the widely used Zipf distribution. To our best knowledge, this is the only work that reports the measured popularity distributions for mobile users and provides modeling results.

To investigate the throughput–outage tradeoff of the cache-aided D2D networks considering an MZipf distribution, we generalize the theoretical treatment of [46] with a different but simpler proof technique. This generalization is non-trivial, and several new techniques are used such that the influence of q can be explicitly expressed.

We show that the scaling law of cache-aided D2D achieved in [46] is achievable in the case of the practical MZipf distribution; we also characterize the influences of the critical parameters of the MZipf distribution, i.e., the Zipf factor and the plateau factor q, on the throughput–outage tradeoff. The question of whether the gains theoretically predicted in [46] hold with realistic (i.e., measured) popularity distributions has often been raised. Ch. 3 answers that important question.

To support the theoretical study, we conduct numerical experiments with practical details and show that cache-aided D2D can significantly outperform conventional unicasting.
To verify the benefits of considering the MZipf model from the network perspective, we also show that simulations using the MZipf model can provide more accurate performance evaluations compared with the conventional Zipf model.

2.2.2 Optimal Throughput–Outage Analysis of Cache-Aided Wireless Multi-Hop D2D Networks

2.2.2.1 Abstract

Cache-aided wireless D2D networks have demonstrated promising performance improvement for video distribution compared to conventional distribution methods. Understanding the fundamental scaling behavior of such networks is thus of paramount importance. However, existing scaling laws for multi-hop networks have not been found to be optimal even for the case of Zipf popularity distributions (the gaps between upper and lower bounds are not constants); furthermore, there are no scaling law results for such networks for the more practical case of an MZipf popularity distribution. In this work, we thus investigate the throughput-outage performance of cache-aided wireless D2D networks adopting multi-hop communications, with users distributed according to a Poisson point process (PPP) and an MZipf popularity distribution of the files. We propose an achievable content caching and delivery scheme and analyze its performance. By showing that the achievable performance is tight with respect to the proposed outer bound, the optimal scaling law is obtained. Furthermore, since the Zipf distribution is a special case of the MZipf distribution, the optimal scaling law for networks with a Zipf popularity distribution is also obtained, which closes the gap in the literature.

2.2.2.2 Contributions

The main contributions of Ch. 4 are summarized below:

We propose the achievable scaling laws and the corresponding outer bounds for the throughput-outage performance of cache-aided wireless multi-hop D2D networks where the PPP user distribution and the MZipf popularity distribution assumptions are adopted.
This is the first work to provide a scaling law analysis for cache-aided wireless multi-hop D2D networks under these practical assumptions.

We show that the multiplicative gap between the achievable scheme and the outer bound can be upper bounded by a constant. Accordingly, there is no orderwise gap between the achievable performance and the outer bound. As a result, our achievable throughput-outage scaling laws are optimal. Furthermore, as the Zipf distribution is a special case of the MZipf distribution, our results close the gap between the achievable performance and outer bound observed in the literature that adopts the Zipf popularity distribution.

2.2.3 Caching Policy and Cooperation Distance Design for Base Station Assisted Wireless D2D Caching Networks: Throughput and Energy Efficiency Optimization and Trade-Off

2.2.3.1 Abstract

The work in Ch. 5 investigates the optimal caching policy and cooperation distance design from both throughput and EE perspectives in base station (BS) assisted wireless D2D caching networks. By jointly considering the effects of BS transmission, D2D-caching, and self-caching, and the impact of the cooperation distance, a clustering approach is proposed with specifically designed power control and resource reuse policies. The throughput and EE of two network structures are comprehensively analyzed, and designs aiming to optimize the throughput and EE, respectively, are proposed. We also characterize the trade-off between the throughput and EE and provide corresponding designs. Simulations considering practical parameters are conducted to verify the analyses and evaluate the proposed designs; they demonstrate superior performance compared to the state of the art.

2.2.3.2 Contributions

The main contributions of Ch.
5 are summarized below:

By jointly considering the effects of BS, D2D-caching, and self-caching, and the impact of the cooperation distance, we analyze the network throughput and EE and propose mathematically tractable approximate formulations for the clustering network considering both active and inactive users with specifically designed power control and resource reuse policies.

By exploiting the throughput and EE formulations, we propose the corresponding caching policy and cooperation design problems. We also show that the proposed optimization problems can be effectively solved by converting them to standard concave and quasi-concave programs along with a simple one-dimensional search.

We characterize the trade-off between throughput and EE and formulate their trade-off design problem. To the best of our knowledge, this work is the first to address the trade-off between throughput and EE and provide the corresponding trade-off design.

By considering practical network parameters and configurations in simulations, we validate the proposed analyses and evaluate the proposed designs.

2.2.4 Individual Preference Probability Modeling and Parameterization for Video Content in Wireless Caching Networks

2.2.4.1 Abstract

Caching of video files at the wireless edge, i.e., at the base stations or on user devices, is a key method for improving wireless video delivery. While global popularity distributions of video content have been investigated in the past, and used in a variety of caching algorithms, Ch. 6 investigates the statistical modeling of individual user preferences. With individual preferences being represented by probabilities, we identify their critical features and parameters and propose a novel modeling framework by using a genre-based hierarchical structure, as well as a parameterization of the framework based on an extensive real-world data set.
In addition, a correlation analysis between the parameters and critical statistics of the framework is conducted. With the framework, an implementation recipe for generating practical individual preference probabilities is proposed. By comparing with the underlying real data, we show that the proposed models and generation approach can effectively characterize the individual preferences of users for video content.

2.2.4.2 Contributions

The main contributions of Ch. 6 are summarized below:

We propose a genre-based hierarchical modeling framework to enable statistical descriptions of the individual preference probabilities. Specifically, we first identify that, instead of using an element-wise description, the individual preference probabilities of a user can be jointly described by the individual popularity distribution and the individual ranking order. The genre-based hierarchical modeling approach is then applied to both of them.

Since each user owns a parameter set for describing their preference probabilities, the number of parameters for the whole user set is so large that it can only be handled numerically, and it gradually becomes impossible to handle as the number of users increases. To resolve this issue, a statistical parameterization is conducted for every parameter in the framework, drastically condensing the description.

Correlation analyses between the parameters in the framework are conducted. The results reveal the critical correlation features of individual preferences and enhance the parameterization.

By exploiting the framework, modeling, parameterization, and correlation analysis in this work, a complete implementation recipe for generating individual preference probabilities is proposed. (A complete code for the generation of individual preference probabilities of users according to the data is provided in [184].)
2.2.5 Individual Preference Aware Caching Policy Design in Wireless D2D Networks

2.2.5.1 Abstract

Cache-aided wireless D2D networks allow a significant throughput increase, depending on the concentration of the popularity distribution of files. Many studies assume that all users have the same preference distribution; however, this may not be true in practice. In Ch. 7, we investigate whether and how the information about individual preferences can benefit cache-aided D2D networks. We examine a clustered network and derive a network utility that takes both the user distribution and channel fading effects into the analysis. We also formulate a utility maximization problem for designing caching policies. This maximization problem can be applied to optimize several important quantities, including throughput, EE, cost, and hit-rate, and to solve different trade-off problems. We provide a general approach that can solve the proposed problem under the assumption that users coordinate, and then prove that the proposed approach can obtain a stationary point under a mild assumption. Using simulations of practical setups, we show that performance can improve significantly by properly exploiting individual preferences. We also show that different types of trade-offs exist between different performance metrics, and that they can be managed through caching policy and cooperation distance designs.

2.2.5.2 Contributions

The main contributions of Ch. 7 are summarized below:

We formulate a utility maximization problem by considering individual preferences in the analysis. Caching policies that optimize several practically important metrics, e.g., throughput, EE, and hit-rate, and their trade-offs, can be obtained by solving the problem.

We then propose a general, low-complexity approach for solving the utility maximization problem, and prove that the solution approach improves at each iteration and converges to a stationary point.
Considering a realistic setup based on extensive real-world data, we conduct comprehensive simulations to show the benefits of exploiting individual preferences and to demonstrate the trade-offs between the different metrics.

2.2.6 Dynamic Caching Content Replacement in Base Station Assisted Wireless D2D Caching Networks

2.2.6.1 Abstract

The concentrated popularity distribution of video files and the caching of popular files on users and their subsequent distribution via D2D communications have dramatically increased the throughput of wireless video networks. However, since the popularity distribution is not time-invariant, and the files available in the neighborhood can change when other users move into and out of the neighborhood, there is a need for replacement of cache content. In Ch. 8, we propose a practical and feasible replacement architecture for BS assisted wireless D2D caching networks by exploiting the broadcasting of the BS. Based on the proposed architecture, we formulate a caching content replacement problem, with the goal of maximizing the time-average service rate under a cost constraint and queue stability. We combine the reward-to-go concept and the drift-plus-penalty methodology to develop a solution framework for the problem at hand. To realize the solution framework, two algorithms are proposed. The first algorithm is simple, but exploits only the historical record. The second algorithm can exploit both the historical record and future information, but is complex. Our simulation results indicate that when dynamics exist, systems exploiting the proposed designs can outperform systems using a static policy.

2.2.6.2 Contributions

The main contributions of Ch. 8 are summarized below:

We discuss the replacement problem in wireless D2D caching networks and propose a network architecture for the replacement procedure.
To the best of our knowledge, this is the first work to focus on dynamic replacement in wireless D2D caching networks.

We formulate the replacement problem in the form of a sequential decision-making problem with a time-average cost constraint and queue stability. We propose a solution framework that incorporates the reward-to-go concept into the drift-plus-penalty methodology, and then discuss the insights and benefits gained from adopting this framework.

To put the proposed framework into practice, we develop and propose two replacement algorithms that can satisfy the time-average constraints and stabilize the queueing system. The first algorithm is fairly simple to implement, but uses only the current system state and historical records for the content replacement. The second algorithm, on the other hand, can effectively leverage both historical records and future predictions to make decisions.

Our simulations, which adopt practical network configurations for cache replacement, validate the effectiveness of our proposed designs. The results show that dynamic cache replacement can significantly improve network performance.

2.3 Organization

The remainder of this dissertation is organized as follows. Ch. 3 presents in detail the throughput-outage study for cache-aided wireless D2D networks with single-hop D2D communications. Following Ch. 3, Ch. 4 presents the study of the throughput-outage performance for cache-aided wireless D2D networks with multi-hop D2D communications. The work that investigates the caching policy and cooperation design for throughput and EE optimizations is presented in Ch. 5. Ch. 6 presents the work that studies the statistical modeling and generation of individual user preferences. Ch. 7 presents the work that investigates the individual preference aware caching policies. Ch. 8 contains the final work, which provides the dynamic caching content replacement architecture and design.
Some concluding remarks and prospective directions are provided in Ch. 9. Several derivations, proofs, and supplemental materials are provided in the appendices. Materials related to Ch. 3 are provided in Appendix A. Similarly, materials related to Ch. 4 can be found in Appendix B; materials related to Ch. 5 in Appendix C; materials related to Ch. 6 in Appendix D; materials related to Ch. 7 in Appendix E; and materials related to Ch. 8 in Appendix F. References are provided at the end of this dissertation.

CHAPTER 3

Throughput–Outage Analysis and Evaluation of Cache-Aided D2D Networks with Measured Popularity Distributions

3.1 Introduction

Cache-aided D2D networks have been widely discussed. Many papers have demonstrated that different aspects of networks, e.g., outage probability [42, 46, 185, 186], throughput [41, 42, 46, 70, 74, 99], energy efficiency (EE) [68, 99, 187], and delay [67, 188], can be improved by cache-aided wireless D2D. However, most existing papers assume that the popularity distribution follows the Zipf distribution (essentially a power-law distribution). This assumption was based on observations of YouTube videos in wired networks [189], with little empirical support for wireless networks. A recent investigation [190] into wireless popularity distributions of general content showed little content reuse. However, as the authors of [190] pointed out, since video connections were run via a secure https connection so that the content of the videos could not be determined, the investigation in [190] could not uniquely identify video content reuse. Therefore, the popularity distribution of video content reuse in wireless networks is still not clear. Accordingly, the question remains open whether cache-aided D2D video distribution can achieve in practice the significant gains promised in the literature. This chapter aims to answer this question.
3.1.1 Contributions

In this chapter, we consider the measured video popularity distribution of the BBC (British Broadcasting Corporation) iPlayer, the most popular video distribution service in the UK. Through appropriate postprocessing, we are able to extract the popularity distribution for the videos watched via cellular connections (these might be different from the files watched through wired connections). We find that this distribution is not well described by a Zipf distribution, but rather by a Mandelbrot-Zipf (MZipf) distribution [76], which is somewhat less skewed. Such a distribution, in contrast to the simple Zipf distribution, is characterized by two parameters: the Zipf factor and the plateau factor q; it degenerates to the Zipf distribution when q = 0. Thus, the MZipf distribution generalizes the Zipf distribution. Considering this more general model, we investigate the benefits of cache-aided D2D video distribution.

To understand the performance of cache-aided D2D video distribution, we conduct a thorough throughput–outage tradeoff analysis following the framework in [46], but using a different analytical approach, and aim to determine the scaling law of the throughput-outage tradeoff when the more general MZipf distribution is considered. We derive the analytical formulation of the caching policy maximizing the probability that users can access the desired files via D2D communications. Based on this policy, we obtain the achievable throughput–outage tradeoff. Since the MZipf distribution has the additional factor q, the derived caching policy and achievable throughput–outage tradeoff can characterize the influence of q. This distinguishes our results from [46]. However, this does not imply that the resulting scaling behavior is worse than in the case of the Zipf distribution.
On the contrary, the results indicate that in a particular range of q, the same scaling law as for the Zipf distribution is obtained when the MZipf distribution is considered, implying that the benefits promised by the existing literature should be retained in practice. We emphasize that, after investigating the real-world data, we find that this range of q is valid in practice. To support the theoretical analysis and verify the benefits of considering the MZipf model from the perspective of the network, numerical experiments are conducted in D2D networks considering MZipf distributions parameterized based on the real-world data and the realistic setup adopted from [42]. Results show that the cache-aided D2D scheme can provide orders-of-magnitude improvement of throughput at a negligible outage probability compared to conventional unicasting, and that the MZipf model can provide more accurate performance evaluations when compared with the Zipf model. Our main contributions are summarized below:

Based on an extensive BBC iPlayer dataset, we extract the popularity distribution for the videos watched by mobile users. This distribution is then modeled and parameterized by the MZipf distribution, which is a generalized version of the widely used Zipf distribution. To the best of our knowledge, this is the only work that reports the measured popularity distributions for mobile users and provides modeling results.

To investigate the throughput–outage tradeoff of cache-aided D2D networks considering an MZipf distribution, we generalize the theoretical treatment of [46] with a different but simpler proof technique. This generalization is non-trivial, and several new techniques are used so that the influence of q can be explicitly expressed.
We show that the scaling law of cache-aided D2D achieved in [46] is achievable in the case of the practical MZipf distribution; we also characterize the influences of the critical parameters γ and q of the MZipf distribution on the throughput–outage tradeoff. The question of whether the gains theoretically predicted in [46] hold with realistic (i.e., measured) popularity distributions has often been raised. This chapter answers that important question.

To support the theoretical study, we conduct numerical experiments with practical details and show that cache-aided D2D can significantly outperform conventional unicasting. To verify the benefits of considering the MZipf model from the network perspective, we also show that simulations using the MZipf model can provide more accurate performance evaluations compared with the conventional Zipf model.

The remainder of this chapter is organized as follows. In Ch. 3.2, the dataset of video requests for mobile users is described and the corresponding modeling and parameterization are presented. In Ch. 3.3, the theoretical analysis of the throughput–outage tradeoff is provided and insights are discussed. We offer numerical experiments in Ch. 3.4 to support the theory. Finally, we conclude in Ch. 3.5. Proofs of theorems and corollaries are relegated to Appendix A of this dissertation.

Scaling-law order notation: given two functions f and g, we say that: (1) f(n) = O(g(n)) if there exist a constant c and an integer N such that f(n) ≤ c g(n) for n > N. (2) f(n) = o(g(n)) if lim_{n→∞} f(n)/g(n) = 0. (3) f(n) = Ω(g(n)) if g(n) = O(f(n)). (4) f(n) = ω(g(n)) if g(n) = o(f(n)). (5) f(n) = Θ(g(n)) if f(n) = O(g(n)) and g(n) = O(f(n)).

3.2 Measured Data and Popularity Distribution Modeling

This work uses an extensive set of real-world data, namely the dataset of the BBC iPlayer [78, 81, 191], to obtain realistic video demand distributions.
The BBC iPlayer is a video streaming service from the BBC that provides video content for a number of BBC channels without charge. Content on the BBC iPlayer is available for up to 30 days, depending on the policies. We consider two datasets covering June and July 2014, which include 192,120,311 and 190,500,463 recorded access sessions, respectively. Each record of access information for the video content contains two important columns: user id and content id. The user id is based on long-term cookies that uniquely (in an anonymized way) identify users. The content id is the specific identity that uniquely identifies each video content separately. Although there are certain exceptions, user id and content id can generally help identify the user and the video content of each access. More detailed descriptions of the BBC iPlayer dataset can be found in [78, 81, 191].

To facilitate the investigation, preprocessing is conducted on the dataset. We notice that a user could access the same file multiple times, possibly due to temporary disconnections from the Internet and/or due to temporary pauses by users while moving. Since a user is unlikely to access the same video again after finishing watching it within the period of a month [81], and since a fetched file can be temporarily cached in the user device for later viewing, we consider multiple accesses made by the same user to the same file as a single unique access. We then separate the data requested by cellular users from those requested via cabled connections or personal WiFi by observing the services of the Internet service providers (ISPs), resulting in 640,631 different unique accesses (requests) among 267,424 different users in June, and 689,461 different unique accesses among 327,721 different users in July. We also separate the data between different regions by observing the Internet gateway through which the requests are routed. Specifically, we consider one of the four major cellular operators in the UK.
We can geographically localize two of its gateways to two of the major metropolitan areas in the UK, and we believe they are mainly intended to serve users from these metropolitan areas. Therefore, we use these two metropolitan-region gateways to validate our results. Based on these data, we plot the global popularity distribution and find that the Zipf distribution is not a good fit. Instead, an MZipf distribution [76] provides a good approximation, as demonstrated in Fig. 3.1 (since data for other months and regions show similar results, we omit their depictions for brevity):

P_r(f) = (f + q)^{−γ} / Σ_{j=1}^{M} (j + q)^{−γ}, f = 1, 2, ..., M, (3.1)

where P_r(f) is the probability that users want to access file f, i.e., the request probability of users for file f, M is the number of files in the library, γ is the Zipf factor, and q is the plateau factor. We note that the MZipf distribution degenerates to a Zipf distribution when q = 0. We also note that a possible reason, as described in [192], for observing the MZipf distribution instead of the Zipf distribution is that a user only fetches the same file once.

A fitting that minimizes the Kullback–Leibler (KL) divergence between the data and the model provides the values of the parameters γ, q, and M shown in Table 3.1.¹

Table 3.1: Parameterization of Popularity Distribution using the MZipf Model

Region   | γ (June) | q (June) | M (June) | γ (July) | q (July) | M (July)
Whole UK | 1.36     | 50       | 16823    | 1.28     | 34       | 19379
Metro 1  | 1.23     | 33       | 6449     | 1.16     | 22       | 7345
Metro 2  | 1.18     | 28       | 4859     | 1.11     | 18       | 5405

The results imply that up to a breakpoint, i.e., q, of approximately 20–50 files, the popularity distribution is relatively flat, and it decays faster from there. Also importantly, we find that γ > 1 for all results, which has important implications for the throughput–outage scaling law due to caching.
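As an illustration of this fitting procedure, the sketch below (hypothetical helper names; a simple grid search standing in for whatever optimizer was actually used) evaluates the MZipf model of Eq. (3.1) and selects the (γ, q) pair minimizing the KL divergence between data and model:

```python
import math

def mzipf_pmf(M, gamma, q):
    # Eq. (3.1): P_r(f) proportional to (f + q)^(-gamma) for f = 1..M
    w = [(f + q) ** (-gamma) for f in range(1, M + 1)]
    total = sum(w)
    return [x / total for x in w]

def kl_divergence(p_data, p_model):
    # D_KL(data || model); zero-probability data entries contribute 0
    return sum(p * math.log(p / m) for p, m in zip(p_data, p_model) if p > 0)

def fit_mzipf(p_data, gamma_grid, q_grid):
    # Brute-force search for the KL-minimizing (gamma, q) pair
    M = len(p_data)
    return min(
        ((g, q) for g in gamma_grid for q in q_grid),
        key=lambda gq: kl_divergence(p_data, mzipf_pmf(M, *gq)),
    )
```

On data drawn from an MZipf model the search recovers the generating parameters when they lie on the grid; on real traces one would refine the grid (or use a continuous optimizer) around the best cell.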
Moreover, we find that the values of q are much smaller (order-wise) than the values of M, which also has an important implication: the aggregate memory of the D2D network can easily surpass the number of files requested with similar probabilities, which intuitively should all be cached in the D2D network. Mathematically, in Ch. 3.3, we will see that when the aggregate memory is smaller than the value of q (order-wise), the outage goes to 1 asymptotically as the library size M and q go to infinity, indicating poor performance. Finally, based on the data in Metro region 1 during June 2014, Fig. 3.2 shows the relationship between the values of q, M, and the number of users N; we let N range here from 10 to 10,000, covering the range of realistic values for the number of users in a cell. It can be observed that q is much smaller than M when N is realistic. Although not shown here for brevity, γ is (on average) between 0.2 and 1.1 for the range of N considered in Fig. 3.2, and generally increases when N increases. Although our dataset cannot directly represent the global popularity distribution of a small area, e.g., a cell, due to the limitation discussed, our results are the best indication currently available, because to the best of our knowledge there are no publicly available data for video reuse of mobile data on a per-cell basis. We will thus make in the following the assumption that the popularity distribution at each location follows the global (over a particular region) popularity distribution, and we use the parameters of Metro regions 1 and 2 henceforth.

¹ The KL divergence for a parameter set x is defined as D_KL(x) = Σ_m p_m^data log(p_m^data / p_m^model(x)).

3.3 Achievable Throughput–Outage Tradeoff

From the measured data, we understand that the MZipf distribution is more suitable for mobile data traffic.
In this section, we thus generalize the theoretical treatment in [46] by considering the MZipf distribution and provide the achievable throughput–outage tradeoff analysis.

3.3.1 Network Setup

In this section, we describe the network model and define the throughput–outage tradeoff. Denote the number of users in the network as N. Our goal is to provide the asymptotic analysis when N → ∞, M → ∞, and q → ∞.² We assume a network in which user devices can communicate with each other through direct links. We consider a transmission policy based on clustering, in which the devices are grouped geographically into clusters such that any device within a cluster can communicate with any other device in the same cluster at a constant rate C bits/second, but not with devices in a different cluster. The network is split into equal-sized clusters. We adopt a grid network in which the users are placed on a regular grid [46]. As a result, g_c(M) ≤ N, g_c(M) ∈ ℕ, which is a function of M and denoted as the cluster size, is the number of users in a cluster and is a parameter to be chosen in order to analyze the throughput–outage tradeoff. Moreover, we say a potential link exists in the cluster if a user can find its desired file in the cluster through D2D communications, and we say that a cluster is good if it contains at least one potential link.

² We generally consider q = O(M) because, by definition, the MZipf distribution would converge to a simple uniform distribution when q = ω(M). Besides, as a matter of practice, we can see from Table 3.1 that q is much smaller than M. Note that we view the case that q = Θ(1) is a constant simply as a degenerate case of our results. Also, based on the experimental results, γ changes within a (small) finite range, i.e., it does not go to infinity, as M increases. We therefore approximate γ as a fixed constant for the sake of analysis.
Figure 3.1: Measured ordered popularity distribution of video files of the BBC iPlayer requested via the cellular operator in July of 2014: (a) Metro region 1; (b) Metro region 2. γ = 0.86 and γ = 0.83 for the Zipf distributions in Metro regions 1 and 2, respectively. γ and q for the MZipf distributions are shown in Table 3.1.

Figure 3.2: Relation between q, M, and N using data in Metro region 1 of June, 2014.

We assume that only a single user in a cluster can use its potential link to obtain the requested file at a time, thus avoiding interference between users in the same cluster. Besides, potential links of the same cluster are scheduled with equal probability (or, equivalently, in round robin). Therefore all users have the same average throughput. To avoid interference between clusters, we use a spatial reuse scheme with Time Division Multiple Access (TDMA). Denoting K as the reuse factor, such a reuse scheme evenly applies K colors to the clusters, and only the clusters with the same color can be activated on the same time-frequency resource for D2D communications. Note that the adopted reuse scheme is analogous to the spatial reuse scheme in conventional cellular networks [193]. Besides, we use TDMA only as a convenient example: any scheme that allocates orthogonal resources to clusters with different colors is aligned with our model. Although the assumptions above are made for the subsequent theoretical analysis, they are actually practical.
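A minimal sketch of the reuse-K coloring described above, assuming K is a perfect square and clusters are indexed on a square grid (function name hypothetical):

```python
def cluster_color(i, j, K=4):
    # Reuse-K coloring of a square grid of clusters: clusters sharing a
    # color are activated on the same time-frequency resource, and any
    # two adjacent clusters (including diagonal neighbors for K = 4)
    # receive different colors.
    k = int(K ** 0.5)
    return (i % k) * k + (j % k)
```

With K = 4 the pattern repeats every two clusters in each direction, analogous to the spatial reuse of conventional cellular networks cited in the text.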
Specifically, the adjustable size of the cluster can be implemented by adapting the transmit power; in other words, the transmit power is chosen such that communication between opposite corners of a cluster is possible. The link rate for the D2D communication is fixed when no adaptive modulation and coding is used, and of course this rate has to be smaller than the capacity of the longest-distance communication envisioned in this system. The signal-to-noise ratio (SNR) is determined by the pathloss; small-scale fading can be neglected since, in highly frequency-selective channels, the effects of this fading can be eliminated by exploiting frequency diversity [193], and shadowing effects can be accommodated by adding a constant throughput loss from the system's point of view, since the caching policy as well as the file delivery are implemented on a long-term scale. It must be emphasized that the above network is not optimal for D2D communications. Suitable power control, adaptive modulation and coding, etc., could all increase the spectral efficiency. However, our model provides both a useful lower bound on the performance and analytical tractability, which is important for comparability between different schemes. The information-theoretically optimal throughput–outage tradeoff analysis is beyond the scope of this chapter. We denote S as the cache memory in a user device, i.e., a user can cache up to S files. Note that we do not consider S to grow to infinity as N → ∞, M → ∞, and q → ∞; i.e., we consider S = Θ(1) as a fixed network parameter in this chapter. The aggregate memory in a cluster is then S g_c(M). An independent random caching policy is adopted for users to cache files. Denote P_c(f) as the probability of caching file f, where 0 ≤ P_c(f) ≤ 1 and Σ_{f=1}^{M} P_c(f) = 1. Using such a caching policy, each user caches each file independently at random according to P_c(f).
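To make the independent random caching policy concrete, the following sketch (an illustrative simplification, not the analysis model itself: it ignores the tagged user's own cache, and the function name is hypothetical) estimates by Monte Carlo the probability that a user's request misses every other cache in its cluster:

```python
import random

def estimate_outage(pr, pc, S, gc, trials=2000, seed=0):
    """Monte Carlo estimate of the cluster-level outage: each of the
    other gc - 1 users independently fills its S cache slots by
    sampling from the caching distribution pc (repetitions allowed,
    matching the independent random caching described above); a tagged
    user requests a file according to pr and is in outage if no other
    user in the cluster holds that file.  Ignoring the tagged user's
    own cache is a simplification made for this sketch."""
    rng = random.Random(seed)
    files = range(len(pr))
    misses = 0
    for _ in range(trials):
        want = rng.choices(files, weights=pr)[0]
        cached = set()
        for _user in range(gc - 1):
            cached.update(rng.choices(files, weights=pc, k=S))
        if want not in cached:
            misses += 1
    return misses / trials
```

For example, if the caching distribution never covers the requested files, the estimate approaches 1; if every requested file is cached by everyone, it approaches 0.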
We note that when using this policy, a user might cache the same file multiple times; this policy is used for the sake of analysis. Given the popularity distribution P_r(·), caching policy P_c(·), and transmission policy, we define the average throughput of a user u as T̄_u = E[T_u], where T_u is a throughput realization of user u and the expectation is taken over the realizations of the cached files and requests. The minimum average throughput is T_min = min_u T̄_u = T̄_u due to the symmetry of the network (e.g., round-robin scheduling). We define the number of users in outage N_o as the number of users that cannot find their requested files. Thus the average outage is:

p_o = (1/N) E[N_o] = (1/N) Σ_u P(T_u = 0) = 1 − P_u^c, (3.2)

where P_u^c is the probability that a user u can find its desired file in a cluster. Due to the symmetry of the network, P_u^c is the same for all users. P_u^c is also called the “hit rate” in some literature [74, 186]. We note that our network setup follows the framework in [46]; please refer to [46] for more rigorous descriptions.

3.3.2 Prerequisite for the Analysis of the Throughput–Outage Tradeoff

In this section, we analyze the achievable throughput–outage tradeoff defined by the following:

Definition [46]: For a given network and popularity distribution, a throughput–outage pair (T, P_o) is achievable if there exists a caching policy and a transmission policy with outage probability p_o ≤ P_o and minimum per-user average throughput T_min ≥ T.

Under the network setup considered in Ch. 3.3.1, we determine the throughput–outage tradeoff by adopting the caching policy maximizing P_u^c and by adjusting the cluster size g_c(M). We thus first provide the following theorem:

Theorem 1: We define c_2 = q a_0, where a_0 = γ/(S(g_c(M)−1)−1), and c_1 ≥ 1 is the solution of the equality c_1 = 1 + c_2 log(1 + c_1/c_2). Let M → ∞, N → ∞, and q → ∞. Suppose g_c(M) → ∞ as M → ∞, and denote m* as the smallest index such that P_c(m* + 1) = 0. Under the network model in Ch.
3.3.1, the caching distribution P_c(·) that maximizes P_u^c is:

P_c(f) = [1 − ν z_f]^+, f = 1, ..., M, (3.3)

where ν = (m* − 1)/(Σ_{f=1}^{m*} z_f), z_f = (P_r(f))^{−1/(S(g_c(M)−1)−1)}, [x]^+ = max(x, 0), and

m* = min(c_1 S g_c(M), M). (3.4)

Proof. See Appendix A.1.

Observe that P_c(f) is monotonically decreasing in f and that m* determines the number of files with P_c(f) > 0. Besides, we can observe that c_1 ≥ 1, and c_1 = 1 only if c_2 = o(1). Furthermore, we can see that c_1 = Θ(c_2) when c_2 = Ω(1). Thus, when considering q = Θ(S g_c(M)) and c_1 S g_c(M) < M, we obtain m* = Θ(c_1 S g_c(M)) = Θ(c_2 S g_c(M)) = Θ(q). Combining the above results, Theorem 1 indicates that the caching policy should cover at least up to the file at rank q (order-wise) in the library. This is intuitive because the MZipf distribution has a relatively flat head and q characterizes the breaking point.

Using the result in Theorem 1, we then characterize P_u^c, i.e., the probability that a user can find the desired file in a cluster, in Corollaries 1 and 2:

Corollary 1: Let M → ∞, N → ∞, and q → ∞. Suppose g_c(M) → ∞ as M → ∞. Consider q = O(S g_c(M)) and g_c(M) < M/(c_1 S). Under the network model in Ch. 3.3.1 and the caching policy in Theorem 1, P_u^c is expressed as:

P_u^c = [ (c_1 S g_c(M) + q)^{1−γ} − (1−γ) c_1 S g_c(M) (c_1 S g_c(M) + q)^{−γ} − (q + 1)^{1−γ} ] / [ (M + q)^{1−γ} − (q + 1)^{1−γ} ]. (3.5)

Corollary 2: Let M → ∞, N → ∞, and q → ∞. Suppose g_c(M) → ∞ as M → ∞. Consider q = O(S g_c(M)) and g_c(M) = βM/(c_1 S) for a constant β. Define D = q/(βM). Under the network model in Ch. 3.3.1 and the caching policy in Theorem 1, P_u^c is lower bounded as

P_u^c ≥ 1 − (1−γ) e^{−β/c_1} [ (1+D)^{1−γ} − D^{1−γ} ]^{−1} [ (1+D)^{1−γ/(S(g_c(M)−1)−1)} − D^{1−γ/(S(g_c(M)−1)−1)} ] γ/(S(g_c(M)−1)−1). (3.6)

Proof. See Appendix A.2.

3.3.3 Throughput–Outage Tradeoff for MZipf Distributions with γ < 1

Using the previous results, we characterize the throughput–outage tradeoff for γ < 1 in the following theorems.

Theorem 2: Let M → ∞, N → ∞, and q → ∞. Suppose g_c(M) → ∞ as M → ∞. Consider M = O(N), q = O(S g_c(M)), and γ < 1. Denote δ = 1/(2−γ) (i.e., γ = (2δ−1)/δ).
Under the network model in Ch. 3.3.1 and the caching policy in Theorem 1, we characterize the achievable throughput–outage tradeoff using three regimes:

(i) Define c_4 = q/M^δ. When g_c(M) = c_3 M^δ, where c_3 = Θ(1), the achievable throughput–outage tradeoff is

T(P_o) = (C/K) (M^{−δ}/c_3) (1 − exp(−c_3^{2−γ} [ (S c_1 c_3 + c_4)^{1−γ} − (1−γ) S c_1 c_3 (S c_1 c_3 + c_4)^{−γ} − (c_4)^{1−γ} ])) + o(M^{−δ}),

where P_o = 1 − M^{(δ−1)(1−γ)} [ (S c_1 c_3 + c_4)^{1−γ} − (1−γ) S c_1 c_3 (S c_1 c_3 + c_4)^{−γ} − (c_4)^{1−γ} ].

(ii) Define c_5 = q/g_c(M). When g_c(M) = ω(M^δ) and g_c(M) < M/(c_1 S), the achievable throughput–outage tradeoff is

T(P_o) = (C/K) (1/g_c(M)) + o(1/g_c(M)),

where P_o = 1 − [ (g_c(M))^{1−γ} / ((M + c_5 g_c(M))^{1−γ} − (c_5 g_c(M) + 1)^{1−γ}) ] [ (S c_1 + c_5)^{1−γ} − (1−γ) S c_1 (S c_1 + c_5)^{−γ} − (c_5)^{1−γ} ].

(iii) Define D = q/(βM). When g_c(M) = βM/(c_1 S) for a constant β, the achievable throughput–outage tradeoff is

T(P_o) = (C/K) (S c_1/(β M)) + o(1/M),

where P_o = (1−γ) e^{−β/c_1} [ (1+D)^{1−γ} − D^{1−γ} ]^{−1} [ (1+D)^{1−γ/(S(g_c(M)−1)−1)} − D^{1−γ/(S(g_c(M)−1)−1)} ] γ/(S(g_c(M)−1)−1).

Proof. See Appendix A.3.

By comparing Theorem 2 with Theorem 5 in [46], we observe that, when q = O(S g_c(M)), the scaling order of the throughput–outage tradeoff under the MZipf popularity distribution is identical to that under the Zipf popularity distribution.³ Theorem 2 indicates that the achievable throughput–outage tradeoff has the same scaling law as for the Zipf distribution when the order of q is no larger than the order of the aggregate memory, indicating that the performance improvement of the cache-aided D2D network with the Zipf distribution is retained when the popularity distribution follows the more practical MZipf distribution. In particular, since the other regimes could have unacceptably high outage, the only regime we are interested in is the third regime of Theorem 2. We can then see from the results that the throughput scales as Θ(S/M), meaning that the throughput of cache-aided D2D scales much better than conventional unicasting when N is much greater than M (small library), i.e., T ∝ S/M ≫ 1/N.
Besides, the throughput scales linearly with the memory size of each device. The results also imply that cache-aided D2D has the same scaling law as the coded multicasting scheme of [40] and is better than Harmonic Broadcasting [194]. For detailed discussions regarding the scaling laws of different schemes, please refer to [46].

³ By observing Theorem 2, it is clear that we are not interested in the cases g_c(M) = o(M^δ) and g_c(M) = ω(M), since the former gives an even worse outage, i.e., P_o → 1, and the latter gives worse throughput when P_o → 0.

Theorem 2 does not characterize the case q = ω(S g_c(M)). We thus provide the relevant discussion. Specifically, we consider the regime in which q = ω(S g_c(M)) while q = O(M). This is because when q = ω(M), the popularity distribution becomes a uniform distribution asymptotically, in which we are not interested. We then provide Theorem 3.

Theorem 3: Let M → ∞, N → ∞, and q → ∞. Suppose g_c(M) → ∞ as M → ∞. Consider γ < 1, q = ω(S g_c(M)), and q = O(M) (i.e., g_c(M) = o(M)). Under the network model in Ch. 3.3.1 and the caching policy in Theorem 1, the achievable outage is lower bounded by 1 asymptotically, i.e., P_o ≥ 1 − o(1).

Proof. See Appendix A.4.

Theorem 3 suggests that we should increase the cluster size such that the aggregate memory is at least of the same order as q, i.e., S g_c(M) = Ω(q); otherwise the outage will always go to 1. In practice, this implies that the outage of the network will be excessive if the aggregate memory is not large enough to accommodate caching at least on the order of q files.

3.3.4 Throughput–Outage Tradeoff for MZipf Distributions with γ > 1

From Theorem 2, we understand that when γ < 1, the only meaningful regime is the third regime. In practice, this implies that it is necessary to have a high-density D2D network (or, equivalently, a small library) for realizing the benefits of D2D caching.
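All of the tradeoffs above are achieved with the caching policy of Theorem 1. As a concrete illustration, the sketch below evaluates the water-filling form P_c(f) = [1 − ν z_f]^+ for a finite library; the cutoff is found numerically here rather than through the asymptotic expression (3.4), and the function name is hypothetical:

```python
def caching_distribution(pr, S, gc):
    """Water-filling evaluation of the Theorem 1 policy
    P_c(f) = [1 - nu * z_f]^+ with z_f = P_r(f)^(-1/(S(gc-1)-1)),
    where nu normalizes the distribution to sum to one.  The cutoff
    m* is found numerically (largest m keeping all retained terms
    positive) instead of via the asymptotic expression (3.4).
    Assumes pr is sorted in decreasing order and S*(gc-1) >= 2."""
    M = len(pr)
    expo = -1.0 / (S * (gc - 1) - 1)
    z = [p ** expo for p in pr]              # z_f is non-decreasing in f
    for m in range(M, 1, -1):
        nu = (m - 1) / sum(z[:m])
        if 1 - nu * z[m - 1] > 0:            # smallest retained term positive
            break
    return [max(1 - nu * zf, 0.0) for zf in z]
```

The result is a monotonically decreasing caching distribution that sums to one, covering the head of the popularity distribution and truncating its tail, as the discussion of Theorem 1 describes.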
In this section, we want to see whether this condition can be relaxed when γ > 1, i.e., when the popularity is more concentrated on the popular files located in the flat regime of the MZipf distribution. Since Theorem 3 suggests having sufficient aggregate memory, we focus on the first two regimes of Theorem 2. Specifically, we are interested in the scenario in which g_c(M) = o(M) and q = O(S g_c(M)):⁴

Theorem 4: Let M → ∞, N → ∞, and q → ∞. Suppose g_c(M) → ∞ as M → ∞. Consider γ > 1, g_c(M) = o(M) ≤ N, and q = O(S g_c(M)). Define c_6 = q/g_c(M). Under the network model in Ch. 3.3.1 and the caching policy in Theorem 1, the achievable throughput–outage tradeoff is

T(P_o) = (C/K) (1/g_c(M)) + o(1/g_c(M)), (3.7)

where P_o = (c_6/(S c_1 + c_6))^{γ−1}.

Proof. See Appendix A.5.

If q = o(S g_c(M)), we obtain c_1 = 1 and c_6 = o(1) by definition. We thus have Corollary 3:

Corollary 3: Let M → ∞, N → ∞, and q → ∞. Suppose g_c(M) → ∞ as M → ∞. Consider γ > 1, g_c(M) = o(M) ≤ N, and q = o(S g_c(M)). Under the network model in Ch. 3.3.1 and the caching policy in Theorem 1, the achievable throughput–outage tradeoff is

T(P_o) = (C/K) (1/g_c(M)) + o(1/g_c(M)), (3.8)

where P_o = o(1).

From Theorem 4 and Corollary 3, we observe that when γ > 1, we obtain a scaling law that is better than Θ(S/M) but worse than Θ(S/q). In practice, this implies that when γ > 1 and the aggregate memory is larger than the order of q, the improvement of cache-aided D2D can still be significant even with a large library. This relaxes the condition that we need a small library to obtain significant benefits when γ < 1.

3.3.5 Finite-Dimensional Simulations

Finally, we provide results from finite-dimensional simulations in Fig. 3.3, which compares theoretical (solid lines) and simulated (dashed lines) curves. In Fig. 3.3, we adopt K = 4, S = 1, M = 1000, and N = 10000.

⁴ We can actually see from Theorem 4 that we need q = O(S g_c(M)) to bound P_o away from 1, since P_o → 1 when c_6 = ω(1).
Figure 3.3: Comparison between the normalized theoretical results (solid lines) and normalized simulated results (dashed lines) in networks adopting K = 4, S = 1, M = 1000, and N = 10000: (a) comparisons between different γ with q = 20; (b) comparisons between different q with γ = 0.6.

We observe that our analysis can effectively characterize (with a small gap) the throughput–outage tradeoff even with finite-dimensional setups. This is not common, as indicated by [46], when analyzing the scaling behavior of wireless networks. We note that although not shown here for brevity, simulations with other parameters, e.g., N = 5000 and/or M = 1500, show similar results.

3.4 Evaluations of Cache-Aided D2D Networks

Our theoretical analysis shows that the cache-aided D2D scheme outperforms conventional unicasting even if the popularity distribution follows the more practical MZipf distribution. To support the theory, we present simulations of the throughput–outage tradeoff using MZipf distributions parameterized according to the real-world data, in a network with practical setups as in [42]. For the simulations, communications between users occur at 2.4 GHz. We assume a cell of dimensions 0.36 km² (600 m × 600 m) that contains buildings as well as streets/outdoor environments. We assume N = 10000 users in the cell, i.e., on average, there are 2–3 nodes in each square of 10 × 10 meters.
The cell contains a Manhattan grid of square buildings with a side length of 50 m, separated by streets of width 10 m. Each building is made up of offices of size 6.2 m × 6.2 m. Within the cell, users (devices) are distributed at random according to a uniform distribution. Due to our geometrical setup, each node is assigned to be outdoors or indoors, and in the latter case placed in a particular office. Since 2.4 GHz communication can penetrate walls, we have to account for different scenarios, namely indoor communication (Winner model A1), outdoor-to-indoor communication (B4), indoor-to-outdoor communication (A2), and outdoor communication (B1) (see [42]). The number of clusters in a cell is varied from 2² = 4, 3² = 9, ..., to 27² = 729; a frequency reuse factor of K = 4 is used to minimize the inter-cluster interference. The cache memory S on each device is kept as a parameter that will be varied in the simulations. To provide some real-world connection: storage of an hour-long video in medium video quality (suitable for a cellphone) takes about 300 MByte. Thus, storing 100 files with current cellphones is reasonably realistic, and given the continuous increase in memory size, even storing 500 files is not prohibitive (assuming some incentivization by network operators or other entities). In terms of channel models, we mostly employ the Winner channel models with a minor modification, motivated by the fact that it is difficult to establish a D2D link at low SNR [42]: no D2D communication is possible for a distance larger than 100 m. In particular, we directly use the Winner II channel models with antenna heights of 1.5 m, as well as the probabilistic Line of Sight (LOS) and Non Line of Sight (NLOS) models. We add a probabilistic body shadowing loss (L_b) with a lognormal distribution, where σ_{L_b} = 4.2 for LOS and σ_{L_b} = 3.6 for NLOS, to account for the blockage of radiation by the person holding the device; see [195].
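The indoor/outdoor assignment described above can be sketched as follows (hypothetical helper; it assumes the Manhattan grid has period 60 m, i.e., a 50 m building plus a 10 m street, starting with a street at the cell edge):

```python
import random

def place_users(n, cell=600.0, building=50.0, street=10.0, seed=0):
    # Drop n users uniformly in a cell of cell x cell meters containing
    # a Manhattan grid of square buildings (side `building`) separated
    # by streets of width `street`; classify each user indoor/outdoor.
    # The grid is assumed to start with a street at the cell edge.
    rng = random.Random(seed)
    period = building + street
    users = []
    for _ in range(n):
        x, y = rng.uniform(0, cell), rng.uniform(0, cell)
        indoor = (x % period) >= street and (y % period) >= street
        users.append((x, y, indoor))
    return users
```

With these dimensions roughly (50/60)² ≈ 69% of users land indoors; the indoor/outdoor flags of a transmitter-receiver pair then select among the Winner A1/A2/B1/B4 scenarios listed above.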
More details about the channel model can be found in [42]. Since Metro regions 1 and 2 of the dataset cover much smaller regions than the whole UK, and thus are expected to better describe the effects that might be encountered within a particular cell (though they are still much larger than a cell), we use their corresponding MZipf parameters in the simulations. Fig. 3.4(a) shows the throughput–outage tradeoff for different cache sizes on each device in Metro region 1. An outage of 10% implies that 90% of the traffic can be offloaded to D2D communications. We can see that a throughput of 10⁵ bps can be achieved if the cache size of each user is up to 1/10 of the library size. Even for S = M/50, i.e., approximately 100 files (30 GB), the advantage compared to the conventional unicasting described in [42] is two orders of magnitude. Even caching just 30 files (M/200) also provides significant throughput gains, though only for outage probabilities > 0.01. The results for Metro region 2 (Fig. 3.4(b)) are very similar. Finally, we verify whether using the more detailed MZipf model has an impact on the performance evaluation of the caching system. Note that here it is not important whether the throughput is better or worse with a specific model, but whether it is correct. In other words, is there a difference in performance when using the MZipf fit (more complicated, but a better fit, as discussed above) versus the Zipf fit? The short answer is that there is indeed a difference, as explained in detail in the following. In Fig. 3.5, we consider the modeling results of Metro region 2 of July, and compare the proposed MZipf model with the Zipf model adopting the best-fitting parameter γ = 0.83 (fitting the whole data curve) and the best-tail-fitting parameter γ = 1.11 (fitting the power law of the tail).
Figure 3.4: Throughput–outage tradeoff in networks assuming a mixed office scenario for the propagation channel, with varying local storage size: (a) Metro region 1 of July; (b) Metro region 2 of July.

When comparing the MZipf model with the Zipf model using the best-fitting parameter (γ = 0.83), we can observe a performance gap between them, and this gap increases as the storage size decreases. This is because the Zipf model fails to capture the flattened head of the popularity distribution, and this drawback is significant when the storage size of the devices is not large enough to completely store all files in the flattened head. When comparing with the Zipf model using the best-tail-fitting parameter (γ = 1.11), the gap between the Zipf and MZipf models is even more significant, indicating that simply fitting the power law of the tail can lead to a fairly inaccurate result. The above results indicate that using the inaccurate Zipf distribution to evaluate the system performance can generate inaccurate results. Therefore it is necessary to use the MZipf distribution for modeling and analysis. As a remark, we observe for some curves that decreasing the throughput (by increasing the cluster size) does not improve the outage. This is because when the cluster size is large and the storage size of the devices is small, users need to fetch the desired files from neighbors at distances larger than 100 m.
Since we assume that a D2D link with a distance longer than 100 m is prohibited, a large cluster size combined with a small storage size leads to a high channel outage and thus increases the overall outage.

3.5 Conclusions

To answer the open question of whether cache-aided D2D for video distribution can in practice provide the benefits promised in the literature, we analyze and evaluate the throughput–outage performance considering measured popularity distributions. Using an extensive dataset, we observe that the widely used Zipf distribution cannot effectively describe the popularity distribution of real wireless traffic data. We thus propose using a generalized version of the Zipf distribution, i.e., the MZipf distribution, to model and parameterize the real data. Comparisons using measurements and numerical simulations verify the accuracy and necessity of this modeling. Based on this generalized model, we generalize the theoretical treatment in [46] and analyze the throughput–outage tradeoff. In particular, we show the impact of the plateau factor $q$ of the MZipf distribution on the optimal caching distribution and the throughput-outage tradeoff.
Figure 3.5: Throughput-outage tradeoff comparisons between different models for Metro region 2 of July in networks assuming the mixed office scenario for the propagation channel, for varying local storage size. (a) $\gamma = 0.83$ for the Zipf distribution. (b) $\gamma = 1.11$ for the Zipf distribution.

Theoretical results show that the scaling behavior of cache-aided D2D is identical to the case of the Zipf distribution under the parameter regimes validated by real data, implying that the benefits obtained in the case of the Zipf distribution are retained. To support the theory, extensive numerical evaluations considering practical propagation scenarios and other details are provided, and they show that cache-aided D2D for video distribution significantly outperforms conventional unicasting. Since the theory and the numerical experiments both suggest positive results, we conclude that cache-aided D2D for video distribution can in practice provide the benefits promised in the existing literature.

C H A P T E R 4

Optimal Throughput–Outage Analysis of Cache-Aided Wireless Multi-Hop D2D Networks

4.1 Introduction

In Ch.
3, the throughput–outage performance of cache-aided wireless D2D networks with single-hop D2D communications was discussed. The results in Ch. 3 validate the benefits of using single-hop cache-aided D2D from an information-theoretic perspective. Based on the literature, we observe that uncoded cache-aided wireless D2D networks with multi-hop D2D communications can perform much better than those using only single-hop D2D communications [47, 88, 112, 196]. As a result, one natural question is whether this advantage of cache-aided wireless multi-hop D2D communications is retained when the MZipf distribution is considered. Moreover, we observe that there is a gap in the literature between the achievable throughput-outage performance and its outer bound when the Zipf popularity distribution is adopted. Consequently, another goal of this chapter is to close this gap.

4.1.1 Related Literature

The throughput scaling law analysis for wireless D2D (or ad-hoc) networks has been the subject of many investigations since the seminal work of Gupta and Kumar [101]. In [101], the transport capacity was investigated under both a protocol model and a physical model, with multi-hop used for communications; $N$ users are either placed arbitrarily or randomly. Both a lower (achievable) bound on the throughput per user, of order $\frac{1}{\sqrt{N\log N}}$, and an upper bound (under some conditions) of $\frac{1}{\sqrt{N}}$ were derived. In [102], a similar analysis was conducted with a generalized physical model, and the upper bound $\frac{1}{\sqrt{N}}$ was validated under general conditions. The $\Theta(\sqrt{\log N})$ gap between the achievable throughput and the upper bound was closed in [103], albeit with a slightly different model in which the user distribution is described by a Poisson point process (PPP). A number of other schemes and channel models were investigated in other papers as well. For example, an analysis involving fading effects was provided in [104]. Also, the multicasting capacity was studied in [105] and [106].
In parallel to the investigations of the throughput scaling law, scaling laws for the throughput-delay tradeoff in wireless ad-hoc networks were provided in [107–109]. In [107], it was shown that the optimal throughput-delay tradeoff under a random node distribution with no mobility is $D(N) = \Theta(N\,T(N))$, where $D(N)$ and $T(N)$ are the delay and throughput, respectively. This result was then generalized to networks with mobility in [108, 109]. Since all the previous investigations were based on multi-hop communications, a natural question is whether one can go beyond the scaling law bounds of multi-hop D2D communications by using more sophisticated physical layer processing. This was indeed shown to be the case in [110], which introduced a hierarchical cooperation scheme, where cooperation between users is used to form a distributed multiple-input multiple-output (MIMO) system among the users. The resulting scaling of the throughput per user is $\Theta(N^{-\varepsilon})$ for arbitrarily small $\varepsilon > 0$, at the price of very complicated cooperation among user nodes. Cache-aided D2D/ad-hoc networks have been substantially studied by the computer science community, mostly with multi-hop communications, e.g., [21, 23, 25, 29, 111]. However, the fundamental scaling laws and optimality considerations did not draw much attention, except that [17] proposed a caching policy (the square-root replication policy) that is optimal in terms of the expected number of nodes to visit until the desired content is found. Only recently did the fundamental properties of cache-aided D2D/ad-hoc networks start to draw more attention, and several papers have characterized the scaling laws of uncoded cache-aided D2D/ad-hoc networks.
In [45], the scaling law of the maximum expected throughput was characterized for single-hop cache-aided D2D networks considering a Zipf popularity distribution and a protocol model for transmissions between nodes; however, it did not characterize the outage probability. In fact, this maximum expected throughput can only be achieved when the outage probability goes to 1 as $N\to\infty$. To resolve this limitation, [46] investigated the scaling behavior of the throughput-outage performance for single-hop cache-aided D2D networks. It showed that the throughput per node can scale as $\Theta\left(\frac{S}{M}\right)$ with negligibly small outage probability when a heavy-tailed Zipf popularity distribution is considered, where $M$ is the file library size and $S$ is the per-user cache size. This result was later generalized in [48] by adopting a more practical and general model for the popularity distribution, namely the Mandelbrot-Zipf (MZipf) distribution. In [88], the scaling law of the average throughput per node for cache-aided D2D networks with multi-hop communications was characterized under the assumption of user locations on a grid, while the tradeoff between throughput and outage was not explicitly investigated. Ref. [47] investigated the scaling law of the throughput-outage performance for cache-aided D2D networks with multi-hop communications, and provided an achievable throughput scaling law under the condition that the outage is vanishing. In [112], an upper bound for the throughput scaling law was proposed, which complemented the results in [88] and [47]. Notably, a major difference between the results in [47] and those in [88] and [112] is that [47] characterized the outage probability more explicitly. There also exist papers investigating scaling laws using more complicated delivery approaches. The scaling laws of coded cache-aided D2D/ad-hoc networks have been studied in different contexts, e.g., [41, 114–117].
Besides, to improve cache-aided multi-hop D2D, schemes involving hierarchical cooperation were introduced in [118, 120], and their scaling laws were characterized. In contrast to the above papers, which studied the scaling behavior of the throughput and outage performance, the scaling behavior of the throughput-delay tradeoff was studied in [121]. In this paper, we concentrate on uncoded cache-aided D2D with simple multi-hop communications, i.e., without the complicated hierarchical cooperation. This is because cache-aided multi-hop D2D has been shown to provide better scaling laws than conventional unicasting, shared-link coded caching, and cache-aided single-hop D2D [42, 46–48]. Also, caching and delivery in D2D networks without coding maintain the simplicity of the transmissions, while still providing the same scaling law as the corresponding coded version [41]. As compared to schemes involving hierarchical cooperation, although those schemes could be better than simple multi-hop schemes in some situations, their implementations are much more complicated. More importantly, the implementation of simple multi-hop D2D communications in wireless networks is more plausible thanks to recent developments in ad-hoc networks [33] and D2D networks [93, 197].

4.1.2 Contributions

In this chapter, we focus on the scaling law analysis for the throughput-outage performance of uncoded cache-aided D2D with multi-hop communications. We aim to improve and complement the results in previous papers [47, 112]. Specifically, when the outage is vanishing, [47] showed that the achievable throughput per user scales as $\sqrt{\frac{S}{M\log N}}$ for the heavy-tailed Zipf distribution, while the upper bound in [47] was $\sqrt{\frac{S\log N}{M}}$, where $S$ is the cache size of a user. Thus, there is a gap between the achievable performance and the upper bound. Ref. [112] provided a better throughput upper bound that scales as $\sqrt{\frac{S}{M}}$.
However, similar to [88], the tradeoff between throughput and outage performance was not characterized in [112]. Recently, based on a real-world dataset of mobile users, [48] showed that, instead of the Zipf distribution adopted in [47] and [112], a more general model for the popularity distribution is the MZipf distribution. Motivated by these observations, this paper aims to close the gap between the achievable throughput-outage performance and its outer bound, as well as to provide a scaling law analysis under the MZipf distribution assumption. Note that this paper is the first to provide a scaling law analysis for cache-aided wireless multi-hop D2D networks considering the MZipf popularity distribution. In this work, we use a PPP to model the user distribution and use the MZipf distribution to model the popularity distribution of video requests [48]. We assume the decentralized random caching policy [127] and derive tight achievable scaling laws of the throughput-outage performance for the regimes in which the outage probability is either negligibly small or converges to zero, corresponding to the practical requirement that the outage of the network should be small. Our achievable scheme is obtained by first deriving the optimal caching policy, and then exploiting a hybrid clustering and multi-hop delivery scheme. We also provide the outer bound of the throughput-outage performance, again for the regimes in which the outage probability is either negligibly small or converges to zero. The outer bound is derived by analyzing the upper bound of the distances between the source-destination pairs. We show that the derived achievable per-user throughput scaling law and its outer bound are tight, i.e., the multiplicative gap between the lower and upper bounds can be bounded by a constant.
Specifically, we show that when the outage probability is negligibly small, the throughput per user scales as $\Theta\left(\sqrt{\frac{S}{M}}\right)$ for $\gamma < 1$ and as $\Theta\left(\sqrt{\frac{S}{q}}\right)$ for $\gamma > 1$, where $\gamma$ is the Zipf factor and $q$ is the plateau factor of the MZipf distribution (see the mathematical definition in Ch. 4.2). This result is intuitive: on the one hand, the performance is dominated by the file library size $M$ when the popularity distribution is heavy-tailed; on the other hand, the performance is dominated by the plateau factor $q$, i.e., the number of very popular files, when the popularity distribution is light-tailed. We note that since the Zipf distribution is simply a special case of the MZipf distribution, our results, as a by-product, close the gap of the Zipf distribution case in the literature, leading to a per-user throughput scaling of $\Theta\left(\sqrt{\frac{S}{M}}\right)$ for $\gamma < 1$ and almost $\Theta(\sqrt{S})$ for $\gamma > 1$. Moreover, since the multiplicative gap between the achievable scheme and the outer bound can be bounded by a constant, our achievable throughput-outage scaling law is optimal.

4.2 Network Setup

We consider a random dense network where users are placed according to a PPP within a unit square area $[0,1]\times[0,1]$. We assume that the density of the PPP is $N$. As a result, the average number of users in the network is $N$, and the number of users $n$ in the network is a random variable following the Poisson distribution.¹ Accordingly, the probability that the network has $n$ users is:

$$P_N(\mathsf{n} = n) = \frac{N^n}{n!}e^{-N}. \quad (4.1)$$

Note that according to the PPP, these $n$ users are uniformly distributed within the unit square area. Each device in the network can cache $S$ files. We consider a library consisting of $M$ files and assume that all files have equal size.
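The placement model behind Eq. (4.1) can be sketched in a few lines (a minimal NumPy illustration; the density N = 1000 and the number of realizations are arbitrary choices for this sketch):

```python
import numpy as np

rng = np.random.default_rng(0)

def draw_ppp_unit_square(N, rng):
    """One realization of the network: n ~ Poisson(N) users, as in Eq. (4.1),
    placed i.i.d. uniformly on the unit square [0, 1] x [0, 1]."""
    n = rng.poisson(N)
    return rng.uniform(0.0, 1.0, size=(n, 2))

# Over many realizations, the average number of users concentrates around N.
N = 1000
counts = [len(draw_ppp_unit_square(N, rng)) for _ in range(200)]
mean_users = float(np.mean(counts))
```

Conditioning a homogeneous PPP on its point count gives exactly the uniform (binomial point process) placement used later in the analysis.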
We assume that users request files from the library independently according to a request distribution modeled by the MZipf distribution [48, 76]:

$$P_r(f;\gamma,q) = \frac{(f+q)^{-\gamma}}{\sum_{m=1}^{M}(m+q)^{-\gamma}} = \frac{(f+q)^{-\gamma}}{H(1,M,\gamma,q)}, \quad (4.2)$$

where $\gamma$ is the Zipf factor and $q$ is the plateau factor of the distribution, and $H(a,b,\gamma,q) := \sum_{f=a}^{b}(f+q)^{-\gamma}$. We can see that the MZipf distribution degenerates to a Zipf distribution when $q = 0$. To simplify the notation, we will in the remainder of this paper write $P_r(f)$ instead of $P_r(f;\gamma,q)$. We consider the decentralized random caching policy for all users [127], in which users cache files independently according to the same caching policy. Denoting by $P_c(f)$ the probability that a user caches file $f$, the caching policy is fully described by $P_c(1), P_c(2), \ldots, P_c(M)$, where $0 \leq P_c(f) \leq 1, \forall f$; thus users cache files according to the caching policy $\{P_c(f)\}_{f=1}^{M}$. To satisfy the cache space constraint, we require $\sum_{f=1}^{M} P_c(f) = S$. In this paper, we assume that $S$ and $\gamma$ are constants. We consider an asymptotic analysis, in which $N\to\infty$ and $M\to\infty$. We restrict ourselves to $M = o(N)$ and $q = O(M)$ when $\gamma < 1$, and to $M = o(N)$ and $q = o(M)$ when $\gamma > 1$. The main reason for restricting to $M = o(N)$ when $\gamma < 1$ is to give the users of the network sufficient aggregate cache space to store the whole library. Similarly, the main reason for the assumption that $q = o(M)$ and $M = o(N)$ when $\gamma > 1$ is to give the users of the network sufficient aggregate cache space to store the most popular $q$ files (orderwise); otherwise the outage probability would go to 1. The plateau factor $q$ can either go to infinity or remain a constant. When $q$ goes to infinity, it is sufficient to consider $q = O(M)$.

¹Since our derivations are based on results in [102] and [103], the results in this paper can be extended to the extended network in which users are placed according to a PPP with unit density in a square area of size $N$ (see [102] and [103]).
This is because the MZipf distribution behaves asymptotically like a uniform distribution when $q = \omega(M)$. As a result, we assume $q = O(M)$ when $\gamma < 1$. In addition, when $\gamma > 1$, it is more interesting to consider the case $q = o(M)$ because it gives a clear distinction between the heavy-tailed case ($\gamma < 1$) and the light-tailed case ($\gamma > 1$). The definition of a heavy-tailed popularity distribution can be found in Definition 3 of [47]. Note that if we had $q = \Theta(M)$ in the case $\gamma > 1$, then, due to the impact of a large $q$, we would still have a heavy-tailed popularity distribution. As a result, we can expect the scaling law in the case with $\gamma > 1$ and $q = \Theta(M)$ to be similar to that with $\gamma < 1$ and $q = O(M)$, as the performance of the former case is restricted by $q$ [48]. As a matter of practice, the measurement results in [48] show that $q$ is much smaller than $M$ when $\gamma > 1$, which supports the consideration of $q = o(M)$. When $q$ is a constant, i.e., $q = \Theta(1)$, the request distribution behaves like a Zipf distribution as $M\to\infty$. Thus, the results for $q = \Theta(1)$ are representative of an analysis that uses the Zipf distribution for the request distribution.² We will consider $q\to\infty$ in Chs. 4.3 and 4.4 and consider $q = \Theta(1)$ in Ch. 4.5. Since $S$ is a constant, the probability that a user can find the desired file in its own cache goes to zero as $q$ and $M$ go to infinity. This rules out a possible gain from trivial self-caching; we thus concentrate on the analysis of the D2D collaborative caching gain. Moreover, similar to [46], we assume that different users requesting the same file request different segments of the file, which avoids any gain from naive multicasting.

²We actually repeated all derivations for the Zipf distribution, i.e., $q = 0$, and found that this claim is true. The repeated derivations for the Zipf distribution are omitted for brevity.
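The decentralized random caching policy can be realized in several ways; the sketch below uses the standard interval construction (an assumption of this sketch — the construction in [127] is not quoted here) to draw a cache of exactly $S$ files whose per-file marginals match a prescribed $\{P_c(f)\}$ with $\sum_f P_c(f) = S$:

```python
import numpy as np

def sample_cache(P_c, rng):
    """Draw one user's cache: exactly S = sum(P_c) distinct files, with file f
    cached with marginal probability P_c[f] (requires each P_c[f] <= 1).
    Interval trick: partition [0, S) into segments of lengths P_c[f]; a single
    uniform offset plus the S integer shifts selects S distinct segments."""
    S = int(round(P_c.sum()))
    edges = np.concatenate(([0.0], np.cumsum(P_c)))
    points = rng.uniform() + np.arange(S)
    return np.searchsorted(edges, points, side="right") - 1

rng = np.random.default_rng(1)
P_c = np.array([0.9, 0.7, 0.6, 0.5, 0.3])     # sums to S = 3
trials = 20000
hits = np.zeros(len(P_c))
for _ in range(trials):
    hits[sample_cache(P_c, rng)] += 1
empirical = hits / trials                      # should approach P_c
```

Because each segment has length at most 1 and the sample points are spaced exactly 1 apart, no segment is picked twice, so the cache-size constraint holds with probability one.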
We consider the physical model [102, 104] and define the link rate between two users $i$ and $j$ as:

$$R(i,j) = \begin{cases} \log_2(1+\vartheta), & \log_2\left(1+\dfrac{P_i\, l(i,j)}{N_0+\sum_{k\neq i} P_k\, l(k,j)}\right) \geq \vartheta \\[2mm] 0, & \log_2\left(1+\dfrac{P_i\, l(i,j)}{N_0+\sum_{k\neq i} P_k\, l(k,j)}\right) < \vartheta \end{cases} \quad (4.3)$$

where $\vartheta$ is a constant determined by the delivery mechanism; $N_0$ is the noise power spectral density; $P_i$ is the transmit power of user $i$; and $l(i,j) = \min\{1, c\, d_{ij}^{-\alpha}\}$ is the power attenuation between users $i$ and $j$, where $d_{ij}$ is the distance between users $i$ and $j$, $c > 0$ is a constant, and $\alpha > 2$ is the pathloss exponent. We note that this model is not directly used in the analysis of this paper; however, it is necessary when we later leverage the results in [102] and [103]. We consider multi-hop D2D delivery for the network. Users can obtain their desired files only through multi-hop D2D delivery or self-caching. In other words, users can only obtain files from the caches of users in the network. Note that since $S$ is a constant while $M$ goes to infinity, we can assume without loss of generality that the throughput per user of self-caching is identical to that of D2D caching; thus we do not distinguish between users retrieving the desired files from their own caches and from the caches of other users. We define an outage as the event that a user cannot obtain its desired file through either multi-hop D2D delivery or self-caching. Suppose we are given a realization of the number of users $n$ in the network with a realization of the user locations $P$ according to the binomial point process. In addition, we are given a realization of the file requests $F$ and a realization of the file placement $G$ of the users, generated according to the popularity distribution $P_r(\cdot)$ and the caching policy $P_c(\cdot)$, respectively. We define $T_u$ as the throughput of user $u \in \mathcal{U}$ under a feasible multi-hop file delivery scheme.
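A minimal sketch of the threshold-based link-rate model in Eq. (4.3) (the powers, gains matrix, and threshold below are arbitrary illustrative values, not parameters from the analysis):

```python
import math

def link_rate(i, j, powers, gains, theta, N0):
    """Physical-model rate of Eq. (4.3): link i -> j carries log2(1 + theta)
    bits/s/Hz when its SINR supports that rate, and is otherwise silent.
    gains[k][j] plays the role of the attenuation l(k, j) = min{1, c*d^(-alpha)}."""
    interference = sum(powers[k] * gains[k][j]
                       for k in range(len(powers)) if k != i)
    sinr = powers[i] * gains[i][j] / (N0 + interference)
    return math.log2(1.0 + theta) if math.log2(1.0 + sinr) >= theta else 0.0

# Toy 3-node example: user 0 transmits to nearby user 1; user 2 is a far interferer.
powers = [1.0, 0.0, 1.0]
gains = [[1.0, 1.0, 0.01],
         [1.0, 1.0, 0.01],
         [0.01, 0.01, 1.0]]
rate_ok = link_rate(0, 1, powers, gains, theta=2.0, N0=1e-3)       # threshold met
rate_blocked = link_rate(0, 1, powers, gains, theta=8.0, N0=1e-3)  # threshold missed
```

The all-or-nothing rate is what makes this model convenient for scaling-law arguments: a scheduled link either delivers a fixed rate or counts as silent.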
We then define the average throughput of user $u$ for a given number of users $n$ and location placement $r$ as $T_u(n,r) = \mathbb{E}[T_u \mid \mathsf{n} = n, P = r]$, where the expectation is taken over the file requests $F$ of the users, the file placement $G$ of the users, and the file delivery scheme. Subsequently, we define

$$T_{\mathrm{user}}(n,r) = \min_{u\in\mathcal{U}} T_u(n,r). \quad (4.4)$$

Finally, the expected minimum average throughput of a user in the network is defined as

$$T_{\mathrm{user}} = \mathbb{E}_{\mathsf{n},P}\left[T_{\mathrm{user}}(n,r)\right], \quad (4.5)$$

where the expectation is taken over $\mathsf{n}$ and $P$. When the number of users in the network is $n$, we define

$$N_o(n) = \sum_{u\in\mathcal{U}} \mathbf{1}\{\mathbb{E}[T_u \mid P,F,G] = 0\} \quad (4.6)$$

as the number of users in outage, where $\mathbf{1}\{\mathbb{E}[T_u \mid P,F,G] = 0\}$ is the indicator function whose value is 1 if $\mathbb{E}[T_u \mid P,F,G] = 0$, and 0 otherwise. Intuitively, $\mathbf{1}\{\mathbb{E}[T_u \mid P,F,G] = 0\}$ is equal to one when the file delivery scheme cannot deliver the desired file to user $u$. We note that the expectation in $\mathbb{E}[T_u \mid P,F,G]$ is taken over the file delivery scheme, and $\mathbf{1}\{\mathbb{E}[T_u \mid P,F,G] = 0\}$ is a random variable whose distribution is a function of $P$, $F$, and $G$. The outage probability in the case of $n$ users is then defined as

$$p_o(n) = \frac{1}{n}\mathbb{E}_{P,F,G}[N_o(n)] = \frac{1}{n}\sum_{u\in\mathcal{U}} P\left(\mathbb{E}[T_u \mid P,F,G] = 0\right). \quad (4.7)$$

Consequently, the network outage probability is defined as

$$p_o = \mathbb{E}_{\mathsf{n}>0}[p_o(n)] + P_N(\mathsf{n} = 0). \quad (4.8)$$

Note that since we consider $N\to\infty$, $P_N(\mathsf{n} = 0)$ is negligible in the asymptotic analysis. In the following, we analyze the throughput-outage performance in terms of $T_{\mathrm{user}}$ and $p_o$. We are especially interested in the regime in which the outage probability $p_o$ is small, i.e., $p_o = \epsilon$, where $\epsilon$ is a negligibly small number or converges to zero.

4.3 Achievable Throughput-outage Performance

In this section, we derive the achievable throughput-outage performance of the network, where we say $(T(P_o), P_o)$ is achievable if there exists a caching and multi-hop file delivery scheme such that $T_{\mathrm{user}} \geq T(P_o)$ and $p_o \leq P_o$.
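The outage event can be illustrated with a small Monte-Carlo sketch (the popularity and caching profiles below are arbitrary illustrative choices, not an optimized policy): a request for file $f$ is in outage when none of the $n \sim \mathrm{Poisson}(g)$ reachable users cached $f$. Averaging $(1-P_c(f))^n$ over the Poisson count gives $e^{-g P_c(f)}$ (the Poisson probability generating function), which is the hit/miss structure exploited in the cluster analysis of this chapter.

```python
import numpy as np

rng = np.random.default_rng(2)

def outage_mc(P_r, P_c, g, trials=40000):
    """Monte-Carlo outage probability: a request for file f ~ P_r is in outage
    when none of the n ~ Poisson(g) reachable users holds f, given that each
    of them cached f independently with probability P_c[f]."""
    f = rng.choice(len(P_r), size=trials, p=P_r)
    n = rng.poisson(g, size=trials)
    miss = rng.random(trials) < (1.0 - P_c[f]) ** n
    return float(miss.mean())

M, g = 200, 40.0
ranks = np.arange(1, M + 1)
P_r = (ranks + 10.0) ** -0.8
P_r /= P_r.sum()                       # MZipf with gamma = 0.8, q = 10
P_c = np.minimum(1.0, 5.0 * P_r)       # an arbitrary valid caching profile

estimate = outage_mc(P_r, P_c, g)
closed_form = float((P_r * np.exp(-g * P_c)).sum())   # Poisson-pgf prediction
```

The agreement between `estimate` and `closed_form` is exact in expectation; only Monte-Carlo noise separates them.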
We will first provide the achievable file delivery scheme, and then propose the achievable caching scheme. From these, the achievable throughput-outage performance will be derived. In this section, we focus on the case $q\to\infty$, i.e., $q = \omega(1)$. The case $q = \Theta(1)$ will be discussed later in Ch. 4.5.

4.3.1 Achievable Caching and File Delivery Scheme

We consider the following achievable multi-hop file delivery scheme. Let $g_c(M)$ be a function of $M$ which goes to infinity as $M\to\infty$. A clustering approach is used to split the cell into equally-sized square clusters, in which each cluster has side length $\sqrt{\frac{g_c(M)}{N}}$; $g_c(M)$ is thus denoted as the cluster size. Different clusters can be activated simultaneously. Inter-cluster interference is avoided by a time division multiple access (TDMA) scheme with reuse factor $K$ [193]. Such a reuse scheme evenly applies $K$ colors to the clusters, and only clusters with the same color are activated on the same time-frequency resource for file delivery. We assume that a user in a cluster can only access files cached by users in the same cluster, either from its own cache or via (multi-hop) D2D communications following the multi-hop approach proposed in [103]. Specifically, denoting by $\mathcal{V}_f$ the set of users in a cluster that cache file $f$, we consider the following transmission policy: for each user $u$ in the cluster, if the requested file $f$ can be found in the caches of users in $\mathcal{V}_f$, then a user $v_f$, randomly selected from $\mathcal{V}_f$, is set as the source for delivering the requested (real) file $f$ to user $u$; if the requested file cannot be found in the cache of any user in the cluster, user $u$ is matched with a randomly selected user $v$ in the cluster, and user $v$ is set as the source for delivering a virtual file to user $u$. Note that it does not matter which file is delivered in this case, as the user is actually in outage.
After the matching of sources and destinations is established, the multi-hop approach proposed in [103] is applied directly to deliver both real and virtual files. Note that the delivery of virtual files does not generate throughput for the network, because users receiving virtual files are in outage and do not actually receive their desired files. However, we still include them in the multi-hop D2D communications for the convenience of the mathematical analysis. This scheme is suboptimal. Nevertheless, when the outage probability is either negligibly small or converging to zero, this scheme is orderwise optimal because the performance degradation caused by delivering virtual files is negligible. Also note that, since the multi-hop approach in [103] provides a symmetric per-user throughput for all users, this delivery scheme provides a symmetric per-user throughput for all users that are not in outage. Furthermore, since users cache files independently, the matching of the source-destination pairs here is equivalent to a uniformly random matching (see the proof in Appendix B.15). Finally, we note that the assumption that a user may obtain a desired file only from its own cluster seems rather restrictive. However, the fact that this scheme achieves (in the order sense) the outer bound shows that the inclusion of inter-cluster communication cannot change the scaling law. By adopting the aforementioned scheme, due to the symmetry of the network and the thinning property of the PPP, the throughput-outage performance of each cluster is the same as the throughput-outage performance of the whole network. We will thus focus on the analysis of a single cluster to derive $T_{\mathrm{user}}$ and $p_o$.
In addition, since a user is in outage, i.e., $T_u = 0$, only if this user cannot find the desired file at any user in the same cluster, the outage probability $p_o$ is equivalent to the probability that a user cannot find the desired file among the users in the same cluster. Accordingly, denoting the probability that a user can find the desired file in the cluster, i.e., the file hit rate, by $P_h$, it is clear that $P_h = 1 - p_o$. To obtain the achievable caching scheme, we first provide Lemma 1 for the closed-form expression of $p_o$. Then, serving as the achievable caching scheme, the caching policy that minimizes $p_o$ is given in Theorem 1.

Lemma 1: Considering the proposed file delivery scheme, cluster size $g_c(M)$, and caching distribution $P_c(\cdot)$, the outage probability of the proposed achievable scheme is

$$p_o = \sum_{f=1}^{M} P_r(f)\, e^{-g_c(M) P_c(f)}. \quad (4.9)$$

Proof. See Appendix B.1.

Theorem 1: Let $N\to\infty$, $M\to\infty$, $q\to\infty$, and $g_c(M)\to\infty$. Denote by $m^*$ the smallest index such that $P_c(m^*+1) = 0$. Let $C_2 = \frac{q}{S g_c(M)}$, and let $C_1$ be the solution of the equation $C_1 = 1 + C_2\log\left(1+\frac{C_1}{C_2}\right)$. The caching distribution $P_c(\cdot)$ that minimizes the outage probability $p_o$ is

$$P_c(f) = \left[\log\frac{z_f}{\nu}\right]^+, \quad f = 1,\ldots,M, \quad (4.10)$$

where $\nu = \exp\left(\frac{\sum_{f=1}^{m^*}\log z_f - S}{m^*}\right)$, $z_f = \left(P_r(f)\right)^{\frac{1}{g_c(M)}}$, $[x]^+ = \max(x, 0)$, and

$$m^* = \min\left(C_1 S g_c(M),\, M\right). \quad (4.11)$$

Proof. See Appendix B.2.

Remark 1: Similar to the results in [48], Theorem 1 indicates that the number of files with non-zero probability of being cached by users is at least of the same order as the plateau factor $q$: if $q = O(g_c(M))$, then $m^* = \Theta(g_c(M))$; if $q = \omega(g_c(M))$, then $m^* = \Theta(q)$. This is intuitive from the shape of the MZipf distribution: the most popular $q$ files (orderwise) have similar request probabilities, and we need to cache all of them to minimize the outage probability.
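The water-filling form of Theorem 1 can be checked numerically; the sketch below implements (4.10) with a direct search for the cutoff, evaluates the Lemma 1 outage expression, and solves the fixed point defining $C_1$ (assuming log denotes the natural logarithm; the MZipf parameters, $S$, and $g_c$ are illustrative):

```python
import numpy as np

def c1_constant(C2, iters=200):
    """Fixed point of C1 = 1 + C2 * log(1 + C1/C2) from Theorem 1."""
    c = 1.0
    for _ in range(iters):
        c = 1.0 + C2 * np.log(1.0 + c / C2)
    return float(c)

def optimal_caching(P_r, S, g_c):
    """Theorem 1 water-filling: P_c(f) = [log(z_f / nu)]^+ with
    z_f = P_r(f)^(1/g_c) and nu set so the cached files spend budget S."""
    logz = np.log(P_r) / g_c                  # log z_f (decreasing in f)
    M = len(P_r)
    for m in range(M, 0, -1):                 # largest cutoff with positive level
        lognu = (logz[:m].sum() - S) / m
        if logz[m - 1] > lognu:
            return np.clip(logz - lognu, 0.0, None)
    raise ValueError("no valid cutoff found")

ranks = np.arange(1, 501)
P_r = (ranks + 20.0) ** -0.7
P_r /= P_r.sum()                               # MZipf, gamma = 0.7, q = 20

S, g_c = 1.0, 80.0
P_c = optimal_caching(P_r, S, g_c)
p_o_opt = float((P_r * np.exp(-g_c * P_c)).sum())            # Lemma 1
p_o_uniform = float((P_r * np.exp(-g_c * (S / 500))).sum())  # uniform baseline
m_star = int(np.count_nonzero(P_c))
```

The optimal policy caches only a prefix of the popularity ranking (its size tracking $C_1 S g_c(M)$) and strictly beats spreading the same budget uniformly.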
Remark 2: Since Theorem 1 gives the optimal caching policy that minimizes the outage probability for a given cluster size, this caching policy also requires the smallest cluster size for a given outage probability. Consequently, for a given outage probability, the caching policy in Theorem 1 maximizes the network throughput of the clustered network, because the number of simultaneously activated clusters is maximized. Based on the achievable caching and file delivery scheme in this subsection, we subsequently characterize the achievable throughput-outage performance for both $\gamma < 1$ and $\gamma > 1$.

4.3.2 Throughput-Outage Performance for $\gamma < 1$

In this subsection, we consider $\gamma < 1$, $q = \omega(1)$, and $q = O(M)$, and characterize the achievable throughput-outage performance. We first provide Proposition 1 to characterize an upper bound on the outage probability $p_o$. Then, Theorem 2 and Corollary 1 characterize the achievable throughput-outage performance. Finally, we use Proposition 2 and Corollary 2 to show that it is necessary to have $g_c(M) = \Theta(M)$ to obtain a desirable outage probability when $\gamma < 1$.

Proposition 1: Let $M\to\infty$, $N\to\infty$, and $q\to\infty$. Suppose $\gamma < 1$ and let $D = \frac{q}{\rho M}$. Consider $g_c(M) = \frac{\rho M}{C_1 S} = o(N)$. When adopting the caching policy in Theorem 1, the outage probability $p_o$ is upper bounded as

$$p_o \leq (1-\gamma)\, e^{-(\rho - C_1)}\, \frac{D^{\gamma D}}{(1+D)^{\gamma(1+D)}} \left[(1+D)^{1-\gamma} - D^{1-\gamma}\right]^{-1}. \quad (4.12)$$

Proof. See Appendix B.3.

Theorem 2: Let $M\to\infty$, $N\to\infty$, and $q\to\infty$. Suppose $\gamma < 1$ and let $D = \frac{q}{\rho M}$. Consider $g_c(M) = \frac{\rho M}{C_1 S} = o(N)$. When adopting the caching policy in Theorem 1, the following throughput-outage performance is achievable:

$$T(P_o) = \Theta\left(\frac{1-P_o}{K}\sqrt{\frac{C_1 S}{\rho M}}\right), \quad (4.13)$$

with $P_o$ equal to the right-hand side of (4.12).

Proof. See Appendix B.4.

Corollary 1: Let $M\to\infty$, $N\to\infty$, and $q\to\infty$. Suppose $\gamma < 1$ and $g_c(M) = \frac{\rho M}{C_1 S} = o(N)$. When adopting the caching policy in Theorem 1 and considering $\rho = \Theta(1)$, the following throughput-outage performance is achievable: $T(P_o) = \Theta\left(\sqrt{\frac{S}{M}}\right)$
with

$$P_o = \epsilon_1(\rho), \quad (4.14)$$

where $\epsilon_1(\rho) > 0$ can be made arbitrarily small. Furthermore, when $\rho\to\infty$, i.e., $\rho = \omega(1)$, we obtain the following achievable throughput-outage performance:

$$T(P_o) = \Theta\left(\sqrt{\frac{S}{\rho M}}\right), \quad P_o = \Theta\left(e^{-\rho}\right) = o(1). \quad (4.15)$$

Proof. Corollary 1 follows directly from Theorem 2.

Remark 3: From Theorem 2 and Corollary 1, we see that when the outage probability is negligibly small, the achievable throughput is $\Theta\left(\sqrt{\frac{S}{M}}\right)$. In addition, Corollary 1 shows that when the outage probability converges to zero exponentially fast as $\rho\to\infty$, the achievable throughput is $\Theta\left(\sqrt{\frac{S}{\rho M}}\right)$. We will see later that this result meets the outer bound of the scaling law provided in Theorem 4 in Ch. 4.4.

Proposition 2: Let $M\to\infty$, $N\to\infty$, and $q\to\infty$. Suppose $\gamma \neq 1$ and $g_c(M)\to\infty$. Let $C_2 = \frac{q}{S g_c(M)}$ and consider $g_c(M) < \frac{M}{C_1 S}$. When adopting the caching policy in Theorem 1, the outage probability $p_o$ is:

p o =1 + (1 )e 1 C 1 1 C 1 S g c (M) M 1 C 1 C 1 +C 2 C 2 C 1 +C 2 C 2 C 1 1 + C 2 S gc(M) M 1 C 2 S gc(M) M 1 C 1 S g c (M) M 1 1 + C 2 C 1 1 C 2 C 1 1 1 + C 2 S gc(M) M 1 C 2 S gc(M) M 1 : (4.16)

Proof. See Appendix B.6.

Corollary 2: Let $M\to\infty$, $N\to\infty$, and $q\to\infty$. Suppose $\gamma < 1$ and $g_c(M)\to\infty$. Consider $g_c(M) = o(M)$. When adopting the caching policy in Theorem 1, we obtain:

$$p_o = 1 - o(1). \quad (4.17)$$

Proof. This is proved by using Proposition 2 with $\gamma < 1$ and $g_c(M) = o(M)$.

Remark 4: Corollary 2 indicates that when $\gamma < 1$, it is necessary to have $g_c(M) = \Theta(M)$ to guarantee a reasonable outage probability. Consequently, we are not interested in the cases where $g_c(M) = o(M)$.

4.3.3 Throughput-Outage Performance for $\gamma > 1$

In this section, the achievable throughput-outage performance is characterized for $\gamma > 1$, $q = \omega(1)$, and $q = o(M)$. We first use Proposition 3 and Corollary 3 to characterize the outage probability. Then, Theorem 3 and Corollary 4 characterize the achievable throughput-outage performance.

Proposition 3: Let $M\to\infty$, $N\to\infty$, and $q\to\infty$. Suppose $\gamma > 1$ and $g_c(M)\to\infty$.
Consider $g_c(M) = o(M)$ and $q = o(M)$. Let $C_2 = \frac{q}{S g_c(M)}$. When adopting the caching policy in Theorem 1, the outage probability $p_o$ is:

p o =1 + ( 1)e 1 C 1 1 C 1 C 1 +C 2 C 2 C 1 +C 2 C 2 C 1 C 2 C 1 1 C 1 C 2 1 C 1 C 1 +C 2 1 ! C 2 C 1 1 : (4.18)

Proof. See Appendix B.7.

Corollary 3: Let $M\to\infty$, $N\to\infty$, and $q\to\infty$. Suppose $\gamma > 1$ and $g_c(M)\to\infty$. Consider $g_c(M) = o(M)$, $q = o(M)$, and $g_c(M) = c_1\frac{q}{S}$, where $c_1$ is a sufficiently large constant. We can obtain $p_o = \epsilon_2(c_1)$, where $\epsilon_2(c_1) > 0$ can be arbitrarily small. Furthermore, when $c_1 = \omega(1)$, i.e., $q = o(g_c(M))$, we obtain $p_o = \Theta\left(c_1^{-(\gamma-1)}\right) = o(1)$.

Proof. See Appendix B.8.

Remark 5: Let $M\to\infty$, $N\to\infty$, and $q\to\infty$. Suppose $\gamma > 1$ and $g_c(M)\to\infty$. Consider $g_c(M) = o(M)$, $q = o(M)$, and $g_c(M) = c_1\frac{q}{S}$. If we adopt the caching policy in Theorem 1 and gradually decrease $c_1$, then $p_o(c_1)$ gradually increases. Note that this is a claim for which no rigorous proof is provided. However, it is very intuitive, because decreasing $c_1$ is equivalent to decreasing $g_c(M)$. Besides, it can be observed by evaluating Proposition 3 numerically (see Fig. 4.1).

Figure 4.1: Outage probability with respect to $q/g_c(M)$ in Remark 5, considering different $\gamma$ ($\gamma$ = 1.2, 1.6, 2.0).

Remark 6: Remark 5 indicates that if $g_c(M)$ is not large enough for caching the popular files in the plateau region of the MZipf distribution, the outage probability can be very large. On the contrary, Corollary 3 indicates that if $g_c(M)$ is large enough, the outage probability goes to zero as $c_1\to\infty$. In practice, this implies that if $g_c(M)$ is large enough for the cluster to cache the most popular $q$ files, we can obtain a reasonably good outage performance.

Theorem 3: Let $M\to\infty$, $N\to\infty$, and $q\to\infty$. Suppose $\gamma > 1$ and $g_c(M)\to\infty$. Consider $g_c(M) = o(M)$, $q = o(M)$, and $g_c(M) = c_1\frac{q}{S}$, where $c_1 = \Theta(1)$.
When adopting the caching policy in Theorem 1, the following throughput-outage performance is achievable:
T(P_o) = Θ((1 − P_o)/K · √(S/(ρ_1 q))), with P_o equal to the right-hand side of (4.18). (4.19)
Proof. See Appendix B.9.
Corollary 4: Let M → ∞, N → ∞, and q → ∞. Suppose γ > 1 and g_c(M) → ∞. Consider g_c(M) = o(M), q = o(M), and g_c(M) = ρ_1 q/S. When adopting the caching policy in Theorem 1 and considering ρ_1 = Θ(1) to be large enough, the following throughput-outage performance is achievable:
T(P_o) = Θ(√(S/(ρ_1 q))); P_o = δ_2(ρ_1), (4.20)
where δ_2(ρ_1) > 0 can be arbitrarily small. Furthermore, when considering ρ_1 = ω(1) → ∞, we obtain the following throughput-outage performance:
T(P_o) = Θ(√(S/(ρ_1 q))); P_o = Θ(1/(ρ_1)^{γ-1}) = o(1). (4.21)
Proof. This is obtained directly from Theorem 3 and Corollary 3.
Remark 7: Theorem 3 and Corollary 4 characterize the achievable throughput-outage performance. In particular, Corollary 4 indicates that we can achieve the throughput Θ(√(S/q)) with a negligibly small outage probability. It also shows that when the outage probability converges to zero at the rate Θ(1/(ρ_1)^{γ-1}), the achievable throughput is Θ(√(S/(ρ_1 q))). Besides, by comparing Corollary 1 and Corollary 4, we see that when the popularity distribution has a light tail, the scaling law improves: the performance is restricted by the order of q instead of M. Finally, we will see that the achievable throughput-outage performance in Corollary 4 is optimal, as it is tight to the outer bound provided in Theorem 5 in Ch. 4.4.
4.3.4 Finite-Dimensional Simulations
In this subsection, we provide results of finite-dimensional simulations in Fig. 4.2, which compare the theoretical (solid lines) and simulated (dashed lines) curves of the achievable throughput-outage performance. The simulations follow the caching policy proposed in Theorem 1 and the delivery approach described in Ch. 4.1.
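A miniature version of such a finite-dimensional simulation can be sketched as follows. This is an illustrative Python sketch only: the caching rule used here (each device samples S files from the popularity distribution itself) and all parameter values are our own assumptions, not the optimized policy of Theorem 1.

```python
import random

def mzipf_popularity(M, gamma, q):
    """MZipf popularity: P_r(f) proportional to (f + q)^(-gamma), f = 1..M."""
    w = [(f + q) ** (-gamma) for f in range(1, M + 1)]
    Z = sum(w)
    return [x / Z for x in w]

def estimate_outage(M, S, gamma, q, n_nodes, trials=2000, seed=0):
    """Monte Carlo outage estimate for one cluster of n_nodes devices.

    Each device caches S distinct files drawn from the popularity
    distribution (an illustrative caching distribution); a request is in
    outage if no device in the cluster holds the requested file.
    """
    rng = random.Random(seed)
    pop = mzipf_popularity(M, gamma, q)
    files = list(range(M))
    outages = 0
    for _ in range(trials):
        cached = set()
        for _ in range(n_nodes):
            picks = set()
            while len(picks) < S:  # sample S distinct files per device
                picks.add(rng.choices(files, weights=pop)[0])
            cached |= picks
        req = rng.choices(files, weights=pop)[0]
        outages += req not in cached
    return outages / trials
```

As expected from the discussion above, enlarging the cluster (i.e., increasing the number of cooperating devices, which plays the role of g_c(M)) drives the estimated outage down.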
[Figure 4.2: Comparison between the normalized theoretical results (solid lines) and normalized simulated results (dashed lines) in networks adopting S = 1 and M = 1000 under different g_c(M); (a) γ = 0.6, (b) γ = 1.6; curves shown for q = 2 and q = 200.]
The normalized throughput in the figures denotes the throughput normalized by the link capacity and the effective reuse factor when using the multi-hop scheme in [103]. Also, we assume that the routing is centrally controlled and that the number of bits to transmit is sufficiently large, so that the network has bits to deliver most of the time. The results show a constant-factor gap between the theoretical and simulated curves. This might be caused by a constant factor between the theoretical and simulated throughput results, as indicated in [103]. However, such a constant gap is minor when we consider the scaling-law behavior, which focuses on the order in the limiting case. On the other hand, the gap also indicates that, in practice, instead of pursuing a log-order gain in scaling, it might sometimes be more important to concentrate on optimizations that bring the system a constant-factor gain.
4.4 Outer Bound of the Throughput-Outage Performance
In this section, we derive the outer bound of the throughput-outage performance. In the following, we say a point (T(P_o), P_o) is dominant (thus serving as an outer-bound point) if, for any caching and delivery scheme achieving the throughput-outage pair (T_user, p_o), either T(P_o) ≥ T_user or P_o ≤ p_o is satisfied. Note that although there are different dominant points, we will specifically characterize the dominant points where P_o is either negligibly small or converging to zero.
Besides, we again consider only the cases where q → ∞ here. The cases where q = Θ(1) will be treated in Ch. 4.5.
Theorem 4: Let M → ∞, N → ∞, and q → ∞. Suppose γ < 1. When ρ' = Θ(1) is large enough, the throughput-outage performance of the network is dominated by:
T(P_o) = Θ(√(S/(ρ'M))); P_o = δ'_1(ρ'), (4.22)
where δ'_1(ρ') > 0 can be arbitrarily small. Furthermore, when ρ' = ω(1) → ∞, the throughput-outage performance of the network is dominated by:
T(P_o) = Θ(√(S/(ρ'M))); P_o = Θ(e^{-ρ'}) = o(1). (4.23)
Proof. See Appendix B.10.
Theorem 5: Let M → ∞, N → ∞, and q → ∞. Suppose γ > 1 and q = o(M). When considering ρ'_1 = Θ(1), the throughput-outage performance of the network is dominated by:
T(P_o) = Θ(√(S/(ρ'_1 q))); P_o = δ'_2(ρ'_1), (4.24)
where δ'_2(ρ'_1) > 0 can be arbitrarily small. Furthermore, when considering ρ'_1 = O(q^{1/(γ-1)}) → ∞ but ρ'_1 q = o(M), the throughput-outage performance of the network is dominated by:
T(P_o) = Θ(√(S/(ρ'_1 q))); P_o = Θ(1/(ρ'_1)^{γ-1}) = o(1). (4.25)
Proof. See Appendix B.11.
Remark 8: By comparing Corollary 1 and Theorem 4, we observe that the achievable throughput-outage performance and the outer bound are tight to each other when γ < 1. This indicates that the provided achievable scheme is orderwise optimal when the outage probability is either negligibly small or converging to zero.
Remark 9: By comparing Corollary 4 and Theorem 5, we again see that there is no gap between the achievable throughput-outage performance and the outer bound when γ > 1. This shows that the provided achievable scheme is orderwise optimal when the outage probability is either negligibly small or converging to zero.
4.5 Main Results for q = Θ(1) Distributions
In this section, we analyze the cases where q = Θ(1).
It should be noted that since, asymptotically, an MZipf distribution with q = Θ(1) and M → ∞ behaves equivalently to a Zipf distribution in terms of the throughput-outage performance, the results in this section are representative of the results for the Zipf distribution, i.e., q = 0. (Footnote 3: The throughput-outage performance for the standard Zipf distribution, i.e., q = 0, can also be derived by repeating the approaches used in this chapter with some algebraic differences.) Note that since the proofs for Theorems 6, 7, and 9 follow simply by setting q = Θ(1) and repeating the proofs of the corresponding theorems in Ch. 4.3 and Ch. 4.4, their proofs are omitted for simplicity.
The throughput-outage analysis results for q = Θ(1) are provided below. Theorem 6 provides the caching scheme used for deriving the achievable throughput-outage performance; Theorems 7 and 8 describe the achievable throughput-outage performance for γ < 1 and γ > 1, respectively; and Theorems 9 and 10 provide the throughput-outage outer bounds for γ < 1 and γ > 1, respectively.
Theorem 6: Let M → ∞, q = Θ(1), and g_c(M) → ∞. Denote m* as the smallest index such that P_c(m* + 1) = 0. The caching distribution P_c(·) that minimizes the outage is:
P_c(f) = [log(z_f/ν)]^+, f = 1, ..., M, (4.26)
where ν = exp((Σ_{f=1}^{m*} log z_f − S)/m*), z_f = (P_r(f))^{1/g_c(M)}, [x]^+ = max(x, 0), and
m* = min(⌊S g_c(M)⌋, M). (4.27)
Theorem 7: Let M → ∞, N → ∞, and q = Θ(1). Suppose γ < 1 and consider g_c(M) = ρ_zip M/S, where ρ_zip = Θ(1). When adopting the caching policy in Theorem 6 and considering ρ_zip = Θ(1), the following throughput-outage performance is achievable:
T(P_o) = Θ((1 − P_o)/K · √(S/(ρ_zip M))); P_o = (1 − γ)e^{γ − ρ_zip} ≜ δ_{1,zip}(ρ_zip). (4.28)
Furthermore, when considering ρ_zip → ∞, i.e., ρ_zip = ω(1), we obtain the following achievable throughput-outage performance:
T(P_o) = Θ(√(S/(ρ_zip M))); P_o = Θ(e^{-ρ_zip}) = o(1). (4.29)
Theorem 8: Let M → ∞, N → ∞, and q = Θ(1). Suppose γ > 1.
Consider g_c(M) = o(M) and g_c(M) = ρ'_{1,zip} q/S, where ρ'_{1,zip} = o(M) is any function that goes to infinity as M → ∞. When adopting the caching policy in Theorem 1, the following throughput-outage performance is achievable:
T(P_o) = Θ(√(S/ρ'_{1,zip})); P_o = Θ(1/(ρ'_{1,zip})^{γ-1}) = o(1). (4.30)
Proof. See Appendix B.12.
Theorem 9: Let M → ∞, N → ∞, and q = Θ(1). Suppose γ < 1. When ρ'_zip = Θ(1) is large enough, the throughput-outage performance of the network is dominated by:
T(P_o) = Θ(√(S/(ρ'_zip M))); P_o = δ'_{1,zip}(ρ'_zip), (4.31)
where δ'_{1,zip}(ρ'_zip) > 0 can be arbitrarily small. Furthermore, when ρ'_zip = ω(1) → ∞ as M → ∞, the throughput-outage performance of the network is dominated by:
T(P_o) = Θ(√(S/(ρ'_zip M))); P_o = Θ(e^{-ρ'_zip}). (4.32)
Theorem 10: Let M → ∞, N → ∞, and q = Θ(1). Suppose γ > 1. Suppose that ρ'_{2,zip} = o(M) is any function that goes to infinity as M → ∞. The throughput-outage performance of the network is dominated by:
T(P_o) = Θ(√(S/ρ'_{2,zip})); P_o = Θ(1/(ρ'_{2,zip})^{γ-1}) = o(1). (4.33)
Proof. See Appendix B.13.
Remark 10: By comparing Theorem 7 with Theorem 9, we observe that the achievable throughput-outage performance and the outer bound are tight when γ < 1. Likewise, by comparing Theorem 8 with Theorem 10, we see that the achievable throughput-outage performance and the outer bound are tight when γ > 1.
With the results in this section, we can provide some comparisons with the results in [47], [88], and [112]. We emphasize that although we aim to compare with the results in those papers, different papers use slightly different network setups and assumptions; the results in our work use the more practical PPP model for the user distribution. Note that [47] and [112] adopted a simpler model in which the number of users in the network is a fixed value; [88] assumes that users are on a grid and communications can only happen between adjacent users on the grid.
Besides, [47] provides a detailed throughput-outage analysis of the achievable performance for both γ < 1 and γ > 1, as well as an outer bound for γ < 1, although that outer bound is not tight. Ref. [88] provides the achievable throughput scaling law without addressing the outer bound. In contrast, [112] provides only the outer bound without addressing the achievable performance. Notably, both [88] and [112] use centralized caching policies and do not consider the outage probability. We note that, compared with a centralized caching policy, the randomized policy adopted in this work is easier to implement and more robust to user mobility [10]. The quantitative comparisons are given in the following remarks.
Remark 11: By comparing Theorems 7 and 8 with the results in [47], we observe that our proposed achievable scheme improves the achievable throughput-outage performance. Specifically, the comparison shows that, when the outage converges to zero, the achievable throughput increases from Θ(√(S/(M log N))) to Θ(√(S/(ρ_zip M))) when γ < 1, and from Θ(√(S/log N)) to Θ(√(S/ρ'_{1,zip})) for γ > 1, where ρ_zip and ρ'_{1,zip} can be any functions such that ρ_zip, ρ'_{1,zip} = ω(1) as M → ∞. In summary, our results improve the achievable throughput by almost O(√(log N)) when the outage probability converges to zero.
Remark 12: By comparing Theorems 9 and 10 with the outer bound in [47], we observe that our outer bound improves the outer bound in [47], again by almost O(√(log N)), for the case γ < 1.
Remark 13: When comparing our results with the results in [88] and [112], we see that our achievable throughput performance and outer bound are (orderwise) almost identical to the outer bound provided in [112] when a negligibly small outage probability is allowed - they are Θ(√(S/M)) for γ < 1 and virtually Θ(√S) for γ > 1.
The main difference between our results and the results in [88] and [112] is that they have an additional regime for 1 < γ < 3/2, in which the throughput per node is worse than Θ(√S), whereas our results only have the single regime γ > 1. The likely reason is that allowing a tiny outage probability (either negligibly small or converging to zero) in our results eliminates that regime. Note that since we consider a randomized caching policy, it is not possible to have an outage probability that is exactly zero, as opposed to the centralized caching strategies in [88] and [112].
Footnote 4: Theorem 10 actually provides an outer bound that characterizes the convergence rate in more detail than the corresponding outer bound in [47].
4.6 Conclusions
In this work, we conducted a scaling-law analysis of the throughput-outage performance of cache-aided D2D networks with multi-hop communications, under the PPP and MZipf models for the user distribution and popularity distribution, respectively. By demonstrating that there is no gap between the proposed achievable performance and the outer bound, optimality is established. When q = ω(1), our results show that the optimal throughput per user scales as Θ(√(S/M)) if γ < 1 and Θ(√(S/q)) if γ > 1. In addition, when q = Θ(1), our results show that the optimal throughput per user is Θ(√(S/M)) if γ < 1 and almost Θ(√S) if γ > 1. All these results hold with an outage probability that is either negligibly small or converging to zero, corresponding to the small outage requirement in practice. Since the analysis for q = Θ(1) is representative of the results for the standard Zipf distribution, our results close the gap between the achievable throughput-outage performance and the outer bound that existed in the literature.
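The caching distribution of Theorem 6 has a water-filling structure in the log domain. The sketch below illustrates that structure under our reading of (4.26): the functional form P_c(f) = [log(z_f/ν)]^+ with z_f = P_r(f)^{1/g_c(M)}, the choice of ν via the cache-space constraint Σ_f P_c(f) = S, and the omission of any cap P_c(f) ≤ 1 are all assumptions of this illustration, not a verbatim implementation of the theorem.

```python
import math

def theorem6_caching(pop, S, gc):
    """Log-domain water-filling cache distribution (illustrative sketch).

    pop: popularity P_r(f), sorted or not; S: cache budget; gc: cluster size.
    Returns P_c(f) = max(log(z_f) - log(nu), 0) with z_f = pop[f]**(1/gc)
    and nu chosen so that the positive parts sum to S.
    """
    M = len(pop)
    logz = sorted((math.log(p) / gc for p in pop), reverse=True)
    # Find the water level: with the m most popular files active,
    # nu satisfies sum_{f<=m}(log z_f - log nu) = S.
    for m in range(1, M + 1):
        log_nu = (sum(logz[:m]) - S) / m
        if m == M or logz[m] <= log_nu:  # the (m+1)-th file would get 0
            break
    return [max(math.log(p) / gc - log_nu, 0.0) for p in pop]
```

By construction the active (most popular) files absorb the whole budget S, and files below the water level are not cached at all, mirroring the role of m* in the theorem.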
CHAPTER 5
Caching Policy and Cooperation Distance Design for Base Station Assisted Wireless D2D Caching Networks: Throughput and Energy Efficiency Optimization and Trade-Off
5.1 Introduction
From Ch. 2, we know that cache-aided wireless D2D networks have been widely explored. However, some practical aspects have not been fully investigated. First, while many papers investigate self-caching and D2D caching, their interaction has not been well explored, and the impact of self-caching has occasionally been overlooked even though it strongly influences the network [71, 74]. In addition, although caching policies have been designed in pursuit of different objectives, e.g., cache hit rate, successful access probability [66, 73], latency [67, 71], throughput [42, 46, 69], and energy efficiency (EE) [65, 68], different objectives generally conflict with one another and have their own disadvantages. When using the cache hit rate, the designs aim to maximize the probability that a user can reach the desired file through D2D communications, while ignoring the potential help from the base station (BS). Also, an optimal hit rate does not imply that the system throughput is optimal as well [74]. Similarly, when considering network throughput, the EE of users cannot be guaranteed. On the other hand, when focusing on optimizing EE, the network throughput might be sacrificed [198]. As a result, in order to improve the design of the system, it is necessary to comprehensively explore the trade-offs between different objectives. Finally, we observe that the cooperation distance strongly impacts D2D networks [42, 46, 68, 71]. Generally speaking, a larger cooperation distance provides better caching cooperation between users, i.e., a user has a higher chance to obtain the desired content via D2D links, while on the downside it leads to higher power consumption and a lower frequency reuse gain.
This trade-off motivates the interest in exploring the effect of the cooperation distance. We note that although the optimization of cooperation has drawn some attention, the optimization of the cooperation distance has not been investigated in all aspects, especially concerning the trade-off between different objectives. Based on the above observations, we investigate the caching policy and cooperation distance design in cache-aided wireless D2D networks in this chapter.
5.1.1 Contributions
From the previous discussion, it can be concluded that a comprehensive understanding of different optimal designs and their trade-offs is necessary. In addition, the investigation of designs that provide the best compromise between different objectives is still far from conclusive. The insufficiency lies in several aspects: (i) lack of a joint design of caching policy and cooperation distance; (ii) reliance on numerical results and/or simplified models; (iii) disregard of the effect of self-caching and its interaction with D2D caching; (iv) absence of analysis and design for the trade-off between the fundamental throughput and energy-efficiency aspects. Therefore, our work aims to address these issues.
In this work, a BS-assisted D2D caching network is considered. We focus on optimizing the caching policy and cooperation distance in terms of network throughput and EE, respectively. We also discuss the throughput-EE trade-off and the network designs that achieve this trade-off. To jointly consider the effects of the BS, D2D caching, and self-caching, we consider users being able to access the desired file through BS links, D2D links, or their own caches. To embody the effect of the cooperation distance and to mitigate interference between D2D links, a clustered D2D network configuration [46, 71] is adopted with a specified power control policy and frequency reuse approach. D2D communications are allowed only between users in the same cluster.
Since different cooperation distances manifest different cluster sizes, cooperation gains, and power consumptions, this network configuration, along with the different objectives, offers the flexibility to investigate different types of designs and their trade-offs.
Network throughput analyses of two network structures, which we call the "random-push" and the "prioritized-push" networks, are provided in this work. Although the prioritized-push network is more spectrally efficient and practical, it suffers from a complicated formulation that makes its exact optimization intractable. In contrast, the random-push throughput is easier to analyze. We thus first analyze the random-push network, and then, building on those results, we provide tractable approximations for the throughput of the prioritized-push network. Since the throughput of the random-push network and the proposed approximation for the prioritized-push network are both concave functions for a fixed cooperation distance, the throughput-based design is converted into a standard concave program with a one-dimensional search, whose solution can be effectively obtained by a simple quantization. To analyze the network EE, methodologies similar to the throughput analysis are exploited to first analyze the network power consumption. The network EE is then obtained by combining the analytical results for throughput and power consumption. Since the resulting EE expressions are quasi-concave for a fixed cooperation distance, the EE-based optimization becomes a standard quasi-concave program, for which the optimal solutions are attainable.
To investigate the throughput-EE trade-off, the concept of Pareto optimality in multi-objective optimization is exploited [199]. By introducing the weighted-sum method [199], the optimal trade-off design problem is proposed and a solution approach is provided by exploiting results in [200].
We note that the trade-off design can be interpreted as the design providing a compromise between two distinct objectives via adopting different cooperation distances and caching policies. Simulations considering practical parameters and network configurations are offered to validate our theoretical analysis and evaluate the proposed designs. The proposed designs can outperform, in terms of the targeted objectives, designs that do not jointly consider the effects of BS communications, D2D communications, and self-caching, and they provide a better trade-off. The insights of the designs and the effects of critical parameters are also discussed. Our main contributions are summarized as follows:
- By jointly considering the effects of the BS, D2D caching, and self-caching, and the impact of the cooperation distance, we analyze network throughput and EE and propose mathematically tractable approximate formulations for the clustered network considering both active and inactive users with specifically designed power control and resource reuse policies.
- By exploiting the throughput and EE formulations, we propose the corresponding caching policy and cooperation distance design problems. We also show that the proposed optimization problems can be effectively solved by conversion to standard concave and quasi-concave programs along with a simple one-dimensional search.
- We characterize the trade-off between throughput and EE and formulate the corresponding trade-off design problem. To the best of our knowledge, this work is the first to address the trade-off between throughput and EE and to provide the corresponding trade-off design.
- By considering practical network parameters and configurations in simulations, we validate the proposed analyses and evaluate the proposed designs.
The remainder of the paper is organized as follows. In Ch. 5.2, the adopted network configurations and system models are introduced. In Ch. 5.3, the throughput analysis and the corresponding optimization are provided.
The EE formulation and its optimal design are proposed in Ch. 5.4. We analyze the throughput-EE trade-off and propose the trade-off design in Ch. 5.5. Numerical results and corresponding discussions are provided in Ch. 5.6. Conclusions are provided at the end of this chapter.
5.2 Content Caching and System Modeling of Base Station Assisted Wireless D2D Caching Networks
5.2.1 Network and System Models
This work considers a BS-assisted cache-enabled wireless D2D network and adopts the clustering structure presented in [46, 71]. To wit, a square cell with side length D is served by a BS and is split into several equal-sized square clusters with side length (henceforth called cluster size) d, where D2D communication is allowed between two devices within the same cluster. The number of clusters in a cell is then N = D²/d². With a slight loss of practicality, a fractional number of clusters is allowed for mathematical tractability and simplicity. We consider two non-overlapping frequency bands for establishing BS communications and D2D communications, respectively. For communications between the BS and the devices, the time-frequency resources of the BS band are shared by all clusters via an orthogonal multiple access approach, such as FDMA. To guarantee a minimum video streaming quality, each BS link assigned to a user obtains the same, fixed amount of resources (bandwidth and power). Typically, the data rate achievable on such a BS link is significantly lower than on a D2D link. Since the amount of resources is limited, there exists a maximum number of users N_BS that can simultaneously use BS links.
Adopting the clustered network structure provides the following benefits: (i) tractable closed-form expressions for critical metrics can be obtained (Footnote 1: We note that a dynamic D2D scheduler, such as [69], provides better spectral efficiency; however, it is very challenging to find the optimal caching policy for this case, and only some heuristic designs are known [69].); (ii) the results are easily extensible to analyses of other aspects; and (iii) the resulting designs can serve as a benchmark/reference system for other systems, i.e., the performance achieved with a clustering approach constitutes an achievable lower bound.
D2D communications are considered only between users within the same cluster. Consequently, we call (with slight abuse of definition) d also the cooperation distance; in fact, "cluster size" and "cooperation distance" will be used interchangeably throughout this paper. Resources for D2D communications are spatially reused between the clusters. The reuse scheme evenly applies K colors to the clusters, and only clusters with the same color can be active on the same time-frequency resource for D2D communications. Note that the adopted reuse scheme is analogous to the spatial reuse scheme in conventional cellular networks [193], and K is the reuse factor.
In this work, we adopt a simplified channel model in which only the path-loss effect is considered, for mathematical tractability. Channel randomness, such as small-scale fading and shadowing, is ignored because link-level optimization is not employed, the channel randomness can be averaged out by using frequency diversity and properties of Poisson point processes (PPPs), and the caching policy is designed and operated over a long time scale. The path-loss model is
20 log₁₀(4πd₀/λ_c) + 10α log₁₀(d_TR/d₀) [dB], (5.1)
where d_TR is the distance between transmitter and receiver, λ_c is the wavelength of the carrier frequency, α is the path-loss exponent, and d₀ is the breakpoint distance.
To restrict the interference between different clusters and to maintain the received signal power as d changes, a power control policy is adopted such that (Footnote 2: Correcting our conference version [198]: the multiplier √2 in the same equation of [198] is unnecessary. This revision generally does not have any impact on the results in this paper or in [198].)
E_D = ε((√K − 1)d/d₀)^α (4πd₀/λ_c)², (5.2)
where E_D is the transmission power for D2D communications and ε, which is a choice of the designer, is the maximum allowable interference between two clusters using the same resource. Thus, by fixing ε to be sufficiently small, the interference can be effectively avoided. Besides, under this policy, the average received power of users in a cluster is maintained even if the cluster size is adjusted for optimization purposes, mainly because E_D scales with d on the order of d^α. Note that this power control policy depends only on system parameters, and no attempt is made to adapt it to the channel states/distances between TX and RX. Hence, given the system parameters, the transmission powers of all D2D links in the network are identical. Also note that, since interference is avoided when ε is sufficiently small, the interference between clusters will be ignored in the remainder of the paper.
In this work, users can obtain the desired content via their own caches, D2D communications, or BS communications, with different transmission qualities and costs. We denote the throughput for a user that accesses content via a BS link as T_B, via a D2D link as T_D, and from its own cache as T_S, and consider T_S ≥ T_D > T_B. Note that we generally assume T_B, T_D, and T_S to be invariant with respect to the cluster size d; these assumptions are reasonable when the power control policy in (5.2) is adopted and the amount of BS resources assigned to each BS link is the same.
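For concreteness, the path-loss model (5.1) and the power-control rule (5.2) can be evaluated as in the sketch below. This is an illustrative Python sketch under our reading of (5.2), namely that E_D is set so that the received interference at the nearest co-channel cluster, at distance (√K − 1)d, equals the cap ε; the function names and all parameter values are our own assumptions.

```python
import math

def path_loss_db(d_tr, d0, lam_c, alpha):
    """Breakpoint path-loss model of (5.1), in dB."""
    return 20 * math.log10(4 * math.pi * d0 / lam_c) \
        + 10 * alpha * math.log10(d_tr / d0)

def d2d_tx_power(eps, K, d, d0, lam_c, alpha):
    """D2D transmit power under the power-control policy (5.2).

    Chosen so that the signal attenuated over distance (sqrt(K)-1)*d
    (the nearest co-channel cluster) arrives at the interference cap eps.
    """
    return eps * ((math.sqrt(K) - 1) * d / d0) ** alpha \
        * (4 * math.pi * d0 / lam_c) ** 2
```

Dividing E_D by the linear path gain at distance (√K − 1)d indeed returns ε, and E_D scales as d^α, which is what keeps the average received power inside a cluster invariant when d is varied.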
(Footnote 3: Here T_B, T_D, and T_S can be generalized to include the perspective of user satisfaction by considering the effective or weighted throughput. Our results hold as long as the inequality T_S ≥ T_D > T_B holds.) (Footnote 4: This assumption is in line with policies of network providers that do not charge video traffic to users as long as they opt for the lowest possible quality.) Furthermore, we assume that the throughput of a user is independent of the actual distance between transmitter and receiver, which is practical when a fixed modulation-and-coding scheme is used. Similarly to the throughput case, we denote the power consumption for a user to access content via a BS link as E_B; the power consumption for a user to access content via a D2D link is, by definition, E_D in (5.2); we consider only E_B > E_D. (Footnote 5: Similarly, E_B and E_D could be generalized (by using the effective or weighted power consumption) to include the different impacts of power consumption. For example, we can emphasize the importance of the power consumption of the users by giving the user power consumption a larger weight. Our results hold as long as the inequality E_B > E_D holds.) Zero power consumption is assumed if the user can access the desired content from its own cache. For simplicity, we assume that the energy cost is purely determined by the RF energy required for transmission; access to storage and coding/decoding is assumed to be negligible in comparison. We assume that the BS is equipped with an unlimited backhaul connected to repositories containing all contents of the library. Thus, the request of a user can always be satisfied (with a minimum video quality) if a BS link is available for that user.
We consider two different types of users in this work: active and inactive users. An active user is a user
who places a request that needs to be satisfied and participates in the D2D cooperation (i.e., sends files to other users that request them); an inactive user is a user who does not place a request of its own but still participates in the D2D cooperation. We consider both active and inactive users to be independently distributed according to homogeneous Poisson point processes (HPPPs) with user densities λ_a and λ_i, respectively. Hence the overall user distribution is an HPPP with density λ_u = λ_a + λ_i.
The library consists of M files, all of the same size. Each user is assumed to be able to cache S files on its device. A random caching policy [127] is employed by the users, and all users adopt the same caching policy. Denoting b_m as the probability for a user to cache file m, the caching policy is expressed as {b_m}_{m=1}^M, where Σ_{m=1}^M b_m = S ≤ M. All users follow the same request probability distribution. The request probability of a user for file m, i.e., the probability that a user wants file m, is denoted as a_m, with 0 ≤ a_m ≤ 1, ∀m, and Σ_{m=1}^M a_m = 1. The notation used in this paper is summarized in Table 5.1.
5.2.2 Elementary Access Probability Analysis
Here the elementary access probabilities of the different transmission approaches are analyzed. The results will serve as the foundation for further results in the subsequent sections. Consider the caching policy {b_m}_{m=1}^M.
The self-caching probability of a user is defined as the probability that the desired file of the user can be found in its own cache:
P_S = Σ_{m=1}^M a_m b_m. (5.3)
Then, if there are k users in a cluster, the probability that a user cannot find the desired content through self-caching or D2D communications is
P_B,k = Σ_{m=1}^M a_m (1 − b_m)^k, (5.4)
where (1 − b_m)^k is the probability that file m is not in the caches of the k users of the cluster, and therefore a_m(1 − b_m)^k is the probability that the user wants file m but file m is not in the caches of the users of the cluster.
Table 5.1: Summary of Notation
D, d, N, K: cell size; cluster size (cooperation distance); number of clusters; reuse factor
λ_c, α, d₀, ε: carrier frequency (wavelength); path-loss exponent; breakpoint distance; maximum allowable interference power
λ_a, λ_i, λ_u: density of active users; of inactive users; of overall users
μ_a, μ_i, μ_u: average number of active users; of inactive users; of overall users (in a cluster)
P^a_k, P^i_n, P^u_k: probability (in a cluster) of the number of active users being k; of the number of inactive users being n; of the number of overall users being k
T_B, T_D, T_S: throughput when using a BS link; a D2D link; self-caching
E_B, E_D: power consumption when using a BS link; a D2D link, as described in (5.2)
S, M: cache space of a user device; number of files in the library
b_m, a_m: probability that a device caches file m; probability that an active user requests file m
P_S, P_B,k, P_D,k: elementary access probabilities; see the definitions in (5.3), (5.4), and (5.5)
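The probabilities (5.3)-(5.4), together with the D2D access probability defined next, partition the possible outcomes of a request. A minimal Python sketch (the popularity and caching vectors used in it are illustrative assumptions):

```python
def access_probabilities(a, b, k):
    """Elementary access probabilities (5.3)-(5.5) for a cluster of k users.

    a[m]: request probability of file m (sums to 1);
    b[m]: probability that a device caches file m.
    """
    P_S = sum(am * bm for am, bm in zip(a, b))              # self-cache, (5.3)
    P_B = sum(am * (1 - bm) ** k for am, bm in zip(a, b))   # needs the BS, (5.4)
    P_D = 1.0 - P_B - P_S                                   # served via D2D, (5.5)
    return P_S, P_B, P_D
```

Note the sanity checks implied by the definitions: the three probabilities always sum to one, and for a single-user cluster (k = 1) the D2D probability vanishes, since the only cache in the cluster is the requester's own.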
Finally, when both BS and D2D links are available to a user, the probability that the user obtains the desired file via a D2D link is
P_D,k = 1 − P_B,k − P_S = 1 − Σ_{m=1}^M a_m(1 − b_m)^k − Σ_{m=1}^M a_m b_m. (5.5)
5.3 Caching Policy and Cooperation Distance Design for Throughput Optimization
In this section, the caching policy and cooperation distance design is investigated with the goal of optimizing the network throughput. We first analyze the network throughput of two different network structures, i.e., the random-push and prioritized-push networks. Then the optimization approach is proposed. We note that although the prioritized-push network is more spectrally efficient, its throughput analysis is more challenging and builds on the analysis of the random-push network. Also, in the following analyses, we assume for simplicity that N_BS is sufficiently large to provide BS links to all users that need one. While this assumption might not hold in general, our simulations show that outage occurs mostly when the cluster size is very small (the number of clusters is large), which is usually not a cooperation distance of interest (see Fig. 5.5 in Ch. 5.6).
5.3.1 Throughput Analysis for Random-Push Networks
The random-push system operates as follows. For each cluster, the BS randomly chooses a user to serve without considering whether the user can obtain its desired content from its own cache. If the selected user can obtain the desired content from its own cache, the self-caching approach is used; otherwise, the BS checks whether the desired content can be found through D2D links. If yes, D2D communication is used; otherwise, the BS serves the selected user via a BS link. The rest of the users then check whether
This system is called random-push because the BS pushes content to a randomly selected user without considering whether the content has already been cached by that user. Note that since the resources of both the BS and D2D communications are shared in a cluster-based manner, only a single user in a cluster is allowed to communicate at a time. We now analyze the throughput of the random-push system. Under the HPPP model, the numbers of active users k and inactive users n in a cluster are Poisson random variables with probability mass functions (pmfs)

P^a_k = ((λ̄_a)^k / k!) e^{−λ̄_a},  k = 0, 1, 2, …,  and  P^i_n = ((λ̄_i)^n / n!) e^{−λ̄_i},  n = 0, 1, 2, …,    (5.6)

respectively, where λ̄_a = λ_a d^2 and λ̄_i = λ_i d^2. Suppose the numbers of active and inactive users in the cluster are k > 0 and n, respectively. Using the derived access probabilities, the throughput of the user selected by the BS is

T_{c,Ran;k,n} = T_D P_{D,k+n} + T_B P_{B,k+n} + T_S P_S
             = T_D + (T_B − T_D) [Σ_{m=1}^{M} a_m (1 − b_m)^{k+n}] + (T_S − T_D) Σ_{m=1}^{M} a_m b_m.    (5.7)

Hence the throughput of the cluster is

T_{c,Ran} = Σ_{n=0}^{∞} P^i_n Σ_{k=1}^{∞} P^a_k (T_{c,Ran;k,n} + (k − 1) T_S P_S)
          = T_D (1 − e^{−λ̄_a}) + (T_B − T_D) [Σ_{m=1}^{M} a_m e^{−(λ̄_a+λ̄_i) b_m}]
            + ((T_S − T_D)(1 − e^{−λ̄_a}) + T_S (λ̄_a − 1 + e^{−λ̄_a})) (Σ_{m=1}^{M} a_m b_m)
            − (T_B − T_D) e^{−λ̄_a} [Σ_{m=1}^{M} a_m e^{−λ̄_i b_m}],    (5.8)

where the closed form is obtained by applying (5.6), exchanging the order of summation, subtracting the k = 0 term, and using the normalization Σ_{k=0}^{∞} ((λ̄_a (1 − b_m))^k / k!) e^{−λ̄_a (1 − b_m)} = 1. It follows that the throughput of the system is

T_{s,Ran} = N T_{c,Ran}.    (5.9)

Lemma 1-1: For a given fixed d, (5.9) is a non-decreasing concave function with respect to the feasible set B = {0 ≤ b_m ≤ 1, ∀m}.

Proof. Consider B = {0 ≤ b_m ≤ 1, ∀m} and a given fixed d.
Since T_S ≥ T_D ≥ T_B > 0, λ̄_a > 0, and λ̄_i ≥ 0, it is straightforward to verify that the first-order partial derivatives of T_{s,Ran} are non-negative on B. Hence T_{s,Ran} is non-decreasing with respect to b_m, ∀m, over B. To prove that T_{s,Ran} is concave over B, note that the Hessian of T_{s,Ran} is a diagonal matrix with non-positive diagonal entries. Therefore the Hessian of T_{s,Ran} is negative semidefinite over B.

5.3.2 Throughput Analysis for Prioritized-Push Networks

Here we introduce the prioritized-push network, which is more practical and provides better spectral efficiency than the random-push network. It operates as follows. In each cluster, every active user first checks whether its request can be satisfied by its own cache. If yes, the request is directly satisfied and the user remains online for potential D2D cooperation; otherwise, the user sends the request to the BS. The BS collects all requests from users who cannot be satisfied by self-caching and checks whether any of them can be satisfied via D2D links. If yes, the BS picks one such user to be served by D2D communication; if not, the BS randomly picks one user to be served via a BS link using a given amount of BS resources. Thus, as long as there are active users and not all of them can be satisfied by self-caching, at least one user in the cluster is served either by D2D or by the BS. The same procedure is implemented in every cluster. It is immediately clear that the prioritized-push network is more spectrally efficient than the random-push network, which also serves one user per cluster but picks that user at random, without checking whether other users could use D2D communications (Footnote 6).

The throughput analysis for the prioritized-push network is more challenging. In the following, we therefore provide tractable approximations and use them for the optimization. Suppose the number of active users in the cluster is k > 0. The probability of each active user to have its desired file
Suppose the number of active users in the cluster isk> 0. The probability of each active user to have its desired file 6 We note that, for the prioritized-push network, the number of served users by the D2D and BS links is proportional to the number of clusters, so that an increase in the cluster size automatically means a reduction in the number of non-self-served users (though the throughput still might increase, due to the higher throughput of D2D). Having said that, if we want to guarantee serving the same number of users by the D2D and BS links when the number of clusters are different, we can simply add some additional BS users who can then provides an additional throughput on the top of our adopted network structure; this does not affect the optimization of the caching policy. 102 not to be cached in the D2D network of a cluster is: 1 X n=0 P i n M X m=1 a m (1b m ) k+n = M X m=1 a m (1b m ) k e i bm : (5.10) Note that the derivation here follows the same approach as in (5.8). Using (5.10) and assuming that each user is independent, 7 the probability there is no potential D2D link in the cluster is: " M X m=1 a m (1b m ) k e i bm # k : (5.11) Then by ignoring the small probability that all users are served by either self-caching or BS, the sum through- put of the users in a cluster is approximated using 8 T c,Pri 1 X k=1 P a k 2 4 T D + (T B T D ) " M X m=1 a m (1b m ) k e i bm # k +T S (k 1) M X m=1 a m b m 3 5 =T D (1e a ) + (T B T D ) 1 X k=1 P a k " M X m=1 a m (1b m ) k e i bm # k +T S a 1 +e a M X m=1 a m b m : (5.12) Note that the total throughput is simply T s,Pri = NT c,Pri . Eq. (5.12) is too complicated for conducting caching policy and cooperation optimizations. We thus propose further approximations for them. Obviously, the complication is due to the second term in (5.12). To approximate it, we distinguish between two cases: (i) a and (ii) a > , where 1 is a small number. 
9 This distinction is because we want to use two different approximations for different cases, i.e., a is small or large. When doing case 1, 10 since a is 7 This is generally not true because all users in the same cluster share the same D2D caching inventory. 8 This approximation does not work effectively when adopting a caching policy tending to be selfish and in a system whose popularity distribution is highly concentrated, e.g., = 1:3 andq = 0. However, in practice, the optimal caching policy tends to be selfish only in the case that the popularity distribution is highly concentrated or when the density of active users are overwhelmingly large, which rarely happens in practice. We note that under the practically considered popularity distributions in the simulations, this approximation works well. 9 The idea is similar to having the breaking-point in the path-loss model, and might be an empirically selected value. 10 Actually case 1 is much less important than case 2 since the optimal design usually needs more users. The reason for consid- ering case 1 is for the mathematical completeness. 103 small, i.e., there is a high probability to have a small number of active users, the most important terms of the summation are the first several terms. The following approximation with the parameter 1 is thus used: 1 X k=1 P a k " M X m=1 a m (1b m ) k e i bm # k 1 X k=1 P a k " M X m=1 a m (1b m ) k e i bm # (5.13) Then observe that the inner summation is the convex combination of several points located on a convex curve, we have 1 X k=1 P a k " M X m=1 a m (1b m ) k e i bm # 1 X k=1 P a k " M X m=1 a m (1b m ) k e i bm # 1 X k=1 P a k " M X m=1 a m (1b m ) k e i bm # ; (5.14) where the final inequality is because (1b m ) 1. By using (5.14), (5.12) can be expressed as T c,Pri-A1 =T D (1e a )+(T B T D ) " M X m=1 a m e (a+ i )bm # (T B T D )e a " M X m=1 a m e i bm # +T S a 1 +e a M X m=1 a m b m ; (5.15) which is a concave function (See Lemma 1-3 below). 
Considering λ̄_a > β, we approximate the outer exponent k by its mean value λ̄_a. We thus have the following approximation:

Σ_{k=1}^{∞} P^a_k [Σ_{m=1}^{M} a_m (1 − b_m)^k e^{−λ̄_i b_m}]^k
  ≈ Σ_{k=0}^{∞} P^a_k [Σ_{m=1}^{M} a_m (1 − b_m)^k e^{−λ̄_i b_m}]^{λ̄_a} − e^{−λ̄_a} [Σ_{m=1}^{M} a_m e^{−λ̄_i b_m}]^{λ̄_a}
  ≥ [Σ_{k=0}^{∞} P^a_k Σ_{m=1}^{M} a_m (1 − b_m)^k e^{−λ̄_i b_m}]^{λ̄_a} − e^{−λ̄_a} [Σ_{m=1}^{M} a_m e^{−λ̄_i b_m}]^{λ̄_a}
  = [Σ_{m=1}^{M} a_m e^{−(λ̄_a+λ̄_i) b_m}]^{λ̄_a} − e^{−λ̄_a} [Σ_{m=1}^{M} a_m e^{−λ̄_i b_m}]^{λ̄_a},    (5.16)

where the inequality follows because x^{λ̄_a} is convex in x for x ≥ 0 and λ̄_a ≥ 1, and because E[g(x)] ≥ g(E[x]) when g(·) is convex (Jensen's inequality). The resulting throughput is

T_{c,Pri-A2} = T_D (1 − e^{−λ̄_a}) + (T_B − T_D) [Σ_{m=1}^{M} a_m e^{−(λ̄_a+λ̄_i) b_m}]^{λ̄_a} − (T_B − T_D) e^{−λ̄_a} [Σ_{m=1}^{M} a_m e^{−λ̄_i b_m}]^{λ̄_a} + T_S (λ̄_a − 1 + e^{−λ̄_a}) Σ_{m=1}^{M} a_m b_m.    (5.17)

To characterize (5.17), Lemma 1-2 is provided:

Lemma 1-2: Suppose λ̄_a ≥ 1. Then [Σ_{m=1}^{M} a_m e^{−(λ̄_a+λ̄_i) b_m}]^{λ̄_a} and [Σ_{m=1}^{M} a_m e^{−λ̄_i b_m}]^{λ̄_a} are convex and non-increasing with respect to B = {0 ≤ b_m ≤ 1, ∀m}.

Proof. See Appendix C.1.

Since (5.17) is still non-convex, being a difference of convex functions, we further approximate it by dropping the third term of (5.17), resulting in a concave function (see Lemma 1-3 below):

T_{c,Pri-AC2} = T_D (1 − e^{−λ̄_a}) + (T_B − T_D) [Σ_{m=1}^{M} a_m e^{−(λ̄_a+λ̄_i) b_m}]^{λ̄_a} + T_S (λ̄_a − 1 + e^{−λ̄_a}) Σ_{m=1}^{M} a_m b_m.    (5.18)

We denote T_{c,Pri-AC} = T_{c,Pri-A1} if λ̄_a ≤ β, and T_{c,Pri-AC} = T_{c,Pri-AC2} otherwise. Lemma 1-3 then characterizes the properties of T_{c,Pri-AC}:

Lemma 1-3: For a given fixed d, T_{c,Pri-AC} is a non-decreasing concave function with respect to the feasible set B = {0 ≤ b_m ≤ 1, ∀m}.

Proof. The non-decreasing property and concavity of T_{c,Pri-A1} can be proved using the same approach as in Lemma 1-1; we omit the proof for brevity. Regarding T_{c,Pri-AC2}, the proof is immediate from Lemma 1-2 and the observation that T_B − T_D ≤ 0.

The simplification in (5.18) provides the tractability needed for optimizing caching policies: the throughput optimization problem becomes a standard concave program.
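To see what is lost in going from (5.17) to (5.18), the sketch below evaluates both for illustrative parameters mirroring the chapter's rate ordering T_B << T_D = T_S. The dropped term equals −(T_B − T_D) e^{−λ̄_a} [Σ_m a_m e^{−λ̄_i b_m}]^{λ̄_a} > 0, so T_{c,Pri-A2} ≥ T_{c,Pri-AC2}, with a gap that decays as λ̄_a grows, which is why the concave surrogate is used only in the large-λ̄_a branch.

```python
import math

T_B, T_D, T_S = 0.2, 20.0, 20.0    # Mbit/s; BS link much slower than D2D, as in the chapter
M, S, li = 50, 5, 3.0              # library, cache size, mean inactive users (illustrative)
w = [(m + 1) ** -0.6 for m in range(M)]
a = [x / sum(w) for x in w]
b = [S / M] * M                    # any feasible policy works for this comparison

def sums(la):
    s1 = sum(am * math.exp(-(la + li) * bm) for am, bm in zip(a, b))
    s2 = sum(am * math.exp(-li * bm) for am, bm in zip(a, b))
    ps = sum(am * bm for am, bm in zip(a, b))
    return s1, s2, ps

def t_pri_A2(la):
    """(5.17): mean-exponent approximation with all three terms."""
    s1, s2, ps = sums(la)
    return (T_D * (1 - math.exp(-la)) + (T_B - T_D) * s1 ** la
            - (T_B - T_D) * math.exp(-la) * s2 ** la
            + T_S * (la - 1 + math.exp(-la)) * ps)

def t_pri_AC2(la):
    """(5.18): third term of (5.17) dropped, which makes the objective concave."""
    s1, _, ps = sums(la)
    return (T_D * (1 - math.exp(-la)) + (T_B - T_D) * s1 ** la
            + T_S * (la - 1 + math.exp(-la)) * ps)

gap = {la: t_pri_A2(la) - t_pri_AC2(la) for la in (2.0, 5.0, 8.0)}
```

The gap is (e^{−1} s2)^{λ̄_a} scaled by T_D − T_B; since s2 ≤ 1, it shrinks geometrically in λ̄_a, consistent with the claim that the simplification only matters near the breaking-point.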
To justify this simplification, observe that the third term of (5.17) accounts for the case that there is no active user in the cluster. Because this branch of the approximation is used when λ̄_a is large, the simplification has only a minor impact except when λ̄_a is near the breaking-point β. Since the points where the simplification is ineffective are not near the optimal cooperation distance, the resulting error is of little importance. Besides, when we directly solve the non-convex T_{c,Pri-A2} using more advanced non-convex solution approaches, such as the concave-convex procedure [201], the performance does not improve.

5.3.3 Throughput-Based Caching Policy and Cooperation Distance Design

According to the analyses in Chs. 5.3.1 and 5.3.2, we design the caching policy and cooperation distance by solving the following optimization problem:

max_{d, b_m, ∀m=1,…,M}  T_sys = N (T_{c,Ran} or T_{c,Pri-AC})
subject to  Σ_{m=1}^{M} b_m ≤ S;  0 ≤ b_m ≤ 1, ∀m.    (5.19)

To solve (5.19), we first observe that, if we can solve its sub-problem for any given d, the problem becomes a simple one-dimensional problem over a small range. Note that d > 0 is generally within 100 meters for practical D2D communications, and, given that the optimal solution is attainable for a fixed d, the problem is solvable even by simple quantization of d without significant effort. We then provide the following proposition:

Proposition 1: For a given fixed d, (5.19) becomes a concave optimization problem, and its optimal solution must be tight at the equality of the sum constraint, i.e., the optimal solution (b_m)*, ∀m, satisfies Σ_{m=1}^{M} (b_m)* = S.

Proof. Follows from Lemmas 1-1 and 1-3.

By Proposition 1, the problem becomes a standard concave optimization problem, and any convex solver (Footnote 11) can be used to solve the problem.
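Proposition 1 reduces the fixed-d sub-problem to a concave program. The sketch below solves it for the random-push objective (5.8) by projected gradient ascent onto the capped simplex {0 ≤ b_m ≤ 1, Σ_m b_m = S} (tight by Proposition 1); the parameter values, step-size schedule, and bisection-based projection are illustrative choices, not the chapter's solver.

```python
import math

T_B, T_D, T_S = 0.2, 20.0, 20.0        # Mbit/s (illustrative)
M, S = 30, 4
la, li = 2.0, 3.0                      # mean active / inactive users for one fixed d
w = [(m + 1) ** -0.6 for m in range(M)]
a = [x / sum(w) for x in w]

def t_clu(b):
    """Random-push cluster throughput, closed form (5.8)."""
    s1 = sum(am * math.exp(-(la + li) * bm) for am, bm in zip(a, b))
    s2 = sum(am * math.exp(-li * bm) for am, bm in zip(a, b))
    ps = sum(am * bm for am, bm in zip(a, b))
    return (T_D * (1 - math.exp(-la)) + (T_B - T_D) * s1
            + ((T_S - T_D) * (1 - math.exp(-la)) + T_S * (la - 1 + math.exp(-la))) * ps
            - (T_B - T_D) * math.exp(-la) * s2)

def grad(b):
    """Partial derivatives of (5.8) with respect to each b_m."""
    c = (T_S - T_D) * (1 - math.exp(-la)) + T_S * (la - 1 + math.exp(-la))
    return [-(T_B - T_D) * (la + li) * am * math.exp(-(la + li) * bm) + c * am
            + (T_B - T_D) * math.exp(-la) * li * am * math.exp(-li * bm)
            for am, bm in zip(a, b)]

def project(v):
    """Euclidean projection onto {0 <= b_m <= 1, sum b_m = S} by bisecting a shift tau."""
    lo, hi = min(v) - 1.0, max(v)
    for _ in range(80):
        tau = 0.5 * (lo + hi)
        if sum(min(1.0, max(0.0, x - tau)) for x in v) > S:
            lo = tau
        else:
            hi = tau
    return [min(1.0, max(0.0, x - tau)) for x in v]

b = [S / M] * M                        # feasible start: the uniform policy
t_uniform = t_clu(b)
best_b, best_t = b, t_uniform
for it in range(300):
    step = 0.05 / (1 + 0.05 * it)      # diminishing step; keep the best iterate seen
    b = project([bm + step * gm for bm, gm in zip(b, grad(b))])
    t = t_clu(b)
    if t > best_t:
        best_b, best_t = b, t
```

The outer step described next in the chapter, quantizing d and comparing the per-d optima, simply wraps this routine in a one-dimensional search.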
The overall solving approach is summarized as follows: the cooperation distance d is first quantized to form sub-problems of (5.19); the optimal caching policies of the quantized sub-problems are then obtained by the convex solver; finally, by comparing the throughput results of the different sub-problems, we obtain the optimal caching policy along with the optimal cooperation distance.

Footnote 11: General convex solvers need to compute the Hessian matrix, which incurs a high computational cost when the dimension of the solution space is large. We note that a Lagrange-multiplier-based approach can be used to solve part of the problem, i.e., the part involving T_{c,Ran} and T_{c,Pri-A1}, more efficiently.

5.4 Caching Policy and Cooperation Distance Design for Energy Efficiency Optimization

In this work, we define the EE (bits/Joule) as the ratio of the total average throughput (bits/s) to the total average power consumption (Joule/s):

EE_sys = T_sys / P_sys = T_clu / P_clu,    (5.20)

where T_sys and P_sys are the average throughput and average power consumption of the system, respectively, and T_clu and P_clu are the average throughput and average power consumption of a cluster of the system. In the following, the EE is first analyzed in random-push and prioritized-push networks; then the design aiming to optimize the EE is proposed.

5.4.1 Energy Efficiency Analysis for Random-Push Networks

Recall that the average throughput of a cluster in the random-push network is derived in (5.8). Following the same approach, we can obtain the average power consumption of a cluster:

P_{c,Ran} = E_D (1 − e^{−λ̄_a}) + (E_B − E_D) [Σ_{m=1}^{M} a_m e^{−(λ̄_a+λ̄_i) b_m}] − E_D (1 − e^{−λ̄_a}) (Σ_{m=1}^{M} a_m b_m) − (E_B − E_D) e^{−λ̄_a} [Σ_{m=1}^{M} a_m e^{−λ̄_i b_m}].    (5.21)

Substituting (5.8) and (5.21) into (5.20) then yields the EE of the random-push network.

Lemma 2-1: For a given fixed d, T_{c,Ran}/P_{c,Ran} is a positive quasi-concave and non-decreasing function with respect to B.

Proof.
By noticing E_B ≥ E_D and following the same approach as in Lemma 1-1, P_{c,Ran} can be proved to be a positive convex function that is non-increasing with respect to B. Then observe that P_{c,Ran} is convex and non-increasing with respect to B; T_{c,Ran} is concave and non-decreasing with respect to B; and P_{c,Ran} and T_{c,Ran} are both positive. Thus, T_{c,Ran}/P_{c,Ran} is a positive quasi-concave and non-decreasing function with respect to B [202].

5.4.2 Energy Efficiency Analysis for Prioritized-Push Networks

Following ideas and derivations similar to those of the throughput analysis in Ch. 5.3.2, the power consumption of the prioritized-push network is approximated by

P_{c,Pri} ≈ E_D (1 − e^{−λ̄_a}) + (E_B − E_D) Σ_{k=1}^{∞} P^a_k [Σ_{m=1}^{M} a_m (1 − b_m)^k e^{−λ̄_i b_m}]^k;
P_{c,Pri-A1} = E_D (1 − e^{−λ̄_a}) + (E_B − E_D) [Σ_{m=1}^{M} a_m e^{−(λ̄_a+λ̄_i) b_m}] − (E_B − E_D) e^{−λ̄_a} [Σ_{m=1}^{M} a_m e^{−λ̄_i b_m}];
P_{c,Pri-A2} = E_D (1 − e^{−λ̄_a}) + (E_B − E_D) [Σ_{m=1}^{M} a_m e^{−(λ̄_a+λ̄_i) b_m}]^{λ̄_a} − (E_B − E_D) e^{−λ̄_a} [Σ_{m=1}^{M} a_m e^{−λ̄_i b_m}]^{λ̄_a};
P_{c,Pri-AC2} = E_D (1 − e^{−λ̄_a}) + (E_B − E_D) [Σ_{m=1}^{M} a_m e^{−(λ̄_a+λ̄_i) b_m}]^{λ̄_a},    (5.22)

where the first equation is the power-consumption counterpart of (5.12), the second that of (5.15), the third that of (5.17), and the fourth that of (5.18). Then, again substituting the approximations of the throughput and power consumption into (5.20), the approximation of the EE of the prioritized-push network is obtained. Denoting P_{c,Pri-AC} = P_{c,Pri-A1} if λ̄_a ≤ β, and P_{c,Pri-AC} = P_{c,Pri-AC2} otherwise, we have the following lemma:

Lemma 2-2: For a given fixed d, T_{c,Pri-AC}/P_{c,Pri-AC} is a positive quasi-concave and non-decreasing function with respect to B.

Proof. Following the same approach as in the proof of Lemma 1-1, P_{c,Pri-A1} can be proved to be positive, convex, and non-increasing with respect to B. Likewise, using Lemma 1-2, P_{c,Pri-AC2} can be proved to be positive, convex, and non-increasing with respect to B. Combining the above results proves Lemma 2-2.
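Combining the closed forms (5.8) and (5.21) gives the random-push EE of (5.20); the quasi-concavity just established is what makes a bisection on the EE target work, as formalized in (5.24)-(5.25) of Ch. 5.4.3. The sketch below maximizes the EE over the caching policy for one fixed d, where each feasibility check approximately maximizes the concave surrogate T − tP by projected gradient ascent; all numerical values and solver settings are illustrative assumptions.

```python
import math

T_B, T_D, T_S = 0.2, 20.0, 20.0        # Mbit/s
E_B, E_D = 0.4, 0.2                    # W (roughly 26 dBm and 23 dBm)
M, S, la, li = 20, 4, 2.0, 3.0
w = [(m + 1) ** -0.6 for m in range(M)]
a = [x / sum(w) for x in w]

def sums(b):
    s1 = sum(am * math.exp(-(la + li) * bm) for am, bm in zip(a, b))
    s2 = sum(am * math.exp(-li * bm) for am, bm in zip(a, b))
    ps = sum(am * bm for am, bm in zip(a, b))
    return s1, s2, ps

def t_clu(b):                          # (5.8)
    s1, s2, ps = sums(b)
    return (T_D * (1 - math.exp(-la)) + (T_B - T_D) * s1
            + ((T_S - T_D) * (1 - math.exp(-la)) + T_S * (la - 1 + math.exp(-la))) * ps
            - (T_B - T_D) * math.exp(-la) * s2)

def p_clu(b):                          # (5.21); self-caching consumes no transmit power
    s1, s2, ps = sums(b)
    return (E_D * (1 - math.exp(-la)) + (E_B - E_D) * s1
            - E_D * (1 - math.exp(-la)) * ps - (E_B - E_D) * math.exp(-la) * s2)

def grad_F(b, t):                      # gradient of T - t*P with respect to b
    cT = (T_S - T_D) * (1 - math.exp(-la)) + T_S * (la - 1 + math.exp(-la))
    g = []
    for am, bm in zip(a, b):
        e1, e2 = math.exp(-(la + li) * bm), math.exp(-li * bm)
        gT = -(T_B - T_D) * (la + li) * am * e1 + cT * am + (T_B - T_D) * math.exp(-la) * li * am * e2
        gP = -(E_B - E_D) * (la + li) * am * e1 - E_D * (1 - math.exp(-la)) * am \
             + (E_B - E_D) * math.exp(-la) * li * am * e2
        g.append(gT - t * gP)
    return g

def project(v):                        # projection onto {0 <= b_m <= 1, sum b_m = S}
    lo, hi = min(v) - 1.0, max(v)
    for _ in range(60):
        tau = 0.5 * (lo + hi)
        if sum(min(1.0, max(0.0, x - tau)) for x in v) > S:
            lo = tau
        else:
            hi = tau
    return [min(1.0, max(0.0, x - tau)) for x in v]

def max_F(t, iters=150):
    """Approximately maximize the concave surrogate T - t*P over the feasible set."""
    b = [S / M] * M
    best_b, best_F = b, t_clu(b) - t * p_clu(b)
    for it in range(iters):
        step = 0.02 / (1 + 0.05 * it)
        b = project([bm + step * gm for bm, gm in zip(b, grad_F(b, t))])
        F = t_clu(b) - t * p_clu(b)
        if F > best_F:
            best_b, best_F = b, F
    return best_b, best_F

b0 = [S / M] * M
ee_lo = t_clu(b0) / p_clu(b0)          # EE of the uniform policy: always achievable
lo, hi, b_best = ee_lo, 2 * ee_lo, b0
while max_F(hi)[1] >= 0:               # grow hi until the target t = hi is infeasible
    hi *= 2
for _ in range(18):                    # bisection on t, as in (5.24)-(5.25)
    t = 0.5 * (lo + hi)
    b, F = max_F(t)
    if F >= 0:                         # (5.25) feasible: EE(b) >= t is achievable
        lo, b_best = t, b
    else:
        hi = t
ee_best = t_clu(b_best) / p_clu(b_best)
```

Any accepted target t certifies EE(b) ≥ t, so the returned policy can only improve on the uniform starting point; a Dinkelbach-type update would be an alternative to plain bisection.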
5.4.3 EE-Based Caching Policy and Cooperation Distance Design

From (5.20) and the EE analyses in Chs. 5.4.1 and 5.4.2, the EE optimization problem is

max_{d, b_m, ∀m=1,…,M}  EE_sys = T_{c,Ran}/P_{c,Ran} or T_{c,Pri-AC}/P_{c,Pri-AC}
subject to  Σ_{m=1}^{M} b_m ≤ S;  0 ≤ b_m ≤ 1, ∀m.    (5.23)

Here we use the same approach as in Ch. 5.3.3: we solve the sub-problems with quantized d and then perform a one-dimensional search. We therefore focus on solving the problem with a fixed d. For this case we have the following proposition:

Proposition 2: For a fixed d, (5.23) is a standard quasi-concave problem, and its optimal solution is tight at the equality of the sum constraint.

Proof. Again follows from Lemmas 2-1 and 2-2.

By Proposition 2, we know that (5.23) becomes a standard quasi-concave optimization with a convex feasible set when d is fixed. Consequently, a standard solution procedure can be used, briefly described as follows. By introducing a slack variable t, the problem is equivalent to

max_{t, b_m, ∀m=1,…,M}  t
subject to  EE_sys ≥ t;  Σ_{m=1}^{M} b_m ≤ S;  0 ≤ b_m ≤ 1, ∀m.    (5.24)

Since P_sys is positive, for a fixed t we have a convex feasibility problem:

max_{b_m, ∀m=1,…,M}  0
subject to  −T_sys + t P_sys ≤ 0;  Σ_{m=1}^{M} b_m ≤ S;  0 ≤ b_m ≤ 1, ∀m.    (5.25)

Note that if (5.25) is feasible, then t is achievable. Since (5.25) is solvable by standard convex solvers, a solution arbitrarily close to the optimum can be obtained by exploiting bisection or other adaptive approaches to adjust t (Footnote 12).

Footnote 12: Again, part of the solution approach can be combined with the more efficient Lagrange-multiplier-based approach.

5.5 Throughput-Energy Efficiency Trade-Off Analysis and Design

Optimizing the EE can differ from optimizing the throughput, and there exists a trade-off between them. This section aims to characterize this trade-off and to provide the corresponding trade-off design. To analyze the trade-off between throughput and EE, we need to consider a multi-objective optimization problem containing different objectives that can conflict with each other.
That is to say, trade-offs between the different objectives exist in the problem, and a solution that dominates in all aspects generally does not exist. We thus introduce Pareto-optimality [199] in the throughput-EE domain in Proposition 3.

Proposition 3: A Pareto-optimal solution is defined as a solution with d_o and {b_{m,o}}_{m=1}^{M} such that there does not exist another feasible solution with d and {b_m}_{m=1}^{M} satisfying the following conditions simultaneously:

T_sys({b_m}_{m=1}^{M}, d) > T_sys({b_{m,o}}_{m=1}^{M}, d_o);  EE_sys({b_m}_{m=1}^{M}, d) > EE_sys({b_{m,o}}_{m=1}^{M}, d_o).    (5.26)

Since multiple Pareto-optimal solutions can exist, the collection of all such solutions is denoted as the Pareto-optimal set. A common approach to dealing with multi-objective problems is to convert the problem into a single-objective problem via the weighted-sum method [199]. We then provide Proposition 4, which helps us find Pareto-optimal solutions.

Proposition 4: The optimal solution of the following problem is a solution in the Pareto-optimal set:

max_{d, b_m, ∀m=1,…,M}  w_1 T_sys + w_2 EE_sys
subject to  Σ_{m=1}^{M} b_m ≤ S;  0 ≤ b_m ≤ 1, ∀m;  d ∈ feasible range.    (5.27)

Note that (5.27) reduces to the throughput optimization problem and to the EE optimization problem when w_1 = 1, w_2 = 0 and when w_1 = 0, w_2 = 1, respectively.

Proof. We prove Proposition 4 by contradiction. Assume that the optimal solution d_o and {b_{m,o}} of (5.27) is not Pareto-optimal. Then there must exist d and {b_m} such that T_sys({b_m}, d) > T_sys({b_{m,o}}, d_o) and EE_sys({b_m}, d) > EE_sys({b_{m,o}}, d_o) are satisfied. It follows that

w_1 T_sys({b_m}, d) + w_2 EE_sys({b_m}, d) > w_1 T_sys({b_{m,o}}, d_o) + w_2 EE_sys({b_{m,o}}, d_o),    (5.28)

which contradicts the optimality of d_o and {b_{m,o}} for (5.27).

To solve (5.27), the analyses in Chs. 5.3 and 5.4 are exploited.
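Proposition 4 can be exercised numerically: a maximizer of the positive weighted sum can never be strictly dominated in the sense of (5.26). The sketch below builds candidate (T_sys, EE_sys) operating points from the random-push closed forms (5.8) and (5.21) over a coarse grid of cooperation distances and three heuristic caching policies, filters the Pareto-optimal points, and checks that every weighted-sum maximizer lies in that set. The densities, rate and power values, the policies, and the cluster count N = (D/d)^2 are illustrative assumptions, not the chapter's optimized designs.

```python
import math

T_B, T_D, T_S, E_B, E_D = 0.2, 20.0, 20.0, 0.4, 0.2
M, S, D = 30, 4, 600.0
rho_a, rho_i = 0.0008, 0.0042          # user densities per m^2 (one of the chapter's sets)
w = [(m + 1) ** -0.6 for m in range(M)]
a = [x / sum(w) for x in w]

def point(d, b):
    """(T_sys, EE_sys) for cooperation distance d and policy b, random-push model."""
    la, li = rho_a * d * d, rho_i * d * d
    s1 = sum(am * math.exp(-(la + li) * bm) for am, bm in zip(a, b))
    s2 = sum(am * math.exp(-li * bm) for am, bm in zip(a, b))
    ps = sum(am * bm for am, bm in zip(a, b))
    t = (T_D * (1 - math.exp(-la)) + (T_B - T_D) * s1
         + ((T_S - T_D) * (1 - math.exp(-la)) + T_S * (la - 1 + math.exp(-la))) * ps
         - (T_B - T_D) * math.exp(-la) * s2)
    p = (E_D * (1 - math.exp(-la)) + (E_B - E_D) * s1
         - E_D * (1 - math.exp(-la)) * ps - (E_B - E_D) * math.exp(-la) * s2)
    n = (D / d) ** 2                   # assumed cluster count for a D x D cell
    return n * t, t / p                # T_sys = N * T_clu; EE_sys = T_clu / P_clu

uniform = [S / M] * M
selfish = [1.0] * S + [0.0] * (M - S)
blend = [0.5 * (u + s) for u, s in zip(uniform, selfish)]
cands = [point(d, b) for d in range(20, 101, 20) for b in (uniform, selfish, blend)]

# Pareto filter: keep points not strictly dominated in both coordinates, per (5.26).
pareto = [c for c in cands
          if not any(o[0] > c[0] and o[1] > c[1] for o in cands)]

pairs = [(1.0, 0.0), (0.0, 1.0), (0.5, 0.5), (1e-9, 1.0)]
ok = all(max(cands, key=lambda c: w1 * c[0] + w2 * c[1]) in pareto
         for w1, w2 in pairs)
```

Sweeping the weights traces out points on the trade-off frontier; in practice the two objectives should be normalized before weighting, since T_sys and EE_sys live on very different scales.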
We again focus on solving the problem with a fixed cooperation distance d. Considering the approximations in the analyses and a given fixed d, we observe that the objective function of (5.27) has a special structure:

h(x) + f(x)/g(x),  x ∈ B,    (5.29)

where h(x) is concave, f(x) is concave, and g(x) is convex over B, respectively. This structure can be identified by setting h(x) = w_1 T_sys, f(x) = w_2 T_sys, and g(x) = P_sys, and by using Propositions 1 and 2. Note that in [200], (5.29) has been proven to be NP-complete, and an efficient approach to find an ε-approximation of the global optimal solution of (5.29) was proposed. Thus, the results and techniques in [200] can be exploited to solve (5.29) (Footnote 13).

Footnote 13: Although only the minimization counterpart of (5.29) was explicitly investigated in [200], according to [200] and our own investigations, the concepts, results, and derivations can be applied to (5.29) after some modifications.

5.6 Numerical Results

This section provides numerical results to validate our analyses and evaluate the proposed designs. For all simulations in this paper, we consider the following parameters and setup: D = 600 m and K = 16. We also consider d_0 = 5 m, α = 3.68, λ_c = 3×10^8/f_c, and f_c = 2 GHz in the path-loss model. The maximum allowable interference power is set to be of the order of the noise power, i.e., κ = σ^2 = N_0 W_{c,D2D}, where N_0 = −174 dBm/Hz is the noise power density and W_{c,D2D} = 20 MHz is the bandwidth of a D2D link; the total bandwidth used for the D2D communications is thus 320 MHz. Although the theoretical framework in Chs. 5.3, 5.4, and 5.5 does not consider the practical limits of the BS, in the simulations, unless otherwise indicated, we consider that each BS link uses 200 kHz of bandwidth and 26 dBm of transmit power. Besides, we consider the BS to have 46 dBm total transmit power and W_BS = 20 MHz total bandwidth. Thus, the maximum number of users that can be served by the BS is N_BS = 100.
With the aforementioned parameters, we consider T_B = 200 kbits/s and T_D = T_S = 20 Mbits/s; E_B = 26 dBm, and E_D ≤ 23 dBm is computed by (5.2). The cooperation distance d is within 100 meters. Note that T_B and T_D are easily achievable in practice, and T_B = 200 kbits/s can support 360p video quality [203]. We consider M = 1000 and S = 10, and the request probabilities follow the MZipf distribution in [77], which has recently been extracted from a very large real-world dataset, i.e., the BBC iPlayer dataset, with parameters γ and q:

a_m = (m + q)^{−γ} / Σ_{n=1}^{M} (n + q)^{−γ}.    (5.30)

When q = 0, the MZipf distribution reduces to the commonly used Zipf distribution. To evaluate the proposed designs in practical situations, two parameter sets are used in the simulations: γ = 0.6, q = 0 and γ = 1.28, q = 34. The first parameter set is from the UMass Amherst YouTube experiment [71], which is widely used in the literature, and the second corresponds to the parameters reported in [77]. Furthermore, we adopt two density sets of users: λ_a = 0.0008 m^-2 and λ_i = 0.0042 m^-2; λ_a = 0.0022 m^-2 and λ_i = 0.0028 m^-2. Both of these have a considerable number of inactive users.

Table 5.2: Summary of Parameters

  D, N, K, N_BS, W_{c,D2D}, W_BS : D = 600 m; N = (D/d)^2; K = 16; N_BS = 100; W_{c,D2D} = 20 MHz; W_BS = 20 MHz
  N_0, f_c, α, d_0, κ : N_0 = −174 dBm/Hz; f_c = 2 GHz; α = 3.68; d_0 = 5 m; κ = σ^2 = N_0 W
  λ_a, λ_i, λ_u : λ_a = 0.0008, 0.0022, 0.0032 m^-2; λ_i = λ_u − λ_a; λ_u = 0.0050 m^-2
  T_B, T_D, T_S : T_B = 200 kbits/s; T_D = 20 Mbits/s; T_S = T_D
  E_B, E_D : E_B = 26 dBm; E_D ≤ 23 dBm, determined by the power control
  S, M, γ, q, β : S = 10; M = 1000; γ = 0.6, 1.28; q = 0, 34; β = 1.8
These values were chosen because we consider video streaming applications, in which each user can occupy a large amount of resources; even though the percentage of data that is used for video is very high, the percentage of users streaming video at any given time need not be, and furthermore only a fraction of all cellphones in an area are active at all. Finally, for the designs in prioritized-push networks, β = 1.8 is adopted based on empirical experience. The system parameters used in the simulations are summarized in Table 5.2.

In Figs. 5.1, 5.2, and 5.3, to focus on evaluating the analytical results, we simulate the networks without considering the practical resource constraint, i.e., N_BS is temporarily assumed to be always sufficient in these figures. In Fig. 5.1, we evaluate the proposed analyses of the random-push networks adopting γ = 0.6 and q = 0; the adopted caching policy is the one designed by the proposed optimization in Ch. 5.3.3 for the random-push network. From the figures, we can observe that the analytical results are consistent with the Monte-Carlo results. Besides, it is interesting to observe that the EE increases with increasing cooperation distance. This is intuitive, because when the cooperation distance, i.e., the cluster size, increases, the probability that a user can find the desired file increases, leading to better EE. This is in contrast to the optimal throughput case, where an increase of the cooperation distance is not always good
because it could decrease the number of clusters, leading to a lower total throughput. Note that although the increase of the cooperation distance can also increase the power consumption of the D2D links and decrease the total throughput, the increased probability of having the desired file in the D2D network, i.e., the hit-rate, is overwhelmingly important in the random-push network, because the BS randomly picks one user to serve in this network and the BS power consumption is dominant. A different behavior is observed in the prioritized-push network.

Figure 5.1: Evaluation of the proposed analyses in the random-push networks with γ = 0.6 and q = 0. (a) Throughput; (b) EE. Analytical and Monte-Carlo curves are shown for (λ_a, λ_i) = (0.0008, 0.0042) and (0.0022, 0.0028) m^-2.

In Fig. 5.2, we evaluate the proposed analytical results in the prioritized-push network adopting γ = 0.6, q = 0, λ_a = 0.0008 m^-2, and λ_i = 0.0042 m^-2. The adopted caching policy is designed by optimizing N T_{c,Pri-AC}, as discussed in Ch. 5.3.3. In Fig. 5.2, the curves labeled T_Pri (P_Pri) are the results of (5.12) and its power-consumption counterpart; the curves labeled T_Pri-A (P_Pri-A) are the results jointly expressed by (5.15) and (5.17) and their power-consumption counterparts. From the figure, we can observe that the proposed approximations effectively characterize the trend of the Monte-Carlo results, though there is a gap between the analyses and the Monte-Carlo results. We note that for other combinations of densities and MZipf parameter sets, not shown here for space reasons, similar results are observed.

In Fig. 5.3, we compare T_{c,Pri-A} with T_{c,Pri-AC} in the prioritized-push network adopting γ = 0.6 and q = 0, to validate the use of T_{c,Pri-AC} for the optimization.
From the figure, we can observe that T_{c,Pri-AC2} is noticeably different only at d = 30 for λ_a = 0.0022 and at d = 50 for λ_a = 0.0008, respectively; these are the points closest to the breaking-point, and they are not the optimal points. Note that although not shown here, we did compare the proposed designs with designs obtained by directly optimizing T_{c,Pri-A} using the convex-concave procedure [201], i.e., a non-convex optimization approach, and saw no improvement.

In the remaining simulations, we consider the practical resource constraint and focus on the prioritized-push network, since it is more spectrally efficient and practical. To validate that the prioritized-push network is more spectrally efficient than the random-push network, Fig. 5.4 compares their network throughput adopting γ = 0.6 and q = 0 and the same caching policy designed by optimizing T_{c,Pri-AC}.

Figure 5.2: Evaluation of the proposed analyses in the prioritized-push network with γ = 0.6, q = 0, λ_a = 0.0008 m^-2, and λ_i = 0.0042 m^-2. (a) Throughput (Monte-Carlo, T_Pri-A, T_Pri); (b) EE (Monte-Carlo, T_Pri-A/P_Pri-A, T_Pri/P_Pri).

Figure 5.3: Throughput comparisons between the approximations in prioritized-push networks with γ = 0.6 and q = 0: T_Pri-AC and T_Pri-A for (λ_a, λ_i) = (0.0008, 0.0042) and (0.0022, 0.0028) m^-2.
Figure 5.4: Throughput comparisons between the random-push and prioritized-push networks with γ = 0.6 and q = 0, for (λ_a, λ_i) = (0.0008, 0.0042) and (0.0022, 0.0028) m^-2.

Figure 5.5: Throughput comparisons between the networks with and without resource constraint in the prioritized-push network with γ = 0.6 and q = 0 ("Limited" vs. "Unlimited", for both density sets).

From Fig. 5.4, we can readily see that the prioritized-push network offers better throughput. In Fig. 5.5, we consider γ = 0.6 and q = 0 and compare the prioritized-push networks with and without the practical resource constraint. The adopted caching policy is designed by the proposed throughput optimization in Ch. 5.3.3. Note that since N_BS = 100, when considering the practical constraint, the BS can serve at most 100 users. The curves labeled "Limited" indicate the results considering the practical resource constraint; the curves labeled "Unlimited" indicate the results without considering it. From the figure, we can observe that the difference between the curves is significant only when d is less than 20 m, i.e., at points that we are not interested in (Footnote 14).

In the following, we evaluate the proposed designs, i.e., the proposed throughput and EE designs, and compare them with the "Max-Hit-Rate" design proposed in [74] and with the "selfish" design, in which each user caches the most popular files. Through the simulations, we also discuss the trade-off between throughput and EE. In Fig. 5.6, different caching designs are evaluated in terms of throughput; the adopted MZipf parameters are γ = 0.6 and q = 0.
From the figures, we can observe that the proposed throughput and EE designs provide very similar throughput performance (Footnote 15). The Max-Hit-Rate approach can provide effective performance when the system operates at a suitable cooperation distance, but its performance degrades significantly when the cooperation distance is large. This is because the Max-Hit-Rate approach cannot balance between self-caching and D2D-caching, and thus the low frequency-reuse gain due to the large cluster size can lead to a significant throughput degradation. This result indicates that self-caching is influential and that the effects of D2D-caching (with D2D communications) and self-caching should be considered jointly. It also indicates that, when adopting the Max-Hit-Rate policy, it is safer to use a smaller cluster size rather than a larger one, to prevent significant throughput degradation. The selfish approach performs poorly because it does not consider the benefits of D2D communications.

Footnote 14: Similar results can be observed for the EE case.

Footnote 15: The same results can be observed when considering the random-push networks.

In Fig. 5.7, different caching designs are evaluated in terms of EE; the adopted MZipf parameters are γ = 0.6 and q = 0. From the figures, we can observe that the proposed EE design offers the best EE performance. Again, the Max-Hit-Rate design is effective when the cooperation distance is appropriately selected, and the selfish design provides poor performance. By comparing the throughput and EE evaluations, it can be observed that the optimal cooperation distances are different. This leads to a trade-off between throughput and EE when selecting the cooperation distance (Footnote 16), and a compromise can be made by selecting a cooperation distance between the two optimal cooperation distances.
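The poor performance of the selfish design has a direct explanation in the access probabilities: if every device deterministically caches the same top-S files, i.e., b_m ∈ {0, 1} identical across users, then each request is either served from the user's own cache or must go to the BS, and (5.5) yields P_{D,k} = 0 for every k. A small check with an illustrative Zipf popularity:

```python
M, S, k = 50, 5, 8
w = [(m + 1) ** -0.6 for m in range(M)]
a = [x / sum(w) for x in w]

def p_d(b, k):
    """P_{D,k} of (5.5)."""
    P_S = sum(am * bm for am, bm in zip(a, b))
    P_Bk = sum(am * (1 - bm) ** k for am, bm in zip(a, b))
    return 1 - P_Bk - P_S

selfish = [1.0] * S + [0.0] * (M - S)      # every device caches the top-S files
uniform = [S / M] * M                      # same cache budget, spread over the library

pd_selfish = p_d(selfish, k)   # exactly 0: no request is ever served by a D2D partner
pd_uniform = p_d(uniform, k)   # strictly positive: neighboring caches can differ
```

The selfish policy maximizes P_S but leaves no role for cooperation, which is consistent with the proposed designs (which spread cache contents) outperforming it in Figs. 5.6-5.8.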
(We also observe that, for a given cooperation distance, the proposed EE design is near-optimal in terms of throughput and optimal in terms of EE. This reduces the usefulness of the compromise design discussed in Ch. 5.5, so we omit simulations using different weights in (5.27). That said, the mathematical framework provided in Ch. 5.5 could be useful in certain scenarios or for other parameter sets.)

We note that although the Max-Hit-Rate design can be effective in terms of throughput and in terms of EE when the corresponding cooperation distance is appropriately selected in each case, it offers a less effective trade-off between throughput and EE. Besides, by comparing the results in Figs. 5.6 and 5.7, we can see that the Max-Hit-Rate design starts to diverge from the best throughput and EE designs when the density of active users increases. This indicates that, as the density of active users increases, the best policy starts to differ from pure cooperation.

Figure 5.6: Throughput comparisons between different caching policies in the prioritized-push network with γ = 0.6 and q = 0 ((a) λ_a = 0.0008 m⁻², λ_i = 0.0042 m⁻²; (b) λ_a = 0.0022 m⁻², λ_i = 0.0028 m⁻²).

Finally, from all figures, we observe that when the density of active users increases, the optimal
cooperation distance decreases, owing to the benefits of frequency reuse and to eliminating the need for many active users in a cluster.

Figure 5.7: EE comparisons between different caching policies in the prioritized-push network with γ = 0.6 and q = 0 ((a) λ_a = 0.0008 m⁻², λ_i = 0.0042 m⁻²; (b) λ_a = 0.0022 m⁻², λ_i = 0.0028 m⁻²).

In Fig. 5.8, different caching designs are evaluated, in terms of throughput and EE, in the prioritized-push network adopting an MZipf distribution with γ = 1.28 and q = 34 and densities λ_a = 0.0022 m⁻² and λ_i = 0.0028 m⁻². Results similar to those in the previous figures can be observed, i.e., the effectiveness of the Max-Hit-Rate design and the trade-off between throughput and EE under different cooperation distances.

Figure 5.8: Performance comparisons between different caching policies in the prioritized-push network with γ = 1.28, q = 34, λ_a = 0.0022 m⁻², and λ_i = 0.0028 m⁻² ((a) throughput; (b) EE).

Overall, the simulation results show that, with practical popularity distributions, the trade-off between throughput and EE exists and can be adjusted by changing the cooperation distance. Besides, although the Max-Hit-Rate approach can provide good results when the cooperation distance is appropriately selected, it provides a poor trade-off. Furthermore, the superior performance of the proposed designs compared with the Max-Hit-Rate and selfish designs indicates that jointly considering the effects of D2D- and self-caching is important. In Fig.
5.9, we compare different policies under different densities of active and inactive users to see their influence in the same network as in Fig. 5.8.

Figure 5.9: Throughput comparisons between different caching policies and densities in the prioritized-push network with γ = 1.28 and q = 34 (panels for λ_a = 0.0008, λ_i = 0.0042; λ_a = 0.0022, λ_i = 0.0028; and λ_a = 0.0032, λ_i = 0.0018, all in m⁻²).

We can once again observe that the Max-Hit-Rate policy starts to diverge from the proposed throughput design when the density of active users increases, just as in Figs. 5.6 and 5.7. More interestingly, even when the density of active users is increased further, the selfish policy remains much worse than the others. This indicates that, in prioritized-push networks with a practical popularity distribution, a cooperative policy that exploits D2D communications is more appropriate in terms of throughput. A similar result can also be observed in terms of EE; we omit the corresponding figure for brevity.

5.7 Conclusions

By considering the joint effects of BS-, D2D-, and self-caching and the impact of the cooperation distance, the design of the caching policy and cooperation distance was investigated in the clustered BS-assisted wireless D2D caching network. Based on this setup, we analyzed and optimized the network throughput and EE for two different network structures, i.e., random-push and prioritized-push networks. Note that although the prioritized-push network is more spectrally efficient and practical, its analysis builds on the analysis of the random-push network.
Since the throughput-based and EE-based designs could conflict with each other, we discussed the trade-off between them. From the simulations, we conclude that the self-caching effect is influential and that considering the joint effects of D2D- and self-caching is important. Besides, the proposed throughput and EE designs outperform the other designs and provide a better trade-off because they find the balance point between selfishness and cooperativeness. Comparing the throughput and EE evaluations shows that their optimal cooperation distances differ, which leads to a trade-off between throughput and EE when selecting the cooperation distance. This work focused on throughput- and EE-related designs and analyses. Consequently, delay performance and delay-related constraints, such as outage performance, were not considered; the corresponding performance investigations and trade-offs are important future directions.

CHAPTER 6

Individual Preference Probability Modeling and Parameterization for Video Content in Wireless Caching Networks

6.1 Introduction

While algorithms for wireless video content caching have been widely explored, most of the literature adopts a homogeneous file-requesting model, i.e., it assumes that all users have the same file popularity distribution when deciding the caching policy (we also make this assumption in Ch. 3, Ch. 4, and Ch. 5). Clearly this assumption violates the intuition that different users have different tastes and preferences, and the fact that different users have different preferences has been validated in various works [78, 191, 204, 205]. Thus, designs adopting the homogeneous file-requesting model are restricted to some extent because they neglect individual user preferences.
To resolve this issue, papers exploiting a heterogeneous file-requesting model, based on individual user preferences, for caching and/or delivering content have gradually drawn attention [82, 86, 87, 206–208]. Moreover, analyses of individual preferences have demonstrated their capability of offering fundamental insights that can further enhance system and/or strategy designs [78, 191]. Although the recent literature has started to take individual preferences into consideration, the focus is basically on policy design and network analysis based on certain abstract mathematical models, without the support of real data. In fact, to the best of our knowledge, the statistical modeling of individual user preferences based on real-world data has not been well investigated. Note that modeling the individual preference of a particular user for recommendation systems, also known as the "Netflix challenge", has been investigated intensively using learning methods [78, 204, 205]. However, this is different from the need to find the statistics of individual user distributions (the Netflix challenge focuses on the per-user perspective, while the statistical modeling in this work aims to statistically represent the preferences of the whole user set in the network).

6.1.1 Contributions

In this chapter, we aim to fill the aforementioned gap by proposing a statistical modeling framework, and a corresponding generation approach, for individual user preferences. Our model uses hierarchies of probabilities to represent the preferences of users. Empirically, video files can be categorized into genres according to their features, and users may have strong preferences for a few genres [78]. The overall request probability of a user for a file is then modeled through the probability that the user wants a specific genre and the popularity of the file within this genre. Since the individual preference probabilities of users can be described by individual popularity distributions and individual ranking orders, their statistics are respectively investigated using the genre-based structure.
We note that, in this chapter, when we use the term "popularity distribution" we implicitly mean the rank-frequency distribution. We aim to extract models and parameterizations for these different statistics. Such modeling and parameterization has to be based on real-world data to be meaningful. We thus use data from an extensive dataset collected in the U.K. in 2014, namely the usage records of the BBC iPlayer [78, 191, 209, 210]. (Though not presented in the main part of this chapter, the proposed modeling framework has also been used to analyze another dataset, collected from social media. The results show that the general structure of our modeling framework, and the functional shape of the different curves, carry over very well. The specific parameterizations are, of course, different between the two datasets, since they describe different types of video services. The results for this additional dataset are provided in Appendix D.6.)

By observing the real data, we identify several important aspects of characterizing individual preferences, and propose a modeling framework for individual preference probabilities. Besides, a parameterization of the proposed framework is provided by understanding and modeling the distributions of the parameters in the framework. Moreover, to enhance the parameterization, correlations between the different parameters and statistics used by the framework are also investigated. Following the modeling framework, an individual preference probability generation approach is proposed by judiciously linking the parameters and models together. We validate the proposed modeling and generation approach using real-world data. The validation results demonstrate that the proposed modeling and generation approach can effectively reproduce important features and statistics of the individual preference probabilities.
Therefore, the results can be helpful for designing, optimizing, analyzing, modeling, and simulating systems exploiting content caching and delivery, as well as for studies of file popularities and user preferences. To be more specific, we make the following contributions:

- We propose a genre-based hierarchical modeling framework to enable statistical descriptions of the individual preference probabilities. Specifically, we first identify that, instead of using an element-wise description, the individual preference probabilities of a user can be jointly described by the individual popularity distribution and the individual ranking order. The genre-based hierarchical modeling approach is then applied to both of them, and the corresponding statistical models for the individual popularity distribution and the individual ranking order are proposed, respectively.

- Since each user owns a parameter set describing their preference probabilities, the number of parameters for the whole user set is so large that it can only be handled numerically, and it gradually becomes impossible to handle as the number of users increases. To resolve this issue, a statistical parameterization is conducted for every parameter in the framework, drastically condensing the description. Such statistical parameterization not only simplifies the representation of individual preferences but also enables individual preference generation without a huge parameter set.

- Correlation analyses between the parameters in the framework are conducted. The results reveal critical correlation features of individual preferences and enhance the parameterization.
- By exploiting the framework, modeling, parameterization, and correlation analysis in this work, a complete implementation recipe for the individual preference probability generation approach is proposed. (A complete code for generating the individual preference probabilities of users according to the data can be found in [184].)

All results need to be based on real-world data to be meaningful. Thus, an extensive dataset from the BBC iPlayer is used for our investigations and for the validation of all propositions.

The remainder of the chapter is organized as follows. Ch. 6.2 introduces the basic modeling concepts and describes the tools needed to manipulate the dataset. The main modeling framework is presented in Ch. 6.3 and Ch. 6.4. In Ch. 6.5, parameterization approaches and results are provided. Ch. 6.6 offers the correlation analysis. We propose the individual preference probability generation recipe and conduct the corresponding numerical validations in Ch. 6.7. We summarize insights and discuss applications of our work in Ch. 6.8. Ch. 6.9 concludes this work. Various detailed aspects are presented in Appendix D.

6.2 Individual Preference Probability Modeling and Dataset Preparations

6.2.1 Modeling of Individual Preference Probability

In this work, we represent the individual preference by the individual user probability, which is defined as the probability that a specific user will, in the future, request a specific file for watching; multiple views by the same user are thus ignored (i.e., treated the same as a single viewing). Since different users can have different preferences, the preference probabilities of different users for the same file can differ. We assume that each file can be uniquely assigned to a genre and that there are G genres in the library. Therefore, denoting by M_g the number of files in genre g, the total number of files in the library is Σ_{g=1}^{G} M_g.
Given this library, we denote the preference probability of file m in genre g for user k as p^k_{g,m}. Then the following properties must hold: 0 ≤ p^k_{g,m} ≤ 1 for all g, m, k, and Σ_{g=1}^{G} Σ_{m=1}^{M_g} p^k_{g,m} = 1 for all k. We note that, when considering only the probability representation of individual preferences, the impact of the loading of users on the system preference, i.e., on the global popularity distribution, cannot be modeled. Therefore, the statistical modeling of the loading of users is investigated independently as a constituent of the system parameters in Ch. 6.5.4, and the loading distribution is used when generating the global popularity. We also note that, roughly, the loading of a user is its number of accesses to files in the dataset; the precise definition is provided later in Ch. 6.5.4. To characterize the individual preference probabilities of users, two important features need to be characterized: the individual popularity distributions of files and the individual ranking orders of files. Different individual popularity distributions represent the different concentration rates of the popularity distributions that different users might have, and different individual ranking orders represent different preferences by ranking the files differently. To clarify these concepts, we provide a simple example. We consider two users with different preferences. Suppose that G = 1 and M_1 = 3, so that there are three files in the library. Then suppose we have p^1_{1,1} = 0.5, p^1_{1,2} = 0.3, p^1_{1,3} = 0.2; and p^2_{1,1} = 0.05, p^2_{1,2} = 0.7, p^2_{1,3} = 0.25. These six probability values are a complete description, but obviously such a description becomes impossible to handle when considering thousands of files and millions of users. From the description, it can be observed that the users' popularity distributions are somewhat different, i.e., 0.5, 0.3, 0.2 and 0.7, 0.25, 0.05, respectively, so that the second user has a stronger concentration than the first.
In addition, the ranking orders are different, namely 1, 2, 3 and 2, 3, 1, respectively. It can thus be observed that the differences between the preferences of users can be fully described by the differences of their individual popularity distributions and individual ranking orders. While the above example gives deterministic descriptions, in the following we aim for stochastic descriptions of these quantities. To avoid confusion, in the following sections we use the global popularity/probability of genres/files to denote the popularity/probability of genres/files computed by taking all users into consideration. Conversely, the individual popularity/probability of genres/files denotes the popularity/probability computed by considering only a single specific user. In addition, without loss of generality, we let the indices of the genres follow the descending order of the global genre popularities, i.e., the global popularity of genre g is larger than that of genre g + 1 for all 1 ≤ g < G.

6.2.2 Dataset Descriptions and Preprocessing

This work uses an extensive set of real-world data, namely the dataset of the BBC iPlayer [78, 191]. The BBC iPlayer is a video streaming service provided by the BBC (British Broadcasting Corporation) that offers video and radio content for a number of BBC channels without charge. (The BBC iPlayer is a massive application, ranked second in terms of the load it imposes on the UK's networks; only YouTube generated more load than the dataset we consider. Learning statistics and optimizing the network architecture for just this one application is thus a worthwhile endeavor, because it can have a huge impact on the overall traffic of whole countries.) Content on the BBC iPlayer is generally available for 30 days after its first appearance [191]. We consider the two datasets covering June and July 2014, which include 192,120,311 and 190,500,463 recorded access sessions, respectively. (Although the appropriate observation period depends on the specific scenarios and applications, the choice of one month is reasonable here, since the BBC iPlayer assumes a weekly update and the average valid time of the files is approximately 30 days [191].) In each record, the access information of the video content contains two important columns: user id and content id. user id is based on long-term cookies that uniquely (in an anonymized way) identify users. content id is the specific identity that uniquely identifies each video content item.
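The decomposition used in this example can be made concrete in a few lines; the helper name `decompose` is ours, and the numbers are exactly the two-user example above.

```python
def decompose(prefs):
    """Split a preference-probability vector into its sorted individual
    popularity distribution and its ranking order (1-indexed file IDs)."""
    order = sorted(range(len(prefs)), key=lambda m: prefs[m], reverse=True)
    popularity = [prefs[m] for m in order]   # concentration of the preferences
    ranking = [m + 1 for m in order]         # which file holds each rank
    return popularity, ranking

# The two users from the example (G = 1, three files).
pop1, rank1 = decompose([0.50, 0.30, 0.20])   # ([0.5, 0.3, 0.2], [1, 2, 3])
pop2, rank2 = decompose([0.05, 0.70, 0.25])   # ([0.7, 0.25, 0.05], [2, 3, 1])
```

Pairing a popularity distribution with a ranking order reverses the mapping and recovers the original preference vector, which is how the generation approach in Ch. 6.7 assembles the p^k_{g,m} from the two stochastic descriptions.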
Although there are certain exceptions, user id and content id can generally identify the user and the video content of each access. In addition to access identification, video files in the BBC iPlayer are annotated with one or more genres (certain files are not annotated with any genre; we simply filter them out, as described in the following paragraph). More detailed descriptions of the BBC iPlayer dataset can be found in [78, 191].

To facilitate the investigation, preprocessing is conducted on the dataset. We first define a "unique access". By observation, we notice that a user may access the same file multiple times, possibly due to temporary disconnections from the Internet and/or temporary pauses made by users when moving between locations. A user is generally unlikely to access the same video again after finishing it within the period of a month (this statement is partly supported by results in [78, 191]; see Ch. III.A and V.A in [78] for details. Moreover, using the real data, we measure the rate at which a user watches the same video on different days within the same month and obtain approximately 6%, i.e., the combined number of minutes watched for a file over successive sessions is usually not more than the total duration of the file). We therefore assume that each user only needs to access a video once and can cache it for any subsequent views (with the approach that users cache on the first view and replay from the local cache for subsequent views, this can even be applied to any other dataset, since the external access pattern then becomes similar to one with the unique-access property). Accordingly, we count multiple accesses by the same user to the same file as a single unique access. We furthermore define a regular user as a user with more than 30 unique accesses in a month, and restrict our subsequent investigation to regular users. The numbers of unique regular users in June and July are 384,596 and 369,105, respectively.
As described previously, a file can be annotated with no, one, or several genres, and the genre-wise classification is the foundation for characterizing the preferences of users in our work. Hence, if a file cannot be classified into any genre, i.e., if no genre is annotated on the file, the file is filtered out during preprocessing. Besides, if a file is annotated with multiple genres, the file is considered to be shared by all annotated genres, i.e., each genre is counted as accessed 1/N times when a file with N annotated genres is accessed. When considering the dataset constituted by the regular users, the number of genres in the library is 110.

6.2.3 Kullback-Leibler Distance Based Parameter Estimation

In Ch. 6.3 and Ch. 6.4, we propose models to fit statistics acquired from the dataset.
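The preprocessing rules above (collapsing repeats into unique accesses, keeping regular users with more than 30 unique accesses, and sharing a file's access 1/N across its N annotated genres) can be sketched in a few lines. The record layout and the function name `preprocess` are illustrative assumptions, not the thesis's actual pipeline.

```python
from collections import defaultdict

def preprocess(records, genres_of, threshold=30):
    """records: iterable of (user_id, content_id); genres_of: content_id -> genres.
    Returns the set of regular users and the per-genre (fractional) access counts."""
    unique = set(records)                    # collapse repeated accesses into unique ones
    per_user = defaultdict(set)
    for u, c in unique:
        per_user[u].add(c)
    # Regular users: more than `threshold` unique accesses in the period.
    regular = {u for u, files in per_user.items() if len(files) > threshold}
    genre_count = defaultdict(float)
    for u, c in unique:
        if u not in regular or not genres_of.get(c):
            continue                         # skip non-regular users and unannotated files
        for g in genres_of[c]:
            genre_count[g] += 1.0 / len(genres_of[c])   # 1/N share per annotated genre
    return regular, dict(genre_count)

# Toy data (threshold lowered to 1 so the tiny example has a regular user):
records = [("u1", "f1"), ("u1", "f1"), ("u1", "f2"), ("u2", "f1")]
genres_of = {"f1": ["drama", "comedy"], "f2": ["news"]}
regular, counts = preprocess(records, genres_of, threshold=1)
# regular == {"u1"}; counts == {"drama": 0.5, "comedy": 0.5, "news": 1.0}
```

At the scale of the real dataset the same logic applies with `threshold=30` over one month of records.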
To find the suitable models and the parameters that best fit the models to the real data, the minimum Kullback-Leibler (K-L) distance approach is adopted:

x̂ = arg min_x D_KL(x), where D_KL(x) = Σ_m p_m^real log( p_m^real / p_m^model(x) ),   (6.1)

x is the vector representation of the parameters, p_m^real is the probability of outcome m in the real data, and p_m^model(x) is the probability of outcome m characterized by the proposed model and x. We note that p_m^real log( p_m^real / p_m^model(x) ) = 0 if p_m^real = 0 by definition; Σ_m p_m^real = 1; and Σ_m p_m^model(x) = 1. To find a good model of the target statistics, the following steps are used: (I) we choose candidate distributions based on visual inspection; (II) we confirm the fitness of the chosen distributions by the above K-L test. Thus, the main justification for adopting the models and the set of functions proposed in this work is empirical (the relevant empirical justifications for the proposed models are provided in Appendix D.5).

We note that the efficacy of the K-L distance can be interpreted from the point of view of information theory. According to [211], the K-L distance is also known as the relative entropy. Thus, the K-L distance between the model and the data reflects the increase of entropy incurred when we approximate the distribution of the data by the model. The value of the K-L distance is the average number of additional bits (nats) needed when a code designed for the modeling distribution is used to describe the random variable embodied by the data. In other words, the K-L distance measures the inefficiency of the model in describing the real data. The K-L based estimation has pros and cons, and the corresponding discussion is provided in Appendix D.1.

6.2.4 Genre-Based Structure and Modeling

In this work, a genre-based structure is adopted for the proposed modeling.
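In practice, the fit of (6.1) reduces to a low-dimensional search over candidate parameters. The sketch below uses a plain one-parameter Zipf family and a grid search as a stand-in for the thesis's richer families (MZipf, DGamma) and optimizer; all names here are illustrative.

```python
import math

def kl_distance(p_real, p_model):
    # D_KL(real || model) with the 0 * log(0/q) = 0 convention used in (6.1).
    return sum(p * math.log(p / q) for p, q in zip(p_real, p_model) if p > 0)

def zipf_pmf(gamma, n=5):
    # Illustrative model family: P(i) proportional to i^(-gamma) over ranks 1..n.
    w = [(i + 1) ** (-gamma) for i in range(n)]
    s = sum(w)
    return [v / s for v in w]

def fit_min_kl(p_real, family, grid):
    # Grid-search version of (6.1): keep the parameter minimizing the K-L distance.
    return min(grid, key=lambda x: kl_distance(p_real, family(x)))

# Data drawn exactly from a Zipf(1.0) law is recovered from the grid.
data = zipf_pmf(1.0)
gamma_hat = fit_min_kl(data, zipf_pmf, [0.5, 1.0, 1.5, 2.0])   # -> 1.0
```

Replacing `zipf_pmf` with an MZipf or DGamma pmf and the grid with a continuous optimizer yields the estimation actually performed in this chapter.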
This structure is adopted for both pragmatic and fundamental reasons. From a practical point of view, directly modeling the individual popularities would involve too many parameters (a similar reasoning underlies, e.g., cluster-based modeling of wireless propagation channels). Besides, according to the analyses in [78, 191] and our results, users show strong preferences for a few specific genres; thus, the ability to characterize genre preferences is important for the model. More fundamentally, it is infeasible to formulate the statistics of individual user preferences on files by simply observing the accesses of users: a user does not exhibit a probability of accessing a specific file - (s)he either requests it or does not. Therefore, instead of directly finding the statistics of file preferences, we first investigate the statistics of the genre preferences of users, and then approximate the file preferences within each genre by the conditional, non-user-specific statistics of the files in each genre. Since the preference probabilities of a user are fully described by the corresponding individual popularity distribution and ranking order, we investigate the statistics of the individual popularity distribution and of the ranking order using the genre-based structure in Ch. 6.3 and Ch. 6.4, respectively. To provide a clear overview of the proposed modeling framework, a two-part summary follows.

First, to characterize the statistics of the individual popularity distribution, we use the following distributions and models:

- Size distribution (Ch. 6.3.1): since each user is only interested in a small number of genres, the size distribution captures the statistics of how many genres a user watches.

- Individual genre popularity distribution (Ch. 6.3.2): given the number of desired genres of a user, the individual genre popularity distribution characterizes how concentrated the preference for specific genres is.
- Genre-based conditional popularity distribution (Ch. 6.3.3): we use the genre-based conditional popularity distribution of each genre to approximate the file popularity distribution within the corresponding genre.

Second, to characterize the statistics of the individual ranking order, we use the following distributions and models:

- Size distribution (Ch. 6.3.1): the size distribution is used again here because it indicates how many genres need to be ranked for a user.

- Genre appearance probabilities (Ch. 6.4.1): since only the desired genres of a user need to be ranked, we use the genre appearance probabilities to characterize which genres are likely to be desired by a user.

- Genre ranking distribution (Ch. 6.4.2): for a given genre, the genre ranking distribution characterizes the probability distribution of the rank of that genre, given that the genre is desired by the user.

We directly use the global ranking order of files within each genre to approximate the individual ranking order of files within the corresponding genre. Following the description of the framework, the parameterizations and correlation analyses are provided in Ch. 6.5 and Ch. 6.6, respectively. Then, to generate the individual preference probabilities of a user, a generation approach is proposed and validated in Ch. 6.7. Specifically, the proposed generation approach first generates parameters according to the parameterization and correlation analysis results. Then, individual popularity distributions and ranking orders are generated using the models in the framework. Finally, by linking the individual popularity distributions and ranking orders, the desired preference probabilities are generated. A sketch of the generation approach is presented in Fig. 6.1. We note that the parameter generations and the complete flow chart of the generation approach used for the adopted dataset are further detailed in Ch. 6.7 and Appendix D.4.
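The generation flow just described (parameters, then size, genre popularity, ranking, within-genre distributions, and finally the combined preference probabilities, as sketched in Fig. 6.1) can be illustrated end to end. Every component below is a simplified stand-in: the size distribution is a uniform draw, genre appearance is uniform sampling, and the MZipf parameters are fixed constants, whereas the thesis draws all of these from the fitted parameter distributions of Ch. 6.5 and combines them via (6.24).

```python
import random

def mzipf(gamma, q, n):
    # Mandelbrot-Zipf pmf over ranks 1..n (the form used in Ch. 6.3.2/6.3.3).
    w = [(i + q) ** (-gamma) for i in range(1, n + 1)]
    s = sum(w)
    return [v / s for v in w]

def generate_user_preferences(num_genres, files_per_genre, rng):
    # Step 1: number of desired genres S_k (stand-in for the size distribution).
    s_k = rng.randint(1, num_genres)
    # Step 2: individual genre popularity over the S_k ranked genres.
    p_out = mzipf(1.0, 0.0, s_k)
    # Step 3: which genres are desired and their ranking (stand-in for the
    # genre appearance probabilities and the genre ranking distribution).
    ranked_genres = rng.sample(range(num_genres), s_k)
    # Step 4: within-genre conditional popularity, shared by all users.
    p_in = mzipf(2.0, 1.0, files_per_genre)
    # Step 5: combine into p^k_{g,m}; undesired genres get probability 0.
    prefs = {}
    for rank, g in enumerate(ranked_genres):
        for m in range(files_per_genre):
            prefs[(g, m)] = p_out[rank] * p_in[m]
    return prefs

prefs = generate_user_preferences(5, 4, random.Random(7))
# The p^k_{g,m} of one synthetic user sum to 1 over all desired (g, m).
```

Because each factor is itself a normalized pmf, the product construction automatically satisfies the constraints 0 ≤ p^k_{g,m} ≤ 1 and Σ p^k_{g,m} = 1 stated in Ch. 6.2.1.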
Figure 6.1: Sketch of the individual preference probability generation of a user. (Flow: the parameter generations feed the size of the genre list S_k via the size distribution in (6.2), the individual genre popularity distribution p_k^out(·) via (6.4), the genre appearance probabilities p_ap(·) via (6.6), and the genre ranking distributions R_g for all g via (6.7); the individual ranking order r_k is then generated using Alg. 1, the genre-based conditional popularity distributions p_{k,g}^in(·) for all g via (6.5), and finally the individual preference probabilities p^k_{g,m} for all g, m via (6.24).)

6.3 Proposed Modeling of Individual Popularity Distributions

In this section, the genre-based structure is adopted and models for describing the individual popularity are proposed. To be specific, the relevant statistics of the genre popularity of users are investigated first. Then, the genre-based conditional popularity distributions for files in each genre are investigated.

6.3.1 Size Distribution

Here the size distribution is investigated and modeled. From observations of the real data, we found that a user usually accesses only a small number of genres, even though there are more than one hundred genres in the library, and even though we consider users that access the iPlayer more than 30 times per month. These observations can be explained intuitively by the fact that people usually have specific interests that constitute only a small portion of the whole entertainment palette. To quantify these observations, the size distribution is modeled as

Pr(S_k = i) = f_i(a_Si, b_Si) / Σ_{j=1}^{M_Si} f_j(a_Si, b_Si) ≜ DGamma(a_Si, b_Si; M_Si),   (6.2)

where i = 1, 2, ..., M_Si; M_Si is the maximal possible number of genres accessed by a user in the dataset; S_k is the number of genres accessed by user k; a_Si and b_Si are the parameters that characterize the modeling distribution; and

f_i(a, b) = i^{a−1} exp(−b i).   (6.3)

(We model the number of genres accessed by a user as a random variable described by the size distribution. Therefore, although the size distribution is non-user-specific, different users can have different numbers of accessed genres.)

When i is a continuous variable instead of a discrete one, f_i(a, b) follows the basic expression of the Gamma distribution. As a result, (6.2) is named the Discrete Gamma (DGamma) distribution. The fundamental characteristic of the DGamma distribution is that it is a hybrid power and exponential function, which is flexible enough to represent cases in which the increase and decrease follow a power law, an exponential law, or their mix. We compare the proposed model with the real distributions derived from the June and July datasets in Figs. 6.2a and 6.2b, respectively. Parameters for the model are provided in Tables D.9 and D.10 in Appendix D.4. It can be observed that the model reproduces the size distribution of the real data well, except in the regime with very low probabilities.

6.3.2 Individual Genre Popularity Distribution

The popularity of a genre for a specific user k is defined as the ratio between the number of accesses to the genre by user k and the total number of accesses by the same user. Characterizing the individual genre popularity distribution therefore means characterizing the concentration level of the individual popularity in terms of genres, in other words, fitting the sorted distribution of the genre popularities of this user. The proposed model for this individual genre popularity distribution is the Mandelbrot-Zipf (MZipf) distribution [76]:

P_k^out(i) = (i + q_k^out)^{−γ_k^out} / Σ_{j=1}^{S_k} (j + q_k^out)^{−γ_k^out},   (6.4)

where S_k is the number of genres accessed by user k, P_k^out(i) is the popularity of the i-th ranked genre, γ_k^out is the Zipf factor, and q_k^out is the plateau factor. We note that the MZipf distribution degenerates to a Zipf distribution when q_k^out = 0. Since each specific user k has its own combination of γ_k^out and q_k^out, a complete element-wise description of all γ_k^out and q_k^out is infeasible.
Describing γ_k^{out} and q_k^{out} for all k therefore requires a statistical model, which is presented in Ch. 6.5.1.

In Fig. 6.3, we provide exemplary comparisons between the model and the real data in June on a log-log scale. The parameters of the MZipf distribution are γ_k^{out} = 5.5, q_k^{out} = 8.0 and γ_k^{out} = 1.2, q_k^{out} = 0.65 for Figs. 6.3a and 6.3b, respectively. In both figures, it can be observed that the MZipf model effectively characterizes the real data. In Fig. 6.4, we provide similar exemplary comparisons for July; the parameters for Figs. 6.4a and 6.4b are γ_k^{out} = 1.5, q_k^{out} = -0.5 and γ_k^{out} = 4.1, q_k^{out} = 3.0, respectively. Again we observe a good fit between the model and the real data. From Figs. 6.3 and 6.4, we can observe that the curve is concave-like when q_k^{out} is positive and convex-like when q_k^{out} is negative. Note that when q_k^{out} = 0 the curve is affine. Also note that the individual genre popularity distribution only specifies the concentration rate of the preference; it does not specify which genre is the most popular one for this particular user. This aspect of genre ranking is discussed in Ch. 6.4.

[Figure 6.2: Comparisons between the model and real data of size distributions. (a) June. (b) July.]

6.3.3 Genre-Based Conditional Popularity Distribution

The genre-based conditional popularity distribution of a given genre is the conditional popularity distribution of the files under the condition that the files are annotated with the given genre. We use this distribution to approximate the per-user conditional preference probabilities of files under the condition that a file is annotated with the desired genre. We emphasize that this is an approximation, necessitated by the infeasibility of directly characterizing user-based file preference statistics, as discussed at the beginning of Ch. 6.2.4.
Since genre-based conditional popularity distributions are non-user-specific, different users are assumed to have the same distribution for the same genre, though of course the realizations of what different users download are different. To model the genre-based conditional popularity distribution of genre g, we propose to again use the MZipf distribution:

\[
P_g^{\mathrm{in}}(i) = \frac{(i + q_g^{\mathrm{in}})^{-\gamma_g^{\mathrm{in}}}}{\sum_{j=1}^{M_g} (j + q_g^{\mathrm{in}})^{-\gamma_g^{\mathrm{in}}}},
\tag{6.5}
\]

where P_g^{in}(i) is the popularity of the i-th ranked file in genre g, γ_g^{in} is the Zipf factor, q_g^{in} is the plateau factor, and M_g is the number of files in genre g. We will again provide a statistical model for the parameters in (6.5); the results are presented in Ch. 6.5.2. In Fig. 6.5, the model is compared with the real distribution of the genre "factual". The parameters of the MZipf distribution for June and July are γ_g^{in} = 2.5, q_g^{in} = 64, M_g = 5751 and γ_g^{in} = 2.8, q_g^{in} = 160, M_g = 6235, respectively. From the figures, we observe that the MZipf distribution effectively models the real distributions. [Footnote 12]

[Footnote 12] The MZipf distribution can, of course, effectively model other genres as well.

[Figure 6.3: Exemplary comparisons between the model and real data of individual genre popularity distributions in June. (a) Case 1. (b) Case 2.]

[Figure 6.4: Exemplary comparisons between the model and real data of individual genre popularity distributions in July. (a) Case 1. (b) Case 2.]

[Figure 6.5: Exemplary comparisons between the model and real data of genre-based conditional popularity distributions. (a) June. (b) July.]
6.4 Proposed Modeling of Individual Ranking Orders

In this section, the statistical modeling of the individual ranking order is investigated. As in the individual popularity distribution case, a genre-based structure is adopted.

6.4.1 Genre Appearance Probability

As elaborated in previous sections, the number of genres that a user accesses is usually much smaller than the total number of genres in the library. Therefore, for each user k, we can obtain a genre list which collects the genres that are accessed by user k. The number of genres in the genre list of user k is by definition S_k.

Since the genre list of a user explicitly indicates that user's specific preference for genres, characterizing the statistics of the genre list is necessary. To this end, genre appearance probabilities are used. The appearance probability of genre g is defined as the probability that genre g appears in the genre lists of users, i.e., the ratio between the number of times that genre g appears in genre lists of users and the total number of users. The proposed model describing the genre appearance probabilities is

\[
P_{\mathrm{ap}}(g) = N_{\mathrm{ap}} \exp(-\lambda_{\mathrm{ap}}\, g),
\tag{6.6}
\]

where P_ap(g) is the appearance probability of genre g, λ_ap is the shaping parameter, and N_ap is the scaling parameter. The comparisons between the model and the real data are provided in Fig. 6.6, and the parameters of the model are given in Tables D.9 and D.10 in Appendix D.4.

6.4.2 Genre Ranking Distribution

Given the genre list of a user, the ranking order of the genres in the list characterizes the preference of that user. To investigate the statistics of the ranking order, we investigate the ranking distributions of genres. The ranking distribution of a genre g is defined as the distribution of the ranks of genre g in the genre lists of users
conditioning on genre g appearing in those genre lists. By this definition, we denote by Pr(R_g = i) the probability that genre g is the i-th ranked genre when genre g appears in a genre list. The proposed model for the distribution of this quantity is a DGamma distribution:

\[
\Pr(R_g = i) = \frac{f_i(a, b)}{\sum_{j=1}^{G} f_j(a, b)} \sim \mathrm{DGamma}(a_g^{\mathrm{rk}}, b_g^{\mathrm{rk}}, G).
\tag{6.7}
\]

The DGamma distribution in (6.7) follows the same expressions as (6.2) and (6.3). We will again provide the statistical modeling of the parameters of the ranking distributions in Ch. 6.5.3.

[Figure 6.6: Comparisons between the model and real data of genre appearance probabilities. (a) June. (b) July.]

In Fig. 6.7, exemplary comparisons between the model and real data are provided; we again show the results for the genre "factual". The parameters of the model are a_g^{rk} = 2.95, b_g^{rk} = 0.8 for June and a_g^{rk} = 2.65, b_g^{rk} = 0.75 for July, with G = 110 for both June and July according to the dataset descriptions in Ch. 6.2.2. The results show good agreement between the model and the real data.

6.5 Statistical Parameterization of the Proposed Modeling Framework

By using the framework and models, the individual preferences can be described via the distributions and probability models in Ch. 6.3 and Ch. 6.4, which greatly reduces the complexity of describing a dataset. However, the parameters of the proposed framework still need a numerical description, and this numerical description can become impossible to handle as the number of users in a dataset increases. To further reduce the description complexity, in this section, statistical representations of the parameters of the modeling framework are proposed. We note that such representations are expressed either by well-known distributions or by certain specifically designed distributions.
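Before turning to the parameterization, the two genre-level models above, the appearance probabilities (6.6) and the DGamma ranking pmf (6.7), can be sketched in a few lines; the λ_ap and N_ap values below are placeholders, while the (a, b) pair is the June value quoted for the genre "factual".

```python
import math

def appearance_prob(g, lam_ap, n_ap):
    """P_ap(g) = N_ap * exp(-lam_ap * g) (Eq. 6.6), for genre index g."""
    return n_ap * math.exp(-lam_ap * g)

def ranking_pmf(a, b, G):
    """Pr(R_g = i) over ranks i = 1..G (Eq. 6.7), with f_i(a, b) = i^(a-1) * exp(-b*i)."""
    weights = [i ** (a - 1) * math.exp(-b * i) for i in range(1, G + 1)]
    total = sum(weights)
    return [w / total for w in weights]

G = 110
# Placeholder appearance parameters; June "factual" ranking parameters a = 2.95, b = 0.8.
app = [appearance_prob(g, lam_ap=0.05, n_ap=0.9) for g in range(1, G + 1)]
rank_pmf = ranking_pmf(2.95, 0.8, G)
likeliest_rank = max(range(1, G + 1), key=lambda i: rank_pmf[i - 1])
```

With these June parameters the most likely rank of "factual" in a user's genre list is near the top, as the unimodal DGamma shape suggests.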
Therefore, when dealing with well-known distributions, the standard maximum-likelihood (ML) approach in Matlab (2017) is used to fit the real data; by contrast, when dealing with the specially designed distributions, the K-L approach of Ch. 6.2.3 is again used. We note that all parameterization results in this section are provided quantitatively, with complete details including the confidence interval calculations, in Appendix D.4; the reader is thus referred to Appendix D.4 for the details of the model curves in all figures.

[Footnote 13] Since our goal is to further reduce the complexity of expressing our proposed modeling framework, we aim to fit the real data at least to a certain degree, even with some artificially constructed distributions.

[Figure 6.7: Exemplary comparisons between the model and real data of ranking distributions. (a) June. (b) July.]

6.5.1 Statistical Modeling for Parameters of Individual Genre Popularity Distributions

In this subsection, the statistical models of the parameters of the individual genre popularity distributions of users, i.e., γ_k^{out} and q_k^{out}, are provided. As indicated in Ch. 6.3.2 and in Figs. 6.3 and 6.4, the shapes of the individual genre popularity distributions can be categorized into different types according to the sign of q_k^{out}. As a result, it is natural to also characterize the distributions of the parameters differently according to these types. Consequently, we provide four distributions which independently model the two parameters for the two types: (i) γ_k^{out} whose associated q_k^{out} is non-negative, i.e., q_k^{out} ≥ 0; (ii) q_k^{out} ≥ 0; (iii) γ_k^{out} whose associated q_k^{out} is negative, i.e., q_k^{out} < 0; and (iv) q_k^{out} < 0. In the remainder of this chapter, shorthand descriptions are used for these: (i) γ_k^{out} with non-negative type (NNT); (ii) q_k^{out} with NNT; (iii) γ_k^{out} with negative type (NT); and (iv) q_k^{out} with NT.
We note that, from the dataset, the probabilities of having a non-negative q_k^{out}, i.e., q_k^{out} ≥ 0, when randomly picking a user are P_{NNT}^{out} = 0.795 and P_{NNT}^{out} = 0.784 in June and July, respectively. Thus the probability of picking a user who has a negative q_k^{out} is 1 − P_{NNT}^{out}.

Here we provide the statistical models for γ_k^{out} and q_k^{out} with NNT. The model for γ_k^{out} with NNT is a mixed distribution whose probability density function (pdf) is

\[
f_{\mathrm{ga,NNT}}^{\mathrm{out}}(x) = c_{1,\mathrm{ga}}^{\mathrm{out}} f_{\mathrm{Gam}}(x; a_{1,\mathrm{ga}}^{\mathrm{out}}, b_{1,\mathrm{ga}}^{\mathrm{out}}) + (1 - c_{1,\mathrm{ga}}^{\mathrm{out}} - c_{3,\mathrm{ga}}^{\mathrm{out}}) f_{\mathrm{unif}}(x; a_{2,\mathrm{ga}}^{\mathrm{out}}, b_{2,\mathrm{ga}}^{\mathrm{out}}) + c_{3,\mathrm{ga}}^{\mathrm{out}} f_{\mathrm{Gam}}(x; a_{3,\mathrm{ga}}^{\mathrm{out}}, b_{3,\mathrm{ga}}^{\mathrm{out}}),
\tag{6.8}
\]

where f_Gam(x; a, b) is a Gamma distribution with pdf

\[
f_{\mathrm{Gam}}(x; a, b) =
\begin{cases}
\dfrac{1}{b^{a}\,\Gamma(a)}\, x^{a-1} \exp\!\left(-\dfrac{x}{b}\right), & x \geq 0,\\[4pt]
0, & \text{otherwise},
\end{cases}
\tag{6.9}
\]

and f_unif(x; a, b) is a uniform distribution with pdf

\[
f_{\mathrm{unif}}(x; a, b) =
\begin{cases}
\dfrac{1}{b - a}, & a \leq x \leq b,\\[4pt]
0, & \text{otherwise}.
\end{cases}
\tag{6.10}
\]

From (6.8), we can observe that f_{ga,NNT}^{out}(x) is a sum of three constituent distributions with weights c_{1,ga}^{out}, 1 − c_{1,ga}^{out} − c_{3,ga}^{out}, and c_{3,ga}^{out}, respectively. By carefully selecting the parameters a_{1,ga}^{out}, b_{1,ga}^{out}, ..., b_{3,ga}^{out}, the distribution of γ_k^{out} with NNT is almost identical to a mixture of three distributions with non-overlapping supports. This description becomes clearer upon inspection of Figs. 6.8 and 6.9.

Similar to γ_k^{out} with NNT, the pdf of q_k^{out} with NNT is also a mixed distribution:

\[
f_{\mathrm{q,NNT}}^{\mathrm{out}}(x) = c_{1,\mathrm{q}}^{\mathrm{out}} f_{\mathrm{Gam}}(x; a_{1,\mathrm{q}}^{\mathrm{out}}, b_{1,\mathrm{q}}^{\mathrm{out}}) + (1 - c_{1,\mathrm{q}}^{\mathrm{out}} - c_{3,\mathrm{q}}^{\mathrm{out}}) f_{\mathrm{unif}}(x; a_{2,\mathrm{q}}^{\mathrm{out}}, b_{2,\mathrm{q}}^{\mathrm{out}}) + c_{3,\mathrm{q}}^{\mathrm{out}} f_{\mathrm{Gam}}(x; a_{3,\mathrm{q}}^{\mathrm{out}}, b_{3,\mathrm{q}}^{\mathrm{out}}),
\tag{6.11}
\]

where f_Gam and f_unif are defined in (6.9) and (6.10), respectively, and a_{1,q}^{out}, b_{1,q}^{out}, ..., b_{3,q}^{out}, c_{1,q}^{out}, c_{3,q}^{out} are the modeling parameters. The proposed models of γ_k^{out} and q_k^{out} with NNT are compared with the real data in the form of cumulative distribution functions (cdfs) in Figs. 6.8 and 6.9, respectively, for both June and July.
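A minimal sampler for the three-component mixtures (6.8) and (6.11); it assumes that Python's `random.gammavariate(shape, scale)` convention matches the (a, b) parameterization of (6.9), and all numeric weights and component parameters below are placeholders.

```python
import random

def sample_nnt_mixture(c1, c3, gam1, unif2, gam3, rng=random):
    """Draw one value from the mixture pdf (6.8)/(6.11):
    c1 * Gamma(a1, b1) + (1 - c1 - c3) * Uniform(a2, b2) + c3 * Gamma(a3, b3)."""
    u = rng.random()
    if u < c1:
        return rng.gammavariate(*gam1)   # first Gamma component
    if u < c1 + c3:
        return rng.gammavariate(*gam3)   # second Gamma component
    return rng.uniform(*unif2)           # uniform middle component

# Placeholder weights and component parameters: (shape, scale) for Gammas, (lower, upper) for the uniform.
rng = random.Random(1)
draws = [sample_nnt_mixture(0.4, 0.3, (2.0, 1.0), (5.0, 8.0), (3.0, 2.0), rng) for _ in range(1000)]
```

Choosing components with non-overlapping supports, as the text describes, makes the three parts of the empirical cdf directly identifiable.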
The parameters are given in Tables D.9 and D.10 in Appendix D.4. From the figures we can observe the effectiveness of the models, and that the cdfs can be regarded as three-part functions, which gives rise to the idea of using mixed distributions. We note that since both γ_k^{out} and q_k^{out} with NNT are modeled using specifically designed distributions, they are fitted by the K-L approach.

Now we provide the statistical models for γ_k^{out} and q_k^{out} with NT. The model for γ_k^{out} with NT is the Loglogistic distribution, and the pdf of γ_k^{out} with NT is

\[
f_{\mathrm{ga,NT}}^{\mathrm{out}}(x) = f_{\mathrm{Logl}}(x; \mu_{\mathrm{ga}}^{\mathrm{out}}, \sigma_{\mathrm{ga}}^{\mathrm{out}}) = \frac{1}{\sigma_{\mathrm{ga}}^{\mathrm{out}}} \cdot \frac{1}{x} \cdot \frac{\exp(z)}{(1 + \exp(z))^{2}},
\tag{6.12}
\]

where z = (log(x) − μ_ga^{out}) / σ_ga^{out}. Note that f_Logl(x; μ, σ) is the standard pdf expression of a Loglogistic distribution whose log-mean and log-scale parameters are μ and σ, respectively. The statistical model of q_k^{out} with NT is

[Figure 6.8: Comparisons between the model and real data for the distribution of γ_k^{out} with NNT (empirical cdfs). (a) June. (b) July.]

[Figure 6.9: Comparisons between the model and real data for the distribution of q_k^{out} with NNT (empirical cdfs). (a) June. (b) July.]

a variant of the Beta distribution, namely the negative Beta distribution. The pdf of q_k^{out} with NT is

\[
f_{\mathrm{q,NT}}^{\mathrm{out}}(x) = f_{\mathrm{Beta}}(x; a_{\mathrm{q}}^{\mathrm{out}}, b_{\mathrm{q}}^{\mathrm{out}}) =
\begin{cases}
\dfrac{(-x)^{a_{\mathrm{q}}^{\mathrm{out}} - 1} (1 + x)^{b_{\mathrm{q}}^{\mathrm{out}} - 1}}{B(a_{\mathrm{q}}^{\mathrm{out}}, b_{\mathrm{q}}^{\mathrm{out}})}, & -1 < x < 0,\\[4pt]
0, & \text{otherwise},
\end{cases}
\tag{6.13}
\]

where B(a, b) is the Beta function. Clearly, (6.13) is a Beta distribution in which the variable is −x instead of x, and the domain of x is accordingly changed to x ∈ (−1, 0) instead of x ∈ (0, 1). In Figs. 6.10 and 6.11, the comparisons between the models and the real data are provided for γ_k^{out} and q_k^{out} with NT, respectively.
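Both NT models can be sampled with standard-library tools: the Loglogistic in (6.12) is the exponential of a logistic draw (via inverse-CDF sampling), and the negative Beta in (6.13) is just the negation of a Beta draw. The (μ, σ) and (a, b) values below are placeholders.

```python
import math
import random

def sample_loglogistic(mu, sigma, rng=random):
    """Loglogistic draw with log-mean mu and log-scale sigma (Eq. 6.12):
    exp of a logistic(mu, sigma) draw, obtained by inverting the logistic CDF."""
    u = rng.random() or 1e-12           # avoid log(0) at the boundary
    return math.exp(mu + sigma * math.log(u / (1.0 - u)))

def sample_negative_beta(a, b, rng=random):
    """Negative Beta draw (Eq. 6.13): negation of a Beta(a, b) draw, support (-1, 0)."""
    return -rng.betavariate(a, b)

rng = random.Random(2)
gammas = [sample_loglogistic(0.5, 0.3, rng) for _ in range(100)]   # placeholder mu, sigma
qs = [sample_negative_beta(2.0, 3.0, rng) for _ in range(100)]     # placeholder a, b
```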
We note that, unlike their NNT counterparts, γ_k^{out} and q_k^{out} with NT are modeled using standard distributions and their variants. Therefore they are fitted using the ML approach.

6.5.2 Statistical Modeling for Parameters of Genre-Based Conditional Popularity Distribution

We now present the statistical modeling for γ_g^{in} and q_g^{in}. We again consider γ_g^{in} and q_g^{in} with NNT and NT, respectively. We note that, according to the dataset, the probabilities of a non-negative q_g^{in} when randomly picking a genre are P_{NNT}^{in} = 0.66 and P_{NNT}^{in} = 0.71 in June and July, respectively.

The γ_g^{in} with NNT is modeled by a variant of the Loglogistic distribution, namely the shifted-and-truncated Loglogistic distribution. The pdf of γ_g^{in} is then given by

\[
f_{\mathrm{ga,NNT}}^{\mathrm{in}}(x; \mu_{\mathrm{ga}}^{\mathrm{in}}, \sigma_{\mathrm{ga}}^{\mathrm{in}}, S_{\mathrm{ga}}^{\mathrm{in}}, T_{\mathrm{ga}}^{\mathrm{in}}) = f_{\mathrm{STLogl}}(x; \mu_{\mathrm{ga}}^{\mathrm{in}}, \sigma_{\mathrm{ga}}^{\mathrm{in}}, S_{\mathrm{ga}}^{\mathrm{in}}, T_{\mathrm{ga}}^{\mathrm{in}}) =
\begin{cases}
\dfrac{1}{\sigma_{\mathrm{ga}}^{\mathrm{in}}} \cdot \dfrac{1}{x - S_{\mathrm{ga}}^{\mathrm{in}}} \cdot \dfrac{\exp(z)}{(1 + \exp(z))^{2}}, & S_{\mathrm{ga}}^{\mathrm{in}} < x < T_{\mathrm{ga}}^{\mathrm{in}},\\[4pt]
\displaystyle\int_{T_{\mathrm{ga}}^{\mathrm{in}}}^{\infty} \dfrac{1}{\sigma_{\mathrm{ga}}^{\mathrm{in}}} \cdot \dfrac{1}{x - S_{\mathrm{ga}}^{\mathrm{in}}} \cdot \dfrac{\exp(z)}{(1 + \exp(z))^{2}}\, dx, & x = T_{\mathrm{ga}}^{\mathrm{in}},\\[4pt]
0, & \text{otherwise},
\end{cases}
\tag{6.14}
\]

where z = (log(x − S_ga^{in}) − μ_ga^{in}) / σ_ga^{in}, S_ga^{in} is the shift of the original Loglogistic distribution, and T_ga^{in} is the truncation parameter of the Loglogistic distribution. The q_g^{in} with NNT is modeled by the truncated Loglogistic distribution, i.e., the shifted-and-truncated Loglogistic distribution with zero shift.

[Figure 6.10: Comparisons between the model and real data for the distribution of γ_k^{out} with NT (empirical cdfs). (a) June. (b) July.]

[Figure 6.11: Comparisons between the model and real data for the distribution of q_k^{out} with NT (empirical cdfs). (a) June. (b) July.]
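One way to realize the shifted-and-truncated Loglogistic model of (6.14) is to shift a plain Loglogistic draw by S and collapse any value above T onto the point x = T, which carries the truncated tail mass (the second branch of (6.14)); setting the shift to zero gives the truncated model of (6.15). The numeric parameters below are placeholders, not fitted values.

```python
import math
import random

def sample_st_loglogistic(mu, sigma, shift, trunc, rng=random):
    """Shifted-and-truncated Loglogistic draw (Eq. 6.14): shift a Loglogistic
    sample by `shift`; values above `trunc` collapse onto the point x = trunc."""
    u = rng.random() or 1e-12
    x = shift + math.exp(mu + sigma * math.log(u / (1.0 - u)))
    return min(x, trunc)

rng = random.Random(3)
# Placeholder parameters; shift = 0 corresponds to the truncated variant of Eq. (6.15).
samples = [sample_st_loglogistic(0.8, 0.4, 0.0, 6.0, rng) for _ in range(200)]
```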
Thus the pdf of q_g^{in} with NNT is

\[
f_{\mathrm{q,NNT}}^{\mathrm{in}}(x; \mu_{\mathrm{q}}^{\mathrm{in}}, \sigma_{\mathrm{q}}^{\mathrm{in}}, T_{\mathrm{q}}^{\mathrm{in}}) = f_{\mathrm{STLogl}}(x; \mu_{\mathrm{q}}^{\mathrm{in}}, \sigma_{\mathrm{q}}^{\mathrm{in}}, S_{\mathrm{q}}^{\mathrm{in}} = 0, T_{\mathrm{q}}^{\mathrm{in}}).
\tag{6.15}
\]

In Figs. 6.12 and 6.13, the models of γ_g^{in} and q_g^{in} with NNT are respectively compared with the real data. We note that the model could be made more compact than the expression in (6.14), since S_ga^{in} is close to zero for the adopted dataset. However, we include S_ga^{in} in the model to preserve flexibility when dealing with other potential datasets. [Footnote 14]

The γ_g^{in} with NT is modeled by the Weibull distribution, and its pdf is

\[
f_{\mathrm{ga,NT}}^{\mathrm{in}}(x) = f_{\mathrm{WB}}(x; a_{\mathrm{ga}}^{\mathrm{in}}, b_{\mathrm{ga}}^{\mathrm{in}}) =
\begin{cases}
\dfrac{b_{\mathrm{ga}}^{\mathrm{in}}}{a_{\mathrm{ga}}^{\mathrm{in}}} \left(\dfrac{x}{a_{\mathrm{ga}}^{\mathrm{in}}}\right)^{b_{\mathrm{ga}}^{\mathrm{in}} - 1} \exp\!\left[-\left(\dfrac{x}{a_{\mathrm{ga}}^{\mathrm{in}}}\right)^{b_{\mathrm{ga}}^{\mathrm{in}}}\right], & x \geq 0,\\[4pt]
0, & \text{otherwise}.
\end{cases}
\tag{6.16}
\]

We note that f_WB(x; a, b) is the pdf of a Weibull distribution with scaling parameter a and shaping parameter b. The q_g^{in} with NT is again modeled using the negative Beta distribution in (6.13). To be specific, the pdf of q_g^{in} with NT is

\[
f_{\mathrm{q,NT}}^{\mathrm{in}}(x) = f_{\mathrm{Beta}}(x; a_{\mathrm{q}}^{\mathrm{in}}, b_{\mathrm{q}}^{\mathrm{in}}).
\tag{6.17}
\]

The proposed models of γ_g^{in} and q_g^{in} with NT are respectively compared with the real data in Figs. 6.14 and 6.15.

6.5.3 Statistical Modeling for Parameters of Genre Ranking Distribution

Here the statistical models for the parameters a_g^{rk} and b_g^{rk} of the ranking distributions are provided. The model for a_g^{rk} is a Weibull distribution, and its pdf is

\[
f_{\mathrm{a}}^{\mathrm{rk}}(x) = f_{\mathrm{WB}}(x; \alpha_{\mathrm{a}}^{\mathrm{rk}}, \beta_{\mathrm{a}}^{\mathrm{rk}}),
\tag{6.18}
\]

[Footnote 14] We found S_ga^{in} to be far from zero when we consider a one-week sub-dataset instead of the complete one-month dataset.

[Figure 6.12: Comparisons between the model and real data for the distribution of γ_g^{in} with NNT (empirical cdfs). (a) June. (b) July.]
[Figure 6.13: Comparisons between the model and real data for the distribution of q_g^{in} with NNT (empirical cdfs). (a) June. (b) July.]

[Figure 6.14: Comparisons between the model and real data for the distribution of γ_g^{in} with NT (empirical cdfs). (a) June. (b) July.]

[Figure 6.15: Comparisons between the model and real data for the distribution of q_g^{in} with NT (empirical cdfs). (a) June. (b) July.]

where f_WB(x; α_a^{rk}, β_a^{rk}) is described by (6.16). The model for b_g^{rk} is a Gamma distribution, and its pdf is

\[
f_{\mathrm{b}}^{\mathrm{rk}}(x) = f_{\mathrm{Gam}}(x; \alpha_{\mathrm{b}}^{\mathrm{rk}}, \beta_{\mathrm{b}}^{\mathrm{rk}}),
\tag{6.19}
\]

where f_Gam(x; α_b^{rk}, β_b^{rk}) is described by (6.9). The comparisons between the models and the real data for a_g^{rk} and b_g^{rk} are provided in Figs. 6.16 and 6.17.

6.5.4 Statistical Modeling for User Loading

Here the statistical model for the distribution of the loading of regular users is provided. Although the loading of users is unrelated to their individual preferences, characterizing the user loading is necessary for system simulations and for generating the final global popularity. Based on the dataset, the loading of a user is given by the number of unique accesses of that user. Thus, the loading of users is always greater than or equal to 30, according to the descriptions in Ch. 6.2.2. The loading distribution is then modeled by the shifted MZipf distribution:

\[
\Pr(L_k = i) = \frac{(i - 29 + q_{\mathrm{Ld}})^{-\gamma_{\mathrm{Ld}}}}{\sum_{j=30}^{L} (j - 29 + q_{\mathrm{Ld}})^{-\gamma_{\mathrm{Ld}}}}, \quad i \geq 30,
\tag{6.20}
\]

where L_k is the load of user k and L is the maximum possible load; γ_Ld and q_Ld are the parameters of the distribution. In Fig. 6.18, the model is compared with the real data.
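The shifted MZipf loading pmf in (6.20) can be tabulated directly; the γ_Ld, q_Ld, and L values below are placeholders rather than the fitted parameters from Appendix D.4.

```python
def loading_pmf(gamma_ld, q_ld, L):
    """Pr(L_k = i) for i = 30..L (Eq. 6.20): proportional to (i - 29 + q_Ld)^(-gamma_Ld)."""
    weights = {i: (i - 29 + q_ld) ** (-gamma_ld) for i in range(30, L + 1)}
    total = sum(weights.values())
    return {i: w / total for i, w in weights.items()}

# Placeholder parameters: loads range from the minimum of 30 up to L = 500.
pmf = loading_pmf(gamma_ld=1.8, q_ld=2.0, L=500)
```

The shift by 29 simply re-indexes the support so that the smallest admissible load, 30, plays the role of rank 1 in the MZipf form.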
6.6 Correlation Analysis for Parameters of Proposed Modeling Framework

In this section, we investigate the correlations between parameters using both the Pearson correlation coefficient, i.e., the linear correlation coefficient, and the Spearman rank correlation coefficient. [Footnote 15] The reasons for using rank correlation are: (i) to provide the correlation analysis from a ranking perspective, since ranking order is an important characteristic of our model; and (ii) to allow the reconstruction of correlations between parameters characterized by arbitrary distributions. Note that when parameters are non-Gaussian or do not follow commonly used multivariate distributions, it is generally impossible to reconstruct their dependence using only linear correlation information [212]. By contrast, the reconstruction of rank correlation via copulas is applicable to almost any distributions [212, 213]. As a result, knowing the rank correlation is important, and its properties are exploited by the proposed individual probability generation in the next section.

[Footnote 15] Note that there are other types of rank correlation coefficient. The Spearman rank correlation coefficient is one of the most commonly used, and its definition is directly related to the linear correlation coefficient.

[Figure 6.16: Comparisons between the model and real data for the distribution of a_g^{rk} (empirical cdfs). (a) June. (b) July.]

[Figure 6.17: Comparisons between the model and real data for the distribution of b_g^{rk} (empirical cdfs). (a) June. (b) July.]

[Figure 6.18: Comparison between the model and real data of the loading distribution.]
The Pearson correlation coefficient is defined as

\[
\rho_{\mathrm{Ln}}(x, y) = \frac{\mathrm{Cov}(x, y)}{\sigma_x \sigma_y},
\tag{6.21}
\]

where Cov(x, y) is the covariance of x and y, and σ_x and σ_y are the standard deviations of x and y, respectively. The Spearman rank correlation coefficient is defined as [214]

\[
\rho_{\mathrm{Rn}}(x, y) = \frac{\mathrm{Cov}(r_x, r_y)}{\sigma_{r_x} \sigma_{r_y}},
\tag{6.22}
\]

where r_x and r_y are the corresponding rankings of the original x and y, respectively; Cov(r_x, r_y), σ_{r_x}, and σ_{r_y} are then the covariance and standard deviations of r_x and r_y, respectively. We elaborate r_x with an example: suppose x takes the three possible values 0.3, 0.5, and 0.7; their corresponding rankings, i.e., the values of r_x, are then 1, 2, and 3, respectively. By comparing (6.21) with (6.22), it can be observed that (6.22) is simply the linear correlation coefficient of the corresponding ranking values of x and y. We note that since the real data are used to conduct the correlation analyses, the corresponding sample-based estimators are used instead of the true expectations.

Here we discuss the correlations between parameters and present some insights. We generally aim to explore the correlation between parameters of the same distribution. This is relevant to determining whether a specific trend of a distribution (jointly determined by the parameters of the same popularity distribution) appears more frequently. Note that, in this work, parameters are considered correlated only when the absolute values of their linear and/or rank correlation coefficients are greater than 0.5. We focus on the analytical discussion here; the complete numerical descriptions are provided in Tables D.1, D.2, D.3, and D.4 in Appendix D.2. The results are as follows. The γ_k^{out} and q_k^{out} with NNT are positively correlated in terms of both the linear and rank correlation coefficients. In addition, the γ_k^{out} and q_k^{out} with NT are also positively correlated in terms of both the linear and rank correlation coefficients.
These results indicate that γ_k^{out} and q_k^{out} balance each other, so that the case in which a user has a single highly preferred genre together with many other extremely weakly preferred genres rarely occurs. Since a user's preference might be related to its loading and its number of genres of interest, we also explore whether γ_k^{out} and q_k^{out} are correlated with S_k and L_k. The results indicate that there is no significant correlation between them. We also notice that S_k and L_k are not correlated with each other. Thus, even if a user has only a very small number of genres of interest, (s)he can still impose a very high load on the network, and vice versa.

We now consider the parameters of the genre-based conditional popularity distributions. For γ_g^{in} and q_g^{in} with NNT, the results show only a slight correlation between them in terms of both linear and rank correlation. For γ_g^{in} and q_g^{in} with NT, again only a slight correlation is observed in terms of both linear and rank correlations. Finally, we consider the parameters of the ranking distributions, i.e., a_g^{rk} and b_g^{rk}. From the results, we observe that they are correlated with each other in terms of linear and rank correlations. Besides, since it is intuitive that a more popular genre (in terms of global ranking order) should have a higher likelihood of receiving a higher rank, we explore the rank correlation between a_g^{rk}, b_g^{rk}, and the global ranking. Specifically, we relate the global rank of each genre to its a_g^{rk} and b_g^{rk} and compute the rank correlation. The results show that a_g^{rk} and b_g^{rk} are somewhat correlated with the global ranking, indicating that a genre with a higher global rank is likely to also rank higher among the interests of individual users.
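The sample versions of (6.21) and (6.22) used in such analyses can be reproduced in a few lines; Spearman's coefficient is just Pearson's applied to the rank values. The data below are an illustrative monotone-but-nonlinear pair (no ties), not values from the dataset.

```python
import math

def pearson(x, y):
    """Sample Pearson (linear) correlation coefficient (Eq. 6.21)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman(x, y):
    """Sample Spearman rank correlation (Eq. 6.22): Pearson on the rank values (ties ignored)."""
    def ranks(v):
        order = sorted(range(len(v)), key=v.__getitem__)
        r = [0] * len(v)
        for rank, idx in enumerate(order, start=1):
            r[idx] = rank
        return r
    return pearson(ranks(x), ranks(y))

# Monotone but nonlinear relation: rank correlation is exactly 1, linear correlation is not.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [v ** 3 for v in x]
```

This is exactly why the rank coefficient is preferred for copula-based reconstruction: it is invariant under any monotone transformation of the marginals.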
6.7 Proposed Individual Preference Probability Generation

In this section, we first propose an approach that generates the individual preference probabilities of users according to the models, parameterization, and correlation results of the previous sections. Then the effectiveness of the proposed generation approach is validated by comparisons with the real-world data.

6.7.1 Parameter Generation

To generate the individual preference probabilities of users, the first step is to generate the parameters used by the models in Ch. 6.3 and Ch. 6.4, using the results in Ch. 6.5 and Ch. 6.6. The parameter generation is based on the rank correlation results and the individual marginal distributions of the parameters. We therefore define the parameter generation function as

\[
\mathbf{y}(\mathbf{x}) = G_{\mathrm{para}}(\mathbf{C}_{\mathbf{x}}^{\mathrm{Rn}}, \{f_{\mathbf{x}}\}),
\tag{6.23}
\]

where x is the vector of parameters to be generated, y is the generated instance of x, C_x^{Rn} is the rank covariance matrix of x, and {f_x} is the set of marginal distributions of x. We note that the implementation recipe of G_para(C_x^{Rn}, {f_x}) is provided in Appendix D.3. Also note that if a parameter in x is not correlated with the other parameters, its instance can obviously be generated simply from its marginal distribution.

There are two types of parameters: (i) library-based parameters and (ii) individual-based parameters. Since the library-based parameters are determined at the beginning of the generation process of a library and are invariant across users, they are either given directly from the setup or need only a single generation for a particular library. By contrast, the individual-based parameters are generated independently for each user, and different users generally have their own instances of the parameters. The library-based parameters are: G; M_g, ∀g; a_Si, b_Si, M_Si; γ_g^{in}, q_g^{in}, a_g^{rk}, b_g^{rk}, ∀g; λ_ap, N_ap; L; γ_Ld; and q_Ld. The individual-based parameters are: γ_k^{out} and q_k^{out}, ∀k.
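While the actual recipe for G_para in (6.23) is given in Appendix D.3, the idea of coupling a rank covariance with arbitrary marginals can be sketched for two parameters with a Gaussian copula. Everything below (the Spearman-to-Pearson mapping for Gaussian copulas, the exponential inverse-CDF marginals) is an illustrative assumption, not the dissertation's exact procedure.

```python
import math
import random

def gen_pair(rho_rank, inv_cdf_x, inv_cdf_y, rng):
    """Gaussian-copula sketch of G_para (Eq. 6.23) for two parameters:
    draw correlated standard normals, map them to uniforms with the normal CDF,
    then push the uniforms through the inverse marginal CDFs."""
    rho = 2.0 * math.sin(math.pi * rho_rank / 6.0)   # Spearman -> Pearson for a Gaussian copula
    z1 = rng.gauss(0.0, 1.0)
    z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
    phi = lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return inv_cdf_x(phi(z1)), inv_cdf_y(phi(z2))

rng = random.Random(4)
# Hypothetical marginals, specified by their inverse CDFs.
inv_exp = lambda u: -math.log(1.0 - u)
inv_shift = lambda u: 1.0 + inv_exp(u)
pair = gen_pair(0.8, inv_exp, inv_shift, rng)
```

Because the rank correlation survives the monotone inverse-CDF transforms, the generated pair keeps the target Spearman coefficient regardless of which marginals are plugged in.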
Finally, we discuss the sensitivity of the statistics of the parameters to a change of dataset, based on the extensive real-world data of June and July. Indeed, the parameterization results show that the statistics of the June and July datasets are quite similar, and most of the fundamental statistics of the parameters in the two datasets are close, including a_Si, b_Si, M_Si, a_g^{rk}, λ_ap, N_ap, L, γ_Ld, q_Ld, γ_k^{out}, and q_k^{out}. The remaining parameters differ only in part: for γ_g^{in} and q_g^{in}, the differences lie only in the cases of their negative types; for b_g^{rk}, the difference lies only in the value of β_b^{rk}. In conclusion, the fundamental statistics of the parameters of the framework are insensitive to a change of time on the scale of one month for our dataset. [Footnote 16] This also leads to similar global popularity distributions, as we will see in Ch. 6.7.3. We should note that although their statistics are similar, the two datasets differ in many details, including the exact file and genre orders and the exact popularity distributions of a genre. We also stress that this conclusion is only valid for the dataset adopted in our work; its extension to other timeframes, and in particular to other types of video service such as YouTube or Netflix, should undergo careful examination.

[Footnote 16] We note that all descriptions here can be quantified by comparing the values in the tables in Appendix D.4.

6.7.2 Procedure of the Proposed Individual Preference Probability Generation Approach

Here the general procedure of the individual preference probability generation is elaborated. Note that a sketch of the generation approach has already been provided in Fig. 6.1. To generate the individual preference probabilities of users, we first prepare all the library-specific parameters.
Then the genre-based conditional popularity distributions P_g^{in}(·), ∀g, the genre appearance probabilities P_ap(g), ∀g, and the ranking distributions {Pr(R_g)}, ∀g, are generated according to (6.5), (6.6), and (6.7), respectively. Note that these distributions are library-specific and remain invariant when generating the individual preference probabilities of different users.

We next generate the individual popularity distribution of user k. The number of genres in the genre list of user k, i.e., S_k, is first generated according to (6.2). Then we generate γ_k^{out} and q_k^{out} according to the results in Ch. 6.5.1 and (6.23). Subsequently, the individual genre popularity distribution P_k^{out}(·) is generated according to (6.4). The genre list and the individual ranking order of user k are generated according to the proposed ranking order generation approach in Alg. 1. The output of the ranking order generation process is the genre index vector r_k of user k, where r_k contains the indices of the genres that appear in the genre list. Moreover, the order of the indices in r_k is exactly the ranking order of the corresponding genres. Therefore r_k uniquely specifies the genre list and the ranking order of user k. For example, suppose we have G = 5, S_k = 3, and r_k = [3, 2, 5]. Then genres 2, 3, and 5 are the genres in the genre list of user k; genre 3 is ranked 1st, genre 2 is ranked 2nd, and genre 5 is ranked 3rd for user k. For Alg.
1, we provide the following remarks: (i) ‖r‖_0 is equal to the number of non-zero entries in r, where ‖·‖_0 is the L_0 norm; (ii) step 4 randomizes the filling order of genres at each round; (iii) step 7 checks whether the genre has already been filled into the genre list; (iv) step 10 checks whether the selected genre should appear in the genre list, whether the ranking value R is less than or equal to the size of the genre list, and whether the genre list is full; and (v) step 16 generates the final order of the genres in the list according to the generated ranking values. For example, suppose G = 5, S_k = 3, and r = [0, 2, 1, 0, 2]. We would then have r_k = [3, 2, 5] according to step 16 of Alg. 1.

Equipped with the genre-based conditional popularity distributions P_g^{in}(·), and after the generation of the individual preference popularity P_k^{out}(·) and the genre index vector r_k, the individual preference probabilities of user k can then be generated by [Footnote 17]

\[
p_{g,m}^{k} = f_{k,g}^{\mathrm{out}}\, P_g^{\mathrm{in}}(m),
\tag{6.24}
\]

where

\[
f_{k,g}^{\mathrm{out}} =
\begin{cases}
P_k^{\mathrm{out}}(i), & \text{entry } i \text{ of } \mathbf{r}_k = g,\\
0, & \text{otherwise}.
\end{cases}
\tag{6.25}
\]

Eq. (6.25) indicates that only the genres indexed in r_k have non-zero preference probabilities, and the preference order is given by the order of the indices in r_k. For example, suppose that we have G = 5, S_k = 3, P_k^{out}(1) = 0.5455, P_k^{out}(2) = 0.2727, P_k^{out}(3) = 0.1818, and r_k = [3, 2, 5]. Then f_{k,1}^{out} = f_{k,4}^{out} = 0, f_{k,3}^{out} = P_k^{out}(1) = 0.5455, f_{k,2}^{out} = P_k^{out}(2) = 0.2727, and f_{k,5}^{out} = P_k^{out}(3) = 0.1818. We note that, without loss of generality (for the proposed modeling framework), (6.24) assumes that the indices of the files within each genre follow the descending order of the global popularities of the files within the genre, i.e., p_{g,m}^{k} ≥ p_{g,m+1}^{k}, ∀k. By combining (6.24) with (6.25), the individual preference probabilities p_{g,m}^{k}, ∀g, m, of user k can be obtained. By repeating the procedure of this section, the individual preference probabilities of different users can be generated.
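The assembly in (6.24)-(6.25) can be reproduced directly from the worked example above (G = 5, r_k = [3, 2, 5]); the per-genre two-file conditional pmfs below are illustrative stand-ins for P_g^{in}.

```python
def preference_probs(p_out_k, r_k, p_in, G):
    """Eqs. (6.24)-(6.25): p^k_{g,m} = f^out_{k,g} * P^in_g(m), where f^out_{k,g}
    equals P^out_k(i) if genre g is the i-th entry of r_k, and 0 otherwise."""
    f_out = {g: 0.0 for g in range(1, G + 1)}
    for i, g in enumerate(r_k, start=1):
        f_out[g] = p_out_k[i - 1]
    return {(g, m + 1): f_out[g] * p_in[g][m]
            for g in range(1, G + 1) for m in range(len(p_in[g]))}

# Worked example from the text: G = 5, S_k = 3, r_k = [3, 2, 5].
p_out_k = [0.5455, 0.2727, 0.1818]
r_k = [3, 2, 5]
# Illustrative two-file conditional pmfs per genre (stand-ins for Eq. 6.5).
p_in = {g: [0.6, 0.4] for g in range(1, 6)}
p_k = preference_probs(p_out_k, r_k, p_in, G=5)
```

Since each conditional pmf sums to one and the P_k^{out} values sum to one, the resulting p^k_{g,m} form a valid probability distribution over all (genre, file) pairs, with zero mass on the genres outside r_k.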
We note that although $L_k$ is not used for generating the individual preference of a user, it is used when generating the global preference of users, since it indicates the traffic generated by each user. The global popularity distribution is generated by
$$p_{g,m}^{\rm Gb} = \frac{\sum_{k=1}^K p_{g,m}^k L_k}{\sum_{g,m} \sum_{k=1}^K p_{g,m}^k L_k}, \qquad (6.26)$$
where $p_{g,m}^{\rm Gb}$ is the global preference probability of file $m$ in genre $g$, $K$ is the total number of users, and $L_k$ is the loading of user $k$ generated according to (6.20).

Footnote 17: It can be observed that, with the proposed modeling and generator, file $m$ in genre $g$ is ranked at the $m$th position in the genre-based conditional popularity distribution of genre $g$. This is because the non-user-specific genre-based conditional popularity distribution is used to approximate the user preferences for files within the genre; this index arrangement is used for convenience and without loss of generality.

6.7.3 Numerical Validations

Here the generation approach is validated by comparing the generated results to the underlying real data. To set up the generation approach, the basic parameters of the models used by the approach need to be specified; they are provided in Tables D.9 and D.10 in Appendix D.4. We note that since the proposed modeling assumes each file has only one annotated genre, while files in the real data can indeed have more than a single annotated genre, there exists a mismatch in the number of files per genre between the generation approach and the real data when both have the same total number of files. To calibrate, we adjust the numbers of files in the genres of the generation approach so that the influence of the multi-genre files is accommodated. To clarify the concept used for the calibration, the following example is provided. Suppose we have 1000 files in genre 1 and 2000 files in genre 2 according to the real data, but the total number of files is only 2400 because 600 files are annotated with both genres 1 and 2.
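The two steps above, mapping the genre index vector to per-file preference probabilities via (6.24)-(6.25) and aggregating individual preferences into the global distribution via (6.26), can be sketched in Python. The toy genre-conditional distributions and loads used below are placeholders, not fitted values:

```python
def individual_preference(r_k, P_out_k, P_in):
    """Eqs. (6.24)-(6.25): p^k_{g,m} = f^out_{k,g} * P^in_g(m).

    r_k     : genre index vector; r_k[i-1] is the genre ranked i-th
    P_out_k : P^out_k(i) for i = 1..S_k (individual genre popularity)
    P_in    : P_in[g][m-1] = P^in_g(m), conditional file popularity in genre g
    """
    p = {}
    for g, probs in P_in.items():
        if g in r_k:
            f = P_out_k[r_k.index(g)]   # entry i of r_k equals g -> P^out_k(i)
        else:
            f = 0.0                     # genres outside the genre list get zero
        p[g] = [f * q for q in probs]
    return p

def global_popularity(users, loads):
    """Eq. (6.26): load-weighted aggregation over users, normalized."""
    tot = {g: [0.0] * len(v) for g, v in users[0].items()}
    for p_k, L_k in zip(users, loads):
        for g, probs in p_k.items():
            for m, q in enumerate(probs):
                tot[g][m] += q * L_k
    Z = sum(sum(v) for v in tot.values())
    return {g: [x / Z for x in v] for g, v in tot.items()}
```

With the worked example from the text ($G=5$, $S_k=3$, $\mathbf{r}_k=[3,2,5]$, $P_k^{\rm out}=[0.5455, 0.2727, 0.1818]$), `individual_preference` reproduces $f_{k,3}^{\rm out}=0.5455$ and zero preference for genres 1 and 4.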
This indicates that we want $M_1 + M_2 = 2400$ for the generation approach. Since there are 600 files shared by genres 1 and 2, we consider these files to contribute $\frac{1}{2}$ to each genre, i.e., a file with 2 annotated genres is counted as $\frac{1}{2}$ file in each genre. Therefore, after the calibration, we have $M_1 = 400 + 300 = 700$ and $M_2 = 1400 + 300 = 1700$. (Footnote 18: If there exist fractional numbers after the calibration, they are simply rounded to the nearest integer.) Note that all the numbers of files $M_g\ \forall g$ after calibration are provided in Tables D.12 and D.13 in Appendix D.4. In addition to the calibration issue, the global popularity distribution is highly sensitive to the parameters of the genre-based conditional popularity distributions, i.e., $\gamma_g^{\rm in}$ and $q_g^{\rm in}\ \forall g$. Hence, to obtain a highly accurate generation for comparison purposes, in addition to providing the results generated purely by the statistical models in Ch. 6.5.2, we also provide results for which the numerical values directly derived from the dataset are used for the top 30 ranked genres, i.e., for $\gamma_g^{\rm in}$ and $q_g^{\rm in}$, $g = 1, \ldots, 30$. Their values are provided in Table D.11 in Appendix D.4. To clarify the sensitivity problem and the reason for using the numerical values, we stress that the real-world data is actually one of all possible instances that can be generated by the proposed generation approach. In other words, when using the statistical approach for generating certain parameters, we are comparing one particular realization of the file popularities with another realization (the measured data). Also, since the conditional genre-based popularity distributions whose corresponding genres are popular generally cover a wide range of files, a change in the parameters of those popularity distributions indeed influences the final popularity significantly.
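The calibration rule above (splitting each multi-genre file equally among its annotated genres, then rounding) can be sketched as follows; the function and argument names are our own illustrative choices:

```python
from fractions import Fraction

def calibrate_counts(raw_counts, shared):
    """Split multi-genre files equally among their genres so that the
    per-genre counts sum to the number of distinct files.

    raw_counts : {genre: number of files annotated with that genre}
    shared     : {frozenset of genres: number of files annotated with all of them}
    """
    adjusted = {g: Fraction(n) for g, n in raw_counts.items()}
    for genres, n in shared.items():
        k = len(genres)
        for g in genres:
            # remove the full count of the shared files, add back a 1/k share
            adjusted[g] += Fraction(n, k) - Fraction(n)
    # per footnote 18, fractional results are rounded to the nearest integer
    return {g: round(v) for g, v in adjusted.items()}
```

Running it on the example from the text (1000 and 2000 annotated files, 600 shared) gives $M_1 = 700$ and $M_2 = 1700$, summing to the 2400 distinct files.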
Moreover, because the generation approach is non-linear, taking the average of differently generated global popularity distributions does not give the result generated by the average of the parameters. The validation of the individual components of the model has been provided in previous sections throughout the chapter. Hence, as a validation of the complete model, we investigate whether averaging over the obtained individual user distributions reproduces the total popularity distribution that was independently extracted from the observed data. We note that the complete flow of the generation approach is provided in Fig. D.16 in Appendix D.4, and it is particularly used for the validations in this work. Thus it also serves as a demonstration of how to use the proposed framework and generation approach to generate the individual preference probabilities and their corresponding global popularity distribution. Fig. 6.19 compares the global popularity of files of the dataset of June with the global popularity of files constructed by realizations generated by the generator; the same comparison for July is in Fig. 6.20. The results show good agreement between the model and the data in both figures when the numerical values of $\gamma_g^{\rm in}$ and $q_g^{\rm in}$ are used for the top 30 ranked genres. We note that the global popularity distributions in Figs. 6.19 and 6.20 are close to each other because the statistics of the parameters are insensitive to the monthly change, as discussed at the end of Ch. 6.7.1.

Figure 6.19: Comparison between global popularity distributions from the proposed generation approach and real data in June. (Log-log plot of popularity vs. rank; curves: Real Data, Proposed Generation (30 Numerical), Proposed Generation (Pure Statistical).)
Figure 6.20: Comparison between global popularity distributions from the proposed generation approach and real data in July. (Log-log plot of popularity vs. rank; curves: Real Data, Proposed Generation (30 Numerical), Proposed Generation (Pure Statistical).)

6.8 Summary of Insights and Applications

To finish our discussions, we first summarize the insights of this work and then discuss possible applications. To model the popularity distributions, we introduced the MZipf distribution, which is commonly used for modeling popularity distributions. For most cases of the results, the plateau factor of the MZipf distribution is positive, leading to a flat head followed by a steep tail. The flat head indicates that there is a group of popular files/genres with almost equal popularity, and the steep tail represents a group of progressively less popular files/genres. This implies that the requests mainly come from the group of popular files/genres, and that the requests spread out evenly within that group. The results of the size and ranking distributions indicate that each person usually has only a handful of genres of interest. Moreover, the genres of interest and the corresponding orders of preference differ between individuals. From the statistics of the parameters, the positive correlation between the parameters of the individual genre popularity distributions indicates that the case of a user having a single highly preferred genre along with many other extremely weakly preferred genres seldom exists. Besides, we observe that the individual genre popularity distribution is correlated neither with the number of genres of interest nor with the loading a user imposes on the network. Furthermore, we did not find obvious correlations between the number of genres of interest and the loading, indicating that even if a user has only a very small number of genres of interest, (s)he can still impose a very high load on the network, and vice versa.
Finally, we notice that the shape of a ranking distribution is slightly related to the global ranking of the corresponding genre. Specifically, when a genre has a higher global rank, it is more likely that it has a higher rank among the interests of the users. The improved modeling of the popularity distribution, i.e., modeling that involves individual preferences, allows the following applications:
- Adjusting the cached content individually to maximize the utility according to the estimated individual preferences of users [78, 191, 208, 215].
- Improving system performance by grouping users with similar preferences to enhance cooperation between users [86, 87, 216].
- Optimizing the caching policy by considering the individual preferences of users [82, 206, 207].
- Deciding where and how to store content in intermediate routers by considering the aggregates of interests at the edges of a network adopting an information-centric architecture [16].
- Adjusting advertising campaigns to better appeal to the consumers of particular TV shows.
- Offering a foundation for more accurate network analysis.
Recent literature has demonstrated that exploiting the information of individual preferences can further improve different aspects of such systems; however, those works either offered evaluations using simple individual preference modeling without support from real-world data, or spent significant effort on obtaining the data. Note that since collecting data is very challenging, even if a research group spends significant effort, the volume of the collected data might still be insufficient for providing reliable results. Our work can help in this situation. Based on real-world data, the proposed modeling framework and parameterization can be used to generate practical pseudo individual preference probabilities for verifying investigations that consider individual preferences. A demonstration of applying our work is provided in [206].
6.9 Conclusions

This chapter proposed what is, to the best of our knowledge, the first modeling framework and corresponding statistical models for the individual preference probabilities of users for video content based on real-world data; following the framework, parameterizations and correlation analyses were conducted. The parameterized model is able to reproduce critical statistics of the individual preferences, and an individual preference probability generation approach was therefore proposed by judiciously linking those statistics together. The modeling framework is based on, and parameterized by, extensive real-world data sets. The effectiveness of the proposed models and the generation approach was validated. The framework and methodology presented in this work can be applied to other datasets, and the analysis methods and approaches adopted throughout the chapter are extensible. Also, the flexibility of the proposed generation approach allows replacing particular fitting distributions if other data sets indicate a need for such a replacement. In other words, any part of the models can be replaced if necessary, and the generation approach remains effective as long as the logical flow and critical implementation steps are preserved. On the other hand, the parameterization and correlation analysis results depend on the dataset, and there is no guarantee that those results extend to other datasets. Thus, careful examination should be conducted when considering other datasets.
Algorithm 1 Proposed Ranking Order Generation Approach
1: Input: $S_k$
2: Init: a zero vector $\mathbf{r} = \mathbf{0}$
3: while $\|\mathbf{r}\|_0 < S_k$ do
4:   Create a random permutation vector $\mathbf{P}_v$ with entries being $2, 3, \ldots, G$ and create an augmented vector $\mathbf{P} = [1\,|\,\mathbf{P}_v]$
5:   for $i = 1 \to G$ do
6:     $g = \mathbf{P}(i)$
7:     if $\mathbf{r}(g) = 0$ then
8:       $t \sim \mathrm{binomial}(1, P_{\rm ap}(g))$
9:       $R \sim \mathrm{DGamma}(a_g^{\rm rk}, b_g^{\rm rk}, G)$
10:      if $t = 1$ and $R \leq S_k$ and $\|\mathbf{r}\|_0 < S_k$ then
11:        $\mathbf{r}(g) = R$
12:      end if
13:    end if
14:  end for
15: end while
16: $\mathbf{r}_k$ = arrangement of the indices of $\mathrm{sort}(\mathbf{r}, \text{ascend})$. Break ties by putting the lower index at the lower order. Ignore indices whose corresponding values are zero in $\mathbf{r}$.
17: return $\mathbf{r}_k$

CHAPTER 7

Individual Preference Aware Caching Policy Design in Wireless D2D Networks

7.1 Introduction

Cache-aided D2D has demonstrated the ability to significantly improve network performance without the need for newly installed infrastructure or complicated coding. However, most of the existing papers on cache-aided D2D networks consider a homogeneous preference model, which assumes that all users have the same file preference; in other words, each user requests files independently and randomly according to the same popularity distribution. This model is at best an approximation, because different users indeed have different tastes and preferences. Such heterogeneity in user preferences has been observed in [78] and has also been modeled in recent works [79-81] and in Ch. 6. Furthermore, based on real-world data, results in [78] have shown that leveraging individual (user) preferences can indeed improve network performance. These observations clarify that cache-aided D2D networks can be further improved by using a heterogeneous, instead of a homogeneous, preference model in the network design, since designs based on the homogeneous model are restricted by not considering individual user preferences.
Researchers have recently begun to consider individual preferences in their analyses, and studies have accordingly shown that it is possible to use this information to improve the performance of wireless caching networks [78, 79, 82, 84-87, 130]. In [78], individual preferences were studied, and a machine-learning approach was used to learn a user's preferences and accordingly decide which video should be preloaded onto a local device cache based on the preference of that particular user. While this kind of approach (also known as the "Netflix challenge") is very important for recommendation systems and preloading on individual devices, it is not applicable to cache-aided D2D networks. In [82], an individual preference-aware weighted sum utility of users was formulated and optimized when the files are cached at the BSs. Meanwhile, [86] designed a caching policy by assuming that users in different groups have different file preferences; the goal is then to maximize the successful file discovery probability of the different groups without taking possible interference into account. In [87], a content push strategy was designed to maximize the D2D offloading gain for a particular demand realization by jointly considering the influences of user preference and sharing willingness. In [79], user preferences were used to maximize the offloading probability without accounting for the details of the physical layer. Using individual preferences and user similarity, [84] proposed a caching content allocation approach to maximize a specifically defined utility. While [85] focused on estimating individual preferences using a learning-based algorithm, the study provided a caching policy that exploits the estimated preferences in order to minimize the average delay of D2D networks. Lastly, [130] proposed a game-theoretic cooperative caching design by assuming that users know exactly which files they want to request.
Despite this progress, the understanding of how individual preferences can be used to improve cache-aided D2D networks is still far from complete. It is still unclear whether integrating individual preferences into the design can improve network performance significantly. Moreover, the interplay among the different performance metrics, e.g., throughput, EE, and hit-rate, and the corresponding trade-offs that result from these interactions, are still subject to further study. Most importantly, the existing papers do not provide sufficient evaluations based on real-world data. Accordingly, this chapter aims to address these issues. (Footnote 1: The conference paper of this chapter [206] is one of the earliest studies that took individual preferences into consideration.)

7.1.1 Contributions

In this chapter, we consider a BS-assisted cache-aided D2D network, where users can obtain the desired files from the BS, from the caches of neighboring users via D2D links, or from their local caches. We assume that users have different preferences and caching policies; our goal is thus to maximize the network utility by designing individual preference-aware caching policies for the users. We analyze the network based on clustering and random-push scheduling [99], and then propose a non-convex utility maximization problem formulation. We then show that the proposed utility maximization problem can be applied to solve different practical problems, e.g., throughput, hit-rate, and EE optimization problems, as well as several trade-off problems. Hence, it is sufficient to investigate a general solution approach for the proposed utility maximization problem. In addition, we discuss how the proposed utility maximization problem can be used in scenarios with different fading and user distributions. We assume that users perfectly know the individual preferences of all the other users in the network and that they are allowed to coordinate with one another.
With these considerations, we solve the utility maximization problem and obtain the users' caching policies. The idea of the proposed approach is to optimize the users' caching policies individually and iteratively until convergence. We show that this method is fairly simple to use, improves at each iteration, and converges to a stationary point under a mild assumption. We then evaluate the proposed caching policies in networks under realistic setups; in particular, we adopt the practical individual preference generator proposed in [81], which is based on extensive real-world data. Our results show that network performance can improve significantly when the information on individual preferences is properly exploited. We also compare the performance of networks that optimize throughput, EE, and hit-rate using the proposed utility maximization framework, and investigate the influence of the cooperation range of the D2D network. The results indicate that there are trade-offs between these important metrics, and that we can manage the trade-offs by properly exploiting the proposed framework. Finally, we show how the proposed designs can serve as good reference designs for obtaining effective designs in networks with complicated scheduling. We emphasize that, to the best of our knowledge, this is the first work that validates the benefits of exploiting user preferences from different perspectives of the network, and that gains insight through simulations based on real-world data. To sum up, our paper has the following contributions:
- We formulate a utility maximization problem by considering individual preferences in the analysis. Caching policies that optimize several practically important metrics, e.g., throughput, EE, and hit-rate, and their trade-offs, can be obtained by solving the problem.
- We propose a general, low-complexity approach for solving the utility maximization problem, and prove that the solution approach improves at each iteration and converges to a stationary point.
- Considering a realistic setup based on extensive real-world data, we conduct comprehensive simulations to show the benefits of exploiting individual preferences and to demonstrate the trade-offs between the different metrics.

7.2 Network and Individual Preference Models

We consider a BS-assisted cache-aided wireless D2D network, where the BS helps file delivery and makes scheduling decisions. The users can obtain the desired files by retrieving them from their own caches, via D2D communications, or via BS links. The file library consists of $M$ files, and for simplicity, we assume that all files have the same size. (Footnote 2: This paper generally focuses on understanding the impact of individual preferences on network performance and the trade-offs among different performance metrics. Thus, the investigation of how to deal with heterogeneous file sizes is beyond our scope.) Each user is able to cache $S$ files in its storage. Besides, to have a nontrivial case, we require $S < M$; furthermore, in most practical situations, $S \ll M$ will hold. Users can either be active or inactive. An active user is a user who places a request that needs to be satisfied and participates in the D2D cooperation, i.e., the user sends files to other users that request them. On the other hand, an inactive user is a user who does not place a request of its own but still participates in the D2D cooperation. We consider a widely used clustering network model [46, 70, 99, 133, 134]. In this model, there is a square cell with a BS at the center, and the cell is divided into equal-sized square clusters with side length $D$. Users are allowed to cooperate via D2D communications only with users in the same cluster.
The "cooperation distance" or "cluster size" we henceforth reference thus corresponds to the dimension $D$ of such a cluster, not to the cell radius of the BS. We assume that there is no interference between users of different clusters; this can be achieved by letting different clusters use different time/frequency resources with "spatial reuse". We will in the following focus on a single cluster; nevertheless, our results can easily be extended to multi-cluster scenarios. We denote the number of active users in a cluster as $K_A$ and the number of inactive users in a cluster as $K_I$. The total number of users in a cluster is then $K = K_A + K_I$. The described model is shown in Fig. 7.1. We consider serving users via "random-push" scheduling [99], which functions as follows. For a cluster, the BS first randomly selects an active user without knowing whether its request can be satisfied by the user's own (local) cache. If the selected user can obtain the desired file from its local cache, i.e., the desired file is actually cached by the selected user, then the user's request is satisfied immediately. Otherwise, the BS checks whether the other users in the D2D network store the desired content and whether the channel quality between the selected user and the other users storing the desired file exceeds the minimum requirement (in terms of capacity) for a D2D transmission. If yes, then a D2D link is used to transmit the desired content; otherwise, the user needs to use the BS link to access the content. We assume that the BS has an unlimited bandwidth backhaul to repositories that store all files in the library. (Footnote 2, cont.: However, based on our numerical investigations (omitted for brevity), the performance evaluation using designs with the equal-filesize assumption should be representative of the evaluation without this assumption.)

Figure 7.1: This figure shows an example of the network model. In the left-middle cluster, we have $K_A = 2$, $K_I = 1$, $S = 2$, and $M = 10$.
This guarantees that any request from a selected user can always be satisfied, albeit at a potentially high cost. After scheduling the selected user, the remaining active users check whether the files in their local caches can satisfy their requests; if yes, their requests are satisfied. Clearly, such scheduling guarantees that at least one user is served and that all users are scheduled fairly, in the sense that every user is selected with equal probability by the BS for service. We note that there exist scheduling approaches with better overall throughput, e.g., priority-push scheduling [99] and dynamic link scheduling [69]. However, using them has other drawbacks, such as unfairness and high complexity. Most importantly, it is very challenging to obtain tractable formulations for these advanced scheduling schemes [69, 99]. On the other hand, random-push scheduling leads to tractable expressions for different critical metrics and is easy to implement, thus serving as a good reference system. It should be noted that the definitions of the user distribution and the channel model can influence the scheduling behavior. We will discuss them later in Ch. 7.3.4. We represent individual preferences for requesting video files as probabilities. We denote the request probability of user $k$ for file $m$, i.e., the probability that user $k$ wants file $m$ in the future, as $a_m^k$, where $0 \leq a_m^k \leq 1\ \forall m,k$ and $\sum_{m=1}^M a_m^k = 1\ \forall k$. Note that the preferences of inactive users are not actually used in the proposed problem formulation and solution approach later; their preferences are modeled for the purpose of consistency. Different users can have different caching policies. As such, we denote $b_m^k$ as the probability that user $k$ caches file $m$, and the caching policy of user $k$ is described by $\{b_m^k\}_{m=1}^M$, where $0 \leq b_m^k \leq 1\ \forall m,k$ and $\sum_{m=1}^M b_m^k \leq S\ \forall k$. An implementation of this probabilistic caching policy can be found in [127].
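One standard way to realize such a probabilistic caching policy, drawing exactly $S$ distinct files so that file $m$ is cached with marginal probability $b_m$ (assuming $\sum_m b_m = S$ and $0 \leq b_m \leq 1$), is a "block filling" construction in the spirit of the implementation referenced from [127]; the code below is our own illustrative sketch, not necessarily the exact scheme of [127]:

```python
import random
from itertools import accumulate

def sample_cache(b, S):
    """Draw exactly S distinct file indices such that P(file m cached) = b[m].

    Requires 0 <= b[m] <= 1 and sum(b) = S. The probabilities are laid end
    to end on the interval [0, S]; a single uniform draw u then picks the
    file whose block contains u + j in each unit strip [j, j+1). Because
    every block has length <= 1, no file can be picked twice.
    """
    assert abs(sum(b) - S) < 1e-9 and all(0.0 <= x <= 1.0 for x in b)
    cum = list(accumulate(b))          # right edges of the file blocks
    u = random.random()
    cache, i = [], 0
    for j in range(S):
        x = u + j
        while i + 1 < len(cum) and cum[i] <= x:
            i += 1
        cache.append(i)
    return cache
```

Each draw yields a deterministic cache of $S$ distinct files, while the long-run frequency with which file $m$ appears equals $b_m$ exactly.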
Note that the caching policy becomes deterministic in the limiting case that each $b_m^k$ becomes 1 or 0. This is useful in situations where the central controller knows a priori which users are going to be in a cluster, so that the caching policy can avoid detrimental file overlap (compare [71]). Such deterministic predictability of user locations occurs in places where the same people are in geographical proximity every day, e.g., in an office scenario. We assume that users perfectly know the individual preferences of the other users in the same cluster; in other words, users know $a_m^k\ \forall m,k$. We also assume that users can coordinate with one another to design their caching policies for a common goal, e.g., maximizing network throughput. The users then coordinate such that they cache files by fully considering the caching policies and preferences of the other users. (Footnote 3: A common approach used to incentivize users to coordinate is through payment by the network operator. Alternatively, since each user generally benefits from D2D communications, a token-based approach similar to that of traditional file-sharing networks can ensure that specific users do not exploit the system without contributing to it. Generally, the topic of providing incentives is an important one for D2D networks.) Accordingly, this approach suits scenarios where user locations are deterministic, e.g., office scenarios. Note that the information exchange and the coordination between users are assumed to be handled centrally by the BS. Moreover, it will be shown later that although users are assumed to coordinate, the algorithm does not necessarily need to operate in a centralized manner. On the contrary, since the users have a common goal and know the individual preferences of the other users perfectly, each user can independently run the same caching policy design algorithm and accordingly obtain the same coordinated design that gives the
policies of all users. Consequently, each user can extract its own caching policy and implement it independently. (Footnote 3, cont.: However, this is already beyond the scope of this paper, albeit it is still considered an important future direction.)

Table 7.1: Summary of Frequently Used Notations
$M$, $S$, $D$ : number of files in the library; number of files a user can store; cluster size
$K_A$, $K_I$, $K$ : number of active users; of inactive users; of all users in a cluster
$\mathcal{U}_A$, $\mathcal{U}_I$, $\mathcal{U}$ : index set of active users; of inactive users; of all users
$a_m^k$, $b_m^k$ : request probability of user $k$ for file $m$; caching probability of user $k$ for file $m$
$L_{k,l}$, $w_k$ : probability of a successful D2D link between users $k$ and $l$; weight of user $k$
$U_B$, $U_D$, $U_S$ : utility of using the BS link; of using a D2D link; of using self-caching
$T_B$, $T_D$, $T_S$ : throughput of using the BS link; of using a D2D link; of using self-caching
$C_B$, $C_D$, $C_S$ : cost of using the BS link; of using a D2D link; of using self-caching
$U_{\rm net}$, $T_{\rm net}$, $C_{\rm net}$, $H_{\rm net}$, ${\rm EE}_{\rm net}$ : utility; throughput; cost; hit-rate; energy efficiency of the network
$P_B^k$, $P_D^k$, $P_S^k$ : elementary access probabilities; see (7.1), (7.3), and (7.2), respectively

In this work, users can access the desired files from their local caches, from the caches of other users, and from the BS. We consider different utilities when different access approaches are used. The utility of accessing a file via a BS link is denoted as $U_B$, the utility of accessing a file via a D2D link as $U_D$, and the utility of accessing a file from the user's own cache as $U_S$. Although we consider all users to have the same utilities, the extension to the case where different users have different utilities is straightforward. Besides, the utilities can be set differently for different practical purposes, such as throughput or EE maximization.
However, we will generally assume that $U_B \leq U_D \leq U_S$, which implies that self-access is superior to using a D2D link, and using a D2D link is superior to using a BS link. We will discuss this more thoroughly in Ch. 7.3.3. Table 7.1 summarizes the notations frequently used in this paper.

7.3 Caching Policy Design Problem

Our goal is to design caching policies that optimize the network utility by using the information about individual preferences. In this section, we first derive the access probabilities of the different access approaches for a user. Based on the results, we then formulate the caching policy design problem. To clarify the usefulness of the proposed network utility maximization problem, we then show how it can be used to solve various practical problems. Finally, we discuss how the proposed network utility can accommodate different scenarios with different fading and user distributions.

7.3.1 Fundamental Access Probabilities

Consider the system model in Ch. 7.2. We denote $\mathcal{U}_A$ as the index set of active users and $\mathcal{U}_I$ as the index set of inactive users, with $\mathcal{U} = \mathcal{U}_A \cup \mathcal{U}_I$. We denote the channel between user $k$ and user $l$ as $h_{k,l}$ and the corresponding signal-to-noise ratio (SNR) as ${\rm SNR}_{k,l}$. We let $C$ be the minimum capacity requirement for establishing a D2D link. When user $k$ is selected, the probability that user $k$ accesses the desired file through a BS link is
$$P_B^k = \sum_{m=1}^M a_m^k \left[\prod_{l\in\mathcal{U}} \left(1 - b_m^l \mathbf{1}_{\{h_{k,l},C\}}\right)\right], \qquad (7.1)$$
where $\mathbf{1}_{\{h_{k,l},C\}} = 1$ if $\log_2(1 + {\rm SNR}_{k,l}) > C$; otherwise $\mathbf{1}_{\{h_{k,l},C\}} = 0$. Note that $\prod_{l\in\mathcal{U}} (1 - b_m^l \mathbf{1}_{\{h_{k,l},C\}})$ is the probability that file $m$ can only be obtained via a BS link, and $a_m^k \prod_{l\in\mathcal{U}} (1 - b_m^l \mathbf{1}_{\{h_{k,l},C\}})$ is the probability that the user wants file $m$ but file $m$ can only be obtained via a BS link.
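The link-success probability underlying the indicator $\mathbf{1}_{\{h_{k,l},C\}}$ can be made concrete under a specific fading model. As an illustrative assumption (the chapter treats general fading and user distributions in Ch. 7.3.4), take Rayleigh fading, so that ${\rm SNR}_{k,l}$ is exponentially distributed; then $L_{k,l} = \Pr[\log_2(1+{\rm SNR}_{k,l}) > C] = e^{-(2^C - 1)/\overline{\rm SNR}}$, which the sketch below verifies by Monte Carlo:

```python
import math
import random

def link_success_prob(snr_mean, C):
    """L_{k,l} = Pr[log2(1 + SNR) > C] when SNR is exponential with mean
    snr_mean (Rayleigh fading; an illustrative assumption)."""
    return math.exp(-(2.0 ** C - 1.0) / snr_mean)

def link_success_mc(snr_mean, C, n=200000):
    """Monte Carlo estimate of the same probability."""
    hits = sum(1 for _ in range(n)
               if math.log2(1.0 + random.expovariate(1.0 / snr_mean)) > C)
    return hits / n
```

For example, with a mean SNR of 10 (linear) and $C = 2$ bits/s/Hz, the closed form gives $e^{-0.3} \approx 0.741$.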
We define the self-access probability, i.e., the probability that user $k$ can obtain the desired file from its own cache, as
$$P_S^k = \sum_{m=1}^M a_m^k b_m^k. \qquad (7.2)$$
Using $P_B^k$ and $P_S^k$, the probability that user $k$ obtains the desired file via a D2D link is
$$P_D^k = 1 - P_S^k - P_B^k = 1 - \sum_{m=1}^M a_m^k \left[\prod_{l\in\mathcal{U}} \left(1 - b_m^l \mathbf{1}_{\{h_{k,l},C\}}\right)\right] - \sum_{m=1}^M a_m^k b_m^k. \qquad (7.3)$$

7.3.2 Utility Maximization Problem Formulation

Now we derive the expected utility of the network. We assume that for any user $k$, the channel gains of all possible associated D2D links, i.e., $\mathbf{1}_{\{h_{k,l},C\}}\ \forall l$, are independent (see the use cases in Ch. 7.3.4). Using the results in Ch. 7.3.1, the utility of user $k$ when user $k$ is selected by the BS is
$$U_k = U_D P_D^k + U_B P_B^k + U_S P_S^k. \qquad (7.4)$$
We denote by $w_1, w_2, \ldots, w_{K_A}$ the weights of the different users, which indicate their relative priorities. Since users are randomly selected by the BS, the expected utility contributed by the selected users is
$$\bar{U} = \sum_{k\in\mathcal{U}_A} \frac{w_k}{K_A}\, \mathbb{E}\left\{U_D P_D^k + U_B P_B^k + U_S P_S^k\right\}. \qquad (7.5)$$
The users that are not selected by the BS can still check whether their desired files are cached in their local caches; thus, we obtain additional utility from the users' ability to satisfy their own requests. The expected utility of the network is therefore
$$U_{\rm net} = \bar{U} + U_{\rm local} = \bar{U} + U_S \frac{1}{K_A} \sum_{k\in\mathcal{U}_A} \sum_{l\in\mathcal{U}_A, l\neq k} \sum_{m=1}^M w_l a_m^l b_m^l = \sum_{k\in\mathcal{U}_A} \frac{w_k U_D}{K_A} + (U_B - U_D) \sum_{m=1}^M \bar{S}_m + (K_A U_S - U_D) \sum_{m=1}^M \sum_{k\in\mathcal{U}_A} \frac{w_k a_m^k b_m^k}{K_A}, \qquad (7.6)$$
where $\bar{S}_m = \sum_{k\in\mathcal{U}_A} \frac{w_k a_m^k}{K_A} \prod_{l\in\mathcal{U}} (1 - b_m^l L_{k,l})$ and $L_{k,l} = \Pr\left[\log_2(1 + {\rm SNR}_{k,l}) > C\right]$. The derivations of (7.6) are shown in Appendix A. Moreover, the computation of $L_{k,l}$ will be discussed later in detail in Ch. 7.3.4.
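The expressions (7.1)-(7.6) translate directly into code. The sketch below (equal weights, all users active, placeholder numbers) averages the channel indicators out via $L_{k,l}$; the convention $L_{k,k} = 1$ is our assumption, chosen so that a self-cached file is never counted as a BS access:

```python
from math import prod

def expected_access_probs(a, b, L, k):
    """Expected elementary access probabilities of user k, following
    (7.1)-(7.3) with the channel indicators averaged out:
    E[1_{h_{k,l},C}] = L[k][l].  Convention: L[k][k] = 1."""
    M = len(a[k])
    users = range(len(b))
    pB = sum(a[k][m] * prod(1.0 - b[l][m] * L[k][l] for l in users)
             for m in range(M))
    pS = sum(a[k][m] * b[k][m] for m in range(M))
    return pB, pS, 1.0 - pB - pS

def network_utility(a, b, L, UB, UD, US, active):
    """Expected network utility (7.6) with equal weights w_k = 1."""
    K_A = len(active)
    M = len(a[0])
    users = range(len(b))
    Sbar = sum(a[k][m] / K_A * prod(1.0 - b[l][m] * L[k][l] for l in users)
               for k in active for m in range(M))
    self_term = sum(a[k][m] * b[k][m] for k in active for m in range(M)) / K_A
    return UD + (UB - UD) * Sbar + (K_A * US - UD) * self_term
```

As a sanity check, for a single active user (7.6) collapses to $U_D P_D^k + U_B P_B^k + U_S P_S^k$, which the two functions agree on.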
Using (7.6), the caching policy design problem that maximizes the network utility is:
$$\max_{b_m^k\ \forall k,m}\ U_{\rm net} \quad \text{subject to} \quad \sum_{m=1}^M b_m^k \leq S\ \forall k; \quad 0 \leq b_m^k \leq 1\ \forall k,m. \qquad (7.7)$$
We then have the following proposition.

Proposition 1: The optimal solution of (7.7) must be tight at the equality of the sum constraint, i.e., for the optimal solution $(b_m^k)^*\ \forall k,m$, we have
$$\sum_{m=1}^M (b_m^k)^* = S\ \forall k. \qquad (7.8)$$
Proof. By (7.6), the first-order partial derivatives of $U_{\rm net}$ are
$$\frac{\partial U_{\rm net}}{\partial b_m^j} = -(U_B - U_D) \sum_{k\in\mathcal{U}_A} \frac{w_k a_m^k}{K_A} L_{k,j} \prod_{l\in\mathcal{U}, l\neq j} \left(1 - b_m^l L_{k,l}\right) + \mathbf{1}_{\{j\in\mathcal{U}_A\}} (K_A U_S - U_D) \frac{w_j a_m^j}{K_A}\ \forall j,m, \qquad (7.9)$$
where $\mathbf{1}_{\{j\in\mathcal{U}_A\}} = 1$ when $j\in\mathcal{U}_A$; otherwise $\mathbf{1}_{\{j\in\mathcal{U}_A\}} = 0$. Since $U_B \leq U_D \leq U_S$, $0 \leq L_{k,l} \leq 1\ \forall k,l$, and $0 \leq b_m^k \leq 1\ \forall k,m$, we have $\frac{\partial U_{\rm net}}{\partial b_m^k} \geq 0\ \forall k,m$. Therefore, $U_{\rm net}$ is non-decreasing with respect to $b_m^k\ \forall k,m$, which indicates that the optimal solution of (7.7) must be tight at the equality of the sum constraint.

7.3.3 Interpretations of the Utility Maximization Problem and its Relationship to Practice

In this subsection, we show how the utility maximization problem can be used to design caching policies that solve various practical and important problems. In the following, we consider the equal-weight case, i.e., $w_1 = w_2 = \ldots = w_{K_A} = 1$, for notational convenience; the extension to other weights is straightforward.

7.3.3.1 Throughput Maximization Problem

Consider $U_B = T_B$, $U_D = T_D$, $U_S = T_S$, and $T_B \leq T_D \leq T_S$, where $T_B$ is the throughput of a BS link, $T_D$ is the throughput of a D2D link, and $T_S$ is the throughput of self-access. The utility maximization problem then becomes the throughput maximization problem, in which the expected throughput is
$$T_{\rm net} = T_D + (T_B - T_D) \sum_{m=1}^M \bar{S}_m + (K_A T_S - T_D) \sum_{m=1}^M \sum_{k\in\mathcal{U}_A} \frac{a_m^k b_m^k}{K_A}. \qquad (7.10)$$

7.3.3.2 Cost/Power Minimization Problem

Let $U_B = -C_B$, $U_D = -C_D$, $U_S = -C_S$, and $C_B \geq C_D \geq C_S$, where $C_B$ is the cost of a BS link, $C_D$ is the cost of a D2D link, and $C_S$ is the cost of self-access.
The problem can then be cast as the cost minimization problem:
$$\min_{b_m^k, \forall k,m} \; C_{\text{net}} = C_D + (C_B - C_D) \sum_{m=1}^{M} S_m + (K_A C_S - C_D) \sum_{m=1}^{M} \sum_{k \in \mathcal{U}_A} \frac{a_m^k b_m^k}{K_A} \quad \text{subject to} \quad \sum_{m=1}^{M} b_m^k \le S, \; \forall k; \; 0 \le b_m^k \le 1, \; \forall k,m. \tag{7.11}$$
If power consumption is considered as the cost, the problem is the power minimization problem.

7.3.3.3 Hit-Rate Maximization Problem

Let $U_B = 0$, $U_D = 1$, and $U_S = \frac{1}{K_A}$. The problem then is to maximize
$$H_{\text{net}} = \sum_{k \in \mathcal{U}_A} \mathbb{E}\Big[ P_D^k + P_S^k \Big] = 1 - \sum_{m=1}^{M} S_m, \tag{7.12}$$
which maximizes the file hit-rate of the network.

7.3.3.4 Throughput–Cost Weighted-Sum Problem

To attain a desired trade-off between different metrics, a common approach is to maximize a weighted sum/difference of those metrics [199]. For example, considering the trade-off between throughput and cost, we can maximize
$$w_T T_{\text{net}} - w_C C_{\text{net}}, \tag{7.13}$$
where $w_T \ge 0$ and $w_C \ge 0$. Such a weighted sum/difference problem is equivalent to the utility maximization problem if we let $U_B = w_T T_B - w_C C_B$, $U_D = w_T T_D - w_C C_D$, and $U_S = w_T T_S - w_C C_S$. The same concept can also be used for the throughput–hit-rate trade-off, and can be applied to trade-offs between more than two objectives.

7.3.3.5 Efficiency Problem

In some situations, we aim to maximize an efficiency metric, e.g., EE (bits/Joule). The following discussion shows that the efficiency maximization problem can be addressed by solving the weighted-sum problem described in Ch. 7.3.3.4. We consider the EE maximization problem as an example; the same concept can be used for other efficiency problems.
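The special cases in Ch. 7.3.3 differ only in how the utility triple $(U_B, U_D, U_S)$ is instantiated. A small sketch of that mapping (the placeholder throughput/cost defaults are ours, not values from the dissertation):

```python
def utility_params(metric, T=(1.0, 4.0, 8.0), C=(4.0, 1.0, 0.0),
                   w_T=1.0, w_C=1.0, K_A=10):
    """Return (U_B, U_D, U_S) instantiating the utility framework for
    one of the special cases in Ch. 7.3.3.

    T = (T_B, T_D, T_S) and C = (C_B, C_D, C_S) are placeholder link
    throughputs and costs (illustrative, not from the dissertation).
    """
    T_B, T_D, T_S = T
    C_B, C_D, C_S = C
    if metric == "throughput":            # Ch. 7.3.3.1
        return T_B, T_D, T_S
    if metric == "cost":                  # Ch. 7.3.3.2: maximize -cost
        return -C_B, -C_D, -C_S
    if metric == "hit-rate":              # Ch. 7.3.3.3
        return 0.0, 1.0, 1.0 / K_A
    if metric == "weighted":              # Ch. 7.3.3.4
        return (w_T * T_B - w_C * C_B,
                w_T * T_D - w_C * C_D,
                w_T * T_S - w_C * C_S)
    raise ValueError(metric)
```

Setting `w_C = 0` in the weighted case recovers the pure throughput parameterization, illustrating the equivalence claimed after (7.13).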
Suppose we aim to maximize the EE, which is given as
$$EE_{\text{net}} = \frac{\text{total number of bits transmitted}}{\text{total energy consumed}} = \frac{\text{expected throughput of the network}}{\text{expected power consumed by the network}} = \frac{T_{\text{net}}}{C_{\text{net}}}.$$
Then, the EE maximization problem is:
$$\max_{b_m^k, \forall k,m} \; EE_{\text{net}} = \frac{T_{\text{net}}}{C_{\text{net}}} = \frac{T_D + (T_B - T_D)\sum_{m=1}^{M} S_m + (K_A T_S - T_D)\sum_{m=1}^{M}\sum_{k \in \mathcal{U}_A} \frac{a_m^k b_m^k}{K_A}}{C_D + (C_B - C_D)\sum_{m=1}^{M} S_m + (K_A C_S - C_D)\sum_{m=1}^{M}\sum_{k \in \mathcal{U}_A} \frac{a_m^k b_m^k}{K_A}} \quad \text{s.t.} \quad \sum_{m=1}^{M} b_m^k \le S, \; \forall k; \; 0 \le b_m^k \le 1, \; \forall k,m. \tag{7.14}$$
This problem is equivalent to
$$\max_{t, \, b_m^k, \forall k,m} \; t \quad \text{s.t.} \quad \frac{T_{\text{net}}}{C_{\text{net}}} \ge t; \quad \sum_{m=1}^{M} b_m^k \le S, \; \forall k; \quad 0 \le b_m^k \le 1, \; \forall k,m. \tag{7.15}$$
Assuming that the optimal $t^*$ is known, the problem in (7.15) is equivalent to finding the optimal policy in
$$\max_{b_m^k, \forall k,m} \; T_{\text{net}} - t^* C_{\text{net}} \quad \text{s.t.} \quad \sum_{m=1}^{M} b_m^k \le S, \; \forall k; \quad 0 \le b_m^k \le 1, \; \forall k,m. \tag{7.16}$$
Observing (7.16), it is clear that we have a weighted-difference problem similar to that described in Ch. 7.3.3.4, with $w_T = 1$ and $w_C = t^*$; it can thus be cast into the utility maximization framework. Also, the optimal policy results in $T_{\text{net}} - t^* C_{\text{net}} = 0$.

In general, we cannot know the optimal $t^*$ a priori; however, the aforementioned idea can still be used to solve the EE maximization problem. Suppose we have the same problem as in (7.16), but with $t^*$ replaced by $t$. We then have the following interpretations: (i) if the solution yields a positive value, i.e., $T_{\text{net}} - t C_{\text{net}} > 0$, then our solution provides an EE larger than $t$; (ii) if the solution yields $T_{\text{net}} - t C_{\text{net}} < 0$, then our solution provides an EE less than $t$, and $t$ is not achievable. By adjusting $t$ according to these results and re-solving (7.16) with different $t$, we can keep improving the solution. Finally, by carefully adjusting $t$ and solving (7.16) many times, we can maximize the EE. Since the utility maximization problem is non-convex, we might not find the best $t$ and the corresponding user caching policies.
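The adjust-and-re-solve procedure above amounts to a bisection on the EE target $t$. A minimal sketch, where `solve_weighted` stands for any routine that (approximately) solves (7.16) and returns the optimal value $\max_b \, T_{\text{net}} - t\,C_{\text{net}}$ (function names are ours):

```python
def maximize_efficiency(solve_weighted, t_lo=0.0, t_hi=100.0, tol=1e-6):
    """Bisect on the EE target t, following Ch. 7.3.3.5.

    solve_weighted(t): best achievable T_net - t * C_net over policies.
    A positive value means EE > t is achievable; a negative value
    means t is not achievable. Assumes the optimum lies in [t_lo, t_hi].
    """
    while t_hi - t_lo > tol:
        t = 0.5 * (t_lo + t_hi)
        if solve_weighted(t) > 0:
            t_lo = t      # an EE larger than t is achievable
        else:
            t_hi = t      # t is not achievable
    return 0.5 * (t_lo + t_hi)
```

For a toy case with a single fixed policy ($T_{\text{net}} = 10$, $C_{\text{net}} = 4$), the bisection converges to the true EE of 2.5.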
However, we can still obtain an effective solution using the above approach. This technique is identical to that used for solving quasi-convex problems [202].

7.3.4 Effects of the Statistics of Wireless Channels and User Distributions

In (7.6), the channel quality influences the expected utility via $L_{k,l}$. Thus, understanding the general expression of $L_{k,l}$ and its relationship to the channel physics is important. In this section, we provide several useful expressions for $L_{k,l}$ and discuss their relationships to possible scenarios. Note that if $k = l$, then $L_{k,l} = L_{k,k} = 1$; in the following, we therefore consider $k \neq l$.

Let $d_{k,l}$ be the distance between user $k$ and user $l$. The input–output relationship between users $k$ and $l$ then follows the general expression
$$y_l = \sqrt{PG(d_{k,l}) \, s_{k,l}} \, h_{k,l} x_k + n_l, \tag{7.17}$$
where $y_l$ is the received signal at user $l$; $x_k$ is the transmit signal from user $k$; $PG(d_{k,l})$ is the path gain (channel power gain averaged over small-scale and large-scale fading); $s_{k,l}$ is the shadowing power gain; $h_{k,l}$ is the small-scale fading amplitude; and $n_l$ is the Gaussian noise with power $\sigma_n^2$. Let $E_D$ be the transmission power of the D2D link. Using (7.17), the received SNR of the D2D link between users $k$ and $l$ is $\mathrm{SNR}_{k,l} = E_D |h_{k,l}|^2 s_{k,l} PG(d_{k,l}) / \sigma_n^2$, and therefore
$$L_{k,l} = \Pr\left[ |h_{k,l}|^2 s_{k,l} PG(d_{k,l}) > \frac{\sigma_n^2 (2^C - 1)}{E_D} \right]. \tag{7.18}$$
We will later show some practical examples and demonstrate how (7.18) is computed using user and fading distributions. Extensions to other models are feasible by leveraging existing results on fading [193] and distance distributions [217].

7.3.4.1 Case 1: Systems with effective link quality control

In modern wireless communication systems, approaches such as adaptive power control and frequency-and-antenna diversity are used to combat fading effects in wireless channels.
Thus, in systems with effective link quality control, we can assume that the D2D links between users in an area are guaranteed, leading to $L_{k,l} = 1, \forall k,l$. The exact distribution of users then becomes irrelevant in this case.

7.3.4.2 Case 2: Systems with deterministic path-loss and shadow fading

When users have low mobility or are stationary, the joint effect of path-loss and shadow fading between users is deterministic. As a result, $s_{k,l}$ and $PG(d_{k,l})$ are given constants determined by the exact locations of the users. In this case, we focus on characterizing the small-scale fading. Thus,
$$L_{k,l} = \Pr\left[ |h_{k,l}|^2 > \frac{\sigma_n^2 (2^C - 1)}{E_D \, s_{k,l} \, PG(d_{k,l})} \right], \tag{7.19}$$
where closed-form expressions are attainable for commonly used fading distributions. For example, for normalized Rayleigh fading with average power 1, we have
$$L_{k,l} = \exp\left( -\frac{\sigma_n^2 (2^C - 1)}{E_D \, s_{k,l} \, PG(d_{k,l})} \right). \tag{7.20}$$
Note that in this case, the distribution of users can be arbitrary but deterministic.

7.3.4.3 Case 3: $K$ users uniformly distributed in a square with side length $D$, with shadowing and small-scale fading

Here, we use lognormal shadowing and normalized Rayleigh fading as an example. According to results in [218], the distance $d$ between two users independently and uniformly distributed over a square area with unit side length is described by the probability density function
$$f_{sq}(d) = \begin{cases} 2d\left(\pi + d^2 - 4d\right), & 0 \le d \le 1, \\ 2d\left(-2 - d^2 + 4\sqrt{d^2 - 1} + 2\sin^{-1}\frac{2 - d^2}{d^2}\right), & 1 < d \le \sqrt{2}. \end{cases} \tag{7.21}$$
Then, fixing the shadowing $s_{k,l}$ and using the property of Rayleigh fading again, we have
$$L_{k,l}(s_{k,l}) = \int_0^{\sqrt{2}D} \exp\left( -\frac{\sigma_n^2 (2^C - 1)}{E_D \, s_{k,l} \, PG(x)} \right) f(d = x) \, dx = \int_0^{\sqrt{2}} \exp\left( -\frac{\sigma_n^2 (2^C - 1)}{E_D \, s_{k,l} \, PG(Dx)} \right) f_{sq}(x) \, dx. \tag{7.22}$$
Assume that the shadowing and small-scale fading effects of different links between different users are independent.
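As a sanity check on the distance pdf (7.21), a short numerical sketch: the pdf should integrate to 1 over $[0, \sqrt{2}]$, and the resulting mean distance should match the known value $\approx 0.5214$ for a unit square. The simple midpoint-rule integrator is ours, chosen for self-containedness.

```python
import math

def f_sq(d):
    """Distance pdf (7.21) for two independent uniform points in a
    unit square."""
    if 0.0 <= d <= 1.0:
        return 2.0 * d * (math.pi + d * d - 4.0 * d)
    if 1.0 < d <= math.sqrt(2.0):
        return 2.0 * d * (-2.0 - d * d + 4.0 * math.sqrt(d * d - 1.0)
                          + 2.0 * math.asin((2.0 - d * d) / (d * d)))
    return 0.0

def integrate(f, a, b, n=20000):
    """Midpoint-rule quadrature of f over [a, b]."""
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h
```

With $f_{sq}$ in hand, the inner Rayleigh term of (7.22) can be folded into the same quadrature for any given path-gain function $PG(\cdot)$.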
We can then generalize (7.22) as
$$L_{k,l} = \int_0^{\sqrt{2}} \left[ \int_0^{\infty} \exp\left( -\frac{\sigma_n^2 (2^C - 1)}{E_D \, s \, PG(Dx)} \right) f_{s_{k,l}}(s) \, ds \right] f_{sq}(x) \, dx, \tag{7.23}$$
where $f_{s_{k,l}}(s)$ is the pdf of the shadowing effect of the channel link between users $k$ and $l$. Let the mean and standard deviation of the lognormal distribution be $u_{\text{dB}}$ and $\sigma_F$, respectively. We then obtain
$$L_{k,l} = \int_0^{\sqrt{2}} \left[ \int_0^{\infty} \exp\left( -\frac{\sigma_n^2 (2^C - 1)}{E_D \, PG(Dx) \, s} \right) \frac{10/\ln(10)}{s \, \sigma_F \sqrt{2\pi}} \exp\left( -\frac{\big(10 \log_{10}(s) - u_{\text{dB}}\big)^2}{2 \sigma_F^2} \right) ds \right] f_{sq}(x) \, dx. \tag{7.24}$$
It should be noted that the inner integral of (7.24) is the complement of the channel outage probability when the joint fading effect follows the Suzuki distribution [193].

7.4 Proposed Caching Policy Design

From the discussions in Ch. 7.3.3, we understand that the proposed utility maximization can be used to solve many practical and important problems. In this section, we thus propose a general solution approach for solving (7.7). Specifically, we propose an approach that iteratively optimizes the caching policies of the users. Denote by $\mathbf{b}^{k'} = [b_1^{k'}, \dots, b_M^{k'}]^T$ the policy vector of user $k'$.
We iteratively solve the following subproblem for different $k'$ while fixing the other users' caching policies:
$$\max_{\mathbf{b}^{k'}} \; U_{\text{LP}}^{k'} = U_{\text{net}}(\mathbf{b}^1, \dots, \mathbf{b}^{k'}, \dots, \mathbf{b}^K) \tag{7.25a}$$
$$\text{subject to} \quad \sum_{m=1}^{M} b_m^{k'} = S, \tag{7.25b}$$
$$\qquad\qquad\quad 0 \le b_m^{k'} \le 1, \; \forall m. \tag{7.25c}$$
When $k' \in \mathcal{U}_A$, we obtain
$$U_{\text{LP}}^{k'} = \sum_{k \in \mathcal{U}_A} \frac{w_k U_D}{K_A} + (U_B - U_D) \sum_{m=1}^{M} \sum_{k \in \mathcal{U}_A} \frac{w_k a_m^k}{K_A} \Bigg[ \prod_{l \in \mathcal{U}, l \neq k'} (1 - b_m^l L_{k,l}) \Bigg] + (K_A U_S - U_D) \Bigg[ \sum_{m=1}^{M} \sum_{k \in \mathcal{U}_A, k \neq k'} \frac{w_k a_m^k b_m^k}{K_A} \Bigg] - \sum_{m=1}^{M} b_m^{k'} \Bigg( (U_B - U_D) \sum_{k \in \mathcal{U}_A} \frac{w_k a_m^k}{K_A} L_{k,k'} \Bigg[ \prod_{l \in \mathcal{U}, l \neq k'} (1 - b_m^l L_{k,l}) \Bigg] + (U_D - K_A U_S) \frac{w_{k'} a_m^{k'}}{K_A} \Bigg); \tag{7.26}$$
when $k' \in \mathcal{U}_I$, we obtain
$$U_{\text{LP}}^{k'} = \sum_{k \in \mathcal{U}_A} \frac{w_k U_D}{K_A} + (U_B - U_D) \sum_{m=1}^{M} \sum_{k \in \mathcal{U}_A} \frac{w_k a_m^k}{K_A} \Bigg[ \prod_{l \in \mathcal{U}, l \neq k'} (1 - b_m^l L_{k,l}) \Bigg] + (K_A U_S - U_D) \Bigg[ \sum_{m=1}^{M} \sum_{k \in \mathcal{U}_A} \frac{w_k a_m^k b_m^k}{K_A} \Bigg] - \sum_{m=1}^{M} b_m^{k'} \Bigg( (U_B - U_D) \sum_{k \in \mathcal{U}_A} \frac{w_k a_m^k}{K_A} L_{k,k'} \Bigg[ \prod_{l \in \mathcal{U}, l \neq k'} (1 - b_m^l L_{k,l}) \Bigg] \Bigg). \tag{7.27}$$
Note that (7.26) and (7.27) are simply reformulations of (7.6), in which we isolate the terms containing the variables to be optimized. From (7.26) and (7.27), we can see that (7.25) is a linear program. General linear-program solvers could be applied to solve (7.25); however, we provide here a more insightful and efficient approach via the analytical closed-form expressions in (7.26) and (7.27). By letting
$$U_{\text{LP,S}}^{k',m} = \begin{cases} (U_D - U_B) \displaystyle\sum_{k \in \mathcal{U}_A} \frac{w_k a_m^k}{K_A} L_{k,k'} \Bigg[ \prod_{l \in \mathcal{U}, l \neq k'} (1 - b_m^l L_{k,l}) \Bigg] + (K_A U_S - U_D) \frac{w_{k'} a_m^{k'}}{K_A}, & k' \in \mathcal{U}_A, \\ (U_D - U_B) \displaystyle\sum_{k \in \mathcal{U}_A} \frac{w_k a_m^k}{K_A} L_{k,k'} \Bigg[ \prod_{l \in \mathcal{U}, l \neq k'} (1 - b_m^l L_{k,l}) \Bigg], & k' \in \mathcal{U}_I, \end{cases} \tag{7.28}$$
we notice that maximizing $U_{\text{LP}}^{k'}$ is equivalent to maximizing
$$\sum_{m=1}^{M} b_m^{k'} U_{\text{LP,S}}^{k',m}. \tag{7.29}$$
Then, observe that the optimal solution of (7.29) subject to constraints (7.25b) and (7.25c) can be obtained by allocating the cache space to the terms offering the largest payoffs.
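The per-user subproblem therefore reduces to computing the payoffs (7.28) and keeping the $S$ largest. A minimal sketch under equal weights (function name and data layout are ours):

```python
from math import prod

def best_response(kp, a, b, L, U_B, U_D, U_S, S, active):
    """Best response of user kp: solve subproblem (7.25) via the
    payoffs (7.28) and the top-S rule, assuming equal weights.

    a[k][m]: request probabilities; b[k][m]: current caching policies;
    L[k][l]: link success probabilities; active: the set U_A.
    Returns a 0/1 caching vector with exactly S ones.
    """
    K, M = len(a), len(a[0])
    K_A = len(active)
    payoff = []
    for m in range(M):
        p = 0.0
        for k in active:                       # first term of (7.28)
            pi = prod(1 - b[l][m] * L[k][l] for l in range(K) if l != kp)
            p += (U_D - U_B) * (a[k][m] / K_A) * L[k][kp] * pi
        if kp in active:                       # extra term when kp is active
            p += (K_A * U_S - U_D) * a[kp][m] / K_A
        payoff.append(p)
    top = set(sorted(range(M), key=payoff.__getitem__, reverse=True)[:S])
    return [1.0 if m in top else 0.0 for m in range(M)]
```

For a single active user with preference probabilities $(0.5, 0.3, 0.2)$ and $S = 1$, the best response is to cache the most preferred file.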
Thus, the optimal solution of (7.25) is expressed as
$$(b_m^{k'})^* = \begin{cases} 1, & m \in \Lambda_{k'}, \\ 0, & \text{otherwise}, \end{cases} \tag{7.30}$$
where $\Lambda_{k'} = \big\{ m : U_{\text{LP,S}}^{k',m} \text{ is among the } S \text{ largest of all } U_{\text{LP,S}}^{k',m} \big\}$. By iteratively solving (7.25) via (7.30) for different $k'$ until convergence, the caching policy design problem in (7.7) can be effectively solved. Denote $\mathcal{B}^k = \big\{ (b_1^k, \dots, b_M^k)^T : \sum_{m=1}^{M} b_m^k = S, \; 0 \le b_m^k \le 1, \forall m \big\}$. The solution approach is summarized in Alg. 2. Since (7.30) implies that the probability for a user to cache file $m$ is either 1 or 0, we actually eliminate the probabilistic interpretation and obtain deterministic policies for the users.

To characterize the performance of the proposed solution approach, we provide the following theorem:

Theorem 1: Alg. 2 is monotonically non-decreasing at each iteration and converges to a stationary point if each iteration yields a unique maximizer. (Footnote 4: If the maximizer is not unique, we encounter a tie between different $U_{\text{LP,S}}^{k',m}$, which is generally unlikely since users have different preferences for different files. Thus, the unique-maximizer assumption is mild.)

Proof. See Appendix B.

Algorithm 2: Iterative User-Based Caching Policy Design
At iteration $r$, choose a user $k'$ and update
$$\mathbf{b}_{k'}^{r+1} = \arg\max_{\mathbf{b}_{k'} \in \mathcal{B}^{k'}} U\big(\mathbf{b}_1^r, \dots, \mathbf{b}_{k'-1}^r, \mathbf{b}_{k'}, \mathbf{b}_{k'+1}^r, \dots, \mathbf{b}_K^r\big); \qquad \mathbf{b}_k^{r+1} = \mathbf{b}_k^r, \; \forall k \neq k'.$$

Finally, we note that although the proposed design requires coordination between users, the users can run Alg. 2 independently if they know the other users' individual preferences. In other words, Alg. 2 can be implemented in a decentralized manner, given that each user perfectly knows the other users' individual preferences.

7.4.1 Complexity Analysis of the Proposed Caching Policy Design

Theorem 1 already characterizes the performance and convergence of the proposed design; we now analyze its complexity. Observe that the proposed design is based on the iterative algorithm in Alg. 2.
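Putting (7.28)–(7.30) and Alg. 2 together, the following is a self-contained sketch of the iterative design, assuming equal weights and sweeping over all users in each round (the sweep order, the function name, and the simple "no change after a full round" stopping test are our choices for illustration):

```python
from math import prod

def alg2(a, L, U_B, U_D, U_S, S, active, rounds=50):
    """Sketch of Alg. 2: iterative user-based caching policy design,
    with equal weights. a[k][m]: request probabilities; L[k][l]: link
    success probabilities; active: indices of U_A. Returns a 0/1
    policy matrix b."""
    K, M = len(a), len(a[0])
    K_A = len(active)
    b = [[0.0] * M for _ in range(K)]          # start from empty caches
    for _ in range(rounds):
        changed = False
        for kp in range(K):                    # one best response per user
            payoff = []
            for m in range(M):
                p = 0.0
                for k in active:               # first term of (7.28)
                    pi = prod(1 - b[l][m] * L[k][l]
                              for l in range(K) if l != kp)
                    p += (U_D - U_B) * a[k][m] / K_A * L[k][kp] * pi
                if kp in active:               # second term of (7.28)
                    p += (K_A * U_S - U_D) * a[kp][m] / K_A
                payoff.append(p)
            top = set(sorted(range(M), key=payoff.__getitem__,
                             reverse=True)[:S])
            new = [1.0 if m in top else 0.0 for m in range(M)]
            changed |= (new != b[kp])
            b[kp] = new
        if not changed:                        # full round without change
            break
    return b
```

With the hit-rate parameterization ($U_B = 0$, $U_D = 1$, $U_S = 1/K_A$) and two users with identical preferences over three files, the users end up caching different files, illustrating the cooperative behavior of the hit-rate design.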
Thus, the complexity arises from the computation at each iteration and from the number of iterations required for convergence. At each iteration, the main computational cost comes from computing $U_{\text{LP,S}}^{k,m}, \forall m$, and sorting them. In terms of the total number of additions and multiplications, the complexity order of computing $U_{\text{LP,S}}^{k,m}, \forall m$, is $O(MK^2)$; the complexity order of the sorting is $O(M \log M)$. As a result, the overall complexity order per iteration is $O(MK^2 + M \log M)$.

Regarding the number of iterations required for convergence, a general analytical expression is intractable; we thus run simulations to understand how many iterations are needed in practice. For the simulations, we consider the same setup as in Fig. 7.4 and evaluate the proposed throughput-based design (see Ch. 7.5 for the details of the simulation setup). The convergence of a single user's caching policy does not necessarily imply the convergence of all users; hence, we test the stopping criterion only after updating the caching policies of all users, in order to guarantee the convergence of all users. The criterion is
$$\sum_{m=1}^{M} \sum_{k=1}^{K} |\Delta b_m^k|^2 \le 10^{-4}, \tag{7.31}$$
where $\Delta b_m^k$ denotes the change of $b_m^k$ over the most recent round of updates. Therefore, the number of iterations is a multiple of $K$, i.e., the number of iterations can only be $K, 2K, 3K, \dots$, etc. We consider three different numbers of users for the simulations: $K = 10, 20, 30$. The results show that the proposed design converges within $10K$ iterations in more than 99% of the cases, which in practice indicates fast convergence.

7.5 Numerical Results

This section provides simulation results to validate the analysis, evaluate the proposed designs, compare different designs, and provide insights.

7.5.1 Simulation Setup

We evaluate the performance of a cluster that covers a square area with side length $D$. In the simulations, we assume that the users are uniformly distributed within the cluster.
Unless otherwise indicated, we assume that the users adopt random-push scheduling and have equal weights, i.e., $w_k = 1, \forall k \in \mathcal{U}_A$. We consider a practical channel model for D2D links, consisting of path-loss, shadowing, Rayleigh fading, and Gaussian noise. The path-loss (in dB) of the D2D link between users $k$ and $l$ is described by [42, 193]
$$20 \log_{10}\left( \frac{4\pi d_0}{\lambda_c} \right) + 10 \alpha \log_{10}\left( \frac{d_{k,l}}{d_0} \right), \tag{7.32}$$
where $d_0 = 10$ m is the breakpoint distance, $\lambda_c = \frac{3 \times 10^8}{f_c}$ m is the carrier wavelength with $f_c = 2$ GHz the carrier frequency, $\alpha = 3.68$ is the path-loss exponent, and $d_{k,l}$ is the distance between users $k$ and $l$. The shadowing is modeled by a lognormal distribution with mean $\mu_{\text{dB}} = 0$ dB and standard deviation $\sigma_F = 8$ dB, and the small-scale fading is Rayleigh distributed. We assume a noise power spectral density of $N_0 = -174$ dBm/Hz. We denote by $E_D$ the transmission power of a device and set $\mathrm{SNR}_{\min} = 5$ dB as the minimum SNR requirement for a successful D2D transmission. Thus, $R_{\min} = \log_2(1 + 3.16)$ is the minimum transmission rate of a D2D link. We then let $T_D = B_D R_{\min}$ be the throughput of a D2D link, where $B_D = 20$ MHz is the bandwidth of a D2D link.

We assume that a BS link always exists whenever a user is scheduled to use it. Since the BS must serve users in many clusters, we assume that a BS link can share only $\frac{1}{100}$ of the BS resources. Hence, the transmission power of a BS link is $E_B = 26$ dBm, which is $\frac{1}{100}$ of the total BS power of 46 dBm. Similarly, the bandwidth of a BS link is $B_B = 200$ kHz, which is $\frac{1}{100}$ of the total 20 MHz bandwidth. We thus let $T_B = B_B R_{\min}$. Note that a 200-kHz bandwidth is enough to transmit a low-resolution video, e.g., 360p. We assume that there is no cost when users obtain the desired file from their local caches, and we let $T_S = 2 T_D$ to indicate the slightly better quality of the video when self-caching is possible.
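The link-budget constants above can be assembled in a few lines; this sketch only restates the stated simulation parameters (variable names are ours). Note that $R_{\min} = \log_2(1 + 10^{5/10}) \approx 2.06$ bits/s/Hz, so $T_D / T_B = B_D / B_B = 100$.

```python
import math

# Simulation constants from Ch. 7.5.1.
d0, fc, alpha = 10.0, 2e9, 3.68       # breakpoint (m), carrier (Hz), exponent
lam = 3e8 / fc                        # carrier wavelength (m)

def path_loss_db(d):
    """Path loss (7.32) in dB at distance d >= d0 meters."""
    return (20 * math.log10(4 * math.pi * d0 / lam)
            + 10 * alpha * math.log10(d / d0))

snr_min_db = 5.0
R_min = math.log2(1 + 10 ** (snr_min_db / 10))   # ~2.06 bits/s/Hz
T_D = 20e6 * R_min                               # D2D link throughput (bits/s)
T_B = 200e3 * R_min                              # BS link throughput (bits/s)
T_S = 2 * T_D                                    # self-access "throughput"
```

At the breakpoint distance $d_0 = 10$ m, (7.32) evaluates to roughly 58.5 dB of path loss for $f_c = 2$ GHz.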
(Footnote 5: Although a user can immediately obtain a file that is in its local cache, the throughput is bounded by the rate at which the user watches the file. Also, mathematically, we should not let $T_S$ go to infinity if we want meaningful results.)

For simplicity, we assume that the energy cost is determined purely by the radio-frequency energy required for transmission; storage access and coding/decoding are assumed negligible in comparison. Thus, based on the above setup, we obtain $C_B = E_B$, $C_D = E_D$, and $C_S = 0$. Therefore, the EE of the network is $EE_{\text{net}} = \frac{T_{\text{net}}}{C_{\text{net}}}$, according to the definition in Ch. 7.3.3.5.

We consider $M = 1000$ for all simulations. To obtain the individual preferences of users and the corresponding system popularity for the simulations, we use the generator described in [81] to generate the individual preference probabilities of 20000 users, forming a dataset. Then, for each realization of the simulations, we randomly select users from the generated dataset for evaluation. As such, the system popularity of the simulations is simply the average of the individual preferences of all 20000 users in the dataset.

In the following, we show the benefits of exploiting individual preferences by comparing designs with and without their use. In other words, we compare the network performance when the proposed design is implemented either with knowledge of the individual preference probabilities, as in Ch. 7.4, or simply with the system-wide popularity distribution. In the latter case, the individual preference probabilities of all users in (5.23) are replaced by the global (system) popularity distribution, i.e., all users are assumed to have the same preference probabilities, described by the global popularity distribution.

7.5.2 Effects of the Individual Preferences

In this subsection, we validate the analytical results provided in Ch. 7.3 and show the benefits of using information about individual preferences.
For all simulations in this subsection, we adopt $D = 80$ m and $E_D = 20$ dBm. In the figures, the results of the proposed design using individual preferences are labeled "+ Individual"; the design using the global popularity distribution is labeled "+ Global".

We first verify our analytical formulations and show the efficacy of using individual preferences. In Fig. 7.2, we consider both $S = 5$ and $S = 20$ with no inactive users ($K_I = 0$), and evaluate the proposed design in terms of throughput and EE. When evaluating the throughput in Fig. 7.2(a), the proposed design is used to maximize throughput; when evaluating the EE in Fig. 7.2(b), the proposed design is used to maximize EE. The curves labeled "Analytical" are computed directly from the expressions in Ch. 7.3; the curves labeled "Simulation" are the results of Monte-Carlo simulations. We observe that the analytical results match the simulations very well, thereby validating our derivations in Ch. 7.3. Moreover, we see that the design exploiting individual preferences significantly outperforms the corresponding design that does not use them.

We next show the impact of inactive users by observing the throughput difference. In Fig. 7.3, we consider $S = 5$ and compare the performance of two networks with different numbers of inactive users, i.e., $K_I = 0$ and $K_I = 25$. The curves are generated using the proposed design that aims to maximize throughput. The results show that the benefits of inactive users are more significant when the number of active users in a cluster is small. At $K_A = 3$, the throughput improves by 49% when $K_I = 25$; at $K_A = 53$, it improves by only 1.5%.
This indicates that although inactive users can help to improve performance, the improvement becomes insignificant when too many users in the same cluster share a single D2D band. This implies that when the number of inactive users is large, we might want to use multiple D2D links [152] to benefit more from the inactive users, or adjust the number of users per cluster by reducing the cluster size. However, either approach is subject to careful trade-offs between different aspects, such as interference management, power control, reduction of the hit-rate, etc.

Figure 7.2: Comparisons between analytical and simulated results in terms of throughput and EE. (Panels: (a) Throughput, in bits/s; (b) EE, in bits/mJ.)

Figure 7.3: Comparisons between networks with different numbers of inactive users in terms of throughput.

7.5.3 Tradeoff Behaviors between Different Performance Metrics

In this subsection, we compare different designs and show the tradeoffs between throughput, EE, and hit-rate. Specifically, in all the following figures, we compare caching policies obtained using the proposed design framework in pursuit of different goals, i.e., throughput, EE, hit-rate, and the throughput–hit-rate tradeoff, evaluated in terms of throughput, EE, and hit-rate. For the throughput–hit-rate tradeoff design, we design the caching policies of users by maximizing $T_{\text{net}} + \gamma K_A T_D H_{\text{net}}$, i.e., by using $U_B = T_B$, $U_D = T_D + \gamma K_A T_D$, and $U_S = T_S + \gamma T_D$, where $\gamma$ is the tradeoff parameter. This tradeoff design is interpreted as a weighted sum of throughput and hit-rate, in which the throughput is given weight 1 and the hit-rate is given weight $\gamma K_A T_D$; the factor $T_D$ in the hit-rate weight is basically there to calibrate between the different units. This tradeoff design is labeled "TH-HIT Tradeoff - $\gamma$" in the figures, where $\gamma$ takes different values to indicate different tradeoff behaviors.

Considering $S = 10$, $D = 80$ m, $E_D = 13$ dBm, and $K_I = 0$, we compare different designs in Fig. 7.4. Unsurprisingly, the throughput-based, EE-based, and hit-rate-based designs provide the best throughput, EE, and hit-rate, respectively. The hit-rate-based design provides poor throughput because it does not consider the self-caching gains that can be obtained by letting users cache their own desired files. In contrast, the throughput-based design is not effective in terms of hit-rate because it overemphasizes self-caching gains. By striking a balance between throughput and hit-rate, appropriate throughput–hit-rate trade-off designs can efficiently trade throughput for hit-rate; this significantly improves the hit-rate without degrading the throughput much. Note that by adjusting $\gamma$, we can effectively adjust the trade-off behavior. Finally, we observe that an energy-efficient caching policy can be obtained by balancing throughput and hit-rate; in fact, when $\gamma = 1$, the throughput–hit-rate trade-off design performs almost as well as our proposed EE-based design.
Also, it is worth noting that, compared to the $E_D = 20$ dBm used in Fig. 7.2, the adoption of $E_D = 13$ dBm here indeed reduces the power consumption significantly, resulting in much better EE, while only slightly increasing the channel outage. (Footnote 6: The channel outage rate increases by only 0.012.) This implies the usefulness of a good network power control policy.

We now consider the same setup as in Fig. 7.4 and compare the proposed design to some other reference designs in Fig. 7.5. Specifically, we compare the proposed design to the baseline selfish design, in which each user selfishly caches files according to its own preferences without considering other users. The selfish design can be considered one extreme, as opposed to the maximum hit-rate design, which maximizes the cooperation between users. We also compare the proposed designs to the design that adopts the global popularity distribution, similar to Figs. 7.2 and 7.3. Furthermore, we compare the proposed designs to Alg. 1 in [86], labeled "Guo 2017". To adapt the design in [86] to our network model, we make some revisions to it. First, we let each group defined in [86] stand for only a single user. Besides, we let the cooperation range defined in [86] be the same as the cluster size defined in our paper. Since the design proposed in [86] assumes that a user can only make a caching decision for a single cache space, we implement a naive extension by treating each cache space of a user independently and repeatedly applying the policy designed in [86] to every cache space of the user.

Figure 7.4: Comparisons between different designs in terms of throughput, EE, and hit-rate. (Panels: (a) Throughput; (b) EE; (c) Hit-Rate.)

The results show that our proposed designs outperform all the other designs in terms of throughput, EE, and hit-rate. Moreover, the selfish design performs well in terms of throughput, while it performs very poorly in terms of EE and hit-rate. This is because the network throughput can be effectively enhanced by large local gains when all users are active. However, this near-optimality of the selfish design does not hold if there are inactive users; we will see this later in another figure. On the other hand, the selfish design inherently provides a very poor hit-rate, leading to poor EE since the BS links are used frequently.

7.5.4 Performance Evaluations with respect to Cluster Size

Figs. 7.6 and 7.7 evaluate the proposed designs with respect to the cluster size $D$. Changing the cluster size should be accompanied by suitable transmission power control of the D2D links. Hence, we adopt the power control policy proposed in [99] to appropriately manage the average SNR of the received signal and the interference between clusters. This power control policy is:
$$E_D = \Delta \left( \frac{(\sqrt{\bar{K}} - 1) \, d}{d_0} \right)^{\alpha} \left( \frac{4\pi d_0}{\lambda_c} \right)^2, \tag{7.33}$$
where $\bar{K} = 16$ is the reuse factor and $\Delta = 2\sigma_n^2 = 2 N_0 B_D$ is the maximum allowable interference between clusters. Such a power control policy adjusts the transmission power of the devices so that the average SNR of the received signal and the inter-cluster interference remain almost invariant when the cluster size changes. Since D2D links are expected to exist only for short-distance transmissions, we consider $D \le 90$ m, which results in $E_D \le 20$ dBm when (7.33) is used to adjust the power.
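As a rough illustration of how a rule of the form (7.33) scales the transmit power with the cluster size, the following sketch implements one plausible reading of the reconstructed formula; the exact placement of the exponent $\alpha$ and the value of $\Delta$ are assumptions on our part rather than values verified against [99], so only the qualitative scaling behavior should be read from it.

```python
import math

# Assumed constants, following the stated simulation setup.
d0, fc, alpha = 10.0, 2e9, 3.68
lam = 3e8 / fc
N0 = 10 ** (-174 / 10) * 1e-3        # -174 dBm/Hz expressed in W/Hz
B_D = 20e6                           # D2D bandwidth (Hz)
K_bar = 16                           # reuse factor
Delta = 2 * N0 * B_D                 # max allowable inter-cluster interference

def E_D_watts(d):
    """One reading of (7.33): transmit power (W) vs. cluster size d (m)."""
    return (Delta * (((math.sqrt(K_bar) - 1) * d / d0) ** alpha)
            * (4 * math.pi * d0 / lam) ** 2)

def to_dbm(p_watts):
    return 10 * math.log10(p_watts * 1e3)
```

The key qualitative point, consistent with the text, is that the required power grows with the cluster size, so larger cooperation distances demand higher device transmit power.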
We consider Poisson point processes to model the numbers of active and inactive users in the cluster, where $\lambda_A$ and $\lambda_I$ represent the densities of active and inactive users, respectively. Thus, the numbers of active and inactive users are random variables described by Poisson distributions with parameters $\lambda_A D^2$ and $\lambda_I D^2$, respectively. (Footnote 7: The value of $\Delta$ in (7.33) is at the level of the noise power; hence, for brevity, we ignore the inter-cluster interference in the simulations and use $\Delta$ only to compute $E_D$.)

Figure 7.5: Comparisons between different designs in terms of throughput, EE, and hit-rate. (Panels: (a) Throughput; (b) EE; (c) Hit-Rate.)

We also need to accommodate the fact that a cluster contains different numbers of users for different $D$. Hence, instead of directly examining the throughput, we evaluate the throughput per area ($\text{Bits/s/m}^2$). Similar to Fig. 7.4, we compare different caching policies developed through the proposed design, each aiming to maximize a different performance metric. In addition, we compare with two reference curves as in Fig. 7.5. Furthermore, we include an additional reference curve that adopts the coordinated design with homogeneous modeling (labeled "Homogo Model" in Figs. 7.6 and 7.7). This curve considers the situation in which the policy is designed using the global popularity distribution while the users indeed all have the same preferences, following the global popularity distribution.
It thus represents the performance of systems that are designed and evaluated using the homogeneous modeling employed in previous papers; we want to see how the performance evaluation of cache-aided D2D networks is influenced when changing from homogeneous modeling to the more practical heterogeneous modeling.

In Fig. 7.6, we consider $S = 10$, $\lambda_A = 0.01$, and $\lambda_I = 0$, i.e., no inactive users. Since the area throughput and EE are influenced by multiple factors, they are in general not convex/concave functions. As such, we see that the area throughput of the throughput-based design fluctuates when $D$ is small and becomes somewhat flat when $D$ is large. This is because the contribution of the D2D transmissions becomes minor when too many users share the same D2D band in a cluster. We also see that the selfish design is again relatively effective because all users are active. As expected, the hit-rate-based design provides the best hit-rate. Meanwhile, the area throughput of the hit-rate-based design continuously decreases with $D$, since it strives to improve the hit-rate without considering the influence of the self-caching gain. In contrast, the throughput-based design again provides the best area throughput, but it is not effective for the hit-rate. In terms of EE, the EE-based design outperforms the others significantly. Moreover, the optimal EE occurs at a large cooperation distance because a high hit-rate is necessary for a large EE; otherwise, the BS needs to serve the users via more BS links, leading to smaller throughput and larger power consumption, and thus poor EE.

From these observations, we see that the trade-off between the area throughput and EE can be attained not only through different caching policies but also through different cluster sizes. Thus, a network designer should consider both the caching policy and the cluster size when designing the network. Finally, we argue that exploiting individual preferences is, as expected, beneficial.
We can see that our proposed system, which considers individual preferences in the design, performs better than the system that operates under the assumption that users have the same preferences, i.e., the curve with “Homogo Model” label. Such result implies that, rather than being detrimental, the diverse preferences of users on files can actually be used to further improve the network. In Fig. 7.7, we conduct a similar evaluation as that done in Fig. 7.6. Here, we adopt A = 0:005 and I = 0:005, i.e., there are some inactive users. We can see that most of the phenomena observed in Fig. 7.6 can be observed again here. Since we now have inactive users, they should be cooperative such that we can obtain the optimal throughput while the active users are still fairly selfish. This then distinguishes the selfish design from our proposed design, and renders the throughput-based design to perform well in terms of EE and hit-rate. We can actually observe that the difference between the optimal values of the throughput- and EE-based designs in terms of EE is smaller as compared to that in Fig. 7.6. However, the trade-off between throughput and EE is still significant as we change the cluster size. Finally, we see that the proposed system performs better than the system with pure homogeneous modeling. This again validates our point that having users with diverse preferences is beneficial. 7.5.5 Evaluations with Different Schedulers subsectionPerformance Evaluations with Different Schedulers Finally, we evaluate the proposed design in clustering networks with two different schedulers. Through this, we can show how the proposed design can 204 10 20 30 40 50 60 70 80 90 Cooperation Distance (meters) 0 0.5 1 1.5 2 2.5 3 Bits/s/m 2 10 5 Selfish Throughput, Global Throughput, Homogo Model Throughput, Individual EE, Individual Hit-Rate, Individual (a) Throughput. 
[Figure 7.6 appears here: Comparisons between different designs in terms of throughput, EE, and hit-rate with respect to cluster size with A = 0.01 and I = 0; panels (a) Throughput, (b) EE, and (c) Hit-Rate, each plotted against cooperation distance (meters) for the Selfish, Throughput (Global), Throughput (Homogo Model), Throughput (Individual), EE (Individual), and Hit-Rate (Individual) designs.]

[Figure 7.7 appears here: Comparisons between different designs in terms of throughput, EE, and hit-rate with respect to cluster size with A = 0.005 and I = 0.005; panels (a) Throughput, (b) EE, and (c) Hit-Rate.]

help developers design a caching policy for a network that has a very complicated scheduler. Specifically, in addition to evaluating with the random-push scheduler, we evaluate (under the same caching policy) the "priority-push scheduler" [99], which functions as follows: every user initially checks whether the files in its local cache can satisfy its requests. If yes, the requests are satisfied; otherwise, the user sends its requests to the BS. The BS then checks whether there are users that can be satisfied via D2D links.
If yes, then the BS randomly selects one such user to be served by a D2D link; otherwise, the BS randomly selects one user from those sending requests and serves that user via a BS link. Such a scheduler maximizes the usage of D2D communications. Hence, we can expect the priority-push network to have higher throughput and better EE than the random-push network. On the other hand, it might be unfair to users whose preferences differ from the mainstream: they might be less likely to be selected and, accordingly, served. More importantly, such a complicated scheduler results in an intractable expression for designing caching policies. We demonstrate how to exploit the designs proposed in this work, along with some numerical results, to guide the designer in obtaining effective designs for it. In Fig. 7.8, we consider the same setup as in Fig. 7.6 and evaluate the proposed design in both networks, using the random-push and priority-push schedulers, labeled "Random" (dashed lines) and "Priority" (solid lines), respectively. We observe that the priority-push network generally outperforms the random-push network in terms of area throughput and EE. (Since the hit-rate under priority-push scheduling is the same as under random-push scheduling, we omit the hit-rate results for brevity.) Besides, we observe that, in terms of area throughput, the results of the random-push network are fairly representative. The results for EE show some subtler effects. We can see that the optimal cluster size for the priority-push network is much smaller, which implies that a high hit-rate is unnecessary in the priority-push network to obtain the best EE. This is because the priority-push scheduler schedules a D2D link as long as one exists, implying a higher rate of scheduling D2D links than the hit-rate alone: the probability for at least one
[Figure 7.8 appears here: Comparisons between different designs in terms of throughput and EE with respect to cluster size with A = 0.01 and I = 0; panels (a) Throughput and (b) EE, comparing the HIT, Throughput, EE, and TH-HIT 1/8 designs under Random and Priority scheduling.]

user to find the desired file in the D2D network is higher than the probability that a particular user finds his/her desired file. Thus, to obtain an energy-efficient design in the priority-push network, we need to choose a design with a smaller cluster size and a lower hit-rate than those providing the optimal EE in the random-push network. Overall, based on the above results, we conclude that we need to reduce the cluster size and consider the various trade-off designs proposed in this paper in order to obtain an effective design in the priority-push network. Since our proposed trade-off designs can efficiently evaluate the throughput and hit-rate, such a trial-and-error procedure might not be challenging.

7.5.6 Summary of the Insights

Here we summarize the insights from our simulation results:
- It is necessary to consider the influence of users' individual preferences on system design, because evaluations done under the assumption that all users share the same preference are not representative of evaluations that account for individual preferences. By considering the effects of individual preferences, the proposed designs can significantly improve the network performance.
- A system optimized for throughput can have a significant loss in EE, and vice versa. Similarly, a system optimized for hit-rate can have a significant loss in throughput and EE, and vice versa.
However, allowing a small sub-optimality in one performance metric can significantly improve the other.
- Instead of directly optimizing EE, which can be complicated, one alternative is a throughput-hit-rate tradeoff design: by balancing throughput against hit-rate, a design with high EE can be obtained.
- While directly finding an effective caching policy for a network using the priority-push scheduling is very challenging due to mathematical intractability, various throughput-hit-rate tradeoff designs obtained under the tractable random-push scheduling can serve as alternatives. This is because the priority-push and random-push scheduling behave similarly, except that the former prioritizes users that can use D2D links.

7.6 Conclusions

In this work, we used the individual preferences of users to improve cache-aided D2D networks. We used an individual preference probability model to derive the network utility of a clustering network and to propose a corresponding utility maximization problem. This problem can be specialized to different important and practical problems, e.g., throughput, EE, and hit-rate optimization, as well as different trade-off problems. By assuming users can coordinate, a general solution approach for the utility maximization problem was proposed. Comprehensive numerical evaluations were conducted with practical individual preference and network setups. Our results show that information about users' individual preferences can be appropriately exploited to significantly increase the performance of cache-aided D2D networks. Our results also show that throughput and hit-rate are in conflict with each other; nevertheless, such conflict can be resolved through a suitable trade-off design.
To obtain an effective EE design, in addition to directly optimizing EE, we can solve a properly designed throughput-hit-rate tradeoff problem, which offers another perspective on EE optimization. Aside from optimizing the caching policy to improve performance, we proposed changing the cooperation distance of the D2D network to achieve this goal; a similar tradeoff exists in this regard as well. Finally, we demonstrated that the results of our work can serve as a foundation for designing caching policies in networks with a more involved scheduling policy.

Chapter 8
Dynamic Caching Content Replacement in Base Station Assisted Wireless D2D Caching Networks

8.1 Introduction

The main challenge in caching networks lies in deciding which file is cached by whom. This is, by and large, the caching policy design problem. Although many researchers have investigated different aspects of this problem using different approaches, the main emphasis has generally been on static policies based on network statistics, i.e., the same caching policy is used throughout the whole network and over the whole time horizon without considering specific dynamics. (Static policies are adopted in Chs. 3–7 of this dissertation as well.) Conversely, it can be beneficial to consider the dynamics of the network and to proactively conduct dynamic caching content replacement/refreshing, in which the content of caches is proactively replaced according to the dynamics of the network. There are several motivations for this: (i) the popularity distribution can change with time (e.g., the emergence of a new viral video) and space (e.g., recordings of different sports teams are popular in different cities); (ii) the caching realizations of the network can be inappropriate (e.g., users do not cache files according to the designated policy); and (iii) user mobility can change the locally available user density and cached files.
Such network dynamics can degrade the performance of a network that uses a statically designed caching policy, whereas adaptations (cache replacement) can automatically compensate. Accordingly, adopting a dynamic design can be beneficial. However, to the best of our knowledge, proactive dynamic content caching and replacement in cache-aided wireless D2D networks has yet to be fully investigated (as will be further discussed below). Thus, this chapter aims to improve this situation.

8.1.1 Literature Review

Several papers have investigated dynamic cache replacement in femtocaching and BS-caching settings [163–169]. In [164], the authors proposed a distributed caching replacement approach via Q-learning, albeit with a focus on caching at the BS and a fixed network topology. In [165], the caches of BSs were refreshed periodically to stabilize the queues of two request types and to satisfy quality-of-service requirements. In [166], the authors aimed to offload traffic to infostations, and thus used multi-armed bandit optimization to refresh the caches of BSs. Meanwhile, [167] proposed an algorithm exploiting multi-armed bandit optimization to learn the popularity of the cached content and update the cache to increase the cache-hit rate. In [168], a reinforcement learning framework incorporating popularity dynamics was proposed to refresh the caches of BSs and minimize delivery cost. In [169], the loss due to an outdated caching policy was analyzed for a small-cell BS, and an updating algorithm minimizing the offloading loss was proposed. Based on real-data observations, [163] established a workload model and then developed simple caching content replacement policies for edge-caching networks.
However, these caching replacement policies for femtocaching do not carry over to D2D caching networks for the following reasons: (i) the use of more constrained wireless channels demands a specific architecture for conducting replacement; (ii) the distributed file-caching structure and the intertwined D2D cooperation and communication between users result in more complicated and constrained conditions for making replacement decisions; and (iii) the locally available cached files can change with time due to user mobility, e.g., users carrying critical files could vanish right after the replacement actions. Cache replacement on user devices has a long history in the computer science community [159, 219], which generally considers individual replacement and/or networks with special properties, without considering D2D cooperation. Although [65] implicitly used content caching replacement, that study mainly focused on joint content delivery and caching design at a given time slot for a given user demand, which is obviously different from our goal. In [160], user cache refreshment was investigated using a Markov decision process (MDP); however, the study focused on efficient buffering for a single user and ignored the important multiuser situation and D2D network communications. In [170], the problem of how users can "reactively" update their cached content was investigated. This differs from our aim of "proactively" updating the cached content.

8.1.2 Contributions

In this work, we consider BS-assisted wireless D2D caching networks and focus on dealing with different dynamics, including user mobility and changes in the popularity distribution. We first propose a network architecture for content replacement. Since dynamics exist when conducting content replacement, we then devise approaches to help decide which files should be cached and which files should be removed from users' caches.
To provide a general design for the network, we describe the network using several random processes: a service process that describes the services for video file requests, an arrival process that describes the arrivals of requests, and an outage process that describes the dropping of requests. Thus, any network whose behavior can be described by these processes can use our design. To conduct replacement, we propose exploiting the broadcasting nature of the BS. To observe the network behavior and make decisions, we use a queueing system that individually queues up requests for different files; the BS can then make decisions by observing the network state and the queueing record. Since the replacement action (via broadcasting) generates cost, we aim to maximize the time-average service rate, defined as the average number of requests served per time slot, subject to a time-average cost constraint and queue stability. The replacement problem includes three parts: (i) deciding when to conduct a replacement; (ii) deciding which files to newly cache on users; and (iii) deciding which files should be replaced, i.e., deleted from the caches. The joint design of these three problems is extremely challenging. Thus, we propose a heuristic but effective procedure for the third part. Most of the work in this chapter concentrates on the first two parts, i.e., deciding when and which files to push into the user caches. For this, we develop a sequential decision-making optimization problem and propose a solution framework combining the "reward-to-go" concept and the drift-plus-penalty methodology from Lyapunov optimization [220]. We also provide analytical results to show the insights and benefits of this framework. Directly solving the optimization problem in the framework might not be feasible; thus, we propose two algorithms for practical implementation, both of which satisfy the time-average constraint and stabilize the queues.
The first algorithm makes myopic decisions to minimize the upper bound of the drift-plus-penalty term. This approach is fairly simple; however, it uses the historical record and the present system state without considering future information. The second algorithm, on the other hand, leverages potential future information by employing Monte-Carlo sampling [221, 222] to incorporate it into the decision-making process. To enhance the second approach, two complexity-reduction techniques for Monte-Carlo sampling are proposed. We use simulations to demonstrate the efficacy of the proposed replacement designs and to gain insights into these approaches. The results show that, when dynamics exist, our approaches significantly improve the network compared to static approaches. Our main contributions are summarized as follows:
- We discuss the replacement problem in wireless D2D caching networks and propose a network architecture for the replacement procedure. To the best of our knowledge, this is the first work to focus on dynamic replacement in wireless D2D caching networks.
- We formulate the replacement problem as a sequential decision-making problem with a time-average cost constraint and queue stability. We propose a solution framework that incorporates the reward-to-go concept into the drift-plus-penalty methodology, and then discuss the insights and benefits gained from adopting this framework.
- To put the proposed framework into practice, we develop two replacement algorithms that can satisfy the time-average constraints and stabilize the queueing system. The first algorithm is fairly simple to implement, but uses only the current system state and historical records for the content replacement. The second algorithm can effectively leverage both historical records and future predictions to make decisions.
- Our simulations, which adopt practical network configurations for cache replacement, validate the effectiveness of our proposed designs. The results show that dynamic cache replacement can significantly improve network performance. Likewise, the simulation results provide insights into the dynamic replacement process performed in this chapter.

8.2 System Model

In this work, we consider a BS-assisted wireless D2D caching network, in which users can cache files and communicate with one another. We consider centrally controlled scheduling for the D2D network; the BS serves as the central controller that collects requests and caching information from users, schedules D2D communications, and decides on the replacement actions. We also assume that the BS can broadcast files to users, thereby enabling cache content replacement. To focus on the performance of on-device caching, we assume that users can be served only through self-caching, D2D caching, and broadcast, without user-specific BS links. Thus, user requests can be satisfied in three ways: by files in their own caches, by files accessible via D2D communication, and by broadcast files. When a user generates a request, it first checks whether this request can be satisfied by the files in its own cache, i.e., by self-caching. If yes, the request is satisfied; otherwise, i.e., when a request cannot be satisfied by self-caching, this request is sent to the BS for possible service via D2D communication or via broadcast. We assume in the file replacement process that the central controller can observe all requests sent to the BS and knows which files are cached by which users. These assumptions lead to some additional signaling cost. Moreover, broadcasting files from the BS to the users also induces cost.
Since the number of signaling bits is typically much smaller than the number of bits in a video delivery, the cost of the signaling overhead can be included as part of the cost of conducting a file replacement (which is dominated by the cost of the file broadcast). As will be shown in Ch. 8.3, our problem formulation accounts for this cost through a time-average cost constraint. We consider a library consisting of M files and assume for simplicity that all files have equal size. We assume users can cache only a single file of the library, i.e., S = 1, in most of the chapter (Chs. 8.3–8.6) for simplicity, and extend to networks where users can cache multiple files, i.e., S > 1, in Ch. 8.7. We consider a homogeneous request probability model in which a_m describes the probability of a user requesting file m, with $\sum_{m=1}^{M} a_m = 1$. (Nevertheless, it will be evident that our proposed replacement framework and designs can also be applied to networks with individual user preferences [81, 206], although the information on individual preferences is not fully leveraged; a design that fully exploits such information is an important direction for future studies.) To describe the realization of the files cached in the network at time t, we denote the caching probability of file m as b_m(t):

$$ b_m(t) = \frac{N_m(t)}{\sum_{n=1}^{M} N_n(t)}, \qquad (8.1) $$

where N_m(t) is the number of users caching file m in the D2D network at time t, known by the BS via signaling. By definition, 0 ≤ b_m(t) ≤ 1 then indicates the probability of file m being cached by a user of the network. We consider both active and inactive users. Active users are defined as users who generate requests, whereas inactive users are those who do not, albeit both types of users participate in the D2D communications. Note that an inactive user can also choose not to participate in the D2D communications, depending on the scenario assumptions.
However, such an inactive user is then independent of the D2D network and can thus be ignored without loss of generality. Moreover, as will become clear in the succeeding discussions, our replacement approach does not use specific information about the numbers of active and inactive users when making decisions. Instead, we use the queueing dynamics of requests to implicitly convey the information on the number of active users waiting for service. Hence, we do not need to specify the distributions (or numbers) of active and inactive users in the model. We adopt a queueing system at the BS with M queues, where queue m stores requests for file m, to help record the history and make replacement decisions. We denote by Q_m(t) the number of requests in queue m at time t. The update of queue m is described as

$$ Q_m(t+1) = \max\left\{ Q_m(t) + r_m(t) - s_m(t) - s^{\mathrm{out}}_m(t),\; 0 \right\}, \qquad (8.2) $$

where r_m(t) ≥ 0 is the number of requests for file m arriving at time t, s_m(t) ≥ 0 is the number of requests for file m satisfied by the network at time t, and s^out_m(t) ≥ 0 is the outage of requests for file m at time t. Here, an outage is defined as a user dropping the request before being served by the network. It should be noted that r_m(t) and s_m(t) in (8.2) do not include the requests and services directly satisfied by and provided through self-caching. This is because requests that self-caching can satisfy are directly handled by the corresponding services, so the two cancel each other. This is in line with our model, which posits that the BS can only observe requests that self-caching cannot satisfy. The impact of self-caching is nevertheless implicitly accounted for, as requests satisfied by self-caching are resolved without being added to the queueing system. Note that when evaluating the overall network performance in simulations, the requests satisfied by self-caching are still considered.
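The bookkeeping in (8.1) and the per-slot queue update (8.2) are straightforward to transcribe. Below is a minimal Python sketch, useful for simulating the queue dynamics; the function and variable names are ours, not from the dissertation.

```python
def caching_probabilities(counts):
    """b_m(t) from (8.1): counts[m] is N_m(t), the number of users
    caching file m, as known at the BS via signaling."""
    total = sum(counts)
    return [n / total for n in counts]

def queue_update(Q, r, s, s_out):
    """One slot of the per-file request-queue dynamics in (8.2):
    Q_m(t+1) = max(Q_m(t) + r_m(t) - s_m(t) - s_out_m(t), 0).
    All arguments are length-M lists. Requests already satisfied by
    self-caching never enter the queues, matching the model."""
    return [max(q + arr - srv, 0)
            for q, arr, srv in zip(Q,
                                   r,
                                   [si + oi for si, oi in zip(s, s_out)])]
```

For instance, with M = 3 queues at sizes [8, 4, 2], arrivals [3, 0, 1], services [2, 2, 2], and no outages, the next slot's queue sizes are [9, 2, 1]; the max with 0 prevents a heavily served queue from going negative.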
Our results in this chapter apply to any file request and content delivery model described by (8.2). As a result, we do not assume a specific file request and content delivery model. Observe that Q_m(t), r_m(t), s_m(t), and s^out_m(t) are random processes: r_m(t) is related to the popularity distribution and the number of users and their modes; s_m(t) is related to the caching distribution of the users and the number of users in the network; and s^out_m(t) depends on the users' willingness to wait for service. Obviously, for files that are not stored by any user, if there is no other source for accessing them (e.g., file broadcasting due to replacement actions), then an outage occurs no matter how long the user is willing to wait. With these interpretations, we identify conditions, namely the time-scale decomposition and monotonicity below, under which the replacement scheme yields significant benefits. Note that these conditions are not assumptions used in our later analysis. Instead, they describe the conditions under which replacement gives large benefits in practice. Violating these conditions gradually decreases the performance gain; for example, when user mobility becomes faster, the performance gradually degrades (see Fig. 8.5). Despite this, violating these conditions does not prevent us from using the analytical results and replacement designs provided in this chapter.

Time-scale decomposition:
1. The popularity distribution varies slowly with respect to the replacement, i.e., $\mathbb{E}[T_{\mathrm{pop}}] > \mathbb{E}[T_{\mathrm{rep}}]$, where $\mathbb{E}[T_{\mathrm{pop}}]$ is the average time period over which the popularity distribution stays invariant and $\mathbb{E}[T_{\mathrm{rep}}]$ is the average time between two replacement actions.
2. User mobility is slow with respect to the replacement, i.e., $\mathbb{E}[T_{\mathrm{cell}}] > \mathbb{E}[T_{\mathrm{rep}}]$, where $\mathbb{E}[T_{\mathrm{cell}}]$ is the average time period that a user stays in the effective service area of the same BS.
3.
The user mode switches from active to inactive at a frequency similar to or slower than the frequency of the replacement, i.e., $\mathbb{E}[T_{\mathrm{mode}}] > \mathbb{E}[T_{\mathrm{rep}}]$, where $\mathbb{E}[T_{\mathrm{mode}}]$ is the average minimal time period for a user to switch from active to inactive. This condition guarantees that a user request stays in the queue for a reasonably long period.

Monotonicity:
1. When the number of requests is sufficient, the expected number of services, $\mathbb{E}[s_m(t)]$, is monotonically increasing as a function of b_m(t). However, s_m(t) can also be a function of other parameters, such as the queue size Q_m(t), user locations, user modes, etc. Usually, the more widely the network caches a file, the higher the network's service rate for that file.
2. The expected number of outages, $\mathbb{E}[s^{\mathrm{out}}_m(t)]$, is a monotonically increasing function of the queue size Q_m(t). This is also commonly observed, since a larger queue size indicates longer delivery latency and thus a higher probability that users cancel a request.

The overall procedure in time slot t is as follows: the users first check whether their requests can be satisfied by the files in their own caches. If yes, the requests are satisfied. Otherwise, users send their requests to the BS. The BS then collects the requests, observes r_m(t) (Q_m(t), ∀m, are already known), and decides what action to take. If the BS decides to conduct a file replacement, the replacement procedure is conducted according to the decision. After the action, the network serves the users by a pre-determined content delivery mechanism, which determines s_m(t). Finally, the transition of user modes is conducted, leading to s^out_m(t). We then finish time slot t, and the network transitions to time slot t+1. The following summarizes the assumptions and feasibility of our model:
1.
The BS can centrally control the D2D scheduling and conduct replacement actions, collect requests that cannot be satisfied by self-caching, and collect information on which files are cached by users.
2. To focus on the effects of on-device caching, users are served only by self-caching, D2D caching, and broadcast from the BS.
3. Users can be either active or inactive. Since our replacement uses the queueing dynamics of requests to make decisions, we do not need to specify statistics on the numbers of active and inactive users in the network (of course, we need to specify their statistics to obtain the numerical results in Ch. 8.8).
4. Our model is very general, in that any file request and content delivery model described by (8.2) can use our design. We thus do not specify a file request and content delivery model here (again, a specific model is needed to obtain the numerical results in Ch. 8.8).
5. Although our design can feasibly be used in general situations, this does not mean it performs well under extreme scenarios, e.g., high-mobility scenarios. We thus discussed the conditions, i.e., time-scale decomposition and monotonicity, under which large benefits are obtained.

8.3 Dynamic Caching Content Replacement

In this section, we first describe the caching content replacement procedure and then introduce the mathematical formulation of the replacement problem. We assume that S = 1 and that the BS can broadcast a single file at a time. (Extending the broadcast to multiple files at a time is a subject for future research; in any case, broadcasting multiple files within a short period is not too different from broadcasting a single file at a time.) Suppose we want to increase b_m(t) by d_m(t), where 0 < d_m(t) ≤ 1 − b_m(t) is the replacement step-size, i.e., we want to replace other files by file m with a targeted fraction d_m(t). To do this, the BS broadcasts file m to users and decides which files should be replaced, i.e., deleted from the caches. Here, our policy is to first replace the files that exert the lowest "pressure" on the queues, i.e., those with the smallest queue size.
To be specific, we first construct a file replacement order by assigning a smaller index to the file with the smaller queue size. Thereafter, we select and replace the files with the lowest index, following the order of the indices to drop files until we achieve the desired fraction d_m(t). Note that the users that drop a file are selected randomly. For example, when deciding to drop file 3 and cache the broadcast file 1, the users that perform this operation are selected randomly from the set of users caching file 3 in the network. To provide a concrete example, suppose we have 3 files with b_1(t) = 0.3, b_2(t) = 0.3, b_3(t) = 0.4 and Q_1(t) = 8, Q_2(t) = 4, Q_3(t) = 2, and want to increase file 1 by d_1(t) = 0.05. The BS broadcasts file 1 and selects file 3 to be replaced by the fraction 0.05, resulting in b_1(t) = 0.35, b_2(t) = 0.3, b_3(t) = 0.35 after the replacement. Consider another example in which we want to increase file 1 by d_1(t) = 0.5. Then we again broadcast file 1 and replace files, leading to b_1(t) = 0.8, b_2(t) = 0.2, b_3(t) = 0 after the replacement. The intuition behind this replacement procedure is that a file with lower pressure is likely cached by more users than necessary to serve the user requests. We note that since the number of files cached in the network is an integer in practice, we cannot realize an arbitrary step-size. Thus, in practice, we use N_rep = round(U · d_m(t)) to decide how many users should conduct the replacement, where N_rep is the integer providing the closest approximation to the desired step-size and U is the number of users in the network.
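The drop-in-order-of-pressure procedure above can be sketched as follows. This is a minimal Python sketch under our own conventions (the function name and 0-based file indexing are ours; the text uses 1-based indexing), and the usage numbers reproduce the chapter's worked example.

```python
def replace_step(b, Q, m, d_m):
    """Broadcast file m and raise its caching probability b[m] by d_m,
    freeing cache space by dropping other files in order of increasing
    queue size ("pressure"). b and Q are length-M lists; requires
    0 < d_m <= 1 - b[m]."""
    b = list(b)                        # do not mutate the caller's list
    need = d_m                         # fraction still to be freed
    # Replacement order: smallest queue (lowest pressure) first, skip m.
    order = sorted((k for k in range(len(b)) if k != m), key=lambda k: Q[k])
    for k in order:
        drop = min(b[k], need)         # drop as much of file k as needed
        b[k] -= drop
        need -= drop
        if need <= 0:
            break
    b[m] += d_m - need                 # fraction actually freed for file m
    # In practice the step is quantized: N_rep = round(U * d_m) of the
    # U users actually perform the replacement.
    return b
```

With b = [0.3, 0.3, 0.4] and Q = [8, 4, 2], increasing file 1 (index 0) by 0.05 drops only file 3 and yields [0.35, 0.3, 0.35]; increasing it by 0.5 exhausts file 3 and then takes 0.1 from file 2, yielding [0.8, 0.2, 0], matching the two examples in the text.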
It is obvious that the considered replacement procedure can be further optimized with more flexible strategies. Instead of first dropping all copies of the file with the smallest index and then moving to the second (see the second example), we could flexibly switch between dropping different files. However, this flexibility complicates the problem. Since we focus on deciding when and which file should be broadcast and what step-size to take, investigating this flexible assignment is left for future work. Note that this suboptimal replacement procedure is effective enough if we carefully choose both the file to be broadcast and the step-size. For most of this work, we focus on deciding when and which files should be broadcast and newly cached by users, and what step-size to take in the replacement procedure. The goal of these decisions is to maximize the time-average number of requests satisfied by the D2D network subject to the cost constraint and queue stability. We define a broadcasting action at time t as a two-tuple (m, d_m(t)), where m = 1, 2, …, M is the file being broadcast and 0 < d_m(t) ≤ 1 − b_m(t) is the replacement step-size when broadcasting file m. We also define the silent action, without broadcasting, as A_slt = (0, 0). Consequently, denoting by D_m(t) the set of all possible step-sizes for broadcasting file m at time t, the action space at time t is A(t) = A_br(t) ∪ {A_slt}, where

$$ \mathcal{A}_{\mathrm{br}}(t) = \{ (m, d_m(t)) \mid m = 1, \ldots, M,\; d_m(t) \in \mathcal{D}_m(t) \}. \qquad (8.3) $$

The cardinality of D_m(t) could be infinitely large, since d_m(t) is in general a real number. However, in practice, D_m(t) is finite because there is only a finite number of users and because we can implement quantization. With the definition of the action space, our replacement problem is mathematically formulated as

$$ \max_{P}\ \liminf_{T\to\infty} \frac{1}{T} \sum_{t=0}^{T-1} \sum_{m=1}^{M} \mathbb{E}\left[ s_m^{A(t)}(t) \right] \qquad (8.4a) $$

s.t.
$$\limsup_{T\to\infty} \frac{1}{T} \sum_{t=0}^{T-1} \mathbb{E}\big[c_{\mathrm{inst}}^{A(t)}(t)\big] \leq C, \qquad (8.4b)$$

$$\limsup_{T\to\infty} \frac{1}{T} \sum_{t=0}^{T-1} \mathbb{E}\big[Q_m^{A(t)}(t)\big] < \infty,\ \forall m, \qquad (8.4c)$$

where $A(t) \in \mathcal{A}(t)$ is the action we take at time $t$ according to some policy $P$; $c_{\mathrm{inst}}^{A(t)}(t)$ is the cost of action $A(t)$; $C$ is the cost constraint; and $Q_m^{A(t)}(t)$ is the size of queue $m$ under a sequence of decisions $A(0), A(1), \dots, A(t-1)$. Note that we use $Q_m(t)$ for general purposes, whereas we use $Q_m^{A(t)}(t)$ to stress that the result is obtained under the sequence $\{A(0), A(1), \dots, A(t-1)\}$. The superscript $A(t)$ explicitly indicates that the decision sequence influences the random processes; this convention applies to all notations in the remainder of this chapter. In the formulation, (8.4b) indicates that we need to satisfy a time-average cost constraint, and (8.4c) indicates that we need to stabilize every queue such that all requests can possibly be served as long as they stay in the system [220]. Furthermore, note that $s_m^{A(t)}(t)$ in the objective function can be replaced by some other reward function, such as the number of bits; in that case, we need to use the number of bits to represent the queue sizes. In addition, $s_m^{A(t)}(t)$ is in fact a function of the system parameter set $\mathcal{P}(t)$, which depends on the actual file request and content delivery mechanism of the network; however, to simplify notation, we do not explicitly write the dependence on $\mathcal{P}(t)$. Finally, although $c_{\mathrm{inst}}^{A(t)}(t)$ could differ between actions, we simply assume here a constant cost when broadcasting any file and zero cost when remaining silent. Mathematically, we thus let $c_{\mathrm{inst}}^{A(t)}(t) = c$ if
Note that due to the physical constraints in practice, it is possible that the queues $Q_m(t)$ are inherently bounded. However, in this case, it is still meaningful to derive an algorithm based on the notion that queues can become infinite.
This is because any algorithm derived for a limiting case will work close to optimum for a finite, but sufficiently large, n. Consequently, we are devising an algorithm assuming a large number of requests, which requires queue stabilization, and then apply this algorithm to cases of inherently finite queue lengths. 222 A(t)2A br (t) andc A(t) inst (t) = 0 ifA(t) = A slt . Note that since conducting a broadcasting action should induce a much higher cost than being silent without broadcasting, a broadcasting action cannot always be performed. As a result, we generally setc to be larger thanC. 8.4 Drift-Plus-Penalty Aided Minimization Methodology Considering the replacement architecture proposed in Ch. 8.3, our goal is to find a policy P that maxi- mizes the time-average service rate while subject to queue stability and cost constraint as described in (8.4). However, solving (8.4) is a sequential decision-making problem, which is very challenging under general conditions and with large dimension. To solve this, we combine the drift-plus-penalty methodology in Lya- punov optimization [220] with the idea of “reward-to-go” [223], i.e., the reward in the future, to develop the policy design framework. First, we define the reward-to-go for filem at timet inl time-slots as: ~ R m (t;A(t)) = 1 l s A(t) m (t) +E " l1 X =1 s A(t+) m (t +) #! ; (8.5) whereA(t +); = 0;:::;l 1 are actions determined by a policyP andE h s A(t) m (t +) i is the expected service rate in theth time-slot after the considered timet. With this definition, we then formulate another optimization problem: max P lim inf T!1 1 T T1 X t=0 M X m=1 E h ~ R m (t;A(t)) i s.t. (8.4b), (8.4c); (8.6) whereA(t);8t; are determined by a policyP . We then provide the following Lemma: Lemma 1: Suppose actionsA(t)2A(t);8t; are determined by a policyP and the expected service rate is upper bounded by a finite numbers max asE h s A(t) m (t) i s max . 
Accordingly, the following holds: lim T!1 1 T T1 X t=0 E h s A(t) m (t) i T1 X t=0 E h ~ R m (t;A(t)) i ! = 0: (8.7) 223 Proof. See Appendix F.1. Lemma 1 shows that the optimization problem in (8.4) is equivalent to that in (8.6). Besides, when l = 1, (8.6) automatically degenerates to (8.4). Thus, Lemma 1 explains the rationale of considering (8.6). To find the effective solution for (8.6), we consider using the drift-plus-penalty-minimization methodology. To define the drift, we first introduce a virtual cost queue: Z(t + 1) = max Z(t) +c A(t) inst (t)C; 0 ; (8.8) where 0 Z(0) <1 is the initial condition. We assume that the number of arrivals is bounded, i.e., r m (t)<1;8m. Then, by (8.2) and (8.8), we can obtain: M X m=1 [Q m (t + 1)] 2 + [Z(t + 1)] 2 M X m=1 h Q m (t) +r m (t)s A(t) m (t)s out m (t) i 2 + h Z(t) +c A(t) inst (t)C i 2 M X m=1 [Q m (t)] 2 + [Z(t)] 2 + 2 " M X m=1 Q m (t) r m (t)s A(t) m (t)s out m (t) +Z(t) c A(t) inst (t)C # + 2B; (8.9) where 2B M X m=1 r m (t)s A(t) m (t)s out m (t) 2 + c A(t) inst (t)C 2 0 is a constant. We define L(t) = 1 2 h P M m=1 (Q m (t)) 2 + (Z(t)) 2 i and define the drift as (t) = L(t + 1)L(t). Consider a finite non-negative numberV . The drift-plus-penalty is then bounded as: (t)V M X m=1 ~ R m (t;A(t)) (8.10a) M X m=1 Q m (t)(r m (t)s A(t) m (t)s out m (t))V M X m=1 ~ R m (t;A(t)) +Z(t)(c A(t) inst (t)C) +B (8.10b) M X m=1 Q m (t)(r m (t)s A(t) m (t))V M X m=1 ~ R m (t;A(t)) +Z(t)(c A(t) inst (t)C) +B: (8.10c) A policy that selects actions by minimizing the drift-plus-penalty in (8.10c) leads to the following theorems: 5 5 We note that by following the concept of the Bellman’s principle of optimality, the necessary condition for an optimal policy 224 Theorem 1: Suppose M, V , Z(0), and Q m (0);8m; are some finite numbers. Assume r m (t) r max ;8m; are finite and bounded; C > 0 andc A(t) inst (t);8A(t)2A(t); are also finite and bounded. 
If the adopted policy chooses the actionA(t)2A(t) such that (8.10c) is minimized for allt, thenQ A(t) m (t);8m;8t, are upper bounded. Accordingly, constraints in (8.4c) are satisfied, i.e., every queue is stable. Moreover, the time-average cost constraint in (8.4b) is satisfied. Proof. See Appendix F.2. Theorem 2: Assume P M m=1 E[Q m (t)] V , E[Z(t)] V , r m (t) r max , and c A(t) inst (t) C max for some finite positive,,r max , andC max . Assume that P M m=1 ~ R m (t;A(t)) is finite and upper bounded. When the actions A(t)2A(t);8t; are determined by a policy P , there must exist a finite non-negative numbery such that lim inf t!1 1 T T1 X t=0 M X m=1 E h ~ R m (t;A(t)) i y B V r max C max : (8.11) Furthermore,y can be maximized when (8.10c) is minimized at allt. Proof. See Appendix F.3. By observing the proof of Theorem 1,Q A(t) m (t);8m, andZ(t) are upper bounded. Therefore, the prereq- uisite of Theorem 2 can be realized. Theorem 2 indicates that minimizing (8.10c) can effectively maximize y . In addition,V controls the trade-off between the performance of the reward-to-go and the real and cost queue lengths. WhenV = 0, Theorem 2 induces a trivial lower bound. However, this does not necessarily mean that the time-average service rate in this situation is very poor. This is because even ifV = 0, we can P is that, while subject to the queuing stability and cost constraint,P can at each time slot maximize ~ Rm(t;A(t)) withl!1. Consequently, whenl andV tend to infinity, the drift-plus-penalty-minimization can satisfy the optimality condition. However, since directly finding ~ Rm(t;A(t)) withl!1 might not be possible, we need to resort to minimizing (8.6) with finitel andV for finding a feasible solution. 225 still stabilize the queuing system, which implicitly provides good service rate. In this context, the inclusion of the penalty term can be interpreted as a means of controlling the optimization of the service rate. 
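The virtual cost queue of (8.8) that drives these results is a one-line update; a minimal sketch (the function name is our own, and the numerical values are the ones used later in the simulations, $C = 1$ and a broadcast cost of 20):

```python
def update_cost_queue(Z, cost, C):
    """Virtual cost queue update (8.8): Z(t+1) = max(Z(t) + c(t) - C, 0).
    Z accumulates pressure whenever the instantaneous cost exceeds the
    per-slot budget C, and drains by C per silent slot otherwise."""
    return max(Z + cost - C, 0.0)
```

With $C = 1$ and a broadcast cost of 20, one broadcast raises $Z$ by 19, which then drains over 19 silent slots; this is the mechanism behind the roughly one-broadcast-per-20-slots behavior noted later.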
Finally, we show a lower-bound performance of the proposed design using Theorem 3:

Theorem 3: Assume that there exists a randomized policy that is i.i.d. with respect to time $t$ and independent of $Q_m(t)$, $Z(t)$, and $B_m(t)$, $\forall m$, such that the following is satisfied:

$$\mathbb{E}\big[c_{\mathrm{inst}}(t)\big] \leq C; \quad \mathbb{E}\big[s_m(t)\big] \geq y_m,\ \forall m; \quad \mathbb{E}\Big[\sum_{m=1}^{M} \big(r_m(t) - s_m(t)\big)\Big] \leq \epsilon, \qquad (8.12)$$

where $\epsilon$ can be arbitrarily small. Suppose $A(t) \in \mathcal{A}(t)$, $\forall t$, are determined by a policy $P$ that minimizes (8.10c). Then, the following is satisfied:

$$\liminf_{T\to\infty} \frac{1}{T} \sum_{t=0}^{T-1} \sum_{m=1}^{M} \mathbb{E}\big[\tilde{R}_m(t, A(t))\big] \geq \sum_{m=1}^{M} y_m - \frac{B}{V}. \qquad (8.13)$$

Proof. See Appendix F.4.

Theorem 3 indicates that the drift-plus-penalty-minimization approach can perform at least as well as an arbitrary randomized design. This characterizes a lower-bound performance of the drift-plus-penalty-minimization methodology. The above results show the benefits of using the drift-plus-penalty methodology to design a policy. However, directly minimizing (8.10c) can be very difficult or even impossible due to the need to compute $\sum_{m=1}^{M} \tilde{R}_m(t, A(t))$. We thus propose in Ch. 8.5 and 8.6 two alternative designs that can be applied in practice to resolve this issue.

8.5 Myopic Drift-Plus-Penalty Aided Minimization Replacement

In this section, we propose the first design, which myopically minimizes the drift-plus-penalty, i.e., the drift-plus-penalty minimization is performed without considering the future payoff. Observe that when $l = 1$, the drift-plus-penalty can be bounded as

$$\begin{aligned}
\Delta(t) - V\sum_{m=1}^{M} \tilde{R}_m(t, A(t)) &\leq \sum_{m=1}^{M} Q_m(t)\big(r_m(t) - s_m^{A(t)}(t)\big) - V\sum_{m=1}^{M} s_m^{A(t)}(t) + Z(t)\big(c_{\mathrm{inst}}^{A(t)}(t) - C\big) + B \\
&\leq -\sum_{m=1}^{M} Q_m(t)s_m^{A(t)}(t) - V\sum_{m=1}^{M} s_m^{A(t)}(t) + Z(t)c_{\mathrm{inst}}^{A(t)}(t) - Z(t)C + X + B \\
&\overset{(a)}{\leq} -\big(Q_m(t) + V\big)s_m^{A(t)}(t) + Z(t)c_{\mathrm{inst}}^{A(t)}(t) - Z(t)C + X + B, \quad m = 1, 2, \dots, M,
\end{aligned} \qquad (8.14)$$

where $X \geq \sum_{m=1}^{M} Q_m(t)r_m(t) \geq 0$ is a constant bound given that $Q_m(t)$, $\forall m$, are upper bounded (see Theorem 4 later); (a) holds because $Q_m(t)s_m^{A(t)}(t) \geq 0$, $\forall m$. The original drift-plus-penalty methodology aims to minimize the first inequality in (8.14).
However, when the D2D scheduler is complicated, $s_m^{A(t)}(t)$ might not have an analytical expression that is easy to compute or estimate under different actions. Thus, we use the final inequality in (8.14) and develop a simplification based on the following observation: if we choose to broadcast file $m$ at time $t$, then we immediately know $s_m^{A(t)}(t) = Q_m(t) + r_m(t)$, since the broadcast satisfies all requests for file $m$ at time $t$. Besides, since we assume no cost for silence, a sufficient condition for choosing to be silent is:

$$-\big(Q_m(t) + V\big)s_m^{A_m}(t) + Z(t)c_{\mathrm{inst}}^{A_m}(t) > 0,\ \forall m, \qquad (8.15)$$

where $A_m \in \mathcal{A}_{\mathrm{br}}(t)$ denotes any action broadcasting file $m$. Based on these observations, we solve the following optimization problem to make the decision:

$$A(t) = \arg\min_{A \in \mathcal{A}(t)} g_A(t), \qquad (8.16)$$

where $g_A(t) = -\big[\sum_{m=1}^{M} \mathbb{1}_{\{A = A_m\}}\big(Q_m(t) + V\big)s_m^{A}(t)\big] + Z(t)c_{\mathrm{inst}}^{A}(t)$ and $\mathbb{1}_{\{A = A_m\}}$ is an indicator function that equals 1 only if the BS broadcasts file $m$. Note that when $A = A_{\mathrm{slt}}$, $g_{A_{\mathrm{slt}}}(t) = 0$, and when $A = A_m$, $g_{A_m,V}(t) = -\big(Q_m(t) + V\big)\big(Q_m(t) + r_m(t)\big) + Z(t)c_{\mathrm{inst}}^{A_m}$. As a result, solving (8.16) is very simple. The intuition behind the solution to (8.16) is that the system tends to broadcast the file with higher pressure on its queue, provided that the pressure in the virtual (cost) queue is sufficiently low. The complete replacement approach is to solve (8.16) and decide the action at every time slot. Since (8.16) can be solved easily, the complexity of the approach is low. Also, since the proposed approach exploits only the history record (queue sizes) and the current system state without using any future information, it is named the myopic drift-plus-penalty minimization (MyDPP) replacement. Note that MyDPP cannot distinguish the differences in step-sizes when broadcasting the same file; thus, this approach cannot adaptively select step-sizes.
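The myopic decision rule reduces to comparing the per-file scores $g_{A_m,V}(t) = -(Q_m(t)+V)(Q_m(t)+r_m(t)) + Z(t)c$ against the silent score of zero. A minimal sketch (names are our own; `Q` and `r` map files to queue sizes and arrivals, `Z` is the virtual cost queue, `c` the broadcast cost):

```python
def mydpp_action(Q, r, Z, c, V=0):
    """Myopic drift-plus-penalty decision (8.16): broadcast the file m
    minimizing g_{A_m,V}(t) = -(Q_m + V)(Q_m + r_m) + Z*c, or stay
    silent when no broadcast scores below the silent action's 0."""
    best_m, best_g = None, 0.0            # silence has score 0
    for m in Q:
        g = -(Q[m] + V) * (Q[m] + r[m]) + Z * c
        if g < best_g:
            best_m, best_g = m, g
    return best_m                          # None means stay silent
```

With the queues of the earlier example ($Q_1 = 8$, $Q_2 = 4$, $Q_3 = 2$) and an empty cost queue, the rule selects file 1; once $Z$ is large enough that $Zc$ dominates every queue's pressure, it stays silent.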
Consequently, when implementing the MyDPP replacement, we consider a compressed broadcasting action spaceA cp (t), in which a constant step-sized is adopted for any broadcasting action. Mathematically, this indicatesA(t) =A cp (t)[fA slt g for the MyDPP replacement, where A cp (t) =f(m;d m (t))jm = 1;:::;M;d m (t) =dg (8.17) Note that the constant step-size d of the replacement procedure should be carefully selected right at the beginning. Finally, the overall algorithm of MyDPP replacement is summarized in Alg. 3. In addition, the proposed MyDPP approach can guarantee the time-average cost constraint and stabilize the queues according to the Theorem 4: Theorem 4: Suppose M, V , Z(0), and Q m (0);8m; are some finite numbers. Consider r m (t) r max ;8m; are finite and bounded; C > 0 and c A(t) inst (t);8A(t)2A(t); are also finite and bounded. Con- sidering using the MyDPP policy, then Q A(t) m (t);8m;8t, are upper bounded. Accordingly, constraints in (8.4c) are satisfied. Moreover, the time-average cost constraint in (8.4b) is satisfied. Proof. The proof follows the similar approach in Theorem 1. We thus omit it for brevity. 228 Algorithm 3 Proposed MyDPP Replacement Design 1: Init: Start att = 0,Q m (0) 0;8m,Z(0) 0. Set step-sized(t) =d andV 0. 2: fort = 0; 1;::: do 3: Evaluateg Am;V (t) =(Q m (t) +V ) (Q m (t) +r m (t)) +Z(t)c Am inst ;8m 4: if min m=1;:::;M g Am;V (t)< 0 then 5: Broadcast the filem, wherem = arg min m=1;::;M g Am;V (t) 6: Conduct the replacement procedure provided in Ch. 8.3 with step-sized 7: else 8: Keep silent, i.e.,A(t) =A slt 9: end if 10: Update the real queuesQ m (t);8m; and the virtual queueZ(t) 11: end for 8.6 Drift-Plus-Penalty Aided Minimization Replacement Exploiting Sam- pling Next, we derive a method that exploits the potential future information, i.e., the knowledge about future changes, in the popularity distribution and the corresponding payoff. 
Guessing the future popularity distribution (e.g., which videos will become "viral") is a widely investigated topic and is thus not discussed further here. Instead, we simply assume that such future information is available at the BS. Besides, we assume that the operation of the network can be modeled and simulated via Monte-Carlo methods. Accordingly, we propose the second design, which decides on the actions to take with the aid of future information (i.e., $l > 1$).

8.6.1 Proposed Replacement Exploiting Sampling and Rolling Horizon

To introduce the future information and to satisfy the constraints in (8.4), we first need to minimize (8.10c) with finite $V$ and $l > 1$. However, computing $\tilde{R}(t, A(t))$ at each time might be impossible and/or very complex. Therefore, we propose an alternative approach for estimating $\tilde{R}(t, A(t))$. Besides, to reduce complexity, we aim to skip such estimation in some time slots. To this end, we observe that, on the one hand, we should not broadcast if the cost queue is already highly pressured; on the other hand, if we broadcast, then the algorithm should select the action that provides the highest reward-to-go. This observation breaks the decision-making problem into two sub-problems: (i) whether to conduct a broadcast with replacement; and (ii) which file to broadcast with what step-size. We then solve these sub-problems sequentially by exploiting the drift-plus-penalty methodology. When $V = 0$ and $l = 1$, the drift-plus-penalty approach leads to the most stable and cost-effective network, which indicates that this approach is conservative. Thus, we solve the problem with $V = 0$ and $l = 1$ to decide whether to conduct a broadcast, i.e., we exploit the MyDPP approach with $V = 0$ in Ch. 8.5 to decide whether to broadcast a file. When we decide to broadcast a file, we need to select the specific file and the step-size.
In this case, we considerV =1 6 to optimize this decision, and then introduce the Monte-Carlo sampling along with a probabilistic candidate selection approach to estimate ~ R(t;A(t)). Suppose we have decided to broadcast and conduct a replacement. We first construct the candidate set that includes all the possible broadcasting candidates. Recall that when we consider to broadcast at timet, we select an action fromA br (t) in (8.3). Accordingly, the candidate set (t) is constructed by including all possible broadcasting actions, i.e., all possible combinations of the broadcasting files and step-sizes: (t) =f = (m;d m )jm = 1; 2;::;M;d m 2D m (t)g: (8.18) 6 In practice, differentV could be considered for different tradeoffs. However, this does not change the essence of our design. 230 We then use the proposed Monte-Carlo based sampling to select the best action. Suppose we are in time slott. A Monte-Carlo sample of a candidate = (m;d m ) is derived by using the followingT stage stage simulation procedure: 1. At the simulation timek =t, 7 we broadcast filem with a step-sized m , and then simulate the system using Monte-Carlo method and record ^ R(k;A(k);W (k)), where ^ R(k;A(k);W (k)) is the sampling reward with randomnessW (k) at timek. 2. At simulation timek = t + 1 tot +T stage 1, we simulate the system withA(k) = A slt , i.e., the system is silent, and record ^ R(k;A(k);W (k)). 3. Output the estimated reward-to-go of candidate: ~ R t () = t+T stage1 X k=t ^ R(k;A(k);W (k)). We note that we assume the operation of the system can be modeled and simulated effectively. Besides, T stage needs to be carefully chosen to provide effective approximations. Since we conduct simulations considering only a single broadcast in T stage time-slots, T stage is suggested to be the average number of time slots between two replacement actions. This is because, by definition, the system should remain silent between two replacement actions. 
Note that cost constraint and the cost of each broadcast determines the average number of time slots between two replacement actions. For example, if the broadcasting cost is c A inst (t) =c;8A2A br (t), then we can on average broadcast only once every c C time slots. We now describe the candidate selection following the idea in [221]. Suppose we acquireN samples for each candidate at time slott. Denote the selection probability for a candidate as n () when considering n samples, where P 2 n () = 1. For a candidate2 , the update of the selection probability is: n () = ( ) ~ Rt;n() P 2 ( ) ~ Rt;n() n1 ();n = 1;:::;N; (8.19) where is the annealing coefficient of candidate and ~ R t;n () is the sampling reward-to-go for sample n of candidate at time t. We then use the selection probability N ();8 2 to decide which file 7 Note that we conduct the system simulation starting atk =t 231 to broadcast and what its corresponding step-size should be; that is, we decide the final action according to the sample that the distribution N () randomly generates. The initial selection probabilities can be any distribution such that P 2 0 () = 1. However, we usually consider the uniform distribution for initialization. We stress that, according to Theorem 3.1 in [221], when N tends to infinity, this selection approach converges to the optimal distribution that offers the optimal reward based on the given sampling procedure and on the candidate set. We now summarize the proposed replacement approach in this sub-section as follows and in Alg. 4. At each timet, we first decide whether to broadcast using Alg. 3. If the result of Alg. 3 suggests broadcasting, then we enter the next phase, in which we decide the broadcasting file and the step-size; otherwise, the system remains silent. If we broadcast, then we need to construct the candidate set and use Monte- Carlo sampling to acquire reward-to-go samples. 
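The per-sample probability update of (8.19) re-weights each candidate by its annealing coefficient raised to the sampled reward-to-go, then renormalizes. A minimal sketch (names are our own; `pi` maps candidates to their current selection probabilities and `rewards` to their sampled rewards-to-go):

```python
def update_selection(pi, rewards, gamma=1.3):
    """One step of the annealed update (8.19): each candidate's
    probability is re-weighted by gamma ** R(phi), where R(phi) is
    the sampled reward-to-go, and the result is renormalized."""
    w = {phi: (gamma ** rewards[phi]) * p for phi, p in pi.items()}
    s = sum(w.values())
    return {phi: v / s for phi, v in w.items()}
```

Because $\gamma > 1$, candidates with consistently higher sampled rewards-to-go gain exponentially more weight, so repeated updates concentrate the distribution on the best candidate.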
We then compute the final selection distribution $\pi_N(\phi)$ using (8.19); the action, including both the broadcasting file and the step-size, is determined by drawing a random sample from this selection distribution. The replacement approach proposed here is named the sampling-based drift-plus-penalty (SPDPP) replacement. Compared with MyDPP, SPDPP can adaptively adjust the step-size and exploit future benefits when making decisions. Besides, the proposed SPDPP replacement can also satisfy the required constraints; this is because we use the same approach as MyDPP to decide whether to broadcast or not. We thus omit the proofs for brevity.

8.6.2 Complexity Reduction Approach

Alg. 4 considers all possible broadcasting files and step-sizes as candidates and uses a pre-determined sample size $N$. However, it is sometimes unnecessary to go through all candidates and use up to $N$ samples for every candidate. In this section, we discuss approaches to make Alg. 4 less complex. Specifically, we aim to let the algorithm itself decide the number of candidates and samples. We therefore propose two complexity reduction approaches that can be used simultaneously.

Algorithm 4 Proposed SPDPP Replacement Design
1: Init: Set $Q_m(0) \geq 0, \forall m$, $Z(0) \geq 0$, and the number of samples $N$
2: for $t = 0, 1, \dots$ do
3: Evaluate $g_{A_m,0}(t) = -Q_m(t)\big(Q_m(t) + r_m(t)\big) + Z(t)c_{\mathrm{inst}}^{A_m}$, $\forall m$
4: if $\min_{m=1,\dots,M} g_{A_m,0}(t) < 0$ then
5: Construct the candidate set $\Phi$
6: Compute $\pi_N(\phi)$ using (8.19) with the proposed sampling procedure
7: Select the action $(m, d_m)$ by drawing a random sample from $\pi_N(\cdot)$
8: Broadcast file $m$ and conduct the replacement procedure with step-size $d_m$
9: else
10: Keep silent, i.e., $A(t) = A_{\mathrm{slt}}$
11: end if
12: Update the real queues $Q_m(t), \forall m$, and the virtual queue $Z(t)$
13: end for

8.6.2.1 Initial Candidate Number Reduction

In some situations, some files are so redundantly cached that we may even want to decrease their percentages in the network. Thus, we do not have to include them in the candidate set.
To identify those files, we observe that we broadcast only if there exists a file m such that g Am;0 (t) < 0. This indicates that it is more necessary to broadcast files with g Am;0 (t) < 0. Thus, we can include only those files in our candidate set. Note that this approach might result in the drop of the optimal solution. However, the probability for this to occur can be reduced by setting some lower bound on the minimal number of files to be included in the candidate set, and then adding files with smallerg Am;0 (t) in an ascending order. In addition, we can also set up a hard constraint for the maximum number of files included in the candidate set. 233 Although this might result in the loss of the optimal solution, it could also effectively bound complexity in practice. 8.6.2.2 Sampling with candidate pruning We can adaptively prune the candidates to reduce the number of samples per candidate during the sampling process. Recall that the update of the selection distribution n () is a sequential update. Thus, instead of completely generating ~ R k;n ();8;n; and then find the final selection distribution, we can gradually gener- ate the samples and update the selection distribution; that is, we generate ~ R k;1 ();8; and then compute for 1 (); generate ~ R k;2 ();8; and then compute for 2 (); and so on. In updating the selection distribution, when there is a candidate such that n () < , it is improbable that this candidate would be selected. Hence, we set n () = 0 and normalize n () such that their sum is still equal to one. When n () = 0, we then know that it is never selected. Thus, is pruned from the candidate set, and we no longer need to generate more samples for this candidate. This process continues until either there exists a candidate such that n () = 1 or untiln =N is reached. This approach can reduce the number of candidates during the sampling process, and can allow to terminate the process earlier. It is clear that when! 
0, this approach tends to maintain optimality asymptotically. 8.7 Extension to Caching Multiple Files For the convenience of elaborating the designs and fundamental concepts, we assumed in the previous sections that each user would cache only one file, i.e., S = 1. Here, we describe how we extend the proposed designs to the networks such that the users can cache multiple files, i.e., S > 1. As discussed previously, a caching content replacement is constituted by deciding which file should be newly cached by users and which file should be removed. To extend the proposed designs, we first extend the replacement procedure in Ch. 8.3 to determine what files to remove from the users whenS > 1. Suppose we want to 234 increaseb m (t) byd m (t). We first find all users who do not cache filem. Among those users, we randomly selectN rep users such thatb m (t) can increased m (t) if those users newly cache filem. Recall that theN rep defined in Ch. 8.3 is the integer that can provide the closest approximation to the desired step-size. When the selected users receive the broadcast filem, they need to decide which file to remove from their caches in order to cache filem. To make the decision, each user looks at the files in their own caches and removes the file that has the smallest corresponding queue size. Clearly, such decision follows the similar intuition as that discussed in Ch. 8.3 – we remove the file whose corresponding queue has the lowest pressure. With this extended replacement procedure, our designs, aiming to decide what file should be newly cached by users, can directly be applied to the networks. Thus, to conduct the replacement in networks withS > 1, we first decide when and which file to newly cache by using the same approaches as those proposed in Ch. 8.5 and Ch. 8.6, and then use the extended replacement procedure to decide which file should be removed by which user. 
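The per-user eviction rule for $S > 1$ described above (remove the cached file whose queue carries the least pressure) can be sketched as follows; a minimal illustration with our own names, where `user_cache` is one user's set of cached files and `Q` maps files to their queue sizes:

```python
def evict_and_cache(user_cache, Q, new_file):
    """Extended replacement for S > 1: a selected user receiving the
    broadcast file evicts the cached file whose queue has the smallest
    size (lowest pressure), mirroring the single-file intuition."""
    if new_file in user_cache:
        return user_cache                  # already cached, nothing to do
    victim = min(user_cache, key=lambda f: Q[f])
    user_cache.discard(victim)
    user_cache.add(new_file)
    return user_cache
```

For instance, with queues $Q_1 = 8$, $Q_2 = 4$, $Q_3 = 2$, a user caching files {2, 3} that receives a broadcast of file 1 evicts file 3, the lowest-pressure file in its cache.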
8.8 Performance Evaluations and Discussions

In this section, we use simulations to evaluate the proposed replacement designs and provide relevant discussions. Note that although we need to consider a specific file request and content delivery mechanism in the following simulations for the purpose of obtaining numerical results, this does not mean that our proposed framework and algorithms are restricted to it. (Simulation results under a different simulation environment can be found in the conference version of this chapter [224]. Although we present only the results of the MyDPP approach in [224], those results still demonstrate the generality of our replacement framework. Moreover, although we cannot analytically characterize nor empirically demonstrate the optimality of the proposed designs in complex networks, we do numerically show that the proposed design is near-optimal in a very simplified scenario; see Fig. 2 in [224].)

8.8.1 Simulation Environment

In all simulations, we consider 4000 users located (and possibly moving) within a square-shaped area with side length 1000 m. The BS is located at the center of this square and serves as the central controller. The service coverage of the BS, i.e., the cell, also covers a square-shaped area, with side length 500 m. A simulation area larger than the serving area is considered in order to emulate user behavior, with users moving in and out of the coverage region. D2D communication is implemented based on the clustering of the users in a cell, as has been widely adopted for D2D-based video caching [42, 70, 71, 99]. In particular, the cell is split into several smaller, equal-sized square clusters, where only users within the same cluster can communicate with each other. We denote the side length $G$ of a cluster as the cluster size.
To avoid interference, a spatial reuse scheme is employed, i.e., only clusters that are a minimum distance apart from one other may use the same time/frequency resources, similar to cellular frequency reuse. Thus, the size of a cluster, also interpreted as the cooperation distance, can greatly affect the throughput and outage perfor- mance. All communications within a cluster use the same data rate regardless of the distance between the users, corresponding to a fixed modulation and coding scheme. In all simulations, D2D links have a service rate of 200 Mbits/s. This service rate is feasible when we adopt mmWave communications or when we ap- ply reuse factor one along with the advanced WiFi service. To be able to use either approach, we consider the cluster sizeG to be upper bounded by 100 m [42, 99]. All users generate requests according to a request distribution. In a cluster, users fulfill requests from files in the local cache whenever possible. Otherwise, the requests are sent to the BS. Among the requests (in the same cluster) that can be fulfilled via D2D com- munications, the BS randomly selects one such request to satisfy. The above D2D scheduling and delivery generally follow the priority-scheduling as that detailed in [99]. We consider here users cannot be served by user-specific BS links, but can be served by broadcasting of the BS. When the BS broadcasts filem (for both replacement and service), all user requests in the cell for filem are satisfied and the queue of filem is 236 cleared. Control overhead is ignored in simulations for simplicity. We model the service using a slotted structure and then evaluate the performance in terms of the number of requests satisfied per slot, which include the requests satisfied by self-caching, D2D communications, and BS broadcasting. We consider a slot length of 6 s and simulateT = 14400 time slots (complete 24 hours) to obtain one sample result. 
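The cluster geometry above determines which users may communicate: two users can be D2D partners only if their positions fall in the same square of side $G$ inside the cell. A minimal sketch of the mapping (function name and the placement of the 500 m cell inside the 1000 m simulation square are our own assumptions):

```python
def cluster_id(x, y, G, cell_side=500.0, cell_origin=(250.0, 250.0)):
    """Map a user position (meters) to its square cluster inside the cell,
    or None when the user is outside the BS service area.  Geometry is
    illustrative: a 500 m cell centered in the 1000 m simulation square."""
    cx, cy = x - cell_origin[0], y - cell_origin[1]
    if not (0 <= cx < cell_side and 0 <= cy < cell_side):
        return None                        # outside the cell: no D2D cluster
    return (int(cx // G), int(cy // G))
```

Comparing a pair's cluster IDs at the start and end of a slot also gives a direct test for the mobility outage discussed below: an outage occurs when the IDs no longer match at the end of the slot.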
This setup allows the users to finish downloading a file of size 150 MB within each slot. Note that this file size is enough to provide around 30 minutes of video with fairly good quality. We adopt the mobility model in [137], which directly connects the user velocity with the random waypoint model in [225], so that we can model the user movement. Each user $u$ in the mobility model randomly selects a target point within the simulation area, i.e., within the 1 km$^2$ area, and moves toward the target point with a constant velocity. To decide the velocity of the movement, each user $u$ randomly selects the velocity in $[0, 2V_u]$, where $V_u$ is the average velocity of this user. $V_u$ is randomly selected from $[0, 2V_{\mathrm{net}}]$ at the beginning of the simulations, where $V_{\mathrm{net}} = 1$ m/s (3.6 km/h) is the average velocity in the network, corresponding to a fast walking speed. The general mobility pattern is as follows. Each user first picks a target point, selects the velocity for this trip, and then moves toward the target. Since we adopt the slotted structure, each user checks whether the moving distance is sufficient to reach the target point at the end of each time slot. If yes, then the user chooses another target point and velocity for a new trip; if not, then the user keeps moving toward the same target point until it arrives. A user can be in either an active or an inactive mode. When the request of an active user is satisfied, the user immediately transitions to inactive. Each user can change its mode at the end of each time slot, and the probability of changing mode is 0.05. When a user changes from active to inactive, the request of the user is dropped from the queueing system, thereby causing an outage. Conversely, a user's request is generated according to the request distribution at the time that the user changes from inactive to active. This request is then sent to the BS at the beginning of the next time slot if the local cache cannot satisfy the request.
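One slot of the random-waypoint mobility described above can be sketched as follows; a minimal illustration with our own names, where a user moves toward its target at constant speed and, upon arrival, draws a new uniform target in the area and a new speed uniform in $[0, 2V_u]$:

```python
import math
import random

def random_waypoint_step(pos, target, speed, dt, area=1000.0, v_user=1.0):
    """Advance one slot of the random-waypoint model: move toward the
    target at constant speed; on arrival within the slot, draw a new
    target uniformly in the area and a new speed in [0, 2*v_user]."""
    dx, dy = target[0] - pos[0], target[1] - pos[1]
    dist = math.hypot(dx, dy)
    if dist <= speed * dt:                 # target reached within this slot
        pos = target
        target = (random.uniform(0, area), random.uniform(0, area))
        speed = random.uniform(0, 2 * v_user)
    else:
        pos = (pos[0] + dx / dist * speed * dt,
               pos[1] + dy / dist * speed * dt)
    return pos, target, speed
```

With the slot length of 6 s used in the simulations, a user moving at 1 m/s advances 6 m per slot toward its target.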
A user can move in and out of the cell. When a user moves out of the cell at the end of a time slot, the request of the user is dropped from the network, and the BS loses the information of the user. On the other hand, when a user moves into the cell, the user can be in either an active or an inactive mode with equal probability. If the user is in active mode, then a request is generated according to the request distribution at that time slot. We consider a single update of the request distribution per hour, i.e., a single update every 600 time slots. The request distribution update is always the last function to be conducted in a time slot. In each update, $K$ new files are added to the library and become the most popular $K$ files; thus, the rank of all the original files degrades by $K$. In addition, the originally least popular $K$ files are dropped from the library, indicating that the users are no longer interested in those files. Aside from adding and dropping files, the concentration rate of the request distribution can change at each update. We model the request distribution by using a Zipf distribution [99] with Zipf parameter $\gamma = 0.2 + 0.05(k-1)$, $k = 1, 2, \dots, 25$. A change of the index $k$ indicates a change of the concentration rate, and we model this using a Markov process with a transition probability matrix $P$, in which $P_{k,k} = 0.5$, $P_{k,k+1} = 0.25$, $P_{k,k-1} = 0.25$ for $2 \leq k \leq 24$; $P_{1,1} = 0.5$, $P_{1,2} = 0.5$, $P_{25,25} = 0.5$, $P_{25,24} = 0.5$; and $P_{k,l} = 0$ otherwise. Due to users' mobility, we also need to consider the outage caused by users moving away from each other during a transmission. This condition is called "mobility outage". Notice that users are guaranteed to be able to communicate with each other only if they are in the same cluster. Thus, a mobility outage occurs when two users that have established D2D communication at the beginning of a time slot are not in the same cluster at the end of the time slot.
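The request-distribution dynamics above — Zipf popularity with a Markov-modulated concentration index — can be sketched as follows (function names are our own; we assume $\gamma_k = 0.2 + 0.05(k-1)$, which is consistent with the initialization $k = 13 \leftrightarrow \gamma = 0.8$ used below):

```python
import random

def zipf_pmf(gamma, M):
    """Zipf request distribution over files ranked 1..M with parameter gamma."""
    w = [r ** (-gamma) for r in range(1, M + 1)]
    s = sum(w)
    return [x / s for x in w]

def step_concentration(k, K=25):
    """One Markov transition of the concentration index k: stay with
    probability 0.5, move to each neighbor with probability 0.25; at the
    boundaries (k = 1 or k = K) the whole 0.5 goes to the single neighbor."""
    u = random.random()
    if k == 1:
        return 2 if u < 0.5 else 1
    if k == K:
        return K - 1 if u < 0.5 else K
    if u < 0.25:
        return k + 1
    if u < 0.5:
        return k - 1
    return k
```

Each hourly update would then draw `k = step_concentration(k)` and regenerate the request probabilities via `zipf_pmf(0.2 + 0.05 * (k - 1), M)` after shifting the library by $K$ files.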
Once an outage occurs, the request is not satisfied, and the user remains active with the same request. Note that when users are served by broadcasting from the BS, such mobility outage does not happen. To initialize a simulation, we adopt the following procedure: (i) all users are uniformly distributed within the square with side length 1000 m; (ii) users located within the BS service area are set to active mode, whereas users located outside the BS service area are set to inactive mode; (iii) every user randomly selects its average velocity used during the simulation, and then initializes a new trip by using the described mobility model; and (iv) the initial request distribution is set at index k = 13, i.e., γ = 0.8. In all the simulations below, MATLAB™ is used to build up our simulation environment. We run simulations on a server with 72 CPU cores, each with a clock rate of 2.1 GHz.

8.8.2 Simulation Results

Now, we evaluate the proposed designs. We present our results by their sample means (specific points) and sample deviations (error bars). In all simulations, we consider C = 1 and c_inst^A = 20, ∀A ∈ A_br(t). This means that, on average, the broadcasting action happens once per 20 time slots. In the MyDPP approach, V = 0 is considered,⁹ and different step-sizes (indicated in the legends of the figures) are used. In the SPDPP approach, we consider T_stage = 20, N = 10, a parameter value of 1.3 for every element of the corresponding set, and D_m(t) = {d_m | d_m = (1 − b_m(t))/2^k > d_min, k = 0, 1, ...} ∪ {d_min}, where d_min = 0.001 is the minimal step-size. We use the complexity reduction approaches in Ch. 8.6.2 for SPDPP. The minimal and maximal numbers of candidate files are 2 and 4, respectively.¹⁰ The threshold for pruning a candidate is 10⁻⁶. In Figs. 8.1 and 8.2, to focus on evaluating the performance of the replacement designs, the mobility outage is temporarily excluded. In the remaining figures, the influence of such outage is included.
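The SPDPP candidate step-size set D_m(t) above can be generated as in the following sketch, where b_m(t) is taken to be the current caching probability of file m (an assumption based on context; the names are illustrative, not from the dissertation's code):

```python
D_MIN = 0.001  # minimal step-size d_min

def candidate_steps(b_m):
    """Halving step-sizes (1 - b_m)/2^k while they exceed d_min, plus d_min.

    Mirrors D_m(t) = {(1 - b_m(t))/2^k > d_min, k = 0, 1, ...} U {d_min}.
    """
    steps, k = [], 0
    while (1.0 - b_m) / 2 ** k > D_MIN:
        steps.append((1.0 - b_m) / 2 ** k)
        k += 1
    steps.append(D_MIN)   # d_min is always a candidate
    return steps
```

When b_m(t) is already close to 1, the set collapses to {d_min} alone, so a nearly fully cached file is only ever adjusted by the smallest step.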
All the proposed replacement designs can satisfy the cost constraint within ε < 0.005 accuracy, i.e., (1/T) Σ_{t=0}^{T−1} c_inst^{A(t)}(t) ≤ C + ε with high probability, and accordingly stabilize the queueing system in the simulations. This is not shown in the figures for brevity. In the following discussion, we demonstrate the performance of the proposed replacement designs and compare them with static approaches. In all figures, "Zipf-0.8" indicates a time-invariant caching policy based on a Zipf distribution with parameter 0.8 [71]; "Brod" indicates that the BS periodically broadcasts, i.e., the BS broadcasts the files in a round-robin manner every 20 time slots, but does not conduct replacement. The "Zipf-0.8" policy is also used as the initial caching policy for the replacement designs. Since we focus on demonstrating the performance of the replacement designs, we do not try to optimize the static policy. Besides, we adopt this policy because it is simple to use and performs well [71], as it matches the initial request distribution, which also has Zipf parameter γ = 0.8. In Fig. 8.1, S = 1, M = 100, and K = 3 are considered. We observe that the choice of step-size indeed influences the results significantly, and that the optimal step-size depends on the adopted parameters and network configurations. Clearly, the best step-size cannot be obtained before we actually run the simulations, which prevents real-time optimization.

⁹ Although different values of V can entail different trade-offs according to the theorems, the low-complexity implementation is merely an approximation of the exact drift-plus-penalty minimization; thus, the trade-off entailed by V in MyDPP is not very clear. We therefore choose the most cost-effective case (V = 0) for the demonstrations.
¹⁰ Of course, a candidate file can have different step-sizes; recall that a final candidate is jointly determined by the candidate file and the step-size.
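The time-average cost constraint above can be checked directly. The sketch below (illustrative, with hypothetical names) reproduces the C = 1 setting in which a broadcasting action of cost 20 occurs once per 20 slots on average, giving a time-average cost of exactly 1:

```python
def constraint_satisfied(costs, C=1.0, eps=0.005):
    """Check the time-average cost constraint (1/T) * sum_t c(t) <= C + eps."""
    return sum(costs) / len(costs) <= C + eps

# Broadcasting cost 20 incurred once every 20 slots -> time average of 1.0
costs = [20.0 if t % 20 == 0 else 0.0 for t in range(600)]
ok = constraint_satisfied(costs)   # satisfied: 1.0 <= 1.005
```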
Fortunately, we can still obtain a reasonably efficient step-size by looking at the concentration rate of the request distribution. From experience with our simulations, the step-size performs well when it is on the order of the popularity of the most popular files, e.g., d = 0.05 in the figure.¹¹ Besides, it can be observed that when the caching distribution is not appropriate, a larger cluster size can lead to better performance. This is intuitive because, when the caching distribution is inappropriate, we need to enlarge the cluster size to increase the probability that a user finds the desired file in the cluster. For example, in Fig. 8.1, MyDPP with d = 0.05 has the best performance when the cluster size is within the range of 60–70 m, while the best performance for MyDPP with d = 0.01 is around 71 m. The reason is that the step-size d = 0.01 might not provide sufficiently fast replacement to adjust the caches of users to accommodate the new files within a short period after an update of the request distribution. Finally, we observe that all proposed designs perform better than the static approaches and outperform MyDPP with an extremely small step-size (d = 0.001). This validates the benefits of appropriate replacement, even when some type of broadcasting is used. Note that when d is very small, MyDPP is very close to simply providing appropriate broadcasting without cache content replacement.

¹¹ Even though d = 0.05 is optimum throughout the whole chapter, in simulations considering different network setups, a different d could be optimum.

Figure 8.1: Throughput as a function of cluster size for MyDPP replacement with different step-sizes.

We compare MyDPP and SPDPP in Fig. 8.2, which considers S = 1 and M = 100; K = 3 and K = 6 are adopted in Fig.
8.2a and Fig. 8.2b, respectively. We observe that the proposed SPDPP replacement usually provides the best performance without the need to manually choose an appropriate step-size. The proposed MyDPP design can be comparable to the SPDPP design when the step-size is optimized. The benefit of MyDPP is that it has lower complexity and does not need predictions of the future, though its step-size needs to be appropriately selected. All the proposed replacement designs demonstrate significant improvement when compared to the static policy.

Figure 8.2: Throughput as a function of cluster size for different replacement schemes ((a) K = 3; (b) K = 6).

In Fig. 8.3, we compare different schemes in the same network as in Fig. 8.2, but this time the influence of mobility outage is considered. From the figure, the same observations as in Fig. 8.2 can be obtained. Besides, by comparing Fig. 8.2 with Fig. 8.3, we observe that the performance slightly degrades due to the mobility outage, and that the degradation is larger when the cluster size is smaller. This is intuitive because, when the cluster size is small, mobility outages are more likely.

Figure 8.3: Throughput as a function of cluster size for different replacement schemes in networks including mobility outage ((a) K = 3; (b) K = 6).

In Fig. 8.4, we evaluate the proposed designs in networks where a user can cache multiple files. The replacement design is implemented following the extension approach proposed in Sec. 8.7. We consider S = 5, M = 100, and K = 3 in Fig. 8.4a. The results generally show good agreement with our previous observations. Besides, the performance is improved compared with S = 1 in Fig. 8.3. This is clearly because the total number of files that can be cached in a cluster increases. We also note that, in line with results from the literature, the optimum cluster size shrinks as more files can be cached per user. In Fig. 8.4b, we consider S = 5, M = 1000, and K = 6, and obtain the same observations as in all previous figures. This indicates that our replacement designs are effective for a more practical library size. We should note that, although not shown here due to page limitation and for simplicity, the observations and improvements presented in this chapter are likewise obtained in networks with other parameters, e.g., M = 200 and T_stage = 30. Finally, we demonstrate the effects of violating the conditions provided at the end of Sec. 8.2. In Fig. 8.5, we consider S = 1, K = 3, and M = 100, and evaluate the MyDPP design in networks with different average network velocities, i.e., V_net = 1, 5, 13, 21 m/s. We observe that the performance gain of the MyDPP design gradually decreases as V_net increases, while it still outperforms the static policies even with high mobility, e.g., V_net = 21 m/s (75.6 km/h). The result demonstrates that the performance gain of a replacement design is gradually reduced as the conditions are violated. However, even if the conditions are violated, the proposed replacement can still provide some benefits as compared to the static policies.
Figure 8.4: Throughput as a function of cluster size for different replacement schemes in networks including mobility outage and with caching of multiple files per user ((a) S = 5 and M = 100; (b) S = 5 and M = 1000).

Figure 8.5: Throughput as a function of cluster size for MyDPP replacement with different average network velocities.

8.9 Conclusions

In this chapter, we investigated dynamic caching content replacement in BS-assisted wireless D2D caching networks as a response to the time-varying dynamics of networks, e.g., the time-varying popularity distribution and the mobility of users. Our goal is to refresh the cached content of users such that it matches the demand of the network. We proposed a network architecture for caching content replacement that exploits the broadcasting nature of the BS and uses a queueing system to track the historical record. We formulated the replacement problem as a sequential decision-making problem that maximizes the service rate subject to the cost constraint and queue stability. By combining the concept of rewards-to-go and the drift-plus-penalty methodology, a solution framework was proposed. Two algorithms that approximate the solution were proposed: the first algorithm uses only the historical record, whereas the second uses both the historical record and near-future information.
We showed, both analytically and empirically, that our proposed designs can significantly improve the performance while still satisfying the constraints. We also observed that dynamic caching content replacement is necessary to realize the potential performance gain of D2D caching when dynamics exist.

CHAPTER 9
Concluding Remarks and Prospective Directions

In this dissertation, I investigated various topics in cache-aided wireless D2D networks. In Ch. 3 and Ch. 4, results showed from a theoretical point of view that cache-aided wireless D2D can significantly improve video distribution for mobile users, as its scaling laws based on the popularity distributions of mobile users outperform conventional unicasting and some other competing technologies. Results in Ch. 5 indicated that an effective tradeoff between throughput and energy efficiency is of paramount importance for networks. Investigations in Ch. 6 and Ch. 7 demonstrated the efficacy of incorporating knowledge of individual user preferences when designing the network. As a byproduct, their results also supported the conclusions in Ch. 5 that tradeoffs exist among different performance metrics, and that an appropriate design can provide benefits by optimizing such tradeoffs. Finally, results in Ch. 8 indicated that proactive dynamic cache replacement can appropriately accommodate the network dynamics of cache-aided wireless networks, which improves the network performance in situations where dynamics cause problems for static policies. Despite all the work that has been done in the area of cache-aided wireless D2D networks, including this dissertation, there are still many topics left for exploration. These can be categorized into information-theoretic aspects and network design aspects.
With respect to information-theoretic topics, current scaling laws for cache-aided wireless single-hop D2D are based on a clustering structure and a static frequency reuse scheme. However, dynamic link scheduling and power control are known to be powerful approaches in wireless networks and can significantly improve them. Therefore, understanding whether sophisticated dynamic link scheduling and power control can further improve the scaling laws of cache-aided wireless single-hop D2D is of interest. In addition, channels with fading effects have not been investigated much. Although there are some results for conventional wireless D2D networks, those results have not yet been extended to cache-aided wireless D2D networks. Such an extension is very important because fading effects are common in wireless scenarios and are notorious for causing performance degradation. Finally, it is interesting to see how and whether MIMO technologies can improve the network. This question has partly been answered by papers investigating hierarchical cooperation, which exploits the distributed MIMO concept to facilitate multi-hop delivery. However, those papers were based on channels without fading. As a result, their approach tended to pursue spatial multiplexing of data streams from different users via cooperation. Accordingly, whether fading has an influence and whether diversity needs to be considered might require further investigation. Concerning network design aspects, there are four important topics to investigate. First, joint content caching and delivery design is very important, and there is a lack of comprehensive investigation of this topic. As discussed in Ch. 2, current papers on joint design commonly make strong assumptions and/or restrict their investigations to some specific part of the delivery mechanism; therefore, how to obtain an effective joint design in cache-aided wireless D2D networks is still an open problem.
Second, individual preference aware design is still at a basic stage. Specifically, most existing papers (including our study) were based on centralized caching policies. However, in practice, a decentralized caching policy is more implementable and scalable. Therefore, investigating decentralized individual preference aware caching policies is an important future direction. Third, caching policy design considering incentive mechanisms has not been well studied. Since D2D networks naturally involve sharing among users, having users cooperate and participate in D2D caching and delivery is a necessary premise. Consequently, a caching policy needs to consider the incentive mechanism and its resulting impact. However, current papers largely lack this consideration. Finally, observing the results of Ch. 8, we understand that effective cache replacement is critical when network dynamics exist. However, the design of cache replacement specifically for BS-assisted cache-aided wireless D2D networks has not drawn much attention. To the best of our knowledge, our work is the first to investigate this direction, but how to efficiently and effectively exploit future information in the decision-making process still needs further investigation. It should be noted that, as opposed to conventional reactive replacement, which focuses on file eviction when new files arrive at users, proactive cache replacement, which conducts cache replacement according to the knowledge/prediction of future trends, is novel and promising for this type of investigation.

Bibliography

[1] Cisco visual networking index: Global mobile data traffic forecast update, 2015–2020. Technical report, Cisco, San Jose, CA, USA.
[2] Cisco visual networking index: Global mobile data traffic forecast update, 2016–2021. Technical report, Cisco, San Jose, CA, USA.
[3] Cisco visual networking index: Global mobile data traffic forecast update, 2017–2022. Technical report, Cisco, San Jose, CA, USA.
[4] N.
Golrezaei, A. F. Molisch, A. G. Dimakis, and G. Caire. Femtocaching and device-to-device collaboration: A new architecture for wireless video distribution. IEEE Commun. Mag., 51(4):142–149, April 2013.
[5] J. G. Andrews, S. Buzzi, W. Choi, et al. What will 5G be? IEEE J. Sel. Areas Commun., 32(6):1065–1082, 2014.
[6] B. Meixner. Hypervideos and interactive multimedia presentations. ACM Computing Surveys (CSUR), 50(1):1–34, 2017.
[7] J. Liu, S. G. Rao, B. Li, and H. Zhang. Opportunities and challenges of peer-to-peer internet video broadcast. Proceedings of the IEEE, 96(1):11–24, 2007.
[8] Y. Liu, Y. Guo, and C. Liang. A survey on peer-to-peer video streaming systems. Peer-to-Peer Networking and Applications, 1(1):18–28, 2008.
[9] K. Suh, C. Diot, J. Kurose, L. Massoulie, C. Neumann, D. Towsley, and M. Varvello. Push-to-peer video-on-demand system: Design and evaluation. IEEE J. Sel. Areas Commun., 25(9):1706–1716, 2007.
[10] A. F. Molisch, G. Caire, D. Ott, J. R. Foerster, D. Bethanabhotla, and M. Ji. Caching eliminates the wireless bottleneck in video aware wireless networks. Adv. Elect. Eng., 2014(261390), November 2014.
[11] J. Wang. A survey of web caching schemes for the internet. ACM SIGCOMM Computer Communication Review, 29(5):36–46, 1999.
[12] A. Passarella. A survey on content-centric technologies for the current internet: CDN and P2P solutions. Computer Communications, 35(1):1–32, 2012.
[13] P. Wendell and M. J. Freedman. Going viral: Flash crowds in an open CDN. In Proceedings of the 2011 ACM SIGCOMM Internet Measurement Conference, pages 549–558, 2011.
[14] G. Xylomenos, C. N. Ververidis, V. A. Siris, N. Fotiou, C. Tsilopoulos, X. Vasilakos, K. V. Katsaros, and G. C. Polyzos. A survey of information-centric networking research. IEEE Communications Surveys & Tutorials, 16(2):1024–1049, 2013.
[15] G. Zhang, Y. Li, and T. Lin. Caching in information centric networking: A survey.
Computer Networks, 57(16):3128–3141, 2013.
[16] G. Tyson, N. Sastry, R. Cuevas, I. Rimac, and A. Mauthe. A survey of mobility in information-centric networks. Communications of the ACM, 56(12):90–98, December 2013.
[17] E. Cohen and S. Shenker. Replication strategies in unstructured peer-to-peer networks. ACM SIGCOMM Computer Communication Review, 32(4):177–190, 2002.
[18] Q. Lv, P. Cao, E. Cohen, K. Li, and S. Shenker. Search and replication in unstructured peer-to-peer networks. In Proceedings of the 16th International Conference on Supercomputing, pages 84–95, 2002.
[19] S. Androutsellis-Theotokis and D. Spinellis. A survey of peer-to-peer content distribution technologies. ACM Computing Surveys (CSUR), 36(4):335–371, 2004.
[20] S. Iyer, A. Rowstron, and P. Druschel. Squirrel: A decentralized peer-to-peer web cache. In Proceedings of the Twenty-First Annual Symposium on Principles of Distributed Computing, pages 213–222, 2002.
[21] S. Ghandeharizadeh, B. Krishnamachari, and S. Song. Placement of continuous media in wireless peer-to-peer networks. IEEE Trans. Multimedia, 6(2):335–342, 2004.
[22] J. Zhao, P. Zhang, G. Cao, and C. R. Das. Cooperative caching in wireless P2P networks: Design, implementation, and evaluation. IEEE Trans. Parallel and Distributed Systems, 21(2):229–241, 2010.
[23] G. Cao, L. Yin, and C. R. Das. Cooperative cache-based data access in ad hoc networks. IEEE Computer, 37(2):32–39, February 2004.
[24] L. Yin and G. Cao. Supporting cooperative caching in ad hoc networks. IEEE Trans. Mobile Computing, 5(1):77–89, 2005.
[25] T. Hara and S. K. Madria. Data replication for improving data accessibility in ad hoc networks. IEEE Trans. Mobile Computing, 5(11):1515–1532, 2006.
[26] C.-Y. Chow, H. V. Leong, and A. T. S. Chan. GroCoca: Group-based peer-to-peer cooperative caching in mobile environment. IEEE J. Sel. Areas Commun., 25(1):179–191, 2007.
[27] S. Ghandeharizadeh and S. Shayandeh.
Greedy cache management techniques for mobile devices. In 2007 IEEE 23rd International Conference on Data Engineering Workshop, pages 39–48. IEEE, 2007.
[28] K. Papagiannaki, M. Yarvis, and W. S. Conner. Experimental characterization of home wireless networks and design implications. In IEEE INFOCOM 2006, pages 1–13, 2006.
[29] S. Ghandeharizadeh and S. Shayandeh. Cooperative caching techniques for continuous media in wireless home networks. In Proceedings of the 1st International Conference on Ambient Media and Systems, pages 1–8, 2008.
[30] S. Ghandeharizadeh and S. Shayandeh. A comparison of block-based and clip-based cooperative caching techniques for streaming media in wireless home networks. In International Conference on Wireless Algorithms, Systems, and Applications, pages 43–52. Springer, 2009.
[31] S. Ghandeharizadeh and S. Shayandeh. Domical cooperative caching for streaming media in wireless home networks. ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), 7(4):1–17, 2011.
[32] J. Zhao and G. Cao. VADD: Vehicle-assisted data delivery in vehicular ad hoc networks. IEEE Trans. Veh. Technol., 57(3):1910–1922, 2008.
[33] S. Al-Sultan, M. M. Al-Doori, A. H. Al-Bayatti, and H. Zedan. A comprehensive survey on vehicular ad hoc network. Journal of Network and Computer Applications, 37:380–392, 2014.
[34] R. Soua, E. Kalogeiton, G. Manzo, J. M. Duarte, M. R. Palattella, A. Di Maio, T. Braun, T. Engel, L. A. Villas, and G. A. Rizzo. SDN coordination for CCN and FC content dissemination in VANETs. In Ad Hoc Networks, pages 221–233. Springer, 2017.
[35] L. Li, G. Zhao, and R. S. Blum. A survey of caching techniques in cellular networks: Research issues and challenges in content placement and delivery strategies. IEEE Communications Surveys & Tutorials, 20(3):1710–1732, 2018.
[36] I. Ahmed, M. H. Ismail, and M. S. Hassan.
Video transmission using device-to-device communications: A survey. IEEE Access, 7:131019–131038, 2019.
[37] C. Yang, Y. Yao, Z. Chen, and B. Xia. Analysis on cache-enabled wireless heterogeneous networks. IEEE Trans. Wireless Commun., 15(1):131–145, January 2016.
[38] N. Golrezaei, A. G. Dimakis, A. F. Molisch, and G. Caire. Wireless video content delivery through distributed caching and peer-to-peer gossiping. November 2011.
[39] K. Doppler, M. Rinne, C. Wijting, C. B. Ribeiro, and K. Hugl. Device-to-device communication as an underlay to LTE-Advanced networks. IEEE Commun. Mag., 47(12):42–49, December 2009.
[40] M. A. Maddah-Ali and U. Niesen. Fundamental limits of caching. IEEE Trans. Inf. Theory, 60(5):2856–2867, May 2014.
[41] M. Ji, G. Caire, and A. F. Molisch. Fundamental limits of caching in wireless D2D networks. IEEE Trans. Inf. Theory, 62(2):849–869, February 2016.
[42] M. Ji, G. Caire, and A. F. Molisch. Wireless device-to-device caching networks: Basic principles and system performance. IEEE J. Sel. Areas Commun., 34(1):176–189, January 2016.
[43] M. Mehrabi, D. You, V. Latzko, H. Salah, M. Reisslein, and F. H. P. Fitzek. Device-enhanced MEC: Multi-access edge computing (MEC) aided by end device computation and caching: A survey. IEEE Access, 7:166079–166108, 2019.
[44] D. Prerna, R. Tekchandani, and N. Kumar. Device-to-device content caching techniques in 5G: A taxonomy, solutions, and challenges. Computer Communications, 2020.
[45] N. Golrezaei, A. G. Dimakis, and A. F. Molisch. Scaling behavior for device-to-device communications with distributed caching. IEEE Trans. Inf. Theory, 60(7):4286–4298, July 2014.
[46] M. Ji, G. Caire, and A. F. Molisch. The throughput-outage tradeoff of wireless one-hop caching networks. IEEE Trans. Inf. Theory, 61(12):6833–6859, December 2015.
[47] S.-W. Jeon, S.-N. Hong, M. Ji, G. Caire, and A. F. Molisch. Wireless multihop device-to-device caching networks. IEEE Trans. Inf. Theory, 63(3):1662–1676, 2017.
[48] M.-C. Lee, M. Ji, A. F. Molisch, and N. Sastry. Throughput-outage analysis and evaluation of cache-aided D2D networks with measured popularity distributions. IEEE Trans. Wireless Commun., 18(11):5316–5332, November 2019.
[49] X. Lin, J. G. Andrews, A. Ghosh, and R. Ratasuk. An overview of 3GPP device-to-device proximity services. IEEE Commun. Mag., 52(4):40–48, 2014.
[50] B. Tang, H. Gupta, and S. R. Das. Benefit-based data caching in ad hoc networks. IEEE Trans. Mobile Comput., 7(3):289–304, 2008.
[51] E. Yaacoub and O. Kubbar. Energy-efficient device-to-device communications in LTE public safety networks. In 2012 IEEE Globecom Workshops, pages 391–395. IEEE, 2012.
[52] E. Yaacoub. On the use of device-to-device communications for QoS and data rate enhancement in LTE public safety networks. In 2014 IEEE Wireless Communications and Networking Conference Workshops (WCNCW), pages 236–241. IEEE, 2014.
[53] N. Saxena, M. Agiwal, H. Ahmad, and A. Roy. D2D-based survival on sharing: For enhanced disaster time connectivity. IEEE Technology and Society Mag., 37(3):64–73, 2018.
[54] H. Nishiyama, M. Ito, and N. Kato. Relay-by-smartphone: Realizing multihop device-to-device communications. IEEE Commun. Mag., 52(4):56–65, 2014.
[55] L. Wang, H. Wu, Z. Han, P. Zhang, and H. V. Poor. Multi-hop cooperative caching in social IoT using matching theory. IEEE Trans. Wireless Commun., 17(4):2127–2145, 2017.
[56] G. Cao. Proactive power-aware cache management for mobile computing systems. IEEE Trans. Computers, 51(6):608–621, 2002.
[57] P. Mach, Z. Becvar, and T. Vanek. In-band device-to-device communication in OFDMA cellular networks: A survey and challenges. IEEE Communications Surveys & Tutorials, 17(4):1885–1922, 2015.
[58] J. Gao and P. Steenkiste. Design and evaluation of a distributed scalable content discovery system. IEEE J. Sel. Areas Commun., 22(1):54–66, 2004.
[59] J. Risson and T. Moors.
Survey of research towards robust peer-to-peer networks: Search methods. Computer Networks, 50(17):3485–3521, 2006.
[60] S. Ghandeharizadeh, S. Irani, J. Lam, and J. Yap. CAMP: A cost adaptive multi-queue eviction policy for key-value stores. In Proceedings of the 15th International Middleware Conference, pages 289–300, 2014.
[61] F. Figueiredo, J. M. Almeida, M. A. Gonçalves, and F. Benevenuto. TrendLearner: Early prediction of popularity trends of user generated content. Information Sciences, 349:172–187, 2016.
[62] C. Li, J. Liu, and S. Ouyang. Characterizing and predicting the popularity of online videos. IEEE Access, 4:1630–1641, 2016.
[63] D. Liu, B. Chen, C. Yang, and A. F. Molisch. Caching at the wireless edge: Design aspects, challenges, and future directions. IEEE Commun. Mag., 54(9):22–28, 2016.
[64] J. Xing, Y. Cui, and V. Lau. Temporal-spatial request aggregation for cache-enabled wireless multicasting networks. In IEEE Global Communications Conference, pages 1–7. IEEE, 2017.
[65] M. Gregori, J. Matamoros, J. Gomez-Vilardebo, and D. Gunduz. Wireless content caching for small cell and D2D networks. IEEE J. Sel. Areas Commun., 34(5):1222–1234, May 2016.
[66] D. Malak, M. Al-Shalash, and J. G. Andrews. Optimizing content caching to maximize the density of successful receptions in device-to-device networking. IEEE Trans. Commun., 64(10):4365–4380, October 2016.
[67] Y. Wang, X. Tao, X. Zhang, and Y. Gu. Cooperative caching placement in cache-enabled D2D underlaid cellular network. IEEE Commun. Lett., 21(5):1151–1154, May 2017.
[68] B. Chen, C. Yang, and A. F. Molisch. Cache-enabled device-to-device communications: Offloading gain and energy cost. IEEE Trans. Wireless Commun., 16(7):4519–4536, July 2017.
[69] L. Zhang, M. Xiao, G. Wu, and S. Li. Efficient scheduling and power allocation for D2D-assisted wireless caching networks. IEEE Trans. Commun., 64(6):2438–2452, June 2016.
[70] B. Chen, C. Yang, and G. Wang.
High-throughput opportunistic cooperative device-to-device communications with caching. IEEE Trans. Veh. Technol., 66(8):7527–7539, August 2017.
[71] N. Golrezaei, P. Mansourifard, A. F. Molisch, and A. G. Dimakis. Base-station assisted device-to-device communications for high-throughput wireless video networks. IEEE Trans. Wireless Commun., 13(7):3665–3676, July 2014.
[72] N. Golrezaei, A. G. Dimakis, and A. F. Molisch. Scaling behavior for device-to-device communications with distributed caching. IEEE Trans. Inf. Theory, 60(7):4286–4298, July 2014.
[73] J. Rao, H. Feng, C. Yang, Z. Chen, and B. Xia. Optimal caching placement for D2D assisted wireless caching networks. May 2016.
[74] Z. Chen, N. Pappas, and M. Kountouris. Probabilistic caching in wireless D2D networks: Cache hit optimal vs. throughput optimal. IEEE Commun. Lett., 21(3):584–587, March 2017.
[75] L. Breslau, P. Cao, L. Fan, G. Phillips, and S. Shenker. Web caching and Zipf-like distributions: Evidence and implications. In IEEE INFOCOM '99, volume 1, pages 126–134. IEEE, 1999.
[76] M. Hefeeda and O. Saleh. Traffic modeling and proportional partial caching for peer-to-peer systems. IEEE/ACM Trans. Netw., 16(6):1447–1460, December 2008.
[77] M.-C. Lee, M. Ji, A. F. Molisch, and N. Sastry. Performance of caching-based D2D video distribution with measured popularity distributions. December 2019.
[78] D. Karamshuk, N. Sastry, M. Al-Bassam, A. Secker, and J. Chandaria. Take-away TV: Recharging work commutes with predictive preloading of catch-up TV content. IEEE J. Sel. Areas Commun., 34(8):2091–2101, August 2016.
[79] B. Chen and C. Yang. Caching policy optimization for D2D communications by learning user preference. June 2017.
[80] M.-C. Lee, A. F. Molisch, N. Sastry, and A. Raman. Individual preference probability modeling for video content in wireless caching networks. pages 1–7, December 2017.
[81] M.-C. Lee, A. F. Molisch, N. Sastry, and A. Raman.
Individual preference probability modeling and parameterization for video content in wireless caching networks. IEEE/ACM Transactions on Networking, 27(2):676–690, April 2019.
[82] D. Liu and C. Yang. Caching at base stations with heterogeneous user demands and spatial locality. IEEE Trans. Commun., 67(2):1554–1569, February 2019.
[83] Y. Zhang, Y. Xu, Q. Wu, X. Liu, K. Yao, and A. Anpalagan. A game-theoretic approach for optimal distributed cooperative hybrid caching in D2D networks. IEEE Wireless Commun. Lett., 7(3):324–327, 2017.
[84] T. Zhang, H. Fan, J. Loo, and D. Liu. User preference aware caching deployment for device-to-device caching networks. IEEE Systems J., 62(6):4629–4652, December 2017.
[85] Y. Li, C. Zhong, M. C. Gursoy, and S. Velipasalar. Learning-based delay-aware caching in wireless D2D caching networks. IEEE Access, 6:77250–77264, November 2018.
[86] Y. Guo, L. Duan, and R. Zhang. Cooperative local caching under heterogeneous file preferences. IEEE Trans. Commun., 65(1):444–457, January 2017.
[87] Y. Pan, C. Pan, H. Zhu, et al. On consideration of content preference and sharing willingness in D2D assisted offloading. IEEE J. Sel. Areas Commun., 35(4):978–992, April 2017.
[88] S. Gitzenis, G. S. Paschos, and L. Tassiulas. Asymptotic laws for joint content replication and delivery in wireless networks. IEEE Trans. Inf. Theory, 59(5):2760–2776, 2012.
[89] K. Shanmugam, N. Golrezaei, A. F. Molisch, A. G. Dimakis, and G. Caire. FemtoCaching: Wireless content delivery through distributed caching helpers. IEEE Trans. Inf. Theory, 59(12):8402–8413, December 2013.
[90] S. Salamatian, A. Beirami, A. Cohen, and M. Médard. Centralized vs decentralized multi-agent guesswork. In 2017 IEEE International Symposium on Information Theory (ISIT), pages 2258–2262. IEEE, 2017.
[91] X. Wang, M. Chen, T. Taleb, A. Ksentini, and V. C. M. Leung. Cache in the air: Exploiting content caching and delivery techniques for 5G systems.
IEEE Communications Mag., 52(2):131–139, 2014.
[92] A. Asadi, Q. Wang, and V. Mancuso. A survey on device-to-device communication in cellular networks. IEEE Communications Surveys & Tutorials, 16(4):1801–1819, 2014.
[93] O. A. Amodu, M. Othman, N. K. Noordin, and I. Ahmad. A primer on design aspects, recent advances, and challenges in cellular device-to-device communication. Ad Hoc Networks, 94:101938, 2019.
[94] C. Lee, E. Hwang, and D. Pyeon. A popularity-aware prefetching scheme to support interactive p2p streaming. IEEE Trans. Consumer Electronics, 58(2):382–388, 2012.
[95] S.-H. Lim, Y.-B. Ko, G.-H. Jung, J. Kim, and M.-W. Jang. Inter-chunk popularity-based edge-first caching in content-centric networking. IEEE Commun. Lett., 18(8):1331–1334, 2014.
[96] H. S. Goian, O. Y. Al-Jarrah, S. Muhaidat, Y. Al-Hammadi, P. Yoo, and M. Dianati. Popularity-based video caching techniques for cache-enabled networks: a survey. IEEE Access, 7:27699–27719, 2019.
[97] J. Yu, C. T. Chou, X. Du, and T. Wang. Internal popularity of streaming video and its implication on caching. In 20th International Conference on Advanced Information Networking and Applications (AINA'06), volume 1, 6 pp. IEEE, 2006.
[98] S. Tewari and L. Kleinrock. On fairness, optimal download performance and proportional replication in peer-to-peer networks. In International Conference on Research in Networking, pages 709–717. Springer, 2005.
[99] M.-C. Lee and A. F. Molisch. Caching policy and cooperation distance design for base station assisted wireless d2d caching networks: Throughput and energy efficiency optimization and trade-off. IEEE Transactions on Wireless Communications, 17(11):7500–7514, November 2018.
[100] Y. Cao, M. Tao, F. Xu, and K. Liu. Fundamental storage-latency tradeoff in cache-aided mimo interference networks. IEEE Trans. Wireless Commun., 16(8):5061–5076, August 2017.
[101] P. Gupta and P. R. Kumar. The capacity of wireless networks.
IEEE Trans. Inf. Theory, 46(2):388–404, 2000.
[102] A. Agarwal and P. R. Kumar. Capacity bounds for ad hoc and hybrid wireless networks. ACM SIGCOMM Computer Communication Review, 34(3):71–81, 2004.
[103] M. Franceschetti, O. Dousse, D. N. Tse, and P. Thiran. Closing the gap in the capacity of wireless networks via percolation theory. IEEE Trans. Inf. Theory, 53(3):1009–1018, 2007.
[104] F. Xue and P. R. Kumar. Scaling laws for ad hoc wireless networks: an information theoretic approach. Foundations and Trends in Networking, 1(2):145–270, 2006.
[105] S. Shakkottai, X. Liu, and R. Srikant. The multicast capacity of large multihop wireless networks. IEEE/ACM Trans. Networking, 18(6):1691–1700, 2010.
[106] U. Niesen, P. Gupta, and D. Shah. The balanced unicast and multicast capacity regions of large wireless networks. IEEE Trans. Inf. Theory, 56(5):2249–2271, 2010.
[107] A. El Gamal, J. Mammen, B. Prabhakar, and D. Shah. Throughput-delay trade-off in wireless networks. In IEEE INFOCOM 2004, volume 1. IEEE, 2004.
[108] A. El Gamal, J. Mammen, B. Prabhakar, and D. Shah. Optimal throughput-delay scaling in wireless networks-part i: The fluid model. IEEE Trans. Inf. Theory, 52(6):2568–2592, 2006.
[109] A. El Gamal, J. Mammen, B. Prabhakar, and D. Shah. Optimal throughput-delay scaling in wireless networks-part ii: Constant-size packets. IEEE Trans. Inf. Theory, 52(11):5111–5116, 2006.
[110] A. Ozgur, O. Lévêque, and D. N. Tse. Hierarchical cooperation achieves optimal capacity scaling in ad hoc networks. IEEE Trans. Inf. Theory, 53(10):3549–3572, 2007.
[111] T. Hara. Effective replica allocation in ad hoc networks for improving data accessibility. In IEEE INFOCOM 2001, volume 3, pages 1568–1576. IEEE, 2001.
[112] L. Qiu and G. Cao. Popularity-aware caching increases the capacity of wireless networks. IEEE Trans. Mobile Comput., 19(1):173–187, 2019.
[113] M.-C. Lee, M. Ji, and A. F. Molisch.
Throughput–outage analysis of cache-aided wireless multi-hop d2d networks. IEEE Trans. Commun., 2020. (in preparation).
[114] Z. H. Awan and A. Sezgin. Fundamental limits of caching in d2d networks with secure delivery. In 2015 IEEE International Conference on Communication Workshop (ICCW), pages 464–469. IEEE, 2015.
[115] N. Naderializadeh, M. A. Maddah-Ali, and A. S. Avestimehr. Fundamental limits of cache-aided interference management. IEEE Transactions on Information Theory, 63(5):3092–3107, 2017.
[116] A. M. Ibrahim, A. A. Zewail, and A. Yener. Device-to-device coded caching with heterogeneous cache sizes. In 2018 IEEE International Conference on Communications (ICC), pages 1–6. IEEE, 2018.
[117] Ç. Yapar, K. Wan, R. F. Schaefer, and G. Caire. On the optimality of d2d coded caching with uncoded cache placement and one-shot delivery. IEEE Trans. Commun., 67(12):8179–8192, 2019.
[118] J. Guo, J. Yuan, and J. Zhang. An achievable throughput scaling law of wireless device-to-device caching networks with distributed mimo and hierarchical cooperations. IEEE Trans. Wireless Commun., 17(1):492–505, 2017.
[119] S.-N. Hong and G. Caire. Beyond scaling laws: On the rate performance of dense device-to-device wireless networks. IEEE Trans. Inf. Theory, 61(9):4735–4750, 2015.
[120] A. Liu, V. K. N. Lau, and G. Caire. Cache-induced hierarchical cooperation in wireless device-to-device caching networks. IEEE Trans. Inf. Theory, 64(6):4629–4652, 2018.
[121] M. Mahdian and E. M. Yeh. Throughput and delay scaling of content-centric ad hoc and heterogeneous wireless networks. IEEE/ACM Trans. Networking, 25(5):3030–3043, 2017.
[122] S. Ghandeharizadeh, T. Helmi, T. Jung, S. Kapadia, and S. Shayandeh. An evaluation of two policies for placement of continuous media in multi-hop wireless networks. In Proceedings of the Twelfth International Conference on Distributed Multimedia Systems, pages 1–13, 2006.
[123] N. Golrezaei, M. Ji, A. F. Molisch, A. G.
Dimakis, and G. Caire. Device-to-device communications for wireless video delivery. In 2012 Conference Record of the Forty Sixth Asilomar Conference on Signals, Systems and Computers (ASILOMAR), pages 930–933. IEEE, 2012.
[124] Y. Li, M. C. Gursoy, and S. Velipasalar. A delay-aware caching algorithm for wireless d2d caching networks. In 2017 IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS), pages 456–461. IEEE, 2017.
[125] P. Lin, Q. Song, Y. Yu, and A. Jamalipour. Extensive cooperative caching in d2d integrated cellular networks. IEEE Commun. Lett., 21(9):2101–2104, 2017.
[126] R. Amer, M. M. Butt, M. Bennis, and N. Marchetti. Inter-cluster cooperation for wireless d2d caching networks. IEEE Trans. Wireless Commun., 17(9):6108–6121, 2018.
[127] B. Blaszczyszyn and A. Giovanidis. Optimal geographic caching in cellular networks. June 2015.
[128] R. Wang, R. Li, E. Liu, and P. Wang. Performance analysis and optimization of caching placement in heterogeneous wireless networks. IEEE Commun. Lett., 23(10):1883–1887, 2019.
[129] S. Soleimani and X. Tao. Cooperative crossing cache placement in cache-enabled device to device-aided cellular networks. Applied Sciences, 8(9):1578, 2018.
[130] Y. Zhang, Y. Xu, Q. Wu, X. Liu, K. Yao, and A. Anpalagan. A game-theoretic approach for optimal distributed cooperative hybrid caching in d2d networks. IEEE Wireless Commun. Lett., 7(3):324–327, 2017.
[131] L. E. Chatzieleftheriou, M. Karaliopoulos, and I. Koutsopoulos. Caching-aware recommendations: Nudging user preferences towards better caching performance. In IEEE INFOCOM 2017 - IEEE Conference on Computer Communications, pages 1–9. IEEE, 2017.
[132] D. Malak, M. Al-Shalash, and J. G. Andrews. Modeling and performance analysis of full-duplex communications in cache-enabled d2d networks. May 2018.
[133] J. Guo, J. Yuan, and J. Zhang.
An achievable throughput scaling law of wireless device-to-device caching networks with distributed mimo and hierarchical cooperations. IEEE Trans. Wireless Commun., 17(1):492–505, January 2018.
[134] A. Liu, V. K. N. Lau, and G. Caire. Cache-induced hierarchical cooperation in wireless device-to-device caching networks. IEEE Trans. Inf. Theory, 64(6):4629–4652, June 2018.
[135] C. Jarray and A. Giovanidis. The effects of mobility on the hit performance of cached d2d networks. In 2016 14th International Symposium on Modeling and Optimization in Mobile, Ad Hoc, and Wireless Networks (WiOpt), pages 1–8. IEEE, 2016.
[136] M. Chen, Y. Hao, M. Qiu, J. Song, D. Wu, and I. Humar. Mobility-aware caching and computation offloading in 5g ultra-dense cellular networks. Sensors, 16(7):974, 2016.
[137] R. Wang, J. Zhang, S. H. Song, and K. B. Letaief. Mobility-aware caching in d2d networks. IEEE Trans. Wireless Commun., 16(8):5001–5015, August 2017.
[138] N. Giatsoglou, K. Ntontin, E. Kartsakli, A. Antonopoulos, and C. Verikoukis. D2d-aware device caching in mmwave-cellular networks. IEEE J. Sel. Areas Commun., 35(9):2025–2037, 2017.
[139] R. Amer, H. ElSawy, J. Kibilda, M. M. Butt, and N. Marchetti. Cooperative transmission and probabilistic caching for clustered d2d networks. In 2019 IEEE Wireless Communications and Networking Conference (WCNC), pages 1–6. IEEE, 2019.
[140] X. Li, X. Wang, P.-J. Wan, Z. Han, and V. C. M. Leung. Hierarchical edge caching in device-to-device aided mobile networks: Modeling, optimization, and design. IEEE J. Sel. Areas Commun., 36(8):1768–1785, 2018.
[141] K. N. Doan, T. Van Nguyen, H. Shin, and T. Q. S. Quek. Socially-aware caching in wireless networks with random d2d communications. IEEE Access, 7:58394–58406, 2019.
[142] J. Kim, G. Caire, and A. F. Molisch. Quality-aware streaming and scheduling for device-to-device video delivery. IEEE/ACM Transactions on Networking, 24(4):2319–2331, 2015.
[143] N. Naderializadeh, D. T. H.
Kao, and A. S. Avestimehr. How to utilize caching to improve spectral efficiency in device-to-device wireless networks. In 2014 52nd Annual Allerton Conference on Communication, Control, and Computing (Allerton), pages 415–422. IEEE, 2014.
[144] J. Liu, S. Guo, S. Xiao, M. Pan, X. Zhou, G. Y. Li, G. Wu, and S. Li. Resource allocation for cooperative d2d-enabled wireless caching networks. In 2018 IEEE Global Communications Conference (GLOBECOM), pages 1–7. IEEE, 2018.
[145] M. Choi, A. No, M. Ji, and J. Kim. Markov decision policies for dynamic video delivery in wireless caching networks. IEEE Trans. Wireless Commun., 18(12):5705–5718, 2019.
[146] M. Choi, A. F. Molisch, and J. Kim. Joint distributed link scheduling and power allocation for content delivery in wireless caching networks. arXiv preprint arXiv:1911.13010, 2019.
[147] M. Chen, S. Ullah, L. Wang, J. Chen, X. Wei, K.-I. Kim, and J. Xu. Analysis and scheduling in a 5g heterogeneous content delivery network. IEEE Access, 6:44803–44814, 2018.
[148] J. Chuan, L. Wang, and J. Wu. Belief propagation based distributed content delivery scheme in caching-enabled d2d networks. In ICC 2019 - 2019 IEEE International Conference on Communications (ICC), pages 1–5. IEEE, 2019.
[149] Y. Niu, Y. Liu, Y. Li, Z. Zhong, B. Ai, and P. Hui. Mobility-aware caching scheduling for fog computing in mmwave band. IEEE Access, 6:69358–69370, 2018.
[150] B. Lv, L. Huang, and R. Wang. Joint downlink scheduling for file placement and delivery in cache-assisted wireless networks with finite file lifetime. IEEE Trans. Commun., 67(6):4177–4192, 2019.
[151] B. Chen, C. Yang, and Z. Xiong. Optimal caching and scheduling for cache-enabled d2d communications. IEEE Commun. Lett., 21(5):1155–1158, 2017.
[152] Y. Cai and A. F. Molisch. On the multi-activation oriented design of d2d-aided caching networks. December 2019.
[153] M. Choi, J. Kim, and J. Moon.
Wireless video caching and dynamic streaming under differentiated quality requirements. IEEE J. Sel. Areas Commun., 36(6):1245–1257, June 2018.
[154] B. Gabr, B. Soret, P. Popovski, S. Hosny, and M. Nafie. Social-aware content delivery in low latency d2d caching networks. In 2019 IEEE Globecom Workshops (GC Wkshps), pages 1–6. IEEE, 2019.
[155] A. Balamash and M. Krunz. An overview of web caching replacement algorithms. IEEE Communications Surveys & Tutorials, 6(2):44–56, 2004.
[156] J. T. Robinson and M. V. Devarakonda. Data cache management using frequency-based replacement. In Proceedings of the 1990 ACM SIGMETRICS Conference on Measurement and Modeling of Computer Systems, pages 134–142, 1990.
[157] N. Young. On-line caching as cache size varies. In Proceedings of the Second Annual ACM-SIAM Symposium on Discrete Algorithms, pages 241–250. Society for Industrial and Applied Mathematics, 1991.
[158] E. J. O'Neil, P. E. O'Neil, and G. Weikum. The lru-k page replacement algorithm for database disk buffering. ACM SIGMOD Record, 22(2):297–306, 1993.
[159] P. T. Joy and K. P. Jacob. A comparative study of cache replacement policies in wireless mobile networks. Advances in Comput. and Information Technol., pages 609–619, 2012.
[160] Z. Chen, H. Mohammed, and W. Chen. Proactive caching for energy-efficiency in wireless networks: A markov decision process approach. pages 1–6, May 2018.
[161] V. Gopalakrishnan, B. Silaghi, B. Bhattacharjee, and P. Keleher. Adaptive replication in peer-to-peer systems. In 24th International Conference on Distributed Computing Systems, 2004. Proceedings., pages 360–369. IEEE, 2004.
[162] S. Jin and L. Wang. Content and service replication strategies in multi-hop wireless mesh networks. In Proceedings of the 8th ACM International Symposium on Modeling, Analysis and Simulation of Wireless and Mobile Systems, pages 79–86, 2005.
[163] N. Carlsson and D. Eager.
Ephemeral content popularity at the edge and implications for on-demand caching. IEEE Trans. Parallel and Distributed Systems, 28(6):1621–1634, June 2017.
[164] J. Gu, W. Wang, A. Huang, H. Shan, and Z. Zhang. Distributed cache replacement for caching-enable base stations in cellular networks. pages 2648–2653, May 2014.
[165] N. Abedini and S. Shakkottai. Content caching and scheduling in wireless networks with elastic and inelastic traffic. IEEE/ACM Trans. Netw., 22(3):864–874, June 2014.
[166] P. Blasco and D. Gunduz. Content-level selective offloading in heterogeneous networks: Multi-armed bandit optimization and regret bounds. arXiv:1407.6154v1, July 2014.
[167] S. Muller, O. Atan, M. van der Schaar, and A. Klein. Context-aware proactive content caching with service differentiation in wireless networks. IEEE Trans. Wireless Commun., 16(2):1024–1036, February 2017.
[168] A. Sadeghi, F. Sheikholeslami, and G. B. Giannakis. Optimal and scalable caching for 5g using reinforcement learning of space-time popularities. IEEE J. Sel. Topics Sig. Process., 12(1):180–190, February 2018.
[169] B. N. Bharath, K. G. Nagananda, D. Gunduz, and H. V. Poor. Caching with time-varying popularity profiles: A learning-theoretic perspective. IEEE Trans. Commun., 66(9):3837–3847, September 2018.
[170] D. Wu, L. Zhou, Y. Cai, and Y. Qian. Collaborative caching and matching for d2d content sharing. IEEE Wireless Commun. Mag., 25(3):43–49, July 2018.
[171] M. A. Maddah-Ali and U. Niesen. Decentralized coded caching attains order-optimal memory-rate tradeoff. IEEE/ACM Trans. Netw., 23(4):1029–1040, 2014.
[172] U. Niesen and M. A. Maddah-Ali. Coded caching with nonuniform demands. IEEE Trans. Inf. Theory, 63(2):1146–1158, February 2017.
[173] R. Pedarsani, M. A. Maddah-Ali, and U. Niesen. Online coded caching. IEEE/ACM Trans. Netw., 24(2):836–845, 2015.
[174] J. Pääkkönen, A. Barreal, C. Hollanti, and O. Tirkkonen.
Coded caching clusters with device-to-device communications. IEEE Trans. Mobile Comput., 18(2):264–275, 2018.
[175] D. Wang, Y. Lan, T. Zhao, Z. Yin, and X. Wang. On the design of computation offloading in cache-aided d2d multicast networks. IEEE Access, 6:63426–63441, 2018.
[176] Z. Chen, Y. Liu, B. Zhou, and M. Tao. Caching incentive design in wireless d2d networks: A stackelberg game approach. In 2016 IEEE International Conference on Communications (ICC), pages 1–6. IEEE, 2016.
[177] L. Shi, L. Zhao, G. Zheng, Z. Han, and Y. Ye. Incentive design for cache-enabled d2d underlaid cellular networks using stackelberg game. IEEE Trans. Veh. Technol., 68(1):765–779, 2018.
[178] R. Wang, J. Zhang, and K. B. Letaief. Incentive mechanism design for cache-assisted d2d communications: A mobility-aware approach. In 2017 IEEE 18th International Workshop on Signal Processing Advances in Wireless Communications (SPAWC), pages 1–5. IEEE, 2017.
[179] S. Wang, X. Zhang, L. Wang, J. Yang, and W. Wang. Joint design of device to device caching strategy and incentive scheme in mobile edge networks. IET Communications, 12(14):1728–1736, 2018.
[180] C. Yi, S. Huang, and J. Cai. An incentive mechanism integrating joint power, channel and link management for social-aware d2d content sharing and proactive caching. IEEE Trans. Mobile Comput., 17(4):789–802, 2017.
[181] M. Haus, M. Waqas, A. Y. Ding, Y. Li, S. Tarkoma, and J. Ott. Security and privacy in device-to-device (d2d) communication: A review. IEEE Communications Surveys & Tutorials, 19(2):1054–1079, 2017.
[182] A. A. Zewail and A. Yener. Fundamental limits of secure device-to-device coded caching. In 2016 50th Asilomar Conference on Signals, Systems and Computers, pages 1414–1418. IEEE, 2016.
[183] Y. He, F. R. Yu, N. Zhao, and H. Yin. Secure social networks in 5g systems with mobile edge computing, caching, and device-to-device communications. IEEE Wireless Commun. Mag., 25(3):103–109, 2018.
[184] M.-C.
Lee, A. F. Molisch, N. Sastry, and A. Raman. Code for generating files with realistic popularity distribution. Available at https://wides.usc.edu/research_matlab.html.
[185] X. Song, Y. Geng, X. Meng, J. Liu, W. Lei, and Y. Wen. Cache-enabled device to device networks with contention-based multimedia delivery. IEEE Access, 6:3228–3239, February 2017.
[186] D. Malak, M. Al-Shalash, and J. G. Andrews. Spatially correlated content caching for device-to-device communications. IEEE Trans. Wireless Commun., 17(1):56–70, January 2018.
[187] T. Deng, G. Ahani, P. Fan, and D. Yuan. Cost-optimal caching for d2d networks with user mobility: Modeling, analysis, and computational approaches. IEEE Trans. Wireless Commun., 17(5):3082–3094, May 2018.
[188] L. Pei, Z. Yang, C. Pan, W. Huang, and M. Chen. Joint bandwidth, caching and association optimization for d2d assisted wireless networks. April 2018.
[189] M. Zink, K. Suh, Y. Gu, and J. Kurose. Watch global, cache local: Youtube network traffic at a campus network - measurements and implications. 2008.
[190] X. Hu and A. Striegel. Redundancy elimination might be overrated: A quantitative study on wireless traffic. May 2017.
[191] G. Nencioni, N. Sastry, G. Tyson, et al. Score: Exploiting global broadcasts to create offline personal channels for on-demand access. IEEE/ACM Trans. Netw., 24(4):2429–2442, August 2016.
[192] K. P. Gummadi, R. J. Dunn, S. Saroiu, S. D. Gribble, H. M. Levy, and J. Zahorjan. Measurement, modeling, and analysis of a peer-to-peer file-sharing workload. 24(4):314–329, October 2003.
[193] A. F. Molisch. Wireless Communications. IEEE Press-Wiley, 2 edition, 2012.
[194] L.-S. Juhn and L.-M. Tseng. Harmonic broadcasting for video-on-demand service. IEEE Trans. Broadcast., 43(3):268–271, September 1997.
[195] J. Karedal, A. J. Johansson, F. Tufvesson, and A. F. Molisch. A measurement-based fading model for wireless personal area networks. IEEE Trans.
Wireless Commun., 7(11):4575–4585, November 2008.
[196] M. Ji, R.-R. Chen, G. Caire, and A. F. Molisch. Fundamental limits of distributed caching in multihop d2d wireless networks. In 2017 IEEE International Symposium on Information Theory (ISIT), pages 2950–2954. IEEE, 2017.
[197] J. Liu, N. Kato, J. Ma, and N. Kadowaki. Device-to-device communication in lte-advanced networks: A survey. IEEE Communications Surveys & Tutorials, 17(4):1923–1940, 2014.
[198] M.-C. Lee and A. F. Molisch. On the caching policy and cooperation distance design in base station assisted wireless d2d networks. pages 1–7, May 2018.
[199] M. Ehrgott. Multicriteria Optimization. Springer-Verlag, New York, 2005.
[200] R. W. Freund and F. Jarre. Solving the sum-of-ratios problem by an interior-point method. J. Global Optimization, 19(1):83–102, 2001.
[201] B. K. Sriperumbudur and G. R. Lanckriet. On the convergence of the concave-convex procedure. pages 1759–1767, 2009.
[202] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, 2004.
[203] S. Lederer, C. Muller, and C. Timmerer. Dynamic adaptive streaming over http dataset. pages 89–94, 2012.
[204] C. A. Gomez-Uribe and N. Hunt. The netflix recommender system: Algorithms, business value, and innovation. ACM Trans. Management Inf. Syst., 6(4):13:1–13:19, 2016.
[205] L. E. Chatzieleftheriou, M. Karaliopoulos, and I. Koutsopoulos. Caching-aware recommendations: Nudging user preferences towards better caching performance. May 2017.
[206] M.-C. Lee and A. F. Molisch. Individual preference aware caching policy design for energy-efficient wireless d2d communications. pages 1–7, December 2017.
[207] B. Chen and C. Yang. Caching policy for cache-enabled d2d communications by learning user preference. IEEE Transactions on Communications, 66(12):6586–6601, 2018.
[208] W. Hoiles, O. N. Gharehshiran, V. Krishnamurthy, N.-D. Dao, and H. Zhang.
Adaptive caching in the youtube content distribution network: A revealed preference game-theoretic learning approach. IEEE Trans. Cogn. Commun. Netw., 1(14):71–84, March 2015.
[209] D. Karamshuk, N. Sastry, A. Secker, and J. Chandaria. Isp-friendly peer-assisted on-demand streaming of long duration content in bbc iplayer. May 2015.
[210] D. Karamshuk, N. Sastry, A. Secker, and J. Chandaria. On factors affecting the usage and adoption of a nation-wide tv streaming service. May 2015.
[211] T. M. Cover and J. A. Thomas. Elements of Information Theory. John Wiley & Sons, Inc., 2006.
[212] P. Embrechts, F. Lindskog, and A. McNeil. Modelling dependence with copulas and applications to risk management. 1:329–384, 2003.
[213] Generate correlated data using rank correlation. Available at https://www.mathworks.com/help/stats/generate-correlated-data-using-rank-correlation.html, 2017.
[214] J. D. Gibbons. Nonparametric Statistical Inference. Marcel Dekker Inc., 1985.
[215] A. Raman, N. Sastry, A. Sathiaseelan, A. Secker, and J. Chandaria. Wi-stitch: Content delivery in converged edge networks. 2017.
[216] A. Raman, D. Karamshuk, N. Sastry, J. Chandaria, and A. Secker. Consume local: Towards carbon free content delivery. July 2018.
[217] A. M. Mathai. An Introduction to Geometrical Probability: Distributional Aspects with Applications. Boca Raton, FL, USA: CRC, 1999.
[218] C.-C. Tseng, H.-T. Chen, and K.-C. Chen. On the distance distributions of the wireless ad hoc networks. 2007.
[219] P. T. Joy and K. P. Jacob. Optimal hybrid broadcast scheduling and adaptive cooperative caching for spatial queries in road networks. J. Ambient Intell. Human Comput., 8(4):607–624, August 2017.
[220] M. J. Neely. Stochastic Network Optimization with Application to Communication and Queueing Systems. Morgan & Claypool, 2010.
[221] H. S. Chang, M. C. Fu, J. Hu, and S. I. Marcus. An asymptotically efficient simulation-based algorithm for finite horizon stochastic dynamic programming.
IEEE Trans. Automatic Control, 52(1):89–94, January 2007.
[222] T. Homem-de-Mello and G. Bayraksan. Monte carlo sampling-based methods for stochastic optimization. Surveys in Operations Research and Management Science, 19(1):56–85, January 2014.
[223] D. Bertsekas. Dynamic Programming and Optimal Control, volume 1. Athena Scientific, 4 edition, 2012.
[224] M.-C. Lee, H. Feng, and A. F. Molisch. Dynamic caching content replacement in base station assisted wireless d2d caching networks. IEEE Access, 8:33909–33925, 2020.
[225] C. Bettstetter, G. Resta, and P. Santi. The node distribution of the random waypoint mobility model for wireless ad hoc networks. IEEE Trans. Mobile Comput., 2(3):257–269, July 2003.
[226] M. Ji, A. M. Tulino, J. Llorca, and G. Caire. Order-optimal rate of caching and coded multicasting with random demands. IEEE Trans. Inf. Theory, 63(6):3923–3949, June 2017.
[227] S. Boucheron, G. Lugosi, and P. Massart. Concentration Inequalities: A Nonasymptotic Theory of Independence. Oxford University Press, 2013.
[228] T. J. DiCiccio and B. Efron. Bootstrap confidence intervals. 11(7):198–228, 1996.
[229] A. Raman, G. Tyson, and N. Sastry. Facebook (a) live? are live social broadcasts really broadcasts? April 2018.
[230] D. P. Bertsekas. Nonlinear Programming. Belmont: Athena Scientific, 1999.

CHAPTER A

Appendices of Chapter 3

A.1 Proof of Theorem 1

In this section, our goal is to find the caching policy that maximizes $P_{c_u}$. Note that the probability that a user $u$ can find its desired file $f$ in the cluster through D2D communications is $1-(1-P_c(f))^{S(g_c(M)-1)}$. Then, by using the law of total probability, we have

$$P_{c_u} = \sum_{f=1}^{M} P_r(f)\left[1-(1-P_c(f))^{S(g_c(M)-1)}\right].$$

To maximize $P_{c_u}$, we follow the similar approach based on convex minimization and KKT conditions in Appendix C of [46] and obtain

$$P_c(f) = \left[1-\left(\frac{\nu}{P_r(f)\,S(g_c(M)-1)}\right)^{\frac{1}{S(g_c(M)-1)-1}}\right]^{+}. \qquad \mathrm{(A.1)}$$

Next, we need to find the $\nu$ such that $\sum_{f=1}^{M} P_c(f) = 1$.
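As a numerical aside, this normalization step can be carried out by bisection, since the left-hand side of the constraint is decreasing in $\nu$. The sketch below uses purely hypothetical parameters (the values of $M$, $S$, $g_c$, $\gamma$, and $q$ are illustrative, not taken from the dissertation): it builds an MZipf popularity $P_r(f)\propto(f+q)^{-\gamma}$, solves for $\nu$, and also compares the resulting cutoff index $m^*$ (the number of files with $P_c(f)>0$) against the leading-order characterization $m^* \approx c_1 S g_c(M)/\gamma$ derived later in this appendix, with $c_1$ solving the fixed point $c_1 = 1 + c_2\log(1+c_1/c_2)$:

```python
import math

# Hypothetical parameters (illustrative only): library size M, cache size S,
# cluster size g_c, MZipf exponent gamma and plateau factor q.
M, S, g_c = 2000, 2, 50
gamma, q = 0.6, 10.0

H = sum((f + q) ** (-gamma) for f in range(1, M + 1))
Pr = [(f + q) ** (-gamma) / H for f in range(1, M + 1)]   # MZipf P_r(f)

phi = S * (g_c - 1) - 1                                    # S(g_c(M)-1)-1

def Pc_sum(nu):
    # sum_f P_c(f) with P_c(f) = [1 - (nu/(P_r(f) S(g_c-1)))^(1/phi)]^+,
    # i.e., the caching distribution of (A.1); decreasing in nu.
    return sum(max(0.0, 1.0 - (nu / (p * S * (g_c - 1))) ** (1.0 / phi))
               for p in Pr)

lo, hi = 1e-15, 1.0                                        # bracket the root
for _ in range(200):                                       # bisect Pc_sum(nu) = 1
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if Pc_sum(mid) > 1.0 else (lo, mid)
nu = 0.5 * (lo + hi)

Pc = [max(0.0, 1.0 - (nu / (p * S * (g_c - 1))) ** (1.0 / phi)) for p in Pr]
m_star = sum(1 for p in Pc if p > 0)                       # empirical cutoff m*

# Fixed point c_1 = 1 + c_2 log(1 + c_1/c_2), c_2 = q*gamma/phi; the iteration
# is a contraction here. For q = 0 (pure Zipf) this degenerates to c_1 = 1.
c2 = q * gamma / phi
c1 = 1.0
for _ in range(100):
    c1 = 1.0 + c2 * math.log(1.0 + c1 / c2)

print(abs(sum(Pc) - 1.0) < 1e-6)          # normalization holds
print(m_star, round(c1 * phi / gamma))    # empirical vs leading-order cutoff
```

With these illustrative numbers the empirical cutoff lands within a few percent of the leading-order prediction; the gap is the $O(1)$ correction absorbed in the asymptotic statement.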
Let $\nu^* = \left(\frac{\nu}{S(g_c(M)-1)}\right)^{\frac{1}{S(g_c(M)-1)-1}}$ and $z_f = (P_r(f))^{\frac{1}{S(g_c(M)-1)-1}}$, so that $P_c(f) = \left[1-\frac{\nu^*}{z_f}\right]^{+}$. Note that $z_f$ is non-increasing with respect to $f$ since $P_r(f)$ is non-increasing. By following the similar argument in Appendix C of [46], we obtain $\nu^* = \frac{m^*-1}{\sum_{f=1}^{m^*}\frac{1}{z_f}}$, satisfying $\nu^* \geq z_{m^*+1}$ and $\nu^* \leq z_{m^*}$. Thus, if $m^*$ is a unique integer in $\{1,2,\cdots,M-1\}$, it satisfies:

$$m^*-1 \geq z_{m^*+1}\sum_{f=1}^{m^*}\frac{1}{z_f} \quad\text{and}\quad m^*-1 \leq z_{m^*}\sum_{f=1}^{m^*}\frac{1}{z_f}.$$

Then, in order to determine $m^*$ as a function of $g_c(M)$ under the assumption that $g_c(M)\to\infty$ as $M\to\infty$, we need to evaluate

$$z_{m^*+1}\sum_{f=1}^{m^*}\frac{1}{z_f} = (m^*+q+1)^{-\frac{\gamma}{S(g_c(M)-1)-1}}\sum_{f=1}^{m^*}(f+q)^{\frac{\gamma}{S(g_c(M)-1)-1}} = \left(\frac{1}{m^*+q+1}\right)^{a'}\sum_{f=1}^{m^*}(f+q)^{a'}, \qquad \mathrm{(A.2)}$$

$$z_{m^*}\sum_{f=1}^{m^*}\frac{1}{z_f} = (m^*+q)^{-\frac{\gamma}{S(g_c(M)-1)-1}}\sum_{f=1}^{m^*}(f+q)^{\frac{\gamma}{S(g_c(M)-1)-1}} = \left(\frac{1}{m^*+q}\right)^{a'}\sum_{f=1}^{m^*}(f+q)^{a'}, \qquad \mathrm{(A.3)}$$

where $a' = \frac{\gamma}{S(g_c(M)-1)-1}$. We then characterize $\sum_{f=1}^{m^*}(f+q)^{a'}$. By using the fundamental concept of integration, we observe that

$$\sum_{f=1}^{m^*}(f+q)^{a'} \leq \int_{1}^{m^*+1}(x+q)^{a'}\,dx = \frac{(m^*+q+1)^{a'+1}-(q+1)^{a'+1}}{a'+1},$$
$$\sum_{f=1}^{m^*}(f+q)^{a'} \geq (1+q)^{a'} + \int_{1}^{m^*}(x+q)^{a'}\,dx = (1+q)^{a'} + \frac{(m^*+q)^{a'+1}-(q+1)^{a'+1}}{a'+1}. \qquad \mathrm{(A.4)}$$

By using (A.4), we can obtain the upper bound (UB 1) and lower bound (LB 1) of (A.2):

$$\mathrm{LB\,1} = \left(\frac{q+1}{m^*+q+1}\right)^{a'} + \frac{1}{a'+1}\left[(m^*+q)\left(\frac{m^*+q}{m^*+q+1}\right)^{a'} - (q+1)\left(\frac{q+1}{m^*+q+1}\right)^{a'}\right],$$
$$\mathrm{UB\,1} = \frac{1}{a'+1}\left[(m^*+q+1) - (q+1)\left(\frac{q+1}{m^*+q+1}\right)^{a'}\right]. \qquad \mathrm{(A.5)}$$

Similarly, we can obtain the upper bound (UB 2) and lower bound (LB 2) of (A.3):

$$\mathrm{LB\,2} = \left(\frac{q+1}{m^*+q}\right)^{a'} + \frac{1}{a'+1}\left[(m^*+q) - (q+1)\left(\frac{q+1}{m^*+q}\right)^{a'}\right],$$
$$\mathrm{UB\,2} = \frac{1}{a'+1}\left[(m^*+q+1)\left(\frac{m^*+q+1}{m^*+q}\right)^{a'} - (q+1)\left(\frac{q+1}{m^*+q}\right)^{a'}\right]. \qquad \mathrm{(A.6)}$$

We then define $c_1 = m^*a'$ and $c_2 = qa'$. Notice that $a'\downarrow 0$ as $g_c(M)\to\infty$. Hence

$$\mathrm{LB\,1} = \frac{1}{1+a'}\left[\frac{c_1+c_2}{a'}\left(1-\epsilon_2(a')\right) - \frac{c_2}{a'}\left(1-\epsilon_1(a')\right) + a'\left(1-\epsilon_1(a')\right)\right],$$
$$\mathrm{UB\,1} = \frac{1}{1+a'}\left[\frac{c_1+c_2}{a'}+1 - \left(\frac{c_2}{a'}+1\right)\left(1-\epsilon_1(a')\right)\right], \qquad \mathrm{(A.7)}$$

and

$$\mathrm{LB\,2} = \frac{1}{1+a'}\left[\frac{c_1+c_2}{a'} - \frac{c_2}{a'}\left(1-\epsilon_3(a')\right) + a'\left(1-\epsilon_3(a')\right)\right],$$
$$\mathrm{UB\,2} = \frac{1}{1+a'}\left[\left(\frac{c_1+c_2}{a'}+1\right)\left(1+\epsilon_4(a')\right) - \left(\frac{c_2}{a'}+1\right)\left(1-\epsilon_3(a')\right)\right], \qquad \mathrm{(A.8)}$$

where $\epsilon_i(a')$, $i=1,\dots,4$, tend to zero as $a'\downarrow 0$. Then we denote

$$1-\epsilon_1(a') = \left(\frac{c_2+a'}{c_1+c_2+a'}\right)^{a'} = (\zeta_1)^{a'}, \quad 1-\epsilon_2(a') = \left(\frac{c_1+c_2}{c_1+c_2+a'}\right)^{a'} = (\zeta_2)^{a'},$$
$$1-\epsilon_3(a') = \left(\frac{c_2+a'}{c_1+c_2}\right)^{a'} = (\zeta_3)^{a'}, \quad 1+\epsilon_4(a') = \left(\frac{c_1+c_2+a'}{c_1+c_2}\right)^{a'} = (\zeta_4)^{a'}. \qquad \mathrm{(A.9)}$$

It follows that

$$\frac{c}{a'}\,\epsilon_i(a') = \frac{c\left[1-(\zeta_i)^{a'}\right]}{a'} \xrightarrow{a'\to 0} -c\log(\zeta_i), \quad i=1,\dots,4, \qquad \mathrm{(A.10)}$$

where the second equality is by L'Hôpital's rule. Thus, supposing $c = O(c_1+c_2)$, we obtain

$$\frac{c}{a'}\epsilon_1(a') \xrightarrow{a'\to 0} c\log\left(1+\frac{c_1}{c_2}\right), \quad \frac{c}{a'}\epsilon_2(a') \xrightarrow{a'\to 0} 0, \quad \frac{c}{a'}\epsilon_3(a') \xrightarrow{a'\to 0} c\log\left(1+\frac{c_1}{c_2}\right), \quad \frac{c}{a'}\epsilon_4(a') \xrightarrow{a'\to 0} 0. \qquad \mathrm{(A.11)}$$

By using the above results and that $m^* = \frac{c_1}{a'}$, it follows that, when $a'\to 0$, the two inequalities sandwich $m^*-1$ between quantities that both equal

$$\frac{c_1}{a'} - c_1 + \eta + o(1), \qquad \mathrm{(A.12)}$$

where $\eta = c_2\log\left(1+\frac{c_1}{c_2}\right)$. Thus, we obtain

$$\frac{c_1}{a'} - 1 = \frac{c_1}{a'} - c_1 + \eta + o(1), \qquad \mathrm{(A.13)}$$

leading to $c_1 = 1+\eta$. We then conclude that

$$m^* = \frac{c_1}{a'} = \frac{c_1\left(S(g_c(M)-1)-1\right)}{\gamma} + O(1), \qquad \mathrm{(A.14)}$$

where $c_1$ satisfies the equality $c_1 = 1 + c_2\log\left(1+\frac{c_1}{c_2}\right)$ and $c_2 = qa'$. This indicates $m^* = \frac{c_1 S g_c(M)}{\gamma}$ to the leading order. Besides, it should be clear that if $\frac{c_1 S g_c(M)}{\gamma} \geq M$, we have $m^* = M$. We also note that when $q = 0$, our result degenerates to the results in [46] (observe that when $q = 0$, we obtain $c_2 = 0$ and $c_1 = 1$).

A.2 Proof of Corollary 1 and Corollary 2

Before starting the main proof, we first provide a useful lemma.

Lemma 1: Denote $\sum_{m=a}^{b}(m+q)^{-\gamma} = H(\gamma,q,a,b)$.
When $\gamma\neq 1$, we have

$$\frac{1}{1-\gamma}\left[(b+q+1)^{1-\gamma}-(a+q)^{1-\gamma}\right] \leq H(\gamma,q,a,b) \leq \frac{1}{1-\gamma}\left[(b+q)^{1-\gamma}-(a+q)^{1-\gamma}\right] + (a+q)^{-\gamma}.$$

Proof. Consider $\gamma\neq 1$. By fundamental calculus, we have

$$H(\gamma,q,a,b) = \sum_{m=a}^{b}(m+q)^{-\gamma} \geq \int_{a}^{b+1}\frac{dx}{(x+q)^{\gamma}} = \frac{(x+q)^{1-\gamma}}{1-\gamma}\bigg|_{a}^{b+1} = \frac{1}{1-\gamma}\left[(b+q+1)^{1-\gamma}-(a+q)^{1-\gamma}\right],$$

$$H(\gamma,q,a,b) = \sum_{m=a}^{b}(m+q)^{-\gamma} \leq (a+q)^{-\gamma} + \int_{a}^{b}\frac{dx}{(x+q)^{\gamma}} = (a+q)^{-\gamma} + \frac{1}{1-\gamma}\left[(b+q)^{1-\gamma}-(a+q)^{1-\gamma}\right].$$

A.2.1 Proof of Corollary 1

We consider $g_c(M) < \frac{\gamma M}{c_1 S}$ and $q = O\left(\frac{S g_c(M)}{\gamma}\right)$. We thus obtain $c_1 = O(1)$ and $m^* < M$. The probability that a user $u$ finds the desired file in the cluster is then

$$P_{c_u} = \sum_{f=1}^{M}P_r(f)\left[1-(1-P_c(f))^{S(g_c(M)-1)}\right] = \sum_{f=1}^{m^*}P_r(f)\left[1-\left(\frac{\nu^*}{z_f}\right)^{S(g_c(M)-1)}\right]$$
$$\overset{(a)}{\leq} \sum_{f=1}^{m^*}P_r(f) - \sum_{f=1}^{m^*}P_r(f)\,\frac{P_r(m^*+1)}{P_r(f)}\left(\frac{P_r(m^*+1)}{P_r(f)}\right)^{\frac{1}{S(g_c(M)-1)-1}} = \frac{H(\gamma,q,1,m^*)}{H(\gamma,q,1,M)} - \frac{(m^*+q+1)^{-\gamma}}{H(\gamma,q,1,M)}\sum_{f=1}^{m^*}\left(\frac{f+q}{m^*+1+q}\right)^{\frac{\gamma}{S(g_c(M)-1)-1}}$$
$$\overset{(b)}{\leq} \frac{(m^*+q)^{1-\gamma}-(q+1)^{1-\gamma}+(1-\gamma)(1+q)^{-\gamma}}{(M+q+1)^{1-\gamma}-(q+1)^{1-\gamma}} - \frac{(1-\gamma)(m^*+q+1)^{-\gamma}\left(\frac{1}{m^*+1+q}\right)^{\frac{\gamma}{S(g_c(M)-1)-1}}\sum_{f=1}^{m^*}(f+q)^{\frac{\gamma}{S(g_c(M)-1)-1}}}{(M+q+1)^{1-\gamma}-(q+1)^{1-\gamma}}$$
$$\overset{(c)}{\leq} \frac{(m^*+q)^{1-\gamma}-(q+1)^{1-\gamma}+(1-\gamma)(1+q)^{-\gamma}}{(M+q+1)^{1-\gamma}-(q+1)^{1-\gamma}} - \frac{(1-\gamma)(m^*+q+1)^{-\gamma}\left(\frac{1}{m^*+1+q}\right)^{\frac{\gamma}{S(g_c(M)-1)-1}}\left[(1+q)^{\frac{\gamma}{S(g_c(M)-1)-1}}+\int_{1}^{m^*}(x+q)^{\frac{\gamma}{S(g_c(M)-1)-1}}dx\right]}{(M+q+1)^{1-\gamma}-(q+1)^{1-\gamma}}$$
$$\overset{(d)}{=} \frac{\left(\frac{c_1 S g_c(M)}{\gamma}+q\right)^{1-\gamma}-(q+1)^{1-\gamma}}{(M+q)^{1-\gamma}-(q+1)^{1-\gamma}} - \frac{(1-\gamma)\,\frac{c_1 S g_c(M)}{\gamma}\left(\frac{c_1 S g_c(M)}{\gamma}+q\right)^{-\gamma}}{(M+q)^{1-\gamma}-(q+1)^{1-\gamma}} + o\left(\frac{\left(\frac{c_1 S g_c(M)}{\gamma}+q\right)^{1-\gamma}}{(M+q)^{1-\gamma}-(q+1)^{1-\gamma}}\right), \qquad \mathrm{(A.15)}$$

where (a) is because $\nu^* \geq z_{m^*+1}$; (b) uses the results in Lemma 1; (c) exploits the Riemann sum and $m^*\gg 1$; (d) uses Theorem 1, i.e., $m^* = \frac{c_1 S g_c(M)}{\gamma}$, and $g_c(M)\to\infty$. Similarly,

$$P_{c_u} \overset{(a)}{\geq} \sum_{f=1}^{m^*}P_r(f) - \sum_{f=1}^{m^*}P_r(f)\,\frac{P_r(m^*)}{P_r(f)}\left(\frac{P_r(m^*)}{P_r(f)}\right)^{\frac{1}{S(g_c(M)-1)-1}} = \frac{H(\gamma,q,1,m^*)}{H(\gamma,q,1,M)} - \frac{(m^*+q)^{-\gamma}}{H(\gamma,q,1,M)}\sum_{f=1}^{m^*}\left(\frac{f+q}{m^*+q}\right)^{\frac{\gamma}{S(g_c(M)-1)-1}}$$
$$\geq \frac{\left(\frac{c_1 S g_c(M)}{\gamma}+q\right)^{1-\gamma}-(q+1)^{1-\gamma}}{(M+q)^{1-\gamma}-(q+1)^{1-\gamma}} - \frac{(1-\gamma)\,\frac{c_1 S g_c(M)}{\gamma}\left(\frac{c_1 S g_c(M)}{\gamma}+q\right)^{-\gamma}}{(M+q)^{1-\gamma}-(q+1)^{1-\gamma}} + o\left(\frac{\left(\frac{c_1 S g_c(M)}{\gamma}+q\right)^{1-\gamma}}{(M+q)^{1-\gamma}-(q+1)^{1-\gamma}}\right), \qquad \mathrm{(A.16)}$$

where (a) is because $\nu^* \leq z_{m^*}$, and the last step applies Lemma 1, the Riemann-sum bound, and Theorem 1 as in (A.15). By combining the above results, Corollary 1 is proved.

A.2.2 Proof of Corollary 2

When $g_c(M) = \frac{\epsilon\gamma M}{c_1 S}$, where $\epsilon \geq 1$, we obtain $m^* = M$. Thus, the results in Corollary 1 are no longer appropriate. Now, since $m^* = M$, we have $\nu^* = \frac{M-1}{\sum_{f=1}^{M}\frac{1}{z_f}}$. We define $D = \frac{q}{M}$. Then

$$P_{c_u} = \sum_{f=1}^{M}P_r(f)\left[1-\left(\frac{\nu^*}{z_f}\right)^{S(g_c(M)-1)}\right] = 1-(\nu^*)^{S(g_c(M)-1)}\sum_{f=1}^{M}P_r(f)\,(z_f)^{-S(g_c(M)-1)}$$
$$= 1-\left(\frac{M-1}{\sum_{f=1}^{M}(P_r(f))^{-\frac{1}{S(g_c(M)-1)-1}}}\right)^{S(g_c(M)-1)}\sum_{f=1}^{M}(P_r(f))^{-\frac{1}{S(g_c(M)-1)-1}}$$
= 1 S(gc(M)1) M X f=1 P r (f) (z f ) S(gc(M)1) = 1 0 @ M 1 P M f=1 P r (f) 1 S(gc(M)1)1 1 A S(gc(M)1) M X f=1 P r (f) 1 S(gc(M)1)1 280 = 1 (M 1) S(gc(M)1) 0 @ M X f=1 (f +q) H( ;q; 1;M) 1 S(gc(M)1)1 1 A (S(gc(M)1)1) = 1 (M 1) S(gc(M)1) H( ;q; 1;M) 1 P M f=1 (f +q) S(gc(M)1)1 (S(gc(M)1)1) DenotingS(g c (M) 1) 1 as', we have P c u 1 (M 1) S(gc(M)1) 1 1 (M +q + 1) 1 1 1 (q + 1) 1 1 (1 +q) ' + R M 1 (x +q) ' dx ' = 1 (1 ) (M 1) S(gc(M)1) (M +q + 1) 1 (q + 1) 1 1 (1 +q) ' + 1 ' +1 (M +q) ' +1 (q + 1) ' +1 ' = 1 (1 ) (M 1) S(gc(M)1) (M +q + 1) 1 (q + 1) 1 1 ' + ('+ ) ' '+ (1) | {z } =e ' + 1 (1 +q) ' + (M +q) ' +1 (q + 1) ' +1 ' = 1 (1 ) (M 1) S(gc(M)1) (M) S(gc(M)1) M 1 (M +DM + 1) 1 (DM + 1) 1 e " ' + 1 1 M 1 +DM M ' + (1 +D) ' +1 D + 1 M ' +1 # ' = 1 (1 ) 1 1 M S( M c 1 S 1) | {z } =e =c 1 1 (1 +D + 1 M ) 1 (D + 1 M ) 1 e " ' + 1 1 M D + 1 M ' + (1 +D) ' +1 D + 1 M ' +1 # ' = 1 (1 )e (=c 1 ) (1 +D) 1 (D) 1 h (1 +D) ' +1 (D) ' +1 i ' +o(1): A.3 Proof of Theorem 2 In this section, we provide the proof for Theorem 2, which letsM!1,N!1, andq!1 and consider g c (M)!1 asM!1. We first outline the proof. From Corollaries 1 and 2, we obtain the lower bound ofP c u , which is determined by the cluster sizeg c (M) and the condition ofq. Since the outage probability P o = 1P c u , therefore we can obtain the upper bound of the outage. Subsequently, for each outage regime, 281 we obtain the lower bound ofT min by computing the lower bound of the sum throughputT sum and using the result that T min = 1 N T sum , following the fact that each user is symmetric and has the same average throughput. Since the achievable upper bound of the outage probability and the corresponding lower bound of the throughput can be obtained, we characterize the achievable throughput-outage tradeoff. In Theorem 2, we consider < 1 and the regime coveringq =O Sgc(M) . The cases that < 1 andq =! Sgc(M) will be considered later in Theorem 3. 
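The cluster hit probability that drives all of these bounds, $P_c^u=\sum_f P_r(f)\bigl(1-(1-P_c(f))^{S(g_c(M)-1)}\bigr)$, can be cross-checked by simulation. A sketch under the i.i.d. random-caching model assumed throughout (each of the other $g_c(M)-1$ users fills its $S$ cache slots independently from the caching distribution $P_c$); all names and parameters are ours:

```python
import random

def hit_prob(Pr, Pc, S, g):
    # Closed form: a request for file f is served within the cluster unless
    # none of the S*(g-1) independently filled cache slots holds f.
    k = S * (g - 1)
    return sum(pr * (1.0 - (1.0 - pc) ** k) for pr, pc in zip(Pr, Pc))

def hit_prob_mc(Pr, Pc, S, g, trials=50000, seed=7):
    # Monte Carlo estimate of the same quantity under i.i.d. slot filling.
    random.seed(seed)
    files = range(len(Pr))
    hits = 0
    for _ in range(trials):
        f = random.choices(files, weights=Pr)[0]   # requested file
        cached = set()
        for _ in range(g - 1):                     # other users' caches
            cached.update(random.choices(files, weights=Pc, k=S))
        hits += f in cached
    return hits / trials

w = [(f + 1.0) ** -0.8 for f in range(1, 7)]       # toy Zipf-Mandelbrot weights
Pr = [x / sum(w) for x in w]
cf = hit_prob(Pr, Pr, S=2, g=5)                    # Pc = Pr as a toy policy
assert abs(cf - hit_prob_mc(Pr, Pr, S=2, g=5)) < 0.02
```

The closed form and the simulation agree because each of the $S(g_c(M)-1)$ slots misses file $f$ independently with probability $1-P_c(f)$.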
The main flow for computing T sum is the following (see also Appendix D in [46]). Denote L as the number of active links, we have T sum =CE[L] =CE[number of active cluster]; (A.17) whereC is the constant link rate and the second equality is because only one transmission is allowed in a cluster in a time-frequency slot. Then noticing that E[number of active cluster] 1 K E[number of good cluster] = 1 K (number of total clustersP(W > 0)); (A.18) where theK is reuse factor. Recall that a good cluster is where there exists at least one potential link in the cluster. ThusW = P gc(M) u=1 1 u is the number of potential links, where1 u is the indicator that equals to one if useru can access the desired file in the cluster; otherwise1 u = 0. A.3.1 Proof of Regime 1 In this section, we considerg c (M) =c 3 M , wherec 3 = (1), andq =O Sgc(M) . We definec 4 = q M . According to Corollary 1, we obtain: 282 P c u = c 1 Sgc(M) +q 1 (M +q) 1 (q + 1) 1 (1 ) c 1 Sgc(M) c 1 Sgc(M) +q (M +q) 1 (q + 1) 1 (q + 1) 1 (M +q) 1 (q + 1) 1 = c 1 c 3 SM +c 4 M 1 (M +q) 1 (q + 1) 1 (1 ) c 1 c 3 SM c 1 c 3 SM +c 4 M (M +q) 1 (q + 1) 1 (c 4 M + 1) 1 (M +q) 1 (q + 1) 1 (a) = M (1 ) M 1 " Sc 1 c 3 +c 4 1 (1 ) Sc 1 c 3 Sc 1 c 3 +c 4 (c 4 ) 1 # +o M (1 ) M 1 ! =M (1)(1 ) " Sc 1 c 3 +c 4 (Sc 1 c 3 +c 4 ) (c 4 ) 1 # +o M (b) = M " Sc 1 c 3 +c 4 (Sc 1 c 3 +c 4 ) (c 4 ) 1 # +o M ; where (a) is becauseq =o(M) andM!1, and (b) is because ( 1)(1 ) (d) = ( 1) 1 2 1 1 = ( 1) 1 =; (A.19) where (d) is because = 1 2 => 2 = 1 => ( 1) = 2 1 => = 2 1 1 : (A.20) Now we lower boundT min by using (A.18) and computingP(W > 0). 1 We first introduce the definition of self-bounding property and a corresponding Lemma: Definition [226]: LetX R and consider a non-negative -variate function g :X ! [0;1). We say that g has the self-bounding property if there exists a function g i : X 1 ! 
R such that, for all x 1 ;:::;x X and alli = 1;:::;, 0g i (x 1 ; ;x )g i (x 1 ; ;x i1 ;x i+1 ; ;x ) 1; X i=1 )(g i (x 1 ; ;x )g i (x 1 ; ;x i1 ;x i+1 ; ;x ))g(x 1 ; ;x ): (A.21) 1 Note that the proof technique used for this part is based on the concentration of functions with the self-bounding property and is different from the one in [46]. 283 Lemma 2 (p. 182, Th. 6.12 in [227]): ConsiderX R and the random vectorX = (X 1 ;:::;X )2 X , whereX 1 ;:::;X are mutually statistically independent. DenoteY = g(X), whereg(:) has the self- bounding property. Then, for any 0<E[Y ], we have P(YE[Y ]) exp 2 2E[Y ] : (A.22) We observe that the sum functiong(x 1 ;:::;x ) = P i=1 x i has self-bounding property whenx i ;8i; are binary, i.e.,x i 2f0; 1g. Thus,W = P gc(M) u=1 1 u satisfies the conditions of Lemma 2. By using Lemma 2 and considering =E[W ], we obtainP(W 0) exp E[W ] 2 . It follows that P(W > 0)> 1 exp E[W ] 2 : (A.23) Using (A.18) and (A.23), we thus obtain E[number of active cluster] 1 K (number of total clustersP(W > 0)) N Kg c (M) exp 1 exp E[W ] 2 = N Kc 3 M 1 exp E[W ] 2 : (A.24) To computeE[W ], we note thatE[W ] =E h P gc(M) u=1 1 u i =g c (M)P c u . Thus, E[W ] =c 3 M M " Sc 1 c 3 +c 4 (Sc 1 c 3 +c 4 ) (c 4 ) 1 # +o M ! =c 3 " Sc 1 c 3 +c 4 (Sc 1 c 3 +c 4 ) (c 4 ) 1 # +o(1): (A.25) By using (A.17), (A.24), (A.25), and thatT min = 1 N T sum , we obtain T min C K M c 3 1 exp c 3 2 " Sc 1 c 3 +c 4 (Sc 1 c 3 +c 4 ) (c 4 ) 1 #!! +o(M ): Finally, by exploiting the perturbation argument similar to appendix J in [46], we obtain the achievable throughput-outage tradeoff for regime 1 in the theorem as T (P o ) = C K M c 3 1 exp c 3 2 " Sc 1 c 3 +c 4 (Sc 1 c 3 +c 4 ) (c 4 ) 1 #!! +o(M ); (A.26) whereP o = 1M Sc 1 c 3 +c 4 (Sc 1 c 3 +c 4 ) (c 4 ) 1 . 284 A.3.2 Proof of Regime 2 In this section, we considerg c (M) =!(M )< M c 1 S andq =O Sgc(M) . We definec 5 = q gc(M) . 
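The concentration step (A.22) with $t=\mathbb{E}[W]$ yields $P(W\le 0)\le \exp(-\mathbb{E}[W]/2)$, as used in (A.23). It can be illustrated in a toy setting where the indicators are independent Bernoulli variables, since there $P(W=0)=(1-p)^n$ is exact and can be compared with the bound directly (parameters are illustrative):

```python
import math

def exact_and_bound(n, p):
    # W = sum of n independent Bernoulli(p) indicators, a self-bounding function.
    exact = (1.0 - p) ** n            # exact P(W = 0)
    bound = math.exp(-(n * p) / 2.0)  # Lemma 2 with t = E[W]: P(W <= 0) <= exp(-E[W]/2)
    return exact, bound

for n, p in ((200, 0.03), (50, 0.1)):
    exact, bound = exact_and_bound(n, p)
    assert exact <= bound  # the bound always dominates: (1-p)^n <= e^{-np} <= e^{-np/2}
```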
Again by using Corollary 1, we obtain P c u = c 1 Sgc(M) +q 1 (M +q) 1 (q + 1) 1 (1 ) c 1 Sgc(M) c 1 Sgc(M) +q (M +q) 1 (q + 1) 1 (q + 1) 1 (M +q) 1 (q + 1) 1 = c 1 Sgc(M) +c 5 g c (M) 1 (1 ) c 1 Sgc(M) c 1 Sgc(M) +c 5 g c (M) (c 5 g c (M) + 1) 1 (M +c 5 g c (M)) 1 (c 5 g c (M) + 1) 1 = (g c (M)) 1 Sc 1 +c 5 1 (1 ) Sc 1 Sc 1 +c 5 (c 5 ) 1 (M +c 5 g c (M)) 1 (c 5 g c (M) + 1) 1 = (g c (M)) 1 Sc 1 +c 5 (Sc 1 +c 5 ) (c 5 ) 1 +o gc(M) M 1 (M +c 5 g c (M)) 1 (c 5 g c (M) + 1) 1 : (A.27) Then we again use the same approach as used in regime 1 to obtain the lower bound of T min . We first compute E[W ] =g c (M)P c u = g c (M)(g c (M)) 1 Sc 1 +c 5 (Sc 1 +c 5 ) (c 5 ) 1 (M +c 5 g c (M)) 1 (c 5 g c (M) + 1) 1 +o (g c (M)) 2 M 1 (a) =1; (A.28) where (a) is becauseg c (M)< M c 1 S ,q =O Sgc(M) , and g c (M) g c (M) M 1 (b) = g c (M)! M (1 ) M 1 ! (c) = g c (M)!(M ) (d) = !(1) =1 (A.29) where (b) is becauseg c (M) =!(M ); (c) follows the same derivations as in (A.20); (d) is again because g c (M) =!(M ). Consequently, we obtain T min C K 1 g c (M) +o 1 g c (M) ; (A.30) since exp(E[W ]=2)! 0. Again by using a perturbation argument, it follows that T (P o ) = C K 1 g c (M) +o 1 g c (M) ; (A.31) whereP o = 1 (gc(M)) 1 (M+c 5 gc(M)) 1 (c 5 gc(M)+1) 1 Sc 1 +c 5 (Sc 1 +c 5 ) (c 5 ) 1 . 285 A.3.3 Proof of Regime 3 Finally, we considerg c (M) = M c 1 S , where andq =O Sgc(M) . Thus, instead of using Corollary 1, Corollary 2 is adopted. By Corollary 2, we obtain P o (1 )e (=c 1 ) (1 +D) 1 (D) 1 h (1 +D) S(gc(M)1)1 +1 (D) S(gc(M)1)1 +1 i (S(gc(M)1)1) +o(1): (A.32) To compute the lower bound ofT min , it is clear thatE[W ]!1 because bothP c u andg c (M) in regime 3 are larger than their counterparts in regime 2. 
Consequently, we obtain T min C K 1 g c (M) +o 1 g c (M) = C K Sc 1 M +o 1 g c (M) : (A.33) Again by using a perturbation argument, we obtain the achievable throughput-outage tradeoff: T (P o ) = C K Sc 1 M +o 1 M ; (A.34) whereP o = (1 )e (=c 1 ) (1+D) 1 (D) 1 h (1 +D) S(gc(M)1)1 +1 (D) S(gc(M)1)1 +1 i (S(gc(M)1)1) . A.4 Proof of Theorem 3 Observe that P o goes to 1 when we consider q =O Sgc(M) and g c (M) = o(M) < M c 1 S according to Theorem 2 (regimes 1 and 2). By intuition, it follows that P o also goes to 1 when we consider q = ! Sgc(M) whileq =O(M) since increasing the value ofq degrades the concentration of the popularity distribution which increases the outage. This leads to Theorem 3. Rigorously, observe that P c u = M X f=1 P r (f) 1 (1P c (f)) S(gc(M)1) = M X f=1 P r (f)G(f); (A.35) where G(f) = 1 (1P c (f)) S(gc(M)1) . Then denote the optimal caching policy for P r (f; ;q 1 ) asP q 1 c (f) and the optimal caching policy forP r (f; ;q 2 ) asP q 2 c (f), where bothP q 1 c (f) andP q 2 c (f) are 286 monotonically decreasing with respect tof (see Appendix A.1). Consideringq 1 <q 2 , we want to show the following M X f=1 P r (f; ;q 1 ) 1 (1P q 1 c (f)) S(gc(M)1) (a) M X f=1 P r (f; ;q 1 ) 1 (1P q 2 c (f)) S(gc(M)1) (b) > M X f=1 P r (f; ;q 2 ) 1 (1P q 2 c (f)) S(gc(M)1) ; (A.36) is true. Since (a) is true simply becauseP q 1 c (f) is the optimal policy forP r (f; ;q 1 ), it is thus sufficient to show that (b) is true. To show the (b) of (A.36) is true, we note that wheng<h and> 0, M X f=1 P r (f)G(f) +G(g)G(h)> M X f=1 P r (f)G(f) (A.37) becauseG(f) is monotonically decreasing whenP c (f) is monotonically decreasing with respect tof. Eq. (A.37) indicates that, given the caching policy is monotonically decreasing, when we add to the popularity with lower index (better rank) by subtracting from the one with higher index, we can improveP c u . 
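The argument above, that shifting popularity mass toward lower-indexed (better-ranked) files can only increase $P_c^u$ under a monotonically decreasing caching policy, can be sanity-checked numerically. A sketch with illustrative parameters (a smaller Zipf-Mandelbrot shift $q$ concentrates the popularity, so the expected hit value against any decreasing profile $G(f)$ goes up):

```python
def zipf_mandelbrot(M, gamma, q):
    # P_r(f) = (f+q)^(-gamma) / H(gamma, q, 1, M)
    w = [(f + q) ** (-gamma) for f in range(1, M + 1)]
    s = sum(w)
    return [x / s for x in w]

M, gamma, S_eff = 50, 0.8, 8
Pc = zipf_mandelbrot(M, gamma, 0.5)            # any decreasing caching profile
G = [1.0 - (1.0 - p) ** S_eff for p in Pc]     # decreasing "hit" profile G(f)
q1, q2 = 0.5, 20.0                             # q1 < q2
v1 = sum(p * g for p, g in zip(zipf_mandelbrot(M, gamma, q1), G))
v2 = sum(p * g for p, g in zip(zipf_mandelbrot(M, gamma, q2), G))
assert v1 > v2  # smaller shift q gives a higher hit probability, as in (b) of (A.36)
```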
Then notice that whenq 1 <q 2 , we obtain: P r (1; ;q 1 ) = (1 +q 1 ) P M f=1 (f +q 1 ) (1 +q 2 ) P M f=1 (f +q 2 ) =P r (1; ;q 2 ) (A.38) and (f +q 1 ) (f + 1 +q 1 ) > (f +q 2 ) (f + 1 +q 2 ) ;f = 1; 2;:::;M; (A.39) i.e., starting with a larger value,P r (f; ;q 1 ) decreases faster thanP r (f; ;q 2 ) with respect tof. By using (A.37), (A.38), and (A.39), we can then obtain M X f=1 P r (f; ;q 1 ) 1 (1P q 2 c (f)) S(gc(M)1) M X f=1 P r (f; ;q 2 ) 1 (1P q 2 c (f)) S(gc(M)1) > 0; proving the (b) of (A.36) is true. 287 A.5 Proof of Theorem 4 We considerg c (M) = o(M) N andq =O Sgc(M) . Since these regimes implyg c (M) < M c 1 S , we should apply Corollary 1. We definec 6 = q gc(M) . When > 1, we obtain P c u = c 1 Sgc(M) +q 1 (M +q) 1 (q + 1) 1 (1 ) c 1 Sgc(M) c 1 Sgc(M) +q (M +q) 1 (q + 1) 1 (q + 1) 1 (M +q) 1 (q + 1) 1 = (c 6 g c (M) + 1) 1 (c 6 g c (M) + 1) 1 (M +c 6 g c (M)) 1 c 1 Sgc(M) +c 6 g c (M) 1 (c 6 g c (M) + 1) 1 (M +c 6 g c (M)) 1 ( 1) c 1 Sgc(M) c 1 Sgc(M) +c 6 g c (M) (c 6 g c (M) + 1) 1 (M +c 6 g c (M)) 1 (a) (c 6 g c (M) + 1) 1 (c 6 g c (M) + 1) 1 c 1 Sgc(M) +c 6 g c (M) 1 (c 6 g c (M) + 1) 1 ( 1) c 1 Sgc(M) c 1 Sgc(M) +c 6 g c (M) (c 6 g c (M) + 1) 1 = 1 0 @ c 6 g c (M) + 1 c 1 Sgc(M) +c 6 g c (M) 1 A 1 ( 1)(c 6 g c (M) + 1) 1 c 1 Sgc(M) +c 6 g c (M) c 1 Sgc(M) 1 = 1 0 @ c 6 g c (M) + 1 c 1 Sgc(M) +c 6 g c (M) 1 A 0 @ 0 @ c 6 g c (M) + 1 c 1 Sgc(M) +c 6 g c (M) 1 A 1 + ( 1) 0 @ c 6 g c (M) + 1 c 1 Sgc(M) 1 A 1 1 A = 1 0 @ c 6 g c (M) + 1 c 1 Sgc(M) +c 6 g c (M) 1 A c 6 g c (M) +c 1 Sg c (M) c 6 g c (M) + 1 = 1 c 6 Sc 1 +c 6 ! c 6 +Sc 1 c 6 o(1) = 1 (c 6 ) 1 Sc 1 +c 6 Sc 1 +c 6 o(1); where (a) is because (1 +c 6 g c (M)) 1 > (1 +c 6 g c (M)) 1 (M +c 6 g c (M)) 1 > 0. Then notice that E[W ] =g c (M)P c u !1 sinceg c (M)!1 andc 6 =O(1). Consequently,P (W > 0)! 1 by Lemma 2 (see Appendix A.3.1). It follows that T min C K 1 g c (M) +o 1 g c (M) : Finally, by the perturbation argument again, we obtain Theorem 4. 
Appendix B: Appendices of Chapter 4

B.1 Proof of Lemma 1

According to (4.1), the probability of having $n$ users in a cluster is $P_{g_c(M)}(n)$. Observe that $\mathbb{E}_{P,F,G}\left[\frac{N_o(n)}{n}\right]$ is the probability that a user cannot find the desired file among the users in the cluster when the cluster contains $n$ users. We thus obtain the outage probability:
$$
\begin{aligned}
p_o &= \sum_{n=1}^{\infty} \mathbb{E}_{P,F,G}\left[\frac{N_o(n)}{n}\,\middle|\, n\right] P_{g_c(M)}(n) + P_{g_c(M)}(0)\\
&= \sum_{n=1}^{\infty}\sum_{f=1}^{M} P_r(f)(1-P_c(f))^n \frac{(g_c(M))^n}{n!}e^{-g_c(M)} + e^{-g_c(M)}\\
&= \sum_{f=1}^{M}\sum_{n=1}^{\infty} P_r(f)(1-P_c(f))^n \frac{(g_c(M))^n}{n!}e^{-g_c(M)} + \sum_{f=1}^{M} P_r(f)e^{-g_c(M)}\\
&= \sum_{f=1}^{M}\sum_{n=0}^{\infty} P_r(f)(1-P_c(f))^n \frac{(g_c(M))^n}{n!}e^{-g_c(M)}\\
&= \sum_{f=1}^{M}\sum_{n=0}^{\infty} P_r(f)\frac{[g_c(M)(1-P_c(f))]^n}{n!}e^{-g_c(M)(1-P_c(f))}\,e^{-g_c(M)P_c(f)}\\
&= \sum_{f=1}^{M} P_r(f)e^{-g_c(M)P_c(f)}\underbrace{\sum_{n=0}^{\infty}\frac{[g_c(M)(1-P_c(f))]^n}{n!}e^{-g_c(M)(1-P_c(f))}}_{=1}\\
&= \sum_{f=1}^{M} P_r(f)e^{-g_c(M)P_c(f)}. \qquad\text{(B.1)}
\end{aligned}
$$

B.2 Proof of Theorem 1

When considering a cluster with side length $\sqrt{g_c(M)/N}$, the number of users in the cluster is a Poisson random variable with mean $g_c(M)$. According to Lemma 1, the optimization problem that minimizes the outage probability of the cluster is therefore:
$$
\min \sum_{f=1}^{M} P_r(f)e^{-g_c(M)P_c(f)} \quad \text{s.t.}\quad \sum_{f=1}^{M}P_c(f)=S,\quad 0\le P_c(f)\le 1,\ \forall f=1,2,\dots,M. \qquad\text{(B.2)}
$$
Since this optimization is convex, using a Lagrange multiplier $\nu$ we obtain the optimal solution:
$$
P_c(f) = \min\!\left(1, \left[\frac{1}{g_c(M)}\log\frac{g_c(M)P_r(f)}{\nu}\right]^+\right) = \min\!\left(1, \left[\log\left(\frac{g_c(M)P_r(f)}{\nu}\right)^{\!\frac{1}{g_c(M)}}\right]^+\right), \qquad\text{(B.3)}
$$
where $[a]^+ = \max(a,0)$ and $\nu$ is chosen such that $\sum_{f=1}^{M} P_c(f)=S$.

To derive the final results of Theorem 1, in the following we will first assume:
$$
P_c(f) = \min\!\left(1, \left[\log\left(\frac{g_c(M)P_r(f)}{\nu}\right)^{\!\frac{1}{g_c(M)}}\right]^+\right) = \left[\log\left(\frac{g_c(M)P_r(f)}{\nu}\right)^{\!\frac{1}{g_c(M)}}\right]^+. \qquad\text{(B.4)}
$$
Based on the resulting theorem, we will then show that the assumption in (B.4) indeed holds for the caching policy derived in Theorem 1. Before starting the proof, we provide Lemma 2:

Lemma 2: Suppose $q \ge 0$ and $F > 1$.
We have the following inequalities: F X f=1 log(f +q) (F +q + 1) log(F +q + 1)F (1 +q) log(1 +q); F X f=1 log(f +q) log(1 +q) + (F +q) log(F +q)F (1 +q) log(1 +q) + 1: (B.5) Proof. See Appendix B.5. We denote = gc(M) 1 gc(M) , andz f = (P r (f)) 1 gc(M) . As a result,P c (f) = log z f + . We denote m M as the smallest index such thatP c (m + 1) = 0. Then sinceP c (f) is monotonically decreasing, 290 we know that is a parameter such that log z m +1 > 0 and log z m 0. This leads to: z m > 1 and z m +1 1, i.e, <z m andz m +1 . Observe that P m f=1 log z f =S. It follows that m X f=1 log z f z m S ; m X f=1 log z f z m +1 S: (B.6) As a result, m X f=1 log P r (f) P r (m ) 1 gc(M) S ; m X f=1 log P r (f) P r (m + 1) 1 gc(M) S: (B.7) Recall thatP r (f) = (f+q) H(1;M; ;q) . It follows that m X f=1 log P r (f) P r (m ) 1 gc(M) = m X f=1 log f +q m +q gc(M) = g c (M) m X f=1 log f +q m +q (B.8) By using Lemma 2, we know m X f=1 log(f +q) (m +q + 1) log(m +q + 1)m (1 +q) log(1 +q); m X f=1 log(f +q) log(1 +q) + (m +q) log(m +q)m (1 +q) log(1 +q) + 1: (B.9) As a result, we obtain: g c (M) m X f=1 log f +q m +q g c (M) [log(1 +q) + (m +q) log(m +q)m (1 +q) log(1 +q) + 1] + g c (M) m log(m +q) g c (M) m X f=1 log f +q m +q g c (M) [(m +q + 1) log(m +q + 1)m (1 +q) log(1 +q)] + g c (M) m log(m +q): (B.10) This leads to m X f=1 log P r (f) P r (m ) 1 gc(M) g c (M) [m q log(m +q) +q log(1 +q) 1] ; m X f=1 log P r (f) P r (m ) 1 gc(M) g c (M) (m (q + 1) log(m +q + 1)m log m +q + 1 m +q + (1 +q) log(1 +q) : (B.11) 291 Similarly, we have m X f=1 log P r (f) P r (m + 1) 1 gc(M) = m X f=1 log f +q m +q + 1 gc(M) = g c (M) m X f=1 log f +q m +q + 1 (B.12) Hence, by using (B.9), we obtain: g c (M) m X f=1 log f +q m +q + 1 g c (M) [log(1 +q) + (m +q) log(m +q)m (1 +q) log(1 +q) + 1] + g c (M) m log(m +q + 1); g c (M) m X f=1 log f +q m +q + 1 g c (M) [(m +q + 1) log(m +q + 1)m (1 +q) log(1 +q)] + g c (M) m log(m +q + 1): (B.13) By using (B.12), this then leads to m X f=1 log P r (f) P r 
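Before continuing the derivation of $m^*$, it may help to see the pieces of Appendix B.1 and B.2 in action: the water-filling policy (B.3) can be computed by bisection on the multiplier (written here as `nu`, since its symbol is garbled in the transcript), and the resulting outage can be checked against the Poisson cluster model behind (B.1). A self-contained sketch; all names and parameters are ours:

```python
import math, random

def zipf_mandelbrot(M, gamma, q):
    # P_r(f) = (f+q)^(-gamma) / H(gamma, q, 1, M)
    w = [(f + q) ** (-gamma) for f in range(1, M + 1)]
    s = sum(w)
    return [x / s for x in w]

def caching_policy(g, Pr, S):
    # Water-filling solution of (B.2)-(B.3):
    #   Pc(f) = min(1, [log(g*Pr(f)/nu)/g]^+), with nu set so sum_f Pc(f) = S.
    # The allocated mass is continuous and decreasing in nu, so bisection works.
    def alloc(nu):
        return [min(1.0, max(0.0, math.log(g * p / nu) / g)) for p in Pr]
    lo, hi = 1e-15, g * max(Pr)  # alloc(hi) is identically zero
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if sum(alloc(mid)) > S else (lo, mid)
    return alloc(hi)

def outage_closed_form(g, Pr, Pc):
    # Eq. (B.1): p_o = sum_f Pr(f) * exp(-g * Pc(f))
    return sum(p * math.exp(-g * c) for p, c in zip(Pr, Pc))

def _poisson(lam):
    # Knuth's Poisson sampler; adequate for the small means used here.
    L, k, p = math.exp(-lam), 0, 1.0
    while p > L:
        k += 1
        p *= random.random()
    return k - 1

def outage_mc(g, Pr, Pc, trials=100000, seed=3):
    # Direct simulation of the Poisson cluster model behind Lemma 1.
    random.seed(seed)
    files = range(len(Pr))
    out = 0
    for _ in range(trials):
        f = random.choices(files, weights=Pr)[0]
        n = _poisson(g)  # number of users in the cluster
        if all(random.random() >= Pc[f] for _ in range(n)):
            out += 1     # no user in the cluster holds file f
    return out / trials

Pr = zipf_mandelbrot(M=20, gamma=0.8, q=1.0)
Pc = caching_policy(g=8.0, Pr=Pr, S=4)
assert abs(sum(Pc) - 4) < 1e-6                      # cache budget met
assert abs(outage_closed_form(8.0, Pr, Pc) - outage_mc(8.0, Pr, Pc)) < 0.02
```

The bisection relies only on the monotonicity of the allocated cache mass in the multiplier, which is what makes the convex program (B.2) easy to solve exactly.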
(m + 1) 1 gc(M) g c (M) m q log(m +q)m log m +q m +q + 1 +q log(1 +q) 1 m X f=1 log P r (f) P r (m + 1) 1 gc(M) g c (M) [(m (q + 1) log(m +q + 1) + (1 +q) log(1 +q)]: (B.14) We now leta 0 = Sgc(M) ,q =C 2 a 0 , andm =C 1 a 0 , whereC 1 andC 2 are some constant. We want to determinem wheng c (M)!1 (or equivalentlya 0 !1). From (B.11), we obtain : 1 S m X f=1 log P r (f) P r (m ) 1 gc(M) 1 a 0 C 1 a 0 C 2 a 0 log (C 1 +C 2 )a 0 +C 2 a 0 log(1 +C 2 a 0 ) 1 =C 1 C 2 log (C 1 +C 2 )a 0 +C 2 log(1 +C 2 a 0 ) 1 a 0 =C 1 C 2 log (C 1 +C 2 )a 0 C 2 a 0 + 1 1 a 0 =C 1 C 2 log " C 1 +C 2 C 2 + 1 a 0 # 1 a 0 C 1 C 2 log 1 + C 1 C 2 (B.15) 292 Also from (B.11), we obtain: 1 S m X f=1 log P r (f) P r (m ) 1 gc(M) C 1 C 2 + 1 a 0 log (C 1 +C 2 )a 0 + 1 C 1 log C 1 +C 2 + 1 a 0 C 1 +C 2 ! + C 2 + 1 a 0 log(1 +C 2 a 0 ) =C 1 C 2 + 1 a 0 log (C 1 +C 2 )a 0 C 2 a 0 + 1 C 1 log C 1 +C 2 + 1 a 0 C 1 +C 2 ! =C 1 C 2 + 1 a 0 log " C 1 +C 2 C 2 + 1 a 0 # C 1 log C 1 +C 2 + 1 a 0 C 1 +C 2 ! C 1 C 2 log 1 + C 1 C 2 : (B.16) As a result of (B.15) and (B.16), we obtain C 1 C 2 log 1 + C 1 C 2 / 1 S m X f=1 log P r (f) P r (m ) 1 gc(M) /C 1 C 2 log 1 + C 1 C 2 (B.17) Similarly, from (B.14), we can obtain: 1 S m X f=1 log P r (f) P r (m + 1) 1 gc(M) 1 a 0 C 1 a 0 C 2 a 0 log (C 1 +C 2 )a 0 (C 1 a 0 ) log (C 1 +C 2 )a 0 (C 1 +C 2 )a 0 + 1 +C 2 a 0 log(1 +C 2 a 0 ) 1 =C 1 C 2 log (C 1 +C 2 )a 0 C 1 log (C 1 +C 2 )a 0 (C 1 +C 2 )a 0 + 1 +C 2 log(1 +C 2 a 0 ) 1 a 0 =C 1 C 2 log (C 1 +C 2 )a 0 C 2 a 0 + 1 C 1 log C 1 +C 2 C 1 +C 2 + 1 a 0 ! 
1 a 0 C 1 C 2 log 1 + C 1 C 2 (B.18) 293 Again from (B.14), we obtain: 1 S m X f=1 log P r (f) P r (m + 1) 1 gc(M) 1 a 0 C 1 a 0 (C 2 a 0 + 1) log (C 1 +C 2 )a 0 + 1 + (1 +C 2 a 0 ) log(1 +C 2 a 0 ) =C 1 C 2 + 1 a 0 log (C 1 +C 2 )a 0 + 1 + C 2 + 1 a 0 log(1 +C 2 a 0 ) =C 1 C 2 + 1 a 0 log (C 1 +C 2 )a 0 C 2 a 0 + 1 C 1 C 2 log 1 + C 1 C 2 : (B.19) As a result of (B.18) and (B.19), we obtain C 1 C 2 log 1 + C 1 C 2 / 1 S m X f=1 log P r (f) P r (m + 1) 1 gc(M) /C 1 C 2 log 1 + C 1 C 2 : (B.20) Finally, by using (B.7), (B.17), and (B.20), we obtain the following relationship:C 1 C 2 log 1 + C 1 C 2 = 1. Recall thata 0 = Sgc(M) ,C 2 = a 0 q , andm =C 1 a 0 . We conclude that m = min C 1 Sg c (M) ;M ; (B.21) whereC 1 is a solution ofC 1 = 1 +C 2 log 1 + C 1 C 2 . Since above derivations are based on the assumption in (B.4), we now show that this assumption is indeed true for is indeed true for the caching policy in Theorem 1. Observe that sinceP c (f) is a monoton- ically decreasing function off, it is sufficient to showP c (1) 1. Then sinceP c (1) = log z 1 , this is equivalent to show logz 1 log 1 as in the following. 
We first observe from above results that m X f=1 log z f =S <=> m X f=1 logz f m log =S (B.22) It follows that log = 1 m m X f=1 logz f + S m : (B.23) 294 Then observe that m X f=1 logz f = m X f=1 log(P r (f)) 1 gc(M) = g c (M) m X f=1 log(f +q) 1 g c (M) m X f=1 logH(1;M; ;q) = g c (M) m X f=1 log(f +q) m g c (M) logH(1;M; ;q): (B.24) It follows from (B.23) and (B.24) that log = 1 m g c (M) m X f=1 log(f +q) + 1 g c (M) logH(1;M; ;q) + S m : (B.25) By using Lemma 2, we obtain: m X f=1 log(f +q) (m +q + 1) log (m +q + 1)m (1 +q) log(1 +q): (B.26) By using (B.25) and (B.26), we then obtain log g c (M) 1 m [(m +q + 1) log (m +q + 1)m (1 +q) log(1 +q)] + 1 g c (M) logH(1;M; ;q) + S m : (B.27) Recall that logz 1 = log(a 1 ) 1 gc(M) = 1 g c (M) log (1 +q) 1 g c (M) logH(1;M; ;q): (B.28) By using (B.27) and (B.28), we obtain: logz 1 log g c (M) 1 m [(m +q + 1) log (m +q + 1)m (1 +q) log(1 +q)] + 1 g c (M) logH(1;M; ;q) + S m + 1 g c (M) log (1 +q) 1 g c (M) logH(1;M; ;q) = g c (M) 1 + q m + 1 m log (m +q + 1) 1 q m + 1 m log(1 +q) + S m g c (M) log(1 +q): (B.29) To show logz 1 log 1, we discuss two cases: (i) C 1 Sgc(M) M and (ii) C 1 Sgc(M) > M. We first consider C 1 Sgc(M) M. Then noticing that in this case m = C 1 Sgc(M) . In addition, we have 295 q =C 2 a 0 = C 2 Sgc(M) andg c (M)!1. It follows from (B.29) that logz 1 log g c (M) h 1 + C 2 C 1 + C 1 Sg c (M) log (C 1 +C 2 )Sg c (M) + 1 1 C 2 C 1 + C 1 Sg c (M) log 1 + C 2 Sg c (M) i + C 1 N g c (M) log 1 + C 2 Sg c (M) = g c (M) 2 4 1 + C 1 Sg c (M) log 0 @ (C 1 +C 2 )Sgc(M) + 1 C 2 Sgc(M) + 1 1 A 1 3 5 + g c (M) C 2 C 1 log 0 @ (C 1 +C 2 )SNgc(M) + 1 C 2 Sgc(M) + 1 1 A + C 1 g c (M) = g c (M) 1 + C 1 Sg c (M) log 1 + C 1 C 2 1 + g c (M) C 2 C 1 log 1 + C 1 C 2 + C 1 g c (M) +o(1): (B.30) Recall thatC 1 = 1 +C 2 log 1 + C 1 C 2 according to Theorem 1. 
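The implicit equation just recalled, $C_1 = 1 + C_2\log(1+C_1/C_2)$, has no closed-form solution, but the map $x\mapsto 1+C_2\log(1+x/C_2)$ has derivative $C_2/(C_2+x)<1$ for $x>0$, so it is a contraction and fixed-point iteration converges. A short sketch:

```python
import math

def solve_C1(C2, iters=200):
    # Fixed-point iteration for C1 = 1 + C2*log(1 + C1/C2) (Theorem 1).
    # The iteration map is a contraction on x > 0, so starting from x = 1 converges.
    c1 = 1.0
    for _ in range(iters):
        c1 = 1.0 + C2 * math.log(1.0 + c1 / C2)
    return c1

for C2 in (0.5, 2.0, 10.0):
    c1 = solve_C1(C2)
    assert abs(c1 - (1.0 + C2 * math.log(1.0 + c1 / C2))) < 1e-9
    assert c1 > 1.0  # C1 exceeds 1 whenever C2 > 0
```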
We thus have: log 1 + C 1 C 2 = C 1 1 C 2 : (B.31) By combining (B.30) and (B.31), we finally obtain: logz 1 log g c (M) 1 + C 1 Sg c (M) C 1 1 C 2 1 + g c (M) C 2 C 1 C 1 1 C 2 + C 1 g c (M) +o(1) = g c (M) C 1 C 2 1 C 2 + g c (M) C 1 Sg c (M) C 1 1 C 2 +o(1) (a) = o(1); (B.32) where (a) is becauseN!1 and C 1 C 2 = (1) according to the equationC 1 = 1 +C 2 log 1 + C 1 C 2 . We now consider C 1 Sgc(M) >M. In this case, we havem =M. Then from (B.29), we obtain: logz 1 log g c (M) 1 + q M + 1 M log (M +q + 1) 1 q M + 1 M log(1 +q) + S M g c (M) log(1 +q) = g c (M) 1 + q M + 1 M log (M +q + 1) 1 1 + q M + 1 M log(1 +q) g c (M) + S M = g c (M) 1 + q M + 1 M log M +q + 1 q + 1 +o(1) (a) = o(1); (B.33) where (a) can be shown by considering two cases: (i) wheng c (M) = (M), then sinceq =O(M) andS is finite, (a) is thus true; and (ii) wheng c (M) =o(M), then we must haveq = (M) becauseq = C 2 Sgc(M) 296 and C 1 Sgc(M) > M and C 1 C 2 = (1). Therefore (a) is true. Combining results in (B.32) and (B.33), we show that logz 1 log 1 is true, and thus prove that the validity of the assumption in (B.4) for the caching policy proposed in Theorem 1. This concludes the proof of Theorem 1. B.3 Proof of Proposition 1 Before the proof of Proposition 1, we first state a Lemma: Lemma 3: (the original Lemma 1 in [48]): Denote b X f=a (f +q) = H(a;b; ;q). When 6= 1, we have 1 1 (b +q + 1) 1 (a +q) 1 H(a;b; ;q) 1 1 (b +q) 1 (a +q) 1 + (a +q) : Supposeg c (M) = M C 1 S , where . According to Theorem 1, we obtainm =M. Then observe that z f = (P r (f)) 1 gc(M) . The outage probability is p o = M X f=1 P r (f)e gc(M) log z f = M X f=1 P r (f) z f gc(M) = () gc(M) M X f=1 P r (f) (P r (f)) 1 gc(M) gc(M) = () gc(M) M X f=1 P r (f) (P r (f)) 1 = () gc(M) M (B.34) where () gc(M) = exp P M f=1 logz f S M g c (M) ! 
=e gc(M) M P M f=1 logz f e Sgc(M) M =e gc(M) M P M f=1 logz f e C 1 : (B.35) We then note that M X f=1 logz f = M X f=1 log(P r (f)) 1 gc(M) = 1 g c (M) M X f=1 logP r (f) = 1 g c (M) M X f=1 log (f +q) H(1;M; ;q) = g c (M) M X f=1 log(f +q) M g c (M) logH(1;M; ;q) (a) g c (M) (log(1 +q) + (M +q) log(M +q)M (1 +q) log(1 +q) + 1) M g c (M) log 1 1 (M +q + 1) 1 (1 +q) 1 ; (B.36) 297 where (a) is because M X f=1 log(f +q) log(1 +q) + (M +q) log(M +q)M (1 +q) log(1 +q) + 1 (B.37) by Lemma 2 and H(1;M; ;q) 1 1 (M +q + 1) 1 (1 +q) 1 (B.38) by Lemma 1 of [48]. It follows from (B.34), (B.35), and (B.37) that the outage probability can be upper bounded as: p o = M X f=1 P r (f)e gc(M) log z f =M () gc(M) e gc(M) M h gc(M) (log(1+q)+(M+q) log(M+q)M(1+q) log(1+q)+1) i e gc(M) M h M gc(M) log 1 1 ((M+q+1) 1 (1+q) 1 ) i e C 1 =Me C 1 (1 +q) M (M +q) M (M+q) e (1 +q) M (1+q) e M 1 1 (M +q + 1) 1 (1 +q) 1 1 (B.39) We letD = q M . It follows from (B.39) that: p o = M X f=1 P r (f)e gc(M) log z f (1 )Me C 1 e M (M +DM + 1) 1 (1 +DM) 1 (1 +DM) M (M +DM) M (M+DM) (1 +DM) M (1+DM) = (1 )Me C 1 e M (M +DM + 1) 1 (1 +DM) 1 (M +DM) (1+D) (1 +DM) D = (1 )e C 1 e M (1 +D) (1+D) (D + 1 M ) D (1 +D + 1 M ) 1 (D + 1 M ) 1 MM (1+D) M D M 1 = (1 )e C 1 (1 +D) (1+D) (D) D (1 +D) 1 (D) 1 +o (1 )e C 1 (1 +D) (1+D) (D) D (1 +D) 1 (D) 1 ! : (B.40) 298 B.4 Proof of Theorem 2 When consideringg c (M) = M C 1 S , by Proposition 1, we know that the outage probability of the cluster is upper bounded as p o (1 )e ( C 1 ) D D (1 +D) (1+D) (1 +D) 1 D 1 : (B.41) To compute the througput of a cluster, we leverage the results in [103]. Recall that when using the achievable scheme in Ch. 4.3.1, the multi-hop approach proposed in [103] is used for delivering both real and virtual files. 
We denote the throughput generated via transmitting real file as effective throughput; the throughput generated via transmitting virtual file as virtual throughput; and the sum of the real and virtual throughput as mixing throughput. Since only the effective throughput can be taken into account forT user , we want to compute its value. To compute the effective throughput, our approach is to first compute the mixing throughput, and then exclude the virtual throughput from it. From the definition, we know: T user =E n;P min u2U E [C u 1 Hu j n;P] ; (B.42) where C u is the mixing throughput of user u; 1 Hu is the indicating function of the event H u defined as H u =fthe useru can find the desired file in the clusterg. Thus, 1 Hu = 1 if useru can find the desired file in the cluster; otherwise1 Hu = 0. Then according to the result in [103] and [104] and due to the frequency reuse scheme among different clusters, we have the following Theorem: Theorem A.1 [103, 104]: When using the proposed achievable scheme, with high probability (w.h.p.), users in a cluster with side length q gc(M) N can achieve C u = 1 K q 1 gc(M) of the mixing throughput simultaneously. From Theorem A.1, we know that, w.h.p, there exists a = (1) > 0 such thatC u K q 1 gc(M) for all users. We note that both Theorem A.1 and event1 Hu have the symmetry property for all users. It is then 299 sufficient that we consider an arbitrary user in the network. We letC user = K q 1 gc(M) and define an event H =fthe user can find the desired file in the clusterg. Recall thatP h = 1p o is the file hit-rate. Then by using above arguments and (B.42) andg c (M) = C 1 S M , we obtain: T user E n;P [E [C user 1 H j n;P]] =C user E n;P [E [1 H j n;P]] =C user P h = (1p o )C user = 1p o K s 1 g c (M) ! = 1p o K s C 1 S M ! : (B.43) By combining (B.41) and (B.43), we finally obtain the asymptotic achievable throughput-outage tradeoff: T (P o ) = 1P o K s C 1 S M ! 
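The Riemann-sum bounds of Lemma 2 (proved in Appendix B.5 above) follow from the antiderivative $\int \log(x+q)\,dx = (x+q)\log(x+q)-x$ and the fact that $\log(x+q)$ is increasing; they are quick to check numerically. A sketch:

```python
import math

def logsum(F, q):
    # sum_{f=1}^{F} log(f+q)
    return sum(math.log(f + q) for f in range(1, F + 1))

def lemma2_bounds(F, q):
    # Lemma 2 (q >= 0, F > 1): Riemann-sum bounds on sum_{f=1}^F log(f+q).
    upper = (F + q + 1) * math.log(F + q + 1) - F - (1 + q) * math.log(1 + q)
    lower = (math.log(1 + q) + (F + q) * math.log(F + q) - F
             - (1 + q) * math.log(1 + q) + 1)
    return lower, upper

for F, q in ((10, 0.0), (100, 3.5)):
    lo, up = lemma2_bounds(F, q)
    assert lo <= logsum(F, q) <= up
```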
;P o = (1 )e ( C 1 ) D D (1 +D) (1+D) (1 +D) 1 D 1 : (B.44) B.5 Proof of Lemma 2 By using the concept of Riemann sum in Calculus, we can obtain: F X f=1 log(f +q) Z F +1 1 log(x +q)dx = (x +q) log(x +q)xj F +1 1 ; F X f=1 log(f +q) log(1 +q) + Z F 1 log(x +q)dx = log(1 +q) + (x +q) log(x +q)xj F 1 : (B.45) It follows that F X f=1 log(f +q) (F +q + 1) log(F +q + 1)F (1 +q) log(1 +q); F X f=1 log(f +q) log(1 +q) + (F +q) log(F +q)F (1 +q) log(1 +q) + 1: (B.46) B.6 Proof of Proposition 2 In the following, we will obtain both the upper and lower bounds ofp o wheng c (M) < M C 1 S . As will be shown, the upper and lower bounds have the same expression. We can thus conclude the expression ofp o . 300 Wheng c (M)< M C 1 S , we havem <M according to Theorem 1. Consequently, the outage probability is: p o = M X f=1 P r (f)e gc(M) log z f = m X f=1 P r (f) z f gc(M) + M X f=m +1 P r (f) = m X f=1 P r (f)e gc(M) logz f e gc(M log) + M X f=m +1 P r (f) = m X f=1 P r (f)(z f ) gc(M) gc(M) + M X f=m +1 P r (f) = gc(M) m X f=1 P r (f)(P r (f)) gc(M) gc(M) + M X f=m +1 P r (f) =m gc(M) + M X f=m +1 P r (f) = 1 m X f=1 P r (f) +m gc(M) : (B.47) To lower bound (B.47), we in the following upper bound P m f=1 P r (f) and lower boundm gc(M) , respec- tively. We first derive the upper bound for P m f=1 P r (f) as follows: m X f=1 P r (f) = H(1;m ; ;q) H(1;M; ;q) (a) 1 1 (m +q) 1 (q + 1) 1 + (q + 1) 1 1 [(M +q + 1) 1 (q + 1) 1 ] = (m +q) 1 (q + 1) 1 + (1 )(q + 1) (M +q + 1) 1 (q + 1) 1 = (m +q) 1 (q + 1) 1 (M +q) 1 (q + 1) 1 +o (); (B.48) where (a) is due to Lemma 3 and is with whatever order we have in (B.48). Thus,o () here is simply to indicate some negligible terms that have even smaller order than the major term in (B.48). We use this for notation simplicity, and the same notion applies for all derivations in the remaining Appendix B. We next derive the lower bound form gc(M) . Recall thatm = C 1 Sgc(M) . 
Then by using the same derivation as in (B.35), we obtain: m gc(M) =m exp P m f=1 logz f S m g c (M) ! =m e gc(M) m P m f=1 logz f e Sgc(M) m =m e gc(M) m P m f=1 logz f e C 1 : (B.49) To find the lower bound of (B.49), our approach is by obtaining the lower bound of P m f=1 logz f . By following the same derivations as in (B.36), we obtain: m X f=1 logz f = g c (M) m X f=1 log(f +q) m g c (M) H(1;M; ;q): (B.50) 301 Then by (B.50) and Lemmas 2 and 3, we obtain: m X f=1 logz f g c (M) ((m +q + 1) log(m +q + 1)m (1 +q) log(1 +q)) m g c (M) log 1 1 (M +q) 1 (1 +q) 1 + (1 +q) : (B.51) By substituting (B.51) into (B.49), we obtain: m gc(M) m e C 1 e gc(M) m h gc(M) ((m +q+1) log(m +q+1)m (1+q) log(1+q)) i e gc(M) m h m gc(M) log 1 1 ((M+q) 1 (1+q) 1 )+(1+q) i =m e C 1 (m +q + 1) m (m +q+1) e (1 +q) m (1+q) 1 1 1 ((M +q) 1 (1 +q) 1 ) + (1 +q) = (1 )e 1 C 1 1 m (m +q + 1) (m +q + 1) (q+1) m (1 +q) (q+1) m (M +q) 1 (1 +q) 1 + (1 )(1 +q) = (1 )e 1 C 1 1 m M 1 (1 + q m + 1 m ) 1 + q M 1 q M + 1 M 1 + (1 )(1+q) M 1 1 +q m +q + 1 (q+1) m = (1 )e 1 C 1 1 m M 1 (1 + q m ) 1 + q M 1 q M 1 q m +q q m +o (): (B.52) 302 Recall thatm = C 1 Sgc(M) andq = C 2 Sgc(M) . By using (B.47), (B.48), and (B.52), we obtain: p o 1 + (1 )e 1 C 1 1 m M 1 (1 + q m ) 1 + q M 1 q M 1 q m +q q m (m +q) 1 (q + 1) 1 (M +q) 1 (q + 1) 1 +o(1) = 1 + (1 )e 1 C 1 1 C 1 S g c (M) M 1 (1 + C 2 C 1 ) 1 + C 2 S gc(M) M 1 C 2 S gc(M) M 1 C 2 C 1 +C 2 C 2 C 1 m M 1 (1 + q m ) 1 ( q m + 1 m ) 1 (1 + q M ) 1 ( q M + 1 M ) 1 +o(1) = 1 + (1 )e 1 C 1 1 C 1 S g c (M) M 1 C 1 C 1 +C 2 C 2 C 1 +C 2 C 2 C 1 1 + C 2 S gc(M) M 1 C 2 S gc(M) M 1 C 1 S g c (M) M 1 1 + C 2 C 1 1 C 2 C 1 1 1 + C 2 S gc(M) M 1 C 2 S gc(M) M 1 +o (): (B.53) Similar to the above procedure, we derive the upper bound forp o . Therefore, to obtain the upper bound of (B.47), we in the following obtain the lower bound of P m f=1 P r (f) and upper bound of m gc(M) , respectively. 
We first derive the lower bound for P m f=1 P r (f) as follows: m X f=1 P r (f) = H(1;m ; ;q) H(1;M; ;q) (a) 1 1 (m +q + 1) 1 (q + 1) 1 1 1 [(M +q) 1 (q + 1) 1 ] + (q + 1) = (m +q + 1) 1 (q + 1) 1 (M +q) 1 (q + 1) 1 + (1 )(q + 1) = (m +q) 1 (q + 1) 1 (M +q) 1 (q + 1) 1 +o (); (B.54) where (a) is due to Lemma 1 of [48]. We then derive the upper bound for m gc(M) . Recall that m = C 1 Sgc(M) and by using the same derivation as in (B.49), we obtain: m gc(M) =m e gc(M) m P m f=1 logz f e C 1 : (B.55) 303 Then by (B.50) and Lemmas 2 and 3, we obtain: m X f=1 logz f g c (M) (log(1 +q) + (m +q) log(m +q)m (1 +q) log(1 +q) + 1) m g c (M) log 1 1 (M +q) 1 (1 +q) 1 : (B.56) By substituting (B.56) into (B.55), we obtain: m gc(M) m e C 1 e gc(M) m h gc(M) (log(1+q)+(m +q) log(m +q)m (1+q) log(1+q)+1) i e gc(M) m h m gc(M) log 1 1 ((M+q+1) 1 (1+q) 1 ) i =m e C 1 (1 +q) m (m +q) m (m +q) e (1 +q) m (1+q) e m 1 1 1 ((M +q + 1) 1 (1 +q) 1 ) = (1 )e 1 C 1 1 m (m +q) (m +q) q m (1 +q) q m e m (M +q + 1) 1 (1 +q) 1 = (1 )e 1 C 1 1 m M 1 (1 + q m ) e m 1 + q M + 1 M 1 q M + 1 M 1 1 +q m +q (q+1) m = (1 )e 1 C 1 1 m M 1 (1 + q m ) 1 + q M 1 q M 1 q m +q q m +o (): (B.57) Recall thatm = C 1 Sgc(M) andq = C 2 Sgc(M) . Also observe that the final results of (B.54) and (B.57) are identical to (B.48) and (B.52), respectively. By using (B.47), (B.54), and (B.57), and following the similar derivations as in (B.53), we obtain: p o 1 + (1 )e 1 C 1 1 C 1 S g c (M) M 1 C 1 C 1 +C 2 C 2 C 1 +C 2 C 2 C 1 1 + C 2 S gc(M) M 1 C 2 S gc(M) M 1 C 1 S g c (M) M 1 1 + C 2 C 1 1 C 2 C 1 1 1 + C 2 S gc(M) M 1 C 2 S gc(M) M 1 +o (): (B.58) Combining (B.53) and (B.58) completes the proof. 
304 B.7 Proof of Proposition 3 From Proposition 2, when > 1,g c (M) =o(M), andq =o(M), we obtain: p o = 1 + (1 )e 1 C 1 1 C 1 S g c (M) M 1 C 1 C 1 +C 2 C 2 C 1 +C 2 C 2 C 1 1 + C 2 S gc(M) M 1 C 2 S gc(M) M 1 C 1 S g c (M) M 1 1 + C 2 C 1 1 C 2 C 1 1 1 + C 2 S gc(M) M 1 C 2 S gc(M) M 1 = 1 + ( 1)e 1 C 1 1 C 1 S M g c (M) 1 C 1 C 1 +C 2 C 2 C 1 +C 2 C 2 C 1 C 2 S M gc(M) 1 M C 2 Sgc(M)+ M 1 C 1 S M g c (M) 1 C 1 C 2 1 C 1 C 1 +C 2 1 C 2 S M gc(M) 1 M C 2 Sgc(M)+ M 1 = 1 + ( 1)e 1 C 1 1 C 1 S M g c (M) 1 C 1 C 1 +C 2 C 2 C 1 +C 2 C 2 C 1 C 2 S M gc(M) 1 C 1 S M g c (M) 1 C 1 C 2 1 C 1 C 1 +C 2 1 C 2 S M gc(M) 1 = 1 + ( 1)e 1 C 1 1 C 1 C 1 +C 2 C 2 C 1 +C 2 C 2 C 1 C 2 C 1 1 C 1 C 2 1 C 1 C 1 +C 2 1 ! C 2 C 1 1 (B.59) B.8 Proof of Corollary 3 From Proposition 3, we know p o =1 + ( 1)e 1 C 1 1 C 1 C 1 +C 2 C 2 C 1 +C 2 C 2 C 1 C 2 C 1 1 C 1 C 2 1 C 1 C 1 +C 2 1 ! C 2 C 1 1 : (B.60) 305 Then when consideringg c (M) = 1 q S , we obtainC 2 = 1 . Then when increasing 1 to a large number, this obtainC 1 very close to 1 andC 2 very close to zero. As a result, we obtain p o = 1 + ( 1)(1 1 ( 1 )) (1 2 ( 1 )) (1 3 ( 1 )) 4 ( 1 ) (1 5 ( 1 )): (B.61) Since k ( 1 );k = 1;:::; 5; can be arbitrarily small when increasing 1 to be sufficiently large, we conclude thatp o =( 1 ), where( 1 ) is arbitrarily small. Furthermore, if we let 1 !1, it follows that 4 ( 1 ) = 1 ( 1 ) 1 and 5 = 1 ( 1 ) 1 . As a result, we obtain p o = 1 + 1 ( 1 ) 1 1 + 1 ( 1 ) 1 = 1 ( 1 ) 1 : (B.62) B.9 Proof of Theorem 3 The procedure for proving Theorem 3 is same as for Theorem 2. We here consider g c (M) = 1 q S and g c (M) =o(M). Therefore, the average number of users in a cluster isg c (M) =o(M)< M C 1 S . Then since we considerg c (M) = o(M) andg c (M) = 1 q S , where 1 = (1), we haveq = o(M). Consequently by Proposition 3, we obtain the outage probability: p o =1 + ( 1)e 1 C 1 1 C 1 C 1 +C 2 C 2 C 1 +C 2 C 2 C 1 C 2 C 1 1 C 1 C 2 1 C 1 C 1 +C 2 1 ! 
C 2 C 1 1 : (B.63) To derive the achievableT user , the same approach used for deriving Theorem 2 is adopted. Therefore, we obtain: T user =E min u2U E [C u 1 Hu j n;P] ; (B.64) Then again by exploiting Theorem A.1 (see Appendix B.4), we can use the same arguments as in Appendix B.4 andg c (M) = 1 q S , leading to: T user E [E [C user 1 H j n;P]] =C user E [E [1 H j n;P]] =C user P h = (1p o )C user = 1p o K s 1 g c (M) ! = 1p o K s S 1 q ! : (B.65) 306 By combining (B.63) and (B.65), we finally obtain the asymptotic achievable throughput-outage perfor- mance: T (P o ) = (1P o ) K s S 1 q ! ; P o = 1 + ( 1)e 1 C 1 1 C 1 C 1 +C 2 C 2 C 1 +C 2 C 2 C 1 C 2 C 1 1 C 1 C 2 1 C 1 C 1 +C 2 1 ! C 2 C 1 1 : (B.66) B.10 Proof of Theorem 4 The procedure for the proof is as follows. We first consider the network having n =n =!(M) uniformly distributed users, and then derive the outer bound ofT user (n) andp o (n). Then, we compute theT user andp o via accommodating different realizations ofn with high probability. Suppose the network has n = n = !(M) users, where the location placement P of users follows the BPP. We denote(n;P) = P u2U Tu n as the average throughput per user in the network andL(n;P) as the average distance between the source and destination in the network. Using Theorem 4.2 in [102], which describes the upper bound of the transportation capacity of the network for any arbitrary placement of users and choice of transmission powers, we obtain (n;P)L(n;P)n p n : (B.67) Consequently, we obtain(n;P) 1 L(n;P) p n . To compute the upper bound of(n;P), we need to findL(n;P) as described below. First, we provide Lemmas 4 as follows. Lemma 4: Whenn =!(M) users are uniformly distributed within a network with unit size, the proba- bility to haveN D users within an area of sizeA =o N D n is upper bounded byo(1). Proof. We denoten A as the number of users in an area with sizeA. 
Then according to Markov ineqaulity, 307 we can obtain P (n A N D ) E[n A ] N D = nA N D : (B.68) Consequently, by lettingA =o N D n , we complete the proof. We denote n s as the number of different users that a certain user searches through for obtaining the desired file and denotep miss (n) as the probability that this user cannot find the desired file from those users being searched. Then, we provide Lemma 5: Lemma 5: Suppose < 1. We then have the following results: when a user in the network searches through n s = o M S different users, we obtain p miss (n s ) 1 o(1). Furthermore, when a user in the network searches through n s = 0 M different users for some 0 , we have the following results: (i) p miss (n s ) 0 1 ( 0 ) if 0 = (1), where 0 1 ( 0 ) can be arbitrarily small as 0 is large enough; and (ii) p miss (n s ) (1 )e (S 0 ) if 0 =!(1). Proof. See Appendix B.14. From Lemmas 4 and 5, we conclude that to have a non-vanishing probability for a user to obtain the desired file (i.e., p miss (n) does not go to 1), with high probability (w.h.p.), the distance between the source and destination is at least q M Sn . As a result, L(n;P) q M Sn . Furthermore, if we considerL(n;P) = q 0 M Sn , we know that, w.h.p., the distance between a source-destination pair is O q 0 M Sn ; otherwise we should haveL(n;P) = ! q 0 M Sn . As a result, w.h.p., the number of users searched by a user isn s =O 0 M S . Recall thatp o (n) is the outage probability when the network hasn users. Also, note that above arguments apply to any number of usersn = !(M) and any P. Accordingly, by combining this with Lemma 5, it follows that (n;P) s S 0 M ! (B.69) withp o (n) 0 1 ( 0 ) if 0 = (1), where 0 1 ( 0 ) can be arbitrarily small; andp o (n) (1 )e (S 0 ) if 0 = !(1). Then since(n;P) = P u2U Tu n T user (n;P), we conclude that for anyn = !(M), we must 308 have T user (n) s S 0 M ! ;p o (n) 0 1 ( 0 ); if 0 = (1); T user (n) s S 0 M ! 
;p o (n) (1 )e ( 0 ) ; if 0 =!(1): (B.70) To complete the proof of Theorem 4, we uses Lemma 6: Lemma 6: LetN be the density of a Poisson distribution in whichP N (n) = N n n! e N . Then suppose U =o(N), the following can be obtained: U X n=0 P N (n)o(1): (B.71) Proof. We denote X as the Poisson random variable with density N. According to Chernoff bound, we obtain U X n=0 P N (n) =P(XU) =P(e tX e tU ) E[e tX ] e tU =e tU E[e tX ]; (B.72) wheret> 0. Then we observe that the moment generating function ofX givesE[e tX ] = e N(e t 1) . By lettingt = 1, it follows that U X n=0 P N (n)e U e N(1 1 e ) =e (N(1 1 e )U) : (B.73) By lettingU =o(N), we obtain (N(1 1 e )U) = (N)!1. This leads to U X n=0 P N (n)o(1): (B.74) Finally, recall that we considerM = o(N) when < 1. Consequently, for the adopted network, we haveP (n =!(M)) = 1o(1) according to Lemma 6. In other word, w.h.p., we have n = !(M). It follows from (B.70) and Lemma 6 that T user s S 0 M ! ;p o 0 1 ( 0 ); if 0 = (1); T user s S 0 M ! ;p o (1 )e (S 0 ) ; if 0 =!(1): (B.75) This leads to Theorem 4. 309 B.11 Proof of Theorem 5 The procedure for the proof of Theorem 5 is similar to the proof for Theorem 4. Here, we again first obtain (n;P) 1 L(n;P) p n and compute the lower bound ofL(n;P). To do this, we first provide Lemmas 7 and 8 that will be used later: Lemma 7: Whenn =!(q) users are uniformly distributed within a network with unit size, the minimum size of an area to have q S users with high probability is q Sn . Proof. This is proved by using Lemma 4. Lemma 8: Suppose > 1 andn = !(q). 
Consideringq = o(M), we have the following results: (i) when a user searches throughn s = o q S different users in the network, we obtainp miss (n) 1o(1); (ii) when a user searches throughn s = 0 1 q S different users, where 0 1 = (1) > 0, we obtainp miss (n) miss ( 0 1 ), where miss ( 0 1 ) = (1) > 0 can be arbitrarily small; (iii) when a user searches throughn s = 0 1 q S < M S different users, where 0 1 =O q 1 1 !1, we obtain:p miss (n) 1 ( 0 1 ) 1 . Proof. When lettingn s =o q S , by using the similar derivations in (B.89), we obtain: p miss (n) 1 1 1 (Sn s +q) 1 (1 +q) 1 1 1 [(M +q + 1) 1 (1 +q) 1 ] o(1) = 1 (1 +q) 1 (Sn s +q) 1 (1 +q) 1 (M +q) 1 o(1) = 1 1 1+q Sn s+q 1 1 1+q M+q 1 o(1) (a) = 1o(1); (B.76) where (a) is due ton s =o q S . When lettingn s = 0 1 q S , from (B.76), we obtain: p miss (n) 1 1 1+q Sn s+q 1 1 1+q M+q 1 = 1+q Sn s+q 1 1+q M+q 1 1 1+q M+q 1 1 +q Sn s +q 1 1 +q M +q 1 1 +q Sn s +q 1 1 M 1 (a) = miss ( 0 1 )o(1); (B.77) 310 where (a) is due to Lemma 3 and 0 1 = (1) > 0. Finally, by using the similar derivations in (B.89) and (B.77) and lettingn s = 0 1 q S < M S , where 0 1 =O q 1 1 !1, we obtain: p miss (n) 1 1 1 (Sn s +q) 1 (1 +q) 1 + (1 +q) 1 1 [(M +q + 1) 1 (1 +q) 1 ] 1 +q Sn s +q 1 1 1 +q = 1 +q 0 1 q +q 1 1 1 +q = 1 q + 1 0 1 + 1 ! 1 1 1 +q = 1 ( 0 1 ) 1 : (B.78) From Lemmas 7 and 8, we conclude that to have a non-vanishing probability for a user to obtain the desired file, w.h.p., the the distance between the source and destination is at least q q Sn . Furthermore, if we consider L(n;P) = q 0 1 q Sn , we know that (w.h.p.) the distance between a source-destination pair isO q 0 1 q Sn ; otherwise we should haveL(n;P) = ! q 0 1 q Sn . As a result, w.h.p., the number of users searched by a user isn s =O 0 1 q S . Note that above arguments are valid for anyn = !(q) and P. Consequently, by combining this with Lemma 8 and again usingT user (n;P) (n;P) 1 L(n;P) p n , we conclude that for alln =!(q), we must have T user (n) s S 0 1 q ! 
(B.79) withp o (n) miss ( 0 1 ) when 0 1 = (1); andp o (n) 1 ( 0 1 ) 1 when 0 1 =!(1) and 0 1 =o q 1 1 . Finally, recall that we considerq = o(N) when > 1. Consequently, according to (B.79) and Lemma 6, we conclude: T user s S 0 1 q ! ;p o 0 2 ( 0 1 ); (B.80) in which 0 2 ( 0 1 ) = (1) > 0 when 0 1 = (1), where 0 2 ( 0 1 ) can be arbitrarily small; 0 2 ( 0 1 ) = 1 ( 0 1 ) 1 when 0 1 =!(1) and 0 1 =o q 1 1 . 311 B.12 Proof of Theorem 8 We considerm =o(M) andm = Sgc(M) !1 wheng c (M)!1. From (B.47), we know p o = 1 m X f=1 P r (f) +m gc(M) : (B.81) By using the derivations in (B.52) and (B.57) and considering > 1 andq = (1), we obtain: 1 (m ) 1 m gc(M) 1 (m ) 1 : (B.82) Then notice that when > 1,q = (1), andg c (M)!1, we have m X f=1 P r (f) = P m f=1 (f +q) P M f=1 (f +q) = P m +q+1 f=q+1 f P M+q+1 f=q+1 f = P m +q+1 f=1 f P q f=1 f P M+q+1 f=1 f P q f=1 f : (B.83) Observe that P 1 f=1 f converges when > 1. We let P M+q+1 f=1 f = . It follows from (B.83) and Lemma 3 that m X f=1 P r (f) = P M f=m +q+2 f P q f=1 f P q f=1 f = 1 P M f=m +q+2 f P q f=1 f 1 1 1 M 1 (m +q + 2) 1 + (m +q + 2) P q f=1 f = 1 (m +q + 2) 1 M 1 + ( 1)(m +q + 2) ( 1)( P q f=1 f ) = 1 1 (m ) 1 : (B.84) Consequently, by using (B.81), (B.82), and (B.84), we obtainp o 1 (Sgc(M)) 1 . To computeT user , we directly exploit the derivations in (B.65), leading to T user = 1p o K s 1 g c (M) ! : (B.85) Then, sincep o 1 (Sgc(M)) 1 , we can letg c (M) = 0 1;zip S !1, where 0 1;zip is an arbitrary function such that 0 1;zip !1 whenM!1, and obtain the achievable throughput-outage performance: T (P o ) = s S 0 1;zip ! ;P o = 1 ( 0 1;zip ) 1 ! =o(1): (B.86) 312 B.13 Proof of Theorem 10 The proof is identical to Theorem 5 with minor modifications. We first provide Lemma 9: Lemma 9: Suppose > 1,n = !(q), andq = (1). Let 0 2;zip = (1). Whenn s = 0 2;zip q S , we havep miss (n) 1 ( 0 2;zip ) 1 . Proof. 
By using (B.83), (B.84), Lemma 3, andq = (1), we then know: p miss (n) = 1 Sn s X f=1 P r (f) = P M f=Sn s+q+2 f P q f=1 f 1 1 (M + 1) 1 (Sn s +q + 2) 1 P q f=1 f = (Sn s +q + 2) 1 (M + 1) 1 ( 1)( P q f=1 f ) = 1 ( 0 2;zip q) 1 ! = 1 ( 0 2;zip ) 1 ! : (B.87) Then following similar derivations and arguments in Theorem 5, we know when ifL(n) = 0 2;zip q Sn , w.h.p.,n s =O 0 2;zip q S for any user in the network. Therefore, by using Lemma 9 and the same approach as that in Theorem 5, we obtain that the throughput-outage performance of the network is dominated by: T (P o ) = s S 0 2;zip ! ;P o = 1 ( 0 2;zip ) 1 ! : (B.88) B.14 Proof of Lemma 5 The proof of the case thatn s = o M S is simple. In this case, the bestp miss (n s ) happens when all users being visited cache different files. Accordingly, we have p miss (n s ) 1 Sn s X f=1 P r (f) = 1 H(1;Sn s ; ;q) H(1;M; ;q) (a) 1 1 1 (Sn s +q) 1 (1 +q) 1 + (1 +q) 1 1 [(M +q + 1) 1 (1 +q) 1 ] = 1 1 1 (Sn s +q) 1 (1 +q) 1 1 1 [(M +q + 1) 1 (1 +q) 1 ] o(1) (b) = 1o(1); (B.89) where (a) is due to Lemma 3; and (b) is because 1 1 (Sn s +q) 1 (1 +q) 1 1 1 [(M +q + 1) 1 (1 +q) 1 ] =o(1) (B.90) 313 for eitherq =o(M) orq = (M). For the other cases, we recall that we adopt the decentralized random policyP c (). Accordingly, when considering the case thatn s = 0 M, the probability that a user cannot find its desired file, i.e.,p miss (n s ), can be described as [46, 48]: p miss (n s ) = M X f=1 P r (f)(1P c (f)) n s : (B.91) Accordingly, the minimum ofp miss (n s ) can be obtained by optimizing the caching policy using: min M X f=1 P r (f)(1P c (f)) n s s:t: M X f=1 P c (f) =S 0P c (f) 1;8f = 1; 2;:::;M: (B.92) Observe that (B.92) has a very similar optimization problem to that being optimized in Theorem 1 of [48] except that here P M f=1 P c (f) is equal to S instead of 1 and that constraints P c (f) 1;8f; need to be satisfied. 
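The optimization problem (B.92) just stated can also be solved numerically, which gives a useful cross-check of the closed-form solution (B.93) derived next in the text. Below is a Python sketch (our own variable names; a bisection on the Lagrange multiplier stands in for the closed-form normalization): the KKT conditions give P_c(f) = [1 - ν/z_f]^+ with z_f = P_r(f)^{1/(n_s-1)}, and ν is tuned so that the cache constraint Σ_f P_c(f) = S holds.

```python
import numpy as np

def mzipf_pmf(M, gamma, q):
    f = np.arange(1, M + 1)
    w = (f + q) ** (-gamma)
    return w / w.sum()

def optimal_random_caching(P_r, n_s, S):
    """KKT solution of (B.92): P_c(f) = [1 - nu/z_f]^+, z_f = P_r(f)^(1/(n_s-1)).
    The multiplier nu is found by bisection so that sum_f P_c(f) = S."""
    z = P_r ** (1.0 / (n_s - 1))
    lo, hi = 0.0, z.max()            # sum_f P_c decreases from M to 0 as nu grows
    for _ in range(200):
        nu = 0.5 * (lo + hi)
        Pc = np.maximum(1.0 - nu / z, 0.0)
        if Pc.sum() > S:
            lo = nu
        else:
            hi = nu
    return Pc

def p_miss(P_r, Pc, n_s):
    # Probability of not finding the request among n_s independently cached users.
    return float(np.sum(P_r * (1.0 - Pc) ** n_s))

P_r = mzipf_pmf(M=500, gamma=0.8, q=10.0)    # illustrative popularity
n_s, S = 6, 4.0                              # probed users and cache size
Pc_opt = optimal_random_caching(P_r, n_s, S)
Pc_uni = np.full_like(P_r, S / P_r.size)     # uniform caching baseline
```

As expected, the resulting policy caches the most popular files with the highest probability and achieves a miss probability no larger than uniform caching with the same budget S.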
To accommodate the additional constraints thatP c (f) 1;8f, we adopt the same approach as that for proving Theorem 1 of this paper – we assume that these constraints are satisfied when finding the solution, and then show that the assumption is indeed true for the obtained solution. Accordingly, we denote P c;n s (f) as the optimum solution of (B.92); denotem n s as the smallest index such thatP c (m n s + 1) = 0; and letC 2;n s = q n s1 . Then, by following the similar procedure as that for proving Theorem 1 of [48] (see details in Appendix A in [48]), we can obtain: P c;n s (f) = " 1 n s z f;n s # + ; (B.93) where n s = m n s S P M f=1 1 z f;n s ;z f;n s = (P r (f)) 1 n s1 ; and m n s = min C 1;n s n s ;M ; (B.94) whereC 1;n s S is the solution of the equalityC 1;n s = S +C 2;n s log 1 + C 1;n s C 2;n s . It should be noticed that now P M f=1 P c (f) is equal toS instead of 1 when repeating the similar procedure as that in Appendix 314 in [48]. Finally, we can observe thatz f;n s = (P r (f)) 1 n s1 > 0, and thus n s = m n s S P M f=1 1 z f;n s > 0 as well. Accordingly, we observe from (B.93) that the assumption thatP c;n s (f) 1;8f; is true. Based on the derived caching policy P c;n s (), we now derive p miss (n s ). Suppose we let 0 be large enough such that C 1;n s n s M. Note that this is always possible as 0 = (1). We denote = n s 1. Then, noticing that n s = MS P M f=1 1 z f;n s , we obtain: p miss (n s ) = M X f=1 P r (f)(1P c (f)) n s = M X f=1 P r (f) n s z f;n s ! 
n s = ( n s ) n s M X f=1 P r (f) (z f;n s ) n s = 0 @ MS P M f=1 (P r (f)) 1 n s1 1 A n s 0 @ M X f=1 (P r (f)) 1 n s1 1 A = (MS) n s 0 @ M X f=1 (P r (f)) 1 n s1 1 A (n s1) = (MS) n s 0 @ M X f=1 (f +q) H(1;M; ;q) 1 n s1 1 A (n s1) = (MS) n s H(1;M; ;q) 1 P M f=1 (f +q) n s1 n s1 (a) (MS) n s 1 1 [(M +q) 1 (1 +q) 1 ] + (1 +q) 1 R M+1 1 (x +q) dx = (MS) n s 1 1 [(M +q) 1 (1 +q) 1 ] + (1 +q) 1 1 +1 h (M +q + 1) +1 (q + 1) +1 i = (1 )(MS) n s (M +q) 1 (1 +q) 1 + (1 )(1 +q) 1 + h (M +q + 1) +1 (q + 1) +1 i = (1 )(MS) n s (M +q) 1 (1 +q) 1 + (1 )(1 +q) e h (M +q + 1) +1 (q + 1) +1 i = (1 )e (M +q) 1 (1 +q) 1 MS M n s (M) 1 " M +q + 1 M +1 q + 1 M +1 # +o(1); (B.95) where (a) is due to Lemma 1 of [48]. Recall thatn s = 0 M andD = q M . It follows from (B.95) that p miss (n s ) (1 )e 1 S M 0 M (1 +D) 1 (D + 1=M) 1 " 1 +D + 1 M +1 D + 1 M +1 # +o(1) = (1 )e (S 0 ) 1 S M 0 M (1 +D) 1 (D) 1 h (1 +D) +1 (D) +1 i +o(1) (B.96) Finally, by using (B.96) and by letting 0 = (1), but 0 is arbitrarily large, we obtainp miss (n s ) 0 1 ( 0 ), where 0 1 ( 0 ) can be arbitrarily small. Similarly, by letting 0 = !(1), we obtain p miss (n s ) (1 315 )e (S 0 ) =o(1). This completes the proof of Lemma 5. B.15 Proof of the Uniformly Random Matching When a user requests filef andV f is not empty, it needs to pick up a user uniformly random inV f as the source. Then, since every user caches files according to the same randomized caching policy, the uniformly random selection of a user inV f indicates the uniformly random selection of a user in the cluster, because the probabilities for the users to be included inV f are the same. On the other hand, ifV f is empty, the user directly pick a user in the cluster uniformly at random as the source. Combining above statements with the fact that users request files according to the same distribution, we conclude that the proposed scheme in Ch. 4.3.1 matches users in the cluster uniformly at random. 
316 C H A P T E R C Appendices of Chapter 5 C.1 Proof of Lemma 1.2 Proof. Here we only prove the part regarding " M X m=1 a m e (a+ i )bm # a because the part regarding h P M m=1 a m e i bm i a can be similarly proved. Note that P M m=1 a m e (a+ i )bm is convex and non-increasing with respect toB, andx a is convex and non-decreasing when a 1 andx 0. We denote P M m=1 a m e (a+ i )bm asg(b), whereb2B;x a ash(x). Thus, h P M m=1 a m e (a+ i )bm i a = h(g(b)). Suppose 0 1. We observe that g (b 1 + (1)b 2 )g(b 1 ) + (1)g(b 2 ) (C.1) due to convexity, whereb 1 ;b 2 ;b 1 + (1)b 2 2B. Then noticing that 0g (b 1 + (1)b 2 ) 1 and that 0g(b 1 ) + (1)g(b 2 ) 1 due to the facts that 0g(b 1 ) 1 and 0g(b 2 ) 1, we know that h (g (b 1 + (1)b 2 ))h (g(b 1 ) + (1)g(b 2 )) (C.2) 317 due to the non-decreasing property ofh(x);x 0. Finally, by combining the above results and thath(x) is convex whenx 0, we have h (g (b 1 + (1)b 2 ))h (g(b 1 ) + (1)g(b 2 ))h(g(b 1 )) + (1)h(g(b 2 )): (C.3) This proves that h P M m=1 a m e (a+ i )bm i a is convex. To prove that h P M m=1 a m e (a+ i )bm i a is non- increasing, we simply notice that P M m=1 a m e (a+ i )bm is non-increasing andx a is non-decreasing when considering the feasible setB. 318 C H A P T E R D Appendices of Chapter 6 D.1 Challenges, Limitations, and Drawbacks of Kullback-Leibler Distance Based Parameter Estimation When dealing with a dataset with a large amount of raw data and without much understanding of the prop- erties of the dataset, the estimation of the critical parameters is challenging. This is either because the exact properties are unclear or even because we do not know what parameter is most suitable one to estimate. In such a case, the K-L based estimation is useful because it is simple to implement and is intuitively relevant to the fundamental property of statistics, i.e., the K-L distance. K-L based estimation also has its limitations and drawbacks. 
First of all, it can be subject to overfitting, and thus sometimes provides non-smooth results. For example, it can be observed from Figs. 6.12 and 6.13 that the tails of the curves are non-smooth. Besides, the performance of the estimation is difficult to analyze. In fact, we need to resort to a numerical approach, such as bootstrapping, to find the confidence intervals, as presented in Appendix D.4. Moreover, it cannot provide any insight for choosing between different models. To be specific, if we keep adding parameters to the modeling distribution, the K-L distance results of the K-L based estimation might keep improving, which would suggest that the additional parameters provide a better model. However, this might not be true, because increasing the number of parameters in a model can have adverse impacts on other aspects, such as complexity and overfitting. These discussions show that there are non-trivial issues that could be investigated more profoundly to refine the framework. However, since our focus is to provide a complete modeling, parameterization, and preference generation framework based on real-world data, treatments of those issues remain topics for future work.

D.2 Details of the Correlation Analysis Results

Here the results of the correlation analysis are reported in detail. Specifically, we provide several tables presenting the correlation coefficients between different parameters and their corresponding 95% confidence intervals. In Tables D.1 and D.2, the linear correlation results, i.e., the Pearson correlation coefficients of parameters, are presented for June and July, respectively. Note that the confidence intervals in the tables are computed using the standard tool in Matlab (2017). From the tables, we can observe that the confidence intervals regarding parameters of the genre-based conditional popularity and genre ranking distributions are wide.
The main reason could be that we do not have sufficient samples for them. In fact, we can only get around 50 to 110 samples for them because we only haveG = 110 genres. In Tables D.3 and D.4, the rank correlation results, i.e., the Spearman correlation coefficients of param- eters, are presented for June and July, respectively. We note that the correlation coefficient and confidence intervals here are computed by following the definition in (6.22), i.e., we first convert the original values into their corresponding ranks, and then conduct the linear correlation computation. 320 Table D.1: Linear Correlation Results of June Parameters Correlation Confidence Interval out k NNT vs.q out k NNT 0:806 [0:805; 0:807] out k NNT vs.S k NNT 0:080 [0:076;0:072] out k NNT vs.L k NNT 0:015 [0:011; 0:019] q out k NNT vs.S k NNT 0:037 [0:041;0:033] q out k NNT vs.L k NNT 0:003 [0:007;0:001] S k NNT vs.L k NNT 0:236 [0:232; 0:240] out k NT vs.q out k NT 0:578 [0:573; 0:583] out k NT vs.S k NT 0:008 [0:016; 0:000] out k NT vs.L k NT 0:036 [0:029; 0:044] q out k NT vs.S k NT 0:038 [0:030; 0:046] q out k NT vs.L k NT 0:016 [0:008; 0:024] S k NT vs.L k NT 0:215 [0:207; 0:222] in g NNT vs.q in g NNT 0:281 [0:006; 0:517] in g NT vs.q in g NT 0:366 [0:032; 0:627] a rk g vs.b rk g 0:570 [0:406; 0:698] 321 Table D.2: Linear Correlation Results of July Parameters Correlation Confidence Interval out k NNT vs.q out k NNT 0:812 [0:811; 0:813] out k NNT vs.S k NNT 0:345 [0:349;0:341] out k NNT vs.L k NNT 0:091 [0:087; 0:095] q out k NNT vs.S k NNT 0:166 [0:170;0:162] q out k NNT vs.L k NNT 0:067 [0:063; 0:071] S k NNT vs.L k NNT 0:281 [0:277; 0:285] out k NT vs.q out k NT 0:605 [0:600; 0:610] out k NT vs.S k NT 0:030 [0:037;0:022] out k NT vs.L k NT 0:063 [0:055; 0:070] q out k NT vs.S k NT 0:188 [0:180; 0:195] q out k NT vs.L k NT 0:068 [0:060; 0:075] S k NT vs.L k NT 0:266 [0:260; 0:274] in g NNT vs.q in g NNT 0:301 [0:044; 0:521] in g NT vs.q in g NT 0:286 [0:076; 0:581] a rk g vs.b rk g 
0:605 [0:453; 0:723] 322 Table D.3: Rank Correlation Results of June Parameters Correlation Confidence Interval out k NNT vs.q out k NNT 0:714 [0:712; 0:716] out k NNT vs.S k NNT 0:074 [0:078;0:070] out k NNT vs.L k NNT 0:022 [0:018; 0:026] q out k NNT vs.S k NNT 0:024 [0:028;0:020] q out k NNT vs.L k NNT 0:002 [0:006; 0:002] S k NNT vs.L k NNT 0:171 [0:167; 0:175] out k NT vs.q out k NT 0:666 [0:662; 0:671] out k NT vs.S k NT 0:011 [0:003; 0:019] out k NT vs.L k NT 0:060 [0:053; 0:068] q out k NT vs.S k NT 0:044 [0:036; 0:052] q out k NT vs.L k NT 0:021 [0:013; 0:029] S k NT vs.L k NT 0:155 [0:147; 0:162] in g NNT vs.q in g NNT 0:386 [0:123; 0:598] in g NT vs.q in g NT 0:341 [0:003; 0:609] a rk g vs.b rk g 0:625 [0:475; 0:739] a rk g vs. Global Rank 0:268 [0:059; 0:455] b rk g vs. Global Rank 0:303 [0:485;0:096] 323 Table D.4: Rank Correlation Results of July Parameters Correlation Confidence Interval out k NNT vs.q out k NNT 0:748 [0:746; 0:749] out k NNT vs.S k NNT 0:345 [0:349;0:342] out k NNT vs.L k NNT 0:128 [0:124; 0:132] q out k NNT vs.S k NNT 0:125 [0:129;0:121] q out k NNT vs.L k NNT 0:082 [0:078; 0:086] S k NNT vs.L k NNT 0:249 [0:245; 0:252] out k NT vs.q out k NT 0:702 [0:698; 0:706] out k NT vs.S k NT 0:050 [0:042; 0:058] out k NT vs.L k NT 0:115 [0:108; 0:123] q out k NT vs.S k NT 0:210 [0:202; 0:217] q out k NT vs.L k NT 0:099 [0:092; 0:107] S k NT vs.L k NT 0:274 [0:267; 0:281] in g NNT vs.q in g NNT 0:503 [0:279; 0:675] in g NT vs.q in g NT 0:301 [0:060; 0:592] a rk g vs.b rk g 0:610 [0:459; 0:726] a rk g vs. Global Rank 0:146 [0:069; 0:345] b rk g vs. Global Rank 0:385 [0:550;0:190] 324 D.3 Generation of Correlated Random Samples with Arbitrary Distribu- tions using Rank Correlation Here the implementation recipe of the rank-based parameter generation defined in (6.23) is provided. The generation is based on the Gaussian copula and rank correlation and can be realized by ”Statistics and Machine Learning” toolbox in Matlab (2017). 
We consider the parameter vector x whose dimension is N, i.e., we consider N parameters. We suppose the rank correlation of x, i.e., C^Rn_x, and the marginal distributions {f_x} are given. To generate M dependent instances of the N parameters, the first step is to generate M samples of each parameter, v_{x_1}, ..., v_{x_N}, using their marginal distributions f_{x_1}, ..., f_{x_N}, where x_n is the nth parameter in x and each v_{x_n} is a vector with dimension M. This can be implemented using randsrc in Matlab. Step 2 is to convert C^Rn_x into its corresponding linear correlation coefficient C^Ln_x by exploiting properties of the Gaussian copula. This can be implemented using copulaparam in Matlab. Step 3 is to generate the Gaussian copula random numbers u_{x_1}, u_{x_2}, ..., u_{x_N} using C^Ln_x, where the dimension of each u_{x_n} is M. This step can be implemented using copularnd in Matlab. Step 4 is to sort each u_{x_n} in ascending order and record the indices of the ordering in i_n. Step 5 is to rearrange the order of each v_{x_n} using i_n, such that v_{x_n} follows the order of u_{x_n}. The pseudocode of the generation approach is provided in Alg. 5 with the corresponding implementation functions in Matlab. Note that the final y_n in Alg. 5 contains the samples for parameter x_n, for all n, and the collection of an N-tuple (y_1(m), y_2(m), ..., y_N(m)) is a dependent instance of x. We note that a simple but representative example of this generation approach can be found in [213].

Algorithm 5 Implementation Recipe of the Rank-Based Parameter Generation
1: v_{x_n} = randsrc(M, f_{x_n}), for all n; implemented using randsrc in Matlab.
2: C^Ln_x = copulaparam('Gaussian', C^Rn_x, 'type', 'Spearman'); implemented using copulaparam in Matlab.
3: u_{x_n} = copularnd('Gaussian', C^Ln_x, M), for all n; implemented using copularnd in Matlab.
4: i_n = sort(u_{x_n}), for all n; implemented using sort in Matlab.
5: y_n(i_n) = sort(v_{x_n}), for all n.
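The five steps of Alg. 5 can be sketched in Python as follows (a minimal analogue of the Matlab recipe; the function and variable names are ours, and the step-2 conversion uses the standard Spearman-to-Pearson relation for a Gaussian copula, ρ = 2 sin(π ρ_s / 6), which is what copulaparam computes):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

def rank_based_generation(marginal_samplers, C_rank, M):
    """Sketch of Alg. 5: marginal_samplers[n](M) plays the role of randsrc."""
    # Step 2 (copulaparam): Spearman -> Pearson for a Gaussian copula.
    C_lin = 2.0 * np.sin(np.pi * np.asarray(C_rank) / 6.0)
    N = C_lin.shape[0]
    # Step 3 (copularnd): Gaussian copula random numbers u_{x_n}.
    u = norm.cdf(rng.multivariate_normal(np.zeros(N), C_lin, size=M))
    out = np.empty((M, N))
    for n in range(N):
        v = np.sort(marginal_samplers[n](M))        # step 1, then sort
        ranks = np.argsort(np.argsort(u[:, n]))     # steps 4-5: impose the
        out[:, n] = v[ranks]                        # rank order of u on v
    return out

# Illustrative usage: two parameters with a target Spearman correlation of 0.8.
C_rank = np.array([[1.0, 0.8], [0.8, 1.0]])
samplers = [lambda M: rng.exponential(1.0, M), lambda M: rng.uniform(0.0, 1.0, M)]
y = rank_based_generation(samplers, C_rank, M=4000)
```

The output columns keep their prescribed marginals exactly (they are permutations of the marginal samples), while their rank correlation matches the target up to sampling error.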
D.4 Details of the Parameterization Results and Individual Preference Probability Generation Approach

In this appendix, the complete parameterization results of the models are first provided. Then the complete implementation flow chart of the generation approach specifically used for the numerical results in Section VII.C is presented. This implementation can be regarded as an example of using the modeling framework and generation approach. Finally, we offer additional details for generating the numerical results.

The complete parameterization results and their corresponding 95% confidence intervals are provided in Tables D.9 and D.10 at the end of the appendices. We note that when a parameterization is conducted using the ML approach, the confidence interval is provided by the standard approach (from the Matlab toolbox). However, when considering parameterization using the K-L approach, the confidence interval is calculated via bootstrapping [228], which is a Monte-Carlo based approach, with 1000 bootstrapping samples. Note that the bootstrapping confidence interval calculations can be implemented using the function bootci in the "Statistics and Machine Learning" toolbox in Matlab (2017). The results are shown in Tables D.9 and D.10 for June and July, respectively. Since the parameters directly given by the setup or environment do not have confidence intervals, we mark their confidence intervals as "None" in the tables.

With those fundamental parameters specified, we can then generate the individual preference probabilities of users by using the modeling framework and generation approach. Therefore, we provide a complete flow diagram of the implementation recipe of the generation approach in Fig. D.16 at the end of the paper.
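The bootstrapping confidence-interval calculation described above (the role played by Matlab's bootci) amounts to resampling the data with replacement, recomputing the statistic, and taking percentile quantiles. A minimal Python sketch, with illustrative data:

```python
import numpy as np

def bootstrap_ci(data, statistic, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval (a sketch analogous to bootci)."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data)
    stats = np.array([statistic(rng.choice(data, size=data.size, replace=True))
                      for _ in range(n_boot)])
    return tuple(np.quantile(stats, [alpha / 2.0, 1.0 - alpha / 2.0]))

rng = np.random.default_rng(1)
sample = rng.normal(loc=5.0, scale=2.0, size=300)   # illustrative data
lo, hi = bootstrap_ci(sample, np.mean)              # 95% CI for the mean
```

The same routine applies to any estimator whose sampling distribution is hard to derive analytically, which is exactly the situation with the K-L based parameter estimates.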
In the figure, the rectangular blocks with sharp corners, i.e., the red blocks, correspond to the steps of the parameter generation; the rectangular blocks with rounded corners, i.e., the blue blocks, correspond to the steps of the main modeling framework; and the ellipse blocks, i.e., the green blocks, correspond to the final generated results. We note that the flow in Fig. D.16 corresponds to the dataset adopted in this work, in the sense that the detailed structure of the parameter generation is constructed according to the parameterization and correlation analysis results of the adopted dataset, i.e., we jointly generate those parameters that are correlated with one another. This also implies that if the parameterization and/or correlation results are different (when another dataset is used), the structure of the parameter generation needs to be fine-tuned accordingly. We also note that M_g, for all g, i.e., the numbers of files of each genre, are parameters determined by the emulation setup.

Finally, as mentioned in Ch. 6.7.3, the numerical validations in Figs. 6.19 and 6.20 require the specifically provided numerical values of γ_g^in, q_g^in, g = 1, ..., 30, and the calibrated numerical values of M_g, for all g, of the dataset. Their values are respectively provided in Tables D.11, D.12, and D.13 at the end of the appendices.

D.5 Empirical Justifications for the Proposed Modeling

Here we report the K-L distance and K-S test results of our models. We first report the average K-L distance for the models in Ch. 6.3 and Ch. 6.4, and for the loading distribution, in Table D.5. From the table, we can see that the K-L distances are all small, indicating the effectiveness of the proposed models. For the statistical representation provided in Ch. 6.5, our goal is to reduce the description complexity of the parameter set. We thus express the parameterization either by well-known distributions or by certain specifically designed distributions.
For the specifically designed distributions, we provide the K-L distance to show that our representation is effective; for the well-known distributions, we conduct the K-S test at the standard significance
The Facebook dataset contains records from the on-demand video accesses, which are firstly generated in the form of live videos in a live social broadcast platform, Facebook Live [229], and then change to on- demand videos after the conclusion of the live broadcast. We collected a large-scale dataset comprising of interactions from users during eight months. As a part of the crawl, we collect all the comments made during the eight months for the videos that were made available after the live broadcast. Since comments are one 328 Table D.5: K-L Distance Results of Proposed Models in Ch. 6.3 and Ch. 6.4, and loading distribution Modeling Target Dataset K-L Distance Individual genre popularity June 0:0277 Individual genre popularity July 0:0279 Genre-based conditional popularity June 0:0731 Genre-based conditional popularity July 0:0770 Size distribution June 0:0067 Size distribution July 0:0070 Genre appearance probability June 0:0701 Genre appearance probability July 0:0832 Ranking distribution June 0:0311 Ranking distribution July 0:0321 Loading distribution June 0:0015 Loading distribution July 0:0021 Table D.6: K-L Distance Results of the Specifically Designed Distributions in Ch. 6.5 Modeling Target Dataset K-L Distance out k NNT June 0:0865 out k NNT July 0:0808 q out k NNT June 0:0352 q out k NNT July 0:0361 329 Table D.7: K-S Test Results of the Well-Known Distributions in Ch. 
6.5
Modeling Target | Dataset | K-S Test
γ_k^out NT | June | The test rejects the null hypothesis
γ_k^out NT | July | The test rejects the null hypothesis
q_k^out NT | June | The test rejects the null hypothesis
q_k^out NT | July | The test rejects the null hypothesis
γ_g^in NNT | June | The test cannot reject the null hypothesis
γ_g^in NNT | July | The test cannot reject the null hypothesis
q_g^in NNT | June | The test cannot reject the null hypothesis
q_g^in NNT | July | The test cannot reject the null hypothesis
a_g^rk | June | The test cannot reject the null hypothesis
a_g^rk | July | The test cannot reject the null hypothesis
b_g^rk | June | The test cannot reject the null hypothesis
b_g^rk | July | The test cannot reject the null hypothesis

Table D.8: K-L Distance Results of the Quantized Well-Known Distributions in Ch. 6.5
Modeling Target | Dataset | K-L Distance
γ_k^out NT | June | 0.0188
γ_k^out NT | July | 0.0192
q_k^out NT | June | 0.0572
q_k^out NT | July | 0.0527

of the major forms of user interaction with the video, we use the records of comments as indications of accesses from the users. While we realize that the commenters are only a subset of the viewers, this was the only way the data could be obtained. Since we are interested in the genre of the user accesses, we focus on the page videos, which are typically maintained by various organizations, e.g., political parties, news channels, and sports teams, as the videos published from a page are usually tagged with categories. Note that the number of genres in this dataset is 35. To this end, we have collected 3.8M users accessing 123K categorized videos, which we will use for the analysis of individual preference modeling.^1 While the amount of data is not as large as that of the BBC iPlayer and might not be able to provide equally reliable results, we believe it includes a significant proportion of a new and widely accessed live medium, indicating that our modeling approach is applicable to more scenarios than the BBC iPlayer.
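The two kinds of checks reported in Tables D.5 through D.8 — a K-S test against a fitted well-known distribution, and a binned K-L distance between data and model — can be outlined in Python as follows (a sketch on synthetic data, since the access records themselves are not public; the lognormal choice is purely illustrative):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
data = rng.lognormal(mean=1.0, sigma=0.5, size=5000)   # stand-in for real data

# K-S test against the fitted well-known distribution (cf. Table D.7).
shape, loc, scale = stats.lognorm.fit(data, floc=0)
ks_stat, p_value = stats.kstest(data, 'lognorm', args=(shape, loc, scale))

# Binned K-L distance between the empirical histogram and the model
# (cf. Tables D.5, D.6, D.8), computed on shared bin edges.
edges = np.linspace(data.min(), data.max(), 51)
emp, _ = np.histogram(data, bins=edges)
emp = emp / emp.sum()
model = np.diff(stats.lognorm.cdf(edges, shape, loc, scale))
model = model / model.sum()
mask = emp > 0                                         # 0 * log 0 contributes 0
kl = float(np.sum(emp[mask] * np.log(emp[mask] / model[mask])))
```

With a well-matched model the K-L distance lands in the same small range as the values tabulated above, while a K-S rejection with very many samples can occur even for visually good fits, which is the effect discussed for the quantized parameters.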
We emphasize that collecting such data is very challenging due to privacy considerations. We also note that although both the BBC iPlayer and Facebook datasets contain on-demand video streaming, they feature different types of content.

Based on the Facebook dataset, we apply the proposed modeling framework. We again consider the unique-access feature and consider only those users with at least 10 unique accesses. The genre-based structure is again used, and the modeling results are presented in the following. Figs. D.1 and D.2 show the results for the individual genre popularity distributions. From the figures, we observe good matches between the real data and the proposed model. The results for the genre-based conditional popularity distributions are shown in Figs. D.3 and D.4. Again, we see that the proposed model fits the real data. We note that, considering all the data on hand, the MZipf distribution can effectively model all the distributions involving genre popularity. In Figs. D.5 and D.6, we compare the real data with the proposed model of the ranking distribution. From the figures, we can see that almost all the plots show an excellent match between the proposed model and the real data; the one exception is discussed below.

¹ Although we collected data from many users, only approximately seven thousand of them are useful for the analysis, i.e., only a few can provide at least 10 identifiable unique accesses.

Figure D.1: Exemplary comparisons between the model and real data of individual genre popularity distributions ((a) Case 1, (b) Case 2).
Figure D.2: Exemplary comparisons between the model and real data of individual genre popularity distributions ((a) Case 3, (b) Case 4).
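The MZipf (Mandelbrot-Zipf) distribution referred to above has a rank-based pmf of the form p(m) ∝ (m + q)^(-γ). The following sketch illustrates this form under hypothetical parameter values (the function name and the parameters M, gamma, and q are illustrative, not taken from the manuscript's code):

```python
def mzipf_pmf(M, gamma, q):
    """Mandelbrot-Zipf pmf over ranks m = 1..M: p(m) ∝ (m + q)^(-gamma)."""
    weights = [(m + q) ** (-gamma) for m in range(1, M + 1)]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical parameters: 100 items, skew gamma = 1.2, plateau factor q = 5.
pmf = mzipf_pmf(100, 1.2, 5.0)
assert abs(sum(pmf) - 1.0) < 1e-9
# With q > 0 the head of the distribution flattens relative to a pure Zipf
# law, which is what makes MZipf a better fit for measured popularity data.
print(pmf[0] / pmf[1])  # ratio close to 1 indicates a flattened head
```

Setting q = 0 recovers the classical Zipf distribution as a special case.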
Figure D.3: Exemplary comparisons between the model and real data of genre-based conditional popularity distributions ((a) Case 1, (b) Case 2).
Figure D.4: Exemplary comparisons between the model and real data of genre-based conditional popularity distributions ((a) Case 3, (b) Case 4).

The exception is Fig. D.6b, which was specifically chosen to demonstrate one of the few cases in which the real data does not match the proposed model so effectively. Rather, the results match the model proposed in our conference version [80], i.e., the double-sided Zipf distribution, which corresponds to the case where we consider only high-frequency users; a possible explanation is that the Facebook dataset on hand might not be large enough to include a sufficient number of regular users, leading to a higher emphasis on high-frequency users. We compare the proposed models of the appearance probabilities, the size distribution, and the loading distribution with the real data in Figs. D.7, D.8, and D.9, respectively. From all figures, we can again observe excellent matches between the proposed models and the real data. It is also interesting to note that, different from the results of the BBC iPlayer, the Facebook results show a concentration on ranks equal to one or two and on a genre-list size equal to one. In other words, in the Facebook dataset, most people concentrate on merely one or two areas of interest. Finally, we validate the proposed generation approach without considering the statistical representations of parameters proposed in Ch. 6.5, i.e., we simply use the numerical values of the models in Ch. 6.3 and Ch. 6.4 to generate the global popularity distribution.
This is because, from the results of the parameterization, the number of genres, i.e., 35, appears to be insufficient to obtain reliable statistical representations of the parameters; this is discussed further below. In Fig. D.10, the global popularity distribution of the real data is compared with the one generated by the proposed approach. From the figure, we can see that the proposed approach generates a result close to the real data. This validates the effectiveness of the proposed models and the generation approach. In summary, the above results confirm that the proposed modeling framework remains effective when another dataset is adopted. While this is of course not conclusive proof that the framework will work for all possible datasets, it at least indicates that two quite different types of video services can be described well by our model, and the framework thus might serve other researchers in the modeling of their data. We now turn to the statistical representations of parameters in the modeling framework for the Facebook dataset.

Figure D.5: Exemplary comparisons between the model and real data of ranking distributions ((a) Case 1, (b) Case 2).
Figure D.6: Exemplary comparisons between the model and real data of ranking distributions ((a) Case 3, (b) Case 4).
Figure D.7: Comparisons between the model and real data of genre appearance probabilities.
Figure D.8: Comparisons between the model and real data of the size distribution.
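The generation approach validated in Fig. D.10 aggregates individually generated preference distributions into a global popularity distribution. A minimal sketch of this aggregation idea, assuming a simple loading-weighted mixture and omitting the genre structure of (6.26) (the user pmfs and loadings below are hypothetical):

```python
def global_popularity(individual_pmfs, loadings):
    """Aggregate per-user preference pmfs into a global popularity
    distribution, weighting each user by its request loading.

    A hypothetical stand-in for the aggregation step; the real model in
    Ch. 6 also carries a genre structure that is omitted here.
    """
    n_files = len(individual_pmfs[0])
    total_load = sum(loadings)
    return [sum(load * pmf[m] for load, pmf in zip(loadings, individual_pmfs))
            / total_load
            for m in range(n_files)]

# Three hypothetical users with different preferences and request loadings.
users = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.3, 0.3, 0.4]]
loads = [10, 5, 1]
g = global_popularity(users, loads)
assert abs(sum(g) - 1.0) < 1e-9
print(g)  # heavy users dominate the shape of the global distribution
```

The sketch makes explicit why the global distribution is not simply an unweighted average: users with higher loadings pull the global popularity toward their own preferences.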
Figure D.9: Comparisons between the model and real data of the loading distribution.
Figure D.10: Comparison between the global popularity distributions of files from the proposed generation approach and the real data.

This part corresponds to Ch. 6.5 of the manuscript. In Figs. D.11 and D.12, the statistical representations of parameters of the individual genre popularity distributions are shown. It can be observed that the proposed statistical modeling is effective, although the specific parameter values differ. We can also see that the plot of the real data exhibits some jumps, indicating that the data volume of the Facebook dataset might not be sufficient. In Figs. D.13 and D.14, the statistical representations of parameters of the genre-based conditional popularity distributions are shown. From the figures, we can see that the proposed model is effective, while the amount of data is somewhat insufficient because we only have 35 different genres in the Facebook dataset. Finally, we present the statistical representations of parameters of the ranking distributions in Fig. D.15. From the figures, we can see that the proposed model cannot successfully characterize the negative part of a_rk_g in the real data. Note that the proposed model can match the real data if we exclude the negative values of a_rk_g. Although the results in Figs. D.13-D.15 show that the proposed models might be able to characterize the real data in some cases, the results are considered unreliable because we only have 35 genres. On the other hand, since we only have a small number of genres, the complexity of using the purely numerical values instead of the statistical representations for these parameters might be acceptable in this case.
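The empirical-CDF comparisons in Figs. D.11-D.15 and the K-S tests in Table D.7 both rest on the one-sample Kolmogorov-Smirnov statistic, the largest gap between the empirical CDF and the model CDF. A self-contained sketch of that statistic (the uniform model and the sample values are hypothetical placeholders):

```python
def ks_statistic(samples, model_cdf):
    """One-sample Kolmogorov-Smirnov statistic:
    sup_x |F_n(x) - F(x)|, evaluated at the sorted sample points."""
    xs = sorted(samples)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = model_cdf(x)
        # Compare F(x) against the empirical CDF just before and at x.
        d = max(d, abs((i + 1) / n - f), abs(i / n - f))
    return d

# Hypothetical check of a small sample against a uniform [0, 1] model.
uniform_cdf = lambda x: min(max(x, 0.0), 1.0)
samples = [0.05, 0.22, 0.41, 0.58, 0.77, 0.93]
print(ks_statistic(samples, uniform_cdf))
```

A K-S test then compares this statistic against a critical value for the sample size: if the gap is small enough, the test cannot reject the null hypothesis that the data follow the model, which is the favorable outcome reported for most rows of Table D.7.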
Overall, the proposed statistical representations of parameters are effective in several cases, while there are some exceptions to handle. That being said, the parameterization results, in terms of the specific shapes of the statistical representations, are different from those of the BBC iPlayer. This reflects the statement in the main body of the manuscript that "the conclusion of the specific values in the modeling is valid only for the adopted dataset, and the extension of this conclusion to other types of video service should undergo careful examination."

Figure D.11: Comparisons between the model and real data (empirical CDFs; (a) out_k with NNT, (b) q_out_k with NNT).
Figure D.12: Comparisons between the model and real data (empirical CDFs; (a) out_k with NT, (b) q_out_k with NT).
Figure D.13: Comparisons between the model and real data (empirical CDFs; (a) in_g with NNT, (b) q_in_g with NNT).
Figure D.14: Comparisons between the model and real data (empirical CDFs; (a) in_g with NT, (b) q_in_g with NT).
Figure D.15: Comparisons between the model and real data (empirical CDFs; (a) a_rk_g, (b) b_rk_g).

Table D.9: Parameterization Results

Parameter    Value (June)  Conf. Interval (June)  Value (July)  Conf. Interval (July)
G            110           None                   110           None
a_Si         4.95          [4.93, 4.97]           4.10          [4.10, 4.10]
b_Si         0.65          [0.65, 0.65]           0.50          [0.50, 0.50]
M_Si         55            None                   64            None
ap           0.10          [0.10, 0.10]           0.10          [0.10, 0.10]
N_ap         0.84          [0.84, 0.84]           0.86          [0.86, 0.86]
P_NNT^out    0.784         None                   0.795         None
a_1,ga^out   7.31          [6.70, 8.38]           7.91          [7.31, 9.03]
b_1,ga^out   0.35          [0.27, 0.39]           0.34          [0.27, 0.38]
a_2,ga^out   4.5           [3.87, 4.82]           4.7           [4.12, 5.00]
b_2,ga^out   19.6          [19.55, 19.67]         19.6          [19.54, 19.70]
a_3,ga^out   24360         [21764, 27952]         25148         [21853, 30849]
b_3,ga^out   0.0008        [0.0007, 0.0009]       0.0008        [0.0007, 0.0009]
c_1,ga^out   0.31          [0.29, 0.33]           0.33          [0.30, 0.34]
c_3,ga^out   0.30          [0.29, 0.31]           0.34          [0.33, 0.35]
a_1,q^out    1.20          [1.14, 1.25]           1.24          [1.20, 1.30]
b_1,q^out    3.88          [3.41, 4.35]           3.60          [3.06, 3.87]
a_2,q^out    16            [14.7, 17.4]           15            [13.5, 15.8]
b_2,q^out    39            [39, 39]               39            [39, 39]
a_3,q^out    26836         [26109, 27521]         23713         [23027, 24393]
b_3,q^out    0.0015        [0.0015, 0.0015]       0.0017        [0.0016, 0.0017]
c_1,q^out    0.44          [0.43, 0.45]           0.45          [0.43, 0.46]
c_3,q^out    0.24          [0.24, 0.24]           0.18          [0.18, 0.18]

Table D.10: Parameterization Results (continued)

Parameter    Value (June)  Conf. Interval (June)  Value (July)  Conf. Interval (July)
ga^out       0.034         [0.037, 0.031]         0.042         [0.039, 0.045]
ga^out       0.247         [0.246, 0.248]         0.232         [0.230, 0.233]
a_q^out      1.080         [1.067, 1.092]         1.099         [1.086, 1.111]
b_q^out      0.850         [0.839, 0.861]         0.855         [0.844, 0.867]
P_NNT^in     0.71          None                   0.66          None
ga^in        1.27          [0.90, 1.63]           1.26          [0.94, 1.58]
ga^in        0.80          [0.37, 1.40]           0.72          [0.58, 0.89]
S_ga^in      0.1           None                   0.1           None
T_ga^in      20            None                   20            None
q^in         3.36          [2.77, 3.96]           3.19          [2.52, 3.87]
q^in         1.26          [1.00, 1.59]           1.52          [1.21, 1.89]
S_q^in       0.0           None                   0.0           None
T_q^in       200           None                   200           None
a_ga^in      0.88          [0.74, 1.07]           1.10          [0.91, 1.35]
b_ga^in      1.87          [1.42, 2.47]           1.85          [1.39, 2.46]
a_q^in       1.87          [1.18, 2.97]           1.28          [0.61, 2.70]
b_q^in       0.72          [0.37, 1.41]           0.68          [0.29, 1.61]
rk_a         5.13          [4.72, 5.59]           5.18          [4.73, 5.68]
rk_a         2.66          [2.26, 3.13]           2.43          [2.08, 2.83]
rk_b         9.26          [6.89, 12.45]          7.59          [5.68, 10.13]
rk_b         0.063         [0.047, 0.086]         0.072         [0.054, 0.098]
Ld           4.6           [4.54, 4.73]           4.4           [4.34, 4.54]
q_Ld         64            [62.84, 67.68]         60            [58.86, 62.74]
L            3197          None                   3080          None

Table D.11: Numerical Results of in_g and q_in_g

g    in_g (June)  q_in_g (June)  in_g (July)  q_in_g (July)
1    1.9          200            1.9          200
2    5.2          200            2.2          110
3    2.9          146            4.5          200
4    2.5          64             2.8          160
5    2.5          200            2.6          200
6    10.1         200            5.6          200
7    6.1          94             3.7          152
8    4.6          32             5.4          138
9    2.5          16             2.6          50
10   5.4          200            2.3          18
11   4.4          88             2.9          32
12   2.4          6              5.6          54
13   2.4          14             6.5          200
14   17.2         200            18.8         200
15   1.9          14             1.9          6
16   10.8         200            20.0         96
17   9.2          66             17.7         200
18   1.7          4              15.2         166
19   4.3          44             1.6          8
20   20.0         156            8.6          64
21   10.2         48             3.9          42
22   20.0         76             10.0         200
23   12.5         52             2.3          16
24   3.6          26             19.9         76
25   3.5          8              19.6         36
26   1.6          2              5.2          32
27   20.0         88             5.2          54
28   13.0         12             2.6          8
29   9.2          84             13.3         100
30   14.4         146            1.5          4

Table D.12: Numerical Results of M_g of June

g 1-15:    2955 996 1103 4548 820 882 842 617 2080 344 1379 1564 3042 223 1694
g 16-30:   238 283 3025 1016 56 203 80 166 787 357 1818 43 16 191 277
g 31-45:   604 12 294 56 58 228 16 244 1565 795 1889 208 57 147 16
g 46-60:   367 25 776 202 18 32 412 359 12 267 954 5 23 250 39
g 61-75:   6 357 7 439 13 208 7 103 97 58 34 116 4 131 3
g 76-90:   60 58 16 38 34 50 62 25 1 3 17 37 12 5 22
g 91-105:  11 19 25 18 4 7 12 5 5 4 4 6 6 6 2
g 106-110: 4 1 1 2 1

Table D.13: Numerical Results of M_g of July

g 1-15:    2827 1468 1034 5042 865 1004 568 738 1501 2033 3234 617 330 91 3046
g 16-30:   160 210 259 1722 280 889 259 1648 87 33 134 271 829 194 1760
g 31-45:   45 331 323 171 3 58 231 26 1544 1666 19 520 13 34 18
g 46-60:   11 46 365 219 44 541 60 341 16 3 29 601 297 9 252
g 61-75:   40 411 1104 373 309 79 6 227 23 9 40 38 200 56 81
g 76-90:   3 24 11 120 14 3 16 91 4 16 7 46 47 44 7
g 91-105:  23 14 5 1 18 9 7 9 4 6 11 4 5 3 3
g 106-110: 4 4 4 1 1

Figure D.16: Complete flow diagram of the individual preference probability and global popularity distribution generation.
CHAPTER E
Appendices of Chapter 7

E.1 Derivations of the Expected Utility

We first derive the expression of $U$. Using (7.1), (7.2), and (7.3), we obtain

$$
\begin{aligned}
U &= \sum_{k\in\mathcal{U}_A}\frac{w_k}{K_A}\,\mathbb{E}\Bigg\{U_D\Bigg(1-\sum_{m=1}^{M}a_m^k\Bigg[\prod_{l\in\mathcal{U}}\big(1-b_m^l\,\mathbf{1}_{\{h_{k,l},C\}}\big)\Bigg]-\sum_{m=1}^{M}a_m^k b_m^k\Bigg) \\
&\qquad\qquad + U_B\sum_{m=1}^{M}a_m^k\Bigg[\prod_{l\in\mathcal{U}}\big(1-b_m^l\,\mathbf{1}_{\{h_{k,l},C\}}\big)\Bigg]+U_S\sum_{m=1}^{M}a_m^k b_m^k\Bigg\} \\
&= \sum_{k\in\mathcal{U}_A}\frac{w_k U_D}{K_A}+(U_B-U_D)\sum_{k\in\mathcal{U}_A}\sum_{m=1}^{M}\frac{w_k a_m^k}{K_A}\Bigg[\prod_{l\in\mathcal{U}}\big(1-b_m^l\,\mathbb{E}_h\{\mathbf{1}_{\{h_{k,l},C\}}\}\big)\Bigg]+(U_S-U_D)\sum_{m=1}^{M}\sum_{k\in\mathcal{U}_A}\frac{w_k a_m^k b_m^k}{K_A} \\
&= \sum_{k\in\mathcal{U}_A}\frac{w_k U_D}{K_A}+(U_B-U_D)\sum_{m=1}^{M}\underbrace{\sum_{k\in\mathcal{U}_A}\frac{w_k a_m^k}{K_A}\Bigg[\prod_{l\in\mathcal{U}}\big(1-b_m^l L_{k,l}\big)\Bigg]}_{S_m}+(U_S-U_D)\sum_{m=1}^{M}\sum_{k\in\mathcal{U}_A}\frac{w_k a_m^k b_m^k}{K_A}.
\end{aligned}
\tag{E.1}
$$

By using (E.1), we thus obtain

$$
\begin{aligned}
U_{\mathrm{net}} &= \sum_{k\in\mathcal{U}_A}\frac{w_k U_D}{K_A}+(U_B-U_D)\sum_{m=1}^{M}S_m+(U_S-U_D)\sum_{m=1}^{M}\sum_{k\in\mathcal{U}_A}\frac{w_k a_m^k b_m^k}{K_A}+U_S\sum_{m=1}^{M}\sum_{k\in\mathcal{U}_A}\sum_{l\in\mathcal{U}_A,\,l\neq k}\frac{w_l a_m^l b_m^l}{K_A} \\
&= \sum_{k\in\mathcal{U}_A}\frac{w_k U_D}{K_A}+(U_B-U_D)\sum_{m=1}^{M}S_m+(K_A U_S-U_D)\sum_{m=1}^{M}\sum_{k\in\mathcal{U}_A}\frac{w_k a_m^k b_m^k}{K_A}.
\end{aligned}
$$

E.2 Proof of Theorem 1

To prove the theorem, we first note that the problem in (7.7) has the following block-separable structure:

$$
\max_{b_1,\ldots,b_K}\; U(b_1,b_2,\ldots,b_K)\quad\text{s.t. } b_k\in\mathcal{B}_k,\ \forall k. \tag{E.2}
$$

Eq. (E.2) indicates that the constraints on different blocks are separable. For brevity, denote $u(b_{k'},B^r)=U(b_1^r,\ldots,b_{k'-1}^r,b_{k'},b_{k'+1}^r,\ldots,b_K^r)$. From Alg. 2, we notice that $b_{k'}^{r+1}=\arg\max_{b_{k'}\in\mathcal{B}_{k'}}u(b_{k'},B^r)$ at each iteration. Hence, we have

$$
u\big(b_{k'}^{r+1},B^r\big)\geq u\big(b_{k'}^{r},B^r\big). \tag{E.3}
$$

Thus, the algorithm is monotonically non-decreasing. Then, since the optimal objective value of (7.7) cannot be infinite, the algorithm must converge. To prove that Alg. 2 converges to a stationary point if every iteration has a unique maximizer, we use the analysis framework for block coordinate descent methods in [230] as follows.¹ Suppose each iteration in Alg. 2 has a unique maximizer. Then, Alg.
2 converges to a unique solution $\bar{B}=[\bar{b}_1,\ldots,\bar{b}_K]$ as the number of iterations $r\to\infty$. According to Alg. 2, we know that

$$
\bar{b}_{k'}=\arg\max_{b_{k'}\in\mathcal{B}_{k'}}u\big(b_{k'},\bar{B}\big),\quad\forall k'=1,2,\ldots,K. \tag{E.4}
$$

As a result, due to concavity,

$$
\nabla_{k'}U(\bar{b}_1,\ldots,\bar{b}_K)^{T}\big(b_{k'}-\bar{b}_{k'}\big)\leq 0,\quad\forall b_{k'}\in\mathcal{B}_{k'}, \tag{E.5}
$$

where $\nabla_{k'}$ denotes the gradient of $U$ with respect to the component $b_{k'}$. We denote by $b=\mathrm{vec}(B)\in\mathcal{B}$ the vectorization of $B=[b_1,\ldots,b_K]$ and by $\bar{b}=\mathrm{vec}(\bar{B})\in\mathcal{B}$ the vectorization of $\bar{B}$, where $\mathcal{B}=\mathcal{B}_1\times\mathcal{B}_2\times\cdots\times\mathcal{B}_K$. Then, noticing that (E.5) holds for all $k'=1,2,\ldots,K$, it follows from the Cartesian product structure of the set that

$$
\nabla U(\bar{b})^{T}(b-\bar{b})\leq 0,\quad\forall b\in\mathcal{B}, \tag{E.6}
$$

where $\nabla$ is the gradient with respect to $b$. This proves that Alg. 2 converges to a stationary point.

¹ The proof basically follows the same steps as those used to prove Proposition 2.7.1 in [230]; hence, we provide a shortened version of the proof here.

CHAPTER F
Appendices of Chapter 8

F.1 Proof of Lemma 1

Observe that

$$
\sum_{t=0}^{T-1}\tilde{R}(t,A(t))=\sum_{t=0}^{T-1}\frac{1}{l}\Bigg(s_m^{A(t)}(t)+\mathbb{E}\Bigg[\sum_{\tau=1}^{l-1}s_m^{A(t+\tau)}(t+\tau)\Bigg]\Bigg). \tag{F.1}
$$

Suppose that the actions $A(t),\forall t$, are determined by policy $P$. By taking expectations on both sides of (F.1) and then dividing by $T$, we obtain:

$$
\frac{1}{T}\sum_{t=0}^{T-1}\mathbb{E}\big[\tilde{R}(t,A(t))\big]=\frac{1}{T}\sum_{t=0}^{l-2}\frac{t+1}{l}\,\mathbb{E}\big[s_m^{A(t)}(t)\big]+\frac{1}{T}\sum_{t=l-1}^{T}\mathbb{E}\big[s_m^{A(t)}(t)\big]+\frac{1}{T}\sum_{t=T+1}^{T+l-1}\frac{l+T-t}{l}\,\mathbb{E}\big[s_m^{A(t)}(t)\big]. \tag{F.2}
$$

Eq. (F.2) then leads to:

$$
\frac{1}{T}\sum_{t=0}^{T-1}\mathbb{E}\big[s_m^{A(t)}(t)\big]-\frac{1}{T}\sum_{t=0}^{T-1}\mathbb{E}\big[\tilde{R}(t,A(t))\big]=\frac{1}{T}\sum_{t=0}^{l-2}\frac{l-t-1}{l}\,\mathbb{E}\big[s_m^{A(t)}(t)\big]-\frac{1}{T}\sum_{t=T+1}^{T+l-1}\frac{t-T}{l}\,\mathbb{E}\big[s_m^{A(t)}(t)\big]. \tag{F.3}
$$

It follows that when $T\to\infty$, we obtain

$$
\lim_{T\to\infty}\Bigg(\frac{1}{T}\sum_{t=0}^{T-1}\mathbb{E}\big[s_m^{A(t)}(t)\big]-\frac{1}{T}\sum_{t=0}^{T-1}\mathbb{E}\big[\tilde{R}(t,A(t))\big]\Bigg)=0. \tag{F.4}
$$

F.2 Proof of Theorem 1

Proof of Queue Stability. Suppose $M$, $Z(0)$, and $Q_m(0),\forall m$, are finite. Also, assume that $C>0$, $c_{\mathrm{inst}}^{A}>0,\forall A\in\mathcal{A}(t)$, and $r_m(t)\leq r_{\max},\forall m,t$, are finite and bounded.
Suppose that the system ultimately becomes unstable. Then there must exist a sequence $\{A(t)\}$ generated by minimizing (8.10c) and an $m$ such that $\lim_{t\to\infty}Q_m^{A(t)}(t)=\infty$. Thus, for some $t_0$, the network must not broadcast file $m$ when $t>t_0$. However, this is impossible: as long as there is no broadcast, $Z(t)$ reduces to $0$ as $t\to\infty$. When $Z(t)=0$, however, we can freely choose to broadcast file $m$ and empty queue $m$ to minimize (8.10c). This leads to a contradiction. Therefore, we conclude that $Q_m(t),\forall m,\forall t$, cannot go to infinity and thus are upper bounded. Hence the queue stability condition in (8.4c) is satisfied.

Proof of Satisfaction of the Cost Constraint. To prove this, we need to prove the rate stability of $Z(t)$. Recall that the decision whether to broadcast a file is determined by minimizing (8.10c). Assume that $Z(t)$ is infinite as $t\to\infty$. Then we know that, for some $t_0$,

$$
Z(t)>\frac{M\sigma^2+Vy_{\max}}{c_{\mathrm{inst}}^{A}(t)}
$$

must hold when $t>t_0$, where $\sigma$ and $y_{\max}$ are some positive numbers such that $s_m^{A}(t),Q_m(t)\leq\sigma$ and $\sum_{m=1}^{M}\tilde{R}_m(t,A(t))\leq y_{\max}$. Note that $\sigma$ and $y_{\max}$ must exist since $Q_m(t),\forall m$, are upper bounded. This indicates that when $t>t_0$, the solution of minimizing (8.10c) must be $A_{\mathrm{slt}}(t)$, since

$$
-\sum_{m=1}^{M}Q_m(t)\,s_m^{A(t)}(t)-V\sum_{m=1}^{M}\tilde{R}_m(t,A(t))+Z(t)\,c_{\mathrm{inst}}^{A(t)}(t)>0=Z(t)\,c_{\mathrm{inst}}^{A_{\mathrm{slt}}}. \tag{F.5}
$$

This means that, in this case, we will never choose to broadcast the file. Consequently, $Z(t)$ can never increase to infinity when $Z(0)$ is finite. This contradicts the assumption; thus, $Z(t)$ must be finite and upper bounded by $(M\sigma^2+Vy_{\max})/c$ as $t\to\infty$. This indicates that $Z(t)$ is rate stable according to Definition 2.2 in [220]. Then, by the Rate Stability Theorem (Theorem 2.5 in [220]), the cost constraint in (8.4b) is satisfied.
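The rate-stability argument above is the standard virtual-queue device for time-average cost constraints. The following sketch illustrates the mechanism in isolation (the update rule is the generic Lyapunov form; the cost sequence is hypothetical, and the exact dynamics of Z(t) in Ch. 8 may differ):

```python
def update_virtual_queue(z, cost, budget):
    """Virtual-queue update for a time-average cost constraint:
        Z(t+1) = max(Z(t) + c(t) - C, 0).
    If Z(t) is rate stable (stays bounded), the time-average cost
    is at most the budget C.
    """
    return max(z + cost - budget, 0.0)

# Simulate: an average cost of 0.9 under budget C = 1.0 keeps Z bounded,
# so the long-term cost constraint is satisfied.
z = 0.0
for t in range(1000):
    cost = 0.9 + 0.1 * ((-1) ** t)   # alternates 0.8 / 1.0, mean 0.9
    z = update_virtual_queue(z, cost, 1.0)
assert z < 1.0   # queue stays bounded -> constraint satisfied
```

If the average cost exceeded the budget, Z(t) would grow without bound, which is exactly the situation ruled out by contradiction in the proof above.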
F.3 Proof of Theorem 2

Since $r_m(t)\leq r_{\max}$ and $c_{\mathrm{inst}}^{A(t)}\leq C_{\max}$, it follows from (8.10) that there must exist some finite non-negative number $y^{*}$ for a policy $P^{*}$ such that:

$$
\begin{aligned}
\Delta(t)-V\sum_{m=1}^{M}\tilde{R}_m(t,A(t)) &\leq \sum_{m=1}^{M}Q_m(t)\big(r_m(t)-s_m(t)\big)-V\sum_{m=1}^{M}\tilde{R}_m(t,A(t))+Z(t)\big(c_{\mathrm{inst}}^{A(t)}(t)-C\big)+B \\
&\leq \sum_{m=1}^{M}Q_m(t)\,r_m(t)-V\sum_{m=1}^{M}\tilde{R}_m(t,A(t))+Z(t)\,c_{\mathrm{inst}}^{A(t)}(t)+B \\
&\leq r_{\max}\sum_{m=1}^{M}Q_m(t)-Vy^{*}+C_{\max}Z(t)+B.
\end{aligned}
\tag{F.6}
$$

Then, by summing the inequality for $t$ from $0$ to $T-1$, we obtain

$$
\sum_{t=0}^{T-1}\Bigg[\Delta(t)-V\sum_{m=1}^{M}\tilde{R}_m(t,A(t))\Bigg]\leq -TVy^{*}+TB+r_{\max}\sum_{t=0}^{T-1}\sum_{m=1}^{M}Q_m(t)+C_{\max}\sum_{t=0}^{T-1}Z(t). \tag{F.7}
$$

After some algebraic manipulation, it follows that

$$
\frac{1}{T}\sum_{t=0}^{T-1}\sum_{m=1}^{M}\tilde{R}_m(t,A(t))\geq y^{*}-\frac{B}{V}-\frac{r_{\max}}{TV}\sum_{t=0}^{T-1}\sum_{m=1}^{M}Q_m(t)-\frac{C_{\max}}{TV}\sum_{t=0}^{T-1}Z(t)+\frac{L(T)-L(0)}{VT}. \tag{F.8}
$$

Observe that $\sum_{m=1}^{M}\mathbb{E}[Q_m(t)]\leq\delta V$ and $\mathbb{E}[Z(t)]\leq\delta V$. By letting $T\to\infty$ and taking the expectation, we then obtain

$$
\liminf_{T\to\infty}\frac{1}{T}\sum_{t=0}^{T-1}\mathbb{E}\Bigg[\sum_{m=1}^{M}\tilde{R}_m(t,A(t))\Bigg]\geq y^{*}-\frac{B}{V}-r_{\max}\delta-C_{\max}\delta. \tag{F.9}
$$

Finally, according to (F.6), we see that the minimization of (8.10c) maximizes $y^{*}$.

F.4 Proof of Theorem 3

Consider the drift-plus-penalty minimization policy $P$, in which $A(t)$ is determined by $P$.
Using (8.10) and summing from $t=0$ to $t=T-1$, we obtain

$$
L(T)-L(0)-V\sum_{t=0}^{T-1}\sum_{m=1}^{M}\tilde{R}_m(t,A(t))\leq \sum_{t=0}^{T-1}\sum_{m=1}^{M}Q_m^{A(t)}(t)\big(r_m(t)-s_m^{A(t)}\big)-V\sum_{t=0}^{T-1}\sum_{m=1}^{M}\tilde{R}_m(t,A(t))+\sum_{t=0}^{T-1}Z^{A(t)}(t)\big(c_{\mathrm{inst}}^{A(t)}(t)-C\big)+\sum_{t=0}^{T-1}B. \tag{F.10}
$$

Since $A(t)$ minimizes (8.10c), we obtain

$$
\begin{aligned}
&\sum_{t=0}^{T-1}\sum_{m=1}^{M}Q_m^{A(t)}(t)\big(r_m(t)-s_m^{A(t)}\big)-V\sum_{t=0}^{T-1}\sum_{m=1}^{M}\tilde{R}_m(t,A(t))+\sum_{t=0}^{T-1}Z^{A(t)}(t)\big(c_{\mathrm{inst}}^{A(t)}(t)-C\big)+\sum_{t=0}^{T-1}B \\
&\quad\leq \sum_{t=0}^{T-1}\sum_{m=1}^{M}Q_m^{A(t)}(t)\big(r_m(t)-s_m^{\pi^{*}}\big)-V\sum_{t=0}^{T-1}\sum_{m=1}^{M}\tilde{R}_m(t,\pi^{*})+\sum_{t=0}^{T-1}Z^{A(t)}(t)\big(c_{\mathrm{inst}}^{\pi^{*}}(t)-C\big)+\sum_{t=0}^{T-1}B.
\end{aligned}
\tag{F.11}
$$

Then, by taking expectations of (F.10) and using (F.11) and the assumption in (8.12), we obtain

$$
\begin{aligned}
L(T)-L(0)-V\sum_{t=0}^{T-1}\sum_{m=1}^{M}\mathbb{E}\big[\tilde{R}_m(t,A(t))\big] &\leq \epsilon\sum_{t=0}^{T-1}\sum_{m=1}^{M}\mathbb{E}\big[Q_m^{A(t)}(t)\big]-V\sum_{t=0}^{T-1}\sum_{m=1}^{M}\mathbb{E}\big[\tilde{R}_m(t,\pi^{*})\big]+\epsilon\sum_{t=0}^{T-1}\mathbb{E}\big[Z^{A(t)}(t)\big]+\sum_{t=0}^{T-1}B \\
&\overset{(a)}{=} \epsilon\sum_{t=0}^{T-1}\sum_{m=1}^{M}\mathbb{E}\big[Q_m^{A(t)}(t)\big]-V\sum_{t=0}^{T-1}\sum_{m=1}^{M}y_m^{*}+\epsilon\sum_{t=0}^{T-1}\mathbb{E}\big[Z^{A(t)}(t)\big]+\sum_{t=0}^{T-1}B,
\end{aligned}
\tag{F.12}
$$

where (a) holds because policy $\pi^{*}$ is i.i.d. Since $\epsilon$ can be made arbitrarily close to zero, it follows from Lemma 1 that

$$
\liminf_{T\to\infty}\frac{1}{T}\sum_{t=0}^{T-1}\sum_{m=1}^{M}\mathbb{E}\big[s_m^{A(t)}(t)\big]\geq \lim_{T\to\infty}\Bigg(\frac{1}{VT}\sum_{t=0}^{T-1}\sum_{m=1}^{M}Vy_m^{*}-\frac{B}{V}+\frac{L(T)-L(0)}{VT}\Bigg)\geq \sum_{m=1}^{M}y_m^{*}-\frac{B}{V}.
$$
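The proofs in F.2-F.4 all revolve around choosing A(t) to greedily minimize a per-slot drift-plus-penalty expression such as (8.10c). The following sketch shows the shape of such a decision under schematic weights (the action fields, the scoring expression, and all numbers are hypothetical stand-ins, not the actual (8.10c)):

```python
def drift_plus_penalty_action(actions, queues, z, budget, V):
    """Greedy drift-plus-penalty choice: each candidate action offers a
    reward, per-queue service rates, and an instantaneous cost (all
    hypothetical fields); pick the action minimizing
        -V * reward - sum_m Q_m * service_m + Z * (cost - budget).
    Large queue backlogs push toward serving; a large virtual queue Z
    (accumulated cost debt) pushes toward cheap actions.
    """
    def score(a):
        return (-V * a["reward"]
                - sum(q * s for q, s in zip(queues, a["service"]))
                + z * (a["cost"] - budget))
    return min(actions, key=score)

# Two hypothetical actions: broadcast a file (costly) or stay silent.
actions = [
    {"name": "broadcast", "reward": 1.0, "service": [3.0, 0.0], "cost": 2.0},
    {"name": "silent",    "reward": 0.0, "service": [0.0, 0.0], "cost": 0.0},
]
# With a long backlog the broadcast wins; with an inflated virtual queue
# the controller falls back to staying silent, mirroring (F.5).
assert drift_plus_penalty_action(actions, [5.0, 0.0], 0.0, 1.0, 10.0)["name"] == "broadcast"
assert drift_plus_penalty_action(actions, [5.0, 0.0], 100.0, 1.0, 10.0)["name"] == "silent"
```

The parameter V plays the same role as in Theorems 2 and 3: a larger V weights the reward more heavily at the expense of larger queue backlogs, giving the O(1/V) optimality gap of (F.9).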
Abstract
Based on the concentrated popularity distribution of video files, caching popular files on devices and distributing them via device-to-device (D2D) communications allows a significant improvement of wireless video networks. This dissertation discusses various aspects of cache-aided wireless D2D networks, aiming to provide further understanding and improvement; both practical and theoretical perspectives are considered. Starting from the conventional homogeneous user preference model, we analyze the throughput-outage performance of single-hop and multi-hop cache-aided D2D networks from the information-theoretic point of view. The analysis assumes a model based on the measured popularity distribution of mobile users
Asset Metadata

Creator: Lee, Ming-Chun (author)
Core Title: Design, modeling, and analysis for cache-aided wireless device-to-device communications
School: Viterbi School of Engineering
Degree: Doctor of Philosophy
Degree Program: Electrical Engineering
Publication Date: 06/08/2020
Defense Date: 04/30/2020
Publisher: University of Southern California (original); University of Southern California. Libraries (digital)
Tags: cache replacement; device-to-device communications; individual user preference; OAI-PMH Harvest; scaling laws; tradeoff; wireless caching networks
Language: English
Contributor: Electronically uploaded by the author (provenance)
Advisor: Molisch, Andreas (committee chair); Ghandeharizadeh, Shahram (committee member); Psounis, Konstantinos (committee member)
Creator Email: mingchul@usc.edu, stevenlee04054@gmail.com
Permanent Link (DOI): https://doi.org/10.25549/usctheses-c89-315301
Unique Identifier: UC11665855
Identifier: etd-LeeMingChu-8569.pdf (filename); usctheses-c89-315301 (legacy record id)
Legacy Identifier: etd-LeeMingChu-8569.pdf
Dmrecord: 315301
Document Type: Dissertation
Rights: Lee, Ming-Chun
Type: texts
Source: University of Southern California (contributing entity); University of Southern California Dissertations and Theses (collection)
Access Conditions: The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the a...
Repository Name: University of Southern California Digital Library
Repository Location: USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA