Optimal distributed algorithms for scheduling and load balancing in wireless networks
(USC Thesis Other)
OPTIMAL DISTRIBUTED ALGORITHMS FOR SCHEDULING AND LOAD
BALANCING IN WIRELESS NETWORKS
by
Dilip Bethanabhotla
A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(ELECTRICAL ENGINEERING)
August 2015
Copyright 2015 Dilip Bethanabhotla
Dedication
This dissertation is dedicated to my loving family and all the teachers, mentors and
researchers who have inspired me.
Acknowledgements
It is obvious that this thesis would not have been possible without the encouragement
and support from my advisor Prof. Giuseppe Caire. I consider myself fortunate to have
met him and be advised and inspired by him. He is a role model for me as well as several
other students as an erudite scholar and mentor on one hand and as a multifaceted
personality on the other hand. His brilliance and versatility have greatly influenced
my professional as well as personal life. Being inclined more towards mathematics and
theory as an early graduate student, thanks to Prof. Caire, I was carefully steered into
the spirit of engineering, which is to make mathematics relevant to engineering problems
through precise modeling of a practical system and then test the validity of the model and
the results/conclusions obtained thereof through extensive experiments. He is probably
the only person who believed in my technical abilities and encouraged/pushed me to be
creative in research, work hard, come up with my own results and realize my full potential.
On the soft skills front, Prof. Caire made me realize the importance of being proactive
in getting things done, networking with people, communicating and sharing ideas with
researchers from varied backgrounds. Being shy and aloof initially, I have transformed
into someone who can now talk, collaborate and work together in teams with people
from diverse backgrounds both in professional and personal life. I am also thankful to
him and Dr. Haralabos Papadopoulos at Docomo Innovations Inc. for the internship
opportunity in Silicon Valley which significantly changed my perspective of engineering
and its impact on society. It has opened doors for me and has got me curious about
the process of turning a theoretical idea into something which can have an impact on a
system used by a billion people. In fact, it has also shaped my next career step.
I am grateful to Prof. Konstantinos Psounis, Prof. Michael Neely, Prof. Andreas
Molisch and Prof. Jason Fulman for agreeing to serve on my thesis committee and for all
the fruitful research interactions that I had with them. I was fortunate to work with each
of them and have learned a lot from their sharp thinking and great attention to detail. I
am greatly indebted to them for their constructive comments and for being coauthors of
papers which led to the results of this dissertation.
I thank Dr. Haralabos Papadopoulos for hosting me at Docomo Innovations Inc.
during the summer of 2013. That internship in Silicon Valley certainly was a watershed
in shaping my viewpoint on engineering innovation. I thank him for his warm friendship
and advice on several issues.
I also thank Ozgun, Konstantin, Joon, Mingyue and Yonglong for being such wonderful
collaborators and coauthors on several papers we wrote together. I extend my
thanks to SongNam and Ansuman for their help and advice on many occasions and also
to Vinod for being a great roommate through the years.
Finally and most importantly, I thank my parents for their selfless love, support and
encouragement in every endeavor I take up. Right from the outset, they guided me in my
education and made sure I always took the right steps. Their high values and emphasis on
quality education have shaped the person I am today. I thank my elder brother Sandeep
and sister-in-law Lavanya for their help in making the transition to grad school, advice
on several practical issues and for introducing me to the bright and breezy California.
And a special thank you to the new entrant in the family, my wife, Manaswini.
Abstract
With the proliferation of billions of smart devices including wearables, multimedia-capable
handheld devices like tablets and smartphones, and the deluge of HD video content
streaming on them, the existing wireless networking technologies (cellular + WiFi) need
a significant overhaul in terms of system architecture design and efficient resource allocation
algorithms. This poses several challenges across the entire spectrum of the wireless
network system architecture. These range from the unpredictable wireless medium and
interference at the lower layers, to limited throughput and unfair resource sharing at
the network layer, and to complex media content characteristics at the application layer.
Typical approaches to these problems have been isolated and independent across network
layers, leading to partial heuristic solutions at different layers which when put together
do not perform well and often introduce more complexity.
In this dissertation, taking a holistic view of the entire system, we have designed network
architectures and algorithms for scheduling, load balancing and congestion control
spanning different layers of the network. The primary focus is on developing low-complexity
algorithms which work in a self-adaptive and online fashion in response to unpredictably
changing network conditions, and which can be implemented in a distributed
manner across the various nodes in the network. In particular, the key underlying theme
of this dissertation is to show that the designed algorithms, albeit distributed in nature, are
actually optimal in the sense of maximizing a global network-wide performance metric.
In the first part of this dissertation, we consider the user-cell association problem for
a massive MIMO heterogeneous network. We focus on the design and analysis of self-organizing,
user-centric distributed algorithms for load balancing in a wireless network
with the advanced physical layer feature of Massive MIMO which is likely to play a
key role in future cellular standards like 5G and WiFi standards like 802.11 ac/ax. We
formulate the user-cell association problem as a network utility maximization, where the
network utility is a function of the users' long-term average rates (per-user throughputs).
Under a massive-MIMO specific system model, we show that optimizing the activity
fractions between user-BS pairs is a convex problem that can be solved efficiently
by centralized subgradient algorithms. Furthermore, we show that such a solution is
physically realizable, in the sense that there exists a scheduling sequence approaching
arbitrarily closely the optimal activity fractions. We then consider a decentralized user-centric
scheme, where each user has a positive probability to switch cell association if the
utility expected from a different base station is higher than the utility achieved from the
currently associated one. We formulate a non-cooperative association game and show that
its pure-strategy Nash equilibria must be close to the global optimum of the centralized
problem. We also show that, under certain technical conditions that we refer to as heavy-loaded
network, if the centralized global optimum consists of a unique association (i.e., no
user has positive activity fraction to more than one base station), then this association is a
pure-strategy Nash equilibrium of the corresponding user-centric association game. Based
on previously known results, we also have that the proposed user-centric decentralized
probabilistic scheme converges to a pure-strategy Nash equilibrium with probability 1,
for the practically relevant cases of proportional fairness and max-min fairness utility
functions. Hence, our user-centric algorithm is attractive not only for its simplicity and
fully decentralized implementation, but also because it operates near the system social
optimum.
In the second part of the dissertation, we focus on the design and analysis of low-complexity,
self-adaptive and distributed algorithms for user scheduling and congestion
control for efficient delivery of video content over wireless networks. In particular, we
consider the problem of optimizing delivery of stored video to users in a multi-cell wireless
network formed by many users and helpers, deployed over a localized geographic area
and sharing the same channel bandwidth. We focus on the wireless segment of the
network, assuming that the video files are already present at the helper nodes. This
condition holds when the backhaul connecting the helper nodes to some video server
in the core network is fast enough, such that we can neglect the delays introduced by
the backhaul. In the case where such fast backhaul is not present, we assume that the
helpers can cache the relevant files by exploiting the inherent asynchronous content reuse
of VoD in order to predict and proactively store the popular video files such that, with
high probability, the demanded files are effectively already present in the helpers' caches.
This justifies our assumption of neglecting the effects of the wired backhaul and focusing
only on the wireless segment of the system. For the network at hand, we consider the
problem of simultaneous on-demand video streaming to multiple users, where multiple
unicast streaming sessions run in parallel and compete for the same network resources.
We formulate the problem as a Network Utility Maximization (NUM) where the objective
is to fairly maximize users' video streaming Quality of Experience (QoE) and then derive
an iterative scheme using Lyapunov Optimization, which can solve the NUM problem
up to any level of accuracy. Moreover, it can be used directly as an online protocol
by interpreting the iterations as control actions in successive transmission slots. The
proposed scheme decomposes into interconnected layers: an adaptive video streaming
layer that is reminiscent of DASH, is implemented at each user node, and involves video
chunk requests, playback buffer monitoring and adaptive selection of the coded video,
and a max-weight transmission scheduler implemented at each helper. These two layers
are interconnected by the appropriate queues maintained at the nodes of the network,
which form the weights for the max-weight scheduler. We then extend the design of
the max-weight transmission scheduler to the case where the helpers are equipped with
multiuser MIMO (MU-MIMO) capabilities. We exploit the channel hardening effect
of high dimensional MIMO channels and devise a low complexity user selection scheme
to solve the underlying combinatorial problem of selecting user subsets for linear zero-forcing
beamforming (LZFBF), which can be easily implemented and run independently
at each helper. Through simulations, we show that deploying MU-MIMO significantly
improves the average video quality and reduces the percentage of time spent in buffering
mode. In addition, we demonstrate that the proposed cross-layer approach is able to
serve users more fairly than a baseline scheme representative of current systems running
independently designed protocol layers, where the video quality adaptation (e.g., based
on DASH) runs on top of a classical MAC/PHY transmission scheduler (e.g., based on
Proportional Fairness), without exploiting cross-layer joint optimization.
Table of Contents
Dedication
Acknowledgements
Abstract
List of Tables
List of Figures
Chapter 1 Introduction
1.1 Self-Organizing Massive MIMO Heterogeneous Wireless Networks
1.1.1 Contributions
1.2 Video Aware Wireless Networks
1.2.1 Contributions
1.2.2 Pull Video Streaming and Extension to MU-MIMO
1.3 Organization
Chapter 2 Optimal User-Cell Association for Massive MIMO Wireless Networks
2.1 Related work
2.2 System Model and Problem Definition
2.2.1 Instantaneous rates and user throughput in massive MIMO systems
2.2.2 Recasting user-cell association as NUM problem
2.3 Centralized Solution
2.3.1 KKT conditions
2.3.2 Solving for the Primal Variables
2.4 Distributed User-Cell Association Algorithms
2.4.1 User-centric association games
2.4.2 User-centric decentralized online algorithms
2.5 Numerical Experiments
2.5.1 Experiment 1
2.5.2 Experiment 2: 3GPP HetNet Model
Chapter 3 Adaptive Video Streaming for Wireless Networks with Multiple Users and Helpers
3.1 System Model
3.1.1 Wireless transmission channel
3.1.2 Transmission queues dynamics and network state
3.2 Problem Formulation and Optimal Scheduling Policy
3.2.1 Dynamic scheduling policy
3.2.1.1 Control actions at the user nodes (congestion control)
3.2.1.2 Control actions at the helper nodes (transmission scheduling)
3.2.2 Derivation of the scheduling policy
3.2.2.1 Derivation of the congestion control action
3.2.2.2 Derivation of the transmission scheduling action
3.2.3 Optimality
3.3 Pre-buffering, re-buffering and skipping chunks
3.3.1 Skipping chunks from playback
3.3.2 Pre-buffering and re-buffering
3.4 Numerical Experiments, Discussion and Conclusions
3.4.1 Experiment 1
3.4.2 Experiment 2
Chapter 4 WiFlix: Adaptive Video Streaming in Massive MU-MIMO Networks
4.1 Contributions
4.2 System Model
4.2.1 Timescales
4.2.2 Request Queue Dynamics
4.2.3 Wireless System Model with Massive MU-MIMO Helpers
4.2.4 Network State and Dynamic Scheduling Policy
4.3 Problem Formulation and Streaming Policy
4.3.1 The Drift-Plus-Penalty Expression
4.3.2 The Drift-Plus-Penalty Policy
4.3.2.1 Control actions at the user nodes (pull congestion control)
4.3.2.2 Control actions at the helper nodes (transmission scheduling)
4.4 Policy Performance
4.5 Pre-buffering and re-buffering chunks
4.6 Numerical Experiment
Appendix A
A.1 Massive MIMO User Rates
A.2 Proof of Theorem 1
A.3 Proof of Theorem 2
Appendix B
B.1 Proof of Lemma 4
B.2 Proof of Theorem 3 and of Corollary 2
Appendix C
C.1 Proof of Theorem 4 and of Corollary 3
References
List of Tables
3.1 Arrival times of chunks
List of Figures
2.1 Wireless network with 2 Macro BSs and several small cell BSs
2.2 Performance comparison of various algorithms for 1 and the layout of Fig. 2.1.
2.3 A 3GPP HetNet scenario with small cell BSs deployed in hot zones
2.4 Comparison of the proposed distributed algorithm and Max peak-rate in terms of the
throughput statistics: 5 percentile rate, arithmetic mean rate and geometric mean rate,
for 1 and the layout of Fig. 2.3.
2.5 Load distribution: proposed distributed algorithm vs. Max peak-rate association.
3.1 Evolution of number of ordered and consumed chunks
3.2 Rate-quality profile of the test video sequence used in our simulations.
3.3 Topology (the green line indicates the trajectory of the mobile user in Experiment 1).
3.4 CDFs of different performance metrics for Experiment 1.
3.5 Topology and CDFs of different performance metrics for Experiment 2.
4.1 Time-Scale Decomposition
4.2 Simulation setup
4.3 Performance tradeoffs with policy control parameter V
4.4 Performance comparison of advanced and dumb receivers.
4.5 Video streaming QoE improvement with MU-MIMO over SU-MIMO
4.6 Performance comparison of a cross-layer approach with a baseline scheme.
Chapter 1
Introduction
This dissertation addresses two key problems in wireless network scheduling with the
common underlying theme of designing distributed algorithms which are globally optimal
in the sense of maximizing a network-wide performance metric. The first problem that we
address in Chapter 2 of this dissertation is the design of algorithms for user-cell association
and load balancing in Massive MIMO heterogeneous networks while the second problem
we treat in Chapters 3 and 4 is the design of transmission scheduling and congestion
control algorithms for efficient delivery of video content over wireless networks. In the
following, we describe the two problems in more detail:
1.1 Self-Organizing Massive MIMO Heterogeneous Wireless Networks
With the proliferation of mobile devices and services, industry predicts that the wireless
data traffic is going to increase by two to three orders of magnitude within a decade [1].
Although the definition of the next generation of systems and standards is at its initial
phase, it is widely agreed that the next generation of wireless networks, generally referred
to as "5G", will involve a combination of multiuser MIMO technology, cell densification,
and heterogeneous architectures based on nested tiers of smaller and smaller cells operating
at higher and higher frequencies, in order to target traffic hotspots [2]. These
trends have motivated the recent surge of research on massive and dense deployment of
base station antennas, both in the form of Massive MIMO schemes, with hundreds of antennas
at each cell site [3-5], and in the form of multi-tier networks of densely deployed
small cells [6,7].
Massive MIMO promises dramatic increases in spectral efficiency by transmitting
independent data streams simultaneously to multiple users sharing the same transmission
resource (time-frequency slot). The massive MIMO regime [3-5] distinguishes itself from
classical multiuser MIMO [8,9] by the fact that the number of served users is significantly
smaller than the (very large) number of base station antennas. Operating in Time-Division
Duplexing (TDD) mode, massive MIMO can provide very large spectral efficiencies, simple
per-cell processing, and very attractive power efficiency due to the large array gain [3].
Thanks to the higher and higher carrier frequencies [10], it is possible to implement
massive MIMO even in relatively small base stations within a reasonable form factor.
Hence, it is envisaged that massive MIMO will not just be applied to large tower-mounted
base stations, but also used in conjunction with small cells [11].
The heterogeneous wireless network framework mentioned above may include some of
the following features: 1) base stations that may differ significantly by transmit power,
number of antennas, and multiplexing gain (e.g., see [12] and references therein); 2) non-homogeneous
user spatial distribution, characterized by high-density hotspots separated
by less dense regions [13]; 3) users that, due to the large beamforming gain of massive
MIMO, may be in good SINR conditions with respect to several base stations. As a consequence,
the rationale that has driven for decades the conventional cellular system design and
optimization, based on symmetric lattice-deployed cells (see for example [3-5]) and/or
(roughly) uniform number of users per cell (e.g., see [14] and references therein), must be
abandoned in favor of more efficient schemes that include user-cell association into the
optimization problem.
In conventional technologies, the user-cell association is decided on the basis of the so-called
Reference Signal Received Power (RSRP), possibly in combination with Reference
Signal Received Quality (RSRQ) (see [13] for details). In short, these are measures of the
signal strength measured on a load-independent reference beacon signal sent by each base
station [15]. Such association does not take into account the actual load of base stations,
i.e., the number of associated users per downlink data stream, and may be arbitrarily
suboptimal in a heterogeneous scenario. "Biasing" is a commonly proposed method to
cope with cell or user density asymmetries, where the RSRP is artificially scaled by a bias
term that depends on the type of base station [15, 16] in order to inherently steer users
to associate with nearby small-cell base stations, thereby "offloading" congested macro
cells. Nevertheless, biasing methods are either heuristic or are based on some average
performance metrics, where averaging is over the random placement of users and base
stations according to stochastic geometry models [14-17]. Furthermore, biasing attempts
to balance user traffic across tiers, but not within each tier. In contrast, here we seek
pointwise optimal user-cell association, i.e., for any given placement of users and base
stations.
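As a rough illustration of the conventional rule described above, the following sketch associates each user with the base station of largest biased RSRP. The function name, the RSRP values and the 8 dB bias are hypothetical and chosen only for illustration; scaling the RSRP by a bias factor is expressed here as adding the bias in dB.

```python
def biased_rsrp_association(rsrp_dbm, bias_db):
    """For each user, return the index of the base station with the largest biased RSRP.

    rsrp_dbm[k][j]: RSRP in dBm of base station j as measured by user k (made-up values).
    bias_db[j]:     bias in dB for base station j (0 for macro cells, > 0 for small cells).
    """
    return [
        max(range(len(row)), key=lambda j: row[j] + bias_db[j])
        for row in rsrp_dbm
    ]


# Two users; base station 0 is a macro cell, base station 1 is a small cell.
rsrp = [
    [-80.0, -85.0],  # user 0: the macro cell is 5 dB stronger
    [-70.0, -90.0],  # user 1: the macro cell is 20 dB stronger
]
no_bias = biased_rsrp_association(rsrp, [0.0, 0.0])    # plain max-RSRP association
with_bias = biased_rsrp_association(rsrp, [0.0, 8.0])  # small cell favored by 8 dB
print(no_bias, with_bias)
```

With these numbers the bias moves user 0 to the small cell while user 1 stays on the macro cell. Note that the rule never looks at how many users each base station already serves, which is the load-blindness criticized in the text.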
1.1.1 Contributions
In Chapter 2, we focus on the problem of optimal user-cell association for the downlink of
a heterogeneous wireless network (including the features described above) with massive MIMO
base stations. Our problem formulation captures the fact that, in modern data-oriented
systems with OFDMA/TDMA scheduling, not all users are simultaneously served on all
the transmission resources (time-frequency slots). Hence, what matters is not the user
instantaneous rate or SINR level, achieved at any given time-frequency slot, but rather the
long-term average rate, referred to hereafter as per-user throughput. It is also important to
notice that, in realistic network topologies, users have different distances and propagation
conditions (pathloss) with respect to the base stations. Hence, maximizing the network
spectral efficiency (user sum rate) typically yields unacceptable per-user performance,
since this may lead to a large number of users located in unfavorable positions (e.g.,
at the cell edges) with near-zero throughput (see [5, 18, 19]). Motivated by the above
observations, we formulate the system optimization problem as a rigorous Network Utility
Maximization (NUM), where the fairness criterion is reflected by the choice of the network
utility function. Instead of focusing on the per-slot instantaneous user rates, our network
utility is a function of the user throughputs. It should be noticed that fairness across
the users is often implicitly assumed by considering equal user airtime, as for example
in [3, 4]. Since equal airtime is just one of the many possible fairness criteria, here we
take a more systematic approach, which includes equal airtime as a special case.
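The contrast between sum-rate maximization and a fairness-aware utility can be seen in a toy example. The sketch below is an illustration under simplifying assumptions that are not taken from the dissertation: a single base station time-shares one stream between two users with fixed peak rates, and the airtime split is found by a plain grid search.

```python
import math

peak = [10.0, 1.0]  # hypothetical peak rates: user 0 near the cell center, user 1 at the edge


def throughputs(a0):
    """Throughputs when user 0 gets airtime fraction a0 and user 1 gets 1 - a0."""
    return [a0 * peak[0], (1.0 - a0) * peak[1]]


def sum_rate(a0):
    return sum(throughputs(a0))


def pf_utility(a0):
    """Proportional fairness utility: sum of log-throughputs (-inf if a user is starved)."""
    t = throughputs(a0)
    if min(t) <= 0.0:
        return -math.inf
    return sum(math.log(x) for x in t)


grid = [i / 1000 for i in range(1001)]
best_sum = max(grid, key=sum_rate)   # airtime of user 0 under sum-rate maximization
best_pf = max(grid, key=pf_utility)  # airtime of user 0 under proportional fairness
print(best_sum, throughputs(best_sum))
print(best_pf, throughputs(best_pf))
```

Sum-rate maximization gives all the airtime to the strong user and leaves the edge user with zero throughput, whereas the logarithmic utility splits the airtime equally in this single-cell case, which is consistent with equal airtime being a special case of the utility-based approach.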
It is well-known that solving the general joint user-cell association, precoding vectors
design, and power allocation problem is NP-hard [20, 21]. Instead, in this dissertation
we heavily exploit the specific system simplification occurring in the massive MIMO
regime [3-5]. While in general the user instantaneous rates are functions of the multiuser
MIMO precoding scheme, of the base station power allocation, and of the MIMO channel
matrix realization (e.g., see [22]), in a massive MIMO system the instantaneous rates
converge to easily computable deterministic limits $R_{k,j}$ that depend only on the overall
system topology (path gains between the base stations and user k) and system configuration (pilot
signal allocation, transmit power, number of antennas and number of downlink data
streams of the base stations), but are independent of the other users' cell association (see
details in Appendix A.1). It follows that the massive MIMO regime induces decoupling
and symmetrization of the user instantaneous rates, yielding a dramatic simplification
of the NUM problem, which turns out to be convex with respect to the user activity
fractions, i.e., the fractions (indexed by the user-base station pair k, j) of transmission
resources over which user k is served
by base station j. Furthermore, we prove that the solution to this convex problem is
physically realizable in the following sense: there exists a feasible schedule consisting of a
sequence of integer scheduling configurations (see definition in Section 2.2) such that, by
time-sharing these configurations, the time-averaged user rates converge to the globally
optimal throughput vector.
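To make the convexity claim above concrete, one plausible way of writing such a problem is sketched below. This is an illustrative form only, and the exact formulation is the one given in Section 2.2. The symbols $a_{k,j}$ (activity fraction of user $k$ at base station $j$), $g$ (a concave increasing per-user utility, e.g. $g = \log$ for proportional fairness) and $S_j$ (number of downlink data streams of base station $j$) are notation assumed here; $R_{k,j}$ are the deterministic rate limits discussed in the text.

```latex
\begin{align*}
\max_{\{a_{k,j}\}} \quad & \sum_{k} g(R_k) \\
\text{subject to} \quad
  & R_k = \sum_{j} a_{k,j}\, R_{k,j} && \forall k, \\
  & \sum_{j} a_{k,j} \le 1           && \forall k, \\
  & \sum_{k} a_{k,j} \le S_j         && \forall j, \\
  & a_{k,j} \ge 0                    && \forall k, j.
\end{align*}
```

Because each $R_{k,j}$ is a constant that does not depend on the other users' association, every throughput $R_k$ is linear in the activity fractions. The objective is then a concave function of linear terms and all constraints are linear, so a problem of this form is convex.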
While our NUM solution is optimal, its implementation as an online protocol requires
centralized computation and coordination across the base stations. This may be
undesirable in practice. We therefore also consider a fully decentralized user-centric scheme
similar to [23], where each user has a positive probability to switch cell association if the
utility expected from a different base station is higher than the utility achieved from the
currently associated one. In particular, we formulate a related non-cooperative association
game where the users are the players, and the base stations operate according to a
local resource allocation rule that determines the users' utility. By studying the KKT
conditions of the global optimization problem and comparing them with the conditions
under which the best-response strategy of the game makes all users keep their current
association, we prove that the pure-strategy Nash equilibria of such a game must be very
close to the global optimum of the centralized problem. Furthermore, we prove that,
under certain technical conditions that we refer to as heavy-loaded network, if the centralized
global optimum consists of a unique association (i.e., no user has positive activity
fraction to more than one base station), then this association is a pure-strategy Nash
equilibrium of the corresponding user-centric association game. Based on [23], we also
have that the proposed user-centric decentralized scheme converges to a Nash equilibrium
with probability 1, for the practically relevant cases of proportional fairness (PF)
and hard fairness (HF). Hence, our user-centric algorithm is attractive not only for its
simplicity and fully decentralized implementation, but also because it operates near the
system social optimum.
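The flavor of such a user-centric scheme can be conveyed by a small simulation. The sketch below is not the algorithm of Chapter 2: it assumes, purely for illustration, that each base station shares its streams equally among its associated users (the local rule in the dissertation may differ), and all function names, rates and parameters are hypothetical. Users take turns and, with probability `switch_prob`, move to a base station that would give them a strictly higher throughput.

```python
import random


def throughput(k, j, n_users, peak, streams):
    """Throughput of user k at BS j when n_users users (including k) share it equally."""
    return peak[k][j] * min(1.0, streams[j] / n_users)


def best_alternative(k, assoc, peak, streams):
    """Best (throughput, base station) for user k if it alone changed association."""
    load = [assoc.count(j) for j in range(len(streams))]
    options = []
    for j in range(len(streams)):
        n = load[j] if j == assoc[k] else load[j] + 1
        options.append((throughput(k, j, n, peak, streams), j))
    return max(options)


def is_nash(assoc, peak, streams):
    """True if no user can strictly raise its throughput by switching alone."""
    for k, j in enumerate(assoc):
        current = throughput(k, j, assoc.count(j), peak, streams)
        if best_alternative(k, assoc, peak, streams)[0] > current:
            return False
    return True


def user_centric_association(peak, streams, switch_prob=0.5, seed=0, max_rounds=10_000):
    rng = random.Random(seed)
    assoc = [0] * len(peak)  # assumption: every user starts at base station 0
    for _ in range(max_rounds):
        if is_nash(assoc, peak, streams):
            break
        for k in rng.sample(range(len(peak)), len(peak)):  # random user order
            current = throughput(k, assoc[k], assoc.count(assoc[k]), peak, streams)
            best_rate, best_j = best_alternative(k, assoc, peak, streams)
            # Switch only with some probability, and only if strictly better.
            if best_rate > current and rng.random() < switch_prob:
                assoc[k] = best_j
    return assoc


# Hypothetical peak rates peak[k][j] for 4 users and 2 base stations, one stream each.
peak = [[10.0, 2.0], [8.0, 6.0], [6.0, 10.0], [4.0, 12.0]]
streams = [1, 1]
final = user_centric_association(peak, streams)
print(final, is_nash(final, peak, streams))
```

Under the equal-sharing assumption, a higher throughput is equivalent to a higher logarithmic (PF) utility, so the switching test matches the description in the text. The final association is one from which no single user wants to deviate, i.e., a pure-strategy Nash equilibrium of this toy game.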
Further details are given in Chapter 2.
1.2 Video Aware Wireless Networks
Demand for video content over wireless networks has grown dramatically in recent years
and shows no signs of slowing down. According to the Cisco Visual Networking Index
mobile forecast for 2013-2018 [1], mobile video data will increase 14-fold, accounting for
69 percent of total mobile data traffic by 2018. This increase is mainly due to on-demand
video streaming, enabled by multimedia devices such as tablets and smartphones. In
addition, recent measurement studies [24] reveal that, in 2013, around 26.9% of video
streaming sessions on the Internet experienced playback interruption due to re-buffering,
43.3% were impacted by low resolution, and 4.8% failed to start altogether. In order to
meet such explosive demands for video content, dynamic adaptive video streaming over
HTTP (DASH) [25] is gaining popularity for video streaming over wireless networks. This
is an application layer protocol for video delivery where each client estimates the available
capacity during a video streaming session and chooses the most appropriate video
quality level correspondingly. Different quality levels can be obtained either by storing
multiple versions of the same video encoded at different bitrates, or by using scalable
video coding and sending an adaptive number of refinement layers. In this way, DASH
attempts to maintain a reasonable quality of experience (QoE) even under changing network
conditions. However, a significant overhaul of existing wireless systems is needed
not just at the application layer but also at the media access control (MAC) and physical
(PHY) layers. For instance, popular video platforms like YouTube and Netflix,
which employ DASH at the application layer, have realized this fact and recently released
Video Quality Reports [26,27] where they compare and contrast different network service
providers (ISPs) in a given geographical area and rank/label them as either Lower Definition
(LD) or Standard Definition (SD) or High Definition (HD) based on the quality
of video streaming activity in their network over a certain time frame in order to inform
users that the choice of ISP can affect video streaming QoE. Moreover, it is well understood
that the current trend of cellular and WiFi technology (e.g., LTE [28]) cannot
cope with such traffic increase, unless the density of the deployed wireless infrastructure
is increased correspondingly. This motivates the recent flurry of research on massive
and dense deployment of base station antennas, either in the form of "massive MIMO"
solutions (hundreds of antennas at each cell site [4, 5, 29]) or in the form of very dense
small-cell networks (multiple nested tiers of smaller and smaller cells, possibly operating
at higher and higher carrier frequencies [6, 7]). While discussing the relative merits of
these approaches is out of the scope of this dissertation, we mention here that the small-cell
solution appears to be particularly attractive to handle a high density of nomadic
(low mobility) users demanding high data rates, as for typical VoD streaming users.
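The client-side adaptation described earlier in this section, where each DASH client estimates the available capacity and picks a quality level, can be illustrated with a simple rate-based rule. This is a generic heuristic sketched under stated assumptions, not the adaptation logic of any particular player and not the scheme developed in this dissertation; the bitrate ladder, the safety factor and the function name are hypothetical.

```python
def select_bitrate(available_kbps, estimated_kbps, safety=0.8):
    """Pick the highest encoded bitrate that fits within a fraction of the estimated capacity.

    available_kbps: bitrates (kbit/s) at which the video is encoded (hypothetical ladder).
    estimated_kbps: the client's current estimate of the available capacity.
    safety:         fraction of the estimate the client is willing to use.
    Falls back to the lowest bitrate when nothing fits.
    """
    feasible = [b for b in sorted(available_kbps) if b <= safety * estimated_kbps]
    return feasible[-1] if feasible else min(available_kbps)


ladder = [300, 750, 1500, 3000, 6000]
good = select_bitrate(ladder, 10_000)  # plenty of capacity
medium = select_bitrate(ladder, 2_500)
poor = select_bitrate(ladder, 200)     # below the lowest encoded bitrate
print(good, medium, poor)
```

Each client applying such a rule in isolation only reacts to the capacity it observes; it has no influence on how the lower layers share the wireless resources among competing streams, which is one motivation for the joint design pursued here.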
Motivated by these considerations, in this dissertation we envisage a network formed
by densely deployed fixed nodes (denoted as helpers), serving multiple stationary or low-mobility
(nomadic) video-streaming users. We focus on VoD streaming, where users start
their streaming sessions at random times, and demand different video files. Hence, the
approach of having all users overhearing a common multicasting data stream, as in live
streaming, is not applicable. In contrast, each streaming user requests sequentially a
number of video segments (referred to as chunks) and starts its playback after some pre-buffering
delay, typically much smaller than the duration of the whole streaming session.
In order to guarantee continuous playback in the streaming session, the system has to
ensure that each video chunk is delivered before its playback deadline. This fundamentally
differentiates VoD streaming from both live streaming and file downloading.
We focus on the wireless segment of the network, assuming that the video les are
already present at the helper nodes. This condition holds when the backhaul connecting
the helper nodes to some video server in the core network is fast enough, such that we can
neglect the delays introduced by the backhaul. In the case where such fast backhaul is
not present, the recently proposed approach of caching at the wireless edge (see [30{41])
was shown to be able to exploit the inherent asynchronous content reuse of VoD in order
to predict and proactively store the popular video les such that, with high probability,
the demanded les are eectively already present in the helpers caches. This justies
our assumption of neglecting the eects of the wired backhaul and focusing only on the
wireless segment of the system.
1.2.1 Contributions
In order to address the problem of efficient delivery of video over wireless networks, a cross-layer
optimization approach has been proposed in several works (e.g., see [42-47]). It is
well known from these works that metrics such as pre-buffering time, re-buffering time and
video quality impact streaming QoE. However, the joint optimization of these metrics by
directly controlling the dynamics of the playback buffers of all the users in the network
seems to require solving a Markov Decision Problem (MDP), which is typically quite
difficult and incurs the well-known curse of dimensionality. For instance, [43] considers
the adaptive transmission of video in a much simpler setting of a point-to-point wireless
link and formulates the problem as an MDP, which is then solved using the value iteration
algorithm. However, even in such a simple point-to-point scenario, the value iteration
policy requires extensive computation to be done offline and stored in a look-up table,
which is then used for the actual transmission. Thus, in order to obtain a tractable
formulation, we follow a "divide and conquer" approach, conceptually organized in the
following steps:
i) We formulate a Network Utility Maximization (NUM) problem [48-50] where the
network utility function is a concave and componentwise non-decreasing function of the
time-averaged users' requested video quality index and the maximization is subject to the
stability of all queues in the system. The shape of the network utility function can be
chosen in order to enforce some desired notion of fairness across the users [51].
ii) We solve the NUM problem in the framework of Lyapunov Optimization [52],
using the drift-plus-penalty (DPP) approach [52]. The obtained solution is provably
asymptotically optimal (with respect to the defined NUM problem) in a per-sample-path
sense (i.e., without assuming stationarity and ergodicity of the underlying network state
process [52, 53]). Furthermore, it naturally decomposes into sub-policies that can be
implemented in a distributed way, by functions performed at the users and the helpers,
requiring only local information. The function implemented at the user nodes is referred
to as congestion control, since it consists of the adaptive selection of the video quality and
the serving helper. The function implemented at the helpers is referred to as transmission
scheduling, since it corresponds to the adaptive selection of the user to be served on the
downlink of each helper station.
iii) We observe that, since all queues in the system are stable, all requested video
chunks shall be eventually delivered.
iv) As a consequence, in order to ensure that all the video chunks are delivered within
their playback deadline, it is sufficient to ensure that the largest delay among all queues
at the helpers serving any given user is not larger than the pre-buffering time allowed
for that user at its streaming session startup phase. We refer to the event that a chunk
is not delivered within its playback deadline as a buffer underrun event. Since such
events are perceived as very harmful for the overall quality of the streaming session, the
system must operate in the regime where the relative fraction of such chunks (referred
to as buffer underrun rate) is small. We refer to such desirable regime as the smooth
streaming regime. In particular, when the maximum delay of each queue in the system
admits a deterministic upper bound (e.g., see [54]), setting the pre-buffering time larger
than such bound makes the underrun rate equal to zero. However, for a system with
arbitrary user mobility, arbitrary per-chunk fluctuations of the video coding rate (as in
typical Variable Bit-Rate (VBR) coding [55]), and users joining or leaving the system at
arbitrary times, such deterministic delay upper bounds do not exist. Hence, in order to
make the system operate in the smooth streaming regime, we propose a method to locally
estimate the delays with which the video packets are delivered, such that each user can
calculate its pre-buffering and re-buffering time to be larger than the locally estimated
maximum queue delay. Through simulations, we demonstrate that the combination of
our scheduling policy and adaptive pre-buffering scheme is able to achieve the desired
fairness across the users and, at the same time, very small playback buffer underrun rate.
Further details in Chapter 3.
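To give a rough feeling for the idea in step iv), the following Python sketch keeps a sliding window of observed chunk delays and sets the pre-buffering time slightly above the largest one. It is only an illustration under simplifying assumptions: the class name, the window length, the safety margin and the default value are hypothetical, a chunk is counted as late simply when its delay exceeds the pre-buffering time, and the sketch is not the estimation scheme developed in Chapter 3.

```python
from collections import deque


class PrebufferEstimator:
    """Track recent chunk delays and suggest a pre-buffering time.

    Hypothetical helper: window length and margin are illustrative
    parameters, not values taken from the dissertation.
    """

    def __init__(self, window=20, margin=1.0):
        self.delays = deque(maxlen=window)  # most recent observed delays (seconds)
        self.margin = margin                # slack added on top of the maximum

    def observe(self, delay):
        """Record the delay with which one chunk was delivered."""
        self.delays.append(delay)

    def prebuffer_time(self, default=5.0):
        """Return a pre-buffering time larger than the largest recent delay."""
        if not self.delays:
            return default  # nothing observed yet: fall back to a fixed guess
        return max(self.delays) + self.margin


def underrun_rate(delays, prebuffer):
    """Fraction of chunks whose delay exceeds the pre-buffering time."""
    late = sum(1 for d in delays if d > prebuffer)
    return late / len(delays)


estimator = PrebufferEstimator(window=3, margin=0.5)
for observed in [1.0, 4.0, 2.0, 3.0]:   # hypothetical delays in seconds
    estimator.observe(observed)
suggested = estimator.prebuffer_time()   # based on the last three delays only
```

In this simplified model, any chunk whose delay stays below the suggested value causes no underrun, which mirrors the sufficient condition stated in step iv).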
1.2.2 Pull Video Streaming and Extension to MU-MIMO
Chapter 3 considers a "push" scheduling policy that can serve data out of order and
may result in data loss in the presence of intermittent connectivity and/or mobility. In
contrast, Chapter 4 takes a different approach that is robust to fast topology variations. It
opportunistically "pulls" data, in order, from nodes in the immediate neighborhood. This
results in smoother and more reliable performance. Another shortcoming of Chapter 3 is
that it considers only helpers operating according to OFDM/TDMA, i.e., serving at most
one user per transmission resource (referred to as PHY frame hereafter). As a matter of
fact, the current wireless technology trend is rapidly evolving towards multiuser MIMO
schemes (e.g., see [56-59]) where multiple users can be served on the same PHY frame
by spatial multiplexing. We therefore describe a system, WiFlix, in Chapter 4 that allows
for general wireless channel models, including multiuser MIMO as a special case.
Motivated by the forthcoming progress in small cell networks (e.g., in wave-2 of the
recent IEEE 802.11ac standard), in Chapter 4, we design efficient algorithms for adaptive
video streaming in a network of helpers capable of multiuser MIMO, i.e., serving multiple
users on the same time-frequency slot by spatial multiplexing through multiple antennas.
We devise a low-complexity greedy user selection scheme to solve optimally the underlying
combinatorial problem of selecting users for multiuser beamforming. The transmission
scheduling decisions consist of each base station choosing the subset of users for MU-MIMO
beamforming. By exploiting the channel hardening effect of high-dimensional
MIMO channels, we reduce the combinatorial weighted sum rate maximization over the
multiuser multicell network (which would involve an exponentially complex exhaustive
user selection, or some polynomial complexity heuristic greedy user selection at each
base station) to a simple subset selection problem which is optimally solved by a low-complexity
algorithm. The algorithm can again be implemented independently at the
MAC layer of every base station/access point of the network service provider.
Finally, we demonstrate through extensive simulation in a realistic network topology
and using actual encoded video data that the proposed video streaming system with
MU-MIMO base stations is very effective in improving the average video quality and
reducing the percentage of the video streaming session spent in buffering mode. Given
its promising performance, we expect that the system WiFlix described in Chapter 4 can
act as a blueprint for future practical system design.
Further details in Chapter 4.
1.3 Organization
This thesis is organized into three parts. Chapter 2 addresses the problem of user-cell
association and load balancing in a Massive MIMO heterogeneous wireless network. Then,
Chapters 3 and 4 address the problem of efficient video delivery over wireless networks.
In particular, Chapter 3 proposes a push scheduling policy in a network with multiple
helpers and nomadic users. On the other hand, Chapter 4 proposes a blueprint, "WiFlix",
for a possible future wireless system with a pull streaming policy which handles video
chunk delays in a better way and is more robust to user mobility. Furthermore, the key
extension in WiFlix is the low-complexity algorithm at the MAC layer for scheduling
subsets of users for MU-MIMO beamforming, which boosts the video streaming QoE of
the users as indicated by the simulation results in Chapter 4.
Please note that the mathematical notation used is independent across chapters.
Chapter 2
Optimal User-Cell Association for Massive MIMO Wireless
Networks
This chapter is organized as follows. In Section 2.1, we review prior work addressing
the problem of user-cell association. In Section 2.2, we describe the system model for a
network with Massive MIMO base stations and then formulate the user-cell association
problem in such a network as a convex Network Utility Maximization (NUM) problem.
We also show that fractional solutions are, in fact, physically realizable. Section 2.3
illustrates our proposed centralized scheme based on the dual subgradient method to solve
the NUM problem. Then, in Section 2.4, we consider a randomized and decentralized
user-centric association scheme where each user selfishly decides to associate with a base
station based on its own user-centric utility function. We show that this scheme actually
converges to a Nash equilibrium and that, surprisingly, such Nash equilibria are close to
being socially optimal in the sense of solving the NUM problem. Finally, simulation results
are provided in Section 2.5 showing the near optimality of the user-centric algorithm and
its superior performance over a traditional Max-RSRP scheme in terms of balancing load
evenly among all the base stations.
2.1 Related work
The need for efficient user-cell association schemes for heterogeneous networks, beyond
what is currently done for regular/uniform user density deployments, is clearly stated in
standard documents such as [13].
The literature on the broad topic of user-cell association is vast, and several approaches
targeted to different performance metrics and system assumptions have been proposed.
Providing a comprehensive coverage of such a large body of work would be out of the scope
of this dissertation. Therefore, we shall focus only on the works that more directly relate
to ours.
Joint power allocation and user-cell association for the purpose of minimizing the total
power subject to target user SINR constraints has been widely studied in the framework
of CDMA power-controlled networks [60-62]. This approach assumes that all users, on
any time-frequency slot, must maintain a certain target instantaneous rate, SINR level or,
more generally, QoS constraint (see [63]). While this may be relevant for CDMA systems,
where users continuously transmit (uplink) or receive (downlink), it is not meaningful in
the case of OFDMA/TDMA systems with scheduling, where the instantaneous rate of a
user is zero on the slots on which it is not scheduled.
Several works have considered the problem of joint user-cell association, beamforming
vector design and power allocation. This problem was shown to be NP-hard [20, 21, 64]
and approximate solution methods have been proposed. Our work differs from this line
of works because the network utility function used there is a function of the user
instantaneous rates, and the optimization of the powers and beamforming vectors is
based on the instantaneous realization of the channel matrices (see Appendix A.1). The
complex channel coefficients change over time according to the channel coherence time,
which may range from a few tens of ms for slowly moving users in systems operating in
the 2-5 GHz bands, to less than 1 ms for systems operating at mm-waves (e.g., 20-60
GHz [10, 11]). It follows that optimizing user-cell association, beamforming vectors and
powers on the basis of such rapidly varying channel state information is highly impractical,
beyond leading to mathematically involved and computationally hard problems.
In [65], the problem of user-cell association was considered for a particular model of
a multicell network operating at mm-waves (60 GHz). It is assumed that each user and
base station are equipped with a steerable antenna and that, for a given association, such
antennas point perfectly at each other such that they achieve a certain desired gain while
rejecting perfectly the interference from other base stations. This leads to deterministic
and decoupled instantaneous user rates $R_{k,j}$. In this respect, this system model is similar
to ours, where in our case the decoupled and deterministic user instantaneous rates follow
from the massive MIMO regime (see Section 2.2 and Lemma 1). However, the problem
formulation in [65] is completely different from ours, since the goal in [65] consists of
minimizing the maximum per-base-station load, subject to target user throughput demands.
This is a load-balancing problem, while here we solve a NUM problem where the
user throughputs are not assigned as a constraint, but are the result of the optimization.
In addition, [65] does not apply to a system employing multiuser MIMO at each base
station, where multiple users can be served simultaneously by a single base station on
each slot.
In [17], a multi-tier heterogeneous network with base stations that can possibly use
multiuser MIMO is considered, and the problem of user-cell association is treated from
a stochastic geometry viewpoint. Base stations in each tier and users are randomly
distributed over the system area according to Poisson point processes, and an expression
for the user SINR averaged over the stochastic base station/user placement is obtained.
Based on such expression, a biasing scheme is proposed in order to induce load-balancing
between tiers, assuming that users connect to the base station with the strongest received
signal. While the overall motivation and system view (a heterogeneous wireless network
with multi-antenna base stations) is clearly related to ours, the treatment of the problem
is completely different. Here, we obtain a pointwise global optimum solution for
the users' throughput for a given (arbitrary) placement of the users and base stations,
while in [17], a heuristic biasing scheme is obtained on the basis of an SINR performance
indicator, with averaging over the ensemble of stochastic placements.
Our problem formulation is to some extent related to that of [16]. However, [16]
assumes that each base station applies a local PF criterion, and the optimization is given
in terms of integer (binary 0-1) association variables. The resulting integer-programming
problem is relaxed, and a subgradient method, which can also be seen as an online
iterative protocol, is proposed. The same problem is considered in [66], where Lagrangian
duality is used in order to circumvent the integer programming problem, and the dual
problem is solved via a coordinate descent method, without the need for relaxation.
Notice that the problem formulation in [16, 66] applies only to base stations serving a
single user per slot (no multiuser MIMO), and uniquely to the case where each base
station applies, independently of the others, a local PF policy giving equal air-time to
its associated users. In contrast, our problem is formulated for a global network utility
function (which includes PF at the whole network level) and our NUM problem is convex
in nature, not requiring any convex relaxation.
On a separate thread, [23] proposes a user-centric game-theoretic approach to the association
problem, which is completely decentralized. The associated randomized algorithm
(which can be turned into an online protocol) is shown to converge to a Nash equilibrium
under certain conditions on the per-user utility function. The Pareto efficiency of
the Nash equilibria is studied, but it is not a priori clear whether such operating points
are close to any well-defined global system optimality (social welfare). Our user-centric
scheme is closely related to the scheme of [23]. However, we consider a more general class
of user-centric utility functions reflecting a desired notion of local (per-cell) fairness, and
we show the non-trivial fact that the corresponding user-centric schemes operate near the
system social optimum of the corresponding network-wide utility function.
2.2 System Model and Problem Definition
We consider a system formed by $J$ base stations (BSs) serving $K$ single-antenna users,
distributed over a given area. We use $j \in \mathcal{J} = \{1, 2, \ldots, J\}$ and
$k \in \mathcal{K} = \{1, 2, \ldots, K\}$ to index BSs and users, respectively. Each BS schedules
transmissions over contiguous time-frequency slots, each comprising a block of OFDM subcarriers
and symbols (see footnote 1). We use the commonly accepted and widely used block-fading channel
model [3-5, 22] and distinguish between large-scale and small-scale effects. The large-scale channel
coefficients are functions of the BS-user distance and shadowing. The small-scale effects
are modeled as Rayleigh fading coefficients that remain constant within each slot. We let
$M_j$ denote the number of antennas at BS $j$, and $S_j$ denote the number of downlink data
streams that BS $j$ can transmit on any given slot, i.e., $S_j$ is the multiplexing gain of BS
$j$ and the ratio $S_j / M_j$ is the corresponding spatial load. We assume TDD operation with
reciprocity-based channel state estimation [3, 5]. Hence, every BS antenna in the vicinity
of user $k$ can estimate its downlink channel coefficient to user $k$ from the uplink pilot
transmitted by user $k$. This enables the training of large antenna arrays (e.g., $M_j \gg 1$)
with training overhead proportional to $S_j$ [3].
2.2.1 Instantaneous rates and user throughput in massive MIMO systems
Consider the rate $R_{k,j}(t)$ that can be reliably transmitted from BS $j$ to user $k$ over a
given slot $t$. This is referred to as instantaneous user rate. In general, $R_{k,j}(t)$ depends on
both large-scale and small-scale effects, and in particular on the realization on slot $t$ of
the $M_j \times S_j$ channel matrix between the antenna array of BS $j$ and the antennas of the
$S_j$ users scheduled on slot $t$ (see Appendix A.1). Let $j_k(t)$ denote the index of the BS to
which user $k$ is associated at slot time $t$, and let $\mathcal{S}_j(t)$ denote the set of $S_j$ users
scheduled by BS $j$ on slot $t$. The sequence $\{\mathbf{j}(t) : t = 1, 2, \ldots\}$ with
$\mathbf{j}(t) = (j_1(t), \ldots, j_K(t)) \in \mathcal{J}^K$ is referred to as an association sequence.
Notice that, in general, the number of users associated to a given BS $j$ at any time $t$ may be
different from the BS spatial multiplexing gain $S_j$. In particular, if
$|\{k : j_k(t) = j\}| < S_j$, then some downlink data streams are not used, while if
$|\{k : j_k(t) = j\}| \geq S_j$ then the BS will schedule $S_j$ out of the possible
associated users to be served on slot $t$. We refer to the users scheduled on a given slot
as the active users, and let $\mathcal{S}_j(t)$ denote the set of active users of BS $j$ at time $t$. The
sequence $\{(\mathcal{S}_1(t), \ldots, \mathcal{S}_J(t)) : t = 1, 2, \ldots\}$ is referred to as an activation sequence.
Footnote 1: For example, in LTE [67], resource blocks are 7 OFDM symbols long (corresponding to a
duration of 0.5 ms), and 12 subcarriers wide (corresponding to a bandwidth of $12 \times 15$ kHz $= 180$ kHz).
For a given association sequence $\{\mathbf{j}(t)\}$ and activation sequence
$\{(\mathcal{S}_1(t), \ldots, \mathcal{S}_J(t))\}$, the throughput of user $k$ is defined as the limit of the
time-averaged scheduled instantaneous rate (see footnote 2):
\[
r_k = \lim_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} R_{k, j_k(t)}(t) \, \mathbb{1}\{k \in \mathcal{S}_{j_k(t)}(t)\}, \tag{2.1}
\]
whenever this limit exists (in the sense of convergence in probability [68]). In this work,
we restrict to ergodic stationary systems obeying the following assumptions:
A1) The large-scale channel coefficients are constant in time;
A2) The small-scale Rayleigh fading coefficients evolve across different slots according
to a stationary and ergodic process with given time-frequency correlation;
A3) The user-cell association policy and scheduling policy at each BS are such that the
limit (2.1) exists.
While A2) is a very common assumption in wireless communications [67], assumption A1)
holds locally, assuming users with low mobility with respect to the time scale over which
we observe the network. Under A1) and A2), assumption A3) is immediately verified by
stationary policies, i.e., policies that determine the user-BS association and sets of
active users as a function of the channel coefficients on each slot $t$.
At this point, we bring in the fundamental system simplification due to the massive
MIMO regime with per-BS processing [3]. In our system, the user achievable instantaneous
rates $R_{k,j}(t)$ are given by the following result:
Footnote 2: We denote the indicator function of a condition $A$ as $\mathbb{1}\{A\}$.
Lemma 1 For given large-scale channel coefficients and assuming that the small-scale
Rayleigh fading obeys the mild assumptions in [3-5], there exist deterministic quantities
$\{R_{k,j}\}$ such that $R_{k,j}(t) \xrightarrow{\text{a.s.}} R_{k,j}$, for all $k \in \mathcal{K}$ and
$j \in \mathcal{J}$, as $M_j, S_j \to \infty$ with fixed spatial load $S_j / M_j = \nu_j \geq 0$.
Furthermore, $\{R_{k,j}\}$ are functions of the system parameters but
are independent of the user-cell association and of the active user set.
Proof The proof is a consequence of the large-system analysis based on asymptotic
random matrix theory developed in [3-5] for massive MIMO multicell systems. For the
sake of completeness, in Appendix A.1 we provide explicit expressions (taken from [4, 5])
for the user instantaneous rates $\{R_{k,j}\}$ under various system assumptions.
As a consequence of Lemma 1, we have
Corollary 1 Under the assumptions of Lemma 1 and the system assumption A3), the
limit in (2.1) is given by
\[
r_k = \sum_{j \in \mathcal{J}} \theta_{k,j} R_{k,j}, \quad \forall k \in \mathcal{K} \tag{2.2}
\]
where $\theta_{k,j} = \lim_{T \to \infty} \frac{|\{t : k \in \mathcal{S}_j(t)\}|}{T}$ denotes the limit of the
fraction of slots on which user $k$ is served by BS $j$ (activity fraction).
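Proof It is sufficient to write
\begin{align}
\frac{1}{T} \sum_{t=1}^{T} R_{k, j_k(t)}(t) \, \mathbb{1}\{k \in \mathcal{S}_{j_k(t)}(t)\}
&= \frac{1}{T} \sum_{j \in \mathcal{J}} \sum_{t : j_k(t) = j} R_{k,j}(t) \, \mathbb{1}\{k \in \mathcal{S}_j(t)\} \notag \\
&= \sum_{j \in \mathcal{J}} R_{k,j} \, \frac{1}{T} \sum_{t : j_k(t) = j} \mathbb{1}\{k \in \mathcal{S}_j(t)\} \tag{2.3} \\
&= \sum_{j \in \mathcal{J}} \frac{|\{t : k \in \mathcal{S}_j(t)\}|}{T} \, R_{k,j} \tag{2.4}
\end{align}
where (2.3) follows from Lemma 1 and (2.4) follows by rearranging terms and by noticing
that the condition $k \in \mathcal{S}_j(t)$ implies that $j_k(t) = j$. Then, under assumption A3) the limit
of the fraction of time slots on which user $k$ is active on the downlink of BS $j$,
$|\{t : k \in \mathcal{S}_j(t)\}| / T$, must exist and it is denoted by $\theta_{k,j}$. Thus, taking the limit
of (2.4) for $T \to \infty$ we find (2.2).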
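As a small numerical illustration of Corollary 1, the Python sketch below compares the finite-horizon time average in (2.1) with the weighted sum in (2.2) for a single user, assuming that the instantaneous rates have already been replaced by their deterministic limits. The function names, the two BS labels, the rate values and the four-slot schedule are all hypothetical and chosen only for this example.

```python
def activity_fractions(serving_bs, bs_ids):
    """Empirical fraction of slots on which each BS serves the user.

    serving_bs[t] is the BS that serves the user on slot t, or None when
    the user is not scheduled on that slot.
    """
    num_slots = len(serving_bs)
    return {j: sum(1 for s in serving_bs if s == j) / num_slots for j in bs_ids}


def throughput_from_fractions(fractions, rates):
    """Right-hand side of (2.2): sum over BSs of activity fraction times rate."""
    return sum(fractions[j] * rates[j] for j in fractions)


def time_averaged_rate(serving_bs, rates):
    """Finite-horizon version of (2.1) with constant per-BS rates."""
    return sum(rates[j] for j in serving_bs if j is not None) / len(serving_bs)


rates = {"A": 10.0, "B": 4.0}        # hypothetical deterministic rates
schedule = ["A", None, "B", "A"]      # hypothetical schedule over four slots
fractions = activity_fractions(schedule, list(rates))
by_fractions = throughput_from_fractions(fractions, rates)
by_time_average = time_averaged_rate(schedule, rates)
```

Both quantities coincide here because the rates do not vary with the slot index, which is exactly the simplification that Lemma 1 provides.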
It is worthwhile to remark that the (a.s.) convergence in Lemma 1 is very quick with
respect to the $M_j$'s. In particular, under mild assumptions on the channel coefficients
(in particular, in the assumption of Rayleigh fading of Appendix A.1) a central limit
theorem can be proved such that, for large but finite $M_j$, the actual rate can be written
as $R_{k,j}(t) = R_{k,j} + \epsilon_{k,j}(t)$, where $\epsilon_{k,j}(t)$ is a Gaussian "fluctuation" with mean zero
and variance $O(1/M_j^2)$ (see for example [69] and references therein). It follows that the
asymptotic rate limits $\{R_{k,j}\}$ yield very accurate results even for large but practical
values of $M_j$ and $S_j$. As a matter of fact, using the asymptotic instantaneous rate limits
in lieu of the corresponding actual quantities has become a widely accepted common
practice in massive MIMO system analysis [3-5]. Therefore, we shall use the limiting
values $\{R_{k,j}\}$ as a useful and accurate proxy for the user instantaneous rates. This has
the key advantage that the user throughput, for a given network topology, depends on the
activity fractions, as seen from (2.2). Using this fact, in the next section we shall cast
the user-cell association problem as a convex NUM with respect to the variables $\{\theta_{k,j}\}$.
2.2.2 Recasting user-cell association as NUM problem
We wish to find the optimal association of users to BSs such that an overall network
utility function $U(\mathbf{r})$ of the user throughputs vector $\mathbf{r} = (r_1, \ldots, r_K)$ is maximized. We
shall choose the network utility function in order to achieve a desired balance between
network-wide overall performance and user fairness, reflected by the fact that no user
should be given zero throughput (see footnote 3). Desirable network utility functions $U(\mathbf{r})$ are concave
and componentwise monotonically increasing, such that larger user throughputs yield
larger utility, but the shape of the concave function imposes the desired notion of fairness.
In this chapter we consider the well-known and widely used family of utility functions
defined by [51]
\[
U(\mathbf{r}) = \sum_{k} \varphi_\alpha(r_k), \tag{2.5}
\]
where
\[
\varphi_\alpha(x) =
\begin{cases}
\log x & \text{for } \alpha = 1 \\
\dfrac{x^{1-\alpha}}{1-\alpha} & \text{for } \alpha \neq 1
\end{cases} \tag{2.6}
\]
and $\alpha \geq 0$ is a parameter that determines the level of fairness. For example, this family
includes PF (for $\alpha = 1$), where $U(\mathbf{r}) = \sum_k \log r_k$, and HF (for $\alpha \to \infty$), where
$U(\mathbf{r}) = \min_k r_k$.
Footnote 3: As a matter of fact, an admission control scheme at some upper layer decides which users
can join the system, such that all admitted users are given positive throughput. In practice, it is
meaningless to admit users and leave them to starve with zero throughput. While we do not treat here
admission control, it is meaningful to assume that all users treated by the association and scheduling
scheme are admitted, and therefore must be served with some positive throughput.
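The utility family in (2.5)-(2.6) is easy to evaluate numerically. The following Python sketch is one possible way to do so; the function names and the argument name `alpha` for the fairness parameter are our own choices for illustration, and the throughput values in the example are hypothetical.

```python
import math


def phi(x, alpha):
    """Per-user utility of a nonnegative throughput x, following (2.6)."""
    if x < 0 or alpha < 0:
        raise ValueError("throughput and fairness parameter must be nonnegative")
    if x == 0 and alpha >= 1:
        return -math.inf  # a starved user drives the utility to minus infinity
    if alpha == 1:
        return math.log(x)  # proportional fairness (PF)
    return x ** (1 - alpha) / (1 - alpha)


def network_utility(throughputs, alpha):
    """Network utility (2.5): sum of the per-user utilities."""
    return sum(phi(r_k, alpha) for r_k in throughputs)


balanced = network_utility([2.0, 2.0], alpha=1)   # equal split of 4 rate units
skewed = network_utility([1.0, 3.0], alpha=1)     # same total, uneven split
```

With the fairness parameter equal to one, the balanced split scores higher than the skewed split with the same sum throughput, which illustrates how the concave shape rewards fairness; with the parameter equal to zero the utility reduces to the plain sum throughput.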
In general, we may consider arbitrary restrictions on the possible user-cell associations
(e.g., some BSs may have restricted access with respect to certain users). Then, we
let $\mathcal{J}_k \subseteq \mathcal{J}$ denote the set of BSs which can potentially serve user $k$ (of course, the
unrestricted access $\mathcal{J}_k = \mathcal{J}$ is a special case). It is important to notice that, though user
$k$ is served by a single BS $j \in \mathcal{J}_k$ in any given slot, it may be served by different BSs in
$\mathcal{J}_k$ on different slots. Consequently, a user $k$ can be associated fractionally to multiple
BSs in $\mathcal{J}_k$. In this case, we have more than a single activity fraction $\theta_{k,j} : j \in \mathcal{J}_k$ taking
positive values. We express this notion formally through the following definitions:
Definition 1 Association: A user $k$ is said to be associated to the set of BSs
$\widetilde{\mathcal{J}}_k \subseteq \mathcal{J}_k$ if $\theta_{k,j} > 0$ for all $j \in \widetilde{\mathcal{J}}_k$ and
$\theta_{k,j} = 0$ for all $j \in \mathcal{J}_k \setminus \widetilde{\mathcal{J}}_k$.
Definition 2 Unique Association: A user $k$ is said to be uniquely associated if
$|\widetilde{\mathcal{J}}_k| = 1$. In this case, we denote by $j_k$ the BS to which user $k$ is uniquely
associated, i.e., $\widetilde{\mathcal{J}}_k = \{j_k\}$.
Notice that, even though user $k$ is uniquely associated to BS $j_k$, it is not necessarily served
by BS $j_k$ on all slots. For example, if $\theta_{k,j_k} = 0.5$, then BS $j_k$ serves user $k$ only on 50%
of the slots.
Definition 3 Fractional Association: A user $k$ is said to be fractionally associated to
the set of BSs $\widetilde{\mathcal{J}}_k \subseteq \mathcal{J}_k$ if $|\widetilde{\mathcal{J}}_k| > 1$.
At this point, the Network Utility Maximization (NUM) problem at hand can be expressed
by:
\begin{align}
\text{maximize} \quad & U(\mathbf{r}) \tag{2.7a} \\
\text{subject to} \quad & r_k \leq \sum_{j \in \mathcal{J}_k} \theta_{k,j} R_{k,j}, \quad \forall k \in \mathcal{K} \tag{2.7b} \\
& \sum_{k \in \mathcal{K}} \theta_{k,j} \leq S_j, \quad \forall j \in \mathcal{J} \tag{2.7c} \\
& \sum_{j \in \mathcal{J}} \theta_{k,j} \leq 1, \quad \forall k \in \mathcal{K} \tag{2.7d} \\
& r_k \geq 0, \; \theta_{k,j} \geq 0, \quad \forall k \in \mathcal{K}, \; j \in \mathcal{J}, \tag{2.7e}
\end{align}
where $R_{k,j}$ are the user instantaneous rates given by Lemma 1, and where the optimization
is with respect to $\mathbf{r}$ and $\boldsymbol{\theta} = \{\theta_{k,j}\}$. An explanation of the constraints
(2.7b)-(2.7e) is in order:
• The constraint (2.7b) follows from the expression of the user throughput in Corollary
1, when the set of allowed BSs is restricted to $\mathcal{J}_k$, and from the fact that $U(\cdot)$ is
componentwise increasing, such that the optimum is always achieved when (2.7b)
is satisfied with equality for all $k$.
• The constraints in (2.7c) reflect the fact that the sum activities of all the users being
served by any given BS $j$ cannot exceed the number of simultaneous downlink data
streams $S_j$ (multiplexing gain constraint).
• The constraint (2.7d) simply reflects the fact that each user's total activity fraction
(over all BSs) cannot be more than one (achieving the bound with equality means
that a user is served by some BS in every resource block).
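The constraints (2.7c)-(2.7e) on the activity fractions can be checked mechanically. The Python sketch below does so for activity fractions stored in nested dictionaries; the function name, the data layout, the numerical tolerance and the example values are assumptions made only for this illustration.

```python
def is_feasible(fractions, streams, tol=1e-9):
    """Check constraints (2.7c)-(2.7e) for a set of activity fractions.

    fractions[k][j] is the activity fraction of user k at BS j (missing
    entries count as zero); streams[j] is the multiplexing gain of BS j.
    """
    # (2.7e): activity fractions must be nonnegative.
    for row in fractions.values():
        if any(value < -tol for value in row.values()):
            return False
    # (2.7d): the total activity of each user cannot exceed one.
    for row in fractions.values():
        if sum(row.values()) > 1 + tol:
            return False
    # (2.7c): the load of each BS cannot exceed its number of streams.
    for bs, num_streams in streams.items():
        load = sum(row.get(bs, 0.0) for row in fractions.values())
        if load > num_streams + tol:
            return False
    return True


# Hypothetical example: three users, two BSs.
example = {
    "u1": {"A": 0.5, "B": 0.5},   # fractionally associated to both BSs
    "u2": {"A": 1.0},             # uniquely associated to BS A
    "u3": {"B": 0.25},            # uniquely associated, served on 25% of slots
}
```

With two streams at BS A and one at BS B the example is feasible, whereas reducing BS A to a single stream violates the multiplexing gain constraint because its load is 1.5.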
Remark 1 An immediate consequence of the network utility function in (2.5) is that,
for $\alpha \geq 1$, the solution of (2.7) must associate all users, i.e., for all $k \in \mathcal{K}$ it must be
$|\widetilde{\mathcal{J}}_k| \geq 1$. Otherwise, some user would have zero throughput, yielding $U(\mathbf{r}) = -\infty$. The
feature that all users in the system are served with non-zero throughput reflects the notion
of fairness built into the NUM problem and it is very desirable in practice, as we have
already remarked. Therefore, from now on, we shall restrict to $\alpha \geq 1$.
We now give some further definitions that will be useful in the sequel.
Definition 4 Feasible Association Configuration: Any set of activity fractions $\{\theta_{k,j}\}$
satisfying (2.7c)-(2.7e) is said to be a feasible association configuration.
Definition 5 Integer Scheduling Configuration: A feasible association configuration
is said to be an integer scheduling configuration if $\theta_{k,j} \in \{0, 1\}$ for all pairs $(k, j)$.
In terms of system implementation, it is relevant to consider whether a given feasible
association configuration can be achieved as a limit, for $T \to \infty$, of the empirical activity
fractions resulting from some actual association and activation sequences. In particular,
we have:
Definition 6 Physical Realizability: A feasible association configuration is said
to be physically realizable if there exist an association sequence $\{\mathbf{j}(t)\}$ and an activation
sequence $\{(\mathcal{S}_1(t), \ldots, \mathcal{S}_J(t))\}$ such that
$\theta_{k,j} = \lim_{T \to \infty} \frac{|\{t : k \in \mathcal{S}_j(t)\}|}{T}$ for all $k, j$.
Notice that integer scheduling configurations correspond to time-invariant association
and activation sequences. In fact, in this case the limit of the $(k, j)$-th activity fraction
is equal to 1 if user $k$ is permanently associated and scheduled (active) on BS $j$, or 0
if it is not associated or it is associated but never scheduled. Hence, it is immediate to
see that if $\boldsymbol{\theta}$ is a convex combination of some integer scheduling configurations, then
$\boldsymbol{\theta}$ is physically realizable. Building on this key observation, the following result yields the
physical realizability (in the sense of Definition 6) of the feasible association configurations
of the NUM problem (2.7).
Theorem 1 Any feasible association configuration is physically realizable.
Proof See Appendix A.2.
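The observation that convex combinations of integer scheduling configurations are physically realizable can be illustrated by plain time-sharing. The Python sketch below is not the construction of Appendix A.2: it assumes that the integer configurations and their weights are already given, and it uses a simple largest-deficit rule (our own choice) to decide which configuration is active on each slot. All names and values are hypothetical.

```python
def time_share(configs, weights, num_slots):
    """Build a schedule whose slot shares track the given convex weights.

    configs[i] is a set of (user, bs) pairs that are active together;
    weights[i] is the target fraction of slots for configs[i].
    """
    counts = [0] * len(configs)
    schedule = []
    for t in range(1, num_slots + 1):
        # Pick the configuration that lags most behind its target share.
        i = max(range(len(configs)), key=lambda n: weights[n] * t - counts[n])
        counts[i] += 1
        schedule.append(configs[i])
    return schedule


def empirical_fractions(schedule):
    """Fraction of slots on which each (user, bs) pair is active."""
    totals = {}
    for active_pairs in schedule:
        for pair in active_pairs:
            totals[pair] = totals.get(pair, 0) + 1
    return {pair: n / len(schedule) for pair, n in totals.items()}


# Two hypothetical integer configurations for users u1, u2 and BSs A, B.
configs = [
    {("u1", "A")},                  # u1 on BS A, u2 idle
    {("u1", "B"), ("u2", "A")},     # u1 on BS B, u2 on BS A
]
weights = [0.25, 0.75]
realized = empirical_fractions(time_share(configs, weights, num_slots=8))
```

After eight slots the empirical activity fractions equal the convex combination of the two configurations, so user u1 is fractionally associated to both BSs while never being served by more than one BS on the same slot.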
We conclude this section by pointing out some observations about our NUM problem
formulation and its solution:
• Problem (2.7) is convex. We shall develop in Section 2.3 an efficient method for its
solution that also sheds light on the properties of the optimal solution.
• Several seemingly similar user-cell association problems have been formulated assuming
that each user is constrained to be associated permanently to a single
BS [16, 70, 71]. As a consequence, the resulting optimization is combinatorial, since
it includes an additional set of constraints restricting the feasible association configurations
to be unique associations (see Definition 2).
• Problem (2.7) can be seen as a convex relaxation of the corresponding unique association
combinatorial problem. Nevertheless, its solution can be implemented (as
a consequence of Theorem 1). Hence, the resulting optimal utility function value
provides a (feasible) upper bound benchmark to any user-cell association scheme
imposing unique association, for the massive MIMO network with given user instantaneous
rates $\{R_{k,j}\}$ and BS multiplexing gains $\{S_j\}$.
2.3 Centralized Solution
In order to solve the convex program (2.7), general purpose numerical solvers like CVX
[72] or powerful numerical methods such as the alternating direction method of multipliers
[73] can be used. However, here we focus on solving (2.7) by using a direct
method based on Lagrangian duality. This yields both an efficient numerical method,
28
able to easily handle networks with hundreds of users and tens of base stations, and has
the nonnegligible advantage of illuminating the structure and properties of the solution,
which will be used to establish the nearoptimality of a decentralized usercentric scheme
studied in Section 2.4. We rst formulate the dual program of (2.7) for a general utility
function Upq. We then specialize to the class dened in (2.5) and develop a centralized
algorithm to solve the dual program. The optimal solution of the dual program is then
used to obtain the primal variables.
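We form the Lagrangian function for the primal problem (2.7) by introducing the dual
variables/prices (we use the terms dual variables and prices interchangeably)
$\lambda = \{\lambda_k\}$ for the constraint (2.7b), $p = \{p_j\}$ for the constraint (2.7c) and
$\mu = \{\mu_k\}$ for the constraint (2.7d). Then, the Lagrangian function takes on the form:
\begin{align}
L(\alpha, r, \lambda, p, \mu) &= U(r) - \sum_k \lambda_k \Big( r_k - \sum_j \alpha_{k,j} R_{k,j} \Big) - \sum_j p_j \Big( \sum_k \alpha_{k,j} - S_j \Big) - \sum_k \mu_k \Big( \sum_j \alpha_{k,j} - 1 \Big) \tag{2.8} \\
&= U(r) - \sum_k \lambda_k r_k + \sum_j S_j p_j + \sum_k \mu_k + \sum_{(k,j)} \alpha_{k,j} \big( \lambda_k R_{k,j} - p_j - \mu_k \big). \tag{2.9}
\end{align}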
The dual function is given by the maximum of the Lagrangian over the primal variables
$\alpha \ge 0$ and $r \ge 0$:
\[ G(\lambda, p, \mu) = \max_{\alpha, r \ge 0} L(\alpha, r, \lambda, p, \mu). \tag{2.10} \]
The value $G(\lambda, p, \mu)$, for any set of nonnegative dual variables, provides an upper bound
on the optimal primal objective value. The dual program finds the tightest of such upper
bounds by minimizing $G(\lambda, p, \mu)$ over the feasible set of dual variables [74], i.e., it is given
by:
\begin{align*}
\text{minimize} \quad & G(\lambda, p, \mu) \\
\text{subject to} \quad & \lambda, p, \mu \ge 0.
\end{align*}
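From (2.9), we observe that $G(\lambda, p, \mu) = \infty$ if $\lambda_k R_{k,j} - p_j - \mu_k > 0$ for some $(k,j)$. In
the case when $\lambda_k R_{k,j} - p_j - \mu_k \le 0$ for all $(k,j)$, it is easy to see that $L(\alpha, r, \lambda, p, \mu)$
is maximized when $\alpha$ is chosen such that the term $\sum_{(k,j)} \alpha_{k,j} (\lambda_k R_{k,j} - p_j - \mu_k)$ in (2.9)
vanishes. From these observations, we find the dual program equivalent form:
\begin{align}
\text{minimize} \quad & \max_{r \ge 0} \Big\{ U(r) - \sum_k \lambda_k r_k \Big\} + \sum_j S_j p_j + \sum_k \mu_k \tag{2.11a} \\
\text{subject to} \quad & \lambda_k R_{k,j} \le p_j + \mu_k \quad \forall (k,j) \tag{2.11b} \\
& \lambda, p, \mu \ge 0. \tag{2.11c}
\end{align}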
We now particularize to the class of network utility functions in (2.5). Thanks to the
additive form of the network utility function, the maximization with respect to $r$ in (2.11a)
decomposes into the sum (over $k \in \mathcal{K}$) of individual maximizations of the terms
\[ \phi_\gamma(r_k) - \lambda_k r_k, \quad k \in \mathcal{K}. \]
Setting the derivative with respect to $r_k$ to zero, it is immediate to show that the maximum
is achieved for
\[ r_k = \lambda_k^{-\eta}, \tag{2.12} \]
where, for future convenience, we define $\eta = 1/\gamma$. The corresponding (maximum) value is
$-\frac{1}{1-\eta} \lambda_k^{1-\eta}$ for $\gamma \ne 1$, and $-\log \lambda_k - 1$ for $\gamma = 1$. We first consider in detail the case
$\gamma \ne 1$.
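As a quick check of (2.12) and of the maximum value quoted above, the per-user maximization can be worked out explicitly. This is only a sketch: it assumes the usual fairness utility $\phi_\gamma(x) = x^{1-\gamma}/(1-\gamma)$ for $\gamma \ne 1$ and $\phi_1(x) = \log x$ (consistent with $\phi'_\gamma(x) = x^{-\gamma}$ used in Section 2.3.1), writes $\gamma$ for the fairness parameter and $\lambda_k$ for the price of user $k$, and uses $\eta\gamma = 1$.

```latex
\begin{align*}
\frac{d}{dr_k}\big[\phi_\gamma(r_k) - \lambda_k r_k\big] = r_k^{-\gamma} - \lambda_k = 0
  \;&\Longrightarrow\; r_k = \lambda_k^{-1/\gamma} = \lambda_k^{-\eta}, \\
\phi_\gamma(\lambda_k^{-\eta}) - \lambda_k \lambda_k^{-\eta}
  &= \frac{\lambda_k^{-\eta(1-\gamma)}}{1-\gamma} - \lambda_k^{1-\eta}
   = \Big(\frac{1}{1-\gamma} - 1\Big)\lambda_k^{1-\eta}
   = -\frac{1}{1-\eta}\,\lambda_k^{1-\eta} \qquad (\gamma \ne 1), \\
\log\big(\lambda_k^{-1}\big) - \lambda_k \lambda_k^{-1}
  &= -\log \lambda_k - 1 \qquad (\gamma = 1).
\end{align*}
```

The second line uses $-\eta(1-\gamma) = 1 - \eta$ and $\gamma/(1-\gamma) = 1/(\eta - 1)$, both of which follow from $\eta\gamma = 1$.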
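Using (2.12) in (2.11a), we obtain the dual program in the form:
\begin{align}
\text{minimize} \quad & \sum_j S_j p_j + \sum_k \mu_k - \frac{1}{1-\eta} \sum_k \lambda_k^{1-\eta} \tag{2.13a} \\
\text{subject to} \quad & \lambda_k \le \frac{p_j + \mu_k}{R_{k,j}} \quad \forall (k,j) \tag{2.13b} \\
& \lambda, p, \mu \ge 0. \tag{2.13c}
\end{align}
The minimization over $\lambda$ is immediate, and yields
\[ \lambda_k = \min_j \left\{ \frac{p_j + \mu_k}{R_{k,j}} \right\}. \tag{2.14} \]
Replacing, we obtain:
\begin{align}
\text{minimize} \quad & \check{G}(p, \mu) = \sum_j S_j p_j + \sum_k \mu_k - \frac{1}{1-\eta} \sum_k \left( \min_j \left\{ \frac{p_j + \mu_k}{R_{k,j}} \right\} \right)^{1-\eta} \tag{2.15a} \\
\text{subject to} \quad & p, \mu \ge 0. \tag{2.15b}
\end{align}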
We next describe a convergent subgradient algorithm to approximate arbitrarily closely
the solution to (2.15). We let $(p^{(i)}, \mu^{(i)})$ denote the value of the dual variables at the
$i$th subgradient iteration. The $(i+1)$th iterate is given as the $i$th iterate minus an
appropriately scaled adjustment along the subgradient chosen at the current iteration [75].
The subgradient algorithm comprises the following steps:
1. Initialize $(p, \mu)$ to some arbitrary positive values $(p^{(0)}, \mu^{(0)})$ and let $i = 0$. Also
choose the number of iterations$^4$ $i_{\max}$ and the step sequence $s^{(i)} = \frac{a}{b+i}$ with
appropriately chosen constants $a > 0$, $b \ge 0$.
2. Choose the subgradient for the $i$th iteration, based on the objective function in
(2.15a) evaluated in the neighborhood of $(p^{(i)}, \mu^{(i)})$. In particular, let:$^5$
\[ j_k^{(i)} = \arg\max_j \frac{R_{k,j}}{p_j^{(i)} + \mu_k^{(i)}}, \tag{2.16a} \]
and let
\[ \mathcal{K}_j^{(i)} = \left\{ k \in \mathcal{K} \ \text{s.t.}\ j_k^{(i)} = j \right\}. \tag{2.16b} \]
The $i$th iteration subgradient is based on the derivative of the term
\[ \check{G}^{(i)}(p, \mu) = \sum_j S_j p_j + \sum_k \mu_k - F^{(i)}(p, \mu) \]
where
\[ F^{(i)}(p, \mu) = \frac{1}{1-\eta} \sum_k \left( \frac{p_{j_k^{(i)}} + \mu_k}{R_{k,j_k^{(i)}}} \right)^{1-\eta} = \frac{1}{1-\eta} \sum_j \sum_{k \in \mathcal{K}_j^{(i)}} \left( \frac{p_j + \mu_k}{R_{k,j}} \right)^{1-\eta}. \]
3. Taking derivatives of $\check{G}^{(i)}(p, \mu)$ with respect to $p_j$ and $\mu_k$, respectively, and accounting
for the nonnegativity constraint of $p_j$ and $\mu_k$, the corresponding subgradient iteration is
given by
\begin{align}
p_j^{(i+1)} &= \left[ p_j^{(i)} + s^{(i)} \left( \sum_{k \in \mathcal{K}_j^{(i)}} \frac{R_{k,j}^{\eta-1}}{\big( p_j^{(i)} + \mu_k^{(i)} \big)^{\eta}} - S_j \right) \right]_+ \tag{2.16c} \\
\mu_k^{(i+1)} &= \left[ \mu_k^{(i)} + s^{(i)} \left( \frac{R_{k,j_k^{(i)}}^{\eta-1}}{\big( p_{j_k^{(i)}}^{(i)} + \mu_k^{(i)} \big)^{\eta}} - 1 \right) \right]_+ \tag{2.16d}
\end{align}
where $[x]_+ = \max\{x, 0\}$.
4. If $i < i_{\max}$ increment $i$ by 1 and go to step 2, else stop.
$^4$ Alternatively we could choose a stopping criterion for the algorithm.
$^5$ In case multiple $j$ indices maximize $R_{k,j} / (p_j^{(i)} + \mu_k^{(i)})$, any one of these can be used.
It can easily be verified that the formulas (2.16) also provide the corresponding
subgradient algorithm iteration updates in the case $\gamma = 1$.
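The steps above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the implementation used for the reported results: the names `dual_subgradient` and `throughputs` are hypothetical, every BS is assumed reachable by every user, the symbols `p`, `mu` and `eta` stand for the BS prices, the user prices and $\eta = 1/\gamma$, and a tiny positive floor on `p` is added here purely to avoid a division by zero.

```python
def dual_subgradient(R, S, eta, a=1.0, b=1.0, i_max=20000, floor=1e-9):
    """Run the dual subgradient iteration (2.16a)-(2.16d).

    R[k][j] > 0 is the peak rate of user k at BS j (every BS is assumed
    reachable by every user), S[j] is the multiplexing gain of BS j, and
    eta = 1/gamma with 0 < eta <= 1.
    """
    K, J = len(R), len(S)
    p = [1.0] * J    # arbitrary positive initial BS prices
    mu = [1.0] * K   # arbitrary positive initial user prices
    for i in range(i_max):
        s = a / (b + i)  # diminishing step size s^(i) = a / (b + i)
        # (2.16a): BS with the largest R / (p + mu) for each user
        best = [max(range(J), key=lambda j: R[k][j] / (p[j] + mu[k]))
                for k in range(K)]
        # activity fraction each user requests at the current prices
        demand = [R[k][best[k]] ** (eta - 1.0) / (p[best[k]] + mu[k]) ** eta
                  for k in range(K)]
        load = [0.0] * J
        for k in range(K):
            load[best[k]] += demand[k]
        # (2.16c)-(2.16d): move prices along demand minus supply, then project;
        # p is kept above a tiny floor (a guard added here, not in the text)
        p = [max(p[j] + s * (load[j] - S[j]), floor) for j in range(J)]
        mu = [max(mu[k] + s * (demand[k] - 1.0), 0.0) for k in range(K)]
    return p, mu


def throughputs(R, p, mu, eta):
    """User throughputs implied by the prices: (max_j R / (p + mu)) ** eta."""
    return [max(R[k][j] / (p[j] + mu[k]) for j in range(len(p))) ** eta
            for k in range(len(R))]


# Toy case: one BS with S = 1 and two identical users under PF (eta = 1).
p, mu = dual_subgradient([[1.0], [1.0]], [1.0], eta=1.0)
r = throughputs([[1.0], [1.0]], p, mu, eta=1.0)
```

In the toy case the two users share the single stream equally, so each throughput should approach 1/2 as the number of iterations grows. A stopping criterion on the change in prices could replace the fixed iteration count.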
In the following, we let $(p^\star, \mu^\star)$ denote the dual variable values after a sufficiently
large number of iterations of (2.16a)-(2.16d). Once $(p^\star, \mu^\star)$ have been obtained, we need
to solve for the corresponding primal variables $\alpha^\star$ in (2.7), thus obtaining the optimal
association configuration and the corresponding optimal user throughputs. We first
discuss the KKT conditions of (2.7). Then, in Section 2.3.2, we consider the primal
variables solution.
2.3.1 KKT conditions
The convex program (2.7) is given in canonical form with linear inequality constraints
(2.7b)-(2.7e). Therefore the Slater condition reduces to feasibility. This implies that
strong duality holds and the KKT conditions including the feasibility and the
complementary slackness conditions are both necessary and sufficient for optimality. Noticing
that all variables are nonnegative (for a classical argument, see [76, Th. 4.4.1]), by taking
the partial derivatives of $L(\alpha, r, \lambda, p, \mu)$ with respect to $r_k$ and $\alpha_{k,j}$ we obtain necessary
and sufficient conditions for optimality in the form
\begin{align}
\frac{\partial L}{\partial r_k} &= \phi'_\gamma(r_k) - \lambda_k \le 0 \tag{2.17} \\
\frac{\partial L}{\partial \alpha_{k,j}} &= \lambda_k R_{k,j} - p_j - \mu_k \le 0 \tag{2.18}
\end{align}
where inequalities (2.17)-(2.18) must hold with equality for the strictly positive
components $r_k$, $\alpha_{k,j}$, respectively, at the optimal points. The complementary slackness
conditions are equivalently expressed as follows: at the optimal point, the inequalities
\begin{align}
\sum_{k \in \mathcal{K}} \alpha_{k,j} - S_j &\le 0 \tag{2.19} \\
\sum_{j \in \mathcal{J}} \alpha_{k,j} - 1 &\le 0, \tag{2.20}
\end{align}
must hold with equality for all strictly positive components of $p$ and $\mu$, respectively.
On the other hand, for the components $p_j = 0$ (resp., $\mu_k = 0$), (2.19) (resp., (2.20)) may
hold with strict inequality.
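Let $(\alpha^o, r^o, \lambda^o, p^o, \mu^o)$ denote an optimal point, i.e., a set of values for $(\alpha, r, \lambda, p, \mu)$
that achieves the min (w.r.t. the dual variables) of the max (w.r.t. the primal variables)
of (2.9), and recall that we assumed $\gamma \ge 1$, implying that $r_k^o = 0$ for some $k$ cannot
be an optimal point (see Remark 1). This implies that (2.17) must hold with equality
at $(\alpha, r, \lambda, p, \mu) = (\alpha^o, r^o, \lambda^o, p^o, \mu^o)$. Using the expression $\phi'_\gamma(x) = x^{-\gamma}$, the definition
$r_k = \sum_{j \in \mathcal{J}_k} \alpha_{k,j} R_{k,j}$ and substituting $\lambda_k = \phi'_\gamma(r_k)$ in (2.18), (2.17)-(2.18) reduce to
\begin{align}
\sum_{j' \in \mathcal{J}_k} \alpha^o_{k,j'} R_{k,j'} &= \left( \frac{R_{k,j}}{p_j^o + \mu_k^o} \right)^{\eta} \quad \forall\, j \in \mathcal{J}_k(p^o, \mu^o) \tag{2.21} \\
\sum_{j' \in \mathcal{J}_k} \alpha^o_{k,j'} R_{k,j'} &> \left( \frac{R_{k,j}}{p_j^o + \mu_k^o} \right)^{\eta} \quad \forall\, j \notin \mathcal{J}_k(p^o, \mu^o), \tag{2.22}
\end{align}
where the set $\mathcal{J}_k(p^o, \mu^o)$ is given by
\[ \mathcal{J}_k(p^o, \mu^o) = \left\{ j : \frac{R_{k,j}}{p_j^o + \mu_k^o} = \max_{j' \in \mathcal{J}_k} \frac{R_{k,j'}}{p_{j'}^o + \mu_k^o} \right\}. \tag{2.23} \]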
Summarizing, the consistency conditions for the optimality of the activity fractions at an
optimal set of prices are as follows: for each user $k \in \mathcal{K}$ and the sets of BSs defined in
(2.23), we have
\[ \begin{cases} \alpha^o_{k,j} > 0 & \text{for some } j \in \mathcal{J}_k(p^o, \mu^o) \\ \alpha^o_{k,j} = 0 & \forall\, j \notin \mathcal{J}_k(p^o, \mu^o). \end{cases} \tag{2.24} \]
Interpreting the quantity $\frac{R_{k,j}}{p_j^o + \mu_k^o}$ as the "bang-per-buck" offered by BS $j$ to user $k$ (a
term from economics [77]) at the prices $p_j^o$ and $\mu_k^o$, $\mathcal{J}_k(p^o, \mu^o)$ is the set of BSs offering
the maximum bang-per-buck to user $k$. Thus, the conditions (2.21) and (2.24) imply
that, at the optimum prices $p^o$ and $\mu^o$,
• the throughput of every user $k$ should be equal to the maximum bang-per-buck
value offered by some BS in its neighborhood;
• every user $k$ can have a strictly positive activity fraction to only those BSs which
offer the maximum bang-per-buck value, among the BSs in $\mathcal{J}_k$ (from which user $k$
is allowed to get service).
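The bang-per-buck sets in (2.23) are straightforward to compute once prices are available. The following is a minimal sketch under assumptions of its own: the function name `best_bs_sets` is hypothetical, each row of `R` is a dictionary mapping the BSs a user may be served by to its peak rate, and a small relative tolerance `tol` (not part of the text) decides when two ratios count as equal in floating point.

```python
def best_bs_sets(R, p, mu, tol=1e-9):
    """Return, for each user k, the set of BSs with maximum R / (p + mu).

    R[k] is a dict {BS index: peak rate} over the BSs user k may use,
    p[j] is the price of BS j and mu[k] the price of user k.
    """
    sets = []
    for k, row in enumerate(R):
        ratio = {j: rate / (p[j] + mu[k]) for j, rate in row.items()}
        top = max(ratio.values())
        # keep every BS whose ratio matches the maximum up to the tolerance
        sets.append({j for j, v in ratio.items() if v >= top * (1.0 - tol)})
    return sets


# Two users, two BSs: user 0 is indifferent, user 1 strictly prefers BS 1.
example = best_bs_sets([{0: 2.0, 1: 1.0}, {0: 1.0, 1: 3.0}],
                       p=[2.0, 1.0], mu=[0.0, 0.0])
```

With numerically computed prices the tolerance matters: exact ties are rare in floating point, so an overly small `tol` would make every set a singleton.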
2.3.2 Solving for the Primal Variables
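We now describe how to use the solution $(p^\star, \mu^\star)$ of the dual problem (2.15) to solve for
the primal variables. First we note that, using (2.12) and (2.14) we have
\[ r_k^\star = \max_j \left\{ \frac{R_{k,j}}{p_j^\star + \mu_k^\star} \right\}^{\eta}, \quad \forall\, k \in \mathcal{K}. \tag{2.25} \]
Hence, the optimal user throughputs are given by (2.25). In order to calculate the
optimal association configuration $\alpha^\star$ from $(p^\star, \mu^\star)$, we can solve the KKT conditions. In
particular, we can choose any feasible association configuration $\alpha^\star$ satisfying
\begin{align}
\alpha^\star_{k,j} &= 0 \quad \forall\, j \notin \mathcal{J}_k(p^\star, \mu^\star) \tag{2.26a} \\
\sum_{j' \in \mathcal{J}_k(p^\star, \mu^\star)} \alpha^\star_{k,j'} R_{k,j'} &> \left( \frac{R_{k,j}}{p_j^\star + \mu_k^\star} \right)^{\eta} \quad \forall\, j \notin \mathcal{J}_k(p^\star, \mu^\star) \tag{2.26b} \\
\sum_{j' \in \mathcal{J}_k(p^\star, \mu^\star)} \alpha^\star_{k,j'} R_{k,j'} &= \left( \frac{R_{k,j}}{p_j^\star + \mu_k^\star} \right)^{\eta} \quad \forall\, j \in \mathcal{J}_k(p^\star, \mu^\star) \tag{2.26c} \\
p_j^\star \Big( \sum_{k \in \mathcal{K}} \alpha^\star_{k,j} - S_j \Big) &= 0 \quad \forall\, j \in \mathcal{J} \tag{2.26d} \\
\mu_k^\star \Big( \sum_{j \in \mathcal{J}} \alpha^\star_{k,j} - 1 \Big) &= 0 \quad \forall\, k \in \mathcal{K}. \tag{2.26e}
\end{align}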
However, in practice, the dual subgradient algorithm yields dual variables that differ
from their optimal value by some very small numerical error, due to finite machine
precision and finite number of iterations. Hence, the system of KKT conditions above may
not have a solution when $p^\star$ and $\mu^\star$ are numerically calculated. We therefore propose a
numerically stable approach that always yields a feasible association configuration and,
in particular, yields the exact optimal $\alpha^\star$ when $p^\star$ and $\mu^\star$ are exactly at their optimal
point (see Lemma 2 below).
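As noticed before, the optimal throughput values $r^\star$ are given by (2.25). Define the
ratios $\tilde{R}_{k,j} = R_{k,j} / r_k^\star$, and consider the quantities $f_k(\alpha) = \sum_{j \in \mathcal{J}_k} \alpha_{k,j} \tilde{R}_{k,j}$, for $k \in \mathcal{K}$.
By construction, there exists an optimal feasible association configuration $\alpha^\star$ such that
$f_k(\alpha^\star) = 1$, i.e., there is an optimal point where the (linear) functions $f_k(\alpha)$ are equal to
1, for all $k \in \mathcal{K}$. This suggests that $\alpha^\star$ can be found as the solution of the LP:
\begin{align}
\text{maximize} \quad & \beta \tag{2.27a} \\
\text{subject to} \quad & \beta \le f_k(\alpha), \quad \forall\, k \in \mathcal{K} \tag{2.27b} \\
& \sum_{k \in \mathcal{K}} \alpha_{k,j} \le S_j, \quad \forall\, j \in \mathcal{J} \tag{2.27c} \\
& \sum_{j \in \mathcal{J}} \alpha_{k,j} \le 1, \quad \forall\, k \in \mathcal{K} \tag{2.27d} \\
& \alpha_{k,j} \ge 0, \quad \forall\, k \in \mathcal{K},\ j \in \mathcal{J}. \tag{2.27e}
\end{align}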
We have:
Lemma 2 If the throughputs $r_k^\star$ given by (2.25) correspond to the exact optimal solution
of the NUM problem (2.7), then the solution of (2.27) is the corresponding optimal feasible
association configuration.
Proof Notice that (2.27) maximizes the minimum $f_k(\alpha)$ by maximizing a common lower
bound $\beta$ subject to the feasibility of the association configuration. Let $\hat{\alpha}$ denote the
solution of (2.27) and let $\beta_{\max} = \min_k f_k(\hat{\alpha})$ denote the achieved maximum value of the
common lower bound. Since, by construction, there exists a feasible configuration $\alpha^\star$ for
which $f_k(\alpha^\star) = 1$ for all $k \in \mathcal{K}$, there are two possible cases: 1) $\beta_{\max} < 1$, or 2) $\beta_{\max} \ge 1$.
Case 1) is impossible, otherwise $\hat{\alpha}$ could be improved to $\alpha^\star$, contradicting the assumption
that $\hat{\alpha}$ is the solution of (2.27). Case 2) can only hold with equality. In fact, if it held
with strict inequality, we would have $f_k(\hat{\alpha}) > 1$ for all $k$, implying
\[ \sum_{j \in \mathcal{J}_k} \hat{\alpha}_{k,j} R_{k,j} > \sum_{j \in \mathcal{J}_k} \alpha^\star_{k,j} R_{k,j} = r_k^\star, \quad \forall\, k \in \mathcal{K}. \]
Since the network utility function $U(\cdot)$ is componentwise increasing, this means that there
exists a feasible association configuration $\hat{\alpha}$ yielding better utility than the optimal $\alpha^\star$,
thus leading to a contradiction. It follows that it must be $\beta_{\max} = 1 = f_k(\hat{\alpha})$ for all $k \in \mathcal{K}$,
implying $\hat{\alpha} = \alpha^\star$ since, by the system model setup, $R_{k,j} > 0$ for all $j \in \mathcal{J}_k$.
Remark 2 In order to derive a centralized association/scheduling policy that yields the
activity fractions that are the solution of (2.7), some association and activation sequences
must be found with empirical activity fractions converging (in the limit for $T \to \infty$) to
$\alpha^\star$. Theorem 1 guarantees that this is possible. However, finding such sequences is not
easy in general. From the proof of Theorem 1 in Appendix A.2, we can see that this is
equivalent to finding integer scheduling configurations such that $\alpha^\star$ can be written as their
convex combination. Unfortunately, this is again a combinatorial problem that may be
hard to solve in general.
Motivated by this consideration, in the next section we study a class of schemes
that yield unique association configurations and fully decentralized user-centric
association/scheduling policies. Yet, somewhat surprisingly, these schemes are shown to perform
close to the globally optimal centralized solution.
2.4 Distributed User-Cell Association Algorithms
In this section, we focus on user-cell association algorithms where each user makes its own
association decisions in a selfish way, i.e., based on its own user-centric utility function.
In particular, we consider a class of such schemes where the user-centric utility function
is the user throughput, $r_k$, and each BS applies a local version of the NUM (2.7) in order
to allocate its transmission resources (time-frequency slots) among its associated users.
Letting $\mathcal{K}_j$ denote the set of users uniquely associated to BS $j$, the service policy at
each BS $j \in \mathcal{J}$ solves the following local NUM problem:
\begin{align}
\text{maximize} \quad & \sum_{k \in \mathcal{K}_j} \phi_\gamma(\alpha_{k,j} R_{k,j}) \tag{2.28a} \\
\text{subject to} \quad & \sum_{k \in \mathcal{K}_j} \alpha_{k,j} \le S_j, \tag{2.28b} \\
& 0 \le \alpha_{k,j} \le 1, \quad \forall\, k \in \mathcal{K}_j. \tag{2.28c}
\end{align}
The solution of this problem is given by:
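Theorem 2 For $\gamma \ge 1$ (i.e., $\eta \le 1$), without loss of generality assume that$^6$
\[ R_{1,j}^{\eta-1} \ge R_{2,j}^{\eta-1} \ge \cdots \ge R_{|\mathcal{K}_j|,j}^{\eta-1}. \tag{2.29} \]
Setting $R_{0,j}^{\eta-1} = \infty$ and $R_{|\mathcal{K}_j|+1,j}^{\eta-1} = 0$, let $k^\star \in \{1, \ldots, |\mathcal{K}_j|\}$ be such that
\[ R_{k^\star-1,j}^{\eta-1} \ge \frac{\sum_{k=k^\star}^{|\mathcal{K}_j|} R_{k,j}^{\eta-1}}{S_j - k^\star + 1} > R_{k^\star,j}^{\eta-1}. \tag{2.30} \]
Then, the solution of (2.28) is given by
\[ \alpha_{k,j} = \begin{cases} 1, & \text{for } 1 \le k \le k^\star - 1 \\[4pt] \dfrac{(S_j - k^\star + 1)\, R_{k,j}^{\eta-1}}{\sum_{k'=k^\star}^{|\mathcal{K}_j|} R_{k',j}^{\eta-1}}, & \text{for } k^\star \le k \le |\mathcal{K}_j| \end{cases} \tag{2.31} \]
$^6$ If this is not the case, the users must be sorted in nonincreasing order with respect to the values
$R_{k,j}^{\eta-1}$ by some permutation $\pi$, and the statement of the theorem is valid by replacing the user index $k$
with its permuted version $\pi(k)$.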
Proof The proof is given in Appendix A.3. Here, we just note that an index $k^\star$ satisfying
(2.30) always exists. When this is not unique (e.g., in the case $|\mathcal{K}_j| = S_j$ and $\gamma = 1$), any
choice of $k^\star$ satisfying (2.30) yields the same optimal value of the objective function.
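The structure of the solution in Theorem 2 is a capped proportional split: fractions are proportional to $R_{k,j}^{\eta-1}$, and any user whose share would exceed 1 is capped at 1 while the remaining streams are redistributed. The sketch below illustrates this for a single BS. It is an assumption-laden illustration rather than the procedure of Appendix A.3: the function name `local_num_fractions` is hypothetical, the search over the number of capped users stands in for finding the index in (2.30), and the case where the BS has at least as many streams as users is handled by serving everyone with fraction 1.

```python
def local_num_fractions(rates, S, eta):
    """Activity fractions at one BS for users with the given peak rates.

    rates: positive peak rates of the users associated with the BS,
    S: number of downlink streams of the BS, eta = 1/gamma with 0 < eta <= 1.
    """
    K = len(rates)
    if S >= K:                    # enough streams: every user is always served
        return [1.0] * K
    w = [r ** (eta - 1.0) for r in rates]            # weights R^(eta-1)
    order = sorted(range(K), key=lambda k: w[k], reverse=True)
    alpha = [0.0] * K
    for m in range(K):            # m = number of users capped at 1
        rest = order[m:]
        level = (S - m) / sum(w[k] for k in rest)    # common scaling factor
        if level * w[rest[0]] <= 1.0:                # largest share is feasible
            for k in order[:m]:
                alpha[k] = 1.0
            for k in rest:
                alpha[k] = level * w[k]
            return alpha
    return alpha                  # not reached when S < K


pf = local_num_fractions([1.0, 1.0, 1.0, 1.0], S=2, eta=1.0)   # equal airtime
capped = local_num_fractions([0.01, 1.0, 1.0], S=2, eta=0.5)   # weak user capped
```

Under PF (`eta = 1`) all weights equal 1 and the split reduces to equal airtime, while for `eta < 1` users with lower peak rates receive larger fractions, up to the cap.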
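Remark 3 For the particularly important case of PF ($\gamma = 1$) the solution of Theorem 2
reduces to:
\[ \alpha_{k,j} = \begin{cases} \dfrac{S_j}{|\mathcal{K}_j|}, \ \forall\, k \in \mathcal{K}_j & \text{if } S_j \le |\mathcal{K}_j| \\[4pt] 1, \ \forall\, k \in \mathcal{K}_j & \text{if } S_j > |\mathcal{K}_j| \end{cases} \tag{2.32} \]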
Remark 4 Under the condition
\[ \frac{S_j R_{k,j}^{\eta-1}}{\sum_{k' \in \mathcal{K}_j} R_{k',j}^{\eta-1}} \le 1, \quad \forall\, j \in \mathcal{J} \text{ and } k \in \mathcal{K}_j, \tag{2.33} \]
the solution of Theorem 2 simplifies to:
\[ \alpha_{k,j} = \frac{S_j R_{k,j}^{\eta-1}}{\sum_{k' \in \mathcal{K}_j} R_{k',j}^{\eta-1}}, \quad \forall\, k \in \mathcal{K}_j. \tag{2.34} \]
The term $\sum_{k' \in \mathcal{K}_j} R_{k',j}^{\eta-1}$ in the denominator of (2.34) can be interpreted as a measure of
the load of BS $j$. For example, for the case of proportional fairness this quantity is
simply equal to $|\mathcal{K}_j|$, i.e., the number of users uniquely associated with BS $j$. Condition
(2.33) shall be referred to as the heavy load condition. Since we are interested in the
performance of the network in heavy load conditions,$^7$ in the following we shall assume
that (2.33) holds for all BSs $j \in \mathcal{J}$.
$^7$ If the network is lightly loaded, some spatial dimensions at some BSs may be underutilized, meaning
that not all the downlink streams need to be used at each slot time. In this case, the network has more
downlink capacity than needed, and other problems beyond NUM become relevant, as for example the
transmit power minimization subject to given target per-user throughputs (see for example [65]). This
non-heavy-loaded regime is not the focus of this dissertation, and its study in the context of massive
MIMO is left for future work.
Since the activity fractions $\{\alpha_{k,j}\}$ are fixed by the local NUM policy (2.34), we shall
not distinguish any longer between the partition and the induced unique association
configuration. In particular, we have:
Definition 7 Valid partition: A partition $\{\mathcal{K}_j : j \in \mathcal{J}\}$ of the user set $\mathcal{K}$ is valid if it
corresponds to a feasible unique association configuration. In particular, the corresponding
association vector $\mathbf{j}$ has components $j_k \in \mathcal{J}_k$ defined by $j_k = j \Leftrightarrow k \in \mathcal{K}_j$, for all $k \in \mathcal{K}$.
For a given valid partition $\{\mathcal{K}_j\}$, the throughput of any user $k$ is given by $r_k = \alpha_{k,j_k} R_{k,j_k}$.
2.4.1 User-centric association games
Following [23], the user-centric association algorithms considered in this work can be
studied in the framework of non-cooperative association games. In particular, the normal
form association game is defined by:
• Players: the users $k \in \mathcal{K}$.
• Action space: each user $k$ has an action set $\mathcal{J}_k$ where action $j_k \in \mathcal{J}_k$ corresponds
to the decision of user $k$ to associate uniquely to BS $j_k$. Therefore, the joint action
set of all users is the Cartesian product $\mathcal{A} = \mathcal{J}_1 \times \cdots \times \mathcal{J}_K$.
• Payoff functions: the payoff function of user $k$ is its throughput $r_k = \alpha_{k,j_k} R_{k,j_k}$,
where $R_{k,j_k}$ is a fixed value that depends on the massive MIMO downlink scheme
employed (see Appendix A.1), and $\alpha_{k,j_k}$ is given by (2.34).
Since according to (2.34) $\alpha_{k,j}$ is a function of $\mathcal{K}_j$, it follows that the user throughputs $r_k$
are functions of the joint action $\mathbf{j} \in \mathcal{A}$, i.e., the payoff functions are maps $r_k : \mathcal{A} \to \mathbb{R}_+$.
In order to stress this dependency, we shall use the notation $r_k(\mathbf{j})$.
A unique association configuration $\mathbf{j} = (j_1, \ldots, j_K)$ is said to be a pure Nash
equilibrium if, for all $k \in \mathcal{K}$, we have $r_k(j_1, \ldots, j_k, \ldots, j_K) \ge r_k(j_1, \ldots, j, \ldots, j_K)$ for any
$j \in \mathcal{J}_k$. In other words, no user has an incentive to change unilaterally its association
while all other users stay unchanged.
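The pure Nash equilibrium definition can be checked directly by testing every unilateral deviation. The sketch below does this for the PF case only, under assumptions that are not spelled out at this point of the text: the function name `is_pure_nash_pf` is hypothetical, each BS is assumed heavily loaded so that a user at BS `j` gets fraction `S[j] / load[j]`, and a user considering BS `l` is assumed to be promised `S[l] / (load[l] + 1)` because its arrival adds one user to that BS.

```python
from collections import Counter


def is_pure_nash_pf(assoc, R, S):
    """Check whether a unique association is a pure Nash equilibrium under PF.

    assoc[k] is the BS of user k, R[k] is a dict {BS index: peak rate} over the
    BSs user k may choose, and S[j] is the number of streams of BS j.
    """
    load = Counter(assoc)                      # users currently at each BS
    for k, jk in enumerate(assoc):
        current = S[jk] * R[k][jk] / load[jk]  # throughput at the current BS
        for l, rate in R[k].items():
            # throughput promised at BS l once user k has joined it
            if l != jk and S[l] * rate / (load[l] + 1) > current:
                return False                   # user k gains by deviating
    return True


R = [{0: 1.0, 1: 1.0} for _ in range(4)]       # four identical users, two BSs
balanced = is_pure_nash_pf([0, 0, 1, 1], R, S=[1, 1])
unbalanced = is_pure_nash_pf([0, 0, 0, 1], R, S=[1, 1])
```

In this symmetric toy case only the balanced split is an equilibrium: with three users on one BS, any of them improves its throughput from 1/3 to 1/2 by moving.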
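Before discussing a specific user-centric algorithm in terms of its online decentralized
implementation, let us examine the global optimality properties of the Nash equilibria of
the related association game defined above. Suppose that there exists a valid partition
$\{\mathcal{K}_j : j \in \mathcal{J}\}$ for which (2.33) holds and such that:
\[ \frac{S_{j_k} R_{k,j_k}^{\eta}}{\sum_{k' \in \mathcal{K}_{j_k}} R_{k',j_k}^{\eta-1}} > \frac{S_\ell R_{k,\ell}^{\eta}}{\sum_{k' \in \mathcal{K}_\ell} R_{k',\ell}^{\eta-1}}, \quad \forall\, k \in \mathcal{K}, \text{ and } \ell \in \mathcal{J}_k \text{ with } \ell \ne j_k. \tag{2.35} \]
Then, setting the dual variables of the global NUM problem as
\[ p_j = \left( \frac{1}{S_j} \sum_{k' \in \mathcal{K}_j} R_{k',j}^{\eta-1} \right)^{1/\eta}, \quad \text{and} \quad \mu_k = 0, \tag{2.36} \]
we can easily verify that the KKT conditions (2.26) are satisfied with
\[ \alpha_{k,j} = \begin{cases} \dfrac{S_j R_{k,j}^{\eta-1}}{\sum_{k' \in \mathcal{K}_j} R_{k',j}^{\eta-1}} & \text{for } j = j_k \\[4pt] 0 & \text{for } j \ne j_k, \end{cases} \]
which coincide with (2.34) and, using (2.25), yield the user throughputs
\[ r_k = \left( \frac{R_{k,j_k}}{p_{j_k}} \right)^{\eta} = \frac{S_{j_k} R_{k,j_k}^{\eta}}{\sum_{k' \in \mathcal{K}_{j_k}} R_{k',j_k}^{\eta-1}}. \]
Obviously, these throughputs are the same as those obtained by applying the local
NUM policy (solution of (2.28)) at each BS $j$, under the unique association given by
$\{\mathcal{K}_j : j \in \mathcal{J}\}$. Since the KKT conditions are necessary and sufficient, we conclude that
the valid partition $\{\mathcal{K}_j : j \in \mathcal{J}\}$ combined with the local NUM policy (2.34) yields a
globally optimal unique association configuration for the network-wide NUM problem
(2.7). Also, notice that the inequalities (2.35) imply the pure-strategy Nash equilibrium condition
\[ \frac{S_{j_k} R_{k,j_k}^{\eta}}{\sum_{k' \in \mathcal{K}_{j_k}} R_{k',j_k}^{\eta-1}} \ge \frac{S_\ell R_{k,\ell}^{\eta}}{R_{k,\ell}^{\eta-1} + \sum_{k' \in \mathcal{K}_\ell} R_{k',\ell}^{\eta-1}}, \quad \forall\, k \in \mathcal{K}, \text{ and } \ell \in \mathcal{J}_k \text{ with } \ell \ne j_k. \tag{2.37} \]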
Summarizing, we have proved:
Lemma 3 If a valid partition $\{\mathcal{K}_j : j \in \mathcal{J}\}$ satisfies (2.33) and (2.35), then the
corresponding association $\mathbf{j}$ is a pure-strategy Nash equilibrium of the decentralized association
game. Furthermore, such Nash equilibrium corresponds to the global optimum of the
network-wide NUM problem (2.7).
Now, suppose that the association game has a pure-strategy Nash equilibrium. Hence,
there exists a unique association $\mathbf{j}$ such that (2.37) holds. Arguing in the reverse direction
of the argument leading to Lemma 3, we observe that if the heavy-loaded condition (2.33)
holds, then the KKT conditions (2.26) of the global problem are "almost" satisfied. For
example, in the case PF ($\gamma = 1$), the Nash equilibrium conditions are given by
\[ \frac{S_{j_k} R_{k,j_k}}{|\mathcal{K}_{j_k}|} \ge \frac{S_\ell R_{k,\ell}}{|\mathcal{K}_\ell| + 1}, \quad \forall\, k \in \mathcal{K}, \text{ and } \ell \in \mathcal{J}_k \text{ with } \ell \ne j_k. \tag{2.38} \]
When $|\mathcal{K}_j| > S_j \gg 1$ for all $j \in \mathcal{J}$ (heavy-loaded system, typical in the case of massive
MIMO networks), we have that the "+1" in the denominator of the promised rate terms
is negligible with respect to the set size $|\mathcal{K}_\ell|$, and therefore the Nash equilibrium
condition and the KKT conditions (2.26) almost coincide. This means that, for heavy-loaded
systems, a decentralized user-centric system operating at its Nash equilibrium is also very
close to the global optimum of the network-wide NUM problem.
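To get a feel for how small the effect of the "+1" is, consider a purely hypothetical PF example with $|\mathcal{K}_\ell| = 100$ users on BS $\ell$ (a number chosen here for illustration only, not taken from the experiments). The rate promised to a newcomer, relative to the rate obtained by ignoring its own arrival, is

```latex
\[
\frac{S_\ell R_{k,\ell} / (|\mathcal{K}_\ell| + 1)}{S_\ell R_{k,\ell} / |\mathcal{K}_\ell|}
  = \frac{|\mathcal{K}_\ell|}{|\mathcal{K}_\ell| + 1}
  = \frac{100}{101} \approx 0.990 ,
\]
```

so the two sides of the comparison differ by about 1%, and the relative gap $1/(|\mathcal{K}_\ell| + 1)$ shrinks further as the load grows.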
2.4.2 User-centric decentralized online algorithms
For a decentralized online implementation of the user-centric association scheme, several
variants have been proposed.$^8$ Here, and in our simulations, we restrict to a very
simple scheme that requires only local information at the users. In practice this can be
easily obtained from the BSs through the "beacon-stuffing" approach [78], where each BS
advertises the required information while broadcasting its beacon signal. In the proposed
scheme, starting from a current association $\mathbf{j}$, each user $k$ compares its current throughput
$r_k(\mathbf{j})$ with the highest promised throughput
\[ \hat{r}_k = \max_{\ell \in \mathcal{J}_k \setminus j_k} \alpha_{k,\ell} R_{k,\ell}, \tag{2.39} \]
where $\alpha_{k,\ell}$ is given by (2.31). If $\hat{r}_k > r_k(\mathbf{j})$, then user $k$ changes its association to BS
$\hat{\ell}$ achieving the max in (2.39), with some fixed probability $\rho \in (0, 1)$. Otherwise, user $k$
keeps its current association.
$^8$ For example, [23] discusses also the case where two subsets of BSs operate according to different
local NUM policies; one subset operates according to PF (equal air-time) and another subset operates
according to HF (equal throughput).
This algorithm evolves according to a discrete-time Markov chain, with state space
$\mathcal{A}$ (i.e., the joint action space of the related association game). Since $\mathcal{A}$ is a finite set
and since every state has a self-transition of positive probability, the chain is aperiodic.
Furthermore, it is immediate to see that the pure-strategy Nash equilibria of the game are
absorbing states. If every state $\mathbf{j} \in \mathcal{A}$ communicates with a Nash equilibrium (i.e., there is a
path of positive probability from $\mathbf{j}$ to a Nash equilibrium state), then the only persistent
classes are the Nash equilibria and all other states are transient. In this case, we have
that the algorithm converges to a Nash equilibrium with probability 1.
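The randomized scheme described above can be simulated with a short loop. The sketch below is restricted to PF and rests on several assumptions of its own: the function name `run_user_centric_pf` and the switching-probability argument `rho` are hypothetical names, all users evaluate their options against the same snapshot of the loads in each round, the heavy-load fractions `S / load` are used in place of the general formula (2.31), and the loop simply stops once no user wants to move.

```python
import random
from collections import Counter


def run_user_centric_pf(R, S, assoc, rho=0.1, max_rounds=10000, seed=0):
    """Simulate randomized user-centric association under PF.

    R[k] is a dict {BS index: peak rate}, S[j] the streams of BS j, assoc the
    initial association, rho the probability of acting on a better offer.
    """
    rng = random.Random(seed)
    assoc = list(assoc)
    for _ in range(max_rounds):
        load = Counter(assoc)
        wants = {}                                # user -> BS it would move to
        for k, jk in enumerate(assoc):
            best_l, best_rate = jk, S[jk] * R[k][jk] / load[jk]
            for l, rate in R[k].items():
                promised = S[l] * rate / (load[l] + 1)
                if l != jk and promised > best_rate:
                    best_l, best_rate = l, promised
            if best_l != jk:
                wants[k] = best_l
        if not wants:                             # nobody can improve: stop
            return assoc
        for k, l in wants.items():
            if rng.random() < rho:                # switch with probability rho
                assoc[k] = l
    return assoc


R = [{0: 1.0, 1: 1.0} for _ in range(4)]          # four identical users
final = run_user_centric_pf(R, S=[1, 1], assoc=[0, 0, 0, 0])
loads = sorted(Counter(final).values())
```

Starting from all four users on one BS, the only state in which nobody wants to move is the balanced two-and-two split, so that is where the simulation should stop. A small `rho` slows convergence but makes it unlikely that many users jump to the same BS at once.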
An improvement path in the association game consists of a sequence of unique
association configurations $\mathbf{j}$, each differing from the preceding one in a single component
only, such that the change of strategy in that component increases the throughput of the
corresponding user. If every improvement path is finite, then the game is said to have the
finite improvement path property, which implies that any state communicates with a Nash
equilibrium state (see [23] and references therein). For $\gamma = 1$ (PF) and $\gamma \to \infty$ (HF),
it is known that such property holds [23], implying that the user-centric decentralized
association algorithm converges with probability 1 to a pure-strategy Nash equilibrium.
As argued before, this Nash equilibrium is very close to the global NUM
optimum, when the network is in the heavy-loaded condition (2.33). This is confirmed by
extensive simulations, some of which are reported in Section 2.5. Furthermore, we always
observed convergence also for other values of $\gamma \in (1, \infty)$. Therefore, we conjecture that
pure-strategy Nash equilibria exist with high probability for random network topologies
and arbitrary fairness factor $\gamma$, although proving the finite improvement path property
has been, so far, elusive.
Remark 5 In practice, networks have a (slowly) time-varying topology due to user
motion across the coverage area, and transitions due to users joining or leaving the system.
Hence, the association must be continuously updated. We notice that the user-centric
randomized scheme proposed here, with positive switch probability $\rho$, is naturally suited
to this purpose, allowing association switching as the user peak rates evolve in time due
to user mobility or the BSs load changes due to users joining or leaving the system. Also,
it is practical to introduce some hysteresis in order to prevent too frequent association
switches (which incur some protocol cost) and too wild fluctuations of the user peak rates.
However, these practical considerations go beyond the scope of this dissertation.
2.5 Numerical Experiments
In this section we present a comparative evaluation of the user-centric load-balancing
scheme considered in this chapter, and a heuristic scheme based on maximum peak-rate
association. In particular, there is no "standard" commonly accepted way to perform
user-BS association in a network employing multiuser MIMO. Here, as a term of comparison
we have chosen a naive maximum peak rate association scheme, i.e., user $k$ associates with
BS $j(k) = \arg\max_{j \in \mathcal{J}_k} R_{k,j}$. In the massive MIMO regime, the user peak rates converge
to the deterministic limit (A.6) which depends only on the individual user SINR terms
(A.5) or (A.4), which can be assumed to be known. Hence, the Max peak-rate association
scheme can be easily implemented in the massive MIMO case. After the users associate
with the BSs using the Max peak-rate decision, the BSs locally implement the $\gamma$-fairness
policy according to (2.31). We remark that the Max-RSRP scheme mentioned in Section
1.1 does not apply since, in general, the SINR achieved by any user with multiuser MIMO
downlink spatial multiplexing depends on the channel matrix realization and on the set
of simultaneously scheduled users.
2.5.1 Experiment 1
In this experiment, we consider a network topology formed by a 900m × 1800m
rectangular region with several small-cell BSs and two macro BSs whose locations are fixed
throughout all of the simulation runs. As shown in Fig. 2.1, the two macro BSs are
located in the centers of the two 900m × 900m square subregions
comprising the 900m × 1800m rectangular area, and 40 small cell BSs
are uniformly distributed in the region. The number of users and their
locations change across different simulation runs, and are generated according to a
non-homogeneous Poisson point process with higher density in a central region around each
of the Macro BSs, as shown in Fig. 2.1.
[Figure 2.1 plot: network layout, horizontal axis ticks 200 to 1600, vertical axis ticks 100 to 800; in-plot annotation: Num. of SCs: 34, Num. of UEs: 5995]
Figure 2.1: Wireless network with 2 Macro BSs and several small cell BSs
The macro BS has $M = 100$ antennas and serves user sets of size $S = 10$, with 46dBm
transmission power. Each small-cell BS has $M = 40$ antennas and serves user sets of size
$S = 4$, with transmission power of 35dBm. The pathloss from the macro BS to a user
and from a small-cell BS to a user is given by $\frac{1}{1 + (d/40)^{3.5}}$ and by $\frac{1}{1 + (d/40)^{4}}$, respectively,$^9$
with $d$ representing the BS-user distance (assuming a torus wrap-around model to avoid
boundary effects).
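The two pathloss laws and the wrap-around distance are easy to express in code. The sketch below is only illustrative: the function names `torus_distance` and `path_gain` are hypothetical, the region is taken to be 1800 m wide and 900 m high to match the layout of Experiment 1, and distances are assumed to be in metres with the 40 m reference distance from the formulas above.

```python
import math


def torus_distance(a, b, width=1800.0, height=900.0):
    """Distance between points a and b on a rectangle with wrap-around edges."""
    dx = abs(a[0] - b[0])
    dy = abs(a[1] - b[1])
    dx = min(dx, width - dx)    # going around the edge may be shorter
    dy = min(dy, height - dy)
    return math.hypot(dx, dy)


def path_gain(d, exponent, d0=40.0):
    """Pathloss factor 1 / (1 + (d / d0) ** exponent)."""
    return 1.0 / (1.0 + (d / d0) ** exponent)


d = torus_distance((10.0, 10.0), (1790.0, 10.0))   # 20 m across the edge
macro_gain = path_gain(d, exponent=3.5)            # macro BS law
small_gain = path_gain(d, exponent=4.0)            # small-cell BS law
```

At the reference distance of 40 m both laws give a factor of exactly 1/2, and beyond it the small-cell law decays faster because of its larger exponent.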
We calculate the peak rates $R_{k,j}$ using the formulas (A.6) and (A.5) for ZFBF with
pilot contamination. We assume that each Macro BS uses the same set of $S = 10$ pilots
which are mutually orthogonal while the small cell BSs use a different set of $S = 4$ pilots
which are mutually orthogonal.
In Figs. 2.2a-2.2d, we compare the performance of the proposed centralized and
distributed algorithms with the Max peak-rate association scheme. We choose a constant
switching probability $\rho = 0.1$ when simulating the distributed algorithm. For every
realization of the layout similar to Fig. 2.1, we calculate the throughput statistics 1)
the 5th percentile throughput, 2) the geometric mean of user throughputs and 3) the
arithmetic mean of user throughputs and then plot the CDFs of these quantities over
100 realizations for the case of $\gamma = 1$ (PF scheduling) in Figs. 2.2a, 2.2c and 2.2b
respectively. We have run the centralized solution computed via the method in Section 2.3
and, remarkably, the performance of the randomized distributed user-centric algorithm is
almost indistinguishable from the performance of the (optimal) centralized solution, as we
have argued to hold for highly-loaded systems in Section 2.4. Furthermore, in Fig. 2.2d,
we compare the performance of the distributed algorithm with the Max peak-rate scheme
by calculating the ratio between the throughput statistic of the distributed algorithm
and the throughput statistic of the Max peak-rate for every realization and then plot
the CDF of the ratio over 100 realizations. Fig. 2.2d reveals that the distributed
algorithm results in superior performance in terms of the 5th percentile throughput and
the geometric mean. For instance, half of the realizations observe more than 30% gain in
5th percentile throughput with the distributed algorithm over the Max peak-rate scheme.
Nevertheless, the Max peak-rate achieves higher average throughput, since the PF fairness
function imposes to serve all users in a proportionally fair way across the network, while
Max peak-rate does not.
$^9$ A greater pathloss exponent (4) is used for small-cell BSs, in order to take into account the fact that
the macro-BS antennas are at higher elevation.
[Figure 2.2 panels: (a) 5 percentile, (b) Arithmetic Mean, (c) Geometric Mean, each plotting the fraction of realizations with the statistic < x for Centralized-subgradient, Distributed and Max peak rate; (d) Comparison of distributed algorithm vs. Max peak rate, plotting the fraction of realizations with throughput statistic ratio < x for the 5 percentile, geometric mean and arithmetic mean ratios]
Figure 2.2: Performance comparison of various algorithms for $\gamma = 1$ and the layout of
Fig. 2.1.
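The three per-realization statistics used in this comparison can be computed as follows. This is a small sketch with choices that the text does not specify: the function name `throughput_stats` is hypothetical, and the 5th percentile is taken with the nearest-rank rule, which is only one of several common percentile definitions.

```python
import math


def throughput_stats(r):
    """Return (5th percentile, geometric mean, arithmetic mean) of throughputs.

    r is a non-empty sequence of strictly positive user throughputs.
    """
    xs = sorted(r)
    n = len(xs)
    # nearest-rank 5th percentile: smallest value with at least 5% of the
    # samples at or below it (an assumed convention)
    p5 = xs[max(math.ceil(0.05 * n) - 1, 0)]
    geo = math.exp(sum(math.log(x) for x in xs) / n)   # geometric mean
    arith = sum(xs) / n                                # arithmetic mean
    return p5, geo, arith


p5, geo, arith = throughput_stats([1.0, 2.0, 4.0, 8.0])
```

The geometric mean is the natural summary under PF, since maximizing the sum of log-throughputs is the same as maximizing their geometric mean, whereas the arithmetic mean rewards serving only the users with the best rates.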
2.5.2 Experiment 2: 3GPP HetNet Model
In this experiment, we conduct simulations in a more realistic network topology as shown
in Fig. 2.3 which is compliant with the layout specified for small cell heterogeneous
networks in 3GPP standardization [13]. In particular, we have a cellular layout with 7 Macro
cells with each macro cell consisting of 3 hot zones. A hot zone is a
geographical area where the concentration of users (shown in green in Fig. 2.3) is much
higher than the rest of the layout. Within each hot zone, there are 4 small cells (shown
in red in Fig. 2.3) randomly dropped in order to meet the high traffic demands in the hot
zone. Note that the model we use to drop the small cells and the users is exactly
compliant with the parameters provided in the 3GPP standardization document [13]. The
Macro/small cell powers and the pathloss models used in this experiment are identical to
those used in Experiment 1.
[Figure 2.3 plot: network layout, both axes ticked from -1500 to 1500]
Figure 2.3: A 3GPP HetNet scenario with small cell BSs deployed in hot zones
We calculate the peak rates $R_{k,j}$ using the formulas (A.6) and (A.5) for ZFBF with
pilot contamination. We assume that each Macro BS uses the same set of 10 pilots which
are mutually orthogonal. Furthermore, we assume that the pilots used by the small cell
BSs are orthogonal to the 10 pilots used by the Macro BSs. Moreover, within each hot
zone, 16 mutually orthogonal pilots are used; 4 pilots for each of the small cell BSs
in the hot zone. These 16 pilots are then reused in every hot zone of the layout.
As in Experiment 1, we run the distributed algorithm and the Max peak-rate scheme
with $\gamma = 1$ (proportional fairness) for 100 different realizations of the 3GPP layout of
Fig. 2.3 wherein the locations of the users, hot zones and the small cell BSs are generated
randomly for each realization in compliance with 3GPP specifications while the locations
of the Macro BSs remain fixed. Similar to Experiment 1, we compare the performance
of the distributed algorithm and the Max peak-rate scheme by plotting the CDF of the
throughput statistics. From Fig. 2.4, we notice that the distributed algorithm provides
$\ge 25\%$ gain over Max peak-rate in terms of the 5th percentile throughput for about 50%
of the realizations.
[Figure 2.4 plot: fraction of realizations with throughput statistic ratio < x, for x from 0.8 to 2; curves: 5 percentile ratio, arithmetic mean ratio, geometric mean ratio]
Figure 2.4: Comparison of the proposed distributed algorithm and Max peak-rate in
terms of the throughput statistics: 5 percentile rate, arithmetic mean rate and geometric
mean rate, for $\gamma = 1$ and the layout of Fig. 2.3.
Finally, in Fig. 2.5, we compare the performance of the proposed user-centric distributed algorithm with the Max peak-rate scheme for the layout in Fig. 2.3 in terms of load balancing across various BSs. For every BS $j$, we first calculate the load, which is the number of users $K_j$ uniquely associated with BS $j$. Then, we plot in Fig. 2.5 the per-BS load for the macro and the small cell BSs, where within each set we sort the BSs in decreasing load order. The superior performance of the proposed algorithm can be qualitatively appreciated from Fig. 2.5 by observing that the load achieved by our scheme is more evenly balanced, both across the two tiers and within BSs of the same tier.
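The per-BS load computation described above can be illustrated with a minimal sketch. The function name, the toy association and the BS identifiers below are hypothetical and are not taken from the experiment; the sketch only assumes that each user is uniquely associated with one BS.

```python
from collections import Counter

def sorted_loads(association, bs_ids):
    """Return the per-BS loads K_j of one tier, sorted in decreasing order.

    association: dict mapping user id -> BS id (unique association).
    bs_ids: BS ids of the tier; BSs without users get load 0.
    """
    counts = Counter(association.values())
    return sorted((counts.get(j, 0) for j in bs_ids), reverse=True)

# Hypothetical toy association: 6 users, 2 macro BSs, 3 small cell BSs.
association = {"u1": "M1", "u2": "M1", "u3": "S1",
               "u4": "S2", "u5": "M1", "u6": "S1"}
macro_loads = sorted_loads(association, ["M1", "M2"])
small_loads = sorted_loads(association, ["S1", "S2", "S3"])
```

Plotting the two sorted lists side by side gives the kind of per-tier load profile that Fig. 2.5 reports.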
[Figure 2.5: Load distribution: proposed distributed algorithm vs. Max peak-rate association. Two panels, titled "Randomized Algorithm" and "Max Peak Rate". Horizontal axis: BSs sorted according to load. Vertical axis: number of users (load), 0 to 1000. Series in each panel: number of users associated with each Macro BS; number of users associated with each small cell BS.]
Chapter 3
Adaptive Video Streaming for Wireless Networks with
Multiple Users and Helpers
This chapter focuses on the design and analysis of scheduling and congestion control algorithms for efficient delivery of video content over wireless networks. We formulate the problem as a Network Utility Maximization (NUM) where the objective is to fairly maximize users' video streaming Quality of Experience (QoE) and then derive an iterative "push" policy using Lyapunov Optimization, which can solve the NUM problem up to any level of accuracy. The policy can be used directly as an online protocol by interpreting the iterations as control actions in successive transmission slots.
This chapter is organized as follows. In Section 3.1, we describe the system model for
VoD streaming in a wireless network with multiple users and helpers, and discuss some
key underlying assumptions. In Section 3.2, we formulate the NUM problem, provide
the proposed distributed dynamic scheduling policy for its solution, and state the main
results on its optimality. Section 3.3 illustrates our proposed scheme for adaptive pre-buffering and re-buffering in order to cope with playback buffer underrun events. Finally, simulation results illustrating the particular features of the proposed scheme are provided in Section 3.4. The main technical proofs are collected in the Appendices, in order to maintain the flow of exposition.
3.1 System Model
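We consider a time-slotted wireless network with multiple users and multiple helper stations sharing the same bandwidth. The network is defined by a bipartite graph $\mathcal{G} = (\mathcal{U}, \mathcal{H}, \mathcal{E})$, where $\mathcal{U}$ denotes the set of users, $\mathcal{H}$ denotes the set of helpers, and $\mathcal{E}$ contains edges for all pairs $(h,u)$ such that there exists a potential transmission link between $h \in \mathcal{H}$ and $u \in \mathcal{U}$ [footnote 1]. We denote by $\mathcal{N}(u) \subseteq \mathcal{H}$ the neighborhood of user $u$, i.e., $\mathcal{N}(u) = \{h \in \mathcal{H} : (h,u) \in \mathcal{E}\}$. Similarly, $\mathcal{N}(h) = \{u \in \mathcal{U} : (h,u) \in \mathcal{E}\}$.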
Each user $u \in \mathcal{U}$ requests a video file $f_u$ from a library $\mathcal{F}$ of possible files. Each video file is formed by a sequence of chunks. Each chunk corresponds to a group of pictures (GOP) that are encoded and decoded as stand-alone units [79]. Chunks have a fixed playback duration, given by $T_{\rm gop} = (\text{\# frames per GOP})/\eta$, where $\eta$ is the frame rate, expressed in frames per second. The streaming process consists of transferring chunks from the helpers to the requesting users such that the playback buffer at each user contains the required chunks at the beginning of each chunk playback deadline. The playback starts after a certain pre-buffering time, during which the playback buffer is filled by a determined amount of ordered chunks. The pre-buffering time is typically much shorter than the duration of the streaming session.
The helpers may not have access to the whole video library, because of backhaul constraints or caching constraints [footnote 2]. In general, we denote by $\mathcal{H}(f)$ the set of helpers that contain file $f \in \mathcal{F}$. Hence, user $u$ requesting file $f_u$ can only download video chunks from helpers in the set $\mathcal{N}(u) \cap \mathcal{H}(f_u)$.
Each file $f \in \mathcal{F}$ is encoded at a finite number of different quality levels $m \in \{1, \ldots, N_f\}$. This is similar to the implementation of several current video streaming technologies, such as Microsoft Smooth Streaming and Apple HTTP Live Streaming [80]. Due to the VBR nature of video coding [55], the quality-rate profile of a given file $f$ may vary from chunk to chunk. We let $D_f(m,t)$ and $B_f(m,t)$ denote the video quality measure (e.g., see [81]) and the number of bits per pixel for file $f$ at chunk time $t$ and quality level $m$, respectively.

Footnote 1: The existence of such potential links depends on the channel gain coefficients between helper $h$ and user $u$ (see the physical channel model in Section 3.1.1), as well as on some protocol imposing restricted access for some helpers.

Footnote 2: For example, in a FemtoCaching network (see discussion in Section 1.2) each helper contains a subset of the files depending on some caching algorithm.
A scheduling policy for the network at hand consists of a sequence of decisions such that, at each chunk time $t$, each streaming user $u$ requests its desired $t$-th chunk of file $f_u$ from one or more helpers in $\mathcal{N}(u) \cap \mathcal{H}(f_u)$ at some quality level $m_u(t) \in \{1, \ldots, N_{f_u}\}$, and each helper $h$ transmits the source-encoded bits of currently or previously requested chunks to the users. For simplicity, we assume that the scheduler time-scale coincides with the chunk interval, i.e., at each chunk interval a scheduling decision is made. Conventionally, we assume a slotted time axis $t = 0, 1, 2, 3, \ldots$, corresponding to epochs $t \times T_{\rm gop}$. Letting $T_u$ denote the pre-buffering time of user $u$ (where $T_u$ is an integer), the chunks are downloaded starting at time $t = 0$ and the $t$-th chunk playback deadline is $t + T_u$. A buffer underrun event for user $u$ at time $t$ is defined as the event that the playback buffer does not contain chunk number $t - T_u$ at slot time $t$. When a buffer underrun event occurs, the playback may be stopped until enough ordered chunks are accumulated in the playback buffer. This is called a stall event, and the process of reconstituting the playback buffer to a certain desired level of ordered chunks is referred to as re-buffering. Alternatively, the playback might just skip the missing chunk. The details relative to pre-buffering, re-buffering and chunk skipping are discussed in Section 3.3.
Letting $N_{\rm pix}$ denote the number of pixels per frame, a chunk contains $k = \eta T_{\rm gop} N_{\rm pix}$ pixels. Hence, the number of bits in the $t$-th chunk of file $f$, encoded at quality level $m$, is given by $k B_f(m,t)$. We assume that a chunk can be partially downloaded from multiple helpers, and let $R_{hu}(t)$ denote the source coding rate (bit per pixel) of chunk $t$ requested by user $u$ from helper $h$. It follows that the source coding rates must satisfy, for all $t$,

$$\sum_{h \in \mathcal{N}(u) \cap \mathcal{H}(f_u)} R_{hu}(t) = B_{f_u}(m_u(t), t), \quad \forall\, (h,u) \in \mathcal{E}, \tag{3.1}$$

where $m_u(t)$ denotes the quality level at which chunk $t$ of file $f_u$ is requested by user $u$. The constraint (3.1) reflects the fact that the aggregate bits of a given chunk $t$ from all helpers serving user $u$ must be equal to the total number of bits in the requested chunk. When a chunk request is made and the source coding rates $R_{hu}(t)$ are determined, helper $h$ places the corresponding $k R_{hu}(t)$ bits in a transmission queue $Q_{hu}$ "pointing" at user $u$. This queue contains the source-encoded bits that have to be sent from helper $h$ to user $u$. Notice that in order to be able to download different parts of the same chunk from different helpers, the network controller needs to ensure that all received bits from the serving helpers $\mathcal{N}(u) \cap \mathcal{H}(f_u)$ are useful, i.e., the union of all requested bits yields the total bits in the requested chunk, without overlaps or gaps. Alternatively, each chunk can be encoded by intra-session Random Linear Network Coding [82] such that as long as $k B_{f_u}(m_u(t), t)$ parity bits are collected at user $u$, the $t$-th chunk can be decoded and it becomes available in the user playback buffer. Interestingly, even though we optimize over algorithms that allow the possibility of downloading different bits of the same chunk from different helpers, the optimal scheduling policy (derived in Section 3.2) has a simple structure that always requests entire chunks from single helpers. Hence, without loss of optimality, neither protocol coordination to prevent overlaps or gaps, nor intra-session linear network coding, are needed for the algorithm implementation.
3.1.1 Wireless transmission channel
We model the wireless channel for each link $(h,u) \in \mathcal{E}$ as a frequency and time selective underspread fading channel [83]. Using OFDM, the channel can be converted into a set of parallel narrowband sub-channels in the frequency domain (subcarriers), each of which is time-selective with a certain fading channel coherence time. The small-scale Rayleigh fading channel coefficients can be considered as constant over time-frequency "tiles" spanning blocks of adjacent subcarriers in the frequency domain and blocks of OFDM symbols in the time domain. For example, in the LTE standard [28], the small-scale fading coefficients can be considered constant over a coherence time interval of 0.5 ms and a coherence bandwidth of 180 kHz, corresponding to "tiles" of 7 OFDM symbols $\times$ 12 subcarriers. For a total system available bandwidth of 18 MHz (after excluding the guard bands) and a scheduling slot of duration $T_{\rm gop} = 0.5$ s (typical video chunk duration), we have that a scheduling slot spans $\frac{0.5 \times 18 \times 10^{6}}{0.5 \times 10^{-3} \times 180 \times 10^{3}} = 10^{5}$ tiles, i.e., channel fading coefficients. Even assuming some correlation between fading coefficients, it is apparent that the time-frequency diversity experienced in the transmission of a chunk is very large. Thus, it is safe to assume that channel coding over such a large number of resource blocks achieves the ergodic capacity of the underlying fading channel.
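The tile count quoted above can be reproduced with a few lines of integer arithmetic. This is only a worked check of the numbers in the text; the function and parameter names are illustrative.

```python
def tiles_per_slot(slot_us, bandwidth_hz, coherence_us, coherence_hz):
    """Number of time-frequency tiles spanned by one scheduling slot.

    Times are given in microseconds and bandwidths in Hz so that the
    computation stays in exact integer arithmetic.
    """
    return (slot_us // coherence_us) * (bandwidth_hz // coherence_hz)

# Values quoted in the text: 0.5 s slot, 18 MHz bandwidth,
# tiles of 0.5 ms coherence time x 180 kHz coherence bandwidth.
n_tiles = tiles_per_slot(slot_us=500_000, bandwidth_hz=18_000_000,
                         coherence_us=500, coherence_hz=180_000)
```

The slot contains 1000 coherence intervals in time and 100 coherence bands in frequency, which multiply to the $10^5$ tiles stated above.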
In this chapter we refer to ergodic capacity as the average mutual information resulting from Gaussian i.i.d. inputs of the single-user channel from helper $h$ to user $u \in \mathcal{N}(h)$, while treating the inter-cell interference, i.e., the signals of all other helpers $h' \neq h$, as noise, where averaging is with respect to the first-order distribution of the small-scale fading. This rate is achievable by i.i.d. Gaussian coding ensembles and approachable in practice by modern graph-based codes [84] provided that the length of a codeword spans a large number of independent small-scale fading states [85].
We assume that the helpers transmit at constant power, and that the small-cell network makes use of universal frequency reuse, that is, the whole system bandwidth is used by all the helper stations. We further assume that every user $u$, when decoding a transmission from a particular helper $h \in \mathcal{N}(u)$, treats inter-cell interference as noise. Under these system assumptions, the maximum achievable rate [footnote 3] at slot time $t$ for link $(h,u) \in \mathcal{E}$ is given by

$$C_{hu}(t) = \mathbb{E}\left[ \log\left( 1 + \frac{P_h\, g_{hu}(t)\, |s_{hu}|^2}{1 + \sum_{h' \in \mathcal{N}(u),\, h' \neq h} P_{h'}\, g_{h'u}(t)\, |s_{h'u}|^2} \right) \right], \tag{3.2}$$

where $P_h$ is the transmit power of helper $h$, $s_{hu}$ is the small-scale fading gain from helper $h$ to user $u$ and $g_{hu}(t)$ is the slow fading gain (path loss) from helper $h$ to user $u$. Notice that at the denominator of the Signal to Interference plus Noise Ratio (SINR) inside the logarithm in (3.2) we have the sum of the signal powers of all helpers $h' \in \mathcal{N}(u) : h' \neq h$, indicating the inter-cell interference suffered by user $u$ when decoding the transmission from helper $h$.

Footnote 3: We express channel coding rates $\mu_{hu}$ in bit/s/Hz, i.e., bit per complex channel symbol use, and the source coding rates $R_{hu}$ in bit/pixel, i.e., bits per source symbol, in agreement with standard information theoretic channel coding and source coding.
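As an illustration of (3.2), the following sketch estimates the ergodic rate by Monte Carlo averaging over the small-scale fading. It is not the evaluation procedure of this chapter: it assumes Rayleigh fading with unit-mean exponential power gains $|s|^2$, a logarithm in base 2 (so that the result is in bit/s/Hz), noise power normalized to 1, and hypothetical power and path gain values.

```python
import math
import random

def ergodic_capacity(p_sig, g_sig, interferers, n_samples=20000, seed=0):
    """Monte Carlo estimate of E[log2(1 + SINR)] in bit/s/Hz.

    p_sig, g_sig: transmit power and path gain of the serving helper,
                  normalized so that the noise power is 1.
    interferers:  list of (power, path gain) pairs of the other helpers.
    Assumption: Rayleigh fading, i.e. |s|^2 is exponential with unit mean.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        signal = p_sig * g_sig * rng.expovariate(1.0)
        interference = sum(p * g * rng.expovariate(1.0) for p, g in interferers)
        total += math.log2(1.0 + signal / (1.0 + interference))
    return total / n_samples

# Hypothetical link: received SNR of 10 without and with two interferers.
c_alone = ergodic_capacity(1.0, 10.0, [])
c_interf = ergodic_capacity(1.0, 10.0, [(1.0, 2.0), (1.0, 0.5)])
```

Treating the interfering helpers as noise lowers the achievable rate, and by Jensen's inequality the interference-free estimate stays below $\log_2(1 + P_h g_{hu})$.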
In this work, consistently with most current wireless standards, we consider the case of intra-cell orthogonal access. This means that each helper $h$ serves its neighboring users $u \in \mathcal{N}(h)$ using orthogonal FDMA/TDMA. It follows that the feasible set of channel coding rates $\{\mu_{hu}(t) : u \in \mathcal{N}(h)\}$ for each helper $h$ must satisfy the constraint:

$$\sum_{u \in \mathcal{N}(h)} \frac{\mu_{hu}(t)}{C_{hu}(t)} \leq 1, \quad \forall\, h \in \mathcal{H}. \tag{3.3}$$

The underlying assumption, which makes the rate region defined in (3.3) achievable, is that helper $h$ is aware of the slowly varying path loss coefficients $g_{hu}(t)$ for all $u \in \mathcal{N}(h)$, such that rate adaptation is possible. This is consistent with currently implemented rate adaptation schemes [28, 67, 86].
3.1.2 Transmission queues dynamics and network state
The dynamics (time evolution) of the transmission queues at the helpers is given by:

$$Q_{hu}(t+1) = \max\{Q_{hu}(t) - n\mu_{hu}(t),\, 0\} + k R_{hu}(t), \quad \forall\, (h,u) \in \mathcal{E}, \tag{3.4}$$

where $n$ denotes the number of physical layer channel symbols corresponding to the duration $T_{\rm gop}$, and $\mu_{hu}(t)$ is the channel coding rate (bits/channel symbol) of the transmission from helper $h$ to user $u$ at time $t$. Notice that (3.4) reflects the fact that at any chunk time $t$ the requested amount $k R_{hu}(t)$ of source-encoded bits is input to the queue of helper $h$ serving user $u$, and up to $n\mu_{hu}(t)$ source-encoded bits are extracted from the same queue and delivered by helper $h$ to user $u$ over the wireless channel.
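A minimal sketch of the queue recursion (3.4) follows, together with the chunk size $k = \eta T_{\rm gop} N_{\rm pix}$ defined in Section 3.1. The frame rate, resolution, number of channel symbols per slot and rates below are hypothetical values chosen only to exercise the update.

```python
def pixels_per_chunk(frame_rate, t_gop, n_pix):
    """k = eta * T_gop * N_pix: number of pixels in one chunk."""
    return frame_rate * t_gop * n_pix

def queue_update(q, mu, r, n, k):
    """One step of (3.4): serve up to n*mu bits, then admit k*r new bits."""
    return max(q - n * mu, 0.0) + k * r

# Hypothetical numbers: 30 frames/s, 0.5 s chunks, 640x360 pixels per frame,
# and n = 9e6 channel symbols per slot.
k = pixels_per_chunk(30, 0.5, 640 * 360)
n = 9_000_000
q_next = queue_update(q=2.0e6, mu=0.5, r=0.1, n=n, k=k)     # drained, then refilled
q_drained = queue_update(q=1.0e6, mu=0.5, r=0.0, n=n, k=k)  # no new request
```

In the first call the service $n\mu_{hu}(t)$ exceeds the backlog, so the queue empties before the newly requested $k R_{hu}(t)$ bits arrive; the `max` in (3.4) prevents the backlog from going negative.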
The channel coefficients $g_{hu}(t)$ model path loss and shadowing between helper $h$ and user $u$, and are assumed to change slowly in time. For a typical small-cell scenario with nomadic users moving at walking speed or slower, the path loss coefficients change on a time-scale of the order of 10 s (i.e., 20 scheduling slots). This time scale is much slower than the coherence time of the small-scale fading, but it is comparable with the duration of the video chunks. Therefore, variations of these coefficients during a streaming session (e.g., due to user mobility) are relevant. At this point, we can formally define the network state and a feasible scheduling policy for our system.
Definition 8 The network state $\omega(t)$ collects the quantities that evolve independently of the scheduling decisions in the network. These are, in particular, the slowly-varying channel gains, the video quality levels, and the corresponding bit-rates of the chunk at time $t$. Hence, we have

$$\omega(t) = \left\{ g_{hu}(t),\, D_{f_u}(\cdot, t),\, B_{f_u}(\cdot, t) : \forall\, (h,u) \in \mathcal{E} \right\}. \tag{3.5}$$

Definition 9 A scheduling policy $\{a(t)\}_{t=0}^{\infty}$ is a sequence of control actions $a(t)$ comprising the vector $\mathbf{R}(t)$ with elements $k R_{hu}(t)$ of requested source-coded bits, the vector $\boldsymbol{\mu}(t)$ with elements $n\mu_{hu}(t)$ of transmitted channel-coded bits, and the quality level decisions $\{m_u(t) : \forall\, u \in \mathcal{U}\}$.

Definition 10 For any $t$, the feasible set of control actions $A_{\omega(t)}$ includes all control actions $a(t)$ such that the constraints (3.1) and (3.3) are satisfied.

Definition 11 A feasible scheduling policy for the system at hand is a sequence of control actions $\{a(t)\}_{t=0}^{\infty}$ such that $a(t) \in A_{\omega(t)}$ for all $t$.
3.2 Problem Formulation and Optimal Scheduling Policy
The goal of a scheduling policy for our system is to maximize a concave network utility function of the individual users' video quality index. Since these are time-varying quantities, we focus on the time-averaged expectation of such quantities. In addition, all source-coded bits requested by the users should be delivered. This imposes the constraint that all transmission queues at the helpers must be stable. Throughout this work, we use the following standard notation for the time-averaged expectation of any quantity $x$:

$$\bar{x} := \lim_{t \to \infty} \frac{1}{t} \sum_{\tau=0}^{t-1} \mathbb{E}[x(\tau)]. \tag{3.6}$$
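We define $\bar{D}_u := \lim_{t \to \infty} \frac{1}{t} \sum_{\tau=0}^{t-1} \mathbb{E}[D_{f_u}(m_u(\tau), \tau)]$ to be the time-averaged expected quality of user $u$, and $\bar{Q}_{hu} := \lim_{t \to \infty} \frac{1}{t} \sum_{\tau=0}^{t-1} \mathbb{E}[Q_{hu}(\tau)]$ to be the time-averaged expected length of the queue at helper $h$ for data transmission to user $u$, assuming temporarily that these limits exist [footnote 4]. Let $\phi_u(\cdot)$ be a concave, continuous, and non-decreasing function defining network utility vs. video quality for user $u \in \mathcal{U}$. Then, the proposed scheduling policy is the solution of the following NUM problem:

$$\begin{aligned}
\text{maximize} \quad & \sum_{u \in \mathcal{U}} \phi_u(\bar{D}_u) \\
\text{subject to} \quad & \bar{Q}_{hu} < \infty \quad \forall\, (h,u) \in \mathcal{E} \\
& a(t) \in A_{\omega(t)} \quad \forall\, t,
\end{aligned} \tag{3.7}$$

where the requirement of finite $\bar{Q}_{hu}$ corresponds to the strong stability condition for all the queues [52].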
By appropriately choosing the functions $\phi_u(\cdot)$, we can impose some desired notion of fairness. For example, a general class of concave functions suitable for this purpose is given by the $\alpha$-fairness network utility, defined by [51]

$$\phi_u(x) = \begin{cases} \log x & \alpha = 1 \\ \dfrac{x^{1-\alpha}}{1-\alpha} & \alpha \geq 0,\ \alpha \neq 1 \end{cases} \tag{3.8}$$

Footnote 4: The existence of these limits is assumed temporarily for ease of exposition of the optimization problem (3.7) but is not required for the derivation of the scheduling policy and for the proof of Theorem 3.

In this case, it is well-known that $\alpha = 0$ yields the maximization of the sum quality (no fairness), $\alpha \to \infty$ yields the maximization of the worst-case quality (max-min fairness) and $\alpha = 1$ yields the maximization of the geometric mean quality (proportional fairness).
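The utility family (3.8) is short enough to write down directly. The sketch below is illustrative only; the function name and the per-user quality values are hypothetical.

```python
import math

def alpha_fair_utility(x, alpha):
    """alpha-fairness utility of (3.8) for x > 0 and alpha >= 0."""
    if x <= 0:
        raise ValueError("x must be positive")
    if alpha < 0:
        raise ValueError("alpha must be non-negative")
    if alpha == 1:
        return math.log(x)
    return x ** (1.0 - alpha) / (1.0 - alpha)

qualities = [30.0, 35.0, 40.0]   # hypothetical time-averaged qualities
sum_quality = sum(alpha_fair_utility(d, 0) for d in qualities)   # alpha = 0
log_utility = sum(alpha_fair_utility(d, 1) for d in qualities)   # alpha = 1
```

With $\alpha = 0$ the network utility is the plain sum of the qualities, while with $\alpha = 1$ it is the logarithm of their product, which is why maximizing it maximizes the geometric mean.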
Remark 6 On the relevance of NUM problem (3.7) for video streaming. A natural objection to our problem formulation is that queue stability guarantees only that chunks "will be eventually delivered" with finite (average) delay, but it does not guarantee that the chunks are delivered within their playback deadline. In fact, the network utility function in (3.7) is defined in terms of the long-term averaged requested user video quality level. As a matter of fact, some requested chunks may not be delivered within their playback deadlines and therefore the requested video quality may not correspond to the delivered video quality. Of course, requested and delivered video quality coincide if the maximum delay incurred by any chunk requested by each user $u$ is not larger than the pre-buffering time $T_u$ allowed at the start-up phase of the streaming session of user $u$. As explained in Section 1.2.1, in order to obtain a clean and tractable problem leading to a low complexity and low-overhead decentralized policy, we have taken a "divide and conquer" approach. First, we focus on the NUM (3.7) subject to queue stability. Then, we force the system to work in the smooth streaming regime by allowing sufficient pre-buffering time. This is obtained by the decentralized adaptive pre-buffering/re-buffering time estimation scheme presented in Section 3.3.

Here, we argue that the proposed scheme can achieve near-optimal performance in the sense of a small perturbation with respect to the optimality of a modified network utility function where the buffer underrun events are weighted by some bounded penalty in terms of the quality index.
First, we observe that if, for each $u$, the delay introduced by all queues in the helpers serving $u$ is upper-bounded by a deterministic constant $E_{u,\max}$, then by letting the pre-buffering time $T_u \geq E_{u,\max}$ all chunks requested at time $t$ are delivered within their deadline $t + T_u$. In this case, the buffer underrun rate is zero and our policy (solution of the NUM problem (3.7)) is exactly optimal even with respect to a modified network utility function that takes into account the buffer underrun events.
Then, we observe that, in the realistic case of non-stationary non-ergodic networks considered in this chapter, such uniform delay upper bounds may not exist or may be simply too loose to yield a practically useful pre-buffering policy. For this purpose, the adaptive pre-buffering/re-buffering policy proposed in Section 3.3 provides the best possible estimate of $T_u$ based on local information, such that the buffer underrun rate can be made small. Define the indicator function [footnote 5] of the buffer underrun event for user $u$ at time $t$ as $\xi_u(t) = 1\{\text{chunk } t \text{ is not delivered by time } t + T_u\}$ and let $\bar{\xi}_u$ denote its time-averaged expected value (according to the notation defined in (3.6)), i.e., the buffer underrun rate of user $u$. Let also $0 \leq \delta_u(t) \leq \Delta_u$ denote the video quality penalty incurred by such event, assumed uniformly bounded by the user-dependent constant $\Delta_u$. The modified network utility function that takes explicitly into account the buffer underrun events is given by $\sum_{u \in \mathcal{U}} \phi_u(\bar{D}_u - \bar{\delta}_u)$. Since $\phi_u(\cdot)$ is concave and non-decreasing (it has positive bounded variation), we can write

$$0 \leq \frac{\phi_u(\bar{D}_u) - \phi_u(\bar{D}_u - \Delta_u \bar{\xi}_u)}{\Delta_u \bar{\xi}_u} \leq \kappa'_u, \tag{3.9}$$

for some positive constant $\kappa'_u$. Summing over all $u \in \mathcal{U}$ and using the fact that $\bar{\delta}_u \leq \Delta_u \bar{\xi}_u$, which implies $\phi_u(\bar{D}_u - \bar{\delta}_u) \geq \phi_u(\bar{D}_u - \Delta_u \bar{\xi}_u)$, we obtain the bounds

$$\sum_{u \in \mathcal{U}} \phi_u(\bar{D}_u) \geq \sum_{u \in \mathcal{U}} \phi_u(\bar{D}_u - \bar{\delta}_u) \geq \sum_{u \in \mathcal{U}} \phi_u(\bar{D}_u) - \sum_{u \in \mathcal{U}} \kappa'_u \Delta_u \bar{\xi}_u. \tag{3.10}$$
Hence, the maximization in (3.7), combined with a pre-buffering/re-buffering scheme that makes the buffer underrun rate very small for all users, yields a modified network utility function (including the quality penalty incurred by buffer underrun events) within a small perturbation of the optimal value of (3.7). The latter clearly upper bounds any policy that takes explicitly into account the chunk delivery delays, since it is given in terms of the requested video quality. In conclusion, when "almost all" chunks are delivered within their playback time, maximizing the network utility expressed in terms of the requested video quality is nearly optimal and, as shown in Sections 3.2.1 to 3.2.3, has the advantage of yielding a very simple decentralized dynamic scheduling policy through the DPP approach.

Footnote 5: $1\{A\}$ denotes the indicator function of a condition or event $A$.
Having clarified that the solution of the NUM problem (3.7) is relevant for the VoD streaming problem at hand, in the following we first illustrate a dynamic scheduling policy for problem (3.7) and then provide Theorem 3, which states the optimality guarantee of the proposed dynamic scheduling policy in a strong per-sample path sense.
3.2.1 Dynamic scheduling policy
We introduce auxiliary variables $\gamma_u(t)$ and corresponding virtual queues $\Theta_u(t)$ with buffer evolution:

$$\Theta_u(t+1) = \max\{\Theta_u(t) + \gamma_u(t) - D_{f_u}(m_u(t), t),\, 0\}. \tag{3.11}$$

Each user $u \in \mathcal{U}$ updates its own virtual queue $\Theta_u(t)$ locally. Also, we introduce a scheduling policy control parameter $V > 0$ that trades off the average queue lengths with the accuracy with which the policy is able to approach the optimum of the NUM problem (3.7).
According to Definition 9, a scheduling policy is defined by specifying how to calculate the source-coding rates $R_{hu}(t)$, the video quality levels $m_u(t)$, and the channel coding rates $\mu_{hu}(t)$, for all chunk times $t$. These are given by solving local maximizations at each user node $u$ and helper node $h$. Since these maximizations depend only on local variables that can be learned by each node from its neighbors through simple protocol signaling at negligible overhead cost (a few scalar quantities per chunk time), the resulting policy is decentralized.
3.2.1.1 Control actions at the user nodes (congestion control)
At time $t$, each $u \in \mathcal{U}$ chooses the helper in its neighborhood having the desired file $f_u$ and with the shortest queue, i.e.,

$$h^*_u(t) = \arg\min\left\{ Q_{hu}(t) : h \in \mathcal{N}(u) \cap \mathcal{H}(f_u) \right\}. \tag{3.12}$$

Then, it determines the quality level $m_u(t)$ of the requested chunk at time $t$ as:

$$m_u(t) = \arg\min\left\{ k\, Q_{h^*_u(t) u}(t)\, B_{f_u}(m,t) - \Theta_u(t)\, D_{f_u}(m,t) : m \in \{1, \ldots, N_{f_u}\} \right\}. \tag{3.13}$$

The source coding rates for the requested chunk at time $t$ are given by:

$$R_{hu}(t) = \begin{cases} B_{f_u}(m_u(t), t) & \text{for } h = h^*_u(t) \\ 0 & \text{for } h \neq h^*_u(t) \end{cases} \tag{3.14}$$

The virtual queue $\Theta_u(t)$ is updated according to (3.11), where $\gamma_u(t)$ is given by:

$$\gamma_u(t) = \arg\max\left\{ V \phi_u(\gamma) - \Theta_u(t)\, \gamma : \gamma \in [D_u^{\min}, D_u^{\max}] \right\}, \tag{3.15}$$

where $D_u^{\min}$ and $D_u^{\max}$ are uniform lower and upper bounds on the quality function $D_{f_u}(\cdot, t)$.
We refer to the policy (3.12)-(3.15) as congestion control since each user $u$ selects the helper from which to request the current video chunk and the quality at which this chunk is requested by taking into account the state of the transmission queues of all helpers $h$ that potentially can deliver such chunk, and choosing the least congested queue (selection in (3.12)) and an appropriate video quality level that balances the desire for high quality (reflected by the term $-\Theta_u(t) D_{f_u}(m,t)$ in (3.13)) and the desire for low transmission queues (reflected by the term $k\, Q_{h^*_u(t) u}(t)\, B_{f_u}(m,t)$ in (3.13)). Notice that the streaming of the video file $f_u$ may be handled by different helpers across the streaming session, but each individual chunk is entirely downloaded from a single helper. Notice also that in order to compute (3.12)-(3.15) each user needs to know only local information formed by the queue backlogs $Q_{hu}(t)$ of its neighboring helpers, and by the locally computed virtual queue backlog $\Theta_u(t)$.
The above congestion control action at the users is reminiscent of the current adaptive streaming technology for video on demand systems, referred to as DASH (Dynamic Adaptive Streaming over HTTP) [79, 87], where the client (user) progressively fetches a video file by downloading successive chunks, and makes adaptive decisions on the quality level based on its current knowledge of the congestion of the underlying server-client connection. Our policy generalizes DASH by allowing the client $u$ to dynamically select the least backlogged server $h^*_u(t)$, at each chunk time $t$.
3.2.1.2 Control actions at the helper nodes (transmission scheduling)
At time $t$, the general transmission scheduling consists of maximizing the weighted sum rate of the transmission rates achievable at scheduling slot $t$. Namely, the network of helpers must solve the Max-Weighted Sum Rate (MWSR) problem:

$$\begin{aligned}
\text{maximize} \quad & \sum_{h \in \mathcal{H}} \sum_{u \in \mathcal{N}(h)} Q_{hu}(t)\, \mu_{hu}(t) \\
\text{subject to} \quad & \boldsymbol{\mu}(t) \in \mathcal{R}(t)
\end{aligned} \tag{3.16}$$

where $\mathcal{R}(t)$ is the region of achievable rates supported by the network at time $t$. In this work, we consider two different physical layer assumptions, yielding two versions of the above general MWSR problem.
In the first case, referred to as "macro-diversity", the users can decode multiple data streams from multiple helpers if they are scheduled with non-zero rate on the same slot. Notice that, consistently with (3.2) and (3.3), this does not contradict the fact that interference is treated as noise and that each helper uses orthogonal intra-cell access [footnote 6]. In the macro-diversity case, the rate region $\mathcal{R}(t)$ is given by the Cartesian product of the orthogonal access regions (3.3), such that the general MWSR problem (3.16) decomposes into individual problems, to be solved in a decentralized way at each helper node. After the change of variables $\nu_{hu}(t) = \frac{\mu_{hu}(t)}{C_{hu}(t)}$, it is immediate to see that (3.16) reduces to the set of decoupled Linear Programs (LPs):

$$\begin{aligned}
\text{maximize} \quad & \sum_{u \in \mathcal{N}(h)} Q_{hu}(t)\, C_{hu}(t)\, \nu_{hu}(t) \\
\text{subject to} \quad & \sum_{u \in \mathcal{N}(h)} \nu_{hu}(t) \leq 1,
\end{aligned} \tag{3.17}$$

for all $h \in \mathcal{H}$. The feasible region of (3.17) is the $|\mathcal{N}(h)|$-dimensional simplex and the solution is given by the vertex corresponding to user $u^*_h(t)$ given by

$$u^*_h(t) = \arg\max\left\{ Q_{hu}(t)\, C_{hu}(t) : u \in \mathcal{N}(h) \right\}, \tag{3.18}$$

with rate vector given by $\mu_{h u^*_h(t)}(t) = C_{h u^*_h(t)}(t)$ and $\mu_{hu}(t) = 0$ for all $u \neq u^*_h(t)$.
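The per-helper rule (3.18) amounts to a single argmax over the neighboring users. The sketch below illustrates it for one helper; the queue backlogs and peak rates are hypothetical.

```python
def macro_diversity_schedule(queues, capacities):
    """Per-helper solution of the LP (3.17): serve the user maximizing Q*C.

    queues, capacities: dicts user -> Q_hu(t) and user -> C_hu(t) for the
    neighbors of one helper. Returns (selected user, dict user -> mu_hu(t)).
    """
    best = max(queues, key=lambda u: queues[u] * capacities[u])
    rates = {u: (capacities[u] if u == best else 0.0) for u in queues}
    return best, rates

# Hypothetical backlogs (bits) and peak rates (bit/s/Hz) at one helper.
queues = {"u1": 4.0e5, "u2": 9.0e5, "u3": 1.0e5}
capacities = {"u1": 3.0, "u2": 1.0, "u3": 5.0}
selected, rates = macro_diversity_schedule(queues, capacities)
```

Neither the longest queue (`u2`) nor the best channel (`u3`) wins here: the product of backlog and peak rate decides, and the selected user receives the whole slot at its peak rate, so that (3.3) holds with equality.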
In the second case, referred to as "unique association", any user can receive data from not more than a single helper on any scheduling slot. In this case, the MWSR problem reduces to a maximum weighted matching problem that can be solved by an LP as follows. We introduce variables $\alpha_{hu}(t)$ such that $\alpha_{hu}(t) = 1$ if user $u$ is served by helper $h$ at time $t$ and $\alpha_{hu}(t) = 0$ if it is not. It is obvious that if $\alpha_{hu}(t) = 1$, then $\mu_{hu}(t) = C_{hu}(t)$, implying that $\mu_{h'u}(t) = 0$ for all $h' \neq h$ and $\mu_{hu'}(t) = 0$ for all $u' \neq u$ (by (3.3)). Hence, (3.16) in this case reduces to

$$\begin{aligned}
\text{maximize} \quad & \sum_{h \in \mathcal{H}} \sum_{u \in \mathcal{N}(h)} \alpha_{hu}(t)\, Q_{hu}(t)\, C_{hu}(t) \\
\text{subject to} \quad & \sum_{h \in \mathcal{N}(u)} \alpha_{hu}(t) \leq 1 \quad \forall\, u \in \mathcal{U}, \\
& \sum_{u \in \mathcal{N}(h)} \alpha_{hu}(t) = 1 \quad \forall\, h \in \mathcal{H}, \\
& \alpha_{hu}(t) \in \{0, 1\}, \quad \forall\, h \in \mathcal{H},\ u \in \mathcal{U}.
\end{aligned} \tag{3.19}$$

Footnote 6: As a matter of fact, if a user is scheduled with non-zero rate at more than one helper, it could use successive interference cancellation or joint decoding of all its intended data streams. Nevertheless, we assume here, conservatively, that interference (even from the intended streams) is treated as noise.

A well-known result (see [88, Theorem 64.7]) states that, since the network graph $\mathcal{G} = (\mathcal{U}, \mathcal{H}, \mathcal{E})$ is bipartite, the integer programming problem (3.19) can be relaxed to an LP by replacing the integer constraints on $\{\alpha_{hu}(t)\}$ with the linear constraints $\alpha_{hu}(t) \in [0, 1]$ for all $h \in \mathcal{H}$, $u \in \mathcal{U}$. The solution of the relaxed LP is guaranteed to be integral, such that it is feasible (and therefore optimal) for (3.19). Notice that, in the case of unique association, the rate scheduling problem does not admit a decoupled solution, calculated independently at each helper node. Hence, a network controller that solves (3.19) and allocates the downlink rates (and the user-helper dynamic association) at each slot time $t$ is required. Again, since $t$ ticks at the chunk time, i.e., on the time scale of seconds, this does not involve a very large complexity, although it is definitely more complex than the macro-diversity case.
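To make the coupling between helpers concrete, the following sketch solves a tiny instance of the unique-association problem by exhaustive search instead of the LP relaxation described above; this substitution is for illustration only and does not scale beyond a handful of nodes. It assumes that each helper serves exactly one neighboring user and that each user is served by at most one helper. The weights $Q_{hu}(t) C_{hu}(t)$ and node names are hypothetical.

```python
from itertools import permutations

def unique_association(weights, helpers, users):
    """Brute-force maximum weighted matching for a tiny network.

    weights: dict (helper, user) -> Q_hu(t) * C_hu(t), one entry per edge.
    Each helper serves exactly one neighboring user and each user is
    served by at most one helper. Returns (best value, dict helper -> user),
    or (None, None) if no feasible assignment exists.
    """
    best_value, best_assignment = None, None
    for chosen in permutations(users, len(helpers)):
        pairs = list(zip(helpers, chosen))
        if any(pair not in weights for pair in pairs):
            continue  # the assignment uses a link that does not exist
        value = sum(weights[pair] for pair in pairs)
        if best_value is None or value > best_value:
            best_value, best_assignment = value, dict(pairs)
    return best_value, best_assignment

weights = {("h1", "u1"): 6.0, ("h1", "u2"): 5.0,
           ("h2", "u1"): 4.0, ("h2", "u3"): 1.0}
value, assignment = unique_association(weights, ["h1", "h2"], ["u1", "u2", "u3"])
```

In this toy instance both helpers would individually prefer user `u1`, which is not allowed under unique association; the joint optimum assigns `u2` to `h1` and `u1` to `h2`, which is why a decoupled per-helper solution is not sufficient.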
Remark 7 Dynamic helper-user association. Notice that here, unlike conventional cellular systems, we do not assign a fixed set of users to each helper. In contrast, the helper-user association is dynamic, and results from the transmission scheduling decision. Notice also that, for both the macro-diversity and the unique association cases, despite the fact that each helper $h$ is allowed to serve its queues with rates $\mu_{hu}(t)$ satisfying (3.3), the proposed policy allocates the whole $t$-th downlink slot to a single user $u \in \mathcal{N}(h)$, served at its own peak-rate $C_{hu}(t)$. This is reminiscent of opportunistic user selection in high-rate downlink schemes of 3G cellular systems, such as HSDPA and Ev-Do [28, 89].
3.2.2 Derivation of the scheduling policy
In order to solve problem (3.7) using the stochastic optimization theory developed in [52], it is convenient to transform it into an equivalent problem that involves the maximization of a single time average. This transformation is achieved through the use of auxiliary variables $\gamma_u(t)$ and the corresponding virtual queues $\Theta_u(t)$ with buffer evolution given in (3.11). Consider the transformed problem:

$$\begin{aligned}
\text{maximize} \quad & \sum_{u \in \mathcal{U}} \phi_u(\bar{\gamma}_u) && \text{(3.20)} \\
\text{subject to} \quad & \bar{Q}_{hu} < \infty \quad \forall\, (h,u) \in \mathcal{E} && \text{(3.21)} \\
& \bar{\gamma}_u \leq \bar{D}_u \quad \forall\, u \in \mathcal{U} && \text{(3.22)} \\
& D_u^{\min} \leq \gamma_u(t) \leq D_u^{\max} \quad \forall\, u \in \mathcal{U} && \text{(3.23)} \\
& a(t) \in A_{\omega(t)} \quad \forall\, t && \text{(3.24)}
\end{aligned}$$

Notice that constraints (3.22) correspond to stability of the virtual queues $\Theta_u$, since $\bar{\gamma}_u$ and $\bar{D}_u$ are the time-averaged arrival rate and the time-averaged service rate for the virtual queue given in (3.11). We have:

Lemma 4 Problems (3.7) and (3.20)-(3.24) are equivalent.

Proof See Appendix B.1.
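Thanks to Lemma 4, we shall now focus on the solution of problem (3.20)-(3.24). Let $\mathbf{Q}(t)$ denote the column vector containing the backlogs of queues $Q_{hu}$ $\forall\, (h,u) \in \mathcal{E}$, let $\boldsymbol{\Theta}(t)$ denote the column vector for the virtual queues $\Theta_u$ $\forall\, u \in \mathcal{U}$, $\boldsymbol{\gamma}(t)$ denote the column vector with elements $\gamma_u(t)$ $\forall\, u \in \mathcal{U}$, and $\mathbf{D}(t)$ denote the column vector with elements $D_{f_u}(m_u(t), t)$ $\forall\, u \in \mathcal{U}$. Let $\mathbf{G}(t) = \left[ \mathbf{Q}^{\sf T}(t), \boldsymbol{\Theta}^{\sf T}(t) \right]^{\sf T}$ be the composite vector of queue backlogs and define the quadratic Lyapunov function $L(\mathbf{G}(t)) = \frac{1}{2} \mathbf{G}^{\sf T}(t) \mathbf{G}(t)$. The one-slot drift of the Lyapunov function at slot $t$ is given by

$$\begin{aligned}
L(\mathbf{G}(t+1)) - L(\mathbf{G}(t)) &= \frac{1}{2}\left[ \mathbf{Q}^{\sf T}(t+1)\mathbf{Q}(t+1) - \mathbf{Q}^{\sf T}(t)\mathbf{Q}(t) \right] + \frac{1}{2}\left[ \boldsymbol{\Theta}^{\sf T}(t+1)\boldsymbol{\Theta}(t+1) - \boldsymbol{\Theta}^{\sf T}(t)\boldsymbol{\Theta}(t) \right] \\
&= \frac{1}{2}\left[ \left( \max\{\mathbf{Q}(t) - \boldsymbol{\mu}(t), 0\} + \mathbf{R}(t) \right)^{\sf T} \left( \max\{\mathbf{Q}(t) - \boldsymbol{\mu}(t), 0\} + \mathbf{R}(t) \right) - \mathbf{Q}^{\sf T}(t)\mathbf{Q}(t) \right] \\
&\quad + \frac{1}{2}\left[ \left( \max\{\boldsymbol{\Theta}(t) + \boldsymbol{\gamma}(t) - \mathbf{D}(t), 0\} \right)^{\sf T} \left( \max\{\boldsymbol{\Theta}(t) + \boldsymbol{\gamma}(t) - \mathbf{D}(t), 0\} \right) - \boldsymbol{\Theta}^{\sf T}(t)\boldsymbol{\Theta}(t) \right],
\end{aligned} \tag{3.25}$$

where we have used the queue evolution equations (3.4) and (3.11) and "max" is applied component-wise.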
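Noticing that for any non-negative scalar quantities $Q$, $\mu$, $R$, $\Theta$, $\gamma$ and $D$ we have the inequalities

$$\left( \max\{Q - \mu, 0\} + R \right)^2 \leq Q^2 + \mu^2 + R^2 + 2Q(R - \mu), \tag{3.26}$$

and

$$\left( \max\{\Theta + \gamma - D, 0\} \right)^2 \leq (\Theta + \gamma - D)^2 = \Theta^2 + (\gamma - D)^2 + 2\Theta(\gamma - D), \tag{3.27}$$

we have

$$\begin{aligned}
L(\mathbf{G}(t+1)) - L(\mathbf{G}(t)) &\leq \frac{1}{2}\left[ \boldsymbol{\mu}^{\sf T}(t)\boldsymbol{\mu}(t) + \mathbf{R}^{\sf T}(t)\mathbf{R}(t) \right] + (\mathbf{R}(t) - \boldsymbol{\mu}(t))^{\sf T}\mathbf{Q}(t) \\
&\quad + \frac{1}{2}(\boldsymbol{\gamma}(t) - \mathbf{D}(t))^{\sf T}(\boldsymbol{\gamma}(t) - \mathbf{D}(t)) + (\boldsymbol{\gamma}(t) - \mathbf{D}(t))^{\sf T}\boldsymbol{\Theta}(t) && \text{(3.28)} \\
&\leq K + (\mathbf{R}(t) - \boldsymbol{\mu}(t))^{\sf T}\mathbf{Q}(t) + (\boldsymbol{\gamma}(t) - \mathbf{D}(t))^{\sf T}\boldsymbol{\Theta}(t), && \text{(3.29)}
\end{aligned}$$

where $K$ is a uniform bound on the term $\frac{1}{2}\left[ \boldsymbol{\mu}^{\sf T}(t)\boldsymbol{\mu}(t) + \mathbf{R}^{\sf T}(t)\mathbf{R}(t) \right] + \frac{1}{2}(\boldsymbol{\gamma}(t) - \mathbf{D}(t))^{\sf T}(\boldsymbol{\gamma}(t) - \mathbf{D}(t))$, which exists under the realistic assumption that the source coding rates, the channel coding rates and the video quality measures are upper bounded by some constants, independent of $t$. The conditional expected Lyapunov drift for slot $t$ is defined by

$$\Delta(\mathbf{G}(t)) = \mathbb{E}\left[ L(\mathbf{G}(t+1)) \mid \mathbf{G}(t) \right] - L(\mathbf{G}(t)). \tag{3.30}$$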
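The two scalar inequalities (3.26) and (3.27) that drive the drift bound can be sanity-checked numerically. The sketch below is only a randomized check on non-negative inputs, not a proof; the sampling range, the tolerance and the function names are arbitrary choices.

```python
import random

def lhs_326(q, mu, r):
    return (max(q - mu, 0.0) + r) ** 2

def rhs_326(q, mu, r):
    return q * q + mu * mu + r * r + 2.0 * q * (r - mu)

def lhs_327(theta, gamma, d):
    return max(theta + gamma - d, 0.0) ** 2

def rhs_327(theta, gamma, d):
    return theta * theta + (gamma - d) ** 2 + 2.0 * theta * (gamma - d)

rng = random.Random(1)
tol = 1e-9          # slack for floating-point rounding
violations = 0
for _ in range(10000):
    q, mu, r, theta, gamma, d = (rng.uniform(0.0, 10.0) for _ in range(6))
    if lhs_326(q, mu, r) > rhs_326(q, mu, r) + tol:
        violations += 1
    if lhs_327(theta, gamma, d) > rhs_327(theta, gamma, d) + tol:
        violations += 1
```

The check mirrors the two cases of the analytical argument for (3.26): when $Q \geq \mu$ the gap between the two sides is $2\mu R \geq 0$, and when $Q < \mu$ it is $(Q - \mu)^2 + 2QR \geq 0$.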
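Adding on both sides the penalty term $-V \sum_{u \in \mathcal{U}} \mathbb{E}[\phi_u(\gamma_u(t)) \mid \mathbf{G}(t)]$, where $V \geq 0$ is the policy control parameter already introduced above, we have

$$\begin{aligned}
\Delta(\mathbf{G}(t)) - V \sum_{u \in \mathcal{U}} \mathbb{E}[\phi_u(\gamma_u(t)) \mid \mathbf{G}(t)] &\leq K - V \sum_{u \in \mathcal{U}} \mathbb{E}[\phi_u(\gamma_u(t)) \mid \mathbf{G}(t)] + \mathbb{E}\left[ (\mathbf{R}(t) - \boldsymbol{\mu}(t))^{\sf T}\mathbf{Q}(t) \mid \mathbf{G}(t) \right] \\
&\quad + \mathbb{E}\left[ (\boldsymbol{\gamma}(t) - \mathbf{D}(t))^{\sf T}\boldsymbol{\Theta}(t) \mid \mathbf{G}(t) \right].
\end{aligned} \tag{3.31}$$

The DPP policy acquires information about $\mathbf{G}(t)$ and $\omega(t)$ at every slot $t$ and chooses $a(t) \in A_{\omega(t)}$ in order to minimize the right hand side of the above inequality. The non-constant part of this expression can be written as

$$\left[ \mathbf{R}^{\sf T}(t)\mathbf{Q}(t) - \mathbf{D}^{\sf T}(t)\boldsymbol{\Theta}(t) \right] - \left[ V \sum_{u \in \mathcal{U}} \phi_u(\gamma_u(t)) - \boldsymbol{\gamma}^{\sf T}(t)\boldsymbol{\Theta}(t) \right] - \boldsymbol{\mu}^{\sf T}(t)\mathbf{Q}(t). \tag{3.32}$$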
The resulting control action $a(t)$ is given by the minimization, at each chunk time $t$, of the expression in (3.32). Notice that the first term of (3.32) depends only on $\mathbf{R}(t)$ and on $m_u(t)$ $\forall\, u \in \mathcal{U}$, the second term of (3.32) depends only on $\boldsymbol{\gamma}(t)$ and the third term of (3.32) depends only on $\boldsymbol{\mu}(t)$. Thus, the overall minimization decomposes into three separate sub-problems. The first sub-problem (related to the first term in (3.32)) consists of choosing the quality levels $\{m_u(t)\}$ and the requested video-coding rates $\{R_{hu}(t)\}$ for each user $u$ and current chunk at time $t$. The second sub-problem (related to the second term in (3.32)) involves the greedy maximization of each user network utility function with respect to the auxiliary control variables $\gamma_u(t)$. The third sub-problem (related to the third term in (3.32)) consists of allocating the channel coding rates $\mu_{hu}(t)$ for each helper $h$ to its neighboring users $u \in \mathcal{N}(h)$.

Next, we show that the minimization of (3.32) yields the congestion control sub-policy at the users and the transmission scheduling sub-policy at the helpers given before.
3.2.2.1 Derivation of the congestion control action
The first term in (3.32) is given by
\begin{equation}
\sum_{u \in \mathcal{U}} \left\{ \sum_{h \in \mathcal{N}(u) \cap \mathcal{H}(f_u)} k Q_{hu}(t) R_{hu}(t) - \Theta_u(t) D_{f_u}(m_u(t), t) \right\}. \tag{3.33}
\end{equation}
The minimization is achieved by minimizing separately each term inside the sum over
$u$ with respect to $m_u(t)$ and $R_{hu}(t)$. It is immediate to see that the solution consists of
choosing the helper $h^*_u(t)$ as in (3.12), the quality level as in (3.13) and requesting the
whole chunk from helper $h^*_u(t)$ at quality $m_u(t)$, i.e., letting $R_{h^*_u(t)u}(t) = B_{f_u}(m_u(t), t)$,
as given in (3.14). The second term in (3.32), after a change of sign, is given by
\begin{equation}
\sum_{u \in \mathcal{U}} \left\{ V \phi_u(\gamma_u(t)) - \Theta_u(t) \gamma_u(t) \right\}. \tag{3.34}
\end{equation}
Again, this is maximized by maximizing separately each term, yielding (3.15).
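The per-user decisions implied by (3.33) and (3.34) can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the implementation used in the thesis: the function names and data layout are hypothetical, the utility is assumed to be $\phi_u(x) = \log(x)$, and the auxiliary variable is assumed to be restricted to an interval $[D^{\min}_u, D^{\max}_u]$. Since the whole chunk is requested from a single helper, the first decision reduces to picking the helper with the shortest queue and then the quality mode with the smallest cost.

```python
def congestion_control(queues, theta_u, sizes, qualities, k=1.0):
    """Minimize one user's term of (3.33) (hypothetical helper function).

    queues:    dict mapping helper id -> backlog Q_hu(t)
    theta_u:   virtual queue value Theta_u(t)
    sizes:     chunk sizes B_f(m, t), one per quality mode m
    qualities: quality values D_f(m, t), one per quality mode m
    Returns (helper, quality mode, requested amount R_hu(t)).
    """
    # The whole chunk goes to one helper, so the shortest queue is optimal.
    h_star = min(queues, key=queues.get)
    # Cost of each quality mode when served by the selected helper.
    costs = [k * queues[h_star] * b - theta_u * d
             for b, d in zip(sizes, qualities)]
    m_star = min(range(len(costs)), key=costs.__getitem__)
    return h_star, m_star, sizes[m_star]


def auxiliary_variable(theta_u, V, d_min, d_max):
    """Maximize V*log(gamma) - theta_u*gamma over [d_min, d_max].

    Assumes a logarithmic utility; the stationary point is V/theta_u,
    clipped to the interval because the objective is concave.
    """
    if theta_u <= 0:
        return d_max
    return min(max(V / theta_u, d_min), d_max)
```

A larger virtual queue $\Theta_u(t)$ pushes the user toward higher quality modes, while a larger backlog at the selected helper pushes it toward smaller chunks.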
3.2.2.2 Derivation of the transmission scheduling action
After a change of sign, the maximization of the third term in (3.32) yields precisely (3.16)
where $\mathcal{R}(t)$ is defined by the physical layer model of the network, and it is particularized
to the cases of macro-diversity and unique association as discussed in Section 3.2.1.2.
It is worthwhile to notice here that our NUM approach can be applied to virtually
any network with any physical layer (e.g., including non-universal frequency reuse,
non-orthogonal intra-cell access, multiuser MIMO [22], cooperative network MIMO [5]). In
fact, all that is needed is to characterize the network in terms of its achievable rate region
$\mathcal{R}(t)$, when averaging with respect to the small-scale fading, and conditioning with respect
to the slowly time-varying path loss coefficients, which depend on the network topology and
therefore on the users' motion. Of course, for more complicated types of wireless physical
layers, the description of $\mathcal{R}(t)$ and therefore the solution of the corresponding MWSR
problem (3.16) may be much more involved than in the cases treated here. For example,
an extension of this approach to the case of multi-antenna helper nodes using multiuser
MIMO is given in [90].
3.2.3 Optimality
As outlined in Section 3.1, VBR video yields time-varying quality and rate functions
$D_f(m, t)$ and $B_f(m, t)$, which depend on the individual video file. Furthermore, arbitrary
user motion yields time variations of the path coefficients $g_{hu}(t)$ at the same time-scale of
the video streaming session. As a result, any stationarity or ergodicity assumption about
the network state process $\omega(t)$ is unlikely to hold in most practically relevant settings.
Therefore, we consider the optimality of the DPP policy for an arbitrary sample path of
the network state $\omega(t)$. Following in the footsteps of [52, 53], we compare the network
utility achieved by our DPP policy with that achieved by an optimal oracle policy with
$T$-slot lookahead, i.e., with knowledge of the future network states over an interval of
length $T$ slots. Time is split into frames of duration $T$ slots and we consider $F$ such
frames. For an arbitrary sample path $\omega(t)$, we consider the static optimization problem
over the $j$-th frame
\begin{align}
\text{maximize} \quad & \sum_{u \in \mathcal{U}} \phi_u\left( \frac{1}{T} \sum_{\tau = jT}^{(j+1)T - 1} D_u(\tau) \right) \tag{3.35} \\
\text{subject to} \quad & \frac{1}{T} \sum_{\tau = jT}^{(j+1)T - 1} \left[ k R_{hu}(\tau) - n \mu_{hu}(\tau) \right] \le 0 \quad \forall\, (h, u) \in \mathcal{E} \tag{3.36} \\
& a(t) \in \mathcal{A}_{\omega(t)} \quad \forall\, t \in \{jT, \ldots, (j+1)T - 1\}, \tag{3.37}
\end{align}
and denote by $\phi^{\mathrm{opt}}_j$ the maximum of the network utility function for frame $j$, achieved
over all policies which have future knowledge of the sample path $\omega(t)$ over the $j$-th frame,
subject to the constraint (3.36), which ensures that for every queue $Q_{hu}$, the total service
provided over the frame is at least as large as the total arrivals in that frame. We have
the following result:
Theorem 3 For the system defined in Section 3.1, with state, scheduling policy and
feasible action set given in Definitions 8, 9 and 10, respectively, the dynamic scheduling
policy defined in Section 3.2.1, with control actions given in (3.11)-(3.18), achieves the
per-sample path network utility
\begin{equation}
\sum_{u \in \mathcal{U}} \phi_u\left(\overline{D}_u\right) \ge \lim_{F \to \infty} \frac{1}{F} \sum_{j=0}^{F-1} \phi^{\mathrm{opt}}_j - O\left(\frac{1}{V}\right) \tag{3.38}
\end{equation}
with bounded queue backlogs satisfying
\begin{equation}
\lim_{F \to \infty} \frac{1}{FT} \sum_{\tau = 0}^{FT - 1} \left[ \sum_{(h,u) \in \mathcal{E}} Q_{hu}(\tau) + \sum_{u \in \mathcal{U}} \Theta_u(\tau) \right] \le O(V) \tag{3.39}
\end{equation}
where $O(1/V)$ indicates a term that vanishes as $1/V$ and $O(V)$ indicates a term that
grows linearly with $V$, as the policy control parameter $V$ grows large.
Proof See Appendix B.2.
An immediate corollary of Theorem 3 is:
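Corollary 2 For the system defined in Section 3.1, when the network state is stationary
and ergodic, then
\begin{equation}
\sum_{u \in \mathcal{U}} \phi_u\left(\overline{D}_u\right) \ge \phi^{\mathrm{opt}} - O\left(\frac{1}{V}\right), \tag{3.40}
\end{equation}
where $\phi^{\mathrm{opt}}$ is the optimal value of the NUM problem (3.7) in the stationary ergodic case,$^7$
and
\begin{equation}
\sum_{(h,u) \in \mathcal{E}} \overline{Q}_{hu} + \sum_{u \in \mathcal{U}} \overline{\Theta}_u \le O(V). \tag{3.41}
\end{equation}
In particular, if the network state is i.i.d., the bounding term in (3.40) is explicitly given by
$O(1/V) = \frac{K}{V}$, and the bounding term in (3.41) is explicitly given by
$\frac{K + V(\phi_{\max} - \phi_{\min})}{\epsilon}$, where
$\phi_{\min} = \sum_{u \in \mathcal{U}} \phi_u(D^{\min}_u)$, $\phi_{\max} = \sum_{u \in \mathcal{U}} \phi_u(D^{\max}_u)$, $\epsilon > 0$ is the slack variable corresponding
to the constraint (3.36), and the constant $K$ is defined in (3.29).
Proof See Appendix B.2.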
3.3 Pre-buffering, re-buffering and skipping chunks
As described in Section 3.1, the playback process consumes chunks at fixed playback
rate $1/T_{\mathrm{gop}}$ (one chunk per time slot), while the number of ordered chunks per unit time
entering the playback buffer is a random variable, due to the fact that the network state
$\omega(t)$ is a random process (or an arbitrarily varying function of time) and the transmission
resources are dynamically allocated by the scheduling policy. Chunks must be ordered
sequentially in order to be useful for video playback. If chunks go through different queues
in the network and are affected by different delays, it may happen that already received
chunks with higher order number cannot be used for playback until the missing chunks
with lower order number are also received.
7 Notice that in the stationary and ergodic case the value $\phi^{\mathrm{opt}}$ is generally achieved by an instantaneous
policy with perfect knowledge of the state statistics or, equivalently, by a policy with infinite lookahead,
since the state statistics can be learned arbitrarily well from any sample path with probability 1, because
of ergodicity.
As we have noticed already in Section 1.2.1 and in Remark 6, the NUM problem
formulation in (3.7) does not take into account the possibility of buffer underrun events,
i.e., chunks that are not delivered within their playback deadline. This simplification has
the advantage of yielding the simple and decentralized scheduling policy of Section 3.2.1.
However, in order to make such a policy useful in practice we have to force the system to
work in the smooth streaming regime, i.e., in the regime of very small buffer underrun
rate. This can be done by adaptively determining the pre-buffering time $T_u$ for each user
$u$ on the basis of an estimate of the largest delay of queues $\{Q_{hu}(t) : h \in \mathcal{N}(u)\}$. In this
section, we propose a simple method that allows $T_u$ to be determined in a decentralized way,
based on the local information available at each user $u$.
An example of the playback buffer dynamics is illustrated in Table 3.1 and Fig. 3.1.
The table indicates the chunk numbers and their respective arrival times. The blue curve
in Fig. 3.1 shows the time evolution of the number of ordered chunks available in the
playback buffer. The green curve indicates the evolution with time of the number of
chunks consumed by playback. The playback consumption starts after an initial
pre-buffering delay $T_u = d$, as indicated in the figure. At any instant $t$, the chunk requested
at $t - d$ is expected to be available in the playback buffer. However, if the chunk is
delivered with a delay greater than $d$, the two curves meet and a buffer underrun event
occurs. In order to prevent these events, each user $u$ should choose its pre-buffering time
$T_u$ to be larger than the maximum delay of the serving queues $\{Q_{hu} : h \in \mathcal{N}(u) \cap \mathcal{H}(f_u)\}$.
Unfortunately, such maximum delay is neither deterministic nor known a priori.
We propose a scheme where each user $u$ estimates its local delays by monitoring its
delivery times in a sliding window spanning a fixed number of time slots. In addition,
users can also skip a chunk if, by doing so, a sufficiently large jump-up in the number of
ordered chunks in the playback buffer is achieved. For instance, in Table 3.1 and Fig. 3.1,
the chunk which comes 4th in the ordered sequence arrives at the end of time slot 11.
Table 3.1: Arrival times of chunks
Chunk number   1   2   3   4   5   6   7   8   9  10  11  12  13
Arrival time   3   4   5  11   6   8   9  10  12  13  16  15  14
Figure 3.1: Evolution of number of ordered and consumed chunks (horizontal axis: time; vertical axis: number of chunks; curves: number of ordered chunks in playback buffer, number of chunks consumed by playback; the pre-buffering delay d is marked in the figure).
However, chunks numbered 5, 6, 7 and 8 arrive before slot 11 but cannot be played since
chunk 4 is missing. More generally, if chunk 4 were to arrive with a delay such that the number
of chunks which arrive before 4 but come later in the ordered sequence becomes large,
then the user could either continue waiting for the missing chunk and incur a stall event,
or skip chunk 4 from playback and take advantage of the many already received chunks.
Let $t_k$ denote the time slot in which a user requests the $k$-th chunk and let $A_k$ be the
time slot in which the chunk arrives at the user playback buffer. The delay for chunk
$k$ is $W_k = A_k - t_k$. Without loss of generality, consider a user $u$ starting its streaming
session at time $t = 1$. In the proposed scheduling policy, user $u$ requests one chunk per
scheduling time, sequentially and possibly from different helpers, such that $t_k = k$. Since
the chunks are downloaded from different helpers with different queue lengths, they may
be received out of order. For example, it may happen that $A_k < A_j$ for some $j < k$.
Hence, chunk $k$ cannot be played until all chunks $j$ for $j < k$ are also received. We say
that a chunk $k$ becomes playable when all the chunks $j \le k$ are received. Let $P_k$ denote
the time when chunk $k$ becomes playable. Then, we have:
\begin{equation}
P_k = \max\{A_1, A_2, \ldots, A_k\}. \tag{3.42}
\end{equation}
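As a small worked example, the playable times in (3.42) are simply the running maximum of the arrival times. The following Python sketch applies this to the arrival times of Table 3.1; the variable names are illustrative and not taken from the thesis.

```python
from itertools import accumulate

# Arrival slots A_k of chunks k = 1..13, copied from Table 3.1.
arrival = [3, 4, 5, 11, 6, 8, 9, 10, 12, 13, 16, 15, 14]

# Delay W_k = A_k - t_k, with one request per slot so that t_k = k.
delay = [a - k for k, a in enumerate(arrival, start=1)]

# Playable time P_k = max{A_1, ..., A_k}, i.e. a running maximum.
playable = list(accumulate(arrival, max))

print(playable)  # [3, 4, 5, 11, 11, 11, 11, 11, 12, 13, 16, 16, 16]
```

Chunks 5 to 8 all become playable only at slot 11, when the late chunk 4 finally arrives, which is the situation the skipping policy is meant to address.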
The proposed policy consists of two parts: skipping chunks from playback and buffering
policy. We examine these two features separately in the following.
3.3.1 Skipping chunks from playback
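Prior to slot $t$, the set of playable chunks is $\{k : P_k \le t - 1\}$ and
\begin{equation}
k^*_{t-1} = \max\{k : P_k \le t - 1\} \tag{3.43}
\end{equation}
is the highest-order chunk in the ordered sequence of playable chunks. At the end of slot
$t$, user $u$ considers the set $\mathcal{C}_t$ of all chunks which have arrived before or during slot $t$ and
which come later than $k^*_{t-1}$ in the ordered sequence of playback. The set $\mathcal{C}_t$ is given by:
\begin{equation}
\mathcal{C}_t = \{k : k > k^*_{t-1},\ A_k \le t\}. \tag{3.44}
\end{equation}
The next available chunk with order larger than $k^*_{t-1}$ is given by:
\begin{equation}
\hat{k}_t = \min\{k : k > k^*_{t-1},\ A_k \le t\}. \tag{3.45}
\end{equation}
Let $\mathcal{C}^*_t \subseteq \mathcal{C}_t$ be the set of chunks which become playable at the end of slot $t$, i.e.,
\begin{equation}
\mathcal{C}^*_t = \{k : A_k \le t,\ P_k = t\}. \tag{3.46}
\end{equation}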
If $\hat{k}_t$ comes next to $k^*_{t-1}$ in the playback order (i.e., if $\hat{k}_t = k^*_{t-1} + 1$), then $\mathcal{C}^*_t$ is
non-empty and all the chunks $k \in \mathcal{C}^*_t$ can be added to the playback buffer. Further, $k^*_t$ is
recursively updated as:
\begin{equation}
k^*_t = k^*_{t-1} + |\mathcal{C}^*_t|. \tag{3.47}
\end{equation}
Denoting the increment in the size of the playback buffer at the end of slot $t$ by $\Omega_t$, we
have in this case that $\Omega_t = |\mathcal{C}^*_t|$. On the other hand, if $\hat{k}_t$ is not the immediate successor
of $k^*_{t-1}$ in the playback order (i.e., if $\hat{k}_t > k^*_{t-1} + 1$), then there is no chunk in $\mathcal{C}_t$ which
becomes playable at the end of slot $t$ and therefore $\mathcal{C}^*_t = \emptyset$. In this case, the algorithm
compares $|\mathcal{C}_t|$ with a threshold $\rho$ in order to decide whether it should wait further for the
missing chunk $k^*_{t-1} + 1$ or skip it in order to increase the playback buffer anyway. The
intuition behind such a decision is that it is worthwhile to skip a chunk if skipping such a
chunk results in a large jump in the playback buffer size. The size of this possible jump
can be exactly computed from $\mathcal{C}_t$ as follows: assuming $\hat{k}_t = k^*_{t-1} + 2$, if we skip chunk
$k^*_{t-1} + 1$, then the increase in the playback buffer is given by the size of the set:
\[
\{j \ge 2 : k^*_{t-1} + i \in \mathcal{C}_t\ \forall\, 2 \le i \le j\}.
\]
We therefore propose the following policy: if $|\mathcal{C}_t| \le \rho$ (where $\rho$ is a parameter $> 0$), then
wait for chunk $k^*_{t-1} + 1$ and let $k^*_t = k^*_{t-1}$. Otherwise, if $|\mathcal{C}_t| > \rho$, the increase of the
playback buffer is worthwhile and therefore it is useful to skip chunk $k^*_{t-1} + 1$. In this
case, if $\hat{k}_t = k^*_{t-1} + 2$, then $k^*_t$ is updated as
\begin{equation}
k^*_t = k^*_{t-1} + \max\{j : k^*_{t-1} + i \in \mathcal{C}_t\ \forall\, 2 \le i \le j\} \tag{3.48}
\end{equation}
and all the chunks numbered from $k^*_{t-1} + 2$ to $k^*_t$ are made playable at the end of slot $t$ and
added to the playback buffer. We therefore have $\Omega_t = |\{j \ge 2 : k^*_{t-1} + i \in \mathcal{C}_t\ \forall\, 2 \le i \le j\}|$ in
this case. Instead, if $\hat{k}_t > k^*_{t-1} + 2$, then the user skips chunk $k^*_{t-1} + 1$ and starts waiting
for chunk $k^*_{t-1} + 2$. Only a single chunk is allowed to be skipped per slot because skipping
multiple chunks might damage the quality of experience of the user. Note that
in this case, $k^*_t$ is updated to $k^*_{t-1} + 1$ even though the chunk $k^*_{t-1} + 1$ is missing and is not
playable. This is to ensure that when chunk $k^*_{t-1} + 2$ is received, it is considered playable
despite the fact that chunk $k^*_{t-1} + 1$ is missing. Also note that there is no increment in the
playback buffer (i.e., $\Omega_t = 0$) in this case because there is no new chunk which becomes
playable. Note that choosing $\rho = \infty$ corresponds to the case when no chunk is skipped.
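The end-of-slot update described in this subsection can be sketched as a single Python function. This is a minimal sketch rather than the thesis implementation: the function name, argument names and return convention are hypothetical, and the set of received chunks with order larger than the current playback pointer is assumed to be maintained by the caller.

```python
def skip_step(k_prev, arrived, threshold):
    """One end-of-slot update of the playback pointer (hypothetical sketch).

    k_prev:    highest-order chunk that is playable (or was skipped) so far
    arrived:   set of chunk indices > k_prev received by the end of the slot
    threshold: skip threshold; float("inf") disables skipping
    Returns (new pointer, number of chunks added to the playback buffer).
    """
    def run_length(start):
        # Count consecutive received chunks start, start+1, start+2, ...
        n = 0
        while start + n in arrived:
            n += 1
        return n

    if k_prev + 1 in arrived:
        # The next chunk in playback order is available: no skip needed.
        n = run_length(k_prev + 1)
        return k_prev + n, n
    if len(arrived) <= threshold:
        # Too few chunks are waiting: keep waiting for chunk k_prev + 1.
        return k_prev, 0
    if k_prev + 2 in arrived:
        # Skip chunk k_prev + 1 and release the run starting at k_prev + 2.
        n = run_length(k_prev + 2)
        return k_prev + 1 + n, n
    # Skip chunk k_prev + 1 and start waiting for chunk k_prev + 2.
    return k_prev + 1, 0
```

For the arrival times of Table 3.1, at the end of slot 10 chunks 1 to 3 are playable and chunks 5 to 8 have been received. With a threshold of 3, `skip_step(3, {5, 6, 7, 8}, 3)` skips chunk 4 and returns `(8, 4)`, whereas with a threshold of 10 it returns `(3, 0)` and the user keeps waiting.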
3.3.2 Pre-buffering and re-buffering
The goal here is to determine the delay $T_u$ after which user $u$ should start playback, with
respect to the time at which the first chunk is requested (beginning of the streaming
session). Intuitively, choosing a large $T_u$ makes the buffer underrun rate small. However,
too large a $T_u$ is detrimental to the user's quality of experience. From the chunk
skipping strategy seen above, we know that $\Omega_t$ is the number of new chunks added to
the playback buffer at the end of slot $t$. We define the size of the playback buffer $\Psi_t$ as
the number of playable chunks in the buffer not yet played. Without loss of generality,
assume again that the streaming session starts at $t = 1$. Then, $\Psi_t$ is recursively given by
the updating equation:
\begin{equation}
\Psi_t = \max\{\Psi_{t-1} - 1\{t > T_u\},\ 0\} + \Omega_t. \tag{3.49}
\end{equation}
From the qualitative discussion on the evolution of the playback buffer at the beginning
of Section 3.3, we notice that the longest period during which $\Psi_t$ is not incremented (in
the absence of chunk skipping decisions) is given by the maximum delay $W_k$ to deliver
chunks. In addition, we note that each user $u$ needs to adaptively estimate $W_k$ in order to
choose $T_u$. In the proposed method, user $u$ calculates for every chunk $k$ the corresponding
delay $W_k = A_k - t_k$. Notice that the delay of chunk $k$ can be calculated only at time
$A_k$, i.e., when the chunk is actually delivered. At each time $t = 1, 2, \ldots$, user $u$ calculates
the maximum observed delay $E_t$ in a sliding window of size $\Delta$ (in all the numerical
experiments in the sequel, we use $\Delta = 10$) by letting:
\begin{equation}
E_t = \max\{W_k : t - \Delta + 1 \le A_k \le t\}. \tag{3.50}
\end{equation}
Finally, user $u$ starts its playback when $\Psi_t$ crosses the level $E_t$, i.e.,
\begin{equation}
T_u = \min\{t : \Psi_t \ge E_t\}. \tag{3.51}
\end{equation}
If we have $\Psi_t = 0$ for some $t > T_u$, a stall event occurs and the algorithm enters a
re-buffering phase in which the same algorithm presented above is employed again to
determine the new instant $t + T_u + 1$ at which playback is restarted. Notice that, with
some abuse of notation, we have denoted the re-buffering delay again by $T_u$ although this
is re-estimated using the sliding window method at each new stall event. In fact, when a
stall event occurs, it is likely that some change in the network state has occurred, such
that the maximum delay must be re-estimated.
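The buffer recursion (3.49) and the start-of-playback rule (3.50)-(3.51) can be sketched together as follows. This is a minimal sketch under stated assumptions, not the thesis code: the function names are hypothetical, no chunk skipping is performed, chunk $k$ is requested in slot $t_k = k$, and slots in which the sliding window contains no arrival are simply passed over because the maximum in (3.50) would be taken over an empty set.

```python
import math


def psi_next(psi_prev, omega_t, t, T_u):
    """Buffer recursion (3.49): one chunk is consumed per slot once t > T_u."""
    consumed = 1 if t > T_u else 0
    return max(psi_prev - consumed, 0) + omega_t


def prebuffering_time(arrival, window):
    """First slot t at which the playable buffer reaches E_t, as in (3.51).

    arrival[k-1] is the arrival slot A_k of chunk k (hypothetical layout).
    Returns None if the level is never reached.
    """
    delays = [a - k for k, a in enumerate(arrival, start=1)]  # W_k = A_k - k
    received = set()
    playable = 0  # highest-order playable chunk so far
    psi = 0       # playable chunks in the buffer, none played yet
    for t in range(1, max(arrival) + 1):
        received.update(k for k, a in enumerate(arrival, start=1) if a == t)
        omega = 0
        while playable + omega + 1 in received:
            omega += 1  # chunk becomes playable at the end of slot t
        playable += omega
        psi = psi_next(psi, omega, t, math.inf)  # playback not started yet
        recent = [w for w, a in zip(delays, arrival)
                  if t - window + 1 <= a <= t]
        if recent and psi >= max(recent):
            return t
    return None
```

With the arrival times of Table 3.1 and a window of 10 slots, this sketch returns slot 4: two chunks are playable and the largest delay observed so far is 2 slots. The later arrival of chunk 4 with a delay of 7 slots is not visible at that point, which illustrates why the delay has to be re-estimated when a stall occurs.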
3.4 Numerical Experiments, Discussion and Conclusions
In this section, we present two targeted numerical experiments illustrating the particular
features of the proposed scheme. The first experiment considers the performance under a
"macro-diversity" physical layer, for which the rate scheduling subproblem takes on the
form (3.17). We consider a large network with many stationary users and one mobile user
moving across the network at constant speed. Users alternate between idle and active
phases of video streaming. Each streaming session (when moving from idle to active state)
is initialized using the pre-buffering scheme described in Section 3.3. This simulation
demonstrates the dynamic and adaptive nature of the policy in response to VBR video
coding and users joining or leaving the system at arbitrary times. Furthermore, the
statistics relative to the streaming session of the mobile user shed light on the ability of
the proposed algorithm to seamlessly discover new helper nodes as the user changes its
position across the network. The second experiment considers a smaller network formed
by four helpers and several users, in a situation of congestion for which most users are
close to one helper. We consider the proposed scheme both under a "macro-diversity" and
under a "unique association" physical layer (where in the latter case, the rate scheduling
problem takes on the form (3.19)) and compare its performance with a naive approach
with max-SINR user-helper association, representative of today's baseline technology.
As described in Section 1.2, the helpers could be base stations connected to some
video server through a wired backbone, or they could be dedicated wireless nodes with
local caching capacity. For the sake of simplicity and replicability of our numerical results,
here we assume that each helper has the whole video library available. Therefore, for any
request $f_u$ we have $\mathcal{N}(u) \cap \mathcal{H}(f_u) = \mathcal{N}(u)$. We use the utility function $\phi_u(x) = \log(x)$
for all $u \in \mathcal{U}$ (i.e., we use $\alpha$-fairness with $\alpha = 1$ [51]). As described in Section 3.1, a
scheduling slot duration of 0.5 s and a total available system bandwidth of $W = 18$ MHz
yield $10^5$ LTE resource blocks per slot [28]. The total number of channel symbols $n$ in a
scheduling slot is $10^5 \times 84$. We assume that each user $u$ has an edge to every helper $h$
which satisfies $n C_{hu}(t) > 1$ Mb (i.e., at least 2 Mb/s of peak rate).
The path loss coefficients $g_{hu}(t)$ between helper $h$ and user $u$ are based on the WINNER II
channel model [91]. In particular, we let
\[
g_{hu}(t) = 10^{-\frac{\mathrm{PL}(d_{hu}(t))}{10}},
\]
where $d_{hu}(t)$ is the distance from helper $h$ to user $u$ at time $t$, and where
\begin{equation}
\mathrm{PL}(d) = A \log(d) + B + C \log(f_0/5) + \chi_{\mathrm{dB}}. \tag{3.52}
\end{equation}
In (3.52), $d$ is expressed in meters, the carrier frequency $f_0$ in GHz, and $\chi_{\mathrm{dB}}$ denotes
a shadowing log-normal variable with variance $\sigma^2_{\mathrm{dB}}$. The parameters $A, B, C$ and $\sigma^2_{\mathrm{dB}}$
are scenario-dependent constants. Among the several models specified in WINNER II
we chose the A1 model in [91], representative of a small-cell scenario. In this case,
$3 \le d \le 100$, and the model parameters are given by $A = 18.7$, $B = 46.8$, $C = 20$,
$\sigma^2_{\mathrm{dB}} = 9$ in line-of-sight (LOS) condition, or $A = 36.8$, $B = 43.8$, $C = 20$, $\sigma^2_{\mathrm{dB}} = 16$ in
non-line-of-sight (NLOS) condition. For distances less than 3 m, we extended the model
by setting $\mathrm{PL}(d) = \mathrm{PL}(3)$. Each link is in LOS or NLOS independently and at random,
with probability $p_l(d)$ and $1 - p_l(d)$, respectively, where
\[
p_l(d) = \begin{cases} 1 & \text{if } d \le 2.5 \text{ m} \\ 1 - 0.9\left(1 - (1.24 - 0.6 \log(d))^3\right)^{1/3} & \text{otherwise.} \end{cases}
\]
Every helper transmits at fixed power level $P = 10^8$.
Using Jensen's inequality in (3.2) to replace the denominator of the SINR term
with its average (this will be the average received inter-cell interference power), and
the fact that the small-scale fading coefficients $s_{hu}$ are $\mathcal{CN}(0, 1)$, the peak achievable
rates can be lower-bounded by the closed-form expression
$C_{hu}(t) \ge e^{1/\mathrm{SINR}_{hu}(t)}\, \mathrm{Ei}\left(1, \frac{1}{\mathrm{SINR}_{hu}(t)}\right)$,
where $\mathrm{Ei}(1, x) = \int_x^{\infty} \frac{e^{-t}}{t}\, dt$ for $x \ge 0$, and
$\mathrm{SINR}_{hu}(t) = \frac{P_h g_{hu}(t)}{1 + \sum_{h' \ne h} P_{h'} g_{h'u}(t)}$. This formula,
which provides a very accurate lower bound to (3.2) when the SINR denominator in (3.2)
contains many independent terms, is an achievable rate$^8$ and is used in the numerical
results of this section.
We assume that all the users request chunks successively from VBR-encoded video
sequences. Each video file is a long sequence of chunks, each of duration 0.5 seconds
and with a frame rate of 30 frames per second. We consider a specific video sequence
formed by 800 chunks, constructed using 4 video clips from the database in [92], each
of length 200 chunks. The chunks are encoded into different quality modes. Here, the
quality index is measured using the Structural SIMilarity (SSIM) index defined in [93].
Figs. 3.2a and 3.2b show the size in kbits and the SSIM values as a function of the chunk
index, respectively, for the different quality modes. In our experiments, the chunks from
1 to 200 and 601 to 800 are encoded into 8 quality modes, while the chunks numbered
from 201 to 600 are encoded in 4 quality modes. In both the experiments in the sequel,
each user starts its streaming session of 1000 chunks from some arbitrary position in this
reference video sequence and successively requests 1000 chunks by cycling through the
sequence.
8 A lower bound to an achievable rate is obviously achievable.
Figure 3.2: Rate-quality profile of the test video sequence used in our simulations. (a) Bitrate profile: chunk size in kbits vs. chunk number. (b) Quality profile: quality in SSIM index vs. chunk number.
3.4.1 Experiment 1
In the large network experiment, we consider a 40 m × 40 m square area divided into 8 × 8
small square cells of side length 5 m as shown in Fig. 3.3. A helper is located at the center
of each small square cell. The network includes 319 randomly placed stationary users
and one mobile user whose trajectory is indicated by the green line. At $t = 0$, the mobile
user starts a video streaming session of 1000 chunks. Simultaneously, it starts moving
along the trajectory and stops after it requests the 1000th chunk. It does not request any
more chunks after it stops moving. As the user moves along its trajectory, the new
path loss coefficients $g_{hu}(t)$ are calculated using the WINNER II model described above, leading to
time-varying peak link rates $C_{hu}(t)$. The remaining 319 users in the system are stationary
throughout the simulation period and alternate between idle and active phases of video
streaming. At $t = 0$, all the stationary users are idle and each one of them independently
starts a streaming session with probability $p = 0.005$ at every slot. Thus, the time for
which a user stays idle is geometrically distributed with mean $1/p = 200$ slots. Once a user
starts a streaming session, it stays active during 1000 video chunks. After finishing the
requests, it goes back into the idle state and may start a new session after an independent
and random geometrically distributed idle time. We simulate the proposed scheme under
the macro-diversity physical layer for 3000 slots for fixed values of the key parameters $V$,
and set to $10^{13}$, 25 and 50 respectively. These values have been chosen after extensive
simulation and yield a good behavior of the scheduling policy. In general, the policy
parameters have to be tuned to the specific network environment.
Figure 3.3: Topology (the green line indicates the trajectory of the mobile user in Experiment 1).
We show the results in terms of the empirical CDF (over the user population) of
the following metrics: 1) the percentage of skipped chunks, spanning multiple streaming
sessions of each user (Fig. 3.4d); 2) the quality (SSIM) averaged over the delivered chunks,
spanning multiple streaming sessions of each user (Fig. 3.4a); 3) the initial pre-buffering
time (in number of slots), calculated for each streaming session (Fig. 3.4b); 4) the
percentage of time spent in re-buffering mode, calculated with respect to the total
playback time spanning multiple streaming sessions of each user (Fig. 3.4c).
Focusing on the mobile user, we observe that the percentage of skipped chunks is 16%
and the pre-buffering time is 180 time slots (i.e., 1 min). Fig. 3.4e shows the evolution
of the playback buffer $\Psi_t$ over time. We notice that there is only one interruption (stall
event) in the entire streaming session. The quality (SSIM) averaged over the delivered
chunks is observed to be a high value of 0.87 (the maximum being 1.0). The helpers
are numbered from 1 to 64, left to right and bottom to top, in Fig. 3.3. In Fig. 3.4f,
we plot the helper index providing chunk $k = 1, \ldots, 1000$ vs. the chunk index. We can
observe that as the user moves slowly along the path, the policy "discovers" adaptively
the current neighboring helpers and downloads chunks from them in a seamless fashion.
Overall, these results demonstrate the dynamic and adaptive nature of the proposed
policy in response to user mobility, variable bit-rate video coding, and users joining or
leaving the system at arbitrary times.
3.4.2 Experiment 2
In this experiment, we focus on a smaller network with 4 helpers and 20 stationary users
as indicated in Fig. 3.5a. The dimensions used for the topology are the same as in
Fig. 3.3, where each of the 4 helpers is located at the centre of a 5 m × 5 m square cell
and the overall area of the system is 10 m × 10 m. We consider a situation where the
20 users in the system are located close to the same helper, as indicated in Fig. 3.5a.
We choose this non-uniform user distribution in order to investigate the load balancing
property of the proposed policy in contrast to a naive scheme that allocates users to
helpers based on maximum signal strength (or, equivalently, based on maximum SINR).
In this experiment, all the 20 users start their streaming session simultaneously at $t = 0$,
and stop after 1000 requested chunks. A baseline scheme, representative of current WLAN
technology, performs client-based user-helper association, i.e., every user $u$ chooses helper
$h^*_u(t) = \arg\max\{C_{hu}(t) : h \in \mathcal{N}(u)\}$. Then, the streaming process takes place accordingly
by adapting the requested video quality according to DASH [79, 87]. We have emulated
this situation by applying the same video quality level decisions as in (3.13), with
user-helper association as given above.
We provide results for the proposed schemes with both "macro-diversity" and "unique
association". In order to simulate the unique association scheme, we solve the LP relaxation
of (3.19) in every slot using the standard linear programming solver of MATLAB.
In practice, this can be implemented by a centralized network controller. The results
are shown in the form of empirical CDF (over the user population) of: 1) SSIM averaged
over the chunks (Fig. 3.5b); 2) fraction of slots spent in buffering mode (including
pre-buffering and re-buffering periods) (Fig. 3.5c). We notice that the proposed policy,
both under macro-diversity and unique association, improves over the baseline scheme in
terms of the video quality metric and the fraction of slots spent in buffering mode. This is
because the baseline scheme a priori fixes the association of a user to the helper with best
peak link rate, while the proposed schemes yield better load balancing by allowing each
user to dynamically select the best helper in its neighborhood based on the congestion
control decision (3.12), which takes into account the length of all queues "pointing" at
the user itself. In addition, we notice that though the macro-diversity and the unique
association schemes differ significantly in terms of implementation, the difference in terms
of performance is small. This shows that 1) even in such a small cell scenario,
macro-diversity does not provide a large gain over unique association;$^9$ 2) the major source of
gain of the proposed scheme over the baseline scheme is due to its seamless load balancing
property; 3) the main advantage of a macro-diversity physical layer over a physical layer
where unique association is enforced consists of the simplicity of the decentralized nature
of rate scheduling subproblem (3.17) over the centralized maximum weighted matching
solution (3.19).
9 Notice that in a macro-cell scenario, where most users are in good SINR conditions to at most one
base station, macro-diversity would yield an even smaller performance gain over unique association.
Figure 3.4: CDFs of different performance metrics for Experiment 1. (a) Fraction of users with average quality < x vs. quality averaged over delivered chunks (x). (b) Fraction of users with pre-buffering time < x vs. pre-buffering time (x). (c) Fraction of users with % of re-buffering < x vs. percentage of playback time spent in re-buffering mode (x). (d) Fraction of users with % of skipped chunks < x vs. percentage of skipped chunks (x). (e) Playback buffer size vs. time slot. (f) Helper assigned vs. chunk number.
Figure 3.5: Topology and CDFs of different performance metrics for Experiment 2. (a) Topology. (b) Fraction of users with average SSIM < x vs. SSIM averaged over delivered chunks (x), for macro-diversity, baseline and unique association. (c) Fraction of users with % of buffering < x vs. percentage of time spent in buffering mode (x), for macro-diversity, baseline and unique association.
Chapter 4
WiFlix: Adaptive Video Streaming in Massive MU-MIMO Networks
The "push" scheduling policy of Chapter 3 is effective for network topologies with nomadic
(slowly moving) users, but can result in video chunks being delivered out of order at the
user. Furthermore, a relatively fast moving user which is pre-selected for transmission at
a helper may move out of that helper's vicinity when the actual transmission takes place.
Moreover, at the PHY layer, Chapter 3 considered a simple system where each base station
had a single antenna and therefore could serve only a single user in one transmission
resource (referred to as a PHY frame hereafter). However, it is widely accepted that the
explosive increase in demand for video content can be met only through advanced PHY
layer techniques. This motivates the recent surge in the quest for disruptive technologies
defining the next generation (5G) of wireless systems [57]. Among several approaches
proposed, the most promising seems to be the dense and massive deployment of base station
antennas in the form of "massive MU-MIMO" solutions where hundreds of antennas
are installed within reasonable form factors at each base station [4, 58] promising large
increases in spectral efficiency by transmitting independent data streams simultaneously
to multiple users sharing the same PHY frame.
4.1 Contributions
Motivated by these considerations, this chapter focuses on the problem of dynamic adap
tive video streaming in a wireless network formed by a number of densely deployed wireless
helper nodes, employing massive MUMIMO technology, serving multiple wireless users
over a given geographic coverage area and on the same shared channel bandwidth. We
address the problem by jointly optimizing the video quality adaptation at the DASH layer
(application layer) and the transmission scheduling of user subsets at the MAC layer given
that the PHY layer employs MUMIMO beamforming. This is obtained through a cross
layer approach where the appropriate queue sizes maintained at the users act as a bridge
between the layers. In particular, the novel contributions of this chapter are as follows:
• We introduce the notion of a request queue. This is a virtual queue, maintained
by each user, that serves to sequentially request video chunks from helper nodes,
such that the choice of the helper node and the quality at which each video chunk
is requested can be adaptively adjusted. Each user, upon deciding the quality of
the chunk, requests the bits corresponding to that chunk and places them in the
request queue. Note that this does not mean the user has already downloaded
the chunk; rather, the chunk bits are "virtually" placed in the request queue and will
be taken out when the chunk is effectively delivered to the user. In this way, the
user maintains in the request queue all the chunk bits that have been requested
but not downloaded and adaptively adjusts the quality of future chunk requests
based on its length. In addition, the user broadcasts this length to the helpers
in its current vicinity and "pulls" bits from them in the right order necessary for
video playback starting at the "head of line" (HOL) of the request queue. Even if
a mobile user gets out of range of a helper while downloading the HOL bits, it can
still re-request those bits from the new helper in its current vicinity. In this way,
the user always downloads chunks in the playback order and does not skip any of
them. This improves significantly upon the "push" scheme proposed in Chapter 3,
where the chunks could be downloaded out of order due to varying delays at the
helper queues or skipped if a user moves out of a helper's coverage.
• We systematically obtain our cross-layer policy as the dynamic solution of a Network
Utility Maximization (NUM) problem, where the network utility function is given in
terms of the users' time-averaged video quality, and the maximization constraints
are given by imposing stability of each request queue. The stability constraint
implies that every requested chunk will be eventually delivered, while delivery in
the right sequential order is guaranteed by the request queue mechanism described
above. The proposed policy decomposes naturally into two interconnected layers: i)
a video streaming adaptation layer reminiscent of DASH, implemented at each user
node, and involving the adaptive video quality selection and placement of the video
chunk requests into the request queue; ii) a transmission scheduling layer where a
max-weight scheduler is implemented at the helpers. These two layers are interconnected
by the users' request queues, which form the weights for the max-weight
scheduler. Although queue stability guarantees that all requested chunks are eventually
delivered, such delivery may still occur, occasionally, after the corresponding
playback deadline. In this case, we are in the presence of a stall event. In order
to control the stall event probability and make it sufficiently small, we follow the
same divide and conquer approach of [47], and adaptively set the pre-buffering/re-buffering
time by monitoring the chunk delivery delay in a sliding window. This
approach has the advantage of yielding very good performance also in terms of
stall event probability, while allowing for the elegant and mathematically tractable
NUM framework in terms of the video quality maximization.
• We particularize the max-weight transmission policy to a network of helpers with
MU-MIMO capabilities, where the scheduling actions consist of choosing the subset
of users for MU-MIMO beamforming at each helper. By exploiting the "channel
hardening" effect of large-dimensional MIMO channels (massive MIMO) [4, 5, 29],
we reduce the combinatorial weighted sum rate maximization over the multiuser
multi-cell network (which would involve an exponentially complex exhaustive user
selection, or some polynomial complexity heuristic greedy user selection at each
helper) to a simple subset selection problem which is optimally solved by a low
complexity algorithm. The algorithm can be implemented independently at the
MAC layer of each helper. The only information that needs to be exchanged between
the layers is the length of the users' request queues, which can be easily gathered
as "protocol information" via the uplink, together with the chunk requests.
• We show through simulation in a realistic network topology and using actual encoded
video data that the proposed system is very effective in improving the average
video quality and reducing the percentage of time spent in buffering mode.
4.2 System Model
As in Chapter 3, we consider a wireless network with multiple users and multiple helper
stations sharing the same bandwidth. The network is defined by a bipartite graph
$\mathcal{G} = (\mathcal{U}, \mathcal{H}, \mathcal{E})$, where $\mathcal{U}$ denotes the set of users, $\mathcal{H}$ denotes the set of helpers,
and $\mathcal{E}$ contains edges for all pairs $(h, u)$ such that helper $h$ can transmit information to
user $u$. We denote by $\mathcal{N}(u) \subseteq \mathcal{H}$ the neighborhood of user $u$, i.e.,
$\mathcal{N}(u) = \{h \in \mathcal{H} : (h, u) \in \mathcal{E}\}$. Similarly, $\mathcal{N}(h) = \{u \in \mathcal{U} : (h, u) \in \mathcal{E}\}$.
Each user $u \in \mathcal{U}$ requests a video file $f_u$ from a library $\mathcal{F}$ of possible files.
As in Chapter 3, each video file is formed by a sequence
of chunks. Each chunk corresponds to a group of pictures (GOP) that are encoded
and decoded as stand-alone units [79]. Chunks have a fixed playback duration, given
by $T_{\mathrm{gop}} = (\text{\# frames per GOP})/\eta$, where $\eta$ is the frame rate, expressed in frames per
second. The streaming process consists of transferring chunks from the helpers to the
requesting users such that the playback buffer at each user contains the required chunks
at the beginning of each chunk playback deadline. The playback starts after a short
pre-buffering time, during which the playback buffer is filled by a determined amount of
ordered chunks. The details relative to pre-buffering and chunk playback deadlines are
discussed in Section 4.5.
As in Chapter 3, each file $f \in \mathcal{F}$ is encoded at a finite number of different
quality/compression levels $m \in \{1, \ldots, N_f\}$. Due to the variable bit rate (VBR) nature of
video coding [55], the quality-rate profile of a given file $f$ may vary from chunk to chunk.
For example, the same compression level may produce a different user quality index as
well as a different bit requirement from one chunk to the next, depending on whether the
video chunk is showing a still image or a rapidly changing scene. We let $D_f(m, i)$ and
$B_f(m, i)$ denote the video quality measure (e.g., see [81]) and the size (in number of bits)
of the $i$th chunk in file $f$ at quality level $m$, respectively.
4.2.1 Timescales
In contrast to Chapter 3, in this chapter we consider a more practical system where
we separate the time scales at which the congestion control decisions at the application
layer and the transmission scheduling decisions at the PHY/MAC layer take place. This
is because the time scale at which chunks are requested and the time scale at which
PHY layer transmissions are scheduled differ by 1 to 3 orders of magnitude. For instance,
in current video streaming technology [25], the typical video chunk spans a duration of
0.5 to 2 seconds, while the duration of a PHY frame is of the order of milliseconds. For
example, with a PHY frame duration of 10 ms (as in the LTE 4G standard [67]) and
assuming $T_{\mathrm{gop}} = 0.5$ s, a video chunk spans $n = \frac{0.5}{10 \times 10^{-3}} = 50$ PHY frames. Fig. 4.1
qualitatively illustrates the different time scales.
In the following, we consider dynamic scheduling policies that operate every transmission
slot $t$ at the PHY frame time scale, i.e., they provide a scheduling/resource allocation
decision at each PHY frame time $t \in \mathbb{Z}$. However, new chunks are requested at multiples
of the chunk time, i.e., at times $t = in$ for $i \in \mathbb{Z}$ and $n$ denoting the number of PHY
frames per chunk time, assumed here to be an integer for simplicity. In the rest of this
chapter, we will use consistently the following notation: the index $t$ denotes the PHY frame
transmission slots, and the index $i$ denotes video chunk slots.
Figure 4.1: Time-Scale Decomposition
4.2.2 Request Queue Dynamics
At the beginning of the $i$th video chunk slot, each user $u \in \mathcal{U}$ requests a particular quality
mode for the $i$th chunk of its video stream. That is, on each slot $t \in \{0, n, 2n, 3n, \ldots\}$,
each user $u \in \mathcal{U}$ specifies the quality mode $m_u(t)$ for its next video chunk. This decision
specifies the quality $D_{f_u}(m_u(t), t)$ and the amount of bits $B_{f_u}(m_u(t), t)$ associated with
that chunk. As these decisions are made only at times $t$ that are multiples of $n$, it is
convenient to define:
$$ D_{f_u}(m_u(t), t) = 0 \quad \text{and} \quad B_{f_u}(m_u(t), t) = 0 \quad \text{for } t \notin \{0, n, 2n, \ldots\}. \tag{4.1} $$
The bits $B_{f_u}(m_u(t), t)$ are called the requested bits of user $u$ on slot $t$, and are placed in a
request queue $Q_u(t)$. The request queue evolves at the PHY timescale over transmission
slots $t \in \{0, 1, 2, \ldots\}$ as:
$$ Q_u(t+1) = \max\{Q_u(t) - \mu_u(t) + B_{f_u}(m_u(t), t),\ 0\} \quad \forall\, u \in \mathcal{U}, \tag{4.2} $$
where $\mu_u(t)$ is the amount of bits downloaded by user $u$ on slot $t$. Note that the request
queue can decrease every transmission slot $t$ as new bits are downloaded, but can only
increase on slots $t$ that are multiples of $n$. Intuitively, $Q_u(t)$ consists of bits associated
with all chunks that have been requested by user $u$ but not yet fully received.
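The recursion (4.2) together with convention (4.1) can be illustrated by a minimal Python sketch. The function name `simulate_request_queue`, its argument names, and the toy numbers are assumptions made for illustration only; they do not come from the dissertation.

```python
def simulate_request_queue(chunk_bits, served_bits, n):
    """Evolve the request queue Q_u(t) according to (4.2).

    chunk_bits  -- bits B of each requested chunk; chunk i arrives at slot t = i * n
    served_bits -- bits mu_u(t) downloaded in each PHY slot t
    n           -- number of PHY slots per video chunk slot
    Returns the list [Q_u(0), Q_u(1), ..., Q_u(len(served_bits))].
    """
    q = 0.0
    history = [q]
    for t, mu in enumerate(served_bits):
        chunk_index, offset = divmod(t, n)
        # Per (4.1), requested bits are nonzero only at chunk boundaries t = 0, n, 2n, ...
        at_boundary = offset == 0 and chunk_index < len(chunk_bits)
        arriving = chunk_bits[chunk_index] if at_boundary else 0.0
        q = max(q - mu + arriving, 0.0)
        history.append(q)
    return history


# Example from the text: T_gop = 0.5 s and 10 ms PHY frames give n = 50.
n_lte = round(0.5 / 10e-3)
# A toy run with n = 2 keeps the trace short enough to read.
trace = simulate_request_queue(chunk_bits=[100.0, 80.0], served_bits=[30.0] * 5, n=2)
```

In the toy trace, the queue grows only at slots 0 and 2 (the chunk boundaries) and shrinks by the served bits in every slot, which is the behavior described in the paragraph above.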
The quantity $\mu_u(t)$ indicates the instantaneous aggregate downloading rate of user $u$
on slot $t$, expressed in bits per slot. This is given by
$$ \mu_u(t) = \sum_{h \in \mathcal{N}(u)} \mu_{hu}(t)\, 1_{hu}(t) \tag{4.3} $$
where $1_{hu}(t)$ is an indicator function, equal to 1 if helper $h$ has the video file requested
by user $u$ and zero otherwise, and $\mu_{hu}(t)$ is the rate served by helper $h$ to user $u$ on
slot $t$. The matrix $[\mu_{hu}(t)]$ of transmission rates is selected within a set $\mathcal{R}(t)$ of feasible
transmission rate matrices for slot $t$. The set of all rate matrices supported by the
network at a given slot time $t$ is referred to as the feasible instantaneous rate region
at time $t$, and depends on the network topology and channel state (e.g., on the fading
channels realization). The set $\mathcal{R}(t)$ can be specified according to any multiple access
communication model. For example, the set $\mathcal{R}(t)$ may include the constraint that each
user can receive a positive rate from at most one helper and/or constrain helpers to
restrict transmissions to at most $S$ users, where $S$ denotes the maximum number of
downlink data streams (spatial multiplexing gain) that the helper station can handle
(see footnote 1). The set $\mathcal{R}(t)$ can also handle models that allow simultaneous download
from multiple helpers (for instance, in a cellular CDMA system with macro diversity), or
information-theoretic capacity regions of various network topology models, inclusive of
broadcast and interference constraints (e.g., [94]). We also mention here that this framework
can also handle non-wireless scenarios. For example, it can constrain $[\mu_{hu}(t)]$ to be permutation
matrices associated with packet switch constraints. However, as explained often in this
dissertation, it is desirable for current and future systems to take advantage of massive
MU-MIMO capabilities at the helpers. Section 4.2.3 specifies $\mathcal{R}(t)$ for the relevant wireless
scenario with helpers employing massive MU-MIMO, which is the primary focus of this
chapter. The simulation results in Section 4.6 are carried out under this specific wireless
model.
Footnote 1: See [54] for a discussion of various wireless multiple access scenarios and
interference models that fit this general framework.
Remark 8 Each user $u$ maintains $Q_u(t)$ and updates it according to (4.2) every transmission
slot $t$. A small amount of bookkeeping is also required by the user to associate the
bits $Q_u(t)$ with their appropriate chunks. Specifically, each user maintains a list of chunks
it has requested but not yet fully received, along with the quality modes it requested for
each chunk. It can receive new bits on slot $t$ only from a helper that has its requested file,
and only if $Q_u(t) > 0$. When downloading these bits, the user first informs the helper of
the requested chunks, the desired quality levels, and the bit location needed for downloading
the residual bits of the next-in-line chunk.
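The bookkeeping in Remark 8 can be sketched as a small first-in-first-out structure. This is only an illustrative sketch: the class `PendingChunks` and its method names are hypothetical, and bit counts are kept as integers for simplicity.

```python
from collections import deque


class PendingChunks:
    """Chunks requested but not yet fully received, kept in playback order."""

    def __init__(self):
        # Each entry is [chunk_index, quality_mode, residual_bits].
        self._pending = deque()

    def request(self, chunk_index, quality_mode, size_bits):
        """Record a newly requested chunk at the tail of the list."""
        self._pending.append([chunk_index, quality_mode, size_bits])

    def backlog(self):
        """Total residual bits; this plays the role of Q_u(t)."""
        return sum(entry[2] for entry in self._pending)

    def head_of_line(self):
        """(chunk_index, quality_mode, residual_bits) of the next-in-line chunk."""
        return tuple(self._pending[0]) if self._pending else None

    def deliver(self, bits):
        """Apply downloaded bits to the head-of-line chunk first.

        Returns the indices of chunks completed by this delivery. Bits left
        over once the list is empty are discarded, mirroring the max{., 0}
        in (4.2).
        """
        completed = []
        while bits > 0 and self._pending:
            head = self._pending[0]
            used = min(bits, head[2])
            head[2] -= used
            bits -= used
            if head[2] == 0:
                completed.append(head[0])
                self._pending.popleft()
        return completed
```

Because bits are always applied to the head of the list, chunks complete strictly in playback order, and `head_of_line()` gives exactly the information (chunk, quality level, residual bits) that the user would report to a helper before downloading.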
4.2.3 Wireless System Model with Massive MU-MIMO Helpers
In this section, we specify the region of instantaneous service rates $\mathcal{R}(t)$ for the specific
PHY layer model comprising massive MU-MIMO at each helper. Each helper $h$, with
a large number of antennas $M$ installed, implements MU-MIMO to serve the users $\mathcal{N}(h)$
in its vicinity. As a result, helper $h$ can serve simultaneously, in the spatial domain, any
subset of size not larger than $\min\{M, |\mathcal{N}(h)|\}$ of the users in $\mathcal{N}(h)$. We further assume
that each helper performs linear zero-forcing beamforming (LZFBF) to the set of selected
users (referred to in the following as "active users").
The wireless channel is modeled by the well-known and widely accepted block-fading
model, where at each transmission slot $t$, the channel corresponding to the helper-user
link $(h, u)$ in $\mathcal{E}$ is given by
$$ y_u(t) = \sqrt{g_{hu}(t)}\, \mathbf{h}_{h,u}^{\mathsf{H}}(t) \mathbf{V}_h(t) \mathbf{x}_h(t) + \sum_{h' \neq h} \sqrt{g_{h'u}(t)}\, \mathbf{h}_{h',u}^{\mathsf{H}}(t) \mathbf{V}_{h'}(t) \mathbf{x}_{h'}(t) + z_u(t) \tag{4.4} $$
where $\mathbf{h}_{h,u}(t)$ is the $M \times 1$ column vector of channel coefficients from the antenna array
of helper $h$ to the receiving antenna of user $u$, $g_{hu}(t)$ is the large-scale distance-dependent
pathloss from helper $h$ to user $u$, $\mathbf{V}_h(t)$ is the downlink precoding matrix of helper $h$,
and $\mathbf{x}_h(t)$ is the vector of transmitted complex information symbols (QAM modulation)
of helper $h$. $z_u(t)$ denotes the additive Gaussian noise at the $u$th user receiver. Notice
that this model takes fully into account the inter-cell interference of the signals sent by
other helpers $h' \neq h$ on the link from helper $h$ to user $u$.
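We use $\mathcal{S}_h(t)$ to denote the subset that is chosen for LZFBF in transmission slot $t$.
The $M \times 1$ channel vectors $\mathbf{h}_{h,u}(t)$ of all users $u \in \mathcal{S}_h(t)$ are assumed to be known at
helper $h$ through some form of channel state feedback. Such channel vectors are collected
as the columns of an $M \times |\mathcal{S}_h(t)|$ channel matrix $\mathbf{H}_h(t)$. The LZFBF precoded signal vector
is given by $\mathbf{V}_h(t)\mathbf{x}_h(t)$, where $\mathbf{x}_h(t)$ is the $|\mathcal{S}_h(t)| \times 1$ column vector of symbols to be sent
to users $u \in \mathcal{S}_h(t)$ and $\mathbf{V}_h(t)$ is the ZFBF precoding matrix of dimension $M \times |\mathcal{S}_h(t)|$
given by the normalized pseudo-inverse
$$ \mathbf{V}_h(t) = \mathbf{H}_h(t) \left( \mathbf{H}_h^{\mathsf{H}}(t) \mathbf{H}_h(t) \right)^{-1} \boldsymbol{\Lambda}^{1/2}(t) \tag{4.5} $$
where $\boldsymbol{\Lambda}(t)$ is a column-normalizing diagonal matrix with the $u$th diagonal element given
by
$$ \Lambda_u(t) = \frac{1}{\left[ \left( \mathbf{H}_h^{\mathsf{H}}(t) \mathbf{H}_h(t) \right)^{-1} \right]_{uu}} \tag{4.6} $$
where $[\cdot]_{uu}$ denotes the $u$th diagonal element of the matrix argument. Using the fact
that $\mathbf{H}_h^{\mathsf{H}}(t) \mathbf{V}_h(t) = \boldsymbol{\Lambda}^{1/2}(t)$, the resulting downlink channel to user $u \in \mathcal{S}_h(t)$ becomes
$$ y_u(t) = \sqrt{g_{hu}(t)\, \Lambda_u(t)}\; x_{hu}(t) + z_u(t) \tag{4.7} $$
where $g_{hu}(t)$ is the large-scale pathloss coefficient from helper $h$ to user $u$. Under the
assumptions that $M, |\mathcal{S}_h(t)| \to \infty$ with a fixed ratio $\frac{|\mathcal{S}_h(t)|}{M} \le 1$, random matrix theory
results (see [58, 95]) can be invoked to show that
$$ \Lambda_u(t) \to 1 - \frac{|\mathcal{S}_h(t)| - 1}{M} \tag{4.8} $$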
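Thus, for a given choice of subset $\mathcal{S}_h(t)$ and under the assumption that the power $P_h(t)$ is
equally shared across the user streams in $\mathcal{S}_h(t)$, the vector
$\mathbf{c}_h(\mathcal{S}_h(t), t) = \{c_{hu}(\mathcal{S}_h(t), t)\}_{u \in \mathcal{U}}$
of rates (in bits per channel symbol) achieved by all the users in $\mathcal{N}(h)$ satisfies (see footnote 2)
$$ c_{hu}(\mathcal{S}_h(t), t) = \begin{cases} 0 & \text{if } u \notin \mathcal{S}_h(t) \\[4pt] \log\left( 1 + \dfrac{M - |\mathcal{S}_h(t)| + 1}{|\mathcal{S}_h(t)|} \cdot \dfrac{P_h\, g_{hu}(t)}{1 + \sum_{h' \neq h} P_{h'}\, g_{h'u}(t)} \right) & \text{if } u \in \mathcal{S}_h(t) \end{cases} \tag{4.9} $$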
Footnote 2: This rate expression neglects the effect of pilot contamination, which arises in
massive MIMO with TDD and open-loop channel estimation based on uplink pilots and channel
reciprocity. While in the regime of infinite number of base station antennas and finite number
of users, pilot contamination dominates the massive MIMO performance in a multi-cell network
[29], it is well known that in the more realistic regime of large but finite number of antennas
this effect is typically negligible with respect to the residual multiuser inter-cell
interference [4, 5]. Here, for simplicity of exposition and space limitation, we neglect
these effects and assume that the LZFBF precoder is computed from ideal knowledge of the
channel matrix, such that our rate expressions are exact under this assumption. However, we
hasten to say that our approach is immediately applicable to the case of imperfect channel
state knowledge, by using the appropriate (more involved) feasible rate expressions.
In fact, it is known that the asymptotics kick in very quickly, making the rates in (4.9)
achievable for practical values of $M$ and $|\mathcal{S}_h(t)|$. Notice that the rate expression is
independent of the small-scale fading coefficients. This is because of using a large number
of antennas $M$ at the helpers, which renders a large $M \times |\mathcal{S}_h(t)|$ random channel matrix
$\mathbf{H}(t)$ of i.i.d. complex Gaussian small-scale fading coefficients in every transmission slot
$t$. When each helper performs LZFBF in every transmission slot $t$, the coefficients $\Lambda_u(t)$,
given in (4.6) by the reciprocals of the diagonal elements of the inverse Wishart matrix
$(\mathbf{H}^{\mathsf{H}}(t)\mathbf{H}(t))^{-1}$, "harden" at a deterministic value (4.8) (see [95]) due to the large size of
the matrix $\mathbf{H}(t)$ and the assumption $\frac{|\mathcal{S}_h(t)|}{M} \le 1$. This results in deterministic rate
expressions as in (4.9), which are independent of $\mathbf{H}(t)$ and depend only on the large-scale
path loss coefficients $g_{hu}(t)$. Furthermore, in the case when the helpers are incapable of
MU-MIMO, i.e., when the active user subset size $|\mathcal{S}_h(t)|$ is chosen to be exactly 1, the
above formula still holds by setting $|\mathcal{S}_h(t)| = 1$, and this is referred to as single-user MIMO
(SU-MIMO).
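As a numerical illustration of the rate expression (4.9), the following minimal Python sketch evaluates the rate of one scheduled user. The function name `lzfbf_rate` and its argument names are hypothetical, and the base-2 logarithm is an assumption made here because the rates are stated in bits per channel symbol.

```python
import math


def lzfbf_rate(M, S, P_h, g_hu, interference):
    """Rate c_hu of (4.9) for a user in the active subset, in bits per channel symbol.

    M            -- number of helper antennas
    S            -- size |S_h(t)| of the active user subset, 1 <= S <= M
    P_h          -- transmit power of the serving helper
    g_hu         -- large-scale pathloss coefficient from helper h to user u
    interference -- sum over other helpers h' of P_h' * g_h'u
    """
    if not 1 <= S <= M:
        raise ValueError("subset size S must satisfy 1 <= S <= M")
    sinr = (M - S + 1) / S * P_h * g_hu / (1.0 + interference)
    return math.log2(1.0 + sinr)  # base 2 is an assumption, see the lead-in


# SU-MIMO special case (S = 1) and a MU-MIMO case with the same link parameters.
su_rate = lzfbf_rate(M=4, S=1, P_h=1.0, g_hu=1.5, interference=1.0)
mu_rate = lzfbf_rate(M=4, S=2, P_h=1.0, g_hu=1.5, interference=1.0)
```

The per-user rate falls as more users share the helper, since the factor $(M - S + 1)/S$ decreases with $S$; the scheduler therefore has to trade off the number of active users against the rate each one receives.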
Since helper $h$ can choose an active user subset from the collection of all possible user
subsets of $\mathcal{N}(h)$, the vector $\{\mu_{hu}(t)\}_{u \in \mathcal{N}(h)}$ of bits scheduled by helper $h$ to users $u$ in its
neighborhood $\mathcal{N}(h)$ is constrained to lie in the discrete set of vectors
$$ \{ s\, \mathbf{c}_h(\mathcal{S}_h, t) : \mathcal{S}_h \subseteq \mathcal{N}(h) \} \tag{4.10} $$
where $s$ is the number of channel symbols available in every transmission slot $t$.
We assume that the receiver at every user is advanced in the sense that it can decode
multiple streams in the same transmission slot, i.e., user $u$, in transmission slot $t$, can
receive $\mu_u(t) = \sum_{h \in \mathcal{N}(u)} \mu_{hu}(t)$ video-encoded bits by simultaneously downloading $\mu_{hu}(t)$
bits from helpers $h$ in $\mathcal{N}(u)$. Notice that each stream is achievable (in an
information-theoretic sense) by treating the other streams as Gaussian noise, i.e., we do not make use
of multiuser detection schemes (e.g., based on successive interference cancellation) at the
user receivers. Therefore, our rate expressions are representative of what can be achieved
with today's user device technology.
For the sake of comparison, in the simulation results of Section 4.6 we also consider
a dumb receiver heuristic where each user $u$ decodes only the strongest data stream
and therefore downloads only $\max_{h \in \mathcal{N}(u)} \mu_{hu}(t)$ video-encoded bits. While the dumb
receiver heuristic is a degradation of the optimal solution involving advanced receivers,
the simulation results in Section 4.6 show that this degradation is almost negligible.
This also implicitly indicates that, in most relevant practical topologies and pathloss
scenarios, it is unlikely that the same user is scheduled by more than one helper in the
same transmission slot.
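The difference between the two receiver models is simply a sum versus a maximum over the helpers serving a user in a slot. A minimal sketch, with hypothetical function names and made-up bit counts:

```python
def advanced_receiver_bits(mu_hu):
    """Bits received when the user decodes every stream: sum over helpers of mu_hu(t)."""
    return sum(mu_hu.values())


def strongest_stream_bits(mu_hu):
    """Bits received by the dumb-receiver heuristic: only the largest mu_hu(t)."""
    return max(mu_hu.values(), default=0)


# Bits scheduled to one user by the helpers in its neighborhood during one slot.
one_helper_active = {"h1": 300, "h2": 0}
two_helpers_active = {"h1": 300, "h2": 120}
```

The two functions agree whenever at most one helper schedules the user in a slot, which is the situation the simulation results suggest is typical.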
4.2.4 Network State and Dynamic Scheduling Policy
The small-scale fading channel vectors $\mathbf{h}_{h,u}(t)$ change every transmission slot $t$, while the
channel coefficient $g_{hu}(t)$, which models path loss and shadowing between helper $h$ and
user $u$, is assumed to change slowly in time. However, due to the channel hardening effect
of high-dimensional MIMO channels, we observe from (4.9) that the rates are independent
of the small-scale channel fading coefficients and are dependent only on the large-scale
pathloss coefficients. For a typical network with nomadic users moving at walking speed
or slower, the path loss coefficients change on a timescale of the order of 10 s (i.e., 20
video chunk slots). This time scale is much slower than the coherence time of the small-scale
fading, but it is comparable with the duration of the video chunks. Therefore, variations
of these coefficients, albeit slow, are relevant during a streaming session (e.g., due to user
mobility). Furthermore, the video quality $D_{f_u}(\cdot, t)$ and the chunk size $B_{f_u}(\cdot, t)$ vary from
chunk to chunk due to the variable bit-rate nature of video coding. Therefore, at this
point, as in Chapter 3, we can formally define the network state and a feasible scheduling
policy for our system as follows:
Definition 12 The network state $\omega(t)$ collects all the quantities that evolve independently
of the scheduling decisions in the network. These are, in particular, the slowly varying
channel gains, the video quality levels, and the corresponding bit-rates of the chunk at
transmission slot $t$. Hence, we have
$$ \omega(t) = \{ g_{hu}(t),\ D_{f_u}(\cdot, t),\ B_{f_u}(\cdot, t) : \forall\, (h, u) \in \mathcal{E} \}. \tag{4.11} $$
Definition 13 A scheduling policy $\{a(t)\}_{t=0}^{\infty}$ is a sequence of control actions $a(t)$
comprising the vector $\boldsymbol{\mu}(t)$ with elements $\mu_{hu}(t)$ of transmitted bits, and the quality level
decisions $\{m_u(t) : \forall\, u \in \mathcal{U}\}$.
Definition 14 For any slot $t$, the feasible set of control actions $\mathcal{A}_{\omega(t)}$ includes all control
actions $a(t)$ such that the constraint (4.10) is satisfied.
Definition 15 A feasible scheduling policy for the system at hand is a sequence of control
actions $\{a(t)\}_{t=0}^{\infty}$ such that $a(t) \in \mathcal{A}_{\omega(t)}$ for all slots $t$.
4.3 Problem Formulation and Streaming Policy
When optimizing the users' video QoE we have to take into account that users compete
for the same shared transmission resource (the network wireless spectrum and the helpers'
spatial downlink data streams) and, given the fact that the users are placed in arbitrary
positions with respect to the helpers, their attainable service rates may be quite different.
Hence, some fairness criterion must be enforced. In addition, we need to carefully define
the notion of QoE, since the adaptive nature of the streaming process involves a possibly
time-varying quality level across the streaming sessions.
As already mentioned in Section 1.2.1 and Chapter 3, we remark once again
that, in order to obtain a tractable formulation, we adopt the divide and conquer approach:
1. We first formulate the NUM problem (4.13), where the network utility function is
a concave and component-wise non-decreasing function of the time-averaged users'
requested video quality and the maximization is subject to the stability of all the
request queues in the system.
2. We then solve the NUM problem using the Lyapunov Optimization framework and
obtain the drift-plus-penalty policy, which adapts to arbitrarily changing network
conditions and in fact is optimal (with respect to the NUM problem) under
non-stationary and non-ergodic evolution of the underlying network state process.
3. Since all the request queues in the system are ensured to be stable, the requested
video chunks are eventually delivered. However, in order to ensure that all the
video chunks are delivered within their playback deadline, it suffices for every user
to choose a pre-buffering time which exceeds the largest delay with which a chunk
is delivered. In particular, when the maximum delay of each request queue in the
system admits a deterministic upper bound, setting the pre-buffering time larger
than such a bound makes the playback buffer underrun rate zero. However, for a system
with arbitrary (non-stationary, non-ergodic) evolution of the underlying network
state process (e.g., arbitrary user mobility and arbitrary per-chunk fluctuations
of video coding rate due to VBR coding), such deterministic upper bounds on the
maximum delay may not exist or are too loose to be useful in practice. Hence, in
Section 4.5, we propose a method to locally estimate the delays with which video
chunks are delivered, such that each user can calculate its pre-buffering and
re-buffering times to be larger than the locally estimated maximum delay. Through
simulations in Section 4.6, we demonstrate the effectiveness of the combination of
the drift-plus-penalty policy and the adaptive pre-buffering scheme.
In the rest of this section, we focus on the NUM problem formulation and its solution
through the drift-plus-penalty approach. The goal of the NUM problem for our system
is to maximize a concave network utility function of the individual users' video qualities.
Since these are time-varying quantities, we focus on the time average of such qualities.
In addition, all chunks requested by the users should be delivered. This imposes the
constraint that all request queues at the users must be stable. Throughout this work, we
use the following standard notation for the time average expectation of any quantity $x$:
$$ \overline{x} := \lim_{t \to \infty} \frac{1}{t} \sum_{\tau = 0}^{t-1} \mathbb{E}[x(\tau)]. \tag{4.12} $$
As in Chapter 3, we let $\overline{D}_u := \lim_{t \to \infty} \frac{1}{t} \sum_{\tau=0}^{t-1} \mathbb{E}[D_{f_u}(m_u(\tau), \tau)]$ denote the time average
of the expected quality of user $u$, and $\overline{Q}_u := \lim_{t \to \infty} \frac{1}{t} \sum_{\tau=0}^{t-1} \mathbb{E}[Q_u(\tau)]$ be the time
average of the expected length of the request queue at user $u$, assuming temporarily
that these limits exist (see footnote 3). Let $\phi_u(\cdot)$ be a concave, continuous, and non-decreasing
function defining network utility vs. video quality for user $u \in \mathcal{U}$. The NUM problem that we wish
to solve is given by:
$$
\begin{aligned}
\text{maximize} \quad & \sum_{u \in \mathcal{U}} \phi_u(\overline{D}_u) && \text{(4.13a)} \\
\text{subject to} \quad & \overline{Q}_u < \infty \quad \forall\, u \in \mathcal{U} && \text{(4.13b)} \\
& a(t) \in \mathcal{A}_{\omega(t)} \quad \forall\, t, && \text{(4.13c)}
\end{aligned}
$$
where the requirement of finite $\overline{Q}_u$ corresponds to the strong stability condition for all the
queues [52].
Footnote 3: As in Chapter 3, the existence of these limits is assumed temporarily for ease of
exposition of the optimization problem (4.13) but is not required for the derivation of the
scheduling policy and for the proof of Theorem 4.
As described in Chapter 3, in order to solve problem (4.13) using the stochastic
optimization theory developed in [52], it is convenient to transform it into an equivalent
problem that involves the maximization of a single time average. This transformation is
achieved through the use of auxiliary variables $\gamma_u(t)$ and the corresponding virtual queues
$\Theta_u(t)$ with buffer evolution:
$$ \Theta_u(t+1) = \max\{\Theta_u(t) + \gamma_u(t) - D_{f_u}(m_u(t), t),\ 0\}. \tag{4.14} $$
Consider the transformed problem:
$$
\begin{aligned}
\text{maximize} \quad & \sum_{u \in \mathcal{U}} \overline{\phi_u(\gamma_u)} && \text{(4.15)} \\
\text{subject to} \quad & \overline{Q}_u < \infty \quad \forall\, u \in \mathcal{U} && \text{(4.16)} \\
& \overline{\gamma}_u \le \overline{D}_u \quad \forall\, u \in \mathcal{U} && \text{(4.17)} \\
& D_u^{\min} \le \gamma_u(t) \le D_u^{\max} \quad \forall\, u \in \mathcal{U} && \text{(4.18)} \\
& a(t) \in \mathcal{A}_{\omega(t)} \quad \forall\, t && \text{(4.19)}
\end{aligned}
$$
Notice that constraints (4.17) correspond to stability of the virtual queues $\Theta_u$, since $\overline{\gamma}_u$
and $\overline{D}_u$ are the time-averaged arrival rate and the time-averaged service rate for the
virtual queue given in (4.14). We have:
Lemma 5 Problems (4.13) and (4.15)-(4.19) are equivalent.
Proof The proof is the same as the proof of Lemma 4 in Chapter 3 and is omitted.
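4.3.1 The Drift-Plus-Penalty Expression
Let $\mathbf{Q}(t)$ denote the column vector containing the backlogs of queues $Q_u\ \forall\, u \in \mathcal{U}$, let
$\boldsymbol{\Theta}(t)$ denote the column vector for the virtual queues $\Theta_u\ \forall\, u \in \mathcal{U}$, $\boldsymbol{\gamma}(t)$ denote the
column vector with elements $\gamma_u(t)\ \forall\, u \in \mathcal{U}$, $\mathbf{B}(t)$ denote the column vector with elements
$B_{f_u}(m_u(t), t)\ \forall\, u \in \mathcal{U}$, $\mathbf{D}(t)$ denote the column vector with elements $D_{f_u}(m_u(t), t)\ \forall\, u \in \mathcal{U}$,
and $\boldsymbol{\mu}(t)$ denote the column vector with elements $\mu_u(t)\ \forall\, u \in \mathcal{U}$ as defined in (4.3).
Let $\mathbf{G}(t) = [\mathbf{Q}^{\mathsf{T}}(t), \boldsymbol{\Theta}^{\mathsf{T}}(t)]^{\mathsf{T}}$ be the composite vector of queue backlogs and define the
quadratic Lyapunov function $L(\mathbf{G}(t)) = \frac{1}{2} \mathbf{G}^{\mathsf{T}}(t) \mathbf{G}(t)$. Intuitively, taking actions to push
$L(\mathbf{G}(t))$ down tends to maintain stability of all queues. Define $\Delta(\mathbf{G}(t))$ as the one-slot
drift of the Lyapunov function at slot $t$:
$$ \Delta(\mathbf{G}(t)) := L(\mathbf{G}(t+1)) - L(\mathbf{G}(t)) \tag{4.20} $$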
The drift-plus-penalty algorithm is designed to observe the queues, the current $B_{f_u}(\cdot, t)$,
$D_{f_u}(\cdot, t)$ for all users $u$ and $\omega(t)$ on each slot $t$, and to then choose the quality mode $m_u(t)$
for all users $u$, the matrix of transmitted bits $[\mu_{hu}(t)] \in \mathcal{R}(\omega(t))$ and $\gamma_u(t)$ subject to
$D_u^{\min} \le \gamma_u(t) \le D_u^{\max}$ to minimize a bound on the following drift-plus-penalty expression:
$$ \Delta(\mathbf{G}(t)) - V \sum_{u \in \mathcal{U}} \phi_u(\gamma_u(t)) \tag{4.21} $$
where $V$ is a non-negative weight that affects a performance bound. Intuitively, the
value of $V$ affects the extent to which the control actions on slot $t$ emphasize utility
maximization in comparison to drift minimization.
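Lemma 6 Under any control algorithm, the drift-plus-penalty expression satisfies:
$$ \Delta(\mathbf{G}(t)) - V \sum_{u \in \mathcal{U}} \phi_u(\gamma_u(t)) \le K - V \sum_{u \in \mathcal{U}} \phi_u(\gamma_u(t)) + (\mathbf{B}(t) - \boldsymbol{\mu}(t))^{\mathsf{T}} \mathbf{Q}(t) + (\boldsymbol{\gamma}(t) - \mathbf{D}(t))^{\mathsf{T}} \boldsymbol{\Theta}(t), \tag{4.22} $$
where $K$ is a uniform upper bound on the term
$$ \frac{1}{2} \left[ (\mathbf{B}(t) - \boldsymbol{\mu}(t))^{\mathsf{T}} (\mathbf{B}(t) - \boldsymbol{\mu}(t)) + (\boldsymbol{\gamma}(t) - \mathbf{D}(t))^{\mathsf{T}} (\boldsymbol{\gamma}(t) - \mathbf{D}(t)) \right]. $$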
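Proof Expanding the quadratic Lyapunov function, we have
$$
\begin{aligned}
L(\mathbf{G}(t+1)) - L(\mathbf{G}(t)) ={}& \frac{1}{2} \left[ \mathbf{Q}^{\mathsf{T}}(t+1)\mathbf{Q}(t+1) - \mathbf{Q}^{\mathsf{T}}(t)\mathbf{Q}(t) \right] + \frac{1}{2} \left[ \boldsymbol{\Theta}^{\mathsf{T}}(t+1)\boldsymbol{\Theta}(t+1) - \boldsymbol{\Theta}^{\mathsf{T}}(t)\boldsymbol{\Theta}(t) \right] \\
={}& \frac{1}{2} \left[ (\max\{\mathbf{Q}(t) - \boldsymbol{\mu}(t) + \mathbf{B}(t), 0\})^{\mathsf{T}} (\max\{\mathbf{Q}(t) - \boldsymbol{\mu}(t) + \mathbf{B}(t), 0\}) - \mathbf{Q}^{\mathsf{T}}(t)\mathbf{Q}(t) \right] \\
& + \frac{1}{2} \left[ (\max\{\boldsymbol{\Theta}(t) + \boldsymbol{\gamma}(t) - \mathbf{D}(t), 0\})^{\mathsf{T}} (\max\{\boldsymbol{\Theta}(t) + \boldsymbol{\gamma}(t) - \mathbf{D}(t), 0\}) - \boldsymbol{\Theta}^{\mathsf{T}}(t)\boldsymbol{\Theta}(t) \right],
\end{aligned}
\tag{4.23}
$$
where we have used the queue evolution equations (4.2) and (4.14) and "max" is applied
component-wise.
Using the fact that for any non-negative scalar quantities $\Theta$, $\gamma$ and $D$ we have the
inequalities
$$ (\max\{\Theta + \gamma - D, 0\})^2 \le (\Theta + \gamma - D)^2 = \Theta^2 + (\gamma - D)^2 + 2\Theta(\gamma - D), \tag{4.24} $$
we have
$$
\begin{aligned}
L(\mathbf{G}(t+1)) - L(\mathbf{G}(t)) \le{}& \frac{1}{2} (\mathbf{B}(t) - \boldsymbol{\mu}(t))^{\mathsf{T}} (\mathbf{B}(t) - \boldsymbol{\mu}(t)) + (\mathbf{B}(t) - \boldsymbol{\mu}(t))^{\mathsf{T}} \mathbf{Q}(t) \\
& + \frac{1}{2} (\boldsymbol{\gamma}(t) - \mathbf{D}(t))^{\mathsf{T}} (\boldsymbol{\gamma}(t) - \mathbf{D}(t)) + (\boldsymbol{\gamma}(t) - \mathbf{D}(t))^{\mathsf{T}} \boldsymbol{\Theta}(t).
\end{aligned}
\tag{4.25}
$$
Under the realistic assumption that the chunk sizes, the transmission rates and the
video quality measures are bounded above by some constants, independent of $t$, the term
$$ \frac{1}{2} \left[ (\mathbf{B}(t) - \boldsymbol{\mu}(t))^{\mathsf{T}} (\mathbf{B}(t) - \boldsymbol{\mu}(t)) + (\boldsymbol{\gamma}(t) - \mathbf{D}(t))^{\mathsf{T}} (\boldsymbol{\gamma}(t) - \mathbf{D}(t)) \right] $$
is bounded above by a constant $K$. Using this fact and adding the penalty term
$-V \sum_{u \in \mathcal{U}} \phi_u(\gamma_u(t))$ on both sides of the inequality (4.25) yields the result.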
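The scalar chain (4.24) used in the proof can be checked numerically. The sketch below is only a sanity check on randomly drawn non-negative values, not a proof; the function names, the sampling range and the tolerance are assumptions chosen for illustration.

```python
import random


def scalar_bound(theta, gamma, d):
    """Return the three expressions in (4.24) as (lhs, middle, rhs)."""
    x = theta + gamma - d
    lhs = max(x, 0.0) ** 2
    middle = x ** 2
    rhs = theta ** 2 + (gamma - d) ** 2 + 2.0 * theta * (gamma - d)
    return lhs, middle, rhs


def check_scalar_bound(trials=1000, seed=0, scale=100.0, tol=1e-6):
    """Check lhs <= middle and middle == rhs (up to rounding) on random inputs."""
    rng = random.Random(seed)
    for _ in range(trials):
        theta, gamma, d = (rng.uniform(0.0, scale) for _ in range(3))
        lhs, middle, rhs = scalar_bound(theta, gamma, d)
        if lhs > middle + tol or abs(middle - rhs) > tol:
            return False
    return True
```

The inequality is tight whenever $\Theta + \gamma - D \ge 0$ and strict only when the max operator truncates a negative value to zero, which is why applying it component-wise to (4.23) gives (4.25).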
The drift-plus-penalty (DPP) policy described below acquires information about the
queue states $\mathbf{G}(t)$, the rate-quality profile $(B_{f_u}(\cdot, t), D_{f_u}(\cdot, t))$ for all users $u$ and the
channel state $\omega(t)$ at every slot $t$, and chooses control actions $m_u(t)$, $[\mu_{hu}(t)] \in \mathcal{R}(\omega(t))$
and $\gamma_u(t)$, subject to $D_u^{\min} \le \gamma_u(t) \le D_u^{\max}$, in order to minimize the last three terms on
the right hand side of the inequality (4.22).
The non-constant part in the right hand side of (4.22) can be rewritten as:
$$ \left[ \mathbf{B}^{\mathsf{T}}(t)\mathbf{Q}(t) - \mathbf{D}^{\mathsf{T}}(t)\boldsymbol{\Theta}(t) \right] - \left[ V \sum_{u \in \mathcal{U}} \phi_u(\gamma_u(t)) - \boldsymbol{\gamma}^{\mathsf{T}}(t)\boldsymbol{\Theta}(t) \right] - \boldsymbol{\mu}^{\mathsf{T}}(t)\mathbf{Q}(t). \tag{4.26} $$
The resulting control actions are given by the minimization, at transmission slot $t$, of the
expression in (4.26). Notice that the first term of (4.26) depends only on $m_u(t)\ \forall\, u \in \mathcal{U}$,
the second term of (4.26) depends only on $\boldsymbol{\gamma}(t)$ and the third term of (4.26) depends only
on $\boldsymbol{\mu}(t)$. Thus, the overall minimization decomposes into three separate subproblems,
yielding the layered scheme given below.
4.3.2 The Drift-Plus-Penalty Policy
We address the minimization of (4.26) focusing separately on its (separable) components.
4.3.2.1 Control actions at the user nodes (pull congestion control)
The first term in (4.26) is given by
$$ \sum_{u \in \mathcal{U}} \left\{ Q_u(t) B_{f_u}(m_u(t), t) - \Theta_u(t) D_{f_u}(m_u(t), t) \right\}. \tag{4.27} $$
The minimization variables $m_u(t)$ appear in separate terms of the sum and hence can
be optimized separately over each user $u \in \mathcal{U}$. Thus, each user observes the queues
$Q_u(t)$, $\Theta_u(t)$ and is aware of the rate-quality profile $(B_{f_u}(\cdot, t), D_{f_u}(\cdot, t))$ on slot $t$
(video metadata), so that it can choose the quality level of the requested chunk at every
video chunk slot $i$, i.e., at transmission slots $t \in \{in : i \in \mathbb{Z}\}$, as:
$$ m_u(t) = \arg\min \left\{ Q_u(t) B_{f_u}(m, t) - \Theta_u(t) D_{f_u}(m, t) : m \in \{1, \ldots, N_{f_u}\} \right\}. \tag{4.28} $$
As defined in (4.1), for all transmission slots $t$ which are not integer multiples of $n$, there
is no chunk requested and therefore $B_{f_u}(m_u(t), t)$ and $D_{f_u}(m_u(t), t)$ are equal to 0.
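The quality selection rule (4.28) is a minimization over a handful of quality levels, so it can be evaluated by direct enumeration. The following is a minimal sketch; the function name `select_quality`, the argument names and the example rate-quality profile are hypothetical.

```python
def select_quality(q_u, theta_u, sizes_bits, qualities):
    """Return the 1-based quality level m minimizing Q_u * B(m) - Theta_u * D(m), as in (4.28).

    sizes_bits[m-1] -- chunk size B_fu(m, t) in bits at quality level m
    qualities[m-1]  -- quality measure D_fu(m, t) at quality level m
    Ties are resolved in favor of the lowest quality level (an arbitrary choice).
    """
    costs = [q_u * b - theta_u * d for b, d in zip(sizes_bits, qualities)]
    return min(range(len(costs)), key=costs.__getitem__) + 1


# A made-up rate-quality profile for one chunk with three quality levels.
sizes = [100.0, 200.0, 400.0]
quals = [30.0, 36.0, 40.0]
```

With this profile, a user with an empty request queue picks the highest level, while a long request queue relative to the virtual queue pushes the choice toward the smallest chunk.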
The second term in (4.26), after a change of sign, is given by
$$ \sum_{u \in \mathcal{U}} \left\{ V \phi_u(\gamma_u(t)) - \gamma_u(t) \Theta_u(t) \right\}. \tag{4.29} $$
Again, this is maximized by maximizing separately each term, yielding the simple
one-dimensional maximization (e.g., solvable by line search):
$$ \gamma_u(t) = \arg\max \left\{ V \phi_u(\gamma) - \Theta_u(t)\, \gamma : \gamma \in [D_u^{\min}, D_u^{\max}] \right\}. \tag{4.30} $$
We refer to the policy (4.28) as pull congestion control since each user $u$ selects the
quality level at which this chunk is requested by taking into account the state of its
request queue $Q_u$. It chooses an appropriate video quality level that balances the desire
for high quality (reflected by the term $\Theta_u(t) D_{f_u}(m, t)$ in (4.28)) and the desire for
low request queue lengths (reflected by the term $Q_u(t) B_{f_u}(m, t)$ in (4.28)) and then
opportunistically pulls the chunk at that video quality level from the helpers in its current
vicinity. This policy is reminiscent of the current DASH technology [79], where the client
(user) progressively fetches a video file by downloading successive chunks, and makes
adaptive decisions on the source encoding quality based on its current knowledge of
the congestion of the underlying server-client connection. Notice also that, in order to
compute (4.28) and (4.30), each user needs to know only local information formed by
the locally maintained request queue backlog $Q_u(t)$ and by the locally computed virtual
queue backlog $\Theta_u(t)$.
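The one-dimensional maximization (4.30) and the virtual queue update (4.14) can be sketched in a few lines of Python. The ternary search below is one possible line search for a concave objective, not necessarily the method used in the dissertation; the function names and the logarithmic utility used as the default are assumptions for illustration.

```python
import math


def argmax_concave(f, lo, hi, tol=1e-9):
    """Ternary search for the maximizer of a concave function f on [lo, hi]."""
    while hi - lo > tol:
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if f(m1) < f(m2):
            lo = m1  # the maximizer cannot lie to the left of m1
        else:
            hi = m2  # the maximizer cannot lie to the right of m2
    return 0.5 * (lo + hi)


def select_gamma(V, theta_u, d_min, d_max, phi=math.log):
    """Solve (4.30): maximize V * phi(gamma) - theta_u * gamma over [d_min, d_max].

    phi defaults to the natural logarithm purely as an example of a concave,
    non-decreasing utility; it requires d_min > 0.
    """
    return argmax_concave(lambda g: V * phi(g) - theta_u * g, d_min, d_max)


def update_virtual_queue(theta_u, gamma_u, d_u):
    """Virtual queue recursion (4.14)."""
    return max(theta_u + gamma_u - d_u, 0.0)
```

For the logarithmic example the maximizer has the closed form $V / \Theta_u(t)$ clipped to $[D_u^{\min}, D_u^{\max}]$, which gives a convenient check on the search: a small virtual queue drives $\gamma_u(t)$ to $D_u^{\max}$, and a large one drives it to $D_u^{\min}$.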
4.3.2.2 Control actions at the helper nodes (transmission scheduling)
At transmission slot $t$, the network controller observes the queues $Q_u(t)$ of all users $u$ and the topology state $\omega(t)$, and chooses the feasible instantaneous rate matrix $[\mu_{hu}(t)] \in \mathcal{R}(t)$ to maximize the weighted sum rate of the transmission rates achievable in transmission slot $t$. Namely, the network of helpers must solve the Max-Weighted Sum Rate (MWSR) problem:
$$\text{maximize} \quad \sum_{h \in \mathcal{H}} \sum_{u \in \mathcal{N}(h)} Q_u(t)\, \mu_{hu}(t) \qquad \text{subject to} \quad [\mu_{hu}(t)] \in \mathcal{R}(t) \tag{4.31}$$
where $\mathcal{R}(t)$ is the feasible instantaneous rate region of the network at slot $t$. It is immediate to see that, after a change of sign, the maximization of the third term in (4.26) yields the problem (4.31). We now particularize the problem (4.31) to the specific wireless system with massive MU-MIMO helpers. For the constraint (4.10) specific to the wireless system, the general weighted sum-rate maximization problem (4.31) reduces to:
$$\text{maximize} \quad \sum_{h \in \mathcal{H}} \sum_{u \in \mathcal{N}(h)} Q_u(t)\, \mu_{hu}(t) \qquad \text{subject to} \quad \{\mu_{hu}(t)\}_{u \in \mathcal{N}(h)} \in \{\mathbf{c}_h(\mathcal{S}_h, t) : \mathcal{S}_h \subseteq \mathcal{N}(h)\} \;\; \forall\, h \in \mathcal{H}. \tag{4.32}$$
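This problem decouples into separate maximizations for each helper $h$ given by the following discrete optimization problem:
$$\text{maximize} \quad \sum_{u \in \mathcal{N}(h)} Q_u(t)\, \mu_{hu}(t) \qquad \text{subject to} \quad \{\mu_{hu}(t)\}_{u \in \mathcal{N}(h)} \in \{\mathbf{c}_h(\mathcal{S}_h, t) : \mathcal{S}_h \subseteq \mathcal{N}(h)\}. \tag{4.33}$$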
The above optimization problem at each helper $h$ essentially corresponds to maximizing the weighted sum rate over the discrete set of vectors $\{\mathbf{c}_h(\mathcal{S}_h, t) : \mathcal{S}_h \subseteq \mathcal{N}(h)\}$ with an exponential number $2^{|\mathcal{N}(h)|} - 1$ of choices for the active user subset. However, the key observation from rate expression (4.9) is that when helper $h$ schedules the subset $\mathcal{S}_h$ of users for MU-MIMO beamforming, the rate of each user $u \in \mathcal{S}_h$ depends only on the cardinality $|\mathcal{S}_h|$ but not on the identity of the members of the subset $\mathcal{S}_h$. This implies that for a fixed subset size $S$, the subset $\mathcal{U}_h(S,t)$ of users maximizing the weighted sum rate can be obtained by sorting the users in $\mathcal{N}(h)$ according to the weighted rate
$$Q_u(t) \log\left(1 + \frac{M-S+1}{S} \cdot \frac{P_h\, g_{hu}(t)}{1 + \sum_{h' \neq h} P_{h'}\, g_{h'u}(t)}\right)$$
and choosing greedily the best $S$ users. Thus, we have
$$\mathcal{U}_h(S,t) = \operatorname{argmax}_S \left\{ Q_u(t) \log\left(1 + \frac{M-S+1}{S} \cdot \frac{P_h\, g_{hu}(t)}{1 + \sum_{h' \neq h} P_{h'}\, g_{h'u}(t)}\right) : u \in \mathcal{N}(h) \right\}, \tag{4.34}$$
where $\operatorname{argmax}_S$ denotes the operation of choosing the first $S$ elements of a set of real numbers sorted in decreasing order (here, the users attaining the $S$ largest weighted rates).
This sort-and-greedy selection procedure is repeated for every subset size, yielding all the subsets $\{\mathcal{U}_h(S,t)\}_{S=1}^{|\mathcal{N}(h)|}$. Then, from these subsets, the subset $\mathcal{U}_h(t)$ which has the maximum weighted sum rate is picked as
$$\mathcal{U}_h(t) = \arg\max \left\{ \sum_{u \in \mathcal{U}_h(S,t)} Q_u(t) \log\left(1 + \frac{M-S+1}{S} \cdot \frac{P_h\, g_{hu}(t)}{1 + \sum_{h' \neq h} P_{h'}\, g_{h'u}(t)}\right) : \mathcal{U}_h(S,t) \;\; \forall\, S \right\}, \tag{4.35}$$
yielding the optimal solution to (4.33).
A typical sorting algorithm has complexity $O(|\mathcal{N}(h)| \log |\mathcal{N}(h)|)$ and, since the sorting procedure is repeated for every subset size, the algorithm has complexity $O(|\mathcal{N}(h)|^2 \log |\mathcal{N}(h)|)$, which improves upon existing user scheduling algorithms [96] for the MIMO broadcast channel.
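The per-helper procedure in (4.34) and (4.35) can be sketched as follows. This is an illustrative Python version, not the implementation used for the simulations: the function name, the dictionary inputs, and the pre-computed per-user quantity `snr[u]` (standing for $P_h g_{hu}(t)$ divided by one plus the interference from the other helpers) are assumptions. The natural logarithm is used, which only rescales the objective and does not change the selected subset.

```python
import math


def schedule_helper(Q, snr, M, S_max=None):
    """Sort-and-greedy user selection at one helper, cf. (4.34)-(4.35).

    Q[u]   : request queue backlog of user u in the helper's neighborhood
    snr[u] : P_h * g_hu / (1 + interference), assumed pre-computed
    M      : number of helper antennas
    S_max  : optional cap on the active subset size
    Returns (best_subset, best_weighted_sum_rate).
    """
    users = list(Q)
    limit = len(users) if S_max is None else min(S_max, len(users))
    limit = min(limit, M)  # keeps the factor M - S + 1 positive
    best_subset, best_value = [], 0.0
    for S in range(1, limit + 1):
        gain = (M - S + 1) / S
        # The weighted rate of each user depends on S only through `gain`.
        weighted = {u: Q[u] * math.log(1.0 + gain * snr[u]) for u in users}
        top = sorted(users, key=weighted.get, reverse=True)[:S]
        value = sum(weighted[u] for u in top)
        if value > best_value:
            best_subset, best_value = top, value
    return best_subset, best_value
```

Because the sort is repeated once per subset size, the loop performs at most $|\mathcal{N}(h)|$ sorts, matching the complexity stated above.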
4.4 Policy Performance
As outlined in Section 4.2, VBR video yields time-varying quality and rate functions $D_f(m,t)$ and $B_f(m,t)$, which depend on the individual video file. Furthermore, arbitrary user motion yields slower time variations of the path coefficients $g_{hu}(t)$ at the same time scale of the video streaming session. As a result, any stationarity or ergodicity assumption about the network state process $\omega(t)$ is unlikely to hold in most practically relevant settings. Therefore, we consider the optimality of the DPP policy for an arbitrary sample path of the network state $\omega(t)$. Following in the footsteps of [52, 53], we compare the network utility achieved by our DPP policy with that achieved by an optimal oracle policy with $T$-slot lookahead, i.e., knowledge of the future network states over an interval of length $T$ slots. Time is split into frames of duration $T$ slots and we consider $F$ such frames. For an arbitrary sample path $\omega(t)$, we consider the static optimization problem over the $j$th frame
$$\text{maximize} \quad \sum_{u \in \mathcal{U}} \phi_u\left( \frac{1}{T} \sum_{\tau = jT}^{(j+1)T-1} D_u(\tau) \right) \tag{4.36}$$
$$\text{subject to} \quad \frac{1}{T} \sum_{\tau = jT}^{(j+1)T-1} \left[ B_{f_u}(m_u(\tau),\tau) - \mu_u(\tau) \right] \le 0 \quad \forall\, u \in \mathcal{U} \tag{4.37}$$
$$a(t) \in \mathcal{A}_{\omega(t)} \quad \forall\, t \in \{jT, \ldots, (j+1)T-1\}, \tag{4.38}$$
and denote by $\phi_j^{\mathrm{opt}}$ the maximum of the network utility function for frame $j$, achieved over all policies which have future knowledge of the sample path $\omega(t)$ over the $j$th frame, subject to the constraint (4.37), which ensures that for every queue $Q_u$, the total service provided over the frame is at least as large as the total arrivals in that frame. We have the following result:
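Theorem 4 The DPP scheduling policy achieves per-sample-path network utility
$$\sum_{u \in \mathcal{U}} \phi_u(\overline{D}_u) \ge \lim_{F \to \infty} \frac{1}{F} \sum_{j=0}^{F-1} \phi_j^{\mathrm{opt}} - O\left(\frac{1}{V}\right) \tag{4.39}$$
with bounded queue backlogs satisfying
$$\lim_{F \to \infty} \frac{1}{FT} \sum_{\tau=0}^{FT-1} \left( \sum_{u \in \mathcal{U}} Q_u(\tau) + \sum_{u \in \mathcal{U}} \Theta_u(\tau) \right) \le O(V) \tag{4.40}$$
where $O(1/V)$ indicates a term that vanishes as $1/V$ and $O(V)$ indicates a term that grows linearly with $V$, as the policy control parameter $V$ grows large.
Proof See Appendix C.1.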
An immediate corollary of Theorem 4 is:
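Corollary 3 For the system defined in Section 4.2, when the network state is stationary and ergodic, then
$$\sum_{u \in \mathcal{U}} \phi_u(\overline{D}_u) \ge \phi^{\mathrm{opt}} - O\left(\frac{1}{V}\right), \tag{4.41}$$
where $\phi^{\mathrm{opt}}$ is the optimal value of the NUM problem (4.13) in the stationary ergodic case (see footnote 4), and
$$\sum_{u \in \mathcal{U}} \overline{Q}_u + \sum_{u \in \mathcal{U}} \overline{\Theta}_u \le O(V). \tag{4.42}$$
In particular, if the network state is i.i.d., the bounding term in (4.41) is explicitly given by $O(1/V) = K/V$, and the bounding term in (4.42) is explicitly given by $\frac{K + V(\phi_{\max} - \phi_{\min})}{\epsilon}$, where $\phi_{\min} = \sum_{u \in \mathcal{U}} \phi_u(D_u^{\min})$, $\phi_{\max} = \sum_{u \in \mathcal{U}} \phi_u(D_u^{\max})$, $\epsilon > 0$ is the slack variable corresponding to the constraint (4.37), and the constant $K$ is defined in (4.22).
Proof See Appendix C.1.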
Remark 9 Notice from decisions (4.35) that, unlike conventional cellular systems, we do not assign a fixed set of users to each helper. In contrast, the helper-user association is dynamic, and results from the transmission scheduling decisions (4.34) and (4.35). Notice also that for a practical implementation of these decisions, each helper $h$ first needs to learn locally the request queue lengths $Q_u$ of the users in its neighborhood $\mathcal{N}(h)$. Then, it has to greedily pick the user subset $\mathcal{U}_h(t)$ according to (4.35) and again learn the small-scale fading channel vectors $\mathbf{h}_{h,u}$ of the users in $\mathcal{U}_h(t)$ for LZFBF precoding through some form of channel state feedback from the users. Furthermore, the $\mu_{hu}(t)$ video-encoded bits transmitted by helper $h$ to user $u$ should correspond to the chunks at the head of line of the request queue $Q_u$, encoded at the quality level chosen by user $u$ in a previous video chunk slot based on the pull scheme (4.28). Thus, each user $u$ must also broadcast the metadata (chunk number and quality level) of the chunks at the head of line along with $Q_u$ to the helpers in $\mathcal{N}(u)$.

Footnote 4: Notice that in the stationary and ergodic case the value $\phi^{\mathrm{opt}}$ is generally achieved by an instantaneous policy with perfect knowledge of the state statistics or, equivalently, by a policy with infinite lookahead, since the state statistics can be learned arbitrarily well from any sample path with probability 1, because of ergodicity.
Remark 10 In a practical implementation, one can assume that each video chunk is segmented into different subpackets for wireless transmission and user $u$ downloads the $\sum_{h \in \mathcal{N}(u)} \mu_{hu}(t)$ video-encoded bits in slot $t$ by progressively downloading different subpackets of the chunk from different helpers. However, for a user $u$ to download $\mu_{hu}(t)$ bits from each helper $h$ in $\mathcal{N}(u)$, it needs to keep track of the subpackets that it has downloaded. Nevertheless, if intra-session network coding is employed to encode the subpackets of each chunk, then the user no longer needs to bookkeep at the subpacket level and just needs to download the required number of linear combinations to decode the chunk. Thus, this scenario is ideally suited to an application of distributed storage codes like DRESS [97], which makes the subpacket bookkeeping problem much simpler.
4.5 Pre-buffering and re-buffering chunks
As described in Section 4.2, the playback process consumes chunks at a fixed playback rate $1/T_{\mathrm{top}}$ (one chunk per video chunk slot $i$), while the number of chunks received per video chunk slot is a random variable due to the fact that $\omega(t)$ is a random process and the transmission resources are dynamically allocated by the DPP scheduling policy. In order to prevent stall events, each user $u$ should choose its pre-buffering time $T_u$ to be larger than the maximum delay with which a chunk is delivered to it. However, such maximum delay is neither deterministic nor known a priori. Moreover, even in special cases where the maximum delay of each request queue in the system admits a deterministic bound (e.g., see [54]), such a bound may be loose and setting the pre-buffering time to be larger than that bound might correspond to downloading the whole video file before starting playback. We therefore propose a scheme where each user $u$ estimates its local delays by monitoring its delivery times in a sliding window spanning a fixed number of video chunk slots.
The goal here is to determine the delay $T_u$ after which user $u$ should start playback, with respect to the time at which the first chunk is requested (beginning of the streaming session). We define the size of the playback buffer $\Psi_u(i)$ as the number of chunks available in the buffer at video chunk slot $i$ but not yet played out. Without loss of generality, assume that the streaming session starts at $i = 1$. Then, $\Psi_u(i)$ evolves at the video timescale over video chunk slots $i \in \{0, 1, \ldots\}$ as (see footnote 5):
$$\Psi_u(i) = \max\{\Psi_u(i-1) - 1\{i > T_u\},\, 0\} + a_i, \tag{4.43}$$
where $a_i$ is the number of chunks which are completely downloaded in the transmission slots between $t = (i-1)n$ and $t = in$. Note that the playback buffer is updated every video chunk slot $i$, i.e., at the time scale of seconds. Thus, if the download of a chunk is completed between $t = (i-1)n$ and $t = in$, from the playback buffer's perspective, the chunk is considered to have arrived at the end of the $i$th video chunk slot, i.e., at $t = in$. Let $A_k$ denote the video chunk slot in which chunk $k$ arrives at the user and let $W_k$ denote the delay (measured in video chunk slots) with which chunk $k$ is delivered. Note that the longest period during which $\Psi_u(i)$ is not incremented is given by the maximum delay to deliver chunks. Thus, each user $u$ needs to adaptively estimate $W_k$ in order to choose $T_u$. In the proposed method, at each video chunk slot $i = 1, 2, \ldots$, user $u$ calculates the maximum observed delay $E_i$ in a sliding window of size $\Delta$, by letting:
$$E_i = \max\{W_k : i - \Delta + 1 \le A_k \le i\}. \tag{4.44}$$
Finally, user $u$ starts its playback when $\Psi_u(i)$ crosses the level $\xi E_i$, i.e., $T_u = \min\{i : \Psi_u(i) \ge \xi E_i\}$, where $\xi$ is a tuning parameter. If a stall event occurs at video chunk slot $T$, i.e., $\Psi_u(i) = 0$ for $i > T$, the algorithm enters a re-buffering phase in which the same algorithm presented above is employed again to determine the new instant $T + T_u + 1$ at which playback is restarted. With slight abuse of notation, we have reused $T_u$ to denote the re-buffering delay although it is re-estimated using the sliding window method at each new stall event.

Footnote 5: $1\{K\}$ denotes the indicator function of a condition or event $K$.

[Figure 4.2: Simulation setup (both axes span 0 to 80).]
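The buffer recursion (4.43) and the sliding-window start rule based on (4.44) can be sketched in Python as below. The function names are hypothetical, and the sketch assumes that chunk $k$ is requested in video chunk slot $k$, so that its delay is $W_k = A_k - k$; the text above does not spell out this convention. Slots in which no chunk has arrived within the window are skipped, since $E_i$ is then undefined.

```python
def buffer_trace(a, T_u):
    """Playback buffer evolution, cf. (4.43).

    a[i-1] is the number of chunks fully downloaded during video chunk
    slot i (i = 1, 2, ...). One chunk is consumed per slot once i > T_u.
    """
    psi, trace = 0, []
    for i, a_i in enumerate(a, start=1):
        psi = max(psi - (1 if i > T_u else 0), 0) + a_i
        trace.append(psi)
    return trace


def prebuffer_time(arrivals, xi=3.0, delta=10):
    """Smallest slot i with Psi(i) >= xi * E_i, cf. (4.44).

    arrivals[k-1] = A_k, the slot in which chunk k arrives. Assumes chunk k
    was requested in slot k, so W_k = A_k - k (illustrative convention).
    Returns None if the threshold is never reached.
    """
    psi = 0
    for i in range(1, max(arrivals) + 1):
        psi += sum(1 for a in arrivals if a == i)  # a_i; no playback yet
        window = [a - k for k, a in enumerate(arrivals, start=1)
                  if i - delta + 1 <= a <= i]
        if window and psi >= xi * max(window):
            return i
    return None
```

A larger `xi` delays the start of playback and leaves more chunks in the buffer, trading start-up delay against the risk of stalls.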
4.6 Numerical Experiment
Our simulations are based on a network topology formed by an 80 m × 80 m region with 5 small-cell BSs, as shown in Fig. 4.2 (BSs and users are drawn with distinct markers). The users are generated according to a nonhomogeneous Poisson point process with higher density in a central region of size $\frac{80}{3}$ m × $\frac{80}{3}$ m, as shown in Fig. 4.2.
Each small-cell BS has $M$ antennas and serves user sets of size up to $S$, with transmission power of 35 dBm. The pathloss from a small-cell BS to a user is given by $\frac{1}{1 + (d/40)^{3.5}}$, with $d$ representing the BS-user distance (assuming a torus wrap-around model to avoid boundary effects). We assume a PHY frame duration of 10 ms and a total system bandwidth of 18 MHz as specified in the LTE 4G standard. With one OFDM resource block (7 × 12 channel symbols) spanning 0.5 ms in time and 180 kHz in bandwidth (corresponding to 12 adjacent subcarriers each with 15 kHz bandwidth), each transmission slot spans $s = 84 \times 100 \times 20$ channel symbols.
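As a quick check of these numbers, the following Python sketch recomputes the slot size and evaluates the pathloss model with wrap-around distances. The function names and default arguments are assumptions for illustration; only the constants (80 m region, 40 m reference distance, exponent 3.5, LTE resource-block dimensions) come from the text.

```python
import math


def torus_distance(p, q, side=80.0):
    """Distance between points p and q on a side x side torus (wrap-around)."""
    dx = abs(p[0] - q[0])
    dy = abs(p[1] - q[1])
    dx = min(dx, side - dx)
    dy = min(dy, side - dy)
    return math.hypot(dx, dy)


def pathloss(d, d0=40.0, exponent=3.5):
    """Pathloss gain 1 / (1 + (d/d0)^exponent) at BS-user distance d."""
    return 1.0 / (1.0 + (d / d0) ** exponent)


def symbols_per_slot(frame_ms=10.0, bandwidth_khz=18000.0,
                     rb_ms=0.5, rb_khz=180.0, symbols_per_rb=7 * 12):
    """Channel symbols per transmission slot (one PHY frame)."""
    blocks_in_time = round(frame_ms / rb_ms)        # 20 resource blocks
    blocks_in_freq = round(bandwidth_khz / rb_khz)  # 100 resource blocks
    return symbols_per_rb * blocks_in_freq * blocks_in_time
```

The product 84 × 100 × 20 evaluates to 168,000 channel symbols per transmission slot.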
We assume that all the users request chunks successively from VBR-encoded video sequences. We use the same video files from Chapter 3. Each video file is a long sequence of chunks, each of duration 0.5 seconds and with a frame rate of 30 frames per second. We consider a specific video sequence formed by 800 chunks, constructed using several standard video clips from the database in [92]. The chunks are encoded into different quality modes with the quality index measured using the Structural SIMilarity (SSIM) index defined in [93]. The chunks from 1 to 200 are encoded into 8 quality modes with an average bitrate of 631 kbps. Chunks 201 to 400 are encoded in 4 quality modes at an average bitrate of 3908 kbps. Similarly, chunks 401-600 and 601-800 are encoded into 4 and 8 quality modes with average bitrates of 6679 kbps and 556 kbps respectively. In the simulation, each user starts its streaming session of 1000 chunks from some arbitrary position in this reference video sequence and successively requests 1000 chunks by cycling through the sequence.
We choose the utility function $\phi_u(\cdot) = \log(\cdot)$ $\forall\, u \in \mathcal{U}$ to impose proportional fairness. We set the playback buffer tuning parameter (described in Section 3.3) $\xi = 3$. We simulate our algorithm for the layout shown in Fig. 4.2 (with 501 users generated according to a nonhomogeneous Poisson point process as explained above). At $t = 1$, all the users simultaneously start streaming 1000 chunks.
We first study the performance of our algorithm with $M = 40$ antennas and maximum active user subset size $S = 10$ for different values of the policy control parameter $V$. We plot the results for the values $V$, $2V$, $5V$ and $10V$ with $V = 10^{14}$ in Figs. 4.3a-4.3f. Figs. 4.3a and 4.3b plot the network utility $\sum_u \log(\overline{D}_u)$ and the time average total queue (request + virtual) backlog $\sum_u (\overline{Q}_u + \overline{\Theta}_u)$ for different values of $V$. We can observe the $[O(1/V), O(V)]$ utility-backlog tradeoff as expected from Theorem 4. The large numbers in Fig. 4.3b are a result of initializing all the virtual queues $\Theta_u$ to $0.5 \times 10^{14}$. Since it is only the request queues which are actually responsible for delivering the video chunks, we also plot the time average total request queue length $\sum_u \overline{Q}_u$ in Fig. 4.3c and verify similar $O(V)$ behavior. In addition, we plot the CDF over the user population of two key performance metrics, a) the average video quality and b) the average delay in the reception of video chunks measured in video chunk slots, in Figs. 4.3d and 4.3e respectively. We also show the delay in terms of video playback performance by plotting the CDF over the user population of the percentage of time spent in buffering mode (including pre-buffering and re-buffering periods), assuming the playback buffer tuning parameter $\xi = 3$, in Fig. 4.3f. We observe that both QoE metrics, the average video quality and the percentage of time spent in buffering mode, are satisfactory for the control policy parameter $V = 2 \times 10^{14}$ and choose that value for the rest of the simulations in this section.
We next study the performance loss experienced under the dumb receiver heuristic where the receiver at every user $u$ decodes only the strongest signal and downloads only $\max_{h \in \mathcal{N}(u)} \mu_{hu}(t)$ bits, in contrast to the macro-diversity advanced receiver which can decode multiple signals simultaneously and download all the $\sum_{h \in \mathcal{N}(u)} \mu_{hu}(t)$ bits. Using $M = 40$ and $S = 10$, we simulate our algorithm and plot the CDFs over the user population of a) the average video quality, b) the average delay in the reception of video chunks measured in video chunk slots and c) the percentage of playback time spent in buffering mode in Figs. 4.4a, 4.4b and 4.4c respectively. We observe that the performance loss in using a dumb receiver is fairly negligible and therefore use a dumb receiver for the rest of the simulations in this section.
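The difference between the two receiver models amounts to a maximum versus a sum over the scheduled helper rates, as the following small Python sketch shows. The function name and the dictionary input are assumptions for illustration.

```python
def downloaded_bits(mu_hu, advanced=True):
    """Bits received by one user in a slot.

    mu_hu maps each helper h in the user's neighborhood to the rate mu_hu(t)
    it allocates to this user in the current slot.
    advanced=True  : macro-diversity receiver, sum over all helpers
    advanced=False : dumb receiver, strongest single helper only
    """
    rates = list(mu_hu.values())
    if not rates:
        return 0.0
    return sum(rates) if advanced else max(rates)
```

When a user is scheduled by a single helper in a slot, the two models coincide.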
We now study the QoE improvement when MU-MIMO is deployed at the access points in place of legacy SU-MIMO systems. We plot the CDF over the user population of the same video streaming QoE metrics as in the previous figures for three different cases: 1) MU-MIMO with $M = 40$ antennas and maximum active user subset size $S = 10$; 2) MU-MIMO with $M = 20$ antennas and maximum active user subset size $S = 5$; 3) SU-MIMO with $M = 10$ antennas. From Figs. 4.5a, 4.5b and 4.5c, we can observe that there is a significant improvement of video streaming performance in terms of the average video quality and the average delay (or alternately the percentage of time spent in buffering mode) when MU-MIMO is employed at the PHY layer in comparison to SU-MIMO. This indicates that upgrading current SU-MIMO systems to massive MU-MIMO is a promising approach to meet the increasing demands for HD video streaming.

[Figure 4.3: Performance tradeoffs with policy control parameter V. Panels: (a) network utility versus V; (b) time average total queue (request + virtual) length versus V; (c) time average total request queue length versus V; (d) CDF of average video quality; (e) CDF of average delay, measured in chunk time slots; (f) CDF of percentage of time spent in buffering mode. Panels (d)-(f) show curves for V, 2V, 5V and 10V.]

[Figure 4.4: Performance comparison of advanced and dumb receivers. Panels: (a) CDF of video quality averaged over delivered chunks; (b) CDF of average delay, measured in chunk time slots; (c) CDF of percentage of time spent in buffering mode. Curves: dumb receiver, advanced receiver.]
Finally, we study the benefits of using a cross-layer approach in comparison to a baseline scheme representative of legacy wireless systems. We perform this comparison for the case where every helper employs SU-MIMO with $M = 10$ antennas. For the baseline scheme, every user first fixes its association with the unique helper that provides the maximum received signal strength (RSSI) $P_h g_{hu}$ and then uses the same control decision (4.28) to choose the quality levels for the chunks that arrive into the request queue every video chunk slot. Furthermore, we assume that the helpers locally employ proportional fairness/equal air-time scheduling, i.e., each helper $h$ schedules the users associated with it through the max-RSSI scheme in a round-robin fashion across the transmission slots, independent of the request queue lengths at the users. This baseline scheme is representative of current practical systems where the decisions across different layers are independent and there is no interaction between the upper and lower layers. We plot the CDFs over the user population of the average video quality and the average delay in the reception of chunks in Figs. 4.6a and 4.6b respectively. We can observe that the cross-layer scheme treats the users in a fair manner while the baseline scheme favors some users at the expense of other users in the system.

[Figure 4.5: Video streaming QoE improvement with MU-MIMO over SU-MIMO. Panels: (a) CDF of video quality averaged over delivered chunks; (b) CDF of average delay, measured in chunk time slots; (c) CDF of percentage of time spent in buffering mode. Curves: MU-MIMO with M=40, S=10; MU-MIMO with M=20, S=5; SU-MIMO with M=10, S=1.]

[Figure 4.6: Performance comparison of a cross-layer approach with a baseline scheme. Panels: (a) CDF of video quality averaged over delivered chunks; (b) CDF of average delay, measured in chunk time slots. Curves: baseline, cross layer.]
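The baseline described above can be sketched in a few lines of Python. The function names, the nested-dictionary representation of the gains, and the alphabetical ordering used to fix the round-robin sequence are assumptions made for this illustration; the text specifies only max-RSSI association and round-robin service.

```python
def max_rssi_association(P, g):
    """Associate each user with the helper maximizing RSSI P_h * g_hu.

    P[h]    : transmit power of helper h
    g[h][u] : large-scale gain from helper h to user u
    Returns a dict {user: helper}.
    """
    users = {u for h in g for u in g[h]}
    return {
        u: max((h for h in g if u in g[h]), key=lambda h: P[h] * g[h][u])
        for u in users
    }


def round_robin_user(assoc, helper, t):
    """User served by `helper` in slot t under round-robin scheduling.

    The order is fixed by sorting user names (an arbitrary convention);
    queue lengths play no role, which is the point of the baseline.
    """
    served = sorted(u for u, h in assoc.items() if h == helper)
    return served[t % len(served)] if served else None
```

In contrast to (4.35), neither function looks at the request queues $Q_u$, so a user with a long backlog receives no more air time than a user with an empty queue.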
Appendix A
A.1 Massive MIMO User Rates
While the results and the schemes presented in this paper hold for any network characterized by a set of user instantaneous rates $\{R_{k,j}\}$ and BS spatial multiplexing constraints $\{S_j\}$, it is worthwhile to specifically connect our treatment to massive MIMO performance analysis. The rate formulas presented here can be obtained, albeit at the cost of some effort, by particularizing the results found in several papers (in particular, see [3-5]). For the sake of completeness, we restate some massive MIMO rate analysis results in a unified notation consistent with this paper.
One use of the (complex discrete-time baseband) channel observed at the $k$th user receiving antenna can be represented as
$$y_k = \sum_{j \in \mathcal{J}} \sqrt{g_{k,j}}\, \mathbf{h}_{k,j}^{H} \mathbf{x}_j + z_k, \tag{A.1}$$
where $g_{k,j} \in \mathbb{R}_+$ denotes the large-scale channel power-gain coefficient between user $k$ and BS $j$, comprising distance-dependent pathloss and shadowing, $\mathbf{x}_j \in \mathbb{C}^{M_j}$ is the transmit signal vector of BS $j$, $z_k \sim \mathcal{CN}(0, N_0)$ is the additive Gaussian noise sample at receiver $k$, and $\mathbf{h}_{k,j}$ is the $M_j$-dimensional channel vector, formed by the small-scale fading coefficients. We assume i.i.d. Rayleigh fading, such that $\mathbf{h}_{k,j}$ has i.i.d. elements $\sim \mathcal{CN}(0, 1)$. The transmitted signals are constrained by $\mathbb{E}[\|\mathbf{x}_j\|^2] \le P_j$, where $P_j$ denotes the transmit power of BS $j$. BS $j$ sends $S_j$ downlink data streams in each scheduling resource block.
We let $\rho_j = S_j/M_j$ denote the spatial load (number of downlink streams per BS antenna). With linear multiuser MIMO precoding, each base station serving users $\mathcal{S}_j \subseteq \mathcal{K}$ with $|\mathcal{S}_j| = S_j$ forms its transmitted signal as $\mathbf{x}_j = \sum_{k \in \mathcal{S}_j} \mathbf{v}_{k,j} d_k$, where $\{d_k\}$ are mutually uncorrelated zero-mean data symbols with the same per-symbol average energy for all $k \in \mathcal{S}_j$ (we assume equal power per stream at each BS).
The precoding vectors $\{\mathbf{v}_{k,j} : k \in \mathcal{S}_j\}$ are computed by BS $j$ as a function of its Channel State Information (CSI). We follow the CSI estimation scheme based on TDD with uplink-downlink reciprocity as in [3], adapted to the heterogeneous network at hand. Assuming block-fading, constant over time-frequency coherence blocks of $T$ channel uses, and letting $\max\{S_j\} \le Q \le T$ denote the uplink pilot dimension, the CSI is obtained by letting the active users in each cell send their uplink pilot signals on the first $Q$ symbols of each slot. Then, downlink data transmission takes place in the remaining $T - Q$ symbols. We index the set of mutually orthogonal pilot signals by the set $\mathcal{Q} = \{1, \ldots, Q\}$. Pilot signals are distributed across the BSs such that BS $j$ is given a subset $\mathcal{Q}^{(j)} \subseteq \mathcal{Q}$ of size $|\mathcal{Q}^{(j)}| = S_j$ of mutually orthogonal pilots. We denote by $q(k)$ the pilot index of user $k$. In particular, if $k \in \mathcal{S}_j$, then $q(k) \in \mathcal{Q}^{(j)}$. Also, we let $\mathcal{J}^{(q)} \subseteq \mathcal{J}$ denote the set of BSs which make use of pilot signal $q$, i.e., $\mathcal{J}^{(q)} = \{j \in \mathcal{J} : q \in \mathcal{Q}^{(j)}\}$. The pilot signal allocation, defined equivalently by the ensembles of sets $\{\mathcal{Q}^{(j)} : j \in \mathcal{J}\}$ or $\{\mathcal{J}^{(q)} : q \in \mathcal{Q}\}$, is optimized in some suitable way depending on the topology of the network (see for example [5] for a thorough analysis of optimized pilot reuse schemes). Here, we provide formulas that hold for any pilot allocation of the type considered in [3-5], i.e., where each cell is given a set of mutually orthogonal pilot signals, and sets of different cells may have non-empty intersection (leading to pilot contamination). The specific optimization of the pilot allocation across base stations, in a multi-cell scenario, is well beyond the scope of this paper.
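The uplink signal block received at BS $j$ during the uplink training phase is given by
$$\mathbf{Y}_j^{\mathrm{ul}} = \sum_{\ell \in \mathcal{J}} \sum_{k \in \mathcal{S}_\ell} \sqrt{g_{k,j}}\, \mathbf{h}_{k,j}\, \boldsymbol{\phi}_{q(k)}^{H} + \mathbf{Z}_j^{\mathrm{ul}}, \tag{A.2}$$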
where $\boldsymbol{\phi}_q \in \mathbb{C}^{Q}$ is the $q$th pilot signal and $\mathbf{Z}_j^{\mathrm{ul}} \in \mathbb{C}^{M_j \times Q}$ has i.i.d. elements $\sim \mathcal{CN}(0, N_0)$. We follow the CSI estimation approach given in [3], where BS $j$ obtains the estimate of the downlink channel vector for user $k \in \mathcal{S}_j$, up to a real positive scaling factor and some additive bias terms known as pilot contamination, by projecting $\mathbf{Y}_j^{\mathrm{ul}}$ along the pilot signal vector $\boldsymbol{\phi}_{q(k)}$. Pilot contamination is due to the fact that, since $T$ is limited by the channel coherence time and bandwidth, $Q$ cannot be arbitrarily large. Hence, the $Q$ mutually orthogonal pilot signals must be reused by several BSs. In particular, the CSI estimate for user $k \in \mathcal{S}_j$, given by (see footnote 1)
$$\hat{\mathbf{h}}_{k,j} = \frac{\mathbf{Y}_j^{\mathrm{ul}} \boldsymbol{\phi}_{q(k)}}{\|\boldsymbol{\phi}_{q(k)}\|^2} = \sum_{k' : q(k') = q(k)} \sqrt{g_{k',j}}\, \mathbf{h}_{k',j} + \tilde{\mathbf{z}}_{k,j}^{\mathrm{ul}}, \tag{A.3}$$
contains the linear combination of the channels from all users $k'$ using the same pilot signal $q(k)$ (where $k \in \mathcal{S}_j$ and the other $k' \neq k$ are active in other cells) to BS $j$.
The most popular and simplest multiuser MIMO downlink precoding methods, widely analyzed and also implemented in practice, are conjugate beamforming (CBF) [3] and zero-forcing beamforming (ZFBF) [4,5].
User SINR with CBF: Particularizing the analysis in [5, Th. 1] to the notation and CSI estimation given above, we find that the Signal to Interference plus Noise Ratio (SINR) at user $k$ receiver served by BS $j$ under the system assumptions given above, for large $M_j$ and $\rho_j = S_j/M_j$, is closely approximated (in the sense of Lemma 1) by the deterministic quantity:
SINR
k;j
g
2
k;j
SNR
j
{
j
¸
`PJ
g
k;`
SNR
`
¸
`PJ
pqpkqq
:`j
g
2
k;`
SNR
`
{
`
; (A.4)
where $\mathrm{SNR}_j = P_j/N_0$, and where $\lambda \ge 1$ is a normalization factor, common to all BSs, chosen to ensure that the transmit power constraint is not violated by any BS [3]. In practice, $\lambda$ can be adjusted adaptively by measuring the transmit power at each BS (which is a function of the precoding vectors and, as a consequence, of the estimated channel vectors (A.3)) averaged over a suitably chosen time window. Notice also that in the limit of infinite antennas and finite number of downlink streams, i.e., $\rho_j \to 0$ for all $j$, the choice of $\lambda$ becomes irrelevant, and the SINR converges to the well-known massive MIMO expression $\mathrm{SINR}_{k,j} = \frac{g_{k,j}^2}{\sum_{\ell \in \mathcal{J}^{(q(k))} : \ell \neq j} g_{k,\ell}^2}$, in the symmetric case where $\mathrm{SNR}_j$ and $\rho_j$ are identical for all BSs, as derived in [3].

Footnote 1: We let $\tilde{\mathbf{z}}_{k,j}^{\mathrm{ul}} = \frac{\mathbf{Z}_j^{\mathrm{ul}} \boldsymbol{\phi}_{q(k)}}{\|\boldsymbol{\phi}_{q(k)}\|^2}$ denote the projected noise vector with i.i.d. components $\sim \mathcal{CN}(0, \sigma^2)$, where $\sigma^2 = \frac{N_0}{Q P_u}$ and where $P_u$ denotes the energy per symbol of the uplink pilot signals.
User SINR with ZFBF: Particularizing the analysis in [5, Th. 2], we find that the Signal to Interference plus Noise Ratio (SINR) at user $k$ receiver served by BS $j$ under the system assumptions given above, for large $M_j$ and $\rho_j = S_j/M_j$, is closely approximated (in the sense of Lemma 1) by the deterministic quantity:
SINR
k;j
p1
j
qg
2
k;j
SNR
j
{
j
2
g
k;j
SNR
j
¸
`PJ :`j
g
k;`
SNR
`
¸
`PJ
pqpkqq
:`j
p1
`
qg
2
k;`
SNR
`
{
`
: (A.5)
Comparing (A.5) with (A.4) we notice that the effect of ZFBF consists of decreasing the beamforming gain and the pilot contamination effect by the quantity $(1 - \rho_j)$, due to zero-forcing precoding, and reducing the intra-cell interference from $g_{k,j} \mathrm{SNR}_j$ to $\sigma^2 g_{k,j} \mathrm{SNR}_j$. In fact, in the case of ideal channel estimation (i.e., $\sigma^2 = 0$), the intra-cell interference with ZFBF is exactly zero. Notice also that, in the limit of $\rho_j \to 0$ for all $j$, the SINRs with ZFBF and CBF coincide, confirming the fact that CBF and ZFBF are equivalent in the regime of very large antennas and finite number of users per BS [3].
User instantaneous rates: At this point, assuming Gaussian codebooks and taking into account that the downlink data transmission phase takes place on $T - Q > 0$ dimensions, for each slot of $T$ dimensions, the user rates expressed in bit/dimension are given by
$$R_{k,j} = (1 - Q/T) \log_2(1 + \mathrm{SINR}_{k,j}). \tag{A.6}$$
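The rate expression (A.6) is straightforward to evaluate once an SINR value is available. The Python sketch below takes the SINR as an input rather than computing it from (A.4) or (A.5); the function name and the argument check are assumptions for illustration.

```python
import math


def user_rate(sinr, Q, T):
    """Instantaneous rate in bit/dimension, cf. (A.6).

    sinr : linear-scale SINR of the user at its serving BS
    Q    : uplink pilot dimension (symbols spent on training per slot)
    T    : coherence block length in channel uses, with 0 < Q < T
    """
    if not 0 < Q < T:
        raise ValueError("pilot dimension must satisfy 0 < Q < T")
    return (1.0 - Q / T) * math.log2(1.0 + sinr)
```

For instance, with an SINR of 3, $Q = 10$ and $T = 40$, the pre-log factor is 0.75 and the rate is $0.75 \times \log_2 4 = 1.5$ bit/dimension, which shows how the training overhead $Q/T$ directly scales the achievable rate.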
A.2 Proof of Theorem 1
We represent the network by a bipartite graph $\mathcal{G} = (\mathcal{J}, \mathcal{K}, \mathcal{E})$ where $\mathcal{J}$ is the set of BS nodes, $\mathcal{K}$ is the set of user nodes, and $\mathcal{E} \subseteq \mathcal{J} \times \mathcal{K}$ is the set of edges indicating possible association (for simplicity, here we let $\mathcal{J}_k = \mathcal{J}$ $\forall\, k$). An integer scheduling configuration corresponds to a collection of edges $\mathcal{F} \subseteq \mathcal{E}$, such that each BS $j \in \mathcal{J}$ is incident to at most $S_j$ edges in $\mathcal{F}$, while each user $k$ is incident to at most one edge in $\mathcal{F}$. When $S_j = 1$ $\forall\, j \in \mathcal{J}$, an integer scheduling configuration $\mathcal{F}$ corresponds to a matching in $\mathcal{G}$. For $S_j > 1$, we can think of an integer scheduling configuration $\mathcal{F}$ as a generalized matching. We now associate a point in $\mathbb{R}^{|\mathcal{E}|}$ to every integer scheduling configuration $\mathcal{F}$. For this purpose, given an integer scheduling configuration $\mathcal{F}$, let its incidence vector be $\boldsymbol{\alpha}$ where $\alpha_{k,j} = 1$ if $(k,j) \in \mathcal{F}$ and 0 otherwise. Let $\Omega$ denote the set of incidence vectors where each incidence vector corresponds to an integer scheduling configuration. By time-sharing among such integer scheduling configurations, any feasible association configuration in the convex hull of $\Omega$ can be achieved in the sense of long-term time average. Let
$$P' = \mathrm{coh}(\Omega) \tag{A.7}$$
denote the convex polytope obtained by taking the convex hull of the points in $\Omega$. Also, let $P$ denote the convex polytope corresponding to the set of linear constraints (2.7c)-(2.7e), i.e., containing all the feasible association configurations. The relation between the convex polytopes $P$ and $P'$ is not clear a priori. If one could show that $P = P'$, then any feasible association configuration can be realized by first expressing the vector of user activity fractions as a convex combination of integer scheduling configurations in $\Omega$, and then time-sharing the transmission slots among those configurations with scheduling dictated by the convex combination (see the observation made before Theorem 1 in Section 2.2.2). Hence, proving $P = P'$ implies the proof of Theorem 1. We shall prove this assertion by showing that both the relations $P \subseteq P'$ and $P' \subseteq P$ hold.
Proposition 1 $P' \subseteq P$.
Proof Consider any integer scheduling configuration $\boldsymbol{\alpha} \in \Omega$. It is easy to check that $\boldsymbol{\alpha}$ satisfies the constraints (2.7c)-(2.7e). Thus, $\Omega \subseteq P$ holds and since $P$ is a convex polytope, $P' = \mathrm{coh}(\Omega)$ is also a subset of $P$.
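Proposition 2 $P \subseteq P'$.
We state a series of lemmas which provide a proof of Proposition 2. While we give proofs for certain lemmas, the other lemmas are well known and the reader is referred to the relevant literature in combinatorial optimization (see [98, 99] for example). The goal is to show that the set of extreme points (vertices) of $P$ is included in the set of incidence vectors of integer scheduling configurations $\Omega$. Once this is shown, we have our result since $P = \mathrm{coh}(\mathrm{ext}(P)) \subseteq \mathrm{coh}(\Omega) = P'$, where $\mathrm{ext}(P)$ is the set of the extreme points of $P$. We rewrite $P$, i.e., the constraints (2.7c)-(2.7e), as:
$$P = \left\{ \boldsymbol{\alpha} : \begin{bmatrix} \mathbf{J} \\ \mathbf{K} \\ -\mathbf{I} \end{bmatrix} \boldsymbol{\alpha} \le \begin{bmatrix} \mathbf{s} \\ \mathbf{1} \\ \mathbf{0} \end{bmatrix} \right\} = \{ \boldsymbol{\alpha} : \mathbf{A} \boldsymbol{\alpha} \le \mathbf{b} \}, \tag{A.8}$$
where $\mathbf{A} = \begin{bmatrix} \mathbf{J} \\ \mathbf{K} \\ -\mathbf{I} \end{bmatrix}$ and $\mathbf{b} = \begin{bmatrix} \mathbf{s} \\ \mathbf{1} \\ \mathbf{0} \end{bmatrix}$. Here, $\mathbf{J}$ is a matrix of dimensions $|\mathcal{J}| \times |\mathcal{E}|$ with elements in the binary set $\{0, 1\}$, where columns are indexed by edges in $\mathcal{E}$ and the rows are indexed by the BSs in $\mathcal{J}$. Each column has exactly one 1 corresponding to the BS on which the edge is incident. $\mathbf{s}$ is a $|\mathcal{J}| \times 1$ column vector with elements $S_j$ $\forall\, j \in \mathcal{J}$. Similarly, $\mathbf{K}$ is a matrix of dimensions $|\mathcal{K}| \times |\mathcal{E}|$ with elements in $\{0, 1\}$, where columns are again indexed by the edges in $\mathcal{E}$ and the rows are indexed by the users in $\mathcal{K}$. Again, each column has exactly one 1, corresponding to the user on which the edge is incident. $\mathbf{1}$ is a $|\mathcal{K}| \times 1$ all-1 column vector. $\mathbf{I}$ is the $|\mathcal{E}| \times |\mathcal{E}|$ identity matrix. Note that $\mathbf{G} = \begin{bmatrix} \mathbf{J} \\ \mathbf{K} \end{bmatrix}$ is the incidence matrix of the bipartite graph $\mathcal{G}$, i.e., $\mathbf{G}$ is a matrix of dimensions $(|\mathcal{J}| + |\mathcal{K}|) \times |\mathcal{E}|$ and elements in $\{0, 1\}$, where columns are indexed by edges in $\mathcal{E}$ and each column has exactly two 1's, corresponding to the two vertices of the edge (one vertex in $\mathcal{J}$ and the other in $\mathcal{K}$).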
Definition 16 Extreme Point: A point $v$ in $P$ is said to be an extreme point of $P$ if
it cannot be expressed as a convex combination of points in $P \setminus \{v\}$.
Let $v$ be an extreme point of the convex polytope $P$ given by (A.8). Then, $v$ must satisfy
the following lemmas:
Lemma 7 There are $|\mathcal{E}|$ constraints in $A\alpha \le b$ which are tight at $v$, i.e.,
$a_i^T v = b_i$ for all $i \in \{1, \ldots, |\mathcal{E}|\}$ and, in addition, $a_1, \ldots, a_{|\mathcal{E}|}$
are linearly independent.
Proof Consider $T = \{a_j : a_j^T v = b_j\}$. If $\dim(\mathrm{span}(T)) < |\mathcal{E}|$, then there exists
$d \ne 0$ such that $d$ is orthogonal to $\mathrm{span}(T)$, i.e., for all $a_j \in T$, $a_j^T d = 0$ and
therefore $a_j^T (v \pm \epsilon d) = a_j^T v = b_j$. For all other constraints, $v$ satisfies strict
inequality, i.e., $a_i^T v < b_i$, so there is some sufficiently small $\epsilon > 0$ such that
$a_i^T (v + \epsilon d) \le b_i$ and $a_i^T (v - \epsilon d) \le b_i$. This means that $v + \epsilon d$ and
$v - \epsilon d$ are in $P$, which in turn implies that $v = \frac{1}{2}(v + \epsilon d) + \frac{1}{2}(v - \epsilon d)$ is
expressed as a convex combination of two other feasible points. This contradicts the fact
that $v$ is an extreme point.
Lemma 8 $v$ is the unique solution to the $|\mathcal{E}|$ constraints which are tight from Lemma 7.
Proof The set of $|\mathcal{E}|$ linear equations from Lemma 7 is a rank-$|\mathcal{E}|$ system of linear
equations in $|\mathcal{E}|$ dimensions. Thus, $v$ is the unique solution to the system.
It follows that every extreme point (or vertex) of $P$ is the unique solution to the linear
system obtained from the tightness of $|\mathcal{E}|$ constraints in the set of constraints $A\alpha \le b$.
Definition 17 Totally Unimodular Matrix: A matrix $G$ is said to be totally
unimodular if every square submatrix of $G$ has determinant 0, 1 or $-1$.
Lemma 9 For all bipartite graphs $\mathcal{G}$, the incidence matrix $G$ is totally unimodular.
Lemma 10 If $G$ is totally unimodular, then $[G; -I]$ is totally unimodular.
See [98] for proofs of Lemmas 9 and 10. In particular, the incidence matrix $G = [J; K]$
of the bipartite graph $\mathcal{G}$ is totally unimodular from Lemma 9.
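Lemmas 9 and 10 can be checked numerically on a toy instance. The sketch below is only an illustration under stated assumptions: the graph (2 BSs, 3 users, 4 edges), the helper names `incidence_matrix` and `is_totally_unimodular`, and the sign convention of stacking $-I$ under $G$ are choices made here, not taken from the thesis. The brute-force test enumerates every square submatrix, so it is only practical for very small matrices.

```python
from itertools import combinations

import numpy as np


def incidence_matrix(num_bs, num_users, edges):
    """Build G: one row per BS, then one row per user; one column per edge (j, k)."""
    G = np.zeros((num_bs + num_users, len(edges)), dtype=int)
    for col, (j, k) in enumerate(edges):
        G[j, col] = 1            # BS endpoint of the edge
        G[num_bs + k, col] = 1   # user endpoint of the edge
    return G


def is_totally_unimodular(M):
    """Brute force: every square submatrix must have determinant in {-1, 0, 1}."""
    rows, cols = M.shape
    for size in range(1, min(rows, cols) + 1):
        for r in combinations(range(rows), size):
            for c in combinations(range(cols), size):
                det = int(round(np.linalg.det(M[np.ix_(r, c)])))
                if det not in (-1, 0, 1):
                    return False
    return True


# Hypothetical toy graph: edges are (BS index, user index) pairs.
edges = [(0, 0), (0, 1), (1, 1), (1, 2)]
G = incidence_matrix(2, 3, edges)
A = np.vstack([G, -np.eye(len(edges), dtype=int)])  # G stacked on -I

# Incidence matrix of a triangle (an odd cycle, hence not bipartite).
odd_cycle = np.array([[1, 1, 0], [1, 0, 1], [0, 1, 1]])

tu_G = is_totally_unimodular(G)
tu_A = is_totally_unimodular(A)
tu_odd_cycle = is_totally_unimodular(odd_cycle)
```

The triangle is included as a contrast case: its incidence matrix has determinant $-2$, which shows why bipartiteness matters in Lemma 9.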
Let $v$ be a vertex of $P$. From Lemmas 7 and 8, there exists a rank-$|\mathcal{E}|$ square submatrix
$A'$ of $A$ such that $A' v = b'$ and $v$ is the unique solution to the system.
Lemma 11 $v$ is an integer vector and is in the set of integer scheduling configurations.
Proof $A'$ is a full rank square submatrix of $[G; -I]$ and, since $[G; -I]$ is totally unimodular
from Lemma 10, we have that $\det A' = \pm 1$. Now by Cramer's rule, we have the $i$th
component $v_i$ of $v$ as:
\[
v_i = \frac{\det(A'_i(b'))}{\det(A')} \tag{A.9}
\]
where $A'_i(b')$ is $A'$ with the $i$th column replaced by $b'$. Note that $b$ has all integer
elements, implying that $b'$ is an integer vector. Thus, with $b'$ being an integer vector and
$\det(A') = \pm 1$, we conclude that $v_i$ is an integer. Now, given that $v$ is an integer vector
and it satisfies (2.7c)-(2.7e), the only possible way for which this can happen is that $v$ is
an integer scheduling configuration. This concludes the proof of Proposition 2.
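The argument of Lemmas 7, 8 and 11 can be traced on a small example: pick as many constraints as there are edges, require them to be tight and linearly independent, solve for the unique point, and keep it if it is feasible. The sketch below does this by brute force. The toy graph, the slot budgets in `s`, and the helper name `polytope_vertices` are assumptions made for illustration only.

```python
from itertools import combinations

import numpy as np


def polytope_vertices(A, b, tol=1e-9):
    """Enumerate vertices of {x : A x <= b} by solving n tight constraints at a time."""
    m, n = A.shape
    vertices = []
    for rows in combinations(range(m), n):
        A_sub, b_sub = A[list(rows)], b[list(rows)]
        if abs(np.linalg.det(A_sub)) < tol:
            continue  # chosen tight constraints are not linearly independent
        v = np.linalg.solve(A_sub, b_sub)  # unique solution (Lemma 8)
        feasible = np.all(A @ v <= b + tol)
        if feasible and not any(np.allclose(v, w) for w in vertices):
            vertices.append(v)
    return vertices


# Hypothetical toy graph: edges (BS, user) = (0,0), (0,1), (1,1), (1,2).
J = np.array([[1, 1, 0, 0],
              [0, 0, 1, 1]], dtype=float)   # BS-edge incidence
K = np.array([[1, 0, 0, 0],
              [0, 1, 1, 0],
              [0, 0, 0, 1]], dtype=float)   # user-edge incidence
s = np.array([1.0, 2.0])                    # assumed per-BS budgets S_j

A = np.vstack([J, K, -np.eye(4)])
b = np.concatenate([s, np.ones(3), np.zeros(4)])

verts = polytope_vertices(A, b)
all_integral = all(np.allclose(v, np.round(v)) for v in verts)
```

On this instance every enumerated vertex is expected to be a 0/1 vector, in line with Lemma 11; the check is a sanity test on one example, not a proof.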
A.3 Proof of Theorem 2
The Lagrangian corresponding to (2.28) is
\[
L(\alpha_j, \lambda) = \sum_{k \in \mathcal{K}_j} \frac{(\alpha_{k,j} R_{k,j})^{1-\eta}}{1-\eta}
 - \lambda \Big( \sum_{k \in \mathcal{K}_j} \alpha_{k,j} - S_j \Big) \tag{A.10}
\]
where $\alpha_j = (\alpha_{1,j}, \ldots, \alpha_{K_j,j})$ and $\lambda \ge 0$. Since we assume $\eta \ge 1$, the optimal $\alpha_j$ must
have strictly positive components (see Remark 1). By taking the partial derivative of
(A.10) with respect to $\alpha_{k,j}$, we obtain the necessary and sufficient KKT conditions for
optimality in the form
\[
\lambda \alpha_{k,j}^{\eta} \le R_{k,j}^{1-\eta} \tag{A.11}
\]
where (A.11) must hold with equality for the variables $\alpha_{k,j}$ which are strictly less than 1
at the optimal solution. In addition, (2.28b) must hold with equality since all resources
are exhausted at the optimal solution, i.e.,
\[
\sum_{k \in \mathcal{K}_j} \alpha_{k,j} = S_j \tag{A.12}
\]
Using the ordering (2.29) in (A.11), we obtain an explicit expression of the optimal $\alpha_j$
in terms of the Lagrangian multiplier $\lambda$ as
\[
\alpha_{k,j} =
\begin{cases}
1, & \text{for } 1 \le k \le k^* - 1 \\
\left( \dfrac{R_{k,j}^{1-\eta}}{\lambda} \right)^{1/\eta}, & \text{for } k^* \le k \le K_j
\end{cases} \tag{A.13}
\]
where $k^* \in \{1, \ldots, K_j\}$ is such that $R_{k^*-1,j}^{1-\eta} \ge \lambda > R_{k^*,j}^{1-\eta}$. Substituting (A.13) in (A.12),
we can solve for $\lambda$ and get
\[
\lambda = \left( \frac{\sum_{k=k^*}^{K_j} R_{k,j}^{(1-\eta)/\eta}}{S_j - k^* + 1} \right)^{\eta}. \tag{A.14}
\]
Substituting (A.14) in (A.13), we finally obtain (2.31).
By the sufficiency of the KKT conditions, the value of $k^*$ can be found as follows:
the condition $R_{k^*-1,j}^{1-\eta} \ge \lambda > R_{k^*,j}^{1-\eta}$ with $\lambda$ given by (A.14) is sequentially tested for
tentative values of $k^* = 1, 2, 3, \ldots$ and the search is stopped (and the corresponding $k^*$
is chosen) as soon as this condition is satisfied.
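The sequential search for the threshold index can be sketched in a few lines. This is a hedged illustration, not the thesis's implementation: it assumes the local problem maximizes a sum of terms $(\alpha R)^{1-\eta}/(1-\eta)$ over fractions in $[0, 1]$ that add up to the BS budget, and the names `local_allocation`, `rates`, `num_slots` and `eta` are hypothetical. Users are sorted by the weight $R^{(1-\eta)/\eta}$ in decreasing order, which plays the role of the ordering (2.29); the variable `c` below equals the multiplier of (A.14) raised to the power $1/\eta$.

```python
import numpy as np


def local_allocation(rates, num_slots, eta):
    """Sequentially test thresholds until the KKT-style condition holds."""
    rates = np.asarray(rates, dtype=float)
    n = len(rates)
    if n <= num_slots:
        return np.ones(n)  # enough resources: every user gets fraction 1
    w = rates ** ((1.0 - eta) / eta)   # per-user weight
    order = np.argsort(-w)             # decreasing weight
    w_sorted = w[order]
    for n_full in range(int(num_slots)):          # n_full users get fraction 1
        c = w_sorted[n_full:].sum() / (num_slots - n_full)
        prev_ok = n_full == 0 or w_sorted[n_full - 1] >= c
        if prev_ok and c > w_sorted[n_full]:
            alpha_sorted = np.concatenate(
                [np.ones(n_full), w_sorted[n_full:] / c]
            )
            alpha = np.empty(n)
            alpha[order] = alpha_sorted           # undo the sorting
            return alpha
    raise RuntimeError("no valid threshold found")


# Equal rates with eta = 1 (proportional fairness): equal sharing.
alpha_equal = local_allocation([1.0, 1.0, 1.0, 1.0], num_slots=2, eta=1.0)

# One low-rate user with eta = 2: that user is saturated at fraction 1.
alpha_mixed = local_allocation([0.01, 1.0, 1.0], num_slots=2, eta=2.0)
```

In the second call the search rejects the first tentative threshold and stops at the second one, which mirrors the stopping rule described above.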
Appendix B
B.1 Proof of Lemma 4
Let $\phi_1^{\mathrm{opt}}$ and $\phi_2^{\mathrm{opt}}$ be the optimal solutions of problems (3.7) and (3.20)-(3.24),
respectively. Formally, $\phi_1^{\mathrm{opt}}$ is the supremum objective function value over all algorithms that
satisfy the constraints of problem (3.7). The value $\phi_2^{\mathrm{opt}}$ is defined similarly. Now, fix
$\epsilon > 0$ and let $a^*(t)$ be a policy that satisfies all constraints of the transformed problem
(3.20)-(3.24) and achieves a utility not smaller than $\phi_2^{\mathrm{opt}} - \epsilon$. We have
\[
\phi_2^{\mathrm{opt}} - \epsilon \le \sum_{u \in \mathcal{U}} \overline{\phi_u(\gamma_u)}
\overset{(a)}{\le} \sum_{u \in \mathcal{U}} \phi_u(\overline{\gamma}_u)
\overset{(b)}{\le} \sum_{u \in \mathcal{U}} \phi_u(\overline{D}_u)
\overset{(c)}{\le} \phi_1^{\mathrm{opt}}, \tag{B.1}
\]
where (a) follows from Jensen's inequality applied to the concave function $\phi_u(\cdot)$, (b)
follows by noticing that the policy $a^*(t)$ satisfies the constraint (3.22) and $\phi_u(\cdot)$ is
non-decreasing, and (c) follows from the fact that since $a^*(t)$ is feasible for problem
(3.20)-(3.24), then it also satisfies the constraints of problem (3.7) and therefore it is feasible
for the latter. As this holds for all $\epsilon > 0$, we conclude that $\phi_2^{\mathrm{opt}} \le \phi_1^{\mathrm{opt}}$.
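Step (a) is the only place where concavity is used, and it can be illustrated numerically. The sketch below assumes a logarithmic utility and a randomly generated sequence standing in for the auxiliary variable; both are choices made here for illustration and are not taken from the thesis.

```python
import math
import random

rng = random.Random(0)

# Hypothetical sample path of an auxiliary variable over 1000 slots.
gamma = [rng.uniform(0.5, 4.0) for _ in range(1000)]

phi = math.log  # assumed concave, non-decreasing utility

time_avg_of_utility = sum(phi(g) for g in gamma) / len(gamma)
utility_of_time_avg = phi(sum(gamma) / len(gamma))
```

For any concave utility the first quantity cannot exceed the second, which is exactly the direction needed in (B.1).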
Now, let $a'(t)$ be a policy for the original problem (3.7), achieving a utility not smaller
than $\phi_1^{\mathrm{opt}} - \epsilon$. Since $a'(t)$ is feasible for (3.7), it also satisfies the constraints (3.21), (3.24)
of the transformed problem. Further, we choose $\gamma'(t) = \overline{D}'$ for all time $t$. Such a choice
of $\gamma'(t)$ together with the policy $a'(t)$ forms a feasible policy for problem (3.20)-(3.24).
Therefore:
\[
\phi_1^{\mathrm{opt}} - \epsilon \le \sum_{u \in \mathcal{U}} \phi_u(\overline{D}'_u)
= \sum_{u \in \mathcal{U}} \overline{\phi_u(\gamma'_u)} \le \phi_2^{\mathrm{opt}}. \tag{B.2}
\]
As this holds for all $\epsilon > 0$, we conclude that $\phi_1^{\mathrm{opt}} \le \phi_2^{\mathrm{opt}}$. Thus, (B.1) and (B.2) imply
that $\phi_1^{\mathrm{opt}} = \phi_2^{\mathrm{opt}}$ and, by comparing the constraints, it is immediate to conclude that an
optimal policy for the transformed problem can be directly turned into an optimal policy
for the original problem.
B.2 Proof of Theorem 3 and of Corollary 2
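As in Section 3.2.2, we consider the following problem, equivalent to (3.35)-(3.37), which
involves a sum of time averages instead of functions of time averages and introduces the
auxiliary variables $\gamma_u(\tau)$:
\begin{align}
\text{maximize} \quad & \frac{1}{T} \sum_{\tau=jT}^{(j+1)T-1} \sum_{u \in \mathcal{U}} \phi_u(\gamma_u(\tau)) \tag{B.3} \\
\text{subject to} \quad & \frac{1}{T} \sum_{\tau=jT}^{(j+1)T-1} \left[ k R_{hu}(\tau) - n \mu_{hu}(\tau) \right] \le 0 \quad \forall (h,u) \in \mathcal{E} \tag{B.4} \\
& \frac{1}{T} \sum_{\tau=jT}^{(j+1)T-1} \left[ \gamma_u(\tau) - D_u(\tau) \right] \le 0 \quad \forall u \in \mathcal{U} \tag{B.5} \\
& D_u^{\min} \le \gamma_u(\tau) \le D_u^{\max} \quad \forall u \in \mathcal{U},\ \forall \tau \in \{jT, \ldots, (j+1)T-1\} \tag{B.6} \\
& a(\tau) \in A_{\omega(\tau)} \quad \forall \tau \in \{jT, \ldots, (j+1)T-1\}. \tag{B.7}
\end{align}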
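The update equations for the transmission queues $Q_{hu}$ for all $(h,u) \in \mathcal{E}$ and the virtual queues
$\Theta_u$ for all $u \in \mathcal{U}$ are given in (3.4) and in (3.11), respectively. Let
$\mathbf{G}(\tau) = [\mathbf{Q}^T(\tau), \boldsymbol{\Theta}^T(\tau)]^T$ be
the combined queue backlogs column vector, and define the quadratic Lyapunov function
$L(\mathbf{G}(\tau)) = \frac{1}{2} \mathbf{G}^T(\tau) \mathbf{G}(\tau)$. Fix a particular slot $\tau$ in the $j$th frame. We first consider the
one-slot drift of $L(\mathbf{G}(\tau))$. From (3.28), we know that
\[
L(\mathbf{G}(\tau+1)) - L(\mathbf{G}(\tau)) \le K + (\mathbf{R}(\tau) - \boldsymbol{\mu}(\tau))^T \mathbf{Q}(\tau)
 + (\boldsymbol{\gamma}(\tau) - \mathbf{D}(\tau))^T \boldsymbol{\Theta}(\tau) \tag{B.8}
\]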
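where $K$ is a uniform bound on the term
$\frac{1}{2} \left[ \boldsymbol{\mu}^T(\tau) \boldsymbol{\mu}(\tau) + \mathbf{R}^T(\tau) \mathbf{R}(\tau) \right]
 + \frac{1}{2} (\boldsymbol{\gamma}(\tau) - \mathbf{D}(\tau))^T (\boldsymbol{\gamma}(\tau) - \mathbf{D}(\tau))$,
that exists under the realistic assumption that the source coding rates, the channel coding
rates and the video quality measures are upper bounded by some constants, independent
of $\tau$. We choose $K$ such that
\[
K > 2 \boldsymbol{\kappa}^T \boldsymbol{\kappa} \tag{B.9}
\]
where $\boldsymbol{\kappa}$ is a vector whose components are all equal to the same number $\kappa$ and this
number is a uniform upper bound on the maximum possible magnitude of drift in any
of the queues (both actual and virtual) in one slot. With the additional penalty term
$-V \sum_{u \in \mathcal{U}} \phi_u(\gamma_u(\tau))$ added on both sides of (B.8), we have the following DPP inequality:
\begin{align}
& L(\mathbf{G}(\tau+1)) - L(\mathbf{G}(\tau)) - V \sum_{u \in \mathcal{U}} \phi_u(\gamma_u(\tau)) \notag \\
& \quad \le K + (\mathbf{R}(\tau) - \boldsymbol{\mu}(\tau))^T \mathbf{Q}(\tau)
 + (\boldsymbol{\gamma}(\tau) - \mathbf{D}(\tau))^T \boldsymbol{\Theta}(\tau)
 - V \sum_{u \in \mathcal{U}} \phi_u(\gamma_u(\tau)) \tag{B.10}
\end{align}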
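Let $\{a(\tau)\}_{\tau=jT}^{(j+1)T-1}$ denote the DPP policy which minimizes the right hand side of the
drift-plus-penalty inequality (B.10). Since it minimizes the expression on the RHS of
(B.10), any other policy $\{a^*(\tau)\}_{\tau=jT}^{(j+1)T-1}$ comprising the decisions $\{m_u^*(\tau)\}_{\tau=jT}^{(j+1)T-1}$,
$\{\mathbf{R}^*(\tau)\}_{\tau=jT}^{(j+1)T-1}$, $\{\boldsymbol{\mu}^*(\tau)\}_{\tau=jT}^{(j+1)T-1}$ and $\{\boldsymbol{\gamma}^*(\tau)\}_{\tau=jT}^{(j+1)T-1}$ would give a value of the
expression that is at least as large. We therefore have
\begin{align}
& L(\mathbf{G}(\tau+1)) - L(\mathbf{G}(\tau)) - V \sum_{u \in \mathcal{U}} \phi_u(\gamma_u(\tau)) \notag \\
& \quad \le K + (\mathbf{R}^*(\tau) - \boldsymbol{\mu}^*(\tau))^T \mathbf{Q}(\tau)
 + (\boldsymbol{\gamma}^*(\tau) - \mathbf{D}^*(\tau))^T \boldsymbol{\Theta}(\tau)
 - V \sum_{u \in \mathcal{U}} \phi_u(\gamma_u^*(\tau)). \tag{B.11}
\end{align}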
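Minimizing the right-hand side of the DPP inequality decouples across decisions. The part that depends on the auxiliary variable of one user is the virtual backlog times the variable minus $V$ times the utility, minimized over the interval allowed by (B.6). The sketch below works this out for a logarithmic utility, which is an assumption made here (the thesis only requires a concave, non-decreasing utility); the function names and the numerical values are hypothetical.

```python
import math


def best_gamma(theta, V, d_min, d_max):
    """Minimize theta*g - V*log(g) over d_min <= g <= d_max (requires d_min > 0)."""
    if theta <= 0:
        return d_max               # no backlog pressure: take the largest value
    return min(max(V / theta, d_min), d_max)  # stationary point, clipped


def objective(g, theta, V):
    return theta * g - V * math.log(g)


# Cross-check the closed form against a fine grid search.
grid = [1.0 + 3.0 * i / 3000 for i in range(3001)]   # grid on [1, 4]
checks = []
for theta in (0.0, 2.0, 5.0, 20.0):
    g_closed = best_gamma(theta, V=10.0, d_min=1.0, d_max=4.0)
    g_grid = min(grid, key=lambda g: objective(g, theta, 10.0))
    checks.append(abs(g_closed - g_grid) < 2e-3)
```

A larger virtual backlog pushes the chosen value down, while a larger $V$ pushes it up, which is the mechanism behind the utility and backlog trade-off in the bounds of this appendix.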
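Further, we note that the maximum change in the queue lengths $Q_{hu}(\tau)$ and $\Theta_u(\tau)$
from one slot to the next is bounded by $\kappa$. Thus, we have for all $\tau \in \{jT, \ldots, (j+1)T-1\}$
\begin{align}
|Q_{hu}(\tau) - Q_{hu}(jT)| &\le (\tau - jT)\kappa \quad \forall (h,u) \in \mathcal{E} \tag{B.12} \\
|\Theta_u(\tau) - \Theta_u(jT)| &\le (\tau - jT)\kappa \quad \forall u \in \mathcal{U} \tag{B.13}
\end{align}
Substituting the above inequalities in (B.11), we have
\begin{align}
L(\mathbf{G}(\tau+1)) - L(\mathbf{G}(\tau)) - V \sum_{u \in \mathcal{U}} \phi_u(\gamma_u(\tau))
 \le K &+ (\mathbf{R}^*(\tau) - \boldsymbol{\mu}^*(\tau))^T (\mathbf{Q}(jT) + (\tau - jT)\boldsymbol{\kappa}) \notag \\
 &+ (\boldsymbol{\gamma}^*(\tau) - \mathbf{D}^*(\tau))^T (\boldsymbol{\Theta}(jT) + (\tau - jT)\boldsymbol{\kappa}) \notag \\
 &- V \sum_{u \in \mathcal{U}} \phi_u(\gamma_u^*(\tau)). \tag{B.14}
\end{align}
Then, summing (B.14) over $\tau \in \{jT, \ldots, (j+1)T-1\}$, we obtain the $T$-slot Lyapunov
drift over the $j$th frame:
\begin{align}
& L(\mathbf{G}((j+1)T)) - L(\mathbf{G}(jT)) - V \sum_{\tau=jT}^{jT+T-1} \sum_{u \in \mathcal{U}} \phi_u(\gamma_u(\tau)) \notag \\
& \quad \le KT + \sum_{\tau=jT}^{jT+T-1} (\mathbf{R}^*(\tau) - \boldsymbol{\mu}^*(\tau))^T \mathbf{Q}(jT)
 + \sum_{\tau=jT}^{jT+T-1} (\mathbf{R}^*(\tau) - \boldsymbol{\mu}^*(\tau))^T (\tau - jT) \boldsymbol{\kappa} \notag \\
& \qquad + \sum_{\tau=jT}^{jT+T-1} (\boldsymbol{\gamma}^*(\tau) - \mathbf{D}^*(\tau))^T \boldsymbol{\Theta}(jT)
 + \sum_{\tau=jT}^{jT+T-1} (\boldsymbol{\gamma}^*(\tau) - \mathbf{D}^*(\tau))^T (\tau - jT) \boldsymbol{\kappa} \notag \\
& \qquad - V \sum_{\tau=jT}^{jT+T-1} \sum_{u \in \mathcal{U}} \phi_u(\gamma_u^*(\tau)) \tag{B.15}
\end{align}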
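Using the inequalities $\mathbf{R}^*(\tau) - \boldsymbol{\mu}^*(\tau) \le 2\boldsymbol{\kappa}$ and
$\boldsymbol{\gamma}^*(\tau) - \mathbf{D}^*(\tau) \le 2\boldsymbol{\kappa}$ in (B.15), we have
\begin{align}
& L(\mathbf{G}((j+1)T)) - L(\mathbf{G}(jT)) - V \sum_{\tau=jT}^{jT+T-1} \sum_{u \in \mathcal{U}} \phi_u(\gamma_u(\tau)) \notag \\
& \quad \le KT + \sum_{\tau=jT}^{jT+T-1} (\mathbf{R}^*(\tau) - \boldsymbol{\mu}^*(\tau))^T \mathbf{Q}(jT)
 + 2 \sum_{\tau=jT}^{jT+T-1} (\tau - jT) \boldsymbol{\kappa}^T \boldsymbol{\kappa} \notag \\
& \qquad + \sum_{\tau=jT}^{jT+T-1} (\boldsymbol{\gamma}^*(\tau) - \mathbf{D}^*(\tau))^T \boldsymbol{\Theta}(jT)
 + 2 \sum_{\tau=jT}^{jT+T-1} (\tau - jT) \boldsymbol{\kappa}^T \boldsymbol{\kappa} \notag \\
& \qquad - V \sum_{\tau=jT}^{jT+T-1} \sum_{u \in \mathcal{U}} \phi_u(\gamma_u^*(\tau)) \tag{B.16}
\end{align}
Using $\boldsymbol{\kappa}^T \boldsymbol{\kappa} \le \frac{K}{2}$ and $\sum_{\tau=jT}^{jT+T-1} (\tau - jT) = \frac{T(T-1)}{2}$, we get
\begin{align}
& L(\mathbf{G}((j+1)T)) - L(\mathbf{G}(jT)) - V \sum_{\tau=jT}^{jT+T-1} \sum_{u \in \mathcal{U}} \phi_u(\gamma_u(\tau)) \notag \\
& \quad \le KT + KT(T-1) + \sum_{\tau=jT}^{jT+T-1} (\mathbf{R}^*(\tau) - \boldsymbol{\mu}^*(\tau))^T \mathbf{Q}(jT) \notag \\
& \qquad + \sum_{\tau=jT}^{jT+T-1} (\boldsymbol{\gamma}^*(\tau) - \mathbf{D}^*(\tau))^T \boldsymbol{\Theta}(jT)
 - V \sum_{\tau=jT}^{jT+T-1} \sum_{u \in \mathcal{U}} \phi_u(\gamma_u^*(\tau)) \tag{B.17}
\end{align}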
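The constants in (B.17) come from a short calculation that may be worth spelling out. Writing $\boldsymbol{\kappa}$ for the per-slot drift bound vector of (B.9) (the symbol name is an assumption of this note), each of the two cross terms in (B.16) is bounded as follows, and the two bounds combine with the $KT$ term:

```latex
\begin{align*}
2\sum_{\tau=jT}^{jT+T-1}(\tau-jT)\,\boldsymbol{\kappa}^{T}\boldsymbol{\kappa}
  &= 2\cdot\frac{T(T-1)}{2}\,\boldsymbol{\kappa}^{T}\boldsymbol{\kappa}
   \;\le\; T(T-1)\cdot\frac{K}{2},\\
KT + 2\cdot\frac{KT(T-1)}{2} &= KT + KT(T-1) = KT^{2}.
\end{align*}
```

The right-hand side of (B.17) could therefore equally be written with the single constant $KT^2$, which is the form that appears from (B.20) onward.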
We now consider the policy $\{a^*(\tau)\}_{\tau=jT}^{(j+1)T-1}$ satisfying the following constraints (see footnote 1):
\begin{align}
\frac{1}{T} \sum_{\tau=jT}^{(j+1)T-1} \left[ k R_{hu}^*(\tau) - n \mu_{hu}^*(\tau) \right] &< -\epsilon \quad \forall (h,u) \in \mathcal{E} \tag{B.18} \\
\frac{1}{T} \sum_{\tau=jT}^{(j+1)T-1} \left[ \gamma_u^*(\tau) - D_u^*(\tau) \right] &< -\epsilon \quad \forall u \in \mathcal{U} \tag{B.19}
\end{align}
Footnote 1: It is easy to see that such a policy is guaranteed to exist provided that we allow, without loss of
generality, for a virtual video layer of zero quality and zero rate, and under the assumption that, at any time
$t$, each user $u$ has at least one link $(h,u) \in \mathcal{E}$ with $h \in \mathcal{N}(u) \cap \mathcal{H}(f_u)$ with peak rate $C_{hu}(t)$ lower bounded
by some strictly positive number $C_{\min}$. This prevents the case where a user gets zero rate for a whole
frame of length $T$. This assumption is not restrictive in practice since a user experiencing unacceptably
poor link quality to all the helpers for a long time interval would be disconnected from the network and
its streaming session halted.
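where $\epsilon > 0$ is arbitrary. We plug in the inequalities (B.18), (B.19) in (B.17) and obtain
\begin{align}
& L(\mathbf{G}((j+1)T)) - L(\mathbf{G}(jT)) - V \sum_{\tau=jT}^{jT+T-1} \sum_{u \in \mathcal{U}} \phi_u(\gamma_u(\tau)) \notag \\
& \quad < KT^2 - \epsilon T \sum_{(h,u) \in \mathcal{E}} Q_{hu}(jT) - \epsilon T \sum_{u \in \mathcal{U}} \Theta_u(jT)
 - V \sum_{\tau=jT}^{jT+T-1} \sum_{u \in \mathcal{U}} \phi_u(\gamma_u^*(\tau)) \tag{B.20}
\end{align}
Also, considering the fact that for any vector $\boldsymbol{\gamma} = (\gamma_1, \ldots, \gamma_{|\mathcal{U}|})$ satisfying (B.6) we have
\[
\sum_{u \in \mathcal{U}} \phi_u(D_u^{\min}) = \phi_{\min} \le \sum_{u \in \mathcal{U}} \phi_u(\gamma_u) \le \phi_{\max} = \sum_{u \in \mathcal{U}} \phi_u(D_u^{\max}), \tag{B.21}
\]
we can write:
\begin{align}
L(\mathbf{G}((j+1)T)) - L(\mathbf{G}(jT)) < KT^2 + VT(\phi_{\max} - \phi_{\min})
 - \epsilon T \sum_{(h,u) \in \mathcal{E}} Q_{hu}(jT) - \epsilon T \sum_{u \in \mathcal{U}} \Theta_u(jT) \tag{B.22}
\end{align}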
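Once again using (B.12), (B.13), we have:
\begin{align}
L(\mathbf{G}((j+1)T)) - L(\mathbf{G}(jT)) < {} & KT^2 + VT(\phi_{\max} - \phi_{\min})
 - \epsilon \sum_{\tau=jT}^{jT+T-1} \sum_{(h,u) \in \mathcal{E}} Q_{hu}(\tau) \notag \\
 & - \epsilon \sum_{\tau=jT}^{jT+T-1} \sum_{u \in \mathcal{U}} \Theta_u(\tau)
 + \epsilon \kappa \frac{(|\mathcal{E}| + |\mathcal{U}|) T(T-1)}{2} \tag{B.23}
\end{align}
Summing the above over the frames $j \in \{0, \ldots, F-1\}$ yields
\begin{align}
L(\mathbf{G}(FT)) - L(\mathbf{G}(0)) < {} & KT^2 F + VFT(\phi_{\max} - \phi_{\min})
 - \epsilon \sum_{\tau=0}^{FT-1} \sum_{(h,u) \in \mathcal{E}} Q_{hu}(\tau) \notag \\
 & - \epsilon \sum_{\tau=0}^{FT-1} \sum_{u \in \mathcal{U}} \Theta_u(\tau)
 + \epsilon \kappa \frac{(|\mathcal{E}| + |\mathcal{U}|) FT(T-1)}{2} \tag{B.24}
\end{align}
Rearranging and neglecting appropriate terms, we get
\begin{align}
\frac{1}{FT} \sum_{\tau=0}^{FT-1} \sum_{(h,u) \in \mathcal{E}} Q_{hu}(\tau)
 + \frac{1}{FT} \sum_{\tau=0}^{FT-1} \sum_{u \in \mathcal{U}} \Theta_u(\tau)
 < \frac{KT + V(\phi_{\max} - \phi_{\min})}{\epsilon} + \frac{L(\mathbf{G}(0))}{\epsilon FT}
 + \kappa \frac{(|\mathcal{E}| + |\mathcal{U}|)(T-1)}{2} \tag{B.25}
\end{align}
Taking limits as $F \to \infty$
\begin{align}
\lim_{F \to \infty} \frac{1}{FT} \sum_{\tau=0}^{FT-1} \left( \sum_{(h,u) \in \mathcal{E}} Q_{hu}(\tau) + \sum_{u \in \mathcal{U}} \Theta_u(\tau) \right)
 \le \frac{KT + V(\phi_{\max} - \phi_{\min})}{\epsilon} + \kappa \frac{(|\mathcal{E}| + |\mathcal{U}|)(T-1)}{2} \tag{B.26}
\end{align}
such that (3.39) is proved.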
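We now consider the policy $\{a^*(\tau)\}_{\tau=jT}^{(j+1)T-1}$ which achieves the optimal solution $\phi_j^{\mathrm{opt}}$
to the problem (B.3)-(B.7). Using (B.4) and (B.5) in (B.17), we have
\begin{align}
L(\mathbf{G}((j+1)T)) - L(\mathbf{G}(jT)) - V \sum_{\tau=jT}^{jT+T-1} \sum_{u \in \mathcal{U}} \phi_u(\gamma_u(\tau))
 \le KT + KT(T-1) - VT \phi_j^{\mathrm{opt}} \tag{B.27}
\end{align}
Summing this over $j \in \{0, \ldots, F-1\}$ yields
\begin{align}
L(\mathbf{G}(FT)) - L(\mathbf{G}(0)) - V \sum_{\tau=0}^{FT-1} \sum_{u \in \mathcal{U}} \phi_u(\gamma_u(\tau))
 \le KT^2 F - VT \sum_{j=0}^{F-1} \phi_j^{\mathrm{opt}}. \tag{B.28}
\end{align}
Dividing both sides by $VFT$ and using the fact that $L(\mathbf{G}(FT)) \ge 0$, we get
\begin{align}
\frac{1}{FT} \sum_{\tau=0}^{FT-1} \sum_{u \in \mathcal{U}} \phi_u(\gamma_u(\tau))
 \ge \frac{1}{F} \sum_{j=0}^{F-1} \phi_j^{\mathrm{opt}} - \frac{KT}{V} - \frac{L(\mathbf{G}(0))}{VTF}. \tag{B.29}
\end{align}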
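At this point, using Jensen's inequality, the fact that $\phi_u(\cdot)$ is continuous and
non-decreasing for all $u \in \mathcal{U}$, and the fact that the strong stability of the queues (B.26) implies
that $\lim_{F \to \infty} \frac{1}{FT} \sum_{\tau=0}^{FT-1} \Theta_u(\tau) < \infty$ for all $u \in \mathcal{U}$, which in turn implies that
$\overline{\gamma}_u \le \overline{D}_u$ for all $u \in \mathcal{U}$, we arrive at
\[
\sum_{u \in \mathcal{U}} \phi_u(\overline{D}_u) \ge \lim_{F \to \infty} \frac{1}{F} \sum_{j=0}^{F-1} \phi_j^{\mathrm{opt}} - \frac{KT}{V}, \tag{B.30}
\]
such that (3.38) is proved.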
Thus, the utility under the DPP policy is within $O(1/V)$ of the time average of the
$\phi_j^{\mathrm{opt}}$ utility values, which can be achieved only with knowledge of the future states up to a
lookahead of one frame of $T$ slots. If $T$ is increased, then the value of $\phi_j^{\mathrm{opt}}$ for every frame
$j$ improves since we allow a larger lookahead. However, from (B.30), we can see that if
$T$ is increased, then $V$ must be increased proportionally in order to maintain the same
distance $KT/V$ from optimality. This yields a corresponding $O(V)$ increase in the queue backlogs.
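The trade-off between the utility gap and the backlog can be observed on a toy single-queue system. The sketch below is not the video streaming system of this appendix: it assumes one queue, an i.i.d. service process taking values 0 or 2 with equal probability, a utility $\log(1 + r)$ of the admitted rate $r$, and frame length $T = 1$. All names (`simulate`, `r_max`) and parameter values are hypothetical.

```python
import math
import random


def simulate(V, num_slots=20000, r_max=3.0, seed=0):
    """Drift-plus-penalty admission control on a single queue."""
    rng = random.Random(seed)
    q = 0.0
    total_utility = 0.0
    total_backlog = 0.0
    for _ in range(num_slots):
        # Maximize V*log(1 + r) - q*r over 0 <= r <= r_max.
        if q <= 0:
            r = r_max
        else:
            r = min(max(V / q - 1.0, 0.0), r_max)
        mu = rng.choice((0.0, 2.0))          # random service, mean 1
        total_utility += math.log(1.0 + r)
        total_backlog += q
        q = max(q - mu, 0.0) + r             # queue update
    return total_utility / num_slots, total_backlog / num_slots


# Average (utility, backlog) for a few values of the control parameter V.
results = {V: simulate(V) for V in (1, 10, 100)}
```

In this toy model no stable policy can exceed an average utility of $\log 2$, since the mean service rate is 1 and the utility is concave. One would expect the average utility to move toward that value and the average backlog to grow roughly linearly as $V$ increases.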
For the case where the network state $\omega(t)$ is stationary and ergodic, the time averages
on the left-hand side of (B.26) and on the right-hand side of (B.30) become ensemble
averages because of ergodicity. Thus, we obtain (3.40) and (3.41). Furthermore, if the
network state is i.i.d., we can take $T = 1$ in the above derivation, obtaining the bounds
given in Corollary 2.
Appendix C
C.1 Proof of Theorem 4 and of Corollary 3
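As in Section 4.3, we consider the following problem, equivalent to (4.36)-(4.38), which
involves a sum of time averages instead of functions of time averages and introduces the
auxiliary variables $\gamma_u(\tau)$:
\begin{align}
\text{maximize} \quad & \frac{1}{T} \sum_{\tau=jT}^{(j+1)T-1} \sum_{u \in \mathcal{U}} \phi_u(\gamma_u(\tau)) \tag{C.1} \\
\text{subject to} \quad & \frac{1}{T} \sum_{\tau=jT}^{(j+1)T-1} \left[ B_{f_u}(m_u(\tau), \tau) - \mu_u(\tau) \right] \le 0 \quad \forall u \in \mathcal{U} \tag{C.2} \\
& \frac{1}{T} \sum_{\tau=jT}^{(j+1)T-1} \left[ \gamma_u(\tau) - D_u(\tau) \right] \le 0 \quad \forall u \in \mathcal{U} \tag{C.3} \\
& D_u^{\min} \le \gamma_u(\tau) \le D_u^{\max} \quad \forall u \in \mathcal{U},\ \forall \tau \in \{jT, \ldots, (j+1)T-1\} \tag{C.4} \\
& a(\tau) \in A_{\omega(\tau)} \quad \forall \tau \in \{jT, \ldots, (j+1)T-1\}. \tag{C.5}
\end{align}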
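The update equations for the request queues $Q_u$ for all $u \in \mathcal{U}$ and the virtual queues $\Theta_u$ for all
$u \in \mathcal{U}$ are given in (4.2) and in (4.14), respectively. Let
$\mathbf{G}(\tau) = [\mathbf{Q}^T(\tau), \boldsymbol{\Theta}^T(\tau)]^T$ be the
combined queue backlogs column vector, and define the quadratic Lyapunov function
$L(\mathbf{G}(\tau)) = \frac{1}{2} \mathbf{G}^T(\tau) \mathbf{G}(\tau)$. Fix a particular slot $\tau$ in the $j$th frame. We first consider the
one-slot drift of $L(\mathbf{G}(\tau))$. From (4.25), we know that
\[
L(\mathbf{G}(\tau+1)) - L(\mathbf{G}(\tau)) \le K + (\mathbf{B}(\tau) - \boldsymbol{\mu}(\tau))^T \mathbf{Q}(\tau)
 + (\boldsymbol{\gamma}(\tau) - \mathbf{D}(\tau))^T \boldsymbol{\Theta}(\tau) \tag{C.6}
\]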
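where $K$ is a uniform bound on the term
\[
\frac{1}{2} \left[ (\mathbf{B}(\tau) - \boldsymbol{\mu}(\tau))^T (\mathbf{B}(\tau) - \boldsymbol{\mu}(\tau))
 + (\boldsymbol{\gamma}(\tau) - \mathbf{D}(\tau))^T (\boldsymbol{\gamma}(\tau) - \mathbf{D}(\tau)) \right],
\]
which exists under the realistic assumption that the chunk sizes, the transmission rates
and the video quality measures are upper bounded by some constants, independent of $\tau$.
We choose $K$ such that
\[
K > 2 \boldsymbol{\kappa}^T \boldsymbol{\kappa} \tag{C.7}
\]
where $\boldsymbol{\kappa}$ is a vector whose components are all equal to the same number $\kappa$ and this
number is a uniform upper bound on the maximum possible magnitude of drift in any
of the queues (both actual and virtual) in one slot. With the additional penalty term
$-V \sum_{u \in \mathcal{U}} \phi_u(\gamma_u(\tau))$ added on both sides of (C.6), we have the following DPP inequality:
\begin{align}
& L(\mathbf{G}(\tau+1)) - L(\mathbf{G}(\tau)) - V \sum_{u \in \mathcal{U}} \phi_u(\gamma_u(\tau)) \notag \\
& \quad \le K + (\mathbf{B}(\tau) - \boldsymbol{\mu}(\tau))^T \mathbf{Q}(\tau)
 + (\boldsymbol{\gamma}(\tau) - \mathbf{D}(\tau))^T \boldsymbol{\Theta}(\tau)
 - V \sum_{u \in \mathcal{U}} \phi_u(\gamma_u(\tau)) \tag{C.8}
\end{align}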
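Let $\{a(\tau)\}_{\tau=jT}^{(j+1)T-1}$ denote the DPP policy which minimizes the right hand side of the
drift-plus-penalty inequality (C.8). Since it minimizes the expression on the RHS of
(C.8), any other policy $\{a^*(\tau)\}_{\tau=jT}^{(j+1)T-1}$ comprising the decisions $\{m_u^*(\tau)\}_{\tau=jT}^{(j+1)T-1}$,
$\{\mathbf{B}^*(\tau)\}_{\tau=jT}^{(j+1)T-1}$, $\{\boldsymbol{\mu}^*(\tau)\}_{\tau=jT}^{(j+1)T-1}$ and $\{\boldsymbol{\gamma}^*(\tau)\}_{\tau=jT}^{(j+1)T-1}$ would give a value of the
expression that is at least as large. We therefore have
\begin{align}
& L(\mathbf{G}(\tau+1)) - L(\mathbf{G}(\tau)) - V \sum_{u \in \mathcal{U}} \phi_u(\gamma_u(\tau)) \notag \\
& \quad \le K + (\mathbf{B}^*(\tau) - \boldsymbol{\mu}^*(\tau))^T \mathbf{Q}(\tau)
 + (\boldsymbol{\gamma}^*(\tau) - \mathbf{D}^*(\tau))^T \boldsymbol{\Theta}(\tau)
 - V \sum_{u \in \mathcal{U}} \phi_u(\gamma_u^*(\tau)). \tag{C.9}
\end{align}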
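Further, we note that the maximum change in the queue lengths $Q_u(\tau)$ and $\Theta_u(\tau)$
from one slot to the next is bounded by $\kappa$. Thus, we have for all $\tau \in \{jT, \ldots, (j+1)T-1\}$
\begin{align}
|Q_u(\tau) - Q_u(jT)| &\le (\tau - jT)\kappa \quad \forall u \in \mathcal{U} \tag{C.10} \\
|\Theta_u(\tau) - \Theta_u(jT)| &\le (\tau - jT)\kappa \quad \forall u \in \mathcal{U} \tag{C.11}
\end{align}
Substituting the above inequalities in (C.9), we have
\begin{align}
L(\mathbf{G}(\tau+1)) - L(\mathbf{G}(\tau)) - V \sum_{u \in \mathcal{U}} \phi_u(\gamma_u(\tau))
 \le K &+ (\mathbf{B}^*(\tau) - \boldsymbol{\mu}^*(\tau))^T (\mathbf{Q}(jT) + (\tau - jT)\boldsymbol{\kappa}) \notag \\
 &+ (\boldsymbol{\gamma}^*(\tau) - \mathbf{D}^*(\tau))^T (\boldsymbol{\Theta}(jT) + (\tau - jT)\boldsymbol{\kappa}) \notag \\
 &- V \sum_{u \in \mathcal{U}} \phi_u(\gamma_u^*(\tau)). \tag{C.12}
\end{align}
Then, summing (C.12) over $\tau \in \{jT, \ldots, (j+1)T-1\}$, we obtain the $T$-slot Lyapunov
drift over the $j$th frame:
\begin{align}
& L(\mathbf{G}((j+1)T)) - L(\mathbf{G}(jT)) - V \sum_{\tau=jT}^{jT+T-1} \sum_{u \in \mathcal{U}} \phi_u(\gamma_u(\tau)) \notag \\
& \quad \le KT + \sum_{\tau=jT}^{jT+T-1} (\mathbf{B}^*(\tau) - \boldsymbol{\mu}^*(\tau))^T \mathbf{Q}(jT)
 + \sum_{\tau=jT}^{jT+T-1} (\mathbf{B}^*(\tau) - \boldsymbol{\mu}^*(\tau))^T (\tau - jT) \boldsymbol{\kappa} \notag \\
& \qquad + \sum_{\tau=jT}^{jT+T-1} (\boldsymbol{\gamma}^*(\tau) - \mathbf{D}^*(\tau))^T \boldsymbol{\Theta}(jT)
 + \sum_{\tau=jT}^{jT+T-1} (\boldsymbol{\gamma}^*(\tau) - \mathbf{D}^*(\tau))^T (\tau - jT) \boldsymbol{\kappa} \notag \\
& \qquad - V \sum_{\tau=jT}^{jT+T-1} \sum_{u \in \mathcal{U}} \phi_u(\gamma_u^*(\tau)) \tag{C.13}
\end{align}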
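Using the inequalities $\mathbf{B}^*(\tau) - \boldsymbol{\mu}^*(\tau) \le 2\boldsymbol{\kappa}$ and
$\boldsymbol{\gamma}^*(\tau) - \mathbf{D}^*(\tau) \le 2\boldsymbol{\kappa}$ in (C.13), we have
\begin{align}
& L(\mathbf{G}((j+1)T)) - L(\mathbf{G}(jT)) - V \sum_{\tau=jT}^{jT+T-1} \sum_{u \in \mathcal{U}} \phi_u(\gamma_u(\tau)) \notag \\
& \quad \le KT + \sum_{\tau=jT}^{jT+T-1} (\mathbf{B}^*(\tau) - \boldsymbol{\mu}^*(\tau))^T \mathbf{Q}(jT)
 + 2 \sum_{\tau=jT}^{jT+T-1} (\tau - jT) \boldsymbol{\kappa}^T \boldsymbol{\kappa} \notag \\
& \qquad + \sum_{\tau=jT}^{jT+T-1} (\boldsymbol{\gamma}^*(\tau) - \mathbf{D}^*(\tau))^T \boldsymbol{\Theta}(jT)
 + 2 \sum_{\tau=jT}^{jT+T-1} (\tau - jT) \boldsymbol{\kappa}^T \boldsymbol{\kappa} \notag \\
& \qquad - V \sum_{\tau=jT}^{jT+T-1} \sum_{u \in \mathcal{U}} \phi_u(\gamma_u^*(\tau)) \tag{C.14}
\end{align}
Using $\boldsymbol{\kappa}^T \boldsymbol{\kappa} \le \frac{K}{2}$ and $\sum_{\tau=jT}^{jT+T-1} (\tau - jT) = \frac{T(T-1)}{2}$, we get
\begin{align}
& L(\mathbf{G}((j+1)T)) - L(\mathbf{G}(jT)) - V \sum_{\tau=jT}^{jT+T-1} \sum_{u \in \mathcal{U}} \phi_u(\gamma_u(\tau)) \notag \\
& \quad \le KT + KT(T-1) + \sum_{\tau=jT}^{jT+T-1} (\mathbf{B}^*(\tau) - \boldsymbol{\mu}^*(\tau))^T \mathbf{Q}(jT) \notag \\
& \qquad + \sum_{\tau=jT}^{jT+T-1} (\boldsymbol{\gamma}^*(\tau) - \mathbf{D}^*(\tau))^T \boldsymbol{\Theta}(jT)
 - V \sum_{\tau=jT}^{jT+T-1} \sum_{u \in \mathcal{U}} \phi_u(\gamma_u^*(\tau)) \tag{C.15}
\end{align}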
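We now consider the policy $\{a^*(\tau)\}_{\tau=jT}^{(j+1)T-1}$ satisfying the following constraints (see footnote 1):
\begin{align}
\frac{1}{T} \sum_{\tau=jT}^{(j+1)T-1} \left[ B_{f_u}(m_u^*(\tau), \tau) - \mu_u^*(\tau) \right] &< -\epsilon \quad \forall u \in \mathcal{U} \tag{C.16} \\
\frac{1}{T} \sum_{\tau=jT}^{(j+1)T-1} \left[ \gamma_u^*(\tau) - D_u^*(\tau) \right] &< -\epsilon \quad \forall u \in \mathcal{U} \tag{C.17}
\end{align}
Footnote 1: It is easy to see that such a policy is guaranteed to exist provided that we allow, without loss of
generality, for a virtual video layer of zero quality and zero rate, and under the assumption that, at any slot
$t$, each user $u$ has at least one link $(h,u) \in \mathcal{E}$ with $h \in \mathcal{N}(u) \cap \mathcal{H}(f_u)$ with peak rate $C_{hu}(t)$ lower bounded
by some strictly positive number $C_{\min}$. This prevents the case where a user gets zero rate for a whole
frame of length $T$. This assumption is not restrictive in practice since a user experiencing unacceptably
poor link quality to all the helpers for a long time interval would be disconnected from the network and
its streaming session halted.
Here $\epsilon > 0$ is arbitrary. We plug in the inequalities (C.16), (C.17) in (C.15) and obtain
\begin{align}
& L(\mathbf{G}((j+1)T)) - L(\mathbf{G}(jT)) - V \sum_{\tau=jT}^{jT+T-1} \sum_{u \in \mathcal{U}} \phi_u(\gamma_u(\tau)) \notag \\
& \quad < KT^2 - \epsilon T \sum_{u \in \mathcal{U}} Q_u(jT) - \epsilon T \sum_{u \in \mathcal{U}} \Theta_u(jT)
 - V \sum_{\tau=jT}^{jT+T-1} \sum_{u \in \mathcal{U}} \phi_u(\gamma_u^*(\tau)) \tag{C.18}
\end{align}
Also, considering the fact that for any vector $\boldsymbol{\gamma} = (\gamma_1, \ldots, \gamma_{|\mathcal{U}|})$ satisfying (C.4) we have
\[
\sum_{u \in \mathcal{U}} \phi_u(D_u^{\min}) = \phi_{\min} \le \sum_{u \in \mathcal{U}} \phi_u(\gamma_u) \le \phi_{\max} = \sum_{u \in \mathcal{U}} \phi_u(D_u^{\max}), \tag{C.19}
\]
we can write:
\begin{align}
L(\mathbf{G}((j+1)T)) - L(\mathbf{G}(jT)) < KT^2 + VT(\phi_{\max} - \phi_{\min})
 - \epsilon T \sum_{u \in \mathcal{U}} Q_u(jT) - \epsilon T \sum_{u \in \mathcal{U}} \Theta_u(jT) \tag{C.20}
\end{align}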
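Once again using (C.10), (C.11), we have:
\begin{align}
L(\mathbf{G}((j+1)T)) - L(\mathbf{G}(jT)) < {} & KT^2 + VT(\phi_{\max} - \phi_{\min})
 - \epsilon \sum_{\tau=jT}^{jT+T-1} \sum_{u \in \mathcal{U}} Q_u(\tau) \notag \\
 & - \epsilon \sum_{\tau=jT}^{jT+T-1} \sum_{u \in \mathcal{U}} \Theta_u(\tau)
 + \epsilon \kappa |\mathcal{U}| T(T-1) \tag{C.21}
\end{align}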
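The step from (C.20) to (C.21) uses (C.10) and (C.11) in the direction that lower-bounds the backlog at the start of the frame. A sketch of the intermediate algebra follows, with $\kappa$ denoting the per-slot drift bound and $\epsilon$ the slack in (C.16)-(C.17) (both symbol names are assumptions of this note):

```latex
\begin{align*}
Q_u(jT) &\ge Q_u(\tau) - (\tau - jT)\,\kappa
  \qquad \forall\, \tau \in \{jT, \ldots, jT+T-1\},\\
T\,Q_u(jT) &\ge \sum_{\tau=jT}^{jT+T-1} Q_u(\tau) - \kappa\,\frac{T(T-1)}{2},\\
-\epsilon\, T \sum_{u \in \mathcal{U}} Q_u(jT)
  &\le -\epsilon \sum_{\tau=jT}^{jT+T-1} \sum_{u \in \mathcal{U}} Q_u(\tau)
     + \epsilon\,\kappa\,|\mathcal{U}|\,\frac{T(T-1)}{2}.
\end{align*}
```

The same bound holds with the virtual queues in place of the request queues, and the two slack terms add up to the single term with factor $|\mathcal{U}| T(T-1)$ in (C.21).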
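Summing the above over the frames $j \in \{0, \ldots, F-1\}$ yields
\begin{align}
L(\mathbf{G}(FT)) - L(\mathbf{G}(0)) < {} & KT^2 F + VFT(\phi_{\max} - \phi_{\min})
 - \epsilon \sum_{\tau=0}^{FT-1} \sum_{u \in \mathcal{U}} Q_u(\tau) \notag \\
 & - \epsilon \sum_{\tau=0}^{FT-1} \sum_{u \in \mathcal{U}} \Theta_u(\tau)
 + \epsilon \kappa |\mathcal{U}| FT(T-1) \tag{C.22}
\end{align}
Rearranging and neglecting appropriate terms, we get
\begin{align}
\frac{1}{FT} \sum_{\tau=0}^{FT-1} \sum_{u \in \mathcal{U}} Q_u(\tau)
 + \frac{1}{FT} \sum_{\tau=0}^{FT-1} \sum_{u \in \mathcal{U}} \Theta_u(\tau)
 < \frac{KT + V(\phi_{\max} - \phi_{\min})}{\epsilon} + \frac{L(\mathbf{G}(0))}{\epsilon FT}
 + \kappa |\mathcal{U}| (T-1) \tag{C.23}
\end{align}
Taking limits as $F \to \infty$
\begin{align}
\lim_{F \to \infty} \frac{1}{FT} \sum_{\tau=0}^{FT-1} \left( \sum_{u \in \mathcal{U}} Q_u(\tau) + \sum_{u \in \mathcal{U}} \Theta_u(\tau) \right)
 \le \frac{KT + V(\phi_{\max} - \phi_{\min})}{\epsilon} + \kappa |\mathcal{U}| (T-1) \tag{C.24}
\end{align}
such that (4.40) is proved.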
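We now consider the policy $\{a^*(\tau)\}_{\tau=jT}^{(j+1)T-1}$ which achieves the optimal solution $\phi_j^{\mathrm{opt}}$
to the problem (C.1)-(C.5). Using (C.2) and (C.3) in (C.15), we have
\begin{align}
L(\mathbf{G}((j+1)T)) - L(\mathbf{G}(jT)) - V \sum_{\tau=jT}^{jT+T-1} \sum_{u \in \mathcal{U}} \phi_u(\gamma_u(\tau))
 \le KT + KT(T-1) - VT \phi_j^{\mathrm{opt}} \tag{C.25}
\end{align}
Summing this over $j \in \{0, \ldots, F-1\}$ yields
\begin{align}
L(\mathbf{G}(FT)) - L(\mathbf{G}(0)) - V \sum_{\tau=0}^{FT-1} \sum_{u \in \mathcal{U}} \phi_u(\gamma_u(\tau))
 \le KT^2 F - VT \sum_{j=0}^{F-1} \phi_j^{\mathrm{opt}}. \tag{C.26}
\end{align}
Dividing both sides by $VFT$ and using the fact that $L(\mathbf{G}(FT)) \ge 0$, we get
\begin{align}
\frac{1}{FT} \sum_{\tau=0}^{FT-1} \sum_{u \in \mathcal{U}} \phi_u(\gamma_u(\tau))
 \ge \frac{1}{F} \sum_{j=0}^{F-1} \phi_j^{\mathrm{opt}} - \frac{KT}{V} - \frac{L(\mathbf{G}(0))}{VTF}. \tag{C.27}
\end{align}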
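At this point, using Jensen's inequality, the fact that $\phi_u(\cdot)$ is continuous and
non-decreasing for all $u \in \mathcal{U}$, and the fact that the strong stability of the queues (C.24) implies
that $\lim_{F \to \infty} \frac{1}{FT} \sum_{\tau=0}^{FT-1} \Theta_u(\tau) < \infty$ for all $u \in \mathcal{U}$, which in turn implies that
$\overline{\gamma}_u \le \overline{D}_u$ for all $u \in \mathcal{U}$, we arrive at
\[
\sum_{u \in \mathcal{U}} \phi_u(\overline{D}_u) \ge \lim_{F \to \infty} \frac{1}{F} \sum_{j=0}^{F-1} \phi_j^{\mathrm{opt}} - \frac{KT}{V}, \tag{C.28}
\]
such that (4.39) is proved.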
Thus, the utility under the DPP policy is within $O(1/V)$ of the time average of the
$\phi_j^{\mathrm{opt}}$ utility values, which can be achieved only with knowledge of the future states up to a
lookahead of one frame of $T$ slots. If $T$ is increased, then the value of $\phi_j^{\mathrm{opt}}$ for every frame
$j$ improves since we allow a larger lookahead. However, from (C.28), we can see that if
$T$ is increased, then $V$ must be increased proportionally in order to maintain the same
distance $KT/V$ from optimality. This yields a corresponding $O(V)$ increase in the queue backlogs.
For the case where the network state $\omega(t)$ is stationary and ergodic, the time averages
on the left-hand side of (C.24) and on the right-hand side of (C.28) become ensemble
averages because of ergodicity. Thus, we obtain (4.41) and (4.42). Furthermore, if the
network state is i.i.d., we can take $T = 1$ in the above derivation, obtaining the bounds
given in Corollary 3.
References
[1] "Cisco visual networking index: Global mobile data traffic forecast update, 2013-2018." [Online]. Available: http://goo.gl/1XYhqY
[2] H. Ishii, Y. Kishiyama, and H. Takahashi, "A novel architecture for LTE-B: C-plane/U-plane split and phantom cell concept," in IEEE Globecom Workshops, 2012, pp. 624-630.
[3] T. L. Marzetta, "Noncooperative cellular wireless with unlimited numbers of base station antennas," IEEE Trans. on Wireless Commun., vol. 9, no. 11, pp. 3590-3600, 2010.
[4] J. Hoydis, S. Ten Brink, and M. Debbah, "Massive MIMO: How many antennas do we need?" in IEEE 49th Annual Allerton Conference on Communication, Control, and Computing, 2011, pp. 545-550.
[5] H. Huh, G. Caire, H. Papadopoulos, and S. Ramprashad, "Achieving massive MIMO spectral efficiency with a not-so-large number of antennas," IEEE Trans. on Wireless Commun., vol. 11, no. 9, pp. 3226-3239, 2012.
[6] J. Hoydis, M. Kobayashi, and M. Debbah, "Green small-cell networks," IEEE Vehicular Technology Magazine, vol. 6, no. 1, pp. 37-43, 2011.
[7] V. Chandrasekhar, J. Andrews, and A. Gatherer, "Femtocell networks: a survey," IEEE Commun. Magazine, vol. 46, no. 9, pp. 59-67, 2008.
[8] "Quantenna 4 x 4 MIMO technology." [Online]. Available: http://www.quantenna.com/4x4mimo.html
[9] "Broadcom 6 x 6 MIMO press release." [Online]. Available: http://goo.gl/zyLQ54
[10] T. Rappaport, S. Sun, R. Mayzus, H. Zhao, Y. Azar, K. Wang, G. Wong, J. Schulz, M. Samimi, and F. Gutierrez, "Millimeter wave mobile communications for 5G cellular: It will work!" IEEE Access, vol. 1, pp. 335-349, 2013.
[11] A. Adhikary, E. A. Safadi, M. Samimi, R. Wang, G. Caire, T. S. Rappaport, and A. F. Molisch, "Joint spatial division and multiplexing for mm-wave channels," IEEE J. Sel. Areas in Commun., vol. 32, no. 6, pp. 1239-1255, 2014.
[12] A. Ghosh, N. Mangalvedhe, R. Ratasuk, B. Mondal, M. Cudak, E. Visotsky, T. A. Thomas, J. G. Andrews, P. Xia, H. S. Jo et al., "Heterogeneous cellular networks: From theory to practice," IEEE Communications Magazine, vol. 50, no. 6, pp. 54-64, 2012.
[13] "3GPP TR36.872 specification." [Online]. Available: http://www.3gpp.org/dynareport/36872.htm
[14] H. S. Dhillon, R. K. Ganti, F. Baccelli, and J. G. Andrews, "Modeling and analysis of k-tier downlink heterogeneous cellular networks," IEEE J. on Sel. Areas in Commun., vol. 30, no. 3, pp. 550-560, 2012.
[15] J. G. Andrews, S. Singh, Q. Ye, X. Lin, and H. Dhillon, "An overview of load balancing in HetNets: Old myths and open problems," arXiv preprint arXiv:1307.7779, 2013.
[16] Q. Ye, B. Rong, Y. Chen, M. Al-Shalash, C. Caramanis, and J. Andrews, "User association for load balancing in heterogeneous cellular networks," IEEE Trans. on Wireless Commun., vol. 12, no. 6, pp. 2706-2716, 2013.
[17] A. Gupta, H. Dhillon, S. Vishwanath, and J. Andrews, "Downlink multi-antenna heterogeneous cellular network with load balancing," IEEE Trans. on Commun., vol. 62, no. 11, pp. 4052-4067, 2014.
[18] H. Huh, A. M. Tulino, and G. Caire, "Network MIMO with linear zero-forcing beamforming: Large system analysis, impact of channel estimation, and reduced-complexity scheduling," IEEE Trans. on Information Theory, vol. 58, no. 5, pp. 2911-2934, 2012.
[19] H. Huh, S.-H. Moon, Y.-T. Kim, I. Lee, and G. Caire, "Multi-cell MIMO downlink with cell cooperation and fair scheduling: a large-system limit analysis," IEEE Trans. on Information Theory, vol. 57, no. 12, pp. 7771-7786, 2011.
[20] M. Hong and Z.-Q. Luo, "Distributed linear precoder optimization and base station selection for an uplink heterogeneous network," IEEE Trans. on Signal Processing, vol. 61, no. 12, pp. 3214-3228, 2013.
[21] M. Sanjabi, M. Razaviyayn, and Z.-Q. Luo, "Optimal joint base station assignment and downlink beamforming for heterogeneous networks," in IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2012, pp. 2821-2824.
[22] G. Caire, N. Jindal, M. Kobayashi, and N. Ravindran, "Multiuser MIMO achievable rates with downlink training and channel state feedback," IEEE Trans. on Inform. Theory, vol. 56, no. 6, pp. 2845-2866, 2010.
[23] E. Aryafar, A. Keshavarz-Haddad, M. Wang, and M. Chiang, "RAT selection games in HetNets," in Proc. IEEE INFOCOM, April 2013, pp. 998-1006.
[24] "Conviva viewer experience report, 2014." [Online]. Available: http://www.conviva.com/vxr/
[25] "MPEG DASH standard." [Online]. Available: http://dashif.org/mpegdash
[26] [Online]. Available: https://www.google.com/get/videoqualityreport/
[27] [Online]. Available: http://ispspeedindex.netflix.com/
[28] S. Sesia, I. Toufik, and M. Baker, LTE: the Long Term Evolution - From theory to practice. Wiley, 2009.
[29] T. L. Marzetta, "Noncooperative cellular wireless with unlimited numbers of base station antennas," IEEE Trans. on Wireless Commun., vol. 9, no. 11, pp. 3590-3600, Nov. 2010.
[30] M. Ji, G. Caire, and A. F. Molisch, "Optimal throughput-outage tradeoff in wireless one-hop caching networks," arXiv preprint arXiv:1302.2168, 2013.
[31] M. Ji, G. Caire, and A. F. Molisch, "Wireless device-to-device caching networks: Basic principles and system performance," arXiv preprint arXiv:1305.5216, 2013.
[32] N. Golrezaei, A. F. Molisch, A. G. Dimakis, and G. Caire, "Femtocaching and device-to-device collaboration: A new architecture for wireless video distribution," Communications Magazine, IEEE, vol. 51, no. 4, pp. 142-149, 2013.
[33] K. Shanmugam, N. Golrezaei, A. G. Dimakis, A. F. Molisch, and G. Caire, "Femtocaching: Wireless video content delivery through distributed caching helpers," arXiv preprint arXiv:1109.4179, 2011.
[34] N. Golrezaei, K. Shanmugam, A. G. Dimakis, A. F. Molisch, and G. Caire, "Femtocaching: Wireless video content delivery through distributed caching helpers," in INFOCOM, Proceedings. IEEE, 2012, pp. 1107-1115.
[35] N. Golrezaei, K. Shanmugam, A. G. Dimakis, A. F. Molisch, and G. Caire, "Wireless video content delivery through coded distributed caching," in Communications (ICC), International Conference on. IEEE, 2012, pp. 2467-2472.
[36] M. A. Maddah-Ali and U. Niesen, "Fundamental limits of caching," in Information Theory Proceedings (ISIT), International Symposium on. IEEE, 2013, pp. 1077-1081.
[37] M. A. Maddah-Ali and U. Niesen, "Decentralized coded caching attains order-optimal memory-rate tradeoff," arXiv preprint arXiv:1301.5848, 2013.
[38] U. Niesen and M. A. Maddah-Ali, "Coded caching with nonuniform demands," arXiv preprint arXiv:1308.0178, 2013.
[39] R. Pedarsani, M. A. Maddah-Ali, and U. Niesen, "Online coded caching," arXiv preprint arXiv:1311.3646, 2013.
[40] M. Ji, A. M. Tulino, J. Llorca, and G. Caire, "Order optimal coded caching-aided multicast under zipf demand distributions," arXiv preprint arXiv:1402.4576, 2014.
[41] M. Ji, G. Caire, and A. F. Molisch, "Fundamental limits of distributed caching in d2d wireless networks," in Information Theory Workshop (ITW). IEEE, 2013, pp. 1-5.
[42] V. Joseph and G. de Veciana, "Nova: Qoe-driven optimization of dash-based video delivery in networks," arXiv preprint arXiv:1307.7210, 2013.
[43] C. Gong and X. Wang, "Adaptive transmission for delay-constrained wireless video," Wireless Communications, IEEE Transactions on, vol. 13, no. 1, pp. 49-61, January 2014.
[44] A. A. Khalek, C. Caramanis, and R. W. Heath Jr, "Loss visibility optimized real-time video transmission over mimo systems," arXiv preprint arXiv:1301.3174, 2013.
[45] M. Zhao, X. Gong, J. Liang, W. Wang, X. Que, and S. Cheng, "Scheduling and resource allocation for wireless dynamic adaptive streaming of scalable videos over http," in Communications (ICC), 2014 IEEE International Conference on, June 2014, pp. 1681-1686.
[46] C. Chen, R. Heath, A. Bovik, and G. de Veciana, "Adaptive policies for real-time video transmission: A markov decision process framework," in Image Processing (ICIP), 2011 18th IEEE International Conference on. IEEE, 2011, pp. 2249-2252.
[47] D. Bethanabhotla, G. Caire, and M. J. Neely, "Adaptive video streaming for wireless networks with multiple users and helpers," Communications, IEEE Transactions on, vol. 63, no. 1, pp. 268-285, Jan 2015.
[48] F. Kelly, "The mathematics of traffic in networks," The Princeton Companion to Mathematics, 2006.
[49] Y. Yi and M. Chiang, "Stochastic network utility maximisation - a tribute to Kelly's paper published in this journal a decade ago," European Transactions on Telecommunications, vol. 19, no. 4, pp. 421-442, 2008.
[50] M. Chiang, S. Low, A. Calderbank, and J. Doyle, "Layering as optimization decomposition: A mathematical theory of network architectures," Proceedings of the IEEE, vol. 95, no. 1, pp. 255-312, 2007.
[51] J. Mo and J. Walrand, "Fair end-to-end window-based congestion control," IEEE/ACM Transactions on Networking (ToN), vol. 8, no. 5, pp. 556-567, 2000.
[52] M. Neely, "Stochastic network optimization with application to communication and queueing systems," Synthesis Lectures on Communication Networks, vol. 3, no. 1, pp. 1-211, 2010.
[53] M. Neely, "Universal scheduling for networks with arbitrary traffic, channels, and mobility," in Decision and Control (CDC), 2010 49th IEEE Conference on. IEEE, 2010, pp. 1822-1829.
[54] M. J. Neely, "Wireless peer-to-peer scheduling in mobile networks," in 46th Annual Conference on Information Sciences and Systems (CISS). IEEE, 2012, pp. 1-6.
[55] A. Ortega, "Variable bit-rate video coding," Compressed Video over Networks, pp. 343-382, 2000.
[56] "IEEE 802.11ac - Wikipedia," https://en.wikipedia.org/wiki/IEEE_802.11ac, Last Updated: January 14, 2014.
[57] J. Andrews, S. Buzzi, W. Choi, S. Hanly, A. Lozano, A. Soong, and J. Zhang, "What will 5G be?" Selected Areas in Communications, IEEE Journal on, vol. 32, no. 6, pp. 1065-1082, June 2014.
[58] D. Bethanabhotla, O. Y. Bursalioglu, H. C. Papadopoulos, and G. Caire, "Optimal user-cell association for massive MIMO wireless networks," arXiv preprint arXiv:1407.6731, 2014.
[59] "pcell white paper." [Online]. Available: http://goo.gl/lDrofd
[60] S. V. Hanly, "An algorithm for combined cell-site selection and power control to maximize cellular spread spectrum capacity," IEEE J. on Sel. Areas in Commun., vol. 13, no. 7, pp. 1332-1340, 1995.
[61] K. Son, S. Chong, and G. Veciana, "Dynamic association for load balancing and interference avoidance in multi-cell networks," IEEE Trans. on Wireless Commun., vol. 8, no. 7, pp. 3566-3576, 2009.
[62] S. Stanczak, M. Wiczanowski, and H. Boche, "Distributed utility-based power control: Objectives and algorithms," IEEE Trans. on Signal Processing, vol. 55, no. 10, pp. 5058-5068, 2007.
[63] M. Schubert and H. Boche, QoS-based resource allocation and transceiver optimization. Now Publishers Inc, 2006.
[64] M. Razaviyayn, M. Hong, and Z.-Q. Luo, "Linear transceiver design for a mimo interfering broadcast channel achieving max-min fairness," Signal Processing, vol. 93, no. 12, pp. 3327-3340, 2013.
[65] G. Athanasiou, P. C. Weeraddana, C. Fischione, and L. Tassiulas, "Optimizing client association in 60 GHz wireless access networks," arXiv preprint arXiv:1301.2723, 2013.
[66] K. Shen and W. Yu, "Downlink cell association optimization for heterogeneous networks via dual coordinate descent," in IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2013, pp. 4779-4783.
[67] A. F. Molisch, Wireless Communications. Wiley, 2010, vol. 15.
[68] G. Grimmett and D. Stirzaker, Probability and random processes. Oxford Univ.
Press, 1992.
[69] R. Couillet and M. Debbah, Random matrix methods for wireless communications.
Cambridge University Press, 2011.
[70] Y. Bejerano, S.J. Han, and L. E. Li, \Fairness and load balancing in wireless LANs
using association control," in Proceedings of the 10th Annual International Confer
ence on Mobile Computing and Networking. ACM, 2004, pp. 315{329.
[71] L. Li, M. Pal, and Y. R. Yang, \Proportional fairness in multirate wireless LANs,"
in Proc. IEEE INFOCOM, 2008, pp. 1004{1012.
[72] M. Grant and S. Boyd, \CVX: Matlab software for disciplined convex programming,
version 2.0 beta," http://cvxr.com/cvx, Sep. 2013.
[73] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein, \Distributed optimiza
tion and statistical learning via the alternating direction method of multipliers,"
Foundations and Trends in Machine Learning, vol. 3, no. 1, pp. 1{122, 2011.
[74] S. Boyd and L. Vandenberghe, Convex optimization. Cambridge University Press,
2009.
[75] S. Boyd and A. Mutapcic, \Subgradient methods," Lecture notes of EE364b, Stan
ford University, Winter 20062007.
[76] R. G. Gallager, Information Theory and Reliable Communication. John Wiley &
Sons, Inc., 1968.
[77] N. R. Devanur, C. H. Papadimitriou, A. Saberi, and V. V. Vazirani, "Market
equilibrium via a primal–dual algorithm for a convex program," Journal of the ACM,
vol. 55, no. 5, p. 22, 2008.
[78] R. Chandra, J. Padhye, L. Ravindranath, and A. Wolman, "Beacon-stuffing:
Wi-Fi without associations," in IEEE Workshop on Mobile Computing Systems and
Applications, HotMobile, 2007, pp. 53–57.
[79] Y. Sánchez, T. Schierl, C. Hellge, T. Wiegand, D. Hong, D. De Vleeschauwer,
W. Van Leekwijck, and Y. Lelouedec, "iDASH: improved dynamic adaptive
streaming over HTTP using scalable video coding," in ACM Multimedia Systems Conference
(MMSys), 2011, pp. 23–25.
[80] A. Begen, T. Akgul, and M. Baugher, "Watching video over the web: Part 1:
Streaming protocols," Internet Computing, IEEE, vol. 15, no. 2, pp. 54–63, 2011.
[81] Z. Wang, A. Bovik, H. Sheikh, and E. Simoncelli, "Image quality assessment: From
error visibility to structural similarity," Image Processing, IEEE Transactions on,
vol. 13, no. 4, pp. 600–612, 2004.
[82] T. Ho, M. Médard, R. Koetter, D. R. Karger, M. Effros, J. Shi, and B. Leong, "A
random linear network coding approach to multicast," Information Theory, IEEE
Transactions on, vol. 52, no. 10, pp. 4413–4430, 2006.
[83] D. Tse and P. Viswanath, Fundamentals of wireless communication. Cambridge
Univ Pr, 2005.
[84] T. Richardson and R. L. Urbanke, Modern coding theory. Cambridge University
Press, 2008.
[85] E. Biglieri, J. Proakis, and S. Shamai, "Fading channels: Information-theoretic and
communications aspects," Information Theory, IEEE Transactions on, vol. 44, no. 6,
pp. 2619–2692, 1998.
[86] E. H. Ong, J. Kneckt, O. Alanen, Z. Chang, T. Huovinen, and T. Nihtila, "IEEE
802.11ac: Enhancements for very high throughput WLANs," in 22nd International
Symposium on Personal Indoor and Mobile Radio Communications (PIMRC).
IEEE, 2011, pp. 849–853.
[87] Y. Sánchez, T. Schierl, C. Hellge, T. Wiegand, D. Hong, D. De Vleeschauwer,
W. Van Leekwijck, and Y. Lelouedec, "Improved caching for HTTP-based video on
demand using scalable video coding," in Consumer Communications and Networking
Conference (CCNC), 2011 IEEE. IEEE, 2011, pp. 595–599.
[88] A. Schrijver, Combinatorial optimization: polyhedra and efficiency. Springer, 2003,
vol. 24.
[89] N. Bhushan, C. Lott, P. Black, R. Attar, Y.-C. Jou, M. Fan, D. Ghosh, and J. Au,
"CDMA2000 1×EV-DO revision A: a physical layer and MAC layer overview,"
Communications Magazine, IEEE, vol. 44, no. 2, pp. 37–49, 2006.
[90] D. Bethanabhotla, G. Caire, and M. J. Neely, "Adaptive video streaming in
MU-MIMO networks," arXiv preprint arXiv:1401.6476, 2014.
[91] P. Kyosti, J. Meinila, L. Hentila, X. Zhao, T. Jamsa, C. Schneider, M. Narandzic,
M. Milojevic, A. Hong, J. Ylitalo et al., "WINNER II channel models," European
Commission, Deliverable IST-WINNER D, vol. 1, 2007.
[92] http://media.xiph.org/video/derf/.
[93] https://ece.uwaterloo.ca/~z70wang/research/ssim/.
[94] A. El Gamal and Y. Kim, "Lecture notes on network information theory," arXiv
preprint arXiv:1001.3404, vol. 3, 2010.
[95] H. Huh, A. M. Tulino, and G. Caire, "Network MIMO with linear zero-forcing
beamforming: Large system analysis, impact of channel estimation, and
reduced-complexity scheduling," Information Theory, IEEE Transactions on, vol. 58, no. 5,
pp. 2911–2934, 2012.
[96] T. Yoo and A. Goldsmith, "On the optimality of multiantenna broadcast scheduling
using zero-forcing beamforming," Selected Areas in Communications, IEEE Journal
on, vol. 24, no. 3, pp. 528–541, 2006.
[97] S. Pawar, N. Noorshams, S. El Rouayheb, and K. Ramchandran, "Dress codes for
the storage cloud: Simple randomized constructions," in Information Theory (ISIT),
2011 IEEE International Symposium on, pp. 2338–2342.
[98] J. Vondrak, "Polyhedral techniques in combinatorial optimization," Lecture notes of
CS369P, Stanford University, Fall 2010.
[99] A. Schrijver, Combinatorial optimization: polyhedra and efficiency. Springer, 2003,
vol. 24.
Asset Metadata
Creator
Bethanabhotla, Dilip
(author)
Core Title
Optimal distributed algorithms for scheduling and load balancing in wireless networks
School
Viterbi School of Engineering
Degree
Doctor of Philosophy
Degree Program
Electrical Engineering
Publication Date
07/06/2015
Defense Date
04/06/2015
Publisher
University of Southern California
(original),
University of Southern California. Libraries
(digital)
Tag
congestion control, load balancing, massive MIMO, OAI-PMH Harvest, scheduling, video aware wireless networks
Format
application/pdf
(imt)
Language
English
Contributor
Electronically uploaded by the author
(provenance)
Advisor
Caire, Giuseppe (committee chair), Fulman, Jason (committee member), Psounis, Konstantinos (committee member)
Creator Email
bethanab@usc.edu,dilip.bethanabhotla@gmail.com
Permanent Link (DOI)
https://doi.org/10.25549/uscthesesc3586654
Unique identifier
UC11299082
Identifier
etdBethanabho3544.pdf (filename),uscthesesc3586654 (legacy record id)
Legacy Identifier
etdBethanabho3544.pdf
Dmrecord
586654
Document Type
Dissertation
Rights
Bethanabhotla, Dilip
Type
texts
Source
University of Southern California
(contributing entity),
University of Southern California Dissertations and Theses
(collection)
Access Conditions
The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the a...
Repository Name
University of Southern California Digital Library
Repository Location
USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA