SCHEDULING AND RESOURCE ALLOCATION WITH INCOMPLETE INFORMATION IN WIRELESS NETWORKS

by

Yanting Wu

A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(ELECTRICAL ENGINEERING)

August 2015

Copyright 2015 Yanting Wu

Dedication

To my beloved family: Yonghui Wu, Cuixian Wang, Zheng Li and Lucas Li

Acknowledgments

This work would not have been possible without the encouragement, help, and guidance that I received over the years from many individuals. First and foremost is my advisor, Prof. Bhaskar Krishnamachari. He is a mentor not only in my research, but also in life. He introduced me to the research world and taught me about research, learning, career planning, and much more. His passion for scientific discovery and his dedication to making technical contributions inspired me and made him a role model for me. He guided me through every step of doing meaningful research, from choosing topics and formulating problems to the details of proofs and writing. He even searched for papers for me and read them together with me. Prof. Krishnamachari not only gave me a great deal of detailed guidance; he also gave me the freedom and encouragement to explore new problems and topics. His continuous support, encouragement, and patient help carried me through my PhD studies. He also shared his experiences as a PhD student and as a faculty member, and showed us by example that steadily accumulating knowledge and building abilities eventually pays off: even if we are slow and struggle at the beginning, hard work can lead to a leap, and someday we may become experts in the area. He created a very positive atmosphere for learning and thinking. It is an honor for me to be his student.
Next, I would like to take this opportunity to express my gratitude to the professors and collaborators at USC who shared their knowledge with me, helped me, and guided me. My heartfelt thanks go to Prof. Shanghua Teng, who not only gave me insightful advice on research problems, but also gave me great support and guidance during the toughest time of my Ph.D. pursuit. I would also like to thank Prof. David Kempe, Prof. Shaddin Dughmi, and the other members of the USC CS Theory Group, who brought their knowledge and ideas to discussions of my research problems. Special thanks go to Li Han, who brought my attention to papers on dynamic speed scaling, which provided a key theoretical tool for my third study. I would also like to thank Prof. Rahul Jain, Prof. John Silvester, Prof. Ethan Katz-Bassett, and Prof. Shanghua Teng for serving on my qualifying exam and dissertation committees. Their constructive feedback and suggestions greatly improved this dissertation.

Thirdly, I would like to thank my collaborators outside of USC: Prof. Amotz Bar-Noy, George Rabanca, and Prof. Rajgopal Kannan. I very much appreciate their insightful discussions with me.

During my PhD studies, I enjoyed every moment of being a member of the Autonomous Networks Research Group (ANRG). This group is like a family: we discussed problems together, helped each other, and shared knowledge, ideas, and opportunities. Thank you all, my ANRG friends, for helping, accompanying, and caring for me.

Last but not least, I would like to thank my family: my husband Zheng Li, and my parents. With their love, patience, support, and unwavering belief in me, I have been able to complete this long PhD journey. Thank you with all my heart and soul.

Table of Contents

Dedication
Acknowledgments
List Of Figures
List Of Tables
Abstract

Chapter 1: Introduction
  1.1 Contributions
    1.1.1 Transmission over a Markovian Channel
    1.1.2 Dynamic Multi-carrier Selection
    1.1.3 Transmission for Random Arrivals with Energy-Delay Tradeoff
  1.2 Organization

Chapter 2: Background
  2.1 Game Theory
  2.2 Mechanism Design
    2.2.1 Auctions
  2.3 Reinforcement Learning
    2.3.1 MDP and POMDP Problems
    2.3.2 Multi-armed Bandit Problems
  2.4 Online Algorithms and Competitive Analysis

Chapter 3: Related Work
  3.1 Scheduling and Resource Allocation using Water-filling
  3.2 Scheduling and Resource Allocation using Game Theory
    3.2.1 Power Control
    3.2.2 Medium Access Control
    3.2.3 Routing
    3.2.4 Dynamic Spectrum Access
  3.3 Scheduling and Resource Allocation using Mechanism Design
  3.4 Scheduling and Resource Allocation using Reinforcement Learning
  3.5 Scheduling and Resource Allocation using Online Algorithms

Chapter 4: Transmission over a Markovian Channel
  4.1 Overview
  4.2 Model
  4.3 Optimal Policy with Complete Channel Information
    4.3.1 Threshold Structure of the Optimal Policy
      4.3.1.1 Closed-Form Expression of the Threshold
      4.3.1.2 V(0) and V(1)
    4.3.2 K-Conservative Policies
  4.4 Online Learning with Unknown Channel State Transition Probabilities
  4.5 Simulations
  4.6 Summary

Chapter 5: Dynamic Multi-carrier Selection: Binary Bid
  5.1 Overview
  5.2 Model
  5.3 Theoretical Analysis: A Game Setting
    5.3.1 Parameters Known Case - Carriers' Perspective
      5.3.1.1 Mixed Strategy Nash Equilibrium
    5.3.2 Parameters Known Case - Transmitter's Perspective
  5.4 Online Learning with Unknown Channel State Parameters
  5.5 Simulations
  5.6 Summary

Chapter 6: Dynamic Multi-carrier Selection: Multi-bit Bid
  6.1 Overview
  6.2 Model
    6.2.1 Power-Rate Model
  6.3 Theoretical Analysis: An Auction Setting
    6.3.1 Optimal Power and Data Allocation
    6.3.2 An Incentive Mechanism Design Which Ensures Truthfulness
  6.4 Implementation
  6.5 Simulations
    6.5.1 Worst Case Data Rate Efficiency
    6.5.2 Performance Evaluation
      6.5.2.1 Channel and Power-Rate Model
      6.5.2.2 Single Operator Contract Model versus Auction Model
  6.6 Summary

Chapter 7: Transmission for Random Arrivals with Energy-Delay Tradeoff
  7.1 Overview
  7.2 Model
    7.2.1 Motivating Example
    7.2.2 Problem Formulation
  7.3 Optimal Policy with Complete Channel Information
    7.3.1 Greedy Algorithm for a Single Arrival
    7.3.2 Backward Greedy Algorithm for Multiple Arrivals
  7.4 Online Learning with Incomplete Knowledge of Arrivals
  7.5 Simulations
  7.6 Summary

Chapter 8: Conclusions and Open Questions
  8.1 Transmission over Multiple Markovian Channels
  8.2 Regulating Carrier Action for Markovian Channels
  8.3 Decentralized Transmission Policy with Energy-Delay Tradeoff

Reference List

List Of Figures

4.1 K-conservative policy Markov chain
4.2 Expected total discounted reward for K-conservative policies
4.3 Percentage of time that a (1 - (ε + δ))-optimal arm is selected over time by the UCB-P algorithm
4.4 Percentage of time that a (1 - (ε + δ))-optimal arm is selected over time by the UCB-P-TUNED algorithm
5.1 Illustration of the competitive uplink carrier selection and rate allocation problem
5.2 Carrier 1 payoff against carrier 2 using the dominant strategy
5.3 Carrier 1 payoff against carrier 2 using the UCB1 strategy
5.4 Transmitter payoff when one carrier uses the dominant strategy
5.5 Transmitter payoff when one carrier uses the UCB1 strategy
5.6 Normalized transmitter payoff with respect to the optimum when both play UCB1, as a function of the two channel parameters
6.1 System model of multi-carrier rate allocation
6.2 Data versus power allocation
6.3 Optimal power-rate curve selection
6.4 Expected payoff versus probabilities based on a convex piecewise linear function
6.5 Expected payoff versus probabilities based on a smooth convex function
6.6 2 km x 2 km view of the BS deployment by two major cellular operators over an area in London
7.1 Illustrative example of the scheduling problem with energy-delay tradeoff
7.2 Backward greedy algorithm illustration
7.3 Example of scheduling in A_C^On and A_D^On
7.4 Snapshots of simulation results for different arrival patterns

List Of Tables

4.1 The optimal strategy table
5.1 Reward function for a carrier
5.2 Carriers' payoff matrix
5.3 Carriers' payoff matrix: mixed strategy Nash equilibrium, scenario 1
5.4 Carriers' payoff matrix: mixed strategy Nash equilibrium, scenario 2
5.5 Carriers' payoff matrix: pure strategy Nash equilibrium, scenario 1
5.6 Transmitter's expected throughput
5.7 Carriers' payoff for p_1 = 0 and p_2 = 1
5.8 Transmitter's payoff for p_1 = 0 and p_2 = 1
5.9 Carriers' payoff matrix: example 1
5.10 Carriers' payoff matrix: example 2
6.1 Worst case data rate efficiency
6.2 Single operator contract model versus proposed auction model
7.1 Comparison of three scheduling schemes
7.2 Competitive ratio in simulations

Abstract

In a wireless network, it is quite common that a transmitter needs to make decisions on scheduling and resource allocation with incomplete information. In this dissertation, we argue that efficient mechanisms can be designed even if some key information regarding scheduling and resource allocation is unknown. We substantiate this thesis through three studies:

Transmission over a Markovian channel
Dynamic multi-carrier selection
Transmission for random arrivals with energy-delay tradeoff

First, we study the optimal transmission policy for a single transmitter over a 2-state Markovian channel. The transmitter can send at either a high rate or a low rate, with rewards depending on the action chosen and the underlying channel state. The aim is to compute the scheduling policy that determines which action to choose at each time slot in order to maximize the expected total discounted reward. We first establish the threshold structure of the optimal policy when the underlying channel statistics are known.
We then focus on the more challenging case when the statistics are unknown. For this problem, we map different threshold policies to arms of a suitably defined multi-armed bandit problem. To tractably handle the complexity introduced by countably infinite arms and the infinite time horizon, we weaken our objective slightly: we instead seek a (1 - (ε + δ))-approximate policy. We present the UCB-P algorithm, which achieves this objective with logarithmic regret over time.

Next, we study the transmission control problem in which a transmitter is able to transmit data simultaneously over multiple i.i.d. channels owned by different carriers. The carriers send bids indicating their channel quality to the transmitter, and the transmitter allocates power and data rate accordingly. The transmitter pays the carriers to use the channels. To use its power efficiently and maximize throughput, the transmitter requires knowledge of the channel quality. However, this information is incomplete at the transmitter, and its transmission control relies on the bids from the carriers. Carriers are self-interested entities who compete with each other for payment from the transmitter. Hence, the transmitter also needs a strategic approach in order to deal with the carriers' self-interested behavior.

We first consider a simplified case in which there is one transmitter, two carriers, and a binary bid. We analyze the Nash equilibrium of the carriers and use the price of anarchy (PoA) to analyze the efficiency of the game. We prove that there exists a bounded PoA if the penalties for unsuccessful transmissions are set carefully. The bound is 2, and we prove that this bound is the best possible over all settings. We next extend the binary bids to multi-bit bids and allow an arbitrary number of carriers. The key aim of this latter study is to guarantee truthfulness from the bidders.
We propose a rewarding mechanism based on a convex piecewise linear function and prove that this mechanism ensures the truthfulness of the carriers. We also prove that as the number of bits per bid increases, the throughput obtained by the transmitter approaches the optimum.

Finally, we study a scheduling problem on a time-slotted channel that balances the tradeoff between the higher energy cost incurred by packing multiple packets into the same slot and the higher latency incurred by deferring some packets. The objective is to minimize the weighted sum of delay and energy cost over all nodes. We first analyze the offline scheduling problem, in which the traffic arrivals are given in advance, and prove that a greedy algorithm is optimal. We then focus on the scenario where information about the arrivals is incomplete and the scheduler can only see the traffic arrivals so far. We develop an efficient online algorithm, which we prove is O(1)-competitive.

Chapter 1: Introduction

With the advancement of radio technologies, wireless networks have become ubiquitous due to their many advantages over wired networks. First and most importantly, they provide mobility, which enables users to access their data over the network while moving. Second, they are easier to deploy: instead of installing cables through walls, which can be challenging in some cases such as old buildings, wireless networks use electromagnetic waves to communicate, and the access points or base stations can be deployed at places where they can be easily installed. Thirdly, due to the development of wireless technology, the cost of using wireless networks has become cheaper and more affordable; sometimes they are even cheaper than wired networks.

The development of wireless technologies stimulates the development of mobile applications. In turn, the great demand for pervasive mobile applications further stimulates the rapid development and growth of wireless networks.
During the past two decades, we have witnessed fast growth in wireless networks and continuously increasing demand for wireless services [1]. Wireless networks are now on the verge of a third phase of growth. The first phase was dominated by voice traffic, and the second phase, which we are currently in, is dominated by textual and numeric data traffic. In the third phase, it is predicted that the traffic will be dominated by video [2]. According to Cisco, in 2012 the average mobile user consumed 201 megabytes of data a month, including one hour of video and two hours of audio, and downloaded one app per month; by 2017, it is predicted that the average mobile user will use 2 gigabytes of data per month, including 10 hours of video and 15 hours of audio [3]. Though most videos on social media are prerecorded nowadays, in the future the demand for live video capture and streaming will grow higher and higher. As a result, future wireless networks will need to be optimized for the delivery of mobile data services, especially video content and video applications, which include significant uplink traffic. Though the data rate offered by wireless networks keeps increasing, it remains a big challenge to satisfy the vast increase in video traffic given limited wireless resources [4].

In wireless networks, rapidly changing wireless channels are the fundamental contributor that makes wireless network design very challenging. Transmission signals are affected by path loss, shadowing, noise, and interference in wireless channels. Besides these, the channel condition varies over time due to user mobility, fading, and many other dynamic factors. Moreover, unlike in wired networks, transmissions in wireless networks are usually not independent of each other: one user's signal often acts as interference to other users who share the same channel. Without careful interference management, wireless communication can be very ineffective.
Moreover, in wireless networks, the information available to the transmitter and the receiver is typically unequal. The transmitter requires knowledge of the wireless channels in order to allocate resources optimally and transmit efficiently; however, such information is usually known only by the receivers. In real systems, it is sometimes challenging for the transmitter to obtain truthful and accurate channel quality information.

Due to these fundamental challenges in wireless networks, a systematic and generic approach to the efficient design of wireless networks is still far from being achieved, if not impossible. Because of these fundamental difficulties, as well as high demand under limited wireless network resources, how to overcome the difficulties and improve system performance has become an important research topic.

In this thesis, we focus on settings where the transmitter needs to make decisions, such as how to allocate power and rate and how to schedule transmissions, with incomplete information about the channels. Such scenarios are common in wireless environments due to the fundamental challenges mentioned above.
1.1 Contributions

We study three fundamental problems:

Transmission over a Markovian channel
Dynamic multi-carrier selection
Transmission for random arrivals with energy-delay tradeoff

In these studies, we use the idea of dynamic resource allocation. The basic idea behind dynamic resource allocation is to utilize the channel more efficiently by optimizing transmission parameters such as transmit power, symbol transmission rate, modulation scheme, coding scheme, bandwidth, or combinations of these parameters. Techniques for dynamic resource allocation include power control, rate adaptation, and dynamic channel allocation. To judge a design, there can be multiple metrics such as total transmit power, overall throughput, and fairness. In this thesis, the objective of our designs is to minimize the total transmit power or to maximize the overall system throughput.

Applying theoretical tools such as game theory, reinforcement learning, mechanism design, online algorithms, and competitive analysis, we prove that scheduling and resource allocation mechanisms can be designed and optimized even with incomplete information about the channel conditions. We also validate our claims through simulations.

1.1.1 Transmission over a Markovian Channel

We study a communication system operating over a Gilbert-Elliott channel in a time-slotted fashion. The Gilbert-Elliott channel is a 2-state Markov channel: the channel condition varies over time, and at each time slot the channel is in either state High (H) or Low (L). If the channel is in state H, it allows the transmitter to transmit at a high data rate, while state L only allows a low data rate. The probability that the channel is in state H or L is determined by a state transition matrix.

We study the scheduling and resource allocation strategies of the transmitter. The objective for the transmitter is to decide at each time, based on prior observations, whether to send data at a high data rate or a low data rate. The former incurs the risk of failure but reveals the channel's true state, while the latter is a safe but unrevealing choice.

In the first part of this study, we consider the scenario in which the channel transition probabilities are known. We model the problem as a Partially Observable Markov Decision Process (POMDP) and show that the optimal policy always has a single threshold that corresponds to a K-conservative policy, in which the transmitter adopts the conservative approach for K steps after each failure before reattempting the aggressive strategy.
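To make the K-conservative policy concrete, here is a minimal simulation sketch. The transition probabilities, reward values, and the assumption that a failed aggressive attempt earns nothing in that slot are all illustrative choices, not the exact model analyzed in Chapter 4.

```python
import random

def simulate_k_conservative(K, p_HH, p_LH, T=10000, r_high=1.0, r_low=0.3, seed=0):
    """Average per-slot reward of a K-conservative policy on a
    Gilbert-Elliott channel: after each failed high-rate attempt the
    transmitter sends at the low (safe) rate for K slots before retrying."""
    rng = random.Random(seed)
    state = 1   # 1 = High, 0 = Low
    wait = 0    # remaining conservative slots after a failure
    total = 0.0
    for _ in range(T):
        if wait > 0:
            total += r_low      # safe rate: always succeeds, reveals nothing
            wait -= 1
        elif state == 1:
            total += r_high     # aggressive attempt succeeds in state H
        else:
            wait = K            # failure in state L: back off for K slots
        # two-state Markov transition: P(next = H) depends on current state
        p_next_high = p_HH if state == 1 else p_LH
        state = 1 if rng.random() < p_next_high else 0
    return total / T

# On a sticky channel (large p_HH, small p_LH), sweeping K trades off
# missed H-slots against repeated failed probes.
avg = simulate_k_conservative(K=3, p_HH=0.9, p_LH=0.2)
```

Sweeping K in such a simulation illustrates the tradeoff that the threshold result formalizes.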
After understanding the optimal policy given complete information about the underlying state transition matrix, we move to the more challenging case in which the underlying state transition matrix is unknown; this is the main focus of this study. In this setting, the problem of finding the optimal strategy is equivalent to finding the optimal choice of K. We map the problem to a multi-armed bandit, where each possible K-conservative policy corresponds to an arm. To deal with the difficulties of optimizing the discounted cost over an infinite horizon, and the countably infinite arms that result from this mapping, we introduce approximation parameters ε and δ, and show that a modification of the well-known UCB1 policy [5] guarantees that the number of plays of arms that are not within (1 - (ε + δ)) of the optimal is bounded by a logarithmic function of time. In other words, we show that the time-averaged regret with respect to a (1 - (ε + δ))-optimal policy tends to zero.

1.1.2 Dynamic Multi-carrier Selection

Optimizing throughput is one of the central problems in wireless networking research. To make good use of the available wireless channels, the transmitter must allocate power and data rate efficiently. In this part, we study a simple yet fundamental rate allocation problem in which a transmitter transmits data over multiple channels owned by different carriers. The transmitter does not know the channel quality, and the corresponding carriers are selfish. The channels from the transmitter to each carrier are independent channels with two states, high or low, and the channel states are assumed to be i.i.d. Bernoulli random variables. To obtain the channel state information, the transmitter initiates auctions, and the carriers communicate to the transmitter a bid corresponding to the possible state of their respective channels. The transmitter responds to these bids by deciding the allocated power and data rate on each channel.
Given a fixed power, the transmitter can send data at a high rate or a low rate. When the transmitter sends data at a high rate, we assume that the transmission fails and nothing gets sent if the channel actually turns out to be bad. In this case, the transmitter levies a penalty on the corresponding carrier. Whenever data is successfully sent, however, it pays the corresponding carrier a fee.

There are two roles in this setting: the carriers and the transmitter. The carriers want to get as much reward as possible while avoiding penalties. Since the transmitter's rate allocation is a competitive resource that directly affects the carriers' utilities, the setting can be modeled as a non-cooperative game. On the other hand, the transmitter is the game designer: it can choose the reward and penalty settings in order to influence how the carriers play the game. The goal of the transmitter is to maximize throughput without knowledge of the carriers' channel states.

We first consider a simplified scenario in which two carriers compete and the bid is binary: High (H) or Low (L). This setting can be modeled as a two-player, non-cooperative game.

The transmitter's power and data rate allocation strategies are as follows: when both bids are low, the transmitter splits its power equally and sends data at a low rate R_0 over both channels; when both bids are high, it splits its power equally and sends data at a high rate R_1 over both channels; and when one bid is high and the other is low, it allocates the full power to the high bidder and sends data at a very high rate R_2 over that channel. When the low data rate is used, the transmission is always successful no matter what the channel state is. However, when the high or very high data rate is used, the transmission fails with some probability, and we assume that all the data in that transmission are lost.
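The bid-to-allocation rule just described is simple enough to state directly in code. This sketch treats R_0 < R_1 < R_2 and the power budget P as abstract parameters; the names are illustrative placeholders, not an implementation from the dissertation.

```python
def allocate(bid1, bid2, P, R0, R1, R2):
    """Map two binary bids ('H' or 'L') to a (power, rate) pair for each
    carrier, following the rule in the text: equal split at rate R0 for
    two low bids, equal split at rate R1 for two high bids, and full
    power at the very high rate R2 to a lone high bidder."""
    if bid1 == 'L' and bid2 == 'L':
        return (P / 2, R0), (P / 2, R0)
    if bid1 == 'H' and bid2 == 'H':
        return (P / 2, R1), (P / 2, R1)
    if bid1 == 'H':
        return (P, R2), (0.0, 0.0)    # carrier 2 gets nothing this slot
    return (0.0, 0.0), (P, R2)        # symmetric case: only bid 2 is high
```

Because a lone high bid captures the full power budget, each carrier's best bid depends on what the other bids, which is exactly why the setting is analyzed as a game.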
Whenever data is successfully sent, the transmitter pays the carrier a fee proportional to the rate obtained. For failed transmissions, however, the transmitter levies penalties on the carriers. The penalties are the parameters that the transmitter can tune to shape the carriers' behaviors.

Initially, we assume that the carriers both know each other's channel parameters, but the transmitter does not. We use the payoff matrix to determine the Nash equilibrium, and prove that there exists a way to set the penalty terms such that both carriers have dominant strategies. We also prove that the price of anarchy from the transmitter's point of view is at most 2, and that this is in a sense the best bound.

If the underlying channels' states are known, both carriers will play their dominant strategies if they have them. However, if the underlying channel status is unknown even to the carriers, then the carriers need to learn which action is more beneficial. Assuming that the underlying channel state is drawn from an unknown underlying distribution at each time slot, we show that modeling each carrier's choice of action as a multi-armed bandit leads to desirable results. In this study, we adapt the UCB1 algorithm [5], in which there are two arms for each carrier, each arm corresponding to an action: bidding high or bidding low. From the simulations, we find that the UCB1 algorithm gives performance close to the dominant strategy, and when both carriers use UCB1 to choose their strategies, it can give even better payoffs than playing the dominant strategy.

We then extend the study from binary bids to multi-bit bids, allow the number of carriers to be arbitrary, and make the power-rate model more flexible, allowing it to be based on any convex function. In this study, the transmitter is able to transmit data simultaneously over at most K parallel i.i.d. channels. The probability that a channel is in the high state is p, and the value of p differs across channels.
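The UCB1 adaptation for a single carrier can be sketched as follows: two arms, one per bid. The reward function below is a stand-in with invented numbers; in the actual game the rewards come from the transmitter's fees and penalties.

```python
import math
import random

def ucb1(reward_fn, n_arms=2, T=2000, seed=0):
    """Run UCB1 for T rounds. reward_fn(arm, rng) must return a reward
    in [0, 1]. After playing each arm once, the arm maximizing
    mean_i + sqrt(2 ln t / n_i) is chosen. Returns (means, counts)."""
    rng = random.Random(seed)
    counts = [0] * n_arms
    means = [0.0] * n_arms
    for t in range(1, T + 1):
        if t <= n_arms:
            arm = t - 1   # initialization: play every arm once
        else:
            arm = max(range(n_arms),
                      key=lambda i: means[i] + math.sqrt(2 * math.log(t) / counts[i]))
        r = reward_fn(arm, rng)
        counts[arm] += 1
        means[arm] += (r - means[arm]) / counts[arm]   # running mean update
    return means, counts

# Illustrative payoffs: bidding high (arm 1) is better on average here,
# so UCB1 should concentrate its plays on arm 1.
means, counts = ucb1(lambda arm, rng: rng.random() * (0.9 if arm == 1 else 0.4))
```

When both carriers run such a learner simultaneously, each one's reward distribution shifts as the other adapts, which is the regime explored in the Chapter 5 simulations.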
The transmitter allocates some power to a selected set of channels, and uses a high rate or a low rate on each of the selected channels. The rates change concavely with the power, and the high and low rates follow different concave functions of the power. When a low data rate is used, the transmission is always successful no matter what the channel state is. However, when the aggressive data rate is used, the transmission fails with probability (1 - p). The transmitter rewards the carriers for successful transmissions and penalizes them for failures.

We model the problem as an auction, in which the transmitter is the auctioneer and the carriers are the bidders. The objective of the transmitter is to efficiently use the channels to meet its traffic requirement. When the traffic is heavy, this is equivalent to maximizing the total data rate over multiple channels under a power constraint. The objective of the carriers is to maximize the expected payment from the transmitter.

We propose a payment mechanism using a convex piecewise linear function of the channel probabilities, and prove that bidding truthfully is always a preferable action for a carrier. We also prove that the throughput obtained by the transmitter approaches the maximum possible throughput under perfect information about the channel statistics as the number of bits per bid increases. Since bidding truthfully is always preferable, in contrast to much of the existing literature on competitive rate allocation, which typically needs multiple iterations to converge [6-9], our proposed mechanism is one-shot and does not require iterative convergence. Our proposed mechanism is a win-win for both customers (transmitters) and operators (carriers): the customers get better service (and/or lower payment), and the operators may potentially obtain larger revenues due to more customers and more efficient use of the channels.
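The truthfulness property can be illustrated with a toy version of such a payment rule. One standard construction (an illustrative assumption here, not necessarily the dissertation's exact mechanism) pays the carrier along the tangent line of a convex function g at its reported probability, evaluated at the realized outcome; by convexity, the expected payment is then maximized by reporting the true probability. A smooth convex g is used below for simplicity; a convex piecewise linear g works the same way via subgradients.

```python
def payment(report, outcome, g, gprime):
    """Tangent-line payment: for report q and outcome in {0, 1}, pay the
    value at `outcome` of the line touching the convex g at q."""
    return g(report) + gprime(report) * (outcome - report)

def expected_payment(report, p, g, gprime):
    """Expected payment when the channel succeeds with true probability p.
    Equals g(q) + g'(q)(p - q) <= g(p), with equality only at q = p."""
    return p * payment(report, 1, g, gprime) + (1 - p) * payment(report, 0, g, gprime)

g = lambda q: q * q        # illustrative smooth convex function
gprime = lambda q: 2 * q
p_true = 0.7
grid = [i / 100 for i in range(101)]
best_report = max(grid, key=lambda q: expected_payment(q, p_true, g, gprime))
# best_report lands on the true probability: truthful bidding maximizes
# the carrier's expected payment under this payment rule.
```

Discretizing the report to a fixed number of bits only rounds the truthful report to the nearest grid point, which is consistent with the throughput approaching the optimum as the number of bits per bid grows.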
The last claim is supported by simulations based on a dataset of real BS locations over a 2 km x 2 km area in London.

1.1.3 Transmission for Random Arrivals with Energy-Delay Tradeoff

In wireless networks, a key tradeoff is that between energy and delay. In this study, we formulate, study, and solve a fundamental problem pertaining to the energy-delay tradeoff in the context of transmissions on a time-slotted channel. In its general form, this problem consists of multiple independent nodes with arbitrary packet arrivals over time. At the beginning of each slot, every node decides whether to send one or more of its queued and newly arrived packets, or to defer some or all of them to a future slot. The greater the total number of packets sent in a given slot, the higher the energy cost (modeled as an arbitrary strictly convex function of the total number of simultaneously scheduled packets); on the other hand, deferral incurs a delay cost for each deferred packet. In this system, a centralized scheduler assigns the various arriving packets to slots in order to minimize a cost function that combines both the deferral penalty and the energy penalty.

In the first part of this study, we assume that the centralized scheduler has complete knowledge of the packet arrivals, and develop a backward greedy algorithm, which is proved to be optimal. We then focus on a more realistic causal scenario, in which the centralized scheduler has incomplete information about the arrivals and knows only the arrivals up to the current time slot, and develop an efficient online algorithm for this scenario, which is proved to be O(1)-competitive with respect to the offline optimal. In other words, our online algorithm guarantees a cost that is within a constant factor of the cost incurred by the offline optimal algorithm for any arbitrary sequence of packet arrivals. We run simulations to test the performance of the online algorithm.
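A minimal sketch of the cost structure makes the tradeoff tangible. Here the per-slot energy cost b^2 and the deferral weight d are illustrative assumptions (the formulation only requires a strictly convex, increasing, exponentially bounded energy cost), and the brute force handles a single batch of arrivals, not the full multi-arrival problem that the backward greedy algorithm solves.

```python
from itertools import product

def schedule_cost(batch, d, energy=lambda b: b * b):
    """Total cost of sending batch[t] packets in slot t: convex energy
    cost per slot plus d per packet per slot of deferral."""
    return sum(energy(b) + d * t * b for t, b in enumerate(batch))

def best_spread(n, d, horizon=None):
    """Brute-force the cheapest spread of n packets (all arriving at
    slot 0) over `horizon` slots. Exponential search: tiny cases only."""
    horizon = horizon or n
    best_cost, best_batch = None, None
    for batch in product(range(n + 1), repeat=horizon):
        if sum(batch) != n:
            continue
        c = schedule_cost(batch, d)
        if best_cost is None or c < best_cost:
            best_cost, best_batch = c, batch
    return best_cost, best_batch

# Small deferral weight: spreading one packet per slot beats a burst.
# Large deferral weight: sending all three packets at once is cheapest.
cheap_defer = best_spread(3, d=0.5)
dear_defer = best_spread(3, d=10)
```

The convexity of the energy term is what makes spreading attractive, and the deferral weight is what limits it; the online algorithm must balance the two without seeing future arrivals.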
Although the competitive ratio in principle depends on the cost function, in our simulations we find that this ratio is always less than 2.

Our formulation is general and spans diverse applications such as multi-packet reception from cooperative transmitters (in which it is assumed that a higher number of packets sent in the same slot incurs a higher energy cost), as well as multiple sender-receiver pairs employing interference mitigation strategies (such as successive interference cancellation), where there is likewise a higher energy cost for allowing multiple interfering packets in a given slot. Though in traditional communication systems the transmission power grows exponentially with the rate (the number of packets scheduled in a slot), as can be seen by inverting the well-known Shannon-Hartley capacity theorem, which relates rate as a logarithmic function of power, our model is in fact even more general, as the only requirement we impose on the energy cost is that it be strictly convex, increasing, and exponentially bounded.

1.2 Organization

The rest of the dissertation is organized as follows. We present some background material in Chapter 2 and a brief survey of relevant studies from the literature in Chapter 3. In Chapter 4, we study transmission control over a single 2-state Markovian channel. Chapter 5 and Chapter 6 are both about dynamic multi-carrier selection: Chapter 5 focuses on binary bids, while Chapter 6 is about multi-bit bids. In Chapter 7, we investigate the scheduling problem on a time-slotted channel with an energy-delay tradeoff. Finally, we conclude the dissertation and indicate some open directions for future work in Chapter 8.

Chapter 2
Background

Game theory, mechanism design, reinforcement learning, and online learning are some of the theoretical tools used in this dissertation. This chapter provides a brief background on these topics.
2.1 Game Theory

Game theory is a set of mathematical tools used to describe, model, and analyze strategic interaction among rational entities [10]. It originates from microeconomics and is used to predict the outcome of complex interactive decision situations. From the early 1990s, game theory was applied to networking problems including flow control, congestion control, routing, and pricing of Internet services. In recent years, there has been growing interest in applying game theory to model today's leading communications and networking issues in wireless networking, including power control and resource sharing [11, 12].

Typically, a game has three components: players, strategies, and payoff functions, represented by G = (N, (S_i)_{i∈N}, (u_i)_{i∈N}). Players are the game participants, denoted by N. In wireless networks, they are usually wireless nodes such as mobile devices or base stations, depending on the application. Usually, we label players by i = 1, 2, ..., n. Strategies specify what every player can do. We often use the notation S_i for the strategy set of player i, and S_{-i} to represent the strategy sets of all players other than player i. In wireless networks, strategies may include the choice of a modulation scheme, coding rate, transmit power level, flow control parameters, price of a channel, and so on. The third component, the payoff functions, is crucial for a game. It defines how each player evaluates every strategy profile and represents the preference relationship. To get a good evaluation of each strategy profile, a game player usually needs to take into account what it wants to do and what it should expect other players to do. A number is assigned to each possible outcome; this number is called the payoff (or utility), and is often denoted by u_i. A higher payoff represents a more desirable outcome [13].
In wireless networks, a player may prefer outcomes which yield higher throughput, lower drop rates or error rates, lower power consumption, and lower cost, though in many practical situations these goals may be in conflict. Designing appropriate payoff functions and modeling these preference relationships is one of the key challenges in game modeling for wireless networks [14].

In wireless networks such as sensor networks, pervasive computing systems, mesh networks, and ad hoc networks, a node typically has the following features: decentralized operation, self-configuration, and power/energy awareness [12]. Thus, decisions are usually made in a distributed fashion. Game theory, which models the interaction of autonomous agents, is helpful in analyzing such wireless networks, since nodes in these networks are autonomous agents. They make their own decisions, set parameters, and change modes of operation on their own.

In some cases, wireless nodes may make decisions to improve the overall performance of the network, in which case the wireless nodes cooperate. In many other cases, they may behave selfishly, caring only about their own interests. Sometimes, they may even behave maliciously in order to ruin the performance of other nodes. The latter two cases, which are modeled as non-cooperative games, are our focus.

In a non-cooperative game, players typically have totally or partially conflicting interests. For example, consider a number of wireless nodes attempting to transmit data in a common area. One node's transmission can be interfered with by other nodes' transmissions. In order to successfully transmit data, a wireless node may increase its power, which causes more severe interference to other nodes.
If all transmitting nodes behave similarly, every transmitting node will transmit at its maximal power level to improve its own performance; however, doing so increases the overall interference in the system, which, in turn, adversely impacts the performance of all the involved wireless nodes. In this example, the interference presents a conflict, which affects the nodes' decisions.

To represent a two-player non-cooperative finite game, a payoff matrix may be used. The rows and columns of the matrix represent the strategies of each player. Each element in the matrix is a pair of numbers which represents the payoffs for the two players when a certain combination of strategies is used.

In a non-cooperative game, there are typically two kinds of strategies for a player: pure strategies and mixed strategies. Given a set of strategies, if a player chooses to take one strategic action with probability 1, then that player is playing a pure strategy. A mixed strategy is a probability distribution over pure strategies. Each player's randomization is statistically independent of those of his opponents, and the payoffs to a profile of mixed strategies are the expected values of the corresponding pure strategy payoffs.

In wireless networks research, the best resource allocation strategy is one that maximizes the long-term expected aggregate throughput (system throughput) across users and across time, given the resource constraints. The solution of such an optimization problem is called the social optimum. However, since players in a non-cooperative game are self-interested, it is difficult or sometimes impossible to achieve the social optimum, since doing so requires players to cooperate completely with each other to maximize the system throughput regardless of individual gain. In non-cooperative games, we typically consider the Nash equilibrium [15], which is defined as a steady state where no player in the game would unilaterally change his strategy if he is selfish and rational.
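To make the payoff-matrix representation concrete, the following sketch finds the pure-strategy Nash equilibria of a two-player game by checking unilateral deviations. The example game (a prisoner's-dilemma-style power game) is illustrative, not taken from the dissertation.

```python
def pure_nash_equilibria(payoffs):
    """payoffs[i][j] = (u1, u2) when player 1 plays row i and player 2
    plays column j. A cell is a pure-strategy Nash equilibrium if
    neither player can gain by deviating unilaterally."""
    rows, cols = len(payoffs), len(payoffs[0])
    equilibria = []
    for i in range(rows):
        for j in range(cols):
            u1, u2 = payoffs[i][j]
            # Player 1 deviates over rows; player 2 deviates over columns.
            best1 = max(payoffs[k][j][0] for k in range(rows))
            best2 = max(payoffs[i][k][1] for k in range(cols))
            if u1 >= best1 and u2 >= best2:
                equilibria.append((i, j))
    return equilibria

# Strategy 0 = low power, 1 = high power: each node prefers high power
# unilaterally, so both end up at (high, high) with lower payoffs.
game = [[(3, 3), (1, 4)],
        [(4, 1), (2, 2)]]
print(pure_nash_equilibria(game))  # [(1, 1)]
```

Note that the equilibrium (1, 1) yields payoff 2 to each player while (0, 0) would yield 3, illustrating the gap between the Nash equilibrium and the social optimum discussed below.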
In most cases, a Nash equilibrium is not the same as the social optimum. In general, the uniqueness and existence of a Nash equilibrium are not guaranteed. According to Nash, a mixed-strategy Nash equilibrium always exists in finite games, but a pure-strategy Nash equilibrium may not. The uniqueness and existence of Nash equilibria is an important topic for researchers. If a Nash equilibrium exists, two metrics are used to measure the performance of the game: the price of anarchy (PoA) [16] and the price of stability (PoS) [17]. They are defined as

PoA = (value of the worst Nash equilibrium) / (value of the optimal),
PoS = (value of the best Nash equilibrium) / (value of the optimal).

2.2 Mechanism Design

Mechanism design has a close relationship with game theory. In fact, it can be considered a field of game theory which studies designing the rules of a game or a system to achieve a specific outcome, even though each agent is self-interested. Unlike traditional game theory studies, which take a mechanism as given and study the equilibrium and efficiency of the game, mechanism design considers the problem from an inverse point of view. In mechanism design, the goal function is given, while the mechanism is unknown. In other words, we can treat mechanism design as solving problems in which the mechanism is the value of a variable. Mechanism design is not just about analyzing how a given mechanism works, but rather about finding efficient mechanisms whose outcomes agree with those specified by a given goal function, when information about the environment is distributed among self-interested agents [18]. Such agents will act strategically and may hold private information that is relevant to decisions. In many economic scenarios, the Nash equilibrium is not efficient, and in such cases, mechanism design theory can be used to identify other, more efficient institutions.
The study of mechanism design begins with the work of Hurwicz [19], who defines a mechanism as a communication system in which participants send messages to each other and/or to a "message center," and where a pre-specified rule assigns an outcome (such as an allocation of goods and services) for every collection of received messages. Within this framework, markets and market-like institutions could be compared with a vast array of alternative institutions. Later, Hurwicz introduced one of the key concepts in mechanism design: incentive compatibility [20], which allows the analysis to incorporate the incentives of self-interested participants. In particular, it enables a rigorous analysis of economies where agents are self-interested and have relevant private information [21].

Mechanism design has three components: agents, decisions, and private information (i.e., preferences). The agents are the players who participate in the game. Among these agents, there is a special one, called the principal, who chooses the payoff structure such that the agents have an incentive to behave according to the rules. Each agent has some private information; as a result, agents have different preferences, and such preferences are reflected as a "type," denoted θ ∈ Θ, where Θ is the space of types. An agent makes a decision (i.e., which type to report to the principal) based on a decision rule, which is a function of its type. The procedure works as follows: first, the principal commits to a mechanism, which defines the payoff structure y; second, each agent reports, possibly dishonestly, a type θ̃; third, the mechanism is executed and the agent receives the outcome: the goods allocation and the money transfer, denoted as t(θ) [22].

In mechanism design, an abstract term, "social choice" [23], is introduced, which is simply an aggregation of the preferences of the different participants towards a single joint decision.
A social choice function is defined as the mapping of the truthful types to the decisions and transfers, denoted as f(θ) = (d(θ), t(θ)), while a mechanism is the mapping of the reported types to the decisions and transfers, denoted as y(θ̃) = (d(θ̃), t(θ̃)). Mechanism design is distinguished from a game in that the consequence of a profile of messages is an outcome rather than a vector of utility payoffs. Once the preferences of the individuals are specified, a mechanism induces a game. Since in the mechanism design setting the preferences of individuals vary, this distinction between mechanisms and games is critical [24].

In computer networks, mechanism design is widely used in the routing of messages, scheduling of tasks, allocation of power, and so on. In an environment where there are multiple owners of resources or requests, an efficient mechanism should take into account the different preferences of the different owners. It should function well assuming that each participant is selfish and rational and acts strategically.

2.2.1 Auctions

One of the important branches of mechanism design is auction theory. In an auction, there are typically two roles: the auctioneer and the bidders. The auctioneer is typically the seller, who has some items such as goods or services to sell, while the bidders are the buyers. Usually, the seller does not know the buyers' true valuations of the item, and the bidders do not know each other's valuations. The auctioneer predefines the bidding rule, and uses the bids to determine the bidders' valuations. The bidders place their bids, and the auctioneer sells the item to the highest bidder and gets paid based on the predefined rule [25]. In economic theory, an auction may refer to any mechanism or set of trading rules for exchange. Auction theory has been widely applied to wireless networks.
For example, when allocating spectrum for mobile telephony or the license to build a transmission line, we want to design an auction that will allocate the licenses to those who value them most highly. Some of these auctions have raised several billion dollars for governments around the globe [26]. An auction can also be used to regulate the participating parties' actions: a regulator can choose instruments (e.g., a price or a quantity, a subsidy or a tax, issuing a license to operate) so that the regulated parties will choose actions (e.g., how much to produce or how much to charge) that promote efficiency (e.g., market clearing, which equates demand and supply).

Traditionally, there are four types of auctions: first-price sealed-bid auctions, second-price sealed-bid auctions (a.k.a. Vickrey auctions, in honor of William Vickrey, who wrote the first game-theoretic analysis of auctions [27]), open ascending-bid auctions (a.k.a. English auctions), and open descending-bid auctions (a.k.a. Dutch auctions). In first-price sealed-bid auctions, the bidders place their bids in sealed envelopes and simultaneously give the envelopes to the auctioneer. The auctioneer opens the envelopes and sells the item to the highest bidder. The highest bidder pays the amount of his bid. The procedure of the second-price auction is similar to the first-price sealed-bid auction; however, the winner pays the amount of the second-highest bid. In open ascending-bid auctions, the bidders keep increasing their bids; each of them stops when he is no longer willing to pay more than his current bid. The procedure continues until there is only one bidder left, and this highest bidder wins the item and pays the amount of his bid. The open descending-bid auction is the inverse of the ascending-bid auction.
The auctioneer starts the open descending-bid auction with a sufficiently high price to deter all bidders, then keeps decreasing the price progressively until some bidder is willing to buy the item, and the item is sold to this bidder at the current price.

Though the types of auctions appear to be four, there are actually only two different kinds of auctions [25]. The open descending-bid auction and the first-price sealed-bid auction are equivalent. In the open descending-bid auction, the auctioneer lowers the price from its high starting point until someone accepts the bid and pays the current price. Even though the bids are open, the bidders learn nothing while the auction is running. For each bidder i, there is a first price b_i at which he is willing to accept the item, paying the price b_i. Thus, such a process is equivalent to the first-price sealed-bid auction, in which the highest bidder wins the item and pays the amount of his bid. The open ascending-bid auction and the second-price sealed-bid auction are equivalent. In the open ascending-bid auction, the bidders drop out as the price increases. The winner is the last remaining bidder, and he pays the price at which the second-to-last bidder drops out.

Since the valuation of the item is private information, a bidder can overbid, underbid, or bid truthfully. An important topic in auction theory is to design mechanisms that ensure truthfulness of the bids. Second-price sealed-bid auctions ensure truthfulness, since truthful bidding is weakly dominant [28]. To prove this, we need to show that deviating from truthful bidding does not provide any improvement of a bidder's payoff. Assume that the true valuation of bidder i for the item is v_i, but he bids b_i instead. There are two possible deviations: overbidding and underbidding. In both cases, the only thing that gets affected is whether this bidder wins or loses, but never how much he pays in the event that he wins.
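The second-price payment rule described above can be illustrated with a toy sketch; the bidder valuations below are illustrative, and the sketch simply computes one bidder's payoff under different bids while the others stay fixed.

```python
def second_price_payoff(values, bids, bidder):
    """Payoff of `bidder` in a second-price sealed-bid auction:
    the highest bid wins and pays the second-highest bid."""
    winner = max(range(len(bids)), key=lambda i: bids[i])
    if winner != bidder:
        return 0.0  # losers pay nothing and get nothing
    second = max(b for i, b in enumerate(bids) if i != winner)
    return values[bidder] - second

values = [10.0, 7.0, 5.0]
truthful = second_price_payoff(values, [10.0, 7.0, 5.0], 0)  # wins, pays 7
overbid  = second_price_payoff(values, [12.0, 7.0, 5.0], 0)  # still wins, still pays 7
underbid = second_price_payoff(values, [6.0, 7.0, 5.0], 0)   # now loses
print(truthful, overbid, underbid)  # 3.0 3.0 0.0
```

In this toy run, overbidding leaves the payment unchanged and underbidding only risks losing a profitable win, in line with the argument that follows.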
Assuming other bidders do not change their bids, suppose bidder i bids b_i > v_i, and this overbidding changes his loss into a win. In order for this to happen, the original highest bid b_j has to be between v_i and b_i; bidder i's payoff received from this overbidding is v_i − b_j ≤ 0, which indicates that overbidding will not provide any higher payoff. Similarly, we can prove that underbidding does not provide any higher payoff either. As a result, no matter what other bidders bid (i.e., overbidding, underbidding, or truthful bidding), bidding truthfully is always a good idea in such an auction.

2.3 Reinforcement Learning

In wireless networks, due to the dynamic nature of the wireless channels and the unbalanced knowledge about channel quality between the transmitter and the receiver, it is quite common for a transmitter to take transmission actions with incomplete information about the channels. A transmitter usually processes some type of experimental data and dynamically adjusts its transmission parameters such as power or rate. How the transmitter collects the data and processes it determines the most suitable algorithm to use. Such algorithms typically measure the states as outputs, estimate the model parameters, and output control signals. In reinforcement learning, the algorithm processes rewards, estimates some value function, and outputs the action [29].

Reinforcement learning is a kind of learning which maps situations to actions in order to maximize a numerical reward [30]. Without knowing which actions to take, the learner needs to try different actions and discover which actions yield the most reward. Actions may not only affect the immediate reward; they may also influence all subsequent actions and rewards. Reinforcement learning is not learning from examples, but rather from interactions.
In practice, it is not always easy to get a large example pool which covers all possible situations; in such a scenario, we can use reinforcement learning, in which the learner learns from its experience. Every time the learner takes an action, it gets some feedback by obtaining some reward. Such experience (i.e., the history of actions and rewards) helps determine the current action [29].

The learner is called the agent, and everything outside the agent that it interacts with is called the environment. The agent selects actions to maximize the rewards presented by the environment. Suppose that time is slotted: t = 0, 1, 2, .... We assume that at time slot t, the agent receives a state s_t from the environment, and chooses an action a_t; as a consequence of the action a_t, the agent receives a numerical reward r_t and moves to a new state s_{t+1} at the next time slot t + 1. At each time slot, the agent implements a mapping from states to probabilities of selecting each possible action. This mapping is called the agent's policy, denoted π_t, where π_t(s, a) is the probability that a_t = a given s_t = s. Reinforcement learning methods specify how the agent changes its policy as a result of its experience such that the reward received by the agent is maximized over the long run [30].

2.3.1 MDP and POMDP Problems

Reinforcement learning is widely used to solve Markov Decision Process (MDP) [31] problems and Partially Observable Markov Decision Process (POMDP) [32] problems.

An MDP is a discrete-time stochastic control process; it can be represented by a 5-tuple (S, A, P, R, γ), where S is the set of states, A is the set of actions, P is the set of conditional transition probabilities between states, R : S × A → ℝ is the reward function, and γ ∈ [0, 1] is the discount factor. Assuming time is slotted, at each slot the process is in some state s, and the agent chooses an action a ∈ A which is available in state s.
As a consequence of this action, the process randomly moves to a new state s′, and the agent gets some reward R(s, a). The probability that the process moves into its new state s′ is influenced by the chosen action a, and such probabilities are given by the transition function p(s′|s, a). In other words, the next state s′ depends not only on the current state s, but also on the agent's action a. However, s′ depends only on s and a, and is conditionally independent of all previous states and actions. The underlying state transitions of an MDP satisfy the Markov property. MDP problems are typically solved by linear programming or dynamic programming [33].

A POMDP is a generalization of a Markov decision process (MDP). A POMDP models an agent's decision process in which it is assumed that the internal state of the environment is dynamic and determined by an MDP, but the agent cannot directly observe the underlying internal state. A POMDP can be represented by a 7-tuple (S, A, P, R, Ω, O, γ), where S is a set of states, A is a set of actions, P is a set of conditional transition probabilities between states, R : S × A → ℝ is the reward function, Ω is a set of observations, O is a set of conditional observation probabilities, and γ ∈ [0, 1] is the discount factor. Unlike an MDP, in which the agent can directly observe the environment's state, in a POMDP the agent is not always able to observe the internal environment state, and has to make decisions under uncertainty about the true internal environment state. Because the observation process produces uncertainty about the underlying internal state, a POMDP introduces two additional sets: the observation set Ω and its corresponding set of conditional observation probabilities O. An MDP does not require such sets because the agent can always observe the true state of the environment with certainty.
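The dynamic-programming approach to solving an MDP mentioned above can be sketched as value iteration; the two-state chain below is an illustrative example, not a model from the dissertation.

```python
def value_iteration(states, actions, p, r, gamma=0.9, eps=1e-8):
    """Value iteration for an MDP (S, A, P, R, gamma).
    p[(s, a)] maps next states to transition probabilities;
    r[(s, a)] is the immediate reward."""
    v = {s: 0.0 for s in states}
    while True:
        v_new = {}
        for s in states:
            # Bellman optimality update: maximize over actions.
            v_new[s] = max(
                r[(s, a)] + gamma * sum(pr * v[s2] for s2, pr in p[(s, a)].items())
                for a in actions
            )
        if max(abs(v_new[s] - v[s]) for s in states) < eps:
            return v_new
        v = v_new

# Two states, two actions: "stay" is rewarding in state 1, and "go"
# from state 0 reaches state 1 with probability 0.8.
states, actions = [0, 1], ["stay", "go"]
p = {(0, "stay"): {0: 1.0}, (0, "go"): {1: 0.8, 0: 0.2},
     (1, "stay"): {1: 1.0}, (1, "go"): {0: 1.0}}
r = {(0, "stay"): 0.0, (0, "go"): 1.0, (1, "stay"): 2.0, (1, "go"): 0.0}
v = value_iteration(states, actions, p, r)
print(round(v[1], 2))  # 20.0, i.e. 2 / (1 - 0.9) from staying in state 1
```

The update is the fixed-point iteration of the Bellman optimality operator, which converges geometrically for γ < 1.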
In a POMDP, instead of keeping a single value of the current state, the agent must maintain a probability distribution over the set of possible states, based on the set of observations and observation probabilities and the underlying MDP. This probability distribution over the current state is encoded as an information vector b = [b_1, b_2, ...], which is called the belief. The belief is itself a discrete-time, continuous-state Markov process. To calculate the optimal control for a POMDP, it is crucial to understand how the beliefs are updated. Assume that time is slotted; at time t, the environment is in some state s ∈ S, where S is the set of states. The agent takes an action a ∈ A, which causes the environment to transition to state s′ with probability p(s′|s, a). At the same time, the agent receives an observation o ∈ Ω, which depends on the new state of the environment s′ with probability O(o|s′, a), and receives some reward R(s, a). Let b(s, t) denote the belief that the environment is in state s at slot t. After taking action a and observing o, the new belief that the environment is in state s′ is

b(s′, t + 1) = η O(o|s′, a) Σ_{s∈S} P(s′|s, a) b(s, t),

where η = 1 / Pr(o|b, a) is a normalization constant and

Pr(o|b, a) = Σ_{s′∈S} O(o|s′, a) Σ_{s∈S} P(s′|s, a) b(s, t).

As we can see, the belief update requires knowledge of the previous belief state b, the action taken a, and the current observation o. To solve POMDP problems, the history of actions and observations does matter.

The objective in both MDP and POMDP problems is typically to choose the "right" actions such that the expected total discounted reward is maximized, which is defined as E[Σ_{t=0}^∞ γ^t R_t], where R_t is the reward obtained at time t.

2.3.2 Multi-armed Bandit Problems

The multi-armed bandit is a very useful tool in reinforcement learning, especially for single-agent reinforcement learning. The multi-armed bandit problem takes its name from playing slot machines. Consider a player playing a slot machine with n arms, where each arm gives a different reward with different winning and losing probabilities; the
Considering a player who is playing a slot machine with n arms, and each arm will gives you a different reward with different winning and losing probabilities, the 29 objective of the player is to figure out the best arm which gives the greatest reward. Each time the player plays, he hasn arms to pull. Initially, the player may need to pull each arm several times and try to compute a runing average of each arm. He can greedily pull the arm which seems to give the greatest reward. However, since the rewards given by each arm are stochastic, the arm which looks the best so far may turn out to give you really bad rewards later on. Thus, as a strategic player, instead of keeping playing the same arm, the player should greedily pull the best so far, while still gives other arms some chance to be played. The former action is called exploitation, which is based on the knowledge already acquired. The latter action is called exploration, which is to try new actions to further increase knowledge. How to balance these two choices is known as the exploitation versus exploration dilemma. In a typical multi-armed bandit problem, a policy’s performance is measured in terms of its “regret”, defined as the gap between the the expected reward that could be obtained by an omniscient user that knows the parameters for the stochastic rewards generated by each arm and the expected cumulative reward of that policy. It is of interest to charac- terize the growth of regret with respect to time as well as with respect to the number of arms/players. Intuitively, if the regret grows sublinearly over time, the time-averaged regret tends to zero. 30 Some early work in multi-armed bandit include [34–37]. J. Gittins and D. Jones [34], P. Whittle [36] and P. Varaiya et al. [37] have shown that the maximum-value N- armed bandit problem, an N-dimensional Markov decision problem, can be reduced to a sequence of one-dimensional stopping problem. 
In each of the latter problems, one finds for each state i of an arm its index m_i = max_{τ≥1} E[R_τ] / (1 − E[a^τ]), where τ ≥ 1 is a stopping time, 0 < a < 1 is the discount factor, and R_τ is the present value of the rewards earned in periods 1, ..., τ when the arm is chosen in those time slots. In each time slot, one selects the arm with the largest index in its current state.

T. Lai and H. Robbins studied the classic non-Bayesian version of the problem in [38]. There are K independent arms, each generating stochastic rewards that are i.i.d. over time. The player is unaware of the parameters of each arm, and must use some policy to play the arms in such a way as to maximize the cumulative expected reward over the long term. Maximizing the cumulative expected reward is equivalent to minimizing the regret, defined as the gap between the expected reward that could be obtained by an omniscient user who knows the parameters of the stochastic rewards generated by each arm and the expected cumulative reward of the policy. T. Lai and H. Robbins prove that the regret for classic multi-armed bandit problems indexed by a single real parameter can be made o(n^a) for all a > 0. In other words, the cumulative expected reward asymptotically approaches the optimal.

Watkins proposed the ε-greedy strategy to solve the bandit problem [39]. The ε-greedy strategy consists of choosing a random arm with frequency ε, and otherwise choosing the arm with the highest estimated mean, the estimation being based on the rewards observed thus far. The ε-greedy algorithm is simple and widely used in practice. However, the constant exploration factor ε prevents the strategy from getting arbitrarily close to the optimal arm. P. Auer et al. propose the UCB1 algorithm, which computes an Upper Confidence Bound (UCB) for each arm, in [5]; it achieves logarithmic regret uniformly over time.
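The ε-greedy strategy described above can be sketched as follows; the Bernoulli arms and the parameter values are illustrative choices, not those of [39].

```python
import random

def epsilon_greedy(arm_probs, horizon=10000, eps=0.1, seed=0):
    """Play Bernoulli arms with the epsilon-greedy rule: explore a random
    arm with probability eps, otherwise exploit the arm with the highest
    empirical mean reward. Returns the time-averaged reward."""
    rng = random.Random(seed)
    counts = [0] * len(arm_probs)
    sums = [0.0] * len(arm_probs)
    total = 0.0
    for _ in range(horizon):
        if rng.random() < eps:
            arm = rng.randrange(len(arm_probs))  # exploration
        else:
            # Unplayed arms get +inf so they are tried at least once.
            means = [sums[i] / counts[i] if counts[i] else float("inf")
                     for i in range(len(arm_probs))]
            arm = max(range(len(arm_probs)), key=means.__getitem__)  # exploitation
        reward = 1.0 if rng.random() < arm_probs[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        total += reward
    return total / horizon

# With arms of success probability 0.2 and 0.8, the average reward should
# approach 0.8 minus a small penalty from the constant exploration rate.
print(epsilon_greedy([0.2, 0.8]))
```

The constant exploration rate is exactly why the average reward stays bounded away from the best arm's mean, which is the limitation noted above and the motivation for index policies such as UCB1.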
The key idea of UCB1 is to select the arm with the largest value of x̄_i + √(2 ln n / n_i), where x̄_i is the average reward obtained on arm i, n_i is the number of times arm i has been played up to the current time slot, and n is the total number of plays so far. The term x̄_i reflects exploitation, while √(2 ln n / n_i) reflects exploration.

2.4 Online Algorithms and Competitive Analysis

When we analyze an algorithm which generates some output, the traditional approach is to assume that all the inputs are given in advance. However, in many real problems, such an assumption does not hold. In many problems, inputs appear online, meaning that only the present and past inputs are available; the future inputs are not accessible. An online algorithm has to generate an output without knowledge of future inputs. Formally, many online problems can be described as follows.

An online algorithm A is presented with a request sequence σ = σ(1), σ(2), ..., σ(m). The algorithm A has to serve each request online without knowledge of the future requests (i.e., when serving request σ(t), it does not know any request σ(t′) with t′ > t). Serving requests incurs cost, and the goal is to serve the entire request sequence with as small a total cost as possible [40].

To evaluate the performance of an online algorithm, Sleator et al. [41, 42] suggest comparing the online algorithm to an optimal offline algorithm; this is called competitive analysis. The idea of competitiveness is to compare the cost of an online algorithm to the cost of the offline algorithm. An offline algorithm is an omniscient algorithm that knows the entire input request sequence, so its total cost is minimal. The better an online algorithm approximates the optimal, the more competitive this algorithm is. To quantify how competitive an online algorithm is, let C_A(σ) be the cost incurred by A and let C_OPT(σ) denote the cost of the offline optimal.
For a deterministic algorithm, it is called c-competitive if there exists a constant a such that

C_A(σ) ≤ c · C_OPT(σ) + a    (2.1)

for all request sequences σ. The factor c is called the competitive ratio of A.

For a randomized online algorithm A, the competitive ratio is defined with respect to an adversary. It is assumed that the adversary always knows the description of A. However, based on how much information the adversary can observe from the online algorithm, an adversary can be categorized into one of the following three categories:

Oblivious Adversary: the oblivious adversary has to generate a complete request sequence in advance, before any requests are served by the online algorithm.

Adaptive Online Adversary: the adaptive online adversary may observe the online algorithm and generate the next request based on the algorithm's randomized answers to all previous requests. It cannot observe the random choices made by the online algorithm on the present or future requests.

Adaptive Offline Adversary: the adaptive offline adversary generates a request sequence adaptively.

A randomized online algorithm A is called c-competitive against any oblivious adversary if there is a constant a such that, for all request sequences σ generated by an oblivious adversary, E[C_A(σ)] ≤ c · C_OPT(σ) + a. The expectation is taken over the random choices made by A. For adaptive online or adaptive offline adversaries, denoted ADV, let E[C_A] and E[C_ADV] be the expected costs incurred by A and ADV in serving a request sequence generated by ADV. A randomized online algorithm A is called c-competitive against any adaptive online or adaptive offline adversary if there is a constant a such that, for all adaptive online or adaptive offline adversaries, E[C_A] ≤ c · E[C_ADV] + a, where the expectation is taken over the random choices made by A.

Chapter 3
Related Work

In this chapter, we give an overview of the previous research and literature that are relevant to our studies.
We also compare and contrast our work with the relevant existing literature.

3.1 Scheduling and Resource Allocation using Water-filling

In a wireless network, the wireless channel experiences time-varying fading. To achieve optimal power allocation or maximum throughput under a power constraint, a commonly used approach is water-filling over time [43–46]. Since power affects the rate, the optimal adaptive technique should use a variable rate based on channel conditions. With high power, the transmission time is shorter; however, the energy consumed is higher. Bingham proposes a finite-granularity multicarrier loading algorithm which assigns bits successively to the subcarriers until the target rate is reached in [47]. Chow et al. improve the computational complexity of water-filling by iteratively adjusting the system performance margin until convergence [48]. Yu and Cioffi solve a simple two-band channel partition and power allocation problem using a water-filling algorithm in [49]. Kim et al. propose a joint subcarrier and power allocation algorithm in [50], in which the optimal power is calculated in a water-filling fashion.

In our studies, the transmitter also performs water-filling, either across channels (Chapter 5 and Chapter 6) or across slots for a single channel (Chapter 4 and Chapter 7). However, many prior works do water-filling given the channel state information. In our problem settings in Chapter 4 and Chapter 7, with autonomous selfish carriers, truthfulness is no longer trivial. In Chapter 7, both transmission time and energy consumption are taken into account when doing the water-filling, while in previous works [44–46] the objective is mainly to minimize the total transmission power, either without considering delay or considering delay only through a deadline constraint.

3.2 Scheduling and Resource Allocation using Game Theory

In wireless networks, the nodes are typically distributed and self-configured.
Each node manages its own resources such as power and rate, and one node's transmission can become another node's interference. In such an environment, in which the resources are usually shared and the nodes' actions can affect each other, it is quite natural to model wireless resource allocation problems as games. We summarize below the related literature on modeling the allocation of resources such as power, channels, and spectrum as games, and on how game theory is applied to solve scheduling and resource allocation problems in wireless networks.

3.2.1 Power Control

Distributed power control is one of the most widely studied problems in game-theoretic wireless networking research. In power control problems, performance is usually measured by the Bit Error Rate (BER). Typically, increasing the transmission power increases the SINR and consequently decreases the BER, while decreasing the transmission power decreases the SINR and increases the BER. However, as mentioned before, transmitting at the maximum power level does not necessarily guarantee a low BER: if the other transmitting nodes do the same, a node's SINR will go down due to the more severe interference caused by the other nodes.

Goodman et al. did some of the initial work introducing game-theoretic tools into wireless networks, more specifically CDMA networks, in the late 90s [51–53]. They model distributed power control as a non-cooperative game. This initial work provided insights into how communication systems should be designed. MacKenzie and Wicker [54] further expanded the idea of non-cooperative game modeling for power control. In [54], utility is defined as a function of BER per unit power. In such a system, if a user's transmit power is too high, he wastes precious power while having little impact on his bit error rate. The users attempt to make the best possible choices under the consideration that the others are doing the same thing.
Gunturi and Paganini in [55] formulate a non-cooperative power control game which handles base station assignment, hand-off, and power control in a multicell environment. Yu et al. formulate a multiuser power control problem as a non-cooperative game, show the existence and uniqueness of a Nash equilibrium for a two-player version of the game, and propose a water-filling algorithm which reaches the Nash equilibrium efficiently. Han et al. in [56] introduce a virtual referee to the multi-cell power control problem to monitor and improve the outcome of non-cooperative competition among distributed users. In [57], Agarwal et al. propose using different transmit power levels for different neighbors; the power level is chosen such that the minimum signal power required for acceptable performance is achieved. In [58], a non-cooperative game for power control in frequency-selective channels is investigated, and sufficient conditions for the existence and uniqueness of the equilibrium are given. Mertikopoulos et al. study a power allocation game for orthogonal multiple access channels, prove that there exists a unique equilibrium of this game when the channels are static, and show that a simple distributed learning scheme based on the replicator dynamics converges to the equilibrium exponentially fast [59].

3.2.2 Medium Access Control

The medium access control problem, with many users contending for access to a shared communications medium, lends itself naturally to a game-theoretic formulation. In medium access control games, selfish users try to maximize their own throughput by obtaining an unfair share of access to the channel, consequently decreasing the ability of other users to access the channel. Zander is one of the earliest researchers to study the medium access control problem using game theory. In [60], he studies a network which uses a slotted ALOHA multiple access scheme.
The network contains a jammer and network nodes, both of which are subject to an average power constraint. He uses a game-theoretic approach which models the network as a two-person constant-sum game. However, the game he considers is cooperative in nature and does not consider contention among the selfish nodes themselves. MacKenzie and Wicker study the slotted Aloha medium access control problem as a non-cooperative game in which users contend for the channel [61–64]. In their work, they show that the tools of game theory lead to strategies in which optimal behavior emerges "naturally" from the selfish interests of the agents and the rules of the game, show that stability is achieved with low attempt rates in a selfish Aloha system, and prove the existence of an equilibrium in a multipacket-reception slotted Aloha system. In [65], Cagalj et al. study the cheating problem in CSMA/CA networks using a game-theoretic approach, and argue that with multiple cheaters, the game can be designed in such a way that the cheaters have an incentive to cooperate with each other. In [66, 67], Chen et al. propose a game-theoretic model for contention control. They characterize Nash equilibria of random access games, study their dynamics, and propose distributed algorithms to achieve Nash equilibria.

3.2.3 Routing

Another problem that is well modeled by game theory is routing in wireless networks. Here, the players are usually the source nodes, or sometimes source/destination pairs. The action set of each player is a set of possible paths from the source to the destination. Preferences in this game can take several forms, such as the end-to-end delay for a packet traversing the chosen route (a short delay is preferred to a longer delay), the reliability of a link (a reliable link is preferred to an unreliable one), and so on.
Take end-to-end delay as an example: since a link is usually shared by multiple routes, the more flows that use a given link, the higher the delay. This dependency forms a routing game. One interesting phenomenon in routing games is the Braess paradox [68]: once a given network has reached its equilibrium, adding additional network links can, contrary to intuition, sometimes actually slow down the traffic [69]. In [23], Roughgarden and Tardos show that if we add edges to a network with an equilibrium pattern of traffic, there is always an equilibrium in the new network whose travel time is no more than 4/3 times as large.

Another problem studied in wireless routing is the effect of selfish nodes on the forwarding operation. The establishment of multi-hop routes in an ad hoc network relies on nodes forwarding packets for one another. However, a selfish node, in order to conserve its limited energy resources, could decide not to participate in the forwarding process by switching off its interface. If all nodes alter their behavior in this way, acting selfishly, this may lead to the collapse of the network. For a single-shot game, the equilibrium solution for selfish nodes is that none of them forwards packets. However, in a repeated-game setting, a node tends to behave in a socially beneficial manner in order to receive benefits in the later stages. Some example studies examining the effect of selfish nodes on forwarding include [70–74].

3.2.4 Dynamic Spectrum Access

Game theory is also used in dynamic spectrum access and resource management in cognitive radio networks. In [75], Nie and Comaniciu propose a game-theoretic adaptive channel allocation scheme to analyze the behavior of cognitive radio networks. The players of the game are the self-interested wireless nodes, and their strategies are defined in terms of channel selection.
The authors show that the channel allocation problem can be formulated as a potential game; as a result, it converges to a deterministic channel allocation Nash equilibrium point. In [76], a game framework for spectrum sharing is proposed and a bound on the price of anarchy is analyzed. In [77], Niyato and Hossain propose an equilibrium pricing scheme in which the QoS performance degradation of the primary users is considered as the cost of offering spectrum access to the secondary users. The authors analyze the problem as a Bertrand game and obtain the Nash equilibrium, which provides the optimal pricing. In [78], Ghosh and Sarkar model the problem as a competitive game in which each primary needs to select the price of its channel with knowledge of its own channel state but not its competitors'. Secondary users select channels based on the states and the prices. The authors prove that there exists a unique symmetric Nash equilibrium strategy in this game setting.

Game theory is also used in our studies. In Chapter 5, we model the carriers' self-interested behaviors as playing a non-cooperative game. However, unlike most of the works in the existing literature, which focus on proving the existence, uniqueness, or convergence of equilibria, our focus is on issues of information asymmetry between conflicting parties, such as the transmitter and the receivers, and on the design of appropriate penalties levied by the transmitter to ensure that the carriers' self-interested behaviors do not hurt performance too much.

3.3 Scheduling and Resource Allocation using Mechanism Design

Mechanism design focuses on issues of incentives and strategies. In wireless networks, resources such as spectrum and power are valuable and limited. In such environments, an auction provides a way to efficiently allocate the scarce resources.
Sellers can improve revenue by pricing dynamically based on buyer demands, while buyers benefit since auctions assign resources to the buyers who value them the most. Because of this, there is an ongoing effort to propose efficient auction mechanisms for scheduling and resource allocation in wireless networks [9, 79–82]. In [79], Gandhi et al. propose a conflict-free spectrum allocation which maximizes auction revenue and spectrum utilization by using a compact and highly expressive bidding language: each buyer expresses its demand as the amount of spectrum desired at each particular per-unit price using a continuous concave piecewise-linear demand curve. Huang et al. study power allocation among a group of spectrum users using auctions [80, 81]. Each user's transmitted power is uniformly spread across the entire available bandwidth controlled by the manager. The manager announces a price (for either received power or SINR), the users submit bids for the amount of power they wish to purchase, and the manager allocates power proportionally to the bids received. The authors prove the existence and uniqueness of the Nash equilibrium and give an iterative, distributed power allocation algorithm which converges to the Nash equilibrium. In [9], Iosifidis et al. study mobile data offloading using a double auction. The authors propose an iterative double-auction mechanism which ensures that the mobile network operators maximize their offloading while the local wireless access points minimize the offloading costs. A major advantage of their proposed algorithm is that it does not require full information about the mobile network operators or the APs, and it incurs minimal communication overhead. In [82], Gao et al. study spectrum auctions with multiple auctioneers and multiple bidders, and propose a mechanism in which each auctioneer systematically raises the trading price and bidders subsequently choose one auctioneer for bidding.
The authors analytically show that the proposed algorithm converges to an equilibrium with maximum spectrum utilization for the whole system. Our problem in Chapter 6 also uses an auction mechanism to find the optimal allocation. However, unlike many of the previous studies, which typically rely on some iterative learning process for the algorithm to converge to the Nash equilibrium or the optimal, our proposed mechanism ensures truthful bidding, so there is no iterative learning algorithm whose convergence must be ensured.

In mechanism design, the intervention framework has been proposed to provide incentive schemes such that self-interested entities will play a game in favor of the mechanism designer [6–8]. Van der Schaar et al. propose to use an intervention device which is able to monitor the actions of users and affect the resource usage. An intervention manager strategically chooses an intervention device to maximize a system performance metric such as the sum of the utilities of all users. Our proposed mechanism in Chapter 6 also uses the intervention framework to influence the users' actions. However, unlike these works, in which an optimal intervention device is selected from a set of intervention devices, and the algorithms to find the optimal, or a good enough, intervention device typically require several iterations to converge, we have only a fixed intervention device/rule, which is predefined in the contract.

3.4 Scheduling and Resource Allocation using Reinforcement Learning

In this section, we first review some of the previous work on transmission control over Markovian channels, where the problem is modeled as a POMDP. We then present related work which uses reinforcement learning, more specifically multi-armed bandits, to solve such problems.
Johnston and Krishnamurthy [83] consider the problem of minimizing the transmission energy and latency associated with transmitting a file across a Gilbert-Elliott fading channel, formulate it as a POMDP, identify a threshold policy for it, and analyze it for various parameter settings. Karmokar et al. [84] consider optimizing multiple objectives (transmission power, delay, and drop probability) for packet scheduling across a more general finite-state Markov channel. Motivated by opportunistic spectrum sensing, several recent studies have explored optimizing sensing and access decisions over multiple independent but stochastically identical parallel Gilbert-Elliott channels, in which the objective is to select one channel at each time, showing that a simple myopic policy is optimal [85, 86].

Multi-armed bandits provide a fundamental approach for optimal sequential decision making and learning in uncertain environments. Gai et al. have intensively studied wireless networking optimization problems using multi-armed bandits. Their contributions can be categorized in three main parts: learning with linear rewards, learning for stochastic water-filling with linear and nonlinear rewards, and learning in decentralized settings. In [87–89], the authors study the problem where, at each time slot, a set of multiple random variables can be selected, subject to a general arbitrary constraint on weights associated with the selected variables, and the reward is a linearly weighted combination of these selected variables. Simply treating each feasible weighted combination of elements as a distinct arm is slow due to the exponential number of arms. The authors exploit the linear nature of the dependencies and propose the Learning with Linear Rewards (LLR) and Matching Learning for Markovian Rewards (MLMR) policies, which greatly reduce storage and computation while still having logarithmic regret.
The proposed algorithms can be applied to solve problems such as maximum weight matching in bipartite graphs (which is useful for channel allocation in cognitive radio networks), shortest path and minimum spanning tree computation, channel selection, and so on. In [90], the authors study power allocation over stochastically time-varying channels with unknown gain-to-noise-ratio distributions using multi-armed bandits, and show that the proposed algorithm has regret that grows polynomially in the number of channels and logarithmically in time. In [91], the authors present distributed learning policies whose regret is uniformly logarithmic in time.

In [92], Kalathil et al. study a decentralized spectrum access control problem where users want to access a set of channels. Each user can access only one channel at a time, and the users cannot coordinate with each other. The authors model this as a decentralized multi-armed bandit problem with distinct arms for each player, and propose an index-type policy, dUCB_4, which achieves (at most) near-O(log² T) growth in expected regret, non-asymptotically. In [93], Lai et al. study multi-user spectrum access control in a highly dynamic environment where the availability of each channel is modeled as a Markov chain. The authors propose a heuristic policy based on histogram estimation of the unknown parameters which provides a linear order of system regret. In [94], Anandkumar et al. study distributed learning and cooperative allocation among multiple secondary users to maximize cognitive system throughput. The authors propose schemes which result in logarithmic regret. They also prove a lower bound for any learning scheme that is asymptotically logarithmic in the number of slots, and demonstrate that their schemes achieve asymptotic order optimality in terms of regret.
Liu and Zhao also study a decentralized multi-armed bandit problem with multiple distributed players in [95, 96]. A decentralized policy is proposed which achieves the optimal order under a fairness constraint. The proposed policy is based on Time-Division Fair Sharing (TDFS) of the M best arms, where no pre-agreement on the time-sharing schedule is needed, and its order optimality is proven under general reward, observation, and collision models. In [97], Dai et al. consider a non-Bayesian version of the sensing problem where the channel parameters are unknown, and show that when the problem has a finite-option structure, online learning algorithms can be designed to achieve near-logarithmic regret. In [98], Nayyar et al. consider the same sensing problem over two non-identical channels and derive the optimal threshold-structure policy for it. For the non-Bayesian setting, they show a mapping to a countably infinite-armed multi-armed bandit, and also prove logarithmic regret with respect to an approximately optimal policy, similar to the approach adopted in this work for a different formulation.

Our studies focus on scenarios in which the information is incomplete; multi-armed bandits provide a handy way to obtain a solution which converges to the optimal achievable under complete information. In Chapter 4, when the underlying state transition matrix is unknown, we map the problem to a multi-armed bandit where each possible policy corresponds to an arm. In Chapter 5, in the case of unknown stochastic payoffs, we consider the use of a multi-armed bandit-based learning algorithm.

3.5 Scheduling and Resource Allocation using Online Algorithms

Online algorithms and competitive analysis are widely used in areas such as resource allocation in operating systems [42, 99, 100], distributed data management [101–103], robotics [104, 105], scheduling [106, 107], and load balancing [108–111].
In Chapter 7, our design and analysis of the online algorithm are closely related to dynamic speed scaling [112–116]. In a speed scaling problem, the objective is to minimize a combination of the total (possibly weighted) flow (i.e., the number of unfinished jobs integrated over time) and the total energy used. Speeding up reduces the processing time of the flows; however, it consumes more energy. This creates a flow versus energy trade-off. In [113], Bansal et al. study a problem which minimizes the integral of a linear combination of total flow plus energy. The authors propose an online algorithm which schedules the unfinished job with the least remaining unfinished work and runs at a speed given by the inverse of the energy function. Using amortized competitive analysis [115], the authors prove that this online algorithm is 3-competitive. Andrew et al. improve the result to 2-competitive in [114] by using a different potential function in the amortized competitive analysis. Different from the works in dynamic speed scaling, in which the speed can be updated at any time, our problem is discrete, as the number of packets scheduled in each slot has to be an integer.

Load balancing in data centers is also related to our studies in Chapter 7 [109–111]. Wierman et al. model an Internet-scale system as a collection of geographically diverse data centers and consider two types of costs: operating costs (energy costs plus delay costs) and switching costs. The offline problem, in which the scheduler has all future workload information, is modeled as a convex optimization problem, and the optimal is achieved by solving the convex problems backwards in time [110]. Based on the structure of the offline optimal, the authors develop efficient online algorithms for dynamic right-sizing in data centers. Similarly, our problem also has energy costs and delay costs, but no switching cost.
The way we obtain the offline optimal is also by scheduling backward in time; however, we have a much simpler algorithm: greedily scheduling packets based on the marginal cost. Another difference is that Wierman et al.'s work is in the continuous domain, while ours is discrete.

Chapter 4
Transmission over a Markovian Channel

4.1 Overview

Communication over wireless channels is affected by fading conditions, interference, path loss, etc. To better utilize wireless channels, a transmitter needs to adapt transmission parameters, such as data rate and transmission power, according to the channel state.

In this chapter¹, we mathematically analyze a communication system operating over a 2-state Markov channel (known as the Gilbert-Elliott channel) in a time-slotted fashion. A high state allows the transmitter to transmit data at a high data rate, while a low state only allows a small data rate. At each slot, the transmitter has to decide whether to send data at an aggressive or a conservative rate based on prior observations. The former incurs the risk of failure but reveals the channel's true state, while the latter is a safe but unrevealing choice. We use the expected total discounted reward to measure performance, and the objective is to take the "right" actions such that the expected total discounted reward is maximized.

¹This chapter is based on the work in [117].

When the channel transition probabilities are known, this problem can be modeled as a partially observable Markov decision process (POMDP). This formulation is very closely related to a recent work pertaining to betting on Gilbert-Elliott channels [118], which considers three choices and shows that a threshold-type policy consisting of one, two, or three thresholds, depending on the parameters, is optimal.
In our setting, we show that the optimal policy always has a single threshold that corresponds to a K-conservative policy, in which the transmitter adopts the conservative approach for K steps after each failure before reattempting the aggressive strategy. Unlike [118], however, our focus is on the case when the underlying state transition matrix is unknown.

When the underlying channel statistics are unknown, the problem of finding the optimal strategy is equivalent to finding the optimal choice of K. We map the problem to a multi-armed bandit, where each possible K-conservative policy corresponds to an arm. To deal with the difficulties of optimizing the discounted cost over an infinite horizon, and the countably infinite number of arms that results from this mapping, we introduce approximation parameters ε and δ, and show that a modification of the well-known UCB1 policy guarantees that the number of times that arms not within (1 − (ε + δ)) of the optimal are played is bounded by a logarithmic function of time. In other words, we show that the time-averaged regret with respect to a (1 − (ε + δ))-optimal policy tends to zero.

4.2 Model

For our problem setting, we consider the Gilbert-Elliott channel, which is a Markov chain with two states: high (denoted by 1) and low (denoted by 0). The transition probability matrix is given as:

\[
P = \begin{bmatrix} P_{00} & P_{01} \\ P_{10} & P_{11} \end{bmatrix}
= \begin{bmatrix} 1-\lambda_0 & \lambda_0 \\ 1-\lambda_1 & \lambda_1 \end{bmatrix}. \tag{4.1}
\]

Define α = λ_1 − λ_0. We assume that the channel is positively correlated, which means α ≥ 0. At the beginning of each time slot, the transmitter chooses one of the following two actions:

Sending Conservatively (SC): the transmitter sends data at a low rate. No matter what the channel state is, it can successfully transmit a small number of bits. We assign a reward R_0 to this action. Since the transmission is always successful, the transmitter cannot learn the channel state if this action is chosen.

Sending Aggressively (SA): the transmitter sends data at a high rate.
If the channel is in the high state, we consider the transmission successful and the transmitter gets a high reward R_1 (> R_0). If the channel is in the low state, sending at a high rate causes high error and drop rates; we consider the transmission a failure and the transmitter incurs a constant penalty C. We assume that if the transmitter sends aggressively, it learns the state of the channel. In other words, we assume that when the channel is in the low state, an aggressive transmission strategy will encounter and detect failure.

Because the state of the channel is not directly observable when sending conservatively, the problem we consider in this chapter turns out to be a POMDP. In [119], it has been shown that a sufficient statistic for making an optimal decision in such a POMDP is the conditional probability that the channel is in the high state given all past actions and observations. We call this conditional probability the belief, represented by b_t = Pr[S_t = 1 | H_t], where H_t is the history of all actions and observations before the t-th time slot. When sending aggressively, the transmitter learns the state of the channel, so the belief is 0 if the channel is in the low state or 1 if the channel is in the high state.

The expected reward is given by:

\[
R(b_t, A_t) =
\begin{cases}
R_0 & \text{if } A_t = SC, \\
b_t R_1 - (1-b_t) C & \text{if } A_t = SA,
\end{cases} \tag{4.2}
\]

where b_t is the belief that the channel is in the high state and A_t is the action taken by the transmitter at time t.

In this chapter, we use the expected total discounted reward to make decisions, defined as the weighted sum of the expected rewards over a potentially infinite horizon, as shown in Eq. (4.3):

\[
E\left[\sum_{t=0}^{\infty} \beta^t R(b_t, A_t) \,\middle|\, b_0 = p\right], \tag{4.3}
\]

where b_0 is the initial belief, R(b_t, A_t) is the expected reward at time t, and β is a constant discount factor. β is smaller than 1, which gives higher weight to the current and near-future rewards and lower weight to rewards farther in the future.
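To make the model concrete, here is a small Python sketch (my own illustration, with arbitrary example values for R_0, R_1, C, λ_0, λ_1, and β; the thesis keeps them symbolic) of the belief propagation T(p) = λ_0(1−p) + λ_1 p and the one-step expected reward of Eq. (4.2):

```python
def expected_reward(b, action, R0=1.0, R1=5.0, C=2.0):
    """One-step expected reward R(b_t, A_t) from Eq. (4.2)."""
    if action == "SC":
        return R0
    return b * R1 - (1.0 - b) * C  # action == "SA"

def belief_update(b, lam0=0.2, lam1=0.8):
    """Belief propagation T(p) = lam0*(1-p) + lam1*p = alpha*p + lam0."""
    return lam0 * (1.0 - b) + lam1 * b

def discounted_value(actions, b0, lam0=0.2, lam1=0.8, beta=0.9):
    """Expected total discounted reward of a fixed (open-loop) action sequence.
    Propagating T(b) suffices here because each reward is affine in the belief,
    so the expectation over observation outcomes is captured by the mean belief."""
    total, b, disc = 0.0, b0, 1.0
    for a in actions:
        total += disc * expected_reward(b, a)
        b = belief_update(b, lam0, lam1)
        disc *= beta
    return total
```

For example, with these values, discounted_value(["SC", "SC"], 0.5) evaluates to 1.0 + 0.9 · 1.0 = 1.9. Note this sketch only evaluates fixed action sequences; the optimal (closed-loop) policy of Section 4.3 conditions each action on the current belief.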
4.3 Optimal Policy with Complete Channel Information

In this section, we discuss the optimal policy when the transition probabilities are known.

4.3.1 Threshold Structure of the Optimal Policy

A policy, denoted by π, is defined as a rule which maps belief probabilities to actions. We use V_β^π(p) to represent the expected total discounted reward the transmitter can get given that the initial belief is p and the policy is π:

\[
V_\beta^\pi(p) = E\left[\sum_{t=0}^{\infty} \beta^t R(b_t, A_t) \,\middle|\, b_0 = p\right]. \tag{4.4}
\]

The aim is to find a policy achieving the greatest expected total discounted reward, denoted by V_β(p):

\[
V_\beta(p) = \max_\pi \{ V_\beta^\pi(p) \}. \tag{4.5}
\]

According to [120, Thm. 6.3], there exists a stationary policy π* which makes V_β^{π*}(p) = V_β(p); thus V_β(p) can be calculated by the following equation:

\[
V_\beta(p) = \max_{A \in \{SA, SC\}} \{ V_{\beta,A}(p) \}, \tag{4.6}
\]

where V_{β,A}(p) is the greatest expected total discounted reward obtained by taking action A when the initial belief probability is p. V_{β,A}(p) can be expressed as:

\[
V_{\beta,A}(p) = R(p, A) + \beta\, E\left[ V_\beta(p') \,\middle|\, b_0 = p, A_0 = A \right], \tag{4.7}
\]

where b_0 is the initial belief, A is the action taken by the transmitter, and p′ is the new belief after taking action A.

Sending conservatively: by taking this action, the belief changes from p to T(p) = λ_0(1−p) + λ_1 p = αp + λ_0; hence,

\[
V_{\beta,SC}(p) = R_0 + \beta V_\beta(T(p)). \tag{4.8}
\]

Sending aggressively: by taking this action, the channel state becomes known, so

\[
V_{\beta,SA}(p) = \left(p R_1 - (1-p) C\right) + \beta \left[ p V_\beta(\lambda_1) + (1-p) V_\beta(\lambda_0) \right]. \tag{4.9}
\]

This formulation turns out to be similar to the problem formulation in [118], except for two main differences: (1) we do not have a separate action for sensing; (2) we introduce a penalty when sending fails.

Theorem 1. The optimal policy has a single-threshold structure:

\[
\pi^*(p) =
\begin{cases}
SC & \text{if } 0 \le p < \rho, \\
SA & \text{if } \rho \le p \le 1,
\end{cases} \tag{4.10}
\]

where p is the belief and ρ is the threshold.

We omit the proof here because it follows in a straightforward manner from the results in [118]. Since the penalty does not change the linearity of the function V_{β,SA}(p), according to [118, Thm.
1, Thm. 2], V_β(p) is convex and nondecreasing; thus the optimal solution follows a threshold structure. However, unlike [118], which shows the existence of multiple thresholds, there is a single threshold for our problem setting.

4.3.1.1 Closed-Form Expression of the Threshold

The threshold ρ is the solution of the following equation:

\[
R_0 + \beta V_\beta(T(\rho)) = V_{\beta,SA}(\rho). \tag{4.11}
\]

There are two possible scenarios for T(ρ):

If T(ρ) ≤ ρ, we have V_β(T(ρ)) = V_{β,SC}(T(ρ)); substituting into Eq. (4.11), we get:

\[
\rho = \frac{R_0 + C}{(R_1 + C) + \beta V_\beta(\lambda_1) - \frac{\beta R_0}{1-\beta}}. \tag{4.12}
\]

Otherwise, we have V_β(T(ρ)) = V_{β,SA}(T(ρ)); substituting into Eq. (4.11), we get Eq. (4.13):

\[
\rho = \frac{(1-\beta\lambda_1) R_0 + \beta\lambda_0 R_1 + (1-\beta)(1-\beta\alpha) C + \beta(\beta\alpha-1)(1-\beta) V_\beta(\lambda_0)}
{(1-\beta\alpha)\left(R_1 + (1-\beta) C + \beta(\beta-1) V_\beta(\lambda_0)\right)}. \tag{4.13}
\]

4.3.1.2 V_β(λ_0) and V_β(λ_1)

To calculate V_β(λ_1) and V_β(λ_0), there are also two possible scenarios:

If λ_1 ≤ ρ, then since λ_0 ≤ λ_1 ≤ ρ,

\[
V_\beta(\lambda_1) = V_\beta(\lambda_0) = \frac{R_0}{1-\beta}. \tag{4.14}
\]

Otherwise, λ_1 > ρ and V_β(λ_1) = V_{β,SA}(λ_1); using Eq. (4.9), we get

\[
V_\beta(\lambda_1) = \frac{\lambda_1 R_1 - (1-\lambda_1) C + \beta(1-\lambda_1) V_\beta(\lambda_0)}{1-\beta\lambda_1}. \tag{4.15}
\]

To get V_β(λ_0), we adapt [118, Thm. 4]:

\[
V_\beta(\lambda_0) = \max_{A \in \{SA,SC\}} \{ V_{\beta,A}(\lambda_0) \}
= \max\{ R_0 + \beta V_\beta(T(\lambda_0)),\; V_{\beta,SA}(\lambda_0) \} \tag{4.16}
\]
\[
= \max\left\{ \frac{1-\beta^N}{1-\beta} R_0,\; \max_{0 \le n \le N-1} \left\{ \frac{1-\beta^n}{1-\beta} R_0 + \beta^n V_{\beta,SA}(T^n(\lambda_0)) \right\} \right\}.
\]

Since N is arbitrary and 0 ≤ β < 1, when N → ∞, we have

\[
V_\beta(\lambda_0) = \max_{n \ge 0} \left\{ \frac{1-\beta^n}{1-\beta} R_0 + \beta^n V_{\beta,SA}(T^n(\lambda_0)) \right\}. \tag{4.17}
\]

Using Eq. (4.9), Eq. (4.15), and Eq. (4.17), we can get

\[
V_\beta(\lambda_0) = \max_{n \ge 0} \left\{ \frac{\frac{1-\beta^n}{1-\beta} R_0 + \beta^n \left( \delta_n R_1 + (1-\beta)\delta_n C - C \right)}{1 - \beta^{n+1}\left[1 - (1-\beta)\delta_n\right]} \right\}, \tag{4.18}
\]

where δ_n = T^n(λ_0)/(1−βλ_1) = (1−α^{n+1})π_S/(1−βλ_1), and π_S = λ_0/(1−α) is the stationary probability of the high state (the fixed point of T).

4.3.2 K-Conservative Policies

In this section, we discuss the K-conservative policy, where K is the number of time slots to send conservatively after a failure before sending aggressively again. The Markov chain corresponding to the K-conservative policy is shown in Fig. 4.1.

Figure 4.1: K-conservative policy Markov chain.

The states are the number of time slots for which the transmitter has sent conservatively since the last failure. There are K + 2 states in this Markov chain.
State K corresponds to that the transmitter has already sent conservatively forK time slots, and it will send aggressively next time slot. If the transmitter sends aggressively and succeeds, it goes to stateSA and continues to send aggressively at the next time slot; otherwise it goes back to state 0. The probability that transmitter stays in stateSA is 1 . The transmitter has to waitK time slots before sending aggressively again, so the probabilities from statei to statei + 1 is always 1 when 0i<K. There are K + 2 states, each state corresponds to a belief and an action; belief and action determine the expected total-discounted reward. Thus given K, there are K + 2 different expected total-discounted rewards. Theorem 2. The threshold policy structure is equivalent to aK opt -conservative policy structure, whereK opt is the number of time slots that a transmitter sends conservatively after a failure before sending aggressively again. If S >,K opt =dlog (1 (1) 0 )e1, otherwiseK opt =1. 62 Proof. Whenever the transmitter sends aggressively but fails, it goes back to sending conservatively stage. The belief after a failure is 0 , and it changes with time according to formula: T n ( 0 ) = T (T n1 ( 0 )) = 0 ( 1 n+1 1 ); wheren is the number of time slots that the transmitter has sent conservatively since last time failure. If S , T n ( 0 ) increases when n increases, whenn!1, T n ( 0 )! S . The optimal policy is always sending conservatively,K opt =1. If S > , then there exists a finite integerK opt which makesT Kopt1 ( 0 ) < and T Kopt ( 0 ), K opt =dlog (1 (1) 0 )e 1: (4.19) 4.4 Online Learning with Unknown Channel State Transition Probabilities In this section, we will discuss how to find the optimal policy if the underlying channel’s transition probabilities are unknown. To findK opt , we use the idea of mapping each K- conservative policy to a countable multi-armed bandits of countable time horizon. Now there are two challenges: (1). 
The number of arms can be infinite. (2). To get the true total 63 discounted reward, each arm requires to be continuing played until time goes to infinity. To address these two challenges, we weaken our objective to find a suboptimal which is an (1 ( +)) approximation of the optimal arm instead. Theorem 3 and Theorem 4 address the two challenges respectively. We define (1)-optimal arm as the arm which gives (1)-approximation of the best arm no matter what the initial belief is. Let arm SC correspond to the always sending conservatively policy, orK opt =1. Theorem 3. Given an and boundB on, there exists aK max , such that8KK max , the best arm in the arm setC =f0; 1;:::K;SCg is an (1)-optimal arm. Proof. IfK >K opt orK opt =1, the optimal arm is already included in the arm sets. IfK <K opt <1, suppose that transmitter has already sent conservatively for n time slots, letk opt =K opt n,k =Kn, andC 0 =R 1 +C + 1 (R 1 R 0 ), V k opt (p)V k (p) (4.20) = [R 0 1 kopt 1 + kopt V SA (T kopt (p))] [R 0 1 k 1 + k V SA (T k (p))] < k [V SA (T kopt (p))V SA (T k (p))] < k (T Kopt ( 0 )T K ( 0 ))(R 1 +C +( R 1 1 R 0 1 )) = k K+1 (1 KoptK ) S C 0 < K+1 C 0 <B K+1 C 0 : 64 LetK max =log B C 0 1, whenKK max ,V k opt (p)V k (p)<. Theorem 4. Given an, there exists aT max such that8T T max , an arm for the finite horizon total discounted reward up to time T is at most away from the infinite horizon total discounted reward. Proof. E[ 1 X t=0 t R i (t)jb 0 =p]E[ Tmax X t=0 t R i (t)jb 0 =p] (4.21) = E[ 1 X t=Tmax+1 t R i (t)jb 0 =p]< Tmax+1 R 1 1 : WhenTdlog (1) R2 e 1, E[ P 1 t=0 t R i (t)jb 0 =p]E[ P Tmax t=0 t R i (t)jb 0 =p]<. We define period as time interval between arm switches, A-reward as the average (1)-finite horizon approximation total discounted reward at one period, regret as the number of time slots during which the transmitter does not use an (1 ( +))-optimal policy. We design the UCB-Period (UCB-P) algorithm as shown in Algorithm 1. 
It is similar to UCB1 algorithm, but the time unit is period and the A-rewards are accumulated for each period. 65 Algorithm 1 Deterministic policy: UCB-P Initialization: L( K max + 2) arms: arm 0, , arm (L 2) and the SC arm. Play each arm for T ( T max ) time slots, then keep playing the arm until the arm hits a transmission failure. Get an initial A-reward A =A(i); (i = 0; 1;L 2;SC) for each armi. Let n i = 1(i = 0; 1;L 2;SC) be the initial number of periods armi has been played, andn =L be the initial number of periods played so far. for periodn = 1; 2; do Select the arm with highest value of (1) A i +C R 1 +C + q 2ln(n) n j . Play the selected arm for at least a period, then switch arms when it hits a transmission failure. Update the average A-reward,n i of the selected arm andn. end for Theorem 5. The regret of Algorithm UCB-P is bounded byO(L(L +T )ln(t)), where t is the number of time slots passed so far. Proof. The procedure is similar to UCB1, [121, Thm.1 ] can be adapted. A-reward is within the range of [ C 1 ; R 1 1 ], where the left boundary corresponds to the transmitter sending aggressively but failing every time slot, and the right boundary corresponds to the transmitter sending aggressively and succeeding every time slot. We normalize the A-reward to be in the range of [0, 1], UCB1 algorithm shows that the number of time slots that selects non-optimal arm is bounded byO((L 1)ln(n)), where L is the number of arms and n is the overall number of plays done so far. The best arm is a (1 ( +))-optimal arm. If any other non-best arm ~ K hits SA states at theT th time slots, the transmitter keeps playing that arm until sending fails, and the time playing the arm can be larger than T. Since the rewardR 1 is the best reward the 66 transmitter can get, these time slots do not count towards the regret. 
However, if such a ~ K th arm hits a transmission failure just beforeT th time slot, the transmitter needs to send conservatively for at least ~ K more time slots. Since ~ KL, the arm can contribute regret for at most (L +T ) time slots. Thus, we will use (L +T ) to bound time slots generating regret in one period. The number of plays selecting non-optimal arms in UCB1 is bounded by O((L 1)ln(n)). In our problem, the number of periods playing non-best arm is bounded by O((L1)ln(n)), wheren is the number of periods,n< t T . Each non-best arm is played at mostO((L+T )ln(n)) time slots. In total, the number of time slots playing non-best arms is bounded byO((L 1)(L +T )ln( t T )) =O(L(L +T )ln(t)). Note that besides the best arm, some of the non-best arm in the arm sets may also give (1 ( +)) approximation of the real expected total discounted reward, this only makes the regret smaller.O(L(L + T )ln(t)) is still an upper bound of the regret. More specifically, takingL = K max + 2 andL =T max , the regret can be bounded byO(K max (K max +T max )ln(t)). 4.5 Simulations We start by analyzing the known underlying transition matrix case. We setR 0 = 1,R 1 = 2,C = 0:5, = 0:75, and select 5 groups of transition probabilities, each corresponding to a different threshold, or equivalently, a different K opt -conservative policy. The first 67 corresponds to a scenario that the optimal policy is always sending conservatively. The other 4 correspond to different K opt -conservative policies. The optimal Ks are shown in Table 4.1. In Fig. 4.2, the x axis represents different K-conservative policies and the y axis represents expected total discounted rewards. 5 curves correspond to different transition probabilities. The expected total discounted rewards get maximum whenK = K opt . Take theK opt = 4 curve as an example,whenK < 4, the total discounted reward increases when K increases and whenK > 4, the total discounted reward decreases when K increase. 
The expected total discounted rewards get maximum whenK = 4. 0 1 K opt 0.36 0.91 0.5446 1 0.26 0.86 0.5060 2 0.16 0.96 0.4597 3 0.16 0.91 0.4553 4 0.01 0.61 0.5918 1 Table 4.1: The optimal strategy table Next, we consider the unknown transition probability matrix case. ForK6=1 sce- narios, if is bounded by 0.8, taking = 0:02 and = 0:02, we get K max = 26 and T max = 20. For the simulations, we takeL = 30,T = 100. We run the simulations with different 0 and 1 and measure the percentage of time playing the (1 ( +))-optimal arm. Fig.4.3 is the simulation results running UCB-P algorithm. We can see when time 68 Figure 4.2: Expected total discounted reward for K-conservative policies goes to infinity, the percentage of time playing (1 ( +)) optimal arm approaches 100%. Figure 4.3: Percentage of time that a 1 ( +) optimal arm is selected over time by UCB-P algorithm 69 UCB-P algorithm, although mathematically proven to have logarithmic regret when T!1, doesn’t work that well in practice since it convergence slowly when the differ- ences between arms are small. Thus, we use UCBP-TUNED algorithm, which the bound of UCB-P algorithm is tuned more finely. UCB1-TUNED [121] is adapted in our UCBP- TUNED algorithm. We use V k (s) def = ( 1 s s =1 ( (1)A k; +C R 1 +C ) 2 ) (4.22) (1) A k +C R 1 +C + r 2 lnn s ; as an upper confidence bound for the variance of arm k, in which k is the arm index, and s is the number of periods that arm k is played during the first n periods,A k; is the A-reward that arm k played theth time. We replace the upper confidence bound p 2 ln(n)=n j of policy UCB-P with s ln(n) n j minf1=4;V j (n j )g; (4.23) and rerun the simulations. From Fig. 4.4, we can see that UCBP-TUNED algorithm converges much faster than UCB-P algorithm. 
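As a concrete sketch of the tuned index, the following adapts the standard UCB1-TUNED exploration width to normalized rewards in [0, 1]; the helper names are ours, and the form follows the variance bound of Eq. (4.22)-(4.23) only up to the A-reward normalization constants.

```python
import math

def tuned_width(rewards, n_total):
    """Exploration width of UCB1-TUNED for one arm.

    rewards: normalized rewards observed for this arm so far (each in [0, 1]);
    n_total: total number of plays (here: periods) across all arms.
    """
    s = len(rewards)
    mean = sum(rewards) / s
    # Empirical variance plus its own confidence term: the V_j(s) bound.
    v = sum(r * r for r in rewards) / s - mean ** 2 + math.sqrt(2 * math.log(n_total) / s)
    return math.sqrt(math.log(n_total) / s * min(0.25, v))

def ucb1_width(s, n_total):
    """Plain UCB1 width, for comparison."""
    return math.sqrt(2 * math.log(n_total) / s)

# For low-variance arms the tuned width is strictly tighter, which is what
# speeds up convergence when the arms' A-rewards are close together.
obs = [0.51, 0.49, 0.50, 0.52, 0.48]
print(tuned_width(obs, 100), "<", ucb1_width(len(obs), 100))
```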
Figure 4.4: Percentage of time that a (1 − (ε + δ))-optimal arm is selected over time by the UCBP-TUNED algorithm

4.6 Summary

This chapter discusses the optimal policy for transmitting over a Gilbert-Elliott channel so as to maximize the expected total discounted reward. If the underlying channel transition probabilities are known, the optimal policy has a single threshold, and the threshold determines a K-conservative policy. If the underlying channel transition probabilities are unknown but a bound B on the channel parameter is known, we relax the requirement and show how to learn a (1 − (ε + δ))-optimal policy. We designed the UCB-P algorithm, and the simulation results have shown that the percentage of time the (1 − (ε + δ))-optimal arm is selected approaches 100% as n → ∞.

Chapter 5
Dynamic Multi-carrier Selection: Binary Bid

5.1 Overview

Optimizing throughput is one of the central problems in wireless networking research. To make good use of the available channels, the transmitter needs to adapt its transmission parameters, such as power and data rate, to the channel conditions. However, the channel information is revealed at the receiver (carrier) side, and the transmitter relies on feedback from the carriers to perform power and rate allocation. To the transmitter, the channel information is incomplete. In the following two chapters, we study a simple yet fundamental problem in which a transmitter is able to transmit data over multiple i.i.d. channels owned by different carriers. When the transmitter has data to send, it initiates an auction, and each carrier replies with a bid indicating its channel quality. The transmitter then allocates power and data rate according to the bids. The transmitter pays the carriers to use their channels, and the carriers are self-interested entities who compete with each other for the transmitter's payments. In such an environment, it is not obvious that the carriers will reveal their channel quality truthfully.
If they can make more money by lying about the channel quality, they will do so. In this setting, there are two roles: the transmitter and the carriers. The transmitter strategically designs the payment policy to influence the carriers' behavior; thus, we consider the transmitter the game designer. The transmitter's objective is to maximize throughput so as to satisfy its traffic requirement as far as possible. The carriers adjust their bids based on the payment policy and their true channel quality parameters; they can be considered the game players. The objective of a carrier is to maximize its expected payment. In this chapter¹, we consider a scenario in which two carriers compete to carry data from a transmitter in exchange for payment (see the illustration in Fig. 5.1). The two channels from the transmitter to the carriers are independent, each with two states: high or low. The channel states are assumed to be i.i.d. Bernoulli random variables. Initially we assume that the carriers both know each other's channel parameters, but the transmitter does not. At each time slot, the carriers communicate to the transmitter a binary bid corresponding to the possible state of their respective channels. The transmitter responds to these bids by deciding whether to send aggressively (at a high or very high rate) or conservatively (at a low rate) on one or both channels. Specifically, when both carriers bid low, the transmitter sends data at a low rate R0 over both channels, and when both carriers bid high, the transmitter splits its power to send data at a high rate R1 over both channels. When one carrier bids low and the other bids high, the transmitter sends data at a very high rate R2 over the high-bid channel.

¹ This chapter is based on the work in [122].

Figure 5.1: Illustration of the competitive uplink carrier selection and rate allocation problem
When the sender sends data at a high or very high rate, we assume that if the transmission channel actually turns out to be bad, the transmission fails and nothing gets sent. In this case, the sender levies a penalty on the carriers. But whenever data is successfully sent, it pays the carrier a fee proportional to the rate obtained. Since the transmitter's rate allocation is a competitive resource that directly affects the carriers' utilities, the setting can be modeled as a two-player, non-cooperative game. We initially assume that the underlying channel state parameters are known by both carriers. We use the payoff matrix to analyze the Nash equilibria of the game, and the price of anarchy to analyze its efficiency. We prove that there is a way to set the penalty terms such that both carriers have dominant strategies and the data carried by the two carriers is at least 1/2 of the optimal; in other words, the price of anarchy from the transmitter's point of view is at most 2. If the underlying channels' state parameters are known, we can assume that the two carriers will play their dominant strategies if they have one. However, if the underlying channel statistics are unknown, the carriers need to learn which action is more beneficial. Assuming that the underlying channel state is drawn from an unknown distribution at each time slot, we show that modeling each player's choice of action as a multi-armed bandit leads to desirable results. In this chapter we adapt the UCB1 algorithm [121], in which there are two arms for each carrier, each arm corresponding to an action: bidding high or bidding low. From the simulations, we find that the UCB1 algorithm gives performance close to the dominant strategy and, when both carriers use UCB1 to choose their strategies, it can yield even better payoffs than playing the dominant strategy.

5.2 Model

In the rate allocation game we consider two carriers and one transmitter.
The transmitter uses the two carriers to forward data to the destination. The channel from the transmitter to each carrier is in one of two states at each time slot: low (L or 0) or high (H or 1). The two channels are independent of each other, and each channel's state is drawn i.i.d. over time. We denote by p_i (i = 1, 2) the probability that channel i is in the high state at any time. Before transmitting, neither the carriers nor the transmitter know the state of the channel. At the beginning of each time slot, each carrier makes a "bid" (high or low). The transmitter allocates rate to the carriers according to the bids sent. At the end of the time slot, both carriers observe whether or not their transmission was successful. A transmission is unsuccessful if the respective channel is in the low state but was assumed to be in the high state. Since the channel state is unknown in advance, a carrier's bid may lead to an unsuccessful transmission. If the transmission is successful, the carrier is paid an amount proportional to the transmission rate. Otherwise, it receives a penalty (negative reward). Table 5.1 shows the reward function for each carrier. Based on the Shannon-Hartley capacity theorem, we assume that R0 < R1 < R2 < 2R1. C and D are the penalties that the carriers incur for making a high bid when their channel state is low. There are two roles in this game setting: the transmitter and the carriers. The transmitter wants to carry as much data as possible to the destination. It is not interested in penalizing the carriers, but only uses the penalty to give the carriers an incentive to make good guesses. The carriers are interested only in the reward, and they do not lose any utility from transmitting more data.

5.3 Theoretical Analysis: A Game Setting

5.3.1 Parameters Known Case - Carriers' Perspective

Table 5.2 shows the expected rewards for the two carriers as a normal-form game.
In each cell, the first value is the reward for carrier 1, and the second value is the reward for carrier 2.

Table 5.1: Reward function for a carrier

Bid | Actual State | Other Channel's Bid | Reward
L   | L            | L                   | R0
    |              | H                   | 0
L   | H            | L                   | R0
    |              | H                   | 0
H   | L            | L                   | −C
    |              | H                   | −D
H   | H            | L                   | R2
    |              | H                   | R1

5.3.1.1 Mixed Strategy Nash Equilibrium

We denote by XY_i the expected reward for carrier i (i = 1, 2) when carrier 1 bids X and carrier 2 bids Y (where X and Y are high or low):

LL_1 = R0,                       LL_2 = R0,
LH_1 = 0,                        LH_2 = p2 R2 − (1 − p2) C,
HL_1 = p1 R2 − (1 − p1) C,       HL_2 = 0,
HH_1 = p1 R1 − (1 − p1) D,       HH_2 = p2 R1 − (1 − p2) D.    (5.1)

Table 5.2: Carriers' payoff matrix

Carrier 1 \ Carrier 2 | L                         | H
L                     | (R0, R0)                  | (0, p2 R2 − (1 − p2) C)
H                     | (p1 R2 − (1 − p1) C, 0)   | (p1 R1 − (1 − p1) D, p2 R1 − (1 − p2) D)

Let carrier 1 bid high with probability q1, and carrier 2 bid high with probability q2. At a mixed-strategy Nash equilibrium, carrier 1 selects its probability such that carrier 2 obtains the same utility from bidding high as from bidding low. Therefore we have

(1 − q1) LL_2 = (1 − q1) LH_2 + q1 HH_2.    (5.2)

Similarly for carrier 2:

(1 − q2) LL_1 = (1 − q2) HL_1 + q2 HH_1.    (5.3)

Solving (5.2) and (5.3), we get

q1 = (R0 + C − p2 (R2 + C)) / (R0 + C − p2 (R2 + C) + p2 (R1 + D) − D),    (5.4)

q2 = (R0 + C − p1 (R2 + C)) / (R0 + C − p1 (R2 + C) + p1 (R1 + D) − D).    (5.5)

Setting q1 and q2 to 0 or 1, we can find the relationship between the values of p1 and p2 and the existence of a pure-strategy Nash equilibrium:

If q1 = 0 and q2 = 0, then p1 = (C + R0)/(C + R2) and p2 = (C + R0)/(C + R2).
If q1 = 0 and q2 = 1, then p1 = D/(D + R1) and p2 = (C + R0)/(C + R2).
If q1 = 1 and q2 = 0, then p1 = (C + R0)/(C + R2) and p2 = D/(D + R1).
If q1 = 1 and q2 = 1, then p1 = D/(D + R1) and p2 = D/(D + R1).    (5.6)

Denote

b1 = min{ (C + R0)/(C + R2), D/(D + R1) },    (5.7)
b2 = max{ (C + R0)/(C + R2), D/(D + R1) }.    (5.8)

Theorem 6. If p1 ∉ [b1, b2] or p2 ∉ [b1, b2], then there exists a unique pure-strategy Nash equilibrium.

Proof.
Letp 1 <b 1 , thusp 1 < C+R 0 C+R 2 andp 1 < D D+R 1 HL 1 =p 1 R 2 (1p 1 )C <b 1 (R 2 +C)CR 0 ; HH 1 =p 1 R 1 (1p 1 )D<b 1 (R 1 +D)D 0: 81 Thus carrier 1 has a dominating strategy for bidding low. When carrier 1 bids low, the optimal action for carrier 2 is bidding low ifLL 2 >LH 2 , and high otherwise. Letp 1 >b 2 , thusp 1 > C+R 0 C+R 2 andp 1 > D D+R 1 HL 1 =p 1 R 2 (1p 1 )C >b 2 (R 2 +C)CR 0 ; HH 1 =p 1 R 1 (1p 1 )D>b 2 (R 1 +D)D 0: Thus the dominating strategy for carrier 1 is bidding high. When carrier 1 bids high, the optimal action for carrier 2 is bidding high ifHL 2 <HH 2 and low otherwise. Similarly forp 2 = 2 [b 1 ;b 2 ]. Lemma 1. Ifp 1 2 (b 1 ;b 2 ) andp 2 2 (b 1 ;b 2 ), there exists multiple Nash equilibria. Proof. Letp 1 2 (b 1 ;b 2 ) andp 2 2 (b 1 ;b 2 ), then there are two possible scenarios: Scenario 1:b 1 = C+R 0 C+R 2 ,b 2 = D D+R 1 , then LH 2 =p 2 R 2 (1p 2 )C =p 2 (R 2 +C)C >b 1 (R 2 +C)C =R 0 : Similarly,HL 1 >R 0 . HH 1 =p 1 R 1 (1p 1 )D =p 1 (R 1 +D)D<b 2 (R 1 +D) = 0: 82 Similarly,HH 2 < 0. The payoff matrix for carriers will become as Table 5.3 shown: Table 5.3: Carriers’ payoff matrix-mixed strategy Nash equilibrium scenario 1 X X X X X X X X X X X X Carrier 1 Carrier 2 L H L (R 0 ;R 0 ) (0;>R 0 ) H (>R 0 ; 0) (< 0;< 0) There are two Nash equilibria: when one carrier bids high, the other carrier bids low. Scenario 2:b 1 = D D+R 1 ,b 2 = C+R 0 C+R 2 , then LH 2 =p 2 R 2 (1p 2 )C =p 2 (R 2 +C)C <b 2 (R 2 +C)C =R 0 : Similarly,HL 1 <R 0 . HH 1 =p 1 R 1 (1p 1 )D =p 1 (R 1 +D)D>b 2 (R 1 +D) = 0: Similarly,HH 2 > 0. The payoff matrix for carriers will become as Table 5.5 shown: Table 5.4: Carriers’ payoff matrix-mixed strategy Nash equilibrium scenario 2 X X X X X X X X X X X X Carrier 1 Carrier 2 L H L (R 0 ;R 0 ) (0;<R 0 ) H (<R 0 ; 0) (> 0;> 0) 83 There are two Nash equilibria: both bid high, or both bid low. 
In the range of (b1;b2) (b1;b2), if both carriers play mixed strategy Nash equilib- rium, their utility could become much worse than they play pure strategy Nash equilib- rium. If the mixed strategy Nash equilibrium is used. The expected total utility function for each carrier are: U 1 = (1q 1 )(1q 2 )R 0 +q 1 (1p 1 )(1q 2 )(C); (5.9) +q 1 (1p 1 )q 2 (D) +q 1 p 1 (1q 2 )R 2 +q 1 p 1 q 2 R 1 : U 2 = (1q 2 )(1q 1 )R 0 +q 2 (1p 2 )(1q 1 )(C); (5.10) +q 2 (1p 2 )q 1 (D) +q 2 p 2 (1q 1 )R 2 +q 2 p2q 1 R 1 : In cases whereb 1 = D D+R 1 ;b 2 = C+R 0 C+R 2 , whenp 1 !b 1 +,p 2 !b 1 +, we haveq 1 ! 1 andq 2 ! 1. Substituting in Eq. (5.10) and Eq. (5.11), we can getU 1 ! 0 andU 2 ! 0, which is much worse than they just playLL Nash equilibrium. Both carriers suffer if they play mixed strategy Nash equilibrium. For simplicity, we want to set C and D such that we only have a pure strategy Nash equilibrium, independent of the probability distributionsp 1 andp 2 . 84 Lemma 2. Given C, there exists a D, such that there only exist a pure strategy Nash equilibrium. Proof. WhenD = CR 1 R 0 R 1 R 0 R 2 , we can getb 1 = b 2 , there only exists pure strategy Nash equilibrium regions. Lemma 3. If there only exists pure strategy Nash equilibrium regions, both carriers have a dominant strategy. Proof. If we only have pure strategy Nash equilibrium regions, then we must haveb 1 = b 2 =p. There are four possible scenarios: Scenarios 1:p 1 <p andp 2 <p, The payoff matrix for carriers will become as Table 5.5 shown: Table 5.5: Carriers’ payoff matrix-pure strategy Nash equilibrium scenario 1 X X X X X X X X X X X X Carrier 1 Carrier 2 L H L (R 0 ;R 0 ) (0;<R 0 ) H (<R 0 ; 0) (< 0;< 0) The dominant strategies for both carriers are bidding low. Similarly, we have Scenario 2: p 1 < p and p 2 > p, dominant strategy for carrier 1 is bidding low and dominant strategy for carrier 2 is bidding high. 
85 Scenario 3: p 1 > p andp 2 < p, dominant strategy for carrier 1 is bidding high and dominant strategy for carrier 2 is bidding low. Scenario 4:p 1 >p andp 2 >p, dominant strategy for both carriers is bidding high. 5.3.2 Parameters Known Case - Transmitter’ Perspective In this section, we consider the amount of data which can be sent by the two carriers. Think the transmitter asks the two carriers to forward its data. What the transmitter really cares about is how much data is sent. In this case, when sending fails, we consider the data sent is 0. The penalty term C and D are to let the carriers adjust their bidding, but for transmitter, it does not get such a penalty. Table 5.6 represents the expected rewards table got from the transmitter’s view: Table 5.6: Transmitter’s expected throughput X X X X X X X X X X X X Carrier 1 Carrier 2 L H L (R 0 ;R 0 ) (0;p 2 R 2 ) H (p 1 R 2 ; 0) (p 1 R 1 ;p 2 R 1 ) 86 Utility functions from the transmitter’s point of view: V LL =R 0 +R 0 ; (5.11) V HL =p 1 R 2 ; (5.12) V LH =p 2 R 2 ; (5.13) V HH =p 1 R 1 +p 2 R 1 : (5.14) (5.15) price of anarchy (PoA): PoA = max s2S V (s) min s2NE V (s) : (5.16) where S is the strategy set, NE is the Nash equilibrium set, andV (s) =fV LL ;V HL ;V LH ;V HH g. Theorem 7. IfC = R 0 R 2 R 0 R 1 R 1 R 0 andD = R 0 R 1 R 1 R 0 , thenPoA< 2. Proof. IfC = R 0 R 2 R 0 R 1 R 1 R 0 andD = R 0 R 1 R 1 R 0 , thenb 1 = b 2 = R 0 R 1 . There only exists pure strategy Nash equilibrium regions. Letp = R 0 R 1 , 87 Ifp 1 <p andp 2 <p, V LL = 2R 0 ; (5.17) V HL =p 1 R 2 < R 0 R 2 R 1 < 2R 0 ; (5.18) V LH =p 2 R 2 < R 0 R 2 R 1 < 2R 0 ; (5.19) V HH =p 1 R 1 +p 2 R 1 < 2R 0 : (5.20) (5.21) The optimal is LL. The pure strategy Nash equilibrium is also LL. ThusPoA = 1. Ifp 1 <p andp 2 >p, V LL = 2R 0 < 2p 2 R 1 < 2p 2 R 2 ; (5.22) V HL =p 1 R 2 <p 2 R 2 ; (5.23) V LH =p 2 R 2 ; (5.24) V HH =p 1 R 1 +p 2 R 1 < 2p 2 R 1 < 2p 2 R 2 : (5.25) (5.26) The optimal is at most 2p 2 R 2 . 
The pure-strategy Nash equilibrium is LH. Thus PoA < 2.

If p1 > p and p2 < p, the argument is similar to the p1 < p, p2 > p case.

If p1 > p and p2 > p,

V_LL = 2 R0 < 2 p1 R1,    (5.27)
V_HL = p1 R2 < 2 p1 R1,    (5.28)
V_LH = p2 R2 < 2 p2 R1,    (5.29)
V_HH = p1 R1 + p2 R1.    (5.30)

The optimum is at most 2(p1 R1 + p2 R1). The pure-strategy Nash equilibrium is HH. Thus PoA < 2.

Lemma 4. In the rate allocation game, for any fixed penalties C and D, there exist p1 and p2 such that the PoA is at least 2R0/R2.

Proof. Assume that p1 = 0 and p2 = 1. Then Table 5.7 shows the carriers' payoffs and Table 5.8 shows the transmitter's payoff.

Table 5.7: Carriers' payoffs for p1 = 0 and p2 = 1

Carrier 1 \ Carrier 2 | L        | H
L                     | (R0, R0) | (0, R2)
H                     | (−C, 0)  | (−C, R1)

Table 5.8: Transmitter's payoff for p1 = 0 and p2 = 1

Carrier 1 \ Carrier 2 | L   | H
L                     | 2R0 | R2
H                     | 0   | R1

Since R2 > R0, the only pure-strategy Nash equilibrium in this instance of the game is (L, H), for a transmitter utility of R2. If 2R0 > R2, then the optimal solution from the transmitter's perspective is (L, L), for a utility of 2R0. The price of anarchy is therefore at least 2R0/R2.

Corollary 1. The price of anarchy for the rate allocation game over all instances can be arbitrarily close to 2 for any C and D.

Proof. Setting R1 = R0 + ε and R2 = R0 + 2ε (with ε → 0+) in Lemma 4 leads to PoA → 2.

This corollary implies that our result in Theorem 7, showing that the PoA can be bounded by 2, is essentially tight, in the sense that no better guarantee can be provided that applies to all problem parameters.

5.4 Online Learning with Unknown Channel State Parameters

When the channels' statistics are known and C and D are set as described in Theorem 7, both carriers have dominant strategies. However, when the channel statistics are unknown, the carriers need to try both actions: sending with a high data rate or sending with a low data
The underlying channels are stochastic, even to each carrier, the probability that the channel will be good is unknown. Multi-armed bandits are handy tool to tackle the stochastic channel problems, so we adopt the well known UCB1 algorithm [121] to figure out the optimal strategies for each carrier. The arms correspond to actions: bidding high or low, each carrier only records the average rewards and number of plays and play by the UCB1 algorithm in a distributed manner without taking into account the other carrier’s actions. We recap the UCB1 algorithm in Alg. 2, normalizing the rewards in our case to lie between 0 and 1. Algorithm 2 Algorithm 1: Online learning using UCB1 There are two arms corresponding to each carrier: bidding high or bidding low. Letx l be the rewards which represents the average reward gained by each carrier by playing arm l (l = H, L),n l represents how many times the arm l is played. Initialization: Initially, playing each arm once, store the initial rewards inx l , and set n l = 1. for time slotn = 1; 2; do Select the arm with highest value of x l +D R 2 +D + q 2ln(n) n l . Play the selected arm for a time slot. Update the average reward of the selected arm as well asn l of the selected arm. end for 91 5.5 Simulations In this section, we present some simulation results showing that the UCB1 learning al- gorithm performs well. In all simulations we fix the penaltiesC andD as in Theorem 7 which leaves each carrier with a dominant strategy, but which is not usually known by the carriers. In the figures below we show how the UCB1 learning algorithm compares with playing the dominant strategy (if the carrier knew it) and determine that using UCB1 does not lose much utility in average, and sometimes is better than the dominant strategy. First, in Fig. 5.2, we assume that carrier 2 knows its probability for the state of the channel being high, and plays its own dominant strategy. 
In this case carrier 1 would be better off if it knew the probability of its state and would play the dominant strategy. However, playing UCB1 does not lose much utility in average. Fig. 5.2 shows for each R 0 as a fraction ofR 1 , the average payoff over multiple games in whichR 0 ,p 1 andp 2 are distributed over their entire domain. In Fig. 5.3 we show the average payoff over multiple choices ofR 0 ,p 1 andp 2 , when carrier 1 plays either the dominant strategy or the UCB1 strategy, and carrier 2 plays the UCB1 strategy. We can see here that the dominant strategy is only better in average for large values ofR 0 and for small values ofR 0 playing UCB1 brings better payoff. Fig. 5.4 and Fig. 5.5 show the same scenarios from the transmitter’s perspective. Fig. 5.4 compares the optimal average utility the transmitter could get from each game 92 Figure 5.2: Carrier 1 payoff against carrier 2 using dominant strategy Figure 5.3: Carrier 1 payoff against carrier 2 using UCB1 strategy 93 to the average utility the transmitter gets from carrier 1 using UCB1 or carrier 1 using its dominant strategy, when carrier 2 plays its dominant strategy. We notice that both strategies give almost the same payoff to the transmitter, especially when the value of R 0 is much smaller compared to R 1 . This happens because when carrier 1 uses UCB1 against a player that uses its dominant strategy then carrier 1 will quickly learn to play its dominant strategy as well. Fig. 5.5 shows how the transmitter optimal payoff compares to the transmitter payoff when the carrier 2 uses the UCB1. When both carriers use the UCB1 algorithm to choose their strategies, the transmitter payoff is better than when one carrier uses the dominant strategy and the other carrier uses the UCB1 learning algorithm. When both carriers are using the UCB1 learning algorithm the carriers don’t play the Nash equilibrium when that is much worse than cooperating. 
This is why the UCB1 sometimes performs better than the dominant strategy. Finally, Fig. 5.6 shows how the transmission rate varies when carriers use the UCB1 learning algorithm, compared to the optimal transmission rate. In this simulation, we vary the actual probabilities of the two channels while keeping the rewards unchanged, and we observe that when the two channels are equally good the UCB1 algorithm obtains almost optimal transmission rate. We now consider two specific problem instances to illustrate the performance when UCB1 is adopted by both carriers. In both cases, we assume the following parameters 94 Figure 5.4: Transmitter payoff when one carrier uses dominant strategy Figure 5.5: Transmitter payoff when one carrier uses UCB1 strategy 95 Figure 5.6: Normalized transmitter payoff with respect to optimum when both play UCB1 as a function of the two channel parameters hold: R 0 = 40; R 1 = 45; R 2 = 60; C = 120; D = 360; T = 10 5 ; b 1 =b 2 = 8=9. Example 1: Probability parametersp 1 = 6=9; p 2 = 7:9=9 In this case, the payoff matrix from the carriers’ point of view is shown in Table 5.10: Table 5.9: Carriers’ payoff matrix: example 1 X X X X X X X X X X X X Carrier 1 Carrier 2 L H L (40; 40) (0; 38) H (0; 0) (90;4:5) 96 The optimal action (from the transmitter’s perspective) is both carriers bidding low. When both carriers apply UCB1, we find that for carrier 1, the number of times out of 100,000 that it bids high is 657, the number of times it bids low is 99343; for carrier 2, the number of times it bids high is 39814, and the number of times it bids low is 60186. Example 2: Probability parameters:p 1 = 6=9; p 2 = 8:1=9 The payoff matrix from the carriers’ point of view is shown in Table 5.10: Table 5.10: Carriers’ payoff matrix: example 2 X X X X X X X X X X X X Carrier 1 Carrier 2 L H L (40; 40) (0; 42) H (0; 0) (90; 4:5) In this case, the optimal action (from the transmitter’s perspective) is carrier 1 bidding low, carrier 2 bidding high. 
For carrier 1, the number of times out of 100,000 that it bids high is 622 and the number of times it bids low is 99,378; for carrier 2, the number of times it bids high is 62,706 and the number of times it bids low is 37,294. These examples illustrate how the distributed learning algorithm is sensitive to the underlying channel parameters and learns to play the right bid over a sufficient period of time, although, as expected, the regret is higher when the channel parameter is close to b_1.

5.6 Summary

In this chapter, we have presented and investigated a competitive rate allocation game in which two selfish carriers compete to forward data from a transmitter to a destination for a rate-proportional fee. We showed that even if the transmitter is unaware of the stochastic parameters of the two channels, it can set penalties for failures in such a way that the two carriers' strategic bids yield a total rate that is not less than half of the best possible rate it could achieve if it had knowledge of the channel parameters. We have also studied the challenging case in which the underlying channel is unknown, resulting in a game with unknown stochastic payoffs. For this game, we numerically evaluated the use of the well-known UCB1 strategy for multi-armed bandits, and showed that it gives performance close to that of the dominant strategies (in the case where the payoffs are known) or sometimes even better. In future work, we would like to obtain more rigorous results for the game with unknown stochastic payoffs.

Chapter 6
Dynamic Multi-carrier Selection: Multi-bits Bid

6.1 Overview

In this chapter, we extend our study in Chapter 5 and consider a problem in which the transmitter is able to transmit data simultaneously over at most K parallel channels. The channels from the transmitter to each carrier are i.i.d. with two states: high or low.
A channel in state high allows the transmitter to send data at a high data rate, while a channel in state low only allows the transmitter to send data at a low data rate. We assume that the probability that a channel is in state high is p; the value of p differs across channels. When the conservative data rate is used, the transmission is always successful no matter what the channel state is. However, when the aggressive data rate is used, the transmission fails with probability (1 - p), and we assume all the data are lost. (This chapter is based on the work in [123].) The transmitter rewards the carriers for successful transmissions and penalizes them for failures.

In this chapter, we model the problem as an auction, in which the transmitter is the auctioneer and the carriers are the bidders. The objective of the transmitter is to use the channels efficiently to meet its traffic requirement. When the traffic is heavy, this is equivalent to maximizing the total data rate over multiple channels under a power constraint. The objective of the carriers is to maximize the expected payment from the transmitter. The main focus of this chapter is to design the payment mechanism in such a way that the carriers reveal their channel qualities truthfully and, as a result, the transmitter is able to use its power efficiently and maximize its throughput. We summarize our results as follows:

- We propose a payment mechanism using a convex piecewise linear function of channel probabilities, and prove that bidding truthfully is always a preferable action for a carrier.

- We prove that the throughput obtained by the transmitter approaches the maximum possible throughput with perfect information about the channel statistics as the number of bits per bid increases.
- Since bidding truthfully is always preferable, compared with much of the existing literature on competitive rate allocation, which typically requires multiple iterations to converge, our proposed mechanism is one-shot and does not require iterative convergence. The system overhead is small, since communicating the bids requires only a few control message exchanges at the beginning of an auction cycle, and the computation of the payment is lightweight.

- Our proposed mechanism is a win-win for both customers and operators, since the customers receive better service (and/or lower payments) and the operators may potentially obtain larger revenues due to more customers and more efficient use of the channels. This claim is supported by simulations based on a dataset of real BS locations over a 2 km x 2 km area in London.

6.2 Model

We consider the transmission problem of a single mobile device which can simultaneously transmit data over multiple channels provided by multiple operators. In this chapter, we call such a mobile device the transmitter, and the different operators who own different channels carriers. We study a simple yet fundamental rate allocation problem in which a transmitter does not know the states of the channels, and the corresponding carriers are self-interested.

Figure 6.1: System model of multi-carrier rate allocation

The system works as shown in Fig. 6.1: when the transmitter has some data to send, it announces an auction to request channel resources. The nearby carriers reply to the request with a bid indicating the quality of their respective channels. After receiving the bids, the transmitter ranks the bids, estimates the channel qualities, selects a set of channels, determines power and data rate allocation strategies, replies to the selected channels with the allocated data rates, and then transmits data. Whenever data is successfully sent, the transmitter gives credits to the carrier.
However, we assume that the transmission fails and nothing gets sent if the channel is bad but the transmitter uses a high data rate. In this case, the carrier incurs a penalty and returns credits to the transmitter. The transmitter stays with the same set of channels and uses the same power and data allocation strategy until it finishes transmitting or announces another auction. We call one such request-reply cycle an auction cycle.

The key elements determining the transmitter's allocation strategy are the bids from the carriers and its own traffic requirement. The objective of the transmitter is to efficiently allocate power and data rate to meet the traffic requirement. When the traffic is heavy, the transmitter has to maximize the total data rate under a given power constraint to best meet its traffic requirement. However, if the traffic is light, a few channels or even a single channel can satisfy the transmitter's traffic requirement, in which case the transmitter may use only a small set of channels.

In our mechanism, each auction cycle can be considered a single-shot game. A carrier that gets some data from the transmitter during one auction cycle does not necessarily get data in another auction cycle. We assume that the transmitter's total traffic is the transmitter's private information and is not revealed to the carriers. Moreover, a carrier is not able to estimate such information based on the amount of data allocated to it (0 if its channel is not selected). In practice, especially in a relatively stable environment, the transmitter may select the same or similar sets of channels in different auction cycles.

To monitor the carriers' performance and generate the payments, third-party (i.e., service broker) software is installed on the transmitter side and the carrier side. The transmitter side software monitors the bids received from multiple operators and how the mobile device splits the data stream.
The carrier side software monitors the bids sent and the successfully received data. As noted before, credits are converted to financial payments over a longer time period, such as monthly billing. To emulate the real system, we make the following assumptions:

A1: A transmitter is able to transmit data simultaneously over at most K parallel channels.

A2: There are N carriers (channels) competing to get data from the transmitter. Typically, N > K. N is a variable, which changes with location and time. Different areas may have different numbers of carriers nearby. Even in the same location, a carrier may still choose not to participate in the competition for some auction cycles, depending on its available resources.

A3: Channel quality changes dynamically over time due to the transmitter's movement, environment changes, and many other reasons. However, within one auction period, we assume that the channel quality stays the same.

A4: The transmitter's traffic requirement changes dynamically. Based on the traffic requirement and the channels' qualities, the transmitter may simultaneously connect to k channels, k <= K, where k is a variable. Within one auction period, we assume that k stays the same.

A5: It is difficult for a carrier to monitor other carriers' channel qualities or estimate the transmitter's traffic requirement. A channel which is not selected by the transmitter does not necessarily have bad channel quality; it is also possible that the transmitter does not need that many channels. Thus, history helps neither in estimating other carriers' channels nor in estimating the transmitter's selections. If a carrier gets some data from the transmitter, all it knows is that it was selected during this auction cycle; the percentage of the allocated data among the total traffic is unknown.
A6: The carriers are risk averse, which in our context means that the carriers tend to choose actions which may give a possibly lower but more quantifiable expected payoff, rather than actions which give unquantifiable payoffs. Although the latter actions may sometimes give high returns, there is also the risk of a lower or even negative expected reward.

6.2.1 Power-Rate Model

The expected data rate is a concave function of the power. We assume that there are two sets of rate allocation strategies, depending on the channel state. A channel in the high state (i.e., the channel noise level is low) allows the transmitter to send data aggressively, shown as the upper curve f_h in Fig. 6.2, while a channel in the low state (i.e., the channel noise level is high) only allows the transmitter to send data conservatively, shown as the lower curve f_l. We assume that if the data rate is allocated based on curve f_h, the transmission fails and nothing gets sent when the channel turns out to be in the low state (i.e., the noise level is too high, so that SNR < SNR_outage and the data gets corrupted). However, if the data rate is allocated based on curve f_l, the transmission is always successful no matter what the channel state is. Whenever data is successfully sent, the transmitter rewards the carrier with some credits; for a failed transmission, the carrier incurs a penalty and returns credits to the transmitter.

Figure 6.2: Data versus power allocation

6.3 Theoretical Analysis: an Auction Setting

Before describing the auction mechanism design, we first look at the optimal power and data allocation strategies for a transmitter given perfect information about multiple parallel channels.

6.3.1 Optimal Power and Data Allocation

We assume the transmitter is able to simultaneously transmit data over at most K channels. The indices of the carriers represent their bid rankings. For example, carrier 1 is the one who sends the highest bid.
The channels from the transmitter to each carrier are independent stochastic channels with two states: high or low. We assume that the channel states are independent and identically distributed (i.i.d.) Bernoulli random variables. Let p_i (i = 1, 2, ..., N) be the probability that channel i is in the high state during this auction cycle. The transmitter's maximum power is P_max; in each auction cycle, the transmitter ranks the carriers' bids, selects the k (<= K) best of them, and allocates power P_1, P_2, ..., P_k to the corresponding channels, subject to

P_1 + P_2 + ... + P_k <= P_max.

As we know, if the transmitter has perfect information about the channels' statistics, it can allocate power and data in an optimized way. As an example, assume that the transmitter is able to simultaneously transmit data over at most 2 channels, that the probabilities of the best two channels being in the high state are p_1 and p_2, and that this information is known by the transmitter. We further assume that the total power used by the transmitter is 1, and let P denote the portion of power allocated to carrier 1; the power allocated to carrier 2 is 1 - P. Depending on the traffic, we can always scale the total power up or down. The transmitter can select allocation strategies from the following: both channels' allocated data rates are based on curve f_h; channel 1's data rate is allocated based on curve f_h while channel 2's is based on f_l; or both channels' allocated data rates are based on curve f_l.

Let V_opt(P) denote the maximum expected data rate given that the power allocated to carrier 1 is P, and let V_opt denote the maximum expected data rate of the optimal strategy:

V_opt(P) = max{ f_l(P) + f_l(1-P), p_1 f_h(P) + f_l(1-P), p_1 f_h(P) + p_2 f_h(1-P) },
V_opt = max over P in [0,1] of V_opt(P).

Let sigma_opt denote the allocation strategy which obtains V_opt. As an example, assume that

f_h(P) = 10 log(1 + 100P), f_l(P) = 10 log(1 + 2P);

the optimal power-rate curve selection is as shown in Fig. 6.3.
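The maximization above can be sketched numerically. The following is a minimal illustration, not code from the thesis: it grid-searches the power split P and the three strategy combinations using the example curves, reading the log as the natural logarithm (an assumption made here for illustration).

```python
import math

def f_h(P):
    # Aggressive power-rate curve (succeeds only if the channel is in the high state).
    return 10 * math.log(1 + 100 * P)

def f_l(P):
    # Conservative power-rate curve (always succeeds).
    return 10 * math.log(1 + 2 * P)

def v_opt(p1, p2, grid=1000):
    """Grid search for V_opt over the power split P and the three strategies."""
    best = (-1.0, None, None)  # (value, power to carrier 1, strategy name)
    for i in range(grid + 1):
        P = i / grid  # power given to carrier 1; carrier 2 gets 1 - P
        candidates = {
            "ll": f_l(P) + f_l(1 - P),            # both conservative
            "hl": p1 * f_h(P) + f_l(1 - P),       # carrier 1 aggressive only
            "hh": p1 * f_h(P) + p2 * f_h(1 - P),  # both aggressive
        }
        for name, v in candidates.items():
            if v > best[0]:
                best = (v, P, name)
    return best

print(v_opt(0.9, 0.9))  # good channels: both-aggressive ("hh") wins
print(v_opt(0.1, 0.1))  # poor channels: both-conservative ("ll") wins
```

This reproduces the qualitative structure of Fig. 6.3: which strategy is optimal depends on where (p_1, p_2) falls, with the aggressive strategy dominating when both channels are likely to be in the high state.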
Figure 6.3: Optimal power-rate curve selection

6.3.2 An Incentive Mechanism Design Which Ensures Truthfulness

As discussed before, the key to optimal power and data allocation is knowing the channel statistics. However, only the receivers (carriers) are able to accurately estimate these parameters. How to ensure truthful bidding is the main focus of this section. We present below the details of our proposed incentive mechanism, which guarantees truthfulness.

As introduced before, after the transmitter announces an auction, the nearby carriers respond to the auction with a bid. We assume that the number of bits per bid is l; the whole probability interval is then divided into n = 2^l smaller intervals, denoted by [theta_0, theta_1], [theta_1, theta_2], ..., [theta_{n-1}, theta_n], with theta_0 = 0 and theta_n = 1. (These theta_i's are predefined in the contract; they can be evenly or unevenly distributed over [0, 1] depending on system requirements or conditions.)
110 If a carrier bid isi, the rewards are given as follows: R = 8 > > > > > > > > < > > > > > > > > : R x ifR x is allocated based onf l k i +l i ifR x is allocated based onf h and succeed l i ifR x is allocated based onf h and fail If the channel probability isp, andR x is allocated based onf h , the expected reward for biddingi isk i p +l i . For allocating data rate based on curvef h case, the set of all the expected rewards for bidding truthfully compose a convex piecewise linear function, shown in Fig. 6.4. The lines in different colors represent the expected payoffs for different probability intervals. Theorem 8. If the expected payoff of truthful bidding is a convex piecewise linear function with respect to channel probability p, the carriers will bid truthfully. Proof. If the transmitter uses the lower curvef l to allocate power and data to a channel, no matter what the carrier bids, the allocation is the same, so is the expected payoff. If the transmitter uses the upper curve f h to allocate power and data, assume that a carrier’s channel probabilityp andp2 [ i ; i+1 ], with probabilityp, it can successfully receive the data, and with probability (1p), it will fail. Letu i denote the payoff that it 111 Figure 6.4: Expected payoff versus probabilities based on a convex piecewise linear func- tion bidsi, the expected payoff for biddingi isE[u i ] = k i p +l i , demonstrated by lineL i in Fig. 6.4. A truthful bidding for this carrier isi. However, we assume that it bidsm instead. Ifm>i, the reward and penalty assignment will be based on lineL m : The expected reward for this carrier will bek m p +l m . From Fig. 6.4, considering the line segments in the range of [ i ; i+1 ], we can easily see that the expected payoff of L i is greater than L m . Thus, overbidding gives a lower expected reward, and a rational carrier should not overbid. 112 Similarly, we can prove that carrieri will not underbid. 
For a channel which is selected, the above analysis shows that the corresponding carrier has no incentive to lie about its channel quality. For a channel which is not selected, overbidding may increase the possibility of being selected. However, under Assumptions A4-A6, overbidding is still not a good idea, for the following reasons. First, since the traffic from the transmitter is unknown, a very good channel may not be selected (the transmitter may not need that many channels). Second, without knowledge of the other channels, a channel that overbids may still not be selected, especially when many good channels participate in the auction in that auction cycle. Third, if a channel overbids and is selected, the carrier does not know whether this is because of the overbidding; it is still possible that it would have been selected even without overbidding, in which case overbidding yields a smaller expected reward. Fourth, in a real system there are typically multiple transmitters requesting resources; a channel which is not selected by one transmitter may be selected by another, and a channel which is bad for one transmitter may turn out to be good for another. Thus, for a risk-averse carrier, bidding truthfully is always the preferable action.

We define the data rate efficiency, denoted by eta, as the ratio of the total data rate obtained from the auction to the rate obtained by the transmitter under perfect information about the channel statistics.

Theorem 9. For a fixed rates setting, when the granularity of the probability range approaches 0, the data rate efficiency from the transmitter's point of view approaches 1.

Proof. Referring to Fig. 6.3, if a grid cell covers no boundary between two regions, the power and data allocation is already optimal. However, if a grid cell covers the boundary of two regions, the optimal strategy is undetermined.
In the latter case, assume the optimal data rate is V_opt but the transmitter chooses a suboptimal action which gives an expected data rate V_subopt. Since the expected data rate function is continuous and the expected data rates of the two allocation strategies are equal at the boundary, for any eps > 0 there exists a grid length delta which makes V_opt - V_subopt < eps. As delta -> 0, eps -> 0, and eta = V_subopt / V_opt = (V_opt - eps) / V_opt -> 1.

Remark: The key part of this incentive mechanism design is the convexity of the function; in general, any convex piecewise linear function can guarantee truthfulness. Taking the limit of the probability range parameters, the payoff design curve becomes a smooth convex function, shown in Fig. 6.5. The reward design is then based on the slope and intercept of the tangent line.

Figure 6.5: Expected payoff versus probabilities based on a smooth convex function

6.4 Implementation

In this section, we discuss some implementation issues. The following implementation is meant to be an illustrative example; it is not necessarily the best or the only possible design. In this implementation, there is a middleman (i.e., third-party software) between the transmitter and the carriers. Initially, the middleman designs the payment contract, and the end parties, the transmitter and the carriers, agree to the contract. In this example, the following items are predefined in the contract:

- The convex function on which the payment functions rely, for example f(p) = R_x (1 + eps)^(p-1) (eps > 0, p in [0, 1]); the value of eps is also predefined in the contract.

- The probability interval boundaries (theta_0, theta_1, theta_2, ..., theta_n).

With the above information, the expected reward for a carrier is purely determined by its channel probability p and the allocated data rate R_x.
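The chord construction and the truthfulness property above can be sanity-checked numerically. The following is a minimal sketch (not code from the thesis), assuming evenly spaced interval boundaries and eps = 0.1; both are illustrative choices, since the contract may use any spacing and any eps > 0.

```python
def make_segments(n=8, eps=0.1, R_x=1.0):
    """Slopes k_i and intercepts l_i of the chords of f(p) = R_x (1+eps)^(p-1)
    over evenly spaced boundaries theta_0 = 0 < theta_1 < ... < theta_n = 1."""
    f = lambda p: R_x * (1 + eps) ** (p - 1)
    theta = [i / n for i in range(n + 1)]
    seg = []
    for i in range(n):
        k = (f(theta[i + 1]) - f(theta[i])) / (theta[i + 1] - theta[i])
        l = f(theta[i]) - k * theta[i]  # equivalent to the closed form in the text
        seg.append((k, l))
    return theta, seg

def best_bid(p, seg):
    """Bid maximizing the expected payoff k_i * p + l_i under an f_h allocation."""
    payoffs = [k * p + l for (k, l) in seg]
    return max(range(len(seg)), key=lambda i: payoffs[i])

theta, seg = make_segments()
# Truthfulness: for a p inside interval i, the payoff-maximizing bid is i itself,
# because the chord of a convex function lies above every other chord there.
for i in range(len(seg)):
    p = (theta[i] + theta[i + 1]) / 2
    assert best_bid(p, seg) == i
print("truthful bidding maximizes the expected payoff in every interval")
```

This mirrors the argument of Theorem 8: for p inside [theta_i, theta_{i+1}], line L_i dominates every other line L_m, so neither overbidding nor underbidding can increase the expected reward.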
Once the two parties agree on the contract, they install the third-party software: the transmitter side software monitors the bids received from multiple operators and how the mobile device splits the data stream; the carrier side software monitors the bids sent and the successfully received data rate. Periodically, a transmitter announces an auction. The carriers respond to the auction by sending a bid indicating their channel quality. The transmitter then selects a set of channels, determines power and data rate allocation strategies, replies to the selected carriers with the data rate it will use, and starts transmitting. A dynamic reward is determined based on the successfully received data rate. The reward can be positive or negative depending on the performance: for successful transmissions, the carriers win credits; for unsuccessful transmissions, the carriers lose credits (return credits). The transmitter stays with the selected channels until the next auction period. The auction period length, for example, can be as long as the coherence time. Similar to current cellular networks, a monthly bill is generated based on the overall performance (i.e., the credits are converted to a financial payment over a month). Different from the current cellular network payment model, in our model the price is always dynamic, depending on the channel quality and the traffic requirement. Pseudo code for the transmitter side software and the carrier side software is given in Algorithm 3 and Algorithm 4.

Algorithm 3: Pseudo Code of the Transmitter Side Software

while there are some data to be sent do
  Announce an auction: the transmitter broadcasts an auction notification, requesting channel resources from nearby carriers.
  Power and rate allocation: after receiving bids from nearby carriers, the transmitter ranks the bids, b_1 > b_2 > ... > b_n, and converts the bids to probability values p^_1 > p^_2 > ... > p^_n (i.e.,
it randomly picks a probability value corresponding to each bid; for example, it picks a probability value in [theta_i, theta_{i+1}) if the received bid is i), selects a set of channels, and allocates power and data rate optimally based on p^_i to best serve the traffic requirement under the total power constraint P_max; let (P_i, R_i) denote the power and data rate allocated to the i-th highest bid channel.
  Reply to bids: the transmitter replies to the selected carriers with the data rate R_x and whether it is an aggressive rate or a conservative rate.
  Transmitting: transmit data R_i using power P_i.
end while

(Note that several other researchers have previously proposed and explored various dynamic pricing schemes for cellular networks [124-127].)

Algorithm 4: Pseudo Code of the Carrier Side Software

Initialization: the convex function f(p); the probability interval boundaries theta_0, theta_1, theta_2, ..., theta_n.
while listening to the auction notifications do
  if an auction notification is received and the carrier decides to participate in the auction then
    Reply to the auction: the carrier replies to the transmitter's auction with a bid, for example, i.
    Receiving data from the transmitter: calculate the credits based on the performance and the bids using the proposed auction model.
  end if
end while
Generating the monthly bill: a monthly bill is generated based on accumulated credits.

The duration of the auction cycle can be made dynamic, and it could depend on various issues such as overhead, protocol/standard constraints on signaling and control frequency, and performance. For example, in a stable environment we can use the coherence time as the length of the auction cycle. However, in a highly varying environment, we can use the rate adaptation mechanisms of cellular networks, such as using
different coding or modulation mechanisms (in current cellular networks, a mobile device typically uses one channel at a time, while our work supports transmitting over multiple channels simultaneously), and we can make the auction cycle longer, trading some performance for reduced overhead, with the bids then describing the average channel quality.

Regarding the overhead introduced by this system, it is small. First, the bids are communicated in small control messages, which take very little time to exchange, and such communication happens only at the beginning of the auction cycle. In a stable environment, after a short time to establish the channel selection, the transmission can last for a while. In a highly dynamic environment, it is possible that the channel changes to a different state after, or even during, the exchange of the small control messages. However, in such a rapidly changing environment it would certainly be very challenging for any scheme to achieve an optimal solution. The auction cycle duration can be optimized, but this is outside the scope of this study. Second, since the system involves a multi-carrier scenario, it requires a good design and deployment of control signaling channels for efficient data transmission control and good overall system performance; we adapt the approaches proposed for carrier aggregation [128]. For example, in LTE/LTE-Advanced systems, with a minor modification of the control structure of LTE systems, each carrier can have its own coded control channel [128, 129].

6.5 Simulations

In this section we first present the convergence of the data rate efficiency for the worst case setting; then we use a dataset of real BS locations over a 2 km x 2 km area in London to conduct two sets of simulations: the single operator contract model (SOCM) and our proposed auction model (AM). The transmission in this simulation is time slotted: at each slot, the transmission fails with the corresponding probability.
In practice, we can also use m_r / m_t to approximate p, where m_t is the amount of data allocated and m_r is the amount of data received. For bidding i, the expected payment is then k_i p + l_i, approximated by k_i (m_r / m_t) + l_i.

6.5.1 Worst Case Data Rate Efficiency

We consider the two-carrier model; each carrier knows its channel parameter. To show the effectiveness of multiple-bit bids, we use a simplified power-rate allocation model, in which the full power is either allocated to a single channel or equally split between two channels. We use possible data rates of an LTE network and select R_0 = 50, R_1 = 3500, R_2 = 6144 Kb/slot as our rate parameters [130]. R_0 is a conservative rate given half of the power allocated to the channel; R_1 is an aggressive rate given half of the power; and R_2 is a very aggressive rate given the full power. We consider the whole probability range divided into [0, R_0/R_1] and small equally divided intervals of [R_0/R_1, 1]. (For the two-carrier, equally-split power scenario, R_0/R_1 acts as a threshold: taking a single-bit bid as an example, when the channel probability p < R_0/R_1, the dominant strategy for a carrier is to bid low, and when p > R_0/R_1, the dominant strategy is to bid high. More details can be found in reference [123].) We select the worst data rate efficiency among all possible probability settings with an increasing number of bits per bid. The result is shown in Table 6.1. We can see that the worst case data rate efficiency converges to 1 as the number of bits per bid increases.

Table 6.1: Worst case data rate efficiency

Bits:  1    2    3    4    5    6    7    8
Worst: 0.58 0.60 0.62 0.67 0.75 0.84 0.94 0.96

6.5.2 Performance Evaluation

In this section, we evaluate the performance of our proposed auction model by comparing it with the commonly used single operator contract model. We use the dataset from Ofcom's Sitefinder [131], and obtain precise coordinates of BSs from two major operators over a 2 km x 2 km area in London, as shown in Fig.
6.6. There are 158 BSs from Operator 1 (marked in blue) and 128 BSs from Operator 2 (marked in red). We evaluate the mobile device's throughput as well as the carriers' net profit.

Figure 6.6: 2 km x 2 km view of the BS deployment by two major cellular operators over an area in London

6.5.2.1 Channel and Power-Rate Model

We use the simple path loss model with log-normal shadowing and fading to model the channel. The received power in dB can be obtained from the following equation:

P_r = P_t + L - 10 alpha log(d / d_0) + psi,

where P_t is the transmission power, L is a constant, alpha is the path loss exponent, and psi is the log-normal shadowing and fading term, psi ~ N(0, sigma^2), in dB. We set P_t = 24 dB, L = -34 dB, alpha = 3.5, d_0 = 1 m, and sigma = 10 in our simulation. We assume the noise power is N = -100 dB, and ignore the interference from mobile devices in neighboring cells. We use the signal to noise ratio (SNR) to measure the received signal quality:

SNR = P_r - N = 90 - 35 log(d) + psi.

We assume an outage threshold SNR_outage = 10 dB. When SNR >= SNR_outage, we consider the channel to be in the high state; otherwise, the channel is in the low state. The probability that a channel is in the high state is

p = P(SNR > SNR_outage).

As for the power-rate model, we assume the total power is 1 and use the following equations:

f_l(P) = 10 log(1 + 2P),
f_h(P) = 10 log(1 + 100P).

6.5.2.2 Single Operator Contract Model versus Auction Model

SOCM is the commonly used model today. In this model, each transmitter contracts with a single carrier and is only allowed to connect to a single BS from the carrier it is bound to. Since the transmitter allocates all of its power to a single channel, plugging P = 1 into f_l and f_h gives the rates R_l = 11 and R_h = 46 kb/slot. When p < 11/46, the transmitter transmits at rate R_l; otherwise, at rate R_h. We sample 100 customers from Operator 1 and 100 customers from Operator 2, located uniformly at random in this area.
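The channel model above can be sketched in a few lines. This is an illustrative computation, not code from the thesis; it assumes the path-loss logarithm is base 10, and it reproduces the reported rates R_l = 11 and R_h = 46 only if the log in the power-rate curves is read as the natural logarithm (an assumption made here).

```python
import math

def p_high(d, snr_outage=10.0, sigma=10.0):
    """p = P(SNR > SNR_outage), with SNR = 90 - 35*log10(d) + psi, psi ~ N(0, sigma^2)."""
    mean_snr = 90.0 - 35.0 * math.log10(d)
    z = (snr_outage - mean_snr) / sigma
    # P(psi > snr_outage - mean_snr) via the standard normal tail.
    return 0.5 * math.erfc(z / math.sqrt(2.0))

# Nearby users see a high-quality channel almost surely; distant users do not.
print(round(p_high(10), 3), round(p_high(500), 3))

# SOCM rates: plug P = 1 into the power-rate curves (natural log assumed).
R_l = 10 * math.log(1 + 2 * 1)    # conservative rate, ~11 kb/slot
R_h = 10 * math.log(1 + 100 * 1)  # aggressive rate, ~46 kb/slot
print(round(R_l), round(R_h))
```

The erfc form is just the Gaussian tail probability for the shadowing term psi, so p decreases smoothly with the distance d to the serving BS.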
The payment to the carriers is proportional to the amount of transmitted data. We assume that it is 10^-4 $/kb ($10 for 1 Gb of data). Consequently, the payment is u = 10^-4 R_x $/slot. AM is our proposed model. In this evaluation, we consider the K = 2 case. We sample 200 customers placed at the same random locations as in the single operator contract model and use 8-bit bids: the whole probability range is evenly divided into 2^8 smaller intervals, and the payment is based on f(p) = 1.1^(p-1) x 10^-4 x R_x (p in [0, 1]) $/slot.

The simulation results are shown in Table 6.2. We can see that our proposed model provides a win-win strategy for the wireless bandwidth thirst problem: the average throughput of a transmitter almost doubles while the transmitter pays less per byte, and the operators make more revenue due to more potential customers and more efficient use of the spectrum by mainly serving nearby mobile devices.

Table 6.2: Single operator contract model versus proposed auction model

                                          SOCM              AM
                                          Op1      Op2      Op1      Op2
Transmitter's avg throughput (kb/slot)    27.89    32.89    58.83 (combined)
Transmitter's avg payment per kb ($)      10^-4    10^-4    7.2 x 10^-5 (< 10^-4)
Carrier's avg profit per slot ($/slot)    0.2789   0.3289   0.3649   0.4931

Note that the actual contract adopted in practice may depend on other market factors, but these examples show the overall benefit of carrier flexibility to both users (in terms of increased throughput and possibly reduced marginal cost) and operators (in terms of increased profit). We believe that the gain can be even more significant in future wireless systems with greater carrier diversity and higher traffic.

6.6 Summary

We have presented and investigated a competitive rate allocation game in which multiple selfish carriers compete to carry data from a transmitter in exchange for a payment.
We have shown that even if the transmitter is unaware of the stochastic parameters of the channels, it can set rewards and penalties in such a way that the carriers' strategic bids yield an expected total rate that is close to the best possible expected total rate. The payment is designed according to a convex piecewise-linear function; this design gives the carriers an incentive to bid truthfully. With this design, even the worst-case data rate efficiency from the transmitter's point of view converges to 1 for a large number of bits. Through simulations, we have compared our proposed model with the commonly used single carrier contract model, and have shown that our proposed model can benefit both the mobile users and the operators.

Chapter 7

Transmission for Random Arrivals with Energy-Delay Tradeoff

7.1 Overview

In wireless networks, a key tradeoff is between energy and delay. In this chapter¹, we formulate, study, and solve a fundamental problem pertaining to the energy-delay tradeoff in the context of transmissions on a time-slotted channel. In its general form, this problem consists of multiple independent nodes with arbitrary packet arrivals over time. At the beginning of each slot, every node decides whether to send one or more of its queued and newly arrived packets, or to defer some or all of them to a future slot. The more packets sent in a given slot, the higher the energy cost (modeled as an arbitrary strictly convex, increasing, and exponentially bounded function of the total number of simultaneously scheduled packets); on the other hand, deferral incurs a delay cost for each deferred packet. In this system, a centralized scheduler knows the arrivals and schedules the arriving packets to different slots in order to minimize a cost function which combines both the deferral penalty and the energy penalty.

¹ This chapter is based on the work in [132].
We first assume that the centralized scheduler has complete knowledge of the packet arrivals, and develop a backward greedy algorithm, which is proved to be optimal. We then consider a more realistic scenario, in which the centralized scheduler only knows the arrivals up to the current time slot, and develop an efficient online algorithm for this scenario, which is proved to be O(1)-competitive with the optimal.

7.2 Model

7.2.1 Motivating Example

We present one example which motivates the general problem we investigate in this work. Consider the simplest case, in which there is one sender, one receiver, and a single time-slotted channel between them, as shown in Fig. 7.1. Packets arrive over time. The sender can send multiple packets at a higher power cost, or defer some of them, incurring a delay penalty for each packet for each time slot it spends in the queue.

Figure 7.1: Illustrative example of the scheduling problem with energy-delay tradeoff

For a single channel, the rate can be modeled as R = B·log_2(1 + P_r/N), where B is the bandwidth, P_r is the received power, and N is the noise power. The transmission power P_t ∝ P_r. Assuming N is constant, P_t ∝ 2^{αR}, where α = 1/B. We can consider R as the number of packets sent in a time slot. For illustration, we consider three options for the sender: batch scheduling (sending all packets in one slot), constant rate scheduling (sending packets at a constant rate), and dynamic scheduling (sending a different number of packets in different slots). Suppose 10 packets arrive at once and there are 5 slots to use. We use a linear function for the deferral penalty (i.e., a packet that arrives in the j-th slot and is transmitted in the k-th slot, k ≥ j, incurs deferral penalty k − j). We use f(x) = 2^{0.5x} − 1 to represent the transmission power, where x is the number of packets scheduled in the same slot.
Table 7.1: Comparison of three scheduling schemes

Scheduling scheme            Deferral   Energy   Total penalty
Batch: (10,0,0,0,0)             0         31         31
Constant rate: (2,2,2,2,2)     20          5         25
Dynamic: (4,3,2,1,0)           10        6.2       16.2

Let (S_1, S_2, S_3, S_4, S_5) denote the number of packets scheduled in the 5 slots. The schedules for the three illustrative schemes and the corresponding costs are shown in Table 7.1. From Table 7.1, we can see that batch scheduling has a small deferral penalty but a high energy cost, while constant rate scheduling has a small energy cost but a high deferral penalty; both are more costly than the dynamic schedule. Intuitively, we should schedule more packets in earlier slots than in later slots to balance the tradeoff between the deferral penalty and the energy cost. However, in a real system, in which arrivals can happen at any slot, the optimal schedule is not obvious. In this chapter, we investigate this problem and try to find the optimal or close-to-optimal schedule for arbitrary arrivals. The problem is particularly challenging in the online case, where scheduling and deferring decisions must be made without knowledge of future arrivals.

Other examples that fit the general formulation we introduce and solve in this chapter include multi-packet reception from cooperative transmitters (in which a higher number of packets sent in the same slot incurs a higher energy cost), as well as multiple sender-receiver pairs employing interference mitigation strategies (such as successive interference cancellation), where again there is a higher energy cost for allowing multiple interfering packets in a given slot. Though in wireless networks the transmission power grows exponentially with the rate (the number of packets scheduled in a slot), our model is in fact even more general, as the only requirement we impose on the energy cost is that it be strictly convex, increasing, and exponentially bounded.
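The numbers in Table 7.1 can be reproduced with a short script of ours, using the example's parameters (linear deferral penalty, f(x) = 2^{0.5x} − 1, weight w = 1):

```python
# Reproduce the Table 7.1 costs; all 10 packets arrive in slot 0 (0-indexed),
# so a packet sent in slot j contributes a deferral penalty of j.
def energy(x):
    return 2 ** (0.5 * x) - 1 if x > 0 else 0.0

def total_cost(schedule):
    """schedule[j] = number of packets sent in slot j."""
    deferral = sum(j * x for j, x in enumerate(schedule))
    return deferral + sum(energy(x) for x in schedule)

batch    = total_cost([10, 0, 0, 0, 0])   # 0 + 31 = 31
constant = total_cost([2, 2, 2, 2, 2])    # 20 + 5 = 25
dynamic  = total_cost([4, 3, 2, 1, 0])    # 10 + 6.24... ~= 16.2
```

The dynamic schedule's advantage over both extremes is exactly the tradeoff the chapter formalizes.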
7.2.2 Problem Formulation

Consider a wireless network in which every node interferes with every other node. The channel in this network is time slotted. Packets arrive at the beginning of a time slot and are scheduled by a centralized scheduler. A packet can be transmitted in the same slot as its arrival, in which case its transmission deferral is 0, or in a later slot, in which case its deferral is a positive number. We take the deferral penalty to be linear. A packet transmitted in a slot is affected by the "interference" from all packets transmitted in the same slot. This is modeled as an energy cost determined purely by the number of packets transmitted in that slot. We use f(X) to denote the energy cost in a slot, where X is the number of packets sent in that slot; f(X) is assumed to be any strictly convex, increasing, and exponentially bounded function of X. We define the total penalty J(A) as a weighted sum of the deferral penalty and the energy cost, as shown in Eq. 7.1:

J(A) = w_1 · Σ_{i=1}^{N} d_i + w_2 · Σ_{j=1}^{M} f(X_j),     (7.1)

where w_1, w_2 are the weights, M is the number of slots (M can be ∞), and N is the total number of packets that arrive. The objective of the centralized scheduler is to minimize J(A). Letting w = w_2/w_1, minimizing J(A) is equivalent to minimizing the total cost C(A) in Eq. 7.2:

C(A) = Σ_{i=1}^{N} d_i + w · Σ_{j=1}^{M} f(X_j).     (7.2)

Note that in this model, what matters is the number of packets scheduled in each slot and the aggregate number of deferrals for each packet; the model is not concerned with where the packets come from, nor with which particular nodes or packets are scheduled in which slot.

7.3 Optimal Policy with Complete Channel Information

7.3.1 Greedy Algorithm for a Single Arrival

In this subsection, we consider the scenario in which there is a single arrival.
We start our timing at t = 1, the slot of the first arrival. We assume that the number of packets in this single arrival is N, and each packet can be scheduled in any slot t ≥ 1. The Greedy algorithm, denoted G(N), is shown in Algorithm 5.

Algorithm 5 Greedy Algorithm for a Single Arrival
Initialization:
  N packets arrive in the 1st time slot; number them 1, 2, ..., N.
  The number of packets scheduled in each slot: X_j = 0 for j = 1, 2, ...
  Total cost C = 0.
Greedy Schedule:
  for packet index i = 1, 2, ..., N do
    Put packet i in the slot that minimizes the marginal cost
    Update the number of packets in the selected slot
    Update the total cost
  end for

If there is a tie in minimizing the marginal cost, the slot with the smallest index is selected; this ensures the Greedy algorithm is unique, since the slot selected at each step is uniquely determined. Let mc_j^(i) denote the marginal cost of putting the i-th packet in the j-th slot. We can derive the following lemmas for the Greedy algorithm.

Lemma 5. The least marginal cost in the Greedy algorithm is nondecreasing.

Proof. Let us compare the marginal costs of scheduling the i-th packet and the (i+1)-th packet. When scheduling the (i+1)-th packet, the number of packets in each slot is exactly the same as when scheduling the i-th packet, except for the one slot that received the i-th packet. Assume the i-th packet was put in the k-th slot; then mc_j^(i) ≥ mc_k^(i) for j ≠ k. If the (i+1)-th packet is scheduled in a slot j ≠ k, we know mc_j^(i+1) = mc_j^(i) ≥ mc_k^(i). If the (i+1)-th packet is scheduled in slot k, let us compare the marginal cost of putting the (i+1)-th packet in the k-th slot with that of putting the i-th packet there. Since the energy cost function f(X) is convex and increasing, we have

mc_k^(i+1) − mc_k^(i) = w·(f(X_k + 1) − 2f(X_k) + f(X_k − 1)) ≥ 0,

where X_k is the number of packets in slot k after the i-th packet is scheduled there. Hence mc_k^(i+1) ≥ mc_k^(i).
Thus, min_{j=1,2,...} mc_j^(i) ≤ min_{j=1,2,...} mc_j^(i+1): the least marginal cost in the Greedy algorithm is nondecreasing.

Lemma 6. Assume that (..., g_1, g_2, ..., g_k, ...) and (..., g'_1, g'_2, ..., g'_k, ...) are both schedules obtained from the Greedy algorithm. If Σ_{t=1}^{k} g_t ≤ Σ_{t=1}^{k} g'_t, then g_i ≤ g'_i for i = 1, ..., k.

Proof. When the Greedy algorithm selects a slot in [t+1, t+2, ..., t+k], it selects the one with the least marginal cost, with the smallest index in case of a tie. Since such a slot is unique, the order in which the Greedy algorithm fills [t+1, ..., t+k] is uniquely determined. Since Σ_{t=1}^{k} g_t ≤ Σ_{t=1}^{k} g'_t, the schedule must pass through (..., g_1, g_2, ..., g_k, ...) first, which implies g_i ≤ g'_i for i = 1, ..., k.

Define the Separate Greedy algorithm (denoted SG) for scheduling N packets over M slots as follows. Here we use the horizon M for simplicity of the following discussion; M is taken large enough: for example, since the maximum energy cost in a slot is bounded by f(N), when M > w·f(N) no packet will be scheduled in slot M in the optimal schedule. We separate the N packets into two parts, of N − k and k packets, and the slots into two parts, the first m and the last M − m. The N − k packets are scheduled in the first m slots, while the k packets are scheduled in the last M − m slots; the problem becomes (N − k, m) and (k, M − m). We apply the Greedy algorithm separately to each part. Note that when we apply the Greedy algorithm to (k, M − m), we construct a fictitious arrival which assumes the k packets arrive in the (m+1)-th slot, called the Fictitious arrival.

Let G(M, N) = (g_1, g_2, ..., g_m, g_{m+1}, ..., g_M) denote the result of applying the Greedy algorithm to schedule N packets in M slots, and let SG(m, N − k; M − m, k) = (g'_1, g'_2, ..., g'_m, g'_{m+1}, g'_{m+2}, ..., g'_M) denote the result of applying the Separate Greedy algorithm.
The cost of SG is defined as C(SG(m, N − k; M − m, k)) = C(G(m, N − k)) + C(G(M − m, k)) + k·m.

Lemma 7. C(G(M, N)) ≤ C(SG(m, N − k; M − m, k)).

Proof. Since the delay is linear, the marginal costs under the Fictitious arrival differ from those of the original problem (with the arrival in the 1st slot) by the constant m. Thus, the order of selections of the Greedy algorithm on the Fictitious arrival is the same as for the original problem. Let k' = Σ_{t=m+1}^{M} g_t. If k' = k, the two algorithms give the same solution, since the Greedy result is unique. If k' < k, then compared with G, the first m slots of SG lack (k − k') steps, and the cost reduction is at most (k − k')·mc*, where mc* is the largest least marginal cost, i.e., the marginal cost of the N-th step of G; the last (M − m) slots obtain (k − k') extra steps, and the cost increase is at least (k − k')·mc*, since G stops at the k'-th step for the second part and the marginal cost is nondecreasing. Therefore, the total cost of SG is no less than that of G for k' < k. The case k' > k is proved similarly.

In other words, suppose we sort the schedule steps of the two parts of SG in order of increasing marginal cost. No steps swap after the sorting: if step k precedes step m in SG, this still holds after sorting. Comparing the cost of each step of SG with that of G one by one, the step cost of SG is no less than that of G. Thus, C(G(M, N)) ≤ C(SG(m, N − k; M − m, k)).

Theorem 10. The Greedy algorithm is optimal.

Proof. Let OPT(M, N) = (o_1, o_2, ..., o_m) = OPT(m, N) be an optimal schedule, where m ≤ M, o_m > 0, and Σ_{t=1}^{m} o_t = N. Once o_m is fixed, the total cost is determined by the schedule of the remaining m − 1 slots: minimizing the total cost of scheduling N packets in M slots is then equivalent to minimizing the total cost of scheduling the remaining N − o_m packets in the first m − 1 slots. Thus OPT(m, N) = (OPT(m − 1, N − o_m), o_m). Let G(m, N) = (g_1, g_2, ..., g_m) be the Greedy schedule of N packets in m slots, with Σ_{t=1}^{m} g_t = N. We prove the theorem by induction.

Basis step: N = 1. OPT(M, 1) = (1) = G(M, 1).

Inductive step: Assume that for k packets, 0 < k < N, the Greedy algorithm finds an optimal solution; we then consider the optimal schedule for k = N.
We use the induction to prove the theorem. Basis step:N = 1,OPT (M; 1) = (1);G(M; 1) = (1). Inductive step: Assume that fork packets, 0<k<N, the Greedy algorithm can find an optimal solution, then we consider the optimal schedule fork =N. 136 Sinceo m > 0,N OPTnm = P m1 t=1 o t < N, the Greedy algorithm can give an optimal solution. OPT (M;N) =(g 0 1 ;g 0 2 ;:::g 0 m1 ;o m ), where the schedule of the firstm 1 slots is got from Greedy algorithm.OPT (M;N) =SG(m 1;N OPTnm ; 1;o m ). According to Lemma 7,C(G(m;N))C(OPT (M;N)). C(G(M;N)) =C(SG(m;N;Mm; 0)). Apply Lemma 7 again, C(G(M;N)) C(G(m;N)). Thus, the Greedy algorithm is optimal. 7.3.2 Backward Greedy Algorithm for Multiple Arrivals In this subsection, we consider the general scenario in which packets arrive at different slots. We assume that there are K arrivals, and let N i (i = 1;::K) be the number of packets arrive at each arrival. The Backward Greedy algorithm (denotedBG) is as shown in Algorithm 6. We first schedule the last arrival’s (theK th arrival) packets by greedily selecting the slot from t K to1 which minimizes the marginal cost, this is equivalent to the case of single arrival. Then we consider the second last arrival’s packets (the (K 1) th arrival), and schedule them one by one by selecting the slot fromt K1 to1 which minimize the marginal cost. Such a process keeps going until all packets are scheduled. 137 Algorithm 6 Backward Greedy Algorithm Initialization: The time slot index for new arrivals:t 1 ;t 2 ; ;t K The number of packets for each new arrival:N 1 ;N 2 ; ;N K The number of packets scheduled in each slot:X j = 0 forj = 1; 2; . Total costC = 0. Backward Greedy Schedule: for Arrival indexa =K;K 1; ; 1 do for Each packet from thea th arrival do Put the packet in a slot fromt a to1 which minimizes the marginal cost. Update the number of packets in the selected slot Update the total cost end for end for The Fig. 7.2 is a demonstration of how backward greedy works. 
We consider two arrivals, and compare against the schedule when only the first arrival is present. From Fig. 7.2, we can see that due to the second arrival, the packets of the first arrival are pushed to earlier slots.

Figure 7.2: Backward greedy algorithm illustration

Note that fairness is not the concern of this chapter, so some earlier packets may be scheduled later than packets that arrived after them. However, as the Backward Greedy algorithm determines only the number of packets scheduled in each slot, it is still possible to reorder the packet transmissions (e.g., to FIFO order) to restore fairness.

Lemma 8. In BG, considering the packets of the i-th arrival, the marginal cost when scheduling these packets is nondecreasing.

Proof. For the i-th arrival, the marginal cost of scheduling a packet within [t_j, t_{j+1}] (j ≥ i) is nondecreasing by Lemma 5. We can sort the marginal costs without swapping the schedule steps: if step k happens before step m within some [t_j, t_{j+1}] (j ≥ i), this relationship still holds in the sorted order. The selection order of BG is exactly the order of the sorted marginal costs.

Theorem 11. The Backward Greedy algorithm is optimal.

Proof. The total cost is composed of two parts: the total deferral cost and the total energy cost. Consider first the total deferral cost, denoted C_d. Since the deferral cost is linear, C_d = Σ_{t=1}^{M} r_t, where r_t is the number of packets not yet sent at the end of the t-th slot. Whether the remaining packets are from a new arrival or from an old arrival does not affect C_d. Because of this, setting fairness aside, we can freely schedule the newly arrived packets first. The energy cost, denoted C_e, depends purely on the number of packets scheduled in each slot. Thus, scheduling backward does not affect the total cost.
We next consider the simplest multiple-arrival case, with two arrivals: N_1 packets arrive at t_1 and N_2 packets arrive at t_2, where t_1 < t_2 ≤ M (M can be ∞). Assume the optimal schedule puts N_1 − r packets in slots t_1 to t_2 − 1 and N_2 + r packets in slots t_2 to M. If r = 0, the problem decouples into two single-arrival problems, and by Theorem 10 the BG algorithm is optimal. If r > 0, the deferral cost of these r packets is r(t_2 − t_1) at the beginning of slot t_2. Assume these r packets arrive at slot t_2 (a Fictitious arrival); the total cost is then r(t_2 − t_1) less than in the original problem, so minimizing the total cost of the original problem is equivalent to minimizing the total cost under the Fictitious arrival. The Fictitious arrival decouples into two single-arrival problems, and by Theorem 10 we can apply the Greedy algorithm to each to obtain an optimal solution. Let G_L(N_1 − r, t_2 − t_1) = (g'_{l1}, g'_{l2}, ..., g'_{lk}) denote the schedule of the N_1 − r packets, where k = t_2 − t_1, and let G_R(N_2 + r, M − t_2) = (g'_{r1}, g'_{r2}, ..., g'_{rm}) denote the schedule of the N_2 + r packets, where m = M − t_2; in G_R(N_2 + r, M − t_2), we can still consider the N_2 packets to be scheduled first. (G_L, G_R) is an optimal schedule. Assume the BG algorithm gives a schedule in which N_1 − r' packets are scheduled in slots t_1 to t_2 − 1 and N_2 + r' packets in slots t_2 to M.

If r' = r: when scheduling the r packets in slots t_2 to M, the marginal cost obtained by BG equals the marginal cost obtained by G_R(N_2 + r, M − t_2) under the Fictitious arrival plus the constant r(t_2 − t_1). Since both BG and G_R under the Fictitious arrival select slots in order of increasing marginal cost, BG and (G_L, G_R) give the same schedule.

If r' < r: consider the schedule of the first arrival's N_1 packets, and compare (G_L, G_R) with BG. In the first t_2 − t_1 slots, (G_L, G_R) lacks (r − r') steps, and the cost reduction is at most (r − r')·mc*, where mc* is the largest least marginal cost, i.e., the marginal cost of the N_1-th step of scheduling the first arrival's packets in BG; the slots t_2 to M obtain (r − r') extra steps, and the cost increase is at least (r − r')·mc*, since BG stops at the r'-th step for the second part (packets from t_1 pushed into slots t_2 onward) and the marginal cost is nondecreasing. Thus the total cost of (G_L, G_R) is no less than that of BG; since (G_L, G_R) is optimal, BG is optimal in this case. The case r' > r is proved similarly.

Finally, consider K arrivals: BG(N_1, BG(N_2, ..., BG(N_{K−1}, N_K))). BG(N_{K−1}, N_K) is a two-arrival case, for which we have proved that BG gives an optimal schedule. Let OPT_{K−1} = BG(N_{K−1}, N_K), and consider the schedule BG(N_{K−2}, OPT_{K−1}): OPT_{K−1} can be treated as a "single" arrival, so the problem again becomes a two-arrival case. Proceeding recursively, BG is optimal.

Two remarks:
1. The optimal schedule changes with the set of arrivals. For example, with K arrivals, the optimal schedule differs for different K; but given K, we can find the optimal by scheduling backward in time.
2. It is crucial that the energy cost function is convex. It is not difficult to construct a counterexample with a non-convex energy cost function for which optimality does not hold.

7.4 Online Learning with Incomplete Knowledge of Arrivals

In this section, we propose an efficient online algorithm which is O(1)-competitive. Our online algorithm is based on [113]; before introducing it, let us briefly review the scheduling mechanism of [113]. In [113], the author develops an online dynamic speed scaling algorithm for the objective of minimizing a linear combination of energy and response time. An instance consists of n jobs, where job i has a release time r_i and
An instance consists ofn jobs, where jobi has a release timer i , and 142 a positive worky i . An online scheduler is not aware of jobi until timer i , and, at timer i , it learnsy i . For each time, a scheduler specifies a job to be run and a speed at which the processor is run. They assume that preemption is allowed, that is, a job may be suspended and later restarted from the point of suspension. A jobi completes oncey i units of work have been performed oni. The speed is the rate at which work is completed; a job with work y run at a constant speed s completes in y s seconds. The objective of the online scheduler is to minimize R I G(t)dt, whereG(t) = P (s t ) +n t ,s t is the speed at timet, n t is the number of unfinished jobs, andI is the time interval. P also needs to satisfy the following conditions: P is defined, continuous and differentiable at all speeds in [0;1); P (0) = 0;P is strictly increasing;P is strictly convex;P is unbounded. The authors propose an algorithm as follows: The scheduler schedules the unfinished job with the least remaining unfinished work, and runs at speeds t where s t = 8 > > > < > > > : P 1 (n t + 1) ifn t 1 0 ifn t = 0 : (7.3) Every time when a new job is released or a job is finished,s t is updated. Please note that in the dynamic speed scaling problem, the job can be released and finished at any time t, where t is a real number. We call this algorithm continuous online algorithm, denoted asA On C . 143 In our problem, each packet can be considered as a job with unit work,P (x) =wf(x) in our case, sincef(x) satisfies all the conditions forP , so isP (x) = wf(x); n t is the number of unscheduled packets at timet. We develop our online algorithm, denoted as A On D (D means discrete), as Algorithm 7 shown. 
Algorithm 7 Online Algorithm
Initialization:
  The total number of packets scheduled so far by algorithm A_C^On: n_s = 0 (n_s may later be fractional)
  The number of packets arrived so far: n_t = 0
  The number of packets scheduled in each slot: X_t = 0 for t = 1, 2, ...
  Total cost: C = 0.
Online Schedule:
  for t = 1, 2, ... do
    Update n_t if there are new arrivals
    Let n be the number of packets waiting to be scheduled at the start of interval t, excluding the fractional packet possibly left over by A_C^On but already scheduled by A_D^On in the previous interval. During the interval [t, t + Δt_1], A_C^On schedules this leftover packet. A_D^On tracks A_C^On's schedule from (t + Δt_1) onward using Eq. 7.3, starting with n remaining packets.
    Update n_s based on the number of packets (possibly fractional) scheduled in the slot by A_C^On
    Update X_t = ⌈n_s^{t+1}⌉ − ⌈n_s^{t}⌉, where n_s^{t} denotes n_s at the start of slot t
    Update C based on the schedule of X_t
  end for

In A_D^On, the number of packets scheduled in each slot is derived from the number of packets scheduled by A_C^On. In algorithm A_C^On, the transmission of a packet does not necessarily finish at an integer time point. If, at the end of a slot, a packet is scheduled across two slots by A_C^On, we push that packet into the earlier slot in A_D^On, as shown in Fig. 7.3.

Figure 7.3: Example of scheduling in A_C^On and A_D^On

Though A_D^On is based on A_C^On, the two differ in several respects:
1. The speed of A_C^On varies within a slot, while the speed of A_D^On within a slot can be considered constant.
2. The number of packets scheduled in a slot by A_C^On is a real number, while in A_D^On it must be an integer.
3. In A_C^On, the cost is updated every time new packets arrive or a packet departs; in other words, the cost is computed continuously in time. In A_D^On, the cost is updated at the end of each slot (i.e., at integer time points), discretely in time.
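To make Eq. 7.3 concrete, here is a small sketch of ours (not code from the dissertation) using the energy function P(x) = 2^{0.5x} − 1 from the motivating example (w = 1), whose inverse is P^{-1}(y) = 2·log_2(y + 1):

```python
import math

def P(x):
    """Energy cost per unit time at speed x."""
    return 2 ** (0.5 * x) - 1

def P_inv(y):
    """Inverse of P: the speed whose power draw is y."""
    return 2 * math.log2(y + 1)

def speed(n_t):
    """Speed chosen by the continuous online algorithm (Eq. 7.3)."""
    return P_inv(n_t + 1) if n_t >= 1 else 0.0
```

While running with n_t unfinished packets, the instantaneous cost is P(s_t) + n_t = (n_t + 1) + n_t = 2n_t + 1, which is exactly the quantity that appears in the proof of Lemma 9 below.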
Due to these differences, the competitive analysis of this problem is challenging, and the amortized competitive analysis used in [113, 115] cannot be applied directly. We therefore take a different path, using A_C^On as a bridge for the competitive analysis.

Lemma 9. There exists a constant c such that C(A_D^On)/C(A_C^On) ≤ c.

Proof. First, the packets scheduled by A_D^On and A_C^On are roughly the same: the difference is less than 1 in each slot, and up to any time T the difference in the total number of packets scheduled by A_D^On and A_C^On is less than 1. We therefore compare the costs of A_D^On and A_C^On slot by slot. Let C(A)_t denote the cost in slot t. Suppose that during slot t, A_D^On schedules k packets; the cost of A_D^On is C(A_D^On)_t = P(k) + n − k. For algorithm A_C^On, besides the possible leftover packet at the beginning of the slot (scheduled in [t, t + Δt_1]), there may also be a packet processed only partially at the end of the slot; let Δt_2 be the time spent on that packet in slot t. Assume there are k − 1 whole packets between these two fractional packets if Δt_2 > 0, and k whole packets if Δt_2 = 0.

Case 1: Δt_2 > 0. Then

Δt_1 + Δt_2 + 1/P^{-1}(n + 1) + 1/P^{-1}(n) + ... + 1/P^{-1}(n + 3 − k) = 1.

Since P is increasing, 1/P^{-1} is decreasing, so

(k − 1)/P^{-1}(n + 1) ≤ 1   and   (k + 1)/P^{-1}(n + 2 − k) ≥ 1,

which give

P(k − 1) ≤ n + 1   and   P(k + 1) + k ≥ n + 2.

As for the cost of A_C^On, the cost of processing a packet while there are n unscheduled packets is

(1/P^{-1}(n + 1)) · (P(P^{-1}(n + 1)) + n) = (2n + 1)/P^{-1}(n + 1).

Hence

C(A_C^On)_t ≥ (2n + 1)/P^{-1}(n + 1) + ... + (2(n + 2 − k) + 1)/P^{-1}(n + 3 − k)
           ≥ (1/P^{-1}(n + 1)) · ((2n + 1) + ... + (2(n + 2 − k) + 1))
           = ((k − 1)/P^{-1}(n + 1)) · (2n − k + 3).

The first inequality ignores the cost of the fractional packets at the beginning and end of the slot; the second follows by bounding the processing time of each packet by the smallest whole-packet processing time, 1/P^{-1}(n + 1).
Since P is exponentially bounded, assume P(x + 1)/P(x) ≤ M for all x. The cost of A_D^On is then

C(A_D^On)_t = P(k) + n − k ≤ M·P(k − 1) + n − k ≤ M(n + 1) + n − k = (M + 1)n + M − k.     (7.4)

Since the energy cost function P(x) is strictly convex, increasing, and exponentially bounded, the least expensive function satisfying these conditions (the one with the slowest growth) is the polynomial f(x) = x^α with α > 1. With P(x) = w·f(x) = w·x^α, we have P(c_1 x) = w(c_1 x)^α ≥ w·c_1^α·x; as long as w·c_1^α ≥ 1, P(c_1 x) ≥ x for all x ≥ 1. Similarly, x + P(x + 2) ≤ P(c_1 x) + P(3x) ≤ 2·max{P(c_1 x), P(3x)} = 2·P(c_2 x), where c_2 = max{c_1, 3}. Applying the same approach again, there exists a constant c_3 ≤ 2c_2 such that P(c_3 x) ≥ x + P(x + 2) for all x ≥ 1. Plugging in x = k − 1, we get

P(c_3(k − 1)) ≥ P(k + 1) + k − 1 ≥ n + 1,

and thus (k − 1)/P^{-1}(n + 1) ≥ 1/c_3.

For other strictly convex, increasing, and exponentially bounded functions, which grow faster than f(x) = x^α, a similar approach shows that such a constant c_3 exists. Therefore

C(A_D^On)_t / C(A_C^On)_t ≤ c_3 · ((M + 1)n + M − k) / (2n − k + 3) ≤ c,

and C(A_D^On)/C(A_C^On) = Σ_t C(A_D^On)_t / Σ_t C(A_C^On)_t ≤ c.

Case 2: Δt_2 = 0. There are two possibilities: either A_C^On finishes processing all packets before the slot ends, or it finishes processing a whole packet exactly at the slot boundary. In both cases, the number of whole packets scheduled by A_C^On and A_D^On is k, and the same approach shows that there exists a constant c such that C(A_D^On)/C(A_C^On) ≤ c.

As an example of c, consider the motivating example of Section 7.2.1, where P(k) = 2^{0.5k} − 1: P(cx) ≥ x + P(x + 2) for all x ≥ 1 when c ≥ 3.88.

We also use the following lemma, which is implied by the results in [113].

Lemma 10. Let C(A_C^Opt) be the optimal cost for the continuous online scheduling problem of minimizing ∫_I G(t) dt. Then C(A_C^On)/C(A_C^Opt) ≤ 3.

Lemma 11. Let A_D^Opt denote the optimal schedule (BG) of our original (discrete) problem. There exists a constant c' such that C(A_C^Opt)/C(A_D^Opt) ≤ c'.

Proof.
To measure the gap between the cost of A_D^Opt and the cost of A_C^Opt, we introduce an intermediate scheduling mechanism in which the speed is updated only at integer time points, but the cost is computed in an integral fashion. We call this the fictitious continuous algorithm, denoted A_FC^Opt. A_FC^Opt works as follows: from algorithm BG we know the number of packets sent in each slot, X_t, t = 1, 2, ...; A_FC^Opt uses the constant speed X_t in slot t. As with A_C^On, the cost up to time T is ∫_0^T (P(s_t) + n_t) dt. Consider slot t, in which the optimal schedule sends k = X_t packets:

C(A_D^Opt)_t = P(k) + n − k;

C(A_FC^Opt)_t = (1/k)(P(k) + n) + ... + (1/k)(P(k) + n − k + 1)
             = P(k) + (2n − k + 1)/2
             = P(k) + n − k + (k + 1)/2.

Similar to the proof of Lemma 9, there exists a constant c' such that

(c' − 1)(P(k) + n − k) ≥ (c' − 1)·P(k) ≥ (k + 1)/2.

Thus C(A_FC^Opt)/C(A_D^Opt) = Σ_t C(A_FC^Opt)_t / Σ_t C(A_D^Opt)_t ≤ c'. Since the costs of A_FC^Opt and A_C^Opt are computed in the same way and A_C^Opt is optimal for the continuous problem, C(A_C^Opt)/C(A_FC^Opt) ≤ 1. Thus

C(A_C^Opt)/C(A_D^Opt) = (C(A_C^Opt)/C(A_FC^Opt)) · (C(A_FC^Opt)/C(A_D^Opt)) ≤ c'.

As an example of c', take P(k) = 2^{0.5k} − 1. Since n ≥ k, (c' − 1)(P(k) + n − k) ≥ (k + 1)/2 when c' ≥ 3.42.

Theorem 12. A_D^On is O(1)-competitive.

Proof. From the above lemmas, we get

C(A_D^On)/C(A_D^Opt) = (C(A_D^On)/C(A_C^On)) · (C(A_C^On)/C(A_C^Opt)) · (C(A_C^Opt)/C(A_D^Opt)) < 3cc',

which is O(1)-competitive.

7.5 Simulations

In this section, we run simulations to test the performance of the online algorithm. We mainly use two families of energy functions, polynomial and exponential, and randomly generate arrivals from three patterns: bursty arrivals, constant arrivals, and random arrivals. Bursty and constant arrivals can be considered special cases of random arrivals.
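The two example constants quoted above for P(k) = 2^{0.5k} − 1 (c ≥ 3.88 in Lemma 9's example and c' ≥ 3.42 in Lemma 11's) can be verified numerically; a quick check of ours:

```python
# Check the two inequalities behind the example constants for P(k) = 2**(0.5k) - 1.
def P(k):
    return 2 ** (0.5 * k) - 1

# Lemma 9 example: P(c*x) >= x + P(x + 2) for all x >= 1 when c = 3.88.
c = 3.88
lemma9_ok = all(P(c * x) >= x + P(x + 2) for x in range(1, 201))

# Lemma 11 example: with n >= k, (c' - 1)(P(k) + n - k) >= (k + 1)/2 reduces
# to (c' - 1) * P(k) >= (k + 1)/2; the binding case is k = 1.
c_prime = 3.42
lemma11_ok = all((c_prime - 1) * P(k) >= (k + 1) / 2 for k in range(1, 201))
```

Both checks pass, and both constants are nearly tight: c = 3.80 already fails at x = 1, and c' = 3.41 fails at k = 1.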
In wireless networks, several applications generate bursty traffic, such as surveillance and habitat monitoring applications in wireless sensor networks [133–135]. Constant traffic occurs less often in real wireless applications, though the traffic pattern while transferring a block of video content in video streaming can be considered constant [136]. In these simulations², we change the coefficient of the energy function to represent different weights w, using f(x) = 0.5x² and f(x) = e^{0.5x} − 1. The simulation results are shown in Fig. 7.4.

² The simulation code can be found in [137].

Figure 7.4: Snapshots of simulation results for different arrival patterns

Fig. 7.4 shows snapshots of the simulation results. From Fig. 7.4, we can see that the schedule differs for different energy functions. Although in some slots the number of packets scheduled by the online algorithm and by the optimal can be large, the increased cost is amortized across other slots. Thus, the competitive ratio is small, indicating that the online algorithm's performance is close to optimal.

We also run more intensive tests by selecting different energy cost functions and running the simulation over 1000 times for each function. We randomly generate a traffic arrival pattern for each run, and record the average and the largest competitive ratio for each function, as shown in Table 7.2. Although the competitive ratio in principle depends on the cost function, in our simulations we find this ratio is always less than 2.

Table 7.2: Competitive ratio in simulations

Energy cost function       Average   Worst
f(x) = 0.5x²               1.3157    1.6076
f(x) = x^{1.1}             1.2971    1.5924
f(x) = 0.25x³              1.2957    1.8169
f(x) = 0.25(2^x − 1)       1.1803    1.8375
f(x) = 2^{0.5x} − 1        1.1663    1.8547

7.6 Summary

In this chapter we have studied the optimal centralized scheduling of packets in a time-slotted channel to effect desired energy-delay tradeoffs.
Under a very general energy cost model, assumed only to be a strictly convex increasing function of the number of packets transmitted in a given slot, and a deferral penalty that is linear in the number of slots each packet is deferred, we aim to minimize a weighted linear combination of the deferral and energy penalties. We have proved that, given full knowledge of the arrivals, the centralized scheduler can optimally schedule the packets in each slot using a simple greedy algorithm. We have also considered the more realistic scenario in which the centralized scheduler knows only the arrivals so far, and have developed an efficient online algorithm that is O(1)-competitive.

Chapter 8
Conclusions and Open Questions

In wireless networks, it is quite common for a transmitter to have to take transmission actions without complete information about the channel conditions. In this dissertation, we have presented three case studies:
- Transmission over a Markovian channel
- Dynamic multi-carrier selection
- Transmission for random arrivals with an energy-delay tradeoff

We have shown that even when the transmitter has incomplete information, it is still possible to design efficient algorithms that achieve near-optimal performance. In these studies, we first investigate the situation in which the information is complete and prove that the optimal solution has some structure; we then study the realistic situation in which some of the channel parameters are unknown. Armed with theoretical tools such as game theory, reinforcement learning, and mechanism design, we have designed novel algorithms and proved that they either have a performance bound or converge to the optimal in the long run. While the results in this dissertation provide useful insights into real-world optimization with unknown variables, a number of interesting open questions and extension directions remain to be explored.
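The energy-delay scheduling case study can be illustrated with a small sketch. The dissertation's own greedy and online algorithms are not reproduced here; the code below only shows one plausible per-slot greedy rule for a strictly convex energy cost with a linear deferral penalty, and every name and the specific marginal-cost stopping rule are our own illustrative assumptions:

```python
def greedy_slot_size(queue_len, P, defer_cost):
    # Send one more packet in the current slot as long as the marginal
    # energy cost P(k+1) - P(k) is at most the per-slot deferral penalty
    # that the extra packet would otherwise keep incurring.
    k = 0
    while k < queue_len and P(k + 1) - P(k) <= defer_cost:
        k += 1
    return k

def simulate(arrivals, P, defer_cost):
    # arrivals[t] packets arrive at slot t; return the total cost
    # (energy plus linear deferral penalty on the leftover queue).
    queue, total = 0, 0.0
    for a in arrivals:
        queue += a
        k = greedy_slot_size(queue, P, defer_cost)
        queue -= k
        total += P(k) + defer_cost * queue
    return total

P = lambda k: 0.5 * k ** 2   # strictly convex energy cost used in the simulations
print(simulate([3, 0, 2, 5, 0, 0], P, defer_cost=1.0))
```

With a convex P, the stopping rule balances the marginal energy cost against the deferral weight, which is the flavor of tradeoff the chapter's optimal schedule exploits.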
8.1 Transmission over Multiple Markovian Channels

While we have proved that the optimal transmission policy over a single Markovian channel has a single-threshold structure and have given a formula to calculate the threshold, it is unclear whether the structure remains this simple and elegant when the transmitter is allowed to transmit data over multiple Markovian channels simultaneously. The number of possible power and data-rate allocation strategies grows exponentially with the number of channels. Whether the optimal policy still has a threshold structure, whether it is computable, and whether we can design online learning algorithms to learn the channel parameters when they are unknown are open questions for future investigation.

8.2 Regulating Carrier Actions for Markovian Channels

In this dissertation, we have discussed the scenario in which a transmitter transmits data over multiple i.i.d. channels owned by different carriers. An i.i.d. channel has only one parameter, p, the probability of a successful transmission. When the channels are Markovian, however, each channel has two parameters: the probability p01 that the channel goes from the bad state to the good state, and the probability p00 that it stays in the bad state. This greatly increases the complexity of the problem, since the transmitter must find a way to obtain information about both parameters. Designing a game or mechanism that lets the transmitter efficiently transmit data even though it cannot directly observe the channel parameters is future work.

8.3 Decentralized Transmission Policy with Energy-Delay Tradeoff

We have studied the scheduling problem on a time-slotted channel with an energy-delay tradeoff from an oracle-based point of view in this dissertation. However, in real systems, the nodes are more likely to be distributed and self-configured.
For future work, we plan to consider distributed scheduling, in which different nodes make independent decisions. This could potentially be modeled in a game-theoretic framework. Another open problem is online scheduling on a time-slotted channel where the channel conditions vary over time, for example due to fading; this could be modeled by a time-varying energy penalty.

Reference List

[1] P. Sharma, “Evolution of mobile wireless communication networks-1g to 5g as well as future prospective of next generation communication network,” International Journal of Computer Science and Mobile Computing, vol. 2, no. 8, pp. 47–53, 2013.
[2] O. Oyman, J. Foerster, Y.-j. Tcha, and S.-C. Lee, “Toward enhanced mobile video services over wimax and LTE [wimax/LTE update],” IEEE Communications Magazine, vol. 48, no. 8, pp. 68–76, 2010.
[3] C. Guglielmo, “Cisco mobile data shows surge in smartphone users, 4g usage.” http://www.forbes.com/sites/connieguglielmo/2013/02/06/cisco-mobile-data-shows-surge-in-smartphone-users-4g-usage/. Online; accessed 04/10/2015.
[4] D. Goldman, “Sorry, america: Your wireless airwaves are full.” http://money.cnn.com/2012/02/21/technology/spectrum_crunch/index.htm. Online; accessed 04/10/2015.
[5] P. Auer, N. Cesa-Bianchi, and P. Fischer, “Finite-time analysis of the multiarmed bandit problem,” Machine Learning, vol. 47, no. 2-3, pp. 235–256, 2002.
[6] L. Canzian, Y. Xiao, W. Zame, M. Zorzi, and M. van der Schaar, “Intervention with private information, imperfect monitoring and costly communication,” IEEE Transactions on Communications, vol. 61, no. 8, pp. 3192–3205, 2013.
[7] J. Park and M. van der Schaar, “The theory of intervention games for resource sharing in wireless communications,” IEEE Journal on Selected Areas in Communications, vol. 30, no. 1, pp. 165–175, 2012.
[8] Y. Xiao, J. Park, and M.
van der Schaar, “Intervention in power control games with selfish users,” IEEE Journal of Selected Topics in Signal Processing, vol. 6, no. 2, pp. 165–179, 2012.
[9] G. Iosifidis, L. Gao, J. Huang, and L. Tassiulas, “An iterative double auction for mobile data offloading,” in Proceedings of the 11th International Symposium on Modeling and Optimization in Mobile, Ad Hoc and Wireless Networks (WiOpt), pp. 154–161, May 2013.
[10] M. J. Osborne, An introduction to game theory, vol. 3. Oxford University Press New York, 2004.
[11] V. Srivastava, J. O. Neel, A. B. MacKenzie, R. Menon, L. A. DaSilva, J. E. Hicks, J. H. Reed, and R. P. Gilles, “Using game theory to analyze wireless ad hoc networks,” IEEE Communications Surveys and Tutorials, vol. 7, no. 1-4, pp. 46–56, 2005.
[12] A. B. MacKenzie and L. A. DaSilva, “Game theory for wireless engineers,” Synthesis Lectures on Communications, vol. 1, no. 1, pp. 1–86, 2006.
[13] D. Fudenberg and J. Tirole, Game theory. MIT Press, 1991.
[14] A. B. MacKenzie and L. A. DaSilva, Game theory for wireless engineers. Morgan and Claypool Publishers, 2006.
[15] J. F. Nash et al., “Equilibrium points in n-person games,” Proceedings of the National Academy of Sciences, vol. 36, no. 1, pp. 48–49, 1950.
[16] E. Koutsoupias and C. Papadimitriou, “Worst-case equilibria,” in Proceedings of the 16th Annual Conference on Theoretical Aspects of Computer Science (STACS), pp. 404–413, Springer, March 1999.
[17] E. Anshelevich, A. Dasgupta, J. Kleinberg, E. Tardos, T. Wexler, and T. Roughgarden, “The price of stability for network design with fair cost allocation,” SIAM Journal on Computing, vol. 38, no. 4, pp. 1602–1623, 2008.
[18] L. Hurwicz and S. Reiter, Designing economic mechanisms. Cambridge University Press, 2006.
[19] L. Hurwicz, Optimality and informational efficiency in resource allocation processes. Stanford University Press, 1960.
[20] L.
Hurwicz, “On informationally decentralized systems,” Decision and Organization, 1972.
[21] D. Mookherjee, “The 2007 Nobel memorial prize in mechanism design theory,” The Scandinavian Journal of Economics, vol. 110, no. 2, pp. 237–260, 2008.
[22] T. Borgers, D. Krahmer, and R. Strausz, An introduction to the theory of mechanism design. Oxford University Press, 2015.
[23] N. Nisan, T. Roughgarden, E. Tardos, and V. V. Vazirani, Algorithmic game theory, vol. 1. Cambridge University Press, 2007.
[24] M. Jackson, “Mechanism theory, humanities and social sciences,” tech. rep., California Institute of Technology, Pasadena, CA, 2000.
[25] V. Krishna, Auction theory. Academic Press, 2009.
[26] F. M. Menezes and P. K. Monteiro, An introduction to auction theory. Oxford University Press, 2005.
[27] W. Vickrey, “Counterspeculation, auctions, and competitive sealed tenders,” The Journal of Finance, vol. 16, no. 1, pp. 8–37, 1961.
[28] D. Easley and J. Kleinberg, Networks, crowds, and markets: Reasoning about a highly connected world. Cambridge University Press, 2010.
[29] H. M. Schwartz, Multi-agent Machine Learning: A Reinforcement Approach. John Wiley & Sons, 2014.
[30] R. S. Sutton and A. G. Barto, Reinforcement learning: An introduction. MIT Press, 1998.
[31] R. Bellman, “A Markovian decision process,” Journal of Mathematics and Mechanics, vol. 6, pp. 679–684, 1957.
[32] E. J. Sondik, “The optimal control of partially observable Markov processes over the infinite horizon: Discounted costs,” Operations Research, vol. 26, no. 2, pp. 282–304, 1978.
[33] R. E. Bellman, Dynamic Programming. Princeton University Press, 1957.
[34] D. M. Jones and J. C. Gittins, A dynamic allocation index for the sequential design of experiments. University of Cambridge, Department of Engineering, 1972.
[35] J. C. Gittins, “Bandit processes and dynamic allocation indices,” Journal of the Royal Statistical Society, Series B (Methodological), pp. 148–177, 1979.
[36] P.
Whittle, “Multi-armed bandits and the Gittins index,” Journal of the Royal Statistical Society, Series B (Methodological), pp. 143–149, 1980.
[37] P. P. Varaiya, J. Walrand, and C. Buyukkoc, “Extensions of the multiarmed bandit problem: the discounted case,” IEEE Transactions on Automatic Control, vol. 30, no. 5, pp. 426–439, 1985.
[38] T. L. Lai and H. Robbins, “Asymptotically efficient adaptive allocation rules,” Advances in Applied Mathematics, vol. 6, no. 1, pp. 4–22, 1985.
[39] C. J. C. H. Watkins, Learning from delayed rewards. PhD thesis, University of Cambridge, England, 1989.
[40] S. Albers, Competitive online algorithms. Citeseer, 1996.
[41] D. D. Sleator and R. E. Tarjan, “Amortized efficiency of list update and paging rules,” Communications of the ACM, vol. 28, no. 2, pp. 202–208, 1985.
[42] M. Manasse, L. McGeoch, and D. Sleator, “Competitive algorithms for on-line problems,” in Proceedings of the Twentieth Annual ACM Symposium on Theory of Computing, pp. 322–333, ACM, 1988.
[43] R. G. Gallager, Information theory and reliable communication, vol. 2. Springer, 1968.
[44] A. J. Goldsmith and P. P. Varaiya, “Capacity of fading channels with channel side information,” IEEE Transactions on Information Theory, vol. 43, no. 6, pp. 1986–1992, 1997.
[45] J. Tang and X. Zhang, “Quality-of-service driven power and rate adaptation over wireless links,” IEEE Transactions on Wireless Communications, vol. 6, no. 8, pp. 3058–3068, 2007.
[46] E. Uysal-Biyikoglu, A. El Gamal, and B. Prabhakar, “Adaptive transmission of variable-rate data over a fading channel for energy-efficiency,” in IEEE Global Telecommunications Conference (GLOBECOM), vol. 1, pp. 97–101, November 2002.
[47] J. A. Bingham, “Multicarrier modulation for data transmission: An idea whose time has come,” IEEE Communications Magazine, vol. 28, no. 5, pp. 5–14, 1990.
[48] P. S. Chow, J. M. Cioffi, J.
Bingham, et al., “A practical discrete multitone transceiver loading algorithm for data transmission over spectrally shaped channels,” IEEE Transactions on Communications, vol. 43, no. 234, pp. 773–775, 1995.
[49] W. Yu and J. M. Cioffi, “FDMA capacity of Gaussian multiple-access channels with ISI,” IEEE Transactions on Communications, vol. 50, no. 1, pp. 102–111, 2002.
[50] K. Kim, Y. Han, and S.-L. Kim, “Joint subcarrier and power allocation in uplink OFDMA systems,” IEEE Communications Letters, vol. 9, no. 6, pp. 526–528, 2005.
[51] V. Shah, N. B. Mandayam, and D. Goodman, “Power control for wireless data based on utility and pricing,” in IEEE International Symposium on Personal, Indoor and Mobile Radio Communications, vol. 3, pp. 1427–1432, September 1998.
[52] D. Goodman and N. Mandayam, “Power control for wireless data,” IEEE Personal Communications, vol. 7, no. 2, pp. 48–54, 2000.
[53] C. U. Saraydar, N. B. Mandayam, and D. Goodman, “Pareto efficiency of pricing-based power control in wireless data networks,” in IEEE Wireless Communications and Networking Conference (WCNC), pp. 231–235, September 1999.
[54] A. B. MacKenzie and S. B. Wicker, “Game theory in communications: Motivation, explanation, and application to power control,” in IEEE Global Telecommunications Conference (GLOBECOM), vol. 2, pp. 821–826, November 2001.
[55] S. Gunturi and F. Paganini, “Game theoretic approach to power control in cellular CDMA,” in IEEE 58th Vehicular Technology Conference (VTC), vol. 4, pp. 2362–2366, October 2003.
[56] Z. Han, Z. Ji, and K. R. Liu, “Non-cooperative resource competition game by virtual referee in multi-cell OFDMA networks,” IEEE Journal on Selected Areas in Communications, vol. 25, no. 6, pp. 1079–1090, 2007.
[57] S. Agarwal, R. H. Katz, S. V. Krishnamurthy, and S. K. Dao, “Distributed power control in ad-hoc wireless networks,” in IEEE International Symposium on Personal, Indoor and Mobile Radio Communications, vol. 2, pp.
F–59, September 2001.
[58] G. Miao, N. Himayat, G. Y. Li, and S. Talwar, “Distributed interference-aware energy-efficient power optimization,” IEEE Transactions on Wireless Communications, vol. 10, no. 4, pp. 1323–1333, 2011.
[59] P. Mertikopoulos, E. V. Belmega, A. L. Moustakas, and S. Lasaulce, “Distributed learning policies for power allocation in multiple access channels,” IEEE Journal on Selected Areas in Communications, vol. 30, no. 1, pp. 96–106, 2012.
[60] J. Zander, “Jamming games in slotted aloha packet radio networks,” in IEEE Military Communications Conference (MILCOM), pp. 830–834, October 1990.
[61] A. B. MacKenzie and S. B. Wicker, “Selfish users in aloha: a game-theoretic approach,” in IEEE 54th Vehicular Technology Conference (VTC), vol. 3, pp. 1354–1357, October 2001.
[62] A. B. MacKenzie and S. B. Wicker, “Game theory and the design of self-configuring, adaptive wireless networks,” IEEE Communications Magazine, vol. 39, no. 11, pp. 126–131, 2001.
[63] A. B. MacKenzie and S. B. Wicker, “Stability of multipacket slotted aloha with selfish users and perfect information,” in Proceedings of IEEE INFOCOM, vol. 3, pp. 1583–1590, April 2003.
[64] A. B. MacKenzie, Game theoretic analysis of power control and medium access control. Cornell University, 2003.
[65] M. Cagalj, S. Ganeriwal, I. Aad, and J.-P. Hubaux, “On selfish behavior in CSMA/CA networks,” in Proceedings of IEEE INFOCOM, vol. 4, pp. 2513–2524, March 2005.
[66] L. Chen, S. H. Low, and J. C. Doyle, “Random access game and medium access control design,” IEEE/ACM Transactions on Networking (TON), vol. 18, no. 4, pp. 1303–1316, 2010.
[67] L. Chen, T. Cui, S. H. Low, and J. C. Doyle, “A game-theoretic model for medium access control,” in Proceedings of WICON, October 2007.
[68] D. Braess, “Über ein Paradoxon aus der Verkehrsplanung,” Unternehmensforschung, vol. 12, no. 1, pp. 258–268, 1968.
[69] D. Easley and J.
Kleinberg, Networks, crowds, and markets: Reasoning about a highly connected world. Cambridge University Press, 2010.
[70] A. Urpi, M. Bonuccelli, and S. Giordano, “Modelling cooperation in mobile ad hoc networks: a formal description of selfishness,” in Proceedings of the 1st International Symposium on Modeling and Optimization in Mobile, Ad Hoc and Wireless Networks (WiOpt), pp. 10–pages, March 2003.
[71] P. Michiardi and R. Molva, “Analysis of coalition formation and cooperation strategies in mobile ad hoc networks,” Ad Hoc Networks, vol. 3, no. 2, pp. 193–219, 2005.
[72] M. Félegyházi, L. Buttyán, and J.-P. Hubaux, “Equilibrium analysis of packet forwarding strategies in wireless ad hoc networks–the static case,” pp. 776–789, 2003.
[73] E. Altman, A. A. Kherani, P. Michiardi, and R. Molva, “Non-cooperative forwarding in ad-hoc networks,” in NETWORKING 2005: Networking Technologies, Services, and Protocols; Performance of Computer and Communication Networks; Mobile and Wireless Communications Systems, pp. 486–498, Springer, 2005.
[74] R. Axelrod and W. D. Hamilton, “The evolution of cooperation,” Science, vol. 211, no. 4489, pp. 1390–1396, 1981.
[75] N. Nin and C. Comaniciu, “Adaptive channel allocation spectrum etiquette,” in Proceedings of the 1st IEEE International Symposium on New Frontiers in Dynamic Spectrum Access Networks (DySPAN), pp. 269–278, November 2005.
[76] M. M. Halldórsson, J. Y. Halpern, L. E. Li, and V. S. Mirrokni, “On spectrum sharing games,” in Proceedings of the Twenty-Third Annual ACM Symposium on Principles of Distributed Computing, pp. 107–114, ACM, July 2004.
[77] D. Niyato and E. Hossain, “Competitive pricing for spectrum sharing in cognitive radio networks: Dynamic game, inefficiency of Nash equilibrium, and collusion,” IEEE Journal on Selected Areas in Communications, vol. 26, no. 1, pp. 192–202, 2008.
[78] A. Ghosh and S.
Sarkar, “Quality sensitive price competition in spectrum oligopoly,” in IEEE International Symposium on Information Theory Proceedings (ISIT), pp. 2770–2774, July 2013.
[79] S. Gandhi, C. Buragohain, L. Cao, H. Zheng, and S. Suri, “A general framework for wireless spectrum auctions,” in IEEE International Symposium on New Frontiers in Dynamic Spectrum Access Networks (DySPAN), pp. 22–33, April 2007.
[80] J. Huang, R. A. Berry, and M. L. Honig, “Auction mechanisms for distributed spectrum sharing,” in Proceedings of the Forty-Second Allerton Conference, September 2004.
[81] J. Huang, R. A. Berry, and M. L. Honig, “Auction-based spectrum sharing,” ACM Mobile Networks and Applications, vol. 11, no. 3, pp. 405–418, 2006.
[82] L. Gao, Y. Xu, and X. Wang, “MAP: Multiauctioneer progressive auction for dynamic spectrum access,” IEEE Transactions on Mobile Computing, vol. 10, no. 8, pp. 1144–1161, 2011.
[83] L. A. Johnston and V. Krishnamurthy, “Opportunistic file transfer over a fading channel: A POMDP search theory formulation with optimal threshold policies,” IEEE Transactions on Wireless Communications, vol. 5, no. 2, pp. 394–405, 2006.
[84] A. K. Karmokar, D. V. Djonin, and V. K. Bhargava, “Optimal and suboptimal packet scheduling over correlated time varying flat fading channels,” IEEE Transactions on Wireless Communications, vol. 5, no. 2, pp. 446–456, 2006.
[85] Q. Zhao, B. Krishnamachari, and K. Liu, “On myopic sensing for multi-channel opportunistic access: structure, optimality, and performance,” IEEE Transactions on Wireless Communications, vol. 7, no. 12, pp. 5431–5440, 2008.
[86] S. H. A. Ahmad, M. Liu, T. Javidi, Q. Zhao, and B. Krishnamachari, “Optimality of myopic sensing in multichannel opportunistic access,” IEEE Transactions on Information Theory, vol. 55, no. 9, pp. 4040–4050, 2009.
[87] Y. Gai, B. Krishnamachari, and R.
Jain, “Learning multiuser channel allocations in cognitive radio networks: A combinatorial multi-armed bandit formulation,” in IEEE Symposium on International Dynamic Spectrum Access Networks (DySPAN), pp. 1–9, April 2010.
[88] Y. Gai, B. Krishnamachari, and R. Jain, “Combinatorial network optimization with unknown variables: Multi-armed bandits with linear rewards and individual observations,” IEEE/ACM Transactions on Networking (TON), vol. 20, no. 5, pp. 1466–1478, 2012.
[89] Y. Gai, B. Krishnamachari, and M. Liu, “On the combinatorial multi-armed bandit problem with Markovian rewards,” in IEEE Global Telecommunications Conference (GLOBECOM), pp. 1–6, December 2011.
[90] Y. Gai and B. Krishnamachari, “Online learning algorithms for stochastic water-filling,” in Information Theory and Applications Workshop (ITA), pp. 352–356, February 2012.
[91] Y. Gai and B. Krishnamachari, “Decentralized online learning algorithms for opportunistic spectrum access,” in IEEE Global Telecommunications Conference (GLOBECOM), pp. 1–6, December 2011.
[92] D. Kalathil, N. Nayyar, and R. Jain, “Decentralized learning for multiplayer multiarmed bandits,” IEEE Transactions on Information Theory, vol. 60, no. 4, pp. 2331–2345, 2014.
[93] L. Lai, H. Jiang, and H. V. Poor, “Medium access in cognitive radio networks: A competitive multi-armed bandit framework,” in the 42nd Asilomar Conference on Signals, Systems and Computers, pp. 98–102, October 2008.
[94] A. Anandkumar, N. Michael, and A. Tang, “Opportunistic spectrum access with multiple users: learning under competition,” in Proceedings of IEEE INFOCOM, pp. 1–9, March 2010.
[95] K. Liu and Q. Zhao, “Learning from collisions in cognitive radio networks: Time division fair sharing without pre-agreement,” in Proceedings of IEEE Military Communication Conference (MILCOM), pp. 2262–2267, November 2010.
[96] K. Liu and Q.
Zhao, “Decentralized multi-armed bandit with multiple distributed players,” in Information Theory and Applications Workshop (ITA), pp. 1–10, February 2010.
[97] W. Dai, Y. Gai, B. Krishnamachari, and Q. Zhao, “The non-Bayesian restless multi-armed bandit: A case of near-logarithmic regret,” in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 2940–2943, May 2011.
[98] N. Nayyar, Y. Gai, and B. Krishnamachari, “On a restless multi-armed bandit problem with non-identical arms,” in Proceedings of the 49th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 369–376, September 2011.
[99] L. A. Belady, “A study of replacement algorithms for a virtual-storage computer,” IBM Systems Journal, vol. 5, no. 2, pp. 78–101, 1966.
[100] A. Fiat, R. M. Karp, M. Luby, L. A. McGeoch, D. D. Sleator, and N. E. Young, “Competitive paging algorithms,” Journal of Algorithms, vol. 12, no. 4, pp. 685–699, 1991.
[101] Y. Bartal, M. Charikar, and P. Indyk, “On page migration and other related task systems,” in Proceedings of the Eighth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pp. 43–52, Citeseer, January 1997.
[102] B. Awerbuch, Y. Bartal, and A. Fiat, “Competitive distributed file allocation,” in Proceedings of the Twenty-Fifth Annual ACM Symposium on Theory of Computing (STOC), pp. 164–173, ACM, May 1993.
[103] Y. Bartal, A. Fiat, and Y. Rabani, “Competitive algorithms for distributed data management,” Journal of Computer and System Sciences, vol. 51, no. 3, pp. 341–358, 1995.
[104] R. A. Baezayates, J. C. Culberson, and G. J. Rawlins, “Searching in the plane,” Information and Computation, vol. 106, no. 2, pp. 234–252, 1993.
[105] C. H. Papadimitriou and M. Yannakakis, “Shortest paths without a map,” in Automata, Languages and Programming, pp. 610–620, Springer, 1989.
[106] R. L. Graham, “Bounds for certain multiprocessing anomalies,” Bell System Technical Journal, vol. 45, no. 9, pp.
1563–1581, 1966.
[107] Y. Bartal, A. Fiat, H. Karloff, and R. Vohra, “New algorithms for an ancient scheduling problem,” in Proceedings of the Twenty-Fourth Annual ACM Symposium on Theory of Computing (STOC), pp. 51–58, ACM, May 1992.
[108] Y. Azar, B. Kalyanasundaram, S. Plotkin, K. R. Pruhs, and O. Waarts, “Online load balancing of temporary tasks,” in Algorithms and Data Structures, pp. 119–130, Springer, 1993.
[109] Z. Liu, M. Lin, A. Wierman, S. H. Low, and L. L. Andrew, “Greening geographical load balancing,” in Proceedings of the ACM SIGMETRICS Joint International Conference on Measurement and Modeling of Computer Systems, pp. 233–244, ACM, June 2011.
[110] M. Lin, A. Wierman, L. Andrew, and E. Thereska, “Dynamic right-sizing for power-proportional data centers,” in Proceedings of IEEE INFOCOM, pp. 1098–1106, April 2011.
[111] M. Lin, Z. Liu, A. Wierman, and L. L. Andrew, “Online algorithms for geographical load balancing,” in Proceedings of IEEE Green Computing Conference (IGCC), pp. 1–10, June 2012.
[112] N. Bansal, T. Kimbrel, and K. Pruhs, “Speed scaling to manage energy and temperature,” Journal of the ACM (JACM), vol. 54, no. 1, p. 3, 2007.
[113] N. Bansal, K. Pruhs, and C. Stein, “Speed scaling for weighted flow time,” SIAM Journal on Computing, vol. 39, no. 4, pp. 1294–1308, 2009.
[114] L. L. Andrew, A. Wierman, and A. Tang, “Optimal speed scaling under arbitrary power functions,” ACM SIGMETRICS Performance Evaluation Review, vol. 37, no. 2, pp. 39–41, 2009.
[115] K. Pruhs, “Competitive online scheduling for server systems,” ACM SIGMETRICS Performance Evaluation Review, vol. 34, no. 4, pp. 52–58, 2007.
[116] A. Wierman, L. L. Andrew, and A. Tang, “Power-aware speed scaling in processor sharing systems,” in Proceedings of IEEE INFOCOM, pp. 2007–2015, April 2009.
[117] Y. Wu and B.
Krishnamachari, “Online learning to optimize transmission over an unknown Gilbert-Elliott channel,” in Proceedings of the 10th International Symposium on Modeling and Optimization in Mobile, Ad Hoc and Wireless Networks (WiOpt), pp. 27–32, May 2012.
[118] A. Laourine and L. Tong, “Betting on Gilbert-Elliot channels,” IEEE Transactions on Wireless Communications, vol. 9, pp. 723–733, February 2010.
[119] R. D. Smallwood and E. J. Sondik, “The optimal control of partially observable Markov processes over a finite horizon,” Operations Research, vol. 21, no. 5, pp. 1071–1088, 1973.
[120] S. Ross, Applied Probability Models with Optimization Applications. San Francisco: Holden-Day, 1970.
[121] P. Auer, N. Cesa-Bianchi, and P. Fischer, “Finite-time analysis of the multiarmed bandit problem,” Machine Learning, vol. 47, no. 2-3, pp. 235–256, 2002.
[122] Y. Wu, G. Rabanca, B. Krishnamachari, and A. Bar-Noy, “A competitive rate allocation game,” in Proceedings of the 3rd International Conference on Game Theory for Networks, pp. 16–30, Springer, June 2012.
[123] Y. Wu, B. Krishnamachari, G. Rabanca, and A. Bar-Noy, “Efficient mechanism design for competitive uplink carrier selection and rate allocation.” Submitted to IEEE Transactions on Vehicular Technology.
[124] S. Mandal, D. Saha, and A. Mahanti, “A technique to support dynamic pricing strategy for differentiated cellular mobile services,” in IEEE Global Telecommunications Conference (GLOBECOM), vol. 6, pp. 5–pp, November 2005.
[125] S. Yaipairoj and F. C. Harmantzis, “Auction-based congestion pricing for wireless data services,” in IEEE International Conference on Communications (ICC), vol. 3, pp. 1059–1064, June 2006.
[126] B. Al-Manthari, N. Nasser, and H. Hassanein, “Congestion pricing in wireless cellular networks,” IEEE Communications Surveys and Tutorials, vol. 13, no. 3, pp. 358–371, 2011.
[127] M. Bateni, M. T. Hajiaghayi, S. Jafarpour, and D.
Pei, “Towards an efficient algorithmic framework for pricing cellular data service,” in Proceedings of IEEE INFOCOM, pp. 581–585, April 2011.
[128] G. Yuan, X. Zhang, W. Wang, and Y. Yang, “Carrier aggregation for LTE-advanced mobile communication systems,” IEEE Communications Magazine, vol. 48, no. 2, pp. 88–93, 2010.
[129] 3GPP R1-084424, “Control channel design issues for carrier aggregation in LTE-A,” tech. rep., Motorola, Prague, Czech Republic, Nov 2008.
[130] G. De la Roche, A. Alayón-Glazunov, and B. Allen, LTE-advanced and next generation wireless networks: Channel modelling and propagation. John Wiley & Sons, 2012.
[131] “Mobile phone base station database.” http://www.sitefinder.ofcom.org.uk/.
[132] Y. Wu, R. Kannan, and B. Krishnamachari, “Efficient scheduling for energy-delay tradeoff on a time-slotted channel.” Submitted to IEEE Globecom 2015.
[133] R. Alesii, F. Graziosi, L. Pomante, and C. Rinaldi, “Exploiting WSN for audio surveillance applications: the VoWSN approach,” in Proceedings of the Eleventh EUROMICRO Conference on Digital System Design Architectures, Methods and Tools, pp. 520–524, 2008.
[134] A. Mainwaring, D. Culler, J. Polastre, R. Szewczyk, and J. Anderson, “Wireless sensor networks for habitat monitoring,” in Proceedings of the First ACM International Workshop on Wireless Sensor Networks and Applications, pp. 88–97, 2002.
[135] S. Ray, I. Demirkol, and W. Heinzelman, “Supporting bursty traffic in wireless sensor networks through a distributed advertisement-based TDMA protocol (ATMA),” Ad Hoc Networks, vol. 11, no. 3, pp. 959–974, 2013.
[136] A. Rao, A. Legout, Y.-s. Lim, D. Towsley, C. Barakat, and W. Dabbous, “Network characteristics of video streaming traffic,” in Proceedings of the Seventh Conference on emerging Networking EXperiments and Technologies, p. 25, ACM, 2011.
[137] Y. Wu, “Transmission for random arrivals with energy-delay tradeoff simulation code.” http://anrg.usc.edu/www/downloads/.
Item 30; online; accessed 06/10/2015.