Information Design in Non-atomic Routing Games: Computation, Repeated Setting and
Experiment
by
Yixian Zhu
A Dissertation Presented to the
FACULTY OF THE GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(ELECTRICAL ENGINEERING)
December 2022
Copyright 2022 Yixian Zhu
This dissertation is dedicated to my mentors,
under whose constant guidance I have completed this work.
They not only enlightened me with academic knowledge
but also gave me valuable advice whenever I needed it most.
Acknowledgements
I gratefully acknowledge the financial support of the USC Ming Hsieh Department of Electrical and Computer
Engineering and the USC Sonny Astani Department of Civil and Environmental Engineering.
I would like to thank my advisor, Dr. Ketan Savla, for motivating me throughout the journey of
this thesis.
I would also like to wholeheartedly thank my committee members, Dr. Petros Ioannou, Dr. Mihailo Jovanovic,
Dr. Ashutosh Nayyar, and Dr. Maged M. Dessouky.
My thanks also go to the USC Center for Advanced Research Computing and to undergraduate student
researchers Christine Stavish, Grace Foltz, and Alejandra Reyes, without whose help this thesis
would not have been possible.
Finally, I am thankful to my respected parents and beloved girlfriend, who inspired me throughout
my long journey.
Table of Contents

Dedication
Acknowledgements
List of Figures
Abstract
Chapter 1: Introduction
  1.1 Motivations
  1.2 Contributions
Chapter 2: Problem Formulation and Preliminaries for Information Design
  2.1 Notations
  2.2 Problem Formulation
    2.2.1 Private Signaling Policies
    2.2.2 Indirect Signaling Policies
    2.2.3 Public Signaling Policies
Chapter 3: An Exact Polynomial Optimization Formulation for Private Signaling Policies in Information Design
  3.1 Atomic Private Signaling Policies
  3.2 Diagonal Atomic Private Signaling Policies
  3.3 Monotonicity of Optimal Cost Value under Diagonal Atomic Private Signaling Policies
Chapter 4: Simulations and Computational Complexity for Information Design
  4.1 Simulations
    4.1.1 Parallel Network
      4.1.1.1 Affine Latency Functions
      4.1.1.2 BPR Latency Functions
    4.1.2 Wheatstone Network
      4.1.2.1 Affine Latency Functions
      4.1.2.2 Quadratic Latency Functions
    4.1.3 Scaling of Runtime with Network Size
Chapter 5: Convergence in Repeated Setting
  5.1 Route choice model for participating agents
  5.2 Route choice model for non-participating agents
  5.3 Convergence Analysis and Discussion
  5.4 Simulations
  5.5 Connection to Existing Literature
    5.5.1 Calibrated Forecast
    5.5.2 Regret-Matching
    5.5.3 Experience Weighted Attraction Learning Model
    5.5.4 Individual Evolutionary Learning Model
    5.5.5 Adjusted Mistrust Dynamic
Chapter 6: Experimental Study on Learning Correlated Equilibrium
  6.1 Experiment Procedure
  6.2 Adapting Theory to Experiment Setup and Hypotheses
    6.2.1 Hypotheses
  6.3 Experiment Findings
    6.3.1 Experiment Parameters
    6.3.2 Recommendation Following vs Displayed Rating
    6.3.3 Long Run Behavior of P̂
    6.3.4 Long Run Behavior of the Displayed Rating
    6.3.5 Long Run Empirical Probability of Following Recommendation
    6.3.6 Displayed Rating vs Time-Averaged Aggregated Regret
Chapter 7: Conclusion and Future Work
  7.1 Conclusion
  7.2 Future Work
References
Appendices
  A Matrix Expressions
  B Proof of Proposition 1
  C Proof of Proposition 2
  D Proof of Theorem 1
  E Proof of Proposition 3
  F Proof of Theorem 2
  G Technical Results
  H Proof of Proposition 4
  I Proof of Proposition 5
  J Proof of Proposition 6
  K Proof of Lemma 1
  L Proof of Lemma 2
  M Proof of Theorem 3
  N Feedback Survey
List of Figures

4.1 Comparison of minimum cost achievable under private signals, public signals, and full information over two parallel links, under different ν, for (a) affine latency functions and (b) BPR latency functions.
4.2 (a) Wheatstone network; and comparison of social costs under (b) affine link latency functions and (c) quadratic link latency functions.
4.3 Log-linear plot of run time versus n for parallel networks.
5.1 Convergence of regret and its forecast over two parallel links for (a) review aggregation with discounting factor λ = 0.9, and (b) dynamic participation rate.
5.2 Evolution of x(k) − π_{ω(k)} with time k for the extended EWA model.
5.3 Evolution of x(k) − π_{ω(k)} with time k for the extended IEL model.
5.4 Evolution of regret and its forecast over two parallel links for adjusted mistrust dynamics.
6.1 User interface before route selection during a typical scenario.
6.2 User interface after route selection and before review submission during a typical scenario.
6.3 Input-output illustration for the participant and the simulated model.
6.4 Linear regression between empirical probability of recommendation and quantized displayed rating.
6.5 Evolution of P̂_s with participant number s.
6.6 Evolution of end-of-session displayed rating r_s(100) with participant number s.
6.7 Evolution of the displayed rating during the session of the last participant, i.e., r_33(k), with scenario number k.
6.8 Evolution of end-of-session cumulative empirical probability of following recommendation with increasing participant number.
6.9 Relation between displayed rating and time-averaged aggregated regret for all participants.
6.10 Relation between displayed rating and time-averaged aggregated regret for a sample participant.
Abstract
We consider a routing game among non-atomic agents where link latency functions are condi-
tional on an uncertain state of the network. The agents have the same prior belief about the state,
but only a fixed fraction receive private route recommendations or a common message, which are
generated by a known randomization, referred to as private or public signaling policy respectively.
The remaining agents choose routes according to the Bayes Nash flow with respect to the prior. We
develop a computational approach to solve the optimal information design problem, i.e., to mini-
mize expected social latency over all public or obedient private signaling policies. For a fixed flow
induced by non-participating agents, design of an optimal private signaling policy is shown to be
a generalized problem of moments for polynomial link latency functions, and to admit an atomic
solution with a provable upper bound on the number of atoms. This implies that, for polynomial
link latency functions, information design can be equivalently cast as a polynomial optimization
problem. This in turn can be lower bounded arbitrarily closely by a known hierarchy of semidefinite relaxations. The first level of this hierarchy is shown to be exact for the basic two-link case with affine
latency functions. We also identify a class of private signaling policies over which the optimal
social cost is non-increasing with increasing fraction of participating agents for parallel networks.
This is in contrast to existing results where the cost of participating agents under a fixed signaling
policy may increase with their increasing fraction.
We then study the non-atomic routing game in a repeated setting. In every round, nature
chooses a state in an i.i.d. manner according to a publicly known distribution. The recommenda-
tion system makes private route recommendations to participating agents according to a publicly
known signaling policy. The participating agents choose between obeying or not obeying the recommendation according to the cumulative regret of the participating agent population in the previous
round. The non-participating agents choose routes according to a myopic best response to a calibrated
forecast of the routing decisions of the participating agents. We show that, for parallel networks, if
the recommendation system’s signal strategy satisfies the obedience condition, then, almost surely,
the link flows are asymptotically consistent with the Bayes correlated equilibrium induced by the
signaling policy.
Finally, we report findings from a related experiment with one participant at a time engaged
in repeated route choice decisions on a computer. In every round, the participant is shown the travel
time distribution for each route, a route recommendation generated by an obedient policy, and
a numeric rating suggestive of average experience of previous participants with the quality of
recommendation. Upon entering a route choice, the actual travel times, derived from the route choices of
previous participants, are revealed. The participant uses this information to evaluate the quality of the
recommendation received for that round and enters a numeric review accordingly. This is combined
with historical reviews to update the rating for the next round. Data analysis from 33 participants, each
with 100 rounds, suggests a moderate negative correlation between the displayed rating and the average
regret with respect to optimal choice, and a strong positive correlation between the rating and
the likelihood of following the recommendation. Overall, under the obedient recommendation policy, the
rating seems to converge close to its maximum value by the end of the experiments, in conjunction
with a very high frequency of following recommendations.
Chapter 1
Introduction
Route choice decision in traffic networks under uncertain and dynamic environments, such as the
ones induced by recurring unpredictable incidents, can be a daunting task for agents. Private
route recommendation or public information systems could therefore play an important role in
such settings. While the agents have a prior about the uncertain state, e.g., through experience or
publicly available historic records, the informational advantage of such systems in knowing the
actual realization gives the possibility of inducing a range of traffic flows through appropriate
route recommendation or public information strategies.
A strategy of a recommendation system to map state realization to randomized private route
recommendations for the agents is referred to as a private signaling policy; a strategy to map
state realization to randomized public messages is referred to as a public signaling policy. The
implementation of a private signaling policy requires the ability to provide different route recommendations to different agents. This can be achieved through personal mobile devices. On the other
hand, public signaling policies require broadcasting the same message to all the agents. This can
be achieved through roadside variable message signs or through personal mobile devices. If the
state corresponds to "incident" or "no incident", then, e.g., the message space for the public policy
can be the same as the state space, with no broadcast when the message generated by the policy is "no incident".
Alternately, a message could also be a route recommendation. A private signaling policy is feasible
or obedient, if, to every agent, it recommends a route which is weakly better in expectation, with
respect to the induced posterior, than the other routes. Under a public signaling policy, the agents
can be assumed to choose routes consistent with Bayes Nash flow with respect to the posterior.
The problem of minimizing expected social latency cost over all obedient private or over all public
signaling policies is referred to as information design in this thesis. We are interested in these prob-
lems for non-atomic agents, when a fraction of agents do not participate in signaling and induce
Bayes Nash flow with respect to the prior. The technical challenge is the joint consideration of
optimal signaling policy for participating agents and the flow induced by non-participating agents.
We are also interested in a repeated non-atomic routing game where the link latency functions
are conditional on the state of the nature, e.g., whether there is a traffic incident or not. This state is
generated in an i.i.d. manner from an exogenous distribution. Extension of obedience constraint to
a repeated setting is not clear. Specifically, how does an agent determine obedience based on the se-
ries of recommendations and payoffs that she receives, and can a static notion guarantee persistent
obedience in a repeated setting? One naive justification is that the agents follow the recommen-
dations in every round until they have sufficient samples to empirically evaluate the obedience
condition. If the strategy indeed satisfies the obedience constraint, then the agents’ evaluation
would confirm it, and therefore they would continue following the recommendation subsequently. We
rather consider a setting where the agents choose between obeying or not obeying recommenda-
tion in every round from the beginning. This choice is determined by the agents’ regret associated
with the decision in the previous round and aggregated over all other agents. Moreover, we also
allow for partial signaling in repeated setting, i.e., when only a fraction of agents participate in
signaling. The rest are assumed to choose routes as best response to their forecast of the link flows
induced by the participating agents.
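The feedback loop just described, a state drawn i.i.d. each round, an obedient recommendation, and a propensity to follow tied to aggregated past regret, can be sketched in a few lines. The latency coefficients, the full-information recommendation rule, and the specific propensity formula below are illustrative assumptions of ours, not the model studied in this thesis:

```python
import random

# Two parallel links, two states, affine latencies l(f) = a*f + b.
# Coefficients chosen (by us) so that each state has a clearly better link.
LAT = {"incident": [(1.0, 1.0), (1.0, 3.0)],
       "no_incident": [(1.0, 2.0), (1.0, 1.0)]}
PRIOR = {"incident": 0.3, "no_incident": 0.7}

def latency(state, link, flow):
    a, b = LAT[state][link]
    return a * flow + b

def simulate(rounds=1000, seed=0):
    """Each round, nature draws the state i.i.d.; a full-information policy
    recommends the state-wise better link; the participating population
    follows with a propensity that decays with positive time-averaged
    cumulative regret (a satisficing-type rule; the exact functional form
    is our simplification)."""
    rng = random.Random(seed)
    cum_regret = 0.0
    hist = []
    for k in range(1, rounds + 1):
        state = "incident" if rng.random() < PRIOR["incident"] else "no_incident"
        # Recommend the link that is better even when fully loaded.
        rec = min((0, 1), key=lambda i: latency(state, i, 1.0))
        p_follow = max(0.0, 1.0 - max(cum_regret / k, 0.0))
        hist.append(p_follow)
        # Obeying fraction takes the recommended link; the rest split evenly.
        flows = [p_follow + (1 - p_follow) / 2, (1 - p_follow) / 2]
        if rec == 1:
            flows.reverse()
        realized = latency(state, rec, flows[rec])
        best = min(latency(state, i, flows[i]) for i in (0, 1))
        cum_regret += realized - best  # regret of following the recommendation
    return hist

hist = simulate()
```

With these numbers the recommended link is weakly better at the realized flows, so the regret term stays at zero and the follow propensity stays at one; perturbing the coefficients lets one probe how accumulated regret erodes obedience.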
The efficacy of the recommendation system depends on the extent to which the drivers follow
the recommendations. This aspect is ignored in typical route recommender systems, e.g., see [1,
2]. A plausible extension of obedience to repeated setting is that the likelihood of following rec-
ommendation in a round is related to regret from previous rounds. However, there has been no
experimental study of the learning rule used in our repeated setting or even of correlated equilib-
rium for non-atomic games to the best of our knowledge.
1.1 Motivations
Information design for finite agents has attracted considerable attention recently with applications
in multiple domains, e.g., see [3] for an overview; the single agent case was studied in [4] as
Bayesian persuasion. In the finite agent (and finite action) setting, the obedience condition on the
signaling policy can be expressed as finite linear constraints, one for each combination of actions
by the agents. This allows to cast the information design problem as a tractable optimization
problem. Techniques to further reduce computational cost of information design are presented in
[5]. However, analogous computational approaches to solve information design for non-atomic
agents, particularly for routing games, are lacking.
There has been a growing interest recently in understanding the impact of information in non-
atomic routing games. For example, [6–9] illustrate that revealing full information to all the agents
may not minimize social cost. Information design using private signaling policies, as in this thesis,
has also been pursued recently in [10]. Optimal public signaling policies for some settings were
characterized in [11]. While these works provide useful insights, the information design aspect
of these works is restricted to stylized settings involving a network with just two parallel links,
sub-optimal policies, and link latency functions which ensure non-zero flow on all links under all
state realizations. It is not apparent to what extent the methodologies underlying these studies,
which typically rely on analytical solutions, be generalized. Motivated by this, we develop a com-
putational approach in this thesis. While the detailed discussion is presented for parallel networks
for simplicity in presentation, we also describe the extension of the computational framework to
general networks with a single origin-destination pair.
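The observation from [6–9] that revealing full information need not minimize social cost admits a quick worked check. The sketch below compares, for two parallel links with affine latencies and two states, the expected social latency of the no-information Bayes Nash (Wardrop) flow against per-state Wardrop flows under full information; all coefficient values are illustrative choices of ours, not taken from this thesis:

```python
def wardrop_two_link(a1, b1, a2, b2):
    """Wardrop (Bayes Nash) flow on two parallel links with affine latencies
    l_i(f) = a_i*f + b_i and unit demand: flows equalize latencies when both
    links are used, with a corner solution otherwise."""
    f1 = (a2 + b2 - b1) / (a1 + a2)   # from a1*f1 + b1 = a2*(1 - f1) + b2
    f1 = min(max(f1, 0.0), 1.0)
    return f1, 1.0 - f1

# state -> ((a1, b1), (a2, b2)); illustrative numbers.
states = {"w1": ((2.0, 1.0), (1.0, 2.0)), "w2": ((1.0, 3.0), (2.0, 0.5))}
prior = {"w1": 0.5, "w2": 0.5}

def social_cost(state, f1, f2):
    (a1, b1), (a2, b2) = states[state]
    return f1 * (a1 * f1 + b1) + f2 * (a2 * f2 + b2)

# No-information benchmark: Wardrop flow w.r.t. expected latency coefficients.
ea1 = sum(prior[w] * states[w][0][0] for w in states)
eb1 = sum(prior[w] * states[w][0][1] for w in states)
ea2 = sum(prior[w] * states[w][1][0] for w in states)
eb2 = sum(prior[w] * states[w][1][1] for w in states)
f1_no, f2_no = wardrop_two_link(ea1, eb1, ea2, eb2)
cost_no_info = sum(prior[w] * social_cost(w, f1_no, f2_no) for w in states)

# Full-information benchmark: Wardrop flow per realized state.
cost_full = sum(prior[w] * social_cost(w, *wardrop_two_link(*states[w][0], *states[w][1]))
                for w in states)
```

With these numbers the expected social cost under full information (about 2.417) exceeds the no-information cost (2.375), matching the cautionary message of the cited works.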
Our key observation is that information design for polynomial latency functions has strong
connections with the generalized problem of moments (GPM) [12]. A GPM minimizes, over finite
probability measures, a cost which is linear in moments with respect to these measures subject to
constraints which are also linear in the moments. This connection allows us to leverage computational
tools developed for GPM, such as GloptiPoly [13], which utilizes a hierarchy of semidefinite
relaxations to lower bound GPM arbitrarily closely. For a fixed flow induced by non-participating
agents, information design for participating agents is indeed a GPM. Furthermore, since the cost
and constraints involve moments up to a finite order, there exists an optimal signaling policy which
is atomic with provable upper bound on the number of atoms [14]. In other words, interestingly, a
finite-support, atomic signaling policy can achieve optimal performance. This property also allows
information design, when the non-participating agents choose routes according to the Bayes Nash
flow, to be cast equivalently as a polynomial optimization problem. This can be lower bounded
arbitrarily closely by a hierarchy of semidefinite relaxations [15], which can also be implemented in GloptiPoly. The
first level of this hierarchy is shown to be exact for the basic two-link case with affine latency
functions; the proof relies on using the convexity of the cost function and constraints to sharpen the bound
from [14] for the optimal solution. The lower bound obtained from the hierarchy can be used to
upper bound the optimality gap of a feasible solution obtained by packages such as MultiStart
in Matlab. Indeed, in our simulations, we report the number of starting points for MultiStart and
the relaxation order for GloptiPoly for which this gap was found to be negligible.
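To make the optimization concrete, here is a brute-force sketch of information design restricted to deterministic (one-atom-per-state) private signaling policies on a two-link, two-state instance with all agents participating. The obedience inequalities below are the standard non-atomic ones; the instance numbers are illustrative choices of ours, and a serious implementation would use the GPM/semidefinite machinery described above rather than a grid search:

```python
# Two parallel links, two equally likely states; affine latencies
# l_{w,i}(f) = a*f + b. Illustrative coefficients (ours).
STATES = {"w1": ((2.0, 1.0), (1.0, 2.0)), "w2": ((1.0, 3.0), (2.0, 0.5))}
MU = {"w1": 0.5, "w2": 0.5}

def lat(w, i, f):
    a, b = STATES[w][i]
    return a * f + b

def obedient(x):
    """x[w] = fraction recommended link 0 in state w, inducing flows (x, 1-x).
    Obedience: conditional on either recommendation, switching does not reduce
    expected latency under the induced posterior."""
    g0 = sum(MU[w] * x[w] * (lat(w, 0, x[w]) - lat(w, 1, 1 - x[w])) for w in STATES)
    g1 = sum(MU[w] * (1 - x[w]) * (lat(w, 1, 1 - x[w]) - lat(w, 0, x[w])) for w in STATES)
    return g0 <= 1e-9 and g1 <= 1e-9

def social_cost(x):
    return sum(MU[w] * (x[w] * lat(w, 0, x[w]) + (1 - x[w]) * lat(w, 1, 1 - x[w]))
               for w in STATES)

# Brute-force search over deterministic per-state splits on a grid.
grid = [k / 200 for k in range(201)]
best = min((social_cost({"w1": u, "w2": v})
            for u in grid for v in grid if obedient({"w1": u, "w2": v})),
           default=None)
```

The per-state full-information Wardrop flows are obedient, so the search always finds a feasible point; the constrained optimum lies between the unconstrained social optimum and the full-information cost for this instance.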
It is natural to compare our approach with semidefinite programming based approaches for
computation of (Bayes) correlated equilibria, e.g., in continuous polynomial games [16]. In [16],
the action set is continuous and the agents are finite, and hence alternate formulations for cor-
related equilibrium are proposed which involve approximation through finite moments and dis-
cretization of the action set. In our setup, where the action set is finite and agents are non-atomic,
the constraints for participating agents are readily in computational form and involve moments
up to a finite order without any approximation. This then allows us to consider an equivalent fi-
nite discretization, with known cardinality, of the agent population, to transform equivalently into
polynomial optimization. Thereafter, the use of semidefinite relaxation hierarchy is standard.
The computational approach of this thesis can be utilized to complement the current studies
on (paradoxical) effect of different fractions of participating agents under specific public signaling
policies (primarily, full information). While existing works, e.g., [17, 18], study the effect on
population-specific (i.e., participating and non-participating) costs, we study the effect on the social
cost, in the spirit of the recommendation system’s perspective adopted in the thesis. We provide
a class of private signaling policies under which the optimal social cost is non-increasing with
increasing fraction of participating agents. The key idea is to use an optimal solution at a given
fraction to synthesize signaling policies which are feasible for all higher fractions and give the
same cost. This monotonic result does not require the link latency functions to be polynomial. On
the other hand, we illustrate through examples that public signaling policies may worsen social
performance if too many agents receive the signal.
The motivation for the repeated setting comes from the practical consideration that agents have
the discretion to use or not to use a navigation system, and if they use one, then whether to obey
the recommendation or not on any given trip. It is therefore of interest to understand the long term
effectiveness of a route recommendation strategy in such a setting. The feature of our model to
let the decision of a participating agent depend on the collective experience of all the participating
agents in the previous rounds is inspired by platforms such as Yelp which aggregate feedback from
all users to provide a single review rating which is publicly accessible. The decision models in this
thesis are reminiscent of well-known models studied in the context of convergence to correlated
equilibrium, e.g., [19–21]. This is not completely surprising given that the obedience constraint
has been shown to be equivalent to Bayes correlated equilibrium [22]. Specifically, a participating
agent computes regret associated with the decision in the previous round in the same spirit as [21];
however, its decision in the current round depends on the cumulative regret of all the participating
agents, and also, the regret determines the propensity to follow the recommendation in the current
round as opposed to deviating from the decision in the previous round as in [21]. The model for
non-participating agents to choose routes as best response to a forecast of the flow induced by the
participating agents is reminiscent of [19].
There has been a lot of theoretical work on learning Nash equilibria in the context of selfish
routing, e.g., see [23–25]. There has also been experimental investigation on convergence of route
choice behavior to Nash equilibrium, e.g., see [26, 27]. However, corresponding work on correlated
equilibria is lacking. In fact, existing experimental studies on correlated equilibrium are
almost exclusively for 2×2 settings and concern primarily the Game of Chicken and the Battle of the
Sexes, e.g., see [28–32]. These studies have identified the following to be key drivers for ensuring
high rate of following recommendation: (i) clear and common understanding of the signal which
generates the recommendations, often achieved by public announcement of the signaling policy;
(ii) the induced correlated equilibrium being a Pareto improvement over Nash equilibria; (iii) accurate
knowledge of the mapping between utility and payoffs of players; (iv) trust that the opponent will
follow the recommendation; and (v) fixed vs. random opponent in different rounds, with the former
allowing development of strategy profile of the opponent and using it for choice of action. Other
studies include [33] which examined the effect of payoff asymmetry on a participant’s willingness
to follow recommendations.
Some of the considerations mentioned above in the context of following recommendations
become more significant for non-atomic games. Signaling policies for these games comprise probability
distributions with continuous support. This complexity makes the announcement of the signal
by the mediator, as well as its comprehension by the participants, challenging. This may
potentially aggravate the known issue of participants being unable to execute Bayes' rule accurately even
for probability mass functions. The issue of trust that other participants will follow recommendations
assumes a bigger role in the presence of a continuum of participants. Furthermore,
practical limitations on laboratory experiments necessitate consideration of a pseudo-non-atomic
setup, consisting of a mixture of a small number of human subjects and a large number of simulated
agents. We performed experiments in the limiting case of one human subject at a time with change
of participant after a fixed number of rounds.
1.2 Contributions
Regarding computational properties of information design in static non-atomic routing games, the
main contributions of the thesis are as follows. First, by making a connection to the GPM and its
associated semidefinite programming machinery, we point to a compelling computational framework to
solve information design problems. Second, by establishing the existence of an atomic optimal
solution, we provide credence to such a structural assumption often implicitly made in information
design studies. The sharpening of the bound on the number of atoms that we illustrate in a simple
case suggests the possibility of using the problem structure to reduce the size of the optimization
formulation, and hence the computation cost. Third, the result and underlying proof technique
for the monotonic behavior of social cost under a reasonable class of private signaling policies
imply that private signaling policies can guarantee performance which is robust to higher than an-
ticipated agent participation rate. However, our results also suggest that this may be difficult to
achieve through public signaling policies. Overall, the contributions considerably expand
the scope of information design studies, which has so far been limited to stylized settings.
Regarding convergence in repeated non-atomic routing games, our main results are as
follows. If the regret is of satisficing type with respect to a default choice and is averaged over
past rounds and over all agents, and if the signaling policy is obedient, then the non-participating
agents’ forecast of the link flows induced by the participating agents converges to actual induced
link flows. Moreover, the asymptotic link flows converges to the link flows corresponding to the
Bayes correlated equilibrium associated with the signaling policy.
As to the experimental study of the repeated non-atomic routing games, our main contributions
and findings are as follows. First, we provide an experiment protocol template for studying cor-
related equilibrium in non-atomic games. Second, through our experimental findings, we provide
an instantiation of a setting in which the long run route choice decisions are highly consistent with
private recommendations generated by an obedient policy. Our third contribution is in verifying el-
ements of a prior dynamic model for obedience in repeated setting. Our experiments found strong
correlation between following recommendation in a particular round and the display rating in that
round. We found moderate correlation overall between the review submitted by a participant in a
round and the regret which could be associated with the outcome of the round. This correlation
however became strong when restricted to participants who demonstrably, and by their own report, carefully
assessed all the information for their route choice decisions and for providing review feedback.
This is important because the notion of regret has been a mainstay in prior theoretical studies on
convergence to correlated equilibrium. Overall, the displayed rating converged to its maximum value
in conjunction with a very high likelihood of following the recommendation.
Chapter 2
Problem Formulation and Preliminaries for Information Design
2.1 Notations
We now define key notation to be used throughout this thesis. △(X) denotes the set of all probability distributions on X. For an integer n, we let [n] := {1, 2, ..., n}. For a vector x ∈ R^n, let supp(x) := {i ∈ [n] | x_i ≠ 0} be the set of indices whose corresponding entries in x are not zero, and let diag(x) be the n × n diagonal matrix with the elements of x on the main diagonal. For λ ≥ 0, let

P_n(λ) := {x ∈ R^n_{≥0} | ∑_{i∈[n]} x_i = λ}

be the (n−1)-dimensional simplex of size λ. When λ = 1, we shall simply denote the simplex as P_n for brevity in notation. 0_{n×m} and 1_{n×m} will denote the n × m matrices all of whose entries are 0 and 1, respectively. In all these notations, the subscripts corresponding to size shall be omitted when clear from the context. For matrices A and B of the same size, their Frobenius inner product is A · B = ∑_{i,j} A_{i,j} B_{i,j}. For a real number r ∈ R, we let [r]_+ = max(r, 0), and ⌊r⌋ be the largest integer no greater than r.
2.2 Problem Formulation
Consider a network consisting of n parallel links between a single source–destination pair.¹ We use link and route interchangeably for a parallel network. Without loss of generality, let the agent population generate a unit volume of traffic demand. The link latency functions ℓ_{ω,i}(f_i), i ∈ [n], give the latency on link i as a function of the flow f_i through it, conditional on the state of the network ω ∈ Ω = {ω_1, ..., ω_s}. Throughout the thesis, we shall make the following basic assumption on these functions.

¹ Extension to non-parallel networks is discussed in Section 4, where n denotes the number of routes between the origin–destination pair.
Assumption 1. For every i ∈ [n], ω ∈ Ω, ℓ_{ω,i} is a non-negative, continuously differentiable and non-decreasing function.
At times, we shall strengthen the assumption to ℓ_{ω,i} being strictly increasing. A class of functions satisfying Assumption 1 which is attractive from a computational perspective is that of polynomial functions:

ℓ_{ω,i}(f_i) = ∑_{d=0}^{D} α_{d,ω,i} f_i^d,  i ∈ [n], ω ∈ Ω  (2.1)

with α_{0,ω,i} ≥ 0 and α_{1,ω,i} ≥ 0. We shall also let α_d refer to the s × n matrix whose entries are α_{d,ω,i}. Two instances of (2.1) commonly studied in the literature are affine and the Bureau of Public Roads (BPR) functions [34]. In the former case, D = 1, and in the latter case, D = 4 with α_1 = α_2 = α_3 = 0. Furthermore, the BPR function has the following interpretation: α_{0,ω,i} is the free-flow travel time on link i when the state is ω, and (0.15 α_{0,ω,i} / α_{4,ω,i})^{1/4} is the flow capacity of link i when the state is ω.
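The two parameterizations of the BPR function, the polynomial coefficients (α_0, α_4) and the (free-flow time, capacity) pair, can be converted into one another. A minimal sketch (function names are ours; the numerical values are the ω_2, i = 1 parameters from Section 4.1.1.2):

```python
def bpr_latency(f, alpha0, alpha4):
    """Polynomial form of the BPR function in (2.1): l(f) = alpha0 + alpha4 * f**4."""
    return alpha0 + alpha4 * f**4

def bpr_capacity(alpha0, alpha4):
    """Flow capacity implied by the coefficients: c = (0.15*alpha0/alpha4)**(1/4),
    so that l(f) = alpha0 * (1 + 0.15*(f/c)**4) coincides with the polynomial form."""
    return (0.15 * alpha0 / alpha4) ** 0.25

# Consistency of the two parameterizations at an arbitrary flow level:
c = bpr_capacity(20.0, 0.037)
f = 3.0
assert abs(bpr_latency(f, 20.0, 0.037) - 20.0 * (1 + 0.15 * (f / c) ** 4)) < 1e-9
```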
Let ω ∼ µ_0 ∈ interior(△(Ω)), for some prior µ_0 which is known to all the agents. The agents do not have access to the realization of ω, but a fixed fraction ν ∈ [0,1] of the agents receives private route recommendations or public messages conditional on the realized state.
2.2.1 Private Signaling Policies
The conditional route recommendations are generated by a private signaling policy π = {π_ω ∈ △(P_n(ν)) : ω ∈ Ω} as follows. Given a realization ω ∈ Ω, sample an x ∈ P_n(ν) according to π_ω, and partition the agent population into n + 1 parts with volumes (x_1, ..., x_n, 1 − ν). All the agents are identical, and therefore, in the non-atomic setting that we are considering here, the partition can be formed by independently assigning every agent to a part with probability equal to the volume of that part. The agents in the (n+1)-th part, with volume 1 − ν, do not receive any recommendation, whereas all the agents in the i-th part, i ∈ [n], receive a recommendation to take route i, i.e., an x_i volume of agents is recommended to take route i.
Example 1. Let Ω = {ω_1, ω_2}. An example of a signaling policy for the two-link case with ν = 1 is: π_{ω_1} = x^{(1)} with probability 0.5 and = x^{(2)} with probability 0.5; π_{ω_2} = x^{(1)} with probability 0.25 and = x^{(2)} with probability 0.75, with x^{(1)} = [0.75, 0.25]^T and x^{(2)} = [0.5, 0.5]^T. When the state is ω_1, the system recommends route 1 to a 0.75 volume of agents and route 2 to the remaining 0.25 volume with probability 0.5, and recommends route 1 to a 0.5 volume of agents and route 2 to the remaining 0.5 volume with probability 0.5. π_{ω_2} has a similar interpretation.

The special case when π_{ω_1} and π_{ω_2} are probability mass functions, as in this example, will later be referred to as atomic private signaling policies, and will play an important role in the thesis.
The policy π and the fraction ν are publicly known to all the agents. Therefore, it is easy to see that the (joint) posterior on (x, ω), i.e., on the proportions of agents getting different recommendations and the state of the network, formed by an agent who receives recommendation i ∈ [n] is:

µ_{π,i}(x, ω) = x_i π_ω(x) µ_0(ω) / ( ∑_{θ∈Ω} ∫_{p∈P(ν)} p_i π_θ(p) dp µ_0(θ) )  (2.2)

and the posterior formed by an agent who does not receive a recommendation is:

µ_{π,∅}(x, ω) = π_ω(x) µ_0(ω)  (2.3)

Remark 1. One could consider an alternate setup where the set of agents who do not participate in the signaling scheme is pre-determined. These agents do not receive a recommendation and also do not have knowledge of π. In this case, (2.3) can be replaced with µ_{π,∅}(x, ω) = µ_0(ω)/|P(ν)|, obtained by replacing π_ω with the uniform distribution. The methodologies developed in this thesis also extend to this alternate setting.
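For an atomic policy such as the one in Example 1, the integral in (2.2) becomes a finite sum and the posterior over (atom, state) pairs can be computed directly. A minimal sketch, assuming an m-atomic policy represented by an atom array, a row-stochastic matrix `pi`, and a prior `mu0` (these variable names are ours):

```python
import numpy as np

def posterior_given_rec(i, atoms, pi, mu0):
    """Posterior over (atom k, state w) of an agent recommended route i,
    for an m-atomic policy, cf. (2.2): mu[k, w] is proportional to
    atoms[k, i] * pi[w, k] * mu0[w].
    atoms: m x n array of flow vectors; pi: s x m row-stochastic; mu0: prior of length s."""
    atoms, pi, mu0 = map(np.asarray, (atoms, pi, mu0))
    unnorm = atoms[:, i][:, None] * pi.T * mu0[None, :]   # shape (m, s)
    return unnorm / unnorm.sum()
```

With the policy of Example 1 and any interior prior, the returned array is a proper joint distribution over the two atoms and two states.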
A signaling policy is said to be obedient if the recommendation received by every agent is weakly better, in expectation with respect to the posterior in (2.2), than other routes, while the non-participating agents induce a Bayes Nash flow with respect to their posterior in (2.3). Formally, a π is said to be obedient² if there exists y ∈ P_n(1 − ν) such that³:

∑_ω ∫_x ℓ_{ω,i}(x_i + y_i) µ_{π,i}(x, ω) dx ≤ ∑_ω ∫_x ℓ_{ω,j}(x_j + y_j) µ_{π,i}(x, ω) dx,  i, j ∈ [n]  (2.4a)
∑_ω ∫_x ℓ_{ω,i}(x_i + y_i) µ_{π,∅}(x, ω) dx ≤ ∑_ω ∫_x ℓ_{ω,j}(x_j + y_j) µ_{π,∅}(x, ω) dx,  i ∈ supp(y), j ∈ [n]  (2.4b)

Here y is the flow induced by the non-participating agents; (2.4b) captures the fact that this is the Bayes Nash flow with respect to the prior. Plugging in the expressions for the beliefs from (2.2) and (2.3), noting that the denominators on both sides of the inequalities in (2.4) are the same, and multiplying both sides of the second set of inequalities by y_i, one equivalently gets:

∑_ω ∫_x ℓ_{ω,i}(x_i + y_i) x_i π_ω(x) dx µ_0(ω) ≤ ∑_ω ∫_x ℓ_{ω,j}(x_j + y_j) x_i π_ω(x) dx µ_0(ω),  i, j ∈ [n]  (2.5a)
∑_ω ∫_x ℓ_{ω,i}(x_i + y_i) y_i π_ω(x) dx µ_0(ω) ≤ ∑_ω ∫_x ℓ_{ω,j}(x_j + y_j) y_i π_ω(x) dx µ_0(ω),  i, j ∈ [n]  (2.5b)

We emphasize that multiplying both sides by y_i allows us to equivalently relax the restriction of i to supp(y) in (2.4b) to obtain (2.5b).
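For atomic policies, the integrals in (2.5) reduce to finite sums, so obedience of a candidate policy can be verified directly. A minimal sketch of this check, assuming an atom array, a row-stochastic matrix `pi`, and a latency callable `ell` (all names are our own conventions, not from the thesis):

```python
import numpy as np

def is_obedient(atoms, y, pi, mu0, ell, tol=1e-8):
    """Check the atomic specialization of the obedience conditions (2.5):
    for all routes i, j,
      sum_{k,w} ell(w, i, x_i^(k) + y_i) * x_i^(k) * pi[w, k] * mu0[w]
        <= sum_{k,w} ell(w, j, x_j^(k) + y_j) * x_i^(k) * pi[w, k] * mu0[w],
    and the analogous inequality with x_i^(k) replaced by y_i.
    ell(w, i, f) returns the latency of route i in state w at flow f."""
    atoms, y, pi, mu0 = map(np.asarray, (atoms, y, pi, mu0))
    m, n = atoms.shape
    s = len(mu0)
    # wgt[k, w] = pi(k|w) * mu0(w);  lat[k, w, j] = latency of route j under atom k, state w
    wgt = pi.T * mu0[None, :]
    lat = np.array([[[ell(w, j, atoms[k, j] + y[j]) for j in range(n)]
                     for w in range(s)] for k in range(m)])
    for i in range(n):
        for j in range(n):
            diff = (lat[:, :, i] - lat[:, :, j]) * wgt
            if np.sum(diff * atoms[:, i][:, None]) > tol or np.sum(diff) * y[i] > tol:
                return False
    return True
```

As a sanity check, with a single state, two links with identical strictly increasing latencies and ν = 1, the only obedient one-atom policy splits the demand equally.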
The social cost is taken to be the expected total latency:

J(π, y) := ∑_{ω,i} ∫_x (x_i + y_i) ℓ_{ω,i}(x_i + y_i) π_ω(x) dx µ_0(ω)  (2.6)

The information design problem can then be stated as

min_{(π,y) ∈ Π × P(1−ν)} J(π, y)  s.t. (2.5)  (2.7)

where Π is the concise notation for △(P(ν))^s.

² An obedient signaling policy can be interpreted as a Bayes correlated equilibrium [22].
³ Throughout the thesis, unless noted otherwise, the summations over indices for degree, state and link, such as d, ω and i, respectively, are to be taken over the entire range, i.e., {0, ..., D}, Ω and [n], respectively, and the integral w.r.t. x is over P(ν).
Remark 2. 1. If there are multiple feasible y for a given π, then a solution (π*, y*) to (2.7) can be interpreted as implicitly requiring an additional effort from the recommendation system to enforce y*. One could alternately consider a robust formulation by replacing min_{(π,y)} in (2.7) with min_π max_y. We leave such an extension for future consideration. Moreover, as we state below after the remark, under a rather reasonable condition on the link latency functions, there exists a unique feasible y for every π, in which case the robust version is the same as (2.7).

2. One can show that the revelation principle, e.g., see [3], holds true in the setting of this thesis for strictly increasing link latency functions [35, Section IIB]. This implies that optimality in the class of obedient direct private signaling policies, i.e., signaling policies which recommend routes, also ensures optimality within a broader class which includes indirect signaling policies. An indirect signaling policy provides noisy information about the state realization. The route choice is then determined by the Bayes Nash flow with respect to the posterior beliefs induced by the signaling policy. In Section 2.2.3, we consider a special case of indirect policies, known as public signaling policies.

3. The feasible set in (2.7) is non-empty for all ν ∈ [0,1]. Details are provided in Remark 5.
It can be shown, using a straightforward adaptation of the standard argument for Wardrop equilibrium in the deterministic case, that for every π ∈ Π, a y ∈ P(1 − ν) satisfies (2.5b) if and only if it solves the following convex problem:

min_{y ∈ P(1−ν)} ∑_{ω,i} ∫_0^{y_i} ∫_x ℓ_{ω,i}(x_i + s) π_ω(x) dx ds µ_0(ω)  (2.8)

Moreover, such a y is unique if {ℓ_{ω,i}}_{ω,i} are strictly increasing over [0,1]. In particular, for uniqueness, it is sufficient to have α_{1,ω,i} > 0 for all ω, i for affine latency functions, and α_{4,ω,i} > 0 for all ω, i for BPR latency functions.⁴
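For two links and an atomic policy, the convex problem (2.8) is one-dimensional and can even be solved by brute force. A minimal sketch, assuming our own variable names and a latency callable `ell`, using a midpoint rule for the inner integral (exact for affine latencies):

```python
import numpy as np

def bayes_nash_y_two_links(atoms, pi, mu0, ell, nu, grid=201, quad=100):
    """Grid-search sketch of the convex problem (2.8) for n = 2 links:
    parameterize y = (t, 1 - nu - t) and minimize the potential
    sum_{w,k} pi[w,k] * mu0[w] * sum_i integral_0^{y_i} ell(w, i, atoms[k,i] + u) du.
    ell(w, i, f) is the latency of link i in state w at flow f."""
    atoms, pi, mu0 = map(np.asarray, (atoms, pi, mu0))
    m = atoms.shape[0]
    s = len(mu0)

    def potential(t):
        val = 0.0
        for k in range(m):
            for w in range(s):
                wgt = pi[w, k] * mu0[w]
                if wgt == 0.0:
                    continue
                for i, yi in enumerate((t, 1.0 - nu - t)):
                    du = yi / quad  # midpoint quadrature, exact for affine latencies
                    val += wgt * sum(ell(w, i, atoms[k, i] + (q + 0.5) * du) * du
                                     for q in range(quad))
        return val

    t_star = min(np.linspace(0.0, 1.0 - nu, grid), key=potential)
    return np.array([t_star, 1.0 - nu - t_star])
```

For identical linear latencies ℓ(f) = f in a single state with ν = 0, the minimizer is the symmetric split y = (0.5, 0.5), as expected from the Wardrop argument.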
2.2.2 Indirect Signaling Policies
The private signaling policies considered in the previous section are direct, i.e., their output space is the set of routes. A generalization is when the output space is an arbitrary set of messages, e.g., travel times on the routes. Let the message space be {1, ..., m} = [m]. Formally, an indirect signaling policy is π^{ind} = {π^{ind}_ω ∈ △(P_m(ν)) : ω ∈ Ω}. The policy generates a message vector x̄ ∈ P_m(ν), where x̄_k is the volume of agents who get message k ∈ [m]. The joint posterior formed by an agent who receives message k is:

µ_{π^{ind},k}(x̄, ω) = x̄_k π^{ind}_ω(x̄) µ_0(ω) / ( ∑_{θ∈Ω} ∫_{p∈P_m(ν)} p_k π^{ind}_θ(p) dp µ_0(θ) )

and the posterior formed by an agent who does not receive a message is:

µ_{π^{ind},∅}(x̄, ω) = π^{ind}_ω(x̄) µ_0(ω)

Let x^{(k)} ∈ P(ν) be the link flow induced by the agents receiving message k, and let y ∈ P(1 − ν) be the link flow induced by the agents not receiving a message. These link flows are given by the Bayes Nash equilibrium (BNE) of the underlying Bayesian game, i.e., they satisfy: ∀ i, j ∈ [n], ∀ k ∈ [m],
∑_ω ∫_{x̄} x̄_k x^{(k)}_i ℓ_{ω,i}( ∑_r x̄_r x^{(r)}_i + y_i ) π^{ind}_ω(x̄) µ_0(ω) dx̄ ≤ ∑_ω ∫_{x̄} x̄_k x^{(k)}_i ℓ_{ω,j}( ∑_r x̄_r x^{(r)}_j + y_j ) π^{ind}_ω(x̄) µ_0(ω) dx̄
∑_ω ∫_{x̄} y_i ℓ_{ω,i}( ∑_r x̄_r x^{(r)}_i + y_i ) π^{ind}_ω(x̄) µ_0(ω) dx̄ ≤ ∑_ω ∫_{x̄} y_i ℓ_{ω,j}( ∑_r x̄_r x^{(r)}_j + y_j ) π^{ind}_ω(x̄) µ_0(ω) dx̄
(2.9)

⁴ Note that all the derivatives of the BPR latency function are zero at 0. However, one can easily show uniqueness in the special cases when, for a signaling policy supported only on x_i = 0, (2.8) has a solution with y_i = 0.
We next discuss existence and equivalence of BNE link flows.

Proposition 1. (x, y) ≡ ({x^{(k)} : k ∈ [m]}, y) is a BNE flow for an indirect signaling policy π^{ind} if and only if it is a solution to:

min_{y ∈ P(1−ν); x^{(k)} ∈ P(ν), k ∈ [m]} ∑_{i,ω} ∫_{x̄} ∫_0^{∑_k x̄_k x^{(k)}_i + y_i} ℓ_{ω,i}(z) dz π^{ind}_ω(x̄) µ_0(ω) dx̄  (2.10)

Furthermore, if the link latency functions are strictly increasing, then all the BNE flows associated with a policy have the same aggregate link flow, i.e., for any two BNE flows (x^{(1)}, y^{(1)}) and (x^{(2)}, y^{(2)}), we have ∑_k x̄_k x^{(1,k)} + y^{(1)} = ∑_k x̄_k x^{(2,k)} + y^{(2)} for all x̄ ∈ P_m(ν).
Remark 3. 1. Direct private signaling policies in Section 2.2.1 correspond to the special case of indirect policies when the set of messages is equal to the set of routes. Accordingly, the obedience condition in (2.5) is derived from (2.9) with x^{(k)}_i = ν if i = k, and equal to zero otherwise.

2. Proposition 1 implies that the revelation principle, e.g., see [3], holds true in the setting of this thesis. That is, for every indirect policy, there exists a direct policy which induces the same aggregate link flows, and therefore it is sufficient to optimize over the class of direct policies.
2.2.3 Public Signaling Policies
A public signaling policy is an indirect signaling policy under which, for every state realization, the ν fraction of agents all receive the same message among {1, ..., m} = [m]. Formally, a public signaling policy is a map π^{pub} : Ω → △([m]), or can alternately be represented as an s × m row-stochastic matrix. The posterior formed by the agents when the message they receive is k is:

µ_{π^{pub},k}(ω) = π^{pub}(k|ω) µ_0(ω) / ( ∑_θ π^{pub}(k|θ) µ_0(θ) ),  ω ∈ Ω  (2.11)

The joint posterior formed by agents who do not receive the message, but have knowledge of π^{pub}, is:

µ_{π^{pub},∅}(k, ω) = π^{pub}(k|ω) µ_0(ω),  k ∈ [m], ω ∈ Ω  (2.12)
Let x^{(k)} ∈ P(ν) be the link flow induced by the participating agents when the message they receive is k ∈ [m], and let y ∈ P(1 − ν) be the link flow induced by the agents not receiving the message. x^{(k)} is the Bayes Nash flow with respect to the posterior in (2.11), and y is the Bayes Nash flow with respect to the posterior in (2.12). That is, x^{(k)} satisfies:

∑_ω ℓ_{ω,i}(x^{(k)}_i + y_i) µ_{π^{pub},k}(ω) ≤ ∑_ω ℓ_{ω,j}(x^{(k)}_j + y_j) µ_{π^{pub},k}(ω),  i ∈ supp(x^{(k)}), j ∈ [n]

Substituting the expression from (2.11), the conditions on {x^{(1)}, ..., x^{(m)}} can be collectively rewritten as

x^{(k)}_i ∑_ω [ ℓ_{ω,i}(x^{(k)}_i + y_i) − ℓ_{ω,j}(x^{(k)}_j + y_j) ] π^{pub}(k|ω) µ_0(ω) ≤ 0,  i, j ∈ [n], k ∈ [m]  (2.13)

Similarly, the condition on y can be written as

y_i ∑_{k,ω} [ ℓ_{ω,i}(x^{(k)}_i + y_i) − ℓ_{ω,j}(x^{(k)}_j + y_j) ] π^{pub}(k|ω) µ_0(ω) ≤ 0,  i, j ∈ [n]  (2.14)
The social cost is:

J(π^{pub}, x, y) := ∑_{k,i,ω} (x^{(k)}_i + y_i) ℓ_{ω,i}(x^{(k)}_i + y_i) π^{pub}(k|ω) µ_0(ω)  (2.15)

Therefore, the problem of optimal public signaling policy design can be written as:

min_{x^{(k)} ∈ P(ν), k ∈ [m]; y ∈ P(1−ν); π^{pub} ∈ Π(m)} J(π^{pub}, x, y)  s.t. (2.13)–(2.14)  (2.16)
Example 2. Two public signaling policies which have attracted particular interest are full information and no information:

π^{pub,full} =
  [ 1 0 ... 0
    0 1 ... 0
    ...
    0 0 ... 1 ] ,    π^{pub,no} =
  [ 1 0 ... 0
    1 0 ... 0
    ...
    1 0 ... 0 ]    (2.17)

(rows ω_1, ..., ω_s; columns k = 1, ..., m), where m = s for the full information signaling policy, and m is arbitrary, e.g., m = 1, for the no information signaling policy. In fact, any row-stochastic π^{pub,no} with identical rows is a no information signaling policy.
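The two benchmark policies in (2.17) can be constructed in one line each; a minimal sketch (function names are ours):

```python
import numpy as np

def full_information(s):
    """pi_pub(k|w) = 1{k == w}: each of the s states gets its own message (m = s)."""
    return np.eye(s)

def no_information(s, m=1):
    """Every state sends message 1, so each posterior equals the prior."""
    pi = np.zeros((s, m))
    pi[:, 0] = 1.0
    return pi
```

Both constructions return row-stochastic matrices, as required of a public signaling policy.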
It is sometimes of interest to evaluate the cost of a given public signaling policy. The cost can be computed from the induced flows x^{(k)}, k ∈ [m], and y. These flows are the solution to a convex optimization problem [35, Proposition 1].
Chapter 3
An Exact Polynomial Optimization Formulation for Private Signaling Policies in Information Design
In this chapter, unless stated otherwise, we assume that the link latency functions are polynomial, i.e., of the form in (2.1). For such latency functions, designing an optimal public signaling policy in (2.16) for a given message space is a polynomial optimization problem. For example, (2.16) is a third-degree polynomial optimization problem for affine link latency functions. This is, however, not the case for private policies in (2.7). We now describe a procedure to equivalently convert (2.7) into a polynomial optimization problem.
Let us first consider minimizing J(π, y) over π satisfying (2.5a), for a fixed y. Note that, for y = 0, this corresponds to the information design problem in the special case when ν = 1. Even in this special case, which has been studied previously in [8, 10], no comprehensive solution methodology exists.

We start by rewriting the information design problem in terms of moments of the signaling policy π. Let z be the vector of all monomials in x_1, ..., x_n up to degree (D+1)/2 if D is odd, and D/2 + 1 if D is even, arranged in a lexicographical order. For example, for D = 3, z = [1, x_1, ..., x_n, x_1^2, ..., x_1 x_n, x_2 x_1, ..., x_2 x_n, ..., x_n x_1, ..., x_n^2]^T. For a fixed y, (2.7) can then be written as (3.1) for appropriate symmetric matrices C_ω, A^{(i,j)}_ω and B^{(i,j)}_ω; expressions for these matrices in the special case when D = 1 (i.e., affine link latency functions) are provided in Appendix A. The cost in (3.1a) is the same as the cost in (2.6), (3.1b) corresponds to the obedience constraint in (2.5a), and (3.1c) corresponds to (2.5b).
min_{π∈Π} ∑_ω ∫ C_ω(y) · zz^T π_ω(x) dx  (3.1a)
s.t. ∑_ω ∫ A^{(i,j)}_ω(y) · zz^T π_ω(x) dx ≥ 0,  i, j ∈ [n]  (3.1b)
     ∑_ω ∫ B^{(i,j)}_ω(y) · zz^T π_ω(x) dx ≥ 0,  i, j ∈ [n]  (3.1c)

(3.1) is an instance of the generalized problem of moments (GPM) [12], which in turn can be solved numerically using GloptiPoly [13]. This software solves the GPM by lower bounding it with semidefinite relaxations of increasing order. The stopping criterion on the order is, however, problem-dependent; approximations can be obtained by a user-specified order. In the special case of n = 2, the first-order relaxation is tight.
Proposition 2. Let n = 2. For every y ∈ P(1 − ν), (3.1) is equivalent to a semidefinite program.

Remark 4. Proposition 2 implies that, in the case of two links, when all the agents are participating, i.e., ν = 1, computing an optimal signaling policy is tractable for arbitrary polynomial latency functions. This is to be contrasted with existing work, e.g., [8, 10], where an optimal signaling policy is provided for such a setting only for certain affine link latency functions.
3.1 Atomic Private Signaling Policies
A natural approach to approximate the joint optimization in (2.7) is to discretize the support of π. A signaling policy π is called m-atomic, m ∈ N, if, for every ω ∈ Ω, π_ω is supported on m discrete points x^{(k)} ∈ P(ν), k ∈ [m]. Let the set of such signaling policies be denoted as Π(m). It is easy to see that every signaling policy in Π(m) can be represented as an s × m row-stochastic matrix. To emphasize the matrix notation, we let π(k|ω) denote the probability of recommending routes according to x^{(k)} when the state realization is ω. Computing an optimal signaling policy in Π(m) can be written as the polynomial optimization problem (3.2).¹ In particular, for (2.1) with D = 1, i.e., affine link latency functions, the polynomials in the cost function and the constraints are of degree 3.
min_{x^{(k)} ∈ P(ν), k ∈ [m]; y ∈ P(1−ν); π ∈ Π(m)} ∑_{k,ω,i} (x^{(k)}_i + y_i) ℓ_{ω,i}(x^{(k)}_i + y_i) π(k|ω) µ_0(ω)  (3.2a)
s.t. ∑_{k,ω} ℓ_{ω,i}(x^{(k)}_i + y_i) x^{(k)}_i π(k|ω) µ_0(ω) ≤ ∑_{k,ω} ℓ_{ω,j}(x^{(k)}_j + y_j) x^{(k)}_i π(k|ω) µ_0(ω),  i, j ∈ [n]  (3.2b)
     ∑_{k,ω} ℓ_{ω,i}(x^{(k)}_i + y_i) y_i π(k|ω) µ_0(ω) ≤ ∑_{k,ω} ℓ_{ω,j}(x^{(k)}_j + y_j) y_i π(k|ω) µ_0(ω),  i, j ∈ [n]  (3.2c)

(3.2) can also be solved (approximately) using GloptiPoly. (3.2) gives an increasingly tighter upper bound on (2.7) with increasing m ∈ N. While it is natural to expect the gap between (3.2) and (2.7) to go to zero as m → +∞, the gap in fact becomes zero for finite m.

Theorem 1. (2.7) is equivalent to (3.2) for m ≥ s·C(D+n, D+1), where C(a, b) denotes the binomial coefficient "a choose b".
The upper bound in Theorem 1 on the number of atoms required to realize an optimal signaling
policy can be tightened in some cases, as we show in the next section.
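Reading Theorem 1's bound as m ≥ s·C(D+n, D+1) is consistent with the instances cited later in the thesis (3s for D = 1, n = 2, and 12 for s = 2, D = 4, n = 2); a quick sanity check:

```python
from math import comb

def atom_bound(s, D, n):
    """Number of atoms that suffices for an optimal policy per Theorem 1:
    m >= s * C(D + n, D + 1)."""
    return s * comb(D + n, D + 1)

# Affine two-link, two-state case (D = 1, n = 2): 3s = 6 atoms suffice.
assert atom_bound(2, 1, 2) == 6
# BPR two-link, two-state case (D = 4, n = 2): the bound of 12 cited in Section 4.1.1.2.
assert atom_bound(2, 4, 2) == 12
```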
3.2 Diagonal Atomic Private Signaling Policies
An atomic policy which has attracted particular attention is when π is the identity matrix of size s. We shall refer to such a policy as a diagonal atomic signaling policy, and denote its finite support as x^ω, ω ∈ Ω. These policies are among the simplest policies which do not reveal the true state. They simplify the process of route recommendation for the system, and also reduce the complexity of the information design problem. Besides, as shown in Section 3.3, they are an important medium for showing monotonicity of the cost with an increasing fraction of participating agents. Moreover, as the simulations in Section 4.1 suggest, it might be sufficient to focus on them for optimal performance; a formal study is, however, left to future work.

¹ Throughout the thesis, unless noted otherwise, the summation over the index for discrete support, such as k, is to be taken over the entire range, i.e., [m].

The polynomial optimization problem in (3.2) for diagonal atomic policies simplifies to:
min_{x^ω ∈ P(ν), ω ∈ Ω; y ∈ P(1−ν)} ∑_{ω,i} (x^ω_i + y_i) ℓ_{ω,i}(x^ω_i + y_i) µ_0(ω)  (3.3a)
s.t. ∑_ω ℓ_{ω,i}(x^ω_i + y_i) x^ω_i µ_0(ω) ≤ ∑_ω ℓ_{ω,j}(x^ω_j + y_j) x^ω_i µ_0(ω),  i, j ∈ [n]  (3.3b)
     ∑_ω ℓ_{ω,i}(x^ω_i + y_i) y_i µ_0(ω) ≤ ∑_ω ℓ_{ω,j}(x^ω_j + y_j) y_i µ_0(ω),  i, j ∈ [n]  (3.3c)
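For a small instance, (3.3) can even be solved by brute force rather than by GloptiPoly. A minimal sketch for n = 2 links with ν = 1 (so y = 0), assuming our own function name and a latency callable `ell`; this is a grid search, not the semidefinite machinery the thesis uses:

```python
import numpy as np

def best_diagonal_policy_two_links(ell, mu0, demand=1.0, grid=101):
    """Brute-force (3.3) for n = 2, nu = 1 (y = 0): pick per-state splits
    x^w = (t_w, demand - t_w) minimizing expected total latency subject to the
    obedience constraints (3.3b). ell(w, i, f): latency of link i in state w."""
    s = len(mu0)
    ts = np.linspace(0.0, demand, grid)
    best, best_cost = None, np.inf
    for combo in np.ndindex(*([grid] * s)):
        x = np.array([[ts[c], demand - ts[c]] for c in combo])   # s x 2 splits
        lat = np.array([[ell(w, i, x[w, i]) for i in range(2)] for w in range(s)])
        # obedience (3.3b): sum_w x_i^w (lat[w,i] - lat[w,j]) mu0[w] <= 0 for all i, j
        ok = all(
            sum(x[w, i] * (lat[w, i] - lat[w, j]) * mu0[w] for w in range(s)) <= 1e-9
            for i in range(2) for j in range(2)
        )
        if ok:
            cost = sum(mu0[w] * x[w, i] * lat[w, i] for w in range(s) for i in range(2))
            if cost < best_cost:
                best, best_cost = x, cost
    return best, best_cost
```

With a single state and identical linear latencies, the only obedient split is the symmetric one, which the search recovers.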
In general, (3.3) gives an upper bound on (3.2) for m ≥ s, and hence also on (2.7).

Remark 5. It is interesting to compare the formulations in (3.2) and (2.16) for m-atomic private signaling policies and public signaling policies with m messages, respectively. Every public signaling policy with m messages can be equivalently realized by an m-atomic private signaling policy, but the converse is not true in general. Formally, given a ν ∈ [0,1], for every public signaling policy π^{pub} with m messages, there exists an m-atomic direct private signaling policy with the same cost. In particular, there exists a feasible 1-atomic private signal corresponding to π^{pub,no} in (2.17) with m = 1, and hence (2.7) is feasible for every ν ∈ [0,1]. Considering s duplicates of the same atom as in the m = 1 case implies that (3.3) is feasible for all ν ∈ [0,1]. Feasibility of (3.2) can be established along similar lines.
Remark 6. If ν = 1, i.e., y = 0, then Proposition 2 can be strengthened by identifying tractable sufficient conditions on the coefficients of the link latency functions such that (3.1) admits a diagonal atomic optimal solution, using the approach of the proof of Proposition 3. Specifically, one can rewrite (3.1) only in terms of the scalar x_1, and then note that it is sufficient to ensure that A^{(i,j)}_ω · zz^T in (3.1b) is concave in x_1 for all i, j ∈ [n = 2] and ω ∈ Ω. The latter can be written in terms of the non-negativity of the polynomials corresponding to the second derivative of A^{(i,j)}_ω · zz^T. These in turn can be equivalently written as semidefinite constraints, e.g., using [36, Theorems 9 and 10].
The next result establishes the equivalence between (3.3) and (2.7) in a special case, and also establishes that (3.3) is equivalent to the following semidefinite program:

min_{M ⪰ 0} Ĵ(M) := C · M  (3.4a)
s.t. A^{(i,j)} · M ≥ 0,  i, j ∈ [n]  (3.4b)
     B^{(i,j)} · M ≥ 0,  i, j ∈ [n]  (3.4c)
     M(1,1) = 1  (3.4d)
     M(i,j) ≥ 0,  i, j ∈ [(s+1)n + 1]  (3.4e)
     S^{(k)}_x · M = 0,  S_y · M = 0,  k ∈ [m]  (3.4f)
     T^{(i,k)}_x · M = 0,  T^{(i)}_y · M = 0,  i ∈ [n], k ∈ [m]  (3.4g)

where the expressions for the symmetric matrices C, A^{(i,j)}, B^{(i,j)}, S^{(k)}_x, S_y, T^{(i,k)}_x and T^{(i)}_y for the special case D = 1 are provided in Appendix A.
Proposition 3. If n = 2, then (3.3), (2.7) and (3.4) are all equivalent to each other for (2.1) with D = 1, i.e., for affine link latency functions.

Remark 7. 1. For n = 2 and D = 1, Proposition 3 implies that an optimal signaling policy can be realized with s atoms, which is much less than the bound s·C(3,2) = 3s given by Theorem 1.

2. In spite of its apparently limited scope, Proposition 3 is the first to provide a complete characterization of the solution to the information design problem even in this most basic setting. Indeed, Proposition 3 and its proof approach might appear to be a generalization of an observation in [10], which was made for ν = 1 and for limited affine link latency functions. Not only do we remove these restrictions, but, more importantly, our proof implicitly highlights that the obedience constraint needs more careful treatment than suggested in [10]. Finally, one can follow the proof of Proposition 3 to show that (2.7) admits a diagonal atomic optimal signaling policy also for n = 2 and D = 2 if α_{2,ω,1} = α_{2,ω,2} ≥ 0 for all ω ∈ Ω.

3. It is informative to contrast the different approaches of Proposition 2 and Proposition 3 for establishing tightness of the natural semidefinite relaxation of the corresponding variants of the information design problem. Proposition 2 simply relies on the ability to rewrite the problem in terms of univariate probability measures with compact support, whereas Proposition 3 relies on the tightness of the GPM relaxation because the relaxation has optimal probability measures supported on single atoms.
3.3 Monotonicity of Optimal Cost Value under Diagonal Atomic Private Signaling Policies

Let J^{diag}(x, y) denote the cost function in (3.3a), and let J^{diag,*}(ν) denote the optimal value for a given ν.

Theorem 2. J^{diag,*}(ν) is continuous and monotonically non-increasing with respect to ν ∈ [0,1].

Remark 8. 1. Note that Theorem 2 does not require the link latency functions to be polynomial.

2. In light of Proposition 3, Theorem 2 implies that, if n = 2 and if the link latency functions are affine, then the optimal cost value under all, i.e., not necessarily (diagonal) atomic, private signaling policies is continuous and monotonically non-increasing in ν ∈ [0,1]. However, this is not necessarily the case with public signaling policies, as we illustrate in Section 4.1.

3. The proof of Theorem 2, in Appendix F, implies that, for a (not necessarily optimal) diagonal atomic signaling policy π^{diag} designed for some ν_1 ∈ [0,1], one can construct a simple ν-dependent diagonal atomic signaling policy with the same social cost as π^{diag} for all ν ∈ [ν_1, 1]. In other words, one can construct a simple feedback (using ν) diagonal atomic signaling policy around a nominal π^{diag} under which the social cost does not increase due to a higher than nominal fraction of participating agents for which π^{diag} is designed. This is to be contrasted with existing results according to which the cost of participating agents may increase with their increasing fraction under a fixed (open-loop) signaling policy, e.g., see [17, 18].
Chapter 4
Simulations and Computational Complexity for Information Design

The computational framework for information design extends to non-parallel networks with a single origin–destination pair. A route in this case potentially consists of multiple links. Accordingly, the obedience condition in (2.5a) becomes

∑_ω ∫_x ∑_{e∈i} ℓ_{ω,e}(x̃_e + ỹ_e) x_i π_ω(x) dx µ_0(ω) ≤ ∑_ω ∫_x ∑_{e∈j} ℓ_{ω,e}(x̃_e + ỹ_e) x_i π_ω(x) dx µ_0(ω)

for all i, j ∈ [n], where e ∈ i denotes all the links e constituting route i, and x̃_e and ỹ_e denote the flows on link e induced by the participating and non-participating agents, respectively. x̃ and ỹ are linear in the route flows x and y, respectively. Therefore, (2.5a) is polynomial for polynomial link latency functions also for non-parallel networks. The same observation holds true for (2.5b), as well as for the cost function in (2.6). This extends to the indirect policy setup as well. Indeed, such generalizations are used in the simulations for the Wheatstone network in Section 4.1. In particular, for networks with n routes and link latency functions of degree D, the polynomials involved in the cost functions and constraints are of degree D + 1 and have n variables. Therefore, the bound on the number of atoms in Theorem 1 holds as is for non-parallel networks (note that n is the number of routes and not the number of links).

The optimal signaling policy is to be computed offline, and the messages or recommendations to be generated in real time as the state changes are obtained through sampling from the given policy. Nevertheless, it is important to examine the complexity of computing an optimal policy. The worst-case iteration complexity of a semidefinite program, under standard solution methods, scales no worse than the square root of the problem size, e.g., see [37]. The sum of the sizes of all the variables in (3.2) is n(m+1) + sm. The maximum degree of the monomials for writing the r-th semidefinite relaxation when the link latency functions are of degree D is D̃ + r − 1, where D̃ = (D+1)/2 if D is odd and D̃ = D/2 + 1 if D is even. Therefore, the worst-case complexity of solving the r-th semidefinite relaxation of (3.2) grows no worse than (n(m+1)+sm)^{(D̃+r−1)/2}. Semidefinite relaxations of small order have been found in practice to give reasonable lower bounds for polynomial optimization problems [15], and the practical performance of semidefinite solvers has been found to be much better than indicated by the worst-case complexity [37].
4.1 Simulations
We compare the minimum cost achievable under private signals, public signals, and full information over two parallel links for affine and BPR latency functions (Section 4.1.1), and over a Wheatstone network for affine and quadratic latency functions (Section 4.1.2). We also provide a practical scaling of runtime with network size for parallel networks (Section 4.1.3). The simulations were performed using GloptiPoly and the MultiStart function (with the fmincon solver) in MATLAB on a high performance computing facility.¹ In particular, the upper bound computed by MultiStart allows us to certify optimality of the lower bound obtained from GloptiPoly, especially when the solution from GloptiPoly does not come with an explicit certificate of optimality. In all instances in Sections 4.1.1 and 4.1.2, it was found sufficient to have 100 starting points for MultiStart and a relaxation order of 3 for GloptiPoly. We chose 2 starting points in Section 4.1.3. The no information signal corresponds to ν = 0, when all the costs are expectedly equal.
4.1.1 Parallel Network
For both scenarios in this case, the total demand is set to 5.

¹ The simulation code is available at https://github.com/YixianZhu2016/Information-Design-Simulations.
[Figure 4.1: Comparison of minimum cost achievable under private signals, public signals and full information over two parallel links, under different ν, for (a) affine latency functions and (b) BPR latency functions. Each panel plots cost versus ν for full information, optimal private, and optimal public (m = 2) signals.]
4.1.1.1 Affine Latency Functions
Figure 4.1(a) provides a comparison between social costs for the following parameters: µ_0(ω_1) = 0.6 = 1 − µ_0(ω_2),

α_0 = [ 5 25 ; 20 15 ],  α_1 = [ 4 2 ; 1 2 ]  (rows ω_1, ω_2; columns i = 1, 2).

The minimum social cost, i.e., the social cost when the system can mandate which route every (receiving and non-receiving) agent takes for every realization of ω,² for these parameters is 83.33.

While the cost in Figure 4.1(a) shows non-monotonic behavior with respect to ν in the full information case as well as under the optimal public signal, the optimal cost is monotonically non-increasing under private signals. Expectedly, the optimal cost under the public signal is no greater than the cost under full information, and the optimal cost under the private signal is no greater than under the public signal. Interestingly, in this case, full information is an optimal public signal for small values of ν, and gives the same cost as an optimal private signal for even smaller values of ν.

² This is also referred to as the first-best strategy.
Following Proposition 3, the optimal private signal is computed using (3.3). The approximation to the optimal social cost under public signals using (2.16) was found to be identical for m = 2, 3, 4. The optimal public signals underlying Figure 4.1(a) for ν = 0.25, 0.5, 0.75, 1 are, respectively (x: rows i = 1, 2, columns k = 1, 2; π^{pub}: rows ω_1, ω_2, columns k = 1, 2):

x = [ 1.25 0 ; 0 1.25 ],  y = [ 3.23 ; 0.52 ],  π^{pub} = [ 1 0 ; 0 1 ]
x = [ 2.06 2.06 ; 0.44 0.44 ],  y = [ 2.11 ; 0.39 ],  π^{pub} = [ 1 0 ; 1 0 ]
x = [ 3.75 0 ; 0 3.75 ],  y = [ 0.42 ; 0.83 ],  π^{pub} = [ 1 0 ; 1 0 ]
x = [ 4.17 0.2 ; 0.83 4.8 ],  y = [ 0 ; 0 ],  π^{pub} = [ 1 0 ; 1 0 ]
and the optimal private signals for the same ν are, respectively (same row/column conventions):

x = [ 0.32 0 ; 0.93 1.25 ],  y = [ 3.75 ; 0 ],  π = [ 1 0 ; 0 1 ]
x = [ 1.58 0.37 ; 0.92 2.13 ],  y = [ 2.5 ; 0 ],  π = [ 1 0 ; 0 1 ]
x = [ 2.83 1.62 ; 0.92 2.13 ],  y = [ 1.25 ; 0 ],  π = [ 1 0 ; 0 1 ]
x = [ 4.08 2.87 ; 0.92 2.13 ],  y = [ 0 ; 0 ],  π = [ 1 0 ; 0 1 ]
4.1.1.2 BPR Latency Functions
Figure 4.1(b) provides a comparison between social costs for the following parameters: µ_0(ω_1) = 0.6 = 1 − µ_0(ω_2),

α_0 = [ 5 25 ; 20 15 ],  α_4 = [ 0.047 0.025 ; 0.037 0.058 ]  (rows ω_1, ω_2; columns i = 1, 2),

and α_1 = α_2 = α_3 = 0. These parameters correspond to free-flow travel times and capacities being equal to α_0 and [ 2 3.5 ; 3 2.5 ] (rows ω_1, ω_2; columns i = 1, 2), respectively. The minimum social cost for these parameters is 52.78.

The social cost profile in Figure 4.1(b) shows similar qualitative dependence on ν as in Figure 4.1(a). Since diagonal atomic private signals are observed to be optimal (based on the sample values reported below), the monotonicity of the corresponding cost is consistent with Theorem 2.

The approximation to the optimal social cost under private signals using (3.2) was found to be identical for m = 2, 3, 4, suggesting that m = 2 atoms are possibly sufficient to realize the optimal private signal in this case. This is much less than the upper bound of 2·C(6,5) = 12 atoms given by Theorem 1. Similarly, the approximation to the optimal social cost under public signals using (2.16) was found to be identical for m = 2, 3, 4. The optimal public signals underlying Figure 4.1(b) for ν = 0.25, 0.5, 0.75, 1 are, respectively (x: rows i = 1, 2, columns k = 1, 2; π^{pub}: rows ω_1, ω_2, columns k = 1, 2):
(rows of x are links i = 1, 2 and rows of π^pub are states ω_1, ω_2; columns are atoms k = 1, 2; semicolons separate rows)

ν = 0.25: x = [1.25, 0; 0, 1.25], y = (3.75, 0), π^pub = [1, 0; 0, 1]
ν = 0.5: x = [2.5, 0; 0, 2.5], y = (2.5, 0), π^pub = [1, 0; 0, 1]
ν = 0.75: x = [3.75, 0; 0, 3.75], y = (1.25, 0), π^pub = [1, 0; 0, 1]
ν = 1: x = [5.0, 2.08; 0.0, 2.92], y = (0, 0), π^pub = [0.87, 0; 0.13, 1]
and the optimal private signals for the same ν are (rows of x are links i = 1, 2 and rows of π are states ω_1, ω_2; columns are atoms k = 1, 2; semicolons separate rows):

ν = 0.25: x = [0.99, 0; 0.26, 1.25], y = (3.75, 0), π = [1, 0; 0, 1]
ν = 0.5: x = [2.24, 0.0; 0.26, 2.5], y = (2.5, 0), π = [1, 0; 0, 1]
ν = 0.75: x = [3.49, 0.76; 0.26, 2.99], y = (1.25, 0), π = [1, 0; 0, 1]
ν = 1: x = [4.74, 2.01; 0.26, 2.99], y = (0, 0), π = [1, 0; 0, 1]
4.1.2 Wheatstone Network
Figure 4.2: (a) Wheatstone network; and comparison of social costs under (b) affine link latency functions and (c) quadratic link latency functions.
4.1.2.1 Affine Latency Functions
Consider the Wheatstone network shown in Figure 4.2(a), where a demand of 2.5 needs to be routed from o to d. Consider paths 1, 2, and 3 consisting of links i = {1, 2}, i = {3, 4}, and i = {1, 5, 4}, respectively. Figure 4.2(b) compares the costs for the following simulation parameters: µ_0(ω_1) = 0.5 = 1 − µ_0(ω_2),

α_0 = [1, 15, 24, 1, 2; 1, 0.5, 4, 1, 20], α_1 = [3, 1, 1.5, 0.5, 5; 3, 0.5, 1.5, 0.5, 5] (rows are states ω_1, ω_2; columns are links i = 1, ..., 5).

The minimum social cost for these simulation parameters is 19.67. The optimal social costs under public and private signals for m = 2 atoms are plotted in Figure 4.2(b). The optimal public signals for ν = 0.25, 0.5, 0.75, 1 are, respectively:
(rows of x are paths 1, 2, 3 and rows of π^pub are states ω_1, ω_2; columns are atoms k = 1, 2; semicolons separate rows)

ν = 0.25: x = [0, 0.625; 0.625, 0; 0, 0], y = (1.53, 0.34, 0), π^pub = [1, 0; 0, 1]
ν = 0.5: x = [0, 1.25; 1.25, 0; 0, 0], y = (1.23, 0.02, 0), π^pub = [1, 0; 0, 1]
ν = 0.75: x = [0, 1.875; 1.875, 0; 0, 0], y = (0.625, 0, 0), π^pub = [1, 0; 0, 1]
ν = 1: x = [0.08, 2.5; 2.42, 0; 0, 0], y = (0, 0, 0), π^pub = [1, 0; 0, 1]
and a set of optimal private signals for the same ν are (rows of x are paths 1, 2, 3 and rows of π are states ω_1, ω_2; columns are atoms k = 1, 2; semicolons separate rows):

ν = 0.25: x = [0.02, 0.61; 0.61, 0.02; 0, 0], y = (1.53, 0.34, 0), π = [1, 0; 0, 1]
ν = 0.5: x = [0, 1.25; 1.25, 0; 0, 0], y = (1.25, 0, 0), π = [1, 0; 0, 1]
ν = 0.75: x = [0.14, 1.87; 1.73, 0; 0, 0], y = (0.63, 0, 0), π = [1, 0; 0, 1]
ν = 1: x = [0.76, 2.5; 1.74, 0; 0, 0], y = (0, 0, 0), π = [1, 0; 0, 1]
The social cost profile in Figure 4.2(b) shows a qualitative dependence on ν similar to that in Figure 4.1(a), except that the full information signal is an optimal public signal for all ν ∈ [0, 1].
4.1.2.2 Quadratic Latency Functions
Consider the same Wheatstone network setup as before, except with quadratic link latency functions with the following coefficients:

α_0 = [1, 15, 24, 1, 2; 1, 0.5, 4, 1, 20], α_1 = [2, 3, 5, 4, 2; 4, 3, 1, 4, 3], α_2 = [0.4314, 0.1818, 0.1455, 0.8693, 0.5499; 0.9106, 0.2638, 0.1361, 0.5797, 0.1450] (rows are states ω_1, ω_2; columns are links i = 1, ..., 5).

The minimum social cost in this case is found to be 29.40. The optimal social costs under public and private signals for m = 2 atoms are plotted in Figure 4.2(c). The optimal public signals for ν = 0.25, 0.5, 0.75, 1 are, respectively:
(rows of x are paths 1, 2, 3 and rows of π^pub are states ω_1, ω_2; columns are atoms k = 1, 2; semicolons separate rows)

ν = 0.25: x = [0, 0; 0, 0.625; 0.625, 0], y = (1.521, 0.354, 0), π^pub = [1, 0; 0, 1]
ν = 0.5: x = [0, 0.017; 0, 1.233; 1.25, 0], y = (1.25, 0, 0), π^pub = [1, 0; 0, 1]
ν = 0.75: x = [0.160, 0.642; 0, 1.233; 1.715, 0], y = (0.625, 0, 0), π^pub = [1, 0; 0, 1]
ν = 1: x = [0.785, 1.267; 0, 1.233; 1.715, 0], y = (0, 0, 0), π^pub = [1, 0; 0, 1]
and a set of optimal private signals for the same ν are (rows of x are paths 1, 2, 3 and rows of π are states ω_1, ω_2; columns are atoms k = 1, 2; semicolons separate rows):

ν = 0.25: x = [0, 0; 0, 0.625; 0.625, 0], y = (1.521, 0.354, 0), π = [1, 0; 0, 1]
ν = 0.5: x = [0.025, 0.04; 0.108, 1.21; 1.117, 0], y = (1.25, 0, 0), π = [1, 0; 0, 1]
ν = 0.75: x = [0.653, 0.664; 0.104, 1.211; 1.118, 0], y = (0.625, 0, 0), π = [1, 0; 0, 1]
ν = 1: x = [1.277, 1.290; 0.108, 1.210; 1.115, 0], y = (0, 0, 0), π = [1, 0; 0, 1]
The social cost profile in Figure 4.2(c) shows a qualitative dependence on ν similar to that in Figure 4.1(a), with the exception that the full information signal is an optimal public signal for all ν ∈ [0, 1] in this case.
4.1.3 Scaling of Runtime with Network Size
Figure 4.3: Log-linear plot of runtime versus n for parallel networks.
We revisit the parallel network setup and report runtime versus the number of links. The link latency functions are affine, with the coefficients for the n-link network taken to be the first n columns of

α_0 = [5, 25, 4, 24, 17; 20, 15, 24, 12, 19], α_1 = [4, 2, 1, 2, 4; 1, 2, 3, 5, 2] (rows are states ω_1, ω_2; columns are links i = 1, ..., 5).

The total demand is set to 2.5n, and the prior is µ_0(ω_1) = 0.6 = 1 − µ_0(ω_2) in all instances.
The log-linear plot in Figure 4.3 suggests that runtime grows exponentially with n. This apparent inconsistency with the discussion in Section 4 is to be understood in the context of the complex resource-management strategies embedded in the high-performance computing facility used for these simulations. Furthermore, implicit in the analysis of Section 4 are a large-n assumption and a fixed absolute accuracy level independent of n. Relaxing these assumptions in the context of practical solvers and hardware limitations is outside the scope of this thesis and will be pursued in future work.
Chapter 5
Convergence in Repeated Setting
In this chapter, we restrict ourselves to diagonal atomic policies π = {π_ω ∈ P_n(ν) : ω ∈ Ω}. Consider the following repeated game setting. At the beginning of each stage, a state of nature ω is sampled i.i.d. from a publicly known prior µ_0, and a fixed fraction ν ∈ [0, 1] of the agents receives private route recommendations conditional on the realized state. These conditional recommendations are generated by the publicly known signal π. The strategy space of a participating agent is {obey, donotobey}, whereas the strategy space of a non-participating Bayesian agent is {1, ..., n}. The donotobey behavior of the participating agents is modeled by the row-stochastic matrix P with zeros on the diagonal, where P_ij is the fraction of donotobey participating agents who are recommended i but choose j. The choice of a participating agent is driven by a notion of regret, and the decision of a non-participating agent is a myopic best response to a forecast of participating agent decisions. The details are as follows.
5.1 Route choice model for participating agents
The outline of the decision-making process of participating agents is as follows. At the end of stage k, every participating agent computes the difference between the payoffs associated with the recommended choice and the alternative choice(s). These payoff differences are then aggregated over the entire population of participating agents to give u(k). The average of the initial condition m(1) and u(1), ..., u(k), denoted m(k+1), is then mapped to a regret θ(m(k+1)) ∈ [0, 1], which equals the fraction of participating agents who do not follow the recommendation in stage k+1. The details of each step in this process are provided next.
We assume that, upon completion of trips, all agents (participating and non-participating) have access to the traffic report from the k-th stage. This report consists of ω(k) and {ℓ_{ω(k),i}}_{i∈[n]}. For simplicity, first consider the two-link case with ℓ_{ω(k),1} > ℓ_{ω(k),2}. For a participating agent who was recommended route 1, irrespective of whether she obeys the recommendation or not, her recommendation is sub-optimal by ℓ_{ω(k),1} − ℓ_{ω(k),2}. In the general case of n ≥ 2 links, for a participating agent who obeys the recommendation to take route i, the sub-optimality of the recommendation is ℓ_{ω(k),i} − ∑_{j∈[n]} P_{ij} ℓ_{ω(k),j}. On the other hand, for a participating agent who does not obey the recommendation to take route i but rather takes route j, the sub-optimality of her recommendation is ℓ_{ω(k),i} − ℓ_{ω(k),j}. Taking into account the number of participating agents who are recommended different routes, and the fraction of them who obey or donotobey the recommendation, the aggregation of the payoff difference over the entire participating agent population is given by:
u(k) = ν ∑_{i∈[n]} ( ℓ_{ω(k),i} − ∑_{j∈[n]} P_{ij} ℓ_{ω(k),j} ) π_{ω(k),i} (1 − θ(m(k))) + ν ∑_{i,j∈[n]} ( ℓ_{ω(k),i} − ℓ_{ω(k),j} ) P_{ij} π_{ω(k),i} θ(m(k))
     = ν ∑_{i,j∈[n]} ( ℓ_{ω(k),i} − ℓ_{ω(k),j} ) P_{ij} π_{ω(k),i}
     = ν π_{ω(k)}^T (I − P) ℓ_{ω(k)}    (5.1)

where ℓ_{ω(k)} is the n × 1 column vector whose entries are ℓ_{ω(k),i}, i ∈ [n]. The average of these instantaneous payoff differences (together with the initial condition m(1)) is:
m(k+1) = (1/(k+1)) ( m(1) + u(1) + ... + u(k) ) = (k/(k+1)) m(k) + (1/(k+1)) u(k)    (5.2)

We adopt the following notion of regret:

θ(m(k)) = [m(k)]_+ / m_max    (5.3)
where m_max is chosen to be sufficiently large that θ(m(k)) ∈ [0, 1] for all possible values of m(k). Since [m(k)]_+ can be upper bounded by ∑_{i∈[n]} ∑_{d=0}^{D} max_{ω∈Ω} α_{d,ω,i}, this bound can then serve as a lower bound for m_max.
θ(k) = θ(m(k)) is interpreted as the fraction of agents who do not follow the recommendation. Therefore, the link flows induced by participating agents are given by:

x_i(k) = x_i(m(k), ω(k)) = ν π_{ω(k),i} (1 − θ(m(k))) + ν ∑_{j∈[n]} P_{ji} π_{ω(k),j} θ(m(k))

which in matrix form becomes:

x(k) = x(m(k), ω(k)) = ν π_{ω(k)} (1 − θ(m(k))) + ν θ(m(k)) P^T π_{ω(k)}
     = ν ( π_{ω(k)} − θ(m(k)) π_{ω(k)} + θ(m(k)) P^T π_{ω(k)} )
     = ν π_{ω(k)} + ν θ(m(k)) (P^T − I) π_{ω(k)}    (5.4)

where x(m(k), ω(k)) ∈ P_n(ν).
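The one-stage computations in (5.1)-(5.4) are straightforward to implement. Below is a minimal NumPy sketch for two links; the signal, deviation matrix, and latency values are illustrative assumptions, not parameters from the thesis:

```python
import numpy as np

def stage_update(m, omega, pi, P, ell, nu, m_max):
    """One stage of the participating-agent dynamics.

    m: current aggregate statistic m(k); omega: realized state index;
    pi: map state -> recommended flow vector; P: row-stochastic deviation
    matrix with zero diagonal; ell: realized latency vector for state omega.
    """
    n = len(ell)
    theta = max(m, 0.0) / m_max                                       # regret map (5.3)
    # link flows induced by participating agents, eq. (5.4)
    x = nu * pi[omega] + nu * theta * (P.T - np.eye(n)) @ pi[omega]
    # aggregated payoff difference, eq. (5.1)
    u = nu * pi[omega] @ (np.eye(n) - P) @ ell
    return x, u, theta

# two-link example (hypothetical numbers); with n = 2, disobeying means switching
pi = {0: np.array([0.7, 0.3]), 1: np.array([0.2, 0.8])}
P = np.array([[0.0, 1.0], [1.0, 0.0]])
x, u, theta = stage_update(m=1.0, omega=0, pi=pi, P=P,
                           ell=np.array([2.0, 1.0]), nu=1.0, m_max=10.0)
```

The running average (5.2) would then be applied as m ← (k·m + u)/(k+1) between stages.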
Remark 9. The framework in (5.1)-(5.4) is reminiscent of the regret-matching framework of [21]. Details of the comparison are provided in Section 5.5.2. A few points are worth special emphasis:
1. The aggregation of the payoff difference over all participating agents in (5.1) is inspired by review-aggregation platforms such as Yelp.
2. In the framework of [21], [m]_+ is referred to as regret, and m_max is the inertia parameter which captures the propensity of a participating agent to stick to the action chosen in the previous round. In our setup, on the other hand, θ = [m]_+ / m_max is interpreted as the degree of disobedience of an individual agent, tuned by the parameter m_max.
5.2 Route choice model for non-participating agents
The non-participating agents forecast the flow induced by participating agents for different realizations of ω using a model structurally similar to (5.4), as follows:

x̂_i(k) = x̂_i(θ̂(k), ω(k)) = ν π_{ω(k),i} (1 − θ̂(k)) + ν ∑_{j∈[n]} P_{ji} π_{ω(k),j} θ̂(k)

or, in matrix form:

x̂(k) = x̂(θ̂(k), ω(k)) = ν π_{ω(k)} + ν θ̂(k) (P^T − I) π_{ω(k)}    (5.5)

where θ̂ is the non-participating agents' forecast of θ, and x̂(k) ∈ P_n(ν). The simple exponential smoothing model for this forecast can be written in the following equivalent forms:
θ̂(k+1) = β(k+1) θ(k) + (1 − β(k+1)) θ̂(k) = θ̂(k) + β(k+1) e_θ(k)    (5.6)

where e_θ(k) := θ(k) − θ̂(k), β(k+1) ∈ (β_min, β_max) ⊂ (0, 1) is the smoothing parameter, and θ̂(1) ∈ [0, 1] is the initial forecast.
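Equation (5.6) is ordinary simple exponential smoothing. A minimal sketch follows; for simplicity it assumes a constant β, whereas the text allows β(k+1) to vary within (β_min, β_max):

```python
def smooth_forecast(theta_sequence, beta, theta_hat_init):
    """Simple exponential smoothing as in (5.6):
    theta_hat(k+1) = theta_hat(k) + beta * (theta(k) - theta_hat(k))."""
    theta_hat = theta_hat_init
    for theta in theta_sequence:
        theta_hat += beta * (theta - theta_hat)  # correct by the forecast error e_theta
    return theta_hat

# if the observed theta is constant, the forecast converges geometrically to it
forecast = smooth_forecast([0.4] * 50, beta=0.3, theta_hat_init=0.0)
```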
Let y(θ̂(k)) denote the myopic best response to the forecast. For a given θ ∈ [0, 1], y(θ) is the unique (assuming strictly increasing latency functions {ℓ_{ω,i}}_{ω,i}; see Lemma 2 for details) y ∈ P_n(1 − ν) satisfying:

y_i > 0 ⟹ E_{ω∼µ_0}[ℓ_{ω,i}(x̂_i(θ, ω) + y_i)] ≤ E_{ω∼µ_0}[ℓ_{ω,j}(x̂_j(θ, ω) + y_j)],  i, j ∈ [n]    (5.7)
Remark 10. Implementation of (5.6) requires non-participating agents to have access to θ(k). As assumed in Section 5.1, non-participating agents have access to the state ω(k) and the delays {ℓ_{ω(k),i}}_{i∈[n]}. Non-participating agents also know the fixed signal π, as well as the explicit form of the latency functions. For a given θ̂(k), (5.5) gives x̂_i(k), and (5.7) then gives y(θ̂(k)); i.e., the flow induced by non-participating agents is known to them. Assuming the latency functions are strictly increasing, the total link-wise flows can be inferred from the other information known to non-participating agents. They can then infer the actual flows induced by participating agents and, using the inverse of (5.4), obtain θ(k).
An equivalent variational inequality characterization of y(θ) is that it is the unique y ∈ P_n(1 − ν) satisfying

∑_{i∈[n]} E_{ω∼µ_0}[ℓ_{ω,i}(x̂_i(θ, ω) + y_i)] (z_i − y_i) ≥ 0  ∀ z ∈ P_n(1 − ν)    (5.8)

In matrix form, y(θ) satisfies:

E_{ω∼µ_0}[ (z − y(θ))^T ℓ_ω(x̂(θ, ω) + y(θ)) ] ≥ 0  ∀ z ∈ P_n(1 − ν)    (5.9)

Note that the total link flow forecasted by non-participating agents is x̂(θ̂(k), ω(k)) + y(θ̂(k)), but the actual flow is x(m(k), ω(k)) + y(θ̂(k)).
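To make (5.7) concrete: for affine latencies the expectation is itself affine in y, and for n = 2 links the best response reduces to equalizing the two expected latencies, projected onto the feasible interval. A sketch under that simplifying assumption (the coefficients below are illustrative, not from the thesis):

```python
def best_response_two_links(a, b, xhat, nu):
    """Myopic best response (5.7) for n = 2 with affine expected latencies
    E[l_i](f) = a[i] + b[i] * f, given forecast flows xhat and mass 1 - nu."""
    mass = 1.0 - nu
    # interior solution: a0 + b0*(xhat0 + y0) = a1 + b1*(xhat1 + (mass - y0))
    y0 = (a[1] + b[1] * (xhat[1] + mass) - a[0] - b[0] * xhat[0]) / (b[0] + b[1])
    y0 = min(max(y0, 0.0), mass)   # project onto [0, mass]
    return [y0, mass - y0]

y = best_response_two_links(a=[1.0, 2.0], b=[2.0, 1.0], xhat=[0.2, 0.3], nu=0.5)
```

For strictly increasing latencies this coincides with the unique solution of the variational inequality (5.8) restricted to two links.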
Remark 11. 1. One can show that the map defined in (5.7) is Lipschitz continuous; a detailed proof is given in Lemma 2.
2. The forecast-and-best-response structure of non-participating agent decisions is reminiscent of [19], where convergence to the correlated equilibrium set is established in two-player games if the forecast is calibrated to the plays of the opponent. We argue in Section 5.5.1 that, with appropriate adaptation to the non-atomic setting of this thesis, the forecast strategy of non-participating agents can also be interpreted as being calibrated to participating agents' actions.
5.3 Convergence Analysis and Discussion
We shall establish convergence of (5.1)-(5.6) for π which are obedient, i.e., for which there exists y ∈ P_n(1 − ν) such that:

∑_{ω∈Ω} ℓ_{ω,i}(π_{ω,i} + y_i) π_{ω,i} µ_0(ω) ≤ ∑_{ω∈Ω} ℓ_{ω,j}(π_{ω,j} + y_j) π_{ω,i} µ_0(ω),  i, j ∈ [n]    (5.10a)
∑_{ω∈Ω} ℓ_{ω,i}(π_{ω,i} + y_i) y_i µ_0(ω) ≤ ∑_{ω∈Ω} ℓ_{ω,j}(π_{ω,j} + y_j) y_i µ_0(ω),  i, j ∈ [n]    (5.10b)

Following [22], (π, y) is therefore the Bayes correlated equilibrium induced by π. We start the convergence analysis by considering the extreme case in which there are no non-participating agents, i.e., when ν = 1.
Proposition 4. Consider the dynamics in (5.1)-(5.4) with ν = 1, and general polynomial latency functions in (2.1). For any µ_0, obedient π, and every m(1) ∈ [−m_max, +m_max], we have, almost surely,

lim_{k→∞} ( x_i(k) − π_{ω(k),i} ) = 0,  i ∈ [n]

Towards the general case of a heterogeneous population, we first analyze the case without the aggregator, i.e., for some fixed m(k) ≡ m.

Proposition 5. Consider the dynamics in (5.1)-(5.6) for some fixed m(k) ≡ m ∈ [−m_max, +m_max]. For any µ_0, obedient π, ν ∈ (0, 1), and θ̂(1) ∈ [0, 1], we have

lim_{k→∞} θ̂(k) = θ(m)

Next, we extend the analysis to the case in which m(k) evolves according to (5.1)-(5.2).

Proposition 6. Consider the dynamics in (5.1)-(5.6). For any µ_0, obedient π, ν ∈ (0, 1), θ̂(1) ∈ [0, 1], and m(1) ∈ [−m_max, +m_max], we have

lim_{k→∞} ( θ̂(k) − θ(m(k)) ) = 0
The following results will be useful in proving convergence of the link-wise flows.

Lemma 1. Consider the dynamics in (5.1)-(5.6) for general polynomial latency functions in (2.1). For any µ_0, obedient π, ν ∈ (0, 1), and initial conditions θ̂(1) ∈ [0, 1] and m(1) ∈ [−m_max, +m_max], there exists a subsequence {m(k_s)}_s whose limit is negative almost surely.

Lemma 2. The mapping y(θ) : [0, 1] → P_n(1 − ν) defined in (5.7) for general polynomial latency functions in (2.1) is Lipschitz continuous.

We are now ready to state the main result.

Theorem 3. Consider the dynamics in (5.1)-(5.6) for general polynomial latency functions in (2.1). For any µ_0, obedient π, ν ∈ (0, 1), and initial conditions θ̂(1) ∈ [0, 1] and m(1) ∈ [−m_max, +m_max], we have, almost surely,

lim_{k→∞} ( x_i(k) − ν π_{ω(k),i} ) = 0,  i ∈ [n]
5.4 Simulations
We report simulation results which suggest that the main convergence result is robust to natural variations of the dynamics in (5.1)-(5.6). We consider a network with two parallel links. Unless noted otherwise, in all scenarios, µ_0(ω_1) = 0.6 = 1 − µ_0(ω_2), ν = 0.5, θ(1) = 0.5, θ̂(1) = 0.25, and the total demand is set to 1. The link latency functions are affine with coefficients:

α_0 = [5, 25; 20, 15], α_1 = [4, 2; 1, 2] (rows are states ω_1, ω_2; columns are links i = 1, 2).
In Scenario (a), we replace (5.2) with discounted review aggregation: m(k+1) = λ m(k) + (1 − λ) u(k). In Scenario (b), we let the participation rate also be dynamic; in particular, we let ν(k) = θ(k).

Figure 5.1: Convergence of regret and its forecast over two parallel links for (a) review aggregation with discounting factor λ = 0.9, and (b) dynamic participation rate.

Figure 5.1, which shows results for these scenarios, suggests that convergence of the regret forecast, as well as of the actual regret, to zero is achieved even under these variations.
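For reference, the baseline dynamics (5.1)-(5.4) with ν = 1 can be simulated in a few lines. The signal below is an illustrative assumption (not one of the thesis' instances), and whether θ(k) actually vanishes depends on π being obedient, so this sketch only checks structural invariants: flows summing to the demand and θ staying in [0, 1]:

```python
import numpy as np

def run_mistrust(pi, P, alpha0, alpha1, omegas, nu=1.0, m_max=50.0, m0=0.0):
    """Closed-loop sketch of (5.1)-(5.4) with all agents participating.
    alpha0, alpha1: affine latency coefficients, shape (num_states, n);
    omegas: realized i.i.d. state sequence."""
    n = P.shape[0]
    m, thetas, flows = m0, [], []
    for k, w in enumerate(omegas, start=1):
        theta = max(m, 0.0) / m_max                                # (5.3)
        x = nu * pi[w] + nu * theta * (P.T - np.eye(n)) @ pi[w]    # (5.4)
        ell = alpha0[w] + alpha1[w] * x                            # realized latencies
        u = nu * pi[w] @ (np.eye(n) - P) @ ell                     # (5.1)
        m = (k * m + u) / (k + 1)                                  # (5.2)
        thetas.append(theta)
        flows.append(x)
    return np.array(thetas), np.array(flows)

rng = np.random.default_rng(0)
pi = np.array([[0.8, 0.2], [0.3, 0.7]])       # hypothetical diagonal atomic signal
P = np.array([[0.0, 1.0], [1.0, 0.0]])
alpha0 = np.array([[5.0, 25.0], [20.0, 15.0]])
alpha1 = np.array([[4.0, 2.0], [1.0, 2.0]])
omegas = rng.choice(2, size=500, p=[0.6, 0.4])
thetas, flows = run_mistrust(pi, P, alpha0, alpha1, omegas)
```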
5.5 Connection to Existing Literature
5.5.1 Calibrated Forecast
[19] studies a two-player repeated game in which, in each round, each player forecasts the probability p = (p_1, ..., p_n) of the decisions of the other player. Convergence to the set of correlated equilibria is established if the forecasting rule is calibrated.

Let N(p, k) be the number of rounds up to the k-th round in which the forecast is p, and let ρ(p, j, k) be the fraction of these rounds in which the other player plays j. Then the forecasting rule is said to be calibrated if:

lim_{k→∞} ∑_p ( ρ(p, j, k) − p_j ) N(p, k)/k = 0  ∀ j    (5.11)
In contrast to [19], our setting considers non-atomic agents, and the utility of every player depends also on the state ω, which changes from one round to the next. Non-atomicity gives the B agents sufficient samples to compare the forecast with the outcome on a per-round basis. Specifically, the flow forecast x̂_j (after normalization by ν) can be interpreted as the forecast of the probability with which an individual participating agent will pick route j, and similarly x_j is the probability with which a participating agent actually picks j. Therefore, a reasonable adaptation of (5.11) to our setting is that the B agents' forecast is calibrated if:

lim_{k→∞} (1/k) ∑_{t=1}^{k} ( x_j(t) − x̂_j(t) ) = 0  ∀ j    (5.12)

for almost every i.i.d. realization of ω(t) ∼ µ_0.
Proposition 6 shows that lim_{k→∞} ( θ̂(k) − θ(m(k)) ) = 0. Combined with (5.4) and (5.5), one easily obtains lim_{k→∞} ( x_j(k) − x̂_j(k) ) = 0 for all j. Thus the forecasting rule in (5.5)-(5.6) is calibrated in the sense of (5.12).
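The calibration criterion (5.12) is easy to check empirically on simulated trajectories. A sketch follows, using a purely synthetic forecast whose per-round error decays like 1/t (an illustrative assumption, not data from the thesis):

```python
import numpy as np

def calibration_error(x_traj, xhat_traj):
    """Running average of per-route forecast errors, as in (5.12);
    both trajectories have shape (T, n)."""
    diffs = np.asarray(x_traj) - np.asarray(xhat_traj)
    return np.cumsum(diffs, axis=0) / np.arange(1, len(diffs) + 1)[:, None]

# a forecast whose per-round error vanishes is calibrated in the sense of (5.12)
T = 1000
x = np.tile([0.6, 0.4], (T, 1))
xhat = x + (1.0 / np.arange(1, T + 1))[:, None] * np.array([1.0, -1.0])
err = calibration_error(x, xhat)
```

Here the running average decays like (log k)/k, so the limit in (5.12) is zero.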
5.5.2 Regret-Matching
[21] considers the following repeated game between finitely many players. In round k, a player who chose action i in round k−1 switches to j ≠ i with probability [m_{i→j}(k)]_+ / m_max, and sticks to i with probability 1 − ∑_{j≠i} [m_{i→j}(k)]_+ / m_max, where m_{i→j}(k) is a measure of regret for not having played j every time that i was played in the past, and the inertia parameter m_max is sufficiently large to ensure that the probabilities are well defined.

Formally, the regret is computed as m_{i→j}(k) := (1/(k−1)) ∑_{t=1, player chooses i}^{k−1} ( U(j, s_−(t)) − U(s(t)) ), where U is the utility function of the player under consideration, s(t) is the set of actions chosen by all the players in round t, and s_−(t) is the set of actions of all the players except the one under consideration.
The similarities and differences with the decision model of the P agents in our setup are now apparent. An individual P agent computes regret for its action to follow or not follow the recommendation in the previous round, in the same spirit as [21]. However, this regret is not categorized in terms of the specific action taken in the previous round. Such an absence of conditioning facilitates efficient aggregation of the regrets of all the P agents as a simple summation. The time-averaging of regrets is achieved through (5.2), where m_max plays the role of the inertia parameter, similar to [21].
5.5.3 Experience Weighted Attraction Learning Model
[38] studied three learning models for Nash equilibrium in repeated atomic games with a continuum of strategies: reinforcement learning (RL, which is a special case of EWA), experience-weighted attraction learning (EWA), and individual evolutionary learning (IEL). These three models use foregone utility, similar in spirit to the regret in our model, to compute the probabilities of choosing strategies. [38] showed convergence to Nash equilibrium using EWA (with a small discretized strategy space) and IEL. [32] extended IEL to correlated equilibrium for repeated atomic games (Battle of the Sexes and Chicken) with a recommendation system. In this section, we adapt EWA to our repeated non-atomic routing game setting by discretizing the unit demand.

Let all agents be participating, i.e., ν = 1. Uniformly discretizing the unit demand into H parts results in H subclasses of agents, each with a demand of 1/H. The discrete strategy space for subclass h ∈ [H] is Q_h = {q_{h,r}}_{r∈[|Q_h|]}, q_{h,r} ∈ [0, 1], where each q_{h,r} is a predetermined probability of following the recommendation. Different subclasses may have different strategy spaces. In simulations, in order to show convergence, we let max_r q_{h,r} = 1 for all h ∈ [H]. If agents do not follow their recommendations, they choose among the remaining routes according to the constant row-stochastic matrix P. We restrict ourselves to affine latency functions, i.e., D = 1.
Consider the following repeated game setting. At the beginning of each stage, the state of nature ω is sampled i.i.d. from a publicly known prior µ_0, and all agents receive private route recommendations conditional on the realized state. These conditional recommendations are generated by the publicly known diagonal atomic policy π = {π_ω ∈ P_n(ν) : ω ∈ Ω}.

At the beginning of stage k, a mass π_{ω(k),i}/H of agents in subclass h receives route i ∈ [n] as its recommendation, among which a fraction e^{λ A_{h,r}(k)} / ∑_{r̃∈[|Q_h|]} e^{λ A_{h,r̃}(k)} chooses strategy q_{h,r}, thus following their recommendations with probability q_{h,r}. The attraction A_{h,r}(k) is the propensity of agents in subclass h to choose strategy q_{h,r} at stage k. A_{h,r}(k) is updated based on the possible payoff W_{h,r}(k) that strategy q_{h,r} might have earned had it been played, weighted by L_{h,r}(k), as follows:

A_{h,r}(k+1) = φ A_{h,r}(k) + L_{h,r}(k) W_{h,r}(k)    (5.13)

where φ ≥ 0, and the computations of W_{h,r}(k) and L_{h,r}(k) are explained below. Therefore, the flow on route i at stage k is given by:
x_i(k) = (π_{ω(k),i}/H) ∑_{h∈[H]} ∑_{r∈[|Q_h|]} q_{h,r} s_{h,r}(k) + ∑_{j∈[n]} (π_{ω(k),j}/H) P_{ji} ∑_{h∈[H]} ∑_{r∈[|Q_h|]} (1 − q_{h,r}) s_{h,r}(k)

where s_{h,r}(k) := e^{λ A_{h,r}(k)} / ∑_{r̃∈[|Q_h|]} e^{λ A_{h,r̃}(k)} denotes the fraction of subclass h choosing strategy q_{h,r}.

At the end of stage k, agents in subclass h compute their foregone utility W_{h,r}(k) associated with strategy q_{h,r} as follows. Assuming all agents in subclass h had switched to strategy q_{h,r} at stage k, while agents in other subclasses played the same strategies they chose at stage k, the hypothetical flow on link i would have been

x̃^{h,r}_i(k) = x_i(k) − (π_{ω(k),i}/H) ∑_{r̂∈[|Q_h|]} q_{h,r̂} s_{h,r̂}(k) − ∑_{j∈[n]} (π_{ω(k),j}/H) P_{ji} ∑_{r̂∈[|Q_h|]} (1 − q_{h,r̂}) s_{h,r̂}(k) + (π_{ω(k),i}/H) q_{h,r} + ∑_{j∈[n]} (π_{ω(k),j}/H) P_{ji} (1 − q_{h,r})

and the foregone utility is computed as:

W_{h,r}(k) = − ∑_{i∈[n]} (π_{ω(k),i}/H) [ q_{h,r} ( α_{1,ω(k),i} x̃^{h,r}_i(k) + α_{0,ω(k),i} ) + (1 − q_{h,r}) ∑_{l∈[n]} P_{il} ( α_{1,ω(k),l} x̃^{h,r}_l(k) + α_{0,ω(k),l} ) ]

where the first term inside the brackets is the possible payoff for agents in subclass h who follow their recommendations (with probability q_{h,r}), and the second term is the possible payoff for agents in subclass h who do not (with probability 1 − q_{h,r}). In order to compute the foregone utility W_{h,r}(k), agents in subclass h only need to know the actual link flows {x_i(k)}_{i∈[n]} and the parameters of the latency functions {α_{d,ω(k),i}}_{d∈{0,1}, i∈[n]}. Note that using the negative of latencies as payoffs is just one of many ways to model the foregone utility; it is used here for its simplicity.
L_{h,r} in (5.13) determines the extent to which hypothetical evaluations are used in computing attractions, and is given by:

L_{h,r}(k) = ( δ + (1 − δ) e^{λ A_{h,r}(k)} / ∑_{r̃∈[|Q_h|]} e^{λ A_{h,r̃}(k)} ) / N(k)

where δ ≥ 0, and N(k) is the experience weight, which starts at a predetermined N(0) and is updated as:

N(k) = ρ N(k−1) + 1

with ρ ≥ 0. Finally, note that when δ = 0, ρ = 0, and N(0) = 0, the EWA model coincides with the RL model in [38].
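The per-subclass EWA bookkeeping above — logit choice fractions, the experience weight N, and the attraction update (5.13) — can be sketched compactly. The foregone utilities W are taken as given here, since computing them only requires the realized flows and the latency coefficients; the numerical inputs are illustrative:

```python
import numpy as np

def ewa_step(A, N, W, lam, phi, delta, rho):
    """One EWA update for a single subclass: logit choice fractions from the
    attractions, then the experience-weight and attraction updates (5.13)."""
    expA = np.exp(lam * A)
    choice = expA / expA.sum()        # fraction of the subclass choosing each strategy
    N_new = rho * N + 1.0             # experience weight update
    L = (delta + (1.0 - delta) * choice) / N_new
    A_new = phi * A + L * W           # attraction update (5.13)
    return choice, A_new, N_new

# two strategies, equal initial attractions; W favors the first strategy
choice, A_new, N_new = ewa_step(A=np.array([0.0, 0.0]), N=0.0,
                                W=np.array([1.0, 0.0]),
                                lam=1.0, phi=1.0, delta=0.0, rho=0.0)
```

With δ = 0, ρ = 0, and N(0) = 0, this reduces to the RL special case mentioned above.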
We simulated this extended EWA model using the following parameters: n = 3, |Ω| = 5, µ_0 = [0.1, 0.2, 0.4, 0.05, 0.25], P = [0, 0, 1; 0, 0, 1; 0.5, 0.5, 0], λ = 0.35, δ = 0.9, ρ = 0.95, φ = 1.04, H = 10, Q_h = {0.1, 0.2, ..., 0.9, 1} for all h ∈ [H], N(0) = 10, entries of A(0) drawn as uniform random variables in [0, 1], and

α_0 = [5, 25, 4; 20, 15, 24; 15, 20, 14; 11, 15, 16; 8, 10, 20], α_1 = [4, 2, 1; 1, 2, 3; 2, 3, 4; 3, 5, 2; 5, 4, 5], π = [0.1, 0, 0.9; 0, 1, 0; 0.6, 0, 0.4; 0.9, 0.1, 0; 0.6, 0.4, 0] (rows are states ω_1, ..., ω_5; columns are links i = 1, 2, 3; semicolons separate rows).
Figure 5.2 shows convergence of the link-wise flow x_i(k) to the signal π_{ω(k),i}.

Figure 5.2: Evolution of x(k) − π_{ω(k)} with time k for the extended EWA model.
5.5.4 Individual Evolutionary Learning Model
In this section, we adapt IEL to our repeated non-atomic routing game setting. Consider the same method for discretizing the population and the same repeated game setting as in Section 5.5.3, but now the discrete strategy space for subclass h ∈ [H] is time-varying: Q_h(k) = {q_{h,r}(k)}_{r∈[|Q_h|]}, q_{h,r}(k) ∈ [0, 1], with |Q_h(k)| ≡ |Q_h|, and agents choose a strategy with probability proportional to the associated foregone utility.
At the beginning of stage k, a mass π_{ω(k),i}/H of agents in subclass h receives route i ∈ [n] as its recommendation, among which a fraction W_{h,r}(k−1) / ∑_{r̃∈[|Q_h|]} W_{h,r̃}(k−1) chooses strategy q_{h,r}(k), thus following their recommendations with probability q_{h,r}(k). The computation of the foregone utility W_{h,r}(k) is explained below. Therefore, the flow on route i at stage k is given by:

x_i(k) = (π_{ω(k),i}/H) ∑_{h∈[H]} ∑_{r∈[|Q_h|]} q_{h,r}(k) W_{h,r}(k−1) / ∑_{r̃∈[|Q_h|]} W_{h,r̃}(k−1) + ∑_{j∈[n]} (π_{ω(k),j}/H) P_{ji} ∑_{h∈[H]} ∑_{r∈[|Q_h|]} (1 − q_{h,r}(k)) W_{h,r}(k−1) / ∑_{r̃∈[|Q_h|]} W_{h,r̃}(k−1)
At the end of stage k, agents first introduce new strategies into their strategy sets as follows. With a fixed probability Pr, each current strategy q_{h,r}(k) undergoes an i.i.d. process of being replaced by a new strategy drawn from a normal distribution with mean equal to the current strategy q_{h,r}(k) and a predetermined standard deviation σ. That is, with probability Pr, q_{h,r}(k) is replaced by q ∼ N(q_{h,r}(k), σ), q ∈ [0, 1]; if q ∈ (−∞, 0) ∪ (1, +∞), then q is redrawn.

Agents in subclass h then compute the foregone utility associated with the revised strategies q_{h,r}(k). Assuming all agents in subclass h switched to strategy q_{h,r}(k) at stage k, and agents in other subclasses played the same strategies they chose at stage k, the foregone utility W_{h,r}(k) is computed as follows:
W_{h,r}(k) = [ ∑_{i∈[n]} (π_{ω(k),i}/H) ( q_{h,r}(k) ( α_{1,ω(k),i} x̃^{h,r}_i(k) + α_{0,ω(k),i} ) + (1 − q_{h,r}(k)) ∑_{l∈[n]} P_{il} ( α_{1,ω(k),l} x̃^{h,r}_l(k) + α_{0,ω(k),l} ) ) ]^{−1}

where the hypothetical flow on link i, had all of subclass h played q_{h,r}(k), is

x̃^{h,r}_i(k) = x_i(k) − (π_{ω(k),i}/H) ∑_{r̂∈[|Q_h|]} q_{h,r̂}(k) W_{h,r̂}(k−1) / ∑_{r̃∈[|Q_h|]} W_{h,r̃}(k−1) − ∑_{j∈[n]} (π_{ω(k),j}/H) P_{ji} ∑_{r̂∈[|Q_h|]} (1 − q_{h,r̂}(k)) W_{h,r̂}(k−1) / ∑_{r̃∈[|Q_h|]} W_{h,r̃}(k−1) + (π_{ω(k),i}/H) q_{h,r}(k) + ∑_{j∈[n]} (π_{ω(k),j}/H) P_{ji} (1 − q_{h,r}(k))

Note that, instead of using the negative of latencies as utilities as in the EWA model, we use the inverse of latencies as utilities here. This is only for simplicity, and other choices are possible.
Next, agents in subclass h reinforce strategies that would have worked well, by replication, as follows. Two strategies q_{h,r̂}(k) and q_{h,r̃}(k) are chosen randomly, with uniform probability and with replacement; the one with the higher foregone utility is selected for the strategy space Q_h(k+1) for stage k+1, i.e.:

q_{h,r}(k+1) = q_{h,r̂}(k) if W_{h,r̂}(k) ≥ W_{h,r̃}(k), and q_{h,r}(k+1) = q_{h,r̃}(k) if W_{h,r̂}(k) < W_{h,r̃}(k).

The foregone utility is updated accordingly: W_{h,r}(k) = W_{h,r̂}(k) if q_{h,r}(k+1) = q_{h,r̂}(k), and W_{h,r}(k) = W_{h,r̃}(k) if q_{h,r}(k+1) = q_{h,r̃}(k). The replication is repeated |Q_h| times until a complete strategy set Q_h(k+1) is obtained.
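The mutation and replication steps above can be sketched as follows for a single subclass; the foregone-utility computation is abstracted into a given vector W, and the helper name and numerical inputs are hypothetical:

```python
import numpy as np

def iel_evolve(Q, W, pr, sigma, rng):
    """One IEL strategy-set update: mutate each strategy with probability pr
    (normal redraw restricted to [0, 1]), then rebuild the set by pairwise
    tournaments with replacement, the higher foregone utility W winning."""
    Q = Q.copy()
    # mutation: replace q by a draw from N(q, sigma), redrawn until in [0, 1]
    for r in range(len(Q)):
        if rng.random() < pr:
            q = rng.normal(Q[r], sigma)
            while not (0.0 <= q <= 1.0):
                q = rng.normal(Q[r], sigma)
            Q[r] = q
    # replication: |Q| binary tournaments chosen uniformly with replacement
    new_Q = np.empty_like(Q)
    for r in range(len(Q)):
        a, b = rng.integers(0, len(Q), size=2)
        new_Q[r] = Q[a] if W[a] >= W[b] else Q[b]
    return new_Q

rng = np.random.default_rng(1)
Q = np.linspace(0.1, 1.0, 10)
W = Q.copy()   # illustrative: foregone utility increasing in the strategy
new_Q = iel_evolve(Q, W, pr=0.0, sigma=0.01, rng=rng)
```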
We simulated the above extended IEL model with the following parameters: Pr = 0.033, σ = 0.01, and Q_h(0) = {0.1, 0.2, ..., 0.9, 1} for all h ∈ [H]. The remaining parameters n, |Ω|, µ_0, P, H, α_0, α_1, and π are the same as in Section 5.5.3.
Figure 5.3 shows convergence of the link-wise flow x_i(k) to the signal π_{ω(k),i}, though with more noticeable oscillations than for the extended EWA model in Section 5.5.3. The oscillations are due to the randomness associated with the use of the normal distribution for revising the strategy set.

Figure 5.3: Evolution of x(k) − π_{ω(k)} with time k for the extended IEL model.
5.5.5 Adjusted Mistrust Dynamic
In Section 5.1, we provided a model (5.1) for the payoff differences to be aggregated. In our experimental study in Section 6.2, we modify the aggregation model as follows, due to practical considerations.

Let ℓ_1, ..., ℓ_n be the realized travel times on the n links, and let i be the recommended route. Then we let the payoff difference equal ℓ_i − min_{j∈[n]} ℓ_j. The aggregation of the payoff difference over the entire P-agent population is given by:
u(k) = ν ∑_{i∈[n]} ( ℓ_{ω(k),i} − min_{j∈[n]} ℓ_{ω(k),j} ) π_{ω(k),i}    (5.14)
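The adjusted aggregation (5.14) compares each recommendation against the fastest realized route; a minimal sketch, with illustrative numbers:

```python
import numpy as np

def adjusted_payoff_diff(pi_w, ell, nu):
    """Aggregated payoff difference (5.14): recommendation regret
    relative to the fastest realized route."""
    return nu * float(pi_w @ (ell - ell.min()))

u = adjusted_payoff_diff(np.array([0.7, 0.3]), np.array([2.0, 1.0]), nu=0.5)
```

Note that, unlike (5.1), this aggregated difference is nonnegative by construction.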
We report simulation results under the adjusted mistrust dynamics in (5.2)-(5.6) and (5.14). We consider a network with two parallel links, with µ_0(ω_1) = 0.6 = 1 − µ_0(ω_2), ν = 0.5, θ(1) = 0.5, θ̂(1) = 0.25, and a total demand of 1. The link latency functions are affine with coefficients:

α_0 = [18, 20; 18, 17], α_1 = [4, 2; 1, 2] (rows are states ω_1, ω_2; columns are links i = 1, 2).
Figure 5.4: Evolution of regret and its forecast over two parallel links for the adjusted mistrust dynamics.

Figure 5.4 suggests convergence of the regret forecast to the actual regret. However, the actual regret converges to a non-zero value, which implies a discrepancy between the actual link flows and the Bayes correlated equilibrium induced by the signaling strategy.
Chapter 6
Experimental Study on Learning Correlated Equilibrium
This chapter presents findings from an experimental study of the regret-based learning rules and correlated equilibrium in the repeated game setting studied in Chapter 5. In this chapter, we restrict ourselves to diagonal atomic policies π = {π_ω ∈ P_n(ν) : ω ∈ Ω}. Additionally, all agents are participating, that is, ν = 1.
6.1 Experiment Procedure
Simulating non-atomic setup requires simultaneous participation by a very large number of par-
ticipants. Practical limitations on laboratory experiments therefore necessitate consideration of a
pseudo-non-atomic setup, consisting of mixture of a small number of human participants and a
large number of simulated participants. We performed experiments in the limiting case of one
human participant at a time with change of participant after a fixed number of rounds.
The experiment protocol was reviewed and approved by the Institutional Review Board at the
University of Southern California (USC # UP-22-00107). The experiment was conducted in the
Kaprielian Hall at USC during April 2022 and May 2022. A total of 34 participants with a good
command of the English language and with no prior experience with our experiment were recruited
from the undergraduate population of the university. Upon arrival at the laboratory, participants
were given a presentation by an experiment personnel, aided by slide illustrations, available at
[39]. The experiments was run on a networked desktop computer, and programmed using Python.
The program used the Python Flask framework to create a localhost server that hosted the study during experiments. All data was collected in a SQLite database. This type of database is file-based, so all data was stored locally in a file on the password-protected desktop computer, which remained in the locked laboratory. Except for experiment personnel, only one participant was present in the laboratory during an experiment session.
Figure 6.1: User interface before route selection during a typical scenario.
During the experiment, a participant had to select one among three routes in a traffic network on a computer, in a series of rounds, or scenarios. The traffic network was chosen to be outside
the Los Angeles area to minimize bias. The information displayed during a typical scenario, as
shown in Figure 6.1, consisted of: (i) Traffic network with three possible routes to go from the
origin to the destination, marked as Route 1, Route 2, and Route 3. The route highlighted is
the recommended route; (ii) Travel time forecasts on different routes corresponding to different
underlying states, shown via histogram in the right side of the screen; (iii) Average rating at the top
left indicating what other participants in the past thought of the quality of recommendation in terms
of recommending the fastest path. The average rating is on a scale of 1 to 5 with 5 being the highest;
and (iv) Menu at the bottom to enter route selection. During the pre-experiment presentation,
the participants were instructed to make route choice based on the information contained in (i)-
(iii). Specifically, they were instructed to let their likelihood of following the recommendation be
proportional to the displayed rating, and to use the histogram when they decide not to follow the
recommendation. Upon selecting a route and pressing the "Select" button, the computer revealed the actual travel time on top of the histogram for each route, as in Figure 6.2. These are the forecasts for the specific state value associated with the round.
Figure 6.2: User interface after route selection and before review submission
during a typical scenario.
The travel time forecasts are computed from forecasted link flows using the link latency func-
tions in (2.1), where the state ω is predetermined and is the same for all the participants in the
same round. The forecasted link flows are computed by assuming that the other agents follow
recommendation with a likelihood proportional to the displayed rating, and for those who do not
follow, their route choice mimics the alternate choices made by previous participants under same
state and same displayed rating; see (6.1)-(6.2) for details. The participant is required to enter
his/her own experience with the quality of recommendation in this scenario by choosing a review
rating on a sliding scale at the bottom of the screen. During the pre-experiment presentation, the
participants were instructed to consider not only the rankings but also the difference in values of
actual travel times of both the recommended route and the route chosen by participants to come
up with a rational review rating. Upon pressing the "Rate" button, the computer showed the next
scenario whose displayed rating is the average of the review just submitted by the participant and
the reviews submitted by previous participants under same state and same displayed rating; see
(6.1) for details. The process repeats until all the scenarios are done, after which the participant
was asked to fill out a survey to gain insight into his/her route choice decision-making strategy during the experiment. This survey is provided in the Appendix.
All participants received a show-up remuneration of US $10 and an additional maximum of US $10 depending on the level of homogeneity that they exhibited. For a given participant, this is measured in terms of the difference between his/her empirical conditional route choice probability distributions and those of the previous participants, which is then mapped proportionally to the remuneration amount. The conditional probability distribution here refers to the probability of choosing route j when recommended route i. The additional remuneration averaged $9.5 across all participants. We used a generous mapping from a participant's level of homogeneity to his/her remuneration, so the high average is not necessarily an indication of a high level of homogeneity among the participants. All remuneration was paid in cash. Participant sessions lasted approximately an hour on average.
For each scenario, we recorded the displayed rating (between 0 and 5), the recommended route
({1,2,3}), the selected route ({1,2,3}), the submitted review (between 0 and 5), the start and end
times (the local clock time of the computer running the experiment program).
6.2 Adapting Theory to Experiment Setup and Hypotheses
Let there be $S$ human participants in total, each participating for $K$ rounds. The numbering of the participants is in the order in which they participate. Figure 6.3 shows that the response from a participant in every round consists of the route chosen in that round as well as the review about the quality of the recommendation in that round. In Section 5.1, we provided a model for an estimate of this review to be related to a notion of instantaneous regret. We modify the notion for the experiments as follows. Let $\ell_1,\dots,\ell_n$ be the realized travel times on the $n$ links, and let $i$ be the recommended route. Then, we let the instantaneous regret be equal to $\ell_i - \min_{j\in[n]}\ell_j$.
The responses of the participants are summarily stored and updated as follows:
Figure 6.3: Input-output illustration for the participant and the simulated model. [Block diagram: participant $s$ receives the route recommendation $i$ and the pair $(f_s, r_s)$ from Eqs. (6.1)-(6.2), and outputs a route choice $j$ and a review; Eq. (6.3) produces the estimate $\hat{m}_s$.]
• $\mathcal{R}(s,k;\omega,r)$: set of the reviews submitted by participants $1,\dots,s-1$ in all their rounds $1,\dots,K$, and by participant $s$ in rounds $1,\dots,k$, when the state realization was $\omega$ and the displayed rating was $r$. Let $\bar{R}(s,k;\omega,r)$ be the average of those review values.

• $\mathcal{M}(s,k;\omega,r)$: set of instantaneous regrets for participants $1,\dots,s-1$ in all their rounds $1,\dots,K$, and for participant $s$ in rounds $1,\dots,k$, when the state realization was $\omega$ and the displayed rating was $r$. Let $\bar{M}(s,k;\omega,r)$ be the average of these regret values.

• $\mathcal{N}(s;i,j)$: number of times participants $1,\dots,s-1$ chose route $j$ when recommended route $i$ in all their rounds $1,\dots,K$.

Furthermore, the matrix $P$ is intrinsic to the participants and needs to be estimated online during the experiments. We adopt the following natural estimate:

$$\hat{P}^s_{ij} = \begin{cases} \dfrac{\mathcal{N}(s;i,j)}{\sum_{j'\in[n],\,j'\neq i}\mathcal{N}(s;i,j')} & i \neq j \\[2mm] 0 & i = j \end{cases} \qquad i,j\in[n],\ s\in\{2,\dots,S\},$$

starting with a specified $\hat{P}^1$ for the first participant.
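The count-based estimate above is straightforward to implement. The following sketch is our own illustration (the function name and the count-matrix input are not part of the thesis code); it normalizes the off-diagonal counts row by row and leaves a row at zero when no deviation has ever been observed for that recommendation:

```python
import numpy as np

def estimate_P(counts):
    """Estimate P-hat from route-switch counts.

    counts[i, j] = number of times participants chose route j when
    recommended route i.  Diagonal entries are ignored, matching the
    convention that P-hat has zero diagonal.
    """
    n = counts.shape[0]
    P = np.zeros((n, n))
    for i in range(n):
        # total number of deviations recorded when route i was recommended
        total = sum(counts[i, j] for j in range(n) if j != i)
        if total > 0:
            for j in range(n):
                if j != i:
                    P[i, j] = counts[i, j] / total
    return P
```

For example, with counts of 2 and 6 deviations to routes 2 and 3 when route 1 was recommended, the first row of the estimate is (0, 0.25, 0.75), a row-stochastic row with zero diagonal, as required.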
The ratings to be displayed as well as the link flows for the human participant $s\in[S]$ in various rounds are given by$^1$: for $k=1,\dots,K-1$,

$$r_s(k+1) = \frac{k}{k+1}\,r_s(k) + \frac{1}{k+1}\,\bar{R}(s,k;\omega(k),r_s(k)) \tag{6.1}$$

$$f^s_i(k+1) = \pi_{\omega(k+1),i}\,\frac{r_s(k+1)}{r_{\max}} + \sum_{j\in[n]} \hat{P}^s_{ji}\,\pi_{\omega(k+1),j}\left(1 - \frac{r_s(k+1)}{r_{\max}}\right) \tag{6.2}$$

starting with a specified $r_s(1)$ for the first round.
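The flow computation in (6.2) splits the population into a fraction $r/r_{\max}$ that follows the recommendation and a complementary fraction that deviates according to $\hat{P}^s$. A minimal sketch, assuming the state's row of $\pi$ and the current estimate $\hat{P}^s$ are given as numpy arrays (variable names are ours, not the thesis code):

```python
import numpy as np

def link_flows(pi_row, P_hat, r, r_max=5.0):
    """Forecasted link flows per (6.2): a fraction r/r_max of agents follows
    the recommendation directly, while the rest deviate; a deviator who was
    recommended route j lands on route i with probability P_hat[j, i]."""
    follow = r / r_max
    return follow * pi_row + (1.0 - follow) * (P_hat.T @ pi_row)
```

Since both $\pi_\omega$ and the off-diagonal rows of $\hat{P}^s$ sum to one, the resulting flow vector always sums to the total demand of 1, regardless of the displayed rating.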
We modify the model for the human participant from (5.1)-(5.2) as: for $k=1,\dots,K$,

$$\hat{u}_s(k) = \bar{M}(s,k;\omega(k),r_s(k)), \qquad \hat{m}_s(k) = \frac{1}{k}\left(\hat{u}_s(1)+\dots+\hat{u}_s(k)\right) \tag{6.3}$$
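Both the rating recursion (6.1) and the regret aggregation (6.3) are running means. A small sketch of the two updates (the function names are our own; `review_avg` stands for the quantity $\bar{R}(s,k;\omega(k),r_s(k))$):

```python
def update_rating(r_k, k, review_avg):
    """One step of (6.1): the next displayed rating is the running average
    of the rating history nudged toward the average review collected for
    the current (state, rating) pair."""
    return (k * r_k + review_avg) / (k + 1)

def averaged_regret(u_hats):
    """(6.3): time-averaged aggregated regret after k rounds, where
    u_hats[k-1] is the average instantaneous regret in round k."""
    return sum(u_hats) / len(u_hats)
```

For example, starting from the experiment's initial condition $r_s(1)=2.5$, a first-round average review of 4.0 yields $r_s(2) = (2.5 + 4.0)/2 = 3.25$.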
6.2.1 Hypotheses
Our objective is to test the following hypotheses:
(H1) The empirical probability of a participant choosing the recommended route is proportional to the value of the rating displayed with the recommendation. The findings for this hypothesis are in Section 6.3.2.

(H2) The displayed rating, i.e., $r_s(\cdot)$, converges to its maximum value $r_{\max}$ as $s$ increases. The empirical probability of participants choosing the recommended route converges to 1 as $s$ increases. The findings for this hypothesis are in Sections 6.3.4 and 6.3.5.

(H3) The empirical distributions of route choices when not following the recommendation, i.e., the off-diagonal entries of $\hat{P}^s$, converge as $s$ increases. The findings for this hypothesis are in Section 6.3.3.

(H4) There is a negative correlation between the time-averaged aggregated regret $\hat{m}_s(\cdot)$ and the displayed rating $r_s(\cdot)$. The findings for this hypothesis are in Section 6.3.6.

$^1$The rating $r_s$ is quantized to the nearest first decimal place when displayed to the participants.
6.3 Experiment Findings
6.3.1 Experiment Parameters
The parameters in our experiments were $n=3$, $|\Omega|=5$, $S=33$,$^2$ $K=100$,

$$\hat{P}^1 = \begin{bmatrix} 0 & 0 & 1 \\ 0 & 0 & 1 \\ 0.5 & 0.5 & 0 \end{bmatrix},$$

$r_s(1)=2.5$ for all $s\in[S]$, $r_{\max}=5$, and

$$\alpha_0 = \begin{array}{c|ccc} & i=1 & i=2 & i=3 \\ \hline \omega_1 & 5 & 25 & 4 \\ \omega_2 & 20 & 15 & 24 \\ \omega_3 & 15 & 20 & 14 \\ \omega_4 & 11 & 15 & 16 \\ \omega_5 & 8 & 10 & 20 \end{array}, \quad \alpha_1 = \begin{array}{c|ccc} & i=1 & i=2 & i=3 \\ \hline \omega_1 & 4 & 2 & 1 \\ \omega_2 & 1 & 2 & 3 \\ \omega_3 & 2 & 3 & 4 \\ \omega_4 & 3 & 5 & 2 \\ \omega_5 & 5 & 4 & 5 \end{array}, \quad \pi = \begin{array}{c|ccc} & i=1 & i=2 & i=3 \\ \hline \omega_1 & 0.1 & 0 & 0.9 \\ \omega_2 & 0 & 1 & 0 \\ \omega_3 & 0.6 & 0 & 0.4 \\ \omega_4 & 0.9 & 0.1 & 0 \\ \omega_5 & 0.6 & 0.4 & 0 \end{array}$$
The $\pi$ chosen is an optimal solution to the information design problem [40], i.e., it minimizes social cost among all obedient policies. Indeed, the social cost under this policy is about 13% lower than under the Bayes Wardrop equilibrium. The link-wise expected latencies with respect to the posteriors formed upon receiving recommended routes 1, 2 and 3, respectively, are $(\mathbb{E}_{r1}[\ell_1], \mathbb{E}_{r1}[\ell_2], \mathbb{E}_{r1}[\ell_3]) = (13.95, 16.83, 16.88)$, $(\mathbb{E}_{r2}[\ell_1], \mathbb{E}_{r2}[\ell_2], \mathbb{E}_{r2}[\ell_3]) = (16.95, 15.20, 22.56)$ and $(\mathbb{E}_{r3}[\ell_1], \mathbb{E}_{r3}[\ell_2], \mathbb{E}_{r3}[\ell_3]) = (12.31, 21.80, 11.75)$. Moreover, the link-wise expected latencies with respect to the prior $\mu_0$ are $(\mathbb{E}_{\mu_0}[\ell_1], \mathbb{E}_{\mu_0}[\ell_2], \mathbb{E}_{\mu_0}[\ell_3]) = (15.85, 16.75, 16.60)$. Evidently, following the recommended routes results in lower expected latency not only compared to the alternative not-follow strategies, but also compared to the Bayes Wardrop equilibrium strategy when there is no recommendation. The sequence $\{\omega(k)\}_{k=1}^{100}$ is sampled offline in an i.i.d. manner from $\mu_0 = [0.1\ \ 0.2\ \ 0.4\ \ 0.05\ \ 0.25]$. The empirical distribution of the sampled sequence is equal to the prior $\mu_0$, and the same sampled sequence is used for all the participants.

$^2$34 participants were recruited in total, but one participant faced a technical error during the experiment, and therefore we discarded that participant's data from our analysis.
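The posterior expected latencies reported above can be reproduced from $\mu_0$, $\pi$, and the affine latency coefficients, under the assumption (consistent with a diagonal atomic policy and unit demand) that the flow on link $i$ in state $\omega$ equals $\pi_{\omega,i}$. This is our own verification sketch, not thesis code:

```python
import numpy as np

# Parameters from Section 6.3.1
mu0 = np.array([0.1, 0.2, 0.4, 0.05, 0.25])
alpha0 = np.array([[5, 25, 4], [20, 15, 24], [15, 20, 14],
                   [11, 15, 16], [8, 10, 20]], dtype=float)
alpha1 = np.array([[4, 2, 1], [1, 2, 3], [2, 3, 4],
                   [3, 5, 2], [5, 4, 5]], dtype=float)
pi = np.array([[0.1, 0, 0.9], [0, 1, 0], [0.6, 0, 0.4],
               [0.9, 0.1, 0], [0.6, 0.4, 0]])

# Realized latency of each link in each state, with link flows equal to pi
latency = alpha0 + alpha1 * pi  # elementwise affine latency

def posterior_latencies(i):
    """Expected latency of each link given that route i was recommended."""
    post = mu0 * pi[:, i]       # unnormalized posterior over states
    post = post / post.sum()
    return post @ latency

# posterior_latencies(0) reproduces (13.95, 16.83, 16.88) from the text.
```

Running `posterior_latencies(1)` likewise reproduces the route-2 values, in particular $\mathbb{E}_{r2}[\ell_2] \approx 15.20$, confirming the obedience of $\pi$ on these entries.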
6.3.2 Recommendation Following vs Displayed Rating
Figure 6.4: Linear regression between empirical probability of recommendation
and quantized displayed rating.
For the sake of analysis, we quantized the displayed rating by rounding off to one decimal
place. We computed empirical probabilities of following recommendation over all participants and
all scenarios for each quantized displayed rating. We discarded data associated with the displayed
rating interval[2.6,3.9] because the rating changed so rapidly in[2.6,3.9] that the participants saw
a specific display rating in the range [2.6,3.9] for less than one scenario on average. Given such a
low sample size, the statistics for following recommendation conditional on display rating in the
interval[2.6,3.9] would not be reliable. Moreover, these discarded data points accounted for only
4.8% of all data points. Figure 6.4 shows the outcome of linear regression between the empirical probabilities and displayed ratings. The coefficient of determination $R^2 = 0.9522$ suggests a strong fit. The positive slope suggests a positive correlation, and the negative intercept suggests that participants would not follow the recommendations when the displayed rating is less than $0.2113/0.2397 \approx 0.9$. In the extreme case when the quantized display rating is 5, the linear regression model generates a projected empirical probability of $0.2397 \times 5 - 0.2113 = 0.9872$, which is consistent with the theoretical model in (5.4).
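This fit can be reproduced generically with an ordinary least-squares line; the sketch below is ours (the data arrays in use are placeholders, not the experimental data), and the projection at rating 5 uses the fitted coefficients reported in the text (slope 0.2397, intercept −0.2113):

```python
import numpy as np

def fit_follow_probability(ratings, follow_probs):
    """Least-squares line p = slope * r + intercept, with R^2 of the fit."""
    ratings = np.asarray(ratings, dtype=float)
    follow_probs = np.asarray(follow_probs, dtype=float)
    slope, intercept = np.polyfit(ratings, follow_probs, deg=1)
    pred = slope * ratings + intercept
    ss_res = ((follow_probs - pred) ** 2).sum()
    ss_tot = ((follow_probs - follow_probs.mean()) ** 2).sum()
    return slope, intercept, 1.0 - ss_res / ss_tot

# With the thesis's fitted coefficients, the projection at rating 5 is
# 0.2397 * 5 - 0.2113 = 0.9872, matching the value quoted in the text.
```

The zero-crossing of such a line, intercept/slope in absolute value, is what yields the $\approx 0.9$ rating threshold below which participants are predicted not to follow recommendations.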
6.3.3 Long Run Behavior of $\hat{P}$

Recall that $\hat{P}^s$ is held constant throughout the session of participant $s$, and updated to $\hat{P}^{s+1}$ at the end of the session. Also, note from the definition that $\hat{P}^s$ is a row-stochastic matrix whose diagonal entries are zero. In our current case of $n=3$, this therefore leaves 3 independent entries, say $\hat{P}^s_{12}$, $\hat{P}^s_{23}$ and $\hat{P}^s_{31}$. The evolution of these quantities with increasing $s$ is shown in Figure 6.5. The plots suggest convergence of $\hat{P}^s$ as $s$ increases.

Figure 6.5: Evolution of $\hat{P}^s$ with participant number $s$.
6.3.4 Long Run Behavior of the Displayed Rating

Figure 6.6: Evolution of end of session display rating $r_s(100)$ with participant number $s$.

Recall from (6.1) that, in order to simulate review collection from multiple participants participating simultaneously, we instead collect reviews from previous scenarios by previous participants who made route choices under the same realization of the state and for the same displayed rating. Naturally, the accuracy of such a surrogate improves with more data collection. Motivated by this, we studied the long run behavior of the displayed rating $r_s$ in two ways. First, we studied the end of session displayed rating with increasing participant number. As Figure 6.6 illustrates, this value increases to roughly 4.67 and seems to settle around this value by the last participant. Second, we studied the displayed rating during the session of the last participant, participant #33. Figure 6.7 shows a monotonic increase of the displayed rating from 2.5 (recall that the initial condition is set to $r_s(1)=2.5$ for all the participants) to a value of around 4.67 by the end of the session. The evolution of the display rating is understandably smoother given that the set $\bar{R}$ used in (6.1) has sufficient samples for all frequently occurring combinations of $\omega$ and $r$ by the time of the last participant.

Figure 6.7: Evolution of the display rating during the session of the last participant, i.e., $r_{33}(k)$, with scenario number $k$.
6.3.5 Long Run Empirical Probability of Following Recommendation

Figure 6.8: Evolution of end of session cumulative empirical probability of following recommendation with increasing participant number.

Figure 6.8 shows the evolution of the cumulative empirical probability of following recommendations, computed from the start of the session of participant 1 through the end of the session of participant $s$, for increasing values of $s$. This cumulative empirical probability seems to converge to around 0.87. This compares very well with the prediction of the linear regression model of Section 6.3.2, according to which the empirical probability of following the recommendation for the display rating of 4.67 (the long run value from Section 6.3.4) is approximately $0.2397 \times 4.67 - 0.2113 \approx 0.91$.
6.3.6 Displayed Rating vs Time-Averaged Aggregated Regret

Based on the data from all 3300 scenarios, we could not find a good correlation between the displayed rating and the time-averaged aggregated regret (no regression model gave a good fit). We then repeated the analysis after discarding the data satisfying at least one of the following (for reasons similar to those described in Section 6.3.2):

• The displayed rating was less than 4. This was to remove the transient effect of the initial condition $r_s(1)=2.5$. This accounted for only about 5% of total data points.

• The time-averaged aggregated regret (cf. (6.3)) was less than 2.6 minutes. This was because the displayed ratings varied vastly during such scenarios. This accounted for about 50% of total data points.

Figure 6.9 (e) shows the outcome of linear regression between the displayed rating and the corresponding time-averaged aggregated regret values after pre-processing the data using the aforementioned threshold values. The $R^2 = 0.6167$ value suggests a moderate negative correlation. Figure 6.9 also shows linear regression outcomes using different combinations of threshold values. For example, Figure 6.9 (f) shows the regression after discarding data with displayed rating less than 4, or with time-averaged aggregated regret less than 4 minutes, which resulted in retaining 1% of total data points. The scenario in Figure 6.9 (e) gives the maximum $R^2$ value.
Figure 6.9: Relation between displayed rating and time-averaged aggregated regret for all participants. Panels (a)-(i) correspond to (regret threshold in minutes, rating threshold, fraction of data retained): (a) (0, 2.25, 100%); (b) (2.6, 2.25, 46%); (c) (4, 2.25, 1%); (d) (0, 4, 95%); (e) (2.6, 4, 44%); (f) (4, 4, 1%); (g) (0, 4.6, 33%); (h) (2.6, 4.6, 13%); (i) (4, 4.6, 0%).
We also investigated the correlation on an individual participant basis. We found that a large number of participants submitted the $r_{\max}=5$ review too often. Indeed, only 9 out of the 33 participants submitted a review of 5 in fewer than half of their individual 100 scenarios. This might be because the default review on the slider is set to 5 before the participant adjusts it to submit his/her own review, and hence data from participants who submitted a review of 5 too often might be biased.
Analysis of the data on an individual basis from the 9 participants who submitted a review of 5 less than half the time revealed that, for 5 of these participants, there was a strong negative correlation ($R^2 > 0.93$) between the displayed rating and the time-averaged aggregated regret. This relationship for one such participant (#21) is shown in Figure 6.10, where we discarded data points ($\approx$ 4% of total data) with review rating less than 4 to remove the transient effect of the initial condition $r_{21}(1)=2.5$. The average $R^2$ value$^3$ for all 9 participants who submitted a review of 5 less than half the time was 0.649. On the other hand, the average $R^2$ value for the other 24 participants who submitted a review of 5 more than half the time was 0.4664.

Figure 6.10: Relation between displayed rating and time-averaged aggregated regret for a sample participant.
We repeated our analysis taking into account the participants' responses to the feedback survey, specifically to questions 6 and 7 (see Appendix). 12 participants self-reported using the strategy suggested in the pre-experiment instructions for making route choice decisions, i.e., carefully taking into account the recommended route, the displayed rating and the histograms. The remaining 21 participants self-reported following recommendations whenever the displayed rating was above a threshold value. The average of this self-reported threshold value was 4.175.

$^3$The average $R^2$ value for multiple participants refers to the average of the $R^2$ values computed for the individual participants.
We analyzed the data separately for these two types of participants. After discarding data points corresponding to a displayed rating less than 4, analysis of the remaining data from the 12 participants who used the strategy suggested in the instructions revealed a moderate negative correlation between the displayed rating and the time-averaged aggregated regret (average $R^2 = 0.5897$). For 5 of these participants, there was a strong negative correlation (average $R^2 > 0.93$). On the other hand, for the remaining 21 participants who used a threshold-based strategy, the average $R^2$ value was 0.4708.

There were 2 participants who self-reported using the recommended strategy and also submitted a review of 5 less than half the time. The average $R^2$ value for these 2 participants was 0.9505, suggesting a strong (negative) correlation between the displayed rating and the time-averaged aggregated regret for them.
Chapter 7
Conclusion and Future Work
7.1 Conclusion
Existing works on information design for non-atomic routing games provide useful insights, whose
generalization however is not readily apparent. Relatedly, a computational approach to operational-
ize optimal information design for general settings does not exist to the best of my knowledge. By
making a connection to semidefinite programming (SDP), this thesis not only fills this gap, but also allows us to leverage computational tools developed by the SDP community. The latter is particularly
relevant for extending the approach to non-atomic games beyond routing.
This thesis also provides convergence analysis for a repeated non-atomic routing game with
partial signaling. The decision models are reminiscent of adaptive models studied previously in
the context of dynamic procedures leading to the set of correlated equilibria. Our analysis lays the
foundation to extend such analysis to repeated scenarios with explicit presence of a coordinator,
albeit in the context of non-atomic routing games.
The immediate utility of the proposed experimental setup is to provide a setting which induces
high rate of following of private route recommendation in traffic networks with uncertain travel
times. The fact that the experiment design mirrors setup for a learning model with provable con-
vergence to the Bayes correlated equilibrium induced by the obedient policy gives further credence
to this setting. In addition to providing evidence for overall convergence, the analysis also provides
insight into the empirical validity of key components of the learning model. Specifically, a strong correlation was found between the likelihood of a particular agent following a recommendation and the summary statistic of the experience of other agents encapsulated by the displayed rating. The empirical route choice distributions conditional on recommendation also seem to converge. The fact that only a moderate correlation was found in general between the average regret and the displayed rating, and that several participants self-reported following the recommendation even for medium values of the displayed rating, suggests that further investigation is needed to conclusively establish regret as a key driver in learning.
atomic settings could be of independent interest for experimental studies in non-atomic games,
especially with regards to correlated equilibria.
7.2 Future Work
There are several directions for future work. Regarding the computational properties of infor-
mation design, the bound in Theorem 1 may be computationally prohibitive for large networks.
Proposition 3, related discussion in Remark 7, and Section 4.1 on the other hand suggest the pos-
sibility of exploring problem structure to tighten the bound. A counterpart to Theorem 1 for public
signaling policies is open. A relatively unexplored direction is sub-optimality bounds for simple
classes of signaling policies such as diagonal atomic. Finally, it would be interesting to utilize the
approach in this thesis to quantify the reduction in price of anarchy under information design. This
will complement, e.g., preliminary analysis in [7].
In regard to the convergence in repeated setting, it would be interesting to extend the analysis
to general increasing latency functions, and to settings involving multiple destinations. The notion
of conditional universal consistency in [20] potentially offers a framework to unify and generalize
the specific decision models of the participating and non-participating agents considered in this
thesis. Finally, it would be interesting to consider a multiscale routing decision framework to
enable integration of traffic flow dynamics into the decision-theoretic analysis, e.g., as in [41].
As to the experimental study of the repeated game, directions for testing robustness of the ex-
perimental findings reported in this thesis include randomizing initial condition on display rating
for participants as well as the default value on the slider used for collecting review from the par-
ticipants, and relaxing explicit suggestions to the participants for following a specific route choice
strategy. It would also be interesting to repeat the experiments under an obedient policy which
induces a bad correlated equilibrium [29], i.e., which is (Pareto) inferior to the Bayes Wardrop
equilibrium. It would also be interesting to explore connection between our agent model in (5.1)-
(5.4) and the learning models in [32, 38].
References
1. Wang, H., Li, G., Hu, H., Chen, S., Shen, B., Wu, H., et al. R3: a real-time route recommen-
dation system. Proceedings of the VLDB Endowment 7, 1549–1552 (2014).
2. Herzog, D., Massoud, H. & Wörndl, W. Routeme: A mobile recommender system for person-
alized, multi-modal route planning in Proceedings of the 25th Conference on User Modeling,
Adaptation and Personalization (2017), 67–75.
3. Bergemann, D. & Morris, S. Information design: A unified perspective. Journal of Economic
Literature 57, 44–95 (2019).
4. Kamenica, E. & Gentzkow, M. Bayesian Persuasion. American Economic Review 101, 2590–
2615. doi:10.1257/aer.101.6.2590 (2011).
5. Dughmi, S. & Xu, H. Algorithmic Bayesian Persuasion in ACM Symposium on the Theory of
Computing (2016).
6. Acemoglu, D., Makhdoumi, A., Malekian, A. & Ozdaglar, A. Informational Braess’ Paradox:
The Effect of Information on Traffic Congestion. arXiv preprint arXiv:1601.02039 (2016).
7. Vasserman, S., Feldman, M. & Hassidim, A. Implementing the Wisdom of Waze in IJCAI
(2015), 660–666.
8. Das, S., Kamenica, E. & Mirka, R. Reducing congestion through information design in Aller-
ton Conference on Communication, Control and Computing (2017).
9. Wu, M. & Amin, S. Information Design for Regulating Traffic Flows under Uncertain Net-
work State in 2019 57th Annual Allerton Conference on Communication, Control, and Com-
puting (Allerton). Extended version at https://arxiv.org/abs/1908.07105 (2019), 671–678.
10. Tavafoghi, H. & Teneketzis, D. Strategic Information Provision in Routing Games Available
at https://hamidtavaf.github.io/infodesign_routing.pdf.
11. Massicot, O. & Langbort, C. Public Signals and Persuasion for Road Network Congestion
Games under Vagaries. IFAC-PapersOnLine 51, 124–130 (2019).
12. Lasserre, J. B. A semidefinite programming approach to the generalized problem of moments.
Mathematical Programming 112, 65–92 (2008).
13. Henrion, D., Lasserre, J.-B. & Löfberg, J. GloptiPoly 3: moments, optimization and semidef-
inite programming. Optimization Methods & Software 24, 761–779 (2009).
14. Bayer, C. & Teichmann, J. The proof of Tchakaloff’s theorem. Proceedings of the American
mathematical society 134, 3035–3040 (2006).
15. Lasserre, J. B. Global optimization with polynomials and the problem of moments. SIAM
Journal on optimization 11, 796–817 (2001).
16. Stein, N. D., Parrilo, P. A. & Ozdaglar, A. Correlated equilibria in continuous games: Char-
acterization and computation. Games and Economic Behavior 71, 436–455 (2011).
17. Mahmassani, H. S. & Jayakrishnan, R. System performance and user response under real-
time information in a congested traffic corridor. Transportation Research Part A: General
25, 293–307 (1991).
18. Wu, M., Amin, S. & Ozdaglar, A. E. Value of information systems in routing games. arXiv
preprint arXiv:1808.10590 (2018).
19. Foster, D. P. & Vohra, R. V. Calibrated learning and correlated equilibrium. Games and Economic Behavior 21, 40 (1997).
20. Fudenberg, D. & Levine, D. K. Conditional universal consistency. Games and Economic
Behavior 29, 104–130 (1999).
21. Hart, S. & Mas-Colell, A. A simple adaptive procedure leading to correlated equilibrium.
Econometrica 68, 1127–1150 (2000).
22. Bergemann, D. & Morris, S. Bayes correlated equilibrium and the comparison of information
structures in games. Theoretical Economics 11, 487–522 (2016).
23. Fischer, S., Räcke, H. & Vöcking, B. Fast convergence to Wardrop equilibria by adaptive
sampling methods. SIAM Journal on Computing 39, 3700–3735 (2010).
24. Blum, A., Even-Dar, E. & Ligett, K. Routing without regret: On convergence to Nash equi-
libria of regret-minimizing algorithms in routing games. Theory of Computing 6, 179–199
(2010).
25. Krichene, W., Drighès, B. & Bayen, A. M. Online learning of nash equilibria in congestion
games. SIAM Journal on Control and Optimization 53, 1056–1081 (2015).
26. Iida, Y., Akiyama, T. & Uchida, T. Experimental analysis of dynamic route choice behavior.
Transportation Research Part B: Methodological 26, 17–32 (1992).
27. Selten, R., Chmura, T., Pitz, T., Kube, S. & Schreckenberg, M. Commuters route choice
behaviour. Games and Economic Behavior 58, 394–406 (2007).
28. Cason, T. N. & Sharma, T. Recommended play and correlated equilibria: an experimental
study. Economic Theory 33, 11–27 (2007).
29. Duffy, J. & Feltovich, N. Correlated Equilibria, Good and Bad: An Experimental Study.
International Economic Review 51, 701–721 (2010).
30. Bone, J., Drouvelis, M. & Ray, I. Following Recommendation to Avoid Coordination-Failure
in 2 x 2 Games (Department of Economics, University of Birmingham, 2012).
31. Duffy, J., Lai, E. K. & Lim, W. Coordination via Correlation: An Experimental Study. Economic Theory 64, 265–304 (2017).
32. Arifovic, J., Boitnott, J. F. & Duffy, J. Learning correlated equilibria: An evolutionary ap-
proach. Journal of Economic Behavior & Organization 157, 171–190 (2019).
33. Anbarcı, N., Feltovich, N. & Gürdal, M. Y. Payoff Inequity Reduces the Effectiveness of Correlated-Equilibrium Recommendations. European Economic Review 108, 172–190 (2018).
34. Branston, D. Link capacity functions: A review. Transportation research 10, 223–236 (1976).
35. Zhu, Y. & Savla, K. Information Design in Non-atomic Routing Games with Partial Participation: Computation and Properties. Available at https://arxiv.org/abs/2005.03000.
36. Nesterov, Y. in High performance optimization 405–440 (Springer, 2000).
37. Vandenberghe, L. & Boyd, S. Semidefinite programming. SIAM review 38, 49–95 (1996).
38. Arifovic, J. & Ledyard, J. A behavioral model for mechanism design: Individual evolutionary
learning. Journal of Economic Behavior & Organization 78, 374–395 (2011).
39. Zhu, Y. & Savla, K. Participant Instruction Slides for the Experiment Long-term Route Choice Decisions under Personalized Recommendations. Available at https://viterbi-web.usc.edu/~ksavla/misc/OrientationSlides.pptx.
40. Zhu, Y. & Savla, K. Information Design in Non-atomic Routing Games with Partial Participation: Computation and Properties. IEEE Transactions on Control of Network Systems 9, 613–624 (2022).
41. Como, G., Savla, K., Acemoglu, D., Dahleh, M. A. & Frazzoli, E. Stability analysis of trans-
portation networks with multiscale driver decisions. SIAM Journal on Control and Optimiza-
tion 51, 230–252 (2013).
42. Fiacco, A. V. Introduction to sensitivity and stability analysis in nonlinear programming
(Elsevier, 1983).
43. Helton, J. W. & Nie, J. A semidefinite approach for truncated K-moment problems. Founda-
tions of Computational Mathematics 12, 851–881 (2012).
44. Curto, R. E. & Fialkow, L. A. Truncated K-moment problems in several variables. Journal of
Operator Theory, 189–226 (2005).
45. Abramowitz, M. & Stegun, I. A. Handbook of Mathematical Functions with Formulas, Graphs,
and Mathematical Tables, Chapter 5 228 (US Government printing office, 1964).
46. Dempe, S. & Vogel, S. The Subdifferential of the Optimal Solution in Parametric Optimization. Technical report, Fachbereich Mathematik, Technische Universität Chemnitz, Chemnitz, Germany (1997).
Appendices
A Matrix Expressions
In the matrices below, the lower triangular entries, generically represented as $*$, are equal to their upper triangular counterparts, and $e_i$ is the standard $i$-th basis vector in $\mathbb{R}^n$, i.e., its $i$-th entry is one and all the other entries are zero.
Expressions for matrices in (3.1) when $D=1$ are as follows:

$$C_\omega(y) = \mu_0(\omega)\begin{bmatrix} y^T\mathrm{diag}(\alpha_{1,\omega})\,y + y^T\alpha_{0,\omega} & \dfrac{\alpha_{0,\omega}^T}{2} + y^T\mathrm{diag}(\alpha_{1,\omega}) \\ * & \mathrm{diag}(\alpha_{1,\omega}) \end{bmatrix},$$

where $\alpha_{d,\omega} = [\alpha_{d,\omega,1},\dots,\alpha_{d,\omega,n}]^T$, $d=0,1$.
Expressions for matrices in (3.4) when $D=1$ are as follows, with row and column blocks indexed by $1, x^{\omega_1},\dots,x^{\omega_s}, y$:

$$C = \begin{bmatrix}
0 & \frac{\mu_0(\omega_1)}{2}\alpha_{0,\omega_1}^T & \cdots & \frac{\mu_0(\omega_s)}{2}\alpha_{0,\omega_s}^T & \frac{\alpha_0^T}{2} \\
* & \mu_0(\omega_1)\,\mathrm{diag}(\alpha_{1,\omega_1}) & \cdots & 0 & \mu_0(\omega_1)\,\mathrm{diag}(\alpha_{1,\omega_1}) \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
* & * & \cdots & \mu_0(\omega_s)\,\mathrm{diag}(\alpha_{1,\omega_s}) & \mu_0(\omega_s)\,\mathrm{diag}(\alpha_{1,\omega_s}) \\
* & * & \cdots & * & \mathrm{diag}(\alpha_1)
\end{bmatrix},$$

$$\alpha_d := \sum_\omega \mu_0(\omega)\,\alpha_{d,\omega}, \qquad \alpha_{d,\omega} := [\alpha_{d,\omega,1},\dots,\alpha_{d,\omega,n}]^T.$$
$$A^{(i,j)} = \begin{bmatrix}
0 & \mu_0(\omega_1)\frac{\alpha_{0,\omega_1,j}-\alpha_{0,\omega_1,i}}{2}e_i^T & \cdots & \mu_0(\omega_s)\frac{\alpha_{0,\omega_s,j}-\alpha_{0,\omega_s,i}}{2}e_i^T & 0 \\
* & \tilde{A}^{(i,j)}_{\omega_1} + \big(\tilde{A}^{(i,j)}_{\omega_1}\big)^T & \cdots & 0 & \tilde{A}^{(i,j)}_{\omega_1} \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
* & * & \cdots & \tilde{A}^{(i,j)}_{\omega_s} + \big(\tilde{A}^{(i,j)}_{\omega_s}\big)^T & \tilde{A}^{(i,j)}_{\omega_s} \\
* & * & \cdots & * & 0
\end{bmatrix},$$
$$\tilde{A}^{(i,j)}_\omega = \mu_0(\omega)\left(\frac{\alpha_{1,\omega,j}}{2}\,e_i e_j^T - \frac{\alpha_{1,\omega,i}}{2}\,e_i e_i^T\right), \qquad
B^{(i,j)} = \begin{bmatrix}
0 & 0 & \cdots & 0 & \frac{\alpha_{0,j}-\alpha_{0,i}}{2}e_i^T \\
* & 0 & \cdots & 0 & \tilde{A}^{(i,j)}_{\omega_1} \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
* & * & \cdots & 0 & \tilde{A}^{(i,j)}_{\omega_s} \\
* & * & \cdots & * & \sum_k \tilde{A}^{(i,j)}_{\omega_k}
\end{bmatrix},$$
$$S^{(k)}_x=\begin{bmatrix} -\nu & 0 & \cdots & \mathbf{1}^T/2 & \cdots & 0 & 0\\ * & 0 & \cdots & 0 & \cdots & 0 & 0\\ \vdots & \vdots & \ddots & \vdots & & \vdots & \vdots\\ * & * & \cdots & * & \cdots & * & 0 \end{bmatrix},\qquad S_y=\begin{bmatrix} \nu-1 & 0 & \cdots & 0 & \mathbf{1}^T/2\\ * & 0 & \cdots & 0 & 0\\ \vdots & \vdots & \ddots & \vdots & \vdots\\ * & * & \cdots & * & 0 \end{bmatrix},$$

where the block $\mathbf{1}^T/2$ in $S^{(k)}_x$ sits in the first row at the column indexed by $x^{\omega_k}$, and in $S_y$ at the column indexed by $y$; all remaining entries are zero.
$$T^{(i,k)}_x=\begin{bmatrix} 0 & 0 & \cdots & -\frac{\nu e_i^T}{2} & \cdots & 0 & 0\\ * & 0 & \cdots & 0 & \cdots & 0 & 0\\ \vdots & \vdots & & \vdots & & \vdots & \vdots\\ * & * & \cdots & \frac{\mathbf{1}e_i^T+e_i\mathbf{1}^T}{2} & \cdots & 0 & 0\\ \vdots & \vdots & & \vdots & \ddots & \vdots & \vdots\\ * & * & \cdots & * & \cdots & * & 0 \end{bmatrix},\qquad T^{(i)}_y=\begin{bmatrix} 0 & 0 & \cdots & 0 & -\frac{(1-\nu)e_i^T}{2}\\ * & 0 & \cdots & 0 & 0\\ \vdots & \vdots & \ddots & \vdots & \vdots\\ * & * & \cdots & 0 & \frac{\mathbf{1}e_i^T+e_i\mathbf{1}^T}{2} \end{bmatrix},$$

where, in $T^{(i,k)}_x$, the blocks $-\frac{\nu e_i^T}{2}$ (first row) and $\frac{\mathbf{1}e_i^T+e_i\mathbf{1}^T}{2}$ (diagonal) sit at the position indexed by $x^{\omega_k}$; in $T^{(i)}_y$, the corresponding blocks sit at the position indexed by $y$; all remaining entries are zero.
B Proof of Proposition 1
It is easy to see that (2.10) is convex since the link latency functions are non-decreasing. Therefore, the KKT conditions for optimality become necessary and sufficient, and can be shown to be equivalent to the BNE condition in (2.9) following standard arguments.
In order to establish the second part of the proposition, consider the following set of aggregate link flows:

$$\tilde X:=\Big\{\tilde x^{(k)}\in\mathcal P(1),\ k\in[m]\ :\ \tilde x^{(k)}=x^{(k)}+y,\ x^{(k)}\in\mathcal P(\nu),\ k\in[m],\ y\in\mathcal P(1-\nu)\Big\} \tag{B.1a}$$
$$\phantom{\tilde X:}=\Big\{\tilde x^{(k)}\in\mathcal P(1),\ k\in[m]\ :\ \sum_i\min_k\tilde x^{(k)}_i\ge 1-\nu\Big\} \tag{B.1b}$$

(B.1b) follows from the following. For every $(\{x^{(k)}:k\in[m]\},y)$, $\sum_i\min_k(x^{(k)}_i+y_i)\ge\sum_i y_i=1-\nu$. Vice-versa, for every $\tilde x$ satisfying the inequality in (B.1b), let $\tilde y_i:=\min_k\tilde x^{(k)}_i$, $i\in[n]$. With this, $y$ given by $y_i=\frac{\tilde y_i}{\sum_j\tilde y_j}(1-\nu)$ belongs to $\mathcal P(1-\nu)$, and $x^{(k)}=\tilde x^{(k)}-y\in\mathcal P(\nu)$, $k\in[m]$.
Convexity of $\tilde X$ is established as follows. Consider any $\tilde x^{(1,k)}$ and $\tilde x^{(2,k)}$, $k\in[m]$, belonging to $\tilde X$. Then, for all $\beta\in[0,1]$,

$$\sum_i\min_k\Big(\beta\tilde x^{(1,k)}_i+(1-\beta)\tilde x^{(2,k)}_i\Big)=\sum_i\Big(\beta\tilde x^{(1,k(i))}_i+(1-\beta)\tilde x^{(2,k(i))}_i\Big)\ge\beta\sum_i\min_k\tilde x^{(1,k)}_i+(1-\beta)\sum_i\min_k\tilde x^{(2,k)}_i\ge 1-\nu,$$

where $k(i)$ in the first equality is a $k\in[m]$ for which $\beta\tilde x^{(1,k)}_i+(1-\beta)\tilde x^{(2,k)}_i$ achieves the smallest value.
Now consider the following:

$$\min_{\{\tilde x^{(k)}:\,k\in[m]\}\in\tilde X}\ \sum_{i,\omega}\int_{\bar x}\int_0^{\sum_k\bar x_k\tilde x^{(k)}_i}\ell_{\omega,i}(z)\,\pi^{ind}_\omega(\bar x)\,\mu_0(\omega)\,dz\,d\bar x \tag{B.2}$$

where we use (B.1b) for $\tilde X$. Recall that $\tilde X$ is convex. The cost function in (B.2), say $F(\tilde x)$, can be shown to be strictly convex for all $\tilde x\in\tilde X$ as follows. The generic entry of the Hessian of $F$ is given by:

$$\frac{\partial^2F}{\partial\tilde x^{(k_1)}_h\,\partial\tilde x^{(k_2)}_j}=\begin{cases}\displaystyle\sum_\omega\int_{\bar x}\bar x_{k_1}\bar x_{k_2}\,\ell'_{\omega,h}\Big(\sum_k\bar x_k\tilde x^{(k)}_h\Big)\,\pi^{ind}_\omega(\bar x)\,\mu_0(\omega)\,d\bar x, & h=j\\ 0, & h\neq j\end{cases}$$

Therefore, $\tilde x^T\nabla^2F\,\tilde x=\sum_{i,\omega}\int_{\bar x}\big(\sum_k\bar x_k\tilde x^{(k)}_i\big)^2\,\ell'_{\omega,i}\big(\sum_k\bar x_k\tilde x^{(k)}_i\big)\,\pi^{ind}_\omega(\bar x)\,\mu_0(\omega)\,d\bar x>0$ for all $\tilde x\in\tilde X$, where the inequality holds because the integrand is non-negative, and for every $\bar x\in\mathcal P_m$, $\sum_k\bar x_k\tilde x^{(k)}_i>0$ for at least one $i$.

Therefore, (B.2) is strictly convex. Following the definition of $\tilde X$ in (B.1a), it is also easy to see that (B.2) and (2.10) give the same solution. Therefore, for every global minimum of (2.10), i.e., BNE flow, there exists a unique aggregate link flow in $\tilde X$, which is the unique global minimum of (B.2).
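The potential-minimization characterization above can be made concrete numerically. The sketch below is a simplified stand-in for (B.2) — a deterministic two-link instance with affine latencies and unit total demand, all coefficient values illustrative rather than from the thesis — which minimizes the Beckmann-style potential over the simplex and recovers the equilibrium property that all used links carry equal latency:

```python
import numpy as np
from scipy.optimize import minimize

# Two parallel links with affine latencies l_i(x) = a_i*x + b_i
# (illustrative values; the thesis version also averages over states
# and over the forecast distribution pi^ind).
a = np.array([2.0, 1.0])
b = np.array([0.0, 1.0])

def potential(x):
    # Beckmann potential: sum_i \int_0^{x_i} (a_i z + b_i) dz
    return np.sum(0.5 * a * x**2 + b * x)

cons = [{"type": "eq", "fun": lambda x: x.sum() - 1.0}]  # total demand 1
res = minimize(potential, x0=np.array([0.5, 0.5]),
               bounds=[(0, 1), (0, 1)], constraints=cons)
x_star = res.x
lat = a * x_star + b
print(x_star, lat)
```

For these coefficients the minimizer is $x^*=(2/3,1/3)$ with both link latencies equal to $4/3$, consistent with the equilibrium condition.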
C Proof of Proposition 2
Substituting $x_2=\nu-x_1$, (2.7) can be re-written in terms of probability measures $\tilde\pi=\{\tilde\pi_\omega:\omega\in\Omega\}$ over the single variable $x_1$, with the only constraint that each entry of $\tilde\pi$ is supported over $[0,\nu]$. Let $\tilde\eta:=\{(\tilde\eta^0_\omega,\ldots,\tilde\eta^{D+1}_\omega):\omega\in\Omega\}$ be the reals corresponding to the first $D+1$ moments of $\tilde\pi$. The cost function in (3.1a) and the constraints in (3.1b)-(3.1c) can be expressed as linear combinations of elements of $\tilde\eta$. The additional constraint that the elements of $\tilde\eta$ have to correspond to the first $D+1$ moments of probability measures supported on $[0,\nu]$ can be written in terms of linear equations and semidefinite matrix constraints, e.g., see [16, Proposition A.6].
D Proof of Theorem 1
We refer to Appendix G for the definition of a truncated moment sequence used in this proof.
Substituting $x_n=\nu-\sum_{i\in[n-1]}x_i$ and $y_n=1-\nu-\sum_{i\in[n-1]}y_i$, (2.7) can be equivalently rewritten in terms of $(x_1,\ldots,x_{n-1})$ and $(y_1,\ldots,y_{n-1})$. We use this reduced form of (2.7) for this proof. Let $(\pi^*,y^*)$ be an optimal solution to (2.7). The polynomials in $x$ appearing in the cost and constraints in (2.7) have highest degree $D+1$. Consider a $(\tilde\pi^*,y^*)$, where, for every $\omega\in[s]$, $\tilde\pi^*_\omega$ has the same truncated moment sequence of degree $D+1$ as $\pi^*_\omega$. Such a $(\tilde\pi^*,y^*)$ satisfies the constraints in (2.7) and gives the same cost value as $(\pi^*,y^*)$, and is therefore also optimal. The theorem then follows from [14, Theorem 2], according to which a truncated moment sequence in $n-1$ variables of degree $D+1$ admits a feasible measure if and only if it admits a feasible measure with support consisting of at most $\binom{D+n}{D+1}$ atoms.
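The atom bound $\binom{D+n}{D+1}$ in the theorem is straightforward to tabulate; a small sketch (the function name is ours):

```python
from math import comb

# Upper bound on the number of atoms needed by an optimal signaling
# policy: a degree-(D+1) truncated moment sequence in n-1 variables is
# representable with at most C(D+n, D+1) atoms ([14, Theorem 2]).
def atom_bound(n, D):
    return comb(D + n, D + 1)

print(atom_bound(2, 1))  # two links, affine latencies -> 3
print(atom_bound(3, 2))  # three links, quadratic latencies -> 10
```

The bound grows polynomially in the number of links for fixed degree, which is what makes the atomic reformulation computationally useful.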
E Proof of Proposition 3
Equivalence between (3.3) and (2.7)
Let $(\pi^*,y^*)$ be an optimal solution to (2.7). We show that, for every $y\in\mathcal P(1-\nu)$, there exists an optimal solution to (3.1) which is diagonal atomic. When specialized to $y^*$, this establishes the equivalence.

It is sufficient to show that for every $\pi=\{\pi_\omega:\omega\in\Omega\}$ feasible for (3.1), the diagonal atomic $\pi^{at}:=\{\pi^{at}_\omega(x)=\delta(x-\mathbb E_{\pi_\omega}(x)):\omega\in\Omega\}$ is also feasible and satisfies $J(\pi^{at})\le J(\pi)$.

For every $y$,

$$J(\pi)-J(\pi^{at})=\sum_{\omega,i}\mu_0(\omega)\,y_i\Big(\int\ell_{\omega,i}(x_i)\,\pi_\omega(x)\,dx-\int\ell_{\omega,i}(x_i)\,\pi^{at}_\omega(x)\,dx\Big)+\sum_{\omega,i}\mu_0(\omega)\Big(\int x_i\,\ell_{\omega,i}(x_i)\,\pi_\omega(x)\,dx-\int x_i\,\ell_{\omega,i}(x_i)\,\pi^{at}_\omega(x)\,dx\Big)\ge 0$$

where the inequality follows from Jensen's inequality due to convexity of $\ell_{\omega,i}$ (since it is affine) and of $x_i\,\ell_{\omega,i}(x_i)$; it is easy to see that the latter follows from the convexity of $\ell_{\omega,i}$.
(3.1b) for $\pi$, $i=1$ and $j=2$ is:

$$\sum_\omega\int\Big(\alpha_{1,\omega,1}x_1^2-\alpha_{1,\omega,2}x_2x_1+\alpha_{1,\omega,1}x_1y_1-\alpha_{1,\omega,2}x_1y_2+\alpha_{0,\omega,1}x_1-\alpha_{0,\omega,2}x_1\Big)\pi_\omega(x)\,dx\ \mu_0(\omega)\le 0$$

Plugging $x_2=\nu-x_1$, this is equivalent to:

$$\sum_\omega\Big((\alpha_{1,\omega,1}+\alpha_{1,\omega,2})\int x_1^2\,\pi_\omega(x)\,dx+(\alpha_{1,\omega,1}y_1+\alpha_{0,\omega,1}-\nu\alpha_{1,\omega,2}-y_2\alpha_{1,\omega,2}-\alpha_{0,\omega,2})\int x_1\,\pi_\omega(x)\,dx\Big)\mu_0(\omega)\le 0$$
$\int x_1\,\pi_\omega(x)\,dx=\int x_1\,\pi^{at}_\omega(x)\,dx$ by definition, and $\int x_1^2\,\pi_\omega(x)\,dx\ge\big(\int x_1\,\pi_\omega(x)\,dx\big)^2=\int x_1^2\,\pi^{at}_\omega(x)\,dx$ by Jensen's inequality. Therefore,

$$\sum_\omega\Big((\alpha_{1,\omega,1}+\alpha_{1,\omega,2})\int x_1^2\,\pi^{at}_\omega(x)\,dx+(\alpha_{1,\omega,1}y_1+\alpha_{0,\omega,1}-\nu\alpha_{1,\omega,2}-y_2\alpha_{1,\omega,2}-\alpha_{0,\omega,2})\int x_1\,\pi^{at}_\omega(x)\,dx\Big)\mu_0(\omega)\le 0$$

which is equivalent to (3.1b) for $\pi^{at}$, $i=1$ and $j=2$. The proof for $i=2$ and $j=1$ is identical.
The coefficients in $B^{(i,j)}_\omega$ corresponding to the quadratic terms are zero and therefore,

$$\int B^{(i,j)}_\omega\cdot zz^T\,\pi_\omega(x)\,dx=\int B^{(i,j)}_\omega\cdot zz^T\,\pi^{at}_\omega(x)\,dx$$

Hence, $\pi^{at}$ satisfies (3.1c) trivially.
Equivalence between (3.3) and (3.4)
(3.3) is equivalent to:
$$\begin{aligned}\min_{\hat\pi\in\hat\Pi}\quad&\int C\cdot\hat Z\,d\hat\pi\\ \text{s.t.}\quad&\int A^{(i,j)}\cdot\hat Z\,d\hat\pi\ge 0,\quad i,j\in[n]\\ &\int B^{(i,j)}\cdot\hat Z\,d\hat\pi\ge 0,\quad i,j\in[n]\\ &\hat\pi\text{ is 1-atomic}\end{aligned} \tag{E.1}$$

where the expressions for the symmetric matrices $C$, $A^{(i,j)}$ and $B^{(i,j)}$ are in Appendix A,

$$\hat Z=\begin{bmatrix}1&\hat z^T\\ \hat z&\hat z\hat z^T\end{bmatrix},\qquad \hat z=[x^{\omega_1}_1,\ldots,x^{\omega_1}_n,\ \ldots,\ x^{\omega_s}_1,\ldots,x^{\omega_s}_n,\ y_1,\ldots,y_n]^T,$$

and $\hat\Pi$ is the set of probability distributions over $\hat z$ satisfying $x^{\omega_k}\in\mathcal P(\nu)$ for all $k\in[s]$ and $y\in\mathcal P(1-\nu)$.
It therefore suffices to establish the equivalence between (E.1) and (3.4). We do this via a constrained version of (3.4):

$$\min_{M\succeq 0}\ \hat J(M)\quad\text{s.t.}\quad (3.4b)-(3.4g),\ \operatorname{rank}(M)=1 \tag{E.2}$$

Specifically, (a) for every $\hat\pi$ feasible for (E.1), $M(\hat\pi):=\int\hat Z\,d\hat\pi$ is feasible for (E.2), and hence also for (3.4); (b) for every $M=\begin{bmatrix}1&\hat\eta^T\\ \hat\eta&\hat\eta\hat\eta^T\end{bmatrix}$ feasible for (E.2), $\hat\pi=\delta(\hat z-\hat\eta)$ is feasible for (E.1); and (c) there exists an optimal solution $M^*$ for (3.4) such that $\operatorname{rank}(M^*)=1$. (a) and (b) together imply the equivalence between (E.1) and (E.2), and (c) implies the equivalence between (E.2) and (3.4). The proofs are as follows.
(a) For a 1-atomic $\hat\pi$, $M(\hat\pi)=\begin{bmatrix}1&\int\hat z^T\,d\hat\pi\\ \int\hat z\,d\hat\pi&\int\hat z\,d\hat\pi\int\hat z^T\,d\hat\pi\end{bmatrix}$, implying that $M(\hat\pi)$ is rank one and positive semidefinite. $M(\hat\pi)$ satisfying (3.4b) and (3.4c) follows from the corresponding constraints in (E.1). (3.4d) follows from the definition of $M(\hat\pi)$, and the rest of the constraints in (3.4) follow from constraints on the support of $\hat\pi$.

(b) Proposition 7 implies that the 1-atomic $\hat\pi=\delta(\hat z-\hat\eta)$ belongs to $\hat\Pi$. Simple algebra shows the equivalence between the other constraints in (E.1) and the corresponding constraints in (3.4).

(c) It is sufficient to show that, for every $M=\begin{bmatrix}1&\hat\eta^T\\ \hat\eta&M^0\end{bmatrix}$ feasible for (3.4), the rank one $\hat M=\begin{bmatrix}1&\hat\eta^T\\ \hat\eta&\hat\eta\hat\eta^T\end{bmatrix}$ is also feasible and satisfies $\hat J(M)\ge\hat J(\hat M)$. $\hat J(M)-\hat J(\hat M)=C^0\cdot(M^0-\hat\eta\hat\eta^T)$, where $C^0$ is the principal submatrix of $C$ obtained by removing the first row and the first column. $M\succeq 0$ implies $M^0-\hat\eta\hat\eta^T\succeq 0$. It is easy to see that $C^0$ is positive semidefinite. Since the inner product of positive semidefinite matrices is non-negative, this implies that $\hat J(M)-\hat J(\hat M)\ge 0$.
Feasibility of (3.4d)-(3.4e) follows from the definition of $\hat M$. It is easy to see that $S^{(k)}_x\cdot\hat M=S^{(k)}_x\cdot M$ and $S_y\cdot\hat M=S_y\cdot M$, and therefore (3.4f) is also satisfied. Also, for all $i\in[n]$ and $k\in[s]$,

$$T^{(i,k)}_x\cdot\hat M=-\nu\hat\eta_i+(\mathbf 1e_i^T)\cdot(\hat\eta\hat\eta^T)=-\nu\hat\eta_i+\sum_j\hat\eta_i\hat\eta_j=-\nu\hat\eta_i+\nu\hat\eta_i=0$$

Similarly, $T^{(i)}_y\cdot\hat M=0$ for all $i\in[n]$, implying (3.4g) is satisfied by $\hat M$. (3.4b) for $M$, for $i=1$ and $j=2$, is:

$$\sum_k\Big(\alpha_{1,\omega_k,1}M^0_{2(k-1)+1,2(k-1)+1}-\alpha_{1,\omega_k,2}M^0_{2(k-1)+1,2k}+\alpha_{1,\omega_k,1}M^0_{2(k-1)+1,2s+1}-\alpha_{1,\omega_k,2}M^0_{2(k-1)+1,2s+2}+(\alpha_{0,\omega_k,1}-\alpha_{0,\omega_k,2})\hat\eta_{2(k-1)+1}\Big)\mu_0(\omega_k)\le 0$$

Plugging

$$M^0_{2(k-1)+1,2k}=\nu\hat\eta_{2(k-1)+1}-M^0_{2(k-1)+1,2(k-1)+1},\qquad M^0_{2(k-1)+1,2s+2}=(1-\nu)\hat\eta_{2(k-1)+1}-M^0_{2(k-1)+1,2s+1},$$

this is equivalent to

$$\sum_k\Big((\alpha_{1,\omega_k,1}+\alpha_{1,\omega_k,2})\big(M^0_{2(k-1)+1,2(k-1)+1}+M^0_{2(k-1)+1,2s+1}\big)+(\alpha_{0,\omega_k,1}-\alpha_{0,\omega_k,2}-\alpha_{1,\omega_k,2})\hat\eta_{2(k-1)+1}\Big)\mu_0(\omega_k)\le 0 \tag{E.3}$$

$$M\succeq 0\implies\begin{bmatrix}1&\hat\eta_{2(k-1)+1}&\hat\eta_{2s+1}\\ *&M^0_{2(k-1)+1,2(k-1)+1}&M^0_{2(k-1)+1,2s+1}\\ *&*&M^0_{2s+1,2s+1}\end{bmatrix}\succeq 0\implies\begin{bmatrix}M^0_{2(k-1)+1,2(k-1)+1}&M^0_{2(k-1)+1,2s+1}\\ *&M^0_{2s+1,2s+1}\end{bmatrix}-\begin{bmatrix}\hat\eta_{2(k-1)+1}\\ \hat\eta_{2s+1}\end{bmatrix}\begin{bmatrix}\hat\eta_{2(k-1)+1}&\hat\eta_{2s+1}\end{bmatrix}\succeq 0$$
Inner product with $\begin{bmatrix}\alpha_{1,\omega_k,1}+\alpha_{1,\omega_k,2}&\alpha_{1,\omega_k,1}+\alpha_{1,\omega_k,2}\\ 0&0\end{bmatrix}$ gives

$$(\alpha_{1,\omega_k,1}+\alpha_{1,\omega_k,2})\big(M^0_{2(k-1)+1,2(k-1)+1}+M^0_{2(k-1)+1,2s+1}\big)\ge(\alpha_{1,\omega_k,1}+\alpha_{1,\omega_k,2})\big(\hat\eta^2_{2(k-1)+1}+\hat\eta_{2(k-1)+1}\hat\eta_{2s+1}\big) \tag{E.4}$$

Plugging into (E.3) implies that (3.4b) is satisfied by $\hat M$ for $i=1$ and $j=2$. The proof for $i=2$ and $j=1$, as well as for (3.4c), follows similarly.
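The rank-one structure exploited in steps (a)-(c) can be checked numerically. The sketch below, with an illustrative moment vector $\hat\eta$ (values ours), builds the second-order moment matrix of a 1-atomic measure $\hat\pi=\delta(\hat z-\hat\eta)$ and verifies that it is positive semidefinite with rank one:

```python
import numpy as np

# For a 1-atomic measure, M = [[1, eta^T], [eta, eta eta^T]] -- the
# moment matrix is the outer product of [1, eta] with itself.
eta = np.array([0.3, 0.7, 0.5, 0.5])   # illustrative x^{omega} and y blocks
M = np.block([[np.ones((1, 1)), eta[None, :]],
              [eta[:, None], np.outer(eta, eta)]])

rank = np.linalg.matrix_rank(M)
eigmin = np.linalg.eigvalsh(M).min()
print(rank, eigmin)
```

Conversely, any PSD matrix of this bordered form with rank one factors as $[1,\hat\eta^T]^T[1,\hat\eta^T]$, which is what licenses reading off the atom $\hat\eta$ in step (b).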
F Proof of Theorem 2
For every $\nu\in[0,1]$, the feasible set for (3.3) is non-empty. Among the constraints that characterize the feasible set, the only ones which depend on $\nu$ are the linear equalities and inequalities associated with the characterization of $\mathcal P(\nu)$ and $\mathcal P(1-\nu)$, and these are continuous in $\nu$. Therefore, the feasible set is continuous in $\nu\in[0,1]$.¹ Furthermore, continuity of link latency functions implies that $J^{diag}(x,y)$ is continuous. Therefore, [42, Theorem 2.2.2] implies that $J^{diag,*}(\nu)$ is continuous and the set of optimal solutions to (3.3) is upper semi-continuous in $\nu\in[0,1]$, which in turn implies that there exists an optimal solution $(x^*(\nu),y^*(\nu))$ which is continuous in $\nu\in[0,1]$. For such a solution, $C^*_{ij}(\nu):=\sum_\omega\mu_0(\omega)\big(\ell_{\omega,i}(x^{*,\omega}_i(\nu)+y^*_i(\nu))-\ell_{\omega,j}(x^{*,\omega}_j(\nu)+y^*_j(\nu))\big)$, $i,j\in[n]$, are also continuous. Consequently, almost every $\nu\in[0,1]$ belongs to a non-zero interval over which, for every $i,j\in[n]$, $C^*_{ij}(\nu)$ is either non-positive or positive. Since $J^{diag,*}(\nu)$ is continuous, it suffices to show that $J^{diag,*}(\nu)$ is monotonically non-increasing over such intervals.

Consider one such interval $[\nu_1,\nu_2]\subseteq[0,1]$, and define the following over it, for $\varepsilon\in\mathcal P_n$:

$$x^\omega_i(\varepsilon,\nu)=x^{*,\omega}_i(\nu_1)+\varepsilon_i(\nu-\nu_1),\qquad y_i(\varepsilon,\nu)=y^*_i(\nu_1)-\varepsilon_i(\nu-\nu_1),\qquad i\in[n],\ \omega\in\Omega \tag{F.1}$$

¹We forego excessive formalism in arguing about continuity of the feasible set and the optimal solution set with respect to $\nu\in[0,1]$. A formal argument would require defining these sets as point-to-set mappings and studying the continuity of such mappings, e.g., see [42, Definition 2.2.1], but would not add further insight.
(F.1) implies that, for all $\varepsilon\in\mathcal P_n$ and $\nu\in[\nu_1,\nu_2]$,

$$x^\omega_i(\varepsilon,\nu)+y_i(\varepsilon,\nu)=x^{*,\omega}_i(\nu_1)+y^*_i(\nu_1),\qquad i\in[n],\ \omega\in\Omega$$
$$C_{ij}(\varepsilon,\nu):=\sum_\omega\mu_0(\omega)\big(\ell_{\omega,i}(x^\omega_i(\varepsilon,\nu)+y_i(\varepsilon,\nu))-\ell_{\omega,j}(x^\omega_j(\varepsilon,\nu)+y_j(\varepsilon,\nu))\big)=\sum_\omega\mu_0(\omega)\big(\ell_{\omega,i}(x^{*,\omega}_i(\nu_1)+y^*_i(\nu_1))-\ell_{\omega,j}(x^{*,\omega}_j(\nu_1)+y^*_j(\nu_1))\big)=C^*_{ij}(\nu_1),\qquad i,j\in[n] \tag{F.2}$$
(3.3a) and (F.2) imply that, for all $\varepsilon\in\mathcal P_n$ and $\nu\in[\nu_1,\nu_2]$,

$$J^{diag}(\varepsilon,\nu):=J^{diag}(x(\varepsilon,\nu),y(\varepsilon,\nu))=\sum_{i,\omega}\mu_0(\omega)\big(x^{*,\omega}_i(\nu_1)+y^*_i(\nu_1)\big)\,\ell_{\omega,i}\big(x^{*,\omega}_i(\nu_1)+y^*_i(\nu_1)\big)=J^{diag,*}(\nu_1)$$

If $(x(\varepsilon,\nu),y(\varepsilon,\nu))$ is feasible, then $J^{diag,*}(\nu)\le J^{diag}(x(\varepsilon,\nu),y(\varepsilon,\nu))=J^{diag,*}(\nu_1)$, thereby establishing the theorem. We now establish feasibility of $(x(\varepsilon,\nu),y(\varepsilon,\nu))$.
It is straightforward to check that $\sum_i x^\omega_i(\varepsilon,\nu)=\nu$ and $\sum_i y_i(\varepsilon,\nu)=1-\nu$ for all $\varepsilon\in\mathcal P_n$ and $\omega$. By construction in (F.1), $x^\omega_i(\varepsilon,\nu)\ge x^{*,\omega}_i(\nu_1)\ge 0$ for all $i,\omega$, where the second inequality follows from optimality, and hence feasibility, of $x^{*,\omega}_i(\nu_1)$. Noting from (F.1) that $y_i(\varepsilon,\nu)$, $i\in[n]$, is non-increasing in $\nu$, its non-negativity is ensured for all $\nu$ by ensuring non-negativity for $\nu=\nu_2$. This corresponds to choosing:

$$\varepsilon\in\big\{\tilde\varepsilon\in\mathcal P_n\ \big|\ \tilde\varepsilon_i\le y^*_i(\nu_1)/(\nu_2-\nu_1),\ i\in[n]\big\} \tag{F.3}$$

The set in (F.3) is non-empty because it contains $\varepsilon=y^*(\nu_1)/(1-\nu_1)$. The feasibility of the inequalities in (3.3b)-(3.3c) is established for a given $(i,j)$ by considering the sign of $C^*_{ij}(\nu_1)$ as follows.
• $C^*_{ij}(\nu_1)\le 0$. (F.2) implies $C_{ij}(\varepsilon,\nu)\le 0$, which in turn implies that (3.3c) holds true for $(i,j)$. Feasibility of (3.3b) for $(i,j)$ also follows from (F.2):

$$\begin{aligned}
&\sum_\omega\mu_0(\omega)\,x^\omega_i(\varepsilon,\nu)\big(\ell_{\omega,i}(x^\omega_i(\varepsilon,\nu)+y_i(\varepsilon,\nu))-\ell_{\omega,j}(x^\omega_j(\varepsilon,\nu)+y_j(\varepsilon,\nu))\big)\\
&\quad=\sum_\omega\mu_0(\omega)\,x^{*,\omega}_i(\nu_1)\big(\ell_{\omega,i}(x^{*,\omega}_i(\nu_1)+y^*_i(\nu_1))-\ell_{\omega,j}(x^{*,\omega}_j(\nu_1)+y^*_j(\nu_1))\big)+\varepsilon_i(\nu-\nu_1)\,C^*_{ij}(\nu_1)\\
&\quad\le\sum_\omega\mu_0(\omega)\,x^{*,\omega}_i(\nu_1)\big(\ell_{\omega,i}(x^{*,\omega}_i(\nu_1)+y^*_i(\nu_1))-\ell_{\omega,j}(x^{*,\omega}_j(\nu_1)+y^*_j(\nu_1))\big)\le 0,
\end{aligned}$$

where the last inequality follows from the feasibility of $(x^*(\nu_1),y^*(\nu_1))$.

• $C^*_{ij}(\nu_1)>0$, and hence $y^*_i(\nu_1)=0$. The only feasible choice in (F.3) in this case is $\varepsilon_i=0$. Therefore, $y_i(0,\nu)=y^*_i(\nu_1)=0$. (F.2) implies $C_{ij}(0,\nu)>0$, and hence (3.3c) is satisfied with equality for $(i,j)$. Furthermore,

$$\sum_\omega\mu_0(\omega)\,x^\omega_i(0,\nu)\big(\ell_{\omega,i}(x^\omega_i(0,\nu)+y_i(0,\nu))-\ell_{\omega,j}(x^\omega_j(0,\nu)+y_j(0,\nu))\big)=\sum_\omega\mu_0(\omega)\,x^{*,\omega}_i(\nu_1)\big(\ell_{\omega,i}(x^{*,\omega}_i(\nu_1)+y^*_i(\nu_1))-\ell_{\omega,j}(x^{*,\omega}_j(\nu_1)+y^*_j(\nu_1))\big)\le 0,$$

which establishes (3.3b) for $(i,j)$.
G Technical Results
We need additional definitions for the next result. These are adapted from [43]. A truncated moment sequence (tms) in $\tilde n$ variables and of degree $d$ is a finite sequence $t=(t_a)$ indexed by nonnegative integer vectors $a:=(a_1,\ldots,a_{\tilde n})\in\mathbb N^{\tilde n}$ with $|a|:=a_1+\ldots+a_{\tilde n}\le d$. Given a set $K$, a tms $t$ is said to admit a $K$-probability measure $\zeta$, i.e., a nonnegative Borel measure supported in $K$ with $\int_Kd\zeta=1$, if

$$t_a=\int_K\hat z^a\,d\zeta,\qquad\forall a\in\mathbb N^{\tilde n}:|a|\le d,$$

where $\hat z^a=\hat z_1^{a_1}\cdots\hat z_{\tilde n}^{a_{\tilde n}}$ for $\hat z=(\hat z_1,\ldots,\hat z_{\tilde n})$.
We are interested in tms of degree 2. Accordingly, for brevity in notation, for $i,j\in[\tilde n]$, let

$$t_i:=t_{e_i},\qquad t_{i,j}:=t_{e_i+e_j}, \tag{G.1}$$

where $e_i$ here denotes the $i$-th standard basis vector in $\mathbb N^{\tilde n}$. We are also specifically interested in probability measures over the set of all $\hat z$ in $\mathbb R^{\tilde n}$, $\tilde n=n(s+1)$, satisfying $\sum_{i\in[(k-1)n+1:kn]}\hat z_i=\nu$ for all $k\in[s]$ and $\sum_{i\in[sn+1:(s+1)n]}\hat z_i=1-\nu$. Let the set of such probability measures be denoted as $\mathcal P(\nu)$.
$$M(w):=\begin{bmatrix}
1 & w_1 & \cdots & w_{\tilde n} & w_{1,1} & \cdots & w_{1,\tilde n} & \cdots & w_{\tilde n,1} & \cdots & w_{\tilde n,\tilde n}\\
w_1 & w_{1,1} & \cdots & w_{1,\tilde n} & w_{1,1,1} & \cdots & w_{1,1,\tilde n} & \cdots & w_{1,\tilde n,1} & \cdots & w_{1,\tilde n,\tilde n}\\
\vdots & \vdots & \ddots & \vdots & \vdots & \ddots & \vdots & & \vdots & \ddots & \vdots\\
w_{\tilde n} & w_{\tilde n,1} & \cdots & w_{\tilde n,\tilde n} & w_{\tilde n,1,1} & \cdots & w_{\tilde n,1,\tilde n} & \cdots & w_{\tilde n,\tilde n,1} & \cdots & w_{\tilde n,\tilde n,\tilde n}\\
w_{1,1} & w_{1,1,1} & \cdots & w_{1,1,\tilde n} & w_{1,1,1,1} & \cdots & w_{1,1,1,\tilde n} & \cdots & w_{1,1,\tilde n,1} & \cdots & w_{1,1,\tilde n,\tilde n}\\
\vdots & \vdots & \ddots & \vdots & \vdots & \ddots & \vdots & & \vdots & \ddots & \vdots\\
w_{\tilde n,\tilde n} & w_{\tilde n,\tilde n,1} & \cdots & w_{\tilde n,\tilde n,\tilde n} & w_{\tilde n,\tilde n,1,1} & \cdots & w_{\tilde n,\tilde n,1,\tilde n} & \cdots & w_{\tilde n,\tilde n,\tilde n,1} & \cdots & w_{\tilde n,\tilde n,\tilde n,\tilde n}
\end{bmatrix}\succeq 0 \tag{G.2}$$
Proposition 7. If a tms $t$ in $\tilde n=n(s+1)$ variables and of degree 2 satisfies:

$$M(t):=\begin{bmatrix}1&t_1&\cdots&t_{\tilde n}\\ t_1&t_{1,1}&\cdots&t_{1,\tilde n}\\ \vdots&\vdots&\ddots&\vdots\\ t_{\tilde n}&t_{\tilde n,1}&\cdots&t_{\tilde n,\tilde n}\end{bmatrix}\succeq 0;\qquad t_i\ge 0,\ i\in[\tilde n];\qquad t_{i,j}\ge 0,\ i,j\in[\tilde n]$$
$$\sum_{i\in[(k-1)n+1:kn]}t_i=\nu,\ k\in[s];\qquad \sum_{i\in[sn+1:(s+1)n]}t_i=1-\nu$$
$$\sum_{j\in[(k-1)n+1:kn]}t_{i,j}=\nu\, t_i,\quad i\in[(k-1)n+1:kn],\ k\in[s]$$
$$\sum_{j\in[sn+1:(s+1)n]}t_{i,j}=(1-\nu)\,t_i,\quad i\in[sn+1:(s+1)n] \tag{G.3a}$$
$$\operatorname{rank}(M(t))=1 \tag{G.3b}$$

then it admits a unique $\mathcal P(\nu)$-probability measure, which is also 1-atomic and given by $\zeta(\hat z)=\delta(\hat z-[t_1,\ldots,t_{\tilde n}]^T)$.
Proof. (G.3b) implies that

$$t_{i,j}=t_it_j,\qquad i,j\in[\tilde n] \tag{G.4}$$

[43, Theorem 1.1], which in turn is from [44], implies that a $t$ satisfying (G.3a) admits a unique $\mathcal P(\nu)$-probability measure if there exists a tms $w$ in $\tilde n$ variables and of degree 4 such that it satisfies $w_a=t_a$ for all $|a|\le 2$, and (G.2), (G.5):

$$M_i(w):=\begin{bmatrix}w_i&w_{i,1}&\cdots&w_{i,\tilde n}\\ w_{i,1}&w_{i,1,1}&\cdots&w_{i,1,\tilde n}\\ \vdots&\vdots&\ddots&\vdots\\ w_{i,\tilde n}&w_{i,\tilde n,1}&\cdots&w_{i,\tilde n,\tilde n}\end{bmatrix}\succeq 0,\quad i\in[\tilde n]$$
$$\sum_{k\in[(l-1)n+1:ln]}w_{i,j,k}=\nu\, w_{i,j},\quad i,j\in[\tilde n],\ l\in[s];\qquad \sum_{k\in[sn+1:(s+1)n]}w_{i,j,k}=(1-\nu)\,w_{i,j},\quad i,j\in[\tilde n] \tag{G.5}$$

where $w_i$, $w_{i,j}$, $w_{i,j,k}$, and $w_{i,j,k,l}$ are defined similarly to (G.1). Let

$$w_{i,j,k}=t_it_jt_k,\qquad w_{i,j,k,l}=t_it_jt_kt_l,\qquad i,j,k,l\in[\tilde n] \tag{G.6}$$

(G.4) and (G.6) imply $w_{i,j,k}=t_{i,j}t_k=w_{i,j}t_k$, and therefore $\sum_{k\in[(l-1)n+1:ln]}w_{i,j,k}=w_{i,j}\sum_{k\in[(l-1)n+1:ln]}t_k=\nu w_{i,j}$ for all $l\in[s]$, and $\sum_{k\in[sn+1:(s+1)n]}w_{i,j,k}=w_{i,j}\sum_{k\in[sn+1:(s+1)n]}t_k=(1-\nu)w_{i,j}$. (G.6) implies that every column of $M_i(w)$ is a multiple of the first column, and therefore $\operatorname{rank}(M_i(w))=1$. Since the leading entry $w_i$ is nonnegative, $M_i(w)$ is positive semidefinite. Along the same lines, $M(w)$ has rank one and is positive semidefinite.

Since $\operatorname{rank}(M(w))=1=\operatorname{rank}(M(t))$, [43, Theorem 1.1] implies that the unique probability measure $\zeta$ is 1-atomic. The expression for $\zeta$ is then trivial from the fact that $\mathbb E_\zeta[\hat z]=[t_1,\ldots,t_{\tilde n}]^T$.
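The degree-4 extension in the proof can be checked numerically: starting from illustrative first moments $t$ (values ours), setting $t_{i,j}=t_it_j$ as in (G.4) makes $M(t)$ rank one, and each localizing matrix $M_i(w)$ built from (G.6) is a nonnegative multiple of the same rank-one outer product:

```python
import numpy as np

# Rank-one degree-2 tms: t_{i,j} = t_i t_j. With w_{i,j,k} = t_i t_j t_k,
# M_i(w) equals t_i * outer([1, t], [1, t]): rank one and PSD.
t = np.array([0.2, 0.8, 0.4, 0.6])        # illustrative first moments
v = np.concatenate(([1.0], t))
M_t = np.outer(v, v)                      # M(t) with t_{i,j} = t_i t_j

rank_Mt = np.linalg.matrix_rank(M_t)
min_eigs = [np.linalg.eigvalsh(t[i] * np.outer(v, v)).min()
            for i in range(len(t))]
print(rank_Mt, min(min_eigs))
```

This mirrors the argument that every column of $M_i(w)$ is a multiple of the first column, so the flat (rank-preserving) extension condition of [43, Theorem 1.1] holds.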
H Proof of Proposition 4
Plugging (5.1) into (5.2) and substituting (5.4) gives

$$\begin{aligned}
m(k)&=\frac{m(1)}k+\frac1k\sum_{t=1}^{k-1}u(t)\\
&=\frac{m(1)}k+\frac1k\sum_{t=1}^{k-1}\pi_{\omega(t)}^T(I-P)\,\ell_{\omega(t)}(x(t))\\
&=\frac{m(1)}k+\frac1k\sum_{t=1}^{k-1}\pi_{\omega(t)}^T(I-P)\Big(\alpha_{0,\omega(t)}+\sum_{d=1}^D\operatorname{diag}(\alpha_{d,\omega(t)})\operatorname{diag}(x(t))^{d-1}x(t)\Big)\\
&=\frac{m(1)}k+\frac1k\sum_{t=1}^{k-1}\pi_{\omega(t)}^T(I-P)\Big(\ell_{\omega(t)}(\pi_{\omega(t)})+\sum_{d=1}^D\operatorname{diag}(\alpha_{d,\omega(t)})\big(\operatorname{diag}(x(t))^{d-1}x(t)-\operatorname{diag}(\pi_{\omega(t)})^{d-1}\pi_{\omega(t)}\big)\Big)\\
&=\frac{m(1)}k+\frac1k\sum_{t=1}^{k-1}\pi_{\omega(t)}^T(I-P)\,\ell_{\omega(t)}(\pi_{\omega(t)})-\frac1k\sum_{t=1}^{k-1}\theta(t)\underbrace{\pi_{\omega(t)}^T(I-P)\sum_{d=1}^D\operatorname{diag}(\alpha_{d,\omega(t)})\sum_{i=0}^{d-1}\operatorname{diag}(x(t))^{d-1-i}\operatorname{diag}(\pi_{\omega(t)})^i\,(I-P^T)\pi_{\omega(t)}}_{\ge0}\\
&\le\frac{m(1)}k+\frac1k\sum_{t=1}^{k-1}\pi_{\omega(t)}^T(I-P)\,\ell_{\omega(t)}(\pi_{\omega(t)})\\
&=\frac{m(1)}k+\sum_{i,j\in[n]}P_{ij}\,\frac1k\sum_{t=1}^{k-1}\pi_{\omega(t),i}\big(\ell_{\omega(t),i}(\pi_{\omega(t)})-\ell_{\omega(t),j}(\pi_{\omega(t)})\big)
\end{aligned} \tag{H.1}$$

(5.10) and the strong law of large numbers imply that, almost surely, $\lim_{k\to\infty}\frac1k\sum_{t=1}^{k-1}\pi_{\omega(t),i}\big(\ell_{\omega(t),i}(\pi_{\omega(t)})-\ell_{\omega(t),j}(\pi_{\omega(t)})\big)\le 0$ for all $i,j\in[n]$, and therefore, almost surely, $\limsup_{k\to\infty}m(k)\le 0$, and hence $\lim_{k\to\infty}\theta(k)=0$ from (5.3). The proposition then follows from (5.4).
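The driver of this conclusion is the strong law of large numbers applied to the running average in (H.1). A minimal sketch, using an illustrative i.i.d. surrogate for the instantaneous mistrust rather than the actual game dynamics, shows the running average settling below zero when the obedience condition makes its mean nonpositive:

```python
import numpy as np

rng = np.random.default_rng(0)

# Surrogate instantaneous mistrust u(t): i.i.d. with nonpositive mean
# (standing in for the obedience condition). The running average m(k)
# then satisfies limsup m(k) <= 0 almost surely, so trust corrections
# theta(k) driven by the positive part of m(k) vanish.
u = rng.normal(loc=-0.2, scale=1.0, size=100_000)   # E[u] = -0.2 <= 0
m = np.cumsum(u) / np.arange(1, u.size + 1)          # running average m(k)

print(m[-1])
```

With the fixed seed the running average ends close to the mean $-0.2$, illustrating the almost-sure convergence invoked in the proof.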
I Proof of Proposition 5
(5.6) implies $e_\theta(k+1)=(1-\beta(k+1))\,e_\theta(k)$, and hence $e_\theta(k+1)=\prod_{t=2}^{k+1}(1-\beta(t))\,e_\theta(1)$. The proposition then follows from the fact that $(1-\beta_{\max})^k\,e_\theta(1)\le e_\theta(k+1)\le(1-\beta_{\min})^k\,e_\theta(1)$.
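The squeeze between geometric sequences can be illustrated numerically; here $\beta(k)$ is drawn uniformly from $[\beta_{\min},\beta_{\max}]$ purely for illustration:

```python
import numpy as np

# e_theta(k+1) = (1 - beta(k+1)) e_theta(k) with beta(k) in
# [beta_min, beta_max] lies between two geometric envelopes.
rng = np.random.default_rng(1)
beta_min, beta_max = 0.1, 0.3

e = [1.0]                                 # e_theta(1), illustrative
for b in rng.uniform(beta_min, beta_max, size=50):
    e.append((1 - b) * e[-1])
e = np.array(e)

k = np.arange(len(e))
lower = (1 - beta_max) ** k * e[0]
upper = (1 - beta_min) ** k * e[0]
print(e[-1], lower[-1], upper[-1])
```

Both envelopes decay geometrically, so the estimation error $e_\theta(k)$ converges to zero at a geometric rate, as the proposition states.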
J Proof of Proposition 6
(5.6) implies

$$e_\theta(k+1)=(1-\beta(k+1))\,e_\theta(k)+\delta(k) \tag{J.1}$$

where $\delta(k)=\theta(k+1)-\theta(k)$. Using the expression of $\theta$ from (5.3), and noting that $|[a]_+-[b]_+|\le[a-b]_+$ for all $a,b\in\mathbb R$, we get

$$-\frac{[u(k)-m(k)]_+}{(k+1)\,m_{\max}}\le\delta(k)\le\frac{[u(k)-m(k)]_+}{(k+1)\,m_{\max}}$$

Since $[a-b]_+\le|a|+|b|$ for all $a,b\in\mathbb R$, and both $m(k)$ and $u(k)$ are absolutely upper bounded by $m_{\max}$,

$$-\frac2{k+1}\le\delta(k)\le\frac2{k+1} \tag{J.2}$$

Since $\delta(k)$ can only be bounded by a harmonic sequence, whose partial sum sequence diverges, it is not easy to prove that $\lim_{k\to\infty}e_\theta(k)=0$ directly using (J.1). Instead, we lower and upper bound $e_\theta(k)$ by $\underline e_\theta(k)$ and $\overline e_\theta(k)$, respectively, and show that these converge to zero. (J.1) can be rewritten as

$$e_\theta(k)=\Big(\prod_{t=2}^k(1-\beta(t))\Big)e_\theta(1)+\sum_{t=1}^{k-2}\Big(\prod_{\tau=t+2}^k(1-\beta(\tau))\Big)\delta(t)+\delta(k-1) \tag{J.3}$$

Substituting (J.2) into (J.3),

$$\underline e_\theta(k):=\Big(\prod_{t=2}^k(1-\beta(t))\Big)e_\theta(1)-2\tilde\delta(k)\ \le\ e_\theta(k)\ \le\ \Big(\prod_{t=2}^k(1-\beta(t))\Big)e_\theta(1)+2\tilde\delta(k)=:\overline e_\theta(k) \tag{J.4}$$
where $\tilde\delta(k)=\sum_{t=2}^k(1-\beta_{\min})^{k-t}\frac1t=\gamma^k\sum_{t=2}^k\frac{\gamma^{-t}}t$, with $\gamma:=1-\beta_{\min}\in(0,1)$. $\frac{\gamma^{-t}}t$ is decreasing over $(0,-\frac1{\ln\gamma}]$, and increasing over $[-\frac1{\ln\gamma},+\infty)$. Therefore, with $k^*:=\lfloor-\frac1{\ln\gamma}\rfloor$,

$$\tilde\delta(k)=\gamma^k\Big(\sum_{t=2}^{k^*}\frac{\gamma^{-t}}t+\sum_{t=k^*+1}^k\frac{\gamma^{-t}}t\Big)\le\gamma^k\Big(\int_1^{k^*}\frac{\gamma^{-t}}t\,dt+\int_{k^*+1}^{k+1}\frac{\gamma^{-t}}t\,dt\Big)\le\gamma^k\underbrace{\int_1^{k+1}\frac{\gamma^{-t}}t\,dt}_{=:I_{k+1}}$$

Change of variable $z=\gamma^{-t}$ gives $I_{k+1}=\int_{\gamma^{-1}}^{\gamma^{-(k+1)}}\frac{dz}{\ln z}=\mathrm{Li}(\gamma^{-(k+1)})-\mathrm{Li}(\gamma^{-1})$, where $\mathrm{Li}(\cdot)$ is the logarithmic integral function [45], whose asymptotic behavior is $\mathrm{Li}(r)=O(\frac r{\ln r})$. Since $\gamma\in(0,1)$, $\gamma^{-(k+1)}\to\infty$ as $k\to\infty$. Therefore, as $k\to\infty$,

$$0<\tilde\delta(k)\le\gamma^k\big(\mathrm{Li}(\gamma^{-(k+1)})-\mathrm{Li}(\gamma^{-1})\big)=\gamma^k\Big(O\Big(\frac{\gamma^{-(k+1)}}{-(k+1)\ln\gamma}\Big)-\mathrm{Li}(\gamma^{-1})\Big)=O\Big(\frac{\gamma^{-1}}{-(k+1)\ln\gamma}\Big)-\gamma^k\,\mathrm{Li}(\gamma^{-1})\to 0,$$

which leads to $\lim_{k\to\infty}\tilde\delta(k)=0$. Using it in (J.4) therefore gives $\lim_{k\to\infty}\underline e_\theta(k)=0$ and $\lim_{k\to\infty}\overline e_\theta(k)=0$, and hence $\lim_{k\to\infty}e_\theta(k)=0$.
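The $O(1/k)$ decay of $\tilde\delta(k)$ established above is easy to observe numerically; the choice $\gamma=0.9$ below is purely illustrative:

```python
import numpy as np

# tilde_delta(k) = sum_{t=2}^k gamma^{k-t} / t: a geometrically
# weighted harmonic tail, vanishing like O(1/k) for gamma in (0,1).
gamma = 0.9

def delta_tilde(k):
    t = np.arange(2, k + 1)
    return np.sum(gamma ** (k - t) / t)

vals = [delta_tilde(k) for k in (10, 100, 1000, 10_000)]
print(vals)
```

For large $k$ the sum is dominated by the last $O(1/(1-\gamma))$ terms, so $\tilde\delta(k)\approx\frac1{(1-\gamma)k}$, matching the asymptotics via the logarithmic integral.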
K Proof of Lemma 1
Plugging (5.1) into (5.2) gives the following:

$$m(k)=\frac{m(1)}k+\frac\nu k\sum_{t=1}^{k-1}\pi(t)^T(I-P)\Big(\alpha_{0,\omega(t)}+\sum_{d=1}^D\operatorname{diag}(\alpha_{d,\omega(t)})\operatorname{diag}\big(x(t)+y(\hat\theta(t))\big)^{d-1}\big(x(t)+y(\hat\theta(t))\big)\Big) \tag{K.1}$$
Substituting (5.4), (K.1) gives the following:

$$\begin{aligned}
m(k)=\ &\frac{m(1)}k+\frac\nu k\sum_{t=1}^{k-1}\pi(t)^T(I-P)\Big(\alpha_{0,\omega(t)}+\sum_{d=1}^D\operatorname{diag}(\alpha_{d,\omega(t)})\operatorname{diag}\big(\nu\pi_{\omega(t)}+y(0)\big)^{d-1}\big(\nu\pi_{\omega(t)}+y(0)\big)\Big)\\
&+\frac\nu k\sum_{t=1}^{k-1}\pi(t)^T(I-P)\sum_{d=1}^D\operatorname{diag}(\alpha_{d,\omega(t)})\Big(\operatorname{diag}\big(x(t)+y(\hat\theta(t))\big)^{d-1}\big(x(t)+y(\hat\theta(t))\big)-\operatorname{diag}\big(\nu\pi_{\omega(t)}+y(0)\big)^{d-1}\big(\nu\pi_{\omega(t)}+y(0)\big)\Big)\\
=\ &\frac{m(1)}k+\frac\nu k\sum_{t=1}^{k-1}\pi(t)^T(I-P)\Big(\alpha_{0,\omega(t)}+\sum_{d=1}^D\operatorname{diag}(\alpha_{d,\omega(t)})\operatorname{diag}\big(\nu\pi_{\omega(t)}+y(0)\big)^{d-1}\big(\nu\pi_{\omega(t)}+y(0)\big)\Big)\\
&+\frac\nu k\sum_{t=1}^{k-1}\pi(t)^T(I-P)\sum_{d=1}^D\operatorname{diag}(\alpha_{d,\omega(t)})\sum_{i=0}^{d-1}\operatorname{diag}\big(x(t)+y(\hat\theta(t))\big)^{d-1-i}\operatorname{diag}\big(\nu\pi_{\omega(t)}+y(0)\big)^i\Big(\nu\theta(t)(P^T-I)\pi_{\omega(t)}+y(\hat\theta(t))-y(0)\Big)
\end{aligned} \tag{K.2}$$
Since $m(k)$ is a bounded sequence, it contains a convergent subsequence $\{m(k_s)\}_s$ with limit, say, $m$. Considering (K.2) for this subsequence, using Proposition 6 we have, almost surely,

$$\begin{aligned}
m=\ &\underbrace{\nu\sum_{\omega\in\Omega}\mu_0(\omega)\,\pi_\omega^T(I-P)\Big(\alpha_{0,\omega}+\sum_{d=1}^D\operatorname{diag}(\alpha_{d,\omega})\operatorname{diag}\big(\nu\pi_\omega+y(0)\big)^{d-1}\big(\nu\pi_\omega+y(0)\big)\Big)}_{=:m_1}\\
&+\underbrace{\nu\sum_{\omega\in\Omega}\mu_0(\omega)\,\pi_\omega^T(I-P)\sum_{d=1}^D\operatorname{diag}(\alpha_{d,\omega})\sum_{i=0}^{d-1}\operatorname{diag}\big(x(m,\pi_\omega)+y(\theta(m))\big)^{d-1-i}\operatorname{diag}\big(\nu\pi_\omega+y(0)\big)^i\Big(\nu\theta(m)(P^T-I)\pi_\omega+y(\theta(m))-y(0)\Big)}_{=:m_2}
\end{aligned} \tag{K.3}$$
i.e., the limit $m$ of every convergent subsequence of $\{m(k)\}_k$ has to satisfy (K.3). We now show that (K.3) admits a unique solution in $m$, and that this solution is non-positive. Towards that purpose, the following properties of $m_1$ and $m_2$ defined in (K.3) will be useful:

(a) $m_1\le 0$ for all $m$. This is due to the obedience condition;

(b) $m_2=0$ for all $m\le 0$. This follows from the definition of $m_2$, where $\theta(m)=0$ for $m\le 0$ from (5.3);

(c) $m_2\le 0$ for all $m>0$, as follows. (5.9) for $y(\theta(m))$ and $y(0)$, respectively, gives:

$$\sum_{\omega\in\Omega}\mu_0(\omega)\big(y(0)-y(\theta(m))\big)^T\Big(\alpha_{0,\omega}+\sum_{d=1}^D\operatorname{diag}(\alpha_{d,\omega})\operatorname{diag}\big(x(m,\pi_\omega)+y(\theta(m))\big)^{d-1}\big(x(m,\pi_\omega)+y(\theta(m))\big)\Big)\ge 0$$

$$\sum_{\omega\in\Omega}\mu_0(\omega)\big(y(\theta(m))-y(0)\big)^T\Big(\alpha_{0,\omega}+\sum_{d=1}^D\operatorname{diag}(\alpha_{d,\omega})\operatorname{diag}\big(\nu\pi_\omega+y(0)\big)^{d-1}\big(\nu\pi_\omega+y(0)\big)\Big)\ge 0$$

Adding the two expressions gives

$$\begin{aligned}
&\sum_{\omega\in\Omega}\mu_0(\omega)\big(y(0)-y(\theta(m))\big)^T\sum_{d=1}^D\operatorname{diag}(\alpha_{d,\omega})\Big(\operatorname{diag}\big(x(m,\pi_\omega)+y(\theta(m))\big)^{d-1}\big(x(m,\pi_\omega)+y(\theta(m))\big)-\operatorname{diag}\big(\nu\pi_\omega+y(0)\big)^{d-1}\big(\nu\pi_\omega+y(0)\big)\Big)\\
&=\sum_{\omega\in\Omega}\mu_0(\omega)\big(y(0)-y(\theta(m))\big)^T\sum_{d=1}^D\operatorname{diag}(\alpha_{d,\omega})\sum_{i=0}^{d-1}\operatorname{diag}\big(x(m,\pi_\omega)+y(\theta(m))\big)^{d-1-i}\operatorname{diag}\big(\nu\pi_\omega+y(0)\big)^i\Big(\nu\theta(m)(P^T-I)\pi_\omega+y(\theta(m))-y(0)\Big)\ge 0
\end{aligned} \tag{K.4}$$
Finally,

$$\begin{aligned}
m_2=\ &-\frac1{\theta(m)}\underbrace{\sum_{\omega\in\Omega}\mu_0(\omega)\Big(\nu\theta(m)\pi_\omega^T(P-I)+y^T(\theta(m))-y^T(0)\Big)\sum_{d=1}^D\operatorname{diag}(\alpha_{d,\omega})\sum_{i=0}^{d-1}\operatorname{diag}\big(x(m,\pi_\omega)+y(\theta(m))\big)^{d-1-i}\operatorname{diag}\big(\nu\pi_\omega+y(0)\big)^i\Big(\nu\theta(m)(P^T-I)\pi_\omega+y(\theta(m))-y(0)\Big)}_{=:m_3\ \ge\ 0}\\
&-\frac1{\theta(m)}\underbrace{\sum_{\omega\in\Omega}\mu_0(\omega)\big(y(0)-y(\theta(m))\big)^T\sum_{d=1}^D\operatorname{diag}(\alpha_{d,\omega})\sum_{i=0}^{d-1}\operatorname{diag}\big(x(m,\pi_\omega)+y(\theta(m))\big)^{d-1-i}\operatorname{diag}\big(\nu\pi_\omega+y(0)\big)^i\Big(\nu\theta(m)(P^T-I)\pi_\omega+y(\theta(m))-y(0)\Big)}_{=:m_4\ \ge\ 0\ \text{from (K.4)}}
\end{aligned}$$

where we note that $\theta(m)>0$ for $m>0$.

(a), (b) and (c) imply that the right hand side of (K.3) is always non-positive. (a) and (b) also imply that the only non-positive solution to (K.3) is $m=m_1$.
L Proof of Lemma 2
Along the lines of Theorem 1 in [46], we just need to verify the following three conditions: the strong sufficient optimality condition of second order (SOC), the constant rank constraint qualification (CR), and the Mangasarian-Fromovitz constraint qualification (MF). Plugging in (2.1) and rewriting the latency function in (5.7) gives:

$$\ell_{\omega,i}(\hat x_i(\theta,\omega)+y_i)=\sum_{d=0}^D\alpha_{d,\omega,i}\big(\nu\pi_{\omega,i}+\nu\theta P_i^T\pi_\omega-\nu\theta\pi_{\omega,i}+y_i\big)^d=\alpha_{D,\omega,i}\,y_i^D+\sum_{d=0}^{D-1}\beta_{d,\omega,i}\,y_i^d \tag{L.1}$$
where $P_i\in\mathbb R^n$ is the $i$-th column of the row-stochastic matrix $P$, and $\beta_{d,\omega,i}\big(\{\alpha_{k,\omega,i}\}_{k\in\{0,1,\cdots,D\}},\pi_\omega,\theta,P_i\big)$ is the coefficient of the monomial of degree $d$. Taking the expectation of (L.1) gives:

$$\mathbb E_{\omega\sim\mu_0}\big[\ell_{\omega,i}(\hat x_i(\theta,\omega)+y_i)\big]=\mathbb E_{\omega\sim\mu_0}\Big[\alpha_{D,\omega,i}\,y_i^D+\sum_{d=0}^{D-1}\beta_{d,\omega,i}\,y_i^d\Big]=\bar\alpha_{D,i}\,y_i^D+\sum_{d=0}^{D-1}\bar\beta_{d,i}\,y_i^d$$

where $\bar\alpha_{D,i}:=\mathbb E_{\omega\sim\mu_0}[\alpha_{D,\omega,i}]$ and $\bar\beta_{d,i}:=\mathbb E_{\omega\sim\mu_0}[\beta_{d,\omega,i}]$. $y(\theta)$ satisfies (5.7) if and only if it solves the following convex problem:
$$\begin{aligned}
\min_{y\in\mathcal P_n(1-\nu)}\quad& f(y,\theta)=\sum_{i\in[n]}\int_0^{y_i}\mathbb E_{\omega\sim\mu_0}\big[\ell_{\omega,i}(\hat x_i(\theta,\omega)+s)\big]\,ds=\sum_{i\in[n]}\Big(\frac{\bar\alpha_{D,i}}{D+1}\,y_i^{D+1}+\sum_{d=0}^{D-1}\frac{\bar\beta_{d,i}}{d+1}\,y_i^{d+1}\Big)\\
\text{s.t.}\quad& g_i(y,\theta)=-y_i\le 0,\quad i\in[n]\\
& h(y,\theta)=\sum_{i=1}^ny_i-1+\nu=0
\end{aligned} \tag{L.2}$$

Moreover, such a $y$ is unique if $\{\ell_{\omega,i}\}_{\omega,i}$ are strictly increasing over $[0,1]$. Let the Lagrange function be $L(y,\theta,\lambda,\mu)=f(y,\theta)-\sum_{i\in[n]}\lambda_iy_i+\mu\big(\sum_{i=1}^ny_i-1+\nu\big)$. For a given $\theta_0$, $y=y_0$ is a locally optimal solution for (L.2) at $\theta=\theta_0$. Now, we want to show that $y(\theta)$ is Lipschitz continuous by verifying the aforementioned three conditions.
(SOC) Let $\{H_{ij}\}_{i,j\in[n]}:=\nabla^2_{yy}L(y_0,\theta_0,\lambda,\mu)$; from (L.2) we have:

$$H_{ij}=\begin{cases}0 & \text{if }i\neq j\\ D\,\bar\alpha_{D,i}\,y_i^{D-1}+\sum_{d=1}^{D-1}d\,\bar\beta_{d,i}\,y_i^{d-1} & \text{if }i=j\end{cases}$$

which implies that for all $v\in\mathbb R^n$, $v\neq 0$, we have $v^T\nabla^2_{yy}L(y_0,\theta_0,\lambda,\mu)\,v>0$.

(CR) Let $I_0:=\{i\mid g_i(y_0,\theta_0)=0\}$, and let $I$ be an arbitrary but fixed subset $I\subseteq I_0$. Note that $\sum_{i\in[n]}\mathbb 1\big(g_i(y_0,\theta_0)=0\big)\le n-1$; therefore the set of gradients $\{\nabla_yg_i(y_0,\theta_0),\ i\in I\}\cup\{\nabla_yh(y_0,\theta_0)\}$ has constant rank $|I|+1$ for all $(y,\theta)$ in an open neighborhood of $(y_0,\theta_0)$.
(MF) From (L.2) we have:

$$\nabla_yg_i(y_0,\theta_0)=-e_i,\qquad \nabla_yh(y_0,\theta_0)=\mathbf 1=[1,\ldots,1]^T$$

Obviously, $\nabla_yh(y_0,\theta_0)$ is linearly independent. Let $v\in\mathbb R^n$ be such that:

$$v_i=\begin{cases}1 & \text{if }g_i(y_0,\theta_0)=0\\[4pt] \dfrac{\sum_{i\in[n]}\mathbb 1\big(g_i(y_0,\theta_0)=0\big)}{\sum_{i\in[n]}\mathbb 1\big(g_i(y_0,\theta_0)=0\big)-n} & \text{if }g_i(y_0,\theta_0)\neq 0\end{cases}$$

Then $v$ satisfies:

$$\nabla_yg_i(y_0,\theta_0)^Tv<0\ \text{ for }g_i(y_0,\theta_0)=0,\qquad \nabla_yh(y_0,\theta_0)^Tv=0$$
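The (MF) direction above can be verified numerically; the sketch below uses an illustrative $y_0$ with two active non-negativity constraints:

```python
import numpy as np

# MF direction: v_i = 1 on active constraints (g_i = -y_i = 0), and
# v_i = N/(N - n) otherwise, where N is the number of active constraints.
# Then grad(g_i)^T v = -v_i < 0 on active constraints, and
# grad(h)^T v = sum_i v_i = N + (n - N) * N/(N - n) = 0.
y = np.array([0.0, 0.4, 0.0, 0.6])        # illustrative y_0
active = (y == 0.0)
n, N = y.size, int(active.sum())

v = np.where(active, 1.0, N / (N - n))
grad_h_dot_v = v.sum()                    # grad h = ones vector
grad_g_dot_v = -v[active]                 # grad g_i = -e_i
print(grad_h_dot_v, grad_g_dot_v)
```

Here $N=2$, $n=4$, so the inactive entries get $v_i=-1$, giving $\nabla h^Tv=0$ exactly while the active inner products stay strictly negative.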
M Proof of Theorem 3
Following (5.4), it suffices to show that $\lim_{k\to\infty}\theta(m(k))=0$, almost surely.

Lemma 1 shows that there exists a convergent subsequence $\{m(k_s)\}_s$ with limit $m\le 0$, almost surely. That is, there exists $\tilde s_1>0$ such that

$$|m(k_s)-m|\le-m/3\qquad\forall s>\tilde s_1\quad\text{a.s.} \tag{M.1}$$
We now show that all the terms of the parent sequence $m(k)$ between two consecutive subsequence terms $m(k_s)$ and $m(k_{s+1})$, for $s>\tilde s_1$, are negative. (5.2) implies, for all $s>\tilde s_1$,

$$m(k_s+1)=\frac{k_s}{k_s+1}m(k_s)+\frac1{k_s+1}u_0(k_s)+\frac1{k_s+1}\triangle u(k_s),$$

i.e.,

$$m(k_s+1)-m=\frac{k_s}{k_s+1}\big(m(k_s)-m\big)+\frac1{k_s+1}\big(u_0(k_s)-m\big)+\frac1{k_s+1}\triangle u(k_s) \tag{M.2}$$
where

$$u_0(k_s):=\nu\,\pi_{\omega(k_s)}^T(I-P)\Big(\alpha_{0,\omega(k_s)}+\sum_{d=1}^D\operatorname{diag}(\alpha_{d,\omega(k_s)})\operatorname{diag}\big(\nu\pi_{\omega(k_s)}+y(0)\big)^{d-1}\big(\nu\pi_{\omega(k_s)}+y(0)\big)\Big)$$

is the instantaneous mistrust assuming non-participating agents estimate that participating agents are obedient, i.e., assuming $\hat\theta(k_s)=0$, and

$$\begin{aligned}
\triangle u(k_s):=\ &u(k_s)-u_0(k_s)\\
=\ &\nu\,\pi_{\omega(k_s)}^T(I-P)\sum_{d=1}^D\operatorname{diag}(\alpha_{d,\omega(k_s)})\Big(\operatorname{diag}\big(x(k_s)+y(\hat\theta(k_s))\big)^{d-1}\big(x(k_s)+y(\hat\theta(k_s))\big)-\operatorname{diag}\big(\nu\pi_{\omega(k_s)}+y(0)\big)^{d-1}\big(\nu\pi_{\omega(k_s)}+y(0)\big)\Big)\\
=\ &\nu\,\pi_{\omega(k_s)}^T(I-P)\sum_{d=1}^D\operatorname{diag}(\alpha_{d,\omega(k_s)})\sum_{i=0}^{d-1}\operatorname{diag}\big(x(k_s)+y(\hat\theta(k_s))\big)^{d-1-i}\operatorname{diag}\big(\nu\pi_{\omega(k_s)}+y(0)\big)^i\big(y(\hat\theta(k_s))-y(0)\big)
\end{aligned} \tag{M.3}$$

In the above equations, we utilize (M.1) to get $m(k_s)\le 0$, and hence $\theta(m(k_s))=0$, for $s>\tilde s_1$. Using the Lipschitz property of $y(\cdot)$ (cf. Remark 11) with (M.3) gives

$$|\triangle u(k_s)|\le L\,|\hat\theta(k_s)|,\qquad s>\tilde s_1 \tag{M.4}$$
for some $L>0$. Proposition 6 implies that there exists $\tilde k_1>0$ such that $|\theta(m(k))-\hat\theta(k)|\le-\frac m{3L}$ for all $k>\tilde k_1$. Let $\tilde s_2$ be the smallest integer greater than $\tilde s_1$ such that $k_{\tilde s_2}>\tilde k_1$. Therefore, $|\theta(m(k_s))-\hat\theta(k_s)|=|\hat\theta(k_s)|\le-\frac m{3L}$ for all $s>\tilde s_2$. Combining with (M.4) gives

$$|\triangle u(k_s)|\le-\frac m3,\qquad s>\tilde s_2 \tag{M.5}$$
The strong law of large numbers gives $\lim_{\tau\to\infty}\frac{\sum_{t=1}^\tau u_0(t)}\tau=m$ almost surely, i.e., there exists $\tilde k_2$ such that $\big|\frac{\sum_{t=1}^\tau u_0(t)}\tau-m\big|\le-m/6$ for all $\tau>\tilde k_2$. Therefore, for all $\tau>\tilde k_2$ and a non-negative integer $r$,

$$\tau\frac{7m}6\le\sum_{t=1}^\tau u_0(t)\le\tau\frac{5m}6,\qquad(\tau+1+r)\frac{7m}6\le\sum_{t=1}^{\tau+1+r}u_0(t)\le(\tau+1+r)\frac{5m}6$$

Subtraction gives

$$\Big|\sum_{t=\tau+1}^{\tau+1+r}\big(u_0(t)-m\big)\Big|\le-(2\tau+r+1)\frac m6,\qquad\tau>\tilde k_2,\ r\ge 0 \tag{M.6}$$
Let $\tilde s$ be the smallest integer greater than $\tilde s_2$ such that $k_{\tilde s}>\tilde k_2$. Using (M.1), (M.5) and (M.6) (with $\tau=k_s-1$, $r=0$) in (M.2) gives, for all $s>\tilde s$,

$$|m(k_s+1)-m|\le-\frac{k_s}{k_s+1}\frac m3-\frac{2k_s-1}{k_s+1}\frac m6-\frac1{k_s+1}\frac m3<-\frac{2m}3,$$

which implies that $m(k_s+1)<\frac m3\le 0$.
Hereafter, we show that $m(k_s+2),m(k_s+3),\ldots,m(k_{s+1}-1)$ are all negative, for $s>\tilde s$, using induction. Let $m(t)<0$ for all $t\in\{k_s+1,\ldots,k-1\}$, for some $k\in\{k_s+2,\ldots,k_{s+1}-1\}$. Repeated application of (M.2) gives

$$m(k)-m=\frac{k_s}k\big(m(k_s)-m\big)+\frac1k\sum_{t=k_s}^{k-1}\big(u_0(t)-m\big)+\frac1k\sum_{t=k_s}^{k-1}\triangle u(t) \tag{M.7}$$

Along similar lines as (M.4), one can show that $|\triangle u(t)|\le-\frac m3$ for all $t\in\{k_s,\ldots,k-1\}$. Using this, along with (M.1) and (M.6) (with $\tau=k_s-1$, $r=k-k_s-1$), in (M.7) gives

$$|m(k)-m|\le-\frac{k_s}k\frac m3-\frac{k+k_s-2}k\frac m6-\frac{k-k_s}k\frac m3<-\frac{2m}3,$$

which implies that $m(k)<\frac m3\le 0$.
N Feedback Survey
1. On a scale of 1 to 5 (with 5 being the highest), how well did you feel you understood what was happening in each scenario, what you were doing, and what you needed to do next?
2. On a scale of 1 to 5 (with 5 being the highest), how often did you check the average star rating before making route choice decisions?
3. When you did check the average star rating, on a scale of 1 to 5 (with 5 being the highest), how much did the average star rating affect your route choice decisions?
4. On a scale of 1 to 5 (with 5 being the highest), how often did you check the histograms before making route choice decisions?
5. When you did check the histograms, on a scale of 1 to 5 (with 5 being the highest), how much did the histograms affect your route choice decisions?
6. How did you make your route choice decisions?
(a) Study the average star rating, histograms as well as the recommendations to come up
with a decision.
(b) Follow the recommendations as long as the average star rating is above a certain value.
(c) Always follow the recommendations.
(d) Make random choices.
7. If you chose (b) in question 6, what was the threshold value?
8. What did you like (if any) or dislike (if any) about the experiment interface?
9. Any other comments you might have.
Abstract
We consider a routing game among non-atomic agents where link latency functions are conditional on an uncertain state of the network. The agents have the same prior belief about the state, but only a fixed fraction receive private route recommendations or a common message, which are generated by a known randomization, referred to as a private or public signaling policy, respectively. The remaining agents choose routes according to the Bayes Nash flow with respect to the prior. We develop a computational approach to solve the optimal information design problem, i.e., to minimize expected social latency over all public or obedient private signaling policies. For a fixed flow induced by non-participating agents, design of an optimal private signaling policy is shown to be a generalized problem of moments for polynomial link latency functions, and to admit an atomic solution with a provable upper bound on the number of atoms. This implies that, for polynomial link latency functions, information design can be equivalently cast as a polynomial optimization problem. This in turn can be lower bounded arbitrarily closely by a known hierarchy of semidefinite relaxations. The first level of this hierarchy is shown to be exact for the basic two-link case with affine latency functions. We also identify a class of private signaling policies over which the optimal social cost is non-increasing with increasing fraction of participating agents for parallel networks. This is in contrast to existing results where the cost of participating agents under a fixed signaling policy may increase with their increasing fraction.
We then study the non-atomic routing game in a repeated setting. In every round, nature chooses a state i.i.d. according to a publicly known distribution. The recommendation system makes private route recommendations to participating agents according to a publicly known signaling policy. The participating agents choose between obeying and disobeying the recommendation based on the cumulative regret of the participating agent population through the previous round. The non-participating agents choose routes according to a myopic best response to a calibrated forecast of the participating agents' routing decisions. We show that, for parallel networks, if the recommendation system's signaling policy satisfies the obedience condition, then, almost surely, the link flows are asymptotically consistent with the Bayes correlated equilibrium induced by the signaling policy.
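A stylized simulation of this repeated dynamic can be sketched as follows. The rule used here — follow the recommendation next round iff its cumulative travel time does not exceed that of the best fixed route — is a simplified stand-in for the regret-based behavior described above, and the travel-time table is an assumption, not the thesis's model.

```python
# Repeated two-link game: i.i.d. binary state each round; an obedient policy
# recommends the state's faster link; agents follow while cumulative regret
# of the recommendation (vs. the best fixed route) is non-positive.
import numpy as np

rng = np.random.default_rng(0)
T = 2000
prior = [0.5, 0.5]
# times[link, state]: link 0 is faster in state 0, link 1 in state 1
times = np.array([[1.0, 3.0],
                  [2.0, 1.5]])

cum_time = {"rec": 0.0, "fix0": 0.0, "fix1": 0.0}
follow = True
followed = []

for t in range(T):
    s = rng.choice(2, p=prior)
    rec = s                      # obedient policy: recommend the faster link
    followed.append(follow)
    cum_time["rec"] += times[rec, s]
    cum_time["fix0"] += times[0, s]
    cum_time["fix1"] += times[1, s]
    # follow next round iff the recommendation's cumulative regret is <= 0
    follow = cum_time["rec"] <= min(cum_time["fix0"], cum_time["fix1"])

print(np.mean(followed))  # fraction of rounds the recommendation was followed
```

Because the obedient policy always recommends the round's faster link, its cumulative time never exceeds either fixed route's, so following persists — a toy version of the asymptotic consistency claimed for parallel networks.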
Finally, we report findings from a related experiment in which one participant at a time makes repeated route choice decisions on a computer. In every round, the participant is shown the travel time distribution for each route, a route recommendation generated by an obedient policy, and a numeric rating suggestive of previous participants' average experience with the quality of recommendations. Upon entering a route choice, the actual travel times, derived from the route choices of previous participants, are revealed. The participant uses this information to evaluate the quality of the recommendation received in that round and enters a numeric review accordingly, which is combined with historical reviews to update the rating for the next round. Data from 33 participants, each completing 100 rounds, suggest a moderate negative correlation between the displayed rating and the average regret with respect to the optimal choice, and a strong positive correlation between the rating and the likelihood of following the recommendation. Overall, under the obedient recommendation policy, the rating converges close to its maximum value by the end of the experiment, in conjunction with a very high frequency of following recommendations.
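One hypothetical way to combine a new review with historical ones into a displayed rating, in the spirit of the experiment interface (the discounted-average form and the discount factor are assumptions, not the experiment's actual aggregation rule):

```python
# Display rating as an exponentially discounted average of all numeric
# reviews entered so far, with the newest review weighted most heavily.
def update_rating(reviews, discount=0.9):
    """Aggregate historical reviews (oldest first) into a display rating."""
    rating, weight = 0.0, 0.0
    for i, r in enumerate(reversed(reviews)):  # iterate newest first
        w = discount ** i
        rating += w * r
        weight += w
    return rating / weight if weight else None

print(update_rating([5, 4, 5]))  # the most recent review (5) weighs most
```

A plain running mean (discount = 1) would weight all historical reviews equally; discounting lets the displayed rating track recent recommendation quality more closely.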
Asset Metadata
Creator: Zhu, Yixian
Core Title: Information design in non-atomic routing games: computation, repeated setting and experiment
School: Viterbi School of Engineering
Degree: Doctor of Philosophy
Degree Program: Electrical Engineering
Degree Conferral Date: 2022-12
Publication Date: 11/03/2022
Defense Date: 08/31/2022
Publisher: University of Southern California (original); University of Southern California. Libraries (digital)
Tags: behavioral dynamics, Information Systems, network theory (graphs), OAI-PMH Harvest, optimization methods, routing, smart transportation, traffic control
Format: theses (aat)
Language: English
Contributor: Electronically uploaded by the author (provenance)
Advisor: Savla, Ketan (committee chair); Dessouky, Maged (committee member); Ioannou, Petros (committee member); Jovanovic, Mihailo (committee member); Nayyar, Ashutosh (committee member)
Creator Email: yixian@usc.edu, yixianusc@gmail.com
Permanent Link (DOI): https://doi.org/10.25549/usctheses-oUC112305187
Unique Identifier: UC112305187
Identifier: etd-ZhuYixian-11298.pdf (filename)
Legacy Identifier: etd-ZhuYixian-11298
Document Type: Dissertation
Rights: Zhu, Yixian
Internet Media Type: application/pdf
Type: texts
Source: 20221107-usctheses-batch-990 (batch); University of Southern California (contributing entity); University of Southern California Dissertations and Theses (collection)
Access Conditions: The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the author, as the original true and official version of the work, but does not grant the reader permission to use the work if the desired use is covered by copyright. It is the author, as rights holder, who must provide use permission if such use is covered by copyright. The original signature page accompanying the original submission of the work to the USC Libraries is retained by the USC Libraries and a copy of it may be obtained by authorized requesters contacting the repository e-mail address given.
Repository Name: University of Southern California Digital Library
Repository Location: USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA
Repository Email: cisadmin@lib.usc.edu