Making Web Transfers More Efficient
by
Kyriakos Zarifis
A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(COMPUTER SCIENCE)
December 2018
Copyright 2018 Kyriakos Zarifis
Acknowledgements
First, I would like to thank my advisors, Ethan Katz-Bassett and Ramesh Govindan, for
providing continuous motivation, inspiration, and valuable guidance through my studies.
My managers and mentors at Akamai, Mark Holland and Manish Jain, for providing
the means and knowledge needed to conduct my research, and for their continuous feedback
that largely helped to shape the direction of my research and dissertation.
My collaborators and co-authors, including Tobias Flach, Matt Calder, Brandon
Schlinker, and Rui Miao.
My lab mates in the NSL group at the Computer Science department of the University
of Southern California.
Everyone who helps run the Computer Science department at USC. Lizsl De Leon and
Tracy Charles deserve a particular mention, for patiently taking care of all my academic
needs through the years.
Deus Ex Machina, for providing a nice workplace, although they should be thanking
me for all the coffee I bought.
Last but not least, I want to thank my family and friends for their continuous support
through this interesting journey.
And Mania.
Table of Contents
Acknowledgements ii
List of Tables v
List of Figures vi
Abstract viii
Chapter 1: Introduction 1
1.1 Diagnosing inflated paths between client and server . . . . . . . . . . . . . 5
1.2 Understanding performance benefits of HTTP/2 . . . . . . . . . . . . . . 7
1.3 Increasing channel efficiency by speculatively delivering content to the browser 7
1.4 Summary of contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
Chapter 2: Understanding Path Inflation Causes for Mobile Traffic 10
2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.2 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.3 Dataset . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.4 A Taxonomy of Inflated Routes . . . . . . . . . . . . . . . . . . . . . . . . 16
2.5 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
2.6 Path Inflation Today . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
2.7 Chapter Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
Chapter 3: Analyzing H1 traces to estimate H2 performance (RTH2) 26
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
3.2 Background and Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
3.3 The RT-H2 Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
3.4 Validation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
3.5 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
3.5.1 Methodology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
3.5.2 Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
3.6 Chapter Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
Chapter 4: Prepositioning content on the client to speed up page loads 50
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
4.2 Prepositioning Approach and Challenges . . . . . . . . . . . . . . . . . . . 53
4.2.1 Using Idle Network time to Preposition . . . . . . . . . . . . . . . 54
4.2.2 Prepositioning Mechanisms . . . . . . . . . . . . . . . . . . . . . . 57
4.2.3 The Applicability of Prepositioning . . . . . . . . . . . . . . . . . . 58
4.2.4 Prepositioning Challenges . . . . . . . . . . . . . . . . . . . . . . . 60
4.3 Prepositioning Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
4.3.1 Design Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
4.3.2 Selecting Preposition Candidates . . . . . . . . . . . . . . . . . . . 64
4.3.3 Determining Preposition Payload Quota . . . . . . . . . . . . . . . 65
4.3.4 Assigning Candidates to Timeframes . . . . . . . . . . . . . . . . . 68
4.4 Evaluation of Prepositioning . . . . . . . . . . . . . . . . . . . . . . . . . . 75
4.4.1 Experiment setup and methodology . . . . . . . . . . . . . . . . . 75
4.4.2 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
4.5 Chapter Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
Chapter 5: Literature Review 95
5.1 Path Inflation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
5.2 Mobile performance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
5.3 HTTP/2 performance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
5.4 Prepositioning content on the client . . . . . . . . . . . . . . . . . . . . . 98
5.5 Other Page Load Optimizations . . . . . . . . . . . . . . . . . . . . . . . . 99
Chapter 6: Conclusions 104
6.1 Summary of contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
6.2 Lessons Learned and Future Directions . . . . . . . . . . . . . . . . . . . . 106
6.3 Epilogue . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
Bibliography 113
List of Tables
2.1 Fraction of traceroutes from major US carriers with metro-level inflation. 16
2.2 Observed path inflation for two carriers in Q4 2011. . . . . . . . . . . . . 22
2.3 Observed peering locations between carriers and Google. . . . . . . . . . . 24
3.1 Validation of HTTP/2 model . . . . . . . . . . . . . . . . . . . . . . . . . 41
List of Figures
1.1 End-to-end client-server path for a web transfer . . . . . . . . . . . . . 2
2.1 Optimal routing for mobile clients. . . . . . . . . . . . . . . . . . . . . . . 13
2.2 Different ways a client can be directed to a server. . . . . . . . . . . . . . 18
2.3 Root cause analysis for metro-level inflation. . . . . . . . . . . . . . . . . . 20
2.4 Server selection flapping due to coarse client-server mapping. . . . . . . . 22
2.5 Observed ingress points for major US carriers. . . . . . . . . . . . . . . . . 24
3.1 Transformation of an HTTP/1.1 waterfall to an HTTP/2 waterfall. . . . . 33
3.2 rt-h2 Components . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
3.3 Example of validation on a real page for 3 RTT values. . . . . . . . . . . . 40
3.4 Impact of HTTP/2 on Page Load Times for 280K page downloads . . . . 43
3.5 Fraction of times a website was estimated to experience positive/zero/negative
impact on Page Load Time when using HTTP/2. . . . . . . . . . . . 44
3.6 Distributions of page load performance impact of HTTP/2 on 55 web sites 45
3.7 Impact of Prioritization optimization on Page Load Time . . . . . . . . . 46
3.8 Impact of Push optimization on Page Load Time . . . . . . . . . . . . . . 47
3.9 Impact of HTTP/2 with optimizations on Page Load Time . . . . . . . . 48
4.1 Page download through a CDN . . . . . . . . . . . . . . . . . . . . . . . . 53
4.2 Preposition timeframes during a page transition . . . . . . . . . . . . . . . 55
4.3 Time-to-first-byte of landing pages for CDN users . . . . . . . . . . . . . . 56
4.4 Fraction of real user page loads for which objects of CDN-served page were
not cached on the browser . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
4.5 Impact on page load performance when prepositioning all candidate objects 62
4.6 Total preposition Candidates payload . . . . . . . . . . . . . . . . . . . . 67
4.7 The process of sampling real user page transitions . . . . . . . . . . . . . 70
4.8 Illustration of the network-idle times during a page transition . . . . . . . 71
4.9 Impact of page transition prepositioning compared to baseline performance 81
4.10 Performance of loading prefetched objects from browser cache . . . . . . . 84
4.11 Impact of PreHTML prepositioning for landing pages compared to baseline
performance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
4.12 Data reduction during page load process due to transition prepositioning . 86
4.13 Comparison of proposed prepositioning to strawman approaches . . . . . 87
4.14 Fraction of prepositioned data used during page load (landing page prepositioning) . . . 88
4.15 Fraction of prepositioned data used during page load (transition prepositioning) . . . 89
4.16 Preposition Candidates selection . . . . . . . . . . . . . . . . . . . . . . . 91
4.17 Comparison between Preposition Candidate selection approaches . . . . . 92
4.18 Stability of page transition probabilities over time . . . . . . . . . . . . . 93
Abstract
Web performance has attracted a lot of attention in the industry and academia in the
last decade, as it has been repeatedly associated with business revenues. The Web is a
vast and diverse ecosystem, and there are multiple layers in which delays can occur. In
this dissertation we focus on delays in the routing layer caused by using circuitous paths
between clients and servers, and delays in the application layer caused by inefficient use of
their communication channel.
The first step of a Web transfer involves establishing a connection over a physical path
between the client and the server. Previous research has shown that the shortest path
is not always selected, due to routing protocol policy-based decisions. In this work we
re-evaluate path inflation, specifically focusing on mobile traffic directed to Google servers,
in order to understand the evolution of the infrastructure of mobile carrier networks, and
how it can affect user experience.
Once a connection has been established, information is exchanged between the two
hosts according to the communication protocol defined by HTTP, the application layer
protocol used in the vast majority of today’s Web transfers. HTTP recently saw a big
redesign and addition of new features, all aimed at faster transfers. In this work
we develop a model of HTTP/2 and pass a large dataset of HTTP/1 traces through it,
in order to understand the performance implications of deploying
the new version of the protocol in the wild. Our study exposes several opportunities for
improvements, specifically using a feature of HTTP/2 that allows a server to send to
the client an object without the client requesting it. Generalizing from that observation,
we design, develop and evaluate a system that allows CDNs to utilize idle network time
around page downloads to send to the client content that the client is expected to request
in the current or next page navigation. We show that if implemented correctly, this kind
of speculative content prepositioning on the client can achieve a performance improvement
comparable to loading a page from an already warm browser cache.
Chapter 1
Introduction
Delays in web applications have been repeatedly shown to negatively impact business
revenues [19, 38, 48, 50] because they directly affect user experience and bounce rates –
the percentage of users that leave a web site soon after visiting the first page. As such,
responsive and fast-loading web services have become an important goal, particularly for
news, e-commerce, and other web sites that rely on user engagement.
Web application performance is directly dependent on how quickly a client can exchange
information with a web server. The speed of this communication is itself dependent on
several factors: a single page load involves data traversing multiple networks and physical
links of various capacities and loads, reaching multiple remote servers at different domains,
and being processed both at the client and the servers.
Delays in client-server communication can be introduced in any of the stages of
initiating and completing a transfer. As shown in Fig.1.1, a user request to open a web
page is forwarded through different layers of infrastructure. In order to reach a remote
web server, a user request first needs to be routed into the Internet backbone. Wired
devices like home routers are directly connected to their ISPs' Internet backbone.
Figure 1.1: End-to-end client-server path for a web transfer
Traffic of mobile devices needs to be routed through cell towers into Internet backbone ingress
points. From there, the user request needs to be routed through the Internet to the
remote web server. Once it reaches the server and a connection has been established, an
application layer protocol facilitates the client-server communication by defining the types
of messages exchanged between the two ends to complete a transfer. The application
payload is segmented into packets that are delivered across the Internet path according
to the rules defined by a transport protocol. This protocol ensures that packets will be
delivered and tries to avoid congesting the network with flow control. In order to avoid
long paths, web site owners often use Content Delivery Networks (CDNs), which deploy
intermediate servers (Fig. 1.1) close to end users around the globe, in order to serve some
of the web site’s content from nearby locations. Each of these layers presents potential
performance bottlenecks and opportunities for optimization.
At a high level, in order to complete a web transfer, the following things need to
happen: a) the network needs to find a path to establish the client-server connection,
b) the server has to decide what data to send to the client and when, c) the server has to
decide how to interleave multiple objects needed to render the page into the connection,
and d) the network has to ensure reliability of packet delivery.
Previous work [62] has looked at understanding and mitigating delays introduced in
the protocols that provide reliability of data transfer between a content provider and a
client. In that analysis, a key factor in identifying such delays was the analysis of
large sets of real user traces from user devices to Google's servers.
In this dissertation we also use large datasets of real user traces to understand reasons for
delays relevant to the other three steps of a web transfer. More specifically, by analyzing
real traces between content provider servers and web clients, we want to understand and try
to mitigate web transfer delays relevant to reaching the remote server in order to initiate
Optimizing factors like physical protocol or link capabilities, network congestion, or client
and server processing speeds, while also important and relevant, are not in the scope of
this dissertation.
The first part of this dissertation focuses on understanding the reasons and performance
implications of inflated paths for mobile traffic on the Internet. Routing a request efficiently
through a global network is a non trivial process that involves collaboration between
network devices owned by different organizations. The most direct available path between
a client and a server is not always used, leading to delays caused by traveling longer
distance than necessary [85]. Use of longer paths is not always accidental: Internetwork
routing is policy-based and relies upon contracts among ISPs and content providers as
much as on infrastructure. Previous studies that have shown this [66,88] did not focus
on mobile traffic, which has grown considerably in the last decade. One study focusing
on mobile traffic paths [95] identifies carrier infrastructure as the most important limiting
factor leading to inflated paths. In this work, by analyzing a large data set of real user
requests to a content provider network, we want to get a better understanding of the
reasons that can lead to such inflated paths today, and compare more recent results to
older conclusions, in order to understand the evolution of the infrastructure of mobile
carrier networks, and how it can affect user experience.
The second part of this work focuses on understanding and mitigating application-
layer delays in web transfers. Once a client request has been routed to a server, an
application-layer protocol defines the language that the two hosts should use to interact.
A well-defined and efficient application layer protocol can make a lot of difference on the
effective goodput, which is the useful payload transferred between the end hosts per unit
of time. Since the birth of the World Wide Web, HTTP has been the de-facto protocol for
client-server communication for web transfers. HTTP/1.1 has remained the latest version
of the protocol for almost 20 years, until a new version of the protocol, HTTP/2, started
seeing increased adoption a few years ago [91]. The new version comes with various
optimizations and features. With the exception of prior work conducted in controlled
environments [93], or user studies on user-perceived load times [49,90], the efficiency and
best practices for using the new features of HTTP/2 at a global scale have remained
largely unexplored. This work aims at providing a better understanding of potential
benefits of the new version of the protocol, by analyzing a large data set of real user
HTTP/1.1 downloads and estimating their performance through HTTP/2.
In order to optimize delivery, web site owners often make use of Content Delivery
Networks (CDNs), which help serve content faster by utilizing a proxy server that is close
to the client. This effectively reduces the client-server path, as well as the likelihood of
using a circuitous path to a remote web server. In a CDN context, a client requests a page
from a nearby CDN proxy, which serves the client any embedded objects that it has cached
locally, avoiding trips to the remote origin server when possible. The proxy’s location and
the CDN’s infrastructure and monitoring lay the groundwork for some further application
layer protocol optimizations. Using insights derived from analysis of real user downloads
around the world, we design and implement a way to make use of time the client-proxy
channel is idle waiting for a response from the remote web server, to speculatively and
proactively send data from the CDN proxy to the client.
Thesis statement: Traces from real users enable identification of opportunities at
all layers for more efficient CDN delivery of dynamic web pages in emerging settings and
with emerging protocols.
In the following chapters we discuss the studies that we conducted in detail. The next
paragraphs provide a summary of each of the studies, the motivation for them, questions
that we wanted to answer, and, when applicable, suggestions for optimizations that were
based on observations, and conclusions from each study.
1.1 Diagnosing inflated paths between client and server
Web application performance is highly dependent on round trip times between client
and server. Most web transfers are short and as such usually blocked by RTT, and not
bandwidth [20].
End-to-end network latency is determined by three network-related delays: queueing delay
in intermediate routing devices between end-hosts; processing delay on routing devices and
end hosts; and propagation delay, the time it takes for packets to be transmitted through
the physical medium forming the path between the client and the server. Under normal
network operation, propagation delay dominates these three factors. The longer the path
from a client to a server, the higher the propagation delay. To reduce propagation delay,
content providers continuously expand their infrastructure [52], CDNs are utilized, and
ISPs and mobile carriers deploy infrastructure in an attempt to get closer to their clients.
However, it has been shown that paths can still be circuitous, despite the existence of
shorter paths connecting a client to a server [66,88].
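As a rough illustration (a back-of-the-envelope bound of our own, not a number taken from the measurements discussed later), propagation delay can be lower-bounded from path length by assuming signals travel through fiber at about two thirds of the speed of light:

    t_{\mathrm{prop}} \approx \frac{d}{v_{\mathrm{fiber}}}, \qquad v_{\mathrm{fiber}} \approx \tfrac{2}{3}c \approx 2 \times 10^{8}\ \mathrm{m/s}

so every additional 1000 km of one-way path adds roughly 5 ms of one-way delay, or about 10 ms of round-trip time, before any queueing or processing delay is counted.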
In Chapter 2 we quantify and understand the causes of geographically circuitous
routes from mobile clients. Understanding the impact of Internet topology and routing on
mobile users requires broad, longitudinal network measurements conducted from mobile
devices. For this study, we use 1.5 years of measurements from devices on 4 US carriers.
We identify the key elements that can affect the Internet routes taken by traffic from
mobile users (client location, server locations, carrier topology, carrier/content-provider
peering). We then develop a methodology to diagnose the specific cause for inflated routes.
Although we observe that the evolution of some carrier networks improves performance
in some regions, we also observe many clients - even in major metropolitan areas - that
continue to take geographically circuitous routes to content providers, due to limitations
in the current topologies.
1.2 Understanding performance benefits of HTTP/2
After a path has been chosen and communication has been established between a client and
a server, an application layer protocol defines the language that is used to communicate
between the two end hosts. HTTP/1.1 has been the de-facto application layer protocol
powering Web transfers for the last 20 years. In the last few years, HTTP/2 has been
slowly but surely replacing HTTP/1.1 [91]. This new version of the protocol comes with a
set of improvements designed to reduce page load times.
With the standardization of HTTP/2, content providers want to understand the
benefits and pitfalls of transitioning to the new standard. In Chapter 3, using a large
dataset of HTTP/1.1 page load traces from production traffic on Akamai’s Content
Delivery Network and a model of HTTP/2 behavior that we developed, we obtain the
distribution of performance differences between the two protocol versions for nearly 280,000
real user downloads. We find that HTTP/2 provides significant performance improvements
in the tail, and, for websites for which HTTP/2 does not improve median performance,
we explore how optimizations based on new features can improve performance, and how
these improvements relate to page structure.
1.3 Increasing channel efficiency by speculatively delivering
content to the browser
Inspired by results from our work on estimating the performance impact of HTTP/2, in
Chapter 4 we explore the potential benefits of prepositioning, i.e. speculatively sending
content to browsers before they request it. Proxy servers owned by CDNs are in a unique
position to derive benefits from prepositioning. This is because there are times around a
page load during which client-proxy connections are idle.
When a user loads a page served through a CDN, the CDN proxy can send objects to
the client while it fetches the HTML page from the origin. Before the user then navigates
to another page served by the CDN (a page transition), the CDN has two opportunities
to preposition objects: the time during which the user interacts with the first page (user
think time), and the time to fetch the second page’s HTML from the origin. Put together,
these timeframes provide an opportunity for delivering content to the browser during
times that the proxy-client channel would otherwise be idle, thus reducing page load time.
The challenges in achieving landing-page and page-transition prepositioning are to know
which objects to preposition and to do so in a manner that does not delay page rendering,
while taking into account which objects are shared across multiple pages, which objects
are already cached, and the likelihood of visits to each possible next page. To address
these, we leverage CDN instrumentation, together with a novel optimization framework
that captures these challenges as constraints while striving to minimize expected page load
times. This involved collecting and analyzing a large data set of real user download traces of
page loads served through the CDN, which provide valuable insights on how to efficiently
configure prepositioning. We find that, for page transitions, CDNs can achieve similar
performance as having a warmed up browser cache, and the difference in performance
between the two can be explained by browser cache behavior.
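The full selection framework is developed in Chapter 4. Purely as an illustration of the idea, the Python sketch below (object names, probabilities, and the greedy value-per-byte heuristic are all invented here, not taken from the dissertation's actual formulation) ranks candidate objects for prepositioning under a payload quota, given next-page probabilities and the browser cache state:

    from dataclasses import dataclass

    @dataclass
    class Candidate:
        name: str          # object URL (hypothetical)
        size: int          # bytes it would cost to preposition
        pages: set         # next pages (hypothetical paths) that request this object

    def select_candidates(candidates, next_page_prob, cached, quota_bytes):
        """Greedy sketch: rank uncached objects by the probability that the next
        page will actually use them, then fill the idle-time payload quota."""
        scored = []
        for c in candidates:
            if c.name in cached:                      # already in the browser cache
                continue
            p_used = sum(next_page_prob.get(p, 0.0) for p in c.pages)
            scored.append((p_used, c))
        scored.sort(key=lambda pc: pc[0], reverse=True)

        chosen, used = [], 0
        for p_used, c in scored:
            if p_used > 0 and used + c.size <= quota_bytes:
                chosen.append(c.name)
                used += c.size
        return chosen

    # Toy usage with made-up values:
    cands = [Candidate("app.js", 120_000, {"/home", "/product"}),
             Candidate("hero.jpg", 300_000, {"/home"}),
             Candidate("checkout.css", 40_000, {"/checkout"})]
    print(select_candidates(cands,
                            {"/home": 0.6, "/product": 0.3, "/checkout": 0.1},
                            cached={"hero.jpg"}, quota_bytes=200_000))

A heuristic like this only shows the shape of the decision - expected usefulness per byte, subject to an idle-time quota - and omits the timing, rendering, and cross-page constraints that the actual framework captures.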
1.4 Summary of contributions
In summary, the contributions made in this thesis are as follows:
1. We identify key reasons that can cause mobile clients to take circuitous paths to
reach a remote Web server, using a dataset of connection traces of mobile clients to
Google front-end servers.
2. We develop a model that predicts the impact on page load times when using HTTP/2
instead of HTTP/1.1, and apply that model on a large dataset of real user HTTP/1.1
downloads of web pages served through Akamai’s CDN proxy servers.
3. We evaluate the impact of content prepositioning on real web sites and design
best practices for i) predicting what content is likely to be requested by a browser
imminently, and ii) speculatively sending content to the browser while the network
channel is idle between page navigations.
Chapter 2
Understanding Path Inflation Causes for Mobile Traffic
As mobile Internet becomes more popular, carriers and content providers must engineer
their topologies, routing configurations, and server deployments to maintain good perfor-
mance for users of mobile devices. Understanding the impact of Internet topology and
routing on mobile users requires broad, longitudinal network measurements conducted
from mobile devices. In this work, we are the first to use such a view to quantify and
understand the causes of geographically circuitous routes from mobile clients using 1.5
years of measurements from devices on 4 US carriers. We identify the key elements
that can affect the Internet routes taken by traffic from mobile users (client location,
server locations, carrier topology, carrier/content-provider peering). We then develop
a methodology to diagnose the specific cause for inflated routes. Although we observe
that the evolution of some carrier networks improves performance in some regions, we
also observe many clients - even in major metropolitan areas - that continue to take
geographically circuitous routes to content providers, due to limitations in the current
topologies.
2.1 Introduction
As mobile Internet becomes more popular, carriers and content providers must engineer
their topologies, routing configurations, and server deployments to maintain good per-
formance for users of mobile devices. A key challenge is that performance changes over
space and time, as users move with their devices and providers evolve their topologies.
Thus, understanding the impact of Internet topology and routing on mobile users requires
broad, longitudinal network measurements from mobile devices.
In this work, we are the first to identify and quantify the performance impact of several
causes for inflated Internet routes taken by mobile clients, based on a dataset of 901,000
measurements gathered from mobile devices during 18 months. In particular, we isolate
cases in which the distance traveled along a network path is significantly longer than the
direct geodesic distance between endpoints. Our analysis focuses on performance with
respect to Google, a large, popular content provider that peers widely with ISPs and
hosts servers in many locations worldwide. This rich connectivity allows us to expose the
topology of carrier networks as well as inefficiencies in current routing. We constrain our
analysis to devices located in the US, where our dataset is densest.
Our key results are as follows. First, we find that path inflation is endemic: in the
last quarter of 2011 (Q4 2011), we observe substantial path inflation in at least 47% of
measurements from devices, covering three out of four major US carriers. While the
average fraction of samples experiencing path inflation dropped over the subsequent year,
we find that one fifth of our samples continue to exhibit inflation. Second, we classify root
causes for path inflation and develop an algorithm for identifying them. Specifically, we
identify whether the root cause is due to the mobile carrier’s topology, the peering between
the carrier and Google, and/or the mapping of mobile clients to Google servers. Third, we
characterize the impact of this path inflation on network latencies, which are important
for interactive workloads typical in the mobile environment. We show that the impact on
end-to-end latency varies significantly depending on the carrier and device location, and
that it changes over time as topologies evolve. We estimate that additional propagation
delay can range from at least 5-50ms, which is significant for service providers [66]. We
show that addressing the source of inflation can reduce download times by hundreds of
milliseconds. We argue that it will become increasingly important to optimize routing as
last-mile delays in mobile networks improve and the relative impact of inflation becomes
larger.
2.2 Background
As Internet-connected mobile devices proliferate, we need to understand factors affecting
Internet service performance from mobile devices. In this chapter, we focus on two factors:
the carrier topology, and the routing choices and peering arrangements that mobile carriers
and service providers use to provide access to the Internet.
The device’s carrier network can have multiple Internet ingress points — locations
where the carrier’s access network connects to the Internet. The carrier’s network may
also connect with a Web service provider at a peering point — a location where these two
networks exchange traffic and routes.
Figure 2.1: Optimal routing for mobile clients.
The Domain Name System (DNS) resolvers (generally the carrier's) and the service provider
combine to direct the client to a server for the service by resolving the name of the service
to a server IP address.
Idealized Operation. This chapter focuses on Google as the service provider. To
understand how mobile devices access Google’s services, we make the following assumptions
about how Google maps clients to servers to minimize latency. First, Google has globally
distributed servers, forming a network that peers with Internet service provider networks
widely and densely [?,?]. Second, Google uses DNS to direct clients (in our case, mobile
devices) to topologically nearby servers. Last, Google can accurately map mobile clients
to their DNS resolvers [?]. Since its network’s rich infrastructure aims at reducing client
latency, Google is an excellent case study to understand how carrier topology and routing
choices align with Google’s efforts to improve client performance.
We use Figure 2.1 to illustrate the ideal case of a mobile device connecting to a Google
server. A mobile device uses DNS to look up www.google.com. Google’s resolver returns
an optimal Google destination based on a resolver-server mapping. Traffic from the device
traverses the carrier’s access network, entering the Internet through an ingress point.
Ideally, this ingress point is near the mobile device’s location. The traffic enters Google’s
network through a nearby peering point and is routed to the server.
In this chapter, we identify significant deviations from this idealized behavior. Specifi-
cally, we are interested in metro-level path inflation [?], where traffic from a mobile client
to a Google server exits the metropolitan (henceforth metro) area even though Google has
a presence there. This metro-level inflation impacts performance by increasing latency.
Example Inflation. Carrier topology determines where traffic from mobile hosts enters
the carrier network. Prior work has suggested that mobile carriers have relatively few
ingress points [95]. Therefore, traffic from a client in the Los Angeles area may enter the
Internet in San Francisco because the carrier does not have an ingress in Los Angeles. If
the destination service has a server in Los Angeles, the topology can add significant latency
compared to having an ingress in LA. Routing configurations and peering arrangements
can also cause path inflation. As providers move services to servers located closer to
clients, the location where carriers peer with a provider’s network may significantly affect
performance. For instance, if a carrier has ingress points in Seattle and San Francisco, but
peers with a provider only in San Francisco, it may route Seattle traffic to San Francisco
even if the provider has a presence in Seattle.
2.3 Dataset
Data Collected. Our data consists of network measurements (ping, traceroute, HTTP
GET, UDP bursts and DNS lookups) issued from Speedometer, an internal Android
app developed by Google and deployed on thousands of volunteer devices. Speedometer
conducts approximately 20-25 measurements every five minutes, as long as the device has
sufficient remaining battery life (80%) and is connected to a cellular network.¹
¹ The app source is available at: https://github.com/Mobiperf/Speedometer
Our analysis focuses on measurements toward Google servers including 310K traceroutes,
300K pings and 350K DNS lookups issued in three three-month periods (2011 Q4,
2012 Q2 and Q4). We focus on measurements issued by devices in the US, where the
majority of users is located, with a particular density of measurements in areas with large
Google offices. All users running the app have consented to sharing collected data in an
anonymized form.² Some fields are stripped (e.g. device IP addresses, IDs), others are
replaced by hash values (e.g. HTTP URLs). Location data is anonymized to the center of
a region that contains at least 1000 users and is larger than 1 km².
The above measurements are part of a dataset that we published to a Google Cloud
Storage bucket and released under the Creative Commons Zero license.³ We also provide
Mobile Performance Maps, a visualization tool to navigate parts of the dataset, understand
network performance and supplement the analysis in this chapter: http://mpm.cs.usc.edu.
Finding Ingress Points. In order to identify locations of ingress points, for each
carrier, we graphed the topology of routes from mobile devices to Google, as revealed
by the traceroutes in our dataset. We observe that traceroutes from clients in the same
regions tend to follow similar paths. We used the DNS names of routers in those paths
to identify the location of hops at which they enter the public Internet. In general, the
traceroutes form well-defined structures, starting with private or unresolvable addresses,
where all measurements from a given region reach the Internet in a single, resolvable
location, generally a point of presence of the carrier’s backbone network. We define this
location as the ingress point.
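As a simplified illustration of this procedure (the hop-to-metro mapping here is a hypothetical stub; in practice the locations come from parsing carrier-specific DNS naming conventions), the sketch below marks the ingress as the first public, resolvable hop of a traceroute:

    import ipaddress
    import re

    # Hypothetical mapping from airport codes embedded in router hostnames to metro areas.
    AIRPORT_TO_METRO = {"sea": "Seattle", "sfo": "San Francisco", "chi": "Chicago",
                        "lga": "New York", "lax": "Los Angeles"}

    def hostname_metro(hostname):
        """Return a metro hint if any token of the reverse-DNS name is a known airport code."""
        for token in re.split(r"[.\-]", (hostname or "").lower()):
            if token in AIRPORT_TO_METRO:
                return AIRPORT_TO_METRO[token]
        return None

    def find_ingress(traceroute):
        """traceroute: list of (ip, reverse_dns) hops, ordered from the device outward.
        The ingress is the first hop with a public, resolvable address, i.e. where the
        carrier's access network reaches the public Internet."""
        for ip, rdns in traceroute:
            try:
                public = ipaddress.ip_address(ip).is_global
            except ValueError:
                public = False
            if public and rdns:
                return hostname_metro(rdns)
        return None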
² Google's privacy and legal teams reviewed and approved data anonymization and release.
³ http://commondatastorage.googleapis.com/speedometer/README.txt
          AT&T   Sprint   T-Mobile   Verizon
Q4 2011   0.98   0.10     0.65       0.47
Q2 2012   0.98   0.21     0.25       0.15
Q4 2012   0.00   0.21     0.20       0.38
Table 2.1: Fraction of traceroutes from major US carriers with metro-level inflation.
Finding Peering Points. To infer peering locations between the carriers and Google,
we identified for each path the last hop before entering Google’s network, and the first
hop inside it (identified by an IP address from Google’s blocks). Using location hints in
the hostnames of those hop pairs, we infer peering locations for each carrier [?]. In cases
where the carrier does not peer with Google (i.e., sends traffic through a transit AS), we
use the ingress to Google’s network as the inferred peering location.
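A similarly simplified sketch of the peering inference (it reuses the hypothetical hostname_metro helper from the ingress sketch above, and takes Google's announced prefixes as a placeholder input):

    import ipaddress

    def find_peering(traceroute, google_prefixes):
        """Return (carrier_side_metro, google_side_metro) around the hop pair where
        the path enters Google's network. google_prefixes: iterable of CIDR strings.
        Reuses hostname_metro() from the ingress sketch above."""
        nets = [ipaddress.ip_network(p) for p in google_prefixes]
        prev = None
        for ip, rdns in traceroute:
            if any(ipaddress.ip_address(ip) in net for net in nets):
                # prev = last hop before Google, (ip, rdns) = first hop inside Google
                return (hostname_metro(prev[1]) if prev else None, hostname_metro(rdns))
            prev = (ip, rdns)
        return (None, None)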
2.4 A Taxonomy of Inflated Routes
Types of Path Inflation. Table 2.1 shows, for traceroutes in our dataset from the four
largest mobile carriers in the US, the fraction of routes that incurred a metro-level path
inflation.
For three of the four carriers, roughly half or more of all traceroutes to Google experienced a
metro-level deviation in Q4 2011. Further, nearly all measurements from AT&T customers
traversed inflated paths to Google. Note that these results are biased toward locations of
users in our dataset and are not intended to be generalized. Nevertheless, at a high-level,
this table shows that metro-level deviations occur in routes from the four major carriers,
even though Google deploys servers around the world to serve nearby clients [66]. However,
we also observe that the fraction of paths experiencing metro-level inflation decreases
significantly over the subsequent 12 months. As we will show, we can directly link some
of these improvements to the topological expansion of carriers.
In the rest of the chapter, we examine path inflation to understand its causes and
to explore what measures carriers have adopted to reduce or eliminate it. We begin by
characterizing the different types of metro-level inflations we see in our dataset. We split
the end-to-end path into three logical parts: client to carrier ingress point (Carrier Access),
carrier ingress point to service provider ingress point (Interdomain), and service provider
ingress point to destination server (Provider Backbone). Then we define the following
observed traffic patterns of inflated routes:
Carrier Access Inflation. Traffic from a client in metro area L (Local) enters the Internet
in metro area R (Remote), and is directed to a Google server in R.
Interdomain Inflation. Traffic from a client in area L enters the carrier’s backbone in L,
then enters Google’s network in area R and is directed to a Google server there.
Carrier Access-Interdomain Inflation. Traffic from a client in metro area L enters the
carrier’s backbone in metro area R, then enters Google’s network back in area L and is
directed to a Google server there.
Provider Backbone Inflation. Traffic from a client in area L enters the carrier’s backbone
and Google’s network in area L, but is directed to a Google server in a different area R.
In all cases, Google servers are known to exist in both metro areas L and R.
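These patterns can be stated compactly. The sketch below is only an illustration of the taxonomy (not analysis code from this work); each argument is the metro-area label of the corresponding point on the path, with the client in area L:

    def classify_inflation(client, carrier_ingress, google_ingress, server):
        """Label a measurement, assuming Google has servers in every area involved."""
        L = client
        if carrier_ingress != L and server == carrier_ingress:
            return "Carrier Access Inflation"             # hauled out of L, served remotely
        if carrier_ingress == L and google_ingress != L and server == google_ingress:
            return "Interdomain Inflation"                # left L only to reach Google's network
        if carrier_ingress != L and google_ingress == L and server == L:
            return "Carrier Access-Interdomain Inflation" # detoured out of L and back
        if carrier_ingress == L and google_ingress == L and server != L:
            return "Provider Backbone Inflation"          # inflated inside Google's network
        return "none of the four patterns"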
Possible Causes of Path Inflation. If a carrier lacks sufficient ingress points from its
cellular network to the Internet, it can cause Carrier Access Inflation.
[Figure 2.2: schematic of three metro areas (A, B, C) with users A, B, and C behind cell towers,
carrier ingress points (labeled Ingress / No Ingress), Google servers, and peering points (labeled
Peering / No Peering), connected through the Internet.]
Figure 2.2: Different ways a client can be directed to a server. User A is the ideal case, where
the traffic never leaves a geographical area. User B and C’s traffic suffers path inflation, due to
lack of ingress point and peering point respectively.
For example, if a carrier has no Internet ingress points in metro area L, it must send the traffic
from L to another area R (Figure 2.2, user B). If a carrier's access network ingresses into the Internet
in metro-area L, a lack of peering between the mobile carrier and Google in metro-area L
causes traffic to leave the metro area, resulting in Interdomain Inflation (Figure 2.2, user
C). If a carrier has too few ingresses and lacks peering near its ingresses, we may observe
Carrier Access-Interdomain Inflation. In this case a carrier, lacking ingress in area L,
hauls traffic to a remote area R, where it lacks peering with Google. A peering point
exists in area L, so traffic returns there to enter Google’s network. Though a provider
like Google has servers in most major metropolitan areas, it can still experience Provider
Backbone Inflation if either Google or the mobile carrier groups together clients in diverse
regions when making routing decisions. In this case, Google directs at least some of the
clients to distant servers. Google may also route a fraction of traffic long distances across
its backbone for measurement or other purposes.
Identifying root causes. We run one or more of the following checks, depending on
the inflated part(s) of the path, to perform root cause analysis (illustrated in Figure 2.3).
Examining Carrier Access Inflation. For inflated carrier access paths, we determine
whether the problem is the lack of an available nearby ingress point. To do so, we examine
the first public IP addresses for other traceroutes issued by clients of the same carrier in
the same area. If none of those addresses are in the client's metro area, we conclude there
is a lack of available local ingress.
Examining Interdomain Inflation. For paths inflated between the carrier ingress point
and the ingress to Google’s network, we determine whether it is due to a lack of peering
near the carrier’s ingress point. We check whether any traceroutes from the same carrier
enter Google’s network in that metro area, implying that a local peering exists. If no such
traceroutes exist, we infer a lack of local peering.
Examining Provider Backbone Inflation. For paths inflated inside Google’s network, we
check for inefficient mappings of clients to servers. We look for groups of clients from
different metro areas all getting directed to servers at either one or the other area for some
period, possibly flapping between the two areas over time. If we observe that behavior,
we infer inefficient client/resolver clustering.
A small number of traceroutes (< 2%) experienced inflated paths but did not fit any
of the above root causes. These could be explained by load balancing, persistent incorrect
mapping of a client to a resolver/server, or a response to network outages.
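A condensed sketch of these three checks (illustrative only; the real analysis operates on the full set of traceroutes described above, and a path can have more than one inflated part):

    def diagnose(inflated_parts, local_ingress_seen, local_peering_seen, single_destination):
        """inflated_parts: subset of {"access", "interdomain", "backbone"} for one path.
        The boolean inputs summarize what other traceroutes from the same carrier
        and metro area show."""
        causes = []
        if "access" in inflated_parts and not local_ingress_seen:
            causes.append("lack of local ingress point")
        if "interdomain" in inflated_parts and not local_peering_seen:
            causes.append("lack of local peering point")
        if "backbone" in inflated_parts and single_destination:
            # all clients in the area directed to a single server location at a time
            causes.append("inefficient client/resolver clustering")
        return causes or ["unclassified (e.g. load balancing or outage response)"]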
[Figure 2.3: decision flowchart. The end-to-end path is split into Carrier Access, Interdomain,
and Provider Backbone parts. For each inflated part a question is asked: are there any traces
with a first hop in this area? are there any traces served by a local target without exiting the
area? are all traces directed to exactly one destination at any given time? The answers lead to
the diagnoses: lack of local ingress point, lack of local peering point, or inefficient client
clustering.]
Figure 2.3: Root cause analysis for metro-level inflation.
2.5 Results
We first present examples of the three dominant root causes for metro-level inflation. We
then show aggregate results from our inflation analysis, its potential impact on latency,
and the evolution of causes of path inflation over time.
Case studies. For each root cause, we now present one example. For each example,
we describe what the traceroutes show, what the diagnosis was, and note the estimated
performance hit, ranging from 7-72% extra propagation delay. We constrain our analysis
to the period between late 2011 and mid 2012, where the dataset is sufficiently dense.
Lack of ingress point. We observe that all traceroutes to Google from AT&T clients in the
NYC area enter the public Internet via an ingress point in Chicago. Thus, Google directs
these New York clients to a server in the Chicago area, even though it is not the server
geographically closest to the clients. These Chicago servers are approximately 1074km
further from the clients than the New York servers are, leading to an expected minimum
additional round-trip latency of 16ms (7% overhead) [?].
Lack of peering. We observe AT&T peering with Google near San Francisco (SF),⁴ but
not near Los Angeles (LA) or Seattle. Therefore, Google directs clients in those two areas
to servers in SF rather than in their local metros. While our data in these regions become
sparse after mid 2012, we verified that this inflation persists for clients from LA in Q2
2013. The observed median RTT for Seattle users served by servers in SF is 90ms. Since
those servers are 1089km farther away from the servers nearest to the Seattle users, they
experience a delay inflation of at least 16ms (21%). As a result, loading even a simple
website like the Google homepage requires an additional 160ms.
Coarse client-server mapping granularity or Inefficient client/resolver clustering. We
observe a behavior for Verizon clients that suggests that Google is jointly directing clients
in Seattle and SF. At any given time, traffic from both areas was directed towards the same
Google servers, either in the Seattle or in the SF area, therefore exhibiting suboptimal
performance for some distant clients. Figure 2.4 illustrates this behavior over a 2-month
period. Normally, users served by servers in their metro area observe a median RTT of
22ms and 45ms for SF and Seattle respectively. However, when users in one area are
served by servers in the other area (indicated by the filled pattern in the figure), the
additional 1089km one-way distance adds an extra 16ms delay (an overhead of 72% and
35% for SF and Seattle users respectively).
Inflation Breakdown by Root Cause. In this section, we show aggregated statistics
of some of the observed anomalies that cause performance degradation. We focus on Q4
2011 and on AT&T and Verizon Wireless, the period and carriers for which the dataset
is the densest.
⁴ For the granularity of our analysis, we treat all locations in the Bay Area as equivalent.
[Figure 2.4: two time series of measurement counts between Nov 15 and Dec 15, showing how many
measurements were directed to the Seattle server vs. the SF server, for (a) SF clients and (b)
Seattle clients.]
Figure 2.4: Server selection flapping due to coarse client-server mapping. Dashed areas denote
measurements where the client was directed to a remote server.
Carrier   Closest Server   Count   Fraction Inflated   I   P   D   Extra Dst. (km)   Extra RTT (ms)   Extra PLT (ms)
AT&T      SF               7759    1.00                x   x       4200              31.5             315
AT&T      Seattle          303     1.00                    x       2106              15.8             158
AT&T      NYC              2720    1.00                x           2148              16.1             161
Verizon   SF               20528   0.30                        x   2178              16.3             163
Verizon   Seattle          2435    0.33                        x   1974              14.8             148
Verizon   NYC              7029    0.98                            694               5.2              52
Table 2.2: Overall results for two carriers in Q4 2011. The table shows what fraction of all
traceroutes from clients in three different locations presented a deviation, cause of the deviation
(I = Ingress, P = Peering, D = DNS/clustering), extra distance traveled (round-trip), extra round
trip time (RTT), and extra page load time (PLT) when accessing the Google homepage.
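The delay and page-load columns above appear consistent with a simple lower-bound calculation: propagation at roughly 200 km/ms in fiber, a fiber path assumed to be about 1.5 times longer than the geodesic detour, and on the order of ten round trips to fetch the Google homepage. The 1.5 stretch factor and the ten-round-trip figure are our reading of the numbers, not values stated in the text; the sketch below reproduces the AT&T/SF row under those assumptions:

    FIBER_KM_PER_MS = 200   # roughly two thirds of the speed of light in vacuum
    PATH_STRETCH = 1.5      # assumed fiber-path length relative to geodesic distance
    RTTS_PER_PAGE = 10      # assumed round trips needed to load the Google homepage

    def extra_delay(extra_roundtrip_km):
        extra_rtt_ms = extra_roundtrip_km * PATH_STRETCH / FIBER_KM_PER_MS
        return extra_rtt_ms, extra_rtt_ms * RTTS_PER_PAGE

    print(extra_delay(4200))   # AT&T / SF row: (31.5, 315.0) ms, matching Table 2.2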
We also focus on three large metropolitan areas that were populated enough to generate
significant data (SF, New York and Seattle). Google servers exist in
all three areas. For all measurements issued from those areas, we quantify the fraction of
metro-level inflations and determine the root cause. We believe that the path inflation
observed in those areas implies probable inflation in less-populated regions.
Table 2.2 shows aggregate results for the three regions. For each case, it includes the
extra round-trip distance traveled as well as a loose lower bound of the additional delay
incurred by traveling that distance, based on the speed of data through fiber [?]. We
observed inflated routes from all regions for both carriers. Most of the traceroutes from
Verizon clients in the NYC area went to servers near Washington, D.C., but we were
unable to discern the exact cause. This represents a small geographic detour and may
not impact performance in practice. Verizon clients from the Seattle and SF metro were
routed together, possibly as a result of using the same DNS resolvers, as described in our
case study above. For all traces from AT&T clients in the NYC area, the first public
AT&T hop is in Chicago, indicating a lack of a closer ingress point. AT&T clients from
the SF area were all served by a nearby Google server. However, traffic went from SF to
Seattle before returning to the server in SF. In the traceroutes, the first public IP address
was always from an AT&T router in Seattle, suggesting a lack of an ingress point near
SF, and increasing the RTT by at least 31ms for all traffic. This behavior progressively
disappeared in early 2012, with the observed appearance of an AT&T ingress point in
the SF area. An informal discussion with the carrier confirms initial deployment of this
ingress in 2011. Note that traceroutes from clients in Seattle were also routed to Google
targets in the SF area. Though Seattle traffic reached a local ingress, AT&T routed it
to SF before handing it to Google’s network, indicating a lack of peering in Seattle and
explaining why traffic from SF clients returned to SF after detouring to Seattle.
Evolution of Root Causes. As suggested above, carriers’ topologies have evolved
over time. Since our dataset is skewed towards some regions, we cannot enumerate the
complete evolution of carrier topology and routing configuration, but can provide insight
into why we see fewer path inflation instances over time for some carriers.
Ingress Points. Figure 2.5 maps the observed ingress points at the end of 2011. While
our dataset is limited, we can see indications of improvement. An earlier study [95] found
4-6 ingress points per carrier, whereas our results indicate that some carriers doubled this
figure. This expansion opens up the possibility of much more direct routes from clients to
services.
[Figure 2.5: US map of the ingress points observed for AT&T, Sprint, T-Mobile, and Verizon,
marked by metro-area airport codes including SEA, PDX, SFO, LAX, SAN, PHX, SLC, DEN, DFW, HOU,
MIA, ATL, MSP, OMA, MCI, BNA, CHI, BOS, LGA, PHL, DCA, and CVG.]
Figure 2.5: Observed ingress points for major US carriers. Locations are labeled with airport
codes belonging to the ingress metro area.
Carrier    Peering locations (2011 Q4)                        (2012 Q2)     (2012 Q4)
AT&T       CHI, DFW, HOU, MSP, PDX, SAT, SFO                  + ATL, CMH    + DEN
Sprint     ASH, ATL, CHI, DFW, LGA, SEA, SFO                  + LAX
T-Mobile   DCA, DFW, LAX, LGA, MSP, SEA, SFO                  + MIL         + MIA
Verizon    ATL, CHI, DAL, DCA, DFW, HOU, LAX, SCL, SEA, SFO   + ASH, MIA
Table 2.3: Observed peering locations between carriers and Google. Locations are identified by
airport codes belonging to the metro area.
Additionally, we noticed the appearance of AT&T ingresses in SF and LA, and
of at least one Sprint ingress point in LA during the measurement period.
Peering points. Table 2.3 summarizes the peering points that we observe. In 2011, most
traceroutes from Sprint users in LA are directed to Google servers in Texas or SF. In
measurements from Q2 2012, we observed an additional peering point between Sprint and
Google near LA. Around the same time, we observe that Google started directing Sprint’s
LA clients to LA servers.
2.6 Path Inflation Today
Our measurements show that many instances of path inflation in the US disappeared
over time. However, in addition to the persistent lack of AT&T peering in the LA area
mentioned earlier, we see evidence for inflated paths in other regions of the world (from
Q3 2013 measurement data). For example, clients of Nawras in Oman are directed to
servers in Paris, France instead of closer servers in New Delhi, India. This increases the
round trip distance by over 7000km, and may be related to a lack of high-speed paths to
the servers in India. We also see instances of path inflation in regions with well-developed
infrastructure. E-Plus clients in southern Germany are directed to Paris or Hamburg
servers instead of a nearby server in Munich, and Movistar clients in Spain are directed
to servers in London instead of local servers in Madrid. These instances suggest that path
inflation is likely to be a persistent problem in many parts of the globe, and motivate
the design of a continuous measurement infrastructure for identifying instances of path
inflation, and diagnosing their root causes.
2.7 Chapter Summary
This chapter took a first look into diagnosing path inflation for mobile client traffic,
using a large collection of longitudinal measurements gathered by smartphones located in
diverse regions and carrier networks. We provided a taxonomy of causes for path inflation,
identified the reasons behind observed cases, and quantified their impact. We found that
a lack of carrier ingress points or provider peering points can cause lengthy detours, but,
in general, routes improve as carrier and provider topologies evolve.
Chapter 3
Analyzing H1 traces to estimate H2 performance (RTH2)
With the standardization of HTTP/2, content providers want to understand the benefits
and pitfalls of transitioning to the new standard. Using a large dataset of HTTP/1.1
resource timing data from production traffic on Akamai’s CDN, and a model of HTTP/2
behavior, we obtain the distribution of performance differences between the protocol
versions for nearly 280,000 downloads. We find that HTTP/2 provides significant performance
improvements in the tail, and, for websites for which HTTP/2 does not improve median
performance, we explore how optimizations like prioritization and push can improve
performance, and how these improvements relate to page structure.
3.1 Introduction
HTTP/2 is replacing HTTP/1.1 as the IETF standard for the delivery of web traffic
and is already supported by major browsers. The design of HTTP/2 has been motivated
by concerns about the performance of HTTP/1.1. The aspect of web performance most
relevant to end-users is page load time (PLT), which has been shown to correlate with
content provider revenue, so content providers have gone to great lengths to optimize it.
HTTP/2 is a step in that direction: it multiplexes objects on a single TCP connection,
permits clients to specify priorities, and allows servers to push content speculatively.
Several prior studies have shown mixed results on the performance difference between
HTTP/1.1 and HTTP/2 [61, 78, 93]. The relative performance of these two protocols
has been hard to assess because modern web pages have complex dependencies between
objects, and can contain objects hosted on different sites. Many of these prior studies are
focused on lab environments, and some have not used real browsers as test agents, which
can restrict visibility into browser-side tasks like resource parsing, execution or rendering
time.
This has motivated us to study the performance of HTTP/2 using data collected from
live page views by real end-users. Our study uses HTTP/1.1 Resource Timing [31] data
collected from a broad set of customers on a major CDN (Akamai). The data we collect
consists of detailed timing breakdowns for the base page and each embedded resource on
a small sample of all page views.
Collecting this data across many different pages over days or weeks provides a large
body of performance data across a wide range of page types, browsers, geographies and
network conditions. Notably, the values for these parameters that we sample are the
ones that are actually used by the web page. For example, if a particular page has most
end-users in Africa, our data reflects the page performance in Africa, as opposed to the
page performance in a lab emulating African conditions. Furthermore, over time we collect
many samples for the same page, which allows us to report statistics that reflect real-world
timing variations within a single page, such as those induced by the browser cache hit
rate, as well as variations based on localization, personalization, cookies, or just differences
between back to back downloads of the same page due to dynamic content.
Contributions. The first contribution of this paper is a model, called rt-h2, that takes
the resource timing data for a single HTTP/1.1 page view, and estimates the difference in
page load times for that page view between HTTP/1.1 and HTTP/2. To do this, rt-h2
models four important components of HTTP/2: multiplexing, push, prioritization, and
frame interleaving. rt-h2 also contains a model of TCP that is reasonably accurate for
Web transfers.
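As a very rough illustration of the waterfall-transformation idea only (this toy sketch is not the rt-h2 model, which additionally handles push, prioritization, frame interleaving, and TCP dynamics), HTTP/1.1 resource timings can be replayed as if each origin used a single multiplexed connection:

    from dataclasses import dataclass

    @dataclass
    class Resource:
        origin: str       # scheme://host the object was fetched from
        ready_at: float   # ms: when the browser was ready to request it
        ttfb: float       # ms: server think time plus one network round trip
        transfer: float   # ms: time spent receiving the response body

    def toy_h2_plt(resources):
        """Crude page-load estimate if each origin used one HTTP/2 connection:
        requests go out as soon as they are ready (no queueing behind the
        six-connection limit) and response bodies share the origin's downlink
        back to back, in ready order."""
        finish = []
        for origin in {r.origin for r in resources}:
            link_free = 0.0
            for r in sorted((r for r in resources if r.origin == origin),
                            key=lambda r: r.ready_at):
                body_start = max(r.ready_at + r.ttfb, link_free)
                link_free = body_start + r.transfer
                finish.append(link_free)
        return max(finish)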
Our second contribution is to explore the PLT differences between HTTP/1.1 and
HTTP/2 from nearly 280,000 page views of customers of Akamai. Of these, we select
55 distinct websites which have a significant number of instrumented page views and
explore the relative performance under zero packet loss. In this setting, page structure
and the diversity of the client base (in terms of location, browser type, etc.) should
determine performance. We found that roughly 60% of the time HTTP/2 has smaller
PLT and 28% of the time it has negligible impact, but for some websites it leads to
performance degradation more often than for others. We explored two optimizations,
prioritization and push. Push provided more improvement for cases where HTTP/2 was
already beneficial, and both helped the cases that saw degradation with HTTP/2.
Taken together, our findings indicate that CDNs should start experimenting with
HTTP/2 at scale, as it can have benefits for many clients of their customers.
3.2 Background and Approach
In this section, we provide some background into HTTP/1.1 and HTTP/2, then discuss
challenges in understanding the relative performance of these two protocols and finally
provide an overview of our approach.
A typical web page consists of tens of resources fetched from many different servers.
Many of these objects have dependencies between them. A base page HTML file is
downloaded before sub-resources can be requested. Once sub-resources are downloaded
and parsed, they can trigger the downloads of other objects. The user-perceived latency
in loading a web page is a complex combination of the time taken to download, parse and
render (if needed) all objects.
HTTP/1.1. The original HTTP/1.0 specification only allowed for one response-request
stream to be transferred per TCP connection. In order for a new request to be sent out,
the current one needed to download completely. Recognizing the performance impact
of this decision, HTTP/1.1 defined an optional pipelining mode, in which a client can
send multiple requests without waiting for replies, but the server is required to deliver all
replies in the order the requests were received. However, pipelining is rarely used: it is
buggy in many implementations, it is limited to 2 resources by design in some clients and
off by default in others, and it does not play well with proxies [15].
Subsequent optimizations enabled parallel downloads by opening multiple concurrent
connections to the server. Browsers typically limit themselves to six parallel connections
per hostname. To work around this limit and achieve faster downloads, domain sharding is used to
partition objects across different hostnames.
HTTP/2. HTTP/2 allows multiple, concurrent requests to be outstanding on the same
TCP connection. This prevents the case where a resource that the browser is ready to
load is forced to wait for an idle connection. It also allows for explicit prioritization of
the delivery of resources. For example, when a server has received a request for both
an image and a Javascript object, and it has both ready to deliver, the protocol allows
(but does not mandate) that the Javascript be given priority for the connection. This
prioritization facilitates parallelization of client processing and downloading. It also
provides a mechanism for a server to push content to a client without receiving a request
from it. While the standard does not specify best practices for pushing objects, the intent
of this mechanism is to enable servers to keep the pipe to the client as busy as possible.
HTTP/2 also uses header compression, which can affect performance. Uncompressed
GET requests for tens of objects in a web page can take a few round-trips to put on the
wire, especially with TCP in slow-start. Header compression allows many requests to fit
in a single burst [14]. In practice, HTTP/2 content is also delivered over HTTPS, since
browsers only support HTTP/2 over TLS. Except for potential delays in connection setup,
this factor is not expected to strongly influence the performance of web pages.
Page Load Time. Both HTTP/1.1 and HTTP/2 contain performance optimizations
whose goal is to reduce the user-perceived latency of downloading web pages, called
the page load time (PLT). Recent web performance studies have converged upon an
operational definition of PLT [71], which is when the browser fires the onLoad event.
Understanding relative performance: Challenges. HTTP/2 contains several opti-
mizations that should result in better performance than HTTP/1.*, but these performance
benefits may not always be realized in practice. First, while mechanisms for prioritization
and push are defined in the standard, actual performance improvements may depend
upon the specific policies that Web servers implement for these optimizations. Second,
interactions with TCP can limit the performance advantages of HTTP/2. Compared with
retrieving objects over parallel connections, a single multiplexed channel has a smaller
aggregate congestion window that grows more slowly. Moreover, parallel connections are
more forgiving of loss: a drop on one connection only triggers recovery on that connection.
Our Approach. We use Resource Timing [31] data collected using Javascript from a
broad set of customers on Akamai’s CDN. When enabled by a customer, Akamai servers
insert a small body of Javascript into 1% of that customer's pages as they are delivered
to end users. The script triggers the monitoring of per-resource timing information,
which includes the start/end timestamps for: DNS lookup, TCP connection setup, TLS
handshake if any, request sent to the server, and the start and end of the response from the
server [31]. The script then encodes that information into a trie structure and delivers it
to an Akamai back-end system. Over a selected one-week period we observed data for
about 44,000 distinct base-page hostnames and 3.4 million distinct base-page URLs.
Unlike prior work [61, 78, 93], our data consists of detailed timing breakdowns for
the base page and each embedded resource from real clients. From this information, we
obtain realistic networks delays and browser side processing and rendering delays for the
complete set of resources in a page, and are able to assess PLTs as reported by browser
onLoad events.
However, our dataset captures HTTP/1.* downloads, so the primary challenge we face
in this work is how to predict the page load performance for this dataset under HTTP/2.
Using a real HTTP/2 deployment on the CDN is, at the moment, not an option, because
of the complexity and scale of the endeavor. So, we resort to using a model of HTTP/2,
as described in the next section.
3.3 The RT-H2 Model
In this section, we describe a model and an associated simulator that implements it
(rt-h2), for estimating HTTP/2 PLTs from HTTP/1.1 resource timing data.
Input. The input to rt-h2 is the Resource Timing (RT) data for a single HTTP/1.1
download of a website from a real client. The input can be visualized as a waterfall.
Fig. 3.1 (left) illustrates a simplified waterfall for a page downloaded via HTTP/1.1,
containing seven objects: a base HTML page, one CSS file, two Javascript files, and three images.
The HTML file is downloaded first and it is parsed as it is being downloaded. So, even
before 1.html completes, the browser has determined that 2.css, 3.js and 4.png need
to be downloaded next. These three resources depend on 1.html, and that HTML page
is said to be a parent of these resources. However, not all of these resources can be
immediately downloaded: most browsers limit the number of parallel connections to a
given website, and Fig. 3.1 (left) shows a simplified example with at most two parallel
connections. Therefore, only the download of 2.css is initiated, and other objects are
blocked until 1.html has downloaded.
The waterfall diagram also illustrates three other important features that can be
gleaned from RT data. First, when 3.js completes, it triggers the download of 5.js. The
time between the completion of 3.js’s download and the request for 5.js represents the
processing time for 3.js. The processing time for 5.js is also visible since that Javascript
Figure 3.1: Transformation of an HTTP/1.1 waterfall to an HTTP/2 waterfall.
triggers the downloads of two images 6.png and 7.png. Second, 5.js is an example of
3rd-party content (3PC). Recall that our dataset consists of websites hosted on a major
CDN. A particular object is 3PC if it is not hosted on the CDN. Examples of 3PC include
ads, tags, analytics, and external JavaScript files that can trigger the download of other
3PC or origin content. Finally, the dashed line in Fig. 3.1 runs through objects that
represent the critical path in the waterfall. The critical path of the waterfall demarcates
objects whose download and processing times determine the PLT. The browser’s OnLoad()
event is triggered after 6.png is downloaded, so 6.png and its ancestors are on the critical
path.
The waterfall also explicitly contains four kinds of download timing information. The
blue boxes (Request) mark the time from when the object was requested by the client to
the time when the first byte of the object was received. The red boxes (Download) mark
the time from when the first byte of the object was received by the browser, to when the
last byte of the object was received. The brown boxes (Blocked) represent the duration of
time between when an object could have been retrieved and when the request was actually
made. The green boxes (3PC) demarcate the retrieval of third-party content.
Real waterfalls have the same elements as in Fig. 3.1, but can be significantly more
complex, with hundreds of objects, several levels of dependencies, and multiple sources of
third-party content.
Output. The output of rt-h2 is a transformed version of the input waterfall, produced
by applying the features of HTTP/2 to the real HTTP/1.1 waterfall, together with the
percentage change in PLT, which we denote ΔPLT.
Components of rt-h2. Both HTTP protocol versions are complex, and HTTP/2
contains many optional features with unspecified policies or best practices. rt-h2 is
designed to be able to explore what-if scenarios of different combinations of policies or
optional features. It models HTTP/2 in a layered fashion and has several components, as
shown in Fig. 3.2. We describe each of these below.
Multiplexing. In HTTP/2, a client maintains a single TCP channel with any one server, on
which resources are multiplexed. rt-h2’s multiplexing component, which operates at the
object level, analyzes the input waterfall to determine which objects can be multiplexed.
With HTTP/2, any resource with a URI covered by the certificate of the origin can reuse
the same channel. rt-h2 parses resource URLs and looks for patterns resembling the base
page URL. Because strict string matching does not cover all the cases (www.example.com
and img.xmpl.com can in fact be the same origin), we also assume that any hostname that
serves more than 5 resources in the same download must be origin content. The output of
the multiplexing component is a collection of sets of objects that can be multiplexed by
HTTP/2 because they come from the same server.
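To make this grouping heuristic concrete, the following sketch approximates it; the field names, the registrable-domain guess, and the threshold default are illustrative and not the actual rt-h2 implementation:

```python
from collections import defaultdict
from urllib.parse import urlsplit


def multiplex_sets(resource_urls, base_page_url, origin_threshold=5):
    """Partition a waterfall's resources into one set assumed to share the
    origin's HTTP/2 connection and per-host leftovers treated as 3PC."""
    base_host = urlsplit(base_page_url).hostname or ""
    base_domain = ".".join(base_host.split(".")[-2:])  # crude registrable-domain guess

    by_host = defaultdict(list)
    for url in resource_urls:
        by_host[urlsplit(url).hostname or ""].append(url)

    origin, third_party = [], {}
    for host, urls in by_host.items():
        # Heuristics from the text: the hostname resembles the base page's
        # domain, or it serves "many" (> 5) resources in this page view.
        if host.endswith(base_domain) or len(urls) > origin_threshold:
            origin.extend(urls)
        else:
            third_party[host] = urls
    return origin, third_party
```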
Figure 3.2: rt-h2 Components
Browser-cached resources are not included in the multiplexer’s output. RT data does
not explicitly mark cached resources, so rt-h2 treats an object as cached if its retrieval
time in the original waterfall is less than 10ms. For cached objects, rt-h2 preserves
the timing from the original waterfall. Moreover, rt-h2 does not attempt to model
multiplexing of 3PC content and simply preserves the duration, in the waterfall, of each
3PC resource.
Push. This component emulates the ability of an HTTP/2 server to proactively send
resources to a client without the client having to request them. Push can keep the pipe to
the client full. For example, in Fig. 3.1 (left), 2.css, 3.js and 4.png could be served by
an HTTP/2 server as soon as 1.html is requested, rather than waiting for the client to
request them.
While HTTP/2 specifies a push mechanism, it does not specify what policies to use for
pushing. We have implemented a push policy, ideal-push, which assumes that the server
can assess which objects the client might request after downloading the base HTML file.
This is idealized, since there can be dynamic content, e.g. Javascript can be executed at
the browser, and its output can at best be over-approximated by the server (for example,
by static program analysis). However, ideal-push gives an upper bound on H2 server-push
performance.
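Under this idealized assumption, the push policy reduces to making every non-cached origin object available for transmission at the time the base HTML is requested. A minimal sketch (the dictionary fields are hypothetical, not rt-h2's actual data structures):

```python
def apply_ideal_push(objects, html_request_time):
    """Idealized push: every origin-hosted, non-cached object becomes eligible
    for transmission as soon as the base HTML is requested. Each object is a
    dict with 'url', 'is_origin', 'is_cached' and 'request_time' fields."""
    pushed = []
    for obj in objects:
        if obj["is_origin"] and not obj["is_cached"]:
            pushed.append(dict(obj, request_time=html_request_time, pushed=True))
    return pushed
```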
Prioritization. The last object-layer component of rt-h2 is a component that assigns
priorities to objects. This prioritization represents a way for browsers to control the
way a TCP channel is shared across multiplexed resources. As with push, the HTTP/2
specification defines the mechanism for assigning priorities, but does not mandate a specific
scheduling policy; rt-h2 can explore different prioritization policies. A basic type of
prioritization enforced by today’s browsers assigns bandwidth to Javascript, CSS
and HTML files before other file types since these files need to be processed by the browser
and can trigger downloads of other objects. In this work, we explore a prioritization policy
which additionally prioritizes Javascript, CSS, or HTML files that are on the
critical path. In practice, browsers can determine this prioritization by extracting critical
paths from historical traces of page downloads.
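A sketch of this policy as a simple priority assignment (lower value means higher priority); the object representation and the critical-path set are assumed inputs, e.g., derived from historical traces as described above:

```python
CRITICAL_TYPES = {"js", "css", "html"}


def assign_priority(url, critical_path_urls):
    """0: critical-path JS/CSS/HTML, 1: other JS/CSS/HTML, 2: everything else."""
    ext = url.split("?")[0].rsplit(".", 1)[-1].lower()
    if ext in CRITICAL_TYPES:
        return 0 if url in critical_path_urls else 1
    return 2
```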
Interleaving. HTTP/2 permits interleaving of objects, and this rt-h2 component imple-
ments this capability. Among all objects of the same priority that can be multiplexed
together at a given time, it interleaves 16K chunks (called frames) of these objects in
FIFO order.
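A sketch of this interleaving step, assuming the objects passed in have already been filtered to the highest priority level and are ordered by request time; the in-place mutation of byte counters is a simplification:

```python
FRAME = 16 * 1024  # 16 KB frames, as in HTTP/2


def interleave_frames(objects, budget_bytes):
    """Hand out 16 KB frames across same-priority objects in FIFO order until
    the byte budget for this simulator tick is exhausted. Each object is a
    dict with 'url' and 'bytes_left'; bytes_left is decremented in place."""
    sent, budget = [], budget_bytes
    while budget > 0 and any(o["bytes_left"] > 0 for o in objects):
        for obj in objects:  # FIFO order of requests
            if obj["bytes_left"] <= 0 or budget <= 0:
                continue
            frame = min(FRAME, obj["bytes_left"], budget)
            obj["bytes_left"] -= frame
            budget -= frame
            sent.append((obj["url"], frame))
    return sent
```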
TCP module. The core of rt-h2 is a custom, discrete-event TCP simulator which
simulates TCP-CUBIC’s congestion window growth [64]. When determining what data
to transmit, the TCP module supplies the interleaving module with a desired number
of bytes B that can be transmitted at each tick of the simulator. The latter, in turn,
consults the multiplexing, prioritization, and push modules and determines which frames
of which objects need to be served by the server, such that the total size of the frames
is less than B. This is repeated until all objects have been served. The simulator clock
ticks every RTT. We assume that connections are loss-limited, based on experience at
Akamai. We model loss by giving each packet an equal chance of getting dropped. If one
or more packets within one window are dropped, we assess a 77% chance of causing a
retransmission time-out [62] and increment the time counter appropriately.
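The structure of the TCP module can be sketched as follows. This is only a simplified stand-in: the real module follows CUBIC window growth, whereas the sketch uses slow start followed by additive increase, and all parameter values are illustrative defaults rather than the ones used in rt-h2:

```python
import random


def simulate_transfer(total_bytes, rtt_s, loss_rate=0.0, mss=1460,
                      init_cwnd=10, ssthresh=64, rto_s=1.0, seed=0):
    """Tick once per RTT, send up to cwnd packets, drop each packet
    independently with probability loss_rate, and charge a retransmission
    timeout with probability 0.77 when a window experiences loss (as in the
    text). Returns the simulated transfer time in seconds."""
    rng = random.Random(seed)
    cwnd, elapsed, remaining = init_cwnd, 0.0, total_bytes
    while remaining > 0:
        pkts = min(cwnd, -(-remaining // mss))  # ceil(remaining / mss), capped at cwnd
        lost = sum(rng.random() < loss_rate for _ in range(pkts))
        elapsed += rtt_s
        if lost and rng.random() < 0.77:
            elapsed += rto_s                     # retransmission timeout
            ssthresh, cwnd = max(cwnd // 2, 2), init_cwnd
        elif lost:
            cwnd = max(cwnd // 2, 2)             # loss recovery without an RTO
        else:
            cwnd = cwnd * 2 if cwnd < ssthresh else cwnd + 1
        remaining -= (pkts - lost) * mss
    return elapsed
```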
Preprocessing the input. The HTTP/1.1 waterfall input to rt-h2 was produced by a
client running an unknown TCP stack version, and for which we know HTTP layer request
latency, but not TCP characteristics like loss. To compare the two protocols on an even
footing, we run the HTTP/1.1 waterfall through the TCP module, without any HTTP/2
features on, then use the resulting waterfall as input to the HTTP/2 model, and estimate
ΔPLT based on those two. This way, any inaccuracies in the TCP model impact both
protocols equally.
Object sizes. RT data does not include object sizes. We need the size of the objects
so we can accurately feed them in the TCP module and get estimated transfer times,
given loss and RTT values. We use a separate dataset to fill in that information from
CDN edge data. For those resources that we don’t have sizes for, we fetch them and add
their size to our existing size dataset. Since some are inaccessible, this leaves a negligible
fraction of resources with unknown size (1%), which we default to 10KB.
Dependency trees. If the download time of a resource changes because of HTTP/2, the start times
of dependent resources need to be updated accordingly. For that reason we need to
have information on the dependency relationships between resources. Inspired by prior
work [92], we use two approaches to determine the dependency trees of websites. First, an
external tool downloads each targeted website many times, each time blocking a critical
resource (jss/css/html). Resources that are not downloaded are declared descendants
of the blocked resource. This produces a set of subtrees, which are combined to build
the full dependency tree. The tool ignores resources that only appear occasionally (e.g.
analytics, tags, etc), but focuses on the critical structure of a page, which is consistent over
time. This process is external to the RT data analysis and can be done offline. Second,
for resources for which we do not have dependency information from the external tool,
we derive dependency relationships by analyzing the input waterfall. Specifically, the
parent of a resource is defined as the last css/js/html resource that completed before the
beginning of this resource.
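The fallback rule for waterfall-derived dependencies can be written as a small function; the timing fields are those available in RT data (start and end of each resource's retrieval), and the extension check is a simplification:

```python
CRITICAL_EXT = (".js", ".css", ".html")


def infer_parent(resource, earlier_resources):
    """Fallback rule from the text: the parent of a resource is the last
    JS/CSS/HTML resource whose download completed before this resource's
    request started. Resources are dicts with 'url', 'start' and 'end'."""
    candidates = [r for r in earlier_resources
                  if r["url"].split("?")[0].lower().endswith(CRITICAL_EXT)
                  and r["end"] <= resource["start"]]
    return max(candidates, key=lambda r: r["end"], default=None)
```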
Running a waterfall through rt-h2. Fig. 3.1 shows how rt-h2 transforms an input
waterfall to its HTTP/2 equivalent. 3.js is prioritized over 2.css. As a result, the
simulation returns an earlier completion time for it than its original end time, and adjusts
its dependent resources accordingly, shifting 5.js and its 2 children to the left. The
requests for 6.png and 7.png are issued on the same channel, maintaining their distance
from the end of 5.js, which corresponds to processing time. The difference between the
end times of the respective last resources is calculated, and the onLoad event is shifted
accordingly. ΔPLT is defined as the % change between the times of the two onLoad
events.
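For concreteness, with PLT_H1 and PLT_H2 denoting the onLoad times of the original and the transformed waterfall respectively:

\[
\Delta_{\mathrm{PLT}} = \frac{\mathrm{PLT}_{\mathrm{H2}} - \mathrm{PLT}_{\mathrm{H1}}}{\mathrm{PLT}_{\mathrm{H1}}} \times 100\%
\]

so a negative ΔPLT indicates that HTTP/2 reduces the page load time.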
3.4 Validation
Methodology. In this section, we validate the rt-h2 model against PLT differences
obtained from real traces for ground truth. The goal of validating rt-h2 is to understand
whether the model’s estimates for ΔPLT are comparable to those observed in a realistic
experiment. We set up an Akamai CDN server in a lab and configured it to serve 8 real
websites both via HTTP/1.1 and HTTP/2. These 8 websites are the most popular ones
in the CDN among those that have opted in to resource timing monitoring and already
use HTTPS.
We validate rt-h2 against those websites as follows: We download each web page
100 times via HTTP/1.1 and 100 times via HTTP/2. For each HTTP/1.1 download
we generate a resource timing beacon that is used as input to our tool, generating 100
estimated HTTP/2 waterfalls, and we obtain the estimated ΔPLT from those. We repeat
this process for 3 RTT values (20ms, 50ms, 100ms). This is the induced round-trip
between the test client and the CDN edge server, which serves all of the pages’ (cacheable)
origin content. For the base HTML (and only that), the client request is forwarded to the
customer’s origin website, the latency to which is variable.
Fig. 3.3 shows an example of this process for one of the target web pages. There
are 3 groups of 4 lines, each group representing a different RTT. Solid lines correspond
to PLTs of real downloads, dashed lines correspond to their modeled equivalents. Blue
lines are HTTP/1.1 and red lines are HTTP/2. Specifically, in each RTT group, the blue
dashed line corresponds to the CDF of PLTs after passing the 100 waterfalls through the
model but without applying the HTTP/2 features (so, simply passing them through our
TCP model), and the red dashed lines corresponds to the CDF of estimated PLTs after
applying HTTP/2. We want the difference between the dashed line (model) to be similar
to that of the solid lines (ground truth), which means that the distribution transformation
of the PLTs in our model after applying HTTP/2 was similar to the transformation of
PLTs that the real downloads observed switching from HTTP/1.1 to HTTP/2. In this
Figure 3.3: Example of validation on a real page for 3 RTT values.
example, which corresponds to p1 in Table 3.1, the accuracy of the model is very good for
RTTs of 20ms and 50ms, but slightly worse for 100ms (predicted ΔPLT = -18%, when in
reality HTTP/2 reduced the PLT by 11%, i.e., ΔPLT = -11%).
Table 3.1 shows the ground truth and predicted ΔPLT (%) for the 8 target pages. The
values shown are the medians of each set of 100 runs. The model estimates the overall
impact of HTTP/2 on the page load time of the test pages very accurately for zero loss,
upon which most of our results are based. For all estimates, the model correctly predicts
that the impact of HTTP/2 is positive (a negative ΔPLT); the difference between the
ground truth and the estimated ΔPLT is within 20% of the PLT, and for 3/4 of the pages
it is within 10%. The
accuracy can decrease for higher RTTs (100ms). However, we note that such high RTT
values to the CDN edge are rare. Considering the simplicity of the model, these validation
results are encouraging for using the model to draw conclusions on larger scale data.
RTT     ΔPLT     p1      p2      p3       p4      p5      p6      p7      p8
20ms    Real    -10.5   -12.0   -50.9    -2.6    -18.0    0.1    -15.4    -8.2
        Model    -7.8   -11.0   -53.2    -6.3    -11.8   -1.3     -2.3    -9.5
50ms    Real     -9.3   -24.0   -97.7    -6.3    -23.5   -4.4    -11.4   -10.1
        Model    -8.7    -7.2   -84.0    -9.9    -18.1   -4.6     -2.4   -21.6
100ms   Real    -15.2   -31.1   -104.4   -6.9    -32.7   -5.8    -14.6   -16.1
        Model    -9.8   -33.3   -92.6    -15.0   -19.2   -14.2    -6.1   -24.9

Table 3.1: ΔPLT (%) prediction. For each page (p1-p8) and RTT value, “Real” indicates the
ground truth PLT % change, and “Model” indicates the PLT % predicted by rt-h2.
3.5 Results
3.5.1 Methodology
Dataset. The RT data contains page views sampled at 1% from Akamai customers who
have opted in to this measurement. Each sample produces a waterfall. We run rt-h2 on two
sets of waterfalls. The first is an aggregate dataset of 278,178 waterfalls spanning 56,851
unique URLs and 2,759 unique hostnames, corresponding to about 24 hours worth of data.
We then extracted a per-website dataset of 126,919 waterfalls drawn from the aggregate
dataset. These waterfalls correspond to page views of 55 distinct websites. Each website
has an average of 2,350 waterfall samples, with a minimum of 180 and a maximum of
over 26,000. Intuitively, each website’s collection of waterfalls represents a sample of the
clients of that website, that use various browsers and devices, from geographically diverse
locations. These 55 websites are the most popular of Akamai’s customers that have opted
in to the measurement and contain, on average, 111 objects per page, with the minimum
and maximum being 5 and 500 respectively.
Metrics. Our primary metric is ΔPLT. For the aggregate dataset, we are interested
in the ΔPLT distribution across all waterfalls. For the per-website dataset, we explore
first-order statistics (min, max, mean, median and the top and bottom deciles). We focus
particularly on the 90th percentile of the ΔPLT distribution, since tail performance is
increasingly important for content providers.
Experimental Settings. We first understand the performance of basic HTTP/2, and
then explore the impact of two optimizations discussed in Section 3.3: prioritization and
push.
The prioritization scenario was motivated by our observation that default HTTP/2
multiplexing can result in critical objects being downloaded later than they could, which
can happen when many equal priority files are sent simultaneously. This what-if scenario
asks: What would the ΔPLT distribution look like if we knew how to prioritize objects
that are on the critical path? This is somewhat hypothetical, since the browser or server
would need to know the optimal order. We are exploring ways to make this possible, but
this scenario gives us an upper bound on the performance improvement.
The push what-if scenario is based on our observation of the considerable idle network
time until the base HTML file is available at the browser. This scenario asks: What would
the Δ
PLT
distribution look like if the server pushed content speculatively? Ideal push
pushes all non-cached objects, and assumes an omniscient server which can predict what
resources the client will need.
Network Conditions. Much prior attention has focused on the impact of network
conditions on HTTP/2 [61, 78, 93]. Our evaluation of the impact of loss on HTTP/2
provided similar findings to previous work [93], so we omit it for brevity. Our primary
evaluations, presented here, are under No-Loss settings, in which rt-h2’s TCP module
does not simulate loss. Recall that rt-h2 does not simulate bandwidth limits, so the PLTs
Figure 3.4: Overall impact of HTTP/2 on PLTs over 280K input waterfalls at zero loss.
we observe are a function of (a) the round-trip times to clients, (b) TCP window growth,
and (c) the objects on a page and their dependencies. Since the effects of the first two
factors affect HTTP/1.1 and HTTP/2 similarly under zero loss, the Δ
PLT
results are, to
a large extent, impacted by page structure.
3.5.2 Evaluation
Basic HTTP/2, Aggregate Dataset. Fig. 3.4 plots the distribution of ΔPLTs
across the aggregate dataset. Recall that this contains nearly 280K waterfalls. Of these,
almost 60% benefit from HTTP/2 (negative ΔPLT). For another 28% of the samples, the
performance of the two protocols is identical, and HTTP/2 actually hurts performance
for the rest. We discuss the possible reasons for some of these below, but these results
paint a nuanced picture: HTTP/2 does improve performance for a majority of waterfalls,
but despite better protocol design, web page PLTs can largely be determined by page
structure and dependencies.
Basic HTTP/2, Per-website dataset. The aggregate dataset provides a macroscopic
view of HTTP/2 performance, but looking at the per-website dataset provides more
Figure 3.5: Fraction of times a website was estimated to experience bad (red) / zero (blue) /
good (green) PLT change when using HTTP/2.
interesting insights. Fig. 3.5 shows the fraction of times each website experienced a
negative (green), positive (red), or zero (blue) ΔPLT. For a given website, each waterfall
represents a page view by a client. This figure shows that different downloads of the same
page may be impacted differently by HTTP/2. Several factors contribute to this: the RTT
of a client, the variability of user agents, devices and processing times, and the impact
of customizations and dynamic content mean that no two waterfalls are likely to be the
same.
However, Fig. 3.5 hides the magnitude of the ΔPLTs on each website, so we resort
to a different view of this result. Fig. 3.6 plots some first-order statistics of the ΔPLTs,
for each website. The bottom and top whiskers indicate the 10th and 90th percentile
respectively, the bottom and top of a bar indicates the 25th and 75th percentile, and the
dark dot shows the median.
For all websites except 2, HTTP/2 improves PLT at the 75th percentile. In other
words, for these websites, at least 75% of the downloads would see a benefit by using
HTTP/2. For nearly two-thirds of the websites, the 90th percentile of clients would see
a benefit. For nearly half the websites (28 out of 55), the 10th percentile of clients see
Figure 3.6: ΔPLT distributions for each website at zero loss. Each candlestick shows the
10/25/50/75/90th percentiles (y-axis: Δ(PLT) (%); x-axis: Web Page #).
a ΔPLT of 10% or more. Taken together, these results present an interesting view of
HTTP/2 performance: under no-loss conditions, the structure of most websites is such
that multiplexing provides benefits.
But why is it that, for a third of the websites, the upper quartile of waterfalls are
negatively impacted by HTTP/2? One hypothesis was that the clients of these websites
had a qualitatively different RTT distribution than those of other websites (HTTP/2 is
known to degrade with RTT [33]). However, plotting the distribution of RTTs (omitted
for space) showed no obvious correlation between the distribution of first-order statistics
of the RTTs and those of the ΔPLTs.
Other potential reasons for performance differences across websites could be differences
in macroscopic Web page characteristics such as: the total payload of resources in the waterfall,
the number of resources, the total payload and number of resources served from the origin domain (which get
multiplexed), number of cached resources, number of 3PC resources, critical path length,
number of 3PC resources on critical path, number of js/css/html files served from the
Figure 3.7: Impact of Prioritization on (i) the ΔPLT distribution across all samples of all pages
(left) and (ii) the 90th percentile of the ΔPLT of each page (right).
origin (and thus get prioritized in that channel) and device type. None of these seemed to
directly correlate with the observed ΔPLTs.
So, we resorted to a methodology that explores the impact of optimizations like
prioritization and push, based on observed patterns in manually examined waterfalls.
Each optimization focuses on one aspect of page structure, and we wanted to see if negative
HTTP/2 impact could be explained by some of these.
Prioritization, Per-website data. Fig. 3.7 shows the results of the prioritization
what-if scenario. Recall that this scenario was motivated by the observation that some
pages download many critical objects (e.g. Javascripts), which in turn trigger many other
downloads. Basic HTTP/2 does not prioritize these, so can delay the download of a
resource that is on the critical path.
After applying prioritization, only 2 websites still see a negative impact from HTTP/2
at the 90th percentile (at most 10% of the time). Fig. 3.7 (right) shows the increase or
decrease in the 90th percentile: for the third of the websites for which basic HTTP/2 can
perform badly in the upper quartile, prioritization provides significant gains, improving
the 90th percentile ΔPLTs by up to 4%. We notice that prioritization does not affect
the 90th percentile of the websites in the middle of the figure, for which the impact of
Figure 3.8: Impact of Push on (i) the ΔPLT distribution across all samples of all pages (left)
and (ii) the 90th percentile of the ΔPLT of each page (right).
HTTP/2 was already almost always positive. Fig. 3.7 (left) shows that across all waterfalls,
prioritization slightly improves the ΔPLT distribution but also removes the tail of negative
impacts.
Push, Per-Website data. Another reason why HTTP/2 performs worse than
HTTP/1.1 is a structural one. We have found examples where HTTP/2 multiplexes 6 or
fewer objects. In such cases, using parallel connections can be better, since each of those
(up to 6 for most browsers) starts with an initial window of 10, whereas HTTP/2 uses a
single TCP channel with the same congestion window. We have seen a similar effect with
domain sharding. When a website is sharded across 3 domains and HTTP/2 multiplexes
18 objects or fewer, HTTP/1.1 wins. This indicates that these websites may have been
optimized for HTTP/1.1.
Fig. 3.8 shows performance using ideal push. As with prioritization, ideal push provides
benefits at the 90th percentile except for 3 websites. However, relative to prioritization,
it improves the median performance of each website significantly and only 7 out of 55
websites do not see more than 10% gain for the top 10th percentile of samples. This more
pervasive improvement is visible in the change in the aggregate CDF (Fig. 3.8 (left)),
where now only 3-4% of waterfalls see a negative performance impact from HTTP/2.
Figure 3.9: Impact of HTTP/2 with optimizations on PLTs
Putting it all together. Fig. 3.9 plots the overall impact of the optimizations on the
aggregate dataset. This results in gains with HTTP/2 for nearly 70% of the waterfalls,
equal performance for most of the rest, and only about 1% of the waterfalls seeing worse
performance. The fraction of waterfalls with high performance gains is much higher,
thanks in large part to push.
In summary, our results suggest that HTTP/2’s features provide good performance
gains for most of the websites. For about a third, the top quartile’s PLT performance
worsens with HTTP/2, but this can be fixed with a combination of prioritization and
push. Prioritization addresses structural issues in the waterfall that cause this worse
performance, and push does that too, but also increases the gains for HTTP/2 across the
board.
3.6 Chapter Summary
While HTTP/2 standardization is complete, the conditions under which HTTP/2 improves
over the existing standard are not yet completely understood. Our work adds to this
understanding by analyzing a large dataset of instrumented HTTP/1.1 page views, and
a model (rt-h2) that estimates ΔPLT from this dataset. We find that HTTP/2’s basic
features can improve the 90th percentile ΔPLT for nearly two-thirds of the websites. Push
and prioritization extend this further to cover all websites. Our work reveals aspects of
page structure in our dataset that determine the efficacy of push and prioritization. Much
work remains, however, including potentially enriching our model of CDN behavior, and finding
practical methods to achieve the forms of prioritization and push we consider in this
work.
Chapter 4
Prepositioning content on the client to speed up page loads
Prepositioning content on browsers before they request it can reduce page load times.
CDNs are in a unique position to provide benefits from prepositioning. When a user lands
on a page served by a CDN, the CDN proxy can push objects while it fetches the HTML
page from the origin. When the user then navigates to another page served by the CDN
(a page transition), the latter has two opportunities to preposition objects: the user think
time, and the time to fetch the HTML from the origin. The challenges in achieving landing
and page-transition prepositioning are to know which objects to preposition and to do so in
a manner that does not negatively impact the speed of browser rendering, while taking
into account which objects are shared across several pages, which objects are already
cached on the browser, and the likelihood of visits to each possible next page. To address
these, we leverage CDN instrumentation, together with a novel optimization framework
that captures these challenges as constraints while striving to minimize expected page
load times. We find that, for page transitions, CDNs can help browsers achieve similar
page load performance as if having a warmed up cache, and the difference in performance
between the two can be explained by browser cache behavior.
4.1 Introduction
Loading a web page is a complex process that requires a web browser to set up connections
to multiple remote servers, request tens or hundreds of files, and download and process
them in the right order before it can finally render the page. While each part of that
process can be a bottleneck for rendering a page, it has been shown that browsers spend
a significant fraction of the page load time idly waiting for content to arrive from remote
servers [92]. As such, there is no single approach to optimizing page loads.
Many websites are hosted on Content Delivery Networks (CDNs) which speed up page
delivery by caching copies of web content on hundreds or thousands of geographically
distributed proxy servers close to users. However, even in a CDN, there are two significant
opportunities to speed up page loads. The first is when the user lands on the CDN for the
first time (or after cached objects from previous visits have been evicted). For the CDN
we study in this work, the majority of landing page visits are such first views. For these
first views, the CDN proxy can use the latency of accessing an origin server (the CDN
customer’s server) which dynamically generates the HTML for the page, to preposition¹
content. We call this PreHTML prepositioning, which can be achieved using HTTP/2
push.

¹We use the term prepositioning to encompass a class of mechanisms [26, 27, 29] in browsers and servers,
including Prefetch and HTTP/2 Push, that send content to a browser before it explicitly requests that
content, or before it has downloaded the corresponding HTML, and to differentiate from the particular
mechanism Prefetch.
Prepositioning can also occur during page transitions, when the user navigates to a
next page within the CDN. In this case, the CDN can use the time to contact the origin
server, but it can also speculatively preposition objects after the landing page has been
loaded, but before the user navigates to the next page. This can be achieved with prefetch
directives which are invoked after a browser’s Onload event, so we call this PostOnload
prepositioning.
Prepositioning faces two main challenges: which objects should be considered, and,
given time constraints, how many and which of those should be prepositioned. We address
these challenges by using (a) an instrumentation infrastructure in CDNs which we use to
estimate the set of objects downloaded during a page view, the time available to preposition
objects, and the empirical probability of visiting each other linked page within the same
site, and (b) a binary integer programming optimization formulation that incorporates
these estimates into constraints and minimizes expected page load times.
While speculative prepositioning has the potential to speed up page loads, it comes
with the cost of sending extra data that the client might not need. For this reason, and
because of the associated data costs of speculative prepositioning on mobile clients, this
approach is geared primarily towards (and evaluated on) wired clients.
Using experiments with live downloads of 160 customer websites of a global CDN, we
show that following our proposed techniques for page transition prepositioning achieves
similar performance to having a warm cache (the best possible performance). These
benefits reach up to 33% page load time improvements over the baseline without prepo-
sitioning, achieving performance similar to having all the objects cached. The benefits
come from the fact that page transitions within a website share some objects with the
landing page, so they are cached, and the remaining objects can fit within the PostOnload
and PreHTML timeframes. However, there is still a gap of tens of milliseconds, compared
to having a warm cache. By manually examining browser traces, we find that in all these
Figure 4.1: Page download through a CDN
cases, the difference is likely due to the cache behavior of Chrome (the browser we use) [2]:
Prepositioned objects are stored in caches requiring a disk access, which likely explains the
delay compared to experiments with recently cached objects, which Chrome loads from
memory in our setup. Finally, we find that landing page prepositioning can achieve 20-50%
improvement for nearly 40% of the sites, but only with mechanisms for prepositioning
third-party content [12,83]. The gap stems from the cache behavior just explained and
from the lack of sufficient time to preposition all objects.
4.2 Prepositioning Approach and Challenges
To understand how to preposition objects, we briefly revisit the process of a page load.
After a browser requests and receives an HTML file from a remote server, the browser
parses the file and sends requests to that server (as well as others) for objects referenced in
the file that are needed to load the page. The order in which these objects are downloaded
and parsed can impact user experience. HTML files of modern web pages reference inter-
dependent CSS and JavaScript files that are needed to display a page properly [51,68].
Unless specified otherwise by special HTML tags, download and execution of JavaScript
files blocks HTML parsing because executing them can affect the Document Object Model
(DOM), a logical representation of the page structure that is created as the HTML is
parsed. Since JavaScript can reference CSS files, their execution is in turn blocked by
the download and processing of CSS files. For that reason, CSS and JavaScript files are,
generally, on the Critical Render Path (CRP) of loading a web page, since their download
and processing time can determine when the first pixels are painted. Other than those
objects, web pages typically contain media files and images, whose delivery and rendering
speed also affects user experience. Since the client cannot request any of those objects
before it starts parsing the HTML, at least two round-trips between the client and the
server are required before they are delivered to the client.
4.2.1 Using Idle Network time to Preposition
This chapter explores reducing the latency of page loads through CDNs. A CDN optimizes
web page downloads for its customers by utilizing a proxy server that is close to the client.
When users visit a CDN customer, they typically first request the landing page for that
customer. The user’s browser requests the HTML file for the landing page from that
nearby CDN proxy. For dynamic pages (pages whose HTML file is dynamically generated
on a web server when they are requested), the proxy typically forwards the request to a
remote origin server and returns that server’s reply to the client. When the client receives
and parses the HTML and sends subsequent requests for embedded objects, the CDN
proxy serves those objects from its cache, without requiring more trips to the remote origin
(Fig. 4.1). After landing on a CDN-hosted page, the user might subsequently navigate to
a next-page: these page transitions typically follow the same steps discussed above.
CDN delivery presents opportunities for optimizing page load times (PLTs). Previous
work has shown that browsers can spend around 30% of a page load time waiting for data
to download from the network [83,92]. Especially for CDNs, as a user browses through a
Figure 4.2: Preposition timeframes in a page transition. Green indicates client object request,
blue indicates server response. The network is mostly idle during timeframes shown in gray: before
the HTML arrives at the client and after the load event fires on the browser.
web site there are several timeframes when the network channel is idle, and these represent
opportunities to optimize PLT by prepositioning content.
In this work, we consider the general problem of prepositioning during page transitions,
of which prepositioning for landing pages is a special case. To see why, consider that the
idle time during a page transition can be broken down into two parts (Fig. 4.2):
PostOnload timeframe There is usually a relatively long time between when a page
loads (signaled by the browser firing the onLoad event) and the moment the user triggers
a next navigation, during which the user interacts with the current page [69]. A CDN
can use this PostOnload timeframe to speculatively send content to the browser, but
only if the next page is hosted by the CDN (since only in that case can the CDN know
what content to preposition). In order to utilize this time meaningfully the CDN would
need to guess which pages the user might navigate to next and decide which objects from
those pages should be prepositioned.
Figure 4.3: Time-to-first-byte of landing pages for CDN users. Drawn from RUM data, the figure
shows that there is a long idle network time (463ms at the median) before a client receives
the HTML they request, during which the CDN could speculatively preposition content for the
requested page.
PreHTML timeframe When a user clicks on a link, the browser sends an HTML GET
request. Obtained from real user page load traces through Akamai’s CDN using RUM
data (Chapter ??), Fig. 4.3 shows the distribution of idle network time waiting for the
HTML response to arrive from the origin server. At the median, clients observe 463ms of
idle network time before they receive the HTML file. This PreHTML timeframe (Fig. 4.2)
is a good candidate to preposition content to the client. Unlike the PostOnload timeframe,
in this timeframe the proxy knows which page the user is loading. However, in order to
preposition content efficiently, the proxy still needs to know which objects will be needed
to load that particular page.
In summary, when a user transitions from one CDN-served page to another, we
can use the combination of the PostOnload and PreHTML timeframes to preposition
content. However, when a user lands on a CDN-served page from a non-CDN served page
(e.g. loading a page through a search engine or by typing a URL in the address bar), the
CDN can only utilize the PreHTML timeframe for prepositioning.
Idle network times also exist during the page load process, after a browser has received
and starts parsing the HTML file. In this work, we do not consider this timeframe because
the network gaps in that timeframe are smaller, it is harder to avoid interfering with
the normal page load process, and it is harder to reason about what to prefetch in the
presence of dependencies between objects.
4.2.2 Prepositioning Mechanisms
Modern browsers contain many mechanisms to optimize latency, ranging from hostname-
level DNS pre-fetch (that performs DNS lookups for a hostname before objects are requested
from that domain) and pre-connect (that additionally establishes TCP connections to the
server), to page-level pre-render directives (that fully render a page in a hidden tab before
the user navigates to it).
We explore two mechanisms relevant to prepositioning: Prefetch and HTTP/2 Push.
Prefetch is a directive which instructs the client to trigger a low priority fetch for an object
after the completion of a page load, and is meant to be used for next page navigations.
Push allows a server to proactively send an object to a client without the client having
requested it.
For PostOnload prepositioning, we focus on Prefetch because the mechanism is designed
to trigger download after onLoad by default, thus not affecting objects critical to the
current page load. Prefetch requests are sent with low priority, creating less risk for
contention with other object downloads in the next page that are fetched normally.
For PreHTML prepositioning, Push is an attractive mechanism, because it allows the
CDN proxy to preposition data on the client as soon as the HTML request arrives at the
proxy, and before the HTML reply is available. This way the proxy can maximally use the
idle time when the HTML request is forwarded to the remote origin. By contrast, with
other mechanisms, the client needs to issue a request for the object after receiving at least
part of the HTML, requiring an extra round-trip before the object is fetched. One
limitation of Push is that it cannot be used for third-party objects (objects not served
through the CDN), since Pushed objects are required to be multiplexed in the HTTP/2
connection established with the CDN. However, third-party objects can be prepositioned
in the PostOnload timeframe, because they can be delivered with Prefetch, even though
an additional request is required.
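The mechanism choice described above can be summarized as a simple routing rule over preposition candidates; the flag names below are illustrative:

```python
def split_candidates(candidates):
    """Route each preposition candidate to a mechanism: CDN-served objects can
    be Pushed in the PreHTML timeframe, while third-party (or non-CDN-cached)
    objects fall back to Prefetch in the PostOnload timeframe. Candidates are
    dicts with 'url', 'is_third_party' and 'cdn_cached' flags."""
    push, prefetch = [], []
    for c in candidates:
        if c["is_third_party"] or not c["cdn_cached"]:
            prefetch.append(c["url"])   # PostOnload prepositioning via Prefetch
        else:
            push.append(c["url"])       # PreHTML prepositioning via HTTP/2 Push
    return push, prefetch
```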
4.2.3 The Applicability of Prepositioning
While prepositioning mechanisms are already supported in browsers, prepositioning
is applicable only in some (but very important) circumstances. For landing pages,
prepositioning is likely to be beneficial only for first views to a page, i.e., those that do
not load most of the content from the browser cache; its benefits will be smaller when a
user has recently visited a particular page, since a lot of the page’s content will likely be
already cached on the browser.
The CDN studied in this work collects real-user download measurements for perfor-
mance analytics, which include sampled page load traces (4.3.1) with indications of whether
objects were loaded from the browser cache. Not all browsers report this, so we only
consider Chrome and Firefox traces, which do. We then tag traces for which more than a
few objects were loaded from cache as repeat-view, and tag the rest as first-views. Drawn
from that data, Fig. 4.4 shows the fraction of first-view visits to landing pages of web
Figure 4.4: Fraction of real user page loads for which objects of a CDN-served page were not
cached on the browser.
sites. This shows significant potential for the applicability of prepositioning for landing
pages in PreHTML time: for half of the pages, nearly 95% or more of the page views were
first views.
When navigating to a next page, the browser is likely to have some objects cached
from the previous page. The remaining objects can be prepositioned in the PostOnload
and next PreHTML timeframe. Further, prepositioning in PreHTML time is likely to
be more beneficial for pages that are dynamically generated by an (often distant) origin
server (static pages can be cached at the proxy). For the CDN we consider, nearly 50% of
the sites have landing pages dynamically generated at the origin server. Moreover, for this
CDN, when a landing page is dynamically generated, the next page is also dynamically
generated in 97% of the cases. In these cases, page transition prepositioning can leverage
both PostOnload and PreHTML timeframes.
While Push avoids one extra round-trip, it can limit the types of objects we can prepo-
sition. Third-party content (3PC) not served through the CDN cannot be prepositioned
with Push. Third-party objects, on the other hand, are not a problem for PostOnload
prepositioning, since the mechanism used there (Prefetch) can be used to trigger prepo-
sitioning of content served by third-parties. For that reason, when possible, we want
to preposition third-party objects in PostOnload timeframe. Furthermore, only objects
cached on the CDN can be pushed. In general, cache hit rates are high, so we assume
that anything cacheable is in the proxy’s cache or can be quickly retrieved from a nearby
proxy.
4.2.4 Prepositioning Challenges
Given existing mechanisms, the central challenge in prepositioning is to determine the
best policy: which objects to preposition, and when.
Objects to Preposition For page requests that happen frequently and regularly by
the same user, one approach is to implement prepositioning at the browser, which can
analyze user browsing patterns to periodically warm up pages that the user is likely to
visit again soon. However, such pages are likely to be at least partially already cached on
that user’s browser, reducing the efficacy of prepositioning. In contrast, in this work, we
consider just-in-time prepositioning even for rarely or never-before visited pages, delaying
the prepositioning decisions until when we have information about possible next user
navigation actions.
One challenge in our work is to determine which objects to preposition. For PostOnload,
prepositioning can begin after the browser’s onLoad event for a page fires, but we need a
way to know which pages the user is likely to visit after the current one. Then, for each of
those pages, we need to know which objects are likely to be requested. For PreHTML,
once the user has issued a page request, the goal is to utilize the idle time until the HTML
response from the remote origin server arrives at the browser. The earliest indication
the CDN has that a user is loading a page is an HTML request arriving at the proxy, so
selecting objects to preposition should be based on that input.
Finally, prepositioning techniques need to avoid sending objects already cached in the
browser. For example, many CSS and JavaScript objects, as well as images like banners
and logos, are common across several pages of the same web site. When a user navigates
from a landing page to a next page, the browser may already have some of the objects
needed to render the second page.
Determining a Prepositioning Window and Schedule A prepositioning schedule
determines the order in which objects are sent, and when prepositioning stops.
For PostOnload, the prepositioning timeframe starts when a page fully loads (onLoad
event fires) and ends when the user triggers navigation to another page. The PreHTML
timeframe begins when a user request initially arrives at the CDN proxy, and ends when
the origin server returns the base page to the proxy. Given a set of objects that should
be considered for prepositioning (the preposition candidates) we need to determine how
many can be safely downloaded within this timeframe (the prepositioning payload) while
maximizing improvements to page load times. Analysis on 160 websites hosted by a
large CDN revealed that the total payload of the preposition candidates can be large,
and aggressively prepositioning all candidates can delay delivery of the base page to the
client, which in turn can negatively impact the page load process, since the HTML is
required at the client before it can start requesting other objects. This is illustrated in
Figure 4.5: Change in Time-To-First-Contentful-Paint when prepositioning all candidate objects,
compared to no prepositioning. Prepositioning all potentially useful content at PreHTML time
can have negative impact on user-perceived load times. In our dataset, half of the web sites suffer
performance degradation of up to 50% by aggressive prepositioning.
Fig. 4.5 for landing pages, which shows that prepositioning all potentially useful content
on the browser in PreHTML time actually increases the Time to First Contentful Paint
(a commonly used [40] measure of page load time) for 50% of the pages.
Having determined the size of the prepositioning payload and a total set of preposition
candidates, our next challenge is to weigh those candidates by importance. Due to the
dependencies among objects in a web page and the serialized way they are resolved, having
some objects on the client in advance can have a different impact on user-perceived latency
than having another set of objects of the same total size (for example, a Javascript before
a CSS). More generally, the challenge here is to decide which of many possible candidates
we should fit in the preposition timeframe, depending on the expected benefit they will
have.
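As noted in the introduction, we formulate this selection as a binary integer program (described later in this chapter). Purely to illustrate the trade-off, a greedy stand-in that packs candidates into the preposition quota by an assumed benefit-per-byte weight could look as follows; the 'benefit' weights are hypothetical inputs, not something defined here:

```python
def fit_candidates(candidates, quota_bytes):
    """Greedy illustration of the selection problem: pack candidates into the
    preposition quota in decreasing order of (assumed) benefit per byte.
    Candidates are dicts with 'url', 'size' and 'benefit'; the benefit weights
    are externally supplied (e.g., higher for critical CSS/JS)."""
    chosen, used = [], 0
    ranked = sorted(candidates, key=lambda c: c["benefit"] / c["size"], reverse=True)
    for c in ranked:
        if used + c["size"] <= quota_bytes:
            chosen.append(c["url"])
            used += c["size"]
    return chosen
```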
4.3 Prepositioning Design
4.3.1 Design Overview
CDN proxy servers are typically much closer to the client (usually as close as 20ms)
than the origin web server, which can be hundreds of milliseconds away. As such, the
proxies can deliver several TCP windows’ worth of data to the client before the HTML
arrives from the origin server and is forwarded by the CDN proxy. Additionally, we can
configure CDN proxies to instruct browsers to speculatively prefetch objects for possible
next navigations, based on historically observed data.
Our approach also uses the observation that large CDNs have Real User Monitoring
(RUM) technologies that collect end-to-end user interactions with websites, to ensure that
quality of service is achieved and to identify performance bottlenecks. These technologies
typically consist of browser instrumentation and a RUM backend which stores measure-
ments obtained from the instrumentation. To enable RUM measurements, the CDN proxy
injects a small script into the HTML that it relayed to the client, which instructs the
client to collect timestamps during the page load process and report them back to the
CDN. More specifically, we use the Navigation Timing [21] and Resource Timing [30]
instrumentation data to obtain detailed information of application layer events raised by
browsers of real user page downloads. This data includes sampled information on pages
loaded by real users, with specific details about which objects were downloaded for those
pages, the exact timings of each of the object in the page load process, as well as a hint
about which page a page load was triggered from. As is the case with similar traces we
used for the studies described in the two previous chapters, these datasets provide valuable
historical information at very large scale and are central to our approach.
Our design for prepositioning consists of three functionally distinct components: (a)
finding objects that we should consider prepositioning (the prepositioning candidates), (b)
determining how much time we have to preposition objects (the quota), and (c) fitting
useful candidates within the quota.
4.3.2 Selecting Preposition Candidates
We use data from RUM to determine preposition candidates for a web page given its
URL. The RUM backend samples real user page downloads and receives download traces
for many users from different locations and times of day, using different browsers. From
this, it is possible to more accurately determine the set of objects downloaded by a page
view than, for example, by analyzing a single page view, since these downloads will reveal
objects tailored to the user, her location or device. This also exposes objects that are
not directly referenced in the HTML file. In practice, we have found that, for a given
webpage, the distribution of the frequency of appearance of an object is strongly bimodal:
some objects appear less than 20% of the time, while others appear in more than 80%
of the traces. To filter out objects that are unlikely to be useful, we select as an initial
set of preposition candidates for a particular page the objects that appear in at least
50% of the real user downloads of that page. In a production deployment, that threshold
can be further optimized by being set dynamically per web site, based on the observed
distributions. However, our results indicate that this approach provided good accuracy
for the set of web sites studied in this work (4.4.2).
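A sketch of this filtering step over sampled RUM traces; the 50% threshold is the one described above, and the trace representation is simplified to a list of object URLs per page view:

```python
from collections import Counter


def preposition_candidates(page_view_traces, appearance_threshold=0.5):
    """Keep objects that appear in at least appearance_threshold of the sampled
    page views of a page, exploiting the bimodal appearance frequency noted in
    the text. Each trace is an iterable of object URLs for one page view."""
    traces = [set(trace) for trace in page_view_traces]
    counts = Counter(url for trace in traces for url in trace)
    cutoff = appearance_threshold * len(traces)
    return [url for url, n in counts.items() if n >= cutoff]
```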
Many pages are highly dynamic and can change often. Short-term changes occur, for
example, due to objects like pictures of daily deals on e-commerce web pages. Objects
that are part of the structure of a website, like CSS and JavaScript files, change less often.
Our RUM data collection happens at a global scale, collecting enough samples to run
this preposition candidate list calculation often enough to capture both long-term and
short-term changes on a web page.
Finally, from the list of prepositioning candidates, we need to trim objects that the
client has cached. In this work, we focus on evaluating the efficacy of prepositioning for
first views, but a practical deployment of this approach can use a mechanism for signaling
cache state [7]. We are currently prototyping a cache digest mechanism by which a client
can include a summary of its cache in a page request which the CDN can use to trim the
list of preposition candidates.
For page transitions, the preposition timeframe consists of the PostOnload phase and
the PreHTML phase of the next page. For the PostOnload phase, since we do not know
yet with certainty which page the user will load next, we need to consider the intersection
of the preposition candidates of each possible next page. We describe below the technique
we use to determine next page candidates. The PreHTML phase candidates are specific
to a particular page, because by that time the user has initiated a navigation to that page.
However, in this phase, there is no need to consider objects that we prepositioned in the
PostOnload phase. If there is no PostOnload phase, only the candidates specific to the
requested URL are considered.
4.3.3 Determining Preposition Payload Quota
PreHTML timeframe We could consider aggressively prepositioning all candidates.
However, as shown in 4.2.4, this can degrade performance significantly. Instead, we
use a simplified model of TCP behavior to determine the number of bytes that can be
prepositioned before the HTML arrives at the proxy. We call this the payload quota. This
model is applied by the CDN proxy when it first receives the page request. The model
takes two inputs: the RTT between the proxy and the client ($RTT_{(proxy,client)}$), which can
be measured at the proxy, and the TTFB, the time to first byte of the HTML being
received at the proxy, which can be obtained from RUM measurements (in this work, we
use the 10th percentile of these measurements for a given page, to be conservative).
Because the CDN has well-provisioned paths between client and proxy, these paths
are not bandwidth limited, but RTT limited: as such, to estimate the payload quota for
PreHTML time ($Q_{pre}$), we model TCP slow-start behavior as follows:
$$Q_{pre} = \sum_{n=1}^{R} cwnd(n), \qquad R = TTFB / RTT_{(proxy,client)}$$
Our model is conservative because it assumes a new connection to account for landing
page requests and assumes the default initial congestion window value used by the CDN.
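As an illustration, a minimal sketch of this quota computation is shown below, assuming a fresh connection whose congestion window doubles each round trip during slow start; the initial window of 10 segments and the 1460-byte segment size are placeholder assumptions, not the CDN's actual configuration:

```python
def prehtml_quota(ttfb_ms, rtt_ms, init_cwnd_segments=10, mss_bytes=1460):
    """Estimate the PreHTML payload quota Q_pre in bytes.

    Models slow start on a new client-to-proxy connection: the congestion
    window starts at init_cwnd_segments and doubles every RTT, for the
    R = TTFB / RTT round trips available before the HTML reaches the proxy.
    """
    rounds = int(ttfb_ms // rtt_ms)
    quota = 0
    cwnd = init_cwnd_segments
    for _ in range(rounds):
        quota += cwnd * mss_bytes
        cwnd *= 2  # slow start doubles the window each round trip
    return quota

# Example: 25 ms client-proxy RTT, 100 ms to first HTML byte at the proxy.
print(prehtml_quota(ttfb_ms=100, rtt_ms=25))  # 219000 bytes over 4 round trips
```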
PostOnload timeframe For PostOnload prepositioning we can afford to be less con-
servative than with PreHTML Push for a few reasons. First, Prefetch, the mechanism
we use for PostOnload prepositioning, is by default given lower priority than a user-triggered
request of a page’s HTML file. Some browsers (e.g., Chrome) cancel prefetch requests upon
next-page navigation. Even if that does not happen, or if prefetched objects are already in
transfer, server implementations should ensure correct prioritization. Apache recently
Figure 4.6: Total preposition Candidates payload
added a patch (based on input from us²) that prioritizes prefetched objects in a way
that does not conflict with the critical path of the next page. The lower priority means
that prefetched objects can continue to be sent at idle network times during page load
with small contention between them and the next page HTML. Second, at the end of the
current page load, the TCP window has grown considerably, allowing us to preposition
much more in the same time than we could with Push (assuming a new channel).
Despite this, we cannot preposition all candidates for several reasons. First, there are
too many such candidates (the list includes objects from all next pages), which can add
up to tens of MBs (Fig. 4.6). Second, not all browsers cancel prefetch requests upon next-page
navigation. Third, prepositioning all objects can delay the next page at the server,
even when prioritization is honored, because prepositioned objects may introduce head-of-line
blocking at kernel socket buffers [57].
For this reason, we need a conservative estimate of the payload quota for PostOnload.
A previous study reports 7 seconds of average user interaction times with pages [69]. Since
we cannot know the exact user interaction time for each page, we pick a conservative,
² When designing our prepositioning mechanisms, we noticed anomalous server prioritization behavior in Apache and shared this with Apache developers, who patched the anomaly [5].
configurable quota $Q_{post}$ to minimize the impact on subsequent page downloads. In our
experiments, this quota is set to 3 MB, which corresponds to less than a few seconds for a
warm client-to-CDN connection. We expect this to be a lower bound on user interaction
times. CDN operators can configure this parameter based on user think time distributions
derived from RUM instrumentation.
4.3.4 Assigning Candidates to Timeframes
Deciding how to configure prepositioning for page transitions requires knowing the pages
a user might transition to, as well as objects that are required for those pages, and how
they affect page load time. Our solution to this involves two steps: we first estimate,
with instrumentation that we have developed, the empirical probability of transitioning
to a given next page from the current page. Then, we use these transition probabilities
to calculate the expected benefit of different allocations of objects to the PostOnload
timeframe of the current page and the PreHTML timeframes of each of the next pages,
and generate an allocation that provides the maximum expected benefit, while minimizing
the cost of prepositioning. We express this cost in terms of payload that was prepositioned
unnecessarily and never used.
Estimating Next-Page Transitions In order to decide what objects to preposition
during a transition, we need a way to predict the next page based on the current one.
This data could be drawn from origin web server logs [54,73,97] but this would require
each customer to provide the CDN access to that data. Page transitions could also be
extracted from browser histories. Some browsers already collect and analyze user browsing
patterns to inform some prepositioning experimentally [11]. However, collecting these at
scale would require CDN-browser collaboration and may need to address privacy issues in
order to allow browser-collected data to be exposed to third party entities like the CDN.
We build next-page transition probabilities from our existing RUM implementation. Our
RUM implementation samples individual page downloads, so it does not contain page
transition information. For this, we extended RUM measurements to include a Referer field,
which contains the previous page from which a page was accessed (Fig. 4.7). The Referer
is a normal HTTP header field that is included in many web requests. By including it in
the beacon that clients send to our RUM backend, we can observe some page transitions
over time.
The Referer has some limitations that are not critical for prepositioning. First, it does
not reveal the full path of a user through a web site; it only reveals a specific navigation
from one page to another, which is why we focus only on single transitions to next pages.
Second, since RUM measurements are collected only for domains that have opted into the
service, the samples cannot capture navigations out of the website towards a domain
without RUM enabled, or towards a domain not served by the CDN at all (e.g., the
transition from C to E2 in Fig. 4.7). The measurements can capture navigations into the
website (e.g., following links from a search engine, like the transition from E1 to N2), but
we ignore those as well, since we cannot use them for prepositioning anyway. Third,
because of the random sampling, this approach might not reveal all possible next pages
given a page: it can only reveal those transitions that were randomly sampled. Thus,
Referer might miss infrequent next-page transitions; given their low probability, objects
from such pages are not good candidates for prepositioning anyway.
Figure 4.7: Collection of real user page transitions. Navigations within a web site served by
the CDN are sampled, reported to the backend and analyzed offline for page transition probability
estimation.
We use the Referer to estimate the empirical probability of navigating to a given next
page from the current page. Specifically, let L be a landing page for which we want to
perform PostOnload prepositioning, in order to optimize the loading of a next page $N_i$ of
page $L$. We collect all RUM beacons with $L$ as the Referer value. Such a beacon represents
a real transition from $L$ to the beacon’s target page ($N_i$). Then $N = [N_1, N_2, \ldots, N_k]$ is the
set of all observed next pages. We then count the number of observed sampled transitions
from $L$ to $N_i$ (denoted $|L \rightarrow N_i|$) over a period of time in the recent past and calculate
the empirical probability of transitioning to page $N_i$ from page $L$ as:
$$P(i) = \frac{|L \rightarrow N_i|}{\sum_{n=1}^{k} |L \rightarrow N_n|}$$
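A minimal sketch of this estimation, assuming each sampled beacon has been reduced to a (Referer, target page) pair, follows:

```python
from collections import Counter

def transition_probabilities(beacons, landing_page):
    """Empirical next-page transition probabilities P(i) for one landing page.

    beacons: iterable of (referer_url, page_url) pairs extracted from
             sampled RUM beacons (assumed input format).
    Returns a dict mapping each observed next page N_i to
    |L -> N_i| / sum_n |L -> N_n|.
    """
    counts = Counter(page for ref, page in beacons if ref == landing_page)
    total = sum(counts.values())
    return {page: hits / total for page, hits in counts.items()} if total else {}

beacons = [("/home", "/deals"), ("/home", "/deals"), ("/home", "/cart"),
           ("/search", "/home")]  # a transition into the site, ignored for "/home"
print(transition_probabilities(beacons, "/home"))
# {'/deals': 0.666..., '/cart': 0.333...}
```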
Optimizing Candidate Assignment to Timeframes Once these transition proba-
bilities are estimated, we want to optimize object prepositioning by assigning preposition
candidates to PostOnload and PreHTML timeframes in order to minimize the expected
page load time across all next pages, while also minimizing unnecessary prepositioning. To
achieve this, we model the problem as an integer (binary) optimization which prioritizes
Figure 4.8: Illustration of the network-idle time frames during a page transition without
prepositioning (left) and a potential assignment of preposition candidates in those time frames
(right). The goal is to assign objects to PostOnload/PreHTML times in a way that minimizes
expected TTFCP and PLT across next pages as well as the cost of speculative prepositioning,
taking into account transition probabilities, object types (critical / non-critical / third-party),
object sizes, and available time.
between different object types, while jointly optimizing candidate assignment across the
two timeframes.
Since a user can navigate to any of the next pages, objects used in each of them are
preposition candidates. We denote the set of candidate objects in a next page $N_j$ by $\mathcal{O}_j$ and
the set of all candidate objects across all next pages by $\mathcal{O} = \bigcup_{j=1}^{k} \mathcal{O}_j$. Note that an object
may appear in more than one page, e.g., in Fig. 4.8, objects $\mathcal{O}_1 \cap \mathcal{O}_2 = \{A, B\}$ appear in
both $N_1$ and $N_2$. Let $s(i)$ denote the size of an object $i$.
Differentiating between critical and non-critical objects Objects of different types have
different importance. In most modern browsers, all CSS objects need to be processed before
Javascript can be evaluated, since the latter can reference the former. Page rendering
does not begin until all those critical objects have been received and processed.³ We
categorize all candidate objects into a set of critical objects $\mathcal{C} \subseteq \mathcal{O}$ and a set of non-critical
objects $\mathcal{N} \subseteq \mathcal{O}$, such that they partition the object set $\mathcal{O}$. Critical objects are important
for the Time To First Contentful Paint (TTFCP), while all objects (both critical and
³ There is an exception to this: some objects can be loaded asynchronously, indicating that they should be fetched after a page load event fires on the browser. These are usually scripts used for analytics, or other functions not vital to page rendering or load time. Such objects are tagged in our dataset and excluded from the prioritization process.
non-critical) affect Page Load Time (PLT). Since TTFCP is crucial to user-perceived
latency, we want to prioritize prepositioning the critical objects. Our problem formulation
is based on the empirical observation in our dataset that all critical objects across all
observed next pages can fit in the combined PostOnload+PreHTML timeframes. Since
all critical object candidates can be prepositioned, the goal is to assign them to specific
timeframes (PostOnload or PreHTML) in a way that also maximizes the expected benefit
on PLT, by allowing non-critical objects to be assigned to residual idle times in these two
timeframes after critical objects have been assigned.
Notation. We denote whether an object $i$ is selected for prepositioning in a particular
timeframe with a binary variable. Specifically, every object in the PostOnload timeframe
is available to all next pages, but every object in the PreHTML timeframe for page $N_j$ is
only available to page $j$. We use a binary variable $c_{i,post}$ to indicate that a critical object
$i \in \mathcal{C}$ is assigned to the PostOnload timeframe if $c_{i,post} = 1$ and to the PreHTML timeframe if
$c_{i,post} = 0$. Further, we define $n_{i,post}$ as a binary variable to indicate whether or not a
non-critical object $i$ is assigned to the PostOnload timeframe (it is if $n_{i,post} = 1$, it is not if
$n_{i,post} = 0$). Similarly, we define $n_{i,pre(j)}$ to indicate whether or not a non-critical object is
assigned to the PreHTML timeframe of a next page $N_j$.
The Objective Given the above types of objects, we want to preposition those objects to
minimize the expected page load time, so users spend minimum time on average to load a
page. This can be derived as follows. The total size of non-prepositioned objects of page
$N_j$, denoted as $PendingPayload(j)$, is:
$$PendingPayload(j) = \sum_{i \in \mathcal{O}_j} s(i) \;-\; \sum_{i \in \mathcal{C}_j} s(i) \;-\; \sum_{i \in \mathcal{N}_j} s(i)\,\big(n_{i,post} + n_{i,pre(j)}\big),$$
where $s(i)$ is the size of object $i$, $\mathcal{C}_j = \mathcal{C} \cap \mathcal{O}_j$ is the set of critical objects for next page
$N_j$, and $\mathcal{N}_j = \mathcal{N} \cap \mathcal{O}_j$ is the set of non-critical objects for next page $N_j$. The pending
payload for page $N_j$ does not include critical objects, which are always prepositioned
in our formulation based on our empirical observations. The time to load page $N_j$ is
$load\_time(j) \sim PendingPayload(j)$ (more specifically, it is proportional to the number
of round trips needed to transfer $PendingPayload(j)$). Recall that $P(j)$ is the transition
probability to page $N_j$ from the landing page $L$. Hence, the expected page load
time is $E[LoadTime] = \sum_{j=1}^{k} P(j)\, load\_time(j)$, which is the objective function in our
optimization formulation.
The Constraints We define a few constraints in our formulation. First, the PostOnload
quota $Q_{post}$ and the PreHTML quota $Q_{pre}$ respectively limit the maximum total size of
objects assigned to PostOnload and each of the PreHTML timeframes. This results in the
following constraints:
$$\sum_{i \in \mathcal{C}} c_{i,post}\, s(i) + \sum_{i \in \mathcal{N}} n_{i,post}\, s(i) \;\leq\; Q_{post} \qquad (4.1)$$
$$\sum_{i \in \mathcal{C}} \big(1 - c_{i,post}\big)\, s(i) + \sum_{i \in \mathcal{N}_j} n_{i,pre(j)}\, s(i) \;\leq\; Q_{pre} \qquad \forall j \in [k] \qquad (4.2)$$
From the above constraints, every critical object is prepositioned in either PostOnload
or PreHTML, but not both. However, we need a separate constraint set for non-critical
objects, to ensure that they are not assigned to both timeframes:
$$n_{i,post} + n_{i,pre(j)} \;\leq\; 1 \qquad \forall j \in [k],\ i \in \mathcal{N}_j \qquad (4.3)$$
Accounting for third-party objects Finally, our formulation has to account for third
party objects that cannot be prepositioned in the PreHTML timeframe, since the Push
mechanism does not allow that. Let $\mathcal{T} \subseteq \mathcal{O}$ be the set of all third-party objects.
Critical third-party objects must be assigned to PostOnload time, since they cannot be
assigned to PreHTML and they must be prepositioned. Non-critical third-party objects
must either be assigned to PostOnload or not at all, since they cannot be assigned to
PreHTML but don’t have to be prepositioned. The third-party content (3PC) constraints
are formally defined as:
$$c_{i,post} = 1 \qquad \forall i \in \mathcal{C} \cap \mathcal{T} \qquad (4.4)$$
$$n_{i,pre(j)} = 0 \qquad \forall j \in \{1, \ldots, k\},\ i \in \mathcal{O}_j \cap \mathcal{N} \cap \mathcal{T}. \qquad (4.5)$$
Thus, the overall formulation of the objective is:
$$\min\; E[LoadTime] \qquad \text{s.t.:}\ (4.1), (4.2), (4.3), (4.4), (4.5).$$
Putting it together This formulation is a binary integer program, and can be intractable
for large problem sizes. In our dataset, the largest websites have a total of at most 500
prepositioning candidates. At this scale, this formulation can be solved by an optimized
solver like Gurobi [13] in a few minutes, so it can be easily run offline to generate preposition
configurations. Our evaluations (4.4) use this approach.
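For illustration, the sketch below expresses this formulation with the open-source PuLP modeling library as a stand-in for an optimized solver such as Gurobi; all inputs (candidate sets, sizes, quotas, transition probabilities) are hypothetical, and the expected load time is approximated directly by the probability-weighted pending payload rather than by round trips:

```python
import pulp

def assign_candidates(pages, sizes, critical, third_party, probs, q_post, q_pre):
    """Assign preposition candidates to PostOnload/PreHTML timeframes.

    pages: dict next_page -> set of candidate object ids (O_j)
    sizes: dict object id -> size in bytes (s(i))
    critical: set of critical object ids (C); the rest are non-critical (N)
    third_party: set of third-party object ids (T)
    probs: dict next_page -> transition probability P(j)
    """
    all_objs = set().union(*pages.values())
    noncrit = all_objs - critical

    prob = pulp.LpProblem("preposition", pulp.LpMinimize)
    c_post = {i: pulp.LpVariable(f"c_post_{i}", cat="Binary") for i in critical}
    n_post = {i: pulp.LpVariable(f"n_post_{i}", cat="Binary") for i in noncrit}
    n_pre = {(i, j): pulp.LpVariable(f"n_pre_{i}_{j}", cat="Binary")
             for j in pages for i in pages[j] & noncrit}

    # Objective: expected non-critical pending payload across next pages
    # (critical objects are always prepositioned, so they do not appear here).
    prob += pulp.lpSum(
        probs[j] * pulp.lpSum(
            sizes[i] * (1 - n_post[i] - n_pre[(i, j)]) for i in pages[j] & noncrit)
        for j in pages)

    # (4.1) PostOnload quota.
    prob += (pulp.lpSum(sizes[i] * c_post[i] for i in critical) +
             pulp.lpSum(sizes[i] * n_post[i] for i in noncrit) <= q_post)
    for j in pages:
        # (4.2) PreHTML quota for each next page.
        prob += (pulp.lpSum(sizes[i] * (1 - c_post[i]) for i in critical) +
                 pulp.lpSum(sizes[i] * n_pre[(i, j)] for i in pages[j] & noncrit)
                 <= q_pre)
        for i in pages[j] & noncrit:
            # (4.3) A non-critical object goes to at most one timeframe.
            prob += n_post[i] + n_pre[(i, j)] <= 1
            # (4.5) Non-critical third-party objects cannot be Pushed PreHTML.
            if i in third_party:
                prob += n_pre[(i, j)] == 0
    # (4.4) Critical third-party objects must go to PostOnload.
    for i in critical & third_party:
        prob += c_post[i] == 1

    prob.solve(pulp.PULP_CBC_CMD(msg=False))
    return ({i for i in critical if c_post[i].value() > 0.5},
            {i for i in noncrit if n_post[i].value() > 0.5},
            {j: {i for i in pages[j] & noncrit if n_pre[(i, j)].value() > 0.5}
             for j in pages})
```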
Landing page prepositioning When a user transitions to a CDN-hosted page from
an external site, such as a search engine, PostOnload prepositioning cannot be used since
the CDN does not control the previous page. In those cases, we only consider preposition
candidates for the particular page that is signaled through the HTML request, and we
can only assign them to the PreHTML timeframe, so the task is simpler. We order the
candidates in order of importance, from render-critical to non-critical, i.e.: CSS, Javascript,
fonts, images. This ordering is the default prioritization scheme used by browsers and has
also been used as the preferred prioritization in previous work [83,92]. Then, we greedily
assign to PreHTML time as many objects as we can fit, in that order.
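A sketch of this greedy assignment, assuming each candidate is annotated with a type label and a size, is shown below:

```python
# Browser-default priority order, from render-critical to non-critical.
TYPE_PRIORITY = {"css": 0, "js": 1, "font": 2, "image": 3}

def landing_page_push_list(candidates, quota_bytes):
    """Greedily fill the PreHTML quota for a landing page.

    candidates: list of (url, obj_type, size_bytes) tuples for the
                first-party preposition candidates of the page.
    Returns the URLs to Push, chosen in priority order until the quota fills.
    """
    push, used = [], 0
    ordered = sorted(candidates,
                     key=lambda c: TYPE_PRIORITY.get(c[1], len(TYPE_PRIORITY)))
    for url, _, size in ordered:
        if used + size <= quota_bytes:
            push.append(url)
            used += size
    return push
```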
4.4 Evaluation of Prepositioning
In this section, we evaluate the efficacy of prepositioning by running experiments on
websites hosted on a global CDN.
4.4.1 Experiment setup and methodology
Testbed and test targets In order to evaluate prepositioning, we test the designs
described above on live web pages hosted on a large, global CDN. We set up a CDN proxy
and 5 clients in a laboratory in Washington, and configured the proxy server to serve
real websites via HTTP/2. We then measured the performance of page loads with and
without landing page and transition prepositioning. When prepositioning is not used, the
configuration of HTTP/2 is the same as that in production.
The studied CDN permits its customers to opt into RUM monitoring. At the time
of writing, RUM was enabled for 686 distinct DNS domains. Of these, a little over half
(352) have landing pages that are fetched from the origin server, rather than served from
CDN cache. We use this subset because they are good candidates for the PreHTML
prepositioning techniques in this work. RUM measurements are sampled at a very low
sampling rate, and only 160 of the 352 had at least 50 RUM measurements per day. We
do not consider web sites with fewer than 50 daily measurements, in order to ensure that
we use enough samples for the process of identifying objects that should be prepositioned.
We use these 160 web sites in our experiments below.
Our experiments use five client workstations, each of which is a 64-bit machine with
a 2.5GHz CPU and 16GB memory running Windows 8 and connected using a wired
network. Having multiple clients helps us parallelize the experiments in order to get the
repeated measurement samples from each website for statistical validity. Each client runs
an instance of WebPageTest [46] to instrument the automated page downloads using full
browsers and to export various page load metrics (discussed below). In our experiments,
we use the latest version of the Chrome browser.
The CDN proxy used in the experiments is a server with specifications and configura-
tions identical to the deployed proxy servers in the CDN’s production network.
Our experiments are thus designed to measure prepositioning efficacy for wired clients,
in contrast to recent work [83] that has quantified benefits of pushing and prefetching
after the HTML response arrives, for mobile clients, in non-CDN settings. Our results are
a lower bound on the performance improvements likely in mobile since our approach is
additionally using the considerable PreHTML idle time observed in CDN settings, but we
have left to future work prepositioning for mobile devices because a realistic prepositioning
policy will also need to consider energy and data plan limits, which are not considered in
previous work [83].
Page Load Metrics Several metrics have been proposed for measuring page load times,
each measuring different points in the load process. We evaluate the performance of
prepositioning for two metrics:
Time to first contentful paint (TTFCP) [40] is the time during a page load when the
browser renders the first contentful pixel. Page Load Time (PLT) is typically signalled by
the browser when it finishes loading a page. PLT has been traditionally used as a metric
for page load performance.
We focus on these metrics since they capture two major aspects of a page load that
are relevant to user-perceived latency: initial meaningful paint, and the time all network
activity related to fully displaying the page ends.
Page Transition Prepositioning Experiments These experiments evaluate the im-
pact of prepositioning when a user transitions from a landing page of a web site to another
page within the web site. For this, we perform back to back downloads of the same
page transition many times, with different settings as described below. Each experiment
consists of several runs, where a run evaluates a single transition between pages under one
of these settings. For each experiment run, we pick a next page out of the set of possible
next pages, based on the observed probability distribution of real user transitions (4.3.4).
A batch of runs consists of one run for each of the settings: this way, since each run of a
setting is explored at the same time as runs of other settings, we can minimize effects due
to network and time of day variability. We then run 100 batches in order to get enough
samples to obtain statistically meaningful distributions of performance for the following
settings:
Baseline: These runs load a landing page, and then load (and measure) the chosen
subpage, without any prepositioning. This is used as baseline for comparison.
RepeatView: These runs load the second page twice, and measure the second load
(with warm cache). Prepositioning cannot perform better than having a warmed-up cache,
but this setting represents an ideal against which to compare prepositioning performance.
TransPrep: These runs load a landing page and then a secondary page, and perform
page transition prepositioning during the transition to the second page, using the opti-
mization formulation discussed in (4.3.4). The performance of the second page load is
measured.
TransPrepOracle: These runs are similar to TransPrep, but for the PostOnload
prepositioning, assume an oracle that can accurately guess the next page a user will
navigate to. Only objects in that page are considered for prepositioning. By comparing
TransPrepOracle to TransPrep, we aim to evaluate the impact of using estimated next
page probability distributions to select and prioritize preposition candidates.
For both TTFCP and PLT, and for each setting, we compute the fractional difference
in the mean (from 100 runs) between that setting and the Baseline. We then draw
distributions of this difference across our 160 targets.
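Concretely, the per-target summary statistic is computed as in the following sketch (variable names are illustrative):

```python
def fractional_difference(baseline_samples, setting_samples):
    """Fractional difference in mean PLT/TTFCP between a setting and Baseline.

    Since both metrics are load times, negative values mean the setting
    loaded faster than Baseline; we plot this value across the 160 targets.
    """
    base = sum(baseline_samples) / len(baseline_samples)
    test = sum(setting_samples) / len(setting_samples)
    return (test - base) / base
```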
Landing page Prepositioning Experiments These experiments evaluate the impact
of PreHTML-only prepositioning, used in those cases where a user lands for a first view
on a CDN-hosted page from another page.
Baseline: These are page loads without any prepositioning. This serves as the baseline
for comparison.
RepeatView: These runs measure the performance of the second of two consecutive
loads of the same page, so the browser cache is populated with cacheable objects. As
before, this represents an ideal against which to compare landing page prepositioning.
LandPrep: These are page loads where the prioritization process selects the best
candidates, and the push quota limits the total amount pushed in PreHTML time (as
described in 4.3.4), in order to preposition for a landing page.
LandPrep3PC: These runs emulate the ability to additionally preposition third-party
content (3PC) in PreHTML time. For this setting, we want to selectively pre-warm the
browser cache to emulate 3PC prepositioning. Before running an experiment, we trigger
the download of a simple page with directives that instruct the client to Prefetch the
objects we want to warm up [29]. They are included in the Push Candidate selection
process and counted towards the preposition quota. First-party objects are Pushed
normally. We then load the target page and measure its performance. By comparing
LandPrep3PC to LandPrep, we can evaluate how much more performance would improve
if the CDN could push third-party objects (or instruct the client to fetch them from a
third-party server as soon as the page request arrives at the CDN proxy). This setting
approximates the efficacy of Vroom [83] if it were applied to wired clients in a CDN setting.
This recent work prefetches third-party content using a different mechanism that was
evaluated for mobile device clients in a non-CDN setting. Another difference is that with
Vroom, prepositioning of third-party objects would only happen after the HTML has
arrived at the client, which does not utilize the PreHTML time.
4.4.2 Results
Prepositioning for page transitions Fig. 4.9 (top) shows the performance improve-
ment, relative to baseline (no prepositioning) for the 160 targets for the PLT metric.
Here, RepeatView has a significant improvement: over 40% of the pages see nearly 25%
improvement in PLT. This suggests that next pages have a lot of content distinct from
that of the landing page.
Importantly, from our perspective, TransPrep matches RepeatView quite closely: it
suggests that our optimizations in 4.3.4 are identifying and prepositioning objects correctly
based on transition probabilities, while taking into account the fact that the same object
can appear in multiple next pages. Thus, our CDN can use prepositioning to match warm
cache behavior for page transitions.
We also note that the differences between TransPrepOracle and TransPrep are negligible
in this case: a careful optimization of the prepositioning quota, together with empirical
Figure 4.9: Impact of page transition prepositioning compared to baseline performance. The
performance boost provided by careful prepositioning during page transitions (yellow line) is close
to the performance boost of loading the second page with a warm browser cache (green line).
Additionally, speculative prepositioning considering objects from all possible next pages (yellow
line) performs similarly to having an oracle that can predict the next page and considering for
prepositioning only objects that exist in that page (red line).
page transition probabilities from the Referer seem sufficient to overcome the uncertainty
of not knowing which page the user will visit next.
Fig. 4.9 (bottom) shows the impact of prepositioning on TTFCP. TTFCP is affected
only by a subset of objects, those on the critical render path. We observe that, for nearly
80% of the targets, RepeatView only improves TTFCP by less than 20%. This small
improvement suggests that the critical object payload for these websites was small. If a
user transitions to a next page within the same website as the landing page, the next page
will only need to download those critical objects that are not cached from visiting the
landing page. For most pages then, TTFCP gains using prepositioning are expected to be
small (since RepeatView gains are an upper-bound on prepositioning gains). However, for
nearly 20% of the websites, the TTFCP gains from RepeatView can range from 20% to
over 50%.
Like with PLT, TransPrep follows RepeatView for TTFCP, showing that page transition
prepositioning provides benefits equivalent to a warm cache for many websites. TransPrep also
performs as well as TransPrepOracle. This is as expected since we have enough quota
to preposition all critical objects for any transition. The downside is that we may be
prepositioning objects that may not be used (discussed below).
But why is there a gap between TransPrepOracle/TransPrep and RepeatView? There
are 3 plausible reasons. One is the error in estimating skeletons from the RUM data.
There are cases where cache-busting mechanisms lead the RUM-based candidate selection
to ignore some objects, because their URLs change too often. Such mechanisms are
sometimes used by web page owners specifically to prevent intermediate caches like CDNs
from caching those objects. This can be done by adding a timestamp to the URL of
an object, for example. As such, the RUM-based prepositioning candidate selection
process could fail to identify those objects as popular. By comparing selected candidates
with downloaded objects, we find that these missed objects are too few to explain the
performance gap. The second is that, for some pages, it may not have been possible to
preposition all objects. However, we verified that for TransPrepOracle, the preposition
quota is enough for all candidates in a particular next page.
The third hypothesis is that prefetched objects could be processed differently by the
browser than cached objects in RepeatView, leading to a performance difference for the
two cases even when all the objects that were cached in one scenario were prefetched in the
other. By manually examining waterfalls and Chrome download traces for 37 websites in
our set, we found that browser processing accounts for part of this performance gap. This
also explains the gap between RepeatView and TransPrep/TransPrepOracle for TTFCP.
In Chrome (but other browsers are qualitatively similar [2]), objects can be stored
in memory, in an HTTP (or disk) cache, and in a Push cache which stores HTTP/2
pushed objects. Prefetched objects from the PostOnload phase are stored in the HTTP
cache, and Pushed objects in the PreHTML phase are stored in the Push cache. How
the browser implements the Push cache is browser dependent: Chrome appears to use
disk to store pushed objects. The difference between RepeatView and TransPrep arises
from this difference: in our RepeatView experiments, all objects are in the memory cache,
while in TransPrep objects have to be retrieved from the on-disk HTTP cache or Push
cache. We illustrate this difference using Chrome traces from one of our experiments (Fig.
4.10). The top shows prepositioned objects that take tens of milliseconds to load from the
cache. The bottom shows that all of those same objects are loaded within 1 millisecond in the
RepeatView run.
Prepositioning for landing pages For landing pages, Fig. 4.11(top) shows the impact
of prepositioning on TTFCP. Landing page prepositioning, shown by the LandPrep curve, provides
small benefit by itself, achieving less than 10% gains for 90% of the websites. A major
reason for this is the prevalence of 3PC content [63]. Being able to preposition 3PC
Figure 4.10: Loading from cache objects that have been prefetched (top) versus fetched normally
(bottom)
content, as in the LandPrep3PC setting which emulates that ability, brings the prepositioning
performance closer to RepeatView for TTFCP: for about 40% of the websites, more than
10% TTFCP gains can be had, with 20% of the websites seeing 30-50% improvements.
However, there is still a gap between LandPrep3PC and RepeatView: the reasons for it
are similar to the reasons for the gap between TransPrep and RepeatView for PLT.
Finally, we observe that, while CDNs have an easily deployable mechanism for prepo-
sitioning 3PC content for page transitions (by using the Prefetch directive on the landing
page), mechanisms for prepositioning 3PC content for landing pages are still not widely
deployed. Recent research has explored and evaluated a mechanism that could allow
prepositioning of third-party objects [83] right after the HTML arrives at the client (but
not PreHTML), and other proposals are under development [12].
Figure 4.11: Impact of PreHTML prepositioning for landing pages compared to baseline perfor-
mance
As an aside, we saw earlier (Fig. 4.5) that prepositioning aggressively, without
considering the available PreHTML idle network time and without imposing a quota,
results in up to 50% worse performance for about half the websites. The LandPrep line in
Fig. 4.11 (top) shows that the use of a preposition quota mitigates that risk: the use of a
quota ensures some performance improvement for almost all pages.
The performance impact of landing page prepositioning on Page Load Time (Fig.
4.11, bottom) follows a similar pattern to that of TTFCP, except that the gap between
LandPrep3PC and RepeatView is higher, likely due to the larger excess quota than for
Figure 4.12: Data reduction during page load process due to transition prepositioning (CDF of web pages vs. requested payload reduction (%); lines: RepeatView, TransPrep, TransPrepOracle, TransPrepEqualProb)
next-pages (which benefit from having a PostOnload timeframe, and have cached objects
from the landing page).
Payload reduction during page load due to prepositioning Fig. 4.9 showed
that the performance improvement when prepositioning considering next page transition
probabilities is comparable to that of having an oracle that can correctly guess the next
page. To explain this, Fig. 4.12 plots the reduction in bytes transferred during page loads
after speculative prepositioning. This shows that TransPrep and TransPrepOracle result
in comparable reduction in the number of bytes transferred, relative to the baseline scheme
(TransPrep is less accurate since it considers objects from all potential navigations).
Comparison to simpler prepositioning alternatives In Fig. 4.12, the line TransPrepE-
qualProb shows the payload reduction if a simpler technique was used to choose what
objects should be prepositioned in the available time. In this scenario, the estimated next
page navigation probabilities are ignored, and all next pages are assumed to have equal
Figure 4.13: Comparison of proposed prepositioning to strawman approaches (CDF of page loads vs. Page Load Time change (%); lines: proposed prepositioning, most shared object, most popular next page). Taking into account object types and the estimated probabilities of navigating to any next page leads to greater benefit than prepositioning for the most popular page, or for the most popular objects.
probabilities of being navigated to. The difference between TransPrep and TransPrepE-
qualProb highlights the benefit of using estimated probabilities from historical data.
Specifically, when taking the estimated transition probabilities into account, the median
next page load observes 50% payload reduction after prepositioning, while the respective
decrease is 24% if equal probabilities are assumed.
Similarly, Fig. 4.13 shows how our prepositioning mechanism compares to two other
strawman approaches, in terms of impact on PLT. The first approach (most shared object)
selects objects to preposition based on how many of the next pages they appear in, without
considering the probability of navigating to each of these pages. The second approach
(most popular next page) considers prepositioning all the objects that appear in the most
probable next page, but none of the objects in other pages. The proposed prepositioning
approach considers the probability of requesting any object from the total set of objects
existing in all next pages, by taking into account what objects appear in each next page
Figure 4.14: Fraction of prepositioned data that was used during page load (landing page
prepositioning)
and the probability of navigating to it. As the figure shows, this achieves double the
performance benefit compared to either of the simpler alternatives in the median case.
The Cost of Speculation Fig. 4.14 shows the fraction of bytes that were prepositioned
speculatively and were actually required and used during a page load (after the HTML
was parsed by the browser), for PreHTML prepositioning. We call this the prepositioning
efficiency. We extracted this information by analyzing Chrome timeline traces for the
issued experiments. The figure shows that for 80% of the pages, all of the prepositioned
payload was useful, while only for 10% of the pages is less than 80% of the prepositioned
payload used.
Similarly, for transition prepositioning, Fig. 4.15 shows the difference in useful prepo-
sitioned data between having a hypothetical oracle that can predict the next page
(TransPrepOracle), versus having to consider candidates from all pages (TransPrep),
when prepositioning for page transitions. With the oracle, the efficiency is high for all
pages (although not all prepositioned payload is always used for all pages, because we
Figure 4.15: Fraction of prepositioned data that was used during page load (transition preposi-
tioning)
calculate the set of objects to be prepositioned from sparsely sampled RUM data). In
practice, without an oracle (TransPrep), the efficiency is lower and uniformly distributed,
because we speculatively additionally preposition objects for pages that are not visited.
Together with Fig. 4.9 and Fig. 4.12 these plots show that although speculative
prepositioning fetches more objects than necessary for a single next page and achieves
slightly suboptimal pending payload reduction, it gives similar performance improvement
as having an oracle that accurately guesses the next page.
Comparison of Preposition Candidates selection to Vroom As described in
section 4.3.2, we use historical real user (RUM) data to identify the potential set of
useful objects to consider for prepositioning. The benefit of this approach is that by analyzing
hundreds of real page loads from real users every day, we expose differences between page loads
from various user agents, locations, personalization settings, and times of day. This helps
accurately identify objects that are stable across different downloads, and not tailored
to the user, location, personalization etc. This also exposes objects that are not directly
referenced in a page’s HTML.
Vroom [83] uses an approach which combines a similar offline analysis of previous page
loads with an online analysis of the HTML file that is being served. Specifically, for the
offline analysis, a Vroom-enabled server periodically loads a web page every hour, and
generates the intersection of objects appearing in the last three loads of that page. For
the online analysis, a Vroom-enabled server parses an HTML that it serves, right before
it sends it to a client, in order to identify any embedded URLs that exist on the HTML
of that particular page load. While this adds 100ms of delay, it is meant to identify
objects that the offline part of the analysis could not capture, since many objects vary for
back-to-back loads of the same page.
To simulate Vroom’s candidate object generation, we followed this process: to get the
set of objects detected from the offline analysis part, we used three RUM data points
for each target web page, each generated one hour apart from the other, and got the
intersection of the objects appearing in those downloads. For the online analysis set,
we used an HTML file generated for a page load on the same day as the three offline
downloads, and analyzed it to extract embedded URLs. For reference, we also kept the
full list of objects requested for that particular ’online’ download. The union of the offline
and the online sets gives us an estimation of what Vroom would have selected in our
experiment runs.
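The set operations behind this simulation are sketched below; the input names are illustrative, with object URLs extracted from the three hourly RUM samples and from the parsed ’online’ HTML:

```python
def simulate_vroom_candidates(offline_loads, online_html_urls):
    """Approximate Vroom's candidate set from our data.

    offline_loads: three sets of object URLs, one per RUM sample taken an
                   hour apart (stand-in for Vroom's hourly page loads).
    online_html_urls: URLs embedded in the HTML of the 'online' page load.
    """
    offline = set.intersection(*offline_loads)  # objects in all three loads
    return offline | set(online_html_urls)      # union with online analysis

def compare_to_rum_candidates(vroom_set, rum_candidates):
    """Objects found by only one approach, used for Fig. 4.16/4.17-style plots."""
    return {
        "vroom_only": vroom_set - rum_candidates,
        "rum_only": rum_candidates - vroom_set,
        "common": vroom_set & rum_candidates,
    }
```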
Fig. 4.16 shows that our RUM-based approach (PCs) generates a Preposition Candidate
set with 10-15 fewer objects compared to that generated by Vroom (Vroom(Offline and
Online)) on the median case. Vroom’s online analysis finds only 1 extra object that
Figure 4.16: Preposition Candidates selection compared to Vroom (CDF of web pages vs. number of objects; lines: PCs, Vroom(Offline and Online), Vroom(Offline)ButNotInPCs, Vroom(Online)ButNotInPCs). Our approach based on historical real-user download analysis can capture the same objects as Vroom, while filtering out objects that vary between downloads more accurately, and without requiring the additional overhead of online HTML analysis.
our approach does not find, on the median case (Vroom(Online)ButNotInPCs). Almost
all the extra candidates that Vroom identifies, which the RUM does not, are generated
from Vroom’s offline analysis (Vroom(Offline)ButNotInPCs). Vroom’s offline filtering of
popular objects, which is based on 3 samples from the last 3 hours generated by a single
client location, is expected to be worse than that based on hundreds of samples per day
generated by different users and user devices around the world. Analysis verifies that
the extra objects that Vroom identifies are objects that actually change over multiple
downloads of the same page, which did not exist in the full set of objects downloaded in
the ’online’ download sample, and so should not be considered for prepositioning.
Fig. 4.17 shows this result from a different perspective by highlighting a few points:
first, our approach captures almost all the objects that Vroom’s online approach identifies
(Vroom(Online)AndPC and Vroom(Online) match); second, the common objects found
by both approaches are the same as the common objects found by both approaches if
Figure 4.17: Comparison between Vroom’s and our Preposition Candidate selection (CDF of web pages vs. number of objects; lines: Vroom(Online), Vroom(Online)AndPC, Vroom(Offline)AndPCs, VroomAndPCs, PCsButNotVroom)
Vroom only did offline analysis (VroomAndPCs and Vroom(Offline)AndPCs match); lastly,
our approach identifies a negligible number of additional candidates compared to Vroom
(PCsButNotVroom).
These results show that a RUM-based offline-only approach that uses a large amount
of diverse historical data can overcome the limitations of the offline-only part of Vroom’s
object selection, achieving slightly better filtering of unnecessary objects, and at the same
time obviates the need for an online analysis of the HTML, which adds 100ms processing
delay but provides negligible additional benefit to the RUM-based preposition candidate
set generation.
On the Stability of Next-page Transition Probabilities A fairly accurate estima-
tion of the probability that a user will navigate to a particular next page is important for
the process of selecting objects to preposition during page transitions. Since we estimate
Figure 4.18: Similarity of prepositioned payload for the next page when objects are selected
based on next page probabilities derived from two different weeks
those probabilities based on page transitions exposed through the Referer field of page
requests, we are making an intrinsic assumption that these probabilities are relatively
stable over time. This is because we aggregate these transition probabilities over a period
(in our experiments, a week), given the low sampling rate of the RUM data.
In order to validate the stability of these estimates, we look at how those distributions
might change over time. Specifically, we want to evaluate how distribution changes affect
the selection of prepositioned objects. For that, we look at the similarity of the selected
prepositioned payload for next-page navigation, when objects are selected based on next
page probabilities derived from Referers aggregated over two different, consecutive, weeks.
Fig. 4.18 shows that, for half of the pages, at least 85% of the selected payload was
the same. We have reason to believe that the transition probabilities are more stable
over shorter timescales (on the order of days), so we expect to improve these results by
increasing the sampling rate of the RUM measurements.
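One way to quantify this similarity (the exact measure we report may differ in detail) is the byte-weighted overlap of the two weeks’ selections, as sketched below with illustrative input names:

```python
def payload_similarity(week1_selection, week2_selection, sizes):
    """Fraction of the selected payload common to both weeks' selections.

    week*_selection: sets of object URLs selected for prepositioning based
                     on transition probabilities derived from each week.
    sizes: dict mapping URL -> size in bytes.
    """
    common = sum(sizes[u] for u in week1_selection & week2_selection)
    total = sum(sizes[u] for u in week1_selection | week2_selection)
    return common / total if total else 1.0
```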
4.5 Chapter Summary
In this chapter we examined the design and evaluation of landing and page transition
prepositioning in large CDNs. Our technique leverages CDN instrumentation which enables
collection and analysis of large datasets of real user download traces, used to determine
preposition candidates and page transition probabilities. We also develop an optimization
algorithm that minimizes expected page load time, subject to browser rendering order and
page sharing constraints. We find that CDNs can achieve warm-cache performance for
page transitions, modulo performance artifacts due to browser caching behavior.
Chapter 5
Literature Review
This chapter summarizes and organizes previous research that is related to the work
presented in this dissertation as well as newer or ongoing work relevant to some of the
topics we discuss.
5.1 Path Inflation
We have demonstrated how mobile network topologies and path inflation have evolved
in the last decade, and that ingress point location is one of several factors that can
affect mobile performance. These results complement research from 10 years ago which
showed that interdomain routes suffer from path inflation particularly due to infrastructure
limitations like peering points only at select locations, but also due to routing policies [88].
More recent work has contradicted this older result by showing that many content providers
have direct connections to the majority of networks hosting their users [56].
A large body of work has investigated path inflation for wired networks as well as its
implications in the past. Savage et al. [85] as well as Tangmunarunkit et al. [89] showed
that the best-performing existing path between a client and a server is not always
selected, as a result of the policies used for inter-domain routing on the Internet.
Other researchers investigated reasons for suboptimal performance of clients of Google’s
CDN, showing that clients in the same geographical area can experience different latencies
to Google’s servers [66,100]. In a similar vein, Rula and Bustamante have investigated
how poor localization of clients by DNS can result in inflated latency [84].
5.2 Mobile performance
Cellular networks present new challenges and opportunities for studying path inflation, as
well as other factors related to page load speed. One study demonstrates differences in
metro-area mobile performance but does not investigate the root causes [87]. Other work
shows that routing over suboptimal paths due to lack of nearby ingress points causes a
45% increase in RTT latency because of the additional distance traveled, compared to
idealized routing [58].
Xu et al. analyze how transparent Web proxies in cellular networks affect Web
performance [96] for mobile clients. Some studies recently looked at the performance
of mobile virtual network operators (MVNOs) that operate on top of existing cellular
structures [86,99].
The Google-backed AMP project [3] is an open source initiative that encourages page
developers to adhere to best practices (derived from Google PageSpeed Insights [25]), in
order to ensure fast delivery and loading of pages on mobile devices. Other recent work
optimizes mobile page delivery by preprocessing scripts on the server on behalf of the
mobile client [75].
5.3 HTTP/2 performance
Several prior studies have assessed the performance of SPDY [33,55,60], the precursor
to HTTP/2. These papers have shed significant insight into SPDY performance and,
especially, the impact of different network characteristics on the performance of SPDY
compared to HTTP/1. Many of those studies use the approach of recording and replaying
websites, which can miss out on unreplayable parts of a page download, and does not
expose the variability across many downloads of the same page due to personalization,
localization and dynamic content [78, 93]. Our work uses traces from real page views,
so contains actual processing and rendering delays, and realistic client distributions.
Furthermore, by using a model, we are able to explore several what-if scenarios on a very
large dataset at fairly fast speed.
Prior work [34, 61] has also focused on the impact of SPDY specifically on cellular
networks. The results are ambiguous, with some showing PLT decreases of 23% and others
highlighting that the single channel enforced by SPDY suffers more often from spurious
retransmissions. Our work is complementary, and we have left to future work to explore
whether $\Delta_{PLT}$ values for mobile clients are qualitatively different from those for desktop clients.
Our HTTP/2 modeling work would benefit from a tool that calculates
object relationships, like the browser plug-in wProf [92]. Unfortunately, wProf calculates
dependencies in real time, which cannot be used at the scale of traces that we are
dealing with. A similar tool could be used to share structures and critical paths of targeted
websites, which can inform optimal prioritization.
5.4 Prepositioning content on the client
Recent work focuses on prepositioning content from servers to mobile clients, with the
origin server pushing objects and instructing the client to prefetch third-party content
as it sends the HTML response [83]. The impact is evaluated by replaying copies of web
pages from a controlled server. Our design makes use of a CDN infrastructure, allowing a
CDN proxy to use the time well before the HTML arrives from a remote server. More
importantly, we introduce PostOnload prepositioning for page transitions. Prepositioning
for next pages has been explored in the past through models and simulation [57,67,79].
We evaluate the impact of prepositioning in two time frames during page transitions
(before and after the arrival of the next page HTML), and on live pages served by the
origin server in a realistic CDN environment. We additionally introduce the notion of a
preposition quota, to avoid negative impact.
Other work has evaluated the impact of using HTTP/2 Push in simulation [98] or by
replaying copies of pages from a controlled server [93]. Those approaches did not consider
prepositioning for next pages. Among those, some show mixed results, verifying the
need for a preposition quota [82], and exposing a flaw in replay-based evaluations: Most
modern dynamic web pages are hard to replay faithfully, as they often involve execution
of server-side scripts that cannot be emulated by the replay server, and this can end up
missing a whole subtree of the page load, increasing the risk of misleading results. Our
evaluations perform downloads on live pages.
More recently several articles have appeared that describe experiences with HTTP/2
Push and lessons learned [1,16,17,32]. Much of that work is consistent with the work presented
in this dissertation and concludes that Push is not suitable for all environments and that
it should be used with caution in order to extract its benefits.
Previous work evaluates the efficacy of prepositioning for mobile applications by
predicting user actions based on ad-hoc application logs, concluding that future work
should focus on deciding which objects should be prepositioned [81]. Some of the work
presented in this thesis has been a step in this direction.
5.5 Other Page Load Optimizations
Web performance has gained a lot of attention in the last 10 years, both in the academic
community and the industry, as it has been closely associated with company revenues.
Specifically, delays in page load times directly affect bounce rates (the percentage of web page
visitors that give up waiting for a page to load and navigate to a different page), which
can have a huge impact on for-profit organizations, particularly e-commerce web sites. As
such, there has been a very large body of work that focuses on reducing page load times.
Analyzing object dependencies for better prioritization Some prior work reduces
page load time by analyzing web dependencies either offline [51] or on the fly [76] and
then prioritizing web page content such that the critical path of a page load is shortened.
These approaches try to make best use of the idle network time during a page load, rather
than around it, and hence are complementary to our prepositioning work.
Network protocol optimizations SPDY [18] and QUIC [28] propose network opti-
mizations to improve page load times. Other protocol optimizations help reduce connection
setup times [80] or loss recovery time [59,62].
Proxy-based optimizations A line of work has utilized proxies to speed up page loads.
Flywheel [47] and Opera Mini [23] achieve this by compressing data on intermediate
proxies and delivering a smaller payload to the client. Shandian [94] and Amazon Silk [4]
use proxies to execute part of the page load on behalf of the client in order to reduce the
processing overhead on the client.
Other known web development best practices [25] are often enforced by intermediate
proxies, by restructuring the HTML sent by the remote origin server at the proxy. Such
optimizations include combining all stylesheets into one file, inlining scripts into the
HTML, or combining images into a single large image (”sprites”) to reduce the number of
stylesheet, javascript, or image objects downloaded.
Browser optimizations The browser itself presents several opportunities to implement
optimizations relevant both to network activity and processing, rendering, etc. Zoomm [53]
preprocesses the HTML as soon as it arrives, in order to prefetch and preprocess objects
before the HTML parser requests them. Such optimizations that trigger speculative
download of objects exist in many popular modern browsers. Other work suggests
splitting data to chunks to parallelize processing on the browser [70, 72], which aims
at reducing the traversal of the critical path of a page load. One recent browser-side
optimization for Chrome detects when a connection is slow on a mobile device, and loads
a previously stored version of the page on the device [10].
In order to reduce user-perceived load times, Chrome’s lazy loading [9] defers
downloading images that are ”below the fold”, i.e. objects that do not appear on the part
of a page that fits on the user’s screen, and will only trigger the download of those images
if the user scrolls down to the part of the page that they exist in.
The World Wide Web Consortium’s Network Information API [45] defines a way for browsers
to determine and report the connection type used by the browser at any given time.
Google’s Network Quality Estimator [22] uses that API to constantly collect user-end
measurements enriched with network quality information, and provide developers with
page load and network performance metrics as experienced by the client. Such historical
or real-time measurements can be used to adapt different aspects of page loads depending
on network quality, from dynamic transport protocol parametrization to even deciding
between application-layer protocols (e.g., HTTP/2 vs. QUIC) [74].
Page load performance metrics A very significant body of research is dealing with
identifying or defining the right metrics to measure performance. While this research does
not always directly optimize transfer speeds, it aims at answering a very important question:
What metrics should we optimize for in order to achieve user-perceived performance boost?
Traditional metrics like Page Load Time (PLT) [24] or Time To First Byte (TTFB) [39]
have been shown to capture user-perceived performance inadequately: PLT captures the
time all network activity related to a page load ends, which can happen well after a web
page has loaded and become interactive, as far as a user is concerned. TTFB, on
the other hand, can signal a time well before a page is usable, as it could signal the
arrival at the client of the first bytes of the header of an HTML file [36]. Time To First Paint
(TTFP) [41] signals the time the first pixel of a page was drawn on screen. However, this
pixel could be anything painted too early, like the background color of a page, and the
page could be interactive much later than that, so this metric is often not helpful.
On the other hand, Time To First Contentful Paint (TTFCP) [40,41] signals the time
some text, image, or canvas element is painted on the screen. While this captures user-perceived load
speed more accurately than previous metrics, it can still fire when some less meaningful
content like a header or navigation bar is shown. As a step closer to a user-perceived
performance metric, Time to First Meaningful Paint (TTFMP) [42,43] signals the time
some meaningful content is painted on the user screen. This measures layout operations
by the browser, and captures the time most of the content was painted on the screen, thus
avoiding small (and less interesting to the user) layout events.
Speed Index [35] takes the visual progress of the visible page loading and computes an
overall score for how quickly the content painted. This is (usually) achieved by exporting
a video capture of a page load and analyzing the rendered images at 100ms intervals,
to capture load progress over time. However, the fact that a page is displayed does not
mean that it is ready to be used by the user; parts of the page might require evaluation
of scripts before they are fully functional and responsive to user input. For this reason,
Netravali et al. propose Ready Index [77], to capture the time when the above-the-fold
content has been displayed and the page has become responsive and interactive. This is
similar to Chrome’s definition of Time To Interactive [44]. Lastly, Kelton et al. define
user-perceived PLT (uPLT) [65], which is measured by monitoring the user’s eye gaze on
the screen, and propose using that metric to measure and improve performance.
Chapter 6
Conclusions
6.1 Summary of contributions
In this dissertation we investigated two factors that can limit the speed of web transfers:
circuitous paths and unused channel capacity between clients and servers. Additionally,
we designed and evaluated an approach for increasing the efficiency of communication
channels between clients and CDN proxy servers.
In Chapter 2 we examined longitudinal measurements to Google servers collected over
18 months from mobile devices in diverse regions using different carrier networks. We used
that data to identify inflated paths, to create a taxonomy of reasons for such path inflation,
and to quantify the impact of the delay it introduces on web performance.
We find that in late 2011 at least 47% of the traces in our data suffered path inflation
due to one or more of the following reasons: a) limited carrier network infrastructure,
b) lack of local peering between the carrier and Google, or c) DNS-based mapping of mobile
clients to remote Google servers. Inflated paths added 5-50ms of propagation delay, which
corresponds to several hundred milliseconds of additional page load time.
In Chapter 3 we introduced a model of HTTP/2 that can be used to estimate the
performance of HTTP/2 using HTTP/1.1 traces. We used this model to estimate the
page load performance impact of deploying HTTP/2 on a large CDN (Akamai), for
which we have historical data of real user page loads. We pass a large dataset of
280,000 real user HTTP/1.1 page downloads through the model. We find that 67%
of the web sites in the dataset would see improvement with HTTP/2 while most of the
remaining web sites would see little change. We also show how two features of HTTP/2
(prioritization, and, more importantly, Push) could theoretically provide additional benefit
to all web sites.
Lastly, in Chapter 4, motivated by observations from our HTTP/2 model estimations
related to the potential benefit from using Server Push, we further explore the opportunity
to use the idle network time between a CDN proxy and a client in order to speculatively
send content to the client before it requests it. To this end, we identify two timeframes
around a page load during which the network channel is idle: a) the time between the
completion of a page load and the user's next interaction with the page (PostOnload time), and b) the
time between the user requesting a page and the HTML response arriving at the client (PreHTML).
We design and implement a way for the CDN to guess which objects a client is likely
to request in the near future, and to configure CDN proxies to speculatively send those
objects to the client during those idle timeframes. We find that, using the proposed policy
and a combination of Prefetch and Push between page transitions, page load times can
be reduced by up to 32%, leading to performance similar to having the whole page cached
in the browser in advance.
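As a rough, browser-side illustration of exploiting the PostOnload window (not the CDN-side mechanism developed in Chapter 4), the TypeScript sketch below fetches a list of likely-next objects from a hypothetical endpoint after the load event and adds prefetch hints for them, so the browser fetches them while the channel is otherwise idle. The endpoint name and response format are assumptions.

// Minimal sketch (TypeScript, browser context): during the PostOnload window,
// fetch a hypothetical list of objects the user is likely to need next and add
// <link rel="prefetch"> hints for them.
window.addEventListener('load', async () => {
  try {
    const resp = await fetch('/speculative-hints.json'); // hypothetical endpoint
    const urls: string[] = await resp.json();            // assumed format: ["/next.css", ...]
    for (const url of urls) {
      const link = document.createElement('link');
      link.rel = 'prefetch';
      link.href = url;
      document.head.appendChild(link);
    }
  } catch {
    // If the hint service is unavailable, the page load is unaffected.
  }
});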
6.2 Lessons Learned and Future Directions
The Web is a vast ecosystem of different devices, network types, and protocols operating
at different layers, and Web optimization is a complex objective that needs to be pursued at
several different layers at the same time. In this work we focus on part of that ecosystem,
and take a step toward fully understanding the potential for improvement. To continue to
get a better picture, we need to overcome some existing limitations, some of which we
encountered during this work.
We need continuous studies and evaluations of new technologies. Communities
and standardization groups like the Internet Engineering Task Force (IETF) and the
World Wide Web Consortium (W3C) play a crucial role in defining web standards. While new
proposals are usually accompanied by thorough testing and evaluation, it is important to
ensure that standards are continually being evaluated in a comprehensive way. Performing
holistic evaluations for new standards in the wild is hard.
Companies are very often involved in such groups and push towards standardization
of web technologies that aim at better performance, based on data drawn from their own
production networks. As such, it is not uncommon for proposals to be (knowingly or not)
designed around specific goals of the parties that suggest them, potentially ignoring some
unobserved scenarios.
More importantly, new proposals are generally hard to evaluate in all possible contexts.
In Chapters 3 and 5, we saw how different studies on HTTP/2 performance, including
the study presented in this dissertation, arrived at somewhat different results.
As an example, while HTTP/2 is quickly being adopted as the new standard, there is
an ongoing discussion about whether multiple parallel connections, like those used with HTTP/1,
are better than the single connection used in HTTP/2. The trade-off is especially unclear
in high-loss environments, where the parallel connections mitigate TCP's slowdown after
loss. Similarly, the choice between HTTP/2 and QUIC is not always clear, and the winner
depends on network characteristics [74]. Because of that, Chrome currently tries to open
racing connections using both protocols, and chooses the winner in real time.
Another example is the use of HTTP/2 Push: we now know that defining policies that
extract the benefits of Push is hard [16,17], and we saw that studies that evaluated its
impact reached ambiguous conclusions (Chapters 4 and 5). The feature
would be more widely adopted if there existed easy ways to extract its benefits
and avoid pitfalls. For example, the specification could define inherent ways to avoid
sending to the client objects that it already has in its cache [8], or ways to ensure that
pushing too much content to a client is impossible or harmless.
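To illustrate the idea behind such a safeguard, the TypeScript sketch below shows a server-side filter that skips pushing objects whose URL hashes appear in a digest reported by the client. A plain set of hashes is used here as a greatly simplified stand-in for the compressed digests in the Cache Digests draft [8]; the function names and digest encoding are hypothetical.

// Greatly simplified sketch of cache-digest-style filtering before Server Push.
// The client reports a digest of the URLs it has cached (here, a set of hashes);
// the server consults it before pushing, so cached objects are never re-sent.
import { createHash } from 'crypto'; // Node.js standard library

function urlHash(url: string): string {
  return createHash('sha256').update(url).digest('hex').slice(0, 16);
}

// Hypothetical digest received from the client (e.g. decoded from a request header).
type ClientDigest = Set<string>;

function selectObjectsToPush(candidates: string[], digest: ClientDigest): string[] {
  return candidates.filter((url) => !digest.has(urlHash(url)));
}

// Example: the client already caches the stylesheet, so only the script is pushed.
const digest: ClientDigest = new Set([urlHash('/static/site.css')]);
console.log(selectObjectsToPush(['/static/site.css', '/static/app.js'], digest));
// -> ['/static/app.js']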
More generally, a reason for such discrepancies in evaluations is often that they are
conducted under different settings, using different tools and methodologies, and focusing
on different networks and devices. For this reason, while standardizing specifications for
new technologies is necessary, specs alone are not enough: we need continuous and diverse
studies and evaluations of new features in order to develop best practices for using them,
or to amend the specifications so that best practices are enforced by default. Combining
results from multiple, diverse studies is important in order to expose different aspects of
newly proposed technologies and to understand settings under which those technologies
can be more impactful.
We need to understand and fix the third-party ecosystem. In Chapter 4 we
saw that third-party content can be the bottleneck for a page load. Third-party services
make things easier for developers by providing common libraries (e.g. jQuery, React and
Angular), for content owners by providing analytics (e.g. Google Analytics) and
personalized content placement (e.g. Outbrain), and for advertisers by simplifying
the process of installing ad trackers and targeting ads. However, such third-party content
is often the culprit for slow page loads [63]. It is important to understand the role of
third-party content and find ways to remove it as a bottleneck, without compromising the
value that it provides.
It is clear that we cannot simply block everything: libraries are needed for developer
simplicity, and ads are needed for monetization. AMP [3] strips down some script execution and
requires publishers to serve content from Google's CDN. Brave [6] strips ads and trackers, and
attempts to redesign the online ad ecosystem by creating better incentives for publishers,
users, and advertisers. Although the problem of third-party objects often being the
bottleneck has been recognized [37] and efforts are being made in that direction,
there is still a lot to be done.
We need ways to measure web performance for real users at global scale. Collecting
meaningful Web performance measurements is a challenging task due to the sheer size
and heterogeneity of the Internet. In order to be able to reason meaningfully about
performance, we need data that represents a large variety of device types connected to the
Internet backbone through networks of different capacities and with different interdomain
connectivities, over long periods of time. While collecting such diverse measurements is
very hard, it is absolutely necessary if we want meaningful studies, because sparse data
can lead to misinformed conclusions.
There are several companies that provide web performance insights by crowdsourcing
measurements and using user devices as vantage points, in order to capture performance
from the viewpoint of a geographically dispersed user base. Companies also
use synthetic measurements, deploying static vantage points to issue measurements
against either their own web sites, for analytics, or competitors' web sites, for benchmarking
and comparison. These datasets are rarely public and usually biased towards specific
targets. If the research community had the infrastructure to generate large, public, diverse
datasets using similar techniques, it would help gain important and useful insights on
Web performance.
On top of covering large user segments, meaningful measurements must also expose
multiple dimensions of web transfer performance in a unified way, in order to make it
possible to identify performance bottlenecks across the routing, network, and application protocol
layers, or even CPU bottlenecks on end-host devices.
Real User Monitoring, used heavily in our work [21,30], is a good start for exposing
application-layer performance details, but it is still lacking important features, like exposing
network-level details along with the traces. As a step in that direction, Google
is working on a way to report network quality through Chrome downloads [22]. While
this is now standardized [45], no other browser currently supports it. The benefits of
exposing such information through a standardized channel like RUM would be significant,
as it would allow RUM analyses to slice data and draw correlations between network
quality and performance.
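As an illustration of what such an enriched RUM beacon could look like, the TypeScript sketch below combines Navigation Timing data [21] with the network quality fields of navigator.connection and reports them with sendBeacon. The collection endpoint and payload shape are assumptions for illustration, not part of any existing RUM product.

// Minimal sketch (TypeScript, browser context): a RUM beacon that pairs
// application-layer timings with network quality information.
window.addEventListener('load', () => {
  // Defer so that loadEventEnd has been populated by the time we read it.
  setTimeout(() => {
    const nav = performance.getEntriesByType('navigation')[0] as
      PerformanceNavigationTiming | undefined;
    const conn = (navigator as Navigator & {
      connection?: { effectiveType?: string; rtt?: number; downlink?: number };
    }).connection;

    const payload = {
      url: location.href,
      ttfbMs: nav ? nav.responseStart - nav.requestStart : null,
      domContentLoadedMs: nav ? nav.domContentLoadedEventEnd : null,
      loadEventMs: nav ? nav.loadEventEnd : null,
      effectiveType: conn?.effectiveType ?? null, // e.g. '4g'
      rttMs: conn?.rtt ?? null,
      downlinkMbps: conn?.downlink ?? null,
    };

    navigator.sendBeacon('/rum-collect', JSON.stringify(payload)); // hypothetical endpoint
  }, 0);
});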
We need ways to gather measurements while preserving privacy. In order to
expose valuable data to the research community, we have to overcome the challenge of
safely sharing real user datasets without compromising privacy and anonymity. In this
work, we used data collected at companies with which we collaborated. Maintaining
anonymity has been a recurring theme in our collaboration with Akamai, as customer
and end-user privacy is taken very seriously. One way to allow analyses of sensitive data has
been signing NDAs between data owners and researchers, but this is a very ad-hoc
and thus slow process. Having a standardized way to make measurements available for
research purposes (e.g. similar to the way the Speedometer dataset we used in Chapter 2
is available to anyone) would significantly boost Web performance research efforts.
Challenges in emerging markets. The work presented in this dissertation focused
either on wired clients, or on mobile clients mostly in North America (due to the aforementioned
dataset limitations for mobile clients in other regions). However, the issue of evaluating
technologies for different environments becomes even more relevant when we consider an
important, but usually ignored, fraction of users: emerging markets, and in particular
emerging mobile users in developing countries. Companies are definitely interested in this
particular user segment (the so-called "next billion users") and are allocating resources and
investing in infrastructure. Many characteristics are different in these markets, including
interdomain connectivity, last-mile networks, and types of user devices, raising a set of
unique challenges compared to those met in more developed regions.
We need to attack Web performance at different layers. There have been great
strides in optimizing the network, and there are still many potential new directions.
However, every time we break a bottleneck a new one is revealed, and
for that reason it is not enough to optimize the network alone. The traditional OSI slicing
into 4 or 7 layers is a useful simplification for describing networks, but when we talk about
optimizing the Web we need to look at the whole stack in parallel. An iterative process is
required, with collaboration between experts in different sub-fields: fix something in one layer, measure
the impact, identify the new bottleneck (potentially in another layer), break it, measure again, and repeat. This
work has shown cases where the benefit from network optimization was bounded by
bottlenecks elsewhere, such as the browser's push buffer or prioritization logic on the server. Such cases are not
uncommon, and in order to unlock the full potential of new protocols, standardization
entities need to provide specifications that cover correct implementations on all involved parties.
We need to optimize for the right metrics. We saw in Chapters 4 and 5 that
traditional metrics like Page Load Time, Time To First Contentful Paint, or even Speed
Index don’t capture user-perceived performance faithfully. Focusing on different metrics
that more directly describe user experience and engagement with web sites is becoming
more and more important. This is a topic that is gaining a lot of traction in the last years,
and while some new suggestions have been made, there is a lot to be done until we have
converged to one or a few metrics that are good enough to replace the existing traditional
ones.
6.3 Epilogue
In this dissertation we investigated two factors that can limit the speed of web transfers:
circuitous paths and unused channel capacity between clients and servers. We also
designed and evaluated an approach for increasing the utilization of communication
channels between clients and CDN proxy servers.
However, Web optimization is a vast research field: performance bottlenecks in web
transfers can appear in different parts of the Web ecosystem and in the protocols and
devices involved in keeping it functional. There are many remaining challenges, and new
challenges are guaranteed to continue to emerge as the Web evolves.
Bibliography
[1] A Closer Look To HTTP/2 Push. https://www.shimmercat.com/en/blog/articles/
whats-push/.
[2] A tale of four caches. https://blog.yoav.ws/tale-of-four-caches/.
[3] Accelerated Mobile Pages. https://www.ampproject.org/.
[4] Amazon Silk browser. http://amazonsilk.wordpress.com/.
[5] Apache HTTP/2 stream priorities patch. https://icing.github.io/mod_h2/nimble.html.
[6] Brave browser. https://brave.com/.
[7] Cache Digests for HTTP/2. https://tools.ietf.org/html/
draft-ietf-httpbis-cache-digest-00.
[8] Cache Digests for HTTP/2. https://tools.ietf.org/html/
draft-ietf-httpbis-cache-digest-04.
[9] Chrome Lazy Loading. https://www.bleepingcomputer.com/news/google/
google-chrome-to-feature-built-in-image-lazy-loading/.
[10] Chrome Offline Previews. https://www.chromestatus.com/feature/
5076871637106688.
[11] Chrome Predictor. https://blog.ouseful.info/2013/11/21/more-digital-traces/.
[12] Early Hints RFC. https://tools.ietf.org/html/draft-ietf-httpbis-early-hints-04.
[13] Gurobi optimization. http://www.gurobi.com/.
[14] HPACK - Header Compression for HTTP/2. http://http2.github.io/http2-spec/
compression.html.
[15] HTTP Pipelining – Not So Fast… (Nor Slow!). http://www.guypo.com/
http-pipelining-not-so-fast-nor-slow/.
[16] HTTP/2 push is tougher than I thought. https://jakearchibald.com/2017/
h2-push-tougher-than-i-thought/.
[17] HTTP/2 Push: The details. https://calendar.perfplanet.com/2016/
http2-push-the-details/.
[18] http://dev.chromium.org/spdy.
[19] Latency Is Everywhere And It Costs You Sales. http://highscalability.com/
latency-everywhere-and-it-costs-you-sales-how-crush-it.
[20] Latency: The New Web Performance Bottleneck. https://www.igvita.com/2012/07/
19/latency-the-new-web-performance-bottleneck/.
[21] Navigation Timing. https://www.w3.org/TR/navigation-timing.
[22] Network Quality Estimator. https://blog.chromium.org/2017/09/
chrome-62-beta-network-quality.html.
[23] Opera Mini browser. http://www.opera.com/mobile/.
[24] Page Load Time. https://www.maxcdn.com/one/visual-glossary/page-load-time/.
[25] PageSpeed Insights. https://developers.google.com/speed/docs/insights/.
[26] Preload. https://www.w3.org/TR/preload/.
[27] Push API. https://www.w3.org/TR/push-api/.
[28] QUIC, a multiplexed transport over UDP. https://www.chromium.org/quic.
[29] Resource Hints. https://www.w3.org/TR/resource-hints/.
[30] Resource Timing. https://www.w3.org/TR/resource-timing.
[31] Resource Timing Specification. http://www.w3.org/TR/resource-timing/.
[32] Rules of Thumb for HTTP/2 Push. https://docs.google.com/document/d/
1K0NykTXBbbbTlv60t5MyJvXjqKGsCVNYHyLEXIxYMv0/.
[33] SPDY: An experimental protocol for a faster web.
https://www.chromium.org/spdy/spdy-whitepaper.
[34] SPDY Performance on Mobile Networks. https://developers.google.com/speed/
articles/spdy-for-mobile.
[35] Speed Index. https://sites.google.com/a/webpagetest.org/docs/using-webpagetest/
metrics/speed-index.
[36] Stop Worrying about Time To First Byte. https://blog.cloudflare.com/
ttfb-time-to-first-byte-considered-meaningles/.
[37] Taking back control over third-party content. https://conferences.oreilly.com/
velocity/vl-ca-2016/public/schedule/detail/49909.
[38] The Impact Of Page Loading Times On Site Revenue. http://www.surgedigital.co.
uk/blog/the-impact-of-page-loading-times-on-site-revenue/.
[39] Time To First Byte. https://www.maxcdn.com/one/visual-glossary/
page-load-time/.
[40] Time To First Contentful Description. https://gtmetrix.com/blog/
first-contentful-paint-explained/.
[41] Time To First Contentful Paint Spec. https://docs.google.com/document/d/
1kKGZO3qlBBVOSZTf-T8BOMETzk3bY15SC-jsMJWv4IE/.
[42] Time To First Meaningful Paint Description. https://developers.google.com/web/
tools/lighthouse/audits/first-meaningful-paint.
[43] Time To First Meaningful Paint Spec. https://goo.gl/Todxh8.
[44] Time To Interactive. https://docs.google.com/document/d/
11sWqwdfd3u1TwyZhsc-fB2NcqMZ 59Kz4XKiivp1cIg/.
[45] W3C Network Information API. https://wicg.github.io/netinfo/.
[46] WebPageTest. http://www.webpagetest.org/.
[47] Victor Agababov, Michael Buettner, Victor Chudnovsky, Mark Cogan, Ben Greenstein,
Shane McDaniel, Michael Piatek, Colin Scott, Matt Welsh, and Bolian Yin.
Flywheel: Google’s data compression proxy for the mobile web. In Networked
Systems Design and Implementation, NSDI, 2015.
[48] Nina Bhatti, Anna Bouch, and Allan Kuchinsky. Integrating user-perceived quality
into web server design. Computer Networks, 2000.
[49] Enrico Bocchi, Luca De Cicco, Marco Mellia, and Dario Rossi. The web, the users,
and the mos: Influence of http/2 on user experience. In International Conference
on Passive and Active Network Measurement, 2017.
[50] Anna Bouch, Allan Kuchinsky, and Nina T. Bhatti. Quality is in the eye of the
beholder: meeting users’ requirements for internet quality of service. In Conference
on Human factors in computing systems, CHI, 2000.
[51] Michael Butkiewicz, Daimeng Wang, Zhe Wu, Harsha V. Madhyastha, and Vyas
Sekar. Klotski: Reprioritizing web content to improve user experience on mobile
devices. In Networked Systems Design and Implementation, NSDI, 2015.
[52] Matt Calder, Xun Fan, Zi Hu, Ethan Katz-Bassett, John Heidemann, and Ramesh
Govindan. Mapping the expansion of google’s serving infrastructure. In Proceedings
of the 2013 conference on Internet measurement conference, 2013.
[53] Calin Cascaval, Seth Fowler, Pablo Montesinos-Ortego, Wayne Piekarski, Mehrdad
Reshadi, Behnam Robatmili, Michael Weber, and Vrajesh Bhavsar. Zoomm: a
parallel web browser engine for multicore mobile devices. In ACM SIGPLAN Notices,
2013.
[54] Xin Chen and Xiaodong Zhang. A popularity-based prediction model for web
prefetching. Computer, 2003.
[55] Wael Cherif, Youenn Fablet, Eric Nassor, Jonathan Taquet, and Yuki Fujimori.
Dash fast start using http/2. In Workshop on Network and Operating Systems
Support for Digital Audio and Video, NOSSDAV, 2015.
[56] Yi-Ching Chiu, Brandon Schlinker, Abhishek Balaji Radhakrishnan, Ethan Katz-
Bassett, and Ramesh Govindan. Are We One Hop Away from a Better Internet? In
Internet Measurement Conference, IMC, 2015.
[57] M. Crovella and P. Barford. The network effects of prefetching. In INFOCOM, 1998.
[58] Wei Dong, Zihui Ge, and Seungjoon Lee. 3G Meets the Internet: Understanding
the Performance of Hierarchical Routing in 3G Networks. In ITC, 2011.
[59] Nandita Dukkipati, Matt Mathis, Yuchung Cheng, and Monia Ghobadi. Proportional
rate reduction for tcp. In Internet Measurement Conference, IMC, 2011.
[60] Yehia El-khatib, Gareth Tyson, and Michael Welzl. Can SPDY really make the web
faster? In IFIP Networking Conference, 2014.
[61] Jeffrey Erman, Vijay Gopalakrishnan, Rittwik Jana, and K. K. Ramakrishnan.
Towards a spdy’ier mobile web? In International Conference on emerging Networking
EXperiments and Technologies, CoNEXT, 2013.
[62] Tobias Flach, Nandita Dukkipati, Andreas Terzis, Barath Raghavan, Neal Cardwell,
Yuchung Cheng, Ankur Jain, Shuai Hao, Ethan Katz-Bassett, and Ramesh Govindan.
Reducing web latency: the virtue of gentle aggression. In Special Interest Group on
Data Communication, SIGCOMM, 2013.
[63] Utkarsh Goel, Moritz Steiner, Mike P Wittie, Martin Flack, and Stephen Ludin.
Measuring what is not ours: A tale of 3rd party performance. In Passive and Active
Measurement, PAM, 2017.
[64] Sangtae Ha, Injong Rhee, and Lisong Xu. CUBIC: a new tcp-friendly high-speed
TCP variant. Operating Systems Review, 2008.
[65] Conor Kelton, Jihoon Ryoo, Aruna Balasubramanian, and Samir R. Das. Improving
user perceived page load times using gaze. In Networked Systems Design and
Implementation, NSDI, 2017.
[66] Rupa Krishnan, Harsha V. Madhyastha, Sushant Jain, Sridhar Srinivasan, Arvind
Krishnamurthy, Thomas Anderson, and Jie Gao. Moving beyond end-to-end path
information to optimize cdn performance. 2009.
[67] Thomas M. Kroeger, Darrell D. E. Long, and Jeffrey C. Mogul. Exploring the bounds
of web latency reduction from caching and prefetching. In USENIX Symposium on
Internet Technologies and Systems, 1997.
[68] Zhichun Li, Ming Zhang, Zhaosheng Zhu, Yan Chen, Albert G. Greenberg, and
Yi-Min Wang. Webprophet: Automating performance prediction for web services.
In Networked Systems Design and Implementation, NSDI, 2010.
[69] Chao Liu, Ryen W. White, and Susan Dumais. Understanding web browsing
behaviors through weibull analysis of dwell time. In Conference on Research and
Development in Information Retrieval, SIGIR, 2010.
[70] Haohui Mai, Shuo Tang, Samuel T King, Calin Cascaval, and Pablo Montesinos. A
case for parallelizing web pages. In Workshop on Hot Topics in Parallelism, HotPar,
2012.
[71] Patrick Meenan. How fast is your web site? Communications of the ACM, 2013.
[72] Leo A Meyerovich and Rastislav Bodik. Fast and parallel webpage layout. In
International Conference on World Wide Web, WWW, 2013.
[73] Alexandros Nanopoulos, Dimitrios Katsaros, and Yannis Manolopoulos. A data
mining algorithm for generalized web prefetching. IEEE Trans. on Knowl. and Data
Eng., 2003.
[74] Usama Naseer and Theophilus Benson. Configtron: Tackling network diversity with
heterogeneous configurations. In 9th USENIX Workshop on Hot Topics in Cloud
Computing (HotCloud 17), 2017.
[75] Ravi Netravali and James Mickens. Prophecy: Accelerating mobile page loads using
final-state write logs. In Networked Systems Design and Implementation, NSDI,
2018.
[76] Ravi Netravali, James Mickens, and Hari Balakrishnan. Polaris: Faster page
loads using fine-grained dependency tracking. In Networked Systems Design and
Implementation, NSDI, 2016.
[77] Ravi Netravali, Vikram Nathan, James Mickens, and Hari Balakrishnan. Vesper:
Measuring time-to-interactivity for modern web pages. In Networked Systems Design
and Implementation, NSDI, 2018.
[78] Jitu Padhye and Henrik Frystyk Nielsen. A comparison of SPDY and HTTP
performance. Technical Report MSR-TR-2012-102, 2012.
[79] Venkata N. Padmanabhan and Jeffrey C. Mogul. Using predictive prefetching to
improve world wide web latency. SIGCOMM CCR, 1996.
[80] Sivasankar Radhakrishnan, Yuchung Cheng, Jerry Chu, Arvind Jain, and Barath
Raghavan. Tcp fast open. In International Conference on emerging Networking
EXperiments and Technologies, CoNEXT, 2011.
[81] Sanae Rosen. Improving Mobile Network Performance Through Measurement-driven
System Design Approaches. PhD thesis, University of Michigan, 2016.
[82] Sanae Rosen, Bo Han, Shuai Hao, Z. Morley Mao, and Feng Qian. Push or request:
An investigation of http/2 server push for improving mobile performance. In
International Conference on World Wide Web, WWW, 2017.
[83] Vaspol Ruamviboonsuk, Ravi Netravali, Muhammed Uluyol, and Harsha V. Madhyastha.
Vroom: Accelerating the mobile web with server-aided dependency resolution.
In Special Interest Group on Data Communication, SIGCOMM, 2017.
[84] John P. Rula and Fabian E. Bustamante. Behind the Curtain - Cellular DNS and
Content Replica Selection. In Internet Measurement Conference, IMC, 2014.
[85] Stefan Savage, Andy Collins, Eric Hoffman, John Snell, and Thomas Anderson. The
end-to-end effects of internet path selection. 1999.
[86] Paul Schmitt, Morgan Vigil, and Elizabeth Belding. A Study of MVNO Data Paths
and Performance. In Passive and Active Measurement, PAM, 2016.
[87] Joel Sommers and Paul Barford. Cell vs. WiFi: on the performance of metro area
mobile connections. In Internet Measurement Conference, IMC, 2012.
[88] Neil Spring, Ratul Mahajan, and Thomas Anderson. The causes of path inflation.
In Proceedings of the 2003 Conference on Applications, Technologies, Architectures,
and Protocols for Computer Communications, 2003.
[89] Hongsuda Tangmunarunkit, Ramesh Govindan, Scott Shenker, and Deborah Estrin.
The impact of routing policy on internet paths. In Proceedings IEEE INFOCOM
2001, The Conference on Computer Communications, Twentieth Annual Joint
Conference of the IEEE Computer and Communications Societies, Twenty years
into the communications odyssey, Anchorage, Alaska, USA, April 22-26, 2001, 2001.
[90] Matteo Varvello, Jeremy Blackburn, David Naylor, and Konstantina Papagiannaki.
Eyeorg: A platform for crowdsourcing web quality of experience measurements.
In Proceedings of the 12th International on Conference on Emerging Networking
EXperiments and Technologies, 2016.
[91] Matteo Varvello, Kyle Schomp, David Naylor, Jeremy Blackburn, Alessandro Finamore,
and Konstantina Papagiannaki. Is the web HTTP/2 yet? In Passive and
Active Measurement, PAM, 2016.
[92] Xiao Sophia Wang, Aruna Balasubramanian, Arvind Krishnamurthy, and David
Wetherall. Demystifying page load performance with wprof. In Networked Systems
Design and Implementation, NSDI, 2013.
[93] Xiao Sophia Wang, Aruna Balasubramanian, Arvind Krishnamurthy, and David
Wetherall. How speedy is spdy? In Networked Systems Design and Implementation,
NSDI, 2013.
[94] Xiao Sophia Wang, Arvind Krishnamurthy, and David Wetherall. Speeding up web
page loads with shandian. In Networked Systems Design and Implementation, NSDI,
2016.
[95] Qiang Xu, Junxian Huang, Zhaoguang Wang, Feng Qian, Alexandre Gerber, and
Zhuoqing Morley Mao. Cellular data network infrastructure characterization and
implication on mobile content placement. In SIGMETRICS, 2011.
[96] Xing Xu, Yurong Jiang, Tobias Flach, Ethan Katz-Bassett, David Choffnes, and
Ramesh Govindan. Investigating Transparent Web Proxies in Cellular Networks. In
Passive and Active Measurement, PAM, 2015.
[97] Qiang Yang, Haining Henry Zhang, and Tianyi Li. Mining web logs for prediction
models in www caching and prefetching. In SIGKDD International Conference on
Knowledge Discovery and Data Mining, 2001.
[98] Kyriakos Zarifis, Mark Holland, Manish Jain, Ethan Katz-Bassett, and Ramesh
Govindan. Modeling HTTP/2 speed from HTTP/1 traces. In Passive and Active
Measurement, PAM, 2016.
[99] Fatima Zarinni, Ayon Chakraborty, Vyas Sekar, Samir R. Das, and Phillipa Gill.
A First Look at Performance in Mobile Virtual Network Operators. In Internet
Measurement Conference, IMC, 2014.
[100] Yaping Zhu, Benjamin Helsley, Jennifer Rexford, Aspi Siganporia, and Sridhar
Srinivasan. LatLong: Diagnosing Wide-Area Latency Changes for CDNs. IEEE
TNSM.
Abstract
Web performance has attracted a lot of attention in industry and academia in the last decade, as it has been repeatedly associated with business revenue. The Web is a vast and diverse ecosystem, and there are multiple layers in which delays can occur. In this dissertation we focus on delays in the routing layer caused by circuitous paths between clients and servers, and delays in the application layer caused by inefficient use of the communication channel between them.

The first step of a Web transfer involves establishing a connection over a physical path between the client and the server. Previous research has shown that the shortest path is not always selected, due to policy-based routing decisions. In this work we re-evaluate path inflation, focusing specifically on mobile traffic directed to Google servers, in order to understand the evolution of the infrastructure of mobile carrier networks and how it can affect user experience.

Once a connection has been established, information is exchanged between the two hosts according to HTTP, the application-layer protocol used in the vast majority of today's Web transfers. HTTP recently saw a major redesign and the addition of new features, all aimed at faster transfers. In this work we develop a model of the new version of the protocol, HTTP/2, and pass a large dataset of HTTP/1 traces through it, in order to understand the performance implications of deploying the new version in the wild. Our study exposes several opportunities for improvement, specifically using a feature of HTTP/2 that allows a server to send an object to the client without the client requesting it.

Generalizing from that observation, we design, develop, and evaluate a system that allows CDNs to utilize idle network time around page downloads to send the client content that it is expected to request in the current or next page navigation. We show that, if implemented correctly, this kind of speculative content prepositioning on the client can achieve a performance improvement comparable to having the page already loaded in the client cache.