USC Computer Science Technical Reports, no. 944 (2014)
Investigating Transparent Web Proxies in Cellular Networks
Xing Xu*, Yurong Jiang*, Tobias Flach*, Ethan Katz-Bassett*, David Choffnes†, and Ramesh Govindan*
* University of Southern California, † Northeastern University
ABSTRACT
Users increasingly use mobile devices as their primary mean-
s to access the Internet. While it is well known that cel-
lular network operators employ middleboxes, the details of
their behavior and their impact on performance for repre-
sentative Web workloads are poorly understood. This paper
presents an analysis of proxy behavior and how transparen-
t Web proxies interact with HTTP traffic in four major US
cell carriers. We find that all four carriers use these prox-
ies to interpose on HTTP traffic, but they vary in terms of
whether they perform object caching, traffic redirection, im-
age compression, and connection reuse. For example, some
transparent proxies unilaterally lower the quality of images,
which improves object fetch time but may hurt user satis-
faction. We also find that these proxies do not necessarily
enhance performance for mobile Web workloads in terms of
object fetch times; namely, we observe noticeable benefit-
s only when flow sizes are large and the path between the
server and proxy exhibits large latency and/or loss.
1. INTRODUCTION
Internet service providers commonly deploy middleboxes
inside their networks for reasons that include security, traf-
fic management, and performance optimization [23]. In the
mobile environment where resources such as spectrum are
scarce, operators have significant incentives to interpose on
Internet traffic. Unfortunately, operators are rarely trans-
parent about middlebox policies, and their impact on rep-
resentative mobile workloads is poorly understood. Previ-
ous work identified that middleboxes exist in cellular net-
works and characterized several middlebox behaviors [8,11,
13, 21, 24, 26]. For example, these studies show that car-
riers proxy traffic to servers by transparently splitting client
TCP connections into two connections: the proxy terminates
the client’s TCP connection by spoofing as the server, and
the proxy establishes a separate connection to the server by
spoofing as the client. With split connections, the proxy can
configure each segment individually and respond to latency
and loss independently, potentially improving performance.
It is widely believed that splitting TCP connections should
improve – or at least not worsen – performance for devices
in cellular networks, where latencies and loss can be much
larger than in fixed-line paths [9, 11, 13, 21, 24]. Howev-
er, previous studies do not characterize performance impact
with respect to modern cellular networks and workloads.
In this paper, we are the first to conduct a detailed study
of transparent proxies in four major US cellular providers
and determine their impact on performance. Our measure-
ments indicate that all four carriers use transparent proxies
for Web traffic (TCP port 80), which represents a large por-
tion of today’s Internet flows. Thus, we focus in particular
on transparent Web proxies.
We designed controlled experiments to investigate fea-
tures of transparent Web proxy implementations, including
caching, content modification, traffic redirection to preferred
servers, and connection persistence. Specifically, we tightly
control and monitor the traffic generated by devices, DNS
servers, and Web servers to characterize proxy behavior and
its impact under varying network conditions and workload-
s, including representative workloads using a mobile Web
browser. We also develop techniques that allow us to infer
proxy behavior for communication with servers that we do
not control and evaluate their impact on popular Web sites.
Key Results. First, each carrier implements proxying poli-
cies differently, and they can lead to a different user experi-
ence in terms of the speed and quality of downloaded con-
tent. For example, image compression can reduce download
time by a factor of five, but caching content has little im-
pact on performance in our experiments. Second, we ob-
serve that split connections improve performance for larg-
er flows (up to 45%), but have negligible impact on small
ones (≤100KB). We show that proxied connections can pro-
vide benefits in lossy and high-latency environments, par-
ticularly where the cellular segment is not the dominating
factor determining end-to-end performance. We use a mo-
bile Web browser to download replicated Web content from
servers we control while approximating the same commu-
nication patterns. Under normal network conditions, these
proxies do not measurably improve performance, but page
load times are 30% faster when we induce loss on the wired
segment. Last, we verify that proxying occurs for all of the
most popular 100 Web site front pages, but discover that YouTube
video servers bypass T-Mobile’s proxy, possibly due
to special arrangements between the providers. Our results
indicate proxies may not necessarily improve performance
for mobile users, motivating the need for larger-scale and
more in-depth analysis of the performance benefits across
networks, devices, locations and workloads.
2. BACKGROUND AND RELATED WORK
Few studies systematically reveal middlebox policies in
mobile networks and assess their impact. Early work in
this area has focused on understanding, modeling and im-
proving split-TCP designs for proxies in wireless and cel-
lular networks. An early survey [8] qualitatively character-
izes the behavior and role of performance-enhancing prox-
ies for wireless networks in general. Ehsan et al. [11] study
the benefits of proxies for satellite networks and describe
the benefits of split-TCP connections. Necker et al. [24]
explore, through simulation, the impact of proxies on bulk
downloads and Web traffic on UMTS networks. Ivanovich
et al. [21] discuss advanced ACKing strategies to buffer data
at the proxy for increased wireless link utilization. Finally,
Gomez et al. [17] show that PEPs can improve Web brows-
ing performance, Rodriguez et al. [25] discuss the architec-
ture of a PEP (together with associated TCP optimization-
s) for a GPRS network, and Baccelli et al. [7] model the
performance of split-TCP to understand its asymptotic be-
havior. In contrast, our work characterizes the behavior and
performance impact of deployed proxies on modern cellular
networks, across four major US carriers.
More recently, several pieces of work have explored oth-
er aspects of proxy behavior in modern cellular networks.
Botta et al. [9] explore how middleboxes can impact mea-
surements, and propose a careful methodology for cellular
measurements, some of which we adopt and extend. Farkas
et al. [13] use numerical simulations to quantify the perfor-
mance improvement of proxies in LTE networks, while our
work directly measures this improvement. Ehsan et al. [12]
study tradeoffs of caching through real user traces. Clos-
est to our work are three measurement studies that have at-
tempted to reveal complementary aspects of proxy behav-
ior. Wang et al. [26] show how cellular middlebox settings
can impact mobile device energy usage and how middlebox-
es can be used to attack or deny service to mobile devices.
Weaver et al. [27] study the prevalence of HTTP proxying
using a large dataset of clients and taxonomize the types
of HTTP proxying seen in the wild, ranging from transcod-
ing proxies to censoring and anti-virus proxies. Unlike our
work, theirs does not attempt to enumerate the detailed TCP-level
behavior of cellular proxies for various network conditions
and Web workloads. Finally, Jiang et al. [22] analyze,
through measurements, bufferbloat in cellular networks,
and propose a dynamic window adjustment algorithm to
alleviate this. Our work explores proxy behavior, which in-
cludes buffers among many other features that impact per-
formance.
3. EXPERIMENTAL TESTBED
Table 1: Proxy implementations observed in our study (X = observed, - = not observed).
Feature                  AT&T   T-Mobile   Verizon   Sprint
Caching                  -      X          -         X
Redirection              -      X          -         -
Content Rewriting        -      -          -         X
Connection Persistence   X      X          -         X
Delayed Handshaking      X      X          X         X
Our testbed design is motivated by three goals. First, we
want to conduct controlled experiments to determine how a
proxy responds to different Web flow characteristics. Sec-
ond, for transparently proxied connections, we want to use
microbenchmarks to identify under which circumstances the
proxy behavior helps or hurts performance in terms of down-
load time. Last, we want to understand how proxy behavior
impacts the performance under realistic workloads. For this
work we focus on the delivery time for Web sites requiring
multiple resources from different servers.
With these goals in mind, we set up the following testbed.
We use multiple rooted mobile devices (HTC One phones
with Android 4.3) and different cellular carrier data plan-
s to explore proxy behavior for each of the four major US
carriers. Moreover, we control a Web server and a DNS
subdomain that resolves to it, allowing us to monitor both
endpoints of a connection when we access a URL via one of
our mobile devices. Finally, we run tcpdump on the device
and on the server to capture detailed network information,
including TCP/IP headers and timestamps (after synchroniz-
ing endpoints using NTP).
With full control over the server and client devices we can
explore proxy properties through different experiment con-
figurations, varying parameters like content to fetch, socket
properties (e.g. server IP/port), HTTP configuration (includ-
ing modified headers), and even adjust network conditions.
We perform all measurements from mobile devices locat-
ed in Southern California. For each given configuration we
mention in this paper, we conduct at least 250 trials. When
comparing performance results between two configurations,
each trial is composed of one run per configuration to mini-
mize the probability of signal strength and congestion varia-
tion impacting our results. In addition, we filter results with
poor signal strength.
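The paired-trial design described above can be sketched in Python as follows. This is only an illustration: the function names, the injected `fetch` callable, and the per-round shuffling are assumptions and not part of the measurement tooling used in this study. The default of 250 rounds mirrors the minimum number of trials stated above.

```python
import random
import statistics
from typing import Callable, Dict, List, Sequence


def run_paired_trials(
    configs: Sequence[str],
    fetch: Callable[[str], float],
    trials: int = 250,
    seed: int = 0,
) -> Dict[str, List[float]]:
    """Run `trials` rounds; every round measures each configuration once.

    Measuring all configurations back to back inside one round keeps
    signal strength and congestion roughly comparable between them.
    Shuffling the order inside a round is an assumption of this sketch.
    """
    rng = random.Random(seed)
    results: Dict[str, List[float]] = {name: [] for name in configs}
    for _ in range(trials):
        order = list(configs)
        rng.shuffle(order)
        for name in order:
            # `fetch` returns the measured fetch time in seconds.
            results[name].append(fetch(name))
    return results


def median_fetch_times(results: Dict[str, List[float]]) -> Dict[str, float]:
    """Summarize each configuration by its median fetch time."""
    return {name: statistics.median(times) for name, times in results.items()}
```

In a real harness, `fetch` would download the object under the named configuration and return the elapsed time; runs with poor signal strength would be filtered before summarizing.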
4. PROXY FEATURES
We characterize proxy implementations for four major US
carriers and their potential impact on client-perceived per-
formance. We identify five proxy features: caching, redirec-
tion, content rewriting, connection persistence, and delayed
server-side handshakes. We observed different feature sets
for each carrier (Table 1).
To observe proxy features, we conduct experiments be-
tween the mobile phones and our server. Since we control
both endpoints, we can correlate and examine client- and
server-side packet traces, and extract features that indicate
Web proxy interference. We first establish the presence of
a Web proxy by inspecting various connection properties,
including the TCP window scaling parameter, receiver window,
sequence and acknowledgment numbers. In all four carriers
we studied, at least one of these properties was inconsistent
between the client and server, suggesting interference
by a proxy. In addition, we observe that a client receives the
initial TCP handshake response before the server receives a
SYN packet, and conversely a server receives acknowledgments
for transmitted data packets before the client sees the
same data. Thus, we conclude that these proxies split connections
between the two original endpoints.
Figure 1: Fetch times for cached and uncached objects. [Plot: fetching time (s, log scale) vs. file size (10K, 100K, 500K) for Non-Cache, Cache, Non-Cache (Delay), and Cache (Delay).]
Figure 2: Impact of Sprint’s image compression (original vs. compressed file size). [Plot: compressed size (KB) vs. original size (KB), with the y = x line and a zoomed inset around 500 KB.]
Figure 3: Fetch times for compressed (left), and original images (right), on Sprint. [Plot: fetching time (s) vs. file size (KB) for Compress and Non-Compress.]
Figure 4: Server-side handshake latency for split (top) and non-split connections, on T-Mobile. [Plot: one-way latency (s) vs. extra delay (s) for Split and Non-Split.]
We observe that proxies only intercept traffic on a small
number of ports (including port 80). Thus, we can compare
data for proxied and unproxied traffic by varying the server
port number. In our experiments we use port 80 to elicit
proxy behavior, and port 7777 to bypass the proxy.
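The two proxy indicators discussed in this section, inconsistent handshake fields and a handshake response that arrives at the client before the server has seen the SYN, can be expressed as simple checks over the two packet traces. The sketch below is illustrative: the record layout and function names are hypothetical, and the timing check assumes the endpoint clocks are synchronized (for example via NTP).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HandshakeView:
    """Handshake fields as seen by one endpoint (hypothetical layout)."""
    window_scale: int
    receiver_window: int
    initial_seq: int


def fields_inconsistent(sent_by_peer: HandshakeView, seen_locally: HandshakeView) -> bool:
    """True if any field differs between what one endpoint sent and the other received."""
    return sent_by_peer != seen_locally


def handshake_is_split(client_synack_rx: float, server_syn_rx: float) -> bool:
    """True if the client got a SYN-ACK before the server received the SYN.

    Both arguments are timestamps in seconds on a common clock. Without an
    intermediary answering on the server's behalf this ordering cannot occur.
    """
    return client_synack_rx < server_syn_rx
```

Either check alone suggests interference; the timing check is what indicates that the connection is actually split into two segments.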
To characterize proxy behavior we parameterize our ex-
periments along multiple dimensions. We vary the serv-
er port, to control proxy interference. We analyze traffic
observed when accessing different destinations, using both
static IPs and DNS names resolvable by our controlled DNS
server. We also experiment with multiple content types, flow
sizes, packet delay and loss through traffic shaping, and in-
vestigate the effect of different HTTP header configurations.
4.1 Caching
Behavior. We conclude that a carrier caches content if an
HTTP request sent by the client does not reach the server, yet
the client receives a response. We use unique resources host-
ed only on our server to rule out other explanations. We ob-
serve content caching for T-Mobile and Sprint. They cache
most Web objects (e.g., CSS, JavaScript, JPEG, PNG, GIF,
and TXT) but they do not cache HTML files. Both carriers
cache at per-device and per-session granularity. That is, the
cache is not shared between users and gets purged whenever
the device releases its IP address. In addition, cache entries
expire after a timeout period (≈5 minutes for Sprint, ≈30
minutes for T-Mobile).
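The caching criterion above reduces to a comparison of the client and server traces for one request. The following Python sketch is an illustration under that criterion; the helper names and the use of a random path to keep each resource unique are assumptions, not a description of the actual experiment scripts.

```python
import uuid


def unique_resource_path(extension: str = "png") -> str:
    """Return a path that no other server or earlier request can have used."""
    return f"/probe-{uuid.uuid4().hex}.{extension}"


def classify_fetch(client_got_response: bool, server_saw_request: bool) -> str:
    """Classify one fetch from the two traces.

    A response at the client without a matching request at the server
    means some intermediary answered from a cache.
    """
    if client_got_response and not server_saw_request:
        return "cached"
    if client_got_response and server_saw_request:
        return "fetched-from-origin"
    return "failed"
```

Fetching the same unique path twice and classifying the second fetch shows whether the carrier cached the object; repeating the second fetch after increasing waiting times would bracket the cache timeout.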
Impact. Since the content is closer to the client, objec-
t fetch time can decrease significantly. In Fig. 1, we show
the measured fetch time for cached and non-cached objects,
and the impact of network latency. From top to bottom, the
boxes describe 90th, 75th, 50th, 25th and 10th percentiles
(same for subsequent figures). If the cellular link dominates
end-to-end latency we observe no noticeable performance
gain when accessing cached resources. However, in envi-
ronments with larger wired latencies (we demonstrate this by
introducing delays for outgoing packets on the server side),
we see fetch time improvements for small files (10KB). For
larger files (500KB), TCP throughput is bottlenecked by the
carrier capacity, again preventing caching benefits. In ad-
dition to faster serving time, caching can reduce a carrier’s
inbound traffic, especially if the carrier segments are lossy.
4.2 Redirection
Behavior. With this feature, a proxy redirects traffic based
on an independent DNS resolution of the Host field in the
header of an HTTP request, ignoring the target IP provided
in the packet. To observe this, we send an HTTP request to
our Web server IP but provide a third-party domain name in
the HTTP Host field, which triggers an error if handled by
our server. If the proxy uses redirection, we do not observe
the request in our server traces, yet the referenced website
renders at the client side. Only T-Mobile exhibits this behavior
and we confirm it for more than 100 popular domains.
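A redirection probe of this kind amounts to an HTTP request whose destination IP and Host header disagree. The Python sketch below shows one way to build and send such a request; the function names are hypothetical, and the address in the usage comment is a documentation-range placeholder rather than the server used in this study.

```python
import socket


def build_probe_request(host_header: str, path: str = "/") -> bytes:
    """Build a minimal HTTP/1.1 GET whose Host field names a third-party domain."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host_header}",
        "Connection: close",
        "",
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def send_probe(server_ip: str, host_header: str, port: int = 80, timeout: float = 10.0) -> bytes:
    """Send the probe to `server_ip` and return the raw response bytes."""
    with socket.create_connection((server_ip, port), timeout=timeout) as sock:
        sock.sendall(build_probe_request(host_header))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)


def proxy_redirects(server_saw_request: bool, client_got_third_party_page: bool) -> bool:
    """Redirection: the third-party page arrives although our server saw nothing."""
    return client_got_third_party_page and not server_saw_request


# Example (not executed here): send_probe("203.0.113.10", "www.example.com")
```

The decision itself still requires the server-side trace, since only the absence of the request at the controlled server distinguishes redirection from an ordinary response.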
Impact. We cannot be certain, but this feature could be for
traffic engineering considerations, e.g., the carrier can con-
trol the destination for HTTP traffic at the proxy instead of
relying on devices. It is worth noting that any server IP
mapping based on client-selected DNS servers is silently and
transparently overridden by this feature.
4.3 Object Rewriting
Behavior. In this case a proxy modifies file contents, for
example to improve performance through mechanisms like
whitespace trimming, or image transcoding to reduce the
load on the cellular segment. For a variety of Web file types
and content patterns, we compared the payloads transmitted
by the server with the contents received by the mobile de-
vices to detect this feature. We only observe compression
of image files, and only with Sprint up to an original file size
of 500KB (see Fig. 2).
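Comparing the payload sent by the server with the payload received by the device can be done with a digest and a size ratio. The following Python sketch is illustrative only; the function names and the returned dictionary layout are assumptions.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Hex digest used to compare payloads without storing both in full."""
    return hashlib.sha256(data).hexdigest()


def rewrite_report(sent: bytes, received: bytes) -> dict:
    """Report whether the payload changed in transit and by how much its size changed."""
    modified = sha256_hex(sent) != sha256_hex(received)
    size_ratio = len(received) / len(sent) if sent else float("nan")
    return {"modified": modified, "size_ratio": size_ratio}
```

A `size_ratio` well below 1 for image files, together with `modified` being true, is the pattern one would expect from transcoding; an unmodified object yields a ratio of exactly 1.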
Impact. Compressed files can be fetched faster, as shown
in Fig. 3. However, this comes with the potential drawback of sacrificing
quality for faster load times. Aggressive compression
can distort images in ways that are unacceptable to the con-
tent provider or user [6]. Further, for transcoding the whole
image has to be fetched first. With larger images this can
result in long delays.
4.4 Connection Persistence
Behavior. Proxies can persist connections to both endpoints.
For the server-side segment, some proxies remove a client’s
connection: close directive in the HTTP header (used
to inform the server to close the connection upon query response),
or add a connection: keep-alive entry. To persist
the client-proxy connection, some proxies drop the server’s
TCP FIN packet. We find that AT&T and Sprint proxies
keep the connection to the HTTP server alive after each request
completes. The keepalive time is ∼10s for AT&T, and
∼30s for Sprint. AT&T, Sprint and T-Mobile drop the TCP
FIN from the server to persist the client-proxy connection.
Impact. The advantages of this strategy are that persistent
connections avoid the delays that new per-object connection-
s would incur from TCP handshakes and slow start. Reusing
a connection can also minimize overhead on NAT table map-
pings at the edge of the carrier network.
4.5 Delayed Handshaking
Observation. Finally, we confirm that proxies in each car-
rier delay the initial handshake between themselves and a
server until receiving the HTTP request. Fig. 4 illustrates
this behavior. We artificially delay the query, which proportionally
increases the server-side reception delay for the
handshake packet.
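One way to quantify the proportional relationship described above is to fit a line to the pairs of injected query delay and observed server-side handshake latency: a slope near 1 indicates that the server-side handshake waits for the HTTP request, while a slope near 0 indicates that it does not. The Python sketch below uses an ordinary least-squares slope; the function names and the tolerance are assumptions.

```python
from typing import Sequence


def least_squares_slope(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Slope of the ordinary least-squares line through the points (x, y)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def handshake_is_deferred(
    extra_delays: Sequence[float],
    syn_latencies: Sequence[float],
    tolerance: float = 0.2,
) -> bool:
    """True if the handshake latency grows one-for-one with the query delay."""
    return abs(least_squares_slope(extra_delays, syn_latencies) - 1.0) <= tolerance
```

The function expects at least two distinct delay values; with a single delay value the slope is undefined.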
Impact. Deferred handshakes can delay end-to-end commu-
nication, in particular if a client opens a connection early to
avoid the establishment overhead when a query is ready to
transmit.
5. SPLIT CONNECTION PERFORMANCE
Intuitively, split TCP connections should offer better client-
perceived performance (i.e., faster downloads) than direct
connections if the proxy is on the same path. For one, s-
plitting the connection reduces the RTTs between connected
endpoints, which allows TCP to grow its congestion window
faster. Likewise, it isolates the throughput impact of loss
events that occur independently along each segment, and it
speeds loss detection and recovery [13, 21, 25].
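The effect of a shorter RTT on window growth can be illustrated with an idealized slow-start model in which the congestion window doubles every round trip. The Python sketch below is a simplification under stated assumptions: no loss, no bandwidth or receiver-window limit, a 1460-byte segment size, and an initial window of 10 segments. None of these values are taken from the measurements in this paper.

```python
import math


def slow_start_rounds(size_bytes: int, mss: int = 1460, init_cwnd: int = 10) -> int:
    """Round trips needed to send `size_bytes` if cwnd doubles every RTT."""
    segments = math.ceil(size_bytes / mss)
    rounds, cwnd, sent = 0, init_cwnd, 0
    while sent < segments:
        sent += cwnd      # one window of segments per round trip
        cwnd *= 2         # idealized slow start: window doubles each round
        rounds += 1
    return rounds


def transfer_time(size_bytes: int, rtt_s: float, mss: int = 1460, init_cwnd: int = 10) -> float:
    """Idealized transfer time: number of rounds times the round-trip time."""
    return slow_start_rounds(size_bytes, mss, init_cwnd) * rtt_s
```

Under this model a 10 KB object fits in a single round, so a shorter RTT saves little in absolute terms, whereas a 1 MB object needs several rounds and benefits from every reduction in RTT. This is consistent with the intuition that splitting matters more for large flows.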
In practice, splitting TCP connections offers benefits that
depend on the size of the flow and the relative performance
of the split path segments. For short flows, it is unclear if
split connections always result in better client-perceived per-
formance. Likewise, for cases where the cellular segment is
substantially worse than the wired segment, reducing RTT
and loss has little impact on the fetch time for Web objects.
The following subsections use controlled experiments to
understand the performance impact of split connections for
a Web server we host, for alternative network conditions be-
tween the server and proxy, and for realistic Web browser
workloads. We observe that this performance impact is not
uniform across carriers, network conditions, or Web sites.
While our experiments cannot be used to compare per-
formance across carriers (since we cannot create identical
conditions across them), we can get valuable insights in-
to the conditions under which split-connections do and do
not work well and understand how these insights generalize
across carriers.
Figure 6: Server-side bytes in flight (AT&T). [Two panels, split (top) and non-split (bottom): bytes (KB) vs. time (s), showing bytes in flight and the receiver window (RWIN).]
Baseline Performance. For each carrier, we fetch object-
s with different sizes using split and non-split connections.
Fig. 5(c) and 5(d) show there is no significant performance
difference for Sprint for any of the file sizes. In contrast,
Fig. 5(a) shows that for T-Mobile, proxied downloads of
larger objects finish much earlier (in the 1MB case, 30%
faster in the median). AT&T shows similar performance
compared to T-Mobile. We emphasize that when we make
performance statements about a carrier, say Sprint, that is
shorthand for the performance seen by the mobile device in
our testbed connected to the Sprint network, not a blanket
statement about the carrier’s overall performance.
To understand the reasons for different performance ben-
efits, we analyze the network properties of the cellular and
wired path segments. First, we find that the wired segments
(server to proxy) for all four carriers have similar charac-
teristics in terms of latency and bandwidth. For Sprint and
Verizon, the limited bandwidth of their cellular segments is
the main performance bottleneck, thus limiting the efficacy
of split connections. In contrast, AT&T and T-Mobile offer
more bandwidth, thus transfers benefit from split connec-
tions since the shorter client-facing latencies enable faster
ramp-up of TCP’s congestion window. Interestingly, TCP’s
congestion window ramps up slowly in AT&T and T-Mobile
due to TCP’s Hybrid Start feature used by default in the Lin-
ux CUBIC congestion control mechanism [18, 19]. Figure 6
explains this behavior for a sample run, using a plot of the
bytes-in-flight for both split connection (top) and non-split
connection (bottom), which approximates the server conges-
tion window. We also plot the receiver-advertised window.
In the top figure, we observe that the proxy’s ACKs arrive quickly,
resulting in high throughput. In the bottom figure, we see
that the non-split connection exits slow start at approximate-
ly 0.7s. Due to the relatively high bandwidth-delay product
and associated large numbers of packets in flight, TCP de-
cides on a safe slow start exit point to avoid heavy losses and
underutilizes available bandwidth during the transfer. Since
the connection never reaches the channel capacity, splitting
connections can help to tune features like this for the two
path segments independently.
Figure 5: Fetching times for different file sizes. Split connections result in reduced fetch times for T-Mobile and AT&T. [Panels: (a) T-Mobile, (b) AT&T, (c) Sprint, (d) Verizon; fetching time (s) vs. file size (100K, 500K, 1000K) for Split and Non-Split.]
Figure 7: Fetching times for different file sizes, with varying amounts of delay added to outgoing server-side packets. [Panels: (a) T-Mobile, (b) AT&T, (c) Sprint, (d) Verizon; fetching time (s) vs. file size (10K, 100K, 500K, 1000K) for Split and Non-Split at 50ms and 200ms.]
Figure 8: Fetching times for different file sizes, with varying packet loss rate of outgoing server-side packets. [Panels: (a) T-Mobile, (b) AT&T; fetching time (s) vs. file size (10K, 100K, 500K, 1000K) for Split and Non-Split at 3% and 5% loss.]
Impact of Varying Network Conditions. We repeat the
experiments above, but emulate high latency wired path segments
by having our server introduce between 50 and 200ms
delay on each packet it sends. Fig. 7 plots the impact on
fetch times for various file sizes, comparing proxied and un-
proxied traffic. Split connections improve performance in
AT&T’s and T-Mobile’s case for larger files and delays (e.g.,
>1 MB and 200 ms delay); we do not observe statistically
significant changes with Sprint and Verizon.
Performance improvements are similar for AT&T, Sprint,
and Verizon when introducing correlated loss (i.e., 25% loss
probability when previous packet is lost) on the wired seg-
ment. Interestingly, T-Mobile’s performance for proxied traf-
fic is independent of loss rate in our experiments. This is a
reflection of the proxy maintaining a large-enough buffer to
compensate for the reduced throughput during loss.
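The correlated loss described above can be modeled as a two-state process: a packet is lost with the base probability if the previous packet was delivered, and with probability 0.25 if the previous packet was lost. The Python sketch below simulates this model and computes its long-run loss rate. It is an illustration of the loss pattern only and does not reproduce the traffic-shaping tool or its exact correlation semantics; the function names are hypothetical.

```python
import random
from typing import List


def simulate_losses(
    n_packets: int,
    base_loss: float,
    loss_after_loss: float = 0.25,
    seed: int = 0,
) -> List[bool]:
    """Return one boolean per packet; True means the packet was lost."""
    rng = random.Random(seed)
    lost_prev = False
    outcome = []
    for _ in range(n_packets):
        p = loss_after_loss if lost_prev else base_loss
        lost_prev = rng.random() < p
        outcome.append(lost_prev)
    return outcome


def stationary_loss_rate(base_loss: float, loss_after_loss: float = 0.25) -> float:
    """Long-run loss fraction of the two-state chain.

    Solves pi = pi * loss_after_loss + (1 - pi) * base_loss for pi.
    """
    return base_loss / (1.0 - loss_after_loss + base_loss)
```

With a 3% base rate the long-run loss fraction is 0.03 / 0.78, roughly 3.8%, so correlation raises the effective loss rate and concentrates losses into short bursts.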
Overall, these experiments show that split connections are
most impactful in environments where the cellular segment
is not the dominant factor with respect to end-to-end perfor-
mance. Thus, carriers with better cellular links benefit most.
Web Browsing. We now move from characterizing perfor-
mance for isolated object fetches to realistic workloads gen-
erated by a browser accessing popular Web sites. Since we
cannot bypass the proxy when accessing Web servers, we re-
sort to hosting Web site replicas on our server. For this, we
fetch the original URLs including all embedded resources,
even if they are delivered by third parties. We use a different
IP alias for each Web host. We then measure the round-trip
time to each real Web host and induce per-alias delay at our
server to approximate the communication patterns between
the phone and the real hosts. We host three qualitatively dif-
ferent types of sites: a news site (18 objects), a search engine
(14 objects), and an image-bound site (8 objects with 2 large
images). We introduce 3% packet loss on the server side
to investigate the impact of congestion. Also, we simulate
follow-up visits, by fetching the news site, waiting 10 sec-
onds, then fetching a link on the page. Thus, proxies that
persist connections and cache static content can potentially
improve performance compared to bypassed traffic.
Fig. 9 shows the Web browsing results. With introduced
loss, split connections generally outperform their unproxied
counterparts, with up to 30% lower completion times in the
median. This is mainly due to the proxy absorbing losses,
thus keeping performance comparable to a loss-free environ-
ment. The proxy buffer benefits for T-Mobile mentioned ear-
lier are evident in this experiment as well. Caching does not
provide significant gains on T-Mobile or Sprint in our tests.
In contrast, Sprint’s image compression drastically reduces
fetch times on the image-bound site. Finally, we find that
T-Mobile and AT&T’s persistent connections can improve
performance by ≈10% for follow-up visits (not shown).
Figure 9: Fetching times for three Web site types exploring the effects of splitting connections (in loss-free and lossy environments). [Panels: (a) T-Mobile, (b) AT&T, (c) Sprint, (d) Verizon; fetching time (s) for the News, Image, and Search sites with Split, Split (Loss), Non-Split, and Non-Split (Loss).]
6. ON THE PREVALENCE OF PROXYING
The experiments described above tell us how a cellular
proxy interacts with flows to our Web site, but do not nec-
essarily inform how the proxies interact with other, popular
sites. For example, a cellular and content provider may have
a special agreement to bypass proxies for certain content,
or the content provider’s servers may be off-path from the
proxy. The methodology in the previous sections does not
help here because it requires access to the mobile device and
server; for popular sites we have access to the former only.
To understand proxying prevalence for commonly accessed
servers, we study how many of the 100 most popular web-
sites [1] are proxied. We have no visibility at the server end,
thus we use a fingerprint analysis technique to identify split
TCP connections. This enables us to determine if the carriers
in our study proxy all, some, or none of the sites.
Fingerprinting. The key observation driving our fingerprint-
based proxy detection is that proxies use predictable patterns
when setting bits in the TCP/IP header, which are different
from the ones used by the proxied Web servers. We use the
following rules to identify proxying for arbitrary Web sites.
For each Web site, we collect packet traces for four con-
nections with different properties. We fetch content via the
cellular (c) and wired connections (w), using HTTP (h), and
HTTPS (s). From the traces, we derive connection fingerprints,
denoted by $F_{c,h}$, $F_{w,h}$, $F_{c,s}$ and $F_{w,s}$. The fingerprint
for each packet trace is composed of the receiver window,
the window scaling option value, advertised maximum
segment size, and the IP/ID pattern, all extracted from the
handshake response packet. In §4, we observed these fields
as being most frequently manipulated by proxies.
In the wired network environment, traffic cannot pass through
the cellular proxy. Therefore, the fingerprints $F_{w,h}$ and $F_{w,s}$
are from the Web server (possibly a server-side middlebox).
To obtain the Web server’s fingerprint in the cellular environ-
ment, we need to bypass the potential proxy. In the previous
sections, we used a non-standard port (7777) since we con-
trolled the server. But in general, Web sites do not listen on
this port, so we use port 443 (HTTPS), which we verified to
be un-proxied, and is supported by many Web sites.
In addition, we use a common fingerprint obtained by
fetching content from our server, denoted by $F_p$. We demonstrated
earlier that this is the cellular proxy’s fingerprint,
seen by the client when establishing a connection to our
server. Based on these five fingerprints per site, we conclude
that the phone communicates with an HTTP proxy to access
web resources if the following conditions apply:
\begin{align}
F_{c,h} &= F_p \tag{1} \\
F_{c,h} &\neq F_{w,h} \tag{2} \\
F_{c,h} \neq F_{c,s} &\Rightarrow F_{c,s} = F_{w,s} \tag{3}
\end{align}
First, we check if the phone observes the PEP fingerprint
when establishing a connection to a web server using the cel-
lular network (rule 1). Then, we ensure that the web server
is not using the same fingerprint when responding to a client,
by accessing the server through a wired connection (rule 2).
Finally, we ensure that servers do not simply use different
fingerprints depending on the network access type. For this,
we check if the HTTP and HTTPS fingerprints in the cellu-
lar environment do not match, indicating that HTTPS traffic
bypasses the proxy. As a result, we expect that the HTTPS
fingerprints should match regardless of the network access
type used (rule 3). Additionally we conclude that the phone
always communicates with the same proxy infrastructure for
sites $w_i$ if the following additional condition holds:
\begin{equation}
\forall w_1, w_2 : F_{c,h}(w_1) = F_{c,h}(w_2) \tag{4}
\end{equation}
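The four rules can be written down directly as predicates over fingerprints. The Python sketch below is one possible encoding, not the analysis code used for this study: the `Fingerprint` field types and the function names are assumptions, and rule 3 is skipped when no HTTPS fingerprint is available for a site.

```python
from typing import Dict, NamedTuple, Optional


class Fingerprint(NamedTuple):
    """Fields taken from the handshake response packet (types are assumed)."""
    receiver_window: int
    window_scale: Optional[int]
    mss: Optional[int]
    ip_id_pattern: str


def site_is_proxied(
    f_ch: Fingerprint,
    f_wh: Fingerprint,
    f_cs: Optional[Fingerprint],
    f_ws: Optional[Fingerprint],
    f_p: Fingerprint,
) -> bool:
    """Apply rules 1 to 3 to the fingerprints of one site."""
    rule1 = f_ch == f_p        # cellular HTTP shows the proxy fingerprint
    rule2 = f_ch != f_wh       # the server itself does not use that fingerprint
    if f_cs is None or f_ws is None:
        rule3 = True           # no HTTPS support: rule 3 cannot be checked
    else:
        # Implication: (f_ch != f_cs) => (f_cs == f_ws)
        rule3 = (f_ch == f_cs) or (f_cs == f_ws)
    return rule1 and rule2 and rule3


def same_proxy_for_all_sites(cellular_http: Dict[str, Fingerprint]) -> bool:
    """Rule 4: every site shows the same cellular HTTP fingerprint."""
    return len(set(cellular_http.values())) <= 1
```

Encoding the fingerprint as a tuple makes equality the only operation needed, which matches how the rules compare fingerprints as whole units rather than field by field.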
For each of the 100 most popular websites [1], we first
obtain the mobile-specific version of the site (if one exists).
To control for the fact that fixed-line and cellular networks
may resolve DNS names and perform redirection differently,
we generate the $F_{w,*}$ fingerprints by connecting to the same
IP address found in the cellular network.
Among the 100 most popular websites, ∼20 websites do
not support HTTPS. For these websites, we cannot check
rule 3. For the ∼10 websites that always redirect HTTP
requests to HTTPS, we use the redirection response as the
fingerprint for the HTTP response.
Results. Overall, rule 1 holds for each of the tested web-
sites, and rule 4 holds for all pairs of websites. This strongly
suggests that all Web traffic is handled by the same proxy
within a carrier’s network. Rules 2 and 3 do not hold for
a few destinations. In particular, the fingerprints for three
websites connecting over the wired network match the fin-
gerprint of the Sprint proxy. For another three websites we
observe non-matching HTTPS fingerprints.
Table 2: Sample TCP-based traceroutes indicating that T-Mobile
selectively proxies connections on port 80.

Hop   Test server        Test server        YouTube
      (port 80)          (port 443)         (port 80)
1     192.168.42.129     192.168.42.129     192.168.42.129
2     10.170.224.192     10.170.224.192     10.170.224.192
3     10.170.224.138     10.170.224.138     10.170.224.138
4     10.165.54.12       10.165.54.12       10.165.54.12
5     128.125.121.204    10.165.54.1        10.165.54.1
6     -                  10.170.213.11      10.170.213.11
...   -                  ...                ...
Last  -                  128.125.121.204    208.54.39.44

The results above indicate that contents for index pages
are proxied, but they do not indicate whether the same is
true for all site content. In particular, we suspect that
content such as streaming video, which is often heavily
optimized based on client performance, could bypass the proxy
to avoid interference with these optimizations. To test this
hypothesis we use a similar strategy as above for the video
streaming URLs from three popular video streaming websites.
For Hulu, we verify that the traffic is proxied for all
four carriers. Verizon serves YouTube and Netflix over IPv6,
which we omit from this study. YouTube traffic is proxied
for AT&T and Sprint.
We observe that T-Mobile traffic exchanged with certain
YouTube server IP addresses bypasses the proxy. We measure
paths to YouTube and to other hosts using tcptraceroute
to determine if this occurs because these YouTube servers
are not on the path to the proxy. Table 2 presents our results,
indicating that the IP-level path to YouTube servers differs
from those passing through the proxy (hop 5), and shares IP
hops with paths to our Web server over unproxied connec-
tions (hop 6).
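The comparison can be made concrete by locating the first hop at which two paths differ. The sketch below uses only the hops listed in Table 2 (hops 1 to 6, leaving out the elided ones); the helper name first_divergence is an assumption introduced for illustration.

```python
from typing import Optional, Sequence


def first_divergence(path_a: Sequence[str], path_b: Sequence[str]) -> Optional[int]:
    """Return the 1-based hop at which two paths first differ, or None."""
    for hop, (a, b) in enumerate(zip(path_a, path_b), start=1):
        if a != b:
            return hop
    if len(path_a) != len(path_b):
        # One path is a strict prefix of the other.
        return min(len(path_a), len(path_b)) + 1
    return None


# Hops from Table 2; the port-80 path to the test server ends at hop 5.
common = ["192.168.42.129", "10.170.224.192", "10.170.224.138", "10.165.54.12"]
test_80 = common + ["128.125.121.204"]
test_443 = common + ["10.165.54.1", "10.170.213.11"]
youtube_80 = common + ["10.165.54.1", "10.170.213.11"]

print(first_divergence(test_80, youtube_80))   # 5
print(first_divergence(test_443, youtube_80))  # None
```

The proxied port-80 path departs from the YouTube path at hop 5, while the unproxied port-443 path and the YouTube path agree on every listed hop.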
7. DISCUSSION AND FUTURE WORK
Limitations. This paper focuses on methodologies and ex-
periments for identifying and characterizing proxies in four
US cellular networks using a small number of devices. We
measured the impact of proxies for a variety of network con-
figurations, but future work will use a broader set of loca-
tions and carriers to generalize our results. Our study char-
acterizes proxies only in IPv4 networks. Only one carrier,
Verizon, supported native IPv6 connectivity (in addition to
IPv4). Verizon proxies IPv4 Web traffic but does not proxy
Web traffic carried over IPv6; we leave this to future work. This study
focused on behavior for the 100 most popular Web sites and
one testbed Web site; we found that proxying was consistent
for all but YouTube on T-Mobile. We believe that such ex-
ceptions to proxying are rare, but we would like to evaluate
this on more Web sites.
Selective proxying. We were interested to discover that
proxies interpose on connections to almost all major Web
sites, but Google’s YouTube traffic bypasses T-Mobile prox-
ies. We cannot be certain, but it seems likely that Google
worked with T-Mobile to enable the bypass. YouTube ac-
counts for significant portions of Internet traffic, and Google
has actively developed approaches to improve delivery [3,5,
10, 14, 16]. This suggests that Google sees benefit in main-
taining an end-to-end connection to clients, and T-Mobile
appears willing to work with (at least some) providers to enable
bypassing of the proxy. HTTPS provides another means
to bypass the proxy, and providers are increasingly using
it to serve Web content. It will be interesting to observe
trends over time, to see whether the role of proxies diminishes
as content moves to HTTPS and, perhaps, as more Web
providers negotiate arrangements like YouTube has.
Proxy evolution. Despite evidence of selective proxying
and unclear performance benefits from existing proxies, we
believe that future proxies can serve an important role in cel-
lular networks. Cellular carriers control the whole transport
segment between the client device and the proxy. As such,
it is possible to fine-tune connections. For example, connections
between the phone and the proxy can use advanced
protocol features which cannot be easily deployed in a public
network due to potential third-party interference [14]. Fur-
ther, with explicit proxies (e.g., SPDY/compression prox-
ies [2]) a client can use a single connection to the proxy,
which then in turn establishes connections to requested sites.
8. REFERENCES
[1] Alexa top 100 websites. http://www.alexa.com/topsites.
[2] Data Compression Proxy.
https://developer.chrome.com/multidevice/data-compression.
[3] Experimenting with QUIC.
http://blog.chromium.org/2013/06/experimenting-with-quic.html.
[4] SPDY Proxy. http://www.chromium.org/spdy/spdy-proxy.
[5] SPDY whitepaper.
http://www.chromium.org/spdy/spdy-whitepaper.
[6] Sprint Community.
https://community.sprint.com/baw/thread/144305.
[7] F. Baccelli, G. Carofiglio, and S. Foss. Proxy caching in split
TCP: Dynamics, stability and tail asymptotics. In Proc.
INFOCOM.
[8] J. Border, M. Kojo, J. Griner, G. Montenegro, and Z. Shelby.
Performance Enhancing Proxies Intended to Mitigate
Link-related Degradations. Technical report, RFC 3135,
2001.
[9] A. Botta and A. Pescapé. Monitoring and measuring wireless
network performance in the presence of middleboxes. In
Proc. WONS, 2011.
[10] N. Dukkipati, T. Refice, Y. Cheng, J. Chu, T. Herbert,
A. Agarwal, A. Jain, and N. Sutin. An Argument for
Increasing TCP’s Initial Congestion Window. ACM Comput.
Commun. Rev., 2010.
[11] N. Ehsan, M. Liu, and R. J. Ragland. Evaluation of
performance enhancing proxies in Internet over satellite.
IJCS, 16(6), 2003.
[12] J. Erman, A. Gerber, M. T. Hajiaghayi, D. Pei, S. Sen, and
O. Spatscheck. To cache or not to cache: The 3g case.
[13] V. Farkas, B. Héder, and S. Nováczki. A Split Connection
TCP Proxy in LTE Networks. In Inf. Comm. Tech., 2012.
[14] T. Flach, N. Dukkipati, A. Terzis, B. Raghavan, N. Cardwell,
Y. Cheng, A. Jain, S. Hao, E. Katz-Bassett, and R. Govindan.
Reducing Web Latency: the Virtue of Gentle Aggression. In
Proc. SIGCOMM, 2013.
[15] A. Friedrich, J. Yakemovic, T. S. Taylor, T. Hansen, and
K. Selvamani. Performance Enhancing Proxy, 2007. US
Patent App. 11/966,485.
[16] M. Ghobadi, Y. Cheng, A. Jain, and M. Mathis. Trickle: Rate
Limiting YouTube Video Streaming. In Proc. USENIX ATC,
2012.
[17] C. Gomez, M. Catalan, D. Viamonte, J. Paradells, and
A. Calveras. Web browsing optimization over 2.5G and 3G:
end-to-end mechanisms vs. usage of performance enhancing
proxies. Wireless Comm. and Mob. Comp., 2008.
[18] S. Ha and I. Rhee. Hybrid Slow Start for High-Bandwidth
and Long-Distance Networks. In Proc. PFLDnet, 2008.
[19] S. Ha, I. Rhee, and L. Xu. CUBIC: a new TCP-friendly
high-speed TCP variant. ACM SIGOPS Op. Sys. Rev., 2008.
[20] J. Huang, F. Qian, Y. Guo, Y. Zhou, Q. Xu, Z. M. Mao,
S. Sen, and O. Spatscheck. An in-depth study of LTE: Effect
of network protocol and application behavior on
performance. In Proc. SIGCOMM, 2013.
[21] M. Ivanovich, P. Bickerdike, and J. Li. On TCP performance
enhancing proxies in a wireless environment. IEEE Comm.
Mag., 46(9), 2008.
[22] H. Jiang, Y. Wang, K. Lee, and I. Rhee. Tackling bufferbloat
in 3G/4G networks. In Proc. IMC, 2012.
[23] C. Kreibich, N. Weaver, B. Nechaev, and V. Paxson.
Netalyzr: Illuminating the edge network. In Proc. IMC,
2010.
[24] M. C. Necker, M. Scharf, and A. Weber. Performance of
different proxy concepts in UMTS networks. In Wireless Sys.
and Mob. in Next Gen. Internet. 2005.
[25] P. Rodriguez and V. Fridman. Performance of PEPs in
cellular wireless networks. In Web content caching and
distribution, pages 19–38. Springer, 2004.
[26] Z. Wang, Z. Qian, Q. Xu, Z. Mao, and M. Zhang. An untold
story of middleboxes in cellular networks. In Proc.
SIGCOMM, 2011.
[27] N. Weaver, C. Kreibich, M. Dam, and V. Paxson. Here Be
Web Proxies. In Proc. PAM, 2014.
Description
Xing Xu, Yurong Jiang, Tobias Flach, Ethan Katz-Bassett, David Choffnes, and Ramesh Govindan. "Investigating transparent web proxies in cellular networks." Computer Science Technical Reports (Los Angeles, California, USA: University of Southern California. Department of Computer Science) no. 944 (2014).
Asset Metadata
Creator: Choffnes, David (author); Flach, Tobias (author); Govindan, Ramesh (author); Jiang, Yurong (author); Katz-Bassett, Ethan (author); Xu, Xing (author)
Core Title: USC Computer Science Technical Reports, no. 944 (2014)
Alternative Title: Investigating transparent web proxies in cellular networks (title)
Publisher: Department of Computer Science, USC Viterbi School of Engineering, University of Southern California, 3650 McClintock Avenue, Los Angeles, California, 90089, USA (publisher)
Tag: OAI-PMH Harvest
Format: 8 pages (extent), technical reports (aat)
Language: English
Unique identifier: UC16270237
Identifier: 14-944 Investigating Transparent Web Proxies in Cellular Networks (filename)
Legacy Identifier: usc-cstr-14-944
Rights: Department of Computer Science (University of Southern California) and the author(s).
Internet Media Type: application/pdf
Copyright: In copyright - Non-commercial use permitted (https://rightsstatements.org/vocab/InC-NC/1.0/)
Source: 20180426-rozan-cstechreports-shoaf (batch), Computer Science Technical Report Archive (collection), University of Southern California. Department of Computer Science. Technical Reports (series)
Access Conditions: The author(s) retain rights to their work according to U.S. copyright law. Electronic access is being provided by the USC Libraries, but does not grant the reader permission to use the work if the desired use is covered by copyright. It is the author, as rights holder, who must provide use permission if such use is covered by copyright.
Repository Name: USC Viterbi School of Engineering Department of Computer Science
Repository Location: Department of Computer Science. USC Viterbi School of Engineering. Los Angeles, CA, 90089
Repository Email: csdept@usc.edu