BALANCING SECURITY AND PERFORMANCE OF NETWORK
REQUEST-RESPONSE PROTOCOLS
by
Liang Zhu
A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(COMPUTER SCIENCE)
December 2018
Copyright 2018 Liang Zhu
Dedication
To my parents and grandparents, for their love and sacrifice.
Acknowledgments
It has been quite an amazing journey to conduct my PhD research at USC. I want to thank
the people who have helped, guided, and inspired me over the past years.
First and foremost, I would like to thank my advisor, Prof. John Heidemann. I have
been very fortunate to work with John during my PhD study. John brought me into
the research of computer networks, which I have been fascinated with. John is a serious
researcher and he sets an example for me. He has taught me how to define and approach
research problems, and how to think critically. We collaborated on every piece of my PhD
work, and his advice not only advanced the core research ideas but also improved the
presentation of the research results. I appreciate his guidance, encouragement, and valuable
feedback.
I also would like to thank Prof. Ramesh Govindan, Prof. Bhaskar Krishnamachari,
Prof. Ethan Katz-Bassett and Prof. Wyatt Lloyd for taking the time to serve on my qual-
ifying examination and dissertation committee, and providing valuable feedback on this
dissertation. Particularly, Prof. Ethan Katz-Bassett gave a lot of detailed and sharp com-
ments on this dissertation, and his advice greatly helped to shape the thesis of this dis-
sertation.
This dissertation is built upon collaborations. I want to thank Dr. Johanna Amann
for the guidance during my internship at ICSI. We collaborated on the study of OCSP
(Chapter 2) which was primarily done when I was at ICSI. I would like to thank Zi
Hu, Duane Wessels and Allison Mankin who contributed to the study of Connection-
Oriented DNS (Chapter 3).
I also want to thank my fellow colleagues and friends at the ANT Lab and USC for
their valuable input on my work and for sharing research ideas: Yuri Pradkin, Lin Quan,
Xue Cai, Xun Fan, Zi Hu, Calvin Ardi, Abdul Qadeer, Hang Guo, Lan Wei, Hao Shi,
Xiyue Deng, Abdulla Alwabel and many others.
The completion of this dissertation would not have been possible without the end-
less support of my family. I want to thank my parents for providing me with the best
educational resources and supporting all my pursuits. I want to thank my wife, Yutong,
for her patience, her faithful support, and for sharing this life journey with me. I also want to thank
my grandparents, my uncles, and many others in my family for their support.
Liang Zhu
University of Southern California
September 2018
Table of Contents
Dedication ii
Acknowledgments iii
List of Tables viii
List of Figures ix
Abstract xi
1 Introduction 1
1.1 Thesis Statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2 Demonstrating the Thesis Statement . . . . . . . . . . . . . . . . . . . 5
1.2.1 Measuring OCSP latency . . . . . . . . . . . . . . . . . . . . . 5
1.2.2 DNS over TCP and TLS . . . . . . . . . . . . . . . . . . . . . 6
1.2.3 DNS Experiment Framework . . . . . . . . . . . . . . . . . . . 8
1.2.4 Generalization . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.3 Research Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2 Measuring the Latency and Pervasiveness of TLS Certificate Revocation 14
2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.2 Data Collection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.3 OCSP Use in Applications and Hosts . . . . . . . . . . . . . . . . . . . 18
2.4 Latency of OCSP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
2.4.1 OCSP Delay in Network Traffic . . . . . . . . . . . . . 22
2.4.2 OCSP Server Delay . . . . . . . . . . . . . . . . . . . . . . . . 24
2.4.3 OCSP Overhead in TLS . . . . . . . . . . . . . . . . . . . . . 26
2.4.4 Effectiveness of OCSP Caching . . . . . . . . . . . . . 29
2.5 OCSP In Action: Revoked certificates . . . . . . . . . . . . . . . . . . 30
2.6 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
2.7 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
3 Connection-Oriented DNS to Improve Privacy and Security 35
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
3.2 Problem Statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
3.2.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
3.2.2 The Limitations of Single-Packet Exchange . . . . . . . . . . . 42
3.2.3 Threat Model . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
3.3 Design and Implementation of T-DNS . . . . . . . . . . . . . . . . . . 48
3.3.1 DNS over TCP . . . . . . . . . . . . . . . . . . . . . . . . . . 48
3.3.2 DNS over TLS . . . . . . . . . . . . . . . . . . . . . . . . . . 50
3.3.3 Implementation Status . . . . . . . . . . . . . . . . . . . . . . 52
3.3.4 Gradual Deployment . . . . . . . . . . . . . . . . . . . . . . . 53
3.4 Connection Reuse and Resources . . . . . . . . . . . . . . . . . . . . . 54
3.4.1 Datasets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
3.4.2 Trace Replay and Parameterization . . . . . . . . . . . . . . . 56
3.4.3 Concurrent Connections and Hit Fraction . . . . . . . . . . . . 57
3.5 Performance Under Attack . . . . . . . . . . . . . . . . . . . . . . . . 60
3.5.1 DNS: Amplifying Attacks on Others . . . . . . . . . . . . . . . 60
3.5.2 Direct Denial-of-Service on the DNS Server . . . . . . . . . . . 61
3.6 Client-side Latency . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
3.6.1 Computation Costs . . . . . . . . . . . . . . . . . . . . . . . . 66
3.6.2 Latency: Stub-to-Recursive Resolver . . . . . . . . . . . . . . 68
3.6.3 Latency: Recursive to Authoritative . . . . . . . . . . . . . . . 71
3.6.4 Client connection-hit fractions . . . . . . . . . . . . . . . . . . 73
3.6.5 Modeling End-to-End Latency for Clients . . . . . . . . . . . . 75
3.7 Related work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
3.7.1 Siblings: DNSSEC and DANE/TLSA . . . . . . . . . . . . . . 78
3.7.2 DNSCrypt and DNSCurve . . . . . . . . . . . . . . . . . . . . 79
3.7.3 Unbound and TLS . . . . . . . . . . . . . . . . . . . . . . . . 80
3.7.4 Reusing Other Standards: DTLS, TLS over SCTP, HTTPS, and
Tcpcrypt . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
3.7.5 Other Approaches to DNS Privacy . . . . . . . . . . . . . . . . 82
3.7.6 Specific Attacks on DNS . . . . . . . . . . . . . . . . . . . . . 82
3.8 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
4 DNS Experimentation at Scale 85
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
4.2 LDplayer: DNS trace player . . . . . . . . . . . . . . . . . . . . . . . 89
4.2.1 Design Requirements . . . . . . . . . . . . . . . . . . . . . . . 89
4.2.2 Architecture Overview . . . . . . . . . . . . . . . . . . . . . . 91
4.2.3 Synthesize Zones to Provide Responses . . . . . . . . . . . . . 93
4.2.4 Emulate DNS Hierarchy Efficiently . . . . . . . . . . . 96
4.2.5 Mutate Trace For Various Experiments . . . . . . . . . . . . . 101
4.2.6 Distribute Queries For Accurate Replay . . . . . . . . . . . . . 102
4.3 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
4.4 Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
4.4.1 Experiment Setup and Traces . . . . . . . . . . . . . . . . . . 107
4.4.2 Accuracy of Replay Timing and Rate . . . . . . . . . . . . . . 108
4.4.3 Accuracy of Responses when Emulating DNS Hierarchy . . . . 112
4.4.4 Single-Host Throughput . . . . . . . . . . . . . . . . . . . . . 118
4.5 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
4.5.1 Impact of Increased DNSSEC Queries . . . . . . . . . . . . . . 119
4.5.2 Performance of DNS over TCP and TLS at a Root Server . . . . 121
4.6 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
4.7 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
5 Future Work and Conclusions 133
5.1 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
5.2 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137
6 Appendix for Connection-Oriented DNS to Improve Privacy and Security 140
6.1 Current Query Response Sizes . . . . . . . . . . . . . . . . . . . . . . 141
6.2 Domain names per Web Page . . . . . . . . . . . . . . . . . . . . . . . 141
6.3 Additional Data for Server-Side Latency . . . . . . . . . . . . . . . . . 142
6.4 Detailed Discussion of Latency . . . . . . . . . . . . . . . . . . . . . . 142
6.4.1 Detailed Discussion of Latency: Stub-to-Recursive Resolver . . 146
6.4.2 Detailed Discussion of Latency: Recursive Resolver to Author-
itative Server . . . . . . . . . . . . . . . . . . . . . . . . . . . 150
6.5 Data To Estimate Stub-to-Recursive and Recursive-to-Authoritative RTTs 153
6.6 Additional Data for Client-Side Latency . . . . . . . . . . . . . . . . . 156
6.7 Detailed Evaluation of Deployment . . . . . . . . . . . . . . . . . . . 156
6.7.1 Overall goals and constraints . . . . . . . . . . . . . . . . . . . 156
6.7.2 Improving privacy . . . . . . . . . . . . . . . . . . . . . . . . 159
6.7.3 Preventing DoS . . . . . . . . . . . . . . . . . . . . . . . . . . 161
6.7.4 Removing Policy Constraints . . . . . . . . . . . . . . . . . . . 162
6.7.5 Comparison to deployment of alternatives . . . . . . . . . . . . 163
6.8 Potential Vulnerabilities Raised by TCP . . . . . . . . . . . . . . . . . 164
6.9 Relationship of T-DNS and TLS to DTLS . . . . . . . . . . . . . . . . 165
Bibliography 167
List of Tables
2.1 OCSP applications observed. . . . . . . . . . . . . . . . . . . . . . . . 19
2.2 CDN usage of 304 unique OCSP servers. . . . . . . . . . . . . . . . . 23
2.3 Top 10 busy OCSP servers. . . . . . . . . . . . . . . . . . . . . . . . . 24
2.4 Certificates retrieved for Alexa top-1000 websites. . . . . . . . . . . . . 26
2.5 Distribution of OCSP validity times for 290 k unique certificates. . . . . 30
3.1 Benefits of T-DNS. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
3.2 Design choices in T-DNS as compared to alternatives. . . . . . . . . . . 49
3.3 Datasets used to evaluate connection reuse and concurrent connections. 55
3.4 Amplification factors of DNS/UDP and TCP. . . . . . . . . . . . . . . 60
3.5 Limited resource for each protocol combination in tested DoS attacks. . 63
3.6 Computational costs of connection setup and packet processing. . . . . 67
4.1 Determine the time to send queries. . . . . . . . . . . . . . . . . . . . 104
4.2 DNS traces used in experiments and evaluation. . . . . . . . . . . . . . 108
4.3 Comparisons between responses in reality and responses in replay. . . . 114
List of Figures
2.1 Cumulative distribution of OCSP lookup time. . . . . . . . . . . . . . . 21
2.2 Cumulative distribution of OCSP delay for Alexa top-1000 websites. . . 26
2.3 Cumulative distribution of OCSP network latency and TLS delay. . . . 28
2.4 Daily number of TLS connections, OCSP requests and connections. . . 30
3.1 DNS overview. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
3.2 Estimated response sizes with different-length DNSSEC keys. . . . . . 44
3.3 Number of concurrent connections (DNSChanger). . . . . . . . . . . . 57
3.4 Number of concurrent connections (Level 3 and B-root). . . . . . . . . 58
3.5 Median connection hit fractions at server-side. . . . . . . . . . . . . . . 58
3.6 Network topology for DoS attack evaluation. . . . . . . . . . . . . . . 61
3.7 UDP-based DNS performance under DoS attack. . . . . . . . . . . . . 62
3.8 TCP-based DNS performance under spoofed DoS attack. . . . . . . . . 62
3.9 TCP-based DNS performance with non-spoofed DoS attack. . . . . . . 63
3.10 Per-query response times with a cold cache. . . . . . . . . . . . . . . . 69
3.11 Per-query response times with different protocols. . . . . . . . . . . . 72
3.12 Median client-side connection hit fractions. . . . . . . . . . . . . . . . 74
3.13 Median client-side connection hit fractions. . . . . . . . . . . . . . . . 74
3.14 End-to-end performance with different protocols and RTT. . . . . . . 75
4.1 The architecture of LDplayer. . . . . . . . . . . . . . . . . . . . . . . . 92
4.2 Server proxies that manipulate the source and destination addresses. . . 96
4.3 The architecture of trace mutator. . . . . . . . . . . . . . . . . . . . . . 102
4.4 Multi-level query distribution. . . . . . . . . . . . . . . . . . . . . . . 103
4.5 A prototype of distributed query system. . . . . . . . . . . . . . . . . . 106
4.6 Network topology used for evaluation. . . . . . . . . . . . . . . . . . . 108
4.7 Query timing difference between replayed and original traces. . . . . . 110
4.8 CDF of inter-arrival time of original and replayed traces. . . . . . . . . 111
4.9 Query rate differences between replayed and original B-Root trace. . . . 111
4.10 Number of queries sent by the recursive server. . . . . . . . . . . . . . 116
4.11 The throughput of a fast replay over UDP. . . . . . . . . . . . . . . . . 119
4.12 Bandwidth of responses under different DNSSEC ZSK sizes. . . . . . 120
4.13 CPU usage with different TCP timeouts under minimal RTT . . . . . . 121
4.14 Network topology for experiments of replaying Root DNS traces. . . . 121
4.15 Evaluation of server memory and connections requirement for TCP. . . 125
4.16 Evaluation of server memory and connections requirement for TLS. . . 126
4.17 Evaluation of query latency over all clients and non-busy clients. . . . . 128
6.1 Response sizes from name servers of the Alexa top-1000 websites. . . . 140
6.2 Cumulative distribution of number of unique hostnames per web page. . 142
6.3 The number of concurrent connections with different time-out windows. 143
6.4 Server-side hit ratio with different time-out windows. . . . . . . . . . 144
6.5 Quartile plot of server-side connection hit fraction. . . . . . . . . . . . 145
6.6 RTTs for ISP-provided and three third-party recursive resolvers. . . . . 154
6.7 RTTs to authoritative servers of Alexa top-1000 domains. . . . . . . . . 154
6.8 RTTs to authoritative servers of random 1000 Alexa domains. . . . . . 155
6.9 Cumulative distribution of client-side connection hit fraction. . . . . . . 157
Abstract
The Internet has become a popular tool to acquire information and knowledge. Usu-
ally information retrieval on the Internet depends on request-response protocols, where
clients and servers exchange data. Despite their wide use, request-response protocols
bring challenges for security and privacy. For example, source-address spoofing enables
denial-of-service (DoS) attacks, and eavesdropping on unencrypted data leaks sensitive
information. There is often a trade-off between security and performance in
request-response protocols. More advanced protocols, such as Transport Layer Security (TLS),
have been proposed to solve these problems of source spoofing and eavesdropping. However,
developers often avoid adopting those advanced protocols due to performance costs such as
client latency and server memory requirements. We need to understand the trade-off between
security and performance for request-response protocols and find a reasonable balance,
instead of blindly prioritizing one of them.
The thesis of this dissertation is that it is possible to improve security of net-
work request-response protocols without compromising performance, by protocol
and deployment optimizations that are demonstrated through measurements of
protocol developments and deployments. We support the thesis statement through
three specific studies, each of which uses measurements and experiments to evaluate the
development and optimization of a request-response protocol. We show that security
benefits can be achieved with modest performance costs. In the first study, we measure
the latency of the Online Certificate Status Protocol (OCSP) in TLS connections. We show
that OCSP has low latency, due to its wide use of CDNs and caching, making it practical
to check certificate revocation and thereby secure TLS. In the second study, we propose to use TCP
and TLS for the Domain Name System (DNS) to solve a range of fundamental prob-
lems in DNS security and privacy. We show that DNS over TCP and TLS can achieve
favorable performance with selective optimization. In the third study, we build a config-
urable, general-purpose DNS trace replay system that emulates global DNS hierarchy
in a testbed and enables DNS experiments at scale efficiently. We use this system to
further prove the reasonable performance of DNS over TCP and TLS at scale in the real
world.
In addition to supporting our thesis, our studies have their own research contribu-
tions. Specifically, in the first work, we conducted new measurements of OCSP by
examining OCSP network traffic and showed a significant improvement in OCSP
latency: a median latency of only 20 ms, much less than the 291 ms observed in prior
work [SHI+12]. We showed that CDNs serve 94% of the OCSP traffic and that OCSP use
is ubiquitous. In the second work, we selected necessary protocol and implementation
optimizations for DNS over TCP/TLS, and suggested how to run a production TCP/TLS
DNS server [HZH+16]. We suggested appropriate connection timeouts for DNS opera-
tions: 20 s at authoritative servers and 60 s elsewhere. We showed that the cost of DNS
over TCP/TLS can be modest. Our trace analysis showed that connection reuse can be
frequent (60%–95% for stub and recursive resolvers). We showed that server memory
is manageable, and latency of connection-oriented DNS is acceptable (9%–22% slower
than UDP). In the third work, we showed how to build a DNS experimentation frame-
work that can scale to emulate a large DNS hierarchy and replay large traces. We used
this experimentation framework to explore how traffic volume changes (increasing by
31%) when all DNS queries employ DNSSEC. Our DNS experimentation framework
can benefit other studies on DNS performance evaluations.
Chapter 1
Introduction
The Internet has become an essential part of people’s everyday life. Because of the high
speed and easy access of the Internet, more and more people use it to acquire information
and knowledge, such as watching news and searching answers, and to communicate
with each other, such as email and online social networks. Google reported 1.2 trillion
searches, an average of 3.3 billion per day, in 2012 [Goo12]. Facebook had 1.45 billion
daily active users in March 2018 [Fac18]. As of December 2017, there were an estimated 4.1
billion Internet users, more than half of the world population [Int17].
Almost every instance of information retrieval on the Internet is based on a request-
response protocol, where a client sends a request and a server replies with data. Two
widely used examples of request-response protocols for network applications are the
Hypertext Transfer Protocol (HTTP) and the Domain Name System (DNS) protocol.
As the fundamental data exchange method for the web, HTTP is critical for every web
transaction. DNS provides the essential name-to-address mapping used by the web,
e-mail, and all other activities on the Internet. DNS also helps to build other services,
such as anti-spam [LS12] and service discovery [SCKB06].
Request-response protocols raise concerns about security and privacy. First, we need
to authenticate the responses. If the source of the response is unverified, an attacker
can spoof the authority of a real server and send a false and destructive reply, causing
security issues. Second, we need to keep request and response private when they contain
personally identifiable information (PII) or sensitive data. Third, the data-exchange
property of request-response protocols facilitates two types of attacks: flooding requests
to exhaust server resources and spoofing requests to attack other hosts.
In network protocols, there is often a trade-off between security and performance.
We often need more advanced protocols to guard security and privacy. However,
advanced protocols can bring performance overhead, such as extra latency and additional
resource requirements (such as CPU and memory) at servers. For example, HTTPS
(HTTP over Transport Layer Security (TLS)) provides confidentiality and integrity with
a public key infrastructure (PKI). However, TLS brings extra latency for key exchange,
cryptographic processing, and checking certificate revocation. Since ordinary users
care most about client latency, applications often take the risk of sacrificing security
in exchange for good performance. For example, most major web browsers do not
reliably check certificate revocation information in HTTPS [LTZ+15] because of its latency.
As another example, DNS has always supported TCP, which can mitigate UDP amplification
attacks. However, current DNS implementations prefer UDP for queries and responses
due to traditional expectations about TCP latency.
Given the importance of request-response protocols in network applications, we
need to balance security and performance, instead of blindly prioritizing one of them.
There are two main challenges in evaluating and discovering the options in this
trade-off: the diversity of client applications and the flexibility of server experiments. First, there
are many different client applications running on different networks and operating
systems; it is difficult or even impossible to investigate all use cases. Second, it is often
hard to conduct repeatable experiments for different configurations at the same
scale as production servers, because such experiments would require massive server
infrastructure and huge input traffic volume.
In this thesis, we contribute to understanding these challenges by performing selec-
tive measurements and scaled testbed experiments to study the balance between the
security and performance of request-response protocols in network applications.
1.1 Thesis Statement
This thesis shows that it is possible to improve security of network request-response
protocols without compromising performance, by protocol and deployment opti-
mizations that are demonstrated through measurements of protocol developments
and deployments.
This thesis statement has three parts. The first part is “it is possible to improve secu-
rity of network request-response protocols without compromising performance”. Our
goal is to improve the balance between security and performance for network request-
response protocols. Common security problems of request-response protocols include
privacy, lack of authentication, amplification attacks targeting other hosts, and denial-of-
service (DoS) attacks aimed at servers. Despite existing techniques built to solve these
common problems, applications may not use security protocols due to concerns about
performance overhead. We aim to improve the security with only modest performance
overhead, such as reasonable client latency and server memory requirements.
As stated in the first part of the thesis statement, “request-response protocols” are
our research subject. By request-response protocols, this study examines protocols that
have three unique properties that separate them from other network protocols. First, the
request-response protocols in this work are based on the client-server model [DR92].
Second, the request-response protocols have two-way message exchange where a client
initiates a query and expects a reply from a server. Third, the sizes of the request and
response are generally small (order of kilobytes) and the process of message exchange is
generally short (order of milliseconds). Specifically, we choose the Online Certificate
Status Protocol (OCSP) [SMA+13] and the Domain Name System (DNS) [Moc87b, Moc87a]
protocols to study because they are important examples of request-response protocols.
OCSP is used for certificate revocation [FCW+03] and secures Transport Layer Security
(TLS) [DR08] and all the other protocols using certificates. The use of certificates is
fundamental for Internet security. Beyond fundamental name-to-address mapping, DNS
provides many other services, such as anti-spam [LS12] and replica selection for content
delivery networks (CDNs) [SCKB06].
The second part of the thesis statement is “protocol and deployment optimizations”.
Optimizations of protocol design and deployment can improve performance by avoiding
unnecessary overhead of security protocols. Protocols define the communication details,
but the performance also depends on decisions about protocol deployment, such as loca-
tions of clients and servers. Understanding current deployment optimizations of security
protocols is important because it may uncover existing performance improvements
that are unknown to application developers. Revealing such performance improvements can
encourage the adoption of security protocols that were previously considered too slow.
We also observe that, by taking advantage of protocol developments and optimizing
protocols, it is possible to deploy certain security protocols, which were not deployable
in the past due to performance overhead. Specifically, we show that deployment opti-
mization is the reason for improved OCSP latency. We select a set of key design and
implementation decisions for DNS over TCP, in order to achieve good performance.
The third part of the thesis statement is “measurements of protocol developments
and deployments”. Measurements guide protocol tuning and optimizations. We need
measurements to understand performance advances due to current development of pro-
tocols and deployment. We also need measurements to evaluate the cost and benefit of
possible optimizations, which helps to discover specific protocol tuning and optimiza-
tions. By development of protocols and deployment, we mean new changes in protocols
or their deployment that could lead to performance improvements. By measuring devel-
opment, we mean using measurements to understand the developments and quantify the
performance benefit.
1.2 Demonstrating the Thesis Statement
We prove the thesis statement through three specific studies, each of which uses mea-
surements or experiments to evaluate the development and optimization of a request-
response protocol, and we show that security benefits come with modest performance
costs. We next introduce these studies.
1.2.1 Measuring OCSP latency
Our first study (Chapter 2) evaluates latency of OCSP that improves the security of TLS
and all the other protocols that use certificates.
This study of OCSP latency demonstrates that it is possible to improve security of net-
work request-response protocols without compromising performance by deployment
optimizations that are demonstrated through measurements of protocol deployments,
an example for part of the thesis statement.
The security improvement that we show with this study is that applications that
use certificates can check certificate revocation using OCSP to improve security with
reasonable performance. Our study does not improve the security of OCSP itself, but
it shows that the security of TLS and all the other protocols that use certificates, such as
HTTPS, can be improved by using OCSP. This security improvement does not com-
promise too much performance: we observe a median OCSP latency of only 20 ms
which has improved significantly compared to the 291 ms reported in a prior study [SHI+12].
Because of this OCSP latency improvement, it is now possible for applications that use
certificates to quickly check certificate revocation with OCSP, improving the security of
all network activities that use certificates. Previously, applications disabled OCSP because
of its large latency [Lan12], taking the security risk of using compromised certificates.
The measurements that we use in this study are passive network traces and active
probing of servers. Our measurements all show similar results for OCSP latency. We first
measure OCSP in live traffic passively collected at the Internet uplink of a large research
university. Prior studies measure application-level OCSP latency by probing OCSP
servers and using browser extensions [SHI+12]. We measure OCSP latency at the network
level by examining OCSP network traffic, which has broader coverage of diverse applications
and users, in addition to application-level probing. To verify our observations on
the development of OCSP, we then study individual servers by actively probing the OCSP
servers of popular Alexa top websites from two different vantage points. Our active
measurement is efficient because we can get results that approximate the whole population
while probing only 1000 websites (0.0006% of all 172 M active websites [Netb] on
the Internet).
The optimization that we show with this study is the optimization of protocol deploy-
ment, where the use of CDN among OCSP servers leads to OCSP performance improve-
ment. We discover that almost all (94%) of the requests are served by CDNs, and many
(40%) of the OCSP servers use CDNs. This heavy use of CDNs strongly suggests that the
improvement in OCSP latency is because of the use of CDNs.
1.2.2 DNS over TCP and TLS
Our first study shows the potential of TLS because of fast certificate revocation, and it
supports our use of TLS for DNS to improve the privacy of users’ DNS queries. In our
second study (Chapter 3), we propose to use TCP and TLS to improve the security and
privacy of DNS.
This study of DNS over TCP and TLS demonstrates that it is possible to improve security
of network request-response protocols without compromising performance by protocol
optimizations that are demonstrated through measurements of protocol developments,
an example for part of the thesis statement.
The security improvement that we show with this study is that TCP mitigates spoof-
ing and amplification for DoS, and TLS provides privacy from users to their DNS
resolvers, and optionally to authoritative servers. This security improvement requires only
modest cost without compromising performance: client latency approaches that of UDP, and
server memory requirements match current hardware. Previous expectations about DNS
suggested that connections would balloon client latency and overwhelm servers with state.
The protocol optimizations we show with this study are connection persistence and
a set of design decisions. Specifically, we show that connection persistence, reusing the
same connection for multiple requests, amortizes connection setup. We identify a set of
key design and implementation decisions for DNS over TCP and TLS to achieve good
performance: query pipelining, out-of-order responses, TCP fast-open and TLS connec-
tion resumption, and plausible timeouts. Pipelining sends multiple queries before
their responses arrive, avoiding the round-trip delays of the stop-and-wait alternative. Out-of-order
responses avoid head-of-line blocking at servers. TCP fast-open and TLS connection
resumption save RTTs in connection setup and shift state to clients when possible. We
propose plausible timeouts for connection and session reuse, in order to balance client
latency and server memory.
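To make these optimizations concrete, the following sketch (not the dissertation's implementation) shows pipelining and out-of-order response matching for DNS over TLS, assuming the dnspython package for message encoding and a hypothetical resolver name; per RFC 7766/7858, each message is framed with a two-byte length prefix.

    import socket
    import ssl
    import struct

    import dns.message

    RESOLVER = "dns.example.net"  # hypothetical DNS-over-TLS resolver (port 853, RFC 7858)

    def recv_exact(sock, n):
        data = b""
        while len(data) < n:
            chunk = sock.recv(n - len(data))
            if not chunk:
                raise ConnectionError("connection closed by server")
            data += chunk
        return data

    def send_query(sock, msg):
        wire = msg.to_wire()
        sock.sendall(struct.pack("!H", len(wire)) + wire)  # two-byte length prefix

    def recv_response(sock):
        (length,) = struct.unpack("!H", recv_exact(sock, 2))
        return dns.message.from_wire(recv_exact(sock, length))

    ctx = ssl.create_default_context()  # default certificate validation
    with socket.create_connection((RESOLVER, 853)) as raw_sock:
        with ctx.wrap_socket(raw_sock, server_hostname=RESOLVER) as tls_sock:
            queries = [dns.message.make_query(name, "A")
                       for name in ("example.com", "example.org", "example.net")]
            pending = {q.id: q for q in queries}
            for q in queries:                 # pipelining: send all queries up front
                send_query(tls_sock, q)
            while pending:                    # responses may arrive in any order
                resp = recv_response(tls_sock)
                pending.pop(resp.id, None)    # match a response to its query by message ID
                print(resp.question[0], resp.rcode())

A stop-and-wait client would instead pay one full round trip per query; pipelining over a persistent connection amortizes both the handshake and the per-query round trips.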
The main measurement approaches that we use in this study are network trace anal-
ysis and modeling.
First, we analyze network traces to measure and study protocol tuning for connec-
tion and session reuse. We study the connection hit fraction at both clients and servers to
understand the benefit of connection reuse. We estimate server memory requirements
based on total concurrent connections and conservative memory per connection. We
then derive appropriate connection reuse timeouts from analysis, in order to balance
client latency and server memory requirement. To the best of our knowledge, we are
the first to combine estimated server memory and connection hit fraction to discover the
reasonable timeout of TCP connections and TLS sessions for DNS.
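As a rough illustration of this style of analysis (not the dissertation's actual analysis code), the sketch below estimates peak concurrent connections from a query trace for a given timeout window and multiplies by an assumed per-connection memory cost; both constants are illustrative assumptions.

    TIMEOUT_S = 20            # e.g., the 20 s timeout suggested for authoritative servers
    BYTES_PER_CONN = 150_000  # assumed conservative per-connection memory cost

    def peak_concurrent_connections(events, timeout=TIMEOUT_S):
        """events: iterable of (timestamp_seconds, client_id), sorted by timestamp.
        A connection stays open for `timeout` seconds after the client's last query."""
        last_seen = {}
        peak = 0
        for t, client in events:
            # drop connections that have been idle longer than the timeout window
            last_seen = {c: ts for c, ts in last_seen.items() if t - ts <= timeout}
            last_seen[client] = t
            peak = max(peak, len(last_seen))
        return peak

    # tiny made-up trace: (time in seconds, client identifier)
    trace = [(0.0, "a"), (1.0, "b"), (5.0, "a"), (30.0, "c")]
    peak = peak_concurrent_connections(trace)
    print(peak, "concurrent connections, about", peak * BYTES_PER_CONN / 1e6, "MB of state")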
Second, we model end-to-end client latency based on our measurement and anal-
ysis results. To the best of our knowledge, we are the first to model end-to-end client
latency for DNS over TCP and TLS to study their performance. We show that the latency of
DNS over TCP and TLS approaches that of UDP with all the protocol optimizations.
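The minimal sketch below illustrates the general shape of such an end-to-end latency model under simplifying assumptions (it is not the dissertation's actual model): a query costs one RTT on a connection hit, plus amortized handshake round trips on a miss; the RTT and hit-fraction values are made up for illustration.

    def expected_query_latency(rtt, hit_fraction, handshake_rtts):
        """handshake_rtts: extra round trips paid on a connection miss
        (roughly 1 for TCP, 3 for TCP plus a full TLS handshake)."""
        return rtt * (1 + (1 - hit_fraction) * handshake_rtts)

    RTT_STUB_TO_RECURSIVE = 0.005   # 5 ms, assumed
    HIT_FRACTION = 0.9              # assumed connection-hit fraction at the stub

    for name, handshake in [("UDP", 0), ("TCP", 1), ("TLS over TCP", 3)]:
        latency = expected_query_latency(RTT_STUB_TO_RECURSIVE, HIT_FRACTION, handshake)
        print("%-12s %.1f ms" % (name, latency * 1000))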
1.2.3 DNS Experiment Framework
In the third study (Chapter 4), we use large-scale trace replay and real client-server
implementations, to measure client latency and server memory of connection-oriented
DNS.
This study demonstrates that it is possible to improve security of network request-
response protocols without compromising performance by protocol optimizations that
are demonstrated through measurements of protocol developments, an example for part
of the thesis statement.
Although the security goals and protocol designs in this study are the same as stated
in our second study, we use a completely different measurement approach to answer
additional research questions, such as the dynamics of the performance overhead for DNS
over TCP and TLS. Our experimental results confirm prior models in a real-world imple-
mentation, showing that even if all DNS were shifted to connection-oriented protocols,
memory requirements are manageable. Our experiments show that connection tracking
and cryptographic processing in TLS do not increase CPU usage noticeably over UDP.
The measurement approach that we use in this study is large-scale DNS trace replay
experiments, enabling us to answer open research questions that could not be
answered in the second study.
We conduct a new measurement of dynamic performance of clients and servers for
DNS over TCP and TLS, a new research contribution. The new result is to show the
dynamics of the performance overhead under different scenarios. For example, understanding
worst-case performance overhead can help capacity planning at servers. These
evaluations were not possible in our previous study due to the limitations of analysis and
modeling, where we used the expected RTT for client latency and a fixed, conservative memory
cost per connection for server state. Here we emulate RTT and study client latency with different
RTTs. We replay queries against real servers and study the dynamics of server CPU and
memory requirements in different real server implementations.
Our new measurement approach is a DNS experimentation framework that uses a
single DNS server to emulate the complete DNS hierarchy efficiently. We use a single server
to correctly emulate multiple independent zones while providing correct responses. We
use proxies to solve the problem of multi-level DNS emulation. The naive approach of using
separate servers or virtual machines to host each authoritative zone does not scale to
the number of zones in our experiments. Future DNS studies that need to emulate the DNS
hierarchy would benefit from our experiment architecture. Our DNS framework also
emulates connections for TCP, and supports repeatable experiments. Our system allows
the first evaluation of CPU consumption of DNS over TCP and TLS while prior work
was unable to model CPU costs. We use this framework to conduct connection-oriented
DNS trace replay for the experiments of evaluating DNS over TCP and TLS.
1.2.4 Generalization
Other studies evaluating the trade-off between the security and performance of net-
work request-response protocols can benefit from some of our approaches.
First, the protocol optimizations we select for DNS over TCP can potentially opti-
mize the performance for other similar request-response protocols which use UDP for
small message exchanges. For example, the Network Time Protocol (NTP) uses UDP for
message exchanges, and so suffers from spoofing and amplification attacks similar to those on
DNS over UDP. Although NTP over TCP has not been standardized, the possible use
of TCP for NTP is emerging. A recent Internet draft proposes to use TLS and DTLS to
provide integrity and authentication for NTP [FST18], without giving performance
guidance. NTP over TCP might benefit from approaches similar to those we studied
for DNS over TCP, although future studies are needed. Based on our study of enhanced
DNS security with TCP, future studies can also explore the effectiveness of TCP in mitigating
spoofing and amplification attacks for NTP.
Second, our DNS testbed work provides an example of efficient emulation of large
server infrastructure for request-response protocols. We propose a new experiment
architecture that emulates the complete DNS hierarchy with a single server, avoiding the
alternative of deploying hundreds of servers separately. Future DNS studies that need to
emulate the DNS hierarchy would benefit from our experiment architecture, for example to
study the effects of DNS caching on the query load of authoritative servers.
Finally, we are the first to evaluate connection-oriented DNS at large scale, which
requires emulation of round-trip time (RTT) and massive connections in experiments.
Emulation of RTT is important for experiments of connection-oriented DNS. Before
actual DNS request-response, DNS over TCP requires additional time for connection
setup, where a client sends a connection synchronization message to a server and waits
for the acknowledgment message back from the server. UDP does not incur this extra
message exchange required for connection setup with TCP. Future experiment archi-
tectures for connection-oriented DNS must follow our approach, emulating RTT and
dierent connections, in order to accurately evaluate connection-oriented DNS perfor-
mance, such as client latency and server memory.
1.3 Research Contributions
Each of the above three studies partly supports our thesis statement. Thus, the first con-
tribution is to prove our thesis statement. Additionally, each work has its own research
contributions.
In our study of OCSP latency, we make two contributions. Our first contribution is to
show that OCSP latency improves so that applications can now reliably check certificate
revocation to improve security by using OCSP, contradicting the prevailing wisdom that
OCSP was slow [SHI+12] and applications disabled OCSP due to latency [Lan12]. Our
new measurements of OCSP show its latency has improved significantly since 2012.
We see a median latency of only 20 ms, far lower than the 291 ms reported in prior
studies [SHI+12]. We show that one reason for this improvement is that most OCSP
traffic today is served by CDNs. Our second contribution is a cost evaluation of OCSP
connections. We identify that OCSP verification typically accounts for 10% of the TLS
setup time. OCSP will almost never delay TLS when being run in parallel with the TLS
handshake, and it only adds a modest delay if run sequentially, depending on the
application's implementation. To the best of our knowledge, we are the first to measure
OCSP latency at the network level by examining OCSP network traffic, which has broad
coverage of diverse applications and users, while prior studies measured application-level
OCSP latency using individual applications [SHI+12].
In our study of connection-oriented DNS, we make two contributions. Our first con-
tribution is the design of DNS over TCP. We select a set of key design decisions (such as
query pipelining and out-of-order responses) for DNS over TCP to achieve good performance,
which have been standardized in the specification of DNS over TLS [HZH+16].
Our second contribution is a new analysis of DNS over TCP and TLS showing that the
performance cost is modest with our careful implementation choices. To the best of our
knowledge, ours is the first model of end-to-end client latency for DNS over TCP
and TLS. Our models show that latency increases by only 9% for TLS vs. UDP-only where
TLS is used just from stub to recursive resolver, and it increases by 22% when we add
TCP from recursive to authoritative. With conservative timeouts (20 s at authoritative
servers and 60 s elsewhere) and estimated per-connection memory, we show that server
memory requirements match current hardware: a large recursive resolver may have 24k
active connections requiring about 3.6 GB additional RAM. We show that TCP and TLS
can improve the security and privacy of DNS with reasonable performance, contradict-
ing the prevailing expectations about DNS that suggest connections will balloon client
latency and overwhelm servers with state.
In our study of DNS trace replay, we build a DNS experimentation framework
(LDplayer) and make two contributions. First, we show how LDplayer can scale to
efficiently model a large DNS hierarchy and play back large traces. LDplayer can correctly
emulate multiple independent levels of the DNS hierarchy on a single DNS server
instance, exploiting a combination of proxies and routing to circumvent optimizations
that would otherwise distort results. With LDplayer we show that a single
computer can accurately replay more than 87 k queries per second, twice the typical
query rate at a DNS root letter. Multiple computers can generate traffic at 10–100×
that rate. Second, we demonstrate the capability of modifying the replay to explore
“what if” questions with two experiments. We explore how traffic volume changes
(increasing by 31%) if all DNS queries employ DNSSEC. Relative to prior studies of
DNS over TCP [ZHH+15], our use of trace replay provides strong statements about all
aspects of server memory (15 GB for TCP and 18 GB for TLS) and CPU usage with
a real-world implementation. Other potential applications include studying server hardware
and software under denial-of-service attack, growth in the number or size of zones,
or changes in hardware and software. All of these questions are important operational
concerns today. While some have been answered through one-off studies and custom
experiments or analysis, LDplayer allows evaluation of actual server software, provid-
ing greater confidence in the results. We are the first to evaluate connection-oriented
DNS at large scale, by using trace replay, trace analysis and modeling.
Chapter 2
Measuring the Latency and
Pervasiveness of TLS Certificate
Revocation
In this chapter, we study the latency and pervasiveness of the Online Certificate Status
Protocol (OCSP) to support the thesis.
Transport Layer Security (TLS) is the bedrock of Internet security for web applications,
and it depends on the Public Key Infrastructure (PKI) to authenticate endpoint
identity. An essential part of a PKI is the ability to quickly revoke certificates after a
key compromise. As the most common way to distribute revocation information, OCSP
allows clients to check for revoked certificates by sending short HTTP requests to servers
of the respective Certificate Authorities (CAs). However, prior concerns about OCSP
latency and privacy raise questions about its use [SHI+12, Lan12]. Most major web
browsers do not reliably check certificate revocation information [LTZ+15]. We evaluate
OCSP using passive network monitoring of live traffic at the Internet uplink of a
large research university and verify the results using active scans at two different sites.
We aim to understand the current latency and deployment optimizations of OCSP, which
may uncover an existing performance improvement unknown to application developers.
Revealing this performance improvement can encourage the adoption of OCSP, which was
previously considered too slow.
This study of OCSP supports our thesis statement that it is possible to improve
security of network request-response protocols without compromising performance by
deployment optimizations that are demonstrated through measurements. The security
improvement we show in this study is that applications can now reliably check certificate
revocation with OCSP. Our study does not improve the security of OCSP. Instead, we
show that the security of TLS and all the other protocols using certificates can be improved
by using OCSP. This security improvement comes with modest cost because of OCSP
latency improvements: we observe a median OCSP latency of 20 ms, compared to 291 ms
in a prior study [SHI+12]. The optimization that we observe is the use of CDNs among
OCSP servers, which minimizes latency. We use active and passive measurements to measure
OCSP latency and its deployment. To the best of our knowledge, we are the first
to examine OCSP network traffic, which has broader coverage of diverse applications and
users. The performance improvement of OCSP revealed by our study can encourage the
adoption of OCSP, which was previously considered too slow, improving Internet security.
This study was joint work with Dr. Johanna Amann and Prof. John Heidemann.
Part of this chapter was published at the Passive and Active Measurement Conference
2016 [ZAH16].
2.1 Introduction
Transport Layer Security (TLS), the successor to the Secure Socket Layer (SSL), is one of
the key building blocks of today’s Internet security. It provides authentication through
its underlying X.509 Public Key Infrastructure (PKI) as well as encryption for end-to-
end communication over the Internet such as online banking and e-mail.
With the millions of certificates that are part of the X.509 PKI, it is inevitable that some
private keys will be compromised by malicious third parties, lost, or corrupted. An
attacker that manages to get access to a certificate’s private key can impersonate its
owner until the certificate’s expiration date. Heartbleed is one example where the private
keys of certificates were potentially exposed [DLK+14, ZCL+14]. Even more risky than
attacks on individual certificates and keys are attacks on the infrastructure of specific
Certificate Authorities (CAs), which can issue certificates for any server (e.g. [Com11,
Bha11, Art11]).
Two primary mechanisms exist to revoke certificates: Certificate Revocation Lists
(CRLs) [CSF+08], which provide downloadable lists of revoked certificates, and the
Online Certificate Status Protocol (OCSP) [SMA+13], which allows clients to check
for revoked certificates by sending short HTTP requests to servers of the respective
CA. Alternatively, OCSP stapling [Pet13] allows revocation information to be sent by
the server in the initial TLS handshake. Some in the security community question
the usefulness and viability of these approaches, citing privacy, speed, and other con-
cerns [SHI+12, Lan12].
Today, most major web browsers do not reliably check certificate revocation infor-
mation [LTZ+15], thus opening their users up to attacks.
In this work, we examine live traffic at the Internet uplink of the University of Cal-
ifornia at Berkeley (UCB) to check the pervasiveness and latency of OCSP, and then
confirm our conclusions with active measurements from two sites.
The primary contribution of this chapter is new measurements of OCSP that show
that OCSP latency has improved significantly since 2012. We see a median latency of
only 20 ms (§2.4), far lower than the 291 ms reported in previous studies [SHI+12]. We
show that one reason for this improvement is that most OCSP traffic today is served
by content delivery networks (CDNs). The second contribution is a cost evaluation of
OCSP connections. We identify that OCSP verification typically accounts for 10% of
the TLS setup time. OCSP will almost never delay TLS when being run in parallel with
the TLS handshake, and it only adds a modest delay if run sequentially (§2.4.3). The
third contribution is an examination of how OCSP is being used today: all popular web
browsers and important non-web applications such as MS-Windows code signing (§2.3)
use OCSP. Furthermore, 88% of the IPv4 addresses that perform TLS queries during
our measurement also perform OCSP queries. The final contribution is that this study
supports the thesis.
2.2 Data Collection
Our study uses passive data collected from live Internet traffic to determine character-
istics and features of OCSP use. We augment our passive data with information from
active scans to verify our timing results and to check which OCSP servers use CDNs.
We use passive measurements to study how OCSP is actually used on the Internet, and
to evaluate the interplay between server and client software. These passive measure-
ments are from a specific site (UCB), so our passive results depend on what sites that
population visits. We take active probes from two sites, Berkeley and the University of
Southern California. While this data source may bias our results, Berkeley has a large
user population and we probe many observed sites, so our data reflects the real expe-
riences of this population, and does not reflect outliers due to rarely used servers. Our
active measurements are from two sites (to avoid some bias), but both are well con-
nected and users with slower connectivity may experience higher latencies. We believe
this dataset is informative and reflects the lookup performance of current OCSP servers
and their use of CDNs, even if future work is needed to confirm the results from other
viewpoints. These risks are common to many measurement studies that depend on data
from large, real populations where multiple data sources are difficult to obtain due to
privacy concerns around network measurement.
For our data collection, we extended the Bro Network Monitor [bro,Pax99] with the
capability to parse OCSP requests and responses. Bro uses a file signature (expressed as
a regular expression) to detect OCSP requests and replies. We correlate OCSP messages
with TLS connections using IP addresses, certificate hashes, and timing information (see
§2.4.3). Our changes will be integrated into the next version of Bro.
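As an illustration of the fields involved in this correlation (written in Python with the `cryptography` package rather than as a Bro script, and assuming a hypothetical file holding a captured response body), the sketch below parses a DER-encoded OCSP response and prints the values that can be matched against the certificate seen in a TLS connection.

    from cryptography.x509 import ocsp

    with open("ocsp_response.der", "rb") as f:        # hypothetical capture of a response body
        resp = ocsp.load_der_ocsp_response(f.read())

    if resp.response_status == ocsp.OCSPResponseStatus.SUCCESSFUL:
        # Fields usable to correlate the answer with a certificate seen on the wire:
        print("certificate status:", resp.certificate_status)   # GOOD / REVOKED / UNKNOWN
        print("serial number:     ", hex(resp.serial_number))
        print("issuer key hash:   ", resp.issuer_key_hash.hex())
        print("validity window:   ", resp.this_update, "to", resp.next_update)
    else:
        print("responder error:", resp.response_status)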
Our passive measurements cover 56 days of data taken between 2015-07-28 and 2015-
09-28 at the Internet uplink of the University of California at Berkeley (UCB). We record
data for only 56 days of this 63-day period because of outages due to hardware failures,
fire, and preemption by another study that required complete access to the hardware for
about a week. We observed 1690 M TLS connections with certificates and
about 42 M OCSP requests over this period.
After processing the data we noticed that in 0.43% of the OCSP connections, we
have zero (or in a handful of cases negative) lookup times. We verified the correctness
of our measurement manually against network traces and were not able to reproduce
these error cases. We believe these impossible results are caused by interactions between
packet retransmissions and Bro.
2.3 OCSP Use in Applications and Hosts
We first want to understand how widely OCSP is used—how many applications and
hosts make OCSP queries.
Applications: We evaluate which applications use OCSP by examining the user-
agent header of the OCSP requests. Table 2.1 shows the resulting distribution of user-
agents. The majority of the lookups are done by Firefox and system libraries and dae-
mons: Microsoft-CryptoAPI (Windows) and ocspd (Mac OS).
category (share)                     application                                      percent
Web browsers (32.10%)                Firefox                                           31.63%
                                     Chrome                                              .21%
                                     Pale Moon                                           .06%
                                     Opera                                               .06%
                                     Rekonq, Bolt, Midori, Iceweasel, Seamonkey,        <.15%
                                       Safari, Sonkeror, IE, Camino, Epiphany,
                                       Konqueror
Library or daemon used by            ocspd                                             37.15%
  applications (66.87%)              Microsoft-CryptoAPI                               23.74%
                                     securityd                                          4.74%
                                     java                                               1.24%
                                     cfnetwork                                        <.0001%
Email client (.32%)                  Thunderbird                                         .30%
                                     Postbox, Gomeza, Zdesktop, Eudora, Icedove          .02%
Other applications (.33%)            Lightning                                           .31%
                                     Zotero                                              .01%
                                     Celtx, ppkhandler, Komodo, Dalvik, slimerjs,     <.0074%
                                       Unity, Phoenix, Sunbird, Slurp, miniupnpc,
                                       googlebot, Entrust Entelligence Security
                                       Provider
Unknown (.38%)                       Unknown                                             .38%
Table 2.1: OCSP applications (based on HTTP user agent) observed in 41.87 M OCSP
HTTP requests. Date: 2015-07-28 to 2015-09-28.
To understand this distribution, we examine the behavior of common Internet
Browsers (IE, Chrome, Firefox, Safari) and operating systems. We find that Fire-
fox always uses its own user-agent, which is attributable to the fact that it uses
its own encryption library [nss]. Microsoft Internet Explorer and Safari use their
respective operating system functionality for OCSP lookups, not directly revealing
their user-agents. Google Chrome only uses OCSP for Extended Validation
certificates [Lan12, LTZ+15]. It uses the operating system functionality for OCSP lookups on
Windows and Mac OS. On Linux, it performs OCSP requests with its own user-agent.
This use of libraries makes it difficult to distinguish the different browsers. This
problem is exacerbated by the fact that a manual examination of OCSP requests revealed
that Windows and Mac OS also perform OCSP requests for application signatures with
the same user-agent. When we examine all unique OCSP requests (those for different
certificates), we see that 81% of these unique certificates account for nearly all (95%) of
total OCSP requests observed on the wire. Hence, the number of code-signing requests
is at most the number of requests without matching certificates in traffic: 5% of all OCSP
requests and 19% of the unique requests encountered.
Application Comments: While examining the OCSP requests, we noticed a num-
ber of software bugs in different implementations. According to the respective standard,
an OCSP request sent with HTTP GET will be base64-encoded and then URL-encoded.
Some clients do not adhere to this standard, skipping the URL-encoding of requests.
Servers still seem to accept these malformed requests. In our dataset, 99.9% of these
non-standard requests were caused by the Apple ocspd versions 1.0.1 and 1.0.2. The
bug was apparently fixed in version 1.0.3, appearing in MacOS 10.10. We also encoun-
tered requests where the user-agent only contains the string representation of a memory
address.
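The sketch below illustrates the encoding the standard requires for GET requests, using the Python `cryptography` package; the certificate file names and the responder URL are hypothetical, and a real client would take the responder URL from the certificate's Authority Information Access extension.

    import base64
    import urllib.parse

    from cryptography import x509
    from cryptography.hazmat.primitives import hashes, serialization
    from cryptography.x509 import ocsp

    # Hypothetical file names: the server certificate and its issuer's certificate.
    with open("site.pem", "rb") as f:
        cert = x509.load_pem_x509_certificate(f.read())
    with open("issuer.pem", "rb") as f:
        issuer = x509.load_pem_x509_certificate(f.read())

    # Build a DER-encoded OCSP request (SHA1 CertID, as all clients in our traffic used).
    request = ocsp.OCSPRequestBuilder().add_certificate(cert, issuer, hashes.SHA1()).build()
    der = request.public_bytes(serialization.Encoding.DER)

    # RFC 6960 GET encoding: base64-encode the DER request, then URL-encode it into the path.
    encoded = urllib.parse.quote(base64.b64encode(der).decode("ascii"))
    print("http://ocsp.example-ca.com/" + encoded)   # hypothetical responder URL

Skipping the URL-encoding step, as the buggy clients described above do, produces a path whose base64 padding characters are sent unescaped.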
Clients can choose which hash algorithm they wish to use in an OCSP request.
During our monitoring effort, all clients used SHA1.
On a randomly chosen day (2015-08-24), the median sizes of OCSP requests and
responses were 300 and 1900 bytes, respectively.
Use of OCSP by Hosts: To evaluate how many hosts send OCSP, we examine
how many IP addresses send both OCSP and TLS traffic. We found that 88% of IPv4
addresses using TLS also send OCSP, suggesting widespread use of OCSP. We do not
measure IPv6 addresses because hosts that exchange web traffic via TLS on IPv6 still issue
their OCSP requests via IPv4. Underuse of IPv6 for OCSP is likely because of limited
support of IPv6 in OCSP servers: only 45% of the 304 unique OCSP servers we observe
have an IPv6 address.
Figure 2.1: Cumulative distribution (CDF) of OCSP lookup time in seconds, including the TCP handshake
time for the first OCSP request in an HTTP connection, over 41.12 M OCSP lookups (series: all,
no connection reuse, connection reuse, GET, POST; median: 19.25 ms). Date: 2015-07-28 to 2015-09-28.
Please note that Network Address Translation may cause an overestimate of OCSP
deployment. Ideally, one would want to determine the exact number of connections that
use OCSP; however, performing such measurements would require simulating the use of
OCSP caching and is beyond the scope of this chapter.
2.4 Latency of OCSP
Web browsing is very sensitive to latency, and there have been concerns that the latency
introduced by OCSP is too high [Lan12]. In this section, we study OCSP latency in
three ways. First, we measure OCSP latency in live Internet traffic in §2.4.1. Then, we
verify these results with active probes of OCSP servers in §2.4.2. Finally, we compare
OCSP latency to the TLS connection setup latency in §2.4.3.
2.4.1 OCSP Delay in Network Traffic
As a first step, we use our passive dataset (§2.2) to analyze the distribution of OCSP
latency.
Methodology: We define OCSP lookup time as the time from setting up a new
TCP connection to getting the first OCSP response. When multiple OCSP responses
are pipelined over a single TCP connection, we define the lookup time for subsequent
requests from the start of the request to the end of the corresponding response. This
definition reflects the amortization of connection setup time over several requests, but it may
underrepresent the user-perceived time if requests arrive in a burst.
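For comparison with this passive definition, the following sketch actively times a single OCSP exchange from opening a new TCP connection to reading the response; the responder name is hypothetical and `der_request` is only a placeholder for a DER-encoded OCSP request (for example, one built as in the earlier encoding sketch).

    import http.client
    import time

    OCSP_HOST = "ocsp.example-ca.com"   # hypothetical OCSP responder
    der_request = b"..."                # placeholder: DER-encoded OCSP request bytes

    start = time.monotonic()
    conn = http.client.HTTPConnection(OCSP_HOST, 80, timeout=10)
    # The TCP handshake happens when the request is sent, so it is included in the timing.
    conn.request("POST", "/", body=der_request,
                 headers={"Content-Type": "application/ocsp-request"})
    response_body = conn.getresponse().read()
    elapsed_ms = (time.monotonic() - start) * 1000
    print("OCSP lookup time: %.1f ms (%d response bytes)" % (elapsed_ms, len(response_body)))
    conn.close()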
CDNs used in traffic: We find that overall the current OCSP lookup is very quick
with a median time of 19.25 ms (Figure 2.1). Even when we include the connection
setup times by considering only new HTTP connections, the median OCSP lookup
time is still only 23.78 ms. Studies by Stark et al. in 2012 showed medians 14
larger [SHI
+
12] (291 ms compared to our 19 ms). Although most lookups are fast, the
distribution of times has a long tail, with a very few (less than 0.1%) taking 5 seconds
to 8 minutes.
We believe the primary reason OCSP performance has improved since 2012 is that
today most OCSP traffic is served by CDNs. To identify OCSP queries going to CDNs,
we mapped IP addresses in the traffic to hostnames and matched them against well-known CDNs (like Akamai,
Edgecast, and Google) or the presence of CDN names in the reverse hostname.
Table 2.2 shows the fraction of lookups (dynamic traffic) and servers (static OCSP
sites) that we identify as being served or hosted by CDNs. While only 39% of the servers
that are accessed in our passive measurements are hosted by known CDNs, we see that
these servers manage the popular certificates: more than nine-tenths of queries (94%)
are served by CDNs. Service is quite heavily skewed, with 68% of traffic serviced
          Query Traffic        OCSP Servers
CDN       39,313,464    94%    120    39%
other      2,526,338     6%    184    61%
total     41,839,802   100%    304   100%

Table 2.2: CDN usage of 304 unique OCSP servers discovered in our passive monitoring
over two months. Date: 2015-07-28 to 2015-09-28.
by the top 10 busiest OCSP servers (Table 2.3). All of them are handled by third-party
or internal CDNs.
CDNs seen on servers: To get further evidence of the use of CDNs by CAs,
we examine the certificates of an Internet-wide scan of TCP port 443 by Rapid7
Labs [son15]. Using their scan of 2015-09-28, we extract a list of 455 unique OCSP
servers. This list includes 57% of the OCSP servers we discovered, but neither list subsumes the other. We evaluate this list for CDNs using the same method as before. We
find that 29% of the OCSP servers are invalid (non-existent domain), which is probably
caused by misconfigured, outdated, or internal certificates. Of all certificates with
valid servers, 23% are served by CDNs, confirming that many CAs use CDNs for their
OCSP servers. It also shows that CDN use is more common in certificates of popularly
used servers than in all certificates. We believe this skew to be caused by the fact that
popular services keep their certificates updated better than the “average” TLS user. This
result again shows the importance of studying dynamic traffic to differentiate typical
OCSP performance from the worst case.
We have two additional observations about OCSP latency. First, we see that GET
requests are faster than POST requests (median 13.0 ms compared to 22.8 ms, Fig-
ure 2.1). The HTTP standards recommend GET for short requests, and we see about
half of all OCSP requests using this method.
Finally, we see that it is not uncommon for OCSP requests to reuse an existing HTTP
connection, avoiding connection setup latency. In our measurements, 24% of all OCSP
server                       observed CDN                    lookups        %
ocsp.digicert.com            phicdn.net                    6,205,125   14.83%
clients1.google.com          self-hosted                   4,859,409   11.61%
sr.symcd.com                 akamaiedge                    3,778,672    9.03%
ocsp.entrust.net             akamaiedge                    2,421,420    5.79%
ocsp.godaddy.com             self-hosted (using akadns)    2,399,931    5.74%
ocsp.usertrust.com           self-hosted                   2,248,577    5.37%
vassg141.ocsp.omniroot.com   akamai                        1,915,287    4.58%
ss.symcd.com                 akamaiedge                    1,663,053    3.97%
ocsp.comodoca.com            self-hosted                   1,478,911    3.53%
ocsp.verisign.com            akamaiedge                    1,345,724    3.22%
all 294 others                                            13,523,693   32.32%
total                                                     41,839,802     100%

Table 2.3: Top 10 busy OCSP servers and their lookups discovered in our passive monitoring. Date: 2015-07-28 to 2015-09-28.
lookups reuse a connection. Examining random samples of OCSP requests that reused
connections reveals several likely causes: webpages that include
resources from several other sites that share the same OCSP servers, users accessing
pages that share the same OCSP server in quick succession, and checks for end-host
and intermediate certificates that share the same OCSP server. Connection reuse reduces
the overhead significantly: OCSP queries that reuse connections complete with a median
of 10 ms, less than half the time of those that start new connections (24 ms).
Our data includes OCSP requests for both intermediate certificates and leaf certifi-
cates. Our analysis reflects the overall lookup performance of OCSP servers; we do not
study specific types of certificates.
2.4.2 OCSP Server Delay
Our passive study of OCSP traffic emphasizes the performance of the most commonly
used servers. We next augment our study with observations from active probes of OCSP
servers, to verify the results of our passive measurements and to capture a static picture
of the time an application takes to verify the validity of certificates.
Methodology: We actively probe OCSP servers of the Alexa top-1000 from two
different vantage points, UCB and USC. We perform an HTTPS connection attempt for
each site (Table 2.4). We discard 362 (USC: 364) sites with failing DNS lookups, where
servers do not answer HTTPS requests, or where we cannot obtain valid certificate chains.
We obtain complete certificate chains for the remaining 638 (USC: 636) sites. We identify 508 (USC: 506) unique end-host certificates, discarding 130 (USC: 130) duplicate
certificates (typically from sites operated by the same company, such as youtube.com and
google.com). We then query the OCSP servers to check each end certificate using a
custom program that employs the OpenSSL library to send OCSP requests via HTTP
POST. We record the query start and response times. We conducted this experiment on
two well-connected, capable machines (32-core with x86-64 Fedora 21 Linux 4.0.5 and
4-core with x86-64 Fedora 22 Linux 4.2.6). We repeat each query 20 times and report
the median value to avoid outliers.
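For illustration only, a probe of this kind could be reproduced roughly as in the sketch below, which uses the Python requests library and a recent release of the cryptography package rather than our OpenSSL-based program; the certificate file names in the usage comment are placeholders.

import statistics
import time

import requests
from cryptography import x509
from cryptography.hazmat.primitives import hashes, serialization
from cryptography.x509 import ocsp
from cryptography.x509.oid import AuthorityInformationAccessOID, ExtensionOID


def probe_ocsp(cert_path, issuer_path, repeats=20):
    """Build an OCSP request for one certificate, POST it to the responder
    named in the certificate's AIA extension, and return the responder URL
    and the median latency over several repeats."""
    with open(cert_path, "rb") as f:
        cert = x509.load_pem_x509_certificate(f.read())
    with open(issuer_path, "rb") as f:
        issuer = x509.load_pem_x509_certificate(f.read())

    # Find the OCSP responder URL in the Authority Information Access extension.
    aia = cert.extensions.get_extension_for_oid(
        ExtensionOID.AUTHORITY_INFORMATION_ACCESS).value
    url = next(desc.access_location.value for desc in aia
               if desc.access_method == AuthorityInformationAccessOID.OCSP)

    builder = ocsp.OCSPRequestBuilder().add_certificate(cert, issuer, hashes.SHA1())
    request_der = builder.build().public_bytes(serialization.Encoding.DER)

    latencies = []
    for _ in range(repeats):
        start = time.monotonic()
        requests.post(url, data=request_der,
                      headers={"Content-Type": "application/ocsp-request"})
        latencies.append(time.monotonic() - start)
    return url, statistics.median(latencies)


# Hypothetical usage with placeholder file names:
# print(probe_ocsp("site-cert.pem", "issuer-cert.pem"))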
Our active probes show overall short latencies, with a median of 22.28 ms at UCB
(Figure 2.2), which is similar to the median OCSP network delay measured from passively collected data (§2.4.1). It also shows that the computational cost of generating an OCSP
request and parsing the response is small; in our experiment, the time to generate an OCSP
request is normally less than 0.5 ms. The latency of most OCSP requests is acceptable:
at UCB, 77% of the OCSP queries are completed within 50 ms, although there are also
some tardy responses (22%) taking more than 150 ms. This also confirms our passive
measurements of network delay and reinforces that lookup time improved significantly
compared to [SHI+12]. Our measurements from USC show a similar distribution of
OCSP lookup performance, but with slightly smaller latency (median 6.6 ms). We think
the difference is caused by fewer hops to CDNs from our vantage point at USC. The
sites            UCB    USC
considered      1000   1000
no IPv4           29     27
no TLS           308    310
TLS              663    663
no cert/chain     25     27
duplicates       130    130
unique certs     508    506
no OCSP URL        2      2
complete         506    504

Table 2.4: Certificates retrieved at two different sites for OCSP across the Alexa top-1000 websites. Date: 2016-01-09.
[Figure 2.2 plot: CDF of OCSP server delay (seconds) for static probes of the Alexa top-1000 from USC (median 6.6 ms) and UCB (median 22.28 ms), and dynamic network traffic at UCB.]
Figure 2.2: Cumulative distribution of OCSP delay for Alexa top-1000 websites. Date:
2016-01-09.
stepped pattern in Figure 2.2 is caused by certificates sharing the same OCSP servers
and the speed of the different CDNs.
2.4.3 OCSP Overhead in TLS
Our measurements show only modest OCSP delays. However, this cost needs to be
put into the context of the overhead it adds to TLS connection setup. We now examine
how OCSP affects TLS performance during session establishment, using our passive
dataset (§2.2). We define TLS delay as the time between the client hello message and
the first encrypted application data packet sent by the client. During an OCSP query, the
TLS handshake can either be interrupted until an OCSP response is received, or continue
in parallel. In the parallel case, the client must not send its first request to the server until
receiving a valid OCSP response.
Matching OCSP requests to TLS connections: To understand the overhead OCSP
adds to TLS, we must map OCSP messages transmitted via HTTP to their corresponding
TLS connections. We log all TLS connections and information about their certificates,
in addition to all OCSP requests and responses. We then match OCSP requests to TLS
connections using the 4-tuple (source IP, OCSP URL, issuer name hash, serial number)
from both flows and identify the TLS connection closest in time to the OCSP request.
We identify and discard cases where the OCSP request precedes the TLS connection (an
early request) and cases where it follows by more than 10 s (a late request).
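A simplified sketch of this matching step, assuming the TLS and OCSP logs have already been reduced to per-flow records (field names are illustrative, and this is not our analysis code), might look like the following.

import bisect
from collections import defaultdict

LATE_LIMIT = 10.0   # OCSP requests more than 10 s after the TLS connection are "late"

def match_ocsp_to_tls(tls_conns, ocsp_reqs):
    """tls_conns, ocsp_reqs: lists of (time, src_ip, ocsp_url, issuer_hash, serial).
    Match each OCSP request to the TLS connection with the same 4-tuple that is
    closest in time; classify requests as matched, early, late, or unmatched."""
    starts_by_key = defaultdict(list)
    for t, *key in tls_conns:
        starts_by_key[tuple(key)].append(t)
    for starts in starts_by_key.values():
        starts.sort()

    matched, early, late, unmatched = [], 0, 0, 0
    for t, *key in ocsp_reqs:
        starts = starts_by_key.get(tuple(key))
        if not starts:
            unmatched += 1
            continue
        i = bisect.bisect_left(starts, t)
        candidates = [j for j in (i - 1, i) if 0 <= j < len(starts)]
        nearest = min(candidates, key=lambda j: abs(starts[j] - t))
        if starts[nearest] > t:
            early += 1      # OCSP request precedes its nearest TLS connection
        elif t - starts[nearest] > LATE_LIMIT:
            late += 1       # OCSP request follows the TLS connection by > 10 s
        else:
            matched.append((t, starts[nearest]))
    return matched, early, late, unmatched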
Using the method above, we successfully correlate 52% of the 41 M OCSP requests
with TLS connections (matched requests). We discard 17% as early requests, 1.8% as
late requests and are unable to match 30% using the 4-tuple (unmatched requests).
Although we match the majority of requests, the high mismatch rate (including
impossible early requests) stems from several challenges in matching. We believe a
large number of mismatches are caused by dual-stack IPv4/v6 hosts, where TLS connections occur over IPv6 but the OCSP servers only support IPv4. While 88% of IPv4
addresses send both TLS and OCSP requests, 90% of IPv6 addresses send no OCSP
requests. OCSP requests caused by non-TLS services, such as code-signing [Wik],
are another reason for unmatched OCSP requests (see §2.3). Finally, while the
reported packet loss in our monitoring infrastructure is low (about 1% of packets), it
[Figure 2.3 plot: CDFs of latency (seconds) for OCSP (all, median 15.8 ms), TLS (all, median 241.3 ms), TLS with 1 OCSP, TLS with >1 OCSP, and the OCSP/TLS latency ratio (median 0.0965).]
Figure 2.3: Cumulative distribution of OCSP network latency and TLS delay for all
matched pairs. Date: 2015-07-28 to 2015-09-28. The blue dotted line shows the cumu-
lative distribution of the ratio of the sum of OCSP network latency to TLS delay for
every matched pair.
may still prevent the identification and parsing of some TLS and OCSP connections. To
avoid errors, we use only matched requests in the following analysis.
Finally, we filter empty TLS and OCSP queries: we discard 11% of TLS connections
that have no client application data and 0.9% of OCSP lookups that are missing either a
request or response.
OCSP lookup in TLS Delay: Using the paired OCSP queries and TLS connections,
we evaluate how much latency the OCSP lookup adds to TLS connections in Figure 2.3.
The key result is that OCSP typically accounts for one-tenth of the total TLS delay.
We see a median TLS delay of 242 ms compared to a median OCSP lookup time of
15.8 ms. We compute the ratio of OCSP lookup time to TLS delay for each paired
connection; the median ratio is 0.0965. We see some outliers (1.2%) where the OCSP
lookup time exceeds the TLS delay; we expect these cases to be caused by timeouts.
The actual delay OCSP incurs for the user depends on the structure of the application.
Many applications do OCSP validation in parallel with starting the TLS connection. Our
evaluation shows that OCSP lookup time is only about 10% of TLS delay overall, as
shown by the median latency ratio, and the cost is typically 16 ms. This cost suggests
that, if OCSP is performed in parallel with TLS session setup, the OCSP delay is almost
never visible.
In summary, the OCSP lookup latencies we observe have improved significantly
compared to prior reports [SHI+12]. OCSP lookup adds only modest delay to TLS
setup, and potentially adds no latency at all when performed in parallel.
2.4.4 Effectiveness of OCSP Caching
A final component of OCSP latency is caching of OCSP responses. Our data does not
provide an exact picture of caching, but we can use it to estimate the effectiveness of
caching.
The potential of OCSP caching can be seen in the OCSP validity periods. In our passive dataset of OCSP traffic, Table 2.5 shows that most OCSP responses have a validity
period of a week or more; 95% are valid for at least one day.
To estimate the effectiveness of caching, we counted the aggregate
number of OCSP requests relative to TLS connections. Figure 2.4 shows the number of
TLS connections and OCSP requests per day, with a mean of 30 M TLS connections and
only 0.7 M OCSP requests per day. Since we have shown that most browsers and most
IPv4 addresses use OCSP (§2.3), this ratio of 1 OCSP request per 40 TLS connections
suggests very effective caching. To understand the exact impact of OCSP caching, future
work must distinguish cache hits from cases where browsers disable OCSP and from
TLS sessions established by software that does not use OCSP.
The potential of long-term OCSP caching is important because it significantly attenuates the information about end-user browsing that is visible to CAs. Since OCSP
validity      percent
< 1 day            2%
1–6 days          38%
7–10 days         57%
> 10 days          3%

Table 2.5: Distribution of OCSP validity times for 290 k unique certificates. Date: 2015-07-28 to 2015-09-28.
[Figure 2.4 plot: daily counts of TLS connections, HTTP (OCSP) connections, and OCSP requests (left axis, log scale) and the OCSP/TLS and HTTP/TLS ratios (right axis), 2015-07-28 to 2015-09-26.]
Figure 2.4: Daily number of TLS connections, OCSP requests and HTTP (OCSP) con-
nections. Date: 2015-07-28 to 2015-09-28.
replies can be cached for at least one day, the information visible over this channel
is quite limited.
2.5 OCSP In Action: Revoked certificates
The point of OCSP is to revoke certificates that are no longer suitable for use, a condition
that we expect to be very rare but still very important. OCSP is effective in practice—we
see a few examples of revoked certificates in our data.
As expected, there are relatively few revoked certificates. We see OCSP replies for
2,180 unique revoked certificates in our passive dataset that contains OCSP replies for
1,418,315 unique certificates. Only 0.3% of OCSP queries report a revoked certificate.
We manually examined the top 10 revoked certificates by number of OCSP requests
to understand their use. Seven of these were expired code-signing certificates for
software on the Windows platform. The rest were for subdomains of t-mobile.com,
aol.com and lijit.com that were inaccessible in October 2015. We speculate that
these revocations indicate deployed software that has not been updated and is trying to
use discontinued services.
Finally, we observe very few (638, or 0.001%) OCSP responses for 105 unique certificates with the status of “unknown”. Searching for cases where OCSP responses for the
same certificate also returned a different status revealed the cause for 72 of these requests,
covering 12 unique certificates: the most common cause is certificates that have just been issued and are
not yet known to the revocation server. For 4 certificates, the CA returned an unknown
status, apparently without reason (the certificate was valid, and later replies indicate “good” again).
For 1 code-signing certificate, the CA apparently returns unknown after the certificate
expired. For the remainder of the requests we could not identify a reason.
2.6 Related Work
There has been a wealth of work to measure different parts of the TLS and certificate ecosystem, including studies of details of the CA ecosystem [HBKC11], TLS
errors [AAVS13] and certificates contained in root stores [PFS14].
Prior work examined different aspects of TLS certificate revocation. After the 2008
Debian OpenSSL vulnerability and the Heartbleed bug, researchers studied the number
of revocations, revocation patterns and patching behavior [YRS+09, DLK+14, ZCL+14].
In contrast to these studies, which focus on certificate revocation patterns after a vulnerability, we study the performance impact of revocation in general. Researchers have also
proposed alternative approaches to certificate revocation, such as FM radio broadcasts [SLS14], as well as using short-lived certificates to make
revocation unnecessary [TSH+12].
Most recently, Liu et al. use full IPv4 scans and compare them with blacklists [LTZ+15]. They also study the revocation-checking behavior of web browsers and
operating systems as well as Google’s certificate revocation infrastructure. Unlike our work, they do not study the actual use of OCSP on the Internet or its latency impact.
Most related to our work, Stark et al. measured OCSP lookup latency [SHI+12]. Like
their work, we use active and passive approaches to understand OCSP latency. However,
we collect network traffic at a university network with broader coverage. Our data has
a more diverse and much larger set of clients. Furthermore, we also compare the speed
of OCSP connections to the remainder of the TLS handshake. Netcraft published OCSP
performance surveys of major CAs [Netc, Neta]. They use static sites to study OCSP
latency and reliability. In contrast, our analysis uses live network traffic to understand
current OCSP latency.
OCSP stapling [Pet13] was proposed as an alternative. Examining the usage of
OCSP stapling and its overhead is future work.
To the best of our knowledge, no previous work examined OCSP network traffic. Our
analysis of actual traffic patterns provides insight into dynamic traffic, complementing
these prior studies that focused on analysis of static sites.
2.7 Conclusion
This chapter provides new measurements of OCSP latency and its usage. Our measurements show that the speed of OCSP servers has increased tremendously. Due to the
widespread use of CDNs, OCSP almost never has any user-perceived performance cost
when done in parallel with TLS setup, and it adds only about 10% additional latency if
done sequentially (§2.4.1). Privacy has been another concern about OCSP—CAs running the OCSP servers can potentially deduce parts of a user’s browsing behavior. We
have shown that OCSP caching means most queries are sent weekly or at
most daily, limiting this channel (§2.4.4). Finally, we have shown that while certificate
revocations are quite rare (as expected), they do occur in practice (§2.5). Ultimately, we
show that OCSP today is both important and viable—it adds minimal or no user-visible
delay and little privacy cost, and it provides an essential protection against certificate compromise.
This chapter provides strong evidence to support our thesis statement that it is possible to improve the security of network request-response protocols without compromising
performance by protocol and deployment optimizations, demonstrated through
measurements of protocol developments and deployments (§1.2). Specifically, we show
that TLS and all other protocols that use certificates can now reliably
check certificate revocation by using OCSP. We show that this security improvement
comes at modest cost because of OCSP latency improvements. By examining network traffic at the Internet uplink of a large university, we
observe a median OCSP latency of 20 ms, much lower than the 291 ms reported in prior
work [SHI+12]. We further conduct active probes of a set of different OCSP servers,
and discover that most OCSP traffic is served by CDNs. The deployment optimization
of CDN use among OCSP servers is likely the reason for the latency improvement
of OCSP. This revealed performance improvement of OCSP can encourage certificate
applications to adopt OCSP, which was previously considered slow, and thereby improve Internet
security.
In this chapter, we show that it is possible to improve the security of network
request-response protocols without compromising performance, by understanding cur-
rent deployment optimizations of security protocols. In the next chapter, we will show
an additional example of protocol optimizations reducing the performance overhead and
making TCP and TLS usable for DNS.
Chapter 3
Connection-Oriented DNS to Improve
Privacy and Security
In this chapter, we use TCP and TLS to improve the security and privacy of DNS, and
evaluate the performance of DNS over TCP and TLS. We demonstrate parts of the
thesis statement, as described in §1.2.2.
The Domain Name System (DNS) seems ideal for connectionless UDP. However,
UDP brings challenges for DNS, such as eavesdropping, source-address spoofing,
and reply-size limits. We propose T-DNS to address these problems. It uses TCP to
smoothly support large payloads and to mitigate spoofing and amplification for DoS.
T-DNS uses transport-layer security (TLS) to provide privacy for users. Expectations about DNS suggest connections will balloon client latency and overwhelm servers
with state. However, we show that with careful implementation choices, the benefits of
TCP and TLS come at only modest cost: end-to-end latency with TLS to the recursive
resolver is only about 9% slower when UDP is used to the authoritative server, and 22%
slower with TCP to the authoritative server. With conservative timeouts, we show that estimated server memory requirements match current hardware: a large recursive resolver
may have 24k active connections requiring about 3.6 GB of additional RAM.
This study of DNS over TCP and TLS supports our thesis statement. The security improvement is that TCP mitigates spoofing and amplification attacks, and TLS
provides privacy from users to their DNS resolvers. This security improvement does
not compromise performance: both client latency and server memory requirements are
acceptable. The protocol optimizations are connection persistence and a set of implementation decisions: query pipelining, out-of-order responses, TCP fast-open and TLS
connection resumption, and plausible timeouts. The measurement approaches we use
in this study are network trace analysis and modeling. To the best of our knowledge,
we are the first to study connection and session timeouts for DNS over TCP and TLS
by balancing client latency and server memory, and the first to model end-to-end client
latency for DNS over TCP and TLS. Our studies show that DNS over TCP and TLS
is deployable with protocol optimizations, improving the security and privacy of DNS.
Later, in Chapter 4, we will use large-scale trace replay to further demonstrate the performance of DNS over TCP and TLS.
This study was joint work with Zi Hu, Duane Wessels, Allison Mankin, and
John Heidemann. In particular, Zi Hu led the evaluation of computation cost
(§3.6.1) and the experiments for client latency from stub-to-recursive (§3.6.2) and recursive-to-authoritative (§3.6.3). Zi Hu also contributed significantly to the design and implementation of DNS over TCP (§3.3.1).
Part of this chapter was published in the IEEE Symposium on Security and Privacy
2015 [ZHH+15].
3.1 Introduction
DNS is the canonical example of a simple request-response protocol. DNS resolves
domain names like www.iana.org into IP addresses; rendering a single web page
may require resolving several domain names, so it is desirable to minimize the latency
of each query [BMS11]. Requests and responses are typically small (originally required
to be less than 512 B, and today under 1500 B as a practical matter), so a single-packet
request is usually answered with a single-packet reply over UDP. Simplicity and efficiency have prompted DNS use in broader applications [Vix09].
DNS standards have always required support for TCP, but it has been seen as a poor
relative—necessary for large exchanges between servers, but otherwise discouraged.
TCP is more expensive than UDP, since connection setup adds latency with additional
packet exchanges, and tracking connections requires memory and computation at the
server. Why create a connection if a two-packet exchange is sufficient?
This chapter makes two contributions. First, we demonstrate that DNS’s connectionless protocol is the cause of a range of fundamental weaknesses in security and privacy
that can be addressed by connection-oriented DNS. Connections have a well-understood
role in longer-lived protocols such as ssh and HTTP, but DNS’s simple, single-packet
exchange has been seen as a virtue. We show that it results in weak privacy, denial-of-service (DoS) vulnerabilities, and policy constraints, and that these problems increase as
DNS is used in new applications and concerns about Internet safety and privacy grow.
While prior problems have been discussed in isolation (for example, [Bor13a, Ros14])
and individual problems can often be worked around, taken together they prompt revisiting assumptions. We then propose T-DNS, where DNS requests should use TCP by
default (not as a last resort), and DNS requests from end-users should use Transport-Layer
Security (TLS [DR08]). TCP prevents denial-of-service (DoS) amplification against
others, reduces the effects of DoS on the server, and simplifies policy choices about
DNSSEC key size, while TLS protects queries from eavesdroppers up to the recursive
resolver.
The second contribution is to show that the benefits of connection-oriented DNS
in T-DNS come at only modest cost: For clients, end-to-end latency of T-DNS (time
from a stub’s request to an answer, considering all queries and caches) is only moder-
ately more than connectionless DNS. Our models show latency increases by only 9%
for TLS vs. UDP-only when TLS is used just from stub to recursive resolver, and it
increases by 22% when we add TCP from recursive to authoritative. Connection reuse
results in latencies almost the same as UDP once the connection is established. With
moderate timeouts (20 s at authoritative servers and 60 s elsewhere), connection reuse
is high for servers (85–98%), amortizing setup costs for client and server. Connection
reuse for clients is lower (60–80% at the edge, but 20–40% at the root), but still results
in amortized costs and lowered latencies. For servers, connection rates are viable for
modest server-class hardware today. With conservative timeouts (20 s at authoritative
servers and 60 s elsewhere) and overestimates of per-connection memory, a large recur-
sive resolver may have 24k active connections using about 3.6 GB of RAM; authoritative
servers double those needs.
TCP and TLS are well established protocols, and many DNS variations have been
proposed, with TCP in the original specification, and prior proposals to use TLS, DTLS,
SCTP, and HTTP with XML or JSON. The contribution of our work is not protocol
novelty, but a careful evaluation of what is necessary to add established protocols to an
existing ecosystem: evaluation that shows the performance costs are modest and exper-
iments that show the security and privacy benefits are real. With wide belief that con-
nectionless DNS is mandatory for adequate performance, this study addresses a primary
impediment to improving DNS privacy. While we evaluate our specific design, we sug-
gest that our performance evaluation generalizes to most connection-like approaches
to DNS, nearly all of which require some state at both ends. In addition, we iden-
tify the specific implementation choices needed to get good performance with TCP and
TLS; alternative protocols for DNS encryption will require similar optimizations, and
we suggest they will see similar performance.
Why: Connection-based communication is important to improve security in three
ways. First, it improves DNS privacy through the use of encryption. We discuss alternatives in §3.7.4: although some employ UDP, all effectively build connections at the
application layer to keep session keys and manage setup. DNS traffic is important to
protect because hostnames are richer than already visible IP addresses and DNS queries
expose application information (§3.2.2.3). DNS queries are increasingly vulnerable,
with wireless networks, growth of third-party DNS (OpenDNS since 2006 [Ope06] and
Google Public DNS since 2009 [Ram09]), meaning that end-user requests often cross
several networks and are at risk of eavesdropping. Prior work has suggested from-scratch approaches [Opea, Dem10, WW14]; we instead utilize existing standards to provide confidentiality for DNS, and demonstrate only moderate performance costs. As a
side effect, T-DNS also protects DNS queries from tampering over parts of their path.
Second, TCP reduces the impact of denial-of-service (DoS) attacks in several ways.
Its connection establishment forces both sides of the conversation to prove their existence, and it has well-established methods to tolerate DoS attacks [Edd07]. Lack of
these methods has allowed UDP-based DNS to be exploited by attackers with amplification attacks; an anonymous attacker who spoofs addresses through a DNS server can
achieve a 20× increase in traffic to its victim, a critical component of recent multi-Gb/s
DoS attacks [Arb12]. We examine performance under attack in §3.5.
Finally, UDP limits on reply sizes constrain key sizes and DNS applications.
EDNS0 [DGV13] often makes 4096 B replies possible, extending the original 512 B
limit [Moc87b]. However, due to IP fragmentation [DGV13], 1500 B is seen as an
operational constraint, and this limit has repeatedly affected policy choices in DNS security and applications. IP fragmentation presents several dangers: fragments require a
resend-all loss recovery [KM87], about 8% of middleboxes (firewalls) block all fragments [WKNP11], and fragmentation is one component in a class of recently discovered
attacks [HS13]. Of course, current DNS replies strive to fit within current limits [VK12],
but DNSSEC keys approaching 2048 bits lead to fragmentation, particularly during key
rollover (§3.2.2.1). Finally, DNSSEC’s guarantees make it attractive for new protocols
with large replies, but new applications will be preempted if DNS remains limited to
short replies.
How: On the surface, connection-oriented DNS seems untenable, since TCP setup
requires an extra round trip and state on servers. TCP is seen as bad for DNS, and so
TLS’s heavier-weight handshake appears impossible.
Fortunately, we show that connection persistence, reusing the same connection
for multiple requests, amortizes connection setup. We identify the key design and
implementation decisions needed to minimize overhead—query pipelining, out-of-order
responses, TCP Fast Open and TLS connection resumption, and shifting state to clients when
possible. Combined with persistent connections and conservative timeouts, these optimizations balance end-to-end latency and server load.
Our key results show that T-DNS is feasible and that it provides a clean solution to a broad range of DNS problems across privacy, security, and operations. We
support these claims with end-to-end models driven by analysis of day-long traces from
three different types of servers, and with experimental evaluation of prototypes.
3.2 Problem Statement
We next briefly review today’s DNS architecture, the specific problems we aim to solve,
and our threat model.
3.2.1 Background
DNS is a protocol for resolving domain names to different resource records in a globally
distributed database. A client makes a query to a server, which provides a response drawn from a
few dozen specific record types. Domain names are hierarchical with multiple components.
The database has a common root and millions of independent servers.
Originally DNS was designed to map domain names to IP addresses. Its success
as a lightweight, well-understood key-to-value mapping protocol caused its role to
quickly grow to other Internet-related applications [Vix09], including host integrity
identification for anti-spam measures and replica selection in content-delivery networks [CFH+13]. Recently DNS’s trust framework (DNSSEC) has been used to
complement and extend traditional PKI/Certificate Authorities for e-mail [Fin14] and
TLS [HS12].
Protocols: DNS has always run over both connectionless UDP and connection-
oriented TCP transport protocols. UDP has always been preferred, with TCP used pri-
marily for zone transfers to replicate portions of the database, kilobytes or more in size,
across different servers. Responses larger than advertised limits are truncated, prompt-
ing clients to retry with TCP [Vix99]. UDP can support large packets with IP fragmen-
tation, at the cost of new problems discussed below.
The integrity of DNS data is protected by DNSSEC [AAL+05]. DNSSEC provides
cryptographic integrity checking of positive and negative DNS replies, but not privacy.
Since July 2010 the root zone has been signed, providing a root of trust through signed
sub-domains.
As a Distributed System: DNS resolvers have both client and server components.
Resolvers typically take three roles: stub, recursive, authoritative (Figure 3.1). Stub
resolvers are clients that talk only to recursive resolvers, which handle name resolution.
Figure 3.1: Stub, recursive, and authoritative resolvers.
Stubs typically send to one or a few recursive resolvers, with configuration automated
through DHCP [Dro97] or by hand.
Recursive resolvers operate both as servers for stubs and clients to authoritative
servers. Recursive resolvers work on behalf of stubs to iterate through each of the sev-
eral components in a typical domain name, contacting one or more authoritative servers
as necessary to provide a final answer to the stub. Much of the tree is stable and some
is frequently used, so recursive resolvers cache results, reusing them over their time-to-
live.
Authoritative servers provide answers for specific parts of the namespace (a zone).
Replication between authoritative peers is supported through zone transfers with notifi-
cations and periodic serial number inquiries.
This three-level description of DNS is sufficient to discuss protocol performance for
this chapter. We omit both design and implementation details that are not relevant to our
discussion. The complexity of implementations varies greatly [SCRA13]; we describe
some aspects of one operator’s implementation in §3.4.1.
3.2.2 The Limitations of Single-Packet Exchange
Our goal is to remove the limitations caused by optimizing DNS around a single-packet
exchange, as summarized in Table 3.1. We consider transition in §3.3.4.
problem                              current DNS                         with T-DNS (why)
packet size limitations              guarantee: 512 B, typical: 1500 B   64 kB
source spoofing                      spoof-detection depends on          most cost pushed back to spoofer
                                     source ISP                          (SYN cookies in TCP)
privacy (stub-to-recursive)          vulnerable to eavesdropping         privacy (from TLS encryption)
        (recursive-to-authoritative) aggregation at recursive            aggregation, or optional TLS

Table 3.1: Benefits of T-DNS.
3.2.2.1 Avoiding Arbitrary Limits to Response Size
Limitations in payload size are an increasing problem as DNS evolves to improve security. Without EDNS [DGV13], UDP DNS messages are limited to 512 B. With EDNS,
clients and servers may increase this limit (4096 B is typical), although this can lead to
fragmentation, which raises its own problems [KM87]. Due to problematic middleboxes,
clients must be prepared to fall back to 512 B, or to resend the query by TCP. Evidence
suggests that 5% [WKNP11] or 2.6% [Hus13] of users find TCP impeded. Such work-arounds are often fragile, and the complexities of incomplete replies can be a source of
bugs and security problems [HS13].
Evolution of DNS and deployment of DNSSEC have pushed reply sizes larger. We
studied the Alexa top-1000 websites, finding that 75% have replies that are at least 738 B
(see §6.1 for details).
With increasingly larger DNS replies (for example, from longer DNSSEC keys),
IP-level fragmentation becomes a risk in many or all replies. To quantify this problem, Figure 3.2 examines a 10-minute trace with 13.5 M DNSSEC-enabled responses
from one server for .com. Over this real-world trace we model the effects of different key
sizes by replacing current 1024-bit RSA signatures with longer ones. We model regular
operation for several key sizes, showing CDFs for the size of all responses, and dots for
negative responses (medians for NXD; quartiles are within 1% and so are omitted) using
[Figure 3.2 plot: CDF of estimated DNSSEC response size (bytes) for ZSK sizes of 1024, 2048, 3072, and 4096 bits, with DNSKEY points for 2k, 3k, and 4k KSKs (normal case and KSK rollover), NXDomain medians, and the region above 1500 B marked as likely IP fragmentation.]
Figure 3.2: Estimated response sizes with different-length DNSSEC keys. Dots show
sizes for DNSKEY and median for NXDomain replies. (Data: trace and modeling)
NSEC3 [LSAB08], and DNSKEY replies for several sizes of KSK (each row) and ZSK
(different shapes, exact values).
Figure 3.2 shows that with a 2048-bit ZSK, 5% of DNSSEC responses, almost all
NXDomain responses, and some DNSKEYs during rollover will suffer IP fragmentation
(shown in the shaded region above 1500 B).
This evaluation supports our claim that connectionless transport distorts current
operational and security policies. Worries about fragmentation have contributed to
delay and concern about key rollover and use of 2048-bit keys. More importantly, other
designs have been dismissed because of reply sizes, such as proposals to decentralize
signing authority for the DNS root which might lead to requiring TCP for root reso-
lution [Sul14]. For some, this requirement for TCP is seen as a significant technical
barrier forcing use of shorter keys or limitations of algorithms.
Finally, size can also preempt future DNS applications. Recent work has explored
the use of DNS for managing trust relationships (for example, [OKLM12]), so one might
ask how DNS would be used if these constraints on response size were removed. We
examine the PGP web of trust [Pen14] as a trust ecosystem that is unconstrained by
packet sizes. Rather than a hierarchy, PGP builds a mesh of signatures for key authentication, so 20% of keys show 10 or more signatures, and well-connected keys are essential
to connecting the graph. A PGP public key with 4 signatures exceeds 4 kB, and about
40% of keys have 4 signatures or more [Pen14]. If DNS either grows to consider non-hierarchical trust, or if it is simply used to store such information [Wou14], larger replies
will be important.
T-DNS’s use of TCP replaces IP-level fragmentation with TCP’s robust methods for
retry and its bytestream abstraction.
3.2.2.2 Need for Sender Validation
Uncertainty about the source address of senders is a problem that affects both DNS
servers and others on the Internet. Today source IP addresses are easy to spoof, allowing botnets to mount denial-of-service (DoS) attacks on DNS servers directly [ICA07,
Sen12], and to leverage DNS servers as part of an attack on a third party through a DNS
amplification attack [VE06, MP13].
Work-arounds to DNS’s role in DoS attacks exist. Many anti-spoofing mechanisms
have been proposed, and DNS servers are able to rate-limit replies. T-DNS would greatly
reduce the vulnerability of DNS to DoS and its use as DoS leverage against others. Well-established techniques protect DNS servers from TCP-based DoS attacks [Edd07, Sim11],
and TCP’s connection establishment precludes source address spoofing, eliminating
amplification attacks.
We do not have data to quantify the number of DNS amplification attacks. However, measurements of source-IP spoofing show that the number of networks that allow
spoofing has been fairly steady for six years [BKkc13]. Recent measurements of distributed reflective denial-of-service (DRDoS) attacks show that the majority involve
tributed reflective denial-of-service (DRDoS) shows the majority of the attacks involve
DNS amplification [Ros14]. Recent reports of DoS show that DNS amplification is a
serious problem, particularly in the largest attacks [Arb12]. T-DNS suggests a long-term
path to reduce this risk.
Even if TCP reduces DoS attacks, we must ensure it does not create new risks, as we
show experimentally in §3.5. Fortunately, TCP security is well studied due to the web
ecosystem. We describe our approaches to DoS above, and most other known attacks
have defenses. A more detailed list of TCP-specific attacks that do not apply is in §6.8.
3.2.2.3 Need for DNS Privacy
Lack of protection for query privacy is the final problem. Traditionally, privacy of Internet traffic has not been seen as critical. However, recent trends in DNS use and deployment, and documentation of widespread eavesdropping, increase the need for query privacy [Bor13a]. First, end-user queries are increasingly exposed to possible eavesdropping, through use of third-party DNS services such as OpenDNS and Google Public
DNS, and through access on open networks such as WiFi hotspots. Second, the presence
of widespread eavesdropping and misdirection is now well documented, for government espionage [Gre13], censorship [Ano12], and criminal gain [MDL13]. Finally,
ISPs have recognized the opportunity to monetize DNS typos, redirecting non-existent
domain responses (NXDOMAIN hijacking), a practice widespread since 2009 (for
example, [Met09]). For either corporate or national observation or interference, we suggest that one must follow the policies of one’s provider and obey the laws of one’s
country, but we see value in making those policies explicit by requiring interaction with
the operator of the configured recursive name server, rather than making passive observation easy.
DNS is also important to keep private because it is used for many services. While
protecting queries for IP addresses may seem unnecessary if the IP addresses will then
immediately appear in an open IP header, full domain-names provide information well
beyond just the IP address. For web services provided by shared clouds, the domain
name is critical since IP addresses are shared across many services. DNS is also used
for many things other than translating names to IP addresses: one example is anti-spam
services where DNS maps e-mail senders to reputation, exposing some e-mail sources
via DNS [LS12].
Although DNS privacy issues are growing, most DNS security concerns have
focused on the integrity of DNS replies, out of fear of reply modification. The integrity
of DNS replies has been largely solved by DNSSEC, which provides end-to-end integrity
checks.
3.2.3 Threat Model
To understand security aspects of these problems, we next define our threat model.
For DoS attacks exploiting spoofed source addresses, our adversary can send to the
30M currently existing open, recursive resolvers that lack ingress filtering [Mau13].
For query eavesdropping and attacks on privacy, we assume an adversary with access
to the network between the user and the recursive resolver. We assume
aggregation and caching at the recursive resolver provide effective anonymization to
authoritative servers; if not, it could enable TLS.
We also assume the operator of the recursive resolver is trusted. Although outside
the scope of this chapter, this requirement can be relaxed by alternating requests across
several DNS providers, implementing a mix network shuffling requests from multiple
users, or padding the request stream with fake queries. Similarly, privacy attacks using
cache timing are outside our scope, but solved by request padding [JSBM02].
For fragmentation attacks due to limited packet size, we assume an off-path adversary that can inject packets with spoofed source addresses, following Herzberg and
Schulman [HS13].
Other attacks on query integrity are already largely prevented by DNSSEC and so
are outside the scope of this chapter. (T-DNS augments DNSSEC; it is not intended
to replace it.)
We depend on existing mechanisms to avoid person-in-the-middle attacks on T-DNS
setup of TLS, as discussed in §3.3.2. Concurrent with our work, Shulman identified
information leakage in encrypted DNS [Shu14]. This chapter seeks to close the primary
channel; we recognize side channels remain.
T-DNS clients may set their own policy for handling downgrade attacks, where a
request for privacy is declined. An adversary in control of the network can interfere
with TLS negotiation, preventing its use. A conservative client may retry other servers
or refuse to provide non-private DNS, or it may alert the user.
3.3 Design and Implementation of T-DNS
Table 3.2 lists design choices for T-DNS; next we describe in-band TLS negotiation
(our protocol addition) and implementation choices that improve performance (shown
in §3.6). These design choices are critical to amortizing the cost of connections.
3.3.1 DNS over TCP
Design of DNS support for TCP was in the original specification [Moc87b], with later
clarifications [Bel10]. However, implementations of DNS-over-TCP have been underdeveloped because it is not seen today as the common case. We consider three implementation decisions, two of which are required to make TCP performance approach UDP.
feature                    T-DNS        unbound     DNSCrypt      DNSCurve
signaling                  in-band      implicit    implicit      per-query
protocol/port              TCP/53       TCP/443     TCP/443       UDP/53
encryption                 negotiable   from TLS    Curve25519
stub/recursive             yes          yes         yes           no
recursive/authoritative    yes          no          no            yes
pipelining                 yes          no*         from UDP
out-of-order replies       yes          no*         from UDP
TCP Fast Open              yes          no          n/a           n/a
TLS resumption             yes          no          n/a           n/a

Table 3.2: Design choices in T-DNS as compared to alternatives.
Pipelining is sending multiple queries before their responses arrive. It is essential
to avoid round-trip delays that would occur with the stop-and-wait alternative. Batches
of queries are common: recursive resolvers with many clients have many outstanding
requests to popular servers, such as those for .com. End-users often have multiple names
to resolve, since most web pages draw resources from multiple domain names. We
examined 40M web pages (about 1.4% of CommonCrawl-002 [Gre11]) to confirm that
62% of web pages have 4 or more unique domain names, and 32% have 10 or more.
Support for receiving pipelined requests over TCP exists in bind and unbound.
However, neither sends TCP unless forced to by an indication of reply truncation in UDP;
and although explicitly allowed, we know of no widely used client that sends multiple
requests over TCP. Our custom stub resolver supports pipelining, and we are working to
bring T-DNS to the getdns resolver.
Out-of-order processing (OOOP) at recursive resolvers is another important opti-
mization to avoid head-of-line blocking. TCP imposes an order on incoming queries;
OOOP means replies can be in a different order, as defined and explicitly allowed by
RFC-5966 [Bel10]. Without OOOP, queries to even a small percentage of distant servers
will stall a strictly-ordered queue, unnecessarily delaying all subsequent queries. (For
UDP, absence of connections means all prominent DNS servers naturally handle queries
with OOOP.)
We know of no DNS server today that supports out-of-order processing of TCP
queries. Both bind and unbound instead resolve each query for a TCP connection
before considering the next. We have implemented out-of-order processing in our DNS
proxy (converting incoming TLS queries back to UDP at the server), and have a proto-
type implementation in unbound.
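To make these two optimizations concrete, the sketch below (written with the dnspython library, which is not what our tools use) pipelines several queries over one TCP connection and then matches replies by DNS message ID, so it tolerates replies that return in a different order; the server address in the usage comment is a placeholder.

import socket
import struct

import dns.message  # dnspython, assumed available


def read_exact(sock, n):
    """Read exactly n bytes from a TCP socket."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("connection closed mid-message")
        buf += chunk
    return buf


def pipelined_lookup(server, names):
    """Send all queries before reading any reply (pipelining), then accept
    replies in whatever order they arrive (out-of-order processing)."""
    pending = {}
    with socket.create_connection((server, 53), timeout=5.0) as sock:
        for name in names:
            query = dns.message.make_query(name, "A")
            wire = query.to_wire()
            # DNS over TCP prefixes each message with a 2-byte length.
            sock.sendall(struct.pack("!H", len(wire)) + wire)
            pending[query.id] = name
        answers = {}
        while pending:
            (length,) = struct.unpack("!H", read_exact(sock, 2))
            reply = dns.message.from_wire(read_exact(sock, length))
            # Match by message ID (ignoring the unlikely case of duplicate IDs).
            answers[pending.pop(reply.id)] = reply
    return answers


# Hypothetical usage with a placeholder resolver address:
# answers = pipelined_lookup("192.0.2.1", ["example.com", "example.net"])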
Finally, when possible, we wish to shift state from server to client. Per-client state
accumulates in servers with many connections, as previously observed in the TIME-WAIT
state overheads due to closed TCP connections in web servers [FTY99].
Shifting TCP state with DNS is currently being standardized [WA13].
These implementation details are important not only to DNS; their importance has
been recognized before in HTTP [NGBS+97, FTY99]. HTTP/1.1 supports only pipelining, but both are possible in DNS and in the proposed HTTP/2 [Not14].
3.3.2 DNS over TLS
TLS for DNS builds on TCP, with new decisions about trust, negotiation, and imple-
mentation choices.
3.3.2.1 Grounding Trust
TLS depends on public-key cryptography to establish session keys to secure each con-
nection and prevent person-in-the-middle attacks [DR08]. DNS servers must be given
TLS certificates, available today from many sources at little or no cost.
Client trust follows one of several current practices. We prefer DANE/TLSA to
leverage the DNSSEC chain of trust [HS12], but other alternatives are the current public-
key infrastructures (PKI) or trusted Certificate Authorities (CAs) provided out-of-band
(such as from one’s OS vendor or company). To avoid circular dependencies between
T-DNS and DANE, one may bootstrap T-DNS’s initial TLS certificate through external
means (mentioned above) or with DANE without privacy.
3.3.2.2 Upwards TLS Negotiation
T-DNS must negotiate the use of TLS. Earlier protocols selected TLS with separate
ports, but the IETF now encourages in-protocol upgrade to TLS to reduce port usage;
this approach is the current preference for many protocols (IMAP, POP3, SMTP, FTP,
XMPP, LDAP, and NNTP, although most of these do have legacy, IANA-allocated, but
not RFC-standardized, ports to indicate TLS, with XMPP, the most recent, being an exception). We therefore propose a new EDNS0 extension [DGV13] to
negotiate the use of TLS. We summarize our proposal below and have provided a formal specification elsewhere [HZH+14].
Our negotiation mechanism uses a new “TLS OK” (TO) bit in the extended flags
of the EDNS0 OPT record. A client requests TLS by setting this bit in a DNS query.
A server that supports TLS responds with this bit set; then both client and server carry
out a TLS handshake [DR08]. The TLS handshake generates a unique session key that
protects subsequent, normal DNS queries from eavesdropping over the connection.
The DNS query made to start TLS negotiation obviously is sent without TLS encryp-
tion and so should not disclose information. We recommend a distinguished query with
name “STARTTLS”, type TXT, class CH, analogous to current support queries [WC07].
Once TLS is negotiated, the client and server should retain the TLS-enabled TCP
connection for subsequent requests. Either can close connections after moderate idle
periods (evaluated in §3.4), or if resource-constrained.
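As a rough illustration of the client side of this negotiation (again using dnspython rather than our implementation), the probe could be formed as below. The TO bit position shown is a placeholder rather than the value assigned in our specification [HZH+14], and a real client would keep the TCP connection open to carry out the TLS handshake on it, rather than using a one-shot query helper as this sketch does.

import dns.message
import dns.query

TO = 0x4000   # placeholder bit for the proposed EDNS0 "TLS OK" flag (see [HZH+14])

def server_offers_tls(server_ip):
    """Send the distinguished STARTTLS TXT/CH probe with the TO bit set over TCP
    and report whether the server echoes the bit in its EDNS0 flags."""
    probe = dns.message.make_query("STARTTLS", "TXT", "CH")
    probe.use_edns(edns=0, ednsflags=TO)
    reply = dns.query.tcp(probe, server_ip, timeout=5.0)
    return bool(reply.ednsflags & TO)

# Hypothetical usage: server_offers_tls("192.0.2.1")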
3.3.2.3 Implementation Optimizations
Two implementation choices improve performance. TLS connection resumption
allows the server to give all state needed to securely re-create a TLS connection to the
client [SZET08]. This mechanism allows a busy server to discard state, yet an inter-
mittently active client can regenerate that state more quickly than a full, fresh TLS
negotiation. A full TLS handshake requires three round-trip exchanges (one for TCP
and two for TLS); TLS resumption reduces this cost to two RTTs, and reduces server
computation by reusing the master secret and ciphersuite. Experimentally, we see that
resumption is 10× faster than a new connection (§3.6.1).
TLS close notify allows one party to request the other to close the connection. We
use this mechanism to shift TCP TIME-WAIT management to the client.
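The benefit of resumption can be observed with a small experiment such as the sketch below, which uses Python's ssl module (not our GnuTLS/OpenSSL test harness); the host name and port in the usage comment are placeholders, and details such as TLS 1.3 ticket timing are glossed over.

import socket
import ssl
import time


def timed_handshake(host, port, session=None):
    """Time one TLS handshake; pass a saved session to attempt resumption."""
    ctx = ssl.create_default_context()
    start = time.monotonic()
    with socket.create_connection((host, port)) as raw:
        with ctx.wrap_socket(raw, server_hostname=host, session=session) as tls:
            return time.monotonic() - start, tls.session, tls.session_reused


# Hypothetical usage: the second handshake reuses the first session's state.
# full_time, saved, _ = timed_handshake("tls.example", 443)
# resumed_time, _, reused = timed_handshake("tls.example", 443, session=saved)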
3.3.3 Implementation Status
We have several implementations of these protocols. Our primary client implementation
is a custom client resolver that we use for performance testing. This client implements
all protocol options discussed here and uses either the OpenSSL or GnuTLS libraries.
We also have some functionality in a version of dig.
We have three server implementations. Our primary implementation is in a new
DNS proxy server. It provides a minimally invasive approach that allows us to test any
recursive resolver. It receives queries with all of the options described here, then sends
them to the real recursive resolver via UDP. When the proxy and real resolver are on
the same machine or the same LAN, we can employ unfragmented 9 kB UDP packets, avoiding
size limitations and exploiting existing OOOP support for UDP. It uses either the OpenSSL or
GnuTLS libraries.
In the long run we expect to integrate our methods into existing resolvers. We have
implemented subsets of our approach in BIND-9.9.3 and unbound-1.4.21.
3.3.4 Gradual Deployment
Given the huge deployed base of DNS clients and servers and the complexity of some
implementations [WKNP11], any modifications to DNS will take effect gradually, and
those who incur costs must also enjoy benefits. We discuss deployment in detail elsewhere (§6.7), but we summarize it here.
T-DNS deployment is technically feasible because our changes are backwards compatible with current DNS deployments. TLS negotiation is designed to disable itself
when either the client or server is unaware, or if a middlebox prevents communication. Approaches analogous to DNSSEC-trigger [NLn14] may be used to bootstrap
through temporarily interfering middleboxes, and can report long-term interference,
prompting middlebox replacement or perhaps circumvention using a different port. In
the meantime, individuals may choose between immediately correcting the problem or
operating without DNS privacy. DNS already supports TCP, so clients and servers can
upgrade independently and will get better performance with our implementation guidelines. T-DNS benefits from TCP extensions like Fast Open that were only recently
standardized [CCRJ14], so T-DNS performance depends on their deployment (Fast Open
has been in Linux since version 3.7 in December 2012). Gradual deployment does no harm; as clients
and servers upgrade, privacy becomes an option and performance for large responses
improves.
Motivation for deployment stems from T-DNS’s privacy and DoS-mitigation benefits. Some
users today want greater privacy, making it a feature ISPs or public DNS operators can
promote. The DoS-mitigation effects of TCP allow DNS operators to reduce the
amount of capacity overprovisioning needed to handle DoS. T-DNS’s policy benefits from larger reply sizes
require widespread adoption of TCP, but the penalty of slow adoption is primarily lower
performance, so complete deployment is not necessary.
T-DNS deployment is feasible and motivations exist for deployment, but the need
for changes to hardware and software suggests that much deployment will likely follow
the natural hardware refresh cycle.
3.4 Connection Reuse and Resources
Connection reuse is important for T-DNS performance to amortize setup over multiple
queries (§3.6). Reuse poses a fundamental trade-off: with plentiful resources and strict
latency needs, clients prefer long-lived connections. But servers share resources over
many clients and prefer short-lived connections.
We next examine this trade-off, varying connection timeout to measure the connection hit fraction, how often an existing connection can be reused without setup, and
concurrent connections, how many connections are active on a server at any time. We
relate active connections to server resource use.
3.4.1 Datasets
We use three different datasets (Table 3.3) to stand in for stub clients, recursive resolvers,
and authoritative servers in our analysis. These datasets are derived from server logging
(Level 3) or packet capture (the others). While more data is always better, we believe
our data captures very diverse conditions, and more data is very unlikely to change the
conclusions.
DNSChanger: DNSChanger is malware that redirects end-users’ DNS resolvers
to a third party so that its operators can inject advertising. This dataset was collected by the working group that, under government authority, operated replacement recursive resolvers
dataset                            date          client IPs   records
DNSChanger      all-to-one         2011-11-15          15k       19M
                all-to-all         2011-11-15         692k      964M
DITL/Level 3    cns4.lax1          2012-04-18         282k      781M
                cns[1-4].lax1      2012-04-18         655k     2412M
DITL/B-root                        2013-05-29        3118k     1182M

Table 3.3: Datasets used to evaluate connection reuse and concurrent connections. Each
is 24 hours long.
while owners of infected computers were informed [MDL13]. It includes timing of all
queries from end-user IP addresses with this malware as observed at the working group’s
recursive resolvers. We use this dataset to represent stub-to-recursive traffic, and select
traffic to the busiest server (all-to-one) in §3.4.3 and the traffic from all sources to all
servers (all-to-all) in §3.6.4. (We know of no public sources of stub-to-recursive data
due to privacy concerns.)
DITL/Level 3: Level 3 operates DNS service for their customers, and also as an
open public resolver [Rei]. Their infrastructure supports 9 sites, each with around 4
front-end recursive resolvers, each load-balanced across around 8 back-end resolvers,
as verified by the operators. We use their 48-hour trace hosted by DNS-OARC [DOa].
We examine two subsets of this data. We first select a random site (lax1, although we
confirmed other sites give similar results). Most client IP addresses (89%) access only
one site, so we expect to see all traffic for each client in the dataset (cns[1-4].lax1).
Many clients (75%) only access one front-end at a site, so we select the busiest front-end
at this site (cns4.lax1) to provide a representative smaller (but still large) subset. We
use these Level 3 traces to represent a recursive resolver.
DITL/B-Root: This dataset was collected at the B-Root nameserver as part of
DITL-2013 and is also provided through DNS-OARC. We selected B-Root because
at the time of this collection it did not use anycast, so this dataset captures all traffic into
one root DNS instance. (Although as one of 13 instances it is only a fraction of total
root traffic.) We use this traffic to represent an authoritative server, since commercial
authoritative server data is not generally accessible.
Generality: These datasets cover each class of DNS resolver (Figure 3.1) and so
span the range of behavior in different parts of the DNS system and evaluate our design.
However, each dataset is unique. We do not claim that any represents all servers of that
class, and we are aware of quirks in each dataset. In addition, we treat each source IP
address as a computer; NAT may make our analysis optimistic, although this choice is
correct for home routers with DNS proxies.
3.4.2 Trace Replay and Parameterization
To evaluate connection hits for different timeout windows we replay these datasets
through a simple simulator. We simulate an adjustable timeout window from 10 to
480 s, and track active connections to determine the number of concurrent connections
and the fraction of connection hits. We ignore the first 10 minutes of trace replay to
avoid transient effects due to a cold cache.
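
As a concrete illustration, the following minimal sketch (Python; not the actual analysis
code, and the function and variable names are ours) shows the core of such a timeout-window
replay: given a time-sorted list of (timestamp, client) query tuples, it counts how often a
query finds a connection that was used within the window.

    # Minimal sketch of the connection-reuse simulation (illustrative only).
    # queries: list of (time_in_seconds, client_id) tuples sorted by time.
    def replay(queries, window, warmup=600.0):
        last_use = {}                      # client -> time of most recent query
        hits = total = 0
        start = queries[0][0]
        for t, client in queries:
            prev = last_use.get(client)
            if t - start >= warmup:        # skip the first 10 minutes (cold cache)
                total += 1
                if prev is not None and t - prev <= window:
                    hits += 1              # an open connection could be reused
            last_use[client] = t
        return hits / total                # connection hit fraction

    # Example: hit_fraction = replay(trace, window=60.0)

Tracking the number of concurrent connections follows the same pattern, counting at each
instant the clients whose most recent query falls within the window.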
We convert the number of concurrent connections to hardware memory requirements
using two estimates. First, we measure the memory of idle TCP connections experimentally
by opening 10k simultaneous connections to unbound and measuring peak heap size
with valgrind. On a 64-bit x86 computer running Fedora 18, we estimate each TCP con-
nection at 260 kB, and each TLS connection at 264 kB; to this we estimate about 100 kB
kernel memory, yielding 360 kB as a very loose upper bound. Second, Google transi-
tioned gmail to TLS with no additional hardware through careful optimizations, report-
ing 10 kB memory per connection with minimal CPU cost due to TLS [Lan10]. Based
on their publicly available optimizations, we use a conservative 150 kB as the per-connection
memory cost.

[Figure 3.3: Median and quartiles of the number of concurrent connections as a function of
the time-out window (seconds), with memory consumption (GB) on a secondary axis. Black
circles show the design point. Dataset: DNSChanger/all-to-one]
3.4.3 Concurrent Connections and Hit Fraction
Trace replay of the three datasets provides several observations. First we consider
how usage changes over the course of the day, and we find that variation in the num-
ber of active connections is surprisingly small. When we measure counts over one-
second intervals, connections vary by ±10% for Level 3, with slightly more variation
for DNSChanger and less for B-Root (Figure 6.3). Connection hit fractions are even
more stable, varying by only a few percent (Figure 6.4). Given this stability, Figure 3.3,
Figure 3.4 and Figure 3.5 summarize usage with medians and quartiles. (§6.3 shows
raw data.)
The three servers have very different absolute numbers of active connections, consistent
with their client populations (Figure 3.3 DNSChanger: for this dataset, a few thousand
uncorrected users; Figure 3.4 Level 3: many thousand customers per site; and B-Root:
potentially any global recursive resolver).

[Figure 3.4: Median and quartiles of the number of concurrent connections as a function of
the time-out window (seconds), with memory consumption (GB) on a secondary axis. Black
circles show the design point. Datasets: Level 3/cns4.lax1 and DITL/B-Root]

[Figure 3.5: Median connection hit fractions as a function of the time-out window (seconds),
taken server-side. Black circles show the design point. Quartiles omitted since always less
than 1%. Datasets: DNSChanger/all-to-all, DITL/B-Root, and Level 3/cns4.lax1]

All servers show asymptotic hit
fractions with diminishing benefits beyond timeouts of around 100 s (Figure 3.5). The
asymptote varies by server: with a 120 s window, DNSChanger is at 97-98%, Level 3 at
98-99%, and B-Root at 94-96%. These fractions show that connection caching will be
very successful. Since much network traffic is bursty, it is not surprising that caching is
effective.
Finally, comparing the authoritative server (B-Root) with recursive resolvers, we see
the ultimate hit fraction is considerably smaller (consistently several percent lower for
a given timeout). We believe the lower hit fraction at B-Root is due to its diverse client
population and its relatively small zone. We expect this result will hold for servers that
provide static DNS zones. (DNS servers providing dynamic content, such as blackhole
lists, are likely to show different trends.)
Recommendations: We propose timeouts of 60 s for recursive resolvers and 20 s for
authoritative servers, informed by Figure 3.3, Figure 3.4 and Figure 3.5, with a conser-
vative approach to server load. We recommend that clients and servers not preemptively
close connections, but instead maintain them for as long as they have resources. Of
course, timeouts are ultimately at the discretion of the DNS operator who can experi-
ment independently.
These recommendations imply server memory requirements. With 60 s and 20 s
timeouts for recursive and authoritative servers, DNSChanger needs 0.3 GB RAM (2k
connections), Level 3 3.6 GB (24k connections), and B-Root 7.4 GB (49k connections),
based on the 75%iles in Figure 3.3, Figure 3.4 and Figure 3.5, for both user and kernel
memory with some optimization, in addition to memory for actual DNS data. These
values are well within current, commodity server hardware. With Moore's law, memory
is growing faster than root DNS traffic (as seen in DITL [CWFC08]), so future deploy-
ment will be even easier. Older servers with limited memory may instead set a small
timeout and depend on clients to use TCP Fast Open and TLS resumption to quickly restart
terminated connections.
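
These totals follow directly from the concurrent-connection counts above and the conservative
150 kB per-connection estimate from §3.4.2; a minimal check of the arithmetic (Python,
illustrative only):

    # Server RAM estimate = 75%ile concurrent connections x 150 kB per connection.
    PER_CONN_KB = 150
    for name, conns in [("DNSChanger", 2_000), ("Level 3", 24_000), ("B-Root", 49_000)]:
        gb = conns * PER_CONN_KB * 1_000 / 1e9    # kB -> bytes -> GB
        print(name, round(gb, 2), "GB")           # roughly 0.3, 3.6, and 7.4 GB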
                      DNS            TCP-SYNs
action                (UDP)          no cookies     w/ cookies
query                 82 B           76 B           76 B
reply                 200–4096 B     66–360 B       66 B
amplification         3–40           1–6            1

Table 3.4: Amplification factors of DNS/UDP and TCP with and without SYN cookies.
3.5 Performance Under Attack
We next consider the role of DNS in denial-of-service attacks: first DNS’s role in attack-
ing others through amplification attacks, then the performance of a DNS server itself
under attack. In both cases we show that TCP mitigates the problem, and that TLS does
not make things worse.
3.5.1 DNS: Amplifying Attacks on Others
Recently, amplification attacks have used DNS servers to magnify attack effects against oth-
ers [VE06, MP13]. An attacker's botnet spoofs traffic with a source address of the victim,
and the DNS server amplifies a short query into a large reply.
Table 3.4 shows our measurements of amplification factors of three classes of
attacks: DNS over UDP, and DNS over TCP without and with SYN cookies. DNS
allows an attacker to turn a short UDP request into a large UDP reply, amplifying the
attack by a factor of up to 40. TCP can amplify an attack as well, since a single SYN
can solicit multiple SYN-ACK attempts [KHRH14], but only by a factor of 6. With
SYN cookies, TCP does not retransmit SYN-ACKs, so there is no amplification for the
attacker.
DoS-prevention also requires rate limiting, which can help defuse UDP-based ampli-
fication. Such rate limiting will be important during a potential transition from UDP to
TCP for DNS: wide use of TCP can allow more aggressive rate limits for TCP, as we
show in §3.5.2, and partial use of TCP can allow more aggressive rate limiting, as we
discuss next.

[Figure 3.6: Network topology for DoS attack evaluation: legitimate client (F), attackers (A1
to Am), and server (S), joined at an IXP; 1 Gb/s access links (<1 ms and 5 ms latency) with a
200 Mb/s, 5 ms bottleneck link toward the server.]
We conclude that, although TCP does not eliminate DoS attacks, full use of TCP
eliminates amplification of those attacks, and partial use of TCP allows more aggressive
rate limiting during transition.
3.5.2 Direct Denial-of-Service on the DNS Server
We next consider UDP and TCP attacks designed to overwhelm the DNS server itself.
While some DoS attacks overwhelm link bandwidth, UDP attacks on DNS often tar-
get server CPU usage, and TCP attacks overwhelm OS-limits on active connections.
Current DNS operators greatly overprovision to absorb attacks, with best-practices rec-
ommending a factor of three [BKKP00]. We next aim to show that UDP attacks are a
threat, and attacks on a naive TCP service are deadly, but a TCP service using TCP SYN
cookies forces attackers to use far more resources than today.
To evaluate a DoS attack, we deploy the network shown in Figure 3.6 in the DETER
testbed. We send foreground traffic from F to a DNS server S, then evaluate the effects of
attack traffic (A1 to Am) sent to the server. The traffic merges at a router (IXP, an Internet
Exchange Point) and is sent to the server behind a bottleneck link.

[Figure 3.7: UDP-based DNS performance under DoS attack: fraction of failed UDP queries
(left axis) and median UDP query latency in ms (right axis) as a function of attack rate at the
server (queries/second); annotations mark the region of more than 99% CPU usage beyond
about 230k queries/s. Dataset: testbed experiment.]

[Figure 3.8: TCP-based DNS performance under spoofed DoS attack, with (filled circles) and
without (empty circles) SYN cookies: fraction of failed TCP queries (left axis) and median
half-open connections (right axis) as a function of spoofed TCP SYN attack rate at the server
(queries/second). Dataset: testbed experiment.]

The server hosts
a DNS domain with 6.5M names in example.com, and the attacker queries random
names that exist in this domain. The server is a single-core 2.4 GHz Intel Xeon running
Linux Ubuntu-14.04 (64-bit). This scenario (hardware and network speeds) represents
a scaled-down version of a typical deployment.

[Figure 3.9: TCP-based DNS performance with non-spoofed DoS attack: fraction of failed
queries (left axis) and established TCP connections (right axis) as a function of the number of
attackers (attack rate 10 connections/second); reference lines mark the maximum concurrent
connections and the maximum concurrent connections plus backlog. Dataset: testbed
experiment.]

attacker                                   foreground
protocol    src IP      cookies            protocol    resource limit
UDP         spoofed     n/a                UDP         CPU
TCP         spoofed     no                 TCP         TCP control buffers
TCP         spoofed     yes                TCP         TCP control buffers
TCP         real        yes                TCP         TCP control buffers

Table 3.5: Limited resource for each protocol combination in tested DoS attacks.
We compare several combinations of protocols for attacker and legitimate foreground
traffic (Table 3.5). We focus on all-UDP traffic and three cases of all-TCP use to
compare current DNS with proposed TCP deployments. While the future will include
both TCP and UDP usage, these two “pure” cases show the limits. We use NSD-4.1.0
as our DNS server, with the OS and application configured to support either 65k or 4k
TCP sessions. Since we are only testing connection initiation, we do not use persis-
tent connections or TCP fast-open. Foreground traffic is sent with the dig program.
Attack traffic uses a custom-written UDP/TCP flooder or hping3; we vary the number
of attackers and measure the observed attack rate. For each kind of attack scenario, we
repeat the experiment 10 times.
UDP-based DNS under attack: First we consider current DNS where all traffic
is UDP. A UDP receiver cannot verify source addresses, so an attacker will spoof query
source addresses to defeat source-based rate-limiting or filtering. Servers must reply
to both foreground and attack queries; the attacker exploits this to exhaust either host
processing power or network capacity in the reverse path. In the experiment, two attack-
ers can easily DoS the server if not network-limited (we pace the attackers to study a
range of rates). In Figure 3.7 we see that our server handles about 230k queries/s at
full capacity (CPU limited in the gray region). Both reply latency for completed queries
(dashed blue line and right axis) and the number of unanswered queries (solid red line
and left axis) rise dramatically under overload. These results are typical of DNS servers
under DoS attacks that are not limited by network speed; they show why robust DNS
operations are heavily overprovisioned.
DNS-over-TCP under attack: Next we consider three variations of a TCP SYN-
flood attack in Figure 3.8. Here, the attacker’s goal is to exhaust the number of available
TCP connections on the server (right axis), resulting in unanswered queries (left axis).
First, we allow the attacker to spoof source addresses and operate the server with-
out SYN cookies. By default, half-open connections persist for tens of seconds, and
legitimate queries must compete with attack traffic for connection slots. In Figure 3.8,
without SYN cookies on the server (lines with empty circles), a single attacker can eas-
ily send 60k SYN/s and consume all possible TCP connections on the server, resulting
in 70% of foreground queries being dropped. With SYN cookies (lines with filled cir-
cles), all state is pushed back to the sender, so attack traffic consumes no memory at the
server and no foreground replies are lost.
Finally we consider a TCP SYN-flood attack without spoofed addresses. A wise
server will use SYN cookies to prevent spoofed attacks, and will rate-limit new con-
nections to bound non-spoofed attacks. If we rate-limit to 10 new connections/s per IP
address (for DNS with p-TCP, there should never be more than 1 active connection per
IP), and the server has 60k TCP connection slots, then it requires 6k attackers to fully
consume the server's connections. In our experiment we test a scaled-down version of this
scenario: attackers are limited to 10 connections/s and the server supports 4096 active
connections. Figure 3.9 shows the outcome: 5 attackers are required to consume most
connection slots, at which point all legitimate traffic is dropped. Although this experiment
is scaled down to fit our testbed, a full-scale server would support 60k connections
or more. With SYN cookies against spoofers and rate limiting to 1 TCP connection
per source IP when under attack, a single server can tolerate thousands of attackers,
many times the number required for a UDP attack. Large DNS providers often serve from
clusters of machines, requiring even more attackers.
A final threat is attacks on TLS, with the goal of consuming server CPU [Ber11].
Since TLS handshakes are expensive, we expect servers to adopt strict rate limits per
source address, perhaps 4 TLS connections/s per IP address. A 2006 study shows
that a server with a PIII-933 MHz, dual CPU can handle more than 1000 TLS con-
nections/s with optimizations [CDW06] (we expect this number to be much larger on
current hardware), requiring an attacker with 250 machines. Since we require non-
spoofed addresses, active filtering of these attacks becomes possible. A DoS attacker
requires more resources against an all-TLS DNS server when compared to an all-UDP
DNS server. We know that TLS-based websites survive DoS attacks, suggesting TLS-
based DNS can as well.
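
The attacker counts above are simple back-of-the-envelope ratios of server capacity to
per-source rate limit; a minimal sketch of the arithmetic (Python, illustrative; it assumes
each attacker is held exactly to its per-IP limit):

    # Attackers needed ~= server capacity / per-source rate limit.
    tcp_slots, tcp_per_ip = 60_000, 10    # connection slots; new connections/s per IP
    tls_rate, tls_per_ip = 1_000, 4       # TLS handshakes/s the server absorbs; per-IP limit

    print(tcp_slots // tcp_per_ip)        # 6000 attackers to fill all TCP slots within ~1 s
    print(tls_rate // tls_per_ip)         # 250 attackers to saturate TLS handshake capacity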
To summarize, we show that TCP with SYN cookies and TLS greatly increase the
work factor for an attacker to overwhelm the DNS server itself, compared with UDP.
Our experiments use a moderate-size DNS server, but this work-factor increase means
it is even harder for an attacker to defeat large DNS deployments such as authoritative
servers for large zones. With the overhead and performance optimizations we describe,
large-size servers should find TCP and TLS both feasible and highly beneficial to the
mitigation of DoS attacks.
3.6 Client-side Latency
For clients, the primary cost of T-DNS is the additional latency due to connection setup.
Using experiments, we next examine stub-to-recursive and recursive-to-authoritative
query latency with TCP and TLS, highlighting the effects of pipelining and out-of-order
processing. Three parameters affect these results: the computation time needed to exe-
cute a query, the client-server RTT, and the workload. We show that RTTs dominate
performance, not computation. We study RTTs for both stub-to-recursive and recursive-
to-authoritative queries, since the RTT is much larger and more variable in the second
case. We consider two workloads: stop-and-wait, where each query is sent after the
reply for the last is received, and pipelining, where the client sends queries as fast as
possible. These experiments support modeling of end-to-end latency.
3.6.1 Computation Costs
We next evaluate CPU consumption of TLS. Our experiments’ client and server are
4-core x86-64 CPUs, running Fedora 19 with Linux-3.12.8 over a 1Gb/s Ethernet. We
test our own client and the Apache-2.4.6 web-server with GnuTLS and OpenSSL. We
also measure the DNSCurve client [Mat], and the DNSCrypt proxy [Opeb].
step                   OpenSSL      GnuTLS       DNSCrypt/DNSCurve
TCP handshake          0.15 ms                   none
packet handling        0.12 ms                   none
crypto handshake       25.8 ms      8.0 ms       23.2 ms
  key exchange         13.0 ms      6.5 ms       —
  CA validation        12.8 ms      1.5 ms       —
crypto resumption      1.2 ms       1.4 ms       no support
DNS resolution         0.1–0.5 ms   same         same
  crypto               1 ms                      0.7–1.8 ms

Table 3.6: Computational costs of connection setup and packet processing.
We report the median of 10 experimental trials, where each trial is the mean of many
repetitions because each event is brief. We measure 10k TCP handshakes, each by set-
ting up and closing a connection. We estimate TCP packet processing by sending 10k
full-size packets over an existing connection. We measure TLS connection establish-
ment from 1000 connections, and isolate key exchange from certificate validation by
repeating the experiment with CA validation disabled. We measure TLS connection
resumption with 1000 trials.
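
As a rough illustration of this kind of measurement (our numbers come from an instrumented
OpenSSL/GnuTLS client on a 1 Gb/s LAN, not from this sketch), one can time the TCP and
TLS handshakes separately from Python; the host and port below are placeholders, and only
over a low-latency LAN do the wall-clock times approximate the computational costs in
Table 3.6:

    # Time TCP and TLS handshake phases separately (illustrative sketch).
    import socket, ssl, time

    HOST, PORT = "dns.example.net", 853         # hypothetical DNS-over-TLS server
    ctx = ssl.create_default_context()

    t0 = time.monotonic()
    sock = socket.create_connection((HOST, PORT))       # TCP three-way handshake
    t1 = time.monotonic()
    tls = ctx.wrap_socket(sock, server_hostname=HOST)   # TLS handshake, incl. CA validation
    t2 = time.monotonic()
    tls.close()

    print(f"TCP handshake: {(t1 - t0) * 1e3:.2f} ms")
    print(f"TLS handshake: {(t2 - t1) * 1e3:.2f} ms")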
Table 3.6 compares TLS costs: TCP setup and DNS resolution are fast (less than
1 ms). TLS setup is more expensive (8 or 26 ms), although costs of key exchange and
validation vary by implementation. We see that TLS resumption is ten times faster than
full TLS setup for both OpenSSL and GnuTLS.
We also examine DNSCurve and DNSCrypt cost in Table 3.6 and find similar com-
putation is required for their session key establishment. Their client and server can cache
session keys to avoid this computation, but at the expense of keeping server state, just
as T-DNS keeps TCP and TLS state. If elliptic curve cryptography has performance or
other advantages, we expect it to be added to future TLS protocol suites.
Finally, prior work has reported server rates of 754 uncached SSL connections per
second [BHH+10]. These connection rates sustain steady state for recursive DNS, and
two servers will support steady state for our root server traces. Provisioning for peaks
would require additional capacity.
Although TLS is computationally expensive, TLS computation will not generally
limit DNS. For clients, we show (§3.6.5) that RTT dominates performance, not com-
putation. Most DNS servers today are bandwidth limited and run with very light CPU
loads. We expect server memory will be a larger limit than CPU. While our cost estima-
tion is very promising, we are still in the process of carrying out full-scale experimental
evaluation of T-DNS under high load.
3.6.2 Latency: Stub-to-Recursive Resolver
We next carry out experiments to evaluate the effects of T-DNS on DNS use between
stub and both local and public recursive resolvers.
Typical RTTs: We estimate typical stub-to-recursive resolver RTTs in two ways.
First, we measure RTTs to the local DNS server and to three third-party DNS services
(Google, OpenDNS, and Level3) from 400 PlanetLab nodes. These experiments show
ISP-provided resolvers have very low RTT, with 80% less than 3 ms and only 5% more
than 20 ms. Third-party resolvers vary more, but anycast keeps RTT moderate: median
RTT for Google Public DNS is 23 ms, but 50 ms or higher for the “tail” of 10–25% of
stubs; other services are somewhat more distant. Second, studies of home routers show
typical RTTs of 5-15 ms [SFTM13].
Methodology: To estimate T-DNS performance we experiment with a stub resolver
with a nearby (1 ms) and more distant (35 ms) recursive resolver (values chosen to rep-
resent typical extremes observed in practice). We use our custom DNS stub and the
BIND-9.9.3 combined with our proxy as the recursive. For each protocol (UDP, TCP,
TLS), the stub makes 140 unique queries, randomly drawn from the Alexa top-1000
sites [Ale] with DNS over that protocol.

[Figure 3.10: Per-query response times (ms) with a cold cache for 140 unique names with
different protocol configurations and two stub-to-recursive RTTs (1 ms and 35 ms). Cases (a)
through (i) vary the protocol (UDP, TCP, TLS, p-TCP, p-TLS), connection reuse, stop-and-wait
versus pipelined sending, and in-order versus out-of-order processing. Boxes show median and
quartiles. Case (f) uses a different scale.]

We restart the recursive resolver before chang-
ing protocols, so each protocol test starts with a known, cold cache. We then vary
each combination of protocol (UDP, TCP, and TLS), use of pipelining or stop-and-wait,
and in-order and out-of-order processing. Connections are either reused, with multiple
queries per TCP/TLS connection (p-TCP/p-TLS), or no reuse, where the connection
is reopened for each query. We repeat the experiment 10 times and report combined
results.
Cold-Cache Performance: Figure 3.10 shows the results of these experiments. We
see that UDP, TCP, and TLS performance is generally similar when other parameters
are held consistent (compare (a), (b), and (c), or (g), (h), and (i)). Even when the RTT
is 35 ms, the recursive query process still dominates protocol choice and setup costs
are moderate. The data shows that out-of-order processing is essential when pipelin-
ing is used; case (f) shows head-of-line blocking compared to (h). This case shows
that while current servers support TCP, our optimizations are necessary for high per-
formance. Pipelining shows higher latency than stop-and-wait regardless of protocol
(compare (g) with (a) or (i) with (c)). This difference occurs when 140 simultaneous
queries necessarily queue at the server when the batch begins; UDP is affected nearly as
much as TCP and TLS (compare (i) and (h) with (g)). Finally, we see that the costs
of TLS are minimal here: comparing (c) with (b) and (a) or (i) with (g) and (h), natural
variation dominates performance differences.
Warm-Cache Performance: Cold-cache performance is dominated by communi-
cation time to authoritative name servers. For queries where replies are already cached
this communication is omitted and connection setup times become noticeable. For con-
nection handling, the performance of cache hits is equivalent to authoritative replies, so
our recursive-to-authoritative experiments in §3.6.3 represent warm-cache performance
with 100% cache hits. (We verified this claim by repeating our stub-to-recursive experi-
ment, but making each query twice and reporting performance only for the second query
that will always be answered from the cache.) While cache hits are expensive when they
must start new connections, persistent connections completely eliminate this overhead
(Figure 3.11, cases (e) and (f) compared to (a)). In addition, median TCP out-of-order
pipelined connections (cases (h) and (i)) are slightly faster than UDP (case (g)) because
TCP groups multiple queries into a single packet.
We conclude that protocol choice makes little performance difference between stub
and recursive provided the RTT is small, the number of simultaneous queries is not huge,
and connection reuse is possible. This result is always true with cold caches, where
connection setup is dwarfed by communication time to authoritative name servers. This
result applies to warm caches provided connections can often be reused or restarted
quickly. We know that connections can be reused most of the time (§3.4.3), and TCP fast
open and TLS
resumption can reduce costs when they are not reused. A more detailed discussion of
these experiments can be found in §6.4.1.
3.6.3 Latency: Recursive to Authoritative
We next consider performance between recursive resolvers and authoritative name
servers. While recursives are usually near stubs, authoritative servers are globally dis-
tributed with larger and more diverse RTTs.
Typical RTTs: To measure typical recursive-to-authoritative RTTs, we use both the
Alexa top-1000 sites, and for diversity, a random sample of 1000 sites from Alexa top-
1M sites. We query each from four locations: our institution in Los Angeles (isi.edu),
and PlanetLab sites in China (www.pku.edu.cn), UK (www.cam.ac.uk), and Australia
(www.monash.edu.au). We query each domain name iteratively and report the time
fetching the last component, taking the median of 10 trials to be robust to competing
traffic and name server replication. We measure query time for the last component to
represent caching of higher layers.
The U.S. and U.K. sites are close to many authoritative servers, with median RTT of
45 ms, but a fairly long tail with 35% of RTTs exceeding 100 ms. Asian and Australian
sites have generally longer RTTs, with only 30% closer than 100 ms (China), and 20%
closer than 30 ms (Australia), while the rest are 150 ms or more. This jump is due to the
long propagation latency for services without sites physically in these countries. (See
Figure 6.7 for data.)
Methodology: To evaluate query latencies with larger RTTs between client and
server, we set up a DNS authoritative server (BIND-9.9.3) for an experimental domain
(example.com) and query it from a client 35 ms (8 router hops on a symmetric path)
away. Since performance is dominated by round trips and not computation we measure
latency in units of RTT and these results generalize to other RTTs. For each protocol,
we query this name server directly, 140 times, varying the protocol in use. As before,
we repeat this experiment 10 times and report medians of all combined experiments
(Figure 3.11). Variation is usually tiny, so standard deviations are omitted except for
cases (h) and (i).

[Figure 3.11: Per-query response times for 140 repeated queries with different protocols,
measured in RTTs (left axis) and ms (right axis). Cases (a) through (i) vary the protocol (UDP,
TCP, TLS, p-TCP, p-TLS), connection reuse (including TCP fast open), stop-and-wait versus
pipelined sending, and in-order versus out-of-order processing. Medians; boxes add quartiles.]
Performance: Figure 3.11 shows the results of this experiment. We first confirm
that performance is dominated by protocol exchanges: cases (a), (b) and (c) correspond
exactly to 1, 2, and 5 RTTs as predicted. Second, we see the importance of connection
reuse or caching: cases (e) and (f) with reuse have identical performance to UDP, as
does TCP fast open (case (d)).
As before, pipelining for TCP shows a higher cost because the 140 queries queue
behind each other. Examination of packet traces for cases (h) and (i) shows that about
10% of queries complete in about 1 RTT, while additional responses arrive in batches of
around 12, showing stair-stepped latency. For this special case of more than 100 queries
arriving simultaneously, a single connection adds some latency.
We next consider the cost of adding TLS for privacy. The community generally
considers aggregation at the recursive resolver sufficient for anonymity, but TLS may
be desired there for additional privacy or as a policy [Ele11] so we consider it as an
option. Without connection reuse, a full TLS query always requires 5 RTTs (case (c),
175 ms): the TCP handshake, the DNS-over-TLS negotiation (§3.3.2.2), two for the
TLS handshake, and the private query and response.
However, once established, TLS performance is identical to UDP: cases (f) and (a)
both take 1 RTT. Encryption’s cost is tiny compared to moderate round-trip delays when
we have an established connection. We expect similar results with TLS resumption.
Finally, when we add pipelining and out-of-order processing, we see similar behav-
ior as with TCP, again due to how the large, batched queries become synchronized over
a single connection.
We conclude that RTTs completely dominate recursive-to-authoritative query
latency. We show that connection reuse can eliminate connection setup RTT, and we
expect TLS resumption will be as eective as TCP fast-open. We show that TCP is
viable from recursive-to-authoritative, and TLS is also possible. A more detailed dis-
cussion of these experiments can be found in §6.4.2.
3.6.4 Client connection-hit fractions
Connection reuse is important and we found very high reuse from the server’s perspec-
tive (§3.4.3). We next show that client connection-hit fractions are lower because many
clients query infrequently.
[Figure 3.12: Median client-side connection hit fractions with quartiles for larger time-out
windows (hours). Datasets: DNSChanger/all-to-all, Level 3 cns[1-4].lax1, and DITL/B-Root]
[Figure 3.13: Median client-side connection hit fractions with quartiles as a function of the
time-out window (seconds). Datasets: DNSChanger/all-to-all, Level 3 cns[1-4].lax1, and
DITL/B-Root]
To evaluate client connection hit fractions, we replay our three DNS traces through
the simulator from §3.4.3, but we evaluate connection hit fractions per client. Fig-
ure 3.13 shows these results, with medians (lines) and quartiles (bars, with slight offset
to avoid overlap).
[Figure 3.14: End-to-end performance (latency, ms) as a function of protocol choice and
stub-to-recursive RTT (5, 10, 20, 40, and 80 ms), for six stub-to-recursive / recursive-to-
authoritative protocol combinations, cases (a) UDP/UDP through (f) TLS/TLS.]
Among the three traces, the DNSChanger hit fraction exceeds Level 3, which
exceeds B-Root, because servers further up the hierarchy see less traffic from any given
client. We see that the top quartile of clients have high connection hit fractions for all
traces (at 60 s: 95% for DNSChanger, 91% for Level 3, and 67% for B-Root). The con-
nection hit rate for the median client is still fairly high for DNSChanger and Level 3
(89% and 72%), but quite low for B-Root (28%). Since most B-Root content can be
cached, many clients only contact it infrequently and so fail to find an open connection.
These results suggest that clients making few requests will need to restart connec-
tions frequently. Fortunately TCP Fast Open and TLS Resumption allow these clients
to carry the state needed to accelerate this process.
3.6.5 Modeling End-to-End Latency for Clients
With this data we can now model the expected end-to-end latency for DNS users and
explore how stub, recursive and authoritative resolvers interact with different protocols
and caching. Our experiments and measurements provide parameters and focus mod-
eling on connection setup (both latency and CPU costs). Our model captures clients
restarting connections, servers timing out state, and the complex interaction of stub,
recursive, and authoritative resolvers. Our modeling has two limitations. First, we focus
on typical latency for users, per-query; the modeling reflects query frequency, empha-
sizing DNS provisioning for common queries and reflecting queries to rare sites only in
proportion to their appearance in our traces. We do not evaluate mean latency per-site,
since that would be skewed by rarely used and poorly provisioned sites. Second, our
models provide mean performance; they cannot directly provide a full distribution of
response times and “tail” performance [DB13]. In Chapter 4, we will use large-scale
trace replay to determine a full distribution with production-quality servers.
Modeling: We model latency from client to server, L_c, as the probability of connection
reuse (P^C_c) and the cost of setting up a new connection (S^C_c), added to the cost of the
actual query (Q_c):

    L_c = (1 - P^C_c) S^C_c + Q_c                                         (3.1)

From Figure 3.11, Q_c is the same for all methods with an open connection: about one
client-server RTT, or R_c. Setup cost for UDP (S^{C,udp}_c) is 0. With the probability of
TCP fast-open (TFO), P^{TFO}_c, TCP setup costs:

    S^{C,tcp}_c = (1 - P^{TFO}_c) R_c                                     (3.2)

We model TLS setup (S^{C,tls}_c) as the probability of TLS resumption (P^{RE}_c) and its
cost S^{C,tls_r}_c, or the cost of setting up a completely new TLS connection S^{C,tls_n}_c:

    S^{C,tls}_c = P^{RE}_c S^{C,tls_r}_c + (1 - P^{RE}_c) S^{C,tls_n}_c   (3.3)

For simplicity, we assume TCP fast open and TLS resumption have the same timeout, so
P^{RE}_c = P^{TFO}_c. Thus, S^{C,tls_r}_c is 2 R_c + S^{cpu}_r (1 RTT each for TLS
negotiation and handshake) and S^{C,tls_n}_c is 4 R_c + S^{cpu}_n (1 for TCP, 1 for TLS
negotiation, and 2 for the TLS handshake). We set S^{cpu}_n at 25.8 ms and S^{cpu}_r at
1.2 ms (Table 3.6, with and without CA validation). We estimate P^C_c, P^{RE}_c and
P^{TFO}_c from our timeout window and trace analysis (Figure 3.12 and Figure 3.13).

To compute end-to-end latency (stub-to-authoritative, L_{sa}), we combine stub-to-recursive
latency (L_{sr}) with behavior at the recursive resolver. For a cache hit (probability P^N_r)
the recursive resolver can reply immediately. Otherwise it will make several (N^Q_r)
queries to authoritative resolvers (each taking L_{ra}) to fill its cache:

    L_{sa} = L_{sr} + (1 - P^N_r) N^Q_r L_{ra}                            (3.4)

where L_{sr} and L_{ra} follow from Equation 3.1. We model the recursive resolver with the
Level 3 data and the authoritative with B-Root. With our recommended timeouts (60 s and
20 s), we get P^C_{sr} = 0.72 and P^C_{ra} = 0.24. We assume TCP fast open and TLS
resumption last 2 hours at the recursive (P^{RE}_{sr} = P^{TFO}_{sr} = 0.9858) and 7 h at
the authoritative (P^{RE}_{ra} = P^{TFO}_{ra} = 0.8). Prior studies of recursive resolvers
suggest P^N_r ranges from 71% to 89% [JSBM02]. We determine N^Q_r by observing how
many queries BIND-9.9.3 requires to process the Alexa top-1000 sites. We repeat this
experiment 10 times, starting each run with a cold cache, which leads to N^Q_r = 7.24
(standard deviation 0.036, including 0.09 due to query retries). We round N^Q_r to 7 in our
analysis of estimated latency. Although this value seems high, the data shows that many
incoming queries require multiple outgoing queries, both to support DNSSEC and due to
the use of content-delivery networks that perform DNS-based redirection.
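
To make the model concrete, the following sketch (Python, illustrative only; the function
names are ours, and the recursive-to-authoritative RTT and cache-hit probability in the
example are assumptions drawn from the ranges above, not fixed model inputs) evaluates
Equations 3.1 through 3.4 for a few protocol combinations:

    # Sketch of the end-to-end latency model (Equations 3.1-3.4); times in ms.
    S_CPU_NEW, S_CPU_RESUME = 25.8, 1.2    # TLS setup CPU costs (Table 3.6)

    def setup_cost(proto, rtt, p_tfo):
        """Connection setup cost S^C for one client-server leg."""
        if proto == "udp":
            return 0.0
        if proto == "tcp":                                 # Equation 3.2
            return (1 - p_tfo) * rtt
        # TLS, Equation 3.3, assuming P^RE = P^TFO:
        resumed = 2 * rtt + S_CPU_RESUME                   # negotiation + abbreviated handshake
        fresh = 4 * rtt + S_CPU_NEW                        # TCP + negotiation + full handshake
        return p_tfo * resumed + (1 - p_tfo) * fresh

    def leg_latency(proto, rtt, p_conn, p_tfo):
        """Equation 3.1: expected per-query latency over one leg (Q = one RTT)."""
        return (1 - p_conn) * setup_cost(proto, rtt, p_tfo) + rtt

    def end_to_end(sr_proto, ra_proto, rtt_sr, rtt_ra=45.0, p_hit=0.8, n_q=7):
        """Equation 3.4, using the connection parameters recommended in the text."""
        l_sr = leg_latency(sr_proto, rtt_sr, p_conn=0.72, p_tfo=0.9858)
        l_ra = leg_latency(ra_proto, rtt_ra, p_conn=0.24, p_tfo=0.8)
        return l_sr + (1 - p_hit) * n_q * l_ra

    for sr, ra in [("udp", "udp"), ("tls", "udp"), ("tls", "tcp"), ("tls", "tls")]:
        print(sr, ra, round(end_to_end(sr, ra, rtt_sr=5.0), 1), "ms")

Here rtt_ra = 45 ms matches the median recursive-to-authoritative RTT measured in §3.6.3,
and p_hit = 0.8 is one value inside the 71-89% range cited above; both are illustrative
choices rather than the exact parameters behind Figure 3.14.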
Scenarios: With this model we can quickly compare long-term average performance
for different scenarios. Figure 3.14 compares six protocol combinations (each group of
bars). We consider R_{sr} = 5 ms and R_{sr} = 20 ms suitable for a good U.S. or European
ISP, but we report stub-to-recursive RTTs from 5 to 80 ms.
For the local resolver, the analysis shows that use of TCP and TLS to the local
resolver adds moderate latency: current DNS has a mean of 61 ms, and TCP is the same,
and TLS is only 5.4% slower with UDP upstream. Second, we see that use of connec-
tions between recursive and authoritative is more expensive: with TLS stub-to-recursive,
adding TCP to the authoritative is 19% slower and adding TLS to the authoritative more
than 180% slower. This cost follows because a single stub-to-recursive query can lead
to multiple recursive-to-authoritative queries, at large RTTs with a lower connection-
hit fraction. However this analysis is pessimistic; the expected values underestimate
possible locality in those queries.
For a third-party resolver (R_{sr} = 20 ms), the trends are similar but the larger latency
to the recursive resolver raises costs: TLS to recursive (with UDP to authoritative) is
15.5% slower than UDP.
3.7 Related work
Our work draws on prior work in transport protocols and more recent work in DNS
security and privacy.
3.7.1 Siblings: DNSSEC and DANE/TLSA
DNS Security Extensions (DNSSEC) uses public-key cryptography to ensure the
integrity and origin of DNS replies [AAL+05]. Since the 2010 signature of the root
zone, it has provided a root of trust for DNS. DNS-based Authentication of Named
Entities for TLS (DANE/TLSA) allows DNS to serve as a root of trust for TLS certifi-
cates [HS12]. Our work complements these protocols, addressing the related area of
privacy.
Although DNSSEC protects the integrity and origin of requests, it does not address
query privacy. We propose TLS to support this privacy, complementing DNSSEC.
Although not our primary goal, TLS also protects against some attacks such as those
that exploit fragmentation; we discuss these below.
DANE/TLSA's trust model is unrelated to T-DNS's goal of privacy. See §3.3.2.1 for
how they interact.
3.7.2 DNSCrypt and DNSCurve
OpenDNS has offered elliptic-curve cryptography to encrypt and authenticate DNS
packets between stub and recursive resolvers (DNSCrypt [Opea]) and recursive
resolvers and authoritative servers (DNSCurve [Dem10]). We first observe that these
protocols address only privacy, not denial-of-service nor limits to reply size.
These protocols address the same privacy goal as our use of TLS. While ECC is
established cryptography, above this they use a new approach to securing the channel
and a new DNS message format. We instead reuse existing DNS message format and
standard TLS and TCP. Although DNSCrypt and DNSCurve are attractive choices,
we believe TLS’ run-time negotiation of cryptographic protocol is important for long-
term deployment. We also see significant advantage in adopting existing standards
with robust libraries and optimizations (such as TLS resumption) rather than design-
ing bespoke protocols for our new application. In addition, while TLS implementations
have reported recent flaws, our view is that common libraries benefit from much greater
scrutiny than new protocols. Finally, DNSCurve’s mandate that the server’s key be its
hostname cleverly avoids one RTT in setup, but it shifts that burden into the DNS, poten-
tially adding millions of nameserver records should each zone require a unique key.
DNSCrypt suggests deployment with a proxy resolver on the end-user’s computer.
We also use proxies for testing, but we have prototyped integration with existing servers,
a necessity for broad deployment.
We compare DNSCrypt and DNSCurve performance in §3.6.1, and features in
Table 3.2.
3.7.3 Unbound and TLS
We are not the first to suggest combining DNS and TLS. A recent review of DNS privacy
proposed TLS [Bor13a], and NLnet Lab’s Unbound DNS server has supported TLS
since December 2011. Unbound currently supports DNS-over-TLS only on a separate
port, and doesn't support out-of-order processing (§3.3.1), and there is no performance
analysis. Our work adds in-band negotiation and out-of-order processing (see Table 3.2),
and we are the first to study performance of DNS with TCP and TLS. Since the only
difference is signaling TLS upgrade, our performance evaluation applies to other TLS
approaches, although unbound’s use of a new port avoids 1 RTT latency.
3.7.4 Reusing Other Standards: DTLS, TLS over SCTP, HTTPS,
and Tcpcrypt
Although UDP, TCP and TLS are widely used, additional transport protocols exist to
provide different semantics. Datagram Transport Layer Security (DTLS) provides TLS
over UDP [RM12], meeting our privacy requirement. While DTLS strives to be lighter
weight than TCP, it must re-create parts of TCP: the TLS handshake requires reliabil-
ity and ordering, DoS-prevention requires cookies analogous to SYN cookies in TCP’s
handshake, and it caches these, analogous to TCP fast-open. Thus with DoS-protection,
DTLS provides no performance advantage, other than eliminating TCP’s data ordering.
(We provide a more detailed evaluation of these in §6.9.) Applications using DTLS
suffer the same payload limits as UDP (actually slightly worse because of its additional
header), so it does not address the policy constraints we observe. Since DTLS libraries
are less mature than TLS and DTLS offers few unique benefits, we recommend T-DNS.
TLS over SCTP has been standardized [JRT02]. SCTP is an attractive alternative
to TCP because TCP’s ordering guarantees are not desired for DNS, but we believe
performance is otherwise similar, as with DTLS.
Several groups have proposed some version of DNS over HTTP. Kaminsky pro-
posed DNS over HTTP [Kam10] with some performance evaluation [Kam11]; Unbound
runs the DNS protocol over TLS on port 443 (a non-standard encoding on the HTTPS
port); others have proposed making DNS queries over XML [PV11] or JSON [Bor13b]
and full HTTP or HTTPS. Use of port 443 saves one RTT for TLS negotiation, but
using DNS encoding is non-standard, and HTTP encoding is significantly more bulky.
Most of these proposals lack a complete specification (except XML [PV11]) or detailed
performance analysis (Kaminsky provides some [Kam11]). At a protocol level, DNS
over HTTP must be strictly slower than DNS over TCP, since HTTP requires its own
headers, and XML or JSON encodings are bulkier. One semi-tuned proxy shows 60 ms
per query overhead [Ver14], but careful studies quantifying overhead are future work.
Tcpcrypt provides encryption without authentication at the transport layer. This
subset is faster than TLS and shifts computation to the client [BHH+10]. T-DNS uses
TLS for privacy (and DNSSEC for authentication), so tcpcrypt may be an attractive
alternative to TLS. Tcpcrypt is relatively new and not yet standardized. Our analysis
suggests that, since RTTs dominate performance, tcpcrypt will improve but not qualita-
tively change performance; experimental evaluation is future work.
The very wide use of TCP and TLS-over-TCP provides a wealth of time-tested
implementations and libraries, while DTLS and SCTP implementations have seen less
exercise. We show that TCP and TLS-over-TCP can provide near-UDP performance
with connection caching. Because DTLS carries out the same protocol exchange as
TLS (when spoof prevention is enabled), it will have the same latency. Our analysis
applies directly to HTTP-based approaches, although its more verbose framing may
have slightly higher overhead.
3.7.5 Other Approaches to DNS Privacy
Zhao et al. [ZHS07] proposed adding cover traffic (additional queries) to DNS to con-
ceal actual queries from an eavesdropper; Castillo-Perez and Garcia-Alfaro extend this
work [CPGA09]. These approaches may help protect against an adversary that controls
the recursive resolver; we instead provide only communications privacy, without range
queries.
Lu and Tsudik [LT10] identify a number of privacy threats to DNS and propose
replacing it with a DHT-based system, and Zhao et al. [ZFHS10] later propose DNS
modifications to support their range queries [ZHS07]. Such approaches can provide
very strong privacy guarantees, but such large protocol modifications pose significant
deployment challenges.
3.7.6 Specific Attacks on DNS
As a critical protocol, DNS has been subject to targeted attacks. These attacks often
exploit currently open DNS recursive name servers, and so they would be prevented
with use of TLS’ secure client-to-server channel. Injection attacks include the Kamin-
sky vulnerability [Kam08], mitigated by changes to DNS implementations; sending of
duplicate replies ahead of the legitimate reply [Ano12], mitigated by Hold-On at the
client [DWZ+12]; and injection of IP fragments to circumvent DNSSEC [HS13], miti-
gated by implementation and operations changes.
Although specific countermeasures exist for each of these attacks, responding to new
attacks is costly and slow. Connection-level encryption like TLS may prevent a broad
class of attacks that manipulate replies (for example, [HS13]). Although TLS is not
foolproof (for example, it can be vulnerable to person-in-the-middle attacks), and we do
not resolve all injection attacks (such as injection of TCP RST or TLS-close notify), we
believe TLS significantly raises the bar for these attacks.
Similarly, recent proposals add cookies to UDP-based DNS to reduce the impact
of DoS attacks [Eas14]. While we support cookies, a shift to TCP addresses policy
constraints as well as DoS, and enables use of TLS.
3.8 Conclusion
This chapter provides a new performance evaluation of DNS over TCP and TLS. Connec-
tionless DNS is overdue for reassessment due to privacy limitations, security concerns,
and sizes that constrain policy and evolution. Traditional expectations about DNS sug-
gest that connections will bring much overhead on client latency and server state. Our
analysis and experiments show that connection-oriented DNS addresses these problems,
and that latency and resource needs of T-DNS are manageable.
This chapter shows strong evidence to support our thesis statement that it is pos-
sible to improve security of network request-response protocols without compromising
performance by protocol optimizations that are demonstrated through measurements of
protocol developments (§1.2). We first demonstrate that DNS's connectionless proto-
col is the cause of a range of fundamental weaknesses in security and privacy that can be
addressed by connection-oriented DNS. The security improvement is that TCP mitigates
spoofing and amplification for DoS, and TLS provides privacy from users to their DNS
resolvers, optionally to authoritative servers. We show that this security improvement
only requires modest cost without compromising performance. Our analysis shows that
latency increases by only 9% when TLS is used from stub to recursive-resolver, and
it increases by 22% when we add TCP from recursive to authoritative. We show that
with moderate timeouts (20 s at authoritative servers and 60 s elsewhere), connection
rates are viable for server-class hardware: a large recursive resolver uses about 3.6 GB
of RAM. To achieve this manageable performance, we identify a set of protocol opti-
mizations for DNS over TCP and TLS: query pipelining, out-of-order responses, TCP
fast-open and TLS connection resumption, and plausible timeouts. Our measurement
approaches are network trace analysis and modeling. We evaluate the connection reuse
at client and server through network trace simulation, and derive appropriate connection
reuse timeouts. We model end-to-end client latency based on our analysis results. To
the best of our knowledge, we are the first to model end-to-end client latency for DNS
over TCP and TLS.
In this chapter, we show that it is possible to improve security of network request-
response protocols without compromising performance, by understanding protocol
development and optimizations that are demonstrated through network trace analysis
and modeling. In the next chapter, we will further support the thesis statement by using
large-scale trace replay to study dynamics of the performance overhead for DNS over
TCP and TLS.
Chapter 4
DNS Experimentation at Scale
In this chapter, we build a configurable, general-purpose DNS experimental framework
named LDplayer. We use this experimental framework to conduct large-scale experi-
ments, further evaluating the performance of DNS over TCP and TLS. We demonstrate
parts of the thesis statement, as described in §1.2.3.
The impact of DNS changes is difficult to model due to complex interactions
between DNS optimizations, caching, and distributed operation. We suggest that exper-
imentation at scale is needed to evaluate changes and facilitate DNS evolution. This
chapter presents LDplayer, a configurable, general-purpose DNS experimental frame-
work that enables DNS experiments to scale in several dimensions: many zones, mul-
tiple levels of DNS hierarchy, high query rates, and diverse query sources. LDplayer
provides high fidelity experiments while meeting these requirements through its dis-
tributed DNS query replay system, methods to rebuild the relevant DNS hierarchy from
traces, and efficient emulation of this hierarchy on minimal hardware. We use LDplayer
to demonstrate the memory and CPU requirements of a DNS root server with all traf-
fic running over TCP and TLS, although in Chapter 3, we estimated server memory by
trace analysis (§3.4) and modeled end-to-end latency (§3.6.5).
This study of DNS over TCP and TLS using large-scale experiments supports our
thesis statement. The security improvement and protocol optimizations that we show in
this chapter are the same as the previous chapter. The security improvement is that TCP
mitigates spoofing and amplification attack, and TLS provides privacy for users. The
protocol optimizations are connection persistence and a set of design choices: query
pipelining, out-of-order responses, TCP fast-open and TLS connection resumption, and
plausible timeouts. We conduct new measurement studies to answer some open research
questions that could not be answered before, such as dynamic cost of DNS over TCP and
TLS. We convert all queries to use TCP and TLS, and conduct experiments of large-scale
DNS trace replay. We show the dynamics of server memory and CPU usage, demon-
strating actual resource requirements. Our experimental results confirm prior models
in a real-world implementation, showing that even if all DNS were shifted to connec-
tion oriented protocols, memory requirements are manageable. Our system allows the
first evaluation of CPU consumption of TCP and TLS while prior work was unable to
model CPU costs. Our experiments show that connection tracking and cryptography
processing in TLS do not increase CPU usage noticeably over UDP.
This study is joint work with Prof. John Heidemann. Part of this chapter was
published in ACM Internet Measurement Conference 2018 [ZH18].
4.1 Introduction
The Domain Name System (DNS) is critical to the Internet. It resolves human-readable
names like www.iana.org to IP addresses like 192.0.32.8 and provides service discovery for
many protocols. Almost all activity on the Internet, such as web browsing and e-mail,
depends on DNS for correct operation. Beyond name-to-address mapping, DNS today has
grown to play various broader roles in the Internet. It provides a query engine for anti-
spam [LS12] and replica selection for content delivery networks (CDNs) [SCKB06].
DANE (DNS-based Authentication of Named Entities) [HS12] provides an additional
source of trust by leveraging the integrity verification of DNSSEC [AAL+05]. The wide
use and critical role of DNS prompt its continuous evolution.
However, evolving the DNS protocol is challenging because it lives in a complex
ecosystem of many implementations, archaic deployments, and interfering middleboxes.
These challenges increasingly slow DNS development: for example, DNSSEC has
taken a decade to deploy [ORMZ08] and current use of DANE is growing but still
small [ZWMH15]. Improvements to DNS privacy are needed [Bor15] and now avail-
able [ZHH+15, HZH+16], but how long will deployment take?
DNS performance issues are also a concern, both for choices about protocol changes,
and for managing inevitable changes in use. There are a number of important open
questions: How does a current server operate under the stress of a Denial-of-Service
(DoS) attack? What is the server and client performance when the protocol or architec-
ture changes? What if all DNS requests were made over QUIC, TCP or TLS? What
about increasing DNSSEC key size?
Ideally measurement and models would guide these questions. However, measure-
ments capture only what is, not what might be, and DNS models are challenging
because of details of how caching and optimizations interact across levels of the DNS
hierarchy and between clients and servers. It is also difficult to estimate performance
limits with DNS involving the kernel, libraries, applications, and distributed services.
Definitive answers to DNS performance therefore require end-to-end controlled
experiments from data-driven trace replay. Experiments enable testing different
approaches for DNS and evaluating the costs and benefits against different infrastruc-
tures, revealing unknown constraints. Trace replay can drive these experiments with
real-world current workloads, or with extrapolated “what-if” workloads.
Accurate DNS experiments are quite challenging. In addition to the requirements
of modeling, the DNS system is large, distributed, and optimized. With millions of
authoritative and recursive servers, it is hard to recreate a global DNS hierarchy in a
controlled experiment. A naive testbed would therefore require millions of separate
servers, since protocol optimizations cause incorrect results when many zones are pro-
vided by one server. Prior DNS testbeds avoided these complexities, instead studying
DNSSEC overhead in a piece of the tree [ADF06] and query distribution of recursive
servers [WFBc04]. While effective for their specific topics, these approaches do not
generalize to support changing protocols, large query rates, and diverse query sources
across a many-level hierarchy.
In this chapter, we present LDplayer, a configurable, general-purpose DNS experi-
mental framework that enables DNS experiments at scale in several dimensions: many
zones, numerous levels of DNS hierarchy, large query rates, and diverse query sources.
Our system provides DNS researchers and operators a basis for DNS experimentation
that can further lead to DNS evolution.
Our first contribution is to show how LDplayer can scale to efficiently model a large
DNS hierarchy and play back large traces (§4.2). LDplayer can correctly emulate multi-
ple independent levels of the DNS hierarchy on a single instance of a DNS server, exploit-
ing a combination of proxies and routing to circumvent optimizations that would oth-
erwise distort results. Our insight is that a single server hosting many different zones
reduces deployment cost; we combine proxies and controlled routing to "pass" queries
to the correct zone so that the server gives the correct answers from a set of different
zones. Emulating multiple zones on limited hardware is a DNS-specific technique that
goes beyond prior systems that replay general network traffic. To this framework
we add a two-level query replay system where a single computer can accurately replay
more than 87k queries per second, twice as fast as typical query rates at a DNS root letter.
Multiple computers can generate traffic in parallel with minimal coordination overhead,
potentially scaling roughly linearly with compute power to much larger rates.
Second, the power of controlled replay of traces is that we can modify the replay
to explore "what if" questions about possible future DNS evolution (§4.5), beyond just
replaying existing traces (§4.4). We demonstrate this capability with two experiments.
We explore how traffic volume changes (increasing by 31%) if all DNS queries employ
DNSSEC (§4.5.1). We also use LDplayer to consider how server memory and client
latency change if all queries were TCP and TLS instead of UDP. Other potential appli-
cations include the study of server hardware and software under denial-of-service attack,
growth of the number or size of zones, or changes in hardware and software. All of these
questions are important operational concerns today. While some have been answered
through one-off studies and custom experiments or analysis, LDplayer allows evalua-
tion of actual server software, providing greater confidence in the results. For example,
relative to prior studies of DNS over TCP [ZHH+15], our use of trace replay provides
strong statements about all aspects of server memory (15 GB for TCP and 18 GB for
TLS) and CPU usage with a real-world implementation (§4.5.2), and discovers previously
unknown discontinuities in client latency.
The software of our system is publicly available at: https://ant.isi.edu/software/ldplayer/index.html.
4.2 LDplayer: DNS trace player
We next describe our requirements, then summarize the architecture and describe critical
elements in detail.
4.2.1 Design Requirements
The goal of LDplayer is to provide a controlled testbed for repeatable experiments and
realistic evaluation of DNS performance, with the following requirements:
Emulate complete DNS hierarchy, efficiently: LDplayer must emulate multiple
independent levels of the DNS hierarchy and provide correct responses using minimal
commodity hardware.
We must support many zones. It is not scalable to use separate servers or virtual
machines to host each zone because of hardware limits and the many different zones in a
network trace. A single server providing many zones of the DNS hierarchy does not work
directly, because the server gives the final DNS answer immediately and skips the round
trip of DNS referral replies.
Replays do not leak traffic to the Internet: Experimental traffic must stay inside
the testbed, without polluting the Internet. Otherwise each experiment could leak bursts
of requests to the real Internet, causing problems for the Internet and the experiment.
Resolving a single query will require interaction of multiple authoritative DNS servers.
For the Internet, leaks of replay from high-rate experiments might stress real-world
servers. For the experiment, we need to control response times, and queries that go
to the Internet add uncontrolled delay and jitter.
Repeatability of experiments: LDplayer needs to support repeatable, controlled
experiments. When an experiment is re-run, the replies to the same set of replayed
queries should stay the same. This reproducibility is very important for experiments
that require fixed query-response content to evaluate changes to DNS, such as
protocol changes and new server implementations. Without building a complete zone, the
responses could change over time when re-looked up. Some zones hosted at CDNs may
have external factors that influence responses, such as load balancing.
Controlled variations in traffic, when desired: Replay must be able to manipulate
traces to answer "what if" questions with variations of real traffic. Since input is nor-
mally network traces in some binary format (for example, pcap), the main challenge is
how to provide a flexible and user-friendly mechanism for query modification. We also
need to minimize the delay caused by query manipulation, so that trace replay is fast
enough to keep up with real time.
Accurate timing at high query rates: LDplayer must be capable of replaying
queries at fast rates, while preserving correct timing, to reproduce interesting real-world
traffic patterns, both in normal operation and under attack. However, both using a single host and
using many hosts have challenges. Due to resource constraints on CPU and the number of
ports, a single host may not be capable of replaying a fast query stream or emulating diverse
sources. A potential solution is to distribute input to different hosts; however, that brings
another challenge in ensuring the correct timing and ordering of individual queries.
Support multiple protocols effectively: LDplayer needs to support both connectionless
(UDP) and connection-oriented (TCP and TLS) transports, given increasing interest in
DNS over connections [ZHH+15]. However, connection-oriented protocols bring challenges
in trace replay: emulating connection reuse and round-trip time (RTT). The query
replay system of LDplayer is the first system that can emulate connection reuse for DNS
over TCP. Emulation of RTT is important for experiments with connection-oriented DNS,
because RTT affects protocol responses with extra messages for connection setup,
while connectionless protocols do not incur those extra messages.
4.2.2 Architecture Overview
We next describe LDplayer’s architecture (Figure 4.1). With captured network traces of
DNS queries (required) and responses (optional), a researcher can use our Zone Con-
structor to generate required zone files. LDplayer uses a logically single authorita-
tive DNS server with proxies to emulate the entire DNS hierarchy (Hierarchy Emulation).
The single DNS server provides all the generated zone files. The proxies manipulate
packet addresses to achieve successful interaction between the recursive and authori-
tative servers, such as providing correct answers to replayed queries.

[Figure 4.1: The architecture of LDplayer.]

As a distributed
query system, the Query Engine replays queries in the captured traces. Optionally, the
researcher can use the Query Mutator to change the original queries arbitrarily for different
replays, and the query mutator can run live with query replay.
Each component in LDplayer addresses a specific design requirement from §4.2.1.
In LDplayer’s zone constructor, we synthesize data for responses and generate required
zone files by performing a one-time fetch of missing records over the Internet (§4.2.3).
We run a real DNS server that hosts these reusable zone files and provides answers
to replayed queries, so that we can get repeatable experiments without disturbing the
Internet.
With generated zone files, we need to emulate the DNS hierarchy to provide correct
answers. Logically, we want many server hosts, one per zone, as in the real world.
However, we compress those down to a single server process with a single network interface
using split-horizon DNS [Car00, spl], so that the system scales to many zones. For
easy deployment, we redirect the replayed experimental traffic to proxies, which then
manipulate packet addresses to simplify routing configuration and to discriminate queries
for different zones to get correct responses (§4.2.4). We could run multiple instances of
the server to support larger query rates and more zones, with routing configuration that
redirects queries to the correct servers.
In LDplayer's query mutator, we pre-process the trace so that query manipulation
does not delay replay. We convert network traces to human-readable plain text for
flexible and user-friendly manipulation. After the necessary query changes, we convert the
resulting text file to a customized binary stream of internal messages for fast query replay
(§4.2.5). In principle, at lower query rates, we could manipulate a live query stream in
near real time.
In LDplayer’s query engine, we use a central controller to coordinate queries from
many hosts and synchronize the time between the end queriers, so that LDplayer can
replay large query rates accurately. The query engine can replay queries effectively over
different protocols (UDP, TCP, or TLS). We distribute queries from the same sources
in the original trace to the same end queriers for replay, in order to emulate queries
from the same sources, which is critical for connection reuse (§4.2.6). LDplayer replays
queries based on the timing in the original trace without preserving query dependencies.
4.2.3 Synthesize Zones to Provide Responses
To support experiment repeatability and avoid leaking bulk experimental DNS queries
to the Internet, we build the zone files that drive the experiment once and then reuse
them in each experiment. We build zones by replaying the queries, once, against the
real-world servers on the Internet and harvesting these responses.
One-time Queries to the Internet: We need to build a DNS hierarchy that includes
answers to all the queries that will be made during replay. When emulating an author-
itative server, we can often acquire the zone from its manager, but when emulating
recursive servers we must recreate all zones that will be queried. (If any part of hierar-
chy is missing, replayed queries may fail.) For example, if .com delegation (NS records
of .com) is missing in the root zone, a recursive server will fail to answer all the queries
for .com names in experiments.
To build a DNS hierarchy that covers all queries, we send all unique queries in the
original trace to a recursive server with a cold cache and allow it to query the Internet to
satisfy each query. In this case, the recursive server walks down the DNS hierarchy,
querying root servers, top-level domain (TLD) servers, and all other necessary authoritative
servers. We then capture all the DNS responses that the authoritative servers return,
recording the traffic at the upstream network interface of the recursive server. Since the
recursive server walks down the DNS hierarchy for each query, the captured trace con-
tains all authoritative data needed to build zones for the parts of the DNS hierarchy that
are needed for the replay. When we do trace replay from our rebuilt zones, a recursive
might fail to resolve a query if the query was not exercised when the zone was generated.
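The commands below sketch this one-time harvest; the interface name, file names, and dig options are illustrative assumptions, not the exact tooling used by LDplayer.

# Capture every authoritative response the cold-cache recursive fetches while
# it resolves each unique query name from the trace (run on the recursive host).
tcpdump -i eth0 -w upstream-responses.pcap 'src port 53' &
rndc flush                                      # start with an empty cache
while read name type; do
    dig @127.0.0.1 "$name" "$type" +tries=1 +time=5 > /dev/null
done < unique-queries.txt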
Zone construction needs to be done only once (we save the recreated zones for reuse),
so any load it places on the original servers is a one-time cost. We also prototyped
an alternative that primes these zones with replies from the trace, but we found that
caching makes raw traces incomplete if the traces are captured after the cache is warm.
We therefore rebuild the entire zone from scratch to provide a consistent snapshot. If an
experiment requires updated zone data, we make an additional pass of zone construction.
Construct Zones from Traces: Given the traces captured at the recursive server,
we next reverse the traces to recreate appropriate zone data.
We convert traces to multiple zone files, since a full DNS query (for example, mail.google.com)
may touch several different servers (root, .com, googlemail.l.google.com,
plus their authoritative nameservers, DNSSEC records, etc.).
We first scan the whole trace and identify authoritative nameservers (NS records) for
different domains and their host addresses (A or AAAA records) from all the responses.
Since most domains have multiple nameservers (for example, google.com has 4
nameservers: ns{1-4}.google.com), a recursive server may send the query to any of them,
based on its own strategy. We group the set of nameservers responsible for the
same domain, and aggregate all DNS response data from the same group of nameservers
by checking the source address in responses. We then generate an intermediate zone file
from the aggregate data.
Since a nameserver can serve multiple different zones, the intermediate zone file
we generate may contain data of different domains and may not be a valid zone file
acceptable to a DNS server. We further split the response data in the intermediate zone
file by domain, and output the corresponding separate zone files. Optionally
we can also merge the intermediate zone files of multiple traces. To determine zone
cuts (which parts of the hierarchy are served by different nameservers), we probe for NS
records at each change of hierarchy.
Similarly, we can recreate a zone file for queries replaying at an authoritative server.
Since only a single authoritative server is involved, without a recursive, the zone file
reconstruction is straightforward.
Recover Missing Data: Sometimes records needed for a complete, valid zone will
not appear in the traces. For example, a valid zone file needs an SOA (Start of Authority)
record and NS records for the zone; however, those records are not required for regular
DNS use. We create a fake but valid SOA record and explicitly fetch NS records if they
are missing.
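For example, a synthesized skeleton might look like the following fragment (the names, timer values, and address are hypothetical placeholders; only the record syntax matters):

; Fake but syntactically valid SOA, plus NS and glue, added when a trace lacks them.
example.com.      3600  IN  SOA  ns1.example.com. hostmaster.example.com. (
                                 1        ; serial
                                 3600 900 604800 86400 )
example.com.      3600  IN  NS   ns1.example.com.
ns1.example.com.  3600  IN  A    192.0.2.1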
Handle inconsistent replies: DNS replies sometimes vary over time, such as
replies from CDNs that balance load across a cluster, or in the unlikely event that the
zone is modified during our rebuild. DNS records can be updated. However, sometimes
[Figure 4.2: With a network tunnel (TUN), server proxies manipulate the source and destination addresses in the queries and responses to make routing work and get the correct responses.]
those updates conflict with each other, such as multiple CNAME records for the same name
while only one is allowed in principle. More often, the address mapping for names may
change over time, such as a content delivery network (CDN) redirecting clients by updating DNS
with its own algorithm.
By default, to build a consistent zone, we choose the first answer when there are
multiple differing responses. Simulating the various CDN algorithms to give different
addresses for queries is future work.
4.2.4 Emulate DNS Hierarchy Efficiently
With zones created from traces, we next introduce how we emulate DNS hierarchy in
order to answer replayed queries correctly in LDplayer. Handling queries to a recur-
sive server requires emulating multiple hierarchical zones, while handling queries to an
authoritative server does not need hierarchy emulation, because only a single zone is involved.
The greatest challenges of emulating full DNS hierarchy in a testbed environment
are scalability to support many different zones and easy deployment. Since we use real
DNS records (such as real public IP addresses) in zone files, the other challenge is how
to make these zone files work in a private testbed environment with local IP addresses.
A naive way would use separate authoritative servers for each zone, each on its own
server. Even with virtual machines, such an approach cannot emulate typical recur-
sive workloads that see hundreds or thousands of zones over days or weeks—it will
encounter limits of memory and virtual network interfaces. We see 549 valid zones in
a 1-hour trace Rec-17 (Table 4.2) captured at a department-level recursive server. DNS
server software can host multiple zones in one server, but optimizations built into com-
mon server software mean that putting the whole hierarchy in one server gives different
results. (Asking for www.example.com will directly produce an IP address from a server
that stores the root, .com, and example.com zones, not three queries.)
Scale to many zones with a single server: To emulate the complete DNS hierarchy
efficiently, we instead contribute a meta-DNS-server: a single authoritative server instance
with a single network interface correctly emulates multiple independent levels of DNS
hierarchy using real zone files, while providing correct responses as if they were inde-
pendent.
Challenges: There are some challenges in making the recursive server successfully
interact with the meta-DNS-server during query replay, because we use a single server
instance and a single network interface to provide authoritative service to all relevant
zones in the trace.
First, how do the queries sent by the recursive server merge to the same network
interface at meta-DNS-server? Typically, if a recursive receives an incoming query
(for example, www.google.com A) with a cold cache, it walks down the DNS hierarchy
(for example, root → com → google.com) and sends queries to the respective authoritative
servers (for example, a.root-servers.net → a.gtld-servers.net → ns1.google.com).
As a result, the queries out of the recursive have a set of different destination IP
addresses. Without changes, those queries will not be routed to the meta-DNS-server by
default.
Second, how does the meta-DNS-server know which zone files to use in order to
answer the incoming queries correctly? When a recursive server resolves an incoming
query iteratively with cold cache, the query content sent by the recursive is the same,
regardless of which level of the DNS hierarchy it is contacting. Assume the meta-DNS-
server receives a query (for example, www.google.com A) which was meant to be sent to
the authoritative server of com. The meta-DNS-server is not able to identify the target
zone (com) based on the query content. The answers from the root, com, and google.com
zones are completely different (a referral answer of com, a referral answer of google.com,
and an authoritative answer of www.google.com A, respectively). A wrong answer
which is not from the correct zone (com) can lead to a broken hierarchy at the recursive
and further failure of query replay.
Third, how are meta-DNS-server’s responses accepted by the recursive server?
Assume the meta-DNS-server can pick the correct zone (for example, com) to answer
queries (we will present the solution later). All the reply packets sent by the meta-DNS-server
have the meta-DNS-server's address as the source IP address. Even if the recursive
receives this "correct" reply, it will not accept the reply, because the reply source
address (the address of the meta-DNS-server) does not match the original query destination
address (for example, the address of a.gtld-servers.net).
Solutions: To overcome those challenges, at a high level, we use split-horizon
DNS [Car00, spl] to host different zones discriminated by incoming query source
addresses. We use a network tunnel (TUN) to redirect all the DNS queries and responses
to proxies. Those proxies further manipulate packet addresses to successfully deliver
the packets and to let the meta-DNS-server find the correct answers (Figure 4.2). We
explain details of our solutions in the following.
To redirect the recursive server's queries to the meta-DNS-server, we must change the
destination or source addresses of those DNS packets.
Before any address manipulation, we first need to capture all the queries and
responses, because any leaked packets are non-routable and dropped, leading to the
failure of trace replay. We create two TUN interfaces to get all required packets at the
recursive and meta-DNS-server, respectively (Figure 4.2). We use port-based routing
so that all queries (packets with destination port 53) at the recursive, and all responses (packets
with source port 53) at the meta-DNS-server, are routed to the TUN interfaces. We manage
this routing using iptables: we first mark the desired packets using the mangle table,
and then redirect all the marked packets to the TUN interfaces. We choose the TUN interface
because it lets us observe all raw IP packets to manipulate IP addresses.
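A minimal sketch of this policy routing is shown below; the mark value, routing-table number, and tun0 interface name are assumptions for illustration, not LDplayer's exact configuration.

# On the recursive server: mark outgoing DNS queries, then route marked
# packets into the TUN interface where the recursive proxy reads them.
iptables -t mangle -A OUTPUT -p udp --dport 53 -j MARK --set-mark 53
iptables -t mangle -A OUTPUT -p tcp --dport 53 -j MARK --set-mark 53
ip rule add fwmark 53 lookup 100
ip route add default dev tun0 table 100
# On the meta-DNS-server the same pattern is applied to responses
# (packets with source port 53) instead of queries.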
We build two proxies (recursive proxy and authoritative proxy) to manipulate packet
addresses at the recursive server and meta-DNS-server respectively (Figure 4.2). The
common task of the proxies is to make sure captured packets can be routed to the
server at the other end smoothly for correct trace replay. Specifically, recursive proxy
captures recursive server’s queries and authoritative proxy captures meta-DNS-server’s
responses. Then, both of the proxies rewrite the destination address with the IP address
of the server at the other end.
To make the meta-DNS-server determine the correct answer and let the recursive
server accept the reply, the proxies replace the source address with the original desti-
nation address in the packets. We will explain the functionality of using original desti-
nation address below. After recalculating the checksum, the proxies send the modified
packets directly to the meta-DNS server and the recursive server respectively.
This process with proxy rewriting allows the meta-DNS server to determine to which
zone each query is addressed. To address the zone selection, the meta-DNS server hosts
multiple zones using software-based, split-horizon DNS [Car00, spl], where a server
provides different answers based on query source and destination addresses. When a
recursive server resolves an incoming query iteratively with a cold cache, the destination
addresses (the target authoritative server addresses) of the iterative queries are the only
identifier for different zones, because the query content is always the same and not
distinguishable by itself. However, matching queries by destination addresses at the
meta-DNS-server requires that the server listen on different network interfaces for each
zone separately, which brings deployment complexity, such as creating many (virtual)
network interfaces and a giant routing table in the testbed. This complexity conflicts with our
goal of scalability and deployability to support many different zones.
With split horizon, we make the meta-DNS-server listen on one address and use the
source IP address to determine for which level of the hierarchy the query is destined.
Since the recursive proxy already replaces the query source address with the original query
destination address (OQDA), the current query source address now becomes the zone
identifier. To correctly discriminate queries for different zones, we take the public IP
addresses of each zone's nameservers as the matching criteria (query source addresses). In
this way, the meta-DNS-server sees a query coming from the OQDA instead of the recursive
server's address (Figure 4.2). The meta-DNS-server then determines the correct
zone file from this source address, and issues a correct reply whose destination
address is the OQDA. As discussed above, the authoritative proxy captures this reply, and
moves the destination address into the source address. As a result, the recursive server observes
a normal reply from the OQDA and can match this reply to the original query, without
knowing about any address manipulation in the background. Our method works with any
authoritative server implementation that supports split-horizon DNS, such as BIND with its
view and match-clients clauses in configuration.
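A hypothetical named.conf fragment below sketches this arrangement: each view's match-clients lists the public addresses of that zone's real nameservers, which the proxies have placed in the query source field. The specific addresses and file names are examples only, not our actual configuration.

view "root" {
    match-clients { 198.41.0.4; };      # e.g., a.root-servers.net
    zone "." { type master; file "zones/root.zone"; };
};
view "com" {
    match-clients { 192.5.6.30; };      # e.g., a.gtld-servers.net
    zone "com" { type master; file "zones/com.zone"; };
};
view "google.com" {
    match-clients { 216.239.32.10; };   # e.g., ns1.google.com
    zone "google.com" { type master; file "zones/google.com.zone"; };
};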
4.2.5 Mutate Trace For Various Experiments
Another benefit of our system is that we support arbitrary trace manipulation to study
different questions from one trace.
There are two challenges in changing the traces. First, a binary network trace is complicated
to edit directly, because changes are not size-preserving. We need a user-friendly
method to manipulate queries. Second, the delay caused by manipulating and
processing traces may also cause problems for accurate query replay.
Plain text for easy manipulation: To easily manipulate input queries, we convert
network traces to human-readable plain text. We develop a DNS parser to easily extract
relevant data from network trace, and output a column-based plain text file where each
line contains necessary information of a DNS message. In this stage, users can edit DNS
messages as desired with a program or text editor. Most data in a DNS message can be
modified, including DNS header flags, query names, EDNS data, and transport protocol.
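For instance, forcing every query to use TCP could be a one-line rewrite of the protocol column. The tab-separated layout and column position below are assumed for illustration, since the exact text format is internal to LDplayer.

awk 'BEGIN { FS = OFS = "\t" } { $4 = "TCP"; print }' queries.txt > queries-tcp.txt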
Binary for fast processing: Since using plain text as input delays building DNS messages,
we convert the resulting text file to a customized binary stream of internal messages that
serves as input for trace replay (Figure 4.3), for fast processing. To distinguish different
messages in the input stream, we prepend the length of each message at the beginning
of each binary message.
To avoid unnecessary input delay in query replay, we pre-process the input and separate
input processing from the query replay system. Optionally, the input engine of
our system can also read network trace and formatted text file directly, and convert to
internal binary messages on the fly.
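A minimal C++ sketch of such length-prefixed framing is shown below; the 4-byte network-order prefix is an assumption for illustration, and LDplayer's internal format may differ.

#include <arpa/inet.h>   // htonl
#include <cstdint>
#include <ostream>
#include <string>

// Write one internal message as <4-byte length><message bytes> so a reader can
// split the concatenated stream back into individual DNS messages.
void write_framed(std::ostream& out, const std::string& msg) {
    uint32_t len = htonl(static_cast<uint32_t>(msg.size()));
    out.write(reinterpret_cast<const char*>(&len), sizeof(len));
    out.write(msg.data(), static_cast<std::streamsize>(msg.size()));
}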
We handle trace replay and support mutation of the trace in ways that are similar
to the original. In some cases, what-if experiments may imply changes to traffic
that are very different from the original trace. For example, if all zones are changed
[Figure 4.3: The trace mutator converts a network trace to plain text for easy editing, and further converts it to a customized binary stream as input. LDplayer accepts three types of input: network traces, formatted plain text, and customized binary files.]
to be DNSSEC signed, then one must generate new DNSKEY and RRSIG records. For
such experiments, the experimenter must ensure that the trial zone includes the new data for the
replay to provide correct results.
4.2.6 Distribute Queries For Accurate Replay
With the server setup and input trace in place, the next step for a successful DNS trace replay is to
emulate DNS queries with correct timing from different sources and connections.
Fast query replay and diverse sources: There are several resource limits in a single
host: CPU, memory, and the number of ports. The query rate generated at a single host
is limited because of CPU constraints. The ability to maintain concurrent connections
in a single host is limited by memory and the number of ports (typically 65 k).
To support fast query rates from many sources, our approach is to distribute the query
stream to many different hosts, allowing many senders to provide a large aggregate query
rate. In particular, we coordinate queries from many hosts with a central Controller
[Figure 4.4: Multi-level query distribution.]
managing a team of Distributors which further controls several Queriers (Figure 4.4).
The end Queriers directly interact with DNS servers via different protocols (UDP, TCP,
or TLS). For reliable communication, we choose TCP for message exchange
among distributors.
The primary purpose of multiple levels is to connect enough end Queriers when there
is a limit on the number of distribution connections at each Distributor. Without that limit,
one-level distribution (the Controller distributing to Queriers directly) could theoretically
support 4 billion connections in total, with a maximum of 65 k Querier hosts connected at any
time.
If the input trace is extremely fast, the Controller's CPU may become the bottleneck
because it limits the speed of input processing. To solve this problem, we can split the input
stream to feed multiple controllers.
Correct timing for replayed queries: The ultimate goal of the query replay system is
to replay DNS queries with correct timing and reproduce the traffic pattern.
Because queries are distributed among different hosts, it is challenging to synchronize
time and ensure the correct timing and ordering of individual queries.
To replay queries with accurate timing, LDplayer keeps track of trace time and real time,
and schedules timer events to send queries.
                    trace time                real time               timer
query        absolute    relative       absolute    relative      (if ≤ 0, send now)
(q_i)        (t̄_i)       (Δt̄_i)         (t_i)       (Δt_i)        (T_i)
q_1          t̄_1         0              t_1         0             0
q_2          t̄_2         t̄_2 - t̄_1      t_2         t_2 - t_1     Δt̄_2 - Δt_2
...
q_i          t̄_i         t̄_i - t̄_1      t_i         t_i - t_1     Δt̄_i - Δt_i

Table 4.1: Determining the time T_i at which to send each query q_i.
When it gets the first input query message, the controller broadcasts a special time-synchronization message to all the queriers to indicate
the start time of the trace. Upon receiving the time-synchronization message, a
querier records the current trace time (t̄_1) and real time (t_1).
On receiving the subsequent query stream, a querier extracts the absolute query time
in the trace (t̄_i) and computes the relative trace time (Δt̄_i) as Δt̄_i = t̄_i - t̄_1. The relative trace
time is the ideal delay that should be injected for trace replay, assuming no input delay.
Similarly, the querier also gets the current absolute real time (t_i) and the relative real
time (Δt_i) as Δt_i = t_i - t_1. The relative real time represents the accumulated program
run-time delay, such as input processing and communication delay, that has already been
incurred.
To replay query q_i at the correct time, LDplayer removes the accumulated latency and
schedules a timer event at T_i in the future (Table 4.1), where T_i = Δt̄_i - Δt_i. If the
trace is extremely fast and the input processing falls behind (T_i ≤ 0), LDplayer sends
the query immediately without setting up a timer event.

By tracking timing and continuously adjusting, LDplayer provides good absolute
and relative timing (as shown in §4.4).
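The following C++ sketch illustrates this timing rule; it is a simplified illustration of Table 4.1, not LDplayer's actual code. Each query is delayed by the relative trace time minus the relative real time, or sent immediately when replay has fallen behind.

#include <chrono>
#include <thread>

using Clock = std::chrono::steady_clock;

struct ReplayTimer {
    double trace_start;                // trace time of the first query, in seconds
    Clock::time_point real_start;      // wall-clock time when replay started

    // Block until query q_i, with trace timestamp trace_t, is due to be sent.
    void wait_until_due(double trace_t) const {
        double rel_trace = trace_t - trace_start;                  // relative trace time
        double rel_real  = std::chrono::duration<double>(
                               Clock::now() - real_start).count(); // relative real time
        double remaining = rel_trace - rel_real;                   // T_i in Table 4.1
        if (remaining > 0)             // otherwise input fell behind: send immediately
            std::this_thread::sleep_for(std::chrono::duration<double>(remaining));
    }
};

Keeping the two clocks separate lets the accumulated input and communication delay be subtracted out on every query, rather than letting it build up over the replay.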
Some experiments, such as load testing, prefer large query streams sent as fast as possible,
instead of tracking the original timing. As an option, LDplayer can disable time
tracking and replay as fast as possible.
Emulating queries from the same source: Some traces or experiments require
reproduction of inter-query dependencies. Two examples are UDP queries where the
second query can be sent only after the first is answered, or when studying TCP queries
where connections are reused. In general, we assume all queries from the same source IP
address are dependent and queries from different sources are independent. Otherwise, we assume
queries are independent, since captured DNS traces normally do not show application
dependency. Identifying semantic dependence between queries is an area for future
work.
We do preserve queries that originate from the same source as one kind of dependency,
since it affects the performance of DNS-over-TCP. We use different network sockets
to emulate query sources. To emulate queries from the same sources, we must first
deliver all the queries from the same sources (IP addresses) in the original trace to the
same end querier for replay. To accomplish this, each distributor tracks the original
query source address and the lower level component in the message distribution flow.
When queries are distributed, each distributor either picks the next entity based on a
recent query source address in record, or selects randomly otherwise (during startup).
Similarly, the controller guarantees that same-source queries are assigned to the same distributor.
Each entity keeps this record during the experiments.
Similarly, queriers map query sources to the underlying network sockets, ensuring
that same-source queries use the same socket if it is still open. New sources start
new sockets.
When emulating TCP connection reuse, queriers also track open TCP connections.
They may close them after a preset timeout.
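A simplified C++ sketch of such per-source connection tracking is shown below; the data structure and timeout handling are assumptions for illustration, not the exact implementation.

#include <chrono>
#include <string>
#include <unordered_map>

using Clock = std::chrono::steady_clock;

// Map each original source IP (as a string) to an open socket so that
// same-source queries reuse the same TCP/TLS connection.
class ConnectionTable {
    struct Entry { int fd; Clock::time_point last_used; };
    std::unordered_map<std::string, Entry> table_;
    std::chrono::seconds timeout_;
public:
    explicit ConnectionTable(std::chrono::seconds timeout) : timeout_(timeout) {}

    // Return an open socket for this source, or -1 if the caller must open
    // (and then register) a new connection. Entries idle past the timeout are
    // dropped; the caller is responsible for closing the stale socket.
    int lookup(const std::string& src) {
        auto it = table_.find(src);
        if (it == table_.end()) return -1;
        if (Clock::now() - it->second.last_used > timeout_) {
            table_.erase(it);
            return -1;
        }
        it->second.last_used = Clock::now();
        return it->second.fd;
    }

    void remember(const std::string& src, int fd) {
        table_[src] = Entry{fd, Clock::now()};
    }
};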
As a result, during query replay, a DNS server observes queries from the same set
of host addresses but with a range of different port numbers, which emulates different
queries from the same sources.
[Figure 4.5: A prototype of the distributed query system with two-level query distribution. Distributors and queriers are implemented as processes running on the same host (client instance). Optionally, a single distributor can read the input query stream directly.]
An alternative is to set up virtual interfaces with different IP addresses at the queriers,
and use one of those interfaces for each query source address in the replay. However, this
method does not scale to a large number of addresses.
4.3 Implementation
We implement a prototype replay system and proxies in C++, to provide an efficient runtime
and full control over memory usage.
Query System: The query system uses two-level query distribution (Figure 4.5), with a controller
and multiple clients. The controller runs two processes: the Reader, for trace input, and
the Postman, to distribute queries. One or more machines are clients, each with
distributor and multiple querier processes. Processes use event-driven programming to
minimize state and scale to a large number of concurrent TCP connections. The reader
pre-loads a window of queries to avoid falling behind real time.
Server Proxy: The proxies around the server run as either the recursive proxy or the
authoritative proxy (§4.2.4). A single reader thread reads from a tunnel network interface,
while multiple worker threads read from a thread-safe queue and rewrite queries
(§4.2.4). Our prototype of the recursive proxy only talks to a single authoritative proxy.
Supporting partitioning of the zones across a set of different authoritative servers is
future work.
4.4 Evaluation
We validate the correctness of our system by replaying different DNS traces in a controlled
testbed environment (§4.4.1). Specifically, we validate query inter-arrival time
and query rate. Our experiments show that the distributed client system replays DNS
queries with correct timing, reproducing the DNS traffic pattern (§4.4.2).
4.4.1 Experiment Setup and Traces
To evaluate our system, we deploy the network shown in Figure 4.6 in the DETER
testbed [Ben11]. We use a controller (T) to distribute the query stream to client instances
(C_1 to C_n). Each client instance runs several distributor and querier processes to replay
input queries. The query traffic merges at a LAN representing an Internet Exchange
Point, and is then sent to the server (S). Each host is a 4-core (8-thread) 2.4 GHz Intel
Xeon running Linux Ubuntu-14.04 (64-bit). We use several traces, listed in Table 4.2
and described below, to evaluate the correctness of our system under different conditions.
B-Root: This trace represents all traffic at the B-Root DNS server (both anycast sites)
over one hour during the 2016 and 2017 DITL collections [DNS17]. It is available from
the authors and DNS-OARC. We use the B-Root-16 trace (Table 4.2) in this section to
validate that our system can accurately replay high-volume queries against an authoritative
server. We use the B-Root-17 traces in later sections (§4.5). Traffic to each
root server varies, but the B-Root trace is not significantly different from the others.
[Figure 4.6: Network topology used for evaluation: controller (T), server (S), and client instances (C).]
                                    duration   inter-arrival (s)
trace        start                  (min)      mean      std dev    client IPs   records
B-Root-16    2016-04-06 15:00 UTC   +60        .000027   .000619    1.07 M       137 M
B-Root-17a   2017-04-11 15:00 UTC   +60        .000023   .001647    1.17 M       141 M
B-Root-17b                          +20        .000025   .001536    725 k        53 M
Rec-17       2017-09-01 17:22 UTC   +60        .180799   .355360    91           20 k
Synthetic
syn-0        -                      60         1                    3 k          3.6 k
syn-1        -                      60         .1                   9.7 k        36 k
syn-2        -                      60         .01                  10 k         360 k
syn-3        -                      60         .001                 10 k         3.6 M
syn-4        -                      60         .0001                10 k         36 M

Table 4.2: DNS traces used in experiments and evaluation. The inter-arrival columns give the mean and standard deviation of inter-arrival time for the B-Root and Rec traces.
Synthetic: To validate the capability to replay query traces with various query rates,
we create five synthetic traces (syn-0 to syn-4 in Table 4.2), each with different, fixed
inter-arrival times for queries, varying from 0.1 ms to 1 s. Each query uses a unique
name to allow us to associate queries with responses after-the-fact.
4.4.2 Accuracy of Replay Timing and Rate
We first explore the accuracy of the timing and rate of query replay.
Methodology: We replay B-Root and synthetic traces over UDP in real time and
capture the replayed traffic at the server. We match each query with its reply by prepending a unique
string to every query name in each trace. We then report the query timing, inter-arrival
time, and rate, comparing the original trace with the replay. We use a real DNS root zone
file in the server for B-Root trace replay to provide responses. For synthetic trace replay, we
set up the server to host names in example.com with wildcards, so that it can respond to all
the queries within that domain. We repeat each type of trace replay five times to avoid
outliers.
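For example, the served zone could contain a single wildcard record such as the one below (the TTL and address are placeholders), so that any unique name under example.com receives an answer:

*.example.com.  3600  IN  A  192.0.2.53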
Query time: We use unique query names to identify the same queries in original
and replayed traces, and study the timing of each query: the absolute time difference
compared to the first query. We ignore the first 20 seconds of the replay to avoid startup
transients.
Figure 4.7 shows that timing differences in replay are tiny; usually the quartiles are
within 2.5 ms. We observe small but noticeably larger differences when the query
inter-arrival is fixed at 0.1 s: 8 ms quartiles. We are examining this case, but suggest it
is an interaction between application and kernel-level timers at this specific timescale.
Even when we look at minimum and maximum errors, timing differences are small,
within 17 ms.
Query Inter-arrival Time: We next shift from absolute to relative timing with inter-
arrival times.
Figure 4.8 shows the CDF of experimental inter-arrival times for real (B-Root-16)
and synthetic traces with different inter-arrival rates. (Note that the timescale is shown on a
logarithmic scale.) Inter-arrival is quite close for traces with input inter-arrivals of 10 ms
or more, and for real-world traffic with varying inter-arrivals. We see larger variation for
very small, fixed inter-arrivals (less than 1 ms); although the median is on target, there
is some variation. This variation occurs because it is hard to synchronize precisely at
[Figure 4.7: Query timing difference between replayed and original traces. Figure shows quartiles, minimum, and maximum. The empty circles on the x-axis are outliers exceeding 20 ms.]
these fine timescales, since the overhead from system calls to coordinate take nearly as
much time as the desired delay, adding a lot of jitter. We see divergence for the smallest
interarrivals for the real-world B-Root trace, but little divergence for the 50% longest
B-Root inter-arrivals. Uneven spacing in real traces gives us free time to synchronize. We
repeat this experiment five times; all show similar results to the one shown here.
Query Rate: We finally evaluate query rates. To do so, we replay the B-Root-16
trace and compute the query rate in each second of trace replay against the corresponding
rate of that second in the original trace. We repeat this test five times.
Figure 4.9 shows the CDF of the difference in these per-second rates for all 3,600
seconds of each of the five replays. We observe that almost all (4 trials with 98%-99%
and 1 trial with 95%) of the 3.6 k data points (a 1-hour period) have a tiny (within 0.1%) difference
in average query rate per second. This experiment uses the B-Root trace because it has a large
query rate (median 38 k queries/s) and the rate varies over time. We use a 1-second
[Figure 4.8: Cumulative distribution of the inter-arrival time of original and replayed traces.]

[Figure 4.9: Query rate differences between replayed and original B-Root trace (5 trials). Black solid circles on the edge are a few cases outside 2%.]
window to study overall replay rate; finer (smaller) windows may show greater variation
as OS-scheduling variation becomes significant.
4.4.3 Accuracy of Responses when Emulating DNS Hierarchy
Having shown we can replay queries with accurate timing, we next validate the accuracy of
response content when emulating the DNS hierarchy. We first capture network traces and
build zone data using the captured traces. We then configure our meta-DNS-server to
emulate the DNS hierarchy. We finally replay queries to an experimental recursive server
to evaluate the accuracy of both zone construction and DNS hierarchy emulation. We
also test zone construction and hierarchy emulation separately during development.
4.4.3.1 Experiment Setup and Methodology
We first generate network traces and build DNS zone data from these traces. We use
the Alexa top-500 domains in our experiment because of their popularity. We use dig
to send queries of the addresses (A record) of the Alexa top-500 domains to a recursive
server (bind-9.11.4-P1). We flush the cache of the recursive server before each query,
so that the recursive server always walks a full DNS hierarchy. We capture network
traces at the recursive server, and construct 861 zones for the Alexa top-500 domains
using the methods described in §4.2.3. We also use these captured traces as the baseline for
comparison of different query replays. We run the recursive server in a virtual machine
(Linux Fedora-28) to avoid capturing DNS queries from other applications.
We must correct CNAME inconsistencies in zone files. We discover 4 domains that have
CNAME records at the zone apex, which is forbidden by the standard [Bar96]. Since BIND fails
to load a zone with a CNAME at the zone apex, we modify the BIND source code to skip
that check. These CNAME inconsistencies are possibly due to older
server software, while newer developments in DNS (such as CNAME Flattening [Mat14,
Clo18]) could work around them.
We run proxies and the meta-DNS-server to emulate the DNS hierarchy (§4.2.4).
We configure a split-horizon authoritative server (bind-9.11.4-P1) with 837 horizons
as the meta-DNS-server, using view and match-clients clauses in its configuration. We
configure the server with 64 GB RAM on a 24-core (48-thread) 2.2 GHz Intel Xeon,
running Ubuntu 16.04.2 LTS (64-bit) with the 4.4.0-83-generic kernel. When the zones
of the Alexa top-500 domains are fully loaded, our meta-DNS-server consumes 20 GB
RAM.
We conduct three query replays with two different server implementations. First,
we replay queries of the Alexa top-500 domains to BIND with the meta-DNS-server in a
controlled testbed, to examine the accuracy of the replayed responses. To verify that
our system supports different server implementations, we then repeat the experiment
with Unbound (unbound-1.8.0). Finally, we let Unbound talk to the Internet (reality)
instead of the meta-DNS-server, to compare with BIND in reality. In each case, we always
restart the recursive server before replaying each query to avoid caching effects. We use
tcpdump to capture the network traffic at the recursive server. For each type of server,
we compare the replayed results with reality. We repeat each experiment in LDplayer
three times.
4.4.3.2 Experiment Results
Response Content: We examine the accuracy of replayed responses from the recursive
server by verifying DNS response code and answer sections. Table 4.3 shows the results
of comparison between Reality (recursive server talks to the Internet) and our system
LDplayer (recursive server talks to meta-DNS-server with proxies). We observe that
our system can generally provide accurate response content for the query replay against
a recursive server: 99.8% of the replayed responses (including 2 NXDOMAIN) are the
same as in reality for both BIND and Unbound (Table 4.3). The consistent results of
                          BIND                   Unbound
Responses            Reality   LDplayer     Reality   LDplayer
Considered             500       500          500       500
Match BIND Reality     500       499          446       499
  NOERROR              498       497          444       497
  NXDOMAIN               2         2            2         2
Diff BIND Reality        -         1           54         1
  NOERROR                -         -           54         -
  NXDOMAIN               -         1            -         1

Table 4.3: Comparison between responses in reality and responses in replay for the Alexa top-500 websites. Reality: the recursive server is connected to the Internet. LDplayer: the recursive server is connected to the meta-DNS-server with proxies. We generate zone files for LDplayer using traces captured under BIND-Reality.
two different server implementations and the accurate responses also verify the accuracy
of our zone-construction methods. We repeated this experiment three times and
observed the same results.
We observe that 1 query (0.1%) gets an NXDOMAIN response in our replay, while
getting NOERROR in reality. We examine this query name (bp.blogspot.com) specifically.
We find that the reply to the address query for bp.blogspot.com has an empty answer
with NOERROR in reality, indicating that the name is valid but has record types
other than the requested type. Since we cannot see those hidden records during zone construction,
we expect to get an NXDOMAIN reply in replay. As possible future work, we
could extend our system to query every possible type or a single ANY type (TYPE255)
of the DNS name to find the hidden records during zone construction. Such brute-force
queries may burden real DNS servers, and some servers may intentionally ignore ANY
queries because they can be used in DDoS attacks. For example, at the time of writing,
a query of ANY type of bp.blogspot.com returns empty answers from its authoritative
server (ns1.google.com), while we expect to receive all available types of records of
bp.blogspot.com.
Our query replay with Unbound and the Internet (Unbound Reality in Table 4.3)
shows 11% different replies, although none of the replies is NXDOMAIN. By examining a
few cases manually, we find some of the names are hosted by CDNs and DNS gives different
answers each time. We think most of the differences in this replay with Unbound
are potentially due to CDN or DNS load balancing, while address updates are rare over
a short time. Answers from CDNs are time-varying, and such address changes are hard to
emulate in DNS trace replay without emulating a CDN infrastructure. CDN and
DNS load balancing affect any DNS-trace-replay system that assumes replies are the
same, and they will also affect any experiments that involve repetition of real-world
answers, because the answers keep changing over time. Future experiments are needed
to explore why real-world responses vary over time in some cases.
Required Probes: Having shown we can get accurate replies in LDplayer, we next
further evaluate the accuracy of emulating DNS hierarchy: does the recursive server
send multiple queries to authoritative servers in LDplayer? Does the recursive server
behave similarly in LDplayer compared to reality? We answer these questions by check-
ing the number of “probes” needed by the recursive server to answer the incoming query.
We show that the number of required “probes” for each incoming query is well
within the typical behavior compared to different server implementations in reality. Fig-
ure 4.10 shows the number of queries sent by the recursive server in reality and LDplayer
for each of the Alexa top-500 domains. We observe that the number of queries sent by
recursive server in LDplayer is roughly similar to that in reality with some variations:
blue squares and red circles asymptotically align with the diagonal line in Figure 4.10.
We observe multiple round-trip messages between the recursive and authoritative
servers in our experiments with LDplayer. Misconfigured zones would lead to a single
round-trip message that answers the query directly. By examining randomly-chosen
network traces, we observe the recursive server walks down the DNS hierarchy in
[Figure 4.10: Number of queries sent by the recursive server for each of the Alexa top-500 websites. We measure the number of replies received at the recursive server and map that to the number of queries, to avoid overcounting extra transactions from retransmitted queries.]
LDplayer, querying root servers, top-level domain (TLD) servers, and other authorita-
tive servers. We repeated this experiment three times and observed similar results.
We observe some differences in server behavior. We find that BIND requires more
queries than Unbound in general: red circles lean towards the left corner while blue squares
are sparse in Figure 4.10. The additional queries by BIND are possibly query retries
due to packet truncation and the conservative edns-udp-size advertisement in BIND.
We find two types of packet truncation in the captured network traces. First, truncated
UDP replies with the TC bit set make the recursive server retry queries over TCP. Second,
some referral replies (NS records in authority sections without answers) have truncated
additional sections (missing glue records) without setting the TC bit. While accepting such
referral replies, a recursive server needs additional queries to resolve the addresses of the NS
names. BIND conservatively advertises an edns-udp-size of 512 bytes in the first query
and progressively increases edns-udp-size on successive queries, while Unbound uses
4096 bytes by default. This conservative strategy of setting edns-udp-size in BIND
can lead to more packet truncations and more queries than Unbound.
We also observe that reality requires more queries than LDplayer in general: more
blue squares and red circles sit above the diagonal line in Figure 4.10. One possible reason is
that middleboxes may drop large DNS replies and trigger more query retries in reality,
while we use a simpler topology (Figure 4.6) with minimal middlebox interference in
our experiments. Packet loss in reality may also lead to query retries, while we expect
less packet loss in our controlled experiments.
The choices of authoritative servers during query resolution can also result in the
variation between reality and LDplayer. When resolving an incoming query, a recursive
server can choose any of the authoritative servers at each level of the DNS hierarchy. Different
authoritative server names may require different numbers of queries to resolve their
addresses. Possibly, the recursive server in our experiments chooses a different authoritative
server than the one observed in reality.
Summary: In addition to replaying queries with accurate timing, our system supports accurate
responses when emulating the DNS hierarchy. We show that our replay of query
content against a recursive server is well within the typical behavior of different
server implementations in reality. Our system can generally provide accurate
response content without any Internet connection during experiments. Our system also
reproduces the number of queries sent by recursive servers.
4.4.4 Single-Host Throughput
Having shown that our query system accurately replays different traces, we next evaluate
the maximum throughput: how fast can our system replay using a single host?
Methodology: We use an artificial query generator for controlled, high-speed replay.
We send a continuous stream of identical queries (www.example.com) to the target, send-
ing them with UDP, without timeouts, to an authoritative server hosting example.com
zone with wildcards. We run our query replay system with one distributor and six
querier processes, along with the query generator (total 8 processes), on a single 4-
core (8-hyperthread) host. We monitor the packet rate and bandwidth after the query
system is in steady state.
Results: With this setup we replay 87 k queries/s (60 Mb/s), as shown in Figure 4.11.
This rate is more than twice the normal B-Root DNS traffic rate (as of mid-2017). In this
experiment the query generator is the bottleneck (it completely saturates one CPU core),
while the other processes (distributor and queriers) each consume about 50% of a single CPU
core. Higher rates would be possible with faster query generation.
4.5 Applications
With controlled, configurable and accurate trace replay, our system provides a basis
for large-scale DNS experimentation which can produce new results and answer open
research questions. We next show such applications of LDplayer, including studying the
impact of increased DNSSEC queries and exploring the performance of DNS over TCP
and TLS at a Root DNS server.
[Figure 4.11: The throughput of fast replay of a continuous input query stream over UDP: queries are sent immediately without timer events. Data points are sampled every two seconds over a total of 5 minutes.]
4.5.1 Impact of Increased DNSSEC Queries
How does root DNS traffic change when more and more applications start to use
DNSSEC? We use LDplayer to enable DNSSEC for all queries when we replay traces,
allowing us to predict potential future behavior. We start to answer this question and
predict future DNS root traffic. Prior studies used trace replay with current traffic
mixes [Wes16].

We replay the B-Root-16 trace (Table 4.2) with a mix of different key sizes and
different portions of queries requiring DNSSEC, under the previous experiment setup
(§4.4.1).
Our new experiments with all queries using DNSSEC show that, going from 72% DO
(as of mid-2016) to 100%, root response traffic becomes 296 Mb/s (median) with a 2048-bit
ZSK in steady state (right group in Figure 4.12). Compared to 225 Mb/s with the current
72% DO and a 2048-bit ZSK, root response traffic could increase by 31% in the
future when all queries require DNSSEC. Our experiments also demonstrate a 32% traffic
increase when the DNS root ZSK was upgraded from 1024-bit to 2048-bit keys, replicating
experiments previously done in a custom testbed [Wes16]. As future work, we could
use LDplayer to study the traffic under a 4096-bit ZSK.
As future work, one could use this experiment to test LDplayer’s predictive ability.
For example, one could take 2016 data, adjust it to 2018 DNSSEC levels, and see how
well they match.
[Figure 4.12: Bandwidth of responses under different DNSSEC ZSK sizes. Trace: B-Root-16. Figures show medians, quartiles, 5th and 95th percentiles.]

[Figure 4.13: CPU usage with different TCP timeouts under minimal RTT (<1 ms). Trace: B-Root-17a. Figures show medians, quartiles, 5th and 95th percentiles.]

[Figure 4.14: Network topology for experiments replaying Root DNS traces over TCP and TLS: controller (T), server (S), and client instances (C).]
4.5.2 Performance of DNS over TCP and TLS at a Root Server
We next use experiments to study DNS over TCP and TLS. Our goal is to understand
real-world resource usage at servers (memory and CPU) and client latency. Our experi-
ments here are the first to study these topics at scale, with a full server implementation;
prior work used micro-benchmarks and modeling [ZHH+15]. We convert all queries to
use TCP and TLS, demonstrating actual resource usage and also revealing performance
discontinuities in latency as a function of RTT, which modeling cannot capture.
4.5.2.1 Experiment Setup and Methodology
To evaluate server resource requirements and query latency, we deploy a network topol-
ogy (Figure 4.14), separating control and experimental traffic. We vary the client-to-server
RTT for different experiments. All client hosts use 16 GB RAM and 4-core
(8-thread) 2.4 GHz Intel Xeon. To support the all-TCP/TLS workload, we config-
ure the authoritative server with 64 GB RAM on a 24-core (48-thread) 2.2 GHz Intel
Xeon, and controller with 24 GB RAM on a 12-core (24-thread) 2.2 GHz Intel Xeon.
We run nsd-4.1.0 with 16 processes for all the experiments, and a TLS-patched
version [Sin] for TLS experiments. All hosts run Ubuntu 16.04.2 LTS (64-bit) with
the 4.4.0-83-generic kernel. Different server implementations may have different memory
requirements.
We conduct three types of query replay. First, we replay the queries using the proto-
cols in the original trace (3% TCP queries) to establish a baseline for comparison. We
then mutate the queries so all employ TCP and TLS, respectively, for two different sets
of experiments. We vary either the TCP timeout (5 s to 40 s) at the server, or the client-server
RTT (0 ms to 140 ms, or based on a distribution). In the TCP and TLS experiments,
we optimize TCP at the client and server by enabling net.ipv4.tcp_low_latency in
Linux, and by disabling the Nagle algorithm [Hei97] at the client.
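The client-side tuning corresponds to a setsockopt call like the sketch below (a minimal illustration, not the exact LDplayer code); the server-side tuning is the sysctl net.ipv4.tcp_low_latency=1.

#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

// Disable Nagle's algorithm on a DNS-over-TCP/TLS client socket so each small
// query is sent immediately instead of being coalesced with later writes.
// Returns 0 on success, -1 on error (errno set), like setsockopt itself.
int disable_nagle(int sock) {
    int one = 1;
    return setsockopt(sock, IPPROTO_TCP, TCP_NODELAY, &one, sizeof(one));
}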
We use two B-Root traces (Table 4.2) in the experiments in this section. We first
use 1-hour B-Root-17a trace to study server state with controlled minimal RTT (<1 ms),
verifying the experiment reaches steady state in about 5 minutes. For later experiments
we use B-Root-17b, a 20-minute subset of the B-Root-17a trace.
We log server memory with top and ps, CPU with dstat, and active TCP connections
with netstat.
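A logging loop along the following lines could collect these metrics; the exact flags, intervals, and file names are illustrative, not the precise scripts we used.

dstat --cpu 1 > cpu.log &                        # overall CPU usage, 1-second samples
while true; do
    ps -C nsd -o rss= | awk '{ s += $1 } END { print s }' >> nsd-rss.log
    netstat -tan | awk '$6 == "ESTABLISHED"' | wc -l >> tcp-established.log
    netstat -tan | awk '$6 == "TIME_WAIT"'   | wc -l >> tcp-timewait.log
    sleep 10
done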
4.5.2.2 Memory and Connection Footprint
For DNS over TCP and TLS, a server should keep idle connections open for some
amount of time, to amortize TCP connection and TLS session setup costs [HZH+16,
ZHH+15]. However, a server cannot keep the connection open forever, since maintain-
ing concurrent connections costs server memory. LDplayer provides the unique ability
to emulate and maintain a large number of concurrent connections for DNS, which
enables replaying queries over TCP and TLS at a root server, while most previous
DNS studies focus on UDP-dominated DNS. Server memory is an important metric to
study to understand the constraints of connection-oriented DNS.
Our experimental results confirm prior models [ZHH+15] in a real-world implementation,
showing that even if all DNS were shifted to connection-oriented protocols,
memory requirements are manageable. Figure 4.15 and Figure 4.16 show the memory
and connections for our experiments. We demonstrate that both the number of active
TCP connections and server memory consumption rise as the TCP timeout increases.
We show that with the 20 s TCP timeout suggested in prior work [ZHH+15], our experi-
mental server requires about 15 GB RAM for TCP (Figure 4.15a) and 18 GB RAM for
TLS (Figure 4.16a). The server holds about 180 k connections for TCP, of which one-third are active
(Figure 4.15b) and the rest are in the TIME_WAIT state (Figure 4.15c), while TLS has similar
connection requirements (Figure 4.16b and Figure 4.16c). These values are well within
current commodity server hardware, although much larger than today's UDP-dominated
DNS (2 GB RAM, the blue bottom line in Figure 4.15a). DNS operators with old hardware
will need to upgrade servers when preparing for DNS over TCP and TLS. Resource usage
reaches steady state in about 5 minutes and is thereafter stable (approximately flat lines
in Figure 4.15 and Figure 4.16).
Most of the extra memory cost goes to TCP connections. We observe that server memory requirements increase significantly (6× more) when shifting from UDP to TCP, while they increase only moderately (30% more) from TCP to TLS. Prior work [ZHH+15] models server memory without testing a real-world implementation. One possible way to reduce the memory requirement is to use smaller TCP read and write buffers in the kernel, although future experiments are needed.
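One hedged illustration of that idea: per-socket buffers can be capped before listening, as sketched below. The 32 kB value is an arbitrary example rather than a tested recommendation, and whether such limits preserve throughput for DNS-sized messages is exactly the open question.

    import socket

    def make_small_buffer_listener(port=853, bufsize=32 * 1024):
        # Cap kernel send/receive buffers on the listening socket; on Linux,
        # accepted connections inherit these limits.  System-wide defaults can
        # instead be lowered via the net.ipv4.tcp_rmem / tcp_wmem sysctls.
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, bufsize)
        s.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, bufsize)
        s.bind(("0.0.0.0", port))
        s.listen(1024)
        return s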
We expected memory to vary with querier RTT, but memory does not change with the distance from client to server. This resource stability is because memory is dominated by the connection timeout duration, which at 20 s is about 200× longer than typical RTTs.
4.5.2.3 CPU Usage
LDplayer enables the first experimental evaluation of CPU consumption for DNS over TCP and TLS; prior work was unable to model CPU costs.
Figure 4.13 shows our evaluation of server CPU usage for DNS over TCP and TLS. We observe that overall CPU usage is about 5% (median) over 48 cores for all queries over TCP and 9% to 10% (median) for all queries over TLS, again manageable on current commodity server hardware. Results are stable regardless of the connection timeout (the flat lines). We observe slightly higher (2% more at median) CPU usage for TLS at the 5 s timeout, likely due to more frequent connection timeout and setup.
In contrast, replaying the original trace (3% TCP queries) requires a median of 10% CPU, surprisingly higher (5% more at median) than the CPU usage of all queries over TCP. We are investigating the reason for this surprisingly lower CPU usage with TCP. One possibility is that the operating system and network stack are highly optimized for TCP. Another possibility is the TCP optimizations built into the network interface card (Intel X710 40G in our experimental server), such as the TCP offload engine and TCP segmentation offload. These TCP optimizations may help reduce server CPU usage, although further investigation is needed.
Figure 4.15: Evaluation of server memory and connection requirements with different TCP timeouts and minimal RTT (<1 ms). Trace: B-Root-17a. Protocol: TCP. Panels: (a) memory consumption (dashed lines: all system memory; solid lines: NSD), (b) established TCP connections, (c) TCP connections in TIME_WAIT state. Lines show timeouts from 5 s to 40 s for all queries over TCP, plus the original trace (3% queries over TCP) with 20 s timeout.
Figure 4.16: Evaluation of server memory and connection requirements with different TCP timeouts and minimal RTT (<1 ms). Trace: B-Root-17a. Protocol: TLS. Panels: (a) memory consumption (dashed lines: all system memory; solid lines: NSD), (b) established TCP connections, (c) TCP connections in TIME_WAIT state. Lines show timeouts from 5 s to 40 s for all queries over TLS, plus the original trace (3% queries over TCP) with 20 s timeout.
Our experiments confirm that connection tracking and cryptographic processing in TLS do not increase CPU usage noticeably over UDP. CPU usage for all queries over TCP is even lower than UDP, possibly because of TCP optimizations in hardware. These results are only obtainable through experiment, since there are no good models of CPU consumption for DNS.
4.5.2.4 Query Latency
LDplayer experiments also allow us a first look at the distribution of query latency.
Prior modeling provided only expected values (the mean), but experimentation allows
understanding of tail performance.
Figure 4.17a shows query latency for DNS over TCP and TLS with different RTTs. Query latency is asymmetric: the 5th and 25th percentiles are similar, but performance in the tail varies greatly (compare the 75th and 95th percentiles). This skew is captured in experimentation, but not in modeling.
We demonstrate that TCP connection reuse helps reduce query latency: median query latency in TCP is similar to UDP at a small 20 ms RTT and is only about 15% slower than UDP at a large 160 ms RTT (Figure 4.17a), whereas if all connections were fresh, models would predict 100% overhead for TCP due to the extra RTT in connection setup.
The small median latency differences between UDP and TCP/TLS are weighted by queries from a few busy clients, where a connection may almost always be reused. In the B-Root-17b trace (Table 4.2), we find that a tiny set (1%) of the clients contributes three quarters of the total query load, while most (81%) of the clients are inactive (<10 queries over the 20-minute trace, Figure 4.17c), similar to the observations in prior work [CWFC08]. We next evaluate query latency for the group of non-busy clients that send fewer than 250 queries in the B-Root-17b trace. Figure 4.17b shows the statistics of query latency from a subset of 708 k non-busy clients, covering 98% of clients and 14% of the query load.
Figure 4.17: Evaluation of query latency over all clients and non-busy clients respectively, with 20-second TCP timeout and different RTTs. Trace: B-Root-17b. Panels: (a) query latency over all clients (medians, quartiles, 5th and 95th percentiles), (b) query latency over non-busy clients that send fewer than 250 queries in the trace, (c) cumulative distribution of query load per client in the original trace. Lines compare the original trace (3% TCP), all queries over TCP, and all queries over TLS.
The median query latency in TCP among those non-busy clients is about 2 RTTs, much larger than the 1-RTT median latency of UDP, indicating that many queries are sent over fresh connections, while connection reuse is still effective (the 25th percentile is 1 RTT for TCP).
Experimentation also helps reveal differences due to RTT. As the RTT increases, the median query latency of TLS increases non-linearly from 2 to 4 RTTs (red dashed line in Figure 4.17b), a behavior not captured in models.
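For comparison with these measured distributions, a mean-latency estimate of the kind prior models provide can be sketched directly. The sketch below assumes 1 RTT per query on a reused connection, 2 RTTs for a fresh TCP query, and 4 RTTs for a fresh TLS query (the per-query costs discussed in this section), treats the connection-reuse probability as a free parameter, and by construction captures only means, not the tail behavior the experiments expose.

    def expected_latency_ms(rtt_ms, p_reuse, protocol="tcp"):
        """Mean per-query latency under a simple model: reused connections cost
        1 RTT; fresh connections cost 2 RTTs (TCP) or 4 RTTs (TLS) in total."""
        fresh_rtts = {"udp": 1, "tcp": 2, "tls": 4}[protocol]
        return rtt_ms * (p_reuse * 1 + (1 - p_reuse) * fresh_rtts)

    # Example: at 140 ms RTT, a busy client reusing connections 95% of the time
    # sees about 147 ms mean TCP latency, while a non-busy client with little
    # reuse approaches 2 RTTs, consistent with the trend described above.
    if __name__ == "__main__":
        print(expected_latency_ms(140, 0.95, "tcp"))   # ~147 ms
        print(expected_latency_ms(140, 0.05, "tcp"))   # ~273 ms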
We observe that some queries have latencies of many RTTs (75th percentile and up in Figure 4.17b), which is unexpected since a single TCP query should require only 2 RTTs and a TLS query 4 RTTs. By examining packet traces, we see many server reply TCP segments (possibly multiple DNS messages) reassembled into a large TCP message. Reassembly may cause the large delay in DNS over TCP, because the receiver waits for all of the packets. Another optimization is to disable the Nagle algorithm on the server.
By contrast, latency with UDP is consistent regardless of RTT, because UDP has no algorithms like Nagle that try to reduce packet counts.
Evaluating these real-world performance interactions between DNS clients and servers was only possible with full trace-driven experiments, since there is no generic model of TCP and TLS query processing in DNS servers. Our experiments show the effect of TCP connection reuse: although the query latency of TCP and TLS is still larger than UDP, the results provide much greater confidence than testbed experiments with synthetic traffic and modeling [ZHH+15]. Our use of real traces and server software also revealed an unexpected increase in the median query latency of TLS for large client RTTs.
4.6 Related Work
DNS Replay Systems: Several other systems replay DNS traffic and simulate parts of the DNS hierarchy. Wessels et al. simulate the root, TLD and SLD servers with three computers to study the caching effects of different resolvers on the query load of the upper hierarchy [WFBc04]. Yu et al. build a similar system with multiple TLD servers hosting one TLD (.com), to understand authoritative server selection by different resolvers [YWLZ12]. Ager et al. set up a testbed simulating the DNS hierarchy to study DNSSEC overhead [ADF06]. DNS-OARC develops a DNS traffic replay tool [DOc, DOb] to test server load.
Our system differs from these in scale, speed, and flexibility. Each of these systems hosts each zone on a different name server, so they cannot scale to thousands of zones. They also often make modifications to the zones (dropping and modifying NS records) to make the routing work and obtain the correct answers from servers. We instead use proxies to allow all zones to be provided from one name server, and to provide a query sequence that matches real DNS. In addition, these systems do not carefully track timing. (For example, the Ager et al. system uses batch-mode dig and so can handle only light loads.) Our client system replays DNS queries with correct timing, reproducing the traffic pattern accurately. Finally, prior systems are designed to recreate today's protocol; we instead include the ability to project a current trace through future protocol options, such as replaying UDP queries as TCP with a preset connection timeout.
Traffic Generators: Several traffic generators can create DNS traffic [Haa, Nat]. Like these tools, our query replay system can also generate a stream of DNS packets with specified parameters. However, these tools are not specific to DNS; they provide only simple replay or generation. Our system focuses on the DNS protocol and provides a generic DNS experimentation platform. It can replay queries with accurate timing, and mutate queries to test what-if scenarios.
Network Replay Tools: Several tools replay general network traces [LW, TK, Hen]. While these tools can replay a DNS trace with the timing given in the trace, our replay-client system simulates DNS query semantics, allowing us to replay real-world queries with different variations (such as if all used TCP). Rather than just replaying each packet in the trace mechanically, our system allows exploration of future DNS design options. Other tools replay HTTP traces with accurate timing [NSW+14, tel, chr]. Our system is specifically designed for DNS, and takes steps to emulate the DNS hierarchy on a single instance of a DNS server.
DNS Studies: There are studies that replay DNS queries to evaluate the performance of DNS applications [PPPW04, KKC11, BFVSK09]. Our replay-client system supports analysis like these studies, but it provides a more flexible platform that also enables new studies at high query rates with protocol variants. The focus of our system is accurate trace replay, while DNS Flagger can replay traces at faster rates [BFVSK09]. We would like to compare the accuracy of our approach to these prior systems, but they have not published performance results on timing accuracy. Other studies improve web performance by using a customized DNS proxy [OSRB12]. We also use proxies in our replay system, but our focus is to provide a query sequence that matches real DNS.
To the best of our knowledge, ours is the only experimental DNS system that can replay DNS traces with original zone files, use distributed clients to handle large query rates and simulate different query sources, and vary protocols.
4.7 Conclusion
This chapter has described LDplayer, a system that supports trace-driven DNS experiments. This replay system is efficient (87 k queries/s per core) and able to reproduce precise query timing, inter-arrivals, and rates (§4.4). We have used our system to evaluate alternative DNS scenarios, such as where all queries use DNSSEC, or all queries use TCP and TLS. Our system is the first to make at-scale experiments of these types possible, and experiments with TCP confirm that memory and latency are acceptable (as predicted by modeling). In addition, experimental confirmation of complex systems factors such as memory usage is critical to gain confidence that an all-TCP DNS is feasible on current server-class hardware.
This chapter supports our thesis statement that it is possible to improve the security of network request-response protocols without compromising performance by protocol optimizations that are demonstrated through measurements of protocol developments (§1.2). The security improvements and protocol optimizations are the same as in the previous chapter: TCP mitigates spoofing and amplification attacks, and TLS provides privacy for users. The main protocol optimization is connection persistence, which amortizes connection setup costs and so reduces end-to-end client latency. Different from the previous chapter, we build a DNS experimental framework, and we use this framework to conduct large-scale DNS trace replay. We aim to understand the dynamics of server memory and CPU usage and demonstrate actual resource requirements for DNS over TCP and TLS. By replaying a root DNS trace, our experiments show that server memory is manageable under connection-oriented DNS (15 GB for TCP and 18 GB for TLS with 20 s connection timeouts), confirming the modeling results in the previous chapter. Our experiments also show that the CPU usage of managing connections and cryptography at a DNS server does not increase much over UDP, whereas no model of CPU usage for DNS over TCP was previously available. In summary, through large-scale experiments with real-world client and server implementations, we show that TCP and TLS can work for DNS and that the resource requirements are manageable.
Chapter 5
Future Work and Conclusions
In this chapter, we discuss possible future directions of our studies, and then summarize the contributions of this thesis.
5.1 Future Work
In addition to demonstrating our thesis statement with the example studies in previous chapters, there are directions of immediate future work that could strengthen our statements. In this section, we discuss this future work for each of our three studies.
In Chapter 2, we studied the latency and pervasiveness of the Online Certificate Status Protocol (OCSP). There are two future directions that could extend this work. First, we would like to better understand OCSP latency in different types of networks globally. Our study of OCSP latency (Chapter 2) was limited to two vantage points at large universities in America, which have good connectivity and are close to some of the CDN providers. Other networks with slow connectivity might see larger OCSP latency. To confirm our results and understand the global distribution of OCSP latency, we could measure OCSP latency and usage at geographically different vantage points around the world. For example, we could potentially collect passive network traces at other locations, especially networks that have slow connectivity and are far from CDNs. The challenge is that measurement data from real populations is often difficult to obtain due to privacy concerns. Another option is to conduct active probes to OCSP servers using a global research network platform such as PlanetLab [CCR+03]. Second, we can conduct a longitudinal study of OCSP and show the trend of OCSP adoption in applications. We could potentially monitor OCSP over a longer period and publish live data on the Internet, encouraging the adoption of OCSP.
In Chapter 3, we proposed to use TCP and TLS to improve the security and privacy of DNS, and evaluated client latency and server memory requirements for DNS over TCP and TLS using trace analysis and modeling. There are three directions to strengthen this work. First, we would like to use additional datasets to confirm our results. In this work, we used DNSChanger traces to represent stub-to-recursive traffic. If the computers infected by DNSChanger (malware) were also compromised by other malware that generates many DNS queries, the analysis of connection reuse would be an over-estimate for normal users. We could potentially collect DNS traces at a recursive server at a large university to get a cleaner result. We used traces from B-Root (one of the 13 root letters) to represent a DNS root server. It would be helpful to confirm our root server memory estimation using DITL traces of other root letters, although we expect the variation in results to be modest. Second, we can optimize TLS for DNS to reduce our estimate of server memory cost. Our estimate of per-connection memory cost (150 kB) is somewhat conservative, since Google has reported only 10 kB per connection for HTTPS [Lan10]. We could look into the techniques that Google used to understand whether they can work for DNS over TLS, such as picking appropriate cipher-suites and integrating OpenSSL optimizations with different DNS server implementations. Third, we can improve our estimate of end-to-end client latency by using a wider range of round-trip times (RTTs). Our models of end-to-end client latency provide mean performance, and do not directly provide a full distribution of latency. To better understand the impact of RTT on the latency of DNS over TCP and TLS, we could obtain a distribution of real-world RTTs from a census of the Internet [HPG+08], and project a distribution of client latency using our existing models. This better estimate of latency based on the existing model complements the trace-replay experiments (Chapter 4) when a network trace is not available.
In Chapter 4, we built a configurable, general-purpose DNS experimental framework named LDplayer. We used this experimental framework to conduct large-scale experiments, further evaluating the performance of DNS over TCP and TLS with real implementations. There are three areas of immediate future work for this study. First, we would like to conduct an additional experiment with DNSSEC to demonstrate that LDplayer can accurately predict future DNS changes (suggested by an anonymous IMC 2018 reviewer). We can use LDplayer to replay a historical query trace (for example, captured in 2016), while adjusting the percentage of DNSSEC queries to the DNSSEC level in a recent trace (for example, captured in 2018). We can compare the results of trace replay to see how well the adjusted historical trace and the recent trace match. Despite the same percentage of DNSSEC queries, we expect that the replayed traffic pattern of the adjusted 2016 trace will be slightly different from the 2018 trace, because the DNSSEC zones and names could have changed. Second, we want to extend our framework to support partitioning the zones across a set of different authoritative servers. Although our current prototype of the recursive proxy talks only to a single authoritative proxy, we could modify the proxy to support multiple communication channels. Partitioning the zones across different servers can greatly improve the scalability of supporting a large number of zones. Third, we would like to add support for other experimental protocols for DNS, for example DNS over HTTPS [HM18], Datagram Transport Layer Security (DTLS) [RWP17], and QUIC. With the ability to replay traces over future protocols, LDplayer can help to evaluate the costs and benefits of different infrastructures and protocols, revealing unknown constraints.
In addition to the above areas of immediate future work that could strengthen our claims, our work suggests wider future directions beyond these three specific studies. First, future work can use the protocol optimizations (such as query pipelining and OOOP) we selected for DNS over TCP in Chapter 3 to improve the performance of other UDP-based request-response protocols. Similar to DNS over UDP, most UDP-based request-response protocols (such as the Network Time Protocol (NTP)) have the problems of source address spoofing and amplification attacks. Although TCP can mitigate spoofing and greatly reduce the amplification factor (TCP SYN-ACK vs. UDP response message), TCP raises common concerns about client latency and server memory requirements. The protocol optimizations we selected for DNS over TCP can potentially help optimize the performance of other similar request-response protocols that shift from UDP to TCP. For example, future studies can explore NTP over TCP and evaluate the client latency and server memory requirements, based on the protocol optimizations we used for DNS over TCP.
Second, future work evaluating new transport protocols for DNS, or studies that require emulating the DNS hierarchy, would benefit from the DNS experimental framework presented in Chapter 4. Our work on LDplayer showed an example of efficient emulation of the distributed server infrastructure of a network request-response protocol. By using proxies, network tunnels and controlled routing, we successfully emulated the full DNS hierarchy within a single server, instead of deploying many servers separately. LDplayer could facilitate future DNS studies that need a DNS hierarchy. For example, we can use LDplayer to study the effects of different DNS algorithms on the query load of authoritative servers. We can also evaluate the server memory requirement at a recursive server with TCP upstream to authoritative servers. With future support, LDplayer can also evaluate the costs and benefits of different experimental protocols for DNS, such as DNS over DTLS [RWP17], HTTPS [HM18] and QUIC.
Third, future studies of performance evaluation for connection-oriented request-response protocols would benefit from our analysis of connection reuse and latency modeling for DNS over TCP. Common questions for a newly deployed service include how much server memory is required and what latency to expect. In our study of DNS over TCP, we analyzed network traces and estimated server memory by studying connection reuse under different connection timeouts. We modeled TCP and TLS latency and used the parameters from trace analysis to estimate mean client latency. These procedures can be applied to performance evaluation of connection-oriented request-response protocols in general. For example, one can use a similar analysis of connection reuse and latency modeling to study the costs of Remote Procedure Call (RPC) over TCP. The model of RPC over TCP would be similar to our model of DNS over TCP, although RPC might need extra RTTs for message exchanges. The analysis of connection reuse would be similar if a network trace of RPC is available.
5.2 Conclusions
In this dissertation, we make the following thesis statement: “it is possible to improve
security of network request-response protocols without compromising performance, by
protocol and deployment optimizations that are demonstrated through measurements of
protocol developments and deployments.”
We demonstrated this thesis with measurements and experiments that evaluated the developments and optimizations of request-response protocols, and showed that security benefits can come with reasonable performance costs. Specifically, our first work showed that OCSP had low latency because of the wide use of CDNs among OCSP servers, and that applications that use certificates could reliably check certificate revocation by using OCSP. Our second work showed that we could use TCP and TLS to improve the security and privacy of DNS, and that we could achieve reasonable performance with selective protocol and implementation optimizations. In our third work, we built a DNS experimentation framework that emulates the complete DNS hierarchy, and we used this framework to demonstrate the modest performance cost of DNS over TCP and TLS at scale with real client and server implementations. Altogether, our studies support the thesis statement.
Each of our studies also has its own research contributions. Specifically, in the first work, we conducted new measurements of OCSP by examining network traffic of OCSP with broad coverage of diverse applications and users. We showed that OCSP latency had improved significantly since 2012: a median latency of only 20 ms, much less than the 291 ms reported in previous studies [SHI+12]. Our measurements showed that 94% of OCSP traffic was served by CDNs, and that OCSP use is ubiquitous. In the second work, we identified necessary protocol and implementation optimizations (such as query pipelining and OOOP) for DNS over TCP/TLS, and showed that the cost of DNS over TCP/TLS is modest. We also suggested how to run a production TCP/TLS DNS server [HZH+16]. To the best of our knowledge, we are the first to model end-to-end client latency for DNS over TCP and TLS. Our trace analysis showed that connection reuse can be frequent (60%–95% for stub and recursive resolvers, and half that for authoritative servers). We suggested appropriate connection timeouts for DNS operations: 20 s at authoritative servers and 60 s elsewhere. We showed that server memory requirements match current hardware (a large recursive resolver requires an additional 3.6 GB), and that the latency of connection-oriented DNS is acceptable (about 9%–22% slower than UDP). In the third work, we showed how to build a DNS experimentation framework that scales to efficiently emulate a large DNS hierarchy and replay large traces. We used this experimentation framework to explore how traffic volume changes (increasing by 31%) when all DNS queries employ DNSSEC. We also used trace replay
of DNS over TCP/TLS to confirm our previous modeling results. Our experiments provide strong statements about server memory (15 GB for TCP and 18 GB for TLS) and CPU usage with a real-world implementation. Our DNS experimentation framework can benefit other studies of DNS performance.
Chapter 6
Appendix for Connection-Oriented
DNS to Improve Privacy and Security
This chapter provides additional data and experiments to support Chapter 3 and our thesis. This chapter was joint work with Zi Hu, Duane Wessels, Allison Mankin, and John Heidemann. In particular, Zi Hu led the experiments studying response sizes (§6.1), the further discussion of latency (§6.4), and the measurements of RTT (§6.5).
Figure 6.1: Response sizes from name servers of the Alexa top-1000 websites (as of 2014-01-24), and UDP response sizes out of two root servers (A root and J root). Lines show the size of the response for each component and the maximum response over all components; a reference line marks 512 bytes. (Data: 2013-10-01)
6.1 Current Query Response Sizes
Figure 6.1 shows the size of responses from popular authoritative name servers and from two root DNS servers. This figure shows resolution of full domain names for the Alexa top-1000 websites. We show the distribution of responses for each component as well as the maximum seen over all components of each domain name. From this graph we see that responses today are fairly large: nearly 75% of the top 1000 result in a response that is at least 738 bytes (the DNSSEC-signed reply for .com). Resolvers today require EDNS support for large replies.
Resolution of these domain names typically requires 600–800 B replies. Many Internet paths support 1500 B packets without fragmentation, making these sizes a good match for today's network. This result is not surprising: of course DNS use is tailored to match current constraints. However, transient conditions stress these limits. Examples are the two methods of DNSSEC key rollover: with pre-published keys, DNSKEY responses grow, and with double signatures all signed responses are temporarily inflated. Both stretch reply sizes for the transition period of hours or days, during which IP fragmentation reduces the performance of affected domains (like .com) for everyone.
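As an illustration of how such response sizes can be measured, the sketch below sends a DNSSEC-enabled query and reports the wire-format reply size. It uses the dnspython library, which is an assumption of this example rather than the tool used in the original measurement, and the server address in the usage example is a placeholder.

    import dns.message
    import dns.query
    import dns.rdatatype

    def response_size(name, server, rdtype=dns.rdatatype.A):
        """Return the wire-format size (bytes) of a UDP response, asking for
        DNSSEC records with a 4096-byte EDNS0 buffer, as a resolver would."""
        q = dns.message.make_query(name, rdtype, want_dnssec=True,
                                   use_edns=0, payload=4096)
        r = dns.query.udp(q, server, timeout=3)
        return len(r.to_wire())

    # Example (hypothetical authoritative address):
    # print(response_size("com.", "192.0.2.1", dns.rdatatype.DNSKEY))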
6.2 Domain names per Web Page
To demonstrate the need for pipelining DNS queries for end-users (§3.3.1), we examined about 40M web pages (about 1.4%) from a sample of CommonCrawl-002 [Gre11]. The sample is selected arbitrarily, so we do not expect any bias. We count the number of unique domain names per page.
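A minimal version of this counting, assuming the page HTML is already in hand (the crawl-reading and relative-URL handling of the real analysis are omitted), looks like the following.

    import re
    from urllib.parse import urlparse

    # Pull absolute URLs out of href/src attributes and count distinct hostnames.
    URL_ATTR = re.compile(r'''(?:href|src)\s*=\s*["'](https?://[^"']+)["']''',
                          re.IGNORECASE)

    def unique_hostnames(html):
        hosts = set()
        for url in URL_ATTR.findall(html):
            host = urlparse(url).hostname
            if host:
                hosts.add(host.lower())
        return len(hosts)

    # Example: print(unique_hostnames(open("page.html").read()))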
Figure 6.2: Cumulative distribution of the number of unique hostnames per web page (log scale). (Dataset: 40,584,944-page sample from CommonCrawl-002 [Gre11].)
Figure 6.2 shows the results: 62% of web pages have 4 or more unique domain names, and 32% have 10 or more.
6.3 Additional Data for Server-Side Latency
Figure 6.3 shows the number of connections over the day for all three datasets, and Figure 6.4 shows the hit fraction over the day for all three datasets, expanding on the data in Figure 3.3, Figure 3.4 and Figure 3.5.
Figure 6.5 summarizes the data in Figure 6.4 by quartiles.
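The hit fraction shown in these figures can be computed directly from a query trace: a query is a connection "hit" if the same client sent another query within the time-out window before it. The sketch below is a simplified single-pass version under that definition; the real analysis also tracks the number of concurrent connections over time.

    def hit_fraction(queries, window_s):
        """queries: iterable of (timestamp_seconds, client_ip), time-sorted.
        Returns the fraction of queries that reuse an already-open connection,
        assuming each connection closes window_s seconds after its last use."""
        last_seen = {}          # client_ip -> time of that client's previous query
        hits = total = 0
        for t, client in queries:
            prev = last_seen.get(client)
            if prev is not None and t - prev <= window_s:
                hits += 1
            total += 1
            last_seen[client] = t
        return hits / total if total else 0.0

    # Example: hit_fraction(parsed_trace, window_s=20)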
6.4 Detailed Discussion of Latency
This appendix provides a detailed, case-by-case discussion of latency for our experiments, expanding on the discussion in §3.6.2 and §3.6.3.
Figure 6.3: The number of concurrent connections (and average connections per IP) for different time-out window sizes (10 s to 480 s), over 24 hours. Panels: (a) Dataset: DNSChanger, (b) Dataset: Level 3, cns4.lax1, (c) Dataset: B Root.
Figure 6.4: Server-side hit ratio (connection reuse) of queries for different time-out window sizes (10 s to 480 s), over 24 hours. Panels: (a) Dataset: DNSChanger, (b) Dataset: Level 3, cns4.lax1, (c) Dataset: B Root.
Figure 6.5: Quartile plot of server-side connection hit fraction as a function of time-out window (seconds). Panels: (a) Dataset: DNSChanger, all-to-one, (b) Dataset: Level 3, cns4.lax1, (c) Dataset: DITL/B Root.
6.4.1 Detailed Discussion of Latency: Stub-to-Recursive Resolver
We first estimate what RTTs to expect between stub and recursive resolvers, then compare protocol alternatives.
Typical Stub-to-Recursive RTTs: Stubs typically talk to a few (one to three) recursive resolvers. Recursive resolvers are usually provided as a service by one's ISP, and so they are typically nearby in the network. Alternatively, some users use third-party resolvers. These may have a higher round-trip time (RTT) from the stub, but provide a very large cache and user population.
We measure the RTT between stub and recursive resolvers from 400 PlanetLab nodes to their local (ISP-provided) resolver, and also to three third-party DNS services (Google, OpenDNS, and Level 3). For each case, we issue the same query 7 times, each after the previous reply, and report the median. We expect the first query to place the result in the cache, so the following query latency approximates the RTT. We report the median to suppress noise from interfering traffic.
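The sketch below reproduces that measurement procedure using the dnspython library (an assumption of this illustration, not the tool used in the original study): the first query warms the resolver's cache and the median of the repeats approximates the stub-to-recursive RTT.

    import statistics, time
    import dns.message
    import dns.query

    def stub_to_recursive_rtt(resolver_ip, name="example.com", repeats=7):
        q = dns.message.make_query(name, "A")
        dns.query.udp(q, resolver_ip, timeout=3)      # warm the resolver cache
        samples = []
        for _ in range(repeats):
            start = time.monotonic()
            dns.query.udp(q, resolver_ip, timeout=3)
            samples.append((time.monotonic() - start) * 1000.0)
        return statistics.median(samples)             # milliseconds

    # Example: print(stub_to_recursive_rtt("8.8.8.8"))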
The results confirm that the ISP-provided recursive resolver almost always has very low latency (80% less than 3 ms). Only a few stragglers have moderate latency (5% above 20 ms). For third-party resolvers, we see more variation, but most have fairly low latency due to distributed infrastructure. Google Public DNS provides a median latency of 23 ms and the others are only somewhat more distant. The tail of higher latency here affects more stubs, with 10–25% showing 50 ms or higher. (See Figure 6.6 for data.)
PlanetLab nodes are primarily hosted at academic sites and so likely have better-than-typical network connectivity. The observed third-party DNS latencies may therefore be lower than typical. However, with this sample of more than 300 organizations, the data provide some diversity in geography and configuration of local DNS.
TCP connection setup: stub-to-recursive: We next evaluate the costs of TCP connection setup for stub-to-recursive queries.
We emulate both a close (RTT=1 ms) and a far (RTT=35 ms) stub-to-recursive-resolver configuration. We use our custom DNS stub and the BIND-9.9.3 server with our DNS proxy. For each protocol (UDP, TCP, TLS), the stub makes 140 unique queries, randomly drawn from the Alexa top-1000 sites [Ale], with DNS over that protocol. We restart the recursive resolver before changing protocols, so each protocol test starts with a cold cache.
For each protocol we also vary several policies. On the client side we consider pipelining, sending several queries before the responses arrive, and stop-and-wait, waiting for each response before sending the next query. For processing on the server, we compare in-order processing, where queries are processed sequentially, with out-of-order processing (OOOP), where queries are processed concurrently. Connections are either reused, with multiple queries per TCP/TLS connection (p-TCP/p-TLS), or not reused, where the connection is reopened for each query.
We repeat the whole experiment 10 times and report results combining all experiments.
Figure 3.10 shows the results of these experiments. We first consider UDP compared to TCP: when queries are sent in stop-and-wait mode and the server processes them in order, TCP can almost always achieve the same latency as UDP (a: left), with (d: left) or without (b: left) connection reuse, when the RTT is small. When the RTT becomes larger, TCP without connection reuse (b: right) incurs slightly higher latency than UDP (a: right) due to the handshake cost. However, TCP with connection reuse (d: right) still achieves latency similar to UDP. This experiment demonstrates that with small client-server RTTs, TCP setup time is irrelevant; it is dwarfed by the overall cost of resolving a new name. Even with a large RTT, TCP can still match UDP latency by reusing connections.
We next consider pipelining: sending multiple queries before the replies return. In Figure 3.10 the four rightmost whiskerbars (f, g, h, i) show pipelining. First, we see that per-query resolution times are actually higher with pipelining than when done sequentially (stop-and-wait). This delay occurs because all 140 queries arrive at the server at nearly the same time, so they queue behind each other as they are processed by our 4-core computer. Second, with pipelining but in-order processing, TCP (f) has horrible latency. The reason for this high latency is that while both BIND-9 and Unbound can process multiple queries from UDP concurrently (out-of-order), they process queries from the same TCP connection sequentially (in-order), which causes head-of-line blocking: later queries are blocked by earlier ones. While correct, current resolvers are not optimized for high-performance TCP query handling.
DNS specifications require support for out-of-order queries and responses, even though current implementations do not process queries this way (see the prior discussion in §3.3.1). Here we approximate native resolver support for out-of-order TCP queries by placing a proxy resolver on the same computer as the real resolver. The proxy receives queries from the stub over TCP, then forwards them to the recursive resolver over UDP. This approach leverages current native out-of-order UDP processing and incurs no fragmentation since UDP is sent inside the same machine (over the loopback interface). The proxy then returns replies over TCP to the stub, but in whatever order the recursive resolver generates results. The light blue whiskerbar (h) on the right side of Figure 3.10 shows the effect of this improvement: TCP (h) and UDP (g) performance are again equivalent.
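A stripped-down version of such a proxy is sketched below: it reads length-prefixed DNS queries from a stub's TCP connection, forwards each over UDP to the co-located resolver, and writes back each reply as soon as it arrives, so responses may return out of order. The listening port is a placeholder, and the real proxy also handles TLS and connection timeouts.

    import socket, struct, threading

    RESOLVER = ("127.0.0.1", 53)       # co-located recursive resolver (assumed)
    LISTEN = ("0.0.0.0", 5353)         # placeholder stub-facing TCP port

    def recv_exact(conn, n):
        buf = b""
        while len(buf) < n:
            chunk = conn.recv(n - len(buf))
            if not chunk:
                return None
            buf += chunk
        return buf

    def relay(query, conn, send_lock):
        # One query: forward over UDP, send the reply back as soon as it arrives.
        u = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        u.settimeout(5)
        try:
            u.sendto(query, RESOLVER)
            reply, _ = u.recvfrom(65535)
        finally:
            u.close()
        with send_lock:                # RFC 1035 TCP framing: 2-byte length prefix
            conn.sendall(struct.pack("!H", len(reply)) + reply)

    def handle_stub(conn):
        send_lock = threading.Lock()
        try:
            while True:
                hdr = recv_exact(conn, 2)
                if hdr is None:
                    break
                query = recv_exact(conn, struct.unpack("!H", hdr)[0])
                if query is None:
                    break
                threading.Thread(target=relay, args=(query, conn, send_lock),
                                 daemon=True).start()
        finally:
            conn.close()

    def main():
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        s.bind(LISTEN)
        s.listen(128)
        while True:
            c, _ = s.accept()
            threading.Thread(target=handle_stub, args=(c,), daemon=True).start()

    if __name__ == "__main__":
        main()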
Finally, pipelining and OOOP improve aggregate throughput, and they are essential to make batch query performance approach that of UDP. While batch performance appears slower than individual performance (compare (g) to (a)), this difference arises because we estimate times from batch start and is independent of protocol (the cost of (i) relative to (c) is similar to that of (g) relative to (a)).
TLS privacy: stub-to-recursive: Connection establishment for TLS is much more expensive than for TCP, requiring additional round trips and computation to establish a session key. We repeat our experiments from the prior section, this time comparing UDP with TLS. For consistency with our out-of-order experiments, we place our proxy resolver on the same machine as the recursive resolver.
Figure 3.10 shows TLS performance as cases (c), (e) and (i), all green bars. With sequential queries: when the RTT is small, TLS (c: left) performance is almost the same as UDP (a: left) and TCP (b: left), because the TLS handshake cost (3 RTTs) is negligible relative to the cost of the recursive-to-authoritative query (tens vs. hundreds of ms). When the RTT becomes larger, TLS without connection reuse (c: right) incurs somewhat higher latency than UDP (a: right), but their performance is equivalent with connection reuse (e: right). With both pipelining and out-of-order processing, TLS performance (i) is comparable with UDP (g), regardless of whether the RTT is large. In all cases, variation in timing, as shown by the quartile boxes, is far larger than the differences in medians, although variation rises with larger RTTs.
Overall Stub-to-Recursive: In summary, this section shows that when the stub and recursive resolvers are close to each other, the extra packet exchanges add very little latency to the query, and even the TLS connection setup cost is dwarfed by the costs involved in making distributed DNS queries to authoritative name servers. Second, minimizing connection setup requires reusing connections, and we showed that head-of-line blocking in the TCP processing of current resolver implementations adds significant latency. Current resolvers have most of the machinery needed to fix this problem, and our experiments show that out-of-order processing allows DNS performance with both TCP and TLS to be very close to that of simple UDP.
6.4.2 Detailed Discussion of Latency: Recursive Resolver to
Authoritative Server
We next turn to the latency we expect between recursive resolvers and authoritative name servers. While stubs query only a few, usually nearby, recursive resolvers, authoritative servers are distributed around the globe, and so recursive-to-authoritative round-trip times are both larger and more diverse.
Typical Recursive-to-Authoritative RTTs: To estimate typical recursive-to-authoritative RTTs, we again turn to the Alexa top-1000 sites. We query each from four locations: our institution in Los Angeles (isi.edu), and PlanetLab sites in China (www.pku.edu.cn), the UK (www.cam.ac.uk), and Australia (www.monash.edu.au).
For each site we query each domain name. We use dig +trace to resolve each domain component from the root to the edge, including DNSSEC where possible. We report the median of 10 repetitions of the query time of the last step to estimate best-case recursive-to-authoritative RTTs. This method represents performance as if higher layers were already cached by the recursive resolver, and the median provides some robustness to competing traffic and random selection among multiple name servers.
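A hedged sketch of that last-step timing is shown below, again using dnspython rather than the dig-based pipeline actually used: it looks up one authoritative name server for the domain, then times direct queries to it and reports the median.

    import statistics, time
    import dns.message
    import dns.query
    import dns.resolver

    def authoritative_rtt(domain, repeats=10):
        """Median time (ms) for the final step of resolution: a direct query
        to one of the domain's authoritative servers."""
        ns_name = dns.resolver.resolve(domain, "NS")[0].target
        ns_ip = dns.resolver.resolve(ns_name, "A")[0].address
        q = dns.message.make_query(domain, "A")
        samples = []
        for _ in range(repeats):
            start = time.monotonic()
            dns.query.udp(q, ns_ip, timeout=3)
            samples.append((time.monotonic() - start) * 1000.0)
        return statistics.median(samples)

    # Example: print(authoritative_rtt("example.com"))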
We observe that the U.S. and UK sites are close to many authoritative servers, with a median RTT of 45 ms, but the distribution also has a fairly long tail, with 35% more than 100 ms. The Chinese site has generally longer RTTs, with only 30% responding in 100 ms. While many large sites operate Asian mirrors, many do not. The Australian site shows a sharp shift, with about 20% of sites less than 30 ms, while the remaining are 150 ms or longer. This jump is due to the long propagation latency for services without sites physically in Australia. (See Figure 6.7 for data.)
We see a similar shift when we look at less popular services. Examination of 1000 domains randomly chosen from the Alexa top-1M sites shows that the median RTT is 20–40 ms larger than for the top-1000 sites, with the largest shifts in China and Australia. (See Figure 6.8 for data.)
Overall, the important difference compared to stub-to-recursive RTTs is that while a few authoritative servers are close (RTT < 30 ms), many will be much farther.
TCP connection setup: recursive-to-authoritative: With noticeably larger RTTs to authoritative servers compared to the stub-to-recursive RTTs, we expect to see a much higher overhead for connection negotiation with TCP and TLS.
To evaluate query latencies with larger RTTs between client and server, we set up a DNS authoritative server for an experimental domain and queried it from a client 35 ms (8 router hops on a symmetric path) away. Since performance is dominated by round trips instead of computation, we measure latency in units of RTT, and these results generalize to other RTTs.
We operate a BIND-9.9.3 server as the authoritative name server for an experimental domain (example.com) at one site. For each protocol, we query this name server directly 140 times (querying example.com), then vary the protocol in use. As before, we repeat this experiment 10 times and report results combining all experiments.
We first compare UDP with TCP without connection reuse, the two leftmost bars (a, b) in Figure 3.11. We see that all queries made over TCP (b) take about 70 ms, exactly two RTTs, due to TCP's handshake followed by the request and response. This overhead goes away with TCP fast open (d), even without connection reuse.
With connection reuse, TCP (e) also takes only 35 ms, one RTT per query. This difference shows the importance of reusing TCP connections for multiple queries to avoid connection setup latency, highlighting the need for good connection hit ratios (§3.4).
We next consider pipelining multiple queries over a single TCP connection and supporting out-of-order processing. Basic UDP already supports both of these. To match our prior experiment, we implement these options for TCP with a proxy server running on the same computer as the authoritative server, and we plot these results as (h, light blue) in Figure 3.11. In this case, median TCP latency is about 2.5 RTTs. Examination of the raw data shows that 10% of the queries complete with performance similar to UDP, while the other queries take slightly longer, in steps. We examined packet traces and verified that each step is a single TCP packet with 12 or 13 responses. Thus the delay is due to synchronization overhead as all 140 responses, processed in parallel, are merged into a single TCP connection in our proxy. For this special case of more than 100 queries arriving simultaneously, a single connection can add some latency.
TLS privacy: recursive-to-authoritative: Next we consider the addition of TLS. Use of TLS from recursive to authoritative is a policy decision; one might consider aggregation at the recursive resolver to provide sufficient anonymity, or one might employ TLS on both hops as a policy matter (for example, as with HTTPS Everywhere [Ele11]). Here we consider the effects on latency of full use of TLS.
In Figure 3.11, the green cases (c), (f), and (i) show TLS usage. Without connection reuse (c), TLS always takes 5 RTTs (175 ms). This corresponds to one RTT to set up TCP, one to negotiate DNS-over-TLS (§3.3.2.2), two for the TLS handshake, and then the final private query and response.
However, once established, the TLS connection can easily be reused. If we reuse the existing TLS connection and send queries in stop-and-wait mode (f), TLS performance is identical to UDP, with a mean latency of one RTT, except for the first TLS query. This result shows that the expense of encryption is tiny compared to moderate round-trip delays, once we have an established connection.
Finally, when we add pipelining and out-of-order processing, we see similar stair-stepped behavior as with TCP, again due to synchronization over a single connection and our unoptimized proxy. The rightmost case (i, light green) shows connection reuse, pipelining, and out-of-order processing; with this combination, TLS performance is roughly equivalent to TCP, within measurement noise.
Overall Recursive-to-Authoritative: This section showed that round-trip latency dominates performance for queries from recursive resolvers to authoritative name servers. Latency is incurred in connection setup, with TCP adding one additional RTT and TLS three more. This latency is expensive, but it can be largely eliminated by connection reuse.
6.5 Data To Estimate Stub-to-Recursive and Recursive-to-Authoritative RTTs
We need to know typical stub-to-recursive and recursive-to-authoritative RTTs in order to accurately estimate end-to-end query latency with our model in §3.6.5. We use different methods to measure stub-to-recursive (§6.4.1) and recursive-to-authoritative (§6.4.2) RTTs.
Stub-to-Recursive RTT: Figure 6.6 shows the CDF of stub-to-recursive RTT from 400 PlanetLab nodes to the ISP-provided and three third-party recursive resolvers. ISP-provided recursive resolvers are almost always close. Third-party resolvers show more variation, but most have fairly low latency due to distributed infrastructure.
Recursive-to-Authoritative RTT: Figure 6.7 shows the CDF of recursive-to-authoritative RTT from recursive resolvers at four locations to the authoritative servers of the Alexa top-1000 domains. Different locations give different results: the U.S. and UK sites are close to many authoritative servers, while China and Australia show longer RTTs.
Figure 6.6: Measured RTTs between stub and recursive resolvers from 400 PlanetLab nodes, for the ISP-provided and three third-party recursive resolvers (Google, Level3, OpenDNS).
Figure 6.7: RTTs from recursive resolvers at four locations (Western US, China, Australia, UK) to authoritative servers of the Alexa top-1000 domains.
To understand whether top-1000 sites use better-quality DNS providers than top-1M sites, Figure 6.8 shows the recursive-to-authoritative RTTs from recursive resolvers at four locations (Los Angeles, China, the UK and Australia) to the authoritative servers of 1000 domains chosen randomly from the Alexa top-1M. The data show that top-1000 sites use better DNS providers, with lower global latency. The results are strongest when
Figure 6.8: RTTs from recursive resolvers at four locations (Western US, China, Australia, UK) to authoritative servers of 1000 domains chosen randomly from the Alexa top-1M.
viewed from China or Australia: about 20% of the top-1000 have DNS service local to Australia, but that drops to about 8% of the top-1M.
Implications for modeling: This evaluation implies that estimates of median latency based on the top-1000 sites are better than estimates based on the top-1M sites. However, the top-1000 sites get many queries. Analysis of a sample (1M queries) of Level3/cns4.lax1 shows that top-1000 sites get 20% of queries, while 80% are sent to non-top-1000 sites.
Our modeling is based on R_ra = 40 ms, drawn from Figure 6.7. Since the top-1000 represent typical common queries, this modeling is representative of what end-users will experience for typical queries. Latency will be larger for rarely used sites with poorer DNS infrastructure; our modeling underestimates the cost measured across all sites because rarely used sites are more greatly affected. We suggest that this focus is appropriate, since the Internet should be designed to support users, not site operators. It also suggests the need for anycast support in DNS hosting to lower latency for global users.
6.6 Additional Data for Client-Side Latency
Figure 6.9 shows the data that underlies Figure 3.13.
6.7 Detailed Evaluation of Deployment
Full exploration of the deployment of T-DNS is complicated by the large and varied installed base of DNS resolvers and servers, the necessity of incremental deployment, and the challenge of non-compliant implementations and possibly interfering middleboxes. We discuss these issues next in the context of our three goals: improving privacy, preventing DoS, and relaxing policy constraints.
6.7.1 Overall goals and constraints
Overall, our view is that T-DNS can be deployed gradually and incrementally. Its benefits accrue to those who deploy it at both client and server, and until both sides upgrade it is either disabled on trial or passive and unexercised. Since T-DNS is "hop-by-hop," some interfering middleboxes can essentially result in a downgrade attack, causing clients to fall back to standard DNS. Middleboxes might affect TCP and TLS differently. Policy benefits of T-DNS require widespread deployment; we discuss how partial deployment affects them below.
Effective gradual deployment requires that costs and benefits be aligned, and that costs be minimal. We show that costs and benefits are most strongly aligned for privacy and DoS, where users or operators are directly affected.
We assume clients and servers use current commodity hardware and operating systems, and DNS client and server software with the changes we suggest. An implicit cost of our work is therefore the requirement to upgrade legacy hardware, and to deploy
Figure 6.9: Cumulative distribution of client-side connection hit fraction for different time-out windows (10 s to 480 s). Panels: (a) Dataset: DNSChanger, (b) Dataset: Level 3, cns[1-4].lax1, (c) Dataset: B Root.
software with our extensions. This requirement implies full deployment concurrent with the natural business hardware upgrade cycle, perhaps 3 to 7 years. The client and server software changes we describe are generally small, and our prototypes are freely available.
Some clients today struggle or even fail to receive large responses. We believe these problems are due to overly restrictive firewalls or incorrect implementations. Netalyzr results suggest about 5% cannot do TCP [WKNP11], but this error rate is much lower than the 8% and 9% that cannot send or receive fragmented UDP. More recent data suggests only 2.6% of web users are behind resolvers that fail to retry queries over TCP [Hus13]. A 2010 study of DNSSEC through home routers showed mixed results, although full success when DNSSEC clients bypass DNS proxies in home routers [Die10]. This data suggests that TCP is the best current choice to handle large responses and that T-DNS is more deployable than the alternatives (such as fragmented UDP messages), but it will be unsuitable for a small fraction of clients (5% or less) until home equipment is reconfigured or upgraded.
We recognize that some clients and servers are more challenging. For clients, it may not be economical to field-upgrade embedded clients such as home routers. We suggest that such systems still often have an upgrade cycle, although perhaps a longer one that is driven by improvements to the wireless or wireline access networks.
Some ISPs have developed extensive and custom server infrastructure. For example, Schomp et al. identify the mesh of recursive and caching servers used by Google and similar large providers [SCRA13]. This infrastructure is proprietary, so it is difficult to speculate on their ability to support large responses internally. Since the IETF standards require support for large replies, they may already include such support, and if they employ TCP or a similar approach their cost of upgrade may be similar to T-DNS. Upgrade is more challenging if they make assumptions that cover only
a subset of the specifications, such as assuming UDP is always sufficient. However, a single provider with internal infrastructure may have upgrade paths unavailable to the decentralized Internet, such as mandating use of 9 kB UDP jumbograms.
A final complexity in resolvers is the use of load balancers and anycast. We observe that the web server community manages both of these challenges, with large-scale load balancers in cloud providers, although at some cost (for example, see [PJ13]). DNS traffic, even over TCP, is much easier to handle than web queries, since query-response traffic is much smaller than most web traffic and queries are stateless. A load balancer that tracks all TCP connections will face the same state requirements as end servers, with 50–75 k active connections in a large authoritative server. As with servers, we allow load balancers to shed load should they exceed capacity; DNS's anycast and replication can help distribute hot spots.
Anycast for DNS implies that routing changes may redirect TCP connections to servers at other sites. Since a new site lacks state about connections targeted elsewhere, a routing change causes open connections to reset, and clients must then restart. Fortunately, routing changes are relatively rare for 99% of clients [BFR06], and even the most frequent changes occur many times less often than our TCP timeout interval and so will affect relatively few connections. We also observe that some commercial CDNs (such as Edgecast [Che13]) successfully serve web traffic with anycast, strongly suggesting few problems are likely for T-DNS.
6.7.2 Improving privacy
Use of T-DNS to improve privacy requires updates to both the stub and recursive resolvers, and their ability to operate without interference. Usually the stub resolver is determined by the host operating system, but applications can use custom stub resolvers if they choose to. Recursive resolvers are usually controlled by one's ISP, but millions of users opt in to third-party, public DNS infrastructure. We anticipate that deployment for most users will occur automatically through the process of upgrades to end-host OSes and ISP resolvers. Upgrades to both ends do not need to be synchronized: T-DNS will be enabled as soon as both sides are upgraded.
Because privacy is an issue between users and their ISPs (or their providers of public DNS service), costs and benefits are well aligned: users interested in private DNS can seek it out, and providers may use privacy as a differentiator. Just as HTTPS is now widely deployed for webmail, privacy can become a feature that operators of public DNS services promote (for example, [Ele11]), justifying the higher computational cost they incur.
Interference from middleboxes is the primary threat to T-DNS privacy. We consider adversarial and accidental middleboxes. We offer no alternatives against an on-path, adversarial middlebox that intentionally disables TLS negotiation, other than to allow the client to refuse to operate if TLS is disabled. We know of no security protocol that can recover from an adversary that can drop or alter arbitrary packets.
There are several flavors of accidental middleboxes that may affect T-DNS. We expect a T-DNS-aware middlebox to receive T-DNS queries and make outgoing TCP or TLS queries, or perhaps transparently forward T-DNS that uses TLS. Our T-DNS upgrade is designed to be discarded by conforming middleboxes unaware of it, since both EDNS0 extensions and CHAOS queries are defined to be hop-by-hop and so should not be forwarded. Thus an EDNS0-conforming transparent DNS accelerator will drop the TO-bit in T-DNS negotiation, disabling T-DNS but not preventing regular DNS. A non-conforming middlebox that passes the TO-bit but does not understand TLS will attempt to interpret TLS negotiation as DNS-encoded queries. A likely outcome is that the DNS client and server will fail TLS negotiation; clients should be prepared to fall back without T-DNS in this case. The middlebox will interpret the TLS negotiation as malformed DNS packets; it should discard them if it is robust to fuzz testing [MFS90], as should all protocols that interpret packets from the public network. A less robust middlebox may crash, indicating it is likely vulnerable to buffer overruns.
Although we cannot study all possible middleboxes, we tested an unmodified version of dnsmasq, a DNS forwarder that is widely used on home routers. We confirmed that it correctly does not forward our request to upgrade to TLS, and that it does not crash but suppresses a "forced" TLS attempt. A T-DNS-aware implementation of dnsmasq is future work.
A special case of an interfering middlebox is “hotspot signon” interception. Pub-
lic networks such as wifi hotspots often require end-user identification via a web form
before opening access to the public Internet. They redirect all DNS and web traffic
to a proxy where the user self-identifies, allowing an opportunity for billing or access
control. Applications today must cope with this transient state where network access is
limited and all traffic is redirected. DNSSEC-trigger shows a possible work-around: on
network activation, it identifies DNSSEC failure, alerts the user, and retries frequently
waiting for full network access [NLn14].
We do not focus on use of TLS between recursive and authoritative servers. Recur-
sive resolvers that choose to use TLS will face similar challenges as above, and deploy-
ment of TLS across millions of authoritative resolvers will be a long-term proposition.
Fortunately, aggregation at the recursive resolver provides some degree of privacy to
stubs, so slow deployment here has relatively little privacy penalty.
6.7.3 Preventing DoS
Traditional anti-DoS methods have been challenged by needing near-full deployment
to be effective, and a mis-alignment of costs with benefits. As a result, deployment of
practices like ingress filtering [FS00] has been slow [BBHc09].
Two effects help align T-DNS deployment costs with the benefits of reducing DoS
attacks. First, DNS operators suffer along with victims in DNS amplification attacks—
DNS operators must support the traffic of the requests and the amplified responses.
Desire to constrain these costs motivates deployment of solutions at the DNS server.
Second, large DNS operators today typically deploy a great deal of overcapacity to
absorb incoming UDP-based DoS attacks. We suggest that shifting some of that capacity
from UDP to TCP can provide robustness in absorbing DoS attacks.
The main challenge in use of T-DNS to reduce DoS is the necessity of backwards
compatibility with a UDP-based client population. Authoritative resolvers must accept
queries from the entire world, and UDP-based queries must be accepted indefinitely. To
manage this transition, we support the idea of rate-limiting UDP responses, which is
already available in major implementations and in use by some DNS operators [Vix12].
A shift to TCP would allow these rates to be greatly tightened, encouraging large
queriers to shift to TCP.
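To make the rate-limiting idea concrete, the following sketch shows one possible token-bucket limiter on UDP responses; it is not the RRL algorithm of any particular implementation, and the rate, burst, and /24 aggregation are illustrative choices. Clients that exceed the limit can be sent a truncated (TC=1) reply, pushing them to retry over TCP.

import time
from collections import defaultdict

class UdpResponseLimiter:
    """Token-bucket limit on UDP responses, keyed by the client's IPv4 /24 prefix."""

    def __init__(self, rate_per_sec=5.0, burst=10.0):
        self.rate = rate_per_sec
        self.burst = burst
        self.buckets = defaultdict(lambda: [burst, time.monotonic()])

    def allow(self, client_ip):
        key = ".".join(client_ip.split(".")[:3])   # aggregate IPv4 clients by /24 (sketch only)
        tokens, last = self.buckets[key]
        now = time.monotonic()
        tokens = min(self.burst, tokens + (now - last) * self.rate)
        if tokens >= 1.0:
            self.buckets[key] = [tokens - 1.0, now]
            return True     # send the full UDP response
        self.buckets[key] = [tokens, now]
        return False        # over the limit: answer with TC=1 so the client retries over TCP

# limiter = UdpResponseLimiter()
# if not limiter.allow("203.0.113.7"):
#     ...  # hypothetical handler: reply with a truncated (TC=1) response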
A secondary challenge is optimizing TCP and TLS use at servers so that they do not
create new DoS opportunities. Techniques to manage TCP SYN floods are well under-
stood [Edd07], and large web providers have demonstrated infrastructure that serves
TCP and TLS in the face of large DoS attacks. We are certain that additional work is
needed to transition these practices to TCP-based DNS operations.
6.7.4 Removing Policy Constraints
Global changes to address policy constraints are very challenging—costs affect everyone
and benefits accrue only with near-full adoption. However, EDNS0 shows migration is
possible (although perhaps slow), from standardization in 1999 [Vix99] to widespread
use today.
EDNS0 deployment was motivated by the need of DNSSEC to send responses larger
than 512 B, and the implication of not supporting EDNS0 is reduced performance.
We suggest that T-DNS presents a similar use-case. Because all DNS implementations
require TCP today when UDP results in a truncated reply, defaulting to TCP for DNS
instead of trying UDP and failing over to TCP is largely about improving performance
(avoiding a UDP attempt and the TCP retry), not about correctness. EDNS0 suggests
we might expect a transition time of at least ten years.
6.7.5 Comparison to deployment of alternatives
Finally, it is useful to consider the deployment costs of T-DNS relative to alternatives.
DNS-over-HTTPS (perhaps using XML or JSON encodings) has been proposed as
a method that gets through middleboxes. We believe DNS-over-HTTPS has greater
protocol overheads than T-DNS: both use TCP, but use of HTTP adds a layer of HTTP
headers. It also requires deployment of a completely new DNS resolution infrastructure
in parallel with the current infrastructure. Its main advantage is avoiding concerns about
transparent DNS middleboxes that would be confused by TLS. We suggest that the
degree to which this problem actually occurs should be studied before “giving up” and
just doing HTTP. The performance analysis of T-DNS largely applies to DNS-over-
HTTPS, offering guidance about what performance should be expected.
Use of a new port for T-DNS would avoid problems where middleboxes misinterpret
TLS-encoded communication on the DNS port. It also allows skipping TLS negotiation,
saving one RTT in setup. Other protocols have employed STARTTLS to upgrade exist-
ing protocols with TLS, but an experimental study of interference on the DNS reserved
port is future work. The DNS ports are often permitted through firewalls today, so use
of a new port may avoid problems with DNS-inspecting middleboxes only to create
problems with firewalls requiring an additional open port.
6.8 Potential Vulnerabilities Raised by TCP
Wide use of TCP risks raising new vulnerabilities in DNS. We address several here.
There are a set of attacks on TLS that exploit compression to leak information about
compressed-then-encrypted text [Kel02]. Versions of these attacks against HTTPS are
known as CRIME and BREACH. For HTTPS, these attacks depend on the attacker
being able to inject input that precedes the target text in the reply. For DNS queries this
condition is not met and so CRIME does not pay.
One may be concerned that long-lived DNS TCP connections can lead to exhaustion
of TCP ports. This concern is unfounded for several reasons. The main reason is that
a TCP connection is identified by the 4-tuple of (source IP, source port, destination IP,
destination port), not by the destination port alone, which for DNS will always be fixed
at 53. The (source IP, source port) portion provides an effectively unlimited resource,
because interactions with many different servers will involve many source IPs, and
we expect individual servers to keep TCP connections open and reuse them. The worst
scenario is that an attacker can spoof 65k source ports for a single victim, but even then
the DNS server will return SYN-ACKs with SYN cookies to the victim; if such an attack
becomes common, victims could deploy countermeasures such as discarding unsolicited
SYN-ACKs. This analysis applies to DNS servers, clients, and load balancers. Addi-
tional safety comes from our approach to dealing with all resource exhaustion: a server
can always close connections to shed resources if it detects resource contention, such as
running low on free memory. Finally, the existence of large-scale web servers demon-
strates that it is clearly possible to scale to support many concurrent TCP connections,
far more than the tens of thousands of active connections we project as being required.
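One simple way to realize the shed-on-contention policy mentioned above (the connection limit here is illustrative, not a recommendation) is a least-recently-used table of open connections that closes the idlest ones once a configured limit is exceeded:

import time
from collections import OrderedDict

class ConnectionTable:
    """Track open client connections; shed the least-recently-active when over capacity."""

    def __init__(self, max_conns=25000):
        self.max_conns = max_conns
        self.last_active = OrderedDict()   # socket -> time of last activity

    def touch(self, sock):
        # Call on every query or reply; keeps the table ordered by recency.
        self.last_active.pop(sock, None)
        self.last_active[sock] = time.monotonic()
        self._shed_if_needed()

    def forget(self, sock):
        self.last_active.pop(sock, None)

    def _shed_if_needed(self):
        while len(self.last_active) > self.max_conns:
            victim, _ = self.last_active.popitem(last=False)   # least recently used
            try:
                victim.close()   # the client will reconnect later if it needs to
            except OSError:
                pass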
Attacks on the TCP sequence number space are another potential risk, with concerns
that connections could be reset [Wat04] or even data injected. T-DNS provides strong
protection against traffic injection when TLS is used. For TCP-only queries, risk of
these attacks is minimized with strong initial sequence numbers [GB12]. T-DNS clients
must be prepared to resume interrupted connections, so a successful connection reset
(an expensive proposition to guess the sequence number) will cause only a slight delay.
6.9 Relationship of T-DNS and TLS to DTLS
We have asserted that our modeling of T-DNS also applies to DNS-over-DTLS, if one
removes the RTT for the TCP three-way-handshake, because both implement the same
fundamental cryptographic protocol. In short, our argument is that There Is No Free
Lunch—DTLS must pay nearly the same costs as TLS over TCP, because both require
ordering and sequencing in the TLS handshake. We next review that argument in more
detail.
TLS Handshake: The DTLS handshake reproduces the TLS handshake, including
the assumption that messages are delivered reliably and in-order.
DTLS lacks the TCP three-way handshake, saving one RTT. However, to avoid DoS
attacks, it adds a cookie analogous to TCP SYN cookies. DTLS makes cookie exchange
optional as a separate step. If done separately it becomes equivalent to the TCP hand-
shake. If done with the first step of the TLS handshake, it avoids an extra RTT but
allows an amplification attack. Cookies can be cached and replayed, analogous to TCP
fast-open.
To provide reliability and ordering of an arbitrary-sized TLS handshake process,
DTLS adds a message sequence number and fragmentation handling to message
exchanges, recreating a subset of TCP. DTLS uses timeouts based on
TCP’s [PACS11], but the implementation is intentionally simpler (for example, omit-
ting fast retransmit). It seems unlikely that DTLS retransmission will perform better
than TCP under loss, and likely that it may perform somewhat worse.
Operation: The primary operational difference, post handshake, is that DTLS forces
use of block ciphers, not stream ciphers, and that DTLS does not force packet ordering.
Block ciphers expose some information hidden in stream ciphers, where prior traffic
affects subsequent encryption.
DTLS requires that each DTLS record fit in a single datagram, and it strives to avoid IP
fragmentation. Thus DTLS exacerbates the existing problems with large DNS replies,
adding at least 12 bytes to each packet.
TCP bytestreams allow aggregation of concurrent requests into a single packet.
Aggregation reduces per-packet overhead by sending more information in each packet.
We disable Nagle’s algorithm in our work to avoid adding latency; in experiments we
still see aggregation when multiple requests occur in bursts. This kind of aggregation is
not possible with DNS-over-DTLS (over UDP).
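A minimal sketch of this client behavior follows, assuming pre-built wire-format queries and a placeholder server address: it disables Nagle's algorithm and writes several queries back-to-back on one connection. A real client would match replies to queries by DNS message ID rather than by arrival order, since servers may answer out of order.

import socket
import struct

def recv_exact(sock, n):
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("connection closed early")
        buf += chunk
    return buf

def pipeline_queries(server, wire_queries):
    # Send several DNS queries without waiting for earlier replies, then read
    # the same number of replies; bursts of queries may share packets.
    with socket.create_connection((server, 53), timeout=5) as s:
        s.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)  # disable Nagle
        for q in wire_queries:
            s.sendall(struct.pack("!H", len(q)) + q)
        replies = []
        for _ in wire_queries:
            length = struct.unpack("!H", recv_exact(s, 2))[0]
            replies.append(recv_exact(s, length))
        return replies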
DTLS and TLS trade-offs: Because the cryptographic protocols of TLS over TCP
and DTLS over UDP are so similar, we see little performance difference between them.
There are two semantic differences: DTLS is advantageous because it imposes no order-
ing on individual requests. Thus it gets pipelining and out-of-order processing automat-
ically, just as basic UDP does; we describe how to provide those in the application for
TCP, but we still suffer from head-of-line blocking from lost packets. TCP is advan-
tageous because it imposes no per-packet size limits. We identify policy constraints
brought on by per-packet size limits as a problem, so for this reason we prefer DNS-
over-TLS and TCP over DNS-over-DTLS and UDP.
Bibliography
[AAL+05] R. Arends, R. Austein, M. Larson, D. Massey, and S. Rose. DNS Security Introduction and Requirements. RFC 4033 (Proposed Standard), March 2005.
[AAVS13] Devdatta Akhawe, Bernhard Amann, Matthias Vallentin, and Robin Sommer. Here's my cert, so trust me, maybe?: Understanding TLS errors on the web. In Proceedings of the 22nd International Conference on World Wide Web, WWW '13, pages 59–70, New York, NY, USA, 2013. ACM.
[ADF06] B. Ager, H. Dreger, and A. Feldmann. Predicting the DNSSEC overhead using DNS traces. In Annual Conference on Information Sciences and Systems, pages 1484–1489, March 2006.
[Ale] Alexa. http://www.alexa.com.
[Ano12] Anonymous. The collateral damage of internet censorship by DNS injection. SIGCOMM Comput. Commun. Rev., June 2012.
[Arb12] Arbor Networks. Worldwide infrastructure security report. Technical report, Arbor Networks, September 2012.
[Art11] Charles Arthur. DigiNotar SSL certificate hack amounts to cyberwar, says expert. http://www.theguardian.com/technology/2011/sep/05/diginotar-certificate-hack-cyberwar, September 2011.
[Bar96] D. Barr. Common DNS Operational and Configuration Errors. RFC 1912 (Proposed Standard), February 1996.
[BBHc09] Robert Beverly, Arthur Berger, Young Hyun, and kc claffy. Understanding the efficacy of deployed internet source address validation filtering. In Proceedings of the 9th ACM SIGCOMM Conference on Internet Measurement, IMC '09, pages 356–369, New York, NY, USA, 2009. ACM.
[Bel10] R. Bellis. DNS Transport over TCP - Implementation Requirements. RFC 5966 (Proposed Standard), August 2010.
[Ben11] Terry Benzel. The science of cyber security experimentation: The DETER project. In Proceedings of the 27th Annual Computer Security Applications Conference, ACSAC '11, pages 137–148, New York, NY, USA, 2011. ACM.
[Ber11] Vincent Bernat. SSL computational DoS mitigation. Blog post http://vincent.bernat.im/en/blog/2011-ssl-dos-mitigation.html, November 2011.
[BFR06] Hitesh Ballani, Paul Francis, and Sylvia Ratnasamy. A measurement-based deployment proposal for IP anycast. In Proceedings of the 6th ACM SIGCOMM Conference on Internet Measurement, IMC '06, pages 231–244, New York, NY, USA, 2006. ACM.
[BFVSK09] J. Brustoloni, N. Farnan, R. Villamarin-Salomon, and D. Kyle. Efficient detection of bots in subscribers' computers. In 2009 IEEE International Conference on Communications, pages 1–6, June 2009.
[Bha11] Sathya Bhat. Gmail Users in Iran Hit by MITM Attacks. http://techie-buzz.com/tech-news/gmail-iran-hit-mitm.html, August 2011.
[BHH+10] Andrea Bittau, Michael Hamburg, Mark Handley, David Mazières, and Dan Boneh. The case for ubiquitous transport-level encryption. In Proceedings of the 19th USENIX Conference on Security, USENIX Security '10, pages 26–26, Berkeley, CA, USA, 2010. USENIX Association.
[BKkc13] Robert Beverly, Ryan Koga, and kc claffy. Initial longitudinal analysis of IP source spoofing capability on the Internet. Internet Society, July 2013.
[BKKP00] R. Bush, D. Karrenberg, M. Kosters, and R. Plzak. Root name server operational requirements. RFC 2870, Internet Request For Comments, June 2000. (Also Internet BCP-40.)
[BMS11] Michael Butkiewicz, Harsha V. Madhyastha, and Vyas Sekar. Understanding website complexity: Measurements, metrics, and implications. In Proceedings of the ACM Internet Measurement Conference, pages 313–328, November 2011.
[Bor13a] S. Bortzmeyer. DNS privacy problem statement. Work in progress (Internet draft) draft-bortzmeyer-dnsop-dns-privacy-01, December 2013.
[Bor13b] S. Bortzmeyer. JSON format to represent DNS data. Work in progress (Internet draft) draft-bortzmeyer-dns-json-01, February 2013.
[Bor15] S. Bortzmeyer. DNS privacy considerations. RFC 7626, August 2015.
[bro] The Bro Network Security Monitor. https://www.bro.org.
[Car00] B. Carpenter. Internet Transparency. RFC 2775 (Proposed Standard), February 2000.
[CCR+03] Brent Chun, David Culler, Timothy Roscoe, Andy Bavier, Larry Peterson, Mike Wawrzoniak, and Mic Bowman. PlanetLab: An overlay testbed for broad-coverage services. SIGCOMM Comput. Commun. Rev., 33(3):3–12, July 2003.
[CCRJ14] Y. Cheng, J. Chu, S. Radhakrishnan, and A. Jain. TCP Fast Open. RFC 7413, December 2014.
[CDW06] Cristian Coarfa, Peter Druschel, and Dan S. Wallach. Performance analysis of TLS web servers. ACM Trans. Comput. Syst., 24(1):39–69, February 2006.
[CFH+13] Matt Calder, Xun Fan, Zi Hu, Ethan Katz-Bassett, John Heidemann, and Ramesh Govindan. Mapping the expansion of Google's serving infrastructure. In Proceedings of the ACM Internet Measurement Conference, page to appear, Barcelona, Spain, October 2013. ACM.
[Che13] Andre Cheung. CDN demonstration: EdgeCast application delivery network. Web page http://demo.cheungwaikin.com/adn/adn-a.html, April 2013.
[chr] Chromium. web-page-replay. https://github.com/chromium/web-page-replay.
[Clo18] Cloudflare. CNAME flattening: RFC-compliant support for CNAME at the root. https://support.cloudflare.com/hc/en-us/articles/200169056-CNAME-Flattening-RFC-compliant-support-for-CNAME-at-the-root, July 2018.
[Com11] Comodo. Comodo Fraud Incident. https://www.comodo.com/Comodo-Fraud-Incident-2011-03-23.html, March 2011.
[CPGA09] Sergio Castillo-Perez and Joaquin Garcia-Alfaro. Evaluation of two privacy-preserving protocols for the DNS. In Proceedings of the 6th IEEE International Conference on Information Technology: New Generations, pages 411–416. IEEE, April 2009.
[CSF+08] D. Cooper, S. Santesson, S. Farrell, S. Boeyen, R. Housley, and W. Polk. Internet X.509 Public Key Infrastructure Certificate and Certificate Revocation List (CRL) Profile. RFC 5280, May 2008.
[CWFC08] Sebastian Castro, Duane Wessels, Marina Fomenkov, and Kimberly Claffy. A day at the root of the Internet. ACM Computer Communication Review, 38(5):41–46, October 2008.
[DB13] Jeffrey Dean and Luiz André Barroso. The tail at scale. Communications of the ACM, 56(2):74–80, February 2013.
[Dem10] M. Dempsky. DNSCurve: Link-level security for the Domain Name System. Work in progress (Internet draft) draft-dempsky-dnscurve-01, February 2010.
[DGV13] J. Damas, M. Graff, and P. Vixie. Extension mechanisms for DNS (EDNS(0)). RFC 6891, April 2013.
[Die10] Thorsten Dietrich. DNSSEC support by home routers in Germany. Presentation at RIPE-60, May 2010.
[DLK+14] Zakir Durumeric, Frank Li, James Kasten, Johanna Amann, Jethro Beekman, Mathias Payer, Nicolas Weaver, David Adrian, Vern Paxson, Michael Bailey, and J. Alex Halderman. The matter of Heartbleed. In Proceedings of the 2014 Conference on Internet Measurement Conference, IMC '14, pages 475–488, New York, NY, USA, 2014. ACM.
[DNS17] DNS-OARC. Day In The Life of the Internet (DITL) 2017. https://www.dns-oarc.net/oarc/data/ditl/2017, April 2017.
[DOa] DNS-OARC. https://www.dns-oarc.net.
[DOb] DNS-OARC. dnsjit. https://github.com/DNS-OARC/dnsjit.
[DOc] DNS-OARC. drool. https://github.com/DNS-OARC/drool.
[DR92] Alex Delis and Nick Roussopoulos. Performance and scalability of client-server database architectures. In Proceedings of the 18th International Conference on Very Large Data Bases, VLDB '92, pages 610–623, San Francisco, CA, USA, 1992. Morgan Kaufmann Publishers Inc.
[DR08] T. Dierks and E. Rescorla. The Transport Layer Security (TLS) Protocol Version 1.2. RFC 5246 (Proposed Standard), August 2008.
[Dro97] R. Droms. Dynamic host configuration protocol. RFC 2131, Internet Request For Comments, March 1997.
[DWZ+12] Haixin Duan, Nicholas Weaver, Zongxu Zhao, Meng Hu, Jinjin Liang, Jian Jiang, Kang Li, and Vern Paxson. Hold-on: Protecting against on-path DNS poisoning. In Proc. Workshop on Securing and Trusting Internet Names, SATIN, 2012.
[Eas14] Donald E. Eastlake. Domain Name System (DNS) cookies. Work in progress (Internet draft) draft-eastlake-dnsext-cookies-04, January 2014.
[Edd07] W. Eddy. TCP SYN flooding attacks and common mitigations. RFC 4987, Internet Request For Comments, August 2007.
[Ele11] Electronic Frontier Foundation. Encrypt the web with HTTPS everywhere. Web page https://www.eff.org/https-everywhere, August 2011.
[Fac18] Facebook. http://newsroom.fb.com/company-info, 2018.
[FCW+03] Warwick S. Ford, Santosh Chokhani, Stephen S. Wu, Randy V. Sabett, and Charles (Chas) R. Merrill. Internet X.509 Public Key Infrastructure Certificate Policy and Certification Practices Framework. RFC 3647, November 2003.
[Fin14] T. Finch. Secure SMTP using DNS-based authentication of named entities (DANE) TLSA records. Work in progress (Internet draft) draft-ietf-dane-smtp-01, February 2014.
[FS00] P. Ferguson and D. Senie. Network ingress filtering: Defeating denial of service attacks which employ IP source address spoofing. RFC 2827, Internet Request For Comments, May 2000. Also BCP-38.
[FST18] D. Franke, D. Sibold, and K. Teichel. Network time security for the network time protocol. Internet-Draft draft-ietf-ntp-using-nts-for-ntp-10, Internet Engineering Task Force, March 2018. Work in Progress.
[FTY99] Theodore Faber, Joe Touch, and Wei Yue. The TIME-WAIT state in TCP and its effect on busy servers. In Proceedings of the IEEE Infocom, 1999.
[GB12] F. Gont and S. Bellovin. Defending against sequence number attacks. RFC 6528, Internet Request For Comments, February 2012.
[Goo12] Google. https://www.google.com/zeitgeist/2012/#the-world, 2012.
[Gre11] Lisa Green. Common Crawl enters a new phase. Common Crawl blog http://www.commoncrawl.org/common-crawl-enters-a-new-phase, November 2011.
[Gre13] Glenn Greenwald. NSA collecting phone records of millions of Verizon customers daily. The Guardian, June 2013.
[Haa] Herbert Haas. Mausezahn. http://netsniff-ng.org.
[HBKC11] Ralph Holz, Lothar Braun, Nils Kammenhuber, and Georg Carle. The SSL landscape: A thorough analysis of the X.509 PKI using active and passive measurements. In Proceedings of the 2011 ACM SIGCOMM Conference on Internet Measurement Conference, IMC '11, pages 427–444, New York, NY, USA, 2011. ACM.
[Hei97] John Heidemann. Performance interactions between P-HTTP and TCP implementations. SIGCOMM Comput. Commun. Rev., 27(2):65–73, April 1997.
[Hen] Addy Yeow Chin Heng. Bit-twist. http://bittwist.sourceforge.net.
[HM18] Paul E. Hoffman and Patrick McManus. DNS Queries over HTTPS (DoH). Internet-Draft draft-ietf-doh-dns-over-https-14, Internet Engineering Task Force, August 2018. Work in Progress.
[HPG+08] John Heidemann, Yuri Pradkin, Ramesh Govindan, Christos Papadopoulos, Genevieve Bartlett, and Joseph Bannister. Census and survey of the visible Internet. In Proceedings of the ACM Internet Measurement Conference, pages 169–182, Vouliagmeni, Greece, October 2008. ACM.
[HS12] P. Hoffman and J. Schlyter. The DNS-based authentication of named entities (DANE) transport layer security (TLS) protocol: TLSA. RFC 6698, Internet Request For Comments, August 2012.
[HS13] Amir Herzberg and Haya Shulman. Fragmentation considered poisonous. In Proceedings of the IEEE Conference on Communications and Network Security (CNS), October 2013.
[Hus13] Geoff Huston. A question of protocol. Talk at RIPE '67, 2013.
[HZH+14] Z. Hu, L. Zhu, J. Heidemann, A. Mankin, and D. Wessels. Starting TLS over DNS. Work in progress (Internet draft) draft-start-tls-over-dns-00, January 2014.
[HZH+16] Z. Hu, L. Zhu, J. Heidemann, A. Mankin, D. Wessels, and P. Hoffman. Specification for DNS over Transport Layer Security (TLS). RFC 7858 (Proposed Standard), May 2016.
[ICA07] ICANN. Root server attack on 6 February 2007. Technical report, Internet Corporation for Assigned Names and Numbers, March 2007.
[Int17] Internet World Stats. http://www.internetworldstats.com/stats.htm, December 2017.
[JRT02] A. Jungmaier, E. Rescorla, and M. Tuexen. Transport Layer Security over Stream Control Transmission Protocol. RFC 3436, December 2002.
[JSBM02] Jaeyeon Jung, Emil Sit, Hari Balakrishnan, and Robert Morris. DNS performance and the effectiveness of caching. ACM/IEEE Transactions on Networking, 10, October 2002.
[Kam08] Dan Kaminsky. It's the end of the cache as we know it. Presentation, Black Hat Asia, October 2008.
[Kam10] Dan Kaminsky. The DNSSEC diaries, ch. 6: Just how much should we put in DNS? Blog post http://dankaminsky.com/2010/12/27/dnssec-ch6, December 2010.
[Kam11] Dan Kaminsky. DNSSEC interlude 1: Curiosities of benchmarking DNS over alternate transports. Blog post http://dankaminsky.com/2011/01/04/dnssec-interlude-1, January 2011.
[Kel02] John Kelsey. Compression and information leakage of plaintext. In Ninth IACR International Workshop on Fast Software Encryption, pages 263–276, Leuven, Belgium, February 2002. Springer.
[KHRH14] Marc Kührer, Thomas Hupperich, Christian Rossow, and Thorsten Holz. Hell of a handshake: Abusing TCP for reflective amplification DDoS attacks. In Proceedings of the USENIX Workshop on Offensive Technologies, San Diego, CA, USA, August 2014. USENIX.
[KKC11] Ahmed Khurshid, Firat Kiyak, and Matthew Caesar. Improving robustness of DNS to software vulnerabilities. In Proceedings of the 27th Annual Computer Security Applications Conference, ACSAC '11, pages 177–186, New York, NY, USA, 2011. ACM.
[KM87] Christopher A. Kent and Jeffrey C. Mogul. Fragmentation considered harmful. In Proceedings of the ACM SIGCOMM, pages 390–401, August 1987.
[Lan10] Adam Langley. Overclocking SSL. Blog post https://www.imperialviolet.org/2010/06/25/overclocking-ssl.html, June 2010.
[Lan12] Adam Langley. Revocation checking and Chrome's CRL. https://www.imperialviolet.org/2012/02/05/crlsets.html, February 2012.
[LS12] C. Lewis and M. Sergeant. Overview of Best Email DNS-Based List (DNSBL) Operational Practices. RFC 6471 (Informational), January 2012.
[LSAB08] B. Laurie, G. Sisson, R. Arends, and D. Blacka. DNS security (DNSSEC) hashed authenticated denial of existence. RFC 5155, Internet Request For Comments, March 2008.
[LT10] Yanbin Lu and Gene Tsudik. Towards plugging privacy leaks in domain name system. In Proceedings of the IEEE International Conference on Peer-to-Peer Computing, Delft, Netherlands, August 2010. IEEE.
[LTZ+15] Yabing Liu, Will Tome, Liang Zhang, David Choffnes, Dave Levin, Bruce Maggs, Alan Mislove, Aaron Schulman, and Christo Wilson. An end-to-end measurement of certificate revocation in the web's PKI. In Proceedings of the 2015 Internet Measurement Conference, IMC '15, pages 183–196, New York, NY, USA, 2015. ACM.
[LW] Andreas Loef and Yuwei Wang. libtrace tool: tracereplay. http://www.wand.net.nz/trac/libtrace/wiki/TraceReplay.
[Mat] Matthew Dempsky. DNSCurve client tool. https://github.com/mdempsky/dnscurve.
[Mat14] Matthew Prince. Introducing CNAME flattening: RFC-compliant CNAMEs at a domain's root. https://blog.cloudflare.com/introducing-cname-flattening-rfc-compliant-cnames-at-a-domains-root, April 2014.
[Mau13] Jared Mauch. Open resolver project. Presentation, DNS-OARC Spring 2013 Workshop (Dublin), May 2013. https://indico.dns-oarc.net//contributionDisplay.py?contribId=24&sessionId=0&confId=0.
[MDL13] Wei Meng, Ruian Duan, and Wenke Lee. DNS Changer remediation study. Talk at M3AAWG 27th, February 2013.
[Met09] Cade Metz. Comcast trials (domain helper service) DNS hijacker. The Register, July 2009.
[MFS90] Barton P. Miller, Louis Fredriksen, and Bryan So. An empirical study of the reliability of UNIX utilities. Communications of the ACM, 33(12):32–44, December 1990.
[Moc87a] P. Mockapetris. Domain names—concepts and facilities. RFC 1034, November 1987.
[Moc87b] P. Mockapetris. Domain names—implementation and specification. RFC 1035, November 1987.
[MP13] John Markoff and Nicole Perlroth. Attacks used the Internet against itself to clog traffic. New York Times, March 2013.
[Nat] Jeff Nathan. nemesis. http://nemesis.sourceforge.net.
[Neta] Netcraft. Certificate revocation and the performance of OCSP. http://news.netcraft.com/archives/2013/04/16/certificate-revocation-and-the-performance-of-ocsp.html.
[Netb] Netcraft. February 2017 Web Server Survey. https://news.netcraft.com/archives/2017/02/27/february-2017-web-server-survey.html.
[Netc] Netcraft. OCSP Server Performance in April 2013. http://news.netcraft.com/archives/2013/05/23/ocsp-server-performance-in-april-2013.html.
[NGBS+97] Henrik Frystyk Nielsen, Jim Gettys, Anselm Baird-Smith, Eric Prud'hommeaux, Håkon Lie, and Chris Lilley. Network performance effects of HTTP/1.1, CSS1, and PNG. In Proceedings of the ACM SIGCOMM, September 1997.
[NLn14] NLnetLabs. DNSSEC-Trigger. Web page http://www.nlnetlabs.nl/projects/dnssec-trigger, May 2014.
[Not14] M. Nottingham. HTTP/2 frequently asked questions. Web page http://http2.github.io/faq/#why-is-http2-multiplexed, 2014.
[nss] Network Security Services. https://developer.mozilla.org/en-US/docs/Mozilla/Projects/NSS.
[NSW+14] Ravi Netravali, Anirudh Sivaraman, Keith Winstein, Somak Das, Ameesh Goyal, and Hari Balakrishnan. Mahimahi: A lightweight toolkit for reproducible web measurement. SIGCOMM Comput. Commun. Rev., 44(4):129–130, August 2014.
[OKLM12] Eric Osterweil, Burt Kaliski, Matt Larson, and Danny McPherson. Reducing the X.509 attack surface with DNSSEC's DANE. In Proceedings of the Workshop on Securing and Trusting Internet Names (SATIN), Teddington, UK, March 2012.
[Opea] OpenDNS. DNSCrypt. http://www.opendns.com/technology/dnscrypt.
[Opeb] OpenDNS. DNSCrypt proxy tool. https://github.com/opendns/dnscrypt-proxy.
[Ope06] OpenDNS. OpenDNS website. www.opendns.com, 2006.
[ORMZ08] Eric Osterweil, Michael Ryan, Dan Massey, and Lixia Zhang. Quantifying the operational status of the DNSSEC deployment. In Proceedings of the 8th ACM SIGCOMM Conference on Internet Measurement, IMC '08, pages 231–242, New York, NY, USA, 2008. ACM.
[OSRB12] John S. Otto, Mario A. Sánchez, John P. Rula, and Fabián E. Bustamante. Content delivery and the natural evolution of DNS: Remote DNS trends, performance issues and alternative solutions. In Proceedings of the 2012 Internet Measurement Conference, IMC '12, pages 523–536, New York, NY, USA, 2012. ACM.
[PACS11] V. Paxson, M. Allman, J. Chu, and M. Sargent. Computing TCP's retransmission timer. RFC 6298, Internet Request For Comments, June 2011.
[Pax99] Vern Paxson. Bro: A system for detecting network intruders in real-time. Comput. Netw., 31(23-24):2435–2463, December 1999.
[Pen14] Henk P. Penning. Analysis of the strong set in the PGP web of trust. Web page http://pgp.cs.uu.nl/plot, January 2014.
[Pet13] Y. Pettersen. The Transport Layer Security (TLS) Multiple Certificate Status Request Extension. RFC 6961, 2013.
[PFS14] Henning Perl, Sascha Fahl, and Matthew Smith. You won't be needing these any more: On removing unused certificates from trust stores. In Nicolas Christin and Reihaneh Safavi-Naini, editors, Financial Cryptography and Data Security, pages 307–315, Berlin, Heidelberg, 2014. Springer Berlin Heidelberg.
[PJ13] Rahul Potharaju and Navendu Jain. Demystifying the dark side of the middle: A field study of middlebox failures in datacenters. In Proceedings of the ACM Internet Measurement Conference, page to appear, Barcelona, Spain, October 2013. ACM.
[PPPW04] KyoungSoo Park, Vivek S. Pai, Larry Peterson, and Zhe Wang. CoDNS: Improving DNS performance and reliability via cooperative lookups. In Proceedings of the 6th Conference on Symposium on Operating Systems Design & Implementation - Volume 6, OSDI '04, pages 14–14, Berkeley, CA, USA, 2004. USENIX Association.
[PV11] M. Parthasarathy and P. Vixie. Representing DNS messages using XML. Work in progress (Internet draft) draft-mohan-dns-query-xml-00, September 2011.
[Ram09] Prem Ramaswami. Introducing Google Public DNS. Google Official Blog http://googleblog.blogspot.com/2009/12/introducing-google-public-dns.html, December 2009.
[Rei] Sean Reifschneider. 4.2.2.2: The story behind a DNS legend. http://www.tummy.com/articles/famous-dns-server.
[RM12] E. Rescorla and N. Modadugu. Datagram Transport Layer Security Version 1.2. RFC 6347, January 2012.
[Ros14] Christian Rossow. Amplification hell: Revisiting network protocols for DDoS abuse. In Proceedings of the ISOC Network and Distributed System Security Symposium, San Diego, California, USA, February 2014. The Internet Society.
[RWP17] Tirumaleswar Reddy, Dan Wing, and Prashanth Patil. DNS over Datagram Transport Layer Security (DTLS). RFC 8094, February 2017.
[SCKB06] Ao-Jan Su, David R. Choffnes, Aleksandar Kuzmanovic, and Fabián E. Bustamante. Drafting behind Akamai (Travelocity-based detouring). In Proceedings of the 2006 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications, SIGCOMM '06, pages 435–446, New York, NY, USA, 2006. ACM.
[SCRA13] Kyle Schomp, Tom Callahan, Michael Rabinovich, and Mark Allman. On measuring the client-side DNS infrastructure. In Proceedings of the ACM Internet Measurement Conference, October 2013.
[Sen12] Somini Sengupta. Warned of an attack on the Internet, and getting ready. New York Times, page B1, March 31, 2012.
[SFTM13] Srikanth Sundaresan, Nick Feamster, Renata Teixeira, and Nazanin Magharei. Measuring and mitigating web performance bottlenecks in broadband access networks. In Proceedings of the ACM Internet Measurement Conference, page to appear, Barcelona, Spain, October 2013. ACM.
[SHI+12] Emily Stark, Lin-Shung Huang, Dinesh Israni, Collin Jackson, and Dan Boneh. The Case for Prefetching and Prevalidating TLS Server Certificates. In Proc. of 19th ISOC Network and Distributed System Security Symposium, 2012.
[Shu14] Haya Shulman. Pretty bad privacy: Pitfalls of DNS encryption. In Proceedings of the 13th ACM Workshop on Privacy in the Electronic Society, pages 191–200, Scottsdale, Arizona, November 2014. ACM.
[Sim11] W. Simpson. TCP cookie transactions (TCPCT). RFC 6013, Internet Request For Comments, January 2011.
[Sin] Sinodun. DNS over TLS patch for NSD-4.1.0. https://portal.sinodun.com/stash/projects/TDNS/repos/dns-over-tls_patches/browse/nsd-4.1.0_dns-over-tls.patch.
[SLS14] Aaron Schulman, Dave Levin, and Neil Spring. RevCast: Fast, private certificate revocation over FM radio. In Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security, CCS '14, pages 799–810, New York, NY, USA, 2014. ACM.
[SMA+13] Stefan Santesson, Michael Myers, Rich Ankney, Ambarish Malpani, Slava Galperin, and Carlisle Adams. X.509 Internet Public Key Infrastructure Online Certificate Status Protocol - OCSP. RFC 6960, June 2013.
[son15] Project Sonar: IPv4 SSL Certificates. https://scans.io/study/sonar.ssl, August 2015.
[spl] Split-horizon DNS. https://en.wikipedia.org/wiki/Split-horizon_DNS.
[Sul14] Andrew Sullivan. More keys in the DNSKEY RRset at ".", and draft-ietf-dnsop-respsize-nn. DNSOP mailing list, January 2014. https://www.mail-archive.com/dnsop@ietf.org/msg05565.html.
[SZET08] J. Salowey, H. Zhou, P. Eronen, and H. Tschofenig. Transport Layer Security (TLS) Session Resumption without Server-Side State. RFC 5077, January 2008.
[tel] Telerik Fiddler. http://www.telerik.com/fiddler.
[TK] Aaron Turner and Fred Klassen. Tcpreplay. http://tcpreplay.appneta.com.
[TSH+12] Emin Topalovic, Brennan Saeta, Lin-Shung Huang, Collin Jackson, and Dan Boneh. Towards Short-Lived Certificates. In Web 2.0 Security & Privacy Workshop, 2012.
[VE06] Randal Vaughn and Gadi Evron. DNS amplification attacks. White paper at http://isotf.org/news/DNS-Amplification-Attacks.pdf, March 2006.
[Ver14] Verisign Labs. DNS over JSON prototype, 2014.
[Vix99] P. Vixie. Extension mechanisms for DNS (EDNS0). RFC 2671, Internet Request For Comments, August 1999.
[Vix09] Paul Vixie. What DNS is not. ACM Queue, November 2009.
[Vix12] Paul Vixie. Response Rate Limiting in the Domain Name System (DNS RRL). Blog post http://www.redbarn.org/dns/ratelimits, June 2012.
[VK12] P. Vixie and A. Kato. DNS referral response size issues. Work in progress (Internet draft) draft-ietf-dnsop-respsize-14, May 2012.
[WA13] P. Wouters and J. Abley. The edns-tcp-keepalive EDNS0 option. Work in progress (Internet draft) draft-wouters-edns-tcp-keepalive-00, October 2013.
[Wat04] Paul (Tony) Watson. Slipping in the window: TCP reset attacks. Presentation at CanSecWest, 2004.
[WC07] S. Woolf and D. Conrad. Requirements for a Mechanism Identifying a Name Server Instance. RFC 4892, June 2007.
[Wes16] Duane Wessels. Increasing the Zone Signing Key Size for the Root Zone. In RIPE 72, May 2016.
[WFBc04] D. Wessels, M. Fomenkov, N. Brownlee, and k. claffy. Measurements and Laboratory Simulations of the Upper DNS Hierarchy. In Passive and Active Network Measurement Workshop (PAM), pages 147–157, Antibes Juan-les-Pins, France, April 2004. PAM 2004.
[Wik] Wikipedia. Code signing. https://en.wikipedia.org/wiki/Code_signing.
[WKNP11] Nicholas Weaver, Christian Kreibich, Boris Nechaev, and Vern Paxson. Implications of Netalyzr's DNS measurements. In Proceedings of the Workshop on Securing and Trusting Internet Names (SATIN), April 2011.
[Wou14] P. Wouters. Using DANE to associate OpenPGP public keys with email addresses. Work in progress (Internet draft) draft-wouters-dane-openpgp-02, February 2014.
[WW14] W. Wijngaards and G. Wiley. Confidential DNS. Work in progress (Internet draft) draft-wijngaards-dnsop-confidentialdns-01, March 2014.
[YRS+09] Scott Yilek, Eric Rescorla, Hovav Shacham, Brandon Enright, and Stefan Savage. When private keys are public: Results from the 2008 Debian OpenSSL vulnerability. In Proceedings of the 9th ACM SIGCOMM Conference on Internet Measurement, IMC '09, pages 15–27, New York, NY, USA, 2009. ACM.
[YWLZ12] Yingdi Yu, Duane Wessels, Matt Larson, and Lixia Zhang. Authority server selection in DNS caching resolvers. SIGCOMM Comput. Commun. Rev., 42(2):80–86, March 2012.
[ZAH16] Liang Zhu, Johanna Amann, and John Heidemann. Measuring the latency and pervasiveness of TLS certificate revocation. In International Conference on Passive and Active Network Measurement, pages 16–29. Springer, 2016.
[ZCL+14] Liang Zhang, David Choffnes, Dave Levin, Tudor Dumitraș, Alan Mislove, Aaron Schulman, and Christo Wilson. Analysis of SSL certificate reissues and revocations in the wake of Heartbleed. In Proceedings of the 2014 Conference on Internet Measurement Conference, IMC '14, pages 489–502, New York, NY, USA, 2014. ACM.
[ZFHS10] Fangming Zhao, Y. Hori, and K. Sakurai. Analysis of existing privacy-preserving protocols in domain name system. In IEICE TRANSACTIONS on Information and Systems, May 2010.
[ZH18] Liang Zhu and John Heidemann. LDplayer: DNS experimentation at scale. In ACM Internet Measurement Conference, 2018.
[ZHH+15] Liang Zhu, Zi Hu, John Heidemann, Duane Wessels, Allison Mankin, and Nikita Somaiya. Connection-oriented DNS to improve privacy and security. In Proceedings of the 2015 IEEE Symposium on Security and Privacy, SP '15, pages 171–186, Washington, DC, USA, 2015. IEEE Computer Society.
[ZHS07] Fangming Zhao, Yoshiaki Hori, and Kouichi Sakurai. Analysis of privacy disclosure in DNS query. In 5th Intl. Conf. on Multimedia and Ubiquitous Engineering, Loutraki, Greece, April 2007. FTRI.
[ZWMH15] Liang Zhu, Duane Wessels, Allison Mankin, and John Heidemann. Measuring DANE TLSA deployment. In Proceedings of the 7th IEEE International Workshop on Traffic Monitoring and Analysis, pages 219–232, Barcelona, Spain, April 2015. Springer.
Abstract
The Internet has become a popular tool to acquire information and knowledge. Usually information retrieval on the Internet depends on request-response protocols, where clients and servers exchange data. Despite their wide use, request-response protocols bring challenges for security and privacy. For example, source-address spoofing enables denial-of-service (DoS) attacks, and eavesdropping of unencrypted data leaks sensitive information in request-response protocols. There is often a trade-off between security and performance in request-response protocols. More advanced protocols, such as Transport Layer Security (TLS), have been proposed to solve these problems of source spoofing and eavesdropping. However, developers often avoid adopting those advanced protocols because of performance costs such as client latency and server memory requirements. We need to understand the trade-off between security and performance for request-response protocols and find a reasonable balance, instead of blindly prioritizing one of them.

The thesis of this dissertation is that it is possible to improve the security of network request-response protocols without compromising performance, by protocol and deployment optimizations that are demonstrated through measurements of protocol developments and deployments. We support the thesis statement through three specific studies, each of which uses measurements and experiments to evaluate the development and optimization of a request-response protocol. We show that security benefits can be achieved with modest performance costs. In the first study, we measure the latency of the Online Certificate Status Protocol (OCSP) in TLS connections. We show that OCSP has low latency due to its wide use of CDNs and caching, while providing the certificate-revocation checks that secure TLS. In the second study, we propose to use TCP and TLS for the Domain Name System (DNS) to solve a range of fundamental problems in DNS security and privacy. We show that DNS over TCP and TLS can achieve favorable performance with selective optimization. In the third study, we build a configurable, general-purpose DNS trace replay system that emulates the global DNS hierarchy in a testbed and enables DNS experiments at scale efficiently. We use this system to further confirm the reasonable performance of DNS over TCP and TLS at scale in the real world.

In addition to supporting our thesis, our studies make their own research contributions. Specifically, in the first work, we conducted new measurements of OCSP by examining network traffic of OCSP and showed a significant improvement of OCSP latency: a median latency of only 20 ms, much less than the 291 ms observed in prior work. We showed that CDNs serve 94% of the OCSP traffic and that OCSP use is ubiquitous. In the second work, we selected necessary protocol and implementation optimizations for DNS over TCP/TLS, and suggested how to run a production TCP/TLS DNS server. We suggested appropriate connection timeouts for DNS operations: 20 s at authoritative servers and 60 s elsewhere. We showed that the cost of DNS over TCP/TLS can be modest. Our trace analysis showed that connection reuse can be frequent (60%-95% for stub and recursive resolvers). We showed that server memory is manageable, and that the latency of connection-oriented DNS is acceptable (9%-22% slower than UDP). In the third work, we showed how to build a DNS experimentation framework that can scale to emulate a large DNS hierarchy and replay large traces. We used this experimentation framework to explore how traffic volume changes (increasing by 31%) when all DNS queries employ DNSSEC. Our DNS experimentation framework can benefit other studies on DNS performance evaluation.