TIMELY, ACCURATE AND SCALABLE NETWORK MANAGEMENT FOR DATA CENTERS by Masoud Moshref Javadi A Dissertation Presented to the FACULTY OF THE USC GRADUATE SCHOOL UNIVERSITY OF SOUTHERN CALIFORNIA In Partial Fulfillment of the Requirements for the Degree DOCTOR OF PHILOSOPHY (COMPUTER ENGINEERING) May 2017 Copyright 2017 Masoud Moshref Javadi
Object Description
Title | Timely, accurate and scalable network management for data centers |
Author | Moshref Javadi, Masoud |
Author email | moshrefj@usc.edu;masoud.moshref.j@gmail.com |
Degree | Doctor of Philosophy |
Document type | Dissertation |
Degree program | Computer Engineering |
School | Viterbi School of Engineering |
Date defended/completed | 2016-07-29 |
Date submitted | 2017-02-01 |
Date approved | 2017-02-01 |
Restricted until | 2017-02-01 |
Date published | 2017-02-01 |
Advisor (committee chair) | Govindan, Ramesh |
Advisor (committee member) | Yu, Minlan; Annavaram, Murali |
Abstract | Managing data center networks is critical to keeping cloud services always available, fast and efficient. Today, network operators have limited tools that provide a delayed and inaccurate view of, and control over, the network. This causes long service delays and hours of service disruption before a failure is resolved, which can cost millions of dollars. The key problem is that these tools are designed in a bottom-up fashion and are driven by device limitations instead of high-level goals. This keeps human operators involved in device-level details, such as the resources allocated to each management task at each device, which prevents them from detecting all events of interest and from reacting quickly to network events. In this dissertation, we follow a top-down approach, where operators program a centralized controller based on high-level abstractions. Knowing what the operator wants, the controller uses algorithms to program switches and servers. These algorithms can quickly fine-tune the switches and servers to maintain high accuracy, react to events quickly, and leverage device optimizations and network knowledge to scale. We developed four systems following the top-down approach that configure network monitoring and control resources at switches and servers. ❧ First, we focused on providing scalable and accurate network measurement. There can be a variety of concurrent, dynamically instantiated measurement tasks in a data center. These tasks configure flow counters in commodity switches to monitor traffic. However, flow counters use TCAM memory, which is fundamentally limited, and the accuracy of a measurement task is a function of the resources devoted to it on each switch and of traffic properties that change over time. We developed DREAM, an adaptive measurement framework that allows operators to define measurement tasks with a required level of accuracy. DREAM dynamically adjusts the resources devoted to each measurement task while ensuring the user-specified accuracy. DREAM can support 30% more concurrent tasks with up to 80% more accurate measurements than static allocation. ❧ Next, we focused on measurement algorithms that use sketches. Sketches support more measurement tasks than flow counters and use hash tables that run on cheaper switch memory (SRAM). However, their counters may have random errors caused by hash collisions in the hash tables. We developed SCREAM, a measurement framework that adaptively allocates SRAM to measurement tasks, taking the probability of such errors into account while ensuring the user-specified level of accuracy. SCREAM can support 2x more tasks with higher accuracy than state-of-the-art static allocation. ❧ Third, we explored timely, accurate and scalable monitoring using the CPU resources and programmability of end-hosts (servers). We proposed Trumpet, an event monitoring system that translates a network-wide event into per-end-host triggers, monitors every packet, and aggregates triggers into an event at millisecond timescales. Through careful design, Trumpet can evaluate triggers by inspecting every packet at full line rate even on future generations of NICs, scale to thousands of triggers per end-host while bounding packet processing delay to a few microseconds, and report events to a controller within 10 milliseconds, even in the presence of attacks. ❧ Finally, we developed vCRIB to achieve scalable and accurate network control for data centers. Cloud operators increasingly need more fine-grained rules to better control individual network flows for various traffic management policies. vCRIB provides the abstraction of a centralized rule repository, so operators need not worry about where to place rules or about the resources available at the target switch or server. vCRIB automatically places rules on switches and servers with enough resources, on or off the shortest path of flows, while respecting the semantics of the rules. Moreover, vCRIB minimizes the traffic overhead induced by rule placement in the face of traffic changes and VM migration. |
Keyword | computer networks; data center; network management; network measurement; software defined networking; sketches; resource allocation |
Language | English |
Part of collection | University of Southern California dissertations and theses |
Publisher (of the original version) | University of Southern California |
Place of publication (of the original version) | Los Angeles, California |
Publisher (of the digital version) | University of Southern California. Libraries |
Provenance | Electronically uploaded by the author |
Type | texts |
Legacy record ID | usctheses-m |
Contributing entity | University of Southern California |
Rights | Moshref Javadi, Masoud |
Physical access | The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the author, as the original true and official version of the work, but does not grant the reader permission to use the work if the desired use is covered by copyright. It is the author, as rights holder, who must provide use permission if such use is covered by copyright. The original signature page accompanying the original submission of the work to the USC Libraries is retained by the USC Libraries and a copy of it may be obtained by authorized requesters contacting the repository e-mail address given. |
Repository name | University of Southern California Digital Library |
Repository address | USC Digital Library, University of Southern California, University Park Campus MC 7002, 106 University Village, Los Angeles, California 90089-7002, USA |
Repository email | cisadmin@lib.usc.edu |
Filename | etd-MoshrefJav-4980.pdf |
Archival file | Volume13/etd-MoshrefJav-4980.pdf |