A function-based methodology for evaluating resilience in smart grids
A FUNCTION-BASED METHODOLOGY FOR EVALUATING
RESILIENCE IN SMART GRIDS
By
Anas Al Majali
A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(COMPUTER ENGINEERING)
December 2014
Copyright 2014 Anas Al Majali
Defense Committee
Dr. Clifford Neuman (Committee chair), Computer Science, USC
Dr. Viktor Prasanna (Committee co-chair), Electrical Engineering, USC
Dr. Mohammed Beshir, Electrical Engineering, USC
Dr. William G.J. Halfond (External Faculty), Computer Science, USC
Dedication
To my beloved parents, wife and little angel Zaina …
Acknowledgments
First, I would like to thank my advisor and the chair of my defense committee,
Professor Clifford Neuman, for giving me the opportunity to work under his supervision.
I would also like to thank Professor Neuman for giving me the opportunity to work as a
research assistant in the cyber security team of the Los Angeles Department of Water and
Power Smart Grid Regional Demonstration Project (SGRDP). Besides my advisor, I
would like to thank the members of my defense: Professor Viktor Prasanna, Professor
Mohammed Beshir and Professor William G.J. Halfond. In addition to the defense
members, I would like to thank Professor Murali Annavaram and Professor John
Silvester for serving on my qualification exam committee.
My sincere thanks also go to the members of the SGRDP cyber security team at
the Information Sciences Institute (ISI) for their useful discussions and comments: Joe
Touch, Goran Scuric, Tatyana Ryutov and Greg Finn. I especially thank my colleague in
the same team, Arun Viswanathan, for the lengthy and detailed discussions and
comments on my work. I would like to acknowledge Alba Regalado-Palacios for her
administrative support at ISI. I would also like to acknowledge Eric Rice from the Jet
Propulsion Laboratory (JPL) for the detailed discussions and useful collaboration that led to
parts of this work. My thanks also go to Kymie Tan from JPL for her useful feedback on
parts of this work.
I would like to thank the Hashemite University (Jordan) for sponsoring the first
four years of my graduate studies at USC. My research was also supported by the United
States Department of Energy under Award Numbers DE-OE000012 with the Los
Angeles Department of Water and Power (LA DWP), and DE-OE0000199 with Southern
California Edison, and by the Department of Homeland Security and the Department of
the Navy under Contract Number N66-001-10-C-2018.¹
I would like to acknowledge my friend and roommate for four years, Waleed
Dweik, with whom I shared the good and tough times pursuing the doctorate degree. I
would also like to thank all the friends in Los Angeles who made living away from home
and family possible.
Finally, I am sincerely grateful to my beloved parents for their constant support
and patience. Your constant encouragement made this possible. I would like to express
my gratitude to my beloved wife, Amal, who joined me midway through my doctoral studies.
Having you beside me gave me the strength to finish what I had started.
¹ This dissertation is based upon work supported by the United States Department of Energy under Award Numbers
DE-OE000012 and DE-OE0000199, provided through the Los Angeles Department of Water and Power and Southern
California Edison, and by the Department of Homeland Security and the Department of the Navy under Contract No.
N66-001-10-C-2018. Neither the United States Government nor any agency thereof, the Los Angeles Department of
Water and Power, Southern California Edison, nor any of their employees make any warranty, express or implied, or
assume any legal liability or responsibility for the accuracy, completeness, or usefulness of any information, apparatus,
product, or process disclosed, or represents that its use would not infringe privately owned rights. Reference herein to
any specific commercial product, process, or service by trade name, trademark, manufacturer, or otherwise does not
necessarily constitute or imply its endorsement, recommendation, or favoring by the United States Government or any
agency thereof. The views and opinions of authors expressed herein do not necessarily state or reflect those of the
United States government or any agency thereof. Figures and descriptions are provided by the authors and used with
permission.
Contents
Dedication
Acknowledgments
List of Figures
List of Tables
Abstract
Chapter 1 Introduction
1.1 Problem Statement
1.2 Smart Grid Resiliency
1.3 Failures in the Smart Grid
Chapter 2 Related Work
2.1 Qualitative Assessment Approaches
2.2 Quantitative Assessment Approaches
2.3 Qualitative and Quantitative Combination
2.4 Related Work Analysis
Chapter 3 Function-based Resilience Evaluation Methodology
3.1 Objective
3.2 Smart Grid Resilience Evaluation Requirements
3.3 Methodology Description
3.3.1 Identify the function under study and the functions and components on which it depends
3.3.2 Create attack tree
3.3.3 Perform sensitivity analysis based on the first two steps
3.3.4 Analyze a bottom-up attack scenario
3.4 Discussion
Chapter 4 Use Case 1: Load Drop Attack
4.1 System Description
4.2 Applying the Function-based Resilience Evaluation Methodology
4.2.1 Identify the function under study and the functions and components on which it depends
4.2.2 Create attack tree
4.2.3 Perform sensitivity analysis based on the first two steps
4.2.4 Analyze a bottom-up attack scenario
4.3 Results Analysis
Chapter 5 Use Case 2: Communication Architecture Resilience
5.1 System Description
5.2 Applying the Function-based Resilience Evaluation Methodology
5.2.1 Identify the service under study and the services and components on which it depends
5.2.2 Create attack tree
5.2.3 Perform sensitivity analysis based on the first two steps
5.2.4 Analyze a bottom-up attack scenario
5.3 Results Analysis
Chapter 6 Use Case 3: DR as Spinning Reserve
6.1 System Description
6.2 Applying the Function-based Resilience Evaluation Methodology
6.2.1 Identify the function under study and the functions and components on which it depends
6.2.2 Create attack tree
6.2.3 Perform sensitivity analysis based on the first two steps
6.2.4 Analyze a bottom-up attack scenario
6.3 Results Analysis
Chapter 7 Discussion
7.1 Thesis Discussion
7.2 Related Work Comparison
7.2.1 Cyber-physical Security Domain
7.2.2 Environmental Hazards and Socio-technical Systems
7.3 Limitations
Chapter 8 Conclusion and Future Work
8.1 Summary
8.2 Contributions
8.3 Future Work
8.4 Concluding Remarks
References
List of Figures
Figure 1.1: The three time stages of system resilience.
Figure 2.1: High level overview of related work objectives and approaches.
Figure 2.2: Control system abstraction where y is the measurement of a sensor in the physical system and u is the control signal [27].
Figure 2.3: Cyber to physical bridge proposed by Stamp et al. [32].
Figure 3.1: A functional view of the smart grid layers.
Figure 3.2: Fault/attack tree for total system shutdown.
Figure 4.1: Smart Grid system model consists of four elements: (1) the head end at the utility for smart meter management, (2) ‘N’ RF wireless mesh networks of ‘m’ smart meters each, (3) a neighborhood model that defines meter and load distribution and, (4) a model of the power system.
Figure 4.2: Wireless mesh network configurations. Dots represent meters placed on a regular grid. Meters could be one of residential, industrial or commercial types as shown by different colors. Multiple dots clustered together represent multi-unit structures with many meters at the same location.
Figure 4.3: IEEE 9-bus power model.
Figure 4.4: Contour plot showing the maximum system frequency measured in the power simulation for different magnitudes and duration of load drops. The black dots represent cases that resulted in system shutdown (zero power output from generators).
Figure 4.5: Flow chart that demonstrates how a bottom-up load drop attack was simulated.
Figure 4.6: Wireless mesh simulation results showing command delivery rates for smart meters within an RF Mesh for different command spacing intervals.
Figure 4.7: Error analysis for the 100 ms and 150 ms command spacing intervals produced by varying the command ordering.
Figure 4.8: Integration of power and wireless simulation results showing resulting interactions between the two systems.
Figure 5.1: Geographical image of the simulated region. Each house in the image has one meter and the star represents the wireless router in the center of the region.
Figure 5.2: Attack tree for remote metering and DR combined.
Figure 5.3: Plots of performance metrics with the network under an active DoS attack. The experiment configuration is the baseline configuration of 250 meters, meter sending interval set to 900 seconds, and the routing protocol as AODV. Each figure represents two cases: 5% and 10% compromised meters.
Figure 6.1: Smart Grid system model consists of four elements: (1) the head end at the utility for smart meter management, (2) ‘N’ RF wireless mesh networks of ‘m’ smart meters each, (3) a neighborhood model that defines meter and air conditioners and, (4) a model of the power system.
Figure 6.2: Attack tree for DR as spinning reserve function.
Figure 6.3: Frequency of the system after 100 seconds of a contingency (y-axis) when varied load responds (x-axis) to the DR request.
Figure 6.4: The impact of the DoS attack when DR responds to a 16MW contingency. On average, 34% of the load responded with error bars as shown in the figures.
List of Tables
Table 1.1: Properties of smart grids, modeling requirements and research challenges [12].
Table 2.1: Summary of the related work approach, domain, metrics and the systematic approach.
Table 3.1: Properties of smart grids, modeling and resilience evaluation requirements and research challenges.
Table 4.1: Neighborhood model of meter and load distribution.
Table 6.1: Neighborhood model of meter and air conditioners distribution.
Table 6.2: DR load curtailment customer and load distribution for a 16MW contingency.
Table 7.1: Resilience quantification metrics for each use case scenario.
Abstract
Utilizing communication, control and computation technologies in the modern
smart grid can enhance the reliability of the smart grid, reduce electricity costs and
provide new real-time customer services. While utilizing those technologies can be
beneficial to customers and utilities, they also make the smart grid susceptible to new
types of attacks and failures. One of the main characteristics required of modern
smart grids is the ability to operate resiliently in the presence of attacks and other disturbances.
Evaluating the resilience of the smart grid has been a topic of interest in recent years.
Researchers in the environmental hazards domain use stochastic and statistical methods
to evaluate smart grid resilience. However, those techniques do not always apply when
evaluating resilience in the presence of malicious sources. On the other hand, researchers
in the cyber-security domain evaluate resilience of smart grids in an ad hoc fashion or
rely on risk assessment methodologies to do the evaluation.
In this work, we introduce a systematic and comprehensive function-based
methodology that can be used to evaluate the resilience of the smart grid to failures that
are caused by malicious sources. This methodology consists of four main steps. The first
two steps represent the modeling part of the methodology whereas the last two steps
represent the experimental evaluation of those models. First, we start by identifying the
function under study and the functions and components on which it depends. By doing
this, we scope the evaluation process to a single function at a time. By exploiting those
dependencies, an attack tree is created to abstract the consequences of multiple attacks
and demonstrate how attacks propagate between different domains. In the experimental
part of the methodology, we use simulation tools to evaluate the resilience of the function
under study in the presence of cyber-physical attacks. Based on the evaluation process,
the main factors that affect resilience of a certain functionality can be determined. The
resilience of the system is quantified by identifying the dependability limits to which the
system can withstand variations caused by attacks. Knowing this type of information
helps in deriving security policies and designing security components that govern the
behavior of the system and keep it resilient.
The usefulness of the function-based methodology is demonstrated by three
novel use cases: 1) the cyber-physical threat of a load drop attack, 2) the cyber threat of a Denial
of Service (DoS) attack on the communication architecture of the smart grid, and 3)
cyber-physical threats to demand response when used as spinning reserve in the smart
grid. The resilience of the smart grid in the presence of these three threats was evaluated.
In the load drop attack use case, we evaluated the impact of a sudden load drop on
the power delivery functionality and frequency of the system. The results identify the
maximum load that the system can withstand if dropped within a certain time. In the
second use case, we evaluated the impact of a DoS attack on the remote metering and
demand response functionalities. The DoS attack is performed in the customers’
neighborhood area where smart meters communicate with the utility through an RF mesh
network. The results showed that an attacker needs to compromise only a small
fraction of the meters in a typical RF mesh region to disrupt the communication resilience
within the region. The results demonstrate that disrupting the communication resilience
caused remote metering and demand response failures. Finally, we evaluated the
resilience of the system when demand response is used as spinning reserve. When there is
a power contingency, demand response curtails a certain amount of load to stabilize the
system at 60 Hz. In this use case, we analyze the stability of the system when demand
response is under attack. The results identify the minimum amount of load that should
respond to a power contingency to stabilize the frequency of the system.
Resilience evaluation is done by creating a boundary of acceptable system
dependability in the presence of the malicious attacks. System dependability is measured
based on the function-specific metrics used in each use case. Our results can be used to
derive security policies and our function-based methodology will be useful for the
evaluation of additional use cases in the smart grid and other cyber-physical systems.
Chapter 1
Introduction
Utilizing communication, control and computation technologies in the modern
smart grid can enhance the reliability of the smart grid, reduce electricity costs and
provide new real-time customer services [1] [2] [3]. However, these enhancements create
new cyber-physical threats that are exploitable by malicious entities to disrupt smart grid
operations at a large scale. According to the National Institute of Standards and
Technology (NIST) [4], smart grids are “next-generation electrical grids that attempt to
predict and intelligently respond to the behavior and actions of all electric power users
connected to it—suppliers, consumers and those that do both—in order to efficiently
deliver reliable, economical and sustainable electricity services.” The main characteristics
and benefits of the smart grid as identified by The U.S. Department of Energy (DoE)
Smart Grid System Report [5] can be summarized as follows:
1. Enabling informed participation by customers.
2. Accommodating all generation and storage options.
3. Enabling new products, services, and markets.
4. Providing the power quality for the range of needs.
5. Optimizing asset utilization and operating efficiently.
6. Operating resiliently to disturbances, attacks, and natural disasters.
Utilizing communication, control and computation technologies in the modern
smart grid can help achieve these characteristics. For example, adding smart metering
systems and other devices to collect critical information from customer premises will
assist power utilities in better decision making which improves the overall reliability of
the smart grid. A customer plugging in an electric vehicle (EV) and programming it to
charge during off-peak hours is an example of how smart grid capabilities promise to
reduce electricity costs [1].
These enhancements also create new cyber-physical vulnerabilities that are
exploitable by malicious entities to disrupt smart grid operations at a large scale. For
example, some electric vehicles offer a smart phone interface that enables remote control
over vehicle charging and discharging. An attacker gaining malicious control of this
interface for a large number of EVs can trigger simultaneous charging to create peak
loads on the power grid [6].
As indicated by the sixth characteristic of the DoE Smart Grid Report and other
reports from NIST and NETL [4] [7], the smart grid is required to be resilient to
disturbances, attacks and natural disasters. To fulfill this requirement, a comprehensive
resilience evaluation methodology is needed. In this dissertation, we introduce a
comprehensive methodology to evaluate the resilience of the smart grid to cyber-physical
threats. In addition, we answer some of the emerging questions in this domain, such as: What
is smart grid resilience? How can smart grid resilience be quantified? What is missing from the
current evaluation methodologies?
1.1 Problem Statement
In this section, we first present the thesis statement and then discuss the questions
that we want to answer in this dissertation. Finally, we provide a brief description of each
chapter in this dissertation.
Thesis Statement:
Given a smart grid design and the set of functions it provides: Can we quantify
the resilience of the given smart grid system to failures that are caused by malicious
sources using the function-based evaluation methodology?
By quantifying we mean identifying the conditions under which resilience is
achieved in the presence of attacks. The input to this methodology is a list of smart grid
functions and services that need to be assessed. The output is a correlation between
measures of function dependability in the presence of attack and measures of attack that
the system can tolerate. By performing this correlation, the conditions under which
resilience is achieved are identified.
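The correlation described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function names, the linear toy model standing in for a power simulation, and the 61.0 Hz acceptance limit are hypothetical and are not taken from the dissertation's experiments. The sketch sweeps an attack measure, records a function-specific dependability metric at each level, and reports the largest attack level that the function tolerates.

```python
from typing import Callable, Iterable, Optional


def tolerance_boundary(
    attack_levels: Iterable[float],
    measure: Callable[[float], float],
    is_acceptable: Callable[[float], bool],
) -> Optional[float]:
    """Scan attack levels in increasing order and return the last level that is
    still acceptable before the first unacceptable one (None if none is)."""
    boundary = None
    for level in sorted(attack_levels):
        if not is_acceptable(measure(level)):
            break
        boundary = level
    return boundary


def toy_peak_frequency_hz(load_drop_mw: float) -> float:
    """Hypothetical stand-in for a simulation: peak frequency grows linearly
    with the amount of load dropped. Not a real power system model."""
    return 60.0 + 0.008 * load_drop_mw


# Sweep load drops from 0 to 200 MW in 10 MW steps (illustrative values only).
boundary_mw = tolerance_boundary(
    attack_levels=range(0, 201, 10),
    measure=toy_peak_frequency_hz,
    is_acceptable=lambda freq_hz: freq_hz <= 61.0,  # assumed acceptance limit
)
print(f"Largest tolerated load drop in the toy model: {boundary_mw} MW")
```

In a real evaluation, the toy model would be replaced by the simulation of the function under study, and the acceptance test by the metric that defines an unacceptable failure for that function.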
Smart grids are susceptible to failures that are caused by malicious or non-
malicious sources [8] [9] [10]. An evaluation process is required to evaluate the resilience
of the smart grid especially under cyber-attacks which are becoming more prominent [9]
[10]. The following list of questions summarizes the challenges faced in doing this work:
1. What is smart grid resilience? There is a need to have a clear definition of smart
grid resilience. Smart grid resilience is mentioned, directly or indirectly, as a required characteristic of
the modern grid in several government reports [4] [7] [11].
However, a precise description of this characteristic is not always
provided. In addition, the smart grid is a system that combines systems from
several disciplines (e.g. power systems and information technology). Each
discipline has its own definition of resilience which may lead to confusion when
using the term resilience.
2. What constitutes a failure for a smart grid? As smart grids are required to be
resilient to failures that are caused by attacks or faults, there is a need to
demonstrate what constitutes an unacceptable failure in the smart grid and its
subsystems.
3. How to quantify and demonstrate the resiliency of smart grids under cyber-
physical threats? Answering this question constitutes the main contribution of this
work. Utilizing computation, control and communication technologies in the
smart grid changed the way power grids operate which requires assigning a proper
set of metrics to measure the resilience of the smart grid and its services.
Evaluating the resilience of smart grids in the presence of attacks and faults is
challenging because of the characteristics shown in Table 1.1 [12]:
Smart grid properties: Large, Complex, Heterogeneous
Requirements: Scalability, Hierarchical, Modular
Research challenges: Abstraction, Hybrid models
Table 1.1: Properties of smart grids, modeling requirements and research challenges [12].
a. The smart grid is a large complex system constructed from multiple
subsystems (system of systems) that are interacting with each other. Each
system performs a set of functions and each function is performed to
achieve a set of objectives. The effect of these functions may be local
within a single system or may cross system boundaries affecting other
systems. The large scale, federated, and complex nature of the smart grid
requires an evaluation methodology that is able to abstract the system at
different levels.
b. The smart grid is a heterogeneous system that contains both cyber and
physical subsystems. This requires that a combination of different models
be used to represent the various components of the system at different
granularity [12].
4. How to improve the resilience of the smart grid based on the function-based
evaluation methodology? Cyber security components should be designed to
improve the resiliency of the smart grid. The design of these components should
be influenced by the outcomes of the resilience evaluation methodology. In addition,
resilience evaluation should continuously be used to assess the efficiency of any
component in enhancing the resilience of smart grids.
Based on the above challenges, this work is expected to make the following
contributions:
1. A clear definition of smart grid resiliency based on a survey of the literature.
2. A clear characterization of what constitutes an unacceptable failure for a certain
smart grid function and the metrics used to measure its resilience.
3. A methodology to experimentally and analytically quantify and demonstrate
smart grid resiliency in the presence of cyber-physical threats within a system of
systems context. This methodology is expected to be repeatable as demonstrated
by three use cases.
Finally, the results of the evaluation methodology can be used to aid the design of
cyber security components like Detection, Diagnosis and Remediation (DDR). They can
also be used to define proper policies that can be used to govern the system.
Those objectives coincide with the goal of the Cyber Security (CS) team in the
Smart Grid Regional Demonstration Project (SGRDP) which aims to make the smart grid
resilient to failures that are caused by malicious (attacks) or non-malicious sources.
While our work in this dissertation is focused on smart grid resilience evaluation, it also
applies to other cyber-physical systems and critical infrastructures.
In Chapter one, we introduce the problem statement, provide a definition of smart
grid resilience and finally discuss the concept of function failure in the smart grid. The
related work is presented and analyzed in Chapter two. The function-based resilience
evaluation methodology is presented in Chapter three. In the next three Chapters, three
use cases are used to demonstrate the usefulness of the methodology. In Chapter seven,
the dissertation is discussed. Finally, the conclusion and future work are presented in
Chapter eight.
1.2 Smart Grid Resiliency
Resilience has been defined by researchers in different fields of study in the
literature [13] [14] [15] [16] [17] [18]. Holling’s definition of resilience [14], which is in
the field of ecology, is typically cited as one of the earliest efforts in this field. As defined
by Holling, resilience is “the ability of systems to absorb changes of state variables,
driving variables, and parameters, and still persist.” In the field of transportation security,
Cox et al. [19] define resilience as “the ability of a system to maintain function and to
bounce back quickly from a disturbance”. This definition includes the two types of
resilience identified by Rose [17]: static resilience which is “the ability of an entity or
system to maintain function when shocked” and dynamic resilience which is “the
speed at which an entity or system recovers from a severe shock to achieve a desired
state.”
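The distinction between the two types of resilience can be made concrete with a small sketch. The operationalization below is an assumption made for illustration and is not taken from Rose [17] or from this dissertation: static resilience is approximated as the fraction of nominal function retained at the worst point of a performance trace, and dynamic resilience as the number of samples needed to return to within a tolerance of the desired state. The trace values and function names are hypothetical.

```python
from typing import Optional, Sequence


def static_resilience(performance: Sequence[float], nominal: float) -> float:
    """Fraction of nominal function retained at the worst point of the trace."""
    return min(performance) / nominal


def recovery_time(
    performance: Sequence[float],
    nominal: float,
    shock_index: int,
    tolerance: float = 0.05,
) -> Optional[int]:
    """Samples elapsed from the first degraded sample (shock_index) until
    performance is back within `tolerance` of nominal; None if it never is."""
    threshold = (1.0 - tolerance) * nominal
    for i in range(shock_index, len(performance)):
        if performance[i] >= threshold:
            return i - shock_index
    return None


# Hypothetical trace: percent of function delivered per time step.
# The shock first degrades the function at index 2.
trace = [100.0, 100.0, 40.0, 55.0, 70.0, 90.0, 98.0, 100.0]
print(static_resilience(trace, nominal=100.0))
print(recovery_time(trace, nominal=100.0, shock_index=2))
```

A higher retained fraction indicates better static resilience, and a shorter recovery time indicates better dynamic resilience; the two can vary independently for the same system.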
Bruneau et al. [13] provide a comprehensive and detailed analysis of seismic
resilience of communities. According to Bruneau et al. resilience is the “ability of a
system to reduce the chances of a shock, to absorb a shock if it occurs, and to recover
quickly after a shock.” In addition, Bruneau et al. conceptualize four dimensions to
resilience: technical, organizational, social and economic. As our work is on smart grids,
we are interested in the technical dimension of resilience which refers to “the ability of
physical systems to perform to acceptable/desired levels when subject to earthquake
forces.” This definition closely matches the one provided by Laprie [15].
Laprie defines resilience as the persistence of dependability when facing changes.
Dependability of a system that is used in this context is defined by Avizienis et al. [20] as
the ability of a system to avoid service failures that are more frequent and more severe
than is acceptable. Laprie used this definition of resilience in the context of large,
networked, complex ubiquitous systems. An earthquake can be considered a special case
of a “change” in the system’s environment.
Within the context of smart grids, we believe that Laprie’s definition contains all
the required elements:
1. System: the smart grid can be thought of as the modern vision of the current
electricity infrastructure or the power grid. In this vision, the smart grid utilizes
communication, control and computation technologies to enhance the services
provided by it.
2. Function: a function refers to any task or mission of a certain component or
system in the smart grid. While the main function of the smart grid is power
delivery, other functions are also provided to customers like remote metering and
load management mainly through the Advanced Metering Infrastructure (AMI) and
Demand Response (DR).
3. Failure: as power delivery is the main function of the smart grid, the main failure
would be a power outage. A power outage can be caused by a variety of reasons.
However, in this dissertation we focus on failures that are caused by cyber-
physical threats in the smart grid. We want to emphasize that we are not limiting
ourselves to power outage failures. We also consider failure in providing new
smart grid services to customers.
4. Acceptable function failure: we should identify the proper metrics to decide
whether a failure is acceptable or not. These metrics should capture the severity
and the frequency of failures. If a given measurement crosses a certain threshold,
then this means that the system is not able to maintain its functions. As we
elaborate later in the next subsection, we derive these metrics from the higher
level functions of the smart grid. For example, the severity of a power outage can
be measured by System Average Interruption Duration Index (SAIDI) and the
frequency of a power outage can be measured by System Average Interruption
Frequency Index (SAIFI).
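As an illustration of how the two indices named in item 4 could feed an acceptability check, the following is a minimal Python sketch. It uses the standard definitions of the indices (SAIDI as total customer interruption duration divided by customers served, SAIFI as total customer interruptions divided by customers served). The `Outage` record, the sample data and the threshold arguments are hypothetical and are not taken from this dissertation.

```python
from dataclasses import dataclass


@dataclass
class Outage:
    """One sustained interruption event (hypothetical record layout)."""
    customers_interrupted: int
    duration_minutes: float


def saifi(outages, customers_served):
    """Average number of interruptions per customer served."""
    return sum(o.customers_interrupted for o in outages) / customers_served


def saidi(outages, customers_served):
    """Average interruption duration (minutes) per customer served."""
    total = sum(o.customers_interrupted * o.duration_minutes for o in outages)
    return total / customers_served


def is_acceptable(outages, customers_served, max_saidi, max_saifi):
    """True when both severity (SAIDI) and frequency (SAIFI) stay within limits."""
    return (saidi(outages, customers_served) <= max_saidi
            and saifi(outages, customers_served) <= max_saifi)


# Hypothetical data: two outages in a region serving 1000 customers.
outages = [Outage(200, 90.0), Outage(50, 30.0)]
customers = 1000
print(saifi(outages, customers))  # 0.25 interruptions per customer
print(saidi(outages, customers))  # 19.5 minutes per customer
```

The thresholds passed to `is_acceptable` would come from utility requirements; the sketch only shows where they enter the decision.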
Based on the general definition of resilience discussed above, resilience of the
smart grid can be defined as:
The persistent ability of the smart grid to avoid service failures that are more
frequent and more severe than is acceptable when facing changes in the environment, and
to recover from failures whenever they occur.
Figure 1.1: The three time stages of system resilience. [Plot of Quality of Service versus Time, marking the target performance level, the minimum acceptable performance level, and stages 1, 2 and 3.]
A number of factors such as cyber-attacks, internal system failures, policy
changes, configuration changes, or deployment changes can result in adverse conditions
and disrupt system operation. We are specifically interested in analyzing the resiliency of
the smart grid under cyber-physical threats. As shown in Figure 1.1, there are three time
stages for system resilience. The first stage is failure avoidance in which events (faults
and attacks) that cause failures are predicted and anticipated so that they can be avoided.
The second stage starts when a failure starts (which can be a partial failure or degradation
of the service(s) provided). In this stage, failures should be contained within the
acceptable levels and prevented from propagating to the whole system. The third stage is
the recovery stage in which the system tries to get the services to their desired
performance levels. In this work, our main focus is on the first two stages.
1.3 Failures in the Smart Grid
Cyber-physical attacks have different impacts on the smart grid like loss of
power/load, loss of information, or damage of equipment [21]. These impacts may
propagate and affect smart grid functions causing function and service failures. However,
a resilient smart grid should be able to avoid service failures that are more severe or
frequent than is acceptable. Measuring resilience of critical infrastructure in general has
been a topic of interest for researchers [12] [22]. Strigini [12] summarizes three main
measures that can be used to quantify resilience (more details about quantifying resilience
in the literature in Section 2.2):
1. Measures of dependability in the presence of disturbances
2. Measures of the amount of disturbances that a system can tolerate
3. Measures of probability of correct service given that a disturbance occurred
What is common in these three types of measures is that they all require
identifying function failures and acceptable degradation levels of smart grid services
(Figure 1.1), which agrees with the resilience definition presented in Section 1.2. For
example, it may be acceptable to have a power outage in a contained small region to
avoid larger outages; however, it is not acceptable if the power outage propagates to other
regions causing blackouts. There are many ways to identify what is an acceptable level of
dependability or performance. For example, specific utility requirements can be used as a
guide for identifying what is acceptable like stating that at least x% of meters in a region
should report their reading to the utility in a single reading cycle. Another way would be
to do a sensitivity analysis to find the boundaries of acceptable system services [23]. We
are adopting a function-based resilience assessment approach to evaluate the impact of
cyber-physical attacks on smart grid functionalities. Associating failures with functions
helps identify the proper metrics that can be used for assessment purposes. More details
on the function-based approach to quantify resilience can be found in Chapter 3. What is
also common among the three metrics is that they all measure resilience in the presence
of a disturbance. In this dissertation, we refer to these disturbances as failures that are
caused by malicious or non-malicious sources.
Our approach in measuring resilience relies on modeling certain attacks or faults
in the system and then 1) measuring performance and dependability metrics that are derived
from the function under study and 2) measuring the levels of the attack or fault that caused
function failures. What is important from a resilience point of view is correlating those
two measures so that unacceptable function failures (measured through performance and
dependability metrics) are correlated with the attack or fault level that causes them.
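The correlation step described above can be sketched in Python as a sweep over attack levels that reports the lowest level at which a function metric becomes unacceptable. This is only an illustrative sketch: `toy_meter_reporting` is a hypothetical stand-in for a simulation run, and the attack level (number of compromised data collectors), the 6% loss per collector and the 80% threshold are assumed values, not results from this work.

```python
def first_unacceptable_level(attack_levels, measure_metric, min_acceptable):
    """Return the lowest attack level whose metric falls below the
    acceptable minimum, or None if every tested level is acceptable."""
    for level in sorted(attack_levels):
        if measure_metric(level) < min_acceptable:
            return level
    return None


def toy_meter_reporting(compromised_collectors):
    """Hypothetical stand-in for a simulation: fraction of meters that
    report in one reading cycle, assuming each compromised data
    collector silences 6% of the meters."""
    return max(0.0, 1.0 - 0.06 * compromised_collectors)


# Assumed requirement: at least 80% of meters must report per cycle.
level = first_unacceptable_level(range(0, 6), toy_meter_reporting, 0.8)
print(level)  # 4
```

In an actual evaluation the metric function would wrap a simulator or testbed run, and the threshold would come from utility requirements or a sensitivity analysis.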
Chapter 2
Related Work
As mentioned in the introduction, in recent years, there has been a lot of work
addressing smart grid and cyber-physical systems security in general. Work targeting
smart grid security can be classified based on the objective of the work and the approach
used to achieve this objective. Objectives can be classified into:
1. Resilience evaluation: This involves understanding the cyber-physical security
posture of the smart grid. In this area, system behavior (i.e. reliability,
performance, or specially derived resilience metrics) in the presence of attacks or
faults is evaluated. Attack and threat models are created for failures caused by
malicious sources to describe the attack and the strategies and actions of the
attacker whenever he gains access to the system. On the other hand, non-
malicious activities that cause failures like earthquakes and component
breakdown are modeled to evaluate their consequences. The approaches used to
evaluate the consequences of failures can be categorized as qualitative or
quantitative (some approaches combine both to do the analysis).
There is a general agreement in the literature that evaluating these consequences
in one domain (i.e. cyber or physical) only is not sufficient, so there is a need for
cyber-physical evaluation, especially for events that cross system boundaries.
2. Attack detection [24]: This involves cyber and physical mechanisms and
algorithms used to detect attacks. While some researchers try to detect attacks
within one domain, others claim that a comprehensive system view of the cyber
and physical domains should be emphasized when designing attack detection
algorithms.
3. Design and testing of new attack resilient algorithms, devices and systems
[24]: In this area, researchers are trying to design systems that can avoid failures
that are more severe or frequent than acceptable when facing attacks. Just like the
first two points, a comprehensive view of the system is crucial to maintain a
resilient system.
Almost every work we investigated started by evaluating the consequences of
attacks and/or faults in the system. This evaluation, as we mentioned before, can be
classified as qualitative, quantitative or both. In the following sections we discuss related
work according to this classification. Our main focus is research that addressed the first
objective (i.e. resilience assessment and evaluation) with minor attention to the remaining
two objectives. Figure 2.1 provides a high level overview of related work objectives and
approaches when addressing resilience evaluation.
Figure 2.1: High level overview of related work objectives and approaches. [Diagram with three boxes. Research objectives: 1. assess/evaluate system resilience; 2. design of attack detection components; 3. design and testing of new attack resilient algorithms, devices and systems. What to assess: 1. likelihood of attack, fault or failure; 2. consequences of those attacks, faults or failures. How to assess/evaluate the system: 1. qualitatively; 2. quantitatively.]
2.1 Qualitative Assessment Approaches
Neuman and Tan [6] model attacks on smart grids based on how threats propagate
between different regions in the system. In this model, the smart grid is divided into three
regions: 1) untrusted domain that includes customer side and the internet, 2) distribution
domain and 3) utility business domain. Each of these regions has a cyber part and physical
part. Based on this system model, attacks can be described according to the way they
propagate. A cyber-cyber threat can reside in a single region or propagate through
multiple regions. For example, an attacker can hack into the utility business domain from
the internet. An example of a cyber-physical threat is an attacker using the cyber domain
to reprogram a controller that will change the physical characteristics of the system.
Sridhar et al. [25] [26] present a coarse assessment methodology to demonstrate
the dependency between the power system and the supporting infrastructures. The
methodology starts with cyber infrastructure vulnerability analysis followed by application
impact analysis to determine possible impacts on the applications supported by the
infrastructure. Finally, physical system analysis is performed to quantify the impact on
the power system. After completing the previous full iteration, risk level is evaluated and
risk mitigation techniques are applied if the risk level is not acceptable. The authors use
this framework to identify cyber-attacks that can target industrial control systems. Then,
the authors identify control loops in the smart grid at different levels. For example,
governor control is at the generation level, state estimation is at the transmission level
and load shedding is at the distribution level. The authors then determine the cyber-
attacks that target each control loop in the smart grid and the impacts of these attacks. In
order to have “attack resilient control algorithms”, Sridhar et al. propose “domain
specific anomaly detection and intrusion tolerance algorithms” which can detect attacks
at the application level.
Huang et al. [27] took a different analytical approach for developing threat
models for attacks on control systems. A control system refers to any physical system
with sensors that send specific measurements (y) to a controller. Based on these
measurements the controller sends control signals (u) to the physical system to adjust its
operation (Figure 2.2). Each measurement or control signal is bounded by a minimum and a
maximum value. The authors use a chemical reactor as a use case example of a control
system. The cyber-attack models proposed by Huang et al. are useful in identifying and
analyzing the strategies and actions of an attacker on a physical system that he controls.
Namely, two cyber-attack models were proposed: a Denial of Service (DoS) attack model
and an integrity attack model. In an integrity attack, the attacker hides his attack by keeping
the modified value of a measurement or a control signal within the acceptable range but
with extreme values. For example, the following minimum integrity attack can be
launched against a sensor:
$$\hat{y}_i^{\min}(t) = \begin{cases} y_i(t), & t \notin T_a \\ y_i^{\min}, & t \in T_a \end{cases}$$
where $\hat{y}_i^{\min}(t)$ is measurement y of sensor i at time t under a minimum integrity
attack that changes the value of the measurement to its minimum value $y_i^{\min}$ in the attack
duration $T_a$. These attack models attracted many researchers [21] [28] [24]. While the
authors propose a systematic approach to attack modeling, they do not provide a
systematic approach to evaluate the impact of the attack. The attacks were evaluated
based on their physical and economic consequences on a chemical reactor.
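A minimum integrity attack of the kind described above can be sketched in a few lines of Python. The sketch assumes discrete time steps and an attack window given by inclusive start and end indices; the function name, the window representation and the sample readings are illustrative assumptions and do not come from Huang et al.

```python
def min_integrity_attack(readings, y_min, attack_start, attack_end):
    """Return the sensor trace seen by the controller when every reading
    in the attack window [attack_start, attack_end] is replaced by the
    minimum allowed value y_min. Readings outside the window pass through."""
    return [
        y_min if attack_start <= t <= attack_end else y
        for t, y in enumerate(readings)
    ]


# Hypothetical sensor trace with an allowed range whose lower bound is 2.0.
true_readings = [5.0, 5.2, 5.1, 4.9, 5.3]
attacked = min_integrity_attack(true_readings, y_min=2.0,
                                attack_start=1, attack_end=3)
print(attacked)  # [5.0, 2.0, 2.0, 2.0, 5.3]
```

Because the forged value stays inside the allowed range, a simple bounds check on the measurement would not flag it, which is the point the attack model makes.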
Mo et al. [29] introduce attack models for the cyber part of a smart grid
infrastructure based on the entry points of the attack and actions that the attacker will
implement. Cyber and physical consequences of cyber-attacks are listed in addition to
countermeasures to protect against these attacks (the countermeasures presented here are
typical IT security mechanisms like key management and secure communication). Mo et
al. also introduced an attack model of the physical system (system theoretic approach). In
this attack model an attack (or contingency) can cause a change in system measurements
(like bus voltage magnitude). Countermeasures for the physical attacks were also listed
like checking if each measurement is outside its operating regions. The authors argued
that isolated countermeasures in each domain (cyber and physical) cannot fully protect
the system for three reasons: the system attack models for both approaches are incomplete (no
cooperation to assess the consequences of the attack between the countermeasures in
different domains), the security requirements of both approaches are incomplete but both are
required for the security of the smart grid, and the countermeasures of both approaches
have drawbacks. Therefore, Mo et al. argued that there is a need for combined
cyber-physical security. A replay attack that is generated from the cyber part of the system
targeting the physical part is detected in the physical part by monitoring the state of the
physical system and state variables. This methodology is called physical authentication, in
which the attack is detected independent of the source of the cyber-attack that granted the
attacker access to the control system.
Figure 2.2: Control system abstraction where y is the measurement of a sensor in the physical system and u is the control signal [27]. [Diagram: the physical system sends y to the controller, and the controller sends u back to the physical system.]
To measure the effect of failures, the percentage of total power demand that is not
met was used. In order to qualitatively characterize the interdependencies between the
electric and information systems, Laprie et al. [30] used State Machines and Petri Nets.
The electric and information systems transfer between different states (e.g. working,
active, passive and lost) based on certain events that happen in the system (e.g. failure,
restoration). The model also considers malicious attacks and their consequences on the
power system. No quantitative models were presented in this work.
Masera and Fovino [31] proposed a service-oriented approach for assessing
critical infrastructure security with a focus on automating the assessment process by
forming dependency links between services and components in the system. Besides the
typical vulnerability, threat and attack analysis, the concept of “service chains” was
introduced. A service chain is an oriented graph describing the direct and indirect
connections between all services in the system of systems domain. The main objective of
service chains is that they help identify all the dependencies that impact the security of
the system. The authors also emphasize the importance of validating plausible attacks in
the system using the proposed methodology. We adopt a function-based approach but
with slightly different goals. Our goal is not to automate the evaluation process but to aid
in more efficient modeling of the system under evaluation. In addition, we emphasize the
importance of impact analysis when applying this concept to critical infrastructure
resilience instead of the risk analysis that starts by vulnerability analysis (likelihood of
the attack) of the system.
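To make the notion of a service chain more concrete, the sketch below represents services as nodes of a directed graph and collects every direct and indirect dependency of a given service. It is only an illustration of the graph idea: the adjacency-list representation and the service names are hypothetical and are not taken from Masera and Fovino.

```python
from collections import deque


def all_dependencies(graph, service):
    """Return the set of services that `service` depends on, directly or
    indirectly, by breadth-first traversal of the dependency graph."""
    seen = set()
    queue = deque(graph.get(service, []))
    while queue:
        dep = queue.popleft()
        if dep not in seen:
            seen.add(dep)
            queue.extend(graph.get(dep, []))
    return seen


# Hypothetical dependency links: each service maps to the services it uses.
service_chain = {
    "demand_response": ["meter_data_management", "utility_network"],
    "meter_data_management": ["ami_collector"],
    "ami_collector": ["utility_network"],
    "utility_network": [],
}

print(sorted(all_dependencies(service_chain, "demand_response")))
```

A failure in any service returned by `all_dependencies` is a candidate cause of failure for the service under study, which is how such a graph can guide impact analysis.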
2.2 Quantitative Assessment Approaches
Stamp et al. [32] quantitatively analyze the reliability impacts from cyber-attacks
on the power grid. Their analysis consists of two parts; in the first part, the authors
introduce the concept of a Cyber-to-Physical bridge. The cyber-to-physical bridge starts
with an attack vector that represents a set of adversary targets in the cyber domain. Attack
vectors can lead to cyber events that are called immediate outcomes. These immediate
outcomes in the cyber domain can cause technical effects in the physical domain that can
propagate and eventually produce grid impacts like power outage (Figure 2.3).
Figure 2.3: Cyber to physical bridge proposed by Stamp et al. [32]. [Diagram: a chain from Attack Vector to Immediate Outcome to Technical Effect to Grid Impacts.]
In the second part of the analysis, the authors propose a method to measure
degraded reliability of the power grid because of cyber-attacks. Using Monte-Carlo (MC)
[33] simulation, impacts of the cyber-attack on the system are analyzed using metrics like
Frequency of Interruption (FOI). The interval between successful attacks is modeled
using an exponentially distributed random variable with a “selected” Mean Time to
Attack (MTTA) which is analogous to the concept of Mean Time to Failure (MTTF).
Mean Time to Repair (MTTR) is also modeled after each failure or attack. Following this
approach, cyber-attacks are assumed and grid impacts are analyzed. While MTTF can be
estimated using statistical data, it is hard to estimate MTTA especially for zero-day
attacks. This fact demonstrates the difficulty in analytically modeling the frequency of a
cyber-attack in a system like the power grid. The main problem of this approach is
that it is limited to cyber-physical threats only. It does not consider other threats like
cyber-cyber or physical-cyber.
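The Monte Carlo idea summarized above can be illustrated with a short Python sketch. It is not the implementation of Stamp et al.: it assumes that every successful attack causes one interruption, that repair times are also exponentially distributed, and that FOI is reported as the average number of interruptions per simulated year. The MTTA and MTTR values in the usage line are arbitrary.

```python
import random


def simulate_interruptions(mtta_hours, mttr_hours, horizon_hours, rng):
    """Simulate one horizon and return (interruptions, downtime_hours)."""
    t = 0.0
    interruptions = 0
    downtime = 0.0
    while True:
        # Time until the next successful attack (exponential, mean MTTA).
        t += rng.expovariate(1.0 / mtta_hours)
        if t >= horizon_hours:
            break
        interruptions += 1
        # Assumed exponential repair time (mean MTTR), cut at the horizon.
        repair = min(rng.expovariate(1.0 / mttr_hours), horizon_hours - t)
        downtime += repair
        t += repair
    return interruptions, downtime


def estimate_foi(mtta_hours, mttr_hours, horizon_hours=8760.0,
                 trials=2000, seed=0):
    """Average number of interruptions per horizon over many trials."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        count, _ = simulate_interruptions(mtta_hours, mttr_hours,
                                          horizon_hours, rng)
        total += count
    return total / trials


foi = estimate_foi(mtta_hours=1000.0, mttr_hours=10.0)
print(f"Estimated interruptions per year: {foi:.2f}")
```

The sketch also makes the limitation noted above visible: the result depends entirely on the selected MTTA, which is hard to justify for attacks that have no statistical history.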
Cárdenas et al. [24] used the integrity attack models proposed by Huang et al. [27]
to demonstrate the physical consequences of a cyber-attack on a chemical reactor. The
results indicate that an integrity attack can transfer the system to an unsafe state (in certain
circumstances). In addition to identifying the consequences of cyber-attacks on the
physical system, according to Cárdenas et al., better understanding of the interactions
between control and physical systems can also help in designing new detection
algorithms. To detect an attack on the physical system, the actual response of a physical
system to a control system is compared with the expected response. This requires a model
of the behavior of the physical system and a detection algorithm. The physical system can
be modeled using the laws of physics or simulation tools. The authors suggest using
sequential and change detection algorithms [34]. Cárdenas et al. take the design one step
further by designing a response to the attack to maintain a resilient system. Whenever the
system detects that there is an integrity attack on sensor measurements, it starts using the
controller estimated measurements from the system model. The authors claim that these
estimated measurements cause safety concerns.
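The detection idea described above, comparing the measured response with the response expected from a model, can be sketched with a simple cumulative sum (CUSUM) statistic, one common change detection technique. The choice of this particular statistic, the `drift` and `threshold` parameters and the sample traces are assumptions made for illustration and are not claimed to match the authors' design.

```python
def cusum_detect(measured, expected, drift, threshold):
    """Return the first time index at which the CUSUM statistic over the
    absolute residual |measured - expected| exceeds `threshold`, or None.
    `drift` is subtracted at each step so that small model errors do not
    accumulate into an alarm."""
    s = 0.0
    for k, (y, y_hat) in enumerate(zip(measured, expected)):
        s = max(0.0, s + abs(y - y_hat) - drift)
        if s > threshold:
            return k
    return None


# Hypothetical traces: the model expects 10.0; the sensor is forged from k = 4.
expected = [10.0] * 10
measured = [10.0, 10.1, 9.9, 10.0, 12.0, 12.0, 12.0, 12.0, 12.0, 12.0]
alarm_at = cusum_detect(measured, expected, drift=0.5, threshold=2.5)
print(alarm_at)  # 5
```

After an alarm, a response of the kind described above would switch the controller from the received measurements to the model estimates.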
Liu et al. [35] use analytical and experimental approaches to analyze how a cyber-
attack can destabilize the power grid. In the attack scenario, an attacker gains
control of a circuit breaker that connects a portion of the load in a power system. The
attacker keeps switching this circuit breaker until the system is destabilized. The authors
use a power simulator to demonstrate the effects of the attack on the frequency and
voltage of a modified version of the IEEE 9-bus system [36]. Again the cyber-attack is
assumed to be successful and the power simulation is implemented. This type of analysis
is limited to the illustrated scenario at the transmission level of the power grid and at a
sub-second time scale. Attacks on the distribution and residential level require more
sophisticated power models and realistic cyber-attack scenarios.
Sridhar and Manimaran [28] claim that integrity attacks on voltage control loops
of the power grid can produce abnormal voltage conditions. The simulated attack targets
a local voltage control scheme called Flexible AC Transmission Systems (FACTS). By
manipulating operational data for FACTS devices, an attacker can create abnormal
voltage conditions. The results were demonstrated on the IEEE 9-bus system. Sridhar and
Manimaran [21] evaluated the effects of a cyber-attack on a wide-area control loop, this
time the Automatic Generation Control (AGC). AGC is responsible for correcting power
flow and frequency deviations in multi-area power systems. The AGC algorithm uses the
Area Control Error (ACE) control signals to make its adjustments. If the ACE values are
intelligently manipulated then AGC will adjust the generation in the system which puts
the system in an imbalanced condition. For example, if malicious ACE measurements
(frequency and power flow) indicate load increase in the system then AGC will respond
by increasing generation which will increase the normal frequency and power flow in the
system.
Nicol et al. [37] built a testbed for power system security evaluation by
integrating a customized version of PowerWorld [38] with real devices (e.g. relays),
emulated and simulated networks, computers and sensors. The relays, which are
computers that control breakers, respond to changes in the state of the power system
simulated in PowerWorld by opening or closing breakers. The state of the power system
(in PowerWorld) changes accordingly. In addition, networks and other devices can be
emulated or simulated at different resolutions. Nicol et al. describe an attack scenario in
which an attacker manages to control a relay and changes its configuration to open a
breaker which changes the topology of electric flow and overloads some lines. The
attacker also manages to obscure these changes by implanting botnets in vulnerable
devices that flood the network to prevent state changes from reaching the control station.
This is an interesting approach because it combines both the cyber and physical parts of
the power system. While this approach may be sufficient for power systems security
evaluation at the substation level, it lacks the capabilities to directly evaluate smart grid
applications like AMI and DR. In addition, the cyber and physical implications of the
described attack scenario were not demonstrated quantitatively. Although the authors
claim that power systems can be evaluated at large-scale and different levels of
resolutions, they did not demonstrate these capabilities.
Bergman et al. [39] introduced the Virtual Power System Testbed (VPST). In
addition to its local subsystems that include PowerWorld and RINSE, VPST has the
potential to connect to remote testbeds like DETER [40] in order to leverage their
capabilities. The Virtual Control System Environment (VCSE) [41] integrates several
power simulation tools, network components and other control devices (real and
simulated). VCSE was used to model the Supervisory Control And Data Acquisition
(SCADA) system which allowed assessing security vulnerabilities of SCADA. Ashok et
al. [42] built the Power-Cyber testbed by integrating the Real Time Digital Simulator
(RTDS) for power system simulation, the Internet Scale Event and Attack Generation
Environment (ISEAGE) for network emulation and real devices like routers and
firewalls. The National SCADA Test Bed (NSTB) [43] is a large scale physical testbed
that has a great value in assessing smart grid vulnerabilities. NSTB includes several
testing facilities like Power Grid Test Bed, Cyber Security Test Bed and Next-Generation
Wireless Test Bed. These facilities are used to identify and correct vulnerabilities in
control systems. While NSTB is a realistic testbed that can be used to evaluate cyber-
physical security of the smart grid, the fact that it is a physical testbed makes it
insufficient for repeatable security experimentation. For example, the Power Grid Test
Bed has 61 miles of 138 kV transmission lines and 7 substations.
Mainly, these testbeds are used to 1) assess the vulnerabilities of power systems
and the network components that control them and 2) study the physical consequences of
those vulnerabilities. The testbeds so developed are often proprietary in nature, which
prohibits sharing of methodologies and results and their subsequent use by other
researchers. To ensure that our approach is repeatable, reproducible and adaptable, we
model the cyber-physical components of the smart grid using well-known simulation
tools like ns-2 [44] and PowerWorld. However, the evaluation methodology we are
proposing is not restricted to a specific simulation tool or testbed.
2.3 Qualitative and Quantitative Combination
Shinozuka et al. [45] developed an analysis procedure and database to evaluate
the resilience of electric power and water supply systems before and after a major
catastrophe like an earthquake. Based on statistical data, failure probabilities were
computed for electric components. Then those probabilities were used to create risk
curves for system degradation (e.g. reduction of power supply) under different
earthquake scenarios. Repair and restoration models for system restoration were also
created based on statistical data. Reed et al. [46] proposed a simple analytical method to
assess the resilience of networked infrastructures for natural hazard events. Two metrics
were used to measure resilience: 1) fragility which is the probability of damage given a
level of hazard and 2) quality which describes structural performance over time following
hazard (e.g. earthquake). Statistical data of power outage and restoration were used to
demonstrate the use of the proposed method.
Chiaradonna et al. [47] used a variant of Petri Nets called Stochastic Activity
Network (SAN) to model electric power system components and interdependencies
between them. External and internal failure probabilities are used to model different
failure modes in the system and how they propagate to other components. Ouyang et al.
[48] [49] assessed the resilience of critical infrastructure according to a time-dependent
approach. In this approach, there are three time stages: disaster prevention, damage
propagation and recovery (similar to Figure 1.1). Resilience is quantified using the ratio
between the target performance and the real performance in the interval [0, T]. An
example power model is used to demonstrate the approach in which failure probabilities
and restoration models are assigned to components in the model and system resilience is
evaluated iteratively. Finally, the impact of resilience improvement mechanisms like
situational awareness and load management was demonstrated.
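A performance-ratio metric of the kind described above can be sketched as follows. The sketch assumes the ratio is taken as the area under the real performance curve divided by the area under the target performance curve over [0, T], so that a value of 1 means no loss; this orientation, the trapezoidal integration and the sample curves are assumptions made for illustration.

```python
def trapezoid_area(times, values):
    """Area under a sampled curve using the trapezoidal rule."""
    return sum(
        (t1 - t0) * (v0 + v1) / 2.0
        for t0, t1, v0, v1 in zip(times, times[1:], values, values[1:])
    )


def resilience_ratio(times, real, target):
    """Ratio of delivered performance to target performance over [0, T]."""
    return trapezoid_area(times, real) / trapezoid_area(times, target)


# Hypothetical curves: performance drops to 50% at t = 2 and recovers by t = 4.
times = [0, 1, 2, 3, 4]
target = [100, 100, 100, 100, 100]
real = [100, 100, 50, 75, 100]
print(resilience_ratio(times, real, target))  # 0.8125
```

Improvement mechanisms such as situational awareness or load management would show up in this metric as a shallower drop or a faster recovery of the real performance curve.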
Table 2.1 lists the approach, domain, metrics and the systematic approach that
were used in the surveyed related work.
| First Author | Approach | Domain | Metric | Systematic Approach |
|---|---|---|---|---|
| Stamp et al. [32] | Quantitative | Physical | Yes, Frequency of Interruption (FOI) | Yes, Monte Carlo |
| Nicol et al. [37] | Quantitative | Cyber & Physical | Experimental Testbed | |
| Bergman et al. [39] | Quantitative | Cyber & Physical | Experimental Testbed | |
| McDonald et al. [41] | Quantitative | Cyber & Physical | Experimental Testbed | |
| Ashok et al. [42] | Quantitative | Cyber & Physical | Experimental Testbed | |
| (NSTB) [43] | Quantitative | Cyber & Physical | Experimental Testbed | |
| Liu et al. [35] | Quantitative | Physical | Yes, voltage and frequency | No, this work is a special case that defines metrics relevant only to this work |
| Sridhar et al. [26] | Qualitative | Cyber & Physical | No | Yes, exhaustive that depends on listing all possible risks. Coarse methodology with focus on power system controls |
| Sridhar et al. [28] | Quantitative | Physical | Yes, voltage | No |
| Sridhar et al. [21] | Quantitative | Cyber & Physical | Yes, frequency | No |
| Huang et al. [27] | Qualitative | Cyber | Yes, chemical reactor specific metrics | Yes, systematic evaluation |
| Cárdenas et al. [24] | Quantitative | Cyber & Physical | Yes, related to the chemical reactor but not general to resilience | Yes, attack model for process control |
| Mo et al. [29] | Qualitative | Cyber & Physical | No | Yes, focus is on direct control loop attack detection |
| Neuman and Tan [6] | Qualitative | Cyber & Physical | No | Yes |
| Shinozuka et al. [45] | Quantitative | Physical | Yes, reduction of power supply | Yes |
| Reed et al. [46] | Both | Physical | Yes, fragility and quality | Yes |
| Chiaradonna et al. [47] | Both | Physical | Yes, percentage of expected power demand that is not met | Yes |
| Ouyang et al. [49] | Both | Physical | Yes, ratio between target performance and real performance | Yes |
| Ouyang et al. [48] | Both | Physical | Yes, ratio between target performance and real performance | Yes |
| Laprie et al. [30] | Qualitative | Cyber & Physical | No | Yes |
| Masera and Fovino [31] | Qualitative | Cyber | No | Yes |

Table 2.1: Summary of the related work approach, domain, metrics and the systematic approach.
2.4 Related Work Analysis
In this section we analyze the related work that has already been done and
presented in the previous three subsections. Even though different efforts have different
objectives, almost every work we investigated included some sort of evaluation of cyber
and/or physical consequences of faults or cyber-physical attacks on the system. The
evaluation process itself is either called risk assessment or resilience evaluation. We will
elaborate on the difference between resilience and risk assessment and management later in
this section. Because our main focus in this dissertation is evaluating the resilience of
smart grids under cyber-physical threats, we focus on the limitations of previous efforts
in this area of study.
Quantifying Resilience: One of the main limitations in resilience evaluation in
previous work is how to quantify the resilience of smart grids (or critical infrastructures
in general) to cyber-attacks. Some researchers focus on qualitative evaluation and neglect
quantifying their evaluation or leave it to future work [6] [26] [30]. Another group of
researchers analyze the resilience of smart grids under specific cyber-attacks that impact
certain functions, services or components. In this case, the impact of the attack is
quantified using metrics that are tailored to that specific attack [21] [27] [28] [32] [35].
Finally, some researchers try to evaluate the resilience of smart grids to environmental
hazards (e.g. earthquakes and hurricanes) or typical component failures and propose more
systematic ways to quantify resilience [45] [46] [47] [48] [49]. In order to quantify
resilience, “fragility” was defined as the probability of damage given the hazard level. On
the other hand, “quality” describes structural performance over time following hazard
[46] (details in Section 2.3). Ouyang et al. [49] quantified resilience using the ratio
between the target performance and the real performance in the interval [0, T]. We
believe that systematic approaches to quantifying resilience need to be coupled with
evaluation under cyber-physical threats. In this dissertation we address this gap and
quantify resilience from a functional point of view, as demonstrated later in
Chapter 3.
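As a rough illustration of the ratio-style metric attributed to Ouyang et al. [49] above, the following Python sketch integrates a sampled performance curve over [0, T] and divides it by the integral of the target curve. The direction of the ratio (real over target, so that 1.0 means no loss), the function names and the sample values are assumptions made for illustration; they are not taken from [49].

```python
def trapezoid_area(times, values):
    """Area under a sampled curve using the trapezoidal rule."""
    area = 0.0
    for i in range(1, len(times)):
        area += 0.5 * (values[i] + values[i - 1]) * (times[i] - times[i - 1])
    return area


def resilience_ratio(times, real_perf, target_perf):
    """Ratio of real to target performance accumulated over [0, T]."""
    target_area = trapezoid_area(times, target_perf)
    if target_area == 0:
        raise ValueError("target performance must be non-zero over [0, T]")
    return trapezoid_area(times, real_perf) / target_area


# Hypothetical example: power delivered (MW) sampled once per hour.
times = [0, 1, 2, 3, 4]
target = [100, 100, 100, 100, 100]
real = [100, 40, 40, 70, 100]  # dip after an event, then recovery
print(resilience_ratio(times, real, target))
```

A value close to 1.0 would indicate that the real performance stayed near the target over the whole interval, while lower values reflect a deeper or longer degradation.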
Evaluation Process: Different approaches have been used to evaluate the resilience of
smart grids, ranging from ad hoc approaches to more systematic and generalized
ones. In the presence of cyber-attacks, most efforts to evaluate the resilience of
smart grids were ad hoc, evaluating a single attack that exploits a certain
vulnerability [21] [28] [35]. Systematic and generalized approaches in the
field of cyber-physical security are limited or coarse-grained [26]. The
three main steps shared among those approaches are: 1) system vulnerability
assessment, 2) direct impact analysis of exploiting these vulnerabilities and 3) physical
impact analysis [26] [50]. These risk assessment approaches deal with resilience as a goal
of risk management. By definition, risk is the likelihood of an event times its impact. In
the cyber-security domain this is usually expressed as vulnerability × threat × impact.
While this type of assessment covers likely risks (because of the vulnerability assessment
step), it marginalizes unlikely risks that are still possible. More systematic approaches
have been proposed when studying environmental hazards and component failure
assessment. Because of the nature of the hazards and failures that were investigated in
this area (e.g. earthquakes and component failure), probabilistic approaches (statistical
and stochastic) were used and generalized to do the evaluation. The main problem with
this type of analysis is that failure probability models are mainly designed based on
statistical data for physical components in the system (e.g. transformers and generators in
the presence of an earthquake) or stochastic models of failures for those components.
This requires estimates of the probabilities of failures because of these events in the
system which is not a trivial task [51].
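The point made above, that a likelihood-times-impact score marginalizes unlikely but possible events, can be seen in a small Python sketch. The two events, their likelihoods and their impacts are invented for illustration only and do not come from any of the cited studies.

```python
def risk_score(likelihood, impact):
    """Classical risk: likelihood of the event times its impact."""
    return likelihood * impact


# Hypothetical events: (name, likelihood per year, impact in MW lost).
events = [
    ("common malware on office PCs", 0.5, 10),
    ("zero-day attack on the head end", 0.001, 3000),
]

# Ranking by risk puts the frequent, low-impact event first ...
by_risk = sorted(events, key=lambda e: risk_score(e[1], e[2]), reverse=True)
# ... while ranking by impact alone puts the rare, severe event first.
by_impact = sorted(events, key=lambda e: e[2], reverse=True)

print([name for name, _, _ in by_risk])
print([name for name, _, _ in by_impact])
```

Under these made-up numbers the rare attack drops to the bottom of the risk ranking even though its consequences are far larger, which is the behavior that motivates looking at impact directly.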
There has been an attempt to use the same probabilistic approaches to analyze
smart grids under cyber-attack in both the cyber-physical security domain and the critical
infrastructure safety domain. However, using the same method to estimate the probability
of cyber-attacks (that cause failures) may not be appropriate because it is hard to
represent cyber-attacks using probabilistic methods similar to the ones used to model
failures caused by earthquakes (e.g. what is the probability of a zero-day attack?). In
addition, these methods do not capture the behavior of the attacker (the attack scenario), which
results in unrealistic attack modeling and impact analysis. For example,
assigning a random variable to represent the mean time to attack that will cause a failure
of a single power component like a generator neglects the attack scenario and leads to
unrealistic impact analysis.
System and System Models: There are some limitations in previous work
regarding the system under study and how the system itself is modeled. In the field of
cyber-physical security, the focus has been on evaluating the resilience of smart grids
when direct control loops like SCADA and AGC are under attack (these systems can be
modeled as shown in Figure 2.2). While this type of evaluation is quite important, there is
also an equivalent need to do similar evaluation when the newly integrated services of the
smart grid are under attack. For example, what are the consequences of attacks on AMI
and DR services?
System theoretic approaches were used to model cyber and physical aspects of
the system. These approaches model the system at a high level of abstraction and
approximation. As a result of this oversimplification, these models and the results
extracted from them deviate from real-world systems [29]. In addition, consequences of
failures and the dependencies of different components and functions need to be
accurately modeled in the system which might be lacking from those system theoretic
models. Building specialized testbeds provides a great environment to evaluate the
impacts of different attack and failure scenarios in smart grids. Using those specialized
testbeds, simulated, emulated and real components can be used to model the system.
However, because of the high cost and effort required to build these testbeds, they lack
the flexibility needed to explore different models of the system [39]. In addition, these
testbeds are not publicly available.
Resilience versus Risk: In the past few years, resilience has been listed as a
requirement for critical infrastructure protection and security in many federal reports in
the U.S. and other countries [7] [11] [52] [53]. This led to confusion in the relationship
between resilience on one side and the predominant risk assessment and management
practices on the other side [12] [54] [55]. Risk as defined in The Guide for Conducting
Risk Assessment by NIST [56] is “A measure of the extent to which an entity is
threatened by a potential circumstance or event, and typically a function of: (i) the
adverse impacts that would arise if the circumstance or event occurs; and (ii) the
likelihood of occurrence.” Resilience, however, has been viewed from three different
perspectives relative to the predominant risk assessment and management practices
[54]:
1. Goal of risk management: In this perspective, resilience replaces protection as a
goal of risk management for critical infrastructures. This perspective
acknowledges that not all adverse events (i.e. faults and attacks) can be avoided or
prevented in the system, so the focus should be on reducing the impact of adverse
events when they occur. In other words, risk management should not only try to
avoid or prevent failures but also cope with them if they occur.
2. Part of risk management: Resilience is considered a process or a methodology
rather than a goal in this perspective. In this process, activities to strengthen
resilience are needed to deal with the remaining risks that could not be prevented
[12] [54] [55]. Resilience improvement activities or methodologies should be
integrated with the existing risk management methodologies to complement them.
3. Alternative to risk management: In this perspective, resilience is proposed as an
alternative method in dealing with risk. Instead of calculating the likelihood and
impact of risks, protection measures are designed independent of the source and
type of risk. The key argument is that “unlikely risks are only unlikely, but not
impossible” so risk management methodologies that are based on probabilistic
risk assessment approaches are not sufficient [54]. This perspective is the most
radical of the three perspectives.
Based on the previous analysis of the relationship between resilience and risk
assessment and management, we believe that it is more appropriate to classify our work
in this dissertation as resilience evaluation (not risk evaluation or assessment) for the
following reason: in order to do risk assessment, estimates of the probability of cyber-
physical attacks (the adverse events in our case) are required to model the likelihood of
the attack [12] [51] [54] [55]. Because we are dealing with cyber-physical attacks,
coming up with those estimates is difficult and can be misleading as some events (in this
case cyber-physical attacks) may be unlikely but they are not impossible (e.g. zero-day
attacks). As a result, our focus is on the consequences or impact of the attacks rather than
the likelihood of attacks which makes the focus on resilience evaluation more appropriate
than risk assessment. It is also worth mentioning that risk and resilience
evaluations are not mutually exclusive: a more resilient system is less susceptible to
risks.
Chapter 3
Function-based Resilience Evaluation Methodology
In this chapter I introduce the function-based resilience evaluation methodology
for the smart grid. First, I discuss the main objectives of resilience evaluation. Second,
the requirements to evaluate resilience of smart grids are discussed. Finally, I present the
details of this methodology and demonstrate how it fills the missing pieces in the work
that has already been done in the literature. In the next three chapters, we demonstrate the
usefulness of the function-based methodology with three use cases.
3.1 Objective
The main objective of smart grid resilience evaluation is to analyze the smart
grid’s ability to avoid function failures that are more frequent and more severe than is
acceptable when certain events happen in the environment. An event may refer to
multiple things such as a change in the environment (e.g. change in the temperature), an
action taken by a component (e.g. DR request) or a cyber-attack. While we are concerned
about all types of events that can interrupt the operation of the smart grid, we are
especially interested in the impacts of cyber and cyber-physical threats on the system. In
addition, interrupting the operation of the smart grid is not solely restricted to power
delivery (smart grid’s basic operation); it also includes other integrated functions like
load and outage management. Knowing the actual consequences of these events (i.e.
cyber-physical threats) can help us and stakeholders address the events that cause these
consequences [32]. For example, this evaluation can help analyze the impacts of
exploiting some smart grid functionalities like remote disconnect and load management
(through Demand Response) on the power delivery function of the smart grid. By
knowing these impacts and the plausible events that cause them, we can identify the
conditions under which resilience is achieved in the system. In addition, proper security
policies and mechanisms can be derived to improve the resilience of the smart grid in
both the cyber and physical domains.
As summarized in Section 1.2, the smart grid is a large-scale, complex,
heterogeneous and federated system of systems. The smart grid has a large number of
interconnected components that span large geographical areas and serve millions of
customers. These components are controlled by different organizations (e.g. utilities)
which makes the smart grid a federated system. The smart grid is a heterogeneous system
of systems. For example, the smart grid has cyber and physical systems; moreover, the
physical (electrical) part consists of generation, transmission and distribution systems. In
the cyber part, there are different subsystems as well. For example, there is the AMI
system and SCADA system which by itself can be considered a cyber-physical system.
All the characteristics mentioned above make the smart grid a complex system. In
addition, events in cyber-physical systems like the smart grid can propagate between
different domains. Neuman and Tan [6] classify the smart grid into three types of
domains and illustrate how events can propagate between them. These domains are the
untrusted domain, which includes customer premises and the Internet (each
customer can be considered a separate domain); the utility distribution domain (which can be
divided into multiple domains); and the utility business domain. In this classification, each
domain has a physical and a cyber component.
3.2 Smart Grid Resilience Evaluation Requirements
Based on these characteristics of the smart grid, a resilience evaluation
methodology for the smart grid should satisfy the following requirements in order to model
and evaluate the system [57]. The first four requirements are related to system modeling
and the last two are related to evaluating resilience and modeling attacks and failures in
the system:
1. The ability to abstract different components in the system, that is, model
components at different resolutions. The smart grid is a large-scale complex
system, so it is hard to model and represent each and every component at full
detail. Instead, the proper level of abstraction should be applied without losing
information that can affect the final output of the evaluation process.
2. The ability to model different types of systems (heterogeneous models) and have
a solution to the modeled system so that the required measurements are captured.
The smart grid is a heterogeneous system that has cyber and physical components.
If a power system is modeled then the model should be able to find a solution for
the system under study given a set of events that occur in the system. The same
principle applies when modeling the cyber components of the system (e.g.
servers, wired or wireless networks).
3. The ability to have modular evaluation. The smart grid is a very complex system
that cannot be evaluated at once and at full scale. It is easier to deal with
components rather than the whole system. This gives researchers the freedom to
use any useful tool that is suitable to model the system or parts of it and can be
integrated with the other parts.
4. The ability to adequately model dependencies and how events cross system
boundaries. This is a crucial point because this is how many threats propagate
through the system. The smart grid is a complex system that includes many
subsystems, each subsystem performs a set of functions, and each function is
performed to achieve a set of objectives. The effect of these functions can be local
within the subsystem or can cross system boundaries affecting other subsystems
within the smart grid. What this means is that there are deep dependencies in the
smart grid. In order to analyze and evaluate the resilience of smart grids to cyber
and cyber-physical attacks, dependencies between different components and
systems have to be adequately modeled in the evaluation methodology. For
example, power system stability limits should be checked for violations caused by
cyber-attacks on the system [26] and if these attacks affect the power system then
that means that the power system depends on the component affected by the
cyber-attack. More details about the dependency definition that we adopt in this
dissertation can be found in Section 3.3.1. Note that this requirement does not contradict
the previous one: if the proper dependencies are defined, modular
evaluation becomes easier. In complex systems, interdependencies may also exist
and they should be modeled properly.
5. The ability to measure and evaluate the resilience of the system. As discussed in
Chapter 2, qualitative and quantitative methods can be used to evaluate the
resilience of the smart grid. This can be done by assessing the properties of
interest whenever an event occurs (an event is an attack in the cases that we
consider) [22] [57].
6. The ability to model events that impact the resilience of the smart grid [57]. These
events may refer to many things like a change in the environment, an action taken
by a certain entity or an attack. In this dissertation, we pay special attention to
events that are caused by cyber-physical threats.
3.3 Methodology Description
In the following subsections, we introduce the function-based methodology that
we used in evaluating the resilience of the smart grid in our preliminary work. A
function-based evaluation mainly means decomposing the smart grid into functions. A
function in this realm refers to any task or mission of a certain component or system in
the smart grid. This methodology consists of four main steps. The first two steps form the
modeling part of the methodology whereas the last two steps form the experimental
evaluation of those models. First, we start by identifying the function under study and the
functions and components on which it depends. By doing this, we scope down the
evaluation process to a single function at a time. By exploiting those dependencies, an
attack tree is created to abstract the consequences of multiple attacks and demonstrate
how attacks propagate from the cyber domain to the physical domain of the system. In
the experimental part of the methodology, we use simulation tools to evaluate the
resilience of the function under study in the presence of cyber-physical attacks.
While describing each step of this methodology, we use the load drop cyber-physical
attack [23] as an illustrative example (the full details of this example are
explained later in Chapter 4). The function under study in this attack scenario is power
delivery. So, how can this function be interrupted? There are many events that can
interrupt this function from natural disasters to component failures; however, we are
interested in failures that are caused by attacks on the smart grid. In terms of
dependability, power generators are the main element in the power delivery service. If
power generators trip (stop working), then power delivery will stop in the serviced area.
As a protection mechanism, generators are configured to trip if the power system
frequency (60 Hz in North America) departs from its nominal value by more than certain limits. A sudden
load drop or increase in power demand may cause these frequency variations.
In the current smart grid, there are many functions that can cause load dropping in
the system like load management through the DR system and remote disconnect through
the AMI system. There are two main factors that affect the outcome of this attack: the
amount of load dropped and the time over which the load is dropped. Each of the
following subsections represents one step in the resilience evaluation methodology.
3.3.1 Identify the function under study and the functions and components on
which it depends
The main objectives of this step are to scope down the evaluation process of this
large-scale complex system and derive metrics to quantify the evaluation process. This is
achieved by, first, decomposing the smart grid into functions and focusing the evaluation
on one function at a time. Figure 3.1 demonstrates how the smart grid can be decomposed
into functions. At the top we can see high-level functions like smart metering that depend
on the communication and power infrastructures. Focusing on one function at a time
helps derive metrics to do the evaluation process based on the failure aspects of the
function under study. Each function has an acceptable service level and dropping below
this level constitutes a failure. Because the main affected function in the use case that we
are using is power delivery, we can look for performance or dependability metrics in that
area to reflect the smart grid’s resilience to such an attack. For example, we use the total
power generation lost (generators tripped) in MW because of the attacks.
Figure 3.1: A functional view of the smart grid layers.
[Figure content. Functions: Smart Metering (automated readings and remote meter management); Demand Response (dynamic load management); Electric Vehicles (automated (dis)charging based on dynamic pricing signals); Outage Management (automated outage detection); Cyber Security (protects the smart grid against cyber threats and failures). Underlying layers: AMI Communication Layer (combination of wireless, cellular and wired networks providing communication services between utilities and consumers); Physical Power Grid (delivers power to the end consumers).]
Second, the dependencies for this function are identified. Dependencies may exist
between functions, physical or cyber components. By identifying those dependencies, we
can analyze how failures can propagate to the function under study from other
components and functions. For this reason, we adopt the dependency definition provided
by Masera [58], which states that a dependency is a linkage between the failure
mechanisms of two systems (functions in our case) such that the dependability of one of them is
affected by the other. For example, the automated metering functionality depends on the
communication architecture between smart meters and the utility [59]. Likewise,
we consider that the power delivery function depends on the remote disconnect function:
even though the remote disconnect function is not required for the operation of power
delivery, a failure in the remote disconnect function affects power delivery. One of the
functions that can cause sudden load drop in the system is the remote disconnect
functionality in the AMI system [23].
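The dependency identification described above can be sketched as a small directed graph in Python. The specific entries in the map are a hypothetical reading of the load drop example, not a complete model of the smart grid, and the helper name is our own.

```python
# Hypothetical dependency map: each function lists what its dependability is affected by.
DEPENDS_ON = {
    "power delivery": ["power generators", "remote disconnect", "demand response"],
    "remote disconnect": ["AMI communication network", "head end"],
    "demand response": ["AMI communication network", "head end"],
    "smart metering": ["AMI communication network", "head end"],
}


def all_dependencies(function, depends_on):
    """Return every function or component the given function depends on, directly or indirectly."""
    seen = set()
    stack = [function]
    while stack:
        current = stack.pop()
        for dep in depends_on.get(current, []):
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return seen


print(sorted(all_dependencies("power delivery", DEPENDS_ON)))
```

Walking the graph from the function under study gives the set of functions and components through which a failure could reach it, which is the input needed for the attack tree in the next step.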
3.3.2 Create attack tree
The main objective of this step is to group attacks that have the same impact on
the system. In addition, the main factors that lead to the function failure are identified.
Those main factors can be cyber or physical factors depending on the function under
study. The main factors are produced by the cyber-attacks that cause function failure at
the cyber or physical part of the system. For example, a load drop that
results from malicious remote disconnect commands (a cyber-attack) is a physical factor that may lead
to power delivery failure [32]. A top-down approach is followed to create the attack tree.
The dependencies derived in the first step of the methodology are used to create the
attack tree. The attack tree (demonstrated in Figure 3.2) can be divided into four levels. The
first level of the attack tree represents the main function failure (i.e. attacker’s objective)
like power grid instability or total power shutdown. The second level represents the
impact on the cyber (e.g. causing network traffic collisions) or physical system (e.g.
causing a load drop). The main factors are derived from this level. The third level
represents the cyber-attack that stimulates the main factors. Different cyber-attacks may
have the same impact on the system (i.e. cause the same cyber or physical factors). By
grouping those cyber-attacks under the same nodes, the evaluation process can be
abstracted. The fourth level elaborates how the cyber-attack is implemented. Although
we are interested in failures that result from malicious activities (cyber-attacks), the same
failures may also result from non-malicious activities. The second, third and fourth levels
of the attack tree can be extended to further sub-levels.
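A minimal Python sketch of the four-level structure described above is given here. The node labels follow Figure 3.2, but the exact parent-child edges, the class name and the helper names are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str
    level: int  # 1 = function failure, 2 = factor, 3 = cyber-attack, 4 = technique
    children: list = field(default_factory=list)


def build_example_tree():
    """Tree loosely following the levels described for Figure 3.2 (edges assumed)."""
    head_end = Node("Compromise head end", 4)
    disconnect = Node("Malicious remote disconnect commands", 3, [head_end])
    dr_drop = Node("Malicious demand response commands", 3)
    dr_rise = Node("Malicious demand response commands", 3)
    drop = Node("Sudden drop in load", 2, [disconnect, dr_drop])
    rise = Node("Sudden increase in load", 2, [dr_rise])
    return Node("Total system shutdown (generators tripping)", 1, [drop, rise])


def attacks_by_factor(root):
    """Group level-3 cyber-attacks under the level-2 factor they stimulate."""
    return {factor.label: [a.label for a in factor.children if a.level == 3]
            for factor in root.children if factor.level == 2}


for factor, attacks in attacks_by_factor(build_example_tree()).items():
    print(factor, "<-", attacks)
```

Grouping the third-level nodes by their second-level parent mirrors the abstraction used in the methodology: attacks that stimulate the same factor can be evaluated together.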
Figure 3.2: Fault/attack tree for total system shutdown.
[Figure content, node labels by level. (1) Top level failure: Total system shutdown (generators tripping). (2) Physical consequence: Sudden drop in load; Sudden increase in load. (3) Cyber attack: Malicious remote disconnect commands; Malicious demand response commands (shown twice). (4) Attack technique: Compromise head end.]
It is worth mentioning that the metrics used to evaluate the system or function
under study are derived from the top nodes of the attack tree. The leaves of the attack tree
are created by exploiting smart grid functions and the components on which these
functions depend (i.e. software and hardware components). Unlike risk assessment
methodologies, we are concerned not only with likely risks, but also with unexpected and
unforeseen risks that can still occur and impact the system [33]. To do this, we focus our
analysis on the attack tree nodes where the attack propagates from the cyber part of the
system (third level) to the physical part (second level), causing the physical factors that
lead to the main failure [34].
3.3.3 Perform sensitivity analysis based on the first two steps
The main objective of this step is to create a boundary of acceptable system
performance in the presence of the attack. This boundary is created by performing a
sensitivity analysis that correlates the physical factors and the function failure. The main
factors that lead to the top node of the attack tree (i.e. function failure) are varied as
inputs to the sensitivity analysis and the metrics identified in the first step are monitored.
Function failure can be identified by monitoring those metrics. For example, a sudden
load drop may affect the power delivery function in the smart grid. In this case, the
amount of load dropped and time during which load is dropped (i.e. physical factors) are
varied as inputs to the sensitivity analysis and the system performance (e.g. frequency) is
monitored. The sensitivity analysis is implemented on nodes of the first and second
levels of each attack tree which helps abstract the system and demonstrates the
consequences of multiple attacks at the same time. In addition, a single function is
evaluated at a time, namely the function identified as affected by the attack.
The sensitivity analysis is performed using the main factors which are stimulated
in the second level of the attack tree. This means that the consequences of multiple attack
paths in the third level of the attack tree can be abstracted at the second level of the attack
tree. By performing this type of analysis, the question of what if those physical factors
are stimulated is answered regardless of the cause (i.e. attack). Answering the “what if”
question demonstrates how resilient the system is in the presence of the event under study
(i.e. attack). In addition, abstracting multiple attack paths under the same consequences
simplifies the modeling process that is required for evaluation.
In the same example, we used PowerWorld as a power simulation tool to
demonstrate the effects of loads dropping within a bounded time. So a power simulator is
used to model the top system even though the attack starts from the cyber part of the
system (AMI). The physical system was abstracted at the transmission level using the
IEEE 9-bus model. By doing this analysis at the second level of the attack tree (Figure
3.2), all the cyber-attacks under the second level node are abstracted (i.e. the same
consequences can be assumed for a successful attack).
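The sweep over the two physical factors (amount of load dropped and the time over which it is dropped) can be sketched in Python as follows. The frequency model here is a toy single-machine swing equation with a first-order governor, used only as a stand-in for a power simulator such as PowerWorld; every parameter value, the 1 Hz limit and all function names are illustrative assumptions.

```python
F0_HZ = 60.0  # nominal frequency


def peak_deviation_hz(drop_pu, ramp_s, h=5.0, d=1.0, r=0.05, tg=0.5,
                      dt=0.01, t_end=30.0):
    """Peak |frequency deviation| in Hz after a load drop of drop_pu spread over ramp_s seconds.

    Toy model only: aggregate swing equation plus first-order governor,
    integrated with the explicit Euler method.
    """
    w = 0.0    # frequency deviation, per unit
    pm = 0.0   # change in mechanical power, per unit
    peak = 0.0
    for k in range(int(t_end / dt)):
        t = k * dt
        frac = 1.0 if ramp_s <= 0 else min(t / ramp_s, 1.0)
        lost_load = drop_pu * frac                 # load removed so far
        dw = (pm + lost_load - d * w) / (2.0 * h)  # surplus power accelerates the machine
        dpm = (-w / r - pm) / tg                   # governor droop response
        w += dt * dw
        pm += dt * dpm
        peak = max(peak, abs(w) * F0_HZ)
    return peak


def sweep(drops_pu, ramps_s, limit_hz):
    """Map each (drop, ramp) pair to True if the deviation stays within limit_hz."""
    return {(p, s): peak_deviation_hz(p, s) <= limit_hz
            for p in drops_pu for s in ramps_s}


table = sweep([0.1, 0.2, 0.3, 0.4], [0.5, 2.0, 10.0], limit_hz=1.0)
for (p, s), ok in sorted(table.items()):
    print(f"drop={p:.1f} pu over {s:>4.1f} s -> {'within limit' if ok else 'exceeds limit'}")
```

The set of (drop, ramp) pairs that stay within the limit plays the role of the boundary of acceptable performance; with a real simulator the same loop would call the simulator instead of the toy model.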
3.3.4 Analyze a bottom-up attack scenario
While the sensitivity analysis done in the previous subsection demonstrates the
consequences of the attack in the system, analyzing a bottom-up attack scenario validates
that the attacks can actually propagate to the top nodes of the tree. This step is not
intended to validate each and every attack path; rather, it is intended to demonstrate (to
researchers and stakeholders) that at least one path may succeed which makes the impact
in the root node realistic. As mentioned in the second step, there are still unexpected or
unforeseen risks that can occur and invoke the same main factors that will lead to failure
of the function under study.
The leaf nodes (cyber-attacks) are modeled to verify that they can actually cause
failure of the function by invoking the main factors. This process may involve multiple
systems and events that propagate from the cyber to the physical domain and vice versa.
A modular approach is used to model events that happen in different subsystems. The
challenge is to connect those separate models without losing critical information.
In the load drop example, we used ns-2 to demonstrate that the remote disconnect
commands can reach customers’ meters within a bounded time so the generator tripping
can actually occur. A simple analytical model is used to connect the cyber and
power domains.
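One way to picture the link between the two domains is a short Python sketch that turns command delivery times from the network side into a cumulative load drop for the power side. The uniform per-meter load, the delivery times and the function names are hypothetical; this is not the analytical model used in Chapter 4.

```python
def load_drop_profile(delivery_times_s, load_per_meter_kw):
    """Cumulative load (kW) disconnected as commands arrive, as (time, load) steps."""
    dropped = 0.0
    profile = []
    for t in sorted(delivery_times_s):
        dropped += load_per_meter_kw
        profile.append((t, dropped))
    return profile


def drop_within(profile, window_s):
    """Load (kW) disconnected within window_s seconds of the first command arriving."""
    if not profile:
        return 0.0
    start = profile[0][0]
    return max((load for t, load in profile if t - start <= window_s), default=0.0)


# Hypothetical delivery times in seconds, e.g. as could be read from a network simulation trace.
times = [0.8, 1.1, 1.1, 2.4, 5.0]
profile = load_drop_profile(times, load_per_meter_kw=2.0)
print(profile)
print(drop_within(profile, window_s=2.0))
```

The resulting pair (load dropped, time window) corresponds to the two physical factors varied in the sensitivity analysis, so it can be compared against the boundary obtained there.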
Finally, the results of the evaluation process can be used to design security
components that can be used to improve the resilience of the smart grid to failures that
are caused by malicious or non-malicious sources. By knowing the main factors that
affect smart grid resilience through this evaluation, certain security policies can be
derived and imposed to protect the system from unacceptable function failures.
3.4 Discussion
In order to fulfill the resilience assessment requirements of the smart grid and address the
limitations of previous work, we presented a function-based assessment methodology to
evaluate and quantify the resilience of the smart grid in the presence of failures that are
caused by malicious sources. This methodology can be used by researchers to quantify
the resilience of smart grid functions to events that cause failures in the smart grid
(especially attacks). The input to this methodology is a list of smart grid functions that
need to be assessed. The output is a correlation between measures of function
dependability in the presence of attack and measures of attack that the system can
tolerate. Table 3.1 summarizes the characteristics of smart grids, modeling and resilience
evaluation requirements and research challenges. The main characteristics of the
function-based methodology can be summarized as follows:
1. We presented a systematic methodology to quantitatively evaluate resilience (not
risk). In this methodology, events that can cause failures should be considered
even if they are unlikely or unexpected. As a result, our focus is on answering
“what if those events occur?” questions instead of “how do they occur, and are they
likely?” In other words, the focus is on the impacts of realistic models of events
(e.g. cyber-physical attacks) on smart grids.
Smart grid characteristics: Large-scale; Complex; Heterogeneous
Requirements: Scalability; Multi-resolution; Heterogeneous models; Modular; Model events (attacks); Measure resilience
Research challenges: Abstraction; Hybrid models; Resolve dependencies; Realistic attack models
Table 3.1: Properties of smart grids, modeling and resilience evaluation
requirements and research challenges.
2. We presented a function-based approach to assess the resilience of the system.
This approach allows abstracting the system as one function and the components
on which it depends. Based on these dependencies, a modular design of the
system can be used to evaluate the impact of the attack on that function. This
modular approach allows modeling heterogeneous systems (e.g. cyber and
physical) using publicly available simulation and emulation tools. A major
challenge in this approach is finding the proper way to connect different modules
of the system without losing information that affects the final outcome of the
evaluation process.
3. We model realistic threats that take into consideration the attack scenario and not
only a model of the effect of the attack (e.g. probability that a power component
fails because of a cyber-attack). Different events can cause failures in the smart
grid. However, the main focus in this work is on failures that are caused by
attacks. Realistic attack scenarios are used to model these events in order to
generate realistic impact analysis on the resilience of the system under study.
More details about these attack scenarios and use cases to demonstrate them are
introduced in the following three chapters.
4. We quantify the evaluation process by modeling certain events in the system then
monitoring and measuring the failure of the function under study (because we are
considering a function-based approach). The function is considered resilient if it
can avoid failures that are more severe or frequent than what is acceptable in the
presence of those events (i.e. attacks). If the function fails then we can identify the
conditions under which resilience is achieved.
What is special about our approach is that it is a comprehensive methodology to
evaluate the resilience of the smart grid in the presence of cyber-physical threats which
combines resilience evaluation and the causality of cyber-physical threats. After making
the proper evaluation, a set of security policies and mechanisms can be derived to
improve the resilience of the smart grid. We believe that this methodology can be
generalized to evaluate the resilience of other critical infrastructures and cyber-physical
systems.
In the next three chapters, we demonstrate the usefulness of our methodology by
presenting three use cases. In the first use case, we present a load drop attack that impacts
the power delivery function in the smart grid. In the second use case, two smart grid
functions are analyzed under a cyber-attack on the communication architecture of the smart
grid. These two functions are smart metering and load management. Finally, we
evaluate the resilience of the smart grid when DR is used as spinning reserve.
Chapter 4
Use Case 1: Load Drop Attack
In this chapter we evaluate the resilience of the smart grid from the power delivery
function point of view in the presence of a cyber-physical threat. A sudden load drop in
the system can cause power generators to trip, resulting in a total or partial system
shutdown and loss of power delivery to customers. The cyber-physical threat is based on
the assumption that an attacker has gained access to a trusted utility system, which can
also be thought of as an insider threat. Then, the attacker uses that access to issue remote
service disconnect commands to each smart meter in the serviced area to cause a
corresponding loss of load in the power network. What makes this attack scenario
interesting is that: 1) it addresses a new class of risk introduced by adding load switching
capability to customers’ load unlike previous studies that modeled load drop attacks at
higher levels of the power system [35] [28]; 2) it demonstrates how the characteristics of
the communication system affect the cyber-attack and how threats propagate to the power
side of the system. In the following sections we describe the system under study and
demonstrate the usability of the methodology in evaluating the resilience of the smart
grid in the presence of a cyber-physical attack.
4.1 System Description
Separate simulation tools were used to model the cyber and physical (power) parts
of the system. Ns-2 was used to simulate the communication network between the utility
and the customer side. On the other hand, PowerWorld was used to simulate the power
side of the system. To model the dependencies between the two subsystems, analytical
models of the in-between components were used. The results we got from these models
do not apply directly to real-world systems; however, more complex and sophisticated
representations of real systems can be used in future work. The whole model of the
system is represented in Figure 4.1. Next, we describe the details of the system and the
models used to represent it.
Figure 4.1: Smart Grid system model consisting of four elements: (1) the head end at the utility
for smart meter management, (2) ‘N’ RF wireless mesh networks of ‘m’ smart meters each,
(3) a neighborhood model that defines meter and load distribution, and (4) a model of the
power system.
[Diagram: head end; Networks 1 to N, each with Meters x.1 to x.m and their Loads x.1 to x.m;
power system. Legend: communication link, power line.]
Head End: the head end represents the central control system of smart meters at
the utility side. The head end is responsible for issuing remote disconnect commands
targeting smart meters at customers’ premises. The head end is assumed to be capable of
sending commands at a sufficiently high rate (thousands of commands per second) so that it is
not a bottleneck in the system.
Wireless Mesh Network: this model represents the communication network
between the head end and the meters (customer side) which includes wired and wireless
segments. We make the simplifying assumption that the wireless segment dominates the
characteristics of the communication network. The wireless segment is actually the RF
mesh in the neighborhood model which consists of wireless smart meters that
communicate with the head end in an ad hoc fashion through a wireless router placed at
the center of the region.
Network Protocols: For this work, we chose to simulate a meter network using
TCP/IP as the transport and network protocols. While meter implementations may be
built on other protocols, many implement TCP/IP communications [60] [61]. The routing
protocol in the RF mesh is Ad-hoc On-Demand Distance Vector (AODV) [62].
Meter Radio Configuration: Each meter is configured in the region with the
following parameters derived from specifications of commercial smart meters [63]: radio
frequency = 900 MHz, data rate = 100 kbits/s, transmitter output = 30 dBm (1 Watt),
receiver sensitivity = −97 dBm.
Propagation Model: The ns-2 simulator is configured to simulate an outdoor
shadowed urban area using the shadowing propagation model with the following
parameters: path loss exponent (β) = 2.7, standard deviation (σ_dB) = 4, reference distance
= 1 m.
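To make the radio and propagation parameters above concrete, the following is a minimal sketch of the log-normal shadowing model in its usual textbook form. It is not taken from the ns-2 configuration used in this work: it assumes unity antenna gains and a free-space (Friis) loss at the reference distance, and all function names are illustrative.

```python
import math

C = 299_792_458.0  # speed of light, m/s


def mean_rx_power_dbm(d_m, tx_dbm=30.0, freq_hz=900e6, beta=2.7, d0_m=1.0):
    """Mean received power (dBm) at distance d_m under log-distance path loss."""
    # Assumption: free-space loss at the reference distance, unity antenna gains.
    fspl_d0_db = 20.0 * math.log10(4.0 * math.pi * d0_m * freq_hz / C)
    return tx_dbm - fspl_d0_db - 10.0 * beta * math.log10(d_m / d0_m)


def reception_probability(d_m, sensitivity_dbm=-97.0, sigma_db=4.0, **kw):
    """P(received power >= sensitivity) with zero-mean Gaussian shadowing in dB."""
    margin_db = mean_rx_power_dbm(d_m, **kw) - sensitivity_dbm
    return 0.5 * math.erfc(-margin_db / (sigma_db * math.sqrt(2.0)))


def median_range_m(sensitivity_dbm=-97.0, beta=2.7, d0_m=1.0, **kw):
    """Distance at which the mean received power equals the receiver sensitivity."""
    margin_db = mean_rx_power_dbm(d0_m, beta=beta, d0_m=d0_m, **kw) - sensitivity_dbm
    return d0_m * 10.0 ** (margin_db / (10.0 * beta))
```

At the median range the reception probability is 0.5 by construction; shorter links succeed more often and longer links less often, which is the per-packet randomness the shadowing model introduces into the mesh.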
Neighborhood Model: We model 400 meters distributed in a region with a
wireless router placed at the center of the region. Our preliminary simulation results
showed that 400 meters is a suitable number for a single wireless router. In a practical
implementation, this number may vary and can be enhanced by the use of repeaters that
extend the communication range of the wireless router [64]. In this neighborhood model,
buildings are uniformly distributed in the region with customer types: residential,
commercial and industrial. Customer types, percentages, distribution in the neighborhood
model and load ratings are assigned based on: 1) Analytical model of example
neighborhood based on census data for area with ZIP code 90057; 2) average numbers for
the entire Los Angeles Department of Water and Power (LADWP) service area as
reported to the US Energy Information Administration [65].
Each customer is assumed to have a single smart meter that is responsible for
collecting usage information and sending it to the utility. In addition, each meter has the
remote disconnect capability which is available in many commercial smart meters [60]
[66] [67]. The execution time and latencies of command execution are assumed to be
negligible relative to other communication factors. Figure 4.2 shows an example
neighborhood model and Table 4.1 shows the details of the customers’ types and average
loads.
Figure 4.2: Wireless mesh network configurations. Dots represent meters placed on a
regular grid. Meters could be one of residential, industrial or commercial types as
shown by different colors. Multiple dots clustered together represent multi-unit
structures with many meters at the same location.
90057 Neighborhood Energy Customer Model
Customer Type    Percent (%)    Num. of Meters    Avg. Load (kW)
Industrial              0.50                 2             19.17
Commercial             12.20                49              8.51
Residential            87.30               349              0.67
  1 unit                   5                17
  2 units                  2                 6
  3-5 units                5                20
  5-9 units                6                21
  10-19 units             12                45
  20+ units               70               240
Totals                   100               400            689.16
Table 4.1: Neighborhood model of meter and load distribution.
Power Distribution: The power system is simulated using the IEEE 9-bus, 3-
machine test model, shown in Figure 4.3. This model is used frequently in the literature
for stability and frequency control analysis. Starting with the PowerWorld library version
of the IEEE 9-bus model, the model was configured to include [68]: IEEET1 for the
exciter, TGOV1 for the governor and IEEL for the load. To provide semi-realistic
stability in the model, over-frequency protection was enabled with a threshold of 61.8 Hz
and a pickup of 0.25 seconds, and under-frequency protection was enabled with a
threshold of 57.6 Hz and a pickup of 2 seconds. As shown in Table 4.1, a single RF mesh
has an average load of 689.16 kW. The IEEE 9-bus model has a load of 315 MW (Figure
4.3). Our analysis thus considers 457 RF Meshes.
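The two numbers in this paragraph can be checked directly from Table 4.1. The short sketch below reproduces the arithmetic; rounding to the nearest whole mesh is our reading of how 457 is obtained, and the names are illustrative.

```python
# (customer type, number of meters, average load in kW) from Table 4.1
CUSTOMERS = [
    ("Industrial", 2, 19.17),
    ("Commercial", 49, 8.51),
    ("Residential", 349, 0.67),
]
SYSTEM_LOAD_KW = 315_000.0  # IEEE 9-bus model load of 315 MW


def mesh_load_kw(customers=CUSTOMERS):
    """Total load of one RF mesh: sum of (meters x average load) per type."""
    return sum(n * kw for _, n, kw in customers)


def meshes_for_system(system_load_kw=SYSTEM_LOAD_KW):
    """Number of RF meshes needed to account for the power model's load."""
    # Assumption: the count is rounded to the nearest whole mesh.
    return round(system_load_kw / mesh_load_kw())


print(f"{mesh_load_kw():.2f} kW per mesh, {meshes_for_system()} meshes")
```

That is, 2 × 19.17 + 49 × 8.51 + 349 × 0.67 = 689.16 kW per mesh, and 315 MW / 689.16 kW ≈ 457.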
Figure 4.3: IEEE 9-bus power model.
4.2 Applying the Function-based Resilience Evaluation Methodology
In this section we demonstrate how we evaluate the resilience of the power
delivery function in the presence of the load drop attack using the function-based
evaluation methodology (Section 3.3). In the following analysis, we map each step in
analyzing this attack to the corresponding step in the framework.
4.2.1 Identify the function under study and the functions and components on which it depends
The function under study in this use case is power delivery which is the main
function of smart grids. The question that we ask at this stage is: how can this function be
interrupted? But in order to answer this question, we have to identify the dependencies of
the power delivery function in the system. In our current work, we rely on experts’
opinion for this matter. So the question becomes: what are the components and functions
on which the power delivery function depends? The main component
in the power delivery function is power generation. If power generation fails then power
delivery will be directly affected and may cause total system shutdown. Power delivery
can be affected by manipulating system load or generation. For example, a sudden load
drop (i.e. load manipulation) may cause generators to trip which leads to a total or partial
system shutdown. This is based on the fact that a sudden load drop will increase the
frequency of the power system above its normal level (i.e. 60 Hz). As an over-frequency
protection mechanism, governors will respond by tripping generators [69]. This will not
only cause power delivery failure in the serviced area, but may also propagate to other
areas because of cascading failure effects. We continue by asking: what can cause a
sudden load drop in the system (i.e. continue investigating dependencies)?
Some of the newly introduced services in the smart grid, like load management
and remote disconnect, have a control capability on customer loads. In this use case, we
evaluate the threat of an attacker compromising the remote disconnect functionality and
using it to cause power delivery failure and total system shutdown. This use case is novel
because it considers this load switching capability at the customer side, not only circuit
breakers and switches at the transmission level of the system.
The metrics used to evaluate the resilience of the power delivery function in this
use case are derived from the main function under study. A total or partial system
shutdown will cause a severe power delivery failure in the serviced area. However, what
causes generator tripping is the over-frequency protection in the system which trips the
generator if the frequency exceeds 61.8 Hz for more than 0.25 seconds (pick-up time).
We use the system frequency as an indicator of the power delivery function, as demonstrated
in Section 4.2.3. Using this function-based approach, the evaluation process is scaled
down to one function at a time (i.e. power delivery).
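The tripping rule described above (frequency above 61.8 Hz for more than 0.25 seconds) can be sketched as a simple check over a sampled frequency trace. This is only an illustration of the pickup-time logic, not the PowerWorld relay model; the function name and the sample format are assumptions.

```python
def overfrequency_trip_time(samples, threshold_hz=61.8, pickup_s=0.25):
    """Return the time (s) at which the relay would trip, or None.

    samples: iterable of (time_s, frequency_hz) pairs in increasing time order.
    """
    excursion_start = None
    for t, f in samples:
        if f > threshold_hz:
            if excursion_start is None:
                excursion_start = t      # frequency just crossed the threshold
            elif t - excursion_start > pickup_s:
                return t                 # above threshold for more than the pickup time
        else:
            excursion_start = None       # excursion ended; reset the timer
    return None
```

A short excursion above the threshold that ends before the pickup time elapses does not trip the generator, which is why the duration of a load drop matters as much as its magnitude.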
4.2.2 Create attack tree
Based on the dependencies identified in the first step and a list of possible threats
that are taken as input to this step, an attack tree is created (Figure 3.2). This attack tree
can be divided into four main levels: 1) the top level failure which is system shutdown and
represents the attacker’s objective (power delivery failure is included implicitly in system
shutdown); 2) physical consequence of the cyber-attack on the physical system which is a
sudden load drop in this case; 3) cyber-attack that caused the load drop which is
exploiting the remote disconnect function in the AMI system; and 4) the attack technique
that is used to implement the attack. In this case it can be compromising the head end to
send illegitimate remote-disconnect commands. There are two physical factors that
propagate from the cyber domain to the physical domain (from the third to the second
level of the attack tree): the amount of load dropped and the time over which load is
dropped. Those two factors are the main physical factors in this case.
In Figure 3.2, the attack tree nodes in the second and third levels do not include
all the scenarios through which the top level node (goal of the attack) can be achieved.
For example, there might be other unforeseen cyber-attacks that can cause sudden load
drop. In addition, there might be faults from non-malicious events that have the same
(and even worse) impact on the smart grid. The fourth level of the attack tree can be
extended to more detailed levels. For example, the leaf nodes of the attack tree can be
extended to demonstrate how the head end at the utility can be compromised. However,
this is out of the scope of this dissertation, as we are interested in evaluating the effect of
the attacks, not how they may manifest.
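The four levels described above can be captured in a small tree structure. The sketch below encodes only the single path discussed in the text, not the full tree of Figure 3.2; the class and function names are illustrative.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AttackNode:
    label: str
    level: int                      # 1 = attacker objective ... 4 = attack technique
    children: List["AttackNode"] = field(default_factory=list)


def build_load_drop_tree():
    """Build the one branch of the attack tree that this use case analyzes."""
    technique = AttackNode(
        "Compromise head end to send illegitimate remote-disconnect commands", 4)
    cyber = AttackNode("Exploit remote disconnect function in the AMI system", 3,
                       [technique])
    physical = AttackNode("Sudden load drop", 2, [cyber])
    return AttackNode("System shutdown", 1, [physical])


def attack_paths(node, prefix=()):
    """Yield every root-to-leaf path as a tuple of node labels."""
    path = prefix + (node.label,)
    if not node.children:
        yield path
    for child in node.children:
        yield from attack_paths(child, path)
```

Adding further cyber-attacks or non-malicious faults that cause a load drop would amount to adding children under the level-2 node, which is why the sensitivity analysis is performed at that level.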
4.2.3 Perform sensitivity analysis based on the first two steps
A sensitivity analysis is performed on the power system to answer the question “what if
there is a sudden load drop?”. This analysis was performed at the second level of the
attack tree in order to abstract as many cases as possible (i.e. multiple cyber-attacks with
the same impact). The goal of this sensitivity analysis is to evaluate the impact of
variations of load drops on the system. Taking a function-based approach allows focusing
on the top node of the attack tree and the system it represents (target of the attacker). In
this case, it is the power system. The abstraction in this step simplifies the whole
resilience evaluation process. The transient stability add-on in PowerWorld was used to
implement this analysis. Two inputs were varied to inject a load drop in the simulations:
1) the amount of load dropped which was varied between 20% and 80% of the total load
in the system; 2) the duration over which the load was dropped was varied between 0 and
100 seconds. The results of this sensitivity analysis are shown in Figure 4.4. The
simulation results demonstrate that a total system shutdown can be achieved for a subset
of simulated scenarios. For these scenarios, the frequency safety limit of 61.8 Hz was
exceeded. As demonstrated in Figure 4.4, there is a clear boundary between the system
shutdown cases (non-acceptable function failure) and partial failure of the power delivery
(acceptable function failure).
Figure 4.4: Contour plot showing the maximum system frequency measured in the power
simulation for different magnitudes and durations of load drops. The black dots represent
cases that resulted in system shutdown (zero power output from generators). The figure was
generated by Eric Rice from JPL [23].
The importance of this sensitivity analysis is that 1) it demonstrates the effect of a
load drop on the system; 2) it abstracts many events (malicious or non-malicious) that can
cause a load drop in the system and those results are not limited to the proposed attack
scenario. For example, a load drop can be caused by exploiting the remote disconnect
command (Section 4.2.4), load management or both.
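The structure of such a sweep can be sketched as follows. The power simulation itself is left as a caller-supplied function (`simulate_max_freq` is a hypothetical stand-in for a transient stability run), the grid step sizes are assumptions because the text gives only the ranges, and comparing the maximum frequency against 61.8 Hz is a simplification that ignores the pickup time.

```python
# Sweep ranges from the text; the step sizes are assumptions.
DROP_FRACTIONS = [round(0.20 + 0.05 * i, 2) for i in range(13)]  # 20% .. 80%
DURATIONS_S = [10 * i for i in range(11)]                        # 0 .. 100 s


def sweep(simulate_max_freq, drop_fractions, durations_s, limit_hz=61.8):
    """Map each (fraction, duration) scenario to True if the limit is exceeded.

    simulate_max_freq(fraction, duration_s) must return the maximum system
    frequency in Hz for that scenario; it is supplied by the caller.
    """
    return {
        (frac, dur): simulate_max_freq(frac, dur) > limit_hz
        for frac in drop_fractions
        for dur in durations_s
    }


def boundary(results):
    """For each duration, the smallest load-drop fraction that exceeded the limit."""
    smallest = {}
    for (frac, dur), exceeded in results.items():
        if exceeded and (dur not in smallest or frac < smallest[dur]):
            smallest[dur] = frac
    return smallest
```

The output of `boundary` corresponds to the kind of dividing line visible in the contour plot: for each duration, the load drop beyond which the scenario falls into the unacceptable region.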
4.2.4 Analyze a bottom-up attack scenario
Figure 4.5: Flow chart that demonstrates how a bottom-up load drop attack was simulated.
Network simulation: (1) generate background traffic; (2) generate a series of remote
disconnect commands; (3) capture the command receipt time at each meter (load drop time).
The output, the receipt time of remote disconnect commands (i.e. load drop time) mapped to
the neighborhood model, is the input to the power simulation: (4) aggregate the load dropped
by combining loads that are dropped within certain time steps; (5) drop the aggregated load
(MW) within each time step in the power model.
The purpose of this step is to demonstrate and verify that cyber-attacks can
actually propagate to the physical system and cause the consequences demonstrated in
Section 4.2.3. In addition, we want to study the impact of the communication network
reliability on the command delivery from the head end to the meters. A modular approach
was used to demonstrate how the attack propagates from the cyber to physical part of the
system. This analysis started with the assumption that an attacker gains access to the head
end at the utility and issues remote disconnect commands to individual meters. The
communication network between the head end and meters was simulated using ns-2. The
simulation of this bottom-up analysis was implemented according to the following
procedure (Figure 4.5):
1. Create background traffic in the wireless network representing nominal mesh
traffic against which the scenario can be assessed. For the cases discussed here, it
was assumed that the meters are collecting usage measurements every 15 minutes,
and transmitting a 1558-byte batch of measurements every 12 hours on a
staggered schedule [70].
2. Generate a series of service disconnect commands from the head end addressed to
each meter in a single RF mesh. The time spacing between two subsequent
commands is referred to as t_s. Each command is 47 bytes [70].
3. Send the meter commands through the meter wireless network, and capture the
resulting command receipt reliability and times at the meters.
4. Expand the wireless simulation results to all meters in the region under study.
This is done by assuming that the large-scale region under study consists of
multiple RF meshes. The head end sends a remote disconnect command in a
round robin fashion to each RF mesh targeting a single meter. The head-end
finishes one round by sending a single remote disconnect command to one meter
in each RF mesh within a bounded time t_s. After finishing this operation, each
meter in the network is assigned a receive time of the remote disconnect
command. This means that the load connected to this meter can be dropped at this
assigned time.
5. Aggregate the load dropped by combining loads that are dropped within a
bounded time t_l. As a result, load drops are executed in time steps each of length
t_l. In this case, we aggregate the load dropped in one round robin cycle (i.e. the
load connected to a single meter in each RF mesh). The time assigned to these
load drops is the time recorded for the receipt of the remote command (step 3).
6. Use the aggregate from the previous step as input to the power model.
7. Analyze the output of the power simulation.
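Steps 4 and 5 of the procedure above can be sketched as follows. This is an idealized illustration, not the simulation used in this work: it assumes every command is delivered, that each round completes exactly one spacing interval t_s after the previous one, that all RF meshes are identical, and that meters are addressed in an arbitrary fixed order. The names are illustrative.

```python
# Per-meter average loads (kW) from Table 4.1; the ordering is an assumption.
METER_LOADS_KW = [19.17] * 2 + [8.51] * 49 + [0.67] * 349


def load_drop_curve(meter_loads_kw, n_meshes, t_s):
    """Return [(time_s, cumulative_load_dropped_mw)], one point per round.

    In each round-robin cycle the head end disconnects one meter in every
    mesh, so the load dropped per cycle is n_meshes times that meter's load.
    """
    curve = []
    dropped_kw = 0.0
    for round_idx, load_kw in enumerate(meter_loads_kw):
        dropped_kw += n_meshes * load_kw
        curve.append(((round_idx + 1) * t_s, dropped_kw / 1000.0))
    return curve
```

Under these assumptions, a spacing of t_s = 150 ms with 400 meters per mesh spreads the full load drop over 400 × 0.15 = 60 seconds; in the actual procedure the times come from the command receipt times recorded in the network simulation.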
In order to study how the characteristics of the communication network affect the
load drop attack, we performed the above procedure and mapped the results to the load
drop analysis (Figure 4.4). A sequence of attack scenarios was generated by varying the
time spacing between subsequent commands (t_s) in 50 millisecond steps between 50 and
400 milliseconds. Figure 4.6 shows the cumulative number of meters that received a
remote disconnect command for each scenario. As mentioned in the system description,
each wireless RF mesh has 400 meters. For scenarios with inter-command time spacing
(t_s) of 150 milliseconds and above, the behavior is dominated by t_s. On the other hand,
when t_s is less than or equal to 100 milliseconds, the delivery rate of the commands is
affected by collisions and packet losses due to network contention.
Figure 4.6: Wireless mesh simulation results showing command delivery rates for
smart meters within an RF Mesh for different command spacing intervals.
Command ordering was varied to generate an error analysis for the boundary cases,
that is, the 100 and 150 millisecond cases. The results shown in Figure 4.7 indicate that
this is a trustworthy boundary. We can conclude from this analysis that delivery of
all remote disconnect commands can be assumed reliable with the given network
configuration for command spacings greater than or equal to 150 milliseconds.
Figure 4.7: Error analysis for the 100 ms and 150 ms command spacing intervals
produced by varying the command ordering.
Finally, the results of the network simulation were used to generate the load drop curves
in the power simulator. The load drop curve for each of the simulated scenarios is
overlaid on top of the sensitivity analysis output as shown in Figure 4.8. The results
demonstrate how some scenarios (e.g. the 150 millisecond case) cross the acceptable
function failure boundary and cause a total system shutdown. A total system shutdown is
the ultimate goal in the attack tree.
The results we got from this analysis can be used to improve the resilience of the
smart grid. There are two factors that affect this cyber-physical threat: amount of load
dropped and time over which the load was dropped. By controlling those two factors, that
is, reducing the amount of load dropped and expanding the time, the threat can be
mitigated and resilience can be improved. We will focus on those aspects in our future
work especially in deriving policies that can maintain system resilience.
Figure 4.8: Integration of power and wireless simulation results showing
resulting interactions between the two systems. This figure was generated by Eric Rice
from JPL [23].
4.3 Results Analysis
As demonstrated in the previous section, there are two parts in evaluating the
load-drop use case: the first one is the sensitivity analysis that involved the power system
only (root of the attack tree). As Figure 4.4 demonstrates, there is a clear boundary between
acceptable and unacceptable function failure. In the acceptable function area, none of the
generators tripped and the system was resilient enough to handle the load-drop attack.
Specifically, generator governors responded in a timely manner and protected the system
from over-frequency tripping. In the area above the boundary there are two failure
behaviors: 1) total system shutdown, in which all three generators in the system tripped,
resulting in zero power output from them. This implies that the maximum
frequency exceeded 61.8 Hz in those cases. 2) Partial system shutdown, which results from
fewer than three generators tripping. Those cases happen when one or two generators trip
out of sync with the others. The loss of generation balances the loss of load, which
protects the system from over-frequency tripping.
The second analysis is the end-to-end analysis that involved the network and
power simulations together. As demonstrated by Figure 4.6 and Figure 4.8, the network
in the given configuration can be assumed reliable for command spacings greater than or
equal to 150 milliseconds. Among these reliable cases, some scenarios (e.g. the 150
millisecond case) cross the boundary of unacceptable function failure, which indicates
that the system is not resilient to such an attack.
Using the function-based evaluation methodology, we were able to quantify
resilience evaluation in two ways: first, by quantifying system stability in the presence of
cyber-attacks measured through frequency (Hz) and generator status (e.g. shutdown
because of the attack). Second, the amount of disturbance the system can tolerate
measured by the amount of load dropped (MW) and time over which load is dropped
(seconds). By correlating those two measures as we did in Figure 4.4, we created a
boundary that separates system failure (total system shutdown) and acceptable system
performance. This boundary identifies the conditions under which the system achieves
resilience. In this case, it is the maximum amount of load that can be dropped within a
certain time without power delivery failure.
Chapter 5
Use Case 2: Communication Architecture Resilience
In this use case, we evaluated the resilience of two smart grid functions, remote
metering and Demand Response (DR), under a cyber-attack on the communication
architecture [59]. The main part of the communication architecture under study is the
wireless RF mesh in the neighborhood area. The RF mesh is the most exposed and
vulnerable part of the communication path between the utility and the meters (customer
side). We assume that an attacker wants to generate a DoS attack targeting the wireless
router in a certain RF mesh. The attack takes advantage of the large number of meters
within the geographical region to generate a DoS attack on the wireless router node by
simultaneously generating low bit-rate traffic (hundreds of kbits/s) from individual
meters. Realistically, an attacker can accomplish this attack using different means. For
example, an attacker could compromise smart meters in a certain RF mesh and reprogram
them to increase the frequency at which they send meter reads. Or, an attacker could take
control of other customer devices such as the service gateway within a Home Area
Network (HAN) to send spurious traffic creating a DoS attack.
5.1 System Description
We modeled a real geographical region, shown in Figure 5.1, using ns-2. Each
house shown in the figure represents a real smart meter node and they communicate with
a wireless router, represented by a star, located at the center of the region. The wireless
router is responsible for relaying packets between the meters in the RF mesh and the
utility through the Wide Area Network (WAN).
Meter Configuration: We configured each meter in the region with the following
parameters derived from specifications of a real smart meter [63]: radio frequency = 900
MHz, data rate = 100 kbits/s, transmitter output = 30 dBm (1 Watt), receiver sensitivity
= −97 dBm. We configured the transport and network protocols to be UDP/IP.
Meter Distribution: We use the region shown in Figure 5.1 to make an informed
guess about the meter coordinates. The chosen region allows placing meters uniformly
and placing the wireless router at the center of the region.
Figure 5.1: Geographical image of the simulated
region. Each house in the image has one meter and
the star represents the wireless router in the center
of the region.
Propagation Model: We configure the ns-2 simulator to simulate an outdoor
“shadowed urban area” using the shadowing propagation model with the following
parameters: path loss exponent = 2.7, standard deviation = 4, reference distance = 1 m.
The two functions were modeled as follows: for metering, we assumed that all
meters send their meter reads to the central wireless router, where each meter read is
1000 bytes, according to a preconfigured sending interval set by the utility. For DR, we
simulated sending of a DR load curtailment signal from the wireless router to a group of
enrolled homes requesting that they curtail a certain amount of load. We assumed that only
20% of the smart meters register in the DR program to receive DR requests from the
utility. Upon receipt of a DR request, the smart meter immediately responds by sending a
DR reply to the wireless router. In our experiment, we simulated a DoS attack by
assuming that an attacker compromises some fraction of the meters within the region and
reprograms them to send spurious meter reads at a higher frequency. The effect of the
DoS attack, that is, the amount of traffic in the network, is controlled by varying the
meter sending intervals between 20 s and 60 s.
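The relationship between the sending interval and the traffic offered to the network can be sketched with simple arithmetic. The sketch below counts application payload only (no protocol headers, retransmissions or multi-hop relaying), and the number of sending meters is a hypothetical parameter that is not given in this section.

```python
def offered_load_bps(n_senders, read_bytes=1000, interval_s=60.0):
    """Aggregate application-layer traffic (bits/s) from n_senders meters."""
    return n_senders * read_bytes * 8 / interval_s


def utilisation(n_senders, interval_s, link_bps=100_000):
    """Offered load as a fraction of the 100 kbits/s meter data rate."""
    return offered_load_bps(n_senders, interval_s=interval_s) / link_bps
```

A single meter sending a 1000-byte read every 20 s offers 8000 / 20 = 400 bits/s; shortening the interval or increasing the number of sending meters scales the offered load proportionally.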
5.2 Applying the Function-based Resilience Evaluation Methodology
In this section, we demonstrate how the function-based evaluation methodology
was used to evaluate the remote metering and DR functions. Similar to Section 4.2, we
map each step in analyzing this attack to the corresponding step in the framework.
5.2.1 Identify the service under study and the services and components on which it depends
Mainly, there are two functions under study in this use case: remote metering and
DR. Automated remote metering requires meters to send meter reads to the utility at a
configurable frequency. This function depends on reliable and timely delivery of meter
data to the utility by the underlying AMI communication infrastructure. Long term
disruption of the metering function impacts the operational resiliency of the smart grid by
interfering with revenue. DR is a critical component of automated load management and
relies on the ability of the AMI communication infrastructure to reliably send load
curtailment requests to smart meters and other end devices for dynamically managing the
overall system load. DR signals to the HAN could travel through the Internet or the AMI
system, but we only consider the latter.
Unlike metering, disruption of DR operations can have near-term effects on
operational resiliency of the smart grid by destabilizing the power grid. Other
functions are indirectly covered by this analysis because of the nature of the DoS
attack on the communication architecture. Those include automated outage management
and cyber security. Automated outage management requires smart meters to send outage
information in a last gasp message on detection of an outage by the meter [71]. The
utility uses information such as time and location of the outage from the message to
restore power in a timely manner. A disruption of this function directly affects the
operational resiliency of the grid by delaying the recovery and restoration of power to end
customers. The Cyber Security (CS) component protects the smart grid system against
attacks and failures and provides integrity, availability and confidentiality services for the
smart grid. CS functions such as detection, diagnosis and response depend on the
underlying communication infrastructure for tasks such as transporting monitored data
from different critical points in the system, exchanging detection and diagnosis messages
across its components and communicating response actions for responding promptly to
adverse situations. Disruption of these functions has direct consequences on the security
of the smart grid and impacts its overall resilience.
As shown in Figure 3.1, the resilience of the overall smart grid also depends on
the resilience of its higher level functions which in turn are directly dependent on the
resilience of the AMI communication layer. The AMI communication layer is a common
component on which all these functions depend. In order to have reasonable metrics to
measure the resilience of the smart grid to the DoS attack, we need to identify what
constitutes an acceptable performance for each of the analyzed functions. Remote
metering is resilient if data from some percentage of the meters is always delivered to the
utility and within a bounded time, where the percentage and time are dependent on
utility-specific requirements. Demand response is resilient if the required kWh of load is
always curtailed within a bounded time, where the required load and time are dependent
on utility-specific requirements. Outage management is resilient if the utility can always
identify and recover from outages within a bounded time, where the time is dependent on
utility-specific requirements. The cyber security component is resilient if it always detects
and responds to security threats before performance and security requirements of other
functions are violated.
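The first two acceptance criteria above can be written as simple predicates over one observation window. This is only a sketch: the thresholds are utility-specific and therefore appear as parameters, the names are illustrative, and the "always" in the definitions would require the check to hold for every window observed.

```python
def metering_resilient(delivery_delays_s, n_meters, min_fraction, max_delay_s):
    """True if enough meters delivered their data within the bounded time.

    delivery_delays_s holds one delay per meter whose data arrived;
    meters whose data was lost are simply absent from the list.
    """
    on_time = sum(1 for d in delivery_delays_s if d <= max_delay_s)
    return on_time / n_meters >= min_fraction


def dr_resilient(curtailed_kwh, required_kwh, completion_time_s, max_time_s):
    """True if the required load was curtailed within the bounded time."""
    return curtailed_kwh >= required_kwh and completion_time_s <= max_time_s
```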
Since the attack is directly performed on the communication infrastructure, it
causes legitimate packets belonging to higher-level functions to be dropped or delayed
which impacts their performance and consequently their resilience. We analyze the
performance of these higher-level functions for different configurations of the
communication architecture, discussed in the next step. A resilient communication
architecture is one that sustains the cyber-attack without compromising the
performance requirements of the higher-level functions. We define four metrics to
measure the impact of the attack on the performance of higher-level functions. For the
purposes of this work and the definitions below, we assume the sender to be a customer-
side device such as a smart meter and the receiver to be a node such as the wireless router
node within a smart grid region.
Packet Delivery Ratio (PDR) defined as the number of packets successfully
received by a receiver over the expected number of packets.
Average End-to-end Delay defined as the average time taken for packets to be
transmitted from the sending application to the receiving application.
Average Packet Hop Count defined as the average number of intermediate
nodes through which the packets sent by a sender are routed. In the case of an RF mesh-
based network, the average hop count measures the number of meters traversed by a
packet before it reaches the receiver.
Successful DR Requests Ratio defined as the number of DR requests that
successfully receive a reply over the total DR requests that were issued.
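The four metrics defined above can be computed from per-packet records as in the following sketch. The record layout and function names are assumptions made for illustration; they do not reflect the ns-2 trace format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Packet:
    sent_s: float
    received_s: Optional[float]  # None if the packet was lost
    hops: int = 0                # intermediate nodes traversed (delivered packets)


def packet_delivery_ratio(packets, expected=None):
    """Delivered packets over expected packets (defaults to all packets sent)."""
    expected = len(packets) if expected is None else expected
    delivered = sum(1 for p in packets if p.received_s is not None)
    return delivered / expected


def average_delay_s(packets):
    """Mean sender-to-receiver delay over delivered packets (NaN if none)."""
    delays = [p.received_s - p.sent_s for p in packets if p.received_s is not None]
    return sum(delays) / len(delays) if delays else float("nan")


def average_hop_count(packets):
    """Mean number of intermediate nodes over delivered packets (NaN if none)."""
    hops = [p.hops for p in packets if p.received_s is not None]
    return sum(hops) / len(hops) if hops else float("nan")


def successful_dr_ratio(replies_received, requests_issued):
    """DR requests that received a reply over all DR requests issued."""
    return replies_received / requests_issued
```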
The first three metrics measure the performance of the metering function while the
last metric applies to DR. The above metrics are not unique to our work and have been
previously used by other researchers to measure resilience in different domains. Liu et al.
[72] define network resilience as the percentage of lost traffic upon failures. Cholda et al.
[73] define network resilience as the general ability to improve network fault tolerance and
reliability. Metrics derived from dependability attributes of systems like availability and
performance have also been proposed to quantify resilience. For example, Liu et al. [72]
use packet loss rate and Najjar et al. [74] use packet loss rate and packet delay to quantify
resilience in their work. Lee et al. [75] quantify the resilience of a system under DoS
attack by the amount of traffic that needs to be sent to the system to make it unavailable.
We chose our metrics because our approach is based on measuring the performance of
higher-level functions.
5.2.2 Create attack tree
Based on the dependencies discussed in the first step, an attack tree is created as
demonstrated in Figure 5.2. Because the same attack can be used to interrupt the two
analyzed functions, the top node in the attack tree combines the two attack objectives:
to interrupt the remote metering and/or DR services. The attack is implemented based
on the assumption that an attacker compromised smart meters in a certain RF mesh and
reprogrammed them to increase the frequency at which they send meter reads. If such an
attack is successful, it can have a major impact on customers at a large scale in addition
to affecting multiple functions at the same time. A DoS attack can be created in many
ways in the RF mesh. In this evaluation, we are interested in grouping multiple attacks
with the same impact on the RF mesh. There are two main cyber factors that affect the
result of the attack: the number of nodes and the data rate at which those nodes launch
the attack. Because we are modeling a relatively small region, some factors like physical
obstacles and node location were neglected. However, those factors may still be taken
into consideration in future work.
5.2.3 Perform sensitivity analysis based on the first two steps
Figure 5.2: Attack tree for remote metering and DR combined. [Figure not reproduced.
Tree levels: (1) top level failure, (2) cyber consequence, (3) cyber attack, (4) attack
technique. Node labels: interrupt remote metering for a large number of customers;
interrupt DR for a large number of customers; DoS attack on the wireless networks; DoS
attack on the head end; reprogram meters to increase the frequency at which they send
meter reads; reprogram service gateway to send spurious traffic; malicious demand
response commands; compromise certain percentage of smart meters; compromise certain
percentage of service gateways.]
Our high-level procedure involves first running experiments under normal
operating conditions, that is, without any DoS attack, to determine a baseline experiment
configuration. Then, we use the parameters from the baseline configuration to study the
resilience of the communication architecture and the performance of functions under the
DoS attack. An experiment configuration is a set of parameters controlling a particular
experiment run and is defined using three parameters: 1) the routing protocol (R) used in
the RF mesh, 2) the number of smart meter nodes (N) in the RF mesh network, and 3) the
sending interval of the meters (I). An experiment run consists of all N meters configured
to use the routing protocol R, with each meter sending its readings periodically at the
configured sending interval I. Each meter starts sending its data at a time (T) chosen from
a uniform random distribution (T ~ U(0, I)). Additionally, the wireless router initiates DR
requests to 20% of the N meter nodes. We collect the results for three reading cycles, that
is, three sending intervals.
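The experiment configuration (R, N, I) and the per-run setup described above can be sketched as follows. This is an illustrative sketch only: the names `ExperimentConfig` and `plan_run` are hypothetical, and choosing the DR targets uniformly at random is an assumption, since the text states only that 20% of the meters receive DR requests.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class ExperimentConfig:
    routing_protocol: str    # R, e.g. "AODV" or "DSR"
    num_meters: int          # N
    sending_interval: float  # I, in seconds


def plan_run(cfg, dr_fraction=0.20, cycles=3, seed=0):
    """Return (start_times, dr_targets, duration) for one experiment run."""
    rng = random.Random(seed)
    # Each meter starts sending at T ~ U(0, I).
    start_times = [rng.uniform(0.0, cfg.sending_interval)
                   for _ in range(cfg.num_meters)]
    # DR requests go to 20% of the N meters (random choice is an assumption).
    dr_targets = rng.sample(range(cfg.num_meters),
                            round(dr_fraction * cfg.num_meters))
    # Results are collected for three reading cycles.
    duration = cycles * cfg.sending_interval
    return start_times, dr_targets, duration
```

For a configuration of 250 meters and a 900 s interval, this yields 250 start times in [0, 900], 50 DR targets and a run length of 2700 s.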
To find a baseline experiment configuration, we run experiments by varying the
choice of routing protocol, the number of meters, and the meters’ sending intervals, and
record the performance metrics defined earlier. We considered three RF mesh routing
protocols: Ad-hoc On-Demand Distance Vector (AODV) [62], Dynamic Source Routing (DSR)
[76] and Destination Sequenced Distance Vector (DSDV) [77]. Our initial simulations for
comparing protocol performance showed that on-demand routing protocols like AODV
and DSR outperform the proactive routing protocol DSDV by imposing less overhead on
the network. We thus only considered AODV and DSR for determining the baseline
experiment configuration. We varied the number of meters within the region from 150
to 350 in steps of 50 meters. Finally, we varied the sending interval over 60, 420,
900 and 1800 seconds. Ideally, utilities would dictate the requirements for choosing an
acceptable baseline configuration. Our method for choosing a baseline configuration
relies on identifying the configuration values that result in a high percentage of
successful DR transactions, followed by a high packet delivery ratio, a low average
end-to-end delay and finally a low average packet hop count. Using the above, we
identified an acceptable configuration with AODV, number of meters = 250 and sending
interval = 900 s, that is, a configuration for which PDR is 97.07%, average packet
end-to-end delay is 2.86 s, average hop count is 2.28 and 100% of DR requests received a
reply.
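One way to read the selection rule above is as a strict priority ordering over the four metrics. The sketch below enumerates the sweep grid from the text and ranks configurations under that reading. Treating the rule as strictly lexicographic is an assumption, and the names `sweep_grid`, `rank_key` and `pick_baseline` are hypothetical; the text states only that an acceptable configuration was identified, not that it was the top-ranked one.

```python
from itertools import product

PROTOCOLS = ["AODV", "DSR"]        # DSDV was dropped after initial simulations
METER_COUNTS = range(150, 351, 50)  # 150, 200, 250, 300, 350
INTERVALS_S = [60, 420, 900, 1800]


def sweep_grid():
    """All (protocol, number of meters, sending interval) combinations."""
    return list(product(PROTOCOLS, METER_COUNTS, INTERVALS_S))


def rank_key(metrics):
    """Sort key: high DR ratio, then high PDR, then low delay, then low hops."""
    return (-metrics["dr_ratio"], -metrics["pdr"],
            metrics["delay_s"], metrics["hops"])


def pick_baseline(results):
    """results maps a configuration tuple to its measured metrics dict."""
    return min(results, key=lambda cfg: rank_key(results[cfg]))
```

The grid holds 2 x 5 x 4 = 40 configurations; `pick_baseline` expects one metrics dictionary per simulated configuration.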
We want to emphasize here that we are not trying to find the best configuration
for the RF mesh, but instead we try to find an acceptable configuration with which we
can simulate the attack. For example, we understand that some routing protocols such as
PRL [78] are more suitable for the RF mesh network and we plan to use these in our
future simulations.
The DoS attack assumes that the attacker has managed to compromise Y% of smart
meters (uniformly distributed in the region) and has reprogrammed their sending interval
to Z seconds. Figure 5.3 summarizes the results of the experiment. We measure the same
performance parameters as for the baseline case under the attack scenario. We observe
that for Y = 10% and Z = 60 seconds the percentage of successfully received packets
drops from 97.07% to 65.45%. The average packet end-to-end delay increases from 2.85
to 4.02 seconds. Instead of compromising smart meters, the attacker can install their own
devices to launch the DoS attack. The attacker can also compromise other devices like
the service gateway in the HAN. In this analysis, we are interested in analyzing the
consequences of the attack rather than the cause of it.
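To give a feel for the two attack parameters, the sketch below computes the aggregate rate of meter-read packets offered to the RF mesh for given Y and Z. It is a back-of-the-envelope illustration under stated assumptions: the function name is hypothetical, and the calculation counts only application-level meter reads, ignoring routing overhead, retransmissions and DR traffic, so it does not by itself predict the measured PDR or delay.

```python
def offered_packet_rate(num_meters, normal_interval_s,
                        compromised_fraction, attack_interval_s):
    """Aggregate meter-read packets per second offered to the mesh."""
    compromised = round(num_meters * compromised_fraction)
    honest = num_meters - compromised
    return honest / normal_interval_s + compromised / attack_interval_s


# Baseline configuration: 250 meters, 900 s sending interval, no attack.
baseline = offered_packet_rate(250, 900, 0.0, 60)
# Attack: Y = 10% of meters reprogrammed to Z = 60 s.
attack = offered_packet_rate(250, 900, 0.10, 60)
increase = attack / baseline  # 225/900 + 25/60 versus 250/900
```

Under these simplifications, compromising 10% of the meters raises the offered meter-read load by a factor of about 2.4.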
Utilities may require the packet hop count in the RF mesh to be within a certain
threshold so as to place deterministic bounds on the latency experienced by meters in a
large network. As we lack the details for such a requirement, we do not enforce it in the
simulation. We observe that enforcing such a requirement would further degrade the
performance with respect to PDR and successful DR requests ratio when the network is
under a DoS attack.
5.2.4 Analyze a bottom-up attack scenario
Because this is a simple use case, the sensitivity analysis implemented in Section
4.2.3 can also be applied to the bottom-up attack scenario implementation. The objective
of this step is to validate that the attack can happen; however, we assume there are
vulnerabilities in the system that make it possible for this attack to succeed and for an
attacker to reprogram meters to launch the DoS attack.
Figure 5.3: Plots of performance metrics with the network under an active DoS attack. The
experiment configuration is the baseline configuration of 250 meters, meter sending interval
set to 900 seconds, and the routing protocol as AODV. Each panel represents two cases: 5%
and 10% compromised meters. [Plots not reproduced. Panels, each plotted against the
reprogrammed sending interval (60, 50, 40, 30 and 20 s): (a) packet delivery ratio (%),
(b) average packet delay (seconds), (c) average packet hop count, (d) successful DR
request ratio (%).]
Based on this study, there are two main factors that affect such a DoS attack: the
percentage of compromised meters and the rate at which meters send their readings. By
applying rate-limiting policies on the meters, such an attack can be mitigated and
resilience can be improved. With such limits enforced, the attack cannot succeed even if
meters are compromised.
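A rate-limiting policy of the kind mentioned above could take many forms; the text does not specify one. The sketch below shows one simple possibility, a minimum-interval limiter, under the assumption that it is enforced somewhere the attacker cannot reprogram (for example, a protected part of the meter or a forwarding neighbor). The class name and the 900 s limit are illustrative choices, not part of the evaluated system.

```python
class MinIntervalLimiter:
    """Allow at most one meter read per min_interval_s seconds."""

    def __init__(self, min_interval_s):
        self.min_interval_s = min_interval_s
        self.last_sent = None  # time of the last forwarded packet

    def allow(self, now_s):
        """Return True if a packet arriving at now_s may be forwarded."""
        if self.last_sent is None or now_s - self.last_sent >= self.min_interval_s:
            self.last_sent = now_s
            return True
        return False


# A meter reprogrammed to send every 60 s, observed over three 900 s cycles.
limiter = MinIntervalLimiter(900)
attempts = range(0, 2700, 60)
forwarded = [t for t in attempts if limiter.allow(t)]
```

With the limiter in place, only 3 of the 45 attempted reads are forwarded, which matches the baseline sending rate.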
5.3 Results Analysis
Analysis of the kind discussed in the previous section helps in understanding the
attack scenarios that disrupt the operation of the communication architecture and the
realistic impacts of those attacks on high-level smart grid functions. We summarize our
key finding as follows: an attacker needs to compromise only a small fraction of the
meters in a typical RF mesh region to disrupt the communication resilience within the
region.
Specifically, we see from Figure 5.3 that a compromise of about 5% of the 250
meters was sufficient to reduce the PDR to 10% and the successful DR request ratio to
zero. Although these figures apply to a single RF mesh region, we observe that given the
cyber nature of the attack, an attacker can easily scale up this attack by replicating it over
multiple RF mesh regions. In this analysis, we correlated the disturbance level (measured
by percentage of compromised nodes and sending interval) with the dependability
measures of the higher-level functions (i.e. remote metering and DR). This correlation
identifies the conditions under which the system achieves resilience. For example,
when 5% of the meters are compromised and programmed to launch a DoS attack, the
higher-level functions fail and the system is no longer resilient. We discuss
the implications of our results for key smart grid functions in subsequent paragraphs.
Remote Metering: Utilities expect to receive a certain percentage of meter reads
per reading cycle and within a bounded time. Missing meter reads from meters may not
be severe as far as billing operations are concerned but the periodic meter reads are also
used in a continuous manner as an input to important demand response functions such as
load monitoring and forecasting. Disruption of these continuous inputs has consequences
for the stability of the overall power grid thereby impacting its resilience.
Demand Response: DR functionality depends on the ability to successfully
curtail load within a bounded time period. This requires DR requests to be successfully
communicated and acknowledged within a bounded time. As we observe from the results,
attacks can cause successful DR transactions (request-response pairs) to reduce to zero.
With additional simulations, we found that the average round trip time (RTT) for
messages increased by a factor of approximately 35 during an attack (RTT was 0.11 s for
the baseline case and around 4 seconds during an attack, for 5% compromised meters and
a 30-second sending interval). This again shows that attackers can easily disrupt the
automated load management functions in the smart grid, which can eventually lead to
consequences such as large-scale blackouts.
Cyber Security: Given that an attacker needs to compromise only a small
fraction of meters to launch a DoS attack, cyber security functions at the utility may not
be able to detect and characterize the impact of the attack immediately, resulting in a
delayed response. In addition, the DoS attack could prevent critical meter events from
reaching the utility, which could further delay detection and response.
Overall, in this chapter, we have quantitatively demonstrated, through simulation,
the effects of a cyber-attack on the resilience of the RF mesh communication architecture
and its impact on the performance of two key higher-level functions: automated
metering and demand response. An important implication of our work is that an improperly
configured and improperly secured smart grid communication architecture can lend itself
to simple DoS attacks, thereby compromising the resilience of the overall smart grid.
Chapter 6
Use Case 3: DR as Spinning Reserve
The primary function of the power system is to deliver continuous power. However, a
large, complex system such as the power grid faces several threats to its stability in the
form of disturbances and contingencies. The power system in North America operates
stably at 60 Hz. Minor disturbances and contingencies such as generation loss cause the
frequency to fluctuate, but as long as the system is able to prevent the frequency from
going out of the optimal operation region (59.97 – 60.03 Hz) and quickly recover to
60 Hz, the system operates continuously [30].
Power reserves are the primary mechanism to handle disturbances and
contingencies and keep the system operating in its optimal operation region (59.97 –
60.03 Hz). Reserves are classified as “spinning” or “non-spinning”, where spinning refers
to the unused but synchronized capacity of the system and non-spinning refers to the
unconnected capacity. The reserves are used by various response mechanisms such as
governor response and Automatic Generation Control (AGC) to balance the frequency of
the system. Based on their type, power reserves are classified as regulating reserves and
contingency reserves. Mechanisms such as governor response and AGC use the
regulating reserves to handle normal operational disturbances in the system. Contingency
reserves (also referred to as reliability response) handle supply contingencies such as loss
of generation [4] [30].
In the future, automated DR mechanisms will be used as a spinning power reserve
by utilities to automatically manage load in the system during times of contingencies or
during times of peak demand. For instance, during a contingency such as a generator trip,
DR will enable an intelligent system controller (or an operator) to send control
commands in the form of load reduction requests to selected customers (or customer
appliances), who (or which) will comply by shutting off the requested amount of load,
thus providing a means to balance and stabilize the system without resorting to more
expensive means like buying more energy. DR thus promises to be an efficient, low-cost
option for utilities to ensure system stability.
There are several reasons that make DR suitable for this type of reliability
response. First, it is infrequently needed (a few times a month) and only needed for a short
amount of time (10 – 15 minutes depending on the requirements and regulations adopted)
[4]. This makes DR less intrusive to customers’ daily life. Second, with the right
communication and control technologies, DR commands can be deployed automatically,
which provides a fast response. In addition, DR provides a faster response than generation
[4]. Finally, using DR as a spinning reserve may reduce the cost of operating and
maintaining typical spinning reserve (synchronized generation).
DR can be considered resilient if the required amount of load is always curtailed
within a bounded time, where the required load and time are dependent on utility-specific
requirements [20]. Using this definition, we can evaluate if DR was successful in
performing its function as a spinning reserve in the presence of a cyber-physical attack
(i.e. whether DR was resilient to cyber-physical attacks). Studies have shown that DR
signals can be sent from the utility to customers’ loads within about 70 seconds [1] [2]
[3]. If DR was not successful in its function as a spinning reserve then this means that
certain requirements were violated, system stability was not maintained and additional
actions should be taken to stabilize the system (like increasing generation).
6.1 System Description
In this analysis, we use the same system described in Chapter 4. However, we
perform some modifications in order to model DR load curtailment commands that are
sent from the head end to the remotely controllable air conditioners at the customer side.
DR commands are sent by the head end to the smart meters, and the smart meters transfer
those commands to the controllable air conditioners. In this neighborhood model, buildings
are uniformly distributed in the region with residential, commercial and industrial
customer types. Customer types, percentages, distribution in the neighborhood model and
air conditioner load ratings are assigned based on an analytical model of an example
neighborhood derived from census data for ZIP code area 90057 and real air conditioner
load ratings [38]. Customer ratings and air conditioner values are shown in Table 6.1.
Customer Type     Percent (%)    Num. of Meters    Avg. AC (kW)
Industrial        0.50           2
Commercial        12.20          49                3.50
Residential       87.30          349
  1 unit          5              17                3.50
  2 units         2              6                 3.50
  3-5 units       5              20                1.44
  5-9 units       6              21                1.44
  10-19 units     12             45                1.44
  20+ units       70             240               0.70
Totals            100            400
Table 6.1: Neighborhood model of meter and air conditioners distribution.
Figure 6.1 shows the smart grid system model of the simulated area. Note that
general loads are replaced by air conditioners in this use case.
Figure 6.1: Smart Grid system model consists of four elements: (1) the head end at the
utility for smart meter management, (2) ‘N’ RF wireless mesh networks of ‘m’ smart meters
each, (3) a neighborhood model that defines meters and air conditioners, and (4) a model of
the power system. [Diagram not reproduced. Labels: Head End; Networks 1 to N, each
containing meters x.1 to x.m paired with air conditioners (AC) x.1 to x.m; Power System.
Legend: communication link, power line.]
6.2 Applying the Function-based Resilience Evaluation Methodology
DR should respond when there is a contingency by curtailing the required amount
of load to stabilize the system to its normal frequency levels (59.97 – 60.03 Hz) within the
required time (15 minutes) [30]. DR as a spinning reserve can therefore be considered
resilient if the frequency and time requirements are met. By the end of this evaluation, we
demonstrate how the resilience of DR in the presence of attacks can be quantified in
terms of system frequency (Hz). The following is a step-by-step resilience evaluation for
cyber-physical threats when DR is used as a spinning reserve:
6.2.1 Identify the function under study and the functions and components on
which it depends.
The function under study in this case is DR when used as spinning reserve. By
focusing on this single function, we are scoping down the evaluation of this large-scale
complex system. The failure conditions of this function can be identified based on its
requirements. DR is required to stabilize system frequency by curtailing the required
amount of load within the required time. The required amount of load is defined based on
the size of the contingency that happens (e.g. generator failure) whereas time
requirements are defined by standards.
This function directly depends on the communication network and control devices
that transfer, receive and execute DR requests. There are other dependencies related to
how DR diagnoses contingencies and makes its decision (e.g. which customers to choose
for load curtailment); however, those aspects are out of the scope of this work. For now,
we rely on expert opinion to identify those dependencies. Cyber-attacks on the
communication and control components of the DR system at the time of a contingency may
have direct consequences on the amount and timing of load curtailment. Manipulation of
system load may impact the stability of the physical system, measured by its frequency.
6.2.2 Create attack tree
Based on the dependencies identified in the first step, an attack tree is created
(Figure 6.2). The main objective behind creating the attack tree is to group cyber-attacks
that have the same impact, which abstracts the evaluation process. This attack tree can be
divided into four levels. The first level is function failure, which is system instability and
also includes violating required standards. In this case, the main violation is operating in
the under-frequency region, which results from loss of generation (or increase in load).
Operating in the under-frequency region has several other consequences for the system.
For example, under-frequency may have effects on power system equipment like motors
and transformers [30].
Figure 6.2: Attack tree for DR as spinning reserve function. [Figure not reproduced.
Tree levels: 1. attack objective, 2. physical consequences, 3. cyber attack, 4. attack
technique. Node labels: cause grid instability / violate standards and requirements;
manipulate system load; manipulate system generation; prevent load reduction; cause
unintended load reduction; cause load increase; block DR commands; corrupt DR
commands; delay DR commands; illegitimate DR commands; DoS attack; compromise
head end; compromise customer devices.]
The second level represents the direct impact of the cyber-attacks on the physical
system, which is load manipulation in this case. Load manipulation may result from
causing load reduction, preventing load reduction or increasing load. The physical factor
that causes function failure is manipulated at this level. The physical factor in this case is
the actual amount of load that responds to a contingency. If this amount is manipulated,
then function failure (i.e. system instability) may happen.
The third level represents the cyber-attacks that stimulate the physical factor. The
cyber-attacks are listed based on the dependencies identified in the previous point. DR
depends on the communication network to transfer its commands. Blocking load
curtailment commands results in preventing load reduction (physical factor). This is how
the attack propagates from the cyber domain to the physical domain. In Figure 6.2, the
attack tree nodes in the second and third levels do not include all the scenarios through
which the top level node (goal of the attack) can be achieved. In addition, there might be
faults from non-malicious events that may have the same impact on the smart grid.
The fourth level is the action that the attacker takes to perform the attack (i.e. how
the attacker implemented the attack). For example, the attacker may need to compromise
the head end or launch a DoS attack in order to block DR load curtailment commands.
The fourth level of the attack tree can be extended to more detailed levels. For example,
the leaf nodes of the attack tree can be extended to demonstrate how the control devices
at the customer side are compromised. However, we stop at the fourth level because we
are concerned not with the cause of the attack but with evaluating resilience
when the attack happens.
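The grouping idea behind the attack tree can be illustrated with a small data structure. The sketch below encodes only the paths that the text states explicitly (a DoS attack or a head end compromise blocking DR commands, which prevents load reduction); the remaining edges of Figure 6.2 are not listed here, and the names `PATHS` and `group_by_consequence` are hypothetical.

```python
from collections import defaultdict

OBJECTIVE = "Cause grid instability / violate standards and requirements"

# (level 4 attack technique, level 3 cyber attack, level 2 physical consequence)
# Only paths stated in the text are listed; the full figure has more edges.
PATHS = [
    ("DoS attack", "Block DR commands", "Prevent load reduction"),
    ("Compromise head end", "Block DR commands", "Prevent load reduction"),
]


def group_by_consequence(paths):
    """Group attack paths by the physical consequence they lead to."""
    groups = defaultdict(set)
    for technique, cyber_attack, consequence in paths:
        groups[consequence].add((technique, cyber_attack))
    return dict(groups)


groups = group_by_consequence(PATHS)
```

Both techniques fall into the single group "Prevent load reduction", so a sensitivity analysis performed at the second level covers them together.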
6.2.3 Perform sensitivity analysis based on the first two steps
The goal of this sensitivity analysis is to quantify the resilience of the system by
drawing a boundary between acceptable function performance and function failure in the
presence of an attack. We analyzed how the system responds as both the DR response and
the contingency size vary. There are two inputs to the sensitivity analysis: first, the size
of the contingency, which is a loss of a certain amount (MW) of generation, and second,
the amount of load that responds to the contingency through DR. The metric that
demonstrates system stability is the frequency of the system (Hz) (i.e. output of the
sensitivity analysis). This analysis was performed on the power system to answer what
happens if a contingency occurs in the presence of a cyber-attack that eventually reduces
the amount of load that responds to the contingency. By performing this analysis at the
second level of the attack tree, multiple cyber-attacks with the same impact are
abstracted.
A contingency is simulated in the IEEE 9-bus model in PowerWorld by removing a
certain amount (MW) of generation. For each contingency, the amount of load curtailed
(i.e. the load that responded to the DR request) was varied from 0 to 100%, where 100%
represents the required amount of load that should have responded to the contingency.
We made sure that the biggest simulated contingency does not cause the frequency to fall
below 59.1 Hz. If the frequency falls below 59.1 Hz, then other protection mechanisms
such as Under Frequency Load Shedding (UFLS) and Under Frequency Generator
Protection (UFGP) will intervene [30]. Those protection mechanisms are out of the scope
of this paper.
The frequency of the system was monitored after each run as shown in Figure 6.3.
A boundary for acceptable system performance can be seen in Figure 6.3 (blue) where
the frequency of the system stabilizes to its normal level. This figure also demonstrates
the coupling between the physical factor (load that responded to the DR request) and the
frequency of the system. The resilience of DR as a spinning reserve in the presence of
cyber-attacks is quantified by the system frequency at the end of the simulation. If the
frequency deviates from its nominal value (i.e. 60.0 Hz) at the end of the simulation, then
DR failed in its function as a spinning reserve and the smart grid may be unstable. Based
on the value of the frequency, the power system can be operating in one of three regions
(related to this use case): optimal operation region (59.97 – 60.03 Hz), continuous
operation region (59.50 – 59.97 Hz) and restricted operation region (59.10 – 59.50 Hz).
The results are discussed in more detail in Section 6.3.
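The mapping from final system frequency to operating region can be written as a small lookup. This is a sketch using the region bounds given above; the function name is hypothetical, and assigning a shared boundary value such as 59.97 Hz to the better region is an assumption, since the text does not say which region owns the boundary.

```python
# (region name, lower bound in Hz, upper bound in Hz), best region first
REGIONS = [
    ("optimal", 59.97, 60.03),
    ("continuous", 59.50, 59.97),
    ("restricted", 59.10, 59.50),
]


def classify_frequency(freq_hz):
    """Return the operating region for a final system frequency."""
    for name, low, high in REGIONS:
        if low <= freq_hz <= high:
            return name
    # Below 59.10 Hz other protection mechanisms (UFLS, UFGP) intervene;
    # frequencies above 60.03 Hz are outside this use case.
    return "outside the regions considered"
```

For instance, a final frequency of 59.86 Hz falls in the continuous operation region, so DR would not have fully restored the system.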
Figure 6.3: Frequency of the system after 100 seconds of a contingency (y-axis)
when varied load responds (x-axis) to the DR request. [Plot not reproduced. Annotated
regions: optimal operation region (59.97 – 60.03 Hz), continuous operation region
(59.50 – 59.97 Hz), restricted operation region (59.10 – 59.50 Hz).]
6.2.4 Analyze a bottom-up attack scenario
Based on the attack tree generated in the previous step, many attacks may
propagate from the cyber domain to the physical domain. Customers’ control devices are
usually susceptible to compromise, especially if they are connected to the Internet. If
those devices are compromised and configured to ignore DR requests, then load reduction
will be blocked when needed. The success of this attack path depends on the percentage
of compromised devices in the serviced area.
In this section, we analyze a DoS attack targeting the wireless router in each RF
mesh. If there is a DoS attack targeting the wireless router at the time of a DR event, then
DR commands may be blocked and, as a result, load curtailment will be blocked. In the
attack scenario, we assume that there is a 16 MW contingency (loss of generation, 5% of
the total generation in the system). DR is used as a spinning reserve to compensate for the
contingency. This means that on average, each RF mesh (of the 457) should curtail 35 kW.
One way of curtailing this amount in one RF mesh is through the distribution shown in
Table 6.2. Finding the optimal distribution to curtail this amount of load is out of the
scope of this paper.
Customer Type     Num. of Meters    Avg. AC Load (kW) per Customer    Avg. Load Curtailed (kW)
Commercial        457               3.50                              1599.5
Residential
  1 unit          457               3.50                              1599.5
  2 units         457               3.50                              1599.5
  3-5 units       914               1.44                              1316.16
  5-9 units       457               1.44                              658.08
  10-19 units     2285              1.44                              3290.4
  20+ units       8683              0.70                              6078.1
Totals            13710                                               16141.24
Table 6.2: DR load curtailment customer and load distribution for a 16 MW contingency.
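The arithmetic behind the 16 MW scenario can be checked directly from the Table 6.2 values. The sketch below recomputes the totals and the per-mesh figures; the variable names are illustrative, and the row values are copied from the table.

```python
# (customer type, number of meters, average AC load per customer in kW)
ROWS = [
    ("Commercial", 457, 3.50),
    ("Residential, 1 unit", 457, 3.50),
    ("Residential, 2 units", 457, 3.50),
    ("Residential, 3-5 units", 914, 1.44),
    ("Residential, 5-9 units", 457, 1.44),
    ("Residential, 10-19 units", 2285, 1.44),
    ("Residential, 20+ units", 8683, 0.70),
]
NUM_MESHES = 457
CONTINGENCY_KW = 16_000  # 16 MW loss of generation

total_meters = sum(n for _, n, _ in ROWS)           # meters asked to curtail
total_kw = sum(n * kw for _, n, kw in ROWS)         # load curtailed if all respond
meters_per_mesh = total_meters / NUM_MESHES         # participating meters per mesh
per_mesh_kw = total_kw / NUM_MESHES                 # curtailment per mesh
target_per_mesh_kw = CONTINGENCY_KW / NUM_MESHES    # required curtailment per mesh
```

The totals come to 13710 meters and about 16141 kW, that is, 30 participating meters and about 35.3 kW per RF mesh against a required average of about 35.0 kW.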
As a response to this contingency, the head end starts issuing DR commands to
the designated customers. In a normal case, air conditioners should receive those
commands through the smart meters and curtail the load. However, we assume that two
rogue nodes exist in each RF mesh launching a DoS attack at the wireless router by
simultaneously generating low bit-rate traffic from individual meters. Realistically, an
attacker can accomplish this attack using different means. For example, an attacker could
compromise smart meters in a certain RF mesh and reprogram them to increase the
frequency at which they send meter reads. Alternatively, an attacker could take control of
other customer devices such as the service gateway within a Home Area Network (HAN) to
send spurious traffic, creating a DoS attack.
Figure 6.4 shows the results of the attack overlaid on the sensitivity
analysis. In the wireless simulation, we capture which customers received the DR
commands and accordingly which customers curtailed the load (based on Table 6.2).
Because of the DoS attack, only 34% of the load was curtailed, which brings the frequency
to 59.86 Hz (continuous operation region). DR was therefore not able to stabilize the
system and bring the frequency back to the optimal operation region (59.97 – 60.03 Hz).
This means that extra actions, such as increasing generation or shedding load, should be
taken to bring the frequency back to the optimal operation region.
Figure 6.4: The impact of the DoS attack when DR responds to a 16 MW contingency.
On average, 34% of the load responded with error bars as shown in the figures. [Plot not
reproduced. Annotations: only 34% of load responded because of the attack; optimal
operation region for 16 MW contingency.]
6.3 Results Analysis
Based on the sensitivity analysis, the operating states of the underlying power
system can be divided into three regions (Figure 6.3):
1. Optimal operation region (59.97 – 60.03 Hz): This is the safe and desired
region of operation. In this region, the required amount of load responded to
bring the frequency to its normal level within the required time. By analyzing
this category we can identify the level of disturbance that the system can
tolerate. The disturbance in this case is the percentage of load that does not
respond to the DR request because of the attack. From Figure 6.3 we can put a
lower bound on the percentage of load that should respond to a contingency
for the system to be claimed resilient. This percentage varies based on the size
of the contingency. For example, 88.0% of the load should respond for a
15MW contingency in order to maintain the system in the optimal operation
region. This analysis identifies the conditions under which the system
achieves resilience. In other words, the system is resilient to a maximum of
12.0% disturbance in load response for a 15MW contingency.
2. Continuous operation region (59.50 – 59.97 Hz): While this region is still
safe, it is not the desired region of operation and requires frequency
correction. This means that the DR system did not curtail the required amount
of load within the required time to bring the frequency back to its normal
conditions. Additional actions should be taken to get to the optimal operation
region like increasing generation or load shedding. For example, if less than
88.0% of the required load responds to a 15MW contingency then the system
will be in the continuous operation region.
3. Restricted operation region (59.10 – 59.50 Hz): The system may remain in
this region for a restricted amount of time (based on steam turbine off-
frequency limits [30]). Being in this region means that the DR system failed to
achieve its goal which directly affects the resilience of the whole system (i.e.
the smart grid). For example, if only 5.0% of the required load responds to a
40MW contingency then the system will be in the restricted operation region.
After deploying DR at the time of a contingency, if the frequency is in the optimal
operation region then DR succeeded and the system is resilient. Otherwise, DR either
partially (continuous operation region) or fully (restricted operation region) fails. This
demonstrates the importance of anticipating the successful operation of DR as a spinning
reserve. The situational awareness component should handle this task which can save
precious time in the process of responding to a contingency.
Using our approach, we were able to quantify the resilience of DR when used as a
spinning reserve in two ways:
1. The stability level of the system, measured by system frequency (Hz), when
there is an attack on DR whose severity is measured by the percentage of load
that responds at the time of a contingency.
2. The attack level on DR that the system can tolerate at the time of a
contingency while staying in the optimal operation region, measured by the
percentage of load that responds.
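The second measure can be derived mechanically from a sensitivity curve for a given contingency size. The sketch below is one possible formulation, assuming the curve is supplied as pairs of (percentage of required load that responded, final frequency in Hz) and that frequency improves as more load responds; the function names are hypothetical.

```python
OPTIMAL_LOW_HZ, OPTIMAL_HIGH_HZ = 59.97, 60.03


def min_required_response(curve):
    """Smallest response percentage that keeps the system in the optimal region.

    curve: iterable of (percent_responded, final_frequency_hz) pairs.
    Returns None if no point on the curve reaches the optimal region.
    """
    ok = [pct for pct, freq in curve
          if OPTIMAL_LOW_HZ <= freq <= OPTIMAL_HIGH_HZ]
    return min(ok) if ok else None


def tolerated_disturbance(min_response_pct):
    """Largest share of load (in percent) that may fail to respond."""
    return 100.0 - min_response_pct
```

Using the value reported for the 15 MW contingency, `tolerated_disturbance(88.0)` returns 12.0, matching the 12.0% disturbance stated in the results.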
Grouping attacks that stimulate the same physical factors in the system helped
abstract attacks that share the same impact on the system (load manipulation in this case).
The function-based methodology is repeatable so it can be used to evaluate the resilience
of other functions in the smart grid. By decomposing the smart grid (which is a large-
scale and complex system) into functions, systematic resilience evaluation can be done
on the designated functions.
Chapter 7
Discussion
My thesis statement asks: Can we quantify the resilience of the given smart grid
system to failures that are caused by malicious sources using the function-based
evaluation methodology? In this chapter, we discuss how the function-based resilience
evaluation methodology answers the thesis statement. In addition, we discuss how it
contributes to what has already been done in the literature. Finally, we discuss the known
limitations of the function-based methodology.
7.1 Thesis Discussion
In order to evaluate resilience of smart grids, we first had to define what
resilience means. However, there is an extensive amount of work done in the literature to
define resilience. We adopted a definition that is influenced by the definition of resilience
given by Laprie [15] and the definition of dependability given by Avizienis et al. [20]
because it contains all the required elements within a smart grid context (Section 1.2).
This definition states that resilience is the persistent ability of the smart grid to avoid
service failures that are more frequent and more severe than is acceptable when facing
changes in the environment, and to recover from failures whenever they occur. We are
particularly interested in changes in the environment that are caused by cyber-physical
attacks.
Second, we had to evaluate the resilience of the smart grid. However, the smart
grid is a large-scale, heterogeneous and complex system-of-systems so it is not feasible to
model the system and measure resilience of the whole system at once. To simplify and
abstract the evaluation process, we introduced a function-based methodology to evaluate
the resilience of the smart grid. Using this methodology, the smart grid can be
decomposed into functions and each function can be evaluated separately. This is
captured in the first two (modeling) steps of the methodology. Based on the simplification
and abstraction done in the first two steps, the experimental evaluation is carried out in the
last two steps.
In the modeling steps of the methodology, we focused on modeling the function
under study and other functions and components on which it depends. This means that
unnecessary components can be neglected or removed from the model. For example, in
the load drop attack through the remote disconnect function, we modeled the AMI
network from the head end to the meters at the customer’s side along with an abstracted power
model (IEEE 9-bus model). Using those abstracted models of the cyber and physical parts
of the system, we were able to perform the evaluation.
The cyber part of the system was simulated in ns-2 whereas the physical part of
the system was simulated in PowerWorld. In order to have semi-realistic models of the
systems we built, the following was configured in the models:
1. Network simulation (ns-2): At the customer side, we modeled an RF mesh of
smart meters which is based on vendor’s designs of the neighborhood area.
The topological distribution of meters was influenced by the coordinates of a
real geographical area. Meters were configured with parameters derived from
specifications of commercial smart meters (more details are available in
Sections 4.1, 5.1 and 6.1).
2. Power simulation (PowerWorld): We used the IEEE 9-bus model to simulate
the power side of the system. This model is widely used in the literature for
simulation purposes. We used WECC approved models for the exciter,
governor and load (more details available in Sections 4.1 and 6.1).
3. Neighborhood model: Customer types, percentages, distribution in the
neighborhood model and load ratings are assigned based on: 1) an analytical
model of an example neighborhood based on census data for the area with ZIP code
90057; 2) average numbers for the entire Los Angeles Department of Water
and Power (LADWP) service area as reported to the U.S. Energy Information
Administration; and 3) real air conditioning load values for each customer
type (more details available in Sections 4.1 and 6.1).
Third, we had to quantify the resilience of the function under study. To do that,
we introduced a combination of two existing ways to measure resilience at the same time
(Section 1.3). Those two measures are the measures of dependability or performance of
the function in the presence of attack and the measures of the amount of disturbance that
a function can tolerate while still being considered resilient. Combining those two measures is done
by correlating the main factors that cause a function failure with the failure aspects of
that function (i.e. dependability or performance measures). Those main factors can be
either physical as in the first and third use cases or cyber as in the second use case.
Table 7.1 shows the function-based metrics used to demonstrate the resilience in
the three use cases presented in this dissertation. When evaluating other functions, their
own function-based metrics should be derived. The use cases introduced in Chapters 4, 5
and 6 demonstrate the correlation between the main factors and the function failures of
the system. Using this correlation, a boundary between acceptable function dependability
or performance and function failure in the presence of the attack can be drawn. This
boundary identifies the maximum level of disturbance caused by the attack to which the
system is resilient. In other words, this boundary identifies the conditions under which
the system achieves resilience. As a result, the system should be designed taking those
levels into consideration. For example, as demonstrated in Section 6.3, if there is a
15MW contingency then the smart grid can tolerate a maximum of 12% of the required
load not responding to the DR event. Proper policies and design choices should be made
to guarantee operating within this boundary. In addition, given a certain level of
disturbance because of an attack, we can identify the expected dependability of the
system using the function-based methodology. For example, again as demonstrated in
Section 6.3, we can identify the frequency (dependability) of the system if only 5% of the
required load responds to a contingency (e.g. with a 40MW contingency the system will be
in the restricted operation region).

| Resilience measure | Load Drop Attack | Communication Architecture | DR as Spinning Reserve |
|---|---|---|---|
| Measure of dependability or performance of the function in the presence of cyber-attack | System stability measured through: 1. Frequency (Hz); 2. Generator status (e.g. shutdown because of the attack) | 1. Remote metering data received by the utility (PDR); 2. Acknowledged DR commands (percentage of total sent); 3. Average delay of DR commands | System stability measured through: Frequency (Hz) |
| Measure of the amount of disturbance a function can tolerate | 1. Amount of load dropped (MW); 2. Time over which load is dropped (seconds) | 1. Percentage of compromised devices launching DoS attack; 2. DoS attack rate per compromised device (Kbps) | Percentage of load that actually responds to a DR event (i.e. received DR request) |

Table 7.1: Resilience quantification metrics for each use case scenario.
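The idea of drawing a boundary from the correlation between a main factor and a dependability measure can be illustrated with a short Python sketch. It is only a sketch under stated assumptions: the function name is hypothetical, the sweep values below are made-up placeholders and not results from Section 6.3, the 59.97 Hz threshold is assumed to be the lower edge of the optimal operation region, and the frequency is assumed to degrade monotonically as more load fails to respond.

```python
from typing import Optional, Sequence, Tuple

# Assumed lower edge of the optimal operation region (Hz).
OPTIMAL_LOWER_HZ = 59.97


def tolerance_boundary(
    sweep: Sequence[Tuple[float, float]],
    min_ok_hz: float = OPTIMAL_LOWER_HZ,
) -> Optional[float]:
    """Return the largest disturbance level that still keeps the function acceptable.

    sweep holds (percent_of_load_not_responding, settled_frequency_hz) pairs
    from a sensitivity analysis. The scan stops at the first unacceptable
    point, which assumes frequency gets worse as the disturbance grows.
    """
    tolerated = None
    for percent_not_responding, frequency_hz in sorted(sweep):
        if frequency_hz < min_ok_hz:
            break
        tolerated = percent_not_responding
    return tolerated


# Placeholder data for illustration only; not taken from the experiments.
HYPOTHETICAL_SWEEP = [
    (0.0, 60.00),
    (5.0, 59.99),
    (10.0, 59.98),
    (15.0, 59.95),
    (20.0, 59.90),
]

if __name__ == "__main__":
    print(tolerance_boundary(HYPOTHETICAL_SWEEP))
```

The returned value plays the role of the measure of tolerable disturbance, while the frequency threshold plays the role of the dependability measure; a result of None would mean that even the smallest tested disturbance causes a function failure.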
Fourth, the results that we got from the evaluation process can be used to build
security policies that maintain the resilience of the smart grid to disturbances caused by
cyber-physical threats. They can also be used to influence the design of the system so that
resilience is taken into consideration during the design process and not added on later in
an ad hoc fashion.
7.2 Related Work Comparison
One challenge that we faced while doing this work is how to assess this
methodology and compare it with other resilience evaluation methodologies in the
literature. One of the reasons for this problem is the way resilience is perceived by
researchers when evaluating certain systems (specifically critical infrastructure). In the
cyber-physical security domain, resilience is usually perceived as a goal of risk
management. On the other hand, researchers in the environmental hazards and
socio-technical systems domains rely on statistical data and stochastic models to evaluate
the resilience of smart grids. In the following two subsections, we discuss those two points.
7.2.1 Cyber-physical Security Domain
Researchers in this discipline rely on risk assessment methodologies to evaluate
resilience, which is considered the goal of risk management; that is, risk management
enhances the resilience of the system under study [26]. By definition, risk is the
likelihood of an event multiplied by the potential impact of that event. In the cyber-
security domain this is usually referred to by:
. While this type of assessment covers likely risks (because of the vulnerability
assessment step), it marginalizes unlikely risks (that are still possible) and does not cover
unknown risks. In addition, while more systematic approaches are being developed [79] in
this domain, most of the work has been done in an ad hoc fashion [59].
We presented a different approach. First, we evaluated resilience (not risk) which
is the persistent ability of the system to avoid severe or frequent failures in the presence
of attacks. Second, we addressed the unlikely or unknown risks problem through the
sensitivity analysis step of the methodology. In this step, the sensitivity analysis is
performed using the main factors (e.g. physical factors) which are stimulated in the
second level of the attack tree. This means that the consequences of multiple attack paths
in the third level of the attack tree are abstracted at the second level of the attack tree. By
performing this type of analysis, the question of “what if those physical factors are
stimulated?” is answered regardless of the cause (i.e. attack). Third, we presented a
systematic approach that is repeatable and reproducible as demonstrated by evaluating
the three use cases.
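The abstraction of several attack paths into the main factor they stimulate can be sketched as a simple grouping step. The Python fragment below is illustrative only: the data structure and function names are hypothetical, and apart from the first entry (remote disconnect manipulation causing a sudden load drop, as described for the load drop attack) the attack paths are invented examples rather than paths taken from the attack trees in this dissertation.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (attack path at the third level, main factor stimulated at the second level)
ATTACK_PATHS: List[Tuple[str, str]] = [
    ("manipulated remote disconnect commands", "sudden load drop"),
    ("compromised head end issuing bulk disconnects", "sudden load drop"),  # hypothetical
    ("DoS on the DR command channel", "load not responding to DR"),  # hypothetical
]


def group_by_main_factor(paths: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group third-level attack paths under the second-level factor they stimulate."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for attack, factor in paths:
        grouped[factor].append(attack)
    return dict(grouped)


def factors_to_analyze(paths: List[Tuple[str, str]]) -> List[str]:
    """The sensitivity analysis iterates over factors, not over individual attacks."""
    return sorted({factor for _, factor in paths})


if __name__ == "__main__":
    for factor, attacks in group_by_main_factor(ATTACK_PATHS).items():
        print(f"{factor}: abstracts {len(attacks)} attack path(s)")
```

Because the analysis is keyed on the factor, an attack path that is unknown today but stimulates the same factor is already covered by the existing results.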
7.2.2 Environmental Hazards and Socio-technical Systems
More systematic approaches have been proposed in the environmental
hazards/socio-technical systems discipline to evaluate the resilience of smart grids (and
critical infrastructures in general). Resilience is evaluated in this discipline for events like
natural disasters (e.g. earthquakes and hurricanes), component failure and human
vandalism [45] [46] [48] [49]. Because of the nature of the events in this field of research,
probabilistic approaches (statistical and stochastic) are used and generalized to do the
evaluation. The main problem with this type of analysis is that failure probability models
are mainly designed based on statistical data for physical components in the system (e.g.
transformers and generators in the presence of an earthquake) or stochastic models of
failures for those components. This requires estimates of the probabilities of failures
because of these events in the system which is a difficult task when it comes to cyber-
physical threats [51].
There has been an attempt to use the same probabilistic approaches to analyze
smart grids under cyber-attacks in both the cyber-physical security domain and the
environmental hazards/socio-technical systems domain [47] [32]. However, using the
same method to estimate the probability of cyber-attacks (that cause failures) may not be
appropriate because: first, it is hard to represent cyber-attacks using probabilistic
methods similar to the ones used to model failures because of earthquakes (e.g. what is
the probability of a zero-day attack?). Second, these methods do not capture the behavior
of the attacker (attack scenario) which results in unrealistic modeling and impact analysis
of the attack. For example, assigning a random variable to represent the mean time to
attack that will cause a failure of a single power component like a generator neglects the
attack scenario and leads to unrealistic impact analysis.
In our approach, we study the impact on the system when certain cyber or
physical factors (main factors) are stimulated. This is done regardless of the probability
of the attack that stimulated those factors. By connecting the events that lead to the function
failure, we demonstrate a realistic chain of events that may lead to function failure. In
the last step of the methodology, we demonstrate how the main factors are stimulated in
at least one scenario. For example, in the load drop attack we do not assume that a
generator may trip with a certain probability. We demonstrated that if certain physical
factors (sudden load drop) are stimulated then the generator will trip. Then we
demonstrated that a sudden load drop may occur because of manipulating remote
disconnect commands. Those two steps make our evaluation realistic because the
dynamics of the physical system are modeled (in this case the load that dropped).
7.3 Limitations
There are three main limitations of the function-based methodology introduced in
this dissertation. Following is a discussion of those limitations:
1. Interdependencies: Interdependencies exist when there is a mutual
dependency between two functions, components or systems. In this
dissertation, we discussed how to evaluate resilience by analyzing
dependencies. However, in a complex system like the smart grid,
interdependencies may also exist. For example, there are interdependencies
between the cyber side and the power side of the system. While the power
side depends on the cyber side for controllability, the cyber side depends on
the power side for operating (electricity). An attack propagating from the
cyber side to the power side may have additional consequences on the cyber
side (i.e. cyber-physical-cyber-attack [6]). We believe that the function-based
methodology can be extended to handle interdependencies in the future.
2. Identifying the main factors: Identifying the main factors (physical or
cyber) is a key point in performing the sensitivity analysis of the
methodology. If there are too many factors then the sensitivity analysis
becomes difficult.
3. Generalizing the results: while modeling the system for the three use cases,
we configured the simulated models with parameters to reflect semi-realistic
scenarios. While the methodology still applies, the results we got do not
directly apply to real-world scenarios. To be able to generalize the results,
sophisticated models of the system should be used. On the other hand, those
sophisticated models may not be accessible to the public and the results
gained from them may not be publishable.
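One way to see why the second limitation above matters is to count the simulation runs a sensitivity analysis would need. The sketch below assumes a full-factorial sweep, which is an assumption made here for illustration (the sweep design is not prescribed by the methodology). The factor names echo the load drop metrics in Table 7.1, but the level values and the third factor are hypothetical.

```python
from math import prod
from typing import Dict, Sequence


def full_factorial_runs(levels: Dict[str, Sequence[float]]) -> int:
    """Number of simulation runs if every combination of factor levels is tested."""
    return prod(len(values) for values in levels.values())


# Hypothetical level values for the two load drop factors.
two_factors = {
    "load_dropped_mw": [10, 20, 30],
    "drop_duration_s": [1, 5, 10],
}

# Adding one more (hypothetical) factor multiplies the number of runs.
three_factors = {**two_factors, "extra_factor": [0.1, 0.5, 0.9]}

if __name__ == "__main__":
    print(full_factorial_runs(two_factors))
    print(full_factorial_runs(three_factors))
```

Under this assumption the run count grows multiplicatively with each added factor, which is why keeping the set of main factors small keeps the analysis tractable.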
Chapter 8
Conclusion and Future Work
In this chapter, we conclude the dissertation by first introducing a summary of the
work that has been done. Then, the main contributions of this dissertation are listed.
Finally, directions for future work are presented.
8.1 Summary
The smart grid is a large-scale, complex and heterogeneous system that is
susceptible to failures that are caused by malicious and non-malicious sources. One of the
main required characteristics of the smart grid is to operate resiliently to attacks and other
disturbances. In this dissertation, first, we surveyed the work that has already been done
in the literature to evaluate the resilience of the smart grid to attacks and faults.
Researchers in the environmental hazards domain used stochastic and statistical methods
to evaluate smart grid resilience. However, those techniques do not always apply when
evaluating resilience in the presence of malicious or non-malicious sources. On the other
hand, researchers in the cyber-security domain evaluated resilience of smart grids in an
ad hoc fashion or relied on risk assessment methodologies to do the evaluation.
In this dissertation, we introduced a comprehensive function-based evaluation
methodology to quantify the resilience of the smart grid. The evaluation process was
quantified by identifying the conditions under which resilience can be achieved in the
presence of attacks on the system. The effectiveness of this methodology in evaluating
the resilience of this large-scale, complex and heterogeneous system (i.e. smart grid) was
demonstrated using three use cases.
First, we introduced the load drop attack (affecting the power delivery function)
which is considered a cyber-physical attack in which the attack propagates from the cyber
side to the physical side of the system. The second use case is an attack on the
communication architecture. This attack can be considered both a cyber-cyber-attack
(interrupting billing data) and a cyber-physical attack (interrupting the load management
service) [6]. Third, we presented a cyber-physical attack on the DR system when it is
used as a spinning reserve. While demonstrating these three use cases, detailed models
were built using well known simulation tools like ns-2 and PowerWorld to represent and
evaluate the system.
8.2 Contributions
The main contributions of this work can be summarized in the following points:
1. A function-based methodology to evaluate the resilience of the smart grid in the
presence of attacks (Chapter 3): This methodology consists of a modeling part
and an experimental evaluation part. By applying this methodology to a specific
function in the smart grid, we can model this function and evaluate its
resilience. Using this methodology, the evaluation process is quantified by
identifying the conditions (i.e. disturbance levels caused by the attack) under
which resilience can still be achieved in the system. This is demonstrated by a
boundary between acceptable system dependability and failure. Acceptable
function dependability is distinguished from function failure by a function
specific measurement.
2. Three novel use cases (Chapters 4, 5 and 6): Three smart grid related use
cases were evaluated using the function-based methodology. First, the power
delivery functionality was evaluated when there is a load drop attack. Second,
the smart metering and demand response functionalities were evaluated when
there is a DoS attack on the communication architecture. Finally, the demand
response as spinning reserve functionality was evaluated when there is a
cyber-physical attack on the system. A boundary of acceptable system dependability
was created for the three use cases and resilience was successfully evaluated.
3. Detailed models of AMI and DR (Chapters 4, 5 and 6): Using ns-2 (network
simulator) and PowerWorld (power simulator), we built detailed models of the
AMI and DR. This includes the cyber and physical sides of the system. The
models incorporate parameters from the modeled regions, devices, protocols,
customer loads, customer types and customer distribution to reflect semi-
realistic simulations.
8.3 Future Work
Our primary focus in this work was on evaluating resilience of the smart grid in
the presence of cyber-physical threats. Beside the evaluated use cases, other scenarios
involving other smart grid subsystems can also be investigated. For example, the
resilience of the SCADA system, which by itself can be considered a cyber-physical
system, to cyber-physical threats can be evaluated.
While the use cases presented in this dissertation are smart grid related, we
believe that the same methodology can be also used to evaluate resilience of other cyber-
physical systems like water and gas cyber-physical systems. In addition, while the
primary sources of failures in this work are malicious activities, we believe that it can
also be extended to evaluate resilience in the presence of non-malicious activities. This
can be done by extending the analysis of dependencies of functions in the system to
include how non-malicious function or component faults propagate to the function
under study. Future work may also involve adapting the function-based methodology to
cover interdependencies in the system.
Based on the evaluation process, the main factors that affect resilience of a certain
function can be determined. The limits to which the system can withstand a variation in
those factors can also be determined. Knowing this type of information can help in
deriving security policies and designing security components that govern the behavior of
the system to keep it in a secure state. Based on our preliminary results, we propose a
governor component that can be defined as: a component that serves to protect the smart
grid from failures that are more severe and frequent than is acceptable by enforcing
secure policies on the actions of higher-level functions.
8.4 Concluding Remarks
In this dissertation, I introduced a function-based methodology that can be used to
evaluate the resilience of smart grids in the presence of attacks. The usefulness of this
methodology was demonstrated by applying it to three use cases. Detailed models of the
systems under evaluation were built using simulation tools. The evaluation methodology
is comprehensive and repeatable so it can be applied to other use cases and cyber-
physical systems. The results of the evaluation process can be used to derive security
policies and design security components that govern the behavior of the system and keep
it resilient.
References
[1] A. Faruqui, D. Mitarotonda, L. Wood, A. Cooper and J. Schwartz, "The Costs and
Benefits of Smart Meters for Residential Customers," The Edison Foundation,
Institute for Electric Efficiency, 2011.
[2] H. Khurana, R. Bobba, T. Yardley, P. Agarwal and E. Heine, "Design principles for
power grid cyber-infrastructure authentication protocols," in 2010 43rd Hawaii
International Conference on System Sciences (HICSS), 2011.
[3] A. R. Metke and R. L. Ekl, "Smart grid security technology," in Innovative Smart
Grid Technologies (ISGT), 2010.
[4] National Institute of Standards and Technology (NIST), "U.S., Europe Collaborating
on Smart Grid Standards Development," 13 September 2011. [Online]. Available:
http://www.nist.gov/smartgrid/grid-091311.cfm.
[5] "Smart Grid System Report," U.S. Department of Energy, 2009.
[6] C. Neuman and K. Tan, "Mediating cyber and physical threat propagation in secure
smart grid architectures," in Proceedings of the 2nd International Conference on
Smart Grid Communications (IEEE SmartGridComm), Brussels, 2011.
[7] National Energy Technology Laboratory (NETL), "A Vision For The Modern Grid,"
U.S. Department of Energy, 2007.
[8] U.S.-Canada Power System Outage Task Force, "Final Report on the August 14,
2003 Blackout in the United States and Canada: Causes and Recommendations,"
2004.
[9] N. Falliere, L. O. Murchu and E. Chien, "W32.Stuxnet Dossier," February 2011.
[Online]. Available:
http://www.symantec.com/content/en/us/enterprise/media/security_response/whitepapers/w32_stuxnet_dossier.pdf.
[Accessed September 2013].
[10] J. Meserve, "Sources: Staged cyber attack reveals vulnerability in power grid,"
September 2007. [Online]. Available:
http://edition.cnn.com/2007/US/09/26/power.at.risk/index.html. [Accessed
September 2013].
[11] "2010 Smart Grid System Report, Report to Congress February 2012," U.S. ,
Washington, DC, 2012.
[12] L. Strigini, "Fault tolerance and resilience: meanings, measures and assessment," in
Resilience Assessment and Evaluation of Computing Systems, Springer Berlin
Heidelberg, 2012, pp. 3-24.
[13] M. Bruneau, S. E. Chang, R. T. Eguchi, G. C. Lee, T. D. O’Rourke, A. M. Reinhorn,
M. Shinozuka, K. Tierney, W. A. Wallace and D. von Winterfeldt, "A framework to
quantitatively assess and enhance the seismic resilience of communities," in
Earthquake Spectra 19, 2003.
[14] C. S. Holling, "Resilience and stability of ecological systems," in Annual review of
ecology and systematics 4, 1973.
[15] J.-C. Laprie, "From dependability to resilience," in 38th IEEE/IFIP Int. Conf. On
Dependable Systems and Networks, 2008.
[16] J. Plodinec, "Definitions of resilience: An analysis," Oak Ridge: Community and
Regional Resilience Institute (CARRI), 2009.
[17] A. Rose, "Economic resilience to natural and man-made disasters: Multidisciplinary
origins and contextual dimensions," Environmental Hazards, vol. 7, no. 4, pp. 383-
398, 2007.
[18] K. Tierney and M. Bruneau, "Conceptualizing and measuring resilience: a key to
disaster loss reduction," TR news, no. 250, 2007.
[19] A. Cox, F. Prager and A. Rose, "Transportation security and the role of resilience: A
foundation for operational metrics," in Transport Policy 18, 2011.
[20] A. Avizienis, J.-C. Laprie, B. Randell and C. Landwehr, "Basic concepts and
taxonomy of dependable and secure computing," Dependable and Secure
Computing, IEEE Transactions on, vol. 1, no. 1, pp. 11 - 33, 2004.
[21] S. Sridhar and G. Manimaran, "Data integrity attacks and their impacts on SCADA
control system," in Power and Energy Society General Meeting, 2010 IEEE,
Minneapolis, MN, 2010.
[22] M. Vieira, H. Madeira, K. Sachs and S. Kounev, "Resilience Benchmarking," in
Resilience Assessment and Evaluation of Computing Systems, Springer Berlin
Heidelberg, 2012, pp. 283-301.
[23] A. AlMajali, E. Rice, A. Viswanathan, K. Tan and C. Neuman, "A Systems
Approach to Analysing Cyber-Physical Threats in the Smart Grid.," in IEEE Smart
Grid Communications (SmartGridComm), 2013.
[24] A. A. Cárdenas, S. Amin, Z.-S. Lin, Y.-L. Huang, C.-Y. Huang and S. Sastry,
"Attacks against process control systems: risk assessment, detection, and response,"
in Proceedings of the 6th ACM Symposium on Information, Computer and
Communications Security, 2011.
[25] S. Sridhar, A. Hahn and M. Govindarasu, "Cyber attack-resilient control for smart
grid," in Innovative Smart Grid Technologies (ISGT), 2012 IEEE PES, 2012.
[26] S. Sridhar, A. Hahn and M. Govindarasu, "Cyber-physical system security for the
electric power grid," Proceedings of the IEEE, vol. 100, no. 1, pp. 210-224, 2012.
[27] Y.-L. Huang, A. A. Cárdenas, S. Amin, Z.-S. Lin, H.-Y. Tsai and S. Sastry,
"Understanding the physical and economic consequences of attacks on control
systems," International Journal of Critical Infrastructure Protection, vol. 2, no. 3,
pp. 73-83, 2009.
[28] S. Sridhar and G. Manimaran, "Data integrity attack and its impacts on voltage
control loop in power grid," in Power and Energy Society General Meeting, 2011
IEEE, San Diego, CA, 2011.
[29] Y. Mo, T.-J. Kim, K. Brancik, D. Dickinson, H. Lee, A. Perrig and B. Sinopoli,
"Cyber-physical security of a smart grid infrastructure," Proceedings of the IEEE,
vol. 100, no. 1, pp. 195 - 209, 2012.
[30] J.-C. Laprie, K. Kanoun and M. Kaâniche, "Modelling Interdependencies Between
the Electricity and Information Infrastructures," in Computer Safety, Reliability, and
Security, Springer Berlin Heidelberg, 2007, pp. 54-67.
[31] M. Masera and I. N. Fovino, "A service-oriented approach for assessing
infrastructure security," in Critical Infrastructure Protection, Springer, 2007, pp.
367-379.
[32] J. Stamp, A. McIntyre and B. Ricardson, "Reliability impacts from cyber attack on
electric power systems," in Power Systems Conference and Exposition, 2009. PSCE
'09. IEEE/PES, Seattle, WA, 2009.
[33] R. Billinton and L. Wen Yuan, Reliability assessment of electric power systems
using Monte Carlo methods, Springer, 1994.
[34] T. Kailath and H. V. Poor, "Detection of stochastic processes," Information Theory,
IEEE Transactions on , vol. 44, no. 6, pp. 2230-2231, 1998.
[35] S. Liu, X. Feng, D. Kundur, T. Zourntos and K. Butler-Purry, "Switched system
models for coordinated cyber-physical attack construction and simulation," in Smart
Grid Modeling and Simulation (SGMS), 2011 IEEE First International Workshop
on, Brussels, 2011.
[36] P. W. Sauer and M. A. Pai, Power System Dynamics and Stability, Stipes Publishing
Co, 2007.
[37] D. M. Nicol, C. M. Davis and T. Overbye, "A testbed for power system security
evaluation," International Journal of Information and Computer Security, vol. 3, no.
2, pp. 114-131, October 2009.
[38] Power World Corporation , "PowerWorld," [Online]. Available:
http://www.powerworld.com/.
[39] D. C. Bergman, D. Jin, D. M. Nicol and T. Yardley, "The virtual power system
testbed and inter-testbed integration," in Proceedings of the 2nd conference on
Cyber security experimentation and test (CSET'09), Montreal, Canada, 2009.
[40] T. Benzel, R. Braden, D. Kim, C. Neuman, A. Joseph, K. Sklower, R. Ostrenga and
S. Schwab, "Experience with deter: a testbed for security research," in Testbeds and
Research Infrastructures for the Development of Networks and Communities, 2006.
TRIDENTCOM 2006. 2nd International Conference on, Barcelona, 2006.
[41] M. J. McDonald, G. N. Conrad, T. C. Service and R. H. Cassidy, "Cyber Effects
Analysis Using VCSE," Sandia National Laboratories, Albuquerque, New Mexico,
2008.
[42] A. Ashok, A. Hahn and M. Govindarasu, "A Cyber-physical Security Testbed for
Smart Grid: System Architecture and Studies," in Proceedings of the Seventh
Annual Workshop on Cyber Security and Information Intelligence Research
(CSIIRW '11), Oak Ridge, Tennessee, 2011.
[43] "National SCADA Test Bed," [Online]. Available:
http://energy.gov/oe/national-scada-test-bed.
[44] "Network Simulator - 2 (ns-2)," [Online]. Available: http://www.isi.edu/nsnam/ns/.
[45] M. Shinozuka, S. E. Chang, T.-C. Cheng, M. Feng, T. D. O’Rourke, M. A.
Saadeghvaziri, X. Dong, X. Jin, Y. Wang and P. Shi, "Resilience of integrated
power and water systems," Multidisciplinary Center for Earthquake Engineering
Research, 2003.
[46] D. A. Reed, K. C. Kapur and R. D. Christie, "Methodology for assessing the
resilience of networked infrastructure," Systems Journal, IEEE, vol. 3, no. 2, pp.
174-180, 2009.
[47] S. Chiaradonna, F. Di Giandomenico and P. Lollini, "Case Study on Critical
Infrastructures: Assessment of Electric Power Systems," in Resilience Assessment
and Evaluation of Computing Systems, Springer Berlin Heidelberg, 2012, pp. 365-
390.
[48] M. Ouyang and L. Dueñas-Osorio, "Resilience Modeling and Simulation of Smart
Grids," in Structures Congress 2011, 2011.
[49] M. Ouyang and L. Dueñas-Osorio, "Time-dependent resilience assessment and
improvement of urban infrastructure systems," Chaos: An Interdisciplinary Journal
of Nonlinear Science, vol. 22, no. 3, 2012.
[50] E. Zio and W. Kroger, "Vulnerability assessment of critical infrastructures," IEEE
Reliability Society 2009 Annual Technology Report, 2009.
[51] G. Dondossola, L. O. and A. Torkilseng, "Key issues and related methodologies in
the security risk analysis and evaluation of electric power control systems," Cigré
Session 2006, 2006.
[52] "Keeping the Country Running: Natural Hazards and Infrastructure," Cabinet
Office, London, 2011.
[53] Australian Government, "Critical Infrastructure Resilience Strategy," 2010.
[54] M. Suter, "Focal Report 7: CIP Resilience and Risk Management in Critical
Infrastructure Protection Policy," Risk and Resilience Research Group, Center for
Security Studies (CSS), ETH Zürich, 2011.
[55] T. Mitchell and K. Harris, "Resilience: A risk management approach," ODI
background note, 2012.
[56] National Institute of Standards and Technology (NIST), "Guide for Conducting Risk
Assessments," U.S. Department of Commerce, Gaithersburg, MD, 2012.
[57] A. Avritzer, F. Di Giandomenico, A. Remke and M. Riedl, "Assessing
Dependability and Resilience in Critical Infrastructures," in Resilience Assessment
and Evaluation of Computing Systems, Springer Berlin Heidelberg, 2012, pp. 41-63.
[58] M. Masera, "Interdependencies and Security Assessment: a Dependability view," in
2006 IEEE International Conference on Systems, Man and Cybernetics, 2006,
Taipei, 2006.
[59] A. AlMajali, A. Viswanathan and C. Neuman, "Analyzing resiliency of the smart
grid communication architectures under cyber attack," in 5th Workshop on Cyber
Security Experimentation and Test, 2012.
[60] Itron Whitepaper, Itron Publication, "OpenWay® Security Overview," 2011.
[Online]. Available:
https://www.itron.com/na/publishedcontent/wp_openwaysecurityoverview.pdf.
[61] Silver Spring Networks, WHITEPAPER, "Smart Grid Standards," 2012. [Online].
Available: http://www.silverspringnet.com/pdfs/whitepapers/SilverSpring-Whitepaper-SmartGridStandards.pdf.
[62] C. Perkins, E. Belding-Royer and S. Das, "Ad hoc On-Demand Distance Vector
(AODV) Routing," July 2003. [Online]. Available:
http://www.ietf.org/rfc/rfc3561.txt.
[63] Silver Spring Networks, "Communications Module for Electricity Meters
(datasheet)," [Online]. Available: http://www.silverspringnet.com/pdfs/SilverSpring-Datasheet-Communications-Modules.pdf.
[64] B. Lichtensteiger, B. Bjelajac, C. Müller and C. Wietfeld, "RF Mesh Systems for
Smart Metering: System Architecture and Performance," in Smart Grid
Communications (SmartGridComm), 2010 First IEEE International Conference on,
Gaithersburg, MD, 2010.
[65] U.S. Energy Information Administration, "Electric Sales, Revenue, and Average
Price," [Online]. Available: http://www.eia.gov/electricity/sales_revenue_price/.
[66] GE Digital Energy, "Grid IQ™ AMI P2MP: Grid Connectivity for Smart Metering,
Distribution Monitoring and Sensing Applications," 28 January 2013. [Online].
Available:
http://www.gedigitalenergy.com/products/brochures/SmartMetering/GridIQ_P2MP.pdf.
[67] Itron Specification Sheet, Itron Publication 100808SP-05-08/11, "OpenWay®
Centron Meter," 2011. [Online]. Available:
https://www.itron.com/na/PublishedContent/OpenWay%20Centron%20Meter.pdf.
[68] WECC, "WECC Approved Dynamic Model Library," 2011.
[69] North American Electric Reliability Corporation (NERC) Technical Document,
"Balancing and Frequency Control," 2011.
[70] Engage Consulting Limited for Energy Networks Association (ENA), "High-level
Smart Meter Data Traffic Analysis," May 2010. [Online]. Available:
http://www.energynetworks.org/modx/assets/files/electricity/futures/smart_meters/ENA-CR008-001-1%204%20_Data%20Traffic%20Analysis_.pdf. [Accessed 2013].
[71] Intl. Electrotechnical Commission, "IEC 61968-9 (ed1.0): Interfaces for Meter
Reading and Control," September 2009. [Online]. Available:
http://www.iec.ch/smartgrid/standards/.
[72] G. Liu and C. Ji, "Scalability of Network-failure Resilience: Analysis Using Multi-
layer Probabilistic Graphical Models," Networking, IEEE/ACM Transactions on,
vol. 17, no. 1, pp. 319–331, February 2009.
[73] P. Cholda, J. Tapolcai, T. Cinkler, K. Wajda and A. Jajszczyk, "Quality of
Resilience as a Network Reliability Characterization Tool," IEEE Network, vol. 23,
no. 2, pp. 11–19, 2009.
[74] W. Najjar and J.-L. Gaudiot, "Network Resilience: A Measure of Network Fault
Tolerance," Computers, IEEE Transactions, vol. 39, no. 2, pp. 174-181, 1990.
[75] K.-W. Lee, S. Chari, A. Shaikh, S. Sahu and P.-C. Cheng, "Improving the Resilience
of Content Distribution Networks to Large Scale Distributed Denial of Service
Attacks," Computer Networks, vol. 51, no. 10, pp. 2753–2770, July 2007.
[76] D. Johnson, Y. Hu and D. Maltz, "RFC 4728: The Dynamic Source Routing
Protocol (DSR) for Mobile Ad Hoc Networks for IPv4," 2007.
[77] C. E. Perkins and P. Bhagwat, "Highly Dynamic Destination-Sequenced Distance-
Vector Routing (DSDV) For Mobile Computers," SIGCOMM '94 Proceedings of the
conference on Communications architectures, protocols and applications, vol. 24,
no. 4, pp. 234–244, October 1994.
[78] D. Wang, Z. Tao, J. Zhang and A. Abouzeid, "RPL Based Routing for Advanced
Metering Infrastructure in Smart Grid," in 2010 IEEE International Conference on
Communications Workshops (ICC), 2010.
[79] D. J. Bodeau, R. D. Graubart and E. R. Laderman, "Cyber Resiliency Engineering,
Overview of the Architectural Assessment Process," in Conference on Systems
Engineering Research (CSER 2014), Redondo Beach, 2014.
[80] Electric Power Research Institute, "EPRI Power System Dynamics Tutorial," 2009.
[81] Oak Ridge National Laboratory, "Demand Response For Power System Reliability:
FAQ," 2006.
[82] J. N.-H. J. Eto, E. Parker, C. Bernier, P. Young, D. Sheehan, J. Kueck and B. Kirby,
"The Demand Response Spinning Reserve Demonstration--Measuring the Speed
and Magnitude of Aggregated Demand Response," in 2012 45th Hawaii
International Conference on System Science (HICSS), Maui, HI, 2012.
[83] Ernest Orlando Lawrence Berkeley National Laboratory, "Demand Response
Spinning Reserve Demonstration," 2007.
[84] Ernest Orlando Lawrence Berkeley National Laboratory, "Demand Response
Spinning Reserve Demonstration – Phase 2 Findings from the Summer," 2009.
Abstract
Utilizing communication, control and computation technologies in the modern smart grid can enhance its reliability, reduce electricity costs and provide new real-time customer services. While these technologies can benefit customers and utilities, they also make the smart grid susceptible to new types of attacks and failures. A key requirement for modern smart grids is the ability to operate resiliently in the presence of attacks and other disturbances. Evaluating the resilience of the smart grid has been a topic of interest in recent years. Researchers in the environmental hazards domain use stochastic and statistical methods to evaluate smart grid resilience. However, those techniques do not always apply when evaluating resilience in the presence of malicious sources. On the other hand, researchers in the cyber-security domain evaluate the resilience of smart grids in an ad hoc fashion or rely on risk assessment methodologies.
In this work, we introduce a systematic and comprehensive function-based methodology that can be used to evaluate the resilience of the smart grid to failures caused by malicious sources. This methodology consists of four main steps. The first two steps represent the modeling part of the methodology, whereas the last two steps represent the experimental evaluation of those models. First, we identify the function under study and the functions and components on which it depends. By doing this, we scope the evaluation process to a single function at a time. By exploiting those dependencies, an attack tree is created to abstract the consequences of multiple attacks and demonstrate how attacks propagate between different domains. In the experimental part of the methodology, we use simulation tools to evaluate the resilience of the function under study in the presence of cyber-physical attacks. Based on the evaluation process, the main factors that affect the resilience of a given functionality can be determined. The resilience of the system is quantified by identifying the dependability limits within which the system can withstand variations caused by attacks. This information helps in deriving security policies and designing security components that govern the behavior of the system and keep it resilient.
The usefulness of the function-based methodology is demonstrated by three novel use cases: 1) the cyber-physical threat of a load drop attack, 2) the cyber threat of a Denial of Service (DoS) attack on the communication architecture of the smart grid, and 3) cyber-physical threats to demand response when it is used as spinning reserve in the smart grid. The resilience of the smart grid in the presence of these three threats was evaluated.
In the load drop attack use case, we evaluated the impact of a sudden load drop on the power delivery functionality and the frequency of the system. The results identify the maximum amount of load whose loss the system can withstand when it is dropped within a certain time. In the second use case, we evaluated the impact of a DoS attack on the remote metering and demand response functionalities. The DoS attack is performed in the customers' neighborhood area, where smart meters communicate with the utility through an RF mesh network. The results showed that an attacker needs to compromise only a small fraction of the meters in a typical RF mesh region to disrupt the communication resilience within the region, and that this disruption caused remote metering and demand response failures. Finally, we evaluated the resilience of the system when demand response is used as spinning reserve. When there is a power contingency, demand response curtails a certain amount of load to stabilize the system frequency at 60 Hz. In this use case, we analyze the stability of the system when demand response is under attack. The results identify the minimum amount of load that should respond to a power contingency to stabilize the frequency of the system.
Resilience evaluation is done by creating a boundary of acceptable system dependability in the presence of malicious attacks. System dependability is measured using the function-specific metrics of each use case. Our results can be used to derive security policies, and our function-based methodology will be useful for evaluating additional use cases in the smart grid and other cyber-physical systems.
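The idea of a dependability boundary can be illustrated with a minimal sketch. It is not the simulation setup used in the dissertation: the linear droop relation, the 5% droop value, the 60.5 Hz acceptance limit, and the names `toy_frequency_after_load_drop` and `dependability_limit` are all assumptions chosen for illustration. The sketch sweeps an attack magnitude (the fraction of load dropped), evaluates a function-specific metric (frequency), and reports the largest magnitude for which the metric remains acceptable.

```python
from typing import Callable, Optional, Sequence

NOMINAL_HZ = 60.0  # nominal system frequency mentioned in the abstract


def toy_frequency_after_load_drop(drop_fraction: float, droop: float = 0.05) -> float:
    """Hypothetical stand-in for a power-system simulator.

    Returns a steady-state frequency (Hz) after a fraction of the load is
    dropped, using a simple linear droop relation (an assumption).
    """
    return NOMINAL_HZ * (1.0 + droop * drop_fraction)


def dependability_limit(
    magnitudes: Sequence[float],
    metric: Callable[[float], float],
    is_acceptable: Callable[[float], bool],
) -> Optional[float]:
    """Scan attack magnitudes in increasing order and return the largest one
    for which the metric is still acceptable (None if the smallest fails)."""
    limit = None
    for m in sorted(magnitudes):
        if not is_acceptable(metric(m)):
            break
        limit = m
    return limit


# Assumed acceptance criterion: frequency must not exceed 60.5 Hz.
steps = [i / 100 for i in range(0, 51)]  # 0% to 50% load drop in 1% steps
limit = dependability_limit(steps, toy_frequency_after_load_drop, lambda f: f <= 60.5)
print(limit)
```

In practice the toy metric would be replaced by a call into a simulation tool, and the acceptance test by the function-specific dependability criterion of the use case under study.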
Asset Metadata
Creator: Al Majali, Anas (author)
Core Title: A function-based methodology for evaluating resilience in smart grids
School: Viterbi School of Engineering
Degree: Doctor of Philosophy
Degree Program: Computer Engineering
Publication Date: 09/22/2014
Defense Date: 08/28/2014
Publisher: University of Southern California (original), University of Southern California. Libraries (digital)
Tag: cyber-physical security, OAI-PMH Harvest, resilience, smart grids
Format: application/pdf (imt)
Language: English
Contributor: Electronically uploaded by the author (provenance)
Advisor: Neuman, Clifford B. (committee chair), Prasanna, Viktor (committee chair), Beshir, Mohammed J. (committee member), Halfond, William G. J. (committee member)
Creator Email: almajali@usc.edu, anasmajali@gmail.com
Permanent Link (DOI): https://doi.org/10.25549/usctheses-c3-481892
Unique identifier: UC11286625
Identifier: etd-AlMajaliAn-2967.pdf (filename), usctheses-c3-481892 (legacy record id)
Legacy Identifier: etd-AlMajaliAn-2967.pdf
Dmrecord: 481892
Document Type: Dissertation
Rights: Al Majali, Anas
Type: texts
Source: University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Access Conditions: The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the a...
Repository Name: University of Southern California Digital Library
Repository Location: USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA