DATA-DRIVEN METHODS FOR INCREASING REAL-TIME OBSERVABILITY IN
SMART DISTRIBUTION GRIDS
by
Cheung Chung Ming
A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(Computer Science)
August 2021
Copyright 2021 Cheung Chung Ming
Dedication
To my parents for their sacrifices and support.
Acknowledgements
I would like to express my deepest gratitude to my advisor, Professor Viktor K. Prasanna, for his support,
kindness and patience throughout my PhD. His guidance and encouragement have helped me overcome
the hardships and struggles that I have encountered. He has taught me to learn from my mistakes and improve
my skills as a researcher. I would also like to thank Dr. Rajgopal Kannan, who had long discussions with me
on many of the research problems we have tackled. He has given me great insight into approaching problems.
Also, I am grateful to Professor Cauligi Raghavendra and Professor Aiichiro Nakano for serving on my
qualifier and dissertation committees. Moreover, I would like to thank Professor Ashutosh Nayyar and
Professor Xuehai Qian for serving on my qualifier committee.
I would like to thank Professor Anand Panangadan for his guidance during the initial years of my PhD.
He has taught me the fundamental skills of a researcher. I would also like to thank Professor Donald Paul
for his support and guidance. I would also like to express my gratitude to everyone in the Data Science Lab,
especially Yinuo Zhang, Om Patri, Greg Harris, Ajitesh Srivastava, Rizwan Saeed and Chi Zhang. In particular,
I would like to express my deepest thanks to Sanmukh Kuppannagari for his guidance during my last few
years; we have had many meaningful discussions. Also, I would like to thank my friends Palash, Ayush,
Nitin, Gozde, Michail, Thanos and many others. They have made this long journey of completing the thesis
an unforgettable experience in my life. Moreover, I would like to thank all the CS and ECE department
staff, especially Lizsl De Leon, Kathyrn Kassar, Michelle Wilkinson and Juli Legat, for their help.
Finally, I am very grateful to my family. My parents and my sister have been very supportive of
my pursuit of the PhD degree. I would not have been able to finish my thesis without their love and
encouragement.
Table of Contents
Dedication
Acknowledgements
List of Tables
List of Figures
Abstract
Chapter 1: Introduction
1.1 Introduction to Smart Grid
1.2 Observability in Smart Grid
1.3 Motivation for High Observability
1.3.1 Forecasting
1.3.2 Clustering
1.3.3 Missing Data Imputation
1.3.4 Achieving High Observability Through Metering and Time Series Analytics
1.4 Challenges in Increasing Observability in Smart Grid
1.5 Data-Driven Models for Increasing Real-Time Observability
1.6 Thesis Statement
1.7 Research Contributions
1.8 Thesis Outline
Chapter 2: Data Analytics in Smart Grid
2.1 Smart Grid
2.1.1 Smart Grid
2.1.2 Grid Topology
2.1.3 Behind-the-Meter Distributed Energy Resources
2.1.4 BTM Solar Generation
2.1.5 Smart Inverter and Power Factor
2.1.6 BTM Battery Storage
2.2 Data Driven Models
2.2.1 Use of Data-Driven Models for Time Series Analysis
2.2.2 Analytical Models and Data-Driven Models
2.2.3 Applications of Data-Driven Models
2.3 Challenges and Opportunities in Increasing Observability
2.3.1 Observability
2.3.2 Challenges of Partial Observability
2.3.2.1 Hidden Solar
2.3.2.2 Hidden Battery
2.3.2.3 Real and Reactive Power
2.3.2.4 Missing Data
2.3.3 Opportunities
2.3.3.1 Unsupervised Behind-The-Meter DER Disaggregation
2.3.3.2 Leveraging Spatial-temporal Information
Chapter 3: Related Work
3.1 Behind-the-Meter Solar Disaggregation
3.2 Load Forecasting
3.3 Missing Data Imputation
3.4 Other Time Series Applications in Smart Grid
Chapter 4: Behind-the-Meter Solar Disaggregation using Consumer Mixture Model
4.1 Problem Definition
4.1.1 Disaggregation Problem
4.1.2 Disaggregation in Presence of Batteries
4.1.3 Requirements and Assumptions
4.2 Consumer Mixture Model
4.2.1 Model Overview
4.2.2 Feature Extraction
4.2.3 Aggregation of AMI measurements
4.2.4 Disaggregation with Consumer Mixture Model
4.2.4.1 Regular CMM with 1 solar feature source
4.2.4.2 Iterative CMM for multiple solar feature sources
4.2.5 Base CMM with Battery Model
4.2.6 Post-Disaggregation Adjustment
4.2.7 Complexity
4.3 Experimental Results
4.3.1 Performance of Base CMM
4.3.1.1 Datasets
4.3.1.2 Experiments
4.3.1.3 Results and Discussion
4.3.2 Performance of Iterative CMM
4.3.2.1 Dataset
4.3.2.2 Data Preprocessing
4.3.2.3 Evaluation Metrics
4.3.2.4 Experimental Setup
4.3.2.5 Results and Discussion
4.3.3 Experiments in presence of batteries
4.3.3.1 Dataset Preparation
4.3.3.2 Baselines
4.3.3.3 Experimental Setup
4.3.3.4 Results and Discussion
Chapter 5: Spatial-Temporal Data Analysis in Smart Grid
5.1 Background of Spatial-Temporal Graph Convolution Networks
5.1.1 Graph Convolutional Network (GCN)
5.1.2 Spatial-Temporal Graph Convolutional Network (STGCN)
5.2 Load Forecasting with STGCNs
5.2.1 Problem Definition
5.2.1.1 Short-term Load Forecasting (STLF) Definition
5.2.1.2 STLF with Missing Data
5.2.2 Application of STGCN to Power grid
5.2.3 Experiments
5.2.3.1 Baselines
5.2.3.2 Experimental Setup
5.2.4 Evaluation Metrics
5.2.4.1 Datasets
5.2.4.2 Results and Discussion
5.3 Missing Data Imputation with STGCNs
5.3.1 Problem Definition
5.3.1.1 Problem of Missing Data in Smart Grid
5.3.1.2 Problem Definition
5.3.2 Methodology
5.3.2.1 Spatial-Temporal GNN based Auto-Encoder (STGNN-DAE)
5.3.3 STGNN-DAE based MDI
5.3.3.1 Deployment of STGNN-DAE for MDI
5.3.4 Experiments and Results
5.3.4.1 Baselines
5.3.4.2 Datasets
5.3.4.3 Experimental Setup
5.3.4.4 Evaluation Metrics
5.3.5 Results and Discussion
Chapter 6: Improved Analytics Through Disaggregated Data
6.1 Power Factor Prediction
6.1.1 Problem Definition
6.1.1.1 Applications
6.1.2 Methodology
6.1.2.1 Simulation
6.1.2.2 Power Factor Analysis
6.1.3 Experiments
6.1.3.1 Datasets
6.1.3.2 Evaluation Metrics
6.1.3.3 Experimental Setup
6.1.3.4 Results: Complete Observability Case
6.1.3.5 Results: Partial Observability Case
6.2 User Profiling with Disaggregated Data
6.2.1 Problem Definition
6.2.2 Methodology
6.2.2.1 Feature Engineering
6.2.2.2 Consumer Mixture Model for Signal Disaggregation
6.2.2.3 Acquiring Final Clustering Results
6.2.3 Experiments and Results
6.2.3.1 Evaluation Metrics
6.2.3.2 Experimental Setup
6.2.3.3 Datasets and Data Preprocessing
6.2.3.4 Results and Discussion
6.3 Exogenous Features Inference By Consumption Clustering
6.3.1 Problem Definition
6.3.2 Methodology
6.3.2.1 Soft Clustering
6.3.2.2 Input Feature Extraction
6.3.2.3 Predictor
6.3.3 Experiments and Results
6.3.3.1 Dataset and Feature Extraction
6.3.3.2 Baselines
6.3.3.3 Experiment Setup and Evaluation Metrics
6.3.3.4 Results and Discussion
Chapter 7: Conclusion and Future Work
7.1 Broader Impacts
7.2 Future Directions
7.2.1 Disaggregation of BTM Electric Vehicles as a DER
7.2.2 Attention-based STGCN
7.2.3 Utilizing Spatial Features for Anomaly Detection
7.3 Concluding Remarks
Bibliography
List of Tables
4.1 Identification of consumers with and without PVs
4.2 MSE comparison of various training methods
4.3 MASE comparison of various training methods
4.4 Disaggregation error for varying T when N = 100 in August 2012
4.5 Disaggregation error for varying T when N = 100 in February 2012
4.6 Disaggregation error for varying aggregation level when N = 100 and T = 14 for CMM (Random) in August 2012
4.7 Disaggregation error for varying aggregation level when N = 100 and T = 14 for Kara (Random) in August 2012
4.8 Disaggregation error for varying cluster number for aggregation when N = 100 and T = 14 for CMM in August 2012
4.9 Disaggregation error for varying cluster number for aggregation when N = 100 and T = 14 for Kara in August 2012
4.10 Disaggregation error for varying aggregation level when N = 100 and T = 14 for CMM (Random) in February 2012
4.11 Disaggregation error for varying aggregation level when N = 100 and T = 14 for Kara (Random) in February 2012
4.12 Disaggregation error for varying cluster number for aggregation when N = 100 and T = 14 for CMM in February 2012
4.13 Disaggregation error for varying cluster number for aggregation when N = 100 and T = 14 for Kara in February 2012
4.14 Runtime (s) for varying aggregation level when N = 100, T = 14 with August 2012 Data
4.15 Disaggregation error for varying aggregation level when N = 100 and T = 14 for CMM (Random) without reverse flows in August 2012
4.16 Disaggregation accuracy under varying length of disaggregation periods in MAE
4.17 Disaggregation accuracy under varying length of disaggregation periods in MAPE
4.18 Disaggregation accuracy under varying customer aggregation levels in MAE
4.19 Disaggregation accuracy under varying customer aggregation levels in MAPE
4.20 Runtime for various models for disaggregation period 30 days
5.1 List of Algorithms used
5.2 Missing value periods statistics [85]
5.3 Setups for Missing data experiments
5.4 Varying hops of neighbor effect on STGCN prediction accuracy in MAE
5.5 MAE, MAPE and RMSE for various algorithms for predicting real load with varying prediction window size
5.6 MAE, MAPE and RMSE for various algorithms for predicting reactive load with varying prediction window size
5.7 MAE, MAPE and RMSE for Missing Data experiment 1 (Random, Zeros)
5.8 MAE, MAPE and RMSE for Missing Data experiment 2 (Random, Interpolated)
5.9 MAE, MAPE and RMSE for Missing Data experiment 3 (Spatial Locality, Zeros)
5.10 MAE, MAPE and RMSE for Missing Data experiment 4 (Spatial Locality, Interpolated)
5.11 Missing Data Configurations
5.12 Mean Absolute Error of Missing Data Imputation
5.13 Root Mean Squared Error of Missing Data Imputation
5.14 Normalized Absolute Error of Missing Data Imputation
5.15 Real Mean Value during each experiment execution
5.16 Testing STGNN-DAE on Varying Missingness Configurations
5.17 Testing STGNN-DAE on Mixed Block Sizes
6.1 Configurations of Nodes in Simulation
6.2 Testing Errors for Model With or Without Neighbor Inputs for Complete Observability Case
6.3 Testing Errors for Node 1 and 2 for Complete Observability Case
6.4 Testing Errors for Node 0, 1 and 2 for Partial Observability Case
6.5 List of consumption figure input features
6.6 List of ratio input features
6.7 List of features extracted
6.8 List of input features configurations used for each experiment
6.9 Classification accuracy for each input feature configuration with SVC predictor compared against baselines
6.10 Balanced accuracy for each input feature configuration with SVC predictor compared against baselines
6.11 Classification accuracy for each input feature configuration with NN predictor compared against baselines
6.12 Balanced accuracy for each input feature configuration with NN predictor compared against baselines
List of Figures
1.1 Simplified Representation of Smart Grid
1.2 Achieving High Observability Through Metering and Time Series Analytics
4.1 Overall architecture of the disaggregation model
4.2 Grid topology
4.3 A simulated customer data sample for 1 day
4.4 Cluster centroids of consumption of consumers without PVs
4.5 Disaggregated consumption load for a user compared to the true load
4.6 Disaggregated solar generation for a user compared to the true solar generation
4.7 Disaggregated consumption load for a user compared to the true load for different T values
4.8 MSE for various T values
4.9 MASE for various T values
4.10 Centroids discovered from feature engineering step
4.11 First 7 days of load disaggregation results comparison of CMM (Random) and Kara (Random) for a single aggregated measurement in N = 100 and T = 14 case
4.12 First 7 days of solar disaggregation results comparison of CMM (Random) and Kara (Random) for a single aggregated measurement in N = 100 and T = 14 case
4.13 Comparison of the solar generation of a customer in August 2012 and February 2012 for 7 days
4.14 First day of data of centroids from the clustering results
4.15 Disaggregated solar and ground truth measurements of a user with no hidden-battery model disaggregation
4.16 Disaggregated solar and ground truth measurements of a user with hidden-battery model disaggregation
4.17 Disaggregated load and ground truth measurements of a user with no hidden-battery model disaggregation
4.18 Disaggregated load and ground truth measurements of a user with hidden-battery model disaggregation
5.1 Architecture of a Spatial-Temporal Convolutional Block
5.2 Description of convolution on inputs
5.3 Grid topology (Image borrowed from [9])
5.4 Modification on graph by appending edges to multihop neighbors
5.5 Example of setting missing values to zero
5.6 Example of linearly interpolating missing values
5.7 Prediction results of one customer sample for real load
5.8 Prediction results of one customer sample for reactive load
5.9 Prediction results of Missing Data experiment 1
5.10 Prediction results of Missing Data experiment 2
5.11 Illustration of time series with single missing entries and block missing entries
5.12 Overall training workflow
6.1 Overall Architecture of Our Methodology
6.2 Comparison between Neural Network Outputs and Simulation Outputs on Node 0 with no neighbor inputs for Complete Observability Case
6.3 Comparison between Neural Network Outputs and Simulation Outputs on Node 0 with neighbor inputs for Complete Observability Case
6.4 Comparison between Neural Network Outputs and Simulation Outputs on Node 1 for Complete Observability Case
6.5 Comparison between Neural Network Outputs and Simulation Outputs on Node 2 for Complete Observability Case
6.6 Subsequence of Disaggregated Load compared to True Values for Node 1
6.7 Subsequence of Disaggregated Solar compared to True Values for Node 1
6.8 Comparison between Neural Network Outputs and Simulation Outputs on Node 0 for Partial Observability Case
6.9 Comparison between Neural Network Outputs and Simulation Outputs on Node 1 for Partial Observability Case
6.10 Comparison between Neural Network Outputs and Simulation Outputs on Node 2 for Partial Observability Case
6.11 Clustering for separating PV and non-PV customers
6.12 Clustering with K-Medoids
6.13 Clustering with disaggregated measurements using K-Shape
6.14 Silhouette Coefficient for clustering results of each algorithm
6.15 Workflow for feature extraction from customer consumption data
6.16 Workflow for training and evaluating the characteristics prediction model
Abstract
Traditional power distribution grids have evolved into smart grids with the development of advanced
metering infrastructure and renewable-energy-based distributed energy resources (DERs). This has
introduced the following challenges: (1) the stochasticity of renewable-energy-based DERs has increased the
volatility of grid frequency; (2) the decentralization of generation into small-scale DERs has reduced grid
inertia, which provides grid operators with sufficient buffer time to ramp generators up or down
to balance supply and demand. To address these challenges, real-time knowledge and understanding of
signal measurements of grid assets, called observability, is crucial for making grid operation decisions swiftly.
High observability can be obtained through extensive metering of assets in the smart grid for data collection,
and through time series analytics that extract information from the collected time series data.
Presently, these time series analytics are performed with data-driven models. Using measured data
from meters, models are trained to perform specific tasks related to smart grid operations. For example,
forecasting models are used to predict future values, and clustering models are used to perform user profiling
for understanding customer behavior. However, the proliferation of DERs has introduced new challenges
for these analytics. DERs located behind-the-meter (BTM) are not recorded individually and are hidden from
real-time observation. This, combined with the volatile nature of DER assets, greatly reduces observability.
As a result, these data-driven models do not have full observability of the data and suffer from accuracy losses.
In this thesis, we develop data-driven approaches to improve observability. We develop unsupervised
disaggregation models for separating the signals of BTM DERs hidden in net meter measurements. We
focus on separating the signals arising from the activity of BTM solar photovoltaics and battery storage. We also
propose capturing spatial features using sophisticated machine learning models such as spatial-temporal
graph convolution networks to improve time series analytics in the smart grid. Such time series analytics
include load forecasting and missing data imputation, which estimate information needed for grid
operations, such as future or missing values. Moreover, we show that the increase in observability provided
by these data-driven models can enhance time series analytics in the smart grid. We consider two types of
analytics: power factor prediction and user profiling. We evaluate the accuracy of these analytics using
data with BTM solar generation, and demonstrate that the accuracy can be improved by preprocessing the
data with disaggregation to separate BTM solar signals. Finally, we show that user profiling results can be
used to improve models for predicting household socio-demographic characteristics.
Chapter 1
Introduction
1.1 Introduction to Smart Grid
The electric power distribution grid is a sophisticated infrastructure managed by utility companies for the
distribution of electricity. Traditionally, electricity is generated at centralized power generation facilities
owned by utility companies and then delivered to their customers. The electricity generated is transmitted at
very high voltages on transmission lines over long distances so as to reduce energy loss. When
electricity nears customers, substations with transformers step it down to a lower voltage and
branch out to deliver it to industrial, commercial and residential customers. These transmission
lines form a complex network topology that connects all assets in the grid together.
Traditionally, control over the grid has been centralized at the grid operators. The main goal of a
grid operator in a utility company is to balance demand and supply, ensuring that there is always enough
supply to satisfy demand [52]. Since control is centralized, grid operators can adjust power production
from power plants in accordance with demand changes. Another important task of grid operators is to
maintain the grid frequency within safety limits to prevent damage to equipment and blackouts. Variability
in demand causes fluctuations in the frequency of the grid. A frequency that is either too high or too low
can cause severe damage to the grid [16]. Currently, such fluctuations can be offset by inertia provided
by power generation turbines that rotate at similar frequencies as the grid. The utility-owned centralized
power plants generate electricity with large steam or water turbines, which can provide large amounts of
inertia. This gives grid operators adequate time to ramp generation up or down accordingly while the
fluctuations are offset by inertia.
Figure 1.1: Simplified Representation of Smart Grid
Technological advancements have changed power grids greatly in the past few decades, evolving them
into smart grids. Small-scale, consumer-owned Distributed Energy Resources (DERs) have become affordable
for many households. Fig. 1.1 illustrates examples of DERs that exist in smart grid, for example,
rooftop solar photovoltaics (PV) that convert solar energy to electricity at residential customers. The existence of
these energy resources takes full control of power production away from utilities, further complicating
the task of grid operation [67]. DERs are more volatile in nature, causing greater fluctuations in grid
frequency. Also, they have replaced many traditional power plants, and many turbines have gone out of
operation [37]. This reduces the inertia available to offset fluctuations, making grid control a much more
challenging task. At the same time, more effort has been put into monitoring and data collection on assets,
including household loads and DERs, in the grid. This is powered by bi-directional Advanced Metering
Infrastructure (AMI) meters [66]. These meters provide real-time grid information and allow remote control
of assets. This provides opportunities for grid operators to adopt artificial intelligence in the analysis of
real-time grid data to make better operation decisions and adapt to uncontrollable, volatile DER behavior.
Customers can also participate in supply-demand balancing by voluntarily reducing demand, motivated
by incentives provided by utility companies. This increase in bi-directional information communication
between utility and customers is the main characteristic of smart grid. In this section, we discuss the
challenges and opportunities brought about by smart grid and how data-driven models play an important
role in facilitating optimal grid operation.
1.2 Observability in Smart Grid
To realize the benefits of the communication capabilities of smart grid, knowledge of signal measurements
is necessary. We refer to the ability to acquire such knowledge about grid assets as observability, that
is, the ability to observe the desired information. This includes knowing the type and number of resources
that are present in each household and the power flows into and out of each asset over time. Advanced
Metering Infrastructure (AMI) smart meters are installed to monitor assets in the grid in real time. They
measure and record the amount of power consumption or generation at regular time intervals. In addition,
we also refer to observability as the ability to accurately extract the information needed from data collected by
AMI smart meters. In most cases, raw measurement data is not directly useful for making operational decisions.
Statistical or data-driven models are used to analyze the data and extract meaningful information, for example,
future values and trends of power flows.
1.3 Motivation for High Observability
High observability can enable optimization of smart grid operation strategies. Ideally, utilities want to
minimize operation costs while maintaining the safety constraints of the grid by balancing supply and
demand. To achieve this, optimization problems based on power flow equations are solved to find the best
state for the smart grid to be in. This requires knowledge of the grid acquired through high observability [52].
Also, to have better control over customer behavior, utilities develop demand response programmes for
customers. These programmes provide incentives (e.g. monetary incentives) for customers to voluntarily
reduce their demand at specific times. The incentives need to be tailored to customers in order to be
motivating, which requires detailed observability of the usage patterns and behavior of customers [74, 75].
Besides increasing observability through metering, understanding of the grid can be further deepened
by extracting useful information from the collected meter data. For example, predicting future values can
be done with historical meter readings. These information extraction analyses in smart grid are typically
time series analytics, as most data collected from meters are time series data, for example, the real and
reactive power consumption over time. We describe some common analytics in power grid and motivate
the need for them:
1.3.1 Forecasting
Forecasting refers to the prediction of future values of time series based on trends, patterns from historical
values and correlations with external factors (e.g. weather). Forecasting is applied to predict future trends
of both electricity demand and supply. Such knowledge is crucial for planning for balanced demand and
supply in the grid. Utility companies need to ensure there is enough electricity supply for the demand or
else there would be a risk of blackout [30]. On the other hand, it is expensive to store excess electricity
generation [77] and thus it is preferable to avoid generating more electricity than needed for the grid to
minimize operation costs.
1.3.2 Clustering
The process of grouping similar data points given a set of data is called clustering. The clustering of time
series is more challenging than that of other kinds of data because it is difficult to define similarity between
two different time series. It is not sufficient to simply compute the distance between two time series as the
aggregate of the distance between each corresponding time interval (Euclidean distance). If two time series
have the exact same shape, but one of them is shifted by one interval, the Euclidean distance between them
would be large even though they should be very similar. Many different kinds of distances and algorithms
have been proposed specifically for measuring similarity of time series [1].
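The shift problem described above can be illustrated with a minimal sketch. It compares the Euclidean distance with dynamic time warping (DTW), a widely used elastic distance for time series. DTW is chosen here only as an illustration; the text does not say which distance is ultimately used, and the two load profiles are made-up values.

```python
import numpy as np

def euclidean_distance(a, b):
    """Interval-by-interval distance; has no tolerance for time shifts."""
    return float(np.sqrt(np.sum((a - b) ** 2)))

def dtw_distance(a, b):
    """Classic dynamic time warping with an absolute-difference local cost."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible warping steps.
            cost[i, j] = local + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return float(cost[n, m])

# Hypothetical profiles: the same peak, shifted by one interval.
series_a = np.array([0.0, 0.0, 1.0, 5.0, 1.0, 0.0, 0.0, 0.0])
series_b = np.array([0.0, 0.0, 0.0, 1.0, 5.0, 1.0, 0.0, 0.0])

euclid = euclidean_distance(series_a, series_b)
dtw = dtw_distance(series_a, series_b)
print(f"Euclidean: {euclid:.2f}, DTW: {dtw:.2f}")
```

For this pair the Euclidean distance is clearly non-zero, while DTW can align the two peaks and reports zero distance. The quadratic cost of DTW is one reason cheaper alternatives are also studied.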
For smart grid, clustering is crucial for finding groups of customers with similar consumption
habits. The ability to understand customer consumption habits is needed for preparing demand response
programmes [75]. These programmes are policies that provide incentives for customers to change their
consumption in ways that are beneficial to grid operation. They are called demand response because they
attempt to control the demand of the grid, as opposed to other strategies like curtailment that control the
amount of supplied electricity. Designing these programmes requires an understanding of customer behavior
so that incentives that are attractive to the customers can be provided.
1.3.3 Missing Data Imputation
Data collection is an important component of smart grid and is facilitated by AMI smart meters. However,
the collection process does not always operate perfectly, and missing values can occur in the collected
data for a number of reasons. For example, they can be caused by failures in either the meters or the
communication network [41]. Time series analytics are generally not designed to handle time series with missing
data, so the original values of any missing data need to be approximated. The method for doing so is
called missing data imputation.
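As a point of reference, the simplest form of imputation can be sketched as linear interpolation between the nearest observed readings. This is only a baseline assumed here for illustration, not the method developed in this thesis, and the readings are hypothetical.

```python
import numpy as np

def impute_linear(values):
    """Fill NaN entries by linear interpolation between observed neighbours."""
    values = np.asarray(values, dtype=float)
    idx = np.arange(len(values))
    observed = ~np.isnan(values)
    filled = values.copy()
    # np.interp evaluates the piecewise-linear curve through the observed points.
    filled[~observed] = np.interp(idx[~observed], idx[observed], values[observed])
    return filled

# Hypothetical meter readings with gaps marked as NaN.
readings = [1.0, np.nan, 3.0, np.nan, np.nan, 6.0]
imputed = impute_linear(readings)
print(imputed)
```

A baseline like this uses only the customer's own neighbouring readings, so it cannot recover abrupt changes that happen inside a gap.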
Figure 1.2: Achieving High Observability Through Metering and Time Series Analytics
1.3.4 Achieving High Observability Through Metering and Time Series Analytics
Fig. 1.2 summarizes how a combination of AMI and time series analytics can provide high observability in
smart grid. AMI meters collect net load from households. These data may contain missing values, which
can be recovered by missing data imputation. The data can then be processed to discover the component
signals in the measurements, e.g., the solar generation and load consumption. Forecasting is then used to
predict future values and trends that can be used for both short-term and long-term future planning in the
grid.
1.4 Challenges in Increasing Observability in Smart Grid
Recent developments in the technology of energy generation resources have greatly lowered the installation
costs of such assets. This has made it viable for energy resources to be installed on a small, distributed
scale. In particular, rooftop photovoltaics (PV) have seen large growth in residential markets [24]. This
allows residential households to produce electricity on a small scale from solar energy. These resources
are characteristically distributed (not centrally controlled), volatile, and generally not
monitored.
These resources introduce new challenges for observability, both for grid operators trying to
maintain grid stability and for traditional time series analytics algorithms, due to their characteristics:
1. Many of these resources are not monitored. The current metering infrastructure only measures
the net power flow into and out of households. Metering infrastructure for each individual energy
resource can be prohibitively expensive. Moreover, not all customers report to the utility when they
install rooftop PVs, so the utility cannot keep track of all energy resources in its grid. Fig. 1.1
illustrates how DERs are located behind-the-meter, so that meters only measure the net power flow
of consumption and DER generation of each customer.
2. They are distributed resources. This means that, unlike company-owned power plants, they are not
centrally controlled by utility companies. Thus, it is not possible to curtail or increase generation when
needed. Existing algorithms for optimal power flows need to take into account the activity of such
resources.
3. Renewable energy resources are volatile. Most distributed energy resources are renewable energy
resources that convert energy sources in nature (e.g. solar or wind) into electricity. The amount
of electricity generated depends heavily on environmental factors and cannot be controlled. Their
power output can change abruptly due to external factors, for example, a sudden drizzle can greatly
reduce solar energy production temporarily. The volatility of these resources makes analysis of their
time series more difficult.
4. Volatility of DERs increases fluctuations in the frequency of the grid. Moreover, the increase in energy
generation provided by DERs has caused a decrease in demand for utility-generated power. Power
plants are forced to operate under capacity or even be demolished for economic reasons [37].
This decreases the amount of inertia in the grid, which shortens the time for which it can offset frequency
fluctuations in the grid. The combination of these two effects means grid operators need more
accurate observability more quickly.
The introduction of these distributed resources greatly reduces the observability of the grid. These
distributed assets are often not metered, so grid operators do not have knowledge of the activity of these
resources. Moreover, the energy provided by these resources is consumed directly by the households
as it is generated. The energy consumption of these households appears to be lower than it would be
without DERs, as part of the consumption is supplied by these resources rather than the grid. AMI meters
that measure household consumption are unable to account for this apparent reduction in consumption
and thus cannot report the true power consumption value. The reduction in observability
affects the accuracy of many algorithms that rely on having full observability of the grid [89, 53]. Thus,
there is a need for improving observability in smart grid through data-driven means.
1.5 Data-Driven Models for Increasing Real-Time Observability
In order to support accurate time series analytics in power grids, there is a need for increasing real-time
observability of the distributed energy resources. Installing extra meters for every asset to be monitored is
a straightforward solution. However, this incurs both large monetary and time costs for implementation,
so it is not preferable. The alternative is to take a data-driven approach, which means that unobserved
data is approximated by using other observable data in the grid through machine learning methods. Data-
driven approaches cost less and can be implemented much quicker since there is no physical hardware
that needs to be installed.
1.6 Thesis Statement
Full real-time observability in smart distribution grid is crucial for optimal and safe grid operation.
Proliferation of Distributed Energy Resources (DERs) has created new challenges for
achieving full observability in smart grid. Data-driven methods can be utilized to increase observability
that current Advanced Metering Infrastructure (AMI) cannot provide. Our research
focus is in two main directions: discovery of hidden solar behind the meter and utilization of
spatial features. Our goal is to estimate unobserved signals through data-driven methods. We
also show that these research results leading to higher observability in smart grid improve crucial
time series analytics for smart grid operations.
1.7 Research Contributions
In this work, we develop data-driven models for increasing observability of smart grid. We develop data-
driven models to overcome the challenges in metering caused by hidden DERs, in particular, solar PVs and
batteries. We also enhance time series analytics in smart grid like forecasting and missing data imputation.
Our contributions are summarized as follows:
1. We develop a novel data-driven model for accurate disaggregation of hidden solar energy behind
meters from AMI measurements. This model can separate solar generation component from net load
measurements from AMI meters in an unsupervised manner. We also consider the situation where
battery storage exists alongside solar PV installations. Our model can still perform disaggregation
accurately in either situation.
2. We propose the use of grid topology information as spatial feature input for improving load forecasting
results. We utilize Spatial-Temporal Graph Convolution Networks (STGCNs) for learning
dependencies between spatial features and time series. We also develop the procedure for processing
grid topology into an appropriate graph format as input for the network. We also show that our
approach works better than the state of the art when the input data is partially missing.
3. We use STGCNs to leverage information from spatial features for performing missing data imputa-
tion (MDI). We show that the inclusion of neighbor information can improve estimations of missing
data values compared to traditional MDI algorithms.
4. We demonstrate how time series analytics can be improved from increased observability using the
aforementioned results. We show enhancements to applications like power factor prediction and
user load profiling in smart grid. In both applications, the accuracy of the results is improved when
data is first preprocessed by disaggregation.
5. We show that the use of soft clustering results as an input feature can improve prediction of socio-
demographic characteristics for customers based on electricity consumption data.
1.8 Thesis Outline
The rest of the thesis is organized as follows: We discuss the challenges of partial observability in power
grid and its causes, then give an overview of data-driven models that can be used to overcome these
challenges in Chapter 2. We then discuss current work that relates to improving observability in Chapter 3.
We then explain our proposed disaggregation model for approximating behind-the-meter solar in Chapter
4. This is followed by a discussion on data-driven models leveraging spatial data in power grid in Chapter
5. In Chapter 6, we show time series analytics accuracy improvements utilizing results from previous
chapters. Finally, we conclude our work and suggest future research directions in Chapter 7.
Chapter 2
Data Analytics in Smart Grid
In this work, we investigate improving common data analytical problems in smart distribution grid that
occur due to a need for understanding customer behavior and electricity ow in the grid. Smart grid
facilitates a two-way communication between utility companies and customers. By doing so, the utility
company obtains data about the customers and their electricity usage. They can then analyze these data
for grid planning and operations, like demand response programmes [52, 53]. However, recent develop-
ment in Distributed Energy Resources (DERs) has greatly lowered the cost of installing such resources
for residential use. The main examples are residential rooftop photovoltaics (PVs) for solar generation
and storage batteries for energy storage. These DERs, located behind the smart meters that measure customer
usage, are "invisible" to utility companies, thus introducing a challenge for existing data analytics
solutions, which are not designed to handle inputs with DERs. Therefore, we need data-driven models for
discovering Behind-the-Meter (BTM) DERs and integrating them into data analytics to improve their results.
In this chapter, we describe the smart grid model and relevant background of problems we tackle in
this work. We then describe data driven models that are used currently for solving smart grid related
problems. Finally, we explain the challenges in increasing observability in the grid through these models
and the opportunities to apply novel methods to overcome these challenges.
2.1 Smart Grid
2.1.1 Smart Grid
Unlike the traditional electricity distribution grid, the smart grid facilitates two-way communication between
utilities and customers. In this work, we consider the smart grid model where each customer/household
is metered with a smart meter. This meter should record the electricity flow going into or out of each
household at every time interval. We refer to these readings as the Advanced Metering Infrastructure
(AMI) measurements.
2.1.2 Grid Topology
We can view the smart grid network as a graph topology. Each node can represent either an individual
customer or a group of customers under the same sub-feeder transformer. Edges between nodes denote
that they are connected by feeder lines.
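The graph view above can be made concrete with a small sketch that builds a symmetric adjacency matrix from a list of feeder-line connections. The node names and connections are hypothetical, and the binary, undirected encoding is an assumption made for illustration. The graph format actually fed to the models is described in later chapters.

```python
import numpy as np

# Hypothetical feeder section: node labels and the lines joining them.
nodes = ["substation", "transformer_a", "transformer_b", "customer_1"]
edges = [
    ("substation", "transformer_a"),
    ("substation", "transformer_b"),
    ("transformer_a", "customer_1"),
]

def build_adjacency(nodes, edges):
    """Return a symmetric 0/1 adjacency matrix for an undirected graph."""
    index = {name: i for i, name in enumerate(nodes)}
    adj = np.zeros((len(nodes), len(nodes)), dtype=int)
    for u, v in edges:
        adj[index[u], index[v]] = 1
        adj[index[v], index[u]] = 1  # feeder lines are treated as undirected
    return adj

adjacency = build_adjacency(nodes, edges)
degree = adjacency.sum(axis=1)  # number of lines attached to each node
print(adjacency)
print(dict(zip(nodes, degree.tolist())))
```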
2.1.3 Behind-the-Meter Distributed Energy Resources
Distributed Energy Resources (DERs) are small-scale, decentralized sources of energy in the distribution
grid. In this work, we focus on investigating DERs located in residential households. These DERs
are located Behind-the-Meter (BTM), where "meter" refers to the metering instrument installed by utility
companies for measuring and recording customer electricity usage. These BTM DERs provide electricity
for on-site use at the residential household.
2.1.4 BTM Solar Generation
BTM Solar Generation refers to rooftop photovoltaics (PVs) installed on residential buildings. PVs are
arrays of solar cells that can convert solar energy into electricity. They are one of the most popular forms of
DER due to their relatively low cost and small size. Reports from the National Renewable Energy Lab
show that the residential PV sector is a growing market in the US [25].
Solar generation is a type of renewable energy source. It is not a reliable energy source as it can
only generate electricity when there is sunlight. It is also heavily affected by environmental conditions
including cloud coverage, weather conditions, shadows from trees or buildings, etc. As a result, it is not
ideal to be used as a main energy source.
2.1.5 Smart Inverter and Power Factor
Inverters are electrical devices in distribution grid used to convert a portion of the output real power
to reactive power. The ratio of conversion is called the power factor. Traditionally, the power factors of
inverters are fixed. However, the penetration of PV systems has been increasing significantly in smart grid
in recent years [24]. This has dramatically increased the complexity of grid operations as the stochastic
nature of PV generation has the potential to cause frequent voltage and frequency violations. Traditional
inverters are not able to regulate voltage as their power factors are fixed and cannot adapt to the volatile,
changing output power of PV generators. To solve this problem, smart inverters have been developed to
allow real-time control of PVs and other renewable-energy-based Distributed Energy Resources (DERs) via
protocols such as Active Network Management (ANM) [81]. Grid operators now have the tools at their
disposal to dispatch DERs in real time and mitigate violations in voltage and frequency.
2.1.6 BTM Battery Storage
Battery storage is a device that can be charged with electricity and discharged to supply electricity. In this
work, we focus on BTM batteries installed with PVs. The motivation for using BTM batteries is that PV
generation in the daytime is often higher than the load demand at residential households. Batteries are able
to store the excess generation. At night, when there is no solar generation and load demand is usually
higher, batteries can discharge to supply extra electricity to the households. This allows residents to
lower their electricity bills and also utilize resources better. BTM batteries are considered DERs
because they are able to supply electricity just like other energy resources, even though they do not actually
generate electricity.
2.2 Data Driven Models
2.2.1 Use of Data-Driven Models for Time Series Analysis
Time series analysis is the process of extracting useful and meaningful information out of raw time series
data. This process is important in many fields, because it is difficult to directly use raw unprocessed data
for any kind of decision making. For example, in finance, it is desirable to learn the trends and regularities
of stock prices so that accurate predictions of future values can be made.
2.2.2 Analytical Models and Data-Driven Models
Traditionally, time series data analysis has been done using analytical models. Analytical models are mathematical
models with a closed-form solution. These models are equations that have been developed to model the
behavior of the time series based on underlying physical factors and knowledge of the time series. A simple example is the
moving average model for future predictions, which takes the average of past values as a prediction for the
future value. However, in many cases, the underlying physical factors affecting time series data behavior
are too complex for analytical models to capture their correlations. In contrast, data-driven models
do not try to understand and explain the relationships between the inputs and the outputs. Instead, they
aim to find a function that can translate the input features into the outputs.
Data-driven models usually start with some insights on how the inputs and outputs correlate, and a
suitable model with variables that have to be fitted is picked. The simplest kind of model is the linear model
y = mx + c, where a coefficient m and offset constant c have to be fitted to find a linear correlation between
the inputs x and outputs y. More complex models have been developed for time series analysis, which
try to capture common characteristics of time series trends. An example is the ARIMA model [33], which
is an auto-regressive model that predicts future values based on past values, while modelling the error term
with a moving average model and eliminating non-stationarity in the time series.
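The two simplest models mentioned in this subsection can be sketched in a few lines: a least-squares fit of the linear model y = mx + c, and a moving-average forecast that predicts the next value as the mean of the most recent ones. The data and the window size are hypothetical and chosen only to make the behavior easy to check.

```python
import numpy as np

# Fit y = m*x + c by least squares on noise-free synthetic data.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0
m, c = np.polyfit(x, y, deg=1)  # coefficients, highest degree first

def moving_average_forecast(history, window):
    """Predict the next value as the mean of the last `window` observations."""
    return float(np.mean(history[-window:]))

# Hypothetical load history; the forecast averages the last two readings.
history = np.array([3.0, 5.0, 4.0, 6.0])
forecast = moving_average_forecast(history, window=2)
print(f"m={m:.2f}, c={c:.2f}, forecast={forecast:.2f}")
```

With real, noisy measurements the fitted m and c would only approximate the underlying trend, which is where the more expressive models discussed next come in.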
In the past two decades, a new family of data-driven models called neural networks have seen great
improvements in performance. Unlike previous data-driven models, they have much greater modelling
capability. It has been proven that a neural network with arbitrary layer width or depth can model any
continuous function [20]. Training of neural networks generally takes more computational resources than
previous models, but advancements in computational technology (e.g. use of GPUs to accelerate compu-
tation) have made training deep and complex neural networks possible.
2.2.3 Applications of Data-Driven Models
Data-driven models have been successfully applied to problems in a wide range of fields. In particular,
models have been developed specializing in time series analysis. Data-driven models like ARIMA [33], Kernel
Regression [23], and Random Forest Regression [55] have been used for forecasting problems like weather
prediction [32] and stock price prediction [71]. Similarly, neural networks designed to learn temporal dependencies
in time series have also been developed. These networks belong to a family called Recurrent
Neural Networks (RNNs) [64], which includes variations like LSTM networks [51] and Gated RNNs [104].
An RNN differs from other neural networks in that it uses outputs from a previous iteration in addition to input
features for each layer input. This allows RNNs to learn dependencies between entries in sequential inputs.
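The recurrence described above can be sketched with a plain (Elman-style) recurrent layer in NumPy. This is a generic textbook form assumed for illustration, with randomly initialized and untrained weights. It is not one of the LSTM or gated variants cited above.

```python
import numpy as np

def rnn_forward(inputs, w_in, w_rec, bias):
    """Run a simple recurrent layer over a sequence and return all hidden states."""
    hidden = np.zeros(w_rec.shape[0])
    states = []
    for x_t in inputs:
        # The new state depends on the current input AND the previous state.
        hidden = np.tanh(w_in @ x_t + w_rec @ hidden + bias)
        states.append(hidden)
    return np.stack(states)

rng = np.random.default_rng(0)
w_in = rng.normal(size=(3, 1))   # input-to-hidden weights (hypothetical sizes)
w_rec = rng.normal(size=(3, 3))  # hidden-to-hidden (recurrent) weights
bias = np.zeros(3)

# An impulse at the first step followed by zero inputs.
sequence = np.array([[1.0], [0.0], [0.0]])
states = rnn_forward(sequence, w_in, w_rec, bias)
print(states)
```

Although the inputs at the second and third steps are zero, the hidden state there is driven by the recurrent term `w_rec @ hidden`, which carries the first input forward. This is how the layer retains information across a sequence.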
Besides forecasting problems, time series clustering problems are also important in many fields. In
finance, [69] have shown that Self Organizing Maps can be used for clustering stock price time series to
discover patterns. Clustering can also be applied to speech recognition with the use of LSTM RNNs [86].
2.3 Challenges and Opportunities in Increasing Observability
2.3.1 Observability
Observability refers to the knowledge of signal measurements for every component in the grid such as real
and reactive power flows over the lines, power injections or consumption at homes or PV systems in the
grid, voltages at transformers etc. If the economics of the system are discounted, real-time full observability
can be obtained by extensive installation of metering infrastructure to monitor each attribute. With full
data accessibility, it is possible to conduct analysis on the data to further increase the observability by
increasing our understanding of the system, e.g. accurate modelling of customer behaviors, future load
forecasting. This is however not feasible in budget-constrained real world scenarios, where remaining
attributes must be inferred only using partial data from the limited metering system. We denote such
scenarios as partial observability scenarios.
In this section, we discuss the challenges in inferring data that is not metered from partial observability.
We describe the causes of partial observability in data that we have investigated in this work and the
difficulties in making inferences for these data. With challenges come opportunities: the development of data-driven
models has made it possible to model correlations and dependencies among data of interest with
higher accuracy than traditional statistical models have allowed. We discuss data-driven models that can
overcome partial observability challenges in smart grid.
2.3.2 Challenges of Partial Observability
2.3.2.1 Hidden Solar
As penetration of solar generation in the residential market increases, there is a greater amount of BTM
solar generation in the grid. This solar energy appears hidden to the utility because it is only measured as
part of the net load (sum of the consumption and generation at the household).
When solar generation is greater than the consumption, the excess energy is treated differently depending
on whether reverse flows are allowed. When reverse flows are allowed, the utility
buys the excess energy from customers, resulting in a reverse flow of energy from customers to the grid.
When reverse flows are not allowed, the excess energy is simply wasted unless there
is battery storage.
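The two cases can be illustrated with a short sketch of what the meter records. It assumes a sign convention in which generation offsets consumption (net load = consumption - solar), and it models a ban on reverse flows by clipping the net load at zero. Both choices, as well as the readings, are assumptions made for illustration.

```python
import numpy as np

def net_load(consumption, solar, allow_reverse_flow=True):
    """Net power seen by the meter; negative values are exports to the grid."""
    net = consumption - solar
    if allow_reverse_flow:
        return net
    # Without reverse flow the surplus is curtailed, so the meter never goes below zero.
    return np.maximum(net, 0.0)

# Hypothetical readings (kW) for three intervals; solar exceeds load in the second.
consumption = np.array([1.0, 2.0, 1.5])
solar = np.array([0.0, 3.0, 0.5])

with_reverse = net_load(consumption, solar, allow_reverse_flow=True)
without_reverse = net_load(consumption, solar, allow_reverse_flow=False)
print(with_reverse, without_reverse)
```

In both cases the meter reports a single number per interval, so neither the true consumption nor the solar output can be read off directly.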
As mentioned in Section 2.1.5, smart inverters have been deployed to provide real-time control of DERs
in the grid. However, proper decisions on DER dispatch or curtailment can only be made if real-time
BTM DER states are known. The lack of real-time BTM solar generation data not only hinders the ability
of utilities to control the grid for protection and optimization, it also reduces their ability to make planning
decisions. For example, in the 2018 Angeles Forest event, system operators estimated that approximately 130 MW of
DERs tripped offline, in addition to 860 MW of large-scale utility-operated PVs [18]. They were
unable to acquire an accurate count of the DERs that tripped offline and could only estimate it from the
increase in net load. Full knowledge of the DERs in the grid can help the operators better protect the grid
and plan ahead for future emergencies.
2.3.2.2 Hidden Battery
When battery storage is installed with solar for storing excess solar generation, an effect called peak
shaving is created [73]. By supplying extra energy during demand peaks, the battery reduces the amount of
electricity flowing from the grid to the customer, and thus the peak demand is reduced. As a result, the battery
hides part of the load consumption of the customer. The observability of the grid is further reduced when
battery storage is present. In particular, it makes estimating BTM solar generation a more difficult task,
amplifying the problems associated with hidden solar presence.
2.3.2.3 Real and Reactive Power
Power flows in smart grid consist of real and reactive power components. Reactive power flows may
come from both generators and consumption devices like laptops. Most metering instruments currently in
smart grid are not equipped to measure both real and reactive power. Instead, only the apparent power is
measured, which is the magnitude of the sum of the real and reactive power vectors.
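A small numerical sketch shows why an apparent power reading alone is ambiguous. It uses the standard relation S = sqrt(P^2 + Q^2) for real power P and reactive power Q, and the standard definition of power factor as P / S. The numeric values are hypothetical.

```python
import math

def apparent_power(real_kw, reactive_kvar):
    """Magnitude of the vector sum of real and reactive power (kVA)."""
    return math.hypot(real_kw, reactive_kvar)

def power_factor(real_kw, reactive_kvar):
    """Standard definition: real power divided by apparent power."""
    s = apparent_power(real_kw, reactive_kvar)
    return real_kw / s if s > 0 else 0.0

# Two hypothetical operating points with different real/reactive splits.
case_a = (3.0, 4.0)  # 3 kW real, 4 kvar reactive
case_b = (4.0, 3.0)  # 4 kW real, 3 kvar reactive

s_a, s_b = apparent_power(*case_a), apparent_power(*case_b)
pf_a, pf_b = power_factor(*case_a), power_factor(*case_b)
print(s_a, s_b, pf_a, pf_b)
```

Both operating points yield the same apparent power even though their power factors differ, so a meter that reports only apparent power cannot distinguish them without additional information.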
Another reason for not having reactive power data is the use of smart inverters. Traditional inverters
have a fixed power factor, so the real and reactive power in the grid can be determined easily. However,
smart inverters are configured with Volt/Var curves, which enable them to adjust the amount of real
and reactive power injected into the grid based on the voltage at their terminals [17]. Since their power factor
is variable, real and reactive power cannot be determined.
Knowledge of reactive power in the grid is important for voltage control, as reactive power can directly
affect the voltage level in a grid and thus needs careful management [45]. It is important to avoid violating
voltage limits by controlling the reactive power in the grid, because such violations can
damage equipment and even cause power outages [16]. Moreover, a lack of full understanding of the real
and reactive power profiles of the grid, and of the states of DERs like solar generation, makes it difficult to
estimate the maximum PV hosting capacity of the grid. This leads to more conservative estimates, and thus
the full PV potential of the grid cannot be utilized.
2.3.2.4 Missing Data
Missing values occur frequently in AMI measurements. The main causes are communication errors
between meters and the data collection server or errors in measurements. This can greatly affect data
analytics in smart grid. For example, when forecasting future customer loads, if a customer's
data is missing for some time due to errors in measurement, conventional models will be greatly affected
by these wrong measurements. This would affect the accuracy of many data analytics, e.g. state estimation,
load forecasting, etc., which are essential for optimizing the operation cost of smart grid.
2.3.3 Opportunities
As discussed in Section 2.2.1, developments in data-driven models have allowed for models that can discover
correlations in data sets at very high accuracy. While BTM DERs introduce challenges in providing
full observability of smart grid, they also provide opportunities to utilize these new technologies in providing
a data-driven solution. There are several advantages of using data-driven models over hardware
solutions like separately metering each DER device of interest. The most important benefit is that it is
much more economical to deploy data-driven models than to use additional metering instruments. It also
takes less time to deploy such models. This makes data-driven solutions a more attractive choice for utility
companies in the short term until smart grid networks can be overhauled to meet new requirements on
information transmission.
In this section, we discuss a number of potential solutions to challenges introduced in Section 2.3.2.
2.3.3.1 Unsupervised Behind-The-Meter DER Disaggregation
Disaggregation is the problem of separating signals into their constituent components. This can be used
to discover BTM DERs from AMI measurements, for example, determining hidden solar generation and
battery storage values. It is required for the disaggregation to be performed in an unsupervised manner.
This means that there is no historical data available for training the models for disaggregation. The reason
is that collecting historical data for constituent components would require extra metering instrumentation
which defeats the purpose of using disaggregation for lower cost.
Denote $y_i \in \mathbb{R}^T$ as the AMI measurements for $T$ intervals for a particular consumer $i$, and
$c_1, c_2, \ldots, c_k \in \mathbb{R}^T$ as the components of interest in the same period. The problem can then be
written as finding the estimation for each component $\hat{c}_1, \hat{c}_2, \ldots, \hat{c}_k$ such that for an error
function $E$, the error between each estimation and the true values is minimized. These components can be
the load of the customer, their solar generation or battery charge, etc.
$$\min_{\hat{c}_k} \sum_k E(c_k, \hat{c}_k) \tag{2.1}$$
$$\text{s.t.} \quad \sum_k \hat{c}_k = y_i \tag{2.2}$$
2.3.3.2 Leveraging Spatial-temporal Information
While temporal correlations are commonly utilized in data analytics for smart grid, spatial correlations
have not been investigated. Spatial correlations refer to relationships between data that are close to each
other. These are usually represented in the form of a weighted graph, where nodes are the observed entities
and edges connect the related entities. In the case of smart grid, the nodes are customers and edges
connect customers that are correlated. There are several reasons why nearby customers may show spatial
correlations in their AMI measurements. Customers in the same neighborhood are likely to have similar
socio-economic status, so they may have similar electricity usage habits. Customers close by also
experience similar weather conditions, solar irradiance, etc. These are all factors that can affect electricity
consumption or solar generation patterns. If both spatial and temporal correlations are fully utilized, more
accurate data-driven models can be developed.
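One way to picture such a graph is sketched below in Python. This is an illustration rather than the construction used in this thesis: it assumes that edge weights are Pearson correlations between customers' AMI time series and that an edge is kept only above a hypothetical threshold. The customer names and readings are made up.

```python
import math

def pearson(a, b):
    """Pearson correlation between two equally long time series."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)

def build_customer_graph(series, threshold=0.8):
    """Return {(i, j): weight} for customer pairs whose correlation
    reaches the threshold (the threshold value is an assumption)."""
    ids = sorted(series)
    edges = {}
    for pos, i in enumerate(ids):
        for j in ids[pos + 1:]:
            weight = pearson(series[i], series[j])
            if weight >= threshold:
                edges[(i, j)] = weight
    return edges

readings = {
    "a": [1.0, 2.0, 3.0, 2.0],
    "b": [2.0, 4.1, 6.0, 3.9],   # similar shape to "a"
    "c": [3.0, 1.0, 0.5, 2.5],   # opposite shape
}
graph = build_customer_graph(readings)
```

Other edge definitions, such as geographic distance or electrical connectivity, would fit the same node-and-edge structure.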
Chapter 3
Related Work
As discussed in previous chapters, several data-driven methods are used for improving observability in
smart distribution grids. In this thesis, we focus mostly on behind-the-meter solar disaggregation, load
forecasting and missing data imputation. Disaggregation encompasses a large class of problems and there
are many works on this topic. In this chapter, we discuss the current technology for disaggregation
problems and its limitations, as well as how we improve over existing disaggregation models. We also
survey works on load forecasting and missing data imputation. Finally, we describe several
time series analytics applications in smart grid, such as user profiling and prediction of customer socio-demographic
characteristics. We show in this thesis that these applications can be enhanced by utilizing improved
observability.
3.1 Behind-the-Meter Solar Disaggregation
The solar generation disaggregation problem is part of a larger set of disaggregation problems called
non-intrusive load monitoring (NILM). NILM attempts to separate AMI measurements from households into
each consumption source [109], including electric appliances and electric vehicles. Solar generation differs
from other disaggregation problems in that it is not an additive term but a subtractive one. Unlike electric
appliances, which can be modelled by ON/OFF models, it is not a discrete event. It also depends less
on consumer habits and more on environment and weather. This means that existing NILM solutions are
not applicable to the solar generation case.
The disaggregation problem is closely related to the solar generation prediction problem, where the
task is to predict solar generation based on environmental variables [92]. A possible solution to the
disaggregation problem is to predict the solar generation at a given location, then subtract that from its AMI
measurement. SunDance [13] is an example of a work that combines these two problems.
However, the prediction problem generally requires more detailed inputs of the environment to produce a
good estimation. In contrast, disaggregation can relax the requirements on inputs by making use of AMI
measurements. Another similar problem is the estimation of invisible solar PV generation studied in [90, 91].
That work focuses on first finding representative PV sites that need to be fully monitored so that they can
be used to model invisible PV sites with unmonitored generation. To find the representative PV sites, their
methods require a preprocessing step which involves collecting solar generation data from all the PV
sites for a short period of time. We argue that collecting even a small amount of such data from all the
PV sites will be expensive, as it will require techniques such as surveying or installing temporary meters.
Thus, we use solar data from a small number of sites which already have the required metering infrastructure
and focus on fully unsupervised disaggregation. We aim to solve the problem without ever needing
to collect solar generation data at the sites where disaggregation is needed. Sossan, Nespoli, Medici and
Paolone [95] suggest a different approach in which they separate generation from net load by leveraging
the fact that the frequencies of these signals are different. A bandpass filter is used to remove consumption
signals from the AMI measurements to get the solar generation. All of these methods focus only on modelling
the solar generation component of the disaggregation problem. These models are unable to utilize
additional features and information that can produce a more accurate load model.
A few other works in the literature approach the unsupervised disaggregation problem with the contextually
supervised source separation model [106]. Unlike the previously mentioned models, this approach
utilizes features to model each component with a linear model. The features to be used are chosen
by the authors, guided by their contextual knowledge of the problem. Kara, Tabone, Roberts,
Kiliccote, and Stewards [44, 43, 98] use temperature-based features for modelling the aggregated load
consumption of customers [63]. The authors of [42] use a combination of physical models and data-driven models to
perform disaggregation. They use physical parameters for the solar model and an HMM for
the load component. Their method requires knowing the customer location as well as weather features for
estimating the solar model. These methods all require some kind of meteorological feature, e.g. solar
irradiance or weather. In this thesis, we investigate the use of the consumption and generation signals of
neighboring customers as input features, which can capture external factors like meteorological features
implicitly. We compare the performance of our method with models that use temperature as a feature in
our experiments. As many metering infrastructures currently do not support sampling intervals shorter than
15-30 minutes, most disaggregation works in the literature evaluate their models on data within this
sampling range.
3.2 Load Forecasting
The proliferation of AMI meters in smart grid has enabled data collection at finer granularity (a few minutes to an
hour) [2]. This has propelled research into data-driven solutions for several problems
in smart grid [4]. One such problem is the prediction of customer consumption for future horizons, also
known as load forecasting.
The problem of load forecasting has received widespread attention and several methods have been
developed for accurate prediction of future load demand [110]. Traditionally, statistical methods like moving
average or ARIMA [94] have been the most popular methods for load forecasting due to their low
complexity and low computational overhead. However, as the computational power of computers increased,
data-driven models have surpassed statistical models in popularity as they can represent complex
non-linear relationships. Models like Support Vector Regression and Random Forest Regression have been
widely successful in many applications including time series analysis and forecasting [10].
Recently, state-of-the-art deep neural networks such as Long Short-Term Memory (LSTM) networks
and Convolutional Neural Networks (CNN) have been applied to the problem of short-term load forecasting (STLF). These networks are
capable of representing even more complex models than traditional ML and can deliver state-of-the-art
performance. For example, [51] uses LSTMs to learn representations for sequential data. In addition to the
historical loads, time-related features like the day of the week, the time interval of the day, and whether the
day is a holiday are used. [107] uses CNNs to learn correlations of data within a certain temporal locality
and make predictions. This paper also uses k-Means to segment the dataset into clusters of data
points and constructs an ensemble of CNN models. While both kinds of networks are powerful, neither is
able to consider spatial information in smart grid as inputs.
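A minimal sketch of how such time-related features might be derived from a timestamp is given below. It is not the feature pipeline of [51]; the holiday set and the 15-minute interval length are assumptions made for illustration.

```python
from datetime import date, datetime

# Hypothetical holiday calendar; a real deployment would load the utility's own.
HOLIDAYS = {date(2021, 1, 1), date(2021, 7, 4)}

def time_features(ts, interval_minutes=15):
    """Return (day_of_week, interval_of_day, is_holiday) for a timestamp."""
    day_of_week = ts.weekday()  # Monday = 0 ... Sunday = 6
    interval_of_day = (ts.hour * 60 + ts.minute) // interval_minutes
    is_holiday = int(ts.date() in HOLIDAYS)
    return day_of_week, interval_of_day, is_holiday

features = time_features(datetime(2021, 7, 4, 13, 30))
```

These three values would be appended to the historical load window before it is passed to a forecasting model.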
3.3 Missing Data Imputation
Many works have explored ways to perform missing data imputation. Works such as [61,
26] include matrix-operation-based and machine-learning-based models which operate on the entire input matrix
to perform imputation. For example, the denoising autoencoder developed in [26] takes the entire matrix
with corrupt entries as input to reconstruct an uncorrupted version. The scalability (at deployment time) of
the matrix-operation-based models is limited, while the machine-learning-based models suffer from sample
inefficiency as they try to model all-to-all correlations. On the other hand, works such as [79, 47, 85, 59]
work on individual time series data and may not be able to capture the spatial correlations. For example,
the autoencoders developed in [85, 59] take the daily load profile as input to perform imputation. These
works differ in the neural network structure used in the autoencoders: [85] used variational
autoencoders, while [59] used LSTM networks. [47] uses a k-nearest neighbor based approach to search
for past situations similar to each time series with missing entries to estimate the missing values. The authors
of [96] develop a GNN-based technique for missing data imputation. However, their technique is not
tailored towards time series data and thus does not capture temporal dependencies.
Recently, works such as [50, 8] have developed analytical models to estimate missing data using
information from the neighbors, thus capturing spatial correlations. Our proposed technique is a data-driven
version of these methods which only needs the connectivity information and forgoes other hard-to-obtain
information such as impedances, current flows, etc.
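To illustrate why connectivity information alone can already be useful, the sketch below fills a missing reading with the mean of the readings observed at adjacent nodes. This is a deliberately simple baseline and is neither the analytical models of [50, 8] nor the technique proposed in this thesis; the meter names and the adjacency list are hypothetical.

```python
def impute_from_neighbors(values, neighbors):
    """values: {node: float or None}; neighbors: {node: [adjacent nodes]}.
    Each missing entry (None) is replaced by the mean of the readings
    observed at its neighbours; it stays None if no neighbour is observed."""
    filled = dict(values)
    for node, value in values.items():
        if value is None:
            observed = [values[n] for n in neighbors.get(node, [])
                        if values.get(n) is not None]
            if observed:
                filled[node] = sum(observed) / len(observed)
    return filled

readings = {"m1": 1.2, "m2": None, "m3": 1.6, "m4": None}
adjacency = {"m1": ["m2"], "m2": ["m1", "m3"], "m3": ["m2", "m4"], "m4": ["m3"]}
imputed = impute_from_neighbors(readings, adjacency)
```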
3.4 Other Time Series Applications in Smart Grid
Substantial literature has been dedicated to time series clustering [54, 1]. Commonly used distance metrics
such as Euclidean distance do not capture the similarity of time series data. Therefore, distance metrics
such as the Dynamic Time Warping (DTW) metric [6] are used. DTW warps the time series within a certain
constraint to allow matching of time series that are slightly shifted. Recently, K-Shape [76] was proposed
as a faster clustering algorithm that makes use of cross-correlation to define a new distance measure.
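The warping idea can be sketched in a few lines of Python. The function below is a textbook dynamic-programming DTW with an optional band constraint (a Sakoe-Chiba style window), written for illustration only; it is not the implementation used in [6] or in this thesis, and the two example series are made up.

```python
import math

def dtw_distance(a, b, window=None):
    """DTW distance between sequences a and b, using absolute difference as
    the local cost and an optional band of half-width `window` samples."""
    n, m = len(a), len(b)
    w = max(n, m) if window is None else max(window, abs(n - m))
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - w), min(m, i + w) + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # step in a only
                                     cost[i][j - 1],      # step in b only
                                     cost[i - 1][j - 1])  # step in both
    return cost[n][m]

x = [0, 0, 1, 2, 1, 0, 0]
y = [0, 1, 2, 1, 0, 0, 0]  # same shape, shifted by one step

pointwise = sum(abs(p - q) for p, q in zip(x, y))  # penalises the shift
warped = dtw_distance(x, y, window=2)              # absorbs the shift
```

A point-by-point comparison treats the shifted copy as a different series, whereas the warped distance does not, which is the property that makes DTW attractive for load profiles with slightly different timing.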
Several works have focused on the clustering problem in the context of load demand clustering in smart
grid. Different data-driven models have been used to represent each group of consumption patterns and
perform clustering, including recurrent neural network models [99] and Dirichlet Process Mixture Models
[28]. There has also been research to perform clustering on Big Data for demand response [12, 74]. The
prior works are able to form clusters of good quality, but to the best of our knowledge, there is currently
no work that investigates the case of load clustering under the presence of distributed BTM solar in the
grid.
Fuzzy clustering of load consumption time series has been used to discover correlations between cluster
groups of customers and their socio-demographic characteristics [102]. The authors form fuzzy rules
to predict characteristics for members of the clusters. Alternatively, there are works that predict these
characteristics using input features regarding consumption figures extracted from load [5, 97]. However,
these works do not include fuzzy clustering results as an input feature in addition to other features
extracted directly from load.
Chapter 4
Behind-the-Meter Solar Disaggregation using Consumer Mixture Model
The residential PhotoVoltaics (PV) market has seen great growth in the past decade. A large amount of
solar energy is generated and consumed at the residential level. Behind-the-Meter (BTM) solar refers to
this residential PV generation that is hidden from utility companies. The only measurement seen by the
utility is the net load measurement (sum of consumption and generation) from the Advanced Metering
Infrastructure (AMI) in place.
In this chapter, we develop models for performing the disaggregation of BTM solar from the net load.
Disaggregation means the models are able to separate solar generation and load consumption components
from net load by data-driven methods. We investigate the performance of our models in scenarios where
batteries are absent or present.
4.1 Problem Definition
4.1.1 Disaggregation Problem
We are given AMI measurements of a set of customers, presumably all under the same feeder network.
These measurements are the sum of the consumption load and rooftop PV generation of each customer. Our
task is to separate these AMI measurements into the component load and generation signals. The objective
is to minimize the difference between the estimated disaggregated signals and the real signals over
the set of customers.
Formally, we state the problem as follows. Given a set of customers $C$, we denote AMI measurements
at time $t$ for customer $i \in C$ as $y_{it}$. We would like to find for customer $i$ at time $t$ their load and solar
generation denoted as $l_{it}$ and $s_{it}$ respectively. The goal is then to find estimates for the disaggregated load
and solar signals $\hat{l}_{it}$ and $\hat{s}_{it}$. Denote an error function between two time series as $E$. Denote
$L_t = \sum_{i \in C} l_{it}$ and $S_t = \sum_{i \in C} s_{it}$, then the objective of the problem can be written:
$$\min_{\hat{l}_{it}, \hat{s}_{it}} \left( E(L_t, \hat{L}_t) + E(S_t, \hat{S}_t) \right) \tag{4.1}$$
$$\text{s.t.} \quad \hat{l}_{it} + \hat{s}_{it} = y_{it} \quad \forall i \tag{4.2}$$
However, it is generally the case that $l_i$ and $s_i$ are not available to train a model in a supervised manner.
Thus, the estimations need to be made using only $\mathbf{y}_i$. To do so, let $\mathbf{x}^1_i \in \mathbb{R}^{D_1 \times T}$ and
$\mathbf{x}^2_i \in \mathbb{R}^{D_2 \times T}$ be the features related to consumption and generation, respectively, for
consumer $i$, where $D_1$ and $D_2$ are the respective feature dimension sizes. The procedure for obtaining
these features is discussed in Section 4.2. We model the estimations as approximately equal to a linear
combination of these features such that their sum is the AMI measurement. Formally,
$$\hat{l}_i \approx \theta_i^T \mathbf{x}^1_i, \tag{4.3}$$
$$\hat{s}_i \approx \phi_i^T \mathbf{x}^2_i, \tag{4.4}$$
$$\hat{l}_i + \hat{s}_i = \mathbf{y}_i, \tag{4.5}$$
where $\theta_i$ and $\phi_i$ are coefficients for combining the features for the consumer $i$.
We focus on disaggregating AMI measurements into user consumption and solar generation using only
minimal information, i.e., we are only given the AMI measurements of consumers and the solar irradiance
in the region. We do not know any other consumer-specific details such as their specific location, solar
panel configuration, history of consumption/generation or whether they have PVs. This ensures that our
unsupervised model can be generalized to multiple datasets.
The motivation for using a linear model for solar generation as shown in Equation (4.4) is derived
from the formula suggested by Clark et al. [49], which shows that the energy output per unit time is the product
of the collector area, collector parameters and the global solar irradiation. For each photovoltaic system
owned by one household, given the solar irradiance $R$, the total solar panel area $A$, solar panel yield $r$ and
performance ratio $PR$, we can simplify the formula to $E = \phi R$, where $\phi = A \cdot r \cdot PR$.
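As a small worked example of this simplification, the Python sketch below collapses the panel parameters into a single coefficient and applies it to a few irradiance values. All numbers are hypothetical and the function names are introduced here for illustration only.

```python
def pv_coefficient(area_m2, panel_yield, performance_ratio):
    """Single coefficient phi = A * r * PR of the linear solar model."""
    return area_m2 * panel_yield * performance_ratio

def pv_energy(irradiance, coefficient):
    """Energy output E = phi * R for one time interval."""
    return coefficient * irradiance

# Hypothetical household: 20 m^2 of panels, 15% yield, performance ratio 0.75.
phi = pv_coefficient(area_m2=20.0, panel_yield=0.15, performance_ratio=0.75)

# Irradiance per interval in kWh/m^2 (made-up values).
energy = [pv_energy(r, phi) for r in (0.0, 0.5, 1.0)]
```

Because the panel area, yield and performance ratio are fixed for a given installation, only their product needs to be estimated, which is why a single coefficient per household suffices.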
4.1.2 Disaggregation in Presence of Batteries
We also consider the problem of unsupervised disaggregation of load consumption and BTM solar information
from AMI measurements in the particular case where energy storage is present. The AMI measurements
from customers contain energy generation by solar PVs and energy storage in the form of installed
batteries. PVs generate electricity in the daytime when the sun is up, causing a drop in AMI measurement
during this period as the consumption of the house is supplied by self-generated electricity. At the same
time, battery storage is charged from excess generated energy, which can then be discharged in the
evening when the consumption load rises. This creates a combined effect of reducing the greatest and
lowest peaks of the AMI measurements, also known as "peak shaving".
In this problem, we need to model an extra function that captures battery charging and discharging
with respect to time. We denote the function as $f_{batt}(t)$. The model for generation in (4.4) needs to be
modified to include this battery function.
$$\hat{s}_{it} \approx \phi_i^T \mathbf{x}^2_{it} + f_{batt}(t) \quad \forall t \tag{4.6}$$
4.1.3 Requirements and Assumptions
We require that our model can be trained in an unsupervised manner. This means that for the customers
whose AMI measurements are to be disaggregated, there would not be historical disaggregated signals of
these customers for training the model.
We make the following assumptions in our model:
1. The dataset contains customers within the same region. In other words, they are under the influence
of common latent variables.
2. The customers in the dataset consist of those with and those without rooftop PVs installed. For a
small number of customers with PVs, the separated solar generation measurement is known.
3. Reverse flows are allowed in the network. This means that when the generation is greater than the
consumption for a customer, there is a reverse flow of electricity from the customer to the grid. We
define the reverse flow as a negative value in this thesis.
4.2 Consumer Mixture Model
4.2.1 Model Overview
In this section, we describe the model we have developed for modelling load and solar signals in disaggre-
gation. Our methodology consists of two main steps.
Figure 4.1: Overall architecture of the disaggregation model
The first step is a feature engineering step for extracting features used for performing the disaggregation.
This step takes a dataset of AMI measurements of customers within a neighborhood as inputs and
produces the features needed for the next step.
The second step is the disaggregation of AMI measurements. This involves training a model that
estimates the disaggregated signals using the features extracted from the first step as input. The output is
the disaggregated load consumption and solar generation time series.
We also describe a variation of our model for disaggregation in the presence of batteries in addition
to solar generation. A function for modelling the battery charge and discharge activity is added to the
generation signal model to account for their presence.
4.2.2 Feature Extraction
For our model, we need to engineer features that can capture information about load consumption and
solar generation.
For load consumption, we propose a novel model called the Consumer Mixture Model (CMM) for
modelling consumption of customers using a set of known major consumption patterns. To do this, we
first need to engineer the set of major consumption patterns in the neighborhood being investigated. These
are acquired by the following steps:
1. Assume the dataset of customers contains a mix of customers with or without PVs installed.
2. Use K-Means clustering [54] with $K = 2$ to separate customers into those with or without PVs.
3. Use K-Shape [76] on the set of customers without PVs to cluster these customers based on their
consumption patterns.
4. For each cluster, find the centroid using DTW barycenter averaging [80]. These centroids are the
major consumption patterns used for the disaggregation step.
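Step 2 of the list above can be sketched as follows. This is a plain two-cluster Lloyd iteration on raw daily net-load profiles, written only to illustrate the idea; the choice of raw profiles as the clustering input, the farthest-point initialization and the example data are assumptions, and Steps 3 and 4 (K-Shape and DTW barycenter averaging) are not reproduced here.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def two_means(profiles, iters=20):
    """Split profiles into two clusters with Lloyd's algorithm (K = 2)."""
    c0 = list(profiles[0])
    # Farthest-point initialization (an assumption, chosen for determinism).
    c1 = list(max(profiles, key=lambda p: sq_dist(p, c0)))
    labels = [0] * len(profiles)
    for _ in range(iters):
        new_labels = [0 if sq_dist(p, c0) <= sq_dist(p, c1) else 1
                      for p in profiles]
        for k, centroid in ((0, c0), (1, c1)):
            members = [p for p, lab in zip(profiles, new_labels) if lab == k]
            if members:
                centroid[:] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:
            break
        labels = new_labels
    return labels, c0, c1

# Made-up net-load profiles (night, morning, noon, evening).
profiles = [
    [1.0, 1.0, 1.0, 2.0],    # no PV: midday load stays positive
    [1.0, 1.2, 1.1, 2.1],    # no PV
    [1.0, 0.0, -1.0, 2.0],   # PV: midday net load goes negative
    [1.1, 0.1, -1.2, 2.0],   # PV
]
labels, centroid_a, centroid_b = two_means(profiles)
```

The customers in the cluster without the midday dip would then be passed on to the shape-based clustering of Step 3.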
For solar generation, we use a set of reference solar generation time series. We assume that a source
for solar generation data of a small number of solar PVs in the neighborhood is available. We call this the
reference solar generation dataset. This can be achieved by metering a small number of customers, which
incurs a much lower cost than installing metering instruments for all customers. Deployment of full
metering infrastructure for the whole grid is also difficult, because many customers may not accept
having their detailed electricity usage and generation monitored due to privacy reasons. Similar to feature
extraction for CMM, K-Shape is used to cluster the reference solar generation time series. The centroids
of each cluster found by DTW barycenter averaging are used as input features for the disaggregation step.
For both features, we use actual load and solar signals from other customers that do not need to have
their measurements disaggregated. The advantage of doing this is that these actual measurements
already reflect the effects of latent variables like weather and temperature that affect the signals. It is not
necessary to find and model all the relevant latent variables.
4.2.3 Aggregation of AMI measurements
When given a set of customer AMI measurements as inputs, there is a choice of performing disaggregation
on each customer individually, or aggregating the AMI measurements before disaggregation. We define
the term "Aggregation Level" to refer to the degree of aggregation performed, where the aggregation level
is the number of customers aggregated to form one aggregated measurement. For example, with 100
customers and aggregation level = 5, we would get 20 aggregated measurements where each one is an
aggregation of 5 customer measurements. Aggregation is done by summing the time series and
dividing by the aggregation level.
Figure 4.2: Grid topology
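The aggregation step described above can be sketched in Python as follows. Grouping consecutive customers is an assumption made for brevity; in practice the groups would come from the topology-driven or behavior-driven choices discussed in this section. The synthetic readings are made up.

```python
def aggregate(series, level):
    """Group consecutive customers into blocks of `level` and, for each block,
    sum the time series and divide by the aggregation level."""
    if len(series) % level != 0:
        raise ValueError("number of customers must be divisible by the level")
    aggregated = []
    for start in range(0, len(series), level):
        group = series[start:start + level]
        aggregated.append([sum(values) / level for values in zip(*group)])
    return aggregated

# 100 synthetic customers with 4 intervals each.
customers = [[float(c + t) for t in range(4)] for c in range(100)]
aggregated = aggregate(customers, level=5)
```

With 100 customers and an aggregation level of 5, the result contains 20 aggregated series, matching the example in the text.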
There are many reasons to perform aggregation. For large datasets, aggregation reduces the number
of data points and thus reduces running time. Aggregating also reduces Gaussian noise in the data by
smoothing the time series. However, one drawback is that information is lost when aggregation is done.
We will examine the effects of aggregation in our experiments.
From the point of view of the utility, there can be two ways of aggregation:
Topology-driven - In the grid, the most common topology design is the radial network. This network
forms a tree as transmission radiates out from the utility supply through progressively lower voltage
lines to their destinations. As a result, there is a natural clustering of customers based on the closest
common point they are connected to (typically a transformer). Information on load and solar for all the
consumers and generation downstream of a transformer can enable efficient voltage regulation at each transformer.
This is especially critical under the increasing penetration of highly variable renewable energy.
Fig. 4.2 shows a feeder with three levels of aggregation.
Behavior-driven - Customers to aggregate are chosen by clustering customers with similar load
consumption patterns. Our clusters mainly aim to group customers with daily load consumption curves
of similar shapes. This means that these customers react similarly to external factors like temperature,
day of the week, etc.
This type of aggregation is for cases where we have control over which customers to aggregate, for
example, when we are given individual customer measurements and thus can perform any kind of
preprocessing on them.
In the following subsections, when we refer to a customer, it can mean either an individual customer or an aggregated one.
This distinction will not affect the proposed methodology.
4.2.4 Disaggregation with Consumer Mixture Model
We perform the disaggregation of solar generation and load consumption signals with a model inspired by
the contextually supervised signal separation model [106]. In our method, we use the Consumer Mixture
Model (CMM) for modelling the customer load consumption signals. Depending on the number of data
sources we use for fitting the solar generation model, we use two different CMM variations. When
there is only one solar feature source (e.g. solar irradiation data), a simpler CMM can be used as fewer
parameters need to be fit. When more than one data source is used for solar features (e.g. multiple
metered solar generation sources nearby), the model is more complicated and we need to use an iterative
training method to fit CMM.
4.2.4.1 Regular CMM with 1 solar feature source
We perform unsupervised disaggregation on the AMI measurements with a modified version of the contextually
supervised signal separation model [106]. The model constrains the possible values that can be
taken by each separated signal using domain knowledge, then performs an optimization to estimate the
disaggregated signals for load and solar.
Figure 4.3: A simulated customer data sample for 1 day
Denote estimated disaggregated load and solar for customer $i$ at time interval $t$ as $\hat{l}_{it}$ and $\hat{s}_{it}$
respectively. As input for the disaggregation, the model requires not only AMI measurements from customers but
also a small amount of customer data, referred to as reference customer data (that is not from the customers
whose data needs disaggregation), with the consumption and solar generation components known, for tuning
the model. Another input that is needed is a set of customers without solar panel installations for feature
extraction, which will be referred to as non-PV customers.
The disaggregation is performed using features extracted from the feature extraction step. Using the
features, we model consumption load, solar generation and battery activity separately by different models.
Load is modelled with the set of centroids from step one, denoted by the vector $\mathbf{g}$, plus a constant variable
$l_{base}$ which represents consumption that is always present, as shown in (4.8). Solar is modelled by a linear
relationship with neighbouring PV generation $R_t$ from customers in the reference dataset in (4.9).
Denoting the AMI measurement at time $t$ for customer $i$ as $y_{it}$, we formulate the optimization (4.7) to
solve for the disaggregated components by finding the weight parameters $\theta$ and $\phi$, where the objective is to
minimize the l1-norm of the difference between the estimated AMI measurement ($\hat{l} + \hat{s}$) and the true AMI
measurement $y$.
$$\min_{\theta_i, \phi_i, l_{base}} \sum_t \left\| y_{it} - (\hat{l}_{it} + \hat{s}_{it}) \right\|_{\ell_1} \tag{4.7}$$
$$\text{subject to} \quad \hat{l}_{it} \approx \theta_i^T \mathbf{g}_t + l_{base} \tag{4.8}$$
$$\hat{s}_{it} \approx \phi_i R_t \tag{4.9}$$
$$\theta_i \geq 0 \tag{4.10}$$
$$\phi_i \geq 0 \tag{4.11}$$
However, solving the optimization is difficult as $l$ and $s$ are of opposite signs and therefore can grow
in magnitude together without increasing the error. To overcome this, we add an l1-regularizer to
$\theta$, which is tuned by the reference dataset, and solve for the load part and solar part separately. To solve for
$l$, we also consider the night time data only, where solar generation is close to zero. Thus, the objective
function is split into two parts, where one part considers the whole period of time $T$ while the other part
considers only the night time. The two parts are weighted by a hyperparameter $\alpha$. (4.12) forms the new
optimization, where $\lambda$ is the parameter of the regularizer for $\theta$. Then, $\phi$ is solved for by least squares in
(4.14). $\theta_i$ and $\phi_i$ are iteratively solved for until the values do not change by more than a certain threshold.
$$\min_{\theta_i, l_{base}} \; \alpha \sum_{t \in T} \left( (\hat{l}_{it} + \hat{s}_{it}) - y_{it} \right)^2 + \lambda |\theta_i|_{\ell_1} + (1 - \alpha) \sum_{t \in \text{nighttime}} \left( \hat{l}_{it} - y_{it} \right)^2 + \lambda |\theta_i|_{\ell_1} \tag{4.12}$$
$$\text{subject to} \quad \theta_{ij} \geq 0 \quad \forall j \tag{4.13}$$
$$\hat{l}_i + \phi_i R = y_i \tag{4.14}$$
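For a single solar feature, the least-squares solution of (4.14) has a closed form: with the load estimate held fixed, the coefficient that minimizes the squared residual is the ratio of two sums. The Python sketch below illustrates this under two assumptions that are made here and not stated in the text: the reference generation is recorded with a negative sign so that load and solar add up to the AMI measurement, and the function name and data are hypothetical.

```python
def solve_phi(y, load_est, ref_solar):
    """Least-squares coefficient phi minimizing
    sum_t (load_est[t] + phi * ref_solar[t] - y[t])^2."""
    numerator = sum(r * (yt - lt) for r, yt, lt in zip(ref_solar, y, load_est))
    denominator = sum(r * r for r in ref_solar)
    if denominator == 0:
        raise ValueError("reference solar series is identically zero")
    return numerator / denominator

# Made-up data: generation is negative under the assumed sign convention.
ref_solar = [0.0, -1.0, -2.0, -1.0]
load_est = [1.0, 1.5, 2.0, 1.5]
y = [l + 0.5 * r for l, r in zip(load_est, ref_solar)]  # true coefficient 0.5

phi = solve_phi(y, load_est, ref_solar)
```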
After $\theta_i$ and $\phi_i$ are found, we need to adjust $\hat{l}_i$ and $\hat{s}_i$ such that they add up to $y_i$. This is because the
solutions found are only approximations and do not satisfy the constraint that the disaggregated components
should add up to the original AMI measurement. Thus, we solve the following optimization, which
finds the estimated components that are as close as possible to the approximated solution while satisfying
the constraint. $\beta$ is a hyperparameter for deciding how much of the approximation error is assigned to
load or solar; we decide on the value of $\beta$ by validation on the reference dataset.
$$\min_{l_{it}, s_{it}} \sum_{t=0}^{T} \beta \left( l_{it} - \theta_i^T \mathbf{g}_t \right) + (1.0 - \beta) \left( s_{it} - \phi_i R_t \right) \tag{4.15}$$
$$\text{subject to} \quad l_{it} \geq 0, \quad s_{it} \leq 0 \quad \forall t \tag{4.16}$$
$$l_{it} + s_{it} = y_{it} \quad \forall t \tag{4.17}$$
4.2.4.2 Iterative CMM for multiple solar feature sources
Similar to the base CMM, we model the load and solar generation components as linear models dependent
on the input features from the feature extraction step. However, we cannot use least squares to fit the solar
linear model as there is more than one solar feature. To fit this model, we:
1. Solve for the load linear model. This is done with a convex optimization with the solar linear model
fixed.
2. Solve for the solar linear model. The load model from the previous step is used to find the solution
with a convex optimization.
3. Repeat Steps 1-2 until convergence or until the iteration threshold is reached.
The following subsection will describe this algorithm in detail.
Denote $\mathbf{g}_t = \{g_{1t}, g_{2t}, \ldots, g_{Dt}\}$ as the vector of values of the $D$ consumption pattern centroids
discovered from the feature engineering step at time interval $t$. Denote $\mathbf{R}_t = \{R_{1t}, R_{2t}, \ldots, R_{Mt}\}$ as the $M$
reference solar generation readings at time interval $t$. CMM approximates the load consumption
of each customer $i$ at time $t$, denoted as $l_{it}$, as a weighted sum of each centroid value with the weights
$\theta_i$ defined for each customer $i$.
$$l_{it} \approx \theta_i^T \mathbf{g}_t \tag{4.18}$$
The estimated solar generation $s_{it}$ is modelled as the reference solar generation $\mathbf{R}_t$ at time $t$ times the
weights $\phi_i$ defined for each customer $i$.
$$s_{it} \approx \phi_i^T \mathbf{R}_t \tag{4.19}$$
This model is trained in two steps by making use of the fact that there is no solar generation in night
time. Night time data consists of load consumption only.
Step 1: Find load feature coefficients $\theta_i$
We formulate a convex optimization problem to find $\theta_i$ for each customer $i$ as follows. Denote $T$ as
the set of time intervals to be considered, and $T_n$ as the subset of intervals that are in the night time. We
define night time as 7PM-5AM in our model. The problem formulation consists of three terms. (4.20)
minimizes the difference between the estimated disaggregated components ($l_{it} + s_{it}$) and the observed
AMI measurements $y_{it}$ over the full time period $T$. (4.21) concerns only the night time period $T_n$, so it
is assumed that the solar component $s_{it} = 0$. The importance of these two terms is weighted by the
parameter $K$. (4.22) is a lasso term with weight $\lambda$ that makes the problem prefer $\theta_i$ with more zero entries.
$$\min_{\theta_i} \; K \sum_{t \in T} \left( \theta_i^T \mathbf{g}_t + \phi_i^T \mathbf{R}_t - y_{it} \right)^2 \tag{4.20}$$
$$+ \, (1 - K) \sum_{t \in T_n} \left( \theta_i^T \mathbf{g}_t - y_{it} \right)^2 \tag{4.21}$$
$$+ \, \lambda |\theta_i|_1 \tag{4.22}$$
$$\text{subject to} \quad \theta_{ij} \geq 0 \quad \forall j \tag{4.23}$$
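Selecting the night-time subset $T_n$ is a simple masking operation. The Python sketch below follows the 7PM-5AM definition given above; treating 5AM itself as daytime (a half-open interval) and representing each interval by its hour of day are assumptions made for illustration.

```python
def is_night(hour):
    """Night time as defined in the text: from 7PM up to (not including) 5AM."""
    return hour >= 19 or hour < 5

def night_indices(hours):
    """Indices of the intervals that fall in the night time (the subset T_n)."""
    return [t for t, hour in enumerate(hours) if is_night(hour)]

# Two days of hourly intervals.
hours = [t % 24 for t in range(48)]
T_n = night_indices(hours)
```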
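Step 2: Find solar coefficients $\phi_i$
Once we have $\theta_i$, we form an optimization problem similar to the formulation for finding $\theta_i$, except that we do
not need to consider the $T_n$-only term.
$$\min_{\phi_i} \sum_{t \in T} \left( \theta_i^T \mathbf{g}_t + \phi_i^T \mathbf{R}_t - y_{it} \right)^2 + \lambda |\phi_i|_1 \tag{4.24}$$
$$\text{subject to} \quad \phi_{ij} \geq 0 \quad \forall j \tag{4.25}$$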
Iterate Step 1 and Step 2 until convergence:
The ultimate goal of our disaggregation algorithm is to find the coefficients $\alpha_i$ for the load model in Step 1, and the coefficients $\beta_i$ for the solar model in Step 2. The disaggregated components can then be retrieved by Eq. (4.18) and (4.19). The two steps are iterated until either the number of iterations reaches the defined maximum threshold or convergence occurs. Convergence is defined as the point where the L1-norm of the difference between the coefficient values solved in the previous iteration and those of the current iteration is smaller than a threshold. On the first iteration, since there are no solved values for the solar coefficients $\beta_i$ yet, we set $K = 0$ so that only the night time values are considered. Because there is no solar generation in night time, an approximation for $\alpha_i$ can be found from these values alone. For the remaining iterations, $K$ is set to a predetermined value found by hyper-parameter search. Algorithm 1 summarizes the overall procedure.
Algorithm 1: Disaggregation Algorithm with CMM
Disaggregation(x, K', MaxIter, $\epsilon$): // x: Input features, K': Parameter for weights in Eq. (4.20),
// MaxIter: Maximum number of iterations, $\epsilon$: Convergence threshold
iter = 0
While (iter < MaxIter):
    if (iter == 0):
        K = 0 // On first iteration, consider only night
    else:
        K = K'
    Denote $\alpha'_i$ and $\beta'_i$ as solution from prev. iteration
    Solve for $\alpha_i$ using Eq. (4.20) - (4.23)
    Solve for $\beta_i$ using Eq. (4.24) - (4.25)
    if (iter > 0 and $|\alpha_i - \alpha'_i|_1 + |\beta_i - \beta'_i|_1 < \epsilon$):
        break
    iter = iter + 1
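The alternating procedure of Algorithm 1 can be sketched in plain Python as below. This is only an illustrative sketch under several assumptions that are not from the text: the inner solver is a simple projected proximal-gradient loop (the thesis only states that a convex optimization is solved), the names `solve_nonneg_lasso` and `disaggregate` are hypothetical, the data are synthetic, and generation is stored as negative load so that non-negative solar weights reproduce a net-load signal.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def solve_nonneg_lasso(X, target, weights, lam, w0, steps=2000):
    """Minimise sum_t weights[t]*(X[t].w - target[t])^2 + lam*sum(w) over w >= 0
    with projected proximal-gradient steps (illustrative solver only)."""
    d = len(w0)
    # Weighted Gram matrix A and vector b, so that the gradient is 2*(A w - b).
    A = [[sum(wt * x[j] * x[k] for wt, x in zip(weights, X)) for k in range(d)]
         for j in range(d)]
    b = [sum(wt * x[j] * y for wt, x, y in zip(weights, X, target)) for j in range(d)]
    lip = 2.0 * sum(A[j][j] for j in range(d))  # upper bound on the Lipschitz constant
    if lip == 0.0:
        return list(w0)
    w = list(w0)
    for _ in range(steps):
        grad = [2.0 * (dot(A[j], w) - b[j]) for j in range(d)]
        # For w >= 0 the L1 norm equals sum(w), so the prox step is a shifted clip at zero.
        w = [max(0.0, w[j] - (grad[j] + lam) / lip) for j in range(d)]
    return w

def disaggregate(g, R, y, night, k_prime, lam, max_iter=50, eps=1e-6):
    n = len(y)
    alpha, beta = [0.0] * len(g[0]), [0.0] * len(R[0])
    for it in range(max_iter):
        K = 0.0 if it == 0 else k_prime  # first pass: night-time term only
        prev = alpha + beta
        # Step 1, Eq. (4.20)-(4.23): full-period rows weighted K, night rows weighted 1-K.
        rows = g + [g[t] for t in night]
        tgt = [y[t] - dot(beta, R[t]) for t in range(n)] + [y[t] for t in night]
        wts = [K] * n + [1.0 - K] * len(night)
        alpha = solve_nonneg_lasso(rows, tgt, wts, lam, alpha)
        # Step 2, Eq. (4.24)-(4.25): fit the solar weights to what the load model leaves.
        resid = [y[t] - dot(alpha, g[t]) for t in range(n)]
        beta = solve_nonneg_lasso(R, resid, [1.0] * n, lam, beta)
        change = sum(abs(a - b) for a, b in zip(alpha + beta, prev))
        if it > 0 and change < eps:
            break
    return alpha, beta

# Synthetic hourly data for two days (all values are made up for illustration).
hours = [t % 24 for t in range(48)]
g = [[1.0 + 0.5 * math.cos(2 * math.pi * h / 24), 0.5 + 0.2 * (h % 3)] for h in hours]
# Generation is stored as negative load here (an assumed sign convention).
R = [[-max(0.0, math.sin(math.pi * (h - 6) / 12))] for h in hours]
night = [t for t, h in enumerate(hours) if h >= 19 or h < 5]  # 7PM-5AM
true_alpha, true_beta = [0.7, 0.3], [2.0]
y = [dot(true_alpha, g[t]) + dot(true_beta, R[t]) for t in range(48)]

alpha_hat, beta_hat = disaggregate(g, R, y, night, k_prime=0.5, lam=1e-3)
```

On this noiseless toy signal the first pass recovers the load weights from the night rows alone, which is the role the text assigns to setting $K = 0$; later passes only refine the estimates.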
4.2.5 Base CMM with Battery Model
To modify the base CMM method to model batteries as well, we propose the hidden-battery model, which approximates the battery charging curve for each day as a sine curve with amplitude $\phi_t$ at time $t$, where one amplitude value is assigned to all time intervals within the same day. This ensures that the area under the negative region (representing battery discharge) is equal to that of the positive region before it (representing battery charge). Equation (4.7) is solved by iterating between the hidden-battery model function and the load and solar models until the results change by no more than a certain threshold. Denote the estimated battery curve as $f_{batt}(t)$, which is defined as
$$f_{batt}(t) = \sin(t)\,\phi_t \tag{4.26}$$
where $\sin(t)$ should have the same period as the number of intervals per day in the data, and $\phi_t$ has the same value for $t_1$ and $t_2$ if they are time intervals from the same day.
Equations (4.8) and (4.9) are used for this model as well, except that the estimated load is appended with the battery function.
$$\hat{l}_{it} \approx \alpha_i^T g_t + l_{base} + f_{batt}(t) \tag{4.27}$$
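After each iteration of updating $\alpha_i$ and $\beta_i$, we perform an extra optimization (4.28) for updating the battery function.
\begin{align}
\min_{\phi_t} \; & y_i - (\hat{l}_i + \hat{s}_i + f_{batt}(t)) \tag{4.28} \\
\text{subject to} \; & \phi_t \ge 0 \;\; \forall t \tag{4.29}
\end{align}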
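The shape of the hidden-battery curve can be illustrated with a short sketch. It assumes that "the same period as the number of intervals per day" means a sine of the form sin(2*pi*k/P) for interval k of a day with P intervals; the function name `battery_curve` and the amplitude values are hypothetical.

```python
import math

def battery_curve(daily_amplitudes, intervals_per_day):
    """Return f_batt over consecutive days: one sine period per day, one amplitude per day."""
    curve = []
    for amplitude in daily_amplitudes:
        for k in range(intervals_per_day):
            curve.append(amplitude * math.sin(2.0 * math.pi * k / intervals_per_day))
    return curve

P = 48  # 30-minute intervals per day
curve = battery_curve([1.5, 0.5], P)

# Net energy per day: the positive (charge) half cancels the negative (discharge) half.
daily_net = [sum(curve[d * P:(d + 1) * P]) for d in range(2)]
```

Because each day spans exactly one period, the charge and discharge areas cancel regardless of the amplitude chosen for that day, which is the property the model relies on.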
4.2.6 Post-Disaggregation Adjustment
The estimates computed from Equations (4.18) and (4.19) are only approximations and do not satisfy the constraint that the sum of load and solar equals the AMI measurement. Thus, after the disaggregation optimization, we perform the following optimization, which enforces this constraint. The parameter $\rho$ controls whether more of the error is placed on the load or on the solar generation.
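\begin{align}
\min_{l_{it}, s_{it}} \; & \sum_{t=0}^{T} \rho\,(l_{it} - \alpha_i^T g_t) + (1.0 - \rho)(s_{it} - \beta_i^T R_t) \tag{4.30} \\
\text{subject to} \; & l_{it} \ge 0, \; s_{it} \le 0 \;\; \forall t \tag{4.31} \\
& l_{it} + s_{it} = y_{it} \tag{4.32}
\end{align}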
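Because the adjustment decouples across time steps, each interval can be solved on its own. The sketch below is a hedged illustration, not the thesis's implementation: it assumes squared deviations in the objective, a non-negative load and a non-positive solar component (generation as negative load), and a hypothetical weight `rho` in [0, 1]. Under these assumptions the per-interval optimum has a closed form followed by a clip.

```python
def adjust(load_est, solar_est, y, rho):
    """Split the residual y - (load + solar) between load and solar so that l + s = y.

    Assumed objective per interval: rho*(l - a)^2 + (1 - rho)*(s - b)^2,
    with l >= 0 and s <= 0 (assumed sign convention).
    """
    loads, solars = [], []
    for a, b, yt in zip(load_est, solar_est, y):
        residual = yt - a - b
        l = a + (1.0 - rho) * residual  # unconstrained optimum on the line l + s = y
        l = max(l, 0.0, yt)             # enforce l >= 0 and s = y - l <= 0
        loads.append(l)
        solars.append(yt - l)
    return loads, solars

# Toy values for two intervals (made up for illustration).
loads, solars = adjust([1.0, 0.8], [-0.5, 0.0], [0.7, 0.9], rho=0.5)
```

In this sketch a larger `rho` keeps the load closer to its model estimate and pushes more of the residual onto the solar component; each interval costs constant work, so the whole step is linear in the number of intervals.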
4.2.7 Complexity
For disaggregation of a set of customer measurements, the algorithm performs a convex optimization in each of Steps 1 and 2, and another optimization in Step 3. Steps 1 and 2 are repeated until convergence or until the iteration count threshold is reached, so the runtime depends on the threshold value. Step 3 takes linear time, $O(T)$, because the values of $l_{i,t}$ and $s_{i,t}$ at each time step $t$ are independent of other time steps, so the solution can be found very quickly.
Overall, these 3 steps are repeated for each customer, so as the number of customer measurements increases, the runtime grows linearly. We study the empirical runtime in the experiments section.
4.3 Experimental Results
4.3.1 Performance of Base CMM
4.3.1.1 Datasets
In the experiment, we use a dataset from Pecan Street, Inc. [35] for consumer data. We utilize the following
data on consumers from the dataset: whether they have PV systems, AMI measurements, energy generated
by the PV systems, and their consumption. We extract 15-minute interval data from all consumers in
Austin, Texas only, to comply with our assumption that all consumers are in the same region or city. This
gives us data on 197 consumers with PVs and 191 consumers without PVs. It is important to note that even
though there are many public datasets with solar generation data available, few of them provide separated
consumption and generation data like the Pecan Street dataset. For example, popular datasets for NILM
like DRED [100] only provide consumption data.
We also utilize the National Solar Radiation Database [29] provided by the National Renewable Energy Laboratory in our experiments. This dataset contains solar irradiance data for the entire US at a 4 km × 4 km grid resolution in 30-minute intervals.
4.3.1.2 Experiments
Feature Engineering We took 197 consumers with rooftop PVs and 191 consumers without, and clustered them into two clusters using their AMI measurements from October 2015. We use the true labels of the consumers to evaluate the quality of the clusters by precision, recall and F1-score.
We also took the 191 consumers without rooftop PVs, clustered them based on their consumption patterns, and used the centroids of the clusters for the disaggregation step. The optimal hyperparameters for the clustering algorithm were picked by using the Silhouette Coefficient [84]. We varied the number of clusters, the warping penalty and window size for DTW.
Disaggregation For our experiments, we tested the results of the disaggregation methods on 197 consumers with rooftop PVs from October 16th 2015 to October 30th 2015. The centroids of clustering from the feature engineering step are used as input features.
There are several hyperparameters that need to be chosen in our methods, including the time period length $T$, $\rho$ for the post-optimization adjustment, and $\lambda$, which is the coefficient for L1 regularization. Considering the cost of installing additional meters to obtain ground truth for a validation dataset, we assume that it is not easily obtainable and thus only use 10% of the consumers for validation. In our experiments, we first compared the different methods with the optimal $\rho$ and $\lambda$ while keeping $T$ = 96 intervals × 14 days, which is over the whole testing time period.
Afterwards, we investigated the choice of $T$ for the best unsupervised method. We chose $T$ from {2, 4, 7, 14, 28} days × 96 intervals. Then, we solved each $T$ interval as a separate optimization problem. We use 4 weeks of data from October 3rd 2015 to October 30th 2015 to evaluate the choice of $T$.
Baseline As a baseline, we assume we have access to ground truth, i.e., historical disaggregated consumption $l_i$ and generation $s_i$ for a certain time period, which is used to find $\alpha_i$ and $\beta_i$ through linear regression. This model was trained on disaggregated signals from October 1st 2015 to October 15th 2015.
Class | Precision | Recall | F1-Score | Support
0 | 0.89 | 1.00 | 0.94 | 191
1 | 1.00 | 0.88 | 0.94 | 197
Total | 0.95 | 0.94 | 0.94 | 388
Table 4.1: Identification of consumers with and without PVs
The results are evaluated by computing the mean squared error (MSE) and mean absolute scaled error (MASE) between the disaggregated consumption load and solar generation and the true load and generation respectively. MSE takes the squared error, which is more sensitive to outliers. MASE scales each time series by the error of naive forecasts, so it is independent of the scale of the original data; this gives a fairer comparison among the errors of different consumers.
We use MASE over the more commonly used Mean Absolute Percentage Error (MAPE) metric because there are many zero or close-to-zero values in our data, which cause the percentage error to be extremely high regardless of the absolute error. Thus, we use MASE, which is also able to scale errors to treat all consumers equally.
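$$\mathrm{MSE} = \frac{1}{N} \sum_{i=1}^{N} (x_i - \hat{x}_i)^2 \tag{4.33}$$
$$\mathrm{MASE} = \frac{N-1}{N} \cdot \frac{\sum_{i=1}^{N} |x_i - \hat{x}_i|}{\sum_{i=2}^{N} |x_i - x_{i-1}|} \tag{4.34}$$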
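A small sketch may help show why MASE is comparable across consumers while MSE is not. It uses the standard definition of MASE (mean absolute error divided by the mean absolute one-step naive-forecast error), which is assumed to be what Eq. (4.34) expresses; the numbers are made up.

```python
def mse(actual, estimate):
    n = len(actual)
    return sum((x - xh) ** 2 for x, xh in zip(actual, estimate)) / n

def mase(actual, estimate):
    n = len(actual)
    mean_abs_error = sum(abs(x - xh) for x, xh in zip(actual, estimate)) / n
    # Error of the naive forecast that predicts each value by the previous one.
    naive_error = sum(abs(actual[i] - actual[i - 1]) for i in range(1, n)) / (n - 1)
    return mean_abs_error / naive_error

actual = [0.0, 1.0, 3.0, 2.0]
estimate = [0.5, 1.0, 2.0, 2.0]
small_mse, small_mase = mse(actual, estimate), mase(actual, estimate)

# The same consumer with ten times the consumption: MSE grows 100x, MASE is unchanged.
big_actual = [10 * x for x in actual]
big_estimate = [10 * x for x in estimate]
big_mse, big_mase = mse(big_actual, big_estimate), mase(big_actual, big_estimate)
```

The zero in `actual` would make a percentage error undefined at that point, whereas both metrics above remain well defined.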
4.3.1.3 Results and Discussion
Feature Engineering Table 4.1 shows the results of clustering a collection of the AMI readings in October 2015 of 197 consumers with PVs and 191 without, after excluding users with invalid values. We obtain an F1-Score of 0.94 overall, which means that a very high percentage of consumers are classified correctly. This supports our assumption that we can have separated datasets of consumers with and without PVs for the disaggregation step.
Figure 4.4: Cluster centroids of consumption of consumers without PVs
Figure 4.5: Disaggregated consumption load for a user compared to the true load
Figure 4.6: Disaggregated solar generation for a user compared to the true solar generation
Method | Consumption MSE | Solar MSE
Baseline Supervised | 0.4318 | 0.5506
Unsupervised (Whole Period) | 0.2600 | 0.3864
Unsupervised (Night-time Split) | 0.2494 | 0.3762
Table 4.2: MSE comparison of various training methods
Fig. 4.4 shows the centroids of the 8 clusters of consumption patterns of consumers without PVs. Based on the Silhouette Coefficient, the best hyperparameters are 8 clusters with a window size of 3. The figure shows that there are clearly different types of consumption behavior among consumers. These centroids are used in the disaggregation step as input features for modelling the consumption of consumers with PVs.
Figure 4.7: Disaggregated consumption load for a user compared to the true load for different T values
Method | Consumption MASE | Solar MASE
Baseline Supervised | 0.7606 | 3.7034
Unsupervised (Whole Period) | 0.6246 | 3.3048
Unsupervised (Night-time Split) | 0.6050 | 3.3390
Table 4.3: MASE comparison of various training methods
Disaggregation Tables 4.2 and 4.3 show the MSE and MASE respectively between the true consumption and solar generation signals and the disaggregated signals of the 3 methods described above. Out of the 2 unsupervised methods, training by night-time split performed better than training by whole period on every measure except solar MASE, where the two are close (3.3390 versus 3.3048). Training by whole period does not perform best because fitting the centroids to the readings in day time does not give any extra information, as the patterns with solar generation are completely different from those without. On the other hand, getting a good fit for the consumption at night helps us find a good mapping from the PV consumers to the clusters.
The results of the unsupervised methods are also better than those of the baseline supervised method. This means that even without ground truth for the disaggregated signals, our model is able to fit the data better than a model for which ground truth is available. The supervised model is not the best because we fit a linear model on the first half of October and apply it to the second half of October, and a model that fits the first half well is unlikely to model the rest of October accurately. The reason for this is that consumer consumption behaviors and weather effects can change significantly within a month.
Fig. 4.5 and 4.6 illustrate the disaggregated signals compared to the true signals for one user over 14 days. We can see that for this user, we get a good estimate of the coefficient for modeling solar generation, and derive the disaggregated load from the disaggregated solar and the AMI measurements by using the night-time split unsupervised method.
Figure 4.8: MSE for various T values
Figure 4.9: MASE for various T values
Since the night-time split method performs better, we study the effects of varying the period of time $T$ on which the model is trained with this method. Fig. 4.8 and 4.9 show the MSE and MASE for various $T$ values respectively. For small $T$ values, the errors are large in general. This may be due to the lack of enough data to fit a good model. The best $T$ values are 1344 and 2688, which represent training a model for every half a month, and training only one model for the whole month respectively. If we consider the MSE, $T = 2688$ performs the best. However, if we look at the MASE, $T = 1344$ performs better. Therefore the choice of $T$ depends on the objective: if every consumer is to be treated "equally" then MASE (scaled consumption) should be used; otherwise, MSE (unscaled consumption) should be used.
The choice of $T$ is also dictated by how quickly the consumer behavior is expected to change (seasonal changes). For example, Fig. 4.7 shows a user that has a significantly different consumption behavior in the first half than in the second half of the extracted subsequence of the time series. By using a shorter $T$, a different model can be fit for each half, so that the disaggregated consumption load is less overestimated compared to when a longer $T$ is used.
4.3.2 Performance of Iterative CMM
4.3.2.1 Dataset
For customer data, we used the dataset provided by Ausgrid [82], which contained measurements from 300 customers in 30-min intervals. The dataset recorded measurements separated into 3 components: Gross Generation (GG), General Consumption (GC), and Controlled Load Consumption (CL). GG measures the generation from solar PVs. CL refers to the utility-controlled water heating system that exists for 137 of the 300 customers. GC refers to the remaining general load consumption of each customer. Data from 2010 to 2012 was available; we focused on data from February 2012 and August 2012 in our experiments.
For temperature data, we took data for Sydney, Australia from NNDC Climate Data Online [70]. The dataset provided the temperature in Sydney in 30-minute intervals in 2012.
4.3.2.2 Data Preprocessing
We performed linear interpolation to fill in any missing values in the temperature data.
The Ausgrid dataset had no significant missing values, so there was no need to perform any kind of interpolation on it. We calculated the total consumption and net load (AMI measurement) for each customer:
$$\text{Total Consumption} = GC + CL \tag{4.35}$$
$$\text{AMI measurement} = GC + CL - GG \tag{4.36}$$
We divided the dataset of 300 customers into two halves. For the first 150 customers, we removed solar generation from their measurements and treated them as customers without PVs. To preprocess the load consumption time series of each customer for the feature extraction step, each time series was scaled to values between 0 and 1.
For the next 115 consumers, we split the dataset as follows: 15 customers for reference purposes, assuming the disaggregated signals are known; and 100 customers for evaluating the algorithms, where only AMI measurements were known to the algorithms. For the reference data, the solar generation time series was scaled to values between 0 and 1 for clustering.
4.3.2.3 Evaluation Metrics
We consider three different error measures: Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE), and Mean Absolute Scaled Error (MASE).
MAE considers the $\ell_1$-norm of the error, which simply measures the absolute difference between the estimated and the actual values. This means that data points with larger actual values have a greater significance in the error, as they are prone to having greater errors. Thus, we also consider the MAPE, which calculates the percentage error instead and therefore treats every data point equally, as the error is scaled. However, the MAPE has the problem that if the actual values are too small, the error approaches infinity. In our experiments, we remove actual values smaller than 1.0 from the calculation of MAPE. Such values usually occur for night time solar generation (as there is no solar generation at night), so ignoring them is acceptable because they are of little importance for most applications. An alternative to MAPE for a scaled error is the MASE, which scales the errors by the mean of the differences between consecutive data points.
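$$\mathrm{MAE} = \left\| \frac{x - \hat{x}}{N} \right\|_{\ell_1} \tag{4.37}$$
$$\mathrm{MAPE} = \frac{1}{N} \left\| \frac{x - \hat{x}}{x} \right\|_{\ell_1} \times 100\% \tag{4.38}$$
$$\mathrm{MASE} = \frac{N-1}{N} \cdot \frac{\| x - \hat{x} \|_{\ell_1}}{\sum_{i=2}^{N} |x_i - x_{i-1}|} \tag{4.39}$$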
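The filtering rule for MAPE can be sketched as follows. The cutoff of 1.0 comes from the text; treating it as a cutoff on the magnitude of the actual value, averaging over the retained points only, and the function names are assumptions made for this illustration.

```python
def mae(actual, estimate):
    return sum(abs(x - xh) for x, xh in zip(actual, estimate)) / len(actual)

def mape(actual, estimate, min_actual=1.0):
    """Percentage error over points whose actual magnitude is at least min_actual."""
    kept = [(x, xh) for x, xh in zip(actual, estimate) if abs(x) >= min_actual]
    if not kept:
        return float("nan")  # nothing left to average
    return 100.0 * sum(abs((x - xh) / x) for x, xh in kept) / len(kept)

# Made-up solar readings: the first two are near-zero night-time values.
actual = [0.0, 0.2, 2.0, 4.0]
estimate = [0.1, 0.1, 1.5, 5.0]
solar_mae = mae(actual, estimate)
solar_mape = mape(actual, estimate)
```

Without the filter, the first point would divide by zero and the second would contribute a 50% error for an absolute miss of only 0.1, which is the distortion the text describes.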
In our experiments, an estimated load and solar measurement is acquired from each disaggregation method for each customer. The load error is calculated as the error between the estimated load and the actual load measurement. Similarly, the solar error is the error between the estimated solar and the actual solar measurement. The final error is the mean of the errors of all customers. Since fitting each disaggregation model does not require any training data, the testing error is computed using all data not used in hyper-parameter tuning or as solar reference.
4.3.2.4 Experimental Setup
For our experiments, we use the 30-min interval dataset described in Section 4.3.2.1. First, we perform the feature engineering step to produce the features needed for disaggregation. Our method is compared against two different baselines as described below, using the evaluation metrics defined in Section 4.3.2.3.
Algorithms We test two variations of each method based on how the customer measurements are aggregated.
Random: Refers to topology-driven aggregation. Since there is no topology information in the datasets used, we simulate the effect of topology by randomly picking customers for aggregation. The aggregation level is 5 in our experiments.
Cluster: Refers to behavior-driven aggregation. We cluster the customers with K-Shape [76] to group those of similar behavior together. The cluster number is 5 in our experiments.
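The random variant can be sketched as below. This is a hedged illustration only: it assumes that aggregating customers means summing their measurements interval by interval, and the function name `aggregate_randomly`, the seed, and the toy series are hypothetical.

```python
import random

def aggregate_randomly(series_by_customer, level, seed=0):
    """Shuffle customers, cut them into groups of `level`, and sum each group's series."""
    ids = sorted(series_by_customer)
    random.Random(seed).shuffle(ids)
    aggregated = []
    for start in range(0, len(ids), level):
        members = ids[start:start + level]
        summed = [sum(values) for values in zip(*(series_by_customer[c] for c in members))]
        aggregated.append((members, summed))
    return aggregated

# Ten made-up customers with three intervals each, aggregated at level 5.
series = {f"c{i}": [float(i), float(i) + 1.0, float(i) + 2.0] for i in range(10)}
groups = aggregate_randomly(series, level=5)
```

Each tuple in `groups` holds the member ids and their summed measurement, so the number of aggregated measurements is the number of customers divided by the aggregation level.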
We compare the effectiveness of CMM in disaggregation against two baseline models:
Temperature Features Regression: Assume that 1 extra month of load data is available for each customer to learn a temperature-based load model by regression. Then, for the period of time where the AMI measurement is to be disaggregated, estimate the load signal with the load model. Then, find the solar generation by deducting the load from the AMI measurement. The post-disaggregation adjustment described in Section 4.2.6 is then applied. We follow the process defined in [63] for generating input features for the regression model based on temperature. Each temperature value is distributed into bins that represent ascending consecutive temperature ranges. Denote $\tau = \{\tau_1, \tau_2, \ldots, \tau_k\}$ as the bins and $\epsilon_1, \epsilon_2, \ldots, \epsilon_{k-1}$ as the upper temperature range edges of the respective bins. The final bin has no upper limit, so $\epsilon_k$ is not defined. Given a temperature value $X$, the bin values are assigned as follows:
$$\tau_i = \begin{cases} \min(X, \epsilon_1) & i = 1 \\ \min(\epsilon_i - \epsilon_{i-1},\; X - \epsilon_{i-1}) & i = 2, \ldots, k-1 \\ \max(0,\; X - \epsilon_{k-1}) & i = k \end{cases} \tag{4.40}$$
In our case, we set $\epsilon_1 = 5$ and the following upper edges in increments of 3 (e.g. $\epsilon_2 = 8$, $\epsilon_3 = 11$). For example, if $X = 15$, then $\tau = \{5, 3, 3, 3, 1, 0\}$. $\tau$ is used as the input features for the regression model.
It is unlikely that load data is available for training the load model, so this method is not practical and is
solely done for evaluation purposes.
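The bin assignment of Eq. (4.40) can be checked with a few lines of Python. The edges 14 and 17 are inferred from the six-bin example and the increment of 3, and the clamp at zero for the middle bins is an added assumption (Eq. (4.40) as written has no such clamp); it only matters for temperatures below a bin's lower edge and does not change the $X = 15$ example.

```python
def temperature_bins(x, edges):
    """Spread temperature x over len(edges) + 1 consecutive bins, following Eq. (4.40)."""
    bins = [min(x, edges[0])]
    for i in range(1, len(edges)):
        width = edges[i] - edges[i - 1]
        # Assumed clamp at zero so bins above the temperature stay empty.
        bins.append(max(0, min(width, x - edges[i - 1])))
    bins.append(max(0, x - edges[-1]))
    return bins

edges = [5, 8, 11, 14, 17]  # upper edges; the last bin has no upper limit
features_15 = temperature_bins(15, edges)
features_20 = temperature_bins(20, edges)
```

For temperatures at or above the first edge, the bin values add up to the original temperature, so the regression effectively learns a separate slope for each temperature range.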
Kara: We implement a disaggregation model developed by Kara et al. (2018) [43] to perform disaggregation. The model aims to find the load $L_i$ and solar generation $S_i$ of each customer $i$ using input features $X_L$ and $X_i$ respectively. $X_L$ includes time of day information and temperature related features generated in the same way as for the Temperature Features Regression described above. $X_i$ includes solar generation time series from a separate tuning dataset. For these features we use the same reference dataset of solar generation used in the CMM methodology. The weighting terms $A_L$ for load and $A_i$ for solar are obtained by estimating the inverse of the variance of the errors of the load and solar modelling. The optimization formulation for this method is written below; please refer to the original paper for details of the implementation.
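\begin{align}
\min \; & \| A_{\hat{L}} (\hat{L}_t - X_{\hat{L}} \theta_{\hat{L}}) \|_2^2 + \| A_i \sum_t (S_i - X_i \theta_i) \|_2^2 \tag{4.41} \\
\text{subject to} \; & \hat{L}_t + S_i = Y_i \tag{4.42} \\
& S_i \le 0, \; \forall i \tag{4.43} \\
& L_i \ge 0, \; \forall i \tag{4.44} \\
& \theta_i \ge 0, \; \forall i \tag{4.45}
\end{align}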
We denote each method as follows:
• CMM : Our proposed methodology using a Consumer Mixture Model for modelling load
• CMM Temp : CMM with temperature features appended to the load modelling features
• Temp : Temperature Model Regression Baseline [63]
• Kara : Disaggregation Model using temperature features based on [43]
Feature Engineering We take 150 customers without rooftop PVs, and cluster them based on their consumption patterns. The optimal hyperparameters for the clustering algorithm are picked by using the Silhouette Coefficient [84]. We use the K-Shape algorithm for the clustering [76]. We vary the number of clusters, the warping penalty and window size for Dynamic Time Warping, and the number of clusters for K-Shape.
Afterwards, centroids of the clusters are computed for the disaggregation step using DTW Barycenter Averaging [80], which introduces a $\gamma$ parameter since we are using the soft-DTW variation, where $\gamma$ is the regularization parameter. The sets of centroids based on the clustering results for different numbers of clusters and values of $\gamma$ are saved separately, as these are also parameters of the disaggregation step.
Disaggregation For each disaggregation method, we perform a feeder-level disaggregation and calculate the MAE, MASE and MAPE. The errors are calculated with the disaggregated signals compared to the ground truth for each customer. Finally, the mean of the errors of all customers is taken to evaluate the disaggregation performance. These errors are calculated separately for load measurements and solar measurements.
For the CMM method, four parameters need to be tuned:
1. $\lambda$ - Regularization parameter for CMM weights
2. $K$ - Weight terms for night time only terms and full period terms
3. $\rho$ - Post optimization parameter
4. Number of clusters and $\gamma$ for clustering
These parameters can be tuned by the reference dataset using MAE for evaluation.
For all algorithms, we test the performance of the algorithms when the length of the training period $T = \{7, 14, 28\}$ days. To simplify experiments, when varying $T$, we fix the number of customers (before aggregation) $N = 100$. We also investigate the effects of changing the aggregation level on the performance of the CMM and Kara models when aggregating randomly or by clustering. Each experiment is repeated for data in February 2012 and August 2012 to test the algorithms under different conditions.
Runtime (Training) - For training the model on all $N$ customers over $T$ time intervals, record the time needed.
Figure 4.10: Centroids discovered from feature engineering step
4.3.2.5 Results and Discussion
Analysis of feature engineering
In the feature engineering step, consumption patterns of customers without PVs were clustered. The clustering was evaluated by the silhouette coefficient to pick the appropriate clustering for the disaggregation step. Fig. 4.10 shows the centroids used for the August 2012 dataset experiments, which exhibited distinct consumption patterns of different customers.
Comparison of various algorithms
Table 4.4: Disaggregation error for varying T when N = 100 in August 2012
Method | MAE Load (T = 7, 14, 28 days) | MAE Solar (T = 7, 14, 28) | MASE Load (T = 7, 14, 28) | MASE Solar (T = 7, 14, 28) | MAPE Load (T = 7, 14, 28) | MAPE Solar (T = 7, 14, 28)
CMM (Random) | 0.14 0.13 0.13 | 0.14 0.13 0.13 | 0.23 0.2 0.21 | 1.34 1.28 1.19 | 7.53 6.73 6.7 | 23.28 23.98 21.32
CMM Temp (Random) | 0.17 0.18 0.24 | 0.17 0.18 0.24 | 0.28 0.29 0.39 | 1.66 1.84 2.26 | 9.49 9.70 12.51 | 27.85 30.52 37.48
Kara (Random) | 0.18 0.15 0.16 | 0.18 0.15 0.16 | 0.3 0.24 0.28 | 1.73 1.47 1.52 | 9.95 7.93 8.79 | 29.91 27.78 27.69
Temp (Random) | 1.1 1.07 1.17 | 1.1 1.07 1.17 | 1.08 1.01 1.2 | 4.68 4.79 4.96 | 34.25 33.44 38.71 | 42.98 48.06 45.22
CMM (Cluster) | 0.39 0.41 0.41 | 0.39 0.41 0.41 | 0.26 0.25 0.26 | 1.14 1.13 1.01 | 6.69 5.96 6.26 | 22.68 23.68 19.76
CMM Temp (Cluster) | 0.33 0.83 1.04 | 0.33 0.83 1.04 | 0.20 0.48 0.64 | 0.89 1.95 2.39 | 5.14 11.04 15.62 | 14.66 38.65 44.84
Kara (Cluster) | 1.04 0.85 0.90 | 1.04 0.85 0.90 | 0.63 0.50 0.56 | 2.53 2.13 2.01 | 17.10 13.51 14.67 | 45.82 41.13 37.01
Temp (Cluster) | 1.47 1.51 1.7 | 1.47 1.51 1.7 | 1.11 1.06 1.28 | 4.8 4.92 5.17 | 27.07 24.73 29.91 | 43.25 53.64 52.30
Table 4.5: Disaggregation error for varying T when N = 100 in February 2012
Method | MAE Load (T = 7, 14, 28 days) | MAE Solar (T = 7, 14, 28) | MASE Load (T = 7, 14, 28) | MASE Solar (T = 7, 14, 28) | MAPE Load (T = 7, 14, 28) | MAPE Solar (T = 7, 14, 28)
CMM (Random) | 0.22 0.21 0.21 | 0.22 0.21 0.21 | 0.49 0.48 0.52 | 2.89 2.08 2.02 | 13.54 12.97 13.31 | 23.33 22.34 21.07
CMM Temp (Random) | 0.21 0.21 0.21 | 0.21 0.21 0.21 | 0.49 0.49 0.51 | 2.92 2.12 2.01 | 12.31 12.38 12.63 | 27.16 24.15 22.10
Kara (Random) | 0.24 0.24 0.25 | 0.24 0.24 0.25 | 0.55 0.59 0.61 | 3.06 2.39 2.29 | 15.69 16.57 16.8 | 44.02 39.39 37.42
Temp (Random) | 0.54 0.59 0.61 | 0.54 0.59 0.61 | 0.76 0.86 0.89 | 3.23 2.63 2.46 | 18.59 21.64 22.14 | 38.51 34.07 32.07
CMM (Cluster) | 0.87 0.83 0.71 | 0.87 0.83 0.71 | 0.80 0.79 0.67 | 3.05 2.22 1.67 | 18.01 17.02 14.07 | 54.27 46.84 35.7
CMM Temp (Cluster) | 0.67 0.60 0.58 | 0.67 0.60 0.58 | 0.60 0.56 0.55 | 2.45 1.70 1.48 | 14.14 13.05 12.52 | 36.88 28.56 24.37
Kara (Cluster) | 0.97 1.13 1.27 | 0.97 1.13 1.27 | 0.88 1.04 1.16 | 3.40 2.93 2.85 | 19.82 21.79 23.51 | 53.92 52.18 50.70
Temp (Cluster) | 0.88 0.88 0.90 | 0.88 0.88 0.90 | 0.74 0.78 0.8 | 3.05 2.44 2.26 | 16.15 17.00 16.90 | 43.30 36.59 33.09
Table 4.4 shows the results for each algorithm with varying $T$ when $N = 100$ with August 2012 data. In general, a longer period of disaggregation $T$ resulted in lower errors for all methods. A greater $T$ gives more information for disaggregation, resulting in better accuracy.
When comparing CMM, CMM Temp, Kara and the Temp model with random aggregation on August data, CMM performed best out of the four models in all error measures and $T$ values. For $T = 14$, the load MAE of the Kara model was 31.16% higher than that of CMM, and that of the Temp model was 737.5% higher. For $T = 28$, the load MAE of the Kara model was 28.37% higher than that of CMM, and that of the Temp model was 830.1% higher. Similar trends could be seen from MASE and MAPE, meaning that the results were not affected by the scale of the data. CMM Temp performance was around the same as that of the Kara model. From Table 4.5, CMM performed better than the non-CMM models on February data as well, but CMM Temp results were better than CMM in some of the cases. It was likely that the data was volatile during this period of time and the extra temperature features allowed for more accurate predictions.
For the experiments on cluster aggregation data in Table 4.4, CMM and CMM Temp performed better than the other two models on August data for all $T$ values. This result was consistent with the observations for random aggregation. For the February results in Table 4.5, the improvement of CMM over the Temp model was smaller for $T = 7$, where the load MAE of Temp was 1.96% higher than that of CMM, than for $T = 28$, where it was 26.95% higher. This showed that for CMM, which uses the contextually supervised source separation model, a longer disaggregation period $T$ allowed it to perform disaggregation more accurately. On the other hand, Temp relied on load prediction based on historical temperature data, so it became more difficult to predict the load further into the future as $T$ became larger. CMM Temp performed better than CMM and the other baselines in all cases for this particular set of experiments, showing that temperature features could be included in CMM to enhance accuracy. Similar to the random aggregation experiments with February data, this was caused by the volatility of the February data being higher than that of the August data. A more detailed discussion is given at the end of this section.
To visualize the disaggregation results, in Fig. 4.11 and 4.12 we plotted the load and solar respectively of the first 7 days of disaggregation results of CMM (Random) and Kara (Random) for a single aggregated measurement for the $N = 100$ and $T = 14$ case. The Temp model was left out because it had a significantly larger error than the other two models. In Fig. 4.11, both models performed well at night time (time intervals where solar generation was zero in Fig. 4.12). This was expected, as the load could be determined from the net load directly when solar generation was absent. For the time intervals when solar was present, CMM performed better and followed the ground truth more closely. This was observed more clearly in the solar disaggregation results in Fig. 4.12, as CMM was able to estimate solar generation more accurately than Kara.
Overall, it could be seen that the CMM model outperformed the other baselines. These results were consistent over different aggregation methods, seasonal differences and lengths of disaggregation.
Analysis of varying aggregation
We experimented with the effects of aggregation levels on different algorithms. For each algorithm, we fixed the number of customers $N = 100$ and the length of the disaggregation period $T = 14$ days. For clustering aggregation, the number of clusters was varied. For random aggregation, the aggregation level (number of aggregated customers per measurement) was varied.
Table 4.6: Disaggregation error for varying aggregation level when N = 100 and T = 14 for CMM (Random) in August 2012
Agg. level | MAE Load | MAE Solar | MASE Load | MASE Solar | MAPE Load | MAPE Solar
5 | 0.128 | 0.128 | 0.204 | 1.284 | 6.732 | 23.979
10 | 0.230 | 0.230 | 0.233 | 1.218 | 7.916 | 25.256
20 | 0.383 | 0.383 | 0.246 | 1.018 | 5.897 | 21.427
Table 4.7: Disaggregation error for varying aggregation level when N = 100 and T = 14 for Kara (Random) in August 2012
Agg. level | MAE Load | MAE Solar | MASE Load | MASE Solar | MAPE Load | MAPE Solar
5 | 0.150 | 0.150 | 0.243 | 1.465 | 7.928 | 27.779
10 | 0.362 | 0.362 | 0.373 | 1.869 | 13.132 | 36.206
20 | 0.835 | 0.835 | 0.544 | 2.200 | 14.696 | 42.853
Tables 4.6 and 4.7 show the results for the CMM and Kara models for random aggregation on August 2012 data. For CMM, MASE and MAPE remained almost the same as the aggregation level varied. For Kara, MASE and MAPE increased as the aggregation level increased. MAE increased along with the aggregation level for both models because the amplitude of each aggregated signal increased as the aggregation level increased. CMM also performed better in terms of all error measures for all aggregation levels. Since the scale of the data differed when the aggregation level changed, MASE and MAPE should be used to compare the different models. When varying aggregation levels between 5 and 20, the load MASE and MAPE of CMM varied within 0.042 and 2.019% respectively, while those of Kara varied within 0.301 and 6.768% respectively. The solar MASE and MAPE of CMM varied within 0.266 and 3.829% respectively, while those of Kara varied within 0.735 and 15.074% respectively. This showed that CMM was more robust when performing disaggregation on data of varying aggregation level. This can be explained by CMM performing disaggregation for each customer individually, while the Kara model solved for all customers in one optimization formulation. As a result, as the aggregation level increased, less customer data was available for disaggregation and the Kara model lost accuracy. The same trends were observed in Tables 4.10 and 4.11 for the same experiments performed on February 2012 data.
Table 4.8: Disaggregation error for varying cluster number for aggregation when N = 100 and T = 14 for CMM in August 2012
Cluster no. | MAE Load | MAE Solar | MASE Load | MASE Solar | MAPE Load | MAPE Solar
5 | 0.406 | 0.406 | 0.251 | 1.128 | 5.963 | 23.680
10 | 0.193 | 0.193 | 0.188 | 1.040 | 5.966 | 20.066
15 | 0.149 | 0.149 | 0.192 | 1.267 | 6.008 | 19.142
20 | 0.131 | 0.131 | 0.227 | 1.408 | 5.279 | 21.016
Table 4.9: Disaggregation error for varying cluster number for aggregation when N = 100 and T = 14 for Kara in August 2012
Cluster no. | MAE Load | MAE Solar | MASE Load | MASE Solar | MAPE Load | MAPE Solar
5 | 0.851 | 0.851 | 0.507 | 2.136 | 13.510 | 41.113
10 | 0.373 | 0.373 | 0.354 | 1.835 | 12.408 | 35.634
15 | 0.229 | 0.229 | 0.279 | 1.643 | 8.966 | 29.475
20 | 0.178 | 0.178 | 0.284 | 1.564 | 7.222 | 24.114
Tables 4.8 and 4.9 show the results for the CMM and Kara models for cluster aggregation on August 2012 data. For this experiment, increasing the cluster number was equivalent to decreasing the aggregation level. The trends that could be seen from the results were similar to the random aggregation experiment. MASE and MAPE decreased as the cluster number increased for both models. CMM performed better for all cluster numbers. The same conclusions could be drawn from Tables 4.12 and 4.13, which record the results of the same experiments performed on February 2012 data.
Remarks on February 2012 data
In all experiments, accuracy of disaggregation on the February data was lower in general compared
to that of August data. This could be explained by the much higher volatility of solar generation patterns
in February. Fig. 4.13 plotted 7 days of solar generation of a randomly selected customer in February and
Figure 4.11: First 7 days of load disaggregation results comparison of CMM (Random) and Kara (Random)
for a single aggregated measurement in the N = 100 and T = 14 case
Figure 4.12: First 7 days of solar disaggregation results comparison of CMM (Random) and Kara (Random)
for a single aggregated measurement in the N = 100 and T = 14 case
Figure 4.13: Comparison of the solar generation of a customer in August 2012 and February 2012 for 7 days
Table 4.10: Disaggregation error for varying aggregation level when N = 100 and T = 14 for CMM
(Random) in February 2012
Agg. level MAE (Load) MAE (Solar) MASE (Load) MASE (Solar) MAPE (Load) MAPE (Solar)
5 0.205 0.205 0.484 2.075 12.971 22.335
10 0.369 0.369 0.538 2.057 14.033 26.339
20 0.589 0.589 0.534 1.727 10.500 28.831
Table 4.11: Disaggregation error for varying aggregation level when N = 100 and T = 14 for Kara
(Random) in February 2012
Agg. level MAE (Load) MAE (Solar) MASE (Load) MASE (Solar) MAPE (Load) MAPE (Solar)
5 0.304 0.304 0.735 3.029 20.914 55.366
10 0.655 0.655 0.980 3.579 24.197 62.633
20 1.322 1.322 1.205 3.801 23.468 63.085
August. While the August pattern stayed mostly regular with slight variations over each day, the February
pattern changed greatly between days, which made it difficult to predict the February pattern.
Analysis of the case when reverse flows are not allowed
One assumption of our problem definition is that reverse flows are allowed in the grid. This means that
when solar generation is greater than consumption in a household, the excess generation flows from
the household to the grid. This is recorded as a negative load in the net load. However, in some scenarios
utility companies do not allow customers to put the excess generation back into the grid, forcing them to
curtail. In this case, when solar generation exceeds the consumption, a net load of zero is measured by the
AMI meter.
We demonstrate that our models can be used without any changes under such cases by evaluating
the disaggregation of CMM (Random) on a modified Ausgrid dataset. In the modified dataset, the solar
generation was curtailed whenever there was excess generation, instead of allowing reverse flows. Table
4.15 shows the results. Comparing the results against CMM (Random) on the original dataset in Table
4.6, we see that the error increased. In terms of MAE, the error increased by 21.09%, 14.34% and 25.06%
when the aggregation level was 5, 10 and 20 respectively. The accuracy was lower on the modified dataset
Table 4.12: Disaggregation error for varying cluster number for aggregation when N = 100 and T = 14
for CMM in February 2012
Cluster no. MAE (Load) MAE (Solar) MASE (Load) MASE (Solar) MAPE (Load) MAPE (Solar)
5 0.577 0.577 0.551 1.662 12.331 25.261
10 0.357 0.357 0.545 2.383 12.978 28.862
15 0.248 0.248 0.525 2.349 12.342 19.036
20 0.191 0.191 0.491 1.955 11.235 18.846
Table 4.13: Disaggregation error for varying cluster number for aggregation when N = 100 and T = 14
for Kara in February 2012
Cluster no. MAE (Load) MAE (Solar) MASE (Load) MASE (Solar) MAPE (Load) MAPE (Solar)
5 1.136 1.136 1.045 2.932 21.798 52.180
10 0.530 0.530 0.791 3.034 19.000 51.179
15 0.324 0.324 0.637 2.454 15.142 32.709
20 0.243 0.243 0.676 2.295 13.560 32.520
because less information regarding solar was available to the model for disaggregation. The shape of excess
solar generation provides crucial information for accurate solar modeling which is unavailable in the cases
when reverse flows are not allowed. Regardless, our CMM was still able to perform disaggregation on the
modified dataset with lower error than the baseline methods on the original dataset.
Analysis of runtime
The experiments were run on an Ubuntu 14.04.4 LTS system with 32 cores, 128 GB RAM and a clock
speed of 2.6 GHz. The GPU used was an Nvidia Tesla K40C. All methods were implemented in Python.
Table 4.14 shows the runtime of CMM and the Kara model under varying aggregation levels for a fixed N
and T. We saw that all algorithms increased in runtime linearly with decreasing aggregation level. This
showed empirically that the methods were scalable to large datasets, and agreed with the theoretical
analysis of runtime as well. Note that preprocessing time was not measured in these records and thus there
may be extra time needed for the initial run for tasks like feature engineering and customer clustering.
Table 4.14: Runtime (s) for varying aggregation level when N = 100, T = 14 with August 2012 Data
Method / Agg. level 5 10 20
CMM 22.889 11.094 6.641
Kara 16.210 4.125 1.344
Table 4.15: Disaggregation error for varying aggregation level when N = 100 and T = 14 for CMM
(Random) without reverse flows in August 2012
Agg. level MAE (Load) MAE (Solar) MASE (Load) MASE (Solar) MAPE (Load) MAPE (Solar)
5 0.155 0.155 0.243 2.177 7.476 35.338
10 0.263 0.263 0.262 2.196 10.112 51.352
20 0.479 0.479 0.303 2.170 8.527 39.474
4.3.3 Experiments in presence of batteries
4.3.3.1 Dataset Preparation
Various sources of residential AMI measurement data are available, but it is difficult to acquire datasets
for customers with both battery storage and PV installed. We generated simulated data with the System
Advisor Model (SAM) software developed by the National Renewable Energy Laboratory [7] using Pecan Street
Inc. customer electric load data as inputs for data with PV generation. SAM augments these data with battery
behavior through simulations. The Pecan Street Inc. dataset [21] contains unique, circuit-level electricity use
data at one-minute to one-second intervals for approximately 800 homes in the United States, with PV
generation and EV charging data for a subset of these homes.
We extracted 200 customers with PV and 200 customers without PV living in Austin, Texas from the
Pecan Street dataset. For each customer, we took their consumption data for 2018 in 15 minute intervals.
Clustering was performed on the non-PV customers in 15 minute intervals while the PV customer data
was downsampled to 60 minute intervals for SAM.
To preprocess the dataset, we first filtered out customers with missing data on over 1% of the intervals.
Afterwards, any missing values were filled by linear interpolation. This left us with 115 customers in the
dataset. For the load of each customer, we simulated PV and battery behavior using SAM with the solar
panel module SunPower SPR-X21-335, inverter module SMA America: SB3800TL-US-22 [240V] and a lithium
ion nickel manganese cobalt oxide battery model. Each customer was assigned a varying number of solar
panel modules and battery capacity to introduce some randomness in the dataset. The battery behavior
was set to be forward peak shaving. Figure 4.3 shows an example of customer data after the simulation.
4.3.3.2 Baselines
We compared our new method with two baselines: the base CMM method and a temperature-based load
model. These baselines are state-of-the-art for disaggregation of BTM PV, but do not consider the existence
of BTM battery.
The base CMM as proposed in [15] is the CMM method proposed in this thesis without the battery
model step. The details are described in section 4.2.
The temperature-based load model fits a temperature-dependent function that maps the current ambient
temperature to the current load consumption [63]. It is used in recent disaggregation algorithms like
[44]. In our experiments, we train the function using load data from each customer 2 months prior to the
period of time considered for disaggregation. Then, the function is used to predict the load consumption of
the customer, and the predicted load replaces the CMM for estimating load in the disaggregation optimization.
We experiment with this model both with and without hidden-battery modelling.
4.3.3.3 Experimental Setup
Our experiments were set up as follows. We prepared data for 115 customers with PV, modified with simulated
BTM battery behavior, and split the dataset into 100 customers for testing and 15 customers for
referencing in the CMM method. For each algorithm, the testing dataset of AMI measurements was
disaggregated into consumption, solar generation, and battery components. The results were then evaluated
by computing the error between the estimated disaggregated signals from the algorithms and the ground
truth for each kind of component; the errors of all components were then summed and the mean across
all customers was taken. We use Mean Absolute Error (MAE) and Mean Absolute Percentage Error
(MAPE) as the error metrics.
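$$\mathrm{MAE} = \left\| \frac{x - \hat{x}}{N} \right\|_{\ell_1} \qquad (4.46)$$
$$\mathrm{MAPE} = \frac{1}{N} \left\| \frac{x - \hat{x}}{x} \right\|_{\ell_1} \qquad (4.47)$$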
MAE computes the absolute error at each time interval while MAPE computes the percentage error.
As a result, MAE gives greater weight to customers with greater consumption and generation, while
MAPE weights all customers equally. When calculating MAPE, all time intervals where the ground truth
value is below 0.1 are ignored. This is done because MAPE tends to infinity when the real value is very
small.
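The two metrics can be illustrated with a minimal Python sketch. This is an illustration rather than the implementation used in the experiments: the function names and the `min_truth` parameter are hypothetical, MAPE is returned as a fraction, and we assume that the mean in MAPE is taken over the retained intervals only.

```python
def mae(truth, estimate):
    """Mean absolute error over all time intervals (Eq. 4.46)."""
    n = len(truth)
    return sum(abs(x - x_hat) for x, x_hat in zip(truth, estimate)) / n


def mape(truth, estimate, min_truth=0.1):
    """Mean absolute percentage error (Eq. 4.47), returned as a fraction.

    Intervals whose ground truth is below `min_truth` are skipped.
    Assumption: the mean is taken over the retained intervals only.
    """
    kept = [(x, x_hat) for x, x_hat in zip(truth, estimate) if x >= min_truth]
    if not kept:
        raise ValueError("no interval has ground truth >= min_truth")
    return sum(abs(x - x_hat) / abs(x) for x, x_hat in kept) / len(kept)


# Hypothetical signal: the second interval is below 0.1 and is ignored by MAPE.
truth = [2.0, 0.05, 4.0]
estimate = [1.5, 0.50, 5.0]
mae_value = mae(truth, estimate)
mape_value = mape(truth, estimate)
```

In this example the small ground truth value of 0.05 would contribute a percentage error of 900% on its own, which is why such intervals are excluded.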
Disaggregation period refers to the number of time intervals of measurements that are disaggregated
into their separate components. Aggregation level refers to the number of customers that
are aggregated into one measurement before performing disaggregation. For example, at aggregation
level 10, we aggregate the measurements of every 10 customers out of the 100 in the testing
dataset to form 10 new customer measurements. Aggregation was done by taking their arithmetic mean.
For each algorithm, we repeated the experiments to evaluate accuracy for varying disaggregation
periods of [15, 30, 45] days while keeping the aggregation level constant at 10, and for varying aggregation
levels of [1, 10, 25, 50] while keeping the disaggregation period constant at 30 days.
Finally, we also recorded the run time of each algorithm for disaggregating increasing number of cus-
tomers to study the empirical computational complexity of each algorithm.
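The aggregation step can be sketched as follows. This is a minimal illustration under stated assumptions: the function name is hypothetical, customers are grouped in list order, and the number of customers is a multiple of the aggregation level.

```python
def aggregate_customers(series, level):
    """Average every `level` consecutive customers into one measurement.

    `series` is a list of per-customer time series of equal length.
    Assumption: customers are grouped in list order, and the customer
    count is a multiple of `level`.
    """
    if level < 1 or len(series) % level != 0:
        raise ValueError("customer count must be a multiple of level")
    aggregated = []
    for start in range(0, len(series), level):
        group = series[start:start + level]
        # Arithmetic mean across the group at each time interval.
        aggregated.append([sum(vals) / level for vals in zip(*group)])
    return aggregated


# Hypothetical data: 100 customers with 3 time intervals each.
customers = [[float(c + t) for t in range(3)] for c in range(100)]
agg = aggregate_customers(customers, 10)  # 10 aggregated measurements
```

With 100 customers, aggregation levels of 1, 10, 25 and 50 give 100, 10, 4 and 2 aggregated measurements respectively.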
Figure 4.14: First day of data of centroids from the clustering results
Table 4.16: Disaggregation accuracy under varying length of disaggregation periods in MAE
Method \ Period (days) 15 30 45
CMM (Hidden-Battery) 0.933 1.028 0.999
CMM (No Hidden-Battery) 1.355 1.433 1.455
Temperature model (Hidden-Battery) 0.992 1.021 1.043
Temperature model (No Hidden-Battery) 1.173 1.227 1.243
4.3.3.4 Results and Discussion
Feature Engineering (Clustering) In order to build features for modelling customer consumption as
part of CMM, non-solar household consumption curves were clustered to discover major consumption
patterns in the neighborhood.
Based on evaluation by Silhouette Coefficient [84], the best number of clusters was 4. Figure 4.14 shows
the first day of data extracted from the centroids found from the clustering results.
Disaggregation First, we compared the disaggregation accuracy under varying lengths of disaggregation
periods while keeping the aggregation level constant at 10. Tables 4.16 and 4.17 summarize the results.
Table 4.17: Disaggregation accuracy under varying length of disaggregation periods in MAPE
Method \ Period (days) 15 30 45
CMM (Hidden-Battery) 0.732 0.763 0.710
CMM (No Hidden-Battery) 1.099 1.147 1.138
Temperature model (Hidden-Battery) 0.789 0.802 0.804
Temperature model (No Hidden-Battery) 0.958 0.992 0.981
Table 4.18: Disaggregation accuracy under varying customer aggregation levels in MAE
Method \ Aggregation level 1 10 25 50
CMM (Hidden-Battery) 1.144 1.028 1.003 1.009
CMM (No Hidden-Battery) 1.325 1.433 1.432 1.422
Temperature model (Hidden-Battery) 1.042 1.021 1.078 1.094
Temperature model (No Hidden-Battery) 1.187 1.227 1.220 1.218
Table 4.19: Disaggregation accuracy under varying customer aggregation levels in MAPE
Method \ Aggregation level 1 10 25 50
CMM (Hidden-Battery) 1.018 0.763 0.771 0.786
CMM (No Hidden-Battery) 1.125 1.147 1.140 1.133
Temperature model (Hidden-Battery) 0.955 0.802 0.856 0.873
Temperature model (No Hidden-Battery) 1.007 0.992 0.985 0.984
In general, hidden-battery models perform better than no hidden-battery models. To illustrate the difference
between performing the disaggregation with and without a hidden-battery model, we plotted the
disaggregated and ground truth measurements on the same plot for a particular user using either model
in Figs. 4.15, 4.16, 4.17 and 4.18. Figs. 4.15 and 4.16 show the solar disaggregation results without and
with the hidden-battery model respectively, and Figs. 4.17 and 4.18 show those of consumption loads. The figures
illustrate how, without the hidden-battery model, a large amount of the solar generation was missing from the
solar disaggregation estimate. This was because without the hidden-battery model, the disaggregation
model cannot account for the amount of solar generation or consumption that was hidden by the peak shaving
behavior of the battery, which results in underestimation in the disaggregated estimates.
With regards to the variations of disaggregation period, there did not seem to be a clear pattern that
indicates a relationship between the period and disaggregation accuracy.
Figure 4.15: Disaggregated solar and ground truth measurements of a user with no hidden-battery model
disaggregation
Figure 4.16: Disaggregated solar and ground truth measurements of a user with hidden-battery model
disaggregation
Figure 4.17: Disaggregated load and ground truth measurements of a user with no hidden-battery model
disaggregation
Table 4.20: Runtime (s) of various models for a disaggregation period of 30 days
Method \ No. of customers to disaggregate 2 4 10 100
CMM (Hidden-Battery) 3.221 5.798 13.329 125.908
CMM (No Hidden-Battery) 0.689 0.788 1.177 6.261
Temperature model (Hidden-Battery) 3.017 5.380 12.634 121.405
Temperature model (No Hidden-Battery) 0.613 0.708 1.089 5.492
We also evaluated the algorithms under different aggregation levels while keeping the disaggregation
period at 30 days. Tables 4.18 and 4.19 summarize the results. For aggregation level = 1, there was no clear
advantage of using the hidden-battery model, as our hidden-battery model used a sine curve to model a
simplified shape of the battery. However, at aggregation level = 10, 25 or 50, having a hidden-battery
model in the optimization improves the result significantly. At aggregation level = 10, MAE decreased by about
28% for the CMM (from 1.433 to 1.028) and by roughly 17% for the temperature model (from 1.227 to 1.021).
When comparing the CMM and the temperature model, the temperature model worked better at aggregation
level = 1 while CMM worked slightly better for other cases. The two models require different knowledge
and inputs for the disaggregation, hence it is not appropriate to directly compare the two models. In
general, the CMM requires less prerequisite knowledge, as it only requires separated BTM measurements
for a small number of reference households, while training the temperature load model requires
the pure load consumption of each household. The better performance of CMM at higher aggregation
levels can be explained by how aggregated load measurements can be better represented by centroids
of major consumption pattern clusters.
Finally, Table 4.20 shows the runtime of the algorithms for different aggregation levels; the higher
the aggregation level, the fewer measurements are disaggregated. As expected, the runtime of
all algorithms follows a linear relationship with the number of customers to disaggregate. Inclusion of
the hidden-battery model increases the runtime, but the computation time was still within a reasonable range
for practical use (approximately 2 minutes for 100 customers).
Figure 4.18: Disaggregated load and ground truth measurements of a user with hidden-battery model
disaggregation
Chapter 5
Spatial-Temporal Data Analysis in Smart Grid
Data measured from the smart grid are spatial-temporal data. Spatial refers to the location at which the
data is collected, and temporal refers to the data being sequences recorded over time. The temporal
dependencies in smart grid data are leveraged in most existing data-driven models used for smart grid data
analytics. For example, load forecasting models are fitted to predict future load values from historical data.
However, the spatial aspect of smart grid data has not been investigated in the current literature.
We propose the use of spatial information from smart grid data in the form of network topological
connections for data analytics in smart grid. In this section, we first explain Spatial-Temporal Graph
Convolution Networks (STGCNs), a neural network architecture specialized for modelling spatial and temporal
correlations. Then, we propose to utilize STGCN models to solve two data analytics problems in smart grid: load
forecasting (with missing data) and missing data imputation. We focus on problems involving missing
data because spatial features can be leveraged for analyzing periods of missing data, thus giving better
performance over existing methods.
5.1 Background of Spatial-Temporal Graph Convolution Networks
5.1.1 Graph Convolutional Network (GCN)
Graph Convolutional Network (GCN) was first developed by Kipf and Welling [48] for the classification
of nodes in a graph network. It is a variant of CNN that performs convolution directly on graphs. This
convolution allows each node to aggregate the input representations of neighboring nodes to form a new
representation, which enables information exchange between neighboring nodes so that spatial dependencies
among these nodes can be learned.
In the context of smart grids, the network topology of the power grid can be used to define the graph
structure. An example of the network topology is shown in Figure 5.3. The transformers and loads, denoted
by solid black dots, constitute the nodes of the graph. The feeder lines, denoted by solid or dashed lines,
constitute the edges of the graph. Each edge connects the nodes that represent the two end points of the
feeder line. We denote the graph obtained from the power grid topology as $g$.
The input to a GCN layer is a matrix of $N$ data points, each with $D_i$ dimensions at layer $i$. Each layer
of GCN performs a matrix multiplication on the inputs and the adjacency matrix. To avoid changing the
scale of the inputs, the adjacency matrix is normalized by symmetric normalization using the diagonal
node degree matrix. Self-loops are also added for each node to the adjacency matrix so that the features of
each node itself are also considered when computing the outputs for each layer. Mathematically, a GCN
layer can be written as
$$H^{(l+1)} = \sigma\left(\tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}} H^{(l)} W^{(l)}\right) \qquad (5.1)$$
where $H^{(l)} \in \mathbb{R}^{N \times D_l}$ denotes the $l$-th layer of the network with its weight parameters denoted as $W^{(l)}$,
and $\sigma$ denotes the activation function. $A$ denotes the adjacency matrix of the graph $g$, and $\tilde{A} = A + I_N$ is
defined as the adjacency matrix with added self-connections, where $I_N$ is the identity matrix. $\tilde{D}$ is the degree
matrix calculated from the corresponding modified adjacency matrix $\tilde{A}$ by $\tilde{D}_{ii} = \sum_j \tilde{A}_{ij}$.
Figure 5.1: Architecture of a Spatial-Temporal Convolutional Block
Figure 5.2: Description of convolution on inputs
Efficient methods for implementing GCNs are discussed in [48].
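The GCN layer described in this section (self-loops, symmetric normalization by the degree matrix, then multiplication with the layer input and weights) can be sketched in plain Python as below. This is a minimal illustration, not an efficient implementation: the helper names are hypothetical, ReLU is assumed as the activation function, and the 3-node line graph is made up for the example.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a_ik * b_kj for a_ik, b_kj in zip(row, col))
             for col in zip(*b)] for row in a]


def normalized_adjacency(adj):
    """Return D~^{-1/2} (A + I) D~^{-1/2} for adjacency matrix `adj`."""
    n = len(adj)
    # Add self-loops: A~ = A + I.
    a_tilde = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
               for i in range(n)]
    deg = [sum(row) for row in a_tilde]  # D~_ii = sum_j A~_ij
    return [[a_tilde[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def gcn_layer(adj, h, w):
    """One GCN layer; ReLU is an assumed choice of activation."""
    z = matmul(matmul(normalized_adjacency(adj), h), w)
    return [[max(0.0, v) for v in row] for row in z]


# Hypothetical 3-node line graph 0 - 1 - 2 with one feature per node.
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
h0 = [[1.0], [2.0], [3.0]]
w0 = [[1.0]]  # identity weight so the output shows pure neighbor mixing
h1 = gcn_layer(adj, h0, w0)
```

Each output row mixes the features of a node with those of its direct neighbors only, so stacking more layers is what lets information travel over more hops.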
5.1.2 Spatial-Temporal Graph Convolutional Network (STGCN)
While a GCN layer only learns spatial correlations, a Spatial-Temporal Graph Convolutional Network (STGCN)
forms spatial-temporal convolutional blocks which consist of gated convolutional layers to perform convolution
in the temporal dimension and GCN layers to discover spatial correlations. A gated convolutional
layer performs a 1D convolution operation with a $K$-length filter on an input $l$-length vector with $C_i$ channels,
denoted as $x \in \mathbb{R}^{l \times C_i}$. This filter discovers correlations among $K$ neighboring values in the vector,
and the operation reduces the length of the vector by $K - 1$. We denote the convolution filter as
$\Gamma \in \mathbb{R}^{K \times C_i \times C_o}$ where $C_o$ is the output channel size. $\Gamma * x$ thus performs convolution on $x$ to give an
output in $\mathbb{R}^{(l-K+1) \times C_o}$. The gated convolution operation can then be defined as the function
$$f_c = (\Gamma * x) \odot \sigma(\Gamma * x) \qquad (5.2)$$
where $\odot$ is the element-wise product operator and $\sigma$ is the sigmoid function.
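A minimal Python sketch of the gated convolution described above is given below. The function names are hypothetical, the convolution is computed without padding (so the output is shorter by K - 1), and, as is common in neural network code, the filter is applied without flipping. We also assume the same convolution output is used for both the value and the gate.

```python
import math


def conv1d(x, kernel):
    """1D convolution along time without padding.

    x:      l rows, each a list of C_i channel values
    kernel: K x C_i x C_o nested lists
    returns (l - K + 1) rows, each with C_o values
    """
    k_len, c_out = len(kernel), len(kernel[0][0])
    out = []
    for t in range(len(x) - k_len + 1):
        row = []
        for o in range(c_out):
            # Weighted sum over the K neighboring intervals and all channels.
            row.append(sum(kernel[k][c][o] * x[t + k][c]
                           for k in range(k_len)
                           for c in range(len(x[0]))))
        out.append(row)
    return out


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def gated_conv(x, kernel):
    """Element-wise product of the convolution output and its sigmoid."""
    y = conv1d(x, kernel)
    return [[v * sigmoid(v) for v in row] for row in y]


# Hypothetical input: l = 4 intervals, C_i = 1 channel.
x = [[1.0], [2.0], [3.0], [4.0]]
kernel = [[[1.0]], [[-1.0]]]  # K = 2, C_i = 1, C_o = 1
out = gated_conv(x, kernel)   # length l - K + 1 = 3
```

The sigmoid term acts as a gate between 0 and 1 that controls how much of each convolved value is passed on to the next layer.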
A spatial-temporal convolutional block is structured as a gated convolutional layer in the temporal
dimension followed by a GCN layer and another gated convolutional layer. The structure of this convolutional
block is illustrated in Figure 5.1. Denoting $f_g$ as the graph convolution operation as described in
Section 5.1.1, $f_{c1}$ and $f_{c2}$ as gated convolutional layers, and $x_t$ as the input at time interval $t$, the
spatial-temporal convolutional block function can be written mathematically as
$$x_t = f_{c2}(f_g(f_{c1}(x_t))) \qquad (5.3)$$
Overall, an STGCN layer takes from each node a time series input of the historical load sequences
of that node. A temporal convolution is performed along the time axis of the input of each node, the output is
then passed to a graph convolution layer to perform convolution across the inputs of neighboring nodes, and a
second temporal convolution is then performed. This is illustrated in Figure 5.2. A bottleneck strategy is applied
within the block, meaning the output feature size of the GCN layer is smaller than that of the two temporal
layers, forcing the network to learn a meaningful representation of the inputs that considers both spatial
and temporal dependencies. A fully connected output layer is applied at the end of the model to translate
the learned representation from convolutional blocks to the targeted prediction values.
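The data flow through one spatial-temporal convolutional block can be sketched as follows. This is an illustrative sketch only: all names, shapes and weight values are hypothetical, the normalized adjacency matrix is assumed to be precomputed, and ReLU is assumed as the activation of the graph convolution. The channel sizes 4, 2 and 4 mimic the bottleneck strategy.

```python
import math


def temporal_gated_conv(x, kernel):
    """Gated 1D convolution along time for every node.

    x: [node][time][channel]; kernel: [K][C_in][C_out].
    """
    k_len, c_in, c_out = len(kernel), len(kernel[0]), len(kernel[0][0])
    out = []
    for series in x:
        rows = []
        for t in range(len(series) - k_len + 1):
            row = []
            for o in range(c_out):
                v = sum(kernel[k][c][o] * series[t + k][c]
                        for k in range(k_len) for c in range(c_in))
                row.append(v / (1.0 + math.exp(-v)))  # v * sigmoid(v)
            rows.append(row)
        out.append(rows)
    return out


def graph_conv(x, a_norm, w):
    """Mix node features at each time step: ReLU(a_norm @ X_t @ w)."""
    n, steps, c_in, c_out = len(x), len(x[0]), len(w), len(w[0])
    out = [[[0.0] * c_out for _ in range(steps)] for _ in range(n)]
    for t in range(steps):
        for i in range(n):
            # Aggregate each channel over the neighbors of node i.
            mixed = [sum(a_norm[i][j] * x[j][t][c] for j in range(n))
                     for c in range(c_in)]
            for o in range(c_out):
                v = sum(mixed[c] * w[c][o] for c in range(c_in))
                out[i][t][o] = max(0.0, v)
    return out


def st_conv_block(x, kernel1, a_norm, w, kernel2):
    """Temporal convolution, then graph convolution, then temporal convolution."""
    return temporal_gated_conv(
        graph_conv(temporal_gated_conv(x, kernel1), a_norm, w), kernel2)


# Hypothetical example: 2 connected nodes, 5 intervals, 1 input channel.
x = [[[float(t + n)] for t in range(5)] for n in range(2)]
kernel1 = [[[0.1, 0.2, 0.3, 0.4]], [[0.4, 0.3, 0.2, 0.1]]]    # K=2, 1 -> 4
w = [[0.5, -0.5] for _ in range(4)]                           # 4 -> 2
kernel2 = [[[0.1] * 4, [0.2] * 4], [[0.3] * 4, [0.4] * 4]]    # K=2, 2 -> 4
a_norm = [[0.5, 0.5], [0.5, 0.5]]  # normalized adjacency with self-loops
block_out = st_conv_block(x, kernel1, a_norm, w, kernel2)
```

Each temporal convolution with K = 2 shortens the sequence by one interval, so 5 input intervals become 3 output intervals per node.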
In this thesis, the STGCN model we used is similar to STGCNs used in other applications like traffic
flow prediction [108].
5.2 Load Forecasting with STGCNs
5.2.1 Problem Definition
5.2.1.1 Short-term Load Forecasting (STLF) Definition
Given the time series of load consumption of customers in a smart grid, the problem of short-term load
forecasting is to predict the load consumption values for each customer for the next few intervals. We are
also given the topology of the power distribution network.
Formally, the problem of load forecasting that we consider in this section is defined as follows: Let
$x_{1:T} = \{x_1, x_2, x_3, \ldots, x_T\}$ denote the input time series, where the data point at each time interval $t$ is
represented by $x_t$ and represents the vector of load consumption values of the customers in the grid. Let
$x^i_t$ denote the data point for customer $i$. We are also given a window size $W$ that denotes the length of
previous intervals to use for prediction and a future horizon size $H$ that denotes the number of future
intervals to predict. The problem of STLF is to find a prediction function $f$ such that:
$$x_{t+W+1:t+W+H} = f(x_{t:t+W}) \qquad (5.4)$$
The function $f$ can be any regression model or neural network architecture.
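To make the windowing concrete, the sketch below builds (input, target) pairs from a load time series. The function name is hypothetical, and we assume that each input holds exactly W intervals followed directly by H target intervals; the exact index convention at the window boundary is an assumption of this sketch.

```python
def make_windows(series, window, horizon):
    """Split a time series into (input, target) pairs for forecasting.

    series:  list of per-interval load vectors (one value per customer)
    window:  number of past intervals W fed to the model
    horizon: number of future intervals H to predict
    """
    pairs = []
    last_start = len(series) - window - horizon
    for t in range(last_start + 1):
        inputs = series[t:t + window]
        targets = series[t + window:t + window + horizon]
        pairs.append((inputs, targets))
    return pairs


# Hypothetical data: 6 intervals for 2 customers.
series = [[float(t), float(10 * t)] for t in range(6)]
pairs = make_windows(series, window=3, horizon=2)
```

Any regression model or neural network can then be fitted on these pairs to play the role of the prediction function.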
Figure 5.3: Grid topology (Image borrowed from [9])
Now let us assume that the power grid distribution network topology graph $g$ is provided to us with
a customer $i$ represented as node $i$ on the graph. The edges in the graph represent the feeder lines between
two customers. In this case, we extend the problem of STLF to consider both spatial and temporal
correlations and learn a function $f'$ such that:
$$x_{t+W+1:t+W+H} = f'(x_{t:t+W}, g) \qquad (5.5)$$
The network topology graph of the dataset used for our experiments is shown in Figure 5.3. This net-
work is a real radial distribution system located in Midwest U.S. with 240 nodes. Customers are connected
to these nodes via secondary distribution transformers.
Our objective is to investigate whether the use of spatial information can improve short term load
forecasting accuracy. This is done by comparing prediction models without spatial input, $f$, to those with
spatial inputs, $f'$. Accuracy of a model will be measured by the error between the predicted values $\hat{x}$ and the
actual values $x$ using an error function $E(\hat{x}, x)$. Widely used error metrics such as Mean Absolute Error
(MAE), Mean Absolute Percentage Error (MAPE) and Root Mean Squared Error (RMSE) can be chosen for
$E$. Please see Section 5.2.4 for more details.
5.2.1.2 STLF with Missing Data
In real-world deployment scenarios, the data obtained from the metering infrastructure is prone to missing
or corrupted values [85]. The missing or corrupted values are often labelled with 0s or NaN. STLF on
such data may lead to very large errors in predictions, which is undesirable.
We hypothesize that by leveraging spatial information, STLF models can be robust to missing data. To
test the hypothesis, we evaluate the robustness of our STLF models by testing their accuracy via a modified
testing methodology (Section 5.2.3.2). We remove some data points from our test dataset $x_{te}$ using missing
data patterns that occur widely in the field and reflect several typical real-world missing data scenarios
[85]. Let $x'_{te}$ denote this modified test dataset.
In our modified testing methodology, the testing dataset predictions $\hat{x}$ are acquired using $x'_{te}$,
$$\hat{x} = f'(x'_{te}, g), \qquad (5.6)$$
while the error is evaluated between the predicted values and the unmodified testing dataset:
$$\mathrm{Error} = E(\hat{x}, x_{te}) \qquad (5.7)$$
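The modified testing methodology can be sketched for a single customer as below. This is only an illustration of the evaluation flow: the function names are hypothetical, a single contiguous block of zeros stands in for the missing data patterns of [85], and a simple persistence model stands in for the forecasting model.

```python
def mask_block(series, start, length, fill=0.0):
    """Return a copy of `series` with one contiguous block replaced by `fill`."""
    masked = list(series)
    for t in range(start, min(start + length, len(masked))):
        masked[t] = fill
    return masked


def persistence_forecast(history):
    """Placeholder model: predict the next value as the last observed one."""
    return history[-1]


def evaluate_with_missing(series, window, start, length):
    """MAE of one-step forecasts made from masked inputs,
    scored against the unmodified series."""
    masked = mask_block(series, start, length)
    errors = []
    for t in range(window, len(series)):
        prediction = persistence_forecast(masked[t - window:t])
        errors.append(abs(prediction - series[t]))  # truth is unmasked
    return sum(errors) / len(errors)


# Hypothetical constant load: any error is caused by the missing block alone.
load = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
clean_mae = evaluate_with_missing(load, window=2, start=0, length=0)
missing_mae = evaluate_with_missing(load, window=2, start=2, length=2)
```

The key point is that the model only ever sees the masked inputs, while the error is always computed against the original values.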
5.2.2 Application of STGCN to Power grid
To apply STGCNs to power grid networks, we need to define the grid as a graph. As explained earlier in
this section, nodes represent loads of customers, and edges connect nodes according to how transformers
and loads are connected in the smart grid topology. However, edges do not represent a uniform distance
Figure 5.4: Modification on graph by appending edges to multihop neighbors
between the nodes, so customers that are connected on the topology may actually be very distant, while
nodes two hops away (meaning they are reached by going through two edges) can actually have a small distance
between them. We can see through power flow equations that the power at a node is correlated with
neighbors that are two hops away. Using the DC power flow model [88], denote $P_{ij}$ as the power flow
from bus $i$ to bus $j$, $X_i$ as the phasor angle at bus $i$, and $b_{ij}$ as the line inductive reactance of the line from
bus $i$ to bus $j$. $P_{ij}$ can be defined as
$$P_{ij} = b_{ij}(X_i - X_j) \qquad (5.8)$$
and the phasor angle can be represented as
$$X_i = \sum_{j \neq i} \frac{b_{ij}}{\sum_{i \neq j} b_{ij}} X_j + \frac{1}{\sum_{i \neq j} b_{ij}} P_i \qquad (5.9)$$
where $P_i$ is the sum of power flowing into bus $i$. Suppose we examine the power $P_i$ at a bus $i$ connected
to bus $j$, and denote buses connected to $j$ as bus $k$. By substituting (5.9) into (5.8), we get
$$P_{ij} = b_{ij} X_i - b_{ij} \sum_{k \neq j} \frac{b_{jk}}{\sum_{j \neq k} b_{jk}} X_k - \frac{b_{ij}}{\sum_{j \neq k} b_{jk}} P_j \qquad (5.10)$$
We can see that the two-hop neighbors $X_k$ also have an effect on the value of the power flowing out of
bus $i$, but the term is multiplied by the coefficient $\frac{b_{ij} b_{jk}}{\sum_{j \neq k} b_{jk}}$, while immediate neighbors are
multiplied by $b_{ij}$. Since $\sum_{j \neq k} b_{jk} > b_{jk}$ as inductive reactance is always non-negative, we can conclude
that $\frac{b_{ij} b_{jk}}{\sum_{j \neq k} b_{jk}}$
30
3 Electric Cooker Electric Cooker present in household Yes or No
4 Electric Heater Electric Heater present in household Yes or No
5 House Floor Area Floor Area of the house (in square feet) <180 or ≥180
6 Children Have children living in household Yes or No
7 Resident Number Number of residents living in household 2 or >3
8 Residents in Daytime Have residents living in household in daytime Yes or No
9 Retirement Main income earner in household has retired Yes or No
10 Age Age of main income earner in household <65 or ≥65
11 Detached House Is the house a detached house or connected house Yes or No
12 Number of Bedrooms Number of Bedrooms in the house 2 or >3
13 Tumble Dryer Tumble Dryer present in household Yes or No
14 Dishwasher Dishwasher present in household Yes or No
15 Stand alone freezer Stand alone freezer present in household Yes or No
16 Education Education level of main income earner in the household less than primary or above second level
17 Income Annual income of the main income earner in the household ≤55k or >55k
6.3.3.2 Baselines
We use two naive methods as baselines to compare against results from using extracted input features.
These two methods are statistical methods that make prediction based on the statistics of the target feature
values only:
• Identical Outputs (Identical) - Outputs either all "True" or all "False" for a feature. Choose "True" or
"False" depending on whether the training dataset has more "True" or "False" values for that feature.
• Biased Random Guess (Biased) - Make a biased random guess based on how likely a value is "True"
or "False" for a feature. Estimate the probability of a feature $d$ having a "True" outcome as
$P(x_d = \text{"T"}) = \frac{N_d^{(T)}}{N}$, where $N_d^{(T)}$ is the number of "True" values in the training dataset for feature $d$ and
$N$ is the total number of data points in the training dataset. Then, for each prediction for feature
$d$ in the testing dataset, output "True" with probability $P(x_d = \text{"T"})$ or "False" with probability
$1 - P(x_d = \text{"T"})$.
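The two baselines above can be sketched in Python as follows. The function names are hypothetical, labels are represented as booleans, and a tie in the Identical baseline is resolved to "False", which is an assumption of this sketch.

```python
import random


def identical_baseline(train_labels):
    """Return the majority class of the training labels.

    Assumption: a tie resolves to False.
    """
    n_true = sum(1 for v in train_labels if v)
    return n_true > len(train_labels) - n_true


def biased_guess_baseline(train_labels, n_predictions, rng):
    """Guess True with the empirical probability of True in training."""
    p_true = sum(1 for v in train_labels if v) / len(train_labels)
    return [rng.random() < p_true for _ in range(n_predictions)]


# Hypothetical target feature: 70% of the training values are True.
train = [True] * 7 + [False] * 3
majority = identical_baseline(train)   # same output for every test sample
guesses = biased_guess_baseline(train, 1000, random.Random(0))
```

Neither baseline looks at the consumption data, so any model that uses the extracted input features should be expected to improve on them only when those features carry information about the target.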
6.3.3.3 Experiment Setup and Evaluation Metrics
Table 6.8: List of input feature configurations used for each experiment
No. Input Features Used
1 Clusters only
2 Consumption and Ratio
3 Clusters, Consumption and Ratio
In the experiments, we predict the target features using a Support Vector Classifier (SVC) model or a
Feed Forward Neural Network (NN) with the prepared input features. In Section 6.3.3.1, we have described
three types of input features: consumption features, ratio features and cluster features. We test the performance
of the SVC predictor when using different sets of features. Table 6.8 summarizes the 3 input
feature combinations we use. We label each method with the model initials and the input feature configuration
number, for example, "SVC1" denotes using SVC model with cluster features only, "NN3" denotes using
NN model with all input features.
Since all target features are binary values, the result prediction can either be:
• True Positive (TP): Predicting a "True" result correctly
• True Negative (TN): Predicting a "False" result correctly
• False Positive (FP): Predicting "True" when the actual result is "False"
• False Negative (FN): Predicting "False" when the actual result is "True"
We evaluate the results by computing the classification accuracy and balanced accuracy as used by
[102]. The classification accuracy simply calculates the rate of correct predictions, while the balanced
accuracy is the average of the rate of correct predictions among actual "True" samples and that among
actual "False" samples. The balanced accuracy better shows the ability of the predictor to predict
correctly when the target feature is heavily biased towards either "True" or "False".
$$\text{Classification accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \qquad (6.12)$$
$$\text{Balanced accuracy} = \frac{1}{2} \cdot \frac{TP}{TP + FN} + \frac{1}{2} \cdot \frac{TN}{TN + FP} \qquad (6.13)$$
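A small Python sketch shows how the two metrics differ on a heavily biased target feature. The function names and the example labels are hypothetical, and the sketch assumes both classes appear at least once in the ground truth so that no division by zero occurs.

```python
def confusion_counts(truth, predicted):
    """Count TP, TN, FP and FN for boolean labels."""
    tp = sum(1 for t, p in zip(truth, predicted) if t and p)
    tn = sum(1 for t, p in zip(truth, predicted) if not t and not p)
    fp = sum(1 for t, p in zip(truth, predicted) if not t and p)
    fn = sum(1 for t, p in zip(truth, predicted) if t and not p)
    return tp, tn, fp, fn


def classification_accuracy(tp, tn, fp, fn):
    """Rate of correct predictions (Eq. 6.12)."""
    return (tp + tn) / (tp + tn + fp + fn)


def balanced_accuracy(tp, tn, fp, fn):
    """Average of the per-class rates of correct predictions (Eq. 6.13).

    Assumes both classes are present in the ground truth.
    """
    return 0.5 * tp / (tp + fn) + 0.5 * tn / (tn + fp)


# Hypothetical biased feature: 9 of 10 samples are True, and the
# predictor outputs True for every sample.
truth = [True] * 9 + [False]
predicted = [True] * 10
counts = confusion_counts(truth, predicted)
acc = classification_accuracy(*counts)
bal = balanced_accuracy(*counts)
```

Here the classification accuracy is 0.9 even though the predictor has learned nothing, while the balanced accuracy is 0.5, which is what a predictor that always outputs one class obtains.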
The dataset of 4323 customers is split into training, validation and testing datasets in a 4 : 1 : 5 ratio.
The model is trained on the training dataset, and the validation dataset is used to search for the best hyperparameters
for the SVC model. The results are evaluated by the classification accuracy and balanced accuracy on the
testing dataset.
Table 6.9: Classification accuracy for each input feature configuration with SVC predictor compared
against baselines
No Feature Identical Biased SVC1 SVC2 SVC3 MissRate(%)
1 Social Class 60.92 52.94 60.87 63.37 63.12 96.50
2 House Age 56.24 49.57 54.63 57.42 58.18 100.00
3 Electric Cooker 74.79 64.57 74.79 75.50 76.30 99.91
4 Electric Heater 69.87 58.70 69.87 69.82 69.82 99.91
5 House Floor Area 96.55 93.77 96.55 96.55 96.55 42.49
6 Children 71.31 59.26 71.31 71.88 71.60 100.00
7 Resident Number 71.83 58.36 71.83 72.02 72.07 100.00
8 Residents in Daytime 65.08 57.18 68.38 71.79 71.93 100.00
9 Retirement 68.86 57.37 68.86 69.38 69.80 100.00
10 Age 77.08 66.40 77.08 77.13 77.08 100.00
11 Detached House 83.40 73.94 83.40 83.40 83.40 99.91
12 Number of Bedrooms 90.73 82.07 90.73 90.68 90.68 99.91
13 Tumble Dryer 66.97 59.26 69.47 71.08 70.60 100.00
14 Dishwasher 66.21 57.23 68.95 71.08 70.94 100.00
15 Stand alone freezer 50.43 49.95 56.85 63.61 64.08 100.00
16 Education 86.65 76.15 86.65 86.65 86.65 94.52
17 Income 50.67 49.64 50.98 52.02 54.20 45.60
- Average (Mean) 71.03 62.73 71.84 73.14 73.35 92.87
6.3.3.4 Results and Discussion
Tables 6.9 and 6.10 show the classification accuracy and balanced accuracy of the baselines and SVC using
the different input configurations. The final column shows the MissRate, which is the rate of missing data
for each feature in the testing dataset. We could observe that not all features are correlated to features
extracted from consumption data. For "Electric Heater", "House Floor Area", "Age", "Detached House",
"Number of Bedrooms", and "Education", the results were either the same as the "Identical" outputs baseline or
within a very small margin of less than 0.1% difference. We could also see that the balanced accuracy
for these features was 50.00% for all baselines as they classified all samples correctly for one class and all
incorrectly for the other. This meant that the SVC was not able to find a good classification for the input
features and simply output the same class for all samples. This may be due to these features not being
correlated to the electricity consumption. For "Electric Heater" in particular, since only summer electricity
consumption data was used in this experiment, the heater may not have been used in this period.
Table 6.10: Balanced accuracy for each input feature configuration with SVC predictor compared against
baselines
No Feature Identical Biased SVC1 SVC2 SVC3 MissRate(%)
1 Social Class 50.00 50.68 50.05 57.04 56.89 96.50
2 House Age 50.00 49.20 53.60 56.69 57.41 100.00
3 Electric Cooker 50.00 51.19 50.00 52.28 54.06 99.91
4 Electric Heater 50.00 50.72 50.00 50.01 50.01 99.91
5 House Floor Area 50.00 48.56 50.00 50.00 50.00 42.49
6 Children 50.00 50.32 50.00 53.45 53.25 100.00
7 Resident Number 50.00 48.22 50.00 54.26 54.35 100.00
8 Residents in Daytime 50.00 51.77 56.93 61.99 62.51 100.00
9 Retirement 50.00 49.35 50.00 52.12 53.47 100.00
10 Age 50.00 51.69 50.00 50.18 50.22 100.00
11 Detached House 50.00 51.29 50.00 50.00 50.00 99.91
12 Number of Bedrooms 50.00 48.89 50.00 49.97 49.97 99.91
13 Tumble Dryer 50.00 51.90 56.33 60.07 59.42 100.00
14 Dishwasher 50.00 51.27 56.42 61.00 60.69 100.00
15 Stand alone freezer 50.00 49.96 56.92 63.62 64.10 100.00
16 Education 50.00 49.80 50.00 50.00 50.00 94.52
17 Income 50.00 49.60 50.56 51.79 53.98 45.60
- Average (Mean) 50.00 50.26 51.81 54.38 54.72 92.87
Next, we compare the performance of "SVC3", which includes all input features (clustering, consumption
figures and ratio features), to "SVC2", which does not contain clustering features. In particular,
"SVC3" performed better than "SVC2" for "House Age", "Electric Cooker", "Residents in Daytime",
"Retirement", "Stand alone freezer" and "Income". This shows that the addition of clustering features aided
predictions for characteristics involving mainly resident occupancy, electric appliances with frequent
usage like the electric cooker and freezer, and characteristics concerning the income of the household
like the annual income and the retirement status. Among only these features, the average accuracy
of "SVC3" was 65.75% while that of "SVC2" was only 64.95%. On the other hand, "SVC3" performed worse
than "SVC2" for the "Social Class", "Children", "Tumble Dryer" and "Dishwasher" features and did not see
improvements for features like "Age" and "Education", showing that clustering features did not provide much
information about personal characteristics involving the main income earner. Among only the four features
where "SVC3" performed worse, the average accuracy of "SVC3" was 69.07% while that of "SVC2" was 69.35%.
The difference is smaller
Table 6.11: Classification accuracy for each input feature configuration with NN predictor compared
against baselines
No Feature Identical Biased NN1 NN2 NN3 MissRate(%)
1 Social Class 60.92 53.09 60.48 62.14 62.83 96.50
2 House Age 56.24 49.67 55.01 58.22 58.41 100.00
3 Electric Cooker 74.79 62.58 74.79 76.02 76.11 99.91
4 Electric Heater 69.87 58.04 69.87 69.87 69.73 99.91
5 House Floor Area 96.55 94.10 96.55 96.55 96.55 42.49
6 Children 71.31 56.00 71.31 71.50 72.02 100.00
7 Resident Number 71.83 59.17 71.83 72.40 72.35 100.00
8 Residents in Daytime 65.08 54.54 68.19 73.16 73.49 100.00
9 Retirement 68.86 57.56 68.86 69.23 68.95 100.00
10 Age 77.08 64.13 77.08 77.41 77.17 100.00
11 Detached House 83.40 72.71 83.40 83.40 83.40 99.91
12 Number of Bedrooms 90.73 81.79 90.73 90.73 90.73 99.91
13 Tumble Dryer 66.97 56.00 69.38 71.41 71.46 100.00
14 Dishwasher 66.21 53.78 68.81 71.08 71.12 100.00
15 Stand alone freezer 50.43 48.91 58.03 63.23 63.23 100.00
16 Education 86.65 77.00 86.65 86.65 86.65 94.52
17 Income 50.67 48.81 50.67 50.67 50.67 45.60
- Average (Mean) 71.03 61.64 71.86 73.16 73.23 92.87
than the difference on the features where "SVC3" performs better. Averaged over all features,
"SVC3" has the highest classification accuracy of 73.35%, compared to 73.14% for "SVC2".
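The subset averages quoted in this comparison can be reproduced from Table 6.9 with a few lines of arithmetic. The sketch below copies the "SVC2" and "SVC3" classification accuracies of the ten features named in the text; the variable and function names are illustrative only.

```python
from statistics import mean

# Classification accuracy (%) copied from Table 6.9: feature -> (SVC2, SVC3)
ACCURACY = {
    "House Age": (57.42, 58.18),
    "Electric Cooker": (75.50, 76.30),
    "Residents in Daytime": (71.79, 71.93),
    "Retirement": (69.38, 69.80),
    "Stand alone freezer": (63.61, 64.08),
    "Income": (52.02, 54.20),
    "Social Class": (63.37, 63.12),
    "Children": (71.88, 71.60),
    "Tumble Dryer": (71.08, 70.60),
    "Dishwasher": (71.08, 70.94),
}


def subset_means(features):
    """Return the mean SVC2 and SVC3 accuracy over the given features."""
    svc2 = mean(ACCURACY[f][0] for f in features)
    svc3 = mean(ACCURACY[f][1] for f in features)
    return svc2, svc3


better = [f for f, (s2, s3) in ACCURACY.items() if s3 > s2]
worse = [f for f, (s2, s3) in ACCURACY.items() if s3 < s2]
better_svc2, better_svc3 = subset_means(better)
worse_svc2, worse_svc3 = subset_means(worse)
print(f"SVC3 better: SVC2 {better_svc2:.2f} vs SVC3 {better_svc3:.2f}")
print(f"SVC3 worse:  SVC2 {worse_svc2:.2f} vs SVC3 {worse_svc3:.2f}")
```

The gain on the six features where "SVC3" is better is larger than the loss on the four features where it is worse, which is the point made in the text.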
Tables 6.11 and 6.12 show the classification accuracy and balanced accuracy when using NN as the
prediction model instead of SVC. Similar to SVC, "NN3", which used all input features, performed better on
average than "NN1" and "NN2", which used only either cluster features or statistical features. The overall
accuracy of "NN3", however, was lower than that of "SVC3". This was compensated by the fact that "NN3"
performed at least as well as "NN2" on most features. "NN3" had a lower classification accuracy than "NN2"
only in "Electric Heater", "Resident Number", "Retirement", and "Age". On the other hand, "NN3" had a higher
classification accuracy than "NN2" in "Social Class", "House Age", "Electric Cooker", "Children", "Residents
in Daytime", "Tumble Dryer", and "Dishwasher". For the remaining features, "NN3" and "NN2" had the same
accuracy; apart from "Stand alone freezer", these features showed no improvement over the "Identical"
baseline when using an NN prediction model.
Table 6.12: Balanced accuracy for each input feature configuration with NN predictor compared against
baselines
No Feature Identical Biased NN1 NN2 NN3 MissRate(%)
1 Social Class 50.00 50.85 52.63 60.37 61.43 96.50
2 House Age 50.00 49.24 53.29 57.86 58.11 100.00
3 Electric Cooker 50.00 48.37 50.00 54.24 54.80 99.91
4 Electric Heater 50.00 49.84 50.00 50.62 50.08 99.91
5 House Floor Area 50.00 53.40 50.00 50.00 50.00 42.49
6 Children 50.00 47.59 50.00 56.14 56.16 100.00
7 Resident Number 50.00 49.50 50.00 57.89 57.96 100.00
8 Residents in Daytime 50.00 49.21 56.13 64.52 65.27 100.00
9 Retirement 50.00 49.74 50.00 53.06 53.06 100.00
10 Age 50.00 49.57 50.00 51.95 51.87 100.00
11 Detached House 50.00 50.09 50.00 50.00 50.00 99.91
12 Number of Bedrooms 50.00 49.20 50.00 50.23 50.00 99.91
13 Tumble Dryer 50.00 49.21 55.53 61.65 61.76 100.00
14 Dishwasher 50.00 48.25 56.17 62.72 62.89 100.00
15 Stand alone freezer 50.00 48.91 58.05 63.21 63.22 100.00
16 Education 50.00 50.45 50.00 50.00 50.00 94.52
17 Income 50.00 48.74 50.00 50.00 50.00 45.60
- Average (Mean) 50.00 49.54 51.87 55.56 55.68 92.87
Chapter 7
Conclusion and Future Work
In this chapter, we will provide a conclusion to this dissertation by discussing the broader impacts of the
work. We will also discuss future research directions that can be pursued.
7.1 Broader Impacts
The smart grid has seen a large increase in the penetration of Distributed Energy Resources (DERs) like
rooftop photovoltaics (PVs) and battery storage in recent years. DERs have high volatility and reduce inertia
in the grid. They also decentralize supply in the grid as they replace utility-owned large-scale turbine power
generators. This introduces challenges in providing full observability, i.e. full access to knowledge about
measurements in the grid. We envision that high observability is going to become increasingly crucial for
the development and stability of the smart grid, as limited access to information about DER assets causes
problems in both grid planning and operation. For example, the 2018 Angeles Forest event showed the
vulnerability of the grid due to the lack of an accurate account of all available DER PVs [18].
In this thesis, we proposed two main approaches for improving observability. We introduced a novel
load model, the Consumer Mixture Model, for unsupervised disaggregation of Behind-the-Meter (BTM) solar
generation in the smart grid. We also proposed the use of spatial features in solving crucial problems in the
smart grid like load forecasting and missing data imputation. As shown in Chapter 6, the improved
observability enhances the results of many related time series analytics. With greater observability of the
grid, we could guarantee safer grid operation and enable both short-term and long-term grid planning and
development.
7.2 Future Directions
7.2.1 Disaggregation of BTM Electric Vehicles as a DER
In this thesis, we tackled the disaggregation problem for BTM solar and battery storage. However, these
are not the only kinds of BTM DER that could be present. Electric Vehicles (EVs) are becoming more
popular and are replacing traditional gasoline-powered vehicles as their costs fall and they become
more energy efficient. EVs are generally energy consumers, as they draw energy from the grid to charge
their internal batteries. In recent decades, researchers have proposed a grid model called Vehicle-to-Grid
(V2G) that allows EVs to act as DERs [46]. This model utilizes the EV battery as battery storage: it can be
used to store extra generation from BTM PVs, or it can be charged strategically outside of peak hours to
reduce the peak power consumption. Unlike typical battery storage, using an EV as storage must not
prevent the customer from using the EV for driving. Some examples of the requirements are that
the EV needs to be fully charged within a certain time limit depending on the customer, and that the EV is
not always present in the household. To facilitate such models, observability of EVs in the grid is important
so that planning can be done to ensure the requirements for EVs are satisfied. To do so, disaggregation of
net load measurements to discover component signals can be used.
To perform this disaggregation, we need to model EV activity with a data-driven model. This approach
is similar to our proposed approach for modelling BTM batteries, where we use a function that models the
battery charge state based on the time of the day and the expected charge and discharge patterns of the
battery based on solar generation patterns. A model for EV activity would be more complicated, as its
charging activity is fully dependent on customer behavior. The model would need to capture the usage
patterns of the EV: when it is typically driven, when it is charged, etc. To do so, we can use Hidden
Markov Models, as the state of the EV can be treated as a Markov process.
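As a rough illustration of this direction, the sketch below decodes the most likely sequence of hidden EV states from a sequence of coarse net load observations with the Viterbi algorithm. Everything specific in it is an assumption made for illustration: the three states, the discretization of net load into "low" and "high", and all probabilities are hypothetical, and in practice they would be estimated from data rather than set by hand.

```python
import math

# Hypothetical hidden EV states and model parameters (not learned from data).
STATES = ["away", "idle", "charging"]
START = {"away": 0.3, "idle": 0.6, "charging": 0.1}
TRANS = {
    "away": {"away": 0.80, "idle": 0.15, "charging": 0.05},
    "idle": {"away": 0.10, "idle": 0.70, "charging": 0.20},
    "charging": {"away": 0.05, "idle": 0.25, "charging": 0.70},
}
# Probability of observing a "low" or "high" household net load in each state.
EMIT = {
    "away": {"low": 0.9, "high": 0.1},
    "idle": {"low": 0.8, "high": 0.2},
    "charging": {"low": 0.1, "high": 0.9},
}


def viterbi(observations):
    """Return the most likely hidden state sequence for the observations."""
    best = {s: math.log(START[s]) + math.log(EMIT[s][observations[0]])
            for s in STATES}
    back = []  # back[t][s] = best predecessor of state s at step t + 1
    for obs in observations[1:]:
        prev, best, pointers = best, {}, {}
        for s in STATES:
            p = max(STATES, key=lambda r: prev[r] + math.log(TRANS[r][s]))
            pointers[s] = p
            best[s] = prev[p] + math.log(TRANS[p][s]) + math.log(EMIT[s][obs])
        back.append(pointers)
    state = max(STATES, key=lambda s: best[s])
    path = [state]
    for pointers in reversed(back):  # follow the back-pointers to the start
        state = pointers[state]
        path.append(state)
    return path[::-1]


print(viterbi(["low", "low", "high", "high", "low"]))
```

A disaggregation method built on this idea would additionally need an emission model over continuous net load and a way to account for solar generation, which are outside the scope of this sketch.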
7.2.2 Attention-based STGCN
We used the Spatial-Temporal Graph Convolutional Network (STGCN) architecture in this work for
modelling spatial and temporal correlations in time series data in the smart grid. While the GCN is a powerful
architecture that supports information passing between neighboring nodes, it treats all neighbors with
equal weights.
Graph Attention Networks (GATs) were proposed as an architecture that can adaptively learn the
importance of neighbors and adjust the weights given to neighboring information accordingly [101].
This is done by utilizing a graph attentional layer that transforms each set of node features into a new set
of node features, such that the new features are computed as a weighted linear combination of neighboring
node features. These weights are learned during the training process and represent the degree of
correlation between any two nodes. GATs are useful in the context of smart grid time series because the
degree of correlation between two neighboring nodes can vary greatly depending on the actual physical
distance between the nodes, differences in weather conditions and socio-demographic characteristics. It is
possible to obtain better accuracy on applications like load forecasting and missing data imputation by
leveraging GATs.
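The following is a minimal, single-head sketch of such a graph attentional layer in plain Python, assuming the standard formulation in which the score of neighbor j for node i is LeakyReLU(a · [W h_i || W h_j]) and the scores are normalized with a softmax over the neighborhood. The toy graph, the transform W and the attention vector a are hypothetical values rather than trained parameters, and the output nonlinearity that normally follows the aggregation is omitted.

```python
import math


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def gat_layer(features, neighbors, W, a):
    """Single-head graph attention layer.

    features:  list of node feature vectors h_i
    neighbors: dict node index -> list of neighbor indices (including itself)
    W:         shared linear transform, one row per output dimension
    a:         attention vector of length 2 * output dimension
    """
    # Shared linear transform z_i = W h_i
    z = [[sum(w * x for w, x in zip(row, h)) for row in W] for h in features]
    outputs, attention = [], []
    for i in range(len(features)):
        nbrs = neighbors[i]
        # Unnormalized score for each neighbor from the concatenation [z_i || z_j]
        scores = [leaky_relu(sum(p * q for p, q in zip(a, z[i] + z[j])))
                  for j in nbrs]
        # Softmax over the neighborhood (shifted by the max for stability)
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        alphas = [e / total for e in exps]
        attention.append(dict(zip(nbrs, alphas)))
        # New features: attention-weighted combination of neighbor features
        outputs.append([sum(al * z[j][k] for al, j in zip(alphas, nbrs))
                        for k in range(len(z[i]))])
    return outputs, attention


# Hypothetical toy graph with three nodes and two features per node.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
neighbors = {0: [0, 1, 2], 1: [0, 1], 2: [0, 2]}
W = [[1.0, 0.0], [0.0, 1.0]]
a_learned = [0.5, -0.3, 0.8, 0.1]   # stands in for a trained attention vector
a_zero = [0.0, 0.0, 0.0, 0.0]       # zero vector gives equal weights

out_learned, attn_learned = gat_layer(features, neighbors, W, a_learned)
out_equal, attn_equal = gat_layer(features, neighbors, W, a_zero)
print(attn_learned[0])
print(attn_equal[0])
```

With the zero attention vector every neighbor receives the same weight, which corresponds to the equal-weight aggregation described above; a non-zero vector lets different neighbors contribute differently.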
7.2.3 Utilizing Spatial Features for Anomaly Detection
In this thesis, we explored the use of STGCNs for two important time series analytics in the smart grid: load
forecasting and missing data imputation. These analytics benefited from the use of spatial features because
neighboring nodes can show correlation with each other in the smart grid. We did not investigate all possible
applications in the smart grid that might benefit from using spatial features. One such application is anomaly
detection.
Anomaly detection is a kind of time series analytics used to prevent cyber attacks and to detect
fraud and energy theft in the smart grid by detecting anomalous measurements [65]. Smart grid meters can
be targeted by cyber attacks that manipulate the energy consumption readings, which can lead to
destabilization of the grid. Manipulation of meter readings can also be used to hide evidence of energy fraud
or theft. Thus, it is important to be able to detect outlier meter measurements by anomaly detection.
Currently, many data-driven approaches have been proposed to model customer behavior and detect outlier
behaviors. STGCNs can be considered as an anomaly detection model that includes spatial features as
input. By comparing the readings from a certain node and its neighboring nodes, it may be possible to
detect unusual consumption patterns.
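The idea of comparing a node with its neighbors can be illustrated with a much simpler stand-in than an STGCN. The sketch below flags a meter whose reading deviates from the median reading of its neighbors by more than a threshold. The meter names, readings, neighborhood structure and threshold are all hypothetical, and a learned spatial model would replace the median with a prediction.

```python
from statistics import median


def neighbor_residuals(readings, neighbors):
    """Difference between each node's reading and its neighbors' median."""
    return {
        node: readings[node] - median([readings[m] for m in neighbors[node]])
        for node in readings
    }


def flag_anomalies(readings, neighbors, threshold):
    """Return the nodes whose residual magnitude exceeds the threshold."""
    residuals = neighbor_residuals(readings, neighbors)
    return sorted(n for n, r in residuals.items() if abs(r) > threshold)


# Hypothetical hourly readings in kWh; meter "D" under-reports its consumption.
readings = {"A": 1.2, "B": 1.1, "C": 1.3, "D": 0.1}
neighbors = {
    "A": ["B", "C", "D"],
    "B": ["A", "C", "D"],
    "C": ["A", "B", "D"],
    "D": ["A", "B", "C"],
}
flagged = flag_anomalies(readings, neighbors, threshold=0.5)
print(flagged)
```

Using the median rather than the mean keeps a single manipulated neighbor from shifting the reference value of the other nodes.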
7.3 Concluding Remarks
Enabling observability in the smart grid will remain an important issue as the smart grid continues to develop.
In addition to the increasing proliferation of solar PVs and batteries, technological developments are also
enabling new kinds of DERs like Electric Vehicles used as storage. The proposed approaches in this thesis are
a first step towards further improving observability. We hope that this dissertation inspires the research
community to take interest in tackling future challenges in observability introduced by the proliferation
of DERs. We believe that there are many opportunities for improving observability, and for investigating
how it can benefit time series analytics in the smart grid. This will lead to more optimal smart grid
development in terms of both operation costs and grid safety. Moreover, this can promote green technology
by paving the way for a higher volume of integration of DERs, which are mostly renewable energy resources.
These resources are highly volatile and unreliable; high observability will allow us to analyze the maximum
capacity for integrating such resources in the grid.
Bibliography
[1] Saeed Aghabozorgi, Ali Seyed Shirkhorshidi, and Teh Ying Wah. “Time-series clustering–A
decade review”. In: Information Systems 53 (2015), pp. 16–38.
[2] Mohamed H Albadi and Ehab F El-Saadany. “A summary of demand response in electricity
markets”. In: Electric power systems research 78.11 (2008), pp. 1989–1996.
[3] Hesham K Alfares and Mohammad Nazeeruddin. “Electric load forecasting: literature survey and
classification of methods”. In: International journal of systems science 33.1 (2002), pp. 23–34.
[4] Syed Saqib Ali and Bong Jun Choi. “State-of-the-Art Artificial Intelligence Techniques for
Distributed Smart Grids: A Review”. In: Electronics 9.6 (2020), p. 1030.
[5] Christian Beckel, Leyna Sadamori, and Silvia Santini. “Automatic socio-economic classification of
households using electricity consumption data”. In: Proceedings of the fourth international
conference on Future energy systems. 2013, pp. 75–86.
[6] Donald J Berndt and James Clifford. “Using dynamic time warping to find patterns in time series.”
In: KDD workshop. Vol. 10. 16. Seattle, WA. 1994, pp. 359–370.
[7] Nate Blair, Aron P Dobos, Janine Freeman, Ty Neises, Michael Wagner, Tom Ferguson,
Paul Gilman, and Steven Janzou. System Advisor Model, SAM 2014.1.14: General Description.
Tech. rep. National Renewable Energy Lab. (NREL), Golden, CO (United States), 2014.
[8] Cruz E Borges, Oihane Kamara-Esteban, Tony Castillo-Calzadilla, Cristina Martin, and
Ainhoa Alonso-Vicario. “Enhancing the missing data imputation of primary substation load
demand records”. In: Sustainable Energy, Grids and Networks (2020), p. 100369.
[9] Fankun Bu, Yuxuan Yuan, Zhaoyu Wang, Kaveh Dehghanpour, and Anne Kimber. “A time-series
distribution test system based on real utility data”. In: 2019 North American Power Symposium
(NAPS). IEEE. 2019, pp. 1–6.
[10] Ervin Ceperic, Vladimir Ceperic, and Adrijan Baric. “A strategy for short-term load forecasting by
support vector regression machines”. In: IEEE Transactions on Power Systems 28.4 (2013),
pp. 4356–4364.
[11] CER Smart Metering Project - Electricity Customer Behaviour Trial, 2009-2010. Accessed via the Irish
Social Science Data Archive - www.ucd.ie/issda. 2009-2010.
[12] Charalampos Chelmis, Jahanvi Kolte, and Viktor K Prasanna. “Big data analytics for demand
response: Clustering over space and time”. In: 2015 IEEE International Conference on Big Data (Big
Data). IEEE. 2015, pp. 2223–2232.
[13] Dong Chen and David Irwin. “Sundance: Black-box behind-the-meter solar disaggregation”. In:
Proceedings of the Eighth International Conference on Future Energy Systems. ACM. 2017, pp. 45–55.
[14] Yongbao Chen, Peng Xu, Yiyi Chu, Weilin Li, Yuntao Wu, Lizhou Ni, Yi Bao, and Kun Wang.
“Short-term electrical load forecasting using the Support Vector Regression (SVR) model to
calculate the demand response baseline for office buildings”. In: Applied Energy 195 (2017),
pp. 659–670.
[15] Chung Ming Cheung, Wen Zhong, Chuanxiu Xiong, Ajitesh Srivastava, Rajgopal Kannan, and
Viktor K Prasanna. “Behind-the-Meter Solar Generation Disaggregation using Consumer Mixture
Models”. In: 2018 IEEE International Conference on Communications, Control, and Computing
Technologies for Smart Grids (SmartGridComm). IEEE. 2018, pp. 1–6.
[16] Michael A Cohen and Duncan S Callaway. “Physical Effects of Distributed PV Generation on
California’s Distribution System”. In: arXiv preprint arXiv:1506.06643 (2015).
[17] Common Functions for Smart Inverter. 2012. url:
http://www.epri.com/abstracts/Pages/ProductAbstract.aspx?ProductId=1026809.
[18] North American Electric Reliability Corporation. “April and May 2018 Fault Induced Solar
Photovoltaic Resource Interruption Disturbances Report”. In: (2018). url:
https://www.nerc.com/pa/rrm/ea/April_May_2018_Fault_Induced_Solar_PV_Resource_Int/April_May_2018_Solar_PV_Disturbance_Report.pdf.
[19] Corinna Cortes and Vladimir Vapnik. “Support-vector networks”. In: Machine learning 20.3
(1995), pp. 273–297.
[20] Balázs Csanád Csáji et al. “Approximation with artificial neural networks”. In: Faculty of Sciences,
Eötvös Loránd University, Hungary 24.48 (2001), p. 7.
[21] Data Source: Pecan Street Inc. Dataport [2019].
[22] Xishuang Dong, Lijun Qian, and Lei Huang. “Short-term load forecasting in smart grid: A
combined CNN and K-means clustering approach”. In: 2017 IEEE International Conference on Big
Data and Smart Computing (BigComp). IEEE. 2017, pp. 119–125.
[23] Guo-Feng Fan, Li-Ling Peng, and Wei-Chiang Hong. “Short term load forecasting based on phase
space reconstruction algorithm and bi-square kernel regression model”. In: Applied energy 224
(2018), pp. 13–33.
[24] David J Feldman, Eric O’Shaughnessy, and Robert M Margolis. Q3 2019 / Q4 2019 Solar Industry
Update. Tech. rep. National Renewable Energy Lab. (NREL), Golden, CO (United States), 2019.
[25] Global Residential Solar Energy Storage Market Growth 2019-2024. Tech. rep. LP Information Inc,
2019.
[26] L Gondara and K Wang. “Multiple imputation using deep denoising autoencoders”.
In: arXiv preprint arXiv:1705.02737 (2017).
[27] Lovedeep Gondara and Ke Wang. “Mida: Multiple imputation using denoising autoencoders”. In:
Pacific-Asia Conference on Knowledge Discovery and Data Mining. Springer. 2018, pp. 260–272.
[28] Ramon Granell, Colin J Axon, and David CH Wallom. “Clustering disaggregated load profiles
using a Dirichlet process mixture model”. In: Energy Conversion and Management 92 (2015),
pp. 507–516.
[29] Aron Habte, Manajit Sengupta, and Anthony Lopez. Evaluation of the National Solar Radiation
Database (NSRDB): 1998-2015. Tech. rep. National Renewable Energy Lab. (NREL),
Golden, CO (United States), 2017.
[30] Hassan Haes Alhelou, Mohamad Esmail Hamedani-Golshan, Takawira Cuthbert Njenda, and
Pierluigi Siano. “A survey on power system blackout and cascading events: Research motivations
and challenges”. In: Energies 12.4 (2019), p. 682.
[31] David G Hart. “Using AMI to realize the Smart Grid”. In: 2008 IEEE Power and Energy Society
General Meeting-Conversion and Delivery of Electrical Energy in the 21st Century. Vol. 10. sn. 2008.
[32] Sue Ellen Haupt and Branko Kosovic. “Big data and machine learning for applied weather
forecasts: Forecasting solar power for utility operations”. In: 2015 IEEE Symposium Series on
Computational Intelligence. IEEE. 2015, pp. 496–501.
[33] Siu Lau Ho and Min Xie. “The use of ARIMA models for reliability forecasting and analysis”. In:
Computers & industrial engineering 35.1-2 (1998), pp. 213–216.
[34] Sepp Hochreiter and Jürgen Schmidhuber. “Long short-term memory”. In: Neural computation 9.8
(1997), pp. 1735–1780.
[35] Chris Holcomb. “Pecan Street Inc.: A test-bed for NILM”. In: International Workshop on
Non-Intrusive Load Monitoring, Pittsburgh, PA, USA. 2012.
[36] IEEE PES AMPS DSAS Test Feeder Working Group Resources.
http://sites.ieee.org/pes-testfeeders/resources/. Accessed: 2019-03-04.
[37] Electric Power Research Institute. “Meeting the Challenges of Declining System Inertia”. In:
(2019). url: https://www.epri.com/research/products/000000003002015131.
[38] Electric Power Research Institute. OpenDSS electrical power system simulation tool.
http://smartgrid.epri.com/SimulationTool.aspx. Accessed: 2019-03-04.
[39] Electric Power Research Institute. Southern California Edison 2015 Static Load Profiles.
https://www.sce.com/regulatory/load-profiles/2015-static-load-profiles. Accessed: 2019-03-04.
[40] Samantha A Janko, Michael R Arnold, and Nathan G Johnson. “Implications of high-penetration
renewables for ratepayers and utilities in the residential solar photovoltaic (PV) market”. In:
Applied energy 180 (2016), pp. 37–51.
[41] Zhen Jiang, Di Shi, Xiaobin Guo, Guangyue Xu, Li Yu, and Chaoyang Jing. “Robust smart meter
data analytics using smoothed ALS and dynamic time warping”. In: Energies 11.6 (2018), p. 1401.
[42] Farzana Kabir, Nanpeng Yu, Weixin Yao, Rui Yang, and Yingchen Zhang. “Estimation of
Behind-the-Meter Solar Generation by Integrating Physical with Statistical Models”. In: 2019 IEEE
International Conference on Communications, Control, and Computing Technologies for Smart Grids
(SmartGridComm). IEEE. 2019, pp. 1–6.
[43] Emre C Kara, Ciaran M Roberts, Michaelangelo Tabone, Lilliana Alvarez, Duncan S Callaway, and
Emma M Stewart. “Disaggregating solar generation from feeder-level measurements”. In:
Sustainable Energy, Grids and Networks 13 (2018), pp. 112–121.
[44] Emre Can Kara, Michaelangelo Tabone, Ciaran Roberts, Sila Kiliccote, and Emma M Stewart.
“Estimating behind-the-meter solar generation with existing measurement infrastructure”. In:
Proceedings of the 3rd ACM International Conference on Systems for Energy-Efficient Built
Environments. ACM. 2016, pp. 259–260.
[45] Hessam Kazari, A Abbaspour-Tehrani Fard, AS Dobakhshari, and Ali Mohammad Ranjbar.
“Voltage stability improvement through centralized reactive power management on the Smart
Grid”. In: 2012 IEEE PES Innovative Smart Grid Technologies (ISGT). IEEE. 2012, pp. 1–7.
[46] Willett Kempton and Jasna Tomić. “Vehicle-to-grid power fundamentals: Calculating capacity and
net revenue”. In: Journal of power sources 144.1 (2005), pp. 268–279.
[47] Minkyung Kim, Sangdon Park, Joohyung Lee, Yongjae Joo, and Jun Kyun Choi. “Learning-based
adaptive imputation method with kNN algorithm for missing power data”. In: Energies 10.10
(2017), p. 1668.
[48] Thomas N Kipf and Max Welling. “Semi-supervised classification with graph convolutional
networks”. In: arXiv preprint arXiv:1609.02907 (2016).
[49] S. A. Klein and W. A. Beckman. “Review of Solar Radiation Utilizability”. In: Journal of Solar
Energy Engineering 106.4 (Nov. 1984), pp. 393–402. doi: 10.1115/1.3267617.
[50] Daisuke Kodaira and Sekyung Han. “Topology-based estimation of missing smart meter
readings”. In: Energies 11.1 (2018), p. 224.
[51] Weicong Kong, Zhao Yang Dong, Youwei Jia, David J Hill, Yan Xu, and Yuan Zhang. “Short-term
residential load forecasting based on LSTM recurrent neural network”. In: IEEE Transactions on
Smart Grid 10.1 (2017), pp. 841–851.
[52] Sanmukh R Kuppannagari, Rajgopal Kannan, and Viktor K Prasanna. “An ilp based algorithm for
optimal customer selection for demand response in smartgrids”. In: 2015 International Conference
on Computational Science and Computational Intelligence (CSCI). IEEE. 2015, pp. 300–305.
[53] Sanmukh R Kuppannagari, Rajgopal Kannan, and Viktor K Prasanna. “Optimal Discrete Net-Load
Balancing in Smart Grids with High PV Penetration”. In: ACM Transactions on Sensor Networks
(TOSN) 14.3-4 (2018), p. 24.
[54] T Warren Liao. “Clustering of time series data - a survey”. In: Pattern recognition 38.11 (2005),
pp. 1857–1874.
[55] Andy Liaw, Matthew Wiener, et al. “Classication and regression by randomForest”. In: R news
2.3 (2002), pp. 18–22.
[56] Bryan Lim and Stefan Zohren. “Time Series Forecasting With Deep Learning: A Survey”. In: arXiv
preprint arXiv:2004.13408 (2020).
[57] King-Ip Lin and Ravikumar Kondadadi. “A similarity-based soft clustering algorithm for
documents”. In: Proceedings Seventh International Conference on Database Systems for Advanced
Applications. DASFAA 2001. IEEE. 2001, pp. 40–47.
[58] You Lin, Jianhui Wang, and Mingjian Cui. “Reconstruction of power system measurements based
on enhanced denoising autoencoder”. In: 2019 IEEE Power & Energy Society General Meeting
(PESGM). IEEE. 2019, pp. 1–5.
[59] Jun Ma, Jack CP Cheng, Feifeng Jiang, Weiwei Chen, Mingzhu Wang, and Chong Zhai. “A
bi-directional missing data imputation scheme based on LSTM and transfer learning for building
energy data”. In: Energy and Buildings (2020), p. 109941.
[60] Mia K Markey, Georgia D Tourassi, Michael Margolis, and David M DeLong. “Impact of missing
data in evaluating artificial neural networks trained on complete data”. In: Computers in Biology
and Medicine 36.5 (2006), pp. 516–525.
[61] Gonzalo Mateos and Georgios B Giannakis. “Load curve data cleansing and imputation via
sparsity and low rank”. In: IEEE Transactions on Smart Grid 4.4 (2013), pp. 2347–2355.
[62] Barry A Mather. “Quasi-static time-series test feeder for PV integration analysis on distribution
systems”. In: 2012 IEEE Power and Energy Society General Meeting. IEEE. 2012, pp. 1–8.
[63] Johanna L Mathieu, Phillip N Price, Sila Kiliccote, and Mary Ann Piette. “Quantifying changes in
building electricity use, with application to demand response”. In: IEEE Transactions on Smart
Grid 2.3 (2011), pp. 507–518.
[64] Larry R Medsker and LC Jain. “Recurrent neural networks”. In: Design and Applications 5 (2001).
[65] Ramin Moghaddass and Jianhui Wang. “A hierarchical framework for smart grid anomaly
detection using large-scale smart meter data”. In: IEEE Transactions on Smart Grid 9.6 (2017),
pp. 5820–5830.
[66] Ramyar Rashed Mohassel, Alan Fung, Farah Mohammadi, and Kaamran Raahemifar. “A survey
on advanced metering infrastructure”. In: International Journal of Electrical Power & Energy
Systems 63 (2014), pp. 473–484.
[67] Khosrow Moslehi and Ranjit Kumar. “A reliability perspective of the smart grid”. In: IEEE
transactions on smart grid 1.1 (2010), pp. 57–64.
[68] Meinard Müller. “Dynamic time warping”. In: Information retrieval for music and motion (2007),
pp. 69–84.
[69] Binoy B Nair, PK Saravana Kumar, NR Sakthivel, and U Vipin. “Clustering stock price time series
data to generate stock trading recommendations: An empirical study”. In: Expert Systems with
Applications 70 (2017), pp. 20–36.
[70] NNDC Climate Data Online. (accessed March, 2020). url: https://www7.ncdc.noaa.gov/CDO/cdo.
[71] Mehtabhorn Obthong, Nongnuch Tantisantiwong, Watthanasak Jeamwatthanachai, and
Gary Wills. “A survey on machine learning for stock price prediction: algorithms and
techniques”. In: (2020).
[72] Daniel W Otter, Julian R Medina, and Jugal K Kalita. “A survey of the usages of deep learning for
natural language processing”. In: IEEE Transactions on Neural Networks and Learning Systems
(2020).
[73] Alexandre Oudalov, Rachid Cherkaoui, and Antoine Beguin. “Sizing and optimal operation of
battery energy storage system for peak shaving application”. In: 2007 IEEE Lausanne Power Tech.
IEEE. 2007, pp. 621–625.
[74] Ranjan Pal, Charalampos Chelmis, Saima Aman, Marc Frincu, and Viktor Prasanna. Challenge
Online Time Series Clustering For Demand Response A Theory to Break the ‘Curse of
Dimensionality’. Tech. rep. City of Los Angeles Department, CA (United States), 2015.
[75] Ranjan Pal, Charalampos Chelmis, Marc Frincu, and Viktor Prasanna. “Towards dynamic demand
response on efficient consumer grouping algorithmics”. In: IEEE Transactions on Sustainable
Computing 1.1 (2016), pp. 20–34.
[76] John Paparrizos and Luis Gravano. “k-shape: Efficient and accurate clustering of time series”. In:
Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data. ACM.
2015, pp. 1855–1870.
[77] David Parra, Maciej Swierczynski, Daniel I Stroe, Stuart A Norman, Andreas Abdon,
Jörg Worlitschek, Travis O’Doherty, Lucelia Rodrigues, Mark Gillott, Xiaojin Zhang, et al. “An
interdisciplinary review of energy storage for communities: Challenges and perspectives”. In:
Renewable and Sustainable Energy Reviews 79 (2017), pp. 730–749.
[78] Yayu Peng, Yishen Wang, Xiao Lu, Haifeng Li, Di Shi, Zhiwei Wang, and Jie Li. “Short-term load
forecasting at different aggregation levels with predictability analysis”. In: 2019 IEEE Innovative
Smart Grid Technologies-Asia (ISGT Asia). IEEE. 2019, pp. 3385–3390.
[79] Jouni Peppanen, Xiaochen Zhang, Santiago Grijalva, and Matthew J Reno. “Handling bad or
missing smart meter data through advanced data imputation”. In: 2016 IEEE Power & Energy
Society Innovative Smart Grid Technologies Conference (ISGT). IEEE. 2016, pp. 1–5.
[80] François Petitjean, Alain Ketterlin, and Pierre Gançarski. “A global averaging method for dynamic
time warping, with applications to clustering”. In: Pattern Recognition 44.3 (2011), pp. 678–693.
[81] Milana Plećaš, Han Xu, and Ivana Kockar. “Integration of energy storage to improve utilisation of
distribution networks with active network management schemes”. In: CIRED-Open Access
Proceedings Journal 2017.1 (2017), pp. 1845–1848.
[82] Elizabeth L Ratnam, Steven R Weller, Christopher M Kellett, and Alan T Murray. “Residential load
and rooftop PV generation: an Australian distribution network dataset”. In: International Journal
of Sustainable Energy 36.8 (2017), pp. 787–806.
[83] Waseem Rawat and Zenghui Wang. “Deep convolutional neural networks for image
classification: A comprehensive review”. In: Neural computation 29.9 (2017), pp. 2352–2449.
[84] Peter J Rousseeuw. “Silhouettes: a graphical aid to the interpretation and validation of cluster
analysis”. In: Journal of computational and applied mathematics 20 (1987), pp. 53–65.
[85] Seunghyoung Ryu, Minsoo Kim, and Hongseok Kim. “Denoising Autoencoder-Based Missing
Value Imputation for Smart Meters”. In: IEEE Access 8 (2020), pp. 40656–40666.
[86] Haşim Sak, Andrew Senior, Kanishka Rao, and Françoise Beaufays. “Fast and accurate recurrent
neural network acoustic models for speech recognition”. In: arXiv preprint arXiv:1507.06947
(2015).
[87] Bernhard Schölkopf, Kah-Kay Sung, Christopher JC Burges, Federico Girosi, Partha Niyogi,
Tomaso Poggio, and Vladimir Vapnik. “Comparing support vector machines with Gaussian
kernels to radial basis function classifiers”. In: IEEE transactions on Signal Processing 45.11 (1997),
pp. 2758–2765.
[88] Hanie Sedghi and Edmond Jonckheere. “Statistical structure learning to ensure data integrity in
smart grid”. In: IEEE Transactions on Smart Grid 6.4 (2015), pp. 1924–1933.
[89] Raffi Avo Sevlian, Jiafan Yu, Yizheng Liao, Xiao Chen, Yang Weng, Emre Can Kara,
Michelangelo Tabone, Srini Badri, Chin-Woo Tan, David Chassin, et al. “Vader: Visualization and
analytics for distributed energy resources”. In: arXiv preprint arXiv:1708.09473 (2017).
[90] Hamid Shaker, Hamidreza Zareipour, and David Wood. “A data-driven approach for estimating
the power generation of invisible solar sites”. In: IEEE Transactions on Smart Grid 7.5 (2015),
pp. 2466–2476.
[91] Hamid Shaker, Hamidreza Zareipour, and David Wood. “Estimating power generation of invisible
solar sites using publicly available data”. In: IEEE Transactions on Smart Grid 7.5 (2016),
pp. 2456–2465.
[92] Navin Sharma, Pranshu Sharma, David Irwin, and Prashant Shenoy. “Predicting solar generation
from weather forecasts using machine learning”. In: Smart Grid Communications
(SmartGridComm), 2011 IEEE International Conference on. IEEE. 2011, pp. 528–533.
[93] Alex Sherstinsky. “Fundamentals of recurrent neural network (rnn) and long short-term memory
(lstm) network”. In: Physica D: Nonlinear Phenomena 404 (2020), p. 132306.
[94] Sarabjit Singh and Rupinderjit Singh. “ARIMA based short term load forecasting for Punjab
region”. In: International Journal of Science and Research 4.6 (2015), pp. 1819–1822.
[95] Fabrizio Sossan, Lorenzo Nespoli, Vasco Medici, and Mario Paolone. “Unsupervised
disaggregation of photovoltaic production from composite power flow measurements of
heterogeneous prosumers”. In: IEEE Transactions on Industrial Informatics 14.9 (2018),
pp. 3904–3913.
[96] Indro Spinelli, Simone Scardapane, and Aurelio Uncini. “Missing data imputation with
adversarially-trained graph convolutional networks”. In: Neural Networks (2020).
[97] Gan Sun, Yang Cong, Dongdong Hou, Huijie Fan, Xiaowei Xu, and Haibin Yu. “Joint household
characteristic prediction via smart meter data”. In: IEEE Transactions on Smart Grid 10.2 (2017),
pp. 1834–1844.
[98] Michaelangelo Tabone, Sila Kiliccote, and Emre Can Kara. “Disaggregating solar generation
behind individual meters in real time”. In: Proceedings of the 5th Conference on Systems for Built
Environments. ACM. 2018, pp. 43–52.
[99] Kalman Tornai, András Oláh, Rajmund Drenyovszki, Lóránt Kovács, István Pinté, and
Janos Levendovszky. “Recurrent neural network based user classification for smart grids”. In:
2017 IEEE Power & Energy Society Innovative Smart Grid Technologies Conference (ISGT). IEEE.
2017, pp. 1–5.
[100] Akshay SN Uttama Nambi, Antonio Reyes Lua, and Venkatesha R Prasad. “Loced: Location-aware
energy disaggregation framework”. In: Proceedings of the 2nd ACM International Conference on
Embedded Systems for Energy-Efficient Built Environments. ACM. 2015, pp. 45–54.
[101] Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Lio, and
Yoshua Bengio. “Graph attention networks”. In: arXiv preprint arXiv:1710.10903 (2017).
[102] Joaquim L Viegas, Susana M Vieira, and João MC Sousa. “Fuzzy clustering and prediction of
electricity demand based on household characteristics”. In: 2015 Conference of the International
Fuzzy Systems Association and the European Society for Fuzzy Logic and Technology
(IFSA-EUSFLAT-15). Atlantis Press. 2015.
[103] Xiaoyang Wang, Yao Ma, Yiqi Wang, Wei Jin, Xin Wang, Jiliang Tang, Caiyan Jia, and Jian Yu.
“Traffic Flow Prediction via Spatial Temporal Graph Neural Network”. In: Proceedings of The Web
Conference 2020. 2020, pp. 1082–1092.
[104] Yusen Wang, Wenlong Liao, and Yuqing Chang. “Gated recurrent unit network-based short-term
photovoltaic forecasting”. In: Energies 11.8 (2018), p. 2163.
[105] Christopher KI Williams and Matthias Seeger. “Using the Nyström method to speed up kernel
machines”. In: Advances in neural information processing systems. 2001, pp. 682–688.
[106] Matt Wytock and J Zico Kolter. “Contextually Supervised Source Separation with Application to
Energy Disaggregation.” In: AAAI. 2014, pp. 486–492.
[107] Xishuang Dong, Lijun Qian, and Lei Huang. “Short-term load forecasting in smart grid: A
combined CNN and K-means clustering approach”. In: 2017 IEEE International Conference on Big
Data and Smart Computing (BigComp). 2017, pp. 119–125.
[108] Bing Yu, Haoteng Yin, and Zhanxing Zhu. “Spatio-temporal graph convolutional networks: A
deep learning framework for traffic forecasting”. In: arXiv preprint arXiv:1709.04875 (2017).
[109] Ahmed Zoha, Alexander Gluhak, Muhammad Ali Imran, and Sutharshan Rajasegarar.
“Non-intrusive load monitoring approaches for disaggregated energy sensing: A survey”. In:
Sensors 12.12 (2012), pp. 16838–16866.
[110] Kasım Zor, Oğuzhan Timur, and Ahmet Teke. “A state-of-the-art review of artificial intelligence
techniques for short-term electric load forecasting”. In: 2017 6th International Youth Conference on
Energy (IYCE). IEEE. 2017, pp. 1–7.
Asset Metadata
Creator: Cheung, Chung Ming (author)
Core Title: Data-driven methods for increasing real-time observability in smart distribution grids
School: Viterbi School of Engineering
Degree: Doctor of Philosophy
Degree Program: Computer Science
Degree Conferral Date: 2021-08
Publication Date: 07/24/2021
Defense Date: 08/10/2021
Publisher: University of Southern California (original), University of Southern California. Libraries (digital)
Tag: data driven modelling, machine learning, OAI-PMH Harvest, smart grids
Format: application/pdf (imt)
Language: English
Contributor: Electronically uploaded by the author (provenance)
Advisor: Prasanna, Viktor (committee chair), Nakano, Aiichiro (committee member), Raghavendra, Cauligi S. (committee member)
Creator Email: ccming3@gmail.com, chungmin@usc.edu
Permanent Link (DOI): https://doi.org/10.25549/usctheses-oUC15620661
Unique identifier: UC15620661
Legacy Identifier: etd-CheungChun-9859
Document Type: Dissertation
Rights: Cheung, Chung Ming
Type: texts
Source: University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Access Conditions: The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the author, as the original true and official version of the work, but does not grant the reader permission to use the work if the desired use is covered by copyright. It is the author, as rights holder, who must provide use permission if such use is covered by copyright. The original signature page accompanying the original submission of the work to the USC Libraries is retained by the USC Libraries and a copy of it may be obtained by authorized requesters contacting the repository e-mail address given.
Repository Name: University of Southern California Digital Library
Repository Location: USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA
Repository Email: cisadmin@lib.usc.edu